CN109271683B

CN109271683B - Building group automatic arrangement algorithm for sunlight constraint

Info

Publication number: CN109271683B
Application number: CN201810995301.0A
Authority: CN
Inventors: Sun Chengyu (孙澄宇)
Original assignee: Tongji University
Current assignee: Tongji University
Priority date: 2018-08-29
Filing date: 2018-08-29
Publication date: 2020-03-20
Anticipated expiration: 2038-08-29
Also published as: CN109271683A

Abstract

The invention discloses an automatic building group arrangement algorithm for sunlight constraints. It is an automatic algorithm for the problem of "how to arrange multiple building units (mainly high-rise buildings) so that they satisfy the sunlight constraints of both themselves and the surrounding buildings" during urban planning and architectural design in high-density urban environments. The method comprises the following steps. First, based on professional knowledge of the arrangement application, a solution 'environment' model that can flexibly describe various arrangement problems is designed as a mathematical model. A single building is treated as the subject of each movement decision, and the arrangement of all buildings is described, in a step-by-step serial manner, as a sequence of movement decisions made one building at a time. Second, an 'agent' model is designed that makes movement decisions for a single building through information interaction with the environment model. Third, an existing deep reinforcement learning algorithm is used to construct a round-and-step solving loop of interaction between the agent and the environment.

Description

Building group automatic arrangement algorithm for sunlight constraint

Technical Field

The invention provides an automatic algorithm for the problem of "how to arrange multiple building units (mainly high-rise buildings) so that they satisfy the sunlight constraints of both themselves and the surrounding buildings" during urban planning and architectural design in high-density urban environments.

Background

As urbanization continues to advance in China, high-density urban space has become the norm. To safeguard the health of large numbers of citizens, both central and local governments have issued design codes that require different types of buildings to receive a minimum number of hours of sunlight on a particular date of the year.

For example, clause 4.1.1 of the national "Residential Building Code (GB 50368-2005)" states that "dwellings for the elderly should receive no less than 2 h of sunlight on the winter solstice". Clause 3.2.1 of Shanghai's "Residential Design Standard (DGJ 08-20-2007)" states that "in high-rise residential buildings, at least one living space in small and medium dwelling units, and at least two living spaces in large units, should receive no less than 1 h of continuous effective sunlight across the full window on the winter solstice."

For a long time, urban planners and architects have met these requirements by trial and error. When arranging buildings within a site, they repeatedly "fine-tune the arrangement and recalculate the sunlight".

Buildings can be arranged in a single pass using the sunlight spacing coefficients given in regulations. However, high-density cities full of tall buildings mostly present high-rise problems to which such coefficients cannot be applied. On the one hand, shadows cast by other high-rise buildings inside and outside the site affect the sunlight received by buildings on the site. On the other hand, the high-rise buildings on the site must also be checked for their own sunlight impact on other buildings inside and outside the site. Positioning the high-rise buildings within the site is therefore the key difficulty of the arrangement.

Each design differs in several ways:
- the surrounding buildings it faces;
- the placeable area, which depends on the site shape;
- the number, size and height of the buildings it contains.

This variability makes the problem hard to solve with an explicit-logic "white-box" algorithm. To date, the design industry still completes the arrangement task manually, which is time-consuming and labor-intensive.

Other design constraints, such as building spacing and setbacks, do not require repeated operations. Sunlight constraints do, and the complexity of repeated trial and error makes them one of the major factors limiting design efficiency in current high-density urban construction.

Conditions in the foreign design industry differ from those in China: land ownership systems, environmental density and height, and design requirements for various building types. As a result, no sunlight-constraint-oriented automatic arrangement algorithm for high-rise building groups is directly available from abroad. There are, however, some related research results in China.

The first category is research that can automatically generate building arrangements but does not address sunlight constraints.

For example, Li Biao et al. of Southeast University performed generative building design based on a "quasi-rectangular site subdivision" method. Any given building site is subdivided into several approximate rectangles. Building groups with traditional Chinese residential characteristics are then inserted adaptively to generate building blocks covering the whole site [1]. The focus lies mostly on the form itself, not on sunlight constraints.

For another example, Sun Cheng et al. of Tongji University analyzed the shapes and building arrangement types of a large number of urban residential land cases. They established three-level parametric prototypes of block, cluster and building, and achieved automatic generation from development indicators to three-dimensional digital building models. Sunlight constraints were not considered [2].

The second category can automatically generate building arrangements under sunlight constraints. However, the arrangement problems it can handle are clearly limited, and it cannot meet the needs of actual design work.

For example, in 2010 Song Xiaodong et al. of Tongji University attempted computer-automated residential building layout under the prevailing sunlight standards. The method borrows a two-dimensional bin-packing algorithm to automatically arrange rectangles, each comprising a building and its spacing-control zone. Satisfaction of the sunlight constraints therefore depends entirely on how the spacing-control zone is sized.

This zone is estimated from the bounding rectangle of the individual building's outline and from empirical parameters. It therefore fundamentally cannot account for overlapping shadows from surrounding buildings. The zone can only be enlarged so that the results satisfy the sunlight constraints with higher probability. As a direct consequence, the algorithm cannot fully exploit the favorable effect of mutually overlapping shadows to raise arrangement density to the level that high-density cities actually require [3].

For another example, in 2013 Ling et al. of Huazhong University of Science and Technology built a mathematical model of building layout. The model uses constraints that residential buildings must consider, such as sunlight and fire protection, and achieves automatic, rapid row-type and perimeter-type layouts.

According to its description, the method handles sunlight only once: it controls the positional relationship between buildings using a sunlight spacing coefficient of 1.2. It cannot handle the practical arrangement problem of overlapping high-rise shadows at all and cannot be applied to actual design [4].

For example, according to its promotional material, the "rapid massing layout" (qiangpai) technology released by Shenzhen Xiaoku Technology in 2017 uses two deep neural network techniques: convolutional neural networks (CNN) and generative adversarial networks (GAN). The networks are trained on a large number of excellent building layout cases so that they can automatically generate a batch of layouts for given site conditions. Some of these layouts satisfy various design requirements, including sunlight constraints, with fairly high probability (the so-called "pass rate") [5].

Because the generated layouts depend entirely on the characteristics of the training cases, a large number of corresponding training cases would be needed for every potential site condition. Such conditions include latitude, boundary shape, floor area ratio, building density, number of building units, footprint shape, height, and existing surrounding buildings. This is impractical under the highly variable conditions of real design work, so the method is inherently limited in adaptability.

Prior art references:

[1] Li Biao, Han Dong. Technical understanding of architectural creation and design and its prospects [J]. Architectural Journal, 2011, 06: 96-100.

[2] Sun Cheng, Luo Qiming, Song Xiaodong, Xie Junmin, et al. Practice-oriented automatic generation method of urban three-dimensional models: taking the intensity zoning plan of Beihai City as an example [J]. Architectural Journal, 2017, 8: 77-81.

[3] Song Xiaodong, Pan Lei, Sun Sheng. Estimation method for the floor area ratio of residential plots [J]. City Planning Review, 2010, 02: 57-63.

[4] Ling, Wang Xiaobo, Li Chenggang. Automatic layout algorithm for residential buildings based on internal feasible regions [J]. Computer Technology and Development, 2013, 08: 1-4.

[5] AI is transforming the century-old automobile industry: does the thousand-year-old construction industry need rescuing too? [EB/OL]. http://mp.weixin.qq.com/s/8XKKyxjgBU8AkZkkkkkkkkkD5iMQA, 2017-12-27.

[6] Timothy P. Lillicrap, Jonathan J. Hunt, Alexander Pritzel, Nicolas Heess, Tom Erez, Yuval Tassa, David Silver, Daan Wierstra. Continuous Control with Deep Reinforcement Learning [EB/OL]. https://arxiv.org/abs/1509.02971, 2016-02-29.

Disclosure of Invention

The claimed technical schemes are characterized as follows:

Technical scheme one

A method for the automatic arrangement design of buildings under computable performance constraints, characterized by the following elements.

The 'environment': based on professional knowledge of the arrangement application, a solution 'environment' model that can flexibly describe various arrangement problems is designed as a mathematical model. A single building is taken as the subject of each movement decision. The arrangement of all buildings is described, in a step-by-step serial manner, as a sequence of movement decisions made one building at a time.

The 'agent': an 'agent' model is designed that makes movement decisions for a single building through information interaction with the environment model.

The solving loop: an existing deep reinforcement learning algorithm is used to construct a round-and-step solving loop of interaction between the agent and the environment. At the beginning of each round, all buildings in the 'environment' are in their initial arrangement. In each subsequent step, the 'agent' moves buildings one at a time based on its interaction with the 'environment'. In any step, the 'agent' obtains the current state of the 'environment' and generates a single-step movement vector for a single building. After the movement is executed in the 'environment', the 'agent' obtains the new post-movement state and the change in the environment's degree of satisfaction of a given computable performance constraint.

Training: the dynamically updated movement records train the deep neural network in the movement decision function. Each record comprises at least the movement vector, the 'environment' states before and after the movement, and the change in performance-constraint satisfaction before and after the movement. Training aims at higher performance-constraint satisfaction later in the round; that is, it improves the ability to generate correct movement vectors from dynamically accumulated movement experience.

As the numbers of rounds and steps increase, the performance indicator keeps improving until the arrangement problem is solved, i.e., until the preset satisfaction condition of the performance constraint is reached. The performance indicators in the performance constraints must be numerical indicators that can be computed from the 'environment'.
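The round-and-step loop of technical scheme one can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation, and all names are hypothetical:
- the one-dimensional "site" [0, 10] stands in for the environment model;
- `satisfaction` (share of building pairs at least `min_gap` apart) stands in for a computable performance constraint;
- the `policy` and `train_fn` hooks stand in for the deep-network movement decision function and its training.

```python
import random
from typing import Callable, List, Optional, Tuple

# (state before, building index, movement, state after, change in satisfaction)
Experience = Tuple[List[float], int, float, List[float], float]


def satisfaction(xs: List[float], min_gap: float) -> float:
    """Hypothetical computable indicator: share of building pairs at least min_gap apart."""
    pairs = [(i, j) for i in range(len(xs)) for j in range(i + 1, len(xs))]
    if not pairs:
        return 1.0
    return sum(abs(xs[i] - xs[j]) >= min_gap for i, j in pairs) / len(pairs)


def solve(n_buildings: int, u: int, v: int,
          policy: Callable[[List[float], int], float],
          train_fn: Callable[[List[Experience]], None],
          target: float = 1.0, min_gap: float = 2.0,
          seed: int = 0) -> Tuple[Optional[List[float]], List[Experience]]:
    rng = random.Random(seed)
    memory: List[Experience] = []
    for e in range(u):                                   # rounds
        # each round restarts from an initial (here random) arrangement on [0, 10]
        xs = [rng.uniform(0.0, 10.0) for _ in range(n_buildings)]
        for t in range(v):                               # steps
            r_before = satisfaction(xs, min_gap)
            b = rng.randrange(n_buildings)               # building moved in this step
            state = list(xs)
            move = policy(state, b)                      # single-step movement
            xs[b] = min(10.0, max(0.0, xs[b] + move))    # stay inside the site
            r_after = satisfaction(xs, min_gap)
            memory.append((state, b, move, list(xs), r_after - r_before))
            train_fn(memory)                             # hypothetical training hook
            if r_after >= target:
                return xs, memory                        # preset condition reached
    return None, memory                                  # no solution within u rounds


mover = random.Random(1)
layout, memory = solve(3, u=20, v=50,
                       policy=lambda s, b: mover.choice([-1.0, 1.0]),
                       train_fn=lambda m: None)
```

A random policy is used here only to exercise the loop. In the scheme, `policy` would be the trained movement decision network, and `train_fn` would update it from the accumulated records.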

Technical scheme two

An automatic building group arrangement algorithm for sunlight constraints, characterized in that:

based on professional knowledge of building sunlight analysis, an 'environment' mathematical model (environment A) that can flexibly describe various building group arrangement problems is designed. An 'agent' control module (agent B) that can acquire single-step interaction experience in the environment is also designed. On this basis, an existing deep reinforcement learning algorithm can be used to build a round-and-step deep reinforcement learning training and solving module (module C) driven by the interaction experience between agent and environment.

In each single step, the agent:
- acquires the state of the environment;
- feeds it into the movement decision function that controls a single building, which drives that building's movement;
- collects the resulting change in global sunlight-constraint satisfaction together with the new environment state.

By recording and training on these interaction experiences, the agent continuously optimizes the movement decision function. The goal is the highest possible global sunlight-constraint satisfaction in subsequent rounds and steps, until the sunlight arrangement problem is solved.

The technical scheme does not depend at all on arrangement samples collected in advance and is highly adaptable to various arrangement tasks. Its by-product, the movement decision neural network for a single moving body, is highly transferable.

Drawings

FIG. 1 is the algorithm flow chart.

FIG. 2 is a sample environment model.

FIG. 3 is a site plan of the shading relationships of a sample building group along the sunlight test direction at a given time.

FIG. 4 is a directed graph of the shading relationships along that sunlight test direction.

FIG. 5 illustrates how the neighborhood of a node is determined in the graph by maximum shading.

FIG. 6 shows n + m positive and negative test directions (n = 9, m = 3) for the actual movable distance around the test object.

Detailed Description

The technical scheme of the invention is further described below with reference to the accompanying drawings.

Existing sunlight-constraint-oriented methods and related research for automatically positioning high-rise buildings cannot meet the adaptability that actual design work demands of arrangement tasks. The technical scheme of the invention is as follows.

Based on relevant professional knowledge of building sunlight analysis, the invention designs three parts:
- an 'environment' mathematical model (box A in FIG. 1) that can flexibly describe various building group arrangement problems;
- an 'agent' control module (box B in FIG. 1) that can acquire single-step interaction experience in the 'environment';
- a round-and-step deep reinforcement learning training and solving module (box C in FIG. 1), driven by the interaction experience between 'agent' and 'environment' and built from an existing deep reinforcement learning algorithm (the DDPG algorithm is used as an example).

In each single step, the agent acquires the state of the environment and feeds it into the movement decision function controlling a single building, which drives that building's movement. It then collects the change in global sunlight-constraint satisfaction before and after the movement. By recording and training on these interaction experiences, it continuously optimizes the movement decision function so that the highest possible global sunlight-constraint satisfaction is reached in subsequent rounds and steps, until the sunlight arrangement problem is solved.

The algorithm does not depend at all on building group layout cases collected in advance and is highly adaptable to various layout tasks. Its by-product, the movement decision function of a single building (a neural network), is also highly reusable, i.e., transferable.

The 'agent' of the present invention is what deep reinforcement learning theory calls an Agent.

The 'adaptability' of the present invention is reflected in the following:

sunlight measuring points and sunlight constraint conditions can be set at any given positions on each building unit to be arranged, according to task requirements, so the method suits any required calculation precision;

it can also handle various arrangement problems without restriction on latitude, boundary shape, terrain elevation changes, floor area ratio, building density, number of building units, building unit types, footprint shape, height, or existing surrounding buildings, and therefore suits a wide range of arrangement conditions.

Application: the technical scheme of the algorithm can support corresponding computer software. For a designer planning a high-density city or designing buildings, the software generates automatic arrangement schemes that satisfy the sunlight constraints, given a building site, a number of high-rise building units and a set of other design requirements. This frees the designer from inefficient, repetitive manual trial and error. The designer can select a suitable scheme from the automatically generated results as the basis for subsequent design development.

Specific technical scheme of the invention

The invention can be summarized as a cyclic solving process with three parts (FIG. 1): A, the 'environment'; B, the 'agent'; and C, the round-and-step interaction between them.

A, the 'environment': a mathematical model describing the building group arrangement problem.

Any automatic building group arrangement problem under sunlight constraints is described as a three-dimensional 'environment' mathematical model (hereinafter the 'environment'; the contents of the left dashed box in FIG. 1). The model is based on a three-dimensional coordinate system, for example with the X axis pointing east, the Y axis north and the Z axis up. Part B, the 'agent', uses it during solving. The model contains the following five objects:

1) Sunlight measuring points: a set of three-dimensional points {P1, P2, P3, ...} describing, as discrete points, the spatial regions where sunlight constraints must be considered. An example is the center points of the two windows on each building floor in FIG. 2, the numbered black filled circles.

2) Sunlight test directions: a set of n three-dimensional vectors describing, in discrete form, the continuous course of sunlight over a day. Each vector corresponds to a specific time and points from a sunlight measuring point toward the sun; the direction depends on the latitude of the city. For example, in FIG. 2 the 9 vectors pointing south from the center of window 0 on the ground floor of building E represent the sun vectors at each hour from 8:00 to 16:00, taking the latitude of Shanghai as an example.
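The test directions can be generated from standard solar geometry. The sketch below is a simplified version with these assumptions:
- times are local apparent solar time;
- the declination is fixed at about -23.44° (winter solstice);
- no equation-of-time or longitude correction is applied;
- the function name `sun_vectors` is hypothetical.

It returns unit (east, north, up) vectors pointing from a measuring point toward the sun, matching the X-east, Y-north, Z-up convention above.

```python
import math


def sun_vectors(lat_deg, hours, decl_deg=-23.44):
    """Unit (east, north, up) vectors toward the sun at the given apparent solar hours."""
    phi, dec = math.radians(lat_deg), math.radians(decl_deg)
    vecs = []
    for h in hours:
        hour_angle = math.radians(15.0 * (h - 12.0))  # negative in the morning
        east = -math.cos(dec) * math.sin(hour_angle)
        north = math.cos(phi) * math.sin(dec) - math.sin(phi) * math.cos(dec) * math.cos(hour_angle)
        up = math.sin(phi) * math.sin(dec) + math.cos(phi) * math.cos(dec) * math.cos(hour_angle)
        vecs.append((east, north, up))
    return vecs


# 9 directions, hourly from 8:00 to 16:00, at roughly Shanghai's latitude (31.2° N)
dirs = sun_vectors(31.2, range(8, 17))
```

At noon the vector points due south with a solar altitude of 90° - 31.2° - 23.44° = 35.36°. Morning vectors have a positive east component, and afternoon vectors a negative one.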

3) Sunlight constraint parameters: a set of numerical parameters and logical judgment rules associated with each sunlight measuring point. They mathematically describe the sunlight constraints of different cities and building types, and include:

the time interval between adjacent sunlight test directions: a numerical parameter giving the time interval represented by any two adjacent test directions. The smaller this value, the denser the discrete test directions, the higher the sunlight simulation accuracy of the algorithm, and the greater the computational cost;

the constraint satisfaction rules of a sunlight measuring point: a set of logical judgment rules that express whether the sunlight constraint is satisfied. The rules follow the constraints that apply, in the given city, to the type of building in which the measuring point is located. They include judgments on the earliest effective sunlight time, the latest effective sunlight time, the minimum cumulative effective sunlight duration, and the minimum continuous effective sunlight duration.
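The four satisfaction judgments can be illustrated with a small rule checker. This is a sketch under assumptions that the document does not specify:
- the earliest and latest effective times bound a window, and only lit samples inside it count;
- a run of consecutive lit samples lasts from its first to its last sample time;
- both duration minimums must hold.

The names `SunlightRule`, `lit_runs` and `satisfies` are hypothetical. Sample times are spaced by the interval between adjacent test directions.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class SunlightRule:
    earliest: float        # earliest effective sunlight time (h)
    latest: float          # latest effective sunlight time (h)
    min_total: float       # minimum cumulative effective sunlight (h)
    min_continuous: float  # minimum continuous effective sunlight (h)


def lit_runs(times: Sequence[float], lit: Sequence[bool], rule: SunlightRule) -> List[float]:
    """Durations of uninterrupted lit periods inside [earliest, latest].

    A run lasts from its first to its last lit sample time, so an isolated
    lit sample contributes zero hours (an assumed convention).
    """
    runs: List[float] = []
    start = prev = None
    for t, ok in zip(times, lit):
        if ok and rule.earliest <= t <= rule.latest:
            if start is None:
                start = t
            prev = t
        elif start is not None:
            runs.append(prev - start)
            start = None
    if start is not None:
        runs.append(prev - start)
    return runs


def satisfies(times, lit, rule: SunlightRule) -> bool:
    """Cumulative and continuous minimums must both hold."""
    runs = lit_runs(times, lit, rule)
    return sum(runs) >= rule.min_total and max(runs, default=0.0) >= rule.min_continuous
```

For hourly samples from 8:00 to 16:00 lit at 9-11 and 13-14, the runs are 2 h and 1 h. The point therefore passes a "2 h cumulative, 1 h continuous" rule but fails a "3 h continuous" rule.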

4) Buildings: geometric polyhedra carrying several attributes, describing the physical parts of buildings in the environment that can block sunlight (the five hatched rectangles A to E in FIG. 2). Their attributes include:

high-rise attribute: yes or no, assigned automatically from the Z-axis dimension of the polyhedron;

movable attribute: if yes, the building takes part in the automatic arrangement calculation;

sunlight measuring point list attribute: a numerical array giving the numbers of the associated sunlight measuring points;

minimum horizontal boundary: a closed curve in the XY plane describing the minimum boundary that must be respected so that buildings do not overlap one another when moved horizontally. In FIG. 2, this is the rectangular extent formed by each hatched building rectangle together with the area on its north side.

5) Block boundaries: 4 kinds of closed curves in the XY plane, describing respectively:
- the boundary between the block and the roads;
- the boundary of the area within the block where multi-story buildings may be placed;
- the boundary of the area within the block where high-rise buildings may be placed;
- the boundaries of areas within the block where nothing may be placed, such as central green spaces and water bodies.

In FIG. 2 these are the irregular outlines drawn, from outside to inside, as 1 solid line and 3 dashed lines.
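A minimal data model for objects 1), 2), 4) and 5) might look like the sketch below, with these simplifications:
- buildings are reduced to axis-aligned footprints, used as their minimum horizontal boundary;
- `HIGH_RISE_HEIGHT` is a hypothetical threshold, since the document does not give the rule;
- the outer block boundary and the constraint parameters of object 3) are omitted.

`try_move` illustrates the "execute a movement" function of the agent (step B10 in FIG. 1). It rejects moves that leave the allowed zone, enter a no-build zone, or overlap another building. All names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Point3 = Tuple[float, float, float]
Polygon = List[Tuple[float, float]]

HIGH_RISE_HEIGHT = 24.0  # hypothetical threshold (m) for the automatic high-rise flag


def inside(pt: Tuple[float, float], poly: Polygon) -> bool:
    """Even-odd ray-crossing test: is pt inside the simple polygon poly?"""
    x, y = pt
    result = False
    for (x1, y1), (x2, y2) in zip(poly, poly[1:] + poly[:1]):
        if (y1 > y) != (y2 > y) and x < x1 + (y - y1) * (x2 - x1) / (y2 - y1):
            result = not result
    return result


@dataclass
class Building:
    name: str
    box: Tuple[float, float, float, float]  # minimum horizontal boundary (xmin, ymin, xmax, ymax)
    height: float
    movable: bool
    point_ids: List[int] = field(default_factory=list)

    @property
    def high_rise(self) -> bool:            # derived from the Z-axis size
        return self.height > HIGH_RISE_HEIGHT


@dataclass
class Environment:
    points: Dict[int, Point3]               # 1) sunlight measuring points
    directions: List[Point3]                # 2) sunlight test directions
    buildings: List[Building]               # 4) buildings
    low_rise_zone: Polygon                  # 5) placeable area for multi-story buildings
    high_rise_zone: Polygon                 # 5) placeable area for high-rise buildings
    no_build_zones: List[Polygon] = field(default_factory=list)

    def try_move(self, i: int, dx: float, dy: float) -> bool:
        """Move building i by (dx, dy) if it stays in its zone and overlaps nothing."""
        b = self.buildings[i]
        x0, y0, x1, y1 = b.box[0] + dx, b.box[1] + dy, b.box[2] + dx, b.box[3] + dy
        corners = [(x0, y0), (x1, y0), (x1, y1), (x0, y1)]
        zone = self.high_rise_zone if b.high_rise else self.low_rise_zone
        # corner tests only: a simplification that ignores edges crossing zone boundaries
        if not all(inside(c, zone) for c in corners):
            return False
        if any(inside(c, z) for z in self.no_build_zones for c in corners):
            return False
        for j, o in enumerate(self.buildings):
            if j != i and x0 < o.box[2] and o.box[0] < x1 and y0 < o.box[3] and o.box[1] < y1:
                return False
        b.box = (x0, y0, x1, y1)
        for pid in b.point_ids:              # measuring points move with their building
            px, py, pz = self.points[pid]
            self.points[pid] = (px + dx, py + dy, pz)
        return True
```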

B, the 'agent': the control module that acquires single-step interaction experience in the 'environment'.

Control of any building to be arranged within a single step (steps B1-B11 in the central dashed box in FIG. 1) is achieved through interaction with the 'environment', i.e., by reading or modifying data in the model. It comprises four types of functions:
- computing global sunlight-constraint satisfaction (steps B1 and B11 in FIG. 1);
- selecting the current building to move (step B2 in FIG. 1);
- computing the 'environment' state observed with the current building at the center (steps B3-B9 and B11 in FIG. 1);
- executing the movement of the current building by a given vector in the 'environment' (step B10 in FIG. 1).

C, round-and-step interaction between the two: a deep reinforcement learning training and solving module based on their interaction experience.

This part follows the general round-and-step solving framework common in deep reinforcement learning (steps C1-C10 in the right dashed box C in FIG. 1, using the DDPG algorithm as an example [6]). An interaction process of u 'rounds', each with v 'steps', is constructed between the 'environment' and the 'agent'. Each 'round' starts by restoring the 'environment' to its initial state.

In any step t, the 'agent' accumulates in the 'environment' an interaction experience sample recording the behavior conditions, behavior decision and behavior effect:
- the 'environment' state tensors St and St+1 before and after the behavior is executed;
- the behavior control tensor At produced by the 'agent's' behavior decision function;
- the change Rt+1 - Rt in the 'environment's' specific gain before and after the behavior.

Once enough interaction experience samples have accumulated, the deep neural network in the 'agent's' behavior decision function can be trained at every step. In the interaction process, this appears as the 'agent's' behavior in the 'environment' evolving from aimless exploration toward learning to obtain ever higher gains. Thus, after a number of 'rounds' of 'steps', the problem is declared solved when the specific gain meets the solution requirement.
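DDPG stores such interaction samples in an experience replay buffer and samples minibatches from it for training. The sketch below shows a minimal buffer holding the (St, At, Rt+1 - Rt, St+1) records described above. It assumes a fixed capacity with first-in-first-out eviction and uniform sampling; the actor and critic network updates are omitted. The class and method names are hypothetical.

```python
import random
from collections import deque
from typing import List, NamedTuple, Sequence


class Transition(NamedTuple):
    state: Sequence[float]        # S_t, observed around the selected building
    action: Sequence[float]       # A_t, single-step movement vector
    gain: float                   # R_{t+1} - R_t, change in global satisfaction
    next_state: Sequence[float]   # S_{t+1}


class ReplayBuffer:
    def __init__(self, capacity: int, min_size: int, seed: int = 0) -> None:
        self.items: deque = deque(maxlen=capacity)   # oldest samples are discarded first
        self.min_size = min_size                     # samples required before training starts
        self.rng = random.Random(seed)

    def add(self, s, a, r_t: float, r_next: float, s_next) -> None:
        self.items.append(Transition(tuple(s), tuple(a), r_next - r_t, tuple(s_next)))

    def ready(self) -> bool:
        return len(self.items) >= self.min_size

    def sample(self, batch_size: int) -> List[Transition]:
        """Uniform random minibatch for one training update of the networks."""
        return self.rng.sample(list(self.items), batch_size)
```

Training would begin at a step only once `ready()` returns true, which corresponds to "when the number of interaction experience samples meets the requirement" above.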

The general principle of the deep reinforcement learning algorithm is combined with the specific 'environment' and 'agent' built from professional knowledge of building sunlight analysis:
- the 'environment' state tensor describes the shading conditions and potential movable distances of the surrounding buildings, observed with the selected building at the center;
- the 'agent's' behavior control tensor is the single-step movement vector of the selected building within its movable range;
- the 'environment's' specific gain change is the change in global sunlight-constraint satisfaction before and after the selected building moves.

This combination yields the round-and-step solving process of the algorithm for automatic building group arrangement under sunlight constraints.

A more detailed example algorithm is given below.

Example 1

Based on the idea of the technical scheme of the invention, an example algorithm using DDPG as the deep reinforcement learning method is given below. The scope of the invention is not limited to it.

The algorithm converts the parallel solving of multiple buildings in the automatic arrangement problem into a deep reinforcement learning process for the movement decision function of a single building. In essence, the learning process proceeds as follows:
- it starts from the initial arrangement of each 'round';
- over a series of 'steps', the 'agent' moves single buildings in the 'environment';
- in each step, it records the movement action, the 'environment' states before and after the movement, and the change in global sunlight-constraint satisfaction;
- these samples train the 'agent's' movement decision function, which converges toward higher global sunlight-constraint satisfaction.

A solution is reached after a number of 'rounds' and 'steps'. Specifically, the solving process consists of u 'rounds' with a nested loop of v 'steps' in each round. The steps below apply to any step t in any round e, where e ∈ [0, u - 1] and t ∈ [0, v - 1] (FIG. 1):

Step 1: (re)set the solving hyper-parameters

Set the solving hyper-parameters according to the available computing resources. Examples are the number of rounds u, the number of steps per round v, and the number of neurons, connection pattern and activation function types of the deep neural network architecture.

Step 2: check the termination condition of the round loop

If e = u, i.e., the current round has passed the last round set, and no solution has been obtained, this solving run terminates in failure. If the method is combined with automatic hyper-parameter search, it can return to Step 1, reset, and attempt a new run.

Step 3: start a new round e

Lay out the buildings in the 'environment' initially according to a set rule (such as randomization), and reset the step counter t to 0.

Step 4: check the termination condition of the step loop

If t = v, i.e., the current step has passed the last step set for the round, increment the round counter (e = e + 1) and return to Step 2.

Step 5: start a new interaction step t between the agent and the environment

Step 6: compute the global sunlight-constraint satisfaction percentage Rt

Rays are cast from every sunlight measuring point in the 'environment' along each of the n sunlight test directions. Checking whether each ray geometrically intersects any building in the 'environment' simulates the sunlight that each measuring point can actually receive. Then, according to the sunlight constraint parameters, the number of measuring points satisfying their sunlight constraints is divided by the total number of measuring points. This ratio is the global sunlight-constraint satisfaction percentage Rt at the beginning of step t.
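The ray test in Step 6 can be sketched with the standard slab method for ray/axis-aligned-box intersection. Two simplifications apply:
- buildings are reduced to axis-aligned boxes;
- each measuring point is assumed to sit slightly outside its own facade.

A hypothetical simplified rule replaces the full constraint parameters: a point passes if at least `required` of its test directions are unobstructed. The function names are illustrative only.

```python
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]
Box = Tuple[float, float, float, float, float, float]  # xmin, ymin, zmin, xmax, ymax, zmax


def ray_hits_box(origin: Vec3, direction: Vec3, box: Box, eps: float = 1e-9) -> bool:
    """Slab test: does origin + s * direction (s > eps) pass through the box?"""
    s_near, s_far = eps, float("inf")
    for axis in range(3):
        o, d = origin[axis], direction[axis]
        lo, hi = box[axis], box[axis + 3]
        if abs(d) < 1e-12:              # ray parallel to this pair of faces
            if o < lo or o > hi:
                return False
            continue
        s1, s2 = sorted(((lo - o) / d, (hi - o) / d))
        s_near, s_far = max(s_near, s1), min(s_far, s2)
        if s_near > s_far:
            return False
    return True


def global_satisfaction(points: Sequence[Vec3], directions: Sequence[Vec3],
                        boxes: Sequence[Box], required: Sequence[int]) -> float:
    """R_t: share of points whose count of unobstructed directions meets its requirement."""
    ok = 0
    for p, need in zip(points, required):
        lit = sum(not any(ray_hits_box(p, d, b) for b in boxes) for d in directions)
        ok += lit >= need
    return ok / len(points)
```

In the full scheme, the per-direction lit/unlit results would be fed into the satisfaction rules of each measuring point, not into a simple count.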

Step 7: select the current building to move

According to a given strategy (such as random selection), select one movable building b in the 'environment' as the building to move in this step. All subsequent calculations of the 'environment' state Sbt at the current step t are centered on it.

Step 8: construct the global sunlight resource directed graphs

Construct n shielding-relation directed graphs {G1, G2, ..., Gn} over all sunshine measuring points, one for each of the n sunshine test directions (FIG. 3 shows an example 'environment' in the Shanghai region, and FIG. 4 shows its directed graph G for the 13:00 sunshine test direction). Any graph Gn is constructed as follows. Traverse the relations between all sunshine measuring points and the surrounding buildings in the scene along the ground projection vector of the sunshine test direction corresponding to the graph (in FIG. 3, the ground projection of the 13:00 direction). Sunshine measuring points are the nodes of the graph and sunshine shielding relations are its edges; the positive direction of an edge runs from a shielded measuring point to the measuring points owned by the building that shields it (FIG. 4). When a measuring point node i forms its maximum shielding relation (the largest height-to-distance ratio) with a building on its sun-facing side, node i is made adjacent to every measuring point on that building (for example, node j), with the edge pointing from i to j. For example, node A:1 in FIG. 4 is shielded by both buildings B and C on its south side, but its height-to-distance ratio with building C is larger, so the nodes in its positive direction are C:0 and C:1 rather than B:0 and B:1. In the height-to-distance ratio, the distance Dij is the length of the ground line segment from the shielded node i to the building, owned by the adjacent node j, that shields it. The height difference Hij is the vertical distance from the shielded node i to the actual top tangent point of that building along the sun's three-dimensional ray in this direction (see FIG. 4).
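The edge rule of step 8 can be sketched in Python under simplifying assumptions that are not part of the source: building footprints are axis-aligned rectangles in a local frame whose +y axis is the ground projection of the current sunshine test direction, a building counts as a candidate shader when its footprint lies ahead of the point and spans the point's x coordinate, and the roof height stands in for the top tangent point. The names `Building`, `Point` and `build_graph` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Building:
    name: str
    x_min: float   # footprint extent across the sun's ground direction
    x_max: float
    y_min: float   # near face, measured along the sun's ground direction (+y)
    y_max: float
    height: float  # roof height, used here as the top tangent point
    points: list = field(default_factory=list)  # ids of its measuring points


@dataclass
class Point:
    pid: str
    building: str  # name of the building that owns this measuring point
    x: float
    y: float
    z: float


def build_graph(points, buildings):
    """Adjacency {point id: [point ids]} for one sunshine test direction."""
    graph = {p.pid: [] for p in points}
    for p in points:
        best, best_ratio = None, 0.0
        for b in buildings:
            if b.name == p.building:
                continue
            # Candidate shader: ahead of the point and crossed by its ground ray.
            if b.y_min <= p.y or not (b.x_min <= p.x <= b.x_max):
                continue
            d = b.y_min - p.y   # ground distance Dij
            h = b.height - p.z  # height difference Hij
            if h > 0 and h / d > best_ratio:
                best, best_ratio = b, h / d
        if best is not None:
            # Edge i -> j for every measuring point j on the maximum shader.
            graph[p.pid] = list(best.points)
    return graph


# Hypothetical scene mirroring the FIG. 4 description.
buildings = [
    Building("A", 0, 10, 0, 10, 20, ["A:1"]),
    Building("B", 0, 10, 30, 40, 20, ["B:0", "B:1"]),
    Building("C", 0, 10, 50, 60, 60, ["C:0", "C:1"]),
]
points = [
    Point("A:1", "A", 5, 10, 10),
    Point("B:0", "B", 5, 40, 5),
    Point("B:1", "B", 5, 40, 15),
    Point("C:0", "C", 5, 60, 10),
    Point("C:1", "C", 5, 60, 30),
]
graph = build_graph(points, buildings)
```

In this toy scene node A:1 has a ratio of 10/20 with B but 50/40 with C, so its edges point to C:0 and C:1, matching the FIG. 4 example.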

Step 9: Extracting the sun-facing-side sunshine measuring point nodes

In each directed graph Gn, take each of the c sunshine measuring point nodes {P1, P2, ..., Pc} owned by the current movable building as a starting point and, following the shielded direction (the positive direction of the graph, i.e., the direction of the arrows between nodes in FIG. 4), find all measuring point nodes on other buildings on the sun-facing side that shield them directly or indirectly.
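Step 9 amounts to a forward reachability search in Gn. A breadth-first search is one straightforward way to collect every node reached directly or indirectly; the sketch assumes the graph is stored as an adjacency dictionary with edges pointing from the shielded node to the shielding node, and the toy graph and the name `sun_facing_nodes` are hypothetical.

```python
from collections import deque


def sun_facing_nodes(graph, starts):
    """Nodes reachable from `starts` along edge direction (shaded -> shading), excluding starts."""
    seen = set(starts)
    found = []
    queue = deque(starts)
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                found.append(nxt)
                queue.append(nxt)
    return found


# Hypothetical graph for one test direction.
graph = {
    "A:0": ["B:0", "B:1"], "A:1": ["C:0", "C:1"],
    "B:0": ["C:0", "C:1"], "B:1": ["C:0", "C:1"],
    "C:0": [], "C:1": [],
}
upstream_of_a = sun_facing_nodes(graph, ["A:0", "A:1"])
```

Passing all c nodes of the current building as starts keeps that building's own nodes out of the result.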

Step 10: calculating the sun-facing side 'sunshine resource index'

For the nodes extracted from each directed graph Gn, calculate the ground distance Di and the shielding height difference Hi between the sunshine measuring point of each node and the building of its adjacent node on the sun-facing side, and then calculate the sun-facing-side 'sunshine resource index' (+solarIndex, +SI) of the current building on the current directed graph Gn according to formula 1. For the n sunshine test directions, n values of +SI are obtained in total.

Step 11: Extracting the sun-back-side sunshine measuring point nodes

In each directed graph Gn, take each of the c sunshine measuring point nodes {P1, P2, ..., Pc} owned by the current movable building as a starting point and, following the direction in which it shields other buildings (the reverse direction of the graph), find all sunshine measuring point nodes of other buildings on the sun-back side that are shielded directly or indirectly by the current building.
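Step 11 is the same kind of search run against the edge direction. One way to sketch it, assuming an adjacency dictionary whose edges point from the shielded node to the shielding node, is to reverse the edges and run a breadth-first search; `reverse_edges`, `sun_back_nodes` and the toy graph are hypothetical.

```python
from collections import deque


def reverse_edges(graph):
    """Flip every edge so that traversal runs from shading node to shaded node."""
    rev = {node: [] for node in graph}
    for src, dsts in graph.items():
        for dst in dsts:
            rev.setdefault(dst, []).append(src)
    return rev


def sun_back_nodes(graph, starts):
    """Nodes shaded directly or indirectly by `starts`, excluding starts."""
    rev = reverse_edges(graph)
    seen, found, queue = set(starts), [], deque(starts)
    while queue:
        node = queue.popleft()
        for prev in rev.get(node, []):
            if prev not in seen:
                seen.add(prev)
                found.append(prev)
                queue.append(prev)
    return found


# Hypothetical graph for one test direction.
graph = {
    "A:0": ["B:0", "B:1"], "A:1": ["C:0", "C:1"],
    "B:0": ["C:0", "C:1"], "B:1": ["C:0", "C:1"],
    "C:0": [], "C:1": [],
}
downstream_of_c = sun_back_nodes(graph, ["C:0", "C:1"])
```

Node A:0 is included even though C does not shade it directly: it is reached indirectly through B.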

Step 12: Calculating the sun-back-side 'sunshine resource index'

For the nodes extracted from each directed graph Gn, calculate the ground distance Di and the shielding height difference Hi between the sunshine measuring point of each node and the building of its adjacent node on the sun-facing side, and then calculate the sun-back-side 'sunshine resource index' (-solarIndex, -SI) of the current building on the current directed graph Gn according to formula 2. For the n sunshine test directions, n values of -SI are obtained in total.

Step 13: Calculating the movable distance in each direction

Starting from the centroid of the current building, take the ground projections of n + m directions, each in both the positive and the negative sense (n are the sunshine test directions, shown in FIG. 6 with n = 9 as positive and negative long arrows; m are additional directions, shown in FIG. 6 with m = 3 as positive and negative short arrows). Along each one, calculate the distance the building can actually move toward the corresponding street boundary: the distance from the centroid to the boundary minus the length of the segment occupied by the building's own minimum horizontal boundary. The street boundary used is selected according to the building's high-rise attribute. A total of 2n + 2m distance values D are obtained.
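The movable distance of step 13 can be sketched with two-dimensional ray casting, assuming the street boundary and the building's minimum horizontal boundary are simple polygons given as vertex lists, the centroid lies inside both, and the caller has already picked the boundary polygon that matches the building's high-rise attribute. `ray_exit_distance` and `movable_distances` are hypothetical names.

```python
import math


def ray_exit_distance(origin, direction, polygon):
    """Distance from `origin` along unit `direction` to the nearest polygon edge (inf if none)."""
    ox, oy = origin
    dx, dy = direction
    best = math.inf
    for k in range(len(polygon)):
        (ax, ay), (bx, by) = polygon[k], polygon[(k + 1) % len(polygon)]
        ex, ey = bx - ax, by - ay
        denom = dx * ey - dy * ex          # cross(direction, edge)
        if abs(denom) < 1e-12:
            continue                       # ray parallel to this edge
        wx, wy = ax - ox, ay - oy
        t = (wx * ey - wy * ex) / denom    # position along the ray
        s = (wx * dy - wy * dx) / denom    # position along the edge, 0..1
        if t >= 0 and 0 <= s <= 1:
            best = min(best, t)
    return best


def movable_distances(centroid, footprint, boundary, directions):
    """2 * len(directions) values: + then - sense of each ground direction."""
    result = []
    for dx, dy in directions:
        norm = math.hypot(dx, dy)
        for sign in (1.0, -1.0):
            u = (sign * dx / norm, sign * dy / norm)
            to_boundary = ray_exit_distance(centroid, u, boundary)
            inside_building = ray_exit_distance(centroid, u, footprint)
            result.append(max(0.0, to_boundary - inside_building))
    return result


street = [(0, 0), (100, 0), (100, 100), (0, 100)]
footprint = [(40, 40), (60, 40), (60, 60), (40, 60)]
distances = movable_distances((50, 50), footprint, street, [(1, 0), (0, 1)])
```

With n + m ground directions passed in, the function returns the 2n + 2m values D; here two directions give four values of 40 (50 to the boundary minus 10 inside the footprint).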

Step 14: Synthesizing the 'environment' state

All of the above +SI, -SI and D values form an array of 4n + 2m elements; that is, the global environment state Sbt observed with the current building as the center is described by a fixed number of parameters.
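The fixed length of the state is easy to check in code: n values of +SI, n values of -SI and 2n + 2m distances add up to 4n + 2m. The concatenation order below (+SI, then -SI, then D) and the name `assemble_state` are assumptions; the source fixes only the element count.

```python
def assemble_state(si_plus, si_minus, distances, n, m):
    """Concatenate +SI, -SI and D into the fixed-length state Sbt (4n + 2m values)."""
    if len(si_plus) != n or len(si_minus) != n:
        raise ValueError("expected n values each for +SI and -SI")
    if len(distances) != 2 * n + 2 * m:
        raise ValueError("expected 2n + 2m movable distances")
    return list(si_plus) + list(si_minus) + list(distances)


n, m = 9, 3  # the direction counts drawn in FIG. 6
state = assemble_state([0.1] * n, [0.2] * n, [5.0] * (2 * n + 2 * m), n, m)
```

Rejecting inputs of the wrong length guards the property that makes trained networks transferable between tasks with the same n and m.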

Step 15: Generating the movement vector Abt with the decision function

Input Sbt into the movement decision function Fae, whose internal parameters are adjustable, to calculate the current estimate of the movement vector Abt (equation 3). As the internal parameters of the movement decision function evolve from a random distribution toward convergence during learning (i.e., the training of the deep neural network), the calculated vectors gradually shift the movement of the current single building from aimless attempts to purposeful exploration, yielding a better global sunshine constraint satisfaction percentage.
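A stand-in for Fae makes the shape of equation 3 concrete. The source specifies neither the network architecture nor the dimension of Abt; the sketch assumes one tanh hidden layer, a two-dimensional horizontal movement vector and a hypothetical maximum step length `max_step`, with parameters drawn from a seeded Gaussian to mirror the random initial distribution described above. It covers only the forward pass, not training, and `MovePolicy` is a hypothetical name.

```python
import math
import random


class MovePolicy:
    """Hypothetical stand-in for Fae: state Sbt -> bounded movement vector Abt."""

    def __init__(self, state_dim, hidden=16, action_dim=2, max_step=5.0, seed=0):
        rng = random.Random(seed)  # random initial parameters
        self.w1 = [[rng.gauss(0.0, 0.1) for _ in range(state_dim)] for _ in range(hidden)]
        self.b1 = [0.0] * hidden
        self.w2 = [[rng.gauss(0.0, 0.1) for _ in range(hidden)] for _ in range(action_dim)]
        self.b2 = [0.0] * action_dim
        self.max_step = max_step

    def __call__(self, state):
        hidden = [math.tanh(sum(w * s for w, s in zip(row, state)) + b)
                  for row, b in zip(self.w1, self.b1)]
        # tanh bounds each component to (-1, 1); scale to the allowed step length.
        return [self.max_step * math.tanh(sum(w * h for w, h in zip(row, hidden)) + b)
                for row, b in zip(self.w2, self.b2)]


policy = MovePolicy(state_dim=42)  # 4n + 2m with n = 9, m = 3
action = policy([0.5] * 42)
```

The output is deterministic for a given state, as in DDPG; exploration noise would be added outside the function.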

Step 16: Moving the current building in the 'environment' by the given vector

Move the current building b in the 'environment' according to the current estimate of the vector Abt.

Step 17: Calculating the post-movement 'environment' state Sbt+1 and the sunshine constraint satisfaction percentage Rt+1

Following the same procedure as steps 7 to 14, observe the surrounding buildings again with the just-moved building b as the center and calculate the global state description Sbt+1 at its new position (equivalent to the global state observed when building b enters step t+1); at the same time, calculate the new percentage Rt+1 of all sunshine measuring points that satisfy the sunshine constraint.
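Computing Rt+1 (as in step 5) can be sketched with a slab test for ray and box intersection plus a counting rule. Assumptions not in the source: buildings are axis-aligned boxes, a point's own building is excluded from its ray tests, and an interval between adjacent test directions counts as sunlit only when both of its bounding directions are unobstructed. The real judgment comes from the sunshine constraint condition parameters and may differ; all function names are hypothetical.

```python
import math


def ray_hits_box(origin, direction, box_min, box_max):
    """Slab test: does origin + t * direction (t >= 0) meet the axis-aligned box?"""
    t_near, t_far = 0.0, math.inf
    for o, d, lo, hi in zip(origin, direction, box_min, box_max):
        if abs(d) < 1e-12:
            if o < lo or o > hi:   # parallel to this slab and outside it
                return False
            continue
        t1, t2 = (lo - o) / d, (hi - o) / d
        if t1 > t2:
            t1, t2 = t2, t1
        t_near, t_far = max(t_near, t1), min(t_far, t2)
        if t_near > t_far:
            return False
    return True


def sunlit_flags(point, owner, directions, boxes):
    """One flag per test direction: True when no other building blocks the ray."""
    return [
        not any(ray_hits_box(point, d, lo, hi)
                for name, (lo, hi) in boxes.items() if name != owner)
        for d in directions
    ]


def sunshine_minutes(flags, interval_min):
    # Assumed rule: an interval is sunlit when both bounding directions are unobstructed.
    return interval_min * sum(1 for a, b in zip(flags, flags[1:]) if a and b)


def satisfaction_percentage(flags_per_point, interval_min, required_min):
    """Share of measuring points whose accumulated sunshine meets the requirement."""
    if not flags_per_point:
        return 0.0
    ok = sum(1 for f in flags_per_point if sunshine_minutes(f, interval_min) >= required_min)
    return ok / len(flags_per_point)


boxes = {"B": ((-1.0, 5.0, 0.0), (1.0, 6.0, 3.0))}
flags = sunlit_flags((0.0, 0.0, 0.0), "A", [(0.0, 1.0, 1.0), (0.0, 1.0, 0.5)], boxes)
```

The steep ray passes over the 3 m box while the shallow one hits it, so the flags come out as sunlit, then shaded.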

Step 18: Storing the interaction experience

Store the tuple (Sbt, current estimate of Abt, Sbt+1, Rt+1 - Rt) in a list M of fixed capacity (at the start of solving the list is empty, and data are appended continuously).
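A fixed-capacity list M can be sketched with `collections.deque`. Beyond triggering training, the source does not say what happens once M is full; the sketch assumes the common replay-buffer behavior of discarding the oldest entry, and `Experience` and `ExperienceList` are hypothetical names.

```python
import random
from collections import deque
from typing import List, NamedTuple


class Experience(NamedTuple):
    state: List[float]       # Sbt
    action: List[float]      # current estimate of Abt
    next_state: List[float]  # Sbt+1
    gain: float              # Rt+1 - Rt


class ExperienceList:
    """Hypothetical list M; when full, the oldest entry is dropped (an assumption)."""

    def __init__(self, capacity):
        self.items = deque(maxlen=capacity)

    def store(self, state, action, next_state, r_next, r_now):
        self.items.append(Experience(state, action, next_state, r_next - r_now))

    def is_full(self):
        return len(self.items) == self.items.maxlen

    def sample(self, batch_size, rng=random):
        return rng.sample(list(self.items), batch_size)


M = ExperienceList(capacity=3)
for k in range(4):
    M.store([k], [0.0], [k + 1], k + 1, k)
```

Once `is_full()` returns True, each step can draw a minibatch with `sample` for the training of step 19.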

Step 19: Training on interaction experience to optimize the decision functions

If the list M is full, train the movement decision function.

In this embodiment, the settings follow the DDPG algorithm: the control tensor of a specific behavior of every 'agent' (here, the movement control vector of a single building) is computed by a set of deep neural networks with adjustable internal parameters (movement decision functions) Fae, Fat, Fce and Fct. The functions Fae and Fat calculate the behavior control tensor A from the state tensor S of the 'environment' (equations 3 and 4). The functions Fce and Fct calculate the value of the behavior (i.e., the movement) after execution from the tensors S and A (equations 5 and 6). Fae and Fat share the same deep neural network structure, as do Fce and Fct, and the algorithm periodically synchronizes their coefficients (from Fae to Fat and from Fce to Fct). Step t and step t+1 are related by equation 7, where GAMMA is a parameter between 0 and 1 that determines how the return of the current step (Rt+1 - Rt) is weighted in the decision against the forward value: the closer GAMMA is to 1, the more far-sighted the learning. The deep neural networks can therefore be trained with FCEloss as the loss function of Fce (equation 8) and FAEloss as the loss function of Fae (equation 9), finally yielding the 4 decision functions in their converged states.

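$$A_{b,t}^{\mathrm{est}} = F_{ae}(S_{b,t}) \tag{3}$$

$$A_{b,t+1}^{\mathrm{tgt}} = F_{at}(S_{b,t+1}) \tag{4}$$

$$Q_t^{\mathrm{est}} = F_{ce}(S_{b,t}, A_{b,t}^{\mathrm{est}}) \tag{5}$$

$$Q_{t+1}^{\mathrm{tgt}} = F_{ct}(S_{b,t+1}, A_{b,t+1}^{\mathrm{tgt}}) \tag{6}$$

$$Q_t^{\mathrm{tgt}} = (R_{t+1} - R_t) + \gamma \, Q_{t+1}^{\mathrm{tgt}} \tag{7}$$

$$\mathrm{FCE}_{\mathrm{loss}} = \left| Q_t^{\mathrm{tgt}} - Q_t^{\mathrm{est}} \right| \tag{8}$$

$$\mathrm{FAE}_{\mathrm{loss}} = -Q_t^{\mathrm{est}} \tag{9}$$

Here "est" marks a current estimate, "tgt" a long-term (forward) target, and γ is GAMMA.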
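Equations 7 to 9 reduce to simple arithmetic once the network outputs are known. The sketch takes the Q values as plain numbers. `sync` with `tau=1.0` performs the periodic hard copy from Fae to Fat or from Fce to Fct; a `tau` below 1 gives the soft update often used in DDPG implementations, which the source does not mention. All function names are hypothetical.

```python
def critic_target(gain, q_next_target, gamma):
    """Equation 7: long-term target from the step gain Rt+1 - Rt and the forward target."""
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("GAMMA must lie between 0 and 1")
    return gain + gamma * q_next_target


def critic_loss(q_target, q_estimate):
    """Equation 8 as written: absolute difference (the DDPG paper uses squared error)."""
    return abs(q_target - q_estimate)


def actor_loss(q_estimate):
    """Equation 9: minimizing -Qt pushes Fae toward actions the critic rates highly."""
    return -q_estimate


def sync(online, target, tau=1.0):
    """Copy coefficients online -> target; tau=1.0 is a hard copy, tau < 1 a soft update."""
    return [tau * o + (1.0 - tau) * t for o, t in zip(online, target)]


q_target = critic_target(gain=0.5, q_next_target=2.0, gamma=0.75)  # 0.5 + 0.75 * 2.0
```

Raising `gamma` toward 1 increases the weight of `q_next_target`, which is the "far-sighted" behavior described for GAMMA.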

Step 20: Judging from Rt+1 whether solving has succeeded

Judge whether the current Rt+1 meets the problem's requirement on the sunshine constraint. If it does, record the current arrangement in the 'environment' as a successful solution and terminate the step-by-step regression solving process successfully (if multiple solutions satisfying the sunshine constraint are required, the algorithm does not terminate after recording the arrangement and continues with step 21).

Step 21: Proceeding to the next step

Increment the step counter t by 1 (t = t + 1) and return to step 4.
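The round and step control flow of steps 2 to 21 can be sketched as a nested loop. The environment interface (`reset`, `satisfaction`, `pick_building`, `observe`, `move`, `snapshot`) and the one-dimensional `ToyEnv` are hypothetical stand-ins; a real environment would perform the sunshine ray tests and graph construction described in the preceding steps.

```python
def solve(env, policy, u, v, required, want=1, on_experience=None):
    """Nested u-round / v-step loop of steps 2-21; returns recorded successful arrangements."""
    solutions = []
    for e in range(u):                        # step 2: stop once e reaches u
        env.reset()                           # step 3: initial layout, step counter reset
        for t in range(v):                    # step 4: stop once t reaches v
            r_t = env.satisfaction()          # step 5: Rt
            b = env.pick_building()           # step 7: current building
            s_t = env.observe(b)              # steps 8-14: Sbt
            a_t = policy(s_t)                 # step 15: Abt
            env.move(b, a_t)                  # step 16
            s_next = env.observe(b)           # step 17: Sbt+1
            r_next = env.satisfaction()       # step 17: Rt+1
            if on_experience is not None:     # steps 18-19: store and train
                on_experience(s_t, a_t, s_next, r_next - r_t)
            if r_next >= required:            # step 20: record a solution
                solutions.append(env.snapshot())
                if len(solutions) >= want:
                    return solutions
    return solutions                          # empty list means solving failed


class ToyEnv:
    """Hypothetical 1-D stand-in: satisfaction rises as the single building moves right."""

    def reset(self):
        self.x = 0.0

    def satisfaction(self):
        return min(1.0, self.x / 10.0)

    def pick_building(self):
        return 0

    def observe(self, b):
        return [self.x]

    def move(self, b, action):
        self.x += action[0]

    def snapshot(self):
        return {"x": self.x}


found = solve(ToyEnv(), lambda state: [1.0], u=3, v=20, required=1.0)
```

The `want` parameter models the note in step 20: with `want > 1` the loop keeps going after the first recorded arrangement.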

Key points of the invention

1) The 'environment' mathematical model describing the arrangement problem can flexibly express various arrangement problems.

In part A of the algorithm, the 'environment', a three-dimensional mathematical model consisting of 5 objects is adopted; through different combined settings of its parameters, it can flexibly express the many building group arrangement problems under various sunshine constraints encountered in actual design.

2) The default multi-building parallel description of the arrangement problem is replaced by a single-building serial description, which reduces the difficulty of solving.

In part C of the algorithm, the step-by-step regression interaction between the 'environment' and the 'agent' converts the commonly assumed one-shot, parallel positioning of multiple buildings in an arrangement task into a randomly serialized, step-by-step regression problem of single-building movement decisions. As the analysis of related domestic and international research above shows, existing methods try to directly generate an arrangement result containing the positions of all buildings to be arranged within a site. The problem is hard because of the nonlinear spatial distribution of sunshine conditions, the need to check actual overlap between buildings with different footprint outlines, and the complex constraints common in actual designs. By building a step-by-step regression solving process in which a single building is the unit of movement decision, the invention turns this into a single-step movement decision problem: any single building adopts a suitable movement tensor according to the state of its surroundings so as to improve the global sunshine condition gradually. This greatly reduces the difficulty of the problem.

3) The 'sunshine resource index' describes the overall global sunshine shielding situation in any given sunshine test direction with 2 values.

In steps 8 to 12 of the algorithm, for the changing surrounding buildings on the sun-facing and sun-back sides in each sunshine test direction, a directed graph of sunshine measuring points is used to accumulate a sun-facing-side and a sun-back-side sunshine resource index over the nodes related to the current building. This lets the observation of the environment around the current building describe the overall global sunshine shielding in 1 direction with a fixed number (2) of values, and hence the overall global sunshine shielding in n sunshine test directions with a fixed 2n values. (Usually, when the surrounding buildings in a sunshine test direction vary in number and position, either a fixed number of values is used, which can describe only the local surroundings of the current building and the shielding in a fixed number of discrete directions, not the global situation; or a variable number of values is used to describe the shielding caused by the changing surrounding buildings, in which case the global situation cannot be described with values of fixed length.) Because this information is highly correlated with how the sunshine impact on surrounding buildings changes after the current building moves, it plays a key role in learning the 'movement decision function' in various deep reinforcement learning algorithms, and it is a domain-specific descriptive innovation from the building field that is independent of any specific deep reinforcement learning algorithm.
Moreover, because the index is a statistic derived from height-to-distance ratios, it carries height information about the environment, so it can treat buildings of different heights differently and can handle arrangement problems with differences in terrain elevation.

4) A fixed-length array is used to describe the global environment state.

In steps 7 to 14 of the algorithm, whatever the arrangement problem, an array of fixed length (4n + 2m elements: 2n describing the 'sunshine resource index' and 2n + 2m describing the movable distance range) describes in discrete form the global environment observed with the current building as the center. This lays the foundation for transferring trained 'decision function' networks among similar arrangement tasks. (If the length of the environment state description changed from one arrangement task to another, a successfully trained neural network could not be shared between decision functions for similar problems with different surroundings.)

5) The trade-off between computational precision and computation time is adjustable.

For a given arrangement task, the solving precision and the corresponding computation time can be changed by adjusting the number of sunshine measuring points and the number n of sunshine test directions in the 'environment', and the number u of rounds and the number v of steps in the deep reinforcement learning framework, meeting the different precision and speed requirements of each design stage.

6) A solution acceleration mechanism for similar tasks.

The 'movement decision function' of the invention uses a deep neural network structure. Owing to the 'learning transferability' generally shared by trained networks, the step-by-step regression exploration of single buildings, and the fixed-length global state description, when two arrangement tasks are similar (the same sunshine test directions and the same sunshine constraint condition parameters), the deep neural network learned in the first task can be copied directly into the deep neural network of the second task, significantly reducing the computation time for solving the second task.

Advantages and beneficial effects of the invention

1) No reliance on samples.

The invention follows an idea entirely different from the prior art, which relies on accumulating high-quality samples and training on them to generate successful arrangements of multiple buildings in one shot. Its solving process can be described as follows: a neural-network-controlled building arrangement agent, given only the rules and condition parameters that humans pass to it, performs round-by-round, step-by-step trials in a three-dimensional mathematical 'environment' model, learns better strategies for moving a single building from its own experience, and ultimately completes the arrangement task. The invention requires no manually provided samples at all and is not limited by manual sample accumulation, missing sample types and the like. In theory, the present invention can handle arrangement problems no less complex than those humans handle.

2) Better adaptability.

In the prior art, the unit of learning is a whole street block and the generated result is a complete block, whereas arrangement problems in actual design must take into account numerous existing and approved buildings inside and outside the site as well as varied terrain elevations inside and outside it, which the prior art cannot naturally handle. The invention describes various arrangement problems through the 'environment' mathematical model, and its training process learns a single movement decision function for a single building with customizable sunshine measuring points, so it is entirely free of restrictions on the type, form and number of buildings to be arranged (these affect only the computation time). For example, for cities at different latitudes with different local regulations, such as high-rise residential arrangements in Shanghai and Changchun, the respective requirements of '1 hour of sunshine between 9:00 and 15:00 on the winter solstice' and '2 hours of sunshine between 8:00 and 16:00' can be fully accommodated by setting the sunshine constraint condition parameters in the 'environment'. The various site shapes, numbers of buildings and building types encountered in actual design can be accommodated by setting objects such as buildings, their attributes and street boundaries in the 'environment'. If window positions in the future building plans are not yet determined at arrangement time, sunshine measuring points can be distributed uniformly over the sun-facing side, and once the plans are determined they can be placed directly at the window positions. Given objects existing inside and outside a site can be handled by setting them as buildings in the 'environment' with corresponding attributes (whether they can shade other buildings, and whether their own sunshine is affected by other buildings and must be guaranteed).

Meanwhile, the prior art is limited by the descriptive precision of its training samples, and the precision of its generated results cannot exceed it. For a given arrangement task, the invention can change the solving precision and the corresponding computation time by adjusting the number of sunshine measuring points and the number n of sunshine test directions in the 'environment', and the number u of rounds and the number v of steps in the deep reinforcement learning framework, meeting the different precision and speed requirements of each design stage. For example, the time interval between adjacent sunshine test directions in the 'environment' controls the computational precision (the smaller the interval, the longer the computation but the higher the precision): a larger interval can be used early in design for several rounds of rough, rapid arrangement, and a smaller interval later to obtain a more accurate arrangement with a longer computation.

3) More relaxed conditions for reusing training results.

The prior art is limited by the generative adversarial network mechanism it adopts: the type of the training samples must closely match the type of solution generated for the arrangement problem, otherwise useless solutions are produced, so the conditions for reusing training results are strict. For example, training results for arrangements on a small site cannot be applied to arrangement tasks on a large site, and training results for an L-shaped site cannot be applied to a trapezoidal site. Because the invention moves single buildings one by one and describes the global environment state with a fixed length, training results can be reused whenever two arrangement tasks have the same sunshine constraint condition parameters, are in the same city and use the same computational precision (the same n sunshine test directions). The reuse conditions are therefore looser, which helps exploit the 'learning transferability' of trained neural networks and greatly reduces the time the second task needs to produce its first solution.

The algorithm can be implemented on top of various deep reinforcement learning algorithms (including but not limited to Actor-Critic, A3C, Deep Deterministic Policy Gradient (DDPG), Deterministic Policy Gradient (DPG), Proximal Policy Optimization (PPO), Distributed Proximal Policy Optimization (DPPO), Policy Gradient and Normalized Advantage Functions); if the continuously distributed movement vectors generated by the agent are replaced by a number of discrete fixed movement vectors, a series of earlier algorithms such as Deep Q-Network (DQN), Dual DQN, Double DQN and Normalized display DQN can also be used. The training convergence speed and success rate of the neural network naturally differ depending on which deep reinforcement learning algorithm is chosen, which affects the efficiency of solving the arrangement. Since the solving framework of the algorithm (the steps inside the dotted rectangle in part C on the right of FIG. 1) follows the general principles of deep reinforcement learning frameworks and is not the key point or protected point claimed by the invention, using other deep reinforcement learning algorithms to train the 'movement decision function' of a single building should be regarded as a continuation of the invention.

Claims

1. A sunshine-constraint-oriented automatic building group arrangement method, characterized in that an 'environment' mathematical model capable of flexibly describing various building group arrangement problems is designed according to professional knowledge in the field of building sunshine, and an 'agent' control module capable of acquiring single-step interaction experience in the 'environment' is designed, so that a step-by-step regression deep reinforcement learning training and solving module based on the interaction experience between the 'agent' and the 'environment' can be constructed from an existing deep reinforcement learning algorithm; in each single step, the 'agent' obtains the state of the 'environment', inputs it into the movement decision function that drives the movement of a single building, collects the change in global sunshine constraint satisfaction before and after the move together with the new 'environment' state, and continuously optimizes the movement decision function by recording and training on these interaction experiences, so that global sunshine constraint satisfaction is maximized in subsequent rounds and steps until the sunshine arrangement problem is solved;

the 'environment' mathematical model for describing the building group arrangement problem is as follows: any sunshine-constraint-oriented automatic building group arrangement problem is described as a three-dimensional 'environment' mathematical model based on a three-dimensional coordinate system, which is used by the subsequent 'agent' control module during solving; it contains the following 5 objects:

1) sunshine measuring points: a set of three-dimensional points {P₁, P₂, P₃, …} describing, as discrete three-dimensional points, the spatial regions where the sunshine constraint must be considered;

2) sunshine test directions: a set of n three-dimensional vectors describing in discrete form the continuous course of sunshine during a day; each vector corresponds to a specific moment and points from a sunshine measuring point toward the sun;

3) sunshine constraint condition parameters: a group of numerical parameters and logical judgment rules associated with each sunshine measuring point, mathematically describing the different sunshine constraint conditions of each city and each building type;

4) buildings: geometric polyhedra carrying multiple attribute data items, describing the building entities in the 'environment' that can block sunshine;

5) street boundaries: describing the boundary of the street block, the boundary of the area within the block where multi-storey buildings may be arranged, the boundary of the area within the block where high-rise buildings may be arranged, and the boundaries of areas within the block where no arrangement is allowed.

2. The sunshine-constraint-oriented building group automatic arrangement method as claimed in claim 1, wherein the mathematical description comprises:

corresponding time intervals between adjacent sunshine test directions: a numerical parameter describing the interval of time represented by any two adjacent sunshine test directions;

the sunshine constraint satisfaction criterion of each sunshine measuring point: a set of logical judgment rules that describe, as rules, whether the sunshine constraint is met, according to the sunshine constraint conditions that apply to the city and building type where the sunshine measuring point is located.

3. The sunshine-constraint-oriented building group automatic arrangement method as claimed in claim 1, wherein the building attributes comprise: a high-rise attribute, a movable attribute, a sunshine measuring point list attribute and a minimum horizontal boundary.

4. The sunshine-constraint-oriented building group automatic arrangement method of claim 1, wherein the 'agent' control module controls any building to be arranged in a single step through interaction with the 'environment' mathematical model, comprising: calculating the degree of satisfaction of the global sunshine constraint conditions, selecting the current building to be moved, calculating the environment state observed with the current building as the center, and executing the movement of the current building under a given vector in the 'environment' mathematical model.

5. The sunshine-constraint-oriented building group automatic arrangement method as claimed in claim 1, wherein the step-by-step regression solving comprises: constructing, between the 'environment' mathematical model and the 'agent' control module, an interaction process of u rounds with v steps in each round;

at the start of each round the 'environment' is restored to its initial state, and in any step t the 'agent' control module accumulates, in the 'environment' mathematical model, interaction experience samples concerning specific behavior conditions, behavior decisions and behavior effects; once the number of interaction experience samples meets the requirement, they are used in each step to train the deep neural network in the behavior decision function of the 'agent' control module; after a number of rounds and steps, solving is complete when the specific gain meets the solving requirement.

6. The sunshine-constraint-oriented building group automatic arrangement method as claimed in claim 5, wherein the interaction experience samples comprise:

the 'environment' state tensors St and St+1 before and after the execution of a specific behavior;

the control tensor At of the specific behavior generated by the behavior decision function of the 'agent' control module;

the change Rt+1 - Rt in the specific gain of the 'environment' before and after the 'agent' control module executes the behavior.

7. The sunshine-constraint-oriented building group automatic arrangement method as claimed in claim 6, wherein DDPG is adopted as an embodiment algorithm of deep reinforcement learning,

the solving process contains u 'rounds', each with a nested loop of v 'steps', i.e., for any step t in any round e:

Step 1: Setting the solving hyper-parameters

the solving hyper-parameters are set according to the available computing resources: the number u of rounds in the calculation process, the number v of steps in each round, and the number of neurons, connection pattern and activation function types of the deep neural network architecture;

Step 2: Determining the end condition of the round loop

if e = u, i.e., the current round has passed the last set round, and no usable solution has appeared, solving terminates in failure;

Step 3: Starting a new round e

the buildings in the 'environment' are given an initial layout according to a set rule, and the step counter t is reset;

Step 4: Determining the end condition of the step loop

if t = v, i.e., the step has passed the last set step of the round, the round counter e is increased by 1 and the process returns to step 2;

Step 5: Starting a new interaction step t between the 'agent' and the 'environment'

starting from each sunshine measuring point in the 'environment', a ray is cast along each of the n sunshine test directions and tested for geometric intersection with the buildings in the 'environment', so that the sunshine actually received by every sunshine measuring point can be simulated in detail; then, according to the sunshine constraint condition parameters, the ratio of the number of sunshine measuring points satisfying the sunshine constraint to the number of all sunshine measuring points is counted, i.e., the global sunshine constraint satisfaction percentage Rt at the beginning of step t is calculated;

Step 7: Selecting the current building to be moved

a movable building b, selected in the 'environment' according to a given strategy, is taken as the current building to be moved in this step; all subsequent calculations of the 'environment' state Sbt in the current step t are centered on it;

Step 8: Constructing the global sunshine resource directed graph

n shielding-relation directed graphs {G1, G2, ..., Gn} are constructed over all sunshine measuring points, one for each of the n sunshine test directions, wherein any graph Gn is constructed as follows: the relations between all sunshine measuring points and the surrounding buildings in the scene are traversed along the ground projection vector of the sunshine test direction corresponding to the graph; sunshine measuring points are taken as the nodes of the graph and sunshine shielding relations as its edges, the positive direction running from a shielded sunshine measuring point to the sunshine measuring points owned by the building that shields it; when a measuring point node i in the graph forms the maximum shielding relation, i.e., the largest height-to-distance ratio, with a building on its sun-facing side, node i and each measuring point node j on that building form an adjacency pointing from node i to node j; in the height-to-distance ratio, the distance Dij is the length of the ground line segment from the shielded node i to the building, owned by the adjacent node j, that shields it, and the height difference Hij is the vertical distance from the shielded node i to the actual top tangent point of that building along the sun's three-dimensional ray in this direction;

Step 9: Extracting the sun-facing-side sunshine measuring point nodes

in each directed graph Gn, each of the c sunshine measuring point nodes {P1, P2, ..., Pc} owned by the current movable building is taken as a starting point, and all sunshine measuring point nodes on other buildings located on the sun-facing side that shield them directly or indirectly are found along the shielded direction;

Step 10: Calculating the sun-facing-side 'sunshine resource index'

for the nodes extracted from each directed graph Gn, the ground distance Di and the shielding height difference Hi between the sunshine measuring point of each node and the building of its adjacent node on the sun-facing side are calculated, and the sun-facing-side 'sunshine resource index' +SI of the current building on the current directed graph Gn is then calculated according to formula 1; for the n sunshine test directions, n sunshine resource indexes +SI are obtained in total;

step 11: extracting sun-back-side sunlight measuring point nodes

In each directed graph Gn, take each of the c sunlight measuring point nodes {P1, P2, ..., Pc} owned by the current movable building as a starting point and, walking the edges in reverse (the direction in which this building shades others), find all sunlight measuring point nodes of other buildings on the sun-back side that are directly or indirectly shaded by the current movable building;
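The sun-back-side extraction is a reverse traversal: many shaded nodes can point to the same node, so a breadth-first search over reversed edges fits. A minimal sketch, assuming the same hypothetical `{i: (j, D, H)}` graph plus an `owner` mapping from node to building; both are illustrative, not the patent's data structures.

```python
from collections import deque


def sun_back_side_nodes(graph, start_nodes, owner):
    """Collect nodes of other buildings shaded directly or indirectly by the
    building that owns start_nodes, walking shading edges in reverse (BFS)."""
    reverse = {}
    for i, (j, _d, _h) in graph.items():
        reverse.setdefault(j, []).append(i)  # j shades i
    current = {owner[s] for s in start_nodes}
    seen = set(start_nodes)
    queue = deque(start_nodes)
    found = []
    while queue:
        node = queue.popleft()
        for shaded in reverse.get(node, []):
            if shaded in seen:
                continue
            seen.add(shaded)
            queue.append(shaded)
            if owner[shaded] not in current:
                found.append(shaded)
    return found
```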

step 12: calculating the sun-back-side "sunlight resource index"

For the nodes extracted from each directed graph Gn, calculate the ground distance Di and the shading height difference Hi between each node's sunlight measuring point and the building of its adjacent node on the sunny side; then calculate the sun-back-side "sunlight resource index" -SI of the current building for the current directed graph Gn according to formula 2. Over the n sunlight test directions, this yields n values of -SI;

step 13: calculating the movable distance in each direction

Starting from the centroid of the current building, calculate the actual movable distance, i.e., the distance from the centroid to the boundary of the corresponding street block, along the ground projections of n + m directions (the n sunlight test directions plus m further directions), in both the positive and the negative sense of each; this yields 2n + 2m distance values D in total;
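The 2n + 2m distances can be sketched with ray casting against the block boundary. This assumes, hypothetically, that the street block is a simple polygon given as a vertex list, that the m extra directions are supplied as vectors, and that the building's own footprint is not subtracted from the distance; the function names are illustrative.

```python
import math


def ray_to_boundary(origin, direction, polygon):
    """Distance from origin along direction to the first polygon edge hit."""
    ox, oy = origin
    dx, dy = direction
    best = math.inf
    n = len(polygon)
    for k in range(n):
        (x1, y1), (x2, y2) = polygon[k], polygon[(k + 1) % n]
        ex, ey = x2 - x1, y2 - y1
        denom = dx * ey - dy * ex  # cross(direction, edge)
        if abs(denom) < 1e-12:
            continue  # ray parallel to this edge
        # Solve origin + t * direction = p1 + s * edge for t and s.
        wx, wy = x1 - ox, y1 - oy
        t = (wx * ey - wy * ex) / denom
        s = (wx * dy - wy * dx) / denom
        if t > 0 and 0 <= s <= 1:
            best = min(best, t)
    return best


def movement_distances(centroid, sun_dirs, extra_dirs, polygon):
    """Return 2n + 2m distances: +/- along each of the n sun-test ground
    projections and each of the m extra (hypothetical) directions."""
    out = []
    for ux, uy in list(sun_dirs) + list(extra_dirs):
        norm = math.hypot(ux, uy)
        ux, uy = ux / norm, uy / norm
        out.append(ray_to_boundary(centroid, (ux, uy), polygon))
        out.append(ray_to_boundary(centroid, (-ux, -uy), polygon))
    return out
```

For a 100 x 100 square block with the centroid at (30, 40), one sun direction (0, 1) and one extra direction (1, 0), this gives distances 60, 40, 70 and 30.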

step 14: synthesizing the "environment" state

All the +SI, -SI and D values form an array of n + n + (2n + 2m) = 4n + 2m elements, i.e., the global environment state Sbt observed with the current building at the center, described by a fixed number of parameters;
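A minimal sketch of assembling Sbt, assuming the concatenation order +SI, -SI, D (the text fixes the element count but not the order, so the order here is an assumption). Only the fixed length 4n + 2m matters for the decision function's input layer.

```python
def synthesize_state(si_plus, si_minus, distances):
    """Concatenate n +SI values, n -SI values and 2n + 2m distances into
    a state vector of length 4n + 2m (order is an assumption)."""
    n = len(si_plus)
    if len(si_minus) != n:
        raise ValueError("need one +SI and one -SI per sunlight test direction")
    if len(distances) < 2 * n or (len(distances) - 2 * n) % 2:
        raise ValueError("expected 2n + 2m distances")
    return list(si_plus) + list(si_minus) + list(distances)
```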

step 15: generating the action vector Abt with the decision function

Input Sbt into the movement decision function Fae, whose internal parameters are adjustable, to calculate the current estimate of the movement vector Abt (formula 3). As the internal parameters of the decision function progress from a random distribution toward convergence during learning, the computed vector shifts the movement of the current single building from aimless trials toward purposeful exploration, raising the global percentage of measuring points that satisfy the sunlight constraint;
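As an illustration of a decision function with adjustable internal parameters, the sketch below is a small two-layer network mapping the (4n + 2m)-element state to a movement vector. The layer sizes, the 2-D ground-displacement action and the `max_step` bound are all assumptions; the class name is hypothetical. Its randomly initialized weights correspond to the "purposeless attempts" stage before training.

```python
import numpy as np


class MovementDecisionFunction:
    """Hypothetical stand-in for Fae: state (4n + 2m,) -> movement vector."""

    def __init__(self, state_dim, hidden=32, action_dim=2, max_step=10.0, seed=0):
        rng = np.random.default_rng(seed)
        # Randomly initialized internal parameters, adjusted later by training.
        self.w1 = rng.normal(0.0, 0.1, (hidden, state_dim))
        self.b1 = np.zeros(hidden)
        self.w2 = rng.normal(0.0, 0.1, (action_dim, hidden))
        self.b2 = np.zeros(action_dim)
        self.max_step = max_step

    def __call__(self, state):
        h = np.tanh(self.w1 @ np.asarray(state, dtype=float) + self.b1)
        # tanh bounds each component of the displacement to [-max_step, max_step].
        return self.max_step * np.tanh(self.w2 @ h + self.b2)
```

DDPG implementations usually add exploration noise to this output during training; the text does not say whether that is done here.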

Following the same procedure as steps 7 to 14, observe the surrounding buildings again with the just-moved building b at the center and calculate the global state description Sbt+1 at the new position; at the same time, calculate the new percentage Rt+1 of all sunlight measuring points that satisfy the sunlight constraint;
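The percentage Rt+1 can be sketched as a simple count. This assumes, hypothetically, that the sunlight duration of each measuring point and its required duration are already available as dictionaries keyed by point id; how the durations are computed is outside this sketch.

```python
def constraint_satisfaction(sun_hours, required_hours):
    """Percentage of measuring points whose sunlight duration meets the
    (hypothetical) per-point requirement."""
    if not sun_hours:
        raise ValueError("no sunlight measuring points")
    ok = sum(1 for pid, hours in sun_hours.items() if hours >= required_hours[pid])
    return 100.0 * ok / len(sun_hours)
```

The change Rt+1 - Rt between two successive calls is the quantity stored in the next step.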

step 18: storing interactive experiences

Store the data Sbt, the current estimate of Abt, Sbt+1 and the change Rt+1 - Rt into a fixed-capacity list M;
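A fixed-capacity list M can be sketched with `collections.deque(maxlen=...)`. Dropping the oldest experience once the list is full is an assumption (the text only says the capacity is fixed); `is_full` mirrors the training trigger of step 19. The class name is hypothetical.

```python
import random
from collections import deque


class ExperienceList:
    """Fixed-capacity list M of (S_t, A_t, S_t+1, R_t+1 - R_t) tuples."""

    def __init__(self, capacity):
        # With maxlen set, appending to a full deque discards the oldest item.
        self.items = deque(maxlen=capacity)

    def store(self, s, a, s_next, r_delta):
        self.items.append((s, a, s_next, r_delta))

    def is_full(self):
        return len(self.items) == self.items.maxlen

    def sample(self, batch_size, rng=random):
        return rng.sample(list(self.items), batch_size)
```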

step 19: training on the interaction experience to optimize the decision functions

If the list M is full, train the movement decision function on the stored experience;

step 20: judging from Rt+1 whether the solution has succeeded

Judge whether the current Rt+1 meets the problem's sunlight-constraint requirement. If it does, record the current arrangement in the environment as a successful solution and successfully terminate the step-wise regression solving process; if several solutions satisfying the sunlight constraint are needed, do not stop after recording the arrangement but continue with step 21. If the requirement is not yet met, continue with step 21;

step 21: entering the next step

Add 1 to the step counter (t = t + 1) and return to step 4.
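The control flow of steps 20 and 21 can be sketched as a loop. Here `step_fn(t)` is a hypothetical callback standing in for steps 4 to 19 and returning (Rt+1, arrangement); the threshold `target` and the safety cap `max_steps` are assumptions, since the text defines no step limit.

```python
def solve(step_fn, target, max_steps, want_multiple=False):
    """Run the step-wise loop and return the recorded successful arrangements."""
    solutions = []
    t = 0
    while t < max_steps:
        r_next, arrangement = step_fn(t)  # steps 4-19 for step t
        if r_next >= target:              # step 20
            solutions.append(arrangement)
            if not want_multiple:
                break                     # successful termination
        t += 1                            # step 21: back to step 4
    return solutions
```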

8. The automatic building group arrangement method for sunlight constraints according to claim 7, wherein, in step 19, where the interaction experience is used to train and optimize the decision function, the decision functions are set up according to the DDPG algorithm:

All control tensors for a particular behavior of the "agent" are computed by the movement decision functions Fae, Fat, Fce and Fct, each a deep neural network with a set of adjustable internal parameters. Fae and Fat compute the behavior's control tensor A from the "environment" state tensor S (formulas 3 and 4); Fce and Fct compute the value of the behavior after execution from the tensors S and A (formulas 5 and 6). Fat has the same deep neural network structure as Fae, and Fct the same as Fce; the algorithm periodically synchronizes coefficients from Fae to Fat and from Fce to Fct. Step t and step t+1 are related by formula 7, in which γ (GAMMA) is the discount factor. The deep neural networks are trained with FCE_loss (formula 8) as the loss function of Fce and FAE_loss (formula 9) as the loss function of Fae, finally yielding the converged states of the 4 decision functions;

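$$A^{b}_{t,\,\text{current estimate}} = F_{ae}\left(S^{b}_{t}\right) \qquad \text{(formula 3)}$$

$$A^{b}_{t+1,\,\text{forward target}} = F_{at}\left(S^{b}_{t+1}\right) \qquad \text{(formula 4)}$$

$$Q_{t,\,\text{current estimate}} = F_{ce}\left(S^{b}_{t},\, A^{b}_{t,\,\text{current estimate}}\right) \qquad \text{(formula 5)}$$

$$Q_{t+1,\,\text{forward target}} = F_{ct}\left(S^{b}_{t+1},\, A^{b}_{t+1,\,\text{forward target}}\right) \qquad \text{(formula 6)}$$

$$Q_{t,\,\text{target}} = R_{t+1} - R_{t} + \gamma\, Q_{t+1,\,\text{forward target}} \qquad \text{(formula 7)}$$

$$FCE_{\text{loss}} = \left| Q_{t,\,\text{target}} - Q_{t,\,\text{current estimate}} \right| \qquad \text{(formula 8)}$$

$$FAE_{\text{loss}} = -Q_{t,\,\text{current estimate}} \qquad \text{(formula 9)}$$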
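To make formulas 3 to 9 concrete, the sketch below runs one DDPG-style update with linear stand-ins for the four networks, so that gradients can be written by hand. Several things are assumptions rather than the patent's method: linear models instead of deep networks, per-sample gradient descent, learning rate 0.01, γ = 0.9, and a hard copy for the periodic Fae to Fat and Fce to Fct synchronization. The critic loss keeps the absolute value of formula 8 (standard DDPG usually uses a squared error), and the target Q of formula 7 is treated as a constant.

```python
import numpy as np

GAMMA = 0.9  # discount factor; the value is an assumption


class LinearActor:
    """Stand-in for Fae / Fat: A = W S (linear instead of a deep network)."""

    def __init__(self, s_dim, a_dim, rng):
        self.W = rng.normal(0.0, 0.1, (a_dim, s_dim))

    def __call__(self, s):
        return self.W @ s


class LinearCritic:
    """Stand-in for Fce / Fct: Q = ws . S + wa . A + b."""

    def __init__(self, s_dim, a_dim, rng):
        self.ws = rng.normal(0.0, 0.1, s_dim)
        self.wa = rng.normal(0.0, 0.1, a_dim)
        self.b = 0.0

    def __call__(self, s, a):
        return float(self.ws @ s + self.wa @ a + self.b)


def sync(src, dst):
    """Periodic synchronization: copy coefficients Fae -> Fat or Fce -> Fct."""
    for name, value in vars(src).items():
        setattr(dst, name, value.copy() if isinstance(value, np.ndarray) else value)


def ddpg_update(batch, Fae, Fat, Fce, Fct, lr=0.01, gamma=GAMMA):
    """Per-sample gradient steps over items (S_t, A_t, S_t+1, R_t+1 - R_t)."""
    ce_losses, ae_losses = [], []
    for s, a, s_next, r_delta in batch:
        a_next = Fat(s_next)                     # formula 4
        q_next = Fct(s_next, a_next)             # formula 6
        q_target = r_delta + gamma * q_next      # formula 7 (held constant)
        q_cur = Fce(s, a)                        # formula 5, stored estimate of A_t
        ce_losses.append(abs(q_target - q_cur))  # formula 8
        # Gradient of |q_target - q| with respect to q is -sign(q_target - q).
        g = -np.sign(q_target - q_cur)
        Fce.ws -= lr * g * s
        Fce.wa -= lr * g * a
        Fce.b -= lr * g
        a_cur = Fae(s)                           # formula 3
        ae_losses.append(-Fce(s, a_cur))         # formula 9
        # d(-Q)/dW = -outer(wa, s) since dQ/dA = wa and A = W s; descend it.
        Fae.W += lr * np.outer(Fce.wa, s)
    return float(np.mean(ce_losses)), float(np.mean(ae_losses))
```

In a full implementation, `ddpg_update` would be called on batches sampled from the list M whenever it is full, with `sync` invoked every few updates.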