CN113609548A

CN113609548A - Bridge spanning method, device, equipment and readable storage medium

Info

Publication number: CN113609548A
Application number: CN202110757197.3A
Authority: CN
Inventors: 徐升桥; 李辉; 鲍跃全; 韩广晖; 李寰宇; 彭岚平; 宋浩; 罗天靖; 简方梁; 杨喜文
Original assignee: China Railway Engineering Consulting Group Co Ltd
Current assignee: China Railway Engineering Consulting Group Co Ltd
Priority date: 2021-07-05
Filing date: 2021-07-05
Publication date: 2021-11-05
Anticipated expiration: 2041-07-05
Also published as: CN113609548B

Abstract

The invention provides a bridge span distribution method, device, equipment and readable storage medium, relates to the technical field of bridge construction, and introduces a new approach of distributing spans by means of artificial intelligence. To meet the need for intelligent span distribution, the method builds on the basic principles and calculation rules of bridge span distribution and combines them with the learning and decision-making capability of reinforcement learning: a strategy value network is optimized and output by the Sarsa(λ) algorithm and then added to the AlphaGo algorithm. In the simulation stage, a node can obtain the accumulated reward of a global simulation rooted at that node through a rapid span distribution strategy, and can also directly look up its maximum accumulated reward in the strategy value network. The larger of the two is taken as the node's actual accumulated reward after simulation, which improves the final result of the AlphaGo algorithm.

Description

Bridge spanning method, device, equipment and readable storage medium

Technical Field

The invention relates to the technical field of bridge construction, in particular to a bridge spanning method, a bridge spanning device, bridge spanning equipment and a readable storage medium.

Background

Bridges play an important role in transportation. Building a bridge effectively solves subgrade settlement problems, saves land resources and guarantees smooth travel, and the arrangement of bridge spans is a key problem in scheme design. Span layout currently requires a highly experienced designer and a large amount of time, and the result is not necessarily optimal, so a bridge span layout algorithm is needed to replace manual design.

Disclosure of Invention

The invention aims to provide a bridge spanning method, device, equipment and readable storage medium that address the above problems. To achieve this purpose, the invention adopts the following technical scheme:

In a first aspect, the present application provides a bridge spanning method, including: acquiring line topographic drawing information; analyzing the line topographic drawing information to obtain first information, wherein the first information comprises pier elevation information, line mileage information, bridge starting coordinate information, bridge ending coordinate information and crossing area position information; and establishing, based on the first information, a bridge span distribution mathematical model based on the Sarsa algorithm and AlphaGo, and solving the bridge span distribution mathematical model to obtain second information, wherein the second information comprises the pier coordinates and pier mileage of the bridge.

Further, establishing the bridge span distribution mathematical model based on the Sarsa algorithm and AlphaGo from the first information, and solving the bridge span distribution mathematical model to obtain second information, includes: acquiring third information, wherein the third information comprises an action set, first value network information, second value network information and size information of the bridges in the action set; establishing a strategy value mathematical model based on the Sarsa(λ) algorithm, taking the first information and the third information as input information of the strategy value mathematical model, and solving the strategy value mathematical model to obtain a strategy value network, wherein the strategy value network comprises a span distribution scheme and accumulated reward information, and the span distribution scheme comprises the pier coordinates and pier mileage of the bridge; and establishing a span analysis mathematical model based on the AlphaGo algorithm, taking the strategy value network information, the first information and the third information as input information of the span analysis mathematical model, and solving the span analysis mathematical model to obtain fourth information.

Further, establishing the strategy value mathematical model based on the Sarsa(λ) algorithm, taking the first information and the third information as input information of the strategy value mathematical model, and solving the strategy value mathematical model to obtain the strategy value network includes: acquiring a preset cycle number; calculating a state number m from the line mileage information and a preset state length; constructing a first cycle body: establishing a Sarsa(λ) mathematical model based on a simulated annealing algorithm, setting the learning rate of the Sarsa(λ) mathematical model to a first preset value, the discount rate to a second preset value, the updating step frequency to a third preset value, the initialization temperature to a first preset value and the annealing parameter to exponential decay, and solving the Sarsa(λ) mathematical model to obtain a single span distribution strategy value result; repeatedly executing the first cycle body for the preset cycle number to obtain fourth information, wherein the fourth information comprises the span distribution scheme results of all the preset cycles; and determining the strategy value network from the fourth information, wherein the strategy value network is the single span distribution scheme result with the highest accumulated reward in the fourth information.

Further, establishing the span analysis mathematical model based on the AlphaGo algorithm, taking the strategy value network information, the first information and the third information as input information of the span analysis mathematical model, and solving the span analysis mathematical model to obtain the fourth information includes: establishing a root node of an AlphaGo mathematical model according to the first information, wherein the root node is the bridge starting coordinate information; constructing a third cycle body: establishing a node expansion mathematical model based on the strategy value network, the action set and the AlphaGo mathematical model, and solving the node expansion mathematical model to obtain the child node of the optimal span distribution mode under the root node; and repeatedly executing the third cycle body until the node expansion mathematical model reaches the end of the bridge, thereby obtaining the fourth information.

In a second aspect, the present application further provides a bridge spanning device, including: a first acquisition unit, used for acquiring line topographic drawing information; a first conversion unit, used for analyzing the line topographic drawing information to obtain first information, wherein the first information comprises pier elevation information, line mileage information, bridge starting coordinate information, bridge ending coordinate information and crossing area position information; and a bridge span distribution unit, used for establishing, based on the first information, a bridge span distribution mathematical model based on the Sarsa algorithm and AlphaGo, and solving the bridge span distribution mathematical model to obtain second information, wherein the second information comprises the pier coordinates and pier mileage of the bridge.

Further, the bridge span distribution unit comprises: a second acquisition unit, used for acquiring third information, wherein the third information comprises an action set, first value network information, second value network information and size information of the bridges in the action set; a strategy value unit, used for establishing a strategy value mathematical model based on the Sarsa(λ) algorithm, taking the first information and the third information as input information of the strategy value mathematical model, and solving the strategy value mathematical model to obtain a strategy value network, wherein the strategy value network comprises a span distribution scheme and accumulated reward information, and the span distribution scheme comprises the pier coordinates and pier mileage of the bridge; and a span analysis unit, used for establishing a span analysis mathematical model based on the AlphaGo algorithm, taking the strategy value network information, the first information and the third information as input information of the span analysis mathematical model, and solving the span analysis mathematical model to obtain fourth information.

Further, the strategy value unit includes: a third acquisition unit, used for acquiring a preset cycle number; a first sub-calculation unit, used for calculating a state number m from the line mileage information and a preset state length; a first circulation unit, used for constructing a first cycle body: establishing a Sarsa(λ) mathematical model based on a simulated annealing algorithm, setting the learning rate of the Sarsa(λ) mathematical model to a first preset value, the discount rate to a second preset value, the updating step frequency to a third preset value, the initialization temperature to a first preset value and the annealing parameter to exponential decay, and solving the Sarsa(λ) mathematical model to obtain a single span distribution strategy value result; a first cycle judging unit, used for repeatedly executing the first cycle body for the preset cycle number to obtain fourth information, wherein the fourth information comprises the span distribution scheme results of all the preset cycles; and a first analysis unit, used for determining the strategy value network from the fourth information, wherein the strategy value network is the single span distribution scheme result with the highest accumulated reward in the fourth information.

Further, the span analysis unit includes: a first establishing unit, used for establishing a root node of the AlphaGo mathematical model according to the first information, wherein the root node is the bridge starting coordinate information; a third circulation unit, used for constructing a third cycle body, in which a node expansion mathematical model is established based on the strategy value network, the action set and the AlphaGo mathematical model, and the node expansion mathematical model is solved to obtain the child node of the optimal span distribution mode under the root node; and a third cycle judging unit, used for repeatedly executing the third cycle body until the node expansion mathematical model reaches the end of the bridge, thereby obtaining fourth information.

In a third aspect, the present application further provides a bridge spanning device, including:

a memory for storing a computer program;

a processor for implementing the steps of the bridge spanning method when executing the computer program.

In a fourth aspect, the present application further provides a readable storage medium on which a computer program is stored, where the computer program, when executed by a processor, implements the steps of the bridge spanning method.

The invention has the beneficial effects that:

In the simulation stage of the AlphaGo algorithm, a node can obtain the accumulated reward of a global simulation rooted at that node through a rapid span distribution strategy, and can in addition directly look up its maximum accumulated reward in the strategy value network. The larger of the two is taken as the node's actual accumulated reward once the simulation is finished, which improves the final result of the AlphaGo algorithm.
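The rule described above can be illustrated by a minimal Python sketch. The function name `simulated_reward`, the dictionary layout of the strategy value network and the use of a state index as its key are assumptions made for this illustration; the source does not specify a data structure.

```python
def simulated_reward(rollout_reward, strategy_value_rewards, state_index):
    """Return the reward credited to a node after its simulation.

    rollout_reward: accumulated reward of the quick rollout rooted at the node.
    strategy_value_rewards: hypothetical mapping from state index to the
        maximum accumulated reward stored in the strategy value network.
    """
    # A state missing from the table contributes nothing to the comparison.
    stored_reward = strategy_value_rewards.get(state_index, float("-inf"))
    # The larger of the two values is used as the node's actual reward.
    return max(rollout_reward, stored_reward)


# Example: the stored value wins for state 10, the rollout wins for state 11.
table = {10: 7.5}
print(simulated_reward(5.0, table, 10), simulated_reward(5.0, table, 11))
```

Taking the maximum means a poor random rollout cannot hide a good scheme that the Sarsa(λ) stage has already found for the same state.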

Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by the practice of the embodiments of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the embodiments will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present invention and therefore should not be considered as limiting the scope, and for those skilled in the art, other related drawings can be obtained according to the drawings without inventive efforts.

FIG. 1 is a schematic flow chart of a bridge spanning method according to an embodiment of the present invention;

FIG. 2 shows the action set described in an embodiment of the invention;

FIG. 3 is a table of span distribution priority levels in the first value network according to an embodiment of the present invention;

FIG. 4 is a table of span distribution priority levels in the second value network according to an embodiment of the present invention;

FIG. 5 is a schematic structural view of a bridge spanning apparatus according to an embodiment of the present invention;

FIG. 6 is a schematic structural diagram of a first circulation unit according to an embodiment of the present invention;

FIG. 7 is a schematic structural diagram of a span analysis unit according to an embodiment of the present invention;

FIG. 8 is a schematic structural diagram of a bridge spanning device according to an embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. The components of embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present invention, presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures. Meanwhile, in the description of the present invention, the terms "first", "second", and the like are used only for distinguishing the description, and are not to be construed as indicating or implying relative importance.

Example 1:

This embodiment provides a bridge spanning method.

Referring to FIG. 1, the method includes step S100, step S200 and step S300.

S100, obtaining topographic drawing information of the line;

It can be understood that, in this step, CAD drawing material is obtained, comprising a bridge site plan, a line longitudinal section and a line plan. Those skilled in the art may also obtain the line terrain information from a building information model, which includes a line model, a terrain model and the like.

S200, analyzing to obtain first information according to the topographic drawing information of the route, wherein the first information comprises pier elevation information, route mileage information, bridge starting coordinate information, bridge ending coordinate information and crossing area position information;

It can be understood that, in this step, the line topographic drawing information or the building information model is processed; specifically, the drawings or the building information model are analyzed using CAD secondary development technology. The pier elevation information, line mileage information, bridge starting coordinate information, bridge ending coordinate information and crossing area position information are obtained from the line longitudinal section and the line plan. Specifically, a continuous line segment discretization method is adopted: one point is taken at every small interval on the drawing, and the position coordinates of all the points are obtained and then converted into mileage through a formula. The three-dimensional coordinates of the line can be obtained by combining the bridge site plan and the line longitudinal section, so that any position on the line can be determined and the bridge starting coordinate information and bridge ending coordinate information are obtained. The pier elevation information is obtained from the difference between the rail surface elevation line and the ground elevation line. The red-line information of a crossing area marks the region that must be crossed, into which piers must not intrude. It should be noted that CAD secondary development technology is prior art and is not described in detail in this application.
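The discretization and elevation difference described above can be sketched in Python as follows. The source does not give the mileage conversion formula, so accumulating straight-line distances between sampled points is an assumption, and the function names `cumulative_mileage` and `pier_height` are hypothetical.

```python
import math


def cumulative_mileage(points, start_mileage=0.0):
    """Convert sampled plan coordinates into mileage along the line.

    Assumption: mileage is approximated by summing the chord lengths
    between consecutive sampled points.
    """
    mileages = [start_mileage]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        mileages.append(mileages[-1] + math.hypot(x1 - x0, y1 - y0))
    return mileages


def pier_height(rail_elevation, ground_elevation):
    """Pier height as rail surface elevation minus ground elevation."""
    return rail_elevation - ground_elevation


# Three sampled points on the plan, in metres.
print(cumulative_mileage([(0.0, 0.0), (3.0, 4.0), (3.0, 10.0)]))
print(pier_height(120.5, 100.0))
```

The shorter the sampling interval, the closer the summed chord length comes to the true length of a curved line.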

S300, establishing, based on the first information, a bridge span distribution mathematical model based on the Sarsa algorithm and AlphaGo, and solving the bridge span distribution mathematical model to obtain second information, wherein the second information comprises the pier coordinates and pier mileage of the bridge.

It can be understood that the bridge span distribution mathematical model in this step uses the Sarsa algorithm to optimize the AlphaGo algorithm. This improves the accuracy of span distribution, addressing the poor stability of results caused by the single action strategy of the AlphaGo algorithm.

Specifically, this step includes steps S310, S320 and S330.

S310, acquiring third information, wherein the third information comprises an action set, first value network information, second value network information and size information of the bridges in the action set;

The action set in this step is a choice among 7 bridge spans. See FIG. 2 for the 7 bridge types available to the agent.

The agent is an artificial intelligence algorithm. The first value network information and the second value network information define the span distribution reward functions of the agent.

The first value network information is used by the Sarsa(λ) algorithm in this embodiment, and the second value network is used by the AlphaGo algorithm.

The first value network information is specifically as follows:

For the span distribution priority table, see FIG. 3. The value function includes the following cases:

(1) When the piers of a simply supported beam are arranged in an area meeting all span distribution conditions:

R₁＝a(1100-(0.4h+p))

In the formula: p is the bridge base price; h is the height of the pier arranged at the current position; a is a reduction factor, applied so that training on the Q table converges faster.

(2) When the piers of a continuous beam are arranged in an area meeting all span distribution conditions:

R₂＝a(1100-0.4(h₁+h₂+h₃)-p)*w

In the formula: p is the bridge base price; h₁, h₂ and h₃ are the heights of the piers arranged by the current span action; w is the span weight reduction coefficient of the continuous beam. For the five continuous beams in the action set, w takes the values 0.5, 0.4, 0.3, 0.2 and 0.1 respectively, which satisfies both the requirement to prefer simply supported beams and the principle of preferring small-span continuous beams.

(3) When the piers are arranged in an area that meets the crossing requirements but has poor geological conditions (such as a relatively steep gradient):

R₃＝0.1

The purpose here is to tell the agent that a pier may be placed, but the reward is small: the agent should take a better option if one exists, and may place the pier here if no other position is available.

(4) When the current span position does not meet the span distribution conditions (for example, a pier falls on an existing road or underground pipeline):

R₄＝-100

This tells the agent that a pier must never be placed here; otherwise a large negative reward is obtained.

(5) When a continuous beam is used in a section where it is unnecessary (for example, there is no existing structure to be crossed, so a simply supported beam should be selected):

R₅＝-50

This tells the agent that the simply supported beam has priority over the continuous beam in span selection, which reduces unnecessary engineering quantity.

(6) When a secondary simply supported beam is not located at the start or end of the whole line or adjacent to a continuous beam:

R₆＝-10

This tells the agent that secondary simply supported beams cannot be placed at arbitrary positions, which keeps the span distribution regular.
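The six reward cases above can be collected into a short Python sketch. The formulas and constants are taken from the text; the function names, the dictionary keys and the numerical values of a and p used in the example are assumptions, since the source does not state them.

```python
# Fixed rewards for cases (3) to (6); the key names are hypothetical labels.
FIXED_REWARDS = {
    "poor_geology": 0.1,            # case (3): allowed but discouraged
    "violates_crossing": -100.0,    # case (4): pier in a forbidden position
    "unneeded_continuous": -50.0,   # case (5): continuous beam not required
    "misplaced_secondary": -10.0,   # case (6): secondary beam out of place
}


def simple_beam_reward(h, p, a):
    """Case (1): R1 = a * (1100 - (0.4*h + p))."""
    return a * (1100 - (0.4 * h + p))


def continuous_beam_reward(h1, h2, h3, p, a, w):
    """Case (2): R2 = a * (1100 - 0.4*(h1 + h2 + h3) - p) * w."""
    return a * (1100 - 0.4 * (h1 + h2 + h3) - p) * w


# Illustrative inputs only: a = 1.0 and p = 100.0 are assumed values.
print(simple_beam_reward(h=10.0, p=100.0, a=1.0))
print(continuous_beam_reward(10.0, 10.0, 10.0, p=100.0, a=1.0, w=0.5))
```

With equal pier heights and base price, the weight w keeps the continuous beam reward below the simply supported beam reward, which is how the table expresses the stated priority.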

The second value network is as follows:

The AlphaGo algorithm uses Monte Carlo tree search as its framework, so the bridge span distribution design algorithm also uses Monte Carlo tree search as its basic framework. The essence of bridge span arrangement is to find suitable pier positions on a specified route, so each node in the Monte Carlo tree search can be regarded as a pier that stores span-related information, and the tree search can be regarded approximately as the process of finding the optimal piers in a specified route environment. A node in the Monte Carlo tree search is in effect an information storage package. In addition to the win count and visit count that must be recorded, this embodiment adds coordinate information, mileage information, action information, reward information and cost information to each node. The coordinate information records the real-time pier position, and the mileage records the pier position relative to the crossing areas so as to judge whether the algorithm has finished. The action information and reward information show directly how each action affects the overall result, and can be referenced directly in calculations once limiting conditions are added. The cost information records the cost generated by each action, so that it can be referenced directly after the calculation is finished.

For bridge span distribution, the advantage of intelligent span distribution is that it is concise and efficient, but a traditional Monte Carlo tree search takes a long time and cannot output a complete span distribution scheme when the time runs out. The Monte Carlo tree search process is therefore improved with these two points in mind, following the design idea of AlphaGo in Go: after the root node completes selection, expansion, simulation and backpropagation and finds the optimal child node, that child node is set as the new root node, and the cycle repeats until the search is finished.

Because Monte Carlo tree search maps intuitively onto span distribution, with each child node regarded as a pier that stores related information, only the cost-optimality function needs to be expressed explicitly in the value network; the remaining span distribution rules can be implemented by directly restricting the positions of child nodes. The value network should be designed following the order of the span distribution priority table shown in FIG. 4.
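The root promotion loop described above can be sketched as follows. This is a minimal outline only: the callback `search_best_child` stands in for one full round of selection, expansion, simulation and backpropagation, and both it and `plan_spans` are hypothetical names introduced for this illustration.

```python
def plan_spans(start_mileage, end_mileage, search_best_child):
    """Advance the root pier by pier until the end of the bridge is reached.

    search_best_child: hypothetical callable that runs the tree search from
        the current root and returns the mileage of the best child pier.
    """
    root = start_mileage
    piers = [root]
    while root < end_mileage:
        best_child = search_best_child(root)
        if best_child <= root:
            # Guard against a search step that fails to move forward.
            raise ValueError("search did not advance along the line")
        root = best_child  # the best child becomes the new root
        piers.append(root)
    return piers


def always_32m(root):
    """Placeholder search that always chooses a 32 m span."""
    return root + 32.0


print(plan_spans(0.0, 96.0, always_32m))
```

Because each round commits to one child, a complete pier sequence is available as soon as the loop ends, instead of only a partially explored tree.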

(1) When arranging the simply supported beams in the non-spanned region, the cost function may be set as follows:

in the formula: α is an adjustment coefficient that makes it easier for the algorithm to find the lowest-cost choice; ds is the bridge length; q₂ is the bridge unit price; q₁ is the bridge benchmark price; d₂ is the unit price of a simply supported beam pier; d₁ is the pier reference price.

(2) When arranging continuous beams in non-spanned regions, the cost function may be set as follows:

in the formula, D₂₋₁, D₂₋₂ and D₂₋₃ are the unit prices of the three piers of the continuous beam, which differ according to bridge form and bridge height.

(3) When the bridge does not meet the crossing conditions, no additional value function is set; instead, the restriction is applied directly in the child node expansion stage of the Monte Carlo tree search. There are seven combinations of bridge lengths, as shown in the table of FIG. 2:

When a node is expanded, seven actions are available under it. The seven actions are first simulated; any child node that does not meet the crossing conditions is deleted directly, and the simulation process is carried out only on the remaining child nodes. This directly prevents a pier from intruding into a crossing area and violating the crossing conditions, and deleting the redundant child nodes also greatly improves the calculation speed and efficiency of the algorithm.

(4) The continuous beam is more complex to build than the simply supported beam and costs more, so in areas where a continuous beam is unnecessary the simply supported beam has higher priority. The selection stage of the Monte Carlo tree search can be modified accordingly: when the child nodes under a node include the child generated by the 32 m action and that child is not the worst option, it is set as the optimal selection. This guarantees that the simply supported beam always takes priority over the continuous beam at unnecessary positions, and the rule works without relying on accumulated experience.
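Rules (3) and (4) can be sketched together in Python. The action set of FIG. 2 is not reproduced in the text, so the span tuples and the label `SIMPLE_32M` below are hypothetical, as are the function names and the representation of crossing areas as mileage intervals.

```python
SIMPLE_32M = (32.0,)  # hypothetical encoding of the 32 m simply supported beam


def pier_positions(mileage, spans):
    """Mileage of every pier created by one action (a tuple of span lengths)."""
    positions = []
    for span in spans:
        mileage += span
        positions.append(mileage)
    return positions


def expand_children(mileage, actions, forbidden_zones):
    """Rule (3): drop actions that put any pier inside a crossing area."""
    children = []
    for spans in actions:
        piers = pier_positions(mileage, spans)
        if any(lo < p < hi for p in piers for lo, hi in forbidden_zones):
            continue  # delete the child instead of penalising it
        children.append((spans, piers[-1]))
    return children


def select_child(scored_children):
    """Rule (4): prefer the 32 m child unless it is the worst option.

    scored_children: list of (spans, score) pairs.
    """
    worst_score = min(score for _, score in scored_children)
    for spans, score in scored_children:
        if spans == SIMPLE_32M and score > worst_score:
            return spans, score
    return max(scored_children, key=lambda child: child[1])


actions = [(32.0,), (24.0,), (32.0, 48.0, 32.0)]  # illustrative subset only
print(expand_children(0.0, actions, forbidden_zones=[(20.0, 30.0)]))
```

Filtering at expansion time means the simulation stage never spends rollouts on children that would be rejected anyway.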

S320, establishing a strategy value mathematical model based on the Sarsa (lambda) algorithm, solving the strategy value mathematical model by taking the first information and the third information as input information of the strategy value mathematical model to obtain a strategy value network, wherein the strategy value network comprises a span distribution scheme and accumulated reward information, and the span distribution scheme comprises bridge pier coordinates and bridge pier mileage of a bridge;

specifically, the present step includes step S321, step S322, step S323, step S324, and step S325.

S321, acquiring preset cycle times;

It should be noted that the preset cycle number in this step is 30;

S322, calculating a state number m from the line mileage information and the preset state length;

According to the design construction drawings for double-line round-ended solid piers of high-speed railways with a speed of 350 km/h, piers 3 to 35 m high have a cross-section width of 3 to 3.5 m. To bring the terrain information closer to the actual situation, the discretized line segment should be smaller than the pier width. In addition, one state can correspond to only one span action in the algorithm, whereas a state longer than 4 m could correspond to two span actions. Weighing these considerations, a preset state length of 3 m is adopted, and the terrain elevation within each segment is treated as constant, so that continuous terrain conditions can be discretized. The whole line is thus divided into 3 m segments, each representing one state; that is, the state number is calculated by dividing the line mileage information by the preset state length.
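The state count described above amounts to a single division, sketched here in Python. Rounding up when the line length is not a multiple of 3 m is an assumption, as the source does not say how a remainder is handled; the function names are hypothetical.

```python
import math


def state_count(line_length_m, state_length_m=3.0):
    """Number of states m for a line of the given length.

    Assumption: a partial final segment still counts as one state.
    """
    return math.ceil(line_length_m / state_length_m)


def state_index(mileage_m, start_mileage_m=0.0, state_length_m=3.0):
    """Index of the state that contains a given mileage."""
    return int((mileage_m - start_mileage_m) // state_length_m)


print(state_count(960.0))  # a 960 m line split into 3 m states
print(state_index(32.0))   # state containing the pier at mileage 32 m
```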

S323, constructing a first cycle body: establishing a Sarsa (lambda) mathematical model based on a simulated annealing algorithm, setting the learning rate of the Sarsa (lambda) mathematical model as a first preset value, the discount rate as a second preset value, the updating step frequency as a third preset value, the initialization temperature as a first preset value and the annealing parameter as exponential decay, and solving the Sarsa (lambda) mathematical model to obtain a one-time span strategy value result;

It should be noted that the Sarsa(λ) mathematical model used in this step preferably adopts the action strategy of a simulated annealing algorithm. To shorten the convergence time, in this step the learning rate is set to 0.3, the discount rate to 0.6, the updating step frequency to 0.6 and the initialization temperature to 100 degrees, and the annealing parameter decays exponentially. The decay coefficient is preferably calculated by the following formula:

β＝τⁱ*T

wherein τ is a reduction coefficient, i is the cycle number in the annealing algorithm, and T is the initial temperature.
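The exponential decay formula above can be evaluated with a few lines of Python. The initial temperature of 100 comes from the text; the value τ = 0.95 is an assumption chosen only for illustration, and the function name `annealed_temperature` is hypothetical.

```python
def annealed_temperature(i, initial_temperature=100.0, tau=0.95):
    """Evaluate beta = tau**i * T for cycle number i."""
    return (tau ** i) * initial_temperature


# Temperature at the start and after 10 and 50 cycles (tau assumed 0.95).
for cycle in (0, 10, 50):
    print(cycle, annealed_temperature(cycle))
```

A value of τ closer to 1 keeps the temperature high for longer and so prolongs random exploration.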

Specifically, this step includes step S3231, step S3232, step S3233, step S3234, step S3235, and step S3236.

S3231, converting the bridge starting coordinate information into an initial state;

S3232, converting the bridge ending coordinate information into a cut-off state;

S3233, setting an m × m empty matrix as the initialized state-action value function matrix (Q table) according to the state number m;

S3234, generating a random value δ ∈ (0,1) and calculating the acceptance probability P according to the Metropolis acceptance criterion, where s and a are the current state and the currently selected action, and s′ and a′ are the next state and the next selected action: if e^((Q(s′,a′)-Q(s,a))/T) > δ, an action is randomly selected from the action set; otherwise the action argmax Q(s, a) is selected;

S3235, constructing a second cycle body: establishing a single span distribution mathematical model based on safety indexes, taking the initialization temperature, annealing parameter, initial state, current state, currently selected action, next state and next selected action as input information of a simulated annealing mathematical model, and solving the simulated annealing mathematical model to obtain a span distribution result and the feedback reward corresponding to the initial state;

In S323, steps S3231 to S3235 are conventional techniques and are not described in detail in this application. S3235 differs from the conventional technique in that, when each state is explored, actions that fail the safety indexes are removed and disabled, which reduces erroneous exploration results such as building a pier inside a crossing area.

Specifically, this step includes S32351, S32352, S32353, S32354 and S32355.
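The Metropolis-style action selection of S3234 can be sketched in Python as below. This follows one common reading of simulated-annealing action selection, in which a randomly drawn candidate action is compared with the greedy action in the same Q-table row; that reading, the function name `choose_action` and the list representation of a Q-table row are assumptions, not details confirmed by the source.

```python
import math
import random


def choose_action(q_row, temperature, rng):
    """Pick an action index from one Q-table row.

    q_row: Q values of all actions in the current state.
    temperature: current annealing temperature (must be positive).
    rng: a random.Random instance, passed in so runs can be reproduced.
    """
    greedy = max(range(len(q_row)), key=lambda a: q_row[a])
    candidate = rng.randrange(len(q_row))
    # The exponent is never positive because the greedy action has the
    # largest Q value, so the acceptance probability lies in (0, 1].
    acceptance = math.exp((q_row[candidate] - q_row[greedy]) / temperature)
    delta = rng.random()
    return candidate if acceptance > delta else greedy


rng = random.Random(0)
print(choose_action([0.0, 5.0, 1.0], temperature=100.0, rng=rng))
```

At high temperature almost any candidate is accepted, which gives broad exploration; as the temperature falls the choice approaches the greedy action.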

S32351, executing the currently selected action, and transferring the current state to the next state to obtain feedback rewards and converting the next state and the current state into pier coordinates;

s32352, in the next state, calculating a bridge safety index corresponding to each action in the action set according to a preset formula set, wherein the safety index comprises pier top transverse displacement, beam body vertical deflection and bridge deck elevation slope, and is obtained by calculating first information and third information;

it should be noted that in this embodiment, the pier top transverse displacement, the beam body vertical deflection and the bridge deck elevation slope are selected as safety indexes.

First, the bridge deck elevation gradient requires the positions of two piers. In the Sarsa(λ) algorithm, the two states s and s′ must be obtained before the experience value table is updated, and both states correspond to pier positions. The states can therefore be converted into pier coordinates, the bridge deck elevation at each coordinate can be looked up, and the gradient can then be calculated.

Second, the main factor in the transverse deflection of the bridge is the centrifugal force, so the calculation of the centrifugal force is important. Updating the experience value table in the Sarsa(λ) algorithm yields the two states s and s′, which correspond to pier positions. Adding the pier coordinate of the last state obtained in the previous execution cycle gives three coordinate points, from which the real-time curve radius on a curved section can be calculated, so that the centrifugal force and the transverse deflection of the bridge can be determined.

The last safety index is the pier top transverse displacement. When the experience value table is updated in the Sarsa(λ) algorithm, the two actions a and a′ are obtained together with the two states s and s′; the states correspond to pier positions and the actions correspond to bridge lengths. The state is first converted into a pier coordinate, and the ground elevation is subtracted from the bridge deck elevation to obtain the pier height. The maximum limit of the pier top displacement is then calculated from the bridge lengths corresponding to the actions a and a′, and finally the actual pier top displacement is calculated from the forces acting on the pier.
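As a non-limiting illustration, the geometric quantities described above can be sketched as follows. The helper names `deck_gradient`, `curve_radius` and `pier_height`, and the representation of a pier as a plan-view point (x, y) plus a mileage, are assumptions made for this sketch. The radius is taken as the circumradius of the three pier points, which is one common way to estimate a curve radius from three coordinates.

```python
import math

def deck_gradient(mileage_a, elev_a, mileage_b, elev_b):
    """Deck elevation gradient between two piers, as rise over run."""
    run = mileage_b - mileage_a
    if run == 0:
        raise ValueError("piers must be at different mileages")
    return (elev_b - elev_a) / run

def curve_radius(p1, p2, p3):
    """Radius of the circle through three plan-view points (x, y).

    Returns math.inf when the points are collinear (straight alignment).
    """
    a = math.dist(p1, p2)
    b = math.dist(p2, p3)
    c = math.dist(p1, p3)
    # The cross product equals twice the signed area of the triangle.
    cross = (p2[0] - p1[0]) * (p3[1] - p1[1]) - (p2[1] - p1[1]) * (p3[0] - p1[0])
    if cross == 0:
        return math.inf
    return a * b * c / (2.0 * abs(cross))  # R = abc / (4 * area)

def pier_height(deck_elev, ground_elev):
    """Pier height as deck elevation minus ground elevation."""
    return deck_elev - ground_elev
```

For example, `curve_radius(prev_pier, pier_s, pier_s_next)` would use the pier of the previous cycle together with the piers of s and s′.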

And is calculated by the following formula:

Because the elevation of the bridge pier is given before the spans are laid out, the elevation gradient of the bridge can be determined directly, and the corresponding elevation calculation carried out, once the pier positions are fixed. The formulas for the other two safety indexes are derived by the direct integration method. The first is the pier top transverse displacement. Pier displacement is mainly caused by the following loads: train lateral swaying force, train derailment load, wind load and temperature stress. The pier displacement under each factor is obtained by the direct integration method as follows:

displacement of the bridge pier under the action of horizontal force:

wherein:

displacement of the pier generated under the action of moment:

wherein:

displacement of the bridge pier under the action of wind load:

wherein:

displacement of the bridge pier generated under the action of temperature:

In the formulas: p_y is the set of horizontal forces acting on the pier; M_Y is the set of moments acting on the pier; […] is the wind load; B is the section width of the rectangular section; k is the gradient of the pier shaft; D_U is the diameter of the round-ended section at the pier bottom; D₀ is the diameter of the round-ended section at the pier top; R_U is the radius of the round-ended section at the pier bottom; and R₀ is the radius of the round-ended section at the pier top.

And adding the four displacements to obtain the transverse displacement of the pier top of the pier.

The following is a derivation formula of the bridge transverse deflection: firstly, a calculation formula of horizontal deflection of the simply supported beam is as follows:

Δ_{limit value} = Δ_{wind load} + Δ_{swinging force} + Δ_{centrifugal force}

Wherein:

then, a calculation formula of the horizontal deflection of the side span of the continuous beam is shown as follows:

Δ_{limit value} = Δ_{wind load} + Δ_{swinging force} + Δ_{centrifugal force} + Δ_{temperature}

Wherein:

and finally, a horizontal deflection calculation formula of the midspan of the continuous beam is shown as follows:

Wherein

Wherein: L is the length of the simply supported beam; L₁ is the side span length of the continuous beam; L₂ is the mid-span length of the continuous beam; h is the bridge section height; k is the stiffness coefficient; k₁ is the displacement of the side span under unit wind load; k₂ is the displacement of the side span under the unit resultant of the swinging force and the centrifugal force; k₃ is the displacement of the side span under unit force; k₄ is the displacement of the side span under unit temperature change; k₅ is the displacement of the main span under unit force; k₆ is the displacement of the main span under unit wind load; k₇ is the displacement of the main span under unit temperature change; and k₈ is the displacement of the main span under the unit resultant force.

S32353, judging whether the bridge safety index meets a preset design rule or not, if not, deleting actions which do not meet the bridge safety index in an action set, wherein the preset design rule is a bridge design specification standard;

It should be noted that, to ensure the safety of railway operation, the specification stipulates the following:

(1) When the design speed is 200 km/h to 250 km/h, the maximum gradient of the main line shall not exceed 2%; when the design speed is 250 km/h to 300 km/h, the maximum gradient of the main line shall not exceed 2%, or 3% under difficult conditions.

(2) For safety, the railway bridge and culvert design specification limits the transverse deformation of the beam body as follows: under the train lateral swaying force, centrifugal force, wind force, temperature and other factors, the horizontal deflection of the beam body shall not exceed 1/4000 of the calculated span of the beam body.

(3) The railway bridge and culvert design specification also limits the horizontal break angle of the bridge as follows: 1. for railways with a design speed of 200 km/h and above, the horizontal break angle at the beam end shall not exceed 1.0‰ rad; 2. for railways with a design speed of 160 km/h and below, the horizontal break angle at the beam end shall not exceed 1.5‰ rad for spans of less than 40 m, and shall not exceed 1.0‰ rad for spans of 40 m or more.
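As a non-limiting illustration, the judgment and deletion of S32353 can be sketched for the gradient limit of item (1) and the deflection limit of item (2). The `SafetyIndexes` container, the function names and the `difficult` flag (set by the caller when the 3% allowance applies) are assumptions made for this sketch; the break angle check of item (3) would be added in the same way.

```python
from dataclasses import dataclass

@dataclass
class SafetyIndexes:
    gradient: float      # deck elevation gradient, rise over run
    deflection_m: float  # horizontal deflection of the beam body, metres
    span_m: float        # calculated span of the beam body, metres

def meets_rules(idx, difficult=False):
    """True when the gradient and deflection limits are both satisfied."""
    max_gradient = 0.03 if difficult else 0.02
    return (abs(idx.gradient) <= max_gradient
            and idx.deflection_m <= idx.span_m / 4000.0)

def prune_actions(indexes_by_action, difficult=False):
    """Keep only the actions whose safety indexes meet the rules."""
    return [action for action, idx in indexes_by_action.items()
            if meets_rules(idx, difficult)]
```

Here `indexes_by_action` maps each candidate action (for example a span length) to the indexes computed for it in S32352.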

S32354, updating the Q table and the E table according to a preset updating function relation;

S32355, updating the current state to the next state, and updating the currently selected action to the next selected action.

It should be noted that steps S32351, S32354, and S32355 are prior art, and are not described again in this embodiment.
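As a non-limiting illustration, the update of the Q table and the E table in S32354 can be sketched with the textbook Sarsa(λ) rule using accumulating eligibility traces. The original text does not give the preset updating function relation, so this textbook form, the function name and the parameter names `alpha` (learning rate), `gamma` (discount rate) and `lam` (trace decay) are assumptions.

```python
def sarsa_lambda_update(Q, E, s, a, reward, s_next, a_next, alpha, gamma, lam):
    """One textbook Sarsa(lambda) step with accumulating traces, in place.

    Q and E are lists of lists indexed as [state][action].
    Returns the temporal-difference error of this step.
    """
    td_error = reward + gamma * Q[s_next][a_next] - Q[s][a]
    E[s][a] += 1.0  # mark the visited state-action pair
    for i in range(len(Q)):
        for j in range(len(Q[i])):
            Q[i][j] += alpha * td_error * E[i][j]  # credit recent pairs
            E[i][j] *= gamma * lam                 # decay every trace
    return td_error
```

Because the target uses Q(s′, a′) for the action that is actually taken next, an action removed by the safety check never contributes to the update.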

S3236, repeatedly executing the second loop body until the next state in the second loop body is the cut-off state, to obtain a single span distribution strategy value result.

S324, repeatedly executing the first loop body for a preset number of times to obtain fourth information, wherein the fourth information comprises the span distribution scheme results of the preset number of runs;

S325, determining a strategy value network from the fourth information, wherein the strategy value network is the single span distribution scheme result with the highest accumulated reward in the fourth information.

It should be noted that in this step the Sarsa algorithm is run thirty times to obtain thirty span distribution scheme results (the fourth information), and the offline learning capability of the Sarsa algorithm is fully used to find the span distribution scheme result with the highest reward among the thirty runs.
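As a non-limiting illustration, steps S324 and S325 can be sketched as follows. The callable `run_once`, which stands in for one complete execution of the first loop body and returns a pair (scheme, accumulated reward), is hypothetical.

```python
def best_of_runs(run_once, runs=30):
    """Run the learner `runs` times and keep the highest-reward scheme.

    run_once(i) is a hypothetical callable returning (scheme, accumulated_reward).
    Returns the best scheme, its reward, and all results (the fourth information).
    """
    results = [run_once(i) for i in range(runs)]
    best_scheme, best_reward = max(results, key=lambda item: item[1])
    return best_scheme, best_reward, results
```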

S330, establishing a span analysis mathematical model based on an Alphago algorithm, taking the strategy value network information, the first information and the third information as input information of the span analysis mathematical model, and solving the span analysis mathematical model to obtain fourth information.

It should be noted that, although the bridge span distribution problem is relatively simple, a random span distribution strategy is still used in the simulation stage of the AlphaGo algorithm in this embodiment in order to better explore the span distribution possibilities. However, the random strategy causes the cost of the algorithm's output to fluctuate widely from one run to the next. Therefore, to expand the search range and reduce the cost fluctuation, this embodiment introduces the Sarsa(λ) algorithm as an additional strategy network, which is why step S320 is provided. Sarsa(λ) is a relatively conservative reinforcement learning algorithm in which the action used to update the experience table is the same as the action taken in the next step, so the safety indexes can be integrated into the algorithm. After the simulation ends, the simulation result and the empirical scheme proposed by the Sarsa(λ) algorithm are considered together to obtain a better span distribution strategy. Specifically, this step includes step S331, step S332, and step S333.

S331, establishing a root node of the Alphago mathematical model according to the first information, wherein the root node is bridge initial coordinate information;

S332, constructing a third loop body: establishing a node expansion mathematical model based on the strategy value network, the action set and the AlphaGo mathematical model, and solving the node expansion mathematical model to obtain the child node of the optimal span distribution mode under the root node;

It should be noted that, in this embodiment, the result of the Sarsa(λ) algorithm is added to the simulation stage of the AlphaGo algorithm in the form of a span distribution strategy table, so that the AlphaGo algorithm can obtain the optimal action from both its own simulation and the strategy value table information when it updates node information. This step includes step S3321, step S3322, step S3323, step S3324, and step S3325.

S3321, calculating to obtain a safety index corresponding to each sub-node according to the first information and the third information, wherein the safety index comprises pier top transverse displacement, beam body vertical deflection and bridge deck elevation slope;

It should be noted that the safety indexes in this step are the same as those in step S32352, and are not described in detail in this application.

S3322, judging whether the bridge safety indexes of each child node meet the preset design rules, and if not, deleting the child nodes whose safety indexes do not meet the preset design rules, wherein the preset design rules are the bridge design specification standards;

It should be noted that this step is the same as step S32353, and the process is not described in detail in this application.

Through steps S3321 and S3322, when the AlphaGo algorithm expands a node, seven actions are available under that node. The seven actions are first checked: any child node that does not satisfy the span distribution conditions is deleted directly, and the simulation process is carried out only on the remaining child nodes. This directly prevents a pier from intruding into an area where piers are prohibited, which would make the bridge fail the span distribution conditions, and deleting redundant child nodes also greatly improves the speed and efficiency of the algorithm.

S3323, if the total number of child nodes is zero, returning to the node at the previous level to execute the third loop body again;

It should be noted that, when the Monte Carlo tree search proceeds downward from some nodes, all of their child nodes are deleted because they do not satisfy the safety indexes. In that case this step returns to the previous level, replaces the node selected at that level, and performs the Monte Carlo tree search again.
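As a non-limiting illustration, the deletion of unsafe child nodes and the return to the previous level can be sketched on a one-dimensional mileage axis. The function names, the `is_safe` predicate, the `choose` callable (a stand-in for the UCT selection of S3324) and the requirement that the last pier land exactly on the bridge end are assumptions made for this sketch.

```python
def expand(position, actions, is_safe):
    """Child positions reachable from `position` that pass the safety check."""
    return [position + span for span in actions
            if is_safe(position, position + span)]

def layout_with_backtracking(start, end, actions, is_safe, choose):
    """Walk from start to end; when a node has no admissible child,
    return to the previous level and exclude the dead-end node there."""
    path = [start]
    banned = {}  # node -> children already found to be dead ends
    while path[-1] < end:
        node = path[-1]
        children = [c for c in expand(node, actions, is_safe)
                    if c <= end and c not in banned.get(node, set())]
        if not children:
            if len(path) == 1:
                return None  # no feasible layout from the start
            path.pop()
            banned.setdefault(path[-1], set()).add(node)
            continue
        path.append(choose(node, children))
    return path
```

For example, with spans of 2 and 3, a prohibited pier position at mileage 8 and a greedy `choose` that prefers the longest span, the sketch first reaches a dead end at mileage 9, backs up twice, and then completes the layout through mileage 5 and 7.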

S3324, performing simulation calculation on each sub-node by adopting a UCT algorithm, and selecting an optimal sub-node by the UCT algorithm, wherein the optimal sub-node comprises bridge pier coordinate information and bridge pier mileage information;

It should be noted that, in this step, the flow of the UCT algorithm is as follows:

(1) performing the expansion and simulation stages for the seven span distribution actions under the current node to obtain seven child nodes;

(2) randomly selecting the simulated total reward value of one child node as the standard value and comparing the reward values of the other six child nodes with it; a reward larger than the standard value is judged a win, otherwise a loss or a tie, and each win adds 1 to the win count of that child node;

(3) continuing the expansion and simulation stages of the tree search to obtain the simulated reward value of each child node; as before, a reward larger than the standard value is judged a win, otherwise a loss or a tie, and the result is fed back to the corresponding child node until the search ends;

(4) calculating the scores of all child nodes under the current node according to the UCB formula and obtaining the optimal node.

A problem remains: if the total reward value of a poor node is set as the standard value, the optimal node and the ordinary nodes have the same number of wins, and the agent cannot tell which node is optimal. The UCB formula therefore needs to be improved so that the margin by which the node generated by each action wins or loses is reflected. To this end, a profit float term is added to the UCB formula, as shown in the following formula:

The last term of the formula is the profit float term, where R_i is the total reward value obtained by the i-th simulation under a given child node and R_{standard} is the total reward value of the child node selected as the standard value. The profit float term corrects the UCB value: a child node that obtains larger rewards has a larger profit float term and is more likely to be selected.

The content of the algorithm not mentioned in this step is common knowledge in the art, and is not described in detail in this application.
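As a non-limiting illustration, a UCB score with an added profit float term can be sketched as follows. The exact form of the profit float term is not reproduced in this text, so the form used here (the mean of R_i minus R_{standard}, scaled by the magnitude of R_{standard}) is purely an assumption, as are the function name and the exploration constant `c`.

```python
import math

def ucb_with_float(wins, visits, parent_visits, rewards, standard_reward,
                   c=math.sqrt(2)):
    """UCB1 score plus an assumed profit float term.

    rewards: total reward values R_i of the simulations under this child.
    standard_reward: total reward of the child chosen as the standard value.
    """
    if visits == 0:
        return math.inf  # unvisited children are tried first
    exploit = wins / visits
    explore = c * math.sqrt(math.log(parent_visits) / visits)
    scale = abs(standard_reward) or 1.0  # avoid division by zero
    if rewards:
        float_term = sum(r - standard_reward for r in rewards) / (len(rewards) * scale)
    else:
        float_term = 0.0
    return exploit + explore + float_term
```

With this assumed form, two child nodes that have the same win count are still separated by how far their rewards exceed the standard value.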

S3325, comparing the optimal child node with the corresponding pier coordinates and pier mileage in the strategy value network to obtain the pier coordinates and pier mileage with the maximum accumulated reward, and updating the content of the optimal child node to the pier coordinates and pier mileage with the maximum accumulated reward.

It should be noted that, in the simulation stage, a node can obtain the accumulated reward of the global simulation that takes the node as its root node through the fast span distribution strategy, and can also directly look up the maximum accumulated reward corresponding to the node in the span distribution strategy table. The larger of the two is finally taken as the actual accumulated reward of the node after its simulation is finished, so as to improve the final span distribution result.
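As a non-limiting illustration, taking the larger of the simulated reward and the table reward can be sketched as follows. Representing the span distribution strategy table as a dictionary keyed by pier mileage, and the function name `node_value`, are assumptions made for this sketch.

```python
def node_value(rollout_reward, policy_table, node_key):
    """Actual accumulated reward of a node after simulation.

    policy_table: mapping from a node key (assumed here to be the pier
    mileage) to the maximum accumulated reward found by Sarsa(lambda).
    """
    table_reward = policy_table.get(node_key)
    if table_reward is None:
        return rollout_reward  # the table has no entry for this node
    return max(rollout_reward, table_reward)
```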

S333, repeatedly executing the third loop body until the node expansion mathematical model has been calculated to the end of the bridge, to obtain fourth information.

According to this embodiment: 1) an AlphaGo algorithm is designed for bridge span distribution according to the background of bridge span distribution and the optimization requirements of the span distribution problem, and formulas for three safety indexes (bridge deck elevation, pier top transverse displacement and bridge transverse deflection) are derived in combination with the railway bridge and culvert design specification. On this basis, the safety indexes are merged into the child node expansion stage of the AlphaGo algorithm to act as "pruning"; the pruned AlphaGo algorithm is more efficient and avoids situations in which a safety index exceeds its limit; 2) to address the poor stability of results caused by the single action strategy in the AlphaGo algorithm, the Sarsa(λ) algorithm is introduced, and an action strategy network based on the Sarsa(λ) algorithm is provided to optimize the AlphaGo algorithm; 3) with the optimal parameter ratio selected, the convergence rate of the algorithm can be increased by 75%, 25% and 22% respectively compared with the worst case; 4) the strategy value network is optimized and output by the Sarsa(λ) algorithm and added to the simulation stage of the Monte Carlo tree search in the AlphaGo algorithm. In the simulation stage, a node can obtain the accumulated reward of the global simulation that takes the node as its root node through the fast span distribution strategy, and can also directly look up the corresponding maximum accumulated reward in the strategy value network; the larger of the two is finally taken as the actual accumulated reward of the node after its simulation is finished, which improves the result of the AlphaGo algorithm.

Example 2:

As shown in Figs. 5 to 7, this embodiment provides a bridge spanning apparatus, which includes:

the first acquisition unit 1 is used for acquiring topographic drawing information of a line; the first conversion unit 2 is used for analyzing and obtaining first information according to the topographic drawing information of the line, wherein the first information comprises pier elevation information, line mileage information, bridge starting coordinate information, bridge ending coordinate information and crossing area position information; and the bridge span distribution unit 3 is used for establishing a bridge span distribution mathematical model based on the Sarsa algorithm and Alphago based on the first information, solving the bridge span distribution mathematical model to obtain second information, and the second information comprises bridge pier coordinates and bridge pier mileage of the bridge.

In one embodiment of the present disclosure, the bridge spanning unit 3 includes:

a second obtaining unit 31, configured to obtain third information, where the third information includes an action set, first value network information, second value network information, and size information of a bridge in the action set; the strategy value unit 32 is used for establishing a strategy value mathematical model based on the Sarsa (lambda) algorithm, taking the first information and the third information as input information of the strategy value mathematical model, solving the strategy value mathematical model to obtain a strategy value network, wherein the strategy value network comprises a span distribution scheme and accumulated reward information, and the span distribution scheme comprises bridge pier coordinates and bridge pier mileage; and the span analysis unit 5 is used for establishing a span analysis mathematical model based on the AlphaGo algorithm, taking the strategy value network information, the first information and the third information as input information of the span analysis mathematical model, and solving the span analysis mathematical model to obtain fourth information.

In one embodiment of the present disclosure, the policy value unit 32 includes:

a third obtaining unit 321, configured to obtain a preset cycle number; a first sub-calculation unit 322, configured to calculate a state number m according to the line mileage information and a preset state length; a first circulation unit 323, configured to construct a first loop body: establishing a Sarsa(λ) mathematical model based on a simulated annealing algorithm, setting the learning rate of the Sarsa(λ) mathematical model as a first preset value, the discount rate as a second preset value, the updating step frequency as a third preset value, the initialization temperature as a first preset value and the annealing parameter as exponential decay, and solving the Sarsa(λ) mathematical model to obtain a single span distribution strategy value result; a first loop judging unit 4, configured to repeatedly execute the first loop body for a preset number of times to obtain fourth information, where the fourth information includes the span distribution scheme results of the preset number of runs; and a first analyzing unit 325, configured to determine a strategy value network from the fourth information, where the strategy value network is the single span distribution scheme result with the highest accumulated reward in the fourth information.

In one embodiment of the present disclosure, the first circulation unit 323 includes:

a second conversion unit 41, configured to convert the bridge starting coordinate information into an initial state; a third conversion unit 42, configured to convert the bridge ending coordinate information into a cut-off state; an empty table establishing unit 43, configured to set an m × m empty matrix as the initialized state-action value function matrix (the Q table) according to the state number m; a random unit 44, configured to generate a random value Δ ∈ (0,1) and calculate the acceptance probability P according to the Metropolis acceptance criterion, where s and a are the current state and the currently selected action and s′ and a′ are the next state and the next selected action: if e^{(Q(s′,a′)-Q(s,a))/T} > Δ, an action in the action set is randomly selected as the selected action, otherwise argmax Q(s, a) is selected as the selected action; a second circulation unit 45, configured to construct a second loop body: establishing a single span distribution mathematical model based on safety indexes, taking the initialization temperature, annealing parameters, the initial state, the current state, the currently selected action, the next state and the next selected action as input information of a simulated annealing mathematical model, and solving the simulated annealing mathematical model to obtain a span distribution result and a feedback reward corresponding to the initial state; and a second loop judging unit 46, configured to repeatedly execute the second loop body until the next state in the second loop body is the cut-off state, so as to obtain a single span distribution strategy value result.

In one embodiment of the present disclosure, the second circulation unit 45 includes:

the first execution unit 451 is configured to execute a currently selected action, transition the current state to a next state, obtain a feedback reward, and convert the next state and the current state into pier coordinates; the second sub-calculation unit 452 is used for calculating a bridge safety index corresponding to each action in the action set according to a preset formula set in the next state, wherein the safety index comprises pier top transverse displacement, beam body vertical deflection and bridge deck elevation gradient, and the safety index is obtained by calculating the first information and the third information; a first logic sub-judgment unit 453, configured to judge whether the bridge safety index meets a preset design rule, and if not, delete the actions that do not meet the bridge safety index in the action set, where the preset design rule is a bridge design specification standard; a first updating unit 454, configured to update the Q table and the E table according to a preset updating functional relationship; the second updating unit 455 is configured to update the current state to be the next state, and update the currently selected action to be the next selected action.

In one embodiment of the present disclosure, the span analysis unit 5 includes:

the first establishing unit 51 is configured to establish a root node of the AlphaGo mathematical model according to the first information, where the root node is bridge initial coordinate information; a third circulation unit 52, configured to construct a third circulation body, which is to establish a node expansion mathematical model based on the policy value network, the action set, and the AlphaGo mathematical model, and solve the node expansion mathematical model to obtain a child node of the optimal span distribution mode under the root node; and a third loop judging unit 53, configured to repeatedly execute the third loop until the node expansion mathematical model is calculated to the end of the bridge, so as to obtain fourth information.

In one embodiment of the present disclosure, the third circulation unit 52 includes:

a third sub-calculation unit 531, configured to calculate a safety index corresponding to each child node according to the first information and the third information, where the safety index includes pier top transverse displacement, beam body vertical deflection and bridge deck elevation slope; a second logic sub-judgment unit 532, configured to judge whether the bridge safety indexes of each child node meet the preset design rules and, if not, delete the child nodes whose safety indexes do not meet the preset design rules, where the preset design rules are the bridge design specification standards; a third logic sub-judgment unit 533, configured to return to the node at the previous level to execute the third loop body again if the total number of child nodes is zero; a fourth sub-calculation unit 534, configured to perform simulation calculation on each child node by using a UCT algorithm and obtain the optimal child node selected by the UCT algorithm, where the optimal child node includes pier coordinate information and pier mileage information; and a fourth logic sub-judgment unit 535, configured to compare the optimal child node with the corresponding pier coordinates and pier mileage in the strategy value network to obtain the pier coordinates and pier mileage with the maximum accumulated reward, and to update the content of the optimal child node to the pier coordinates and pier mileage with the maximum accumulated reward.

It should be noted that, regarding the apparatus in the above embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated herein.

Example 3:

Corresponding to the above method embodiment, this embodiment further provides a bridge spanning device. The bridge spanning device described below and the bridge spanning method described above may be referred to in correspondence with each other.

FIG. 8 is a block diagram illustrating a bridge spanning device 800, according to an exemplary embodiment. As shown in FIG. 8, the bridge spanning device 800 may include: a processor 801, a memory 802. The bridge spanning device 800 may also include one or more of a multimedia component 803, an input/output (I/O) interface 804, and a communication component 805.

The processor 801 is configured to control the overall operation of the bridge spanning device 800, so as to complete all or part of the steps in the bridge spanning method. The memory 802 is used to store various types of data to support the operation of the bridge spanning device 800, such data may include, for example, instructions for any application or method operating on the bridge spanning device 800, as well as application-related data, such as contact data, transceived messages, pictures, audio, video, and so forth. The Memory 802 may be implemented by any type of volatile or non-volatile Memory device or combination thereof, such as Static Random Access Memory (SRAM), Electrically Erasable Programmable Read-Only Memory (EEPROM), Erasable Programmable Read-Only Memory (EPROM), Programmable Read-Only Memory (PROM), Read-Only Memory (ROM), magnetic Memory, flash Memory, magnetic disk or optical disk. The multimedia components 803 may include screen and audio components. Wherein the screen may be, for example, a touch screen and the audio component is used for outputting and/or inputting audio signals. For example, the audio component may include a microphone for receiving external audio signals. The received audio signal may further be stored in the memory 802 or transmitted through the communication component 805. The audio assembly also includes at least one speaker for outputting audio signals. The I/O interface 804 provides an interface between the processor 801 and other interface modules, such as a keyboard, mouse, buttons, etc. These buttons may be virtual buttons or physical buttons. The communication component 805 is used for wired or wireless communication between the bridge spanning device 800 and other devices. 
The wireless communication may be, for example, Wi-Fi, Bluetooth, Near Field Communication (NFC), 2G, 3G, or 4G, or a combination of one or more of them, so the corresponding communication component 805 may include a Wi-Fi module, a Bluetooth module, and an NFC module.

In an exemplary embodiment, the bridge spanning Device 800 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components for performing the bridge spanning method described above.

In another exemplary embodiment, a computer readable storage medium comprising program instructions which, when executed by a processor, implement the steps of the bridge spanning method described above is also provided. For example, the computer readable storage medium may be the memory 802 described above that includes program instructions executable by the processor 801 of the bridge spanning device 800 to perform the bridge spanning method described above.

Example 4:

Corresponding to the above method embodiment, this embodiment also provides a readable storage medium. The readable storage medium described below and the bridge spanning method described above may be referred to in correspondence with each other.

A readable storage medium, on which a computer program is stored, which, when executed by a processor, implements the steps of the bridge spanning method of the above-described method embodiments.

The readable storage medium may be a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, or any other readable storage medium capable of storing program code.

The above is only a preferred embodiment of the present invention, and is not intended to limit the present invention, and various modifications and changes will occur to those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

The above description is only for the specific embodiments of the present invention, but the scope of the present invention is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present invention, and the changes or substitutions should be covered within the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims

1. A bridge spanning method, characterized by comprising the following steps:

acquiring topographic drawing information of a line;

analyzing to obtain first information according to the line topographic drawing information, wherein the first information comprises pier elevation information, line mileage information, bridge starting coordinate information, bridge ending coordinate information and crossing area position information;

and establishing, based on the first information, a bridge span distribution mathematical model based on the Sarsa algorithm and AlphaGo, and solving the bridge span distribution mathematical model to obtain second information, wherein the second information comprises bridge pier coordinates and bridge pier mileage of the bridge.

2. The bridge spanning method according to claim 1, wherein the establishing, based on the first information, of the bridge span distribution mathematical model based on the Sarsa algorithm and AlphaGo, and the solving of the bridge span distribution mathematical model to obtain the second information comprises:

acquiring third information, wherein the third information comprises an action set, first price value network information, second price value network information and size information of a bridge in the action set;

establishing a strategy value mathematical model based on a Sarsa (λ) algorithm, taking the first information and the third information as input information of the strategy value mathematical model, and solving the strategy value mathematical model to obtain a strategy value network, wherein the strategy value network comprises a span distribution scheme and accumulated reward information, and the span distribution scheme comprises bridge pier coordinates and bridge pier mileage of a bridge;

establishing a span analysis mathematical model based on an AlphaGo algorithm, taking the strategy value network information, the first information and the third information as input information of the span analysis mathematical model, and solving the span analysis mathematical model to obtain fourth information.

3. The bridge spanning method according to claim 2, wherein the establishing of a strategy value mathematical model based on Sarsa (λ) algorithm, taking the first information and the third information as input information of the strategy value mathematical model, and solving the strategy value mathematical model to obtain a strategy value network comprises:

acquiring a preset number of cycles;

calculating to obtain a state number m according to the line mileage information and a preset state length;

constructing a first loop body: establishing a Sarsa (λ) mathematical model based on a simulated annealing algorithm, setting the learning rate of the Sarsa (λ) mathematical model as a first preset value, the discount rate as a second preset value, the updating step frequency as a third preset value, the initialization temperature as a first preset value and the annealing parameter as exponential decay, and solving the Sarsa (λ) mathematical model to obtain a single span distribution strategy value result;

repeatedly executing the first loop body for the preset number of cycles to obtain fourth information, wherein the fourth information comprises the span distribution scheme results of the preset number of cycles;

and determining the strategy value network according to the fourth information, wherein the strategy value network is the single span distribution scheme result with the highest accumulated reward in the fourth information.
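As an illustration only, the steps of claim 3 can be sketched in Python on a toy problem. Everything below is an assumption made for the sketch and is not taken from the patent: the reward function (one unit of cost per pier, a penalty of 10 for a pier inside a crossing area, a penalty for overshooting the bridge end), the tabular Q representation, the Boltzmann action selection used to model the simulated annealing temperature, the reading of the "updating step frequency" as the trace decay λ, and all function names and default parameter values.

```python
import math
import random

def num_states(line_length_m, state_length_m):
    """State number m: how many fixed-length segments cover the line."""
    return math.ceil(line_length_m / state_length_m)

def step(pos, span, m, forbidden):
    """Place the next pier `span` states ahead (hypothetical reward model)."""
    nxt = pos + span
    if nxt >= m:                        # reached or passed the bridge end
        return m, -1.0 - (nxt - m), True
    reward = -1.0                       # each pier costs one unit
    if nxt in forbidden:                # pier falls inside a crossing area
        reward -= 10.0
    return nxt, reward, False

def boltzmann(q_row, temperature, rng):
    """Sample an action with probability proportional to exp(Q / T)."""
    top = max(q_row)
    weights = [math.exp((q - top) / temperature) for q in q_row]
    return rng.choices(range(len(q_row)), weights=weights)[0]

def sarsa_lambda(m, spans, forbidden, episodes=500, alpha=0.1, gamma=0.95,
                 lam=0.8, t0=5.0, decay=0.99, seed=0):
    """Tabular Sarsa(lambda) with an exponentially decaying temperature."""
    rng = random.Random(seed)
    q = [[0.0] * len(spans) for _ in range(m)]
    temperature = t0
    for _ in range(episodes):
        trace = [[0.0] * len(spans) for _ in range(m)]
        pos, a, done = 0, boltzmann(q[0], temperature, rng), False
        while not done:
            nxt, r, done = step(pos, spans[a], m, forbidden)
            if done:
                a2, target = None, r
            else:
                a2 = boltzmann(q[nxt], temperature, rng)
                target = r + gamma * q[nxt][a2]
            delta = target - q[pos][a]
            trace[pos][a] += 1.0        # accumulating eligibility trace
            for s in range(m):
                for k in range(len(spans)):
                    q[s][k] += alpha * delta * trace[s][k]
                    trace[s][k] *= gamma * lam
            pos, a = nxt, a2
        temperature = max(temperature * decay, 1e-3)   # exponential annealing
    return q

def greedy_scheme(q, m, spans, forbidden):
    """Read one span scheme (pier positions) and its accumulated reward from Q."""
    pos, piers, total, done = 0, [0], 0.0, False
    while not done:
        a = max(range(len(spans)), key=lambda k: q[pos][k])
        pos, r, done = step(pos, spans[a], m, forbidden)
        piers.append(pos)
        total += r
    return piers, total

def best_of_runs(n_cycles, m, spans, forbidden):
    """Repeat the loop body n_cycles times; keep the highest accumulated reward."""
    results = []
    for seed in range(n_cycles):
        q = sarsa_lambda(m, spans, forbidden, seed=seed)
        results.append(greedy_scheme(q, m, spans, forbidden))
    return max(results, key=lambda res: res[1])
```

For example, `best_of_runs(3, num_states(320, 8), [3, 4], {12})` trains three independent tables for a 320 m line divided into 8 m states, with candidate spans of 3 or 4 states and a crossing area at state 12, and returns the pier positions and accumulated reward of the best run. Pier mileage would then follow by multiplying each position by the state length.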

4. The bridge spanning method according to claim 3, wherein the establishing of the strategy value mathematical model based on the Sarsa (λ) algorithm, the using of the first information and the third information as the input information of the strategy value mathematical model, and the solving of the strategy value mathematical model to obtain the strategy value network comprises:

establishing a root node of an AlphaGo mathematical model according to the first information, wherein the root node is the bridge starting coordinate information;

constructing a third loop body: establishing a node expansion mathematical model based on the strategy value network, the action set and the AlphaGo mathematical model, and solving the node expansion mathematical model to obtain a child node of the optimal span distribution mode under the root node;

and repeatedly executing the third loop body until the node expansion mathematical model is calculated to the end of the bridge, and obtaining fourth information.
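The node expansion loop of claim 4 can be illustrated with a much simplified sketch. It is not the AlphaGo search itself: there are no visit counts or selection statistics, only a one-step lookahead that scores each child by the larger of two estimates, a fast rollout and a lookup in a learned value table, following the idea stated in the abstract. The reward model, the rollout rule (take the longest span that avoids a crossing area), the dictionary `value_table` standing in for the strategy value network, and all names are assumptions made for illustration.

```python
def step(pos, span, m, forbidden):
    """Place the next pier `span` states ahead (hypothetical reward model)."""
    nxt = pos + span
    if nxt >= m:                        # reached or passed the bridge end
        return m, -1.0 - (nxt - m), True
    reward = -1.0                       # each pier costs one unit
    if nxt in forbidden:                # pier falls inside a crossing area
        reward -= 10.0
    return nxt, reward, False

def fast_rollout(pos, m, spans, forbidden):
    """Hypothetical rapid strategy: longest span that avoids crossing areas."""
    total = 0.0
    ordered = sorted(spans, reverse=True)
    while pos < m:
        span = next((s for s in ordered if pos + s not in forbidden), ordered[0])
        pos, r, _ = step(pos, span, m, forbidden)
        total += r
    return total

def evaluate(pos, m, spans, forbidden, value_table):
    """Node value: the larger of the rollout reward and the learned estimate."""
    if pos >= m:
        return 0.0
    rollout = fast_rollout(pos, m, spans, forbidden)
    learned = value_table.get(pos, float("-inf"))   # missing entry: rollout only
    return max(rollout, learned)

def expand_to_end(m, spans, forbidden, value_table):
    """Start at the root (bridge start) and keep the best child until the end."""
    pos, piers, total = 0, [0], 0.0
    while pos < m:
        best = None
        for span in spans:              # expand every action in the action set
            nxt, r, _ = step(pos, span, m, forbidden)
            score = r + evaluate(nxt, m, spans, forbidden, value_table)
            if best is None or score > best[0]:
                best = (score, nxt, r)
        _, pos, r = best
        piers.append(pos)
        total += r
    return piers, total
```

Calling `expand_to_end(40, [3, 4], {12}, {})` with an empty value table relies on rollouts alone. Passing a table of per-state estimates lets the learned values override a pessimistic rollout, which is the point of taking the maximum; the trade-off is that an over-optimistic table entry is accepted without checking.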

5. A bridge spanning device, comprising:

the first acquisition unit is used for acquiring topographic drawing information of the line;

the first conversion unit is used for analyzing and obtaining first information according to the line topographic drawing information, wherein the first information comprises pier elevation information, line mileage information, bridge starting coordinate information, bridge ending coordinate information and crossing area position information;

and the bridge span distribution unit is used for establishing, based on the first information, a bridge span distribution mathematical model based on the Sarsa algorithm and AlphaGo, and solving the bridge span distribution mathematical model to obtain second information, wherein the second information comprises bridge pier coordinates and bridge pier mileage of the bridge.

6. The bridge spanning device according to claim 5, wherein the bridge spanning unit comprises:

the second acquisition unit is used for acquiring third information, wherein the third information comprises an action set, first price value network information, second price value network information and size information of a bridge in the action set;

the strategy value unit is used for establishing a strategy value mathematical model based on a Sarsa (λ) algorithm, taking the first information and the third information as input information of the strategy value mathematical model, and solving the strategy value mathematical model to obtain a strategy value network, wherein the strategy value network comprises a span distribution scheme and accumulated reward information, and the span distribution scheme comprises bridge pier coordinates and bridge pier mileage;

and the span analysis unit is used for establishing a span analysis mathematical model based on an AlphaGo algorithm, taking the strategy value network information, the first information and the third information as input information of the span analysis mathematical model, and solving the span analysis mathematical model to obtain fourth information.

7. The bridge spanning device according to claim 6, wherein the strategy value unit comprises:

a third acquisition unit, used for acquiring a preset number of cycles;

the first sub-calculation unit is used for calculating to obtain a state number m according to the line mileage information and a preset state length;

a first loop unit, used for constructing a first loop body: establishing a Sarsa (λ) mathematical model based on a simulated annealing algorithm, setting the learning rate of the Sarsa (λ) mathematical model as a first preset value, the discount rate as a second preset value, the updating step frequency as a third preset value, the initialization temperature as a first preset value and the annealing parameter as exponential decay, and solving the Sarsa (λ) mathematical model to obtain a single span distribution strategy value result;

the first loop judging unit is used for repeatedly executing the first loop body for the preset number of cycles to obtain fourth information, wherein the fourth information comprises the span distribution scheme results of the preset number of cycles;

and the first analysis unit is used for determining the strategy value network according to the fourth information, wherein the strategy value network is the single span distribution scheme result with the highest accumulated reward in the fourth information.

8. The bridge spanning device according to claim 6, wherein the spanning analysis unit comprises:

the first establishing unit is used for establishing a root node of the AlphaGo mathematical model according to the first information, wherein the root node is the bridge starting coordinate information;

a third loop unit, used for constructing a third loop body: establishing a node expansion mathematical model based on the strategy value network, the action set and the AlphaGo mathematical model, and solving the node expansion mathematical model to obtain a child node of the optimal span distribution mode under the root node;

and the third loop judging unit is used for repeatedly executing the third loop body until the node expansion mathematical model is calculated to the end of the bridge, to obtain fourth information.

9. A bridge spanning device, comprising:

a memory for storing a computer program;

a processor for implementing the steps of the bridge spanning method according to any one of claims 1 to 4 when executing the computer program.

10. A readable storage medium, characterized by: the readable storage medium has stored thereon a computer program which, when executed by a processor, carries out the steps of the bridge spanning method according to any one of claims 1 to 4.