CN115098906B

CN115098906B - Bridge maintenance method and system based on deep reinforcement learning and system reliability

Info

Publication number: CN115098906B
Application number: CN202210482833.0A
Authority: CN
Inventors: 李惠; 徐阳; 陈家辉
Original assignee: Harbin Institute of Technology
Current assignee: Harbin Institute of Technology
Priority date: 2022-05-05
Filing date: 2022-05-05
Publication date: 2023-04-07
Anticipated expiration: 2042-05-05
Also published as: CN115098906A

Abstract

The invention discloses a bridge intelligent maintenance decision-making method and system based on deep reinforcement learning and system reliability, belonging to the technical field of intelligent infrastructure, wherein the method comprises the following steps: constructing a redundancy system model of the whole bridge deck, decomposing the redundancy system model into a series of small-scale local bridge decks, and calculating the reliability probability and reliability index of the whole bridge deck system based on the failure probability of the local bridge decks and the reliability theory of the redundancy system; designing a comprehensive reward function based on maintenance cost and safety cost according to the reliability index so as to establish an integral bridge deck maintenance decision network model based on deep reinforcement learning; and training the integral bridge deck maintenance decision network model based on the deep reinforcement learning until convergence, and inputting the reliability index, the local reliability index and the service time of the bridge deck system into the trained model to obtain a bridge maintenance action result, namely realizing the intelligent bridge maintenance decision based on the deep reinforcement learning.

Description

Bridge maintenance method and system based on deep reinforcement learning and system reliability

Technical Field

The invention relates to the technical field of intelligent infrastructure, in particular to a bridge intelligent maintenance decision method and a bridge intelligent maintenance decision system based on deep reinforcement learning and system reliability.

Background

Failure of a bridge structure can cause severe economic loss, environmental damage and social impact. Among the components of a bridge, the orthotropic steel bridge deck system is an important part of the whole structure. Because the deck directly bears vehicle loads as well as various long-term environmental actions, damage such as fatigue cracking and corrosion inevitably occurs during its service life. To ensure the operational safety of the bridge structure during service, proper maintenance of the bridge deck system is essential. Maintenance decision-making for the bridge deck system is a comprehensive methodology that combines condition assessment, degradation prediction and maintenance-plan optimization; its purpose is to improve system reliability, prevent system failure and reduce maintenance cost.

The initiation and growth of fatigue cracks in the deck plate may degrade the load-bearing capacity of the bridge deck as a whole. To keep the bridge deck system safe throughout its service life at the lowest cost, reliability assessment of the deck system of long-span bridges is essential. Because fatigue cracks are much smaller than the main-girder span, reliability assessment of the deck is also a multi-scale problem.

A bridge maintenance strategy addresses two aspects: when to maintain and to what degree. According to the decision criterion, maintenance plans fall into two categories: time-based strategies and state-based strategies. Time-based strategies maintain the system at predetermined intervals; a high maintenance frequency wastes component life, while a low frequency may lead to significant failure losses. State-based strategies plan maintenance according to the current degradation state of the structure obtained from inspection or monitoring, and are therefore generally more efficient.

At present, much research has addressed maintenance decisions for a single bridge component, typically with a threshold-based strategy, i.e., the component is maintained when its state reaches a preset threshold. Because the maintenance action space, the state space and the number of decision variables are small, the optimal solution of a single-component maintenance decision is generally easy to obtain.

Optimizing maintenance decisions for a bridge multi-component system is more complex than for a single-component system. The main difficulties lie in the following four aspects:

(1) As the number of components increases, the system state space and maintenance action space dimensions increase exponentially;

(2) Structural correlation exists among components, i.e., certain components form a functional whole and need to be maintained simultaneously;

(3) Economic correlation exists among components: maintaining several components at the same time is generally more economical, and this effect is more pronounced when the maintenance setup cost is high;

(4) Stochastic correlation exists among components, i.e., their random degradation processes are correlated; for example, the growth processes of multiple fatigue cracks are correlated.

Owing to the complexity of modeling and analyzing maintenance decisions for multi-component systems, most research to date on such systems has adopted time-based maintenance decisions. Some studies have extended threshold-based decisions to multi-component systems, but this requires setting a different maintenance threshold for each component. As the number of components grows, the optimization problem becomes increasingly complex, so these methods are usually applied only to systems with a small number of components.

Conventional threshold-based methods optimize the maintenance threshold and other decision variables of a degrading system (such as maintenance time and maintenance frequency) according to principles such as cost minimization and safety maximization, under the assumption of several different maintenance degrees (such as full maintenance and partial maintenance). Because the optimization objectives, such as maintenance cost and the remaining life of the structure, often constrain one another, conventional maintenance decision problems are usually cast as multi-objective optimization problems, for which dynamic programming, genetic algorithms and particle swarm optimization are widely used. However, these methods are limited by the high-dimensional decision variables of multi-component systems and by their nature as static optimization methods; they have difficulty handling long-term, non-stationary sequential decision problems, especially those involving multiple states and multiple actions.

Disclosure of Invention

The present invention is directed to solving, at least in part, one of the technical problems in the related art.

Therefore, the invention aims to provide an intelligent bridge maintenance decision method based on deep reinforcement learning and system reliability.

The second purpose of the invention is to provide an intelligent bridge maintenance decision system based on deep reinforcement learning and system reliability.

A third object of the invention is to propose a computer device.

A fourth object of the invention is to propose a non-transitory computer-readable storage medium.

To achieve the above objects, an embodiment of a first aspect of the present invention provides a bridge intelligent maintenance decision method based on deep reinforcement learning and system reliability, including: S1, constructing a redundancy system model of the whole bridge deck, decomposing it into a series of small-scale local bridge decks, and calculating the reliability probability and reliability index of the whole bridge deck system based on the failure probabilities of the local bridge decks and redundancy-system reliability theory; S2, designing a comprehensive reward function based on maintenance cost and safety cost according to the reliability index, so as to establish a deep-reinforcement-learning-based maintenance decision network model for the whole bridge deck; and S3, training this model until convergence, and inputting the system reliability index, the local reliability indices and the service time of the bridge deck system into the trained model to obtain the bridge maintenance action.

The bridge intelligent maintenance decision method based on deep reinforcement learning and system reliability according to the embodiment of the invention models the full-bridge integral deck as a redundant system, decomposes the full-bridge orthotropic deck into a series of small-scale local bridge decks, and calculates the system reliability of the integral deck from the failure probabilities of the local bridge decks and redundant-system reliability theory. Taking the local bridge deck as the minimum maintenance unit, it designs a comprehensive reward function based on maintenance cost and safety cost and establishes a deep-reinforcement-learning-based maintenance decision network model for the integral deck. Finally, it takes the system reliability index, the local reliability indices and the service time of the deck system as the input of the maintenance decision network model and, on the premise that maintenance priority is inversely proportional to the local reliability index, simplifies the maintenance action to the number of local bridge decks to be maintained. This realizes intelligent bridge maintenance decisions based on deep reinforcement learning and overcomes the inability of traditional methods to perform multi-scale evaluation for whole-bridge multi-component maintenance decisions.

In addition, the bridge intelligent maintenance decision method based on the deep reinforcement learning and the system reliability according to the embodiment of the invention can also have the following additional technical features:

Further, in an embodiment of the present invention, step S1 specifically includes: step S101, establishing the integral bridge deck redundancy system model and decomposing it into a series of small-scale local bridge decks; step S102, designing the system failure criterion of the whole bridge deck based on its decomposition into a series of small-scale local bridge decks; step S103, constructing a system state space according to the system failure criterion; step S104, calculating a system state transition probability matrix according to the system state space; and step S105, calculating the reliability probability and reliability index of the whole bridge deck system according to the system state space.

Further, in one embodiment of the present invention, the system failure criterion is as follows: in a two-dimensional regular system composed of n rows and m columns of cells, the system fails if k or more cells fail within any window of r consecutive rows and s consecutive columns.

Further, in an embodiment of the present invention, the system state space is:

$$\Omega = S \cup F,\quad S = \{S_1, S_2, \ldots, S_{d_s}\},\quad S_i = [\lambda_{ij}]_{n \times (s-1)},\quad \lambda_{ij} \in \{0, 1\}$$

wherein $[\lambda_{ij}]_{n \times (s-1)}$ is a subsystem consisting of the local bridge decks in n rows and (s-1) columns; $\lambda_{ij}$ is the state of the local bridge deck in the i-th row and j-th column, where 0 denotes that the local bridge deck is safe and 1 that it has failed; S is the safe-state set of the subsystem, whose i-th element is denoted $S_i$; and F is the failure-state set of the subsystem.

Further, in an embodiment of the present invention, the system state transition probability matrix is:

$$T = \begin{bmatrix} N & C \\ \mathbf{0} & 1 \end{bmatrix}$$

wherein T is the transition probability matrix of the system states; N is the transition probability matrix among the system safe states, with dimension $d_s \times d_s$, and $N_{i,j}$ is the probability of transition from the i-th state to the j-th state in S; C is the probability matrix for transition from the system safe states to the failure state, with dimension $d_s \times 1$, and $C_{i,1}$ is the probability that the i-th state in S transitions to the failure state; $\mathbf{0}$ is a zero matrix of dimension $1 \times d_s$, indicating that the failure state cannot transition to a safe state; and the entry 1 indicates that the failure state can only transition to the failure state.

Further, in an embodiment of the present invention, step S2 specifically includes: step S201, defining the bridge deck system state, including the reliability matrix of the local bridge decks, the system reliability index and the service time; step S202, presetting the maintenance priority of each local bridge deck to be inversely proportional to its local reliability index, and simplifying the maintenance action to the number of local bridge decks to be maintained, so as to define the maintenance action space of the bridge deck system; step S203, defining, based on the maintenance action space of the bridge deck system, a comprehensive reward function that considers both maintenance cost and safety cost; and step S204, establishing the deep-reinforcement-learning-based maintenance decision network model for the integral bridge deck according to the bridge deck system state and the comprehensive reward function.

Further, in an embodiment of the present invention, the maintenance action space of the bridge deck system is:

$$A = [0 : p : \max(th),\ n \times m],\quad 0 \le th \le n \times m,\quad th \bmod p = 0$$

wherein mod is the remainder operation, $\max(th)$ is the largest positive integer that is divisible by p and does not exceed n × m, n × m is the number of local bridge decks, and p is the number of local bridge decks grouped into one maintenance step.

Further, in one embodiment of the present invention, the comprehensive reward function is:

$$\text{Reward} = C_m + C_s$$

$$C_m = -a_{cost} - C_{setup}$$

$$C_s = -\Phi(-\beta_{sys})\, C_{sys} - (\beta_T - \beta_{sys})\, F$$

wherein Reward is the comprehensive reward function; $C_m$ is the maintenance cost; $C_s$ is the safety cost; $a_{cost}$ is the cost corresponding to the maintenance action, $a_{cost} > 0$ and proportional to the number of maintenance units; $C_{setup}$ is the maintenance setup cost; $\beta_{sys}$ is the system reliability index of the whole bridge deck; $\Phi(-\beta_{sys})$ is the system failure probability of the whole bridge deck; $C_{sys}$ is the system failure cost of the whole bridge deck; $\beta_T$ is the target system reliability index of the whole bridge deck; and F is a penalty coefficient.

In order to achieve the above object, an embodiment of a second aspect of the present invention provides an intelligent bridge maintenance decision system based on deep reinforcement learning and system reliability, including: the computing module is used for constructing a redundancy system model of the whole bridge deck and decomposing the redundancy system model into a series of small-scale local bridge decks, and computing the reliability probability and reliability index of the whole bridge deck system based on the failure probability of the local bridge decks and the reliability theory of the redundancy system; the construction module is used for designing a comprehensive reward function based on maintenance cost and safety cost according to the reliability index so as to establish an integral bridge deck maintenance decision network model based on deep reinforcement learning; and the training and output module is used for training the integral bridge deck maintenance decision network model based on the deep reinforcement learning until convergence, and inputting the reliability index, the local reliability index and the service time of the bridge deck system into the trained integral bridge deck maintenance decision network model based on the deep reinforcement learning to obtain a bridge maintenance action result.

The bridge intelligent maintenance decision system based on deep reinforcement learning and system reliability according to the embodiment of the invention models the full-bridge integral deck as a redundant system, decomposes the full-bridge orthotropic deck into a series of small-scale local bridge decks, and calculates the system reliability of the integral deck from the failure probabilities of the local bridge decks and redundant-system reliability theory. Taking the local bridge deck as the minimum maintenance unit, it designs a comprehensive reward function based on maintenance cost and safety cost and establishes a deep-reinforcement-learning-based maintenance decision network model for the integral deck. Finally, the system reliability index, the local reliability indices and the service time of the deck system are used as the input of the maintenance decision network model and, on the premise that maintenance priority is inversely proportional to the local reliability index, the maintenance action is simplified to the number of local bridge decks to be maintained. This realizes intelligent bridge maintenance decisions based on deep reinforcement learning and overcomes the inability of traditional methods to perform multi-scale evaluation for whole-bridge multi-component maintenance decisions.

To achieve the above objects, an embodiment of a third aspect of the present invention provides a computer device including a memory and a processor, wherein the memory stores a computer program, and the processor implements the steps of any one of the above methods when executing the computer program.

To achieve the above objects, an embodiment of a fourth aspect of the present invention provides a non-transitory computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the method described above.

Additional aspects and advantages of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.

Drawings

The foregoing and/or additional aspects and advantages of the present invention will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:

FIG. 1 is a flowchart of a bridge intelligent maintenance decision method based on deep reinforcement learning and system reliability according to an embodiment of the present invention;

FIG. 2 is an overall flowchart of a bridge intelligent maintenance decision method based on deep reinforcement learning and system reliability according to an embodiment of the present invention;

FIG. 3 is a schematic diagram of the partitioning method of the integral bridge deck of the main girder of the bridge and the modeling of the redundancy system according to one embodiment of the present invention;

FIG. 4 is a schematic diagram illustrating the evolution process of the state space of the integrated bridge deck redundancy system according to an embodiment of the present invention;

FIG. 5 is a diagram of a deep Q-network model architecture for a repair optimization decision for a bridge deck system according to an embodiment of the present invention;

FIG. 6 is a schematic structural diagram of a bridge intelligent maintenance decision system based on deep reinforcement learning and system reliability according to an embodiment of the present invention.

Detailed Description

Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein the same or similar reference numerals denote the same or similar elements, or elements having the same or similar functions, throughout. The embodiments described below with reference to the drawings are exemplary, are intended to explain the invention, and are not to be construed as limiting the invention.

The following describes a bridge intelligent maintenance decision method and system based on deep reinforcement learning and system reliability according to an embodiment of the invention with reference to the accompanying drawings.

It should be noted that maintenance decision-making based on deep reinforcement learning is a dynamic method that enables end-to-end decisions directly from the system degradation state to the maintenance action. With a properly designed value-function network, it is suitable for high-dimensional problems. In addition, the reinforcement learning framework facilitates handling correlations between components, such as stochastic and economic correlations. Combining deep reinforcement learning with state-based maintenance decisions therefore has great development potential. Accordingly, the embodiment of the invention adopts deep reinforcement learning to construct an intelligent bridge maintenance decision method, described in detail below.

Fig. 1 is a flowchart of a bridge intelligent maintenance decision method based on deep reinforcement learning and system reliability according to an embodiment of the present invention.

As shown in FIGS. 1 and 2, the bridge intelligent maintenance decision method based on deep reinforcement learning and system reliability includes the following steps:

In step S1, an integral bridge deck redundancy system model is constructed and decomposed into a series of small-scale local bridge decks, and the reliability probability and reliability index of the integral bridge deck system are calculated based on the failure probabilities of the local bridge decks and redundancy-system reliability theory.

Further, in an embodiment of the present invention, step S1 specifically includes:

Step S101, establishing a whole bridge deck redundancy system model, and decomposing the model into a series of small-scale local bridge decks.

Specifically, the whole bridge deck is divided into a series of small-scale local bridge decks, and the whole bridge deck system is abstracted as a redundant system whose members are the local bridge decks. The passage of a vehicle over the deck is regarded as a rectangular window sweeping over a regular grid, where the size of the window is determined by the area covered by the moving vehicle. In this way, the multi-scale problem of calculating the overall reliability of the deck is converted into the problem of calculating the reliability of the subsystems formed by the local bridge decks contained in the rectangular window. The division of the whole bridge deck must also account for the efficiency of the reliability analysis of the local bridge decks. On the one hand, the larger the local bridge deck, the higher the cost of finite element analysis, the larger the random-variable space and the slower the convergence of the reliability analysis. On the other hand, the smaller the local bridge decks, the larger their number and the less efficient the calculation of system reliability.
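As a hedged illustration of how the sweeping window could be sized and counted, the sketch below assumes (none of this is stated in the patent) that rows run across the deck width and columns along the span, that the local decks are uniform rectangles, and that the window is the smallest whole number of local decks covering the vehicle footprint. The function names and the example dimensions are hypothetical.

```python
import math


def window_size_in_cells(vehicle_width, vehicle_length, cell_width, cell_length):
    """Hypothetical sizing of the r x s sweeping window, in local decks.

    Assumes rows run across the deck width and columns along the span, and
    rounds up so that the window fully covers the vehicle footprint.
    """
    r = math.ceil(vehicle_width / cell_width)
    s = math.ceil(vehicle_length / cell_length)
    return r, s


def num_window_positions(n, m, r, s):
    """Number of distinct r x s window positions on an n x m grid of local decks."""
    if r > n or s > m:
        return 0
    return (n - r + 1) * (m - s + 1)


# Illustrative numbers: a 2.5 m x 12 m footprint on 1.25 m x 4 m local decks, 4 x 10 grid.
r, s = window_size_in_cells(2.5, 12.0, 1.25, 4.0)
print(r, s, num_window_positions(4, 10, r, s))  # 2 3 24
```

Under these assumptions, finer local decks increase both r and s and the number of window positions, which is one concrete face of the efficiency trade-off described above.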

Step S102, designing the system failure criterion of the whole bridge deck based on its decomposition into a series of small-scale local bridge decks.

Specifically, the system failure criterion of the whole bridge deck is designed as follows: in a two-dimensional regular system consisting of n rows and m columns of units, the system fails if k or more units fail within any window of r consecutive rows and s consecutive columns. As shown in FIG. 3, the whole bridge deck is divided into n rows and m columns, giving a total of n × m local bridge decks.
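The criterion above can be checked directly on a 0/1 state grid. The following Python sketch is a brute-force illustration (the function name `system_fails` is hypothetical, not from the patent), useful for small grids or for validating a faster reliability calculation.

```python
import numpy as np


def system_fails(states, r, s, k):
    """Window failure criterion for an n x m grid of local bridge decks.

    states: n x m array of 0/1 values, where 1 marks a failed local deck.
    Returns True if any window of r consecutive rows and s consecutive
    columns contains k or more failed local decks.
    """
    grid = np.asarray(states, dtype=int)
    n, m = grid.shape
    for i in range(n - r + 1):
        for j in range(m - s + 1):
            if grid[i:i + r, j:j + s].sum() >= k:
                return True
    return False


grid = [[1, 0, 0, 1],
        [0, 1, 0, 0],
        [0, 0, 0, 1]]
print(system_fails(grid, r=2, s=2, k=2))  # True: top-left 2 x 2 window has 2 failures
print(system_fails(grid, r=2, s=2, k=3))  # False: no 2 x 2 window has 3 failures
```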

Step S103, constructing a system state space according to the system failure criterion, as follows:

$$\Omega = S \cup F,\quad S = \{S_1, S_2, \ldots, S_{d_s}\},\quad S_i = [\lambda_{ij}]_{n \times (s-1)},\quad \lambda_{ij} \in \{0, 1\} \tag{1}$$

wherein $[\lambda_{ij}]_{n \times (s-1)}$ is a subsystem consisting of the local bridge decks in n rows and (s-1) columns; $\lambda_{ij}$ is the state of the local bridge deck in the i-th row and j-th column, where 0 denotes that the local bridge deck is safe and 1 that it has failed; S is the safe-state set of the subsystem, whose i-th element is denoted $S_i$; and F is the failure-state set of the subsystem.

Specifically, as shown in FIG. 4, the state space of the whole-deck redundant system is divided into safe states and the failure state, and a large-scale system is regarded as evolving from a small-scale system along one dimension, one column at a time. According to the redundant-system failure criterion of step S102, if the original system has failed, the new system fails regardless of the state of the newly added column of local bridge decks; if the original system is safe, whether the new system fails depends on the combined states of the newly added components and the components in the last s-1 columns of the original system. That is, even if the original system is safe, the new system fails if a failure window appears once a column of components is added to the original system.

Step S104, calculating a system state transition probability matrix according to the system state space, expressed in block-matrix form:

$$T = \begin{bmatrix} N & C \\ \mathbf{0} & 1 \end{bmatrix} \tag{2}$$

wherein T is the transition probability matrix of the system states; N is the transition probability matrix among the system safe states, with dimension $d_s \times d_s$, and $N_{i,j}$ is the probability of transition from the i-th state to the j-th state in S; C is the probability matrix for transition from the system safe states to the failure state, with dimension $d_s \times 1$, and $C_{i,1}$ is the probability that the i-th state in S transitions to the failure state; $\mathbf{0}$ is a zero matrix of dimension $1 \times d_s$, indicating that the failure state cannot transition to a safe state; and the entry 1 indicates that the failure state can only transition to the failure state.

Since the failure state of the system can only transition to the failure state, while a safe state may transition either to the failure state or to another safe state depending on the states of the local bridge decks in the newly added column, only the transitions among the safe states of the system, i.e., the N matrix in formula (2), need to be calculated in order to improve computational efficiency. Because each row of T sums to 1, C then follows as $C_{i,1} = 1 - \sum_{j} N_{i,j}$.

Denote the failure probabilities of the local bridge decks in the newly added column as $[p_{f,1}, p_{f,2}, \ldots, p_{f,n}]^T$. The probability $N_{i,j}$ is calculated as follows:

(1) If columns 2 to s-1 of $S_i$ differ from columns 1 to s-2 of $S_j$, state $S_i$ cannot transition to $S_j$, and $N_{i,j} = 0$;

(2) If appending the last column of $S_j$ to $S_i$ satisfies the failure criterion, state $S_i$ cannot transition to $S_j$, and $N_{i,j} = 0$;

(3) If neither of the above conditions holds, state $S_i$ can transition to $S_j$, and the transition probability is:

$$N_{i,j} = \prod_{h=1}^{n} p_{f,h}^{\lambda_h} (1 - p_{f,h})^{1 - \lambda_h} \tag{3}$$

where $p_{f,h}$ is the failure probability of the h-th local bridge deck in the newly added column, and $\lambda_h$ is the state of the h-th local bridge deck in the newly added column; $\lambda_h = 1$ and $\lambda_h = 0$ denote the failure state and the safe state, respectively.
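Rules (1) to (3) can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the patent's code: local-deck states are independent, s ≥ 2, and the safe-state set is taken to be all $2^{n(s-1)}$ 0/1 configurations of the retained n × (s-1) subsystem (a window of width s cannot fit in s-1 columns, so no configuration violates the criterion by itself). The names `window_fails`, `safe_states` and `transition_matrix` are hypothetical.

```python
import itertools

import numpy as np


def window_fails(block, r, k):
    """True if any r consecutive rows of an n x s block contain k or more failures."""
    n = block.shape[0]
    return any(block[i:i + r].sum() >= k for i in range(n - r + 1))


def safe_states(n, s):
    """All 0/1 configurations of the retained n x (s-1) subsystem (assumes s >= 2)."""
    return [np.array(bits, dtype=int).reshape(n, s - 1)
            for bits in itertools.product((0, 1), repeat=n * (s - 1))]


def transition_matrix(states, pf_new, r, k):
    """Safe-to-safe transition matrix N for one newly added column.

    pf_new: failure probabilities [p_f1, ..., p_fn] of the new column's local decks.
    """
    pf = np.asarray(pf_new, dtype=float)
    d = len(states)
    N = np.zeros((d, d))
    for i, Si in enumerate(states):
        for j, Sj in enumerate(states):
            # Rule (1): columns 2..s-1 of S_i must equal columns 1..s-2 of S_j.
            if not np.array_equal(Si[:, 1:], Sj[:, :-1]):
                continue
            # Rule (2): appending S_j's last column must not create a failing window.
            if window_fails(np.hstack([Si, Sj[:, -1:]]), r, k):
                continue
            # Rule (3): Eq. (3), product of independent Bernoulli probabilities.
            lam = Sj[:, -1]
            N[i, j] = np.prod(np.where(lam == 1, pf, 1 - pf))
    return N


states = safe_states(n=2, s=2)
N = transition_matrix(states, [0.1, 0.2], r=2, k=2)
C = 1 - N.sum(axis=1)  # probability of moving to the failure state
print(C)
```

Only N is built explicitly; C is recovered from the row sums, mirroring the efficiency argument in the text.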

Step S105, calculating the reliability probability and reliability index of the whole bridge deck system according to the system state space.

Specifically, the initial subsystem consists of the local bridge decks in n rows and s-1 columns. The probabilities of the states in the safe-state set S are recorded as the matrix $\xi$, with dimension $1 \times d_s$. The elements of $\xi$ are calculated from the failure probabilities of the local bridge decks:

$$\xi_{1,h} = \prod_{i=1}^{n} \prod_{j=1}^{s-1} p_{f,i,j}^{\lambda_{h,i,j}} (1 - p_{f,i,j})^{1 - \lambda_{h,i,j}} \tag{4}$$

where $\xi_{1,h}$ is the probability that the initial subsystem is in state $S_h$; $\lambda_{h,i,j}$ is the state of the local bridge deck in row i, column j of state $S_h$; and $p_{f,i,j}$ is the failure probability of the local bridge deck in row i, column j of the initial subsystem.

The initial subsystem expands into the complete system after m-s+1 columns are added, i.e., its state is transferred m-s+1 times. The reliability index of the whole bridge deck system is then:

$$\beta = \Phi^{-1}(p_s),\quad p_s = \xi N^{m-s+1} \mathbf{1} \tag{5}$$

where $\beta$ is the reliability index of the whole bridge deck system; $\Phi$ is the cumulative distribution function of the standard normal distribution; $p_s$ is the safety probability of the whole bridge deck system; $\xi N^{m-s+1}$ gives the probability that the system is in each safe state in S after m-s+1 transfers; and $\mathbf{1}$ is a $d_s \times 1$ vector whose elements are all 1.
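Equations (3) to (5) can be combined into one self-contained Python sketch. It assumes independent local decks, 2 ≤ s ≤ m, and the full $2^{n(s-1)}$ safe-state set, so it is only practical for small n(s-1). Formula (5) uses a single N, which corresponds to every added column having the same failure probabilities; as an assumption beyond the text, this sketch rebuilds N for each added column so that column-varying $p_f$ can be handled, and it reduces to $\xi N^{m-s+1}\mathbf{1}$ when all added columns are identical. The function name `system_reliability` is hypothetical.

```python
import itertools
from statistics import NormalDist

import numpy as np


def system_reliability(pf, r, s, k):
    """Column-sweep Markov-chain reliability of an n x m deck (sketch).

    pf: n x m failure probabilities of the local decks, assumed independent.
    Failure criterion: k or more failures in any r x s window. Assumes 2 <= s <= m.
    Returns (p_s, beta) with beta = Phi^-1(p_s).
    """
    pf = np.asarray(pf, dtype=float)
    n, m = pf.shape
    w = s - 1  # width of the retained subsystem
    states = [np.array(b, dtype=int).reshape(n, w)
              for b in itertools.product((0, 1), repeat=n * w)]

    # Eq. (4): probability of each initial state from the first s-1 columns.
    p0 = pf[:, :w]
    xi = np.array([np.prod(np.where(S == 1, p0, 1 - p0)) for S in states])

    def new_window_fails(block):
        # block is n x s; check every group of r consecutive rows.
        return any(block[i:i + r].sum() >= k for i in range(n - r + 1))

    # Eq. (5): one transfer per added column (m - s + 1 transfers in total).
    for c in range(w, m):
        col = pf[:, c]
        N = np.zeros((len(states), len(states)))
        for i, Si in enumerate(states):
            for j, Sj in enumerate(states):
                if not np.array_equal(Si[:, 1:], Sj[:, :-1]):
                    continue  # rule (1): overlapping columns must agree
                if new_window_fails(np.hstack([Si, Sj[:, -1:]])):
                    continue  # rule (2): transition leads to system failure
                lam = Sj[:, -1]
                N[i, j] = np.prod(np.where(lam == 1, col, 1 - col))  # Eq. (3)
        xi = xi @ N
    p_s = float(xi.sum())  # xi N^(m-s+1) 1
    return p_s, NormalDist().inv_cdf(p_s)


pf = np.full((2, 6), 0.05)
p_s, beta = system_reliability(pf, r=2, s=2, k=2)
print(round(p_s, 6), round(beta, 3))
```

Because every r × s window is checked exactly once, when its last column is added, the result should match brute-force enumeration of all $2^{nm}$ deck states on small grids.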

In step S2, a comprehensive reward function based on maintenance cost and safety cost is designed according to the reliability index so as to establish an integral bridge deck maintenance decision network model based on deep reinforcement learning.

It should be noted that the deep-reinforcement-learning-based maintenance decision network model for the integral bridge deck takes the state of the bridge deck system as input and directly outputs maintenance actions. The local bridge deck is the minimum maintenance unit, and the goal of integral-deck maintenance decision optimization is to ensure the reliability of the bridge deck system during its service life at minimum cost.

Further, in an embodiment of the present invention, step S2 specifically includes:

Step S201, defining the bridge deck system state, including the reliability matrix of the local bridge decks, the system reliability index and the service time.

Specifically, the defined bridge deck system state comprises a reliability matrix of a local bridge deck, a system reliability index and service time, wherein the dimension of the local bridge deck reliability matrix is n × m, and the system reliability index and the service time are scalars.
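One way to feed this state to a fully connected value network is to flatten it into a single vector of length n × m + 2. The sketch below assumes row-major flattening of the local reliability indices followed by the two scalars; this layout and the name `build_state` are hypothetical, since the patent does not specify how the inputs are arranged.

```python
import numpy as np


def build_state(beta_local, beta_sys, t):
    """Flatten the deck-system state into one input vector (hypothetical layout).

    beta_local: n x m matrix of local reliability indices (row-major flattening).
    beta_sys:   scalar system reliability index.
    t:          scalar service time.
    """
    beta_local = np.asarray(beta_local, dtype=float)
    return np.concatenate([beta_local.ravel(), [float(beta_sys), float(t)]])


state = build_state(np.full((2, 3), 4.2), beta_sys=3.9, t=10)
print(state.shape)  # (8,)
```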

Step S202, presetting the maintenance priority of each local bridge deck to be inversely proportional to its local reliability index, and simplifying the maintenance action to the number of local bridge decks to be maintained, so as to define the maintenance action space of the bridge deck system.

Specifically, since the whole bridge deck is divided into n × m local bridge decks, each with two maintenance actions (maintain or not maintain), the actual size of the maintenance action space is $2^{nm}$. Such a high-dimensional action space cannot be modeled directly. When the unit maintenance cost is the same for all units, it is generally most economical to maintain the units in the worst condition first. Therefore, the maintenance priority of each local bridge deck is set to be inversely proportional to its local reliability index, and the maintenance action is simplified to the number of local bridge decks to be maintained; the corresponding action space becomes $A = [0, 1, \ldots, n \times m]$, with dimension n × m + 1. However, an (n × m + 1)-dimensional action space is still large, and the decision network easily converges to a local optimum during reinforcement learning training. Therefore, p local bridge decks are further taken as a group to reduce the action space:

$$A = [0:p:\max(th),\ n\times m],\quad 0 \le th \le n\times m,\quad th \bmod p = 0 \qquad (6)$$

where [0:p:max(th)] denotes the sequence 0, p, 2p, …, max(th), to which n × m is appended; mod denotes the remainder operation; and max(th) denotes the largest multiple of p that does not exceed n × m.
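The grouped action space and the priority rule can be sketched in Python as follows. The function names, listing n × m only once when p divides n × m, and the tie-breaking order are assumptions; the text specifies only the set A and that priority is inversely related to the local reliability index. Under that reading, an action a in A is decoded by maintaining the a local decks with the lowest local reliability indices.

```python
from typing import List, Sequence, Tuple


def action_space(n: int, m: int, p: int) -> List[int]:
    """Grouped action space A = [0:p:max(th), n*m].

    Each action is a number of local decks to maintain: 0, p, 2p, ..., max(th),
    where max(th) is the largest multiple of p not exceeding n*m, followed by
    n*m (maintain everything). n*m is listed once even if p divides it.
    """
    if min(n, m, p) <= 0:
        raise ValueError("n, m and p must be positive integers")
    total = n * m
    max_th = (total // p) * p
    actions = list(range(0, max_th + 1, p))
    if actions[-1] != total:
        actions.append(total)
    return actions


def decks_to_maintain(local_beta: Sequence[Sequence[float]],
                      count: int) -> List[Tuple[int, int]]:
    """Pick `count` decks, lowest local reliability index first.

    Ties are broken by row-major position (an arbitrary choice).
    """
    cells = sorted((beta, i, j)
                   for i, row in enumerate(local_beta)
                   for j, beta in enumerate(row))
    if not 0 <= count <= len(cells):
        raise ValueError("count must lie between 0 and n*m")
    return [(i, j) for _, i, j in cells[:count]]


A = action_space(4, 5, 3)
print(A)       # [0, 3, 6, 9, 12, 15, 18, 20]
print(len(A))  # 8, versus 4*5 + 1 = 21 ungrouped and 2**20 = 1048576 raw
print(decks_to_maintain([[3.1, 2.8], [3.5, 2.2]], 2))  # [(1, 1), (0, 1)]
```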

Step S203, defining a comprehensive reward function considering maintenance cost and safety cost simultaneously based on the maintenance action space of the bridge deck system.

It should be noted that, since an objective of the embodiments of the present invention is to minimize the cost of maintaining the bridge deck system over its life cycle, the reward function is designed according to cost. The cost of the bridge over its service period consists of a maintenance cost and a safety cost, which constrain each other: when more is spent on maintenance, the structure is safer and the safety cost is lower; when maintenance spending is reduced, the structure may fail, which can increase the safety cost. Therefore, the embodiment of the invention designs a comprehensive reward function that considers both the maintenance cost and the safety cost:

$$\mathrm{Reward} = C_m + C_s,\qquad C_m = -a_{cost} - C_{setup},\qquad C_s = -\Phi(-\beta_{sys})\,C_{sys} - (\beta_T - \beta_{sys})\,F$$

wherein Reward denotes the reward function, C_m the maintenance cost and C_s the safety cost. a_cost denotes the cost of the maintenance action; it is positive and proportional to the number of maintained units, and C_setup denotes the maintenance start-up cost. β_sys denotes the system reliability index of the whole bridge deck, Φ(-β_sys) the system failure probability of the whole bridge deck (Φ being the standard normal cumulative distribution function), C_sys the system failure cost of the whole bridge deck, and β_T the target system reliability index of the whole bridge deck. A penalty is imposed when the system reliability index falls below the target reliability index, where F denotes the penalty coefficient: the larger F is, the more heavily a system reliability index below the target is penalized.
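A direct Python transcription of the comprehensive reward is sketched below. Two points are assumptions rather than statements of the patent: a_cost is taken as unit_cost times the number of maintained decks, and by default the start-up cost C_setup is charged only when at least one deck is maintained (setup_only_if_maintaining=False charges it on every step, as the formula is literally written). The formula as written also yields a positive term when β_sys exceeds β_T; because the text describes F as a penalty for falling below the target, the hypothetical clip_penalty flag optionally applies the term only in that case.

```python
from statistics import NormalDist


def reward(num_maintained: int, beta_sys: float, *, unit_cost: float,
           setup_cost: float, failure_cost: float, beta_target: float,
           penalty: float, setup_only_if_maintaining: bool = True,
           clip_penalty: bool = False) -> float:
    """Comprehensive reward Reward = C_m + C_s (costs enter as negative values)."""
    # Maintenance cost C_m = -a_cost - C_setup, with a_cost proportional to the
    # number of maintained decks (assumed: unit_cost per deck).
    a_cost = unit_cost * num_maintained
    charge_setup = num_maintained > 0 or not setup_only_if_maintaining
    c_m = -a_cost - (setup_cost if charge_setup else 0.0)

    # Safety cost C_s = -Phi(-beta_sys) * C_sys - (beta_T - beta_sys) * F,
    # where Phi is the standard normal CDF, so Phi(-beta_sys) is P_f.
    p_fail = NormalDist().cdf(-beta_sys)
    shortfall = beta_target - beta_sys
    if clip_penalty:
        # Optional reading: penalize only when beta_sys falls below beta_T.
        shortfall = max(0.0, shortfall)
    c_s = -p_fail * failure_cost - shortfall * penalty
    return c_m + c_s


r = reward(3, 2.5, unit_cost=2.0, setup_cost=5.0, failure_cost=0.0,
           beta_target=3.0, penalty=10.0)
print(r)  # -16.0
```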

Step S204, establishing the overall bridge deck maintenance decision network model based on deep reinforcement learning according to the bridge deck system state and the comprehensive reward function.

Specifically, as shown in FIG. 5, [β_ij]_{n×m} denotes the reliability matrix of the local bridge decks, with dimension n × m; [T_ij]_{n×m} denotes the service time of each local bridge deck since its most recent maintenance, with dimension n × m; β_sys denotes the system reliability index of the whole bridge deck; T_sys denotes the service time of the bridge; and N_action denotes the number of maintenance actions.
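FIG. 5 is not reproduced here, so the layer structure below is an assumption: a single fully connected hidden layer with ReLU, taking the flattened [β_ij], [T_ij], β_sys and T_sys as input and producing one score per maintenance action, with the highest-scoring action taken as the decision. The class name MaintenanceQNet and its random initialization are hypothetical, and a practical model would normally be built with a deep-learning framework.

```python
import random
from typing import List, Sequence


class MaintenanceQNet:
    """Toy fully connected network: deck state -> one score per maintenance action."""

    def __init__(self, n: int, m: int, n_action: int, hidden: int = 32, seed: int = 0):
        rng = random.Random(seed)
        self.in_dim = 2 * n * m + 2  # [beta_ij], [T_ij], beta_sys, T_sys
        std1 = (2.0 / self.in_dim) ** 0.5  # He-style scaling for ReLU layers
        std2 = (2.0 / hidden) ** 0.5
        self.w1 = [[rng.gauss(0.0, std1) for _ in range(self.in_dim)] for _ in range(hidden)]
        self.b1 = [0.0] * hidden
        self.w2 = [[rng.gauss(0.0, std2) for _ in range(hidden)] for _ in range(n_action)]
        self.b2 = [0.0] * n_action

    @staticmethod
    def _affine(w, b, x):
        # y = W x + b, computed row by row
        return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]

    def forward(self, beta: Sequence[Sequence[float]], t_local: Sequence[Sequence[float]],
                beta_sys: float, t_sys: float) -> List[float]:
        x = ([v for row in beta for v in row]
             + [v for row in t_local for v in row]
             + [beta_sys, t_sys])
        if len(x) != self.in_dim:
            raise ValueError("state size does not match the network input size")
        h = [max(0.0, v) for v in self._affine(self.w1, self.b1, x)]  # ReLU
        return self._affine(self.w2, self.b2, h)

    def act(self, beta, t_local, beta_sys, t_sys) -> int:
        q = self.forward(beta, t_local, beta_sys, t_sys)
        return max(range(len(q)), key=q.__getitem__)  # index of the best-scoring action


net = MaintenanceQNet(n=2, m=2, n_action=5)
scores = net.forward([[3.0, 2.5], [3.2, 2.0]], [[5, 1], [0, 3]], 2.8, 20.0)
print(len(scores), net.act([[3.0, 2.5], [3.2, 2.0]], [[5, 1], [0, 3]], 2.8, 20.0))
```

In this sketch N_action would equal the number of entries in the grouped action space A of Equation (6), and the returned index selects one entry of A.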

In step S3, the deep-reinforcement-learning-based overall bridge deck maintenance decision network model is trained until convergence, and the system reliability index, the local reliability indexes and the service time of the bridge deck system are input into the trained model to obtain a bridge maintenance action result.
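The text does not name the deep reinforcement learning algorithm used in step S3. If a DQN-style value-based method were used (an assumption), each training step would choose actions ε-greedily and regress the network's score for the chosen action toward a one-step temporal-difference target; one practical convergence check is that this loss and the episode reward stop changing. The helpers below, with hypothetical names, show only those pieces.

```python
import random
from typing import Sequence


def epsilon_greedy(q_values: Sequence[float], epsilon: float, rng: random.Random) -> int:
    """With probability epsilon pick a random action index, otherwise the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def td_target(reward: float, next_q_values: Sequence[float],
              gamma: float, done: bool) -> float:
    """One-step Q-learning target y = r + gamma * max_a' Q(s', a').

    No bootstrapping at the end of the service period (done=True).
    """
    if done:
        return reward
    return reward + gamma * max(next_q_values)


def td_loss(q_taken: float, target: float) -> float:
    """Squared TD error minimized with respect to the network weights."""
    return 0.5 * (q_taken - target) ** 2


rng = random.Random(0)
a = epsilon_greedy([1.0, 5.0, 2.0], epsilon=0.0, rng=rng)  # greedy -> index 1
y = td_target(-10.0, [1.0, 3.0, 2.0], gamma=0.9, done=False)  # -10 + 0.9 * 3
print(a, round(y, 6), td_loss(-8.0, y))
```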

In summary, compared with the conventional bridge maintenance decision-making technology, the bridge intelligent maintenance decision-making method based on deep reinforcement learning and system reliability provided by the embodiment of the invention has the following effects:

(1) The whole bridge deck is modeled as a redundant system that accounts for how the reliability of different local bridge deck panels affects the system reliability, which addresses the inability of conventional methods to evaluate maintenance decisions for multiple bridge components at multiple scales;

(2) The bridge maintenance decision network takes variables such as the system reliability of the whole bridge deck, the reliability matrix of the local bridge decks and the service time as input; compared with conventional methods, it better reflects how the service condition of a bridge degrades over time and is better suited to practical engineering applications;

(3) A simplified system maintenance action space is designed by establishing the criterion that the maintenance priority of a local component is inversely proportional to its local reliability index; this agrees with engineering practice, greatly reduces the system maintenance action space and improves learning efficiency;

(4) A comprehensive reward function that considers both maintenance cost and safety cost is designed, which better matches engineering practice and avoids the drawback of considering only one of them, which can compromise structural safety or lead to excessive maintenance cost;

(5) A bridge deck maintenance decision method based on deep reinforcement learning is provided that makes end-to-end decisions directly from the current system state, and the model can self-simulate, self-learn, self-evolve and self-update.

The bridge intelligent maintenance decision system based on deep reinforcement learning and system reliability according to the embodiments of the invention is described next with reference to the accompanying drawings.

FIG. 6 shows the bridge intelligent maintenance decision system based on deep reinforcement learning and system reliability according to an embodiment of the present invention.

As shown in FIG. 6, the system 10 includes a calculation module 100, a building module 200, and a training and output module 300.

The calculation module 100 is configured to construct an overall bridge deck redundancy system model, decompose it into a series of small-scale local bridge decks, and calculate the reliability probability and reliability index of the overall bridge deck system based on the failure probability of the local bridge decks and redundancy system reliability theory. The building module 200 is configured to design a comprehensive reward function based on the maintenance cost and the safety cost according to the reliability index, so as to build an overall bridge deck maintenance decision network model based on deep reinforcement learning. The training and output module 300 is configured to train the deep-reinforcement-learning-based overall bridge deck maintenance decision network model until convergence, and to input the system reliability index, the local reliability indexes and the service time of the bridge deck system into the trained model to obtain a bridge maintenance action result.

It should be noted that the foregoing explanation of the embodiments of the bridge intelligent maintenance decision method based on deep reinforcement learning and system reliability also applies to the system of this embodiment; the implementation principle is similar and is not repeated here.

Compared with conventional bridge maintenance decision techniques, the bridge intelligent maintenance decision system based on deep reinforcement learning and system reliability provided by the embodiments of the invention achieves the same effects as the method described above.

In order to implement the foregoing embodiment, the present invention further provides a computer device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and when the processor executes the computer program, the bridge intelligent maintenance decision method based on deep reinforcement learning and system reliability as in the foregoing embodiment is implemented.

In order to implement the foregoing embodiment, the present invention further provides a non-transitory computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the bridge intelligent maintenance decision method based on deep reinforcement learning and system reliability according to the foregoing embodiment.

In the description of this specification, references to "one embodiment," "some embodiments," "an example," "a specific example," "some examples" or the like mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic uses of these terms do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. In addition, provided they do not contradict each other, those skilled in the art may combine the different embodiments or examples described in this specification and the features of those different embodiments or examples.

Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of the indicated technical features. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present invention, "a plurality of" means at least two, e.g., two, three, etc., unless explicitly defined otherwise.

Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing specific logical functions or steps of the process, and alternate implementations are included within the scope of the preferred embodiment of the present invention in which functions may be executed out of the order shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those skilled in the art to which the embodiments of the present invention pertain.

The logic and/or steps represented in the flowcharts or otherwise described herein, e.g., an ordered listing of executable instructions that can be considered to implement logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of the computer-readable medium would include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CDROM). Additionally, the computer-readable medium could even be paper or another suitable medium upon which the program is printed, as the program can be electronically captured, via for instance optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.

It should be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, multiple steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution system. If implemented in hardware, as in another embodiment, any one or a combination of the following techniques known in the art may be used: a discrete logic circuit having logic gate circuits for implementing logic functions on data signals, an application specific integrated circuit having suitable combinational logic gate circuits, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), and the like.

It will be understood by those skilled in the art that all or part of the steps of the methods of the above embodiments may be carried out by a program instructing the relevant hardware; the program may be stored in a computer-readable storage medium and, when executed, performs one of the steps of the method embodiments or a combination thereof.

In addition, functional units in the embodiments of the present invention may be integrated into one processing module, or each unit may exist alone physically, or two or more units are integrated into one module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode. The integrated module, if implemented in the form of a software functional module and sold or used as a separate product, may also be stored in a computer readable storage medium.

The storage medium mentioned above may be a read-only memory, a magnetic or optical disk, etc. Although embodiments of the present invention have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present invention, and that variations, modifications, substitutions and alterations can be made to the above embodiments by those of ordinary skill in the art within the scope of the present invention.

Claims

1. A bridge intelligent maintenance decision method based on deep reinforcement learning and system reliability is characterized by comprising the following steps:

S1, constructing an integral bridge deck redundancy system model, decomposing it into a series of small-scale local bridge decks, and calculating the reliability probability and reliability index of the integral bridge deck system based on the failure probability of the local bridge decks and redundancy system reliability theory;

S2, designing a comprehensive reward function based on maintenance cost and safety cost according to the reliability index, so as to establish an integral bridge deck maintenance decision network model based on deep reinforcement learning, wherein step S2 specifically comprises the following steps:

step S201, defining a bridge deck system state, including a reliability matrix, a reliability index and service time of a local bridge deck;

step S202, presetting the maintenance priority of local bridge decks in inverse proportion to the local reliability indexes of the local bridge decks, and simplifying maintenance actions into the number of the local bridge decks to define a maintenance action space of a bridge deck system;

step S203, defining a comprehensive reward function considering maintenance cost and safety cost simultaneously based on the maintenance action space of the bridge deck system;

step S204, establishing the integral bridge deck maintenance decision network model based on the deep reinforcement learning according to the bridge deck system state and the comprehensive reward function;

and S3, training the integral bridge deck maintenance decision network model based on the deep reinforcement learning until convergence, and inputting the reliability index, the local reliability index and the service time of the bridge deck system into the trained integral bridge deck maintenance decision network model based on the deep reinforcement learning to obtain a bridge maintenance action result.

2. The bridge intelligent maintenance decision method based on deep reinforcement learning and system reliability as claimed in claim 1, wherein the step S1 specifically comprises:

step S101, establishing the integral bridge deck redundancy system model and decomposing the integral bridge deck redundancy system model into a series of small-scale local bridge decks;

step S102, designing a system failure criterion for the whole bridge deck, and decomposing the system into a series of small-scale local bridge decks based on the system failure criterion;

step S103, constructing a system state space according to the system failure criterion;

step S104, calculating a system state transition probability matrix according to the system state space;

3. The bridge intelligent maintenance decision method based on deep reinforcement learning and system reliability as claimed in claim 2, wherein the system failure criterion is that, in a two-dimensional regular system consisting of units arranged in n rows and m columns, the system fails if any block of consecutive units spanning r rows and s columns contains k or more failed units.

4. The bridge intelligent maintenance decision method based on deep reinforcement learning and system reliability according to claim 2, wherein the system state space is:

wherein [λ_ij]_{n×(s-1)} is a subsystem consisting of the local bridge decks in n rows and (s-1) columns; λ_ij is the state of the local bridge deck in row i and column j, λ_ij ∈ {0, 1}, where 0 denotes that the local bridge deck is safe and 1 denotes that it has failed; S is the set of safe states of the subsystem, whose i-th element is denoted S_i; and F is the set of failure states of the subsystem.

5. The bridge intelligent maintenance decision method based on deep reinforcement learning and system reliability as claimed in claim 2, wherein the system state transition probability matrix is:

$$T = \begin{bmatrix} N & C \\ \mathbf{0} & 1 \end{bmatrix}$$

wherein T is the transition probability matrix of the system states; N is the transition probability matrix among the system safe states, with dimension d_s × d_s, where N_{i,j} is the probability of transition from the i-th state in S to the j-th state in S; C is the probability matrix of transition from the system safe states to the failure state, with dimension d_s × 1, where C_{i,1} is the probability that the i-th state in S transitions to the failure state; 0 is a matrix of zeros with dimension 1 × d_s, indicating that the failure state cannot transition to a safe state; and the entry 1 indicates that the failure state can only transition to itself.

6. The bridge intelligent maintenance decision method based on deep reinforcement learning and system reliability as claimed in claim 1, wherein the bridge deck system maintenance action space is:

$$A = [0:p:\max(th),\ n\times m],\quad 0 \le th \le n\times m,\quad th \bmod p = 0$$

7. The bridge intelligent maintenance decision method based on deep reinforcement learning and system reliability as claimed in claim 1, wherein the comprehensive reward function is:

$$\mathrm{Reward} = C_m + C_s$$

$$C_m = -a_{cost} - C_{setup}$$

$$C_s = -\Phi(-\beta_{sys})\,C_{sys} - (\beta_T - \beta_{sys})\,F$$

wherein Reward is the comprehensive reward function, C_m is the maintenance cost, C_s is the safety cost, a_cost is the cost corresponding to the maintenance action, a_cost > 0 and is proportional to the number of maintenance units, C_setup is the maintenance start-up cost, β_sys is the system reliability index of the whole bridge deck, Φ(-β_sys) is the system failure probability of the whole bridge deck, C_sys is the system failure cost of the whole bridge deck, β_T is the target system reliability index of the whole bridge deck, and F is the penalty coefficient.

8. A bridge intelligent maintenance decision system based on deep reinforcement learning and system reliability, characterized by comprising:

the computing module is used for constructing a redundancy system model of the whole bridge deck and decomposing the redundancy system model into a series of small-scale local bridge decks, and computing the reliability probability and reliability index of the whole bridge deck system based on the failure probability of the local bridge decks and the reliability theory of the redundancy system;

the building module is used for designing a comprehensive reward function based on maintenance cost and safety cost according to the reliability index so as to build an integral bridge deck maintenance decision network model based on deep reinforcement learning, and specifically comprises the following steps:

defining the system state of the bridge deck, including the reliability matrix, reliability index and service time of the local bridge deck;

presetting the maintenance priority of each local bridge deck to be inversely proportional to its local reliability index, and simplifying the maintenance action to the number of local bridge decks to be maintained, so as to define the maintenance action space of the bridge deck system;

defining a comprehensive reward function based on the bridge deck system maintenance action space, wherein the comprehensive reward function simultaneously considers maintenance cost and safety cost;

establishing the integral bridge deck maintenance decision network model based on the deep reinforcement learning according to the bridge deck system state and the comprehensive reward function;

and the training and output module is used for training the integral bridge deck maintenance decision network model based on the deep reinforcement learning until convergence, and inputting the reliability index, the local reliability index and the service time of the bridge deck system into the trained integral bridge deck maintenance decision network model based on the deep reinforcement learning to obtain a bridge maintenance action result.

9. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor, when executing the computer program, implements the steps of the method according to any of claims 1-7.

10. A non-transitory computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 7.
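For very small decks, the failure criterion of claim 3 and the reliability quantities used in the reward can be cross-checked by brute force. The Python sketch below assumes that claim 3 means "some contiguous r × s window contains at least k failed local decks" and that local deck failures are mutually independent with probabilities pf[i][j]; it enumerates every state instead of using the Markov-chain formulation, so it is a validation aid rather than the claimed method. It converts the failure probability to β_sys = -Φ⁻¹(P_f), consistent with Φ(-β_sys) being the system failure probability in claim 7. It also includes a generic helper that evaluates the non-absorption probability π0·N^t·1 for a transition matrix of the block form in claim 5; the initial distribution π0 and the number of transitions are inputs the caller must supply, since the mapping from deck columns to transitions is not spelled out in the claims.

```python
from itertools import product
from statistics import NormalDist
from typing import List, Sequence, Tuple


def system_fails(state: Sequence[Sequence[int]], r: int, s: int, k: int) -> bool:
    """Claim 3 criterion, read as: fail if any contiguous r x s window of the
    n x m grid holds k or more failed local decks (state value 1)."""
    n, m = len(state), len(state[0])
    for i in range(n - r + 1):
        for j in range(m - s + 1):
            failed = sum(state[i + di][j + dj] for di in range(r) for dj in range(s))
            if failed >= k:
                return True
    return False


def reliability_bruteforce(pf: Sequence[Sequence[float]], r: int, s: int,
                           k: int) -> Tuple[float, float]:
    """Exact system failure probability and beta_sys by enumerating all 2**(n*m)
    states, assuming independent local failures. Only feasible for tiny grids."""
    n, m = len(pf), len(pf[0])
    flat = [p for row in pf for p in row]
    p_sys = 0.0
    for bits in product((0, 1), repeat=n * m):
        prob = 1.0
        for b, p in zip(bits, flat):
            prob *= p if b else 1.0 - p
        grid = [bits[i * m:(i + 1) * m] for i in range(n)]
        if system_fails(grid, r, s, k):
            p_sys += prob
    if p_sys <= 0.0:
        return 0.0, float("inf")
    if p_sys >= 1.0:
        return 1.0, float("-inf")
    return p_sys, -NormalDist().inv_cdf(p_sys)  # beta_sys = -Phi^{-1}(P_f)


def survival_probability(N: Sequence[Sequence[float]], pi0: Sequence[float],
                         steps: int) -> float:
    """For T = [[N, C], [0, 1]], P(still in a safe state after `steps` transitions)
    = pi0 . N^steps . 1, computed by repeated row-vector products."""
    v: List[float] = list(pi0)
    for _ in range(steps):
        v = [sum(v[i] * N[i][j] for i in range(len(v))) for j in range(len(N[0]))]
    return sum(v)


print(reliability_bruteforce([[0.1, 0.1]], r=1, s=1, k=1))  # series case: P_f ≈ 0.19
print(survival_probability([[0.9]], [1.0], 3))  # 0.9**3 ≈ 0.729
```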