CN112862295A - Bridge and tunnel maintenance autonomous decision-making method based on Q learning - Google Patents
- Publication number: CN112862295A (application number CN202110141634.9A)
- Authority
- CN
- China
- Prior art keywords
- maintenance
- bridge
- tunnel
- road
- health state
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
- G06Q10/063—Operations research, analysis or management
- G06Q10/0639—Performance analysis of employees; Performance analysis of enterprise or organisation operations
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/20—Administration of product repair or maintenance
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/02—Marketing; Price estimation or determination; Fundraising
- G06Q30/0283—Price estimation or determination
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/08—Construction
Abstract
The invention discloses a Q-learning-based autonomous decision-making method for road, bridge and tunnel maintenance. The method comprises the following steps: 1. establishing an index system for evaluating the health state of the road, bridge and tunnel; 2. evaluating the health state of the road, bridge and tunnel according to the index system; 3. acquiring health state data {X_t} of the road, bridge and tunnel, recording the maintenance decisions at the corresponding moments and the cost y_t generated by those decisions, and forming a historical data set of health state and maintenance cost; 4. judging whether a Q learning model exists and whether it needs to be updated; 5. training or updating the Q learning model; 6. using the Q learning model to obtain the maintenance decision a_t from the health state index X_t; 7. executing the maintenance decision a_t, returning to step 2, and reacquiring the health state evaluation X_{t+1} of the road, bridge and tunnel according to the state transition. The method comprehensively considers the health state indexes and the maintenance cost of roads, bridges and tunnels, and realizes maintenance decision-making under the objective of minimizing maintenance cost.
Description
Technical Field
The invention relates to the technical field of road, bridge and tunnel maintenance, and in particular to a Q-learning-based autonomous decision-making method for road, bridge and tunnel maintenance.
Background
Healthy roads, bridges and tunnels are a prerequisite for efficient circulation of goods and smooth public travel. It is therefore of great significance to make maintenance decisions scientifically, so that roads, bridges and tunnels remain in a healthy state.
However, for the owners and operating units of roads, bridges and tunnels, a central concern is how to balance structural health against maintenance cost, that is, how to keep the structures in a healthy state while spending as little as possible on maintenance.
Existing road, bridge and tunnel maintenance decision methods are mostly based on various health state indexes and rarely take the required cost into comprehensive consideration. In practice, weighing the health state of roads, bridges and tunnels against maintenance cost means the final decision often has to rely on the subjective experience and judgment of experts.
Therefore, in road, bridge and tunnel maintenance decision-making, how to save maintenance cost, break free from the restriction of subjective judgment, and realize autonomous decision-making has become a technical problem urgently needing to be solved by those skilled in the art.
Disclosure of Invention
In view of the above defects in the prior art, the invention provides a Q-learning-based autonomous decision-making method that keeps roads, bridges and tunnels in a healthy state at minimum maintenance cost, so as to formulate a suitable maintenance scheme and obtain the maximum economic and social benefits.
In order to achieve the aim, the invention discloses a road bridge and tunnel maintenance autonomous decision-making method based on Q learning; the method comprises the following steps:
step 1, establishing an index system for evaluating the health state of a road, bridge and tunnel;
step 2, evaluating the health state of the road, bridge and tunnel according to the index system, and recording the index vector at time t as X_t; wherein continuous index vectors are discretized;
step 3, collecting health state data {X_t} of the road, bridge and tunnel, recording the maintenance decisions at the corresponding moments and the cost y_t generated by those decisions, and forming a historical data set of health state and maintenance cost; if no maintenance is carried out at moment t, the maintenance cost at moment t in the historical data set is recorded as y_t = 0;
step 4, judging whether a Q learning model exists; if not, entering step 5; if yes, further judging whether the Q learning model is due for a periodic update; if the Q learning model needs to be updated, entering step 5, otherwise entering step 6;
step 5, training or retraining the Q learning model based on the historical data set of health state and maintenance cost of the road, bridge and tunnel;
step 6, using the Q learning model to obtain the maintenance decision a_t from the health state index X_t;
step 7, executing the maintenance decision a_t, returning to step 2, and reacquiring the health state evaluation X_{t+1} of the road, bridge and tunnel according to the state transition.
Preferably, the step 5 comprises the following steps:
step 5.1, establishing a Q table of the health state indexes of the roads, bridges and tunnels and maintenance decisions;
step 5.2, randomly selecting a health state X_0 of the road, bridge and tunnel and starting a new round of training;
step 5.3, for any health state X_i, i ≤ T, where T is the specified upper limit of the decision period, selecting the decision action a_i using an ε-greedy policy;
step 5.4, executing the decision action a_i to obtain a new health state X_{i+1} and the resulting maintenance cost y_i; to minimize the maintenance cost spent within the given decision period, the reward of this decision is recorded as r(X_i, a_i) = -y_i;
step 5.5, updating the Q value as follows:
Q(X_i, a_i) ← (1 - α)Q(X_i, a_i) + α(r(X_i, a_i) + γ max_a Q(X_{i+1}, a));
wherein α is the learning rate, α ∈ [0, 1];
γ is the reward discount factor, γ ∈ [0, 1];
step 5.6, when the number of decisions in the current round does not exceed the upper limit of the decision period, i.e. i ≤ T, performing the state transition X_i ← X_{i+1} and returning to step 5.3; otherwise, returning to step 5.2 and starting a new round of training, until the Q table converges and the training ends.
More preferably, in said step 5.3, the decision action a_i includes: no maintenance, daily maintenance, minor repair, intermediate repair, major repair, and reconstruction and extension.
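The Q-value update of step 5.5 can be sketched as follows. This is an illustrative implementation only, assuming a dict-based Q table keyed by (state, action) pairs with missing entries defaulting to 0; the learning rate alpha and discount factor gamma follow the formula above, and the concrete values are placeholders.

```python
def q_update(Q, state, action, reward, next_state, actions, alpha=0.1, gamma=0.9):
    """Apply Q(X_i, a_i) <- (1-alpha)*Q(X_i, a_i) + alpha*(r + gamma * max_a Q(X_{i+1}, a)).

    Q is a dict keyed by (state, action); unseen entries are treated as 0.0.
    """
    # max over all candidate actions in the successor state (step 5.5)
    best_next = max(Q.get((next_state, a), 0.0) for a in actions)
    old = Q.get((state, action), 0.0)
    Q[(state, action)] = (1 - alpha) * old + alpha * (reward + gamma * best_next)
```

Following step 5.4, the reward passed in would be the negated maintenance cost, r(X_i, a_i) = -y_i.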
The invention has the beneficial effects that:
the method comprehensively considers the health state indexes and the maintenance cost of the roads, bridges and tunnels, and realizes the maintenance decision of the roads, bridges and tunnels under the aim of minimizing the maintenance cost.
According to the method, dependence on human experience is eliminated by establishing an effective Q learning model, enabling autonomous decisions on road, bridge and tunnel maintenance.
The invention sets up a periodic updating mechanism for the Q learning model; through continuous data acquisition and model updating, the decisions become progressively more scientific and reasonable.
The conception, specific structure and technical effects of the present invention will be further described below with reference to the accompanying drawings, so that the objects, features and effects of the invention can be fully understood.
Drawings
FIG. 1 shows a flow chart of an embodiment of the present invention.
FIG. 2 is a diagram illustrating relationships between variables in Q learning model training according to an embodiment of the present invention.
Detailed Description
Examples
As shown in fig. 1, a Q-learning-based autonomous decision-making method for road, bridge and tunnel maintenance comprises the following steps:
step 1, establishing an index system for evaluating the health state of a road, bridge and tunnel;
step 2, evaluating the health state of the road, bridge and tunnel according to the index system, and recording the index vector at time t as X_t; wherein continuous index vectors are discretized;
step 3, collecting health state data {X_t} of the road, bridge and tunnel, recording the maintenance decisions at the corresponding moments and the cost y_t generated by those decisions, and forming a historical data set of health state and maintenance cost; if no maintenance is carried out at moment t, the maintenance cost at moment t in the historical data set is recorded as y_t = 0;
step 4, judging whether a Q learning model exists; if not, entering step 5; if yes, further judging whether the Q learning model is due for a periodic update; if the Q learning model needs to be updated, entering step 5, otherwise entering step 6;
step 5, training or retraining the Q learning model based on the historical data set of health state and maintenance cost of the road, bridge and tunnel;
step 6, using the Q learning model to obtain the maintenance decision a_t from the health state index X_t;
step 7, executing the maintenance decision a_t, returning to step 2, and reacquiring the health state evaluation X_{t+1} of the road, bridge and tunnel according to the state transition.
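Step 6 above can be sketched as a greedy lookup in a trained Q table. This is a minimal illustration, assuming the Q table is a dict keyed by (state, action) with missing entries defaulting to 0.0; the state and action labels are hypothetical examples, not values prescribed by the patent.

```python
def decide(Q, state, actions):
    """Greedy maintenance decision a_t = argmax over a of Q(X_t, a) (step 6)."""
    # max returns the first action among ties, so the action order acts as a tiebreak
    return max(actions, key=lambda a: Q.get((state, a), 0.0))
```

In step 7 the chosen action would then be executed and the health state re-evaluated according to the index system, closing the loop back to step 2.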
The method comprehensively considers the health state indexes and maintenance cost of roads, bridges and tunnels; the Q learning model is updated periodically, so that through continuous data acquisition the decisions become more scientific and reasonable, and maintenance decisions are realized under the objective of minimizing maintenance cost.
As shown in fig. 2, in certain embodiments, step 5 comprises the steps of:
step 5.1, establishing a Q table of the health state indexes of the roads, bridges and tunnels and maintenance decisions;
step 5.2, randomly selecting a health state X_0 of the road, bridge and tunnel and starting a new round of training;
step 5.3, for any health state X_i, i ≤ T, where T is the specified upper limit of the decision period, selecting the decision action a_i using an ε-greedy policy;
step 5.4, executing the decision action a_i to obtain a new health state X_{i+1} and the resulting maintenance cost y_i; to minimize the maintenance cost spent within the given decision period, the reward of this decision is recorded as r(X_i, a_i) = -y_i;
step 5.5, updating the Q value as follows:
Q(X_i, a_i) ← (1 - α)Q(X_i, a_i) + α(r(X_i, a_i) + γ max_a Q(X_{i+1}, a));
wherein α is the learning rate, α ∈ [0, 1];
γ is the reward discount factor, γ ∈ [0, 1];
step 5.6, when the number of decisions in the current round does not exceed the upper limit of the decision period, i.e. i ≤ T, performing the state transition X_i ← X_{i+1} and returning to step 5.3; otherwise, returning to step 5.2 and starting a new round of training, until the Q table converges and the training ends.
In some embodiments, in step 5.3, the decision action a_i includes: no maintenance, daily maintenance, minor repair, intermediate repair, major repair, and reconstruction and extension.
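Steps 5.1 through 5.6 can be sketched as a tabular Q-learning loop. In this hedged sketch, `transition(state, action)` is a hypothetical environment model standing in for the state transitions and costs that the patent derives from the historical data set of health states and maintenance costs; the state labels, cost values and hyperparameters in the usage below are illustrative assumptions, not values from the patent.

```python
import random

def train_q_table(states, actions, transition, episodes=500, T=10,
                  alpha=0.2, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning over rounds of at most T decisions (steps 5.1-5.6)."""
    rng = random.Random(seed)
    Q = {(s, a): 0.0 for s in states for a in actions}     # step 5.1: build Q table
    for _ in range(episodes):
        state = rng.choice(states)                          # step 5.2: random X_0
        for _ in range(T):                                  # step 5.6: i <= T
            if rng.random() < epsilon:                      # step 5.3: epsilon-greedy
                action = rng.choice(actions)
            else:
                action = max(actions, key=lambda a: Q[(state, a)])
            next_state, cost = transition(state, action)    # step 5.4: new X_{i+1}, y_i
            reward = -cost                                  # r(X_i, a_i) = -y_i
            best_next = max(Q[(next_state, a)] for a in actions)
            Q[(state, action)] = ((1 - alpha) * Q[(state, action)]
                                  + alpha * (reward + gamma * best_next))  # step 5.5
            state = next_state                              # X_i <- X_{i+1}
    return Q
```

With a toy two-state model in which leaving a degraded structure unmaintained incurs a higher recurring cost than repairing it, the learned Q table prefers repairing degraded structures, matching the cost-minimization objective of the method.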
The foregoing detailed description of the preferred embodiments of the invention has been presented. It should be understood that numerous modifications and variations could be devised by those skilled in the art in light of the present teachings without departing from the inventive concepts. Therefore, the technical solutions available to those skilled in the art through logic analysis, reasoning and limited experiments based on the prior art according to the concept of the present invention should be within the scope of protection defined by the claims.
Claims (3)
1. A bridge and tunnel maintenance autonomous decision method based on Q learning; the method comprises the following steps:
step 1, establishing an index system for evaluating the health state of a road, bridge and tunnel;
step 2, evaluating the health state of the road, bridge and tunnel according to said index system, and recording the index vector at time t as X_t; wherein continuous index vectors are discretized;
step 3, collecting health state data {X_t} of the road, bridge and tunnel, recording the maintenance decisions at the corresponding moments and the cost y_t generated by those decisions, and forming a historical data set of health state and maintenance cost; if no maintenance is carried out at moment t, the maintenance cost at moment t in said historical data set is recorded as y_t = 0;
step 4, judging whether a Q learning model exists; if not, entering step 5; if yes, further judging whether the Q learning model is due for a periodic update; if the Q learning model needs to be updated, entering step 5, otherwise entering step 6;
step 5, training or retraining the Q learning model based on said historical data set of health state and maintenance cost of the road, bridge and tunnel;
step 6, using the Q learning model to obtain the maintenance decision a_t from the health state index X_t;
step 7, executing the maintenance decision a_t, returning to step 2, and reacquiring the health state evaluation X_{t+1} of the road, bridge and tunnel according to the state transition.
2. The method for autonomously deciding the maintenance of the road, bridge and tunnel based on the Q learning as claimed in claim 1, wherein the step 5 comprises the following steps:
step 5.1, establishing a Q table of the health state indexes of the roads, bridges and tunnels and maintenance decisions;
step 5.2, randomly selecting a health state X_0 of the road, bridge and tunnel and starting a new round of training;
step 5.3, for any of said health states X_i, i ≤ T, where T is the specified upper limit of the decision period, selecting the decision action a_i using an ε-greedy policy;
step 5.4, executing the decision action a_i to obtain a new health state X_{i+1} and the resulting maintenance cost y_i; to minimize the maintenance cost spent within the given decision period, the reward of this decision is recorded as r(X_i, a_i) = -y_i;
step 5.5, updating the Q value as follows:
Q(X_i, a_i) ← (1 - α)Q(X_i, a_i) + α(r(X_i, a_i) + γ max_a Q(X_{i+1}, a));
wherein α is the learning rate, α ∈ [0, 1];
γ is the reward discount factor, γ ∈ [0, 1];
step 5.6, when the number of decisions in the current round does not exceed the upper limit of the decision period, i.e. i ≤ T, performing the state transition X_i ← X_{i+1} and returning to step 5.3; otherwise, returning to step 5.2 and starting a new round of training, until the Q table converges and the training ends.
3. The Q-learning based road, bridge and tunnel maintenance autonomous decision method according to claim 2, characterized in that in said step 5.3, the decision action a_i includes: no maintenance, daily maintenance, minor repair, intermediate repair, major repair, and reconstruction and extension.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110141634.9A CN112862295B (en) | 2021-02-02 | 2021-02-02 | Bridge and tunnel maintenance autonomous decision-making method based on Q learning |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110141634.9A CN112862295B (en) | 2021-02-02 | 2021-02-02 | Bridge and tunnel maintenance autonomous decision-making method based on Q learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112862295A true CN112862295A (en) | 2021-05-28 |
CN112862295B CN112862295B (en) | 2022-06-24 |
Family
ID=75986059
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110141634.9A Active CN112862295B (en) | 2021-02-02 | 2021-02-02 | Bridge and tunnel maintenance autonomous decision-making method based on Q learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112862295B (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116401525A (en) * | 2023-02-23 | 2023-07-07 | 兰州工业学院 | Bridge tunneling prediction maintenance method and system based on intelligent induction |
Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101466111A (en) * | 2009-01-13 | 2009-06-24 | 中国人民解放军理工大学通信工程学院 | Dynamic spectrum access method based on policy planning constrain Q study |
CN103327556A (en) * | 2013-07-04 | 2013-09-25 | 中国人民解放军理工大学通信工程学院 | Dynamic network selection method for optimizing quality of experience (QoE) of user in heterogeneous wireless network |
CN107153928A (en) * | 2017-06-28 | 2017-09-12 | 江苏智通交通科技有限公司 | Visual highway maintenance decision system |
CN107563669A (en) * | 2017-09-24 | 2018-01-09 | 武汉武大卓越科技有限责任公司 | A kind of highway maintenance method of decision analysis based on learning model |
CN109063870A (en) * | 2018-07-24 | 2018-12-21 | 海南大学 | Composite services policy optimization method and system based on Q study |
CN110084539A (en) * | 2018-11-30 | 2019-08-02 | 武汉大学 | Irrigation decision learning method, device, server and storage medium |
CN110213776A (en) * | 2019-05-27 | 2019-09-06 | 南京邮电大学 | A kind of WiFi discharging method based on Q study and multiple attribute decision making (MADM) |
CN110298768A (en) * | 2019-07-11 | 2019-10-01 | 成都软易达信息技术有限公司 | A kind of road and bridge maintenance aid decision-making system and maintenance process based on BIM and GIS |
EP3629105A1 (en) * | 2018-09-27 | 2020-04-01 | Bayerische Motoren Werke Aktiengesellschaft | High-level decision making for safe and reasonable autonomous lane changing using reinforcement learning |
CN112052071A (en) * | 2020-09-08 | 2020-12-08 | 福州大学 | Cloud software service resource allocation method combining reinforcement learning and machine learning |
Patent Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101466111A (en) * | 2009-01-13 | 2009-06-24 | 中国人民解放军理工大学通信工程学院 | Dynamic spectrum access method based on policy planning constrain Q study |
CN103327556A (en) * | 2013-07-04 | 2013-09-25 | 中国人民解放军理工大学通信工程学院 | Dynamic network selection method for optimizing quality of experience (QoE) of user in heterogeneous wireless network |
CN107153928A (en) * | 2017-06-28 | 2017-09-12 | 江苏智通交通科技有限公司 | Visual highway maintenance decision system |
CN107563669A (en) * | 2017-09-24 | 2018-01-09 | 武汉武大卓越科技有限责任公司 | A kind of highway maintenance method of decision analysis based on learning model |
CN109063870A (en) * | 2018-07-24 | 2018-12-21 | 海南大学 | Composite services policy optimization method and system based on Q study |
EP3629105A1 (en) * | 2018-09-27 | 2020-04-01 | Bayerische Motoren Werke Aktiengesellschaft | High-level decision making for safe and reasonable autonomous lane changing using reinforcement learning |
CN110084539A (en) * | 2018-11-30 | 2019-08-02 | 武汉大学 | Irrigation decision learning method, device, server and storage medium |
CN110213776A (en) * | 2019-05-27 | 2019-09-06 | 南京邮电大学 | A kind of WiFi discharging method based on Q study and multiple attribute decision making (MADM) |
CN110298768A (en) * | 2019-07-11 | 2019-10-01 | 成都软易达信息技术有限公司 | A kind of road and bridge maintenance aid decision-making system and maintenance process based on BIM and GIS |
CN112052071A (en) * | 2020-09-08 | 2020-12-08 | 福州大学 | Cloud software service resource allocation method combining reinforcement learning and machine learning |
Non-Patent Citations (1)
Title |
---|
ZHOU RONG: "Decision optimization of RoboCup multi-agents based on Q-learning", China Master's Theses Full-text Database, Information Science and Technology series *
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116401525A (en) * | 2023-02-23 | 2023-07-07 | 兰州工业学院 | Bridge tunneling prediction maintenance method and system based on intelligent induction |
CN116401525B (en) * | 2023-02-23 | 2023-09-29 | 兰州工业学院 | Bridge tunneling prediction maintenance method and system based on intelligent induction |
Also Published As
Publication number | Publication date |
---|---|
CN112862295B (en) | 2022-06-24 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Zang et al. | Metalight: Value-based meta-reinforcement learning for traffic signal control | |
Liang et al. | A deep reinforcement learning network for traffic light cycle control | |
CN111415048B (en) | Vehicle path planning method based on reinforcement learning | |
Xiong et al. | Learning traffic signal control from demonstrations | |
Miyamoto et al. | Development of a bridge management system for existing bridges | |
CN110415516A (en) | Urban traffic flow prediction technique and medium based on figure convolutional neural networks | |
Wang et al. | Improved particle swarm optimization to minimize periodic preventive maintenance cost for series-parallel systems | |
Gan et al. | Optimal pricing for improving efficiency of taxi systems | |
WO2021051930A1 (en) | Signal adjustment method and apparatus based on action prediction model, and computer device | |
CN112862295B (en) | Bridge and tunnel maintenance autonomous decision-making method based on Q learning | |
Khmeleva et al. | Fuzzy-logic controlled genetic algorithm for the rail-freight crew-scheduling problem | |
CN106781461A (en) | A kind of freeway net operation situation deduces thermal starting technology online | |
CN117408672A (en) | Intelligent expressway maintenance system | |
Tsai | Dynamic grey platform for efficient forecasting management | |
CN110705756B (en) | Electric power energy consumption optimization control method based on input convex neural network | |
CN114648178B (en) | Operation and maintenance strategy optimization method of electric energy metering device based on DDPG algorithm | |
CN112927522B (en) | Internet of things equipment-based reinforcement learning variable-duration signal lamp control method | |
Ming et al. | Constrained double deep Q-learning network for EVs charging scheduling with renewable energy | |
CN104102955A (en) | Electric power circuit planning method for selecting differential evolution algorithm on the basis of abstract convexity estimation | |
CN114186799A (en) | Enterprise valuation method and system based on heterogeneous graph neural network | |
Kacprzyk et al. | Involving objective and subjective aspects in multistage decision making and control under fuzziness: dynamic programming and neural networks | |
CN113627533A (en) | Power equipment maintenance decision generation method based on reinforcement learning | |
Liu et al. | Optimal Replacement Policy for Fuzzy Multi-State Element. | |
Takakuwa et al. | Autonomous Reusing Policy Selection Using Spreading Activation Model in Deep Reinforcement Learning | |
Siddique et al. | Fairness in Preference-based Reinforcement Learning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||