CN113204054B

CN113204054B - Self-adaptive wide-area electromagnetic method induced polarization information extraction method based on reinforcement learning

Info

Publication number: CN113204054B
Application number: CN202110386529.1A
Authority: CN
Inventors: 董莉; 江沸菠; 李小龙; 肖林
Original assignee: Hunan University of Technology
Current assignee: Hunan University of Technology
Priority date: 2021-04-12
Filing date: 2021-04-12
Publication date: 2022-06-10
Anticipated expiration: 2041-04-12
Also published as: CN113204054A

Abstract

The invention discloses a reinforcement learning-based adaptive wide-area electromagnetic method induced polarization information extraction method. Sensitivity is defined as the feature used to identify the inversion parameters, and reinforcement learning is used to identify the dominant inversion parameter and set the regularization adaptively, so that induced polarization information is extracted intelligently. In the early stage of inversion, the resistivity influences the observed data far more than the polarizability does, so the sensitivity of the resistivity is higher than that of the polarizability; the inversion at this stage is dominated by the resistivity, a prior information constraint is applied to the resistivity parameters, and a strong limiting constraint is applied to the polarizability parameters. In the later stage, the resistivity tends to be stable and the sensitivity of the polarizability exceeds that of the resistivity; the inversion is then dominated by the polarizability, a prior information constraint is applied to the polarizability parameters, and a strong limiting constraint is applied to the resistivity parameters. The specific regularization coefficients and constraints are set by reinforcement learning according to its judgment of the inversion stage.

Description

Self-adaptive wide-area electromagnetic method induced polarization information extraction method based on reinforcement learning

Technical Field

The invention belongs to the technical field of geophysical exploration, and relates to a reinforcement learning-based adaptive wide-area electromagnetic method induced polarization information extraction method.

Background

The wide-area electromagnetic method (wide field electromagnetic method, WFEM) is a new type of frequency-domain electromagnetic prospecting method. It combines the stable and reliable source signal of the controlled-source audio-frequency magnetotelluric method (CSAMT) with the ability of the magnetic dipole source frequency sounding method (MELOS) to measure in the non-far zone. The wide-area apparent resistivity defined by WFEM strictly retains the high-order terms in the series expansion of the electromagnetic field expression, can be extracted in various working modes by measuring only one physical quantity, and is an apparent resistivity applicable to the whole zone that effectively reduces the non-far-zone distortion of electromagnetic sounding curves.

WFEM has achieved a series of positive results in oil and gas exploration, metal ore exploration, engineering surveys and other fields. In practice, however, the frequency-domain electromagnetic response of the subsurface medium reflects both electromagnetic induction and the induced polarization effect. Research on extracting induced polarization information from frequency-domain electromagnetic signals yields more physical parameters, allows the influence of the polarization effect on the electromagnetic signal to be analyzed quantitatively, and thereby improves the accuracy of inversion and interpretation for frequency-domain electromagnetic methods.

However, because the anomaly caused by inhomogeneous subsurface conductivity is much stronger than the anomaly caused by the induced polarization effect, the inversion process clearly divides into two parts: (1) the resistivity inversion part, in which the resistivity parameters dominate the fitness function curve, so that individuals converge rapidly toward the correct resistivity parameters in the solution space; (2) the polarizability inversion part, in which the influence of the resistivity parameters on the fitness function levels off, individuals fine-tune near the resistivity parameters, and optimization of the polarizability parameters becomes the main cause of the decline in the fitness curve. Because the polarizability parameters are numerically far smaller than the resistivity parameters, their influence on the fitness function is far smaller as well; the algorithm then easily falls into a local extremum and returns wrong polarizability parameters, which makes the induced polarization information harder to extract. Extracting weak polarizability parameters under the influence of the resistivity parameters is therefore a complex engineering problem and a considerable technical challenge.

Disclosure of Invention

The invention aims to provide a reinforcement learning-based adaptive wide-area electromagnetic method induced polarization information extraction method that defines sensitivity as the feature for identifying inversion parameters and uses reinforcement learning to identify the inversion parameters and set the regularization adaptively, thereby improving the accuracy of induced polarization information extraction.

In order to achieve this purpose, the invention provides the following technical scheme:

The invention provides a reinforcement learning-based adaptive wide-area electromagnetic method induced polarization information extraction method, characterized in that sensitivity is defined as the feature for identifying inversion parameters, and reinforcement learning is used to identify the inversion parameters and set the regularization adaptively, so that intelligent induced polarization information extraction is realized.

The method comprises the following steps:

S1, setting the calculation equation of the wide-area apparent resistivity:

in formula (1), r is the distance from the observation point to the center of the dipole source, i.e. the transmitter-receiver distance; dL is the length of the horizontal current source; MN is the distance between observation points M and N; ρ is the resistivity; I is the current intensity; k is the propagation constant, or wavenumber, of the electromagnetic wave; i is the imaginary unit; and φ is the angle between r and the current source;
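Formula (1) itself is not reproduced in this text. For orientation only, the E-Ex wide-field apparent resistivity commonly given in the WFEM literature for a horizontal current source has the form below. This is an assumption about what formula (1) contains: the potential difference ΔV_MN measured between M and N and the names K_{E-Ex} and F_{E-Ex} do not appear in the surrounding text, and φ is used here for the angle between r and the current source.

```latex
\rho_a = K_{E\text{-}E_x}\,\frac{\Delta V_{MN}}{I}\,\frac{1}{F_{E\text{-}E_x}(ikr)},
\qquad
K_{E\text{-}E_x} = \frac{2\pi r^{3}}{dL\cdot MN},
\qquad
F_{E\text{-}E_x}(ikr) = 1 - 3\sin^{2}\varphi + e^{-ikr}\,(1 + ikr)
```

Because the wavenumber k itself depends on the resistivity, an expression of this kind is usually solved for ρ_a iteratively rather than in closed form.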

S2, setting the induced polarization model as follows:

in formula (2), ρ(ω) is the frequency-dependent wide-area complex resistivity that takes the polarization effect into account; ρ_a is the wide-area apparent resistivity without the polarization effect; m is the polarizability; τ is the time constant; c is the frequency dependence coefficient; and ω is the angular frequency;
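Formula (2) is not reproduced in this text, but the symbols it defines (ρ_a, m, τ, c, ω) match the widely used Cole-Cole model. The following is a minimal sketch under that assumption; the function name `cole_cole_resistivity` and the numerical values are hypothetical and chosen only for illustration.

```python
import cmath
import math


def cole_cole_resistivity(rho_a: float, m: float, tau: float, c: float,
                          freq_hz: float) -> complex:
    """Complex resistivity rho(omega), ASSUMING the standard Cole-Cole form:

        rho(omega) = rho_a * (1 - m * (1 - 1 / (1 + (i*omega*tau)**c)))
    """
    omega = 2.0 * math.pi * freq_hz      # angular frequency in rad/s
    iwt_c = (1j * omega * tau) ** c      # (i * omega * tau)^c
    return rho_a * (1.0 - m * (1.0 - 1.0 / (1.0 + iwt_c)))


if __name__ == "__main__":
    # Hypothetical layer: rho_a = 100 ohm-m, m = 0.3, tau = 1 s, c = 0.5.
    for f in (0.01, 1.0, 100.0):
        rho = cole_cole_resistivity(100.0, 0.3, 1.0, 0.5, f)
        print(f"{f:8.2f} Hz  |rho| = {abs(rho):8.3f}  phase = {cmath.phase(rho):+.4f} rad")
```

Under this form the magnitude tends to ρ_a at low frequency and to ρ_a(1 − m) at high frequency, which shows how small the polarizability contribution is compared with the resistivity itself.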

S3, setting the objective function of the inversion as follows:

fit＝E(e)+λ₁R(ρ)+λ₂R(m) (3)

in formula (3), R(ρ) and R(m) are the minimum-structure constraint functions for resistivity and polarizability, respectively; λ₁ and λ₂ are the regularization factors corresponding to R(ρ) and R(m), respectively. Two independent regularization factors are adopted because the value range of the polarizability (m ∈ [0,1]) differs greatly from that of the resistivity (generally ρ ≫ m); with a single shared regularization factor, the relatively small polarizability parameters could not be constrained. E(e) is the target error function, i.e. the data fitting error during inversion;

R(ρ) and R(m) are both calculated using the following formula:

in formula (4), M denotes the model parameters obtained by inversion, comprising the resistivity ρ and the polarizability m;
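The structure of formula (3) can be sketched as follows. Formula (4) is not reproduced in this text, so the roughness term below (sum of squared differences between adjacent layers) is only an assumed stand-in for the minimum-structure constraint, and the mean squared misfit is an assumed stand-in for E(e); all function names and values are hypothetical.

```python
from typing import Sequence


def roughness(model: Sequence[float]) -> float:
    """ASSUMED minimum-structure term: squared differences between adjacent layers."""
    return sum((b - a) ** 2 for a, b in zip(model, model[1:]))


def data_misfit(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """ASSUMED form of E(e): mean squared error between observed and predicted data."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)


def fitness(observed: Sequence[float], predicted: Sequence[float],
            rho: Sequence[float], m: Sequence[float],
            lam1: float, lam2: float) -> float:
    """fit = E(e) + lam1 * R(rho) + lam2 * R(m), following formula (3)."""
    return data_misfit(observed, predicted) + lam1 * roughness(rho) + lam2 * roughness(m)


if __name__ == "__main__":
    rho = [100.0, 10.0, 1000.0]   # hypothetical three-layer resistivities
    m = [0.1, 0.5, 0.1]           # hypothetical three-layer polarizabilities
    print("R(rho) =", roughness(rho))
    print("R(m)   =", roughness(m))
```

With these hypothetical values R(ρ) is several orders of magnitude larger than R(m), which illustrates why a single shared regularization factor would leave the polarizability term effectively unconstrained.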

S4, designing a staged extraction method for the different physical parameters by defining sensitivity as the feature for identifying inversion parameters, and using the sensitivity to distinguish the current inversion stage;

the sensitivities of resistivity and polarizability are defined as follows:

in formula (5), S is the sensitivity, G is the iteration number, fit is the fitness, and M denotes the model parameters obtained by inversion, comprising the resistivity ρ and the polarizability m;
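Formula (5) is not reproduced in this text, so the exact sensitivity definition is unknown here. As a rough illustration of the idea of using sensitivity to tell the two stages apart, the sketch below uses an assumed finite-difference measure: the mean absolute change in fitness when each parameter of one group is perturbed by a small relative step. The names `group_sensitivity`, `inversion_stage` and `toy_fit` are hypothetical, and `toy_fit` is an artificial fitness with a known optimum.

```python
from typing import Callable, Sequence

FitFn = Callable[[Sequence[float], Sequence[float]], float]


def group_sensitivity(fit: FitFn, rho: Sequence[float], m: Sequence[float],
                      group: str, rel_step: float = 0.01) -> float:
    """ASSUMED measure: mean |change in fit| for a relative perturbation of one group."""
    base = fit(rho, m)
    values = rho if group == "rho" else m
    total = 0.0
    for j, value in enumerate(values):
        perturbed = list(values)
        perturbed[j] = value * (1.0 + rel_step)
        changed = fit(perturbed, m) if group == "rho" else fit(rho, perturbed)
        total += abs(changed - base)
    return total / len(values)


def inversion_stage(s_rho: float, s_m: float) -> str:
    """The group with the higher sensitivity is treated as the dominant one."""
    return "resistivity" if s_rho >= s_m else "polarizability"


def toy_fit(rho: Sequence[float], m: Sequence[float]) -> float:
    """Hypothetical fitness with its optimum at rho = (100, 10), m = (0.2, 0.4)."""
    rho_true, m_true = (100.0, 10.0), (0.2, 0.4)
    return (sum(((r - t) / t) ** 2 for r, t in zip(rho, rho_true))
            + sum((x - t) ** 2 for x, t in zip(m, m_true)))


if __name__ == "__main__":
    for label, rho in (("early", [50.0, 20.0]), ("late", [100.0, 10.0])):
        m = [0.1, 0.1]
        s_rho = group_sensitivity(toy_fit, rho, m, "rho")
        s_m = group_sensitivity(toy_fit, rho, m, "m")
        print(label, inversion_stage(s_rho, s_m))
```

In this toy setting the resistivity group dominates while the resistivity is still far from its optimum, and the polarizability group takes over once the resistivity has settled, which mirrors the two stages described in the text.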

S5, using reinforcement learning based on the deterministic policy gradient to judge the inversion stage and set the regularization coefficients;

reinforcement learning comprises three elements: state, action and reward. The system is modeled around these three elements: the state is the sensitivities of the resistivity and the polarizability, the action is the regularization coefficients, and the reward is the improvement in fitness. The system judges the inversion stage from the current state and outputs the corresponding regularization coefficients, then calculates the reward from the inversion result in order to adjust the policy and value function of the reinforcement learning; after repeated learning, once the policy and value function are stable, the inversion stage can be judged accurately and suitable regularization coefficients can be set;

S6, controlling the constraints imposed on the inversion according to the regularization coefficients generated by reinforcement learning, realizing adaptive identification of the inversion parameters and adaptive regularization setting, and obtaining high-precision induced polarization information (including resistivity and polarizability parameters).

Further, in step S5, the step of reinforcement learning includes:

step one, randomly initializing four networks, namely the current policy network μ, the target policy network μ′, the current Q network Q and the target Q network Q′;

their parameters are, respectively, the current policy network parameter θ, the target policy network parameter θ′, the current Q network parameter w and the target Q network parameter w′; the current iteration number t is set to 0;

step two, taking S as the initial state, inputting the state S into the current policy network and obtaining the action A:

A＝μ(S|θ)+N

where μ(·) is the policy output by the current policy network, S is the initial state, θ is the parameter of the current policy network, and N is noise;

step three, executing the action A in the state S to obtain the next state S′ and the reward R, and storing S, A, R and S′ in the experience replay set D＝{S_t,A_t,R_t,S′_t};

step four, updating the state S to S′; randomly sampling n samples {S_i,A_i,R_i,S′_i}, i＝1,2,3,…,n, from the experience replay set D, and calculating the target Q value y_i for the current Q network Q:

y_i＝R_i+γQ′(S′_i,μ′(S′_i|θ′)|w′)

where R_i is the reward obtained by performing action A_i in state S_i, γ is the reward discount factor, Q′(·) is the Q value output by the target Q network, w′ is the parameter of the target Q network, μ′(·) is the policy output by the target policy network, and θ′ is the parameter of the target policy network;

step five, calculating the loss L of the current Q network using the mean squared error (MSE) loss function, and updating all parameters w of the current Q network through gradient back-propagation of the neural network:

L＝(1/n)Σ_{i=1}^{n}(y_i−Q(S_i,A_i|w))²

where n is the total number of samples drawn, Q(·) is the Q value output by the current Q network, S_i is the i-th state, A_i is the i-th action, and w is the parameter of the current Q network;

step six, using the performance index function J, updating all parameters θ of the current policy network through gradient back-propagation of the neural network, and increasing the iteration number t by 1;

step seven, updating the target Q network parameter w′ and the target policy network parameter θ′ at fixed intervals:

w′＝τw+(1-τ)w′

θ′＝τθ+(1-τ)θ′

where τ is the soft update coefficient of the network parameters, θ is the current policy network parameter, and w is the current Q network parameter;

step eight, judging whether the policy and value function have converged stably; if the termination condition is reached, the training ends; otherwise, returning to step two.
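Steps three to seven follow the pattern of deep deterministic policy gradient (DDPG) training with experience replay and soft target updates. The sketch below covers only the bookkeeping described in the text: replay sampling, the target value y_i, the mean squared error loss, and the soft update. The networks are represented by plain callables and flat parameter lists, which is an assumption made to keep the example free of any neural network library; all function names are hypothetical.

```python
import random
from typing import Callable, List, Sequence, Tuple

State = Sequence[float]
Action = Sequence[float]
Transition = Tuple[State, Action, float, State]   # (S, A, R, S')


def sample_batch(replay: List[Transition], n: int, rng: random.Random) -> List[Transition]:
    """Step four: draw up to n transitions at random from the experience replay set D."""
    return rng.sample(replay, min(n, len(replay)))


def td_targets(batch: List[Transition],
               target_actor: Callable[[State], Action],
               target_critic: Callable[[State, Action], float],
               gamma: float) -> List[float]:
    """Step four: y_i = R_i + gamma * Q'(S'_i, mu'(S'_i))."""
    return [r + gamma * target_critic(s_next, target_actor(s_next))
            for (_s, _a, r, s_next) in batch]


def critic_loss(batch: List[Transition], targets: List[float],
                critic: Callable[[State, Action], float]) -> float:
    """Step five: mean squared error between y_i and Q(S_i, A_i)."""
    errors = [(y - critic(s, a)) ** 2 for (s, a, _r, _s2), y in zip(batch, targets)]
    return sum(errors) / len(errors)


def soft_update(target_params: Sequence[float], current_params: Sequence[float],
                tau: float) -> List[float]:
    """Step seven: w' <- tau * w + (1 - tau) * w' (the same rule applies to theta')."""
    return [tau * w + (1.0 - tau) * w_t for w, w_t in zip(current_params, target_params)]


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical transitions: state = (S_rho, S_m), action = (lambda1, lambda2).
    replay = [([0.5, 0.1], [0.01, 0.1], 1.0, [0.4, 0.2]),
              ([0.4, 0.2], [0.02, 0.2], 0.5, [0.1, 0.3])]
    actor = lambda s: [0.01, 0.1]                 # stand-in for mu'
    critic = lambda s, a: sum(s) + sum(a)         # stand-in for Q and Q'
    batch = sample_batch(replay, 2, rng)
    y = td_targets(batch, actor, critic, gamma=0.9)
    print("loss:", critic_loss(batch, y, critic))
    print("w':", soft_update([0.0, 1.0], [1.0, 1.0], tau=0.1))
```

A small τ makes the target networks trail the current networks slowly, which is the usual reason for keeping separate current and target copies.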

Further, in step S6, two types of constraints are imposed during the inversion: first, prior information constraints on resistivity and polarizability are applied using the known physical properties of the exploration area, which reduces the search space of the inversion algorithm; second, when one physical property parameter (resistivity or polarizability) is in its inversion stage, a limiting constraint is applied to the other physical property parameter, i.e. the search over the other parameter is restricted to a small range, which strengthens the influence of the dominant parameter on the fitness function.

Therefore, the strength of different constraints is controlled by different regularization coefficients generated by reinforcement learning, and accurate multi-parameter inversion is realized.
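One way to picture the two constraint types is as search bounds handed to the optimizer. In the sketch below, the prior information constraint is a fixed (low, high) interval per parameter, and the limiting constraint is a narrow window around the current estimate of the non-dominant parameter. The window width `fraction` is a hypothetical knob standing in for the constraint strength; how the regularization coefficients map to that strength is not specified in the text.

```python
from typing import List, Sequence, Tuple

Bounds = List[Tuple[float, float]]


def limited_bounds(current: Sequence[float], prior: Bounds, fraction: float) -> Bounds:
    """Narrow window around each current estimate, clipped to the prior bounds."""
    out: Bounds = []
    for value, (lo, hi) in zip(current, prior):
        half_width = fraction * (hi - lo) / 2.0
        out.append((max(lo, value - half_width), min(hi, value + half_width)))
    return out


def stage_bounds(stage: str, rho: Sequence[float], m: Sequence[float],
                 rho_prior: Bounds, m_prior: Bounds,
                 fraction: float = 0.05) -> Tuple[Bounds, Bounds]:
    """Return (rho_bounds, m_bounds): prior bounds for the dominant parameter,
    a limiting window for the other one."""
    if stage == "resistivity":
        return list(rho_prior), limited_bounds(m, m_prior, fraction)
    return limited_bounds(rho, rho_prior, fraction), list(m_prior)


if __name__ == "__main__":
    rho_prior = [(1.0, 1000.0)]    # hypothetical prior range for one layer
    m_prior = [(0.0, 1.0)]
    print(stage_bounds("resistivity", [100.0], [0.3], rho_prior, m_prior, 0.1))
    print(stage_bounds("polarizability", [100.0], [0.3], rho_prior, m_prior, 0.1))
```

The sketch assumes the current estimates already lie inside their prior intervals.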

In the early stage of inversion, the resistivity influences the observed data far more than the polarizability does, so the sensitivity of the resistivity is higher than that of the polarizability; the inversion is dominated by the resistivity, a prior information constraint is applied to the resistivity parameters, and a strong limiting constraint is applied to the polarizability parameters. In the later stage of inversion, the resistivity tends to be stable and the sensitivity of the polarizability is higher than that of the resistivity; the inversion is dominated by the polarizability, a prior information constraint is applied to the polarizability parameters, and a strong limiting constraint is applied to the resistivity parameters. Which constraints are applied is likewise set by reinforcement learning according to its judgment of the inversion stage.

The invention designs a reinforcement learning-based adaptive wide-area electromagnetic method induced polarization information extraction method, so that an inversion algorithm can automatically and quickly identify whether the main parameter of current inversion is polarizability or resistivity, and perform targeted inversion, thereby improving the accuracy of induced polarization information extraction.

Compared with the prior art, the invention has the following advantages:

(1) The method can judge the current inversion state (polarizability-dominated or resistivity-dominated) from the sensitivities of the resistivity and the polarizability during iteration, output the correct regularization coefficients and apply the correct constraint conditions, thereby realizing intelligent induced polarization information extraction.

(2) The method can effectively solve the problem of uncertainty in multi-parameter inversion.

(3) The method can strengthen the influence of the polarizability in the later inversion stage and improve the accuracy of the extraction of the induced polarization information.

Drawings

Fig. 1 is a flow chart of an adaptive wide-area electromagnetic method induced polarization information extraction method based on reinforcement learning.

Fig. 2 shows the reinforcement learning-based strategy for setting the regularization coefficients and constraints.

Detailed Description

The invention will be further illustrated with reference to the following specific examples and the accompanying drawings:

Example 1

S1, setting the calculation equation of the wide-area apparent resistivity:

in formula (1), MN is the distance between observation points M and N; ρ is the resistivity; I is the current intensity; and φ is the angle between r and the current source;

S2, setting the induced polarization model as follows:

S3, setting the objective function of the inversion as follows:

fit＝E(e)+λ₁R(ρ)+λ₂R(m) (3)

in formula (3), R(ρ) and R(m) are the minimum-structure constraint functions for resistivity and polarizability, respectively; λ₁ and λ₂ are the regularization factors corresponding to R(ρ) and R(m), respectively. Two independent regularization factors are adopted because the value range of the polarizability (m ∈ [0,1]) differs greatly from that of the resistivity (generally ρ ≫ m); with a single shared regularization factor, the relatively small polarizability parameters could not be constrained. E(e) is the target error function, i.e. the data fitting error during inversion;

R(ρ) and R(m) are both calculated using the following formula:

the sensitivities of resistivity and polarizability are defined as follows:

S5, using reinforcement learning based on the deterministic policy gradient to judge the inversion stage and set the regularization coefficients, as shown in Fig. 2;

the step of reinforcement learning comprises:

A＝μ(S|θ)+N

y_i＝R_i+γQ′(S′_i,μ′(S′_i|θ′)|w′)

w′＝τw+(1-τ)w′

θ′＝τθ+(1-τ)θ′

step eight, judging whether the policy and value function have converged stably; if the termination condition is reached, the training ends; otherwise, returning to step two;

S6, controlling the constraints imposed on the inversion according to the regularization coefficients generated by reinforcement learning, realizing adaptive identification of the inversion parameters and adaptive regularization setting, and obtaining high-precision induced polarization information (including resistivity and polarizability parameters);

two types of constraints are imposed during the inversion: first, prior information constraints on resistivity and polarizability are applied using the known physical properties of the exploration area, which reduces the search space of the inversion algorithm; second, when one physical property parameter (resistivity or polarizability) is in its inversion stage, a limiting constraint is applied to the other physical property parameter, i.e. the search over the other parameter is restricted to a small range, which strengthens the influence of the dominant parameter on the fitness function. The strength of the different constraints is thus controlled by the different regularization coefficients generated by reinforcement learning, and accurate multi-parameter inversion is realized.

Example 2

The method was tested on a three-layer model whose resistivity parameter ρ, thickness parameter h and polarizability parameter m are set as shown in Table 1. The inversion algorithm is the grey wolf optimizer (GWO), whose population size P and number of iterations t_max are set as shown in Table 1. The soft update coefficient τ and the reward discount factor γ of reinforcement learning are also set as shown in Table 1, as are the regularization factors λ₁ and λ₂ of the minimum-structure constraint functions used when reinforcement learning is not employed.

TABLE 1
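For readers unfamiliar with the grey wolf optimizer, a minimal sketch of the basic algorithm follows. It is a generic textbook-style version, not the configuration used in this example: the default values for `pop_size` (population size P) and `t_max`, the bound clipping, and the sphere test function are all assumptions for illustration. In the setting described here, the objective would be the fitness of formula (3).

```python
import random
from typing import Callable, List, Sequence, Tuple


def grey_wolf_optimizer(
    objective: Callable[[Sequence[float]], float],
    bounds: Sequence[Tuple[float, float]],
    pop_size: int = 20,
    t_max: int = 100,
    seed: int = 0,
) -> Tuple[List[float], float]:
    """Minimise `objective` inside `bounds` (pop_size >= 3); returns (best x, best f)."""
    rng = random.Random(seed)
    wolves = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    best_x: List[float] = list(wolves[0])
    best_f = objective(best_x)
    for t in range(t_max):
        ranked = sorted(wolves, key=objective)
        leaders = [list(w) for w in ranked[:3]]        # alpha, beta, delta wolves
        leader_f = objective(leaders[0])
        if leader_f < best_f:
            best_x, best_f = list(leaders[0]), leader_f
        a = 2.0 * (1.0 - t / t_max)                    # decreases linearly from 2 towards 0
        moved = []
        for wolf in wolves:
            position = []
            for j, (lo, hi) in enumerate(bounds):
                guided = 0.0
                for leader in leaders:
                    A = 2.0 * a * rng.random() - a     # exploration/exploitation coefficient
                    C = 2.0 * rng.random()
                    D = abs(C * leader[j] - wolf[j])   # distance to this leader
                    guided += leader[j] - A * D
                position.append(min(hi, max(lo, guided / 3.0)))  # average, clipped to bounds
            moved.append(position)
        wolves = moved
    final = min(wolves, key=objective)
    final_f = objective(final)
    if final_f < best_f:
        best_x, best_f = list(final), final_f
    return best_x, best_f


if __name__ == "__main__":
    sphere = lambda v: sum(c * c for c in v)           # hypothetical test objective
    x, f = grey_wolf_optimizer(sphere, [(-10.0, 10.0)] * 3)
    print("best value:", f)
```

Passing stage-dependent bounds to such an optimizer is one simple way to combine it with the prior information and limiting constraints described in step S6.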

Table 2 compares the inversion results of the method provided by the invention with those of inversion without reinforcement learning and of the Actor-Critic method (single network); the evaluation indexes are the root mean square error (RMSE) and the coefficient of determination R².

TABLE 2

Method	RMSE	R²
Without reinforcement learning	38.33	0.88
Actor-Critic method	30.24	0.91
The method of the invention	27.43	0.93

The inversion results show that the reinforcement learning-based inversion methods (the Actor-Critic method and the method of the invention) outperform inversion without reinforcement learning, because reinforcement learning can automatically identify the physical-property stage of the inversion, output the correct regularization coefficients and apply the constraints. The method of the invention outperforms the Actor-Critic method because it implements the Actor and Critic modules each with a pair of networks; compared with the Actor-Critic method, separating the current network from the target network (dual networks) further improves the stability and generalization ability of reinforcement learning.
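The two evaluation indexes in Table 2 can be computed as sketched below. The text does not state which quantities the indexes are evaluated on, so the sketch simply compares a reference sequence with an estimated one; the function names and sample values are hypothetical and unrelated to the figures in Table 2.

```python
import math
from typing import Sequence


def rmse(true: Sequence[float], pred: Sequence[float]) -> float:
    """Root mean square error between reference and estimated values."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(true, pred)) / len(true))


def r_squared(true: Sequence[float], pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(true) / len(true)
    ss_res = sum((t - p) ** 2 for t, p in zip(true, pred))
    ss_tot = sum((t - mean_true) ** 2 for t in true)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    reference = [1.0, 2.0, 3.0]    # hypothetical reference values
    estimate = [1.0, 2.0, 4.0]     # hypothetical inverted values
    print("RMSE:", rmse(reference, estimate))
    print("R^2: ", r_squared(reference, estimate))
```

A lower RMSE and an R² closer to 1 both indicate a closer match to the reference, which is the sense in which the rows of Table 2 are ranked.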

Claims

1. A reinforcement learning-based adaptive wide-area electromagnetic method induced polarization information extraction method is characterized in that sensitivity is defined as the characteristic of inversion parameter identification, and meanwhile, the reinforcement learning method is adopted to realize the identification and regularization setting of adaptive inversion parameters, so that intelligent induced polarization information extraction is realized;

step 1, designing a staged extraction method of different physical parameters by defining sensitivity as the characteristics of inversion parameter identification, and distinguishing the stage of the current inversion through the sensitivity;

the sensitivities of resistivity and polarizability are defined as follows:

in the early stage of inversion the resistivity is dominant, a prior information constraint is applied to the resistivity parameters, and a strong limiting constraint is applied to the polarizability parameters; in the later stage the resistivity tends to be stable and the sensitivity of the polarizability is higher than that of the resistivity, so the polarizability is dominant, a prior information constraint is applied to the polarizability parameters, and a strong limiting constraint is applied to the resistivity parameters;

judging, from the sensitivities of the resistivity and the polarizability during iteration, whether the current inversion state is dominated by polarizability inversion or by resistivity inversion, outputting the correct regularization coefficients and applying the correct constraint conditions, thereby realizing intelligent induced polarization information extraction;

step 2, using reinforcement learning based on the deterministic policy gradient to judge the inversion stage and set the regularization coefficients;

step 3, controlling the constraints imposed on the inversion according to the regularization coefficients generated by reinforcement learning, realizing adaptive identification of the inversion parameters and adaptive regularization setting, and obtaining high-precision induced polarization information including resistivity and polarizability parameters.

2. The reinforcement learning-based adaptive wide-area electromagnetic method induced polarization information extraction method according to claim 1, characterized in that the step 1 is preceded by the following steps:

S1, setting the calculation equation of the wide-area apparent resistivity:

in formula (1), MN is the distance between observation points M and N; ρ is the resistivity; I is the current intensity; and φ is the angle between r and the current source;

S2, setting the induced polarization model as follows:

in formula (2), ρ(ω) is the frequency-dependent wide-area complex resistivity that takes the polarization effect into account; ρ_a is the wide-area apparent resistivity without the polarization effect; m is the polarizability; τ is the time constant; c is the frequency dependence coefficient; and ω is the angular frequency;

S3, setting the objective function of the inversion as follows:

fit＝E(e)+λ₁R(ρ)+λ₂R(m) (3)

in formula (3), R(ρ) and R(m) are the minimum-structure constraint functions for resistivity and polarizability, respectively; λ₁ and λ₂ are the regularization factors corresponding to R(ρ) and R(m), respectively; two independent regularization factors are adopted because the value range of the polarizability m differs greatly from the value range of the resistivity ρ;

wherein m ∈ [0,1] and ρ ≫ m;

if a single shared regularization factor is adopted, the relatively small polarizability parameters cannot be constrained; E(e) is the target error function, i.e. the data fitting error during inversion;

R(ρ) and R(m) are both calculated using the following formula:

in formula (4), M denotes the model parameters obtained by inversion, comprising the resistivity ρ and the polarizability m.

3. The reinforcement learning-based adaptive wide-area electromagnetic method induced polarization information extraction method according to claim 2, wherein in the step 2, the reinforcement learning step comprises:

step S201, randomly initializing four networks, namely the current policy network μ, the target policy network μ′, the current Q network Q and the target Q network Q′;

step S202, taking S as the initial state, inputting the state S into the current policy network and obtaining the action A:

A＝μ(S|θ)+N

step S203, executing the action A in the state S to obtain the next state S′ and the reward R, and storing S, A, R and S′ in the experience replay set D＝{S_t,A_t,R_t,S′_t};

step S204, updating the state S to S′; randomly sampling n samples {S_i,A_i,R_i,S′_i}, i＝1,2,3,…,n, from the experience replay set D, and calculating the target Q value y_i for the current Q network Q:

y_i＝R_i+γQ′(S′_i,μ′(S′_i|θ′)|w′)

where R_i is the reward obtained by performing action A_i in state S_i, γ is the reward discount factor, Q′(·) is the Q value output by the target Q network, w′ is the parameter of the target Q network, μ′(·) is the policy output by the target policy network, and θ′ is the parameter of the target policy network;

step S205, calculating the loss L of the current Q network using the mean squared error (MSE) loss function, and updating all parameters w of the current Q network through gradient back-propagation of the neural network;

step S206, using the performance index function J, updating all parameters θ of the current policy network through gradient back-propagation of the neural network, and increasing the iteration number t by 1;

step S207, updating the target Q network parameter w′ and the target policy network parameter θ′ at fixed intervals:

w′＝τw+(1-τ)w′

θ′＝τθ+(1-τ)θ′

step S208, judging whether the policy and value function have converged stably; if the termination condition is reached, the training ends; otherwise, returning to step S202.

4. The reinforcement learning-based adaptive wide-area electromagnetic method induced polarization information extraction method according to claim 2, wherein in step 3, two types of constraints are applied in the inversion process:

step S301, applying prior information constraints on resistivity and polarizability by using the known physical properties of the exploration area, thereby reducing the search space of the inversion algorithm;

step S302, when the resistivity parameter or the polarizability parameter is in an inversion stage, a limitation constraint is imposed on another physical parameter, namely, the search of the other physical parameter is limited within a small range, so as to strengthen the influence of the main physical parameter on the fitness function.