CN113204054B - Self-adaptive wide-area electromagnetic method induced polarization information extraction method based on reinforcement learning - Google Patents

Self-adaptive wide-area electromagnetic method induced polarization information extraction method based on reinforcement learning

Publication number
CN113204054B
Authority
CN
China
Prior art keywords
parameter
resistivity
inversion
network
polarizability
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110386529.1A
Other languages
Chinese (zh)
Other versions
CN113204054A (en
Inventor
董莉
江沸菠
李小龙
肖林
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hunan University of Technology
Original Assignee
Hunan University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hunan University of Technology
Priority to CN202110386529.1A
Publication of CN113204054A
Application granted
Publication of CN113204054B

Classifications

    • GPHYSICS
    • G01MEASURING; TESTING
    • G01VGEOPHYSICS; GRAVITATIONAL MEASUREMENTS; DETECTING MASSES OR OBJECTS; TAGS
    • G01V3/00Electric or magnetic prospecting or detecting; Measuring magnetic field characteristics of the earth, e.g. declination, deviation
    • G01V3/08Electric or magnetic prospecting or detecting; Measuring magnetic field characteristics of the earth, e.g. declination, deviation operating with magnetic or electric fields produced or modified by objects or geological structures or by detecting devices
    • G01V3/083Controlled source electromagnetic [CSEM] surveying

Landscapes

  • Physics & Mathematics (AREA)
  • Electromagnetism (AREA)
  • Engineering & Computer Science (AREA)
  • Remote Sensing (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Environmental & Geological Engineering (AREA)
  • Geology (AREA)
  • General Life Sciences & Earth Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Geophysics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a reinforcement learning-based adaptive method for extracting induced polarization information in the wide-area electromagnetic method. Sensitivity is defined as the feature used to identify the inversion parameter, and reinforcement learning is used to adaptively identify the inversion parameter and set the regularization, thereby achieving intelligent extraction of induced polarization information. In the early stage of inversion, the resistivity influences the observed data far more than the polarizability does, so the sensitivity of the resistivity is higher than that of the polarizability; the inversion at this stage is dominated by the resistivity, a prior-information constraint is applied to the resistivity parameter, and a strong limiting constraint is applied to the polarizability parameter. In the later stage, the resistivity tends to be stable and the sensitivity of the polarizability is higher than that of the resistivity; the inversion is then dominated by the polarizability, a prior-information constraint is applied to the polarizability parameter, and a strong limiting constraint is applied to the resistivity parameter. The specific regularization coefficients and the constraints to apply are set by reinforcement learning according to its judgment of the inversion stage.

Description

Self-adaptive wide-area electromagnetic method induced polarization information extraction method based on reinforcement learning
Technical Field
The invention belongs to the technical field of geophysics, and relates to a self-adaptive wide-area electromagnetic method induced polarization information extraction method based on reinforcement learning.
Background
The wide-area electromagnetic method (WFEM) is a new type of frequency-domain electromagnetic prospecting method. It combines the stable and reliable field-source signal of the controlled-source audio-frequency magnetotelluric method (CSAMT) with the ability of the magnetic dipole source frequency sounding method (MELOS) to measure outside the far zone. The wide-area apparent resistivity defined by WFEM strictly retains the high-order terms in the series expansion of the electromagnetic field expression, can be extracted by measuring only one physical quantity in any of several working modes, and is an apparent resistivity applicable over the whole area that effectively reduces the non-far-zone distortion of electromagnetic sounding curves.
At present, WFEM has achieved a series of positive results in oil and gas exploration, metal ore exploration, engineering surveys and other fields. In practical applications, however, the frequency-domain electromagnetic response of the subsurface medium reflects both electromagnetic induction and the induced polarization effect. Research on extracting induced polarization information from frequency-domain electromagnetic signals yields more physical parameters, allows the influence of the polarization effect on the electromagnetic signal to be analyzed quantitatively, and thereby improves the accuracy of inversion and interpretation in frequency-domain electromagnetic methods.
However, because the anomaly caused by subsurface conductivity inhomogeneity is much stronger than that caused by the induced polarization effect, the inversion process clearly divides into two parts: (1) the resistivity inversion part, in which the resistivity parameter dominates the fitness curve, so individuals quickly converge toward the correct resistivity parameter in the solution space; (2) the polarizability inversion part, in which the influence of the resistivity parameter on the fitness function stabilizes, individuals begin to fine-tune near the resistivity parameter, and optimization of the polarizability parameter becomes the main cause of the decline of the fitness curve. Because the polarizability parameter is numerically far smaller than the resistivity parameter, its influence on the fitness function is also far smaller; the algorithm then easily falls into a local extremum and obtains a wrong polarizability parameter, which makes the induced polarization information harder to extract. How to extract the weak polarizability parameter under the influence of the resistivity parameter is therefore a complex engineering problem and a considerable technical challenge.
Disclosure of Invention
The invention aims to provide a reinforcement learning-based adaptive method for extracting induced polarization information in the wide-area electromagnetic method. By defining sensitivity as the feature used to identify the inversion parameter and by adopting reinforcement learning, the method adaptively identifies the inversion parameter and sets the regularization, thereby improving the accuracy of induced polarization information extraction.
To achieve this purpose, the invention provides the following technical solution:
The invention provides a reinforcement learning-based adaptive wide-area electromagnetic method induced polarization information extraction method, in which sensitivity is defined as the feature used to identify the inversion parameter and reinforcement learning is used to adaptively identify the inversion parameter and set the regularization, thereby achieving intelligent extraction of induced polarization information.
The method comprises the following steps:
S1, setting the calculation equation of the wide-area apparent resistivity:
Figure GDA0003562403260000021
In formula (1),
Figure DEST_PATH_FDA0003567677330000022
r is the distance from the observation point to the center of the dipole source, i.e. the transmitter-receiver distance; dL is the length of the horizontal current source,
Figure DEST_PATH_FDA0003567677330000023
is the distance between observation points M and N;
Figure GDA0003562403260000024
ρ is the resistivity, I is the current intensity,
Figure GDA0003562403260000025
k is the propagation constant (wavenumber) of the electromagnetic wave, i is the imaginary unit,
Figure GDA0003562403260000026
is the angle between r and the current source;
S2, setting the induced polarization model as follows:
Figure GDA0003562403260000027
In formula (2), ρ(ω) is the frequency-dependent wide-area complex resistivity when the polarization effect is considered; ρ_a is the wide-area apparent resistivity when the polarization effect is not considered; m is the polarizability; τ is the time constant; c is the frequency dependence coefficient; and ω is the angular frequency;
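The symbols listed for formula (2) match those of the widely used Cole-Cole complex resistivity model. The sketch below assumes that standard form, since the formula image is not reproduced in this text; the function name and the numerical values are illustrative only and are not taken from the patent.

```python
import numpy as np


def cole_cole_resistivity(rho_a, m, tau, c, omega):
    """Complex resistivity rho(omega) under the standard Cole-Cole model.

    Assumed form (formula (2) itself is not reproduced in the text):
        rho(omega) = rho_a * (1 - m * (1 - 1 / (1 + (i*omega*tau)**c)))
    """
    omega = np.asarray(omega, dtype=float)
    relaxation = 1.0 / (1.0 + (1j * omega * tau) ** c)
    return rho_a * (1.0 - m * (1.0 - relaxation))


# Hypothetical parameters: 100 ohm-m background and a polarizability of 0.2.
frequencies_hz = np.array([0.01, 1.0, 100.0])
rho = cole_cole_resistivity(rho_a=100.0, m=0.2, tau=1.0, c=0.5,
                            omega=2.0 * np.pi * frequencies_hz)
print(np.abs(rho))  # amplitudes lie between rho_a * (1 - m) and rho_a
```

With m = 0 the expression reduces to ρ_a at every frequency, which is one way to check that the polarization term has been switched off.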
S3, setting the objective function of the inversion as follows:
fit = E(e) + λ_1 R(ρ) + λ_2 R(m)    (3)
In formula (3), R(ρ) and R(m) are the minimum-structure constraint functions for the resistivity and the polarizability, respectively; λ_1 and λ_2 are the regularization factors corresponding to R(ρ) and R(m), respectively. Two independent regularization factors are used because the value range of the polarizability (m ∈ [0, 1]) differs greatly from that of the resistivity (in general, ρ ≫ m); with a single shared regularization factor, the relatively small polarizability parameter could not be constrained. E(e) is the target error function, i.e. the data fitting error in the inversion;
R(ρ) and R(m) are both calculated using the following formula:
Figure GDA0003562403260000031
In formula (4), M denotes the model parameters obtained by inversion, comprising the resistivity ρ and the polarizability m;
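The reason for using two regularization factors can be illustrated numerically. The sketch below evaluates an objective of the form of formula (3); because formula (4) is not reproduced in this text, the minimum-structure term is assumed here to be the sum of squared differences between adjacent layers, and the data misfit E(e) is assumed to be a mean squared error. All names and values are hypothetical.

```python
import numpy as np


def min_structure(params):
    """Assumed roughness measure: sum of squared differences of adjacent layers."""
    params = np.asarray(params, dtype=float)
    return float(np.sum(np.diff(params) ** 2))


def fitness(observed, predicted, rho, m, lam1, lam2):
    """fit = E(e) + lam1 * R(rho) + lam2 * R(m), with E(e) assumed to be the MSE."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    data_misfit = float(np.mean((observed - predicted) ** 2))
    return data_misfit + lam1 * min_structure(rho) + lam2 * min_structure(m)


# Hypothetical three-layer model: resistivity in ohm-m, polarizability in [0, 1].
rho_layers = [100.0, 10.0, 1000.0]
m_layers = [0.1, 0.3, 0.05]
print(min_structure(rho_layers))  # on the order of 1e6
print(min_structure(m_layers))    # on the order of 1e-1
```

Under these assumptions the two roughness terms differ by about seven orders of magnitude, so a single shared factor would leave the polarizability term with almost no weight in the objective.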
S4, designing a staged extraction method for the different physical parameters by defining sensitivity as the feature used to identify the inversion parameter, the sensitivity being used to distinguish the stage of the current inversion;
The sensitivities of the resistivity and the polarizability are defined as follows:
Figure GDA0003562403260000032
In formula (5), S is the sensitivity, G is the number of iterations, fit is the fitness, and M denotes the model parameters obtained by inversion, comprising the resistivity ρ and the polarizability m;
S5, using reinforcement learning based on the deterministic policy gradient to judge the inversion stage and set the regularization coefficients;
Reinforcement learning comprises three elements, namely state, action and reward, and the system is modeled in terms of these three elements: the state is the sensitivities of the resistivity and the polarizability, the action is the regularization coefficients, and the reward is the improvement in fitness. The system judges the inversion stage from the current state and outputs the corresponding regularization coefficients, then calculates the reward from the inversion result in order to adjust the policy and the value function in reinforcement learning. Learning is repeated until the policy and the value function are stable, after which the inversion stage can be judged accurately and suitable regularization coefficients can be set;
S6, controlling the constraints imposed on the inversion according to the regularization coefficients generated by reinforcement learning, so as to adaptively identify the inversion parameter and set the regularization, and obtaining high-precision induced polarization information (including the resistivity and polarizability parameters).
Further, in step S5, the reinforcement learning comprises the following steps:
Step one, randomly initializing four networks, namely the current policy network μ, the target policy network μ′, the current Q network Q and the target Q network Q′;
their parameters are, respectively, the current policy network parameter θ, the target policy network parameter θ′, the current Q network parameter w and the target Q network parameter w′; the current iteration count is t = 0;
Step two, taking S as the initial state, inputting the state S into the current policy network and obtaining the action A:
A = μ(S|θ) + N
where μ(·) is the policy output by the current policy network, S is the initial state, θ is the parameter of the current policy network, and N is noise;
Step three, executing the action A in the state S to obtain the next state S′ and the reward R, and storing S, A, R and S′ in the experience replay set D = {S_t, A_t, R_t, S′_t};
Step four, updating the state S to S′; randomly sampling n samples {S_i, A_i, R_i, S′_i}, i = 1, 2, 3, …, n, from the experience replay set D, and calculating the target Q value y_i used to train the current Q network Q:
y_i = R_i + γ Q′(S′_i, μ′(S′_i|θ′)|w′)
where R_i is the reward obtained by executing action A_i in state S_i, γ is the reward discount factor, Q′(·) is the Q value output by the target Q network, w′ is the parameter of the target Q network, μ′(·) is the policy output by the target policy network, and θ′ is the parameter of the target policy network;
Step five, calculating the loss L of the current Q network using the mean squared error (MSE) loss function, and updating all parameters w of the current Q network through gradient back-propagation of the neural network;
L = (1/n) Σ_{i=1}^{n} (y_i − Q(S_i, A_i|w))^2
where n is the total number of samples drawn, Q(·) is the Q value output by the current Q network, S_i is the i-th state, A_i is the i-th action, and w is the parameter of the current Q network;
Step six, using the performance index function J to update all parameters θ of the current policy network through gradient back-propagation of the neural network, and increasing the iteration count t by 1;
Figure GDA0003562403260000052
Step seven, updating the target Q network parameter w′ and the target policy network parameter θ′ at fixed intervals:
w′ = τw + (1 − τ)w′
θ′ = τθ + (1 − τ)θ′
where τ is the soft update coefficient of the network parameters, θ is the current policy network parameter, and w is the current Q network parameter;
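The soft update in step seven blends a small fraction of the current network into the target network each time it runs. A minimal sketch follows, assuming each network's parameters are held as a list of NumPy arrays; the function name and the value of τ are illustrative.

```python
import numpy as np


def soft_update(target_params, current_params, tau):
    """Return tau * current + (1 - tau) * target for every parameter array."""
    return [tau * w + (1.0 - tau) * w_target
            for w, w_target in zip(current_params, target_params)]


# Hypothetical one-layer parameter lists.
current_w = [np.ones((2, 2)), np.ones(2)]
target_w = [np.zeros((2, 2)), np.zeros(2)]

target_w = soft_update(target_w, current_w, tau=0.1)
print(target_w[1])  # each entry has moved 10 % of the way toward the current value
```

A small τ makes the target networks change slowly, which keeps the target values in step four from shifting abruptly between updates.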
Step eight, judging whether the policy and the value function have converged stably; if the termination condition is reached, the training ends; otherwise, returning to step two.
Further, in step S6, two types of constraints are imposed during the inversion. First, prior-information constraints on the resistivity and the polarizability are applied using the known physical properties of the exploration area, which reduces the search space of the inversion algorithm. Second, when one physical property parameter (the resistivity parameter or the polarizability parameter) is in its inversion stage, a limiting constraint is applied to the other physical property parameter, i.e. the search for the other parameter is restricted to a small range, which strengthens the influence of the dominant physical property parameter on the fitness function.
The strength of the different constraints is thus controlled by the different regularization coefficients generated by reinforcement learning, achieving accurate multi-parameter inversion.
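The two constraint types can be pictured as search bounds handed to the optimizer. The sketch below is one possible reading, not the patented implementation: the prior-information constraint is modeled as a fixed interval, and the limiting constraint as a narrow window around the current value of the non-dominant parameter. The `limit_fraction` parameter is a hypothetical stand-in for the constraint strength.

```python
import numpy as np


def search_bounds(current, prior_low, prior_high, is_dominant, limit_fraction=0.05):
    """Per-layer search bounds for one physical property parameter.

    The dominant parameter keeps the full prior interval; the other one is
    restricted to a narrow window around its current value.
    """
    current = np.asarray(current, dtype=float)
    low = np.full_like(current, prior_low)
    high = np.full_like(current, prior_high)
    if not is_dominant:
        half_width = limit_fraction * (prior_high - prior_low)
        low = np.maximum(low, current - half_width)
        high = np.minimum(high, current + half_width)
    return low, high


# Early stage (resistivity dominant): the polarizability search is narrowed.
m_current = [0.1, 0.3]
m_low, m_high = search_bounds(m_current, 0.0, 1.0, is_dominant=False)
print(m_low, m_high)
```

In the later stage the roles would be swapped: the polarizability would get the full prior interval and the resistivity the narrow window.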
In the method, the resistivity influences the observed data far more than the polarizability in the early stage of inversion, so the sensitivity of the resistivity is higher than that of the polarizability; the early stage of inversion is therefore dominated by the resistivity, a prior-information constraint is applied to the resistivity parameter, and a strong limiting constraint is applied to the polarizability parameter. In the later stage of inversion, the resistivity tends to be stable and the sensitivity of the polarizability is higher than that of the resistivity; the inversion is then dominated by the polarizability, a prior-information constraint is applied to the polarizability parameter, and a strong limiting constraint is applied to the resistivity parameter. Which constraints are applied is likewise set by reinforcement learning according to its judgment of the inversion stage.
With the reinforcement learning-based adaptive wide-area electromagnetic method induced polarization information extraction method designed by the invention, the inversion algorithm can automatically and quickly identify whether the dominant parameter of the current inversion is the polarizability or the resistivity and perform a targeted inversion, thereby improving the accuracy of induced polarization information extraction.
Compared with the prior art, the invention has the following advantages:
(1) The method can judge the current inversion state (dominated by polarizability inversion or by resistivity inversion) from the sensitivities of the resistivity and the polarizability during the iteration, output the correct regularization coefficients and apply the correct constraints, thereby achieving intelligent extraction of induced polarization information.
(2) The method can effectively solve the problem of uncertainty in multi-parameter inversion.
(3) The method can strengthen the influence of the polarizability in the later inversion stage and improve the accuracy of the extraction of the induced polarization information.
Drawings
Fig. 1 is a flow chart of the reinforcement learning-based adaptive wide-area electromagnetic method induced polarization information extraction method.
Fig. 2 shows the reinforcement learning-based strategy for setting the regularization coefficients and constraints.
Detailed Description
The invention is further illustrated below with reference to specific examples and the accompanying drawings:
Example 1
The invention provides a reinforcement learning-based adaptive wide-area electromagnetic method induced polarization information extraction method, which comprises the following steps:
S1, setting the calculation equation of the wide-area apparent resistivity:
Figure GDA0003562403260000061
In formula (1),
Figure 449169DEST_PATH_FDA0003567677330000022
r is the distance from the observation point to the center of the dipole source, i.e. the transmitter-receiver distance; dL is the length of the horizontal current source,
Figure 366310DEST_PATH_FDA0003567677330000023
is the distance between observation points M and N;
Figure GDA0003562403260000071
ρ is the resistivity, I is the current intensity,
Figure GDA0003562403260000072
k is the propagation constant (wavenumber) of the electromagnetic wave, i is the imaginary unit,
Figure GDA0003562403260000073
is the angle between r and the current source;
S2, setting the induced polarization model as follows:
Figure GDA0003562403260000074
In formula (2), ρ(ω) is the frequency-dependent wide-area complex resistivity when the polarization effect is considered; ρ_a is the wide-area apparent resistivity when the polarization effect is not considered; m is the polarizability; τ is the time constant; c is the frequency dependence coefficient; and ω is the angular frequency;
S3, setting the objective function of the inversion as follows:
fit = E(e) + λ_1 R(ρ) + λ_2 R(m)    (3)
In formula (3), R(ρ) and R(m) are the minimum-structure constraint functions for the resistivity and the polarizability, respectively; λ_1 and λ_2 are the regularization factors corresponding to R(ρ) and R(m), respectively. Two independent regularization factors are used because the value range of the polarizability (m ∈ [0, 1]) differs greatly from that of the resistivity (in general, ρ ≫ m); with a single shared regularization factor, the relatively small polarizability parameter could not be constrained. E(e) is the target error function, i.e. the data fitting error in the inversion;
R(ρ) and R(m) are both calculated using the following formula:
Figure GDA0003562403260000075
In formula (4), M denotes the model parameters obtained by inversion, comprising the resistivity ρ and the polarizability m;
S4, designing a staged extraction method for the different physical parameters by defining sensitivity as the feature used to identify the inversion parameter, the sensitivity being used to distinguish the stage of the current inversion;
The sensitivities of the resistivity and the polarizability are defined as follows:
Figure GDA0003562403260000076
In formula (5), S is the sensitivity, G is the number of iterations, fit is the fitness, and M denotes the model parameters obtained by inversion, comprising the resistivity ρ and the polarizability m;
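Formula (5) is not reproduced in this text, so its exact form is unknown here. Purely as an illustration of what a per-parameter sensitivity can look like, the sketch below uses a finite-difference estimate: each parameter group is perturbed by a small relative step and the resulting change in fitness is recorded. This is an assumed stand-in, not the patent's definition; `toy_fit` is a hypothetical fitness function.

```python
import numpy as np


def group_sensitivity(fit_fn, rho, m, group, rel_step=0.01):
    """Change in fitness per unit relative perturbation of one parameter group."""
    rho = np.asarray(rho, dtype=float)
    m = np.asarray(m, dtype=float)
    base = fit_fn(rho, m)
    if group == "rho":
        perturbed = fit_fn(rho * (1.0 + rel_step), m)
    elif group == "m":
        perturbed = fit_fn(rho, m * (1.0 + rel_step))
    else:
        raise ValueError("group must be 'rho' or 'm'")
    return abs(perturbed - base) / rel_step


def toy_fit(rho, m):
    """Hypothetical stand-in for the forward model plus data misfit."""
    target = 100.0 * (1.0 - 0.2)
    return float(np.sum((rho * (1.0 - m) - target) ** 2))


# Early-stage guess: resistivity far from its true value.
s_rho = group_sensitivity(toy_fit, [50.0], [0.1], "rho")
s_m = group_sensitivity(toy_fit, [50.0], [0.1], "m")
state = np.array([s_rho, s_m])  # the two sensitivities form the RL state
print(state)
```

For this toy problem the resistivity sensitivity is roughly nine times the polarizability sensitivity, which is the kind of contrast the state vector is meant to expose to the policy.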
S5, using reinforcement learning based on the deterministic policy gradient to judge the inversion stage and set the regularization coefficients, as shown in Fig. 2;
Reinforcement learning comprises three elements, namely state, action and reward, and the system is modeled in terms of these three elements: the state is the sensitivities of the resistivity and the polarizability, the action is the regularization coefficients, and the reward is the improvement in fitness. The system judges the inversion stage from the current state and outputs the corresponding regularization coefficients, then calculates the reward from the inversion result in order to adjust the policy and the value function in reinforcement learning. Learning is repeated until the policy and the value function are stable, after which the inversion stage can be judged accurately and suitable regularization coefficients can be set;
The reinforcement learning comprises the following steps:
Step one, randomly initializing four networks, namely the current policy network μ, the target policy network μ′, the current Q network Q and the target Q network Q′;
their parameters are, respectively, the current policy network parameter θ, the target policy network parameter θ′, the current Q network parameter w and the target Q network parameter w′; the current iteration count is t = 0;
Step two, taking S as the initial state, inputting the state S into the current policy network and obtaining the action A:
A = μ(S|θ) + N
where μ(·) is the policy output by the current policy network, S is the initial state, θ is the parameter of the current policy network, and N is noise;
Step three, executing the action A in the state S to obtain the next state S′ and the reward R, and storing S, A, R and S′ in the experience replay set D = {S_t, A_t, R_t, S′_t};
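Steps two and three can be sketched as follows. The policy here is a hypothetical fixed function standing in for the current policy network, the noise N is assumed to be Gaussian, and the action is clipped to [0, 1] as an assumed valid range for the two regularization coefficients; none of these choices is specified in the text.

```python
import random
from collections import deque

import numpy as np

rng = np.random.default_rng(0)


def toy_policy(state):
    """Hypothetical stand-in for mu(S | theta): maps 2 sensitivities to 2 coefficients."""
    weights = np.array([[0.5, -0.5], [-0.5, 0.5]])
    return 1.0 / (1.0 + np.exp(-(weights @ state)))


def select_action(policy, state, noise_std=0.1, low=0.0, high=1.0):
    """A = mu(S | theta) + N, clipped to an assumed range for the coefficients."""
    action = policy(state) + rng.normal(0.0, noise_std, size=2)
    return np.clip(action, low, high)


class ReplayBuffer:
    """Experience replay set D holding (S, A, R, S') tuples."""

    def __init__(self, capacity=10000):
        self.buffer = deque(maxlen=capacity)

    def store(self, state, action, reward, next_state):
        self.buffer.append((state, action, reward, next_state))

    def sample(self, n):
        return random.sample(list(self.buffer), n)

    def __len__(self):
        return len(self.buffer)


state = np.array([3.0, 0.3])  # hypothetical sensitivities of rho and m
action = select_action(toy_policy, state)
replay = ReplayBuffer()
replay.store(state, action, 1.0, np.array([2.0, 0.5]))
```

The exploration noise lets the agent try regularization coefficients near, but not identical to, the policy's current output.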
Step four, updating the state S to S′; randomly sampling n samples {S_i, A_i, R_i, S′_i}, i = 1, 2, 3, …, n, from the experience replay set D, and calculating the target Q value y_i used to train the current Q network Q:
y_i = R_i + γ Q′(S′_i, μ′(S′_i|θ′)|w′)
where R_i is the reward obtained by executing action A_i in state S_i, γ is the reward discount factor, Q′(·) is the Q value output by the target Q network, w′ is the parameter of the target Q network, μ′(·) is the policy output by the target policy network, and θ′ is the parameter of the target policy network;
Step five, calculating the loss L of the current Q network using the mean squared error (MSE) loss function, and updating all parameters w of the current Q network through gradient back-propagation of the neural network;
L = (1/n) Σ_{i=1}^{n} (y_i − Q(S_i, A_i|w))^2
where n is the total number of samples drawn, Q(·) is the Q value output by the current Q network, S_i is the i-th state, A_i is the i-th action, and w is the parameter of the current Q network;
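Steps four and five amount to computing a bootstrapped target from the target networks and then a mean squared error against the current Q network. The sketch below uses simple hypothetical functions in place of the neural networks so that the arithmetic can be followed by hand; the function names and numbers are illustrative.

```python
import numpy as np


def td_targets(rewards, next_states, target_policy, target_q, gamma=0.9):
    """y_i = R_i + gamma * Q'(S'_i, mu'(S'_i))."""
    next_actions = target_policy(next_states)
    return rewards + gamma * target_q(next_states, next_actions)


def critic_loss(targets, states, actions, current_q):
    """Mean squared error between the targets y_i and Q(S_i, A_i)."""
    return float(np.mean((targets - current_q(states, actions)) ** 2))


# Hypothetical stand-ins for the networks.
def toy_target_policy(states):
    return 0.5 * states


def toy_q(states, actions):
    return (states + actions).sum(axis=1)


rewards = np.array([1.0, 0.0])
next_states = np.array([[1.0, 1.0], [2.0, 0.0]])
states = np.array([[0.0, 0.0], [1.0, 1.0]])
actions = np.zeros((2, 2))

y = td_targets(rewards, next_states, toy_target_policy, toy_q)
loss = critic_loss(y, states, actions, toy_q)
print(y, loss)
```

In a full implementation the loss would be differentiated with respect to w by an automatic differentiation framework; that part is omitted here.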
Step six, using the performance index function J to update all parameters θ of the current policy network through gradient back-propagation of the neural network, and increasing the iteration count t by 1;
Figure GDA0003562403260000092
Step seven, updating the target Q network parameter w′ and the target policy network parameter θ′ at fixed intervals:
w′ = τw + (1 − τ)w′
θ′ = τθ + (1 − τ)θ′
where τ is the soft update coefficient of the network parameters, θ is the current policy network parameter, and w is the current Q network parameter;
Step eight, judging whether the policy and the value function have converged stably; if the termination condition is reached, the training ends; otherwise, returning to step two;
S6, controlling the constraints imposed on the inversion according to the regularization coefficients generated by reinforcement learning, so as to adaptively identify the inversion parameter and set the regularization, and obtaining high-precision induced polarization information (including the resistivity and polarizability parameters);
Two types of constraints are imposed during the inversion. First, prior-information constraints on the resistivity and the polarizability are applied using the known physical properties of the exploration area, which reduces the search space of the inversion algorithm. Second, when one physical property parameter (the resistivity parameter or the polarizability parameter) is in its inversion stage, a limiting constraint is applied to the other physical property parameter, i.e. the search for the other parameter is restricted to a small range, which strengthens the influence of the dominant physical property parameter on the fitness function. The strength of the different constraints is thus controlled by the different regularization coefficients generated by reinforcement learning, achieving accurate multi-parameter inversion.
Example 2
The method was tested on a three-layer model whose resistivity parameter ρ, thickness parameter h and polarizability parameter m are set as shown in Table 1. The inversion algorithm is the grey wolf optimizer (GWO), whose population size P and number of iterations t_max are set as shown in Table 1. The soft update coefficient τ and the reward discount factor γ of reinforcement learning are set as shown in Table 1. The regularization factors λ_1 and λ_2 of the minimum-structure constraint functions used when reinforcement learning is not adopted are also set as shown in Table 1.
TABLE 1
Figure GDA0003562403260000101
Table 2 compares the inversion results of the method proposed by the invention with those of a method without reinforcement learning and of the Actor-Critic method (single network); the evaluation indexes are the root mean square error (RMSE) and the coefficient of determination R².
TABLE 2
Method                           RMSE    R²
Without reinforcement learning   38.33   0.88
Actor-Critic method              30.24   0.91
The method of the invention      27.43   0.93
According to the inversion results, the reinforcement learning-based inversion methods (the Actor-Critic method and the method of the invention) outperform the inversion method without reinforcement learning, because reinforcement learning can automatically identify the physical property stage of the inversion, output the correct regularization coefficients, and apply the constraints. The method of the invention outperforms the Actor-Critic method because it implements the Actor and Critic modules with dual networks each; compared with the Actor-Critic method, separating the current network from the target network (dual networks) further improves the stability and generalization ability of reinforcement learning.
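For readers who want to reproduce the two evaluation indexes, the sketch below gives their usual definitions. Which quantities are compared in Table 2 (model parameters or fitted data) is not stated in the text, so the inputs here are generic arrays with made-up values.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root mean square error."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return float(1.0 - ss_res / ss_tot)


# Made-up values for illustration only.
truth = [1.0, 2.0, 3.0, 4.0]
estimate = [1.0, 2.0, 3.0, 5.0]
print(rmse(truth, estimate), r_squared(truth, estimate))
```

A lower RMSE and an R² closer to 1 both indicate a closer match, which is the direction of improvement shown across the three rows of Table 2.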

Claims (4)

1. A reinforcement learning-based adaptive wide-area electromagnetic method induced polarization information extraction method, characterized in that sensitivity is defined as the feature used to identify the inversion parameter, and reinforcement learning is used to adaptively identify the inversion parameter and set the regularization, thereby achieving intelligent extraction of induced polarization information;
step 1, designing a staged extraction method for the different physical parameters by defining sensitivity as the feature used to identify the inversion parameter, the sensitivity being used to distinguish the stage of the current inversion;
the sensitivities of the resistivity and the polarizability are defined as follows:
Figure FDA0003567677330000011
in formula (5), S is the sensitivity, G is the number of iterations, fit is the fitness, and M denotes the model parameters obtained by inversion, comprising the resistivity ρ and the polarizability m;
the early stage of inversion is dominated by the resistivity, a prior-information constraint is applied to the resistivity parameter, and a strong limiting constraint is applied to the polarizability parameter; in the later stage the resistivity tends to be stable and the sensitivity of the polarizability is higher than that of the resistivity, so the later stage of inversion is dominated by the polarizability, a prior-information constraint is applied to the polarizability parameter, and a strong limiting constraint is applied to the resistivity parameter;
the current inversion state, i.e. whether the inversion is dominated by the polarizability or by the resistivity, is judged from the sensitivities of the resistivity and the polarizability during the iteration, and the correct regularization coefficients are output and the correct constraints applied, thereby achieving intelligent extraction of induced polarization information;
step 2, using reinforcement learning based on the deterministic policy gradient to judge the inversion stage and set the regularization coefficients;
step 3, controlling the constraints imposed on the inversion according to the regularization coefficients generated by reinforcement learning, so as to adaptively identify the inversion parameter and set the regularization, and obtaining high-precision induced polarization information including the resistivity and polarizability parameters.
2. The reinforcement learning-based adaptive wide-area electromagnetic method induced polarization information extraction method according to claim 1, characterized in that step 1 is preceded by the following steps:
S1, setting the calculation equation of the wide-area apparent resistivity:
Figure FDA0003567677330000021
in formula (1),
Figure FDA0003567677330000022
r is the distance from the observation point to the center of the dipole source, i.e. the transmitter-receiver distance; dL is the length of the horizontal current source,
Figure FDA0003567677330000023
is the distance between observation points M and N;
Figure FDA0003567677330000024
Figure FDA0003567677330000025
ρ is the resistivity, I is the current intensity,
Figure FDA0003567677330000026
k is the propagation constant (wavenumber) of the electromagnetic wave, i is the imaginary unit,
Figure FDA0003567677330000027
is the angle between r and the current source;
S2, setting the induced polarization model as follows:
Figure FDA0003567677330000028
in the formula (2), ρ (ω) is a wide-area complex resistivity with respect to frequency in consideration of the polarization effect; rhoaWide area apparent resistivity when no polarization effect is considered; m is polarizability; τ is a time constant; c is a frequency correlation coefficient, and omega is an angular velocity;
step S3, setting the objective function of the inversion as follows:
fit = E(e) + λ1 R(ρ) + λ2 R(m) (3)
in formula (3), R(ρ) and R(m) are the minimum-structure constraint functions for the resistivity and the polarizability, respectively; λ1 and λ2 are regularization factors, and two independent regularization factors are adopted because the value range of the polarizability m differs greatly from the value range of the resistivity ρ;
where m ∈ [0,1] and ρ > m;
if a single shared regularization factor were adopted, the relatively small polarizability parameter could not be constrained; E(e) is the target error function, which is the fitting error of the data during inversion;
R(ρ) and R(m) are both calculated using the following formula:
Figure FDA0003567677330000029
in formula (4), M is the model parameter obtained by inversion, and includes the resistivity ρ and the polarizability m.
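As an illustration only, and not as part of the claims, the induced polarization model of step S2 and the objective function of formula (3) might be sketched as follows. The sketch assumes that formula (2) is the standard Cole-Cole model, which the listed symbols (ρ_a, m, τ, c, ω) suggest, and that the minimum-structure term of formula (4) is a sum of squared differences between adjacent model parameters. Both forms, all function names, and the mean-squared form of E(e) are assumptions, because formulas (2) and (4) are available here only as figure placeholders.

```python
def cole_cole(rho_a, m, tau, c, omega):
    """Complex resistivity under the ASSUMED standard Cole-Cole form:
    rho(w) = rho_a * (1 - m * (1 - 1 / (1 + (i*w*tau)**c)))."""
    iwt_c = (1j * omega * tau) ** c
    return rho_a * (1.0 - m * (1.0 - 1.0 / (1.0 + iwt_c)))


def roughness(model):
    """ASSUMED minimum-structure measure: sum of squared adjacent differences."""
    return sum((b - a) ** 2 for a, b in zip(model, model[1:]))


def data_misfit(observed, predicted):
    """E(e), ASSUMED here to be the mean squared magnitude of the residual."""
    return sum(abs(o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)


def fitness(observed, predicted, rho, m, lam1, lam2):
    """Formula (3): fit = E(e) + lam1 * R(rho) + lam2 * R(m)."""
    return (data_misfit(observed, predicted)
            + lam1 * roughness(rho)
            + lam2 * roughness(m))
```

Keeping `lam1` and `lam2` as separate arguments mirrors the reason given in the claim: a single factor tuned to the scale of ρ would leave the much smaller polarizability values effectively unconstrained.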
3. The reinforcement learning-based adaptive wide-area electromagnetic method induced polarization information extraction method according to claim 2, wherein in the step 2, the reinforcement learning step comprises:
step S201, randomly initializing four networks, namely the current policy network μ, the target policy network μ', the current Q network q, and the target Q network q';
their parameters are, respectively, the current policy network parameter θ, the target policy network parameter θ', the current Q network parameter w, and the target Q network parameter w'; the current iteration count is set to t = 0;
step S202, taking S as the initial state, inputting the state S into the current policy network, and obtaining the action A:
A=μ(S|θ)+N
where μ(·) is the policy output by the current policy network, S is the initial state, θ is the parameter of the current policy network, and N is the noise;
step S203, executing action A in state S to obtain the next state S' and the reward R, and storing (S, A, R, S') in the experience replay set D = {S_t, A_t, R_t, S'_t};
step S204, updating the state S to S'; randomly sampling n samples {S_i, A_i, R_i, S'_i}, i = 1, 2, 3, …, n, from the experience replay set D, and calculating the target output value y_i of the current Q network:
y_i = R_i + γ q'(S'_i, μ'(S'_i|θ')|w')
where R_i is the reward obtained by performing action A_i in state S_i, γ is the reward discount factor, q'(·) is the Q value output by the target Q network, w' is the parameter of the target Q network, μ'(·) is the policy output by the target policy network, and θ' is the parameter of the target policy network;
step S205, calculating the loss L of the current Q network using the mean squared error (MSE) loss function, and updating all parameters w of the current Q network through gradient backpropagation of the neural network;
Figure FDA0003567677330000031
where n is the total number of samples taken, q(·) is the Q value output by the current Q network, S_i is the i-th state, A_i is the i-th action, and w is the parameter of the current Q network;
step S206, using the performance index function J, updating all parameters θ of the current policy network through gradient backpropagation of the neural network, and increasing the iteration count t by 1;
Figure FDA0003567677330000041
step S207, at fixed intervals, updating the target Q network parameter w' and the target policy network parameter θ':
w′=τw+(1-τ)w′
θ′=τθ+(1-τ)θ′
where τ is the soft update coefficient of the network parameters, θ is the current policy network parameter, and w is the current Q network parameter;
step S208, judging whether the policy and the value function have converged stably; if the termination condition is reached, the training ends, and otherwise the process returns to step S202.
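The numerical core of steps S204, S205, and S207 can be sketched in a few lines. This is a hedged illustration rather than the patented implementation. The networks are replaced by plain Python callables, the replay samples are toy scalars, and the loss is written in the usual mean squared error form L = (1/n) Σ (y_i − q(S_i, A_i|w))², which is an assumption because the original loss formula is available here only as a figure placeholder. All function and variable names are hypothetical.

```python
import random


def td_targets(batch, gamma, target_policy, target_q):
    """Step S204: y_i = R_i + gamma * q'(S'_i, mu'(S'_i))."""
    return [r + gamma * target_q(s2, target_policy(s2)) for (_, _, r, s2) in batch]


def mse_loss(batch, targets, q):
    """Step S205 (ASSUMED form): L = (1/n) * sum_i (y_i - q(S_i, A_i))**2."""
    n = len(batch)
    return sum((y - q(s, a)) ** 2 for (s, a, _, _), y in zip(batch, targets)) / n


def soft_update(target_params, current_params, tau):
    """Step S207: w' = tau * w + (1 - tau) * w', applied element-wise."""
    return [tau * w + (1.0 - tau) * wt
            for w, wt in zip(current_params, target_params)]


# Toy usage: each replay entry is (S, A, R, S') with scalar states and actions.
replay = [(0.0, 1.0, 1.0, 0.5), (0.5, 0.2, 0.0, 1.0), (1.0, 0.1, 2.0, 0.0)]
batch = random.sample(replay, 2)      # random minibatch of n = 2 samples
policy = lambda s: 0.5 * s            # stand-in for the target policy mu'
critic = lambda s, a: s + a           # stand-in for both q and q'
y = td_targets(batch, 0.9, policy, critic)
loss = mse_loss(batch, y, critic)
```

With a small τ, `soft_update` moves the target parameters only slightly toward the current ones at each update, which is what keeps the targets y_i from shifting abruptly during training.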
4. The reinforcement learning-based adaptive wide-area electromagnetic method induced polarization information extraction method according to claim 2, wherein in step 3, two types of constraints are applied in the inversion process:
step S301, applying prior information constraints on the resistivity and the polarizability using the known physical characteristics of the exploration area, thereby reducing the search space of the inversion algorithm;
step S302, when the resistivity parameter or the polarizability parameter is in its inversion stage, imposing a limiting constraint on the other physical parameter, namely restricting the search of the other physical parameter to a small range, so as to strengthen the influence of the main physical parameter on the fitness function.
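One way to picture the two constraint types of steps S301 and S302 is as a bounds computation. The sketch below is only an assumption about how such constraints could be realized: the prior bounds, the parameter names `rho` and `m`, and the window fraction `shrink` are all hypothetical and do not come from the patent text.

```python
def clip(value, low, high):
    """Clamp value to the closed interval [low, high]."""
    return max(low, min(high, value))


def constrained_bounds(prior, current, primary, shrink=0.05):
    """Return the search bounds for each physical parameter.

    prior:   {'rho': (lo, hi), 'm': (lo, hi)}, prior bounds from known
             properties of the exploration area (step S301).
    current: current estimates of 'rho' and 'm'.
    primary: name of the parameter that is mainly being inverted.
    shrink:  HYPOTHETICAL half-width of the narrow window, as a fraction
             of the prior range, used for the other parameter (step S302).
    """
    bounds = {}
    for name, (lo, hi) in prior.items():
        if name == primary:
            # The main parameter keeps its full prior search range.
            bounds[name] = (lo, hi)
        else:
            # The other parameter is held near its current value.
            half = shrink * (hi - lo)
            bounds[name] = (clip(current[name] - half, lo, hi),
                            clip(current[name] + half, lo, hi))
    return bounds
```

For example, with prior bounds of 10 to 1000 for `rho` and 0 to 1 for `m`, calling the function with `primary='rho'` leaves the resistivity range untouched and confines the polarizability to a narrow window around its current estimate.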
CN202110386529.1A 2021-04-12 2021-04-12 Self-adaptive wide-area electromagnetic method induced polarization information extraction method based on reinforcement learning Active CN113204054B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110386529.1A CN113204054B (en) 2021-04-12 2021-04-12 Self-adaptive wide-area electromagnetic method induced polarization information extraction method based on reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110386529.1A CN113204054B (en) 2021-04-12 2021-04-12 Self-adaptive wide-area electromagnetic method induced polarization information extraction method based on reinforcement learning

Publications (2)

Publication Number Publication Date
CN113204054A CN113204054A (en) 2021-08-03
CN113204054B true CN113204054B (en) 2022-06-10

Family

ID=77026635

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110386529.1A Active CN113204054B (en) 2021-04-12 2021-04-12 Self-adaptive wide-area electromagnetic method induced polarization information extraction method based on reinforcement learning

Country Status (1)

Country Link
CN (1) CN113204054B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113960674B (en) * 2021-10-14 2023-11-21 湖北省水文地质工程地质勘察院有限公司 Wide-area electromagnetic method two-dimensional inversion method
CN115793064B (en) * 2022-07-11 2023-06-02 成都理工大学 Improved extraction method of excitation information in semi-aviation transient electromagnetic data
CN115829001B (en) 2022-11-08 2023-06-20 中国科学院地质与地球物理研究所 Transient electromagnetic-excitation field separation and multi-parameter information extraction method and system

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101706587B (en) * 2009-11-24 2012-06-27 中南大学 Method for extracting induced polarization model parameters prospected by electrical method
CN102495428A (en) * 2011-12-12 2012-06-13 山东大学 Resistivity real-time imaging monitoring method and system for water-bursting geological disaster in construction period of underground engineering
CN107290793B (en) * 2017-06-05 2019-02-19 湖南师范大学 It is a kind of to be leapfroged the ultra high density electrical method parallel refutation method of algorithm based on weighting more strategies
CN111143984A (en) * 2019-12-23 2020-05-12 贵州大方煤业有限公司 Magnetotelluric two-dimensional inversion method based on genetic algorithm optimization neural network
CN112083509B (en) * 2020-08-14 2022-06-07 南方科技大学 Method for detecting induced polarization abnormity in time-frequency electromagnetic method

Also Published As

Publication number Publication date
CN113204054A (en) 2021-08-03


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant