CN113077853A - Double-loss-value network deep reinforcement learning KVFD model mechanical parameter global optimization method and system


Info

Publication number
CN113077853A
CN113077853A (application CN202110368257.2A)
Authority
CN
China
Prior art keywords
parameter
value
curve
network
global
Prior art date
Legal status
Granted
Application number
CN202110368257.2A
Other languages
Chinese (zh)
Other versions
CN113077853B (en)
Inventor
张红梅
周衍
王凯
李文彬
张可浩
王炯
万明习
Current Assignee
Xian Jiaotong University
Original Assignee
Xian Jiaotong University
Priority date
Filing date
Publication date
Application filed by Xian Jiaotong University
Priority to CN202110368257.2A
Publication of CN113077853A
Application granted
Publication of CN113077853B
Legal status: Active

Classifications

    • G16C 60/00 Computational materials science, i.e. ICT specially adapted for investigating the physical or chemical properties of materials or phenomena associated with their design, synthesis, processing, characterisation or utilisation
    • G06F 30/27 Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
    • G06N 3/044 Recurrent networks, e.g. Hopfield networks
    • G06N 3/045 Combinations of networks
    • G06N 3/08 Learning methods
    • G06F 2111/14 Details relating to CAD techniques related to nanotechnology
    • G06F 2119/14 Force analysis or force optimisation, e.g. static or dynamic forces


Abstract

The invention discloses a global optimization method and system for the mechanical parameters of a KVFD model based on double-loss-value-network deep reinforcement learning. The method comprises the following steps: S1, inputting a pre-acquired nanoindentation measurement curve into a trained predicted-value acquisition network to obtain a parameter prediction for the curve; S2, using the parameter prediction as the initial value for the iterations of a deep reinforcement learning algorithm to obtain an approximation of the global parameter solution of the pre-acquired nanoindentation measurement curve; when the approximation of the global parameter solution reaches a preset convergence condition, outputting it as the mechanical parameters of the KVFD model. Because parameter predictions are introduced into the iterations for parameter guidance, the method approximates the global optimal solution well.

Description

Double-loss-value network deep reinforcement learning KVFD model mechanical parameter global optimization method and system
Technical Field
The invention belongs to the technical field of extracting mechanical parameters from nanoindentation measurement data, relates to the field of KVFD-model multi-parameter function fitting and global parameter approximation, and particularly relates to a method and system for global optimization of KVFD model mechanical parameters by double-loss-value-network deep reinforcement learning.
Background
At present, when the mechanical parameters of a measured material are extracted from nanoindentation measurement data, simple function fitting mostly uses the least-squares method, adjusting the function parameters iteratively to minimize the mean square error between the fitted curve and the real curve. This approach is fast and effective for fitting simple functions, but often performs poorly for complex, multi-parameter functions.
For the multi-parameter optimization of the complex KVFD-model functions, the common greedy, gradient-descent and simulated-annealing algorithms cannot reliably reach a good global optimal solution. The greedy and gradient-descent algorithms find a local optimal solution near the given initial parameters and have some applicability to complex multi-parameter optimization, but when the global parameters lie far from the given initial values these algorithms find them difficult to approach. The simulated-annealing algorithm accepts a new parameter solution with a certain probability and can jump out of local-optimum traps, showing better optimization capability in problems with many local optima and finding solutions near the global parameter solution with a certain probability; but because of this probabilistic character it cannot approach the global parameter solution every time, and its reliability is poor.
In summary, for the current multi-parameter optimization problem of the complex KVFD-model functions, it is difficult to approach the global optimal solution effectively.
Disclosure of Invention
The invention aims to provide a method and system for global optimization of the mechanical parameters of a KVFD model by double-loss-value-network deep reinforcement learning, so as to solve one or more of the above technical problems. Because parameter predictions are introduced into the iterations for parameter guidance, the method approximates the global optimal solution well.
In order to achieve the purpose, the invention adopts the following technical scheme:
the invention discloses a global optimization method for mechanical parameters of a KVFD model for deep reinforcement learning of a double-loss value network, which comprises the following steps of:
s1, inputting the pre-acquired nano indentation measurement curve into a trained predicted value acquisition network to obtain a parameter predicted value of the nano indentation measurement curve; the trained predicted value acquisition network is a circulating neural network based on an LSTM hidden layer, and LOSS function values used by the circulating neural network during training are calculated by a curve and a curve corresponding parameter of an input network and a parameter and parameter corresponding curve of network output;
s2, taking the parameter predicted value as an iteration initial value of a depth reinforcement learning algorithm for iteration to obtain an approximation of a global parameter solution of the pre-acquired nanoindentation measurement curve; the reward value prediction network of the deep reinforcement learning algorithm gives reward values when the current parameters change to different directions through the difference value between the curve corresponding to the current iteration parameters and the real curve, and guides the parameters to approach to the global parameters;
and when the approximation of the global parameter solution reaches a preset convergence condition, outputting the approximation of the global parameter solution as a mechanical parameter of the KVFD model.
In a further improvement of the invention, in step S1, the pre-acquired nanoindentation measurement curve comprises a time series, a force series and an indentation-depth series.
In a further improvement of the invention, in step S1, the predicted-value acquisition network comprises: several LSTM hidden layers and a DNN network;
each LSTM hidden layer has the same, fixed number of units, and the LSTM hidden layers are connected one to the next; the first LSTM hidden layer receives the pre-acquired nanoindentation measurement curve, and the output of the last LSTM hidden layer is fed into the DNN network;
the DNN network comprises several fully-connected layers and convolutional layers of different dimensions and converts the value output by the last LSTM hidden layer into the parameter prediction output.
In a further improvement of the invention, in step S1, the loss function value is calculated as

$$L_{\text{I-LSTM}} = w_p L_p + w_d L_d$$

where the $L_p$ part computes the loss between the label parameters $\theta_{\text{train}}$ and the network output parameters $\hat{\theta}$, the $L_d$ part computes the loss between the curve $D_{\text{train}}$ and the curve $\hat{D}$ corresponding to the network output parameters, and $w_p$, $w_d$ are the weights of the two parts.
In a further improvement of the invention, in step S2, the reward-value prediction network of the deep reinforcement learning algorithm comprises: several LSTM hidden layers and a DNN network;
each LSTM hidden layer has the same, fixed number of units, and the LSTM hidden layers are connected one to the next; the first LSTM hidden layer receives the difference obtained by subtracting the pre-acquired nanoindentation measurement curve from the current iteration curve, and the output of the last LSTM hidden layer enters the DNN network;
the DNN network comprises several fully-connected layers and convolutional layers of different dimensions and converts the value output by the last LSTM hidden layer into a reward prediction for each directional action.
In a further improvement of the invention, in step S2, the loss function used in training the reward-value prediction network is the sum of absolute errors between the label reward-value vector and the reward-value vector output by the network.
In a further improvement of the invention, in step S2, when the parameter prediction obtained in step S1 is used as the initial value for the iterations of the deep reinforcement learning algorithm to obtain an approximation of the global parameter solution of the pre-acquired nanoindentation measurement curve, each iteration comprises the following specific steps:
(1) predicting reward values for the candidate parameter set of the current iteration parameters with the reward evaluation rule and with the reward-value prediction network, respectively, and adding the two weighted reward values to obtain the reward evaluation of the candidate parameter set;
the reward evaluation rule is: to evaluate a candidate parameter, first calculate the curve difference $\Delta$ between the curve corresponding to the candidate parameter and the pre-acquired nanoindentation measurement curve, and then calculate the absolute mean $\overline{|\Delta|}$ of the curve difference; the reward value $r$ is then evaluated from $\overline{|\Delta|}$ (the evaluation formula is given only as an image in the original; a minimal sketch follows step (2) below);
(2) calculating a new row of the Q table from the reward evaluation obtained in step (1) and the current row of the Q table of the deep reinforcement learning algorithm, finding the maximum value in the new row, and taking the corresponding candidate parameter as the result parameter of the current iteration.
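For illustration, the reward evaluation rule can be sketched as follows. Since the evaluation formula for the reward value $r$ appears only as an image in the original, the mapping from the absolute mean error to the reward used here (its simple negation, so that a smaller error earns a larger reward) is an assumption.

```python
import numpy as np

def reward_evaluation_rule(candidate_curve, measured_curve):
    """Sketch of the reward evaluation rule (RER) for one candidate.

    Computes the curve difference delta and its absolute mean; the
    mapping from that error to the reward r is an assumption (the
    original gives the formula only as an image) -- here a smaller
    error simply yields a larger reward.
    """
    delta = np.asarray(candidate_curve) - np.asarray(measured_curve)
    mean_abs = np.mean(np.abs(delta))   # absolute mean of the difference
    return -mean_abs                    # assumed monotone mapping to r
```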
In a further improvement of the invention, in step S2, determining whether the approximation of the global parameter solution has reached the preset convergence condition specifically comprises: stopping the iterations when the error between the curve corresponding to the current iteration's result parameters and the pre-acquired nanoindentation measurement curve is smaller than a preset value, or when the number of iterations reaches a preset value.
The invention also discloses a KVFD-model-based system for extracting the mechanical parameters of nanoindentation measurement curves, comprising:
a parameter-prediction acquisition module for inputting a pre-acquired nanoindentation measurement curve into a trained predicted-value acquisition network to obtain a parameter prediction for the curve, where the trained predicted-value acquisition network is a recurrent neural network based on LSTM hidden layers and the loss function value used during its training is calculated both from the input curve and its label parameters and from the network's output parameters and the curve those parameters generate; and
a deep-reinforcement-learning iteration output module for using the obtained parameter prediction as the initial value for the iterations of a deep reinforcement learning algorithm to obtain an approximation of the global parameter solution of the pre-acquired nanoindentation measurement curve, where the reward-value prediction network of the deep reinforcement learning algorithm uses the difference between the curve corresponding to the current iteration parameters and the real curve to assign reward values for moving the current parameters in different directions, guiding the parameters toward the global parameters; when the approximation of the global parameter solution reaches a preset convergence condition, it is output as the mechanical parameters of the KVFD model.
Compared with the prior art, the invention has the following beneficial effects:
the invention provides a hierarchical deep reinforcement learning strategy for complex function multi-parameter optimization, which can solve the problem that the existing complex function multi-parameter optimization is difficult to effectively approach a global parameter solution. The method can more stably search an approximate global parameter solution in the complex function multi-parameter optimization.
In the invention, an integral algorithm is constructed based on deep reinforcement learning (DQN), an iteration initial parameter is given by a deep circular neural network (I-LSTM), and the network also participates in guiding training by global parameters during training; the reward evaluation of the iteration parameters is given by the participation of another deep-cycle neural network (R-LSTM), and the network guides the training by the participation of the global parameters during the training; the design enables the nano indentation measurement curve based on the KVFD model to introduce parameter guidance in the fitting process, and not only guides the fitting parameters and curve adjustment through curve errors, so that the method can better approach a global parameter solution in the complex function multi-parameter optimization problem.
In the invention, the guidance of global parameters on the multi-parameter optimization of complex functions is introduced by designing a neural network (namely R-LSTM) in deep reinforcement learning (DQN), so that the method has better capability of approaching global parameters than the prior art. By additionally arranging a neural network (i.e. I-LSTM) for initial parameter guidance and increasing the calculation of parameter errors in the loss function, the initial iteration parameters of the deep enhanced learning (DQN) can be close to the global parameter solution, the capability of the strategy of the invention for approaching the global parameters in the complex function multi-parameter optimization problem is enhanced, the iteration times of the deep enhanced learning (DQN) can be effectively reduced, and the operation speed is accelerated.
In the present invention, by adjusting the weight wp、wdThe attention degree of the I-LSTM to the parameters or curves can be adjusted; by adjusting Lp、LdThe specific calculation mode of the method can adapt the prediction capability of the I-LSTM to various curve equations with different characteristics, and achieves a good prediction effect.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly introduced below; it is obvious that the drawings in the following description are some embodiments of the invention, and that for a person skilled in the art, other drawings can be derived from them without inventive effort.
Fig. 1 is a schematic flow chart of a mechanical parameter extraction method of nanoindentation measurement data based on a KVFD model according to an embodiment of the present invention;
FIG. 2 is a diagram of an Initial LSTM Network (I-LSTM) structure for providing Initial parameter prediction for the QL section, in an embodiment of the present invention;
FIG. 3 is a schematic diagram of a Reward LSTM Network (R-LSTM) structure for providing Reward evaluation predictions for the QL section and a method for generating a training data set thereof, in accordance with an embodiment of the present invention;
fig. 4 is a schematic diagram of a mechanical parameter extraction system for nanoindentation measurement data based on the KVFD model according to an embodiment of the present invention;
FIG. 5 is a graph showing experimental curve fitting results in an embodiment of the present invention;
FIG. 6 is a schematic diagram illustrating comparison between simulation parameters and fitting parameters according to an embodiment of the present invention; wherein (a) in fig. 6 is a comparison diagram of a parameter indicating elasticity, (b) in fig. 6 is a comparison diagram of a parameter indicating fluidity, and (c) in fig. 6 is a comparison diagram of a parameter indicating viscosity;
FIG. 7 is a schematic diagram of experimental curve fitting error distribution in an embodiment of the present invention; fig. 7 (a) is a schematic view of a scatter plot of the error distribution, and fig. 7 (b) is a schematic view of a histogram of the error distribution.
Detailed Description
In order to make the purpose, technical effect and technical solution of the embodiments of the present invention clearer, the following clearly and completely describes the technical solution of the embodiments of the present invention with reference to the drawings in the embodiments of the present invention; it is to be understood that the described embodiments are only some of the embodiments of the present invention. Other embodiments, which can be derived by one of ordinary skill in the art from the disclosed embodiments without inventive faculty, are intended to be within the scope of the invention.
Existing complex-function multi-parameter optimization strategies find it difficult to locate or approach a global parameter solution effectively and stably. The invention discloses a strategy for fitting multi-parameter complex functions and approximating their global parameters based on deep reinforcement learning (DQN). The method optimizes iteratively with DQN, in which the reward value is given jointly by a custom Reward Evaluation Rule (RER) and the prediction of a Reward LSTM network (Reward-LSTM, R-LSTM); the initial parameters of the first iteration are given by the prediction of a custom Initial-value LSTM network (Initial-LSTM, I-LSTM). LSTM is used to build the networks because the inputs are time-series data: multi-parameter complex-function curves with many points.
Referring to Fig. 1, the invention designs a complex-function multi-parameter fitting algorithm (Curve-Fitting-DQN, CF-DQN) to find a solution approximating the global parameters. The design framework of CF-DQN is shown in Fig. 1. The input is the curve $D_{\text{real}}$ whose parameter solution is sought, and the output is the predicted result parameter $\theta_{\text{out}}$. The method takes the Q-Learning algorithm (QL) as its framework; one LSTM network (the Initial LSTM Network, I-LSTM) predicts the initial parameters for Q-Learning, and another LSTM network (the Reward LSTM Network, R-LSTM) evaluates reward values for the candidate parameters inside Q-Learning. The R-LSTM uses the difference between the curve corresponding to the current iteration parameters and the real curve to assign reward values for moving the current parameters in different directions, guiding the parameters toward the global parameters.
In the embodiment of the invention, for convenience of description, define the multi-parameter complex function $f$ and the curve $D_{\text{real}}$ whose parameters are to be extracted; the ideal global parameters corresponding to $D_{\text{real}}$ are $\theta_{\text{real}}$, i.e. $D_{\text{real}} = f(\theta_{\text{real}})$. The output result of CF-DQN is $\theta_{\text{out}}$, with corresponding curve $D_{\text{out}}$; the output of I-LSTM is $\theta_{\text{I-LSTM}}$; the action set of Q-Learning is $A = \{a_1, a_2, \ldots, a_n\}$; and the $i$-th row vector of the Q table is denoted $Q_i$.
The algorithm operates in 2 main stages, a parameter guidance stage and an iterative optimization stage:
(1) the parameter guidance stage is performed by the I-LSTM network: the curve $D_{\text{real}}$ whose parameters are to be extracted is input into the I-LSTM network to obtain a set of parameters $\theta_{\text{I-LSTM}}$, which is fed into QL as its initial parameters;
(2) the framework of the iterative optimization stage is QL, with the R-LSTM network taking part in the reward-evaluation link inside QL: at the $i$-th iteration step, the Reward Evaluation Rule (RER) and R-LSTM evaluate reward values for the parameter set obtained by applying the action set $A$ to the current parameters $\theta_i$. From the obtained reward values and the current row $Q_i$ of the Q table, a new row $Q_{i+1}$ is calculated and its maximum entry selected; if this is the $k$-th entry, action $a_k$ of action set $A$ is taken, updating the parameters from $\theta_i$ to $\theta_{i+1}$ and completing one optimization iteration. The iterations stop when the maximum number of optimization iterations is reached or the curve error falls below a preset value, and the parameters obtained at that point are the final parameters $\theta_{\text{out}}$.
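To make the two-stage flow concrete, here is a minimal Python sketch of one possible realization. The callables `i_lstm`, `r_lstm`, `f` and the entries of `actions`, the equal weighting of the rule-based and R-LSTM rewards, and the Q-row update rule are all assumptions: the patent describes these components at the block-diagram level only.

```python
import numpy as np

def cf_dqn(D_real, i_lstm, r_lstm, f, actions,
           max_iter=1000, tol=1e-8, lr=0.1, gamma=0.9):
    """Sketch of the two CF-DQN stages described above.

    i_lstm(D) -> initial parameter vector; r_lstm(delta) -> reward
    vector over the n actions; f(theta) -> model curve; actions:
    callables theta -> theta'. Learning rate, discount factor, error
    tolerance, reward weighting and the Q-row update rule are assumptions.
    """
    theta = i_lstm(D_real)                        # (1) parameter guidance
    q_row = np.zeros(len(actions))
    for _ in range(max_iter):                     # (2) iterative optimization
        candidates = [a(theta) for a in actions]
        deltas = [f(c) - D_real for c in candidates]
        rer = np.array([-np.abs(d).mean() for d in deltas])       # rule-based
        net = np.asarray(r_lstm(f(theta) - D_real))               # R-LSTM
        rewards = 0.5 * rer + 0.5 * net                           # weighted sum
        q_row = q_row + lr * (rewards + gamma * q_row.max() - q_row)
        k = int(np.argmax(q_row))                 # best entry of the new Q row
        theta = candidates[k]
        if np.abs(f(theta) - D_real).mean() < tol:
            break                                 # curve error below preset value
    return theta
```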
Referring to Fig. 2, in the embodiment of the invention, the first step of the CF-DQN algorithm is to provide initial iteration values for Q-Learning through the Initial LSTM Network (I-LSTM). The input of the I-LSTM is the one-dimensional curve $D$ whose parameters are to be extracted, with the number of curve points fixed at $m$. There follow 3 LSTM hidden layers, each with the same fixed number of units, chosen close to the number of curve points $m$. The first LSTM hidden layer receives the input data $D$, so its input dimension is $m$ and its output dimension is the set number of units; each subsequent LSTM layer has the set number of units as both input and output dimension, and the LSTM layers are connected one to the next. A DNN network follows, which may contain several fully-connected layers and convolutional layers of different dimensions, and converts the output of the LSTM layers into the output parameters $\theta_{\text{I-LSTM}}$.
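A minimal PyTorch sketch of the I-LSTM structure just described is given below. The use of PyTorch, the treatment of the curve as a length-$m$ sequence of scalar samples, and the head sizes are assumptions (the embodiment later fixes the hidden layers and the fully-connected layer at 256 nodes).

```python
import torch
import torch.nn as nn

class ILSTM(nn.Module):
    """Sketch of I-LSTM: 3 stacked LSTM hidden layers of equal size
    followed by a DNN head mapping to the parameter prediction.
    Reading the m-point curve as a sequence of scalar samples is one
    plausible interpretation of the structure, not a confirmed detail."""

    def __init__(self, hidden=256, n_params=3):
        super().__init__()
        self.lstm = nn.LSTM(input_size=1, hidden_size=hidden,
                            num_layers=3, batch_first=True)
        self.head = nn.Sequential(                # DNN part
            nn.Linear(hidden, 256), nn.ReLU(),
            nn.Linear(256, n_params))             # normalized [E0, alpha, tau]

    def forward(self, curve):                     # curve: (batch, m)
        seq = curve.unsqueeze(-1)                 # (batch, m, 1)
        out, _ = self.lstm(seq)
        return self.head(out[:, -1, :])           # last step into the DNN head
```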
In the embodiment of the invention, in order to introduce the guidance of global parameters into the loss function, the loss function $L_{\text{I-LSTM}}$ is designed as a weighted sum of two parts $L_p$ and $L_d$ with weights $w_p$ and $w_d$:

$$L_{\text{I-LSTM}} = w_p L_p + w_d L_d$$

where the $L_p$ part computes the loss between the label parameters $\theta_{\text{train}}$ and the network output parameters $\hat{\theta}$, and the $L_d$ part computes the loss between the curve $D_{\text{train}}$ and the curve $\hat{D}$ corresponding to the network output parameters. By adjusting the weights $w_p$ and $w_d$, the degree to which I-LSTM attends to the parameters or to the curve can be adjusted; by adjusting the specific calculation of $L_p$ and $L_d$, the prediction capability of I-LSTM can be adapted to curve equations with different characteristics, achieving a good prediction effect.
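A minimal sketch of this dual-term loss follows. Using mean-squared error for both $L_p$ and $L_d$, and regenerating $\hat{D}$ with a differentiable forward model `kvfd_curve`, are assumptions: the patent leaves the specific form of both terms adjustable.

```python
import torch.nn.functional as F

def i_lstm_loss(theta_hat, theta_train, d_train, kvfd_curve,
                w_p=1.0, w_d=1.0):
    """Dual-term I-LSTM loss sketch: L = w_p * L_p + w_d * L_d.

    L_p compares predicted with label parameters; L_d compares the
    label curve with the curve regenerated from the predicted
    parameters. MSE for both terms and a differentiable kvfd_curve
    are assumptions.
    """
    l_p = F.mse_loss(theta_hat, theta_train)   # parameter loss L_p
    d_hat = kvfd_curve(theta_hat)              # curve from predicted theta
    l_d = F.mse_loss(d_hat, d_train)           # curve loss L_d
    return w_p * l_p + w_d * l_d
```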
Referring to Fig. 3, in the embodiment of the invention, the network structure of R-LSTM is similar to that of I-LSTM, using LSTM hidden layers and a DNN network; the input is the point-by-point difference between two curves, and the output is a reward prediction for each directional action.
For convenience of description, define the following symbols: the optional action set for parameter changes is $A = \{a_1, a_2, \ldots, a_n\}$; the curve whose parameters are to be extracted is $D_{\text{real}}$, with true parameters $\theta_{\text{real}}$; the current reference parameters are $\theta_{\text{now}}$, with corresponding curve $D_{\text{now}}$. The parameter set obtained by applying the action set $A$ to $\theta_{\text{now}}$ is $\{\theta_1, \theta_2, \ldots, \theta_n\}$, with corresponding curve set $\{D_1, D_2, \ldots, D_n\}$. The difference between the curves $D_{\text{now}}$ and $D_{\text{real}}$ is $\Delta_{\text{train}}$, and the differences between the curves $\{D_1, D_2, \ldots, D_n\}$ and $D_{\text{real}}$ are $\{\Delta_1, \Delta_2, \ldots, \Delta_n\}$. Evaluating each parameter of the set $\{\theta_1, \ldots, \theta_n\}$ with the Reward Evaluation Rule (RER) yields the rewards $r_1, r_2, \ldots, r_n$; let $R_{\text{train}} = [r_1, r_2, \ldots, r_n]$. Feeding $\Delta_{\text{train}}$ into the R-LSTM yields the result $\hat{R}$.
The algorithm flow of the R-LSTM training phase is shown in Fig. 3. Given the true parameters $\theta_{\text{real}}$ and the current reference parameters $\theta_{\text{now}}$, the parameter set obtained by applying the action set $A$ is evaluated for reward. The network is thereby made to memorize the relationship between the difference between the curve corresponding to the current reference parameters and the real curve, and the reward evaluation of the candidate parameter set derived from the current reference parameters. The trained R-LSTM thus carries indirect guidance by the global parameters for the parameter optimization, and introducing this global-parameter guidance into the algorithm gives it the ability to approach the global parameters.
Each data item in the training set of R-LSTM contains: a vector $\Delta_{\text{train}}$ of length $m$, the point-by-point difference between the curve $D_{\text{now}}$ corresponding to the current reference parameters and the curve $D_{\text{real}}$ whose parameters are to be extracted; and a vector $R_{\text{train}}$ of length $n$, the reward values obtainable by each parameter after the current reference parameters $\theta_{\text{now}}$ execute the corresponding action set. Unlike for I-LSTM, the loss function $L_{\text{R-LSTM}}$ used in R-LSTM training is the sum of absolute errors of the reward-value vectors:

$$L_{\text{R-LSTM}} = \sum_{j=1}^{n} \left| r_j - \hat{r}_j \right|$$
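The construction of one R-LSTM training pair, and the stated sum-of-absolute-errors loss, can be sketched as follows; the RER form used for the labels (negative absolute mean error) is the same assumption as above, and `f` and `actions` are hypothetical callables.

```python
import numpy as np

def make_r_lstm_sample(theta_now, theta_real, f, actions):
    """One R-LSTM training pair per Fig. 3: the input is the point-by-point
    difference between the current reference curve and the real curve;
    the label is the RER reward of each candidate parameter obtained by
    applying the action set (RER form assumed, as above)."""
    D_real = f(theta_real)
    delta_train = f(theta_now) - D_real                 # length-m input
    candidates = [a(theta_now) for a in actions]
    r_train = np.array([-np.abs(f(c) - D_real).mean() for c in candidates])
    return delta_train, r_train                         # length-n label

def r_lstm_loss(r_hat, r_train):
    """Stated training loss: sum of absolute errors between the label
    reward vector and the network output reward vector."""
    return np.abs(np.asarray(r_hat) - np.asarray(r_train)).sum()
```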
In the Q-Learning iterations, the reward evaluation of the candidate parameter set comes from 2 sources: the Reward Evaluation Rule (RER) and the reward value predicted by R-LSTM. The two are weighted and summed to form the reward evaluation that guides the parameter iteration. The iterations end after a set number of iterations or when the error between the fitted curve and the real curve falls below a set value.
In the invention, by designing a neural network (R-LSTM) inside the deep reinforcement learning (DQN), guidance by global parameters is introduced into complex-function multi-parameter optimization, giving the method a better ability to approach the global parameters than the prior art. By adding a neural network for initial-parameter guidance (I-LSTM) and including the parameter error in its loss function, the initial iteration parameters of the DQN are brought close to the global parameter solution; this strengthens the strategy's ability to approach the global parameters in complex-function multi-parameter optimization, effectively reduces the number of DQN iterations, and accelerates the computation.
Referring to Fig. 4, the KVFD-model-based system for extracting the mechanical parameters of nanoindentation measurement curves in an embodiment of the present invention comprises:
a parameter-prediction acquisition module for inputting a pre-acquired nanoindentation measurement curve into a trained predicted-value acquisition network to obtain a parameter prediction for the curve, where the trained predicted-value acquisition network is a recurrent neural network based on LSTM hidden layers and the loss function value used during its training is calculated both from the input curve and its label parameters and from the network's output parameters and the curve those parameters generate; and
a deep-reinforcement-learning iteration output module for using the obtained parameter prediction as the initial value for the iterations of a deep reinforcement learning algorithm to obtain an approximation of the global parameter solution of the pre-acquired nanoindentation measurement curve, where the reward-value prediction network of the deep reinforcement learning algorithm uses the difference between the curve corresponding to the current iteration parameters and the real curve to assign reward values for moving the current parameters in different directions, guiding the parameters toward the global parameters; when the approximation of the global parameter solution reaches a preset convergence condition, it is output as the mechanical parameters of the KVFD model.
Detailed Description of Embodiments of the Invention
In the embodiment of the invention, the circular-probe nanoindentation equations of the KVFD model (hereinafter the KVFD equations) are chosen to implement the strategy of the invention. The equations cover three loading protocols, ramp-relaxation, load-unload and ramp-creep (below: relax, load-unload and creep). The equation system is shown in Table 1.
TABLE 1 Nanoindentation equations of the KVFD model under a circular probe
(Table 1 is given as an image in the original; it lists the force equations of the three loading protocols.)
The mechanical parameters to be optimized are $[E_0, \alpha, \tau]$. $R$ is the probe radius, $v$ is the indentation-depth rate during loading, $k$ is the force rate during loading, $T_r$ is the turning time, $\Gamma(\cdot)$ is the gamma function, $B(\cdot,\cdot)$ is the complete beta function, $B_x(\cdot,\cdot)$ is the incomplete beta function, and $E_{\alpha,\beta}(\cdot)$ is the Mittag-Leffler (M-L) function (the definitions appear as images in the original; standard notation is used here).
CF-DQN algorithm instances are constructed for the 3 loading protocols of the KVFD equations. In the KVFD equations the probe radius is taken as $R = 8.5\ \mu\text{m} = 8.5 \times 10^{-6}\ \text{m}$. The other conditions under the 3 loading protocols are set as follows:
(1) under the relax loading protocol: turning time $T_r = 2\ \text{s}$, hold time $T_{\text{hold}} = 3\ \text{s}$, maximum indentation depth $5\ \mu\text{m} = 5 \times 10^{-6}\ \text{m}$, and indentation-depth rate during the loading stage $v = 2.5\ \mu\text{m/s} = 2.5 \times 10^{-6}\ \text{m/s}$;
(2) under the load-unload protocol: turning time $T_r = 25\ \text{s}$, maximum indentation depth $5\ \mu\text{m} = 5 \times 10^{-6}\ \text{m}$, and indentation-depth rate $v = 0.2\ \mu\text{m/s} = 0.2 \times 10^{-6}\ \text{m/s}$;
(3) under the creep loading protocol: turning time $T_r = 2\ \text{s}$, hold time $T_{\text{hold}} = 3\ \text{s}$, maximum force $5\ \mu\text{N} = 5 \times 10^{-6}\ \text{N}$, and force rate during the loading stage $k = 2.5\ \mu\text{N/s} = 2.5 \times 10^{-6}\ \text{N/s}$.
When implementing the CF-DQN algorithm, a parameter set is $\theta = [E_0, \alpha, \tau]$ with parameter ranges $E_0 \in [10, 100000]$, $\alpha \in [0.01, 0.99]$ and $\tau \in [1, 1000]$. The number of points of the curve $D$ corresponding to a parameter set $\theta$ is set to $m = 250$; the time origin of each curve is $t = 0\ \text{s}$, and the sampling interval is 0.02 s under the relax protocol, 0.1 s under the load-unload protocol and 0.02 s under the creep protocol. The equations corresponding to the relax, load-unload and creep loading protocols are denoted $f_r$, $f_u$ and $f_c$, where $f_r$ for the relax protocol is as follows:
(the $f_r$ equation is given as an image in the original)
The $f_u$ of the load-unload protocol uses the loading section for $0 \le t \le T_r$; the unloading section for $t > T_r$ is not used:
(the $f_u$ equation is given as an image in the original)
The $f_c$ of the creep loading protocol is as follows:
(the $f_c$ equation is given as an image in the original)
Note that the 3 loading protocols correspond to 3 different complex curve equations, so 3 different CF-DQN instances need to be implemented separately.
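The implementation settings above can be collected in a few lines. This sketch only builds the time grids and samples one parameter set in range, since the forward models $f_r$, $f_u$ and $f_c$ of Table 1 appear only as images in the original; the random seed and the uniform sampling are assumptions.

```python
import numpy as np

m = 250                                          # points per curve, t from 0 s
dt = {'relax': 0.02, 'load-unload': 0.1, 'creep': 0.02}  # sampling interval (s)
t = {proto: np.arange(m) * step for proto, step in dt.items()}

# parameter ranges for theta = [E0, alpha, tau]
lo = np.array([10.0, 0.01, 1.0])
hi = np.array([100000.0, 0.99, 1000.0])

rng = np.random.default_rng(0)                   # arbitrary seed (assumption)
theta = rng.uniform(lo, hi)                      # one random parameter set
```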
The action set for parameter changes is $A = \{a_1, a_2, \ldots, a_n\}$ and contains 8 elements, i.e. $n = 8$; the details are given in Table 2. The symbol ↑ denotes increasing a parameter item by one step and ↓ denotes decreasing it by one step; the step size of each parameter item is 1% of the current parameter value.
TABLE 2 Parameter-change action set A used in implementing the CF-DQN algorithm
(Table 2 is given as an image in the original; it lists the 8 parameter-change actions.)
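Since Table 2 survives only as an image, the exact composition of the 8 actions is not recoverable. The sketch below implements the stated 1%-of-current-value steps and, as an illustrative assumption, builds the six single-parameter raise/lower actions; the remaining two of the 8 actions are unknown.

```python
import numpy as np

def make_step(idx, direction):
    """Action factory: raise (direction=+1) or lower (direction=-1)
    parameter item idx by one step of 1% of its current value."""
    def act(theta):
        theta = np.array(theta, dtype=float)
        theta[idx] += direction * 0.01 * theta[idx]
        return theta
    return act

# Six single-parameter actions over [E0, alpha, tau]; the composition of
# the full 8-action set in Table 2 is not recoverable from the image.
actions = [make_step(i, d) for i in range(3) for d in (+1, -1)]
```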
The 3 LSTM hidden layers in the I-LSTM are set to 256 nodes each; after the last LSTM hidden layer there is a fully-connected layer of 256 nodes whose output is 3 nodes, corresponding to the normalized $E_0$, $\alpha$ and $\tau$ parameters. A training data set containing one million items is used to train the I-LSTM.
The 3 LSTM hidden layers in the R-LSTM are set to 256 nodes each; after the last LSTM hidden layer there is a fully-connected layer of 256 nodes whose output is 8 nodes, corresponding to the normalized reward values of the parameters after the 8 actions are executed. A training data set containing one million items is used to train the R-LSTM.
The maximum number of iterations of the QL part is set to 1000, and the iterations terminate when this maximum is reached. At the same time, at each iteration the algorithm keeps track of the minimum curve error $\mathrm{MAE}_{\min}$ encountered during the iterations, together with the corresponding result parameters $\theta_k$ and iteration index $k$; once the current iteration index satisfies $i > k + 20$, it is considered that many iterations have brought no improvement in curve fit, the iterations are terminated, and the final result $\theta_{\text{out}} = \theta_k$ is output.
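The termination bookkeeping can be sketched as below, assuming a small dict `state` carried across iterations; the function name is hypothetical.

```python
def update_and_check(i, theta_i, mae_i, state, patience=20, max_iter=1000):
    """Track the smallest curve error MAE_min with its parameters
    theta_k and iteration index k; stop once i > k + 20 (no improvement
    for 20 iterations) or the 1000-iteration cap is reached. The final
    output is the remembered state['theta_k']."""
    if mae_i < state.get('mae_min', float('inf')):
        state.update(mae_min=mae_i, theta_k=theta_i, k=i)   # new best
    return i > state['k'] + patience or i >= max_iter
```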
After building the CF-DQN algorithms for the 3 loading protocols, the algorithms are tested with simulation data. The simulation data are generated with the settings described above, so the true parameters $\theta_{\text{real}}$ corresponding to a generated curve $D_{\text{real}}$ are known, which makes it convenient to study the relation between the CF-DQN result $\theta_{\text{out}}$ and the true parameters. A noise signal can also be added to the generated smooth curves to simulate data obtained in a real nanoindentation experiment.
The invention uses simulation data to carry out 2 tests on CF-DQN, and the following is a detailed description of the 2 tests:
(1) Curve-fitting effect test:
Using the parameters $\theta_{\text{real}} = [20000, 0.2, 50]$, 5 curves are generated for each of the relax, load-unload and creep protocols. The 5 curves of each loading protocol receive the following 5 treatments: no added noise, or added Gaussian, uniform, Rayleigh or exponential noise with mean $10^{-7}$. The resulting 15 curves are fitted with the CF-DQN algorithm to examine the curve-fitting effect and the result parameters.
(2) Fitting 10000 randomly generated relax curves:
10000 parameter sets are randomly generated within the parameter ranges and $f_r$ is used to generate 10000 relax curves, to each of which Gaussian random noise with mean $10^{-7}$ N is added. The 10000 curves are fitted with CF-DQN, the result parameters are extracted, and the result parameters are compared with the true parameters to verify the robustness of CF-DQN.
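Generating the 10000 noisy test curves can be sketched as follows; `f_r` is assumed available as a callable (its formula is an image in the original), and since only the noise mean is specified, using it also as the standard deviation is an assumption.

```python
import numpy as np

def make_test_curves(f_r, n=10000, noise_mean=1e-7, seed=0):
    """Sample n parameter sets within the stated ranges, generate relax
    curves with the forward model f_r, and add Gaussian noise with mean
    1e-7 N (standard deviation assumed equal to the mean)."""
    rng = np.random.default_rng(seed)
    thetas = np.column_stack([rng.uniform(10, 100000, n),   # E0
                              rng.uniform(0.01, 0.99, n),   # alpha
                              rng.uniform(1, 1000, n)])     # tau
    curves = []
    for th in thetas:
        clean = np.asarray(f_r(th))
        curves.append(clean + rng.normal(noise_mean, noise_mean, clean.shape))
    return thetas, curves
```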
The following are the test results of the CF-DQN algorithm instances customized to the KVFD equations.
Curve-fitting effect test:
The fitting results of the CF-DQN algorithm on KVFD-equation curves with the true parameters $\theta_{\text{real}} = [20000, 0.2, 50]$ and different added noises are shown in Fig. 5. Fig. 5 shows the CF-DQN fits of 15 simulation curves with true parameters $\theta_{\text{real}} = [E_0, \alpha, \tau] = [20000, 0.2, 50]$ under the relax, creep and load-unload loading protocols, respectively without noise and with added Gaussian, uniform, Rayleigh and exponential noise. In Fig. 5, columns 1 to 3 correspond to the relax, load-unload and creep loading protocols; row 1 shows the curves without noise, row 2 with Gaussian noise, row 3 with uniform noise, row 4 with Rayleigh noise and row 5 with exponential noise, the added noise in every case having mean $10^{-7}$. The generated simulation curves $D_{\text{real}}$ are drawn as solid lines, and the CF-DQN fitted curves $D_{\text{out}} = f(\theta_{\text{out}})$ as hollow circles.
Table 3 lists, for each curve, the fitting result parameters $\theta_{\text{out}}$, the absolute errors $|\theta_{\text{out}} - \theta_{\text{real}}|$ between the result parameters and the true parameters, and the mean absolute error (MAE) between the result curve and the simulation curve.
TABLE 3 CF-DQN results for the true parameters $\theta_{\text{real}} = [E_0, \alpha, \tau] = [20000, 0.2, 50]$
(Table 3 is given as an image in the original.)
In the noise-free case (Fig. 5, row 1), the fitted curves of the 3 loading protocols almost coincide with the simulation curves, with curve MAE of order $10^{-9}$ (see Table 3). The relax fitting result parameters are $[19339, 0.1953, 70]$, the load-unload results $[19320, 0.1951, 70]$ and the creep results $[18410, 0.1935, 105]$; the fitting result parameters $\theta_{\text{out}}$ are close to the true parameters $\theta_{\text{real}}$.
With added noise (Fig. 5, rows 2 to 5), the fitted curves of the 3 loading protocols follow the skeleton of the simulation curves, essentially unaffected by noise burrs, with curve MAE of order $10^{-8}$ (see Table 3). From the images, among the 4 noises of equal mean, uniform noise disturbs the curves least (Fig. 5, row 3): the MAEs for the relax, creep and load-unload protocols are 2.64E-8, 3.15E-8 and 2.65E-8, respectively. Rayleigh noise is next (Fig. 5, row 4), with MAEs of 5.47E-8, 5.74E-8 and 5.48E-8 for the relax, creep and load-unload protocols. Gaussian noise (Fig. 5, row 2) and exponential noise (Fig. 5, row 5) disturb the curve shape most, with MAEs of about 8E-8 for those 6 curves. Exponential noise also adds many upward sharp peaks, making the curves appear shifted upward, but the fitted curves still match the bottom skeletons of the simulation curves. The fitting result parameters $\theta_{\text{out}}$ of the 3 loading protocols remain close to the true parameters $\theta_{\text{real}}$ and show no significant change due to the added noise.
In the embodiment of the invention, 10000 randomly generated relax curves are fitted: 10000 parameter sets $\theta_{\text{real}}$ are randomly generated, the corresponding relax curves $D_{\text{real}}$ are generated, Gaussian random noise with mean $10^{-7}$ N is added, the CF-DQN algorithm extracts the parameters of the 10000 curves, and the result parameters $\theta_{\text{out}}$ are compared with the true parameters $\theta_{\text{real}}$.
Pearson correlation analyses are performed separately for $E_0$, $\alpha$ and $\tau$; the results are shown in Table 4. The correlation coefficient between the $E_0$ parameter of $\theta_{\text{real}}$ and the $E_0$ parameter of $\theta_{\text{out}}$ is $r = 0.8288$ ($p < 0.001$); between the $\alpha$ parameters, $r = 0.9963$ ($p < 0.001$); and between the $\tau$ parameters, $r = 0.2802$ ($p < 0.001$). A significant correlation therefore exists between the output parameters and the true parameters. The correlation coefficient of the $\alpha$ parameter is closest to 1, that of $E_0$ is also fairly close to 1, and that of $\tau$ is further from 1. Thus, of the 3 parameters extracted by the designed CF-DQN instances, $\alpha$ is very close to the curve's true parameter, $E_0$ is fairly close, and $\tau$ differs more from the curve's true parameter.
TABLE 4 Pearson correlation analysis between the true parameters $\theta_{\text{real}}$ and the result parameters $\theta_{\text{out}}$ of the 10000 simulation curves
(Table 4 is given as an image in the original.)
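The Table 4 analysis is a per-parameter Pearson correlation over the 10000 runs; a minimal sketch using SciPy follows (the array names are hypothetical).

```python
from scipy import stats

def correlate_parameters(theta_real, theta_out):
    """Pearson correlation between the true and fitted values of each
    KVFD parameter; theta_real and theta_out have shape (n, 3) for
    [E0, alpha, tau]."""
    for j, name in enumerate(['E0', 'alpha', 'tau']):
        r, p = stats.pearsonr(theta_real[:, j], theta_out[:, j])
        print(f'{name}: r = {r:.4f}, p = {p:.3g}')
```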
With the parameter items of $\theta_{\text{real}}$ as abscissa and those of $\theta_{\text{out}}$ as ordinate, scatter plots are drawn separately for the $E_0$, $\alpha$ and $\tau$ parameters of the 10000 data items, together with the correlation line (red solid line) and the 45° diagonal (black dotted line); the results are shown in Fig. 6. In the figure each dot is one data item, $r$ is the correlation coefficient and $p$ the significance; panel (a) shows parameter $E_0$, panel (b) parameter $\alpha$ and panel (c) parameter $\tau$. In Fig. 6(a), the point pairs of $E_0$ are distributed roughly around the 45° diagonal and become more dispersed as $E_0$ of $\theta_{\text{real}}$ grows; the correlation line lies close to the 45° diagonal with a slightly smaller slope. In Fig. 6(b), the point pairs of $\alpha$ hug the 45° diagonal, with only a few pairs offset from it, and the correlation line essentially coincides with the diagonal. In Fig. 6(c), the point pairs of $\tau$ scatter over the whole parameter range, showing a comparatively weak positive correlation, with the correlation line far from the 45° diagonal. The CF-DQN algorithm therefore fits the $\alpha$ parameter best, the $E_0$ parameter also well, and the $\tau$ parameter worst.
Referring to Fig. 7, Fig. 7 shows the mean absolute error (MAE) between the 10000 fitting result curves $D_{\text{out}}$ and the simulation curves $D_{\text{real}}$, together with its distribution: (a) the fitting error of each curve; (b) the error distribution. Fig. 7(a) plots the logarithm of the MAE of the 10000 data items; the MAEs fall within the range $10^{-8}$ to $10^{-4}$. Fig. 7(b) reflects the MAE distribution more intuitively: statistically, 4550 curves have an MAE below $10^{-7}$, 3654 curves between $10^{-7}$ and $10^{-6}$, 1293 curves between $10^{-6}$ and $10^{-5}$, and 503 curves between $10^{-5}$ and $10^{-4}$. The curve-fitting effect of the CF-DQN algorithm is good.
In summary, the embodiment of the invention discloses a hierarchical deep reinforcement learning strategy (CF-DQN) for complex-function multi-parameter optimization. Used for function fitting, it solves the fitting problem of complex functions and performs well in approximating the global parameters of multi-parameter functions, particularly non-convex ones. On the basis of deep reinforcement learning (DQN), 2 dedicated long short-term memory neural networks (LSTM) are built, one for predicting DQN rewards and one for predicting DQN initial iteration parameters. The reward value of the DQN is given by the custom Reward Evaluation Rule (RER) together with the prediction of the custom Reward LSTM (Reward-LSTM, R-LSTM), and the initial parameters of the first iteration are given by the prediction of the custom Initial-value LSTM (Initial-LSTM, I-LSTM).
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
Although the present invention has been described in detail with reference to the above embodiments, those skilled in the art can make modifications and equivalents to the embodiments of the present invention without departing from the spirit and scope of the present invention, which is set forth in the claims of the present application.

Claims (9)

1. A global optimization method for the mechanical parameters of a KVFD model based on double-loss-value-network deep reinforcement learning, characterized by comprising the following steps:
S1, inputting a pre-acquired nanoindentation measurement curve into a trained predicted-value acquisition network to obtain a parameter prediction for the curve; the trained predicted-value acquisition network is a recurrent neural network based on LSTM hidden layers, and the loss function value used during its training is calculated both from the input curve and its label parameters and from the network's output parameters and the curve those parameters generate;
S2, using the parameter prediction as the initial value for the iterations of a deep reinforcement learning algorithm to obtain an approximation of the global parameter solution of the pre-acquired nanoindentation measurement curve; the reward-value prediction network of the deep reinforcement learning algorithm uses the difference between the curve corresponding to the current iteration parameters and the real curve to assign reward values for moving the current parameters in different directions, guiding the parameters toward the global parameters;
and when the approximation of the global parameter solution reaches a preset convergence condition, outputting it as the mechanical parameters of the KVFD model.
2. The KVFD model mechanical parameter global optimization method of claim 1, wherein in step S1 the pre-acquired nanoindentation measurement curve comprises a time series, a force series and an indentation-depth series.
3. The KVFD model mechanical parameter global optimization method of claim 1, wherein in step S1 the predicted-value acquisition network comprises: several LSTM hidden layers and a DNN network;
each LSTM hidden layer has the same, fixed number of units, and the LSTM hidden layers are connected one to the next; the first LSTM hidden layer receives the pre-acquired nanoindentation measurement curve, and the output of the last LSTM hidden layer is fed into the DNN network;
the DNN network comprises several fully-connected layers and convolutional layers of different dimensions and converts the value output by the last LSTM hidden layer into the parameter prediction output.
4. The KVFD model mechanical parameter global optimization method of claim 1, wherein in step S1 the loss function value is calculated as

$$L_{\text{I-LSTM}} = w_p L_p + w_d L_d$$

where the $L_p$ part computes the loss between the label parameters $\theta_{\text{train}}$ and the network output parameters $\hat{\theta}$, the $L_d$ part computes the loss between the curve $D_{\text{train}}$ and the curve $\hat{D}$ corresponding to the network output parameters, and $w_p$, $w_d$ are the weights of the two parts.
5. The KVFD model mechanical parameter global optimization method of claim 1, wherein in step S2 the reward-value prediction network of the deep reinforcement learning algorithm comprises: several LSTM hidden layers and a DNN network;
each LSTM hidden layer has the same, fixed number of units, and the LSTM hidden layers are connected one to the next; the first LSTM hidden layer receives the difference obtained by subtracting the pre-acquired nanoindentation measurement curve from the current iteration curve, and the output of the last LSTM hidden layer enters the DNN network;
the DNN network comprises several fully-connected layers and convolutional layers of different dimensions and converts the value output by the last LSTM hidden layer into a reward prediction for each directional action.
6. The KVFD model mechanical parameter global optimization method of claim 5, wherein in step S2 the loss function used in training the reward-value prediction network is the sum of absolute errors between the label reward-value vector and the reward-value vector output by the network.
7. The KVFD model mechanical parameter global optimization method of claim 6, wherein in step S2, when the parameter prediction obtained in step S1 is used as the initial value for the iterations of the deep reinforcement learning algorithm to obtain an approximation of the global parameter solution of the pre-acquired nanoindentation measurement curve, each iteration comprises the following specific steps:
(1) predicting reward values for the candidate parameter set of the current iteration parameters with the reward evaluation rule and with the reward-value prediction network, respectively, and adding the two weighted reward values to obtain the reward evaluation of the candidate parameter set;
the reward evaluation rule is: to evaluate a candidate parameter, first calculate the curve difference $\Delta$ between the curve corresponding to the candidate parameter and the pre-acquired nanoindentation measurement curve, and then calculate the absolute mean $\overline{|\Delta|}$ of the curve difference; the reward value $r$ is then evaluated from $\overline{|\Delta|}$ (the evaluation formula is given as an image in the original);
(2) calculating a new row of the Q table from the reward evaluation obtained in step (1) and the current row of the Q table of the deep reinforcement learning algorithm, finding the maximum value in the new row, and taking the corresponding candidate parameter as the result parameter of the current iteration.
8. The KVFD model mechanical parameter global optimization method of claim 7, wherein in step S2 determining whether the approximation of the global parameter solution has reached the preset convergence condition specifically comprises: stopping the iterations when the error between the curve corresponding to the current iteration's result parameters and the pre-acquired nanoindentation measurement curve is smaller than a preset value, or when the number of iterations reaches a preset value.
9. A global optimization system for mechanical parameters of a KVFD model for deep reinforcement learning of a double-loss value network is characterized by comprising the following steps:
the parameter predicted value acquisition module is used for inputting the pre-acquired nano indentation measurement curve into a trained predicted value acquisition network to obtain a parameter predicted value of the nano indentation measurement curve; the trained predicted value acquisition network is a circulating neural network based on an LSTM hidden layer, and LOSS function values used by the circulating neural network during training are calculated by a curve and a curve corresponding parameter of an input network and a parameter and parameter corresponding curve of network output;
a deep reinforcement learning iteration output module, which takes the obtained predicted parameter values as the initial iteration value of the deep reinforcement learning algorithm and iterates to obtain an approximation of the global parameter solution of the pre-acquired nanoindentation measurement curve; the reward value prediction network of the deep reinforcement learning algorithm assigns reward values to movements of the current parameters in different directions, based on the difference between the curve corresponding to the current iteration parameters and the real curve, thereby guiding the parameters toward the global parameters; when the approximation of the global parameter solution reaches the preset convergence condition, it is output as the mechanical parameters of the KVFD model.
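
For illustration only (not part of the claims as filed), the following is a minimal PyTorch sketch of the reward value prediction network described in claim 5. The stacked LSTM with a single fixed hidden size follows the claim; the choice of three LSTM layers, a hidden size of 64, six directional actions (three KVFD parameters, two directions each), and a fully connected head without convolutional layers are assumptions made here for brevity.

import torch
import torch.nn as nn

class RewardPredictionNet(nn.Module):
    # Stacked LSTM hidden layers with a fixed, identical unit count,
    # followed by a DNN head that maps the last LSTM output to one
    # predicted reward per directional action.
    def __init__(self, n_lstm_layers=3, hidden_size=64, n_actions=6):
        super().__init__()
        self.lstm = nn.LSTM(input_size=1, hidden_size=hidden_size,
                            num_layers=n_lstm_layers, batch_first=True)
        self.head = nn.Sequential(
            nn.Linear(hidden_size, 32),
            nn.ReLU(),
            nn.Linear(32, n_actions),
        )

    def forward(self, curve_diff):
        # curve_diff: (batch, seq_len) difference between the current
        # iteration curve and the measured nanoindentation curve
        x = curve_diff.unsqueeze(-1)      # (batch, seq_len, 1)
        out, _ = self.lstm(x)
        last = out[:, -1, :]              # output of the last LSTM layer
        return self.head(last)            # reward prediction per action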
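
The claim-6 LOSS is then a sum, rather than a mean, of absolute errors, equivalent to torch.nn.L1Loss with reduction='sum':

def reward_loss(pred_rewards, label_rewards):
    # Sum of absolute errors between the label reward value vector and
    # the reward value vector output by the network (claim 6).
    return torch.sum(torch.abs(pred_rewards - label_rewards))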
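
A sketch of one iteration of the claim-7 loop follows. The exact reward formula is published only as an image, so the rule 1/(1 + mean|δ|) below is an assumed stand-in, as are the Q-learning-style row update, the weights w_rule and w_net, and the forward_model callable (a generator of KVFD force-depth curves from a parameter vector).

import numpy as np
import torch

def iterate_once(theta, candidates, q_row, measured, forward_model,
                 reward_net, w_rule=0.5, w_net=0.5, lr=0.1, gamma=0.9):
    # (1) rule-based reward for every candidate parameter set
    rule_rewards = np.array([
        1.0 / (1.0 + np.mean(np.abs(forward_model(c) - measured)))
        for c in candidates])

    # (1) network-predicted rewards from the current difference curve
    diff = torch.tensor(forward_model(theta) - measured,
                        dtype=torch.float32).unsqueeze(0)
    net_rewards = reward_net(diff).detach().numpy().ravel()

    # (1) weighted addition of the two reward values
    r = w_rule * rule_rewards + w_net * net_rewards

    # (2) new Q-table row from the reward evaluation and the current row
    q_new = q_row + lr * (r + gamma * q_row.max() - q_row)

    # (2) the candidate at the maximum of the new row becomes the
    # current iteration result parameter
    best = int(np.argmax(q_new))
    return candidates[best], q_new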
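
Claim 8's convergence test then wraps the step above; tol, max_iter, and the make_candidates perturbation helper below are illustrative placeholders, not values taken from the patent.

def make_candidates(theta, step=0.05):
    # Hypothetical helper: move each parameter up and down by a relative
    # step, one candidate per directional action (6 for 3 parameters).
    cands = []
    for i in range(len(theta)):
        for s in (step, -step):
            c = np.array(theta, dtype=float)
            c[i] *= 1.0 + s
            cands.append(c)
    return cands

def optimize(theta0, measured, forward_model, reward_net,
             tol=1e-3, max_iter=200):
    theta = np.array(theta0, dtype=float)
    q_row = np.zeros(len(theta) * 2)      # one Q-value per action
    for _ in range(max_iter):
        theta, q_row = iterate_once(theta, make_candidates(theta), q_row,
                                    measured, forward_model, reward_net)
        # stop when the curve error is below a preset threshold,
        # or when the iteration count reaches a preset value (claim 8)
        if np.mean(np.abs(forward_model(theta) - measured)) < tol:
            break
    return theta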
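
Finally, the "double loss" used to train the parameter prediction network of claim 9 combines a parameter-space error with a curve-space error obtained by regenerating a curve from the predicted parameters. The mean-absolute-error form and the equal weighting below are assumptions; the patent states only that both the input curve with its parameters and the output parameters with their curve enter the LOSS value.

def dual_loss(pred_params, true_params, pred_curve, input_curve):
    # parameter error: network-output parameters vs. the parameters
    # corresponding to the input curve
    param_loss = torch.mean(torch.abs(pred_params - true_params))
    # curve error: curve regenerated from the predicted parameters
    # (e.g. by a differentiable KVFD forward model) vs. the input curve
    curve_loss = torch.mean(torch.abs(pred_curve - input_curve))
    return param_loss + curve_loss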
CN202110368257.2A 2021-04-06 2021-04-06 Global optimization method and system for mechanical parameters of double loss value network deep reinforcement learning KVFD model Active CN113077853B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110368257.2A CN113077853B (en) 2021-04-06 2021-04-06 Global optimization method and system for mechanical parameters of double loss value network deep reinforcement learning KVFD model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110368257.2A CN113077853B (en) 2021-04-06 2021-04-06 Global optimization method and system for mechanical parameters of double loss value network deep reinforcement learning KVFD model

Publications (2)

Publication Number Publication Date
CN113077853A true CN113077853A (en) 2021-07-06
CN113077853B CN113077853B (en) 2023-08-18

Family

ID=76615137

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110368257.2A Active CN113077853B (en) 2021-04-06 2021-04-06 Global optimization method and system for mechanical parameters of double loss value network deep reinforcement learning KVFD model

Country Status (1)

Country Link
CN (1) CN113077853B (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017004626A1 (en) * 2015-07-01 2017-01-05 The Board Of Trustees Of The Leland Stanford Junior University Systems and methods for providing reinforcement learning in a deep learning system
CN109002942A (en) * 2018-09-28 2018-12-14 河南理工大学 A kind of short-term load forecasting method based on stochastic neural net
WO2020220191A1 (en) * 2019-04-29 2020-11-05 Huawei Technologies Co., Ltd. Method and apparatus for training and applying a neural network
CN110443364A (en) * 2019-06-21 2019-11-12 深圳大学 A kind of deep neural network multitask hyperparameter optimization method and device
CN111914474A (en) * 2020-06-28 2020-11-10 西安交通大学 Fractional-order KVFD multi-parameter machine learning optimization method for viscoelastic mechanical characterization of soft substances

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
LI, Cong; SHI, Hongwei: "Research on Nonlinear System Modeling Method Based on Deep-Learning Long-Term Prediction", Modern Computer (现代计算机), no. 15 *

Also Published As

Publication number Publication date
CN113077853B (en) 2023-08-18

Similar Documents

Publication Publication Date Title
TWI698807B (en) Artificial neural network class-based pruning
CN109120462B (en) Method and device for predicting opportunistic network link and readable storage medium
CN109447156B (en) Method and apparatus for generating a model
US20220147877A1 (en) System and method for automatic building of learning machines using learning machines
CN108959474B (en) Entity relation extraction method
Zhao et al. Bearing health condition prediction using deep belief network
CN114298851A (en) Network user social behavior analysis method and device based on graph sign learning and storage medium
TW202123098A (en) Method and electronic device for selecting neural network hyperparameters
CN111564179A (en) Species biology classification method and system based on triple neural network
CN111027292A (en) Method and system for generating limited sampling text sequence
WO2021055442A1 (en) Small and fast video processing networks via neural architecture search
CN116415200A (en) Abnormal vehicle track abnormality detection method and system based on deep learning
JP2019105871A (en) Abnormality candidate extraction program, abnormality candidate extraction method and abnormality candidate extraction apparatus
CN113330462A (en) Neural network training using soft nearest neighbor loss
CN113128689A (en) Entity relationship path reasoning method and system for regulating knowledge graph
CN116204786B (en) Method and device for generating designated fault trend data
CN113077853A (en) Double-loss-value network deep reinforcement learning KVFD model mechanical parameter global optimization method and system
CN111612022A (en) Method, apparatus, and computer storage medium for analyzing data
CN115345303A (en) Convolutional neural network weight tuning method, device, storage medium and electronic equipment
CN115936303A (en) Transient voltage safety analysis method based on machine learning model
CN112419098B (en) Power grid safety and stability simulation sample screening and expanding method based on safety information entropy
CN115035304A (en) Image description generation method and system based on course learning
CN116324807A (en) Neural architecture and hardware accelerator search
CN116391193A (en) Method and apparatus for energy-based latent variable model based neural networks
CN113128677A (en) Model generation method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant