CN113077853B - Global optimization method and system for mechanical parameters of double loss value network deep reinforcement learning KVFD model - Google Patents

Global optimization method and system for mechanical parameters of double loss value network deep reinforcement learning KVFD model

Info

Publication number
CN113077853B
CN113077853B
Authority
CN
China
Prior art keywords
parameter
value
curve
network
global
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110368257.2A
Other languages
Chinese (zh)
Other versions
CN113077853A (en)
Inventor
张红梅
周衍
王凯
李文彬
张可浩
王炯
万明习
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xian Jiaotong University
Original Assignee
Xian Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xian Jiaotong University filed Critical Xian Jiaotong University
Priority to CN202110368257.2A priority Critical patent/CN113077853B/en
Publication of CN113077853A publication Critical patent/CN113077853A/en
Application granted granted Critical
Publication of CN113077853B publication Critical patent/CN113077853B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16C COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
    • G16C60/00 Computational materials science, i.e. ICT specially adapted for investigating the physical or chemical properties of materials or phenomena associated with their design, synthesis, processing, characterisation or utilisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00 Computer-aided design [CAD]
    • G06F30/20 Design optimisation, verification or simulation
    • G06F30/27 Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2111/00 Details relating to CAD techniques
    • G06F2111/14 Details relating to CAD techniques related to nanotechnology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2119/00 Details relating to the type or aim of the analysis or the optimisation
    • G06F2119/14 Force analysis or force optimisation, e.g. static or dynamic forces

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Geometry (AREA)
  • Computer Hardware Design (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The application discloses a global optimization method and system for mechanical parameters of a double loss value network deep reinforcement learning KVFD model, wherein the method comprises the following steps: S1, inputting a pre-acquired nanoindentation measurement curve into a trained predicted value acquisition network to obtain a parameter predicted value for the nanoindentation measurement curve; S2, iterating with the parameter predicted value as the initial iteration value of a deep reinforcement learning algorithm to obtain an approximation of the global parameter solution of the pre-acquired nanoindentation measurement curve; and, when the approximation of the global parameter solution reaches a preset convergence condition, outputting the approximation of the global parameter solution as the mechanical parameters of the KVFD model. By introducing the parameter predicted value into the iteration as parameter guidance, the method can closely approximate the global optimal solution.

Description

Global optimization method and system for mechanical parameters of double loss value network deep reinforcement learning KVFD model
Technical Field
The application belongs to the technical field of extracting mechanical parameters from nanoindentation measurement data, relates to KVFD model multi-parameter function fitting and global parameter approximation, and particularly relates to a method and a system for global optimization of mechanical parameters of a double loss value network deep reinforcement learning KVFD model.
Background
At present, when the mechanical parameters of a measured material are acquired from nanoindentation measurement data, simple function fitting with the least squares method is mostly adopted: the function parameters are adjusted iteratively to minimize the mean square error between the fitted curve and the real curve. This approach is fast and effective for fitting simple functions, but often performs poorly for fitting complex functions and multi-parameter functions.
For multi-parameter optimization of the complex functions of the KVFD model, the commonly used greedy algorithm, gradient descent algorithm and simulated annealing algorithm cannot obtain a good global optimal solution. The greedy algorithm and the gradient descent algorithm can find a local optimal solution near a given initial parameter and have some applicability to complex-function multi-parameter optimization, but whether they find the global parameters depends strongly on the given initial value, and they can hardly approach the global parameters. The simulated annealing algorithm accepts a new parameter solution with a certain probability and therefore has the ability to escape local optimal traps; it shows better optimizing capability in optimization problems with many local optima and has a certain probability of finding a solution near the global parameter solution, but its probabilistic nature also means that it cannot approach the global parameter solution every time, so its reliability is poor.
In summary, it is currently difficult to effectively approach the global optimal solution in the multi-parameter optimization problem of the complex functions of the KVFD model.
Disclosure of Invention
The application aims to provide a global optimization method and system for mechanical parameters of a double loss value network deep reinforcement learning KVFD model, so as to solve one or more of the above technical problems. By introducing the parameter predicted value into the iteration as parameter guidance, the method can closely approximate the global optimal solution.
In order to achieve the above purpose, the application adopts the following technical scheme:
the application discloses a global optimization method for mechanical parameters of a double loss value network deep reinforcement learning KVFD model, which comprises the following steps:
s1, inputting a pre-acquired nano indentation measurement curve into a trained predicted value acquisition network to obtain a parameter predicted value of the nano indentation measurement curve; the trained predicted value acquisition network is a cyclic neural network based on an LSTM hidden layer, and LOSS function values used in the cyclic neural network are calculated by curve-curve corresponding parameters of an input network and parameter-parameter corresponding curves of network output;
s2, iterating the parameter predicted value as an iteration initial value of a deep reinforcement learning algorithm to obtain an approximation of a global parameter solution of the pre-acquired nano indentation measurement curve; the reward value prediction network of the deep reinforcement learning algorithm gives out the reward value when the current parameter changes to different directions through the difference value of the corresponding curve and the real curve of the current iteration parameter, and guides the parameter to approach to the global parameter;
and when the approximation of the global parameter solution reaches a preset convergence condition, outputting the approximation of the global parameter solution as the mechanical parameters of the KVFD model.
A further improvement of the application is that, in step S1, the pre-acquired nanoindentation measurement curve comprises a time sequence, a stress sequence and an indentation depth sequence.
A further improvement of the present application is that, in step S1, the predicted value acquisition network includes: a plurality of LSTM hidden layers and a DNN network;
the unit numbers of each of the LSTM hidden layers are fixed and consistent, and each LSTM hidden layer is connected in a point-to-point mode; the first LSTM hidden layer inputs a pre-acquired nanoindentation measurement curve, and the last LSTM hidden layer output value enters a DNN network;
the DNN network comprises a plurality of full-connection layers and convolution layers with different dimensions, and the full-connection layers and the convolution layers are used for converting the value output by the last LSTM hidden layer into a parameter predicted value to be output.
A further improvement of the application is that, in step S1, the LOSS function value is calculated as

$$ L = w_p L_p + w_d L_d $$

where the L_p part calculates the loss value between the label parameter θ_train and the parameters output by the network, the L_d part calculates the loss value between the curve D_train and the curve corresponding to the parameters output by the network, and w_p, w_d are the weights of the L_p and L_d parts respectively.
A further improvement of the present application is that, in step S2, the reward value prediction network of the deep reinforcement learning algorithm includes: a plurality of LSTM hidden layers and a DNN network;
the unit numbers of each of the LSTM hidden layers are fixed and consistent, and each LSTM hidden layer is connected in a point-to-point mode; the first LSTM hidden layer inputs a difference value obtained by subtracting a pre-acquired nanoindentation measurement curve from a current iteration curve, and the last LSTM hidden layer output value enters a DNN network;
the DNN network comprises a plurality of fully connected layers and convolution layers with different dimensions, and the fully connected layers and the convolution layers are used for converting the value output by the last LSTM hidden layer into rewards prediction for actions in all directions.
A further improvement of the present application is that, in step S2, the LOSS function used in training the reward value prediction network is the sum of the absolute errors between the label reward value vector and the reward value vector output by the network.
In step S2, when iterating with the parameter predicted value obtained in step S1 as the initial iteration value of the deep reinforcement learning algorithm to obtain an approximation of the global parameter solution of the pre-acquired nanoindentation measurement curve, each iteration specifically comprises:
(1) Predicting the reward values of the alternative parameter set of the current iteration parameters with a reward evaluation rule and with the reward value prediction network respectively, and taking the weighted sum of the two as the reward evaluation of the alternative parameter set of the current iteration parameters;
the reward evaluation rule is that for evaluating a certain alternative parameter, firstly calculating the curve difference delta between the corresponding curve of the alternative parameter and the pre-acquired nano indentation measurement curve, and then calculating the absolute average value of the curve difference
The evaluation formula of the prize value r is expressed as:
(2) Calculating a new row of the Q table from the reward evaluation obtained in step (1) and the content of the current row of the Q table in the deep reinforcement learning algorithm, finding the maximum value in the new row of the Q table, and taking the corresponding alternative parameter as the result parameter of the current iteration.
A further improvement of the application is that, in step S2, judging whether the approximation of the global parameter solution reaches the preset convergence condition specifically comprises: stopping the iteration when the error between the curve corresponding to the current iteration result parameters and the pre-acquired nanoindentation measurement curve is smaller than a preset value; or stopping the iteration when the number of iterations reaches a preset value.
The application further discloses a mechanical parameter extraction system for nanoindentation measurement curves based on the KVFD model, which comprises the following components:
the parameter prediction value acquisition module is used for inputting the pre-acquired nano indentation measurement curve into a trained prediction value acquisition network to acquire a parameter prediction value of the nano indentation measurement curve; the trained predicted value acquisition network is a cyclic neural network based on an LSTM hidden layer, and LOSS function values used in the cyclic neural network are calculated by curve-curve corresponding parameters of an input network and parameter-parameter corresponding curves of network output;
the depth reinforcement learning iteration output module is used for iterating the obtained parameter predicted value as an iteration initial value of a depth reinforcement learning algorithm to obtain an approximation of a global parameter solution of the pre-obtained nano indentation measurement curve; the reward value prediction network of the deep reinforcement learning algorithm gives out the reward value when the current parameter changes to different directions through the difference value of the corresponding curve and the real curve of the current iteration parameter, and guides the parameter to approach to the global parameter; and when the approximation of the global parameter solution reaches a preset convergence condition, outputting the approximation of the global parameter solution as a mechanical parameter of the KJFD model.
Compared with the prior art, the application has the following beneficial effects:
the application provides a hierarchical depth reinforcement learning strategy for multi-parameter optimization of a complex function, which can solve the problem that the existing multi-parameter optimization of the complex function is difficult to effectively approach to a global parameter solution. The method can more stably find the approach global parameter solution in the multi-parameter optimization of the complex function.
In the application, the overall algorithm is built on deep reinforcement learning (DQN); the initial iteration parameters are given by a deep recurrent neural network (I-LSTM), whose training is also guided by the global parameters; the reward evaluation of the iteration parameters is given with the participation of another deep recurrent neural network (R-LSTM), whose training is likewise guided by the global parameters. This design introduces parameter guidance into the fitting of nanoindentation measurement curves based on the KVFD model, rather than guiding the adjustment of the fitting parameters and curve only by the curve error, so the method can better approximate the global parameter solution in the complex-function multi-parameter optimization problem.
In the application, the guidance of the global parameters for complex-function multi-parameter optimization is introduced by the neural network (namely the R-LSTM) designed inside the deep reinforcement learning (DQN), so the application approaches the global parameters better than the prior art. By adding a neural network (namely the I-LSTM) for initial parameter guidance and adding the calculation of the parameter error to the loss function, the initial iteration parameters of the deep reinforcement learning (DQN) can be made close to the global parameter solution, which strengthens the ability of the strategy to approach the global parameters in the complex-function multi-parameter optimization problem, effectively reduces the number of DQN iterations, and increases the operation speed.
In the application, by adjusting the weights w_p and w_d, the relative importance that the I-LSTM attaches to the parameters or to the curve can be adjusted; by adjusting the specific calculation of L_p and L_d, the prediction ability of the I-LSTM can be adapted to curve equations with different characteristics, achieving a good prediction effect.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly described below; it will be apparent to those of ordinary skill in the art that the drawings in the following description show only some embodiments of the application, and that other drawings may be derived from them without inventive effort.
FIG. 1 is a schematic flow chart of a method for extracting mechanical parameters of nano indentation measurement data based on a KVFD model according to an embodiment of the present application;
FIG. 2 is a schematic diagram of an Initial LSTM Network (I-LSTM) structure for providing initial parameter prediction for the QL portion in an embodiment of the application;
FIG. 3 is a schematic diagram of a Reward LSTM Network (R-LSTM) structure for providing reward evaluation prediction for the QL portion and a method of generating its training data set in an embodiment of the application;
FIG. 4 is a schematic diagram of a mechanical parameter extraction system based on nanoindentation measurement data of the KVFD model according to an embodiment of the present application;
FIG. 5 is a graph showing the results of experimental curve fitting in an embodiment of the present application;
FIG. 6 is a schematic diagram showing the comparison of simulation parameters and fitting parameters in an embodiment of the present application; wherein (a) in FIG. 6 is a comparison for the parameter representing elasticity, (b) in FIG. 6 is a comparison for the parameter representing fluidity, and (c) in FIG. 6 is a comparison for the parameter representing viscosity;
FIG. 7 is a graph showing the experimental curve fitting error distribution in an embodiment of the present application; FIG. 7 (a) is a scatter plot of the error distribution, and FIG. 7 (b) is a histogram of the error distribution.
Detailed Description
In order to make the purposes, technical effects and technical solutions of the embodiments of the present application more clear, the technical solutions of the embodiments of the present application are clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application; it will be apparent that the described embodiments are some of the embodiments of the present application. Other embodiments, which may be made by those of ordinary skill in the art based on the disclosed embodiments without undue burden, are within the scope of the present application.
With the existing complex-function multi-parameter optimization strategies, it is difficult to find or approximate the global parameter solution effectively and stably. The application therefore discloses a strategy for fitting multi-parameter complex functions and approaching the global parameters based on a Deep Q Network (DQN). The application uses deep reinforcement learning (DQN) for iterative optimization; the reward value of the DQN is predicted by custom Reward Evaluation Rules (RER) and a reward LSTM network (Reward-LSTM, R-LSTM), and the initial parameters of the first iteration are given by a custom Initial-LSTM network (I-LSTM) prediction. LSTM is adopted to build the networks in order to process time-sequence data with a large number of points, namely multi-parameter complex function curves.
Referring to FIG. 1, the present application designs an algorithm (Curve-Fit-DQN, CF-DQN) for complex-function multi-parameter fitting that finds a solution approximating the global parameters. The design framework of CF-DQN is shown in FIG. 1: the input is the curve D_real whose parameter solution is to be found, and the output is the predicted result parameter θ_out. The method takes the Q-Learning algorithm (QL) as its skeleton, predicts initial parameters for the Q-Learning algorithm with one LSTM network (Initial LSTM Network, I-LSTM), and evaluates reward values for the alternative parameters in the Q-Learning algorithm with another LSTM network (Reward LSTM Network, R-LSTM). The R-LSTM in the algorithm takes the difference between the curve corresponding to the current iteration parameters and the real curve, outputs the reward values obtained when the current parameters change in different directions, and thereby guides the parameters toward the global parameters.
In the embodiment of the application, for convenience of description, a multi-parameter complex function f is defined, together with a curve D_real whose parameters are to be extracted; the ideal global parameter corresponding to D_real is θ_real, with D_real = f(θ_real). The CF-DQN output result is θ_out, corresponding to curve D_out. Let the output of the I-LSTM be θ_I-LSTM, the action set of Q-Learning be A = {a_1, a_2, ..., a_n}, and the i-th row vector of the Q table be Q_i. The algorithm mainly comprises 2 stages, namely a parameter guidance stage and an iterative optimization stage:
(1) The parameter guidance stage is completed by the I-LSTM network: the curve D_real whose parameters are to be extracted is input into the I-LSTM network to obtain a group of parameters θ_I-LSTM, which are then fed into QL as the initial parameters of QL;
(2) The skeleton of the iterative optimization stage is QL, with the R-LSTM network participating in the reward evaluation step of QL: at the i-th iteration step, the reward value evaluation rules (Reward Evaluation Rules, RER) and the R-LSTM jointly evaluate the reward values of the parameter set obtained by applying the action set A to the current parameter θ_i. From the obtained reward values and the content Q_i of the current row of the Q table, a new row Q_{i+1} is calculated; the maximum value of Q_{i+1}, say its k-th entry, is selected, the corresponding action a_k of the action set A is taken, and the parameter is updated from θ_i to θ_{i+1}, completing one iterative optimization step. The iteration stops when the maximum number of iterative optimization steps is reached or the curve error is smaller than a preset value, and the resulting parameter is the final parameter θ_out. A schematic sketch of this two-stage flow is given below.
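The following sketch, in Python, summarizes the two stages only schematically and is not the authoritative implementation of the patent: i_lstm, r_lstm, rer, curve_of and apply_action are hypothetical stand-ins for the trained networks, the reward evaluation rules, the curve equation and the parameter change actions, and the reward weighting, learning rate and discount factor are assumptions.

```python
import numpy as np

def cf_dqn(d_real, i_lstm, r_lstm, rer, actions, curve_of, apply_action,
           max_iter=1000, err_tol=1e-8, w_rule=0.5, w_net=0.5, lr=0.1, gamma=0.9):
    theta = i_lstm(d_real)                           # stage 1: parameter guidance
    q_row = np.zeros(len(actions))
    for i in range(max_iter):                        # stage 2: iterative optimization
        candidates = [apply_action(theta, a) for a in actions]
        deltas = [curve_of(c) - d_real for c in candidates]
        # reward evaluation: weighted sum of rule-based (RER) and R-LSTM-predicted rewards
        rewards = (w_rule * np.array([rer(d) for d in deltas])
                   + w_net * np.asarray(r_lstm(curve_of(theta) - d_real)))
        # Q-Learning style row update (exact update rule is an assumption)
        q_row = q_row + lr * (rewards + gamma * q_row.max() - q_row)
        theta = candidates[int(np.argmax(q_row))]    # take the best-rated alternative parameter
        if np.mean(np.abs(curve_of(theta) - d_real)) < err_tol:
            break                                    # curve error below the preset value
    return theta
```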
Referring to FIG. 2, in an embodiment of the present application, the first step of the CF-DQN algorithm is to provide an initial iteration value for Q-Learning via the Initial LSTM Network (I-LSTM). The input of the I-LSTM is the one-dimensional curve D whose parameters are to be extracted, with the number of curve points fixed at m; the number of units of each layer is fixed and consistent and is chosen close to the number of curve points m. The first LSTM hidden layer receives the input data D, so the input dimension of the first LSTM layer is m and its output dimension is the set number of units; the input and output dimensions of each subsequent LSTM layer are the set number of units, and the LSTM layers are connected point to point. They are followed by a DNN network, which may contain multiple fully connected layers and convolution layers of different dimensions, used for converting the output of the last LSTM layer into the output parameter θ_I-LSTM.
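A minimal network sketch consistent with this structure is shown below, assuming PyTorch. The concrete sizes (three 256-unit LSTM layers, one 256-unit fully connected layer, three output parameters) follow the embodiment given later; how the curve is packed into the input sequence, and the class and argument names, are illustrative assumptions only.

```python
import torch
import torch.nn as nn

class ILSTM(nn.Module):
    """Sketch of the I-LSTM: stacked LSTM hidden layers followed by a DNN head that
    converts the last hidden state into the output parameters theta_I-LSTM."""
    def __init__(self, n_features=1, hidden=256, n_layers=3, n_params=3):
        super().__init__()
        # The m-point curve D is fed as a length-m sequence with n_features values per point.
        self.lstm = nn.LSTM(n_features, hidden, num_layers=n_layers, batch_first=True)
        self.head = nn.Sequential(
            nn.Linear(hidden, 256), nn.ReLU(),
            nn.Linear(256, n_params),            # normalized [E_0, alpha, tau]
        )

    def forward(self, curve):                    # curve: (batch, m, n_features)
        out, _ = self.lstm(curve)
        return self.head(out[:, -1, :])          # use the last time step's hidden output
```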
In the embodiment of the application, in order to introduce the guidance of the global parameters, a loss function L_I-LSTM is designed, composed of two parts L_p and L_d weighted by w_p and w_d respectively:

$$ L_{I\text{-}LSTM} = w_p L_p + w_d L_d $$

where the L_p part calculates the loss value between the label parameter θ_train and the parameters output by the network, and the L_d part calculates the loss value between the curve D_train and the curve corresponding to the parameters output by the network. By adjusting the weights w_p and w_d, the relative importance that the I-LSTM attaches to the parameters or to the curve can be adjusted; by adjusting the specific calculation of L_p and L_d, the prediction ability of the I-LSTM can be adapted to curve equations with different characteristics, achieving a good prediction effect.
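The dual loss can be written down directly. The sketch below assumes the mean absolute error for both parts (the text does not fix the specific forms of L_p and L_d) and a differentiable helper curve_from_params that regenerates a curve from a parameter vector; both are assumptions.

```python
import torch

def i_lstm_loss(theta_hat, theta_train, d_train, curve_from_params, w_p=1.0, w_d=1.0):
    """L_I-LSTM = w_p * L_p + w_d * L_d (MAE forms of L_p and L_d are assumptions)."""
    l_p = torch.mean(torch.abs(theta_hat - theta_train))   # parameter-space loss
    d_hat = curve_from_params(theta_hat)                    # curve of the predicted parameters
    l_d = torch.mean(torch.abs(d_hat - d_train))            # curve-space loss
    return w_p * l_p + w_d * l_d
```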
Referring to FIG. 3, in the embodiment of the present application, the network structure of the R-LSTM is similar to that of the I-LSTM, adopting LSTM hidden layers and a DNN network; the input is the point-by-point difference of two curves, and the output is the reward prediction for the action in each direction.
For convenience of description, the following symbols are defined: the optional action set for parameter change is A = {a_1, a_2, ..., a_n}; the curve whose parameters are to be extracted is D_real, with corresponding real parameter θ_real; the current reference parameter is θ_now, with corresponding curve D_now. The parameter set obtained from the parameter θ_now after the action set A is {θ_1, θ_2, ..., θ_n}, and the corresponding curve set is {D_1, D_2, ..., D_n}. The difference between curve D_now and D_real is Δ_train, and the differences between the curves of the curve set and D_real are {Δ_1, Δ_2, ..., Δ_n}. The rewards obtained by evaluating each parameter of the parameter set with the Reward Evaluation Rules (RER) are r_1, r_2, ..., r_n respectively; let R_train = [r_1, r_2, ..., r_n]. The result obtained by inputting Δ_train into the R-LSTM is denoted $\hat{R}_{train}$.
The algorithm flow of the R-LSTM training phase is shown in FIG. 3. The rewards of the parameter set obtained from the current reference parameter θ_now after the action set A are evaluated according to the real parameter θ_real. The network thereby memorizes the relation between the difference between the curve of the current reference parameter and the real curve, and the reward evaluation of the alternative parameter set corresponding to the current reference parameter. The trained R-LSTM thus carries indirect guidance of the global parameters on the parameter optimization; introducing this global parameter guidance into the algorithm gives the algorithm the ability to approach the global parameters.
Each set of data in the training set of the R-LSTM comprises: a vector Δ_train of length m, which is the point-by-point difference between the curve D_now corresponding to the current reference parameter and the curve D_real whose parameters are to be extracted; and a vector R_train of length n, representing the reward values obtained by the parameters derived from the current reference parameter θ_now after executing the corresponding action set. Unlike the I-LSTM, the loss function L_R-LSTM employed in R-LSTM training is the sum of the absolute errors between the reward value vectors, i.e.

$$ L_{R\text{-}LSTM} = \sum_{j=1}^{n} \left| R_{train}(j) - \hat{R}_{train}(j) \right| $$
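One training sample of the R-LSTM can therefore be assembled as in the sketch below. This is illustrative only: rer stands in for the reward evaluation rules (whose exact formula is not reproduced in this text), and curve_of and apply_action are hypothetical helpers for the curve equation and the parameter change actions.

```python
import numpy as np

def make_rlstm_sample(theta_now, theta_real, actions, curve_of, apply_action, rer):
    """Build one (Delta_train, R_train) training pair for the R-LSTM."""
    d_now, d_real = curve_of(theta_now), curve_of(theta_real)
    delta_train = d_now - d_real                          # length-m input vector
    # reward label of each alternative parameter, evaluated by the reward evaluation rules
    r_train = np.array([rer(curve_of(apply_action(theta_now, a)) - d_real)
                        for a in actions])                # length-n label vector
    return delta_train, r_train

def rlstm_loss(r_train, r_pred):
    """L_R-LSTM: sum of absolute errors between the label and predicted reward vectors."""
    return np.sum(np.abs(np.asarray(r_train) - np.asarray(r_pred)))
```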
When Q-Learning iterates, the reward evaluation of the alternative parameter set comes from 2 sources: one is the reward evaluation rules (Reward Evaluation Rules, RER), and the other is the reward value predicted by the R-LSTM; the reward evaluation of the alternative parameter set is obtained by weighting the two, and this guides the parameter iteration. The iteration ends when a certain number of iterations has been performed or when the error between the fitted curve and the real curve is smaller than a certain value.
In the application, the guidance of the global parameters for complex-function multi-parameter optimization is introduced by the neural network (namely the R-LSTM) designed inside the deep reinforcement learning (DQN), so the application approaches the global parameters better than the prior art. By adding a neural network (namely the I-LSTM) for initial parameter guidance and adding the calculation of the parameter error to the loss function, the initial iteration parameters of the deep reinforcement learning (DQN) can be made close to the global parameter solution, which strengthens the ability of the strategy to approach the global parameters in the complex-function multi-parameter optimization problem, effectively reduces the number of DQN iterations, and increases the operation speed.
Referring to FIG. 4, a mechanical parameter extraction system for nanoindentation measurement curves based on the KVFD model according to an embodiment of the present application comprises:
the parameter prediction value acquisition module is used for inputting the pre-acquired nano indentation measurement curve into a trained prediction value acquisition network to acquire a parameter prediction value of the nano indentation measurement curve; the trained predicted value acquisition network is a cyclic neural network based on an LSTM hidden layer, and LOSS function values used in the cyclic neural network are calculated by curve-curve corresponding parameters of an input network and parameter-parameter corresponding curves of network output;
the depth reinforcement learning iteration output module is used for iterating the obtained parameter predicted value as an iteration initial value of a depth reinforcement learning algorithm to obtain an approximation of a global parameter solution of the pre-obtained nano indentation measurement curve; the reward value prediction network of the deep reinforcement learning algorithm gives out the reward value when the current parameter changes to different directions through the difference value of the corresponding curve and the real curve of the current iteration parameter, and guides the parameter to approach to the global parameter; and when the approximation of the global parameter solution reaches a preset convergence condition, outputting the approximation of the global parameter solution as a mechanical parameter of the KJFD model.
DETAILED DESCRIPTION OF EMBODIMENT (S) OF INVENTION
In the embodiment of the application, the circular probe nanoindentation equations of the KVFD model (hereinafter referred to as the KVFD equations) are selected to implement the strategy of the application concretely. The equations cover three loading protocols: ramp-relaxation, load-unload and ramp-creep (hereinafter referred to as relaxation, load-unload and creep). The system of equations is shown in Table 1.
TABLE 1. Nanoindentation equations of the KVFD model under a circular probe
where the mechanical parameters to be optimized are [E_0, α, τ]; R is the radius of the probe, v is the rate of increase of the indentation depth during loading, k is the rate of increase of the force during loading, T_r is the turning time, Γ(·) is the gamma function, B(·,·) is the complete beta function, B(·; ·,·) is the incomplete beta function, and E_{α,β}(·) is the Mittag-Leffler (M-L) function.
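These special functions are all standard. For illustration, the two-parameter Mittag-Leffler function appearing in the KVFD solutions can be evaluated by truncating its power series; the sketch below is only an illustration (a truncated series is adequate only for moderate arguments) and is not the evaluation method prescribed by the application.

```python
import numpy as np
from scipy.special import gamma

def mittag_leffler(z, alpha, beta=1.0, n_terms=50):
    """Two-parameter Mittag-Leffler function E_{alpha,beta}(z) = sum_k z**k / Gamma(alpha*k + beta),
    evaluated by truncating the series (suitable only for moderate |z|)."""
    k = np.arange(n_terms)
    return np.sum(np.power(z, k) / gamma(alpha * k + beta))
```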
CF-DQN algorithm instances are constructed for the 3 loading protocols of the KVFD equations. For the KVFD equations, the probe radius is taken as R = 8.5 μm = 8.5×10^-6 m. The other conditions under the 3 loading protocols are set as follows:
(1) Under the relaxation loading protocol: turning time T_r = 2 s, holding time T_hold = 3 s, maximum indentation depth 5 μm = 5×10^-6 m, and indentation depth increase rate during the pressing stage v = 2.5 μm/s = 2.5×10^-6 m/s;
(2) Under the load-unload loading protocol: turning time T_r = 25 s, maximum indentation depth 5 μm = 5×10^-6 m, and indentation depth increase rate during the pressing stage v = 0.2 μm/s = 0.2×10^-6 m/s;
(3) Under the creep loading protocol: turning time T_r = 2 s, holding time T_hold = 3 s, maximum force 5 μN = 5×10^-6 N, and force increase rate during the pressing stage k = 2.5 μN/s = 2.5×10^-6 N/s.
When the CF-DQN algorithm is implemented, a set of parameters is θ = [E_0, α, τ], with parameter ranges E_0 ∈ [10, 100000], α ∈ [0.01, 0.99] and τ ∈ [1, 1000]; the number of points of the curve D corresponding to parameter θ is set to m = 250; the time origin of the curve is t = 0 s, and the sampling interval is 0.02 s under the relaxation loading protocol, 0.1 s under the load-unload loading protocol and 0.02 s under the creep loading protocol. The equations corresponding to the relaxation, load-unload and creep loading protocols are denoted f_r, f_u and f_c respectively; f_r corresponding to the relaxation loading protocol is given in Table 1.
The load-unload loading protocol corresponds to f_u over 0 ≤ t ≤ T_r; the unloading part for t > T_r is not used.
f_c corresponding to the creep loading protocol is likewise given in Table 1.
Note that the 3 loading protocols correspond to 3 different complex curve equations, so 3 different CF-DQN instances need to be implemented separately.
The action set A = {a_1, a_2, ..., a_n} for parameter change contains 8 elements, i.e. n = 8; see Table 2 for details. The symbol "↑" indicates that the parameter item increases by one step, and the symbol "↓" indicates that the parameter item decreases by one step. The step size of each parameter item is 1% of the current parameter value.
TABLE 2. Parameter change action set A for the CF-DQN algorithm implementation
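Table 2 itself is not reproduced in this text, so the exact composition of the 8 actions is not shown here. One composition consistent with n = 8 for the three parameter items is every up/down combination of the three items; the sketch below uses that composition purely as an assumption, together with the 1% step described above.

```python
import itertools
import numpy as np

# Hypothetical 8-action set: every combination of raising or lowering each of the
# three parameter items [E_0, alpha, tau] by one step (2**3 = 8 direction vectors).
ACTIONS = [np.array(signs) for signs in itertools.product((+1, -1), repeat=3)]

def apply_action(theta, action, step=0.01):
    """Apply one action: each parameter item moves by 1% of its current value."""
    theta = np.asarray(theta, dtype=float)
    return theta * (1.0 + step * action)
```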
The number of nodes of each of the 3 LSTM hidden layers in the I-LSTM is set to 256; the last LSTM hidden layer is followed by a 256-node fully connected layer whose output has 3 nodes, corresponding to the normalized E_0, α and τ parameters respectively. In training the I-LSTM, a training dataset containing one million samples is used.
The number of nodes of each of the 3 LSTM hidden layers in the R-LSTM is set to 256; the last LSTM hidden layer is followed by a 256-node fully connected layer whose output has 8 nodes, corresponding to the normalized rewards of the corresponding parameters after the 8 actions are executed. In training the R-LSTM, a training dataset containing one million samples is used.
The maximum number of iterations of the QL part is set to 1000, and the iteration ends when this maximum is reached. At the same time, during each iteration the algorithm keeps track of the minimum curve error encountered so far in the iterative process, together with the corresponding result parameter θ_k and its iteration index k; once the current iteration index i exceeds k + 20, it is considered that further iterations no longer improve the fit to the curve, the iteration is terminated, and the final result θ_out = θ_k is output.
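This stopping logic amounts to tracking the best result seen so far and terminating on stagnation. A minimal sketch is shown below, with ql_step standing in for one hypothetical QL iteration that returns the current parameters and the current curve error.

```python
import numpy as np

def run_ql(ql_step, max_iter=1000, err_tol=None):
    best_err, best_theta, best_k = np.inf, None, -1
    for i in range(max_iter):                 # at most 1000 QL iterations
        theta_i, err_i = ql_step(i)
        if err_i < best_err:                  # remember the smallest curve error so far
            best_err, best_theta, best_k = err_i, theta_i, i
        if err_tol is not None and best_err < err_tol:
            break                             # curve error below the preset value
        if i > best_k + 20:                   # no improvement for 20 iterations
            break
    return best_theta                         # theta_out = theta_k
```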
After constructing the CF-DQN algorithm for the 3 loading protocols, the algorithm is tested using simulation data. The simulation data are generated according to the settings above, so the real parameter θ_real corresponding to each generated curve D_real is known, which makes it convenient to examine the relationship between the CF-DQN result θ_out and the real parameters. Meanwhile, a noise signal can be added to the generated smooth curve to simulate the data obtained in a real nanoindentation experiment.
The present application uses simulation data to perform 2 tests on the CF-DQN; the 2 tests are described in detail below:
(1) Curve fitting effect test:
Using the parameter θ_real = [20000, 0.2, 50], curves are generated for the relaxation, load-unload and creep loading protocols. For the curve of each loading protocol, the following 5 treatments are applied: no added noise, and added Gaussian noise, uniform noise, Rayleigh noise and exponential noise, each with mean 10^-7. The resulting 15 curves are fitted using the CF-DQN algorithm to examine the curve fitting effect and the resulting parameters.
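The four kinds of additive noise used here are standard distributions. A sketch of how such noisy test curves could be produced with NumPy is shown below; the mean of each noise signal is set to 1e-7 as in the text, while the spread of each distribution is an assumption.

```python
import numpy as np

def add_noise(curve, kind, mean=1e-7, rng=np.random.default_rng(0)):
    n = curve.shape[0]
    if kind == "gaussian":
        noise = rng.normal(mean, mean, n)                    # std chosen equal to the mean (assumption)
    elif kind == "uniform":
        noise = rng.uniform(0.0, 2.0 * mean, n)              # uniform on [0, 2*mean], mean = 1e-7
    elif kind == "rayleigh":
        noise = rng.rayleigh(mean * np.sqrt(2 / np.pi), n)   # scale chosen so the mean is 1e-7
    elif kind == "exponential":
        noise = rng.exponential(mean, n)                     # exponential noise with mean 1e-7
    else:
        return curve.copy()                                  # "none": no added noise
    return curve + noise
```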
(2) Fitting 10000 randomly generated relaxation curves:
in the range ofInternally randomly generating 10000 sets of parameters and using f r Generating 10000 corresponding relatives, and adding 10-percent average value to all the relatives 7 N gaussian random noise. The CF-DQN is used for fitting the 10000 curves and extracting result parameters, and the analysis result parameters and the real parameters are compared to verify the robustness of the CF-DQN.
The following are the results of the tests performed on the CF-DQN algorithm instances customized to the KVFD equations.
Curve fitting effect test:
parameter θ using CF-DQN algorithm real =[20000,0.2,50]The fitting effect of the generated KVFD equation curves with different noise added is shown in fig. 5. FIG. 5 shows the CF-DQN algorithm versus the real parameter θ real =[E 0 ,α,τ]=[20000,0.2,50]The loading protocol is relaxation, creep, load-unloading, and fitting results of 15 simulation curve pairs without adding noise, gaussian noise, uniform noise, rayleigh noise and exponential noise are respectively made. The curves from column 1 to column 3 in FIG. 5 are corresponding curves of relaxation, load-unlock and creep loading protocols, respectively; line 1 is a curve without noise, line 2 is a curve with gaussian noise, line 3 is a curve with uniform noise, line 4 is a curve with gareli noise, line 5 is a curve with exponential noise, the average of the added noise is 10 -7 . Generated simulation curve D real Fitting the result curve D to the solid line, CF-DQN out =f(θ out ) Represented by the hollow origin.
Table 3 shows, for each curve, the fitting result parameter θ_out, the absolute error between the result parameter and the true parameter |θ_out - θ_real|, and the mean absolute error (MAE) between the result curve and the simulation curve.
TABLE 3. Fitting results of the CF-DQN algorithm for the real parameter θ_real = [E_0, α, τ] = [20000, 0.2, 50]
In the absence of added noise (row 1 of FIG. 5), the fitting result curves of the 3 loading protocols almost coincide with the simulation curves, with the curve MAE on the order of 10^-9 (see Table 3). The fitting result parameters are [19339, 0.1953, 70] for relaxation, [19320, 0.1951, 70] for load-unload and [18410, 0.1935, 105] for creep; the fitting result parameter θ_out is close to the true parameter θ_real.
With added noise (rows 2 to 5 of FIG. 5), the fitting result curves of the 3 loading protocols also agree with the skeleton of the simulation curves; the fitting is essentially undisturbed by noise burrs, and the curve MAE is on the order of 10^-8 (see Table 3). From the images, among the 4 kinds of noise with the same mean, uniform noise disturbs the curves the least (row 3 of FIG. 5), with MAEs of 2.64E-8, 3.15E-8 and 2.65E-8 for the relaxation, creep and load-unload loading modes respectively; next is Rayleigh noise (row 4 of FIG. 5), with MAEs of 5.47E-8, 5.74E-8 and 5.48E-8 for the relaxation, creep and load-unload loading modes respectively; Gaussian noise (row 2 of FIG. 5) and exponential noise (row 5 of FIG. 5) disturb the curve morphology the most, with MAEs around 8E-8 for the 6 curves. Exponential noise also gives the curve many upward spike protrusions, making the curve appear to shift upward, but the fitting result curve can still match the bottom skeleton of the simulation curve above it. The fitting result parameters θ_out of the 3 loading protocols remain close to the real parameters θ_real, with no significant change caused by the added noise.
In the embodiment of the application, the fitting results for the 10000 randomly generated relaxation curves are as follows: 10000 sets of parameters θ_real are randomly generated, the corresponding relaxation curves D_real are generated and Gaussian random noise with mean 10^-7 N is added; the CF-DQN algorithm is then used to extract the parameters of the 10000 curves, and the extracted result parameters θ_out are compared with the true parameters θ_real.
Pearson correlation analysis is performed for E_0, α and τ respectively, and the results are shown in Table 4. The correlation coefficient between the E_0 parameter of θ_real and the E_0 parameter of θ_out is r = 0.8288 (p < 0.001), the correlation coefficient between the α parameter of θ_real and the α parameter of θ_out is r = 0.9963 (p < 0.001), and the correlation coefficient between the τ parameter of θ_real and the τ parameter of θ_out is r = 0.2802 (p < 0.001); a significant correlation exists between the output parameters and the real parameters. The correlation coefficient of the α parameter is closest to 1, the correlation coefficient of the E_0 parameter is also close to 1, and the correlation coefficient of the τ parameter is farther from 1. This shows that, of the 3 parameters extracted by the designed CF-DQN algorithm instance, α is very close to the real curve parameter, E_0 is close to the real curve parameter, and τ differs more from the real curve parameter.
TABLE 4. Results of the Pearson correlation analysis between the real parameters θ_real and the result parameters θ_out of the 10000 simulation curves
With the parameter terms of θ_real as the abscissa and the parameter terms of θ_out as the ordinate, scatter plots of the E_0, α and τ parameters of the 10000 data are drawn, together with the correlation line (red straight line) and the 45° diagonal (black straight line), as shown in FIG. 6. In the figure, the solid line is the correlation line, the dotted line is the 45° diagonal, each small point is one piece of data, r is the correlation coefficient and p is the significance; (a) parameter E_0; (b) parameter α; (c) parameter τ. FIG. 6 (a) is the point-pair plot of the E_0 parameter: the point pairs formed by the E_0 of θ_real and the E_0 of θ_out are distributed approximately around the 45° diagonal and become more scattered as the E_0 of θ_real increases; the correlation line is close to the 45° diagonal, with a slope smaller than that of the diagonal. FIG. 6 (b) is the point-pair plot of the α parameter: the point pairs of the α of θ_real and the α of θ_out cluster closely around the 45° diagonal, with only a small number of point pairs deviating from it, and the correlation line essentially coincides with the 45° diagonal. FIG. 6 (c) is the point-pair plot of the τ parameter: the point pairs of the τ of θ_real and the τ of θ_out spread over the entire parameter range and exhibit a relatively weak positive correlation, with the correlation line farther from the 45° diagonal. The CF-DQN algorithm thus fits the α parameter best, fits the E_0 parameter well, and fits the τ parameter worst.
Referring to FIG. 7, FIG. 7 shows the MAE of the fits to the 10000 simulation curves and its distribution: (a) the fitting error of each curve; (b) the error distribution. The mean absolute errors between the 10000 fitting result curves D_out and the simulation curves D_real, and their distribution, are shown in FIG. 7. FIG. 7 (a) shows the logarithm of the MAE of the 10000 data, indicating that the MAE is distributed within the range 10^-8 to 10^-4. FIG. 7 (b) reflects the MAE distribution more intuitively: statistically, the MAE of 4550 curves is less than 10^-7, the MAE of 3654 curves is between 10^-7 and 10^-6, the MAE of 1293 curves is between 10^-6 and 10^-5, and the MAE of 503 curves is between 10^-5 and 10^-4. The curve fitting effect of the CF-DQN algorithm is good.
In summary, the embodiment of the application discloses a hierarchical deep reinforcement learning strategy (CF-DQN) for complex-function multi-parameter optimization. The method is used for function fitting, solves the fitting problem of complex functions, and works well for the global parameter approximation of multi-parameter functions, especially non-convex functions. Based on deep reinforcement learning (DQN), 2 dedicated long short-term memory neural networks (Long Short-Term Memory network, LSTM) are established, used respectively for the reward prediction of the DQN and for the prediction of the initial iteration parameters of the DQN. The reward value of the DQN is predicted by custom Reward Evaluation Rules (RER) and a custom reward LSTM (Reward-LSTM, R-LSTM), and the initial parameters of the first iteration are predicted by a custom initial LSTM (I-LSTM).
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The above embodiments are only for illustrating the technical solution of the present application and not for limiting the same, and although the present application has been described in detail with reference to the above embodiments, one skilled in the art may make modifications and equivalents to the specific embodiments of the present application, and any modifications and equivalents not departing from the spirit and scope of the present application are within the scope of the claims of the present application.

Claims (7)

1. The global optimization method for the mechanical parameters of the double loss value network deep reinforcement learning KVFD model is characterized by comprising the following steps:
s1, inputting a pre-acquired nano indentation measurement curve into a trained predicted value acquisition network to obtain a parameter predicted value of the nano indentation measurement curve; the trained predicted value acquisition network is a cyclic neural network based on an LSTM hidden layer, and LOSS function values used in the cyclic neural network are calculated by curve-curve corresponding parameters of an input network and parameter-parameter corresponding curves of network output;
s2, iterating the parameter predicted value as an iteration initial value of a deep reinforcement learning algorithm to obtain an approximation of a global parameter solution of the pre-acquired nano indentation measurement curve; the reward value prediction network of the deep reinforcement learning algorithm gives out the reward value when the current parameter changes to different directions through the difference value of the corresponding curve and the real curve of the current iteration parameter, and guides the parameter to approach to the global parameter;
when the approximation of the global parameter solution reaches a preset convergence condition, outputting the approximation of the global parameter solution as the mechanical parameters of the KVFD model;
in step S1, the pre-acquired nano indentation measurement curve includes a time sequence, a stress sequence and an indentation depth sequence;
in step S2, the iteration is performed by using the parameter predicted value obtained in step S1 as an iteration initial value of the deep reinforcement learning algorithm, and in the process of obtaining an approximation of the global parameter solution of the pre-obtained nanoindentation measurement curve, the specific steps of each iteration include:
(1) Predicting the reward values of the alternative parameter set of the current iteration parameters with a reward evaluation rule and with the reward value prediction network respectively, and taking the weighted sum of the two as the reward evaluation of the alternative parameter set of the current iteration parameters;
the reward evaluation rule is that for evaluating a certain alternative parameter, firstly calculating the curve difference delta between the corresponding curve of the alternative parameter and the pre-acquired nano indentation measurement curve, and then calculating the absolute average value of the curve difference
The evaluation formula of the prize value r is expressed as:
(2) Calculating a new row of the Q table from the reward evaluation obtained in step (1) and the content of the current row of the Q table in the deep reinforcement learning algorithm, finding the maximum value in the new row of the Q table, and taking the corresponding alternative parameter as the result parameter of the current iteration.
2. The method according to claim 1, wherein in step S1, the predicted value obtaining network comprises: a plurality of LSTM hidden layers and a DNN network;
the unit numbers of each of the LSTM hidden layers are fixed and consistent, and each LSTM hidden layer is connected in a point-to-point mode; the first LSTM hidden layer inputs a pre-acquired nanoindentation measurement curve, and the last LSTM hidden layer output value enters a DNN network;
the DNN network comprises a plurality of full-connection layers and convolution layers with different dimensions, and the full-connection layers and the convolution layers are used for converting the value output by the last LSTM hidden layer into a parameter predicted value to be output.
3. The global optimization method for mechanical parameters of a double loss value network deep reinforcement learning KVFD model according to claim 1, wherein in step S1, the LOSS function value is calculated as

$$ L = w_p L_p + w_d L_d $$

where the L_p part calculates the loss value between the label parameter θ_train and the parameters output by the network, the L_d part calculates the loss value between the curve D_train and the curve corresponding to the parameters output by the network, and w_p, w_d are the weights of the L_p and L_d parts respectively.
4. The method according to claim 1, wherein in step S2, the reward value prediction network of the deep reinforcement learning algorithm comprises: a plurality of LSTM hidden layers and a DNN network;
the unit numbers of each of the LSTM hidden layers are fixed and consistent, and each LSTM hidden layer is connected in a point-to-point mode; the first LSTM hidden layer inputs a difference value obtained by subtracting a pre-acquired nanoindentation measurement curve from a current iteration curve, and the last LSTM hidden layer output value enters a DNN network;
the DNN network comprises a plurality of fully connected layers and convolution layers with different dimensions, and the fully connected layers and the convolution layers are used for converting the value output by the last LSTM hidden layer into rewards prediction for actions in all directions.
5. The global optimization method for mechanical parameters of a double loss value network deep reinforcement learning KVFD model according to claim 4, wherein in step S2, the LOSS function used in training the reward value prediction network is the sum of the absolute errors between the label reward value vector and the reward value vector output by the network.
6. The global optimization method for mechanical parameters of a double loss value network deep reinforcement learning KVFD model according to claim 1, wherein in step S2, judging whether the approximation of the global parameter solution reaches the preset convergence condition specifically comprises: stopping the iteration when the error between the curve corresponding to the current iteration result parameters and the pre-acquired nanoindentation measurement curve is smaller than a preset value; or stopping the iteration when the number of iterations reaches a preset value.
7. A global optimization system for mechanical parameters of a double loss value network deep reinforcement learning KVFD model, characterized in that the system comprises:
the parameter prediction value acquisition module is used for inputting the pre-acquired nano indentation measurement curve into a trained prediction value acquisition network to acquire a parameter prediction value of the nano indentation measurement curve; the trained predicted value acquisition network is a cyclic neural network based on an LSTM hidden layer, and LOSS function values used in the cyclic neural network are calculated by curve-curve corresponding parameters of an input network and parameter-parameter corresponding curves of network output;
a deep reinforcement learning iteration output module, configured to iterate with the obtained parameter predicted value as the iteration initial value of a deep reinforcement learning algorithm, so as to obtain an approximation of the global parameter solution of the pre-acquired nanoindentation measurement curve; the reward value prediction network of the deep reinforcement learning algorithm gives, from the difference between the curve corresponding to the current iteration parameters and the real curve, the reward values for changing the current parameters in different directions, thereby guiding the parameters to approach the global parameters; when the approximation of the global parameter solution reaches a preset convergence condition, the approximation of the global parameter solution is output as the mechanical parameters of the KVFD model;
in the parameter predicted value acquisition module, the pre-acquired nanoindentation measurement curve comprises a time sequence, a stress sequence and an indentation depth sequence;
in the deep reinforcement learning iteration output module, in the process of iterating with the obtained parameter predicted value as the iteration initial value of the deep reinforcement learning algorithm to obtain the approximation of the global parameter solution of the pre-acquired nanoindentation measurement curve, the specific steps of each iteration include:
(1) predicting the reward values of the candidate parameter set of the current iteration parameters separately with a reward evaluation rule and with the reward value prediction network, and taking the weighted sum of the two as the reward evaluation of the candidate parameter set of the current iteration parameters;
the reward evaluation rule is as follows: to evaluate a candidate parameter, first calculate the curve difference Δ between the curve corresponding to the candidate parameter and the pre-acquired nanoindentation measurement curve, then calculate the absolute mean value of the curve difference, and evaluate the reward value r from this absolute mean value;
(2) calculating a new row of the Q table from the reward evaluation obtained in step (1) and the content of the current row of the Q table in the deep reinforcement learning algorithm, finding the maximum value in the new row of the Q table, and taking the corresponding candidate parameter as the current iteration result parameter.
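A sketch of one iteration covering steps (1) and (2), assuming NumPy; the sign convention of the rule-based reward, the weights w_rule and w_net, and the blending rule used to form the new Q-table row (a simple learning-rate update) are assumptions, since the claim does not spell them out.

```python
import numpy as np

def iterate_once(theta, candidates, curve_measured, forward_model,
                 reward_net_predict, q_row, w_rule=0.5, w_net=0.5, alpha=0.5):
    """One deep reinforcement learning iteration over the candidate parameter set."""
    # (1a) Rule-based rewards: absolute mean of the curve difference delta between
    #      each candidate's curve and the pre-acquired measurement curve.
    r_rule = np.array([-np.mean(np.abs(forward_model(c) - curve_measured))
                       for c in candidates])
    # (1b) Network rewards: predicted from the current curve difference,
    #      one value per candidate direction.
    curve_diff = forward_model(theta) - curve_measured
    r_net = reward_net_predict(curve_diff)             # shape: (len(candidates),)
    rewards = w_rule * r_rule + w_net * r_net           # weighted sum of both evaluations
    # (2) New Q-table row from the reward evaluation and the current row,
    #     then pick the candidate with the maximum entry.
    q_row_new = q_row + alpha * (rewards - q_row)
    best = int(np.argmax(q_row_new))
    return candidates[best], q_row_new
```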
CN202110368257.2A 2021-04-06 2021-04-06 Global optimization method and system for mechanical parameters of double loss value network deep reinforcement learning KVFD model Active CN113077853B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110368257.2A CN113077853B (en) 2021-04-06 2021-04-06 Global optimization method and system for mechanical parameters of double loss value network deep reinforcement learning KVFD model

Publications (2)

Publication Number Publication Date
CN113077853A CN113077853A (en) 2021-07-06
CN113077853B true CN113077853B (en) 2023-08-18

Family

ID=76615137

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110368257.2A Active CN113077853B (en) 2021-04-06 2021-04-06 Global optimization method and system for mechanical parameters of double loss value network deep reinforcement learning KVFD model

Country Status (1)

Country Link
CN (1) CN113077853B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017004626A1 (en) * 2015-07-01 2017-01-05 The Board Of Trustees Of The Leland Stanford Junior University Systems and methods for providing reinforcement learning in a deep learning system
CN109002942A (en) * 2018-09-28 2018-12-14 河南理工大学 A kind of short-term load forecasting method based on stochastic neural net
CN110443364A (en) * 2019-06-21 2019-11-12 深圳大学 A kind of deep neural network multitask hyperparameter optimization method and device
WO2020220191A1 (en) * 2019-04-29 2020-11-05 Huawei Technologies Co., Ltd. Method and apparatus for training and applying a neural network
CN111914474A (en) * 2020-06-28 2020-11-10 西安交通大学 Fractional-order KVFD multi-parameter machine learning optimization method for viscoelastic mechanical characterization of soft substances

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Research on nonlinear system modeling method based on deep learning long-term prediction; Li Cong; Shi Hongwei; Modern Computer (Issue 15); full text *

Also Published As

Publication number Publication date
CN113077853A (en) 2021-07-06

Similar Documents

Publication Publication Date Title
CN109931678B (en) Air conditioner fault diagnosis method based on deep learning LSTM
Zhang et al. A fault diagnosis method for wind turbines gearbox based on adaptive loss weighted meta-ResNet under noisy labels
CN111310915B (en) Data anomaly detection defense method oriented to reinforcement learning
CN108665058B (en) Method for generating countermeasure network based on segment loss
US20220147877A1 (en) System and method for automatic building of learning machines using learning machines
CN109961172B (en) CPS rare event probability prediction method based on statistical model test
Kang et al. A virtual sample generation method based on differential evolution algorithm for overall trend of small sample data: Used for lithium-ion battery capacity degradation data
CN112699045A (en) Software test case generation method based on multi-population genetic algorithm
CN112365033B (en) Wind power interval prediction method, system and storage medium
CN114298851A (en) Network user social behavior analysis method and device based on graph sign learning and storage medium
TW202123098A (en) Method and electronic device for selecting neural network hyperparameters
CN111123894A (en) Chemical process fault diagnosis method based on combination of LSTM and MLP
CN114166509A (en) Motor bearing fault prediction method
CN109951327B (en) Network fault data synthesis method based on Bayesian hybrid model
CN108665001B (en) Cross-tested idle state detection method based on deep belief network
CN113077853B (en) Global optimization method and system for mechanical parameters of double loss value network deep reinforcement learning KVFD model
CN112257348B (en) Method for predicting long-term degradation trend of lithium battery
CN111612022A (en) Method, apparatus, and computer storage medium for analyzing data
CN111783930A (en) Neural network test sufficiency evaluation method based on path state
CN113360772B (en) Interpretable recommendation model training method and device
CN114140736A (en) Image anomaly detection method based on high-frequency and low-frequency reconstruction
CN111061711B (en) Big data stream unloading method and device based on data processing behavior
Eidenbenz et al. Boosting exploratory testing of industrial automation systems with ai
CN115565611B (en) Biological regression prediction method, device, equipment and storage medium
CN114021632B (en) Laboratory electricity consumption measurement equipment test strategy evaluation method based on Markov chain, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant