CN113077853A - Double-loss-value network deep reinforcement learning KVFD model mechanical parameter global optimization method and system


Info

Publication number
CN113077853A
CN113077853A (application CN202110368257.2A)
Authority
CN
China
Prior art keywords
parameter
value
curve
network
global
Prior art date
Legal status
Granted
Application number
CN202110368257.2A
Other languages
Chinese (zh)
Other versions
CN113077853B (en)
Inventor
张红梅
周衍
王凯
李文彬
张可浩
王炯
万明习
Current Assignee
Xian Jiaotong University
Original Assignee
Xian Jiaotong University
Priority date
Filing date
Publication date
Application filed by Xian Jiaotong University
Priority to CN202110368257.2A
Publication of CN113077853A
Application granted
Publication of CN113077853B
Legal status: Active

Classifications

    • G16C 60/00 Computational materials science, i.e. ICT specially adapted for investigating the physical or chemical properties of materials or phenomena associated with their design, synthesis, processing, characterisation or utilisation
    • G06F 30/27 Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
    • G06N 3/044 Recurrent networks, e.g. Hopfield networks
    • G06N 3/045 Combinations of networks
    • G06N 3/08 Learning methods
    • G06F 2111/14 Details relating to CAD techniques related to nanotechnology
    • G06F 2119/14 Force analysis or force optimisation, e.g. static or dynamic forces


Abstract

The invention discloses a global optimization method and system for the mechanical parameters of a KVFD model based on double-loss-value-network deep reinforcement learning. The method comprises the following steps: S1, inputting a pre-acquired nanoindentation measurement curve into a trained predicted-value acquisition network to obtain a parameter prediction for the curve; S2, using the parameter prediction as the initial value for the iterations of a deep reinforcement learning algorithm to obtain an approximation of the global parameter solution of the pre-acquired nanoindentation measurement curve; when the approximation of the global parameter solution reaches a preset convergence condition, outputting it as the mechanical parameters of the KVFD model. Because parameter predictions are introduced into the iterations for parameter guidance, the method approximates the global optimal solution well.

Description

Double-loss-value network deep reinforcement learning KVFD model mechanical parameter global optimization method and system
Technical Field
The invention belongs to the technical field of extracting mechanical parameters from nanoindentation measurement data, relates to the field of KVFD-model multi-parameter function fitting and global parameter approximation, and particularly relates to a method and system for global optimization of KVFD model mechanical parameters by double-loss-value-network deep reinforcement learning.
Background
At present, when the mechanical parameters of a measured material are extracted from nanoindentation measurement data, simple function fitting mostly uses the least-squares method, adjusting the function parameters iteratively to minimize the mean square error between the fitted curve and the real curve. This approach is fast and effective for fitting simple functions, but often performs poorly for complex, multi-parameter functions.
For the multi-parameter optimization of the complex KVFD-model functions, the common greedy, gradient-descent and simulated-annealing algorithms cannot reliably reach a good global optimal solution. The greedy and gradient-descent algorithms find a local optimal solution near the given initial parameters and have some applicability to complex multi-parameter optimization, but when the global parameters lie far from the given initial values these algorithms find them difficult to approach. The simulated-annealing algorithm accepts a new parameter solution with a certain probability and can jump out of local-optimum traps, showing better optimization capability in problems with many local optima and finding solutions near the global parameter solution with a certain probability; but because of this probabilistic character it cannot approach the global parameter solution every time, and its reliability is poor.
In summary, for the current multi-parameter optimization problem of the complex KVFD-model functions, it is difficult to approach the global optimal solution effectively.
Disclosure of Invention
The invention aims to provide a method and system for global optimization of the mechanical parameters of a KVFD model by double-loss-value-network deep reinforcement learning, so as to solve one or more of the above technical problems. Because parameter predictions are introduced into the iterations for parameter guidance, the method approximates the global optimal solution well.
In order to achieve the purpose, the invention adopts the following technical scheme:
the invention discloses a global optimization method for mechanical parameters of a KVFD model for deep reinforcement learning of a double-loss value network, which comprises the following steps of:
s1, inputting the pre-acquired nano indentation measurement curve into a trained predicted value acquisition network to obtain a parameter predicted value of the nano indentation measurement curve; the trained predicted value acquisition network is a circulating neural network based on an LSTM hidden layer, and LOSS function values used by the circulating neural network during training are calculated by a curve and a curve corresponding parameter of an input network and a parameter and parameter corresponding curve of network output;
s2, taking the parameter predicted value as an iteration initial value of a depth reinforcement learning algorithm for iteration to obtain an approximation of a global parameter solution of the pre-acquired nanoindentation measurement curve; the reward value prediction network of the deep reinforcement learning algorithm gives reward values when the current parameters change to different directions through the difference value between the curve corresponding to the current iteration parameters and the real curve, and guides the parameters to approach to the global parameters;
and when the approximation of the global parameter solution reaches a preset convergence condition, outputting the approximation of the global parameter solution as a mechanical parameter of the KVFD model.
In a further improvement of the invention, in step S1, the pre-acquired nanoindentation measurement curve comprises a time series, a force series and an indentation-depth series.
In a further improvement of the invention, in step S1, the predicted-value acquisition network comprises: several LSTM hidden layers and a DNN network;
each LSTM hidden layer has the same, fixed number of units, and the LSTM hidden layers are connected one to the next; the first LSTM hidden layer receives the pre-acquired nanoindentation measurement curve, and the output of the last LSTM hidden layer is fed into the DNN network;
the DNN network comprises several fully-connected layers and convolutional layers of different dimensions and converts the value output by the last LSTM hidden layer into the parameter prediction output.
In a further improvement of the invention, in step S1, the loss function value is calculated as

$$L_{\text{I-LSTM}} = w_p L_p + w_d L_d$$

where the $L_p$ part computes the loss between the label parameters $\theta_{\text{train}}$ and the network output parameters $\hat{\theta}$, the $L_d$ part computes the loss between the curve $D_{\text{train}}$ and the curve $\hat{D}$ corresponding to the network output parameters, and $w_p$, $w_d$ are the weights of the two parts.
In a further improvement of the invention, in step S2, the reward-value prediction network of the deep reinforcement learning algorithm comprises: several LSTM hidden layers and a DNN network;
each LSTM hidden layer has the same, fixed number of units, and the LSTM hidden layers are connected one to the next; the first LSTM hidden layer receives the difference obtained by subtracting the pre-acquired nanoindentation measurement curve from the current iteration curve, and the output of the last LSTM hidden layer enters the DNN network;
the DNN network comprises several fully-connected layers and convolutional layers of different dimensions and converts the value output by the last LSTM hidden layer into a reward prediction for each directional action.
In a further improvement of the invention, in step S2, the loss function used in training the reward-value prediction network is the sum of absolute errors between the label reward-value vector and the reward-value vector output by the network.
In a further improvement of the invention, in step S2, when the parameter prediction obtained in step S1 is used as the initial value for the iterations of the deep reinforcement learning algorithm to obtain an approximation of the global parameter solution of the pre-acquired nanoindentation measurement curve, each iteration comprises the following specific steps:
(1) predicting reward values for the candidate parameter set of the current iteration parameters with the reward evaluation rule and with the reward-value prediction network, respectively, and adding the two weighted reward values to obtain the reward evaluation of the candidate parameter set;
the reward evaluation rule is: to evaluate a candidate parameter, first calculate the curve difference $\Delta$ between the curve corresponding to the candidate parameter and the pre-acquired nanoindentation measurement curve, and then calculate the absolute mean $\overline{|\Delta|}$ of the curve difference; the reward value $r$ is then evaluated from $\overline{|\Delta|}$ (the evaluation formula is given only as an image in the original; a minimal sketch follows step (2) below);
(2) calculating a new row of the Q table from the reward evaluation obtained in step (1) and the current row of the Q table of the deep reinforcement learning algorithm, finding the maximum value in the new row, and taking the corresponding candidate parameter as the result parameter of the current iteration.
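For illustration, the reward evaluation rule can be sketched as follows. Since the evaluation formula for the reward value $r$ appears only as an image in the original, the mapping from the absolute mean error to the reward used here (its simple negation, so that a smaller error earns a larger reward) is an assumption.

```python
import numpy as np

def reward_evaluation_rule(candidate_curve, measured_curve):
    """Sketch of the reward evaluation rule (RER) for one candidate.

    Computes the curve difference delta and its absolute mean; the
    mapping from that error to the reward r is an assumption (the
    original gives the formula only as an image) -- here a smaller
    error simply yields a larger reward.
    """
    delta = np.asarray(candidate_curve) - np.asarray(measured_curve)
    mean_abs = np.mean(np.abs(delta))   # absolute mean of the difference
    return -mean_abs                    # assumed monotone mapping to r
```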
In a further improvement of the invention, in step S2, determining whether the approximation of the global parameter solution has reached the preset convergence condition specifically comprises: stopping the iterations when the error between the curve corresponding to the current iteration's result parameters and the pre-acquired nanoindentation measurement curve is smaller than a preset value, or when the number of iterations reaches a preset value.
The invention also discloses a KVFD-model-based system for extracting the mechanical parameters of nanoindentation measurement curves, comprising:
a parameter-prediction acquisition module for inputting a pre-acquired nanoindentation measurement curve into a trained predicted-value acquisition network to obtain a parameter prediction for the curve, where the trained predicted-value acquisition network is a recurrent neural network based on LSTM hidden layers and the loss function value used during its training is calculated both from the input curve and its label parameters and from the network's output parameters and the curve those parameters generate; and
a deep-reinforcement-learning iteration output module for using the obtained parameter prediction as the initial value for the iterations of a deep reinforcement learning algorithm to obtain an approximation of the global parameter solution of the pre-acquired nanoindentation measurement curve, where the reward-value prediction network of the deep reinforcement learning algorithm uses the difference between the curve corresponding to the current iteration parameters and the real curve to assign reward values for moving the current parameters in different directions, guiding the parameters toward the global parameters; when the approximation of the global parameter solution reaches a preset convergence condition, it is output as the mechanical parameters of the KVFD model.
Compared with the prior art, the invention has the following beneficial effects:
the invention provides a hierarchical deep reinforcement learning strategy for complex function multi-parameter optimization, which can solve the problem that the existing complex function multi-parameter optimization is difficult to effectively approach a global parameter solution. The method can more stably search an approximate global parameter solution in the complex function multi-parameter optimization.
In the invention, an integral algorithm is constructed based on deep reinforcement learning (DQN), an iteration initial parameter is given by a deep circular neural network (I-LSTM), and the network also participates in guiding training by global parameters during training; the reward evaluation of the iteration parameters is given by the participation of another deep-cycle neural network (R-LSTM), and the network guides the training by the participation of the global parameters during the training; the design enables the nano indentation measurement curve based on the KVFD model to introduce parameter guidance in the fitting process, and not only guides the fitting parameters and curve adjustment through curve errors, so that the method can better approach a global parameter solution in the complex function multi-parameter optimization problem.
In the invention, the guidance of global parameters on the multi-parameter optimization of complex functions is introduced by designing a neural network (namely R-LSTM) in deep reinforcement learning (DQN), so that the method has better capability of approaching global parameters than the prior art. By additionally arranging a neural network (i.e. I-LSTM) for initial parameter guidance and increasing the calculation of parameter errors in the loss function, the initial iteration parameters of the deep enhanced learning (DQN) can be close to the global parameter solution, the capability of the strategy of the invention for approaching the global parameters in the complex function multi-parameter optimization problem is enhanced, the iteration times of the deep enhanced learning (DQN) can be effectively reduced, and the operation speed is accelerated.
In the present invention, by adjusting the weight wp、wdThe attention degree of the I-LSTM to the parameters or curves can be adjusted; by adjusting Lp、LdThe specific calculation mode of the method can adapt the prediction capability of the I-LSTM to various curve equations with different characteristics, and achieves a good prediction effect.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly introduced below; it is obvious that the drawings in the following description are some embodiments of the invention, and that for a person skilled in the art, other drawings can be derived from them without inventive effort.
Fig. 1 is a schematic flow chart of a mechanical parameter extraction method of nanoindentation measurement data based on a KVFD model according to an embodiment of the present invention;
FIG. 2 is a diagram of an Initial LSTM Network (I-LSTM) structure for providing Initial parameter prediction for the QL section, in an embodiment of the present invention;
FIG. 3 is a schematic diagram of a Reward LSTM Network (R-LSTM) structure for providing Reward evaluation predictions for the QL section and a method for generating a training data set thereof, in accordance with an embodiment of the present invention;
fig. 4 is a schematic diagram of a mechanical parameter extraction system for nanoindentation measurement data based on the KVFD model according to an embodiment of the present invention;
FIG. 5 is a graph showing experimental curve fitting results in an embodiment of the present invention;
FIG. 6 is a schematic diagram illustrating comparison between simulation parameters and fitting parameters according to an embodiment of the present invention; wherein (a) in fig. 6 is a comparison diagram of a parameter indicating elasticity, (b) in fig. 6 is a comparison diagram of a parameter indicating fluidity, and (c) in fig. 6 is a comparison diagram of a parameter indicating viscosity;
FIG. 7 is a schematic diagram of experimental curve fitting error distribution in an embodiment of the present invention; fig. 7 (a) is a schematic view of a scatter plot of the error distribution, and fig. 7 (b) is a schematic view of a histogram of the error distribution.
Detailed Description
In order to make the purpose, technical effect and technical solution of the embodiments of the present invention clearer, the following clearly and completely describes the technical solution of the embodiments of the present invention with reference to the drawings in the embodiments of the present invention; it is to be understood that the described embodiments are only some of the embodiments of the present invention. Other embodiments, which can be derived by one of ordinary skill in the art from the disclosed embodiments without inventive faculty, are intended to be within the scope of the invention.
Existing complex-function multi-parameter optimization strategies find it difficult to locate or approach a global parameter solution effectively and stably. The invention discloses a strategy for fitting multi-parameter complex functions and approximating their global parameters based on deep reinforcement learning (DQN). The method optimizes iteratively with DQN, in which the reward value is given jointly by a custom Reward Evaluation Rule (RER) and the prediction of a Reward LSTM network (Reward-LSTM, R-LSTM); the initial parameters of the first iteration are given by the prediction of a custom Initial-value LSTM network (Initial-LSTM, I-LSTM). LSTM is used to build the networks because the inputs are time-series data: multi-parameter complex-function curves with many points.
Referring to Fig. 1, the invention designs a complex-function multi-parameter fitting algorithm (Curve-Fitting-DQN, CF-DQN) to find a solution approximating the global parameters. The design framework of CF-DQN is shown in Fig. 1. The input is the curve $D_{\text{real}}$ whose parameter solution is sought, and the output is the predicted result parameter $\theta_{\text{out}}$. The method takes the Q-Learning algorithm (QL) as its framework; one LSTM network (the Initial LSTM Network, I-LSTM) predicts the initial parameters for Q-Learning, and another LSTM network (the Reward LSTM Network, R-LSTM) evaluates reward values for the candidate parameters inside Q-Learning. The R-LSTM uses the difference between the curve corresponding to the current iteration parameters and the real curve to assign reward values for moving the current parameters in different directions, guiding the parameters toward the global parameters.
In the embodiment of the invention, for convenience of description, define the multi-parameter complex function $f$ and the curve $D_{\text{real}}$ whose parameters are to be extracted; the ideal global parameters corresponding to $D_{\text{real}}$ are $\theta_{\text{real}}$, i.e. $D_{\text{real}} = f(\theta_{\text{real}})$. The output result of CF-DQN is $\theta_{\text{out}}$, with corresponding curve $D_{\text{out}}$; the output of I-LSTM is $\theta_{\text{I-LSTM}}$; the action set of Q-Learning is $A = \{a_1, a_2, \ldots, a_n\}$; and the $i$-th row vector of the Q table is denoted $Q_i$.
The algorithm operates in 2 main stages, a parameter guidance stage and an iterative optimization stage:
(1) the parameter guidance stage is performed by the I-LSTM network: the curve $D_{\text{real}}$ whose parameters are to be extracted is input into the I-LSTM network to obtain a set of parameters $\theta_{\text{I-LSTM}}$, which is fed into QL as its initial parameters;
(2) the framework of the iterative optimization stage is QL, with the R-LSTM network taking part in the reward-evaluation link inside QL: at the $i$-th iteration step, the Reward Evaluation Rule (RER) and R-LSTM evaluate reward values for the parameter set obtained by applying the action set $A$ to the current parameters $\theta_i$. From the obtained reward values and the current row $Q_i$ of the Q table, a new row $Q_{i+1}$ is calculated and its maximum entry selected; if this is the $k$-th entry, action $a_k$ of action set $A$ is taken, updating the parameters from $\theta_i$ to $\theta_{i+1}$ and completing one optimization iteration. The iterations stop when the maximum number of optimization iterations is reached or the curve error falls below a preset value, and the parameters obtained at that point are the final parameters $\theta_{\text{out}}$.
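To make the two-stage flow concrete, here is a minimal Python sketch of one possible realization. The callables `i_lstm`, `r_lstm`, `f` and the entries of `actions`, the equal weighting of the rule-based and R-LSTM rewards, and the Q-row update rule are all assumptions: the patent describes these components at the block-diagram level only.

```python
import numpy as np

def cf_dqn(D_real, i_lstm, r_lstm, f, actions,
           max_iter=1000, tol=1e-8, lr=0.1, gamma=0.9):
    """Sketch of the two CF-DQN stages described above.

    i_lstm(D) -> initial parameter vector; r_lstm(delta) -> reward
    vector over the n actions; f(theta) -> model curve; actions:
    callables theta -> theta'. Learning rate, discount factor, error
    tolerance, reward weighting and the Q-row update rule are assumptions.
    """
    theta = i_lstm(D_real)                        # (1) parameter guidance
    q_row = np.zeros(len(actions))
    for _ in range(max_iter):                     # (2) iterative optimization
        candidates = [a(theta) for a in actions]
        deltas = [f(c) - D_real for c in candidates]
        rer = np.array([-np.abs(d).mean() for d in deltas])       # rule-based
        net = np.asarray(r_lstm(f(theta) - D_real))               # R-LSTM
        rewards = 0.5 * rer + 0.5 * net                           # weighted sum
        q_row = q_row + lr * (rewards + gamma * q_row.max() - q_row)
        k = int(np.argmax(q_row))                 # best entry of the new Q row
        theta = candidates[k]
        if np.abs(f(theta) - D_real).mean() < tol:
            break                                 # curve error below preset value
    return theta
```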
Referring to Fig. 2, in the embodiment of the invention, the first step of the CF-DQN algorithm is to provide initial iteration values for Q-Learning through the Initial LSTM Network (I-LSTM). The input of the I-LSTM is the one-dimensional curve $D$ whose parameters are to be extracted, with the number of curve points fixed at $m$. There follow 3 LSTM hidden layers, each with the same fixed number of units, chosen close to the number of curve points $m$. The first LSTM hidden layer receives the input data $D$, so its input dimension is $m$ and its output dimension is the set number of units; each subsequent LSTM layer has the set number of units as both input and output dimension, and the LSTM layers are connected one to the next. A DNN network follows, which may contain several fully-connected layers and convolutional layers of different dimensions, and converts the output of the LSTM layers into the output parameters $\theta_{\text{I-LSTM}}$.
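A minimal PyTorch sketch of the I-LSTM structure just described is given below. The use of PyTorch, the treatment of the curve as a length-$m$ sequence of scalar samples, and the head sizes are assumptions (the embodiment later fixes the hidden layers and the fully-connected layer at 256 nodes).

```python
import torch
import torch.nn as nn

class ILSTM(nn.Module):
    """Sketch of I-LSTM: 3 stacked LSTM hidden layers of equal size
    followed by a DNN head mapping to the parameter prediction.
    Reading the m-point curve as a sequence of scalar samples is one
    plausible interpretation of the structure, not a confirmed detail."""

    def __init__(self, hidden=256, n_params=3):
        super().__init__()
        self.lstm = nn.LSTM(input_size=1, hidden_size=hidden,
                            num_layers=3, batch_first=True)
        self.head = nn.Sequential(                # DNN part
            nn.Linear(hidden, 256), nn.ReLU(),
            nn.Linear(256, n_params))             # normalized [E0, alpha, tau]

    def forward(self, curve):                     # curve: (batch, m)
        seq = curve.unsqueeze(-1)                 # (batch, m, 1)
        out, _ = self.lstm(seq)
        return self.head(out[:, -1, :])           # last step into the DNN head
```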
In the embodiment of the invention, in order to introduce the guidance of global parameters into the loss function, the loss function $L_{\text{I-LSTM}}$ is designed as a weighted sum of two parts $L_p$ and $L_d$ with weights $w_p$ and $w_d$:

$$L_{\text{I-LSTM}} = w_p L_p + w_d L_d$$

where the $L_p$ part computes the loss between the label parameters $\theta_{\text{train}}$ and the network output parameters $\hat{\theta}$, and the $L_d$ part computes the loss between the curve $D_{\text{train}}$ and the curve $\hat{D}$ corresponding to the network output parameters. By adjusting the weights $w_p$ and $w_d$, the degree to which I-LSTM attends to the parameters or to the curve can be adjusted; by adjusting the specific calculation of $L_p$ and $L_d$, the prediction capability of I-LSTM can be adapted to curve equations with different characteristics, achieving a good prediction effect.
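A minimal sketch of this dual-term loss follows. Using mean-squared error for both $L_p$ and $L_d$, and regenerating $\hat{D}$ with a differentiable forward model `kvfd_curve`, are assumptions: the patent leaves the specific form of both terms adjustable.

```python
import torch.nn.functional as F

def i_lstm_loss(theta_hat, theta_train, d_train, kvfd_curve,
                w_p=1.0, w_d=1.0):
    """Dual-term I-LSTM loss sketch: L = w_p * L_p + w_d * L_d.

    L_p compares predicted with label parameters; L_d compares the
    label curve with the curve regenerated from the predicted
    parameters. MSE for both terms and a differentiable kvfd_curve
    are assumptions.
    """
    l_p = F.mse_loss(theta_hat, theta_train)   # parameter loss L_p
    d_hat = kvfd_curve(theta_hat)              # curve from predicted theta
    l_d = F.mse_loss(d_hat, d_train)           # curve loss L_d
    return w_p * l_p + w_d * l_d
```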
Referring to Fig. 3, in the embodiment of the invention, the network structure of R-LSTM is similar to that of I-LSTM, using LSTM hidden layers and a DNN network; the input is the point-by-point difference between two curves, and the output is a reward prediction for each directional action.
For convenience of description, define the following symbols: the optional action set for parameter changes is $A = \{a_1, a_2, \ldots, a_n\}$; the curve whose parameters are to be extracted is $D_{\text{real}}$, with true parameters $\theta_{\text{real}}$; the current reference parameters are $\theta_{\text{now}}$, with corresponding curve $D_{\text{now}}$. The parameter set obtained by applying the action set $A$ to $\theta_{\text{now}}$ is $\{\theta_1, \theta_2, \ldots, \theta_n\}$, with corresponding curve set $\{D_1, D_2, \ldots, D_n\}$. The difference between the curves $D_{\text{now}}$ and $D_{\text{real}}$ is $\Delta_{\text{train}}$, and the differences between the curves $\{D_1, D_2, \ldots, D_n\}$ and $D_{\text{real}}$ are $\{\Delta_1, \Delta_2, \ldots, \Delta_n\}$. Evaluating each parameter of the set $\{\theta_1, \ldots, \theta_n\}$ with the Reward Evaluation Rule (RER) yields the rewards $r_1, r_2, \ldots, r_n$; let $R_{\text{train}} = [r_1, r_2, \ldots, r_n]$. Feeding $\Delta_{\text{train}}$ into the R-LSTM yields the result $\hat{R}$.
The algorithm flow of the R-LSTM training phase is shown in Fig. 3. Given the true parameters $\theta_{\text{real}}$ and the current reference parameters $\theta_{\text{now}}$, the parameter set obtained by applying the action set $A$ is evaluated for reward. The network is thereby made to memorize the relationship between the difference between the curve corresponding to the current reference parameters and the real curve, and the reward evaluation of the candidate parameter set derived from the current reference parameters. The trained R-LSTM thus carries indirect guidance by the global parameters for the parameter optimization, and introducing this global-parameter guidance into the algorithm gives it the ability to approach the global parameters.
Each data item in the training set of R-LSTM contains: a vector $\Delta_{\text{train}}$ of length $m$, the point-by-point difference between the curve $D_{\text{now}}$ corresponding to the current reference parameters and the curve $D_{\text{real}}$ whose parameters are to be extracted; and a vector $R_{\text{train}}$ of length $n$, the reward values obtainable by each parameter after the current reference parameters $\theta_{\text{now}}$ execute the corresponding action set. Unlike for I-LSTM, the loss function $L_{\text{R-LSTM}}$ used in R-LSTM training is the sum of absolute errors of the reward-value vectors:

$$L_{\text{R-LSTM}} = \sum_{j=1}^{n} \left| r_j - \hat{r}_j \right|$$
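The construction of one R-LSTM training pair, and the stated sum-of-absolute-errors loss, can be sketched as follows; the RER form used for the labels (negative absolute mean error) is the same assumption as above, and `f` and `actions` are hypothetical callables.

```python
import numpy as np

def make_r_lstm_sample(theta_now, theta_real, f, actions):
    """One R-LSTM training pair per Fig. 3: the input is the point-by-point
    difference between the current reference curve and the real curve;
    the label is the RER reward of each candidate parameter obtained by
    applying the action set (RER form assumed, as above)."""
    D_real = f(theta_real)
    delta_train = f(theta_now) - D_real                 # length-m input
    candidates = [a(theta_now) for a in actions]
    r_train = np.array([-np.abs(f(c) - D_real).mean() for c in candidates])
    return delta_train, r_train                         # length-n label

def r_lstm_loss(r_hat, r_train):
    """Stated training loss: sum of absolute errors between the label
    reward vector and the network output reward vector."""
    return np.abs(np.asarray(r_hat) - np.asarray(r_train)).sum()
```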
In the Q-Learning iterations, the reward evaluation of the candidate parameter set comes from 2 sources: the Reward Evaluation Rule (RER) and the reward value predicted by R-LSTM. The two are weighted and summed to form the reward evaluation that guides the parameter iteration. The iterations end after a set number of iterations or when the error between the fitted curve and the real curve falls below a set value.
In the invention, by designing a neural network (R-LSTM) inside the deep reinforcement learning (DQN), guidance by global parameters is introduced into complex-function multi-parameter optimization, giving the method a better ability to approach the global parameters than the prior art. By adding a neural network for initial-parameter guidance (I-LSTM) and including the parameter error in its loss function, the initial iteration parameters of the DQN are brought close to the global parameter solution; this strengthens the strategy's ability to approach the global parameters in complex-function multi-parameter optimization, effectively reduces the number of DQN iterations, and accelerates the computation.
Referring to Fig. 4, the KVFD-model-based system for extracting the mechanical parameters of nanoindentation measurement curves in an embodiment of the present invention comprises:
a parameter-prediction acquisition module for inputting a pre-acquired nanoindentation measurement curve into a trained predicted-value acquisition network to obtain a parameter prediction for the curve, where the trained predicted-value acquisition network is a recurrent neural network based on LSTM hidden layers and the loss function value used during its training is calculated both from the input curve and its label parameters and from the network's output parameters and the curve those parameters generate; and
a deep-reinforcement-learning iteration output module for using the obtained parameter prediction as the initial value for the iterations of a deep reinforcement learning algorithm to obtain an approximation of the global parameter solution of the pre-acquired nanoindentation measurement curve, where the reward-value prediction network of the deep reinforcement learning algorithm uses the difference between the curve corresponding to the current iteration parameters and the real curve to assign reward values for moving the current parameters in different directions, guiding the parameters toward the global parameters; when the approximation of the global parameter solution reaches a preset convergence condition, it is output as the mechanical parameters of the KVFD model.
Detailed Description of Embodiments of the Invention
In the embodiment of the invention, the circular-probe nanoindentation equations of the KVFD model (hereinafter the KVFD equations) are chosen to implement the strategy of the invention. The equations cover three loading protocols, ramp-relaxation, load-unload and ramp-creep (below: relax, load-unload and creep). The equation system is shown in Table 1.
TABLE 1 Nanoindentation equations of the KVFD model under a circular probe
(Table 1 is given as an image in the original; it lists the force equations of the three loading protocols.)
The mechanical parameters to be optimized are $[E_0, \alpha, \tau]$. $R$ is the probe radius, $v$ is the indentation-depth rate during loading, $k$ is the force rate during loading, $T_r$ is the turning time, $\Gamma(\cdot)$ is the gamma function, $B(\cdot,\cdot)$ is the complete beta function, $B_x(\cdot,\cdot)$ is the incomplete beta function, and $E_{\alpha,\beta}(\cdot)$ is the Mittag-Leffler (M-L) function (the definitions appear as images in the original; standard notation is used here).
CF-DQN algorithm instances are constructed for the 3 loading protocols of the KVFD equations. In the KVFD equations the probe radius is taken as $R = 8.5\ \mu\text{m} = 8.5 \times 10^{-6}\ \text{m}$. The other conditions under the 3 loading protocols are set as follows:
(1) under the relax loading protocol: turning time $T_r = 2\ \text{s}$, hold time $T_{\text{hold}} = 3\ \text{s}$, maximum indentation depth $5\ \mu\text{m} = 5 \times 10^{-6}\ \text{m}$, and indentation-depth rate during the loading stage $v = 2.5\ \mu\text{m/s} = 2.5 \times 10^{-6}\ \text{m/s}$;
(2) under the load-unload protocol: turning time $T_r = 25\ \text{s}$, maximum indentation depth $5\ \mu\text{m} = 5 \times 10^{-6}\ \text{m}$, and indentation-depth rate $v = 0.2\ \mu\text{m/s} = 0.2 \times 10^{-6}\ \text{m/s}$;
(3) under the creep loading protocol: turning time $T_r = 2\ \text{s}$, hold time $T_{\text{hold}} = 3\ \text{s}$, maximum force $5\ \mu\text{N} = 5 \times 10^{-6}\ \text{N}$, and force rate during the loading stage $k = 2.5\ \mu\text{N/s} = 2.5 \times 10^{-6}\ \text{N/s}$.
When implementing the CF-DQN algorithm, a parameter set is $\theta = [E_0, \alpha, \tau]$ with parameter ranges $E_0 \in [10, 100000]$, $\alpha \in [0.01, 0.99]$ and $\tau \in [1, 1000]$. The number of points of the curve $D$ corresponding to a parameter set $\theta$ is set to $m = 250$; the time origin of each curve is $t = 0\ \text{s}$, and the sampling interval is 0.02 s under the relax protocol, 0.1 s under the load-unload protocol and 0.02 s under the creep protocol. The equations corresponding to the relax, load-unload and creep loading protocols are denoted $f_r$, $f_u$ and $f_c$, where $f_r$ for the relax protocol is as follows:
(the $f_r$ equation is given as an image in the original)
The $f_u$ of the load-unload protocol uses the loading section for $0 \le t \le T_r$; the unloading section for $t > T_r$ is not used:
(the $f_u$ equation is given as an image in the original)
The $f_c$ of the creep loading protocol is as follows:
(the $f_c$ equation is given as an image in the original)
Note that the 3 loading protocols correspond to 3 different complex curve equations, so 3 different CF-DQN instances need to be implemented separately.
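The implementation settings above can be collected in a few lines. This sketch only builds the time grids and samples one parameter set in range, since the forward models $f_r$, $f_u$ and $f_c$ of Table 1 appear only as images in the original; the random seed and the uniform sampling are assumptions.

```python
import numpy as np

m = 250                                          # points per curve, t from 0 s
dt = {'relax': 0.02, 'load-unload': 0.1, 'creep': 0.02}  # sampling interval (s)
t = {proto: np.arange(m) * step for proto, step in dt.items()}

# parameter ranges for theta = [E0, alpha, tau]
lo = np.array([10.0, 0.01, 1.0])
hi = np.array([100000.0, 0.99, 1000.0])

rng = np.random.default_rng(0)                   # arbitrary seed (assumption)
theta = rng.uniform(lo, hi)                      # one random parameter set
```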
The action set for parameter changes is $A = \{a_1, a_2, \ldots, a_n\}$ and contains 8 elements, i.e. $n = 8$; the details are given in Table 2. The symbol ↑ denotes increasing a parameter item by one step and ↓ denotes decreasing it by one step; the step size of each parameter item is 1% of the current parameter value.
TABLE 2 Parameter-change action set A used in implementing the CF-DQN algorithm
(Table 2 is given as an image in the original; it lists the 8 parameter-change actions.)
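Since Table 2 survives only as an image, the exact composition of the 8 actions is not recoverable. The sketch below implements the stated 1%-of-current-value steps and, as an illustrative assumption, builds the six single-parameter raise/lower actions; the remaining two of the 8 actions are unknown.

```python
import numpy as np

def make_step(idx, direction):
    """Action factory: raise (direction=+1) or lower (direction=-1)
    parameter item idx by one step of 1% of its current value."""
    def act(theta):
        theta = np.array(theta, dtype=float)
        theta[idx] += direction * 0.01 * theta[idx]
        return theta
    return act

# Six single-parameter actions over [E0, alpha, tau]; the composition of
# the full 8-action set in Table 2 is not recoverable from the image.
actions = [make_step(i, d) for i in range(3) for d in (+1, -1)]
```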
The 3 LSTM hidden layers in the I-LSTM are set to 256 nodes each; after the last LSTM hidden layer there is a fully-connected layer of 256 nodes whose output is 3 nodes, corresponding to the normalized $E_0$, $\alpha$ and $\tau$ parameters. A training data set containing one million items is used to train the I-LSTM.
The 3 LSTM hidden layers in the R-LSTM are set to 256 nodes each; after the last LSTM hidden layer there is a fully-connected layer of 256 nodes whose output is 8 nodes, corresponding to the normalized reward values of the parameters after the 8 actions are executed. A training data set containing one million items is used to train the R-LSTM.
The maximum number of iterations of the QL part is set to 1000, and the iterations terminate when this maximum is reached. At the same time, at each iteration the algorithm keeps track of the minimum curve error $\mathrm{MAE}_{\min}$ encountered during the iterations, together with the corresponding result parameters $\theta_k$ and iteration index $k$; once the current iteration index satisfies $i > k + 20$, it is considered that many iterations have brought no improvement in curve fit, the iterations are terminated, and the final result $\theta_{\text{out}} = \theta_k$ is output.
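The termination bookkeeping can be sketched as below, assuming a small dict `state` carried across iterations; the function name is hypothetical.

```python
def update_and_check(i, theta_i, mae_i, state, patience=20, max_iter=1000):
    """Track the smallest curve error MAE_min with its parameters
    theta_k and iteration index k; stop once i > k + 20 (no improvement
    for 20 iterations) or the 1000-iteration cap is reached. The final
    output is the remembered state['theta_k']."""
    if mae_i < state.get('mae_min', float('inf')):
        state.update(mae_min=mae_i, theta_k=theta_i, k=i)   # new best
    return i > state['k'] + patience or i >= max_iter
```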
After building the CF-DQN algorithms for the 3 loading protocols, the algorithms are tested with simulation data. The simulation data are generated with the settings described above, so the true parameters $\theta_{\text{real}}$ corresponding to a generated curve $D_{\text{real}}$ are known, which makes it convenient to study the relation between the CF-DQN result $\theta_{\text{out}}$ and the true parameters. A noise signal can also be added to the generated smooth curves to simulate data obtained in a real nanoindentation experiment.
The invention uses simulation data to carry out 2 tests on CF-DQN, and the following is a detailed description of the 2 tests:
(1) Curve-fitting effect test:
Using the parameters $\theta_{\text{real}} = [20000, 0.2, 50]$, 5 curves are generated for each of the relax, load-unload and creep protocols. The 5 curves of each loading protocol receive the following 5 treatments: no added noise, or added Gaussian, uniform, Rayleigh or exponential noise with mean $10^{-7}$. The resulting 15 curves are fitted with the CF-DQN algorithm to examine the curve-fitting effect and the result parameters.
(2) Fitting 10000 randomly generated relax curves:
10000 parameter sets are randomly generated within the parameter ranges and $f_r$ is used to generate 10000 relax curves, to each of which Gaussian random noise with mean $10^{-7}$ N is added. The 10000 curves are fitted with CF-DQN, the result parameters are extracted, and the result parameters are compared with the true parameters to verify the robustness of CF-DQN.
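Generating the 10000 noisy test curves can be sketched as follows; `f_r` is assumed available as a callable (its formula is an image in the original), and since only the noise mean is specified, using it also as the standard deviation is an assumption.

```python
import numpy as np

def make_test_curves(f_r, n=10000, noise_mean=1e-7, seed=0):
    """Sample n parameter sets within the stated ranges, generate relax
    curves with the forward model f_r, and add Gaussian noise with mean
    1e-7 N (standard deviation assumed equal to the mean)."""
    rng = np.random.default_rng(seed)
    thetas = np.column_stack([rng.uniform(10, 100000, n),   # E0
                              rng.uniform(0.01, 0.99, n),   # alpha
                              rng.uniform(1, 1000, n)])     # tau
    curves = []
    for th in thetas:
        clean = np.asarray(f_r(th))
        curves.append(clean + rng.normal(noise_mean, noise_mean, clean.shape))
    return thetas, curves
```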
The following are the test results of the CF-DQN algorithm instances customized to the KVFD equations.
Curve-fitting effect test:
The fitting results of the CF-DQN algorithm on KVFD-equation curves with the true parameters $\theta_{\text{real}} = [20000, 0.2, 50]$ and different added noises are shown in Fig. 5. Fig. 5 shows the CF-DQN fits of 15 simulation curves with true parameters $\theta_{\text{real}} = [E_0, \alpha, \tau] = [20000, 0.2, 50]$ under the relax, creep and load-unload loading protocols, respectively without noise and with added Gaussian, uniform, Rayleigh and exponential noise. In Fig. 5, columns 1 to 3 correspond to the relax, load-unload and creep loading protocols; row 1 shows the curves without noise, row 2 with Gaussian noise, row 3 with uniform noise, row 4 with Rayleigh noise and row 5 with exponential noise, the added noise in every case having mean $10^{-7}$. The generated simulation curves $D_{\text{real}}$ are drawn as solid lines, and the CF-DQN fitted curves $D_{\text{out}} = f(\theta_{\text{out}})$ as hollow circles.
Table 3 lists, for each curve, the fitting result parameters $\theta_{\text{out}}$, the absolute errors $|\theta_{\text{out}} - \theta_{\text{real}}|$ between the result parameters and the true parameters, and the mean absolute error (MAE) between the result curve and the simulation curve.
TABLE 3 CF-DQN results for the true parameters $\theta_{\text{real}} = [E_0, \alpha, \tau] = [20000, 0.2, 50]$
(Table 3 is given as an image in the original.)
In the noise-free case (Fig. 5, row 1), the fitted curves of the 3 loading protocols almost coincide with the simulation curves, with curve MAE of order $10^{-9}$ (see Table 3). The relax fitting result parameters are $[19339, 0.1953, 70]$, the load-unload results $[19320, 0.1951, 70]$ and the creep results $[18410, 0.1935, 105]$; the fitting result parameters $\theta_{\text{out}}$ are close to the true parameters $\theta_{\text{real}}$.
With added noise (Fig. 5, rows 2 to 5), the fitted curves of the 3 loading protocols follow the skeleton of the simulation curves, essentially unaffected by noise burrs, with curve MAE of order $10^{-8}$ (see Table 3). From the images, among the 4 noises of equal mean, uniform noise disturbs the curves least (Fig. 5, row 3): the MAEs for the relax, creep and load-unload protocols are 2.64E-8, 3.15E-8 and 2.65E-8, respectively. Rayleigh noise is next (Fig. 5, row 4), with MAEs of 5.47E-8, 5.74E-8 and 5.48E-8 for the relax, creep and load-unload protocols. Gaussian noise (Fig. 5, row 2) and exponential noise (Fig. 5, row 5) disturb the curve shape most, with MAEs of about 8E-8 for those 6 curves. Exponential noise also adds many upward sharp peaks, making the curves appear shifted upward, but the fitted curves still match the bottom skeletons of the simulation curves. The fitting result parameters $\theta_{\text{out}}$ of the 3 loading protocols remain close to the true parameters $\theta_{\text{real}}$ and show no significant change due to the added noise.
In the embodiment of the invention, 10000 randomly generated relax curves are fitted: 10000 parameter sets $\theta_{\text{real}}$ are randomly generated, the corresponding relax curves $D_{\text{real}}$ are generated, Gaussian random noise with mean $10^{-7}$ N is added, the CF-DQN algorithm extracts the parameters of the 10000 curves, and the result parameters $\theta_{\text{out}}$ are compared with the true parameters $\theta_{\text{real}}$.
Pearson correlation analyses are performed separately for $E_0$, $\alpha$ and $\tau$; the results are shown in Table 4. The correlation coefficient between the $E_0$ parameter of $\theta_{\text{real}}$ and the $E_0$ parameter of $\theta_{\text{out}}$ is $r = 0.8288$ ($p < 0.001$); between the $\alpha$ parameters, $r = 0.9963$ ($p < 0.001$); and between the $\tau$ parameters, $r = 0.2802$ ($p < 0.001$). A significant correlation therefore exists between the output parameters and the true parameters. The correlation coefficient of the $\alpha$ parameter is closest to 1, that of $E_0$ is also fairly close to 1, and that of $\tau$ is further from 1. Thus, of the 3 parameters extracted by the designed CF-DQN instances, $\alpha$ is very close to the curve's true parameter, $E_0$ is fairly close, and $\tau$ differs more from the curve's true parameter.
TABLE 4 Pearson correlation analysis between the true parameters $\theta_{\text{real}}$ and the result parameters $\theta_{\text{out}}$ of the 10000 simulation curves
(Table 4 is given as an image in the original.)
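The Table 4 analysis is a per-parameter Pearson correlation over the 10000 runs; a minimal sketch using SciPy follows (the array names are hypothetical).

```python
from scipy import stats

def correlate_parameters(theta_real, theta_out):
    """Pearson correlation between the true and fitted values of each
    KVFD parameter; theta_real and theta_out have shape (n, 3) for
    [E0, alpha, tau]."""
    for j, name in enumerate(['E0', 'alpha', 'tau']):
        r, p = stats.pearsonr(theta_real[:, j], theta_out[:, j])
        print(f'{name}: r = {r:.4f}, p = {p:.3g}')
```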
With the parameter items of $\theta_{\text{real}}$ as abscissa and those of $\theta_{\text{out}}$ as ordinate, scatter plots are drawn separately for the $E_0$, $\alpha$ and $\tau$ parameters of the 10000 data items, together with the correlation line (red solid line) and the 45° diagonal (black dotted line); the results are shown in Fig. 6. In the figure each dot is one data item, $r$ is the correlation coefficient and $p$ the significance; panel (a) shows parameter $E_0$, panel (b) parameter $\alpha$ and panel (c) parameter $\tau$. In Fig. 6(a), the point pairs of $E_0$ are distributed roughly around the 45° diagonal and become more dispersed as $E_0$ of $\theta_{\text{real}}$ grows; the correlation line lies close to the 45° diagonal with a slightly smaller slope. In Fig. 6(b), the point pairs of $\alpha$ hug the 45° diagonal, with only a few pairs offset from it, and the correlation line essentially coincides with the diagonal. In Fig. 6(c), the point pairs of $\tau$ scatter over the whole parameter range, showing a comparatively weak positive correlation, with the correlation line far from the 45° diagonal. The CF-DQN algorithm therefore fits the $\alpha$ parameter best, the $E_0$ parameter also well, and the $\tau$ parameter worst.
Referring to Fig. 7, Fig. 7 shows the mean absolute error (MAE) between the 10000 fitting result curves $D_{\text{out}}$ and the simulation curves $D_{\text{real}}$, together with its distribution: (a) the fitting error of each curve; (b) the error distribution. Fig. 7(a) plots the logarithm of the MAE of the 10000 data items; the MAEs fall within the range $10^{-8}$ to $10^{-4}$. Fig. 7(b) reflects the MAE distribution more intuitively: statistically, 4550 curves have an MAE below $10^{-7}$, 3654 curves between $10^{-7}$ and $10^{-6}$, 1293 curves between $10^{-6}$ and $10^{-5}$, and 503 curves between $10^{-5}$ and $10^{-4}$. The curve-fitting effect of the CF-DQN algorithm is good.
In summary, the embodiment of the invention discloses a hierarchical deep reinforcement learning strategy (CF-DQN) for complex-function multi-parameter optimization. Used for function fitting, it solves the fitting problem of complex functions and performs well in approximating the global parameters of multi-parameter functions, particularly non-convex ones. On the basis of deep reinforcement learning (DQN), 2 dedicated long short-term memory neural networks (LSTM) are built, one for predicting DQN rewards and one for predicting DQN initial iteration parameters. The reward value of the DQN is given by the custom Reward Evaluation Rule (RER) together with the prediction of the custom Reward LSTM (Reward-LSTM, R-LSTM), and the initial parameters of the first iteration are given by the prediction of the custom Initial-value LSTM (Initial-LSTM, I-LSTM).
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
Although the present invention has been described in detail with reference to the above embodiments, those skilled in the art can make modifications and equivalents to the embodiments of the present invention without departing from the spirit and scope of the present invention, which is set forth in the claims of the present application.

Claims (9)

1. A global optimization method for the mechanical parameters of a KVFD model based on double-loss-value-network deep reinforcement learning, characterized by comprising the following steps:
S1, inputting a pre-acquired nanoindentation measurement curve into a trained predicted-value acquisition network to obtain a parameter prediction for the curve; the trained predicted-value acquisition network is a recurrent neural network based on LSTM hidden layers, and the loss function value used during its training is calculated both from the input curve and its label parameters and from the network's output parameters and the curve those parameters generate;
S2, using the parameter prediction as the initial value for the iterations of a deep reinforcement learning algorithm to obtain an approximation of the global parameter solution of the pre-acquired nanoindentation measurement curve; the reward-value prediction network of the deep reinforcement learning algorithm uses the difference between the curve corresponding to the current iteration parameters and the real curve to assign reward values for moving the current parameters in different directions, guiding the parameters toward the global parameters;
and when the approximation of the global parameter solution reaches a preset convergence condition, outputting it as the mechanical parameters of the KVFD model.
2. The KVFD model mechanical parameter global optimization method of claim 1, wherein in step S1 the pre-acquired nanoindentation measurement curve comprises a time series, a force series and an indentation-depth series.
3. The KVFD model mechanical parameter global optimization method of claim 1, wherein in step S1 the predicted-value acquisition network comprises: several LSTM hidden layers and a DNN network;
each LSTM hidden layer has the same, fixed number of units, and the LSTM hidden layers are connected one to the next; the first LSTM hidden layer receives the pre-acquired nanoindentation measurement curve, and the output of the last LSTM hidden layer is fed into the DNN network;
the DNN network comprises several fully-connected layers and convolutional layers of different dimensions and converts the value output by the last LSTM hidden layer into the parameter prediction output.
4. The KVFD model mechanical parameter global optimization method of claim 1, wherein in step S1 the loss function value is calculated as

$$L_{\text{I-LSTM}} = w_p L_p + w_d L_d$$

where the $L_p$ part computes the loss between the label parameters $\theta_{\text{train}}$ and the network output parameters $\hat{\theta}$, the $L_d$ part computes the loss between the curve $D_{\text{train}}$ and the curve $\hat{D}$ corresponding to the network output parameters, and $w_p$, $w_d$ are the weights of the two parts.
5. The KVFD model mechanical parameter global optimization method of claim 1, wherein in step S2 the reward-value prediction network of the deep reinforcement learning algorithm comprises: several LSTM hidden layers and a DNN network;
each LSTM hidden layer has the same, fixed number of units, and the LSTM hidden layers are connected one to the next; the first LSTM hidden layer receives the difference obtained by subtracting the pre-acquired nanoindentation measurement curve from the current iteration curve, and the output of the last LSTM hidden layer enters the DNN network;
the DNN network comprises several fully-connected layers and convolutional layers of different dimensions and converts the value output by the last LSTM hidden layer into a reward prediction for each directional action.
6. The KVFD model mechanical parameter global optimization method of claim 5, wherein in step S2 the loss function used in training the reward-value prediction network is the sum of absolute errors between the label reward-value vector and the reward-value vector output by the network.
7. The KVFD model mechanical parameter global optimization method of claim 6, wherein in step S2, when the parameter prediction obtained in step S1 is used as the initial value for the iterations of the deep reinforcement learning algorithm to obtain an approximation of the global parameter solution of the pre-acquired nanoindentation measurement curve, each iteration comprises the following specific steps:
(1) predicting reward values for the candidate parameter set of the current iteration parameters with the reward evaluation rule and with the reward-value prediction network, respectively, and adding the two weighted reward values to obtain the reward evaluation of the candidate parameter set;
the reward evaluation rule is: to evaluate a candidate parameter, first calculate the curve difference $\Delta$ between the curve corresponding to the candidate parameter and the pre-acquired nanoindentation measurement curve, and then calculate the absolute mean $\overline{|\Delta|}$ of the curve difference; the reward value $r$ is then evaluated from $\overline{|\Delta|}$ (the evaluation formula is given as an image in the original);
(2) calculating a new row of the Q table from the reward evaluation obtained in step (1) and the current row of the Q table of the deep reinforcement learning algorithm, finding the maximum value in the new row, and taking the corresponding candidate parameter as the result parameter of the current iteration.
8. The KVFD model mechanical parameter global optimization method of claim 7, wherein in step S2 determining whether the approximation of the global parameter solution has reached the preset convergence condition specifically comprises: stopping the iterations when the error between the curve corresponding to the current iteration's result parameters and the pre-acquired nanoindentation measurement curve is smaller than a preset value, or when the number of iterations reaches a preset value.
9. A global optimization system for mechanical parameters of a KVFD model for deep reinforcement learning of a double-loss value network is characterized by comprising the following steps:
the parameter predicted value acquisition module is used for inputting the pre-acquired nano indentation measurement curve into a trained predicted value acquisition network to obtain a parameter predicted value of the nano indentation measurement curve; the trained predicted value acquisition network is a circulating neural network based on an LSTM hidden layer, and LOSS function values used by the circulating neural network during training are calculated by a curve and a curve corresponding parameter of an input network and a parameter and parameter corresponding curve of network output;
a deep reinforcement learning iteration output module, which takes the obtained predicted parameter values as the initial iteration value of the deep reinforcement learning algorithm and iterates to obtain an approximation of the global parameter solution of the pre-acquired nanoindentation measurement curve; the reward value prediction network of the deep reinforcement learning algorithm assigns reward values to movements of the current parameters in different directions, based on the difference between the curve corresponding to the current iteration parameters and the real curve, thereby guiding the parameters toward the global parameters; when the approximation of the global parameter solution reaches the preset convergence condition, it is output as the mechanical parameters of the KVFD model.
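
For illustration only (not part of the claims as filed), the following is a minimal PyTorch sketch of the reward value prediction network described in claim 5. The stacked LSTM with a single fixed hidden size follows the claim; the choice of three LSTM layers, a hidden size of 64, six directional actions (three KVFD parameters, two directions each), and a fully connected head without convolutional layers are assumptions made here for brevity.

import torch
import torch.nn as nn

class RewardPredictionNet(nn.Module):
    # Stacked LSTM hidden layers with a fixed, identical unit count,
    # followed by a DNN head that maps the last LSTM output to one
    # predicted reward per directional action.
    def __init__(self, n_lstm_layers=3, hidden_size=64, n_actions=6):
        super().__init__()
        self.lstm = nn.LSTM(input_size=1, hidden_size=hidden_size,
                            num_layers=n_lstm_layers, batch_first=True)
        self.head = nn.Sequential(
            nn.Linear(hidden_size, 32),
            nn.ReLU(),
            nn.Linear(32, n_actions),
        )

    def forward(self, curve_diff):
        # curve_diff: (batch, seq_len) difference between the current
        # iteration curve and the measured nanoindentation curve
        x = curve_diff.unsqueeze(-1)      # (batch, seq_len, 1)
        out, _ = self.lstm(x)
        last = out[:, -1, :]              # output of the last LSTM layer
        return self.head(last)            # reward prediction per action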
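
The claim-6 LOSS is then a sum, rather than a mean, of absolute errors, equivalent to torch.nn.L1Loss with reduction='sum':

def reward_loss(pred_rewards, label_rewards):
    # Sum of absolute errors between the label reward value vector and
    # the reward value vector output by the network (claim 6).
    return torch.sum(torch.abs(pred_rewards - label_rewards))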
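
A sketch of one iteration of the claim-7 loop follows. The exact reward formula is published only as an image, so the rule 1/(1 + mean|δ|) below is an assumed stand-in, as are the Q-learning-style row update, the weights w_rule and w_net, and the forward_model callable (a generator of KVFD force-depth curves from a parameter vector).

import numpy as np
import torch

def iterate_once(theta, candidates, q_row, measured, forward_model,
                 reward_net, w_rule=0.5, w_net=0.5, lr=0.1, gamma=0.9):
    # (1) rule-based reward for every candidate parameter set
    rule_rewards = np.array([
        1.0 / (1.0 + np.mean(np.abs(forward_model(c) - measured)))
        for c in candidates])

    # (1) network-predicted rewards from the current difference curve
    diff = torch.tensor(forward_model(theta) - measured,
                        dtype=torch.float32).unsqueeze(0)
    net_rewards = reward_net(diff).detach().numpy().ravel()

    # (1) weighted addition of the two reward values
    r = w_rule * rule_rewards + w_net * net_rewards

    # (2) new Q-table row from the reward evaluation and the current row
    q_new = q_row + lr * (r + gamma * q_row.max() - q_row)

    # (2) the candidate at the maximum of the new row becomes the
    # current iteration result parameter
    best = int(np.argmax(q_new))
    return candidates[best], q_new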
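
Claim 8's convergence test then wraps the step above; tol, max_iter, and the make_candidates perturbation helper below are illustrative placeholders, not values taken from the patent.

def make_candidates(theta, step=0.05):
    # Hypothetical helper: move each parameter up and down by a relative
    # step, one candidate per directional action (6 for 3 parameters).
    cands = []
    for i in range(len(theta)):
        for s in (step, -step):
            c = np.array(theta, dtype=float)
            c[i] *= 1.0 + s
            cands.append(c)
    return cands

def optimize(theta0, measured, forward_model, reward_net,
             tol=1e-3, max_iter=200):
    theta = np.array(theta0, dtype=float)
    q_row = np.zeros(len(theta) * 2)      # one Q-value per action
    for _ in range(max_iter):
        theta, q_row = iterate_once(theta, make_candidates(theta), q_row,
                                    measured, forward_model, reward_net)
        # stop when the curve error is below a preset threshold,
        # or when the iteration count reaches a preset value (claim 8)
        if np.mean(np.abs(forward_model(theta) - measured)) < tol:
            break
    return theta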
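
Finally, the "double loss" used to train the parameter prediction network of claim 9 combines a parameter-space error with a curve-space error obtained by regenerating a curve from the predicted parameters. The mean-absolute-error form and the equal weighting below are assumptions; the patent states only that both the input curve with its parameters and the output parameters with their curve enter the LOSS value.

def dual_loss(pred_params, true_params, pred_curve, input_curve):
    # parameter error: network-output parameters vs. the parameters
    # corresponding to the input curve
    param_loss = torch.mean(torch.abs(pred_params - true_params))
    # curve error: curve regenerated from the predicted parameters
    # (e.g. by a differentiable KVFD forward model) vs. the input curve
    curve_loss = torch.mean(torch.abs(pred_curve - input_curve))
    return param_loss + curve_loss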
CN202110368257.2A 2021-04-06 2021-04-06 Global optimization method and system for mechanical parameters of double loss value network deep reinforcement learning KVFD model Active CN113077853B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110368257.2A CN113077853B (en) 2021-04-06 2021-04-06 Global optimization method and system for mechanical parameters of double loss value network deep reinforcement learning KVFD model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110368257.2A CN113077853B (en) 2021-04-06 2021-04-06 Global optimization method and system for mechanical parameters of double loss value network deep reinforcement learning KVFD model

Publications (2)

Publication Number Publication Date
CN113077853A true CN113077853A (en) 2021-07-06
CN113077853B CN113077853B (en) 2023-08-18

Family

ID=76615137

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110368257.2A Active CN113077853B (en) 2021-04-06 2021-04-06 Global optimization method and system for mechanical parameters of double loss value network deep reinforcement learning KVFD model

Country Status (1)

Country Link
CN (1) CN113077853B (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017004626A1 (en) * 2015-07-01 2017-01-05 The Board Of Trustees Of The Leland Stanford Junior University Systems and methods for providing reinforcement learning in a deep learning system
CN109002942A (en) * 2018-09-28 2018-12-14 河南理工大学 A kind of short-term load forecasting method based on stochastic neural net
WO2020220191A1 (en) * 2019-04-29 2020-11-05 Huawei Technologies Co., Ltd. Method and apparatus for training and applying a neural network
CN110443364A (en) * 2019-06-21 2019-11-12 深圳大学 A kind of deep neural network multitask hyperparameter optimization method and device
CN111914474A (en) * 2020-06-28 2020-11-10 西安交通大学 Fractional-order KVFD multi-parameter machine learning optimization method for viscoelastic mechanical characterization of soft substances

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
LI, Cong; SHI, Hongwei: "Research on Nonlinear System Modeling Method Based on Deep-Learning Long-Term Prediction", Modern Computer (现代计算机), no. 15 *

Also Published As

Publication number Publication date
CN113077853B (en) 2023-08-18

Similar Documents

Publication Publication Date Title
TWI698807B (en) Artificial neural network class-based pruning
CN109120462B (en) Method and device for predicting opportunistic network link and readable storage medium
CN109447156B (en) Method and apparatus for generating a model
US20220147877A1 (en) System and method for automatic building of learning machines using learning machines
CN108959474B (en) Entity relation extraction method
Zhao et al. Bearing health condition prediction using deep belief network
CN114298851A (en) Network user social behavior analysis method and device based on graph sign learning and storage medium
TW202123098A (en) Method and electronic device for selecting neural network hyperparameters
CN111564179A (en) Species biology classification method and system based on triple neural network
CN111027292A (en) Method and system for generating limited sampling text sequence
WO2021055442A1 (en) Small and fast video processing networks via neural architecture search
CN116415200A (en) Abnormal vehicle track abnormality detection method and system based on deep learning
JP2019105871A (en) Abnormality candidate extraction program, abnormality candidate extraction method and abnormality candidate extraction apparatus
CN113330462A (en) Neural network training using soft nearest neighbor loss
CN113128689A (en) Entity relationship path reasoning method and system for regulating knowledge graph
CN116204786B (en) Method and device for generating designated fault trend data
CN113077853A (en) Double-loss-value network deep reinforcement learning KVFD model mechanical parameter global optimization method and system
CN111612022A (en) Method, apparatus, and computer storage medium for analyzing data
CN115345303A (en) Convolutional neural network weight tuning method, device, storage medium and electronic equipment
CN115936303A (en) Transient voltage safety analysis method based on machine learning model
CN112419098B (en) Power grid safety and stability simulation sample screening and expanding method based on safety information entropy
CN115035304A (en) Image description generation method and system based on course learning
CN116324807A (en) Neural architecture and hardware accelerator search
CN116391193A (en) Method and apparatus for energy-based latent variable model based neural networks
CN113128677A (en) Model generation method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant