CN112632860B - Power transmission system model parameter identification method based on reinforcement learning - Google Patents

Power transmission system model parameter identification method based on reinforcement learning

Info

Publication number: CN112632860B (application CN202110002104.6A)
Authority: CN (China)
Inventors: 丁建完, 陈立平, 郭超, 彭奇
Current Assignee: Huazhong University of Science and Technology
Other versions: CN112632860A (Chinese)
Legal status: Active (granted)

Classifications

    • G06F30/27 — Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
    • G06F30/25 — Design optimisation, verification or simulation using particle-based methods
    • G06N20/00 — Machine learning
    • G06F2111/08 — Probabilistic or stochastic CAD
    • G06F2119/14 — Force analysis or force optimisation, e.g. static or dynamic forces


Abstract

The invention discloses a power transmission system model parameter identification method based on reinforcement learning, belonging to the field of system modeling and simulation. Aiming at the inconsistent sensitivity of power transmission system model parameters and the slow convergence and demanding search-range requirements of existing identification algorithms, the invention constructs a reinforcement learning framework for identifying power transmission system model parameters that avoids local optima, converges quickly and supports a large search range. The invention adopts a staged identification flow: the coarse-tuning stage exploits the fast convergence and large search range of reinforcement learning to quickly find the optimal subinterval of each parameter, and the fine-tuning stage exploits the high accuracy and strong global search capability of a heuristic algorithm to determine the final identification result within that subinterval. Compared with using a single algorithm alone, the staged identification performs better.

Description

Power transmission system model parameter identification method based on reinforcement learning
Technical Field
The invention belongs to the field of system modeling simulation, and particularly relates to a power transmission system model parameter identification method based on reinforcement learning.
Background
With the continuous development of multi-domain system modeling and simulation technology, the Modelica language has gradually become an industry standard for multi-domain, multi-disciplinary system simulation; its goal is to define a general programming language for complex system modeling. A Modelica model is a program written in the Modelica language with object-oriented, multi-domain unified, acausal declarative and continuous-discrete hybrid modeling capabilities, which engineers in different industries can use to build their own simulation model systems and carry out the corresponding dynamic simulations. For a Modelica model, the key point is the setting of the model parameters: building the model only determines its basic form, and accurate model parameters must be set to obtain optimal simulation performance.
The power transmission system is a device that, under the control of a controller, obtains energy from a power source in the form of speed/angular velocity and force/torque and transmits it to the next link of the system. It mainly comprises a motor, a speed reducer and other parts, and the transmission process involves energy loss, speed changes and similar effects.
There are many existing model parameter identification methods, of which the least squares method is currently the most commonly used; it is the default parameter optimization method in modeling software. However, least squares estimation depends too heavily on the data, its identification result is easily affected by noise, and it imposes strict requirements on the initialization range of the parameters. Evolutionary algorithms such as Particle Swarm Optimization (PSO) and the Genetic Algorithm (GA) are also commonly used; they approach the global optimum more reliably, but converge slowly and leave accuracy to be improved. Therefore, an identification method with fast convergence and low requirements on the initial parameter range is urgently needed for identifying the parameters of a Modelica-based power transmission system model.
Disclosure of Invention
Aiming at the above defects or improvement demands of the prior art, the invention provides a power transmission system model parameter identification method based on reinforcement learning, which aims to provide an identification method with high convergence speed and low requirement on the initial range of parameters so as to carry out parameter identification on a power transmission system model based on Modelica.
In order to achieve the above object, the present invention provides a method for identifying parameters of a power transmission system model based on reinforcement learning, comprising:
S1, constructing a dynamic model of a power transmission system based on a multi-field unified modeling language Modelica;
S2, performing sensitivity analysis on parameters to be identified of the model;
S3, coarse adjustment is carried out on parameters to be identified based on a reinforcement learning algorithm:
constructing a reinforcement learning framework for model parameter identification of a Modelica power transmission system;
performing iterative training by using a reinforcement learning framework to obtain an optimal subinterval of each parameter to be identified;
S4, parameter fine-tuning:
Taking the mean square error between the measured data and the model estimate as the objective function, iteratively optimizing in the solution space formed by the model parameters to be identified, and taking the parameter values corresponding to the minimum objective function value as the final identification result.
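Both stages share this objective. A minimal sketch of it in Python (the function and variable names are illustrative, not taken from the patent):

```python
import numpy as np

def objective(y_measured: np.ndarray, y_estimated: np.ndarray) -> float:
    """Mean squared error between measured data and simulated model output;
    it serves as the reinforcement-learning objective F(X) in the coarse
    stage and as the PSO fitness in the fine-tuning stage."""
    return float(np.mean((y_measured - y_estimated) ** 2))
```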
Further, in step S2, the Sobol method is specifically used to perform parameter sensitivity analysis on the power transmission system model with undetermined parameters; the specific steps are as follows (a code sketch follows this list):
01. Monte Carlo sampling is carried out within the possible value interval of each of the N parameters to be identified, generating an initial sample matrix A, an initial sample matrix B and cross sample matrices AB_i, where i = {1, 2, …, N};
02. The sample matrices A, B and AB_i are taken as input for simulation of the power transmission system model, yielding the model simulation result vectors f(A), f(B) and f(AB_i) for the initial sample matrix A, the initial sample matrix B and the cross sample matrices AB_i respectively;
03. The global impact index S_Ti of each parameter is found from the simulation results; with m the number of samples per matrix, the standard total-effect estimator reads
S_Ti = (1/(2m)) · Σ_{j=1}^{m} ( f(A)_j − f(AB_i)_j )² / Var(Y)
wherein Y represents the vector set formed by f(A), f(B) and f(AB_i), and Var(Y) represents the variance of the power transmission system model output;
04. sensitivity ordering is carried out on parameters to be identified according to the size of the parameter global influence index; the larger the impact factor, the more sensitive;
05. combining parameters to be identified with sensitivity lower than a set threshold value.
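A minimal numerical sketch of steps 01–03, assuming the model is wrapped in a scalar `simulate` callable; the uniform sampling, the Jansen total-effect estimator and all names here are this sketch's assumptions rather than the patent's prescription:

```python
import numpy as np

def sobol_total_indices(simulate, bounds, m=10000, seed=None):
    """Estimate Sobol total-effect indices S_Ti for each parameter.

    simulate : callable mapping a parameter vector to a scalar model output
    bounds   : list of (low, high) value intervals, one per parameter
    m        : number of Monte Carlo samples per matrix
    """
    rng = np.random.default_rng(seed)
    n = len(bounds)
    lo = np.array([b[0] for b in bounds])
    hi = np.array([b[1] for b in bounds])
    # Initial sample matrices A and B (m x n), uniform within the intervals
    A = lo + (hi - lo) * rng.random((m, n))
    B = lo + (hi - lo) * rng.random((m, n))
    fA = np.apply_along_axis(simulate, 1, A)
    fB = np.apply_along_axis(simulate, 1, B)
    ST = np.empty(n)
    fABs = []
    for i in range(n):
        AB_i = A.copy()
        AB_i[:, i] = B[:, i]          # cross sample matrix: column i from B
        fAB = np.apply_along_axis(simulate, 1, AB_i)
        fABs.append(fAB)
        ST[i] = np.mean((fA - fAB) ** 2) / 2.0   # Jansen estimator numerator
    var_Y = np.var(np.concatenate([fA, fB] + fABs))  # Var(Y) over all outputs
    return ST / var_Y
```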
Further, the reinforcement learning framework construction process specifically includes (a compact sketch follows this list):
(1) Taking the mean square error between the model estimate Y_est and the measured value Y_mea as the reinforcement learning objective function F(X);
(2) Constructing a single-step reward:
r = min(1, max(0, (F(X_mean) − F(X_best)) / (F(X_mean) − F(X_cur))))
wherein r represents the single-step reward value, F(X_cur) represents the objective function value under the current parameters, F(X_best) the objective function value under the optimal parameters, and F(X_mean) the objective function value under the parameter mean;
(3) Setting an action according to the parameter minimum change amount G_i (i = 1, 2, …, N) and the range of each parameter:
The search range of the i-th parameter is split into (x_i,max − x_i,min)/G_i subintervals; one subinterval is selected, and a value randomly drawn within it is the action; wherein x_i,max is the maximum and x_i,min the minimum value of the i-th parameter; the minimum change amount G_i is the amount by which the i-th parameter is increased or decreased at each step of the identification process;
(4) Constructing an action selection strategy:
01. Selecting a search path:
It is determined whether the action selected in the next round lies to the left or the right of the current action; the selection index is L_p(i, j), computed from k, the number of parameter transform combinations, the n-th largest Q value among the k actions adjacent to the current action a_{i,j} on the i-th path, and a path weight coefficient λ_1;
a random number ε_1 in [0, 1] is obtained and used to determine the search path l, where rand(1, 2) denotes a random draw over the interval [1, 2];
02. Determining the action:
a random number ε_2 in [0, 1] is obtained and the action a is determined from it; Q(i, m) represents the Q value of the i-th parameter to be identified, and ε_1 and ε_2 are random numbers that preserve the exploratory behaviour of reinforcement learning;
(5) Constructing the update strategy of the Q value function:
The Q value function update formula corresponding to the i-th parameter is:
Q_{r+1}(i, j) = Q_r(i, j) + α · ( r + (1 − λ_2) · max(L_p(i, j)) + λ_2 · min(L_p(i, j)) − Q_r(i, j) )
where α is a hyper-parameter controlling the learning rate, r is the single-step reward, and λ_2 is a hyper-parameter controlling the update amplitude.
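The bounded reward and the Q-table update above can be sketched compactly. Since the exact formula of L_p(i, j) is not recoverable from the text, the sketch takes those values as a precomputed argument; the class layout and all names are illustrative assumptions, not the patent's implementation:

```python
import numpy as np

class ParameterAgent:
    """One agent per parameter to be identified; the Q table holds one
    entry per subinterval of the parameter's search range."""

    def __init__(self, x_min, x_max, g_min, alpha=0.7, lam2=0.25):
        self.n_cells = int(np.ceil((x_max - x_min) / g_min))  # (x_max-x_min)/G_i cells
        self.edges = np.linspace(x_min, x_max, self.n_cells + 1)
        self.q = np.zeros(self.n_cells)
        self.alpha, self.lam2 = alpha, lam2

    def sample_action(self, j, rng):
        """An action: a value drawn at random inside subinterval j."""
        return rng.uniform(self.edges[j], self.edges[j + 1])

    @staticmethod
    def reward(f_cur, f_best, f_mean):
        """r = min(1, max(0, (F(X_mean)-F(X_best)) / (F(X_mean)-F(X_cur))))."""
        denom = f_mean - f_cur
        ratio = (f_mean - f_best) / denom if denom != 0 else 1.0
        return min(1.0, max(0.0, ratio))

    def update(self, j, r, lp_values):
        """Q_{r+1}(i,j) = Q_r(i,j) + alpha*(r + (1-lam2)*max(Lp) + lam2*min(Lp) - Q_r(i,j))."""
        target = r + (1 - self.lam2) * max(lp_values) + self.lam2 * min(lp_values)
        self.q[j] += self.alpha * (target - self.q[j])
```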
Further, the parameter coarse-tuning stage uses a reinforcement learning framework based on the Q-Learning algorithm; the specific steps of the iterative training are as follows (see the sketch after this list):
(1) All parameters to be identified are randomly initialized and substituted into the power transmission system model for calculation; the mean square error against the measured data is taken as the initial value of the optimal objective function F(X);
(2) Several agents perform serial learning in descending order of the sensitivity of the parameters to be identified; the unit that adjusts a parameter to be identified according to the Q value table in the reinforcement learning framework is called an agent, and each parameter to be identified corresponds uniquely to one agent; the process by which an agent adjusts its parameter is called the agent's learning behaviour;
The learning process includes: an action a_rand(i) is randomly selected within the possible value interval of the current agent's parameter; the other parameters are fixed and the parameters are applied to the power transmission system model to obtain the objective function value F(X_cur) under the current parameters and the single-step reward value r, and the Q value table of the current agent is updated according to the update strategy of the Q value function; if F(X_cur) ≤ F(X_best), the current parameter is considered to have search value and the process moves to step (3); otherwise the learning process is repeated;
(3) An action a_iter(i) is selected according to the action selection strategy; the other parameters are fixed, the parameters are applied to the FMU to obtain the objective function value F(X_cur) and the reward value r, and the Q value table is updated;
(4) Steps (2) and (3) are performed iteratively for the i-th parameter a preset number of times, completing one identification pass of the i-th parameter, after which the process moves to identification of the (i+1)-th parameter;
(5) After steps (2)–(4) have been executed serially for all parameters to be identified, one training period is complete; if the number of completed periods is smaller than the given number of training periods, the process returns to step (2) for the next training period; otherwise training ends.
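A sketch of this serial coarse-tuning loop, reusing the ParameterAgent sketched above; the per-parameter iteration count, the running-mean stand-in for F(X_mean) and the greedy subinterval choice (in place of the patent's path-based selection, whose formula is not recoverable) are simplifying assumptions:

```python
import numpy as np

def coarse_tune(agents, simulate_mse, n_periods=50, n_iter=20, seed=0):
    """Serial Q-Learning coarse tuning: agents are visited in descending
    parameter sensitivity, each adjusting only its own parameter."""
    rng = np.random.default_rng(seed)
    x = np.array([a.sample_action(rng.integers(a.n_cells), rng) for a in agents])
    f_best, x_best = simulate_mse(x), x.copy()
    f_hist = [f_best]                            # objective history for F(X_mean)
    for _ in range(n_periods):                   # training periods
        for i, agent in enumerate(agents):       # agents sorted by sensitivity
            # random probing round
            j = int(rng.integers(agent.n_cells))
            x[i] = agent.sample_action(j, rng)
            f_cur = simulate_mse(x)
            f_hist.append(f_cur)
            r = agent.reward(f_cur, f_best, float(np.mean(f_hist)))
            agent.update(j, r, lp_values=agent.q)   # Q values stand in for L_p
            if f_cur > f_best:
                continue                         # no search value: next agent
            # search period around the promising region
            for _ in range(n_iter):
                j = int(np.argmax(agent.q))      # greedy stand-in for the
                x[i] = agent.sample_action(j, rng)  # patent's path selection
                f_cur = simulate_mse(x)
                f_hist.append(f_cur)
                r = agent.reward(f_cur, f_best, float(np.mean(f_hist)))
                agent.update(j, r, lp_values=agent.q)
                if f_cur < f_best:
                    f_best, x_best = f_cur, x.copy()
    # optimal subinterval per parameter: the cell with the highest Q value
    return x_best, [int(np.argmax(a.q)) for a in agents]
```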
Further, in step S4, the parameters are fine-tuned using the PSO optimization algorithm to obtain the final identification result; the specific steps are as follows (a sketch follows this list):
The fine-tuning range of each parameter is set by enlarging, by a factor μ, the optimal subinterval obtained by reinforcement learning around x_i*, the optimal value of the i-th parameter after reinforcement learning identification; the value of μ is set according to the search capability of the particle swarm algorithm;
An N-dimensional space and a particle swarm are initialized, creating for each particle a velocity vector v_i, a position vector x_i and a historical optimal position vector p_i, together with the historical optimal position vector p_g of the whole swarm;
A fitness function G(X) is established from the mean square error between the model prediction and the measured value; the model prediction is obtained by substituting the current particle position into the FMU for simulation solution; the search target of the whole particle swarm is to minimize G(X);
The iterative search starts:
In one iteration, the fitness value of each particle is calculated, the p_i value of each particle and the p_g value of the population are updated, and the velocity and position vector of each particle are updated according to
v_i ← ω·v_i + c_1·r_1·(p_i − x_i) + c_2·r_2·(p_g − x_i),  x_i ← x_i + v_i
wherein ω is the inertia weight (default 0.6), c_1 and c_2 are learning factors, and r_1 and r_2 are random numbers in [0, 1];
after the maximum number of iterations is reached, the historical optimal position vector p_g of the particle swarm is the final result of the parameter identification.
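A minimal sketch of this fine-tuning stage as a standard PSO, assuming the fitness callable wraps the FMU simulation; clipping particles to the fine-tuning bounds is this sketch's choice, not stated in the patent:

```python
import numpy as np

def pso_fine_tune(fitness, bounds, n_particles=30, n_iter=200,
                  w=0.6, c1=2.0, c2=2.0, seed=0):
    """Standard PSO over the fine-tuning ranges found by coarse tuning.

    fitness : callable returning the MSE G(X) for a parameter vector
    bounds  : list of (low, high) fine-tuning ranges, one per parameter
    """
    rng = np.random.default_rng(seed)
    lo = np.array([b[0] for b in bounds])
    hi = np.array([b[1] for b in bounds])
    x = lo + (hi - lo) * rng.random((n_particles, len(bounds)))
    v = np.zeros_like(x)
    p = x.copy()                                   # per-particle best positions p_i
    p_val = np.array([fitness(xi) for xi in x])
    g = p[np.argmin(p_val)].copy()                 # global best position p_g
    g_val = p_val.min()
    for _ in range(n_iter):
        r1, r2 = rng.random(x.shape), rng.random(x.shape)
        v = w * v + c1 * r1 * (p - x) + c2 * r2 * (g - x)
        x = np.clip(x + v, lo, hi)                 # keep particles in range
        vals = np.array([fitness(xi) for xi in x])
        improved = vals < p_val
        p[improved], p_val[improved] = x[improved], vals[improved]
        if vals.min() < g_val:
            g_val = vals.min()
            g = x[np.argmin(vals)].copy()
    return g, g_val                                # p_g is the final result
```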
In general, the above technical solution conceived by the present invention can achieve the following advantageous effects compared to the prior art.
(1) Aiming at the inconsistent sensitivity of Modelica model parameters and the slow convergence and demanding search-range requirements of existing identification algorithms, the invention constructs a reinforcement learning framework for Modelica power transmission system model parameter identification that avoids local optima, converges quickly and supports a large search range. The invention adopts a staged identification flow: the coarse-tuning stage exploits the fast convergence and large search range of reinforcement learning to quickly find the optimal subinterval of each parameter, and the fine-tuning stage exploits the high accuracy and strong global search capability of a heuristic algorithm to determine the final identification result within that subinterval. Compared with using a single algorithm alone, the staged identification performs better.
(2) Aiming at the huge search space caused by an excessive number of parameters, the invention also performs sensitivity analysis on the parameters to be identified and merges those with lower sensitivity, which helps reduce the number of parameters to be identified and saves computing resources during reinforcement-learning-based parameter identification. Meanwhile, using the sensitivity to set the parameter identification priority provides a basis for setting the learning rate in the reinforcement learning identification process and can improve the identification accuracy.
(3) The invention designs a sensitivity analysis scheme based on a Sobol method, and the Sobol method simultaneously explores all parameters of a model in a possible value space based on variance, has no limitation on nonlinearity and monotonicity of the model, and is suitable for carrying out parameter global sensitivity analysis on any model.
Drawings
FIG. 1 is a flow chart of a method for identifying parameters of a power transmission system model based on reinforcement learning;
FIG. 2 is a model of a powertrain created based on Modelica;
FIG. 3 shows the global impact indices of the inertia2.J, spring.c, damping and Jmotor parameters;
FIG. 4 compares, under the same input conditions, the physical prototype output and the model simulation result for a Modelica-based power transmission model of a numerically-controlled machine tool feed system whose parameters were identified by the method of the invention.
Detailed Description
The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention. In addition, the technical features of the embodiments of the present invention described below may be combined with each other as long as they do not collide with each other.
The invention provides a model parameter identification method based on reinforcement learning, which can effectively identify the key parameters of a model so as to improve its simulation performance. To meet the requirements of fast convergence and high identification accuracy, the invention takes measured data from the physical machine corresponding to the model as training samples; in the parameter coarse-tuning stage it adopts a reinforcement learning framework improved from the Q-Learning algorithm and serially and iteratively trains the agents, one representing each parameter to be identified; in the fine-tuning stage a PSO/GA algorithm determines the final values of the parameters to be identified. The invention features fast convergence and high identification accuracy.
Reinforcement learning is a machine learning method based on a trial-and-error mechanism; it originated in research on animal intelligence and later developed into an important branch of machine learning. Reinforcement learning considers the interaction between a goal-directed agent and an unknown environment: experience is accumulated through continuous trial and error and the action selection strategy is continuously updated. It offers fast convergence, avoids local optima easily, and requires no prior knowledge. At present reinforcement learning is mostly, and successfully, applied in combinatorial optimization (sequential decision making), but it has not yet been applied in function optimization, the field to which parameter identification belongs. In view of the fast convergence and global optimization required by Modelica model parameter identification, the invention provides a reinforcement-learning-based parameter identification method to solve this problem.
The overall technical scheme of the invention is as follows: the parameter sensitivity of the Modelica model to be identified is analyzed with a Sobol-based sensitivity analysis method, the parameters are ranked by their global impact factor, and the training order and discretization degree (i.e. the number of grid cells the initialization range is split into) of each parameter in the coarse-tuning stage are determined. Using the reinforcement learning framework improved from Q-Learning, the mean square error between the data measured on the physical machine and the model simulation data serves as the objective function and environment state, adjusting a parameter within its feasible range is the action, and the relation between the current objective function value and the historical optimum/mean is the single-step reward; the parameters to be identified are coarse-tuned and the optimal subinterval of each parameter is obtained. Based on the optimal subintervals of all parameters, a PSO/GA algorithm is trained iteratively to obtain the optimal identification value of each parameter, i.e. the final identification result of the whole technical scheme.
S1, modelica model parameter sensitivity analysis based on a Sobol method comprises the following steps:
(1) An FMU (Functional Mock-up Unit, containing the C code of the model, a solver and so on, which can be called directly and complete the simulation solution of the model independently) of the model to be identified is exported and set to Co-Simulation mode;
(2) Monte Carlo sampling is carried out within the range of each of the N parameters to be identified to generate the sample matrices A, B and AB_i (i = 1, 2, …, N); the sample matrices are taken as input for simulation of the FMU to obtain the Y-value matrix; the global impact factor of each parameter to be identified is obtained from the Y-value matrix and the impact index formula, and the parameters are ranked. The training order is determined by the size of the impact factors: parameters with larger impact factors (i.e. the more sensitive ones) are optimized first within a training round;
(3) A random value is drawn within the search range of each parameter and substituted into the FMU; the calculation result serves as a reference value, and the minimum subinterval size G_i (i = 1, 2, …, N) of each parameter is determined in turn, such that increasing or decreasing the i-th parameter by G_i has a clearly observable influence on the simulation result.
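The patent does not prescribe a particular FMU runtime. As one possible realization, the exported Co-Simulation FMU could be driven from Python with the FMPy library; the FMU filename, parameter names and output variable below are illustrative assumptions for the drive-train example:

```python
import numpy as np
from fmpy import simulate_fmu  # pip install fmpy

def make_simulate_mse(fmu_path, param_names, output_var, y_measured, stop_time):
    """Wrap an exported Co-Simulation FMU as the objective F(X)."""
    def simulate_mse(x):
        result = simulate_fmu(
            fmu_path,
            stop_time=stop_time,
            start_values=dict(zip(param_names, x)),
            output=[output_var],
        )
        # resample the simulated curve onto the measurement grid
        t_meas = np.linspace(0.0, stop_time, len(y_measured))
        y_est = np.interp(t_meas, result['time'], result[output_var])
        return float(np.mean((y_measured - y_est) ** 2))
    return simulate_mse

# hypothetical usage:
# f = make_simulate_mse('DriveTrain.fmu',
#                       ['inertia2.J', 'spring.c', 'damping', 'Jmotor'],
#                       'inertia3.w', y_measured, stop_time=10.0)
```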
S2, a reinforcement learning framework based on the Q-Learning algorithm, used in the parameter coarse-tuning stage, comprises the following elements:
(1) The environment state is set to the mean square error between the model estimate Y_est of the current round and the physical machine measurement Y_mea (i.e. the reinforcement learning objective function F(X)), representing the difference between the current model parameter combination and the actual parameter combination;
(2) The single-step reward measures the quality of the action selected in the current state and is an expression of the objective function value:
r = min(1, max(0, (F(X_mean) − F(X_best)) / (F(X_mean) − F(X_cur))))
wherein r represents the reward/punishment value, F(X_mean) the objective function value under the parameter mean, F(X_best) the objective function value under the optimal parameters, and F(X_cur) the objective function value under the current parameters;
(3) The action: according to the parameter minimum subinterval G_i (i = 1, 2, …, N) and the range of each parameter, the search range of the i-th parameter is split into (x_i,max − x_i,min)/G_i subintervals (x_i,max is the maximum and x_i,min the minimum value of the parameter); after a subinterval has been selected, a value randomly drawn in that interval is the action;
(4) The action selection strategy is divided into two phases: selecting a search path and determining the action. The former determines whether the next selected action lies to the left or the right of the current action; the selection index is L_p(i, j), computed from k, the number of parameter transform combinations, the n-th largest Q value among the k actions adjacent to the current action a_{i,j} on the i-th path, and the path weight coefficient λ_1.
A random number ε_1 in [0, 1] is obtained and used to determine the search path, where rand(1, 2) denotes a random draw over the interval [1, 2].
Determining the action means selecting one of the k actions adjacent to the current action on the search path: a random number ε_2 in [0, 1] is obtained and the action is determined from it; ε_1 and ε_2 are both random numbers that preserve the exploratory behaviour of reinforcement learning.
(5) A Q value function update strategy based on the Q-Learning algorithm is designed; the update formula corresponding to the i-th parameter is:
Q_{r+1}(i, j) = Q_r(i, j) + α · ( r + (1 − λ_2) · max(L_p(i, j)) + λ_2 · min(L_p(i, j)) − Q_r(i, j) )
where α is a hyper-parameter controlling the learning rate, r is the single-step reward, and λ_2 is a hyper-parameter controlling the update amplitude: the smaller its value, the larger the Q table update amplitude.
S3, a Modelica model parameter identification algorithm based on the reinforcement learning framework, used in the parameter coarse-tuning stage, specifically comprises the following steps:
(1) All parameters to be identified are randomly initialized and substituted into the FMU for calculation; the mean square error against the measured data is obtained as the initial value of the optimal objective function and recorded;
(2) A training period is entered, in which several agents perform serial learning, i.e. each parameter is optimized in turn; the learning process of a single agent is called a "round";
(3) A round is entered: an action a_rand(i) is randomly selected within the parameter's search range, the other parameters are fixed, and the parameters are applied to the FMU to obtain the objective function value F(X_cur) of the current round and the current reward value r; the Q value table of the current agent is updated according to the update strategy of the Q value function. If F(X_cur) ≤ F(X_best), the current parameter is considered to have search value and the process moves to (4); otherwise the next round begins;
(4) A search period is entered, in which each iteration selects an action a_iter(i) according to the action selection strategy, fixes the other parameters, and applies the parameters to the FMU to obtain the objective function value F(X_cur) of the current round and the current reward value r, updating the Q value table of the current agent; the search period comprises a preset number of iterations;
(5) After the search period of one agent ends, the next agent enters its round, until all agents have gone through a round and one training period is finished; if the number of completed periods is smaller than the given number of training periods, the next training period begins at step (2); otherwise training ends.
S4, a parameter fine adjustment scheme based on a heuristic optimization algorithm PSO comprises the following steps:
(1) The fine-tuning range of each parameter to be identified is determined from the result of the reinforcement learning run: taking into account the recognition error of reinforcement learning, the optimal subinterval obtained for the i-th parameter (i = 1, 2, …, N) is enlarged by a factor μ around x_i*, the optimal value of the i-th parameter after reinforcement learning identification; the value of μ is set according to the search capability of the particle swarm algorithm;
(2) An N-dimensional space and a particle swarm are initialized, creating for each particle a velocity vector v_i, a position vector x_i and a historical optimal position vector p_i, together with the historical optimal position vector p_g of the whole swarm;
(3) A fitness function F(X) is established from the mean square error between the model prediction and the measured value; the model prediction is obtained by substituting the current particle position into the FMU for simulation solution. The search target of the whole particle swarm is to minimize F(X);
(4) The iterative search starts; in each round of iteration the fitness value of every particle is calculated, the p_i value of each particle and the p_g value of the population are updated, and the velocity and position of each particle are updated according to
v_i ← ω·v_i + c_1·r_1·(p_i − x_i) + c_2·r_2·(p_g − x_i),  x_i ← x_i + v_i
wherein ω is the inertia weight (default 0.6), c_1 and c_2 are learning factors with c_1 = c_2 = 2, and r_1 and r_2 are random numbers in [0, 1];
after the maximum number of iterations is reached, the historical optimal position vector p_g of the particle swarm is the final result of the parameter fine-tuning and of the whole parameter identification process.
The embodiment of the invention is described in detail by taking the Modelica-based model of a power transmission system (a single-shaft servo feed system of a numerical control machine tool) shown in FIG. 2 as an example; the system uses a sinusoidal signal to drive a gear reducer that drives a load, with two inertia components and a spring and damper affecting the whole system in between. The input of the whole system model is a sine signal, the output is the absolute angular velocity of the component inertia3, and the parameters to be identified and their auxiliary information are shown in Table 1.
TABLE 1
It should be explained that this embodiment illustrates the technical solution of the invention and demonstrates its effectiveness and feasibility, but is not an actual engineering case. Therefore, instead of measured data from a physical machine, the output of a standard model is used as the sample set; all parameters to be identified are then randomly initialized, and the technical solution of the invention is applied so that the parameters to be identified approach the standard parameter values as closely as possible, completing the identification. The complete identification process comprises the following steps:
1. The FMU (Functional Mock-up Unit) of the power transmission system model is exported, and simulation parameters such as step size, start time, end time and solver are set;
2. The parameters to be identified in the FMU are set to their standard values, and the output obtained by simulation solution serves as the sample set. 1000 points are sampled on the output curve over a fixed period as the data sample set for this parameter identification; the subsequent reinforcement learning objective function and PSO fitness function are the mean square error between the ordinates of these 1000 points and the model predictions;
3. Parameter sensitivity analysis is performed on the Modelica model. As described above, the 4 parameters to be identified are the reducer load moment of inertia inertia2.J, the spring stiffness coefficient spring.c, the damper damping coefficient damping, and the motor output shaft moment of inertia Jmotor. The sensitivity analysis steps are as follows:
(1) Setting the samples: the dimension is 4 (i.e. 4 input variables) and the size is 10000 (i.e. 10000 input samples); a 10000×8 Sobol matrix is generated from the range of each parameter and processed according to the Sobol method to obtain the matrices A, B and AB_i (i = 1, 2, …, 4);
(2) The sample matrices are input into the FMU for simulation solution; the outputs are sorted according to the Sobol method into the Y-value matrices Y_A, Y_B and Y_AB, denoted f(A), f(B) and f(AB_i) respectively;
(3) The global impact index S_Ti of each parameter is found according to the formula given in step S2; the results are shown in FIG. 3, with the sensitivity ordering Jmotor > spring.c > damping > inertia2.J;
(4) From the sensitivity ordering obtained in step (3), the learning order is determined as Jmotor, spring.c, damping, inertia2.J; the G_i corresponding to each parameter is determined, and the number of grid cells each parameter's range is split into is determined as 200/1000/100/30 respectively.
4. Coarse parameter tuning is performed on the Modelica model. The Q-Learning-based reinforcement learning requires a search range for each of the 4 parameters to be identified; these ranges, obtained from reference manuals or experience, are also shown in Table 1. The algorithm parameters are set to learning rate α = 0.7, path weight coefficient λ_1 = 0.5, update-amplitude hyper-parameter λ_2 = 0.25, and number of parameter transform combinations k = 4. The parameters are then preliminarily identified by the following steps:
(1) All parameters to be identified are randomly initialized and substituted into the Modelica power transmission system model for calculation; the mean square error against the data sample set is obtained as the initial value of the optimal objective function and recorded;
(2) A training period is entered, in which several agents perform serial learning, i.e. each parameter is optimized in turn; the learning process of a single agent is called a "round";
(3) A round is entered: an action a_rand(i) is randomly selected within the parameter's search range, the other parameters are fixed, and the parameters are applied to the FMU to obtain the objective function value F(X_cur) of the current round; the single-step reward r is obtained from the formula r = min(1, max(0, (F(X_mean) − F(X_best)) / (F(X_mean) − F(X_cur)))), and the Q value table of the current agent is updated using the formula Q_{r+1}(i, j) = Q_r(i, j) + α·( r + (1 − λ_2)·max(L_p(i, j)) + λ_2·min(L_p(i, j)) − Q_r(i, j) ). If F(X_cur) ≤ F(X_best), the current parameter is considered to have search value and the process moves to (4); otherwise the next round begins;
(4) A search period is entered, in which each iteration selects an action a_iter(i) according to the action selection strategy, fixes the other parameters, and applies the parameters to the FMU to obtain the objective function value F(X_cur) of the current round and the current reward value r, updating the Q value table of the current agent; the search period comprises a preset number of iterations;
(5) After the search period of one agent ends, the next agent enters its round, until all agents have gone through a round and one training period is finished; if the number of completed periods is smaller than the given number of training periods, the next training period begins at step (2); otherwise training ends.
The preliminary identification result obtained at this stage is the optimal subinterval of each parameter, as shown in Table 2; on this basis, an accurate solution must still be determined;
TABLE 2
5. Parameter fine-tuning is performed on the Modelica model. Step 4 yields the optimal subinterval in which each parameter to be identified lies; the PSO algorithm must then be used within that interval to determine the final identification result. The specific steps are:
(1) Starting from the optimal subinterval obtained in step 4 and taking the recognition error of reinforcement learning into account, the fine-tuning range is set by enlarging that subinterval; the value of μ is set to 2 based on the search capability of the particle swarm algorithm;
(2) A 4-dimensional space and a particle swarm are initialized, creating for each particle a velocity vector v_i, a position vector x_i and a historical optimal position vector p_i, together with the historical optimal position vector p_g of the whole swarm;
(3) A fitness function F(X) is established from the mean square error between the model prediction and the data sample set; the model prediction is obtained by substituting the current particle position into the FMU for simulation solution. The search target of the whole particle swarm is to minimize F(X);
(4) The iterative search starts; in each round of iteration the fitness value of every particle is calculated, the p_i value of each particle and the p_g value of the population are updated, and the velocity and position of each particle are updated according to
v_i ← ω·v_i + c_1·r_1·(p_i − x_i) + c_2·r_2·(p_g − x_i),  x_i ← x_i + v_i
wherein ω is the inertia weight (default 0.6), c_1 and c_2 are learning factors with c_1 = c_2 = 2, and r_1 and r_2 are random numbers in [0, 1];
(5) After the maximum number of iterations is reached, the historical optimal position vector p_g of the particle swarm is the final result of the parameter fine-tuning and of the whole parameter identification process.
The comparison between the final identification results and the standard parameter values is shown in Table 3; the error between each identification result and its standard value is below 5%, meeting engineering use requirements.
TABLE 3 Table 3
A Modelica-based power transmission model was established for the numerical control machine tool feed system and its parameters identified by the method of the invention; under the same input conditions, the physical prototype output and the model simulation result are shown in FIG. 4. The variation trend and values of the model simulation curve are basically consistent with the output curve of the physical prototype, illustrating the correctness and effectiveness of the parameter identification method.
It will be readily appreciated by those skilled in the art that the foregoing description is merely a preferred embodiment of the invention and is not intended to limit the invention, but any modifications, equivalents, improvements or alternatives falling within the spirit and principles of the invention are intended to be included within the scope of the invention.

Claims (6)

1. The method for identifying the model parameters of the power transmission system based on reinforcement learning is characterized by comprising the following steps of:
S1, constructing a dynamic model of a power transmission system based on a multi-field unified modeling language Modelica;
S2, performing sensitivity analysis on parameters to be identified of the model;
S3, coarse adjustment is carried out on parameters to be identified based on a reinforcement learning algorithm:
constructing a reinforcement learning framework for model parameter identification of a Modelica power transmission system;
performing iterative training by using a reinforcement learning framework to obtain an optimal subinterval of each parameter to be identified;
the reinforcement learning framework construction process specifically includes:
(1) Taking the mean square error between the model estimate Y_est and the measured value Y_mea as the reinforcement learning objective function F(X);
(2) Constructing a single-step reward:
r = min(1, max(0, (F(X_mean) − F(X_best)) / (F(X_mean) − F(X_cur))))
wherein r represents the single-step reward value, F(X_cur) represents the objective function value under the current parameters, F(X_best) the objective function value under the optimal parameters, and F(X_mean) the objective function value under the parameter mean;
(3) Setting an action according to the parameter minimum change amount G_i (i = 1, 2, …, N) and the range of each parameter:
The search range of the i-th parameter is split into (x_i,max − x_i,min)/G_i subintervals; one subinterval is selected, and a value randomly drawn within it is the action; wherein x_i,max is the maximum and x_i,min the minimum value of the i-th parameter; the minimum change amount G_i is the amount by which the i-th parameter is increased or decreased at each step of the identification process;
(4) Constructing an action selection strategy:
01. Selecting a search path:
It is determined whether the action selected in the next round lies to the left or the right of the current action; the selection index is L_p(i, j), computed from k, the number of parameter transform combinations, the n-th largest Q value among the k actions adjacent to the current action a_{i,j} on the i-th path, and a path weight coefficient λ_1;
a random number ε_1 in [0, 1] is obtained and used to determine the search path l, where rand(1, 2) denotes a random draw over the interval [1, 2];
02. Determining the action:
a random number ε_2 in [0, 1] is obtained and the action a is determined from it; Q(i, m) represents the Q value of the i-th parameter to be identified, and ε_1 and ε_2 are random numbers that preserve the exploratory behaviour of reinforcement learning;
(5) Constructing the update strategy of the Q value function:
The Q value function update formula corresponding to the i-th parameter is:
Q_{r+1}(i, j) = Q_r(i, j) + α · ( r + (1 − λ_2) · max(L_p(i, j)) + λ_2 · min(L_p(i, j)) − Q_r(i, j) )
wherein α is a hyper-parameter controlling the learning rate, r is the single-step reward, and λ_2 is a hyper-parameter controlling the update amplitude;
S4, parameter fine-tuning:
Taking the mean square error between the measured data and the model estimate as the objective function, iteratively optimizing in the solution space formed by the model parameters to be identified, and taking the parameter values corresponding to the minimum objective function value as the final identification result.
2. The method for identifying parameters of a power transmission system model based on reinforcement learning according to claim 1, wherein step S2 specifically uses the Sobol method to analyze the sensitivity of the power transmission system model with undetermined parameters, specifically comprising the following steps:
01. Monte Carlo sampling is carried out within the possible value interval of each of the N parameters to be identified, generating an initial sample matrix A, an initial sample matrix B and cross sample matrices AB_i, where i = {1, 2, …, N};
02. The sample matrices A, B and AB_i are taken as input for simulation of the power transmission system model, yielding the model simulation result vectors f(A), f(B) and f(AB_i) for the initial sample matrix A, the initial sample matrix B and the cross sample matrices AB_i respectively;
03. The global impact index S_Ti of each parameter is found from the simulation results; with m the number of samples per matrix, the standard total-effect estimator reads
S_Ti = (1/(2m)) · Σ_{j=1}^{m} ( f(A)_j − f(AB_i)_j )² / Var(Y)
wherein Y represents the vector set formed by f(A), f(B) and f(AB_i), and Var(Y) represents the variance of the power transmission system model output;
04. sensitivity ordering is carried out on parameters to be identified according to the size of the parameter global influence index; the larger the impact factor, the more sensitive;
05. combining parameters to be identified with sensitivity lower than a set threshold value.
3. The method for identifying parameters of a power transmission system model based on reinforcement learning according to claim 1, wherein the parameter coarse-tuning stage uses a reinforcement learning framework based on the Q-Learning algorithm, the specific steps of the iterative training being:
(1) All parameters to be identified are randomly initialized and substituted into the power transmission system model for calculation; the mean square error against the measured data is taken as the initial value of the optimal objective function F(X);
(2) Several agents perform serial learning in descending order of the sensitivity of the parameters to be identified; the unit that adjusts a parameter to be identified according to the Q value table in the reinforcement learning framework is called an agent, and each parameter to be identified corresponds uniquely to one agent; the process by which an agent adjusts its parameter is called the agent's learning behaviour;
The learning process includes: an action a_rand(i) is randomly selected within the possible value interval of the current agent's parameter; the other parameters are fixed and the parameters are applied to the power transmission system model to obtain the objective function value F(X_cur) under the current parameters and the single-step reward value r, and the Q value table of the current agent is updated according to the update strategy of the Q value function; if F(X_cur) ≤ F(X_best), the current parameter is considered to have search value and the process moves to step (3); otherwise the learning process is repeated;
(3) An action a_iter(i) is selected according to the action selection strategy; the other parameters are fixed, the parameters are applied to the FMU to obtain the objective function value F(X_cur) and the reward value r, and the Q value table is updated;
(4) Steps (2) and (3) are performed iteratively for the i-th parameter a preset number of times, completing one identification pass of the i-th parameter, after which the process moves to identification of the (i+1)-th parameter;
(5) After steps (2)–(4) have been executed serially for all parameters to be identified, one training period is complete; if the number of completed periods is smaller than the given number of training periods, the process moves to step (2) for the next training period; otherwise training ends.
4. The method for identifying parameters of a power transmission system model based on reinforcement learning according to claim 1, wherein step S4 specifically uses a PSO optimization algorithm to fine-tune the parameters to obtain the final identification result.
5. The method for identifying parameters of a power transmission system model based on reinforcement learning according to claim 1, wherein the parameters are fine-tuned using the PSO optimization algorithm, the specific steps being:
The fine-tuning range of each parameter is set by enlarging, by a factor μ, the optimal subinterval obtained by reinforcement learning around x_i*, the optimal value of the i-th parameter after reinforcement learning identification; the value of μ is set according to the search capability of the particle swarm algorithm;
An N-dimensional space and a particle swarm are initialized, creating for each particle a velocity vector v_i, a position vector x_i and a historical optimal position vector p_i, together with the historical optimal position vector p_g of the whole swarm;
A fitness function G(X) is established from the mean square error between the model prediction and the measured value; the model prediction is obtained by substituting the current particle position into the FMU for simulation solution; the search target of the whole particle swarm is to minimize G(X);
The iterative search starts: in one iteration, the fitness value of each particle is calculated, the p_i value of each particle and the p_g value of the population are updated, and the velocity and position vector of each particle are updated according to
v_i ← ω·v_i + c_1·r_1·(p_i − x_i) + c_2·r_2·(p_g − x_i),  x_i ← x_i + v_i
wherein ω is the inertia weight, c_1 and c_2 are learning factors, and r_1 and r_2 are random numbers in [0, 1];
After the maximum number of iterations is reached, the historical optimal position vector p_g of the particle swarm is the final result of the parameter identification.
6. A reinforcement learning-based power transmission system model parameter identification system, comprising: a computer readable storage medium and a processor;
The computer-readable storage medium is for storing executable instructions;
The processor is configured to read executable instructions stored in the computer-readable storage medium and execute the reinforcement learning-based power train model parameter identification method of any one of claims 1 to 5.
CN202110002104.6A 2021-01-04 2021-01-04 Power transmission system model parameter identification method based on reinforcement learning Active CN112632860B (en)

Priority Applications (1)

Application: CN202110002104.6A — priority and filing date 2021-01-04 — Power transmission system model parameter identification method based on reinforcement learning.

Publications (2)

CN112632860A, published 2021-04-09; CN112632860B, granted 2024-06-04.

Family ID: 75290870; Country: China (CN).

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113406434B (en) * 2021-05-14 2022-05-31 杭州电子科技大学 SVG dynamic parameter segmentation optimization identification method based on parameter fault characteristics
CN113836788B (en) * 2021-08-24 2023-10-27 浙江大学 Acceleration method for flow industrial reinforcement learning control based on local data enhancement
DE102022104313A1 (en) 2022-02-23 2023-08-24 Dr. Ing. H.C. F. Porsche Aktiengesellschaft Method, system and computer program product for autonomously calibrating an electric powertrain
CN114676572B (en) * 2022-03-25 2023-02-17 中国航空发动机研究院 Parameter determination method and device and computer readable storage medium
CN114860388B (en) * 2022-07-07 2022-09-20 中国汽车技术研究中心有限公司 Combined simulation method for converting FMU model into Modelica model
CN117312808B (en) * 2023-11-30 2024-02-06 山东省科学院海洋仪器仪表研究所 Calculation method for sea surface aerodynamic roughness
CN118096085B (en) * 2024-04-24 2024-07-19 山东冠县鑫恒祥面业有限公司 Flour production line equipment operation and maintenance management method based on Internet of things


Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11321504B2 (en) * 2018-05-09 2022-05-03 Palo Alto Research Center Incorporated Learning constitutive equations of physical components with constraints discovery

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107341294A (en) * 2017-06-15 2017-11-10 苏州同元软控信息技术有限公司 Spacecraft Information System Modeling emulation mode based on Modelica language
CN109522602A (en) * 2018-10-18 2019-03-26 北京航空航天大学 A kind of Modelica Model Parameter Optimization method based on agent model

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Wu Yizhong; Jiang Zhansi; Chen Liping: "Research on multi-domain model simulation optimization based on the Modelica language", Journal of System Simulation, 2009-06-20 (12), full text *



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant