CN117096984A - Battery pack balanced sensing quick charge control method and system based on reinforcement learning - Google Patents


Info

Publication number
CN117096984A
Authority
CN
China
Prior art keywords: battery, battery pack, reinforcement learning, equalization, state
Legal status
Pending
Application number
CN202311113458.3A
Other languages
Chinese (zh)
Inventor
魏婧雯
杨易昆
贺嘉瑞
陈春林
王志
Current Assignee
Nanjing University
Original Assignee
Nanjing University
Priority date
Filing date
Publication date
Application filed by Nanjing University
Priority to CN202311113458.3A
Publication of CN117096984A
Legal status: Pending

Classifications

    • H02J7/0014 Circuits for equalisation of charge between batteries
    • G06F30/27 Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
    • H01M10/425 Structural combination with electronic components, e.g. electronic circuits integrated to the outside of the casing
    • H01M10/441 Methods for charging or discharging for several batteries or cells simultaneously or sequentially
    • H02J7/0048 Detection of remaining charge capacity or state of charge [SOC]
    • H02J7/0071 Regulation of charging or discharging current or voltage with a programmable schedule
    • H02J7/00712 Regulation of charging or discharging current or voltage, the cycle being controlled or terminated in response to electric parameters
    • H02J7/007182 Regulation of charging or discharging current or voltage, the cycle being controlled or terminated in response to battery voltage
    • G06F2111/06 Multi-objective optimisation, e.g. Pareto optimisation using simulated annealing [SA], ant colony algorithms or genetic algorithms [GA]
    • H01M2010/4271 Battery management systems including electronic circuits, e.g. control of current or voltage to keep battery in healthy state, cell balancing
    • H01M2010/4278 Systems for data transfer from batteries, e.g. transfer of battery parameters to a controller, data transferred between battery controller and main controller


Abstract

The invention relates to the technical field of power systems, and in particular to a reinforcement-learning-based equalization-aware fast-charging control method and system for a battery pack. The method comprises: establishing a battery cell model and an energy-equalization topology model of the battery pack, and combining them into an equalization-aware fast-charging mathematical model of the pack; formulating an equalization-aware fast-charging optimization problem from this mathematical model, whose objective is to minimize charging time and cell inconsistency, and setting up a task environment accordingly; setting the hyperparameters of a reinforcement learning algorithm, letting the algorithm interact with the equalization-aware fast-charging mathematical model, and collecting training data to complete the algorithm's training; and transferring the trained reinforcement learning algorithm to a real battery pack to perform equalization-aware fast-charging control of the pack. The invention realizes an adaptive charge-equalization strategy and ensures that the battery pack holds more energy when charging ends.

Description

Battery pack balanced sensing quick charge control method and system based on reinforcement learning
Technical Field
The invention relates to the technical field of power systems, and in particular to a reinforcement-learning-based equalization-aware fast-charging control method and system for battery packs.
Background
With the rapid development of electric vehicles, battery management systems have become increasingly sophisticated and complex. Shortening the charging time as much as possible without damaging the battery is of great significance for the large-scale adoption of electric vehicles. Charging time can be reduced by applying larger currents; however, excessive current risks violating critical physical safety constraints and causing premature aging. Moreover, practical energy storage systems are composed of hundreds or thousands of battery cells to meet high voltage and capacity requirements. Cell-to-cell variations caused by unavoidable inconsistencies in manufacture and use significantly affect the charging efficiency, capacity, and life of the battery pack, and the high currents of fast charging further amplify these inconsistency effects. Developing a fast-charging control strategy with equalization awareness is therefore critical to shortening pack charging time while achieving energy balance among the cells in the pack.
To achieve optimal charge management and address cell inconsistency within a battery pack, a large number of techniques and methods have been proposed; they can be reviewed as model-free charging control strategies, model-based charging control strategies, and charge-equalization strategies.
The performance of model-free methods, such as the constant-current constant-voltage (CC-CV) charging strategy, is determined by a few key thresholds. Such predefined charge current and voltage profiles, however, lack a deep understanding of the battery's physical and chemical characteristics, and the charge-rate and voltage limits are chosen from empirical knowledge of battery behavior, so their settings are often conservative. Furthermore, the equalization algorithm and the charging strategy are designed separately, which may yield a sub-optimal combined charging and equalization scheme and can even cause safety problems.
Model-based strategies first capture battery dynamics and estimate the state of charge using various battery models. Specific optimization objectives and constraints are then formulated around indexes such as charging speed, aging, and heat. Finally, an optimization algorithm searches for the optimal solution to guide the battery charging behavior. According to the type of battery model, these methods can be broadly classified into strategies based on reduced-order electrochemical models and strategies based on equivalent circuit models. Although electrochemical models offer deep insight into the electrochemical mechanism of charging, and great effort has been devoted to developing low-order models, such charging methods remain far from real-time battery management systems because of the high complexity of parameter identification and state estimation. In contrast, equivalent circuit models are widely used in industrial battery management systems because of their relative simplicity and ease of parameterization. These works, however, were developed for a single cell. Notably, an improper charging strategy may increase battery inconsistency, which these works do not consider; a charging strategy that accounts for battery equalization is therefore necessary.
Charge-equalization strategies employ an external circuit to redistribute energy among the cells. The performance of an equalization system depends on the battery system, the equalization circuit topology, and the equalization algorithm. Although existing equalization strategies can reach an equilibrium state, fast charging and the equalization system are controlled separately; how to adjust the charging strategy according to battery inconsistency, and how to control the equalization system according to the charging strategy, have not been fully considered.
Disclosure of Invention
Aiming at the problems of high model complexity and the neglected coupling between fast charging and equalization, the invention exploits the model-free and online-adaptive characteristics of deep reinforcement learning to provide an equalization-aware fast-charging method for lithium-ion battery packs. The method realizes an adaptive charge-equalization strategy under scenarios of either large or small cell inconsistency, and ensures that the battery pack holds more energy when charging ends.
In order to solve the technical problems, the technical scheme of the invention is as follows:
In a first aspect, a reinforcement-learning-based equalization-aware fast-charging control method for a battery pack is provided, comprising:
Step S100: establishing a battery cell model and an energy-equalization topology model of the battery pack, and combining them into an equalization-aware fast-charging mathematical model of the pack;
Step S200: formulating a battery pack equalization-aware fast-charging optimization problem from the mathematical model, and setting up a task environment according to the optimization problem, whose objective is to minimize charging time and cell inconsistency;
Step S300: setting the hyperparameters of a reinforcement learning algorithm, letting the algorithm interact with the equalization-aware fast-charging mathematical model, and collecting training data to complete the algorithm's training;
Step S400: transferring the trained reinforcement learning algorithm to a real battery pack and performing equalization-aware fast-charging control of the pack.
Preferably, step S100 includes:
Step S110: establishing a battery cell model using a first-order Thevenin equivalent circuit model and determining its parameters, specifically: the resistances and capacitance of the model, the cell capacity, the open-circuit-voltage versus state-of-charge (OCV-SOC) relationship, and the coulombic efficiency;
Step S120: establishing an energy-equalization topology model of the battery pack in the cell-to-pack equalization mode, specifically: determining the equalization topology, the number of cells, the switching frequency, the PWM duty cycles, and the equalization current;
Step S130: based on Kirchhoff's current law, combining the external charging current with the equalization currents inside the pack to convert the cell model and the pack energy-equalization topology model into the equalization-aware fast-charging mathematical model of the pack.
Preferably, in step S110, the dynamics of the battery cell model are expressed as:

U_1(k+1) = α_1 U_1(k) + (1 − α_1) R_1 I_L(k)
U_t(k) = U_oc(k) − U_1(k) − I_L(k) R_0

where U_1 is the voltage across the parallel RC network, α_1 = exp(−Δt/(R_1 C_1)), Δt is the sampling time, k denotes the k-th sampling period, I_L is the load current, U_t is the terminal voltage, and U_oc is the open-circuit voltage, a function of the battery state of charge SOC.

The state of charge SOC of the battery is updated by coulomb counting:

SOC(k+1) = SOC(k) − η_0 Δt I_L(k) / (3600 C_bat)

where η_0 ∈ (0, 1) is the coulombic efficiency and C_bat is the battery capacity.

The battery capacity C_bat and coulombic efficiency η_0 are measured by capacity test experiments, the OCV-SOC relationship is measured by a low-current OCV test, dynamic data are obtained from DST, FUDS, and UDDS operating-condition tests, and parameter identification with a particle swarm algorithm determines the ohmic internal resistance R_0, the polarization internal resistance R_1, and the polarization capacitance C_1 of the cell.
Preferably, in step S130, according to Kirchhoff's current law and combining the external charging current with the equalization currents in the battery pack, the equalization-aware fast-charging mathematical model is established as:

x(k+1) = A x(k) + B (1_{2n} u_1(k) + C [u_2(k); u_2(k)])
y(k) = U_oc(x_1(k)) − x_2(k) − D (1_n u_1(k) + C u_2(k))

where 1_n denotes an n×1 vector of ones, n is the number of cells in the whole pack, and the subscript i denotes a parameter of the i-th cell; x_1 ∈ R^n and x_2 ∈ R^n are the SOCs of the n cells and the voltages across their RC networks, respectively; x = [x_1; x_2] ∈ R^{2n} is the overall state variable; u_1 ∈ R is the bus current; y ∈ R^n collects the terminal voltages of the n cells; and u_2 is a Boolean vector, u_{2,i} ∈ {0, 1}, i = 1, 2, …, n, representing the states of the n equalization switches. The system matrices are block-diagonal:

A = diag(I_n, α),  B = diag(B_1, B_2),  D = diag(R_{0,1}, …, R_{0,n})

where I_n and 0_n are the n×n identity and zero matrices, B_1 = −η_0 diag(1/3600/C_bat,1, …, 1/3600/C_bat,n), α = diag(α_{1,1}, …, α_{1,n}), and B_2 = diag((1 − α_{1,1}) R_{1,1}, …, (1 − α_{1,n}) R_{1,n}); the matrix C maps the equalization-switch states to the equalization current of each cell and is determined by β, the energy transfer efficiency between the primary and secondary sides of the transformer, and I_cp, the current magnitude on the primary side of the transformer.
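A minimal numerical sketch of the state-space model above, for n = 4 identical cells. The mapping from switch states to per-cell equalization currents (the role of C) is an assumption modeled after the cell-to-pack topology: a closed switch drains I_cp from its cell while β·I_cp is returned evenly to all cells through the transformer secondary.

```python
import numpy as np

n, dt, eta0 = 4, 1.0, 0.99
r1 = np.full(n, 0.02); c1 = np.full(n, 2000.0); c_bat = np.full(n, 2.5)
alpha1 = np.exp(-dt / (r1 * c1))

# block-diagonal system matrices A = diag(I_n, alpha), B = diag(B1, B2)
A = np.block([[np.eye(n), np.zeros((n, n))],
              [np.zeros((n, n)), np.diag(alpha1)]])
B1 = -eta0 * np.diag(1.0 / (3600.0 * c_bat))
B2 = np.diag((1.0 - alpha1) * r1)
B = np.block([[B1, np.zeros((n, n))],
              [np.zeros((n, n)), B2]])

def step(x, u1, u2, beta=0.9, i_cp=7.5):
    """x = [SOCs; RC voltages]; u1 = bus current (< 0 charging);
    u2 = Boolean equalization-switch states."""
    u2 = np.asarray(u2, dtype=float)
    # assumed cell-to-pack redistribution: drain i_cp from cells with a
    # closed switch, return beta * i_cp spread evenly over all n cells
    i_eq = i_cp * u2 - beta * i_cp * u2.sum() / n
    i_cell = u1 + i_eq
    return A @ x + B @ np.concatenate([i_cell, i_cell])
```

With β = 1 (lossless transfer) the total pack SOC is conserved by equalization alone, while charge moves out of the selected cell into the others.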
Preferably, step S200 includes:
Step S210: establishing the battery pack equalization-aware fast-charging optimization problem, which comprises an optimization objective and constraints; the objective is to minimize the charging time and the battery inconsistency, and the constraints cover the total current through each battery, the battery terminal voltages, and the battery charge levels, ensuring that the batteries stay within safety limits throughout charge equalization;
Step S220: setting the environment-state observation of the task environment and the action that controls the battery pack, wherein:
the environment-state observation comprises the state of charge SOC, the terminal voltage, and the ambient temperature of every cell in the pack;
the action comprises the magnitude of the external charging current and the equalization-switch states of all batteries;
Step S230: determining the reward function from the optimization objective and constraints.
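The observation and action of step S220 might be encoded as below. The discrete charging-current levels are purely illustrative assumptions, since the patent specifies only the two components of the action (external charging current and the n switch states).

```python
import numpy as np
from itertools import product

def make_observation(soc, v_term, temp_c):
    """State observation: per-cell SOC and terminal voltage, plus temperature."""
    return np.concatenate([np.asarray(soc), np.asarray(v_term), [temp_c]])

# hypothetical discretization of the external charging current (amperes)
CURRENT_LEVELS = (0.0, 2.5, 5.0, 7.5)

def action_space(n_cells):
    """Enumerate (charging current, equalization-switch states) pairs."""
    return [(i, sw) for i in CURRENT_LEVELS
            for sw in product((0, 1), repeat=n_cells)]
```

For 4 cells this yields 4 × 2⁴ = 64 discrete actions, a size a value-based reinforcement learning algorithm can enumerate directly.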
Preferably, in step S210, the optimization objective is:

J = J_t + J_e

where J is the overall objective to be minimized; J_t is the time taken from the initial state of charge SOC to the end of charging, and J_e is the battery inconsistency, quantified by the Euclidean distance between the cell states of charge and their average:

J_e = Σ_{k=1}^{N} ‖x_1(k) − x̄_1(k) 1_n‖_2,  x̄_1(k) = (1/n) Σ_{i=1}^{n} x_{1,i}(k)

where N is the total number of periods from the initial state to the end of charging and x_1(k) is the state variable x_1 at the k-th period.

The constraints cover the total current through each battery, the terminal voltages, and the battery charge levels, ensuring that the batteries stay within safety limits throughout charge equalization:

|1_n u_1(k) + C u_2(k)| ≤ 1_n I_max
1_n U_min ≤ y(k) ≤ 1_n U_max
1_n · 0 ≤ x_1(k) ≤ 1_n · 100%

where I_max is the maximum charging current, and U_min and U_max are the minimum and maximum battery terminal voltages, respectively.
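The inconsistency term J_e and the safety constraints can be evaluated as below; the numeric limits are illustrative stand-ins for I_max, U_min, and U_max, not values from the patent.

```python
import numpy as np

def inconsistency(soc_hist):
    """J_e: Euclidean distance between cell SOCs and their mean,
    summed over the N control periods (soc_hist has shape (N, n))."""
    soc_hist = np.asarray(soc_hist)
    dev = soc_hist - soc_hist.mean(axis=1, keepdims=True)
    return float(np.linalg.norm(dev, axis=1).sum())

def within_limits(i_cell, v_term, soc, i_max=10.0, v_min=2.0, v_max=3.65):
    """Current, terminal-voltage and charge-level safety constraints."""
    v_term, soc = np.asarray(v_term), np.asarray(soc)
    return bool(np.all(np.abs(i_cell) <= i_max)
                and np.all((v_min <= v_term) & (v_term <= v_max))
                and np.all((0.0 <= soc) & (soc <= 1.0)))
```

A perfectly balanced trajectory gives J_e = 0; any SOC spread across cells accumulates positive cost at every period.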
Preferably, in step S230, the total reward for the k-th period is:

r_k = r_time + r_bal + r_vol + r_soc

where r_time represents the cost of the time spent, r_bal measures the inconsistency between the cells, and the rewards r_vol and r_soc associated with the terminal-voltage and charge-level constraints are sums of per-cell terms:

r_vol = Σ_{j=1}^{n} δ_{v,j},  r_soc = Σ_{j=1}^{n} δ_{s,j}

where the intermediate variables δ_{v,j} and δ_{s,j} are the penalties incurred by each cell, determined by whether the terminal voltage y_{j,k} or state of charge SOC_{j,k} of the j-th battery at time k violates the corresponding limit.
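A sketch of the composite reward; the weights and the fixed per-cell penalty used for δ_{v,j} and δ_{s,j} are assumptions, since only the structure r_k = r_time + r_bal + r_vol + r_soc is fixed above.

```python
import numpy as np

def reward(dt, soc, v_term, v_min=2.0, v_max=3.65,
           w_time=1.0, w_bal=1.0, penalty=10.0):
    """r_k = r_time + r_bal + r_vol + r_soc (weights are assumed)."""
    soc, v_term = np.asarray(soc), np.asarray(v_term)
    r_time = -w_time * dt / 3600.0                     # time cost per step
    r_bal = -w_bal * np.linalg.norm(soc - soc.mean())  # inconsistency term
    # per-cell constraint penalties delta_{v,j}, delta_{s,j} (assumed fixed)
    r_vol = -penalty * np.sum((v_term < v_min) | (v_term > v_max))
    r_soc = -penalty * np.sum((soc < 0.0) | (soc > 1.0))
    return float(r_time + r_bal + r_vol + r_soc)
```

A balanced pack inside its limits pays only the small time cost each step, so the agent is driven to finish quickly while keeping cells consistent and safe.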
Preferably, step S300 includes:
Step S310: setting the basic hyperparameters of the reinforcement learning algorithm, at least including: the size N_r of the experience replay pool; the batch size N_b used for each update; the number of neural-network layers, the nodes per layer, and the activation function; the discount factor γ; the optimizer for updating the parameters of the policy network Q_φ(s, a); the update frequency N⁻ of the target network Q_{φ⁻}(s, a); and the learning rate η_Q;
Step S320: initializing the states of charge SOC of the cells in the pack and setting the target conditions for ending the charging and equalization process;
Step S330: the reinforcement learning algorithm outputs an action a from the current state observation s according to the current policy; the action interacts with the environment, controlling the state of the battery pack to obtain the next state s' and the reward r; the tuple (s, a, r, s') is stored in the experience replay pool as one set of training data for subsequent training of the reinforcement learning algorithm;
Step S340: randomly sampling a batch of data (S, A, R, S') from the experience replay pool, with (s_j, a_j, r_j, s_{j+1}) denoting a single training sample, and computing the loss function used to update the value-network parameters:

L(φ) = (1/N_b) Σ_j (y_j − Q_φ(s_j, a_j))²

where y_j is the predicted return obtainable from state s_j:

y_j = r_j + γ max_{a'} Q_{φ⁻}(s_{j+1}, a')

in which max_{a'} Q_{φ⁻}(s_{j+1}, a') is, evaluated under the current target network Q_{φ⁻}, the maximum reward achievable from state s_{j+1} up to the end of the task, i.e., a prediction over future states; the value-network parameters are then updated by gradient descent:

φ ← φ − η_Q ∇_φ L(φ)

where η_Q is the learning rate for updating the value network Q_φ(s, a) and ∇_φ L(φ) is the gradient of the value network;
Step S350: every N⁻ updates, updating the parameters of the target network:

φ⁻ ← φ

Step S360: repeating steps S320 to S340 until the algorithm fully converges, obtaining the optimal policy network with which the battery pack reaches the charging and equalization targets from the current state.
In a second aspect, a reinforcement-learning-based equalization-aware fast-charging control system for a battery pack is provided, comprising a memory and a processor; the memory stores a computer program that, when executed by the processor, implements the reinforcement-learning-based equalization-aware fast-charging control method described above.
In a third aspect, a computer-readable storage medium is provided, on which a computer program is stored; when executed by a processor, the program implements the reinforcement-learning-based equalization-aware fast-charging control method described above.
The invention has the following beneficial effects:
the invention utilizes the characteristics of deep reinforcement learning, no model and online adaptability, avoids the problem of high complexity of online model parameter identification, and can adaptively adjust self-charging balance strategy after comprehensively considering the quick charging and balance problems, so that the battery pack obtains more electric quantity after charging is finished.
Drawings
FIG. 1 is a flow chart of a control method of the present invention;
FIG. 2 is a schematic diagram of a battery cell model in a simulation environment according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of an equalization sensing fast-charging model according to an embodiment of the present invention;
FIG. 4 is a block diagram of an overall algorithm framework according to an embodiment of the present invention;
FIG. 5 is a graph of open circuit voltage OCV as a function of state of charge SOC for a battery used in an embodiment of the invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail with reference to the accompanying drawings and specific embodiments.
Referring to FIG. 1, the reinforcement-learning-based equalization-aware fast-charging control method for a battery pack of the invention comprises the following steps:
Step S100: establishing a battery cell model and an energy-equalization topology model of the battery pack, and combining them into an equalization-aware fast-charging mathematical model of the pack;
Step S200: formulating a battery pack equalization-aware fast-charging optimization problem from the mathematical model, and setting up a task environment according to the optimization problem, whose objective is to minimize charging time and cell inconsistency;
Step S300: setting the hyperparameters of a reinforcement learning algorithm, letting the algorithm interact with the equalization-aware fast-charging mathematical model, and collecting training data to complete the algorithm's training;
Step S400: transferring the trained reinforcement learning algorithm to a real battery pack and performing equalization-aware fast-charging control of the pack.
In this embodiment, as shown in FIG. 2 and FIG. 3, a first-order Thevenin equivalent circuit model and a cell-to-pack equalization topology are taken as examples, and the task scenario is designed so that the battery pack is charged to the target charge level in the shortest time without breaking the safety limits, while the charge difference between the cells stays below a given threshold. The algorithm framework of the reinforcement-learning-based equalization-aware fast-charging control of the lithium-ion battery pack in this embodiment is shown in FIG. 4.
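The embodiment's termination condition, target charge reached with the cell-to-cell difference below a threshold, can be checked as follows; the target SOC of 0.8 and the 2% threshold are illustrative values, not figures from the patent.

```python
import numpy as np

def charge_done(soc, target=0.8, spread_tol=0.02):
    """End the episode when every cell has reached the target charge and
    the SOC spread across cells is below the threshold (assumed values)."""
    soc = np.asarray(soc)
    return bool(np.all(soc >= target) and (soc.max() - soc.min()) < spread_tol)
```

This predicate is what step S320's "target conditions for ending the charging and balancing process" would evaluate at every control period.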
The steps in fig. 1 are specifically described below.
In step S100, a battery cell model and an energy-equalization topology model of the battery pack are established and converted into the equalization-aware fast-charging mathematical model of the pack, which specifically includes:
Step S110: establishing the battery cell model using a first-order Thevenin equivalent circuit model, shown in FIG. 2, and determining its parameters, specifically: the resistances and capacitance of the model, the cell capacity, the open-circuit-voltage versus state-of-charge (OCV-SOC) relationship, and the coulombic efficiency.
The dynamics of the cell model based on the first-order Thevenin equivalent circuit model are expressed as:

U_1(k+1) = α_1 U_1(k) + (1 − α_1) R_1 I_L(k)
U_t(k) = U_oc(k) − U_1(k) − I_L(k) R_0

where U_1 is the voltage across the parallel RC network, α_1 = exp(−Δt/(R_1 C_1)), Δt is the sampling time, k denotes the k-th sampling period, I_L is the load current, U_t is the terminal voltage, and U_oc is the open-circuit voltage, a function of the battery state of charge SOC.

The state of charge SOC of the battery is updated by coulomb counting:

SOC(k+1) = SOC(k) − η_0 Δt I_L(k) / (3600 C_bat)

where η_0 ∈ (0, 1) is the coulombic efficiency and C_bat is the battery capacity.
in the embodiment, four lithium iron phosphate batteries are adopted, and the battery capacity C is measured through capacity test experiments bat Coulombic efficiency eta 0 OCV-SOC relation is measured by a low-current OCV test, relevant dynamic data is obtained through DST, FUDS and UDDS working condition test experiments, parameter identification is carried out by a particle swarm algorithm, and ohmic internal resistance R of a battery monomer is determined 0 Internal resistance of polarization R 1 And polarization capacitor C 1 . Step S120: establishing an energy balance topology model of the battery pack in a battery balance mode of the battery cell to the battery pack, wherein the battery cell to the battery pack balance topology model is shown in fig. 3; the method specifically comprises the following steps: determining an equilibrium topological structure, and determining the number of battery monomers, switching frequency, PWM wave duty ratio and equilibrium current;
the model can be implemented to transfer the charge of a certain battery to the whole battery pack while taking the charging module into account. The present example determines that the number of cells in the battery pack is 4, the switching frequency is 124khz, the PWM wave duty cycles of the primary side and the secondary side of the transformer are 80% and 19%, respectively, and the average value of the equalizing current is 7.5A.
Step S130: based on kirchhoff current law, external charging current and equalization current in a battery pack are combined, a battery monomer model and a battery pack energy equalization topological model are converted into an equalization sensing fast-charge mathematical model of the battery pack, and the equalization sensing fast-charge mathematical model is in the following form:
x(k+1)=Ax(k)+B(1 2n u 1 (k)+C[u 2 (k);u 2 (k)])
y(k)=U oc (x 1 (k))-x 2 (k)-D(1 n u 1 (k)+Cu 2 (k))
wherein 1 is n A vector representing n×1, n representing the number of battery cells of the entire battery pack; i represents the relevant parameter of the ith battery;respectively representing the SOC of n battery cells and the voltage at two ends of an RC network; />Is the total state variable; />Represents bus current; />Is the terminal voltage of n cells, +.>Is a Boolean variable, u 2.i E {0,1}, i=1, 2, …, n, representing the state of n battery equalization switches; and
where I_n and 0_n are the n×n identity and zero matrices, B_1 = -η_0 diag(-1/3600/C_bat,1, …, -1/3600/C_bat,n), α = diag(α_1,1, …, α_1,n), and B_2 = diag((1-α_1,1)R_1,1, …, (1-α_1,n)R_1,n); β is the energy transfer efficiency between the primary and secondary sides of the transformer, and I_cp is the current on the primary side of the transformer.
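To make the assembled model concrete, the following sketch builds the A and B matrices and advances the state one step in numpy. All parameter values are illustrative assumptions, and since the patent's C and D matrices are not reproduced in this extraction, the equalization-current pattern encoded in C_eq (β·I_cp into the selected cell, I_cp/n drawn from every cell to feed the transformer primary) is likewise an assumption.

```python
import numpy as np

# Minimal sketch of the equalization-aware state-space model
#   x(k+1) = A x(k) + B (1_{2n} u1(k) + C [u2(k); u2(k)])
# Parameter values are placeholders, not the identified ones from the patent.
n = 4                       # number of cells
dt = 1.0                    # sampling time [s]
C_bat = np.full(n, 2.0)     # cell capacity [Ah] (assumed)
R1 = np.full(n, 0.01)       # polarization resistance [ohm] (assumed)
C1 = np.full(n, 2000.0)     # polarization capacitance [F] (assumed)
eta0 = 0.99                 # coulombic efficiency (assumed)
beta, I_cp = 0.9, 7.5       # transfer efficiency / primary-side current (assumed)

alpha1 = np.exp(-dt / (R1 * C1))
A = np.block([[np.eye(n),        np.zeros((n, n))],
              [np.zeros((n, n)), np.diag(alpha1)]])
B = np.block([[eta0 / 3600.0 * np.diag(1.0 / C_bat), np.zeros((n, n))],
              [np.zeros((n, n)), np.diag((1.0 - alpha1) * R1)]])

# Assumed equalization matrix: beta*I_cp flows into the switched cell,
# while I_cp/n is drawn from every cell to feed the transformer primary.
C_eq = beta * I_cp * np.eye(n) - (I_cp / n) * np.ones((n, n))

def step(x, u1, u2):
    """One model step: u1 = external bus current [A], u2 = switch vector."""
    i_cell = u1 * np.ones(n) + C_eq @ np.asarray(u2, float)
    return A @ x + B @ np.concatenate([i_cell, i_cell])
```

With a 6 A bus current and no switching, every cell's SOC row increases identically; closing a single switch raises the selected cell's current and drains the others, which is the qualitative cell-to-pack behavior described above.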
In step S200, a battery pack equalization sensing fast charge optimization problem is established according to an equalization sensing fast charge mathematical model, and a task environment is set according to the optimization problem, which specifically includes:
Step S210: establish the battery pack equalization-aware fast-charge optimization problem, consisting of an optimization objective and constraints. The objective is to minimize charging time and battery inconsistency; the constraints cover the total current through each cell, the cell terminal voltage and the cell state of charge, keeping every cell within its safety limits throughout the charge equalization process;
the optimization objective is a multi-objective optimization problem considering charging time and equalization effect, and the expression is:
J = J_t + J_e
where J is the overall optimization objective to be minimized; J_t is the time taken from the initial state of charge SOC to the end of charge, and J_e is the battery inconsistency, quantified by the Euclidean distance between each cell's state of charge and the average state of charge:
where N is the total number of periods from the initial state to the end of charge and x_1(k) denotes the state variable x_1 at the k-th period.
The constraints cover the total current through each cell, the terminal voltage and the cell state of charge, keeping every cell within its safety limits during charge equalization; specifically:
|1_n u_1(k) + C u_2(k)| ≤ 1_n I_max
1_n U_min ≤ y(k) ≤ 1_n U_max
1_n · 0 ≤ x_1(k) ≤ 1_n · 100%
where I_max is the maximum charging current, and U_min and U_max are the minimum and maximum cell terminal voltages, respectively.
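The three safety constraints translate directly into a feasibility check. The limits below (I_MAX, U_MIN, U_MAX) are assumed illustrative values for a LiFePO4 cell, not figures taken from the patent.

```python
import numpy as np

# Safety-limit check for the constraints of step S210 (limits are assumed).
I_MAX = 6.0                # maximum charging current [A] (assumed, 3C of 2 Ah)
U_MIN, U_MAX = 2.5, 3.65   # LiFePO4 terminal-voltage window [V] (assumed)

def within_limits(i_cell, y, soc):
    """True iff every cell satisfies the current, voltage and SOC limits."""
    return bool(np.all(np.abs(i_cell) <= I_MAX)
                and np.all((U_MIN <= y) & (y <= U_MAX))
                and np.all((0.0 <= soc) & (soc <= 1.0)))
```

A controller (or the RL reward, via penalties) would call this every period to verify the pack stays inside its safe operating area.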
Step S220: set the environment state observation of the task environment and the action value that controls the battery pack, wherein:
the environment state observation comprises the state of charge SOC, terminal voltage and ambient temperature of the 4 cells in the battery pack;
the action value comprises the magnitude of the external charging current and the equalization switch states of all cells; in this embodiment the charging rate ranges from 0 to 3C, discretized in steps of 0.5C.
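With the charge rate discretized into seven levels (0 to 3C in 0.5C steps) and one of the four cells selected by the equalizer, the discrete action space has 7 × 4 = 28 entries, which matches the 28-neuron output layer listed in step S310. The one-switch-at-a-time interpretation is an assumption, and the `decode` helper below is a hypothetical mapping from action index to control signals:

```python
from itertools import product

# Discrete action space sketch: 7 charge rates x 4 equalizer targets = 28
# actions, consistent with the 28-neuron output layer of step S310.
C_RATES = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]   # charging rate [C]
N_CELLS = 4

ACTIONS = [(rate, cell) for rate, cell in product(C_RATES, range(N_CELLS))]

def decode(a_idx, capacity_ah=2.0):
    """Map a discrete action index to (charging current [A], switch vector)."""
    rate, cell = ACTIONS[a_idx]
    u2 = [1 if j == cell else 0 for j in range(N_CELLS)]
    return rate * capacity_ah, u2
```

The assumed 2 Ah capacity turns a 3C rate into a 6 A charging current; any real deployment would substitute the identified C_bat.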
Step S230: determining a reward function according to the optimization problem and the limiting condition;
the reward function is as follows:
r_k = r_time + r_bal + r_vol + r_soc
is the total reward function r_k for the k-th period, where r_time represents the cost of the time taken, r_bal measures the inconsistency between the cells, and the rewards r_vol and r_soc, tied to the terminal-voltage and charge-capacity constraints, take the following form:
where the intermediate variables δ_v,j and δ_s,j represent the reward obtained for each cell, in the following specific form:
where y_j,k and SOC_j,k denote the terminal voltage and the SOC of the j-th cell at time k, respectively.
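Since the patent's exact expressions for r_time, r_bal and the per-cell penalties δ_v,j and δ_s,j are not reproduced in this extraction, the sketch below only illustrates the structure of the shaped reward r_k = r_time + r_bal + r_vol + r_soc; the weights and penalty forms are assumptions.

```python
import numpy as np

# Hedged sketch of the shaped reward r_k = r_time + r_bal + r_vol + r_soc.
# Weights and penalty forms are illustrative, not the patent's expressions.
U_MIN, U_MAX = 2.5, 3.65   # assumed terminal-voltage window [V]

def reward(soc, y, w_time=1.0, w_bal=10.0, w_pen=100.0):
    r_time = -w_time                                      # cost per period
    r_bal = -w_bal * np.linalg.norm(soc - soc.mean())     # inconsistency term
    # Per-cell constraint penalties, playing the role of delta_v,j / delta_s,j.
    delta_v = np.where((y < U_MIN) | (y > U_MAX), -w_pen, 0.0)
    delta_s = np.where((soc < 0.0) | (soc > 1.0), -w_pen, 0.0)
    return r_time + r_bal + delta_v.sum() + delta_s.sum()
```

A perfectly balanced, in-range pack thus incurs only the per-period time cost, while imbalance and constraint violations drive the reward down.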
In step S300, setting hyperparameters of the reinforcement learning algorithm, interacting with the reinforcement learning algorithm through the equalization-aware fast-charge mathematical model, and collecting training data to complete training of the reinforcement learning algorithm; the method specifically comprises the following steps:
Step S310: set the basic hyperparameters of the reinforcement learning algorithm. This embodiment adopts the DQN algorithm proposed in "Mnih, Volodymyr, et al. Playing Atari with deep reinforcement learning. arXiv preprint arXiv:1312.5602, 2013", which suits continuous state spaces and discrete action spaces. The hyperparameters to set include: experience replay pool size N_r = 20000; batch size per update N_b = 512; a 5-layer neural network whose layers have 9, 128, 256 and 28 neurons with ReLU activation; discount factor γ = 0.9; Adam as the optimizer for updating the value network Q_φ(s, a) parameters; target network Q_φ⁻(s, a) update frequency N⁻ = 100; and learning rate η_Q = 0.01;
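A minimal numpy stand-in for the Q-network and ε-greedy action selection is sketched below. The widths follow the listed "9, 128, 256 and 28"; since only four widths are given for a five-layer network, the repeated 128 is an assumption, as is the random weight initialization.

```python
import numpy as np

# Sketch of the Q-network shape from step S310: 9 inputs (e.g. 4 SOCs,
# 4 terminal voltages, temperature), ReLU hidden layers, 28 Q-value outputs.
# The second 128-wide hidden layer is an assumption (the patent lists only
# four widths for a five-layer network).
LAYERS = [9, 128, 256, 128, 28]
rng = np.random.default_rng(0)
params = [(rng.normal(0, 0.1, (i, o)), np.zeros(o))
          for i, o in zip(LAYERS[:-1], LAYERS[1:])]

def q_values(state):
    """Forward pass: ReLU on hidden layers, linear output layer."""
    h = np.asarray(state, dtype=float)
    for k, (W, b) in enumerate(params):
        h = h @ W + b
        if k < len(params) - 1:
            h = np.maximum(h, 0.0)      # ReLU activation
    return h

def epsilon_greedy(state, eps=0.1):
    """Pick a random action with probability eps, else the greedy one."""
    if rng.random() < eps:
        return int(rng.integers(LAYERS[-1]))
    return int(np.argmax(q_values(state)))
```

In training the real network would be updated with Adam at η_Q = 0.01; this sketch only fixes the input/output interface the rest of the pipeline relies on.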
Step S320: initialize the SOC of the cells in the pack and set the target conditions for ending the charging and equalization process. In this example the cell SOCs were initialized to 80%, 61%, 50% and 65%, the ambient temperature was set to 25 ℃ and 45 ℃ respectively, the charge-termination target was an average SOC of at least 97%, and the equalization standard-deviation threshold was 1.5%.
Step S330: the reinforcement learning algorithm outputs an action a from its current policy given the current state observation s; the action interacts with the environment, driving the battery pack to the next state s' and yielding a reward r; (s, a, r, s') is stored as one training tuple in the experience replay pool for subsequent training of the algorithm;
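The experience replay pool of step S330 can be sketched as a bounded FIFO with uniform sampling, using the embodiment's capacity N_r = 20000 and batch size N_b = 512 as defaults:

```python
import random
from collections import deque

# Experience replay pool sketch for step S330: store (s, a, r, s') tuples up
# to capacity N_r and sample uniform minibatches of up to N_b tuples.
class ReplayPool:
    def __init__(self, capacity=20000):
        self.buf = deque(maxlen=capacity)   # oldest tuples evicted first

    def push(self, s, a, r, s_next):
        self.buf.append((s, a, r, s_next))

    def sample(self, batch_size=512):
        return random.sample(self.buf, min(batch_size, len(self.buf)))

    def __len__(self):
        return len(self.buf)
```

Uniform sampling from this pool decorrelates consecutive battery states, which is what makes the minibatch update of step S340 well behaved.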
Step S340: randomly sample a batch of data (s, a, r, s') from the experience replay pool, where (s_j, a_j, r_j, s_j+1) denotes a single training tuple, and compute the loss function used to update the value network parameters:
L(φ) = (1/N_b) Σ_j (y_j − Q_φ(s_j, a_j))²
where y_j is the predicted future return obtainable from state s_j, with the specific expression:
y_j = r_j + γ max_a' Q_φ⁻(s_j+1, a')
where max_a' Q_φ⁻(s_j+1, a') is the maximum return achievable from state s_j+1 up to the end of the task, as evaluated by the current target network, i.e., a prediction of future returns;
The value network parameters are updated by gradient descent:
φ ← φ − η_Q ∇_φ L(φ)
where η_Q is the learning rate for updating the value network Q_φ(s, a), and ∇_φ L(φ) is the gradient of the loss with respect to the value network parameters;
Step S350: every N⁻ updates, refresh the target network parameters:
φ⁻ ← φ
Step S360: repeat steps S320 to S340 until the algorithm fully converges, yielding the optimal policy network with which the battery pack reaches the charging and equalization targets from the current state.
In step S400, the trained reinforcement learning algorithm is migrated to the real battery pack, and its performance is verified at ambient temperatures of 25 ℃ and 45 ℃. The results are listed in the table below; comparing the experimental and simulation results verifies the superiority and feasibility of the equalization-aware fast-charge framework. This embodiment also compares the training performance of the disclosed method against similar algorithms; the experimental and simulation results are shown in the following table:
As the table shows, the simulation and experimental results obtained with the reinforcement-learning-based lithium-ion battery pack equalization-aware fast-charge control method agree within the allowable error range; compared with conventional constant-current constant-voltage (CC-CV) charging and with schemes that handle equalization separately, the disclosed method effectively improves the final remaining charge of the battery pack.
Parts of the invention not described in detail are the same as, or may be implemented with, the prior art.
The foregoing is a further detailed description of the invention in connection with specific embodiments, and it is not intended that the invention be limited to such description. It will be apparent to those skilled in the art that several simple deductions or substitutions may be made without departing from the spirit of the invention, and these should be considered to be within the scope of the invention.

Claims (10)

1. The battery pack equalization sensing fast charge control method based on reinforcement learning is characterized by comprising the following steps:
Step S100: establishing a battery monomer model and an energy balance topology model of a battery pack, and converting the battery monomer model and the energy balance topology model into a balanced sensing fast charge mathematical model of the battery pack;
step S200: establishing a battery pack equalization sensing fast charge optimization problem according to an equalization sensing fast charge mathematical model, and setting a task environment according to the optimization problem; wherein the optimization problem targets a minimum of charging time and inconsistency;
step S300: setting hyperparameters of a reinforcement learning algorithm, interacting with the reinforcement learning algorithm through the equalization-aware fast-charge mathematical model, and collecting training data to complete training of the reinforcement learning algorithm;
step S400: and transferring the trained reinforcement learning algorithm to a real battery pack, and performing balanced sensing fast charge control on the battery pack.
2. The reinforcement learning-based battery pack equalization sensing fast charge control method of claim 1, wherein: the step S100 includes:
step S110: establishing a battery monomer model by using a first-order Thevenin equivalent circuit model, and determining parameters of the battery model; the specific parameters include: the resistance and capacitance of the model, the cell capacity, the open circuit voltage and the state of charge of the battery, i.e., the OCV-SOC relationship, and the coulomb efficiency.
Step S120: establishing an energy balance topology model of the battery pack in a battery balance mode of the battery cell to the battery pack; the method specifically comprises the following steps: determining an equilibrium topological structure, and determining the number of battery monomers, switching frequency, PWM wave duty ratio and equilibrium current;
step S130: based on kirchhoff current law, external charging current and balanced current in a battery pack are combined, and a battery monomer model and a battery pack energy balanced topology model are converted into a balanced sensing fast charge mathematical model of the battery pack.
3. The reinforcement learning-based battery pack equalization sensing fast charge control method of claim 2, wherein: in the step S110, the dynamic characteristics of the battery cell model are expressed as:
U_1(k+1) = α_1 U_1(k) + (1-α_1) R_1 I_L(k)
U_t(k) = U_oc(k) - U_1(k) - I_L(k) R_0
where U_1 is the voltage across the parallel RC network, α_1 = exp(-Δt/(R_1 C_1)), Δt is the sampling time, k denotes the k-th sampling period, I_L the load current, U_t the terminal voltage, and U_oc the open-circuit voltage, a function of the battery state of charge SOC;
the state of charge SOC of the battery is expressed as:
where η_0 ∈ (0, 1) denotes the coulombic efficiency and C_bat the battery capacity;
the battery capacity C_bat and the coulombic efficiency η_0 are measured through capacity test experiments, the OCV-SOC relation is measured by a low-current OCV test, dynamic data are obtained through DST, FUDS and UDDS drive-cycle tests, and parameter identification with a particle swarm algorithm determines each cell's ohmic resistance R_0, polarization resistance R_1 and polarization capacitance C_1.
4. The reinforcement learning-based battery pack equalization sensing fast charge control method of claim 2, wherein: in step S130, according to kirchhoff' S current law, in combination with external charging current and equalization current in the battery pack, an equalization sensing fast charge mathematical model is established as follows:
x(k+1) = A x(k) + B(1_2n u_1(k) + C[u_2(k); u_2(k)])
y(k) = U_oc(x_1(k)) - x_2(k) - D(1_n u_1(k) + C u_2(k))
where 1_n denotes an n×1 vector of ones and n the number of cells in the pack; subscript i denotes the quantity of the i-th cell; x_1 and x_2 denote the SOCs of the n cells and the voltages across their RC networks, respectively; x = [x_1; x_2] is the full state vector; u_1 is the bus current; y is the vector of the n cell terminal voltages; u_2 is a Boolean vector with u_2,i ∈ {0,1}, i = 1, 2, …, n, representing the states of the n equalization switches; and
where I_n and 0_n are the n×n identity and zero matrices, B_1 = -η_0 diag(-1/3600/C_bat,1, …, -1/3600/C_bat,n), α = diag(α_1,1, …, α_1,n), and B_2 = diag((1-α_1,1)R_1,1, …, (1-α_1,n)R_1,n); β is the energy transfer efficiency between the primary and secondary sides of the transformer, and I_cp is the current on the primary side of the transformer.
5. The reinforcement learning-based battery pack equalization sensing fast charge control method of claim 1, wherein: step S200 includes:
step S210: establish the battery pack equalization-aware fast-charge optimization problem, consisting of an optimization objective and constraints, where the objective is to minimize charging time and battery inconsistency, and the constraints cover the total current through each cell, the cell terminal voltage and the cell state of charge, keeping every cell within its safety limits throughout the charge equalization process;
step S220: setting the environment state observation of the task environment and the action value that controls the battery pack, wherein:
the environment state observation comprises the state of charge SOC, terminal voltage and ambient temperature of all cells in the battery pack;
the action value comprises the magnitude of external charging current and the balanced switch state of all batteries;
step S230: and determining the rewarding function according to the optimization problem and the limiting condition.
6. The reinforcement learning-based battery pack equalization sensing fast charge control method of claim 5, wherein: in step S210, the optimization objective is:
J = J_t + J_e
where J is the overall optimization objective to be minimized; J_t is the time taken from the initial state of charge SOC to the end of charge, and J_e is the battery inconsistency, quantified by the Euclidean distance between each cell's state of charge and the average state of charge:
where N is the total number of periods from the initial state to the end of charge and x_1(k) denotes the state variable x_1 at the k-th period.
The constraints cover the total current through each cell, the terminal voltage and the cell state of charge, keeping every cell within its safety limits during charge equalization; specifically:
|1_n u_1(k) + C u_2(k)| ≤ 1_n I_max
1_n U_min ≤ y(k) ≤ 1_n U_max
1_n · 0 ≤ x_1(k) ≤ 1_n · 100%
where I_max is the maximum charging current, and U_min and U_max are the minimum and maximum cell terminal voltages, respectively.
7. The reinforcement learning-based battery pack equalization sensing fast charge control method of claim 5, wherein: in step S230, the bonus function is as follows:
r_k = r_time + r_bal + r_vol + r_soc
is the total reward function r_k for the k-th period, where r_time represents the cost of the time taken, r_bal measures the inconsistency between the cells, and the rewards r_vol and r_soc, tied to the terminal-voltage and charge-capacity constraints, take the following form:
where the intermediate variables δ_v,j and δ_s,j represent the reward obtained for each cell, in the following specific form:
where y_j,k and SOC_j,k denote the terminal voltage and the SOC of the j-th cell at time k, respectively.
8. The reinforcement learning-based battery pack equalization sensing fast charge control method of claim 1, wherein: step S300 includes:
step S310: setting basic hyperparameters of the reinforcement learning algorithm, at least including: the size N_r of the experience replay pool; the batch size N_b used for each update; the number of neural-network layers, the number of nodes per layer and the activation function; the discount factor γ; the optimizer for updating the parameters of the value network Q_φ(s, a); the update frequency N⁻ of the target network Q_φ⁻(s, a); and the learning rate η_Q;
Step S320: initializing the battery charge state SOC of the battery cells in the battery pack, and setting target conditions for ending the charging and balancing process;
step S330: the reinforcement learning algorithm outputs an action a from the current policy given the current state observation s; the action interacts with the environment, driving the battery pack to the next state s' and yielding a reward r; (s, a, r, s') is stored as one training tuple in the experience replay pool for subsequent training of the reinforcement learning algorithm;
step S340: randomly sample a batch of data (s, a, r, s') from the experience replay pool, where (s_j, a_j, r_j, s_j+1) denotes a single training tuple, and compute the loss function used to update the value network parameters:
L(φ) = (1/N_b) Σ_j (y_j − Q_φ(s_j, a_j))²
where y_j is the predicted future return obtainable from state s_j, with the specific expression:
y_j = r_j + γ max_a' Q_φ⁻(s_j+1, a')
where max_a' Q_φ⁻(s_j+1, a') is the maximum return achievable from state s_j+1 up to the end of the task, as evaluated by the current target network, i.e., a prediction of future returns;
the value network parameters are updated by gradient descent:
φ ← φ − η_Q ∇_φ L(φ)
where η_Q is the learning rate for updating the value network Q_φ(s, a), and ∇_φ L(φ) is the gradient of the loss with respect to the value network parameters;
step S350: every N⁻ updates, refresh the target network parameters:
φ⁻ ← φ
step S360: repeat steps S320 to S340 until the algorithm fully converges, yielding the optimal policy network with which the battery pack reaches the charging and equalization targets from the current state.
9. Battery pack equalization sensing fast charge control system based on reinforcement learning, and is characterized in that: comprising a memory and a processor;
wherein the memory stores a computer program which when executed by the processor is capable of implementing the reinforcement learning-based battery pack equalization sensing fast charge control method of any one of claims 1 to 8.
10. A computer readable storage medium having stored thereon a computer program, wherein the computer program when executed by a processor implements the reinforcement learning based battery equalization aware fast charge control method of any of claims 1-8.
CN202311113458.3A 2023-08-31 2023-08-31 Battery pack balanced sensing quick charge control method and system based on reinforcement learning Pending CN117096984A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311113458.3A CN117096984A (en) 2023-08-31 2023-08-31 Battery pack balanced sensing quick charge control method and system based on reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311113458.3A CN117096984A (en) 2023-08-31 2023-08-31 Battery pack balanced sensing quick charge control method and system based on reinforcement learning

Publications (1)

Publication Number Publication Date
CN117096984A true CN117096984A (en) 2023-11-21

Family

ID=88771324

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311113458.3A Pending CN117096984A (en) 2023-08-31 2023-08-31 Battery pack balanced sensing quick charge control method and system based on reinforcement learning

Country Status (1)

Country Link
CN (1) CN117096984A (en)


Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117458675A (en) * 2023-12-22 2024-01-26 宁德时代新能源科技股份有限公司 Battery charging simulation method, device, equipment and storage medium
CN117458675B (en) * 2023-12-22 2024-04-12 宁德时代新能源科技股份有限公司 Battery charging simulation method, device, equipment and storage medium
CN117578679A (en) * 2024-01-15 2024-02-20 太原理工大学 Lithium battery intelligent charging control method based on reinforcement learning
CN117578679B (en) * 2024-01-15 2024-03-22 太原理工大学 Lithium battery intelligent charging control method based on reinforcement learning

Similar Documents

Publication Publication Date Title
Wang et al. A method for state-of-charge estimation of Li-ion batteries based on multi-model switching strategy
CN117096984A (en) Battery pack balanced sensing quick charge control method and system based on reinforcement learning
Zhang et al. A GA optimization for lithium–ion battery equalization based on SOC estimation by NN and FLC
Li et al. A comparative study of battery state-of-health estimation based on empirical mode decomposition and neural network
CN115632179B (en) Intelligent quick charging method and system for lithium ion battery
CN109061506A (en) Lithium-ion-power cell SOC estimation method based on Neural Network Optimization EKF
CN107329094A (en) Electrokinetic cell health status evaluation method and device
CN112018465B (en) Multi-physical-field-constrained intelligent quick charging method for lithium ion battery
CN105183994B (en) A kind of power battery SOC Forecasting Methodologies and device based on improved I-ELM
Luo et al. State of charge estimation method based on the extended Kalman filter algorithm with consideration of time‐varying battery parameters
Abdollahi et al. Optimal charging for general equivalent electrical battery model, and battery life management
CN107132490B (en) Method for estimating state of charge of lithium battery pack
CN115291116B (en) Energy storage battery health state prediction method and device and intelligent terminal
CN110303945B (en) Self-adaptive optimization balance control method for electric quantity of storage battery pack
CN109001640A (en) A kind of data processing method and device of power battery
Xu et al. State of charge estimation for liquid metal battery based on an improved sliding mode observer
CN106887877A (en) A kind of battery pack active equalization control system estimated based on battery power status
Wei et al. State of health assessment for echelon utilization batteries based on deep neural network learning with error correction
Wang et al. Dynamic battery equalization scheme of multi-cell lithium-ion battery pack based on PSO and VUFLC
Huang et al. State of charge estimation of li-ion batteries based on the noise-adaptive interacting multiple model
Yang et al. Balancing awareness fast charging control for lithium-ion battery pack using deep reinforcement learning
CN115236526A (en) Method and device for predicting residual charging time, storage medium and vehicle
Lu et al. An active equalization method for redundant battery based on deep reinforcement learning
CN111948539A (en) Kalman filtering lithium ion battery SOC estimation method based on deep reinforcement learning
Demirci et al. Review of battery state estimation methods for electric vehicles-Part I: SOC estimation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination