CN106296044A

CN106296044A - power system risk scheduling method and system

Info

Publication number: CN106296044A
Application number: CN201610882652.1A
Authority: CN
Inventors: 郭晓斌; 许爱东; 简淦杨; 魏文潇; 占恺峤; 史训涛; 谭勤学; 吴俊阳; 韩传家; 余涛
Original assignee: China South Power Grid International Co ltd; South China University of Technology SCUT; Power Grid Technology Research Center of China Southern Power Grid Co Ltd
Current assignee: China South Power Grid International Co ltd; South China University of Technology SCUT
Priority date: 2016-10-08
Filing date: 2016-10-08
Publication date: 2017-01-04
Anticipated expiration: 2036-10-08
Also published as: CN106296044B

Abstract

The present invention relates to a power system risk scheduling method and system. The method acquires the architecture data of the power system and the load section data of a new task. According to these data, it iteratively updates a preset initial knowledge matrix with a bacterial foraging reinforcement learning algorithm, obtaining the corresponding risk scheduling objective function value and the updated knowledge matrix. The new task is then optimized online with the updated knowledge matrix that corresponds to the minimum value of the risk scheduling objective function, and the risk scheduling optimization result is obtained and output. Using the optimal knowledge matrix of a source task as the initial matrix of the new task realizes knowledge transfer, so the new task is optimized online by bacterial foraging reinforcement learning based on knowledge transfer. Transfer learning greatly speeds up online learning and enables online dynamic optimization of the risk scheduling problem. As the problem scale grows, the solution speed remains fast, so the method suits rapid optimization of large-scale, complex risk scheduling.

Description

Power system risk scheduling method and system

Technical field

The invention relates to the technical field of power grids, and in particular to a power system risk scheduling method and system.

Background

In recent years, with the development of regional grid interconnection and high-voltage, long-distance, large-capacity transmission, the safe and stable operation of power systems has faced increasingly severe challenges. To better balance system security against economic benefit and to strengthen the ability of dispatch operations to withstand operational risk, power system risk theory has been introduced into generation optimization, and risk scheduling has been studied extensively.

Traditional power system risk scheduling methods apply intelligent algorithms such as the genetic algorithm (GA), quantum genetic algorithm (QGA), artificial bee colony (ABC) and particle swarm optimization (PSO) to the various optimization problems of power systems. However, these algorithms optimize similar tasks in isolation: they cannot effectively retain the experience and knowledge gained from past tasks, they lack self-learning ability, and they must be re-initialized every time a new task is executed. As a result, their optimization efficiency is low, and they struggle to deliver the rapid optimization that large-scale, complex risk scheduling requires.

Summary of the invention

In view of the above problems, it is necessary to provide a power system risk scheduling method and system capable of rapidly optimizing large-scale, complex risk scheduling.

A power system risk scheduling method comprises the following steps:

Obtaining the architecture data of the power system and the load section data of a new task;

Iteratively updating a preset initial knowledge matrix with a bacterial foraging reinforcement learning algorithm according to the architecture data and the new task load section data, to obtain the corresponding risk scheduling objective function value and an updated knowledge matrix, the initial knowledge matrix being the optimal knowledge matrix of a source task;

Optimizing the new task online according to the updated knowledge matrix that corresponds to the minimum value of the risk scheduling objective function, and obtaining and outputting the risk scheduling optimization result.

A power system risk scheduling system comprises:

A task data acquisition module, configured to obtain the architecture data of the power system and the load section data of a new task;

A knowledge matrix update module, configured to iteratively update a preset initial knowledge matrix with a bacterial foraging reinforcement learning algorithm according to the architecture data and the new task load section data, to obtain the corresponding risk scheduling objective function value and an updated knowledge matrix, the initial knowledge matrix being the optimal knowledge matrix of a source task;

A risk scheduling optimization module, configured to optimize the new task online according to the updated knowledge matrix that corresponds to the minimum value of the risk scheduling objective function, and to obtain and output the risk scheduling optimization result.

The above power system risk scheduling method and system obtain the architecture data of the power system and the new task load section data; iteratively update the preset initial knowledge matrix with the bacterial foraging reinforcement learning algorithm according to these data, obtaining the corresponding risk scheduling objective function value and the updated knowledge matrix; and optimize the new task online with the updated knowledge matrix corresponding to the minimum objective value, obtaining and outputting the risk scheduling optimization result. Using the optimal knowledge matrix of the source task as the initial matrix of the new task realizes knowledge transfer, and the new task is optimized online by bacterial foraging reinforcement learning based on that transfer. Transfer learning greatly speeds up online learning and enables online dynamic optimization of risk scheduling; the solution speed stays fast as the problem scale grows, so the method and system suit rapid optimization of large-scale, complex risk scheduling.

Description of drawings

Fig. 1 is a flowchart of the power system risk scheduling method in an embodiment;

Fig. 2 is a schematic diagram of knowledge acquisition in the bacterial foraging reinforcement learning algorithm based on knowledge transfer in an embodiment;

Fig. 3 is a schematic diagram of dimensionality reduction based on knowledge extension in an embodiment;

Fig. 4 is a schematic diagram of knowledge transfer in an embodiment;

Fig. 5 is a topology diagram of the test system in an embodiment;

Fig. 6 is a structural diagram of the power system risk scheduling system in an embodiment.

Detailed description

In one embodiment, a power system risk scheduling method, as shown in Fig. 1, comprises the following steps:

Step S120: obtain the architecture data of the power system and the new task load section data.

The architecture data of the power system may include data on bus nodes, transmission lines, transformers, generators and the like. The new task load section data contain one or more load sections, each of which is treated as one new task. These data are obtained for the subsequent risk scheduling optimization.

Step S130: according to the architecture data and the new task load section data, iteratively update the preset initial knowledge matrix with the bacterial foraging reinforcement learning algorithm to obtain the corresponding risk scheduling objective function value and the updated knowledge matrix.

The initial knowledge matrix is the optimal knowledge matrix of the source task, and using it as the initial matrix of the new task realizes knowledge transfer. The bacterial population selects actions by combining the random search pattern of the bacterial foraging optimization algorithm with a probability-space action selection strategy, so that the new task is optimized online by the Transfer Bacteria Foraging Optimization (TBFO) algorithm, i.e. bacterial foraging reinforcement learning based on knowledge transfer.

The type of the initial knowledge matrix is not limited; in this embodiment it is a Q matrix. In Q-learning, the element Q(s, a) of the Q matrix is the expected cumulative reward of choosing action a in state s, so the matrix records the agent's knowledge of how to map states to actions. The Q matrix serves as the knowledge matrix that records the population's optimization information. By analyzing the similarity between optimization tasks, the knowledge matrix of a source task is used to form the initial knowledge matrix of a new task, and tasks at different time sections are optimized online and dynamically through knowledge transfer, ensuring reliable optimization.

In the TBFO algorithm, the bacterial population obtains action policies for specific environment states from the initial knowledge matrix and updates this knowledge with the feedback obtained from repeated trials, forming a learned response to each state so that the energy accumulated by the population during foraging is maximized.

In one embodiment, step S130 includes steps 131 to 136.

Step 131: according to the architecture data and the new task load section data, control the bacteria to perform chemotaxis, migration and reproduction operations under the guidance of the initial knowledge matrix.

Under the guidance of the initial knowledge matrix, the bacteria acquire knowledge through chemotaxis, migration and reproduction. In TBFO, all bacteria search the foraging area according to the initial knowledge matrix and feed the rewards they obtain back into the knowledge matrix. As shown in Fig. 2, TBFO assigns each bacterium one of two states according to the operation it is performing: chemotaxis or migration. In each iteration, the two states are assigned to given proportions of the bacteria; after both groups have completed their operations, the energy values of all bacteria are calculated and sorted, and the reproduction operation follows, so that the energy accumulated by the population during foraging is maximized. In the next iteration, the bacterial states are reassigned according to the energy values of the previous iteration: bacteria with higher energy stay in their current region and perform chemotaxis, while bacteria with lower energy perform migration.
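The state reassignment above can be sketched in Python as follows. It is an illustrative sketch rather than the patent's implementation: the function name `allocate_states` and the default 50/50 split `chemotaxis_fraction` are assumptions, since the text only says that each state is given "a certain proportion" of the bacteria.

```python
def allocate_states(energy, chemotaxis_fraction=0.5):
    """Split bacteria by energy: high-energy ones keep chemotaxis, the rest migrate.

    energy: accumulated energy value of each bacterium (list index = bacterium id).
    chemotaxis_fraction: hypothetical share of the population kept in chemotaxis.
    Returns (chemotaxis_ids, migration_ids).
    """
    # Rank bacterium ids from highest to lowest energy.
    order = sorted(range(len(energy)), key=lambda b: energy[b], reverse=True)
    n_chemo = round(chemotaxis_fraction * len(energy))
    return order[:n_chemo], order[n_chemo:]


chemo, migrate = allocate_states([3.0, 9.0, 1.0, 5.0])
print(chemo, migrate)  # [1, 3] [0, 2]
```

Ranking once per iteration makes the split deterministic for given energies; changing the proportion only moves the cut point in the ranked list.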

Specifically, based on the energy ranking, the dominant individuals of the population are placed in the chemotaxis state and remain responsible for local search. Their chemotactic movement is expressed as:

$\theta^{i}(j+1,k,l) = \theta^{i}(j,k,l) + C_{k}(i)\frac{\Delta(i)}{\sqrt{\Delta^{T}(i)\Delta(i)}}$

where θ^i(j,k,l) is the position of bacterium i after the j-th chemotaxis, k-th reproduction and l-th migration operations; Δ(i) is the random direction vector generated by the tumble, so Δ(i)/√(Δ^T(i)Δ(i)) is the unit vector along which the bacterium swims; and C_k(i) is the step size.
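A minimal Python sketch of one chemotactic move from the equation above. Drawing each component of Δ(i) uniformly from [-1, 1] is an assumption borrowed from common BFO practice (the text only says the direction is random), and `chemotaxis_step` is a hypothetical name.

```python
import math
import random


def chemotaxis_step(theta, step, rng=random):
    """Move one bacterium: theta + step * Delta / sqrt(Delta^T Delta).

    theta: current position (list of floats); step: the step size C_k(i).
    Each component of Delta is drawn from U(-1, 1), an assumed choice.
    """
    while True:
        delta = [rng.uniform(-1.0, 1.0) for _ in theta]   # random direction Delta(i)
        norm = math.sqrt(sum(d * d for d in delta))        # sqrt(Delta^T Delta)
        if norm > 0.0:                                     # guard against a zero vector
            break
    return [t + step * d / norm for t, d in zip(theta, delta)]


start = [100.0, 50.0]
new_pos = chemotaxis_step(start, step=2.0, rng=random.Random(7))
moved = math.sqrt(sum((a - b) ** 2 for a, b in zip(new_pos, start)))
print(round(moved, 9))  # 2.0
```

Because the direction is normalized, the distance moved always equals the step size C_k(i); only the direction is random.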

C_k(i) can be a fixed or a variable step size. In this embodiment, C_k(i) is a nonlinearly decreasing inertial step size, updated as follows:

$C_{k}(i) = C_{0}(i) - \left(C_{0}(i) - C_{e}(i)\right)\left[\frac{2k}{cly} - \left(\frac{k}{cly}\right)^{2}\right]$

where C_k(i) is the inertial step size at the k-th iteration, C_0 is the initial swimming step size, C_e is the final swimming step size, and cly is the maximum number of iterations.
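The step-size schedule can be checked numerically with the short sketch below. The values C_0 = 1.0, C_e = 0.1 and cly = 100 are hypothetical, chosen only to show the shape of the curve; `inertial_step` is an illustrative name.

```python
def inertial_step(k, cly, c0, ce):
    """Nonlinearly decreasing step C_k = C0 - (C0 - Ce) * [2k/cly - (k/cly)^2]."""
    frac = k / cly
    return c0 - (c0 - ce) * (2.0 * frac - frac ** 2)


# Hypothetical settings: C0 = 1.0, Ce = 0.1, cly = 100 iterations.
for k in (0, 25, 50, 75, 100):
    print(k, inertial_step(k, 100, 1.0, 0.1))  # starts at C0 and ends at Ce
```

The bracketed term has zero slope at k = cly, so the step shrinks quickly in early iterations (favouring exploration) and flattens out near C_e at the end (favouring fine local search).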

For a bacterium in the migration state, if the migration probability P_ed is satisfied, the bacterium selects its action by roulette-wheel selection according to the action probability matrix; otherwise it migrates according to the action with the largest knowledge element (greedy strategy).

In this selection rule, the superscript i denotes the i-th controllable variable, which corresponds to the i-th sub-knowledge matrix, i ∈ M, where M is the set of controllable variables; the superscript j denotes the j-th bacterium, j ∈ N, where N is the bacterial population; P_ed is the migration probability; r is a random number between 0 and 1; and a_s is the action selected globally according to the probability matrix P^i. When the migration condition is met, the bacterium performs pseudo-random roulette-wheel selection according to P^i, which is updated as follows:

$\left\{\begin{matrix} e^{i}(s^{i}, a^{i}) = \dfrac{1}{Q^{i}(s^{i}, a^{i}) - \beta \max\limits_{a' \in A^{i}} Q^{i}(s^{i}, a')} \\ P^{i}(s^{i}, a^{i}) = \dfrac{e^{i}(s^{i}, a^{i})}{\sum\limits_{a' \in A^{i}} e^{i}(s^{i}, a')} \end{matrix}\right.$

where β is a difference coefficient used to amplify the differences between the elements of Q^i, and e^i is an intermediate matrix.
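The probability rule and the roulette/greedy choice for a migrating bacterium can be sketched as below. Two points are assumptions rather than statements from the text: the migration condition is taken to be r < P_ed, and β and the Q values are assumed to make every denominator Q − β·max Q share the same sign (for example β > 1 with positive Q values), so that the normalized values are valid probabilities. The function names are hypothetical.

```python
import random


def action_probabilities(q_row, beta):
    """P^i(s, .) from one row Q^i(s, .): e = 1 / (Q - beta * max Q), then normalise.

    Assumes all denominators share the same sign so the results are probabilities.
    """
    q_max = max(q_row)
    e = [1.0 / (q - beta * q_max) for q in q_row]
    total = sum(e)
    return [v / total for v in e]


def select_action(q_row, beta, p_ed, rng=random):
    """Roulette-wheel selection if r < P_ed (assumed test), otherwise greedy."""
    if rng.random() < p_ed:
        probs = action_probabilities(q_row, beta)
        return rng.choices(range(len(q_row)), weights=probs)[0]
    return max(range(len(q_row)), key=lambda a: q_row[a])


probs = action_probabilities([1.0, 2.0, 4.0], beta=2.0)
print([round(p, 3) for p in probs])                      # [0.255, 0.298, 0.447]
print(select_action([1.0, 2.0, 4.0], beta=2.0, p_ed=0.0))  # 2 (greedy branch)
```

With β = 2 the action holding the largest Q value gets the highest probability, but the others keep a non-zero chance, which is what lets migrating bacteria explore.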

In one embodiment, a crossover step is introduced into the reproduction operation as follows:

$\theta^{i+S/2}(j,k,l) = r\,\theta^{i}(j,k,l) + (1-r)\,\theta^{i+S/2}(j,k,l)$

where S is the number of bacteria, i ∈ [1, S/2], and r is a random number in [0, 1].
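A sketch of this crossover in Python, assuming (from the description of step 131) that the population has already been sorted by energy so that the first S/2 entries are the individuals kept by reproduction. Indices are 0-based here, one r is drawn per pair, and Python's `random()` returns values in [0, 1); `crossover` is a hypothetical name.

```python
import random


def crossover(population, rng=random):
    """theta^{i+S/2} = r * theta^i + (1 - r) * theta^{i+S/2} for i in the first half.

    population: list of S positions (lists of floats), assumed sorted by energy
    so that the first S/2 entries are the retained individuals. Modified in place.
    """
    half = len(population) // 2
    for i in range(half):                      # 0-based version of i in [1, S/2]
        r = rng.random()                       # one random r per pair
        partner = population[i + half]
        population[i + half] = [r * a + (1.0 - r) * b
                                for a, b in zip(population[i], partner)]
    return population


pop = [[0.0, 10.0], [4.0, 4.0], [2.0, 0.0], [8.0, 8.0]]
print(crossover(pop, random.Random(0)))
```

Each child lies on the segment between its two parents, so the weaker half is pulled toward the stronger half without being overwritten by an exact copy.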

Step 132: according to the results of the chemotaxis, migration and reproduction operations, calculate the power flow of the power system in the base state and under the preset faults.

After the chemotaxis, migration and reproduction operations are complete, the power flow of the power system in the base state and under the preset faults is calculated from the results. The base state is the condition in which no fault has occurred; the types of preset faults are not limited.

Step 133: calculate the risk scheduling objective function value from the power flows in the base state and under the preset faults.

In the TBFO algorithm, the immediate reward indicates the direction of optimization; the population obtains the optimal policy by iteratively optimizing the knowledge matrix so as to maximize the expected cumulative reward. In the risk scheduling model, the objective function is the reciprocal of the algorithm's reward function, so maximizing the reward minimizes the objective. In this embodiment, the reward function is designed as:

$R^{ij} = \frac{1}{\omega_{1}(F_{C}/c_{1}) + \omega_{2}(I_{R}/c_{2}) + C_{V}}$

where F_C is the fuel cost described by a nonlinear function; I_R is the system security risk index described by a nonlinear utility function; C_V is the degree to which the system constraints are violated in the base state; c_1 and c_2 scale the fuel cost and the risk index to comparable magnitudes; and ω_1 and ω_2 set the relative emphasis on the two objectives.
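The relationship between the objective and the reward can be made concrete with the sketch below. The computation of F_C, I_R and C_V from power flows is outside its scope, so they are passed in as numbers; all numeric values are hypothetical, and the function names are illustrative.

```python
def risk_objective(fuel_cost, risk_index, violation, w1, w2, c1, c2):
    """Objective w1*(F_C/c1) + w2*(I_R/c2) + C_V; smaller is better."""
    return w1 * (fuel_cost / c1) + w2 * (risk_index / c2) + violation


def reward(fuel_cost, risk_index, violation, w1, w2, c1, c2):
    """Reward R^{ij}: the reciprocal of the objective, so maximising R minimises it."""
    return 1.0 / risk_objective(fuel_cost, risk_index, violation, w1, w2, c1, c2)


# Hypothetical numbers: equal weights, c1 and c2 chosen so each scaled term is 1.
print(reward(50000.0, 2.0, 0.0, 0.5, 0.5, 50000.0, 2.0))  # 1.0
print(reward(50000.0, 2.0, 1.0, 0.5, 0.5, 50000.0, 2.0))  # 0.5 (constraint violated)
```

Because C_V enters the denominator directly, any constraint violation lowers the reward, acting as a penalty term alongside cost and risk.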

Step 134: after the risk scheduling objective function values have been calculated, reassign the bacterial states according to them.

Step 135: iteratively update the initial knowledge matrix according to the reassigned bacterial states to obtain the updated knowledge matrix. In one embodiment, step 135 includes step 11 and step 12.

Step 11: reduce the dimension of the initial knowledge matrix to obtain multiple sub-knowledge matrices.

As shown in Fig. 3, to address the "curse of dimensionality", knowledge extension is used for dimensionality reduction: the initial knowledge matrix Q is divided into multiple sub-knowledge matrices Q^i, one for each variable. The variables are linked through the knowledge matrices, and the elements of adjacent matrices carry related knowledge; that is, the action space A_i of x_i is the state space S_{i+1} of x_{i+1}. The action of x_{i+1} can only be selected, based on that result, after the action of x_i has been determined. This forms a chain-like extension among related knowledge and decomposes the knowledge matrix into lower-dimensional pieces.
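The chained structure can be illustrated with the sketch below. It uses greedy selection purely for illustration (the text's migrating bacteria use the probability rule instead), and the sizes in the second example (96 initial states, 13 variables with 10 discrete actions each) are hypothetical: 13 echoes the test system's control variables, while 10 actions per variable is an assumed discretization. Function names are illustrative.

```python
import math


def chain_select(sub_matrices, initial_state):
    """Greedy action chain: the action chosen for x_i becomes the state of x_{i+1}.

    sub_matrices[i][s][a] holds Q^i(s, a); the number of actions of Q^i must equal
    the number of states (rows) of Q^{i+1}.
    """
    state, actions = initial_state, []
    for q in sub_matrices:
        row = q[state]
        action = max(range(len(row)), key=lambda a: row[a])
        actions.append(action)
        state = action                      # A_i becomes S_{i+1}
    return actions


def table_sizes(n_initial_states, action_counts):
    """Entries in one joint Q table versus the chained sub-matrices."""
    joint = n_initial_states * math.prod(action_counts)
    states = [n_initial_states] + action_counts[:-1]
    chained = sum(s * a for s, a in zip(states, action_counts))
    return joint, chained


q1 = [[0.1, 0.9], [0.5, 0.2]]            # 2 states x 2 actions for x_1
q2 = [[0.3, 0.1, 0.8], [0.7, 0.2, 0.1]]  # 2 states (= actions of x_1) x 3 actions
print(chain_select([q1, q2], initial_state=0))  # [1, 0]
print(table_sizes(96, [10] * 13))               # (960000000000000, 2160)
```

The size comparison shows why the decomposition matters: the joint table grows multiplicatively with the number of variables, while the chain grows only additively.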

Step 12: update the sub-knowledge matrices according to the reassigned bacterial states; together, the updated sub-knowledge matrices form the updated knowledge matrix.

The bacterial population acts as multiple agents that update the knowledge matrix cooperatively: all bacteria share one knowledge matrix, so several knowledge elements can be updated in a single iteration, which greatly speeds up optimization. After each trial-and-error exploration, the reward of each agent is evaluated. With population cooperation, each sub-knowledge matrix Q^i is updated as follows:

$\rho_{k}^{ij} = R(s_{k}^{ij}, s_{k+1}^{ij}, a_{k}^{ij}) + \gamma \max_{a^{i} \in A^{i}} Q_{k}^{i}(s_{k+1}^{ij}, a^{i}) - Q_{k}^{i}(s_{k}^{ij}, a_{k}^{ij})$

$Q_{k+1}^{i}(s_{k}^{ij}, a_{k}^{ij}) = Q_{k+1}^{i}(s_{k}^{ij}, a_{k}^{ij}) + \alpha\rho_{k}^{ij}$

where R(s_k^{ij}, s_{k+1}^{ij}, a_k^{ij}) is the reward obtained when, in the k-th iteration, bacterium j selects action a_k^{ij} in state s_k^{ij} and moves to state s_{k+1}^{ij}; α is the learning factor and γ is the discount factor.
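A minimal sketch of the cooperative update, under the assumption that the bacteria apply their updates one after another to the same shared sub-matrix within an iteration (one reading of the Q_{k+1} term on the right-hand side). Matrices are plain lists of rows, and the names `q_update` and `cooperative_update` are hypothetical.

```python
def q_update(q, s, a, s_next, r, alpha, gamma):
    """One bacterium's update of the shared sub-matrix q (list of rows), in place.

    rho = R + gamma * max_a' Q(s_next, a') - Q(s, a);  Q(s, a) += alpha * rho
    """
    rho = r + gamma * max(q[s_next]) - q[s][a]
    q[s][a] += alpha * rho
    return rho


def cooperative_update(q, experiences, alpha, gamma):
    """Apply each bacterium's (s, a, s_next, reward) to the same matrix in turn."""
    for s, a, s_next, r in experiences:
        q_update(q, s, a, s_next, r, alpha, gamma)
    return q


q = [[0.0, 0.0], [1.0, 2.0]]
rho = q_update(q, s=0, a=1, s_next=1, r=1.0, alpha=0.5, gamma=0.9)
print(round(rho, 6), round(q[0][1], 6))  # 2.8 1.4
```

Since every bacterium writes into the same matrix, a later bacterium in the same iteration already benefits from the knowledge added by earlier ones, which is the source of the speed-up claimed above.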

In another embodiment, step 135 includes steps 21 to 23.

Step 21: according to the reassigned bacterial states, calculate the active power deviation between each source task in the initial knowledge matrix and the new task.

The active power deviation serves as the measure of similarity between a source task and the new task, and the active power demand is divided, from small to large, into multiple load sections:

$[P_{Ds1}, P_{Ds2}),\ [P_{Ds2}, P_{Ds3}),\ \ldots,\ [P_{Ds(i-1)}, P_{Dsi}),\ \ldots,\ [P_{Ds(n-1)}, P_{Dsn})$

Step 22: sort the source tasks by similarity to the new task, from most to least similar (i.e. by increasing active power deviation), and take the first preset number of source tasks. The preset number is not limited; in this embodiment it is two.

Step 23: update the initial knowledge matrix according to the selected source tasks to obtain the updated knowledge matrix.

Taking two source tasks as an example, the contribution coefficients of the two source tasks to transfer learning are calculated first, and the initial knowledge matrix is then updated with these coefficients to obtain the knowledge matrix of the new task.

Specifically, suppose the active power demand of new task x is P_Dx, and P_Dj and P_Dk are the two source-task section loads closest to that of task x, with P_Dj < P_Dx < P_Dk. The contribution coefficients η_1 and η_2 of the source tasks P_Dj and P_Dk to transfer learning are then:

$\left\{\begin{matrix} \eta_{1} = \dfrac{P_{Dx} - P_{Dj}}{P_{Dk} - P_{Dj}} \\ \eta_{2} = \dfrac{P_{Dk} - P_{Dx}}{P_{Dk} - P_{Dj}} \end{matrix}\right.$

Using linear transfer, the knowledge matrix of new task x is obtained as:

$Q_{x}^{i} = \eta_{1} Q_{j}^{i} + \eta_{2} Q_{k}^{i}$
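The transfer step can be sketched as follows, assuming the pre-learned source knowledge is stored as a mapping from each source section's active power demand to its sub-matrix. The code picks the nearest source load strictly below and strictly above P_Dx and applies the coefficients η_1 and η_2 exactly as written in the equations above; the storage format and the name `transfer_knowledge` are hypothetical.

```python
def transfer_knowledge(sources, p_dx):
    """Initial Q^i for new task x from the two source sections bracketing P_Dx.

    sources: {P_D of a source section: its learned sub-matrix (list of rows)}.
    Applies eta_1 = (P_Dx - P_Dj)/(P_Dk - P_Dj), eta_2 = (P_Dk - P_Dx)/(P_Dk - P_Dj)
    and Q_x = eta_1 * Q_j + eta_2 * Q_k as written in the text.
    """
    lower = [p for p in sources if p < p_dx]
    upper = [p for p in sources if p > p_dx]
    if not lower or not upper:
        raise ValueError("P_Dx must lie strictly between two source loads")
    p_dj, p_dk = max(lower), min(upper)        # closest source loads on each side
    eta1 = (p_dx - p_dj) / (p_dk - p_dj)
    eta2 = (p_dk - p_dx) / (p_dk - p_dj)
    q_j, q_k = sources[p_dj], sources[p_dk]
    return [[eta1 * a + eta2 * b for a, b in zip(rj, rk)]
            for rj, rk in zip(q_j, q_k)]


sources = {100.0: [[1.0, 0.0]], 200.0: [[0.0, 1.0]], 300.0: [[5.0, 5.0]]}
print(transfer_knowledge(sources, 125.0))  # [[0.25, 0.75]]
```

Since η_1 + η_2 = 1, the transferred matrix is a convex combination of the two source matrices; the farther source at 300 does not contribute at all.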

Transferring knowledge that is highly similar to the new task, i.e. the section information of the source tasks whose load demand is closest to that of the new task, prevents invalid knowledge from degrading the quality and speed of learning on the new task and improves calculation accuracy.

It will be understood that, in one embodiment, the initial knowledge matrix may first be reduced in dimension to obtain multiple sub-knowledge matrices, which are then updated with knowledge highly similar to the new task to obtain the updated knowledge matrix.

Step 136: determine whether the iterative update satisfies a preset condition.

The type of preset condition is not limited. In this embodiment, the preset condition is that k > k_max, or that the 2-norm of the change in the knowledge matrix between two consecutive iterations, ‖Q_{k+1} − Q_k‖₂, falls below a preset threshold. Here k_max is the preset maximum number of iterations, and the 2-norm reflects how far the knowledge matrix deviates between the two iterations.
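Step 136's stopping rule can be sketched as below, assuming the 2-norm criterion compares the knowledge matrices of two consecutive iterations against a threshold `tol` (a hypothetical name and value). The sketch measures the change with the Frobenius (entry-wise 2-) norm as a stand-in for the matrix 2-norm, which would require a singular value computation.

```python
import math


def converged(q_new, q_old, k, k_max, tol):
    """Stop when k > k_max or the change between consecutive matrices is below tol.

    The change uses the Frobenius (entry-wise 2-) norm, an assumed stand-in
    for the matrix 2-norm; tol is a hypothetical threshold.
    """
    if k > k_max:
        return True
    change = math.sqrt(sum((a - b) ** 2
                           for row_new, row_old in zip(q_new, q_old)
                           for a, b in zip(row_new, row_old)))
    return change < tol


print(converged([[1.0, 2.0]], [[1.0, 2.0]], k=5, k_max=100, tol=1e-6))    # True
print(converged([[1.0, 2.0]], [[0.0, 2.0]], k=5, k_max=100, tol=1e-6))    # False
print(converged([[1.0, 2.0]], [[0.0, 2.0]], k=101, k_max=100, tol=1e-6))  # True
```

The iteration cap guarantees termination even when the knowledge matrix keeps fluctuating above the tolerance.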

判断迭代更新是否满足预设条件，若否，则将更新后的知识矩阵作为初始知识矩阵，并返回步骤131，再次对知识矩阵进行更新；若是，则迭代更新结束，将最终得到的知识矩阵作为新任务优化所需的优化矩阵。Judging whether the iterative update meets the preset conditions, if not, then use the updated knowledge matrix as the initial knowledge matrix, and return to step 131, and update the knowledge matrix again; if so, the iterative update ends, and the finally obtained knowledge matrix is used as The optimization matrix needed for new task optimization.

步骤S140：根据风险调度目标函数值最小时对应的更新后的知识矩阵进行新任务在线优化，得到风险调度优化结果并输出。Step S140: Perform online optimization of new tasks according to the updated knowledge matrix corresponding to the minimum value of the risk scheduling objective function, and obtain and output the risk scheduling optimization result.

在对初始知识矩阵进行迭代更新结束后，将风险调度目标函数值最小时对应的更新后的知识矩阵作为优化矩阵对新任务进行在线优化，得到风险调度优化结果并输出。输出风险调度优化结果的具体方式并不唯一，可以是输出至存储器进行存储，也可以是输出至显示器进行显示。After the iterative update of the initial knowledge matrix, the updated knowledge matrix corresponding to the minimum value of the risk scheduling objective function is used as the optimization matrix to optimize the new task online, and the risk scheduling optimization result is obtained and output. The specific way of outputting the risk scheduling optimization result is not unique, it may be output to a memory for storage, or it may be output to a display for display.

此外，在一个实施例中，步骤S130之前，电力系统风险调度方法还包括步骤110。In addition, in one embodiment, before step S130, the power system risk scheduling method further includes step 110.

步骤110：接收源任务进行训练，得到最优知识矩阵作为初始知识矩阵。Step 110: Receive the source task for training, and obtain the optimal knowledge matrix as the initial knowledge matrix.

步骤110可以是在步骤S120之前，也可以是在步骤S120之后。TBFO算法在预学习阶段执行一系列的源任务以获取最优知识矩阵，并从中挖掘出初始知识，为将来相关的新任务做好准备。如图4所示，来自源任务的相关初始知识将用于在线优化中，根据源任务与新任务间的相似性，源任务Q_S的初始知识矩阵将迁移为新任务Q_N的初始知识矩阵。Step 110 may be before step S120 or after step S120. The TBFO algorithm performs a series of source tasks in the pre-learning stage to obtain the optimal knowledge matrix, and mines the initial knowledge from it to prepare for future related new tasks. As shown in Figure 4, the relevant initial knowledge from the source task will be used in online optimization. According to the similarity between the source task and the new task, the initial knowledge matrix of the source task _QS will be transferred to the initial knowledge matrix of the new task _QN .

为了便于更好地理解上述电力系统风险调度方法，下面结合具体实施例进行详细解释说明。In order to facilitate a better understanding of the above power system risk scheduling method, a detailed explanation will be given below in conjunction with specific embodiments.

将某一可靠性测试系统作为风险调度的仿真对象。选取系统基准容量为100MVA，系统中共有24条母线节点、34条传输线/变压器和32台发电机，其拓扑结构如图5所示。在全部10个发电机节点中，将单机容量最大的发电机节点21定为全系统的平衡节点，其余9个节点为PV(电压控制)节点。A certain reliability test system is taken as the simulation object of risk scheduling. The base capacity of the system is selected as 100MVA. There are 24 busbar nodes, 34 transmission lines/transformers and 32 generators in the system. Its topology is shown in Figure 5. Among all 10 generator nodes, the generator node 21 with the largest stand-alone capacity is designated as the balance node of the whole system, and the remaining 9 nodes are PV (voltage control) nodes.

为测试算法对不同负荷水平进行优化的适应性，本实施例进行96个断面的风险调度优化仿真。本实施例中，选取了典型日负荷曲线，并按照时序每隔十五分钟划分一个断面，得到断面1至断面96。In order to test the adaptability of the algorithm to optimize different load levels, this embodiment conducts risk scheduling optimization simulations for 96 sections. In this embodiment, a typical daily load curve is selected, and a section is divided every fifteen minutes according to the time sequence, and section 1 to section 96 are obtained.

基于以上平台，按照以下步骤进行风险调度优化。Based on the above platform, follow the steps below to optimize risk scheduling.

(1)选取PV节点处的发电机有功出力P_G为控制变量，动作变量空间A(A_PG1，A_PG2，…，A_PGi)与控制变量空间是一一对应的，i为PV节点上机组总数。前一个变量的动作空间即为下一个变量的状态空间。与各变量状态-动作空间对应的子知识矩阵分别为Q^PG1，Q^PG2，…，Q^PGi。细菌在知识矩阵的指导下，通过趋向性操作、迁徙性操作和复制性操作获取知识。(1) Select the active power output _PG of the generator at the PV node as the control variable, the action variable space A (A _PG1 , A _PG2 ,..., A _PGi ) is in one-to-one correspondence with the control variable space, and i is the unit on the PV node total. The action space of the previous variable is the state space of the next variable. The sub-knowledge matrices corresponding to each variable state-action space are Q ^PG1 , Q ^PG2 , . . . , Q ^PGi . Under the guidance of knowledge matrix, bacteria acquire knowledge through tropism operation, migration operation and replication operation.

(2)风险调度的目标函数的计算依赖于非线性潮流计算。若使用标准BFO算法，设N_ed、N_re和N_c分别表示迁徙、复制和趋向行为的操作数，最大游动次数为N_s，预想故障集包含故障N_p个，则潮流计算次数可多达N_edN_reN_cN_s(N_p+1)次，使得求解过程极为缓慢。通过对算法寻优模式的改进，去除了原算法的嵌套循环，提高了算法的效率。细菌群结合BFO算法的随机搜索模式与概率空间动作选择策略执行动作选择。(2) The calculation of the objective function of risk scheduling depends on nonlinear power flow calculation. If the standard BFO algorithm is used, let N _ed , N _re and N _c denote the operands of migration, replication and trend behavior respectively, the maximum number of walks is N _s , and the expected fault set contains N _p faults, then the number of power flow calculations can be more Up to N _ed N _re N _c N _s (N _p +1) times, making the solution process extremely slow. By improving the optimization mode of the algorithm, the nested loop of the original algorithm is removed, and the efficiency of the algorithm is improved. The bacterial swarm combines the random search mode of the BFO algorithm with the probability space action selection strategy to perform action selection.

(3) Units at the same node with the same fuel cost coefficient are grouped into one control variable, so the active power outputs of the 31 units form 13 control variables. The output of each unit serves as the state space of the next unit, and the state space of the first unit is the active power of the current section. This reduces the dimension of the knowledge matrix.
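The grouping rule in step (3) can be sketched as keying units on (node, fuel cost coefficient). The sketch assumes each unit is described by a tuple (unit_id, bus, cost_coeff) and that equal coefficients are stored as identical values, so exact equality is safe. The unit data in the test is made up for illustration and is not the 31-unit system of the embodiment.

```python
def group_units(units):
    """Group generating units into control variables.

    units: iterable of (unit_id, bus, cost_coeff) tuples. Units at the same
    bus with identical fuel cost coefficients share one control variable.
    Returns a list of unit-id lists in first-seen order (dicts keep
    insertion order in Python 3.7+).
    """
    groups = {}
    for unit_id, bus, cost in units:
        groups.setdefault((bus, cost), []).append(unit_id)
    return list(groups.values())
```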

(4) In the TBFO algorithm, the immediate reward indicates the direction of optimization. The bacterial swarm obtains the optimal policy by iteratively optimizing the knowledge matrix so as to maximize the expected cumulative reward.

(5) The active power deviation is used as the measure of similarity between a source task and the new task, and the active power demand is divided, from small to large, into multiple load sections.

To keep irrelevant knowledge from interfering with the quality and speed of learning on the new task, learning should rely as much as possible on knowledge that is highly similar to the new task. In this embodiment, only the two source-task sections closest to the load demand of the new task are used for transfer. Suppose the active power demand of new task x is P_Dx, and P_Di and P_Dk are the loads of the two source-task sections closest to task x, with P_Di < P_Dx < P_Dk. The contribution coefficients of the two source tasks to transfer learning are then computed, and the knowledge matrix of new task x is obtained by linear transfer.
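The text does not reproduce the formula for the contribution coefficients. A natural reading of "linear transfer" between two bracketing sections is linear interpolation on load, and that is what the sketch below assumes:

η₁ = (P_Dk − P_Dx)/(P_Dk − P_Di), η₂ = (P_Dx − P_Di)/(P_Dk − P_Di), Q_x = η₁·Q_i + η₂·Q_k.

Treat this as a hypothesis, not the patent's exact formula. `transfer_q` is a hypothetical helper name.

```python
import numpy as np


def transfer_q(p_dx, p_di, q_i, p_dk, q_k):
    """Blend two source-task knowledge matrices by (assumed) linear interpolation.

    Requires P_Di < P_Dx < P_Dk as in the text. The nearer source section
    receives the larger weight, and the weights sum to 1.
    """
    if not p_di < p_dx < p_dk:
        raise ValueError("expected P_Di < P_Dx < P_Dk")
    span = p_dk - p_di
    eta1 = (p_dk - p_dx) / span  # weight of the lower source section P_Di
    eta2 = (p_dx - p_di) / span  # weight of the upper source section P_Dk
    q_x = eta1 * np.asarray(q_i, dtype=float) + eta2 * np.asarray(q_k, dtype=float)
    return eta1, eta2, q_x
```

With P_Di = 100, P_Dx = 125 and P_Dk = 200 (illustrative values), the lower section gets weight 0.75 and the upper one 0.25.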

After the iterative update of the initial knowledge matrix is complete, the updated knowledge matrix corresponding to the minimum risk scheduling objective value is used as the optimization matrix for online optimization of the new task, and the risk scheduling optimization result is obtained and output.

The power system risk scheduling method above uses the optimal knowledge matrix of the source tasks as the initial matrix of the new task to achieve knowledge transfer, and optimizes the new task online with knowledge-transfer-based bacterial foraging reinforcement learning. Transfer learning greatly increases the speed of online learning and enables online dynamic optimization of the risk scheduling problem. Because solution speed remains high as the problem grows, the method suits rapid optimization of large-scale, complex risk scheduling.

In one embodiment, a power system risk scheduling system, as shown in FIG. 6, includes a task data acquisition module 120, a knowledge matrix update module 130, and a risk scheduling optimization module 140.

The task data acquisition module 120 acquires the architecture data of the power system and the load section data of the new task. The architecture data may include bus nodes, transmission lines, transformers, generators, and similar data. The new-task load section data includes one or more load sections. Both are used in the subsequent risk scheduling optimization.

The knowledge matrix update module 130 iteratively updates a preset initial knowledge matrix with the bacterial foraging reinforcement learning algorithm, based on the architecture data and the new-task load section data, to obtain the corresponding risk scheduling objective value and the updated knowledge matrix.

The initial knowledge matrix is the optimal knowledge matrix of the source tasks. Using it as the initial matrix of the new task achieves knowledge transfer. The bacterial swarm then selects actions by combining the random search mode of the bacterial foraging optimization algorithm with a probability-space action selection strategy, so that the new task is optimized online with the TBFO algorithm.

The initial knowledge matrix is not limited to a particular type; in this embodiment it is a Q matrix. The Q matrix serves as the knowledge matrix that records the swarm's optimization information. By analyzing the similarity between optimization tasks, the knowledge matrix of the source tasks is used to form the initial knowledge matrix of the new task. Tasks at different time sections are thus optimized online and dynamically through knowledge transfer, which ensures reliable optimization.

In one embodiment, the knowledge matrix update module 130 includes a first, a second, a third, a fourth, a fifth, and a sixth processing unit.

Based on the architecture data and the new-task load section data, the first processing unit controls the bacteria to perform tropism, migration, and replication operations under the guidance of the initial knowledge matrix.

Guided by the initial knowledge matrix, the bacteria acquire knowledge through tropism, migration, and replication operations. Specifically, the bacteria are ranked by energy value, and the dominant individuals in the swarm are placed in the tropism state, where they continue to perform local search. Their tropism behavior can be expressed by the following formula:

When the migration condition is met, a bacterium selects action a_S by pseudo-random roulette-wheel selection according to the action probability matrix P^i, which is updated as follows:
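The tropism formula and the P^i update formula referred to above are not reproduced in this text. The sketch below is therefore built on stated assumptions.

- Energy ranking: a higher energy value means a better bacterium, and a fixed fraction `elite_frac` of the top-ranked bacteria act greedily on Q^i, standing in for local search in the tropism state.
- Migration: the remaining bacteria draw their action by roulette wheel from their row of P^i.
- Probability update: P^i is reinforced with a learning-automaton-style rule, P(a_g) ← P(a_g) + β(1 − P(a_g)) for the greedy action and P(a) ← P(a)(1 − β) for the others. This is a common choice, not confirmed as the patent's.

All function and parameter names are hypothetical.

```python
import numpy as np


def select_actions(energies, p_rows, q_rows, elite_frac, rng):
    """Pick one action per bacterium for a single control variable.

    energies: shape (n_bact,), higher is assumed better.
    p_rows, q_rows: shape (n_bact, n_actions), the rows of P^i and Q^i at
    each bacterium's current state (each P row must sum to 1).
    """
    n = len(energies)
    order = np.argsort(energies)[::-1]  # best energy first
    n_elite = max(1, int(round(elite_frac * n)))
    actions = np.empty(n, dtype=int)
    for rank, b in enumerate(order):
        if rank < n_elite:
            actions[b] = int(np.argmax(q_rows[b]))  # greedy local search
        else:
            actions[b] = int(rng.choice(p_rows.shape[1], p=p_rows[b]))  # roulette
    return actions


def update_probability_row(p_row, greedy_action, beta):
    """Shift probability mass toward the greedy action; the row still sums to 1."""
    new_row = np.asarray(p_row, dtype=float) * (1.0 - beta)
    new_row[greedy_action] += beta
    return new_row
```

Because every entry is scaled by (1 − β) and β is added back to one entry, the updated row remains a valid probability distribution.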

In one embodiment, a crossover process is introduced into the replication operation as follows:
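The crossover formula itself is not reproduced in this text. As a stand-in, the sketch below shows a standard single-point crossover between the action vectors of two parent bacteria during replication. The choice of single-point crossover, and representing each bacterium by a 1-D vector of action indices, are assumptions and may differ from the patent's scheme.

```python
import numpy as np


def single_point_crossover(parent_a, parent_b, rng):
    """Swap the tails of two equal-length action vectors at a random cut.

    The cut is drawn from 1..n-1, so each child inherits at least one
    gene from each parent.
    """
    a = np.asarray(parent_a)
    b = np.asarray(parent_b)
    if a.shape != b.shape or a.ndim != 1 or a.size < 2:
        raise ValueError("parents must be 1-D vectors of equal length >= 2")
    cut = int(rng.integers(1, a.size))  # high bound is exclusive
    child1 = np.concatenate([a[:cut], b[cut:]])
    child2 = np.concatenate([b[:cut], a[cut:]])
    return child1, child2
```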

The second processing unit calculates the power flow of the power system in the base state and under the preset faults, based on the results of the tropism, migration, and replication operations.

After the tropism, migration, and replication operations are completed, the power flow of the power system in the base state and under the preset faults is calculated from their results. The base state is the state in which no fault has occurred in the system. The preset faults are not limited to particular types.

The third processing unit calculates the risk scheduling objective value from the power flow in the base state and under the preset faults.

In the TBFO algorithm, the immediate reward indicates the direction of optimization, and the bacterial swarm obtains the optimal policy by iteratively optimizing the knowledge matrix so as to maximize the expected cumulative reward. In the mathematical model of risk scheduling, the objective function is the reciprocal of the algorithm's reward function, and the optimization aims to minimize the objective function. In this embodiment, the reward function is designed as follows:
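The reward formula is not reproduced in this text, but the stated reciprocal relationship already fixes the idea: minimizing the objective f is equivalent to maximizing R = 1/f. The sketch below pairs that reward with a standard Q-learning update, Q(s,a) ← Q(s,a) + α[R + γ·max Q(s′,·) − Q(s,a)]. Several elements are assumptions for illustration:

- the Q-learning form of the knowledge update;
- the learning rate α and discount γ, and their default values;
- the requirement that f be positive.

```python
import numpy as np


def reward_from_objective(f: float) -> float:
    """Reward as the reciprocal of the (assumed positive) risk objective."""
    if f <= 0:
        raise ValueError("objective assumed positive")
    return 1.0 / f


def q_update(q, s, a, r, next_max, alpha=0.1, gamma=0.9):
    """One Q-learning step on knowledge matrix q (modified in place).

    next_max: max over actions of the next state's Q values (0 if terminal).
    """
    q[s, a] += alpha * (r + gamma * next_max - q[s, a])
    return q
```

Halving the objective doubles the reward, so lower-risk schedules reinforce their (state, action) entries more strongly.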

Once the risk scheduling objective value has been calculated, the fourth processing unit redistributes the bacterial states according to that value.

The fifth processing unit iteratively updates the initial knowledge matrix according to the redistributed bacterial states to obtain the updated knowledge matrix.

In one embodiment, the fifth processing unit includes a dimension reduction unit and a matrix update unit.

The dimension reduction unit reduces the dimension of the initial knowledge matrix to obtain multiple sub-knowledge matrices. The reduction uses knowledge extension: the initial knowledge matrix Q is divided into sub-knowledge matrices Q^i, one for each control variable.
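The size comparison below shows why splitting Q into chained sub-matrices Q^i helps. It assumes, for illustration only, that a single joint Q matrix would need one column per combination of all variables' actions. It also assumes that each Q^i has one row per action of the previous variable, since the previous variable's action space is the next variable's state space. The 10 load states and 11 actions per variable used below are hypothetical discretizations; only the count of 13 variables comes from the text.

```python
def joint_q_entries(n_load_states, actions_per_var):
    """Entries in one joint Q matrix: load states x product of all action counts."""
    total = n_load_states
    for m in actions_per_var:
        total *= m
    return total


def chained_q_entries(n_load_states, actions_per_var):
    """Total entries in the chain of sub-matrices Q^1, Q^2, ..., Q^n."""
    total, n_states = 0, n_load_states
    for m in actions_per_var:
        total += n_states * m
        n_states = m  # this variable's actions are the next variable's states
    return total
```

With 10 load states and 13 variables of 11 actions each, the chain needs 10·11 + 12·11·11 = 1,562 entries, whereas a joint table would need 10·11¹³, on the order of 10¹⁴.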

The matrix update unit updates the sub-knowledge matrices according to the redistributed bacterial states. The updated sub-knowledge matrices together form the updated knowledge matrix.

In another embodiment, the fifth processing unit includes a calculation unit, an extraction unit, and an update unit.

Based on the redistributed bacterial states, the calculation unit calculates the active power deviation between each source task in the initial knowledge matrix and the new task. The active power deviation is used as the measure of similarity between a source task and the new task, and the active power demand is divided, from small to large, into multiple load sections:

The extraction unit ranks the source tasks by this similarity measure, from most to least similar to the new task, and selects the first preset number of them. The preset number is not fixed; in this embodiment it is two.
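A minimal sketch of the extraction step, assuming that the deviation is the absolute difference |P_Ds − P_Dx| between a source section's load and the new task's load, so that the most similar sources are those with the smallest deviation. The exact deviation formula is not reproduced in this text, and `nearest_source_tasks` is a hypothetical name.

```python
def nearest_source_tasks(p_dx, source_loads, k=2):
    """Return the k source-task loads with the smallest |P_Ds - P_Dx|.

    Ties keep their original order because Python's sort is stable.
    """
    return sorted(source_loads, key=lambda p: abs(p - p_dx))[:k]
```

Note that the two nearest sections do not always lie on opposite sides of P_Dx. The linear transfer described earlier assumes P_Di < P_Dx < P_Dk, so an implementation may instead need to pick the nearest section on each side.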

The update unit updates the initial knowledge matrix from the selected source tasks to obtain the updated knowledge matrix.

Taking a matrix update from two source tasks as an example, the contribution coefficients of the two source tasks to transfer learning are calculated first. The initial knowledge matrix is then updated with these coefficients to obtain the knowledge matrix of the new task. The contribution coefficients η₁ and η₂ of the two source tasks P_Di and P_Dk can be calculated by the following formula:

Transferring only knowledge that is highly similar to the new task, namely the source-task section information closest to the new task's load demand, keeps irrelevant knowledge from interfering with the quality and speed of learning on the new task and improves calculation accuracy.

In a further embodiment, the fifth processing unit includes both a dimension reduction unit and a matrix update unit, and the matrix update unit in turn includes the calculation unit, the extraction unit, and the update unit. The initial knowledge matrix is first reduced into multiple sub-knowledge matrices. These are then updated with the knowledge most similar to the new task to obtain the updated knowledge matrix.

The sixth processing unit determines whether the iterative update satisfies a preset condition. If it does not, the updated knowledge matrix is taken as the initial knowledge matrix. The first processing unit is then controlled again, based on the architecture data and the new-task load section data, to make the bacteria perform tropism, migration, and replication operations under its guidance.

The preset condition is not limited to a particular type; in this embodiment, it is k > k_max or … . If the condition is not satisfied, the updated knowledge matrix is used as the initial knowledge matrix for another round of iterative update. If it is satisfied, the iteration ends and the final knowledge matrix is used as the optimization matrix for the new task.
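The loop formed by the six processing units can be pictured end to end with the toy sketch below. Each bacterium walks the chain of sub-matrices, the objective is evaluated, the reward 1/f updates the visited entries, and the loop stops at the iteration cap. Several pieces are hypothetical substitutes:

- `toy_objective` stands in for the power-flow-based risk objective.
- Epsilon-greedy action choice replaces the patent's energy ranking and roulette selection.
- A Q-learning update is assumed.
- Only the k_max cap is implemented, because the second stopping criterion is not reproduced in the text.

This illustrates the control flow only and is not the patent's implementation.

```python
import numpy as np


def toy_objective(actions, target):
    """Hypothetical positive stand-in for the power-flow-based risk objective."""
    diff = np.asarray(actions) - np.asarray(target)
    return 1.0 + float(np.sum(diff ** 2))


def run_tbfo_sketch(q_list, load_state, target, n_bact=20, k_max=200,
                    alpha=0.1, gamma=0.9, eps=0.2, seed=0):
    """Iterate: select actions, evaluate, update sub-matrices, check the cap."""
    rng = np.random.default_rng(seed)
    best_f, best_actions = np.inf, None
    for _k in range(k_max):  # stop once the iteration count reaches k_max
        for _ in range(n_bact):
            # Walk the chain of sub-knowledge matrices (epsilon-greedy stand-in).
            state, path = load_state, []
            for q in q_list:
                if rng.random() < eps:
                    a = int(rng.integers(q.shape[1]))
                else:
                    a = int(np.argmax(q[state]))
                path.append((state, a))
                state = a  # this action is the next variable's state
            actions = [a for _, a in path]
            f = toy_objective(actions, target)
            r = 1.0 / f  # reward as reciprocal of the objective
            # Q-learning style update of every visited (state, action) pair.
            for i, (s, a) in enumerate(path):
                nxt = q_list[i + 1][a].max() if i + 1 < len(q_list) else 0.0
                q_list[i][s, a] += alpha * (r + gamma * nxt - q_list[i][s, a])
            if f < best_f:
                best_f, best_actions = f, actions
    return best_f, best_actions, q_list
```

In the method described here, the returned knowledge matrices would play the role of the optimization matrix used for the online optimization of the new task.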

The risk scheduling optimization module 140 performs online optimization of the new task with the updated knowledge matrix corresponding to the minimum risk scheduling objective value, and obtains and outputs the risk scheduling optimization result.

In addition, in one embodiment, the power system risk scheduling system further includes a matrix training module.

Before the knowledge matrix update module 130 iteratively updates the preset initial knowledge matrix with the bacterial foraging reinforcement learning algorithm, the matrix training module receives source tasks for training and obtains the optimal knowledge matrix to serve as the initial knowledge matrix. In this pre-learning stage, the TBFO algorithm executes a series of source tasks to obtain the optimal knowledge matrices. It extracts initial knowledge from them in preparation for related new tasks.

The power system risk scheduling system above uses the optimal knowledge matrix of the source tasks as the initial matrix of the new task to achieve knowledge transfer, and optimizes the new task online with knowledge-transfer-based bacterial foraging reinforcement learning. Transfer learning greatly increases the speed of online learning and enables online dynamic optimization of the risk scheduling problem. Because solution speed remains high as the problem grows, the system suits rapid optimization of large-scale, complex risk scheduling.

The technical features of the above embodiments may be combined arbitrarily. For brevity, not every possible combination of these features is described. However, any combination that involves no contradiction should be regarded as within the scope of this specification.

The embodiments above express only several implementations of the present invention. Although they are described in specific detail, they should not be construed as limiting the scope of the patent. Those of ordinary skill in the art may make various modifications and improvements without departing from the concept of the invention, and these all fall within its protection scope. The protection scope of this patent is therefore defined by the appended claims.

Claims

1. A power system risk scheduling method, characterized by comprising the following steps:

Obtain the architecture data of the power system and the load section data of the new task;

According to the architecture data and the new task load section data, iteratively update a preset initial knowledge matrix through the bacterial foraging reinforcement learning algorithm to obtain the corresponding risk scheduling objective function value and the updated knowledge matrix, wherein the initial knowledge matrix is the optimal knowledge matrix in the source task;

According to the updated knowledge matrix corresponding to the minimum value of the risk scheduling objective function, optimize the new task online, and obtain and output the risk scheduling optimization result.

2. The power system risk scheduling method according to claim 1, wherein the initial knowledge matrix is a Q matrix.

3. The power system risk scheduling method according to claim 1, characterized in that the step of iteratively updating the preset initial knowledge matrix through the bacterial foraging reinforcement learning algorithm according to the architecture data and the new task load section data, to obtain the corresponding risk scheduling objective function value and the updated knowledge matrix, comprises the following steps:

According to the architecture data and the new task load section data, control the bacteria to perform tropism, migration and replication operations under the guidance of the initial knowledge matrix;

According to the tropism, migration and replication operations of the bacteria, calculate the power flow values of the power system in the base state and under the preset faults;

Calculate the value of the risk scheduling objective function according to the power flow values of the power system in the base state and under the preset faults;

Redistribute the bacterial states according to the risk scheduling objective function value;

Iteratively updating the initial knowledge matrix according to the redistributed bacterial state to obtain an updated knowledge matrix;

Judge whether the iterative update meets the preset condition;

If not, use the updated knowledge matrix as the initial knowledge matrix and return to the step of controlling the bacteria, according to the architecture data and the new task load section data, to perform tropism, migration and replication operations under the guidance of the initial knowledge matrix.

4. The power system risk scheduling method according to claim 3, wherein the step of iteratively updating the initial knowledge matrix according to the redistributed bacterial state to obtain an updated knowledge matrix comprises the following steps:

Perform dimensionality reduction on the initial knowledge matrix to obtain multiple sub-knowledge matrices;

Update the plurality of sub-knowledge matrices according to the redistributed bacterial states to obtain an updated knowledge matrix.

5. The power system risk scheduling method according to claim 1, characterized in that, before the step of iteratively updating the preset initial knowledge matrix through the bacterial foraging reinforcement learning algorithm according to the architecture data and the new task load section data to obtain the corresponding risk scheduling objective function value and the updated knowledge matrix, the method further comprises the following step:

The source task is received for training, and the optimal knowledge matrix is obtained as the initial knowledge matrix.

6. A power system risk scheduling system, characterized by comprising:

The task data acquisition module is used to acquire the architecture data of the power system and the load section data of the new task;

The knowledge matrix update module is used to iteratively update the preset initial knowledge matrix through the bacterial foraging reinforcement learning algorithm according to the architecture data and the new task load section data, to obtain the corresponding risk scheduling objective function value and the updated knowledge matrix, wherein the initial knowledge matrix is the optimal knowledge matrix in the source task;

The risk scheduling optimization module is used to perform online optimization of new tasks according to the updated knowledge matrix corresponding to the minimum value of the risk scheduling objective function, and obtain and output the risk scheduling optimization results.

7. The power system risk scheduling system according to claim 6, wherein the initial knowledge matrix is a Q matrix.

8. The power system risk scheduling system according to claim 6, wherein the knowledge matrix updating module comprises:

The first processing unit is used to control the bacteria to perform tropism, migration and replication operations under the guidance of the initial knowledge matrix according to the architecture data and the new task load section data;

The second processing unit is used to calculate the power flow value of the power system in the base state and preset fault according to the tropism operation, migration operation and replication operation of the bacteria;

The third processing unit is used to calculate the risk scheduling objective function value according to the power flow value of the power system in the base state and the preset fault;

The fourth processing unit is used to redistribute the bacterial states according to the value of the risk scheduling objective function;

The fifth processing unit is configured to iteratively update the initial knowledge matrix according to the redistributed bacterial state to obtain an updated knowledge matrix;

The sixth processing unit is used to judge whether the iterative update meets the preset condition and, when it does not, to use the updated knowledge matrix as the initial knowledge matrix and control the first processing unit again, according to the architecture data and the new task load section data, to make the bacteria perform tropism, migration and replication operations under the guidance of the initial knowledge matrix.

9. The power system risk scheduling system according to claim 8, wherein the fifth processing unit comprises:

A dimension reduction unit, configured to reduce the dimension of the initial knowledge matrix to obtain multiple sub-knowledge matrices;

A matrix updating unit, configured to update the plurality of sub-knowledge matrices according to the redistributed bacterial states to obtain updated knowledge matrices.

10. The power system risk scheduling system according to claim 6, further comprising a matrix training module, wherein the matrix training module is used to receive the source task for training and obtain the optimal knowledge matrix as the initial knowledge matrix, before the knowledge matrix update module iteratively updates the preset initial knowledge matrix through the bacterial foraging reinforcement learning algorithm according to the architecture data and the new task load section data to obtain the corresponding risk scheduling objective function value and the updated knowledge matrix.