CN114507881A

CN114507881A - Model-free self-learning and stable control method for electrolyte temperature in zinc electrolysis process

Info

Publication number: CN114507881A
Application number: CN202210277803.6A
Authority: CN
Inventors: 阳春华; 刘天豪; 周灿; 朱红求; 李勇刚; 李繁飙
Original assignee: Central South University
Current assignee: Central South University
Priority date: 2022-03-21
Filing date: 2022-03-21
Publication date: 2022-05-17
Anticipated expiration: 2042-03-21
Also published as: CN114507881B

Abstract

The embodiments of the present disclosure provide a model-free self-learning and stable control method for electrolyte temperature in a zinc electrolysis process, which belongs to the field of chemical technology. The method specifically includes: establishing an environment interaction model, a reward mechanism and a Q-table for a Q-learning algorithm, setting the target interval within which the electrolyte temperature must be controlled, and initializing the parameters required for updating the Q-table; defining the state space and action space of the electrolyte in the zinc electrolysis process; defining the Q-table, whose horizontal axis represents the selectable actions and whose vertical axis represents the state-space categories; updating the Q-table with the data generated by the interaction between the agent and the environment interaction model; and, from the updated Q-table, obtaining a stable control model for the electrolyte temperature in the zinc electrolysis process and using it to output the optimal cooling tower fan frequency for the current electrolyte state. The disclosed solution automatically keeps the zinc electrolyte temperature within the range required by the process and improves the control efficiency, adaptability and production stability of the zinc electrolysis process.

Description

Model-free self-learning and stable control method for electrolyte temperature in zinc electrolysis process

Technical field

The embodiments of the present disclosure relate to the field of chemical technology, and in particular to a model-free self-learning and stable control method for electrolyte temperature in a zinc electrolysis process.

Background

Zinc electrolysis is a key stage of the hydrometallurgical zinc production route: under direct current, zinc ions in the electrolyte migrate from the anode plate to the cathode plate and are deposited on the cathode as metallic zinc. Because this stage consumes a large amount of electricity, it largely determines the production cost of a zinc smelter. The electrolyte temperature is one of the key parameters in electrolysis process control and strongly affects the efficiency and quality of zinc deposition. To ensure efficient deposition, the electrolyte outlet temperature must be kept within a suitable range. An excessively high electrolyte temperature lowers the hydrogen evolution overpotential, which intensifies redissolution of the deposited zinc and reduces current efficiency; an excessively low electrolyte temperature also reduces zinc deposition efficiency and degrades the electrolysis result.

The electrolyte circulation system has a large circulation flow and circulates continuously. For a zinc smelter with an annual output of 300,000 tonnes, for example, the electrolyte volume in the electrolytic cells alone reaches thousands of cubic meters, and the solution circulates 24 hours a day without interruption, making it a typical large-scale circulation process system. Moreover, because the electrolyte is held in the open air, the ambient temperature strongly influences the electrolyte temperature; with pronounced day-night and seasonal changes in ambient temperature, it is very difficult to control the electrolyte temperature accurately across different time periods.

To keep the electrolyte temperature within a suitable range, enterprises currently install mechanical-draft cooling towers in the electrolyte circulation system, which cool the electrolyte by blowing low-temperature air from outside the workshop into the electrolyte circulation pipeline. Depending on whether the cooling tower fan is equipped with a variable-frequency drive, cooling tower fan control strategies fall into two types: variable-frequency and fixed-frequency. For a variable-frequency cooling tower, cooling performance is adjusted through the fan frequency; for a fixed-frequency cooling tower, it is adjusted through the fan running time.

According to industry standards, the suitable electrolyte temperature range is 35 to 40°C. Taking the variable-frequency cooling tower as an example: in a high-temperature environment, the fan frequency must be increased to raise the cooling performance; in a low-temperature environment, the fan frequency must be reduced to lower it. For a cooling tower without a variable-frequency drive the same principle applies: the running time is increased in hot seasons and reduced in cold seasons. In this way the electrolyte temperature is kept within the suitable range. The present method targets variable-frequency cooling towers: it adjusts the cooling performance by changing the cooling tower fan frequency and thereby regulates the electrolyte temperature.

At present, zinc electrolysis enterprises control cooling towers based on manual experience. When the day-night temperature difference is large, operators must adjust the cooling tower fan frequency frequently, which is labor-intensive. In addition, the electrolyte temperature is usually measured by hand, so the temperature feedback lags, which causes serious lag and instability in experience-based electrolyte temperature control.

There is therefore an urgent need for a model-free self-learning stable control method for electrolyte temperature in the zinc electrolysis process that automatically and effectively stabilizes the electrolyte temperature, keeps it within the range required by the process at all times, ensures that the zinc ions in the electrolyte are efficiently deposited at the cathode, and reduces the labor intensity of workers.

SUMMARY OF THE INVENTION

In view of this, the embodiments of the present disclosure provide a model-free self-learning and stable control method for electrolyte temperature in a zinc electrolysis process, which at least partially solves the problems of poor automation, adaptability, control efficiency and stability in the prior art.

The embodiments of the present disclosure provide a model-free self-learning and stable control method for electrolyte temperature in a zinc electrolysis process, including:

Step 1: establish the environment interaction model, reward mechanism and Q-table for the Q-learning algorithm, set the target interval within which the electrolyte temperature is to be controlled, and initialize the parameters required for updating the Q-table, the parameters including a discount factor, a learning rate and a random factor;

Step 2: define the state space and action space of the electrolyte in the zinc electrolysis process, the action space being the cooling tower fan frequency;

Step 3: define the Q-table, whose horizontal axis represents the selectable actions and whose vertical axis represents the state-space categories, wherein the state space comprises four variables, namely ambient dry-bulb temperature, ambient wet-bulb temperature, ambient relative humidity and electrolyte temperature, and the number of categories is the number of combinations of the levels into which these four variables are divided;

Step 4: update the Q-table with the data generated by the interaction between the agent and the environment interaction model;

Step 5: from the updated Q-table, obtain the stable control model for the electrolyte temperature in the zinc electrolysis process, and use the stable control model to output the optimal cooling tower fan frequency for the current electrolyte state.

According to a specific implementation of the embodiments of the present disclosure, the environment interaction model is built with a BP (back-propagation) neural network. Its inputs are the fan frequency f_tower, the ambient dry-bulb temperature T_dry, the ambient wet-bulb temperature T_wet, the ambient relative humidity RH and the electrolyte temperature T_elec, all at time t; its output is the electrolyte temperature T_elec at time t+1.

According to a specific implementation of the embodiments of the present disclosure, before step 4 the method further includes:

defining the upper and lower limits of the controlled temperature as a preset interval;

setting the control target for the electrolyte temperature during the interaction between the agent and the environment interaction model to lie within the preset interval.

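According to a specific implementation of the embodiments of the present disclosure, the reward mechanism is calculated as follows, where T' is the electrolyte temperature at the next moment:

$$
r=\begin{cases}0, & T_{\min}\le T'\le T_{\max}\\ -1, & \text{otherwise}\end{cases}
$$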
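The reward rule of step 4.5 (0 inside the target interval, -1 outside) can be sketched in Python as follows. The default bounds of 35 and 40°C are taken from the industry range quoted in the background and are only an illustrative assumption; in practice T_min and T_max are user settings, and the function name is hypothetical.

```python
def reward(t_next: float, t_min: float = 35.0, t_max: float = 40.0) -> int:
    """Reward for one transition: 0 if the next-moment electrolyte temperature
    T' lies in [t_min, t_max] (qualified iteration), otherwise -1 (failed)."""
    if t_min > t_max:
        raise ValueError("t_min must not exceed t_max")
    return 0 if t_min <= t_next <= t_max else -1
```

Both bounds are inclusive, matching the condition T_min ≤ T' ≤ T_max.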

According to a specific implementation of the embodiments of the present disclosure, step 4 specifically includes:

Step 4.1: set the initial state s and set step = 0;

Step 4.2, action selection: the random factor rand takes a random value; if rand > 0, select the action a with the largest Q value in state s; if rand = 0, randomly select a state from all states and select the action a with the largest Q value in that state;

Step 4.3: the agent inputs action a into the environment interaction model env and obtains the new state s' at the next moment;

Step 4.4: check all actions in s' and take the action with the largest Q value in s' as a';

Step 4.5: determine whether the current iteration is qualified: obtain the next-moment electrolyte temperature T' from the new state s'; if T_min ≤ T' ≤ T_max, the iteration is qualified, the reward is 0, and the process continues to step 4.6; otherwise the iteration has failed, the reward is -1, and the process returns to step 4.1;

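Step 4.6: update the Q value of the current action a using the following formula, where α is the learning rate, γ is the discount factor, r is the reward from step 4.5 and a' is the action with the largest Q value in s' from step 4.4:

$$
Q(s,a)\leftarrow Q(s,a)+\alpha\left[r+\gamma\,Q(s',a')-Q(s,a)\right]
$$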

Step 4.7: let s = s', a = a' and step = step + 1, return to step 4.1 and continue the loop. Steps 4.3 to 4.7 constitute one step, and the interaction between the agent and the environment interaction model env is the process of updating the Q-table step_max times;

Step 4.8: when step reaches the user-specified value step_max, the Q-table update ends.
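Steps 4.1 to 4.8 can be sketched as a tabular Q-learning loop. This is an illustrative reading under several assumptions, not the disclosed implementation: the update is the standard Q-learning rule; the -1 reward of a failed iteration is written into Q(s, a) as a terminal update before restarting at step 4.1 (otherwise no negative reward would ever reach the table); the random factor is compared with a threshold `epsilon`, where epsilon = 0 reproduces the literal rand = 0 condition; ties between equal Q values are broken at random; the step counter is not reset on restart, so the loop ends after step_max iterations; and the next action is re-selected in step 4.2, so a' only enters the update target. The dynamics in `toy_step` are a hypothetical stand-in for the BP environment model.

```python
import random
from typing import Callable, List, Tuple

QTable = List[List[float]]


def greedy(row: List[float], rng: random.Random) -> int:
    """Index of the largest Q value in a row; ties broken at random (assumption)."""
    best = max(row)
    return rng.choice([i for i, v in enumerate(row) if v == best])


def select_action(q: QTable, s: int, rng: random.Random, epsilon: float) -> int:
    """Step 4.2: if rand > epsilon take the greedy action of state s; otherwise
    draw a random state and take that state's greedy action, as the text states."""
    rand = rng.random()
    row = q[s] if rand > epsilon else q[rng.randrange(len(q))]
    return greedy(row, rng)


def train(env_step: Callable[[int, int], Tuple[int, float]],
          reset: Callable[[random.Random], int],
          n_states: int, n_actions: int, t_min: float, t_max: float,
          alpha: float = 0.1, gamma: float = 0.9, epsilon: float = 0.1,
          step_max: int = 5000, seed: int = 0) -> QTable:
    rng = random.Random(seed)
    q = [[0.0] * n_actions for _ in range(n_states)]  # every Q value starts at 0
    s = reset(rng)                                     # step 4.1: initial state
    for _ in range(step_max):                          # step 4.8: stop after step_max
        a = select_action(q, s, rng, epsilon)          # step 4.2
        s_next, t_next = env_step(s, a)                # step 4.3: env predicts T'
        if not (t_min <= t_next <= t_max):             # step 4.5: failed, reward -1
            q[s][a] += alpha * (-1.0 - q[s][a])        # assumption: terminal update
            s = reset(rng)                             # restart at step 4.1
            continue
        target = 0.0 + gamma * max(q[s_next])          # steps 4.4/4.6: r = 0
        q[s][a] += alpha * (target - q[s][a])
        s = s_next                                     # step 4.7
    return q


# Hypothetical toy dynamics standing in for the BP environment model:
# state index = integer T_elec - 30 (30..45 degC); fan level a in {0, 1, 2}
# removes a degC per step while the process adds 1 degC per step.
def toy_step(s: int, a: int) -> Tuple[int, float]:
    t_next = float(s + 30 + 1 - a)
    return min(max(int(t_next) - 30, 0), 15), t_next


def toy_reset(rng: random.Random) -> int:
    return rng.randint(35, 40) - 30   # start somewhere inside 35..40 degC


q = train(toy_step, toy_reset, n_states=16, n_actions=3, t_min=35.0, t_max=40.0)
```

Under these toy dynamics, actions that push the temperature out of the interval (no cooling at 40°C, strong cooling at 35°C) should end up with negative Q values, while the action that holds the temperature keeps a Q value of 0.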

The model-free self-learning and stable control scheme for electrolyte temperature in a zinc electrolysis process in the embodiments of the present disclosure includes: step 1, establishing the environment interaction model, reward mechanism and Q-table for the Q-learning algorithm, setting the target interval within which the electrolyte temperature is to be controlled, and initializing the parameters required for updating the Q-table, the parameters including a discount factor, a learning rate and a random factor; step 2, defining the state space and action space of the electrolyte in the zinc electrolysis process, the action space being the cooling tower fan frequency; step 3, defining the Q-table, whose horizontal axis represents the selectable actions and whose vertical axis represents the state-space categories, the state space comprising the four variables ambient dry-bulb temperature, ambient wet-bulb temperature, ambient relative humidity and electrolyte temperature, and the number of categories being the number of combinations of the levels of these four variables; step 4, updating the Q-table with the data generated by the interaction between the agent and the environment interaction model; and step 5, obtaining from the updated Q-table the stable control model for the electrolyte temperature in the zinc electrolysis process, and using it to output the optimal cooling tower fan frequency for the current electrolyte state.

The beneficial effects of the embodiments of the present disclosure are as follows. The disclosed solution takes the electrolyte temperature, ambient dry-bulb temperature, ambient wet-bulb temperature and ambient relative humidity as the electrolyte temperature state space of the zinc electrolysis process, sets the cooling tower fan frequency as the action space, updates the Q-table with the data generated by the interaction between the agent and the environment interaction model, and finally obtains a stable control model for the electrolyte temperature in the zinc electrolysis process. This solves the problem of unstable electrolyte temperature control in the prior art caused by the large, uninterrupted electrolyte circulation and changeable ambient conditions, keeps the zinc electrolyte temperature within the range required by the process at all times, and improves the production stability of the zinc electrolysis process.

Description of drawings

To explain the technical solutions of the embodiments of the present disclosure more clearly, the drawings used in the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present disclosure; a person of ordinary skill in the art can derive other drawings from them without creative effort.

FIG. 1 is a schematic flowchart of a model-free self-learning and stable control method for electrolyte temperature in a zinc electrolysis process according to an embodiment of the present disclosure;

FIG. 2 is a schematic diagram of an electrolyte environment interaction model fitted with a BP network according to an embodiment of the present disclosure;

FIG. 3 is another schematic diagram, showing the interaction between the agent and the environment interaction model, according to an embodiment of the present disclosure;

FIG. 4 is a flowchart of the model-free self-learning and stable control method for electrolyte temperature in a zinc electrolysis process according to Embodiment 1 of the present disclosure;

FIG. 5 is a flowchart of the model-free self-learning and stable control method for electrolyte temperature in a zinc electrolysis process according to Embodiment 2 of the present disclosure;

FIG. 6 is a schematic comparison of the control performance of the method of Embodiment 2 of the present invention and of manual experience-based control, according to an embodiment of the present disclosure.

Detailed description of the embodiments

The embodiments of the present disclosure are described in detail below with reference to the accompanying drawings.

The embodiments of the present disclosure are described below by way of specific examples, and those skilled in the art can readily understand other advantages and effects of the present disclosure from the content disclosed in this specification. Obviously, the described embodiments are only some, rather than all, of the embodiments of the present disclosure. The present disclosure may also be implemented or applied through other, different specific embodiments, and the details in this specification may be modified or changed based on different viewpoints and applications without departing from the spirit of the present disclosure. It should be noted that, where no conflict arises, the following embodiments and the features therein may be combined with one another. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present disclosure without creative effort fall within the scope of protection of the present disclosure.

It is noted that various aspects of embodiments within the scope of the appended claims are described below. It should be apparent that the aspects described herein may be embodied in a wide variety of forms and that any specific structure and/or function described herein is illustrative only. Based on this disclosure, those skilled in the art should appreciate that an aspect described herein may be implemented independently of any other aspect and that two or more of these aspects may be combined in various ways. For example, an apparatus may be implemented and/or a method may be practiced using any number of the aspects set forth herein. Additionally, such an apparatus may be implemented and/or such a method may be practiced using other structure and/or functionality in addition to one or more of the aspects set forth herein.

It should also be noted that the drawings provided in the following embodiments only illustrate the basic concept of the present disclosure schematically. They show only the components related to the present disclosure and are not drawn according to the number, shape and size of the components in an actual implementation; in an actual implementation, the type, quantity and proportion of each component may be changed arbitrarily, and the component layout may be more complex.

Additionally, in the following description, specific details are provided to facilitate a thorough understanding of the examples. However, one skilled in the art will understand that the described aspects may be practiced without these specific details.

Embodiments of the present disclosure provide a model-free self-learning and stable control method for electrolyte temperature in a zinc electrolysis process, which can be applied to the stable control of electrolyte temperature in zinc electrolysis in chemical and metallurgical scenarios.

FIG. 1 is a schematic flowchart of a model-free self-learning and stable control method for electrolyte temperature in a zinc electrolysis process according to an embodiment of the present disclosure. As shown in FIG. 1, the method mainly includes the following steps:

Optionally, as shown in FIG. 2, the environment interaction model may be built with a BP neural network whose inputs are the fan frequency f_tower, the ambient dry-bulb temperature T_dry, the ambient wet-bulb temperature T_wet, the ambient relative humidity RH and the electrolyte temperature T_elec, all at time t, and whose output is the electrolyte temperature T_elec at time t+1.

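Optionally, the reward mechanism is calculated as follows, where T' is the electrolyte temperature at the next moment:

$$
r=\begin{cases}0, & T_{\min}\le T'\le T_{\max}\\ -1, & \text{otherwise}\end{cases}
$$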

In a specific implementation, the environment interaction model env and the reward mechanism r used by the Q-learning control algorithm can be established, the target interval C = {T_min, T_max} for the electrolyte temperature can be set, and the parameters required for updating the Q-table, such as the discount factor γ, the learning rate α and the random factor rand, can be initialized. Specifically, env is built with a BP neural network whose inputs are the fan frequency f_tower, the ambient dry-bulb temperature T_dry, the ambient wet-bulb temperature T_wet, the ambient relative humidity RH and the electrolyte temperature T_elec at time t, and whose output is the electrolyte temperature T_elec at time t+1.
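A BP network of this kind is a multilayer perceptron trained by back-propagation. The sketch below is a minimal pure-Python version, assuming one tanh hidden layer, a linear output, squared-error loss and plain stochastic gradient descent; the class name `BPEnvModel`, the hidden size and the learning rate are hypothetical choices, not values from the disclosure. The five inputs are assumed to be scaled to roughly [0, 1] beforehand (for example by min-max normalization), since raw readings such as a fan frequency in hertz would saturate the tanh units.

```python
import math
import random
from typing import Sequence


class BPEnvModel:
    """Hypothetical one-hidden-layer BP network:
    inputs  x = (f_tower, T_dry, T_wet, RH, T_elec) at time t, pre-scaled to ~[0, 1]
    output  y = predicted T_elec at time t+1 (in the same scaled units)."""

    def __init__(self, n_in: int = 5, n_hidden: int = 8, lr: float = 0.05, seed: int = 0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0
        self.lr = lr

    def _forward(self, x: Sequence[float]):
        # tanh hidden layer followed by a single linear output unit
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        y = sum(w * hj for w, hj in zip(self.w2, h)) + self.b2
        return h, y

    def predict(self, x: Sequence[float]) -> float:
        return self._forward(x)[1]

    def train_step(self, x: Sequence[float], target: float) -> float:
        """One back-propagation / SGD step on the loss 0.5 * (y - target)^2."""
        h, y = self._forward(x)
        err = y - target                                   # dLoss/dy
        # Hidden-layer deltas use the output weights before they are updated.
        delta_h = [err * w * (1.0 - hj * hj) for w, hj in zip(self.w2, h)]
        for j, hj in enumerate(h):
            self.w2[j] -= self.lr * err * hj
        self.b2 -= self.lr * err
        for j, dj in enumerate(delta_h):
            for i, xi in enumerate(x):
                self.w1[j][i] -= self.lr * dj * xi
            self.b1[j] -= self.lr * dj
        return 0.5 * err * err
```

Once fitted on historical plant data, `predict` could play the role of env in step 4.3: the agent supplies f_tower, the current ambient and electrolyte measurements are appended, and the (de-scaled) prediction gives T' for the reward check.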

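The reward mechanism r is calculated as follows, where T' is the electrolyte temperature at the next moment:

$$
r=\begin{cases}0, & T_{\min}\le T'\le T_{\max}\\ -1, & \text{otherwise}\end{cases}
$$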

In a specific implementation, the state space and action space of the electrolyte in the zinc electrolysis process need to be defined: the state space comprises the four variables ambient dry-bulb temperature T_dry, ambient wet-bulb temperature T_wet, ambient relative humidity RH and electrolyte temperature T_elec, and the action space is the cooling tower fan frequency f_tower.

In a specific implementation, the Q-table is an m×n matrix. The horizontal axis represents the selectable actions (cooling tower fan frequencies) [a_1, a_2, a_3, ..., a_n], where n is the number of selectable actions; the vertical axis represents the states [S_1, S_2, S_3, ..., S_m]. The number of states m depends on the number of combinations of the four variables T_dry, T_wet, RH and T_elec: the ambient dry-bulb temperature is divided into m_1 levels, the ambient wet-bulb temperature into m_2 levels, the relative humidity into m_3 levels and the electrolyte temperature into m_4 levels, and one additional state represents the electrolyte temperature being out of range. The number of states is therefore m = m_1 × m_2 × m_3 × m_4 + 1.
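One way to realize the row index of such a table is a mixed-radix encoding of the four level indices, with the extra out-of-range state stored in the last row. The sketch below assumes fixed, user-chosen bin edges per variable; the function names, the edge values and the five-level fan-frequency grid in the usage lines are hypothetical.

```python
import math
from bisect import bisect_right
from typing import List, Sequence, Tuple


def n_states(edges: Sequence[Sequence[float]]) -> int:
    """m = m1*m2*m3*m4 + 1, where variable k has len(edges[k]) + 1 levels."""
    return math.prod(len(e) + 1 for e in edges) + 1


def state_index(x: Sequence[float], edges: Sequence[Sequence[float]],
                t_range: Tuple[float, float]) -> int:
    """Row index for x = (T_dry, T_wet, RH, T_elec).

    Each variable is mapped to a level with bisect_right on its sorted bin
    edges and the four levels are combined in mixed radix. If T_elec lies
    outside [T_min, T_max], the extra 'out of range' row (the last one) is used."""
    t_min, t_max = t_range
    if not (t_min <= x[3] <= t_max):
        return n_states(edges) - 1
    idx = 0
    for value, e in zip(x, edges):
        idx = idx * (len(e) + 1) + bisect_right(e, value)
    return idx


def make_q_table(edges: Sequence[Sequence[float]], n_actions: int) -> List[List[float]]:
    """m x n table of zeros: one row per state, one column per fan frequency."""
    return [[0.0] * n_actions for _ in range(n_states(edges))]


# Hypothetical bin edges: T_dry (degC), T_wet (degC), RH (%), T_elec (degC)
EDGES = [[10, 20, 30], [15, 25], [40, 70], [36, 37, 38, 39]]
FAN_HZ = [30, 35, 40, 45, 50]          # hypothetical action grid, n = 5
q = make_q_table(EDGES, len(FAN_HZ))   # 4*3*3*5 + 1 = 181 rows
```

For example, (T_dry, T_wet, RH, T_elec) = (25, 20, 60, 37.5) falls into levels (2, 1, 1, 2) and hence row ((2·3 + 1)·3 + 1)·5 + 2 = 112.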

Optionally, before step 4, the method further includes:

Further, step 4 specifically includes:

In a specific implementation, as shown in FIG. 3, the agent continuously interacts with the zinc electrolyte environment interaction model env and generates data [s_t, a_t, r_t, s_{t+1}], where s_t is the zinc electrolysis state at time t (an element of the state space above), a_t is the cooling tower fan frequency at time t (an element of the action space above), r_t is the reward at time t, and s_{t+1} is the zinc electrolysis state at time t+1. The agent outputs the next action a_{t+1} from the reward r_t and state s_t at the current time t, and env outputs the next state s_{t+1} from the action a_t and state s_t at time t. Training the agent is thus the process in which the agent and the environment interaction model interact continuously, generating data [s_t, a_t, r_t, s_{t+1}] and thereby updating the contents of the Q-table. Once the training requirements are met, the Q-table is fixed and the agent has learned an electrolyte temperature control policy that satisfies the requirements. In particular, the agent itself does not need to be modeled: training consists only of the agent and env interacting continuously to revise the Q-table, and requires no manual control.
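The [s_t, a_t, r_t, s_{t+1}] records can be represented as a small typed tuple. The sketch below assumes states and actions are already discretized to integer indices and that the environment is any callable `env_step(s, a) -> (s_next, r, done)`; the `Transition` and `rollout` names, the callable signature and the toy environment in the usage note are illustrative assumptions, not part of the disclosure.

```python
from typing import Callable, List, NamedTuple, Tuple


class Transition(NamedTuple):
    s: int        # state index at time t
    a: int        # action index (fan-frequency level) at time t
    r: float      # reward received for this step
    s_next: int   # state index at time t+1


def rollout(env_step: Callable[[int, int], Tuple[int, float, bool]],
            policy: Callable[[int], int], s0: int, n_steps: int) -> List[Transition]:
    """Let an agent (policy) drive an environment and record each
    [s_t, a_t, r_t, s_{t+1}] tuple, stopping early when env signals done."""
    data: List[Transition] = []
    s = s0
    for _ in range(n_steps):
        a = policy(s)
        s_next, r, done = env_step(s, a)
        data.append(Transition(s, a, r, s_next))
        if done:
            break
        s = s_next
    return data
```

With a toy environment where the next state is s + a and the episode ends at state 5, a policy that always picks action 1 from s0 = 0 yields five transitions ending at s_next = 5.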

Before the agent interacts with the zinc electrolyte environment interaction model env to generate data [s_t, a_t, r_t, s_{t+1}] and train the Q-table, the upper and lower limits of the controlled temperature must also be defined, i.e. the electrolyte temperature must lie within a specified interval C = {T_min, T_max}, where T_min is the specified minimum and T_max the specified maximum electrolyte temperature. During the interaction between the agent and env, the control objective is to keep the controlled electrolyte temperature within C at all times, i.e. not below T_min and not above T_max.

The goal of the Q-learning algorithm is to obtain the value function of state-action pairs, denoted Q(s, a), where s is the state and a is the action; concretely, Q-learning learns the m×n Q values of the Q-table. During learning, the agent interacts continuously with the environment model env and receives rewards r, which form the Q values of the state-action pairs in the Q-table, and the Q values are then modified iteratively by the Q-value update rule. The Q values are updated as follows:

(1) Initialize the Q-table. Its horizontal axis holds the n candidate actions and its vertical axis the m states, so the table contains m×n Q values, all initialized to 0.

(2) Define the random factor rand, a random number between 0 and 1 that is refreshed each time step (7) is executed.

(3) Set the discount factor γ, a user-specified value between 0 and 1.

(4) Set the learning rate α, a user-specified value between 0 and 1.

(5) Set the maximum number of iteration steps step_max to a user-specified value.

(6) Set how the reward r is calculated.

(7) Set the initial state s and set step = 0.

(8) Action selection: if rand > 0, select the action a with the largest Q value in state s; if rand = 0, randomly select a state from all states and select the action a with the largest Q value in that state.

(9) The agent inputs action a into the environment interaction model env and obtains the new state s' at the next moment.

(10) Check all actions in s', find the one with the largest Q value in s', and denote it a'.

(11) Determine whether the current iteration is qualified: obtain the next-moment electrolyte temperature T' from the new state s'. If T_min ≤ T' ≤ T_max, the iteration is qualified, the reward is 0, and the process continues to step (12); otherwise the iteration has failed, the reward is -1, and the process returns to step (7).

(12) Update the Q value of the current action a with the following formula:

(13) $$Q(s,a) \leftarrow Q(s,a) + \alpha\left[r + \gamma\,Q(s',a') - Q(s,a)\right]$$

(14) Let s = s', a = a', step = step + 1, and return to step (9) to continue the loop. Steps (9) to (14) constitute one step; the interaction between the agent and the environment model env is thus the process of updating the Q table step_max times.

(15) When step reaches the user-specified value step_max, the Q-table update ends.
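The update procedure in steps (1) to (15) can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the patent's implementation: `reset` and `env_step` are hypothetical callables standing in for the initial state and the environment interaction model env; `epsilon` is a hypothetical threshold for rand (the text compares rand with 0, which would almost never trigger the random branch); a failed step is treated as terminal, so its -1 reward is written into the table before returning to step (7); and a single global counter runs to step_max so the loop always terminates, although step (7) in the text resets step to 0.

```python
import random
from typing import Callable, List, Tuple


def train_q_table(
    n_states: int,
    n_actions: int,
    reset: Callable[[], int],                           # hypothetical: returns an initial state index
    env_step: Callable[[int, int], Tuple[int, float]],  # hypothetical: (s, a) -> (s', T')
    t_min: float,
    t_max: float,
    alpha: float = 0.5,
    gamma: float = 0.9,
    epsilon: float = 0.1,                               # hypothetical exploration threshold for rand
    step_max: int = 1000,
    seed: int = 0,
) -> List[List[float]]:
    rng = random.Random(seed)
    q = [[0.0] * n_actions for _ in range(n_states)]    # (1): m x n table, all zeros

    def greedy(state: int) -> int:
        row = q[state]
        return max(range(n_actions), key=row.__getitem__)  # first action wins ties

    def select_action(state: int) -> int:
        rand = rng.random()                             # (2): rand redrawn whenever (7) runs
        if rand > epsilon:
            return greedy(state)                        # (8): best action in s
        return greedy(rng.randrange(n_states))          # (8): best action of a random state

    s = reset()                                         # (7)
    a = select_action(s)
    for _ in range(step_max):                           # (15): stop after step_max updates
        s_next, t_next = env_step(s, a)                 # (9)
        a_next = greedy(s_next)                         # (10)
        ok = t_min <= t_next <= t_max                   # (11)
        r = 0.0 if ok else -1.0
        # (12)-(13); assumption: a failed step is terminal, so no bootstrap term
        target = r + gamma * q[s_next][a_next] if ok else r
        q[s][a] += alpha * (target - q[s][a])
        if ok:
            s, a = s_next, a_next                       # (14)
        else:
            s = reset()                                 # back to (7)
            a = select_action(s)
    return q
```

As in step (14), the next action is the greedy a' rather than a freshly drawn one, so exploration through rand only happens when the process restarts at step (7).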

In a specific implementation, once the Q table has been updated, the stable control model for the electrolyte temperature in the zinc electrolysis process is obtained from it, and this model outputs the optimal cooling tower fan frequency for the current electrolyte state, completing the stable control process.
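Reading the control action off the finished Q table amounts to taking the greedy action in the current state. A minimal sketch, assuming the table columns follow the order of a hypothetical `frequencies_hz` list (for Embodiment 2 that would be 24 to 50 Hz in 2 Hz steps):

```python
from typing import Sequence


def optimal_fan_frequency(
    q_table: Sequence[Sequence[float]],
    state: int,
    frequencies_hz: Sequence[float],
) -> float:
    """Return the fan frequency whose Q value is largest in the given state."""
    row = q_table[state]
    best = max(range(len(row)), key=row.__getitem__)  # ties go to the lower frequency
    return frequencies_hz[best]
```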

In the model-free self-learning stable control method provided in this embodiment, the electrolyte temperature, ambient dry-bulb temperature, ambient wet-bulb temperature and ambient relative humidity form the state space of the electrolyte temperature, the cooling tower fan frequency forms the action space, and the Q table is updated with data generated by the interaction between the agent and the environment interaction model, finally yielding a stable control model for the electrolyte temperature. This addresses the unstable electrolyte temperature control of existing techniques, which stems from the large, uninterrupted electrolyte circulation flow and from changing ambient temperature conditions, thereby keeping the zinc electrolyte temperature within the range required by the process and improving the production stability of the zinc electrolysis process.

The solution is described below with reference to two embodiments.

Embodiment 1

Referring to Fig. 4, the model-free self-learning stable control method for electrolyte temperature in a zinc electrolysis process provided by Embodiment 1 of the present invention includes:

Step S101: Establish the environment interaction model env and the reward r mechanism used by the Q-learning control algorithm, set the target interval C = {T_min, T_max} within which the electrolyte temperature must be controlled, and initialize the parameters needed for the Q-table update, such as the discount factor γ, the learning rate α and the random factor rand.

Step S102: Define the state space and action space of the electrolyte in the zinc electrolysis process. The state space comprises four variables, the electrolyte temperature T_elec, the ambient dry-bulb temperature T_dry, the ambient wet-bulb temperature T_wet and the relative humidity RH, i.e. S = [T_dry, T_wet, RH, T_elec]; the action space is the cooling tower fan frequency f_tower.

Step S103: Define the Q table, whose horizontal axis represents the selectable actions and whose vertical axis represents the state categories. The number of categories m is the number of combinations of the four variables (ambient dry-bulb temperature, ambient wet-bulb temperature, ambient relative humidity and electrolyte temperature): the dry-bulb temperature is divided into m₁ cases, the wet-bulb temperature into m₂ cases, the relative humidity into m₃ cases and the electrolyte temperature into m₄ cases. One further state represents an out-of-limit electrolyte temperature, so the number of states is m = m₁×m₂×m₃×m₄ + 1.
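The state count in step S103, and one way to number the states, can be sketched as follows. The mixed-radix numbering, the variable order (T_dry, T_wet, RH, T_elec, following S) and the use of the last index for the out-of-limit state are assumptions for illustration; the function names are hypothetical.

```python
from math import prod
from typing import Sequence


def state_count(levels: Sequence[int]) -> int:
    """m = m1 x m2 x m3 x m4 + 1, where the +1 is the out-of-limit state."""
    return prod(levels) + 1


def state_index(bucket_ids: Sequence[int], levels: Sequence[int]) -> int:
    """Mixed-radix encoding of per-variable bucket ids into 0 .. prod(levels) - 1."""
    idx = 0
    for b, m in zip(bucket_ids, levels):
        if not 0 <= b < m:
            raise ValueError(f"bucket id {b} outside 0..{m - 1}")
        idx = idx * m + b
    return idx


def out_of_limit_index(levels: Sequence[int]) -> int:
    """The extra state reserved for an out-of-limit electrolyte temperature."""
    return prod(levels)
```

With 3, 4, 3 and 3 levels this gives 109 states, matching the count used in Embodiment 2; indices here are 0-based, whereas Embodiment 2 numbers its states from 1.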

Step S104: Update the Q table with the data generated by the interaction between the agent and the environment interaction model, so as to obtain the stable control model for the electrolyte temperature in the zinc electrolysis process. The update process is:

(1) Set the initial state s and step = 0.

(2) Action selection: draw a random value for the random factor rand. If rand > 0, select the action a with the largest Q value in state s; if rand = 0, randomly select a state from all states and select the action a with the largest Q value in that state.

(3) The agent feeds action a into the environment interaction model env and obtains the new state s' at the next moment.

(4) Check all actions in state s' and denote the action with the largest Q value in s' as a'.

(5) Judge whether this iteration is qualified: obtain the next-moment electrolyte temperature T' from the new state s'. If T_min ≤ T' ≤ T_max, the iteration is qualified, the reward is 0, and the process continues with step (6); otherwise the iteration has failed, the reward is -1, and the process returns to step (1).

(6) Update the Q value of the current action a with the following formula:

(7) $$Q(s,a) \leftarrow Q(s,a) + \alpha\left[r + \gamma\,Q(s',a') - Q(s,a)\right]$$

(8) Let s = s', a = a', step = step + 1, and return to step (3) to continue the loop. Steps (3) to (8) constitute one step; the interaction between the agent and the environment model env is the process of updating the Q table step_max times.

(9) When step reaches the user-specified value step_max, the Q-table update ends.

Step S105: From the updated Q table, obtain the stable control model for the electrolyte temperature in the zinc electrolysis process and output the optimal cooling tower fan frequency for the current electrolyte state.

The model-free self-learning stable control method provided by the present invention uses the electrolyte temperature, ambient dry-bulb temperature, ambient wet-bulb temperature and ambient relative humidity as the state space and the cooling tower fan frequency as the action space, and updates the Q table with data generated by the interaction between the agent and the environment interaction model to obtain the stable control model for the electrolyte temperature. This addresses the unstable electrolyte temperature control of existing techniques caused by the large, uninterrupted electrolyte circulation flow and changing ambient temperature conditions, keeps the zinc electrolyte temperature within the range required by the process, improves the production stability of zinc electrolysis, and is of significance for optimizing the zinc hydrometallurgy process.

Specifically, this embodiment introduces the idea of the Q-learning reinforcement learning algorithm: the zinc electrolyte state space is defined by the electrolyte temperature, ambient dry-bulb temperature, ambient wet-bulb temperature and relative humidity; the action space is defined as the cooling tower fan frequency; and an environment interaction model based on a BP network is established. Through continuous interaction with this model, the agent obtains data to update the Q table, so that it can learn new electrolyte temperature stabilization strategies autonomously and without a process model, keeping the zinc electrolyte temperature within the process requirements and improving the stability of zinc electrolysis production.

Embodiment 2

Referring to Fig. 5, the model-free self-learning stable control method for electrolyte temperature in a zinc electrolysis process provided by Embodiment 2 of the present invention includes:

Step S101: Establish the environment interaction model, Q table and reward mechanism used by the Q-learning control algorithm, set the target interval C = {T_min, T_max} within which the electrolyte temperature must be controlled, and initialize the parameters needed for the Q-table update, such as the discount factor γ, the learning rate α and the random factor rand. Specifically:

The environment interaction model env used by the Q-learning algorithm is built with a BP (back-propagation) neural network. Its inputs are the electrolyte temperature, ambient dry-bulb temperature, ambient wet-bulb temperature, ambient relative humidity and cooling tower fan frequency at time t (the fan frequency is given by the agent); its output is the electrolyte temperature at time t+1. The number of hidden layers of the network is 20.
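A minimal NumPy sketch of such a BP environment model is shown below. It is an illustration under assumptions, not the authors' network: it reads "20" as a single hidden layer of 20 tanh units, trains with full-batch gradient descent on standardized data, and uses a hypothetical column order [T_elec, T_dry, T_wet, RH, f_tower]. In the Q-learning loop, its predicted T_elec(t+1) would then be discretized into the next state s'.

```python
import numpy as np


class BPEnvModel:
    """Sketch of a BP network used as env: x_t -> T_elec(t+1).

    Hypothetical input columns: [T_elec(t), T_dry(t), T_wet(t), RH(t), f_tower(t)].
    """

    def __init__(self, n_in: int = 5, n_hidden: int = 20, lr: float = 0.05, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.w1 = rng.normal(0.0, 0.5, size=(n_in, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.w2 = rng.normal(0.0, 0.5, size=(n_hidden, 1))
        self.b2 = np.zeros(1)
        self.lr = lr

    def _forward(self, xs):
        h = np.tanh(xs @ self.w1 + self.b1)   # hidden activations
        return h, h @ self.w2 + self.b2       # linear output layer

    def fit(self, x, y, epochs: int = 3000):
        x = np.asarray(x, dtype=float)
        y = np.asarray(y, dtype=float)
        # Standardize inputs and target so one learning rate suits all features.
        self.x_mu, self.x_sd = x.mean(axis=0), x.std(axis=0) + 1e-8
        self.y_mu, self.y_sd = y.mean(), y.std() + 1e-8
        xs = (x - self.x_mu) / self.x_sd
        ys = ((y - self.y_mu) / self.y_sd).reshape(-1, 1)
        n = len(xs)
        for _ in range(epochs):
            h, out = self._forward(xs)
            err = out - ys                                  # gradient of 0.5 * MSE w.r.t. output
            d_hidden = (err @ self.w2.T) * (1.0 - h ** 2)   # back-propagate through tanh
            self.w2 -= self.lr * (h.T @ err) / n
            self.b2 -= self.lr * err.mean(axis=0)
            self.w1 -= self.lr * (xs.T @ d_hidden) / n
            self.b1 -= self.lr * d_hidden.mean(axis=0)
        return self

    def predict(self, x):
        xs = (np.asarray(x, dtype=float) - self.x_mu) / self.x_sd
        _, out = self._forward(xs)
        return out.ravel() * self.y_sd + self.y_mu
```

Fitting it requires historical plant records of the five inputs together with the next-step electrolyte temperature; no such data is given in this document.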

Define the reward mechanism: the value of the reward r is determined from the state s_{t+1} output by the environment interaction model env, specifically:

$$r = \begin{cases} 0, & T_{\min} \le T_{elec}^{t+1} \le T_{\max} \\ -1, & \text{otherwise} \end{cases}$$

The target interval for the electrolyte temperature is C = {T_min, T_max}, with T_min = 37 °C and T_max = 40 °C.
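With C = {37, 40}, the reward rule reduces to a one-line function. A sketch with inclusive bounds, as in T_min ≤ T' ≤ T_max; the names are hypothetical:

```python
T_MIN_C = 37.0  # lower limit of the target interval C, in degrees C
T_MAX_C = 40.0  # upper limit of the target interval C, in degrees C


def reward(t_next: float, t_min: float = T_MIN_C, t_max: float = T_MAX_C) -> float:
    """0 if the next electrolyte temperature stays inside C, otherwise -1."""
    return 0.0 if t_min <= t_next <= t_max else -1.0
```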

Step S103: Establish the Q table. Its horizontal axis holds the 14 selectable cooling tower fan frequencies: 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48 and 50 Hz. Its vertical axis holds the 109 states of the electrolyte space, formed by combining the discrete levels of four parameters (electrolyte temperature, ambient dry-bulb temperature, ambient wet-bulb temperature and relative humidity), for a total of 3×3×3×4 + 1 = 109 states. They are defined as follows:

(1) If the electrolyte temperature exceeds 40 °C or falls below 37 °C, the system enters state 109, which also means the current step has failed; the agent must then clear the entire Q table and restart the Q-value update. Here T_elec is the electrolyte temperature and box is the state number (109 for this state):

if (T_elec < 37 || T_elec > 40)
    box = 109;
else

(2) The electrolyte temperature is divided into three levels (3 cases), where T_elec is the electrolyte temperature and T_elec_Bucket is the label used to classify the electrolyte temperature state.

(3) The ambient dry-bulb temperature is divided into three levels (3 cases), where T_dry is the ambient dry-bulb temperature and T_dry_Bucket is the label used to classify the ambient dry-bulb temperature state.

(4) The relative humidity is divided into three levels (3 cases), where RH is the ambient relative humidity and RH_Bucket is the label used to classify the ambient relative humidity state.

(5) The ambient wet-bulb temperature is divided into four levels (4 cases), where T_wet is the ambient wet-bulb temperature and T_wet_Bucket is the label used to classify the ambient wet-bulb temperature state.
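The action list and the 109-state classification of Embodiment 2 can be sketched as below. The bucket thresholds are hypothetical placeholders because the document does not state them, and the order in which the four levels are combined into a state number is likewise an assumption; only the level counts (3, 3, 3, 4), the 37 to 40 °C limits, state 109 and the 14 frequencies come from the text.

```python
from bisect import bisect_right

FAN_FREQUENCIES_HZ = list(range(24, 51, 2))  # 24, 26, ..., 50 Hz -> 14 actions
N_STATES = 3 * 3 * 3 * 4 + 1                  # 109
OUT_OF_RANGE_BOX = 109

# Hypothetical bucket edges: the source does not publish the thresholds.
ELEC_EDGES = (38.0, 39.0)       # T_elec within [37, 40] -> 3 levels
DRY_EDGES = (15.0, 25.0)        # T_dry -> 3 levels
RH_EDGES = (50.0, 75.0)         # RH in % -> 3 levels
WET_EDGES = (10.0, 17.0, 24.0)  # T_wet -> 4 levels


def level(value: float, edges) -> int:
    """Number of edges <= value, i.e. a level in 0..len(edges)."""
    return bisect_right(edges, value)


def box(t_elec: float, t_dry: float, t_wet: float, rh: float) -> int:
    """Map raw measurements to a state number 1..109 (109 = out of range)."""
    if t_elec < 37 or t_elec > 40:
        return OUT_OF_RANGE_BOX
    e = level(t_elec, ELEC_EDGES)
    d = level(t_dry, DRY_EDGES)
    h = level(rh, RH_EDGES)
    w = level(t_wet, WET_EDGES)
    return ((e * 3 + d) * 3 + h) * 4 + w + 1  # 1..108
```

A Q table for this embodiment would then have N_STATES rows and len(FAN_FREQUENCIES_HZ) columns.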

Initialize the parameters needed for the Q-table update: discount factor γ = 0.9 and learning rate α = 0.5; the random factor rand, ranging from 0 to 1, is redrawn at every step.

(1) Set the initial state s and step = 0.

(5) Judge whether this iteration is qualified: obtain the next-moment electrolyte temperature T' from the new state s'. If T_min ≤ T' ≤ T_max, the iteration is qualified, the reward is 0, and the process continues with step (6); otherwise the iteration has failed, the reward is -1, and the process returns to step (1).

(7) $$Q(s,a) \leftarrow Q(s,a) + \alpha\left[r + \gamma\,Q(s',a') - Q(s,a)\right]$$
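As a worked instance of the Q-learning update with this embodiment's parameters γ = 0.9 and α = 0.5 (the Q values below are made up for illustration):

```python
def q_update(q_sa: float, r: float, q_next_best: float,
             alpha: float = 0.5, gamma: float = 0.9) -> float:
    """One Q-learning update: Q(s,a) + alpha * (r + gamma * Q(s',a') - Q(s,a))."""
    return q_sa + alpha * (r + gamma * q_next_best - q_sa)


# Qualified step: r = 0, Q(s,a) = -0.2, Q(s',a') = -0.1
print(q_update(-0.2, 0.0, -0.1))  # about -0.145
# Failed step starting from an all-zero table: r = -1
print(q_update(0.0, -1.0, 0.0))   # -0.5
```

In the first case the target is 0 + 0.9 × (-0.1) = -0.09, and the value moves halfway from -0.2 toward it.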

Existing electrolyte temperature control relies on manual experience: workers set the cooling tower fan frequency according to the day's ambient temperature, which is labor-intensive. Moreover, the electrolyte temperature in the electrolysis workshop is still measured manually, i.e. offline, so the manual temperature control method suffers from a certain lag.

In contrast, the proposed model-free self-learning method does not require a dedicated model of the electrolyte temperature control process. It only requires collecting on-site ambient temperature data and electrolyte temperature data to build a BP-network-based environment interaction model; the updated Q table obtained through continuous interaction between the agent and this model then achieves model-free adaptive stable control of the electrolyte temperature. The process-modeling stage is omitted, and good electrolyte temperature stabilization can still be achieved.

Fig. 6 compares the temperature control performance of on-site manual control with that of this method. Fig. 6(a) shows the electrolyte temperature curve under manual, experience-based control in the electrolysis workshop, and Fig. 6(b) the curve under control by this method. In both panels the horizontal axis is time (2-hour intervals), the vertical axis is electrolyte temperature, and the two horizontal lines mark the interval limits of 37 °C and 40 °C. The manually controlled temperature has a mean of 37.797 °C and a standard deviation of 1.16 °C, whereas the temperature controlled by this method has a mean of 38.584 °C and a standard deviation of 0.545 °C. The temperature curve under this method meets the process requirement of 37 to 40 °C, and its smaller standard deviation relative to manual control indicates smoother, more stable temperature control.

It should be understood that parts of the present disclosure may be implemented in hardware, software, firmware, or a combination thereof.

The above are only specific embodiments of the present disclosure, but the protection scope of the present disclosure is not limited thereto. Any change or substitution that a person skilled in the art could readily conceive of within the technical scope disclosed herein shall fall within the protection scope of the present disclosure. Therefore, the protection scope of the present disclosure shall be subject to the protection scope of the claims.

Claims

1. A model-free self-learning stable control method for electrolyte temperature in a zinc electrolysis process, characterized by comprising:

Step 1, establish the environment interaction model, reward mechanism and Q table for the Q-learning algorithm, set the target interval within which the electrolyte temperature must be controlled, and initialize the parameters required for the Q-table update, the parameters including a discount factor, a learning rate and a random factor;

Step 2, define the state space and action space of the electrolyte in the zinc electrolysis process, wherein the action space is the cooling tower fan frequency;

Step 3, define the Q table, whose horizontal axis represents the selectable actions and whose vertical axis represents the state categories, wherein the state space includes four variables (ambient dry-bulb temperature, ambient wet-bulb temperature, ambient relative humidity and electrolyte temperature), and the number of categories is the number of combinations of the four variables;

Step 4, update the Q table according to the data generated by the interaction between the agent and the environment interaction model;

Step 5, according to the updated Q table, obtain the stable control model corresponding to the electrolyte temperature in the zinc electrolysis process, and output the optimal cooling tower fan frequency corresponding to the current electrolyte state according to the stable control model.

2. The method according to claim 1, characterized in that the environment interaction model is built with a BP neural network whose input parameters are the fan frequency f_tower, the ambient dry-bulb temperature T_dry, the ambient wet-bulb temperature T_wet, the ambient relative humidity RH and the electrolyte temperature T_elec at time t, and whose output parameter is the electrolyte temperature T_elec at time t+1.

3. The method according to claim 1, wherein, before the step 4, the method further comprises:

Define the upper and lower limits of the controlled temperature as the preset interval;

The control target of the electrolyte temperature during the interaction between the agent and the environment interaction model is set within the preset interval.

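4. The method according to claim 1, characterized in that the reward mechanism is calculated as:

$$r = \begin{cases} 0, & T_{\min} \le T' \le T_{\max} \\ -1, & \text{otherwise} \end{cases}$$

where T' is the electrolyte temperature at the next moment.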

5. The method according to claim 1, characterized in that step 4 specifically comprises:

Step 4.1, set the initial state s, step=0;

Step 4.2, action selection: the random factor rand takes a random value; if rand > 0, select the action a with the largest Q value in state s; if rand = 0, randomly select a state from all states and select the action a with the largest Q value in that state;

Step 4.3, the agent inputs the action a into the environment interaction model env and obtains the new state s' at the next moment;

Step 4.4, check all actions in s', and take the action with the largest Q value in s' as a';

Step 4.5, determine whether this iteration is qualified: obtain the electrolyte temperature T' at the next moment from the new state s'; if T_min ≤ T' ≤ T_max, the iteration is qualified, the reward is 0, and the method proceeds to step 4.6; if T_min ≤ T' ≤ T_max is not satisfied, the iteration has failed, the reward is -1, and the method returns to step 4.1;

Step 4.6, update the Q value of the current action a using the following formula:

$$Q(s,a) \leftarrow Q(s,a) + \alpha\left[r + \gamma\,Q(s',a') - Q(s,a)\right]$$

Step 4.7, let s = s', a = a', step = step + 1, return to step 4.3 and continue the cycle; steps 4.3 to 4.7 constitute one step, and the interaction between the agent and the environment interaction model env is the process of updating the Q table step_max times;

Step 4.8, when step reaches the user-specified value step_max, the Q-table update ends.