KR20230128190A

KR20230128190A - An intelligent protocol of multi-agent reinforcement learning-based autonomous model calibration for ASM-type model of membrane bioreactor

Info

Publication number: KR20230128190A
Application number: KR1020220025288A
Authority: KR
Inventors: 유창규; 허성구; 남기전
Original assignee: 경희대학교 산학협력단
Priority date: 2022-02-25
Filing date: 2022-02-25
Publication date: 2023-09-04

Abstract

The present invention relates to an autonomous calibration system for a sewage treatment plant model using reinforcement learning, and to a multi-optimization system for process operation that uses the calibration system. More specifically, the invention relates to a multi-optimization system for sewage treatment plant process operation using reinforcement learning, comprising: a modeling unit that produces a sewage treatment plant model by simulating the plant; a model calibration unit that calibrates the model by adjusting its kinetic coefficient values through artificial intelligence learning; and an optimization module that optimizes operational control of the sewage treatment process based on the ASM calibrated by the autonomous calibration system.

Description

Sewage treatment plant model autonomous calibration system using reinforcement learning, and process operation multi-optimization system using the system

The present invention relates to an autonomous calibration system for a sewage treatment plant model using reinforcement learning, and to a multi-optimization system for process operation using that system.

Interest in and use of mathematical process modeling of sewage treatment has grown, from the design stage of treatment facilities through to operation and management. Mathematical modeling makes it possible to test hypotheses about a system, to transfer system knowledge between engineers, and to predict the system's future state. Among mathematical models of sewage treatment, the activated sludge model (ASM) developed by the International Water Association has been the most commonly used for modeling biological treatment processes. However, because the ASM relies on generic variables and parameters, applying it to a specific sewage treatment plant adds complexity, which makes the application difficult and time-consuming. Model calibration is therefore important for describing the behavior of the biological processes in a specific plant.

The kinetic parameters of ASM models vary widely between sewage treatment processes because of the variability of environmental conditions and the complexity of the biological phenomena within the process. Several studies have therefore presented calibration approaches for describing sewage treatment processes accurately. A calibration method for a full-scale wastewater treatment plant (WWTP) was proposed that ranks the sensitive parameters and determines their values for simulating wastewater pollutants.
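To make "kinetic parameters" concrete: ASM-type models describe microbial growth with Monod-type rate expressions, in which a maximum specific growth rate and a half-saturation coefficient are typical calibration targets. The sketch below is illustrative only; the function name and all numeric values are assumptions and are not taken from this patent.

```python
def monod_growth_rate(mu_max, half_saturation, substrate):
    """Specific growth rate under Monod kinetics: mu_max * S / (K_S + S)."""
    if substrate < 0 or half_saturation <= 0:
        raise ValueError("substrate must be >= 0 and half_saturation > 0")
    return mu_max * substrate / (half_saturation + substrate)


# Hypothetical default vs. calibrated parameter sets at the same substrate level.
substrate_mg_per_l = 20.0
default_rate = monod_growth_rate(mu_max=6.0, half_saturation=20.0,
                                 substrate=substrate_mg_per_l)
calibrated_rate = monod_growth_rate(mu_max=4.0, half_saturation=10.0,
                                    substrate=substrate_mg_per_l)
print(default_rate, calibrated_rate)
```

Changing either parameter changes the predicted rate at the same substrate concentration, which is why generic default values may fit a specific plant poorly.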

ASM parameters were calibrated experimentally to model the wastewater chemical oxygen demand (COD) of a sequencing batch reactor. A method for selecting feasible parameter combinations was proposed to minimize the deviation between the measured and simulated substrate concentrations of a WWTP. Although these studies reported notable calibration results, they were based on steady-state values of the calibrated kinetic parameters. Most WWTPs are subject to considerable seasonality and dynamics that can alter biological metabolism, because changes in temperature or operating conditions change the growth or decay rate of the biomass and thus raise or lower pollutant removal efficiency. When developing a reliable tool for an ASM calibration protocol, it is therefore essential to consider the dynamic conditions of the ASM and to calibrate its parameters over time.
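One common way to express the temperature dependence mentioned above is an Arrhenius-type correction of a rate coefficient around a 20 °C reference. The following is a minimal sketch under that assumption; the patent does not state which correction, if any, it uses, and the coefficient values are hypothetical.

```python
def temperature_corrected_rate(rate_at_20c, theta, temperature_c):
    """Arrhenius-type correction: k(T) = k(20 C) * theta ** (T - 20)."""
    return rate_at_20c * theta ** (temperature_c - 20.0)


# Hypothetical winter and summer water temperatures for the same coefficient.
winter_rate = temperature_corrected_rate(rate_at_20c=1.0, theta=1.07,
                                         temperature_c=12.0)
summer_rate = temperature_corrected_rate(rate_at_20c=1.0, theta=1.07,
                                         temperature_c=26.0)
print(winter_rate, summer_rate)
```

A single steady-state value cannot represent both seasons, which illustrates the case for recalibrating parameters over time.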

Taking the dynamic behavior of microorganisms into account, dynamic calibration protocols have been developed and validated for conventional WWTPs and membrane bioreactor (MBR) plants. Although these studies obtained reliable results with dynamic calibration methods, they were limited in their ability to represent the interactions between the calibrated parameters. Most biological behavior in sewage treatment processes combines several phenomena, such as carbon oxidation, nitrification, and denitrification. A kinetic parameter of an ASM-type model can affect two or more biological processes rather than just one, because biological phenomena are complex and interact with the components of the system. The calibration method should therefore consider the correlations between the calibrated kinetic parameters.

Calibrating ASM-type models with mathematical and statistical methods is still regarded as one of the bottlenecks in modeling research, because such calibration can be time-consuming and unstable given the many parameters of ASM-type models. In contrast, artificial intelligence (AI) offers an alternative for wastewater treatment processes, providing robust and scalable insight for solving complex, high-dimensional problems. Multi-agent reinforcement learning (MARL) is a powerful class of AI algorithms that can provide multiple reliable and feasible solutions, yet MARL algorithms have rarely been used in wastewater treatment research.

Wastewater treatment plants (WWTPs) also rely on complex, energy-intensive systems to continuously mitigate the environmental pollution from wastewater generated by human activity. At the same time, discharge standards and regulations for sewage are becoming stricter worldwide in order to protect the environment from harmful discharges. A current challenge in wastewater treatment is therefore to provide solutions that secure both cost-efficient energy use in operation and the desired effluent quality. Process optimization of WWTPs has recently attracted attention from the standpoint of energy saving and environmental protection.

In recent years, several studies have reported applying optimization strategies to WWTPs to maximize operational efficiency and to balance energy consumption against sustainable effluent quality. A multi-objective genetic algorithm was used to trade off aeration energy against effluent quality on the basis of a benchmark simulation model (BSM). An interactive multi-objective optimization tool was proposed for determining the chemical dosing rate and the oxygen concentration in the reactor. Treatment cost and an effluent quality index were optimized, with a backpropagation algorithm used to identify the relationships between the optimization objectives and the decision factors. An integrated superstructure method combining conceptual and mathematical programming was proposed for designing and operating a sustainable WWTP. Although these studies showed notable optimization performance, mathematical algorithms require nonlinear differential and algebraic equations, which can limit their ability to account for uncertainty while solving the optimization problem.

WWTPs are complex systems with nonlinear characteristics arising from their biological, chemical, and physical mechanisms. Dynamic and nonlinear influent conditions can also be a pitfall for stable WWTP operation. FIG. 15 shows the nonlinear characteristics of the influent measured at the Y-city WWTP. The measured influent components show pronounced nonlinearity, with their concentrations widely scattered (FIG. 15(a) and (b)). The skewness values of the influent pollutants were positive, indicating skewed distributions, as shown in FIG. 15(c). The tails of the influent pollutant distributions deviated from normality, which indicates nonlinear characteristics. Data-driven or model-free algorithms can therefore be an alternative for optimizing WWTPs under nonlinear influent conditions.
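The skewness check described above can be sketched as follows. This is a minimal illustration that uses the population (biased) skewness formula; the patent does not say which estimator was used, and the concentration values are hypothetical rather than measurements from the Y-city WWTP.

```python
def sample_skewness(values):
    """Population skewness: third central moment divided by variance ** 1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    if m2 == 0:
        raise ValueError("skewness is undefined for constant data")
    return m3 / m2 ** 1.5


# Hypothetical influent COD series (mg/L) with one high-load day.
influent_cod = [180.0, 200.0, 210.0, 220.0, 250.0, 480.0]
symmetric_series = [1.0, 2.0, 3.0, 4.0, 5.0]

print(sample_skewness(influent_cod))      # positive: long right tail
print(sample_skewness(symmetric_series))  # zero for symmetric data
```

A positive value indicates a long right tail, that is, occasional high-load events pulling the distribution away from normality.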

Reinforcement learning (RL) has attracted considerable attention for optimizing WWTPs because of its ability to solve high-dimensional, nonlinear problems. Hernandez-Del-Olmo et al. (2012) applied model-free RL to a BSM1-based WWTP to optimize the dissolved oxygen (DO) concentration, reducing aeration energy and effluent ammonia concentration at the same time. Seo et al. (2021) proposed an optimization strategy based on proximal policy optimization (PPO) for the pumping system of a WWTP to reduce energy consumption. Although these studies used RL, they were limited in their ability to solve dynamic multi-objective problems. Most WWTPs are operated through several influential manipulated variables rather than a single operating variable. More recently, Chen et al. (2021) used the multi-agent deep deterministic policy gradient (MADDPG) algorithm to optimize the DO concentration and chemical dosage in a WWTP. MADDPG improved optimization performance, but it is exposed to the multi-agent credit assignment problem, in which it is difficult to work out how much each agent contributed to solving the task in a cooperative environment. In this context, QMIX and the game abstraction mechanism based on a two-stage attention network (G2ANet) are advanced multi-agent reinforcement learning (MARL) techniques that provide optimal solutions while maximizing task-completion success through effective self-learning.
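As background on how QMIX approaches credit assignment: it combines per-agent action values into a joint value through a mixing network whose weights are constrained to be non-negative, so that raising any single agent's value can never lower the joint value. The sketch below shows only that monotonic mixing step with fixed, hypothetical weights. In published QMIX the weights are produced by hypernetworks conditioned on the global state and the hidden activation is ELU; ReLU and constant weights are simplifications assumed here, and nothing in this block is taken from the patent's implementation.

```python
def monotonic_mix(agent_qs, w1, b1, w2, b2):
    """Combine per-agent Q-values into a joint value with non-negative weights.

    w1 holds one row of hidden-layer weights per agent; w2 holds one weight
    per hidden unit. Taking abs() of every weight enforces monotonicity.
    """
    hidden = []
    for j in range(len(b1)):
        pre = sum(abs(w1[i][j]) * q for i, q in enumerate(agent_qs)) + b1[j]
        hidden.append(max(pre, 0.0))  # ReLU for simplicity (assumption)
    return sum(abs(w2[j]) * h for j, h in enumerate(hidden)) + b2


# Hypothetical weights for 3 agents and 2 hidden units.
w1 = [[0.5, -1.0], [-0.3, 0.8], [1.2, 0.1]]
b1 = [0.1, -0.2]
w2 = [0.7, -0.4]
b2 = 0.05

q_low = monotonic_mix([1.0, 2.0, 0.5], w1, b1, w2, b2)
q_high = monotonic_mix([1.0, 2.5, 0.5], w1, b1, w2, b2)  # agent 2 improves
print(q_low, q_high)
```

Because the joint value is monotonic in each agent's value, each agent can act greedily on its own value while remaining consistent with the joint optimum.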

Korean Registered Patent No. 10-1927503
Korean Registered Patent No. 10-1629240
Korean Registered Patent No. 10-2041326
Korean Patent Application Publication No. 10-2021-0109161

The present invention has been devised to solve the conventional problems described above. According to an embodiment of the present invention, an object is to provide an intelligent model calibration protocol that calibrates the activated sludge model (ASM) of a sewage treatment plant using multi-agent reinforcement learning, and a multi-optimization system for process operation strategy that uses the protocol.

According to an embodiment of the present invention, a further object is to provide an autonomous calibration system for a sewage treatment plant model using reinforcement learning, and a multi-optimization system for process operation using that system. Through the intelligent model calibration protocol and the multi-optimization system for process operation strategy, the ASM mathematical model that simulates the process is calibrated dynamically as the biological and microbial characteristics of the plant change over time, and the aeration intensity, external carbon source dosage, and sludge recirculation flow rate among the operating set points of the process are optimized, so that the environmental and economic performance of the sewage treatment process can be improved.

According to an embodiment of the present invention, a further object is to provide such an autonomous calibration system and multi-optimization system which, as verified with data from a domestic sewage treatment plant, can reduce the modeling error in simulating effluent chemical oxygen demand (COD) and total nitrogen (TN) by 87% and 52%, respectively, and which, when process operation is optimized with the ASM calibrated through the intelligent model calibration protocol, can improve aeration energy by 25% and effluent quality by 7%.

According to an embodiment of the present invention, a further object is to provide such an autonomous calibration system and multi-optimization system which, by offering accurate process simulation and economical, environmentally friendly operating strategies to domestic and overseas sewage treatment plants, are expected to be applicable to a wide range of plant operation fields. In terms of commercialization, they are expected to be used by domestic and overseas sewage treatment plants and environmental system companies to improve energy use and environmental performance.

The technical problems to be solved by the present invention are not limited to those mentioned above, and other technical problems not mentioned will be clearly understood from the description below by a person having ordinary skill in the art to which the present invention pertains.

A first object of the present invention can be achieved by an autonomous calibration system for a sewage treatment plant model using reinforcement learning, the system comprising: a modeling unit that produces a sewage treatment plant model by simulating the sewage treatment plant; and a model calibration unit that calibrates the sewage treatment plant model by adjusting its kinetic coefficient values using artificial intelligence learning.

The sewage treatment plant model may be an activated sludge model (ASM).

The artificial intelligence learning may be multi-agent reinforcement learning.

The multi-agent reinforcement learning may be based on the G2ANet algorithm.

A second object of the present invention can be achieved by an autonomous calibration method for a sewage treatment plant model using reinforcement learning, the method comprising: a step in which a modeling unit produces an activated sludge model (ASM) of the sewage treatment plant by simulating the plant; and a step in which a model calibration unit generates a calibrated ASM by adjusting the kinetic coefficient values of the model using artificial intelligence learning.

The artificial intelligence learning may be multi-agent reinforcement learning, and the multi-agent reinforcement learning may be based on the G2ANet algorithm.

A third object of the present invention can be achieved by a multi-optimization system for sewage treatment plant process operation using reinforcement learning, the system comprising: the autonomous calibration system according to the first object described above; and an optimization module that optimizes operational control of the sewage treatment process based on the ASM calibrated by the autonomous calibration system.

The optimization module may optimize process operation based on multi-agent reinforcement learning.

The optimization module may optimize the aeration intensity, the external carbon source dosage, and the sludge recirculation flow rate.

A fourth object of the present invention can be achieved by a multi-optimization method for sewage treatment plant process operation using reinforcement learning, the method comprising: a step in which a modeling unit produces an activated sludge model (ASM) of the sewage treatment plant by simulating the plant; a step in which a model calibration unit generates a calibrated ASM by adjusting the kinetic coefficient values of the model using artificial intelligence learning; and a step in which an optimization module optimizes, through multi-agent reinforcement learning based on the calibrated ASM, the aeration intensity, external carbon source dosage, and sludge recirculation flow rate of the sewage treatment process operation.
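The three steps of this method (model, calibrate, optimize) can be outlined with a deliberately simplified stand-in. In the sketch below, a one-parameter first-order removal model replaces the ASM, a candidate search replaces the multi-agent reinforcement learning calibrator, and a feasibility scan over a single aeration variable replaces the multi-variable optimizer. Every function name and number is hypothetical, and the block shows only how the steps chain together, not the patented algorithm.

```python
import math


def simulate_effluent(influent, k, aeration):
    """Step 1 stand-in: toy first-order removal model (not an ASM)."""
    return influent * math.exp(-k * aeration)


def calibrate(influent, observed_effluent, aeration, candidates):
    """Step 2 stand-in: choose the coefficient with the smallest squared error."""
    def squared_error(k):
        return sum((simulate_effluent(i, k, aeration) - o) ** 2
                   for i, o in zip(influent, observed_effluent))
    return min(candidates, key=squared_error)


def optimize(influent, k, limit, aeration_levels):
    """Step 3 stand-in: lowest aeration level that keeps effluent within the limit."""
    for level in sorted(aeration_levels):
        if simulate_effluent(influent, k, level) <= limit:
            return level
    return max(aeration_levels)  # no feasible level: fall back to the maximum


# Synthetic "plant data" generated with a known coefficient of 0.8.
influent_series = [100.0, 120.0, 90.0]
observed_series = [simulate_effluent(i, 0.8, 2.0) for i in influent_series]

k_calibrated = calibrate(influent_series, observed_series, aeration=2.0,
                         candidates=[0.2, 0.4, 0.6, 0.8, 1.0])
best_aeration = optimize(influent=100.0, k=k_calibrated, limit=20.0,
                         aeration_levels=[1.0, 1.5, 2.0, 2.5, 3.0])
print(k_calibrated, best_aeration)
```

The point of the ordering is that the optimizer's recommendation depends on the calibrated coefficient, so a poorly calibrated model would yield a set point that is either wasteful or non-compliant.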

According to an embodiment of the present invention, it is possible to provide an intelligent model calibration protocol that calibrates the activated sludge model (ASM) of a sewage treatment plant using multi-agent reinforcement learning, and a multi-optimization system for process operation strategy that uses the protocol.

The autonomous calibration system for a sewage treatment plant model using reinforcement learning according to an embodiment of the present invention, and the multi-optimization system for process operation using that system, dynamically calibrate the ASM mathematical model that simulates the process as the biological and microbial characteristics of the plant change over time, and optimize the aeration intensity, external carbon source dosage, and sludge recirculation flow rate among the operating set points of the process. This has the effect of improving the environmental and economic performance of the sewage treatment process.

In addition, according to the autonomous calibration system and the multi-optimization system of an embodiment of the present invention, as verified with data from a domestic sewage treatment plant, the modeling error in simulating effluent chemical oxygen demand (COD) and total nitrogen (TN) can be reduced by 87% and 52%, respectively. Furthermore, when process operation is optimized with the ASM calibrated through the intelligent model calibration protocol, aeration energy can be improved by 25% and effluent quality by 7%.

In addition, by offering accurate process simulation and economical, environmentally friendly operating strategies to domestic and overseas sewage treatment plants, the autonomous calibration system and the multi-optimization system of an embodiment of the present invention are expected to be applicable to a wide range of plant operation fields. In terms of commercialization, they are expected to be used by domestic and overseas sewage treatment plants and environmental system companies to improve energy use and environmental performance.

The effects obtainable from the present invention are not limited to those mentioned above, and other effects not mentioned will be clearly understood from the description below by a person having ordinary skill in the art to which the present invention pertains.

The following drawings attached to this specification illustrate preferred embodiments of the present invention and, together with the detailed description, serve to further the understanding of its technical idea; the present invention should therefore not be construed as limited to the matters shown in the drawings.
FIG. 1 is a block diagram of the autonomous calibration system for a sewage treatment plant model using reinforcement learning, and of the multi-optimization system for process operation using that system, according to an embodiment of the present invention;
FIG. 2 shows the MARL-based autonomous calibration framework for an ASM-type model of an MBR plant;
FIG. 3 is a scheme, adapted from Guo et al., describing the initialization and calibration periods as part of the calibration protocol of the MARL algorithm;
FIG. 4 shows the training scheme of multi-agent reinforcement learning, adapted from Wong et al.;
FIG. 5 shows the structure of QMIX: (a) mixing network structure, (b) overall QMIX architecture, and (c) agent network structure;
FIG. 6 shows the computation procedure for two-stage attention and graph embedding;
FIG. 7 shows the detailed structure of G2ANet including two-stage attention with Central-V;
FIG. 8 shows daily measurement data from 2011 to 2018: influent and effluent (a) COD and (b) TN concentrations, (c) MLSS concentrations of the four reactors, and (d) influent temperature;
FIG. 9 shows the change in average reward of the QMIX-based and G2ANet-based autonomous calibration systems;
FIG. 10 shows a three-dimensional t-SNE embedding of the calibrated kinetic parameters assigned by the G2ANet algorithm (points are colored from blue to red according to the number of training epochs);
FIG. 11 compares the G2ANet-based calibrated model with the default model of the MBR plant for (a) effluent COD and (b) effluent TN;
FIG. 12 shows histograms and kernel densities of the kinetic parameter values calibrated for 2018 by the G2ANet algorithm (black dotted lines indicate the default values of the kinetic parameters);
FIG. 13 shows results of the G2ANet-based calibrated model: (a) effluent COD and TN and (b) calibrated parameters for February, and (c) effluent COD and TN and (d) calibrated parameters for July;
FIG. 14 shows the change in biological process rates for February and July in (a) the anoxic reactor and (b) the aerobic reactor;
FIG. 15 shows the nonlinear characteristics of influent TSS, COD, and TN concentrations: (a) measured influent pollutant concentrations, (b) a three-dimensional plot of the influent over time, and (c) skewness values of the influent with skewed histograms and probability plots;
FIG. 16 shows the graphical framework of the MARL-based autonomous multi-trajectory search system;
FIG. 17 shows the layout and influent load of the MBR plant;
FIG. 18 shows changes in influent COD and TN concentrations: (a) monthly and (b) daily trends;
FIG. 19 is a graphical representation of the multi-task trajectory search system based on a multi-agent reinforcement learning algorithm;
FIG. 20 shows influent data generation results: (a) influent scenario clusters for the COD and TN composition ratio obtained with k-means clustering, together with the procedure, and (b) daily influent data generated for each scenario;
FIG. 21 shows the linear reward function of the MARL algorithm, which uses a linear combination of performance indices;
FIG. 22 shows the change in average reward with the number of epochs: (a) comparison of QMIX and G2ANet, (b) QMIX, and (c) G2ANet;
FIG. 23 shows the optimal operating trajectories of DO concentration, external recycle ratio, and external carbon flow rate found by the G2ANet algorithm in (a) low, (b) normal, and (c) high influent scenarios;
FIG. 24 shows the characteristics of the 2018 influent conditions of the target MBR plant: (a) influent COD and TN concentrations, and k-means clustering-based Voronoi diagrams for classifying (b) daily and (c) monthly influent conditions;
FIG. 25 shows the autonomous optimization performance of the G2ANet-based trajectory search system for February (high influent condition): (a) changes in COD and TN concentrations, (b) optimal trajectories, (c) performance indices, and (d) effluent component profiles relative to the quality limits;
FIG. 26 shows the autonomous optimization performance of the G2ANet-based trajectory search system for July (low influent condition): (a) changes in COD and TN concentrations, (b) optimal trajectories, (c) performance indices, and (d) effluent component profiles relative to the quality limits.

The above objects, other objects, features, and advantages of the present invention will be readily understood from the following preferred embodiments taken in conjunction with the accompanying drawings. The present invention is not, however, limited to the embodiments described herein and may be embodied in other forms. Rather, the embodiments introduced herein are provided so that the disclosure is thorough and complete and so that the spirit of the invention is fully conveyed to those skilled in the art.

In this specification, when an element is said to be on another element, it may be formed directly on the other element, or a third element may be interposed between them. In the drawings, the thickness of elements is exaggerated for effective description of the technical content.

The embodiments described in this specification will be explained with reference to cross-sectional and/or plan views, which are idealized illustrations of the present invention. In the drawings, the thicknesses of films and regions are exaggerated for effective description of the technical content. The shapes shown in the illustrations may therefore be modified by manufacturing techniques and/or tolerances. Accordingly, the embodiments of the present invention are not limited to the specific shapes shown but also include changes in shape that result from the manufacturing process. For example, a region drawn at a right angle may be rounded or have a predetermined curvature. The regions illustrated in the drawings thus have attributes, and their shapes are intended to illustrate a specific form of a device region, not to limit the scope of the invention. Although terms such as first and second are used to describe various elements in the embodiments of this specification, the elements should not be limited by these terms, which are used only to distinguish one element from another. The embodiments described and illustrated herein also include their complementary embodiments.

The terminology used herein is for describing the embodiments and is not intended to limit the present invention. In this specification, singular forms also include plural forms unless the phrase specifically states otherwise. The terms 'comprises' and/or 'comprising' used in the specification do not exclude the presence or addition of one or more elements other than those mentioned.

In describing the specific embodiments below, various specific details are given to describe the invention more concretely and to aid understanding. A reader with enough knowledge of this field to understand the invention will nevertheless recognize that the invention can be practiced without these specific details. It is noted in advance that, in some cases, matters that are commonly known and not closely related to the invention are not described, in order to avoid needless confusion in explaining the invention.

Hereinafter, the configuration and functions of a sewage treatment plant model autonomous calibration system using reinforcement learning, and of a process operation multi-optimization system using that system, according to an embodiment of the present invention will be described. Figure 1 is a block diagram of the sewage treatment plant model autonomous calibration system using reinforcement learning and of the process operation multi-optimization system using that system according to an embodiment of the present invention.

First, the sewage treatment plant model autonomous calibration method and system using reinforcement learning according to an embodiment of the present invention will be described.

In the embodiment of the present invention, an intelligent protocol for calibrating an ASM-type model is proposed, and the biological kinetic parameters are calibrated with a multi-agent reinforcement learning (MARL) algorithm. First, a subset of kinetic parameters was selected from among 16 kinetic parameters. Second, the structure of the MARL algorithm used to propose the kinetic parameter values was selected. In the embodiment of the present invention, two MARL algorithms were compared: QMIX and the game abstraction mechanism based on a two-stage attention network (G2ANet). Based on the selected subset of kinetic parameters, the MARL agents were assigned to calibrate the kinetic parameters. After the weights of the MARL algorithm were trained, the MARL was tested on newly measured data that had not been used in the training procedure. The kinetic parameters calibrated by MARL were then evaluated by interpreting the underlying biological phenomena, to ensure reliable and feasible calibration performance.

The autonomous calibration framework of the ASM model according to an embodiment of the present invention is shown as a block diagram in Figure 2. The approach of the present invention consists of two parts: (1) selecting the structure of the MARL algorithm to be used for the MBR plant and (2) proposing the optimal kinetic parameter values of the ASM-SMP model in view of the influent and operational data of the target plant.

First, the structure of the MARL algorithm was selected in view of the measured influent and operational data set of the MBR plant, as shown in step 1 of Figure 2. The parameters to be calibrated were selected to reflect the biological information of the target sewage treatment plant. In general, a subset containing a small number of the many kinetic and stoichiometric parameters is required. Then, to increase the accuracy of the calibrated model, the structure of the MARL was selected using the state, observation, action, and reward defined from the operational information of the sewage treatment plant. In addition, the state variables of the influent were estimated from the characterized influent information.

Second, as shown in step 2 of Figure 2, the MARL-based autonomous calibration was trained and applied to determine the optimal values of the model parameters. Influent and operational data were collected from 2011 to 2018 at a full-scale MBR plant located in Y-city. The seven years of data from 2011 to 2017 were used as the training set, and the autonomous calibration was tested on the data measured in 2018. In the embodiment of the present invention, two MARL algorithms were used: QMIX and the game abstraction mechanism based on a two-stage attention network (G2ANet). Because the influent of an MBR plant fluctuates, proposing calibrated parameters every day can be a pitfall that increases uncertainty. The MARL algorithm therefore proposed calibrated parameter values every 5 days. As shown in Figure 2, the MARL-calibrated model was validated on the measured influent and effluent data sets in comparison with the default model.

First, the method of selecting the subset of kinetic parameters for calibration will be described.

Selecting the parameter subset is a critical procedure that identifies the parameters able to affect the process behavior modeled by ASM-SMP. This procedure should be carried out before searching for the optimal parameter values. The influence of a parameter can depend on the influent characteristics and the operating conditions. Parameter selection produces a subset containing a small number of identifiable parameters from among the many parameters of an ASM. Influential parameters can be selected in two ways: (1) by a mathematical model that identifies the sensitivity of the parameters and (2) from prior experience and expert knowledge. The influential parameter subsets reported in previous studies can be useful for qualitative modeling. However, the subsets from previous studies cannot fully guarantee accurate modeling results for other cases.

Therefore, in the embodiment of the present invention, the influential parameters for calibration were selected on the basis of existing studies on general sewage treatment modeling. It is also essential to select parameters specific to the MBR plant, so additional parameters that affect the biological nutrient removal process of the MBR plant were selected. The stoichiometric parameters of the model would have to be measured through respirometry-based experiments. Because the stoichiometric parameters in the Monod equations of the various biological processes have been consistently confirmed across the series of ASMs, the embodiment of the present invention uses these well-established stoichiometric parameters.

[Table 1] Summary of calibrated kinetic parameters of MBR models in the literature and in the present invention

Table 1 summarizes a review of the kinetic parameters calibrated in studies that model MBR plants with ASM-series models. The parameters most often calibrated are the general kinetic parameters related to the growth and decay of heterotrophs and autotrophs and to the formation of SMP. These parameters are directly related to the heterotrophic and autotrophic populations and to the nitrification process. In conclusion, eight kinetic parameters were selected.

To improve the accuracy of ASM-SMP, the eight selected kinetic parameters were calibrated automatically with MARL. The parameters not selected for calibration were kept at their default values, and literature values were found to be acceptable for domestic WWTPs and MBRs.

Next, the temperature-dependent parameters are described. As with chemical reactions, the rate of a biological reaction varies with temperature because of the concept of activation energy. Since biological reactions depend on temperature, the Arrhenius equation can be applied to them. The temperature dependence of the growth, decay, and other rates and constants of the ASM-SMP model can therefore be formulated according to the Arrhenius equation. The form of the Arrhenius equation commonly used in sewage treatment research is given by Equation 1.

[Equation 1]

$$k_T = k_{T_0}\,\theta^{(T - T_0)}$$

where k_T is the kinetic parameter at temperature T, k_T0 is the kinetic parameter at the reference temperature T0 (20 °C), and θ is the Arrhenius constant, usually between 1.04 and 1.09.
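The temperature correction of Equation 1 can be illustrated with a minimal Python sketch, assuming the commonly used form k_T = k_T0 · θ^(T − T0). The function name `arrhenius_correct`, the example rate of 3.0 1/d, and the default θ = 1.07 are illustrative assumptions, not values taken from the specification.

```python
def arrhenius_correct(k_ref: float, temp_c: float,
                      theta: float = 1.07, ref_temp_c: float = 20.0) -> float:
    """Scale a kinetic parameter from the reference temperature to temp_c.

    k_ref      : parameter value at the reference temperature (k_T0)
    temp_c     : water temperature T in degrees Celsius
    theta      : Arrhenius constant (assumed default inside the 1.04-1.09 range)
    ref_temp_c : reference temperature T0, 20 degrees Celsius in the text
    """
    return k_ref * theta ** (temp_c - ref_temp_c)


# Hypothetical rate constant of 3.0 1/d at 20 degrees C, evaluated for a cold and a warm day.
k_cold = arrhenius_correct(3.0, 10.0)
k_warm = arrhenius_correct(3.0, 25.0)
```

With θ greater than 1, the corrected value falls below the reference value in cold water and rises above it in warm water, which is the seasonal behavior the calibration has to account for.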

Table 2 summarizes the temperature-dependent kinetic parameters and their Arrhenius coefficient values. The ASM-SMP model includes seven kinetic parameters that represent biological reaction rates, as listed in Table 2. Four of these kinetic parameters belong to the parameter subset calibrated by MARL, so the remaining three were adjusted for changes in influent temperature according to the Arrhenius equation of Equation 1. As a result, the hydrolysis rate, the hydrolysis constant, and the ammonification rate reflected the temperature-dependent activity of the various microorganisms.

[Table 2] Temperature-dependent kinetic parameters and Arrhenius coefficients

In the embodiment of the present invention, many ASM-SMP simulations were run, each with a different sample set of influent and operating values drawn from the measured data set of the target MBR plant. In each simulation, the MARL-based autonomous calibration was trained to search for a strategy that finds appropriate kinetic parameters. Each simulation with the calibrated parameters was compared with the measured data set, so that the error between the simulated and the measured effluent could be calculated. Selecting the sampled data set is therefore important for ensuring accurate calibration performance.

In the embodiment of the present invention, the measured data set was divided into a training set (2011 to 2017) and a test set (2018). The autonomous calibration algorithm was trained on the training set by randomly sampling a period from the data and running the calibration over the sampled period. Each sampled training period was 20 days long and consisted of a 5-day initialization period and a 15-day calibration period. The kinetic parameters were initialized in every simulation. This is necessary because the kinetic parameters proposed for one period can affect the initial conditions of the model at the start of another calibration period. For example, low microbial activity in the cold season can reduce the growth rate parameters of the microorganisms and increase the decay rate parameters. Initial values based on growth and decay rates calibrated for the cold season can lead to underestimating microbial activity when warm-season data are used for calibration.

Figure 3 shows the conceptual sampling and execution of the multi-agent reinforcement learning-based calibration. One simulation period consists of two consecutive parts: a 5-day initialization period, in which the default parameter values are used, and a 15-day calibration period, in which MARL adjusts the kinetic parameters. The MARL-based calibration proposed kinetic parameter values every 5 days, so the autonomous calibration proposed three sequential values for each kinetic parameter over the 15-day calibration period. The 5-day span of each calibrated parameter is the window size. The window size can be chosen to balance computation time against the complexity of the target system: a large window may lose information inherent in the data, whereas a small window may increase complexity. A window size of 5 days was chosen to ensure accurate calibration results.

Once a total of 20 days, covering the initialization and calibration periods, had been selected at random, the autonomous calibration system changed the kinetic parameters starting from the default values of the initialization period, as shown in Figure 3. Training of the calibration system was not active during the initialization period, but the adjusted kinetic parameters were passed on as the starting point of the calibration period. The autonomous system then changed the kinetic parameters starting from the adjusted values of the initialization period. From this point on, the autonomous system interacted with the environment and was trained to update its weight values.
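The sampling scheme described above (a random 20-day span split into a 5-day initialization period and three 5-day calibration windows) can be sketched as follows. This is only an illustration under stated assumptions: the function name `sample_episode`, the constant names, and the use of calendar dates as the time index are hypothetical, and the specification does not say how the random period is drawn.

```python
import random
from datetime import date, timedelta

INIT_DAYS = 5        # initialization period
WINDOW_DAYS = 5      # one set of calibrated parameter values per window
N_WINDOWS = 3        # 15-day calibration period = 3 windows
EPISODE_DAYS = INIT_DAYS + WINDOW_DAYS * N_WINDOWS  # 20 days in total


def sample_episode(first_day, last_day, rng):
    """Pick a random 20-day span inside [first_day, last_day] and split it.

    Returns the (start, end) dates of the initialization period and a list of
    (start, end) dates for the three calibration windows, all inclusive.
    """
    latest_offset = (last_day - first_day).days - EPISODE_DAYS + 1
    start = first_day + timedelta(days=rng.randint(0, latest_offset))
    init_period = (start, start + timedelta(days=INIT_DAYS - 1))
    windows = []
    for w in range(N_WINDOWS):
        w_start = start + timedelta(days=INIT_DAYS + w * WINDOW_DAYS)
        windows.append((w_start, w_start + timedelta(days=WINDOW_DAYS - 1)))
    return init_period, windows


# Training data span 2011-2017; a fixed seed keeps the sketch reproducible.
rng = random.Random(0)
init_period, calib_windows = sample_episode(date(2011, 1, 1), date(2017, 12, 31), rng)
```

Each call yields one training episode; the agents would propose one set of kinetic parameter values per entry of `calib_windows`.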

Next, multi-agent reinforcement learning for autonomous calibration is described. In recent years, advanced applications of deep learning and reinforcement learning (RL) have been attempted for decision-making problems in hydrological research, such as sewage treatment processes. RL is a self-learning algorithm for which no target output is specified. It finds the optimal solution to a given problem through a reward policy and a trial-and-error mechanism. More recently, advanced RL has been applied to multiple problems at once by using several single agents systematically modeled as multi-agent RL (MARL). MARL deals with the simultaneous decision-making of multiple autonomous agents acting in a common environment. Each MARL agent aims to solve and optimize the problem by interacting with the environment and with the other agents.

In the embodiment of the present invention, MARL was used to propose the values of the kinetic parameters simultaneously. Figure 4 shows the structure of the MARL algorithm for calibrating the ASM-SMP model. The embodiment adopts centralized training and decentralized execution in the MARL structure. The agents can share their information during training, and each agent acts in a decentralized way based on its own (local) observation. In the embodiment of the present invention, the calibrated kinetic parameters were used as the actions of the n agents, and the influent and operating conditions were used as the observations. The key idea is that an agent can receive extra information from the observations, rewards, and gradients of the other agents. The agents execute their policies in a decentralized way, taking their local observations into account. In the centralized training part, the agents can be trained together using a critic or a Q-network. In the embodiment of the present invention, the influent and operating conditions were used as the state of the critic or Q-network. This structure helps to mitigate non-stationarity and partial observability, because it stabilizes the learning of an agent even when the policies of the other agents change.

In the embodiment of the present invention, two MARL techniques were used: QMIX and G2ANet. QMIX is a value-based method that trains decentralized policies in a centralized, end-to-end manner. G2ANet models the relationships between agents as a complete graph and proposes actions through a game abstraction mechanism based on a two-stage attention network.

QMIX has a centralized training and decentralized execution structure. It uses a network that computes the joint action-value as a complex nonlinear combination of per-agent values, each of which is conditioned only on local observations. QMIX uses a mixing network to combine the local value functions of the individual agents and includes the global state in the training and learning process. This improves the performance of the algorithm and keeps the centralized and decentralized policies consistent.

QMIX keeps the deterministic greedy decentralized policy consistent with the centralized policy that is based on the optimal joint action-value function. The joint Q-function and the local Q-functions therefore satisfy Equation 3.

[Equation 3]

$$\underset{\mathbf{u}}{\arg\max}\; Q_{tot}(\boldsymbol{\tau}, \mathbf{u}) = \begin{pmatrix} \arg\max_{u^1} Q_1(\tau^1, u^1) \\ \vdots \\ \arg\max_{u^n} Q_n(\tau^n, u^n) \end{pmatrix}$$

where argmax finds the argument u that gives the maximum Q-value of the function, and Q_tot(τ, u) is the centralized action-value Q-function conditioned on the joint action-observation history τ and the joint action u. The Q-function gives the expected reward for an action taken in a given state. Q_a is the individual value function of each agent a among the n agents and is conditioned only on the individual action-observation history τ^a and action u^a. Equation 3 allows each agent to take part in decentralized execution on its own by choosing the greedy action with respect to its Q_a.

Equation 3 can be generalized to a larger family of monotonic functions. Monotonicity is defined as a constraint between Q_tot and each Q_a, as expressed by Equation 4, so that an increase in the Q_a of any local agent never decreases the joint Q_tot.

[Equation 4]

$$\frac{\partial Q_{tot}}{\partial Q_a} \ge 0, \quad \forall a$$
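The link between the monotonicity constraint of Equation 4 and the argmax consistency of Equation 3 can be checked numerically. The sketch below is a toy example, not the mixing network of the embodiment: the random per-agent Q tables, the non-negative weights `w`, and the `tanh` mixing function are all illustrative assumptions.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
n_agents, n_actions = 3, 4

# Hypothetical per-agent utilities Q_a(tau^a, u^a) for one fixed set of histories.
q_agents = rng.normal(size=(n_agents, n_actions))

# Non-negative weights make Q_tot non-decreasing in every Q_a (Equation 4).
w = np.abs(rng.normal(size=n_agents))
b = rng.normal()


def q_tot(joint_action):
    """Toy monotonic mixer: increasing nonlinearity of a non-negative weighted sum."""
    chosen = q_agents[np.arange(n_agents), list(joint_action)]
    return float(np.tanh(w @ chosen + b))


# Centralized argmax: brute-force search over every joint action.
joint_best = max(itertools.product(range(n_actions), repeat=n_agents), key=q_tot)

# Decentralized argmax: every agent maximizes its own Q_a independently.
greedy = tuple(int(a) for a in q_agents.argmax(axis=1))
```

Because the toy mixer is increasing in each Q_a, the brute-force joint maximizer coincides with the tuple of per-agent greedy actions, which is what makes decentralized execution possible without searching the joint action space.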

Figure 5 shows the QMIX structure, which comprises agent networks, a mixing network, and a set of hypernetworks. Q_a is the individual value function of a single agent network a, which uses a deep recurrent Q-network (DRQN) that receives the current individual observation and the previous action as inputs at every time step, as shown in Figure 5(c). The DRQN uses a gated recurrent unit (GRU) to pass on hidden-state values, which facilitates learning over longer time scales. Under the epsilon schedule, with probability 1-ε the action follows the policy π that selects the maximum of the DRQN output Q_a.
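The epsilon-schedule action selection mentioned above can be sketched as follows. This is a minimal illustration: the function names, the linear form of the schedule, and its start, end, and annealing values are assumptions, since the specification does not state the schedule used.

```python
import numpy as np


def epsilon_greedy(q_values, epsilon, rng):
    """Return argmax_u Q_a with probability 1 - epsilon, else a uniformly random action."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))
    return int(np.argmax(q_values))


def linear_epsilon(step, start=1.0, end=0.05, anneal_steps=50_000):
    """Hypothetical linear schedule that anneals epsilon from start to end."""
    frac = min(step / anneal_steps, 1.0)
    return start + frac * (end - start)


# Example: one agent with three candidate actions, late in training.
rng = np.random.default_rng(0)
action = epsilon_greedy(np.array([0.1, 0.9, 0.3]), linear_epsilon(40_000), rng)
```

Early in training epsilon is close to 1, so the agents mostly explore; as epsilon decays, they increasingly follow the greedy choice of their own Q_a.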

The mixing network shown in Figure 5(a) is a feedforward neural network that receives the outputs of the individual agents, mixes the Q_a values monotonically, and produces Q_tot. The hypernetworks W₁ and W₂ receive the state S as input and generate the weights of each layer of the mixing network. Each hypernetwork consists of two fully connected layers with rectified linear unit (ReLU) activation.
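A forward pass through a mixing network of this kind can be sketched in NumPy. The sketch rests on several assumptions that are not stated in the specification: the layer sizes, the random (untrained) parameters, the use of an absolute value to keep the generated weights non-negative, the ELU activation in the mixer, and the helper names `init_hyper`, `run_hyper`, and `mix`.

```python
import numpy as np

rng = np.random.default_rng(0)
n_agents, state_dim, embed_dim, hyper_dim = 8, 12, 16, 32  # illustrative sizes


def init_hyper(out_dim):
    """Hypernetwork with two fully connected layers (random, untrained weights)."""
    return {
        "w1": rng.normal(scale=0.1, size=(state_dim, hyper_dim)),
        "w2": rng.normal(scale=0.1, size=(hyper_dim, out_dim)),
    }


def run_hyper(params, state):
    hidden = np.maximum(state @ params["w1"], 0.0)  # ReLU
    return hidden @ params["w2"]


hyper_w1 = init_hyper(n_agents * embed_dim)  # generates first-layer mixing weights
hyper_w2 = init_hyper(embed_dim)             # generates second-layer mixing weights
hyper_b1 = init_hyper(embed_dim)             # generates first-layer bias
hyper_b2 = init_hyper(1)                     # generates output bias


def elu(x):
    return np.where(x > 0, x, np.exp(np.minimum(x, 0.0)) - 1.0)


def mix(agent_qs, state):
    """Combine per-agent Q_a into Q_tot using state-dependent, non-negative weights."""
    w1 = np.abs(run_hyper(hyper_w1, state)).reshape(n_agents, embed_dim)
    w2 = np.abs(run_hyper(hyper_w2, state))
    b1 = run_hyper(hyper_b1, state)
    b2 = run_hyper(hyper_b2, state)[0]
    hidden = elu(agent_qs @ w1 + b1)
    return float(hidden @ w2 + b2)


state = rng.normal(size=state_dim)   # stand-in for the global state S
agent_qs = rng.normal(size=n_agents) # stand-in for the chosen Q_a of each agent
q_total = mix(agent_qs, state)
```

Only the mixing weights are forced to be non-negative; the biases may take any sign, because they do not affect the monotonicity of Q_tot with respect to each Q_a.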

G2ANet learns the interaction relationships between agents automatically through an end-to-end model design based on game abstraction. All agents are represented as a graph structure and grouped by the game abstraction algorithm. This game abstraction algorithm consists of a two-stage attention mechanism, namely hard attention and soft attention. It is also combined with a graph neural network (GNN) to obtain a joint encoding that represents the contributions of the other agents. The calculation procedure is shown in Figure 6.

(1) Hard attention: hard attention selects the important elements by a sampling method and discards the rest. It cuts off the other agents that are unrelated to the target agent.

(2) Soft attention: soft attention computes the importance distribution of the elements. It learns the importance weights (relationships) between agents.

(3) GNN (graph embedding): the GNN computes the contributions of the other agents and outputs the information of the other agents, including their coordination relationships.

Figure 7 shows the detailed structure of G2ANet. At each time step t, agent i encodes its observation o_i^t and action a_i^t into a feature vector e_i^t embedded by an MLP and a GRU. Hard attention computes the hard-attention weight W_h^{i,j}, which indicates whether agent i and agent j are connected, as shown in Figure 7(a).

Through the hard attention structure, each agent plays a different role with respect to the other agents. Here a bidirectional gated recurrent unit (Bi-GRU) and Gumbel-softmax are used so that gradients can be back-propagated. Figure 7(b) shows the soft attention, which learns the soft-attention weights W_s^{i,j} between agents. It has a query-key-value structure computed from the embeddings of the agents. The GNN computes a joint vector encoding that defines the contributions of all other agents to agent i, as shown in Figure 7(c). Finally, as shown in Figure 7(d), a centralized critic with individual actors (Central-V) is used to compute the Q-values. Central-V is an on-policy actor-critic algorithm whose critic is conditioned on the state rather than on the individual observation histories.
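The two-stage attention described above can be illustrated with a simplified NumPy forward pass. This sketch departs from the described structure in ways that should be read as assumptions: the Bi-GRU pair encoder is replaced by a single random linear scorer, the projections are random rather than learned, only the forward pass of the Gumbel-softmax (a Gumbel-max sample) is shown, and the names `gumbel_max_sample` and `g2a_contributions` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n_agents, feat_dim, key_dim = 8, 16, 8
e = rng.normal(size=(n_agents, feat_dim))  # stand-in for the feature vectors e_i

# Random projections for illustration only (these would be learned in practice).
w_pair = rng.normal(scale=0.3, size=(2 * feat_dim, 2))  # linear scorer replacing the Bi-GRU
w_q = rng.normal(scale=0.3, size=(feat_dim, key_dim))   # query projection
w_k = rng.normal(scale=0.3, size=(feat_dim, key_dim))   # key projection


def gumbel_max_sample(logits):
    """One-hot sample per row; this is the forward pass of a hard Gumbel-softmax."""
    gumbel = -np.log(-np.log(rng.uniform(1e-9, 1.0, size=logits.shape)))
    y = logits + gumbel
    return (y == y.max(axis=-1, keepdims=True)).astype(float)


def g2a_contributions(e):
    n = e.shape[0]
    off_diag = ~np.eye(n, dtype=bool)
    # Hard attention: binary edge weight W_h^{i,j} for every ordered pair (i, j), i != j.
    pairs = np.concatenate([np.repeat(e, n, axis=0), np.tile(e, (n, 1))], axis=1)
    w_hard = gumbel_max_sample(pairs @ w_pair)[:, 1].reshape(n, n) * off_diag
    # Soft attention: scaled dot-product weights W_s^{i,j}, normalized over kept edges.
    scores = (e @ w_q) @ (e @ w_k).T / np.sqrt(key_dim)
    weights = np.exp(scores - scores.max(axis=1, keepdims=True)) * w_hard
    denom = weights.sum(axis=1, keepdims=True)
    w_soft = np.divide(weights, denom, out=np.zeros_like(weights), where=denom > 0)
    # Contribution of the other agents to agent i: weighted sum of their features.
    return w_hard, w_soft, w_soft @ e


w_hard, w_soft, contributions = g2a_contributions(e)
```

An agent whose edges are all cut by hard attention receives a zero contribution vector, which corresponds to acting on its own observation alone.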

The embodiment of the present invention was carried out at a full-scale MBR plant located in Y-city. From 2011 to 2018, operational data including the influent and the mixed liquor suspended solids (MLSS) concentrations in the four reactors were collected daily, as shown in Figure 8. The average influent and effluent COD concentrations were 331.92 and 14.05 mg/L, respectively, and the COD removal efficiency was 95.76%. For TN, the influent and effluent concentrations were 44.23 and 6.24 mg/L, respectively, so the TN removal efficiency was 85.90%. The MLSS concentration of each reactor and the average concentration are shown in Figure 8(c). As in a typical MBR process, the anoxic, aerobic, and membrane reactors have relatively high MLSS concentrations in order to sustain nitrification and denitrification. Figure 8(d) shows the change in influent wastewater temperature over time. The water temperature clearly rises in the hot season (June to September) and falls in the cold season (November to February). It is therefore important that an accurate, dynamic calibration reflects the seasonal and temperature-dependent characteristics of the target plant as well as its operating characteristics.
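The removal efficiencies quoted above follow from the usual definition, (influent − effluent) / influent. The short sketch below recomputes them from the rounded average concentrations; the function name is illustrative, and the small differences from the reported figures (about 0.01 percentage points) may simply reflect rounding or a different averaging of the daily data.

```python
def removal_efficiency(influent_mg_l: float, effluent_mg_l: float) -> float:
    """Percentage of the influent concentration removed by the process."""
    return 100.0 * (influent_mg_l - effluent_mg_l) / influent_mg_l


# Average concentrations from the text (mg/L).
cod_removal = removal_efficiency(331.92, 14.05)  # reported as 95.76 %
tn_removal = removal_efficiency(44.23, 6.24)     # reported as 85.90 %
```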

Eight agents were assigned, each proposing the value of one kinetic parameter. In this embodiment, agents 1 to 4 each calibrate one of the parameters related to the effluent COD concentration, because these parameters describe heterotrophs and the BAP that affect heterotrophic growth and lysis. Agents 5 to 8 each calibrate one of the parameters that describe the TN removal mechanism and its effect on the growth and lysis of autotrophs. The detailed structure of the MARL agents is as follows.

- State and observations:

The global state, which is available to the agents only during centralized training, contains information on the influent and on biological properties. Specifically, the state vector includes COD, TN, NO, NH, SS and Q together with the influent temperature and the MLSS concentrations of the four reactors on the current and previous days (days d and d-1). Agents 1 to 8 received local observations chosen according to their target kinetic parameters for COD and TN removal. Agents 1 to 4 observed the influent S_i (soluble inert COD), S_s (readily biodegradable COD), X_i (inert suspended COD), X_s (slowly biodegradable COD), COD concentration and temperature. Agents 5 to 8 observed the influent S_NO (soluble nitrate nitrogen), S_NH (soluble ammonia nitrogen), S_ND (soluble biodegradable organic nitrogen), X_ND (slowly biodegradable organic nitrogen), TN concentration and temperature.

- Actions:

The discrete action set available to each agent consists of -20%, -10%, 0%, +10% and +20%. Each percentage gives the size of the change applied to the kinetic parameter, expressed as a fraction of the allowed range of that parameter. For example, for a kinetic parameter with a lower bound of 0.6 and an upper bound of 8, a +20% action corresponds to +1.48, calculated by multiplying the difference between the bounds by 20%: (8 - 0.6) × 0.2 = 1.48.
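The action arithmetic above can be sketched as follows. The function names, the starting value of 6.0, and the clipping of the updated parameter to its bounds are assumptions added for illustration; the text only specifies the five percentage actions and the range-based step size.

```python
# Discrete action set from the text, as fractions of the parameter range.
ACTIONS = (-0.20, -0.10, 0.0, 0.10, 0.20)


def parameter_step(action, lower, upper):
    """Size of the change for one action: fraction of (upper - lower)."""
    return action * (upper - lower)


def apply_action(value, action, lower, upper):
    """Apply the step and keep the result inside [lower, upper].

    Clipping is an assumption; the source does not say how out-of-range
    proposals are handled.
    """
    return min(upper, max(lower, value + parameter_step(action, lower, upper)))


# Worked example from the text: bounds 0.6 and 8, action +20%.
step = parameter_step(0.20, 0.6, 8.0)

# Hypothetical current value of 6.0 for the same parameter.
updated = apply_action(6.0, 0.20, 0.6, 8.0)
clipped = apply_action(7.5, 0.20, 0.6, 8.0)
```

Expressing each action relative to the parameter range rather than to the current value keeps the step size constant across the range, so a parameter near its lower bound can still move as far in one step as one near its upper bound.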

- Rewards:

The overall goal of calibration is to minimize the simulation error. A numerical reward function was therefore given to the MARL agents, as in Equation 5.1.

[Equation 5.1]

Here r(d) is the reward on day d, which ranges from -1 to +1, and MAPE_pollutant(d) is the mean absolute percentage error of each pollutant on day d. The reward was normalized in this way to help the agents explore actions efficiently and to promote better calibration performance. MAPE is given by Equation 5.2.

[Equation 5.2]

Here y is the measured effluent COD or TN concentration, and the modeled counterpart is the effluent COD or TN concentration simulated by the calibrated ASM-SMP model. A lower MAPE means a more accurate model, so when the calibrated model simulates effluent COD and TN accurately, the MAPE decreases and the reward in Equation 5.1 increases.
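For reference, the commonly used definition of MAPE can be computed as in the sketch below. This is the standard textbook form; whether Equation 5.2 of the patent uses exactly this normalization is an assumption, and the concentrations in the example are hypothetical.

```python
def mape(measured, simulated):
    """Mean absolute percentage error in percent (standard definition).

    measured  : observed effluent concentrations, all non-zero
    simulated : model outputs for the same days
    """
    if not measured or len(measured) != len(simulated):
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs((y - y_hat) / y) for y, y_hat in zip(measured, simulated))
    return 100.0 * total / len(measured)


# Hypothetical effluent COD values in mg/L.
example_mape = mape([10.0, 20.0], [9.0, 22.0])
```

Because each error is divided by the measured value, MAPE is unit-free, which makes it convenient for combining COD and TN errors of different magnitudes into a single reward signal.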

The structure of the MARL algorithms used to calibrate the kinetic parameters of the ASM-SMP model described above is summarized in Table 3.

[Table 3] Structure of QMIX and G2ANet for kinetic parameter calibration of ASM-SMP

The MARL algorithms of the autonomous calibration system were trained by randomly sampling historical influent and operational measurements. For high and robust performance under abnormal changes in the influent, it is important that the trained MARL algorithm proposes calibrated kinetic parameters reliably. In this embodiment, the training procedures of the MARL algorithms were compared over the number of epochs in order to verify their stability. FIG. 9 shows the average reward of the QMIX-based and G2ANet-based autonomous calibration systems against the cumulative number of epochs. The number of epochs is the number of training runs, each obtained by sampling an initialization and calibration period as described above. A total of 20,000 random samples were drawn from seven years of data to train the MARL algorithms. The reward is shown as an average per 100 epochs so that the increasing trends of the two MARL algorithms can be compared directly.

The reward of QMIX was unstable over the training epochs. Early in training, at around 2,000 epochs, QMIX learned quickly and obtained a high average reward. This is because QMIX uses a memory to store past state-action-reward information: QMIX is an off-policy reinforcement learning algorithm that draws data from a replay buffer of old experience. However, QMIX behaved inconsistently over the whole training procedure, even though the action selection was shifted from exploration to exploitation by decreasing the ε probability. Here ε is the probability used in the epsilon-greedy rule: a high ε favors exploration and a low ε favors exploitation.
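The shift from exploration to exploitation described above can be sketched with a generic epsilon-greedy rule and a decaying ε. The linear schedule, its start and end values, and the 10,000-epoch decay horizon are hypothetical choices for illustration; the source does not state the schedule used for QMIX.

```python
import random


def epsilon_greedy(q_values, epsilon, rng):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))  # explore
    return max(range(len(q_values)), key=q_values.__getitem__)  # exploit


def linear_epsilon(epoch, start=1.0, end=0.05, decay_epochs=10_000):
    """Linearly anneal epsilon from start to end (assumed schedule)."""
    fraction = min(1.0, epoch / decay_epochs)
    return start + fraction * (end - start)


rng = random.Random(0)
# Hypothetical Q-values for the five discrete actions of one agent.
q_values = [0.1, 0.4, 0.9, 0.3, 0.2]
greedy_action = epsilon_greedy(q_values, epsilon=0.0, rng=rng)
early_eps = linear_epsilon(0)
late_eps = linear_epsilon(20_000)
```

With ε near 1 almost every action is random, which matches the scattered behavior early in training; as ε approaches its floor the agent mostly repeats the action with the highest estimated value.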

As shown in FIG. 9, the reward of G2ANet increased steadily as training progressed. The Central-V of the G2ANet-based autonomous calibration system is an on-policy policy gradient method: the action-value function is fitted to the current policy, and the policy is improved, mostly greedily, with respect to those action values. G2ANet obtained action probabilities directly from the state and observations of the target MBR plant and learned how to maximize the expected future reward. As a result, G2ANet showed more stable rewards than QMIX over the whole training period, and the reward converged stably after 15,000 epochs. The weights of G2ANet trained for 15,000 epochs were therefore used for the subsequent kinetic parameter calibration. In this embodiment, G2ANet was selected instead of QMIX to calibrate the kinetic parameters because of its adaptability and stability.

In this embodiment, the changes in the actions of the eight G2ANet agents during training were examined to support the calibration performance of the multiple agents. For this purpose t-distributed stochastic neighbor embedding (t-SNE), a visualization method for high-dimensional, non-linear data, was used. t-SNE uses a heavy-tailed Student-t distribution to reduce high-dimensional, non-linear data at different scales to points on a low-dimensional map (also called embeddings).

FIG. 10 shows the result of the t-SNE algorithm, which maps similar sets of kinetic parameters calibrated by G2ANet to nearby points. The t-SNE algorithm produced similar embeddings for kinetic parameters proposed at similar numbers of training epochs. Following the increasing trend of the reward, as seen in FIG. 9 and FIG. 10, the training procedure can be broadly divided into three stages: random action, training in progress, and complete training. The t-SNE embeddings clustered into these three training stages, as shown in FIG. 10(a).

The t-SNE result shows a clear trend in which the embedded points move progressively in the positive t-SNE 1 direction. This indicates that, over the course of training, the G2ANet algorithm learned to raise the accuracy of the model by proposing appropriate kinetic parameters.

FIG. 10(b) shows the t-SNE embedding points for the random action stage of G2ANet training. Throughout this stage the points are widely scattered in the three-dimensional t-SNE space, because the actions of the eight agents were determined by exploration rather than exploitation and by the initial policy of G2ANet. The reward in this stage fluctuated accordingly, as shown on the left of FIG. 10. As training progressed, the embedding points moved in the positive t-SNE 1 direction. In the training-in-progress stage, the embedding points were mapped to a more compact region of the three-dimensional t-SNE space than in the random action stage, as shown in FIG. 10(c). During training, G2ANet encountered randomly selected states, such as the influent and operational information of the MBR plant, and greedily updated its refined policy. At this stage, G2ANet improved the policy and updated its weights in the direction of the performance gradient.

The embedding points for the complete training stage are shown in FIG. 10(d). Here the performance of G2ANet was checked on the test set using the weights from this stage. Compared with the random action and training-in-progress stages, the mapped points in the complete training stage are concentrated. This indicates that the eight G2ANet agents agreed on consistent kinetic parameters even though the states of the MBR plant were selected at random. G2ANet was therefore able to calibrate the ASM-SMP model robustly even under complex influent conditions of the MBR plant. Consequently, in this embodiment, G2ANet trained for 15,000 epochs was used as the main calibration algorithm to propose kinetic parameters that reflect the biological processes of the MBR plant.

Validation was carried out with a separate data set collected at the target MBR plant in 2018. FIG. 11 shows the effluent COD and TN concentrations simulated for 2018 by the model calibrated with the proposed G2ANet algorithm. Compared with the default model, which uses the default value of every parameter, the G2ANet-based autonomous calibration reduced the error substantially. As shown in FIG. 11(a), G2ANet performed particularly well in simulating the effluent COD concentration, whereas the default model was limited in its ability to simulate it correctly. The deviation between the measured effluent COD concentration and the concentration simulated by the default model was 21.48 mg/L, while the G2ANet-calibrated model reduced the deviation to 2.78 mg/L. Using G2ANet to calibrate the ASM-SMP model thus reduced the deviation of the effluent COD simulation by 87.05%.

The improvement in the simulation of the effluent TN concentration by G2ANet is confirmed in FIG. 11(b). The default model showed a deviation of 4.99 mg/L for effluent TN, which is relatively low compared with its deviation for effluent COD. Calibration was nevertheless also needed for effluent TN. Because TN has a relatively large impact on effluent quality, the effluent charge imposed for discharging high TN concentrations is higher than for other pollutants. Accurate simulation of the effluent TN concentration is therefore important for establishing a monitoring, optimization and control strategy that improves the TN removal efficiency of the sewage treatment process. The G2ANet-calibrated model simulated the effluent TN concentration more accurately than the default model, reducing the simulation deviation from 4.99 mg/L to 2.40 mg/L, a reduction of 51.85%. As a result, the G2ANet-based autonomous calibration system simulated both the effluent COD and TN concentrations of the full-scale MBR plant accurately.
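The reported reductions can be checked against the deviations quoted in the text with a short calculation. The function name is hypothetical. Recomputing from the rounded deviations gives values within about 0.1 percentage point of the reported 87.05% and 51.85%; the small gap may come from rounding of the published deviations, which is an assumption rather than something the source states.

```python
def deviation_reduction(default_deviation, calibrated_deviation):
    """Percentage reduction of the simulation deviation after calibration."""
    return 100.0 * (default_deviation - calibrated_deviation) / default_deviation


# Deviations in mg/L as quoted in the text.
cod_reduction = deviation_reduction(21.48, 2.78)
tn_reduction = deviation_reduction(4.99, 2.40)

print(f"COD deviation reduction: {cod_reduction:.2f}%")
print(f"TN deviation reduction:  {tn_reduction:.2f}%")
```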

In addition, the calibration performance of G2ANet was evaluated quantitatively in this embodiment, as summarized in Table 4. MAPE and the root mean squared error (RMSE) were selected as numerical metrics of how well the model fits the measured data. The G2ANet-calibrated model gave lower MAPE and RMSE for both effluent COD and TN concentrations; that is, the model calibrated with the proposed protocol outperformed the default model. Although G2ANet was trained only on data from 2011 to 2017, it calibrated robustly and accurately on the previously unseen state information for 2018. It can therefore be inferred that the G2ANet algorithm can be used to calibrate the model on newly measured data sets without additional training.
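As a companion to MAPE, the standard definition of RMSE can be sketched as below. The sample values are hypothetical and are not the Table 4 results.

```python
import math


def rmse(measured, simulated):
    """Root mean squared error, in the same unit as the inputs."""
    if not measured or len(measured) != len(simulated):
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((y - y_hat) ** 2 for y, y_hat in zip(measured, simulated))
    return math.sqrt(squared / len(measured))


# Hypothetical effluent TN values in mg/L.
example_rmse = rmse([10.0, 20.0], [9.0, 22.0])
```

RMSE keeps the unit of the concentration (mg/L) and penalizes large single-day errors more heavily than MAPE, so reporting both gives a relative and an absolute view of the fit.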

The kinetic parameters calibrated during the validation period were analyzed as shown in FIG. 12. The histogram and the kernel density distribution represent, respectively, the data grouped into classes and a smooth empirical probability density function built from the location of each data point. FIG. 12(a) shows the distributions of the kinetic parameters calibrated by the G2ANet algorithm to reflect COD removal. The black dotted lines mark the default value of each parameter.

The heterotrophic maximum growth rate was calibrated to values above its default, and the heterotrophic decay rate (b_h) to values below its default. In addition, the reduced fraction of BAP produced during cell lysis (f_BAP) and the increased hydrolysis rate of BAP (K_h,BAP) indicate lower BAP production, which is associated with lower lysis of both heterotrophs and autotrophs. These results indicate that the proposed calibration system increased the activity of heterotrophs by raising their growth rate and lowering their decay rate. The COD removal efficiency of the target MBR plant can thus be reflected through the G2ANet algorithm.

FIG. 12(b) shows the analysis of the calibrated kinetic parameters related to the biological TN removal mechanism. To reflect the biological activity of autotrophs, the autotrophic maximum growth rate was increased and the autotrophic decay rate (b_a) was decreased. The increased maximum growth rate promotes the aerobic growth of autotrophs and the decreased decay rate mitigates their lysis, which together strengthen nitrification. In addition, both the fraction of UAP produced during cell growth (f_UAP) and the hydrolysis rate of UAP (K_h,UAP) decreased relative to their defaults. These changes accelerated denitrification and contributed to lower nitrate concentrations. In this way the proposed calibration system accounted for the high TN removal efficiency of the target MBR plant. As shown in FIG. 11(b), the simulated effluent TN concentration decreased accordingly and matched the measurements better than the default model. As a result, the G2ANet-based calibration system proposed satisfactory values of the eight kinetic parameters for simulating the COD and TN removal processes while reflecting the biological mechanisms of the target plant.

Detailed changes in the kinetic parameters calibrated to reflect the COD and TN removal mechanisms under different operating conditions are shown in FIG. 13. As shown in FIG. 13(a) and (c), the effluent COD and TN concentrations simulated for February and July agreed well with the measured data. It can be inferred that the proposed G2ANet-based calibration system reflects the operating conditions and seasonal characteristics of the target plant. Several environmental factors can affect a wastewater treatment simulation, and temperature is a particularly important one: a rise in temperature increases the values of the rate coefficients. The values of the biological kinetic parameters must therefore be determined at the temperature of the target plant. Most temperature correction factors, however, were developed for isolated treatment processes. Evaluating the proposed calibration with respect to temperature and seasonal characteristics is therefore important for verifying how robustly it applies to new data sets.

FIG. 13(b) and (d) show the calibration results of the G2ANet algorithm for February and July. Overall, the G2ANet algorithm raised the growth rates and lowered the decay rates of heterotrophs and autotrophs in both months. Even so, microbial activity can decline at lower temperatures. Compared with the July results in FIG. 13(d), the calibration for February, when the temperature was low, reflected lower activity of heterotrophs and autotrophs. The maximum growth rate and decay rate of heterotrophs in February were similar to those in July. The autotrophic maximum growth rate in February, however, was lower than in July, and the autotrophic decay rate in February was calibrated to higher values than in July. With the calibrated kinetic parameters, the simulated microbial activity was therefore higher in the hot season (July) than in the cold season (February).

These results show the advantage of a G2ANet-based autonomous calibration that takes influent characteristics and operating conditions into account. The state of the G2ANet algorithm includes the influent temperature and the MLSS concentrations. The autonomous calibration system according to the embodiment of the present invention therefore not only reflected the microbial mechanisms but also improved model accuracy. These results show that G2ANet-based calibration can be an effective solution for modeling and simulating MBR plants.

To establish reliability, the autonomous calibration system was further evaluated in this embodiment by analyzing the biological process rates in the ASM-SMP model. FIG. 14 shows the changes in the biological process rates calculated with the calibrated kinetic parameters. As shown in FIG. 14(a), the biological process rates in the anoxic reactor include the anoxic growth of heterotrophs, the lysis of heterotrophs, ammonification and the hydrolysis of substrate, because the activity of heterotrophic microorganisms takes place mainly in the anoxic reactor. In addition, the conversion of ammonia to amino acids occurs during heterotrophic and autotrophic biomass synthesis, and it is calibrated through the temperature-dependent parameter k_a. Hydrolysis of substrate is the breakdown of slowly biodegradable substrate into readily biodegradable substrate, which heterotrophic bacteria can remove under anoxic and aerobic conditions. Compared with the process rates for February shown on the left of FIG. 14(a), the higher wastewater temperature in July enhanced the anoxic growth of heterotrophs, ammonification and the hydrolysis of substrate, while it reduced the lysis of heterotrophs. The biomass of heterotrophic microorganisms increased on days with higher wastewater temperature because of the increased growth rate and reduced decay rate. Ammonification can also increase with heterotroph synthesis, and the conversion of slowly biodegradable substrate into readily biodegradable substrate, which serves as a source for heterotrophs, can increase heterotrophic growth. These results indicate that the proposed G2ANet-based calibration system reflects the biological processes of heterotrophic organisms under the prevailing operating conditions.

FIG. 14(b) shows the biological process rates in the aerobic reactor in February and July. The aerobic growth of autotrophs was higher in the hot season than in the cold season, while the lysis of autotrophs was similar in February and July. With the calibrated kinetic parameters, the biomass of autotrophs can therefore increase in July. Accordingly, the TN removal efficiency rose from 81.16% in February to 86.93% in July. Although no detailed biological mechanism was given to the G2ANet algorithm, the autonomous calibration proposed reliable values of the kinetic parameters by using the influent and operating conditions as state information. As a result, the G2ANet-based autonomous calibration demonstrated the validity and reliability of its calibration performance at a full-scale MBR plant, taking into account the operating characteristics and the biomass of the microorganisms.

According to the embodiment of the present invention, a new intelligent calibration protocol based on the G2ANet algorithm is proposed for calibrating the kinetic parameters of the ASM-SMP model. Eight kinetic parameters were selected as the set of calibration targets, and three temperature-dependent parameters were additionally corrected with the Arrhenius equation. Each agent of the G2ANet algorithm was assigned one kinetic parameter to calibrate and received its own observations from the ASM-SMP model so as to account for COD and TN removal. A centralized critic, used with centralized training and decentralized execution, improved the calibration performance. The influent and operating conditions of the target MBR plant served as the state and observations of the agents in order to reflect the biological mechanisms. Judged by the rewards obtained by each algorithm, the G2ANet-based calibration system outperformed QMIX-based parameter calibration.
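The temperature correction mentioned above can be illustrated with the simplified Arrhenius-type form often used in activated sludge modeling, k(T) = k_ref · θ^(T - T_ref). Whether the patent uses this exact form is an assumption, and the reference temperature of 20 °C, the coefficient θ = 1.07, and the rate value of 4.0 per day are hypothetical numbers chosen only to show the shape of the relation.

```python
def temperature_corrected_rate(k_ref, temperature_c, theta=1.07, t_ref_c=20.0):
    """Simplified Arrhenius-type correction: k(T) = k_ref * theta**(T - T_ref).

    k_ref         : rate coefficient at the reference temperature
    temperature_c : wastewater temperature in degrees Celsius
    theta         : temperature coefficient (hypothetical value)
    t_ref_c       : reference temperature (assumed 20 degrees Celsius)
    """
    return k_ref * theta ** (temperature_c - t_ref_c)


# Hypothetical rate of 4.0 per day at 20 degrees Celsius.
k_winter = temperature_corrected_rate(4.0, 12.0)
k_reference = temperature_corrected_rate(4.0, 20.0)
k_summer = temperature_corrected_rate(4.0, 25.0)
```

With θ above 1 the rate coefficient grows with temperature, which is consistent with the higher microbial activity the text reports for July compared with February.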

The ASM-SMP model calibrated by the G2ANet algorithm reduced the deviation of the default model by 87.05% and 51.85% for the effluent COD and TN simulations, respectively. The proposed calibration system was also validated for individual months in different seasons to confirm that it can be applied to a real plant. G2ANet calibrated the kinetic parameters in line with seasonality, reflecting higher microbial activity in the hot season and lower biomass in the cold season. The main contribution is a stable and feasible G2ANet-based model calibration protocol for WWTPs. The proposed G2ANet-based calibration system can take into account the characteristics of dynamic and complex biological nutrient removal processes. Because an accurate and reliable model calibration protocol is key to deriving an appropriate and reliable operating strategy for an MBR system that mimics a real plant, the proposed protocol can also be used to optimize and control WWTPs on the basis of accurate modeling.

Hereinafter, the configuration and functions of a multi-agent reinforcement learning-based multi-optimization system for sewage treatment plant process operation, according to an embodiment of the present invention and using the calibration system described above, will be described.

The embodiment of the present invention provides an autonomous operating trajectory search system based on an advanced MARL algorithm in order to achieve sustainable operation under various non-linear influent conditions. First, a k-means clustering algorithm distinguishes the influent conditions according to different influent scenarios. Second, the MARL algorithm automatically searches for optimal operating trajectories, including DO, the external recycle ratio and the external carbon dose, that satisfy the desired environmental and economic operation. Third, to verify feasibility and adaptability under various influent conditions, the proposed system was applied to and evaluated on a new one-year data set that was not used to train the MARL algorithm.
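The first step, separating influent conditions with k-means, can be sketched with a small pure-Python implementation of Lloyd's algorithm. The choice of two features (influent COD and TN in mg/L), the sample points, the three initial centroids and the interpretation of the clusters as low, normal and high load are all assumptions for illustration; the source does not list the features or settings used.

```python
import math


def nearest(point, centroids):
    """Index of the centroid closest to the point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


def kmeans(points, initial_centroids, iterations=50):
    """Lloyd's algorithm; returns (centroids, labels)."""
    centroids = [tuple(c) for c in initial_centroids]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        # Mean of each cluster; an empty cluster keeps its old centroid.
        updated = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if updated == centroids:
            break
        centroids = updated
    labels = [nearest(p, centroids) for p in points]
    return centroids, labels


# Hypothetical daily influent (COD, TN) pairs in mg/L.
influent = [(150, 20), (160, 22), (330, 44), (340, 45), (600, 80), (620, 82)]
centroids, labels = kmeans(influent, [(100, 10), (300, 40), (700, 90)])
```

In practice the features would usually be standardized before clustering so that COD, which has the larger numeric range, does not dominate the distance; that step is omitted here to keep the sketch short.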

The framework for searching for and proposing multiple operating trajectories for the MBR is shown in FIG. 16. The first step is the generation of high, normal, and low influent scenarios in terms of the influent loads of chemical oxygen demand (COD) and total nitrogen (TN). The daily variation of the measured influent data set was used to generate the influent data sets, and the generated daily influent data sets provided the MARL algorithm with various influent scenarios. The second step is to select the structure of the MARL algorithm so that it proposes multiple operating trajectories that take the influent conditions into account. The observations, states, actions, and rewards were selected according to the operating characteristics of the target plant. In addition, the effluent quality index (EQI), aeration energy (AE), pumping energy (PE), and external carbon (EC) were considered when calculating the reward function, because these are essential to the environmental stability and energy efficiency of the sewage treatment process. The third step is the application and training of the MARL algorithms. In this step, the best MARL algorithm is selected according to the change in the obtained reward, because a converged reward trend indicates stable performance of the MARL algorithm. The final step is the evaluation of the proposed autonomous trajectory search system for the MBR in terms of effluent quality, energy, and operating cost. Because the influent has a diurnal pattern, the operating trajectory was proposed daily by the MARL algorithm. The evaluation was performed daily and compared with the manual operating system of the target wastewater treatment plant.

In the embodiment of the present invention, the operating strategy of an MBR plant located in Y-city was optimized, as shown in FIG. 17. The process comprises biological treatment in five sequential reactors and sludge filtration. The MBR plant replaced the clarifier with a membrane reactor to reduce footprint and waste sludge. The volumes of the anaerobic, stabilization, anoxic, aerobic, and membrane reactors were 820, 220, 2,500, 2,900, and 720 m³, respectively. The external sludge recirculation from the membrane reactor to the stabilization reactor was set to 400% of the influent flow rate. The internal recirculation ratio between the anoxic and anaerobic reactors was 100% of the influent flow rate. The DO concentrations in the aerobic and membrane reactors were maintained at 4 and 7 mg/L, respectively. The average influent concentrations of COD, TSS, and TN were 360, 173, and 52 mg/L, respectively, with standard deviations ranging from 12.9 mg/L (TN) to 69 mg/L (COD). This means that the influent of the target MBR plant was non-stationary, which makes it difficult to optimize the treatment process under dynamic conditions. The MBR plant was simulated with an activated sludge model with soluble microbial products (ASM-SMP) calibrated on the measured data set. In addition, influent data generation and various influent scenarios were used when proposing the optimal trajectories so that the non-stationary influent characteristics were reflected.

Carbon and nitrogen loads are important influent factors in sewage treatment processes. Changes in the influent can affect wastewater treatment performance in terms of both pollutant removal efficiency and operating energy. Moreover, optimizing wastewater treatment without considering the dynamic influent cannot guarantee sufficient accuracy and cannot improve plant performance. To handle dynamic and non-stationary influent conditions for feasible process optimization, influent data sets were generated from the measured influent data so as to reflect the inherent natural variability of pollutant concentrations. FIG. 18 shows the variation of the influent pollutant concentrations of the target MBR plant. FIG. 18(a) shows a clear pattern in the monthly data: pollutant concentrations were high in March to May and October to December and decreased in March to May. Monthly pollutant concentrations increased by 12% under the high influent condition and decreased by 15% under the lowest influent condition. FIG. 18(b) shows the daily variation of the influent concentration, which is evident as morning and evening peaks corresponding to household activities. Based on the measured data set and its characteristics, an hourly influent data set was generated by considering the monthly variation of the carbon and nitrogen composition ratios and the daily trend of the influent pollutants. Gaussian noise was also added to the generated hourly data set to represent the uncertainty of the time-series influent data.
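The generation procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in the embodiment: the function name `generate_hourly_influent`, the shape of the diurnal profile (two Gaussian bumps placed at hours 10 and 20), and the noise level are hypothetical. Only the mean COD of 360 mg/L, the monthly composition ratio, and the use of Gaussian noise come from the description.

```python
import math
import random


def generate_hourly_influent(mean_conc, monthly_factor, noise_std=0.05, days=1, seed=0):
    """Return a list of hourly concentrations (mg/L) covering `days` days.

    mean_conc      : long-term mean concentration, e.g. 360 mg/L for COD
    monthly_factor : monthly composition ratio, e.g. 1.12 for a high-load month
    noise_std      : relative standard deviation of the Gaussian noise (assumed)
    """
    rng = random.Random(seed)

    def bump(hour, center):
        # Hypothetical peak shape for household activity.
        return 0.15 * math.exp(-0.5 * ((hour - center) / 2.0) ** 2)

    # Diurnal profile with a morning and an evening peak, normalized to mean 1.
    profile = [1.0 + bump(h, 10) + bump(h, 20) for h in range(24)]
    daily_mean = sum(profile) / 24.0
    profile = [p / daily_mean for p in profile]

    series = []
    for t in range(24 * days):
        base = mean_conc * monthly_factor * profile[t % 24]
        noisy = base * (1.0 + rng.gauss(0.0, noise_std))  # multiplicative Gaussian noise
        series.append(max(noisy, 0.0))  # concentrations cannot be negative
    return series
```

For example, `generate_hourly_influent(360.0, 1.12)` would give one day of hourly COD values for a high-load month; the same call with the TN mean of 52 mg/L gives the matching TN series.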

The following describes influent scenario clustering using the k-means clustering algorithm. To propose optimal operating trajectories from the generated influent data, similar influent conditions need to be clustered into groups, because providing a separate operating trajectory for every individual influent condition is impractical: it increases the dimensionality of the optimization problem and leads to time-consuming computation. Therefore, influent scenarios based on various COD and TN composition ratios were proposed so that the target plant can be optimized while reflecting complex influent conditions. The various influent conditions can be grouped into two or more groups using the k-means clustering algorithm.

The k-means clustering algorithm clusters data according to similarity. It selects the initial centroids at random and then assigns each data point to the nearest cluster by computing its Euclidean distance to each centroid. Alternating the two steps of k-means clustering, assignment and centroid update, minimizes Equation 6.

[Equation 6]

Here, x_i is a data point, which in the present invention is a pair of COD and TN composition ratios, μ_k is the mean of cluster c_k, and K is the number of clusters.
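Equation 6 is not reproduced in this text. Assuming it is the standard k-means objective implied by the definitions of x_i, μ_k, c_k, and K, that is, the within-cluster sum of squared Euclidean distances, it would read:

```latex
J = \sum_{k=1}^{K} \sum_{x_i \in c_k} \left\lVert x_i - \mu_k \right\rVert^{2}
```

The symbol J is introduced here only for illustration. The assignment step lowers J by moving each x_i to its nearest μ_k, and the update step lowers J by resetting each μ_k to the mean of its cluster.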

The MARL-based multi-operating trajectory search system is designed to determine the optimal setpoints of the adjustable operating variables of the target MBR plant. Its main contribution is environmentally sound and cost-effective plant operation in response to various influent conditions. Therefore, as shown in FIG. 19, an optimization system integrated with influent scenario clustering by the k-means algorithm is proposed to reflect realistic and varied influent conditions. The agents of the MARL algorithm interact with one another to propose their respective optimal setpoints and to share process information of the target plant. Three agents were assigned to search for the optimal setpoint of the DO concentration in the aerobic reactor, the amount of sludge recycled from the membrane reactor to the stabilization reactor located before the anaerobic reactor, and the external carbon dose added to the influent stream of the target plant. The three operating variables manipulated by the MARL algorithm were selected to ensure the performance of the nitrification and denitrification processes. Following the proposed optimal trajectories, four performance indices were evaluated over diurnal operation. The indices are EQI, AE, PE, and EC, and they were also used in the reward function of the MARL algorithm.

The operating trajectory search system manipulated three variables: DO concentration, sludge recycling, and external carbon dose. Several studies on improving effluent quality and operating energy efficiency have examined the importance of these three manipulated variables, as follows:

- First, the DO concentration is an important factor in nutrient removal efficiency because it governs nitrification, which converts ammonia (NH₄) into nitrate (NO₃). DO is also an indicator of the energy consumption of the wastewater treatment process, since aeration is the most energy-intensive step. Therefore, an appropriate DO setpoint must be found that maintains sufficient nitrification while preventing excessive aeration energy consumption.

- Second, sludge recycling enhances denitrification by returning nitrate formed under aerobic conditions to the anoxic zone. Nitrogen removal efficiency depends on the amount of sludge recycled, and a nitrate recycle that is either too low or too high reduces the denitrification potential.

- Third, adding an external carbon source is an effective way to enhance denitrification, which converts nitrate (NO₃) in the stream into nitrogen gas (N₂). An excessive external carbon dose, however, increases operating cost and necessitates a subsequent removal step.

In the embodiment of the present invention, the multi-operating trajectory search system proposed diurnal optimal setpoints for the DO concentration, the sludge recycle amount, and the external carbon dose, which are the most influential manipulated variables, in order to improve pollutant removal efficiency and operating cost. The optimized operating trajectories were evaluated against the manual operating system. Four indices, EQI, AE, PE, and EC, were used to compare and evaluate the performance of the proposed optimization system.

EQI (kg/d) is the average, over the observation period, of the weighted effluent loads of the compounds that strongly affect water quality. EQI is expressed by Equation 7.

[Equation 7]

Here, T is the operating time (days) between t₁ and t₂, and TSS_e(t), COD_e(t), Nkj_e(t), NO_e(t), and BOD_e(t) are the effluent concentrations (mg/L) of total suspended solids, chemical oxygen demand, Kjeldahl nitrogen, nitrate, and biological oxygen demand, respectively. B denotes the weighting factors of the effluent pollutants (B_TSS = 2, B_COD = 1, B_Nkj = 30, B_NO = 10, and B_BOD = 2).

AE (kWh/d) is the energy consumed by aeration and is calculated from the K_La values. It is expressed by Equation 8.

[Equation 8]

Here, S_O^sat is the oxygen saturation concentration (8 mg/L), V_i is the volume (m³) of the i-th of the N aerobic reactors, and K_La is the oxygen transfer coefficient (d^-1).

PE (kWh/d) is calculated from the energy consumed by the internal and external recirculation pumps, as in Equation 9.

[Equation 9]

Here, Q_int(t), Q_r(t), and Q_w(t) are the flow rates of the internal recycle, the external sludge recycle, and the waste sludge, respectively. EC (kgCOD/d) is the consumption of the external carbon source, expressed by Equation 10.

[Equation 10]

Here, Q_EC is the flow rate of the external carbon addition (m³/d), and COD_EC is the concentration of readily biodegradable substrate in the external carbon source (400,000 gCOD/m³).
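Equations 7 to 10 are not reproduced in this text. The following is a hedged sketch of how the four indices could be computed from sampled time series, assuming the BSM1-style definitions that the symbols above suggest. The effluent flow `Q`, the factor 1.8 in the aeration energy, the pumping coefficients 0.004, 0.008, and 0.05 kWh/m³, and the value B_Nkj = 30 are assumptions taken from that benchmark convention, not from this description. All function names are hypothetical.

```python
def time_average(values, dt, T):
    """Rectangle-rule approximation of (1/T) * integral of values over time."""
    return sum(v * dt for v in values) / T


# Weighting factors B; B_Nkj = 30 is an assumed (BSM1) value.
B = {"TSS": 2.0, "COD": 1.0, "Nkj": 30.0, "NO": 10.0, "BOD": 2.0}


def eqi(samples, dt, T):
    """EQI in kg/d. Each sample holds effluent concentrations (mg/L = g/m3)
    and the effluent flow 'Q' (m3/d, assumed to be part of Equation 7)."""
    loads = [sum(B[k] * s[k] for k in B) * s["Q"] for s in samples]
    return time_average(loads, dt, T) / 1000.0  # g/d -> kg/d


def aeration_energy(kla_series, volumes, dt, T, so_sat=8.0):
    """AE in kWh/d. kla_series[t][i] is K_La (1/d) of reactor i at sample t;
    volumes[i] is V_i (m3). The 1.8 * 1000 divisor is an assumed constant."""
    terms = [sum(v * k for v, k in zip(volumes, klas)) for klas in kla_series]
    return so_sat / (1.8 * 1000.0) * time_average(terms, dt, T)


def pumping_energy(q_int, q_r, q_w, dt, T, c_int=0.004, c_r=0.008, c_w=0.05):
    """PE in kWh/d from recycle and waste flows (m3/d); coefficients assumed."""
    terms = [c_int * a + c_r * b + c_w * c for a, b, c in zip(q_int, q_r, q_w)]
    return time_average(terms, dt, T)


def external_carbon(q_ec, dt, T, cod_ec=400000.0):
    """EC in kgCOD/d from the carbon dosing flow Q_EC (m3/d)."""
    return cod_ec / 1000.0 * time_average(q_ec, dt, T)
```

With hourly data, `dt` would be 1/24 d and `T` the number of evaluated days, so that all four indices share the same averaging window.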

A multi-agent system is a group of interacting autonomous entities that share a common environment. MARL is an artificial intelligence (AI) technique that serves as an alternative to conventional reinforcement learning (RL) algorithms, which are limited in finding a joint goal for multiple RL agents and in coordinating actions with other agents. MARL has the advantage that agents share experience with one another and learn similar tasks together, which makes it inherently more powerful for solving multi-objective problems. The joint goal of MARL combines the "stability" of the agents' learning dynamics and interconnections with the "adaptation" of the agents' behavior to the dynamic behavior of other agents and to new events in the environment. These two advantages make the MARL algorithm readily applicable to sewage treatment process research, because WWTPs are non-stationary and highly variable and are therefore complex to optimize. A stable and adaptable optimization algorithm is thus highly necessary for optimizing sewage treatment processes.

In the embodiment of the present invention, the MARL algorithm was used to simultaneously optimize the operating trajectories of the DO setpoint, the external sludge recycle, and the external carbon dose, as shown in FIG. 19. The three agents of the MARL algorithm were deployed individually, each optimizing one manipulated variable (decentralized execution). The centralized training structure of MARL was adopted so that the agents learn together and share information. In the present invention, the optimization problem is abstracted as a sequential decision-making problem over hourly process operation. The environment is the MBR plant modeled with the ASM-SMP model, as described above, coupled with an interaction interface through which the agents set the DO concentration, the external sludge recycle amount, and the external carbon dose. The observations and agent states relate to the influent and operating conditions. A reward value is distributed by the environment after each interaction (i.e., learning epoch). The reward function was determined from the performance indices EQI, AE, PE, and EC. The MARL algorithm was trained to maximize the reward value in order to improve the operating performance of the MBR plant.

Two MARL algorithms, QMIX and G2ANet, were used to solve the multi-objective problem. QMIX is a value-based MARL algorithm that trains the decentralized policies of the agents in a centralized, end-to-end manner. G2ANet interprets the relationships among agents on the basis of game abstraction theory, using hard attention and soft attention. Both algorithms have shown excellent performance on multi-agent problems such as StarCraft II, Traffic Junction, and Predator-Prey. The detailed algorithms of QMIX and G2ANet are as described above for the calibration system.

Appropriate clustering of the influent conditions is required to propose optimal operating trajectories for the sustainable operation of the MBR plant under various influent conditions. The influent conditions were generated from the measured influent data set, whose composition ratios ranged from 0.85 to 1.12 for the influent COD concentration and from 0.84 to 1.07 for the influent TN concentration. FIG. 20 shows the generated influent data and the result of clustering the data into three influent scenarios with the k-means clustering algorithm. First, influent data were generated according to the composition ratios of the measured influent COD and TN concentrations. k-means clustering was then used because applying the operating trajectory search system directly to highly scattered data could increase the complexity of the multi-agent system.

The cluster regions are determined automatically by the k-means clustering algorithm. The k-means algorithm initially placed the centroids at random, as shown in FIG. 20(a). After several iterations of the calculation of Equation 6, a representative centroid was determined for each cluster at an appropriate distance from the data assigned to that cluster. The k-means clustering computed three clusters from the COD and TN composition ratios. The three clusters represent the high, normal, and low influent scenarios, whose centroids were (1.12, 1.16), (1.01, 0.90), and (0.8, 0.92), respectively.
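The clustering step can be illustrated with a short, self-contained sketch of Lloyd's algorithm on (COD ratio, TN ratio) pairs. This is an illustrative example, not the code of the embodiment: the function names, the optional `init` argument, and the stopping rule are assumptions. With random initialization the algorithm can converge to a local optimum, so in practice several seeds would normally be tried.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two 2-D points."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def kmeans(points, k, init=None, iters=100, seed=0):
    """Lloyd's algorithm for 2-D points; returns (centroids, labels)."""
    rng = random.Random(seed)
    # Random initial centroids unless explicit ones are supplied.
    centroids = list(init) if init is not None else rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: index of the nearest centroid for every point.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j]))
                      for p in points]
        # Update step: each centroid becomes the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == j]
            if members:
                new_centroids.append((sum(m[0] for m in members) / len(members),
                                      sum(m[1] for m in members) / len(members)))
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster's centroid
        if new_labels == labels and new_centroids == centroids:
            break  # fixed point reached
        labels, centroids = new_labels, new_centroids
    return centroids, labels
```

Calling `kmeans(ratios, 3)` on a list of generated `(COD ratio, TN ratio)` tuples returns three centroids that can be ranked as the high, normal, and low scenarios.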

FIG. 20(b) shows a sample of the influent data set generated over a diurnal period. The generated COD concentration is shown as a representative result. For each influent scenario cluster, the generated data showed pollutant concentrations of a different magnitude together with the typical diurnal variation associated with household activity. In addition, the noise added to the influent data produced large deviations in the generated data over time. Consequently, the influent data sets generated according to the influent scenarios reflect a variety of influent situations and can be used to develop an optimal operation search system for various influent conditions.

The structure of the MARL algorithm that interfaces with the modeled MBR platform was designed to maximize the reward value through improvements in environmental and economic efficiency. The interaction interface between the three-agent MARL and the MBR plant was defined in terms of the state, observation, action, and reward vectors. The detailed structure of the MARL algorithm is as follows.

- State and observations:

The joint state (s), which is used during centralized training, includes the time and the influent conditions, such as COD, TN, NO₃, NH₄, TSS, and flow rate, at the current and previous hours (h and h-1). Observing the influent conditions at the current and previous hours helps reflect the dynamics of the influent. The joint state vector makes it easier for the agents to coordinate with one another and reduces the complexity of the multi-agent problem. In addition, each of the three agents receives a local observation vector that depends on the variable it manipulates.

The first agent proposes the operating trajectory of the DO setpoint for the aerobic reactor. It receives a vector containing the time, influent COD (COD_in), influent TN (TN_in), influent flow rate (Q_in), and the ammonia concentration in the aerobic reactor (NH_aerobic). Because nitrification in the aerobic reactor depends on the dynamic influent conditions, the vector includes the time, COD_in, TN_in, and Q_in. Because ammonia removal depends on the DO concentration available for nitrification, NH_aerobic was additionally included in the observation of the first agent.

The second agent proposes the optimal amount of external sludge recirculation from the aerobic membrane reactor to the stabilization reactor. Its observation vector consists of the DO, nitrate, and ammonia concentrations in the anoxic reactor (DO_anoxic, NO_anoxic, and NH_anoxic) and the nitrate and ammonia concentrations in the membrane reactor (NO_mem and NH_mem). The anoxic reactor requires a low DO concentration (DO_anoxic) for denitrification. The NO_mem and NH_mem concentrations are essential for the recycle decision because denitrification performance deteriorates when too little nitrate is recycled to the anoxic reactor. The NO_anoxic and NH_anoxic concentrations are also indicators of denitrification performance.

The third agent searches for the external carbon dose using an observation vector that contains the time, influent COD (COD_in), influent TN (TN_in), influent flow rate (Q_in), and the influent COD/TN ratio (C/N_in). A low carbon-to-nitrogen (C/N_in) condition may require the addition of an external carbon source for fast and efficient denitrification. If COD_in is low relative to the TN_in concentration, denitrification is incomplete, whereas a high external carbon dose incurs excessive cost and requires a subsequent process to remove the surplus carbon.

- Actions:

A discrete action set was used for each agent. The action set of the agent proposing the DO setpoint was a change of [-1, -0.5, 0, +0.5, +1 mg/L] in the DO concentration. The action set of the second agent, which determines the external sludge recycle, was a change of [-0.2, -0.1, 0, +0.1, +0.2] with respect to the 400% of the influent flow rate. The external carbon dose was searched by the third agent, which selected a change in the carbon flow rate from [-1, -0.5, 0, +0.5, +1 m³/d].
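The incremental action sets above can be sketched as follows. The three lists are taken from the description, with the recycle increments interpreted as changes in the recycle ratio expressed as a fraction of the influent flow (so that the 400% baseline is 4.0), which is an assumption. The clipping bounds and all function names are hypothetical and are only there to keep the setpoints physically plausible.

```python
DO_ACTIONS = [-1.0, -0.5, 0.0, 0.5, 1.0]       # change in DO setpoint, mg/L
RECYCLE_ACTIONS = [-0.2, -0.1, 0.0, 0.1, 0.2]  # change in recycle ratio (baseline 4.0 = 400 %)
CARBON_ACTIONS = [-1.0, -0.5, 0.0, 0.5, 1.0]   # change in carbon dosing flow, m3/d


def apply_action(current, action_index, actions, lower, upper):
    """Add the selected increment to the current setpoint and clip to [lower, upper]."""
    return min(max(current + actions[action_index], lower), upper)


def step_setpoints(setpoints, joint_action):
    """Apply one discrete action per agent to (DO, recycle ratio, carbon flow).

    The bounds below are illustrative assumptions, not values from the description.
    """
    do, recycle, carbon = setpoints
    a_do, a_recycle, a_carbon = joint_action
    return (
        apply_action(do, a_do, DO_ACTIONS, 0.0, 8.0),
        apply_action(recycle, a_recycle, RECYCLE_ACTIONS, 0.0, 8.0),
        apply_action(carbon, a_carbon, CARBON_ACTIONS, 0.0, 10.0),
    )
```

Each agent outputs only an index into its own list, so the joint action space has 5 × 5 × 5 = 125 combinations per decision step.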

- Rewards:

The reward function was defined on the basis of the operating goals of the MBR plant and rewarded the MARL algorithm for each action taken. The agents always try to maximize their reward and, over time, produce an optimal policy. Improved environmental and economic operation means a reduction in the effluent pollutants and operating energy of the MBR plant. Therefore, in the present invention, a reward function was assigned that compares the improvement achieved by the proposed operating strategy with the manual operating system. The operating performance J(t) was calculated using Equation 11, and the reward value r(t) was calculated using Equation 12.

[Equation 11]

[Equation 12]

Here, ω is the weighting factor of each performance index (ω_EQI = 200, ω_AE = 40, ω_PE = 3, and ω_EC = 1). As shown in FIG. 19, a linear reward function was applied by selecting performance characteristics that are well known in sewage treatment processes. A linear combination of the performance indices reflects expert policy and helps improve the performance of RL. In addition, a negative reward value was given when the proposed operating trajectory could not improve operating performance relative to the manual operating system. Negative reward values help the MARL algorithm avoid possible non-optimal actions.
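Equations 11 and 12 are not reproduced in this text. One plausible reading, offered only as a sketch, is that J is the weighted linear sum of the four indices and that the reward is the improvement of the proposed operation over the manual operation, so that it becomes negative when the proposal is worse. The exact functional forms, any normalization of the indices, and the function names below are assumptions.

```python
# Weights from the description; the key "EC" stands for the external carbon index.
WEIGHTS = {"EQI": 200.0, "AE": 40.0, "PE": 3.0, "EC": 1.0}


def performance(indices, weights=WEIGHTS):
    """Weighted linear cost J; lower is better because every index is a cost."""
    return sum(weights[name] * indices[name] for name in weights)


def reward(proposed, manual, weights=WEIGHTS):
    """Assumed reward: cost of manual operation minus cost of proposed operation.

    Positive when the proposed trajectory improves on manual operation,
    negative when it does not.
    """
    return performance(manual, weights) - performance(proposed, weights)
```

For instance, if the indices are expressed relative to manual operation (all equal to 1.0 for the manual case), a proposal that lowers EQI to 0.9 and AE to 0.8 would score a positive reward, while one that raises AE above 1.0 without other gains would score a negative one.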

The structure of the MARL used to search for the optimal operating trajectories for diurnal operation of the MBR plant is summarized in Table 4.

[Table 4] Structure of QMIX and G2ANet for finding the optimal operating trajectories of the MBR plant

FIG. 22 shows the average reward values of the two MARL algorithms over the training epochs. To compare the reward growth trends of the two algorithms, the average reward value is shown instead of the reward value of every epoch. Training ran for 10,000 epochs to update and obtain the best policy for optimizing the operating conditions of the MBR plant. The generated influent scenarios, namely the high, normal, and low influent conditions, were used throughout training for sustainable sewage treatment operation. Both algorithms showed reward values that converged as the number of epochs increased, as shown in FIG. 22(a). However, QMIX has structural constraints (see FIG. 22(b)) that limit accurate learning of the action-value function in the non-monotonic environment of the MBR process. Combined with epsilon-decay exploration, the algorithm converged to a local optimum under a poor policy. Thus, despite its higher reward values, QMIX learned only a sub-optimal policy with very unstable rewards.

The detailed reward values of G2ANet are shown in FIG. 22(c). The G2ANet algorithm differs noticeably in convergence stability. This indicates that G2ANet handled the non-monotonicity and non-stationarity of the environment more accurately and found an optimal policy with a well-represented action-value function. Although the rewards obtained by G2ANet were lower than those of QMIX, G2ANet achieved a more stable reward trend throughout training despite the randomly given influent conditions. It can therefore be inferred that the G2ANet agents cooperated well to optimize and improve the operating system of the MBR plant in terms of environmental and economic goals under various influent conditions. In QMIX, by contrast, the agents learned to cooperate with rather limited coordination, so they did not actively try to optimize the process together and instead tried to maximize the reward without coordination. Comparing the reward values of each training run shown in FIG. 22(b) and FIG. 22(c), the G2ANet algorithm was suitable for application to the wastewater treatment process. Consequently, in the present invention, G2ANet was selected to optimize the multiple operating setpoints of the MBR plant, considering the trade-off between QMIX, which is highly efficient but unstable, and G2ANet, which is stable with improved efficiency.

The optimal setpoints of the three manipulated variables should maintain qualitative performance under various influent conditions. To identify the optimal setpoints for process operation, the performance of the operating trajectory search system was evaluated over the operating time under various influent conditions. FIG. 23 shows the operating trajectories proposed by the selected G2ANet algorithm under three influent conditions, namely the low, normal, and high influent scenarios defined by the COD and TN composition ratios. The results of 20 simulations in each of the three influent scenarios are plotted as representative of the trend of the overall operating trajectories. Influent scenarios with the three centroids (1.12, 1.16), (1.01, 0.90), and (0.8, 0.92) were provided to the G2ANet algorithm, and the trained G2ANet searched for the operating trajectories on an hourly basis.

FIG. 23(a) shows the optimal trajectory proposed by the G2ANet algorithm under the low-influent condition. The optimal DO setpoint in the aerobic reactor was clearly lower than the manual DO setpoint of 4 mg/L: 92.08% of the DO concentration values proposed by G2ANet were below the manual setpoint. This result means that the reduced operating trajectory consumes less aeration energy than the manual aeration system. In addition, the optimal operating trajectory of the DO setpoint showed a clearly varying trend similar to the influent trend shown in FIG. 20. The proposed DO setpoints increased mainly around 10:00 and 20:00 and decreased at other times, consistent with the changes in influent pollutants.

The operating trajectories of the external recycle ratio and the external carbon flow rate in the low-influent scenario are also shown in FIG. 23(a). The trajectory of the external recycle ratio found under the low-influent condition did not differ from those under the normal and high influent conditions. The external carbon flow rate proposed in the low-influent scenario, on the other hand, was slightly lower than in the other scenarios. From these results it can be inferred that the G2ANet-based autonomous trajectory search system focuses on reducing excessive aeration energy rather than on reducing pumping energy and carbon dosing cost. This is because aeration energy accounts for more than 50% of the energy consumed in the normal operation of a sewage treatment plant.

FIG. 23(b) shows the operating trajectories in the normal influent scenario. In this scenario, 91.45% of the DO concentration values of the optimal trajectory were below the manual DO setpoint. The external recycle ratio varied around four times the influent flow rate, while the external carbon flow rate increased following the trend of the influent load. Compared with the low-influent scenario, the G2ANet algorithm found higher operating trajectories for the DO concentration and the external carbon flow rate. The higher setpoints enhance both nitrification and denitrification, removing nutrients and increasing the TN removal efficiency.

The optimal operating trajectories in the high-influent scenario are shown in FIG. 23(c). The G2ANet algorithm proposed its highest setpoints for the high-influent scenario. Accordingly, 88.33% of the proposed DO concentration values were below the manual DO setpoint of 4 mg/L. In addition, DO concentrations near 6 mg/L were occasionally proposed for effective nitrification in the aerobic reactor. The external carbon flow rate was proposed at relatively high values, whereas the external recycle ratio was similar to that of the low and normal influent scenarios. It can be inferred that a higher DO concentration in the recycle stream would reduce the effectiveness of denitrification in the anoxic reactor, and that the external recycle ratio was therefore not increased significantly. As a result, the G2ANet-based autonomous operating trajectory search system provided the three setpoints for the MBR plant according to the influent scenario so as to enhance nitrification and denitrification effectively.

Rather, the G2ANet algorithm proposed wider bounds for the three operating trajectories, corresponding to the highly variable influent conditions (see FIG. 20). A narrow operating range can be a pitfall in operating a real MBR plant, because it risks an unstable response to dramatically changing influent conditions and leads to unfavorable outcomes during abnormal events. The proposed autonomous operating trajectory search system can therefore be a solution for operating the MBR plant economically and environmentally by searching for the global optimum under various influent conditions.

The daily influent COD and TN concentrations measured at the target MBR plant in 2018 are shown in FIG. 24(a). Relatively high influent COD concentrations were observed in early 2018, and the concentration decreased between days 150 and 300, which correspond to May and October, respectively. The influent TN concentration increased around day 60 (February) and varied over time.

The COD and TN concentrations were highly variable, so the corresponding composition ratios changed considerably over time. The influent conditions measured in 2018 were identified as three clusters: low, normal, and high influent conditions. The influent conditions identified by k-means clustering are shown as a Voronoi diagram in FIG. 24(b). A Voronoi diagram partitions the two-dimensional plane into polygons according to the nearest point, which makes the data points easier to interpret geometrically. The influent up to about day 100 (shown in blue) lay mainly in the high and normal influent conditions. The influent composition ratio then shifted gradually toward the normal and low influent conditions over time. In addition, the data points were not concentrated at a particular location within each influent scenario. These results suggest that influent pollutants tend to vary seasonally. However, the influent also showed abnormal characteristics and extreme situations in which the COD and TN composition ratios were below (0.6, 0.6) or above (1.3, 1.3), as shown in FIG. 24(b). Influent characteristics are therefore an important factor for a robust and stable optimization strategy in a highly variable influent environment.

FIG. 24(c) shows the monthly influent conditions characterized by k-means clustering and the Voronoi diagram. As noted above, the influent composition ratios in January and February lay in the high-influent region, and the composition ratio shifted toward less polluted conditions over time. February had the highest COD and TN composition ratio, (1.13, 1.04), and July, with a composition ratio of (0.76, 0.89), was the month with the lowest influent condition. Consequently, the G2ANet-based autonomous trajectory search system was applied to the months with the highest and lowest influent composition ratios to evaluate its stability and adaptability under extreme influent conditions.
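As an illustration of how a Voronoi partition assigns an influent condition to a scenario, the following sketch labels a (COD, TN) composition-ratio pair by its nearest centroid. The three centroids are those given for the generated influent scenarios; mapping them to "high", "normal", and "low" in order of magnitude is an assumption, as are the function and variable names. It is not the clustering code of the embodiment.

```python
import math

# Centroids (COD ratio, TN ratio) of the three generated influent scenarios.
# Assumption: the label-to-centroid mapping follows the order of magnitude.
CENTROIDS = {
    "high": (1.12, 1.16),
    "normal": (1.01, 0.90),
    "low": (0.80, 0.92),
}


def classify_influent(cod_ratio, tn_ratio):
    """Return the scenario whose centroid is nearest in Euclidean distance.

    Picking the nearest centroid is equivalent to finding the Voronoi cell
    that contains the point.
    """
    point = (cod_ratio, tn_ratio)
    return min(CENTROIDS, key=lambda name: math.dist(point, CENTROIDS[name]))


print(classify_influent(1.13, 1.04))  # February composition ratio
print(classify_influent(0.76, 0.89))  # July composition ratio
```

Under these assumptions the February ratio falls in the high-influent cell and the July ratio in the low-influent cell, which matches the months chosen for the evaluation.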

The ability of the G2ANet algorithm to search for optimal operating trajectories was evaluated using the influent conditions of February and July, when the influent conditions were at their highest and lowest extremes. FIG. 25 shows the optimized performance of the G2ANet-based autonomous trajectory search system in February. The ASM-SMP model was used to simulate the target plant, together with the dynamically calibrated kinetic parameters described above, so that the MBR plant was simulated accurately. As shown in FIG. 25(a), the February influent had relatively high concentrations, with COD of 600 mg/L or more and TN at or above the average TN concentration of 46 mg/L. Both influent pollutants also varied over time. It was therefore necessary to search for optimal operating conditions that follow the varying influent conditions.

FIG. 25(b) shows the optimal trajectories for February, including the DO concentration, the external recycle ratio, and the external carbon flow rate. The DO concentration was lower overall than the setpoint of the manual operation system. The average proposed DO concentration was 2.30 mg/L, a 42.5% reduction from the manual DO setpoint of 4 mg/L. The external recycle ratio was searched according to the fluctuating influent conditions; its average value was 4.04, close to the manual system's setpoint of 4. To strengthen denitrification in the MBR plant, the external carbon dosage was increased to an average of 1.53 m³/d. Because the influent COD/TN ratio was 8.48, nitrogen removal performance could be degraded. The manual operation system added no external carbon, whereas the G2ANet agent optimized the dosage within a set range to improve nitrogen removal.

The performance indicators, including EQI, AE, PE, and EC, are shown in FIG. 25(c). They were calculated from the operating trajectories proposed by G2ANet (see FIG. 25(b)) and compared with those of the manual operation system. The three agents optimized their operating trajectories to lower the EQI under varying influent conditions, and the EQI decreased by 5.61% compared with the manual system. From this result it can be inferred that the reduced DO setpoint prevented excessive nitrification and thereby reduced effluent nitrate, and that the proposed recycle ratio and external carbon dosage maintained effective denitrification. The aeration energy was effectively reduced by 28.44% owing to the lower DO setpoint, whereas the pumping energy increased by 1.01%.

The environmental benefits of the G2ANet-based autonomous system are shown in FIG. 25(d). The dotted lines indicate the effluent quality limits of the Korean Ministry of Environment. The effluent limits for CODCr and TN are 52 and 20 mg/L, respectively, as 3-hour average values. The effluent COD concentrations of both the manual and the autonomous operation systems comfortably satisfied the effluent quality standard. However, the manual system could be limited in meeting the effluent standard for TN, because 18 effluent TN concentration values on days 40 to 50 and 52 to 58 exceeded the 20 mg/L limit. Even with the limit applied as a 3-hour average, the manual system discharged effluent exceeding 20 mg/L TN four times, as indicated by the red circles in FIG. 25(d). The manual system would require additional biological or chemical treatment to lower the effluent TN concentration, as well as a robust real-time monitoring system, and the additional treatment may incur excessive operating costs. The proposed G2ANet-based autonomous system, by contrast, effectively reduced the effluent TN concentration and satisfied the standard. The three interconnected G2ANet agents cooperated to operate the MBR plant in an environmentally and economically sound manner despite extremely high influent conditions. The quantitative evaluation of the proposed system is summarized in Table 7.
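The difference between an instantaneous exceedance and an exceedance of a 3-hour average can be sketched as follows. The sketch assumes hourly samples and a rolling (overlapping) 3-sample window; the sampling interval, the windowing rule, the function names, and the sample values are illustrative assumptions rather than details taken from the regulation or from the plant data.

```python
def rolling_means(values, window=3):
    """Means of every run of `window` consecutive samples."""
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


def count_exceedances(values, limit, window=3):
    """Number of rolling-mean values strictly above the limit."""
    return sum(1 for mean in rolling_means(values, window) if mean > limit)


TN_LIMIT = 20.0  # mg/L, effluent TN limit stated in the text

# Hypothetical hourly effluent TN samples (mg/L), not plant data.
brief_spike = [18.0, 19.0, 22.0, 18.0, 17.0]
sustained = [19.0, 22.0, 23.0, 21.0, 18.0]

print(count_exceedances(brief_spike, TN_LIMIT))
print(count_exceedances(sustained, TN_LIMIT))
```

In the first series a single sample is above 20 mg/L but no 3-hour average is, so no violation is counted; in the second series the elevated period is long enough that the averages themselves exceed the limit.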

As shown in FIG. 26(a), the July influent averaged 396 mg/L for COD and 46 mg/L for TN, similar to or lower than the average influent concentrations. An applicable solution that balances economic and environmental benefits can therefore be proposed. FIG. 26(b) shows the optimized trajectories of the G2ANet-based autonomous operation system in July. The DO setpoint in the aerobic reactor was effectively reduced by 49.3%, giving an average DO concentration of 2.03 mg/L for the autonomous system. Compared with the DO setpoint in February, the proposed system reduced the DO concentration in July by a further 6.8 percentage points (49.3% versus 42.5%). There are two reasons for the lower setpoint in July: (1) the influent condition was lower, and (2) microbial growth was more active at high temperature. The temperature dependence of microbial activity is reflected in the calibrated ASM-SMP model. The G2ANet-based autonomous system therefore proposed a lower DO setpoint in the warm season.
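The reported DO reductions can be checked with a short calculation. The sketch below uses only the manual setpoint (4 mg/L) and the average proposed DO concentrations stated for February (2.30 mg/L) and July (2.03 mg/L); the function name is a hypothetical helper.

```python
def percent_reduction(baseline, value):
    """Reduction of `value` relative to `baseline`, in percent."""
    return (baseline - value) / baseline * 100.0


MANUAL_DO = 4.0  # mg/L, manual DO setpoint

february = percent_reduction(MANUAL_DO, 2.30)
july = percent_reduction(MANUAL_DO, 2.03)

print(f"February: {february:.2f} %")
print(f"July: {july:.2f} %")
print(f"Difference: {july - february:.2f} percentage points")
```

The February value reproduces the stated 42.5%. The July value comes to 49.25%, which agrees with the stated 49.3% to within rounding, and the difference of 6.75 percentage points agrees with the stated 6.8.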

The performance of the G2ANet-based autonomous system in July is shown in FIG. 26(c). With the operating trajectories optimized by G2ANet, the EQI decreased by 11.11% compared with the manual system. The improvement in effluent quality was greater in July because nitrogen removal improved more readily in the warm season. Temperature is a factor to which microbial growth is sensitive and it affects nitrification and denitrification, so effluent quality can be improved with lower manipulated-variable values. The optimized DO setpoint reduced the aeration energy by 30.84% and increased the pumping energy by 0.43%.

The environmental improvement achieved by the autonomous system is shown in FIG. 26(d). The effluent COD complied with the legal discharge limit and remained below 52 mg/L. The manual operation system, however, discharged effluent TN above the limit a total of four times, on days 187, 202, and 205. The manual system can meet the effluent limit when the 3-hour average is considered, but such marginal operation may be risky under varying influent conditions in the future. From an environmental operation standpoint, the proposed G2ANet-based autonomous system can be a solution that safely meets the effluent quality limits, as shown by the red circles in FIG. 26(d). G2ANet effectively improved the operation of the MBR plant by achieving both economic and environmental benefits. A comprehensive summary of the performance improvements is given in Table 7.

[Table 7] Overall performance of the G2ANet-based autonomous operation system at the full-scale MBR plant

Table 7 summarizes the quantitative performance improvement of the G2ANet-based operation system for February, July, and the whole of 2018. The autonomous system clearly improved effluent quality under both high and low influent conditions, as shown in FIGS. 25 and 26. The system according to the embodiment of the present invention was thus able to satisfy the effluent quality limits and to improve effluent quality by 6.97% under the various influent conditions of 2018. In addition, the aeration energy over the whole year decreased by 24.80%, whereas the pumping energy increased by 1.06%. As a result, the proposed autonomous operating trajectory search system can save 4.805·10⁵ kWh during one year of operation. At the price of $0.075 per kWh for electricity supplied by KEPCO, the G2ANet-based autonomous operation system can save $36,039 per year. The cost of external carbon dosing was not included in the performance evaluation because the target plant uses external carbon irregularly. In addition, the present invention assumes that the external carbon source is a readily biodegradable substrate. The optimization system can be extended to manipulate other variables, such as methanol-based external carbon and the DO concentration in the membrane reactor, for energy saving and pollution mitigation. Consequently, the G2ANet-based autonomous system can contribute to maintaining safe water quality and saving energy under dynamic and varied influent conditions. The present invention also shows the possibility of extension to more complex operating systems by considering centralized learning-decentralized execution agents to optimize the manipulated variables under various influent conditions.
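The annual cost saving follows from the stated energy saving and electricity price. The sketch below multiplies the two figures as given in the text; the function and constant names are hypothetical.

```python
ENERGY_SAVED_KWH = 4.805e5  # kWh per year, as stated in the text
PRICE_USD_PER_KWH = 0.075   # USD per kWh, as stated in the text


def annual_saving_usd(energy_kwh, price_usd_per_kwh):
    """Annual cost saving for a given energy saving and unit price."""
    return energy_kwh * price_usd_per_kwh


print(f"{annual_saving_usd(ENERGY_SAVED_KWH, PRICE_USD_PER_KWH):,.1f} USD")
```

The product is about $36,037.5, slightly below the stated $36,039. The small gap presumably reflects the energy figure having been rounded to four significant digits.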

In the embodiment of the present invention, an autonomous operating trajectory search system is proposed for energy saving and environmentally sound operation of MBR plants. Optimal operating setpoints were provided by the G2ANet algorithm under various influent conditions. To develop a stable and adaptable operating strategy, influent conditions covering low, normal, and high COD and TN composition ratios were generated and clustered into three scenarios using the k-means clustering algorithm. Advanced G2ANet reinforcement learning was used to propose, through three cooperating agents, the operating trajectories of the DO concentration, the external sludge recycle, and the external carbon dosage.

The results of the G2ANet-based autonomous system show that optimal setpoints were properly proposed to improve the performance of the target MBR plant and that the system clearly outperformed manual operation under various influent conditions. In addition, the case of newly measured one-year influent data indicates that the G2ANet-based autonomous system can adapt to new influent conditions: it reduced aeration energy by 24.80% and improved effluent quality by 6.97% while maintaining similar pumping energy under the varying COD-TN composition ratios of the dynamic influent. Consequently, the G2ANet-based autonomous operating trajectory search system represents an innovative contribution to the smart operation of WWTPs, through its stability and adaptability, in the field of digitalized water research. It can also serve as an operating guide for actual plant operators by proposing stable multi-variable solutions in a dynamic and complex wastewater environment.

In addition, the apparatus and method described above are not limited to the configurations and methods of the embodiments described above; all or part of each embodiment may be selectively combined so that various modifications can be made.

Claims

A sewage treatment plant model autonomous correction system using reinforcement learning, comprising:
a modeling unit for calculating a sewage treatment plant model by simulating the sewage treatment plant; and
a modeling correction unit for correcting the sewage treatment plant model by correcting the dynamic coefficient value of the sewage treatment plant model using artificial intelligence learning.

The sewage treatment plant model autonomous correction system using reinforcement learning according to claim 1,
wherein the sewage treatment plant model is an activated sludge model (ASM).

The sewage treatment plant model autonomous correction system using reinforcement learning according to claim 2,
wherein the artificial intelligence learning is multi-agent reinforcement learning artificial intelligence.

The sewage treatment plant model autonomous correction system using reinforcement learning according to claim 3,
wherein the multi-agent reinforcement learning artificial intelligence is based on the G2ANet algorithm.

A sewage treatment plant model autonomous correction method using reinforcement learning, comprising:
calculating, by a modeling unit simulating the sewage treatment plant, a sewage treatment plant activated sludge model (ASM); and
generating a calibrated ASM model by a modeling correction unit correcting the dynamic coefficient value of the sewage treatment plant model using artificial intelligence learning.

The sewage treatment plant model autonomous correction method using reinforcement learning according to claim 5,
wherein the artificial intelligence learning is multi-agent reinforcement learning artificial intelligence.

The sewage treatment plant model autonomous correction method using reinforcement learning according to claim 6,
wherein the multi-agent reinforcement learning artificial intelligence is based on the G2ANet algorithm.

A sewage treatment plant process operation multiple optimization system using reinforcement learning, comprising:
the autonomous correction system according to any one of claims 1 to 4; and
an optimization module for optimizing the sewage treatment process operation control based on the ASM model calibrated by the autonomous correction system.

The sewage treatment plant process operation multiple optimization system using reinforcement learning according to claim 8,
wherein the optimization module optimizes process operation based on multi-agent reinforcement learning artificial intelligence.

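The sewage treatment plant process operation multiple optimization system using reinforcement learning according to claim 9,
wherein the optimization module optimizes the aeration intensity, the external carbon source injection amount, and the sludge circulation flow rate.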

A multiple optimization method for sewage treatment plant process operation, comprising:
calculating, by a modeling unit simulating the sewage treatment plant, a sewage treatment plant activated sludge model (ASM);
generating a calibrated ASM model by a modeling correction unit correcting the dynamic coefficient value of the sewage treatment plant model using artificial intelligence learning; and
optimizing, by an optimization module, the aeration intensity, the external carbon source injection amount, and the sludge circulation flow rate of the sewage treatment process operation through multi-agent reinforcement learning artificial intelligence based on the calibrated ASM model.