KR20200094577A

KR20200094577A - Realtime Accelerator Controlling System using Artificial Neural Network Simulator and Reinforcement Learning Controller

Info

Publication number: KR20200094577A
Application number: KR1020190012232A
Authority: KR
Inventors: 이근호; 이상윤; 이준현; 이준엽
Original assignee: 주식회사 모비스
Priority date: 2019-01-30
Filing date: 2019-01-30
Publication date: 2020-08-07
Also published as: WO2020159052A1; JP2021515352A

Abstract

The present invention relates to an accelerator control system for accelerator control devices, based on artificial neural networks. The accelerator control system according to the present invention comprises: a plurality of device simulators, each corresponding to one of a plurality of accelerator control devices and performing learning and simulation based on an artificial neural network; and a reinforcement learning controller that adjusts at least one control parameter of the plurality of device simulators and obtains the resulting final accelerator output quality, generates a plurality of sets of learning data in which the output of the accelerator's injector, the control parameter value of each device simulator, and the final accelerator output quality value are matched, and performs machine learning on those sets of learning data to calculate the optimal control parameter values that maximize the final accelerator output quality. Each of the plurality of device simulators includes: a primary simulator; a noise processing unit that receives the output of the primary simulator, adds noise to it, and outputs the result; and a feedback processing unit that generates a feedback signal based on the output of the noise processing unit and provides it to the primary simulator. According to the present invention, the optimal values of the control parameters that increase the final accelerator output quality can be determined rapidly.

Description

Realtime Accelerator Controlling System using Artificial Neural Network Simulator and Reinforcement Learning Controller

The present invention relates to an accelerator control system and, more specifically, to a real-time accelerator control system using an artificial neural network simulator and a reinforcement learning controller, in which simulators corresponding to the accelerator control devices are built on artificial neural networks and the parameter values for each accelerator control device are then determined through learning and simulation.

A particle accelerator is a device that accelerates atomic nuclei or elementary particles; ultimately, it is used to observe and determine the microstructure of matter through particle collisions and their observation.

Particle accelerators come in many types. By the particle being accelerated, there are positive-ion accelerators, negative-ion accelerators, heavy-ion accelerators, and electron accelerators (synchrotron radiation accelerators); by the acceleration scheme, there are linear accelerators, circular accelerators, and so on.

To accelerate particles to speeds approaching the speed of light, however, such a particle accelerator consists of a very large number of devices, and its scale is also very large.

For example, in the fourth-generation Pohang accelerator, a single acceleration system is made up of an electron gun that shines a powerful ultraviolet laser on copper so that electrons are ejected, a linear accelerator that greatly compresses the length of the electron beam from the electron gun, an undulator in which the compressed and accelerated electron beam passes between permanent magnets and generates very bright X-ray synchrotron radiation, and X-ray experimental stations (beamlines) that use the X-ray radiation to investigate the structure and phenomena of matter down to the molecular level. In such an acceleration system, the various control devices for electron gun control, particle acceleration control, radiation control, and so on together form one accelerator control system.

Such an accelerator control system, however, spatially covers scales from nm to km, temporally covers data ranging from femtoseconds to several days, and handles hundreds of thousands of control variables, so optimizing the control parameters of the various devices in the accelerator control system is very difficult.

That is, as described above, an accelerator control system contains anywhere from a few dozen to several hundred control devices (which may have sensors inside) along the entire length of the acceleration system. Each of these control devices has various parameters (control variables) that determine how it operates, and optimizing the parameters of each control device has conventionally depended on the experience of researchers.

To reduce trial and error when setting actual control parameters, physical simulations are run in advance using tools such as MATLAB. However, simulation-based analysis of the entire system has many problems in terms of accuracy and computation time, and even at facilities such as the Pohang Accelerator Laboratory, control parameters are set according to expert judgment based on long operating experience.

Physical-model simulation using tools such as MATLAB, or per-device optimization based on actual operating experience, can achieve partial optimization. Because the various control parameters influence one another, however, these approaches are limited in their ability to search for and determine the optimal control parameters that maximize the ultimate target, the final accelerator output quality (for example, the Q-BPM total value).

In particular, there have been attempts to implement a virtual accelerator based on an accelerator simulator such as Elegant and to optimize control parameters offline, but because of problems such as the speed of these conventional accelerator simulators, they cannot optimize a real-time accelerator control system.

Korean Patent Laid-Open Publication No. 10-2007-0054457

The present invention has been devised to solve the conventional problems described above, and its object is to provide a system that calculates or searches for the optimal values of the various control parameters in an accelerator control system so as to maximize the final accelerator output quality.

To achieve this object, the accelerator control system for accelerator control devices according to the present invention includes: a plurality of device simulators, each corresponding to one of a plurality of accelerator control devices and performing learning and simulation based on an artificial neural network; and a reinforcement learning controller that adjusts at least one control parameter of the plurality of device simulators and obtains the resulting final accelerator output quality, then generates a plurality of sets of learning data in which the output of the accelerator's injector, the control parameter value of each device simulator, and the final accelerator output quality value are matched, and performs machine learning on those sets of learning data to calculate the values of the optimal control parameters that give the highest final accelerator output quality. Each of the plurality of device simulators includes: a primary simulator; a noise processing unit that receives the output of the primary simulator, adds noise to it, and outputs the result; and a feedback processing unit that generates a feedback signal based on the output of the noise processing unit and provides it to the primary simulator.

Here, the primary simulator, the noise processing unit, and the feedback processing unit may all be artificial neural networks, formed by machine learning based on the output values produced for given input values by the accelerator control device corresponding to each device simulator.

Here, the reinforcement learning controller may designate a control parameter set, which is a collection of at least one control parameter of the plurality of device simulators, learn through an artificial neural network-based learning process how changes to the values of the control parameters in that set affect the final accelerator output quality, and then calculate the values of the optimal control parameters that give the highest final accelerator output quality.

Here, the reinforcement learning controller may calculate the optimal control parameter values one at a time, in the order in which the control parameters appear in the control parameter set.

As described above, according to the present invention, performing artificial neural network-based machine learning on the control parameters of each accelerator control device makes it possible to quickly determine the values of the optimal control parameters that increase the final accelerator output quality.

In addition, because the accelerator control devices in actual operation are replaced with artificial neural network-based device simulators before the learning data is collected, the accelerator does not have to be taken out of operation to derive the optimal control parameters, and proper learning considerably improves the accuracy with which the optimal control parameters that increase the final accelerator output quality are determined.

FIG. 1 is a functional block diagram of an accelerator control system according to an embodiment of the present invention.
FIG. 2 is an example of the structure of the artificial neural network of the device simulator of FIG. 1.
FIG. 3 is an example of a specific configuration of each device simulator.
FIG. 4 is an example of data that the reinforcement learning controller of FIG. 1 uses for supervised learning.
FIG. 5 is an example of the configuration of a conventional EPICS-based accelerator control system.
FIG. 6 is a view, for comparison with FIG. 5, showing the combination of an accelerator control system according to an embodiment of the present invention with a conventional EPICS system.

Hereinafter, the present invention will be described in detail with reference to the accompanying drawings.

Each embodiment described below is only an example to aid understanding of the present invention, and the present invention is not limited to these embodiments. In particular, the present invention may consist of any combination of one or more of the individual components and individual functions included in each embodiment.

FIG. 1 is a functional block diagram of a real-time accelerator control system 100 for accelerator control devices (hereinafter, 'accelerator control system 100') according to an embodiment of the present invention. As described below, the accelerator control system 100 may serve as a parameter determination system for optimizing each parameter of the device simulators 110.

As shown in the figure, the accelerator control system 100 includes a plurality of artificial neural network-based device simulators 110 (hereinafter, 'device simulators 110') and a reinforcement learning controller 120.

Each device simulator 110 corresponds to one accelerator control device. As noted in the background section, operating a single accelerator requires dozens to hundreds of accelerator control devices, or more where necessary, and each device simulator 110 of FIG. 1 implements one of these accelerator control devices.

In particular, the device simulator 110 may be configured to perform learning or simulation based on an artificial neural network.

For example, the device simulator 110 may form an artificial neural network such as that of FIG. 2 and, through machine learning, determine the optimal internal parameter values that make it exhibit the same characteristics as the actual accelerator control device.

For example, the device simulator 110 may consist of an artificial neural network that emulates the Elegant and Genesis simulators conventionally used off target (Off Target) in accelerator systems.

In particular, each of the plurality of device simulators 110 may include a primary simulator 101, a noise processing unit 102, and a feedback processing unit 103, as shown in FIG. 3.

The primary simulator simulates the function of the corresponding accelerator control device and may be, for example, an Elegant LLRF (Low Level RF) simulator or a Genesis undulator simulator.

The noise processing unit receives the output of the primary simulator, adds noise to it, and outputs the result.

The noise processing unit can artificially generate the noise caused by external forces, vibration, and the like that occurs, or may occur, in real conditions during accelerator operation.

The feedback processing unit generates a feedback signal based on the output of the noise processing unit and provides it to the primary simulator.

The primary simulator, the noise processing unit, and the feedback processing unit are all artificial neural networks and may be formed by machine learning of the characteristics (inputs and outputs) of an existing simulator.
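The data flow among the three units can be illustrated with a minimal sketch. It is not the patented implementation: in the description all three units are trained neural networks, whereas here each is a hypothetical stand-in function, and the class name `DeviceSimulator`, the method `step`, and the numeric coefficients are assumptions made only for illustration.

```python
import random
from dataclasses import dataclass
from typing import Callable


@dataclass
class DeviceSimulator:
    """Toy stand-in for one device simulator (all names here are hypothetical)."""
    primary: Callable[[float, float, float], float]  # (sensor_in, control, feedback) -> raw output
    noise: Callable[[float], float]                  # raw output -> noisy output
    feedback: Callable[[float], float]               # noisy output -> feedback signal
    feedback_signal: float = 0.0                     # last feedback value, fed to the next step

    def step(self, sensor_in: float, control: float) -> float:
        raw = self.primary(sensor_in, control, self.feedback_signal)
        noisy = self.noise(raw)                      # noise processing unit
        self.feedback_signal = self.feedback(noisy)  # feedback processing unit
        return noisy


rng = random.Random(0)
sim = DeviceSimulator(
    primary=lambda x, c, fb: 0.9 * x + c - 0.5 * fb,  # assumed linear response
    noise=lambda y: y + rng.gauss(0.0, 0.01),         # assumed Gaussian disturbance
    feedback=lambda y: 0.1 * y,                       # assumed proportional feedback
)
outputs = [sim.step(sensor_in=1.0, control=0.2) for _ in range(3)]
```

Each call to `step` uses the feedback signal produced by the previous call, which mirrors the loop of FIG. 3 in which the feedback processing unit feeds the primary simulator.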

The BPM (Beam Position Monitor) that is input and output in FIG. 3 may be a cBPM (cavity BPM) or an sBPM (stripline BPM).

For reference, the structure shown in FIG. 2 is simply a form that is widely known in the field of artificial neural networks for machine learning, so the known techniques by which the weights of each layer are brought to optimal values through machine learning are not described here.

As shown in FIG. 2, however, each device simulator 110 receives the result of the preceding device simulator 110 as sensor parameters and also receives control parameters. The inputs pass through the internal hidden layers (whose form, weights, and so on depend on the function of the accelerator control device corresponding to the device simulator 110), and the output layer finally passes sensor parameters on to the device simulator 110 at the next stage.

That is, in this case each device simulator 110 receives the result values of the preceding device simulator 110 and outputs a result that depends on its internal control parameter values.

Saying that a device simulator 110 is located at the 'preceding stage' or the 'next stage' means that the accelerator control devices used in actual accelerator operation are arranged in that order.

That is, each of the plurality of device simulators 110 is matched to a specific position in an arrangement order, and this arrangement order is the same as the arrangement order of the accelerator control devices used in actual accelerator operation.
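A short sketch of this stage-by-stage chaining follows. The function `run_chain`, the `Stage` type, and the three toy stages are hypothetical; a real device simulator would be a trained neural network taking vectors of sensor and control parameters rather than single floats.

```python
from typing import Callable, Sequence

# A stage maps (sensor input from the preceding stage, control parameter) to a sensor output.
Stage = Callable[[float, float], float]


def run_chain(stages: Sequence[Stage], controls: Sequence[float], injector_output: float) -> float:
    """Pass the injector output through the stages in their arrangement order."""
    if len(stages) != len(controls):
        raise ValueError("one control parameter is expected per stage in this sketch")
    signal = injector_output
    for stage, control in zip(stages, controls):
        signal = stage(signal, control)  # output of one stage is the input of the next
    return signal


# Three toy stages standing in for trained device simulators.
stages = [
    lambda x, c: x + c,
    lambda x, c: x * c,
    lambda x, c: x - c,
]
final = run_chain(stages, [1.0, 2.0, 0.5], injector_output=3.0)
```

The list order of `stages` plays the role of the arrangement order: reordering the list changes the result, just as reordering the physical devices would.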

Once machine learning has been performed in this way, the internal weight values of each (artificial neural network) layer of the device simulators 110 are fixed. On that basis, the reinforcement learning controller 120 calculates or determines the control parameters of the accelerator control devices, that is, of the device simulators 110, that give the best final accelerator output quality. To this end, the reinforcement learning controller 120 may also be an artificial neural network such as that of FIG. 2.

This process is described in detail below.

First, the reinforcement learning controller 120 adjusts at least one control parameter of the plurality of device simulators 110 and obtains the resulting final accelerator output quality, then generates a plurality of sets of learning data in which the output of the accelerator's injector, the control parameter value of each device simulator 110, and the final accelerator output quality value are matched.

Conventionally, data generated or applied during accelerator operation was used as such learning data. In the present invention, the learning data is instead generated using the device simulators 110, which are artificial neural networks.

When each device simulator 110 is an artificial neural network, the time needed to obtain learning data can be shortened considerably. As explained above, if the device simulator 110 were merely an Elegant-based virtual device, its processing speed would be quite low, and learning data could not be obtained quickly.

By contrast, when the device simulator 110 is an artificial neural network that reproduces the characteristics of the actual accelerator control device through machine learning, control parameter changes that could not be applied during actual accelerator operation can be applied, and the corresponding learning data can be obtained easily.
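The data generation step can be sketched as follows. This is a simplified illustration under stated assumptions: `Sample`, `generate_samples`, the sampling ranges, and `toy_quality` (a stand-in for running the whole chain of device simulators) are all hypothetical, and uniform random sampling is just one possible way to vary the control parameters.

```python
import random
from typing import Callable, List, NamedTuple, Sequence, Tuple


class Sample(NamedTuple):
    """One learning record: injector output, control parameter values, final quality."""
    injector_output: float
    controls: Tuple[float, ...]
    quality: float


def generate_samples(
    simulate_quality: Callable[[float, Sequence[float]], float],
    n_params: int,
    n_samples: int,
    control_range: Tuple[float, float] = (-1.0, 1.0),
    seed: int = 0,
) -> List[Sample]:
    """Sample control settings freely, including ones a live accelerator could not be given."""
    rng = random.Random(seed)
    lo, hi = control_range
    samples = []
    for _ in range(n_samples):
        injector = rng.uniform(0.9, 1.1)  # assumed injector output range
        controls = tuple(rng.uniform(lo, hi) for _ in range(n_params))
        samples.append(Sample(injector, controls, simulate_quality(injector, controls)))
    return samples


def toy_quality(injector: float, controls: Sequence[float]) -> float:
    # Hypothetical quality surface that peaks when every control equals 0.3.
    return injector - sum((c - 0.3) ** 2 for c in controls)


dataset = generate_samples(toy_quality, n_params=4, n_samples=100)
```

Because the simulators are cheap to evaluate, the loop can cover the full `control_range` rather than the narrow band that is safe on real hardware.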

Next, the reinforcement learning controller 120 performs machine learning on the generated sets of learning data and calculates the values of the optimal control parameters that give the highest final accelerator output quality.

That is, for the reinforcement learning controller 120 to calculate the control parameters of each accelerator control device that maximize the final accelerator output quality in a given situation, machine learning on the learning data described above must come first. Conventionally, data applied during actual accelerator operation was collected and used as learning data. In this embodiment, by contrast, the device simulator 110 corresponding to each accelerator control device is an artificial neural network, so learning data can be generated and used with a variety of control parameters (including ones that could not be applied to a real accelerator), which ultimately maximizes the advantages of the artificial neural network.

Machine learning with an artificial neural network requires diverse learning data to learn properly. Learning data collected during actual accelerator operation is necessarily limited in range, for reasons such as accelerator stability, so proper learning cannot be achieved with it. Because the learning data of this embodiment is collected by applying control parameters that could not conventionally be applied, the reinforcement learning controller 120 can perform the machine learning needed to build a proper artificial neural network (one that calculates, for each situation, the optimal control parameter values that maximize the final accelerator output quality).

The following describes the specific process by which the reinforcement learning controller 120 learns using a control parameter set.

The reinforcement learning controller 120 designates a control parameter set, which is a collection of at least one control parameter of the plurality of device simulators 110, learns through an artificial neural network-based learning process how changes to the values of the parameters in that set affect the final accelerator output quality, and then calculates the values of the optimal control parameters that give the highest final accelerator output quality.

For example, if the parameters of the plurality of device simulators 110 are C0_1, C1_1, C2_1, and C3_1, respectively, the reinforcement learning controller 120 designates them as one parameter set '{C0_1, C1_1, C2_1, C3_1}' and then, while varying the value of each parameter in the set (that is, C0_1, C1_1, C2_1, C3_1) through machine learning, calculates the control parameter values that give the highest final output quality.

In doing so, the reinforcement learning controller 120 may calculate the optimal parameter values one at a time, in the order in which the parameters appear in the control parameter set.

That is, when the parameter set is '{C0_1, C1_1, C2_1, C3_1}' as in the example above, the reinforcement learning controller 120 designates a value for the first parameter C0_1 and, holding that value fixed, uses machine learning to calculate the C1_1 that gives the optimal final output quality.

The reinforcement learning controller 120 then holds the designated or calculated values of C0_1 and C1_1 fixed and uses machine learning to calculate the C2_1 that gives the optimal final output quality; likewise, holding C0_1, C1_1, and C2_1 fixed, it can calculate the C3_1 that gives the optimal final output quality.
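The ordering of this procedure can be sketched in a few lines. Note the substitution: where the description determines each value through machine learning, this sketch uses a plain search over a list of candidate values so that only the fix-then-advance order is shown. The function `optimize_in_order`, the candidate grid, and `toy_quality` are hypothetical.

```python
from typing import Callable, List, Sequence


def optimize_in_order(
    quality: Callable[[Sequence[float]], float],
    initial: Sequence[float],
    candidates: Sequence[float],
) -> List[float]:
    """Keep the first parameter at its designated value, then fix the rest one at a time."""
    values = list(initial)
    for i in range(1, len(values)):
        # Earlier parameters stay fixed at their designated or calculated values;
        # later parameters keep their initial values until their turn comes.
        values[i] = max(
            candidates,
            key=lambda v: quality(values[:i] + [v] + values[i + 1:]),
        )
    return values


TARGETS = [0.0, 0.25, 0.5, 0.75]  # hypothetical optimum of the toy quality surface


def toy_quality(values: Sequence[float]) -> float:
    return -sum((v - t) ** 2 for v, t in zip(values, TARGETS))


result = optimize_in_order(
    toy_quality,
    initial=[0.0, 0.0, 0.0, 0.0],  # C0_1 is designated as 0.0; the rest are placeholders
    candidates=[0.0, 0.25, 0.5, 0.75, 1.0],
)
```

On this separable toy surface the one-pass procedure reaches the optimum. When parameters interact strongly, a single pass in a fixed order may not, which is one reason a learned or tree-search-based controller is of interest.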

The control parameter set may contain the control parameters of the plurality of device simulators 110 in the same order as the arrangement order of the corresponding device simulators 110.

That is, as explained above, each of the plurality of device simulators 110 may have a specific position in the arrangement order according to its corresponding accelerator control device. The control parameter set contains the control parameters in that arrangement order, and the reinforcement learning controller 120 determines or calculates the value of each control parameter in the order in which it appears in the set.

For example, suppose that operating the accelerator requires a first, second, third, and fourth accelerator control device arranged in that order, and that a first control parameter (C0_1) can be set on the first accelerator control device, a second control parameter (C1_1) on the second, a third control parameter (C2_1) on the third, and a fourth control parameter (C3_1) on the fourth. The control parameter set can then be '{C0_1, C1_1, C2_1, C3_1}', and the reinforcement learning controller 120 determines each parameter value in that order.

In this embodiment each accelerator control device has one parameter as an example, but each accelerator control device may of course have a plurality of parameters, and there may be priorities among those parameters.

상술한 바와 같이 각 장치 시뮬레이터(110)(가속기 제어 장치)의 최적의 제어 파라미터 값을 산출하기 위해 강화학습 제어기(120)는 강화학습 모델을 기초로 MCTS(Monte Carlo Tree Search) 알고리즘을 이용할 수 있다.As described above, the reinforcement learning controller 120 may use a Monte Carlo Tree Search (MCTS) algorithm based on the reinforcement learning model to calculate the optimal control parameter value of each device simulator 110 (accelerator control device). .

Here, the term "Monte Carlo" in MCTS refers to algorithms that compute the value of a function probabilistically using random numbers. Such algorithms are used to approximate a value that cannot be expressed in closed form or is too complex to compute directly.

For example, to obtain the area of a circle with a Monte Carlo algorithm, one draws the circle and a square that encloses it, scatters a large number of random points in the square, and computes the probability that a point falls inside the circle. That probability multiplied by the area of the square approximates the area of the circle. The accuracy of such a Monte Carlo algorithm increases as the number of random trials increases.
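The circle example can be sketched in a few lines of Python. The function name, the fixed seed, and the choice of a square of side 2r centered on the circle are assumptions made for illustration.

```python
import random


def estimate_circle_area(radius: float, trials: int, seed: int = 0) -> float:
    """Estimate the area of a circle by sampling points in its bounding square."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    inside = 0
    for _ in range(trials):
        # Draw a point uniformly from the square [-r, r] x [-r, r].
        x = rng.uniform(-radius, radius)
        y = rng.uniform(-radius, radius)
        if x * x + y * y <= radius * radius:
            inside += 1
    square_area = (2.0 * radius) ** 2
    # Fraction of points inside the circle, scaled by the square's area.
    return square_area * inside / trials


for n in (100, 10_000, 200_000):
    print(n, estimate_circle_area(1.0, n))
```

For a unit circle the estimate should approach pi as the number of trials grows, with an error that shrinks roughly in proportion to the inverse square root of the number of trials.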

Monte Carlo Tree Search (MCTS) is a method for finding an optimal decision. It is a heuristic search algorithm for decision making, used mainly when it is difficult to find a solution by formulating an equation, for example to find the best move in a game.

For example, when MCTS is applied to a game, each move is a node, and the whole course of the game is represented as a tree of successive moves. A win rate is recorded at each node, and finding the best move can be approximated by moving toward the node with the highest win rate. MCTS can therefore be described as the process of calculating the win rate of each node and moving toward the nodes with high win rates.

The problem with tree search is that it takes a very long time as the number of child nodes grows. MCTS does not explore every possibility; instead, it obtains game results from many random simulations and applies them to the win rates of the nodes.

The MCTS algorithm consists of four steps, Selection, Expansion, Simulation, and Back propagation, which are briefly described as follows.

(1) Selection: Starting at the root R, descend through successive child nodes and select a node L.

(2) Expansion: If the game has not ended at node L, create a new child node C or choose one of the existing child nodes as node C.

(3) Simulation: Perform a random playout from node C.

(4) Back propagation: Update the nodes on the path from node C back to the root R with the result of the playout.
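The four steps above can be sketched in Python for a small, hypothetical parameter-selection problem, where each tree level fixes the next control parameter in set order. This is a minimal sketch under stated assumptions, not the implementation of the embodiment: the UCB1 selection rule, the exploration constant, the `toy_reward` function (a stand-in for the final output quality, scaled to the range 0 to 1), and all names are assumptions.

```python
import math
import random
from typing import Callable, List, Optional, Sequence, Tuple


class Node:
    def __init__(self, values: Tuple[float, ...], parent: Optional["Node"] = None):
        self.values = values  # parameter values fixed so far, in set order
        self.parent = parent
        self.children: List["Node"] = []
        self.visits = 0
        self.total_reward = 0.0


def mcts(choices: Sequence[Sequence[float]],
         reward: Callable[[Tuple[float, ...]], float],
         iterations: int = 2000, exploration: float = 1.4,
         seed: int = 0) -> Tuple[float, ...]:
    rng = random.Random(seed)
    depth = len(choices)
    root = Node(())
    for _ in range(iterations):
        # (1) Selection: descend while the node is fully expanded and not terminal.
        node = root
        while (len(node.values) < depth
               and len(node.children) == len(choices[len(node.values)])):
            node = max(
                node.children,
                key=lambda c: c.total_reward / c.visits
                + exploration * math.sqrt(math.log(c.parent.visits) / c.visits),
            )
        # (2) Expansion: add one untried child unless the node is terminal.
        if len(node.values) < depth:
            tried = {c.values[-1] for c in node.children}
            untried = [v for v in choices[len(node.values)] if v not in tried]
            child = Node(node.values + (rng.choice(untried),), parent=node)
            node.children.append(child)
            node = child
        # (3) Simulation: fill the remaining parameters at random and score.
        playout = list(node.values)
        for level in range(len(playout), depth):
            playout.append(rng.choice(choices[level]))
        result = reward(tuple(playout))
        # (4) Back propagation: update statistics from the node up to the root.
        while node is not None:
            node.visits += 1
            node.total_reward += result
            node = node.parent
    # Read out the most-visited path as the recommended parameter values.
    best = root
    while best.children:
        best = max(best.children, key=lambda c: c.visits)
    return best.values


CHOICES = [[0.0, 1.0, 2.0]] * 3   # hypothetical candidate values per parameter
TARGET = (1.0, 2.0, 0.0)          # hypothetical optimum of the toy reward


def toy_reward(values: Tuple[float, ...]) -> float:
    return 1.0 / (1.0 + sum((v - t) ** 2 for v, t in zip(values, TARGET)))


best_values = mcts(CHOICES, toy_reward)
print(best_values)
```

In this sketch the "win rate" of a node is its average reward, and the readout follows the most-visited child at each level, which is a common but not the only way to extract a decision from the tree.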

Since the MCTS algorithm itself is a known technique, a more detailed description is omitted.

Through the process described above, the reinforcement learning controller 120 obtains a plurality of sets of learning data in which the output of the injector for the accelerator, the control parameter value of each device simulator 110, and the final accelerator output quality value are matched. It may first learn by supervised learning (SL) using a preset number of those sets of learning data, and then perform reinforcement learning (RL) to calculate the optimal control parameter values.

Here, 'supervised learning' means training in a state in which a label (an explicit correct answer) is given for the data. In this context it means learning on an artificial neural network basis using previously generated big data, namely the inputs, control parameters, and final output quality values obtained by varying the control parameters with the device simulators 110, which are composed of artificial neural networks as described above. Reinforcement learning means that an agent learns by taking an action in a given environment (state) and obtaining a reward from it.

The theory of supervised learning and reinforcement learning is likewise known technology. One feature of the present invention is that these learning methods are used to calculate the optimal parameters of accelerator control devices.

FIG. 4 is an example of data acquired in advance for 'supervised learning' of the reinforcement learning controller 120.

That is, each row of FIG. 4 corresponds to one set of the learning data mentioned above. From the plurality of sets of learning data collected in this way, the reinforcement learning controller 120 extracts the learning data for a number of cases, either in a preset order or at random, and can then perform 'supervised learning' in an artificial neural network using the extracted learning data.
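The extraction step can be sketched as follows in Python. The rows are invented placeholder values, not the data of FIG. 4; the column names follow the labels used in the description ('I', the control parameters, and 'Q-BPM'), and the function names and the `mode` argument are hypothetical.

```python
import random
from typing import Dict, List, Sequence, Tuple

Row = Dict[str, float]

# Hypothetical learning data: injector output, control parameters, quality.
ROWS: List[Row] = [
    {"I": 1.0, "C0_1": 0.1, "C1_1": 0.2, "C2_1": 0.3, "C3_1": 0.4, "Q-BPM": 0.71},
    {"I": 1.0, "C0_1": 0.2, "C1_1": 0.2, "C2_1": 0.3, "C3_1": 0.4, "Q-BPM": 0.64},
    {"I": 1.1, "C0_1": 0.1, "C1_1": 0.3, "C2_1": 0.3, "C3_1": 0.5, "Q-BPM": 0.80},
    {"I": 1.1, "C0_1": 0.3, "C1_1": 0.1, "C2_1": 0.2, "C3_1": 0.4, "Q-BPM": 0.55},
]


def select_training_rows(rows: Sequence[Row], count: int,
                         mode: str = "sequential", seed: int = 0) -> List[Row]:
    """Extract `count` rows either in the stored order or at random."""
    if count > len(rows):
        raise ValueError("count exceeds the number of available rows")
    if mode == "sequential":
        return list(rows[:count])
    if mode == "random":
        return random.Random(seed).sample(list(rows), count)
    raise ValueError(f"unknown mode: {mode}")


def to_features_and_labels(rows: Sequence[Row],
                           label: str = "Q-BPM") -> Tuple[List[List[float]], List[float]]:
    """Split each row into network inputs and the supervised target."""
    features = [[value for key, value in row.items() if key != label] for row in rows]
    labels = [row[label] for row in rows]
    return features, labels


first_two = select_training_rows(ROWS, 2)
random_three = select_training_rows(ROWS, 3, mode="random")
features, labels = to_features_and_labels(first_two)
print(features, labels)
```

The resulting feature and label lists are in the shape a supervised regression model would expect: the injector output and control parameters as inputs, and the final output quality as the target.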

For reference, in each set of learning data in FIG. 4, 'I' is the output value of the injector, 'Q-BPM' is the final accelerator output quality value, and the remaining entries are the control parameters of the respective accelerator control devices.

The accelerator control system 100 described in the above embodiment may also be operated in combination with, for example, an EPICS-based accelerator control system.

FIG. 5 shows a conventional EPICS-based accelerator control system, and FIG. 6 shows a configuration in which the control devices of such an EPICS-based accelerator control system are replaced with the device simulators 110 according to the present invention and the reinforcement learning controller 120 is added.

Referring to FIG. 5, accelerator control devices such as LLRF and BPM are deployed by the dozens along the entire accelerator and perform optimization in each section. Each accelerator control device raises events to the EPICS IOC for its PV (process variable) data, either at a time period set per device or whenever a value changes. The EPICS IOC (Input Output Controller) server collects the events from the devices and sends the data for display on, for example, the CSS screen used by the operator, or sends it to an AA (Archiver Appliance) server or the like for storage so that the data can be used later. When a value outside a specified setting range arrives, a warning message or the like is output to the CSS system through the Alarm Server so that the operator can take action.

FIG. 6 shows a state in which these accelerator control devices are replaced with artificial neural network-based device simulators 110, and a reinforcement learning controller 120 that communicates with the device simulators 110 is added.

By replacing the accelerator control devices in actual operation with artificial neural network-based device simulators 110 as shown in FIG. 6, the accelerator does not have to be taken out of operation in order to derive the optimal control parameters. Moreover, because the device simulators 110 are trained on an artificial neural network basis, they can closely approximate the characteristics of the actual accelerator control devices.

For convenience, FIG. 6 shows as an example a case in which the accelerator control devices are additionally included in the conventional EPICS, but the accelerator control devices may be configured as a separate system or may be configured as a single system together with the device simulators 110 described above.

In the above embodiment the feedback processing unit is included in each device simulator by way of example, but the feedback processing unit itself may instead be included in the reinforcement learning controller.

Meanwhile, the process of carrying out each of the above embodiments may of course be performed by a program or application stored in a predetermined (for example, computer-readable) recording medium. Here, the recording medium includes electronic recording media such as RAM (Random Access Memory), magnetic recording media such as a hard disk, and optical recording media such as a CD (Compact Disk).

The program stored in the recording medium may be executed on hardware such as a computer or a smartphone to carry out each of the above embodiments. In particular, at least one of the functional blocks according to the present invention described above may be implemented by such a program or application.

In addition, the present invention is not limited to the specific embodiments described above and can be practiced with various changes and modifications without departing from the gist of the present invention. It will be apparent that such changes and modifications are included in the present invention if they fall within the scope of the appended claims.

100: accelerator control system
110: device simulator
120: reinforcement learning controller

Claims

An accelerator control system for accelerator control devices, comprising:
a plurality of device simulators corresponding respectively to a plurality of accelerator control devices and performing learning and simulation based on an artificial neural network; and
a reinforcement learning controller that adjusts at least one control parameter corresponding to the plurality of device simulators and manages the resulting final accelerator output quality, generates a plurality of sets of learning data in which an output of an injector for the accelerator, a control parameter value of each device simulator, and a final accelerator output quality value are matched, and performs machine learning using the plurality of sets of learning data to calculate the values of the optimal control parameters that maximize the final accelerator output quality,
wherein each of the plurality of device simulators includes: a primary simulator; a noise processing unit that receives the output of the primary simulator, performs noise addition processing, and outputs the result; and a feedback processing unit that generates a feedback signal based on the output of the noise processing unit and provides it to the primary simulator.

The accelerator control system for accelerator control devices according to claim 1,
wherein the primary simulator, the noise processing unit, and the feedback processing unit are each composed of an artificial neural network and are formed by machine learning based on the output values produced for input values by the accelerator control device corresponding to each device simulator.

The accelerator control system for accelerator control devices according to claim 1,
wherein the reinforcement learning controller designates a control parameter set corresponding to a collection of the at least one control parameter provided in the plurality of device simulators, learns, through an artificial neural network-based learning process, changes to the values of the control parameters included in the control parameter set and the resulting final accelerator output quality, and then calculates the values of the optimal control parameters that maximize the final accelerator output quality.

The accelerator control system for accelerator control devices according to claim 3,
wherein the reinforcement learning controller calculates the optimal control parameter values one at a time, in the order in which the control parameters are included in the control parameter set.