RU2780340C2 - System for assistance in setting of installation operating mode, training device, and device for assistance in setting of operating mode

Info

Publication number: RU2780340C2
Application number: RU2020140013A
Authority: RU
Inventors: Такехито ЯСУИ; Сидзука ИКАВА; Акифуми ТОКИ; Масару СОГАБЕ; Юсуке СУВА
Original assignee: Тийода Корпорейшн; Гридинк
Priority date: 2018-05-08
Filing date: 2019-04-17
Publication date: 2022-09-21

Abstract

FIELD: measuring equipment.

SUBSTANCE: system 1 for assistance in setting of an installation operating mode, for an installation that performs a process formed by a set of devices, includes: a set of control devices 20, each of which acts on a controlled device 10 for feedback control; and device 30 for assistance in setting of an operating mode, which provides integrated assistance for setting the set of control devices 20, which perform a set of feedback control tasks respectively and independently. Device 30 for assistance in setting of an operating mode includes: a unit for obtaining a set of measured values, which obtains a set of measured values indicating the states of the set of controlled devices 10 controlled by the set of control devices 20, respectively; and a unit for determining control device adjustment parameters, which determines, based on the set of measured values obtained by the unit for obtaining the set of measured values, a set of control device adjustment parameters used by each of the set of control devices 20 to determine the manipulated variables for control to be input to the set of controlled devices, according to a policy learned by deep reinforcement learning.

EFFECT: obtaining a system and a device for assistance in setting of an installation operating mode.

11 cl, 8 dwg

Description

Technical field

[0001] The present invention relates to a plant operation mode setting support system for supporting the setting of a plant's operation mode, and to a learning device and an operation mode setting support device that can be used in the plant operation mode setting support system.

Background art

[0002] In plants that produce chemical products and industrial products, a series of processes is carried out by a large number of devices, such as reactors and heating furnaces. The large number of manipulated variables used to control these devices change the state of the process. In plants that run a multi-step process, these manipulated variables may interact with each other in complex ways. It is therefore not easy to predict the effect of changing a manipulated variable, and the control device adjustment parameter used to determine the manipulated variable is set by an experienced operator in order to operate the plant.

[0003] For example, Patent Document 1 and Patent Document 2 propose technologies for controlling a plant of this kind, which includes a plurality of control systems that can interfere with each other.

[0004] Patent Document 1 describes a technology that provides, among three or more control loops, a non-interference (decoupling) element that cancels mutual interference between the control loops. The non-interference element is computed by approximating the transfer function of each control loop, and the transfer function of the interfering element acting from another control loop, as a first-order lag response that includes dead time.

[0005] Patent Document 2 describes a technology that expresses the relationship between the valve position of a control valve and a variable representing the process state, which changes with the valve position, as a steady-state equation; calculates a CV value indicating the target valve position of the control valve from an analytical solution obtained for each control valve from that equation; calculates a CV value indicating the current valve position of the control valve from a detected value; calculates the deviation e between the two CV values; and acts on the process state by feedback control based on the deviation e.

[0006] [Patent Document 1] JP2007-11866

[Patent Document 2] JP2010-97254

Problem to be solved by the invention

[0007] It is difficult to mathematically approximate, with high accuracy, the behavior of the process value in each of a plurality of control systems. It is even more difficult to predict, with high accuracy, the behavior of the plant's values from a mathematically approximated representation of the control systems when an unpredictable disturbance occurs in control systems that may interact in complex ways. A technology is needed that enables stable plant operation even when a disturbance that could destabilize the plant's behavior occurs.

[0008] Against this background, a general purpose of the present invention is to provide a technology for realizing stable plant operation.

Means for solving the problem

[0009] A plant operation mode setting support system according to an embodiment of the present invention is a system for supporting the setting of the operation mode of a plant that executes a process formed by a plurality of devices. The system includes: a plurality of control devices, each of which acts on one or more controlled devices among the plurality of devices for feedback control; and an operation mode setting support device that provides integrated support for setting the plurality of control devices, which perform a plurality of feedback control tasks respectively and independently. Each of the plurality of control devices includes: a measured value acquisition unit that acquires a measured value indicating the state of the controlled device; a control device adjustment parameter acquisition unit that acquires a control device adjustment parameter for determining a manipulated variable for control to be input to the controlled device; a manipulated variable determination unit that determines the manipulated variable for control based on the measured value acquired by the measured value acquisition unit and the control device adjustment parameter acquired by the control device adjustment parameter acquisition unit; and a manipulated variable input unit that inputs the manipulated variable for control determined by the manipulated variable determination unit to the controlled device. The operation mode setting support device includes: a multiple measured value acquisition unit that acquires a plurality of measured values indicating the states of the plurality of controlled devices controlled by the plurality of control devices, respectively; and a control device adjustment parameter determination unit that determines, based on the plurality of measured values acquired by the multiple measured value acquisition unit, a plurality of control device adjustment parameters used by each of the plurality of control devices to determine the manipulated variables for control to be input to the plurality of controlled devices, according to a policy learned through deep reinforcement learning.
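The division of roles in paragraph [0009] can be illustrated with a minimal sketch. It is not taken from the specification: the PID control law, the class and method names (`ControlDevice`, `SettingSupportDevice`, `assist`), and the form of the adjustment parameter as a `(kp, ki, kd)` tuple are all assumptions made for illustration, and the policy is any callable standing in for one learned through deep reinforcement learning.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Sequence, Tuple

# Assumed form of a "control device adjustment parameter": PID gains.
Gains = Tuple[float, float, float]  # (kp, ki, kd)


@dataclass
class ControlDevice:
    """One independent feedback loop (the PID law is an assumption)."""
    setpoint: float
    gains: Gains = (1.0, 0.0, 0.0)
    _integral: float = field(default=0.0, init=False)
    _prev_error: float = field(default=0.0, init=False)

    def set_gains(self, gains: Gains) -> None:
        # Role of the adjustment parameter acquisition unit.
        self.gains = gains

    def step(self, measured: float, dt: float = 1.0) -> float:
        # Role of the manipulated variable determination unit:
        # combine the measured value with the current adjustment parameter.
        kp, ki, kd = self.gains
        error = self.setpoint - measured
        self._integral += error * dt
        derivative = (error - self._prev_error) / dt
        self._prev_error = error
        return kp * error + ki * self._integral + kd * derivative


class SettingSupportDevice:
    """Maps all measured values to one adjustment parameter per loop."""

    def __init__(self, policy: Callable[[Sequence[float]], List[Gains]]):
        # `policy` is a hypothetical stand-in for the learned policy.
        self.policy = policy

    def assist(self, controllers: Sequence[ControlDevice],
               measurements: Sequence[float]) -> List[Gains]:
        gains = self.policy(measurements)
        if len(gains) != len(controllers):
            raise ValueError("policy must return one parameter set per control device")
        for controller, g in zip(controllers, gains):
            controller.set_gains(g)
        return gains
```

In this sketch each control device still closes its own loop independently; the support device only sees all measurements at once and chooses the parameters, which is the integrated support the paragraph describes.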

[0010] The control device adjustment parameter determination unit may determine the plurality of control device adjustment parameters according to a policy learned through deep reinforcement learning. The learning is based on: the measured value of the controlled device, the target control value, and the manipulated variable for control that occur while the plant is operating; a reward value representing a stability index that expresses, in numerical terms, an evaluation of the measured value, the target control value, and the manipulated variable for control; and the control device adjustment parameter used to determine the manipulated variable for control.

[0011] The plant operation mode setting support system may further include a learning device that performs the deep reinforcement learning. The learning device may include: an action determination unit that acquires a plurality of measured values indicating the states of the plurality of controlled devices and outputs a plurality of control device adjustment parameters used by each of the plurality of control devices; and an evaluation function unit that calculates an evaluation value for a pair of i) the plurality of measured values indicating the states of the plurality of controlled devices that occur when the plurality of control devices control the plurality of controlled devices using the control device adjustment parameters output by the action determination unit, and ii) the control device adjustment parameters used. The evaluation function unit may be trained so as to reduce the error between i) the expected value of the reward that would be obtained if, while the plurality of controlled devices are in the states indicated by the plurality of measured values, the determined control device adjustment parameters are used, the manipulated variables for control determined by the plurality of control devices using those parameters are input to the plurality of controlled devices to update their states, and optimal control device adjustment parameters continue to be selected thereafter, and ii) the evaluation value calculated by the evaluation function unit.
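The training rule for the evaluation function in paragraph [0011] resembles a temporal-difference update, and can be sketched as follows. This is an illustration under stated assumptions, not the method of the specification: a linear function stands in for the deep network, the discount factor `gamma` and learning rate `lr` are hypothetical, and the "optimal parameters selected thereafter" are approximated by the action that the action determination unit would output for the next state.

```python
from typing import List, Sequence, Tuple


def q_value(weights: Sequence[float], state: Sequence[float],
            action: Sequence[float]) -> float:
    """Linear stand-in for the evaluation function Q(state, action).

    state  : measured values of the controlled devices
    action : control device adjustment parameters
    """
    features = list(state) + list(action)
    return sum(w * x for w, x in zip(weights, features))


def td_update(weights: Sequence[float],
              state: Sequence[float], action: Sequence[float],
              reward: float,
              next_state: Sequence[float], next_action: Sequence[float],
              gamma: float = 0.99, lr: float = 0.01) -> Tuple[List[float], float]:
    """One gradient step that shrinks the error between the target and Q."""
    features = list(state) + list(action)
    # i) estimate of the expected reward if good parameters keep being chosen
    target = reward + gamma * q_value(weights, next_state, next_action)
    # ii) the evaluation value currently produced by the evaluation function
    error = target - q_value(weights, state, action)
    new_weights = [w + lr * error * x for w, x in zip(weights, features)]
    return new_weights, error
```

A deep implementation would replace `q_value` with a neural network and the manual weight update with an optimizer step, but the quantity being minimized has the same shape.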

[0012] The reward value may represent a stability index indicating, in numerical terms, how good the state of the process is.

[0013] The reward value may represent a stability index indicating, in numerical terms, how good the state of the process is according to one or more of the following criteria: (1) the difference between the plurality of measured values and the target control values is small; (2) the plurality of measured values do not oscillate; or (3) the time required for the plurality of measured values to stabilize is short.
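One way to turn the three criteria of paragraph [0013] into a single number is sketched below. The specification does not give a formula, so everything here is an assumption for illustration: the function name `stability_reward`, the weights, the settling band of 2% of the target, and the use of direction reversals as the oscillation measure.

```python
from typing import Sequence


def stability_reward(measured: Sequence[float], target: float,
                     band: float = 0.02, w_err: float = 1.0,
                     w_osc: float = 1.0, w_settle: float = 1.0) -> float:
    """Hypothetical stability index: 0 is best, more negative is worse."""
    if not measured:
        raise ValueError("at least one measured value is required")
    errors = [m - target for m in measured]

    # (1) Small difference between measured values and the target value.
    mean_abs_error = sum(abs(e) for e in errors) / len(errors)

    # (2) No oscillation: count reversals of direction between samples.
    diffs = [b - a for a, b in zip(measured, measured[1:])]
    reversals = sum(1 for a, b in zip(diffs, diffs[1:]) if a * b < 0)
    reversal_rate = reversals / max(len(diffs) - 1, 1)

    # (3) Short settling time: fraction of the window that passes before
    # the value stays inside the tolerance band around the target.
    tolerance = band * max(abs(target), 1e-9)
    settle_index = 0
    for i, e in enumerate(errors):
        if abs(e) > tolerance:
            settle_index = i + 1
    settle_fraction = settle_index / len(errors)

    return -(w_err * mean_abs_error
             + w_osc * reversal_rate
             + w_settle * settle_fraction)
```

With these assumed definitions, a trajectory that approaches the target smoothly and settles early scores higher than one that overshoots and rings, even when both end near the target.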

[0014] The control device adjustment parameter determination unit may determine the plurality of control device adjustment parameters used when plant operation is started or stopped, according to a policy learned through deep reinforcement learning that uses the measured values and manipulated variables for control that occur when plant operation is started or stopped, together with the control device adjustment parameters.

[0015] The control device adjustment parameter determination unit may determine the plurality of control device adjustment parameters used when a disturbance occurs or when the operation mode is changed during plant operation, according to a policy learned through deep reinforcement learning that uses the measured values and manipulated variables for control that occur when a disturbance occurs or when the operation mode is changed during plant operation, together with the control device adjustment parameters.

[0016] The operation mode setting support device may further include a mode switching unit that instructs the control device, based on the policy learned through deep reinforcement learning, whether to perform control in an automatic mode, in which the control device automatically inputs the manipulated variable for control to the controlled device, or in a manual mode, in which the control device inputs the manipulated variable for control to the controlled device in response to an instruction on the manipulated variable from an operator.
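The effect of the mode instruction in paragraph [0016] on a single control device can be sketched as follows. The names `Mode` and `select_manipulated_variable` are hypothetical, and holding the last output in manual mode until the operator gives an instruction is an assumption of this sketch rather than something stated in the text.

```python
from enum import Enum
from typing import Optional


class Mode(Enum):
    AUTO = "auto"      # control device computes and inputs the variable itself
    MANUAL = "manual"  # control device inputs the variable the operator instructs


def select_manipulated_variable(mode: Mode, computed_mv: float,
                                operator_mv: Optional[float],
                                last_mv: float) -> float:
    """Choose the manipulated variable to input to the controlled device."""
    if mode is Mode.AUTO:
        return computed_mv
    # Manual mode: use the operator's instruction if one has been given;
    # otherwise hold the previous output (assumed behaviour).
    return operator_mv if operator_mv is not None else last_mv
```

In the arrangement described above, the `mode` argument would come from the mode switching unit of the support device rather than from the control device itself.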

[0017] The operation mode setting support device may communicate the plurality of control device adjustment parameters determined by the control device adjustment parameter determination unit to the respective control devices, and each control device may acquire the control device adjustment parameter communicated by the operation mode setting support device using its control device adjustment parameter acquisition unit.

[0018] The operation mode setting support device may present the plurality of control device adjustment parameters determined by the control device adjustment parameter determination unit to an operator, and each control device may acquire the control device adjustment parameter entered by the operator using its control device adjustment parameter acquisition unit.

[0019] Another embodiment of the present invention relates to an operation mode setting support device. The device provides integrated support for setting a plurality of control devices, each of which acts on one or more controlled devices, among a plurality of devices forming a process executed in a plant, for feedback control. The device includes: a multiple measured value acquisition unit that acquires a plurality of measured values indicating the states of the plurality of controlled devices controlled by the plurality of control devices, respectively; and a control device adjustment parameter determination unit that determines, based on the plurality of measured values acquired by the multiple measured value acquisition unit, a plurality of control device adjustment parameters used by each of the plurality of control devices to determine the manipulated variables for control to be input to the plurality of controlled devices, according to a policy learned through deep reinforcement learning.

[0020] Another embodiment of the present invention relates to a learning device. The learning device includes: an action determination unit that acquires a plurality of measured values indicating the states of a plurality of controlled devices forming a process executed in a plant, and outputs a plurality of control device adjustment parameters used by each of a plurality of control devices that act on the plurality of controlled devices for automatic feedback control, respectively; and an evaluation function unit that calculates an evaluation value for a pair of i) the plurality of measured values indicating the states of the plurality of controlled devices that occur when the plurality of control devices control the plurality of controlled devices using the control device adjustment parameters output by the action determination unit, and ii) the control device adjustment parameters used. The evaluation function unit may be trained so as to reduce the error between i) the expected value of the reward that would be obtained if, while the plurality of controlled devices are in the states indicated by the plurality of measured values, the determined control device adjustment parameters are used, the manipulated variables for control determined by the plurality of control devices using those parameters are input to the plurality of controlled devices to update their states, and optimal control device adjustment parameters continue to be selected thereafter, and ii) the evaluation value calculated by the evaluation function unit.

[0021] Optional combinations of the aforementioned constituent elements, and implementations of the invention in the form of methods, devices, systems, recording media, and computer programs, may also be practiced as additional modes of the present invention.

Advantage of the invention

[0022] The present invention can provide a technology for realizing stable plant operation.

Brief description of the drawings

[0023] Fig. 1 shows the overall configuration of a plant operation mode setting support system according to the embodiment;

Fig. 2 shows an exemplary configuration of a compressor system, presented as an example of a process subject to control;

Fig. 3 schematically shows a control method in a prior-art plant;

Fig. 4 schematically shows the configuration of an operation mode setting support device according to the embodiment;

Fig. 5 shows the configuration of the operation mode setting support device and a control device according to the embodiment;

Fig. 6 schematically shows the configuration of a learning device according to the embodiment;

Fig. 7 shows the configuration of the learning device according to the embodiment; and

Fig. 8 shows an example of a screen displayed on the display device of a user operation panel.

Mode for carrying out the invention

[0024] FIG. 1 shows the overall configuration of a plant operation mode setting support system according to the embodiment. The plant operation mode setting support system 1, which supports setting the operation mode of a plant 3, includes the plant 3 for producing chemical products, industrial products, and so on, and a learning device 2 that performs deep reinforcement learning to learn a policy for determining a plurality of control device adjustment parameters used to set the operation mode of the plant 3. The plant 3 includes controlled devices 10 that form the process performed in the plant 3, a plurality of control devices 20 that each apply feedback control to one or more controlled devices 10, and an operation mode setting support device 30 that provides integrated support for setting the plurality of control devices 20, which perform a plurality of feedback control tasks respectively and independently. The operation mode setting support device 30 determines, according to the policy learned through deep reinforcement learning in the learning device 2, the plurality of control device adjustment parameters that each of the control devices 20 uses to determine the manipulated variables to be input to the controlled devices 10.

[0025] FIG. 2 shows an exemplary configuration of a compressor system, presented as an example of a process subject to control. The compressor system shown in the figure includes, as the plurality of controlled devices 10 that form the process, a heat exchanger for cooling an object to be cooled with propane refrigerant, a propane compressor for compressing propane gas evaporated in the heat exchanger, and so on. The compressor system further includes, as the control devices 20 that control each of the controlled devices 10 independently and automatically, PID controllers such as a liquid level controller LC, a pressure controller PC, a rotation speed controller SC, and an anti-surge controller ASC.

[0026] The liquid level controller LC controls the opening of the supply valve for the propane refrigerant according to the liquid level of the propane refrigerant, in order to keep the liquid level in the heat exchanger constant. The pressure controller PC controls the rotation speed controller SC according to the pressure of the propane gas evaporated from the heat exchanger, in order to keep the pressure of the propane gas introduced into the propane compressor constant. The rotation speed controller SC controls the rotation speed of the gas turbine GT, in response to a command from the pressure controller PC, to adjust the pressure of the propane gas introduced into the propane compressor. The anti-surge controller ASC controls the opening of the anti-surge valve according to the propane gas pressure at the outlet of the propane compressor, in order to prevent surge in the propane compressor. Of these PID controllers, the rotation speed controller SC operates in response to a command from the pressure controller PC. The other three PID controllers automatically control the controlled devices 10 respectively and independently.

[0027] When the amount of the object to be cooled decreases rapidly in this compressor system, for example due to a disturbance, the cooling duty decreases, so that the amount of propane evaporated in the heat exchanger decreases and the liquid level of the propane refrigerant rises. When this occurs, the liquid level controller LC reduces the valve opening so as to reduce the inflow of propane refrigerant and keep the liquid level constant. As the amount of evaporated propane decreases, the measured pressure value input to the pressure controller PC decreases. In response, the pressure controller PC instructs the rotation speed controller SC to decrease the rotation speed of the gas turbine GT.

[0028] However, when the pressure of the propane gas introduced into the propane compressor decreases as a result of the decrease in the rotation speed of the gas turbine GT, the measured pressure value input to the anti-surge controller ASC decreases, so that the anti-surge controller ASC increases the opening of the anti-surge valve in order to avoid surge in the propane compressor. This causes the measured pressure value input to the pressure controller PC to increase, so that the pressure controller PC instructs the rotation speed controller SC to increase the rotation speed of the gas turbine GT.

[0029] When the pressure of the propane gas introduced into the propane compressor increases as a result of the increase in the rotation speed of the gas turbine GT, the measured pressure value input to the anti-surge controller ASC increases, so that the anti-surge controller ASC reduces the opening of the anti-surge valve. This reduces the measured pressure value input to the pressure controller PC, so that the pressure controller PC again instructs the rotation speed controller SC to decrease the rotation speed of the gas turbine GT.

[0030] Thus, in a process that includes a plurality of control systems subjected to automatic and independent feedback control by the plurality of control devices 20, respectively, the behavior may become unstable when the effects of the automatic feedback control tasks interfere with one another. For example, control may act in opposite directions periodically, resulting in oscillation of the controlled variable. Even in such a case, the system is expected to converge to stable operation eventually, provided that appropriate PID parameters are set in the respective PID controllers. However, if the disturbance that induced the oscillation, or the deviation caused by a change in the operation mode, is severe or abrupt, it may take a long time before the system converges to stable operation, or the oscillation of the controlled variable may persist.

[0031] FIG. 3 schematically shows a control method in a prior-art plant. The process 12 performed in the plant is formed by a plurality of controlled devices 10a, 10b, …, 10n. The controlled devices 10a, 10b, …, 10n are controlled by control devices 20a, 20b, …, 20n, respectively. In the example shown in FIG. 2, the controlled devices 10a, 10b, …, 10n are the heat exchanger, the propane compressor, and so on. The control devices 20a, 20b, …, 20n are the liquid level controller LC, the pressure controller PC, the rotation speed controller SC, the anti-surge controller ASC, and so on.

[0032] In the prior-art plant, it is difficult to predict the effect of changing the three types of control device adjustment parameters (hereinafter referred to as "PID parameters") that the plurality of control devices 20 use for PID control, namely the proportional gain (P gain), the integral gain (I gain), and the derivative gain (D gain). Therefore, the PID parameters are rarely changed. If a change is necessary, the operator enters the parameter into the corresponding control device 20 manually. Consequently, if the state of the process 12 becomes unstable, for example due to a disturbance, the automatic control by the mutually interfering control devices 20a, 20b, …, 20n must be stabilized by the operator entering appropriate PID parameters into the respective control devices 20. The time required to converge to stable operation has depended on the experience and skill of the operator.
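For orientation, the three gains named above enter the standard textbook (parallel-form) PID law as shown below. The symbols are notation assumed here for illustration and do not appear in the specification: e(t) is the deviation between the target value r(t) and the measured value y(t), u(t) is the manipulated variable, and K_P, K_I, K_D are the P, I, and D gains.

```latex
e(t) = r(t) - y(t), \qquad
u(t) = K_P\, e(t) + K_I \int_0^{t} e(\tau)\, d\tau + K_D\, \frac{de(t)}{dt}
```

Because each control device applies such a law to its own loop, a change to any one gain alters how that loop reacts to deviations caused by the others, which is one way to read the statement that the effect of a change is hard to predict.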

[0033] FIG. 4 schematically shows the configuration of the operation mode setting support device according to the embodiment. The operation mode setting support device 30 determines the PID parameters to be input to the plurality of control devices 20 according to the policy learned through deep reinforcement learning in the learning device 2, as described below. The policy determines the PID parameters that maximize a score, based on an action-value function that calculates the score of a combination of values that can be set as PID parameters from the plurality of measured values indicating the states of the plurality of controlled devices 10, the target values for the values subject to control in the controlled devices 10, and the values of the manipulated variables input to the controlled devices 10. The action-value function is learned by the learning device 2 so as to calculate a high score for PID parameters that enable the values subject to control to approach the target values in a short time while also keeping the process as a whole stable. In an alternative example, the action-value function used to determine the PID parameters may use the values of other parameters in addition to, or instead of, the measured values, the target values for the values subject to control, and the values of the manipulated variables, in order to calculate the score of a combination of values that can be set as PID parameters. For example, the values of current or past PID parameters, the value of a parameter indicating a disturbance factor, and so on, may be used. Alternatively, the rate of change or the amount of change of such parameters may be used in addition to, or instead of, their absolute values.
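The selection rule described in this paragraph (score every combination of settable PID parameter values and keep the highest-scoring one) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: `select_pid_parameters`, the dictionary layout of `state`, and `toy_action_value` are all hypothetical names, and `toy_action_value` is a hand-written stand-in for the action-value function that the learning device 2 would actually learn.

```python
from itertools import product


def toy_action_value(state, combo):
    """Hypothetical stand-in for the learned action-value function.

    It ignores the state and simply prefers gains near (1.0, 0.1, 0.0);
    a learned function would score the combination given the state.
    """
    return -sum((kp - 1.0) ** 2 + (ki - 0.1) ** 2 + kd ** 2
                for kp, ki, kd in combo)


def select_pid_parameters(state, candidates_per_controller, action_value):
    """Return the combination of PID parameters with the highest score.

    state: measured values, target values, and manipulated variables
           (the layout is an assumption made for this sketch).
    candidates_per_controller: one list of (kp, ki, kd) tuples per
           control device.
    action_value: callable (state, combination) -> score.
    """
    best_combo, best_score = None, float("-inf")
    # Enumerate every joint setting across all control devices.
    for combo in product(*candidates_per_controller):
        score = action_value(state, combo)
        if score > best_score:
            best_combo, best_score = combo, score
    return best_combo, best_score
```

Scoring joint combinations, rather than each controller separately, is what lets the score reflect interference between loops. Exhaustive enumeration grows quickly with the number of controllers and candidates, so it is practical only for small candidate sets.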

[0034] The plurality of PID parameters determined by the operation mode setting support device 30 may be presented to the operator, so that the operator can enter PID parameters into the control devices 20 by referring to the presented values. Alternatively, the operation mode setting support device 30 may input the PID parameters directly to the control devices 20. This significantly reduces the operator's workload and enables the plant 3 to operate stably regardless of the experience and skill of the operator.

[0035] FIG. 5 shows the configuration of the operation mode setting support device and the control device according to the embodiment. The control device 20 is provided with a control unit 21 and a user operation panel 22.

[0036] The user operation panel 22 displays, on a display device, the plurality of measured values indicating the states of the plurality of controlled devices 10 included in the plant 3, the values of the manipulated variables set in the controlled devices 10 by the control devices 20, the values of the PID parameters set in the control devices 20, and measured values of the output indicating the result of operating the plant 3. The user operation panel 22 also accepts input of a PID parameter value from the operator.

[0037] The control device 20 is provided with a measured value acquisition unit 23, a target value acquisition unit 24, a PID parameter acquisition unit 25, a manipulated variable determination unit 26, and a manipulated variable input unit 27. These functions are implemented by hardware components such as the CPU and memory of an arbitrary computer, by a program loaded into the memory, and so on. The figure depicts functional blocks implemented through the cooperation of these elements. Therefore, those skilled in the art will understand that the functional blocks may be implemented in a variety of ways: by hardware only, by software only, or by a combination of hardware and software.

[0038] The measured value acquisition unit 23 acquires a measured value indicating the state of the controlled device 10. When the target value for the value subject to control in the controlled device 10 is variable, the target value acquisition unit 24 acquires the target value. In the example shown in FIG. 2, the target value of the liquid level of the propane refrigerant in the heat exchanger is a fixed value, whereas the target value of the rotation speed of the gas turbine is variably controlled by the pressure controller PC. Therefore, the target value acquisition unit 24 acquires the target value of the rotation speed of the gas turbine from the pressure controller PC.

[0039] The PID parameter acquisition unit 25 acquires the PID parameter used to determine the manipulated variables to be input to the controlled device 10. In the automatic mode, in which the operation mode setting support device 30 automatically inputs the PID parameters to the control devices 20, the operation mode setting support device 30 communicates the determined PID parameters to the respective control devices 20. The PID parameter acquisition unit 25 of each control device 20 then acquires the PID parameter directly from the operation mode setting support device 30. In the manual mode, in which the operator enters the PID parameter into the control device 20, the operation mode setting support device 30 presents the determined PID parameters to the operator via the user operation panel 22. The PID parameter acquisition unit 25 then acquires the PID parameter entered by the operator.
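The two acquisition paths described above can be sketched as a single function. The names `Mode` and `acquire_pid_parameter` are hypothetical, and raising an error when no operator input is available in manual mode is a choice made for this sketch; the specification does not say what happens in that case.

```python
from enum import Enum


class Mode(Enum):
    AUTOMATIC = "automatic"  # support device 30 inputs the parameter directly
    MANUAL = "manual"        # operator enters the parameter via panel 22


def acquire_pid_parameter(mode, recommended, operator_input=None):
    """Return the PID parameter the control device should use.

    recommended: (kp, ki, kd) determined by the support device 30.
    operator_input: (kp, ki, kd) typed in by the operator, if any.
    """
    if mode is Mode.AUTOMATIC:
        # Automatic mode: the recommended value is applied as-is.
        return recommended
    # Manual mode: the recommendation is only shown to the operator;
    # the value actually used is whatever the operator entered.
    if operator_input is None:
        raise ValueError("manual mode requires a value entered by the operator")
    return operator_input
```

In manual mode the operator remains free to enter a value different from the recommendation, which matches the description of the recommendation being presented for reference.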

[0040] The manipulated variable determination unit 26 determines the manipulated variables to be set in the controlled device 10 based on the measured value acquired by the measured value acquisition unit 23, the target value acquired by the target value acquisition unit 24, and the PID parameter acquired by the PID parameter acquisition unit 25. The manipulated variable determination unit 26 may determine the manipulated variables using any publicly known PID control technique. The manipulated variable input unit 27 inputs the manipulated variables determined by the manipulated variable determination unit 26 to the controlled device 10.
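As an illustration only, the determination performed by unit 26 could look like the discrete-time PID step below. The specification leaves the PID technique open ("any publicly known"), so this sketch assumes the simplest form: rectangular integration, a backward-difference derivative, and a fixed sample time `dt`. The names `PIDParams` and `PIDController` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PIDParams:
    kp: float  # proportional gain (P gain)
    ki: float  # integral gain (I gain)
    kd: float  # derivative gain (D gain)


class PIDController:
    """Minimal discrete PID loop; one instance per control device."""

    def __init__(self, params, dt):
        self.params = params
        self.dt = dt              # sample time, assumed constant
        self._integral = 0.0      # accumulated deviation
        self._prev_error = None   # deviation at the previous step

    def set_params(self, params):
        # Corresponds to receiving a new PID parameter (unit 25).
        self.params = params

    def step(self, measured, target):
        """Compute the manipulated variable for one sample."""
        error = target - measured
        self._integral += error * self.dt
        # No derivative contribution on the first sample.
        if self._prev_error is None:
            derivative = 0.0
        else:
            derivative = (error - self._prev_error) / self.dt
        self._prev_error = error
        p = self.params
        return p.kp * error + p.ki * self._integral + p.kd * derivative
```

Keeping the integral and previous deviation inside the controller means `set_params` can swap gains mid-run without resetting the loop state. A production controller would typically add output limits and anti-windup, both omitted here.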

[0041] The operation mode setting support device 30 is provided with a control unit 31. The control unit 31 is provided with a multiple measured value acquisition unit 32, a PID parameter determination unit 33, a PID parameter output unit 34, a mode switching unit 35, and a policy update unit 36. These functions may likewise be implemented in a variety of ways: by hardware only, by software only, or by a combination of hardware and software.

[0042] The multiple measured value acquisition unit 32 acquires a plurality of measured values indicating the states of the plurality of controlled devices 10 controlled by the plurality of control devices 20, respectively. The multiple measured value acquisition unit 32 acquires all measured values indicating the states of all controlled devices 10 controlled by the control devices 20 that receive integrated support from the operation mode setting support device 30.

[0043] The PID parameter determination unit 33 determines, based on the plurality of measured values acquired by the multiple measured value acquisition unit 32, a plurality of PID parameters used to determine the manipulated variables that the plurality of control devices 20 should respectively input to the plurality of controlled devices 10. From among the PID parameters that can be selected in the state identified by the measured values acquired by the multiple measured value acquisition unit 32, the PID parameter determination unit 33 selects the PID parameter that maximizes the evaluation value, based on the action-value function learned by the learning device 2. As described below, the action-value function is a neural network that, in response to input of the plurality of measured values indicating the states of the plurality of controlled devices 10, outputs an evaluation value for each of the plurality of selectable PID parameters. The action-value function is learned through deep reinforcement learning in the learning device 2.
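The selection step in paragraph [0043] can be illustrated with a minimal sketch. It assumes the action-value function is available as a callable that returns one evaluation value per selectable PID parameter set; the names `select_pid_params`, `toy_q_function`, and `CANDIDATES`, and the (Kp, Ki, Kd) tuple layout, are hypothetical and do not come from the patent.

```python
from typing import Callable, Sequence, Tuple

PidParams = Tuple[float, float, float]  # (Kp, Ki, Kd); illustrative layout only


def select_pid_params(
    q_function: Callable[[Sequence[float]], Sequence[float]],
    measured_values: Sequence[float],
    candidates: Sequence[PidParams],
) -> PidParams:
    """Return the candidate whose evaluation value is the highest."""
    scores = q_function(measured_values)
    if len(scores) != len(candidates):
        raise ValueError("one evaluation value per candidate is required")
    best_index = max(range(len(candidates)), key=lambda i: scores[i])
    return candidates[best_index]


def toy_q_function(measured_values: Sequence[float]) -> Sequence[float]:
    """Hypothetical stand-in for the trained neural network."""
    error = abs(measured_values[0])
    return [-(error - 1.0) ** 2, -(error - 2.0) ** 2, -(error - 3.0) ** 2]


CANDIDATES = [(1.0, 0.1, 0.0), (2.0, 0.2, 0.05), (4.0, 0.4, 0.1)]
chosen = select_pid_params(toy_q_function, [2.1], CANDIDATES)
print(chosen)
```

In a real deployment the callable would wrap the trained network received from the learning device; the toy scoring rule is used here only so that the sketch runs on its own.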

[0044] Based on the policy learned through deep reinforcement learning, the mode switching unit 35 instructs the control device 20 whether to perform control in an automatic mode, in which the control device 20 automatically inputs the manipulated variable to the controlled device 10, or in a manual mode, in which the control device 20 inputs the manipulated variable to the controlled device 10 in response to an operator's instruction specifying the manipulated variable.

[0045] The policy update unit 36 acquires a trained neural network from the learning device 2 as the policy and updates the PID parameter determination unit 33. This makes it possible to acquire a neural network whose accuracy has been improved by the learning device 2, even while the plant 3 is operating, and to update the action-value function used to determine the action. A more appropriate PID parameter can therefore be selected than would otherwise be the case.

[0046] FIG. 6 schematically shows the configuration of the learning device according to the embodiment. The learning device 2 uses a simulator 40 to perform deep reinforcement learning and thereby learn a policy for integrated control of the behavior of all of the controlled devices 10 that form the process 12 executed in the plant 3. The simulator 40 includes a process simulator 42 that simulates the process 12 executed in the plant 3, and control device simulators 43 that simulate the respective control devices 20 controlling the plurality of controlled devices 10. The process simulator 42 includes controlled device simulators 41 that respectively simulate the plurality of controlled devices 10 forming the process 12. The learning device 2 determines the PID parameter that each control device simulator 43 uses to determine the manipulated variable, and inputs the determined PID parameter to the simulator 40. The learning device 2 repeats, a plurality of times in time sequence, the step of acquiring a plurality of measured values indicating the result of control performed with the input PID parameter, and thereby learns the behavior of the plant 3. The learning device 2 learns a policy for integrated determination of PID parameters that allows the plurality of control devices 20 to work in coordination so that the plant 3 operates stably.

[0047] FIG. 7 shows the configuration of the learning device 2 according to the embodiment. The learning device 2 includes an action determination unit 4, a reward value acquisition unit 5, an action-value function update unit 6, a neural network 7, a learning control unit 8, and a multiple measured value acquisition unit 9. These functions may be implemented in a variety of ways: by hardware only, by software only, or by a combination of hardware and software.

[0048] The learning device 2 learns, through deep reinforcement learning, the policy by which the PID parameter determination unit 33 of the operation mode setting support device 30 determines the values of the PID parameters to be set in the respective control devices 20.

[0049] Reinforcement learning determines a policy that maximizes the reward obtained through the actions that an agent placed in a given environment takes on that environment. A cycle of steps is repeated in time sequence: the agent takes an action on the environment, and the environment updates its state, evaluates the action, and notifies the agent of the new state and the reward. The action-value function and the policy are optimized to maximize the expected value of the total reward obtained.

[0050] In this embodiment, the number of combinations of the state s of the plant 3, identified by the measured values of the plurality of controlled devices 10, and the action a of inputting PID parameters to the plurality of control devices 20 in the state s is enormous. Deep reinforcement learning, in which the action-value function is approximated by the neural network 7, is therefore performed. The deep reinforcement learning algorithm may be a deep Q-network (DQN), a double DQN, or any other algorithm. The neural network 7 may be a feed-forward neural network such as a multilayer perceptron, a simple perceptron, or a convolutional neural network. Alternatively, a neural network of any other form may be used. The inputs to the input layer of the neural network 7 are all of the measured values PVn indicating the states of all of the controlled devices 10, the target values SVn of the controlled quantities in all of the controlled devices 10, and the values MVn of the manipulated variables input from all of the control devices 20 to all of the controlled devices 10. The values of the PID parameters that can be set in the control device 20 are output from the output layer. If the action-value function used to determine the PID parameters uses the values of other parameters in addition to, or instead of, the measured values PVn, the target values SVn, and the manipulated variable values MVn, the values of those other parameters are likewise input to the input layer of the neural network 7.
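As an illustration of the input layout described in paragraph [0050], the following sketch concatenates PVn, SVn, and MVn into one input vector and passes it through a small feed-forward network. It is only a sketch under stated assumptions: the layer sizes, the ReLU activation, the random untrained weights, and the reading of the output layer as one evaluation value per selectable PID parameter set are illustrative choices, and the names `build_input`, `init_layers`, and `forward` are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Layer = Tuple[List[List[float]], List[float]]  # (weights[out][in], biases[out])


def build_input(pv: Sequence[float], sv: Sequence[float], mv: Sequence[float]) -> List[float]:
    """Concatenate PVn, SVn and MVn of all control loops into one input vector."""
    if not (len(pv) == len(sv) == len(mv)):
        raise ValueError("PV, SV and MV must cover the same control loops")
    return [*pv, *sv, *mv]


def init_layers(sizes: Sequence[int], rng: random.Random) -> List[Layer]:
    """Create randomly initialised (untrained) fully connected layers."""
    layers = []
    for n_in, n_out in zip(sizes, sizes[1:]):
        weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
        layers.append((weights, [0.0] * n_out))
    return layers


def forward(x: Sequence[float], layers: Sequence[Layer]) -> List[float]:
    """Forward pass: ReLU on hidden layers, linear output layer."""
    activation = list(x)
    for index, (weights, biases) in enumerate(layers):
        z = [sum(w * a for w, a in zip(row, activation)) + b
             for row, b in zip(weights, biases)]
        is_output = index == len(layers) - 1
        activation = z if is_output else [max(0.0, v) for v in z]
    return activation


N_LOOPS = 2        # assumed number of control loops
N_CANDIDATES = 4   # assumed number of selectable PID parameter sets
network = init_layers([3 * N_LOOPS, 8, N_CANDIDATES], random.Random(0))
features = build_input(pv=[50.2, 1.8], sv=[50.0, 2.0], mv=[0.35, 0.60])
scores = forward(features, network)
print(len(features), len(scores))
```

The input width grows as three values per control loop, which is why any additional parameters mentioned in the paragraph would simply extend the same vector.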

[0051] The learning control unit 8 determines the learning policy and details, and executes the deep reinforcement learning. The learning control unit 8 sets an initial condition in the simulator 40 to start a trial and then repeats, a predetermined number of times, a step of inputting PID parameters to the simulator 40 and acquiring a plurality of measured values indicating the state of the plant 3, controlled with the input PID parameters, after a predetermined period of time has elapsed. When the predetermined number of steps has been completed, the learning control unit 8 ends that trial and sets the initial condition again to start the next trial. For example, to learn the behavior of the plant 3 when a disturbance or a change in the operation mode occurs during steady operation of the plant 3, the learning control unit 8 causes the controlled device simulator 11 and the control device simulator 43, which form the simulator 40, to start learning with the measured values, target values, and manipulated variable values during steady operation set as the initial values. The learning control unit 8 generates a disturbance or a change in the operation mode at a randomly determined time and inputs the corresponding value to the simulator 40. To learn the behavior of the plant 3 at startup, the learning control unit 8 causes the simulator 40 to start learning with the values occurring while operation is stopped set as the initial values, and to learn the behavior of the plant 3 until the system settles into steady operation. To learn the behavior of the plant 3 during shutdown, the learning control unit 8 causes the simulator 40 to start learning with the values during steady operation set as the initial values, to stop the operation of the plant 3, and to learn the behavior of the plant 3 until its operation has stopped. If a predetermined condition indicating that the current trial clearly will not produce a favorable result is satisfied, for example when the acquired reward value is less than a predetermined value, the learning control unit 8 may abort the trial before the predetermined number of steps has been completed and start the next trial.
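The trial structure in paragraph [0051] (set an initial condition, repeat a fixed number of steps, abort early when the reward falls below a threshold) can be sketched as follows. `ToySimulator` is a hypothetical stand-in for the simulator 40 that tracks a single control error, and the reward used here (the negative absolute error) is an assumption made only to keep the sketch runnable.

```python
from typing import Callable, Tuple


class ToySimulator:
    """Hypothetical stand-in for simulator 40: tracks one control error."""

    def reset(self, initial_error: float) -> float:
        self.error = initial_error
        return self.error

    def step(self, gain: float) -> float:
        # Each step scales the error by (1 - gain); gains above 2 make it grow.
        self.error *= (1.0 - gain)
        return self.error


def run_trial(
    simulator: ToySimulator,
    choose_gain: Callable[[float], float],
    initial_error: float,
    max_steps: int,
    abort_below: float,
) -> Tuple[int, float, bool]:
    """Run one trial; return (steps executed, total reward, aborted early?)."""
    state = simulator.reset(initial_error)      # set the initial condition
    total_reward = 0.0
    for step in range(1, max_steps + 1):
        state = simulator.step(choose_gain(state))
        reward = -abs(state)                    # assumed reward: small error is good
        total_reward += reward
        if reward < abort_below:                # trial is clearly unfavorable
            return step, total_reward, True
    return max_steps, total_reward, False


good = run_trial(ToySimulator(), lambda s: 0.5, 1.0, max_steps=5, abort_below=-3.0)
bad = run_trial(ToySimulator(), lambda s: 3.0, 1.0, max_steps=5, abort_below=-3.0)
print(good, bad)
```

The stable gain runs all five steps, while the unstable gain is cut off after two steps, which mirrors the early-abort condition described in the paragraph.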

[0052] The action determination unit 4 determines the plurality of PID parameters to be input to the simulator 40. It determines the PID parameters either randomly or based on the action-value function represented by the neural network 7. The action determination unit 4 may choose, in accordance with any publicly known algorithm such as the ε-greedy method, whether to determine the PID parameters randomly or to select the PID parameters that maximize the evaluation value expected from the action-value function. This allows learning to proceed efficiently while a wide variety of options are tried, and thus shortens the time until learning converges.
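The choice between a random action and the best-scoring action described above matches the standard ε-greedy rule, which can be sketched as follows. The function name `epsilon_greedy` and the example scores are hypothetical; the indices stand for selectable PID parameter sets.

```python
import random
from typing import Sequence


def epsilon_greedy(scores: Sequence[float], epsilon: float, rng: random.Random) -> int:
    """With probability epsilon pick a random index, otherwise the best-scoring one."""
    if rng.random() < epsilon:
        return rng.randrange(len(scores))                    # explore
    return max(range(len(scores)), key=lambda i: scores[i])  # exploit


rng = random.Random(0)
greedy_pick = epsilon_greedy([0.1, 0.9, 0.3], epsilon=0.0, rng=rng)
random_picks = [epsilon_greedy([0.1, 0.9, 0.3], epsilon=1.0, rng=rng) for _ in range(20)]
print(greedy_pick, random_picks)
```

A common practice, not stated in the patent, is to start with a large ε and reduce it as training progresses so that exploration gives way to exploitation.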

[0053] The multiple measured value acquisition unit 9 acquires, from the simulator 40, a plurality of measured values indicating the states of the plurality of controlled device simulators 41. The reward value acquisition unit 5 acquires a reward value for the state of the plant 3 indicated by the plurality of measured values acquired by the multiple measured value acquisition unit 9. The reward value is a stability index that expresses in numerical terms how good the state of the process 12 executed in the plant 3 is. More specifically, the reward value is a stability index that quantifies how good the process state is according to one or more of the following criteria: (1) the difference between the plurality of measured values and the control target values is small; (2) the plurality of measured values do not oscillate; and (3) the time required for the plurality of measured values to stabilize is short. For example, the reward value is defined so that it is higher the smaller the difference between the measured values and the control target values, the smaller the oscillation of the measured values, and the shorter the time required for the measured values to stabilize.
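One possible way to turn the three criteria above into a single number is sketched below. The patent does not give a formula, so the weighted sum, the tolerance band, the weights, and the names `settling_step` and `stability_reward` are all assumptions chosen for illustration.

```python
from typing import Sequence


def settling_step(series: Sequence[float], target: float, tolerance: float) -> int:
    """Index of the first sample from which the series stays within the tolerance band."""
    last_outside = -1
    for i, value in enumerate(series):
        if abs(value - target) > tolerance:
            last_outside = i
    return last_outside + 1


def stability_reward(
    trajectories: Sequence[Sequence[float]],
    targets: Sequence[float],
    tolerance: float = 0.05,
    w_error: float = 1.0,
    w_fluct: float = 1.0,
    w_settle: float = 0.1,
) -> float:
    """Higher is better: penalise tracking error, fluctuation and slow settling."""
    penalty = 0.0
    for series, target in zip(trajectories, targets):
        # (1) mean absolute difference between measured and target values
        error = sum(abs(v - target) for v in series) / len(series)
        # (2) mean absolute step-to-step change as a simple fluctuation measure
        fluct = sum(abs(b - a) for a, b in zip(series, series[1:])) / max(len(series) - 1, 1)
        # (3) number of samples needed before the value stays near the target
        settle = settling_step(series, target, tolerance)
        penalty += w_error * error + w_fluct * fluct + w_settle * settle
    return -penalty / len(trajectories)


smooth = [[0.0, 0.6, 0.9, 1.0, 1.0, 1.0]]
ringing = [[0.0, 1.8, 0.4, 1.5, 0.7, 1.2]]
print(stability_reward(smooth, [1.0]), stability_reward(ringing, [1.0]))
```

With these example series, the smoothly settling response receives a higher reward than the oscillating one, in line with the ordering the paragraph describes.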

[0054] The action-value function update unit 6 updates the action-value function represented by the neural network 7 based on the reward value acquired by the reward value acquisition unit 5. The action-value function update unit 6 trains the weights of the neural network 7 so that the output of the action-value function for the set of actions taken by the action determination unit 4 in a given state s approaches the expected value of the sum of i) the reward value acquired by the reward value acquisition unit 5 as a result of the action taken by the action determination unit 4 in the state s, and ii) the reward value that would be obtained if optimal actions continued to be taken thereafter. In other words, the action-value function update unit 6 adjusts the connection weights in the layers of the neural network 7 so as to reduce the error between i) the sum of the reward value actually acquired by the reward value acquisition unit 5 and the expected value of the subsequently obtainable reward multiplied by a time discount factor, and ii) the output of the action-value function. The weights are thereby updated, and learning proceeds so that the action value computed by the neural network 7 approaches the true value.
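The target described above has the form of the standard one-step Q-learning target: the reward actually obtained plus the discounted best value expected afterwards. The sketch below computes that target and moves a single scalar value toward it. In the embodiment the neural network weights would be adjusted instead, typically by gradient descent, so the scalar update and the names `td_target` and `updated_score` are simplifying assumptions.

```python
from typing import Sequence


def td_target(
    reward: float,
    next_scores: Sequence[float],
    discount: float,
    terminal: bool = False,
) -> float:
    """Reward actually obtained plus the discounted best value of the next state."""
    if terminal or not next_scores:
        return reward
    return reward + discount * max(next_scores)


def updated_score(current: float, target: float, learning_rate: float) -> float:
    """Move the current value a fraction of the way toward the target."""
    return current + learning_rate * (target - current)


target = td_target(reward=1.0, next_scores=[0.5, 2.0], discount=0.5)
new_score = updated_score(current=1.0, target=target, learning_rate=0.5)
print(target, new_score)
```

After the update, the value is closer to the target than before, which is the error reduction the paragraph refers to.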

[0055] FIG. 8 shows an example of a screen displayed on the display device of the user operation panel. The screen shows a process flow diagram of the plant 3, the current values of the PID parameters set in the respective PID controllers, and the recommended values of the PID parameters determined by the operation mode setting support device 30. When the operator enters a PID parameter with reference to the recommended value shown on the display device, the entered PID parameter is acquired by the PID parameter acquisition unit 25 of the control device 20 and used by the manipulated variable determination unit 26 to determine the manipulated variable. This stabilizes the behavior of the plant 3 in a short time even when a factor that could destabilize that behavior arises.

[0056] The present invention has been described above based on an exemplary embodiment. The embodiment is intended to be illustrative only, and those skilled in the art will understand that various modifications to the constituent elements and processes are possible and that such modifications also fall within the scope of the present invention.

[0057] The technology of the present invention can be used in a plant in which a plurality of control devices control a plurality of controlled devices (devices subject to control). Although the embodiment describes a plant that includes a plurality of control systems each under PID control, the technology of the present invention is equally applicable to a plant that includes control systems based on any other control scheme, such as P control or PI control.
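For readers less familiar with the control schemes named above, the following sketch shows a textbook discrete PID law in which P control and PI control are the special cases with the unused gains set to zero. It is not taken from the patent: the class name `PidController`, the rectangular integration, and the backward-difference derivative are assumptions for illustration.

```python
class PidController:
    """Textbook discrete PID law; P and PI control are special cases."""

    def __init__(self, kp: float, ki: float = 0.0, kd: float = 0.0) -> None:
        self.kp, self.ki, self.kd = kp, ki, kd
        self._integral = 0.0
        self._previous_error = None

    def set_params(self, kp: float, ki: float, kd: float) -> None:
        """Apply new adjustment parameters, e.g. recommended values."""
        self.kp, self.ki, self.kd = kp, ki, kd

    def update(self, target_value: float, measured_value: float, dt: float) -> float:
        """Return the manipulated variable for one control period of length dt."""
        error = target_value - measured_value
        self._integral += error * dt
        if self._previous_error is None:
            derivative = 0.0
        else:
            derivative = (error - self._previous_error) / dt
        self._previous_error = error
        return self.kp * error + self.ki * self._integral + self.kd * derivative


p_only = PidController(kp=2.0)                 # P control
mv_p = p_only.update(target_value=1.0, measured_value=0.5, dt=1.0)

pi = PidController(kp=1.0, ki=0.5)             # PI control
mv_pi_first = pi.update(target_value=1.0, measured_value=0.0, dt=1.0)
mv_pi_second = pi.update(target_value=1.0, measured_value=0.0, dt=1.0)
print(mv_p, mv_pi_first, mv_pi_second)
```

Under this reading, the adjustment parameters determined by the support device would correspond to the gains passed to `set_params`, with fewer gains to determine for P or PI control.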

Description of reference symbols

[0058] 1 plant operation mode setting support system, 2 learning device, 3 plant, 4 action determination unit, 5 reward value acquisition unit, 6 action-value function update unit, 7 neural network, 8 learning control unit, 9 multiple measured value acquisition unit, 10 controlled device, 11 controlled device simulator, 12 process, 20 control device, 21 control unit, 22 user operation panel, 23 measured value acquisition unit, 24 target value acquisition unit, 25 PID parameter acquisition unit, 26 manipulated variable determination unit, 27 manipulated variable input unit, 30 operation mode setting support device, 31 control unit, 32 multiple measured value acquisition unit, 33 PID parameter determination unit, 34 PID parameter output unit, 35 mode switching unit, 36 policy update unit, 40 simulator, 41 controlled device simulator, 42 process simulator, 43 control device simulator

Industrial Applicability

[0059] The present invention is applicable to a plant operation mode setting support system that supports the setting of the operation mode of a plant.

Claims

1. A plant operation mode setting support system for supporting the setting of an operation mode of a plant that executes a process formed by a plurality of devices, the system comprising:

a plurality of control devices, each of which performs feedback control of one or more controlled devices among the plurality of devices; and

an operation mode setting support device that provides integrated support for the settings of the plurality of control devices, which perform a plurality of feedback control tasks respectively and independently, wherein

each of the plurality of control devices includes:

a measured value acquisition unit, implemented by a central processing unit (CPU) of a computer, that acquires a measured value indicating a state of the controlled device;

a control device adjustment parameter acquisition unit, implemented by a central processing unit (CPU) of a computer, that acquires a control device adjustment parameter used to determine a manipulated variable to be input to the controlled device;

a manipulated variable determination unit, implemented by a central processing unit (CPU) of a computer, that determines the manipulated variable based on the measured value acquired by the measured value acquisition unit and the control device adjustment parameter acquired by the control device adjustment parameter acquisition unit; and

a manipulated variable input unit, implemented by a central processing unit (CPU) of a computer, that inputs the manipulated variable determined by the manipulated variable determination unit to the controlled device, and

the operation mode setting support device includes:

a multiple measured value acquisition unit, implemented by a central processing unit (CPU) of a computer, that acquires a plurality of measured values indicating states of the plurality of controlled devices controlled by the plurality of control devices, respectively; and

a control device adjustment parameter determination unit, implemented by a central processing unit (CPU) of a computer, that determines, based on the plurality of measured values acquired by the multiple measured value acquisition unit and according to a policy learned through deep reinforcement learning, a plurality of control device adjustment parameters used by the respective control devices to determine the manipulated variables to be input to the plurality of controlled devices.

2. The plant operation mode setting support system according to claim 1, wherein

the control device adjustment parameter determination unit determines the plurality of control device adjustment parameters according to a policy learned through deep reinforcement learning for learning a policy for determining the plurality of control device adjustment parameters, the learning being based on: the measured value of the controlled device, the control target value, and the manipulated variable that occur while the plant is operating; a reward value that represents a stability index indicating, in numerical terms, an evaluation of the measured value, the control target value, and the manipulated variable; and the control device adjustment parameter used to determine the manipulated variable.

3. The plant operation mode setting support system according to claim 2, further comprising:

a learning device that performs the deep reinforcement learning, wherein

the learning device includes:

an action determination unit, implemented by a central processing unit (CPU) of a computer, that receives a plurality of measured values indicating the states of the plurality of controlled devices and outputs the plurality of control device adjustment parameters used by each of the plurality of control devices; and

an evaluation function unit, implemented by a central processing unit (CPU) of a computer, that calculates an evaluation value for a pair of i) the plurality of measured values indicating the states of the plurality of controlled devices arising when the plurality of control devices control the plurality of controlled devices using the control device adjustment parameters output by the action determination unit, and ii) the control device adjustment parameters used, wherein

the evaluation function unit is trained to reduce the error between i) an expected value of the reward value that would be obtained when: the control device adjustment parameter determination unit determines the control device adjustment parameters that are input to the controlled devices while the plurality of controlled devices are in the states indicated by the plurality of measured values; the manipulated variables for control determined by the plurality of control devices using the determined control device adjustment parameters are input to the plurality of controlled devices to update the states of the plurality of controlled devices; and optimal control device adjustment parameters continue to be selected thereafter, and ii) the evaluation value calculated by the evaluation function unit.
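The training objective in claim 3 has the shape of a temporal-difference update: the target is the reward plus the best evaluation value reachable from the updated state, and the evaluation function is moved toward it. The sketch below shows this with a lookup table over discrete states and a small set of candidate parameter sets. This is a simplification assumed for illustration; the claim describes deep reinforcement learning, in which a neural network would take the place of the table. The name `td_update`, the learning rate `alpha`, and the discount factor `gamma` are hypothetical.

```python
from collections import defaultdict


def td_update(q, state, action, reward, next_state, actions, alpha=0.1, gamma=0.9):
    """Move q[(state, action)] toward reward + gamma * best next evaluation value.

    state / next_state: hashable summaries of the measured values (assumed discrete).
    action: identifier of the adjustment parameter set that was applied.
    Returns the error between the target and the previous evaluation value.
    """
    # Target: reward now, plus the value of continuing with the best known choice.
    target = reward + gamma * max(q[(next_state, a)] for a in actions)
    error = target - q[(state, action)]
    q[(state, action)] += alpha * error
    return error


# Usage: repeating the same transition shrinks the error step by step.
q = defaultdict(float)
candidate_params = ["gains_low", "gains_high"]
first_error = td_update(q, "s0", "gains_low", 1.0, "s1", candidate_params)
second_error = td_update(q, "s0", "gains_low", 1.0, "s1", candidate_params)
```

Taking the maximum over candidate parameter sets corresponds to the condition in the claim that optimal adjustment parameters continue to be selected after the state is updated.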

4. The plant operation mode setting support system according to claim 2 or 3, wherein

the reward value represents a stability index that expresses the correctness of the state of the process in numerical terms.

5. The plant operation mode setting support system according to claim 2 or 3, wherein

the reward value represents a stability index that expresses the correctness of the state of the process in numerical terms according to one or more of the following criteria: (1) the difference between the set of measured values and the control target values is small; (2) the set of measured values does not fluctuate; or (3) the time required for the set of measured values to stabilize is short.
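One possible way to turn the three criteria of claim 5 into a single number is sketched below for a single controlled device. The weights, the absolute tolerance band, and the function names `settling_steps` and `stability_reward` are assumptions made for illustration; the claims do not specify how the criteria are combined.

```python
def settling_steps(measured, target, tolerance):
    """Index of the first sample from which all samples stay within the tolerance band.

    Returns len(measured) if the trace never settles.
    """
    steps = len(measured)
    for i in range(len(measured) - 1, -1, -1):
        if abs(measured[i] - target) > tolerance:
            break
        steps = i
    return steps


def stability_reward(measured, target, tolerance=0.5,
                     w_err=1.0, w_fluct=1.0, w_settle=0.1):
    """Higher (closer to zero) is better; all weights are hypothetical.

    measured: non-empty list of samples of one measured value over time.
    """
    if not measured:
        raise ValueError("measured must contain at least one sample")
    # (1) difference from the control target value
    mean_abs_error = sum(abs(m - target) for m in measured) / len(measured)
    # (2) fluctuation: mean absolute change between consecutive samples
    pairs = list(zip(measured, measured[1:]))
    fluctuation = sum(abs(b - a) for a, b in pairs) / max(len(pairs), 1)
    # (3) time needed to stabilize, counted in samples
    settle = settling_steps(measured, target, tolerance)
    return -(w_err * mean_abs_error + w_fluct * fluctuation + w_settle * settle)


# Usage: a smooth approach to the target scores better than an oscillating trace.
smooth = stability_reward([0, 5, 9, 10, 10], target=10)
oscillating = stability_reward([0, 20, 0, 20, 0], target=10)
```

For several controlled devices, the per-device values could be summed or averaged; that choice is likewise an assumption rather than something stated in the claims.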

6. The plant operation mode setting support system according to claim 2 or 3, wherein

the control device adjustment parameter determination unit determines the plurality of control device adjustment parameters used when operation of the plant is started or brought to a stop, according to a policy learned through deep reinforcement learning that uses the measured values and manipulated variables for control arising when operation of the plant is started or brought to a stop, and the control device adjustment parameters.

7. The plant operation mode setting support system according to claim 2 or 3, wherein

the control device adjustment parameter determination unit determines the plurality of control device adjustment parameters used when a disturbance occurs or when the operation mode is changed during operation of the plant, according to a policy learned through deep reinforcement learning that uses the measured values and manipulated variables for control arising when a disturbance occurs or when the operation mode is changed during operation of the plant, and the control device adjustment parameters.

8. The plant operation mode setting support system according to claim 2 or 3, wherein

the operation mode setting support device further includes a mode switching unit that instructs the control device, based on the policy learned through deep reinforcement learning, whether to perform control in an automatic mode, in which the control device automatically inputs the manipulated variable for control to the controlled device, or in a manual mode, in which the control device inputs the manipulated variable for control to the controlled device in response to an operator's instruction specifying the manipulated variable for control.

9. The plant operation mode setting support system according to any one of claims 1 to 3, wherein

the operation mode setting support device notifies the respective control devices of the plurality of control device adjustment parameters determined by the control device adjustment parameter determination unit, and

the control device acquires, via the control device adjustment parameter acquisition unit, the control device adjustment parameter notified by the operation mode setting support device.

10. The plant operation mode setting support system according to any one of claims 1 to 3, wherein

the operation mode setting support device presents the plurality of control device adjustment parameters determined by the control device adjustment parameter determination unit to an operator, and

the control device acquires, via the control device adjustment parameter acquisition unit, the control device adjustment parameter entered by the operator.

11. An operation mode setting support device for jointly supporting the setting of a plurality of control devices that each perform feedback control of one or more controlled devices among a plurality of devices constituting a process performed in a plant, the device comprising: