KR20240000926A

KR20240000926A - A neural network-based factory scheduling system independent of product type and number of types

Info

Publication number: KR20240000926A
Application number: KR1020220077726A
Authority: KR
Inventors: 이호열; 윤영민
Original assignee: 주식회사 뉴로코어
Priority date: 2022-06-24
Filing date: 2022-06-24
Publication date: 2024-01-03

Abstract

The invention relates to a neural network-based process scheduling system independent of product type and number of types, which trains a neural network for only N product types to schedule product types in each process, selecting N candidate types from among N or more product types and extracting only the process states of the selected N candidate types as input to the neural network. The system comprises: a simulation execution unit that simulates the factory workflow using a factory simulator; a candidate type selection unit that selects product type candidates (hereinafter, candidate types) in a workflow state simulated by the factory simulator, selecting as many candidate types as the number of product types of the scheduling neural network; a product type mapping unit that maps the selected candidate types to the product types of the scheduling neural network; and a learning execution unit that generates training data for the scheduling neural network using the workflow state simulated by the factory simulator, having the candidate type selection unit select the candidate types and the product type mapping unit map the selected candidate types to the product types of the scheduling neural network, thereby generating the training data corresponding to that workflow state.
With this system, N candidate types are selected and applied to the neural network, and the neural network is shared even when the product types combined into the N differ, so that even if the combination of product types changes, the previously trained neural network can be used as is without being redesigned or retrained.

Description

A neural network-based factory scheduling system independent of product type and number of types

The present invention relates to a neural network-based process scheduling system independent of product type and number of types, which trains a per-process neural network for scheduling each process in a factory workflow consisting of a series of processes, simulating the factory workflow with a simulator and training the neural network on the simulated data.

The present invention also relates to a neural network-based process scheduling system independent of product type and number of types that, in a process environment where the product types to be produced change continuously, selects a predetermined number N of top-priority product types and trains the neural network only on these, so that each process produces its final scheduling result from among the N top-priority product types. In particular, to this end, in a situation where N or more diverse product types exist and the number of product types changes each time scheduling is performed, the invention selects N top-priority candidate types, extracts only the process states corresponding to the selected N candidate types, and inputs them to the neural network.

In general, factory scheduling refers to determining, over the manufacturing process from raw materials to finished product, the order of the processes needed to machine a workpiece and the materials and production timing needed for each process.

In particular, in a factory that produces products, the equipment that handles each process is arranged in the work space of that process. The equipment may be configured to be supplied with lots for processing specific tasks. In addition, transfer devices such as conveyors are installed between pieces of equipment or between work spaces, so that when the equipment completes a specific process the processed lot moves on to the next process. That is, one lot passes through a series of processes to become a finished product.

In addition, multiple pieces of equipment with similar or identical functions may be installed to perform a specific process, sharing the same or similar process tasks. Scheduling the processes or individual tasks on such a manufacturing line is a very important problem for factory efficiency. Conventionally, most scheduling was rule-based, following rules for each condition, but because the evaluation criteria were not clear, assessing the performance of the resulting schedule was ambiguous.

Recently, techniques that schedule work by introducing artificial intelligence into the manufacturing process have been proposed [Patent Document 1]. That prior art uses a genetic algorithm, a machine learning algorithm among artificial intelligence techniques, but is limited to scheduling the work of machine tools. A technique applying neural network learning to the processes of multiple facilities has also been proposed [Patent Document 2]. However, that prior art finds the optimal control method for a given situation based on past data, and therefore has the clear limitation that it does not work without previously accumulated data.

To solve these problems, the present applicant has proposed a technique that simulates the processes using a factory simulator and trains a neural network for each process using the simulated data [Patent Document 3]. In that technique, however, as many nodes as there are product types are placed at the output of the artificial neural network; the output node producing the largest signal (or number) is selected, and the lot corresponding to that output node becomes the product chosen by the artificial neural network.

For example, Figure 1(a) illustrates the artificial neural network for the case where there are currently two product types, product A and product B. As shown in Figure 1, two output nodes, one for each of products A and B, are set at the output of the artificial neural network. The input side of the artificial neural network likewise consists of nodes representing the process states of products A and B, and the number of these input nodes also depends on the number of product types.

Therefore, if the number of product types in the factory workflow changes, the nodes of the artificial neural network must be changed and training must start over from the beginning. For example, if a factory with two product types, A and B, must newly produce product type C, an artificial neural network of a different form from that of Figure 1(a) must be designed and then trained again. That is, the network must be redesigned into the form shown in Figure 1(b) and then retrained.

The number of product types produced in a typical factory cannot be defined in advance. The prior art therefore has the problem that the entire artificial neural network must be redesigned and retrained every time the number of products produced changes. By way of comparison, the neural network of AlphaGo, which makes decisions for the 19×19 Go problem, has 19×19 output nodes; if the board were changed to 20×20, the entire neural network would have to be redesigned and retrained.

[Patent Document 1] Korean Registered Patent Publication No. 10-1984460 (published May 30, 2019)
[Patent Document 2] Korean Registered Patent Publication No. 10-2035389 (published October 23, 2019)
[Patent Document 3] Korean Registered Patent Publication No. 10-2338304 (published December 13, 2021)

V. Mnih et al., “Human-level control through deep reinforcement learning,” Nature, vol. 518, no. 7540, p. 529, 2015.
Eliyahu M. Goldratt, The Goal: A Process of Ongoing Improvement, 1984.

The object of the present invention is to solve the problems described above by providing a neural network-based process scheduling system independent of product type and number of types that, in an environment where the product types produced differ from factory to factory and also change fluidly within a given factory, always trains the neural network for a fixed number N of product types to schedule product types in each process. To this end, the system selects N candidate types from among N or more product types, extracts only the process states of the selected N candidate types, and inputs them to the neural network.

In particular, an object of the present invention is to provide a neural network-based process scheduling system independent of product type and number of types that, even when the product types combined into the N differ, shares the neural network of the process, training that neural network and applying it for the scheduling of that process.

To achieve the above object, the present invention relates to a neural network-based process scheduling system independent of product type and number of types, which trains a scheduling neural network for each process of a factory workflow by simulating the factory workflow with a factory simulator and training the scheduling neural network of each process on the simulation results, the system comprising: a simulation execution unit that simulates the factory workflow using the factory simulator; a candidate type selection unit that selects product type candidates (hereinafter, candidate types) in a workflow state simulated by the factory simulator, selecting as many candidate types as the number of product types of the scheduling neural network; a product type mapping unit that maps the selected candidate types to the product types of the scheduling neural network; and a learning execution unit that generates training data for the scheduling neural network using the workflow state simulated by the factory simulator, having the candidate type selection unit select the candidate types and the product type mapping unit map the selected candidate types to the product types of the scheduling neural network, thereby generating the training data corresponding to that workflow state.

In addition, in the neural network-based process scheduling system independent of product type and number of types, the present invention is characterized in that the candidate type selection unit uses variables of the workflow state related to each product type (hereinafter, state variables), obtains the value of each such variable for each product type, and selects product types according to those values.

In addition, in the neural network-based process scheduling system independent of product type and number of types, the present invention is characterized in that the candidate type selection unit calculates the evaluation value V(d) of each product type d by the following formula.

[Formula 1]

where w_k denotes the weight for the k-th state variable, and v_k(d) denotes the value of the k-th state variable for product type d.

In addition, in the neural network-based process scheduling system independent of product type and number of types, the present invention is characterized in that the state variables are one or more of deadline, number of waiting lots, equipment need, waiting time, and continuous-operation feasibility.
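The body of Formula 1 is not reproduced in this text. Given the definitions of w_k and v_k(d), one plausible reading is a weighted sum, V(d) = Σ_k w_k · v_k(d); that form is an assumption here, not something the text confirms. Under that assumption, candidate selection could be sketched as follows. The variable names, weights, and state values are hypothetical.

```python
from typing import Dict, List

# Hypothetical names for the state variables listed in the text.
STATE_VARS = ["deadline", "waiting_lots", "equipment_need",
              "waiting_time", "continuity"]


def evaluate(values: Dict[str, float], weights: Dict[str, float]) -> float:
    """Assumed form of Formula 1: V(d) = sum over k of w_k * v_k(d)."""
    return sum(weights[k] * values[k] for k in STATE_VARS)


def select_candidates(state: Dict[str, Dict[str, float]],
                      weights: Dict[str, float], n: int) -> List[str]:
    """Return the n product types with the highest V(d), best first."""
    scored = {d: evaluate(v, weights) for d, v in state.items()}
    # Sort by descending score; ties are broken by type name for determinism.
    ranked = sorted(scored, key=lambda d: (-scored[d], d))
    return ranked[:n]


# Illustrative weights and per-type state values (all made up).
weights = {"deadline": 2.0, "waiting_lots": 1.0, "equipment_need": 0.0,
           "waiting_time": 0.5, "continuity": 1.0}
state = {
    "A": {"deadline": 1, "waiting_lots": 3, "equipment_need": 0,
          "waiting_time": 2, "continuity": 1},
    "B": {"deadline": 0, "waiting_lots": 1, "equipment_need": 1,
          "waiting_time": 4, "continuity": 0},
    "C": {"deadline": 3, "waiting_lots": 0, "equipment_need": 0,
          "waiting_time": 0, "continuity": 0},
    "D": {"deadline": 0, "waiting_lots": 2, "equipment_need": 0,
          "waiting_time": 0, "continuity": 0},
}
top_two = select_candidates(state, weights, n=2)
```

With four product types in the workflow and a network built for N = 2, only the two highest-scoring types are passed on.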

In addition, in the neural network-based process scheduling system independent of product type and number of types, the present invention is characterized in that the product type mapping unit always maps the selected product types to the product types of the scheduling neural network in the same way, according to the order in which they were selected.
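A minimal sketch of this fixed-order mapping, assuming the network's product types are simply numbered slots 0..N-1 (the slot representation and function names are illustrative, not taken from the text):

```python
from typing import Dict, List


def map_to_slots(candidates: List[str]) -> Dict[int, str]:
    """Slot i of the network always receives the i-th selected candidate."""
    return dict(enumerate(candidates))


def decode(slot_to_type: Dict[int, str], chosen_slot: int) -> str:
    """Translate the slot chosen by the network back to a real product type."""
    return slot_to_type[chosen_slot]


# Two different product combinations share the same two network slots.
first = map_to_slots(["A", "C"])
second = map_to_slots(["D", "B"])
```

Because the slot index depends only on selection order, the same trained network can serve either combination; only the decoding table changes.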

In addition, in the neural network-based process scheduling system independent of product type and number of types, the present invention is characterized in that, when the number of product types in the workflow state is smaller than the number of product types of the scheduling neural network, the product type mapping unit creates as many virtual product types (hereinafter, virtual types) as the shortfall, and maps them with the state of each virtual type set to a state in which no such product exists.

In addition, in the neural network-based process scheduling system independent of product type and number of types, the present invention is characterized in that, when the number of product types in the workflow state is smaller than the number of product types of the scheduling neural network, the product type mapping unit creates as many virtual types as the shortfall, or creates duplicate product types.
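The padding with virtual types can be sketched as below. The encoding of "no such product" as an all-zero feature vector, the three-feature state, and the use of `None` to mark a virtual slot are all assumptions made for illustration; the text does not specify them.

```python
from typing import Dict, List, Optional, Tuple

# Assumed encoding of a virtual type: a state meaning "no such product".
EMPTY_STATE: List[float] = [0.0, 0.0, 0.0]


def build_input(candidates: List[str],
                states: Dict[str, List[float]],
                n: int) -> Tuple[List[Optional[str]], List[float]]:
    """Fill exactly n slots, padding the shortfall with virtual types."""
    slots: List[Optional[str]] = list(candidates[:n])
    slots += [None] * (n - len(slots))  # None marks a virtual type
    features: List[float] = []
    for d in slots:
        features.extend(states[d] if d is not None else EMPTY_STATE)
    return slots, features


# Only two real product types exist, but the network expects three slots.
slots, features = build_input(
    ["A", "B"],
    {"A": [1.0, 2.0, 0.0], "B": [0.0, 1.0, 3.0]},
    n=3,
)
```

The input length stays fixed at N times the per-type feature count regardless of how many real product types are present. The alternative named in the text, duplicating an existing product type into the spare slot, would replace `EMPTY_STATE` with a copy of a real type's state.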

In addition, in the neural network-based process scheduling system independent of product type and number of types, the present invention is characterized in that the system further comprises a scheduling generation unit that generates a production episode for scheduling by running a simulation through the factory simulator and having the scheduling neural network of each process make the decisions for that process, wherein, for the scheduling neural network of each process, the scheduling generation unit has the candidate type selection unit select candidate types, has the product type mapping unit map those candidate types to the product types of that scheduling neural network to generate state data, and applies the generated state data to the scheduling neural network of the process to obtain a result.
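One decision step of such a scheduling generation unit might look like the sketch below: select candidates, map them to slots in order, apply the network, and decode the chosen slot. The ranking function and the stand-in "network" are hypothetical stubs; a real system would use the trained scheduling neural network and the evaluation value V(d).

```python
from typing import Callable, Dict, List

N = 2  # number of product-type slots the network was built for (assumed)


def schedule_step(states: Dict[str, List[float]],
                  rank: Callable[[List[float]], float],
                  network: Callable[[List[float]], List[float]],
                  n: int = N) -> str:
    # 1. Candidate selection: keep the n highest-ranked product types.
    candidates = sorted(states, key=lambda d: (-rank(states[d]), d))[:n]
    # 2. Fixed-order mapping: slot i receives the i-th candidate's state.
    features = [x for d in candidates for x in states[d]]
    # 3. Apply the network, which returns one score per slot.
    scores = network(features)
    slot = max(range(len(candidates)), key=lambda i: scores[i])
    # 4. Decode the chosen slot back to a real product type.
    return candidates[slot]


def stub_network(features: List[float]) -> List[float]:
    """Stand-in for a trained network: scores each slot by its first feature."""
    return [features[0], features[2]]


# Per-type state is [waiting_lots, waiting_time] (illustrative values).
plant_a = {"A": [1, 5], "B": [4, 1], "C": [0, 1]}
plant_b = {"C": [3, 0], "D": [1, 9], "E": [0, 0]}
choice_a = schedule_step(plant_a, rank=sum, network=stub_network)
choice_b = schedule_step(plant_b, rank=sum, network=stub_network)
```

The same `stub_network` is applied to two different sets of product types without any change to its input or output size. Padding for the case of fewer than N product types is omitted here for brevity.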

In addition, in the neural network-based process scheduling system independent of product type and number of types, the present invention is characterized in that the scheduling neural network of each process is a neural network trained by reinforcement learning.

As described above, with the neural network-based process scheduling system independent of product type and number of types according to the present invention, the number of candidate types is fixed in advance at a finite N, that many candidates are selected and applied to the neural network, and the neural network is shared even when the product types combined into the N differ. As a result, even if the combination of product types changes, the previously trained neural network can be used as is without being redesigned or retrained.

Figure 1 is an example diagram showing the structure of a neural network according to the number of product types.
Figure 2 is an example diagram showing a model of a factory workflow according to an embodiment of the present invention.
Figure 3 is a block diagram of the configuration of a process according to an embodiment of the present invention.
Figure 4 is a diagram illustrating the equipment configuration of a process according to an embodiment of the present invention.
Figure 5 is an example table showing job change times according to an embodiment of the present invention.
Figure 6 is a configuration diagram of the overall system for implementing the present invention.
Figure 7 is a diagram of the basic operating structure of the reinforcement learning used in the present invention.
Figure 8 is an example diagram showing the process of extracting state data of a factory workflow and applying it to a neural network according to an embodiment of the present invention.
Figure 9 is a block diagram of the configuration of a factory simulator-based scheduling system according to an embodiment of the present invention.
Figure 10 is a table illustrating state variables according to an embodiment of the present invention.
Figure 11 is an example diagram showing the process of selecting and mapping product types according to an embodiment of the present invention.

Hereinafter, specific details for implementing the present invention are described with reference to the drawings.

In describing the present invention, identical parts are given the same reference numerals, and repeated descriptions are omitted.

First, the configuration of the factory workflow model used in the present invention is described with reference to Figures 2 to 5.

As shown in Figure 2, a factory workflow consists of a series of processes, and each process is connected to other processes. Connected processes have a precedence relationship. If each process is viewed as a node, the entire factory workflow forms a directed graph. For convenience of explanation, "process" and "process node" are used interchangeably below.

In the example of Figure 2, the factory workflow consists of processes P0, P1, P2, ..., P4, starting with process P0 and ending with process P4. When process P0 is completed, the next process P1 starts, and when process P1 is completed, the next process P2 starts.
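The directed-graph view of the workflow can be sketched with the standard-library `graphlib` module (Python 3.9 or later). The strictly linear chain P0 → P1 → ... → P4 is an assumption read from the example; Figure 2 itself is not reproduced here and may contain branches.

```python
from graphlib import TopologicalSorter

# Assumed successor lists for the Figure 2 example: a linear chain.
successors = {
    "P0": ["P1"],
    "P1": ["P2"],
    "P2": ["P3"],
    "P3": ["P4"],
    "P4": [],
}

# TopologicalSorter expects node -> predecessors, so invert the edges.
predecessors = {p: set() for p in successors}
for p, nexts in successors.items():
    for q in nexts:
        predecessors[q].add(p)

# A valid processing order: every process appears after its predecessors.
order = list(TopologicalSorter(predecessors).static_order())
```

For a linear chain the order is unique and runs from P0 to P4; for a branching workflow, any order returned still respects the precedence relationships.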

Meanwhile, a lot is a specific workpiece that passes through each process P0, P1, P2, ..., P4 of the factory workflow to become a finished product. For a given workpiece (lot), once process P0 is completed, the lot is handed over and the next process P1 can start. That is, when a lot that has finished processing in process P0 is provided to process P1, process P1 takes over the lot and continues with further work. When the whole series of processes in the factory workflow has been carried out in this way, the product of that lot has been produced.

In addition, a factory workflow does not produce only one kind of product (or product type, product family); multiple types of product may be processed and produced. Each lot therefore has a type that depends on the kind of product. For example, if lot 1 is of type 'product A', the final product of lot 1 is product A. Likewise, if lot 2 is of type 'product B', the final product of lot 2 is product B.

In addition, the processes can run simultaneously. For example, while process P4 is processing lot 8 (product B), process P0 may at the same time be performing intermediate processing on lot 1 (product A) and process P1 may be processing lot 2 (product B).

Next, the operation types of a process are described.

A process can selectively perform one of several operation types. The process performs one of the applicable operation types on an input lot. A lot (hereinafter, input lot) is fed into the process, and as the work of the process is carried out, the processed lot (hereinafter, completed lot) is output (produced). That is, a lot whose work is complete becomes an input lot eligible for work in the next process Pn+1.

In the example of Figure 3, process Pn has multiple operation types: operation type n-1, operation type n-2, ..., operation type n-M. Process Pn selects and performs one of the M operations. One of the operations is selected and performed according to the environment or request at that time. Preferably, if the process has a scheduling neural network, the neural network of that process selects the operation (or operation type) to perform next, and scheduling follows that selection.

As an example, process Pn (n = 0, 1, 2, 3, ...) denotes an operation that collectively covers several operation types (e.g., "assembly"), and each operation type depends on the type of the input lot (e.g., "leg assembly" if the input lot type is a chair, "top assembly" if it is a table, "drawer assembly" if it is a chest of drawers).

Meanwhile, preferably, two operation types can be carried out simultaneously in one process (e.g., process Pn). That is, if there are M machines capable of performing a process, work can be carried out on M different product types.

Next, the equipment of a process is described.

Multiple pieces of equipment may be deployed in process Pn, and for each operation type one specific piece of equipment is designated for an input lot. Figure 4 shows the definition of these assignable relationships. That is, the operation types and lot types that can be handled differ by equipment, and each piece of equipment may handle one or several operation types.

In the example of Figure 4, process P0 has three operation types, and operation types P0-1, P0-2, and P0-3 are the operations for processing product A, product B, and product C, respectively. There are two pieces of equipment for operation type P0-1 (product A): equipment 1 and equipment 2. The equipment for operation types P0-2 and P0-3 is equipment 2 and equipment 1, respectively.

Several lots may be present in process Pn, and the state of a lot is either currently being worked on or waiting for work to start (a waiting lot). The waiting state indicates that the equipment of the process is already processing another lot, or that equipment capable of processing the lot has not finished preparing for work.

Meanwhile, as described above, work on two or more product types can be carried out simultaneously in one process. In the example of Figure 4, product B and product C are handled by equipment 2 and equipment 1, respectively, so work on product B and product C can be carried out at the same time.
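The equipment relationships from the Figure 4 example can be written as a small capability table, with a check for whether a set of operation types can run at the same time. The check used here (each operation needs its own distinct piece of equipment) is an assumption about what "simultaneously" requires; the identifiers are illustrative.

```python
from itertools import product
from typing import Dict, List, Set

# Which equipment can perform each operation type (from the Figure 4 example).
CAPABLE: Dict[str, Set[str]] = {
    "P0-1": {"equipment1", "equipment2"},  # product A
    "P0-2": {"equipment2"},                # product B
    "P0-3": {"equipment1"},                # product C
}


def can_run_simultaneously(ops: List[str]) -> bool:
    """True if every operation can be given its own distinct equipment."""
    choices = [sorted(CAPABLE[op]) for op in ops]
    # Try every combination of one equipment per operation.
    return any(len(set(assign)) == len(ops) for assign in product(*choices))


b_and_c = can_run_simultaneously(["P0-2", "P0-3"])
all_three = can_run_simultaneously(["P0-1", "P0-2", "P0-3"])
```

Products B and C can run together because they use different equipment, while all three operation types cannot, since only two pieces of equipment exist in this example.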

Next, the job change of a process is described.

In process Pn, when the operation type of the next lot on a given piece of equipment differs from that of the previous lot, preparatory work is performed and the equipment settings are changed so that the equipment can move from the previous lot's operation type to the next lot's. This is called a job change. A job change takes a certain amount of time.

For example, a job change may involve changing oil, replacing the equipment's operating program, changing tools in the equipment, or preheating or cooling the equipment.

Figure 5 shows the time required for these job changes. In the example of Figure 5, for equipment currently producing lots of operation type P0-1 to produce lots of operation type P0-3, it must perform a job change that takes 12 hours. That is, after a 12-hour job change it can process lots of the next operation type P0-3.

During a job change, lots cannot be produced (processed) even if lots that could be fed to the equipment exist. Therefore, frequent job changes consume a great deal of changeover time and lower equipment utilization.
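The cost of frequent job changes can be illustrated with a lookup table of changeover times. Only the 12-hour value for P0-1 to P0-3 comes from the text; the reverse value is hypothetical, since the full Figure 5 table is not reproduced here.

```python
from typing import Dict, List, Tuple

CHANGE_HOURS: Dict[Tuple[str, str], int] = {
    ("P0-1", "P0-3"): 12,  # from the Figure 5 example in the text
    ("P0-3", "P0-1"): 8,   # hypothetical value, for illustration only
}


def changeover_hours(sequence: List[str]) -> int:
    """Total job-change time for one equipment running the given sequence."""
    total = 0
    for prev, nxt in zip(sequence, sequence[1:]):
        if prev != nxt:  # a job change is needed only when the type differs
            total += CHANGE_HOURS[(prev, nxt)]
    return total


alternating = ["P0-1", "P0-3", "P0-1", "P0-3"]
batched = ["P0-1", "P0-1", "P0-3", "P0-3"]
alternating_cost = changeover_hours(alternating)
batched_cost = changeover_hours(batched)
```

Both sequences process the same four lots, but the alternating order spends 32 hours on job changes against 12 hours for the batched order, which is the utilization loss the text describes.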

In process Pn, it is possible to keep running only the job type that produces one particular lot type, without any job change. In that case, however, the other lot types waiting for work can never become completed lots. Therefore, when the customer has requested several lot types, an appropriate "job change point" must be determined and the job change performed.

In addition, job changes are performed per piece of equipment. Therefore, while one piece of equipment is undergoing a job change for another product type, a job change may also be performed on another piece of equipment.

Next, the configuration of the overall system for implementing the present invention will be described with reference to FIG. 6.

As shown in FIG. 6, the overall system for implementing the present invention consists of a neural network agent 10 composed of neural networks 11, a factory simulator 20 that simulates the factory workflow, and a scheduling system 30 that generates the scheduling of each process through the neural network agent 10. It may further include a database 40 that stores scheduling data and the like.

First, the neural network agent 10 is composed of at least one neural network 11 that, given the workflow state (or lot state) as input, outputs the next job (or job action) of a specific process.

In particular, one neural network 11 is configured to determine the next job (or job type) for one process. That is, preferably, it selects one of the several jobs (or job types) that can be performed next in that process. As an example, the output of the neural network 11 consists of nodes corresponding to the job types, each node outputs a probability value, and the job corresponding to the node with the largest probability value is selected as the next job.
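The selection rule described above (pick the job whose output node has the largest probability) can be sketched as follows. The function name and the sample job type labels are illustrative assumptions, not part of the source.

```python
def select_next_job(probabilities: list[float], job_types: list[str]) -> str:
    """Pick the job type whose output node has the largest probability."""
    if len(probabilities) != len(job_types):
        raise ValueError("one probability is required per job type")
    # Index of the output node with the largest value (first one on ties).
    best_index = max(range(len(probabilities)), key=probabilities.__getitem__)
    return job_types[best_index]
```

For instance, `select_next_job([0.1, 0.7, 0.2], ["P0-1", "P0-2", "P0-3"])` returns `"P0-2"`.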

That is, the scheduling artificial neural network 11 of each process makes a decision that chooses one "next lot". It may accept the job change time and select a different job type, or it may select a lot of the same job type again.

Also, preferably, the neural network 11 makes the next-job decision for each piece of equipment. For example, if process P2 has two machines, MC1 and MC2, MC1 requests a decision on its next lot from the neural network before it finishes its current lot, and MC2 likewise requests a decision from the neural network 11 independently, at a different time (or possibly at the same time). That is, the neural network does not make a decision for several machines at once.

In addition, to determine the next jobs of multiple processes, multiple neural networks 11 may be configured, one for each of the processes. In the example of FIG. 2, if there are five processes, a neural network 11 may be configured for each process, for five in total. However, no neural network is configured for processes that require no selection or are trivial, such as a process in which there is only one job to choose.

The neural network and its optimization use an ordinary reinforcement-learning-based neural network method such as DQN (Deep Q-Network) [Non-Patent Document 1]. That is, the neural network 11 may be trained using reinforcement learning as shown in FIG. 7.

In addition, the neural network agent 10 receives the workflow state (S_t), the job performed in that state (a_t), the workflow state after the job has been performed (S_{t+1}), and the reward for the job in that state (r_t), and optimizes the parameters of the neural network 11 of the corresponding process.
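If DQN is the method chosen, the network of a process would typically be fitted toward the standard one-step Q-learning target computed from (S_t, a_t, r_t, S_{t+1}). The sketch below shows only that target. The source does not specify the update rule, so the discount factor `gamma`, the `terminal` flag, and the function name are assumptions.

```python
def td_target(reward: float, next_q_values: list[float],
              gamma: float = 0.99, terminal: bool = False) -> float:
    """Standard one-step Q-learning target: r_t + gamma * max_a Q(S_{t+1}, a).

    `next_q_values` holds the network's outputs for the next state S_{t+1}.
    """
    if terminal or not next_q_values:
        return reward  # no future value after the end of an episode
    return reward + gamma * max(next_q_values)
```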

Once the neural network 11 has been optimized (trained), the neural network agent 10 applies the workflow state (S_t) to the optimized neural network 11 so that it outputs the next job (a_t).

Meanwhile, the workflow state (S_t) represents the workflow state (or factory state) at time t. Preferably, the workflow state consists of the state of each process within the workflow and the factory state that applies to the factory as a whole. For example, the workflow state (S_t) consists of information representing each process state or the factory state, such as the distribution within the factory of lots of the product types under decision, and the product type currently being worked on by the production equipment for which the decision is to be made.

Also, preferably, the workflow state may include the states of only some of the processes in the workflow. In that case, only key processes, such as processes that cause bottlenecks in the workflow, may be targeted, and only the states of those processes included. In addition, the workflow state is defined over elements that change in the course of the workflow. That is, components that do not change as the workflow proceeds are not included in the state.

The state of each process (or process state) consists of the input lots, the state of each piece of process equipment, and so on. The factory state represents conditions across all processes, such as the production target of each product and the progress achieved.

Meanwhile, as described above, the state is defined as the state of the entire workflow, whereas the action is defined as a job in the corresponding process. That is, the state includes the placement of lots and the equipment states across the entire workflow, but the action (or job) is confined to a specific process node. This presupposes the Theory of Constraints (TOC) [Non-Patent Document 2]: in a factory, if the specific process node that is the greatest bottleneck of production capacity, or that requires decision-making, is scheduled optimally, the problems of the connected upstream and downstream process nodes need not be a concern. This is similar to making the key decisions at major control points such as traffic lights, intersections, and interchanges, while having to reflect the traffic conditions of all connected roads before and after them in the state.

Meanwhile, the reward (r_t) represents the quantity that the factory wants to optimize through decision-making. For example, the reward consists of the equipment utilization rate (= time the equipment spent working / total time), the delivery satisfaction rate (= quantity produced within the target completion time for each product / total quantity that must be completed), and so on. That is, the reward value may be formed as a weighted sum of these elements.
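The weighted sum described above can be sketched as follows. The two ratios follow the definitions in the text; the equal default weights and the parameter names are assumptions for illustration only.

```python
def reward(busy_hours: float, total_hours: float,
           on_time_qty: float, required_qty: float,
           w_utilization: float = 0.5, w_delivery: float = 0.5) -> float:
    """Weighted sum of equipment utilization and delivery satisfaction."""
    # Utilization rate = time the equipment worked / total time.
    utilization = busy_hours / total_hours
    # Delivery satisfaction = quantity finished on time / quantity required.
    delivery = on_time_qty / required_qty
    return w_utilization * utilization + w_delivery * delivery
```

With 80 busy hours out of 100 and 45 of 50 required units finished on time, the two ratios are 0.8 and 0.9, so equal weights give a reward of about 0.85.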

That is, the reward (r_t) in each process state (or lot state) (S_t) is calculated according to the reinforcement learning method. The reward (r_t) in each state (S_t) is calculated from the final result (or final performance) of the corresponding production episode. The final result (or final performance) is in turn calculated from the main KPIs (Key Performance Indicators) used in factory management, such as the operating efficiency of the production equipment of the process or of the entire workflow, the turn-around time (TAT) of the work items, and the production target achievement rate.

In summary, by extracting the states (S_t), jobs (a_{p,t}), and rewards (r_t) in time order from a production episode, transitions can be extracted. A transition consists of the current state (S_t) and job (a_{p,t}) together with the next state (S_{t+1}) and reward (r_t). It means that when the job (a_{p,t}) of a specific process is performed in the current state (S_t), the workflow moves to the next state (S_{t+1}) and obtains the reward (r_t). Here, the reward (r_t) is the value of performing the job (a_{p,t}) in the current state (S_t).
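The extraction of transitions from one episode can be sketched as follows. The `Transition` type and the list-based episode layout are assumptions; the source only states which four elements a transition contains.

```python
from typing import NamedTuple, Sequence


class Transition(NamedTuple):
    state: tuple        # S_t
    action: int         # a_{p,t}
    reward: float       # r_t
    next_state: tuple   # S_{t+1}


def extract_transitions(states: Sequence[tuple], actions: Sequence[int],
                        rewards: Sequence[float]) -> list[Transition]:
    """Turn one time-ordered episode into (S_t, a_t, r_t, S_{t+1}) tuples."""
    # An episode with T decisions visits T + 1 states.
    if len(states) != len(actions) + 1 or len(actions) != len(rewards):
        raise ValueError("expected len(states) == len(actions) + 1 == len(rewards) + 1")
    return [Transition(states[t], actions[t], rewards[t], states[t + 1])
            for t in range(len(actions))]
```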

Meanwhile, for the state data (or input data, input nodes) and job data (or output data, output nodes) of the neural network 11 of each process, the number of product types is set in advance to N (N is a natural number of 2 or more). That is, the state data represents the state of each process or of the factory, and that state includes the states of products of different product types. The number of product types in the neural network 11 of each process is preset to N. Therefore, the form or format of the state data related to product types is fixed, and the form of the job data, such as the number of its entries, is fixed as well. The number of product types in this sense is referred to as the effective number.

Meanwhile, the number of product types (or effective number) may differ between the neural networks 11 of the individual processes. For example, the number of product types (or effective number) of the neural network of process P2 may be two, and that of the neural network of process P4 may be three.

The terms state data and job data are those used for a neural network in a reinforcement learning model. In a neural network of a general model, the workflow state and the next job correspond to the input data (or input nodes) and the output data (or output nodes), respectively.

Next, the factory simulator 20 is an ordinary simulator that simulates the factory workflow.

The factory workflow uses the workflow model shown earlier in FIG. 2. That is, the factory workflow model of the simulation is modeled as a directed graph consisting of multiple nodes, each representing a process. Each process model of the simulation is modeled on the equipment actually installed on site.

That is, as shown in FIG. 3, each process model takes the equipment configuration and processing capability as its modeling variables: the lots (LOT) fed into the process, the lots (LOT) completed in the process, the multiple job types, the equipment for each job, the processing speed of each job type, and the job change time needed to switch from the previous job type to a new one.
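One way to hold such a process model in memory is sketched below. The class name, field names, and helper function are all hypothetical; the source only lists which quantities are modeled and states that the workflow is a directed graph of process nodes.

```python
from dataclasses import dataclass, field


@dataclass
class ProcessModel:
    """One node of the workflow graph (all field names are illustrative)."""
    name: str
    equipment: list[str]
    processing_hours: dict[str, float]                    # job type -> hours per lot
    changeover_hours: dict[tuple[str, str], float]        # (from, to) -> hours
    successors: list[str] = field(default_factory=list)   # directed edges


def downstream(workflow: dict[str, ProcessModel], name: str) -> list[str]:
    """Return every process reachable from `name` by following the edges."""
    seen: list[str] = []
    stack = list(workflow[name].successors)
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.append(current)
            stack.extend(workflow[current].successors)
    return sorted(seen)
```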

Such a factory simulator employs conventional simulation technology, so a more detailed description is omitted.

Next, the scheduling system 30 performs simulations using the factory simulator 20, extracts training data from the simulation results, and uses the extracted training data to train the neural network agent 10, that is, the neural network 11 of each process. The scheduling system 30 also generates scheduling data for the products that are actually to be produced, using the factory simulator 20 and the trained neural network agent 10.

That is, the scheduling system 30 simulates multiple production episodes with the factory simulator 20. A production episode is the entire course of producing the final products (or lots). Each production episode follows a different processing path.

For example, running one simulation that produces 100 red ballpoint pens and 50 blue ballpoint pens is one production episode. The detailed processing steps within the factory workflow may differ from run to run, and a different choice of steps yields another production episode. For example, in a given state, processing a lot of red ballpoint pens in process 2 and processing a lot of blue ballpoint pens in process 2 belong to different production episodes.

In addition, the scheduling system 30 may generate training data from the generated episodes and train the neural networks 11 through the neural network agent 10. That is, the scheduling system 30 simulates each production episode with the factory simulator 20 and builds the training data by extracting transitions from the simulation results. Multiple transitions are extracted even from a single episode. Preferably, simulations are run for many episodes and a large number of transitions is extracted from them.

The scheduling system 30 then applies the extracted training data (or transitions) to the neural network agent 10 to train it.

As one example, the training data (or transitions) may be used for training sequentially in time order. Preferably, transitions are sampled at random from the full set of transitions, and the neural network agent 10 is trained on the sampled transitions.
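The preferred random sampling can be sketched as drawing a minibatch without replacement from the pool of stored transitions. The function name, the batch size parameter, and the optional seed are assumptions; the source does not state how many transitions are drawn per training step.

```python
import random
from typing import Optional, Sequence


def sample_minibatch(transitions: Sequence, batch_size: int,
                     seed: Optional[int] = None) -> list:
    """Draw transitions uniformly at random, without replacement."""
    rng = random.Random(seed)  # a seed makes the draw reproducible
    k = min(batch_size, len(transitions))
    return rng.sample(list(transitions), k)
```

In DQN-style training, sampling at random rather than replaying in time order is commonly used to reduce the correlation between consecutive training samples.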

In addition, when the neural network agent 10 comprises multiple neural networks, each neural network is trained using the training data (or transition data) of the process to which it corresponds.

In addition, the scheduling system 30 generates scheduling data by simulating, with the factory simulator 20, the products that are actually to be produced. During this simulation, the decision that selects the next job in each process is made by the neural network 11 of that process.

Meanwhile, the scheduling system 30 selects as many product type candidates (or candidate types) as the effective number from the workflow state of the factory simulator 20, and maps the state data of the selected candidate types (that is, the state data for the neural network of each process) to the state data of the neural network. The scheduling system 30 also takes the candidate type that corresponds (is mapped) to the result of the neural network's job data (or output data), that is, to the product type it outputs, as the decided type, and reflects the decided type in the factory simulator 20.

Preferably, the scheduling system 30 selects the product type candidates from the workflow state of the factory simulator 20 according to preset rules.

Next, the database 40 consists of a training data DB 41 that stores training data for training the neural network agent 10, a scheduling DB 42 that stores scheduling data and the like, and so on. However, this configuration of the database 40 is only one preferred embodiment. In developing a specific device, it may be organized differently according to database design theory, taking into account ease and efficiency of access and search.

Next, an example in which the factory workflow state produced by the factory simulator 20 is mapped to the state data of the neural network, according to an embodiment of the present invention, will be described with reference to FIG. 8.

FIG. 8 illustrates the case in which the number of product types simulated by the factory simulator 20 equals the number of types of the neural network. That is, the factory simulator 20 simulates two product types, product A and product B, and the neural network has two types, product 1 and product 2. In this case, the factory workflow state can be mapped directly to the state data of the neural network without a separate candidate selection.

In FIG. 8, (a) shows the workflow state on the factory simulator 20 as the result of its simulation, and (b) shows the neural network that makes the decision on the next job of process P4.

FIG. 8(a) shows the factory state (or state information of the factory workflow) at one point in time, among the factory states (or process states) of the factory workflow of FIG. 2. The state information of the factory workflow changes over time. The state of the factory workflow at a given point in time consists of the locations and states of the lots, the equipment states, and so on.

In particular, as shown in FIG. 8(a), the overall factory state consists of, for each process, the lot type, lot number (lot identification information), lot status, equipment, and so on. The lot type indicates which product (or product type) the lot belongs to. The lot number is the information that identifies the lot. The lot status indicates whether the lot is waiting or being worked on. The equipment indicates the equipment used to work on the lot.

That is, the state of the factory workflow at a given point in time can be gauged from the locations and states of the lots. In particular, it can express how many lots of each lot type exist in each process and what their status is (working or waiting). Of this workflow state, the state data needed by the neural network of process P4 is shown in the middle table of FIG. 8. In the example of FIG. 8, the state data of a candidate product consists of the lot distribution, the equipment state, and so on. Depending on the actual scheduling problem, additional information per lot type that is not in the example of FIG. 8 (e.g., required processing time, remaining order quantity) may be added to the factory workflow state.
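The lot distribution for one product type can be sketched as a count of lots per process and status. The dictionary keys (`type`, `process`, `status`) and the status labels are assumed field names chosen for this illustration; the actual layout of the table in FIG. 8 is not reproduced here.

```python
from collections import Counter


def lot_distribution(lots: list[dict], product_type: str,
                     processes: list[str]) -> list[int]:
    """Count waiting and working lots of one product type in each process.

    Returns [waiting(p0), working(p0), waiting(p1), working(p1), ...]
    in the order given by `processes`.
    """
    counts = Counter((lot["process"], lot["status"])
                     for lot in lots if lot["type"] == product_type)
    vector = []
    for process in processes:
        vector.append(counts[(process, "waiting")])
        vector.append(counts[(process, "working")])
    return vector
```

Because the vector length depends only on the list of processes, every product type yields state data of the same shape, which is what lets a fixed-size network input be filled per product slot.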

In FIG. 8(b), the neural network of process P4 is the neural network that makes the decision selecting the next work item in process P4. That is, it is the artificial neural network that decides which work item to select next when equipment 1 (MC1) of process P4 has finished its current job. Meanwhile, the preceding processes P0 to P3 of the workflow may need no separate decision-making, or may already have neural networks capable of making their decisions.

In the example of FIG. 8, equipment MC1 of process P4 is currently working on LOT7, and three pieces of equipment in process P4 are each carrying out their own work. The factory workflow of FIG. 8 produces two product lot types, A and B. So that the artificial neural network can make an optimal decision, the factory workflow state for each lot type and the situation of each piece of equipment in process P4 are connected to the artificial neural network. In FIG. 8, the state information of the factory workflow in (a) is split by product A/B to generate the lot distribution and equipment state, and this state data is mapped to the input nodes of the artificial neural network, that is, to the state information of products 1 and 2 of the neural network. Product A of the factory workflow is mapped to product 1 of the neural network, and product B of the factory workflow is mapped to product 2 of the neural network.

The output nodes (job data) of the artificial neural network comprise as many nodes as there are lot types, and the lot type corresponding to the node with the strongest computed signal (number) is the product (or product type) selected by the artificial neural network. The product type of the factory workflow that corresponds (is mapped) to that output node is the final selection. For example, if product 1 is selected by the neural network, product A, which is mapped to product 1, is selected in process P4. Likewise, if product 2 is selected by the neural network, product B, which is mapped to product 2, is selected in process P4.

Next, the configuration of a neural network-based process scheduling system independent of product type and number of types according to an embodiment of the present invention will be described with reference to FIG. 9.

As shown in FIG. 9, the neural network-based process scheduling system 30 independent of product type and number of types according to an embodiment of the present invention consists of a simulation execution unit 31 that runs the factory simulator 20 and collects simulation data, a candidate type selection unit 32 that selects product type candidates from the workflow state, a product type mapping unit 33 that maps the candidate types of the workflow to the product types of the neural network, and a learning unit 34 that generates training data from the simulation results and trains the neural network of a specific process. It may further include a scheduling generation unit 35 that generates scheduling data.

First, the simulation execution unit 31 runs the factory simulator 20 and collects simulation data. As the result of simulation with the factory simulator 20, state information of the factory workflow can be collected. The state information of the factory workflow changes over time. The state of the factory workflow at a given point in time consists of the locations and states of the lots, and so on.

In particular, the simulation execution unit 31 can run the factory simulator 20 and collect the factory state (or each process state) of the factory workflow as shown earlier in FIG. 8.

Next, the candidate type selection unit 32 selects product type candidates from the workflow state simulated by the factory simulator 20.

That is, the candidate type selection unit 32 selects as many candidates as the effective number from among the product types simulated in the factory simulator 20. Here, the effective number is the number of product types of the neural network. The candidate type selection unit 32 selects product type candidates separately for the neural network of each process, so different candidates may be selected for the neural networks of different processes.

Preferably, the variables of the workflow state (or state variables) related to each product type are used to obtain the variable values (numerical values) of each product type, and product types are selected according to those values.

As a first method, each product type is evaluated by applying weights to the workflow state variables related to it, and as many product types as the effective number are selected as candidates in descending order of evaluation value.

The evaluation value V(d) of product type d is given by the following equation.

[Equation 1]

V(d) = \sum_{k} w_k \, v_k(d)

Here, w_k denotes the weight for variable k, and v_k(d) denotes the numerical value (or variable value) of product type d for variable k.

As shown in the table of FIG. 10, the state variables for evaluating each product type may include the delivery deadline, the number of waiting lots, the degree of equipment need, the waiting time, the possibility of continuous operation, and so on. The table of FIG. 10 is only one example, and candidates may be selected using various other variables.

For example, if the neural network has three product types, the three product types with the highest evaluation values are selected.
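The first method can be sketched as follows, assuming that the evaluation value is a weighted sum of the state variables. The variable names, weights, and sample values are made up for illustration, and breaking ties by product name is an assumption; the source does not say how ties are resolved.

```python
def select_candidates(variables: dict[str, dict[str, float]],
                      weights: dict[str, float], n: int) -> list[str]:
    """Rank product types by weighted score and keep the top n."""
    # Score of product type d: sum over variables k of w_k * v_k(d).
    scores = {d: sum(weights[k] * values[k] for k in weights)
              for d, values in variables.items()}
    # Highest score first; ties broken by product name (an assumption).
    ranked = sorted(scores, key=lambda d: (-scores[d], d))
    return ranked[:n]


# Hypothetical state variables for five product types.
example_variables = {
    "A": {"due": 1, "waiting": 1},
    "B": {"due": 3, "waiting": 0},
    "C": {"due": 2, "waiting": 2},
    "D": {"due": 0, "waiting": 1},
    "E": {"due": 2, "waiting": 1},
}
example_weights = {"due": 2, "waiting": 1}
```

With these sample numbers the scores are A=3, B=6, C=6, D=1, E=5, so `select_candidates(example_variables, example_weights, 3)` returns `["B", "C", "E"]`.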

As a second method, candidates may be selected by taking, for each state variable in turn, the product type with the highest value of that variable. For example, when the number of product types is N=3, candidate 1 may be the product type with the most urgent delivery deadline, candidate 2 the product type with the highest equipment need, and candidate 3 the product type with the highest possibility of continuous operation. That is, one product type is selected for each variable, namely the one with the highest value of that variable.
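The second method can be sketched as one pick per state variable. The source does not say what happens when the same product type ranks first for two variables; this sketch assumes that an already chosen type is skipped so that the candidates stay distinct. The variable names and values in the usage note are hypothetical.

```python
def select_by_variable(variables: dict[str, dict[str, float]],
                       ordered_variables: list[str]) -> list[str]:
    """For each variable in turn, pick the best not-yet-chosen product type."""
    chosen: list[str] = []
    for k in ordered_variables:
        # Assumption: a product type already chosen is not picked again.
        remaining = [d for d in variables if d not in chosen]
        if not remaining:
            break
        chosen.append(max(remaining, key=lambda d: variables[d][k]))
    return chosen
```

For example, with `{"A": {"due": 9, "need": 9, "cont": 1}, "B": {"due": 5, "need": 7, "cont": 2}, "C": {"due": 1, "need": 2, "cont": 8}}` and the order `["due", "need", "cont"]`, A is taken for the deadline, B for equipment need (A is already chosen), and C for continuous operation.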

Next, the product type mapping unit 33 maps the candidate types of the workflow to the product types of the neural network. That is, it maps the state of each candidate type in the workflow to the state data of the neural network, or maps the product type in the result data (or task data) of the neural network to the product type of the next task of the corresponding process in the workflow.

As shown in FIG. 11, the workflow executed on the simulator 10 has five product types in total: A, B, C, D, and E. The neural network of a given process has three product types. That is, the number of product types of that neural network is three.

In this case, three product types are selected from the product types in the workflow. In the example of FIG. 11, products B, C, and E are selected. The number of selected product types is equal to the number of product types of the neural network.

The state data of the selected product types (their state in the workflow) is mapped to the state data of the product types of the neural network. In the example above, product B is mapped to product 1, product C to product 2, and product E to product 3. That is, the state of each selected product (its state in the workflow) is turned into the state data input to the neural network.

That is, the artificial neural network makes decisions based on the order of the selected candidates, without needing to know the actual product codes or names defined inside the simulator.

In addition, in FIG. 11, the product type chosen by the neural network is mapped back to a product type in the workflow and returned. In the example of FIG. 11, the result of the neural network is product 2, and product 2 is mapped to product C.

That is, the product type (or product slot number) decided by the neural network is mapped back to the actual product type (the product type in the workflow). In other words, the product type decided by the neural network is converted into a workflow product type and passed to the simulator.
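The forward and reverse mapping of the FIG. 11 example can be sketched as follows. The function name and the use of plain dictionaries are assumptions made for illustration; the network's decision is replaced by a fixed value.

```python
from typing import Dict, List, Tuple

def build_mapping(candidates: List[str]) -> Tuple[Dict[int, str], Dict[str, int]]:
    """Map selected candidate types to network slots 1..N, in selection order."""
    slot_to_type = {i + 1: t for i, t in enumerate(candidates)}
    type_to_slot = {t: s for s, t in slot_to_type.items()}
    return slot_to_type, type_to_slot

# Candidates B, C, E selected from workflow types A to E (as in FIG. 11).
slot_to_type, type_to_slot = build_mapping(["B", "C", "E"])
print(type_to_slot)  # {'B': 1, 'C': 2, 'E': 3}

# Stand-in for the neural network output: it only ever sees slot numbers.
network_choice = 2

# Reverse mapping: slot 2 corresponds to workflow product type C.
actual_type = slot_to_type[network_choice]
print(actual_type)  # C
```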

Meanwhile, when learning data is generated from a production episode, the product type of the next task in that production episode is mapped to the product type in the task data (next task, output data) of the neural network.

Through the mapping process described above, the artificial neural network can always keep the same structure (number of input and output nodes) regardless of changes in factory conditions (a change in the number of product types, or a change in the products produced at each process). Furthermore, because the structure of the artificial neural network does not change, the network does not need to be retrained even if the number of products in the factory simulator changes or a new product is introduced, and scheduling can be performed with the already trained network.

In addition, narrowing all product types down to N candidates can be done at very low cost (computation time, computing resources, etc.), and the neural network is operated on the assumption that the optimal solution is very likely to be among these N candidates. That is, the benefit of not having to restructure or retrain the artificial neural network when the number of product types changes far outweighs the disadvantage that, "with very low probability," the optimal solution is not among the N candidates.

Meanwhile, preferably, the product type mapping unit 33 always maps the product types selected by the candidate type selection unit 32 to the product types of the neural network in the same way, according to their order (selection order).

For example, in the first method above, the product types are arranged in descending order of evaluation value and mapped to the product types of the neural network in that order. Assume that the product types with the highest evaluation values are in the order B, C, E in the first step, and in the order C, E, B in the second step. In this case, the selected product types are the same (B, C, and E), but the order in which they are mapped differs. That is, product type B is mapped to product 1 of the neural network in the first step, but to product 3 in the second step.

If mapping is performed while preserving the selection order in this way, training can be reflected more uniformly in the neural network even when the product types being learned differ.

In addition, when the number of product types in the workflow is smaller than the number of product types of the neural network, the product type mapping unit 33 generates as many default product types (or virtual types) as the shortfall, sets the state of each virtual type to a state in which no such product exists, and maps it.

For example, assume that the workflow has two product types and the neural network has three. In this case, one virtual product type is generated for the workflow. If the state data of the virtual type is a number of lots or a number of waiting lots, both are set to 0 and mapped. A delivery deadline or similar variable is set to an infinite period (or the longest deadline) and mapped. That is, the mapping assumes that no product or lot of the virtual type exists.
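Padding with virtual types can be sketched as follows. The field names, the `VIRTUAL_` naming, and the constant used for "the longest deadline" are hypothetical choices for illustration. A large finite value is used instead of infinity so that the result can be fed to a network input.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-in for "the longest deadline"; not specified in the text.
LONGEST_DUE_HOURS = 10_000.0

def empty_state() -> Dict[str, float]:
    """State of a product type that has no lots at all."""
    return {"lot_count": 0.0, "waiting_lots": 0.0, "due_in_hours": LONGEST_DUE_HOURS}

def pad_with_virtual(candidates: List[str],
                     state: Dict[str, Dict[str, float]],
                     n: int) -> Tuple[List[str], Dict[str, Dict[str, float]]]:
    """Append virtual types until there are n slots; real candidates keep their order."""
    padded = list(candidates)
    padded_state = {t: state[t] for t in candidates}
    for i in range(n - len(candidates)):
        name = f"VIRTUAL_{i + 1}"
        padded.append(name)
        padded_state[name] = empty_state()
    return padded, padded_state

# Workflow with two product types, network with three slots.
state = {
    "A": {"lot_count": 5.0, "waiting_lots": 2.0, "due_in_hours": 48.0},
    "B": {"lot_count": 3.0, "waiting_lots": 1.0, "due_in_hours": 24.0},
}
slots, slot_state = pad_with_virtual(["A", "B"], state, n=3)
print(slots)  # ['A', 'B', 'VIRTUAL_1']
```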

As another embodiment, when the number of product types in the workflow is smaller than the number of product types of the neural network, the product type mapping unit 33 may select product types in duplicate. For example, if the factory produces two products, A and B, but the artificial neural network and candidate selection logic are built for three types, the candidates are generated as candidate 1 = A, candidate 2 = B, and candidate 3 = A. That is, product type A appears twice. Even in this case, the neural network can find a clear optimal solution. Likewise, in a factory that produces products A, B, C, D, and E with N = 3, if there are no orders for products B, C, D, and E, product A may be selected for all of candidates 1, 2, and 3. In this case, the artificial neural network finally selects A with high probability.
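Duplicate selection can be sketched as follows. Cycling through the available types is one assumed way of choosing which types to repeat; the text gives only the resulting example (A, B, A).

```python
from itertools import cycle, islice
from typing import List

def fill_by_duplication(available: List[str], n: int) -> List[str]:
    """Fill n candidate slots by repeating the available product types in order."""
    if not available:
        raise ValueError("at least one product type is required")
    return list(islice(cycle(available), n))

print(fill_by_duplication(["A", "B"], 3))  # ['A', 'B', 'A']
print(fill_by_duplication(["A"], 3))       # ['A', 'A', 'A']
```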

Next, the learning execution unit 34 generates learning data from the collected simulation results, that is, from the workflow states, and trains the neural network of the corresponding process on that learning data.

In particular, the learning execution unit 34 selects candidate types from the simulation results or workflow state produced by the simulator 10, and generates the learning data of the neural network (state data, task data, etc.) according to the selected candidate types.

For example, when learning data is generated from one production episode, the state data and task data of the neural network are generated for each step of that production episode. At each step, candidate types are selected through the candidate type selection unit 32, and the candidate types are mapped to the product types of the neural network through the product type mapping unit 33 to generate the state data. In addition, the product type of the next task in that production episode is mapped to a product type of the neural network to generate the task data. The state data and task data generated in this way form the transition data used for training.
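The per-step generation of state data and task data can be sketched as follows. The episode format, the single `priority` variable used for candidate selection, and the two features per slot are all hypothetical. Skipping a step whose next product type is not among the candidates is also an assumption, since the text does not address that case, and rewards and next states are omitted.

```python
from typing import Dict, List, Tuple

State = Dict[str, Dict[str, float]]  # product type -> state variables

def select_candidates(state: State, n: int) -> List[str]:
    """Placeholder selector: rank by a single hypothetical 'priority' variable."""
    return sorted(state, key=lambda d: state[d]["priority"], reverse=True)[:n]

def encode(state: State, candidates: List[str]) -> List[float]:
    """Flatten the features of each candidate in slot order; no product names appear."""
    vec: List[float] = []
    for t in candidates:
        vec.extend([state[t]["priority"], state[t]["waiting_lots"]])
    return vec

def build_samples(episode: List[Tuple[State, str]], n: int) -> List[Tuple[List[float], int]]:
    """Turn (workflow state, next product type) steps into (state data, action slot) pairs."""
    samples = []
    for state, next_type in episode:
        cands = select_candidates(state, n)
        if next_type not in cands:
            continue  # assumption: steps outside the candidate set are dropped
        samples.append((encode(state, cands), cands.index(next_type)))  # 0-based slot
    return samples

# Made-up two-step episode with four product types and N = 3.
episode = [
    ({"A": {"priority": 0.1, "waiting_lots": 1.0}, "B": {"priority": 0.9, "waiting_lots": 4.0},
      "C": {"priority": 0.6, "waiting_lots": 2.0}, "D": {"priority": 0.3, "waiting_lots": 0.0}}, "C"),
    ({"A": {"priority": 0.8, "waiting_lots": 3.0}, "B": {"priority": 0.2, "waiting_lots": 1.0},
      "C": {"priority": 0.7, "waiting_lots": 2.0}, "D": {"priority": 0.1, "waiting_lots": 0.0}}, "A"),
]
samples = build_samples(episode, n=3)
for state_data, action in samples:
    print(state_data, action)
```

In the first step the candidates are B, C, D and the next task C falls in slot index 1; in the second step the candidates are A, C, B and the next task A falls in slot index 0. The candidate set therefore changes from step to step while the shape of the state data stays the same.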

Meanwhile, the learning execution unit 34 selects candidate types for each production episode or for each step of each production episode. Therefore, the product types in the learning data on which the neural network is trained may differ from one production episode or step to another.

However, the reinforcement learning adopted in the present invention works by selecting, for a given input or state, the next action (or output) expected to yield the highest reward or score. That is, the basic purpose of reinforcement learning is not to estimate an action (output) that matches a correct answer for every input. The goal of reinforcement learning is to select the most appropriate action in a given input state, and an appropriate output (action) is selected even when an input state that was never experienced during training is given.

In particular, with three product types, a decision at a given process can be made from the lot distribution across the whole factory and the equipment state at that process, regardless of which actual product types they are. The number of possible lot states per candidate product in the processes preceding that process is unbounded. Therefore, even for a product type state that did not occur during training, the process can make decisions that improve the reward or score. That is, the system can operate normally even if the product types during training are A, B, and C and the product types during actual operation are D, E, and F.

Next, the scheduling generation unit 35 runs a simulation through the simulator 10 and has the neural network make the decisions of each process, thereby generating a production episode for scheduling.

Here, for the neural network of each process, the scheduling generation unit 35 selects candidate types through the candidate type selection unit 32 and maps the candidate types to the product types of the neural network through the product type mapping unit 33 to generate state data. The generated state data is then applied to the neural network of that process to obtain a result (or next task data).

In addition, the scheduling generation unit 35 reverse-maps the product type in the obtained next task data to obtain the actual product type, and applies it to the simulator 10.
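One decision step of the scheduling generation unit can be sketched as follows. The trained neural network is replaced by a stub policy, and the selection rule and feature names are hypothetical; the sketch only shows the order of operations (select candidates, build state data, query the network, reverse-map the result).

```python
from typing import Callable, Dict, List

State = Dict[str, Dict[str, float]]  # product type -> state variables

def decide_next_type(state: State, n: int,
                     policy: Callable[[List[float]], int]) -> str:
    """Select candidates, build slot-ordered state data, query the policy, reverse-map."""
    candidates = sorted(state, key=lambda d: state[d]["priority"], reverse=True)[:n]
    state_data = [state[t]["waiting_lots"] for t in candidates]
    slot = policy(state_data)   # 0-based slot index chosen by the network
    return candidates[slot]     # actual product type handed back to the simulator

def stub_policy(state_data: List[float]) -> int:
    """Stand-in for a trained network: pick the slot with the most waiting lots."""
    return max(range(len(state_data)), key=lambda i: state_data[i])

# Made-up workflow state with four product types.
state = {
    "A": {"priority": 0.1, "waiting_lots": 9.0},
    "B": {"priority": 0.9, "waiting_lots": 2.0},
    "C": {"priority": 0.6, "waiting_lots": 5.0},
    "D": {"priority": 0.3, "waiting_lots": 1.0},
}
print(decide_next_type(state, n=3, policy=stub_policy))  # C
```

Product type A has the most waiting lots here but is not among the three candidates, so the policy can only choose among B, C, and D.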

The invention made by the present inventors has been described above in detail with reference to embodiments. However, the present invention is not limited to these embodiments and may of course be modified in various ways without departing from its gist.

10: Neural network agent 11: Neural network
20: Factory simulator 30: Learning system
31: Simulation execution unit 32: Candidate type selection unit
33: Product type mapping unit 34: Learning execution unit
35: Scheduling generation unit
40: Database 41: Learning DB
42: Scheduling DB

Claims

A neural network-based process scheduling system independent of product type and number of types, which trains a scheduling neural network for each process of a factory workflow by simulating the factory workflow with a factory simulator and using the simulation results, the system comprising:
a simulation execution unit that simulates the factory workflow using the factory simulator;
a candidate type selection unit that selects product type candidates (hereinafter referred to as candidate types) in a workflow state simulated by the factory simulator, and selects as many candidate types as the number of product types of the scheduling neural network;
a product type mapping unit that maps the selected candidate types to the product types of the scheduling neural network; and
a learning execution unit that generates learning data of the scheduling neural network using the workflow state simulated by the factory simulator, by having the candidate type selection unit select candidate types and having the product type mapping unit map the selected candidate types to the product types of the scheduling neural network, thereby generating learning data corresponding to that workflow state.

According to claim 1,
A neural network-based process scheduling system independent of product type and number of types, wherein the candidate type selection unit uses variables of the workflow state related to each product type (hereinafter referred to as state variables) to obtain, for each product type, the variable value of each variable, and selects product types according to the variable values.

According to claim 2,
A neural network-based process scheduling system independent of product type and number of types, characterized in that the candidate type selection unit calculates the evaluation value V(d) of each product type d according to the following equation.
[Equation 1]

V(d) = Σ_k w_k · v_k(d)

Here, w_k represents the weight for the k-th state variable, and v_k(d) represents the variable value of product type d for the k-th state variable.

According to claim 2,
A neural network-based process scheduling system independent of product type and number of types, wherein the state variables are one or more of delivery deadline, number of waiting lots, equipment need, waiting time, and feasibility of continuous operation.

According to claim 1,
A neural network-based process scheduling system independent of product type and number of types, wherein the product type mapping unit always maps the selected product types to the product types of the scheduling neural network in the same way, according to their selection order.

According to claim 1,
A neural network-based process scheduling system independent of product type and number of types, wherein, if the number of product types in the workflow state is less than the number of product types of the scheduling neural network, the product type mapping unit generates as many virtual product types (hereinafter referred to as virtual types) as the shortfall, or selects product types in duplicate.

According to claim 1,
A neural network-based process scheduling system independent of product type and number of types, further comprising a scheduling generation unit that generates production episodes for scheduling by running a simulation through the factory simulator while having the scheduling neural network of each process make the decisions for that process, wherein, for the scheduling neural network of each process, the scheduling generation unit selects candidate types through the candidate type selection unit, maps the candidate types to the product types of the scheduling neural network through the product type mapping unit to generate state data, and applies the generated state data to the scheduling neural network of that process to obtain a result.

According to any one of claims 1 to 7,
A neural network-based process scheduling system independent of product type and number of types, characterized in that the scheduling neural network for each process is a neural network based on reinforcement learning.