KR102636029B1 - Method and system for recommending stacking position of container based on reinforcement learning

Info

Publication number: KR102636029B1
Application number: KR1020220186459A
Authority: KR
Inventors: 장우석; 김동규; 이효준; 정규민; 이성진
Original assignee: 주식회사 컨테인어스
Priority date: 2022-12-09
Filing date: 2022-12-27
Publication date: 2024-02-13

Abstract

The present disclosure relates to a reinforcement learning-based method for recommending a container stacking position, performed by at least one processor. The method includes receiving target container data and state data of an actual container yard, receiving state data of an actual reach stacker located in the actual container yard, and generating container task data from the target container data, the state data of the actual container yard, and the state data of the actual reach stacker using a reinforcement learning model, wherein the reinforcement learning model is trained based on virtual container data, state data of a virtual container yard, and state data of a virtual reach stacker.

Description

METHOD AND SYSTEM FOR RECOMMENDING STACKING POSITION OF CONTAINER BASED ON REINFORCEMENT LEARNING

The present disclosure relates to a method and system for recommending a container stacking position based on reinforcement learning, and more specifically to a method and system for recommending the stacking position of a target container by generating container task data using a reinforcement learning model trained on virtual data.

In the port logistics process, containers unloaded from a container ship are stacked in an inland container yard before being transported to their next destination. Because storing containers incurs considerable additional cost and space is limited, containers are stacked in multiple tiers to use the container yard space efficiently.

However, when containers are stacked in multiple tiers without regard to their stacking positions, shipping out a container at the bottom of a stack requires unnecessary rehandling work, in which every container stacked above it must first be moved to another storage area.
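The rehandling cost described above can be made concrete with a small sketch. It assumes a stack is modeled as a bottom-to-top list of container IDs; the function name `count_rehandles` and the IDs are illustrative and do not come from the disclosure.

```python
from typing import List


def count_rehandles(stack: List[str], target: str) -> int:
    """Return how many containers sit above `target` in a bottom-to-top stack.

    Each of those containers must be moved elsewhere before `target`
    can be retrieved, so the count equals the number of rehandling moves.
    """
    index = stack.index(target)  # raises ValueError if target is not in the stack
    return len(stack) - index - 1


# Hypothetical four-tier stack, listed from bottom to top.
example_stack = ["C1", "C2", "C3", "C4"]
bottom_cost = count_rehandles(example_stack, "C1")  # three containers above C1
top_cost = count_rehandles(example_stack, "C4")     # nothing above C4
```

Retrieving the bottom container costs one rehandling move per container above it, while the top container can be retrieved directly, which is why the stacking position matters.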

To solve the above problem, the present disclosure provides a reinforcement learning-based method for recommending a container stacking position, a computer program stored in a recording medium, and a system (apparatus).

The present disclosure may be implemented in various ways, including as a method, a system (apparatus), or a computer program stored in a readable storage medium.

According to an embodiment of the present disclosure, a reinforcement learning-based method for recommending a container stacking position, performed by at least one processor, includes receiving target container data and state data of an actual container yard, receiving state data of an actual reach stacker located in the actual container yard, and generating container task data from the target container data, the state data of the actual container yard, and the state data of the actual reach stacker using a reinforcement learning model, wherein the reinforcement learning model is trained based on virtual container data, state data of a virtual container yard, and state data of a virtual reach stacker.

According to an embodiment of the present disclosure, the actual reach stacker includes a plurality of reach stackers located in the actual container yard, and the method further includes transmitting the container task data to at least one of the plurality of reach stackers.

According to an embodiment of the present disclosure, the container task data includes a task type and a stacking position of the target container.

According to an embodiment of the present disclosure, the task type includes receiving of the target container, and the stacking position of the target container includes a position at which the target container is to be stacked in the actual container yard.

According to an embodiment of the present disclosure, the task type includes container rehandling, and the stacking position of the target container includes a position to which the target container is to be rehandled.

According to an embodiment of the present disclosure, the reinforcement learning model is trained through a learning method that receives virtual container data, state data of a virtual container yard, and state data of a virtual reach stacker, determines virtual container task data based on the received data, transmits the virtual container task data to the virtual reach stacker, and obtains a reward for the task performance result generated by the virtual reach stacker.

According to an embodiment of the present disclosure, the reinforcement learning model checks whether the virtual container task data violates a preset prohibition condition.

According to an embodiment of the present disclosure, the virtual container data is set randomly, and when the virtual container task data violates the prohibition condition, the virtual container data is randomly reset.

According to an embodiment of the present disclosure, the reinforcement learning model is updated by repeating the learning method so that the obtained reward is maximized.

According to an embodiment of the present disclosure, the virtual container task data is generated under a condition that limits the stacking rate of the virtual container yard to an arbitrary value.
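The learning method summarized above (random virtual data, task selection, execution by a virtual reach stacker, reward, prohibition check, and a cap on how full the yard may get) can be sketched as a toy simulation. This is only a minimal illustration under stated assumptions: the yard size, the reward values, the "stack is full" prohibition condition, the cap of 0.7, and the simple one-step tabular value update are all hypothetical choices made here and are not the algorithm specified in the disclosure.

```python
import random
from collections import defaultdict

NUM_STACKS, MAX_TIER = 3, 3   # hypothetical virtual yard: 3 stacks, 3 tiers each
MAX_FILL = 0.7                # hypothetical cap on the yard's fill level


def observe(yard, container):
    """State: departure day on top of each stack (0 = empty) plus the new container's day."""
    return tuple(s[-1] if s else 0 for s in yard) + (container,)


def step(yard, container, action):
    """Virtual reach stacker performs the task; returns (reward, violated)."""
    stack = yard[action]
    if len(stack) >= MAX_TIER:
        return -5.0, True  # assumed prohibition condition: the stack is already full
    # Assumed reward: penalize covering a container that departs earlier.
    reward = -1.0 if stack and stack[-1] < container else 1.0
    stack.append(container)
    return reward, False


def train(episodes=2000, alpha=0.1, epsilon=0.1, seed=0):
    """Epsilon-greedy, one-step tabular value update (illustrative stand-in only)."""
    rng = random.Random(seed)
    q = defaultdict(float)
    capacity = NUM_STACKS * MAX_TIER
    for _ in range(episodes):
        yard = [[] for _ in range(NUM_STACKS)]
        while sum(len(s) for s in yard) < MAX_FILL * capacity:
            container = rng.randint(1, 5)  # randomly set virtual container (departure day)
            state = observe(yard, container)
            if rng.random() < epsilon:
                action = rng.randrange(NUM_STACKS)
            else:
                action = max(range(NUM_STACKS), key=lambda a: q[(state, a)])
            reward, violated = step(yard, container, action)
            q[(state, action)] += alpha * (reward - q[(state, action)])
            if violated:
                continue  # prohibited task: a new random virtual container is drawn
    return q
```

After training, the learned values for a state in which stack 0 holds an early-departing container should favor placing a late-departing container on an empty stack, which is the behavior that avoids later rehandling.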

According to an embodiment of the present disclosure, a method for generating a reinforcement learning-based container stacking position recommendation model, performed by at least one processor, includes (a) setting virtual container data, state data of a virtual container yard, and state data of a virtual reach stacker, (b) generating virtual container task data from the set data using a reinforcement learning model, (c) transmitting the virtual container task data to the virtual reach stacker to generate a task performance result, and (d) obtaining a reward for the task performance result.

According to an embodiment of the present disclosure, the virtual container task data includes a task type and a stacking position of the virtual container.

According to an embodiment of the present disclosure, the task type includes receiving of the virtual container, and the stacking position of the virtual container includes a position at which the virtual container is to be stacked in the virtual container yard.

According to an embodiment of the present disclosure, the task type includes virtual container rehandling, and the stacking position of the virtual container includes a position to which the virtual container is to be rehandled.

According to an embodiment of the present disclosure, the method further includes checking whether the virtual container task data violates a preset prohibition condition.

According to an embodiment of the present disclosure, the virtual container data is set randomly, and when the virtual container task data violates the prohibition condition, the virtual container data is randomly reset.

According to an embodiment of the present disclosure, the method further includes updating the reinforcement learning model by repeating steps (a), (b), (c), and (d) so that the obtained reward is maximized.

A computer program stored in a computer-readable recording medium is provided for executing, on a computer, the above-described method according to an embodiment of the present disclosure.

An information processing system according to an embodiment of the present disclosure includes a communication module, a memory, and at least one processor connected to the memory and configured to execute at least one computer-readable program contained in the memory. The at least one program includes instructions for receiving target container data and state data of an actual container yard, receiving state data of an actual reach stacker located in the actual container yard, and generating container task data from the target container data, the state data of the actual container yard, and the state data of the actual reach stacker using a reinforcement learning model, wherein the reinforcement learning model is trained based on virtual container data, state data of a virtual container yard, and state data of a virtual reach stacker.

According to an embodiment of the present disclosure, containers can be stacked efficiently in a container yard by providing the optimal position for stacking each container using a reinforcement learning model trained on virtual data.

According to an embodiment of the present disclosure, unnecessary container rehandling work can be minimized by providing the optimal position for stacking a container in the container yard using the reinforcement learning model.

According to an embodiment of the present disclosure, by generating not only container receiving task data but also rehandling task data that takes the state of the container yard into account, as many containers as possible can be stacked in a limited space in an optimal arrangement.

The effects of the present disclosure are not limited to those mentioned above, and other effects not mentioned will be clearly understood from the claims by a person having ordinary skill in the art to which the present disclosure pertains (hereinafter, "a person skilled in the art").

Embodiments of the present disclosure will be described with reference to the accompanying drawings described below, in which like reference numerals denote like elements, but are not limited thereto.
FIG. 1 is a diagram illustrating an example in which a reinforcement learning-based container stacking position recommendation system according to an embodiment of the present disclosure is used.
FIG. 2 is a schematic diagram showing a configuration in which an information processing system according to an embodiment of the present disclosure is communicatively connected to a plurality of reach stackers.
FIG. 3 is a block diagram showing the internal configuration of an information processing system according to an embodiment of the present disclosure.
FIG. 4 is a block diagram showing the internal configuration of a processor of an information processing system according to an embodiment of the present disclosure.
FIG. 5 is a diagram showing the internal configuration of a learning unit according to an embodiment of the present disclosure.
FIG. 6 is a diagram showing an example of a reinforcement learning model according to an embodiment of the present disclosure.
FIG. 7 is a diagram showing an example of an agent structure according to an embodiment of the present disclosure.
FIG. 8 is a diagram showing an example of a method for training a reinforcement learning model according to an embodiment of the present disclosure.
FIG. 9 is a diagram illustrating an example of generating container receiving task data according to an embodiment of the present disclosure.
FIG. 10 is a diagram illustrating an example of generating container rehandling task data according to an embodiment of the present disclosure.
FIG. 11 is a flowchart showing a reinforcement learning-based method for recommending a container stacking position according to an embodiment of the present disclosure.
FIG. 12 is a flowchart showing a method for generating a reinforcement learning-based container stacking position recommendation model according to an embodiment of the present disclosure.

Hereinafter, specific details for carrying out the present disclosure will be described with reference to the accompanying drawings. In the following description, however, detailed descriptions of well-known functions or configurations are omitted where they might unnecessarily obscure the gist of the present disclosure.

In the accompanying drawings, identical or corresponding components are given the same reference numerals. In the description of the following embodiments, redundant descriptions of identical or corresponding components may be omitted. However, even if the description of a component is omitted, it is not intended that such a component is excluded from any embodiment.

The advantages and features of the disclosed embodiments, and the methods of achieving them, will become clear with reference to the embodiments described below together with the accompanying drawings. However, the present disclosure is not limited to the embodiments disclosed below and may be implemented in various different forms. The embodiments are provided only to make the present disclosure complete and to fully inform a person skilled in the art of the scope of the invention.

The terms used in this specification are briefly explained below, and the disclosed embodiments are then described in detail. The terms used in this specification were chosen, as far as possible, from general terms in wide current use in view of their function in the present disclosure, but they may vary depending on the intention of those working in the related field, precedent, the emergence of new technology, and the like. In certain cases, terms have been arbitrarily selected by the applicant, in which case their meaning is described in detail in the relevant part of the description. Accordingly, the terms used in the present disclosure should be defined based on their meaning and the overall content of the present disclosure, not simply on their names.

In this specification, singular expressions include plural expressions unless the context clearly specifies the singular. Likewise, plural expressions include singular expressions unless the context clearly specifies the plural. Throughout the specification, when a part is said to include a component, this means that it may further include other components rather than excluding them, unless specifically stated otherwise.

In addition, the term "module" or "unit" used in the specification refers to a software or hardware component, and a "module" or "unit" performs certain roles. However, a "module" or "unit" is not limited to software or hardware. A "module" or "unit" may be configured to reside in an addressable storage medium and may be configured to run on one or more processors. Thus, as an example, a "module" or "unit" may include at least one of components such as software components, object-oriented software components, class components, and task components, as well as processes, functions, attributes, procedures, subroutines, segments of program code, drivers, firmware, microcode, circuits, data, databases, data structures, tables, arrays, or variables. The functionality provided within the components and "modules" or "units" may be combined into a smaller number of components and "modules" or "units", or further separated into additional components and "modules" or "units".

According to an embodiment of the present disclosure, a "module" or "unit" may be implemented with a processor and a memory. "Processor" should be interpreted broadly to include a general-purpose processor, a central processing unit (CPU), a microprocessor, a digital signal processor (DSP), a controller, a microcontroller, a state machine, and the like. In some contexts, "processor" may refer to an application-specific integrated circuit (ASIC), a programmable logic device (PLD), a field-programmable gate array (FPGA), and the like. "Processor" may also refer to a combination of processing devices, such as a combination of a DSP and a microprocessor, a combination of a plurality of microprocessors, a combination of one or more microprocessors coupled with a DSP core, or any other combination of such configurations. Likewise, "memory" should be interpreted broadly to include any electronic component capable of storing electronic information. "Memory" may refer to various types of processor-readable media, such as random access memory (RAM), read-only memory (ROM), non-volatile random access memory (NVRAM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable PROM (EEPROM), flash memory, magnetic or optical data storage, registers, and the like. A memory is said to be in electronic communication with a processor if the processor can read information from and/or write information to the memory. A memory integrated into a processor is in electronic communication with the processor.

In the present disclosure, "reinforcement learning (RL)" is an area of machine learning and may refer to a learning method in which a computer selects, from among the actions available in a given state, the action that maximizes the reward. The computer program that is the subject of reinforcement learning may be referred to as an agent, and the agent may explore a given environment to establish a policy representing the probabilities of the actions it can take in the current state. When the environment gives the agent a particular state (equivalently, when the agent obtains the state through observation), the agent acts according to that state and the environment gives the agent a reward. Through this process the agent interacts with the environment and can learn the actions or policy that yield high rewards. In other words, the agent or its policy is trained through the exchange of states, actions, and rewards between the environment and the agent, and the goal of reinforcement learning may be to establish a policy with which the agent receives the maximum reward.
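In standard reinforcement learning notation, which the disclosure itself does not use, the goal stated above can be written as follows. Here $\pi(a \mid s)$ is the policy, $s_t$, $a_t$, and $r_t$ are the state, action, and reward at step $t$, and the discount factor $\gamma \in [0, 1]$ and horizon $T$ are assumptions introduced only for this illustration.

```latex
\pi(a \mid s) = \Pr(a_t = a \mid s_t = s), \qquad
J(\pi) = \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{T} \gamma^{t} r_{t}\right], \qquad
\pi^{*} = \arg\max_{\pi} J(\pi)
```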

In the present disclosure, a "machine learning model" may include any model used to infer an answer to a given input. According to an embodiment, the machine learning model may include an artificial neural network model comprising an input layer, a plurality of hidden layers, and an output layer, where each layer may include a plurality of nodes. In the present disclosure, a machine learning model may refer to an artificial neural network model, and an artificial neural network model may refer to a machine learning model.
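The layered structure described above can be illustrated with a tiny feed-forward network written in plain Python. This is a minimal sketch, not the network used in the disclosure: the layer sizes, the ReLU and softmax functions, the random weights, and the four-number input are all assumptions chosen for illustration.

```python
import math
import random


def make_layer(n_in, n_out, rng):
    """Create one fully connected layer as (weights, biases) with small random weights."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def dense(x, layer, relu=False):
    """Apply one layer: weighted sum per node, optionally followed by ReLU."""
    weights, biases = layer
    out = [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]
    return [max(0.0, o) for o in out] if relu else out


def softmax(z):
    """Turn raw scores into probabilities (max is subtracted for numerical stability)."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def policy(x, layers):
    """Input layer -> hidden layers (ReLU) -> output layer (softmax over actions)."""
    h = x
    for layer in layers[:-1]:
        h = dense(h, layer, relu=True)
    return softmax(dense(h, layers[-1]))


rng = random.Random(0)
# 4 input features, two hidden layers of 8 nodes, 3 output nodes (candidate stacks).
layers = [make_layer(4, 8, rng), make_layer(8, 8, rng), make_layer(8, 3, rng)]
probs = policy([0.2, 0.0, 0.5, 1.0], layers)
```

The output is one probability per candidate action, which matches the idea of a policy that assigns probabilities to the actions available in a state.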

In the present disclosure, a "container yard" may refer to a place where containers are stored, delivered, and received. For example, it may refer to a yard in an area near a port where the carrier of a container ship collects, stores, and stacks containers and hands over or takes delivery of loaded containers. A "container yard" may also be referred to as a container storage yard.

In the present disclosure, a "reach stacker" may refer to equipment used mainly to move empty containers within a wharf or container storage yard that handles a small volume of containers, or to take a container lowered by a container crane and stack it within the container storage yard.

FIG. 1 is a diagram illustrating an example in which a reinforcement learning-based container stacking position recommendation system 110 according to an embodiment of the present disclosure is used. The container stacking position recommendation system 110 may be configured to include a reinforcement learning model 120.

According to an embodiment, the container stacking position recommendation system 110 may receive target container data 132. The target container data 132 may refer to data associated with a container to be stacked in the actual container yard or a container to be rehandled. For example, the target container data 132 may include the port of arrival of the container ship carrying the target container, the arrival date, route information of the container ship, information on the shipping company of the container ship, the size of the container, the release date of the container, and the like.

According to an embodiment, the container stacking position recommendation system 110 may receive state data 134 of the actual container yard. The state data 134 of the actual container yard may refer to data on the current state of the actual container yard and on the containers currently stacked there. For example, the state data 134 of the actual container yard may include the size of the container yard, the number of containers stored at the maximum stacking rate of the container yard, whether containers are currently stacked in the container yard, and, if so, detailed information on those containers.

According to an embodiment, the container stacking position recommendation system 110 may receive state data 136 of the actual reach stacker. In this case, the state data 136 of the actual reach stacker may refer to data associated with the current state of a plurality of reach stackers located in the actual container yard. For example, the state data 136 of the actual reach stacker may include the position of the actual reach stacker, whether it is currently working, its workload, information on its driver, and other information on the actual reach stacker.
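The three inputs described above could be represented with simple record types. The sketch below is only one possible layout under stated assumptions: every class name, field name, and the row/bay/tier grid model of the yard is hypothetical, since the disclosure lists the kinds of information but not a concrete data format.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional, Tuple


@dataclass
class TargetContainer:
    """Data about the container to be stacked or rehandled (cf. data 132)."""
    container_id: str
    arrival_port: str
    arrival_date: date
    shipping_line: str
    size_ft: int
    release_date: date


@dataclass
class YardState:
    """Current yard state (cf. data 134), modeled as a rows x bays grid of stacks."""
    rows: int
    bays: int
    max_tiers: int
    # stacks[row][bay] lists container IDs from bottom to top
    stacks: List[List[List[str]]] = field(default_factory=list)

    def __post_init__(self):
        if not self.stacks:
            self.stacks = [[[] for _ in range(self.bays)] for _ in range(self.rows)]

    def capacity(self) -> int:
        """Number of containers the yard holds when every stack is full."""
        return self.rows * self.bays * self.max_tiers

    def stacking_rate(self) -> float:
        """Fraction of the capacity currently occupied."""
        stored = sum(len(stack) for row in self.stacks for stack in row)
        return stored / self.capacity()


@dataclass
class ReachStackerState:
    """Current state of one reach stacker (cf. data 136)."""
    stacker_id: str
    position: Tuple[int, int]
    busy: bool
    pending_tasks: int = 0
    driver: Optional[str] = None


yard = YardState(rows=2, bays=2, max_tiers=3)
yard.stacks[0][1].append("C1")
```

With one container in a yard of capacity 12, `yard.stacking_rate()` is 1/12; the same quantity is what a cap on the stacking rate would constrain during training.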

According to an embodiment, the container stacking position recommendation system 110 may use the reinforcement learning model 120 to generate container task data 140 from the target container data 132, the state data 134 of the actual container yard, and the state data 136 of the actual reach stacker. In this case, the container task data 140 may include a task type and a stacking position of the target container. For example, the container task data 140 may include container receiving task data, in which the task type is container receiving and the stacking position is the position at which the target container is to be stacked in the actual container yard. As another example, the container task data 140 may include container rehandling task data, in which the task type is container rehandling and the stacking position is the position to which the target container is to be rehandled. This is described in more detail below with reference to FIGS. 9 and 10.
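The two kinds of container task data described above share the same shape: a task type plus a position. A minimal sketch of such a record follows; the names `TaskType`, `StackPosition`, `ContainerTask`, and `describe`, and the row/bay/tier coordinates, are hypothetical and not taken from the disclosure.

```python
from dataclasses import dataclass
from enum import Enum


class TaskType(Enum):
    RECEIVING = "receiving"    # stack an incoming container in the yard
    REHANDLING = "rehandling"  # move an already stacked container elsewhere


@dataclass(frozen=True)
class StackPosition:
    row: int
    bay: int
    tier: int


@dataclass(frozen=True)
class ContainerTask:
    """One unit of container task data: what to do, to which container, and where."""
    task_type: TaskType
    container_id: str
    position: StackPosition


def describe(task: ContainerTask) -> str:
    """Render a task as a short instruction for an operator or terminal."""
    verb = "stack" if task.task_type is TaskType.RECEIVING else "move"
    p = task.position
    return f"{verb} {task.container_id} at row {p.row}, bay {p.bay}, tier {p.tier}"


receiving = ContainerTask(TaskType.RECEIVING, "C7", StackPosition(0, 2, 1))
rehandling = ContainerTask(TaskType.REHANDLING, "C3", StackPosition(1, 0, 2))
```

Keeping both task types in one record type means the same message format can be sent to a reach stacker regardless of whether the container is newly received or being rehandled.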

일 실시예에 따르면, 컨테이너 장치 위치 추천 시스템(110)은 생성한 컨테이너 작업 데이터(140)를 리치 스태커(150)에 전송하여 해당 컨테이너 작업을 수행하도록 지시할 수 있다. 이 경우 리치 스태커(150)는 실제 컨테이너 야드에 존재하는 복수의 리치 스태커 중 하나일 수 있다. 도 1에서 하나의 리치 스태커(150)를 대상으로 컨테이너 작업 데이터(140)를 전송하는 것으로 도시되었으나 이에 한정되지 않으며, 상이한 수의 리치 스태커에게 컨테이너 작업 데이터(140)가 전송될 수 있다.According to one embodiment, the container device location recommendation system 110 may transmit the generated container task data 140 to the reach stacker 150 and instruct it to perform the corresponding container task. In this case, the reach stacker 150 may be one of a plurality of reach stackers existing in an actual container yard. In FIG. 1 , the container work data 140 is shown as being transmitted to one reach stacker 150, but the present invention is not limited thereto, and the container work data 140 may be transmitted to a different number of reach stackers.

According to one embodiment, the reinforcement learning model 120 may be trained based on virtual container data, state data of a virtual container yard, and state data of a virtual reach stacker. In this case, the reinforcement learning model 120 may include any algorithm trained to select the container stacking position that yields the highest reward. For example, the reinforcement learning model 120 may be trained using a machine learning model (e.g., an artificial neural network model). Details are described below with reference to FIGS. 6 to 8.

With the configuration described above, the container stacking position recommendation system 110 according to some embodiments of the present disclosure uses a reinforcement learning model to provide the optimal position at which to stack a container in the container yard, so that containers can be stacked efficiently and unnecessary container rehandling can be minimized.

FIG. 2 is a schematic diagram showing a configuration in which the information processing system 230 according to an embodiment of the present disclosure is communicably connected to a plurality of reach stackers 212_1, 212_2, and 212_3. Although FIG. 2 shows the reach stackers 212_1, 212_2, and 212_3 as the target objects, the disclosure is not limited thereto, and the target object may be a driver operating the vehicle or a user terminal used by the driver. In addition, the information processing system 230 may include the container stacking position recommendation system(s) described with reference to FIG. 1.

In one embodiment, the information processing system 230 may include one or more server devices and/or databases, or one or more distributed computing devices and/or distributed databases based on a cloud computing service, capable of storing, providing, and executing computer-executable programs (e.g., downloadable applications) and data related to recommending container stacking positions. The information processing system 230 may provide information corresponding to a signal input through an application (e.g., an application related to container stacking positions) or perform the corresponding processing. For example, the information processing system 230 may transmit container task data to the plurality of reach stackers 212_1, 212_2, and 212_3 through any application related to container stacking positions.

The information processing system 230 may communicate with the plurality of reach stackers 212_1, 212_2, and 212_3 through the network 220. The network 220 may be configured to enable communication between the plurality of reach stackers 212_1, 212_2, and 212_3 and the information processing system 230. Depending on the installation environment, the network 220 may consist of, for example, a wired network such as Ethernet, a wired home network (Power Line Communication), a telephone line communication device, or RS-serial communication; a wireless network such as a mobile communication network, WLAN (Wireless LAN), Wi-Fi, Bluetooth, or ZigBee; or a combination thereof. The communication method is not limited, and may include not only communication methods that use the communication networks the network 220 may include (e.g., a mobile communication network, the wired Internet, the wireless Internet, a broadcasting network, or a satellite network) but also short-range wireless communication between the reach stackers 212_1, 212_2, and 212_3.

Although FIG. 2 shows three reach stackers 212_1, 212_2, and 212_3 communicating with the information processing system 230 through the network 220, the disclosure is not limited thereto, and a different number of reach stackers may be configured to communicate with the information processing system 230 through the network 220.

In one embodiment, the reach stackers 212_1, 212_2, and 212_3 may transmit a request for data related to a container task to the information processing system 230 through the network 220 and receive the data related to the container task from the information processing system 230. In response to receiving the data related to the container task, the reach stackers 212_1, 212_2, and 212_3 may be operated to work on the corresponding container.

FIG. 3 is a block diagram showing the internal configuration of the information processing system 230 according to an embodiment of the present disclosure. The information processing system 230 for recommending container stacking positions may include a memory 310, a processor 320, a communication module 330, and an input/output interface 340. As shown in FIG. 3, the information processing system 230 may be configured to communicate information and/or data over a network using the communication module 330.

The memory 310 may include any non-transitory computer-readable recording medium. According to one embodiment, the memory 310 may include a permanent mass storage device such as a disk drive, a solid state drive (SSD), or flash memory. As another example, a permanent mass storage device such as ROM, an SSD, flash memory, or a disk drive may be included in the information processing system 230 as a separate permanent storage device distinct from the memory. In addition, the memory 310 may store an operating system and at least one piece of program code (e.g., instructions for generating container task data that are installed and run on the information processing system 230). Although FIG. 3 shows the memory 310 as a single memory, this is only for convenience of explanation, and the memory 310 may include a plurality of memories and/or buffer memories.

These software components may be loaded from a computer-readable recording medium separate from the memory 310. Such a separate computer-readable recording medium may include a recording medium that can be connected directly to the information processing system 230, for example, a floppy drive, a disk, a tape, a DVD/CD-ROM drive, or a memory card. As another example, the software components may be loaded into the memory 310 through the communication module 330 rather than from a computer-readable recording medium. For example, at least one program may be loaded into the memory 310 based on a computer program (e.g., a program for generating container task data) that is installed from files provided through the communication module 330 by developers or by a file distribution system that distributes application installation files.

The processor 320 may be configured to process the instructions of a computer program by performing basic arithmetic, logic, and input/output operations. The instructions may be provided to a user terminal (not shown) or another external system by the memory 310 or the communication module 330. For example, the processor 320 may train a reinforcement learning model using the state data of the virtual container yard and the like.

The communication module 330 may provide a configuration or function that allows a user terminal (not shown) and the information processing system 230 to communicate with each other through a network, and may provide a configuration or function that allows the information processing system 230 to communicate with an external system (e.g., a separate cloud system). For example, control signals, commands, data, and the like provided under the control of the processor 320 of the information processing system 230 may pass through the communication module 330 and the network and be transmitted to the user terminal and/or external system via the communication module of that user terminal and/or external system. For example, an external system (e.g., the information processing system of a reach stacker) may receive container task data and the like from the information processing system 230.

In addition, the input/output interface 340 of the information processing system 230 may be a means for interfacing with an input or output device (not shown) that is connected to, or may be included in, the information processing system 230. For example, the input/output interface 340 may include at least one of a PCI Express interface and an Ethernet interface. Although FIG. 3 shows the input/output interface 340 as an element separate from the processor 320, the disclosure is not limited thereto, and the input/output interface 340 may be configured to be included in the processor 320. The information processing system 230 may include more components than those shown in FIG. 3. However, most conventional components need not be explicitly illustrated.

The processor 320 of the information processing system 230 may be configured to manage, process, and/or store information and/or data received from a plurality of user terminals and/or a plurality of external systems. According to one embodiment, the processor 320 may receive the state data of the actual container yard and the like. Although FIG. 3 shows the processor 320 as a single processor, this is only for convenience of explanation, and the processor 320 may include a plurality of processors.

FIG. 4 is a block diagram showing the internal configuration of the processor 320 of the information processing system according to an embodiment of the present disclosure. The processor 320 may include a data receiving unit 410, a learning unit 420, and a container yard master 430. In this case, programs related to the data receiving unit 410, the learning unit 420, and the container yard master 430 may be stored in or loaded onto any storage medium (e.g., the memory 310) and accessed or executed by the processor 320. The components of the processor 320 in FIG. 4 represent functionally distinct elements, and a plurality of components may be implemented in integrated form in an actual physical environment. In addition, although FIG. 4 shows the processor 320 divided into the data receiving unit 410, the learning unit 420, and the container yard master 430, the disclosure is not limited thereto, and some components may be omitted or other components may be added.

According to one embodiment, the data receiving unit 410 may receive the data used to generate container task data. For example, the data receiving unit 410 may receive target container data (e.g., the port of arrival and arrival date of the container ship carrying the target container), state data of the actual container yard (e.g., the size of the container yard and the stacked quantity at the container yard's maximum stacking rate), and state data of the actual reach stacker (e.g., the position of the reach stacker and whether the reach stacker is working). In addition, the data receiving unit 410 may transmit the received data to the container yard master 430.

According to one embodiment, the learning unit 420 may train the reinforcement learning model of the container yard master 430 based on virtual container data, state data of the virtual container yard, and state data of the virtual reach stacker. More specifically, the learning unit 420 may train the reinforcement learning model of the container yard master 430 by using the reinforcement learning model to generate virtual container task data and updating the model so that the reward obtained from that data is maximized. An example of the internal configuration of the learning unit 420 is described below with reference to FIG. 5, and examples of its learning method are described below with reference to FIGS. 6 to 8.

According to one embodiment, the container yard master 430 may generate container task data from the target container data, the state data of the actual container yard, and the state data of the actual reach stacker. To this end, the container yard master 430 may be configured to include a reinforcement learning model. In addition, the container yard master 430 may transmit the generated container task data, through the communication module of the information processing system, to at least one of the plurality of reach stackers located in the actual container yard so that the container task is performed.

FIG. 5 is a diagram showing the internal configuration of the learning unit 420 according to an embodiment of the present disclosure. The learning unit 420 may include a virtual container setting unit 510, a virtual container yard setting unit 520, a virtual reach stacker control unit 530, and a virtual container yard master 540. In this case, programs related to the virtual container setting unit 510, the virtual container yard setting unit 520, the virtual reach stacker control unit 530, and the virtual container yard master 540 may be stored in or loaded onto any storage medium (e.g., the memory 310) and accessed or executed by the processor 320. The components of the learning unit 420 in FIG. 5 represent functionally distinct elements, and a plurality of components may be implemented in integrated form in an actual physical environment. In addition, although FIG. 5 shows the learning unit 420 divided into the virtual container setting unit 510, the virtual container yard setting unit 520, the virtual reach stacker control unit 530, and the virtual container yard master 540, the disclosure is not limited thereto, and some components may be omitted or other components may be added. For example, the virtual container yard master 540 may be omitted from the learning unit 420, in which case the learning unit 420 may train the processor's container yard master directly by exchanging training data (e.g., virtual container data, state data of the virtual container yard, and state data of the virtual reach stacker) with it.

According to one embodiment, the virtual container setting unit 510 may set the virtual container data used to train the reinforcement learning model. The virtual container data may be set as data about a virtual container that corresponds to the target container data. The virtual container data may include the port of arrival and arrival date of a virtual container ship, route information for the virtual container ship, shipping company information for the virtual container ship, and the like. In addition, the virtual container setting unit 510 may transmit the set virtual container data to the virtual container yard master 540.

According to one embodiment, the virtual container yard setting unit 520 may set the state data of the virtual container yard. The virtual container yard may refer to a container yard in a virtual space that reproduces the actual container yard in a virtual environment. For example, the virtual container yard may include one or more container yard sections, and roads on which reach stackers, tractors, and container trailers can travel may exist between the container yard sections. The configuration of the virtual container yard sections may be changed to match the environment of the actual container storage yard. In addition, the state data of the virtual container yard may be data corresponding to the state data of the actual container yard. The state data of the virtual container yard may include the size of the virtual container yard, the stacked quantity at the virtual container yard's maximum stacking rate, whether virtual containers are stacked in the virtual container yard, and, if so, detailed information about those virtual containers. The virtual container yard setting unit 520 may transmit the set state data of the virtual container yard to the virtual container yard master 540.

According to one embodiment, the virtual reach stacker control unit 530 may set the state data of the virtual reach stacker. The virtual reach stacker may refer to a virtual object that reproduces an actual reach stacker in the virtual environment. For example, a plurality of virtual reach stackers may exist in the virtual container yard, and each of them may have one of three states: driving, working, or transporting a container. The number of virtual reach stackers may be changed to match the number of actual reach stackers in the actual container yard. In addition, the state data of the virtual reach stacker may be data corresponding to the state data of an actual reach stacker in the actual container yard. The state data of the virtual reach stacker may include the position of the virtual reach stacker, whether the virtual reach stacker is working, the workload of the virtual reach stacker, driver information for the virtual reach stacker, other information about the actual reach stacker, and the like. The virtual reach stacker control unit 530 may transmit the set state data of the virtual reach stacker to the virtual container yard master 540.
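The virtual reach stacker state described above could be modeled, for illustration, as a record with a three-valued status. The field names, the (x, y) position format, and the rule that a stacker in the driving state counts as available are all assumptions of this sketch and are not stated in the disclosure.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import List, Tuple


class StackerStatus(Enum):
    DRIVING = auto()        # driving state
    WORKING = auto()        # working state
    TRANSPORTING = auto()   # container transporting state


@dataclass
class VirtualReachStacker:
    stacker_id: str
    position: Tuple[float, float]   # hypothetical (x, y) yard coordinates
    status: StackerStatus = StackerStatus.DRIVING
    workload: int = 0               # hypothetical count of assigned tasks


def available_stackers(fleet: List[VirtualReachStacker]) -> List[VirtualReachStacker]:
    """Return stackers that are neither working nor transporting a container.

    Treating DRIVING as available is an assumption made for this sketch.
    """
    busy = {StackerStatus.WORKING, StackerStatus.TRANSPORTING}
    return [s for s in fleet if s.status not in busy]


fleet = [
    VirtualReachStacker("RS-1", (0.0, 0.0)),
    VirtualReachStacker("RS-2", (5.0, 2.0), StackerStatus.WORKING, workload=3),
]
print([s.stacker_id for s in available_stackers(fleet)])
```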

According to one embodiment, the virtual reach stacker control unit 530 may receive the virtual container task data generated by the virtual container yard master 540 and control the virtual reach stacker accordingly. The virtual reach stacker may perform stacking, unloading, rehandling, and similar tasks according to the virtual container task data.

According to one embodiment, the virtual container yard master 540 may generate virtual container task data from the virtual container data, the state data of the virtual container yard, and the state data of the virtual reach stacker. To this end, the virtual container yard master 540 may be configured to include a reinforcement learning model. In addition, the virtual container yard master 540 may transmit the generated virtual container task data to the virtual reach stacker control unit 530 so that the container task is performed in the virtual environment.

According to one embodiment, the reinforcement learning model of the virtual container yard master 540 may be trained through a sequence of steps. For example, the model may be trained by a method that (1) receives virtual container data, state data of the virtual container yard, and state data of the virtual reach stacker, (2) generates virtual container task data based on the received data, (3) transmits the virtual container task data to the virtual reach stacker, and (4) obtains a reward for the task result produced by the virtual reach stacker. The reinforcement learning model may then be updated so that the reward obtained by repeating this method is maximized. Details are described below with reference to FIG. 8. The reinforcement learning model of the virtual container yard master 540 trained in this way may be used to update the reinforcement learning model of the container yard master so that it generates optimal container task data.
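The learning steps above can be sketched as an episode loop over a toy virtual yard. Everything concrete here is an assumption for illustration: the yard is a list of stacks holding departure days, `MAX_TIER` caps the stack height, and the reward is the negative number of containers that the new container would block (a stand-in for future rehandling). The disclosure does not specify a reward function, and the model update step is left out of this sketch.

```python
import random

MAX_TIER = 3  # hypothetical maximum stack height


class VirtualYard:
    """Toy virtual yard: each stack lists departure days, bottom to top."""

    def __init__(self, num_stacks):
        self.stacks = [[] for _ in range(num_stacks)]

    def valid_actions(self):
        # Indices of stacks that still have room for one more container.
        return [i for i, s in enumerate(self.stacks) if len(s) < MAX_TIER]

    def step(self, stack_index, departure_day):
        """The virtual reach stacker places the container; return the reward."""
        stack = self.stacks[stack_index]
        # Containers underneath that leave earlier would require rehandling.
        blocked = sum(1 for day in stack if day < departure_day)
        stack.append(departure_day)
        return -float(blocked)


def run_episode(policy, num_stacks=4, num_containers=8, seed=0):
    env_rng = random.Random(seed)         # generates virtual container data
    policy_rng = random.Random(seed + 1)  # used only by stochastic policies
    yard = VirtualYard(num_stacks)
    total_reward = 0.0
    for _ in range(num_containers):
        departure_day = env_rng.randint(1, 10)     # step 1: receive data
        actions = yard.valid_actions()
        action = policy(yard.stacks, departure_day, actions, policy_rng)  # step 2
        total_reward += yard.step(action, departure_day)  # steps 3 and 4
    return total_reward


def random_policy(stacks, departure_day, actions, rng):
    return rng.choice(actions)


def greedy_policy(stacks, departure_day, actions, rng):
    # Choose the stack in which the new container blocks the fewest others.
    return min(actions, key=lambda i: sum(d < departure_day for d in stacks[i]))


print(run_episode(random_policy), run_episode(greedy_policy))
```

In a full training setup the hand-written policies would be replaced by the reinforcement learning model, and the accumulated reward would drive its update.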

FIG. 6 is a diagram showing an example of a reinforcement learning model according to an embodiment of the present disclosure. Reinforcement learning (RL) is a field of machine learning and may refer to a learning method in which an agent 610 selects, from the actions available in the current state of a given environment 600, the action that maximizes the reward. Here, the agent 610 may include an algorithm and/or a machine learning model (e.g., an artificial neural network model) that is the subject of reinforcement learning, and may explore the given environment to establish a policy that represents the probabilities of the actions it can take in the current state.

In the present disclosure, reinforcement learning may include any algorithm capable of handling continuous actions. In one embodiment, the agent may be trained using policy-based reinforcement learning (policy-based RL). Here, policy-based reinforcement learning may refer to a learning approach that uses a neural network to model the policy directly as a function. When a state is fed as input to a policy network modeled in this way, the network can directly output the corresponding action.

In another embodiment, the agent may be trained using an actor-critic algorithm. Here, the actor-critic algorithm may determine actions through policy-based reinforcement learning and may use a value function to assist the learning of that policy. For example, actor-critic algorithms may include A2C (Advantage Actor-Critic), A3C (Asynchronous Advantage Actor-Critic), and PPO (Proximal Policy Optimization).
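For reference, one common textbook form of a one-step actor-critic update is shown below; it is not taken from the disclosure. All symbols are assumptions introduced here: $\pi_\theta$ is the policy (actor) with parameters $\theta$, $V_\phi$ is the value function (critic) with parameters $\phi$, $\gamma$ is a discount factor, and $\alpha_\theta$, $\alpha_\phi$ are learning rates.

```latex
\begin{aligned}
\delta_t &= r_t + \gamma V_\phi(s_{t+1}) - V_\phi(s_t) \\
\theta &\leftarrow \theta + \alpha_\theta \, \delta_t \, \nabla_\theta \log \pi_\theta(a_t \mid s_t) \\
\phi &\leftarrow \phi + \alpha_\phi \, \delta_t \, \nabla_\phi V_\phi(s_t)
\end{aligned}
```

The temporal-difference error $\delta_t$ computed by the critic scales the actor's update, which is one way a value function can assist the learning of the policy.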

That is, the present disclosure may include both policy-based reinforcement learning algorithms and actor-critic algorithms, and both approaches may use a policy gradient (PG). Here, the policy gradient may refer to an approach that adjusts the parameters of the policy little by little so that the agent receives more reward. In this way, the modeled policy network can be trained toward an optimal policy.
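To make the idea of adjusting policy parameters little by little concrete, the following is a minimal REINFORCE-style sketch for a softmax policy over a few discrete candidate positions. It is a generic textbook update, not the method of the disclosure; the reward values, learning rate, and running-mean baseline are assumptions chosen for illustration.

```python
import math
import random


def softmax(prefs):
    m = max(prefs)  # subtract the maximum for numerical stability
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_step(prefs, action, reward, baseline, lr):
    """One policy-gradient update of softmax preferences.

    For a softmax policy, d log pi(a) / d prefs[k] = 1[k == a] - pi(k).
    """
    probs = softmax(prefs)
    advantage = reward - baseline
    return [
        p + lr * advantage * ((1.0 if k == action else 0.0) - probs[k])
        for k, p in enumerate(prefs)
    ]


def train(rewards, steps=2000, lr=0.1, seed=0):
    rng = random.Random(seed)
    prefs = [0.0] * len(rewards)
    baseline = 0.0
    for t in range(1, steps + 1):
        probs = softmax(prefs)
        action = rng.choices(range(len(prefs)), weights=probs)[0]
        reward = rewards[action]
        prefs = reinforce_step(prefs, action, reward, baseline, lr)
        baseline += (reward - baseline) / t  # running mean of observed rewards
    return softmax(prefs)


# Hypothetical rewards for three candidate stacking positions.
print(train([-2.0, 0.0, -1.0]))
```

Each update nudges the preference of the sampled action up when its reward beats the baseline and down otherwise, so probability mass shifts gradually toward better-rewarded positions.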

Such policy gradient algorithms may include stochastic policy optimization, which outputs the probabilities of actions for a given state, and deterministic policy optimization, which directly outputs the action for a given state. Here, algorithms that output action probabilities (stochastic policy gradient) may include A2C (Advantage Actor-Critic), SAC (Soft Actor-Critic), PPO (Proximal Policy Optimization), and TRPO (Trust Region Policy Optimization), and algorithms that directly output actions (deterministic policy gradient) may include DDPG (Deep Deterministic Policy Gradient) and TD3 (Twin Delayed Deep Deterministic Policy Gradient), but the disclosure is not limited thereto.
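The difference between the two families can be sketched for a one-dimensional continuous action. The linear-plus-tanh policy head, the Gaussian noise model, and all parameter values below are assumptions for illustration; none of the listed algorithms is implemented here.

```python
import math
import random


def policy_mean(state, weights):
    """Hypothetical policy head: weighted sum of the state, squashed to (-1, 1)."""
    return math.tanh(sum(w * s for w, s in zip(weights, state)))


def stochastic_action(state, weights, std, rng):
    # Stochastic policy: defines a distribution over actions and samples from it.
    return rng.gauss(policy_mean(state, weights), std)


def deterministic_action(state, weights):
    # Deterministic policy: maps the state directly to a single action.
    return policy_mean(state, weights)


state, weights = [1.0, 2.0], [0.5, -0.25]
rng = random.Random(0)
print(deterministic_action(state, weights))
print(stochastic_action(state, weights, 0.1, rng))
```

Repeated calls to `deterministic_action` with the same state return the same value, while `stochastic_action` varies from call to call, which gives a stochastic policy built-in exploration.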

In the present disclosure, the state 620 may refer to the current information of the target object, that is, the current information of a container, of the container yard, or of a reach stacker. The action 640 may include producing container task data. The reward 630 is the benefit obtained for an action of the target object. It may refer to the weights of the agent's machine learning model (e.g., an artificial neural network model) that are updated when the target object produces container task data, but is not limited thereto, and any parameter that can maximize the objective function of reinforcement learning may be updated through the reward. The environment 600 refers to any environment that represents or characterizes container information, container yard information, or reach stacker information, and may include, for example, the port of call and arrival date of the container ship carrying the container, the position of a reach stacker, and whether a reach stacker is currently working.

According to one embodiment, when the agent 610 receives a particular state from the given environment 600 (or obtains it through monitoring), the agent 610 generates a control command according to that state and passes it to the target object in the environment 600. In response, the target object in the environment 600 provides the agent 610 with a reward 630 for the action. By repeating this process, the agent 610 interacts with the environment 600 and learns to choose actions that yield high reward.

According to one embodiment, an MDP (Markov Decision Process), which models a decision-making process, may be used to express the reinforcement learning problem described above. An MDP is a decision model that adds the concepts of reward 630, action 640, and policy to a Markov process, which describes how a state changes stochastically over time.

Accordingly, the navigation task M of the target object based on the MDP may be expressed as Equation 1.

In task M of Equation 1, O denotes the observation and A denotes the space of actions 640. The observation may be any information associated with the state of the target object, that is, the state 620, such as the current information of a container, of the container yard, or of a reach stacker, and the action 640 may be a control command for container task data. In addition, r denotes the reward function for the target object's actions, and P and γ denote the state transition probability and the discount factor, respectively.
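Equation 1 itself is not reproduced in this text. Assuming that it simply collects the symbols defined above into a tuple (the ordering of the elements is a guess), and assuming the usual expected discounted return as the objective (the disclosure states only that an objective function is maximized), one consistent reading would be:

```latex
M = (O, A, P, r, \gamma), \qquad
J(\pi) = \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t}\, r(o_t, a_t)\right],
\quad o_t \in O,\ a_t \in A,\ 0 \le \gamma < 1
```

Under this reading, a discount factor γ closer to 1 weights future rehandling costs more heavily relative to the immediate cost of a placement.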

FIG. 7 illustrates an example of the structure of an agent according to an embodiment of the present disclosure. According to one embodiment, the agent may use an artificial neural network model as the machine learning model that outputs an action from environment data. Here, as in a biological neural network, an artificial neural network model is a machine learning model in which nodes, artificial neurons connected into a network by synapses, repeatedly adjust the synaptic weights so that the error between the correct output for a given input and the inferred output decreases, thereby acquiring problem-solving ability. For example, the artificial neural network model may include any probabilistic model or neural network model used in artificial intelligence methods such as machine learning and deep learning. The artificial neural network model may be implemented as a multilayer perceptron (MLP), which consists of multiple layers of nodes and the connections between them, or as any of various other artificial neural network structures. Training methods for artificial neural network models include supervised learning, in which the model is optimized to solve a problem using a teacher signal (the correct answer) as input, and unsupervised learning, which requires no teacher signal.

As shown in FIG. 7, the agent consists of an input layer 710 that receives input signals or data from outside, an output layer 730 that outputs signals or data corresponding to the input data, and a hidden layer 720 located between the input layer 710 and the output layer 730, which receives signals from the input layer 710, extracts features, and passes them to the output layer 730. The output layer 730 receives the signals from the hidden layer 720 and outputs them externally. In one embodiment, an LSTM 722 may be used as part of the hidden layer 720. An LSTM is a neural network structure designed to provide both long-term and short-term memory, addressing the limitation that a conventional RNN cannot retain information located far from the output, and is used mainly for time-series processing and natural language processing.

According to one embodiment, the input variables of the agent may include environment data. In this case, the environment data may include virtual container data, state data of the virtual container yard, and state data of the virtual reach stacker. When these input variables are fed in through the input layer 710, the output variables produced at the output layer 730 of the agent may include container task data corresponding to an action or work instruction.

In this way, a plurality of input variables and the corresponding output variables are matched to the input layer 710 and the output layer 730 of the agent, respectively, and the synaptic values between the nodes of the input layer 710, the hidden layer 720, and the output layer 730 are adjusted so that the correct output for a given input can be produced. Through this training process, features hidden in the agent's input variables can be identified, and the synaptic values (or weights) between the agent's nodes can be adjusted so that the error between the output variables computed from the input variables and the target output decreases. In one embodiment, the computing device receives environment data and trains the agent to minimize the loss between first virtual container task data, which serves as the ground truth, and second virtual container task data output by the agent. The trained agent can then automatically generate virtual container task data from virtual container data, state data of the virtual container yard, and state data of the virtual reach stacker, and the generated virtual container task data can be used for reinforcement learning.

FIG. 8 illustrates an example of a training method 800 for a reinforcement learning model according to an embodiment of the present disclosure. Method 800 may be performed by at least one processor (e.g., processor 320) of an information processing system. Method 800 may be used to train the reinforcement learning model of the virtual container master or the reinforcement learning model of the container master, and in the description below the term "reinforcement learning model" may refer to either.

As shown, method 800 may begin by setting the virtual container data at random (S810). The state data of the virtual container yard and the state data of the virtual reach stacker may also be set. When setting the state data of the virtual container yard, the stacking rate of the virtual container yard may be limited to a chosen value. For example, the state data of the virtual container yard may specify that virtual containers can be received only up to 70% of the maximum stacking capacity. The reinforcement learning model may receive the virtual container data, the state data of the virtual container yard, and the state data of the virtual reach stacker thus set.
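One way to picture step S810 is a generator that fills a virtual yard at random while keeping the number of stacked containers at or below a cap such as 70% of capacity. The yard layout (a mapping from bay and row to a bottom-to-top list of departure days), the 30-day horizon, and all names below are assumptions made for this sketch, not details given in the disclosure.

```python
import random


def init_virtual_yard(bays, rows, tiers, max_rate, rng):
    """Randomly pre-fill a yard without exceeding max_rate of its capacity.

    Returns (yard, limit): yard maps (bay, row) to a bottom-to-top list of
    hypothetical departure days; limit is the maximum number of containers
    allowed by the stacking-rate cap.
    """
    capacity = bays * rows * tiers
    limit = int(capacity * max_rate)          # e.g. 70% of all slots
    count = rng.randint(0, limit)             # how many containers to place
    yard = {(b, r): [] for b in range(bays) for r in range(rows)}
    keys = list(yard)
    placed = 0
    while placed < count:
        key = rng.choice(keys)
        if len(yard[key]) < tiers:            # respect the maximum tier height
            yard[key].append(rng.randint(1, 30))
            placed += 1
    return yard, limit


yard, limit = init_virtual_yard(bays=4, rows=3, tiers=3, max_rate=0.7,
                                rng=random.Random(1))
```

Capping the fill level leaves free slots in every generated scenario, so the model always has candidate positions to choose from.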

Next, it may be determined whether this is the initial training of the reinforcement learning model (S820). If training starts for the first time without a previous model, an initial model whose layer weights are set at random may be created (S832). Otherwise, training may proceed using the existing model with its weights as they are (S834).

Next, virtual container task data may be generated by the reinforcement learning model (S840). The virtual container task data may include a task type and a stacking position of the virtual container. For example, virtual container receiving task data may be generated in which the task type is receiving of the virtual container and the stacking position is the position in the virtual container yard where the virtual container is to be stacked. As another example, virtual container rehandling data may be generated in which the task type is container rehandling and the stacking position is the position to which the virtual container is to be rehandled. The virtual container receiving task data and the virtual container rehandling data may be generated together or selectively. In addition, the virtual container task data may be generated under a condition that limits the stacking rate of the virtual container yard to a chosen value. For example, the virtual container task data may be generated under the condition that containers are received only up to 70% of the maximum stacking capacity of the virtual container yard.

Next, it may be checked whether the virtual container task data violates a preset prohibition condition. More specifically, it may be determined whether the task indicated by the virtual container task data is prohibited in the environment (S850). For example, if a container located below the stacking position of the virtual container is due to leave the yard earlier than the virtual container, a rehandling operation will certainly be required in the future, so it may be determined whether the container below leaves earlier than the virtual container. If the virtual container task data violates the preset prohibition condition, the reinforcement learning model may be updated so that it learns that this task data is a prohibited action (S852). The virtual container data may then be reset at random.
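The prohibition check of step S850 can be sketched as follows. The sketch assumes each stack is a bottom-to-top list of departure days and treats a placement as prohibited when any container already in the stack leaves earlier than the new one; this representation and the helper names are illustrative assumptions.

```python
def violates_prohibition(stack, departure_day):
    """True if a container already in the stack leaves before the new one,
    which would force a rehandling move when that container is retrieved."""
    return any(below < departure_day for below in stack)


def allowed_stacks(yard, departure_day, max_tiers):
    """Indices of stacks that are not full and do not trigger the prohibition."""
    return [
        i for i, stack in enumerate(yard)
        if len(stack) < max_tiers and not violates_prohibition(stack, departure_day)
    ]


# Hypothetical yard with four stacks; the new container leaves on day 5.
yard = [[10, 8], [3], [], [9, 9, 9]]
candidates = allowed_stacks(yard, departure_day=5, max_tiers=3)
```

In this example the second stack is excluded because its container leaves on day 3, before the new container, and the fourth stack is excluded because it is already full.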

If the virtual container task data does not violate the preset prohibition condition, the generated virtual container task data may be transmitted to the virtual reach stacker. The virtual reach stacker may produce a task execution result from the virtual container task data, and the reinforcement learning model may obtain a reward for that result (S860). Finally, the reinforcement learning model may be updated so that the reward obtained by repeating steps S810 to S860 is maximized (S870).
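The flow S810 to S870 can be condensed into a toy training loop. This is only a sketch under several assumptions that are not in the disclosure: a linear softmax policy over two hand-made features stands in for the neural network, REINFORCE is used as the update rule, the initial weights are zero rather than random, and the reward values (a penalty of -1 for a prohibited placement, otherwise 1 minus 0.1 per container already in the stack) are invented for illustration.

```python
import math
import random


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def random_yard(rng, stacks=4, max_height=2):
    # S810: random virtual yard; each stack lists departure days bottom-up.
    return [[rng.randint(1, 30) for _ in range(rng.randint(0, max_height))]
            for _ in range(stacks)]


def features(stack, day):
    # [prohibition flag, current stack height] for one candidate stack.
    violates = 1.0 if any(d < day for d in stack) else 0.0
    return [violates, float(len(stack))]


def train(episodes=500, lr=0.1, seed=0):
    rng = random.Random(seed)
    w = [0.0, 0.0]                      # S832 (simplified: zeros, not random)
    for _ in range(episodes):
        yard = random_yard(rng)         # S810: fresh random scenario
        day = rng.randint(1, 30)        # departure day of the virtual container
        feats = [features(s, day) for s in yard]
        probs = softmax([w[0] * f[0] + w[1] * f[1] for f in feats])
        a = rng.choices(range(len(yard)), weights=probs, k=1)[0]  # S840
        if feats[a][0] == 1.0:          # S850: prohibited placement
            reward = -1.0               # S852: learn that it is prohibited
        else:
            reward = 1.0 - 0.1 * feats[a][1]  # S860: simulated stacker result
        # S870: REINFORCE update, grad log pi(a) = f(a) - E_pi[f]
        for k in range(2):
            mean_k = sum(p * f[k] for p, f in zip(probs, feats))
            w[k] += lr * reward * (feats[a][k] - mean_k)
    return w


weights = train()
```

After training, the weight on the prohibition flag is negative, so the policy assigns lower probability to stacks where a lower container would leave first.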

FIG. 9 illustrates an example of generating container receiving task data 930 according to an embodiment of the present disclosure. The container yard master 430 may receive target container data 922, state data 924 of the actual container yard, and state data 926 of the actual reach stacker, and generate the container receiving task data 930. To this end, the container yard master 430 may include the reinforcement learning model 120. The target container data 922, the state data 924 of the actual container yard, and the state data 926 of the actual reach stacker may correspond to the target container data 132, the state data 134 of the actual container yard, and the state data 136 of the actual reach stacker disclosed herein. Although FIG. 9 shows the data received by the container yard master 430 as divided into the target container data 922, the state data 924 of the actual container yard, and the state data 926 of the actual reach stacker, the disclosure is not limited to this; some data may be omitted or other data may be added. A plurality of data items may also be combined and received as a single data item.

According to one embodiment, the target container data 922 may refer to data associated with a container to be stacked in the actual container yard. For example, the target container data 922 may include the port of call and arrival date of the container ship carrying the target container, the ship's service route, and the ship's shipping line.

According to one embodiment, the state data 924 of the actual container yard may refer to data on the current state of the actual container yard and on the containers stacked in it. For example, the state data 924 of the actual container yard may include the size of the container yard, the number of containers stacked at the maximum stacking rate, whether containers are currently stacked in the yard, and, if so, detailed information on those containers.

According to one embodiment, the state data 926 of the actual reach stacker may refer to data associated with the current state of a plurality of reach stackers located in the actual container yard. For example, the state data 926 of the actual reach stacker may include the position of the actual reach stacker, whether it is currently working, its workload, information on its operator, and other information on the actual reach stacker.

According to one embodiment, the container receiving task data 930 may include data indicating that the task type is receiving and the position in the actual container yard where the target container is to be stacked. The container yard master 430 may transmit the container receiving task data 930 to the reach stacker 940 that will perform the task. On receiving the data, the reach stacker 940 may move the target container to the position in the actual container yard where it is to be stacked.

FIG. 10 illustrates an example of generating container rehandling task data 1030 according to an embodiment of the present disclosure. The container yard master 430 may receive target container data 1022, state data 1024 of the actual container yard, and state data 1026 of the actual reach stacker, and generate the container rehandling task data 1030. To this end, the container yard master 430 may include the reinforcement learning model 120. The target container data 1022, the state data 1024 of the actual container yard, and the state data 1026 of the actual reach stacker may correspond to the target container data 132, the state data 134 of the actual container yard, and the state data 136 of the actual reach stacker disclosed herein. Although FIG. 10 shows the data received by the container yard master 430 as divided into the target container data 1022, the state data 1024 of the actual container yard, and the state data 1026 of the actual reach stacker, the disclosure is not limited to this; some data may be omitted or other data may be added. For example, the container yard master 430 may generate the container rehandling task data 1030 by identifying the target container from the state data 1024 of the actual container yard and the state data 1026 of the actual reach stacker. A plurality of data items may also be combined and received as a single data item.

According to one embodiment, the target container data 1022 may refer to data associated with a container to be rehandled in the actual container yard. For example, the target container data 1022 may include the size of the container to be rehandled and its scheduled release date.

According to one embodiment, the state data 1024 of the actual container yard may refer to data on the current state of the actual container yard and on the containers stacked in it. For example, the state data 1024 of the actual container yard may include the size of the container yard, the number of containers stacked at the maximum stacking rate, whether containers are currently stacked in the yard, and, if so, detailed information on those containers.

According to one embodiment, the state data 1026 of the actual reach stacker may refer to data associated with the current state of a plurality of reach stackers located in the actual container yard. For example, the state data 1026 of the actual reach stacker may include the position of the actual reach stacker, whether it is currently working, its workload, information on its operator, and other information on the actual reach stacker.

According to one embodiment, the container rehandling task data 1030 may include data indicating that the task type is rehandling and the position in the actual container yard to which the target container is to be rehandled (e.g., the position where the target container is to be stacked temporarily). The container yard master 430 may transmit the container rehandling task data 1030 to the reach stacker 1040 that will perform the task. On receiving the data, the reach stacker 1040 may move the target container to the position in the actual container yard to which it is to be rehandled.

FIG. 11 is a flowchart of a reinforcement learning-based container stacking position recommendation method 1100 according to an embodiment of the present disclosure. Method 1100 may be performed by at least one processor (e.g., processor 320) of an information processing system. As shown, method 1100 may begin by receiving target container data and state data of the actual container yard (S1110). The processor may then receive state data of the actual reach stacker located in the actual container yard (S1120).

Finally, the processor may use the reinforcement learning model to generate container task data from the target container data, the state data of the actual container yard, and the state data of the actual reach stacker (S1130). The container task data may include a task type and a stacking position of the target container. In one embodiment, the task type may be receiving of the target container, and the stacking position may be the position in the actual container yard where the target container is to be stacked. In another embodiment, the task type may be container rehandling, and the stacking position may be the position to which the target container is to be rehandled.

In one embodiment, the reinforcement learning model may be trained by a training method that receives virtual container data, state data of the virtual container yard, and state data of the virtual reach stacker, generates virtual container task data based on the received data, transmits the virtual container task data to the virtual reach stacker, and obtains a reward for the task execution result produced by the virtual reach stacker. In this case, the virtual container data may be set at random. In addition, the virtual container task data may be generated under a condition that limits the stacking rate of the virtual container yard to a chosen value.

In one embodiment, the reinforcement learning model may check whether the virtual container task data violates a preset prohibition condition. If the virtual container task data violates the prohibition condition, the virtual container data may be reset at random. In addition, the reinforcement learning model may be updated so that the reward obtained by repeating the training method is maximized.

In one embodiment, the reach stacker may include a plurality of reach stackers located in the actual container yard. The processor may transmit the container task data to at least one of the plurality of reach stackers.
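The container task data and its hand-off to a reach stacker might be represented as below. The field names (bay, row, tier), the identifiers, and the "first idle stacker" selection rule are assumptions for illustration; the disclosure specifies only that the task data contains a task type and a position and is sent to at least one reach stacker.

```python
from dataclasses import dataclass
from enum import Enum


class TaskType(Enum):
    RECEIVING = "receiving"
    REHANDLING = "rehandling"


@dataclass(frozen=True)
class ContainerTask:
    """Container task data: a task type plus the recommended position."""
    task_type: TaskType
    container_id: str
    bay: int
    row: int
    tier: int


def dispatch(task, stackers):
    """Return the id of the first idle reach stacker for the task, or None.

    stackers maps a stacker id to True when that stacker is busy.
    """
    for stacker_id, busy in stackers.items():
        if not busy:
            return stacker_id
    return None


task = ContainerTask(TaskType.RECEIVING, "C1", bay=1, row=2, tier=0)
assigned = dispatch(task, {"RS1": True, "RS2": False})
```

A real system would likely also weigh each stacker's position and workload, both of which appear in the reach stacker state data described earlier.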

The flowchart shown in FIG. 11 and the description above are only one example, and other embodiments may be implemented differently. For example, in some embodiments the order of the steps may be changed, some steps may be repeated, some steps may be omitted, or some steps may be added.

FIG. 12 is a flowchart of a method 1200 for generating a reinforcement learning-based container stacking position recommendation model according to an embodiment of the present disclosure. Method 1200 may be performed by at least one processor (e.g., processor 320) of an information processing system. As shown, method 1200 may begin by setting virtual container data, state data of the virtual container yard, and state data of the virtual reach stacker (S1210). In this case, the virtual container data may be set at random.

The processor may then use the reinforcement learning model to generate virtual container task data from the data thus set (S1220). In one embodiment, the virtual container task data may include a task type and a stacking position of the virtual container. For example, the task type may be receiving of the virtual container, and the stacking position may be the position in the virtual container yard where the virtual container is to be stacked. As another example, the task type may be virtual container rehandling, and the stacking position may be the position to which the virtual container is to be rehandled.

In one embodiment, the processor may check whether the virtual container task data violates a preset prohibition condition. If the virtual container task data violates the prohibition condition, the virtual container data may be randomly reset. In addition, the virtual container task data may be generated under a condition that limits the stacking rate of the virtual container yard to an arbitrary value.
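
The check-and-reset behavior described above could be sketched as follows. The disclosure does not say what the prohibition conditions are or how the stacking rate limit is applied, so everything concrete here is an assumption: the "stack already full" rule, the constants `MAX_TIER` and `MAX_STACKING_RATE`, the container fields, and the function names are hypothetical.

```python
import random

MAX_TIER = 4             # assumed maximum stack height
MAX_STACKING_RATE = 0.8  # assumed cap on the fraction of yard capacity in use


def stacking_rate(heights, max_tier=MAX_TIER):
    """Fraction of the yard's capacity that is currently occupied."""
    return sum(heights) / (len(heights) * max_tier)


def violates_prohibition(heights, slot, max_tier=MAX_TIER):
    """Example prohibition: the slot does not exist or its stack is full."""
    return not 0 <= slot < len(heights) or heights[slot] >= max_tier


def random_virtual_container(rng):
    """Randomly set virtual container data (fields are illustrative)."""
    return {"size_ft": rng.choice([20, 40]),
            "weight_t": rng.uniform(2.0, 30.0)}


def generate_valid_task(policy, heights, rng, max_attempts=100):
    """Return (container, slot), or None if no admissible task was found."""
    # Stacking rate limit: skip generation if one more container would
    # push the yard past the assumed cap.
    if (sum(heights) + 1) / (len(heights) * MAX_TIER) > MAX_STACKING_RATE:
        return None
    for _ in range(max_attempts):
        container = random_virtual_container(rng)
        slot = policy(container, heights)
        if not violates_prohibition(heights, slot):
            return container, slot
        # Prohibited task: loop again, which randomly resets the container.
    return None
```

Here `policy` stands in for the reinforcement learning model; any callable taking `(container, heights)` and returning a slot index works for the sketch.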

Then, the processor may transmit the virtual container task data to the virtual reach stacker to generate a task performance result (S1230). Finally, the processor may obtain a reward for the task performance result (S1240). In one embodiment, the processor may repeat steps S1210 to S1240 described above and update the reinforcement learning model so that the obtained reward is maximized.
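
To make the S1210 to S1240 loop concrete, the sketch below runs it against a toy one-dimensional yard. It is not the disclosed model: the disclosure does not specify the learning algorithm or the reward, so this example substitutes a tabular one-step value update with epsilon-greedy selection, a reward equal to negative travel distance, and a fixed penalty for targeting a full stack. All names and constants are hypothetical.

```python
import random

N_SLOTS = 5   # assumed number of stacking slots in the toy yard
MAX_TIER = 3  # assumed maximum stack height


def set_virtual_data(rng):
    """S1210: randomly set virtual container, yard, and reach stacker data."""
    container = {"weight_t": rng.uniform(2.0, 30.0)}
    heights = [rng.randrange(MAX_TIER + 1) for _ in range(N_SLOTS)]
    stacker_pos = rng.randrange(N_SLOTS)
    return container, heights, stacker_pos


def generate_task(q, stacker_pos, rng, epsilon):
    """S1220: epsilon-greedy choice of a stacking slot from the value table."""
    if rng.random() < epsilon:
        return rng.randrange(N_SLOTS)
    row = q[stacker_pos]
    return max(range(N_SLOTS), key=row.__getitem__)


def perform_task(heights, stacker_pos, slot):
    """S1230: the virtual reach stacker carries out the task."""
    return {"travel": abs(slot - stacker_pos),
            "rejected": heights[slot] >= MAX_TIER}


def reward_for(result):
    """S1240: assumed reward, penalizing full stacks and long travel."""
    return -10.0 if result["rejected"] else -float(result["travel"])


def train(episodes=5000, alpha=0.1, epsilon=0.1, seed=0):
    """Repeat S1210 to S1240 and nudge the value table toward each reward."""
    rng = random.Random(seed)
    q = [[0.0] * N_SLOTS for _ in range(N_SLOTS)]
    for _ in range(episodes):
        # The container is unused by this toy reward; kept to mirror S1210.
        _container, heights, stacker_pos = set_virtual_data(rng)
        slot = generate_task(q, stacker_pos, rng, epsilon)
        result = perform_task(heights, stacker_pos, slot)
        reward = reward_for(result)
        q[stacker_pos][slot] += alpha * (reward - q[stacker_pos][slot])
    return q
```

Because the table is indexed only by the stacker position, it cannot see which stacks are full; a more faithful state would include the yard state data, as the disclosure describes, and would likely call for a function approximator rather than a table.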

The flowchart shown in FIG. 12 and the description above are merely examples, and some embodiments may be implemented differently. For example, in some embodiments the order of the steps may be changed, some steps may be repeated, some steps may be omitted, or additional steps may be added.

The method described above may be provided as a computer program stored in a computer-readable recording medium for execution on a computer. The medium may store a computer-executable program persistently, or may store it temporarily for execution or download. The medium may be any of various recording or storage means, in the form of a single piece of hardware or several pieces combined; it is not limited to a medium directly connected to a particular computer system and may be distributed over a network. Examples of the medium include magnetic media such as hard disks, floppy disks, and magnetic tapes; optical recording media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and ROM, RAM, flash memory, and the like configured to store program instructions. Other examples include recording or storage media managed by app stores that distribute applications, or by sites and servers that supply or distribute various other software.

The methods, operations, or techniques of the present disclosure may be implemented by various means. For example, these techniques may be implemented in hardware, firmware, software, or a combination thereof. Those skilled in the art will understand that the various illustrative logical blocks, modules, circuits, and algorithm steps described in connection with the disclosure herein may be implemented as electronic hardware, computer software, or combinations of both. To clearly illustrate this interchangeability of hardware and software, various illustrative components, blocks, modules, circuits, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends on the particular application and the design requirements imposed on the overall system. Those skilled in the art may implement the described functionality in varying ways for each particular application, but such implementations should not be interpreted as departing from the scope of the present disclosure.

In a hardware implementation, the processing units used to perform the techniques may be implemented within one or more ASICs, DSPs, digital signal processing devices (DSPDs), programmable logic devices (PLDs), field programmable gate arrays (FPGAs), processors, controllers, microcontrollers, microprocessors, electronic devices, other electronic units designed to perform the functions described in this disclosure, computers, or a combination thereof.

Accordingly, the various illustrative logical blocks, modules, and circuits described in connection with this disclosure may be implemented or performed with a general-purpose processor, a DSP, an ASIC, an FPGA or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination thereof designed to perform the functions described herein. A general-purpose processor may be a microprocessor, but in the alternative, the processor may be any conventional processor, controller, microcontroller, or state machine. A processor may also be implemented as a combination of computing devices, for example, a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a DSP core, or any other such configuration.

In a firmware and/or software implementation, the techniques may be implemented as instructions stored on a computer-readable medium such as random access memory (RAM), read-only memory (ROM), non-volatile random access memory (NVRAM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable PROM (EEPROM), flash memory, a compact disc (CD), or a magnetic or optical data storage device. The instructions may be executable by one or more processors and may cause the processor(s) to perform certain aspects of the functionality described in this disclosure.

Although the embodiments described above have been described as utilizing aspects of the presently disclosed subject matter in one or more standalone computer systems, the present disclosure is not limited thereto and may be implemented in conjunction with any computing environment, such as a network or distributed computing environment. Furthermore, aspects of the subject matter of this disclosure may be implemented in a plurality of processing chips or devices, and storage may similarly be effected across a plurality of devices. Such devices may include PCs, network servers, and portable devices.

Although the present disclosure has been described herein in relation to some embodiments, various modifications and changes may be made without departing from the scope of the present disclosure as understood by a person of ordinary skill in the art to which the invention pertains. Such modifications and changes should be considered to fall within the scope of the claims appended hereto.

110: Container stacking position recommendation system
120: Reinforcement learning model
132: Target container data
134: State data of actual container yard
136: State data of actual reach stacker
140: Container task data
150: Reach stacker

Claims

A reinforcement learning-based container stacking position recommendation method performed by at least one processor, the method comprising:
receiving target container data and state data of an actual container yard;
receiving state data of an actual reach stacker located in the actual container yard;
generating, using a reinforcement learning model, container task data from the target container data, the state data of the actual container yard, and the state data of the actual reach stacker; and
transmitting the container task data to a reach stacker associated with the container task data,
wherein the target container data includes data of the target container retrieved from the state data of the actual container yard and the state data of the actual reach stacker,
wherein the container task data is generated in relation to the current state of the actual reach stacker,
wherein the container task data includes a task type and a stacking position of the target container,
wherein the task type includes receipt of the target container and container rehandling,
wherein, when the task type is the receipt, the stacking position of the target container includes a position at which the target container is to be stacked in the actual container yard,
wherein, when the task type is the container rehandling, the stacking position of the target container includes a position to which the target container is to be rehandled,
wherein the reinforcement learning model is trained through a learning method that:
receives virtual container data, state data of a virtual container yard, and state data of a virtual reach stacker;
generates virtual container task data based on the received data;
transmits the virtual container task data to the virtual reach stacker; and
obtains a reward for a task performance result generated by the virtual reach stacker,
wherein the virtual container data includes data of a virtual container retrieved from the state data of the virtual container yard and the state data of the virtual reach stacker, and
wherein the virtual container task data is generated in association with the state of the virtual reach stacker.

delete

The container stacking position recommendation method of claim 1,
wherein the reinforcement learning model checks whether the virtual container task data violates a preset prohibition condition.

The container stacking position recommendation method of claim 7,
wherein the virtual container data is set randomly, and
wherein the virtual container data is randomly reset if the virtual container task data violates the prohibition condition.

The container stacking position recommendation method of claim 1,
wherein the reinforcement learning model is updated to maximize the reward obtained by repeating the learning method.

The container stacking position recommendation method of claim 1,
wherein the virtual container task data is generated under a condition that limits the stacking rate of the virtual container yard to an arbitrary value.

A method for generating a reinforcement learning-based container stacking position recommendation model, performed by at least one processor, the method comprising:
(a) setting virtual container data, state data of a virtual container yard, and state data of a virtual reach stacker;
(b) generating, using a reinforcement learning model, virtual container task data from the set data;
(c) transmitting the virtual container task data to the virtual reach stacker to generate a task performance result; and
(d) obtaining a reward for the task performance result,
wherein the virtual container data includes data of a virtual container retrieved from the state data of the virtual container yard and the state data of the virtual reach stacker,
wherein the virtual container task data includes a task type and a stacking position of the virtual container,
wherein the virtual container task data is generated in association with the state of the virtual reach stacker,
wherein the task type includes receipt of the virtual container and virtual container rehandling,
wherein, when the task type is the receipt, the stacking position of the virtual container includes a position at which the virtual container is to be stacked in the virtual container yard, and
wherein, when the task type is the container rehandling, the stacking position of the virtual container includes a position to which the virtual container is to be rehandled.

delete

The method for generating a container stacking position recommendation model of claim 11,
further comprising checking whether the virtual container task data violates a preset prohibition condition.

The method for generating a container stacking position recommendation model of claim 15,
wherein the virtual container data is set randomly, and
wherein the virtual container data is randomly reset when the virtual container task data violates the prohibition condition.

The method for generating a container stacking position recommendation model of claim 11,
further comprising updating the reinforcement learning model so that the reward obtained by repeating steps (a), (b), (c), and (d) is maximized.

The method for generating a container stacking position recommendation model of claim 11,
wherein the virtual container task data is generated under a condition that limits the stacking rate of the virtual container yard to an arbitrary value.

A computer program stored in a computer-readable recording medium for executing the method according to any one of claims 1, 7 to 11, and 15 to 18 on a computer.

An information processing system comprising:
a communication module;
a memory; and
at least one processor connected to the memory and configured to execute at least one computer-readable program included in the memory,
wherein the at least one program includes instructions for:
receiving target container data and state data of an actual container yard;
receiving state data of an actual reach stacker located in the actual container yard;
generating, using a reinforcement learning model, container task data from the target container data, the state data of the actual container yard, and the state data of the actual reach stacker; and
transmitting the container task data to a reach stacker associated with the container task data,
wherein the target container data includes data of the target container retrieved from the state data of the actual container yard and the state data of the actual reach stacker,
wherein the container task data includes a task type and a stacking position of the target container,
wherein the task type includes receipt of the target container and container rehandling,
wherein, when the task type is the receipt, the stacking position of the target container includes a position at which the target container is to be stacked in the actual container yard,
wherein, when the task type is the container rehandling, the stacking position of the target container includes a position to which the target container is to be rehandled,
wherein the container task data is generated in relation to the current state of the actual reach stacker,
wherein the reinforcement learning model is trained through a learning method that:
receives virtual container data, state data of a virtual container yard, and state data of a virtual reach stacker;
generates virtual container task data based on the received data;
transmits the virtual container task data to the virtual reach stacker; and
obtains a reward for a task performance result generated by the virtual reach stacker,
wherein the virtual container data includes data of a virtual container retrieved from the state data of the virtual container yard and the state data of the virtual reach stacker, and
wherein the virtual container task data is generated in association with the state of the virtual reach stacker.