KR20240009846A

KR20240009846A - AI engine and learning method of AI engine

Info

Publication number: KR20240009846A
Application number: KR1020220143025A
Authority: KR
Inventors: 이민하; 이회주; 조현성; 연훈제
Original assignee: 삼성전자주식회사
Priority date: 2022-07-14
Filing date: 2022-10-31
Publication date: 2024-01-23

Abstract

An O-Ran (Open-Radio-network)-based artificial intelligence (AI) engine that learns a plurality of data according to the disclosed embodiment may include a buffer that receives raw data, a pipeline that extracts experience data from the raw data and learns the experience data, and a storage that stores the learning results of the pipeline. The pipeline according to one embodiment may include a first simulation engine that processes first experience data, a second simulation engine that operates in parallel with the first simulation engine to process second experience data, and a trainer that receives the processed first experience data and the processed second experience data from the first simulation engine and the second simulation engine, learns the received processed first and second experience data, and generates a policy based on the learning results.

Description

Artificial intelligence engine and learning method of artificial intelligence engine {AI engine and learning method of AI engine}

The disclosed embodiment relates to an artificial intelligence engine and a learning method of an artificial intelligence engine and, more specifically, to an artificial intelligence engine that learns a plurality of data in parallel and a learning method of the artificial intelligence engine.

Reinforcement learning is used as one of the learning methods for artificial intelligence engines. Reinforcement learning basically learns from data repeatedly by trial and error, which means that learning takes a long time and continues to incur costs. Accordingly, there is a need for an artificial intelligence engine that quickly processes and learns received data.

An O-Ran (Open-Radio-network)-based artificial intelligence (AI) engine that learns a plurality of data according to the disclosed embodiment may include a buffer that receives raw data, a pipeline that extracts experience data from the raw data and learns the experience data, and a storage that stores the learning results of the pipeline.

The pipeline according to one embodiment may include a first simulation engine that processes first experience data, a second simulation engine that operates in parallel with the first simulation engine to process second experience data, and a trainer that receives the processed first experience data and the processed second experience data from the first simulation engine and the second simulation engine, learns the received processed first and second experience data, and generates a policy based on the learning results.

A method by which an O-Ran (Open-Radio-network)-based artificial intelligence (AI) engine learns a plurality of data according to the disclosed embodiment may include receiving raw data (510), extracting experience data from the raw data and learning the experience data (520), and storing the learning results (530).

The extracting and learning of the experience data according to one embodiment may include a first step of processing first experience data, a second step of processing second experience data in parallel with the first experience data, and a step of receiving the first experience data and the second experience data processed in the first step and the second step, learning the received processed first and second experience data, and generating a policy based on the learning results.
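The three steps of the method (510, 520, 530) and the parallel first and second processing steps can be illustrated with a minimal Python sketch. It is not the implementation in the disclosure: the function names, the record format, the way experience data is split in two, and the use of a thread pool are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def extract_experience(raw_items):
    # Hypothetical extraction: each raw item becomes one experience record.
    return [{"observation": item} for item in raw_items]


def process_experience(engine_id, experience):
    # Stand-in for one simulation engine: tags each record with the engine id.
    return [{"engine": engine_id, **record} for record in experience]


def train(processed_first, processed_second):
    # Stand-in trainer: the "policy" here is only a summary of what was seen.
    return {"samples_seen": len(processed_first) + len(processed_second)}


def run_learning_method(raw_items, storage):
    buffer = list(raw_items)                 # step 510: receive raw data
    experience = extract_experience(buffer)  # step 520: extract experience data
    half = len(experience) // 2              # assumed split into first/second data
    first, second = experience[:half], experience[half:]
    with ThreadPoolExecutor(max_workers=2) as pool:
        # The first and second processing steps run in parallel.
        future_first = pool.submit(process_experience, 1, first)
        future_second = pool.submit(process_experience, 2, second)
        policy = train(future_first.result(), future_second.result())
    storage["policy"] = policy               # step 530: store the learning result
    return policy


storage = {}
policy = run_learning_method(["r0", "r1", "r2", "r3"], storage)
```

A thread pool is used only to keep the sketch short; separate processes or containers would serve equally well for engines that do heavy computation.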

The learning method of the artificial intelligence engine according to the disclosed embodiment may be implemented as a computer-readable recording medium on which a program to be executed on a computer is recorded.

Figure 1 is a block diagram of an artificial intelligence system 10 according to the disclosed embodiment.
Figure 2 is a block diagram of the artificial intelligence engine 100 according to the disclosed embodiment.
Figure 3 is a block diagram for explaining the pipeline 120 of the artificial intelligence engine 100 according to the disclosed embodiment.
FIGS. 4A and 4B are block diagrams for explaining the simulation engine 122 of the artificial intelligence engine 100 according to the disclosed embodiment.
Figure 5 is a flowchart of a learning method of the artificial intelligence engine 100 according to the disclosed embodiment.
Figure 6 is a flowchart illustrating a process in which the artificial intelligence engine 100 generates a policy according to the disclosed embodiment.
Figure 7 is an example of instructions that can configure the artificial intelligence engine 100 according to the disclosed embodiment.
Figure 8 is a flowchart of a learning method of the artificial intelligence engine 100 according to the disclosed embodiment.
Figure 9 is a block diagram for explaining the data flow of the artificial intelligence engine 100 according to the disclosed embodiment.
Figure 10 is a block diagram of the artificial intelligence system 10 according to the disclosed embodiment.

The terms used in this specification will be briefly explained, and the present invention will then be described in detail.

The terms used in the present invention have been selected, as far as possible, from general terms that are currently in wide use, taking into account their function in the present invention, but they may vary depending on the intention of those working in the art, precedent, the emergence of new technology, and the like. In certain cases, there are also terms arbitrarily selected by the applicant, and in such cases their meaning will be set out in detail in the corresponding part of the description. Therefore, the terms used in the present invention should be defined based on the meaning of each term and the overall content of the present invention, rather than simply on the name of the term.

Throughout the specification, when a part is said to "include" a certain element, this means that, unless specifically stated to the contrary, it may further include other elements rather than excluding them. In addition, terms such as "... unit" and "module" used in the specification refer to a unit that processes at least one function or operation, which may be implemented as hardware, as software, or as a combination of hardware and software.

Below, embodiments are described in detail with reference to the attached drawings so that those of ordinary skill in the art to which the present invention pertains can easily implement them. However, the present invention may be implemented in many different forms and is not limited to the embodiments described herein. To describe the present invention clearly, parts unrelated to the description are omitted from the drawings, and similar parts are given similar reference numerals throughout the specification.

In the embodiments herein, the term "user" refers to a person who controls a system, function, or operation, and may include a developer, an administrator, or an installation technician.

Figure 1 is a block diagram of an artificial intelligence (AI) learning system 10 according to the disclosed embodiment.

Referring to FIG. 1, the artificial intelligence learning system 10 includes an artificial intelligence engine 100 and an artificial intelligence (AI) model management 200, and may distribute an AI model (AI ML) to an external RIC platform (Ran Intelligent Control Platform).

The artificial intelligence engine 100 according to one embodiment learns data and generates an AI model (AI ML) based on the learning results. For example, the artificial intelligence engine 100 may receive a plurality of raw data over O-Ran (Open-Radio-network), extract experience data from the received raw data, and learn the extracted experience data. The experience data may be data generated as a result of processing the received raw data. For example, the artificial intelligence engine 100 may repeatedly process the plurality of raw data, and the experience data may be the result of processing the raw data at least once. The artificial intelligence engine 100 according to one embodiment may learn the experience data and generate an AI model (AI ML) that includes a policy suited to that experience data.

The artificial intelligence (AI) model management 200 may distribute the generated AI model (AI ML) to the RIC platform 300. For example, the AI model management 200 may distribute the AI model (AI ML) generated from the experience data to the RIC platform 300 of a user device, so that new data are learned based on the distributed AI model (AI ML).

The RIC platform 300 may reside on any user device, receive the AI model (AI ML) so that newly received data can be learned, and transmit the new data to the artificial intelligence engine 100. For example, the RIC platform 300 may receive data by means of the AI model (AI ML) applied to the artificial intelligence learning system 10 and forward the received data to the artificial intelligence engine 100.

Figure 2 is a block diagram of the artificial intelligence engine 100 according to the disclosed embodiment.

Referring to FIG. 2, the artificial intelligence engine 100 according to one embodiment may include a buffer 110, a pipeline 120, or a storage 130.

The buffer 110 according to one embodiment may receive raw data. Raw data according to one embodiment may be data as first received, before any processing. For example, when the artificial intelligence engine 100 according to one embodiment learns image data, the raw data may be the initial image data that a user device acquires from a sensor. However, raw data is not limited to this and may include various types of data obtained by digitizing analog signals. In addition, the buffer 110 according to one embodiment may store a plurality of experience data (Data_E). For example, the experience data (Data_E) may be the result of processing the raw data in the pipeline 120, which is described later.

The pipeline 120 according to one embodiment may extract experience data (Data_E) from the raw data and learn the extracted experience data (Data_E). For example, the pipeline 120 may perform a first round of learning on a plurality of raw data and generate a plurality of experience data (Data_E). The pipeline 120 according to one embodiment may take the generated experience data (Data_E) as input again and perform iterative learning on the experience data (Data_E). For example, the pipeline 120 may receive the experience data (Data_E) and process the received experience data (Data_E) in parallel. Because a plurality of experience data (Data_E) can be processed in parallel, the artificial intelligence engine 100 according to one embodiment can learn the experience data (Data_E) quickly and generate an AI model (AI ML) suited to the experience data (Data_E). For example, the pipeline 120 may learn the experience data (Data_E) in parallel and generate an AI model (AI ML) including a policy based on the learning results.

Once the policy is generated, the pipeline 120 according to one embodiment may learn the first experience data (Data_P1) and the second experience data (Data_P2) based on the generated policy. For example, the pipeline 120 may perform feedback on the AI model (AI ML) generated as a result of learning the plurality of experience data (Data_E), update the policy generated as a result of the feedback, and learn the first experience data (Data_P1) and the second experience data (Data_P2). The pipeline 120 is described in detail with reference to FIG. 3.

The storage 130 according to one embodiment may store the learning results of the pipeline 120 or the data (Data_P) processed as a result of the learning, and may pass the processed data (Data_P) to the pipeline 120. For example, the storage 130 may store the experience data (Data_E) together with the learning results of the experience data (Data_E) or the AI models (AI ML), and the processed data (Data_P) may be used again as input data of the pipeline 120. Because the processed data (Data_P) are reused as input data of the pipeline 120, the artificial intelligence engine 100 according to one embodiment can provide feedback on the AI model (AI ML) and generate an AI model (AI ML) suited to the received data.
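The feedback path described above, in which processed data (Data_P) held in the storage is fed back as pipeline input, might look like the following minimal Python sketch. The `Storage` class, the `pipeline_step` function, and the integer stand-ins for data and model versions are hypothetical and chosen only to make the loop visible.

```python
class Storage:
    """Hypothetical stand-in for storage 130: keeps processed data and models."""

    def __init__(self):
        self.processed = []
        self.models = []

    def save(self, processed, model):
        self.processed = list(processed)
        self.models.append(model)

    def load_processed(self):
        return list(self.processed)


def pipeline_step(inputs, model_version):
    # Hypothetical processing: each pass transforms the inputs (Data_P)
    # and yields a new model version.
    processed = [value + 1 for value in inputs]
    return processed, model_version + 1


def run_iterations(experience, iterations):
    storage = Storage()
    inputs = list(experience)
    model = 0
    for _ in range(iterations):
        processed, model = pipeline_step(inputs, model)
        storage.save(processed, model)
        # Processed data from storage becomes the next pipeline input.
        inputs = storage.load_processed()
    return storage


storage = run_iterations([0, 10], iterations=3)
```

Keeping every model version in `storage.models` is a design choice of the sketch; the text only states that learning results or AI models may be stored.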

The storage 130 according to one embodiment may include a conventional storage medium. For example, the storage 130 may include a hard disk drive (HDD), read only memory (ROM), random access memory (RAM), flash memory, and a memory card, and may also include an externally provided cloud server.

Figure 3 is a block diagram for explaining the pipeline 120 of the artificial intelligence engine 100 according to the disclosed embodiment.

Referring to FIG. 3, the pipeline 120 according to one embodiment may include a trainer 121, a first simulation engine 122, or a second simulation engine 123. For example, the first simulation engine 122 of the pipeline 120 may process first experience data, and the second simulation engine 123 may operate in parallel with the first simulation engine 122 to process second experience data. In addition, the first experience data and the second experience data according to one embodiment may be stored in the buffer 110. Here, "first" and "second" are merely terms for distinguishing among a plurality of experience data (Data_E), and third or n-th experience data (Data_E) may of course be processed in addition.

The trainer 121 according to one embodiment may receive the processed first experience data (Data_P1) and the processed second experience data (Data_P2) from the first simulation engine 122 and the second simulation engine 123, respectively, and learn the received first experience data (Data_P1) and second experience data (Data_P2). In addition, the trainer 121 according to one embodiment may generate a policy as a result of learning the first experience data (Data_P1) and the second experience data (Data_P2). For example, the trainer 121 may learn the plurality of experience data (Data_E), the first experience data (Data_P1), and the second experience data (Data_P2) based on predetermined code, and update the learning results to generate the policy. Here, the predetermined code may be code set by the user or code generated as a result of learning by the artificial intelligence engine 100. The first experience data (Data_P1) and the second experience data (Data_P2) may be data generated by the first simulation engine 122 or the second simulation engine 123, respectively.

The policy according to one embodiment may refer to the learning method used by the generated AI model (AI ML). In addition, the trainer 121 according to one embodiment may generate training data (Data_T) as a result of learning the processed first experience data (Data_P1) and second experience data (Data_P2). Here, the processed first experience data (Data_P1) and second experience data (Data_P2) are merely terms for distinguishing among a plurality of processed experience data, and the processed first experience data (Data_P1) and second experience data (Data_P2) may be the same or different.

The trainer 121 according to one embodiment may perform reinforcement learning on the first experience data (Data_P1) and the second experience data (Data_P2). Reinforcement learning may include repeatedly learning the experience data (Data_E) and the processed experience data (Data_P1, Data_P2). For example, the trainer 121 may take the experience data (Data_E) and the processed experience data (Data_P1, Data_P2) as input, determine what should be improved in the AI model (AI ML) generated from the experience data (Data_E) and the processed experience data (Data_P1, Data_P2), and generate a new AI model.

The trainer 121 according to one embodiment may receive the experience data (Data_E) or the processed experience data (Data_P1, Data_P2) directly from the simulation engines 122 and 123. For example, when no experience data is stored in the buffer 110, the pipeline 120 may pass the first experience data (Data_P1) generated by the first simulation engine 122 and the second experience data (Data_P2) generated by the second simulation engine 123 to the trainer 121 without going through the buffer 110.
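The buffer bypass described above can be sketched as a small routing function. This is an illustration under assumptions: the function name and the returned route label are hypothetical, and the behaviour when the buffer does hold data (draining it first) is an assumption, since the text only specifies the empty-buffer case.

```python
from collections import deque


def collect_trainer_input(buffer, engine_outputs):
    """Return (batch, route) for the trainer.

    buffer: deque of buffered experience data (stand-in for buffer 110).
    engine_outputs: list of per-engine outputs, e.g. [Data_P1, Data_P2].
    """
    if buffer:
        # Assumption: buffered experience data is consumed first.
        batch = list(buffer)
        buffer.clear()
        return batch, "buffer"
    # Empty buffer: pass engine outputs straight to the trainer.
    batch = [item for output in engine_outputs for item in output]
    return batch, "direct"


empty_batch, empty_route = collect_trainer_input(deque(), [["p1"], ["p2"]])
filled = deque(["e1", "e2"])
filled_batch, filled_route = collect_trainer_input(filled, [["p1"], ["p2"]])
```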

The trainer 121 according to one embodiment may perform an optimization operation using regression analysis in order to perform reinforcement learning on the experience data (Data_E) and the processed experience data (Data_P1, Data_P2). For example, the trainer 121 may find and remove errors in the AI model (AI ML) generated from the experience data (Data_E) and the processed experience data (Data_P1, Data_P2), and generate a new AI model. The policy generated by the trainer 121 may be applied to the new AI model according to one embodiment.

The first simulation engine 122 may perform mutual learning on the first experience data (Data_P1). Mutual learning according to one embodiment may include the reinforcement learning described above. For example, the first simulation engine 122 may take training data (Data_T) as input and generate the processed first experience data (Data_P1) as a result of processing the training data (Data_T). The training data (Data_T) according to one embodiment may include state information and reward information for the plurality of experience data (Data_E) and the processed first experience data (Data_P1). For example, the training data (Data_T) may include reward information obtained by removing errors from the plurality of experience data (Data_E) and the processed first experience data (Data_P1), and may include state information on the plurality of experience data (Data_E) and the processed first experience data (Data_P1). The process by which the first simulation engine 122 performs mutual learning is described in detail with reference to FIG. 4A.

The second simulation engine 123 may operate in parallel with the first simulation engine 122 to process the second experience data (Data_P2). For example, the pipeline 120 may drive the first simulation engine 122 with a first parameter and drive the second simulation engine 123 with a second parameter. Here, the parameters may include the address of the storage 130, the address of the buffer 110, the language used to manage the pipeline 120, or the number of learning iterations, but are not limited to these and may include various factors that can affect the learning process of the artificial intelligence engine 100. In addition, "first parameter" and "second parameter" are merely terms for distinguishing the parameters applied to each simulation engine, and the first parameter and the second parameter may be the same or different. When an n-th simulation engine is included, n parameters may exist.
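As a rough illustration of per-engine parameters, the sketch below builds one parameter set per simulation engine from shared defaults plus overrides. The field names mirror the four examples given in the text, but the `EngineParameters` class, the helper function, and every value (including the placeholder addresses) are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EngineParameters:
    storage_address: str      # address of storage 130 (placeholder value below)
    buffer_address: str       # address of buffer 110 (placeholder value below)
    pipeline_language: str    # language used to manage the pipeline
    learning_iterations: int  # number of learning iterations


def build_engine_parameters(shared, overrides_per_engine):
    # One parameter set per engine; an empty override keeps the shared values,
    # so two engines may end up with the same or different parameters.
    return [
        EngineParameters(**{**shared, **overrides})
        for overrides in overrides_per_engine
    ]


shared = {
    "storage_address": "storage://example",
    "buffer_address": "buffer://example",
    "pipeline_language": "python",
    "learning_iterations": 100,
}
params = build_engine_parameters(shared, [{}, {"learning_iterations": 500}])
```

With n override dictionaries the helper returns n parameter sets, matching the statement that n parameters may exist for n simulation engines.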

The second simulation engine 123 according to one embodiment may perform mutual learning on the second experience data (Data_P2). For example, mutual learning may include the reinforcement learning described above. For example, the second simulation engine 123 may take training data (Data_T) as input and generate the processed second experience data (Data_P2) as a result of processing the training data (Data_T). The training data (Data_T) according to one embodiment may include state information and reward information for the plurality of experience data (Data_E) and the processed second experience data (Data_P2). For example, the training data (Data_T) may include reward information obtained by removing errors from the plurality of experience data (Data_E) and the processed second experience data (Data_P2), and may include state information on the plurality of experience data (Data_E) and the processed second experience data (Data_P2). The process by which the second simulation engine 123 performs mutual learning is described in detail with reference to FIG. 4B.

The pipeline 120 according to one embodiment may drive the first simulation engine 122 and the second simulation engine 123 using a simulator image that simulates the experience data (Data_E). For example, instead of receiving actual experience data, the pipeline 120 may use the generated AI model (AI ML) to generate a simulation image that simulates experience data, and perform the learning operation using the generated simulation image as virtual experience data.

The pipeline 120 according to one embodiment may drive the trainer 121 using an optimization image that simulates the policy. For example, instead of generating an actual policy, the pipeline 120 may use the generated AI model (AI ML) to generate an optimization image that simulates the policy, and perform the learning operation using the generated optimization image as virtual experience data.

The pipeline 120 according to one embodiment may drive the first simulation engine 122 and the second simulation engine 123 using an agent image that simulates the processed experience data (Data_P1, Data_P2). For example, the pipeline 120 may separate only the experience data from the policy generated by the simulator image and the optimization image, and generate the agent image based on the separated experience data.

FIGS. 4A and 4B are block diagrams for describing the simulation engine 122 of the artificial intelligence engine 100 according to the disclosed embodiment.

Referring to FIGS. 4A and 4B, each of the simulation engines 122 and 123 may include an agent 122a, 123a and a simulator 122b, 123b. The simulation engines 122 and 123 according to an embodiment may learn the data by performing mutual learning on the experience data (Data_E). For example, the simulation engines 122 and 123 may perform mutual learning by repeating the action (Action) operation described below and the operation of delivering reward information (Reward) and state information (State) to the agent.

The agents 122a and 123a according to an embodiment perform an action (Action) based on the generated policy (Policy). An action according to an embodiment may refer to an operation in which the agents 122a and 123a deliver data used for learning to the simulators 122b and 123b, based on the state information (State) and the reward information (Reward) for the experience data.

The simulators 122b and 123b according to an embodiment may receive the experience data (Data_E) and the processed experience data (Data_P1, Data_P2) through the action operation of the agents 122a and 123a, and may provide feedback on the received data. For example, the simulators 122b and 123b may deliver reward information (Reward) and state information (State) to the agent as feedback on the received data. The reward information according to an embodiment may include data obtained by removing errors from the experience data (Data_P1, Data_P2) processed based on a previously generated policy, and the state information may include the experience data (Data_P1, Data_P2) processed based on the previously generated policy.

Because the simulation engines 122 and 123 according to an embodiment perform mutual learning in parallel, they may learn quickly regardless of the type of data.
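The exchange between an agent and a simulator described above can be illustrated with a minimal sketch. It is not the disclosed implementation: the scalar `State` and `Action` types, the `target` field, the distance-based reward, and the `run_engine` helper are all assumptions made for illustration, since the description does not fix the form of the state, reward, or action.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical types: the description does not define the shape of State or Action.
State = float
Action = float


@dataclass
class Simulator:
    """Toy stand-in for a simulator (122b, 123b): holds a state and scores actions."""
    target: float          # assumed goal value, for illustration only
    state: State = 0.0

    def step(self, action: Action) -> Tuple[float, State]:
        # Apply the action, then return feedback as (reward, state).
        self.state += action
        reward = -abs(self.target - self.state)
        return reward, self.state


@dataclass
class Agent:
    """Toy stand-in for an agent (122a, 123a): chooses actions from a policy."""
    policy: Callable[[State], Action]
    history: List[Tuple[State, Action, float]] = field(default_factory=list)

    def act(self, state: State) -> Action:
        return self.policy(state)


def run_engine(agent: Agent, simulator: Simulator, steps: int) -> List[Tuple[State, Action, float]]:
    """Repeat the action -> (reward, state) exchange for a fixed number of steps."""
    state = simulator.state
    for _ in range(steps):
        action = agent.act(state)
        reward, next_state = simulator.step(action)
        agent.history.append((state, action, reward))
        state = next_state
    return agent.history
```

With a policy that moves halfway toward the target on each step, for example `Agent(policy=lambda s: (8.0 - s) * 0.5)`, the recorded reward rises toward zero as the loop repeats, which is the sense in which the repeated exchange constitutes learning feedback in this sketch.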

FIG. 5 is a flowchart of a learning method of the artificial intelligence engine 100 according to the disclosed embodiment.

Referring to FIG. 5, the artificial intelligence engine 100 according to an embodiment may receive raw data (510). The raw data according to an embodiment may be the initial data that an arbitrary user device obtains from a sensor. For example, the raw data may include various kinds of data that the artificial intelligence engine 100 can learn, regardless of type.

When the raw data is received, the artificial intelligence engine 100 according to an embodiment may extract experience data (Data_E) from the raw data and learn it (520). For example, the artificial intelligence engine 100 may perform first-pass learning on a plurality of raw data and generate a plurality of experience data (Data_E). The artificial intelligence engine 100 according to an embodiment may feed the generated experience data (Data_E) back in as input and perform iterative learning on it. For example, the artificial intelligence engine 100 may receive the experience data (Data_E) and process the received experience data in parallel. Because a plurality of experience data (Data_E) can be processed in parallel, the artificial intelligence engine 100 according to an embodiment may learn the experience data quickly and generate an AI model (AI ML) suited to the experience data. For example, the artificial intelligence engine 100 may learn the experience data (Data_E) in parallel and, based on the learning result, generate an AI model (AI ML) that includes a policy (Policy). The process by which the policy is generated is described in detail with reference to FIG. 6.

Once the experience data (Data_E) has been learned, the artificial intelligence engine 100 according to an embodiment may store the learning result of the pipeline 120 (530). The artificial intelligence engine 100 according to an embodiment may store the learning result for the experience data (Data_E), or the data processed as a result of the learning (Data_P), and may learn the processed data (Data_P) again. For example, the artificial intelligence engine 100 may store the experience data (Data_E) together with its learning result or the AI models (AI ML), and the data processed as a result of the learning (Data_P) may be used again as input data. Because the processed data (Data_P) is used again as input data of the artificial intelligence engine 100, the artificial intelligence engine 100 according to an embodiment may provide feedback on the AI model (AI ML) and generate an AI model suited to the received data. The artificial intelligence engine 100 according to an embodiment may store the learning result for the experience data (Data_E), or the processed data (Data_P), in the storage 130 located inside the engine or in the storage 130 located on an external cloud server.
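The three steps of FIG. 5 (receive 510, extract and learn 520, store 530) can be sketched as follows. This is a simplified illustration under stated assumptions: the `Buffer`, `Storage`, `extract_experience`, `learn`, and `run_once` names are hypothetical, raw data is assumed to arrive as dictionaries with an optional numeric `"value"` field, and "learning" is reduced to computing a mean so the flow stays visible.

```python
from collections import deque
from typing import Deque, Dict, List


class Buffer:
    """Stand-in for the buffer 110: queues raw data as it arrives (step 510)."""

    def __init__(self) -> None:
        self._queue: Deque[dict] = deque()

    def receive(self, raw: dict) -> None:
        self._queue.append(raw)

    def drain(self) -> List[dict]:
        items = list(self._queue)
        self._queue.clear()
        return items


def extract_experience(raw_items: List[dict]) -> List[float]:
    # Assumed extraction rule: keep the numeric "value" field of well-formed records.
    return [float(r["value"]) for r in raw_items if "value" in r]


def learn(experience: List[float]) -> Dict[str, float]:
    # Placeholder for learning (step 520): summarise the experience data as a mean.
    if not experience:
        return {"mean": 0.0, "count": 0.0}
    return {"mean": sum(experience) / len(experience), "count": float(len(experience))}


class Storage:
    """Stand-in for the storage 130: keeps every learning result (step 530)."""

    def __init__(self) -> None:
        self.results: List[Dict[str, float]] = []

    def save(self, result: Dict[str, float]) -> None:
        self.results.append(result)


def run_once(buffer: Buffer, storage: Storage) -> Dict[str, float]:
    """Run steps 510 -> 520 -> 530 once over whatever the buffer currently holds."""
    experience = extract_experience(buffer.drain())
    result = learn(experience)
    storage.save(result)
    return result
```

Stored results could be fed back into `learn` on a later pass, which would correspond to the reuse of processed data as input described above; that feedback path is left out of the sketch for brevity.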

FIG. 6 is a flowchart illustrating a process in which the artificial intelligence engine 100 according to the disclosed embodiment generates a policy.

Referring to FIG. 6, the artificial intelligence engine 100 according to an embodiment may process first experience data (610). For example, the artificial intelligence engine 100 may process the first experience data and generate processed first experience data (Data_P1).

The artificial intelligence engine 100 according to an embodiment may process second experience data (620). For example, the artificial intelligence engine 100 may process the second experience data in parallel with the first experience data and generate processed second experience data (Data_P2).

Once the first experience data and the second experience data have been processed, the artificial intelligence engine 100 according to an embodiment may learn the first experience data and the second experience data (630). For example, the artificial intelligence engine 100 may receive the processed first experience data (Data_P1) and the processed second experience data (Data_P2) again as input and learn them. The artificial intelligence engine 100 according to an embodiment may apply reinforcement learning to the first experience data (Data_P1) and the second experience data (Data_P2). The reinforcement learning may include repeatedly learning the experience data (Data_E) and the processed experience data (Data_P1, Data_P2).

Once the first experience data and the second experience data have been learned, the artificial intelligence engine 100 according to an embodiment may generate a policy (Policy) (640). The artificial intelligence engine 100 according to an embodiment may generate the policy as a result of learning the first experience data (Data_P1) and the second experience data (Data_P2). For example, the artificial intelligence engine 100 may learn the plurality of experience data (Data_E), the first experience data (Data_P1), and the second experience data (Data_P2) based on predetermined code, and may generate the policy by updating the learning result. Here, the predetermined code may be code determined by user settings or code generated from a learning result of the artificial intelligence engine 100.

The policy (Policy) according to an embodiment may refer to the learning method used by the generated AI model (AI ML). In addition, the artificial intelligence engine 100 according to an embodiment may generate training data (Data_T) as a result of learning the processed first experience data (Data_P1) and the processed second experience data (Data_P2). Here, the processed first experience data (Data_P1) and the processed second experience data (Data_P2) are merely terms for distinguishing among a plurality of processed experience data, and the two may be the same or different.
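The flow of FIG. 6 (process two sets of experience data in parallel, then learn both and generate a policy) can be sketched with Python's standard `concurrent.futures` module. The sketch rests on several assumptions: `process_experience`, `train`, and `generate_policy` are hypothetical names, each engine is reduced to scaling its input by a per-engine parameter, and the "policy" is a single averaged threshold rather than a reinforcement-learning policy.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List


def process_experience(data: List[float], scale: float) -> List[float]:
    # Hypothetical processing step standing in for one simulation engine (610 or 620).
    return [x * scale for x in data]


def train(data_p1: List[float], data_p2: List[float]) -> Dict[str, float]:
    # Hypothetical trainer (630, 640): the "policy" here is only an averaged threshold.
    merged = data_p1 + data_p2
    return {"threshold": sum(merged) / len(merged)}


def generate_policy(data_e1: List[float], data_e2: List[float]) -> Dict[str, float]:
    """Process both experience sets concurrently, then hand the results to the trainer."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        # Each engine runs with its own parameter (1.0 and 2.0 are illustrative values).
        future_1 = pool.submit(process_experience, data_e1, 1.0)
        future_2 = pool.submit(process_experience, data_e2, 2.0)
        data_p1, data_p2 = future_1.result(), future_2.result()
    return train(data_p1, data_p2)
```

Threads are used only to keep the example short; for CPU-bound simulation work a process pool or separate containers would be a more likely choice, and the description does not state which mechanism the engine uses.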

FIG. 7 shows an example of instructions that may configure the artificial intelligence engine 100 according to the disclosed embodiment.

The pipeline 120 of the artificial intelligence engine 100 according to an embodiment may be configured by pipeline instructions 120_1. For example, the pipeline instructions 120_1 may include a simulation instruction 120_1a, a policy optimization instruction 120_1b, or a configuration instruction 120_1c.

The simulation instruction 120_1a according to an embodiment may include a simulator image, a simulator configuration, or an agent image. The simulator image according to an embodiment may be configured to simulate the experience data (Data_E). The simulator configuration according to an embodiment may include policy (Policy) information and instructions that control the operation of the simulation engines 122 and 123. The agent image according to an embodiment may be configured to simulate the processed experience data (Data_P1, Data_P2).

The policy optimization instruction 120_1b may include an optimization image for the policy (policy optimizer image). The policy optimizer image according to an embodiment may be configured to simulate the policy (Policy). For example, instead of generating an actual policy, the artificial intelligence engine 100 may use the generated AI model (AI ML) to generate an optimization image that simulates the policy, and may perform the learning operation using the generated policy optimizer image as virtual experience data.

The configuration instruction 120_1c according to an embodiment may include information on the training length (number of epoch) or the number of training runs (number of simulation). For example, the artificial intelligence engine 100 may control the simulation engines 122 and 123 based on the number of epochs or the number of simulations set in the configuration instruction 120_1c.
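One way to picture the pipeline instructions of FIG. 7 is as a nested configuration object. The sketch below is an assumption about structure only: the class and field names mirror the terms in the description (simulator image, simulator configuration, agent image, policy optimizer image, number of epoch, number of simulation), but the actual instruction format is not given in the text, and the validation rule is added purely for illustration.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class SimulationInstruction:
    """Hypothetical model of the simulation instruction 120_1a."""
    simulator_image: str
    simulator_configuration: Dict[str, str]
    agent_image: str


@dataclass(frozen=True)
class PolicyOptimizationInstruction:
    """Hypothetical model of the policy optimization instruction 120_1b."""
    policy_optimizer_image: str


@dataclass(frozen=True)
class ConfigurationInstruction:
    """Hypothetical model of the configuration instruction 120_1c."""
    number_of_epoch: int
    number_of_simulation: int

    def __post_init__(self) -> None:
        # Assumed sanity check: both counts must be positive.
        if self.number_of_epoch < 1 or self.number_of_simulation < 1:
            raise ValueError("number_of_epoch and number_of_simulation must be >= 1")


@dataclass(frozen=True)
class PipelineInstructions:
    """Hypothetical model of the pipeline instructions 120_1 as a whole."""
    simulation: SimulationInstruction
    policy_optimization: PolicyOptimizationInstruction
    configuration: ConfigurationInstruction
```

A caller would then build one `PipelineInstructions` value, for example with placeholder image names such as `"sim:demo"` and `"agent:demo"`, and pass it to whatever component starts the simulation engines and the trainer.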

FIG. 8 is a flowchart of a learning method of the artificial intelligence engine 100 according to the disclosed embodiment.

Referring to FIG. 8, a distributing user may distribute the learning method of the artificial intelligence engine (810). For example, the manufacturer of the artificial intelligence engine 100 may be the distributing user. When the manufacturer of the artificial intelligence engine 100 is the distributing user, the learning method of the artificial intelligence engine may be installed by a user in the form of software.

Once the learning method of the artificial intelligence engine has been distributed, a registering user may define the components of a workflow (Work Flow) for the distributed learning method (820). The registering user according to an embodiment is the party that installs and uses the artificial intelligence engine 100 and may be the user of a user device, and the workflow may include information on the pipeline 120. For example, the registering user may set the instructions used by the artificial intelligence engine 100 and configure the pipeline 120 of the artificial intelligence engine 100 in advance. Once the pipeline 120 has been set by the user, the learning method of the artificial intelligence engine 100 according to an embodiment may be defined.

Once the components of the workflow have been defined, the artificial intelligence engine 100 according to an embodiment may learn data according to the defined workflow (830). For example, the artificial intelligence engine 100 may learn the data according to a predetermined method of driving the simulation engines 122 and 123.

FIG. 9 is a block diagram for describing the data flow of the artificial intelligence engine 100 according to the disclosed embodiment.

Referring to FIG. 9, the artificial intelligence engine 100 according to an embodiment may receive data through the buffer 110 and extract experience data. The extracted experience data may be learned by the trainer 121, the first simulation engine 122, the second simulation engine 123, the third simulation engine 124, or the n-th simulation engine (12n+1). For example, the artificial intelligence engine 100 may include n simulation engines, and the n simulation engines may each be driven in parallel.

FIG. 10 is a block diagram of an artificial intelligence system 10 according to the disclosed embodiment.

Referring to FIG. 10, the artificial intelligence system 10 according to an embodiment may operate in an open RAN (O-RAN) communication system.

The data 1000 according to an embodiment may be acquired by user equipment (UE) 1010. The data 1000 according to an embodiment may also be received through various interfaces. For example, the data 1000 may be received by a radio unit (O-RU) 1020, a distributed unit (O-DU) 1030, and a cloud unit (O-CU) 1040. The data 1000 may also be communicated through a cloud network (CN) 1050, and its transmission and reception may be controlled by an application function (AF) 1060. The data 1000 may also be learned repeatedly, in which case transmission and reception of the data may be controlled by an operation and maintenance unit (OAM) 1070. The data 1000 may also be distributed to RIC platforms 1080 and 1090.

The initially received data is raw data and may pass through a data processing preparation process 1100. The data preparation process 1100 may include a data reception/selection process 1110 or a data merging/deletion process 1120.

When the data processing preparation process 1100 ends, the data is delivered to a model training host 1200 based on training model information (Model Training Info). The model training host 1200 may include a training model selection unit 1210, a feature engineering unit 1210, a training/test unit 1230, a model optimization unit 1230, or a model improvement unit 1250. The model training host 1200 according to an embodiment certifies and onboards the trained model (Model Certification/Onboarding).

The certification and onboarding of the trained model (Model Certification/Onboarding) according to an embodiment may be implemented by model management 1300. The model management 1300 may include a certification and onboarding unit 1310 that runs the AI model and an improvement/management/termination unit 1320 that generates feedback on the AI model. The data according to an embodiment may be provided by the distributed unit (O-DU) 1030, the cloud unit (O-CU) 1040, and the RIC platforms 1080 and 1090 (1430).

The model management 1300 performs feedback on the AI model, improves the AI model, and delivers it to a model interface host 1400. The model interface host 1400 according to an embodiment may receive model inference information (Model Inference Info) generated in the data preparation process 1100 and compare the model inference information with information on the generated AI model. The model interface host 1400 according to an embodiment may include a feature engineering unit 1410 that edits the AI model or a model inference/training unit 1420. The model inference/training unit 1420 according to an embodiment may cause the data learning operation to be performed online.
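The two sub-steps of the data preparation process, reception/selection 1110 and merging/deletion 1120, can be sketched as plain functions over records. The description does not say what the selection criteria or merge rules are, so everything here is an assumption: records are dictionaries, `ue_id` is a hypothetical grouping key, `rsrp` and `sinr` are hypothetical measurement fields, and "deletion" is taken to mean dropping missing values and records that carry no measurement.

```python
from typing import Dict, Iterable, List

Record = Dict[str, object]


def select_records(records: Iterable[Record], key: str) -> List[Record]:
    """Reception/selection (1110): keep only records that carry the grouping key."""
    return [r for r in records if key in r]


def merge_and_delete(records: Iterable[Record], key: str) -> List[Record]:
    """Merging/deletion (1120): merge records sharing a key, then drop empty ones."""
    merged: Dict[object, Record] = {}
    for record in records:
        target = merged.setdefault(record[key], {key: record[key]})
        for name, value in record.items():
            if value is not None:  # delete missing measurements
                target[name] = value
    # A record that holds nothing besides its key carries no information.
    return [r for r in merged.values() if len(r) > 1]


def prepare(records: Iterable[Record], key: str = "ue_id") -> List[Record]:
    """Run selection followed by merging/deletion over raw records."""
    return merge_and_delete(select_records(records, key), key)
```

In this sketch, two partial reports for the same `ue_id` are combined into one record, while a report with only a missing value and a report without any `ue_id` are both discarded before training.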

The model interface host 1400 delivers an output (Output) to an actor 1500. The actor 1500 according to an embodiment may be the agents 122a and 123a of the artificial intelligence engine 100, and the output may include the AI model generated as a result of learning the data. The data according to an embodiment may be provided by the distributed unit (O-DU) 1030, the cloud unit (O-CU) 1040, and the RIC platforms 1080 and 1090.

The actor 1500 according to an embodiment may perform a learning action (Action) based on the distributed AI model (1600, 1700). The simulators 122b and 123b according to an embodiment may receive the experience data (Data_E) and the processed experience data (Data_P1, Data_P2) through the action operation of the agents 122a and 123a, and may provide feedback on the received data. For example, the simulators 122b and 123b may deliver reward information (Reward) and state information (State) to the agent as feedback on the received data. The reward information according to an embodiment may include data obtained by removing errors from the experience data (Data_P1, Data_P2) processed based on a previously generated policy, and the state information may include the experience data (Data_P1, Data_P2) processed based on the previously generated policy.

Meanwhile, the artificial intelligence engine 100 may repeat the learning operation under the control of a continuous operation control unit 1800. For example, the continuous operation control unit 1800 may include an authentication unit 1810, a monitoring unit 1820, an analysis unit 1830, a proposal unit 1840, or a continuous optimization unit 1850.

The authentication unit 1810 according to an embodiment may authenticate the generated AI model. The monitoring unit 1820 according to an embodiment may monitor the data learning activity of the AI model, and the analysis unit 1830 may analyze the learning result of the data and the generated AI model. The proposal unit 1840 according to an embodiment may propose an AI model that incorporates feedback from the analysis result. The continuous optimization unit 1850 may control the artificial intelligence engine 100 so that it continues learning the data with the AI model updated through the feedback operation.
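The roles of the five units in the continuous operation control unit 1800 can be sketched as a loop that authenticates, monitors, analyzes, and proposes an updated model for each batch of data. This is only an illustrative reading of the paragraph above: the `Model` class, the error measure, the `tolerance` parameter, and the update rule are assumptions, not details taken from the description.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Model:
    """Hypothetical AI model reduced to a version number and one learned value."""
    version: int
    threshold: float


def authenticate(model: Model) -> bool:
    # Authentication unit 1810 (assumed rule: versions must be non-negative).
    return model.version >= 0


def monitor(model: Model, samples: List[float]) -> List[float]:
    # Monitoring unit 1820: observe per-sample error of the current model.
    return [abs(s - model.threshold) for s in samples]


def analyze(errors: List[float]) -> float:
    # Analysis unit 1830: summarise the observed errors as a mean.
    return sum(errors) / len(errors)


def propose(model: Model, samples: List[float]) -> Model:
    # Proposal unit 1840: propose a model that incorporates the feedback.
    return Model(model.version + 1, sum(samples) / len(samples))


def continuous_optimization(model: Model, batches: Iterable[List[float]], tolerance: float) -> Model:
    """Continuous optimization unit 1850: keep learning with the latest model."""
    for samples in batches:
        if not authenticate(model):
            raise ValueError("model failed authentication")
        if analyze(monitor(model, samples)) > tolerance:
            model = propose(model, samples)
    return model
```

The loop replaces the model only when the analyzed error exceeds the tolerance, so a batch that the current model already fits leaves the model unchanged.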

An O-RAN (Open Radio Access Network)-based artificial intelligence (AI) engine 100 that learns a plurality of data according to the disclosed embodiment may include a buffer 110 that receives raw data, a pipeline 120 that extracts experience data from the raw data and learns it, and a storage 130 that stores the learning result of the pipeline.

The pipeline 120 according to an embodiment may include a first simulation engine 122 that processes first experience data; a second simulation engine 123 that operates in parallel with the first simulation engine to process second experience data; and a trainer 121 that receives the processed first experience data and the processed second experience data from the first simulation engine and the second simulation engine, learns the received data, and generates a policy (Policy) based on the learning result.

The first simulation engine 122 according to an embodiment may include a first agent and a first simulator that mutually learn the first experience data.

The second simulation engine 123 according to an embodiment may include a second agent and a second simulator that mutually learn the second experience data.

The trainer 121 according to an embodiment may generate the policy by updating the learning result based on predetermined code.

The pipeline 120 according to an embodiment may learn the first experience data and the second experience data based on the policy.

The pipeline 120 according to an embodiment may drive the first simulation engine using a first parameter and drive the second simulation engine using a second parameter different from the first parameter.

The pipeline 120 according to an embodiment may drive the first simulation engine and the second simulation engine using a simulator image that simulates the experience data.

The pipeline according to an embodiment may drive the trainer using a policy optimization image that simulates the policy.

The pipeline 120 according to an embodiment may drive the first simulation engine and the second simulation engine using an agent image that simulates the processed experience data.

The pipeline 120 according to an embodiment may separate the experience data from the policy and generate the agent image based on the separated experience data.

The storage 130 according to an embodiment may include an external server.

일 실시예에 따른 경험 데이터들을 추출하고 학습하는 단계(520)는, 제1 경험 데이터를 처리하는 제1 단계(610), 제1 경험 데이터와 병렬적으로 제2 경험 데이터를 처리하는 제2 단계(620) 및 제1 단계 및 제2 단계에서 처리된 제1 경험 데이터 및 제2 경험 데이터를 수신하고, 수신된 처리된 제1 경험 데이터 및 제2 경험 데이터를 학습하고(630), 학습 결과에 기초하여 정책(Policy)을 생성하는 단계(640)를 포함할 수 있다.The step 520 of extracting and learning experience data according to an embodiment includes a first step 610 of processing the first experience data, and a second step of processing the second experience data in parallel with the first experience data. (620) and receive the processed first experience data and second experience data in the first step and the second step, learn the received processed first experience data and second experience data (630), and apply the learning result. A step 640 of generating a policy based on the policy may be included.

일 실시예에 따른 제1 단계(610)는, 제1 경험 데이터를 상호 학습하는 것을 포함할 수 있다.The first step 610 according to one embodiment may include mutual learning of first experience data.

일 실시예에 따른 제2 단계(620)는, 제2 경험 데이터를 상호 학습하는 것을 포함할 수 있다.The second step 620 according to one embodiment may include mutually learning the second experience data.

일 실시예에 따른 정책(Policy)을 생성하는 단계(640)는, 미리 정해진 코드에 기초하여 학습 결과를 업데이트하여 정책을 생성하는 것을 포함할 수 있다.Step 640 of creating a policy according to an embodiment may include generating a policy by updating learning results based on a predetermined code.

일 실시예에 따른 경험 데이터들을 추출하고 학습하는 단계(520)는, 정책에 기초하여 제1 경험 데이터 및 제2 경험 데이터를 학습하는 것을 포함할 수 있다.The step 520 of extracting and learning experience data according to an embodiment may include learning first experience data and second experience data based on a policy.

일 실시예에 따른 경험 데이터들을 추출하고 학습하는 단계(520)는, 제1 파라미터에 의하여 제1 단계를 수행하고, 제1 파라미터와는 상이한 제2 파라미터에 의하여 제2 단계를 수행하는 것을 포함할 수 있다.The step 520 of extracting and learning experience data according to one embodiment may include performing a first step by a first parameter and performing a second step by a second parameter different from the first parameter. You can.

일 실시예에 따른 경험 데이터들을 추출하고 학습하는 단계(520)는, 경험 데이터를 모사하는 시뮬레이터 이미지에 의하여 제1 단계 및 제2 단계를 수행하는 것을 포함할 수 있다.The step 520 of extracting and learning experience data according to an embodiment may include performing a first step and a second step using a simulator image that simulates the experience data.

일 실시예에 따른 경험 데이터들을 추출하고 학습하는 단계(520)는, 정책을 모사하는 정책 최적화 이미지에 의하여 정책(Policy)을 생성하는 단계를 수행하는 것을 포함할 수 있다.The step 520 of extracting and learning experience data according to an embodiment may include performing a step of generating a policy using a policy optimization image that simulates the policy.

In one embodiment, the step 520 of extracting and learning experience data may include performing the first step and the second step by means of an agent image that simulates the processed experience data.

In one embodiment, the step 520 of extracting and learning experience data may include separating only the experience data from the policy and generating an agent image based on the separated experience data.

A machine-readable storage medium may be provided in the form of a non-transitory storage medium. Here, 'non-transitory storage medium' means only that the medium is a tangible device and does not include a signal (e.g., electromagnetic waves); the term does not distinguish between data being stored semi-permanently and data being stored temporarily in the storage medium. For example, a 'non-transitory storage medium' may include a buffer in which data is temporarily stored.

According to one embodiment, the methods according to the various embodiments disclosed in this document may be provided as part of a computer program product. A computer program product may be traded between a seller and a buyer as a commodity. A computer program product may be distributed in the form of a machine-readable storage medium (e.g., compact disc read only memory (CD-ROM)), or distributed online (e.g., downloaded or uploaded) through an application store or directly between two user devices (e.g., smartphones). In the case of online distribution, at least a portion of the computer program product (e.g., a downloadable app) may be at least temporarily stored in, or temporarily created on, a machine-readable storage medium such as the memory of a manufacturer's server, an application store's server, or a relay server.

10: Artificial intelligence system
100: Artificial intelligence engine
110: Buffer
120: Pipeline
121: Trainer
122: First simulation engine
123: Second simulation engine
130: Storage
200: AI model management
300: RIC platform

Claims

An O-RAN (Open Radio Access Network)-based artificial intelligence (AI) engine 100 that learns a plurality of data, comprising:
a buffer 110 that receives raw data;
a pipeline 120 that extracts and learns experience data from the raw data; and
a storage 130 that stores the learning results of the pipeline,
wherein the pipeline 120 comprises:
a first simulation engine 122 that processes first experience data;
a second simulation engine 123 that operates in parallel with the first simulation engine to process second experience data; and
a trainer 121 that receives the processed first experience data and second experience data from the first simulation engine and the second simulation engine, learns the received processed first experience data and second experience data, and generates a policy based on the learning result.

The artificial intelligence engine of claim 1,
wherein the first simulation engine comprises
a first agent and a first simulator that mutually learn the first experience data, and
the second simulation engine comprises
a second agent and a second simulator that mutually learn the second experience data.

The artificial intelligence engine of claim 1 or 2,
wherein the trainer 121
generates the policy by updating the learning result based on a predetermined code.

The artificial intelligence engine of any one of claims 1 to 3,
wherein the pipeline 120
learns the first experience data and the second experience data based on the policy.

The artificial intelligence engine of any one of claims 1 to 4,
wherein the pipeline 120
drives the first simulation engine according to a first parameter and drives the second simulation engine according to a second parameter different from the first parameter.

The artificial intelligence engine of any one of claims 1 to 5,
wherein the pipeline 120
drives the first simulation engine and the second simulation engine by means of a simulator image that simulates the experience data.

The artificial intelligence engine of any one of claims 1 to 6,
wherein the pipeline 120
drives the trainer by means of a policy optimization image that simulates the policy.

The artificial intelligence engine of any one of claims 1 to 7,
wherein the pipeline 120
drives the first simulation engine and the second simulation engine by means of an agent image that simulates the processed experience data.

The artificial intelligence engine of any one of claims 1 to 8,
wherein the pipeline 120
separates the experience data from the policy and generates an agent image based on the separated experience data.

The artificial intelligence engine of any one of claims 1 to 9,
wherein the storage 130
includes an external server.

A method by which an O-RAN (Open Radio Access Network)-based artificial intelligence (AI) engine learns a plurality of data, the method comprising:
a step 510 of receiving raw data;
a step 520 of extracting and learning experience data from the raw data; and
a step 530 of storing the learning results,
wherein the step 520 of extracting and learning the experience data comprises:
a first step 610 of processing first experience data;
a second step 620 of processing second experience data in parallel with the processing of the first experience data; and
a step 630 of receiving the first experience data and second experience data processed in the first step and the second step and learning the received processed first experience data and second experience data, and a step 640 of generating a policy based on the learning result.

The learning method of claim 11,
wherein the first step 610
comprises mutually learning the first experience data, and
the second step 620
comprises mutually learning the second experience data.

The learning method of claim 11 or 12,
wherein the step 640 of generating the policy
comprises generating the policy by updating the learning result based on a predetermined code.

The learning method of any one of claims 11 to 13,
wherein the step 520 of extracting and learning the experience data
comprises learning the first experience data and the second experience data based on the policy.

The learning method of any one of claims 11 to 14,
wherein the step 520 of extracting and learning the experience data
comprises performing the first step according to a first parameter and performing the second step according to a second parameter different from the first parameter.

The learning method of any one of claims 11 to 15,
wherein the step 520 of extracting and learning the experience data
comprises performing the first step and the second step by means of a simulator image that simulates the experience data.

The learning method of any one of claims 11 to 16,
wherein the step 520 of extracting and learning the experience data
comprises performing the step of generating the policy by means of a policy optimization image that simulates the policy.

The learning method of any one of claims 11 to 17,
wherein the step 520 of extracting and learning the experience data
comprises performing the first step and the second step by means of an agent image that simulates the processed experience data.

The learning method of any one of claims 11 to 18,
wherein the step 520 of extracting and learning the experience data
comprises separating only the experience data from the policy and generating an agent image based on the separated experience data.

A computer-readable recording medium on which a program for performing the method of any one of claims 11 to 19 on a computer is recorded.