KR102635613B1

KR102635613B1 - Method for embedding non-structured data and apparatus for performing the method

Info

Publication number: KR102635613B1
Application number: KR1020230089063A
Authority: KR
Inventors: 이상수; 임정택; 윤준영; 백인욱
Original assignee: 스마트마인드 주식회사 (SmartMind Co., Ltd.)
Priority date: 2023-07-10
Filing date: 2023-07-10
Publication date: 2024-02-20

Abstract

The present invention relates to an embedding method for unstructured data and an apparatus for performing the method. A method of embedding unstructured data in a data processing system may include converting, by an embedding artificial intelligence model group of the data processing system, the unstructured data into an embedding value, and storing, by a database of the data processing system, the embedding value.

Description

Embedding method for unstructured data and apparatus for performing the method {Method for embedding non-structured data and apparatus for performing the method}

The present invention relates to an embedding method for unstructured data and an apparatus for performing the method. More specifically, it relates to an embedding method that reduces data size and returns results quickly by storing the characteristics of unstructured data, and to an apparatus for performing the method.

With the rapid shift to non-face-to-face environments and mobile-first strategies, the explosive year-over-year growth of structured and unstructured data is driving demand for new decision-making and services based on big data in every field.

This rapid growth in data generation and consumption is expected to accelerate further. Collecting, refining, and analyzing the patterns contained not only in structured data but also in unstructured data to find future growth engines is becoming a new business model for companies.

Prior art includes Korean Patent Application No. 10-2014-0036626.

An object of the present invention is to solve all of the above-mentioned problems.

Another object of the present invention is to reduce data size and obtain results quickly by storing the file characteristics of unstructured data through embedding.

Another object of the present invention is to obtain results even faster by using a GPU to accelerate both the embedding of unstructured data with the CONVERT statement and the unstructured search performed over those embeddings with the SEARCH statement.

Representative configurations of the present invention for achieving the above objects are as follows.

According to an embodiment of the present invention, a method of embedding unstructured data in a data processing system may include converting, by an embedding artificial intelligence model group of the data processing system, the unstructured data into an embedding value, and storing, by a database of the data processing system, the embedding value.

The embedding artificial intelligence model group includes at least one distinct artificial intelligence model that generates the required embedding value, and the embedding value may be determined based on the feature map of the at least one artificial intelligence model.
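The embedding step described above can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes each model in the group returns its feature map as a list of channels, and that the embedding value is formed by global-average-pooling each channel, concatenating the pooled values across models, and L2-normalizing the result. The pooling and normalization choices and the toy models `mean_byte_model` and `length_model` are hypothetical stand-ins for real networks.

```python
import math
from typing import Callable, Sequence

# A feature map is simplified here to a list of channels, each a flat list
# of activations (standing in for a C x H x W tensor).
FeatureMap = list[list[float]]
Model = Callable[[bytes], FeatureMap]


def global_average_pool(feature_map: FeatureMap) -> list[float]:
    """Collapse each channel of a feature map to its mean activation."""
    return [sum(channel) / len(channel) for channel in feature_map]


def l2_normalize(vector: list[float]) -> list[float]:
    """Scale a vector to unit length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm > 0 else vector


def embed(data: bytes, model_group: Sequence[Model]) -> list[float]:
    """Convert one unstructured item into a single embedding value.

    Every model in the (hypothetical) model group yields a feature map; the
    pooled maps are concatenated and then L2-normalized.
    """
    pooled: list[float] = []
    for model in model_group:
        pooled.extend(global_average_pool(model(data)))
    return l2_normalize(pooled)


# Two toy stand-ins for real networks (hypothetical, for illustration only).
def mean_byte_model(data: bytes) -> FeatureMap:
    return [[float(b) for b in data]]


def length_model(data: bytes) -> FeatureMap:
    return [[float(len(data))]]


vector = embed(b"\x03\x03\x03\x03", [mean_byte_model, length_model])
print(vector)  # pooled values [3.0, 4.0] normalized to unit length
```

Concatenating per-model features is one common way to combine several models; the source only states that the embedding is based on the models' feature maps.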

The embedding value may be generated by a CONVERT statement based on the extended SQL engine, and it may be searched by a SEARCH statement based on the extended SQL engine.
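The patent does not give the concrete CONVERT and SEARCH syntax. As a rough sketch of what the two statements could do, the Python functions below (`convert` and `search` are hypothetical names) store embedding values as float32 BLOBs in an SQLite table and rank them by cosine similarity to a query embedding. SQLite stands in for the system's database, the search is a brute-force scan, the example vectors are made up, and the GPU acceleration mentioned earlier is not modeled.

```python
import math
import sqlite3
import struct


def to_blob(vector: list[float]) -> bytes:
    """Pack a vector as little-endian float32 values."""
    return struct.pack(f"<{len(vector)}f", *vector)


def from_blob(blob: bytes) -> list[float]:
    """Unpack a float32 BLOB back into a Python list."""
    return list(struct.unpack(f"<{len(blob) // 4}f", blob))


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def convert(conn: sqlite3.Connection, item_id: str, embedding: list[float]) -> None:
    """Hypothetical stand-in for CONVERT: persist one embedding value."""
    conn.execute(
        "INSERT OR REPLACE INTO embeddings (id, vec) VALUES (?, ?)",
        (item_id, to_blob(embedding)),
    )


def search(conn: sqlite3.Connection, query: list[float], top_k: int = 3) -> list[tuple[str, float]]:
    """Hypothetical stand-in for SEARCH: rank all stored embeddings by cosine similarity."""
    rows = conn.execute("SELECT id, vec FROM embeddings").fetchall()
    scored = [(item_id, cosine(query, from_blob(blob))) for item_id, blob in rows]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE embeddings (id TEXT PRIMARY KEY, vec BLOB NOT NULL)")
convert(conn, "cat.jpg", [0.9, 0.1, 0.0])
convert(conn, "dog.jpg", [0.7, 0.7, 0.0])
convert(conn, "car.jpg", [0.0, 0.1, 0.9])
results = search(conn, [1.0, 0.0, 0.0], top_k=2)
print(results)
```

A production system would typically replace the linear scan with a vector index; this sketch only shows the store-then-rank flow.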

According to another embodiment of the present invention, a data processing system that embeds unstructured data may be implemented such that an embedding artificial intelligence model group converts the unstructured data into an embedding value and a database stores the embedding value.

According to the present invention, storing the file characteristics of unstructured data through embedding reduces data size and allows results to be obtained quickly.
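A back-of-the-envelope calculation shows why storing characteristics instead of whole files can shrink data. The figures are assumptions, not values from the patent: an uncompressed 8-bit RGB Full HD frame versus a 512-dimensional float32 embedding. Real images are usually compressed, so the actual ratio is smaller, and an embedding stores features for search rather than a reconstructable copy of the file.

```python
def raw_image_bytes(width: int, height: int, channels: int = 3, bytes_per_channel: int = 1) -> int:
    """Size of an uncompressed image buffer in bytes."""
    return width * height * channels * bytes_per_channel


def embedding_bytes(dimensions: int, bytes_per_value: int = 4) -> int:
    """Size of a dense embedding vector in bytes (float32 by default)."""
    return dimensions * bytes_per_value


raw = raw_image_bytes(1920, 1080)  # assumed Full HD RGB frame: 6,220,800 bytes
emb = embedding_bytes(512)         # assumed 512-d float32 embedding: 2,048 bytes
ratio = raw / emb                  # 3037.5
print(f"{raw:,} bytes -> {emb:,} bytes (about {ratio:,.1f}x smaller)")
```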

In addition, according to the present invention, using a GPU to accelerate both the embedding of unstructured data with the CONVERT statement and the unstructured search over those embeddings with the SEARCH statement allows results to be obtained even faster.

Figure 1 is a conceptual diagram showing an existing data processing system.
Figure 2 is a conceptual diagram showing a data processing system for processing structured data and unstructured data on one platform according to an embodiment of the present invention.
Figure 3 is a conceptual diagram showing a data processing system for processing structured data and unstructured data on one platform according to an embodiment of the present invention.
Figure 4 is a conceptual diagram showing the operation of a data processing system according to an embodiment of the present invention.
Figure 5 is a conceptual diagram showing the operation of a data processing system according to an embodiment of the present invention.
Figure 6 is a conceptual diagram showing a data processing method based on a data processing system according to an embodiment of the present invention.
Figure 7 is a conceptual diagram showing a method of analyzing queries and processing data in a data processing system according to an embodiment of the present invention.
Figure 8 is a conceptual diagram showing a resource distribution algorithm according to an embodiment of the present invention.
Figure 9 is a conceptual diagram showing a data processing system that processes input multi-query according to an embodiment of the present invention.
Figure 10 is a conceptual diagram showing the operation of a multi-query scheduler according to an embodiment of the present invention.
Figure 11 is a conceptual diagram showing a method of processing an input multi-query according to an embodiment of the present invention.
Figure 12 is a conceptual diagram showing a server that provides processing services for structured data and unstructured data according to an embodiment of the present invention.
Figure 13 is a conceptual diagram showing a query processing method of a shared workspace hub according to an embodiment of the present invention.
Figure 14 is a conceptual diagram showing a resource distribution method according to an embodiment of the present invention.
Figure 15 is a conceptual diagram showing an unstructured embedding method for unstructured data according to an embodiment of the present invention.
Figure 16 is a conceptual diagram showing an unstructured embedding method for unstructured data according to an embodiment of the present invention.
Figure 17 is a conceptual diagram showing the results of using the convert syntax and search syntax according to an embodiment of the present invention.
Figure 18 is a conceptual diagram showing a method for binarizing an embedding value according to an embodiment of the present invention.
Figure 19 is an example of binarization according to an embodiment of the present invention.
Figure 20 shows a method for storing an embedding value or a binary value according to an embodiment of the present invention.
Figure 21 is a conceptual diagram showing a workspace backup method according to an embodiment of the present invention.
Figure 22 is a flowchart showing a workspace backup method according to an embodiment of the present invention.
Figure 23 is a conceptual diagram showing a server backup method according to an embodiment of the present invention.
Figure 24 is a flowchart showing a server backup method according to an embodiment of the present invention.
Figure 25 is a conceptual diagram showing a workspace migration method according to an embodiment of the present invention.
Figure 26 is a flowchart showing a workspace migration method according to an embodiment of the present invention.
Figure 27 is a conceptual diagram showing a server migration method according to an embodiment of the present invention.
Figure 28 is a flowchart showing a server migration method according to an embodiment of the present invention.
Figure 29 is a conceptual diagram showing an unstructured model caching method according to an embodiment of the present invention.
Figure 30 is a conceptual diagram showing a first caching algorithm according to an embodiment of the present invention.
Figure 31 is a conceptual diagram showing an unstructured model caching method according to an embodiment of the present invention.

The detailed description of the present invention below refers to the accompanying drawings, which show, by way of example, specific embodiments in which the present invention may be practiced. These embodiments are described in sufficient detail to enable those skilled in the art to practice the invention. It should be understood that the various embodiments of the present invention differ from one another but are not necessarily mutually exclusive. For example, specific shapes, structures, and characteristics described herein may be changed from one embodiment to another without departing from the spirit and scope of the invention. It should also be understood that the location or arrangement of individual components within each embodiment may be changed without departing from the spirit and scope of the present invention. Accordingly, the detailed description below is not to be taken in a limiting sense, and the scope of the present invention should be taken to encompass the scope of the claims and all equivalents thereof. Like reference numbers in the drawings refer to the same or similar elements throughout the several aspects.

Hereinafter, several preferred embodiments of the present invention will be described in detail with reference to the accompanying drawings so that those skilled in the art can easily practice the present invention.

Figure 1 is a conceptual diagram showing an existing data processing system.

Figure 1 shows an existing data processing system that processes structured data and unstructured data.

Referring to FIG. 1, the way an existing data processing system handles structured data 100 and unstructured data 120 is described.

Structured data 100 is data stored in tables according to a schema, where tables can be connected to one another through relationships. Structured data 100 has a well-defined schema for the information it holds and can be represented in rows and columns. Each column represents a different attribute, while each row holds the data associated with a single instance. Rows and columns form tables that can be easily referenced, different tables can be linked, and when multiple tables are linked in succession, a relational database 140 is formed.
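To make the relational terms above concrete, the following sketch uses SQLite from Python's standard library: each table has a fixed schema, a foreign key links `orders` to `customers`, and a join follows that relationship. The table and column names are illustrative only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enforce the relationship between tables

# Each column is an attribute; each row is one instance.
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    """CREATE TABLE orders (
           id INTEGER PRIMARY KEY,
           customer_id INTEGER NOT NULL REFERENCES customers(id),
           amount REAL NOT NULL)"""
)
conn.executemany("INSERT INTO customers (id, name) VALUES (?, ?)", [(1, "Kim"), (2, "Park")])
conn.executemany(
    "INSERT INTO orders (id, customer_id, amount) VALUES (?, ?, ?)",
    [(10, 1, 25.0), (11, 1, 5.0), (12, 2, 40.0)],
)

# Follow the relationship with a join and aggregate per customer.
rows = conn.execute(
    """SELECT c.name, SUM(o.amount)
       FROM customers AS c JOIN orders AS o ON o.customer_id = c.id
       GROUP BY c.name ORDER BY c.name"""
).fetchall()
print(rows)
```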

Unstructured data 120 is the opposite of structured data 100: it follows no predefined rules, so the meaning of its values is hard to grasp directly. It may include data such as voice, images, and video.

The existing data processing system could query only structured data 100 using SQL (Structured Query Language); processing unstructured data 120 required a separate, schema-less NoSQL database.

In addition, the existing data processing system supported real-time queries on structured data 100 but not on unstructured data 120. In existing database processing systems, unstructured data 120 was handled through batch processing rather than real-time processing, so images, video, and voice could not be searched in real time. More specifically, existing systems struggle to analyze large volumes of unstructured data 120 in real time. Processing was therefore based on a Lambda architecture 150, which combines a data table obtained in real time with a batch table precomputed at fixed times, and structured data 100 and unstructured data 120 were processed on separate DBMSs (database management systems).
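The Lambda architecture 150 that the invention aims to replace can be illustrated with a toy example. The sketch below is a simplification under stated assumptions, not a description of any specific product: events are (timestamp, key) pairs, the batch layer precomputes counts up to a fixed cutoff, the real-time (speed) layer counts only newer events, and a query is answered by merging the two views. Keeping two such code paths consistent is the integration burden described above.

```python
from collections import Counter

Event = tuple[int, str]  # (timestamp, key), a hypothetical event format


def batch_view(events: list[Event], cutoff: int) -> Counter:
    """Counts precomputed by the batch layer for events up to the cutoff."""
    return Counter(key for ts, key in events if ts <= cutoff)


def speed_view(events: list[Event], cutoff: int) -> Counter:
    """Counts maintained by the real-time layer for events after the cutoff."""
    return Counter(key for ts, key in events if ts > cutoff)


def serve(batch: Counter, speed: Counter) -> Counter:
    """Serving layer: merge the batch and real-time views into one answer."""
    return batch + speed


events = [(1, "image"), (2, "video"), (3, "image"), (5, "audio"), (6, "image")]
batch = batch_view(events, cutoff=4)
merged = serve(batch, speed_view(events, cutoff=4))
print(merged)
```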

In addition, the existing data processing system used a variety of pipelines, frameworks, and languages for batch processing of unstructured data 120. As a result, data could not be processed under a single governance model, and maintenance after development was difficult.

In addition, the existing data processing system could not train artificial intelligence models on unstructured data 120 inside the database. Training on structured data 100 was performed with an AI engine implemented in the database, but unstructured data 120 was not processed with SQL inside the database, so AI engine modeling based on unstructured data was impossible within the database.

In addition, when modeling an AI engine, the existing data processing system generates a sample table 160 by sampling from the population table of the operational system and performs the modeling on that sample, while the modeling platform that performs the modeling differs from the operating platform that runs the actual workload. Because of this difference between the modeling platform and the operating platform, the modeling results are inaccurate.

In existing data processing systems, AI modeling using sample data takes a great deal of time.

In existing data processing systems, sample data is first extracted from the population table. Because the population data may exist in various forms other than tables, transforming and extracting it takes time, and preprocessing the data for modeling also takes considerable time.

In addition, in the AI modeling process of existing data processing systems, the sample data includes both structured and unstructured data, and structured/unstructured AI modeling requires a Lambda architecture. Developing on a Lambda architecture involves multiple platforms and languages, and much time is wasted integrating them because of differences in platform characteristics and interoperability problems.

Moreover, while data is being extracted from the population table and AI modeling is carried out on the Lambda architecture, new data keeps accumulating in the population table in real time. As a result, when the AI model built in the existing data processing system is applied, its predictions (the model's output values) are inaccurate, and remodeling requires going through the first and second processes described above (sample extraction and Lambda-architecture modeling) again, which takes a long time.

When the data processing system according to an embodiment of the present invention is used, population data is managed in a single form (a table), sample data can be extracted with a simple query statement, and no Lambda architecture is required. AI modeling on structured and unstructured data can therefore be carried out easily on one platform in one language, without integration problems.
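As a rough illustration of extracting sample data with a single query, the snippet below uses SQLite from Python's standard library as a stand-in database; the table names `population` and `sample` and the sample size are hypothetical. `ORDER BY RANDOM() LIMIT n` is SQLite syntax; PostgreSQL offers `ORDER BY random() LIMIT n` as well as the `TABLESAMPLE` clause.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE population (id INTEGER PRIMARY KEY, label TEXT)")
conn.executemany(
    "INSERT INTO population (id, label) VALUES (?, ?)",
    [(i, "a" if i % 2 else "b") for i in range(1, 101)],
)

# One statement creates a random 10-row sample table from the population table.
conn.execute("CREATE TABLE sample AS SELECT * FROM population ORDER BY RANDOM() LIMIT 10")

count = conn.execute("SELECT COUNT(*) FROM sample").fetchone()[0]
ids = [row[0] for row in conn.execute("SELECT id FROM sample")]
print(count, sorted(ids))
```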

Accordingly, the data processing platform according to an embodiment of the present invention can process structured data 100 and unstructured data 120 on a single platform using a single language.

In addition, because the operating platform and the modeling platform reside on the same platform, the data processing platform according to an embodiment of the present invention enables more accurate modeling and provides AI modeling functions based on structured data 100 and unstructured data 120 without separate batch processing.

The functions of the data processing platform according to embodiments of the present invention are described in more detail below.

Figure 2 is a conceptual diagram showing a data processing system for processing structured data and unstructured data on one platform according to an embodiment of the present invention.

Figure 2 shows a data processing system for processing structured data and unstructured data on a single platform.

Referring to FIG. 2, the data processing system can process unstructured data 220 and structured data 210 on a single platform. In the present invention, a new data processing syntax is defined for processing unstructured data 220 together with structured data 210 on one platform, and an extended SQL 240 that supports this newly defined syntax may be defined.

General queries on structured data 210 may be processed with existing SQL such as PostgreSQL, and queries on unstructured data may be processed with the extended SQL 240 newly defined in the present invention.

An extended SQL engine 250 may be defined to execute the data processing syntax newly defined in the extended SQL 240.

Unlike existing data processing systems, the extended SQL engine 250 makes nested queries 230 possible. A nested query 230 mixes queries on structured data 210 and unstructured data 220 and enables sequential or combined processing of the structured data 210 and unstructured data 220 stored in the database.

That is, whereas structured data 210 and unstructured data 220 were previously processed on separate DBMSs (database management systems), in the present invention both are processed on a single platform based on the extended SQL engine 250, and through nested queries 230, processing of structured data 210 and unstructured data 220 can take place simultaneously in a single database 260. On this basis, AI modeling on structured data 210 and unstructured data 220 is also performed on the AI engine 270 of the data processing system.

Various AI engines, such as classification, regression, recommendation, and speech recognition models, may be provided in advance, and user-created models or open-source AI engines may also be used without restriction.

The data processing system of the present invention can process unstructured data 220 within a single platform without separate batch processing, a separate language, or a separate platform. It is an integrated platform in which both structured data 210 and unstructured data 220 can be queried with SQL alone and on which AI modeling on both kinds of data is possible. Because the modeling platform and the operating platform are the same, the loss of modeling accuracy caused by a mismatched population can also be reduced.

In addition, the data processing system of the present invention combines the functions of an RDB (relational database), AI, and a big data platform in a single platform, which can greatly reduce the inefficiencies that arise during AI-based digital transformation. Based on big data processing and distributed parallel processing technology, it enables data processing more than twice as fast as existing systems.

That is, according to an embodiment of the present invention, a method of processing structured data and unstructured data in a database may include receiving, by a data processing system, a nested query, and processing, by the data processing system, the nested query. The nested query may combine a first query on unstructured data and a second query on structured data.

Processing the nested query may include processing, by the data processing system, the unstructured data with an extended SQL engine that handles extended SQL (extended structured query language), and processing, by the data processing system, the structured data with a general SQL engine that handles PostgreSQL.

The data processing system creates data tables for structured data and data tables for unstructured data and processes them in a single database, and it can support AI engine modeling based on both structured and unstructured data in that single database.

In addition, according to an embodiment of the present invention, the data processing system may also process structured data and unstructured data separately. The data processing system may be implemented to receive and process an unstructured data processing query and a structured data processing query, where the unstructured data processing query processes only unstructured data and the structured data processing query processes only structured data.

Unstructured data processing queries may be processed with extended SQL and the extended SQL engine, and structured data processing queries may be processed with general SQL (PostgreSQL) and the general SQL engine.
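A minimal sketch of routing separate queries to the two engines follows. It assumes, hypothetically, that an extended statement can be recognized by its leading keyword; the keyword set is drawn from the statements named in this document (LIST, PRINT IMAGE, SEARCH IMAGE, CONVERT, SEARCH). SQLite stands in for the general PostgreSQL engine, and `toy_extended_engine` is a placeholder for the extended SQL engine.

```python
import sqlite3
from typing import Any, Callable

# Hypothetical leading keywords that mark an extended (unstructured-data) statement.
EXTENDED_KEYWORDS = {"LIST", "PRINT", "SEARCH", "CONVERT"}


def is_extended(query: str) -> bool:
    """Classify a query by its first keyword (case-insensitive)."""
    words = query.split()
    return bool(words) and words[0].upper() in EXTENDED_KEYWORDS


def run(query: str, conn: sqlite3.Connection, extended_engine: Callable[[str], Any]) -> Any:
    """Send extended statements to the extended engine, everything else to SQL."""
    if is_extended(query):
        return extended_engine(query)
    return conn.execute(query).fetchall()


def toy_extended_engine(query: str) -> list[tuple[str, str]]:
    """Placeholder for the extended SQL engine (hypothetical)."""
    return [("extended", query)]


conn = sqlite3.connect(":memory:")
print(run("SELECT 1 + 1", conn, toy_extended_engine))  # handled by the general engine
print(run("LIST MODELS", conn, toy_extended_engine))   # handled by the extended engine
```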

Figure 3 is a conceptual diagram showing a data processing system for processing structured data and unstructured data on one platform according to an embodiment of the present invention.

Figure 3 shows how a conventional general query and an extended query, defined with SQL extended for unstructured data, together form a nested query, and how the data processing system processes that nested query.

Referring to FIG. 3, a nested query for processing unstructured data and structured data may be received as the input query 300.

For example, the nested query may include a first query 310, a second query 320, and a third query 330, where the first query 310 and the third query 330 are extended queries 350 and the second query 320 is a general query 360.

The first query 310 may be PRINT IMAGE, the second query 320 may be SELECT, and the third query 330 may be SEARCH IMAGE. Together, the first query 310, the second query 320, and the third query 330 form the input query in a nested structure.

The input query 300 may be parsed by a parser. Using a lexer, the nested query is divided into general queries 360 and extended queries 350, and the parser can split the general queries 360 from the extended queries 350.

The first query 310, the second query 320, and the third query 330 may be interpreted and processed through clause analysis and a query tree. They may be processed in the order third query 330, second query 320, first query 310, that is, from the innermost query outward.

As extended queries 350, the first query 310 and the third query 330 can be processed by the extended SQL engine, and the second query 320, a general query, can be processed by the PostgreSQL engine, the SQL engine for general queries.
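The split-and-route step above can be sketched in Python. This is a minimal illustration, not the patent's parser: it assumes the nested query is already available as a chain of clauses, routes each clause by its leading keyword (the keyword sets are taken from the statements defined in this document), and evaluates the innermost clause first. The `QueryNode` type, the helper names, and the `images` table are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Leading keywords of the extended SQL statements described in this document.
# Routing on keywords alone is a simplifying assumption of this sketch.
SINGLE_WORD = {"LIST", "SEARCH", "PRINT", "EVALUATE", "PREDICT"}
TWO_WORD = {"CONVERT USING", "BUILD MODEL", "FIT MODEL", "UPLOAD MODEL", "DELETE MODEL"}


@dataclass
class QueryNode:
    clause: str                          # e.g. "PRINT IMAGE"
    inner: Optional["QueryNode"] = None  # sub-query whose result this clause consumes


def engine_for(clause: str) -> str:
    """Return 'extended' for extended-SQL clauses, 'postgresql' otherwise."""
    tokens = clause.upper().split()
    if not tokens:
        raise ValueError("empty clause")
    if tokens[0] in SINGLE_WORD or " ".join(tokens[:2]) in TWO_WORD:
        return "extended"
    return "postgresql"


def execution_plan(root: QueryNode) -> List[Tuple[str, str]]:
    """Flatten a nested query into (clause, engine) steps, innermost first."""
    chain: List[QueryNode] = []
    node: Optional[QueryNode] = root
    while node is not None:
        chain.append(node)
        node = node.inner
    return [(n.clause, engine_for(n.clause)) for n in reversed(chain)]


# The FIG. 3 example: PRINT IMAGE wraps SELECT, which wraps SEARCH IMAGE.
query = QueryNode("PRINT IMAGE",
                  QueryNode("SELECT * FROM images WHERE label = 'cat'",
                            QueryNode("SEARCH IMAGE")))
for step, (clause, engine) in enumerate(execution_plan(query), start=1):
    print(step, engine, clause)
```

Checking two-word prefixes keeps plain SQL such as `DELETE FROM ...` on the PostgreSQL path while `DELETE MODEL ...` goes to the extended engine.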

The extended SQL engine and the PostgreSQL engine can both be connected to a single database to process queries, so artificial intelligence training on both structured and unstructured data is possible on that one database.

Figure 4 is a conceptual diagram showing the operation of a data processing system according to an embodiment of the present invention.

Figure 4 discloses the query functions of the extended SQL for processing structured and unstructured data simultaneously on a single platform.

Referring to FIG. 4, query functions for unstructured data can be performed using the extended SQL statements below.

(1) Check stored models (LIST) (410)

Using the "LIST" statement, users can check the pre-built models and the user-created models available for processing the unstructured data tables.

For example, the LIST MODEL statement shows the models created by the user, and the LIST PREBUILT MODEL statement shows the pre-built models.

(2) Add unstructured features (CONVERT USING) (430)

Using the "CONVERT USING" statement, users can take the information in unstructured data such as images, video, and audio, convert it into vectors with a quantification algorithm, and add those values to the dataset to be used.

Table 2 below is an example of the CONVERT USING statement.

<Table 2>

CONVERT USING [AI model to use]
OPTIONS (
    table_name=[name of the table to save to]
)
AS
[dataset to use]

For example, the CONVERT USING statement can take the image files in a specific path, extract additional attributes from them with an AI model, and create the results as a data table in the database.
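A minimal sketch of what CONVERT USING does, under stated assumptions: the AI model is replaced by a hypothetical stand-in (a normalized grayscale histogram), images are plain lists of pixel rows, and the database is a Python dict of tables. Only the shape of the operation follows the text: embed every item of the dataset and store the vectors in the table named by `table_name`.

```python
from typing import Callable, Dict, List, Sequence

Image = Sequence[Sequence[int]]  # grayscale pixel rows, values 0-255
Vector = List[float]


def histogram_embedding(pixels: Image, bins: int = 4) -> Vector:
    """Stand-in for an embedding model: a normalized intensity histogram."""
    counts = [0] * bins
    total = 0
    for row in pixels:
        for value in row:
            counts[min(value * bins // 256, bins - 1)] += 1
            total += 1
    return [c / total for c in counts] if total else [0.0] * bins


def convert_using(model: Callable[[Image], Vector], table_name: str,
                  dataset: Dict[str, Image],
                  database: Dict[str, List[dict]]) -> int:
    """Mimic CONVERT USING [model] OPTIONS (table_name=...) AS [dataset].

    Each item is embedded and stored as a (path, embedding) row in
    database[table_name]; returns the number of rows written.
    """
    rows = database.setdefault(table_name, [])
    for path, item in dataset.items():
        rows.append({"path": path, "embedding": model(item)})
    return len(dataset)


db: Dict[str, List[dict]] = {}
images = {"img/a.png": [[0, 64], [128, 255]], "img/b.png": [[0, 0], [0, 255]]}
convert_using(histogram_embedding, "image_vectors", images, db)
print(db["image_vectors"])
```

Swapping `histogram_embedding` for a real image encoder would leave `convert_using` unchanged, which mirrors how the statement takes the model as a parameter.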

(3) Search unstructured data (SEARCH) (440)

The SEARCH statement can be used to search unstructured data by content, meaning, or similarity.

Table 3 below is an example of the SEARCH statement.

<Table 3>

SEARCH [user-specified data table name]
USING [AI model to use]
AS [dataset to use]

For example, the SEARCH statement can be used to find similar images with an AI model that quantifies images.
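Similarity search over stored embeddings, as used by SEARCH, is commonly implemented as a nearest-neighbour ranking. The sketch below assumes cosine similarity and a brute-force scan over rows shaped like `{"path": ..., "embedding": [...]}`; the document does not specify the metric or index, and a production system would likely use an approximate-nearest-neighbour index instead.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(rows: List[Dict], query_embedding: Sequence[float],
           k: int = 3) -> List[Tuple[str, float]]:
    """Rank stored rows by similarity to the query embedding; keep the top k."""
    scored = [(row["path"], cosine_similarity(row["embedding"], query_embedding))
              for row in rows]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


table = [
    {"path": "cat.png", "embedding": [1.0, 0.0]},
    {"path": "dog.png", "embedding": [0.0, 1.0]},
    {"path": "cat2.png", "embedding": [0.9, 0.1]},
]
print(search(table, [1.0, 0.0], k=2))
```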

(4) Output results (PRINT) (450)

Users can output image, audio, and video files using the "PRINT" statement. In addition, a subquery can be passed to "PRINT" so that its results are output directly.

Table 4 below is an example of the "PRINT" statement.

<Table 4>

PRINT IMAGE, AUDIO, VIDEO
AS [dataset to output]

For example, the PRINT statement can output the image, video, or audio files in a data table.

The query syntax above is newly defined for the extended SQL of the present invention.

Unstructured data tables created with the syntax above make it possible to search image, audio, and video data by keyword or text, and also to search image, audio, and video data using image, audio, or video data as the query.

That is, the data processing system according to an embodiment of the present invention supports real-time search over unstructured data in addition to the existing real-time search over structured data. The extended SQL also allows nested queries that combine queries on unstructured and structured data, so models can be built using both kinds of data.

Figure 5 is a conceptual diagram showing the operation of a data processing system according to an embodiment of the present invention.

Figure 5 discloses the machine learning (ML) functions of the extended SQL for processing structured and unstructured data simultaneously on a single platform.

Referring to FIG. 5, ML functions for unstructured data can be performed using the extended SQL statements below.

(1) Model training (BUILD MODEL) (510)

Users can create an artificial intelligence model using the "BUILD MODEL" statement.

Table 5 below is an example of the "BUILD MODEL" statement.

<Table 5>

BUILD MODEL [user-specified model name]
USING [AI model to use]
OPTIONS ([option values required to create the AI model])
AS [dataset to use]

For example, a user can use the "BUILD MODEL" statement to create a movie recommendation model.
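To make the Table 5 grammar concrete, here is a hypothetical regular-expression parser for a single BUILD MODEL statement. The `key=value` format inside `OPTIONS (...)` is an assumption (the document only shows `table_name=...` in Table 2), as are the example names `movie_rec`, `als`, and `ratings`. A real extended-SQL parser would use a proper lexer and grammar, as described for FIG. 3.

```python
import re
from typing import Dict

# BUILD MODEL <name> USING <model> [OPTIONS (k=v, ...)] AS <dataset>
BUILD_MODEL = re.compile(
    r"^\s*BUILD\s+MODEL\s+(?P<name>\w+)\s+"
    r"USING\s+(?P<model>\w+)\s+"
    r"(?:OPTIONS\s*\((?P<options>[^)]*)\)\s*)?"
    r"AS\s+(?P<dataset>.+?)\s*;?\s*$",
    re.IGNORECASE | re.DOTALL,
)


def parse_build_model(statement: str) -> Dict[str, object]:
    """Split a BUILD MODEL statement into name, model, options, and dataset."""
    match = BUILD_MODEL.match(statement)
    if match is None:
        raise ValueError("not a BUILD MODEL statement")
    options: Dict[str, str] = {}
    for pair in (match.group("options") or "").split(","):
        if pair.strip():
            key, _, value = pair.partition("=")
            options[key.strip()] = value.strip().strip("'\"")
    return {
        "name": match.group("name"),
        "model": match.group("model"),
        "options": options,
        "dataset": match.group("dataset"),
    }


stmt = ("BUILD MODEL movie_rec USING als "
        "OPTIONS (epochs=10, user_col='user_id') "
        "AS SELECT * FROM ratings;")
print(parse_build_model(stmt))
```

Because OPTIONS is optional in the pattern, the same parser accepts statements that rely entirely on the model's defaults.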

(2) Model evaluation (EVALUATE) (520)

Users can evaluate the performance of an artificial intelligence model using the "EVALUATE" statement.

Table 6 below is an example of the "EVALUATE" statement.

<Table 6>

EVALUATE
USING [name of a previously trained model]
OPTIONS ([option values required for evaluating each model])
AS
[dataset to use]

For example, the "EVALUATE" statement can be used to evaluate a classification model that the user created earlier with BUILD MODEL.

(3) Model retraining (FIT MODEL) (530)

Users can retrain a model using the "FIT MODEL" statement.

Table 7 below is an example of the "FIT MODEL" statement.

<Table 7>

FIT MODEL [user-specified model name]
USING [name of a previously trained model | name of a pre-trained AI model]
OPTIONS ([option values required to create the AI model])
AS
[dataset to use]

For example, with "FIT MODEL" a user can retrain a previously created model on a newly added dataset.

(4) Model upload (UPLOAD MODEL) (540)

Using the "UPLOAD MODEL" statement, users can upload a model they built themselves in a Python environment so that it runs under the extended SQL.

<Table 8>

UPLOAD MODEL [user-specified model name]
OPTIONS ([option values required when uploading the model])
FROM [path of the model to upload]

(5) Applying a model (PREDICT) (550)

Using the "PREDICT" statement, users can apply an artificial intelligence model to a test dataset to perform tasks such as prediction, classification, and recommendation.

Table 9 below is an example of the "PREDICT" statement.

<Table 9>

PREDICT
USING [name of a previously trained model]
OPTIONS ([option values required for inference with each model])
AS
[test dataset to use]

For example, the "PREDICT USING" statement can use the recommendation model created earlier with BUILD MODEL to recommend a list of movies that the user with user ID 31 might like.

(6) Deleting a model (DELETE MODEL) (560)

Users can delete a created or uploaded model using the "DELETE MODEL" statement.

Table 10 below is an example of the "DELETE MODEL" statement.

<Table 10>

DELETE MODEL [name of the model to delete]

For example, the "DELETE MODEL" statement can delete from the database the movie recommendation model that the user created with BUILD MODEL.
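The model lifecycle statements (BUILD MODEL, EVALUATE, FIT MODEL, PREDICT, DELETE MODEL, and LIST MODEL) map naturally onto a small registry. The sketch below is illustrative only: the in-memory `ModelRegistry` and the toy one-dimensional nearest-centroid classifier are hypothetical stand-ins for the system's model store and AI models, and retraining is simplified to updating the existing model in place rather than saving it under a new name.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Sample = Tuple[float, str]  # (feature value, label)


class CentroidModel:
    """Toy 1-D nearest-centroid classifier standing in for a real AI model."""

    def __init__(self) -> None:
        self.sums: Dict[str, float] = defaultdict(float)
        self.counts: Dict[str, int] = defaultdict(int)

    def fit(self, data: Sequence[Sample]) -> None:
        # Running sums let new data be folded in without the old training set.
        for x, label in data:
            self.sums[label] += x
            self.counts[label] += 1

    def predict(self, x: float) -> str:
        return min(self.counts, key=lambda lbl: abs(self.sums[lbl] / self.counts[lbl] - x))


class ModelRegistry:
    """In-memory stand-in for the system's model store."""

    def __init__(self) -> None:
        self.models: Dict[str, CentroidModel] = {}

    def build(self, name: str, data: Sequence[Sample]) -> None:      # BUILD MODEL
        if name in self.models:
            raise ValueError(f"model {name!r} already exists")
        model = CentroidModel()
        model.fit(data)
        self.models[name] = model

    def evaluate(self, name: str, data: Sequence[Sample]) -> float:  # EVALUATE (accuracy)
        model = self.models[name]
        return sum(model.predict(x) == y for x, y in data) / len(data)

    def fit(self, name: str, new_data: Sequence[Sample]) -> None:    # FIT MODEL
        self.models[name].fit(new_data)

    def predict(self, name: str, xs: Sequence[float]) -> List[str]:  # PREDICT
        return [self.models[name].predict(x) for x in xs]

    def delete(self, name: str) -> None:                             # DELETE MODEL
        del self.models[name]

    def list_models(self) -> List[str]:                              # LIST MODEL
        return sorted(self.models)


registry = ModelRegistry()
registry.build("clf", [(1.0, "low"), (2.0, "low"), (9.0, "high"), (10.0, "high")])
print(registry.evaluate("clf", [(1.0, "low"), (9.0, "high"), (6.0, "low")]))
registry.fit("clf", [(6.0, "low"), (6.0, "low")])
print(registry.predict("clf", [6.0]))
registry.delete("clf")
```

Because the toy model keeps running sums, the FIT MODEL step shifts the "low" centroid from 1.5 to 3.75, which changes the prediction for 6.0 from "high" to "low".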

With the extended SQL above, AI modeling on unstructured and structured data can be performed within a single platform, the data processing system, without a separate batch process.

Both pre-built AI models and user-created AI models can reside in the data processing system, and through model creation a variety of AI models can be built, such as classification models, regression models, recommendation systems, and speech recognition models.

Figure 6 is a conceptual diagram showing a data processing method based on the data processing system according to an embodiment of the present invention.

Figure 6 discloses a method of processing data held in a separate database using the data processing system described above.

Referring to FIG. 6, structured and unstructured data may be processed on the data processing system's own database, as described with reference to FIGS. 1 to 5. Alternatively, users can keep their own database and use the extended SQL and extended SQL engine of the data processing system through an API.

Processing structured and unstructured data on the data processing system's own database is referred to as internal data processing. Processing structured and unstructured data from an external database, rather than the system's own database, is referred to as external data processing.

Internal data processing can follow the processes disclosed in FIGS. 1 to 5 above.

To use the data processing system from outside for external data processing, the external data must first be stored in and converted by the data processing system of the present invention using the provided API or data migration method. Once stored and converted, the data can be processed by the data processing system through the API. In other words, both the system's own engine and the PostgreSQL engine process data by accessing the database of the present invention, not the external database.

For external data processing, users can train models on unstructured data held in their own database through the API, using the extended SQL and the extended SQL engine.

For example, a user may be a security company that operates a user database of CCTV footage. Using the extended SQL of the data processing system, the user can run AI training on the CCTV footage stored in that database. The structured and unstructured data are inserted from the external database into the database of the data processing system using queries for structured data and the unstructured-data queries defined in the present invention. AI modeling on the inserted structured and unstructured data can then be performed by the AI engine of the data processing system.

That is, a method of processing structured and unstructured data held in a plurality of different databases may include: the data processing system receiving external data from an external database; the data processing system converting the external data; and the data processing system processing the converted external data.

Here, the external data includes structured and unstructured data, and the data processing system processes them using a nested query that mixes a first query on the unstructured data with a second query on the structured data.

The data processing system may process unstructured data using unstructured data processing queries and structured data using structured data processing queries.

A nested query mixes a first query on unstructured data with a second query on structured data, whereas an unstructured data processing query processes only unstructured data and a structured data processing query processes only structured data.
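The three-step method (receive, convert, process) can be sketched as a small pipeline. Everything here is an assumption for illustration: external records are dicts, any `bytes` column is treated as unstructured data (for example a CCTV frame), `toy_embedding` stands in for a real embedding model, and the internal database is a dict of lists.

```python
from typing import Dict, List, Tuple, Union

Value = Union[int, float, str, bytes]
Record = Dict[str, Value]


def receive(external_db: List[Record]) -> List[Record]:
    """Step 1: read records from the external database (copied here)."""
    return [dict(record) for record in external_db]


def toy_embedding(blob: bytes) -> List[float]:
    """Placeholder for an embedding model: [length, mean byte value]."""
    return [float(len(blob)), sum(blob) / len(blob) if blob else 0.0]


def convert(records: List[Record]) -> Tuple[List[Record], List[Dict]]:
    """Step 2: keep scalar columns as structured rows; embed bytes columns."""
    structured: List[Record] = []
    unstructured: List[Dict] = []
    for index, record in enumerate(records):
        structured.append({k: v for k, v in record.items() if not isinstance(v, bytes)})
        for column, value in record.items():
            if isinstance(value, bytes):
                unstructured.append({"record": index, "column": column,
                                     "embedding": toy_embedding(value)})
    return structured, unstructured


def store_for_processing(internal_db: Dict[str, List], structured: List[Record],
                         unstructured: List[Dict]) -> None:
    """Step 3: write converted data into the system's own database, where both
    the PostgreSQL engine and the extended engine can query it."""
    internal_db.setdefault("structured", []).extend(structured)
    internal_db.setdefault("embeddings", []).extend(unstructured)


cctv_db = [{"camera": "cam-01", "ts": 1700000000, "frame": b"\x00\x02"}]
internal: Dict[str, List] = {}
store_for_processing(internal, *convert(receive(cctv_db)))
print(internal)
```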

Figure 7 is a conceptual diagram showing a method of analyzing queries and processing data in a data processing system according to an embodiment of the present invention.

Figure 7 discloses a method that analyzes an input query and processes unstructured and structured data in one data processing system while using different processing resources for different queries.

Referring to FIG. 7, a query for processing unstructured and/or structured data may be provided as the input query. The input query may be a general query on unstructured or structured data, or a nested query for mixed processing of unstructured and structured data.

The input query may be parsed by the parser unit 700. A nested query is divided into general queries and extended queries, and the parser unit 700 splits the general queries from the extended queries.

The queries contained in the input query may be interpreted and processed by the clause analyzer 710 and the query tree unit 720.

For convenience of explanation, assume that the input query is a nested query comprising a first query 715, a second query 725, and a third query 735. The clause analyzer 710 determines which computing resources should process each of these queries; more specifically, by analyzing the queries in the nested query, it determines whether each should be executed on the CPU 770 or the GPU 780. Using the resource distribution algorithm according to an embodiment of the present invention, the clause analyzer 710 estimates the expected execution efficiency and expected execution speed of each query and assigns each query to the CPU 770 or the GPU 780 accordingly.

Assume that the first query 715 and the third query 735 are extended queries and the second query 725 is a general query. The first query 715 and the third query 735 can then be processed by the extended SQL engine 760, and the second query 725, as a general query, by the PostgreSQL engine 750, the SQL engine for general queries.

In this case, the resource distribution algorithm may assign the first query 715 to the GPU 780, the third query 735 to the CPU 770, and the second query 725 to the GPU 780. The resource distribution algorithm thus speeds up nested query processing and makes more efficient use of computing resources.

The resource distribution algorithm applies to both the PostgreSQL engine 750 for general queries and the extended SQL engine 760 for extended queries, and because modeling also runs on the GPU, modeling results can be obtained quickly.

Figure 8 is a conceptual diagram showing the resource distribution algorithm according to an embodiment of the present invention.

Figure 8 discloses a method of choosing the CPU or the GPU for processing a query based on the resource distribution algorithm.

Referring to FIG. 8, a method is disclosed for deciding whether the CPU 860 or the GPU 850 processes an extended query 800 or a general query 820.

(1) Extended query (800)

1) Among the extended queries 800, a first-type extended query 803, which must be processed with a model that requires the GPU 850 on the extended engine, is processed on the GPU 850.

2) Among the extended queries 800, a second-type extended query 806, which the extended engine can process without the GPU 850, is processed on the CPU 860 or the GPU 850 depending on the CPU execution ability and the GPU execution ability of the computing resources.

The CPU execution ability can be determined from the query processing speed, resource usage, and query cost when the query is processed on the CPU, and the GPU execution ability from the same quantities when the query is processed on the GPU. The query cost for the CPU execution ability is determined from the number of rows processed and the CPU operation cost; the query cost for the GPU execution ability is determined from the number of rows processed and the GPU operation cost. The faster the query processing and the lower the resource usage and query cost, the higher the execution ability is judged to be.
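One way to read the CPU/GPU decision above is as a rule plus a comparison. The sketch assumes the scheduler already has per-unit estimates of processing time, resource usage, and query cost; the document does not say how the three metrics are combined, so the scale-free weighted sum (and the equal default weights) below are hypothetical choices.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Estimate:
    seconds: float       # expected processing time on this unit
    resource_use: float  # expected share of the unit's capacity (0-1)
    query_cost: float    # rows processed x per-row operation cost


def choose_unit(requires_gpu: bool, cpu: Estimate, gpu: Estimate,
                weights: Tuple[float, float, float] = (1.0, 1.0, 1.0)) -> str:
    """Pick 'CPU' or 'GPU' for one query.

    First-type extended queries (the model needs a GPU) always go to the GPU.
    Otherwise each metric is divided by the larger of the two units' values,
    weighted, and summed; the lower sum means higher execution ability.
    """
    if requires_gpu:
        return "GPU"
    pairs = [(cpu.seconds, gpu.seconds),
             (cpu.resource_use, gpu.resource_use),
             (cpu.query_cost, gpu.query_cost)]
    cpu_penalty = gpu_penalty = 0.0
    for weight, (c, g) in zip(weights, pairs):
        scale = max(c, g) or 1.0  # avoid dividing by zero when both are 0
        cpu_penalty += weight * c / scale
        gpu_penalty += weight * g / scale
    return "CPU" if cpu_penalty <= gpu_penalty else "GPU"


print(choose_unit(False, Estimate(2.0, 0.5, 200.0), Estimate(0.5, 0.5, 50.0)))
```

Normalizing each metric by the larger of the two values keeps seconds, utilization fractions, and planner cost units from dominating one another purely because of their scale.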

(2) General query

A general query can be processed on the CPU or the GPU depending on the CPU execution ability and the GPU execution ability of the computing resources.

More specifically, the CPU and GPU execution abilities for extended and general queries can be determined from the total cost; the resource with the smaller total cost is the more advantageous one.

The total cost may be the sum of the start-up cost and the run cost.

The start-up cost may be the cost incurred before the first tuple is fetched. A tuple is the set of attribute values that forms one row of a table in the database. For example, the start-up cost of an index scan node is the cost of reading index pages to access the first tuple of the target table.

The run cost may be the cost of accessing all tuples.

More specifically, the CPU run cost and the GPU run cost can be determined by the equations below.

<Equation>

$$\text{GPU run cost} = (\text{gpu\_tuple\_cost} + \text{gpu\_operator\_cost}) \times N_{\text{tuple}} + \text{seq\_page\_cost} \times N_{\text{page}}$$

$$\text{CPU run cost} = (\text{cpu\_tuple\_cost} + \text{cpu\_operator\_cost}) \times N_{\text{tuple}} + \text{seq\_page\_cost} \times N_{\text{page}}$$

gpu_tuple_cost is the cost for the GPU to process each table row during computation.

gpu_operator_cost is the cost for the GPU to process a table tuple with an operator or function.

cpu_tuple_cost is the cost for the CPU to process each table row during computation.

cpu_operator_cost is the cost for the CPU to process a table tuple with an operator or function.

N_tuple is the number of table tuples.

seq_page_cost is the cost of fetching a page.

N_page는 인덱스 페이지의 개수이다.N _page is the number of index pages.

Figure 9 is a conceptual diagram showing a data processing system that processes an input multi-query according to an embodiment of the present invention.

Figure 9 discloses a method by which, on receiving an input multi-query, a multi-query scheduler schedules the processing of the queries it contains.

Referring to FIG. 9, the input multi-query 900 may be a query containing a plurality of queries. For example, the input multi-query 900 may include a first query, a second query, a third query, and a fourth query, where the first query may be SELECT, the second and third queries BUILD, and the fourth query PRINT.

The multi-query scheduler 920 may include a multi-query analyzer 940 and a query queue 960.

The multi-query analyzer 940 may analyze the queries in the input multi-query 900 and determine their processing order.

The query queue 960 may schedule the queries on the queue according to the processing order determined by the multi-query analyzer 940.

Queries scheduled on the queue are passed to the parser unit. Then, as described above, each query is parsed by the parser unit, split into general and extended queries, and interpreted and processed by the clause analyzer and the query tree unit.

The clause analyzer determines the engine (PostgreSQL engine or extended engine) and the computing resource (CPU or GPU) for each query, and passes the query to that engine for processing.

Figure 10 is a conceptual diagram showing the operation of a multi-query scheduler according to an embodiment of the present invention.

Figure 10 discloses a method by which the multi-query scheduler schedules a plurality of queries.

Referring to FIG. 10, the multi-query scheduler 1000 analyzes a dependency score 1020 and computing resource allocation data 1040 for an input query arriving as a multi-query, and determines the processing order of the queries in the input multi-query from the dependency score 1020 and the computing resource allocation data 1040.

The dependency score 1020 may be determined from the interrelationships among the queries in the input multi-query. For example, if the third query depends on the results of processing the first and second queries, the third query has dependencies on the first and second queries. If the fourth query is processed independently, it has no dependency on any other query.

The dependency score 1020 is determined for queries that can run only after other queries have been processed, and it is higher the more a query depends on other queries.
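A simple way to turn "depends on more queries" into a number is to count each query's direct and indirect upstream dependencies. This is one possible definition, not necessarily the patent's: scores are transitive dependency counts, and queries are scheduled in ascending score order. For an acyclic dependency graph this order is always valid, because a query's upstream set strictly contains the upstream set of everything it depends on, so it always scores higher than its dependencies.

```python
from typing import Dict, List, Set


def dependency_scores(deps: Dict[str, Set[str]]) -> Dict[str, int]:
    """Score each query by how many queries it depends on, directly or indirectly."""
    memo: Dict[str, Set[str]] = {}

    def upstream(query: str, visiting: Set[str]) -> Set[str]:
        if query in memo:
            return memo[query]
        if query in visiting:
            raise ValueError(f"cyclic dependency involving {query}")
        visiting.add(query)
        found: Set[str] = set()
        for dep in deps.get(query, set()):
            found.add(dep)
            found |= upstream(dep, visiting)
        visiting.discard(query)
        memo[query] = found
        return found

    return {query: len(upstream(query, set())) for query in deps}


def schedule(queries: List[str], deps: Dict[str, Set[str]]) -> List[str]:
    """Order queries by ascending score; sorted() is stable, so ties keep input order."""
    scores = dependency_scores(deps)
    return sorted(queries, key=lambda q: scores.get(q, 0))


# FIG. 10 example: the third query needs the results of the first and second;
# the fourth query is independent.
deps = {"Q1": set(), "Q2": set(), "Q3": {"Q1", "Q2"}, "Q4": set()}
print(dependency_scores(deps), schedule(["Q1", "Q2", "Q3", "Q4"], deps))
```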

The computing resource allocation data 1040 may indicate which processing unit, CPU or GPU, will process the query and which computing resources are allocated when the query is processed on that unit.

For example, the multi-query scheduler 1000 may decide whether a query is processed on the CPU or the GPU using the resource distribution algorithm described above. The multi-query scheduler 1000 may also obtain the data on the computing resources allocated to a query processed on the CPU or GPU, using the method described above for determining CPU and GPU execution ability.

The CPU execution ability may be determined from three factors measured when the query is processed on the CPU: query processing speed, resource usage, and query cost. The GPU execution ability may be determined from the same three factors when the query is processed on the GPU. The query cost used for the CPU execution ability is determined from the data processing volume (rows processed) and the CPU operation cost. The query cost used for the GPU execution ability is determined from the data processing volume (rows processed) and the GPU operation cost.
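
The cost comparison can be sketched as follows. The sketch assumes each processing unit is characterized by a fixed start-up overhead plus a per-row operation cost. The constants are hypothetical placeholders. The sketch also ignores the processing-speed and resource-usage factors that the text mentions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UnitProfile:
    name: str
    startup_cost: float  # hypothetical fixed overhead (e.g. data transfer to the device)
    cost_per_row: float  # hypothetical operation cost per processed row


# Placeholder numbers: the GPU pays a large start-up cost but is cheap per row.
CPU = UnitProfile("CPU", startup_cost=0.0, cost_per_row=1.0)
GPU = UnitProfile("GPU", startup_cost=5_000.0, cost_per_row=0.05)


def query_cost(rows_processed: int, unit: UnitProfile) -> float:
    """Query cost from rows processed and the unit's operation cost."""
    return unit.startup_cost + rows_processed * unit.cost_per_row


def pick_unit(rows_processed: int) -> str:
    """Return the unit with the lower estimated cost (CPU wins ties)."""
    return min((CPU, GPU), key=lambda unit: query_cost(rows_processed, unit)).name


print(pick_unit(100))        # CPU
print(pick_unit(1_000_000))  # GPU
```

With these placeholder numbers, small queries stay on the CPU and large scans move to the GPU. The crossover is where the GPU's per-row savings pay back its start-up cost.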

In addition, the dependency score 1020 may be set so that queries are processed with different priorities depending on the query type.

Special clauses that create tables or models, such as CREATE, COPY, and BUILD, can be recognized in the queries. Scores can then be assigned so that queries of these types are executed first. Queries that create tables or models may be defined as priority processing queries, and separate priorities may be assigned among them based on their scores.

For example, suppose the input multi-query contains the queries CREATE, SELECT_1, COPY, SELECT_2, PREDICT, and BUILD. COPY, CREATE, and BUILD may then be processed first as priority processing queries.

Based on the dependency score 1020, the processing priority may then be determined as COPY > CREATE > BUILD > PREDICT > SELECT_1 > SELECT_2. The query queue may be reorganized to execute the queries in that order.
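
The clause-based reordering in this example can be reproduced with a stable sort. The sketch assumes the clause keyword is the part of each query label before the underscore. The rank table is hard-coded from the example order above, since the text does not say how the scores themselves are computed.

```python
# Rank 0 runs first; COPY, CREATE and BUILD are the priority processing queries.
CLAUSE_RANK = {"COPY": 0, "CREATE": 1, "BUILD": 2, "PREDICT": 3, "SELECT": 4}


def clause_of(query_label: str) -> str:
    """Extract the leading clause keyword, e.g. 'SELECT_1' -> 'SELECT'."""
    return query_label.split("_", 1)[0].upper()


def reorder(queries: list[str]) -> list[str]:
    """Sort by clause rank; sorted() is stable, so equal ranks keep input order."""
    unknown_rank = len(CLAUSE_RANK)  # unrecognized clauses go last
    return sorted(queries, key=lambda q: CLAUSE_RANK.get(clause_of(q), unknown_rank))


queue = reorder(["CREATE", "SELECT_1", "COPY", "SELECT_2", "PREDICT", "BUILD"])
print(queue)  # ['COPY', 'CREATE', 'BUILD', 'PREDICT', 'SELECT_1', 'SELECT_2']
```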

More specifically, the multi-query analysis unit may first check the dependencies among the queries in the input multi-query. It then determines the execution order of each query.

Second, the GPU execution ability and CPU execution ability of each query may be determined in consideration of the total cost described above. From these, each query in the query group is assigned to the CPU or the GPU. The execution order may be set so that queries with a lower total cost are executed earlier.

More specifically, the clause analyzer of the multi-query analysis unit may analyze the correlations among the queries in the input multi-query. Based on these correlations (dependencies), it may decide to execute a specific query ahead of the others.

Next, the total cost of each query may be determined as the sum of its start-up cost and run cost. The major resource of each query (CPU or GPU) may then be determined from this total cost information. Query groups are queued so that queries with a lower total cost are processed with higher priority. Either the CPU or the GPU is used as the major resource, according to execution ability.

Finally, the position of each query within the input multi-query may also be considered. Queries located earlier in the input multi-query are queued with higher priority.

For example, assume that queries 1 to 4 exist.

Query 1 has the lowest total cost (rank 1) and uses the GPU as its major resource. It is also the query written first in the input multi-query.

Query 2 has the third-lowest total cost and uses the GPU as its major resource.

Query 3 has the second-lowest total cost and uses the CPU as its major resource.

Query 4 has the highest total cost (rank 4) and uses the CPU as its major resource. It is also the query written last in the input multi-query. In this case, the query groups may be queued for processing in the order query 1, query 3, query 2, query 4.
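
The four-query example can be checked with a short sketch. It sorts query groups by total cost (start-up cost plus run cost) and breaks ties by position in the input. The individual cost figures and the positions of queries 2 and 3 are invented for illustration. The major resource is carried along but, following the text, it only decides which unit executes a query, not the query's place in the queue.

```python
from dataclasses import dataclass


@dataclass
class QueryGroup:
    name: str
    startup_cost: float
    run_cost: float
    major_resource: str  # "CPU" or "GPU"
    position: int        # index of the query in the input multi-query

    @property
    def total_cost(self) -> float:
        return self.startup_cost + self.run_cost


def queue_order(groups: list[QueryGroup]) -> list[str]:
    """Lower total cost first; earlier position wins ties."""
    return [g.name for g in sorted(groups, key=lambda g: (g.total_cost, g.position))]


groups = [
    QueryGroup("query 1", 0.5, 0.5, "GPU", position=0),  # cheapest, written first
    QueryGroup("query 2", 1.0, 2.0, "GPU", position=1),  # third cheapest
    QueryGroup("query 3", 0.5, 1.5, "CPU", position=2),  # second cheapest
    QueryGroup("query 4", 1.0, 3.0, "CPU", position=3),  # most expensive, written last
]
print(queue_order(groups))  # ['query 1', 'query 3', 'query 2', 'query 4']
```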

FIG. 11 is a conceptual diagram illustrating a method of processing an input multi-query according to an embodiment of the present invention.

FIG. 11 discloses an algorithm for determining the execution order of the query groups included in an input multi-query.

Referring to FIG. 11, the clause analyzer of the multi-query analysis unit first determines the major resource and total cost of each query group. The execution order of the query groups may then be determined in consideration of dependency, total cost, and query position.

For example:
- Query 1 (1110) has a total cost of 3 and contains the statement PREDICT USING(my_model).
- Query 2 (1120) has a total cost of 2 and contains CREATE TABLE(my_model).
- Query 3 (1130) has a total cost of 5 and contains BUILD MODEL(my_model).

In this case, my_model, which is used in query 1 (1110) and query 2 (1120), is created by query 3 (1130) and becomes available only after it runs. Query 1 (1110) and query 2 (1120) therefore depend on query 3 (1130). As a result, query 3 (1130) may be executed first even though its total cost is the highest and it appears last.

After query 3 (1130), total cost decides the order. Query 2 (1120), with a total cost of 2, may be executed next, and finally query 1 (1110), with a total cost of 3.

If query 1 (1110) and query 2 (1120) had the same total cost, their positions in the input multi-query would decide the order. After query 3 (1130), they would be executed as query 1 (1110), then query 2 (1120).
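
Combining the three criteria amounts to a topological sort. Among the queries whose prerequisites have already run, the one with the lowest total cost (then the earliest position) runs next. The sketch below is one way to implement that rule with Python's `heapq`. It is an illustration under that reading of the text, not the claimed clause analyzer itself.

```python
import heapq


def execution_order(queries: dict[str, tuple[float, int]],
                    depends_on: dict[str, set[str]]) -> list[str]:
    """queries maps name -> (total_cost, position); depends_on maps name -> prerequisites.

    A query becomes ready once all its prerequisites have run; among ready
    queries the lowest (total_cost, position) is executed first.
    """
    waiting = {q: {p for p in depends_on.get(q, set()) if p in queries} for q in queries}
    dependents: dict[str, list[str]] = {q: [] for q in queries}
    for q, prereqs in waiting.items():
        for p in prereqs:
            dependents[p].append(q)

    ready = [(*queries[q], q) for q, prereqs in waiting.items() if not prereqs]
    heapq.heapify(ready)
    order = []
    while ready:
        *_, q = heapq.heappop(ready)
        order.append(q)
        for d in dependents[q]:
            waiting[d].discard(q)
            if not waiting[d]:
                heapq.heappush(ready, (*queries[d], d))

    if len(order) != len(queries):
        raise ValueError("cyclic dependency between queries")
    return order


# FIG. 11 example: queries 1 and 2 both need my_model from query 3.
deps = {"query 1": {"query 3"}, "query 2": {"query 3"}}
print(execution_order({"query 1": (3, 0), "query 2": (2, 1), "query 3": (5, 2)}, deps))
# ['query 3', 'query 2', 'query 1']
print(execution_order({"query 1": (2, 0), "query 2": (2, 1), "query 3": (5, 2)}, deps))
# ['query 3', 'query 1', 'query 2']
```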

FIG. 12 is a conceptual diagram illustrating a server that provides processing services for structured and unstructured data according to an embodiment of the present invention.

FIG. 12 discloses a server that provides processing services for structured and unstructured data to user devices based on different architectures.

Referring to FIG. 12, the server 1200 may include an instance 1210 and resources:
- The instance may include workspace hubs 1270 and 1280 and a data processing engine 1220.
- The resources may include processing resources 1240, storage 1230, and memory (not shown).

Each of the workspace hubs 1270 and 1280 may include at least one workspace 1250.

A workspace 1250 can connect directly only to the storage 1230 that corresponds to it. The data processing engine 1220, by contrast, can connect to all storage 1230. The instance may be connected directly to the resources.

The instance and resources that make up the server 1200 are described in more detail below.

The workspace hubs may include a shared workspace hub 1270 implemented on a shared architecture and a dedicated workspace hub 1280 implemented on a dedicated architecture.

Both the shared workspace hub 1270 and the dedicated workspace hub 1280 may include workspaces 1250. A workspace 1250 is a space in which a user implements services based on structured and unstructured data. For example, one workspace 1250 may be assigned per user identifier.

A workspace 1250 may include a lab and a database:
- The lab is a space for composing queries for the services the user implements. In other words, the lab may serve as the user interface for accessing the workspace 1250.
- The database is a logical store of organized data.

The shared workspace hub 1270 may include multiple workspaces 1250 belonging to multiple users. These workspaces share the data processing engine 1220 and the processing resources 1240 allocated to the hub. The dedicated workspace hub 1280 may include a workspace 1250 for a single user. It is allocated a data processing engine 1220 and processing resources 1240 for that user's exclusive use.

For convenience of explanation, only one shared workspace hub 1270 and one dedicated workspace hub 1280 are shown. However, there may be several of each, and such architectures also fall within the scope of the present invention.

More specifically, the shared workspace hub 1270 may use its allocated data processing engine 1220 and allocated processing resources 1240. The shared workspace hub 1270 may include a plurality of workspaces 1250, which share the allocated data processing engine 1220 and processing resources 1240.

The dedicated workspace hub 1280 includes only one workspace 1250. That workspace may exclusively use the allocated data processing engine 1220 and processing resources 1240.

Accordingly, a user may create a workspace 1250 in either the shared workspace hub 1270 or the dedicated workspace hub 1280. The choice can depend on the amount of structured and unstructured data to be processed and the services to be implemented on that data.

The data processing engine 1220 is the data processing system described with reference to FIGS. 2 to 11. It may process queries on unstructured and structured data based on PostgreSQL and extended SQL. That is, the data processing engine 1220 may perform the operations of the data processing system described in FIGS. 2 to 11.

The data processing engine 1220 may create a model for a user in response to a request made through the user's workspace 1250. The created model may be stored in the storage 1230.

Using an application programming interface (API) and the user's unique identification information, the user may send a query from outside the server through the data processing engine 1220. The query requests an output value from a model stored in the user's storage 1230. The data processing engine 1220 may identify the user's workspace 1250 from the unique identification information. It then returns the output value produced by the model stored in the storage 1230 to the user.

The data processing engine 1220 can access all storage 1230. At the user's request, it may access the storage 1230 based on the user's unique identification information and use the data stored there.

The storage 1230 may be physical storage for files, models, and the like. Each workspace 1250 is connected to one storage 1230. That storage may be accessed only after the user has been authenticated with the unique identification information.
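
The access pattern described above can be sketched as a toy engine. The engine can reach every workspace's storage, but it always resolves the target workspace from the authenticated user, so a request never touches another user's storage. The shared-secret credential check, the class name, and the method names are hypothetical stand-ins for whatever unique identification scheme the system actually uses.

```python
import hmac
from typing import Callable


class DataProcessingEngine:
    """Toy engine that runs stored models on behalf of authenticated users."""

    def __init__(self) -> None:
        self._secrets: dict[str, str] = {}                  # user_id -> credential (hypothetical)
        self._workspace_of: dict[str, str] = {}             # user_id -> workspace
        self._storage: dict[str, dict[str, Callable]] = {}  # workspace -> stored models

    def register(self, user_id: str, secret: str, workspace: str) -> None:
        self._secrets[user_id] = secret
        self._workspace_of[user_id] = workspace
        self._storage.setdefault(workspace, {})

    def store_model(self, workspace: str, name: str, model: Callable) -> None:
        self._storage[workspace][name] = model

    def predict(self, user_id: str, secret: str, model_name: str, x):
        """API-style request: authenticate, resolve the user's workspace, run its model."""
        expected = self._secrets.get(user_id)
        if expected is None or not hmac.compare_digest(expected, secret):
            raise PermissionError("authentication failed")
        workspace = self._workspace_of[user_id]
        model = self._storage[workspace].get(model_name)
        if model is None:
            raise KeyError(f"{model_name!r} not found in {workspace!r}")
        return model(x)


engine = DataProcessingEngine()
engine.register("alice", "s3cret", "ws_alice")
engine.register("bob", "hunter2", "ws_bob")
engine.store_model("ws_alice", "double", lambda v: 2 * v)
print(engine.predict("alice", "s3cret", "double", 21))  # 42
```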

The processing resources 1240 may include GPUs, CPUs, and memory. As described above, the CPU or GPU may be selected by the resource distribution algorithm when a query is processed. The CPU or GPU may also be selected by the multi-query scheduler and used to process multi-queries.

A workspace 1250 in the dedicated workspace hub 1280 can use its allocated resources exclusively. The workspaces 1250 in the shared workspace hub 1270, by contrast, share their allocated resources. The queries requested by each of these shared workspaces 1250 may therefore be processed sequentially.

FIG. 13 is a conceptual diagram illustrating a query processing method of a shared workspace hub according to an embodiment of the present invention.

FIG. 13 discloses a method for processing the queries from each of the workspaces included in a shared workspace hub.

Referring to FIG. 13, queries from an individual workspace in the shared workspace hub may be processed based on the resource distribution algorithm described above.

However, queries generated by multiple workspaces may be processed sequentially through a queue 1300 (or stack). That is, queries within an individual workspace are processed with the resource distribution algorithm. Queries from different workspaces are processed one after another, in order of query creation time.
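
The sequential handling across workspaces can be sketched as a single queue ordered by query creation time. A counter keeps submissions with identical timestamps in arrival order. The class and method names are hypothetical, and the per-workspace resource distribution step is left out.

```python
import heapq
import itertools
from typing import Iterator


class SharedHubQueue:
    """Queries from all workspaces of one shared hub, served by creation time."""

    def __init__(self) -> None:
        self._heap: list[tuple[float, int, str, str]] = []
        self._arrival = itertools.count()  # tie-breaker for equal timestamps

    def submit(self, created_at: float, workspace: str, query: str) -> None:
        heapq.heappush(self._heap, (created_at, next(self._arrival), workspace, query))

    def drain(self) -> Iterator[tuple[str, str]]:
        """Yield (workspace, query) pairs one at a time, oldest first."""
        while self._heap:
            _, _, workspace, query = heapq.heappop(self._heap)
            yield workspace, query


hub = SharedHubQueue()
hub.submit(3.0, "ws_b", "q_b1")
hub.submit(1.0, "ws_a", "q_a1")
hub.submit(2.0, "ws_a", "q_a2")
print([query for _, query in hub.drain()])  # ['q_a1', 'q_a2', 'q_b1']
```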

FIG. 14 is a conceptual diagram illustrating a resource distribution method according to an embodiment of the present invention.

FIG. 14 discloses a method for allocating processing resources in consideration of workspace characteristics.

Referring to FIG. 14, processing resources (CPU, GPU, memory) may be allocated in consideration of the characteristics of each workspace hub (dedicated or shared).

The workspace hub characteristics used for allocating processing resources may include:
- the volume of queries on structured data 1400;
- the volume of queries on unstructured data 1410;
- the data throughput 1420 required to process the queries.

The more queries on unstructured data (hereinafter, unstructured queries) a workspace issues, the more GPU it uses.

The more queries on structured data (hereinafter, structured queries) a workspace issues, the more CPU it uses. The same holds the more query requests it makes overall.

The more data a workspace must process to answer its queries, the more memory may be allocated to it.

According to an embodiment of the present invention, resources may be allocated adaptively to each workspace hub. The allocation considers the hub's volume of structured queries, its volume of unstructured queries, and the data throughput required to process its queries.

According to an embodiment of the present invention, a workspace included in a dedicated workspace hub (hereinafter, workspace (dedicated)) uses processing resources within a set limit without sharing them.

The CPU, GPU, and memory limits of a workspace (dedicated) may be adjusted adaptively based on its processing resource usage statistics. These are statistics on the structured queries, unstructured queries, and data throughput the workspace has generated. For example:
- Workspace 1 (dedicated) may need relatively more CPU. In this case, its unused GPU quota within the limit may be converted into CPU resources.
- Workspace 2 (dedicated) may need relatively more GPU. In this case, its unused CPU quota within the limit may be converted into GPU resources.
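
The quota conversion for a workspace (dedicated) can be sketched as follows. The sketch rests on two illustrative assumptions that the text does not specify: CPU and GPU limits are expressed in a common, interchangeable credit unit, and demand has already been estimated from the usage statistics. The text also gives no conversion rate.

```python
def rebalance(quota: dict[str, float], demand: dict[str, float]) -> dict[str, float]:
    """Shift unused credits between 'cpu' and 'gpu' without changing the total limit."""
    new = dict(quota)
    for short, spare in (("cpu", "gpu"), ("gpu", "cpu")):
        shortfall = max(0.0, demand[short] - quota[short])
        unused = max(0.0, quota[spare] - demand[spare])
        moved = min(shortfall, unused)  # never exceed what the other unit leaves idle
        if moved:
            new[short] += moved
            new[spare] -= moved
    return new


limit = {"cpu": 10.0, "gpu": 10.0}
print(rebalance(limit, {"cpu": 14.0, "gpu": 4.0}))  # workspace 1: {'cpu': 14.0, 'gpu': 6.0}
print(rebalance(limit, {"cpu": 3.0, "gpu": 15.0}))  # workspace 2: {'cpu': 5.0, 'gpu': 15.0}
```

Because only one unit can be short while the other has idle quota, at most one direction of the loop actually moves credits.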

A workspace included in a shared workspace hub (hereinafter, workspace (shared)) shares its processing resources with the other workspaces in the hub.

Accordingly, for sharing, processing resources may be allocated based on the processing resource usage statistics of the workspaces (shared) in the shared workspace hub. Processing resources may be allocated to a shared workspace hub by either a fixed method or a variable method.

In the fixed method, the processing resources allocated to a shared workspace hub are set according to the number of workspaces (shared) it contains.

When the fixed method is used, the workspaces assigned to a shared workspace hub may be chosen by considering the processing resource usage statistics of each workspace. For example, workspaces that use relatively more CPU than GPU may be grouped with workspaces that use relatively more GPU than CPU. The hub's processing resources are then used in a balanced way overall.
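
One possible grouping rule for the fixed method is sketched below, assuming hubs of two workspaces each (the text does not fix a hub size). Workspaces are ranked by the CPU share of their usage. The most CPU-heavy workspace is then paired with the most GPU-heavy one, the next with the next, and so on. The function name and the pairing rule are illustrative.

```python
def group_workspaces(usage: dict[str, tuple[float, float]]) -> list[tuple[str, ...]]:
    """usage maps workspace -> (cpu_usage, gpu_usage) from its usage statistics."""

    def cpu_share(ws: str) -> float:
        cpu, gpu = usage[ws]
        return cpu / ((cpu + gpu) or 1.0)  # guard against idle workspaces

    ranked = sorted(usage, key=cpu_share)  # most GPU-heavy first
    hubs = []
    lo, hi = 0, len(ranked) - 1
    while lo < hi:
        hubs.append((ranked[hi], ranked[lo]))  # CPU-heavy paired with GPU-heavy
        lo, hi = lo + 1, hi - 1
    if lo == hi:  # odd count: the remaining workspace gets its own hub
        hubs.append((ranked[lo],))
    return hubs


print(group_workspaces({"a": (9, 1), "b": (1, 9), "c": (6, 4), "d": (3, 7)}))
# [('a', 'b'), ('c', 'd')]
```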

In the variable method, the processing resources allocated to a shared workspace hub are determined in consideration of the resources used across several shared workspace hubs. First, the processing resources used by these hubs are determined. Each hub may then be allocated different amounts of CPU, GPU, and memory according to its processing resource usage statistics.
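
A minimal sketch of the variable method splits each resource pool across shared hubs in proportion to the statistic that, per the text, drives that resource:
- structured queries drive CPU;
- unstructured queries drive GPU;
- data throughput drives memory.

Strict proportionality, the statistic names, and the pool sizes are assumptions made for illustration.

```python
def allocate_variable(pool: dict[str, float],
                      hub_stats: dict[str, dict[str, float]]) -> dict[str, dict[str, float]]:
    """Split each pooled resource across hubs in proportion to its driving statistic."""
    driver = {"cpu": "structured_queries", "gpu": "unstructured_queries", "memory_gb": "data_gb"}
    alloc: dict[str, dict[str, float]] = {hub: {} for hub in hub_stats}
    for resource, stat in driver.items():
        total = sum(stats[stat] for stats in hub_stats.values())
        for hub, stats in hub_stats.items():
            # Fall back to an even split if no hub reported any demand.
            share = stats[stat] / total if total else 1 / len(hub_stats)
            alloc[hub][resource] = pool[resource] * share
    return alloc


pool = {"cpu": 32.0, "gpu": 8.0, "memory_gb": 256.0}
stats = {
    "hub_1": {"structured_queries": 300, "unstructured_queries": 100, "data_gb": 50},
    "hub_2": {"structured_queries": 100, "unstructured_queries": 300, "data_gb": 150},
}
print(allocate_variable(pool, stats))
# {'hub_1': {'cpu': 24.0, 'gpu': 2.0, 'memory_gb': 64.0},
#  'hub_2': {'cpu': 8.0, 'gpu': 6.0, 'memory_gb': 192.0}}
```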

The choice between the two methods depends on overall processing resource usage. The fixed method may be used when a stable supply of processing resources is required, and the variable method when a more efficient supply is required. For example, processing resources may be provided by the fixed method during business hours and by the variable method during non-business hours. This way, otherwise idle processing resources are used as efficiently as possible.

At a larger scale, workspace hubs and workspaces may be distributed across multiple servers based on per-server processing resource usage statistics.

The following methods, disclosed below, may be performed in a data processing system:
- the unstructured data embedding method;
- the unstructured data binarization method;
- the server/workspace backup method;
- the server/workspace migration method;
- the caching method.

This data processing system may be understood to include the server described above and an external device (external processing server).

FIG. 15 is a conceptual diagram illustrating an unstructured embedding method for unstructured data according to an embodiment of the present invention.

FIG. 15 discloses an unstructured embedding method that performs embedding on unstructured data.

Referring to FIG. 15, unstructured data (images, video, audio, etc.) is high-dimensional and large when stored as raw files (e.g., pixel by pixel). The data processing system of the present invention can therefore convert unstructured data into embeddings for processing. An embedding captures the characteristics of unstructured data such as images, text, and video in vector form. Storing these characteristics instead of the file itself reduces data size, which helps the data processing system return results quickly with extended SQL.
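
A rough size comparison makes the reduction concrete. The sketch assumes a 224×224 RGB image stored uncompressed at 1 byte per channel and a 512-dimensional float32 embedding. Both sizes are hypothetical choices, not values from the text. Compressed image formats shrink the raw figure considerably, and an embedding keeps only the extracted characteristics, not a reconstructible copy of the file.

```python
def raw_image_bytes(width: int, height: int, channels: int = 3, bytes_per_channel: int = 1) -> int:
    """Size of an uncompressed image stored pixel by pixel."""
    return width * height * channels * bytes_per_channel


def embedding_bytes(dim: int, bytes_per_value: int = 4) -> int:
    """Size of a dense embedding vector (4 bytes per float32 value)."""
    return dim * bytes_per_value


image = raw_image_bytes(224, 224)  # 150528 bytes
vector = embedding_bytes(512)      # 2048 bytes
print(image / vector)              # 73.5
```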

Embedding values can be extracted from unstructured data by an artificial intelligence model.

Many different artificial intelligence models can extract embedding values from unstructured data. For example, one model may extract the color characteristics of an image, another the actions of a person in the image, and yet another the shapes themselves.

According to an embodiment of the present invention, a CONVERT function can embed unstructured data. Passing a different model name (artificial intelligence model name) to the CONVERT function extracts a different characteristic as the embedding value. As new artificial intelligence models for determining embedding values are developed and commercialized, they can be added to the data processing system.
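
The model-name dispatch behind CONVERT can be sketched in Python as a registry of embedding models keyed by name. Adding a new model then means registering one more entry. This is only an analogue: the actual CONVERT is an extended-SQL function whose syntax is not shown here, and the registry, decorator, and `mean_color` toy model are hypothetical.

```python
from typing import Callable

Embedding = list[float]
_MODELS: dict[str, Callable[[object], Embedding]] = {}


def register_model(name: str):
    """Decorator adding an embedding model to the registry under `name`."""
    def decorator(fn: Callable[[object], Embedding]) -> Callable[[object], Embedding]:
        _MODELS[name] = fn
        return fn
    return decorator


def convert(data: object, model_name: str) -> Embedding:
    """CONVERT-style call: embed `data` with the model chosen by name."""
    model = _MODELS.get(model_name)
    if model is None:
        raise ValueError(f"unknown embedding model: {model_name!r}")
    return model(data)


@register_model("mean_color")
def mean_color(pixels: list[tuple[int, int, int]]) -> Embedding:
    """Toy stand-in for a color model: the average (r, g, b) of the pixels."""
    n = len(pixels)
    return [sum(p[c] for p in pixels) / n for c in range(3)]


print(convert([(255, 0, 0), (0, 0, 255)], "mean_color"))  # [127.5, 0.0, 127.5]
```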

According to an embodiment of the present invention, the embedding values produced by the CONVERT function can be used with a SEARCH function for unstructured search. The SEARCH function combines the embeddings with the artificial intelligence models provided natively by the data processing system, so that searches over unstructured data run quickly.

That is, when the CONVERT and SEARCH clauses are used, the data processing system according to an embodiment of the present invention can accelerate two processes on the GPU: embedding the unstructured data, and performing unstructured search with the embeddings.

Therefore, unstructured and structured data can both be processed on a single system/operating platform with a single database. Query results can also be obtained faster than with existing systems.

FIG. 16 is a conceptual diagram illustrating an unstructured embedding method for unstructured data according to an embodiment of the present invention.

FIG. 16 discloses two processes that take place when the CONVERT and SEARCH clauses are used in the data processing system according to an embodiment of the present invention: embedding unstructured data, and performing unstructured search using the embeddings.

Referring to FIG. 16, unstructured data 1600 can be converted into embedding values by artificial intelligence model 1 (1610) through artificial intelligence model n using the CONVERT function. For example, if the unstructured data 1600 is an image, each embedding value may be extracted from the feature map that each of these models produces for the image.

For example, the unstructured data 1600 may be a specific image:
- Artificial intelligence model 1 (1610) converts information about the number of people in the image into a first embedding value (1615).
- Artificial intelligence model 2 (1620) converts information about the place depicted in the image into a second embedding value (1625).
- Artificial intelligence model 3 (1630) converts information about the colors of the image into a third embedding value (1635).

In this way, the first embedding value (1615) through the n-th embedding value of the image may be stored in the database (1650).

Next, the user may send a query from a user device to the data processing system, using the SEARCH clause to ask how many people are in the image. The data processing system may answer the query based on the first embedding value (1615).

A group containing at least one artificial intelligence model that converts unstructured data into embedding values may be referred to as an embedding artificial intelligence model group. Such a group can convert a single piece of unstructured data into various embedding values depending on the purpose.
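
The flow of FIG. 16 (one item, several models, one stored embedding per model, later lookup by model) can be sketched with Python's built-in `sqlite3`. The sketch relies on several assumptions:
- The "models" are hypothetical stand-ins that read pre-annotated fields, so the example runs without real vision models.
- Vectors are stored as JSON text.
- An in-memory SQLite database takes the place of the system's own database.

```python
import json
import sqlite3
from typing import Callable, Optional

# Hypothetical embedding model group (stand-ins for real models).
MODEL_GROUP: dict[str, Callable[[dict], list[float]]] = {
    "person_count": lambda img: [float(img["people"])],
    "place": lambda img: [1.0 if img["place"] == "beach" else 0.0],
    "color": lambda img: [c / 255 for c in img["mean_rgb"]],
}


def open_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE embeddings ("
        " item_id TEXT, model_name TEXT, vector TEXT,"
        " PRIMARY KEY (item_id, model_name))"
    )
    return conn


def embed_and_store(conn: sqlite3.Connection, item_id: str, item: dict) -> None:
    """Run every model in the group and store one embedding row per model."""
    rows = [(item_id, name, json.dumps(model(item))) for name, model in MODEL_GROUP.items()]
    conn.executemany("INSERT INTO embeddings VALUES (?, ?, ?)", rows)


def load_embedding(conn: sqlite3.Connection, item_id: str, model_name: str) -> Optional[list[float]]:
    row = conn.execute(
        "SELECT vector FROM embeddings WHERE item_id = ? AND model_name = ?",
        (item_id, model_name),
    ).fetchone()
    return json.loads(row[0]) if row else None


conn = open_db()
embed_and_store(conn, "img_001", {"people": 3, "place": "beach", "mean_rgb": (255, 0, 51)})
print(load_embedding(conn, "img_001", "person_count"))  # [3.0]
```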

FIG. 17 is a conceptual diagram showing the results of using the convert syntax and the search syntax according to an embodiment of the present invention.

FIG. 17(a) shows the result of embedding using the convert syntax.

Referring to FIG. 17(a), the result of embedding the text of movie reviews is shown. There are multiple reviews of a movie, and each review may be embedded based on the feature map of an artificial intelligence model and thereby converted into a vector.

FIG. 17(b) shows the result of a search performed using the search syntax.

Referring to FIG. 17(b), the search retrieves, in ranked order, the 10 movie reviews most similar to the review 'This movie was my favorite movie of all time'. The query review is converted into an embedding value, and the 10 reviews whose embedding values are most similar to it may be returned, ranked by similarity score.
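A top-k similarity search of this kind can be sketched as follows. The patent does not name its similarity measure; cosine similarity is assumed here as a common choice, and the function names cosine and search are hypothetical.

```python
import heapq
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    # Cosine similarity; returns 0.0 if either vector has zero length.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(
    query: Sequence[float],
    corpus: List[Tuple[str, Sequence[float]]],
    k: int = 10,
) -> List[Tuple[str, float]]:
    # Score every stored embedding against the query and keep the k best,
    # ordered from most to least similar.
    scored = ((doc_id, cosine(query, vec)) for doc_id, vec in corpus)
    return heapq.nlargest(k, scored, key=lambda item: item[1])


# Toy 2-D "review embeddings"; real ones would come from a model.
reviews = [("r1", [1.0, 0.0]), ("r2", [0.0, 1.0]), ("r3", [1.0, 1.0])]
top = search([1.0, 0.1], reviews, k=2)
```

heapq.nlargest avoids sorting the whole corpus when k is small. A production system would typically use an approximate nearest-neighbor index instead of a linear scan.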

That is, according to an embodiment of the present invention, a method of embedding unstructured data in a data processing system may include a step in which the embedding artificial intelligence model group of the data processing system converts unstructured data into an embedding value, and a step in which the database of the data processing system stores the embedding value.

The embedding artificial intelligence model group includes one or more distinct artificial intelligence models that generate the required embedding values, and each embedding value may be determined based on the feature map of the corresponding model. In addition, the embedding value is generated by a convert statement processed by the extended SQL engine and can be searched by a search statement processed by the same extended SQL engine.

FIG. 18 is a conceptual diagram showing a method for binarizing embedding values according to an embodiment of the present invention.

FIG. 18 discloses a method for further increasing processing speed by additionally binarizing the embedding values.

Referring to FIG. 18, because the convert process in the data processing system consumes considerable time and computing resources, its results need to be stored in the database for reuse.

Because an embedding is a dense array of real numbers (a list of lists of floats), storing it in a database as-is takes considerable time and resources. As an additional option, the embedding value can therefore be stored in byte-array format.

Storing the embedding value as a byte array can reduce the time and resources required to store the data table in the database.

In other words, the embedding value produced by the convert operation can be converted once more into a binary value, i.e., a byte array, and stored in the database. During a search, the binary value (byte array) loaded from the database is converted back into an embedding value, and the search is performed on it.
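The round trip from embedding to byte array and back can be sketched with Python's standard struct module. The packing format (little-endian float32) and the function names are assumptions for illustration; the patent does not specify an encoding.

```python
import struct
from typing import List


def embedding_to_bytes(embedding: List[float]) -> bytes:
    # Pack as little-endian float32: 4 bytes per dimension, no text overhead.
    return struct.pack(f"<{len(embedding)}f", *embedding)


def bytes_to_embedding(blob: bytes) -> List[float]:
    # Inverse of embedding_to_bytes; the dimension is implied by the length.
    if len(blob) % 4:
        raise ValueError("blob length must be a multiple of 4")
    return list(struct.unpack(f"<{len(blob) // 4}f", blob))


vector = [0.5, -1.25, 3.0]
blob = embedding_to_bytes(vector)
restored = bytes_to_embedding(blob)
```

Packing to float32 halves the size compared with Python's 64-bit floats but is lossy for values that float32 cannot represent exactly. For example, 0.1 comes back as roughly 0.10000000149. The values above were chosen to round-trip exactly.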

FIG. 18 shows, for the data processing system according to an embodiment of the present invention, the process of embedding unstructured data, binarizing the embedding value to generate a binary value, storing the binary value, converting the binary value back into an embedding value, and outputting a result based on the restored embedding value.

Referring to FIG. 18, unstructured data may be converted into embedding values through artificial intelligence model 1 to artificial intelligence model n using the convert function. For example, the unstructured data may be an image, and each embedding value may be information on the feature map that each of artificial intelligence models 1 to n produces for the image.

For example, the unstructured data may be a specific image: artificial intelligence model 1 may convert information about the number of people in the image into a first embedding value, artificial intelligence model 2 may convert information about the place shown in the image into a second embedding value, and artificial intelligence model 3 may convert information about the colors of the image into a third embedding value.

The binary module may convert the first embedding value into a first binary value, the second embedding value into a second binary value, and the third embedding value into a third binary value.

The first to n-th binary values converted in this manner may be stored in the database.

Next, through the user device, the user may send the data processing system a query, written with the search syntax, that asks how many people appear in the image. The binary module of the data processing system then converts the first binary value back into the first embedding value, and a response to the query based on the first embedding value may be delivered to the user.
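The store-then-decode-on-query flow can be sketched with the standard sqlite3 module standing in for the system's database. The table layout, the model names, the float32 packing, and the answer_people_count helper are all hypothetical.

```python
import sqlite3
import struct
from typing import List


def to_blob(vec: List[float]) -> bytes:
    # Binary module, encode side: embedding -> little-endian float32 bytes.
    return struct.pack(f"<{len(vec)}f", *vec)


def from_blob(blob: bytes) -> List[float]:
    # Binary module, decode side: bytes -> embedding.
    return list(struct.unpack(f"<{len(blob) // 4}f", blob))


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE embeddings (item_id TEXT, model TEXT, value BLOB, "
    "PRIMARY KEY (item_id, model))"
)

# Hypothetical embeddings produced by models 1..3 for one image.
embeddings = {"model1": [3.0], "model2": [0.25, 0.75], "model3": [0.5, 0.5, 0.0]}
for model, vec in embeddings.items():
    conn.execute(
        "INSERT INTO embeddings VALUES (?, ?, ?)", ("image_001", model, to_blob(vec))
    )


def answer_people_count(item_id: str) -> int:
    # Load only model 1's binary value, decode it, and answer from it.
    row = conn.execute(
        "SELECT value FROM embeddings WHERE item_id = ? AND model = ?",
        (item_id, "model1"),
    ).fetchone()
    return int(from_blob(row[0])[0])
```

Keying rows by (item, model) means a query decodes only the one binary value it needs rather than every embedding stored for the image.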

FIG. 19 shows an example of binarization according to an embodiment of the present invention.

Referring to FIG. 19, the binary module may convert an embedding value into a binary value.

The binary module may convert movie review text into an embedding value and then convert that embedding value into a binary value.

That is, according to an embodiment of the present invention, a method of binarizing unstructured data in a data processing system may include a step in which the binary module of the data processing system converts the embedding value of unstructured data into a binary value, and a step in which the database of the data processing system stores the binary value.

The method may further include a step in which the embedding artificial intelligence model group of the data processing system converts the unstructured data into the embedding value. The embedding artificial intelligence model group includes one or more distinct artificial intelligence models that generate the required embedding values, and each embedding value may be determined based on the feature map of the corresponding model.

FIG. 20 illustrates a method for storing data as embedding values or binary values according to an embodiment of the present invention.

FIG. 20 discloses a method by which the data processing system handles unstructured data as embedding values or binary values.

Referring to FIG. 20, the data processing system initially stores unstructured data as binary values. Then, taking into account how often the binary values are used, it classifies the unstructured data into first type unstructured data 2010 and second type unstructured data 2020, which may be stored in the database in different formats.

Unstructured data initially stored as binary values may be referred to as default unstructured data 2000.

Default unstructured data 2000 is data that has not yet been converted into first type unstructured data 2010 or second type unstructured data 2020, i.e., data for which usage-frequency statistics on its binary value have not yet been accumulated over the threshold time. Based on how often its binary value is used during the threshold time, default unstructured data 2000 may be converted into first type unstructured data 2010 or second type unstructured data 2020.

First type unstructured data 2010 may be stored as embedding values, and second type unstructured data 2020 may be stored as binary values.

First type unstructured data 2010 may be data with a relatively high frequency of use: unstructured data whose binary value was converted into an embedding value, for purposes such as training, prediction, or search, at least a threshold number of times within the threshold time.

Second type unstructured data 2020 may be data with a relatively low frequency of use: unstructured data whose binary value was converted into an embedding value, for purposes such as training, prediction, or search, fewer than the threshold number of times within the threshold time.

The usage frequency of first type unstructured data 2010 and second type unstructured data 2020 may be re-measured every specific period. Depending on the usage frequency in each period, first type unstructured data 2010 may remain first type or be changed into second type unstructured data 2020. Likewise, second type unstructured data 2020 may remain second type or be changed into first type unstructured data 2010.

The specific period is set according to how often users' data switches between first type unstructured data 2010 and second type unstructured data 2020, anchored at the point in time at which switching is most frequent. The threshold usage frequency used to decide a change may vary with the period setting. The period may also be adjusted by additionally considering how often data is used in the user's workspace.
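The frequency-based classification can be sketched as a small tier manager. The names TierManager, threshold_count, and window_seconds are hypothetical, and the period is assumed to be fixed by the caller. The patent's rule for choosing the period from switching frequency is not modeled here.

```python
from dataclasses import dataclass, field
from typing import Dict, List

DEFAULT, FIRST_TYPE, SECOND_TYPE = "default", "first_type", "second_type"
# First type (frequently used) is kept as embeddings; everything else as binary.
STORAGE_FORMAT = {DEFAULT: "binary", FIRST_TYPE: "embedding", SECOND_TYPE: "binary"}


@dataclass
class Item:
    tier: str = DEFAULT
    decode_times: List[float] = field(default_factory=list)


class TierManager:
    """Hypothetical hot/cold classifier for stored unstructured data."""

    def __init__(self, threshold_count: int, window_seconds: float) -> None:
        self.threshold_count = threshold_count
        self.window_seconds = window_seconds
        self.items: Dict[str, Item] = {}

    def add(self, item_id: str) -> None:
        # New data starts as default unstructured data, stored as binary.
        self.items[item_id] = Item()

    def record_decode(self, item_id: str, now: float) -> None:
        # Call whenever the item's binary value is converted to an embedding.
        self.items[item_id].decode_times.append(now)

    def reclassify(self, now: float) -> None:
        # Run once per period: count conversions inside the window, then retier.
        window_start = now - self.window_seconds
        for item in self.items.values():
            item.decode_times = [t for t in item.decode_times if t >= window_start]
            hot = len(item.decode_times) >= self.threshold_count
            item.tier = FIRST_TYPE if hot else SECOND_TYPE

    def storage_format(self, item_id: str) -> str:
        return STORAGE_FORMAT[self.items[item_id].tier]
```

For instance, with threshold_count=3 and window_seconds=60, an item decoded at t=10, 20, and 30 becomes first type at reclassify(now=50). It falls back to second type at reclassify(now=200) if it is not used again.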

FIG. 21 is a conceptual diagram showing a workspace backup method according to an embodiment of the present invention.

FIG. 21 discloses a method by which a user backs up his or her own workspace on the data processing system.

Referring to FIG. 21, the user may back up the workspace 2160. Backing up the workspace 2160 means storing both the user's data as of the backup time and the user metadata describing the user's workspace. If the user accidentally deletes or loses data, it can be restored from the backup file.

According to an embodiment of the present invention, a workspace backup controller 2120 may be located in the master controller 2110 of the master server 2100. Upon the user's request, the workspace backup controller 2120 may forward a backup request to the server controller 2130 that controls the workspace 2160 corresponding to the user.

The server controller 2130 may include a workspace storage checker 2140 and a storage difference hasher 2150.

The workspace storage checker 2140 may check whether workspace backup data exists in the workspace backup storage 2170. If existing workspace backup data exists there, the checker may compare it with the user data and user metadata in the storage corresponding to the user's workspace 2160. The user data and user metadata currently in storage may be referred to as target workspace backup data; that is, the workspace storage checker 2140 compares the existing workspace backup data with the target workspace backup data. If no existing workspace backup data exists in the workspace backup storage 2170, the target workspace backup data may be backed up to the workspace backup storage 2170.

The storage difference hasher 2150 may be implemented to hash changed data. If any data has changed between the existing workspace backup data and the target workspace backup data, the storage difference hasher 2150 hashes only the changed portion, and the workspace backup storage 2170 may update the existing workspace backup data based on the resulting hash value.

In this way, user data and user metadata may be backed up and stored in the workspace backup storage 2170.
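The check-compare-hash-update flow can be sketched as an incremental backup keyed by per-file SHA-256 digests. The dictionary layout of the backup storage and the function names are assumptions. This sketch copies added and changed files only and does not handle deletions.

```python
import hashlib
from typing import Dict, List


def digest(content: bytes) -> str:
    return hashlib.sha256(content).hexdigest()


def backup_workspace(target: Dict[str, bytes], backup: Dict[str, dict]) -> List[str]:
    """Incremental backup sketch.

    target: path -> bytes for the user's current data and metadata
            (the "target workspace backup data").
    backup: hypothetical backup-storage layout {"hashes": {...}, "files": {...}};
            an empty dict means no existing backup.
    Returns the paths that were written.
    """
    if not backup:
        # No existing backup: store the target data in full.
        backup["hashes"] = {path: digest(data) for path, data in target.items()}
        backup["files"] = dict(target)
        return sorted(target)
    # Existing backup: compare per-file hashes and copy only what changed.
    changed = [p for p, data in target.items() if backup["hashes"].get(p) != digest(data)]
    for path in changed:
        backup["hashes"][path] = digest(target[path])
        backup["files"][path] = target[path]
    return sorted(changed)
```

Comparing stored digests avoids rereading old backup contents. Only the hash of each current file has to be computed to detect what changed.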

FIG. 22 is a flowchart showing a workspace backup method according to an embodiment of the present invention.

FIG. 22 discloses a method by which a user backs up his or her own workspace on the data processing system.

The workspace backup controller in the master controller of the master server of the data processing system may receive a user backup request (step S2200).

The workspace backup controller may forward the user backup request to the server controller (step S2210).

The workspace storage checker of the server controller may check whether existing workspace backup data exists in the workspace backup storage (step S2220).

If existing workspace backup data exists in the workspace backup storage, the target workspace backup data of the user's workspace may be compared with the existing workspace backup data of the user's workspace (step S2230).

If no existing workspace backup data exists in the workspace backup storage, the target workspace backup data may be backed up to the workspace backup storage (step S2240).

The storage difference hasher generates a hash value based on the data that changed between the existing workspace backup data and the target workspace backup data (step S2250).

The workspace backup storage may update the existing workspace backup data based on the hash value (step S2260).

That is, a workspace backup method in a data processing system according to an embodiment of the present invention may include a step in which the data processing system determines target workspace backup data and existing workspace backup data, a step in which the data processing system generates a hash value based on a comparison of the target workspace backup data with the existing workspace backup data, and a step in which the data processing system backs up the existing workspace to the target workspace based on the hash value. The existing workspace is the workspace the user previously created, and the target workspace is the workspace to which the user backs up the existing workspace.

The workspace storage checker of the server controller of the data processing system checks whether existing workspace backup data exists in the workspace backup storage and, if it does, may compare the target workspace backup data with the existing workspace backup data.

FIG. 23 is a conceptual diagram showing a server backup method according to an embodiment of the present invention.

FIG. 23 discloses a method for backing up a server on the data processing system.

Referring to FIG. 23, a server backup is an automatic backup of all server-related data. It is performed periodically as part of service operation and covers not only workspace-level user data but also the server data and server metadata for all workspaces hosted on the server. Server backup is intended for cases in which the server itself has a problem or stops working because of a specific fault.

A server backup controller 2320 may be located in the master controller 2310 of the master server 2300. The server backup controller 2320 may periodically receive a backup request for a server and transmit it to that server's server controller 2330.

The server controller 2330 may include a server storage checker 2340 and a storage difference hasher 2350.

The server storage checker 2340 checks whether existing server backup data is present in the server backup storage 2360 and compares it with the current server data and server metadata to determine whether any data has been added, changed, or deleted. If server data or server metadata has changed, a hash value based on the changed data is generated, and the existing server backup data in the server backup storage 2360 may be updated.

The server data and server metadata currently on the server may be referred to as target server backup data; that is, the server storage checker 2340 compares the existing server backup data with the target server backup data. If no existing server backup data exists in the server backup storage 2360, the target server backup data may be backed up to the server backup storage 2360.

The storage difference hasher 2350 may be implemented to hash changed data. If any data has changed between the existing server backup data and the target server backup data, the storage difference hasher 2350 hashes the changed data to generate a hash value, and the server backup storage 2360 may update the existing server backup data based on the hash value.

In this way, server data and server metadata may be backed up and stored in the server backup storage 2360.
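Since the server storage checker looks for added, changed, and deleted data, a server-wide backup can be sketched as a manifest diff. The key naming (for example "ws1/data.csv") and the backup-storage dictionary layout are hypothetical.

```python
import hashlib
from typing import Dict, Set, Tuple


def manifest(data: Dict[str, bytes]) -> Dict[str, str]:
    # One SHA-256 digest per key (e.g. "ws1/data.csv", "server/meta.json").
    return {key: hashlib.sha256(value).hexdigest() for key, value in data.items()}


def diff(old: Dict[str, str], new: Dict[str, str]) -> Tuple[Set[str], Set[str], Set[str]]:
    added = set(new) - set(old)
    deleted = set(old) - set(new)
    changed = {key for key in set(new) & set(old) if new[key] != old[key]}
    return added, changed, deleted


def backup_server(
    server_data: Dict[str, bytes], backup: Dict[str, dict]
) -> Tuple[Set[str], Set[str], Set[str]]:
    new_manifest = manifest(server_data)
    if not backup:
        # No existing server backup: copy everything.
        backup["manifest"] = new_manifest
        backup["data"] = dict(server_data)
        return set(server_data), set(), set()
    added, changed, deleted = diff(backup["manifest"], new_manifest)
    for key in added | changed:
        backup["data"][key] = server_data[key]
    for key in deleted:
        del backup["data"][key]
    backup["manifest"] = new_manifest
    return added, changed, deleted
```

A periodic job would call backup_server for each server, passing all of that server's workspace data and server metadata as one dictionary.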

FIG. 24 is a flowchart showing a server backup method according to an embodiment of the present invention.

FIG. 24 discloses a method for performing a server backup on the data processing system.

The server backup controller in the master controller of the master server of the data processing system may receive a server backup request (step S2400).

The server backup controller may forward the server backup request to the server controller (step S2410).

The server storage checker of the server controller may check whether existing server backup data exists in the server backup storage (step S2420).

If existing server backup data exists in the server backup storage, the target server backup data of the server may be compared with the existing server backup data (step S2430).

If no existing server backup data exists in the server backup storage, the target server backup data may be backed up to the server backup storage (step S2440).

The storage difference hasher may generate a hash value based on the data that changed between the existing server backup data and the target server backup data (step S2450).

The server backup storage may update the existing server backup data based on the hash value (step S2460).

That is, a server backup method according to an embodiment of the present invention may include a step in which the data processing system determines target server backup data and existing server backup data, a step in which the data processing system generates a hash value based on a comparison of the target server backup data with the existing server backup data, and a step in which the data processing system backs up the existing server to the target server based on the hash value.

The server storage checker of the server controller of the data processing system checks whether existing server backup data exists in the server backup storage and, if it does, may compare the target server backup data with the existing server backup data. The hash value may be determined by the storage difference hasher based on the data that changed between the existing server backup data and the target server backup data.

FIG. 25 is a conceptual diagram showing a workspace migration method according to an embodiment of the present invention.

FIG. 25 discloses a method of migrating a workspace.

Referring to FIG. 25, migration means moving a workspace from its existing server to a new server.

During migration, the storage allocated to the user or the computing resources allocated to the user (CPU, GPU, memory) may be updated according to a user migration specification.

When migrating, the user may request items such as a storage size upgrade or a computing resource upgrade as the user migration specification, and the user's new workspace may be defined in accordance with this specification.

Hereinafter, the server that contains the user's existing workspace is referred to as the existing server 2503, and the server to which the workspace will be migrated is referred to as the migration server 2506.

The workspace migration controller 2520 of the master controller 2510 of the master server 2500 may receive a migration request from the user.

On receiving the user's migration request, the workspace migration controller 2520 may look up, based on that request, a migration server 2506 to which the user's workspace can be moved. The workspace migration controller 2520 may select the migration server 2506 and forward migration requests to the existing server 2503 and the migration server 2506. A migration request may include information about the user migration specification.

The workspace migration controller 2520 may send the existing server 2503 a first migration request instructing it to perform the migration to the migration server 2506, and may send the migration server 2506 a second migration request instructing it to receive the migration data from the existing server 2503.
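The controller's selection step and its two outgoing requests can be sketched as below. The patent does not say how a candidate server is chosen. This sketch assumes a server qualifies when its free storage, CPU, GPU, and memory all cover the user migration specification, and it breaks ties by most free memory. The class names and request fields are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class MigrationSpec:
    # Hypothetical user migration specification (requested resources).
    storage_gb: int
    cpus: int
    gpus: int
    memory_gb: int


@dataclass
class ServerCapacity:
    name: str
    free_storage_gb: int
    free_cpus: int
    free_gpus: int
    free_memory_gb: int

    def fits(self, spec: MigrationSpec) -> bool:
        return (
            self.free_storage_gb >= spec.storage_gb
            and self.free_cpus >= spec.cpus
            and self.free_gpus >= spec.gpus
            and self.free_memory_gb >= spec.memory_gb
        )


def select_migration_server(
    servers: List[ServerCapacity], spec: MigrationSpec, existing: str
) -> Optional[ServerCapacity]:
    # Candidates: any other server with enough free resources for the spec.
    candidates = [s for s in servers if s.name != existing and s.fits(spec)]
    # Assumed tie-break policy: prefer the server with the most free memory.
    return max(candidates, key=lambda s: s.free_memory_gb, default=None)


def build_migration_requests(
    spec: MigrationSpec, existing: str, target: str
) -> Tuple[Dict[str, object], Dict[str, object]]:
    # First request goes to the existing server (send), second to the target (receive).
    first = {"to": existing, "action": "send", "destination": target, "spec": spec}
    second = {"to": target, "action": "receive", "source": existing, "spec": spec}
    return first, second
```

Returning None when no server fits leaves it to the controller to reject or queue the request. Both messages carry the specification so the receiving side can size the new workspace.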

The server controller 2530 of the existing server 2503 may include a workspace migration sender 2540 and a storage hasher encoder 2550. The workspace migration sender 2540 may be implemented to transmit backup data of the user migration data (user data, user metadata, etc.).

The workspace migration sender 2540 may back up the migration data on the existing server and pass it to the storage hasher encoder 2550.

The storage hasher encoder 2550 may hash the migration data and transmit the hashed migration data to the temporary migration storage 2560.

The temporary migration storage 2560 stores the hashed migration data from the existing server 2503 and may deliver it to the migration server 2506. The migration data stored in the temporary migration storage 2560 may be deleted as soon as the migration process ends.

The server controller 2530 of the migration server 2506 may include a workspace migration receiver 2570, a storage hasher decoder 2580, and a workspace spawner 2590.

The workspace migration receiver 2570 may receive the hashed migration data from the temporary migration storage 2560 and pass it to the storage hasher decoder 2580.

The storage hasher decoder 2580 may decode the hashed migration data to recover the migration data. The storage hasher decoder 2580 may also generate the storage and metadata for the workspace on the migration server 2506 in accordance with the user migration specification.

The workspace spawner 2590 may create a new workspace on the migration server 2506 based on the generated workspace storage and metadata.

Once the new workspace has been created on the migration server 2506, the migration data stored in the temporary migration storage 2560 may be deleted.
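The encoder, temporary storage, and decoder hand-off described above can be sketched as follows. The description calls the step "hashing," but the decoder later recovers the original data. The transform must therefore be reversible, and a plain cryptographic hash would not qualify.

This minimal sketch therefore assumes two stand-ins:

- A base64 encoding plays the role of the reversible "hash."
- A SHA-256 digest lets the receiver check integrity.

All class and function names are hypothetical.

```python
import base64
import hashlib
import json


class TemporaryMigrationStorage:
    """Hypothetical staging area holding encoded payloads until migration ends."""

    def __init__(self):
        self._blobs = {}

    def put(self, key, blob):
        self._blobs[key] = blob

    def get(self, key):
        return self._blobs[key]

    def delete(self, key):
        self._blobs.pop(key, None)

    def __len__(self):
        return len(self._blobs)


def encode_migration_data(data: dict) -> dict:
    # Stand-in for the storage hasher encoder (assumption): serialize,
    # base64-encode, and attach a SHA-256 digest of the raw bytes.
    raw = json.dumps(data, sort_keys=True).encode("utf-8")
    return {"payload": base64.b64encode(raw).decode("ascii"),
            "sha256": hashlib.sha256(raw).hexdigest()}


def decode_migration_data(blob: dict) -> dict:
    # Stand-in for the storage hasher decoder: reverse the encoding and
    # reject the payload if its digest does not match.
    raw = base64.b64decode(blob["payload"])
    if hashlib.sha256(raw).hexdigest() != blob["sha256"]:
        raise ValueError("migration payload failed integrity check")
    return json.loads(raw)


def migrate_workspace(workspace_id: str, backup: dict,
                      storage: TemporaryMigrationStorage) -> dict:
    # Sender side: stage the encoded backup in temporary storage.
    storage.put(workspace_id, encode_migration_data(backup))
    # Receiver side: fetch and decode, then "spawn" the new workspace.
    restored = decode_migration_data(storage.get(workspace_id))
    new_workspace = {"id": workspace_id, **restored}
    # The staged copy is deleted once the new workspace exists.
    storage.delete(workspace_id)
    return new_workspace
```

Deleting the staged copy only after the new workspace has been built mirrors the ordering in the text. A failure during decoding leaves the staged data available for a retry.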

FIG. 26 is a flowchart showing a workspace migration method according to an embodiment of the present invention.

FIG. 26 discloses a method of migrating a workspace.

Referring to FIG. 26, the workspace migration controller in the master controller of the master server receives a migration request from a user device (step S2600).

Based on the migration request, the workspace migration controller determines a migration server to which the user's workspace can be moved (step S2610).

The workspace migration controller may forward migration requests to the existing server and to the migration server (step S2620).

The migration request may include information about the user migration specification. The workspace migration controller may send the existing server a first migration request instructing it to perform the migration to the migration server. It may also send the migration server a second migration request instructing it to receive the migration data from the existing server.

The workspace migration sender backs up the migration data on the existing server and passes it to the storage hasher encoder (step S2630).

The storage hasher encoder hashes the migration data and transmits the hashed migration data to the temporary migration storage (step S2640).

The workspace migration receiver receives the hashed migration data from the temporary migration storage and passes it to the storage hasher decoder (step S2650).

The storage hasher decoder decodes the hashed migration data to recover the migration data. In accordance with the user migration specification, it then generates the storage and metadata for the workspace on the migration server (step S2660).

The workspace spawner creates a new workspace on the migration server based on the generated workspace storage and metadata (step S2670).
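Steps S2610 and S2620 can be sketched as follows. The controller chooses a migration server that satisfies the user migration specification and then sends the two requests.

The text does not say how the migration server is chosen. This sketch assumes two selection rules:

- Filter the servers by free storage and GPUs.
- Break ties by choosing the server with the most free storage.

`Server`, `MigrationSpec`, and the inbox tuples are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class MigrationSpec:
    # Hypothetical fields of the "user migration specification".
    storage_gb: int
    gpus: int


@dataclass
class Server:
    name: str
    free_storage_gb: int
    free_gpus: int
    inbox: list = field(default_factory=list)  # requests received


def choose_migration_server(servers, current, spec):
    # S2610: keep servers other than the current one that satisfy the spec.
    candidates = [s for s in servers
                  if s is not current
                  and s.free_storage_gb >= spec.storage_gb
                  and s.free_gpus >= spec.gpus]
    if not candidates:
        return None
    # Assumed tie-break: prefer the candidate with the most free storage.
    return max(candidates, key=lambda s: s.free_storage_gb)


def dispatch_migration(workspace_id, servers, current, spec):
    target = choose_migration_server(servers, current, spec)
    if target is None:
        raise RuntimeError("no server satisfies the migration specification")
    # S2620: first request tells the existing server to send its data,
    # second request tells the migration server to receive it.
    current.inbox.append(("send", workspace_id, target.name, spec))
    target.inbox.append(("receive", workspace_id, current.name, spec))
    return target
```

In a real deployment these requests would be network calls rather than list appends. The lists only make the request flow easy to inspect.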

In summary, a workspace migration method in a data processing system according to an embodiment of the present invention may include:
(i) backing up, by the data processing system, the migration data of a first workspace located on an existing server and transmitting the migration data to a migration server; and
(ii) creating, by the data processing system, a second workspace on the migration server based on the migration data.

Transmitting the migration data to the migration server may include:
(i) backing up, by the workspace migration sender, the migration data on the existing server and passing it to the storage hasher encoder;
(ii) hashing, by the storage hasher encoder, the migration data and transmitting the hashed migration data to the temporary migration storage; and
(iii) transmitting, by the temporary migration storage, the hashed migration data of the existing server to the migration server.

Creating the second workspace may include:
(i) receiving, by the workspace migration receiver, the hashed migration data from the temporary migration storage and passing it to the storage hasher decoder;
(ii) decoding, by the storage hasher decoder, the hashed migration data to recover the migration data;
(iii) generating, by the storage hasher decoder, the storage and metadata for the workspace on the migration server in accordance with the user migration specification; and
(iv) creating, by the workspace spawner, the second workspace on the migration server based on that storage and metadata.

FIG. 27 is a conceptual diagram showing a server migration method according to an embodiment of the present invention.

FIG. 27 discloses a method of migrating a server itself.

Referring to FIG. 27, server migration moves an existing server 2703 to a new migration server 2706. For example, the server itself may need to move to a new server whose storage and computing resources are sized differently from those of the existing server 2703. Server migration may be performed in such cases. Hereinafter, for convenience of description, the new server to which the existing server is migrated is referred to as the migration server 2706.

The master controller 2710 of the master server 2700 may include a server status monitor 2720 and a server migration controller 2730.

The server status monitor 2720 may be implemented to monitor server status. On a schedule, it may communicate with a status checker 2750 located in the server controller 2740. The status checker 2750 may check the state of the storage and/or resources of the existing server 2703. The server status monitor 2720 may then receive the resulting storage status information and resource status information. When storage and/or resources are insufficient, the server status monitor 2720 may send a server migration request to the server migration controller 2730.

The server migration request may include a server migration specification. The specification may contain information about the specifications (e.g., storage and computing resources) requested for the migration server 2706.
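The status monitor's decision can be sketched as a threshold check that emits a server migration specification when storage or compute is running short.

The text gives neither a threshold nor a sizing rule. The 90% utilization threshold and the 2x growth factor below are purely illustrative assumptions, and all names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ServerStatus:
    # Hypothetical status reported by the status checker.
    storage_used_gb: float
    storage_total_gb: float
    cpu_util: float  # fraction in [0.0, 1.0]
    mem_util: float  # fraction in [0.0, 1.0]


@dataclass
class ServerMigrationSpec:
    # Hypothetical resources requested for the migration server.
    storage_gb: float
    cpu_cores: int
    memory_gb: float


def check_server(status: ServerStatus, current: ServerMigrationSpec,
                 threshold: float = 0.9,
                 growth: float = 2.0) -> Optional[ServerMigrationSpec]:
    """Return a migration spec if resources are insufficient, else None."""
    short_storage = status.storage_used_gb / status.storage_total_gb >= threshold
    short_compute = status.cpu_util >= threshold or status.mem_util >= threshold
    if not (short_storage or short_compute):
        return None
    # Scale up only the resource that is running short (assumed policy).
    return ServerMigrationSpec(
        storage_gb=current.storage_gb * (growth if short_storage else 1),
        cpu_cores=int(current.cpu_cores * (growth if short_compute else 1)),
        memory_gb=current.memory_gb * (growth if short_compute else 1),
    )
```

A scheduler would call `check_server` periodically with fresh status readings. A non-`None` result would then be wrapped in a server migration request for the server migration controller.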

Based on the server migration request, the server migration controller 2730 may determine the migration server 2706 to which the existing server is to be migrated. That is, it may look up a new migration server 2706 that can accept the migration and forward the request to it.

The server migration controller 2730 may transmit a server migration request to the server controller 2740.

The server controller 2740 may include the status checker 2750, a server migration sender 2760, and a storage hash encoder 2770.

The server migration sender 2760 may generate server migration data by backing up the existing server, and may pass the server migration data to the storage hash encoder 2770.

The storage hash encoder 2770 may hash the server migration data and transmit the hashed server migration data to the temporary migration storage 2780.

The temporary migration storage 2780 may store the hashed server migration data.

The temporary migration storage 2780 may transmit the hashed migration data to the server migration receiver 2765 of the migration server 2706.

The storage hash decoder 2775 may decode the hashed migration data, converting it back into migration data.

Based on the server migration specification, the storage hash decoder 2775 may generate the storage and metadata to be applied to the migration server (or the storage and metadata related to all workspaces on the existing server).

Based on that storage and metadata, the workspace spawner of the migration server 2706 may recreate on the migration server 2706 every workspace that existed on the existing server.

The temporary migration storage 2780 may then delete the hashed server migration data.
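Server migration differs from workspace migration in that every workspace on the existing server moves in one operation. This minimal sketch models each workspace as a dict with a hypothetical `storage_gb` field. It also assumes that the migration server's storage spec must cover the total size of all workspaces.

The staged backup is deleted only after every workspace has been recreated. A failed migration therefore keeps its backup available for a retry.

```python
import copy
from typing import Dict


def migrate_server(workspaces: Dict[str, dict], target_storage_gb: float,
                   staging: dict) -> Dict[str, dict]:
    """Recreate all workspaces on a migration server (illustrative only)."""
    # Server migration sender: snapshot every workspace into staging.
    staging["server-backup"] = copy.deepcopy(workspaces)
    backup = staging["server-backup"]
    # Assumed spec check: the target must hold all workspaces at once.
    required = sum(ws["storage_gb"] for ws in backup.values())
    if required > target_storage_gb:
        # Leave the staged backup in place so the migration can be retried.
        raise ValueError(f"migration server needs {required} GB "
                         f"but offers {target_storage_gb} GB")
    # Workspace spawner: recreate each workspace on the migration server.
    migrated = {ws_id: dict(ws) for ws_id, ws in backup.items()}
    # Delete the staged data only after every workspace exists.
    del staging["server-backup"]
    return migrated
```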

FIG. 28 is a flowchart showing a server migration method according to an embodiment of the present invention.

FIG. 28 discloses a method of migrating a server.

Referring to FIG. 28, when storage and/or resources are insufficient, the server status monitor may send a server migration request to the server migration controller (step S2800).

Based on the server migration request, the server migration controller may determine the migration server to which the existing server is to be migrated (step S2810).

The server migration controller may send the server migration request to the server controller (step S2820).

The server migration sender generates server migration data by backing up the existing server (step S2830).

The server migration sender passes the server migration data to the storage hash encoder (step S2840).

The storage hash encoder hashes the server migration data (step S2850).

The storage hash encoder transmits the hashed server migration data to the temporary migration storage (step S2855).

The temporary migration storage stores the hashed server migration data (step S2860).

The temporary migration storage transmits the hashed migration data to the server migration receiver of the migration server (step S2865).

The storage hash decoder decodes the hashed migration data back into migration data (step S2870).

Based on the server migration specification, the storage hash decoder generates the storage and metadata to be applied to the migration server (or the storage and metadata related to all workspaces on the existing server) (step S2875).

Based on that storage and metadata, the workspace spawner creates on the migration server every workspace that existed on the existing server (step S2880).

In summary, a server migration method in a data processing system according to an embodiment of the present invention may include:
(i) backing up, by the data processing system, the migration data of an existing server and transmitting the migration data to a migration server; and
(ii) creating on the migration server, by the data processing system, every workspace that existed on the existing server, based on the migration data.

Transmitting the migration data to the migration server may include:
(i) generating, by the server migration sender, server migration data by backing up the existing server;
(ii) passing, by the server migration sender, the server migration data to the storage hash encoder;
(iii) hashing, by the storage hash encoder, the server migration data;
(iv) transmitting, by the storage hash encoder, the hashed server migration data to the temporary migration storage; and
(v) transmitting, by the temporary migration storage, the hashed migration data to the server migration receiver of the migration server.

Creating every workspace that existed on the existing server may include:
(i) decoding, by the storage hash decoder, the hashed migration data back into migration data;
(ii) generating, by the storage hash decoder, the storage and metadata to be applied to the migration server based on the server migration specification; and
(iii) creating on the migration server, by the workspace spawner, every workspace that existed on the existing server, based on that storage and metadata.

FIG. 29 is a conceptual diagram showing an unstructured model caching method according to an embodiment of the present invention.

FIG. 29 discloses a method for caching unstructured models.

Referring to FIG. 29, a cache 2950 is a high-speed data storage layer that stores a subset of data, typically transient in nature. When the requested data is already in the cache 2950, the request can be served faster than by accessing the data's primary storage location.

Caching refers to loading data into the cache 2950 and serving requests from it. With caching, data in the cache 2950 that was previously retrieved or computed can be reused efficiently. Caching is typically used to speed up requests for functions or data that are computationally expensive or frequently invoked.

The data processing system of the present invention can load artificial intelligence models into the cache 2950 and use them. It does so by means of a technique for caching artificial intelligence models and a caching algorithm that determines which models to cache.

Caching artificial intelligence models can have the following effects. The computing resources and time required to load a model can be reduced, and the model's results can be output more quickly.

According to an embodiment of the present invention, caching of artificial intelligence models is built into the clauses 2900 of the SQL engine (e.g., an extended SQL engine), such as SEARCH, CONVERT, PREDICT, and EVALUATE. Algorithm-based model caching is therefore performed without the user having to trigger it or take any separate caching step.
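Because caching is built into the clause execution path, the user never calls the cache directly. The clause asks the cache for its model, and the cache loads the model only on a miss.

This minimal sketch uses a hypothetical `execute_clause` function and a get-or-load cache. Eviction is omitted here; deciding what to evict is the job of the caching algorithms described in this section.

```python
from typing import Callable, Dict, List

# Clause names taken from the description; the executor itself is hypothetical.
SUPPORTED_CLAUSES = {"SEARCH", "CONVERT", "PREDICT", "EVALUATE"}


class ClauseModelCache:
    """Get-or-load model cache consulted by clause execution (no eviction)."""

    def __init__(self, loader: Callable[[str], Callable]):
        self._loader = loader
        self._models: Dict[str, Callable] = {}
        self.loads = 0  # number of (expensive) model loads performed

    def get(self, name: str) -> Callable:
        if name not in self._models:
            self._models[name] = self._loader(name)
            self.loads += 1
        return self._models[name]


def execute_clause(cache: ClauseModelCache, clause: str,
                   model_name: str, rows: List) -> List:
    if clause not in SUPPORTED_CLAUSES:
        raise ValueError(f"unsupported clause: {clause}")
    # The clause fetches its model through the cache transparently.
    model = cache.get(model_name)
    return [model(row) for row in rows]
```

Two clauses that use the same model share a single load, which is the saving the text attributes to model caching.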

The following caching algorithms may be used for caching artificial intelligence models.

For example, a first caching algorithm may be used as the default caching algorithm. The first caching algorithm orders the artificial intelligence models in the cache by how recently they were used (least recently used, LRU). When the cache is full, the model that has gone unused the longest is evicted to make room for a new model. The first caching algorithm thus keeps frequently used models readily accessible. Performance therefore improves the more the workload concentrates on a specific set of models that frequently take input or produce output.

In an embodiment of the present invention, the first caching algorithm keeps recently used artificial intelligence models loaded in the cache. The longer a model has gone unused, the sooner it is evicted when room is needed to load an additional model.
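The first caching algorithm behaves like a classic LRU cache. A minimal sketch in Python keeps models in an `OrderedDict` ordered from least to most recently used. `move_to_end` marks a hit, and `popitem(last=False)` evicts the oldest entry.

The class name and the default placeholder loader are hypothetical.

```python
from collections import OrderedDict


class LRUModelCache:
    """LRU cache of models: least recently used first, evicted first."""

    def __init__(self, capacity: int):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._models = OrderedDict()  # name -> loaded model

    def access(self, name, loader=lambda n: f"<model {n}>"):
        """Return (model, evicted_name or None), loading the model on a miss."""
        if name in self._models:
            self._models.move_to_end(name)  # hit: now most recently used
            return self._models[name], None
        evicted = None
        if len(self._models) >= self.capacity:
            evicted, _ = self._models.popitem(last=False)  # drop the LRU model
        self._models[name] = loader(name)
        return self._models[name], evicted

    def contents(self):
        return list(self._models)  # least recently used first
```

For example, with capacity 2 the access sequence A, B, A, C evicts B. Re-accessing A made B the least recently used model.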

Caching artificial intelligence models consumes resources. To manage those resources efficiently, the most efficient caching algorithm may be selected by taking into account the computing resources (CPU, GPU, memory). These resources are distributed according to the workspace creation and workspace structure described above.

Several factors may be considered when caching artificial intelligence models, such as the number of models held in the cache and the caching algorithm used. Moreover, the number of models the cache can hold necessarily depends on the computing resources allocated to each user. Caching algorithms other than the first caching algorithm may therefore be used on a per-user basis to load models into the cache.

Other caching algorithms that may be used in the present invention include the following.

A second caching algorithm additionally assigns a timestamp to each artificial intelligence model loaded into the cache. It then evicts models by also taking into account the time at which each model was loaded. The second caching algorithm may manage the cache by evicting, among models that have not been used for a certain period, the least recently used one. It may be used to make the most of cache space by removing old entries more aggressively. Under the second caching algorithm, even the most recently accessed model is evicted once the configured threshold time has elapsed. The first caching algorithm, in contrast, caps the cache at a specified number of models rather than by time. Under it, the most recently accessed model can remain in the cache until it is displaced by subsequently loaded models.
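The second caching algorithm can be sketched as an LRU cache whose entries also carry a timestamp. Entries older than a time-to-live (TTL) are purged before every operation.

This sketch makes three assumptions:

- The timestamp is set when the model is loaded, as the paragraph describes, and is not refreshed on each access.
- The clock is injected so the behavior can be tested. In practice `time.monotonic` would be a natural choice.
- The class name is hypothetical.

```python
from collections import OrderedDict


class TLRUModelCache:
    """LRU eviction plus expiry of models older than `ttl` since loading."""

    def __init__(self, capacity: int, ttl: float, clock):
        self.capacity = capacity
        self.ttl = ttl
        self.clock = clock
        self._models = OrderedDict()  # name -> (model, loaded_at), LRU first

    def _expire(self):
        now = self.clock()
        expired = [n for n, (_, t) in self._models.items() if now - t >= self.ttl]
        for n in expired:
            del self._models[n]  # evicted by age, regardless of recency
        return expired

    def access(self, name, loader):
        self._expire()
        if name in self._models:
            self._models.move_to_end(name)  # recency changes, timestamp does not
            return self._models[name][0]
        if len(self._models) >= self.capacity:
            self._models.popitem(last=False)  # capacity eviction: LRU model
        model = loader(name)
        self._models[name] = (model, self.clock())
        return model

    def contents(self):
        self._expire()
        return list(self._models)
```

For example, model A is loaded at t=0, model B at t=5, and A is accessed again at t=5. By t=11 (TTL 10), A has expired even though it was the most recently accessed model. This matches the contrast the paragraph draws with the first caching algorithm.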

A third caching algorithm tracks the access state of artificial intelligence models with a bitmask, and the model accessed longest ago may be evicted. The bitmask is a data structure that tracks the access state of every model stored in a cache line. It has one or more bits per model, and each bit position corresponds to a model's position. Each bit is set to 0 or 1 to indicate that model's access state. Keeping such a bitmask for the models in a cache line lets the system track their access states and identify recently accessed models.
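One well-known bitmask scheme that fits this description is bit pseudo-LRU (bit-PLRU). It keeps one access bit per cache slot, packed into an integer. It only approximates "accessed longest ago," and the patent does not state that this is the exact variant it uses.

The scheme works as follows:

- Accessing a slot sets its bit.
- When every bit would become 1, all bits except the one just set are cleared.
- The victim is the first slot whose bit is 0.

Names are hypothetical.

```python
class BitPLRUModelCache:
    """Bit-PLRU: one access bit per slot, packed into an int bitmask."""

    def __init__(self, slots: int):
        if slots <= 0:
            raise ValueError("slots must be positive")
        self.slots = [None] * slots        # model name held in each slot
        self.mask = 0                      # bit i == 1 -> slot i recently used
        self.full_mask = (1 << slots) - 1

    def _touch(self, i: int) -> None:
        self.mask |= 1 << i
        if self.mask == self.full_mask:
            self.mask = 1 << i             # reset all bits except slot i

    def _victim(self) -> int:
        if None in self.slots:
            return self.slots.index(None)  # fill empty slots first
        for i in range(len(self.slots)):
            if not (self.mask >> i) & 1:
                return i                   # first slot not recently used
        return 0                           # only reached with a single slot

    def access(self, name):
        """Access a model; return the evicted model name, or None."""
        if name in self.slots:
            self._touch(self.slots.index(name))
            return None
        i = self._victim()
        evicted = self.slots[i]
        self.slots[i] = name
        self._touch(i)
        return evicted
```

With three slots, the sequence A, B, C, A followed by D evicts B, which is indeed the least recently used model. Bit-PLRU needs only one bit per model, in exchange for occasionally choosing a victim that exact LRU would not choose.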

A fourth caching algorithm keeps used artificial intelligence models in the cache. When the cache is full and a new model must be added, it replaces or evicts the model most recently added to the cache. No model is evicted until the specified number of models has been loaded. Once that number would be exceeded, the most recently added model may be evicted.
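The fourth caching algorithm evicts by insertion order, newest first, which makes it last-in-first-out with respect to additions. A minimal sketch follows.

It assumes that a cache hit does not change a model's position, since the paragraph refers only to when models were added. The class name is hypothetical.

```python
class EvictMostRecentlyAddedCache:
    """When full, evict the model that was added to the cache most recently."""

    def __init__(self, capacity: int):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._models = []  # insertion order, newest last

    def access(self, name):
        """Access a model; return the evicted model name, or None."""
        if name in self._models:
            return None  # hit: insertion order is unchanged
        evicted = None
        if len(self._models) >= self.capacity:
            evicted = self._models.pop()  # newest addition is evicted
        self._models.append(name)
        return evicted

    def contents(self):
        return list(self._models)
```

The oldest models stay resident indefinitely under this policy. That suits a workload in which the earliest-loaded models are also the most heavily reused.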

In summary, an unstructured model caching method in a data processing system according to an embodiment of the present invention may include:
(i) caching, by the data processing system, a plurality of artificial intelligence models in a cache; and
(ii) replacing, by the data processing system, at least one of the plurality of artificial intelligence models based on a caching algorithm.

FIG. 30 is a conceptual diagram showing the first caching algorithm according to an embodiment of the present invention.

FIG. 30 shows artificial intelligence models being replaced under the first caching algorithm.

Referring to FIG. 30, the first caching algorithm may load artificial intelligence models A, B, C, and D into the cache in sequence, according to the models the user uses.

Suppose artificial intelligence model E must then be added, and model A is the least recently used of the models in the cache. Model A is evicted, and model E is added in its place.

Suppose model F subsequently needs to be loaded, and model D is now the least recently used of the models in the cache. Model D is evicted, and model F is loaded into the cache.
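The FIG. 30 walkthrough can be reproduced with a four-slot LRU simulation. The figure does not show why D becomes the least recently used model after E is added.

This sketch therefore assumes that B, C, and E are accessed again before F is requested. The sequence of those intermediate accesses is hypothetical.

```python
from collections import OrderedDict


def simulate_lru(capacity, accesses):
    """Replay model accesses through an LRU cache; return (evictions, final)."""
    cache = OrderedDict()  # least recently used first
    evictions = []
    for name in accesses:
        if name in cache:
            cache.move_to_end(name)  # hit: mark as most recently used
            continue
        if len(cache) >= capacity:
            victim, _ = cache.popitem(last=False)
            evictions.append(victim)
        cache[name] = True
    return evictions, list(cache)


# A-D fill the cache, E evicts A, the assumed hits on B, C, E leave D as LRU,
# and F then evicts D, matching the two replacements described for FIG. 30.
trace = ["A", "B", "C", "D", "E", "B", "C", "E", "F"]
evicted, final = simulate_lru(4, trace)
```

`evicted` is `["A", "D"]` and `final` lists `B, C, E, F` in recency order. The figure draws E in A's former slot, but an `OrderedDict` tracks recency rather than physical slot positions.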

FIG. 31 is a conceptual diagram showing an unstructured model caching method according to an embodiment of the present invention.

FIG. 31 discloses a method for caching unstructured models.

Referring to FIG. 31, as described above, workspace hubs may include a shared workspace hub implemented on a shared architecture and a dedicated workspace hub implemented on a dedicated architecture. Both kinds of hub may contain workspaces. A shared workspace hub may contain multiple workspaces belonging to multiple users, and these workspaces share the data processing engine and processing resources allocated to the hub. A dedicated workspace hub contains the workspace of a single user. It may be allocated a data processing engine and processing resources for that user's exclusive use.

In an embodiment of the present invention, the default caching algorithm may be set differently depending on the characteristics of the workspace.

In a shared workspace hub (or shared workspace), the TLRU (time-aware least recently used) caching algorithm may be used.

The TLRU caching algorithm may assign a timestamp to each artificial intelligence model loaded into the cache as a cache object. With timestamps, the cache can evict not only the least recently used model but also any model whose assigned time has expired.

Because multiple users may share the same computing resources in a shared workspace, the TLRU caching algorithm allows models to be evicted from the cache more efficiently.

In a dedicated workspace hub (or dedicated workspace), the LRU caching algorithm may be used as the default. In a dedicated workspace, the caching algorithm may also be changed based on the user's model usage data and model usage logs.
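Per-workspace policy selection can be sketched as follows:

- A shared workspace gets TLRU.
- A dedicated workspace defaults to LRU.
- A dedicated workspace may switch policies after its model usage log is analyzed.

The text does not say how the log is used. This sketch assumes the log is replayed against each candidate policy and that the policy with the most cache hits wins. Only LRU and the "evict most recently added" (fourth) policy are compared here. The policy labels and function names are hypothetical.

```python
from collections import OrderedDict


def lru_hits(log, capacity):
    cache, hits = OrderedDict(), 0
    for name in log:
        if name in cache:
            hits += 1
            cache.move_to_end(name)
        else:
            if len(cache) >= capacity:
                cache.popitem(last=False)  # evict least recently used
            cache[name] = True
    return hits


def mra_hits(log, capacity):
    cache, hits = [], 0
    for name in log:
        if name in cache:
            hits += 1
        else:
            if len(cache) >= capacity:
                cache.pop()  # evict most recently added
            cache.append(name)
    return hits


def choose_policy(workspace_kind, usage_log=None, capacity=2):
    if workspace_kind == "shared":
        return "TLRU"  # shared resources: age out stale models
    if workspace_kind != "dedicated":
        raise ValueError(f"unknown workspace kind: {workspace_kind}")
    if not usage_log:
        return "LRU"  # default for dedicated workspaces
    scores = {"LRU": lru_hits(usage_log, capacity),
              "MRA": mra_hits(usage_log, capacity)}
    best = max(scores, key=scores.get)
    # Keep the default unless an alternative is strictly better.
    return best if scores[best] > scores["LRU"] else "LRU"
```

A cyclic log such as A, B, C repeated with capacity 2 is a known worst case for LRU, which scores zero hits. The most-recently-added policy keeps A resident on that log, so the selector switches to it.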

The embodiments of the present invention described above may be implemented as program instructions executable by various computer components and recorded on a computer-readable recording medium. The computer-readable recording medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be specially designed and configured for the present invention, or may be known to and usable by those skilled in the field of computer software.

Examples of computer-readable recording media include:
- magnetic media such as hard disks, floppy disks, and magnetic tape;
- optical recording media such as CD-ROMs and DVDs;
- magneto-optical media such as floptical disks; and
- hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory.

Examples of program instructions include not only machine code, such as that produced by a compiler, but also high-level language code that a computer can execute using an interpreter or the like. The hardware device may be configured to operate as one or more software modules to perform the processing according to the present invention, and vice versa.

Although the present invention has been described above with reference to specific details, such as particular components, and to limited embodiments and drawings, these are provided only to aid a general understanding of the invention. The invention is not limited to these embodiments, and those of ordinary skill in the art to which the invention pertains can make various modifications and changes based on this description.

Therefore, the spirit of the present invention should not be limited to the embodiments described above. The scope of the spirit of the present invention includes not only the claims set out below but also all scopes equivalent to, or equivalently modified from, those claims.

Claims

A method of embedding unstructured data in a data processing system, the method comprising:
converting, by an embedding artificial intelligence model group of the data processing system, the unstructured data into an embedding value; and
storing, by a database of the data processing system, the embedding value,
wherein the data processing system includes a workspace,
the workspace processes structured data and unstructured data based on a nested query,
the unstructured data is processed based on an unstructured data processing query,
the structured data is processed based on a structured data processing query,
the nested query is a query that combines a first query for the unstructured data and a second query for the structured data,
the unstructured data processing query is a query for processing only the unstructured data, and
the structured data processing query is a query for processing only the structured data.
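Claim 1 describes a nested query that combines a first query touching only unstructured data with a second query touching only structured data. The claims do not give the query syntax or say how the two results are combined, so this sketch assumes one plausible arrangement. The first query ranks stored embedding values by cosine similarity, and the IDs it returns then parameterize a relational second query. Python's built-in `sqlite3` stands in for the general SQL engine (the claims name PostgreSQL), and an in-memory dict stands in for the embedding store. All function names, tables, and data are hypothetical.

```python
import math
import sqlite3

# Hypothetical embedding store standing in for the unstructured-data side
# (document id -> embedding value produced by an embedding model).
EMBEDDINGS = {
    1: [0.9, 0.1, 0.0],
    2: [0.8, 0.2, 0.1],
    3: [0.0, 0.1, 0.9],
}


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0 or nv == 0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)


def unstructured_query(query_vec, top_k):
    """First query: reads only unstructured data (the embedding values)."""
    ranked = sorted(EMBEDDINGS, key=lambda i: cosine(query_vec, EMBEDDINGS[i]),
                    reverse=True)
    return ranked[:top_k]


def structured_query(conn, ids, min_year):
    """Second query: reads only structured data (relational rows)."""
    placeholders = ",".join("?" for _ in ids)
    sql = (f"SELECT id, title FROM documents "
           f"WHERE id IN ({placeholders}) AND year >= ? ORDER BY id")
    return conn.execute(sql, (*ids, min_year)).fetchall()


def nested_query(conn, query_vec, top_k, min_year):
    """Nested query: the result of the first query parameterizes the second."""
    ids = unstructured_query(query_vec, top_k)
    return structured_query(conn, ids, min_year) if ids else []


def make_demo_db():
    """In-memory SQLite table standing in for the structured-data store."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE documents (id INTEGER PRIMARY KEY, title TEXT, year INTEGER)")
    conn.executemany(
        "INSERT INTO documents VALUES (?, ?, ?)",
        [(1, "scan A", 2021), (2, "scan B", 2023), (3, "report C", 2023)],
    )
    return conn
```

In the demo data, the query vector `[1.0, 0.0, 0.0]` with `top_k=2` selects documents 1 and 2 on the unstructured side. The structured filter `year >= 2022` then keeps only document 2. Each half reads only one kind of data, which mirrors the separation the claim describes.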

The method of claim 1, wherein
the data processing system performs artificial intelligence engine modeling using the unstructured data and the structured data on a single platform, based on extended structured query language (SQL), without a separate batch process,
the processing of the nested query is performed based on:
a step in which the data processing system processes the unstructured data using an extended SQL engine that processes the extended SQL; and
a step in which the data processing system processes the structured data using a general SQL engine that processes PostgreSQL,
the extended SQL is a query language defined to process the unstructured data, and
PostgreSQL is a query language defined to process the structured data.

The method of claim 1, wherein
the embedding artificial intelligence model group includes at least one different artificial intelligence model that generates the required embedding value,
the embedding value is determined based on a feature map of the at least one different artificial intelligence model,
the embedding value is generated by a convert statement based on an extended SQL engine, and
the embedding value is searched for by a search statement based on the extended SQL engine.
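Claim 3 states that the embedding value is determined from a model's feature map, generated by a convert statement, and searched for by a search statement. The extended SQL syntax for these statements is not given in this section, so the sketch below models them as plain Python methods. `convert` and `search` are hypothetical analogues, not the actual statements. It also assumes that global average pooling followed by L2 normalization turns a feature map into the embedding value, and that cosine similarity ranks results. The two toy "models" in `MODEL_GROUP` are stand-ins that show how a model group can hold a different model per data type.

```python
import math


def global_average_pool(feature_map):
    """Reduce a C x N feature map (C channels, N positions) to a C-dim vector."""
    return [sum(ch) / len(ch) if ch else 0.0 for ch in feature_map]


def l2_normalize(vec):
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


# Hypothetical embedding AI model group: one toy "model" per data type, each
# returning a 2-channel feature map. Real models (e.g. a CNN for images or a
# text encoder) would replace these stand-ins.
MODEL_GROUP = {
    "image": lambda pixels: [list(pixels), [1.0 - p for p in pixels]],
    "text": lambda text: [[float(c.isalpha()) for c in text],
                          [float(c.isdigit()) for c in text]],
}


def embed(data_type, data):
    """Feature map of the selected model -> pooled, normalized embedding value."""
    return l2_normalize(global_average_pool(MODEL_GROUP[data_type](data)))


class EmbeddingStore:
    """Toy stand-in for the database that stores embedding values."""

    def __init__(self):
        self.rows = {}  # key -> (data_type, embedding value)

    def convert(self, key, data_type, data):
        """Analogue of a convert statement: generate and store an embedding value."""
        self.rows[key] = (data_type, embed(data_type, data))
        return self.rows[key][1]

    def search(self, data_type, data, top_k=1):
        """Analogue of a search statement: rank stored embeddings of the same type.

        Embeddings are unit length, so the dot product equals cosine similarity.
        """
        q = embed(data_type, data)
        scored = [(sum(a * b for a, b in zip(q, emb)), key)
                  for key, (t, emb) in self.rows.items() if t == data_type]
        scored.sort(reverse=True)
        return [key for _, key in scored[:top_k]]
```

Only the compact embedding value is stored, not the raw input, which matches the document's stated aim of reducing data size. Restricting `search` to rows of the same data type is a design assumption here. It keeps embeddings produced by different models from being compared directly.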

A data processing system that performs embedding of unstructured data, wherein
an embedding artificial intelligence model group is implemented to convert the unstructured data into an embedding value,
a database is implemented to store the embedding value,
the data processing system includes a workspace,
the workspace processes structured data and unstructured data based on a nested query,
the unstructured data is processed based on an unstructured data processing query,
the structured data is processed based on a structured data processing query,
the nested query is a query that combines a first query for the unstructured data and a second query for the structured data,
the unstructured data processing query is a query for processing only the unstructured data, and
the structured data processing query is a query for processing only the structured data.

The data processing system of claim 4, wherein
the data processing system performs artificial intelligence engine modeling using the unstructured data and the structured data on a single platform, based on extended structured query language (SQL), without a separate batch process,
the processing of the nested query is performed based on:
a step in which the data processing system processes the unstructured data using an extended SQL engine that processes the extended SQL; and
a step in which the data processing system processes the structured data using a general SQL engine that processes PostgreSQL,
the extended SQL is a query language defined to process the unstructured data, and
PostgreSQL is a query language defined to process the structured data.

The data processing system of claim 4, wherein
the embedding artificial intelligence model group includes at least one different artificial intelligence model that generates the required embedding value,
the embedding value is determined based on a feature map of the at least one different artificial intelligence model,
the embedding value is generated by a convert statement based on an extended SQL engine, and
the embedding value is searched for by a search statement based on the extended SQL engine.