KR20190134983A

KR20190134983A - Method for providing big data-based artificial intelligence integration platform service

Info

Publication number: KR20190134983A
Application number: KR1020190059091A
Authority: KR
Inventors: 박병훈
Original assignee: 박병훈
Priority date: 2018-05-18
Filing date: 2019-05-20
Publication date: 2019-12-05
Also published as: KR102236302B1

Abstract

Provided is a method for providing services of an integrated platform in which a big data platform and an artificial intelligence platform are integrated. According to the present invention, the big data platform and the artificial intelligence platform are implemented in a cluster-based distributed architecture so that data processing, machine learning, and artificial intelligence services are provided at high volume, in real time, at high speed, and with high availability. Combining the two platforms makes real-time big data-based machine learning and artificial intelligence services possible, thereby significantly reducing the time and cost required to build a real-time big data-based artificial intelligence system in various application fields.

Description

Big data-based AI integrated platform service method {METHOD FOR PROVIDING BIG DATA-BASED ARTIFICIAL INTELLIGENCE INTEGRATION PLATFORM SERVICE}

The present invention relates to the field of artificial intelligence platforms and, more particularly, to a method for providing a big data-based artificial intelligence integrated platform service.

The big data technology underlying the present technology collects data from various sources, stores the original data, processes and analyzes the data, and stores and visualizes the results.

In addition, artificial intelligence technology performs machine learning based on a learning data set, and implements and provides artificial intelligence services based on that machine learning.

However, distributed technology has been applied only to a limited extent in existing big data systems and artificial intelligence systems, which limits their ability to guarantee high volume, real-time operation, high speed, and high availability in data processing, machine learning, and artificial intelligence service provision. In addition, because the big data system and the artificial intelligence system are built individually and operated separately, the data processing results collected and analyzed in the big data system are not reflected in the artificial intelligence system in real time, so real-time big data-based machine learning and artificial intelligence services cannot be provided.

The present invention provides a big data-based artificial intelligence platform service method that combines a big data platform and an artificial intelligence platform.

The present invention relates to the architecture of a big data platform and an artificial intelligence platform used in various businesses, and to the combination of the two platforms. More specifically, the real-time collection, message queue, real-time processing, and storage components of the big data platform are implemented in a cluster-based distributed architecture to ensure high volume, real-time operation, high speed, and high availability of data processing. Likewise, the machine learning, artificial intelligence service, and learning model / learning result repository components of the artificial intelligence platform are implemented in a cluster-based distributed architecture to ensure the same properties for machine learning and artificial intelligence service provision. In addition, combining the big data platform and the artificial intelligence platform, both built on a distributed architecture, makes it possible to provide real-time big data-based machine learning and artificial intelligence services.

According to the present invention, an integrated platform service method in which a big data platform and an artificial intelligence platform are integrated is provided. Implementing the integrated platform in a cluster-based distributed architecture ensures high volume, real-time operation, high speed, and high availability of data processing, machine learning, and artificial intelligence service provision. The combination of the two platforms enables real-time big data-based machine learning and artificial intelligence services. A service method using this integrated platform can drastically reduce the time and cost required to build a real-time big data-based artificial intelligence system in various application fields.

FIG. 1 is a diagram illustrating an operation process of the big data platform in the integrated platform system of the present invention.
FIG. 2 is a diagram illustrating an operation process of the artificial intelligence platform in the integrated platform system of the present invention.
FIG. 3 is a diagram illustrating the structure of a big data-based artificial intelligence integrated platform system according to the present invention.
FIG. 4 is a diagram illustrating an operation process of the integrated platform system of FIG. 3.

Hereinafter, embodiments of the present invention will be described in detail with reference to the accompanying drawings. In describing the embodiments, a detailed description of a related known function or configuration is omitted where it is judged that it would unnecessarily obscure the subject matter of the present invention.

FIG. 1 illustrates the functional flow in the big data platform from real-time collection of data, through storage and delivery in a message queue and real-time processing, to storage. Implementing the real-time collection, message queue, real-time processing, and storage systems in a cluster-based distributed architecture ensures high volume, real-time operation, high speed, and high availability of data processing. Data sources are of two kinds: those used as artificial intelligence learning data and those that request artificial intelligence services.

Data used as artificial intelligence learning data passes through real-time learning data collection, the learning data collection queue, and real-time preprocessing, is stored in the learning data repository, and is provided to the artificial intelligence platform as learning data. Data requesting an artificial intelligence service passes through real-time data collection, the AI service request queue, and real-time analysis, where it is used to call the artificial intelligence service, and the result is stored in the service result repository.
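The two data flows described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the record layout (`kind`, `payload`), the function names, and the in-memory queues and repositories are all hypothetical stand-ins, since the description does not specify an API, and a single process stands in for the clusters.

```python
from collections import deque

# Hypothetical in-memory stand-ins for the queues and repositories.
learning_queue = deque()   # learning data collection queue
request_queue = deque()    # AI service request queue
learning_store = []        # learning data repository
result_store = []          # service result repository


def collect(record):
    """Real-time collection: route a record to the queue matching its kind."""
    if record["kind"] == "learning":
        learning_queue.append(record)
    elif record["kind"] == "request":
        request_queue.append(record)
    else:
        raise ValueError(f"unknown record kind: {record['kind']!r}")


def preprocess(record):
    """Placeholder preprocessing: keep only the payload."""
    return record["payload"]


def ai_service(payload):
    """Stand-in for a deployed AI service; here it just sums the inputs."""
    return sum(payload)


def process_once():
    """Real-time processing: drain both queues into their repositories."""
    while learning_queue:
        learning_store.append(preprocess(learning_queue.popleft()))
    while request_queue:
        req = request_queue.popleft()
        result_store.append(
            {"input": req["payload"], "result": ai_service(req["payload"])}
        )


collect({"kind": "learning", "payload": [1.0, 2.0]})
collect({"kind": "request", "payload": [3.0, 4.0]})
process_once()
```

After `process_once()` runs, the learning record sits in `learning_store` and the request, together with the service output, sits in `result_store`.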

FIG. 4 illustrates the machine learning and artificial intelligence service functions of the artificial intelligence platform. Implementing machine learning, artificial intelligence services, and the learning model / learning result repository in a cluster-based distributed architecture ensures high volume, real-time operation, high speed, and high availability when solving prediction, classification, clustering, anomaly detection, and reinforcement learning problems through machine learning and artificial intelligence services.

The present invention proposes an integrated platform, and a service method thereof, capable of providing real-time big data-based artificial intelligence services by combining a big data platform and an artificial intelligence platform that are both based on a distributed architecture. The learning data set, which is the result of data analysis in the big data platform, is reflected in the learning model design of the artificial intelligence platform in real time, so that an optimal learning model is derived in real time; artificial intelligence services are then implemented and deployed on that basis, making it possible to provide real-time distributed artificial intelligence services. Furthermore, for efficient management of the combined platform, it performs installation and service management, node and service status monitoring, engine analysis and log management, real-time detection monitoring, result management, and model management.

FIG. 3 is a diagram illustrating the structure of a big data-based artificial intelligence integrated platform system according to the present invention. FIG. 4 is a diagram illustrating an operation process of the integrated platform system of FIG. 3.

The integrated platform of the present invention includes a big data platform and an artificial intelligence platform. Composed of these two platforms, it provides various artificial intelligence service packs and is configured so that a service pack can be customized easily and quickly to fit the user's business.

Once the purpose and the learning data set are determined, a service pack for a specific business is configured solely through platform service settings, without coding. Because it is developed on a platform that supports the entire life cycle of learning model design, learning, API generation, and deployment, just-in-time delivery is possible. In addition, the platform service is operated so that it can be scaled out and scaled in in real time as needed.

The artificial intelligence platform provides development and operating environments suited to the needs of various users and, at the customer's request, interworks with the application services already used by existing facilities and equipment, so as to offer flexible usability.

Services can be developed by configuring algorithms, with no separate development work (No Code Dev.), and on-demand services, developed according to the customer's request, can be produced.

The artificial intelligence platform provides integrated analytics development and makes it convenient for users to carry out the whole sequence of data generation, model design, model training, model testing, and inference.

It provides a framework and an analysis engine for artificial intelligence learning and execution.

A service is composed of Domain Specific Services, each of which binds a learning data set to a general-purpose learning algorithm, and Composite Services, each of which is built from a combination of several Domain Specific Services.
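One way to picture this service composition is the sketch below. It is an assumption-laden illustration, not the platform's data model: the class names mirror the terms in the text, but the fields, the `build` method, and the toy `mean_distance` algorithm are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

Model = Callable[[float], float]
Algorithm = Callable[[Sequence[float]], Model]


def mean_distance(data: Sequence[float]) -> Model:
    """Toy general-purpose algorithm: learn the mean, score distance from it."""
    center = sum(data) / len(data)
    return lambda x: abs(x - center)


@dataclass
class DomainSpecificService:
    """A general algorithm bound to one learning data set."""
    name: str
    algorithm: Algorithm
    data_set: Sequence[float]

    def build(self) -> Model:
        return self.algorithm(self.data_set)


@dataclass
class CompositeService:
    """A combination of several Domain Specific Services."""
    name: str
    parts: List[DomainSpecificService]

    def build(self) -> Callable[[float], List[float]]:
        models = [part.build() for part in self.parts]
        return lambda x: [model(x) for model in models]


temperature = DomainSpecificService("temperature", mean_distance, [1.0, 3.0])
pressure = DomainSpecificService("pressure", mean_distance, [10.0, 20.0])
combined = CompositeService("plant-monitor", [temperature, pressure]).build()
scores = combined(4.0)
```

The same algorithm is reused with two different data sets, which is the point of the binding: only the data set changes per domain, and the composite simply evaluates each bound service in turn.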

The integrated platform service method of the present invention provides extensibility of the analysis engine, guarantees scale-out of artificial intelligence services, and supports ensemble services. When a platform user selects a learning data set and a business purpose (prediction, classification, anomaly detection, clustering, etc.), the platform automatically recommends the optimal algorithm (single or ensemble), automatically configures the parameters and network of the recommended algorithm to deliver the optimal learning result, and automatically deploys the service so that it is available immediately.

The integrated platform on which the service method of the present invention is provided may include the following clusters.

Real-Time Collection Cluster

· The real-time collection cluster fetches data from data sources and stores it in a message queue.

· The real-time collection cluster also provides simple data filtering and transformation functions.

· The real-time collection cluster provides real-time and batch processing functions, as well as collection status monitoring and job logs.

Message Queue Cluster

· Receives data from data providers in real time and stores it in a queue.

· Provides data to data consumers in real time.

· Retains data for a set period, so that past data can be queried again.

· When a large amount of data is collected, the message queues are configured in parallel.

· Parallel message queues can affect the collection time of the data.

· This is resolved by attaching a separate order tag.
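The order tag for parallel message queues can be illustrated with a small sketch. It assumes that the concern is the relative order of messages once they are spread across parallel queues, which the text does not state explicitly; the function names, the round-robin partitioning, and the integer tag are hypothetical choices.

```python
import heapq
from itertools import count


def tag_and_partition(messages, num_queues):
    """Attach an increasing order tag and spread messages round-robin."""
    queues = [[] for _ in range(num_queues)]
    sequence = count()
    for message in messages:
        tag = next(sequence)
        queues[tag % num_queues].append((tag, message))
    return queues


def merge_in_order(queues):
    """Merge the parallel queues back into collection order using the tags.

    Each queue is already sorted by tag, so a k-way merge is sufficient.
    """
    merged = heapq.merge(*queues, key=lambda item: item[0])
    return [message for _tag, message in merged]


parallel_queues = tag_and_partition(list("abcdefg"), 3)
restored = merge_in_order(parallel_queues)
```

Because the tag is assigned before partitioning, a consumer can reconstruct the original sequence no matter which queue delivered a message first.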

Real-Time Processing Cluster

· Stores data in the repository in real time.

· Handles all required analysis functions, such as data filtering, aggregation, and statistics.

· Calls the AI service and stores the analysis results.

Artificial Intelligence Data Cluster

· Generates the data to be used for artificial intelligence learning and services.

· Converts data into the form required for artificial intelligence through data normalization, data type conversion, data verification, and feature extraction.

· Distributes the work across multiple GPUs to speed up data conversion.
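The conversion steps listed for the artificial intelligence data cluster might look like the following in miniature. This is a sketch under assumptions: the order of the steps, min-max scaling as the normalization, and the toy features are illustrative choices not taken from the text, and the multi-GPU distribution is not shown.

```python
import math


def to_floats(row):
    """Type conversion: parse each field as a float, or None if it fails."""
    converted = []
    for field in row:
        try:
            converted.append(float(field))
        except (TypeError, ValueError):
            converted.append(None)
    return converted


def is_valid(row):
    """Verification: every field must be a finite number."""
    return all(v is not None and math.isfinite(v) for v in row)


def min_max_normalize(rows):
    """Normalization: scale each column to [0, 1]; constant columns map to 0.0."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


def extract_features(row):
    """Toy feature extraction: mean and spread of a normalized row."""
    return {"mean": sum(row) / len(row), "spread": max(row) - min(row)}


def prepare(raw_rows):
    """Run conversion, verification, normalization, and feature extraction."""
    converted = [to_floats(r) for r in raw_rows]
    valid = [r for r in converted if is_valid(r)]
    return [extract_features(r) for r in min_max_normalize(valid)]


features = prepare([["1", "10"], ["3", "30"], ["x", "5"], ["2", "20"]])
```

The row containing `"x"` fails type conversion and is dropped at the verification step, so only three feature records are produced.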

Learning (Deep Learning) Cluster

· Designs the model.

· Trains the designed model by linking it with the data.

· Sets a range for the hyperparameters needed for training and runs multiple training jobs concurrently in a distributed environment to find the optimal hyperparameters.

· Applies a number of similar algorithms to find the optimal model.

· Uses multiple GPUs and distributes training across multiple nodes to speed up training.

· A trained model consists of the input data structure, the output data structure, the model's graph structure, and any other required data.

· A completed learning model can be reloaded and trained further on additional data.

· The completed training result can be deployed to the model repository of the service cluster to start the service.

· Distributed training using GPUs divides the training data by the number of participating GPUs, trains on each part, and aggregates the results.
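The last point, dividing the training data among the participating GPUs and aggregating the results, corresponds to what is commonly called data-parallel training. The sketch below illustrates the idea on a one-parameter linear model; it is an assumption that gradients are what gets aggregated, the workers are simulated sequentially in plain Python rather than on GPUs, and all names are hypothetical.

```python
def shard(data, num_workers):
    """Split the data into num_workers nearly equal contiguous shards."""
    size, extra = divmod(len(data), num_workers)
    shards, start = [], 0
    for i in range(num_workers):
        end = start + size + (1 if i < extra else 0)
        shards.append(data[start:end])
        start = end
    return shards


def local_gradient(w, shard_data):
    """Sum of squared-error gradients for y = w * x on one worker's shard."""
    return sum(2 * (w * x - y) * x for x, y in shard_data)


def train(data, num_workers, lr=0.01, steps=200):
    """Gradient descent where each step aggregates per-shard gradients."""
    w = 0.0
    shards = shard(data, num_workers)
    for _ in range(steps):
        # Each worker (GPU) would compute its gradient in parallel;
        # here the workers are simulated one after another.
        gradients = [local_gradient(w, s) for s in shards]
        # Aggregate the partial results, then update the shared weight.
        w -= lr * sum(gradients) / len(data)
    return w


samples = [(float(x), 3.0 * x) for x in range(1, 9)]  # true weight is 3
w_parallel = train(samples, num_workers=4)
w_single = train(samples, num_workers=1)
```

Summing the per-shard gradients and dividing by the total sample count gives the same update as a single worker processing the whole batch, which is why `w_parallel` and `w_single` agree up to floating-point rounding.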

AI Service Cluster & App Cluster

· Operates a model repository and serves the models registered in it.

· Performs graph optimization when a model is registered in the model repository to improve serving speed.

· Different versions of the same model can be registered, and a service ratio can be set for each version.

· Services can be configured in parallel to handle service requests.
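The per-version service ratio can be illustrated with weighted random routing. This is a minimal sketch assuming the ratio means the share of requests each version receives; the `ModelRepository` class and its methods are hypothetical and do not describe the platform's actual interface.

```python
import random


class ModelRepository:
    """Hypothetical repository that routes requests by per-version weight."""

    def __init__(self, seed=None):
        self._versions = {}  # model name -> {version: (model, weight)}
        self._rng = random.Random(seed)

    def register(self, name, version, model, weight):
        """Register one version of a model with its service ratio weight."""
        if weight < 0:
            raise ValueError("weight must be non-negative")
        self._versions.setdefault(name, {})[version] = (model, weight)

    def pick_version(self, name):
        """Choose a version at random, in proportion to its weight."""
        entries = self._versions[name]
        versions = list(entries)
        weights = [entries[v][1] for v in versions]
        return self._rng.choices(versions, weights=weights, k=1)[0]

    def predict(self, name, x):
        """Serve one request and report which version answered it."""
        version = self.pick_version(name)
        model, _weight = self._versions[name][version]
        return version, model(x)


repo = ModelRepository(seed=0)
repo.register("scorer", "v1", lambda x: x + 1, weight=0.9)
repo.register("scorer", "v2", lambda x: x + 2, weight=0.1)
picks = [repo.pick_version("scorer") for _ in range(10_000)]
share_v1 = picks.count("v1") / len(picks)
```

With weights of 0.9 and 0.1, roughly nine out of ten requests go to `v1`, which is one way a new version could be exposed to a small share of traffic before taking over.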

The integrated platform service providing method of the present invention also has the following features.

- Combining the real-time big data process with the artificial intelligence platform ensures high-speed, real-time, highly available processing of large volumes of data.

- Real-time big data processing can be used for real-time learning.

- Real-time big data processing can be used for real-time artificial intelligence services (prediction, classification, anomaly detection, clustering, reinforcement learning, etc.).

- An artificial intelligence distributed learning platform is provided.

- Artificial intelligence distributed services are provided.

- Distributed queues are used for stable processing of learning data and service requests.

- Real-time distributed big data processing technology is used for real-time preprocessing and real-time artificial intelligence services.

- Distributed storage is used to store large volumes of learning data and result data.

- Service results are reflected in the learning data, enabling additional learning, re-learning, and reinforcement learning in real time.

- API services are made available in real time, supporting API-style calls from both inside and outside the platform.

In the foregoing detailed description of the present invention, specific embodiments have been described, but it will be apparent to those skilled in the art that various modifications can be made without departing from the scope of the present invention.

Claims

An integrated platform service method for a big data platform and an artificial intelligence platform, the method comprising:
collecting, by a real-time collection cluster, learning data from a data source and request data from an AI service user in real time;
storing, by a learning data collection queue and an AI request queue of a message queue cluster, the learning data and the request data from the real-time collection cluster, respectively;
preprocessing, by a real-time processing cluster, data from the learning data collection queue in real time, and calling an AI service according to the AI request queue to perform real-time analysis;
storing, by an artificial intelligence data cluster, the data preprocessed in the real-time processing cluster in a learning data repository, and storing the result of the real-time analysis in a result repository;
designing, by a machine learning cluster, a learning model and performing machine learning using data of the learning data repository;
storing, by an AI service cluster, the learning model of the machine learning cluster in a learning model repository and storing a learning result in a learning result repository; and
distributing, by an App cluster, the learning result as a service.

The method according to claim 1,
wherein the artificial intelligence data cluster converts data into the data required for artificial intelligence through data normalization, data type conversion, data verification, and feature extraction.

The method according to claim 1,
wherein the machine learning cluster is configured to reload a completed learning model and apply additional data to perform further learning.