KR102318533B1 - Method and System for GPU-based Embedded Edge Server Configuration and Neural Network Service Utilization

Info

Publication number: KR102318533B1
Application number: KR1020200079401A
Authority: KR
Inventors: 김덕환; 김주환; 울라샨
Original assignee: 인하대학교 산학협력단
Priority date: 2020-06-29
Filing date: 2020-06-29
Publication date: 2021-10-28

Abstract

Disclosed are a method for configuring a GPU-based embedded edge server and utilizing a neural network service, which implements a single board-based edge cluster equipped with a GPU supporting a container environment, and a system thereof. According to the present invention, the method comprises the following steps: executing a device query application on all edge servers and transmitting update of an extended resource for a GPU specification to an API server as an HTTP patch request to register the update in all nodes; confirming a service request for requesting a desired service from a client to an API server of an edge existing locally; after confirming the service request, receiving, by the edge server, external data for the service request from an IoT data source transmitting/receiving data by using a stream socket; allocating a neural network pod by using Kubernetes scheduling for the received external data; and processing, by the assigned pod, the external data through the neural network.

Description

{Method and System for GPU-based Embedded Edge Server Configuration and Neural Network Service Utilization}

The present invention relates to a method and system for configuring a GPU-based embedded edge server and utilizing a neural network service.

In recent years, interest in artificial intelligence (AI) has been growing in many fields. Among them, the Internet of Things (IoT) is a domain in which the need for AI is becoming more important as the variety, scale, and environments of the data available to users increase. Because AI services for IoT devices are currently delivered from a central cloud over the Internet, they are affected by considerations concerning the performance and management of cloud services, such as low latency, ease of development, database queries, and scaling for resource utilization. Cloud vendors are examining new cloud models that operate in different ways to handle the massive amounts of data required by AI models. Among these models, edge computing is being discussed as a new processing approach that can compensate for the shortcomings of the central cloud structure. The US research firm Gartner reported the autonomous edge as one of the top 10 strategic technologies announced in 2020, and IBM selected edge computing and leveraging Kubernetes as methodologies for network evolution in 2020.
Edge computing, regarded as the core of the next-generation cloud, refers to a paradigm that uses real-time data according to the surrounding environment and user needs, instead of processing all data on an existing central server. It is based on computing devices located in the edge area adjacent to the user plane, such as micro data centers, cloudlets, and fog nodes. Its advantages over the conventional cloud are low latency, traffic distribution, and protection of private data.

FIG. 1 is a diagram showing the structure of edge computing.

Research on edge computing has been conducted from various perspectives. A three-layer structure (user plane, edge computing plane, and cloud computing plane) comprising the cloud, mobile edge computing (MEC), and IoT has been proposed in the prior art. Other work includes studies proposing the use of edge computing solutions to organize services, content, and functions close to the end user; a hierarchical MEC structure composed of components named field, shallow, and deep cloudlets; and a consideration of the edge computing configuration problem using transparent computing to improve service efficiency, together with a proposed new approach. Nebula, an edge computing platform configuration in which computation and storage interact to support grid and peer-to-peer systems, has also been proposed.

In terms of the planning and management of edge computing, research has addressed capacity planning that allows CPU and GPU resources to meet QoS requirements, average server utilization, and the latency of deploying applications to the MEC. The relationship between edge computing and the delivery of dynamically computed web content, and its effects, have also been studied.

In terms of resource management and provisioning in edge computing, dynamic service provisioning in the edge cloud has been reported, based on calculating the cost of loading a new service onto resource-constrained nodes, the limited node capacity, and the cost of forwarding requests. Migration using a Markov Decision Process (MDP) is based on responding to user movement and network performance.

Furthermore, building on the prior art, some studies have been carried out on various types of testbeds. These include an evaluation of edge computing based on a container environment and three edge cloud implementations of a container-based PaaS architecture on a cluster of Raspberry Pi single-board computers. A performance comparison of machine learning packages such as TensorFlow, Caffe2, MXNet, PyTorch, and TensorFlow Lite on edge devices has also been performed. However, because single boards such as the Raspberry Pi used in these earlier studies are limited in hardware specifications for implementing neural networks, edge computing needs to be implemented on single-board computers (ARM core-based CPU) equipped with a GPU (Nvidia GPU). Moreover, to use the GPU resources of the cluster on a per-container basis, it is necessary to check the specifications required by the neural network service and to monitor the GPU resources of each edge node. The existing techniques do not take the container environment of edge devices into account.

The technical problem to be solved by the present invention is to propose an embedded edge server that builds a container environment with Kubernetes and a new Pod allocation method based on the GPU resources of each node, which are identified by running a device query job. Two neural network models, object detection and driver profiling, are then used to build service applications organized as Kubernetes Pods, and the default Pod allocation method is compared with the new Pod allocation method.

In one aspect, the method for configuring a GPU-based embedded edge server and utilizing a neural network service proposed in the present invention includes: executing a device query application on all edge servers and transmitting an update of the extended resources for the GPU specification to the API server as an HTTP PATCH request so that it is registered on all nodes; confirming a service request in which a client requests a desired service from the API server of a locally existing edge; after the service request is confirmed, receiving, by the edge server, external data for the service request from an IoT data source that transmits and receives data using a stream socket; allocating a neural network Pod to the received external data using Kubernetes scheduling; and processing, by the allocated Pod, the external data through the neural network.

In the step of confirming the service request in which the client requests a desired service from the API server of the locally existing edge, the program that communicates for each client request uses an authentication library that can access the cluster and the Kubernetes API that supports internal cloud operations.

In the step of allocating a neural network Pod to the received external data using Kubernetes scheduling, the newly added GPU resource is set as a new filtering condition in the configuration file of the neural network Pod.

The step of allocating a neural network Pod to the received external data using Kubernetes scheduling includes: a node filtering step that identifies valid nodes through volume filtering, which checks node validity with respect to mountable volumes and conflicts between volumes, resource filtering, which checks the ports requested by the Pod and whether resources are sufficient, and topology filtering, which identifies Kubernetes components including tolerations and selectors to support scheduling according to the cluster topology; a node priority calculation step that determines the label conditions of the nodes and assigns weights to adjust priorities; and an actual scheduling step that allocates the neural network Pod according to the node filtering and the node priority calculation.

In another aspect, the system for configuring a GPU-based embedded edge server and utilizing a neural network service proposed in the present invention includes: an application execution unit that executes a device query application on all edge servers and transmits an update of the extended resources for the GPU specification to the API server as an HTTP PATCH request to register it on all nodes; a service request confirmation unit for data transmission and reception that, after a desired service is requested from the API server and the request is confirmed, receives external data for the request from an IoT data source over a stream socket; a Pod allocation unit that allocates a neural network Pod to the received external data using Kubernetes scheduling; and a data processing unit in which the allocated Pod uses the neural network to process the data that is transmitted externally and stored internally.

According to embodiments of the present invention, a single-board-based edge cluster equipped with GPUs and supporting a container environment can be implemented, a new Pod allocation method that takes GPU resources into account is proposed, and an increase in the number of operational Pods can be confirmed. In addition, a driver profiling deep learning model that can run on resource-constrained edge devices is used to examine service availability.

FIG. 1 is a diagram showing the structure of edge computing.
FIG. 2 is a diagram illustrating the image build steps and the process of running a containerized application.
FIG. 3 is a diagram illustrating Container Runtime Interfaces based on the Open Container Initiative, which standardizes runtimes that support a container environment.
FIG. 4 is a diagram illustrating the types of container orchestration.
FIG. 5 is a flowchart illustrating a method for configuring a GPU-based embedded edge server and utilizing a neural network service according to an embodiment of the present invention.
FIG. 6 shows a structure for realizing an edge server according to an embodiment of the present invention.
FIG. 7 illustrates the operation of an application along the data flow process according to an embodiment of the present invention.
FIG. 8 is a diagram illustrating the configuration of a system for configuring a GPU-based embedded edge server and utilizing a neural network service according to an embodiment of the present invention.

The recently emerging edge computing technology is presented as a new paradigm that compensates for the shortcomings of conventional cloud computing. In particular, edge computing uses local data while serving low-latency service applications. This recent technology requires a neural network approach for running large-scale machine learning on edge servers. The present invention proposes a Pod allocation method that adds various GPU resources in order to improve the efficiency of a Kubernetes-based edge server configuration that uses GPU-based embedded boards and a TensorFlow-based neural network service application. According to experiments performed on the proposed edge server, the bandwidth as a function of time and data size can be inferred for the local environment (20.4 Mbps to 42.4 Mbps) and for the Internet environment for service applications (6.31 Mbps to 25.5 Mbps). When two neural network applications were run on the edge server, the network time of the object detection application ranged from 112.2 ms (Xavier) to 515.8 ms (Nano), and the network time of the driver profiling application ranged from 321.8 ms (Xavier) to 495.7 ms (Nano).
The proposed Pod allocation method performs better than the default Pod allocation method. According to an embodiment of the present invention, the number of Pods that can be allocated across three worker nodes increased from 5 to 7, and the standard deviation of the response time arising from differences in board performance decreased by about 53%. Hereinafter, embodiments of the present invention will be described in detail with reference to the accompanying drawings.

In emerging 5G networks, software networking is a programmable approach that makes broad use of IT virtualization technologies for communication infrastructure, functions, and applications. Edge computing therefore embodies core technologies and architectural concepts that enable the evolution of 5G. Changes in the network are directly tied to the performance of edge computing in terms of low latency, high speed, and structure. These technologies support service stability in the face of network problems at each node of a clustered edge server. In the prior art, edge servers were implemented on various boards to examine stability problems and latency in the user plane.

A container is a virtualization method for running a separate virtual Linux system (a container) on Linux. In a container, layers made up of independent file systems are combined into a single image, the environment is stored using a union file system, and each container uses hardware resources separately. A containerized application can be run from the stored image.

FIG. 2 is a diagram illustrating the image build steps and the process of running a containerized application.

In this process, Linux container technology makes it easy to configure environment changes for each use case through the container runtime, and it makes it possible to implement microservices composed of small units.

The advantages of using containers on an edge server are as follows. First, because of the hardware limitations of edge servers, programs implemented on a hypervisor face considerable size restrictions, whereas programs in a container environment allow an easy basic configuration with light memory usage and fast startup. Second, for scaling, resource management handled at the OS kernel level makes flexible scaling easy, so workloads can be managed efficiently on the edge server.

FIG. 3 is a diagram illustrating Container Runtime Interfaces (CRI) based on the Open Container Initiative (OCI), which standardizes runtimes that support a container environment.

The CRIs listed include CRI-O, Docker, and gVisor. Among them, Docker is the most widely used because it is the oldest of these platforms and has the largest user base. However, a CRI focuses on managing the container configuration within a single host environment. In a large-scale clustered server configuration, it lacks the functions needed for efficient task management. A server configuration therefore requires a platform that provides additional container management.

Containers are increasingly becoming the default environment for DevOps collaboration. At the same time, container orchestration helps manage hardware resources across terminals and supports reliable updating and deployment of work in large-scale distributed computing environments such as the cloud. The need for a container orchestration platform is therefore growing.

FIG. 4 is a diagram illustrating the types of container orchestration.

The types of container orchestration include Docker Swarm, Kubernetes, and Apache Mesos, and Kubernetes is currently the most widely used and the most systematically mature of these. First, unlike other platforms, Kubernetes can be deployed in a variety of environments. Second, it has advantages in container scheduling, placement, and management according to user options. Third, it has advantages in scheduling service tasks in various ways and in using the resources applied to those tasks. Kubernetes implements applications that run on nodes. It uses Pod sets for deployment, and jobs and daemon sets for update operations. In addition, Kubernetes internally uses a separate virtual network to support a per-container environment, and it uses an API server on the master node that supports external access, so that separate service applications can be operated in response to client requests. The internal network checks the Pod connection for each request through the operation of kube-proxy and handles the Pod connections using a distribution algorithm.

Because it is difficult to integrate the individual computing resources, edge computing has mainly been used for preprocessing in the nearby computing area. Furthermore, with the arrival of boards that offer GPU resources usable in the IoT area, such as Nvidia's Jetson boards, the use of GPU resources in the edge computing area is gradually being explored. For example, Nvidia provides TensorRT as well as inference frameworks such as the DeepStream SDK, which simplifies the development of video analytics applications on the Nvidia platform.

In this respect, because they use GPU resources, these frameworks have an advantage over CPUs in artificial neural network processing, which on the edge server mainly handles media data such as video and photos through parallel computing rather than raw sensor data. Fast response times and fast processing times must be achieved together while reducing traffic to the central cloud. NVIDIA, a leading GPU developer, has published a runtime called Nvidia-docker on GitHub to help manage GPU resources on Docker as container environments evolve. Through the Nvidia GPU Cloud (NGC), development tools and neural network models for AI are provided as container images for various platforms.

FIG. 5 is a flowchart illustrating a method for configuring a GPU-based embedded edge server and utilizing a neural network service according to an embodiment of the present invention.

The proposed method for configuring a GPU-based embedded edge server and utilizing a neural network service includes: executing a device query application on all edge servers and transmitting an update of the extended resources for the GPU specification to the API server as an HTTP PATCH request so that it is registered on all nodes (510); confirming a service request in which a client requests a desired service from the API server of a locally existing edge (520); after the service request is confirmed, receiving, by the edge server, external data for the service request from an IoT data source that transmits and receives data using a stream socket (530); allocating a neural network Pod to the received external data using Kubernetes scheduling (540); and processing, by the allocated Pod, the external data through the neural network (550).

In step 510, a device query application is executed on all edge servers, and an update of the extended resources for the GPU specification is transmitted to the API server as an HTTP PATCH request and registered on all nodes.
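As a minimal sketch of what the HTTP PATCH body in step 510 could look like, the following builds a JSON Patch document that advertises extended resources in a node's `status.capacity`, which is the mechanism Kubernetes documents for advertising node-level extended resources. The resource name `example.com/gpu-memory-mb`, the node name, and the helper function names are assumptions for illustration; the patent does not state which resource names or values are registered. Only the payload and target URL are constructed here; sending the request (with content type `application/json-patch+json`) is left out.

```python
import json
from typing import Dict


def escape_json_pointer(segment: str) -> str:
    """Escape one JSON Pointer segment (RFC 6901): '~' -> '~0', '/' -> '~1'."""
    return segment.replace("~", "~0").replace("/", "~1")


def build_extended_resource_patch(resources: Dict[str, int]) -> str:
    """Return a JSON Patch body that adds each resource to status.capacity.

    Extended resource quantities are integers, serialized as strings.
    """
    operations = [
        {
            "op": "add",
            "path": "/status/capacity/" + escape_json_pointer(name),
            "value": str(quantity),
        }
        for name, quantity in sorted(resources.items())
    ]
    return json.dumps(operations)


def node_status_url(api_server: str, node_name: str) -> str:
    """URL of the node status subresource that the PATCH request targets."""
    return f"{api_server.rstrip('/')}/api/v1/nodes/{node_name}/status"


if __name__ == "__main__":
    # Hypothetical resource name and value reported by a device query.
    body = build_extended_resource_patch({"example.com/gpu-memory-mb": 3964})
    print(node_status_url("http://localhost:8001", "jetson-nano-1"))
    print(body)
```

The slash in the resource name has to be written as `~1` in the patch path, which is why the segment is escaped before it is appended.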

In step 520, a service request in which a client requests a desired service from the API server of a locally existing edge is confirmed.

The program that communicates for each client request uses an authentication library that can access the cluster and the Kubernetes API that supports internal cloud operations.

In step 530, after the service request is confirmed, the edge server receives external data for the service request from an IoT data source that transmits and receives data using a stream socket.
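Because a stream socket delivers a byte stream without message boundaries, the receiver in step 530 needs some way to tell where one data item ends. The sketch below assumes a 4-byte big-endian length prefix before each payload; this framing and the function names are illustrative assumptions, since the patent does not specify a wire format.

```python
import socket
import struct

# Assumed framing: each payload is preceded by its length as a
# 4-byte unsigned big-endian integer.
HEADER = struct.Struct("!I")


def send_frame(sock: socket.socket, payload: bytes) -> None:
    """Send one length-prefixed payload over a stream socket."""
    sock.sendall(HEADER.pack(len(payload)) + payload)


def recv_exact(sock: socket.socket, size: int) -> bytes:
    """Read exactly `size` bytes, looping because recv may return fewer."""
    chunks = []
    remaining = size
    while remaining > 0:
        chunk = sock.recv(remaining)
        if not chunk:
            raise ConnectionError("peer closed the connection mid-frame")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def recv_frame(sock: socket.socket) -> bytes:
    """Receive one length-prefixed payload."""
    (length,) = HEADER.unpack(recv_exact(sock, HEADER.size))
    return recv_exact(sock, length)


if __name__ == "__main__":
    # A connected socket pair stands in for the IoT source and the edge server.
    source, edge = socket.socketpair()
    send_frame(source, b"image-bytes")
    print(recv_frame(edge))
    source.close()
    edge.close()
```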

In step 540, a neural network Pod is allocated to the received external data using Kubernetes scheduling. The newly added GPU resource is set as a new filtering condition in the configuration file of the neural network Pod.
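One way to express such a filtering condition, sketched here under assumptions, is to request the extended resource in the Pod's container spec so that the scheduler only considers nodes that advertise enough of it. The resource name `example.com/gpu-memory-mb`, the Pod name, and the image name are hypothetical; the patent does not give the actual configuration file.

```python
import json
from typing import Any, Dict


def neural_network_pod_manifest(
    name: str, image: str, gpu_resource: str, gpu_quantity: int
) -> Dict[str, Any]:
    """Build a Pod manifest that requests an extended GPU resource.

    For extended resources, requests and limits must be equal when both
    are given, so the same quantity is written to both fields.
    """
    quantity = str(gpu_quantity)
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "containers": [
                {
                    "name": name,
                    "image": image,
                    "resources": {
                        "requests": {gpu_resource: quantity},
                        "limits": {gpu_resource: quantity},
                    },
                }
            ]
        },
    }


if __name__ == "__main__":
    manifest = neural_network_pod_manifest(
        name="object-detection",                # hypothetical Pod name
        image="registry.local/object-detect",   # hypothetical image
        gpu_resource="example.com/gpu-memory-mb",
        gpu_quantity=2000,
    )
    print(json.dumps(manifest, indent=2))
```

A node that has not registered the resource, or has too little of it left, would then be excluded during resource filtering.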

The step of allocating a neural network Pod to the received external data using Kubernetes scheduling includes: a node filtering step for identifying valid nodes, made up of three filters in total, namely volume filtering, which checks node validity with respect to mountable volumes and conflicts between volumes, resource filtering, which checks the ports requested by the Pod and whether resources are sufficient, and topology filtering, which identifies Kubernetes components including tolerations and selectors to support scheduling according to the cluster topology; a node priority calculation step that, for the allocatable nodes obtained from node filtering, checks the number of Pod replicas, node availability, resource balance, and Pod toleration and taint conditions to determine the allocation priority and the label conditions of the nodes, and assigns weights to adjust the priority; and an actual scheduling step that allocates the neural network Pod according to the node filtering and the node priority calculation.
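The filter-then-prioritize flow described above can be illustrated with a much simplified model. In this sketch only resource filtering is modeled (volume and topology filtering are omitted), and the score combines weighted label matches with the fraction of resources left after placement. The `Node` class, the scoring rule, the label weights, and the resource names are all assumptions for illustration, not the Kubernetes scheduler's or the patent's actual implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Node:
    name: str
    free: Dict[str, int]  # unallocated amount per resource name
    labels: Dict[str, str] = field(default_factory=dict)


def filter_nodes(nodes: List[Node], requests: Dict[str, int]) -> List[Node]:
    """Resource filtering: keep nodes that can satisfy every request."""
    return [
        node
        for node in nodes
        if all(node.free.get(res, 0) >= qty for res, qty in requests.items())
    ]


def score_node(
    node: Node,
    requests: Dict[str, int],
    label_weights: Dict[Tuple[str, str], int],
) -> Tuple[int, float]:
    """Priority: weighted label matches first, then the smallest leftover fraction."""
    label_score = sum(
        weight
        for (key, value), weight in label_weights.items()
        if node.labels.get(key) == value
    )
    leftovers = []
    for res, qty in requests.items():
        free = node.free.get(res, 0)
        leftovers.append(1.0 - qty / free if free else 0.0)
    return (label_score, min(leftovers, default=0.0))


def schedule(
    nodes: List[Node],
    requests: Dict[str, int],
    label_weights: Dict[Tuple[str, str], int],
) -> Optional[str]:
    """Return the name of the best valid node, or None if no node passes filtering."""
    candidates = filter_nodes(nodes, requests)
    if not candidates:
        return None
    best = max(candidates, key=lambda n: score_node(n, requests, label_weights))
    return best.name


if __name__ == "__main__":
    gpu = "example.com/gpu-memory-mb"  # hypothetical extended resource
    cluster = [
        Node("nano", {"cpu": 4000, gpu: 2000}, {"board": "nano"}),
        Node("xavier", {"cpu": 6000, gpu: 8000}, {"board": "xavier"}),
        Node("no-gpu", {"cpu": 4000}),
    ]
    print(schedule(cluster, {"cpu": 1000, gpu: 3000}, {}))
```

In the usage example, only the node with at least 3000 units of the GPU resource survives filtering, so it is chosen regardless of score.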

Finally, in step 550, the allocated Pod processes the external data through the neural network. The method for configuring a GPU-based embedded edge server and utilizing a neural network service is described in more detail below with reference to FIGS. 6 to 8.

In the present invention, a natively clustered edge server is configured in a wireless environment using Kubernetes, which supports a container application environment. On this basis, the execution of applications that use GPU resources and a new allocation method for those applications are verified. First, an embedded edge server structure for test applications in a container environment is proposed.

In each area of the edge server environment, the IoT data source needs enhanced hardware performance to preprocess relatively large media data (for example, photos and audio), and it needs a reliable transmission method and data format for data transfer. In the prior art, a transmission stream socket was implemented for photo data stored on the device.

The present invention proposes a structure in which photo data is collected from a camera connected to an IoT device and formatted using GStreamer, and the data is placed in a basic transmission queue, connected to a stream socket, and transmitted.

The proposed edge server area performs high-performance processing using neural networks rather than simple data preprocessing. It stores the resulting data and transmits data to the cloud computing plane. Furthermore, to assess the usefulness and usability of scaling, its operation is verified by implementing a workload in which multiple containers are run as additional client transmissions arrive.

FIG. 6 shows a structure for realizing the proposed edge server.

Docker is used as the container runtime (CRI) on all nodes, and Kubernetes is added to cluster and orchestrate the nodes so that the requested GPU-enabled applications run on the edge server.

The proposed edge server configuration includes an implementation of neural network services that make use of the GPU in the edge server. The larger the neural network, the greater the computation speed advantage of the cloud computing plane. A lightweight neural network model, however, can offer a latency advantage on the edge server as long as its accuracy does not differ significantly.

In a general container environment, additional configuration is required because maximum performance for running a neural network cannot be obtained with CPU and RAM resources alone. The Nvidia-Docker runtime, which lets a neural network model use GPU resources, therefore mounts the graphics driver libraries used by the host so that they can be used inside the container. However, existing images that bundle neural network development tools and an execution environment are not suitable for Jetson boards, so a new image must be built. Moreover, object-detection neural network processing consumes substantial GPU resources, so it is preferable to distribute application workloads to nodes in order of their specifications, best first. In an existing edge server, however, Kubernetes monitors only CPU and RAM resources when deploying Pods. An application that runs a neural network model needs a way to identify GPU resources and assign Pods to appropriate nodes. Desktop environments use a component called k8s-device-plugin for this purpose, but on Jetson boards (ARM64 chipset) k8s-device-plugin does not work because support for NVML (NVIDIA Management Library) is lacking.

FIG. 7 illustrates the operation of an application according to the data flow process, according to an embodiment of the present invention.

The present invention proposes a method of assigning GPU application workloads to nodes in a Jetson board environment by adding a GPU resource. The proposed method checks the maximum GPU clock of each node in a Jetson-board-based Kubernetes cluster and adds it as an extended resource alongside CPU and RAM capacity, so that Pods are assigned to appropriate nodes. As shown in FIG. 7, this process is implemented by querying the performance of each node. Consequently, when Jetson series boards with different hardware specifications make up the edge server, the scalability of neural network applications in the edge server can be increased while work efficiency is improved.

To stress-test and validate the proposed edge server configuration as a deep learning service, different neural networks are deployed for two entirely different applications: object detection using ssd-mobilenetv2, which operates on 2D images, and driver behavior profiling using DeepConvRNN, which operates on 1D scalar data (driving data such as acceleration and braking).

As mentioned above, these algorithms were selected to model a real-time scenario in which images from a client's camera require a deep learning service (here, object detection) on a worker node under the proposed configuration. Likewise, for 1D scalar data, a connected-vehicle environment is assumed in which sensor data from the in-vehicle CAN (Controller Area Network) bus requires the service of the proposed edge server configuration. With the deep learning service, the edge server can also identify the driver through driver behavior profiling.

Each model is described briefly below to convey the level of complexity of the deep learning algorithms used in the proposed work.

For object detection, a well-known deep learning algorithm called SSD-MobileNetv2 is deployed, which combines two models: SSD (Single Shot Multi-Box Detector) and MobileNetv2. MobileNetv2 serves as the feature extractor and achieves competitive accuracy with far fewer parameters (4.3 million). Combining MobileNetv2 with an optimized version of SSD, under the name SSD-MobileNetv2, reduces computational complexity. SSD-MobileNetv2 has recently become available for all Jetson series boards (Xavier, TX1, TX2, Nano) in the Jetson inference library, highly optimized with TensorRT. In the present invention, however, the frozen model available from the official TensorFlow GitHub page is deployed. This model is pre-trained on the COCO dataset and, compared with the version in the Jetson inference library, is easier to modify during development using TensorFlow.

In addition, the driver profiling model was modified into a lightweight deep learning model for a container environment under the umbrella of edge computing, based on the proposed configuration. It is based on a well-known multi-model network consisting of convolutional layers followed by a recurrent neural network. A container environment calls for a compact network with few parameters and a small memory footprint. To that end, parameters such as kernel size, kernel depth, window stride, batch size, and the number of LSTM hidden layers were tuned, and the attention units were reduced to cut the computation and size of the network. The size of the network was thereby reduced at the cost of only a slight loss in accuracy. The architecture described in FIG. 7 uses the Ocslab driving dataset for driver profiling and identification. The Ocslab driving dataset contains 51 driving features acquired over the in-vehicle CAN data bus. Only 15 shortlisted features that reflect a driver's individual driving skill are used for driver identification. These 15 features are further processed into statistical features (mean, median, standard deviation), yielding a 45-dimensional feature vector (15 × 3). The algorithm was implemented with TensorFlow 1.15 in a container environment using a Docker image. FIG. 7 is described because, unlike ssd-mobilenetv2, driver profiling is not part of the Jetson inference library: different algorithms were evaluated for the proposed scenario, this one was selected for driver profiling, and it was further modified to fit the proposed configuration of the edge server deep learning service.
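The 15 × 3 = 45 feature construction can be illustrated with a short sketch. It assumes that the statistics are computed per signal over a window of consecutive CAN readings, that the standard deviation is the population form, and that the three groups are concatenated in the order mean, median, standard deviation; the text above does not specify any of these details, and the function name is hypothetical.

```python
import statistics


def window_features(window):
    """Turn a window of readings into one statistical feature vector.

    window: list of rows, one row per time step, each row holding the
    readings of the selected signals (15 in the scenario above).
    Returns [means..., medians..., stdevs...], i.e. 3 values per signal.
    """
    columns = list(zip(*window))  # one tuple of readings per signal
    means = [statistics.fmean(c) for c in columns]
    medians = [statistics.median(c) for c in columns]
    # Assumption: population standard deviation over the window.
    stdevs = [statistics.pstdev(c) for c in columns]
    return means + medians + stdevs
```

With 15 input signals the returned vector has 45 entries, which matches the dimensionality stated above.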

The GPU resource scheduling structure according to an embodiment of the present invention is based on a container environment using Kubernetes, and the overall process by which the edge server runs a neural network is shown in FIG. 5.

Referring again to FIG. 5, in step 510 a device query application is first executed with preset settings on all edge servers, and the extended-resource update describing the GPU specification is sent to the API server as an HTTP PATCH request and registered for all nodes.
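As a rough sketch of what such a PATCH request could carry, the helper below builds a JSON Patch body in the form Kubernetes documents for advertising an extended resource on a node (a PATCH to the node's `status` subresource with content type `application/json-patch+json`). The resource name `example.com/gpu-max-clock-mhz` and the idea of encoding the clock as an integer quantity are illustrative assumptions, not values taken from the patent.

```python
import json


def extended_resource_patch(resource_name, quantity):
    """Build the JSON Patch body that adds an extended resource to a node.

    The "/" in the resource name must be written as "~1" inside a JSON
    Pointer path ("~" itself becomes "~0").
    """
    escaped = resource_name.replace("~", "~0").replace("/", "~1")
    operations = [{
        "op": "add",
        "path": "/status/capacity/" + escaped,
        "value": str(quantity),  # extended resources take integer quantities
    }]
    return json.dumps(operations)


def node_status_path(node_name):
    """API path of the node status subresource that receives the PATCH."""
    return "/api/v1/nodes/" + node_name + "/status"
```

The body would be sent with the header `Content-Type: application/json-patch+json` using credentials that are allowed to patch node status; how the device query application obtains the clock value is board-specific and is not shown here.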

In step 520, a client near the edge server requests a desired service from the API server of the small, locally present edge. The program that communicates on behalf of each client request uses an authentication library that allows access to the cluster and the Kubernetes API, which supports internal cloud operations.

In step 530, after the request is confirmed, the edge server runs a data receiving server connected to the IoT data source. The transmitted data includes large media files (e.g., photo and video files) and scalar data (e.g., sensor data, single-value data), and is sent from the data source to the edge server over a stream socket.

In step 540, the neural network Pod is assigned to a node using the scheduling of kube-scheduler, which is part of Kubernetes. At this point, the newly added GPU resource is set as a new filtering condition in the configuration file of the neural network Pod, and the Pod is assigned accordingly.
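One way such a filtering condition could look in a Pod configuration is sketched below as a Python dictionary that serializes to a manifest. The resource name, Pod name, and image are hypothetical placeholders; the patent does not give the actual configuration file.

```python
import json

# Hypothetical extended resource name; it must match whatever name was
# registered on the nodes.
GPU_RESOURCE = "example.com/gpu-max-clock-mhz"


def neural_pod_manifest(name, image, min_gpu_clock_mhz):
    """Build a Pod manifest that only fits nodes advertising enough of
    the extended GPU resource."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "containers": [{
                "name": name,
                "image": image,
                # Extended resources are requested under limits, as
                # integer quantities.
                "resources": {
                    "limits": {GPU_RESOURCE: str(min_gpu_clock_mhz)},
                },
            }],
            "restartPolicy": "Never",
        },
    }


def manifest_json(manifest):
    """Serialize the manifest; JSON manifests are accepted by kubectl."""
    return json.dumps(manifest, indent=2)
```

Note that Kubernetes treats extended resources as countable quantities, so a running Pod consumes the amount it requests from the node's allocatable total; a design that encodes a clock frequency this way has to account for that.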

In step 550, the assigned Pod processes the external data through the neural network. Because retransmitting data over the same communication path would affect the performance of the neural network Pod doing the processing, the edge server uses a separate retransmission Pod or monitoring Pod inside the edge server, reached from outside through authentication.

Kubernetes scheduling in step 540 consists of three stages: 1) node filtering, 2) node priority calculation, and 3) the actual scheduling operation. Node filtering, the first stage, identifies valid nodes; its filters can be classified as volume filters, resource filters, and topology filters. Volume filters check node validity with respect to mountable volumes and conflicts between volumes. Resource filters check the ports requested by the Pod and whether sufficient resources are available (e.g., node storage, CPU, RAM, and extended resources). Topology filters evaluate Kubernetes components such as tolerations and selectors to support scheduling according to the cluster topology. Node priority calculation, the second stage, determines the assignment priority among the assignable nodes obtained from node filtering by checking conditions such as the number of Pod replicas, node availability, resource balance, Pod affinity, and taints. Of these, affinity evaluates label conditions on the node itself and adjusts the priority by applying weights.
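The three stages can be illustrated with a toy model. This is a simplified sketch, not kube-scheduler's implementation and not the patent's Algorithm 1: the `Node` and `PodSpec` classes, the resource names, and the single label-weight scoring rule are assumptions chosen to keep the example small, and volume and port checks are omitted.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    free: dict                      # resource name -> free amount
    labels: dict = field(default_factory=dict)
    taints: set = field(default_factory=set)


@dataclass
class PodSpec:
    requests: dict                  # resource name -> required amount
    tolerations: set = field(default_factory=set)
    # (label key, label value) -> weight added when the node matches
    preferred_labels: dict = field(default_factory=dict)


def filter_nodes(pod, nodes):
    """Stage 1: keep nodes with enough resources and only tolerated taints."""
    valid = []
    for node in nodes:
        fits = all(node.free.get(r, 0) >= amount
                   for r, amount in pod.requests.items())
        tolerated = node.taints <= pod.tolerations
        if fits and tolerated:
            valid.append(node)
    return valid


def score(pod, node):
    """Stage 2: weight nodes by matching label preferences (affinity)."""
    return sum(weight
               for (key, value), weight in pod.preferred_labels.items()
               if node.labels.get(key) == value)


def schedule(pod, nodes):
    """Stage 3: bind to the best-scoring valid node and reserve resources."""
    candidates = filter_nodes(pod, nodes)
    if not candidates:
        return None
    best = max(candidates, key=lambda node: score(pod, node))
    for resource, amount in pod.requests.items():
        best.free[resource] -= amount
    return best.name
```

Including the extended GPU resource in `requests` makes it part of the stage 1 filter, while the label weights steer the Pod toward preferred boards in stage 2.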

Accordingly, the proposed algorithm for adding the extended resource and changing the affinity is given as Algorithm 1.

For each Pod, this algorithm compares the volume, CPU, and RAM capacity of candidate nodes with the required amounts and applies the tolerations/taints check. It then checks the node selector settings that affect the Kubernetes system. Finally, nodes that can host the Pod are added to the list.

FIG. 7 shows the operation of the application according to the data flow process. The inference neural network model that performs object detection in the application is a pre-trained model: the ssd-mobilenet-v2 model trained on the COCO dataset, obtained from the TensorFlow Object Detection API.

FIG. 8 is a diagram illustrating the configuration of a system for GPU-based embedded edge server configuration and neural network service utilization according to an embodiment of the present invention.

The system 800 for GPU-based embedded edge server configuration and neural network service utilization according to the present embodiment may include a processor 810, a bus 820, a network interface 830, a memory 840, and a database 850. The memory 840 may include an operating system 841 and a GPU-based embedded edge server configuration and neural network service utilization routine 842. The processor 810 may include an application execution unit 811, a service request confirmation unit 812, a Pod allocation unit 813, and a data processing unit 814. In other embodiments, the system 800 may include more components than those shown in FIG. 8. However, most prior-art components need not be shown explicitly. For example, the system 800 may include other components such as a display or a transceiver.

The memory 840 is a computer-readable recording medium and may include random access memory (RAM), read only memory (ROM), and a permanent mass storage device such as a disk drive. The memory 840 may also store program code for the operating system 841 and the GPU-based embedded edge server configuration and neural network service utilization routine 842. These software components may be loaded from a computer-readable recording medium separate from the memory 840 using a drive mechanism (not shown). Such a separate computer-readable recording medium (not shown) may include a floppy drive, a disk, a tape, a DVD/CD-ROM drive, a memory card, and the like. In another embodiment, the software components may be loaded into the memory 840 through the network interface 830 rather than from a computer-readable recording medium.

The bus 820 may enable communication and data transmission between the components of the system 800 for GPU-based embedded edge server configuration and neural network service utilization. The bus 820 may be implemented using a high-speed serial bus, a parallel bus, a storage area network (SAN), and/or other suitable communication technology.

The network interface 830 may be a computer hardware component for connecting the system 800 for GPU-based embedded edge server configuration and neural network service utilization to a computer network. The network interface 830 may connect the system 800 to the computer network through a wireless or wired connection.

The database 850 may store and maintain all information required for GPU-based embedded edge server configuration and neural network service utilization. Although FIG. 8 shows the database 850 built inside the system 800, the invention is not limited to this; the database may be omitted depending on the system implementation or environment, or all or part of it may exist as an external database built on a separate system.

The processor 810 may be configured to process the instructions of a computer program by performing the basic arithmetic, logic, and input/output operations of the system 800 for GPU-based embedded edge server configuration and neural network service utilization. Instructions may be provided to the processor 810 by the memory 840 or the network interface 830 via the bus 820. The processor 810 may be configured to execute program code for the application execution unit 811, the service request confirmation unit 812, the Pod allocation unit 813, and the data processing unit 814. Such program code may be stored in a recording device such as the memory 840.

The application execution unit 811, the service request confirmation unit 812, the Pod allocation unit 813, and the data processing unit 814 may be configured to perform steps 510 to 550 of FIG. 5.

The system 800 for GPU-based embedded edge server configuration and neural network service utilization may include the application execution unit 811, the service request confirmation unit 812, the Pod allocation unit 813, and the data processing unit 814.

The application execution unit 811 executes the device query application on all edge servers, and sends the extended-resource update describing the GPU specification to the API server as an HTTP PATCH request so that it is registered for all nodes.

The service request confirmation unit 812 requests a desired service from the API server and confirms the request, and then receives external data for the request from the IoT data source over a stream socket.

The Pod allocation unit 813 assigns a neural network Pod for the received external data using Kubernetes scheduling. The Pod allocation unit 813 sets the newly added GPU resource as a new filtering condition in the configuration file of the neural network Pod.

The Pod allocation unit 813 performs node filtering that identifies valid nodes through volume filtering, which checks node validity with respect to mountable volumes and conflicts between volumes; resource filtering, which checks the ports requested by the Pod and whether sufficient resources are available; and topology filtering, which identifies Kubernetes components including tolerations and selectors to support scheduling according to the cluster topology.

It then determines the assignment priority among the assignable nodes obtained from node filtering by checking the number of Pod replicas, node availability, resource balance, Pod affinity, and taints; Pod affinity evaluates the node's label conditions and adjusts the priority by applying weights.

It then performs the actual scheduling, assigning the neural network Pod according to the node filtering and node priority calculation.

Finally, in the data processing unit 814, the assigned Pod uses the neural network to process data that is transmitted externally and stored internally.

The apparatus described above may be implemented as hardware components, software components, and/or a combination of hardware and software components. For example, the devices and components described in the embodiments may be implemented using one or more general-purpose or special-purpose computers, such as a processor, a controller, an arithmetic logic unit (ALU), a digital signal processor, a microcomputer, a field programmable array (FPA), a programmable logic unit (PLU), a microprocessor, or any other device capable of executing and responding to instructions. The processing device may run an operating system (OS) and one or more software applications that run on the operating system. The processing device may also access, store, manipulate, process, and generate data in response to execution of the software. For ease of understanding, the processing device is sometimes described as a single device, but a person of ordinary skill in the art will recognize that the processing device may include a plurality of processing elements and/or a plurality of types of processing elements. For example, the processing device may include a plurality of processors, or one processor and one controller. Other processing configurations, such as parallel processors, are also possible.

Software may include a computer program, code, instructions, or a combination of one or more of these, and may configure the processing device to operate as desired or may instruct the processing device independently or collectively. Software and/or data may be embodied in any type of machine, component, physical device, virtual equipment, computer storage medium, or device in order to be interpreted by the processing device or to provide instructions or data to the processing device. The software may be distributed over networked computer systems and stored or executed in a distributed manner. Software and data may be stored on one or more computer-readable recording media.

The method according to the embodiment may be implemented in the form of program instructions executable by various computer means and recorded on a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be specially designed and configured for the embodiment, or may be known and available to those skilled in computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine code such as that produced by a compiler, but also high-level language code that can be executed by a computer using an interpreter or the like.

Although the embodiments have been described above with reference to limited embodiments and drawings, a person of ordinary skill in the art can make various modifications and variations based on the above description. For example, appropriate results may be achieved even if the described techniques are performed in an order different from the described method, and/or the described components of the system, structure, apparatus, circuit, and the like are coupled or combined in a form different from the described method, or are replaced or substituted by other components or equivalents.

Therefore, other implementations, other embodiments, and equivalents to the claims also fall within the scope of the following claims.

Claims

A method for configuring an embedded edge server and utilizing a neural network service, the method comprising:
executing a device query application on all edge servers and, when a GPU resource is added, transmitting an update of information about the added GPU resource to an API server as an HTTP patch request so that it is registered in all nodes;
confirming a service request in which a client requests a desired service from the API server of a locally present edge;
after confirming the service request, receiving, by the edge server, external data for the service request from an IoT data source that transmits and receives data using a stream socket;
allocating a neural network Pod for the received external data using Kubernetes scheduling; and
processing, by the allocated Pod, the external data through a neural network,
wherein the allocating of the neural network Pod for the received external data using Kubernetes scheduling comprises
setting, when the newly added GPU resource is allocated to the neural network Pod, a filtering condition according to the information about the added GPU resource, and
wherein the allocating of the neural network Pod comprises:
a node filtering step of identifying valid nodes through volume filtering that checks node validity with respect to mountable volumes and conflicts between volumes, resource filtering that checks the ports requested by the Pod and sufficient resources, and topology filtering that identifies Kubernetes components including tolerations and selectors to support scheduling according to the cluster topology;
a node priority calculation step of determining a label condition of a node and adjusting the priority by assigning weights; and
an actual scheduling step of allocating the neural network Pod based on the node filtering and the node priority calculation.

The method of claim 1,
wherein, in confirming the service request by which the client requests the desired service from the API server of the locally existing edge,
a program that communicates for each request from the client uses a cluster-accessible authentication library and the Kubernetes API to support internal cloud operations.
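
As a rough illustration of a client-side program that authenticates to the cluster and calls the Kubernetes API, the Python sketch below builds (but does not send) a bearer-token request that lists Pods in a namespace. The source does not name the authentication library it uses, so plain token authentication over the REST API is substituted here as an assumption; the server address, namespace, and token value are hypothetical.

```python
import urllib.request


def build_pod_list_request(api_server: str, namespace: str, token: str) -> urllib.request.Request:
    """Build an authenticated GET request for the Pods of one namespace."""
    return urllib.request.Request(
        url=f"{api_server}/api/v1/namespaces/{namespace}/pods",
        headers={
            # Bearer-token authentication is an assumed stand-in for the
            # "cluster-accessible authentication library" named in the claim.
            "Authorization": f"Bearer {token}",
            "Accept": "application/json",
        },
        method="GET",
    )


# Hypothetical values; a real client would read the token from its credentials.
request = build_pod_list_request("https://edge-api.example:6443", "default", "dummy-token")
```

Passing the result to `urllib.request.urlopen` would send the request; that step is left out because it needs a reachable cluster and valid credentials.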

(Deleted)

A system for configuring an embedded edge server and utilizing a neural network service, the system comprising:
an application execution unit that executes a device query application on all edge servers and, when a GPU resource is added, transmits an update of information on the added GPU resource to an API server as an HTTP patch request so that it is registered in all nodes;
a service request confirmation unit that, after a desired service is requested from the API server and the request is confirmed, receives external data for the request from an IoT data source that transmits and receives data through a stream socket;
a Pod allocation unit that allocates a neural network Pod to the received external data using Kubernetes scheduling; and
a data processing unit in which the allocated Pod uses the neural network to process the data transmitted from outside and stored internally,
wherein the Pod allocation unit:
when allocating a newly added GPU resource to the neural network Pod, sets a filtering condition according to the information on the added GPU resource;
performs node filtering that identifies valid nodes through node validity checking, which supports mountable volumes and checks for conflicts between volumes; resource filtering, which checks the ports requested by the Pod and whether resources are sufficient; and topology filtering, which identifies Kubernetes components, including permissions and selectors, for scheduling based on cluster topology;
calculates a node priority by determining label conditions of the nodes and adjusting the priority by assigning weights; and
performs actual scheduling that allocates the neural network Pod based on the node filtering and the node priority calculation.
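
The registration performed by the application execution unit can be sketched as an HTTP PATCH against a node's status, which is how Kubernetes nodes advertise extended resources. The Python sketch below only builds the request; the resource name `example.com/gpu`, the node name, and the proxy address are hypothetical, and the GPU details a device query application would report are reduced here to a single count.

```python
import json
import urllib.request


def build_gpu_patch(api_server: str, node_name: str, resource: str, quantity: int) -> urllib.request.Request:
    """Build a JSON Patch request that advertises an extended resource on one node."""
    # JSON Pointer (RFC 6901) escapes "~" as "~0" and "/" as "~1".
    escaped = resource.replace("~", "~0").replace("/", "~1")
    body = [{"op": "add", "path": f"/status/capacity/{escaped}", "value": str(quantity)}]
    return urllib.request.Request(
        url=f"{api_server}/api/v1/nodes/{node_name}/status",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json-patch+json"},
        method="PATCH",
    )


# Hypothetical values: the address assumes a local `kubectl proxy`.
request = build_gpu_patch("http://localhost:8001", "edge-node-1", "example.com/gpu", 1)
patch_body = json.loads(request.data)
```

Repeating the call once per node would cover the "registers it with all nodes" part of the claim. Once a resource is registered, a Pod can request it by name in its resource limits.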

The system of claim 5,
wherein, in the service request confirmation unit,
a program that communicates for each request from the client uses a cluster-accessible authentication library and the Kubernetes API to support internal cloud operations.
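
The service request confirmation unit also receives external data from the IoT data source over a stream socket. Because a stream socket delivers bytes without message boundaries, the receiver needs some framing rule. The Python sketch below assumes a 4-byte length prefix, which the source does not specify, and uses `socket.socketpair()` as a local stand-in for the IoT source and the edge server.

```python
import socket
import struct


def send_frame(sock: socket.socket, payload: bytes) -> None:
    """Send one message as a 4-byte big-endian length followed by the payload."""
    sock.sendall(struct.pack("!I", len(payload)) + payload)


def recv_exact(sock: socket.socket, size: int) -> bytes:
    """Read exactly `size` bytes, looping because recv may return fewer."""
    chunks = []
    remaining = size
    while remaining > 0:
        chunk = sock.recv(remaining)
        if not chunk:
            raise ConnectionError("peer closed the connection mid-frame")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def recv_frame(sock: socket.socket) -> bytes:
    """Receive one length-prefixed message."""
    (length,) = struct.unpack("!I", recv_exact(sock, 4))
    return recv_exact(sock, length)


# A connected pair of stream sockets stands in for the IoT source and edge server.
source, edge = socket.socketpair()
send_frame(source, b"sensor-frame-0001")
received = recv_frame(edge)
source.close()
edge.close()
```

In a deployment the edge side would instead accept TCP connections from the data source, and the framing functions would stay the same.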

(Deleted)