KR20200023239A - Electronic device and operating method for processing a neural network model by using a plurality of processors

Info

Publication number: KR20200023239A
Application number: KR1020190103878A
Authority: KR
Inventors: 마나스 사니; 아룬 아브라함; 쿠마르 샤란 알루르; 벤카파 말라
Original assignee: 삼성전자주식회사
Priority date: 2018-08-23
Filing date: 2019-08-23
Publication date: 2020-03-04
Also published as: CN112585624A; EP3794517A4; EP3794517A1; WO2020040599A1

Abstract

Provided is a method for processing a neural network model by using a plurality of processors in an electronic device. The method comprises: allocating a plurality of layers included in the neural network model to at least one slice; allocating the at least one slice to the plurality of processors based on a processing time of each of the plurality of processors for each of the at least one slice; and processing the neural network model by using the plurality of processors based on the allocation result, wherein the processing time includes a switching time taken for a current processor to receive, from a previous processor that processes a previous slice, data required to process a current slice.

Description

Electronic device and operating method for processing a neural network model by using a plurality of processors

The present disclosure relates to an electronic device that processes a neural network model using a plurality of processors, and a method of operating the same.

An electronic device may perform face recognition, voice recognition, image processing, and the like by using deep learning technology based on a neural network model, and may provide the result to a user.

The electronic device may process the neural network model using a plurality of processors, for example, a central processing unit (CPU), a graphics processing unit (GPU), a neural processing unit (NPU), and a digital signal processor (DSP). The electronic device may process the neural network model using the plurality of processors by allocating a plurality of layers constituting the neural network model to the plurality of processors.

However, processing speed and accuracy may vary depending on the performance of each processor and the characteristics of each layer. Accordingly, there is a demand for a method of allocating the layers of a neural network model to the processors that will process them, based on the processing capabilities of the plurality of processors and the characteristics of the layers.

An object of the present disclosure is to solve the above-described problem by providing an electronic device that processes a neural network model using a plurality of processors, and a method of operating the same.

Another object is to provide a computer-readable recording medium having recorded thereon a program for executing the method on a computer. The technical problems to be solved are not limited to those described above, and other technical problems may exist.

As a technical means for achieving the above technical problem, a first aspect of the present disclosure may provide a method of processing a neural network model by using a plurality of processors in an electronic device, the method including: allocating a plurality of layers included in the neural network model to at least one slice; allocating the at least one slice to the plurality of processors based on a processing time of each of the plurality of processors for each of the at least one slice; and processing the neural network model by using the plurality of processors based on the allocation result, wherein the processing time includes a switching time taken for a current processor to receive, from a previous processor that processes a previous slice, data needed to process a current slice.

In addition, a second aspect of the present disclosure may provide an electronic device including: a memory storing a neural network model; at least one processor configured to allocate a plurality of layers included in the neural network model to at least one slice, allocate the at least one slice to a plurality of processors based on a processing time of each of the plurality of processors for each of the at least one slice, and process the neural network model by using the plurality of processors based on the allocation result; and an output unit configured to output a result of processing the neural network model, wherein the processing time includes a switching time taken for a current processor to receive, from a previous processor that processes a previous slice, data needed to process a current slice.

In addition, a third aspect of the present disclosure may provide a recording medium storing a program for performing the method of the first aspect or the second aspect.

According to an embodiment, a neural network model may be processed more quickly and more accurately by using a plurality of processors.

FIG. 1 is a diagram illustrating an example of processing a neural network model in an electronic device according to an embodiment.
FIG. 2 is a diagram illustrating an example in which layers of a neural network model are allocated to at least one slice, according to an embodiment.
FIG. 3 is a block diagram illustrating an internal configuration of an electronic device according to an embodiment.
FIG. 4 is a block diagram illustrating an internal configuration of an electronic device according to an embodiment.
FIG. 5 is a flowchart illustrating a method of processing a neural network model using a plurality of processors, according to an embodiment.
FIG. 6 is a flowchart illustrating a method of allocating slices to a plurality of processors, according to an embodiment.
FIG. 7 is a diagram illustrating an example of allocating a processor to a slice, according to an embodiment.
FIG. 8 is a diagram illustrating an example in which memory is allocated in a layer, according to an embodiment.
FIG. 9 is a flowchart illustrating a method of allocating memory for input/output data of a layer, according to an embodiment.
FIG. 10 is a diagram illustrating an example of identifying a blob inside a layer, according to an embodiment.
FIG. 11 is a diagram illustrating an example of allocating memory to the blobs of a neural network model, including blobs inside a layer, according to an embodiment.
FIG. 12 is a diagram illustrating an example in which a neural network model is processed by a plurality of processors, according to an embodiment.

Hereinafter, embodiments of the present invention will be described in detail with reference to the accompanying drawings so that those of ordinary skill in the art to which the present invention pertains may easily implement them. However, the present invention may be implemented in various different forms and is not limited to the embodiments described herein. In the drawings, parts irrelevant to the description are omitted in order to clearly describe the present invention, and like reference numerals designate like parts throughout the specification.

Throughout the specification, when a part is said to be "connected" to another part, this includes not only the case of being "directly connected" but also the case of being "electrically connected" with another element in between. In addition, when a part is said to "include" a certain component, this means that it may further include other components rather than excluding them, unless otherwise stated.

Functions related to artificial intelligence according to the present disclosure are operated through a processor and a memory. The processor may consist of one or a plurality of processors. In this case, the one or more processors may be a general-purpose processor such as a CPU, an application processor (AP), or a digital signal processor (DSP), a graphics-dedicated processor such as a GPU or a vision processing unit (VPU), or an artificial-intelligence-dedicated processor such as an NPU. The one or more processors control input data to be processed according to a predefined operating rule or artificial intelligence model stored in the memory. Alternatively, when the one or more processors are AI-dedicated processors, they may be designed with a hardware structure specialized for processing a specific AI model.

The predefined operating rule or artificial intelligence model is created through learning. Here, being created through learning means that a basic AI model is trained by a learning algorithm using a large amount of training data, so that a predefined operating rule or AI model set to perform a desired characteristic (or purpose) is created. Such learning may be performed in the device itself in which the artificial intelligence according to the present disclosure is performed, or may be performed through a separate server and/or system. Examples of learning algorithms include supervised learning, unsupervised learning, semi-supervised learning, and reinforcement learning, but are not limited to these.

The artificial intelligence model may consist of a plurality of neural network layers. Each of the plurality of neural network layers has a plurality of weight values and performs a neural network operation through an operation between the result of the previous layer and the plurality of weights. The weights of the neural network layers may be optimized by the training result of the AI model. For example, the weights may be updated so that a loss value or a cost value obtained from the AI model during the training process is reduced or minimized. The artificial neural network may include a deep neural network (DNN), for example, a convolutional neural network (CNN), a deep neural network (DNN), a recurrent neural network (RNN), a restricted Boltzmann machine (RBM), a deep belief network (DBN), a bidirectional recurrent deep neural network (BRDNN), or deep Q-networks, but is not limited to these examples.

Hereinafter, the present invention will be described in detail with reference to the accompanying drawings.

FIG. 1 is a diagram illustrating an example of processing a neural network model 110 in an electronic device 1000 according to an embodiment.

Referring to FIG. 1, the electronic device 1000 may process the neural network model 110 by using a processor 1300 including a plurality of processors 1310, 1320, and 1330.

According to an embodiment, the electronic device 1000 may process the neural network model 110 using the plurality of processors through a compile step and a task processing step. For example, the electronic device 1000 may perform a compile step of allocating the tasks to be performed by the plurality of processors, and a task processing step of performing the tasks using the plurality of processors according to the allocation result of the compile step.

The compile step according to an embodiment may be performed by at least one processor among the plurality of processors.

The compile step according to an embodiment may include not only an operation of allocating the tasks to be performed by the plurality of processors, but also an operation of allocating memory for storing data used to perform the tasks.

The electronic device 1000 according to an embodiment may be implemented in various forms. For example, the electronic device 1000 described herein may be a digital camera, a smartphone, a laptop computer, a tablet PC, an e-book terminal, a digital broadcasting terminal, a personal digital assistant (PDA), a portable multimedia player (PMP), a navigation device, an MP3 player, or the like, but is not limited thereto. The electronic device 1000 described herein may also be a wearable device that can be worn by a user. The wearable device may include at least one of an accessory-type device (e.g., a watch, a ring, a wrist band, an ankle band, a necklace, glasses, or a contact lens), a head-mounted device (HMD), a fabric- or clothing-integrated device (e.g., electronic clothing), a body-attached device (e.g., a skin pad), or a bio-implantable device (e.g., an implantable circuit), but is not limited thereto. Hereinafter, for convenience of description, the case where the electronic device 1000 is a smartphone will be described as an example.

The electronic device 1000 according to an embodiment may perform various operations using the neural network model 110. For example, the neural network model 110 may include an artificial intelligence model such as a deep neural network (DNN), a recurrent neural network (RNN), or a convolutional neural network (CNN). The neural network model 110 according to an embodiment is not limited to these examples and may include various types of artificial intelligence models.

The electronic device 1000 according to an embodiment may use at least one neural network model 110 to perform various operations, such as recognizing data or generating new data, and may provide the result to the user.

The neural network model 110 according to an embodiment may include a plurality of layers. Each layer included in the neural network model 110 may include at least one function for processing data. For example, a neural network model 110 including a CNN model may be configured as a combination of various types of layers, such as a convolution layer, a max pooling layer, and a flatten layer.

For example, in the convolution layer, an operation of extracting feature information from input data may be performed. In the max pooling layer, an operation of extracting the main data from the input data may be performed. In the flatten layer, an operation of converting the values of the input data into one-dimensional values may be performed. The neural network model 110 is not limited to these examples and may include various types of layers.

The electronic device 1000 according to an embodiment may include a plurality of processors 1310, 1320, and 1330. For example, the electronic device 1000 may include various types of processors such as a CPU, a GPU, an NPU, and a DSP. Each processor may have different performance and characteristics. For example, a CPU is slower than the other processors but has high accuracy and good energy efficiency.

Table 1 below shows the relative characteristics of the CPU, GPU, DSP, and NPU.

Table 1

                               CPU    GPU                 DSP             NPU
Speed                          Slow   Fast                Very fast       Very fast
Accuracy                       Robust Robust              Sensitive       Sensitive
Energy efficiency              Good   Poor                Very good       Very good
Ease of performing new tasks   Easy   Somewhat difficult  Very difficult  Impossible
Code inspection                Easy   Somewhat difficult  Very difficult  Impossible

The characteristics of each processor listed in Table 1 are merely examples, and may vary according to various factors such as the characteristics of the task processed by each processor or the current state of the processor.

According to an embodiment, in the compile step, one of the processors 1310, 1320, and 1330 of the electronic device 1000 may be allocated to each layer included in the neural network model 110. The electronic device 1000 may allocate a processor to each layer based on the performance of each of the processors 1310, 1320, and 1330, as shown in Table 1, and on the characteristics of each layer included in the neural network model 110.

According to an embodiment, the speed and accuracy of the processor that processes a layer may differ according to the characteristics of the layer being processed. The electronic device 1000 according to an embodiment may determine which processor is suitable for processing each layer of the neural network model 110, and thereby allocate, from among the plurality of processors, the processor that will process each layer.

According to an embodiment, once a processor has been allocated to each layer, the electronic device 1000 may process the neural network model 110 using the plurality of processors according to the allocation result.

FIG. 2 is a diagram illustrating an example in which layers of the neural network model 110 are allocated to at least one slice, according to an embodiment.

The nodes shown in FIG. 2 represent some of the layers that make up the neural network model 110. According to an embodiment, the neural network model 110 may be processed in the electronic device 1000 by passing the result of the task performed in each layer to the next layer, following the direction of the arrows.

According to an embodiment, the plurality of layers of the neural network model 110 may be allocated to at least one slice, and the layers may be allocated to processors in units of slices. By allocating layers to processors in units of slices, each containing at least one layer, rather than in units of layers, the electronic device 1000 according to an embodiment may reduce the amount of computation caused by the allocation operation.

According to an embodiment, the electronic device 1000 may allocate the plurality of layers constituting the neural network model 110 to slices by determining at least one of those layers to be a slice point. According to an embodiment, the layers on either side of a slice point may be allocated to different slices.

For example, the electronic device 1000 may determine in turn, for each layer from "conv2d_9" to "activation_11", whether the layer is a slice point. After "activation_11", the layers from "conv2d_7" to "activation_8" may be checked in turn in the same way, followed by the layers from "average_pooling2d_1" to "activation_6".

The electronic device 1000 according to an embodiment may determine slice points based on at least one of: whether a layer is a point at which a plurality of layers branch, whether it is a point at which a plurality of layers merge, whether it includes a task that can be processed by the same processor, and whether it includes a task that requires high accuracy. According to an embodiment, as these criteria suggest, the current layer may be determined to be a slice point depending on whether processing may switch, at the current layer, to a processor different from that of the previous layer. The slice point is not limited to these examples and may be determined according to various criteria.
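
The slice-point criteria above can be sketched in code. The following is a minimal Python sketch, not the disclosed implementation: the `Layer` fields, the `assign_slices` function, the processor labels, and the reduced graph (a hypothetical subset of the FIG. 2 layers, visited branch by branch) are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Layer:
    """Hypothetical layer record; every field is an illustrative assumption."""
    name: str
    inputs: list = field(default_factory=list)                # names of producer layers
    processors: frozenset = frozenset({"CPU", "GPU", "NPU"})  # processors able to run it
    needs_high_accuracy: bool = False


def assign_slices(layers):
    """Group layers (given in traversal order) into slices of layer names."""
    # Count the consumers of each layer so that branch points can be detected.
    fan_out = {}
    for layer in layers:
        for src in layer.inputs:
            fan_out[src] = fan_out.get(src, 0) + 1

    slices = []           # each slice is a list of layer names
    shared = frozenset()  # processors able to run every layer of the open slice
    for layer in layers:
        starts_branch = any(fan_out[src] > 1 for src in layer.inputs)
        merges_branches = len(layer.inputs) > 1
        no_shared_processor = not (shared & layer.processors)
        if (not slices or starts_branch or merges_branches
                or no_shared_processor or layer.needs_high_accuracy):
            # Slice point: open a new slice starting at this layer.
            slices.append([layer.name])
            shared = layer.processors
        else:
            # Not a slice point: the layer joins the slice that is already open.
            slices[-1].append(layer.name)
            shared = shared & layer.processors
    return slices


# Reduced, hypothetical version of the FIG. 2 graph (intermediate layers omitted).
accel = frozenset({"GPU", "NPU"})
graph = [
    Layer("max_pooling2d_2"),
    Layer("conv2d_9", ["max_pooling2d_2"]),
    Layer("activation_9", ["conv2d_9"]),
    Layer("conv2d_7", ["max_pooling2d_2"]),
    Layer("activation_8", ["conv2d_7"]),
    Layer("average_pooling2d_1", ["max_pooling2d_2"], accel),
    Layer("conv2d_12", ["average_pooling2d_1"], accel),
    Layer("activation_12", ["conv2d_12"], frozenset({"CPU"})),
    Layer("conv2d_6", ["max_pooling2d_2"]),
    Layer("activation_6", ["conv2d_6"]),
]
for number, names in enumerate(assign_slices(graph)):
    print(number, names)
```

In this sketch, each layer fed by "max_pooling2d_2" opens a new slice because its producer branches, and "activation_12" opens its own slice because it is assumed to share no processor with the slice before it.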

For example, since "conv2d_9" is one of a plurality of layers branching from "max_pooling2d_2", it corresponds to a point at which a plurality of layers branch and may be determined to be a slice point. Thus, "conv2d_9" may belong to a new slice 1 (210), different from the previously determined slice.

The layers from "activation_9" to "activation_11" are not determined to be slice points under the criteria described above, and may therefore be allocated to the previously determined slice 1 (210).

Like "conv2d_9", "conv2d_7" is one of a plurality of layers branching from "max_pooling2d_2", so it is determined to be a slice point and may belong to a new slice 2 (220). The layers following "conv2d_7" up to "activation_8" are not determined to be slice points, and may therefore be allocated to the previously determined slice 2 (220).

Like "conv2d_9", "average_pooling2d_1" is one of a plurality of layers branching from "max_pooling2d_2", so it is determined to be a slice point and may belong to a new slice 3 (230). The layer "conv2d_12" is not determined to be a slice point, and may therefore be allocated to the previously determined slice 3 (230).

"activation_12" may be determined to be a slice point, and may belong to a new slice 4 (240), because it does not include a task that can be processed by the same processor as the layers of slice 3 (230), or because it includes a task that requires high accuracy.

For example, if none of the processors capable of processing "activation_12" can also process "average_pooling2d_1" and "conv2d_12", the layers included in slice 3 (230), then "activation_12" cannot be processed by the processor allocated to slice 3 (230), and so it cannot belong to slice 3 (230). This is because, according to an embodiment, processors are allocated in units of slices, so the layers belonging to the same slice are processed by the same processor.

As another example, when "activation_12" includes a task that requires high accuracy, "activation_12" may also be determined to be a slice point so that a processor suited to "activation_12" can be allocated without being affected by other layers.

Like "conv2d_9", "conv2d_6" is one of a plurality of layers branching from "max_pooling2d_2", so it is determined to be a slice point and may be allocated to a new slice 5 (250). The layer "activation_6" is not determined to be a slice point, and may therefore be allocated to the previously determined slice 5 (250).

According to an embodiment, one processor may be allocated to each slice to which at least one layer has been allocated. According to an embodiment, the electronic device 1000 may determine the processor that will process each slice based on the processing time of each processor for each slice.

The processing time according to an embodiment may include the time the current processor takes to process the tasks of the at least one layer included in the current slice, and the switching time the current processor takes to receive, from another processor, the data needed to process the current slice. For example, the data needed to process the current slice that is received from another processor may include data input to a layer included in the current slice. The other processor may be the previous processor, which processes the previous slice and outputs the data that is input to the current slice.
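
The decomposition described above can be written compactly. The notation below is assumed for illustration and does not appear in the disclosure: $T(s, p)$ is the processing time of slice $s$ on processor $p$, $T_{\mathrm{exec}}(s, p)$ is the time $p$ needs to run the layers of $s$, and $T_{\mathrm{sw}}(p', p, s)$ is the switching time for $p$ to receive the input data of $s$ from the processor $p'$ that handled the previous slice.

```latex
T(s, p) = T_{\mathrm{exec}}(s, p) + T_{\mathrm{sw}}(p', p, s)
```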

The switching time according to an embodiment may represent the time taken to transfer the data resulting from processing the previous slice to the current processor when the previous processor, which processes the previous slice, differs from the current processor, which processes the current slice.

For example, when the first processor 1310 is allocated to slice 3 (230), the switching time of the second processor 1320 for slice 4 (240) may represent the time taken to transfer the output data of "conv2d_12" from the first processor 1310 to the second processor 1320. Likewise, the switching time of the third processor 1330 for slice 4 (240) may represent the time taken to transfer the output data of "conv2d_12" from the first processor 1310 to the third processor 1330.

The switching time according to an embodiment may include not only the time taken to transfer data from the first processor 1310 to the second processor 1320 but also the time taken to convert the transferred data into the data format of the second processor 1320.

In addition, the switching time according to an embodiment may increase as the size of the transferred data increases.

In addition, when the previous processor, which processes the previous slice, and the current processor, which processes the current slice, are the same, no data is transferred between processors, so the switching time according to an embodiment may be determined to be 0. For example, because the first processor 1310 is allocated to slice 3 (230), the switching time of the first processor 1310 for slice 4 (240) may be determined to be 0.

Accordingly, according to an embodiment, the processor that will process each slice may be determined based not only on the time each processor takes to process the layer tasks of a slice but also on the switching time for transferring data between processors, so that the plurality of slices can be processed by the plurality of processors.
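As a non-limiting illustration, the processing time described above may be sketched as a compute time plus a switching time that is 0 when the processor does not change and otherwise grows with the size of the transferred data. The `Link` cost model, its per-kilobyte rates, and the function names are assumptions made for this sketch only and are not part of the disclosure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    """Hypothetical cost model for moving data between two different processors."""
    transfer_ms_per_kb: float  # assumed time to move 1 KB between the processors
    convert_ms_per_kb: float   # assumed time to convert 1 KB to the receiver's format


def switching_time_ms(prev_proc, cur_proc, size_kb, links):
    """Switching time for handing size_kb of data from prev_proc to cur_proc."""
    if prev_proc == cur_proc:
        return 0.0  # same processor: no inter-processor transfer is needed
    link = links[(prev_proc, cur_proc)]
    # Transfer time plus format-conversion time, both growing with data size.
    return size_kb * (link.transfer_ms_per_kb + link.convert_ms_per_kb)


def processing_time_ms(compute_ms, prev_proc, cur_proc, size_kb, links):
    """Processing time = time to run the slice's layers + switching time."""
    return compute_ms + switching_time_ms(prev_proc, cur_proc, size_kb, links)
```

In this sketch, keeping slice 4 on the processor that handled slice 3 removes the switching term entirely, so a processor with a slower compute time can still yield the shorter overall processing time.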

FIG. 3 is a block diagram illustrating an internal configuration of the electronic device 1000 according to an embodiment.

FIG. 4 is a block diagram illustrating an internal configuration of the electronic device 1000 according to an embodiment.

Referring to FIG. 3, the electronic device 1000 may include a processor 1300, a memory 1700, and an output unit 1200. However, not all of the components illustrated in FIG. 3 are essential components of the electronic device 1000. The electronic device 1000 may be implemented with more components than those illustrated in FIG. 3, or with fewer components than those illustrated in FIG. 3.

For example, as illustrated in FIG. 4, the electronic device 1000 according to some embodiments may further include a user input unit 1100, a sensing unit 1400, a communication unit 1500, and an A/V input unit 1600 in addition to the processor 1300, the memory 1700, and the output unit 1200.

The user input unit 1100 refers to a means by which a user inputs data for controlling the electronic device 1000. For example, the user input unit 1100 may include a key pad, a dome switch, a touch pad (contact capacitive type, pressure resistive film type, infrared sensing type, surface ultrasonic conduction type, integral tension measurement type, piezoelectric effect type, and the like), a jog wheel, and a jog switch, but is not limited thereto.

According to an embodiment, the user input unit 1100 may receive a user input required to perform an operation based on the neural network model 110. According to an embodiment, in response to the user input received by the user input unit 1100, the electronic device 1000 may process the neural network model 110 using the plurality of processors and output the processing result.

The output unit 1200 may output an audio signal, a video signal, or a vibration signal, and may include a display unit 1210, a sound output unit 1220, and a vibration motor 1230.

The display unit 1210 displays information processed by the electronic device 1000.

When the display unit 1210 and a touch pad form a layered structure to constitute a touch screen, the display unit 1210 may be used as an input device as well as an output device. The display unit 1210 may include at least one of a liquid crystal display, a thin film transistor-liquid crystal display, an organic light-emitting diode, a flexible display, a three-dimensional (3D) display, and an electrophoretic display. Depending on the implementation of the electronic device 1000, the electronic device 1000 may include two or more display units 1210.

The sound output unit 1220 outputs audio data received from the communication unit 1500 or stored in the memory 1700.

The vibration motor 1230 may output a vibration signal. The vibration motor 1230 may also output a vibration signal when a touch is input to the touch screen.

According to an embodiment, the result of processing the neural network model 110 by the processor 1300 may be output in various forms through the display unit 1210, the sound output unit 1220, and the vibration motor 1230.

The processor 1300 typically controls the overall operation of the electronic device 1000. For example, by executing programs stored in the memory 1700, the processor 1300 may control the user input unit 1100, the output unit 1200, the sensing unit 1400, the communication unit 1500, the A/V input unit 1600, and the like. The electronic device 1000 may include a plurality of processors 1300.

The processor 1300 may be configured to process instructions of a computer program by performing basic arithmetic, logic, and input/output operations. The instructions may be provided to the processor 1300 from the memory 1700, or may be received through the communication unit 1500 and provided to the processor 1300. For example, the processor 1300 may be configured to execute instructions according to program code stored in a recording device such as a memory.

The processor 1300 according to an embodiment may allocate the plurality of layers included in the neural network model 110 to at least one slice, and may allocate, from among the plurality of processors included in the processor 1300, a processor to process each slice. The processor 1300 may then process the neural network model 110 based on the allocation result.

According to an embodiment, a processor may be allocated to each slice based on the processing time of the processor for the slice. The processing time according to an embodiment may include the switching time required to transfer data between processors.

According to an embodiment, the processor 1300 may identify at least one layer included in at least one slice allocated to a first processor among the plurality of processors. The processor 1300 may also identify at least one blob, each representing at least one of data input to the identified layer, data output from the identified layer, and data temporarily stored inside the identified layer, and may allocate memory for storing the data of the identified blob.

According to an embodiment, memory for a current blob may be allocated by determining, following the processing order of the layers, whether the usage period of a previous blob ends before the data of the current blob is generated. In addition, the size of the allocated memory may be determined based on the largest data size among the data sizes of the at least one blob to which the same memory is allocated.
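As a non-limiting illustration, the blob memory allocation described above may be sketched as a greedy buffer-reuse pass over the blobs in layer processing order. The `Blob` fields, the greedy first-fit policy, and the function name are assumptions made for this sketch; the disclosure itself only states the lifetime check and the largest-size rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Blob:
    name: str
    size: int       # data size in bytes
    first_use: int  # index of the layer that generates the blob's data
    last_use: int   # index of the last layer that uses the blob


def assign_buffers(blobs):
    """Return (blob name -> buffer index, list of buffer sizes).

    A blob reuses an existing buffer only if that buffer's previous blob
    stops being used before the current blob's data is generated.
    """
    buffers = []     # each entry: {"free_after": layer index, "size": bytes}
    assignment = {}
    for blob in sorted(blobs, key=lambda b: b.first_use):
        for idx, buf in enumerate(buffers):
            if buf["free_after"] < blob.first_use:
                # Previous usage period has ended: share this buffer.
                buf["free_after"] = blob.last_use
                # Shared buffer must fit the largest blob assigned to it.
                buf["size"] = max(buf["size"], blob.size)
                assignment[blob.name] = idx
                break
        else:
            # No reusable buffer: allocate a new one.
            buffers.append({"free_after": blob.last_use, "size": blob.size})
            assignment[blob.name] = len(buffers) - 1
    return assignment, [buf["size"] for buf in buffers]
```

For three blobs whose usage periods are layers 0 to 1, 1 to 2, and 2 to 3, the first and third can share one buffer because the first is no longer used when the third is generated, while the second overlaps both and needs its own buffer.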

The sensing unit 1400 may detect a state of the electronic device 1000 or a state of the surroundings of the electronic device 1000, and may transmit the detected information to the processor 1300.

The sensing unit 1400 may include at least one of a geomagnetic sensor 1410, an acceleration sensor 1420, a temperature/humidity sensor 1430, an infrared sensor 1440, a gyroscope sensor 1450, a position sensor (for example, GPS) 1460, a barometric pressure sensor 1470, a proximity sensor 1480, and an RGB sensor (illuminance sensor) 1490, but is not limited thereto.

According to an embodiment, the information detected by the sensing unit 1400 may be used as input information of the neural network model 110 or may be used to update the neural network model 110. Without being limited to the above examples, the information detected by the sensing unit 1400 may be used in various ways for processing the neural network model 110.

The communication unit 1500 may include one or more components that enable the electronic device 1000 to communicate with a server 2000 or an external device (not shown). For example, the communication unit 1500 may include a short-range communication unit 1510, a mobile communication unit 1520, and a broadcast receiving unit 1530.

The short-range wireless communication unit 1510 may include a Bluetooth communication unit, a Bluetooth Low Energy (BLE) communication unit, a Near Field Communication unit, a WLAN (Wi-Fi) communication unit, a Zigbee communication unit, an infrared (IrDA, Infrared Data Association) communication unit, a Wi-Fi Direct (WFD) communication unit, an ultra-wideband (UWB) communication unit, an Ant+ communication unit, and the like, but is not limited thereto.

The mobile communication unit 1520 transmits and receives wireless signals to and from at least one of a base station, an external terminal, and a server on a mobile communication network. Here, the wireless signals may include a voice call signal, a video call signal, or various types of data associated with the transmission and reception of text/multimedia messages.

The broadcast receiving unit 1530 receives a broadcast signal and/or broadcast-related information from the outside through a broadcast channel. The broadcast channel may include a satellite channel and a terrestrial channel. Depending on the implementation, the electronic device 1000 may not include the broadcast receiving unit 1530.

The communication unit 1500 according to an embodiment may transmit the result of processing the neural network model 110 to an external device. Alternatively, the communication unit 1500 may receive information necessary for processing the neural network model 110 from an external device.

The A/V (Audio/Video) input unit 1600 is for inputting an audio signal or a video signal, and may include a camera 1610, a microphone 1620, and the like. The camera 1610 may obtain image frames, such as still images or moving images, through an image sensor in a video call mode or a photographing mode. An image captured through the image sensor may be processed by the processor 1300 or by a separate image processing unit (not shown). The microphone 1620 receives an external sound signal and processes it into electrical voice data.

According to an embodiment, the image data or voice data obtained by the A/V input unit 1600 may be used as input information of the neural network model 110 or may be used to update the neural network model 110. Without being limited to the above examples, the image data or voice data may be used in various ways for processing the neural network model 110.

The memory 1700 may store programs for the processing and control performed by the processor 1300, and may store data input to or output from the electronic device 1000.

The memory 1700 according to an embodiment may store information about the neural network model 110 and information about the performance of the plurality of processors. For example, the information about the performance of the plurality of processors may include the processing speed of each processor for a layer, information about the layers each processor is able to process, and information about the time required to switch data between different processors. Without being limited to the above examples, the information about the performance of the plurality of processors may include various types of information needed to allocate slices to processors.

The memory 1700 may include at least one type of storage medium among a flash memory type, a hard disk type, a multimedia card micro type, a card-type memory (for example, SD or XD memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), a magnetic memory, a magnetic disk, and an optical disk.

The programs stored in the memory 1700 may be classified into a plurality of modules according to their functions, for example, a UI module 1710, a touch screen module 1720, and a notification module 1730.

The UI module 1710 may provide a specialized UI, GUI, and the like that interoperate with the electronic device 1000 for each application. The touch screen module 1720 may detect a user's touch gesture on the touch screen and transmit information about the touch gesture to the processor 1300. The touch screen module 1720 according to some embodiments may recognize and analyze a touch code. The touch screen module 1720 may also be configured as separate hardware including a controller.

Various sensors may be provided inside or near the touch screen to detect a touch or a proximity touch on the touch screen. One example of a sensor for detecting a touch on the touch screen is a tactile sensor. A tactile sensor is a sensor that detects contact with a specific object to the degree that a person can feel it, or more sensitively. The tactile sensor may detect various kinds of information, such as the roughness of a contact surface, the hardness of a contacting object, and the temperature at a contact point.

The user's touch gestures may include tap, touch and hold, double tap, drag, pan, flick, drag and drop, swipe, and the like.

The notification module 1730 may generate a signal for notifying the occurrence of an event in the electronic device 1000.

FIG. 5 is a flowchart illustrating a method of processing a neural network model using a plurality of processors according to an embodiment.

Referring to FIG. 5, in step 510, the electronic device 1000 may allocate the plurality of layers constituting the neural network model 110 to at least one slice. The electronic device 1000 according to an embodiment may allocate the plurality of layers to at least one slice by determining whether each layer is a slice point.

For example, when a first layer is determined to be a slice point, a new first slice may be generated, and the first layer may be allocated to the first slice instead of the previous slice. The previous slice may refer to a slice to which at least one layer had already been allocated before it was determined whether the first layer is a slice point. In contrast, when the first layer is not determined to be a slice point, the first layer may be allocated to the previous slice.

According to an embodiment, after the first layer is allocated to the first slice, when a second layer is determined to be a slice point, a second slice may be newly generated and the second layer may be allocated to the newly generated second slice. In contrast, when the second layer is not determined to be a slice point, the second layer may be allocated to the first slice.
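As a non-limiting illustration, the slice-point rule described above may be sketched as a single pass over the layers. The sketch assumes the layers are visited in a fixed linear order and that a hypothetical predicate `is_slice_point` is supplied by the caller; how slice points are actually determined is outside this sketch.

```python
def assign_layers_to_slices(layers, is_slice_point):
    """Group layers into slices; a slice point starts a new slice.

    layers: layer names in the order in which they are examined (assumed).
    is_slice_point: hypothetical predicate returning True for slice points.
    """
    slices = []
    for layer in layers:
        if not slices or is_slice_point(layer):
            # Slice point (or the very first layer): generate a new slice.
            slices.append([layer])
        else:
            # Not a slice point: allocate to the previous slice.
            slices[-1].append(layer)
    return slices
```

With "conv2d_9" and "conv2d_6" treated as slice points, the layer "activation_6" examined after "conv2d_6" lands in the same slice as "conv2d_6".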

In step 520, after allocating the plurality of layers constituting the neural network model 110 to at least one slice, the electronic device 1000 may allocate each slice to the plurality of processors. The plurality of processors according to an embodiment may process the neural network model 110 by each processing the at least one layer included in the slice allocated to it.

According to an embodiment, the electronic device 1000 may allocate each slice to the plurality of processors based on the processing time each processor requires to process the slice. The processing time according to an embodiment may include not only the time each processor requires to process the slice but also the switching time required to receive data from another processor in order to process the slice.

In step 530, the electronic device 1000 may process the neural network model 110 using the plurality of processors based on the allocation result of step 520.

According to an embodiment, because a processor is allocated to each slice, a plurality of slices arranged in parallel in the neural network model 110 may be processed simultaneously by the plurality of processors. Therefore, according to an embodiment, the layers of the neural network model 110 may be processed faster than when the layers are arranged in a single sequence by a topological sort and processed one after another.
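As a non-limiting illustration, simultaneous processing of parallel slices may be sketched with worker threads standing in for separate processors. The thread pool, the representation of a layer as a Python callable, and the function names are assumptions made for this sketch; an actual device would dispatch each slice to the CPU, GPU, NPU, or DSP allocated to it.

```python
from concurrent.futures import ThreadPoolExecutor


def run_slice(layers, x):
    """Run the layers of one slice in order on input x."""
    for layer in layers:
        x = layer(x)
    return x


def run_parallel_slices(slices, x):
    """Process slices that branch from the same input at the same time.

    Each worker thread is a stand-in for the processor allocated to a slice.
    Assumes slices is non-empty. Results are returned in slice order.
    """
    with ThreadPoolExecutor(max_workers=len(slices)) as pool:
        futures = [pool.submit(run_slice, layers, x) for layers in slices]
        return [future.result() for future in futures]
```

A strictly sequential schedule would finish the first branch before starting the second, whereas here both branches are submitted before either result is awaited.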

FIG. 6 is a flowchart illustrating a method of allocating slices to a plurality of processors according to an embodiment.

Referring to FIG. 6, in step 610, the electronic device 1000 may obtain information about a slice s and a processor p.

In an embodiment, the slice s and the processor p may represent one slice s among the at least one slice and one processor p among the plurality of processors. According to an embodiment, in step 610, a processor capable of processing the layers included in the slice s, among the plurality of processors included in the electronic device 1000, may be determined as the processor p.

For example, the information about the slice s may include information about the tasks of at least one layer included in the slice s. The information about the processor p may include information about the performance of the processor p, such as information about the tasks that the processor p is able to perform, information about the time the processor p takes to process a task, and information about the switching time taken to transfer data between different processors. Without being limited to the above examples, the information about the slice s and the processor p may include various information that can be used to predict the processing time required for the processor p to process the slice s and the accuracy of the processing result.

In step 620, based on the information obtained in step 610, the electronic device 1000 may predict the accuracy of the processing result of the slice s when the slice s is processed by the processor p, and may determine whether the predicted accuracy is less than or equal to a reference value.

According to an embodiment, when the predicted accuracy is less than or equal to the reference value, whether a processor other than the processor p exists may be determined in step 650. The other processor may be a processor that is capable of processing the slice s and for which allocation to the slice s has not yet been decided, among the plurality of processors of the electronic device 1000.

According to an embodiment, in step 660, the electronic device 1000 may identify another processor capable of processing the slice s. The electronic device 1000 may repeat steps 610 to 640 for the identified processor.

In step 630, based on the information obtained in step 610, the electronic device 1000 may predict the time required for the processor p to process the tasks of the slice s. For example, the electronic device 1000 may determine the processing time of the processor p for each task type and, based on the determined processing times, obtain the processing time of the processor p for the tasks of the at least one layer included in the slice s. Without being limited to the above example, the processing time of the slice s by the processor p may be predicted in various ways.
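As a non-limiting illustration, the prediction of step 630 may be sketched by adding up per-task-type times over the layers of the slice. Summation, the `per_op_ms` table, and the convention of returning `None` for a task type the processor has no entry for are assumptions made for this sketch; the disclosure leaves the prediction method open.

```python
def estimate_slice_ms(layer_ops, per_op_ms):
    """Estimate how long processor p needs for the layers of slice s.

    layer_ops: task type of each layer in the slice, e.g. ["conv2d", "activation"].
    per_op_ms: assumed processing time of processor p for each task type.
    Returns None if p has no timing entry for some task type, which this
    sketch treats as "p cannot process the slice".
    """
    total = 0.0
    for op in layer_ops:
        if op not in per_op_ms:
            return None
        total += per_op_ms[op]
    return total
```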

In addition, in step 640, based on the information obtained in step 610, the electronic device 1000 may predict the switching time taken to transfer the input data of the slice s from another processor to the processor p.

For example, the input data of the slice s may be transferred to the processor p by the processor that processes the previous slice from which the input data is output. The previous slice according to an embodiment may refer to a slice that outputs the data to be input to the slice s. The switching time may represent the time taken to transfer the input data of the slice s from the processor that processes the previous slice to the processor p. When the processor that processes the previous slice is the same as the processor p, the switching time may be determined to be 0.

The switching time according to an embodiment may be predicted based on the time taken to transfer data between different processors of the electronic device 1000.

단계 650에서, 전자 장치(1000)는, 전자 장치(1000)의 프로세서들 중, 슬라이스 s에 대하여, 단계 620 내지 640에 따라, 정확도, 처리시간 및 스위치 시간 중 적어도 하나가 판단되지 않은, 다른 프로세서가 존재하는지 여부를 판단할 수 있다. 다른 프로세서가 존재하는 경우, 전자 장치(1000)는 단계 660에서 식별된 다른 프로세서에 대해 단계 610 내지 단계 640의 동작을 반복하여 수행할 수 있다.In operation 650, the electronic apparatus 1000 may determine another slice of the processors of the electronic apparatus 1000, for which slice s, at least one of accuracy, processing time, and switch time is not determined according to operations 620 to 640. It can be determined whether there exists. If there is another processor, the electronic apparatus 1000 may repeat the operations of steps 610 to 640 with respect to the other processor identified in step 660.

If, for slice s, there is no other processor for which at least one of the accuracy, the processing time, and the switching time has not been determined according to operations 620 to 640, the electronic device 1000 may, in operation 670, allocate slice s to one of the plurality of processors.

According to an embodiment, the processor that will process slice s may be determined based on each processor's processing time and switching time for slice s. For example, the processor with the smallest sum of processing time and switching time may be determined as the processor that will process slice s.

According to an embodiment, a processor whose accuracy is equal to or less than a reference value may be excluded in operation 670 rather than being determined as the processor that will process slice s. Therefore, even if its processing time or switching time is short, a processor that yields low-accuracy results may not be allocated to slice s.

According to an embodiment, after a processor has been allocated to slice s, operations 610 to 670 may be repeated for each other slice to which no processor has yet been allocated.
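As a non-limiting illustration, the per-slice selection of operation 670 might be sketched as follows. The `Estimate` type, the `pick_processor` function, and all numeric values are hypothetical and are not taken from the disclosure; the sketch only assumes that an accuracy, a processing time, and a switching time have already been estimated for every processor.

```python
from dataclasses import dataclass


@dataclass
class Estimate:
    processing_ms: float  # estimated time for the processor to run the slice
    switching_ms: float   # estimated time to receive the slice's input (0 if same processor)
    accuracy: float       # estimated accuracy of the processor's result


def pick_processor(estimates, reference_accuracy):
    """Return the processor with the smallest processing + switching time.

    Processors whose accuracy is equal to or less than the reference value
    are excluded before the comparison.
    """
    eligible = {p: e for p, e in estimates.items() if e.accuracy > reference_accuracy}
    if not eligible:
        raise ValueError("no processor exceeds the reference accuracy")
    return min(eligible, key=lambda p: eligible[p].processing_ms + eligible[p].switching_ms)


# Hypothetical estimates for one slice s.
estimates = {
    "CPU": Estimate(processing_ms=9.0, switching_ms=0.0, accuracy=0.99),
    "GPU": Estimate(processing_ms=4.0, switching_ms=2.0, accuracy=0.99),
    "NPU": Estimate(processing_ms=1.0, switching_ms=1.0, accuracy=0.80),
}
chosen = pick_processor(estimates, reference_accuracy=0.95)
```

In this made-up case the NPU has the shortest total time but is excluded for low accuracy, so the GPU is selected.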

FIG. 7 is a diagram illustrating an example of allocating processors to slices, according to an embodiment.

Referring to FIG. 7, 710 shows an example of a graph used to allocate processors to slices according to processing time and switching time. In graph 710, each node represents the time taken to process the work of a slice, and each arrow represents the time taken to switch data from the previous node to the current node.

According to an embodiment, unlike the method illustrated in FIG. 6, in which processors are allocated slice by slice in order, the processor that will process each slice may be determined according to the path with the smallest total processing time over its nodes. The processing time according to an embodiment may include not only the time taken for the processor at each node to process the work of the slice but also the switching time.

For example, among the at least one slice, slices arranged in series in the neural network model 110 may be allocated to the plurality of processors based on the path with the smallest total processing time among a plurality of paths, where the paths are generated by connecting a plurality of nodes 711, 721, 722, 723, 731, 732, 733, 741, and 742, each representing a different combination of slice and processor, according to the arrangement order of the slices.

The input data 711 to be input to slice 1 may be processed by the CPU, which is the default processor. Following the direction of the arrows, the input data 711 may be switched to one of the processors, such as the CPU, the GPU, and the NPU. Once the input data 711 has been switched to one of the processors, it may be input to the layers included in slice 1 and processed at one of the nodes 721, 722, and 723.

According to an embodiment, the NPU may be excluded from the processor allocation for slice 2 because the accuracy of the result of processing slice 2 on the NPU is equal to or less than the reference value, or because slice 2 includes a layer that the NPU cannot process.

According to an embodiment, as in graph 710, the allocation of the CPU, the GPU, and the NPU to slices 1 to 3 may be determined as one of a total of 36 paths. Processors may be allocated to slices 1 to 3 according to the path, among the 36 paths, with the smallest sum of the processing times and switching times of its nodes.
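As a non-limiting illustration, the path with the smallest total time could be found without enumerating every path, for example by dynamic programming over the slices. The disclosure does not specify a search technique, so the approach below is an assumption, and the `best_path` function, the processor names as dictionary keys, and all timing values are hypothetical.

```python
def best_path(proc_time, switch_time, start="CPU"):
    """Find the cheapest slice-to-processor assignment for slices in series.

    proc_time:   one dict per slice, mapping each eligible processor to its
                 processing time for that slice.
    switch_time: dict mapping (source, destination) to the switching time;
                 staying on the same processor costs 0.
    start:       processor that holds the model input (assumed here).
    """
    def switch(src, dst):
        return 0.0 if src == dst else switch_time[(src, dst)]

    # best[p] = (cost, path) of the cheapest path that ends on processor p.
    best = {start: (0.0, [])}
    for times in proc_time:
        nxt = {}
        for p, t in times.items():
            cost, path = min(
                ((c + switch(q, p) + t, pa) for q, (c, pa) in best.items()),
                key=lambda item: item[0],
            )
            nxt[p] = (cost, path + [p])
        best = nxt
    return min(best.values(), key=lambda item: item[0])


procs = ["CPU", "GPU", "NPU"]
switch_time = {(a, b): 3.0 for a in procs for b in procs if a != b}
proc_time = [
    {"CPU": 10, "GPU": 4, "NPU": 2},  # slice 1
    {"CPU": 8, "GPU": 3},             # slice 2: NPU excluded
    {"CPU": 6, "GPU": 5, "NPU": 1},   # slice 3
]
cost, path = best_path(proc_time, switch_time)
```

With these made-up numbers, choosing the cheapest processor slice by slice would start on the NPU and end with a total of 15, whereas the path search keeps slices 1 and 2 on the GPU and reaches a total of 14.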

750 is a graph illustrating an example in which processors are allocated to slices according to an embodiment. A processor may be allocated to each of the slices 751, 752, 753, 754, 755, and 756, each of which includes at least one layer.

GPU 16 and GPU 32 shown in 750 denote a 16-bit GPU and a 32-bit GPU, respectively, and may be treated as different processors in an embodiment.

According to an embodiment, the slices 752, 754, and 755 are allocated to different processors and may therefore be processed simultaneously in parallel. Accordingly, some layers arranged in parallel in the neural network model 110 may be processed simultaneously by a plurality of processors, which may further improve the processing speed.

According to an embodiment, the slices 755 and 756, to which the same processor is allocated, may be processed by the NPU after their layers have been arranged in series so that the NPU can process them together. For example, the layers may be ordered by a topological sort and then processed together by the NPU.

According to an embodiment, at least one slice may be allocated to the plurality of processors by a method that combines the method of 710 in FIG. 7 with the method illustrated in FIG. 6.

Slices arranged in series in the neural network model 110, for example the slices 751, 752, and 753, may be allocated processors as in the method illustrated in FIG. 7, that is, based on the path with the smallest total processing time among a plurality of paths generated by connecting, in the arrangement order of the slices, a plurality of nodes that represent different combinations of slice and processor.

Slices arranged in parallel in the neural network model 110, for example the slices 754, 755, and 756, may be allocated processors according to the method illustrated in FIG. 6, based on each processor's processing time, switching time, and accuracy for the slice.

FIG. 8 is a diagram illustrating an example in which memory is allocated in a layer, according to an embodiment.

According to an embodiment, a convolution layer among the layers may include, as in the example of FIG. 8, work in which a 1x1 convolution 812, a depthwise convolution (DC) 813 to 815, and a 1x1 convolution 816 are performed repeatedly from channel 1 to channel N. The work for channel 1 may be performed in operation 810, and the work for channel 2 may be performed in operation 820 after operation 810. Likewise, the work for channel N may be performed in operation 830 after the operations for the preceding channels.

In an embodiment, a channel may correspond to each of a plurality of equally sized pieces of data in the input data of a layer. For example, the first piece of data may be processed in the first channel and the second piece of data in the second channel. Therefore, when a plurality of pieces of image data of the same size are input to a layer, the input data may be processed sequentially, channel by channel, with as many channels as there are pieces of input data.

According to an embodiment, the tasks in a convolution layer may be processed sequentially, task by task. For example, the 1x1 convolution 812 may be performed on the input data of all channels, then the DC 813 to 815 may be performed on the result of the 1x1 convolution 812, and then the 1x1 convolution 816 may be performed on the result of the DC 813 to 815.

According to an embodiment, instead of processing sequentially task by task as described above, the work may be performed channel by channel, from channel 1 to channel N. For example, when there are 24 pieces of 56x56 input data, there are 24 channels, and the work for channels 1 to 24 may be performed sequentially, one channel at a time.

According to an embodiment, when the work is performed sequentially channel by channel, 294 KB of memory may be allocated to store the 24 pieces of 56x56 data 811, 821, and 831 that are input to the convolution layer. For example, when the input data is image data, memory for storing a plurality of pieces of image data of the same size may be allocated.

According to an embodiment, when the work is processed sequentially channel by channel, memory for the data of only one channel may be allocated instead of memory for the data of all channels. Therefore, processing channel by channel may require less memory for data storage than processing task by task.

In operation 810, the work of channel 1, namely the 1x1 convolution 812, the DC 813, 814, and 815, and the 1x1 convolution 816, may be performed sequentially. The 1x1 convolution 812 in operation 810 may be performed on the first of the 24 pieces of 56x56 data of 811 and the first of the 24 pieces of 1x24 data of 812.

According to an embodiment, for the 1x1 convolution 812, instead of allocating memory for all 24 pieces of 1x24 data of 812, memory may be allocated only for the first of those 24 pieces, which is the only one used in operation 810. The 24 pieces of 1x24 data used for the 1x1 convolution 812 may be values stored in advance in the electronic device 1000.

In operation 820, the second piece of 1x24 data, 822, may be stored in the same memory space that was allocated for the data of 812 in operation 810, without any additional memory allocation.

Thus, according to an embodiment, for the 1x1 convolutions 812, 822, and 832 performed in each of the 24 channels, only 0.09 KB of memory, enough to store one piece of 1x24 data, may be allocated instead of 14 KB of memory, enough to store all 24 pieces of 1x24 data. According to an embodiment, the memory space allocated for the 1x1 convolution in the previous channel may be reused for the 1x1 convolution in the current channel. Therefore, the memory space required to process the layer in each channel may be reduced.

Likewise, for the DC 813, 814, and 815, memory may be allocated only for the data processed in 813 to 815 rather than for all data. 813 is the data obtained by performing the 1x1 convolution 812 and may include 56x56 data. 814 is the 3x3 kernel data used for the DC, which may be values stored in advance in the electronic device 1000 for DC processing. 815 is the data obtained as the result of the DC and may include 56x56 data.

According to an embodiment, to process the DC 813, 814, and 815 in channel 1, 24.53 KB of memory may be allocated to store the 56x56 data of 813 (12.25 KB), the 3x3 kernel data of 814 (0.03 KB), and the 56x56 data of 815 (12.25 KB). In operation 820 for channel 2, the data of 823, 824, and 825 may be stored in the same memory space that was allocated for the data of 813, 814, and 815 in operation 810, without any additional memory allocation.

Likewise, for the 1x1 convolution 816, memory may be allocated only for the data processed in 816 rather than for all data. In 816, 144 pieces of 1x1 data may be used in the 1x1 convolution 816. Accordingly, 0.09 KB of memory, in which the 144 pieces of 1x1 data can be stored, may be allocated. In operation 820, the data of 826 may be stored in the same memory space that was allocated for the data of 816 in operation 810, without any additional memory allocation.

As a result of the 1x1 convolution 816, 24 pieces of 56x56 data may be obtained at 817. According to an embodiment, 294 KB of memory may be allocated to store the 24 pieces of 56x56 data obtained at 817. In operation 820, as with 817, 24 pieces of 56x56 data may be obtained at 827 as a result of the 1x1 convolution 826. In channels 3 to N as well, 24 pieces of 56x56 data may be obtained for each channel as a result of the 1x1 convolution, the DC, and the 1x1 convolution.

According to an embodiment, the final result of the convolution layer may be obtained by combining the 24 pieces of 56x56 data output from each channel.

Accordingly, a total of 612 KB of memory space, the sum of the sizes of the data used to process the work 810 of channel 1, may be allocated. According to an embodiment, because the work of each channel is performed sequentially, channels 2 to N (820, 830) may reuse the 612 KB of memory space allocated for channel 1 without any additional memory allocation. Therefore, only 612 KB of memory space may be allocated and used to process all the work of channels 1 to N (810, 820, 830) of the convolution layer.
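As a non-limiting illustration, the 612 KB total can be retraced from the per-buffer figures quoted for FIG. 8. The sketch below assumes 4 bytes per value, which the description does not state but which reproduces the 294 KB and 12.25 KB figures exactly; the helper name `size_kb` and the variable names are hypothetical.

```python
BYTES_PER_VALUE = 4  # assumption: 32-bit values (not stated in the description)


def size_kb(*shape):
    """Size in KB of a buffer with the given dimensions."""
    count = 1
    for dim in shape:
        count *= dim
    return count * BYTES_PER_VALUE / 1024


input_kb = size_kb(24, 56, 56)   # 24 pieces of 56x56 data (811)
plane_kb = size_kb(56, 56)       # one piece of 56x56 data (813 or 815)
weights_kb = size_kb(1, 24)      # one piece of 1x24 data (812), about 0.09 KB
kernel_kb = size_kb(3, 3)        # 3x3 kernel data (814), about 0.03 KB

# Sum of the buffer sizes exactly as quoted in the description:
# input 811, 1x1 convolution 812, DC 813 to 815, 1x1 convolution 816, output 817.
total_kb = 294 + 0.09 + (12.25 + 0.03 + 12.25) + 0.09 + 294
```

The sum comes to about 612.7 KB, which corresponds to the 612 KB stated for channel 1 and reused by the remaining channels.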

According to an embodiment, the size of the memory space used to process a layer is not limited to the above example; memory spaces of various sizes may be allocated and used depending on the size of the data being processed.

Although FIG. 8 illustrates an example in which input data is processed by a convolution layer, according to an embodiment a plurality of pieces of input data may be input to various other types of layers as well. According to an embodiment, in the same way as for the convolution layer described above, memory for another type of layer may be allocated, for the work of all of the layer's channels, in the amount needed to process the work of the first of the channels. The remaining channels may reuse the memory allocated for the work of the first channel without any additional memory allocation.

FIG. 9 is a flowchart illustrating a method of allocating memory for the input and output data of layers, according to an embodiment.

According to an embodiment, memory may be allocated to blobs, each of which corresponds to input or output data of a layer. The neural network model 110 according to an embodiment may consist of layers, in which work is performed, and blobs, each of which corresponds to data input to or output from a layer.

According to an embodiment, in addition to the blobs for the input and output data of a layer, blobs for intermediate data produced by the internal functions of the layer may also be identified. The electronic device 1000 may therefore allocate memory with the blobs for this intermediate data also taken into account.

In an embodiment, memory allocation refers to allocating, in advance at the compile stage, the memory space in which data will be stored. In the memory allocation according to an embodiment, the blobs that will share the same memory space are determined first, and the size of each memory space may then be determined based on the data sizes of those blobs.

Referring to FIG. 9, in operation 910, the electronic device 1000 may identify the layers allocated to each processor. The electronic device 1000 may identify at least one layer included in at least one slice that has been allocated to a first processor among the plurality of processors by the slice allocation method according to an embodiment. The electronic device 1000 may perform the memory allocation according to an embodiment separately for each processor.

In operation 920, the electronic device 1000 may determine the execution order of the at least one layer allocated to one processor. According to an embodiment, the electronic device 1000 may determine the processing order of the at least one layer by a topological sort. The processing order is not limited to this example and may be determined by various other methods.
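As a non-limiting illustration, an execution order for the layers allocated to one processor could be obtained with the topological sorter in the Python standard library (`graphlib`, Python 3.9 or later). The layer names and the dependency graph below are hypothetical; each layer is mapped to the layers whose outputs it consumes.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical layer graph: key = layer, value = layers it depends on.
dependencies = {
    "conv1": set(),
    "relu1": {"conv1"},
    "conv2a": {"relu1"},
    "conv2b": {"relu1"},
    "concat": {"conv2a", "conv2b"},
}

# static_order() yields every layer after all of the layers it depends on.
execution_order = list(TopologicalSorter(dependencies).static_order())
```

Any order produced this way runs each layer only after its inputs are available; the relative order of independent layers such as `conv2a` and `conv2b` is not fixed by the sort.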

In operation 930, the electronic device 1000 may identify the internal functions included in each layer, and in operation 940 it may identify the blobs that hold data temporarily stored, by the identified internal functions, inside the layer that contains those functions. According to an embodiment, the blobs input to and output from each layer may already have been identified from information about the structure of the neural network model 110.

For example, when a layer includes two or more internal functions, there may be a blob for intermediate data that is temporarily stored between two internal functions. The blobs identified in operation 940 are not limited to this example: according to an embodiment, among the blobs for data input to or output from the internal functions of a layer, those other than the blobs for the layer's own input and output data may be identified in operation 940.

According to an embodiment, a blob that is temporarily stored because of an internal function of a layer also needs memory in which its data is stored while the work is performed, just like the input and output data of the layer. Therefore, the electronic device 1000 may perform memory allocation with both the blobs for the input and output data of the layers and the temporarily stored blobs inside the layers taken into account.

In operation 950, the electronic device 1000 may allocate memory for storing at least one blob among the blobs for the input and output data of each layer and the blobs identified in operation 940 for the internal functions of each layer.

According to an embodiment, memory may be allocated to the current blob based on whether, in the processing order of the layers, the usage interval of a previous blob ends before the data of the current blob is generated. For example, when the usage interval of the previous blob ends before the data of the current blob is generated, the same memory that was allocated to the previous blob may be allocated to the current blob.

The usage interval according to an embodiment may be determined based on the interval in which the blob's data is used by at least one layer. For example, the usage interval may indicate the interval in which the blob's data is read from or written to the memory space (e.g., a buffer) where it is stored as the work of each layer is performed.

The interval in which the blob's data is used by at least one layer, which determines the usage interval, may in turn be determined from the lifetime of the blob. For example, the lifetime of a blob may be a preset value or may be determined based on the execution interval of the layer functions that process the blob. The usage interval of a blob is not limited to these examples and may be determined by various methods.

For example, once the work of a first layer is finished, the input data of the first layer and the data temporarily stored by the internal functions of the first layer may no longer be used. Therefore, the temporarily stored data and the output data of a second layer that runs after the first layer may use the memory space in which the input data of the first layer and the data temporarily stored by its internal functions were stored.
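As a non-limiting illustration, the reuse rule described above might be sketched as a first-fit assignment over usage intervals. The `assign_buffers` function, the blob names, and the step numbering (layer 1 runs at step 0, layer 2 at step 1) are hypothetical; the disclosure does not prescribe this particular search order.

```python
def assign_buffers(blobs):
    """Assign each blob a buffer index, reusing buffers whose data is dead.

    blobs: list of (name, produced_at, last_used_at) tuples, ordered by the
           step at which each blob's data is generated.
    A buffer may be reused only if the usage interval of the blob it holds
    ended before the current blob's data is generated.
    """
    busy_until = []  # busy_until[i] = last step at which buffer i's blob is used
    assignment = {}
    for name, produced_at, last_used_at in blobs:
        for buf, last_use in enumerate(busy_until):
            if last_use < produced_at:
                break  # previous occupant is no longer used: reuse this buffer
        else:
            buf = len(busy_until)  # no reusable buffer: open a new one
            busy_until.append(last_used_at)
        busy_until[buf] = last_used_at
        assignment[name] = buf
    return assignment


blobs = [
    ("layer1_input", -1, 0),  # exists before layer 1, read only by layer 1
    ("layer1_temp", 0, 0),    # intermediate data inside layer 1
    ("layer1_output", 0, 1),  # written by layer 1, read by layer 2
    ("layer2_temp", 1, 1),    # intermediate data inside layer 2
    ("layer2_output", 1, 1),  # written by layer 2
]
assignment = assign_buffers(blobs)
```

Under these assumptions the temporary data of the second layer lands in the buffer of the first layer's input, and the second layer's output lands in the buffer of the first layer's temporary data, while the first layer's output keeps a buffer of its own because layer 2 still reads it.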

The electronic device 1000 according to an embodiment may set the size of a memory to the largest of the data sizes of the at least one blob to which that memory is allocated.

For example, blob 1, which represents the input data of the first layer, and blob 2, which represents the temporarily stored data of the first layer, may be determined to share memory with blob 3, which represents the temporarily stored data of the second layer, and blob 4, which represents the output data of the second layer, respectively. That is, a first memory may be allocated to blob 1 and blob 3, and a second memory may be allocated to blob 2 and blob 4.

When the data size of blob 1 is 1 kb and the data size of blob 3 is 2 kb, the first memory may be allocated with a size of 2 kb, the largest data size among the blobs allocated to the first memory. Likewise, when the data size of blob 2 is 4 kb and the data size of blob 4 is 5 kb, the second memory may be allocated with a size of 5 kb, the largest data size among the blobs allocated to the second memory.
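As a non-limiting illustration, the sizing rule can be written in a few lines using the blob sizes from the example above. The function name `memory_sizes` and the labels `memory1` and `memory2` are hypothetical.

```python
def memory_sizes(assignment, blob_size):
    """Size each memory to the largest blob that is allocated to it."""
    sizes = {}
    for blob, memory in assignment.items():
        sizes[memory] = max(sizes.get(memory, 0), blob_size[blob])
    return sizes


# Blobs 1 and 3 share the first memory; blobs 2 and 4 share the second.
assignment = {"blob1": "memory1", "blob3": "memory1", "blob2": "memory2", "blob4": "memory2"}
blob_size = {"blob1": 1, "blob2": 4, "blob3": 2, "blob4": 5}  # sizes in kb, as in the example

sizes = memory_sizes(assignment, blob_size)
```

The result is 2 kb for the first memory and 5 kb for the second, 7 kb in total, compared with 12 kb if each of the four blobs had its own memory.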

FIG. 10 is a diagram illustrating an example of identifying a blob inside a layer, according to an embodiment.

Referring to FIG. 10, for the layer "conv2d_65" 1010 among the plurality of layers included in the neural network model 110, the internal functions "im2col" 1011 and "sgemm" 1013 may be identified. In addition, the blob b₂ 1012, which is temporarily stored between the internal functions 1011 and 1013, may be identified.

The blob b₁ 1020 and the blob b₃ 1030 according to an embodiment represent the input data and the output data of the "conv2d_65" 1010 layer, respectively.

Therefore, according to an embodiment, the blob b₁ 1020, the blob b₂ 1012, and the blob b₃ 1030 may be identified for the "conv2d_65" 1010 layer, and memory allocation may be performed for the identified blobs.

According to an embodiment, memory may be allocated to each blob based on whether the usage interval of a previous blob ends before the data of the current blob is generated.

For example, the blob b₁ 1020 may no longer be used once the processing of "im2col" 1011 has finished and "sgemm" 1013 starts. The usage interval of the blob b₁ 1020 may thus end before the data of the blob b₃ 1030 is generated. Therefore, according to an embodiment, the same memory may be allocated to the blob b₁ 1020 and the blob b₃ 1030. For example, when "sgemm" 1013 is processed, the previously stored data of the blob b₁ 1020 may be deleted from the memory space allocated to it, and the data of the blob b₃ 1030 may then be stored in that same space.

In addition, blob b₂ (1012) may be reallocated memory that was allocated to a blob, among the blobs existing before blob b₁ (1020), whose usage interval ends before blob b₂ (1012) is generated.
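
The reuse condition described for FIG. 10 can be sketched as a simple interval test. This is a minimal illustration, not the disclosed implementation: the `Blob` class, the `can_share_memory` helper, and the step indices (0 for "im2col", 1 for "sgemm") are hypothetical names and values introduced only for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Blob:
    name: str
    first_use: int  # step at which the blob's data is generated or first read
    last_use: int   # last step that reads the blob


def can_share_memory(earlier: Blob, later: Blob) -> bool:
    """Return True if `earlier` is no longer used when `later` is generated."""
    return earlier.last_use < later.first_use


# Assumed step indices inside "conv2d_65": 0 = im2col, 1 = sgemm.
b1 = Blob("b1 (layer input)", first_use=0, last_use=0)    # read only by im2col
b2 = Blob("b2 (im2col output)", first_use=0, last_use=1)  # written by im2col, read by sgemm
b3 = Blob("b3 (layer output)", first_use=1, last_use=1)   # written by sgemm

print(can_share_memory(b1, b3))  # True: b1 is dead before b3 is generated
print(can_share_memory(b1, b2))  # False: im2col needs b1 and b2 at the same time
print(can_share_memory(b2, b3))  # False: sgemm needs b2 and b3 at the same time
```

Under these assumed intervals, only b₁ and b₃ can occupy the same memory, which matches the pairing described above.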

FIG. 11 illustrates an example of allocating memory to the blobs of the neural network model 110, including blobs inside layers, according to an embodiment.

According to an embodiment, blobs (1101, 1103, 1104, 1105, 1106, 1108, 1109, 1110, 1111) representing the input/output data of a plurality of layers may be identified. In addition, blobs (1102, 1107) representing data that may be temporarily stored by the internal functions of each layer may be identified.

In FIG. 11, hatched blocks represent layers in which operations may be performed, and unhatched blocks represent blobs holding data.

According to an embodiment, the data blob (1101), the col_buffer1 blob (1102), and the conv1 blob (1103) may be allocated to different memories because their usage intervals overlap one another. For example, the data blob (1101), the col_buffer1 blob (1102), and the conv1 blob (1103) may be allocated to the b1, b2, and b3 memories, respectively.

In addition, after being used in the conv1 layer, the data blob (1101) and the col_buffer1 blob (1102) may no longer be used from the relu1 layer onward, which is processed after the conv1 layer. Accordingly, the usage intervals of the data blob (1101) and the col_buffer1 blob (1102) may end before the relu1 blob (1104) is generated. The relu1 blob (1104) may therefore be allocated to the b1 or b2 memory, that is, the same memory as the data blob (1101) or the col_buffer1 blob (1102). In this embodiment, the relu1 blob (1104) is assumed to be allocated to the b1 memory.

In addition, the norm1 blob (1105) may be allocated to a memory that stores a blob whose usage interval ends before the norm1 blob (1105) is generated. For example, the norm1 blob (1105) may be allocated to the b2 or b3 memory. In this embodiment, the norm1 blob (1105) is assumed to be allocated to the b2 memory.

In addition, the pool1 blob (1106) may be allocated to the b1 or b2 memory, which stores a blob whose usage interval ends before the pool1 blob (1106) is generated. In this embodiment, the pool1 blob (1106) is assumed to be allocated to the b1 memory.

In addition, the col_buffer2 blob (1107) may be allocated to the b2 or b3 memory, which stores a blob whose usage interval ends before the col_buffer2 blob (1107) is generated. In this embodiment, the col_buffer2 blob (1107) is assumed to be allocated to the b2 memory.

In addition, the conv2 blob (1108) may be allocated to the b3 memory, which stores a blob whose usage interval ends before the conv2 blob (1108) is generated.

In addition, the relu2 blob (1109) may be allocated to the b1 or b2 memory, which stores a blob whose usage interval ends before the relu2 blob (1109) is generated. In this embodiment, the relu2 blob (1109) is assumed to be allocated to the b1 memory.

In addition, the norm2 blob (1110) may be allocated to the b2 or b3 memory, which stores a blob whose usage interval ends before the norm2 blob (1110) is generated. In this embodiment, the norm2 blob (1110) is assumed to be allocated to the b2 memory.

In addition, the pool2 blob (1111) may be allocated to the b1 or b2 memory, which stores a blob whose usage interval ends before the pool2 blob (1111) is generated. In this embodiment, the pool2 blob (1111) is assumed to be allocated to the b1 memory.
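
The walk through FIG. 11 above can be reproduced with a first-fit pass over the blobs in creation order. This is only a sketch under stated assumptions: the embodiment merely assumes one of several admissible memories for each blob, so the first-fit rule, the `assign_buffers` function, and the step indices (0 = conv1 through 7 = pool2, with each output blob read only by the next layer) are hypothetical choices made for illustration.

```python
def assign_buffers(blobs):
    """First-fit buffer assignment.

    blobs: list of (name, first_step, last_step) tuples in creation order.
    Returns a dict mapping each blob name to a buffer label ("b1", "b2", ...).
    """
    busy_until = []  # per buffer: last step at which its current blob is used
    assignment = {}
    for name, first_step, last_step in blobs:
        for idx, end in enumerate(busy_until):
            if end < first_step:  # previous occupant is dead, so reuse the buffer
                busy_until[idx] = last_step
                break
        else:  # every buffer is still in use, so open a new one
            busy_until.append(last_step)
            idx = len(busy_until) - 1
        assignment[name] = f"b{idx + 1}"
    return assignment


# Assumed usage intervals; layer steps: conv1=0, relu1=1, norm1=2, pool1=3,
# conv2=4, relu2=5, norm2=6, pool2=7.
FIG11_BLOBS = [
    ("data", 0, 0), ("col_buffer1", 0, 0), ("conv1", 0, 1),
    ("relu1", 1, 2), ("norm1", 2, 3), ("pool1", 3, 4),
    ("col_buffer2", 4, 4), ("conv2", 4, 5), ("relu2", 5, 6),
    ("norm2", 6, 7), ("pool2", 7, 7),
]

assignment = assign_buffers(FIG11_BLOBS)
for name, buf in assignment.items():
    print(f"{name:12s} -> {buf}")
```

With these assumed intervals, three buffers suffice and the resulting assignment coincides with the one assumed in the embodiment (for example, relu1 in b1, norm1 in b2, conv2 in b3).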

According to an embodiment, after each blob is allocated to the b1, b2, or b3 memory according to the above-described method, the sizes of the b1, b2, and b3 memories may be determined based on the largest data size among the blobs allocated to each memory.

According to an embodiment, the data size of each blob may have the values shown in Table 2 below.

Blob name      Type                 Data size (MB)   Allocated memory
data           Input/output data    0.59             b1
col_buffer1    Internal data        4.19             b2
conv1          Input/output data    1.11             b3
relu1          Input/output data    1.11             b1
norm1          Input/output data    1.11             b2
pool1          Input/output data    0.27             b1
col_buffer2    Internal data        6.67             b2
conv2          Input/output data    0.71             b3
relu2          Input/output data    0.71             b1
norm2          Input/output data    0.71             b2
pool2          Input/output data    0.17             b1

According to an embodiment, the size of the b1 memory may be determined based on 1.11 MB, which is the largest data size among the data blob (1101), the relu1 blob (1104), the pool1 blob (1106), the relu2 blob (1109), and the pool2 blob (1111) allocated to the b1 memory.

In addition, the size of the b2 memory may be determined based on 6.67 MB, which is the largest data size among the col_buffer1 blob (1102), the norm1 blob (1105), the col_buffer2 blob (1107), and the norm2 blob (1110) allocated to the b2 memory.

In addition, the size of the b3 memory may be determined based on 1.11 MB, which is the largest data size among the conv1 blob (1103) and the conv2 blob (1108) allocated to the b3 memory.

Therefore, according to an embodiment, a memory space of 8.89 MB, which is the sum of the b1, b2, and b3 memory sizes (1.11 MB + 6.67 MB + 1.11 MB), may be used to store the blobs illustrated in FIG. 11.
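
The sizing rule can be checked against the values in Table 2 with a few lines of code. The sketch below is illustrative only; the `buffer_sizes` helper and the `TABLE_2` list are hypothetical names, and the comparison with a no-reuse allocation (one memory per blob) is added here for context and is not a figure stated in the disclosure.

```python
# (blob name, data size in MB, allocated memory), copied from Table 2
TABLE_2 = [
    ("data", 0.59, "b1"), ("col_buffer1", 4.19, "b2"), ("conv1", 1.11, "b3"),
    ("relu1", 1.11, "b1"), ("norm1", 1.11, "b2"), ("pool1", 0.27, "b1"),
    ("col_buffer2", 6.67, "b2"), ("conv2", 0.71, "b3"), ("relu2", 0.71, "b1"),
    ("norm2", 0.71, "b2"), ("pool2", 0.17, "b1"),
]


def buffer_sizes(rows):
    """Size each shared memory by the largest blob allocated to it."""
    sizes = {}
    for _name, size_mb, buf in rows:
        sizes[buf] = max(sizes.get(buf, 0.0), size_mb)
    return sizes


sizes = buffer_sizes(TABLE_2)
total_shared = round(sum(sizes.values()), 2)
total_unshared = round(sum(size for _name, size, _buf in TABLE_2), 2)

print(sizes)           # {'b1': 1.11, 'b2': 6.67, 'b3': 1.11}
print(total_shared)    # 8.89
print(total_unshared)  # 17.35
```

The shared total agrees with the 8.89 MB stated above; allocating a separate memory for each of the eleven blobs would instead require the sum of all sizes in Table 2.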

If memory is allocated only for the blobs of input/output data, without considering the blobs of internal data, memory for storing the internal data must be allocated separately, so the total allocated memory space may increase.

According to an embodiment, however, memory is allocated in consideration of not only the blobs of input/output data but also the blobs of internal data generated by the internal functions of a layer, so a smaller memory space may be allocated.

FIG. 12 is a diagram illustrating an example in which a neural network model is processed by a plurality of processors according to an embodiment.

According to the method of allocating layers to a plurality of processors according to an embodiment, layers 1 to 4 (1201, 1202, 1203, 1204) may be processed by a CPU, a GPU, a GPU, and a CPU, respectively.

Memory may be allocated to the blobs of input/output data and the blobs of internal data of layer 1 (1201) and layer 4 (1204), which are processed by the CPU, according to the memory allocation method of an embodiment. For example, at least one of the memories allocated to the blobs of layer 1 (1201) (for example, blobs containing the input/output data or internal data of layer 1) may be reallocated to a blob of layer 4 (1204).

According to an embodiment, when blobs for a plurality of internal data exist in the same layer, the same memory may be allocated to more than one of those blobs.

According to an embodiment, blobs 1 to 4 (1205, 1206, 1207, 1208) may exist in layer 1 (1201) for different internal data generated by different internal functions of layer 1 (1201). For example, the output data of serially arranged internal functions 1 to 4 may be represented by blobs 1 to 4 (1205, 1206, 1207, 1208), respectively. In addition, the data of blob 4 (1208) may be used as the input data of internal function 5.

According to an embodiment, the operation of internal function 3, which generates blob 3 (1207), may be performed after the operation of internal function 1 for blob 1 (1205) has finished, so the usage interval of blob 1 (1205) may end before blob 3 (1207) is generated. Accordingly, the memory allocated to blob 1 (1205) may be reallocated to blob 3 (1207).

Likewise, the operation of internal function 4, which generates blob 4 (1208), may be performed after the operation of internal function 2 for blob 2 (1206) has finished, so the usage interval of blob 2 (1206) may end before blob 4 (1208) is generated. Accordingly, the memory allocated to blob 2 (1206) may be reallocated to blob 4 (1208).
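
The alternating reuse among blobs 1 to 4 amounts to a two-slot ("ping-pong") scheme. The following sketch assumes that blob i is written by internal function i and read only by internal function i + 1; the disclosure states this explicitly only for blob 4, so the `lifetime` and `slot_for` helpers and the slot numbering are hypothetical.

```python
def lifetime(blob_index: int) -> tuple:
    """Assumed usage interval: written by function i, last read by function i + 1."""
    return (blob_index, blob_index + 1)


def slot_for(blob_index: int) -> int:
    """Alternate between two memory slots: odd blobs use slot 0, even blobs slot 1."""
    return (blob_index - 1) % 2


blob_slots = {i: slot_for(i) for i in range(1, 5)}
print(blob_slots)  # {1: 0, 2: 1, 3: 0, 4: 1}

# Blobs that share a slot never overlap in time under the assumed lifetimes.
for i in (1, 2):
    _, last_read = lifetime(i)
    generated, _ = lifetime(i + 2)
    assert last_read < generated

# Consecutive blobs do overlap (function i + 1 reads blob i while writing blob i + 1),
# so they must occupy different slots.
for i in (1, 2, 3):
    assert blob_slots[i] != blob_slots[i + 1]
```

Under this assumption, blob 1 and blob 3 share one memory and blob 2 and blob 4 share the other, as in the embodiment.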

An embodiment may also be implemented in the form of a recording medium containing instructions executable by a computer, such as a program module executed by the computer. A computer-readable medium may be any available medium that can be accessed by a computer, and includes volatile and nonvolatile media as well as removable and non-removable media. In addition, the computer-readable medium may include both computer storage media and communication media. Computer storage media include volatile and nonvolatile, removable and non-removable media implemented by any method or technology for storing information such as computer-readable instructions, data structures, program modules, or other data. Communication media typically include computer-readable instructions, data structures, or program modules, and include any information delivery medium.

In addition, in this specification, a "unit" may be a hardware component such as a processor or a circuit, and/or a software component executed by a hardware component such as a processor.

The foregoing description of the present invention is provided for illustration, and those of ordinary skill in the art to which the present invention pertains will understand that it may be easily modified into other specific forms without changing the technical spirit or essential features of the present invention. Therefore, the embodiments described above should be understood as illustrative in all respects and not restrictive. For example, each component described as a single unit may be implemented in a distributed manner, and components described as distributed may likewise be implemented in a combined form.

The scope of the present invention is defined by the following claims rather than by the detailed description above, and all changes or modifications derived from the meaning and scope of the claims and their equivalents should be construed as falling within the scope of the present invention.

Claims

A method of processing a neural network model using a plurality of processors in an electronic device, the method comprising:
allocating a plurality of layers included in the neural network model to at least one slice;
assigning the at least one slice to the plurality of processors based on a processing time of each of the plurality of processors for each of the at least one slice; and
processing the neural network model using the plurality of processors based on a result of the assigning,
wherein the processing time includes a switching time taken for a current processor to receive data necessary for processing a current slice from a previous processor that processes a previous slice.

The method of claim 1,
wherein the plurality of layers are allocated to the at least one slice by determining at least one layer of the plurality of layers as a slice point, and
wherein the slice point is determined based on at least one of: whether each of the plurality of layers is a point at which a plurality of layers branch, whether it is a point at which a plurality of layers are combined, whether it includes a task that can be processed by the same processor, and whether it includes a task requiring high accuracy.

The method of claim 1, wherein the at least one slice is
allocated to the plurality of processors based on a path having the smallest sum of processing times among a plurality of paths generated by connecting, in the arrangement order of the at least one slice, a plurality of nodes each representing a different combination of a slice and a processor.

The method of claim 1,
wherein a plurality of input data input to any one of the plurality of layers are processed in the layer sequentially for each channel, as many times as the number of the plurality of input data, and
wherein, for a plurality of channels of the layer, memory is allocated in the amount needed to process a task in a first channel of the plurality of channels.

The method of claim 1, further comprising:
identifying at least one layer included in at least one slice assigned to a first processor of the plurality of processors;
identifying at least one blob representing at least one of data input to the identified layer, data output from the identified layer, and data temporarily stored inside the identified layer; and
allocating memory for storing data of the identified blob.

The method of claim 5, wherein allocating the memory comprises:
determining a processing order of the identified layers; and
allocating, based on the determined processing order, memory for a current blob by determining whether a usage interval of a previous blob ends before data of the current blob is generated.

The method of claim 5, wherein allocating the memory comprises:
determining a size of the memory based on the largest data size among the data sizes of at least one blob to which the same memory is allocated.

An electronic device for processing a neural network model, the electronic device comprising:
a memory storing the neural network model;
at least one processor configured to allocate a plurality of layers included in the neural network model to at least one slice, assign the at least one slice to a plurality of processors based on a processing time of each of the plurality of processors for each of the at least one slice, and process the neural network model based on a result of the assigning; and
an output unit configured to output a result of processing the neural network model,
wherein the processing time includes a switching time taken for a current processor to receive data necessary for processing a current slice from a previous processor that processes a previous slice.

The electronic device of claim 8, wherein the plurality of layers are allocated to the at least one slice by determining at least one layer of the plurality of layers as a slice point, and
wherein the slice point is determined based on at least one of: whether each of the plurality of layers is a point at which a plurality of layers branch, whether it is a point at which a plurality of layers are combined, whether it includes a task that can be processed by the same processor, and whether it includes a task requiring high accuracy.

The electronic device of claim 8, wherein the at least one slice is
allocated to the plurality of processors based on a path having the smallest sum of processing times among a plurality of paths generated by connecting, in the arrangement order of the at least one slice, a plurality of nodes each representing a different combination of a slice and a processor.

The electronic device of claim 8,
wherein a plurality of input data input to any one of the plurality of layers are processed in the layer sequentially for each channel, as many times as the number of the plurality of input data, and
wherein, for a plurality of channels of the layer, memory is allocated in the amount needed to process a task in a first channel of the plurality of channels.

The electronic device of claim 8, wherein the at least one processor is configured to:
identify at least one layer included in at least one slice assigned to a first processor of the plurality of processors;
identify at least one blob representing at least one of data input to the identified layer, data output from the identified layer, and data temporarily stored inside the identified layer; and
allocate memory for storing data of the identified blob.

The electronic device of claim 12, wherein the at least one processor is configured to
determine a processing order of the identified layers and allocate, based on the determined processing order, memory for a current blob by determining whether a usage interval of a previous blob ends before data of the current blob is generated.

The electronic device of claim 12, wherein the at least one processor is configured to
determine a size of the memory based on the largest data size among the data sizes of at least one blob to which the same memory is allocated.

A computer-readable recording medium having recorded thereon a program for implementing the method of any one of claims 1 to 7.