KR20210035017A

KR20210035017A - Neural network training method, method and apparatus of processing data based on neural network

Info

Publication number: KR20210035017A
Application number: KR1020190150527A
Authority: KR
Inventors: 김영성; 한재준
Original assignee: 삼성전자주식회사
Priority date: 2019-09-23
Filing date: 2019-11-21
Publication date: 2021-03-31

Abstract

A method of processing data based on a neural network according to one embodiment of the present invention receives input data, obtains a plurality of parameter vectors representing a hierarchical hyperspherical space including a plurality of spheres belonging to different layers, and processes the input data using the neural network to which the plurality of parameter vectors are applied.

Description

Neural network training method, and neural network-based data processing method and apparatus

The following embodiments relate to a neural network training method and to a neural network-based data processing method and apparatus.

Training data for an artificial neural network is, in most cases, a subset of the real data. As a result, the error may decrease on the training data while increasing on the real data. 'Overfitting' refers to the phenomenon in which an artificial neural network learns the training data excessively, so that its error on real data increases; it is one cause of increased error in artificial neural networks.

Various methods may be considered to address overfitting, such as increasing the amount of training data, reducing the number of features, or performing regularization. Regularization is a numerical technique that seeks to make the form of a model as simple as possible; it prevents overfitting by keeping the model simple, even at a slight cost to the model's performance on the training data.

According to one aspect, a neural network-based data processing method includes: receiving input data; obtaining a plurality of parameter vectors representing a hierarchical hyperspherical space, the hierarchical hyperspherical space including a plurality of spheres belonging to different layers; applying the plurality of parameter vectors to a neural network; and processing the input data using the neural network.

The neural network may include a convolutional neural network, and the plurality of parameter vectors may include a plurality of filter parameter vectors.

The input data may include image data.

In the hierarchical hyperspherical space, the centers of spheres belonging to the same layer may be determined based on the center of a sphere belonging to the layer above that layer.

In the hierarchical hyperspherical space, the radius of a sphere belonging to a given layer may be smaller than the radius of a sphere belonging to the layer above the given layer.

In the hierarchical hyperspherical space, the center of a sphere belonging to a given layer may be located inside a sphere belonging to the layer above the given layer.

In the hierarchical hyperspherical space, spheres belonging to the same layer may be non-overlapping.

A distribution degree of the plurality of parameter vectors, which indicates how uniformly the plurality of parameter vectors are distributed globally in the hierarchical hyperspherical space, may be greater than a threshold distribution degree.

The distribution degree may be determined based on a combination of a discrete distance between the plurality of parameter vectors and a continuous distance between the plurality of parameter vectors.

The discrete distance may be determined by quantizing the plurality of parameter vectors and then calculating the Hamming distance between the quantized parameter vectors.

The continuous distance may include an angular distance between the plurality of parameter vectors.
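The combination of a discrete and a continuous distance described above can be sketched as follows. This is a minimal illustration and not the formula of the publication: the sign-based quantizer, the normalization of both distances to the range [0, 1], the weighted sum, and the names `sign_quantize`, `combined_distance`, and `alpha` are all assumptions introduced here.

```python
import math

def sign_quantize(v):
    # Assumed quantizer: 1 for non-negative components, 0 otherwise.
    return [1 if x >= 0 else 0 for x in v]

def hamming_distance(u, v):
    # Fraction of positions at which the quantized vectors differ.
    qu, qv = sign_quantize(u), sign_quantize(v)
    return sum(a != b for a, b in zip(qu, qv)) / len(qu)

def angular_distance(u, v):
    # Angle between u and v, divided by pi so that the result lies in [0, 1].
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    cos = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.acos(cos) / math.pi

def combined_distance(u, v, alpha=0.5):
    # Hypothetical combination: weighted sum of the two distances.
    return alpha * hamming_distance(u, v) + (1 - alpha) * angular_distance(u, v)

u = [1.0, -1.0]
v = [1.0, 1.0]
d = combined_distance(u, v)
```

A larger average `combined_distance` over all pairs would then correspond to a more evenly spread set of parameter vectors, under the assumptions stated above.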

Each of the plurality of parameter vectors may include a center vector indicating the center of the corresponding sphere and a surface vector indicating the surface of the corresponding sphere.

The applying may include: generating, for each of the plurality of parameter vectors, a projection vector based on the corresponding center vector and the corresponding surface vector; and applying the projection vector to the neural network.
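As a rough sketch of these two steps, the projection vector can be formed as the difference between the surface vector and the center vector, which is how the detailed description characterizes it with reference to (d) of FIG. 1. Applying the projection as a dot product with the input, and the function names used, are assumptions made for illustration only.

```python
def projection_vector(center, surface):
    # Difference between the surface vector and the center vector.
    return [s - c for s, c in zip(surface, center)]

def apply_projection(projection, x):
    # Assumed use of the projection: a single linear unit (dot product).
    return sum(p * xi for p, xi in zip(projection, x))

center = [1.0, 1.0]
surface = [1.0, 2.0]
w = projection_vector(center, surface)
y = apply_projection(w, [3.0, 4.0])
```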

According to an embodiment, a neural network training method includes: receiving training data; processing the training data using a neural network; determining a loss term based on a label of the training data and a result of processing the training data; determining a regularization term such that parameter vectors of the neural network represent a hierarchical hyperspherical space, the hierarchical hyperspherical space including a plurality of spheres belonging to different layers; and training the parameter vectors based on the loss term and the regularization term.

The neural network may include a convolutional neural network, the plurality of parameter vectors may include a plurality of filter parameter vectors, and the training data may include image data.

The regularization term may be determined in consideration of at least one of: a first constraint that makes the radius of a sphere belonging to a given layer of the hierarchical hyperspherical space smaller than the radius of a sphere belonging to the layer above the given layer; a second constraint that places the center of the sphere belonging to the given layer inside the sphere belonging to the layer above the given layer; and a third constraint that keeps spheres belonging to the same layer of the hierarchical hyperspherical space from overlapping each other.

The regularization term may be determined such that a distribution degree of the plurality of parameter vectors, which indicates how uniformly the plurality of parameter vectors are distributed globally in the hierarchical hyperspherical space, becomes greater than a threshold distribution degree.

The distribution degree may be determined based on a combination of a discrete distance between the plurality of parameter vectors and a continuous distance between the plurality of parameter vectors.

The discrete distance may be determined by quantizing the plurality of parameter vectors and then calculating the Hamming distance between the quantized parameter vectors, and the continuous distance may include an angular distance between the plurality of parameter vectors.

Each of the plurality of parameter vectors may include a center vector indicating the center of the corresponding sphere and a surface vector indicating the surface of the corresponding sphere.

The regularization term may be determined based on at least one of: a first distance term based on distances between center vectors of spheres belonging to the same layer of the hierarchical spherical space; a second distance term based on distances between surface vectors of spheres belonging to the same layer of the hierarchical spherical space; a third distance term based on distances between center vectors of spheres belonging to different layers of the hierarchical spherical space; and a fourth distance term based on distances between surface vectors of spheres belonging to different layers of the hierarchical spherical space.

According to an embodiment, a neural network-based data processing apparatus includes: a communication interface that receives input data and obtains a plurality of parameter vectors representing a hierarchical hyperspherical space, the hierarchical hyperspherical space including a plurality of spheres belonging to different layers; and a processor that applies the plurality of parameter vectors to a neural network and processes the input data using the neural network.

FIG. 1 is a diagram illustrating a hierarchical hyperspherical space according to an embodiment.
FIGS. 2 and 3 are diagrams for explaining a method of calculating a distance metric for maximizing pairwise distances in a spherical space according to embodiments.
FIG. 4 is a diagram illustrating the structure of a network to which hierarchical regularization is applied according to an embodiment.
FIG. 5 is a diagram illustrating a network for calculating hierarchical parameter vectors according to an embodiment.
FIG. 6 is a diagram illustrating a generator that generates an image by generating hierarchical noise vectors according to an embodiment.
FIG. 7 is a flowchart illustrating a neural network-based data processing method according to an embodiment.
FIG. 8 is a flowchart illustrating a neural network training method according to an embodiment.
FIG. 9 is a block diagram of a neural network-based data processing apparatus according to an embodiment.

Hereinafter, embodiments are described in detail with reference to the accompanying drawings. However, the scope of the patent application is not limited by these embodiments. Like reference numerals in the drawings denote like elements.

Various modifications may be made to the embodiments described below. The embodiments described below are not intended to limit the modes of implementation, and should be understood to include all modifications, equivalents, and substitutes thereof.

The terms used in the embodiments are used only to describe specific embodiments and are not intended to limit the embodiments. Singular expressions include plural expressions unless the context clearly indicates otherwise. In this specification, terms such as "comprise" or "have" are intended to specify the presence of the features, numbers, steps, operations, components, parts, or combinations thereof described in the specification, and should be understood not to preclude the presence or addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

Unless otherwise defined, all terms used herein, including technical and scientific terms, have the same meaning as commonly understood by a person of ordinary skill in the art to which the embodiments belong. Terms such as those defined in commonly used dictionaries should be interpreted as having meanings consistent with their meanings in the context of the related art, and are not to be interpreted in an idealized or overly formal sense unless expressly so defined in this application. Hereinafter, embodiments are described in detail with reference to the accompanying drawings.

FIG. 1 is a diagram illustrating a hierarchical hyperspherical space according to an embodiment. A hypersphere is the set of points at a constant distance from a given point called its 'center'. A hypersphere is a manifold of codimension one, that is, it has one dimension fewer than the ambient space. As the radius of a hypersphere increases, its curvature decreases. In the limit, the surface of a hypersphere approaches the zero curvature of a hyperplane. Hyperplanes and hyperspheres are examples of hypersurfaces.

In an embodiment, regularization may be applied by forming groups among the parameter vectors for samples having the same characteristics. What defines each group may be called a 'super-class'. A hierarchy of hyperspherical spaces can be constructed by defining, for each sample of a class, a coarse and fine (super-class and sub-class) pair.

Because it is not easy to measure pairwise distances between high-dimensional vectors that have a hierarchical structure within the same space, an embodiment may construct another identification space that includes a space isolated from the original space.

A d-sphere may denote the set of points that satisfy the defining condition of the sphere.

In an embodiment, multiple separated hyperspherical spaces may be constructed using multiple identifying relations. By decomposing one space into several spaces and redefining the space from a hierarchical point of view, a hierarchical structure can be applied to the regularization of the parameter vectors of the group-wise hyperspherical spaces. So that the parameter vectors are uniformly distributed on the unit hypersphere, the parameter vectors may be sampled from a Gaussian normal distribution, because the Gaussian normal distribution is spherically symmetric. In addition, from a Bayesian point of view, a neural network with a Gaussian prior induces L2-norm regularization.
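The sampling argument above can be illustrated with a short sketch: drawing each component from a standard normal distribution and dividing by the Euclidean norm gives a direction that is uniformly distributed on the unit hypersphere, precisely because the Gaussian is spherically symmetric. The function name and the use of Python's standard `random` module are choices made here for illustration.

```python
import math
import random

def sample_on_unit_hypersphere(dim, rng):
    # Draw an isotropic Gaussian vector, then rescale it to unit length.
    g = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in g))
    return [x / norm for x in g]

rng = random.Random(0)  # fixed seed so the sketch is reproducible
w = sample_on_unit_hypersphere(8, rng)
squared_norm = sum(x * x for x in w)
```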

Based on the above, in an embodiment, the parameter vectors of the neural network for the hyperspherical space may be trained to have a Gaussian prior. A projection vector computed by a difference arithmetic operation between two parameter vectors drawn from a normal (Gaussian) distribution follows a normal difference distribution.
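For reference, the normal difference distribution mentioned above can be written out as follows. The symbols are assumptions introduced here (a surface vector w_s and a center vector w_c, taken to be independent with isotropic covariance); the publication's own notation is not reproduced in this text.

```latex
\[
  w_s \sim \mathcal{N}(\mu_s, \sigma_s^2 I), \qquad
  w_c \sim \mathcal{N}(\mu_c, \sigma_c^2 I), \qquad
  w_s \perp w_c
\]
\[
  w_s - w_c \;\sim\; \mathcal{N}\!\left(\mu_s - \mu_c,\; (\sigma_s^2 + \sigma_c^2)\, I\right)
\]
```

The difference of two independent Gaussian vectors is therefore again Gaussian, with the means subtracted and the variances added.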

In a deep neural network, an objective function that adds a regularization term to the loss may be used to optimize a parameter tensor toward the minimum loss for an input vector. The parameter tensor is a multidimensional array and may be, for example, a matrix or a vector.

A 'parameter vector' as used in this specification may, depending on the case, be a parameter tensor or a parameter matrix.

In the objective function, the parameters are the matrices of the parameter vectors (for example, neurons or kernels), L denotes the number of layers, and λ > 0 controls the degree of regularization.

For example, for a classification task, a cross-entropy loss may be used as the loss function.
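A minimal sketch of such an objective, under stated assumptions, is a cross-entropy loss plus λ times a regularization term. The L2 regularizer below is only a stand-in and is not the hierarchical regularization of the embodiments; the names `objective`, `l2_regularizer`, and `lam` are hypothetical.

```python
import math

def softmax(logits):
    # Subtract the maximum for numerical stability.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, label):
    # Negative log-probability assigned to the true class.
    return -math.log(softmax(logits)[label])

def l2_regularizer(params):
    # Stand-in regularizer (sum of squares), used here only for illustration.
    return sum(p * p for p in params)

def objective(logits, label, params, lam, regularizer):
    # Loss plus lam times the regularization term, with lam > 0.
    return cross_entropy(logits, label) + lam * regularizer(params)

loss = cross_entropy([0.0, 0.0], 0)
total = objective([0.0, 0.0], 0, [1.0, 2.0], 0.1, l2_regularizer)
```

Passing the regularizer as an argument keeps the sketch independent of which regularization term is actually plugged in.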

In an embodiment, regularization may be performed using a new regularization formulation.

Within a single layer, each component of the layer's parameters may represent a projection vector that transforms a given input into an embedding space defined in a Euclidean metric space.

By defining a unit-length projection, a new parameter vector may be defined on a d-sphere whose center is zero. In other words, the projection vector may be defined by a center vector indicating the center of the hypersphere and a surface vector, combined using an arithmetic operation.

In an embodiment, the d-sphere may be defined by the center vector and the surface vector. Hereinafter, a simplified notation is used for brevity.

In an embodiment, the radius of the sphere is taken to be 1, but the parameter vectors may have any radius r > 0.

Referring to (a) of FIG. 1, hierarchical spherical spaces constructed around the center vector of each spherical space of the hyperspherical space are shown.

As the level tends to the limit, the radius of the global area converges to the sum of the series of radii, where δ is a constant. The series starts from the initial radius of the sphere, and the constant δ, whose absolute value is less than 1, is the ratio between radii.
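The convergence described above is that of a geometric series. As a hedged illustration, assuming the radius at level l is r0 · δ^l (the exact indexing in the publication is not reproduced here, and `r0`, `delta`, and the function names are introduced for this sketch), the partial sums approach r0 / (1 − δ) whenever |δ| < 1.

```python
def radius_series(r0, delta, levels):
    # Radii r0, r0*delta, r0*delta**2, ... for the given number of levels.
    return [r0 * delta ** level for level in range(levels)]

def global_radius_limit(r0, delta):
    # Closed-form limit of the geometric series; requires |delta| < 1.
    if not abs(delta) < 1:
        raise ValueError("the series converges only for |delta| < 1")
    return r0 / (1 - delta)

radii = radius_series(1.0, 0.5, 3)
partial_sum = sum(radius_series(1.0, 0.5, 60))
limit = global_radius_limit(1.0, 0.5)
```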

Referring to (b) of FIG. 1, non-overlapping spheres included in a hyperspherical space according to an embodiment are shown. The radius of the global area may be bounded in terms of the initial radius of the series of spheres. This resembles an iterative process of hypersphere packing, which arranges non-overlapping spheres within a containing space.

Referring to (c) of FIG. 1, a hierarchical hyperspherical space modeled in a bounded space according to an embodiment is shown. Following (b) of FIG. 1, the hierarchical 2-sphere can be defined and generalized to a hypersphere, that is, a sphere of higher dimension.

In an embodiment, a parameter vector such as a projection vector or a projection matrix is used as a transformation of the input vector, and the parameter vectors may be trained in a direction that increases their diversity. In other words, the diversity of the parameter vectors can be increased through regularization that encourages a globally uniform distribution of the parameter vectors. To this end, an embodiment imposes semantics on the parameter vectors through the hierarchical space, and diversifies the distribution of the high-dimensional parameter vectors through a distance metric within the same semantic space (for example, spheres belonging to the same layer that form one group) and across different semantic spaces (for example, spheres belonging to different layers).

In (c) of FIG. 1, sphere 110 may correspond, for example, to a sphere of a first layer, and spheres 121 and 123 may correspond to spheres of a second layer. Spheres 121 and 123, which belong to the same layer, may form one group 120. Sphere 130 may correspond to a sphere of a third layer. In the hierarchical hyperspherical space shown in (c) of FIG. 1, the centers of spheres belonging to the same layer (for example, spheres 121 and 123) may be determined based on the center of a sphere belonging to the layer above (for example, sphere 110).

Referring to (d) of FIG. 1, a center vector, a surface vector, and a projection vector according to an embodiment are shown. The projection vector is determined by the difference between the surface vector and the center vector, and its magnitude is adjustable, so that it may be scaled by a multiplicative factor. The center vector and the surface vector shown here correspond to the center vector and the surface vector described above.

In an embodiment, the hierarchical structure of the hypersphere is assumed to consist of a levelwise structure and a groupwise structure, each denoted by its own symbol.

Levelwise structure

The parameter vectors may be defined using the levelwise notation, as in Equation 1 below.

[Equation 1]

Here, the parameter vectors are defined with respect to the d-sphere of each level.

In an embodiment, hierarchical parameter vectors may be defined in a space of higher dimension than that of (b) and (c) of FIG. 1.

In the levelwise setting, the parameter vectors may be expressed based on the center vector computed at the previous level, in terms of an accumulated center vector and a parameter vector newly connected from one level to another.

With this notation, the center vector and the surface vector at a given level are each defined in terms of the accumulated center vector.

Both the center vector and the surface vector at the current level may be based on the center vector at the previous level. However, because not every sample has child samples, it may be more reasonable to branch from a representative vector or a center vector rather than from individual projection vectors.

A level corresponds to a layer of the hierarchical structure. Hereinafter, 'level' and 'layer' may be understood to have the same meaning.

Equation 1 above may be expressed as Equation 2 below.

[Equation 2]
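One possible reading of the accumulated center vector is a running sum of the vectors newly connected at each level, with the surface vector at a level obtained by adding a connection vector to the previous level's center. The sketch below illustrates only that reading; it is an assumption and not the publication's Equation 1 or Equation 2, and the function names are hypothetical.

```python
def accumulate_centers(connections):
    # connections[l] is the vector newly connected at level l.
    # The center at level l is taken to be the running sum up to level l.
    centers = []
    current = [0.0] * len(connections[0])
    for connection in connections:
        current = [c + v for c, v in zip(current, connection)]
        centers.append(current)
    return centers

def surface_from_previous_center(previous_center, connection):
    # Assumed form: surface vector = previous-level center + connection vector.
    return [c + v for c, v in zip(previous_center, connection)]

centers = accumulate_centers([[1.0, 0.0], [0.0, 0.5]])
surface = surface_from_previous_center(centers[0], [0.0, 1.0])
```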

In an embodiment, a dedicated notation may be used to denote the vector connected from the center vector of one level up to another level.

Groupwise structure

Using the group notation, the center vector in Equation 1 may be expressed with respect to the d-sphere of each group at a given level.

The set of groups at a given level and its cardinality are denoted accordingly.

A group at the current level may be adjusted according to a group of the previous level.

As the groupwise relationship between levels, an adjacency indication may be computed. Depending on the embodiment, the adjacency indication may be replaced with a probability model. Accordingly, the projection vector at a given level may be determined from these quantities.

In addition, the corresponding vectors may be computed based on the vectors that refer to their group condition and on the adjacency matrix.

Each group at a given level has a representative vector, and the representative vector may be equal to the mean vector of the vectors of the group.
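Computing a group's representative vector as a mean vector can be sketched as follows. Which vectors of the group are averaged is not recoverable from this text, so the input below is simply a list of the group's vectors, and the function name is hypothetical.

```python
def mean_vector(vectors):
    # Component-wise mean of a non-empty list of equal-length vectors.
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

group = [[1.0, 2.0], [3.0, 4.0]]
representative = mean_vector(group)
```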

When the representative vector of a group is determined by a specific vector and the center vector of the previous level, an adjustment factor may be used.

In an embodiment, the parameter vectors of each layer are defined relative to a center vector in the spherical space, a form that is well suited to group-wise training. In an embodiment, regularization may be performed by defining the centers and/or the radii of the spheres included in the hierarchical hyperspherical space and imposing constraints on the space of each group.

The regularization term for the hierarchical parameter vectors defined above may be defined as follows.

The set of parameter vectors may be the optimization target of the hierarchical regularization, as in Equation 3 below.

[Equation 3]

Here, one term acts on each individual sphere, and the constraint term applies geometry-aware constraints to the spheres. In other words, the constraint term corresponds to constraints on the relationships between spheres, indicating how those relationships should be formed.

Equation 3 may serve to regularize across upper and lower levels.

As in Equation 4 below, the regularization term may consist of two parts: 1) a term for the projection vectors within the same group, and 2) a term for the center vectors across groups at the same level.

[Equation 4]

Here, the first part is a regularization term on the distances between projection vectors and may be expressed as Equation 5 below. The second part is a regularization term on the distances between center vectors and may be expressed as Equation 6 below.

[Equation 5]

[Equation 6]

Here, the distance function denotes a distance metric between parameter vectors.

For example, when a mini-batch is given as input, the regularization term may be computed over the mini-batch.

In addition to the hierarchical regularization of Equation 3 above, an orthogonality-promoting term may be applied to the center vectors […].

Here, […] holds, […] is the Frobenius norm, and […] may hold.

For example, magnitude ([…]) minimization and energy minimization may be applied to parameter vectors that carry no hierarchical information. Magnitude minimization may be performed by […] (where […] and […]). Energy minimization may be performed by […] (where […]). Energy minimization may also be referred to as 'pairwise distance minimization'.

The constraint term […] on the right-hand side of Equation 3 may help construct geometry-aware relational parameter vectors between different spheres, both within the same level and across all levels.

Multiple constraints may be defined as […], where […] denotes the […]-th constraint between the parameter vectors at level […] and level […], and […] denotes a Lagrange multiplier.

일 실시예에서는 기하학적 관점에서 세 가지 제한 조건을 적용할 있다. 세 가지 제한 조건은 예를 들어, 다음과 같이 정의될 수 있다. In one embodiment, three constraints can be applied from a geometric point of view. The three constraints can be defined as follows, for example.

1. Constraint 1 (C₁): as in the formula below, the radius of the […]-th inner sphere must be smaller than the radius of the […]-th outer sphere.

2. Constraint 2 (C₂): as in the formula below, the center of the […]-th inner sphere must be located inside the […]-th outer sphere.

3. Constraint 3 (C₃): as in the formula below, the margin between the spheres must be greater than 0.
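The three constraints can be illustrated with a minimal Python sketch. The formulas themselves are not reproduced in this text, so the sketch rests on assumptions: spheres are given as a center and a radius, distances are Euclidean, and the margin between two spheres is taken to be the gap between their surfaces. All function names are hypothetical.

```python
import math


def distance(a, b):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def satisfies_c1(inner_radius, outer_radius):
    """C1: the inner sphere's radius is smaller than the outer sphere's."""
    return inner_radius < outer_radius


def satisfies_c2(inner_center, outer_center, outer_radius):
    """C2: the inner sphere's center lies inside the outer sphere."""
    return distance(inner_center, outer_center) < outer_radius


def margin(center_a, radius_a, center_b, radius_b):
    """Assumed margin: the gap between the surfaces of two spheres."""
    return distance(center_a, center_b) - (radius_a + radius_b)


def satisfies_c3(center_a, radius_a, center_b, radius_b):
    """C3: the margin between two spheres is greater than 0."""
    return margin(center_a, radius_a, center_b, radius_b) > 0


# Hypothetical example: one outer sphere containing two inner spheres.
outer_center, outer_radius = (0.0, 0.0), 2.0
inner_a = ((0.8, 0.0), 0.5)
inner_b = ((-0.8, 0.0), 0.5)

print(satisfies_c1(inner_a[1], outer_radius))                # True
print(satisfies_c2(inner_a[0], outer_center, outer_radius))  # True
print(satisfies_c3(*inner_a, *inner_b))                      # True
```

In a training setting these Boolean checks would typically be turned into penalty terms weighted by multipliers; the checks above only show what each constraint requires geometrically.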

FIG. 2 is a diagram illustrating a method of calculating a distance metric for maximizing pairwise distances in a spherical space according to an embodiment. Referring to FIG. 2, the angular distance […] between a pair of vectors […] and […] and the discrete distance […] between the pair of vectors are shown.

전술한 그룹와이즈 정의에 이산 프로덕트 매트릭(product metric)이 적합할 수 있으며, 이산 매트릭 공간(discrete metric space)에서 형성된 파라미터 벡터들로부터의 투영점은 서로 분리될 수 있다. A discrete product metric may be suitable for the groupwise definition described above, and projection points from parameter vectors formed in a discrete metric space may be separated from each other.

The discrete distance may be determined so that the distribution of vector pairs having the same angular distance is dispersed. Because the goal is to maximize the distances between parameter vectors, maximizing the discrete distance can help distribute the parameter vectors more diversely.

In FIG. 2, the angular distance […] between a pair of vectors […] and […] may be the same while the discrete distance […] between the vectors differs. To diversify the parameter vector space, a signed space may be effective in recognizing this difference.

Using the sign function in the Euclidean metric space […], a discrete distance metric for the vectors […] and […] can be defined as in Equation 7 below.

Here, […] may hold, and […] may denote a normalized version of the Hamming distance. In the ternary discrete case, {-1, 0, 1} may be used for […].
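As a rough illustration of a sign-based discrete distance, the following Python sketch maps each coordinate to {-1, 0, 1} and measures the fraction of coordinates whose signs differ. Equation 7 is not reproduced in this text, so this is only one plausible reading of a normalized Hamming distance; the function names and the choice to count a 0 against a nonzero sign as a full mismatch are assumptions.

```python
def sign(value):
    """Ternary sign function: returns -1, 0, or 1."""
    return (value > 0) - (value < 0)


def discrete_distance(x, y):
    """Normalized Hamming distance between the sign patterns of x and y.

    The result lies in [0, 1]: 0 when every coordinate has the same sign,
    1 when every coordinate has a different sign.
    """
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("vectors must be non-empty and of equal length")
    mismatches = sum(1 for a, b in zip(x, y) if sign(a) != sign(b))
    return mismatches / len(x)


# Sign patterns: [1, -1, 0, 1] versus [1, 1, 0, -1], so 2 of 4 differ.
print(discrete_distance([0.3, -1.2, 0.0, 2.0], [0.9, 0.4, 0.0, -0.1]))  # 0.5
```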

For example, to treat the discrete distance as an angular distance within [0,1], the normalized distance may be defined as […].

The angular distance based on the product may be expressed as […], where […]. The angle, on the other hand, may be regarded in terms of cosine similarity as […]. Therefore, the arccosine function […] may be needed to obtain the angular distance. In summary, […] or […] may be applied for the angular distance […], where […].
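The step from cosine similarity to an angular distance can be sketched in Python as follows. The exact expressions are not reproduced in this text, so dividing the angle by π to obtain a value in [0, 1] is an assumption, as are the function names.

```python
import math


def cosine_similarity(x, y):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("vectors must be non-zero")
    return dot / (norm_x * norm_y)


def angular_distance(x, y):
    """Angle between x and y, divided by pi so that it lies in [0, 1]."""
    # Guard against rounding pushing the cosine slightly outside [-1, 1].
    cos_value = max(-1.0, min(1.0, cosine_similarity(x, y)))
    return math.acos(cos_value) / math.pi


print(angular_distance([1.0, 0.0], [0.0, 1.0]))   # 0.5 (orthogonal)
print(angular_distance([1.0, 0.0], [-1.0, 0.0]))  # 1.0 (opposite)
```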

이산 거리는 모델 분포에 접근하도록 제한될 수 있다. The discrete distance can be limited to approach the model distribution.

In one embodiment, the discrete distance metric may be merged with the continuous angular distance metric […] to form a single metric.

이산 거리 매트릭과 연속 각도 거리 매트릭의 병합 시에는 예를 들어, 산술 평균(arithmetic mean; AM), 기하 평균(geometric mean; GM) 및 조화 평균(harmonic mean; HM)으로 구성된 피타고라스 평균(Pythagorean means)의 정의가 사용될 수 있다. When merging the discrete distance metric and the continuous angular distance metric, for example, the Pythagorean means consisting of the arithmetic mean (AM), geometric mean (GM) and harmonic mean (HM) The definition of can be used.
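The three Pythagorean means are standard, and a minimal Python sketch of merging two distances with them is given here. It assumes both inputs are already normalized to [0, 1] (one discrete distance and one angular distance); the function names and the example values are hypothetical, and the sketch does not claim to reproduce the document's own merged metric.

```python
import math


def arithmetic_mean(a, b):
    """AM of two non-negative distances."""
    return (a + b) / 2


def geometric_mean(a, b):
    """GM of two non-negative distances."""
    return math.sqrt(a * b)


def harmonic_mean(a, b):
    """HM of two non-negative distances; defined as 0 when both are 0."""
    if a + b == 0:
        return 0.0
    return 2 * a * b / (a + b)


# Hypothetical pair: discrete distance 0.5 and angular distance 0.25.
discrete, angular = 0.5, 0.25
print(arithmetic_mean(discrete, angular))  # 0.375
print(geometric_mean(discrete, angular))
print(harmonic_mean(discrete, angular))
```

For non-negative inputs the harmonic mean never exceeds the geometric mean, which never exceeds the arithmetic mean, so the choice of mean controls how strongly the smaller of the two distances dominates the merged value.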

The Pythagorean means using the angle pair described above may be defined as in Equation 8 below.

For the angular distance using […], the inverted form […] may be adopted in place of […] as a minimization form, in order to maximize the angle in the optimization formulation. In […], the angle and its cosine value may exhibit an inversely proportional relationship such as […]. Here, s = 1, 2, ... may be used as in the Thomson problem, which uses the s-energy.

이러한 각도들의 코사인 유사성(cosine similarity)은 아래의 수학식 9와 같이 정의될 수 있다. The cosine similarity of these angles may be defined as in Equation 9 below.

Here, the cosine similarity functions may be normalized by […] so as to take distance values within [0,1].

마지막으로, 코사인 유사성의 피타고라스 평균은 아래의 수학식 10과 같이 계산될 수 있다. Finally, the Pythagorean mean of cosine similarity can be calculated as in Equation 10 below.

수학식 8, 9, 및 10에서 정의된 매트릭들은 비-음성(non-negativity), 대칭(symmetry) 및 삼각형 부등식(triangle inequality)의 세 가지 매트릭 조건을 충족시킬 수 있다. The metrics defined in Equations 8, 9, and 10 may satisfy three metric conditions: non-negativity, symmetry, and triangle inequality.

Since the hypersphere according to an embodiment is a fine manifold, the distance between any two points measured with the metrics described above may be bounded.

Since the sign function is not differentiable at 0, some embodiments may adopt a substitute function for backpropagation in place of the sign function. For the sign function of the discrete metric, an embodiment may use a straight-through estimator (STE) in the backward path of the neural network.

In the backward path, the derivative of the sign function may be replaced with […], which is known as the saturated STE.
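A straight-through estimator for the sign function can be sketched in plain Python as follows. The replacement expression is not reproduced in this text; the sketch assumes the form commonly called the saturated STE, in which the upstream gradient passes through unchanged where the input satisfies |x| ≤ 1 and is set to zero elsewhere. The function names are hypothetical.

```python
def sign_forward(x):
    """Forward pass: element-wise sign with values in {-1, 0, 1}."""
    return [(v > 0) - (v < 0) for v in x]


def sign_backward_saturated_ste(x, upstream_grad):
    """Backward pass with the assumed saturated STE.

    The true derivative of sign is 0 almost everywhere and undefined at 0,
    so it is replaced by the indicator of |x| <= 1.
    """
    return [g if abs(v) <= 1.0 else 0.0 for v, g in zip(x, upstream_grad)]


inputs = [-2.0, -0.3, 0.0, 0.7, 1.5]
print(sign_forward(inputs))                             # [-1, -1, 0, 1, 1]
print(sign_backward_saturated_ste(inputs, [1.0] * 5))   # [0.0, 1.0, 1.0, 1.0, 0.0]
```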

Since the derivative of […] is not defined at the value […], an embodiment may apply clamping to the cosine function to obtain […], where […].
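The clamping step can be sketched in Python under the assumption that the function in question is the arccosine, whose derivative -1/sqrt(1 - x²) is undefined at x = ±1. The clamping width `EPSILON` and the function names are hypothetical choices for illustration.

```python
import math

EPSILON = 1e-6  # hypothetical clamping width, not taken from the source


def clamp(value, low, high):
    """Restrict value to the closed interval [low, high]."""
    return max(low, min(high, value))


def safe_arccos(cos_value, eps=EPSILON):
    """Arccosine of a cosine value clamped to [-1 + eps, 1 - eps]."""
    return math.acos(clamp(cos_value, -1.0 + eps, 1.0 - eps))


def safe_arccos_grad(cos_value, eps=EPSILON):
    """Derivative of arccos evaluated at the clamped cosine value."""
    c = clamp(cos_value, -1.0 + eps, 1.0 - eps)
    return -1.0 / math.sqrt(1.0 - c * c)


# Without clamping, the derivative at exactly 1.0 would divide by zero.
print(math.isfinite(safe_arccos_grad(1.0)))  # True
print(safe_arccos(1.0000001))                # small positive angle, no domain error
```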

FIG. 3 is a diagram illustrating the result of mapping continuous values to discrete values in a Euclidean space according to an embodiment. FIG. 3(a) shows the result of mapping all points within each quadrant of a two-dimensional space to a specific ternary representation. FIG. 3(b) shows that the distances between discretized vectors take discrete values within a bound.

As the dimensionality of a vector increases, its sparsity is also likely to increase. For the Euclidean distance, (|x-y|^2 = |x|^2 + |y|^2 - 2 x·y), so even when two parameter vectors are similar, as when (x·y […] 0), the magnitude term (|x|^2 + |y|^2) can make it difficult to reflect the similarity between the two parameter vectors.

The cosine distance, by contrast, is computed after projecting the parameter vectors onto the unit sphere (|x-y|^2 = 2 - 2 x·y), which reduces the effect of this noise. However, when searching for parameter vectors that are uniformly distributed in the spherical space, the search space grows, so optimization is still not easy. Accordingly, an embodiment may use a distance space with a reduced search space.
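The two identities above can be checked numerically with a short Python sketch. The example vectors and function names are hypothetical; the vectors point in the same direction but differ in magnitude, which shows how the magnitude term dominates the Euclidean distance and disappears after projection onto the unit sphere.

```python
import math


def dot(x, y):
    """Inner product of two vectors."""
    return sum(a * b for a, b in zip(x, y))


def squared_distance(x, y):
    """Squared Euclidean distance |x - y|^2."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def normalize(x):
    """Project a non-zero vector onto the unit sphere."""
    norm = math.sqrt(dot(x, x))
    if norm == 0:
        raise ValueError("cannot normalize the zero vector")
    return [a / norm for a in x]


x = [3.0, 4.0]
y = [0.3, 0.4]  # same direction as x, ten times smaller

# |x - y|^2 = |x|^2 + |y|^2 - 2 x.y holds for any vectors.
print(squared_distance(x, y), dot(x, x) + dot(y, y) - 2 * dot(x, y))

# After normalization the distance depends only on the angle: 2 - 2 x.y.
xn, yn = normalize(x), normalize(y)
print(squared_distance(xn, yn), 2 - 2 * dot(xn, yn))
```

The first line prints a large value for both expressions even though the vectors are parallel, while the second line prints values that are zero up to rounding.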

In an embodiment, a uniform parameter vector distribution can be trained stably by mapping continuous values in the Euclidean space to discrete values, for example, binary or ternary values.

When parameter vectors are searched for in a discretized space as shown in FIG. 3, redundant parameter vectors become less frequent and the optimization process of finding a solution can be performed more easily. However, if the available space is smaller than what is needed, expressive power may be weakened, so a more robust representation can be obtained by combining the discrete metric with a continuous metric that offers a richer space. To this end, an embodiment may merge the discrete distance metric with a continuous angular distance metric, such as the cosine distance or the arccosine distance, through Equations 8 to 10 described above.

FIG. 4 is a diagram illustrating the structure of a network to which hierarchical regularization is applied according to an embodiment. Referring to FIG. 4, a network including an encoder 410, a coarse segmenter 420, a fine classifier 430, a relationship regularizer 440, and an optimizer 450 according to an embodiment is shown.

인코더(410)는 입력 데이터에 대한 특징 벡터를 추출할 수 있다. The encoder 410 may extract feature vectors for input data.

The coarse segmenter 420 may output a coarse label for the feature vector through a loss function (L) and a regularization function (R). The coarse segmenter 420 performs the regularization between upper and lower levels according to Equation 3 described above, and the coarse label may correspond to the center vector described above.

The fine classifier 430 may output a fine label for the feature vector through a loss function and a regularization function. The fine classifier 430 may perform the regularization within the same level according to Equation 4 described above, and the fine label may correspond to the surface vector described above.

The relationship regularizer 440 may perform regularization based on the relationship between the coarse label and the fine label. The regularization result […] of the relationship regularizer 440 corresponds to the term […] of Equation 3 above, and may correspond to a constraint on the relationships between the spheres, indicating how those relationships should be formed.

Regularization according to an embodiment may be expressed as […], which may correspond to Equations 3 and 4 described above.

The labels at each level of the hierarchical structure are learned through the relationship […] between the coarse label and the fine label, and regularization at the last level may be performed by […].

Regularization may be performed by maximizing the distance between parameter vectors (for example, […]) or by minimizing the energy between the parameter vectors.

Regularization that reflects the hierarchy information may also be performed by regularizing a representative parameter vector for each group, which reflects a statistical characteristic (for example, the mean) of that group's parameter vectors.

The labels of […], which represents the relationship, may also be obtained through clustering in semi-supervised learning or self-supervised learning. Input data may be processed by applying to the neural network a hierarchical parameter vector that combines the coarse parameter vector corresponding to the coarse label with the fine parameter vector corresponding to the fine label.

FIG. 5 is a diagram illustrating a network that calculates a hierarchical parameter vector according to an embodiment. Referring to FIG. 5, an input image 510, a coarse parameter vector 520, a fine parameter vector 530, a hierarchical parameter vector 540, and a feature 550 according to an embodiment are shown.

The input image 510 may be expressed as the coarse parameter vector 520 and the fine parameter vector 530 through a hierarchical hyperspherical space including a plurality of spheres belonging to different layers. Thereafter, the hierarchical parameter vector 540, which combines the coarse parameter vector and the fine parameter vector, is applied to the neural network to process the input data, so that the feature 550 corresponding to the input data 510 may be output.

FIG. 6 is a diagram illustrating a generator that generates an image by generating a hierarchical noise vector according to an embodiment.

The generator may be configured as a multilayer neural network. In addition, combining the coarse parameter vector and the fine parameter vector described above makes it possible to build a recognizer or a generator with a hierarchical representation.

계층화된 노이즈 벡터의 생성을 통해 영상을 생성할 수 있는 생성기의 활용이 가능하다.It is possible to use a generator capable of generating an image through the generation of a layered noise vector.

FIG. 7 is a flowchart illustrating a neural network-based data processing method according to an embodiment. Referring to FIG. 7, the data processing apparatus according to an embodiment receives input data (710). The input data may include, for example, image data.

The data processing apparatus obtains a plurality of parameter vectors representing a hierarchical hyperspherical space including a plurality of spheres belonging to different layers (720). The plurality of parameter vectors may correspond, for example, to the projection vectors […] or projection parameter vectors described above. Each of the plurality of parameter vectors may include, for example, a center vector […] indicating the center of the corresponding sphere and a surface vector […] indicating the surface of the corresponding sphere.

In the hierarchical hyperspherical space, the centers of spheres belonging to the same layer may be determined, for example, based on the center of a sphere belonging to the upper layer of that layer. In other words, both the center vector and the surface vector at the current level may be based on the center vector at the previous level. The hierarchical hyperspherical space according to an embodiment may satisfy the following constraints. The radius of a sphere belonging to a specific layer may be smaller than the radius of a sphere belonging to the upper layer of that layer. The center of a sphere belonging to a specific layer may be located inside a sphere belonging to the upper layer of that layer, and spheres belonging to the same layer may not overlap each other.

Here, the degree of distribution of the plurality of parameter vectors, which indicates the degree to which the parameter vectors are globally uniformly distributed in the hierarchical hyperspherical space, may be greater than a threshold degree of distribution. The degree of distribution may be determined, for example, based on a combination of the discrete distance between the parameter vectors and the continuous distance between the parameter vectors. The discrete distance may be determined by quantizing the parameter vectors and then calculating the Hamming distance between the quantized parameter vectors. The discrete distance may correspond, for example, to the discrete distance […] between vectors described above with reference to FIG. 2.

The continuous distance may include the angular distance between the plurality of parameter vectors. The continuous distance may correspond, for example, to the angular distance […] described above with reference to FIG. 2.

The data processing apparatus applies the plurality of parameter vectors to the neural network (730). Here, the neural network may include, for example, a convolutional neural network, and the plurality of parameter vectors may include a plurality of filter parameter vectors. For each of the plurality of parameter vectors, the data processing apparatus may generate a projection vector based on the corresponding center vector and the corresponding surface vector, and apply the projection vector to the neural network. Here, the corresponding center vector and surface vector may be the center vector and surface vector of a sphere belonging to one level (or layer) among the plurality of spheres included in the hierarchical hyperspherical space. For example, when the current level is […], the center vector indicating the center of the sphere at level […] may correspond to […] described above, and the surface vector indicating the surface of the sphere at level […] may correspond to […] described above.

The data processing apparatus processes the input data using the neural network to which the plurality of parameter vectors were applied in operation 730 (740).

FIG. 8 is a flowchart illustrating a neural network training method according to an embodiment. Referring to FIG. 8, the training apparatus according to an embodiment receives training data (810). The training data may include, for example, image data.

트레이닝 장치는 신경망을 이용하여 트레이닝 데이터를 처리한다(820). 신경망은 예를 들어, 컨볼루션 신경망을 포함하고, 신경망의 복수의 파라미터 벡터들은 복수의 필터 파라미터 벡터들을 포함할 수 있다. 복수의 파라미터 벡터들 각각은 해당하는 구의 중심을 지시하는 중심 벡터와 해당하는 구의 표면을 지시하는 표면 벡터를 포함할 수 있다. The training device processes training data using a neural network (820). The neural network may include, for example, a convolutional neural network, and a plurality of parameter vectors of the neural network may include a plurality of filter parameter vectors. Each of the plurality of parameter vectors may include a center vector indicating a center of a corresponding sphere and a surface vector indicating a surface of the corresponding sphere.

The training apparatus determines a loss term […] based on the labels of the training data and the result of processing the training data (830).

The training apparatus determines a regularization term […] so that the parameter vectors of the neural network represent a hierarchical hyperspherical space (840). Here, the hierarchical hyperspherical space may include a plurality of spheres belonging to different layers. In addition, the centers of spheres belonging to the same layer in the hierarchical hyperspherical space may be determined based on the center of a sphere belonging to the upper layer of that layer. In operation 840, the regularization term may be determined in consideration of at least one of, for example, a first constraint that makes the radius of a sphere belonging to a specific layer smaller than the radius of a sphere belonging to the upper layer of that layer, a second constraint that places the center of a sphere belonging to a specific layer inside a sphere belonging to the upper layer of that layer, and a third constraint that keeps spheres belonging to the same layer from overlapping each other in the hierarchical hyperspherical space.

The regularization term may be determined, for example, so that the degree of distribution of the plurality of parameter vectors becomes greater than a threshold degree of distribution. Here, the degree of distribution may indicate the degree to which the parameter vectors are globally uniformly distributed in the hierarchical hyperspherical space, in other words, the degree of regularization ([…]). The degree of distribution may be determined, for example, based on a combination of the discrete distance between the parameter vectors and the continuous distance between the parameter vectors. The discrete distance may be determined by quantizing the parameter vectors and then calculating the Hamming distance between the quantized parameter vectors. The continuous distance may include the angular distance between the parameter vectors.

In addition, the regularization term may be determined based on at least one of, for example, a first distance term based on the distances between the center vectors of spheres belonging to the same layer in the hierarchical spherical space, a second distance term based on the distances between the surface vectors of spheres belonging to the same layer, a third distance term based on the distances between the center vectors of spheres belonging to different layers, and a fourth distance term based on the distances between the surface vectors of spheres belonging to different layers.

The training apparatus trains the parameter vectors based on the loss term determined in operation 830 and the regularization term determined in operation 840 (850).
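How a loss term and a regularization term might be combined during training can be sketched in Python as follows. The source states only that training is based on both terms; the weighted sum, the weight `reg_weight`, the plain gradient-descent update, and all names below are assumptions made for illustration.

```python
def total_objective(loss_term, regularization_term, reg_weight=1.0):
    """Assumed objective: loss plus a weighted regularization term."""
    return loss_term + reg_weight * regularization_term


def gradient_step(params, loss_grads, reg_grads, learning_rate, reg_weight=1.0):
    """One gradient-descent update on the assumed combined objective."""
    return [
        p - learning_rate * (gl + reg_weight * gr)
        for p, gl, gr in zip(params, loss_grads, reg_grads)
    ]


# Hypothetical values for a two-parameter model.
print(total_objective(0.8, 0.2, reg_weight=0.5))
print(gradient_step([1.0, -2.0], [0.5, 0.5], [1.0, -1.0],
                    learning_rate=0.1, reg_weight=0.5))
```

In practice the gradients would come from automatic differentiation of the loss and of the regularization term over the center and surface vectors; they are passed in directly here to keep the sketch self-contained.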

FIG. 9 is a block diagram of a neural network-based data processing apparatus according to an embodiment. Referring to FIG. 9, a data processing apparatus 900 according to an embodiment includes a communication interface 910 and a processor 920. The data processing apparatus 900 may further include a memory 930. The communication interface 910, the processor 920, and the memory 930 may communicate with each other through a communication bus 905.

The communication interface 910 receives input data. The communication interface 910 obtains a plurality of parameter vectors representing a hierarchical hyperspherical space including a plurality of spheres belonging to different layers.

프로세서(920)는 복수의 파라미터 벡터들을 신경망에 적용하고, 신경망을 이용하여 입력 데이터를 처리한다. The processor 920 applies a plurality of parameter vectors to a neural network and processes input data using the neural network.

In addition, the processor 920 may perform at least one of the methods described above with reference to FIGS. 1 to 8, or an algorithm corresponding to at least one of those methods. The processor 920 may be a hardware-implemented data processing device having circuitry with a physical structure for executing desired operations. For example, the desired operations may include code or instructions included in a program. For example, the hardware-implemented data processing device may include a microprocessor, a central processing unit, a processor core, a multi-core processor, a multiprocessor, an application-specific integrated circuit (ASIC), and a field programmable gate array (FPGA).

프로세서(920)는 프로그램을 실행하고, 데이터 처리 장치(900)를 제어할 수 있다. 프로세서(920)에 의하여 실행되는 프로그램 코드는 메모리(930)에 저장될 수 있다.The processor 920 may execute a program and control the data processing device 900. The program code executed by the processor 920 may be stored in the memory 930.

메모리(930)는 상술한 프로세서(920)의 처리 과정에서 생성되는 다양한 정보들을 저장할 수 있다. 이 밖에도, 메모리(930)는 각종 데이터와 프로그램 등을 저장할 수 있다. 메모리(930)는 휘발성 메모리 또는 비휘발성 메모리를 포함할 수 있다. 메모리(930)는 하드 디스크 등과 같은 대용량 저장 매체를 구비하여 각종 데이터를 저장할 수 있다.The memory 930 may store various types of information generated in the process of the processor 920 described above. In addition, the memory 930 may store various types of data and programs. The memory 930 may include a volatile memory or a nonvolatile memory. The memory 930 may include a mass storage medium such as a hard disk to store various types of data.

The method according to an embodiment may be implemented in the form of program instructions that can be executed through various computer means and recorded on a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be specially designed and configured for the embodiment, or may be known and available to those skilled in computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tapes; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine code such as that produced by a compiler, but also high-level language code that can be executed by a computer using an interpreter or the like. The hardware devices described above may be configured to operate as one or more software modules to perform the operations of the embodiment, and vice versa.

Although the embodiments have been described with reference to a limited set of drawings, a person of ordinary skill in the art may apply various technical modifications and variations based on the above. For example, appropriate results may be achieved even if the described techniques are performed in an order different from the described method, and/or components of the described systems, structures, devices, circuits, and the like are coupled or combined in a form different from the described method, or are replaced or substituted by other components or equivalents. Therefore, other implementations, other embodiments, and equivalents of the claims also fall within the scope of the following claims.

Claims

A neural network-based data processing method, comprising:
receiving input data;
obtaining a plurality of parameter vectors representing a hierarchical hyperspherical space, the hierarchical hyperspherical space including a plurality of spheres belonging to different layers;
applying the plurality of parameter vectors to a neural network; and
processing the input data using the neural network.

The method of claim 1,
wherein the neural network includes a convolutional neural network, and
the plurality of parameter vectors includes a plurality of filter parameter vectors.

The method of claim 1,
wherein the input data includes image data.

The method of claim 1,
wherein the centers of spheres belonging to a same layer in the hierarchical hyperspherical space are determined based on the center of a sphere belonging to an upper layer of the same layer.

The method of claim 1,
wherein, in the hierarchical hyperspherical space, the radius of a sphere belonging to a specific layer is smaller than the radius of a sphere belonging to an upper layer of the specific layer.

The method of claim 1,
wherein, in the hierarchical hyperspherical space, the center of a sphere belonging to a specific layer is located inside a sphere belonging to an upper layer of the specific layer.

The method of claim 1,
wherein, in the hierarchical hyperspherical space, spheres belonging to a same layer do not overlap each other.

The method of claim 1,
wherein a degree of distribution of the plurality of parameter vectors is greater than a threshold degree of distribution, the degree of distribution indicating the degree to which the plurality of parameter vectors are globally uniformly distributed in the hierarchical hyperspherical space.

The method of claim 8,
wherein the degree of distribution is determined based on a combination of a discrete distance between the plurality of parameter vectors and a continuous distance between the plurality of parameter vectors.

The method of claim 9,
wherein the discrete distance is determined by quantizing the plurality of parameter vectors and then calculating a Hamming distance between the quantized parameter vectors.

The method of claim 9,
wherein the continuous distance includes an angular distance between the plurality of parameter vectors.

The method of claim 1,
wherein each of the plurality of parameter vectors includes a center vector indicating the center of a corresponding sphere and a surface vector indicating the surface of the corresponding sphere.

The method of claim 12,
wherein the applying comprises, for each of the plurality of parameter vectors:
generating a projection vector based on the corresponding center vector and the corresponding surface vector; and
applying the projection vector to the neural network.

A neural network training method, comprising:
receiving training data;
processing the training data using a neural network;
determining a loss term based on a label of the training data and a result of processing the training data;
determining a regularization term such that parameter vectors of the neural network represent a hierarchical hyperspherical space, the hierarchical hyperspherical space including a plurality of spheres belonging to different layers; and
training the parameter vectors based on the loss term and the regularization term.
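
The way a loss term and a regularization term combine into one training objective can be sketched as follows. The cross-entropy loss, the weighting factor `weight`, and in particular the squared cosine similarity penalty are assumptions: the penalty is a simple stand-in that rewards spread-out parameter vectors, not the hierarchical hyperspherical regularization term itself.

```python
import math
from itertools import combinations


def cross_entropy(logits, label):
    # Loss term: negative log-probability of the labeled class,
    # computed with the usual max-shift for numerical stability.
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


def regularization_term(vectors):
    # Stand-in regularizer: mean squared cosine similarity over all
    # pairs of nonzero parameter vectors (0 when mutually orthogonal).
    pairs = list(combinations(vectors, 2))
    return sum(cosine(a, b) ** 2 for a, b in pairs) / len(pairs)


def objective(logits, label, vectors, weight=0.1):
    # Training objective: loss term plus weighted regularization term.
    return cross_entropy(logits, label) + weight * regularization_term(vectors)


aligned = [(1.0, 0.0), (2.0, 0.0)]
orthogonal = [(1.0, 0.0), (0.0, 1.0)]
```

For the same network output, `objective` is larger for `aligned` than for `orthogonal`, so a gradient step on the objective would push the parameter vectors apart while still fitting the labels.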

The method of claim 14,
wherein the neural network includes a convolutional neural network,
the parameter vectors include a plurality of filter parameter vectors, and
the training data includes image data.

The method of claim 14,
wherein the centers of spheres belonging to a same layer in the hierarchical hyperspherical space are determined based on the center of a sphere belonging to an upper layer of the same layer.

The method of claim 14,
wherein the regularization term is determined in consideration of at least one of:
a first constraint that the radius of a sphere belonging to a specific layer in the hierarchical hyperspherical space is smaller than the radius of a sphere belonging to an upper layer of the specific layer;
a second constraint that the center of the sphere belonging to the specific layer is located inside the sphere belonging to the upper layer of the specific layer; and
a third constraint that spheres belonging to a same layer in the hierarchical hyperspherical space do not overlap each other.

The method of claim 14,
wherein the regularization term is determined such that a degree of distribution of the parameter vectors is greater than a threshold degree of distribution, the degree of distribution indicating the degree to which the parameter vectors are globally uniformly distributed in the hierarchical hyperspherical space.

The method of claim 18,
wherein the degree of distribution is determined based on a combination of a discrete distance between the parameter vectors and a continuous distance between the parameter vectors.

The method of claim 19,
wherein the discrete distance is determined by quantizing the parameter vectors and then calculating a Hamming distance between the quantized parameter vectors, and
the continuous distance includes an angular distance between the parameter vectors.

The method of claim 14,
wherein each of the parameter vectors includes a center vector indicating the center of a corresponding sphere and a surface vector indicating the surface of the corresponding sphere.

The method of claim 14,
wherein the regularization term is determined based on at least one of:
a first distance term based on a distance between center vectors of spheres belonging to a same layer in the hierarchical hyperspherical space;
a second distance term based on a distance between surface vectors of spheres belonging to a same layer in the hierarchical hyperspherical space;
a third distance term based on a distance between center vectors of spheres belonging to different layers in the hierarchical hyperspherical space; and
a fourth distance term based on a distance between surface vectors of spheres belonging to different layers in the hierarchical hyperspherical space.
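
The four distance terms listed above differ only in which vectors are compared (center or surface) and whether the two spheres share a layer. A minimal sketch of that grouping follows. The dictionary layout, the key names, the choice of angular distance, and the averaging per group are assumptions for illustration; the claim does not specify the distance or how the terms are aggregated.

```python
import math
from itertools import combinations


def angular(a, b):
    # Angle between two nonzero vectors, scaled to [0, 1].
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return math.acos(max(-1.0, min(1.0, dot / (na * nb)))) / math.pi


def distance_terms(spheres):
    # Group pairwise distances into the four terms:
    # {center, surface} x {same layer, different layers}.
    groups = {
        "center_same": [], "surface_same": [],
        "center_cross": [], "surface_cross": [],
    }
    for p, q in combinations(spheres, 2):
        scope = "same" if p["layer"] == q["layer"] else "cross"
        groups["center_" + scope].append(angular(p["center"], q["center"]))
        groups["surface_" + scope].append(angular(p["surface"], q["surface"]))
    # Average each group; an empty group contributes 0.0.
    return {k: (sum(v) / len(v) if v else 0.0) for k, v in groups.items()}


# Illustrative values only: one sphere in layer 0, two in layer 1.
spheres = [
    {"layer": 0, "center": (1.0, 0.0), "surface": (1.0, 1.0)},
    {"layer": 1, "center": (0.0, 1.0), "surface": (1.0, 0.0)},
    {"layer": 1, "center": (0.0, -1.0), "surface": (-1.0, 0.0)},
]
terms = distance_terms(spheres)
```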

A computer program stored in a computer-readable recording medium for executing, in combination with hardware, the method of any one of claims 14 to 22.

A neural network-based data processing device, comprising:
a communication interface configured to receive input data and obtain a plurality of parameter vectors representing a hierarchical hyperspherical space, the hierarchical hyperspherical space including a plurality of spheres belonging to different layers; and
a processor configured to apply the plurality of parameter vectors to a neural network and process the input data using the neural network.