KR20230115143A

KR20230115143A - Apparatus for generating a learning model using log scaling loss function and method therefor

Info

Publication number: KR20230115143A
Application number: KR1020220011779A
Authority: KR
Inventors: 박영현
Original assignee: SK Planet Co., Ltd. (에스케이플래닛 주식회사)
Priority date: 2022-01-26
Filing date: 2022-01-26
Publication date: 2023-08-02

Abstract

A method for generating a learning model according to the present invention comprises: preparing, by a data processing unit, learning data including input data for learning and a label corresponding to the input data; inputting, by the data processing unit, the input data for learning into a learning model whose training has not been completed; calculating, by the learning model, output data for learning through a plurality of operations that apply not-yet-trained weights between a plurality of layers to the input data for learning; calculating, by a loss derivation unit, a loss representing the difference between the output data for learning and the label, using a log mean square error function obtained by converting a mean square error function to a logarithmic scale; and performing, by an optimization unit, optimization that modifies the weights of the learning model so that the loss is minimized.

Description

Apparatus for generating a learning model using log scaling loss function and method therefor

The present invention relates to loss functions and, more particularly, to an apparatus and method for generating a learning model using a logarithmic scaling loss function.

With machine learning algorithms, detailed features of the data must be defined in order to train a model. If the operator of the machine learning system has advanced knowledge of, or experience with, the problem, the features used to train the model can be defined more precisely. However, reaching this level takes considerable time, effort, and cost, and it also makes the system heavily dependent on its machine learning operators.

Deep learning techniques, by contrast, require a high-performance computing environment, such as graphics processing units, for training. In return, they can take over the delicate, domain-knowledge-based work of the conventional methods described above, such as defining decision rules or features. With deep learning, the operator only needs to define the model structure and the loss function in order to train the neural network well, and many effective loss functions and neural network architectures have already been proposed.

In other words, deep learning-based algorithms are widely adopted because they can produce an artificial neural network model with little or no domain knowledge about the task. To train such networks more stably, however, it is desirable to define an appropriate network structure and loss function.

Korean Patent Publication No. 2020-0116225 (published on October 12, 2020)

An object of the present invention is to provide an apparatus and method for generating a learning model using a logarithmic scaling loss function.

A method for generating a learning model according to the present invention comprises the following steps. When the learning model has calculated output data for learning from input data for learning, a loss derivation unit calculates a loss representing the difference between the output data for learning and the label. It does so using a log mean square error function obtained by converting a mean square error function to a logarithmic scale. An optimization unit then performs optimization that modifies the weights of the learning model so that the loss is minimized.

In the calculating of the loss, the loss is calculated using the log mean square error function $\mathrm{LMSE}(y, \hat{y})$, where $y$ is the label and $\hat{y}$ is the output data for learning.

The log mean square error function is generated in three steps. First, the structure of the mean square error function is converted into a negated logarithmic form. Next, the form $1 - X$ is applied. Finally, the squared error $(y - \hat{y})^2$ is substituted for $X$.

A method for generating a learning model according to the present invention comprises the following steps:
- preparing, by a data processing unit, learning data including input data for learning and a label corresponding to the input data;
- inputting, by the data processing unit, the input data for learning into a learning model whose training has not been completed;
- calculating, by the learning model, output data for learning through a plurality of operations that apply not-yet-trained weights between a plurality of layers to the input data for learning;
- calculating, by a loss derivation unit, a loss representing the difference between the output data for learning and the label, using a log mean square error function obtained by converting a mean square error function to a logarithmic scale; and
- performing, by an optimization unit, optimization that modifies the weights of the learning model so that the loss is minimized.

In the calculating of the loss, the loss is calculated using the log mean square error function $\mathrm{LMSE}(y, \hat{y})$, where $y$ is the label and $\hat{y}$ is the output data for learning.

When the learning model is a generative network, the label is the input data for learning. When the learning model is a classification network, the label is a target value indicating the class to which the input data for learning belongs.

An apparatus for generating a learning model according to the present invention includes a loss derivation unit and an optimization unit. When the learning model has calculated output data for learning from input data for learning, the loss derivation unit calculates a loss representing the difference between the output data for learning and a label. It does so using a log mean square error function obtained by converting a mean square error function to a logarithmic scale. The optimization unit performs optimization that modifies the weights of the learning model so that the loss is minimized.

The loss derivation unit calculates the loss using the log mean square error function $\mathrm{LMSE}(y, \hat{y})$, where $y$ is the label and $\hat{y}$ is the output data for learning.

An apparatus for generating a learning model according to the present invention includes the following components:
- a data processing unit that prepares learning data including input data for learning and a label corresponding to the input data, and inputs the input data for learning into a learning model whose training has not been completed;
- a loss derivation unit that, once the learning model has calculated output data for learning, calculates a loss representing the difference between the output data for learning and the label. The output data is produced through a plurality of operations that apply not-yet-trained weights between a plurality of layers to the input data for learning. The loss is calculated using a log mean square error function obtained by converting a mean square error function to a logarithmic scale; and
- an optimization unit that performs optimization that modifies the weights of the learning model so that the loss is minimized.

The loss derivation unit calculates the loss using the log mean square error function $\mathrm{LMSE}(y, \hat{y})$, where $y$ is the label and $\hat{y}$ is the output data for learning.

When gradient descent is performed with the log mean square error function, which is the loss function of the present invention, the optimization proceeds stably.

FIG. 1 is a diagram illustrating the configuration of an apparatus for generating a learning model using a logarithmic scaling loss function according to an embodiment of the present invention.
FIG. 2 is a flowchart illustrating a method for generating a learning model according to an embodiment of the present invention.
FIG. 3 is a graph illustrating the advantages of the log mean square error (LMSE) function according to an embodiment of the present invention.
FIG. 4 is an exemplary diagram of a hardware system for implementing an apparatus for generating a learning model using a logarithmic scaling loss function according to an embodiment of the present invention.

To clarify the features and advantages of the means for solving the problem, the present invention is described in more detail below with reference to the specific embodiments shown in the accompanying drawings.

In the following description and the accompanying drawings, detailed descriptions of well-known functions or configurations that may obscure the gist of the present invention are omitted. Throughout the drawings, the same components are denoted by the same reference numerals wherever possible.

Terms and words used in the following description and drawings should not be construed as limited to their ordinary or dictionary meanings. An inventor may appropriately define the concepts of terms in order to describe the invention in the best way. On that principle, the terms should be interpreted with meanings and concepts consistent with the technical spirit of the present invention. Accordingly, the embodiments described in this specification and the configurations shown in the drawings are merely the most preferred embodiments of the present invention. They do not represent all of its technical ideas. It should therefore be understood that, as of the filing date of this application, various equivalents and modifications capable of replacing them may exist.

Terms including ordinal numbers, such as "first" and "second", are used to describe various components. They serve only to distinguish one component from another and do not limit the components. For example, a second component may be termed a first component, and similarly a first component may be termed a second component, without departing from the scope of the present invention.

When a component is referred to as being "connected" or "coupled" to another component, it may be logically or physically connected or coupled to it. In other words, a component may be directly connected or coupled to another component. It should also be understood that other components may be present in between, or that the connection or coupling may be indirect.

The terms used in this specification describe specific embodiments only and are not intended to limit the present invention. Singular expressions include plural expressions unless the context clearly dictates otherwise. Terms such as "include" or "have" indicate the presence of the features, numbers, steps, operations, components, parts, or combinations thereof described in the specification. They should be understood not to preclude in advance the presence or possible addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

Terms such as "...unit", "...or", and "module" in the specification denote a unit that processes at least one function or operation. Such a unit may be implemented in hardware, software, or a combination of hardware and software.

The words "a", "an", "one", "the", and similar words may be used in the context of describing the present invention, particularly in the context of the following claims. In that context they include both the singular and the plural, unless otherwise indicated in this specification or clearly contradicted by context.

Embodiments within the scope of the present invention also include computer-readable media having or carrying computer-executable instructions or data structures stored thereon. Such computer-readable media may be any available media accessible by a general-purpose or special-purpose computer system. By way of example, and not limitation, such media may include physical storage media such as RAM, ROM, EPROM, CD-ROM or other optical disk storage, and magnetic disk storage or other magnetic storage devices. They may also include any other medium that can store or carry program code means in the form of computer-executable instructions, computer-readable instructions, or data structures, and that can be accessed by a general-purpose or special-purpose computer system.

In the following description and claims, a "network" is defined as one or more data links that enable electronic data to be transferred between computer systems and/or modules. Information may be transferred or provided to a computer system over a network or another communication connection (wired, wireless, or a combination of the two). In that case, the connection may be regarded as a computer-readable medium. Computer-readable instructions include, for example, instructions and data that cause a general-purpose or special-purpose computer system to perform a particular function or group of functions. Computer-executable instructions may be, for example, binaries, intermediate-format instructions such as assembly language, or even source code.

The present invention may be practiced in network computing environments with many types of computer system configurations. These include personal computers, laptop computers, handheld devices, multiprocessor systems, microprocessor-based or programmable consumer electronics, network PCs, minicomputers, mainframe computers, mobile phones, PDAs, pagers, and the like. The invention may also be practiced in distributed system environments in which tasks are performed by both local and remote computer systems. These systems are linked through a network by wired data links, wireless data links, or a combination of the two. In a distributed system environment, program modules may be located in both local and remote memory storage devices.

First, an apparatus for generating a learning model using a logarithmic scaling loss function according to an embodiment of the present invention is described. FIG. 1 is a diagram illustrating the configuration of this apparatus.

Referring to FIG. 1, the apparatus 10 for generating a learning model according to an embodiment of the present invention (hereinafter, the "learning device") generates a learning model LM (a machine learning model or a deep learning model) from learning data. The learning device 10 includes a data processing unit 100, a loss derivation unit 200, and an optimization unit 300.

The learning model LM includes a plurality of layers, each of which performs a plurality of operations. The results of the operation modules in one layer are each weighted and passed to the next layer. That is, weights are applied to the outputs of the current layer, and the weighted values become the inputs to the operations of the next layer. In this way, the learning model LM performs a plurality of operations to which the weights of a plurality of layers are applied.
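The layer-to-layer weighting described above can be sketched as a plain forward pass. This is a minimal illustration, not the patent's implementation. The two-layer shape, the weight values, and the use of a sigmoid after every layer are all hypothetical choices made only for the example.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]  # W[i][j]: weight from unit i of one layer to unit j of the next

def dense(x: Sequence[float], W: Matrix, b: Sequence[float]) -> Vector:
    """Apply one layer's weights: out[j] = sum_i x[i] * W[i][j] + b[j]."""
    return [sum(x[i] * W[i][j] for i in range(len(x))) + b[j] for j in range(len(b))]

def sigmoid(v: Sequence[float]) -> Vector:
    """Element-wise sigmoid, assumed here as the operation that follows every layer."""
    return [1.0 / (1.0 + math.exp(-z)) for z in v]

def forward(x: Sequence[float], layers: Sequence[Tuple[Matrix, Vector]]) -> Vector:
    """Feed x through each (W, b) layer; each layer's weighted result is the next layer's input."""
    h = list(x)
    for W, b in layers:
        h = sigmoid(dense(h, W, b))
    return h

# Hypothetical two-layer model: 2 inputs -> 2 hidden units -> 1 output.
layers = [
    ([[0.5, -0.5], [0.25, 0.75]], [0.0, 0.0]),
    ([[1.0], [1.0]], [0.0]),
]
output = forward([1.0, 2.0], layers)
```

In a real model, the weights in `layers` are the untrained values that the optimization unit later updates.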

The layers of the learning model LM include a combination of one or more of the following: fully-connected layers, convolutional layers, recurrent layers, graph layers, and pooling layers. The operations include, for example, convolution, down-sampling, up-sampling, and activation functions. Examples of activation functions include sigmoid, hyperbolic tangent (tanh), exponential linear unit (ELU), rectified linear unit (ReLU), leaky ReLU, Maxout, Minout, and softmax.
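For reference, several of the listed activation functions are sketched below in their common textbook forms. The negative-side slope `alpha` for leaky ReLU and ELU is an assumed default, not a value from the text. Maxout and Minout are omitted because they depend on how the layer groups its units.

```python
import math
from typing import List, Sequence

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def tanh(z: float) -> float:
    return math.tanh(z)

def relu(z: float) -> float:
    return max(0.0, z)

def leaky_relu(z: float, alpha: float = 0.01) -> float:
    # alpha: assumed slope for negative inputs
    return z if z > 0.0 else alpha * z

def elu(z: float, alpha: float = 1.0) -> float:
    # alpha: assumed saturation scale for negative inputs
    return z if z > 0.0 else alpha * (math.exp(z) - 1.0)

def softmax(zs: Sequence[float]) -> List[float]:
    m = max(zs)  # subtract the maximum so math.exp cannot overflow
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]
```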

The learning model LM may be a generative network or a classification network.

When the learning model LM is a generative network, it receives input data and performs a plurality of operations to which the weights of the layers are applied. It then outputs output data that reproduces the input data. In other words, the output data may be a reconstruction of the input data. Examples of such generative networks include the restricted Boltzmann machine (RBM), the auto-encoder (AE), and the generative adversarial network (GAN).

When the learning model LM is a classification network, it receives input data and performs a plurality of operations to which the weights of the layers are applied. It then outputs, as output data, probabilities for a plurality of classes. That is, the output data is the probability that the input data belongs to each of the classes. Examples of such classification networks include the convolutional neural network (CNN) and the recurrent neural network (RNN).

The data processing unit 100 prepares the learning data. The learning data includes input data for learning, which is fed into the learning model LM, and a label corresponding to that input data. When the learning model LM is a generative network, the label may be the input data for learning itself. When the learning model LM is a classification network, the label may be a target value indicating the class to which the input data for learning belongs. Once the learning data is prepared, the data processing unit 100 inputs the input data for learning into the learning model LM. The learning model LM then calculates output data for learning through a plurality of operations that apply the not-yet-trained weights between its layers.
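The two labelling rules above can be sketched as follows. This is a minimal illustration under stated assumptions. The function names are hypothetical. Encoding the classification "target value" as a one-hot vector is an assumed convention, not one stated in the text, though it keeps every label in [0, 1].

```python
from typing import List, Sequence, Tuple

Sample = Tuple[List[float], List[float]]  # (input data for learning, label)

def generative_pairs(inputs: Sequence[Sequence[float]]) -> List[Sample]:
    """Generative network (e.g. an auto-encoder): the label is the input itself."""
    return [(list(x), list(x)) for x in inputs]

def classification_pairs(inputs: Sequence[Sequence[float]],
                         class_ids: Sequence[int],
                         num_classes: int) -> List[Sample]:
    """Classification network: the label is an (assumed) one-hot target for the true class."""
    if len(inputs) != len(class_ids):
        raise ValueError("each input needs exactly one class id")
    pairs: List[Sample] = []
    for x, c in zip(inputs, class_ids):
        if not 0 <= c < num_classes:
            raise ValueError(f"class id {c} out of range")
        target = [0.0] * num_classes
        target[c] = 1.0
        pairs.append((list(x), target))
    return pairs
```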

The loss derivation unit 200 calculates a loss using a loss function. Here, the loss represents the difference between the output data for learning and the label. The loss function according to an embodiment of the present invention is based on a logarithmic scale.

More specifically, the loss function according to an embodiment of the present invention is obtained by converting the mean squared error (MSE) function to a logarithmic scale. It is referred to as the logarithmic mean squared error (LMSE) function and is given by Equation 1 below.

$$
\mathrm{LMSE}(y, \hat{y}) = -\frac{1}{n}\sum_{i=1}^{n}\log\left(1 - (y_i - \hat{y}_i)^2 + \varepsilon\right) \qquad \text{(Equation 1)}
$$

In Equation 1, $n$ is the number of output elements and $\varepsilon$ is a small positive constant.

Here, $\hat{y}$ is the output data for learning of the learning model, and $y$ is the label corresponding to the input data for learning. When the learning model LM is a generative network, the label may be the input data for learning. When it is a classification network, the label may be a target value indicating the class to which the input data for learning belongs.

The LMSE function of the present invention is obtained by converting the MSE function to a logarithmic scale while preserving its structure. When $y$ and $\hat{y}$ are normalized to the range 0 to 1, the log-scaled quantity ranges from -1 to 0. The LMSE function is to be used for loss minimization, that is, for optimization. For that purpose, its loss must be adjusted to lie in the range 0 to $C$, where $C$ is a positive finite value.
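The range argument above relies on $y$ and $\hat{y}$ lying in [0, 1]. The sketch below shows one way to get there, assuming min-max scaling of raw values; the text does not name a normalization method. In practice, the model output would typically be bounded by a sigmoid output layer, which is also an assumption here. Once both values are in [0, 1], each squared error also lies in [0, 1].

```python
from typing import List, Sequence

def min_max_normalize(values: Sequence[float]) -> List[float]:
    """Rescale values linearly so that the smallest maps to 0.0 and the largest to 1.0."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant input: nothing to spread out
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def squared_errors(y: Sequence[float], y_hat: Sequence[float]) -> List[float]:
    """Element-wise (y - y_hat)^2; each term lies in [0, 1] when both inputs are in [0, 1]."""
    return [(t - p) ** 2 for t, p in zip(y, y_hat)]
```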

To use a logarithmic function while preserving the structure of the MSE function, the logarithm is first negated so that its range is inverted, which changes the output range to 1 to 0. Next, the form $1 - X$ is applied to swap the start and end values of the output range (1 to 0). Finally, the squared error $(y - \hat{y})^2$ is substituted for $X$.

However, the loss would then become infinite at the end of the range, because $-\log(1 - X)$ diverges as $X \to 1$. A small floating-point value $\varepsilon$ is therefore added to the $1 - X$ form. As a result, the loss of the LMSE function ranges from approximately 0 to $C$, a positive finite value equal to $-\log\varepsilon$ when $X = 1$.
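The construction above can be sketched as a loss function, with plain MSE alongside for comparison. This assumes the form $\mathrm{LMSE} = -\frac{1}{n}\sum_i \log(1 - (y_i - \hat{y}_i)^2 + \varepsilon)$, which follows the negate, $1 - X$, substitute, and $+\varepsilon$ steps described in the text. The default `eps = 1e-7` is an assumed value. With this form, a perfect prediction gives a loss of about $-\varepsilon$ (effectively 0), and the worst case $|y - \hat{y}| = 1$ gives $-\log\varepsilon$.

```python
import math
from typing import Sequence

def lmse(y: Sequence[float], y_hat: Sequence[float], eps: float = 1e-7) -> float:
    """Log mean squared error, assuming LMSE = -(1/n) * sum(log(1 - (y_i - y_hat_i)^2 + eps)).

    y and y_hat are expected to be normalized to [0, 1]; eps keeps the log finite
    when |y_i - y_hat_i| reaches 1.
    """
    if len(y) != len(y_hat) or not y:
        raise ValueError("y and y_hat must be non-empty and of equal length")
    total = 0.0
    for t, p in zip(y, y_hat):
        total += -math.log(1.0 - (t - p) ** 2 + eps)
    return total / len(y)

def mse(y: Sequence[float], y_hat: Sequence[float]) -> float:
    """Ordinary mean squared error, for comparison."""
    return sum((t - p) ** 2 for t, p in zip(y, y_hat)) / len(y)
```

Because $-\log(1 - u) \ge u$ on $[0, 1)$, this LMSE is never much below MSE for the same errors, and it grows much faster as an error approaches 1.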

The optimization unit 300 performs optimization that modifies the parameters of the learning model so that the loss derived by the LMSE function is minimized. The optimization is performed according to Equation 2 below, in which $\theta_t$ denotes the weights of the learning model at the $t$-th iteration.

$$
\theta_{t+1} = \theta_t - \eta\,\nabla L_t \qquad \text{(Equation 2)}
$$

Here, $L_t$ is the loss derived at the $t$-th iteration, $\nabla L_t$ is the gradient of that loss, and $\eta$ is the learning rate.
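A single gradient-descent update of the form $\theta_{t+1} = \theta_t - \eta\,\nabla L_t$ can be sketched as below. The toy loss $(w - 3)^2$ and the `minimize` helper are hypothetical illustrations, not part of the patent. Any differentiable loss, including LMSE, can supply the gradient.

```python
from typing import Callable, List

def gd_step(theta: List[float], grad: List[float], lr: float) -> List[float]:
    """One gradient-descent update: theta_{t+1} = theta_t - lr * grad(L_t)."""
    if len(theta) != len(grad):
        raise ValueError("theta and grad must have the same length")
    return [w - lr * g for w, g in zip(theta, grad)]

def minimize(grad_fn: Callable[[List[float]], List[float]], theta: List[float],
             lr: float = 0.1, iters: int = 100) -> List[float]:
    """Repeat gd_step for a fixed number of iterations (hypothetical helper)."""
    for _ in range(iters):
        theta = gd_step(theta, grad_fn(theta), lr)
    return theta

# Toy example: minimize L(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
w_star = minimize(lambda th: [2.0 * (th[0] - 3.0)], [0.0])
```

With `lr = 0.1`, each step shrinks the distance to the minimum by a factor of 0.8, so `w_star` ends up very close to 3.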

Next, a method for training the learning model LM according to an embodiment of the present invention is described. FIG. 2 is a flowchart illustrating this method.

Referring to FIG. 2, in step S110 the data processing unit 100 prepares learning data for the learning model LM. The learning data includes input data for learning and labels. When the learning model LM is a generative network, the label may be the input data for learning. When it is a classification network, the label may be a target value indicating the class to which the input data for learning belongs.

In step S120, the data processing unit 100 inputs the input data for learning into the learning model LM, whose training has not been completed. In step S130, the learning model LM then calculates output data for learning through a plurality of operations that apply the not-yet-trained weights between its layers.

Next, in step S140, the loss derivation unit 200 calculates a loss representing the difference between the output data for learning and the label. It uses the loss function for this, namely the LMSE function of Equation 1.

Next, in step S150, the optimization unit 300 performs optimization that modifies the weights of the learning model LM so that the loss derived from the loss function is minimized. A gradient descent method may be used as the optimization algorithm.

Steps S120 to S150 are repeated using a plurality of different learning data, and the weights of the learning model LM are updated with each repetition. The repetition continues until the loss falls to or below a preset target value. Accordingly, in step S160 the learning device 10 determines whether the loss calculated in step S140 is less than or equal to the preset target value. If it is, the learning device 10 completes the training of the learning model LM in step S170.
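The loop of steps S120 to S170 can be sketched end to end with a deliberately tiny model. All of the following are hypothetical stand-ins for illustration: the single-unit model $\hat{y} = \mathrm{sigmoid}(w x + b)$, the three data points, the learning rate, and the target loss. The loss uses the assumed LMSE form $-\frac{1}{n}\sum_i \log(1 - (y_i - \hat{y}_i)^2 + \varepsilon)$, and its gradient is obtained by the chain rule.

```python
import math
from typing import Sequence, Tuple

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def lmse(y: Sequence[float], y_hat: Sequence[float], eps: float = 1e-7) -> float:
    # Assumed LMSE form: -(1/n) * sum(log(1 - (y_i - y_hat_i)^2 + eps))
    return sum(-math.log(1.0 - (t - p) ** 2 + eps) for t, p in zip(y, y_hat)) / len(y)

def train(data: Sequence[Tuple[float, float]], lr: float = 0.5, target: float = 1e-3,
          max_iters: int = 10_000, eps: float = 1e-7) -> Tuple[float, float, float, int]:
    """Fit the hypothetical model y_hat = sigmoid(w * x + b) by minimizing LMSE."""
    xs = [x for x, _ in data]
    ys = [y for _, y in data]
    n = len(data)
    w, b = 0.0, 0.0  # untrained weights that receive the learning data (S120)
    loss = float("inf")
    for it in range(max_iters):
        preds = [sigmoid(w * x + b) for x in xs]  # S130: output data for learning
        loss = lmse(ys, preds, eps)               # S140: LMSE loss against the labels
        if loss <= target:                        # S160: compare with the target value
            return w, b, loss, it                 # S170: training complete
        grad_w = grad_b = 0.0
        for x, t, p in zip(xs, ys, preds):
            # chain rule: dLMSE/dp * dp/dz, with dp/dz = p * (1 - p) for the sigmoid
            dl_dp = 2.0 * (p - t) / (n * (1.0 - (t - p) ** 2 + eps))
            dl_dz = dl_dp * p * (1.0 - p)
            grad_w += dl_dz * x
            grad_b += dl_dz
        w -= lr * grad_w                          # S150: gradient-descent update
        b -= lr * grad_b
    return w, b, loss, max_iters

data = [(-2.0, 0.1), (0.0, 0.5), (2.0, 0.9)]
w, b, final_loss, iterations = train(data)
```

For simplicity, the sketch reuses the same three samples on every iteration. The text describes iterating over a plurality of different learning data, which in practice would mean drawing new mini-batches each time.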

Next, the LMSE function according to an embodiment of the present invention is compared with the MSE function. FIG. 3 is a graph illustrating the advantages of the LMSE function.

To apply gradient descent for optimization, the gradient (slope) of the loss must be obtained, which requires differentiating the loss function.

First, the derivative of the mean square error (MSE) function, which gives the slope of the MSE loss, is shown in Equation 3 below.
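
Equation 3 is not reproduced in this text. For reference, the standard MSE over n outputs and its derivative with respect to one output ŷ_i are given below; the notation is assumed.

```latex
\begin{aligned}
L_{\mathrm{MSE}} &= \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2 \\
\frac{\partial L_{\mathrm{MSE}}}{\partial \hat{y}_i} &= -\frac{2}{n}\left(y_i - \hat{y}_i\right)
\end{aligned}
```

The MSE slope grows only linearly with the gap y_i − ŷ_i.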

FIG. 3(A) visualizes the slope of the MSE loss.

Likewise, the derivative of the LMSE function, which gives the slope of the LMSE loss, is shown in Equation 4 below.
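
Equation 4 is likewise not reproduced. Suppose the LMSE takes the hypothetical per-element form shown below, which is one reading of the construction described in the claims and not the confirmed Equation 1. The chain rule then gives:

```latex
\begin{aligned}
L_{\mathrm{LMSE}} &= -\frac{1}{n}\sum_{i=1}^{n}\log\left(1 - (y_i - \hat{y}_i)^2\right) \\
\frac{\partial L_{\mathrm{LMSE}}}{\partial \hat{y}_i}
  &= -\frac{2}{n}\,\frac{y_i - \hat{y}_i}{1 - (y_i - \hat{y}_i)^2}
   = \frac{1}{1 - (y_i - \hat{y}_i)^2}\left(-\frac{2}{n}\,(y_i - \hat{y}_i)\right)
\end{aligned}
```

The factor in parentheses is the ordinary MSE derivative. Under this assumption, the LMSE slope is therefore the MSE slope scaled by 1/(1 − (y_i − ŷ_i)²), a factor that is at least 1 and grows as |y_i − ŷ_i| approaches 1.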

FIG. 3(B) visualizes the slope of the LMSE loss.

Comparing FIG. 3(A) with FIG. 3(B) shows that the LMSE function has a steeper slope as the gap between the output data for learning and the label y increases.

This property indicates that the LMSE function can perform more stable optimization than the MSE function.
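
The comparison in FIG. 3 can be checked numerically. The sketch below assumes the per-sample form −log(1 − (y − ŷ)²) for the LMSE (not the confirmed Equation 1) and a single label y = 1.0. It compares the analytic slopes of MSE and the assumed LMSE at growing gaps, and it verifies the LMSE slope with a central finite difference. The function names are illustrative.

```python
import math

def mse_grad(y, y_hat):
    """d/dy_hat of (y - y_hat)^2."""
    return -2.0 * (y - y_hat)

def lmse(y, y_hat):
    """Hypothetical per-sample LMSE: -log(1 - (y - y_hat)^2)."""
    return -math.log(1.0 - (y - y_hat) ** 2)

def lmse_grad(y, y_hat):
    """d/dy_hat of -log(1 - (y - y_hat)^2)."""
    e = y - y_hat
    return -2.0 * e / (1.0 - e ** 2)

def numeric_grad(f, y, y_hat, h=1e-6):
    """Central finite-difference estimate of df/dy_hat."""
    return (f(y, y_hat + h) - f(y, y_hat - h)) / (2.0 * h)

y = 1.0
for gap in (0.1, 0.3, 0.5, 0.7, 0.9):
    y_hat = y - gap
    g_mse, g_lmse = mse_grad(y, y_hat), lmse_grad(y, y_hat)
    # The analytic slope must agree with the finite-difference estimate.
    assert abs(g_lmse - numeric_grad(lmse, y, y_hat)) < 1e-4
    print(f"gap={gap:.1f}  |MSE slope|={abs(g_mse):.3f}  "
          f"|LMSE slope|={abs(g_lmse):.3f}  ratio={g_lmse / g_mse:.3f}")
```

Under this assumption the ratio of the two slopes is 1/(1 − gap²). That ratio stays close to 1 for small gaps and rises quickly as the gap approaches 1, which is the qualitative behaviour described for FIG. 3.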

FIG. 4 is an exemplary diagram of a hardware system for implementing an apparatus for generating a learning model using a log scaling loss function according to an embodiment of the present invention.

As shown in FIG. 4, a hardware system 2000 according to an embodiment of the present invention may include a processor unit 2100, a memory interface unit 2200, and a peripheral device interface unit 2300.

Each component of the hardware system 2000 may be a discrete part or may be integrated into one or more integrated circuits. The components may be coupled by a bus system (not shown).

Here, the bus system is an abstraction representing any one or more separate physical buses, communication lines/interfaces, and/or multi-drop or point-to-point connections, connected by appropriate bridges, adapters, and/or controllers.

The processor unit 2100 communicates with the memory unit 2210 through the memory interface unit 2200 and executes the various software modules stored in the memory unit 2210 to perform various functions in the hardware system.

Here, each of the components described above with reference to FIG. 2, namely the learning model LM, the data processing unit 100, the loss derivation unit 200, and the optimization unit 300, may be stored in the memory unit 2210 as a software module. An operating system (OS) may also be stored there. These components may be loaded into and executed by the processor unit 2100.

Each of the components described above, including the learning model LM, the data processing unit 100, the loss derivation unit 200, and the optimization unit 300, may be implemented as a software module executed by a processor, as a hardware module, or as a combination of software and hardware modules.

A software module executed by a processor, a hardware module, or a combination of the two may thus be implemented as an actual hardware system (e.g., a computer system).

The operating system (e.g., iOS, Android, Darwin, RTXC, LINUX, UNIX, OS X, WINDOWS, or an embedded operating system such as VxWorks) includes various procedures, instruction sets, software components, and/or drivers that control and manage general system tasks (e.g., memory management, storage device control, and power management). It also facilitates communication between the various hardware and software modules.

For reference, the memory unit 2210 may include a memory hierarchy comprising, but not limited to, a cache, main memory, and secondary memory. Such a memory hierarchy may be implemented by any combination of, for example, RAM (e.g., SRAM, DRAM, DDRAM), ROM, FLASH, and magnetic and/or optical storage devices (e.g., disk drives, magnetic tape, compact discs (CDs), and digital video discs (DVDs)).

The peripheral device interface unit 2300 enables communication between the processor unit 2100 and peripheral devices.

The peripheral devices provide different functions to the hardware system 2000. In one embodiment of the present invention, they may include, for example, a communication unit 2310.

The communication unit 2310 provides a communication function with other devices. To this end, it may include well-known circuitry that performs this function, including, but not limited to, an antenna system, an RF transceiver, one or more amplifiers, a tuner, one or more oscillators, a digital signal processor, a codec (CODEC) chipset, and memory.

Communication protocols supported by the communication unit 2310 may include, for example:

- Wireless LAN (WLAN), Digital Living Network Alliance (DLNA), Wireless Broadband (WiBro), and Worldwide Interoperability for Microwave Access (WiMAX).
- Global System for Mobile Communications (GSM), Code Division Multiple Access (CDMA), CDMA2000, Enhanced Voice-Data Optimized or Enhanced Voice-Data Only (EV-DO), Wideband CDMA (WCDMA), High Speed Downlink Packet Access (HSDPA), High Speed Uplink Packet Access (HSUPA), and IEEE 802.16.
- Long Term Evolution (LTE), LTE-Advanced (LTE-A), 5G communication systems, and Wireless Mobile Broadband Service (WMBS).
- Bluetooth, Radio Frequency Identification (RFID), Infrared Data Association (IrDA), Ultra-Wideband (UWB), ZigBee, Near Field Communication (NFC), Ultra Sound Communication (USC), Visible Light Communication (VLC), Wi-Fi, and Wi-Fi Direct.

Wired communication networks may include wired Local Area Network (LAN), wired Wide Area Network (WAN), Power Line Communication (PLC), USB communication, Ethernet, serial communication, and optical/coaxial cable. The supported protocols are not limited to these; any protocol capable of providing a communication environment with other devices may be included.

In the hardware system 2000 according to an embodiment of the present invention, each component stored in the memory unit 2210 as a software module runs as instructions executed by the processor unit 2100. Each such component interfaces with the communication unit 2310 via the memory interface unit 2200 and the peripheral device interface unit 2300.

As described above, this specification contains many specific implementation details. These should not be construed as limiting the scope of any invention or of what may be claimed, but rather as descriptions of features that may be specific to particular embodiments of particular inventions. Certain features described in this specification in the context of separate embodiments may also be implemented in combination in a single embodiment. Conversely, various features described in the context of a single embodiment may also be implemented in multiple embodiments, individually or in any suitable subcombination. Furthermore, features may be described as operating in certain combinations and may even be initially claimed as such. Nevertheless, one or more features from a claimed combination may in some cases be excluded from that combination, and the claimed combination may be directed to a subcombination or a variation of a subcombination.

Similarly, operations are depicted in the drawings in a particular order. This should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results. In certain cases, multitasking and parallel processing may be advantageous. Furthermore, the separation of various system components in the embodiments described above should not be understood as requiring such separation in all embodiments. The described program components and systems may generally be integrated together into a single software product or packaged into multiple software products.

Specific embodiments of the subject matter described herein have been described. Other embodiments are within the scope of the following claims. For example, the operations recited in the claims may be performed in a different order and still achieve desirable results. As one example, the processes depicted in the accompanying drawings do not necessarily require the particular order shown, or a sequential order, to achieve desirable results. In certain implementations, multitasking and parallel processing may be advantageous.

This description presents the best mode of the invention. It provides examples to illustrate the invention and to enable those skilled in the art to make and use it. The specification as written does not limit the invention to the specific terms presented. Therefore, although the present invention has been described in detail with reference to the above examples, those skilled in the art may make alterations, changes, and modifications to these examples without departing from the scope of the present invention.

Therefore, the scope of the present invention should be defined not by the described embodiments but by the claims.

The present invention relates to an apparatus and method for generating a learning model using a log scaling loss function. Performing gradient-descent optimization with the log mean square error function, which is the loss function of the present invention, allows the optimization to be carried out stably. The present invention therefore has industrial applicability: it has sufficient potential for commercialization and can clearly be practiced in reality.

10: learning device
100: data processing unit
200: loss derivation unit
300: optimization unit

Claims

calculating, by a loss derivation unit, when the learning model calculates output data for learning from the input data for learning, a loss representing the difference between the output data for learning and the label, using a log mean square error function obtained by converting the mean square error function to a logarithmic scale; and
performing, by an optimization unit, optimization that modifies the weights of the learning model so that the loss is minimized;
characterized in that it includes
A method for creating a learning model.

According to claim 1,
wherein, in the step of calculating the loss, the loss is calculated using the log mean square error function

where y is the label
and ŷ is the output data for learning.
A method for creating a learning model.

According to claim 2,
wherein the log mean square error function is generated by
converting the structure of the mean square error function into a logarithmic function of negative form,
applying the form 1 − X,
and substituting the corresponding term for X.
A method for creating a learning model.

calculating, by the learning model, output data for learning from the input data for learning through a plurality of operations that apply weights between a plurality of layers whose training has not been completed;
calculating, by a loss derivation unit, a loss representing the difference between the output data for learning and the label, using a log mean square error function obtained by converting the mean square error function to a logarithmic scale; and
performing, by an optimization unit, optimization that modifies the weights of the learning model so that the loss is minimized;
characterized in that it includes
A method for creating a learning model.

According to claim 4,
wherein, in the step of calculating the loss, the loss is calculated using the log mean square error function

where y is the label
and ŷ is the output data for learning.
A method for creating a learning model.

According to claim 5,
wherein, if the learning model is a generative network,
the label is the input data for learning, and
if the learning model is a classification network,
the label is a target value indicating the class to which the input data for learning belongs.
A method for creating a learning model.

According to claim 6,
wherein the log mean square error function is generated by
converting the structure of the mean square error function into a logarithmic function of negative form,
applying the form 1 − X,
and substituting the corresponding term for X.
A method for creating a learning model.

a loss derivation unit that, when the learning model calculates output data for learning from the input data for learning, calculates a loss representing the difference between the output data for learning and the label, using a log mean square error function obtained by converting the mean square error function to a logarithmic scale; and
an optimization unit that performs optimization to modify the weight of the learning model so that the loss is minimized;
characterized in that it includes
A device for generating a learning model.

According to claim 8,
wherein the loss derivation unit
calculates the loss using the log mean square error function

where y is the label
and ŷ is the output data for learning.
A device for generating a learning model.

According to claim 8,
wherein the log mean square error function is generated by
converting the structure of the mean square error function into a logarithmic function of negative form,
applying the form 1 − X,
and substituting the corresponding term for X.
A device for generating a learning model.

a data processing unit that prepares learning data including input data for learning and a label corresponding to the input data for learning, and inputs the input data for learning to a learning model in which learning has not been completed;
when the learning model calculates output data for learning from the input data for learning through a plurality of operations that apply weights between a plurality of layers whose training has not been completed,
a loss derivation unit that calculates a loss representing the difference between the output data for learning and the label, using a log mean square error function obtained by converting the mean square error function to a logarithmic scale; and
an optimization unit that performs optimization to modify the weight of the learning model so that the loss is minimized;
characterized in that it includes
A device for generating a learning model.

According to claim 11,
wherein the loss derivation unit
calculates the loss using the log mean square error function

where y is the label
and ŷ is the output data for learning.
A device for generating a learning model.

According to claim 12,
wherein, if the learning model is a generative network,
the label is the input data for learning, and
if the learning model is a classification network,
the label is a target value indicating the class to which the input data for learning belongs.
A device for generating a learning model.

According to claim 12,
wherein the log mean square error function is generated by
converting the structure of the mean square error function into a logarithmic function of negative form,
applying the form 1 − X,
and substituting the corresponding term for X.
A device for generating a learning model.