KR20220009682A

KR20220009682A - Method and system for distributed machine learning

Info

Publication number: KR20220009682A
Application number: KR1020200088207A
Authority: KR
Inventors: 윤세영; 김성윤; 김상묵
Original assignee: 한국전력공사; 한국과학기술원
Priority date: 2020-07-16
Filing date: 2020-07-16
Publication date: 2022-01-25

Abstract

The present invention relates to a method and system for distributed machine learning. The method comprises the operations of: transmitting, by a central server, a random seed; receiving, by a plurality of learning machines, the random seed and generating a random matrix based on the received random seed; generating, by the plurality of learning machines, local gradient parameters based on the training data assigned to them; generating, by the plurality of learning machines, compressed information by compressing the generated local gradient parameters and setting parameters using the random matrix; transmitting, by the plurality of learning machines, the compressed information to the central server; generating, by the central server, an average compressed gradient parameter by aggregating the compressed information received from the plurality of learning machines; transmitting, by the central server, the average compressed gradient parameter to the plurality of learning machines; decompressing, by the plurality of learning machines, the received average compressed gradient parameter using the transpose of the random matrix to obtain an average gradient parameter; and updating, by the plurality of learning machines, the parameters of an artificial intelligence model based on the obtained average gradient parameter. As a result, the time required to train on a very large amount of training data can be shortened.

Description

Distributed machine learning method and system {METHOD AND SYSTEM FOR DISTRIBUTED MACHINE LEARNING}

Various embodiments relate to a distributed machine learning method and system.

Artificial intelligence can be defined as technology that enables a computing device to perform intelligent operations such as human-like learning, reasoning, perception, and natural language understanding.

As the performance of computing devices and the level of application development for them gradually improve, the level of artificial intelligence implemented on computing devices is also gradually improving.

Recently, artificial intelligence has been implemented using neural networks modeled on the human brain, together with techniques that perform reinforcement learning using that structure (for example, deep learning techniques).

The performance of artificial intelligence can be improved through training, and training requires a wide variety of training data.

However, developing a deep learning model with large volumes of data has the limitation that training takes a long time. To address this, distributed machine learning has recently been attracting attention.

Distributed processing of deep learning training uses data parallelism and model parallelism techniques.

In data parallelism, multiple computers divide the input data set to be learned among themselves and each trains on its own portion. In model parallelism, the deep learning model itself is partitioned, and multiple computers each train one of the partitioned sub-models.
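The data-parallel split described above can be sketched as follows. This is a minimal illustration only: the function name `split_into_shards` and the round-robin assignment are assumptions made for the example, not something the specification prescribes.

```python
def split_into_shards(samples, num_workers):
    """Partition samples into num_workers disjoint, near-equal shards."""
    if num_workers <= 0:
        raise ValueError("num_workers must be positive")
    # Worker w takes every num_workers-th sample starting at index w,
    # so no sample is assigned to more than one worker.
    return [samples[w::num_workers] for w in range(num_workers)]


shards = split_into_shards(list(range(10)), 3)
print(shards)  # [[0, 3, 6, 9], [1, 4, 7], [2, 5, 8]]
```

Any other partition would serve equally well, as long as the shards do not overlap and together cover the whole data set.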

In particular, because data parallelism has each of a plurality of learning machines (worker machines) train on only a portion of the entire training data, the data must be stored in an external storage device connected to the network, and training must be performed over the network.

In this case, each learning machine must frequently access the external storage device every time training proceeds, so a bottleneck that degrades system performance or capacity causes the training speed to drop markedly.

In addition, for very large data sets, the entire data set cannot be learned in certain distributed environments, which limits the efficiency of training.

The present invention has been devised to solve the above problems, and an object of the present invention is to provide a distributed machine learning method and system that performs more efficient training by reducing the storage required and the variance arising in distributed machine learning.

The technical problems to be solved in this document are not limited to those mentioned above, and other technical problems not mentioned will be clearly understood from the description below by those of ordinary skill in the art to which the present invention belongs.

According to various embodiments of the present invention, a distributed machine learning method for an artificial intelligence neural network includes: transmitting, by a central server, a random seed; receiving, by a plurality of learning machines, the random seed and generating a random matrix based on the received random seed; generating, by the plurality of learning machines, local gradient parameters based on the training data assigned to them; generating, by the plurality of learning machines, compressed information by compressing the generated local gradient parameters and setting parameters using the random matrix; transmitting, by the plurality of learning machines, the compressed information to the central server; generating, by the central server, an average compressed gradient parameter by aggregating the compressed information received from the plurality of learning machines; transmitting, by the central server, the average compressed gradient parameter to the plurality of learning machines; decompressing, by the plurality of learning machines, the received average compressed gradient parameter using the transpose of the random matrix to obtain an average gradient parameter; and updating, by the plurality of learning machines, the parameters of an artificial intelligence model based on the obtained average gradient parameter.

The method further includes updating, by the plurality of learning machines, the setting parameter using the average gradient parameter.

In addition, generating the compressed information by the plurality of learning machines includes multiplying the difference between the local gradient parameter and the setting parameter by the random matrix to generate the compressed information.

In addition, obtaining the average gradient parameter by the plurality of learning machines includes adding the setting parameter to the product of the average compressed gradient parameter and the transpose of the random matrix to obtain the average gradient parameter.
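The compression, averaging, and decompression operations summarized above can be written compactly as follows. The symbols are assumptions introduced only for this illustration, since the original notation is not reproduced in this text: $g_i \in \mathbb{R}^d$ is the local gradient parameter of learning machine $i$, $h_i \in \mathbb{R}^d$ its setting parameter, $R \in \mathbb{R}^{k \times d}$ the shared random matrix with $k < d$, and $n$ the number of learning machines.

```latex
\begin{aligned}
c_i       &= R\,(g_i - h_i)                        && \text{(compression on learning machine } i\text{)} \\
\bar{c}   &= \frac{1}{n}\sum_{i=1}^{n} c_i         && \text{(averaging on the central server)} \\
\hat{g}_i &= R^{\top}\bar{c} + h_i                 && \text{(decompression on learning machine } i\text{)}
\end{aligned}
```

Under these assumptions each machine transmits a $k$-dimensional vector instead of a $d$-dimensional one, which is where the reduction in traffic and server storage would come from.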

In addition, a computer-readable storage medium according to another embodiment of the present invention for solving the above technical problem may have recorded on it a program that, when executed on a computer, performs the distributed machine learning method described above.

In addition, in a distributed machine learning system according to yet another embodiment of the present invention for solving the above technical problem, a central server transmits a random seed; a plurality of learning machines receive the random seed and generate a random matrix based on the received random seed; the plurality of learning machines generate local gradient parameters based on the training data assigned to them; the plurality of learning machines generate compressed information by compressing the generated local gradient parameters and setting parameters using the random matrix; the plurality of learning machines transmit the compressed information to the central server; the central server generates an average compressed gradient parameter by aggregating the compressed information received from the plurality of learning machines; the central server transmits the average compressed gradient parameter to the plurality of learning machines; the plurality of learning machines decompress the received average compressed gradient parameter using the transpose of the random matrix to obtain an average gradient parameter; and the plurality of learning machines update the parameters of an artificial intelligence model based on the obtained average gradient parameter.

In addition, the plurality of learning machines update the setting parameter using the average gradient parameter.

In addition, the plurality of learning machines generate the compressed information by multiplying the difference between the local gradient parameter and the setting parameter by the random matrix.

In addition, the plurality of learning machines obtain the average gradient parameter by adding the setting parameter to the product of the average compressed gradient parameter and the transpose of the random matrix.
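The decompression step just described (multiply by the transpose of the random matrix, then add the setting parameter) might look like the following sketch. The function name `decompress`, the list-of-lists matrix layout, and the example values are all assumptions for illustration; the specification does not give an implementation.

```python
def decompress(avg_compressed, R, setting):
    """Return R^T @ avg_compressed + setting for a k x d matrix R."""
    k, d = len(R), len(R[0])
    if len(avg_compressed) != k or len(setting) != d:
        raise ValueError("dimension mismatch")
    # Entry j of R^T c is the dot product of column j of R with c.
    return [
        sum(R[i][j] * avg_compressed[i] for i in range(k)) + setting[j]
        for j in range(d)
    ]


# Hypothetical 2 x 3 matrix: a 3-dimensional gradient compressed to 2 values.
R = [[1.0, 0.0, 2.0],
     [0.0, 1.0, -1.0]]
avg_gradient = decompress([0.5, 2.0], R, [0.1, 0.2, 0.3])
print(avg_gradient)
```

Note that when the average compressed vector is all zeros, the result is exactly the setting parameter, which is consistent with the setting parameter acting as a reference point for the gradient.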

The present invention has the effect of shortening the training time for very large training data sets and of lowering communication costs by reducing the amount of network traffic.

In addition, the present invention has the effect of improving training performance by reducing the storage required and the variance arising in distributed machine learning.

The effects obtainable from the present disclosure are not limited to those mentioned above, and other effects not mentioned will be clearly understood from the description below by those of ordinary skill in the art to which the present disclosure belongs.

FIG. 1 is a schematic diagram of a system 1000 for distributed learning of large-capacity training data according to an embodiment of the present invention.
FIG. 2 is a flowchart of a method for updating parameters in distributed machine learning according to an embodiment of the present invention.
FIGS. 3 to 5 are exemplary diagrams illustrating a process of updating parameters in distributed machine learning according to an embodiment of the present invention.

Hereinafter, various embodiments are described in detail with reference to the accompanying drawings.

The same or similar components are given the same reference numerals regardless of the figure in which they appear, and redundant descriptions of them may be omitted.

The suffixes 'module' and 'unit' for components used in the following description are assigned or used interchangeably only for ease of drafting the specification, and do not in themselves have distinct meanings or roles. In addition, a 'module' or 'unit' refers to a software component or a hardware component such as a field programmable gate array (FPGA) or an application specific integrated circuit (ASIC), but is not limited to software or hardware. A 'unit' or 'module' may be configured to reside on an addressable storage medium or may be configured to run on one or more processors. Thus, as an example, a 'unit' or 'module' may include components such as software components, object-oriented software components, class components, and task components, as well as processes, functions, attributes, procedures, subroutines, segments of program code, drivers, firmware, microcode, circuitry, data, databases, data structures, tables, arrays, and variables. The functionality provided within components, 'units', or 'modules' may be combined into a smaller number of components, 'units', or 'modules', or further separated into additional components, 'units', or 'modules'.

The steps of a method or algorithm described in connection with some embodiments of the present invention may be implemented directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in RAM, flash memory, ROM, EPROM, EEPROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of recording medium known in the art. An exemplary recording medium is coupled to the processor, such that the processor can read information from, and write information to, the recording medium. Alternatively, the recording medium may be integral with the processor. The processor and the recording medium may reside within an application specific integrated circuit (ASIC). The ASIC may reside within a user terminal.

Terms including ordinal numbers, such as first and second, may be used to describe various components, but the components are not limited by these terms. The terms are used only to distinguish one component from another.

When a component is referred to as being 'connected' or 'coupled' to another component, it may be directly connected or coupled to that other component, but it should be understood that other components may exist in between. In contrast, when a component is referred to as being 'directly connected' or 'directly coupled' to another component, it should be understood that no other component exists in between.

First, the terms used in this specification are briefly described.

Artificial intelligence is a computer system or device endowed with human-like intelligence, and may refer to human intelligence artificially implemented in a machine or the like. Artificial intelligence also refers to the field of science that studies methodologies for creating intelligence and their feasibility.

An artificial neural network is a model or algorithm that implements artificial intelligence. It is a statistical learning algorithm in machine learning modeled on biological neural networks, in which artificial neurons that form a network through synaptic connections change the strength of those connections through learning and thereby acquire problem-solving ability.

An artificial neural network may include an input layer, an output layer, and one or more hidden layers. Each layer of the artificial neural network includes a plurality of nodes corresponding to the neurons of a neural network, and the nodes of one layer may be connected to the nodes of another layer by synapses. In one embodiment, an artificial neural network in which every node of each layer is connected by a synapse to every node of the next layer may be referred to as a fully connected artificial neural network.

In the artificial neural network, each node may receive input signals through its synapses and generate an output value by applying an activation function to the weights and bias associated with those input signals.
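As a small illustration of how a single node produces its output, the sketch below forms a weighted sum of the inputs, adds a bias, and applies an activation function. The sigmoid activation and the name `node_output` are assumptions chosen for the example; the specification does not fix a particular activation function.

```python
import math


def node_output(inputs, weights, bias):
    """Weighted sum of the inputs plus a bias, passed through a sigmoid."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + math.exp(-z))


# 0.5 * 1.0 + (-0.25) * 2.0 + 0.0 = 0.0, and sigmoid(0.0) = 0.5
print(node_output([1.0, 2.0], [0.5, -0.25], 0.0))  # 0.5
```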

A deep neural network may refer collectively to artificial neural networks that include a plurality of hidden layers between the input layer and the output layer. A deep neural network can model complex nonlinear relationships and can take various structures depending on its purpose. Examples of deep neural network structures include the convolutional neural network (CNN), the recurrent neural network (RNN), and long short-term memory (LSTM).

A convolutional neural network identifies and learns features of structured spatial data such as images, videos, and character strings, and can therefore be effective for classifying and identifying images and videos. A recurrent neural network contains an internal loop, so that learning from past time steps is multiplied by weights and reflected in current learning; its current output is influenced by its outputs at past time steps, and its hidden layer acts as a kind of memory. It can therefore be effective for learning sequential data to perform classification or prediction.

Methods for learning the characteristics and patterns of data from a large amount of data and predicting results include the gradient descent algorithm and the backpropagation algorithm.

The gradient descent algorithm is a first-order optimization algorithm that computes the gradient of a function and repeatedly moves in the downhill direction until an extremum is reached. The backpropagation algorithm is a statistical technique used to train multilayer perceptrons; it obtains information on the error between the answer computed by the neural network and the correct answer, and feeds that error back to adjust the individual weights or parameters that make up the network so as to reduce the error. The error information used by the backpropagation algorithm can be obtained by the gradient descent algorithm.
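The iteration behind gradient descent can be sketched on a one-dimensional example. The objective f(x) = (x - 3)^2, the learning rate, and the step count are illustrative assumptions, not values taken from the specification.

```python
def gradient_descent(grad, x0, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient, starting from x0."""
    x = x0
    for _ in range(steps):
        x -= learning_rate * grad(x)
    return x


# f(x) = (x - 3)^2 has gradient 2 * (x - 3) and its minimum at x = 3.
x_min = gradient_descent(lambda x: 2.0 * (x - 3.0), x0=0.0)
print(x_min)  # very close to 3.0
```

In a neural network, x would be the full parameter vector and the gradient would be computed by backpropagation rather than written by hand.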

Hereinafter, a method of updating the parameters of an artificial neural network by learning a large amount of data with the gradient descent algorithm is described in more detail.

FIG. 1 is a schematic diagram of a system 1000 for distributed learning of large-capacity training data based on data parallelism according to an embodiment of the present invention.

Referring to FIG. 1, the system 1000 for distributed learning of large-capacity training data may include a central server 100 and a plurality of learning machines (worker machines) 200.

The central server 100 may be a server, or a similar device, shared by the plurality of learning machines 200. The plurality of learning machines 200 may be electronic devices that are communicatively connected to the central server 100 through a network and that hold the same artificial intelligence neural network model in order to perform and process distributed learning.

Although FIG. 1 shows the central server 100 and the learning machines 200 as separate components, this is merely an example, and the central server 100 may be included as a component of each of the plurality of learning machines 200.

Each of the plurality of learning machines 200 may include a processor and a communication unit for transmitting data to and receiving data from the central server 100.

For example, each of the plurality of learning machines 200 may generate a local gradient parameter learned from training data allocated to it so as not to overlap with that of the other machines, compress the generated local gradient parameter, and transmit it to the central server 100.

The central server 100 may aggregate and average the local gradient parameters transmitted from the plurality of learning machines 200, and transmit the averaged information so that the plurality of learning machines 200 share it. That is, each of the plurality of learning machines 200 may use the information transmitted from the central server 100 to update the parameters of the artificial intelligence neural network model it holds and is training.

The distributed learning system 1000 presented in this specification uses a setting parameter to reduce the variance that arises in a distributed learning environment because each of the plurality of learning machines 200 uses only a portion of the training data rather than all of it. In addition, because the plurality of learning machines 200 compress their gradient parameters before transmitting them to the central server 100, the amount of transmitted data is reduced, and so is the storage needed at the central server 100, which aggregates the information arriving from the plurality of learning machines 200.

It will be apparent that the plurality of learning machines 200 may be information and communication devices, multimedia devices, and applications thereof, including mobile phones, smartphones, laptop computers, digital broadcasting terminals, personal digital assistants (PDAs), portable multimedia players (PMPs), navigation devices, slate PCs, tablet PCs, ultrabooks, and wearable devices (for example, smart watches, smart glasses, and head mounted displays (HMDs)). That is, the plurality of learning machines 200 may be electronic devices capable of learning large-capacity training data more efficiently.

The components 100 and 200 shown in FIG. 1 are merely one example of the components constituting the system 1000 for distributed learning of large-capacity training data, and it will be apparent that additional components may be added to implement the distributed learning described in this specification.

FIG. 2 is a flowchart of a method for updating the parameters of an artificial neural network model in distributed machine learning according to an embodiment of the present invention. At least some of the operations of FIG. 2 are described below with reference to FIGS. 3 to 5, which are exemplary diagrams illustrating a process of updating parameters in distributed machine learning according to an embodiment of the present invention.

The operations in FIG. 2 may be performed sequentially, but need not be. For example, the order of the operations may be changed, and at least two operations may be performed in parallel.

Referring to FIG. 2, the central server 100 may provide the same random seed to each of the plurality of learning machines 200 (S210).

The plurality of learning machines 200 may generate a random matrix R based on the random seed received in operation S210 (S220). Because they use the same random number sequence and the same random seed, the plurality of learning machines 200 can all generate the same random matrix R.
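The idea that every learning machine can rebuild the same random matrix from a shared seed, so the matrix itself never has to be transmitted, can be sketched as follows. The function name `make_random_matrix`, the Gaussian entries, and the 1/sqrt(k) scaling are assumptions for illustration; the specification does not state which distribution is used.

```python
import random


def make_random_matrix(seed, k, d):
    """Build a k x d matrix deterministically from the given seed."""
    rng = random.Random(seed)      # private generator, seeded identically on every machine
    scale = 1.0 / k ** 0.5         # assumed scaling, so that E[R^T R] is the identity
    return [[rng.gauss(0.0, scale) for _ in range(d)] for _ in range(k)]


# Two machines that received the same seed obtain the same matrix.
R_machine_a = make_random_matrix(seed=1234, k=2, d=5)
R_machine_b = make_random_matrix(seed=1234, k=2, d=5)
print(R_machine_a == R_machine_b)  # True
```

This only holds if all machines run the same generator implementation, which corresponds to the requirement above that they share the same random number sequence.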

Each of the plurality of learning machines 200 may generate a local gradient parameter based on the training data assigned to it (S231). The plurality of learning machines 200 may compress the local gradient parameter using the random matrix R to reduce the data size.

Each of the plurality of learning machines 200 may generate its local gradient parameter by learning the training data assigned to it with the same artificial intelligence neural network model that every machine holds. In this case, the training data may be allocated to the plurality of learning machines 200 so that no data overlaps between machines.

Referring to FIGS. 2 and 3, the plurality of learning machines 200 may generate compressed information by compressing the local gradient parameter and the setting parameter using the random matrix R generated in operation S220 (S232).

For example, the compressed information may refer to information whose data volume has been reduced so that, even when the central server 100 aggregates the parameter information of each of the plurality of learning machines 200, the size or amount of that information does not grow. That is, by generating compressed information, each of the plurality of learning machines 200 may reduce the storage required in distributed learning. The compressed information may be generated by Equation 1 below.

Referring to Equation 1 and FIG. 3, the plurality of learning machines 200 may obtain the compressed information as the output of a function that multiplies the random matrix R by the difference between the local gradient parameter, obtained from the assigned training data, and the setting parameter.
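The equation image is not reproduced in this text. Based only on the verbal description above, Equation 1 can plausibly be written as follows. The symbols are assumptions introduced here for illustration: g_i for the local gradient parameter of the i-th learning machine, h_i for its setting parameter, c_i for its compressed information, and a k-by-d orientation for R.

```latex
c_i = R\,\bigl(g_i - h_i\bigr),
\qquad R \in \mathbb{R}^{k \times d},\quad g_i,\ h_i \in \mathbb{R}^{d},\quad c_i \in \mathbb{R}^{k}
```

Under this assumed orientation, choosing k smaller than d is what reduces the amount of data each machine transmits.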

In one embodiment, the learning machines 200 may all use the same setting parameter, or each may use a different setting parameter. The setting parameter may mean any setting parameter used to reduce variance in distributed learning.

The plurality of learning machines 200 may transmit the compressed information obtained through operation S232 to the central server 100 (S233). In this case, the compressed information transmitted from the plurality of learning machines 200 may have a size that the central server 100 can accommodate.

The central server 100 may aggregate the compressed information received through operation S233 to generate an average compressed gradient parameter (S234).

The average compressed gradient parameter is the mean of the compressed information transmitted from the plurality of (for example, n) learning machines 200, and may be calculated using the ordinary arithmetic-mean formula. The central server 100 may calculate the average compressed gradient parameter according to Equation 2.

Referring to Equation 2, the central server 100 may calculate the average compressed gradient parameter by summing the compressed information received from the plurality of learning machines 200 and dividing the sum by the total number of learning machines 200.
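The server-side averaging can be sketched as follows. The function name `average_compressed`, the toy dimensions, and the use of a single shared setting parameter `h` are assumptions for illustration and are not taken from the specification.

```python
import numpy as np

def average_compressed(compressed):
    """Server side (S234): sum the n compressed vectors and divide by n."""
    return np.sum(compressed, axis=0) / len(compressed)

rng = np.random.default_rng(0)
n, k, d = 4, 3, 8                      # machines, compressed size, gradient size
R = rng.standard_normal((k, d))        # stand-in for the shared random matrix
h = rng.standard_normal(d)             # one setting parameter shared by all machines
grads = [rng.standard_normal(d) for _ in range(n)]

# Each machine compresses the difference between its gradient and h (S232).
compressed = [R @ (g - h) for g in grads]
avg_c = average_compressed(compressed)

# Compression is linear, so averaging the compressed vectors gives the same
# result as compressing the average gradient.
same = np.allclose(avg_c, R @ (np.mean(grads, axis=0) - h))
print(avg_c.shape, same)
```

Because the server only ever handles vectors of length k, the volume of data it receives and sends back scales with the compressed size rather than with the model size.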

The central server 100 may transmit the average compressed gradient parameter generated through operation S234 to the plurality of learning machines 200 (S235). That is, the plurality of learning machines 200 may share the same average compressed gradient parameter.

The plurality of learning machines 200 may decompress the average compressed gradient parameter received through operation S235 (S236) to obtain an average gradient parameter (S237).

The plurality of learning machines 200 may decompress the average compressed gradient parameter using the transpose of the random matrix R, thereby obtaining an average gradient parameter that has the original, pre-compression size.

The transpose of the random matrix R is the matrix obtained by exchanging the rows and columns of the random matrix R. The average gradient parameter may be calculated by Equation 3.

Referring to Equation 3, the plurality of learning machines 200 may decompress the average compressed gradient by multiplying it by the transpose of the random matrix R, and may then add the setting parameter to obtain the average gradient parameter. Here, the product of the random matrix R and the transpose of the random matrix may be defined as the identity matrix.
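A minimal sketch of the decompression step follows, assuming a k-by-d matrix R with orthonormal rows so that the product of R and its transpose is the identity matrix. The helper names `orthonormal_rows` and `decompress` and the toy values are hypothetical.

```python
import numpy as np

def orthonormal_rows(seed: int, k: int, d: int) -> np.ndarray:
    """Hypothetical helper: k x d matrix whose rows are orthonormal."""
    rng = np.random.default_rng(seed)
    q, _ = np.linalg.qr(rng.standard_normal((d, k)))
    return q.T

def decompress(avg_c: np.ndarray, R: np.ndarray, h: np.ndarray) -> np.ndarray:
    """S236-S237: multiply by the transpose of R, then add the setting parameter."""
    return R.T @ avg_c + h

k, d = 3, 8
R = orthonormal_rows(seed=42, k=k, d=d)
h = np.ones(d)                          # toy setting parameter
g_bar = np.arange(d, dtype=float)       # toy "true" average gradient
avg_c = R @ (g_bar - h)                 # what the server would send back
restored = decompress(avg_c, R, h)

print(np.allclose(R @ R.T, np.eye(k)))  # R times its transpose is the identity
print(restored.shape)                   # back to the original size d
```

When k is smaller than d, the transpose of R times R is not the identity, so `restored` is in general an approximation of the true average gradient: it agrees with it exactly only along the k directions spanned by the rows of R. The setting parameter supplies the remaining part of the estimate.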

The plurality of learning machines 200 may update the parameters of the artificial intelligence model based on the average gradient parameter obtained through operation S237 (S238).

For example, the plurality of learning machines 200 may each update the parameters of their artificial intelligence model as distributed learning proceeds, and the updated parameters may be defined by Equation 4.

Equation 4 includes a learning rate, which may preferably be set to a value of 0.01 to 0.02.
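The image of Equation 4 is not reproduced in this text. A conventional gradient-descent update that is consistent with the description would read as follows. The symbols are assumptions: w_t for the model parameters at iteration t, \eta for the learning rate, and \bar{g}_t for the average gradient parameter obtained in operation S237.

```latex
w_{t+1} = w_t - \eta\,\bar{g}_t, \qquad 0.01 \le \eta \le 0.02 \ \text{(preferred range stated in the text)}
```

Since every learning machine holds the same average gradient parameter, applying this update keeps the model copies on all machines identical after each iteration.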

In one embodiment, the plurality of learning machines 200 may update the setting parameter using a moving average. A moving average fixes the number of samples to be averaged and, as time passes, drops the oldest data and adds new data before averaging, which can suppress abrupt fluctuations in the data.

That is, after the parameters of the artificial intelligence model have been updated, the plurality of learning machines 200 may update the setting parameter by applying the moving average to the average gradient parameter. The setting parameter updated using the moving average may be calculated by Equation 5.

Equation 5 includes a compression ratio, which may be defined as the number of columns of the random matrix R divided by the number of rows.

In an embodiment, operations S231 to S238 may be performed repeatedly each time a training iteration of the artificial intelligence model proceeds.
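One pass through operations S231 to S238 can be sketched end to end on a toy problem. Everything below is illustrative: the quadratic loss, the function names, and in particular the exponential moving average with weight `beta` used to refresh the setting parameter are assumptions, since the exact form of Equation 5 is not reproduced in this text.

```python
import numpy as np

def orthonormal_rows(seed, k, d):
    """Hypothetical helper: k x d matrix with orthonormal rows from a shared seed."""
    rng = np.random.default_rng(seed)
    q, _ = np.linalg.qr(rng.standard_normal((d, k)))
    return q.T

def run_round(w, h, R, targets, lr=0.01, beta=0.5):
    """One iteration of S231-S238 for the toy loss 0.5 * ||w - target_i||^2."""
    grads = [w - t for t in targets]                       # S231: local gradients
    compressed = [R @ (g - h) for g in grads]              # S232-S233: compress, send
    avg_c = np.sum(compressed, axis=0) / len(compressed)   # S234-S235: average, send back
    g_bar = R.T @ avg_c + h                                # S236-S237: decompress
    w_new = w - lr * g_bar                                 # S238: model update
    h_new = (1.0 - beta) * h + beta * g_bar                # assumed moving average, not Equation 5
    return w_new, h_new

d = 6
targets = [np.full(d, float(i)) for i in range(4)]  # per-machine optima; their mean is 1.5
R = orthonormal_rows(seed=1, k=d, d=d)              # k == d here, so nothing is lost
w, h = np.zeros(d), np.zeros(d)
w, h = run_round(w, h, R, targets, lr=0.01)
print(w[0], h[0])
```

With k equal to d the decompressed gradient equals the exact average gradient, which makes the arithmetic easy to check by hand; choosing k smaller than d trades that exactness for less network traffic.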

In the present invention, the parameter update method and system for distributed machine learning described above can be used to predict power demand more accurately, which in turn makes it possible to establish a more stable and efficient power supply strategy.

As described above, the present invention has the effect of shortening the training time for very large volumes of training data and of lowering communication cost by reducing network traffic.

Meanwhile, the various embodiments described herein may be implemented by hardware, middleware, microcode, software, and/or a combination thereof. For example, various embodiments may be implemented within one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), processors, controllers, microcontrollers, microprocessors, other electronic units designed to perform the functions presented herein, or a combination thereof.

Also, for example, the various embodiments may be embodied in or encoded on a computer-readable medium comprising instructions. Instructions embodied in or encoded on a computer-readable medium may cause a programmable processor or other processor to perform a method, for example when the instructions are executed. A storage medium may be any available medium that can be accessed by a computer. For example, such a computer-readable medium may include RAM, ROM, EEPROM, CD-ROM or other optical disk storage media, magnetic disk storage media or other magnetic storage devices, or any other medium that can be used to store desired program code in the form of instructions or data structures accessible by a computer.

Such hardware, software, firmware, and the like may be implemented within the same device or within separate devices to support the various operations and functions described herein. Additionally, the components, units, modules, and the like described as "~ unit" in the present invention may be implemented together, or individually as separate but interoperable logic devices. The depiction of different features of modules, units, and the like is intended to emphasize different functional embodiments, and does not necessarily mean that they must be realized by separate hardware or software components. Rather, the functionality associated with one or more modules or units may be performed by separate hardware or software components or integrated within common or separate hardware or software components.

Although operations are shown in the drawings in a particular order, this should not be understood to mean that the operations must be performed in the particular order shown or in sequential order, or that all of the illustrated operations must be performed, to achieve a desired result. In certain circumstances, multitasking and parallel processing may be advantageous. Moreover, the division of various components in the embodiments described above should not be understood as requiring such division in all embodiments, and it should be understood that the described components may generally be integrated together into a single software product or packaged into multiple software products.

As described above, preferred embodiments have been disclosed in the drawings and the specification. Although specific terms are used herein, they are used only to describe the present invention and are not used to limit their meaning or the scope of the present invention set forth in the claims. Therefore, those of ordinary skill in the art will understand that various modifications and equivalent other embodiments are possible. Accordingly, the true technical scope of protection of the present invention should be determined by the technical spirit of the appended claims.

100: central server 200: learning machine

Claims

transmitting, by a central server, a random seed;
receiving, by a plurality of learning machines, the random seed, and generating a random matrix based on the received random seed;
generating, by the plurality of learning machines, a local gradient parameter based on training data assigned to them;
generating compressed information by compressing, by the plurality of learning machines, the generated local gradient parameter and setting parameter using the random matrix;
sending, by the plurality of learning machines, the compressed information to the central server;
aggregating, by the central server, the compressed information received from the plurality of learning machines to generate an average compressed gradient parameter;
sending, by the central server, the average compressed gradient parameter to the plurality of learning machines;
decompressing, by the plurality of learning machines, the received average compressed gradient parameter using the transpose of the random matrix to obtain an average gradient parameter; and
updating, by the plurality of learning machines, the parameters of the artificial intelligence model based on the obtained average gradient parameter.

The method of claim 1,
The distributed machine learning method of an artificial intelligence neural network further comprising the plurality of learning machines updating the setting parameters by using the average gradient parameters.

The method of claim 1,
The operation of the plurality of learning machines generating the compressed information includes:
generating the compressed information by multiplying the difference between the local gradient parameter and the setting parameter by the random matrix.

The method of claim 1,
The operation of the plurality of learning machines to obtain the average gradient parameter comprises:
obtaining the average gradient parameter by adding the setting parameter to a value obtained by multiplying the average compressed gradient parameter by the transpose of the random matrix.

A computer-readable storage medium comprising a computer program that, when executed on a computer, performs the distributed machine learning method according to any one of claims 1 to 4.

The central server sends a random seed,
a plurality of learning machines receive the random seed, and generate a random matrix based on the received random seed;
The plurality of learning machines generate local gradient parameters based on the training data assigned to them,
The plurality of learning machines compress the generated local gradient parameter and the setting parameter using the random matrix to generate compressed information,
the plurality of learning machines send the compressed information to the central server,
the central server aggregates the compressed information received from the plurality of learning machines to generate an average compressed gradient parameter;
the central server sends the average compressed gradient parameter to the plurality of learning machines;
the plurality of learning machines decompress the received average compressed gradient parameter using the transpose of the random matrix to obtain an average gradient parameter;
A distributed machine learning system of an artificial intelligence neural network, wherein the plurality of learning machines update the parameters of the artificial intelligence model based on the obtained average gradient parameters.

The system of claim 6, wherein the plurality of learning machines,
A distributed machine learning system of an artificial intelligence neural network that updates the setting parameter using the average gradient parameter.

The system of claim 6, wherein the plurality of learning machines,
A distributed machine learning system of an artificial intelligence neural network for generating the compressed information by multiplying a difference between the local gradient parameter and the setting parameter by the random matrix.

The system of claim 6, wherein the plurality of learning machines,
A distributed machine learning system of an artificial intelligence neural network for obtaining the average gradient parameter by adding the setting parameter to a value obtained by multiplying the average compressed gradient parameter by the transpose of the random matrix.