KR20230158735A

KR20230158735A - Apparatus and method for neural network compression using block transform

Info

Publication number: KR20230158735A
Application number: KR1020220058217A
Authority: KR
Inventors: 최종원; 서승모; 조승현; 정승진
Original assignee: 중앙대학교 산학협력단
Priority date: 2022-05-12
Filing date: 2022-05-12
Publication date: 2023-11-21
Also published as: KR102650992B1

Abstract

Disclosed are an apparatus and a method for neural network compression using block transformation, which generate, from a pre-trained deep learning model, a compressed neural network model suited to a target. A neural network compression method using block transformation according to an embodiment may include: a block transformation step of generating one or more transformed neural networks by converting, in an original neural network composed of two or more residual blocks, at least one of the residual blocks into at least one of a bypassing block and a recycling block; a pre-training step of training, using labeled source data, one or more selected transformed neural networks chosen from the transformed neural networks according to a predetermined criterion; and a target adaptation step of training, using unlabeled target data, a screened transformed neural network chosen from the pre-trained selected transformed neural networks on the basis of a regularization score.

Description

Apparatus and method for neural network compression using block transform

The present disclosure relates to an apparatus and a method for neural network compression using block transformation.

Recent artificial neural networks are built on deep learning model structures to achieve high performance, and deep learning models consist of deep layer structures and a very large number of parameters. As a result, deep learning models demand high-performance hardware and consume a large amount of energy for computation.

Meanwhile, neural network models are increasingly deployed on a wide variety of devices. Mobile devices in particular have limited hardware performance and energy, which makes it difficult to run a deep learning model as-is. In addition, because each device addresses a different target, a neural network model optimized for each target and hardware specification is needed.

Korean Registered Patent Publication No. 10-2332490 (2021.12.01)

An object is to provide an apparatus and a method for neural network compression using block transformation that generate, from a pre-trained deep learning model, a compressed neural network model suited to a target.

According to one aspect, a neural network compression method using block transformation may include: a block transformation step of generating one or more transformed neural networks by converting, in an original neural network composed of two or more residual blocks, at least one of the residual blocks into at least one of a bypassing block and a recycling block; a pre-training step of training, using labeled source data, one or more selected transformed neural networks chosen from the transformed neural networks according to a predetermined criterion; and a target adaptation step of training, using unlabeled target data, a screened transformed neural network chosen from the pre-trained selected transformed neural networks on the basis of a regularization score.

The block transformation step may generate, from an original neural network S₀ composed of m residual blocks, the sets S₁ through S_{m-1}, where S₁ is the set of transformed neural networks obtained by transforming one residual block and S_{m-1} is the set obtained by transforming m-1 residual blocks.

The pre-training step may select one transformed neural network from each of the sets S₁ through S_{m-1} and train the m-1 selected transformed neural networks.

The pre-training step may extract, from the labeled source data, one or more positive sample source data whose label values are correctly predicted by all of the one or more transformed neural networks, and may calculate a regularization loss of the selected transformed neural network based on the one or more positive sample source data.

The regularization loss may be calculated based on (i) the difference between the probability distributions predicted for the source data by the original neural network and by the one or more selected transformed neural networks, and (ii) the same difference evaluated on the positive sample source data.

The regularization loss may be calculated by applying label smoothing to the labels of the source data.

The pre-training step may calculate a cross-entropy loss for label smoothing based on the predictions of the original neural network for the original source data and for the source data to which label smoothing has been applied.

The pre-training step may train the one or more selected transformed neural networks based on the regularization loss and the cross-entropy loss.

The regularization score may be calculated based on the difference between the probability distributions predicted for the target data by the original neural network and by the one or more pre-trained selected transformed neural networks.

The target adaptation step may extract, from the target data, positive sample target data for which the original neural network and the screened transformed neural network predict the same value, and may calculate a regularization loss based on (i) the difference between the probability distributions predicted for the target data by the original neural network and by the screened transformed neural network, and (ii) the same difference evaluated on the positive sample target data. The regularization score may be calculated further based on this regularization loss.

The target adaptation step may calculate a cross-entropy loss of the original neural network with respect to synthetic labels generated by clustering the target data according to a predetermined criterion.

The target adaptation step may train the screened transformed neural network based on the regularization loss and the cross-entropy loss.

According to one aspect, a neural network compression apparatus using block transformation may include: a block transformation unit that generates one or more transformed neural networks by converting, in an original neural network composed of two or more residual blocks, at least one of the residual blocks into at least one of a bypassing block and a recycling block; a pre-training unit that trains, using labeled source data, one or more selected transformed neural networks chosen from the transformed neural networks according to a predetermined criterion; and a target adaptation unit that trains, using unlabeled target data, a screened transformed neural network chosen from the pre-trained selected transformed neural networks on the basis of a regularization score.

The block transformation unit may generate, from an original neural network S₀ composed of m residual blocks, the sets S₁ through S_{m-1}, where S₁ is the set of transformed neural networks obtained by transforming one residual block and S_{m-1} is the set obtained by transforming m-1 residual blocks.

The pre-training unit may select one transformed neural network from each of the sets S₁ through S_{m-1} and train the m-1 selected transformed neural networks.

The pre-training unit may extract, from the labeled source data, one or more positive sample source data whose label values are correctly predicted by all of the one or more transformed neural networks, and may calculate a regularization loss of the selected transformed neural network based on the one or more positive sample source data.

The regularization loss may be calculated based on (i) the difference between the probability distributions predicted for the source data by the original neural network and by the one or more selected transformed neural networks, and (ii) the same difference evaluated on the positive sample source data.

The regularization loss may be calculated by applying label smoothing to the labels of the source data. The pre-training unit may calculate a cross-entropy loss for label smoothing based on the predictions of the original neural network for the original source data and for the source data to which label smoothing has been applied, and may train the one or more selected transformed neural networks based on the regularization loss and the cross-entropy loss.

The regularization score may be calculated based on the difference between the probability distributions predicted for the target data by the original neural network and by the one or more pre-trained selected transformed neural networks. The target adaptation unit may extract, from the target data, positive sample target data for which the original neural network and the screened transformed neural network predict the same value, and may calculate a regularization loss based on (i) the difference between the probability distributions predicted for the target data by the original neural network and by the screened transformed neural network, and (ii) the same difference evaluated on the positive sample target data. The regularization score may be calculated further based on this regularization loss.

The target adaptation unit may calculate a cross-entropy loss of the original neural network with respect to synthetic labels generated by clustering the target data according to a predetermined criterion.

The target adaptation unit may train the screened transformed neural network based on the regularization loss and the cross-entropy loss.

A compressed neural network model suited to a target can be generated from a pre-trained deep learning model.

FIG. 1 is a flowchart illustrating a neural network compression method using block transformation according to an embodiment.
FIG. 2 is an exemplary diagram for explaining a block transformation method according to an example.
FIG. 3 is an exemplary diagram for explaining a clustering method according to an example.
FIG. 4 is a block diagram of a neural network compression apparatus using block transformation according to an embodiment.
FIG. 5 is a block diagram illustrating a computing environment including a computing device suitable for use in exemplary embodiments.

Hereinafter, an embodiment of the present invention is described in detail with reference to the accompanying drawings. In describing the present invention, detailed descriptions of related well-known functions or configurations are omitted where they would unnecessarily obscure the gist of the invention. The terms used below are defined in consideration of their functions in the present invention and may vary depending on the intention or practice of a user or operator. Their definitions should therefore be based on the content of this specification as a whole.

Hereinafter, embodiments of the apparatus and method for neural network compression using block transformation are described in detail with reference to the drawings.

FIG. 1 is a flowchart illustrating a neural network compression method using block transformation according to an embodiment.

According to an embodiment, the neural network compression apparatus using block transformation may generate one or more transformed neural networks by converting, in an original neural network composed of two or more residual blocks (hereinafter denoted 0), at least one of the residual blocks into at least one of a bypassing block (hereinafter denoted B) and a recycling block (hereinafter denoted R) (110).

Referring to FIG. 2, the original neural network 200 may be composed of two or more residual blocks 210 and 220, and at least one of the residual blocks 210 and 220 may be converted into at least one of a bypassing block and a recycling block. For example, the residual block 220 may be converted into a bypassing block 221 to form a neural network 201, or into a recycling block 222 to form a neural network 202.

For example, a bypassing block outputs its input data directly as output data without any processing, whereas a recycling block feeds its input data back into the preceding block and outputs the data produced by that preceding block.
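The behavior of the two block types can be illustrated with a toy sketch. The function name `run_network` and the use of plain Python callables as blocks are illustrative assumptions, not part of the disclosure. The sketch also assumes that a recycling block reuses the nearest preceding block that was actually executed, which the text does not specify for the case where the preceding block is itself bypassed.

```python
from typing import Callable, List

Block = Callable[[float], float]


def run_network(blocks: List[Block], code: str, x: float) -> float:
    """Evaluate a network described by a code such as '00', '0B' or '0R'.

    '0' runs the block at that position, 'B' passes the input through
    unchanged, and 'R' re-applies the nearest preceding executed block
    (an assumption made for this sketch).
    """
    last_active = None  # most recent block that actually ran
    for block, kind in zip(blocks, code):
        if kind == "0":
            x = block(x)
            last_active = block
        elif kind == "B":
            pass  # bypassing block: output equals input
        elif kind == "R":
            if last_active is None:
                raise ValueError("a recycling block needs a preceding block")
            x = last_active(x)  # recycling block: reuse the preceding block
        else:
            raise ValueError(f"unknown block kind: {kind}")
    return x


# Toy stand-ins for two trained residual blocks.
toy_blocks = [lambda v: v + 1.0, lambda v: v * 2.0]
```

With these toy blocks, the code `'0R'` never evaluates the second block, which is where the parameter saving of a recycling block would come from.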

According to an embodiment, the neural network compression apparatus using block transformation may generate, from an original neural network S₀ composed of m residual blocks, the sets S₁ through S_{m-1}, where S₁ is the set of transformed neural networks obtained by transforming one residual block and S_{m-1} is the set obtained by transforming m-1 residual blocks.

For example, when the original neural network S₀ consists of three residual blocks, it may be expressed as S₀={<000>}, and the set S₁ of transformed neural networks obtained by transforming one residual block may be generated as S₁={<00R>, <0R0>, <00B>, <0B0>, <B00>}. As another example, because bypassing blocks and recycling blocks must use the result of the preceding block, the frontmost block may be configured not to be transformed. In that case, the set may be generated as S₁={<00R>, <0R0>, <00B>, <0B0>}.
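The enumeration of the sets S₁ through S_{m-1} can be sketched as follows. The function name `transformed_set` and the `freeze_first` flag are hypothetical. The rule that the frontmost position may become B but never R is inferred from the example above, which lists <B00> but not <R00>; it is an assumption rather than a stated requirement.

```python
from itertools import combinations, product
from typing import List


def transformed_set(m: int, k: int, freeze_first: bool = True) -> List[str]:
    """Return the codes of all networks with exactly k of m blocks transformed.

    With freeze_first=True the frontmost block is never transformed.
    """
    start = 1 if freeze_first else 0
    codes = []
    for positions in combinations(range(start, m), k):
        # Position 0 has no preceding block, so only 'B' is allowed there
        # (assumption inferred from the example set in the text).
        options = [("B",) if p == 0 else ("B", "R") for p in positions]
        for kinds in product(*options):
            code = ["0"] * m
            for p, kind in zip(positions, kinds):
                code[p] = kind
            codes.append("".join(code))
    return codes


s1_frozen = transformed_set(3, 1)                    # frontmost block kept
s1_free = transformed_set(3, 1, freeze_first=False)  # frontmost block may be bypassed
s2_frozen = transformed_set(3, 2)
```

For m = 3, the frozen variant reproduces the four-element S₁ from the example and the unfrozen variant reproduces the five-element one.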

For example, the neural network compression apparatus using block transformation may selectively generate only some of the sets of transformed neural networks, or only some of the transformed neural networks included in a set. For example, when the original neural network S₀ consists of three residual blocks, the apparatus may selectively generate only S₁ out of the sets S₁ and S₂. As a further example, out of the set S₁={<00R>, <0R0>, <00B>, <0B0>, <B00>} obtained by transforming one residual block, the apparatus may selectively generate only {<00R>, <00B>}.

According to an embodiment, the neural network compression apparatus using block transformation may train, using labeled source data, one or more selected transformed neural networks chosen from the one or more transformed neural networks according to a predetermined criterion (120).

According to an embodiment, the neural network compression apparatus using block transformation may select one transformed neural network from each of the sets S₁ through S_{m-1} and train the m-1 selected transformed neural networks.

For example, when the original neural network S₀ consists of three residual blocks, the apparatus may select one of the transformed neural networks included in each of the sets S₁ and S₂. For example, the apparatus may select <00R> from S₁ and <0BR> from S₂.

According to an embodiment, the neural network compression apparatus using block transformation may extract, from the labeled source data, one or more positive sample source data whose label values are correctly predicted by all of the one or more transformed neural networks.

For example, if the source data consists of sd₁ to sd_n and every transformed neural network correctly predicts the label values of sd₁ and sd₂, then sd₁ and sd₂ become the positive sample source data.
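The extraction of positive samples described above amounts to keeping the samples on which every network agrees with the ground-truth label. A minimal sketch follows, assuming predictions are already available as class indices; the function name `positive_sample_indices` and the data layout are illustrative only.

```python
from typing import List, Sequence


def positive_sample_indices(
    labels: Sequence[int],
    predictions: Sequence[Sequence[int]],
) -> List[int]:
    """Return indices of samples that every network classifies correctly.

    predictions[j][i] is the class predicted by network j for sample i.
    """
    return [
        i
        for i, y in enumerate(labels)
        if all(pred[i] == y for pred in predictions)
    ]


# Four source samples and two transformed networks (toy values).
toy_labels = [0, 1, 2, 1]
toy_predictions = [
    [0, 1, 0, 1],  # network A misses sample 2
    [0, 1, 2, 0],  # network B misses sample 3
]
toy_positive = positive_sample_indices(toy_labels, toy_predictions)
```

Here only the first two samples are predicted correctly by both networks, mirroring the sd₁, sd₂ example.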

According to an embodiment, the neural network compression apparatus using block transformation may calculate a regularization loss of the selected transformed neural network based on the one or more positive sample source data.

According to an embodiment, the regularization loss may be calculated based on (i) the difference between the probability distributions predicted for the source data by the original neural network and by the one or more selected transformed neural networks, and (ii) the same difference evaluated on the positive sample source data.

For example, the regularization loss may be defined by the following equation.

[Equation 1]

Here, X denotes the source data, x_p denotes the positive sample source data, and || ||_H denotes the Huber norm. D_s(x) may be the difference between the probability distributions of the original neural network and the transformed neural network, computed with the Jensen-Shannon divergence (JSD), and may be defined as follows.

[Equation 2]

Here, f^s denotes the prediction of neural network s, s() denotes the softmax function, and t denotes a temperature hyperparameter that adjusts the output of the softmax function.
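Because Equation 2 is not reproduced in the text, the following is only a sketch of the standard Jensen-Shannon divergence between two temperature-scaled softmax outputs, which is what the surrounding definitions describe. The function names and the choice of natural logarithm are assumptions, and how the patent aggregates this quantity inside Equation 1 is not shown here.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], t: float = 1.0) -> List[float]:
    """Temperature-scaled softmax; a larger t gives a flatter distribution."""
    scaled = [z / t for z in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl(p: Sequence[float], q: Sequence[float]) -> float:
    """Kullback-Leibler divergence KL(p || q) in nats."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def js_divergence(
    logits_a: Sequence[float],
    logits_b: Sequence[float],
    t: float = 1.0,
) -> float:
    """Jensen-Shannon divergence between two softened predictions."""
    p = softmax(logits_a, t)
    q = softmax(logits_b, t)
    mid = [(pi + qi) / 2.0 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, mid) + 0.5 * kl(q, mid)
```

The divergence is symmetric and bounded by log 2, which makes it a convenient distance-like measure between the original and transformed networks.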

According to an embodiment, the regularization loss may be calculated by applying label smoothing to the labels of the source data. For example, the labels of the source data may be one-hot encoded labels. When label smoothing is applied, the 0 and 1 entries of a one-hot encoded label may be changed to real numbers between 0 and 1. For example, if the one-hot encoded label is [0, 0, 1], applying label smoothing may convert it to a label such as [0.001, 0.002, 0.998]. According to an example, Equation 1 may be calculated using the label-smoothed labels.

According to an example, applying label smoothing may produce a clustering effect that keeps the difference between the predictions of the original neural network and the transformed neural network within a certain range. For example, FIG. 3(a) shows the distribution of predictions when source data with one-hot encoded labels is applied to the original neural network (<00>) and the transformed neural network (<0B>). As shown in FIG. 3(a), when source data with one-hot encoded labels is used, the difference between the prediction of the transformed neural network (<0B>) and that of the original neural network (<00>) may fall outside a certain range for some source data. In contrast, when label-smoothed source data is used, the predictions may be clustered, and as shown in FIG. 3(b) the predictions of the transformed neural network (<0B>) lie within a certain range centered on the predictions of the original neural network (<00>).

According to an embodiment, the neural network compression apparatus using block transformation may calculate a cross-entropy loss for label smoothing based on the predictions of the original neural network for the original source data and for the source data to which label smoothing has been applied. For example, the cross-entropy loss may be defined by the following equation.

[Equation 3]

Here, y denotes the source data to which a one-hot encoded label is applied, and g() is a function for label smoothing that may be defined as follows.

[Equation 4]

Here, 1_y is the one-hot encoded vector, C is the number of classes, and a is a hyperparameter.
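Equation 4 itself is not reproduced in the text. The sketch below uses the common uniform label smoothing form, (1 - a) * 1_y + a / C, which is consistent with the symbols 1_y, C and a named above but is an assumption rather than the patent's confirmed formula. The helper names are hypothetical.

```python
import math
from typing import List, Sequence


def smooth_label(y: int, num_classes: int, a: float) -> List[float]:
    """Uniform label smoothing of the one-hot vector for class y."""
    one_hot = [1.0 if c == y else 0.0 for c in range(num_classes)]
    return [(1.0 - a) * v + a / num_classes for v in one_hot]


def cross_entropy(target: Sequence[float], probs: Sequence[float]) -> float:
    """Cross-entropy between a (possibly soft) target and predicted probabilities."""
    return -sum(t * math.log(p) for t, p in zip(target, probs) if t > 0.0)


# Class 2 out of C = 3 classes with smoothing strength a = 0.3.
soft = smooth_label(2, 3, 0.3)
loss = cross_entropy(soft, [0.2, 0.3, 0.5])
```

With a = 0 the function returns the original one-hot vector, so the smoothing strength can be tuned without changing the rest of a training loop.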

According to an example, the neural network compression apparatus using block transformation may train the one or more selected transformed neural networks based on the regularization loss and the cross-entropy loss. For example, the loss function for pre-training a selected transformed neural network using the source data may be defined by the following equation.

[Equation 5]

Here, l is an arbitrary parameter.

According to an embodiment, the neural network compression apparatus using block transformation may train, using unlabeled target data, a screened transformed neural network chosen from the one or more pre-trained selected transformed neural networks on the basis of a regularization score (130).

According to an embodiment, the regularization score may be calculated based on the difference between the probability distributions predicted for the target data by the original neural network and by the one or more pre-trained selected transformed neural networks. For example, this difference may be calculated using the Kullback-Leibler divergence (KLD).
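As a rough illustration of the KLD part of the score only, the sketch below averages the Kullback-Leibler divergence between the original network's predicted distribution and each candidate's over the target samples, then orders the candidates. The direction KL(original || candidate), the averaging, and the rule that a smaller value is preferred are all assumptions; the full score in the patent also involves the regularization loss, which is omitted here.

```python
import math
from typing import Dict, List, Sequence

Dist = Sequence[float]


def kl_divergence(p: Dist, q: Dist, eps: float = 1e-12) -> float:
    """KL(p || q) in nats; q is clamped at eps to avoid division by zero."""
    return sum(
        pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0.0
    )


def mean_kl(original: Sequence[Dist], candidate: Sequence[Dist]) -> float:
    """Average KL divergence over paired per-sample distributions."""
    pairs = list(zip(original, candidate))
    return sum(kl_divergence(p, q) for p, q in pairs) / len(pairs)


def rank_candidates(
    original: Sequence[Dist],
    candidates: Dict[str, Sequence[Dist]],
) -> List[str]:
    """Order candidate networks from closest to farthest from the original."""
    return sorted(candidates, key=lambda name: mean_kl(original, candidates[name]))


# Toy predicted distributions on two target samples.
orig_probs = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]
cand_probs = {
    "00R": [[0.6, 0.3, 0.1], [0.2, 0.7, 0.1]],
    "0BR": [[0.3, 0.4, 0.3], [0.4, 0.3, 0.3]],
}
ranking = rank_candidates(orig_probs, cand_probs)
```

No labels are needed for this comparison, which is why a divergence-based score suits the unlabeled target data.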

일 실시예에 따르면, 블록 변환을 이용한 신경망 압축 장치는 타겟 데이터 중 원본 신경망과 선별 변환 신경망이 동일한 값을 예측하는 양성 샘플 타겟 데이터를 추출할 수 있으며, 타겟 데이터에 대한 원본 신경망의 예측값과 선별 변환 신경망의 예측값의 확률 분포 차이 및 양성 샘플 타겟 데이터에 대한 원본 신경망의 예측값과 선별 변환 신경망의 예측값의 확률 분포 차이에 기초하여 정규화 손실(regularizations loss)을 계산할 수 있다. 예를 들어, 정규화 손실은 수학식 1을 이용하여 계산될 수 있다. According to one embodiment, a neural network compression device using block transformation can extract positive sample target data for which the original neural network and the selective transformation neural network predict the same value among the target data, and the predicted value of the original neural network and the selective transformation for the target data. Regularization loss can be calculated based on the difference in probability distribution of the predicted value of the neural network and the difference in probability distribution between the predicted value of the original neural network and the predicted value of the selection transformation neural network for positive sample target data. For example, normalization loss can be calculated using Equation 1.

According to one embodiment, the regularization score may be calculated using the regularization loss together with the difference, measured by the Kullback-Leibler divergence, between the probability distributions of the original neural network and the pre-trained selected transformed neural network. For example, the regularization score may be defined as in the equation below.

[Equation 6]

Here, KL() denotes the Kullback-Leibler divergence, and the accompanying symbol (not reproduced here) denotes the positive sample target data.

According to one embodiment, the neural network compression apparatus using block transformation may calculate the cross-entropy loss of the original neural network with respect to synthetic labels generated by clustering the target data according to a predetermined criterion. As an example, the cross-entropy loss of the original neural network for the target data may be defined as in the equation below.

[Equation 7]

Here, the synthetic labels are generated by clustering representative features of the unlabeled target data and may be used as pseudo labels for the cross-entropy loss. As an example, the loss based on the synthetic labels may be defined as in the equation below.

[Equation 8]

Here, the accompanying symbol (not reproduced here) denotes the synthetic label.
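One way to picture synthetic labels is the sketch below, which clusters feature vectors with a plain k-means and uses the cluster indices as pseudo labels in a cross-entropy loss. Equations 7 and 8 are not reproduced in this text, so this is only an illustration under assumptions: k-means stands in for the unspecified clustering criterion, the number of clusters equals the number of classes, cluster indices are taken to be already aligned with class indices, and `kmeans_labels`, `cross_entropy`, and all data values are hypothetical.

```python
import numpy as np

def kmeans_labels(features, k, n_iter=20, seed=0):
    """Plain k-means; returns one cluster index per row of `features`."""
    rng = np.random.default_rng(seed)
    centers = features[rng.choice(len(features), size=k, replace=False)].copy()
    for _ in range(n_iter):
        dists = ((features[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = features[labels == j]
            if len(members) > 0:  # keep the old center if a cluster is empty
                centers[j] = members.mean(axis=0)
    # Final assignment against the updated centers.
    dists = ((features[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)

def cross_entropy(probs, labels, eps=1e-12):
    """Mean negative log-probability assigned to the given integer labels."""
    picked = probs[np.arange(len(labels)), labels]
    return float(-np.mean(np.log(np.clip(picked, eps, 1.0))))

# Hypothetical representative features of four unlabeled target samples.
features = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
synthetic_labels = kmeans_labels(features, k=2)

# Hypothetical class probabilities predicted by a network for the same samples.
probs = np.array([[0.9, 0.1], [0.8, 0.2], [0.3, 0.7], [0.2, 0.8]])
loss_syn = cross_entropy(probs, synthetic_labels)
```

In practice the cluster-to-class alignment would need its own step; it is omitted here to keep the sketch short.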

According to one embodiment, the neural network compression apparatus using block transformation may train the screened transformed neural network based on the regularization loss and the cross-entropy loss. For example, the training loss for adapting to the target data may be defined as in the equation below.

[Equation 9]

Here, the two accompanying symbols (not reproduced here) are hyperparameters defined by the user.
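As a rough illustration of combining the two losses with two user-defined hyperparameters, the sketch below uses a weighted sum. Equation 9 is not reproduced in this text, so the weighted-sum form and the names `target_adaptation_loss`, `lambda_reg`, and `lambda_ce` are assumptions.

```python
def target_adaptation_loss(reg_loss, ce_loss, lambda_reg=1.0, lambda_ce=1.0):
    """Assumed form: weighted sum of the regularization and cross-entropy losses."""
    return lambda_reg * reg_loss + lambda_ce * ce_loss

# Hypothetical loss values and weights.
total = target_adaptation_loss(reg_loss=0.4, ce_loss=1.2, lambda_reg=0.5, lambda_ce=2.0)
```

Raising `lambda_reg` would pull the compressed network toward the original network's outputs, while raising `lambda_ce` would emphasize agreement with the synthetic labels.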

According to one example, the process of adapting to the target data may be repeated several times. For example, each iteration may use the regularization score of Equation 6 to select a fixed proportion of the one or more pre-trained selected transformed neural networks, and the process may be repeated until a single selected transformed neural network remains.
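The repeated select-then-adapt procedure can be sketched as a simple loop. The function `iterative_selection`, its parameters, and the example scores are hypothetical; the sketch also assumes that a lower score is better and that survivors are adapted after each selection round, neither of which is stated in the text.

```python
import math

def iterative_selection(candidates, score_fn, adapt_fn, keep_ratio=0.5):
    """Score candidates, keep the best fraction, adapt survivors; stop at one."""
    if not 0.0 < keep_ratio < 1.0:
        raise ValueError("keep_ratio must lie strictly between 0 and 1")
    pool = list(candidates)
    if not pool:
        raise ValueError("at least one candidate is required")
    while len(pool) > 1:
        ranked = sorted(pool, key=score_fn)  # assumption: lower score is better
        n_keep = max(1, math.floor(len(ranked) * keep_ratio))
        # Adapt only the surviving candidates on the target data.
        pool = [adapt_fn(c) for c in ranked[:n_keep]]
    return pool[0]

# Hypothetical candidates identified by name, with fixed made-up scores.
scores = {"net_a": 0.8, "net_b": 0.2, "net_c": 0.5, "net_d": 0.9}
best = iterative_selection(
    candidates=scores.keys(),
    score_fn=scores.get,
    adapt_fn=lambda net: net,  # placeholder for one round of target adaptation
)
```

With four candidates and `keep_ratio=0.5`, the pool shrinks from four to two to one, so the loop runs twice.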

FIG. 4 is a block diagram of a neural network compression apparatus using block transformation according to an embodiment.

According to one embodiment, the neural network compression apparatus 400 using block transformation may include a block transformation unit 410, a pre-training unit 420, and a target adaptation unit 430.

According to one embodiment, the block transformation unit 410 may generate one or more transformed neural networks by converting, in an original neural network composed of two or more residual blocks, at least one of the two or more residual blocks into at least one of a bypassing block and a recycling block.

According to one embodiment, the pre-training unit 420 may train one or more selected transformed neural networks, chosen according to a predetermined criterion from among the one or more transformed neural networks, using labeled source data.

According to one embodiment, the target adaptation unit 430 may train a screened transformed neural network, chosen on the basis of a regularization score from among the one or more pre-trained selected transformed neural networks, using unlabeled target data.

FIG. 5 is a block diagram illustrating a computing environment 10 that includes a computing device suitable for use in example embodiments. In the illustrated embodiment, each component may have functions and capabilities different from those described below, and additional components beyond those described below may be included.

The illustrated computing environment 10 includes a computing device 12. In one embodiment, the computing device 12 may be the neural network compression apparatus 400 using block transformation.

The computing device 12 includes at least one processor 14, a computer-readable storage medium 16, and a communication bus 18. The processor 14 may cause the computing device 12 to operate in accordance with the example embodiments described above. For example, the processor 14 may execute one or more programs stored in the computer-readable storage medium 16. The one or more programs may include one or more computer-executable instructions, which, when executed by the processor 14, may be configured to cause the computing device 12 to perform operations according to the example embodiments.

The computer-readable storage medium 16 is configured to store computer-executable instructions or program code, program data, and/or other suitable forms of information. A program 20 stored in the computer-readable storage medium 16 includes a set of instructions executable by the processor 14. In one embodiment, the computer-readable storage medium 16 may be memory (volatile memory such as random access memory, non-volatile memory, or a suitable combination thereof), one or more magnetic disk storage devices, optical disk storage devices, flash memory devices, any other form of storage medium that can be accessed by the computing device 12 and can store the desired information, or a suitable combination thereof.

The communication bus 18 interconnects the various components of the computing device 12, including the processor 14 and the computer-readable storage medium 16.

The computing device 12 may also include one or more input/output interfaces 22, which provide an interface for one or more input/output devices 24, and one or more network communication interfaces 26. The input/output interface 22 and the network communication interface 26 are connected to the communication bus 18. The input/output device 24 may be connected to other components of the computing device 12 through the input/output interface 22. Exemplary input/output devices 24 may include input devices such as a pointing device (a mouse, a trackpad, or the like), a keyboard, a touch input device (a touchpad, a touch screen, or the like), a voice or sound input device, various types of sensor devices, and/or imaging devices, and/or output devices such as a display device, a printer, a speaker, and/or a network card. An exemplary input/output device 24 may be included inside the computing device 12 as one of its components, or may be connected to the computing device 12 as a separate device distinct from it.

The present invention has been described above with a focus on its preferred embodiments. A person of ordinary skill in the art to which the present invention pertains will understand that the invention may be implemented in modified forms without departing from its essential characteristics. Accordingly, the scope of the present invention is not limited to the embodiments described above and should be construed to include various embodiments within a scope equivalent to what is set forth in the claims.

400: Neural network compression apparatus using block transformation
410: Block transformation unit
420: Pre-training unit
430: Target adaptation unit

Claims

A neural network compression method using block transformation, comprising:
a block transformation step of generating one or more transformed neural networks by converting, in an original neural network composed of two or more residual blocks, at least one of the two or more residual blocks into at least one of a bypassing block and a recycling block;
a pre-training step of training one or more selected transformed neural networks, chosen according to a predetermined criterion from among the one or more transformed neural networks, using labeled source data; and
a target adaptation step of training a screened transformed neural network, chosen on the basis of a regularization score from among the one or more pre-trained selected transformed neural networks, using unlabeled target data.

The method of claim 1, wherein the block transformation step generates, from an original neural network S₀ composed of m residual blocks, a set S₁ of transformed neural networks created by transforming one residual block through a set Sₘ₋₁ of transformed neural networks created by transforming m-1 residual blocks.

The method of claim 2, wherein the pre-training step selects one transformed neural network from each of the sets S₁ through Sₘ₋₁ of transformed neural networks and trains the resulting m-1 selected transformed neural networks.

The method of claim 1, wherein the pre-training step extracts, from the labeled source data, one or more positive sample source data whose label values are predicted by all of the one or more transformed neural networks, and calculates a regularization loss of a selected transformed neural network based on the one or more positive sample source data.

The method of claim 4, wherein the regularization loss is calculated based on the difference between the probability distributions of the predictions of the original neural network and the one or more selected transformed neural networks for the source data, and the difference between the probability distributions of their predictions for the positive sample source data.

The method of claim 4, wherein the regularization loss is calculated by applying label smoothing to the labels of the source data.

The method of claim 6, wherein the pre-training step calculates a cross-entropy loss for label smoothing based on the original neural network's predictions for the original source data and for the source data to which label smoothing has been applied.

The method of claim 7, wherein the pre-training step trains the one or more selected transformed neural networks based on the regularization loss and the cross-entropy loss.

The method of claim 1, wherein the regularization score is calculated based on the difference between the probability distribution of the original neural network's predictions for the target data and that of the predictions of the one or more pre-trained selected transformed neural networks.

The method of claim 9, wherein the target adaptation step:
extracts, from the target data, positive sample target data for which the original neural network and the screened transformed neural network predict the same value;
calculates a regularization loss based on the difference between the probability distributions of the predictions of the original neural network and the screened transformed neural network for the target data and the difference between the probability distributions of their predictions for the positive sample target data; and
calculates the regularization score further based on the regularization loss.

The method of claim 10, wherein the target adaptation step calculates the cross-entropy loss of the original neural network with respect to synthetic labels generated by clustering the target data according to a predetermined criterion.

The method of claim 11, wherein the target adaptation step trains the screened transformed neural network based on the regularization loss and the cross-entropy loss.

A neural network compression apparatus using block transformation, comprising:
a block transformation unit that generates one or more transformed neural networks by converting, in an original neural network composed of two or more residual blocks, at least one of the two or more residual blocks into at least one of a bypassing block and a recycling block;
a pre-training unit that trains one or more selected transformed neural networks, chosen according to a predetermined criterion from among the one or more transformed neural networks, using labeled source data; and
a target adaptation unit that trains a screened transformed neural network, chosen on the basis of a regularization score from among the one or more pre-trained selected transformed neural networks, using unlabeled target data.

The apparatus of claim 13, wherein the block transformation unit generates, from an original neural network S₀ composed of m residual blocks, a set S₁ of transformed neural networks created by transforming one residual block through a set Sₘ₋₁ of transformed neural networks created by transforming m-1 residual blocks.

The apparatus of claim 14, wherein the pre-training unit selects one transformed neural network from each of the sets S₁ through Sₘ₋₁ of transformed neural networks and trains the resulting m-1 selected transformed neural networks.

The apparatus of claim 13, wherein the pre-training unit extracts, from the labeled source data, one or more positive sample source data whose label values are predicted by the one or more transformed neural networks, and calculates a regularization loss of a selected transformed neural network based on the one or more positive sample source data.

The apparatus of claim 16, wherein the regularization loss is calculated based on the difference between the probability distributions of the predictions of the original neural network and the one or more selected transformed neural networks for the source data, and the difference between the probability distributions of their predictions for the positive sample source data.

The apparatus of claim 16, wherein the regularization loss is calculated by applying label smoothing to the labels of the source data, and wherein the pre-training unit calculates a cross-entropy loss for label smoothing based on the original neural network's predictions for the original source data and for the source data to which label smoothing has been applied, and trains the one or more selected transformed neural networks.

The apparatus of claim 13, wherein the regularization score is calculated based on the difference between the probability distribution of the original neural network's predictions for the target data and that of the predictions of the one or more pre-trained selected transformed neural networks, and wherein the target adaptation unit:
extracts, from the target data, positive sample target data for which the original neural network and the screened transformed neural network predict the same value;
calculates a regularization loss based on the difference between the probability distributions of the predictions of the original neural network and the screened transformed neural network for the target data and the difference between the probability distributions of their predictions for the positive sample target data; and
calculates the regularization score further based on the regularization loss.

The apparatus of claim 19, wherein the target adaptation unit calculates the cross-entropy loss of the original neural network with respect to synthetic labels generated by clustering the target data according to a predetermined criterion.

The apparatus of claim 20, wherein the target adaptation unit trains the screened transformed neural network based on the regularization loss and the cross-entropy loss.

A computer program stored on a non-transitory computer-readable storage medium, the computer program comprising one or more instructions that, when executed by a computing device having one or more processors, cause the computing device to perform:
a block transformation step of generating one or more transformed neural networks by converting, in an original neural network composed of two or more residual blocks, at least one of the two or more residual blocks into at least one of a bypassing block and a recycling block;
a pre-training step of training one or more selected transformed neural networks, chosen according to a predetermined criterion from among the one or more transformed neural networks, using labeled source data; and
a target adaptation step of training a screened transformed neural network, chosen on the basis of a regularization score from among the one or more pre-trained selected transformed neural networks, using unlabeled target data.