KR20210035679A

KR20210035679A - Compressing apparatus and method of trained deep artificial neural networks for video coding unit

Info

Publication number: KR20210035679A
Application number: KR1020190117794A
Authority: KR
Inventors: 천승문; 고현철
Original assignee: (주)인시그널
Priority date: 2019-09-24
Filing date: 2019-09-24
Publication date: 2021-04-01

Abstract

Disclosed are a compression device of a trained deep artificial neural network of a video coding tool, and a method thereof. According to an embodiment of the present invention, the compression device comprises: a pruning unit for selectively pruning a trained deep neural network of a video coding tool; a quantization unit for quantizing the trained deep neural network pruned by the pruning unit; and an entropy coding unit for entropy-coding the trained deep neural network quantized by the quantization unit and outputting the same as a bitstream, wherein the pruning unit calculates the value for the elements describing the trained deep neural network by reflecting back propagation through the deep neural network.

Description

Compressing apparatus and method of trained deep artificial neural networks for video coding unit

The present invention relates to artificial neural networks (ANNs), and more particularly to a technique for the compressed representation of a trained deep artificial neural network for a video coding tool.

Attempts to use artificial intelligence (AI) in various industries have been ongoing. In particular, recent AI technology makes use of neural networks (NNs), information processing systems that share certain performance characteristics with biological neural networks. As a result, its performance has improved substantially, and its fields of application are growing rapidly.

Such a neural network (NN) is also called an "artificial" neural network (ANN). An ANN is a distributed, parallel information processing model that mimics the behavioral characteristics of animal nervous systems. An ANN contains a large number of interconnected nodes (also called neurons) and has two characteristics: 1) each neuron computes its output from the weighted inputs of adjacent neurons through a specific output function (also called an activation function); 2) the strength of information transmission between neurons is measured by so-called "weights", and these weights can be adjusted through self-learning by a specific algorithm.

An ANN may use different architectures to specify the variables and topological relationships included in the neural network. The parameters of a neural network may be the weights of the connections between neurons, together with the activities of the neurons. Types of neural network topology include feed-forward networks and backward propagation neural networks. In the former, the nodes of each layer are connected to one another and fed forward to the next stage, and the network includes some form of "learning rule" that modifies the connection weights according to the input patterns presented. The latter allows backward error propagation for adjusting the weights, and is a more advanced neural network than the former.

A deep neural network (DNN) is a neural network with multiple levels of interconnected nodes, and can compactly represent highly nonlinear and highly varying functions. However, the computational complexity of a DNN rises sharply with the number of nodes associated with its many layers. Efficient computational methods for training such DNNs have been developed in recent years. As the training speed of DNNs has increased dramatically, they have been applied successfully to a variety of complex tasks such as speech recognition, image segmentation, object detection, and face recognition.

The compression and decompression of multimedia content, for example video, is one of the fields in which the application of DNNs is being attempted. High Efficiency Video Coding (HEVC) was developed as a next-generation video coding standard by a joint video project of the ITU-T Video Coding Experts Group (VCEG) and the ISO/IEC Moving Picture Experts Group (MPEG) standardization organizations, and has been adopted and used as an international standard. It is known that applying a DNN to a new video coding standard such as HEVC can further improve its performance. One such attempt is disclosed in Korean Patent Laid-Open Publication No. 10-2018-0052651, "Method and apparatus of neural network-based processing in video coding".

However, the scale of neural networks has exploded in recent years owing to their rapid development. Some advanced neural network models have hundreds of layers and billions of connections, and implementing them is both computation-intensive and memory-intensive.

Because neural networks keep growing, making a neural network model small is very important for applying it to devices with limited storage and processor performance, such as mobile terminals. Reducing the size can, however, degrade the performance of the neural network, so there is a limit to how far it can go. A small neural network model is essential in particular for video coding applications, which are used to produce and consume multimedia content and are among the important applications on mobile terminals. In addition, the nature of video coding applications requires compatibility between the encoding device and the decoding device.

Prior art document: Korean Patent Laid-Open Publication No. 10-2018-0052651, "Method and apparatus of neural network-based processing in video coding"

One problem to be solved by the present invention is to provide an apparatus and method for compressing the trained deep neural networks of a video coding tool that can be applied even to devices with limited storage and processor performance, such as mobile terminals, while minimizing the degradation of the deep neural network's performance.

To solve the above problem, an apparatus for compressing a trained deep artificial neural network according to an embodiment of the present invention includes: a pruning unit for selectively pruning a trained deep neural network of a coding tool that forms part of video coding; a quantization unit for quantizing the trained deep neural network pruned by the pruning unit; and an entropy coding unit for entropy-coding the trained deep neural network quantized by the quantization unit and outputting it as a bitstream. The pruning unit identifies the type of the coding tool and then, taking the identified type into account, calculates a value score for the elements describing the trained deep neural network, where the calculation reflects backward propagation through the deep neural network.

According to one aspect of the embodiment, the trained deep neural network includes a plurality of output features, a plurality of neurons, and a plurality of convolution kernels, and one or more of the elements describing the trained deep neural network may include at least one of the plurality of output features, the plurality of neurons, and the plurality of convolution kernels. In this case, the trained deep neural network may include a plurality of neurons located in a fully connected layer.

To solve the above problem, a method for compressing a trained deep artificial neural network according to an embodiment of the present invention includes: a pruning step of selectively pruning a trained deep neural network of a coding tool that forms part of video coding; a quantization step of quantizing the trained deep neural network pruned in the pruning step; and an entropy coding step of entropy-coding the trained deep neural network quantized in the quantization step and outputting it as a bitstream. In the pruning step, the type of the coding tool is identified and then, taking the identified type into account, a value score is calculated for the elements describing the trained deep neural network, where the calculation reflects backward propagation through the deep neural network.

According to an embodiment of the present invention, a trained deep neural network can be applied to applications that use video even on devices with limited storage and processor performance, such as mobile terminals, while minimizing the degradation of the neural network's performance.

FIG. 1 is a block diagram showing a detailed configuration of a computer system in which an apparatus for compressing a trained deep artificial neural network of a video coding tool according to an embodiment of the present invention is implemented.
FIG. 2 is a block diagram showing an example of the configuration of the deep neural network compression apparatus.
FIG. 3 is a diagram showing an example of a trained deep neural network according to an embodiment of the present invention, for the case of a convolutional neural network (CNN).
FIG. 4 is a flowchart showing an example of a pruning method in the pruning unit of the apparatus for compressing a trained deep artificial neural network of a video coding tool according to an embodiment of the present invention.
FIG. 5 is a flowchart showing an example of a pruning method in the pruning unit of the apparatus for compressing a trained deep artificial neural network of a video coding tool according to another embodiment of the present invention.
FIG. 6 is a flowchart showing an adaptive quantization process as an example of the quantization process in the quantization unit.
FIG. 7 is a flowchart showing a quantization process according to another embodiment of the present invention.

Hereinafter, preferred embodiments and examples of the present invention are described with reference to the drawings. The following embodiments and examples merely illustrate preferred configurations of the present invention, and the scope of the present invention is not limited to these configurations. In the following description, the hardware and software configuration of the apparatus, the processing flow, manufacturing conditions, size, material, shape, and the like are not intended to limit the scope of the present invention unless specifically stated otherwise.

FIG. 1 is a block diagram showing a detailed configuration of a computer system 100 in which an apparatus for compressing a trained deep artificial neural network of a video coding tool according to an embodiment of the present invention is implemented.

Referring to FIG. 1, the computer system 100 includes one or more processors 110, an input/output device interface 120, a network interface 130, an interconnector (bus) 140, a memory 150, and a storage 160. The computer system 100 may be one specific device configured as a single computing device, or may be a plurality of devices each including one or more processors and one or more associated memories.

The processor 110 fetches and executes programming instructions stored in the memory 150 or the storage 160. Likewise, the processor 110 stores application data in, or retrieves it from, the memory 150. The input/output device interface 120 connects input/output devices 12, such as a keyboard, a display, and a mouse, to the computer system 100. The network interface 130 communicates over a wired or wireless link with an internal network (intranet) or with an external network such as the Internet or a wireless communication network, and transmits data through the data communication network 14.

The interconnector 140 transfers programming instructions and application data between the processor 110 and each of the input/output device interface 120, the storage 160, the network interface 130, and the memory 150. The interconnector 140 may be one or more buses. The processor 110 may be implemented as a single central processing unit (CPU), as a plurality of CPUs, or, in various implementations, as a single CPU having a plurality of processing cores. According to one aspect, the processor 110 may be a digital signal processor (DSP).

The memory 150 generally includes random access memory such as static random access memory (SRAM), dynamic random access memory (DRAM), or flash memory. The storage 160 generally includes non-volatile memory such as a hard disk drive, a solid-state drive (SSD), a removable memory card, optical storage, a flash memory device, or connections to network attached storage (NAS) or a storage area network (SAN).

The computer system 100 may include one or more operating systems (OS) 164. The operating system 164 may be stored partly in the memory 150 and partly in the storage 160, or it may be stored entirely in the memory 150 or entirely in the storage 160. The operating system 164 provides an interface between various hardware resources such as the processor 110, the input/output device interface 120, and the network interface 130. The operating system 164 also provides common services for application programs, such as time functions.

The deep neural network compression apparatus 152 compresses a trained deep neural network of a coding tool, for example a trained deep convolutional neural network, and outputs it as a bitstream. That is, the deep neural network compression apparatus 152 is a means for compressing the deep neural network of a learned (trained) coding tool and describing it in a compatible format. The resulting compressed deep neural network is thus an interoperable compressed representation of the neural network of the coding tool. To compress the trained deep neural network of the coding tool, the deep neural network compression apparatus 152 performs a series of processes on the trained deep neural network, comprising pruning, quantization, and entropy coding, and outputs a bitstream.

FIG. 2 is a block diagram showing an example of the configuration of the deep neural network compression apparatus 152. Referring to FIG. 2, the deep neural network compression apparatus 152 includes a pruning unit 22, a quantization unit 24, and an entropy coding unit 26. As described above, the deep neural network compression apparatus 152 compresses the trained deep neural network of the coding tool and outputs it as a bitstream.
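The three-stage structure of FIG. 2 can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the text does not fix the quantizer or the entropy coder at this point, so the sketch uses uniform scalar quantization and uses `zlib` (DEFLATE) as a stand-in for the entropy coding unit. The function names and the flat list of weights are hypothetical.

```python
import struct
import zlib


def prune(weights, threshold):
    """Pruning unit (22): zero every weight whose magnitude is at or below the threshold."""
    return [0.0 if abs(w) <= threshold else w for w in weights]


def quantize(weights, step):
    """Quantization unit (24), assumed uniform: map each weight to an integer level."""
    return [int(round(w / step)) for w in weights]


def entropy_code(levels):
    """Entropy coding unit (26), stand-in only: pack levels as int8 and DEFLATE them."""
    clipped = [max(-128, min(127, q)) for q in levels]
    raw = struct.pack(f"{len(clipped)}b", *clipped)
    return zlib.compress(raw)


def compress_network(weights, threshold, step):
    """Run the pipeline in the order described: prune, quantize, entropy-code."""
    return entropy_code(quantize(prune(weights, threshold), step))


def decompress_network(bitstream, step):
    """Inverse of the stand-in coder and quantizer (pruned weights stay zero)."""
    raw = zlib.decompress(bitstream)
    levels = struct.unpack(f"{len(raw)}b", raw)
    return [q * step for q in levels]


bitstream = compress_network([0.5, -0.02, 0.01, -0.25, 0.0], threshold=0.05, step=0.25)
restored = decompress_network(bitstream, step=0.25)
```

Pruning before quantization produces long runs of zero levels, which is what makes the final lossless coding stage effective.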

The input to the deep neural network compression apparatus 152 includes various information and parameters for describing the trained deep neural network of the coding tool.

First, the input to the deep neural network compression apparatus 152 includes several kinds of high-level information. For example, the high-level information may include information indicating the type of the deep neural network (DNN)-based coding tool. There are two types of DNN-based coding tool: a first-type coding tool, which implements a function required in both the encoder and the decoder, and a second-type coding tool, which implements a function required in only one of the encoder and the decoder. This is because, in image/video coding, some coding tools (the first type) are needed by both the encoder and the decoder, while the others (the second type) are needed by only one of them. For example, in video coding the in-loop filtering process is performed in both the encoder and the decoder and is therefore a function of a first-type coding tool, whereas the intra mode prediction process is performed only in the encoder and is therefore a function of a second-type coding tool; only the determined prediction mode information is sent to the decoder. Both types of DNN-based coding tool must therefore be considered, and the type of the DNN-based coding tool must be indicated as high-level information of the trained DNN-based coding tool.

The high-level information also includes information on the overall configuration of the DNN-based coding tool. More specifically, the high-level information related to the configuration of the trained DNN-based tool includes: information on the target application in terms of the basic function of the neural network, such as recognition, classification, generation, or discrimination; information indicating the type of the trained DNN-based coding tool; information indicating whether, for a particular encoding process, the encoder chose to apply the trained tool neural network to the inference engine or to apply a standardized image or video coding tool; information on the customized content type; basic information on the algorithm of the trained DNN, such as autoencoder, convolutional neural network (CNN), generative adversarial network (GAN), or recurrent neural network (RNN); basic information on the training data and/or test data; information on the capability required of the inference engine in terms of memory capacity and computing power; and information on model compression.

The input to the deep neural network compression apparatus 152 also includes various parameters describing the trained deep neural network. These parameters include the kernels, the neurons, and the connection weights. This is described in more detail below with reference to the architecture of a convolutional neural network (CNN), which is one example of a deep neural network.

FIG. 3 is a diagram showing an example of a trained deep neural network according to an embodiment of the present invention, for the case of a convolutional neural network (CNN). Referring to FIG. 3, the CNN 200 includes an input layer 210, a convolutional layer 215, a subsampling layer 220, a convolutional layer 225, a subsampling layer 230, fully connected (FC) layers 235 and 240, and an output layer 245. In the example shown in FIG. 3, the input layer 210 is configured to accept a 32×32 pixel image, and the convolutional layer 215 generates six 28×28 feature maps from the input layer.
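The 32×32 to 28×28 reduction follows from the usual output-size formula for a convolution. The short Python sketch below checks it under assumptions that the text does not state: a 5×5 kernel with stride 1 and no padding for the convolutional layer 215, and a 2×2 window with stride 2 for the subsampling layer 220 (a LeNet-5-style configuration). The function name is illustrative.

```python
def conv_output_size(input_size, kernel_size, stride=1, padding=0):
    """Spatial output size of a convolution or pooling window along one dimension."""
    return (input_size + 2 * padding - kernel_size) // stride + 1


# Convolutional layer 215: 32x32 input, assumed 5x5 kernel, stride 1, no padding.
feature_map_size = conv_output_size(32, 5)

# Subsampling layer 220: assumed 2x2 window with stride 2.
subsampled_size = conv_output_size(feature_map_size, 2, stride=2)
```

With these assumed settings each of the six feature maps is 28×28 after the convolution and 14×14 after subsampling.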

Although FIG. 3 shows one particular CNN configuration, a CNN more generally comprises one or more convolutional layers, each with a subsampling stage, and one or more fully connected layers. The CNN architecture shown is designed to exploit the two-dimensional structure of the input image. A CNN achieves this, for example, with local connections and shared (tied) weights followed by some form of pooling. In general, compared with a fully connected network having a similar number of hidden units, a CNN is easier to train and tends to have fewer parameters.

In general, a CNN includes convolutional layers and subsampling layers, followed by fully connected layers. According to one embodiment, the CNN receives an x×y×z image as input to a convolutional layer, where x and y are the height and width of the image, respectively, and z is the number of channels in the image. For example, an RGB image has z=3 channels. The convolutional layer may include k filters (kernels) of size a×b×c, where a×b is smaller than x×y and c is less than or equal to z. The size of the filters gives rise to a locally connected structure; each filter is convolved with the image, so that k feature maps are generated. Each feature map is then subsampled over adjacent regions of various sizes.

The pruning unit 22 simplifies the configuration of the neural network by removing some of the nodes (neurons) that make up the trained deep neural network and/or some of their connections. That is, the pruning unit 22 searches for an appropriate threshold according to the input pruning rate (threshold search) and sets all weights at or below the threshold to '0' (weight pruning).

For example, according to a predetermined pruning rate, the pruning unit 22 removes some of the connections between the neurons of the trained neural network, namely those connections whose weight is at or below a given threshold (indicated by green lines in the illustration), so that only the remaining connections (indicated in blue) are left. In this way, every weight value of the trained neural network that is at or below the threshold is replaced with '0', which simplifies the representation of the trained neural network. The threshold depends on the pruning rate: the pruning unit 22 finds the threshold value corresponding to the input pruning rate and sets the weights at or below that threshold to '0'.
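The threshold search and weight pruning described above can be sketched in a few lines of Python. This is a minimal illustration under assumptions: the text does not say how the threshold is searched for, so the sketch simply sorts the weight magnitudes and picks the value below which the requested fraction of weights falls. The function names are hypothetical.

```python
def find_threshold(weights, pruning_rate):
    """Return the magnitude at or below which `pruning_rate` of the weights fall."""
    if not 0.0 <= pruning_rate <= 1.0:
        raise ValueError("pruning_rate must be in [0, 1]")
    magnitudes = sorted(abs(w) for w in weights)
    n_prune = int(len(magnitudes) * pruning_rate)
    if n_prune == 0:
        # Magnitudes are non-negative, so a negative threshold prunes nothing.
        return -1.0
    return magnitudes[n_prune - 1]


def prune_by_rate(weights, pruning_rate):
    """Set every weight at or below the searched threshold to zero."""
    threshold = find_threshold(weights, pruning_rate)
    pruned = [0.0 if abs(w) <= threshold else w for w in weights]
    return pruned, threshold


weights = [0.9, -0.1, 0.05, -0.7, 0.3, 0.02, -0.4, 0.6, 0.15, -0.8]
pruned, threshold = prune_by_rate(weights, pruning_rate=0.5)
```

Because the comparison is "at or below", weights tied with the threshold are all removed, so the achieved sparsity can slightly exceed the requested rate.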

A direct way to reduce the redundancy of a trained deep neural network is to prune the kernels of the convolutional layers and the neurons of the FC layers. Although model redundancy generally improves the generalization ability of a deep neural network, selectively reducing that redundancy to an appropriate degree makes it possible to balance the model's predictive power against inference speed, memory usage, storage space, and power consumption. Although pruning kernels and neurons is beneficial, pruning them at random or changing the number of neurons and kernels arbitrarily can cause a large loss of predictive power.

From the viewpoint of feature extraction, the kernels and neurons of a deep neural network model can be regarded as feature extractors. Accordingly, when faced with high-dimensional features, the pruning unit 22 uses a feature selection process to identify irrelevant and/or redundant features and eliminate the excess. In general, applying a feature selection/ranking method to the extracted features indicates the importance of each feature extractor, so that the pruning unit 22 can prune the less important ones and balance the predictive power of the deep neural network against model redundancy. To do so, the pruning unit 22 extracts the responses of each convolutional and FC layer, ranks the kernels and neurons by their value from the viewpoint of feature selection, and prunes the less valuable ones. The pruning unit 22 may also use the selected valuable feature extractors as the starting point of a smaller deep neural network and fine-tune it with a lower learning rate to recover predictive power.

One obstacle to selectively pruning kernels and neurons is that the features extracted by the convolutional and FC layers are still high-dimensional. The pruning unit 22 according to an embodiment of the present invention therefore performs value score back propagation in order to reduce the size of the deep neural network while maintaining its predictive power. According to one embodiment, the pruning unit 22 may assign value scores at a higher level of the deep neural network, for example to the inputs of the classifier, and propagate them backward to the lower levels of the deep neural network. Through this value score back propagation, the pruning unit 22 can not only measure the value of the feature extractors of the entire deep neural network effectively, but can also do so consistently across the whole network.

FIG. 4 is a flowchart showing an example of a pruning method performed by the pruning unit 22 of the compression device 152 for a trained deep artificial neural network of a video coding tool according to an embodiment of the present invention.

Referring to FIG. 4, the pruning unit 22 identifies the coding tool described by the trained deep neural network (S11). That is, the pruning unit 22 first identifies what type of coding tool the trained deep neural network represents. For example, the pruning unit 22 may identify whether the coding tool relates to a coding function performed in both encoding and decoding, or to a coding function applied only to encoding and not to decoding.

The pruning unit 22 then extracts features of the trained deep neural network of the coding tool, taking the identified type of the coding tool into account (S12). This step corresponds to extracting the deep neural network responses from the trained deep neural network. The detailed algorithm applied when extracting the features may differ depending on whether the coding tool is applied to both encoding and decoding or only to encoding.

The pruning unit 22 then calculates a value score for each feature extractor (S13). In this step, the pruning unit 22 calculates value scores for the extracted features; an example of the method is described below. The following description covers a spatially square 3-way tensor, but it can be extended to other configurations. For a convolutional layer with output tensor size Y×Y×F (where Y is the spatial size and F is the number of output channels), the pruning unit 22 first obtains a value score IS_ijf for each spatial position (i, j) of the f-th output channel. According to one embodiment, the pruning unit 22 calculates the value score for the f-th output channel using Equation 1.

In general, there are three main categories of feature selection. The first is the wrapper method, which uses a classifier to score subsets of features. The second is the embedded method, which implicitly selects features during the training of the classifier by means of regularization. The third is the filter method, which exploits intrinsic properties of the data regardless of the classifier. According to one embodiment, the pruning unit 22 may be designed to perform feature selection based on the responses of a previously trained model. In a particular embodiment, the pruning unit 22 may use the Infinite Feature Selection filter algorithm to perform feature ranking. In general, when performing the Infinite Feature Selection filter algorithm, the pruning unit 22 maps the feature selection problem onto an affinity graph in which each vertex is a feature and each edge between vertices is a relationship, whose weight is defined by a function of the variables and the correlation of the vertex pair. Each path in the graph (a set of vertices and edges) is regarded as a subset of features, and the path cost is the sum of its edge weights. Therefore, when performing the Infinite Feature Selection filter algorithm, the pruning unit 22 evaluates the value of a given feature while considering all possible subsets of features, so the score of each feature is influenced by all the other features.

As described above, the pruning unit 22 measures the value of the feature extractors (S13). For example, the pruning unit 22 may take the output of the Infinite Feature Selection filter algorithm as the value score of each feature. According to one embodiment, the response of each neuron is calculated by convolution. The pruning unit 22 may leverage the weights of the deep neural network to map the value of the feature extractors. For a neuron A, the value score of the neuron is propagated backward to the neurons of the previous layer (an FC layer or a convolutional layer) that are connected to neuron A, in proportion to the weights of the connections. Given the vector of value scores for the neurons of an FC layer, the pruning unit 22 can prune neurons based on their rank.
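The weight-proportional back propagation described above can be sketched for a single FC layer as follows. This is a minimal illustration rather than the patented procedure: the function name `backpropagate_value_scores`, the list-of-lists weight layout, and the use of the absolute weight as the proportionality factor are all assumptions made for the example.

```python
def backpropagate_value_scores(weights, scores):
    """Distribute each neuron's value score to the previous layer.

    weights[k][j] is the connection weight from previous-layer neuron j
    to current-layer neuron k; scores[k] is the value score of neuron k.
    """
    n_prev = len(weights[0])
    prev_scores = [0.0] * n_prev
    for k, row in enumerate(weights):
        for j, w in enumerate(row):
            # Assumption: "in proportion to the weight" is taken as |w|.
            prev_scores[j] += abs(w) * scores[k]
    return prev_scores


# Two current-layer neurons, each connected to two previous-layer neurons.
weights = [[1.0, -2.0], [0.5, 0.0]]
scores = [1.0, 2.0]
prev_scores = backpropagate_value_scores(weights, scores)
```

Each previous-layer neuron accumulates contributions from every neuron it feeds, so a neuron connected by large weights to high-scoring neurons ends up with a high score itself.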

Once the value scores have been calculated, the pruning unit 22 prunes the less important kernels based on the ranking of the value scores of the output channels (S14). According to one embodiment, the pruning ratio used by the pruning unit 22 may be determined by a predetermined parameter that takes into account the balance between classification performance and model redundancy. The pruning unit 22 then performs any necessary fine-tuning on the pruned deep neural network (S15).
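Rank-based pruning with a fixed pruning ratio might look like the sketch below. The function name `select_kept_channels`, the `prune_ratio` parameter, and the rounding-down of the number of pruned channels are hypothetical choices for illustration; the source only states that the ratio is set by a predetermined parameter.

```python
def select_kept_channels(value_scores, prune_ratio):
    """Return the indices of the output channels that survive pruning.

    value_scores[f] is the value score of the f-th output channel and
    prune_ratio is the fraction of channels to remove (0 <= ratio < 1).
    """
    if not 0.0 <= prune_ratio < 1.0:
        raise ValueError("prune_ratio must be in [0, 1)")
    n_prune = int(len(value_scores) * prune_ratio)  # assumption: round down
    # Rank channels from least to most valuable.
    ranked = sorted(range(len(value_scores)), key=lambda f: value_scores[f])
    pruned = set(ranked[:n_prune])
    return [f for f in range(len(value_scores)) if f not in pruned]


kept = select_kept_channels([0.9, 0.1, 0.5, 0.3], prune_ratio=0.5)
```

The kernels that produce the surviving channels would then form the smaller network that is fine-tuned in step S15.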

FIG. 5 is a flowchart showing an example of a pruning method performed by the pruning unit 22 of the compression device 152 for a trained deep artificial neural network of a video coding tool according to another embodiment of the present invention. The pruning method of this embodiment differs from the embodiment described above in that it uses back propagation of the value scores.

Referring to FIG. 5, the pruning unit 22 identifies the coding tool described by the trained deep neural network (S21). That is, the pruning unit 22 first identifies what type of coding tool the trained deep neural network represents. For example, the pruning unit 22 may identify whether the coding tool relates to a coding function performed in both encoding and decoding, or to a coding function applied only to encoding and not to decoding.

The pruning unit 22 then extracts features of the high-level layers of the trained deep neural network of the coding tool, taking the identified type of the coding tool into account (S22). This step corresponds to extracting the deep neural network responses from the trained deep neural network. The detailed algorithm applied when extracting the features may differ depending on whether the coding tool is applied to both encoding and decoding or only to encoding.

The pruning unit 22 then calculates a value score for each feature extractor (S23). For example, the pruning unit 22 may calculate the value scores using Equation 1 described above. The calculated value scores can be used to selectively prune the kernels and neurons of the deep neural network, and in this embodiment pruning starts from the high-level layers.

Next, the pruning unit 22 propagates the value scores of the selected feature extractors backward (S24). In doing so, the pruning unit 22 may ignore neurons and kernels that have already been pruned away. If the deep neural network has no FC layer before the final classifier, the pruning unit 22 may perform feature selection on the flattened response of the last convolutional layer.

In general, the pruning unit 22 may use value back propagation to transfer value from the downstream kernels and neurons of the deep neural network to its upstream kernels and neurons. For example, when a particular neuron has a given value, the pruning unit 22 identifies the neurons that were used to compute that neuron's activation and propagates the neuron's value backward to them in proportion to the weights corresponding to the operation of the layer.
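The layer-by-layer flow of steps S23 to S25 can be sketched as below, under several assumptions: the network is a plain stack of FC layers, the same pruning ratio is applied at every layer, and the absolute weight is used as the proportionality factor. The helper names `keep_top` and `propagate_and_prune` are hypothetical. The point of the sketch is that units already pruned contribute nothing to the scores of the layer below.

```python
def keep_top(scores, prune_ratio):
    """Indices of the units kept after removing the lowest-scoring fraction."""
    n_prune = int(len(scores) * prune_ratio)
    ranked = sorted(range(len(scores)), key=lambda i: scores[i])
    pruned = set(ranked[:n_prune])
    return [i for i in range(len(scores)) if i not in pruned]


def propagate_and_prune(layer_weights, top_scores, prune_ratio):
    """Prune from the top layer downward, back-propagating value scores.

    layer_weights[l][k][j] is the weight from unit j of layer l to unit k
    of layer l + 1. Returns the kept indices per layer, top layer first.
    """
    scores = list(top_scores)
    kept_per_layer = []
    for weights in reversed(layer_weights):
        kept = keep_top(scores, prune_ratio)
        kept_per_layer.append(kept)
        prev_scores = [0.0] * len(weights[0])
        for k in kept:  # pruned units are ignored here
            for j, w in enumerate(weights[k]):
                prev_scores[j] += abs(w) * scores[k]  # assumption: |w|
        scores = prev_scores
    kept_per_layer.append(keep_top(scores, prune_ratio))
    return kept_per_layer


# One weight matrix: a 3-unit layer feeding a 2-unit top layer.
layer_weights = [[[1.0, 0.0, 2.0], [0.0, 50.0, 0.0]]]
kept_per_layer = propagate_and_prune(layer_weights, [1.0, 0.1], 0.5)
```

In this example the second top-layer unit is pruned first, so its large weight to the middle unit of the lower layer no longer protects that unit from being pruned.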

The pruning unit 22 then performs pruning based on the back-propagated value scores (S25), and performs any necessary fine-tuning on the pruned deep neural network (S26).

Referring again to FIG. 2, the quantization unit 24 quantizes the outputs of the pruning unit 22, that is, the weights. To this end, it performs quantization based on the input quantization bits, extracting the maximum and minimum values (max/min values) and outputting the quantized weights.
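One common way to quantize weights from a bit count and the extracted minimum and maximum is uniform min/max quantization, sketched below. The source does not give the exact mapping, so the evenly spaced levels, the rounding rule, and the names `quantize_min_max` and `dequantize` are assumptions for illustration.

```python
def quantize_min_max(weights, bits):
    """Map each weight to an integer index in [0, 2**bits - 1]."""
    if bits < 1:
        raise ValueError("bits must be at least 1")
    w_min, w_max = min(weights), max(weights)
    n_levels = 2 ** bits
    # Assumption: levels are evenly spaced between the min and max values.
    step = (w_max - w_min) / (n_levels - 1) if w_max > w_min else 1.0
    indices = [min(n_levels - 1, round((w - w_min) / step)) for w in weights]
    return indices, w_min, step


def dequantize(indices, w_min, step):
    """Reconstruct approximate weights from the integer indices."""
    return [w_min + i * step for i in indices]


indices, w_min, step = quantize_min_max([0.0, 0.25, 1.0], bits=2)
restored = dequantize(indices, w_min, step)
```

Only the integer indices, the minimum value, and the step size need to be stored to reconstruct the weights approximately.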

According to one aspect of this embodiment, the quantization unit 24 may perform adaptive quantization, which reduces the quantization error by taking into account the distribution of the parameters input for quantization. In the adaptive quantization process, a compressed integer weight file and a codebook are input, and the quantization levels (r_k) and the quantization region boundaries (d_k) may be expressed by Equation 2.

An example of such an adaptive quantization process is shown in FIG. 6.

In the adaptive quantization process shown in FIG. 6, non-uniform quantization may be replaced with uniform quantization if the quantization error is not sufficiently low. In this case, the input includes a configuration file indicating the number of layers. If the quantization error of non-uniform quantization is larger than that of uniform quantization, uniform quantization may be used instead of non-uniform quantization.
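The fallback from non-uniform to uniform quantization can be sketched as a comparison of reconstruction errors. The sketch assumes that the codebook holds the non-uniform reconstruction levels, that each weight maps to its nearest level, and that the error is measured as mean squared error; none of these details are specified in the source, and the name `choose_quantizer` is hypothetical.

```python
def mean_squared_error(original, restored):
    return sum((a - b) ** 2 for a, b in zip(original, restored)) / len(original)


def nearest(value, levels):
    """Closest reconstruction level to the given value."""
    return min(levels, key=lambda level: abs(level - value))


def choose_quantizer(weights, codebook, bits):
    """Pick non-uniform quantization unless uniform gives a smaller error."""
    if bits < 1:
        raise ValueError("bits must be at least 1")
    w_min, w_max = min(weights), max(weights)
    n_levels = 2 ** bits
    step = (w_max - w_min) / (n_levels - 1)
    uniform_levels = [w_min + k * step for k in range(n_levels)]
    err_nonuniform = mean_squared_error(
        weights, [nearest(w, codebook) for w in weights])
    err_uniform = mean_squared_error(
        weights, [nearest(w, uniform_levels) for w in weights])
    if err_nonuniform > err_uniform:
        return "uniform", err_uniform
    return "non-uniform", err_nonuniform


weights = [0.0, 0.5, 1.0]
poor_choice = choose_quantizer(weights, codebook=[0.0, 0.1], bits=2)
good_choice = choose_quantizer(weights, codebook=[0.0, 0.5, 1.0], bits=2)
```

With a codebook that fits the weight distribution poorly, the comparison selects uniform quantization; with a well-fitted codebook, non-uniform quantization is retained.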

FIG. 7 is a flowchart showing a quantization process according to another embodiment of the present invention. The quantization process shown in FIG. 7 extends and supplements the existing quantization process, which stores the quantized (integer) values of the neural network weights in binarized form. In the quantization process of FIG. 7, lossy coding may use techniques other than quantization, for example a method that codes the weight matrix in partitions, in which a partition map with good coding efficiency (estimated by rate-distortion (RD) cost or the like) is determined. In addition, the quantized weight values are converted into a binarized file by entropy coding, which is lossless coding (arithmetic coding, CABAC, palette coding, index map coding, and the like). In the decoding (or decompression) process, reconstruction proceeds with the binarized file (bitstream) or the like as input, and information on whether to reconstruct the neural network model in full or only in part may also be included.

Referring again to FIG. 2, the entropy coding unit 26 encodes each of the quantized weights and indices according to a predetermined algorithm (for example, entropy coding) and outputs the resulting bitstream of the compressed deep neural network. In this embodiment, the specific entropy coding procedure is not particularly limited; any algorithm known in the art may be applied without limitation, unless it is inherently inapplicable to entropy coding.
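Since the source leaves the entropy coder open, the sketch below does not implement one. It only estimates the empirical entropy of the quantized indices, which is the average number of bits per symbol that an ideal entropy coder for this symbol distribution would approach. The function name `empirical_entropy_bits` is hypothetical.

```python
import math
from collections import Counter


def empirical_entropy_bits(symbols):
    """Shannon entropy of the symbol distribution, in bits per symbol."""
    counts = Counter(symbols)
    total = len(symbols)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Two equally likely quantization indices need about one bit each.
bits_per_symbol = empirical_entropy_bits([0, 0, 1, 1])
```

A skewed index distribution, such as one dominated by zeros after pruning, yields a lower entropy and therefore a shorter bitstream for the same number of weights.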

As described above, the foregoing description is merely an embodiment and should not be construed as limiting. The technical idea of the present invention should be defined only by the invention recited in the claims below, and all technical ideas within a scope equivalent thereto should be construed as falling within the scope of the present invention. It will therefore be apparent to those skilled in the art that the embodiments described above can be modified and implemented in various forms.

Claims

A compression device for a trained deep artificial neural network of a video coding tool, comprising:
a pruning unit for selectively pruning a trained deep neural network of a coding tool constituting video coding;
a quantization unit for quantizing the trained deep neural network pruned by the pruning unit; and
an entropy coding unit for entropy-coding the trained deep neural network quantized by the quantization unit and outputting it as a bitstream,
wherein the pruning unit identifies the type of the coding tool and then calculates the value of the elements describing the trained deep neural network in consideration of the identified type of the coding tool, the value being calculated by reflecting back propagation through the deep neural network.

The compression device of claim 1,
wherein the trained deep neural network includes a plurality of output characteristics, a plurality of neurons, and a plurality of convolution kernels, and
one or more of the elements describing the trained deep neural network include at least one of the plurality of output characteristics, the plurality of neurons, and the plurality of convolution kernels.

The compression device of claim 2,
wherein the trained deep neural network includes a plurality of neurons present in a fully connected layer.

A method for compressing a trained deep artificial neural network of a video coding tool, comprising:
a pruning step of selectively pruning a trained deep neural network of a coding tool constituting video coding;
a quantization step of quantizing the trained deep neural network pruned in the pruning step; and
an entropy coding step of entropy-coding the trained deep neural network quantized in the quantization step and outputting it as a bitstream,
wherein, in the pruning step, the type of the coding tool is identified and the value of the elements describing the trained deep neural network is then calculated in consideration of the identified type of the coding tool, the value being calculated by reflecting back propagation through the deep neural network.

The method of claim 4,
wherein the trained deep neural network includes a plurality of output characteristics, a plurality of neurons, and a plurality of convolution kernels, and
one or more of the elements describing the trained deep neural network include at least one of the plurality of output characteristics, the plurality of neurons, and the plurality of convolution kernels.

The method of claim 5,
wherein the trained deep neural network includes a plurality of neurons present in a fully connected layer.