KR20200035879A - Method and apparatus for image processing using context-adaptive entropy model

Info

Publication number: KR20200035879A
Application number: KR1020190118058A
Authority: KR
Inventors: 이주영; 조승현; 백승권; 고현석; 김연희; 김종호; 석진욱; 임웅; 정세윤; 김휘용; 최진수
Original assignee: 한국전자통신연구원
Priority date: 2018-09-27
Filing date: 2019-09-25
Publication date: 2020-04-06

Abstract

Provided is a context-adaptive entropy model for end-to-end optimized image compression. According to an embodiment, the model uses two types of contexts: bit-consuming contexts and bit-free contexts, which are classified according to whether they require additional bit allocation. Based on these contexts, the model can estimate the distribution of each latent representation more accurately while using a more generalized form of approximation model, thereby improving compression performance.

Description

Method and apparatus for image processing using a context-adaptive entropy model {METHOD AND APPARATUS FOR IMAGE PROCESSING USING CONTEXT-ADAPTIVE ENTROPY MODEL}

The following embodiments relate to a video decoding method, a decoding apparatus, an encoding method, and an encoding apparatus, and more particularly to a decoding method, a decoding apparatus, an encoding method, and an encoding apparatus that use a context-adaptive entropy model.

Recently, artificial neural networks (ANNs) have been applied to a wide range of fields and have achieved many breakthroughs owing to their excellent optimization and representation learning performance.

In particular, many ANN-based studies have been conducted, and great progress has been made, on various problems that are intuitive and can therefore be solved by humans in a short time.

For image compression, however, only relatively slow progress has been made because of the complexity of its target problems.

Furthermore, because of the enormous amount of heuristics accumulated through the decades-long standardization history and the firm structures of traditional codecs, most work related to image compression has focused on improving the quality of reconstructed images.

For example, several approaches have proposed methods of reducing image compression artifacts by relying on the superior image restoration capability of ANNs.

Although there is no doubt that artifact reduction is one of the most promising areas for exploiting the advantages of ANNs, such approaches can be viewed as a kind of post-processing rather than as image compression itself.

An embodiment may provide an ANN-based encoding apparatus, encoding method, decoding apparatus, and decoding method that exhibit better performance than traditional image codecs.

In one aspect, there is provided an encoding method including: generating a bitstream by performing entropy encoding on an input image using an entropy model; and transmitting or storing the bitstream.

The entropy model may be a context-adaptive entropy model.

The context-adaptive entropy model may use a plurality of different types of contexts.

The plurality of different types of contexts may include a bit-consuming context and a bit-free context.

A standard deviation parameter and a mean parameter of the entropy model may be estimated from the plurality of different types of contexts.

The input to the analysis transform of the context-adaptive entropy model may be uniformly quantized representations.

The entropy model may be based on a Gaussian model having a mean parameter.

The entropy model may include a context-adaptive entropy model and a lightweight entropy model.

The latent representation used in the entropy model may be divided into a first partial latent representation and a second partial latent representation.

The first partial latent representation may be quantized into a first quantized partial latent representation.

The second partial latent representation may be quantized into a second quantized partial latent representation.

The first quantized partial latent representation may be encoded using the context-adaptive entropy model.

The second quantized partial latent representation may be encoded using the lightweight entropy model.

The lightweight entropy model may use scale estimation.

The lightweight entropy model may retrieve standard deviations estimated directly from the analysis transform.
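The role of the mean and standard deviation parameters of a Gaussian entropy model can be illustrated with a minimal sketch. It assumes a common formulation in which the probability of an integer-quantized value is the Gaussian mass on the unit-width interval around that value; this formulation and the names `gaussian_cdf` and `estimated_bits` are illustrative assumptions and are not stated in the passage above.

```python
import math


def gaussian_cdf(x, mean, std):
    """Cumulative distribution function of a Gaussian with the given mean and std."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))


def estimated_bits(q, mean, std):
    """Assumed rate model: -log2 of the Gaussian mass on [q - 0.5, q + 0.5]."""
    mass = gaussian_cdf(q + 0.5, mean, std) - gaussian_cdf(q - 0.5, mean, std)
    return -math.log2(max(mass, 1e-12))  # clamp to avoid log2(0)


q = 3  # one uniformly quantized (rounded) value
poor_fit = estimated_bits(q, mean=0.0, std=1.0)   # mean far from the value
good_fit = estimated_bits(q, mean=3.0, std=1.0)   # mean matches the value
sharp_fit = estimated_bits(q, mean=3.0, std=0.5)  # matching mean, smaller std
print(poor_fit, good_fit, sharp_fit)
```

Under these assumptions, the closer the estimated mean is to the actual value and the smaller the estimated standard deviation, the fewer bits the value is expected to cost.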

In another aspect, there is provided a decoding apparatus including: a communication unit that obtains a bitstream; and a processing unit that generates a reconstructed image by performing decoding on the bitstream using an entropy model.

In yet another aspect, there is provided a decoding method including: obtaining a bitstream; and generating a reconstructed image by performing decoding on the bitstream using an entropy model.

An ANN-based encoding apparatus, encoding method, decoding apparatus, and decoding method that exhibit better performance than traditional image codecs are provided.

FIG. 1 shows an operation of a convolution layer according to an example.
FIG. 2 shows an operation of a pooling layer according to an example.
FIG. 3 shows an operation of a deconvolution layer according to an example.
FIG. 4 shows an operation of an un-pooling layer according to an example.
FIG. 5 shows an operation of a ReLU layer according to an example.
FIG. 6 shows an autoencoder according to an example.
FIG. 7 shows a convolutional encoder and a convolutional decoder according to an example.
FIG. 8 shows an encoder according to an embodiment.
FIG. 9 shows a decoder according to an embodiment.
FIG. 10 shows an implementation of an autoencoder according to an embodiment.
FIG. 11 shows the structure of a hybrid network for higher bit-rate environments according to an example.
FIG. 12 is a structural diagram of an encoding apparatus according to an embodiment.
FIG. 13 is a structural diagram of a decoding apparatus according to an embodiment.
FIG. 14 is a flowchart of an encoding method according to an embodiment.
FIG. 15 is a flowchart of a decoding method according to an embodiment.

The present invention may be modified in various ways and may have various embodiments; specific embodiments are illustrated in the drawings and described in detail in the detailed description. However, this is not intended to limit the present invention to specific embodiments, and the present invention should be understood to include all modifications, equivalents, and substitutes falling within its spirit and technical scope.

The detailed description of exemplary embodiments below refers to the accompanying drawings, which show specific embodiments by way of example. These embodiments are described in sufficient detail to enable those skilled in the art to practice them. It should be understood that the various embodiments differ from one another but need not be mutually exclusive. For example, specific shapes, structures, and characteristics described herein in connection with one embodiment may be implemented in another embodiment without departing from the spirit and scope of the present invention. In addition, it should be understood that the location or arrangement of individual components within each disclosed embodiment may be changed without departing from the spirit and scope of the embodiment. Therefore, the following detailed description is not to be taken in a limiting sense, and the scope of the exemplary embodiments, if properly described, is limited only by the appended claims, along with the full range of equivalents to which those claims are entitled.

In the drawings, similar reference numerals refer to the same or similar functions throughout the various aspects. The shapes and sizes of elements in the drawings may be exaggerated for clarity of explanation.

In the present invention, terms such as "first" and "second" may be used to describe various components, but the components should not be limited by these terms. The terms are used only to distinguish one component from another. For example, without departing from the scope of the present invention, a first component may be referred to as a second component, and similarly, a second component may be referred to as a first component. The term "and/or" may include a combination of a plurality of related listed items or any one of the plurality of related listed items.

When a component is said to be "connected" or "coupled" to another component, it should be understood that the two components may be directly connected or coupled to each other, but that another component may also exist between them. In contrast, when a component is said to be "directly connected" or "directly coupled" to another component, it should be understood that no other component exists between the two components.

The components shown in the embodiments of the present invention are illustrated independently in order to represent different characteristic functions; this does not mean that each component consists of separate hardware or a single software unit. That is, the components are listed separately for convenience of description. At least two of the components may be combined into one component, or one component may be divided into a plurality of components that perform its functions. Such integrated embodiments and separated embodiments of the components are also included in the scope of the present invention as long as they do not depart from its essence.

In addition, a statement in the exemplary embodiments that a specific configuration is "included" does not exclude configurations other than that configuration, and means that additional configurations may be included in the practice of the exemplary embodiments or in the scope of their technical spirit.

The terms used in the present invention are used only to describe specific embodiments and are not intended to limit the present invention. A singular expression includes the plural unless the context clearly indicates otherwise. In the present invention, terms such as "include" or "have" are intended to specify the presence of the features, numbers, steps, operations, components, parts, or combinations thereof described in the specification, and should be understood not to exclude in advance the presence or possible addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof. That is, a statement in the present invention that a specific configuration is "included" does not exclude configurations other than that configuration, and means that additional configurations may also be included in the practice of the present invention or in the scope of its technical spirit.

Some components of the present invention may not be essential components that perform essential functions of the present invention, but may be optional components that merely improve performance. The present invention may be implemented with only the components essential to realizing its essence, excluding components used merely to improve performance. A structure that includes only the essential components, excluding optional components used merely to improve performance, is also included in the scope of the present invention.

Hereinafter, embodiments are described in detail with reference to the accompanying drawings so that those of ordinary skill in the art can easily practice them. In describing the embodiments, a detailed description of a related known configuration or function is omitted when it is determined that the description may obscure the subject matter of this specification. In addition, the same reference numerals are used for the same components in the drawings, and duplicate descriptions of the same components are omitted.

Hereinafter, an image may mean one picture constituting a video, or may represent the video itself. For example, "encoding and/or decoding of an image" may mean "encoding and/or decoding of a video", and may also mean "encoding and/or decoding of one of the images constituting a video".

Hereinafter, the term "encoding" may be used with the meaning of coding.

Convolution layer

FIG. 1 shows an operation of a convolution layer according to an example.

The convolution layer may perform filtering on an input frame and may output a feature map as the result of the filtering. The feature map may be used as the input to the next layer. With this structure, the input frame may be processed successively by a plurality of layers.

In the convolution layer, a kernel may refer to a filter that performs a convolution operation or filtering. The size of the kernel may be referred to as the kernel size or filter size. The operation parameters constituting the kernel may also be referred to as weights, kernel parameters, or filter parameters.

In the convolution layer, different types of filters may be used for one input. In this case, the process in which one filter processes the input may be referred to as a convolution channel.

As shown in FIG. 1, the convolution layer may reduce as many samples as the kernel size to a single sample. In FIG. 1, the size of the illustrated kernel may be 3x3. In other words, FIG. 1 shows the process in which a convolution operation is performed in one channel by a filter having a kernel size of 3x3.

In FIG. 1, the operation may be performed on the rectangle with the dark border in the input image. Here, a window may refer to an operation region such as the rectangle with the dark border. The window may move one position at a time from the upper left to the lower right of the frame, and the step size of the movement may be adjusted.

Stride and padding may be used for the filter of the convolution operation.

The stride may mean the step size of the movement. The value of the stride illustrated in FIG. 1 may be 1. When the value of the stride is 2, the operations may be performed on windows spaced 2 positions apart.

Padding may enlarge the input image, and may mean filling specified values into the top, bottom, left, and right sides of the input image.
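The window, stride, and padding behavior described above can be sketched for a single channel as follows. This is a minimal illustration rather than the implementation of the embodiments: the function names `pad2d` and `conv2d`, the zero padding value, and the 3x3 averaging kernel are assumptions chosen for the example.

```python
def pad2d(image, padding, value=0.0):
    """Surround a 2-D list with `padding` rows and columns filled with `value`."""
    width = len(image[0]) + 2 * padding
    blank = [[value] * width for _ in range(padding)]
    rows = [[value] * padding + list(row) + [value] * padding for row in image]
    return [r[:] for r in blank] + rows + [r[:] for r in blank]


def conv2d(image, kernel, stride=1, padding=0):
    """Slide `kernel` over `image`; each window is reduced to one output sample."""
    if padding > 0:
        image = pad2d(image, padding)
    kh, kw = len(kernel), len(kernel[0])
    out_h = (len(image) - kh) // stride + 1
    out_w = (len(image[0]) - kw) // stride + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            top, left = i * stride, j * stride  # upper-left corner of the window
            acc = 0.0
            for a in range(kh):
                for b in range(kw):
                    acc += image[top + a][left + b] * kernel[a][b]
            row.append(acc)
        out.append(row)
    return out


image = [[float(6 * r + c) for c in range(6)] for r in range(6)]  # 6x6 input
kernel = [[1.0 / 9.0] * 3 for _ in range(3)]  # hypothetical 3x3 averaging filter

plain = conv2d(image, kernel)              # stride 1, no padding
strided = conv2d(image, kernel, stride=2)  # windows 2 positions apart
padded = conv2d(image, kernel, padding=1)  # input enlarged before filtering
print(len(plain), len(plain[0]))      # 4 4
print(len(strided), len(strided[0]))  # 2 2
print(len(padded), len(padded[0]))    # 6 6
```

With a 3x3 kernel, a stride of 1 and a padding of 1 keep the output the same size as the 6x6 input, while a stride of 2 without padding reduces it to 2x2.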

Pooling layer

FIG. 2 shows an operation of a pooling layer according to an example.

Pooling may mean subsampling of the feature map obtained by the operation in the convolution layer.

As shown in FIG. 2, the pooling layer may select a representative sample for samples of a specified size passing through the pooling layer.

In pooling, the size of the stride and the size of the window may generally be the same.

Pooling may include max pooling and average pooling.

Max pooling may select, as the representative sample, the sample having the maximum value among samples of a specified size. For example, for 2x2 samples, the maximum value among the samples may be selected as the representative sample.

Average pooling may set the average value of samples of a specified size as the representative sample.

The pooling layer shown in FIG. 2 may perform max pooling. For example, the pooling layer may select one sample from among the samples of a window having a size of 2x2. Through this selection, the width and height of the output from the pooling layer may be half the width and height of the input to the pooling layer.

As illustrated in FIG. 2, the stride and the window size may both be set to 2. For example, when values of size [h, w, n] are input to the pooling layer, the size of the values output from the pooling layer may be [h/2, w/2, n].
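Max pooling and average pooling with a window size and stride of 2 can be sketched for one channel as follows. The function name `pool2d` and the sample values are assumptions made for illustration; for an input of size [h, w, n], the same operation would be applied to each of the n channels.

```python
def pool2d(feature_map, window=2, mode="max"):
    """Subsample one channel; the stride is equal to the window size."""
    h, w = len(feature_map), len(feature_map[0])
    out = []
    for top in range(0, h - window + 1, window):
        row = []
        for left in range(0, w - window + 1, window):
            block = [feature_map[top + a][left + b]
                     for a in range(window) for b in range(window)]
            if mode == "max":
                row.append(max(block))               # max pooling
            else:
                row.append(sum(block) / len(block))  # average pooling
        out.append(row)
    return out


fmap = [[1, 3, 2, 0],
        [5, 4, 1, 1],
        [0, 2, 9, 6],
        [1, 1, 7, 8]]

max_pooled = pool2d(fmap, mode="max")
avg_pooled = pool2d(fmap, mode="average")
print(max_pooled)  # [[5, 2], [2, 9]]
print(avg_pooled)  # [[3.25, 1.0], [1.0, 7.5]]
```

The 4x4 input becomes a 2x2 output in both modes, which matches the halving of width and height described above.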

Deconvolution layer

FIG. 3 shows an operation of a deconvolution layer according to an example.

The deconvolution layer may perform an operation in the direction opposite to that of the operation of the convolution layer. Except for the direction, the operation of the convolution layer and the operation of the deconvolution layer may be regarded as identical.

The deconvolution layer may perform a convolution operation on an input feature map and may output a frame through the convolution operation.

The size of the output frame may vary according to the value of the stride. For example, when the value of the stride is 1, the width and height of the frame may be the same as the width and height of the feature map. When the value of the stride is 2, the width and height of the frame may be 1/2 of the width and height of the feature map.

Un-pooling layer

FIG. 4 shows an operation of an un-pooling layer according to an example.

The un-pooling layer may perform up-sampling in the direction opposite to that of the pooling in the pooling layer. In contrast to the pooling layer, the un-pooling layer may expand the dimension. In other words, the un-pooling layer may expand a sample passing through it into samples of a specified size. For example, a sample passing through the un-pooling layer may be expanded into the samples of a 2x2 window.

For example, when values of size [h, w, n] are input to the un-pooling layer, the size of the values output from the un-pooling layer may be [h*2, w*2, n].
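The expansion of each sample into a 2x2 window can be sketched as follows for one channel. The text does not state how the expanded window is filled, so this sketch assumes that the input sample is simply copied to every position of the window; the function name `unpool2d` is likewise hypothetical.

```python
def unpool2d(feature_map, factor=2):
    """Expand each sample into a factor x factor block of copies (assumed fill rule)."""
    out = []
    for row in feature_map:
        wide = [value for value in row for _ in range(factor)]  # widen the row
        out.extend([wide[:] for _ in range(factor)])            # repeat the row
    return out


small = [[5, 2],
         [2, 9]]
large = unpool2d(small)
print(large)
# [[5, 5, 2, 2], [5, 5, 2, 2], [2, 2, 9, 9], [2, 2, 9, 9]]
```

The 2x2 input becomes a 4x4 output, corresponding to the change from [h, w, n] to [h*2, w*2, n] for each channel.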

Nonlinear operation layer

FIG. 5 shows an operation of a ReLU layer according to an example.

The left side of FIG. 5 shows an example of the values input to the ReLU layer, and the right side of FIG. 5 shows an example of the values output from the ReLU layer.

The ReLU layer may perform a nonlinear operation such as that shown in FIG. 5. In embodiments, the ReLU layer may be replaced with another nonlinear operation layer.

The ReLU layer may generate output values by applying a transfer function to the input values.

The size of the values input to the ReLU layer and the size of the values output from the ReLU layer may be the same. In other words, the size of the values passing through the ReLU layer may not change.
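As a minimal sketch, the standard ReLU transfer function max(0, x) can be applied element by element as follows. The function name `relu` and the sample values are chosen for illustration only.

```python
def relu(values):
    """Apply max(0, x) to every element; the shape of the input is preserved."""
    return [[max(0.0, v) for v in row] for row in values]


x = [[-2.0, 1.5],
     [0.0, -0.5]]
y = relu(x)
print(y)  # [[0.0, 1.5], [0.0, 0.0]]
```

Negative inputs are replaced with zero and the 2x2 shape of the input is unchanged.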

Autoencoder

FIG. 6 shows an autoencoder according to an example.

The autoencoder may have a structure such as that shown in FIG. 6 and may be widely used for unsupervised learning.

A convolutional encoder and a convolutional decoder may be derived from the autoencoder.

According to the structure of the autoencoder, the dimension of the input and the dimension of the output may be the same. The purpose of the autoencoder may be to learn f() such that f(X) = X holds, where X may be an input value. In other words, the purpose of the autoencoder may be to make the output predicted value X' approximate the input value X.

The autoencoder may include an encoder and a decoder. The encoder may provide a code, or latent variable, as the output value for the input value X. The code may be used as a feature vector for the input value X. The code may be input to the decoder. The decoder may output the predicted value X' formed from the code.
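The relationship between the input X, the code, and the predicted value X' can be illustrated with a toy example. The encoder and decoder below are fixed, hand-written functions chosen only to show the data flow and the dimensions involved; they are hypothetical and are not learned, whereas an actual autoencoder would obtain both mappings by training so that X' approximates X.

```python
def encoder(x):
    """Hypothetical encoder: average neighbouring pairs (4 values -> 2-value code)."""
    return [(x[i] + x[i + 1]) / 2.0 for i in range(0, len(x), 2)]


def decoder(code):
    """Hypothetical decoder: repeat each code value twice (2 values -> 4 values)."""
    return [c for c in code for _ in range(2)]


def reconstruction_error(x, x_hat):
    """Mean squared error between the input X and the predicted value X'."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


x = [1.0, 1.2, 4.0, 3.8]   # input value X
code = encoder(x)          # lower-dimensional code (feature vector)
x_hat = decoder(code)      # predicted value X', same dimension as X
error = reconstruction_error(x, x_hat)
print(code, x_hat, error)
```

Training an autoencoder amounts to adjusting the encoder and decoder so that a reconstruction error of this kind becomes small over the training data.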

Convolutional encoder and convolutional decoder

FIG. 7 shows a convolutional encoder and a convolutional decoder according to an example.

The structures of the convolutional encoder and the convolutional decoder may consist of pairs of a convolution layer and a deconvolution layer. Like the autoencoder, the convolutional encoder and the convolutional decoder may provide an input, a feature vector, and an output.

The convolutional encoder may include a convolution layer and a pooling layer. The input to the convolutional encoder may be a frame, and the output from the convolutional encoder may be a feature map.

The convolutional decoder may include a deconvolution layer and an un-pooling layer. The input to the convolutional decoder may be a feature map, and the output from the convolutional decoder may be a (reconstructed) frame.

The characteristics of convolution may be reflected in the structures of the convolutional encoder and the convolutional decoder. As a result, the convolutional encoder and the convolutional decoder may have fewer weights to learn. The convolutional encoder and the convolutional decoder may be particularly useful when operations are performed for purposes such as optical flow and counter edge for an output frame.

The convolutional encoder may reduce the dimension by using convolution and pooling, and may generate a feature vector from a frame. The feature vector may be generated at the output end of the convolutional encoder.

The feature vector may be a vector that represents the features of the original signal in a dimension lower than that of the original signal.

The convolutional decoder may reconstruct a frame from the feature vector by using deconvolution and un-pooling.

ANN-based image compression

With regard to ANN-based image compression, the proposed methods can be divided into two streams.

First, as a consequence of the success of generative models, several image compression approaches targeting perceptual quality have been proposed. The basic idea of these approaches is that learning the distribution of natural images enables very high compression without severe perceptual loss, by allowing the generation of image components, such as textures, that do not greatly affect the structure or perceptual quality of the reconstructed image.

However, even though the images generated by these approaches are very realistic, the acceptability of machine-created image components may ultimately be somewhat application-dependent.

Second, end-to-end optimized ANN-based approaches that do not use generative models may be used.

Unlike traditional codecs, which consist of separate tools such as prediction, transform, and quantization, these approaches can provide a comprehensive solution that covers all functions through end-to-end optimization.

For example, one approach may utilize a small number of binary latent representations to contain the compressed information at every step. Each step may stack additional latent representations to achieve a progressive improvement in quality.

Another approach may improve compression performance by enhancing the network structure of the approach described above.

These approaches may provide novel frameworks suitable for quality control with a single trained network. In these approaches, however, an increase in the number of iteration steps may be a burden for some applications.

These approaches may extract binary representations having as high an entropy as possible. In contrast, other approaches regard the image compression problem as how to retrieve discrete latent representations having as low an entropy as possible.

In other words, the target problem of the former approaches may be regarded as how to include as much information as possible in a fixed number of representations, whereas the target problem of the latter approaches may be regarded as how to reduce the expected bit-rate when a sufficient number of representations is given. Here, it may be assumed that low entropy corresponds to a low bit-rate from entropy coding.

To solve the target problem of the latter approaches, the approaches may employ their own entropy models to approximate the actual distribution of the discrete latent representations.

For example, some approaches may propose novel frameworks that exploit entropy models, and the performance of the entropy models may be demonstrated by comparing the results generated by the entropy models with those of existing codecs such as JPEG2000.

In these approaches, it may be assumed that each representation has a fixed distribution. In another approach, an input-adaptive entropy model that estimates the scale of the distribution for each representation may be used. This approach may be based on the characteristic of natural images that the scales of the representations vary together within adjacent areas.

One of the key elements of end-to-end optimized image compression may be a trainable entropy model for the latent representations.

Because the actual distributions of the latent representations are unknown, entropy models may calculate the estimated bits for encoding the latent representations by approximating their distributions.

When an input image $x$ is transformed into a latent representation $y$, and the latent representation $y$ is uniformly quantized into a quantized latent representation $\hat{y}$, a simple entropy model may be expressed as $p_{\hat{y}}(\hat{y})$.

$m(\hat{y})$ may represent the actual marginal distribution of $\hat{y}$. The rate estimation calculated through the cross entropy using the entropy model $p_{\hat{y}}(\hat{y})$ may be expressed as Equation 1 below.

The rate estimation may be decomposed into the actual entropy of $\hat{y}$ and additional bits. In other words, the rate estimation may comprise the actual entropy of $\hat{y}$ and additional bits.

The additional bits may result from the mismatch between the actual distributions and the estimates of these actual distributions.
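The decomposition described above can be checked numerically. The following is a minimal Python sketch, not part of the embodiments: the two four-symbol distributions and all function names are hypothetical and chosen only for illustration. It shows that the cross entropy (the estimated rate) equals the actual entropy plus the KL divergence, where the KL divergence corresponds to the additional bits caused by the mismatch.

```python
import math

def entropy_bits(actual):
    """Entropy of the actual distribution, in bits per symbol."""
    return -sum(p * math.log2(p) for p in actual if p > 0)

def cross_entropy_bits(actual, model):
    """Expected bits when symbols drawn from `actual` are coded with `model`."""
    return -sum(p * math.log2(q) for p, q in zip(actual, model) if p > 0)

def kl_bits(actual, model):
    """Additional bits caused by the mismatch between `actual` and `model`."""
    return sum(p * math.log2(p / q) for p, q in zip(actual, model) if p > 0)

# Toy distributions (assumed for illustration only).
actual = [0.5, 0.25, 0.125, 0.125]   # stands in for the actual marginal distribution
model = [0.25, 0.25, 0.25, 0.25]     # stands in for the entropy model

rate = cross_entropy_bits(actual, model)
extra = kl_bits(actual, model)
print(rate, entropy_bits(actual), extra)
```

In this toy case the additional bits vanish only when the model equals the actual distribution, which is why the quality of the entropy model bounds the achievable rate.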

Thus, when the rate term $R$ decreases during the training process, the entropy model $p_{\hat{y}}(\hat{y})$ may approximate $m(\hat{y})$ as closely as possible, and the other parameters may smoothly transform $x$ into $y$ such that the actual entropy of $\hat{y}$ becomes small.

From the viewpoint of the Kullback-Leibler (KL) divergence, $R$ may be minimized when $p_{\hat{y}}(\hat{y})$ perfectly matches the actual distribution $m(\hat{y})$. This may mean that the compression performance of the above methods essentially depends on the performance of the entropy model.

In an embodiment, to improve performance, a new entropy model that utilizes two types of contexts may be proposed. The two types of contexts may be a bit-consuming context and a bit-free context.

The bit-consuming context and the bit-free context may be distinguished according to whether the context requires additional bit allocation for transmission.

Using these contexts, the proposed entropy model may estimate the distribution of each latent representation more accurately using a more generalized form of the entropy models. In addition, through this accurate estimation, the proposed entropy model may more efficiently reduce the spatial dependencies between adjacent latent representations.

The following effects may be achieved by the embodiments described below.

- A new context-adaptive entropy model framework that incorporates two different types of contexts may be provided.

- Directions for improving the methods of the embodiments may be described in terms of the model capacity and the level of the contexts.

- In the domain of ANN-based image compression, test results that outperform a widely used conventional image codec in terms of peak signal-to-noise ratio (PSNR) may be provided.

In addition, the following descriptions of the embodiments may be given below.

1) Key approaches to end-to-end optimized image compression are introduced, and a context-adaptive entropy model may be proposed.

2) The structures of the encoder and decoder models may be described.

3) The setup and the results of the experiments may be provided.

4) The current state of the embodiments and directions for improvement may be described.

End-to-end optimization based on a context-adaptive entropy model

Entropy models

The entropy models of the embodiment may approximate the distribution of the discrete latent representations. Through this approximation, the entropy models may improve image compression performance.

Some of the entropy models of the embodiment may be assumed to be non-parametric models, and others may be Gaussian scale mixture models composed of six weighted zero-mean Gaussian models per representation.

Although the forms of the entropy models are assumed to differ, the entropy models may share the common feature of concentrating on learning the distributions of the representations without considering input adaptivity. In other words, once an entropy model is trained, the models trained for the representations may be fixed for any input at test time.

In contrast, a particular entropy model may employ input-adaptive scale estimation for the representations. In such an entropy model, the assumption may be applied that the scales of the latent representations from natural images tend to move together within an adjacent area.

To reduce this redundancy, the entropy model may use a small amount of additional information. From the additional information, the proper scale parameters (for example, standard deviations) of the latent representations may be estimated.

In addition to the scale estimation, when the prior probability density function (PDF) for each representation in the continuous domain is convolved with a standard uniform density function, the entropy model may more closely approximate the prior probability mass function (PMF) of the discrete latent representation that is uniformly quantized by rounding.
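The effect of convolving a density with a standard uniform density can be illustrated with a Gaussian prior. The following minimal Python sketch assumes a Gaussian PDF and uses hypothetical function names. Convolving the Gaussian density with a uniform density on [-1/2, 1/2] and evaluating the result at an integer gives the difference of two Gaussian CDF values, which is exactly the probability that a Gaussian sample rounds to that integer.

```python
import math

def gaussian_cdf(v, mu, sigma):
    """Cumulative distribution function of a Gaussian with mean mu and std sigma."""
    return 0.5 * (1.0 + math.erf((v - mu) / (sigma * math.sqrt(2.0))))

def likelihood(v, mu, sigma):
    """Gaussian density convolved with a uniform density on [-1/2, 1/2], evaluated at v.

    For integer v this is the probability mass of a Gaussian sample rounding to v.
    """
    return gaussian_cdf(v + 0.5, mu, sigma) - gaussian_cdf(v - 0.5, mu, sigma)

def bits(v, mu, sigma):
    """Estimated number of bits needed to code the value v under the model."""
    return -math.log2(likelihood(v, mu, sigma))

# The masses over all integers sum to one, so the model is a valid PMF.
total_mass = sum(likelihood(k, 0.3, 1.5) for k in range(-50, 51))
print(total_mass, bits(0, 0.0, 1.0), bits(3, 0.0, 1.0))
```

The same function can also be evaluated at non-integer values, which is what makes it usable with noisy representations during training.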

For training, uniform noise may be added to each latent representation. This addition may be intended to fit the distribution of the noisy representations to the PMF-approximating functions mentioned above.

With these approaches, the entropy model may achieve state-of-the-art compression performance close to that of BPG.

Spatial dependencies of latent variables

When latent representations are transformed through a convolutional neural network, the latent representations may inherently contain spatial dependencies, because the same convolution filters are shared across spatial regions and natural images have various factors in common within adjacent regions.

In the entropy model, these spatial dependencies may be successfully captured, and compression performance may be improved, by input-adaptively estimating the standard deviations of the latent representations.

Going one step further, in addition to the standard deviation, the form of the estimated distribution may be generalized through estimation of the mean (mu) that utilizes contexts.

For example, assuming that certain representations tend to have similar values within a spatially adjacent area, when all neighboring representations have a value of 10, it may be intuitively inferred that the current representation is relatively likely to have a value equal or similar to 10. Thus, this simple estimation may reduce the entropy.
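The intuition of the example above can be made concrete with a small calculation. The following Python sketch is only an illustration under stated assumptions: the neighbor average stands in for a learned mean estimate, the standard deviations 10.0 and 1.0 are arbitrary, and the function names are hypothetical. It compares the estimated bits for the value 10 under a zero-mean model and under a model whose mean is taken from the neighbors.

```python
import math

def rounded_gaussian_mass(value, mean, std):
    """Probability that a Gaussian sample with the given mean and std rounds to `value`."""
    def cdf(v):
        return 0.5 * (1.0 + math.erf((v - mean) / (std * math.sqrt(2.0))))
    return cdf(value + 0.5) - cdf(value - 0.5)

def bits_needed(value, mean, std):
    """Estimated bits for coding `value` under the rounded Gaussian model."""
    return -math.log2(rounded_gaussian_mass(value, mean, std))

neighbors = [10, 10, 10, 10]   # all neighboring representations have the value 10
current = 10

# Assumption: the neighbor average is used as a stand-in for an estimated mean.
context_mean = sum(neighbors) / len(neighbors)

# A zero-mean model needs a wide spread to cover the value 10 (std is assumed).
bits_zero_mean = bits_needed(current, mean=0.0, std=10.0)
# A model centred on the context mean can be much narrower (std is assumed).
bits_context_mean = bits_needed(current, mean=context_mean, std=1.0)

print(bits_zero_mean, bits_context_mean)
```

Under these assumed parameters the context-centred model assigns a much larger probability to the current value, so fewer bits are estimated for it.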

Likewise, the entropy model according to the method of the embodiment may use the given contexts to estimate the mu and the standard deviation of each latent representation.

Alternatively, the entropy model may perform context-adaptive entropy coding by estimating the probability of each binary representation.

However, such context-adaptive entropy coding may be viewed as a separate component rather than as one of the end-to-end optimization components, because the probability estimation of the entropy coding does not directly contribute to the rate term of the rate-distortion (R-D) optimization framework.

The latent variables $\hat{y}$ of two different approaches and normalized versions of these latent variables may be illustrated. With the two types of contexts mentioned above, only the standard deviation parameters may be estimated in one approach, and both the mu and the standard deviation parameters may be estimated in the other approach. Here, the spatial dependency may be removed more effectively when mu is estimated together with the given contexts.

Context-adaptive entropy model

In the optimization problem of the embodiment, an input image $x$ may be transformed into a latent representation $y$ having low entropy, and the spatial dependencies of $y$ may be captured into $\hat{z}$. Thus, four fundamental parametric transform functions may be used. The four parametric transform functions of the entropy model are as in 1) to 4) below.

1) An analysis transform $g_a$ for transforming $x$ into the latent representation $y$

2) A synthesis transform $g_s$ for generating the reconstructed image $\hat{x}$

3) An analysis transform $h_a$ for capturing the spatial redundancies of $\hat{y}$ into the latent representation $z$

4) A synthesis transform $h_s$ for generating the contexts for model estimation

In the embodiment, $h_s$ may not directly estimate the standard deviations of the representations. Instead, in the embodiment, $h_s$ may generate the context $c'$, which is one of the two types of contexts, for estimating the distribution. The two types of contexts are described below.

The optimization problem may be analyzed from the viewpoint of a variational autoencoder, and the minimization of the KL divergence may be regarded as the same problem as the R-D optimization of image compression. Basically, the same concept may be employed in the embodiment. In training, however, the embodiment may use discrete representations, instead of noisy representations, for the conditions, and thus the noisy representations may be used only as inputs to the entropy models.

Empirically, using discrete representations for the conditions may yield better results. These results may come from removing the mismatch of the conditions between training time and testing time, and from the training capacity being improved by the removal of this mismatch. The training capacity may be improved by restricting the effect of the uniform noise to only helping the approximation to the probability mass functions.

In the embodiment, a gradient overriding method with the identity function may be used to deal with the discontinuities arising from uniform quantization. The resulting objective functions used in the embodiment are given in Equation 2 below.

In Equation 2, the total loss includes two terms, which represent the rate and the distortion. That is, the total loss may include a rate term $R$ and a distortion term $D$.

The coefficient $\lambda$ may control the balance between the rate and the distortion during R-D optimization.

Here, when $y$ is the result of the transform $g_a$ and $z$ is the result of the transform $h_a$, the noisy representations $\tilde{y}$ and $\tilde{z}$ may follow standard uniform distributions. Here, the mean value of $\tilde{y}$ may be $y$, and the mean value of $\tilde{z}$ may be $z$. In addition, the input to $h_a$ may be $\hat{y}$ rather than the noisy representation $\tilde{y}$. $\hat{y}$ may be the uniformly quantized representations of $y$ obtained by the rounding function $Q$.

The rate term may represent the expected bits calculated with the entropy models $p_{\tilde{y}|\hat{z}}$ and $p_{\tilde{z}}$. $p_{\tilde{y}|\hat{z}}$ may ultimately be an approximation of $p_{\hat{y}|\hat{z}}$, and $p_{\tilde{z}}$ may ultimately be an approximation of $p_{\hat{z}}$.

Equation 4 below may represent the entropy model for approximating the bits required for $\hat{y}$. Equation 4 may be a formal expression of the entropy model.

The entropy model may be based on a Gaussian model having not only a standard deviation parameter $\sigma_i$ but also a mu parameter $\mu_i$.

$\mu_i$ and $\sigma_i$ may be estimated in a deterministic manner from the two types of given contexts by the function $f$. The function $f$ may be a distribution estimator.

The two types of contexts may be a bit-consuming context and a bit-free context. Here, the two types of contexts for estimating the distribution of a certain representation may be denoted by $c'_i$ and $c''_i$.

The extractor $E'$ may extract $c'_i$ from $c'$. $c'$ may be the result of the transform $h_s$.

In contrast to $c'$, no additional bit allocation may be required for $c''_i$. Instead, a known (already entropy-encoded or entropy-decoded) subset of $\hat{y}$ may be utilized. This known subset of $\hat{y}$ may be denoted by $\langle\hat{y}\rangle$.

The extractor $E''$ may extract $c''_i$ from $\langle\hat{y}\rangle$.

The entropy encoder and the entropy decoder may process $\hat{y}_i$ sequentially in the same specific order, such as raster scanning. Therefore, when the same $\hat{y}_i$ is processed, the $\langle\hat{y}\rangle$ given to the entropy encoder and the entropy decoder may always be identical.
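The reason the encoder and the decoder can see the same known subset is that the subset depends only on the scanning order. The following Python sketch illustrates this for a single two-dimensional channel. The window radius, the restriction to one channel, and the function name are assumptions made for illustration and are not taken from the embodiments.

```python
def known_neighbors(row, col, width, radius=1):
    """Positions in the window around (row, col) that precede it in raster order.

    These positions have already been entropy-encoded (at the encoder) or
    entropy-decoded (at the decoder) when (row, col) is processed.
    """
    positions = []
    for r in range(max(0, row - radius), row + 1):
        for c in range(max(0, col - radius), min(width, col + radius + 1)):
            # Tuple comparison matches raster order: earlier row first,
            # then earlier column within the same row.
            if (r, c) < (row, col):
                positions.append((r, c))
    return positions

# The first position has no known neighbors; later positions see only the past.
print(known_neighbors(0, 0, width=3))
print(known_neighbors(1, 1, width=3))
```

Because the function uses only the position and the scan order, both sides obtain the same set of positions for the same element, without any bits being transmitted for this context.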

For $\hat{z}$, a simple entropy model may be used. This simple entropy model may be assumed to follow zero-mean Gaussian distributions having a trainable $\sigma$.

$\hat{z}$ may be regarded as side information, and $\hat{z}$ may contribute only a very small portion of the total bit-rate. Thus, in the embodiment, a simplified version of the entropy model, rather than more complex entropy models, may be used for end-to-end optimization over all parameters of the proposed method.

Equation 5 below shows the simplified version of the entropy model.

As mentioned, the rate term may be an estimate calculated from the entropy models rather than the actual amount of bits. Therefore, actual entropy encoding or entropy decoding processes may not necessarily be required for training or encoding.

Regarding the distortion term, it may be assumed that $p_{x|\hat{y}}$ follows Gaussian distributions, as in widely used distortion metrics. Under this assumption, the distortion term may be calculated using the mean squared error (MSE).
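The combination of the rate term and the distortion term can be sketched as follows. This minimal Python example is an illustration only: it assumes that the coefficient weights the distortion term (the text states only that the coefficient controls the balance), and the function names and sample values are hypothetical.

```python
def mse(original, reconstructed):
    """Mean squared error between two equally sized sequences of pixel values."""
    return sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)

def rd_loss(estimated_bits, original, reconstructed, lam):
    """Total loss as rate plus weighted distortion.

    Assumption: `lam` multiplies the distortion term, so a larger `lam`
    favors reconstruction quality over bit-rate.
    """
    return estimated_bits + lam * mse(original, reconstructed)

# Hypothetical values: 100 estimated bits, two pixels, and lam = 0.5.
loss = rd_loss(100.0, [0.0, 2.0], [1.0, 0.0], lam=0.5)
print(loss)
```

Training one model per value of the coefficient would then yield one operating point on the rate-distortion curve per model.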

Encoder-decoder model

FIG. 8 shows an encoder according to an embodiment.

In FIG. 8, the small icons on the right may represent entropy-encoded bitstreams.

In FIG. 8, EC may denote entropy coding (that is, entropy encoding). U|Q may denote uniform noise addition or uniform quantization.

In addition, in FIG. 8, the noisy representations are shown with dotted lines. In the embodiment, the noisy representations may be used only for training, as inputs to the entropy models.
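The two alternatives, uniform noise addition during training and uniform quantization otherwise, can be sketched as follows. This Python example is a minimal illustration with hypothetical function names. Note that Python's built-in `round` rounds exact halves to the nearest even integer, which is one possible rounding convention and is an assumption here.

```python
import random

def quantize(values):
    """Uniform quantization by rounding each value to the nearest integer."""
    # Python's round() resolves exact .5 ties to the nearest even integer.
    return [float(round(v)) for v in values]

def add_uniform_noise(values, rng):
    """Training-time surrogate: add noise drawn uniformly from [-1/2, 1/2]."""
    return [v + rng.uniform(-0.5, 0.5) for v in values]

rng = random.Random(0)          # fixed seed so the sketch is repeatable
y = [0.2, -1.7, 2.6]            # hypothetical latent values
y_hat = quantize(y)             # used as the condition and at test time
y_tilde = add_uniform_noise(y, rng)  # used only as entropy model input in training
print(y_hat, y_tilde)
```

Both outputs stay within half a unit of the original values, which is why the noisy version can serve as a differentiable stand-in for rounding when the entropy models are fitted.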

The operations and the interaction of the encoder 800 and the decoder 900 are described in more detail below.

FIG. 9 shows a decoder according to an embodiment.

In FIG. 9, the small icons on the left may represent entropy-encoded bitstreams.

ED may denote entropy decoding.

The operations and the interaction of the encoder 800 and the decoder 900 are described in more detail below.

The encoder 800 may transform an input image into latent representations. The encoder 800 may generate quantized latent representations by quantizing the latent representations. In addition, the encoder 800 may generate entropy-encoded latent representations by performing, on the quantized latent representations, entropy encoding that uses a trained entropy model, and may output the entropy-encoded latent representations as a bitstream.

The trained entropy model may be shared between the encoder 800 and the decoder 900. That is, the trained entropy model may also be referred to as a shared entropy model.

The decoder 900, in turn, may receive the entropy-encoded latent representations through the bitstream. The decoder 900 may generate the latent representations by performing, on the entropy-encoded latent representations, entropy decoding that uses the shared entropy model. The decoder 900 may generate a reconstructed image using the latent representations.

For the encoder 800 and the decoder 900, all parameters may be assumed to have already been trained.

The structure of the encoder-decoder model may basically include $g_a$ and $g_s$. $g_a$ may be responsible for transforming $x$ into $y$, and $g_s$ may be responsible for the inverse of the transform performed by $g_a$.

The transformed $y$ may be uniformly quantized into $\hat{y}$ by rounding.

Here, unlike in conventional codecs, in approaches based on entropy models, tuning of the quantization steps may generally be unnecessary because the scales of the representations are jointly optimized by training.

The other components between $g_a$ and $g_s$ may perform the role of entropy encoding (or entropy decoding) with 1) the shared entropy models and 2) the underlying context preparation processes.

More specifically, the entropy model may estimate the distribution of each $\hat{y}_i$ individually. In estimating the distribution of each $\hat{y}_i$, $\mu_i$ and $\sigma_i$ may be estimated with the two types of given contexts, $c'_i$ and $c''_i$.

Among these contexts, $c'$ may be side information that requires additional bit allocation. To reduce the bit-rate required to carry $c'$, the latent representation $z$ transformed from $\hat{y}$ may be quantized and entropy-coded by its own entropy model.

In contrast, $c''_i$ may be extracted from $\langle\hat{y}\rangle$ without any additional bit allocation.

Here, $\langle\hat{y}\rangle$ may change as entropy encoding or entropy decoding proceeds. However, when the same $\hat{y}_i$ is processed, $\langle\hat{y}\rangle$ may always be identical in both the encoder 800 and the decoder 900.

The parameters of $h_s$ and the entropy models may simply be shared by both the encoder 800 and the decoder 900. As shown by the dotted lines in FIG. 8, the inputs to the entropy models during training may be noisy representations. The noisy representations may allow the entropy models to approximate the probability mass functions of the discrete representations.

FIG. 10 shows an implementation of an autoencoder according to an embodiment.

The structures of the encoder 800 and the decoder 900 described above may be represented as the autoencoder 1000 of FIG. 10. That is, in the embodiment, a convolutional autoencoder structure may be used for the encoder 800 and the decoder 900, and the distribution estimator $f$ may also be implemented with convolutional neural networks.

In FIG. 10, convolution is abbreviated as "conv". "GDN" may denote generalized divisive normalization. "IGDN" may denote inverse generalized divisive normalization.

In FIG. 10, leakyReLU may be a function that is a variant of ReLU and for which the degree of leakage is specified. A first set value and a second set value may be set for the leakyReLU function. When the input value is less than or equal to the first set value, the leakyReLU function may output the product of the input value and the second set value rather than outputting the first set value.
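A leaky ReLU of this kind can be sketched in a few lines. The following Python example rests on an interpretation: it assumes that the first set value is a threshold (0 in the usual definition) and that the second set value is a slope by which inputs at or below the threshold are multiplied. The parameter names and the default slope of 0.2 are hypothetical.

```python
def leaky_relu(value, threshold=0.0, slope=0.2):
    """Leaky ReLU sketch.

    Inputs above `threshold` pass through unchanged. Inputs at or below
    `threshold` are scaled by `slope` instead of being clamped to `threshold`.
    """
    if value > threshold:
        return value
    return value * slope

# Positive inputs are unchanged; negative inputs leak through with a small gain.
print(leaky_relu(2.0), leaky_relu(-1.0, slope=0.5), leaky_relu(0.0))
```

Compared with a plain ReLU, the nonzero slope keeps a gradient flowing for negative inputs, which is the usual motivation for this variant.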

In addition, the notation for the convolutional layers used in FIG. 10 may be as follows: number of filters × filter height × filter width (/ downscale or upscale factor).

In addition, ↑ and ↓ may denote upscaling and downscaling, respectively. For upscaling and downscaling, transposed convolution may be used.

Convolutional neural networks may be used to implement the transform and reconstruction functions.

The descriptions given in the other embodiments above may apply to the transform functions shown in FIG. 10. In addition, at the end of $h_s$, an exponentiation operator rather than an absolute-value operator may be used.

Components for estimating the distribution of each $\hat{y}_i$ are added to the convolutional autoencoder.

In FIG. 10, "Q" may denote uniform quantization (rounding). "EC" may denote entropy encoding. "ED" may denote entropy decoding. "$f$" may denote the distribution estimator.

In addition, the convolutional autoencoder may be implemented using convolutional layers. The input to the convolutional layers may be the channel-wise concatenated $c'_i$ and $c''_i$. The convolutional layers may output the estimated $\mu_i$ and the estimated $\sigma_i$ as results.

Here, the same $c'_i$ and $c''_i$ may be shared by all $\hat{y}_i$ located at the same spatial position.

$E'$ may extract all spatially adjacent elements across the channels from $c'$ to retrieve $c'_i$. Similarly, $E''$ may extract all adjacent known elements from $\langle\hat{y}\rangle$ for $c''_i$. These extractions by $E'$ and $E''$ may have the effect of capturing the remaining correlations between different channels.

는 동일한 공간적 위치에서의 1) 모든

, 2)

의 채널들의 총 개수 및 3)

들의 분포들을 단 하나의 단계에서 추출할 수 있으며, 이러한 추출을 통해 추정들의 총 개수가 감소될 수 있다.

1) All in the same spatial location

, 2)

3) the total number of channels of

The distributions of their can be extracted in just one step, and the total number of estimates can be reduced through this extraction.

나아가

의 파라미터들은

의 모든 공간적 위치들에 대하여 공유될 수 있다. 이러한 공유를 통해

당 단지 하나의 훈련된

가 영상들의 임의의 크기를 처리하기 위해 필요할 수 있다.Furthermore

The parameters of

Can be shared for all spatial locations. Through this sharing

Only one trained per

A may be needed to process any size of images.

However, for training, collecting the results from all spatial positions in order to calculate the rate term may be a heavy burden despite the simplifications described above. To reduce this burden, a specified number (for example, 16) of random spatial points may be designated as representatives at every training step of the context-adaptive entropy model. This designation may simplify the calculation of the rate term. These random spatial points may be used only for the rate term; the distortion term may still be calculated over the entire image.
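The sampling of representative spatial points described above can be sketched as follows. This is a minimal illustration, not the implementation of the embodiment: the function names `sample_rate_positions` and `mean_rate_at`, the per-position bit-cost map, and the choice to average the sampled costs are all assumptions made for the example.

```python
import random

def sample_rate_positions(height, width, num_points=16, rng=None):
    """Pick distinct random (row, col) positions to act as representatives.

    Hypothetical helper: the description only states that a specified number
    (for example, 16) of random spatial points is used for the rate term.
    """
    rng = rng or random.Random()
    all_positions = [(r, c) for r in range(height) for c in range(width)]
    return rng.sample(all_positions, min(num_points, len(all_positions)))

def mean_rate_at(bits_per_position, positions):
    """Average the estimated bit cost over the sampled positions only.

    Averaging is an assumption; the distortion term would still be
    computed over the whole image, outside this function.
    """
    sampled = [bits_per_position[r][c] for r, c in positions]
    return sum(sampled) / len(sampled)
```

In such a sketch, a new set of positions would be drawn at every training step, so that all positions are eventually visited over the course of training.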

는 3-차원의 배열(array)이기 -문에,

에 대한 인덱스 i는 3 개의 인덱스들 k, l 및 m을 포함할 수 있다. k는 수평의 인덱스일 수 있다. l는 수직의 인덱스일 수 있다. m는 채널 인덱스일 수 있다.

Is a 3-dimensional array -because,

The index i for may include three indices k , l and m . k may be a horizontal index. l may be a vertical index. m may be a channel index.

현재의 위치가 (k, l, m)일 때,

는

을

로서 추출할 수 있다. 또한,

는

를

로서 추출할 수 있다. 여기에서,

는

의 알려진 영역을 나타낼 수 있다.When the current position is ( k , l , m ),

The

of

Can be extracted. Also,

The

To

Can be extracted. From here,

The

It can represent a known area of.

의 알려지지 않은 영역은 0으로 채워질 수 있다.

의 알려지지 않은 영역을 0으로 채움에 따라,

의 차원이

의 차원과 동일성을 갖도록 유지될 수 있다. 따라서,

는 언제나 0으로 채워질 수 있다.

The unknown region of can be filled with zeros.

By filling the unknown area of 0 with,

The dimension of

It can be maintained to have the same identity as the dimension of. therefore,

Can always be filled with zeros.

추정 결과들의 차원을 입력으로 유지시키기 위해,

및

의 마진의(marginal) 영역들 또한 0으로 세트될 수 있다.To keep the dimensions of the estimation results as input,

And

The marginal areas of can also be set to zero.

훈련 또는 부호화가 수행될 때,

는 단지 단순한 4

4

윈도우들 및 이진(binary) 마스크들을 사용하여 추출될 수 있다. 이러한 추출은 병렬 처리를 가능하게 할 수 있다. 반면, 복호화에서는, 순차적인(sequential) 재구축이 사용될 수 있다.When training or coding is performed,

Is just simple 4

4

It can be extracted using windows and binary masks. Such extraction may enable parallel processing. On the other hand, in decoding, sequential reconstruction may be used.
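The window-and-mask extraction mentioned above can be illustrated with a small sketch. Several details are assumptions made only for the example: a single channel is used, elements are taken to become known in raster-scan order, and the position of the current element inside the 4×4 window (`cur_row`, `cur_col`) is hypothetical. Unknown and out-of-range cells are set to zero, as in the zero-filling described earlier.

```python
def causal_mask(size=4, cur_row=3, cur_col=1):
    """Binary mask: 1 for window cells that precede the current cell in
    raster-scan order (assumed decoding order), 0 otherwise."""
    return [[1 if (i, j) < (cur_row, cur_col) else 0 for j in range(size)]
            for i in range(size)]

def extract_known_window(latent, row, col, size=4, cur_row=3, cur_col=1):
    """Return the size x size window for position (row, col) of a 2-D map,
    with unknown cells and cells outside the map filled with zeros."""
    mask = causal_mask(size, cur_row, cur_col)
    height, width = len(latent), len(latent[0])
    window = []
    for i in range(size):
        out_row = []
        for j in range(size):
            r, c = row - cur_row + i, col - cur_col + j
            inside = 0 <= r < height and 0 <= c < width
            out_row.append(latent[r][c] * mask[i][j] if inside else 0)
        window.append(out_row)
    return window
```

Because every window in this sketch depends only on the already available map and a fixed mask, the windows for all positions could be computed independently during training or encoding, whereas a decoder would have to fill the map position by position.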

As another implementation technique for reducing the implementation cost, a hybrid approach may be used. The entropy model of the embodiment may be combined with a lightweight entropy model. In the lightweight entropy model, the representation components may be assumed to follow a zero-mean Gaussian model with estimated standard deviations.

This hybrid approach may be used for the top four of the nine configurations in descending order of bit rate. Here, it may be assumed that, for higher-quality compression, the number of sparse representation components having very low spatial dependency increases, so that direct scale estimation provides sufficient performance for these added representation components.

구현에 있어서, 은닉 표현성분

는 2 개의 파트들

및

로 분리될 수 있다. 2 개의 상이한 엔트로피 모델들이

및

에 대해서 적용될 수 있다.

,

및

의 파라미터들은 공유될 수 있고, 전체의 파라미터들은 여전히 함께 훈련될 수 있다.In the implementation, the secret expression component

Is 2 parts

And

Can be separated into. Two different entropy models

And

Can be applied against.

,

And

The parameters of can be shared, and the overall parameters can still be trained together.

예를 들면, 5 개의 하위의 구성들에 대하여 파라미터들

의 개수는 182로 세트될 수 있다. 파라미터들

의 개수는 192로 세트될 수 있다. 약간 더 많은 파라미터들이 더 상위의 구성들에 대해서 사용될 수 있다.For example, parameters for 5 sub-configurations

The number of can be set to 182. Parameters

The number of can be set to 192. Slightly more parameters can be used for higher configurations.

For the actual entropy coding, an arithmetic coder may be used. Using the estimated model parameters, the arithmetic coder may generate and reconstruct the bitstream as described above.
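As a brief illustration of why accurate model parameters matter for the arithmetic coder, the ideal code length of a symbol sequence is the sum of the negative base-2 logarithms of the probabilities the model assigns to each symbol, and an arithmetic coder comes close to this length. The helper below, `ideal_code_length_bits`, is a hypothetical name introduced for this sketch and is not an arithmetic coder itself.

```python
import math

def ideal_code_length_bits(probabilities):
    """Sum of -log2(p) over the model probabilities of the coded symbols.

    `probabilities` holds, for each symbol in the sequence, the probability
    that the entropy model assigned to the value that actually occurred.
    """
    return sum(-math.log2(p) for p in probabilities)
```

For example, three symbols that the model predicted with probabilities 0.5, 0.25 and 0.25 would ideally cost 1 + 2 + 2 = 5 bits; a better-matched model assigns higher probabilities and therefore yields a shorter bitstream.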

As described above, based on an ANN-based image compression approach that utilizes an entropy model, the entropy models of the embodiment may be extended to utilize two different types of contexts.

These contexts may allow the entropy model, which has a generalized form with mean (mu) parameters and standard deviations, to estimate the distribution of the representation components more accurately.

The contexts utilized may be divided into two types. One type may be a kind of free context, comprising the part of the latent (hidden) variables known to both the encoder 800 and the decoder 900. The other type may be a context that requires the allocation of additional bits in order to be shared. The former may be contexts commonly used in various codecs. The latter may be one that has been verified to help compression. The embodiment provides a framework of entropy models that utilizes these contexts.

In addition, various methods of further improving the performance of the embodiment may be considered.

One method of improving performance may be to generalize the distribution model underlying the entropy model. In the embodiment, performance may be improved by generalizing the previous entropy models, and quite acceptable results may be obtained. However, Gaussian-based entropy models may clearly have limited expressive power.

For example, when more elaborate models, such as non-parametric models, are combined with the context-adaptivity of the embodiment, the combination may provide better results by reducing the mismatch between the actual distributions and the estimated models.

Another method of improving performance may be to raise the level of the contexts.

The embodiment may use low-level representation components within limited adjacent regions. Given sufficient network capacity and higher-level contexts, the embodiment may enable more accurate estimation.

For example, regarding the structure of human faces, if the entropy model understands that a face generally has two eyes and that the two eyes are symmetric, then, when encoding the remaining eye, the entropy model may approximate the distributions more accurately by referring to the shape and position of the first eye.

예를 들면, 생성적인 엔트로피 모델은, 예를 들면 사람 안면들 및 침실들과 같은 특정한 도메인 내에서의 영상들의 분포

를 학습할 수 있다. 또한, 인--페인팅(in-painting) 방법들은 보이는 영역들이

로 주어졌을 때 조건적인(conditional) 분포

를 학습할 수 있다. 이러한 고-레벨 이해들(understandings)이 실시예에 결합될 수 있다.For example, a generative entropy model is a distribution of images within a specific domain, such as human faces and bedrooms.

Can learn. Also, in-painting methods are used for

Conditional distribution given by

Can learn. These high-level understandings can be incorporated into the embodiment.

Furthermore, the contexts provided through side information may be extended to high-level information, such as a segmentation map and other information that helps compression. For example, a segmentation map may help estimate the distribution of a representation component discriminatively according to the segment class to which it belongs.

FIG. 11 shows the structure of a hybrid network for higher bit-rate environments according to an example.

The notation used in FIG. 11 may be the same as that of the previous embodiment.

In FIG. 11, the first EC/ED may denote entropy encoding and entropy decoding using the context-adaptive entropy model of the above-described embodiment. The second EC/ED may denote entropy encoding and entropy decoding using the lightweight entropy model, which follows a zero-mean Gaussian model.

The hybrid network 1100 may be an ANN that follows the hybrid approach described above.

하이브리드 네트워크(1100)에서, 은닉 표현성분

는 2 개의 파트들 제1 부분 은닉 표현성분

및 제2 부분 은닉 표현성분

로 분할될 수 있고,

는 제1 양자화된 부분 은닉 표현성분

로 양자화될 수 있고,

는 제2 양자화된 부분 은닉 표현성분

로 양자화될 수 있다.In the hybrid network 1100, the hidden expression component

Is two parts, the first part hidden expression component

And the second part hidden expression component

Can be divided into

Is the first quantized partial hidden expression component

Can be quantized to

Is the second quantized partial hidden expression component

Can be quantized to

분할의 결과들 중 하나인

는 실시예의 문맥-적응적 엔트로피 모델을 사용하여 부호화될 수 있다. 반면, 분할의 결과들 중 다른 하나인

는 표준 추정을 사용하는 더 단순한 경량 엔트로피 모델을 사용하여 부호화될 수 있다.One of the results of the split

Can be coded using the context-adaptive entropy model of the embodiment. On the other hand, one of the results of division

Can be coded using a simpler lightweight entropy model using standard estimation.

All concatenation and split operators may be performed in a channel-wise manner.
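The channel-wise split and concatenation can be sketched as below, assuming for illustration that a representation is stored as a list of channels, each channel being a 2-D list. The function names and this storage layout are assumptions of the example, not details taken from the embodiment.

```python
def split_channels(channels, first_count):
    """Split a list of channels into the first `first_count` channels
    and the remaining channels."""
    return channels[:first_count], channels[first_count:]

def concat_channels(*parts):
    """Concatenate several channel lists along the channel axis."""
    merged = []
    for part in parts:
        merged.extend(part)
    return merged
```

In this sketch, splitting and then concatenating the two parts in the same order restores the original channel list, which is what allows the two separately coded parts to be rejoined before reconstruction.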

To reduce the implementation cost for high bits-per-pixel (BPP) configurations, the lightweight entropy model may be combined with the context-adaptive entropy model.

The lightweight entropy model may use scale (that is, standard deviation) estimation, under the assumption that the probability mass function (PMF) approximations of the quantized representation components follow zero-mean Gaussian distributions convolved with a standard uniform distribution.
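The PMF assumption of the lightweight entropy model can be made concrete with a short sketch. For an integer value q, a Gaussian convolved with a unit-width uniform distribution integrates to the difference of the Gaussian CDF at q + 0.5 and q − 0.5. The function names are hypothetical, and the optional `mu` argument is included only to show how a non-zero mean would enter; the lightweight model described here uses a mean of zero.

```python
import math

def normal_cdf(x):
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def quantized_gaussian_pmf(q, sigma, mu=0.0):
    """Probability of the integer q under a Gaussian(mu, sigma) convolved
    with a uniform distribution on [-0.5, 0.5]."""
    upper = normal_cdf((q + 0.5 - mu) / sigma)
    lower = normal_cdf((q - 0.5 - mu) / sigma)
    return upper - lower

def code_length_bits(q, sigma, mu=0.0):
    """Estimated number of bits needed to code q under the model."""
    return -math.log2(quantized_gaussian_pmf(q, sigma, mu))
```

Under this sketch, a smaller estimated standard deviation concentrates probability near zero, so values close to zero become cheap to code while outliers become expensive.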

표현성분

는 채널-단위로 2 개의 파트들

및

로 분할될 수 있다.

는

채널들을 가질 수 있다.

는

채널들을 가질 수 있다. 다음으로,

및

는 양자화될 수 있다.

가 양자화됨에 따라

가 생성될 수 있다.

가 양자화됨에 따라

가 생성될 수 있다.Expression

Is two parts in channel-unit

And

Can be divided into

The

You can have channels.

The

You can have channels. to the next,

And

Can be quantized.

As is quantized

Can be generated.

As is quantized

Can be generated.

는 문맥-적응적 엔트로피 모델로 엔트로피 인코딩될 수 있다. 반면,

는 경량 엔트로피 모델로 엔트로피 인코딩될 수 있다.

Can be entropy encoded with a context-adaptive entropy model. On the other hand,

Can be entropy encoded with a lightweight entropy model.

의 표준 편차들은

및

를 통해 추정될 수 있다.

The standard deviations of

And

Can be estimated through

문맥-적응적 엔트로피 모델은

의 결과들을 추정자

로의 입력 소스로서 사용할 수 있다. 이러한 문맥-적응적 엔트로피 모델과는 달리, 경량 엔트로피 모델은

로부터 직접적으로 추정된 표준 편차들을 검출할 수 있다. 여기에서,

는

및

의 연쇄를 입력으로서 취할 수 있다.

는

와 함께

를 동시에 생성할 수 있다.The context-adaptive entropy model

The results of the estimator

Can be used as an input source for furnaces. Unlike this context-adaptive entropy model, a lightweight entropy model

The standard deviations estimated can be directly detected from. From here,

The

And

You can take the chain of as input.

The

with

Can be created simultaneously.

Equation 6 below may represent the total loss function.

총 손실 함수 또한 율 항 및 왜곡 항을 포함할 수 있다. 율 항은

,

및

의 3 개로 나뉠 수 있다. 말하자면, 율 항은

,

및

를 포함할 수 있다. 왜곡 항은 전술된 실시예에서의 왜곡 항과 동일할 수 있다. 단,

는

및

의 채널-단위로 연쇄된 표현성분일 수 있다.The total loss function can also include rate terms and distortion terms. Rate terms

,

And

Can be divided into three. In other words, the rate term

,

And

It may include. The distortion term may be the same as the distortion term in the above-described embodiment. only,

The

And

It may be an expression component chained in a channel-unit of.

,

및

의 노이즈 낀 표현성분은 표준 균일 분포를 따를 수 있다.

의 평균값은

일 수 있다.

의 평균값은

일 수 있다.

의 평균값은

일 수 있다.

,

And

The noisy expression component of can follow the standard uniform distribution.

The average value of

Can be

The average value of

Can be

The average value of

Can be

및

는

로부터의 채널-단위로 분할된 표현성분들일 수 있다.

는 변환

의 결과일 수 있다.

And

The

It may be expression components partitioned by channel-unit from.

Convert

May be the result of

는

채널들을 가질 수 있다.

는

채널들을 가질 수 있다.

The

You can have channels.

The

You can have channels.

에 대한 율 항은 전술된 수학식 4에서의 모델과 동일한 모델일 수 있다.

The rate term for may be the same model as the model in Equation 4 described above.

는

에 대한 모델에는 기여하지 않을 수 있고,

에 대한 모델에는 기여할 수 있다.

The

May not contribute to the model for,

Can contribute to the model.

에 대한 율 항에서, 노이즈 낀 표현성분들은 단지 학습을 위해서만 엔트로피 모델들의 입력으로 사용될 수 있다. 말하자면,

에 대한 율 항에서, 노이즈 낀 표현성분들은 엔트로피 모델들의 조건들에 대해서는 사용되지 않을 수 있다.

In the rate term for, the noisy expression components can be used as input to entropy models only for learning. as it were,

In the rate term for, noisy expression components may not be used for the conditions of entropy models.

의 엔트로피 모델은 전술된 수학식 5에서의 엔트로피 모델과 동일할 수 있다.

The entropy model of may be the same as the entropy model in Equation 5 described above.

In an implementation, the hybrid structure may be used for the top four configurations in descending order of bit rate.

예를 들면, 상위 2 개의 구성들에 대해서,

은 400으로 세트될 수 있고,

는 192로 세트될 수 있고,

는 408로 세트될 수 있다.For example, for the top two configurations,

Can be set to 400,

Can be set to 192,

Can be set to 408.

예를 들면, 다음의 2 개의 구성들에 대해서,

은 320으로 세트될 수 있고,

는 192로 세트될 수 있고,

는 228로 세트될 수 있다.For example, for the following two configurations,

Can be set to 320,

Can be set to 192,

Can be set to 228.

FIG. 12 is a structural diagram of an encoding apparatus according to an embodiment.

The encoding apparatus 1200 may include a processing unit 1210, a memory 1230, a user interface (UI) input device 1250, a UI output device 1260, and a storage 1240, which communicate with one another through a bus 1290. The encoding apparatus 1200 may further include a communication unit 1220 connected to a network 1299.

The processing unit 1210 may be a central processing unit (CPU) or a semiconductor device that executes processing instructions stored in the memory 1230 or the storage 1240. The processing unit 1210 may be at least one hardware processor.

The processing unit 1210 may generate and process signals, data, or information that are input to the encoding apparatus 1200, output from the encoding apparatus 1200, or used inside the encoding apparatus 1200, and may perform inspection, comparison, determination, and the like related to the signals, data, or information. That is, in the embodiment, the generation and processing of data or information, and the inspection, comparison, and determination related to the data or information, may be performed by the processing unit 1210.

At least some of the elements constituting the processing unit 1210 may be program modules and may communicate with an external device or system. The program modules may be included in the encoding apparatus 1200 in the form of an operating system, application program modules, and other program modules.

The program modules may be physically stored on various known storage devices. At least some of the program modules may also be stored in a remote storage device capable of communicating with the encoding apparatus 1200.

The program modules may include, but are not limited to, routines, subroutines, programs, objects, components, and data structures that perform a function or operation according to an embodiment or implement an abstract data type according to an embodiment.

The program modules may consist of instructions or code executed by at least one processor of the encoding apparatus 1200.

The processing unit 1210 may correspond to the encoder 800 described above.

The storage unit may refer to the memory 1230 and/or the storage 1240. The memory 1230 and the storage 1240 may be various types of volatile or non-volatile storage media. For example, the memory 1230 may include at least one of a ROM 1231 and a RAM 1232.

The storage unit may store data or information used for the operation of the encoding apparatus 1200. In the embodiment, the data or information held by the encoding apparatus 1200 may be stored in the storage unit.

The encoding apparatus 1200 may be implemented in a computer system including a computer-readable recording medium.

The recording medium may store at least one module required for the encoding apparatus 1200 to operate. The memory 1230 may store the at least one module, and the at least one module may be configured to be executed by the processing unit 1210.

Functions related to the communication of data or information of the encoding apparatus 1200 may be performed through the communication unit 1220.

The network 1299 may provide communication between the encoding apparatus 1200 and the decoding apparatus 1300.

FIG. 13 is a structural diagram of a decoding apparatus according to an embodiment.

The decoding apparatus 1300 may include a processing unit 1310, a memory 1330, a user interface (UI) input device 1350, a UI output device 1360, and a storage 1340, which communicate with one another through a bus 1390. The decoding apparatus 1300 may further include a communication unit 1320 connected to a network 1399.

The processing unit 1310 may be a central processing unit (CPU) or a semiconductor device that executes processing instructions stored in the memory 1330 or the storage 1340. The processing unit 1310 may be at least one hardware processor.

The processing unit 1310 may generate and process signals, data, or information that are input to the decoding apparatus 1300, output from the decoding apparatus 1300, or used inside the decoding apparatus 1300, and may perform inspection, comparison, determination, and the like related to the signals, data, or information. That is, in the embodiment, the generation and processing of data or information, and the inspection, comparison, and determination related to the data or information, may be performed by the processing unit 1310.

At least some of the elements constituting the processing unit 1310 may be program modules and may communicate with an external device or system. The program modules may be included in the decoding apparatus 1300 in the form of an operating system, application program modules, and other program modules.

The program modules may be physically stored on various known storage devices. At least some of the program modules may also be stored in a remote storage device capable of communicating with the decoding apparatus 1300.

The program modules may consist of instructions or code executed by at least one processor of the decoding apparatus 1300.

The processing unit 1310 may correspond to the decoder 900 described above.

The storage unit may refer to the memory 1330 and/or the storage 1340. The memory 1330 and the storage 1340 may be various types of volatile or non-volatile storage media. For example, the memory 1330 may include at least one of a ROM 1331 and a RAM 1332.

The storage unit may store data or information used for the operation of the decoding apparatus 1300. In the embodiment, the data or information held by the decoding apparatus 1300 may be stored in the storage unit.

The decoding apparatus 1300 may be implemented in a computer system including a computer-readable recording medium.

The recording medium may store at least one module required for the decoding apparatus 1300 to operate. The memory 1330 may store the at least one module, and the at least one module may be configured to be executed by the processing unit 1310.

Functions related to the communication of data or information of the decoding apparatus 1300 may be performed through the communication unit 1320.

The network 1399 may provide communication between the encoding apparatus 1200 and the decoding apparatus 1300.

FIG. 14 is a flowchart of an encoding method according to an embodiment.

In step 1410, the processing unit 1210 of the encoding apparatus 1200 may generate a bitstream.

The processing unit 1210 may generate the bitstream by performing entropy encoding using an entropy model on an input image.

The entropy model may be the context-adaptive entropy model described above. The context-adaptive entropy model may utilize a plurality of different types of contexts. The plurality of different types of contexts may include a bit-consuming context and a bit-free context.

The standard deviation parameter and the mean parameter of the entropy model may be estimated from the plurality of different types of contexts. That is, the entropy model may be based on a Gaussian model having a mean parameter.

Alternatively, the entropy model may comprise a plurality of types of entropy models. For example, the entropy model may include a context-adaptive entropy model and a lightweight entropy model.

In step 1420, the communication unit 1220 of the encoding apparatus 1200 may transmit the bitstream. The communication unit 1220 may transmit the bitstream to the decoding apparatus 1300. Alternatively, the bitstream may be stored in the storage unit of the encoding apparatus 1200.

The descriptions of the entropy encoding of an image and of the entropy engine given in the above-described embodiments may also apply to this embodiment. Redundant descriptions are omitted.

FIG. 15 is a flowchart of a decoding method according to an embodiment.

In step 1510, the communication unit 1320 of the decoding apparatus 1300 may acquire a bitstream.

In step 1520, the processing unit 1310 of the decoding apparatus 1300 may generate a reconstructed image using the bitstream.

The processing unit 1310 of the decoding apparatus 1300 may generate the reconstructed image by performing decoding using an entropy model on the bitstream.

The descriptions of the entropy decoding of an image and of the entropy engine given in the above-described embodiments may also apply to this embodiment. Redundant descriptions are omitted.

상술한 실시예들에서, 방법들은 일련의 단계 또는 유닛으로서 순서도를 기초로 설명되고 있으나, 본 발명은 단계들의 순서에 한정되는 것은 아니며, 어떤 단계는 상술한 바와 다른 단계와 다른 순서로 또는 동시에 발생할 수 있다. 또한, 당해 기술 분야에서 통상의 지식을 가진 자라면 순서도에 나타난 단계들이 배타적이지 않고, 다른 단계가 포함되거나, 순서도의 하나 또는 그 이상의 단계가 본 발명의 범위에 영향을 미치지 않고 삭제될 수 있음을 이해할 수 있을 것이다.In the above-described embodiments, the methods are described based on a flow chart as a series of steps or units, but the present invention is not limited to the order of steps, and some steps may occur in a different order or simultaneously with other steps as described above. You can. In addition, those skilled in the art may recognize that the steps shown in the flowchart are not exclusive, other steps are included, or one or more steps in the flowchart may be deleted without affecting the scope of the present invention. You will understand.

상술한 실시예들은 다양한 양태의 예시들을 포함한다. 다양한 양태들을 나타내기 위한 모든 가능한 조합이 기술될 수는 없지만, 해당 기술 분야의 통상의 지식을 가진 자는 명시적으로 기술된 조합 외에도 다른 조합이 가능함을 인식할 수 있을 것이다. 따라서, 본 발명은 이하의 특허청구범위 내에 속하는 모든 다른 교체, 수정 및 변경을 포함한다고 할 것이다.The above-described embodiments include examples of various aspects. Although not all possible combinations for indicating various aspects can be described, those skilled in the art will recognize that other combinations are possible in addition to those explicitly described. Accordingly, the present invention will be said to include all other replacements, modifications and changes that fall within the scope of the following claims.

이상 설명된 본 발명에 따른 실시예들은 다양한 컴퓨터 구성요소를 통하여 수행될 수 있는 프로그램 명령어의 형태로 구현되어 컴퓨터 판독 가능한 기록 매체에 기록될 수 있다. 상기 컴퓨터 판독 가능한 기록 매체는 프로그램 명령어, 데이터 파일, 데이터 구조 등을 단독으로 또는 조합하여 포함할 수 있다. 상기 컴퓨터 판독 가능한 기록 매체에 기록되는 프로그램 명령어는 본 발명을 위하여 특별히 설계되고 구성된 것들이거나 컴퓨터 소프트웨어 분야의 당업자에게 공지되어 사용 가능한 것일 수도 있다.The embodiments according to the present invention described above may be implemented in the form of program instructions that can be executed through various computer components and can be recorded in a computer-readable recording medium. The computer-readable recording medium may include program instructions, data files, data structures, or the like alone or in combination. The program instructions recorded on the computer-readable recording medium may be specially designed and configured for the present invention or may be known and available to those skilled in the computer software field.

컴퓨터 판독 가능한 기록 매체는 본 발명에 따른 실시예들에서 사용되는 정보를 포함할 수 있다. 예를 들면, 컴퓨터 판독 가능한 기록 매체는 비트스트림을 포함할 수 있고, 비트스트림은 본 발명에 따른 실시예들에서 설명된 정보를 포함할 수 있다.The computer-readable recording medium may include information used in embodiments according to the present invention. For example, a computer-readable recording medium may include a bitstream, and the bitstream may include information described in embodiments according to the present invention.

컴퓨터 판독 가능한 기록 매체는 비-일시적 컴퓨터 판독 가능한 매체(non-transitory computer-readable medium)를 포함할 수 있다.The computer readable recording medium may include a non-transitory computer-readable medium.

컴퓨터 판독 가능한 기록 매체의 예에는, 하드 디스크, 플로피 디스크 및 자기 테이프와 같은 자기 매체, CD-ROM, DVD와 같은 광기록 매체, 플롭티컬 디스크(floptical disk)와 같은 자기-광 매체(magneto-optical media), 및 ROM, RAM, 플래시 메모리 등과 같은 프로그램 명령어를 저장하고 수행하도록 특별히 구성된 하드웨어 장치가 포함된다. 프로그램 명령어의 예에는, 컴파일러에 의해 만들어지는 것과 같은 기계어 코드뿐만 아니라 인터프리터 등을 사용해서 컴퓨터에 의해서 실행될 수 있는 고급 언어 코드도 포함된다. 상기 하드웨어 장치는 본 발명에 따른 처리를 수행하기 위해 하나 이상의 소프트웨어 모듈로서 작동하도록 구성될 수 있으며, 그 역도 마찬가지이다.Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tapes, optical recording media such as CD-ROMs, DVDs, and magneto-optical media such as floptical disks. media), and hardware devices specifically configured to store and execute program instructions such as ROM, RAM, flash memory, and the like. Examples of program instructions include not only machine language codes produced by a compiler, but also high-level language codes that can be executed by a computer using an interpreter or the like. The hardware device may be configured to operate as one or more software modules to perform processing according to the present invention, and vice versa.

Although the present invention has been described above with reference to specific matters such as particular components, limited embodiments, and drawings, these are provided only to aid a more general understanding of the present invention. The present invention is not limited to the above embodiments, and those of ordinary skill in the art to which the present invention pertains may make various modifications and variations based on this description.

Therefore, the spirit of the present invention should not be confined to the embodiments described above; not only the claims set forth below but also everything modified equally or equivalently to those claims falls within the scope of the spirit of the present invention.

Claims

An encoding method comprising:
generating a bitstream by performing entropy encoding on an input image using an entropy model; and
transmitting or storing the bitstream.

The encoding method of claim 1, wherein
the entropy model is a context-adaptive entropy model, and
the context-adaptive entropy model utilizes a plurality of different types of contexts.

The encoding method of claim 2, wherein
the plurality of different types of contexts include a bit-consuming context and a bit-free context.

The encoding method of claim 3, wherein
a standard deviation parameter and a mean parameter of the entropy model are estimated from the plurality of different types of contexts.

The encoding method of claim 2, wherein
an input to an analysis transform of the context-adaptive entropy model is a uniformly quantized expression component.

The encoding method of claim 1, wherein
the entropy model is based on a Gaussian model having a mean parameter.

The encoding method of claim 1, wherein
the entropy model includes a context-adaptive entropy model and a lightweight entropy model.

The encoding method of claim 7, wherein
a hidden expression component is divided into a first partial hidden expression component and a second partial hidden expression component,
the first partial hidden expression component is quantized into a first quantized partial hidden expression component,
the second partial hidden expression component is quantized into a second quantized partial hidden expression component,
the first quantized partial hidden expression component is encoded using the context-adaptive entropy model, and
the second quantized partial hidden expression component is encoded using the lightweight entropy model.

The encoding method of claim 7, wherein
the lightweight entropy model uses scale estimation.

The encoding method of claim 7, wherein
the lightweight entropy model obtains standard deviations estimated directly from an analysis transform.

A decoding apparatus comprising:
a communication unit that acquires a bitstream; and
a processor that generates a reconstructed image by decoding the bitstream using an entropy model.

A decoding method comprising:
obtaining a bitstream; and
generating a reconstructed image by performing decoding on the bitstream using an entropy model.

The decoding method of claim 12, wherein
the entropy model is a context-adaptive entropy model, and
the context-adaptive entropy model utilizes a plurality of different types of contexts.

The decoding method of claim 13, wherein
the plurality of different types of contexts include a bit-consuming context and a bit-free context.

The decoding method of claim 14, wherein
a standard deviation parameter and a mean parameter of the entropy model are estimated from the plurality of different types of contexts.

The decoding method of claim 13, wherein
an input to an analysis transform of the context-adaptive entropy model is a uniformly quantized expression component.

The decoding method of claim 12, wherein
the entropy model is based on a Gaussian model having a mean parameter.

The decoding method of claim 12, wherein
the entropy model includes a context-adaptive entropy model and a lightweight entropy model.

The decoding method of claim 18, wherein
the lightweight entropy model uses scale estimation.

The decoding method of claim 18, wherein
the lightweight entropy model obtains standard deviations estimated directly from an analysis transform.
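
As an informal illustration of the claims above (not part of the claimed subject matter), the following Python sketch shows how a Gaussian entropy model with a mean parameter and a standard deviation parameter can assign a bit cost to uniformly quantized values, and how a hidden expression component might be split between a context-adaptive model and a lightweight model. All function names are hypothetical. The means and standard deviations are passed in as plain lists, whereas in the embodiments they would be estimated from the contexts. Treating the lightweight model as zero-mean with only a scale is an assumption made here for simplicity.

```python
import math


def gaussian_cdf(x, mean, std):
    """CDF of a normal distribution N(mean, std^2) evaluated at x."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))


def symbol_probability(q, mean, std):
    """Probability mass of the integer symbol q: the Gaussian density
    integrated over the quantization bin [q - 0.5, q + 0.5]."""
    p = gaussian_cdf(q + 0.5, mean, std) - gaussian_cdf(q - 0.5, mean, std)
    return max(p, 1e-12)  # floor the value to avoid log(0)


def estimated_bits(symbols, means, stds):
    """Ideal code length in bits of the symbols under per-symbol Gaussians."""
    return sum(
        -math.log2(symbol_probability(q, m, s))
        for q, m, s in zip(symbols, means, stds)
    )


def split_and_rate(y, n_first, ctx_means, ctx_stds, light_stds):
    """Quantize y, split it into two parts, and rate each part.

    The first n_first values use a model with mean and standard deviation
    (standing in for the context-adaptive model). The remaining values use
    a zero-mean model with a standard deviation only (an assumption that
    stands in for the lightweight model).
    """
    quantized = [round(v) for v in y]  # uniform quantization to integers
    first, second = quantized[:n_first], quantized[n_first:]
    bits_first = estimated_bits(first, ctx_means, ctx_stds)
    bits_second = estimated_bits(second, [0.0] * len(second), light_stds)
    return first, second, bits_first, bits_second


if __name__ == "__main__":
    y = [0.2, 2.9, -1.1, 0.4]
    first, second, b1, b2 = split_and_rate(
        y, 2, ctx_means=[0.0, 3.0], ctx_stds=[1.0, 1.0], light_stds=[1.0, 1.0]
    )
    print(first, second, b1, b2)
```

The sketch only estimates code lengths; an actual encoder would feed the same per-symbol probabilities to an arithmetic coder to produce the bitstream. It also shows why an accurate mean estimate matters: a symbol close to its predicted mean falls in a high-probability bin and therefore costs fewer bits.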