KR20220164562A

KR20220164562A - Method and apparatus for alternative multi-learning rates in neural image compression

Info

Publication number: KR20220164562A
Application number: KR1020227038635A
Authority: KR
Inventors: Ding Ding; Wei Jiang; Sheng Lin; Wei Wang; Xiaozhong Xu; Shan Liu
Original assignee: Tencent America LLC
Priority date: 2021-04-16
Filing date: 2021-10-14
Publication date: 2022-12-13
Also published as: WO2022220868A1; EP4100919A4; US20220343552A1; EP4100919A1; CN115485729A; JP2023525233A

Abstract

A method of substitutional end-to-end (E2E) neural image compression (NIC) using a neural network is performed by at least one processor. The method includes: receiving an input image for an E2E NIC framework; determining a substitute image based on a training model of the E2E NIC framework; encoding the substitute image to generate a bitstream; and mapping the substitute image to the bitstream to generate a compressed representation of the input image. The input image may also be partitioned into blocks. In that case, a substitute representation is determined for each block, and each block is encoded instead of the entire substitute image.

Description

Method and apparatus for alternative multi-learning rates in neural image compression

Traditional hybrid video codecs are difficult to optimize as a whole, because improving a single module may not yield a coding gain in overall performance. Recently, standards groups and companies have been actively exploring the potential need to standardize future video coding technology. These standards groups and companies have established the JPEG-AI group, which focuses on AI-based end-to-end neural image compression using deep neural networks (DNNs). The Chinese AVS standard has also formed an AVS-AI special group to study neural image and video compression technologies. The success of recent approaches has drawn increasing industrial interest in advanced neural image and video compression methodologies.

According to exemplary embodiments, there is provided a method of substitutional end-to-end (E2E) neural image compression (NIC) using a neural network, performed by at least one processor. The method includes: receiving an input image for an E2E NIC framework; determining a substitute image based on a training model of the E2E NIC framework; encoding the substitute image to generate a bitstream; and mapping the substitute image to the bitstream to generate a compressed representation.

According to exemplary embodiments, there is provided an apparatus for substitutional end-to-end (E2E) neural image compression (NIC) using a neural network. The apparatus includes at least one memory configured to store program code, and at least one processor configured to read the program code and operate as instructed by it. The program code includes:
- receiving code configured to cause the at least one processor to receive an input image for an E2E NIC framework;
- first determining code configured to cause the at least one processor to determine a substitute image based on a training model of the E2E NIC framework;
- first encoding code configured to cause the at least one processor to encode the substitute image to generate a bitstream; and
- mapping code configured to cause the at least one processor to map the substitute image to the bitstream to generate a compressed representation.

According to exemplary embodiments, there is provided a non-transitory computer-readable medium storing instructions for substitutional end-to-end (E2E) neural image compression (NIC). When executed by at least one processor, the instructions cause the at least one processor to: receive an input image for an E2E NIC framework; determine a substitute image based on a training model of the E2E NIC framework; encode the substitute image to generate a bitstream; and map the substitute image to the bitstream to generate a compressed representation.

FIG. 1 is a diagram of an environment in which the methods, apparatuses, and systems described herein may be implemented, according to embodiments.
FIG. 2 is a block diagram of example components of one or more devices of FIG. 1.
FIG. 3 is an exemplary diagram illustrating a substitutional learning-based image coding preprocessing model.
FIG. 4 illustrates an example of block-wise image coding.
FIG. 5 is a flowchart of a substitutional end-to-end (E2E) neural image compression (NIC) method, according to embodiments.
FIG. 6 is a block diagram of an apparatus for substitutional end-to-end (E2E) neural image compression (NIC), according to embodiments.

Embodiments may include receiving a picture and determining a substitutional representation of the picture. The substitutional representation is determined by an optimization process that tunes its elements to optimize the rate-distortion performance of coding it with an end-to-end (E2E) optimized framework. The E2E optimized framework may be a pre-trained image or video coding framework based on an artificial neural network (ANN). The substitutional representation of the picture may be encoded to generate a bitstream. In an ANN-based video coding framework, a machine learning process jointly optimizes the different modules from input to output to improve a final objective (e.g., rate-distortion performance). The result is end-to-end (E2E) optimized neural image compression (NIC).

FIG. 1 is a diagram of an environment 100 in which the methods, apparatuses, and systems described herein may be implemented, according to embodiments.

As shown in FIG. 1, the environment 100 may include a user device 110, a platform 120, and a network 130. Devices of the environment 100 may interconnect via wired connections, wireless connections, or a combination of wired and wireless connections.

The user device 110 includes one or more devices capable of receiving, generating, storing, processing, and/or providing information associated with the platform 120. For example, the user device 110 may include:
- a computing device (e.g., a desktop computer, a laptop computer, a tablet computer, a handheld computer, a smart speaker, a server, etc.);
- a mobile phone (e.g., a smartphone, a radiotelephone, etc.);
- a wearable device (e.g., a pair of smart glasses or a smart watch); or
- a similar device.

In some implementations, the user device 110 may receive information from and/or transmit information to the platform 120.

The platform 120 includes one or more devices as described elsewhere herein. In some implementations, the platform 120 may include a cloud server or a group of cloud servers. In some implementations, the platform 120 may be designed to be modular, so that software components can be swapped in or out. As such, the platform 120 may be easily and/or quickly reconfigured for different uses.

In some implementations, as shown, the platform 120 may be hosted in a cloud computing environment 122. Notably, the implementations described herein describe the platform 120 as hosted in the cloud computing environment 122. In some implementations, however, the platform 120 may not be cloud-based (i.e., may be implemented outside of a cloud computing environment) or may be partially cloud-based.

The cloud computing environment 122 includes an environment that hosts the platform 120. It may provide computation, software, data access, storage, and other services. None of these services require an end user (e.g., the user device 110) to know the physical location and configuration of the system(s) and/or device(s) that host the platform 120. As shown, the cloud computing environment 122 may include a group of computing resources 124 (referred to collectively as "computing resources 124" and individually as "computing resource 124").

The computing resource 124 includes one or more personal computers, workstation computers, server devices, or other types of computation and/or communication devices. In some implementations, the computing resource 124 may host the platform 120. The cloud resources may include:
- compute instances executing in the computing resource 124;
- storage devices provided in the computing resource 124;
- data transfer devices provided by the computing resource 124; and
- the like.

In some implementations, the computing resource 124 may communicate with other computing resources 124 via wired connections, wireless connections, or a combination of wired and wireless connections.

As further shown in FIG. 1, the computing resource 124 includes a group of cloud resources. Examples include one or more applications ("APPs") 124-1, one or more virtual machines ("VMs") 124-2, virtualized storage ("VSs") 124-3, one or more hypervisors ("HYPs") 124-4, or the like.

The application 124-1 includes one or more software applications that may be provided to or accessed by the user device 110 and/or the platform 120. The application 124-1 may eliminate the need to install and execute the software applications on the user device 110. For example, the application 124-1 may include software associated with the platform 120 and/or any other software that can be provided via the cloud computing environment 122. In some implementations, one application 124-1 may send information to or receive information from one or more other applications 124-1 via the virtual machine 124-2.

The virtual machine 124-2 includes a software implementation of a machine (e.g., a computer) that executes programs like a physical machine. The virtual machine 124-2 may be either a system virtual machine or a process virtual machine. Which one depends on its use and on how closely it corresponds to any real machine. A system virtual machine may provide a complete system platform that supports execution of a complete operating system ("OS"). A process virtual machine may execute a single program and may support a single process. In some implementations, the virtual machine 124-2 may execute on behalf of a user (e.g., the user device 110). It may also manage infrastructure of the cloud computing environment 122, such as data management, synchronization, or long-duration data transfers.

The virtualized storage 124-3 includes one or more storage systems and/or one or more devices that use virtualization techniques within the storage systems or devices of the computing resource 124. In some implementations, within the context of a storage system, types of virtualization may include block virtualization and file virtualization. Block virtualization may refer to abstracting (or separating) logical storage from physical storage. This lets the storage system be accessed without regard to physical storage or heterogeneous structure. The separation may give administrators of the storage system flexibility in how they manage storage for end users. File virtualization may eliminate dependencies between data accessed at the file level and the location where the files are physically stored. This may enable optimization of storage use, server consolidation, and/or performance of non-disruptive file migrations.

The hypervisor 124-4 may provide hardware virtualization techniques that allow multiple operating systems (e.g., "guest operating systems") to execute concurrently on a host computer, such as the computing resource 124. The hypervisor 124-4 may present a virtual operating platform to the guest operating systems and may manage their execution. Multiple instances of a variety of operating systems may share virtualized hardware resources.

The network 130 includes one or more wired and/or wireless networks. For example, the network 130 may include:
- a cellular network (e.g., a fifth generation (5G) network, a long-term evolution (LTE) network, a third generation (3G) network, a code division multiple access (CDMA) network, etc.);
- a public land mobile network (PLMN);
- a local area network (LAN), a wide area network (WAN), or a metropolitan area network (MAN);
- a telephone network (e.g., the Public Switched Telephone Network (PSTN));
- a private network, an ad hoc network, an intranet, the Internet, a fiber optic-based network, or the like; and/or
- a combination of these or other types of networks.

The number and arrangement of devices and networks shown in FIG. 1 are provided as an example. In practice, there may be additional, fewer, different, or differently arranged devices and/or networks than those shown in FIG. 1. Furthermore, two or more devices shown in FIG. 1 may be implemented within a single device, or a single device shown in FIG. 1 may be implemented as multiple, distributed devices. Additionally or alternatively, a set of devices (e.g., one or more devices) of the environment 100 may perform one or more functions described as being performed by another set of devices of the environment 100.

FIG. 2 is a block diagram of example components of one or more devices of FIG. 1.

A device 200 may correspond to the user device 110 and/or the platform 120. As shown in FIG. 2, the device 200 may include a bus 210, a processor 220, a memory 230, a storage component 240, an input component 250, an output component 260, and a communication interface 270.

The bus 210 includes a component that permits communication among the components of the device 200.

The processor 220 is implemented in hardware, firmware, or a combination of hardware and software. It is one of the following, or another type of processing component:
- a central processing unit (CPU);
- a graphics processing unit (GPU);
- an accelerated processing unit (APU);
- a microprocessor or a microcontroller;
- a digital signal processor (DSP);
- a field-programmable gate array (FPGA); or
- an application-specific integrated circuit (ASIC).

In some implementations, the processor 220 includes one or more processors that can be programmed to perform a function.

The memory 230 stores information and/or instructions for use by the processor 220. It includes a random access memory (RAM), a read only memory (ROM), and/or another type of dynamic or static storage device (e.g., a flash memory, a magnetic memory, and/or an optical memory).

The storage component 240 stores information and/or software related to the operation and use of the device 200. For example, the storage component 240 may include any of the following, along with a corresponding drive:
- a hard disk (e.g., a magnetic disk, an optical disk, a magneto-optic disk, and/or a solid state disk);
- a compact disc (CD) or a digital versatile disc (DVD);
- a floppy disk, a cartridge, or a magnetic tape; and/or
- another type of non-transitory computer-readable medium.

The input component 250 includes a component that permits the device 200 to receive information, such as via user input (e.g., a touch screen display, a keyboard, a keypad, a mouse, a button, a switch, and/or a microphone). Additionally or alternatively, the input component 250 may include a sensor for sensing information (e.g., a global positioning system (GPS) component, an accelerometer, a gyroscope, and/or an actuator). The output component 260 includes a component that provides output information from the device 200 (e.g., a display, a speaker, and/or one or more light-emitting diodes (LEDs)).

The communication interface 270 includes a transceiver-like component (e.g., a transceiver and/or a separate receiver and transmitter). This component enables the device 200 to communicate with other devices, such as via a wired connection, a wireless connection, or a combination of wired and wireless connections. The communication interface 270 may permit the device 200 to receive information from another device and/or provide information to another device. For example, the communication interface 270 may include:
- an Ethernet interface;
- an optical interface or a coaxial interface;
- an infrared interface or a radio frequency (RF) interface;
- a universal serial bus (USB) interface;
- a Wi-Fi interface or a cellular network interface; or
- the like.

The device 200 may perform one or more processes described herein. The device 200 does so in response to the processor 220 executing software instructions stored by a non-transitory computer-readable medium, such as the memory 230 and/or the storage component 240. A computer-readable medium is defined herein as a non-transitory memory device. A memory device includes memory space within a single physical storage device or memory space spread across multiple physical storage devices.

Software instructions may be read into the memory 230 and/or the storage component 240 from another computer-readable medium, or from another device via the communication interface 270. When executed, software instructions stored in the memory 230 and/or the storage component 240 may cause the processor 220 to perform one or more processes described herein. Additionally or alternatively, hardwired circuitry may be used in place of or in combination with software instructions to perform one or more processes described herein. Thus, the implementations described herein are not limited to any specific combination of hardware circuitry and software.

The number and arrangement of components shown in FIG. 2 are provided as an example. In practice, the device 200 may include additional, fewer, different, or differently arranged components than those shown in FIG. 2. Additionally or alternatively, a set of components (e.g., one or more components) of the device 200 may perform one or more functions described as being performed by another set of components of the device 200.

Given an input image $x$, the goal of NIC is to use $x$ as the input to a DNN encoder. The encoder computes a compressed representation $\bar{y}$ that is compact for storage and transmission. The compressed representation $\bar{y}$ is then used as the input to a DNN decoder to reconstruct an image $\bar{x}$.

Some NIC methods may take a variational autoencoder (VAE) structure. In this structure, the DNN encoder directly uses the entire image $x$ as its input. The input passes through a set of network layers that work like a black box to compute the output representation (i.e., the compressed representation $\bar{y}$). Correspondingly, the DNN decoder takes the entire compressed representation $\bar{y}$ as its input. It passes through another set of network layers that work like another black box to compute the reconstructed image $\bar{x}$.

The rate-distortion (R-D) loss is optimized to balance two quantities: the distortion loss $D(x, \bar{x})$ of the reconstructed image $\bar{x}$, and the bit consumption $R$ of the compressed representation $\bar{y}$. The balance is controlled by a trade-off hyperparameter $\lambda$ in the following target loss function:

$$L(x, \bar{x}, \bar{y}) = \lambda D(x, \bar{x}) + R(\bar{y}) \qquad (1)$$
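The Python sketch below makes the role of $\lambda$ in Equation (1) concrete by comparing two operating points. The (D, R) values and the function names are hypothetical and not taken from any particular NIC model.

```python
def rd_loss(distortion, bits, lam):
    """Target loss of Equation (1): L = lam * D + R."""
    return lam * distortion + bits


def pick_operating_point(points, lam):
    """Return the (name, D, R) tuple with the smallest R-D loss for this lam."""
    return min(points, key=lambda p: rd_loss(p[1], p[2], lam))


# Hypothetical operating points: (name, distortion D, bit consumption R).
points = [("fine", 0.25, 12.0), ("coarse", 1.0, 8.0)]
```

With a small $\lambda$ (e.g., 2.0), bits dominate and the "coarse" point wins (10.0 vs 12.5). With a large $\lambda$ (e.g., 20.0), distortion dominates and the "fine" point wins (17.0 vs 28.0).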

Embodiments related to preprocessing propose using online training, for each input image to be compressed, to find an optimal substitute. This substitute is then compressed instead of the original image, which lets the encoder achieve better compression performance. The method serves as a preprocessing step that boosts the compression performance of any E2E NIC method. It requires no training or fine-tuning of the pre-trained compression model itself, and no training data. Detailed methods and apparatuses for the preprocessing model, according to one or more embodiments, are now described.

FIG. 3 is an exemplary diagram illustrating a substitutional learning-based image coding preprocessing model.

Learning-based image compression can be viewed as a two-step mapping process. As shown in FIG. 3, an original image $x$ in a high-dimensional space is first mapped to a bitstream of length $R(x)$ (encoding mapping 300). The bitstream is then mapped back to the original space at $\bar{x}$, with a distortion loss $D(x, \bar{x})$ (decoding mapping 310).

In an exemplary embodiment, as shown in FIG. 3, there may exist a substitute image $x'$ that is mapped to a bitstream of length $R(x')$. That bitstream is then mapped, with a distortion loss $D(x, \bar{x}')$, to a point $\bar{x}'$ that is closer to the original image $x$. Given a distance measure or loss function, better compression can therefore be achieved by coding the substitute image instead of the original. The best compression performance is achieved at the global minimum of the target loss function of Equation (1). In another exemplary embodiment, substitutes may be found at any intermediate stage of the ANN, to reduce the difference between the decoded image and the original image $x$.
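The two-step mapping, and the benefit of a substitute, can be sketched with a deliberately imperfect toy codec. Every component here is an assumption for illustration only:
- the encoder is a uniform scalar quantizer;
- the bitstream length is counted with signed Exp-Golomb codes;
- the frozen decoder has a gain of 0.8.

The substitute is hand-picked to pre-compensate that gain. In the embodiments, the substitute is found by optimization instead.

```python
GAIN = 0.8  # hypothetical frozen decoder gain (an imperfect "pre-trained" decoder)
Q = 1.0     # quantization step of the toy encoder


def se_golomb_bits(v):
    """Length in bits of the signed Exp-Golomb code se(v) of an integer v."""
    code_num = 2 * v - 1 if v > 0 else -2 * v
    return 2 * ((code_num + 1).bit_length() - 1) + 1


def encode(image):
    """Encoding mapping: image -> integer symbols plus bitstream length R."""
    symbols = [round(p / Q) for p in image]
    return symbols, sum(se_golomb_bits(s) for s in symbols)


def decode(symbols):
    """Decoding mapping: symbols -> reconstruction in the original space."""
    return [GAIN * Q * s for s in symbols]


def rd_cost(original, coded, lam=1.0):
    """lam * D + R, where D is the MSE between the ORIGINAL image and the
    reconstruction of whichever image was actually coded."""
    symbols, bits = encode(coded)
    recon = decode(symbols)
    mse = sum((a - b) ** 2 for a, b in zip(original, recon)) / len(original)
    return lam * mse + bits, mse, bits


x = [2.4, -4.0, 5.6, 0.0]
x_sub = [p / GAIN for p in x]                    # substitute pre-compensating the decoder gain
cost_x, mse_x, bits_x = rd_cost(x, x)            # code the original directly
cost_sub, mse_sub, bits_sub = rd_cost(x, x_sub)  # code the substitute instead
```

Both bitstreams cost 20 bits. Decoding the substitute lands on the original image, up to floating-point error. Coding $x$ directly leaves a mean squared error of about 0.48, so the substitute has the lower R-D loss for any $\lambda > 0$.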

In the model training phase, gradients are used to update the model's parameters. In the preprocessing model, by contrast, the model's parameters are fixed and the gradients are used to update the input image itself. Non-differentiable parts are replaced with differentiable ones (e.g., quantization is replaced with noise injection). This makes the entire model differentiable, so gradients can be back-propagated to the input. The optimization described above can therefore be solved iteratively by gradient descent.
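The following is a minimal sketch of gradient descent on the input, assuming a one-dimensional linear toy codec with a Gaussian bit-cost proxy. The names `find_substitute` and `noisy_loss_and_grad` and all constants are hypothetical. The codec weights stay frozen, rounding is replaced by uniform noise injection during optimization, and only the substitute image is updated.

```python
import math
import random

A = 1.0        # hypothetical frozen encoder weight (latent y = A * x)
SIGMA = 2.0    # hypothetical frozen scale of a Gaussian entropy model
LAMBDA = 1.0   # trade-off hyperparameter of Equation (1)
LN2 = math.log(2.0)


def noisy_loss_and_grad(x_sub, x, noise):
    """Toy R-D loss and its gradient with respect to the substitute x_sub.

    Rounding of the latent is replaced by additive noise so the loss is
    differentiable. The codec parameters (A, SIGMA) are never updated.
    """
    loss, grad = 0.0, []
    for xi, si, ui in zip(x, x_sub, noise):
        y = A * si + ui                          # encoder output with noise injection
        x_hat = y / A                            # decoder output
        rate = y * y / (2.0 * SIGMA ** 2 * LN2)  # Gaussian bit-cost proxy (constant dropped)
        loss += LAMBDA * (xi - x_hat) ** 2 + rate
        # d/dsi of LAMBDA*(xi - x_hat)^2 + rate
        grad.append(2.0 * LAMBDA * (x_hat - xi) + A * y / (SIGMA ** 2 * LN2))
    return loss, grad


def expected_loss(x_sub, x):
    """Noise-averaged loss using E[u] = 0 and E[u^2] = 1/12 for u ~ U(-0.5, 0.5)."""
    return sum(
        LAMBDA * ((xi - si) ** 2 + 1.0 / (12.0 * A ** 2))
        + (A ** 2 * si ** 2 + 1.0 / 12.0) / (2.0 * SIGMA ** 2 * LN2)
        for xi, si in zip(x, x_sub)
    )


def find_substitute(x, step_size=0.1, num_steps=200, seed=0):
    """Gradient descent on the input image itself, starting from x."""
    rng = random.Random(seed)
    x_sub = list(x)
    for _ in range(num_steps):
        noise = [rng.uniform(-0.5, 0.5) for _ in x]  # fresh noise sample per step
        _, grad = noisy_loss_and_grad(x_sub, x, noise)
        x_sub = [si - step_size * gi for si, gi in zip(x_sub, grad)]
    return x_sub


x = [4.0, -3.0, 2.5, 6.0]
x_sub = find_substitute(x)
```

The toy rate term penalizes large latents, so the substitute is pulled toward zero. It accepts a small increase in distortion in exchange for fewer estimated bits, which lowers the expected R-D loss relative to coding `x` itself.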

This preprocessing model has two main hyperparameters: the step size and the number of steps. The step size acts as the "learning rate" of this online training. Images with different types of content may require different step sizes to achieve the best optimization results. The number of steps is the number of update operations performed. Together with the target loss function, these hyperparameters drive the learning process. For example, the step size can be used in the gradient descent algorithm or the backpropagation computation performed during learning. The number of steps can serve as a threshold on the maximum number of iterations, which controls when the learning process ends.
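The roles of the two hyperparameters can be illustrated with a minimal sketch: a plain gradient-descent loop in which `step_size` scales every update and `max_steps` caps the number of updates. The optional `tol` early stop is an assumption that goes beyond the description. The demo objective is a hypothetical 1-D quadratic `c * (s - t)**2`. For that objective, plain gradient descent converges only when `0 < step_size < 1 / c`, which shows why a single step size does not suit all content.

```python
def run_online_updates(grad_fn, x0, step_size, max_steps, tol=None):
    """Gradient descent on the input, exposing the two hyperparameters.

    step_size -- the "learning rate" of the online optimization
    max_steps -- upper bound on the number of updates (the stopping threshold)
    tol       -- optional early stop on the update magnitude (an assumption,
                 not specified by the description)
    Returns the final value and the number of updates actually performed.
    """
    x = x0
    for step in range(1, max_steps + 1):
        update = step_size * grad_fn(x)
        x -= update
        if tol is not None and abs(update) < tol:
            return x, step
    return x, max_steps


# Hypothetical 1-D objective c * (s - t)**2 with curvature c and optimum t.
c, t = 4.0, 1.0
grad_fn = lambda s: 2 * c * (s - t)

x_small, _ = run_online_updates(grad_fn, 0.0, step_size=0.05, max_steps=50)  # converges
x_large, _ = run_online_updates(grad_fn, 0.0, step_size=0.3, max_steps=50)   # diverges
```

With `c = 4`, each update multiplies the error by `1 - 2 * step_size * c`. A step size of 0.05 gives a factor of 0.6 and converges. A step size of 0.3 gives a factor of -1.4 and blows up within the same step budget.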

FIG. 4 illustrates an example of block-wise image coding.

In an exemplary embodiment, image 400 may first be partitioned into blocks (indicated by the dashed lines in FIG. 4), and the blocks are compressed instead of image 400 as a whole. In FIG. 4, blocks that have already been compressed are shaded and blocks still to be compressed are unshaded. The blocks may or may not all be the same size, and each block may use a different step size. Different step sizes may therefore be assigned across image 400 to achieve a better compression result. Block 410, with height h and width w, is an example of one of the partitioned blocks.
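One simple way to obtain blocks that may differ in size is to tile the image with a nominal block height and width and crop the tiles on the bottom and right edges. The sketch below makes that assumption; the description does not prescribe a tiling rule. Each `(top, left, h, w)` tuple can then be used as a key under which a per-block step size is stored.

```python
from typing import List, Tuple

Block = Tuple[int, int, int, int]  # (top, left, h, w)


def partition(height: int, width: int, block_h: int, block_w: int) -> List[Block]:
    """Tile a height x width image into blocks of nominal size block_h x block_w.

    Blocks on the bottom and right edges are cropped to fit, so the blocks are
    not necessarily all the same size.
    """
    blocks = []
    for top in range(0, height, block_h):
        for left in range(0, width, block_w):
            h = min(block_h, height - top)
            w = min(block_w, width - left)
            blocks.append((top, left, h, w))
    return blocks


# Each block can carry its own step size (placeholder values, hypothetical).
blocks = partition(10, 12, 4, 5)
step_sizes = {blk: 0.01 for blk in blocks}
```

A 10 x 12 image with nominal 4 x 5 blocks yields nine blocks. The bottom-right block is cropped to 2 x 2, and the block areas still sum to the full 120 pixels.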

In an exemplary embodiment, the image may be compressed without being partitioned into blocks, in which case the entire image is the input to the E2E NIC model. Different images may use different step sizes to achieve optimized compression results.

In another exemplary embodiment, the step size may be selected based on characteristics of the image (or block), for example, its RGB variance, where RGB refers to the red-green-blue color model. In yet another exemplary embodiment, the step size may be selected based on the rate-distortion (RD) performance of the image (or block). In that case, multiple substitutes are generated using multiple step sizes, and the one with the best compression performance is selected.
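The variance-based selection can be sketched as a lookup table keyed on RGB variance. The sketch makes two assumptions. First, the variance is pooled over all R, G, and B samples of the image or block. Second, the ascending `(variance_upper_bound, step_size)` table is supplied by the caller. The table values below are hypothetical, and so is the direction of the mapping (which step size suits flat versus textured content). The description does not state either.

```python
from statistics import pvariance
from typing import Sequence, Tuple

Pixel = Tuple[float, float, float]


def rgb_variance(pixels: Sequence[Pixel]) -> float:
    """Population variance pooled over every R, G and B sample (an assumption;
    a per-channel variance would be an equally plausible reading)."""
    return pvariance([c for p in pixels for c in p])


def step_from_variance(pixels: Sequence[Pixel],
                       table: Sequence[Tuple[float, float]]) -> float:
    """Look up a step size from ascending (variance_upper_bound, step) pairs.

    The table contents are hypothetical and would have to be tuned offline.
    """
    v = rgb_variance(pixels)
    for bound, step in table:
        if v <= bound:
            return step
    return table[-1][1]  # above every bound: fall back to the last entry


# Hypothetical table: low-variance content gets one step size, the rest another.
TABLE = [(100.0, 0.02), (float("inf"), 0.005)]
flat_block = [(10.0, 10.0, 10.0)] * 4
textured_block = [(0.0, 0.0, 0.0), (255.0, 255.0, 255.0)]
```

With this table, `flat_block` (variance 0) maps to 0.02 and `textured_block` (variance 16256.25) maps to 0.005. This heuristic costs one pass over the pixels, whereas the RD-based alternative must run the optimization once per candidate step size.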

FIG. 5 is a flowchart of a substitutional end-to-end (E2E) neural image compression (NIC) method 500 according to embodiments.

In some implementations, one or more process blocks of FIG. 5 may be performed by platform 120. In some implementations, one or more process blocks of FIG. 5 may be performed by another device or group of devices separate from or including platform 120, such as user device 110.

As shown in FIG. 5, in operation 510, method 500 includes receiving an input image for an E2E NIC framework.

In operation 520, method 500 includes determining a substitute image based on a training model of the E2E NIC framework. The substitute image may be determined by an optimization process of the training model. In that process, elements of the input image are adjusted to generate substitute representations, and the elements with the smallest distortion loss between the input image and the substitute representations are selected and used as the substitute image. The training model of the E2E NIC framework may be trained based on the learning rate of the input image, the number of updates to the input image, and the distortion loss. One or more substitute images may be determined based on different learning rates for the input image. The learning rates are selected based on characteristics of the input image, such as its RGB variance or its RD performance.
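Determining substitutes for several learning rates and keeping the best one can be sketched as follows: run a fixed number of updates per candidate learning rate, score each resulting substitute with the loss, and keep the lowest-loss candidate. The per-element objective `LAM * (x - s)**2 + RHO * s**2` is a hypothetical noise-free stand-in for the real target loss. With a fixed step budget, a learning rate that is too small under-converges and one that is too large diverges, so an intermediate value wins.

```python
from typing import List, Sequence, Tuple

# Hypothetical toy objective per element: LAM * (x - s)**2 + RHO * s**2.
LAM = 1.0
RHO = 1.0


def loss(x: Sequence[float], s: Sequence[float]) -> float:
    return sum(LAM * (xi - si) ** 2 + RHO * si ** 2 for xi, si in zip(x, s))


def substitute_for(x: Sequence[float], lr: float, num_steps: int) -> List[float]:
    """Fixed number of gradient steps on the substitute s, starting at x."""
    s = list(x)
    for _ in range(num_steps):
        s = [si - lr * (2 * LAM * (si - xi) + 2 * RHO * si) for si, xi in zip(s, x)]
    return s


def best_substitute(x: Sequence[float], learning_rates: Sequence[float],
                    num_steps: int = 10) -> Tuple[float, List[float], float]:
    """Generate one candidate per learning rate and keep the lowest-loss one."""
    best = None
    for lr in learning_rates:
        s = substitute_for(x, lr, num_steps)
        cand = (loss(x, s), lr, s)
        if best is None or cand[0] < best[0]:
            best = cand
    best_loss, best_lr, best_s = best
    return best_lr, best_s, best_loss


lr, s, l = best_substitute([3.0, -1.5, 0.7, 2.2], [0.01, 0.2, 0.6])
```

In this toy problem, each step multiplies the distance to the optimum by `1 - 4 * lr`. The candidate `lr = 0.2` therefore converges almost fully in 10 steps, `0.01` barely moves, and `0.6` diverges, so `0.2` is selected.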

When the input image is partitioned into blocks, a substitute block may be determined by the optimization process of the training model of the E2E NIC framework. In that process, elements of the partitioned blocks are adjusted to generate substitute block representations, and the elements with the smallest distortion loss between the partitioned blocks and the substitute block representations are selected and used as the substitute block. The training model of the E2E NIC framework may be trained based on the learning rate of the partitioned blocks, the number of updates to the partitioned blocks, and the distortion loss of each partitioned block. One or more substitute blocks may be determined based on different learning rates for the partitioned blocks. The learning rates are selected based on characteristics of the blocks, such as their RGB variance or their RD performance.

In operation 530, method 500 includes encoding the substitute image to generate a bitstream.

In operation 540, method 500 includes mapping the substitute image to the bitstream to generate a compressed representation. In embodiments, the bitstream and/or the compressed representation may be transmitted to, for example, a decoder and/or a receiving device.

Method 500 may also include partitioning the input image into one or more blocks. In this case, operations 520 to 540 are performed on each of the blocks instead of on the entire input image. That is, method 500 further includes the following steps for each of the partitioned blocks: determining a substitute block based on the training model of the E2E NIC framework, encoding the substitute block to generate a block bitstream, and mapping the substitute block to the block bitstream to generate a compressed block. The partitioned blocks may have the same size or different sizes, and each block has a different learning rate.
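The block-wise variant can be sketched as a loop that partitions the image, finds a substitute for each block with that block's own learning rate, and encodes each substitute into its own block bitstream. The sketch is loose and relies on stand-ins. `toy_substitute` minimizes a hypothetical per-pixel objective in place of the real NIC loss. `encode_block` simply rounds to int16 and packs the values with Python's `struct` module in place of a learned entropy coder. `lr_for` is a caller-supplied rule for choosing each block's learning rate. The returned dict pairs each block with its bitstream.

```python
import struct
from typing import Callable, Dict, List, Tuple

Image = List[List[float]]
Block = Tuple[int, int, int, int]  # (top, left, height, width)


def blocks_of(img: Image, bh: int, bw: int) -> List[Block]:
    """Grid partition; edge blocks are cropped, so sizes may differ."""
    height, width = len(img), len(img[0])
    return [(top, left, min(bh, height - top), min(bw, width - left))
            for top in range(0, height, bh) for left in range(0, width, bw)]


def crop(img: Image, blk: Block) -> Image:
    top, left, h, w = blk
    return [row[left:left + w] for row in img[top:top + h]]


def toy_substitute(block: Image, lr: float, num_steps: int = 20,
                   lam: float = 1.0, rho: float = 0.1) -> Image:
    """Hypothetical per-pixel objective lam*(x - s)**2 + rho*s**2, minimized by
    gradient descent on s with learning rate lr (stand-in for the real model)."""
    out = []
    for row in block:
        new_row = []
        for x in row:
            s = x
            for _ in range(num_steps):
                s -= lr * (2 * lam * (s - x) + 2 * rho * s)
            new_row.append(s)
        out.append(new_row)
    return out


def encode_block(block: Image) -> bytes:
    """Hypothetical stand-in encoder: round to int16 and pack row-major."""
    flat = [round(v) for row in block for v in row]
    return struct.pack(f"<{len(flat)}h", *flat)


def compress_blockwise(img: Image, bh: int, bw: int,
                       lr_for: Callable[[Block], float]) -> Dict[Block, bytes]:
    """Per-block substitute search and encoding, one learning rate per block."""
    streams: Dict[Block, bytes] = {}
    for blk in blocks_of(img, bh, bw):
        sub = toy_substitute(crop(img, blk), lr_for(blk))
        streams[blk] = encode_block(sub)
    return streams
```

For example, a 5 x 6 image with nominal 4 x 4 blocks is split into four blocks of sizes 4 x 4, 4 x 2, 1 x 4, and 1 x 2. Each block gets its own 2-byte-per-sample stream.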

Method 500 may use an artificial neural network based on a pretrained image coding model, in which the parameters of the network are fixed and the gradient is used to update the input image.

Although FIG. 5 shows example blocks of the method, in some implementations the method may include additional blocks, fewer blocks, different blocks, or differently arranged blocks than those depicted in FIG. 5. Additionally or alternatively, two or more of the blocks of the method may be performed in parallel.

FIG. 6 is a block diagram of a substitutional end-to-end (E2E) neural image compression (NIC) apparatus 600 according to embodiments.

As shown in FIG. 6, apparatus 600 includes receiving code 610, determining code 620, encoding code 630, and mapping code 640.

Receiving code 610 is configured to cause at least one processor to receive an input image for the E2E NIC framework.

Determining code 620 is configured to cause the at least one processor to determine a substitute image based on a training model of the E2E NIC framework.

Encoding code 630 is configured to cause the at least one processor to encode the substitute image to generate a bitstream.

Mapping code 640 is configured to cause the at least one processor to map the substitute image to the bitstream to generate a compressed representation.

Apparatus 600 may further include partitioning code configured to cause the at least one processor to partition the input image into one or more blocks. In this case, determining code 620, encoding code 630, and mapping code 640 operate on each of the partitioned blocks instead of on the entire input image.

The substitute image determined by determining code 620 may be determined by an optimization process of the training model of the E2E NIC framework. The optimization uses two pieces of code. Adjusting code is configured to cause the at least one processor to adjust elements of the input image to generate substitute representations. Selecting code is configured to cause the at least one processor to select, as the substitute image, the elements with the smallest distortion loss between the input image and the substitute representations. The training model of the E2E NIC framework may be trained based on the learning rate of the input image, the number of updates to the input image, and the distortion loss. One or more substitute images may be determined based on different learning rates for the input image. The learning rates are selected based on characteristics of the input image, such as its RGB variance or its RD performance.

Additionally, apparatus 600 may use an artificial neural network based on a pretrained image coding model, in which the parameters of the network are fixed and the gradient is used to update the input image.

Although FIG. 6 shows example blocks of the apparatus, in some implementations the apparatus may include additional blocks, fewer blocks, different blocks, or differently arranged blocks than those depicted in FIG. 6. Additionally or alternatively, two or more of the blocks of the apparatus may be combined.

Embodiments herein describe E2E image compression methods. The methods use substitution mechanisms to improve NIC coding efficiency within a flexible, general framework that accommodates various types of quality metrics.

The E2E image compression methods according to one or more embodiments may be used separately or combined in any order. Further, each of the methods (or embodiments), the encoder, and the decoder may be implemented by processing circuitry (for example, one or more processors or one or more integrated circuits). In one example, the one or more processors execute a program stored in a non-transitory computer-readable medium.

The foregoing disclosure provides illustration and description, but is not intended to be exhaustive or to limit the implementations to the precise forms disclosed. Modifications and variations are possible in light of the above disclosure or may be acquired from practice of the implementations.

As used herein, the term "component" is intended to be broadly construed as hardware, firmware, or a combination of hardware and software.

It will be apparent that the systems and/or methods described herein may be implemented in different forms of hardware, firmware, or a combination of hardware and software. The actual specialized control hardware or software code used to implement these systems and/or methods does not limit the implementations. Thus, the operation and behavior of the systems and/or methods were described herein without reference to specific software code, it being understood that software and hardware may be designed to implement the systems and/or methods based on the description herein.

Even though particular combinations of features are recited in the claims and/or disclosed in the specification, these combinations are not intended to limit the disclosure of possible implementations. In fact, many of these features may be combined in ways not specifically recited in the claims and/or disclosed in the specification. Although each dependent claim listed below may directly depend on only one claim, the disclosure of possible implementations includes each dependent claim in combination with every other claim in the claim set.

No element, act, or instruction used herein should be construed as critical or essential unless explicitly described as such. Also, as used herein, the articles "a" and "an" are intended to include one or more items, and may be used interchangeably with "one or more." Furthermore, as used herein, the term "set" is intended to include one or more items (e.g., related items, unrelated items, a combination of related and unrelated items, etc.), and may be used interchangeably with "one or more." Where only one item is intended, the term "one" or similar language is used. Also, as used herein, the terms "has," "have," "having," or the like are intended to be open-ended terms. Further, the phrase "based on" is intended to mean "based, at least in part, on" unless explicitly stated otherwise.

Claims

1. A substitutional end-to-end (E2E) neural image compression (NIC) method using a neural network, performed by at least one processor, the method comprising:
receiving an input image for an E2E NIC framework;
determining a substitute image based on a training model of the E2E NIC framework;
encoding the substitute image to generate a bitstream; and
mapping the substitute image to the bitstream to generate a compressed representation.

2. The method of claim 1, further comprising:
partitioning the input image into one or more blocks;
determining a substitute block for each of the one or more blocks based on the training model of the E2E NIC framework;
encoding the substitute block to generate a block bitstream; and
mapping the substitute block to the block bitstream to generate a compressed block,
wherein the one or more blocks have the same size, and each block of the one or more blocks has a different learning rate.

3. The method of claim 1, wherein the substitute image is determined by performing an optimization process of the training model of the E2E NIC framework, the optimization process comprising:
adjusting elements of the input image to generate substitute representations; and
selecting elements having the smallest distortion loss between the input image and the substitute representations and using them as the substitute image.

4. The method of claim 1, wherein the training model of the E2E NIC framework is trained based on a learning rate of the input image, a number of updates to the input image, and a distortion loss.

5. The method of claim 4, wherein a plurality of substitute images are determined based on learning rates, and
wherein the learning rates are selected based on characteristics of the input image.

6. The method of claim 5, wherein the characteristics of the input image are one of an RGB variance of the input image and an RD performance of the input image.

7. The method of claim 1, wherein the training model of the E2E NIC framework is an artificial neural network based on pretrained image coding, and
wherein parameters of the artificial neural network are fixed and a gradient is used to update the input image.

8. An apparatus for substitutional end-to-end (E2E) neural image compression (NIC) using a neural network, the apparatus comprising:
at least one memory configured to store program code; and
at least one processor configured to read the program code and operate as instructed by the program code,
wherein the program code comprises:
receiving code configured to cause the at least one processor to receive an input image for an E2E NIC framework;
first determining code configured to cause the at least one processor to determine a substitute image based on a training model of the E2E NIC framework;
first encoding code configured to cause the at least one processor to encode the substitute image to generate a bitstream; and
first mapping code configured to cause the at least one processor to map the substitute image to the bitstream to generate a compressed representation.

9. The apparatus of claim 8, wherein the program code further comprises:
partitioning code configured to cause the at least one processor to partition the input image into one or more blocks;
second determining code configured to cause the at least one processor to determine a substitute block for each of the one or more blocks based on the training model of the E2E NIC framework;
second encoding code configured to cause the at least one processor to encode the substitute block to generate a block bitstream; and
second mapping code configured to cause the at least one processor to map the substitute block to the block bitstream to generate a compressed block,
wherein the one or more blocks have the same size, and each block of the one or more blocks has a different learning rate.

10. The apparatus of claim 8, wherein the substitute image is determined by performing an optimization process of the training model of the E2E NIC framework, and the program code further comprises:
adjusting code configured to cause the at least one processor to adjust elements of the input image to generate substitute representations; and
selecting code configured to cause the at least one processor to select elements with the smallest distortion loss between the input image and the substitute representations to use as the substitute image.

11. The apparatus of claim 8, wherein the training model of the E2E NIC framework is trained based on a learning rate of the input image, a number of updates to the input image, and a distortion loss.

12. The apparatus of claim 11, wherein a plurality of substitute images are determined based on learning rates, and
wherein the learning rates are selected based on characteristics of the input image.

13. The apparatus of claim 12, wherein the characteristics of the input image are one of an RGB variance of the input image and an RD performance of the input image.

14. The apparatus of claim 8, wherein the training model of the E2E NIC framework is an artificial neural network based on pretrained image coding, and
wherein parameters of the artificial neural network are fixed and a gradient is used to update the input image.

15. A non-transitory computer-readable medium storing instructions that, when executed by at least one processor for substitutional end-to-end (E2E) neural image compression (NIC), cause the at least one processor to:
receive an input image for an E2E NIC framework;
determine a substitute image based on a training model of the E2E NIC framework;
encode the substitute image to generate a bitstream; and
map the substitute image to the bitstream to generate a compressed representation.

16. The non-transitory computer-readable medium of claim 15, wherein the instructions, when executed by the at least one processor, further cause the at least one processor to:
partition the input image into one or more blocks;
determine a substitute block for each of the one or more blocks based on the training model of the E2E NIC framework;
encode the substitute block to generate a block bitstream; and
map the substitute block to the block bitstream to generate a compressed block,
wherein the one or more blocks have the same size, and each block of the one or more blocks has a different learning rate.

17. The non-transitory computer-readable medium of claim 15, wherein the instructions, when executed by the at least one processor, further cause the at least one processor to perform an optimization process of the training model of the E2E NIC framework, the optimization process comprising:
adjusting elements of the input image to generate substitute representations; and
selecting elements with the smallest distortion loss between the input image and the substitute representations and using them as the substitute image.

18. The non-transitory computer-readable medium of claim 15, wherein the training model of the E2E NIC framework is trained based on a learning rate of the input image, a number of updates to the input image, and a distortion loss.

19. The non-transitory computer-readable medium of claim 18, wherein a plurality of substitute images are determined based on learning rates,
wherein the learning rates are selected based on characteristics of the input image, and
wherein the characteristics of the input image are one of an RGB variance of the input image and an RD performance of the input image.

20. The non-transitory computer-readable medium of claim 15, wherein the training model of the E2E NIC framework is an artificial neural network based on pretrained image coding, and
wherein parameters of the artificial neural network are fixed and a gradient is used to update the input image.