KR20230140755A

KR20230140755A - Method and Apparatus for Improving Video Compression Performance for Video Codecs

Info

Publication number: KR20230140755A
Application number: KR1020220039403A
Authority: KR
Inventors: 이상윤; 강홍구
Original assignee: 연세대학교 산학협력단
Priority date: 2022-03-30
Filing date: 2022-03-30
Publication date: 2023-10-10
Also published as: KR102604657B1

Abstract

The disclosed embodiment provides a method and apparatus for improving image compression performance. The method includes: performing a neural network operation on an input image to obtain at least one compression parameter according to the characteristics of the input image; adjusting, according to the at least one compression parameter, the weights of an artificial neural network that preprocesses the input image; preprocessing the input image with the weight-adjusted artificial neural network to obtain a compressed image; and encoding the compressed image to obtain an encoded image. Because the temporal and spatial resolution of the input image is compressed and restored adaptively according to the characteristics of the input image, the compression rate can be greatly improved while degradation in the quality of the restored image is suppressed as much as possible even at the improved compression rate.

Description

Method and Apparatus for Improving Video Compression Performance for Video Codecs

The disclosed embodiments relate to a method and apparatus for improving video compression performance, and more particularly to a method and apparatus that can improve the efficiency of a video codec by adaptively adjusting spatiotemporal resolution according to the characteristics of the input image.

As video communication and video-based OTT (over-the-top) media services grow rapidly, video compression and codec technology for efficiently transmitting or storing video signals continues to gain importance. Existing legacy video codecs such as H.26L and HEVC (High Efficiency Video Coding), which are built on signal processing and probability/statistics theory, have been successfully commercialized in a wide range of applications through international standardization by bodies such as ITU-T and MPEG.

Meanwhile, display screen sizes keep increasing to meet the demands of users who prefer high-definition video, and the temporal and spatial resolution of video is rising rapidly as well. As a result, the amount of information needed to represent high-definition video has increased, and the memory capacity required to store such video and the network capacity required for streaming services are growing exponentially.

Video compression technology is therefore being actively researched in order to process high-definition video efficiently. Video compression represents video data with fewer bits while keeping picture quality as close to the original as possible; by reducing the amount of data needed to represent the video, it improves the efficiency of transmission and storage. A codec can also be regarded as part of video compression technology.

Meanwhile, advances in deep learning with artificial neural networks have produced, in low-level computer vision as well, super-resolution and frame interpolation techniques that can restore a compressed image close to the original. With deep learning, an input image can be restored closer to the original even when its scale or frame rate has been reduced; that is, distortion in restoring compressed images can be reduced compared with conventional hand-crafted algorithms. Accordingly, approaches have been proposed for video compression in which an artificial neural network preprocesses the input image by reducing its temporal and spatial resolution, an existing codec encodes the reduced image to greatly raise the compression rate, the decoder of the existing codec decodes the compressed bitstream, and an artificial neural network then restores the resolution of the image.

Compression techniques that combine a codec with artificial neural networks in this way have achieved substantial gains in compression performance when coupled with existing video codec modules. However, because the artificial neural networks that perform the pre- and post-processing for the video codec are trained end to end, their weights are fixed at training time, and they still cannot clearly reflect the diverse characteristics of images.

If the size and frame rate of the input image are reduced during encoding, the total amount of information in the image decreases, so the bitrate of the encoded bitstream can be made lower than that of the original image. However, when input images with diverse characteristics are compressed in the same way, the amount of information lost differs from image to image. Consequently, when bilinear downsampling and frame rate downsampling are simply applied, the distortion of the recovered image may far outweigh the bitrate gain. For example, conventional approaches reduced the spatial resolution of the input video signal at a ratio of N:1, encoded and decoded it with a legacy codec, and then restored an image at the original resolution using a super-resolution technique; alternatively, they restored the temporal resolution using a frame interpolation technique. However, because the temporal and spatial characteristics of an image vary with the type of content, the shooting method, and so on, simply applying super-resolution and frame interpolation can degrade the quality of the restored image. In other words, because compression performance is not uniform across input image characteristics, compression efficiency cannot be said to improve in general.

Korean Patent Publication No. 10-2018-0119753 (published November 5, 2018)

An object of the disclosed embodiments is to provide a method and apparatus for improving image compression performance that can improve the compression rate by compressing and restoring each input image differently according to its characteristics.

Another object of the disclosed embodiments is to provide a method and apparatus for improving image compression performance that can suppress degradation in the quality of the restored image as much as possible while improving the compression rate.

A method for improving image compression performance according to an embodiment includes: performing a neural network operation on an input image to obtain at least one compression parameter according to the characteristics of the input image; adjusting, according to the at least one compression parameter, the weights of an artificial neural network that performs preprocessing on the input image; performing preprocessing on the input image with the weight-adjusted artificial neural network to obtain a compressed image; and encoding the compressed image to obtain an encoded image.

In the step of adjusting the weights, the weights of each computational layer may be adjusted by mixing the weights of a main kernel and of at least one sub-kernel, both included in each of a plurality of computational layers of the artificial neural network that performs the preprocessing, at a ratio according to the at least one compression parameter.

In the step of adjusting the weights, the weights constituting the main kernel may be weighted by the ratio (1-α) according to the compression parameter, the weights constituting the sub-kernel may be weighted by the compression parameter (α), and the two may be summed to adjust the weights.
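The blend described above, (1-α) times the main kernel plus α times the sub-kernel, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the embodiment's implementation: the function name `blend_kernels`, the nested-list kernel representation, and the sample values are all hypothetical.

```python
from typing import List

Kernel = List[List[float]]  # a 2-D kernel stored as nested lists (illustrative only)


def blend_kernels(main: Kernel, sub: Kernel, alpha: float) -> Kernel:
    """Return (1 - alpha) * main + alpha * sub, element by element."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if len(main) != len(sub) or any(len(m) != len(s) for m, s in zip(main, sub)):
        raise ValueError("main and sub kernels must have the same shape")
    return [
        [(1.0 - alpha) * m + alpha * s for m, s in zip(main_row, sub_row)]
        for main_row, sub_row in zip(main, sub)
    ]


# Hypothetical 2x2 kernels, chosen only to make the arithmetic easy to follow.
main_kernel = [[1.0, 0.0], [0.0, 1.0]]
sub_kernel = [[0.0, 2.0], [2.0, 0.0]]
blended = blend_kernels(main_kernel, sub_kernel, alpha=0.25)
print(blended)  # [[0.75, 0.5], [0.5, 0.75]]
```

With α = 0 the layer keeps the main kernel unchanged, and with α = 1 it uses the sub-kernel alone, so α moves the layer continuously between the two.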

In the step of adjusting the weights, when there are a plurality of compression parameters, the weights of the main kernel may first be adjusted by mixing in the weights of a sub-kernel at a ratio according to one compression parameter; the weights of a sub-kernel may then be mixed, sequentially and at a ratio according to each other compression parameter, into the main kernel weights adjusted by the previous compression parameter.
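The sequential case can be sketched as a loop in which each compression parameter blends one sub-kernel into the running result. The pairing of one sub-kernel per parameter is an assumption made for this illustration (the text only requires at least one sub-kernel), and `blend_sequentially` and the flattened weight lists are hypothetical.

```python
from typing import List, Sequence


def blend_sequentially(
    main: Sequence[float],
    subs: Sequence[Sequence[float]],
    params: Sequence[float],
) -> List[float]:
    """Blend each sub-kernel into the running main-kernel weights in turn.

    Assumption for illustration: subs[i] is the sub-kernel paired with params[i].
    """
    if len(subs) != len(params):
        raise ValueError("need exactly one sub-kernel per compression parameter")
    weights = list(main)
    for sub, p in zip(subs, params):
        if len(sub) != len(weights):
            raise ValueError("sub-kernel size must match the main kernel")
        # The result of the previous blend acts as the main kernel for this one.
        weights = [(1.0 - p) * w + p * s for w, s in zip(weights, sub)]
    return weights


# Hypothetical flattened kernels and two parameters.
result = blend_sequentially([1.0, 1.0], [[0.0, 2.0], [4.0, 4.0]], [0.5, 0.25])
print(result)  # [1.375, 2.125]
```

Because each step reuses the previous result, the order of the parameters matters: swapping the two parameters in the example generally gives different weights.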

The step of obtaining the at least one compression parameter may use two or more artificial neural networks, each of which receives the input image and performs a neural network operation, to obtain as the compression parameters a scale parameter indicating a ratio by which the size of the input image can be reduced and a complexity parameter indicating the intra-frame and inter-frame complexity of the input image.

In the step of obtaining the compressed image, the plurality of computational layers of the weight-adjusted artificial neural network may perform a neural network operation on the input image to output a feature map, and, if the at least one compression parameter includes the scale parameter, a scale-down layer may downscale the feature map according to the scale parameter to obtain the compressed image.

In the step of obtaining the encoded image, the encoded image may be obtained by including the at least one compression parameter in the encoded result.

The method for improving image compression performance may further include: receiving and decoding the encoded image to obtain a decoded image, and extracting the at least one compression parameter included in the encoded image; adjusting, according to the at least one compression parameter, the post-processing weights of an artificial neural network that performs post-processing on the decoded image; and performing post-processing on the decoded image with the weight-adjusted artificial neural network to obtain a restored image.
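One way to picture how the compression parameters travel with the encoded image and are extracted again on the decoding side is a small header placed in front of the codec bitstream. The byte layout below (a one-byte count followed by big-endian 32-bit floats) is purely hypothetical and is not taken from the embodiment or from any codec standard; `attach_params` and `extract_params` are illustrative names.

```python
import struct
from typing import List, Tuple


def attach_params(bitstream: bytes, params: List[float]) -> bytes:
    """Prefix the codec bitstream with a count byte and float32 parameters."""
    if len(params) > 255:
        raise ValueError("at most 255 parameters fit in the one-byte count")
    header = struct.pack(f">B{len(params)}f", len(params), *params)
    return header + bitstream


def extract_params(packet: bytes) -> Tuple[List[float], bytes]:
    """Split a packet back into its parameters and the codec bitstream."""
    (count,) = struct.unpack_from(">B", packet, 0)
    params = list(struct.unpack_from(f">{count}f", packet, 1))
    return params, packet[1 + 4 * count:]


# Placeholder bytes stand in for the legacy codec's output.
packet = attach_params(b"\x00\x01codec-bytes", [0.5, 0.25])
params, bitstream = extract_params(packet)
print(params)  # [0.5, 0.25]
```

The receiving side would pass `bitstream` to the legacy decoder unchanged and use `params` to adjust the post-processing weights, which keeps the codec itself untouched.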

In the step of adjusting the post-processing weights, the weights of each computational layer may be adjusted by mixing the weights of a main kernel and of at least one sub-kernel, both included in each of a plurality of computational layers of the artificial neural network that performs the post-processing, at a ratio according to the at least one compression parameter.

In the step of obtaining the restored image, the plurality of computational layers of the weight-adjusted artificial neural network may perform a neural network operation on the input image to output a feature map, and, if the at least one compression parameter includes a scale parameter, a scale-up layer may upscale the feature map according to the scale parameter to obtain the restored image.

An apparatus for improving image compression performance according to an embodiment includes one or more processors and a memory that stores one or more programs executed by the one or more processors, wherein the processor performs a neural network operation on an input image to obtain at least one compression parameter according to the characteristics of the input image, adjusts, according to the at least one compression parameter, the weights of an artificial neural network that performs preprocessing on the input image, performs preprocessing on the input image with the weight-adjusted artificial neural network to obtain a compressed image, and encodes the compressed image to obtain an encoded image.

Accordingly, the method and apparatus for improving image compression performance according to the embodiment analyze the characteristics of the input image and adaptively compress and restore its temporal and spatial resolution according to those characteristics, so that the compression rate can be greatly improved while degradation in the quality of the restored image is suppressed as much as possible even at the improved compression rate.

FIG. 1 shows the configuration of an apparatus for improving image compression performance according to an embodiment, divided according to the operations performed.
FIG. 2 shows a schematic configuration of the preprocessing network and the post-processing network of FIG. 1.
FIG. 3 shows an example of a detailed configuration of the plurality of computational layers of the preprocessing network and the post-processing network of FIG. 2.
FIG. 4 shows an example of a detailed configuration of the parameter filter of FIG. 4.
FIG. 5 shows a method for improving image compression performance according to an embodiment.
FIG. 6 is a diagram for explaining a computing environment including a computing device according to an embodiment.

Hereinafter, specific embodiments will be described with reference to the drawings. The following detailed description is provided to aid a comprehensive understanding of the methods, apparatuses, and/or systems described herein. However, this is only an example, and the present invention is not limited thereto.

In describing the embodiments, detailed descriptions of known technology related to the present invention are omitted where they would unnecessarily obscure the gist of an embodiment. The terms used below are defined in consideration of their functions in the present invention and may vary according to the intention or custom of users or operators; their definitions should therefore be based on the content of this specification as a whole. The terminology used in the detailed description is intended only to describe the embodiments and should in no way be limiting. Unless clearly used otherwise, singular expressions include the plural. In this description, expressions such as "including" or "having" indicate certain features, numbers, steps, operations, elements, or parts or combinations thereof, and should not be construed to exclude the presence or possibility of one or more other features, numbers, steps, operations, elements, or parts or combinations thereof beyond those described. In addition, terms such as "... unit", "... device", "module", and "block" used in the specification denote a unit that processes at least one function or operation, which may be implemented in hardware, in software, or in a combination of hardware and software.

FIG. 1 shows the configuration of an apparatus for improving image compression performance according to an embodiment, divided according to the operations performed.

In the illustrated embodiment, each component may have functions and capabilities other than those described below, and additional components beyond those described below may be included. In addition, in an embodiment, each component may be implemented using one or more physically separate devices, or by one or more processors or a combination of one or more processors and software, and, unlike the illustrated example, the components may not be clearly distinguished in their specific operation.

The apparatus for improving image compression performance shown in FIG. 1 may be implemented in logic circuitry by hardware, firmware, software, or a combination thereof, and may also be implemented using a general-purpose or special-purpose computer. The apparatus may be implemented using a hardwired device, a field programmable gate array (FPGA), an application specific integrated circuit (ASIC), or the like. The apparatus may also be implemented as a system on chip (SoC) including one or more processors and a controller.

Furthermore, the apparatus for improving image compression performance may be installed, in the form of software, hardware, or a combination thereof, on a computing device or server provided with hardware elements. A computing device or server may refer to any of various devices that include all or some of: a communication device, such as a communication modem, for communicating with various devices or with wired and wireless networks; a memory that stores data for executing a program; and a microprocessor that executes the program to perform computation and issue commands.

Referring to FIG. 1, the apparatus for improving image compression performance according to an embodiment may include an image compression module (100) and an image restoration module (200). The image compression module (100) and the image restoration module (200) may be implemented separately as an image compression device and an image restoration device, respectively. In that case, each of the two devices may further include a communication module and operate as an image transmitting device and an image receiving device, respectively.

The image compression module (100) acquires an image and compresses it for storage or transmission, and the image restoration module (200) receives the compressed image that has been stored or transmitted and restores it.

In particular, in the embodiment, the image compression module (100) uses an artificial neural network to determine the characteristics of the acquired image and performs compression by adaptively adjusting the scale or compression rate of the image according to the determined characteristics, and the image restoration module (200) likewise performs restoration reflecting the scale or compression rate that was applied according to the image characteristics determined by the image compression module (100). By compressing and restoring each image at a scale and compression rate suited to its characteristics, the compression rate can be improved while degradation in the quality of the restored image is suppressed as much as possible. In addition, the apparatus makes the fullest possible use of existing legacy codecs, so that it remains as compatible as possible with standard video compression techniques.

Specifically, the image compression module (100) may include an image acquisition module (110), an image characteristic estimation module (120), a preprocessing network (130), and an encoder (140).

The image acquisition module (110) acquires the input image to be compressed, that is, an input image whose data volume must be reduced for storage or transmission. The acquired input image may be a single-frame still image or a video composed of multiple frames. The image acquisition module (110) may be implemented as a camera module that captures the input image directly, or as a computer-readable storage medium, such as a memory, in which a captured input image is stored.

The image characteristic estimation module (120) performs a neural network operation with a trained artificial neural network on the input image acquired by the image acquisition module (110), estimates the characteristics of the input image, and obtains compression parameters according to the estimated characteristics. A compression parameter is a parameter determined so that input images can be compressed adaptively at different compression rates, and it is determined in consideration of the characteristics of the input image.

An image with a simple structure, such as a very monotonous image or a single-color flat image, and an image with a complex structure containing many objects and varied backgrounds and colors have very different characteristics. If images with such different characteristics are to be compressed at the same compression rate, compressing on the basis of the simple image greatly improves the compression rate, but the quality of the restored complex image is severely degraded. Conversely, compressing on the basis of the complex image preserves image quality, but the compression rate becomes so low that compression loses much of its purpose.

Accordingly, in the embodiment of FIG. 1, the image characteristic estimation module (120) determines the characteristics of the image in advance to obtain compression parameters, and the preprocessing network (130) compresses each input image differently according to the obtained compression parameters. The compression parameters that the image characteristic estimation module (120) obtains according to the image characteristics may be set in various ways; here, as an example, it is assumed that the module obtains a scale parameter (β) and a complexity parameter (α) as the compression parameters. Each compression parameter (α, β) may be obtained as a real value in the range [0, 1].

Because FIG. 1 assumes that the image characteristic estimation module (120) obtains the scale parameter (β) and the complexity parameter (α) as compression parameters, the module is shown as having a scale determination network (121) and a complexity determination network (122), each implemented as an artificial neural network. However, the image characteristic estimation module (120) may be provided with a separate artificial neural network for extracting each compression parameter, with the parameters distinguished according to the type of image characteristic to be obtained.

The scale determination network (121) receives the input image and performs a neural network operation to determine the scale parameter (β) for adjusting the resolution of the input image. Depending on the structural form of the background and objects it contains, an input image can be downscaled to a level from which it can still be restored. That is, even when the image is scaled down to a resolution lower than that of the input image, the quality of the restored image, brought back to the input resolution, may not differ greatly from the quality of the input image. As noted above, however, for an image containing a single color or simply structured objects, a restored image that differs little from the original input image can be obtained even when the scale-down ratio is raised considerably, whereas for an image containing objects with complex structure, raising the scale-down ratio can severely degrade the quality of the restored image. For example, if input images at UHD (Ultra High Definition) resolution of 3,840 × 2,160 are all compressed to FHD (Full High Definition) resolution of 1,920 × 1,080, the restored images will show very different quality depending on their characteristics.
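To make the size of that reduction concrete, the short calculation below compares pixel counts per frame, taking the standard UHD and FHD frame dimensions of 3,840 × 2,160 and 1,920 × 1,080. It is only arithmetic on the frame dimensions; the helper name `pixel_count` is illustrative.

```python
def pixel_count(width: int, height: int) -> int:
    """Number of pixels in one frame of the given size."""
    return width * height


uhd = pixel_count(3840, 2160)  # standard UHD frame
fhd = pixel_count(1920, 1080)  # standard FHD frame
reduction = uhd / fhd

print(uhd, fhd, reduction)  # 8294400 2073600 4.0
```

Halving both dimensions leaves one quarter of the pixels, so the restoration step has to reconstruct three of every four output pixels, which is far easier for flat, simple content than for densely textured content.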

Accordingly, the scale determination network 121 may perform a neural network operation on the input image to estimate a scale feature according to the resolution of the input image, and then estimate a scale parameter suited to the input image from the estimated scale feature. The scale determination network 121 may be configured to obtain a single scale parameter (β) applied at a uniform ratio in the horizontal and vertical directions or, in some cases, to obtain separate scale parameters (β1, β2) for the horizontal and vertical directions.

This is because, depending on its composition, an input image may exhibit different characteristics in the horizontal and vertical directions. For example, an image of a tree-lined road or of streetlights can later be restored at high quality with ease even if it is scaled down at a larger ratio in the horizontal direction than in the vertical direction. If it is scaled down at a larger ratio in the vertical direction, on the other hand, a high-quality image is difficult to restore. The scale determination network 121 may therefore obtain separate scale parameters (β1, β2) for the horizontal and vertical directions, so that compression performance improves while the image remains easy to restore.

Meanwhile, the complexity determination network 122 performs a neural network operation on the input image to estimate the complexity within each frame and between adjacent frames, and obtains a complexity parameter (α) from the estimated complexity. Image complexity is related to the scale implied by the image resolution, but independently of scale it is also strongly affected by the characteristics of the image itself. For example, an image of a single-colored wall and an image of a forest or a crowded plaza differ in intra-frame complexity regardless of scale. That is, intra-frame complexity varies with the similarity between pixels within a frame.

In addition, for an image composed of multiple frames, an image of a stationary object and an image of a fast-moving object differ in inter-frame complexity. That is, inter-frame complexity varies with how pixels change from one frame to another.
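
The two notions of complexity can be made concrete with hand-crafted stand-ins. The sketch below is not the complexity determination network 122, which is a trained neural network; it only assumes, for illustration, that intra-frame complexity can be approximated by pixel variance and inter-frame complexity by the mean absolute pixel change. The function names and the tiny 2 × 2 frames are hypothetical.

```python
from statistics import pvariance


def intra_frame_complexity(frame):
    """Population variance of the pixel values in one frame (crude proxy)."""
    pixels = [p for row in frame for p in row]
    return pvariance(pixels)


def inter_frame_complexity(prev_frame, next_frame):
    """Mean absolute pixel change between two equally sized frames."""
    diffs = [abs(a - b)
             for row_a, row_b in zip(prev_frame, next_frame)
             for a, b in zip(row_a, row_b)]
    return sum(diffs) / len(diffs)


# Hypothetical frames: a single-colored wall and a high-contrast scene.
flat_wall = [[128, 128], [128, 128]]
busy_scene = [[0, 255], [255, 0]]

wall_intra = intra_frame_complexity(flat_wall)      # identical pixels
busy_intra = intra_frame_complexity(busy_scene)     # dissimilar pixels
static_inter = inter_frame_complexity(flat_wall, flat_wall)
moving_inter = inter_frame_complexity(flat_wall, busy_scene)
```

Under these assumed proxies the wall scores zero on both measures, while the high-contrast frame and the changing frame pair score higher, which mirrors the distinction drawn in the text.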

Image compression and restoration according to image complexity needs to be treated separately from compression and restoration by scale-down and scale-up. In this embodiment, therefore, the complexity determination network 122 is configured separately from the scale determination network 121 and obtains the complexity parameter (α).

The preprocessing network 130 is implemented as a pre-trained artificial neural network and compresses the input image by performing a neural network operation on it. The preprocessing network 130 compresses the input image differently depending on the compression parameters obtained by the image characteristic estimation module 120, and it does so by varying the weights of the artificial neural network according to those parameters. Since the image characteristic estimation module 120 is assumed here to obtain the scale parameter (β) and the complexity parameter (α), the preprocessing network 130 can obtain a compressed image by compressing the input image differently according to the complexity parameter (α) while scaling it down according to the scale parameter (β).

The encoder 140 receives the compressed image, encodes it in a designated manner, and outputs an encoded image. The encoder 140 may be implemented as the encoder of an existing legacy codec. The encoder 140 is also used to compress the image, but it does so in a different manner from the preprocessing network 130, which is implemented as an artificial neural network. The preprocessing network 130 can instead be viewed as preprocessing the input image, according to its characteristics, so that the compression efficiency of the encoder 140 is maximized. That is, the preprocessing network 130 may be trained not simply to maximize the compression achieved by its own processing, but to compress the image so that the compression ratio of the encoded image output after the additional encoding by the encoder 140 is maximized.

The encoder 140 may also add the compression parameters to the encoded image before outputting it. When an image is encoded with a legacy codec, various encoding information is included in the encoded image, and the data space for this encoding information contains reserved space in which no data is recorded, left for information that may be added later. In this embodiment, the encoder 140 may add the bit string of the compression parameters to this reserved space of the encoded image as side information.
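
As a minimal sketch of what such a bit string might look like, the example below quantizes each compression parameter to 8 bits and packs the two values into two bytes. The 8-bit precision, the [0, 1] range for both parameters, and the function names are assumptions made for illustration; the text does not specify the encoding, and where the bytes are placed inside a particular codec's bitstream is codec-specific and not shown.

```python
import struct


def pack_params(alpha, beta):
    """Quantize two parameters in [0, 1] to 8 bits each and pack them.

    The 8-bit resolution is an illustrative assumption, not from the source.
    """
    quantized = [round(min(max(v, 0.0), 1.0) * 255) for v in (alpha, beta)]
    return struct.pack("BB", *quantized)


def unpack_params(payload):
    """Recover approximate (alpha, beta) from the two-byte side information."""
    q_alpha, q_beta = struct.unpack("BB", payload)
    return q_alpha / 255, q_beta / 255


side_info = pack_params(0.25, 1.0)
alpha_hat, beta_hat = unpack_params(side_info)
```

With this layout the decoder side recovers each parameter to within half a quantization step (1/510), which is the kind of precision trade-off a real side-information format would have to fix.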

Meanwhile, the image restoration module 200 includes a decoder 210 and a post-processing network 220. The decoder 210 receives the encoded image output from the encoder 140, decodes it, and outputs a decoded image. Like the encoder 140, the decoder 210 may be implemented as the decoder of a legacy codec. While decoding the encoded image to obtain the decoded image, the decoder 210 also extracts the compression parameters included in the encoded image and passes them to the post-processing network 220.

The post-processing network 220 is implemented as an artificial neural network. It receives the decoded image and the compression parameters from the decoder 210 and obtains a restored image by performing a neural network operation on the decoded image that differs according to the compression parameters. Like the preprocessing network 130, the post-processing network 220 adjusts its weights according to the received compression parameters and performs the neural network operation on the decoded image with the adjusted weights. The post-processing network 220 is likewise trained to restore image quality as fully as possible from the decoded image produced by the decoder 210.

FIG. 2 shows a schematic configuration of the preprocessing network and the post-processing network of FIG. 1, FIG. 3 shows an example of the detailed configuration of the operation layers of the preprocessing and post-processing networks of FIG. 2, and FIG. 4 shows an example of the detailed configuration of the parameter filter of FIG. 3.

In FIG. 2, (a) shows the schematic structure of the preprocessing network 130 and (b) shows that of the post-processing network 220. As shown in FIG. 2(a) and (b), the preprocessing network 130 and the post-processing network 220 may each be implemented as an artificial neural network that performs neural network operations with a plurality of operation layers (L1 to Ln). When the preprocessing network 130 and the post-processing network 220 scale the applied image down or up, they may further include a scale-down layer (Lsd) or a scale-up layer (Lsu), respectively, after the last operation layer (Ln).

Each of the operation layers (L1 to Ln) operates on the image input to the network, or on the feature map output by the previous layer, using weights determined by training, and outputs the result. The preprocessing network 130 and the post-processing network 220 are assumed here to be based on a convolutional neural network (CNN), which is widely used in image processing, so each operation layer may perform a convolution of the input image or feature map with its weights. In this embodiment, however, each operation layer (L1 to Ln) can adjust its weights according to the compression parameters while operating. That is, unlike the layers of an ordinary convolutional network, each layer varies its weights with the compression parameters, so that even when the same image or feature map is applied, different compression parameters yield different weights and therefore different output feature maps.

FIG. 3 shows, as an example, the detailed configuration of the i-th operation layer (Li) among the operation layers (L1 to Ln); the other operation layers may have the same configuration. Referring to FIG. 3, the operation layer (Li) receives the feature map (FMi-1) output by the previous operation layer (Li-1). If the operation layer is the first operation layer (L1) of the preprocessing network 130, it receives the input image; if it is the first operation layer (L1) of the post-processing network 220, it receives the decoded image.

The operation layer (Li) may include a main kernel (MK) and a plurality of parameter filters (PFL1 to PFLm), each of which has weights determined by training. The main kernel (MK) has weights that are independent of the compression parameters. Each parameter filter (PFL1 to PFLm), in contrast, has weights used to adjust the weights of the main kernel (MK), or the weights already adjusted by the preceding parameter filter, at a ratio given by its assigned compression parameter.

Each parameter filter (PFL1 to PFLm) blends its own weights into the weights of the main kernel (MK), or into the weights adjusted by the preceding parameter filter, at a ratio given by its compression parameter, so that the operation layer (Li) can operate with weights that vary with the compression parameters. The number (m) of parameter filters (PFL1 to PFLm) in the operation layer (Li) may be determined by the number of compression parameters obtained by the image characteristic estimation module 120. For example, when the image characteristic estimation module 120 obtains two compression parameters, the complexity parameter (α) and the scale parameter (β), as described above, each of the operation layers (L1 to Ln) may include two parameter filters (PFL1, PFL2). In that case the first parameter filter (PFL1) adjusts the weights of the main kernel (MK) according to the complexity parameter (α), and the second parameter filter (PFL2) further adjusts, according to the scale parameter (β), the main kernel weights already adjusted by the first parameter filter (PFL1).

If, however, the image characteristic estimation module 120 splits the scale parameter (β) into two scale parameters (β1, β2) for the horizontal and vertical directions, each of the operation layers (L1 to Ln) may treat the two scale parameters separately as well and include three parameter filters (PFL1 to PFL3).

Each of the parameter filters (PFL1 to PFLm) adjusts the main kernel (MK), or the output of the preceding parameter filter, by blending in its stored sub-weights at a weighting ratio given by the compression parameter. In other words, each parameter filter adjusts the weights that make up the main kernel (MK) according to its corresponding compression parameter.

FIG. 4 shows, as an example, the detailed configuration of the first parameter filter (PFL1) among the parameter filters (PFL1 to PFLm); the first parameter filter (PFL1) is assumed to receive the complexity parameter (α) from among the compression parameters.

Referring to FIG. 4, the parameter filter (PFL1) may include a filter transition network 410, a main attenuation module 420, a sub-kernel 430, a sub-weighting module 440, and a kernel mixing module 450.

The filter transition network 410 receives the main kernel (MK) or the output of the preceding parameter filter and connects it to the sub-kernel 430. It is the component that passes the main kernel (MK) or the preceding parameter filter's output to the sub-kernel 430, and it may be implemented, for example, as two 1 × 1 convolution filters joined by a skip connection, with a PReLU (Parametric Rectified Linear Unit) activation function placed between the two 1 × 1 convolution filters.

The main attenuation module 420 attenuates the main kernel (MK), or the output of the preceding parameter filter, by the ratio (1-α) determined by the compression parameter (α). The sub-kernel 430 consists of sub-weights determined in advance by training for the corresponding type of compression parameter (α). The sub-weighting module 440 multiplies the sub-weights of the sub-kernel 430 by the compression parameter (α). The kernel mixing module 450 then adds the weights attenuated by (1-α) to the sub-weights scaled by α, yielding a kernel whose weights are adjusted according to the compression parameter (α).
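
The attenuate, weight, and add steps amount to a convex combination, (1-α)·W + α·S, of the incoming weights W and the sub-weights S. The sketch below shows that blend on small nested lists and chains it over several parameter filters. It omits the filter transition network 410 and assumes each parameter lies in [0, 1]; the function names and the example kernels are hypothetical.

```python
def mix_kernels(prev_kernel, sub_kernel, p):
    """Return (1 - p) * prev_kernel + p * sub_kernel, element by element."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("the compression parameter is assumed to lie in [0, 1]")
    return [[(1.0 - p) * w + p * s for w, s in zip(row_w, row_s)]
            for row_w, row_s in zip(prev_kernel, sub_kernel)]


def apply_parameter_filters(main_kernel, sub_kernels, params):
    """Chain the blend: each filter adjusts the previous filter's output."""
    kernel = main_kernel
    for sub_kernel, p in zip(sub_kernels, params):
        kernel = mix_kernels(kernel, sub_kernel, p)
    return kernel


# Hypothetical 2 x 2 kernels: one main kernel and one sub-kernel per parameter.
main_kernel = [[1.0, 2.0], [3.0, 4.0]]
sub_alpha = [[0.0, 0.0], [0.0, 8.0]]
sub_beta = [[9.0, 9.0], [9.0, 9.0]]

half_mixed = mix_kernels(main_kernel, sub_alpha, 0.5)
unchanged = apply_parameter_filters(main_kernel, [sub_alpha, sub_beta], [0.0, 0.0])
```

When every parameter is 0 the chained result equals the main kernel, and when a parameter is 1 the corresponding sub-kernel fully replaces whatever came before it.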

The operation is then performed on the applied image or feature map using the weights of the adjusted kernel, and the result is output.

Meanwhile, the scale-down layer (Lsd) of the preprocessing network 130 and the scale-up layer (Lsu) of the post-processing network 220 receive the scale parameter (β) from among the compression parameters (α, β), and scale the feature map output by the last operation layer (Ln) down or up so that it has the resolution given by the scale parameter (β). For a scale parameter (β) taking values in the range [0, 1], the scale-down layer (Lsd) or scale-up layer (Lsu) can scale the feature map by a real-valued factor in the range [1, 2]. When the scale parameter (β) is split into horizontal and vertical components, the scale-down layer (Lsd) and the scale-up layer (Lsu) may each consist of two layers, one for the horizontal direction and one for the vertical direction.
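
The text gives the two ranges but not the mapping between them. The sketch below assumes the simplest choice, a linear map from β to a factor of 1 + β, and computes the resulting scaled-down resolution, with an optional second parameter for the vertical direction. The linear map, the rounding, and the function names are assumptions made for illustration.

```python
def scale_factor(beta):
    """Map beta in [0, 1] to a factor in [1, 2] (linear map is an assumption)."""
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must lie in [0, 1]")
    return 1.0 + beta


def scaled_down_size(width, height, beta_w, beta_h=None):
    """Resolution after scale-down; a separate beta_h allows per-axis ratios."""
    if beta_h is None:
        beta_h = beta_w  # uniform ratio in both directions
    return (round(width / scale_factor(beta_w)),
            round(height / scale_factor(beta_h)))


uniform = scaled_down_size(3840, 2160, 1.0)          # both axes halved
untouched = scaled_down_size(3840, 2160, 0.0)        # factor 1, no change
horizontal_only = scaled_down_size(3840, 2160, 1.0, 0.0)
```

Under this assumed mapping, β = 1 takes a 3,840 × 2,160 frame to 1,920 × 1,080, and β = 0 leaves the resolution unchanged.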

The scale-down layer (Lsd) or scale-up layer (Lsu) may also include a plurality of parameter filters (PFL1 to PFLm), adjusting its weights according to the compression parameters (α, β) before performing the scale-down or scale-up.

Because the image characteristic estimation module 120, the preprocessing network 130, and the post-processing network 220 are implemented as artificial neural networks, they must be trained in advance to set their weights, and the image compression performance improvement apparatus may therefore further include a learning module 310 during training. The learning module 310 feeds a training image to the apparatus as the input image, receives and analyzes the restored image output by the post-processing network 220, and backpropagates the resulting loss to train the artificial neural networks in the apparatus. However, the codec formed by the encoder 140 and the decoder 210 contains a non-differentiable quantization operation, so gradients for loss backpropagation cannot pass through it. To address this, training may use a codec simulation network that models the codec and stands in for the encoder 140 and the decoder 210. Codec simulation networks for legacy codecs are a known technique, and a separately trained artificial neural network may be used, so they are not described in detail here.
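
The gradient problem can be seen numerically. The sketch below uses plain rounding as a stand-in for codec quantization and estimates its slope with a central finite difference; it does not model any real codec or the codec simulation network, and the function names and step size are illustrative assumptions.

```python
def quantize(x, step=1.0):
    """Uniform quantizer used here as a stand-in for codec quantization."""
    return step * round(x / step)


def finite_difference(f, x, h=1e-3):
    """Central-difference estimate of the derivative of f at x."""
    return (f(x + h) - f(x - h)) / (2 * h)


# Slope of the quantizer away from a rounding boundary: flat, so no gradient.
slope_quantized = finite_difference(quantize, 0.3)

# Slope of a differentiable stand-in (here simply the identity), for contrast.
slope_identity = finite_difference(lambda x: x, 0.3)
```

Because the quantizer is piecewise constant, its slope is zero almost everywhere, so a loss computed after it carries no usable gradient back to the preprocessing network; a differentiable substitute is what allows training to proceed.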

The learning module 310 may perform training by backpropagating, into the post-processing network, a loss based on the difference between the training image and the restored image. The learning module 310 may classify the effective compression scale of each of a number of training images according to peak signal-to-noise ratio (PSNR) and bitrate, classify their complexity according to intra-frame and inter-frame complexity, and label each training image with the resulting compression parameters.
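
For reference, PSNR follows the standard definition 10·log10(MAX² / MSE), where MSE is the mean squared error between the two images and MAX is the peak pixel value. The sketch below computes it for images held as nested lists; the 8-bit peak of 255 and the function name are assumptions for illustration, and how the learning module 310 combines PSNR with bitrate is not specified in the text.

```python
import math


def psnr(reference, restored, peak=255.0):
    """PSNR in dB between two equally sized images given as nested lists."""
    errors = [(a - b) ** 2
              for row_a, row_b in zip(reference, restored)
              for a, b in zip(row_a, row_b)]
    mse = sum(errors) / len(errors)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak ** 2 / mse)


original = [[10, 20], [30, 40]]
off_by_one = [[11, 21], [31, 41]]  # every pixel differs by exactly 1

psnr_identical = psnr(original, original)
psnr_off_by_one = psnr(original, off_by_one)
```

A higher PSNR at the same bitrate indicates that a given scale-down ratio was restorable for that training image, which is the kind of comparison the labeling step relies on.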

When training with a training image, the learning module 310 may train the image characteristic estimation module 120 by computing, as a loss, the error between the compression parameters obtained by the image characteristic estimation module 120 and the compression parameters with which the training image is labeled, and backpropagating that loss.

In the image compression performance improvement apparatus of this embodiment, however, each layer contains sub-kernels as well as a main kernel, and the weights of the main kernel and the sub-kernels are mixed at ratios that depend on the compression parameters, which themselves vary from image to image. If training were performed on all of the training data at once, covering every compressible scale and complexity, the kernel weights of the artificial neural networks could take a very long time to converge or fail to converge at all. To avoid this and to shorten training, the learning module 310 first trains on training images whose compression scale and complexity take their minimum values. In this case the compression parameters (α, β) of the training data are all labeled 0, which not only removes the influence of the compression parameters (α, β) but also deactivates the parameter filters (PFL1 to PFLm) in the operation layers (L1 to Ln) of each artificial neural network. The learning module 310 can thus train the weights of the main kernel (MK) of each artificial neural network. Next, training proceeds with training images in which, in turn, only one of the compression parameters (α, β) takes its maximum value or only one takes its minimum value, so that the sub-kernel weights for each compression parameter are trained. Finally, the learning module 310 may fine-tune the kernel weights by feeding in training images regardless of their labeled parameters.
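
One way to read this staged schedule is as a rule that assigns each labeled training image to the earliest stage in which it is used. The sketch below encodes that reading for two parameters labeled in [0, 1]. The stage numbering, the exact-equality tests against 0 and 1, and the function name are assumptions for illustration; the text does not define the selection rule at this level of detail.

```python
def training_stage(alpha, beta):
    """Return the earliest stage (1, 2, or 3) in which a labeled sample is used."""
    params = (alpha, beta)
    if all(p == 0.0 for p in params):
        # Stage 1: every blend ratio is 0, so only the main kernel is trained.
        return 1
    at_max = sum(p == 1.0 for p in params)
    at_min = sum(p == 0.0 for p in params)
    if at_max == 1 or at_min == 1:
        # Stage 2: exactly one parameter sits at an extreme value.
        return 2
    # Stage 3: fine-tuning on samples regardless of their labels.
    return 3


stages = [training_stage(a, b)
          for a, b in [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (0.4, 0.7)]]
```

Starting with the all-zero labels matters because, with every ratio at 0, the blended kernel equals the main kernel, so the first stage cannot disturb the sub-kernel weights.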

FIG. 5 shows a method for improving image compression performance according to an embodiment.

The method of FIG. 5 is described with reference to FIGS. 1 to 4. First, an input image to be stored or transmitted is acquired (51). A trained artificial neural network then receives the input image and performs a neural network operation on it to extract features of the input image and obtain compression parameters (52). A plurality of compression parameters can be obtained by using several different artificial neural networks to extract different features of the input image. For example, a scale parameter indicating the resolution to which the image can be compressed with minimal loss, and a complexity parameter indicating the intra-frame and inter-frame complexity of the image, may be obtained.

Once the compression parameters are obtained, they are used to adjust the weights of the artificial neural network that preprocesses the input image (53). The preprocessing artificial neural network includes a plurality of operation layers (L1 to Ln) and at least one scale-down layer (Lsd), and each of the operation layers (L1 to Ln) may include one main kernel, whose weights are independent of the compression parameters, and a plurality of sub-kernels, one for each of the compression parameters obtained per image characteristic. Each of the operation layers (L1 to Ln) then has weights in which the main kernel and the sub-kernels are mixed at the ratios specified by the compression parameters.

Once the weights of the operation layers (L1 to Ln) have been adjusted according to the compression parameters, the preprocessing artificial neural network receives the input image and performs a neural network operation with the adjusted weights to output a feature map, and the at least one scale-down layer (Lsd) receives that feature map and scales it down according to the scale parameter among the compression parameters, completing the preprocessing and yielding a compressed image (54).

The compressed image is encoded by the encoder of a legacy codec to obtain an encoded image (55). The encoder may add the compression parameters to the encoded image.

The encoded image is applied to the decoder of the legacy codec, which decodes it to obtain a decoded image (56). The compression parameters included in the encoded image are then extracted (57).

Once the compression parameters are extracted, they are used to adjust the weights of the artificial neural network that post-processes the decoded image (58). Like the preprocessing artificial neural network, the post-processing artificial neural network includes a plurality of operation layers (L1 to Ln) and at least one scale-up layer (Lsu). Each of the operation layers (L1 to Ln) accordingly has weights in which the main kernel and the sub-kernels are mixed at the ratios specified by the compression parameters.

Once the weights of the plurality of operation layers (L1 to Ln) of the artificial neural network have been adjusted according to the compression parameters, the artificial neural network that performs post-processing receives the decoded image, performs a neural network operation with the adjusted weights, and outputs a feature map. At least one scale-up layer (Lsu) then receives the feature map and scales it up according to the scale parameter among the compression parameters, thereby obtaining a restored image (59).
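To illustrate how a scale parameter might drive the scale-down layer (Lsd) and the scale-up layer (Lsu), the following Python sketch resizes a 2-D feature map by an integer factor. In the embodiment these layers belong to the artificial neural networks; here they are replaced by fixed block averaging and sample repetition purely for illustration. The function names, the integer `factor`, and the list-based feature map are assumptions made for this example.

```python
from typing import List

FeatureMap = List[List[float]]  # a single-channel 2-D feature map


def scale_down(fmap: FeatureMap, factor: int) -> FeatureMap:
    """Average each factor x factor block (a fixed stand-in for Lsd)."""
    height, width = len(fmap), len(fmap[0])
    if height % factor or width % factor:
        raise ValueError("feature map size must be divisible by the factor")
    area = factor * factor
    return [
        [
            sum(fmap[y + dy][x + dx] for dy in range(factor) for dx in range(factor)) / area
            for x in range(0, width, factor)
        ]
        for y in range(0, height, factor)
    ]


def scale_up(fmap: FeatureMap, factor: int) -> FeatureMap:
    """Repeat each sample factor times in both directions (a fixed stand-in for Lsu)."""
    return [
        [value for value in row for _ in range(factor)]
        for row in fmap
        for _ in range(factor)
    ]


if __name__ == "__main__":
    features = [[1.0, 3.0], [5.0, 7.0]]
    small = scale_down(features, 2)
    print(small)
    print(scale_up(small, 2))
```

The restored map has the original dimensions but not the original detail, which is the loss that the post-processing network is intended to reduce.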

Although FIG. 5 describes the processes as being executed sequentially, this is merely illustrative. Those skilled in the art may apply various modifications and variations, such as changing the order shown in FIG. 5, executing one or more of the processes in parallel, or adding other processes, without departing from the essential characteristics of the embodiments of the present invention.

FIG. 6 is a diagram for explaining a computing environment including a computing device according to an embodiment.

In the illustrated embodiment, each component may have functions and capabilities different from those described below, and additional components other than those described below may be included. The illustrated computing environment 60 includes a computing device 61. In one embodiment, the computing device 61 may be one or more components included in the video compression performance improvement device shown in FIG. 1.

The computing device 61 includes at least one processor 62, a computer-readable storage medium 63, and a communication bus 65. The processor 62 may cause the computing device 61 to operate according to the example embodiments described above. For example, the processor 62 may execute one or more programs 64 stored in the computer-readable storage medium 63. The one or more programs 64 may include one or more computer-executable instructions that, when executed by the processor 62, cause the computing device 61 to perform operations according to the example embodiments.

The communication bus 65 interconnects the various components of the computing device 61, including the processor 62 and the computer-readable storage medium 63.

The computing device 61 may also include one or more input/output interfaces 66, which provide an interface for one or more input/output devices 68, and one or more communication interfaces 67. The input/output interface 66 and the communication interface 67 are connected to the communication bus 65. The input/output device 68 may be connected to other components of the computing device 61 through the input/output interface 66. Exemplary input/output devices 68 may include input devices such as a pointing device (a mouse, a trackpad, or the like), a keyboard, a touch input device (a touchpad, a touch screen, or the like), a voice or sound input device, various types of sensor devices, and/or an imaging device, and/or output devices such as a display device, a printer, a speaker, and/or a network card. The exemplary input/output device 68 may be included inside the computing device 61 as one of its components, or may be connected to the computing device 61 as a separate device distinct from the computing device 61.

Although the present invention has been described in detail above through representative embodiments, those of ordinary skill in the art will understand that various modifications and other equivalent embodiments are possible therefrom. Therefore, the true technical scope of protection of the present invention should be determined by the technical spirit of the appended claims.

100: Image compression module 110: Image acquisition module
120: Image characteristic estimation module 121: Scale determination network
122: Complexity determination network 130: Preprocessing network
140: Encoder 200: Image restoration module
210: Decoder 220: Post-processing network
410: Filter transition network 420: Main mitigation module
430: Sub-kernel 440: Sub-weighting module
450: Kernel mixing module L1 ~ Ln: Operation layer
Lsd: Scale-down layer Lsu: Scale-up layer
MK: Main kernel PFL1 ~ PFLm: Parameter filter

Claims

1. A method of improving video compression performance, performed by a computing device having one or more processors and a memory that stores one or more programs to be executed by the one or more processors, the method comprising:
obtaining at least one compression parameter according to characteristics of an applied input image by performing a neural network operation on the input image;
adjusting weights of an artificial neural network that performs preprocessing on the input image according to the at least one compression parameter;
obtaining a compressed image by performing preprocessing on the input image using the artificial neural network with adjusted weights; and
encoding the compressed image to obtain an encoded image.

2. The method of claim 1, wherein the adjusting of the weights comprises adjusting the weights of each operation layer by mixing the weights constituting a main kernel and at least one sub-kernel, which are included in each of a plurality of operation layers of the artificial neural network that performs the preprocessing, at a ratio according to the at least one compression parameter.

3. The method of claim 2, wherein the adjusting of the weights comprises adjusting the weights by applying a ratio (1-α) according to the compression parameter to the weights constituting the main kernel and applying the compression parameter (α) to the weights constituting the sub-kernel.

4. The method of claim 3, wherein the adjusting of the weights comprises, when there are a plurality of compression parameters, adjusting the weights of the main kernel by mixing in the weights of a sub-kernel at a ratio according to one compression parameter, and then sequentially mixing the weights of a sub-kernel, at a ratio according to another compression parameter, into the main kernel weights adjusted by the previous compression parameter.

5. The method of claim 1, wherein the obtaining of the at least one compression parameter comprises using two or more artificial neural networks, each of which receives the input image and performs a neural network operation, to obtain, as the compression parameters, a scale parameter indicating a ratio for reducing the size of the input image and a complexity parameter indicating intra-frame and inter-frame complexity of the input image.

6. The method of claim 5, wherein the obtaining of the compressed image comprises outputting a feature map by performing a neural network operation on the input image using a plurality of operation layers of the artificial neural network with adjusted weights and, when the scale parameter is included in the at least one compression parameter, downscaling the feature map in a scale-down layer according to the scale parameter to obtain the compressed image.

7. The method of claim 1, wherein the obtaining of the encoded image comprises including the at least one compression parameter in the encoded image.

8. The method of claim 7, further comprising:
obtaining a decoded image by receiving and decoding the encoded image, and extracting the at least one compression parameter included in the encoded image;
adjusting weights of an artificial neural network that performs post-processing on the decoded image according to the at least one compression parameter; and
obtaining a restored image by performing post-processing on the decoded image using the artificial neural network with adjusted weights.

9. The method of claim 8, wherein the adjusting of the weights comprises adjusting the weights of each operation layer by mixing the weights constituting a main kernel and at least one sub-kernel, which are included in each of a plurality of operation layers of the artificial neural network that performs the post-processing, at a ratio according to the at least one compression parameter.

10. The method of claim 9, wherein the obtaining of the restored image comprises outputting a feature map by performing a neural network operation on the input image using the plurality of operation layers of the artificial neural network with adjusted weights and, when the at least one compression parameter includes a scale parameter, upscaling the feature map in a scale-up layer according to the scale parameter to obtain the restored image.

11. A video compression performance improvement device comprising: one or more processors; and a memory storing one or more programs executed by the one or more processors,
wherein the processor:
obtains at least one compression parameter according to characteristics of an applied input image by performing a neural network operation on the input image;
adjusts weights of an artificial neural network that performs preprocessing on the input image according to the at least one compression parameter;
obtains a compressed image by performing preprocessing on the input image using the artificial neural network with adjusted weights; and
obtains an encoded image by encoding the compressed image.

12. The device of claim 11, wherein the processor adjusts the weights of each operation layer by mixing the weights constituting a main kernel and at least one sub-kernel, which are included in each of a plurality of operation layers of the artificial neural network that performs the preprocessing, at a ratio according to the at least one compression parameter.

13. The device of claim 12, wherein the processor adjusts the weights by applying a ratio (1-α) according to the compression parameter to the weights constituting the main kernel and applying the compression parameter (α) to the weights constituting the sub-kernel.

14. The device of claim 13, wherein, when there are a plurality of compression parameters, the processor adjusts the weights of the main kernel by mixing in the weights of a sub-kernel at a ratio according to one compression parameter, and then sequentially mixes the weights of a sub-kernel, at a ratio according to another compression parameter, into the main kernel weights adjusted by the previous compression parameter.

15. The device of claim 12, wherein the processor uses two or more artificial neural networks, each of which receives the input image and performs a neural network operation, to obtain, as the compression parameters, a scale parameter indicating a ratio for reducing the size of the input image and a complexity parameter indicating intra-frame and inter-frame complexity of the input image.

16. The device of claim 15, wherein the processor outputs a feature map by performing a neural network operation on the input image using the plurality of operation layers with adjusted weights and, when the scale parameter is included in the at least one compression parameter, downscales the feature map according to the scale parameter to obtain the compressed image.

17. The device of claim 11, wherein the processor obtains the encoded image by including the at least one compression parameter in the encoded image.

18. The device of claim 17, wherein the processor:
obtains a decoded image by receiving and decoding the encoded image, and extracts the at least one compression parameter included in the encoded image;
adjusts weights of an artificial neural network that performs post-processing on the decoded image according to the at least one compression parameter; and
obtains a restored image by performing post-processing on the decoded image using the artificial neural network with adjusted weights.

19. The device of claim 18, wherein the processor adjusts the weights of each operation layer by mixing the weights constituting a main kernel and at least one sub-kernel, which are included in each of a plurality of operation layers of the artificial neural network that performs the post-processing, at a ratio according to the at least one compression parameter.

20. The device of claim 19, wherein the processor outputs a feature map by performing a neural network operation on the input image using the plurality of operation layers of the artificial neural network that performs the post-processing with adjusted weights and, when the at least one compression parameter includes a scale parameter, upscales the feature map according to the scale parameter to obtain the restored image.