KR20240072961A - Method and apparatus for processing feature map of images for machine vision

Info

Publication number: KR20240072961A
Application number: KR1020230161127A
Authority: KR
Inventors: 김동하; 윤용욱; 한규웅; 김재곤; 천승문
Original assignee: 한국항공대학교산학협력단; 주식회사 인시그널
Priority date: 2022-11-17
Filing date: 2023-11-20
Publication date: 2024-05-24
Also published as: WO2024107003A1

Abstract

A method and apparatus for processing feature maps of an image for machine vision are disclosed. A method of processing a feature map for machine vision according to an embodiment includes: aligning multi-channel, multi-scale feature maps extracted from the image into single-scale feature maps; generating a converted feature map by performing feature channel conversion on one or more of the single-scale feature maps; and encoding the converted feature map.

Description

{Method and apparatus for processing feature map of images for machine vision}

The present invention relates to image processing technology, and more particularly to technology that extracts a feature map of an input image and encodes and/or decodes the extracted feature map so that the feature map can be efficiently processed or transmitted for performing machine vision tasks.

With advances in deep learning and machine learning, a variety of artificial intelligence platforms and applications are being developed. As the amount of data processed by machines or people through deep learning or machine learning grows, demand is increasing for technology that encodes and decodes images and/or their features for efficient information transmission and real-time signal processing.

Conventional image (in particular, video) compression technology using deep learning or machine learning extracts a feature, feature map, feature vector, or latent vector from an input image, and then either transmits the extracted feature map as is or encodes it before transmission. The size and other properties of the extracted feature map may be defined by the design of the deep learning or machine learning network. That is, a feature map may be defined by one or more parameters among scale, number of channels, and data type.

A feature map defined in this way may be smaller than the input image, but a small feature map loses information relative to the original image. Reducing this loss calls for as large a feature map as possible, but the larger amount of data then makes transmission and real-time data processing difficult. Methods have therefore been proposed that perform entropy encoding and decoding on a larger feature map, which both reduces information loss and reduces the size of the data to be transmitted or processed.

For example, Korean Patent Publication No. 2018-0087264, "Image processing method and device using feature map compression" (Patent Document 1), discloses a technique in which a feature map generated by performing a convolution operation on an input image is processed or compressed in units of at least one line, thereby reducing the number of filter parameters and the amount of computation. Korean Patent Publication No. 2020-0026026, "Electronic device and control method for high-speed compression processing of feature maps of CNN utilization system" (Patent Document 2), discloses a technique that acquires a feature map of an input image, converts the feature map through a lookup table corresponding to it, and then compresses the converted feature map using a corresponding compression mode.

Patent Document 1: Korean Patent Publication No. 2018-0087264
Patent Document 2: Korean Patent Publication No. 2020-0026026

In object detection and segmentation, which are representative machine vision applications, multi-scale features are extracted from a single image or video, depending on the network structure used, in order to distinguish objects of various scales. The set of multi-scale feature maps is aligned and its channels are reduced to convert it into a single-scale feature of a size that compresses efficiently, and min-max normalization is then applied so that it can be compressed with an existing video compression codec.

When a feature map is compressed and restored using an existing video compression codec, the minimum and maximum values produced by the min-max normalization applied at compression time are transmitted and used at restoration. In this case, a separate single stream must be designed to carry the minimum and maximum values of the feature map.
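The min-max normalization described above can be illustrated with a minimal Python sketch. It assumes a flat list of floating-point feature values and a 10-bit codec input range; the function names `minmax_normalize` and `minmax_denormalize` and the bit depth are illustrative assumptions and are not taken from the cited documents. The sketch shows why a conventional decoder needs the minimum and maximum values as side information.

```python
def minmax_normalize(values, bit_depth=10):
    """Map floats to integer codes in [0, 2**bit_depth - 1].

    Returns the codes together with the (min, max) pair that a
    conventional decoder would need in order to undo the mapping.
    """
    lo, hi = min(values), max(values)
    levels = (1 << bit_depth) - 1
    span = (hi - lo) or 1.0  # avoid division by zero for a constant map
    codes = [round((v - lo) / span * levels) for v in values]
    return codes, lo, hi


def minmax_denormalize(codes, lo, hi, bit_depth=10):
    """Invert minmax_normalize using the transmitted min and max."""
    levels = (1 << bit_depth) - 1
    return [c / levels * (hi - lo) + lo for c in codes]


# Hypothetical feature values, for illustration only.
feature = [-1.5, 0.0, 0.25, 2.5]
codes, lo, hi = minmax_normalize(feature)
restored = minmax_denormalize(codes, lo, hi)
```

Without `lo` and `hi`, `minmax_denormalize` cannot recover the original value range, which is the side-information cost that the paragraph above refers to.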

One embodiment provides an apparatus and method for compressing and/or restoring feature maps having different resolutions and/or numbers of channels when encoding and/or decoding feature maps for machine vision.

One embodiment provides an apparatus and method for improving compression performance by encoding and/or decoding a smaller amount of data for a similar task performance result, using a conventional method of encoding and/or decoding machine vision images.

One embodiment provides an apparatus and method that do not need to transmit the minimum and maximum values of the feature map produced by min-max normalization, which are otherwise required when restoring multi-scale feature maps from a single-scale feature map decoded with a video compression codec.

One embodiment provides an apparatus and method that extract a feature map while accounting for the data loss that the neural network and compression may cause in the image or feature map, and that encode and decode the extracted feature map.

A method of processing a feature map of an image for machine vision according to an embodiment of the present invention for solving the above problem includes: aligning multi-channel, multi-scale feature maps extracted from the image into single-scale feature maps; generating a converted feature map by performing feature channel conversion on one or more of the single-scale feature maps; and encoding the converted feature map.

A method of processing a feature map according to another embodiment of the present invention for solving the above problem includes: performing size restoration on converted feature maps; and generating restored feature maps by performing channel-number restoration on the converted feature maps, wherein the restored feature maps include feature maps having different resolutions or numbers of channels.

According to one embodiment, an apparatus, method, and recording medium may be provided for compressing and/or restoring feature maps having different resolutions and/or numbers of channels when encoding and/or decoding feature maps for machine vision.

According to one embodiment, an apparatus, method, and recording medium may be provided that improve compression performance by encoding and/or decoding a smaller amount of data while achieving a similar task performance result, using an encoding and/or decoding method for machine vision images.

According to one embodiment, an apparatus, method, and recording medium may be provided that do not need to transmit the minimum and maximum values of the feature map produced by min-max normalization, which are otherwise required when restoring multi-scale feature maps from a single-scale feature map decoded with an existing video compression codec.

According to one embodiment, training that accounts for data loss in the feature maps extracted from the network can minimize the effect on network performance even when such data loss occurs.

FIG. 1 is a block diagram showing the configuration of an encoding device according to an embodiment.
FIG. 2 is a block diagram showing the configuration of a decoding device according to an embodiment.
FIG. 3 is a flowchart showing a method of processing a feature map according to an embodiment of the present invention.
FIG. 4 schematically shows an example of aligning multi-scale feature maps into single-scale feature maps of the same size.
FIG. 5 is an example of channel reduction in step S310 of FIG. 3.
FIG. 6 is an example of residual restoration of a multi-scale feature map.
FIG. 7 is another example of residual restoration of a multi-scale feature map.
FIG. 8 is an example of restoration of a multi-scale feature map.
FIG. 9 shows an example of denormalizing a feature map.
FIG. 10 is a flowchart showing a method of processing a feature map according to another embodiment of the present invention.

Hereinafter, embodiments of the present invention are described in detail with reference to the accompanying drawings. The terms and words used in this specification were selected in consideration of their functions in the embodiments, and their meanings may vary with the intention of the invention or with custom. Accordingly, a term used in the embodiments below follows its definition where this specification defines it specifically, and otherwise should be interpreted in the sense generally understood by those skilled in the art.

A module in this specification may mean hardware capable of performing the functions and operations associated with each name described herein, computer program code capable of performing specific functions and operations, or an electronic recording medium, for example a processor or microprocessor, loaded with computer program code capable of performing specific functions and operations. In other words, a module may mean a functional and/or structural combination of hardware for carrying out the technical idea of the present invention and/or software for driving that hardware.

FIG. 1 is a block diagram showing the configuration of an encoding device according to an embodiment. Referring to FIG. 1, the encoding device 100 may include a processing unit 110, a memory 130, a user interface (UI) input device 150, a UI output device 160, and storage 140, which communicate with one another through a bus 190. The encoding device 100 may further include a communication unit 120 connected to a network 199.

The processing unit 110 may be a central processing unit (CPU) or a semiconductor device that executes processing instructions stored in the memory 130 or the storage 140. The processing unit 110 may be at least one hardware processor.

The processing unit 110 may generate and process signals, data, or information that are input to the encoding device 100, output from the encoding device 100, or used inside the encoding device 100, and may perform inspection, comparison, and determination related to those signals, data, or information. In other words, in an embodiment, the generation and processing of data or information, and the inspection, comparison, and determination related to it, may be performed by the processing unit 110.

The processing unit 110 may include an inter prediction unit, an intra prediction unit, a switch, a subtractor, a transform unit, a quantization unit, an entropy encoding unit, an inverse quantization unit, an inverse transform unit, an adder, a filter unit, and a reference picture buffer. At least some of these components may be program modules and may communicate with an external device or system. The program modules may be included in the encoding device 100 in the form of an operating system, application program modules, and other program modules.

The program modules may be physically stored on various known storage devices. At least some of the program modules may also be stored in a remote storage device capable of communicating with the encoding device 100.

The program modules may include, but are not limited to, routines, subroutines, programs, objects, components, and data structures that perform a function or operation according to an embodiment or implement an abstract data type according to an embodiment.

The program modules may consist of instructions or code executed by at least one processor of the encoding device 100.

The processing unit 110 may execute the instructions or code of the inter prediction unit, intra prediction unit, switch, subtractor, transform unit, quantization unit, entropy encoding unit, inverse quantization unit, inverse transform unit, adder, filter unit, and reference picture buffer.

The storage unit may refer to the memory 130 and/or the storage 140. The memory 130 and the storage 140 may be various types of volatile or non-volatile storage media. For example, the memory 130 may include at least one of a read-only memory (ROM) 131 and a random access memory (RAM) 132.

The storage unit may store data or information used for the operation of the encoding device 100. In an embodiment, the data or information held by the encoding device 100 may be stored in the storage unit. For example, the storage unit may store pictures, blocks, lists, motion information, inter prediction information, and bitstreams.

The encoding device 100 may be implemented in a computer system that includes a computer-readable recording medium. The recording medium may store at least one module required for the encoding device 100 to operate. The memory 130 may store at least one module, and the at least one module may be configured to be executed by the processing unit 110.

Functions related to the communication of data or information of the encoding device 100 may be performed through the communication unit 120. For example, the communication unit 120 may transmit a bitstream to the decoding device 200 described below.

FIG. 2 is a block diagram showing the configuration of a decoding device according to an embodiment.

The decoding device 200 may include a processing unit 210, a memory 230, a user interface (UI) input device 250, a UI output device 260, and storage 240, which communicate with one another through a bus 290. The decoding device 200 may further include a communication unit 220 connected to a network 299.

The processing unit 210 may be a central processing unit (CPU) or a semiconductor device that executes processing instructions stored in the memory 230 or the storage 240. The processing unit 210 may be at least one hardware processor.

The processing unit 210 may generate and process signals, data, or information that are input to the decoding device 200, output from the decoding device 200, or used inside the decoding device 200, and may perform inspection, comparison, and determination related to those signals, data, or information. In other words, in an embodiment, the generation and processing of data or information, and the inspection, comparison, and determination related to it, may be performed by the processing unit 210.

The processing unit 210 may include an entropy decoding unit, an inverse quantization unit, an inverse transform unit, an intra prediction unit, an inter prediction unit, a switch, an adder, a filter unit, and a reference picture buffer. At least some of these components may be program modules and may communicate with an external device or system. The program modules may be included in the decoding device 200 in the form of an operating system, application program modules, and other program modules.

The program modules may be physically stored on various known storage devices. At least some of the program modules may also be stored in a remote storage device capable of communicating with the decoding device 200. The program modules may include, but are not limited to, routines, subroutines, programs, objects, components, and data structures that perform a function or operation according to an embodiment or implement an abstract data type according to an embodiment. The program modules may consist of instructions or code executed by at least one processor of the decoding device 200.

The processing unit 210 may execute the instructions or code of the entropy decoding unit, inverse quantization unit, inverse transform unit, intra prediction unit, inter prediction unit, switch, adder, filter unit, and reference picture buffer.

The storage unit may refer to the memory 230 and/or the storage 240. The memory 230 and the storage 240 may be various types of volatile or non-volatile storage media. For example, the memory 230 may include at least one of a read-only memory (ROM) 231 and a random access memory (RAM) 232.

The storage unit may store data or information used for the operation of the decoding device 200. In an embodiment, the data or information held by the decoding device 200 may be stored in the storage unit. For example, the storage unit may store pictures, blocks, lists, motion information, inter prediction information, and bitstreams.

The decoding device 200 may be implemented in a computer system that includes a computer-readable recording medium. The recording medium may store at least one module required for the decoding device 200 to operate. The memory 230 may store at least one module, and the at least one module may be configured to be executed by the processing unit 210.

Functions related to the communication of data or information of the decoding device 200 may be performed through the communication unit 220. For example, the communication unit 220 may receive a bitstream from the encoding device 100.

Hereinafter, the processing unit may refer to the processing unit 110 of the encoding device 100 and/or the processing unit 210 of the decoding device 200. For example, for functions related to prediction, the processing unit may refer to the switch. For functions related to inter prediction, the processing unit may refer to the inter prediction unit, the subtractor, and/or the adder. For functions related to intra prediction, the processing unit may refer to the intra prediction unit, the subtractor, and/or the adder. For functions related to transformation, the processing unit may refer to the transform unit and/or the inverse transform unit. For functions related to quantization, the processing unit may refer to the quantization unit and/or the inverse quantization unit. For functions related to entropy encoding and/or decoding, the processing unit may refer to the entropy encoding unit and/or the entropy decoding unit. For functions related to filtering, the processing unit may refer to the filter unit. For functions related to reference pictures, the processing unit may refer to the reference picture buffer.

The embodiments of the present invention described below relate to feature map image encoding and decoding for efficiently transmitting features of image information for machines or people. With advances in deep learning and machine learning, a variety of artificial intelligence platforms and applications are being developed, and as the amount of data processed by machines or people through deep learning or machine learning grows, methods of encoding and decoding images or features are required for efficient information transmission and real-time signal processing.

In object detection and segmentation, which are representative machine vision tasks, multi-scale features are extracted from a single image or video, depending on the network structure used, in order to distinguish objects of various scales. Because multi-scale features have various resolutions and many channels, the data is very large and difficult to reduce in size.

The present invention aligns the feature set to be compressed, readjusts the aligned feature set, and reduces its size, thereby efficiently converting the size of features that would otherwise be very large. In addition, when restoring the compressed features, a separate processing step is added to each restoration process, enabling effective restoration.

Existing feature compression and restoration not only requires a complex network configuration and a difficult training process to reduce the very large features, but also restores the various features in a single stream, which lowers restoration efficiency and degrades performance.

The present invention compresses features through a simple process of feature alignment, readjustment, and compression, and improves restoration capability by configuring, for each of the various features to be restored, a separate processing step corresponding to that feature.

In feature alignment and channel conversion, the channel conversion can be optimized with the compression ratio in mind, and content-adaptive feature compression can be performed by adaptively determining the degree of channel conversion during channel readjustment. The present invention thus not only compresses features effectively but also improves feature restoration, raising machine vision performance relative to compression performance.

According to one embodiment of the present invention, the step of transmitting the minimum and maximum values of the features is excluded from feature compression using a video compression codec. By excluding this transmission, which is otherwise an essential step of feature compression using a video compression codec, the design of a single stream for transmitting the minimum and maximum values can be omitted, and coding efficiency improves by the size of the minimum and maximum values.

FIG. 3 is a flowchart showing a feature map processing method according to an embodiment of the present invention. In FIG. 3, step S342 denormalizes the feature map, so the minimum and maximum values of the feature map need not be encoded into the bitstream or transmitted to the decoding apparatus. According to one embodiment, step S342 is optional and may be omitted.

Referring to FIG. 3, the feature map processing method includes a feature alignment step (S310). In the feature alignment step (S310), at least one of the multi-scale feature maps extracted from the input image may be aligned into feature maps of the same scale. FIG. 4 schematically shows an example of aligning multi-scale feature maps into single-scale feature maps of the same size.

A feature map may be expressed in at least one of the forms (H, W), (C, H, W), and (B, C, H, W). Here, C is the number of channels of the feature map, H is its height, and W is its width, each of which may be an integer greater than or equal to 0. B is the number of feature maps of size (C, H, W) and may be an integer greater than or equal to 0.

A set of feature maps may be a set of one or more feature maps having the same (C, H, W). For example, it may be a set of one or more feature maps whose channel count, height, and width are all equal, such as p1 = (C, H, W), p2 = (C, H, W), p3 = (C, H, W), …

A set of multi-scale feature maps may be a set of one or more feature maps having different (C, H, W). For example, it may be a set such as p1 = (C, H, W), p2 = (C, 2H, 2W), p3 = (C, 4H, 4W), … in which the channel count is C for every map and the height and width double from one map to the next. Alternatively, it may be a set such as p1 = (C, H, W), p2 = (C, H', W'), p3 = (C, H'', W''), … in which the channel count is C for every map and the heights and widths all differ. Alternatively, it may be a set such as p1 = (C, H, W), p2 = (C', 2H, 2W), p3 = (C'', 4H, 4W), … in which the channel counts differ from one another and the height and width double from one map to the next.

At least one map in the set of multi-scale feature maps may be aligned to at least one resolution. As one example, the maps may be scaled to the size of the smallest feature map and aligned.

For example, when p1 = (C, H, W), p2 = (C, 2H, 2W), p3 = (C, 4H, 4W), and p4 = (C, 8H, 8W), the features p2 to p4 may be downscaled to the size of p1, the smallest feature map. The downscaled feature maps may then be concatenated and aligned into a map of size (4C, H, W).

As another example, the maps may be scaled to the size of the largest feature map and aligned. For example, when p1 = (C, H, W), p2 = (C, 2H, 2W), p3 = (C, 4H, 4W), and p4 = (C, 8H, 8W), the features p1 to p3 may be upscaled to the size of p4, the largest feature map. The upscaled feature maps may then be concatenated and aligned into a map of size (4C, 8H, 8W).

As yet another example, the maps may be scaled to the size of an intermediate feature map and aligned. For example, when p1 = (C, H, W), p2 = (C, 2H, 2W), p3 = (C, 4H, 4W), and p4 = (C, 8H, 8W), p1 may be upscaled and p3 and p4 may be downscaled to the size of p2. The scaled feature maps may then be concatenated and aligned into a map of size (4C, 2H, 2W).

The examples above scale to a single size. When the maps are scaled to two or more resolutions, they may likewise be scaled and then aligned in the same manner.
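The smallest-map alignment described above can be sketched as follows. This is a minimal illustration, assuming block averaging as the downscaling filter and power-of-two size ratios; the description does not fix a particular filter, and the names `avg_pool_downscale` and `align_to_smallest` are hypothetical.

```python
import numpy as np

def avg_pool_downscale(x, factor):
    """Downscale a (C, H, W) map by an integer factor using block averaging.

    Block averaging is an assumption; any downscaling filter could be used.
    """
    c, h, w = x.shape
    return x.reshape(c, h // factor, factor, w // factor, factor).mean(axis=(2, 4))

def align_to_smallest(maps):
    """Downscale every map to the smallest spatial size and stack along channels."""
    target_h = min(m.shape[1] for m in maps)
    aligned = [avg_pool_downscale(m, m.shape[1] // target_h) for m in maps]
    return np.concatenate(aligned, axis=0)

C, H, W = 4, 8, 8
# p1..p4 with sizes (C, H, W), (C, 2H, 2W), (C, 4H, 4W), (C, 8H, 8W)
pyramid = [np.random.rand(C, H * s, W * s) for s in (1, 2, 4, 8)]
aligned = align_to_smallest(pyramid)
print(aligned.shape)  # (16, 8, 8), i.e. (4C, H, W)
```

Aligning to the largest or an intermediate map follows the same pattern with an upscaling step for the smaller maps.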

Referring again to FIG. 3, the feature map processing method includes a feature channel conversion process for the single-scale feature maps (S320). More specifically, in this step the multi-channel single-scale feature maps may be converted into a feature map of one or more channels by at least one of increasing, decreasing, or maintaining the number of channels.

A multi-channel feature map is a feature map having one or more channels. For example, the size of a multi-channel feature map can be expressed as (C, H, W), where C may be any positive integer. A multi-channel feature map may be formed by combining at least one of the set of feature maps and the set of multi-scale feature maps at a single resolution.

In converting the channels of the feature map, inter-channel readjustment may be performed. The inter-channel readjustment network may be composed of at least one of pooling, fully connected, convolution, and activation function layers.

Inter-channel readjustment may be performed by at least one of the following methods. For example, at least one representative value may be extracted from the single-resolution feature map.

For example, the representative value of each channel may be at least one of the mean, maximum, minimum, median, and mode of the samples of that channel. As one example, the representative value of each channel may be extracted through one or more convolution layers. When the single-resolution feature map has size (C, H, W), it may be represented as (C, 1, 1) by the representative value of each channel. Alternatively, when the single-resolution feature map has size (C, H, W), two representative values may be extracted from each channel so that it is represented as (2C, 1, 1).

For example, in inter-channel readjustment, the representative value of each channel may be adjusted through one or more fully connected layers. As one example, the (C, 1, 1) feature map formed by the representative values may be adjusted through one or more fully connected layers into a (C', 1, 1) feature map. Alternatively, the (2C, 1, 1) feature map formed by two representative values per channel may be adjusted through one or more fully connected layers into a (C', 1, 1) feature map. Here, C' may be equal to or different from C.

For example, in inter-channel readjustment, the single-resolution feature map may be readjusted using the adjusted representative values. The feature map may be readjusted by applying at least one of the operations (+, -, *, /) with the adjusted representative values. As one example, when the single-resolution feature map has size (C, H, W) and the adjusted representative values have size (C, 1, 1), the feature map may be readjusted by a channel-wise product. Alternatively, under the same sizes, the feature map may be readjusted by a channel-wise sum, or by a channel-wise difference.
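The readjustment path described above (representative value, fully connected layers, channel-wise product) might look like the sketch below. It assumes the per-channel mean as the representative value, two fully connected layers with ReLU and sigmoid activations, and random untrained weights; these choices and the name `readjust_channels` are illustrative and are not specified by the description.

```python
import numpy as np

rng = np.random.default_rng(0)

def readjust_channels(x, w1, w2):
    """Rescale each channel of a (C, H, W) map by an adjusted representative value."""
    rep = x.mean(axis=(1, 2))                      # representative values, shape (C,)
    hidden = np.maximum(w1 @ rep, 0.0)             # fully connected layer + ReLU
    scale = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))   # fully connected layer + sigmoid
    return x * scale[:, None, None], scale         # channel-wise product

C, H, W = 8, 4, 4
x = rng.standard_normal((C, H, W))
w1 = rng.standard_normal((C // 2, C))   # placeholder weights (untrained)
w2 = rng.standard_normal((C, C // 2))
y, scale = readjust_channels(x, w1, w2)
print(y.shape, scale.shape)
```

Replacing the product with a sum or a difference only changes the last line of `readjust_channels`, since the (C, 1, 1) values broadcast over (C, H, W) in the same way.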

According to one embodiment, in converting the channels of the feature map, the number of channels may be increased, decreased, or maintained. In this case, the channel conversion network of the feature map may be composed of at least one of pooling, fully connected, convolution, and activation function layers.

This channel conversion may be performed on at least one of the feature map, the set of feature maps, the set of multi-scale feature maps, and the readjusted feature map. For example, a readjusted feature map of size (C, H, W) may become a channel-converted feature map of size (C', H, W) through one or more convolution layers. Here, C' may be any positive integer, and may be equal to, greater than, or less than C.

This channel conversion may be performed on a feature map formed by at least one of the feature map alignment step (S310) and the inter-channel readjustment. For example, when the set of feature maps is p1 = (C, H, W), p2 = (C, 2H, 2W), p3 = (C, 4H, 4W), p4 = (C, 8H, 8W), the maps may be scaled and aligned into a feature map F of size (4C, H, W), which becomes a feature map F' through inter-channel readjustment. The readjusted feature map F' may then become a feature map F'' of size (C', H, W) through channel conversion. FIG. 5 illustrates the channel reduction in step S310.
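As a rough illustration of converting F' of size (4C, H, W) into F'' of size (C', H, W), the sketch below uses a single 1x1 convolution written with `numpy.einsum`. The kernel size, the random weights, and the name `convert_channels` are assumptions; the description only states that one or more convolution layers may be used.

```python
import numpy as np

def convert_channels(x, weight, bias=None):
    """Apply a 1x1 convolution: (C, H, W) -> (C', H, W), with weight of shape (C', C)."""
    y = np.einsum("oc,chw->ohw", weight, x)
    if bias is not None:
        y = y + bias[:, None, None]
    return y

rng = np.random.default_rng(0)
f_prime = rng.standard_normal((16, 8, 8))   # readjusted map F' of size (4C, H, W), C = 4
weight = rng.standard_normal((6, 16))       # placeholder weights, here C' = 6
f_double_prime = convert_channels(f_prime, weight)
print(f_double_prime.shape)  # (6, 8, 8)
```

Choosing C' smaller than, equal to, or larger than the input channel count gives channel reduction, maintenance, or increase respectively.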

According to one embodiment, the degree of channel conversion may be adjusted in consideration of the compression rate. For example, for high-quality compression (high bitrate), the degree of channel reduction may be increased to minimize the generated bitrate. For low-quality compression (low bitrate), the degree of channel reduction may be lowered to reduce the amount of information lost. In this way, the relationship between the degree of channel reduction and the bitrate generated by compression can be optimized.

Following the example above, a function relating the degree of channel conversion to the parameter that determines the degree of compression may be constructed and optimized. For example, an optimized function between the degree of channel reduction and the quantization parameter (QP) of the compression codec may be constructed so that the QP is adjusted according to the channel reduction. As another example, an optimized function between the degree of channel increase and the QP may be constructed so that the QP is adjusted according to the channel increase. As yet another example, an optimized function between channel maintenance and the QP may be constructed so that the QP is adjusted when the channel count is maintained. In each case, the parameters of the function may be additionally transmitted.

According to one embodiment, important channels may be identified during inter-channel readjustment so that the number of channels is converted adaptively. For example, the importance of each channel may be determined through an activation function during inter-channel readjustment, and channels of low importance may be excluded to perform adaptive channel reduction. In this case, because the number of channels changes adaptively, the number of channels needed for restoration may be transmitted as additional information. The activation function may be at least one of sigmoid, ReLU, PReLU, and leaky ReLU.
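One possible reading of this adaptive channel reduction is sketched below. It assumes that the importance scores are sigmoid outputs in [0, 1], that a fixed threshold decides which channels are kept, and that the indices of the kept channels are signalled so that dropped channels can be zero-filled at the decoder. The description itself only mentions transmitting the channel count; the threshold, the index signalling, and the function names are hypothetical.

```python
import numpy as np

def prune_channels(x, importance, threshold=0.5):
    """Keep only channels whose importance score reaches the threshold."""
    keep = np.flatnonzero(np.asarray(importance) >= threshold)
    return x[keep], keep   # len(keep) is the adaptively chosen channel count

def restore_channels(x_kept, keep, total_channels):
    """Rebuild a (total_channels, H, W) map, zero-filling the dropped channels."""
    out = np.zeros((total_channels,) + x_kept.shape[1:], dtype=x_kept.dtype)
    out[keep] = x_kept
    return out

x = np.arange(4 * 2 * 2, dtype=np.float64).reshape(4, 2, 2)
importance = np.array([0.9, 0.1, 0.7, 0.2])   # e.g. sigmoid outputs of the readjustment network
kept, keep = prune_channels(x, importance)
restored = restore_channels(kept, keep, total_channels=4)
print(kept.shape[0], keep.tolist())  # 2 [0, 2]
```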

Referring again to FIG. 3, the feature map processing method includes an encoding process for the converted feature map (S330). More specifically, in this step encoding (compression) may be performed on a feature map that has undergone at least one of the feature map alignment step (S310) and the feature map channel conversion step (S320).

Compression of the feature map in this step includes encoding and decoding processes. The feature map may be encoded/decoded using at least one of a conventional video compression codec and a neural-network-based video compression codec.

When at least one of a conventional video compression codec and a neural-network-based video compression codec is used, the feature map may be converted into a form suitable for video compression.

For example, a multi-channel feature map of size (C, H, W) may be converted into a single-frame monochrome feature map of size nH' × mW'. Here, n × m may be equal to or different from C, and H' and W' may be equal to or different from H and W.

As another example, a multi-channel feature map of size (C, H, W) may be converted into a single-frame color feature map of size 3 × nH' × mW'. Here, n × m × 3 may be equal to or different from C, and H' and W' may be equal to or different from H and W.

As yet another example, a multi-channel feature map of size (C, H, W) may be converted into a video of C' frames with resolution H' × W'. Here, the video may be monochrome or color, and C', H', and W' may be equal to or different from C, H, and W.
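The single-frame monochrome conversion can be illustrated with a simple tiling of channels, shown below for the special case n × m = C, H' = H, and W' = W. The raster-order placement of channels and the names `pack_to_frame` and `unpack_from_frame` are assumptions made for the sketch.

```python
import numpy as np

def pack_to_frame(x, n, m):
    """Tile a (C, H, W) map with C == n * m into one (n*H, m*W) monochrome frame."""
    c, h, w = x.shape
    if c != n * m:
        raise ValueError("this sketch requires C == n * m")
    # Channel i*m + j is placed at tile row i, tile column j.
    return x.reshape(n, m, h, w).transpose(0, 2, 1, 3).reshape(n * h, m * w)

def unpack_from_frame(frame, n, m):
    """Inverse of pack_to_frame: recover the (n*m, H, W) multi-channel map."""
    h, w = frame.shape[0] // n, frame.shape[1] // m
    return frame.reshape(n, h, m, w).transpose(0, 2, 1, 3).reshape(n * m, h, w)

x = np.arange(6 * 2 * 3).reshape(6, 2, 3)
frame = pack_to_frame(x, n=2, m=3)
print(frame.shape)  # (4, 9)
```

A C'-frame video layout would instead keep the channel axis as the time axis, so that each channel (or group of channels) becomes one frame.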

According to one embodiment, channel rearrangement may be performed to raise the correlation between adjacent channels and thereby improve compression efficiency. The channels may be rearranged according to the correlation between them, which may be estimated using at least one of the inter-channel MSE, the channel mean, and the channel median. The order of the rearranged channels may be additionally transmitted. The channel rearrangement may be at least one of spatial rearrangement and temporal rearrangement.
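A small sketch of rearrangement by channel mean follows. Sorting by ascending mean is only one of the criteria mentioned above, and treating the permutation itself as the transmitted order information is an assumption.

```python
import numpy as np

def reorder_by_mean(x):
    """Sort channels of a (C, H, W) map by ascending mean; return the map and the order."""
    order = np.argsort(x.mean(axis=(1, 2)), kind="stable")
    return x[order], order            # `order` plays the role of the side information

def undo_reorder(x_sorted, order):
    """Restore the original channel order from the signalled permutation."""
    inverse = np.argsort(order)
    return x_sorted[inverse]

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 4, 4)) + rng.uniform(-5, 5, size=(8, 1, 1))
x_sorted, order = reorder_by_mean(x)
x_back = undo_reorder(x_sorted, order)
```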

According to one embodiment, when the feature map is converted into a form suitable for video compression, the information required for the conversion may be additionally transmitted. For example, the information required by the normalization that converts the feature map into image form may be additionally transmitted. When min-max normalization is performed, the maximum and minimum values may be additionally transmitted. When mean and standard deviation normalization is performed, the mean and standard deviation may be additionally transmitted.
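Min-max normalization with the minimum and maximum carried as side information could be sketched as below. The 10-bit sample depth, uniform rounding, and function names are assumptions chosen for illustration.

```python
import numpy as np

def normalize_minmax(x, bit_depth=10):
    """Map a float feature map to unsigned integers in [0, 2**bit_depth - 1]."""
    lo, hi = float(x.min()), float(x.max())
    levels = (1 << bit_depth) - 1
    span = (hi - lo) or 1.0          # avoid division by zero for constant maps
    q = np.round((x - lo) / span * levels).astype(np.uint16)
    return q, lo, hi                 # lo and hi are the side information

def denormalize_minmax(q, lo, hi, bit_depth=10):
    """Invert normalize_minmax up to the rounding error."""
    levels = (1 << bit_depth) - 1
    return q.astype(np.float64) / levels * (hi - lo) + lo

rng = np.random.default_rng(0)
x = rng.uniform(-3.0, 5.0, size=(4, 8, 8))
q, lo, hi = normalize_minmax(x)
x_rec = denormalize_minmax(q, lo, hi)
```

The reconstruction error of this scheme is bounded by half a quantization step, (hi - lo) / (2 * (2**bit_depth - 1)).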

In step S330 described above, when the multi-channel feature map is converted into a single-frame feature map for use with at least one of a conventional video compression codec and a neural-network-based video compression codec, tile-based encoding/decoding may be performed to reduce the compression inefficiency caused by the boundaries between channels. For example, when the multi-channel feature map is converted into one frame of size nH' × mW', the tile size may be set to H' × W'.

In tile-based encoding/decoding, the degree of quantization may differ according to the region of interest. For example, a low quantization parameter may be set for a region in which an important feature map is placed, and a high quantization parameter may be set for a region in which an unimportant feature map is placed. In this case, information on the regions having different degrees of quantization may be additionally transmitted.

According to one embodiment, the feature map may be encoded/decoded in units of at least one of a sample, a line, a block, and a frame. The unit may be chosen according to at least one of the shape, size, and dimension of the input feature map.

In this case, information about the coding unit may be encoded/decoded according to the unit used. For example, when block-based encoding/decoding is performed, at least one of the size, partitioning, and shape of the block may be encoded/decoded. When line-based encoding/decoding is performed, at least one of the length, number, and shape of the lines may be encoded/decoded. When frame-based encoding/decoding is performed, at least one of the shape, size, number, and partitioning information of the frames may be encoded/decoded.

According to one embodiment, prediction-based encoding/decoding of the feature map may be performed. Prediction may be performed in units of at least one of a sample, a line, a block, and a frame, and the prediction method may vary according to the data characteristics of the feature map. The prediction method may be at least one of spatial prediction, temporal prediction, and channel prediction. In spatial prediction, the coding unit may be predicted from already encoded/decoded neighboring samples or predicted values. The neighboring samples may be at least one of samples adjacent to the current coding unit and samples not adjacent to it, and may be at least one of a single sample and a set of samples.

When the coding unit is predicted from already encoded/decoded neighboring samples, at least one of directional prediction, template prediction, dictionary prediction, and matrix product prediction may be used. In directional prediction, the current unit may be predicted by copying or interpolating already encoded/decoded neighboring samples along a direction. In template prediction, the already encoded/decoded neighboring samples may be searched for the template most similar to the current prediction unit, which is then used for prediction. In dictionary prediction, already encoded/decoded neighboring samples or prediction blocks of units may be stored in a separate memory, and the current block may be predicted from the stored dictionary. In matrix product prediction, the current block may be predicted as the product of already encoded/decoded neighboring samples and an arbitrary matrix. In temporal prediction, the coding unit may be predicted from temporally preceding or following frames. In channel prediction, when the feature map has a multi-channel form, the coding unit may be predicted from a channel other than the current channel.

The coding unit may vary according to the structure of the feature map. When the feature map has a multi-channel form and its channels are arranged according to inter-channel correlation, a coding unit that exploits this correlation may be specified. For example, when the channels are arranged by mean value, channels with similar mean values may be grouped into a coding unit. When the channels are arranged by inter-channel similarity, similar channels may be grouped into a coding unit.

In prediction-based encoding/decoding, the difference between the original signal and the prediction signal may be encoded/decoded. The prediction signal may be generated by at least one of the prediction methods described above, and the residual signal is the difference between that prediction signal and the original signal.
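To make the residual idea concrete, the following sketch applies a very simple channel prediction in which each channel is predicted from the previous one and only the difference is kept. It is lossless and works on integer samples; a real codec would predict from reconstructed samples and quantize the residual, which is not modeled here. The function names are hypothetical.

```python
import numpy as np

def channel_residuals(x):
    """Residual of each channel against the previous channel (channel prediction).

    The first channel has no predictor and is kept as-is.
    """
    res = x.copy()
    res[1:] = x[1:] - x[:-1]
    return res

def reconstruct_from_residuals(res):
    """Decoder side: prediction (previous reconstructed channel) plus residual."""
    return np.cumsum(res, axis=0)

rng = np.random.default_rng(0)
base = rng.integers(0, 1024, size=(1, 4, 4))
x = base + rng.integers(-2, 3, size=(6, 4, 4))   # six highly correlated channels
res = channel_residuals(x)
x_rec = reconstruct_from_residuals(res)
```

When neighboring channels are correlated, the residual channels have a much smaller dynamic range than the original channels, which is what makes them cheaper to entropy code.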

According to one embodiment, at least one piece of information about the features may be entropy encoded/decoded. For example, the information about the features may include at least one of the following: the maximum, minimum, mean, mode, and standard deviation of the feature map; the per-channel maximum, minimum, mean, mode, and standard deviation of the feature map; an index indicating a channel of the feature map; an index of a region of interest of the feature map; an index of a channel of interest of the feature map; feature samples and residual samples; and parameters of the function relating the degree of channel conversion to the degree of compression.

When at least one piece of the information about the features is entropy encoded/decoded, at least one of the following binarization methods may be used: truncated Rice binarization; k-th order Exp-Golomb binarization; fixed-length binarization; unary binarization; truncated unary binarization; and truncated binary binarization.
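Two of the listed binarizations are sketched below as bit strings. The Exp-Golomb variant uses the common convention of a zero-valued prefix followed by the binary form of value + 2^k; some codecs use the opposite prefix polarity, so the exact bit pattern here is an assumption rather than the format of any particular standard.

```python
def exp_golomb(value, k=0):
    """k-th order Exp-Golomb code of a non-negative integer, as a bit string."""
    if value < 0:
        raise ValueError("value must be non-negative")
    v = value + (1 << k)
    return "0" * (v.bit_length() - k - 1) + format(v, "b")

def exp_golomb_decode(bits, k=0):
    """Decode a single k-th order Exp-Golomb codeword produced by exp_golomb."""
    zeros = len(bits) - len(bits.lstrip("0"))
    nbits = zeros + k + 1
    return int(bits[zeros:zeros + nbits], 2) - (1 << k)

def truncated_unary(value, c_max):
    """Truncated unary code: `value` ones, then a terminating zero unless value == c_max."""
    if not 0 <= value <= c_max:
        raise ValueError("value out of range")
    return "1" * value + ("0" if value < c_max else "")

print(exp_golomb(3))          # 00100
print(exp_golomb(2, k=1))     # 0100
print(truncated_unary(2, 4))  # 110
```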

When entropy encoding/decoding is performed on the information about the features or on the binary information produced by the binarization, at least one of the following methods may be used: context-adaptive binary arithmetic coding (CABAC); context-adaptive variable length coding (CAVLC); and bypass coding.

The entropy encoding/decoding may be performed adaptively using at least one of the following pieces of coding information: the channel type, the size/shape of the current feature map, and the coding information of neighboring features.

Referring again to FIG. 3, the feature map processing method includes a decoding process for the encoded feature map (S340). More specifically, when the original feature map has been transformed in at least one of the feature map alignment step (S310), the feature channel conversion step (S320), and the feature map encoding step (S330), restoration to the original feature map may be performed.

According to one embodiment, the feature map to be restored may be expressed in at least one of the forms (H, W), (C, H, W), and (B, C, H, W). Here, C is the number of channels of the feature, H is its height, and W is its width, each of which may be an integer greater than or equal to 0. B is the number of features of size (C, H, W) and may be an integer greater than or equal to 0.

The set of feature maps to be restored may be a set of one or more feature maps having the same (C, H, W). For example, it may be a set of one or more feature maps whose channel count, height, and width are all equal, such as p1 = (C, H, W), p2 = (C, H, W), p3 = (C, H, W), …

The set of multi-scale feature maps to be restored may be a set of one or more feature maps having different (C, H, W). For example, it may be a set such as p1 = (C, H, W), p2 = (C, 2H, 2W), p3 = (C, 4H, 4W), … in which the channel count is C for every map and the height and width double from one map to the next. It may also be a set such as p1 = (C, H, W), p2 = (C, H', W'), p3 = (C, H'', W''), … in which the channel count is C for every map and the heights and widths all differ. It may also be a set such as p1 = (C, H, W), p2 = (C', 2H, 2W), p3 = (C'', 4H, 4W), … in which the channel counts differ from one another and the height and width double from one map to the next.

According to one embodiment, when the original feature has been transformed in at least one of the feature map alignment step (S310), the feature channel conversion step (S320), and the feature map compression step (S330), the original feature map can be restored through at least one of upscaling, downscaling, a convolution layer, a fully connected layer, and an activation function.

In restoring the original feature map, when the spatial size must be restored, restoration can be performed through at least one of upscaling, downscaling, and a convolution layer. For example, when a feature map of size (C, H', W') must be restored to an original feature map of size (C, H, W) and H' < H, W' < W, it can be restored through upscaling. Here, the upscaling may use at least one of linear interpolation, nearest-neighbor interpolation, and bicubic interpolation.

For example, when a feature map of size (C, H', W') must be restored to an original feature map of size (C, H, W) and H' > H, W' > W, it can be restored through downscaling. Here, the downscaling may use at least one of linear interpolation, nearest-neighbor interpolation, and bicubic interpolation.

For example, when a feature map of size (C, H', W') must be restored to an original feature map of size (C, H, W) and H' < H, W' < W, it can be restored through a convolution layer. Here, the convolution layer may be at least one of a transposed convolution layer and a convolution layer.

For example, when a feature map of size (C, H', W') must be restored to an original feature map of size (C, H, W) and H' > H, W' > W, it can be restored through a convolution layer. Here, the convolution layer may be at least one of a transposed convolution layer and a convolution layer.
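The interpolation-based spatial restoration described above can be illustrated with nearest-neighbor resizing, one of the listed interpolation options. This is a minimal sketch on nested Python lists; the helper `resize_nearest` is a hypothetical name, and the same routine covers both the upscaling case (H' < H) and the downscaling case (H' > H).

```python
def resize_nearest(fmap, out_h, out_w):
    """Resize a (C, H', W') nested-list feature map to (C, out_h, out_w)."""
    in_h, in_w = len(fmap[0]), len(fmap[0][0])
    return [
        [
            # Each output pixel copies the source pixel at the scaled coordinate.
            [channel[(y * in_h) // out_h][(x * in_w) // out_w] for x in range(out_w)]
            for y in range(out_h)
        ]
        for channel in fmap
    ]


small = [[[1, 2], [3, 4]]]         # shape (1, 2, 2)
up = resize_nearest(small, 4, 4)   # H' < H: upscaling to (1, 4, 4)
down = resize_nearest(up, 2, 2)    # H' > H: downscaling back to (1, 2, 2)
```

A learned alternative, as the text notes, would replace this fixed interpolation with a transposed or strided convolution layer.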

According to one embodiment, in restoring the original feature map, when the number of channels must be restored, restoration can be performed through at least one of a convolution layer, a fully connected layer, and an activation function. For example, when a feature map of size (C', H, W) must be restored to an original feature map of size (C, H, W), it can be restored through a convolution layer. Here, the convolution layer may be at least one of a transposed convolution layer and a convolution layer. The output of the convolution layer may then be passed through an activation function.
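As an illustration of channel-count restoration, the sketch below maps a (C', H, W) feature map to (C, H, W) with a 1x1 convolution followed by a ReLU activation. The choice of a 1x1 kernel and of ReLU, the function name `conv1x1_relu`, and the weight values are all assumptions made for the example; the specification only requires some convolution layer optionally followed by an activation function.

```python
def conv1x1_relu(fmap, weights, bias):
    """Map a (C', H, W) feature map to (C, H, W) with a 1x1 convolution and ReLU.

    weights has shape (C, C') and bias has length C; both are hypothetical here
    and would normally be learned.
    """
    c_in, height, width = len(fmap), len(fmap[0]), len(fmap[0][0])
    out = []
    for w_row, b in zip(weights, bias):
        plane = [
            [
                max(0.0, b + sum(w_row[k] * fmap[k][y][x] for k in range(c_in)))
                for x in range(width)
            ]
            for y in range(height)
        ]
        out.append(plane)
    return out


reduced = [[[1.0, -2.0]], [[0.5, 4.0]]]          # (C'=2, H=1, W=2)
weights = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # (C=3, C'=2)
bias = [0.0, 0.0, 0.0]
restored = conv1x1_relu(reduced, weights, bias)  # (C=3, H=1, W=2)
```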

According to one embodiment, in restoring the feature map, when the feature map to be restored is a multi-scale feature map, it may be restored by residual restoration.

FIGS. 6 and 7 show examples of residual restoration of multi-scale feature maps. Residual restoration may take at least one of a top-down form, as in FIG. 6, and a bottom-up form, as in FIG. 7. These examples restore four original feature maps from an input feature map of size [C', H', W']; in general, the number of restored feature maps may be N, where N is a positive integer.

In these examples, each "Process M" may be a different processing procedure and, depending on at least one of the feature map to be restored and its characteristics, may consist of at least one of upscaling, downscaling, a convolution layer, a transposed convolution layer, and an activation function. For example, in FIG. 6, when p1 to p4 have the same channel count but a height and width that double from one map to the next, {Process 2, Process 3, Process 4} may each be built from at least one of upscaling, downscaling, a convolution layer, a transposed convolution layer, and an activation function so as to match the growing height and width. As another example, in FIG. 6, when the input feature map has C' channels and the feature maps p1 to p4 to be restored all have C channels, {Process 2, Process 3, Process 4} may include the same procedure for converting the channel count from C' to C.
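Since FIGS. 6 and 7 are not reproduced in this text, the exact wiring of the residual restoration is not recoverable here. The sketch below shows one plausible reading, stated as an assumption: each restored level is the 2x-upscaled previous level plus a residual produced by that level's own "Process M" from the input. For brevity it operates on single (H, W) planes, and all function names are hypothetical.

```python
def upscale2x(plane):
    """Nearest-neighbor 2x upscaling of a single (H, W) plane."""
    out = []
    for row in plane:
        wide = [v for v in row for _ in range(2)]
        out.extend([wide, list(wide)])
    return out


def add(a, b):
    """Element-wise sum of two planes of equal size."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def restore_residual(x, processes):
    """Assumed wiring: p_1 = Process1(x); p_i = upscale2x(p_{i-1}) + Process_i(x)."""
    maps = [processes[0](x)]
    for process in processes[1:]:
        maps.append(add(upscale2x(maps[-1]), process(x)))
    return maps


x = [[1.0]]
processes = [
    lambda p: p,                                               # "Process 1"
    lambda p: [[10.0 * v for v in r] for r in upscale2x(p)],   # "Process 2" residual
]
p1, p2 = restore_residual(x, processes)
```

In a trained system each process would be a small network (convolution, transposed convolution, activation) rather than the fixed lambdas used here.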

According to one embodiment, when the size of the feature map is changed by at least one of the feature map alignment step (S310), the feature channel conversion step (S320), and the feature map compression step (S330), additional information may be required in the restoration process. In this case, the additional information can be transmitted and used in the restoration step.

For example, when a fixed restoration process would yield a size different from that of the original feature map, the size information of the original feature map can be transmitted so that a matching restoration process is applied.

According to one embodiment, in restoring the feature map, when the feature to be restored is a multi-scale feature map, it can be restored through a simple processing procedure. FIG. 8 shows an example of restoration of multi-scale feature maps. This example restores four original feature maps from an input feature map of size [C', H', W']; in general, the number of restored feature maps may be N, where N is a positive integer.

In this example, each "Process M" may be a different processing procedure and, depending on at least one of the feature to be restored and its characteristics, may consist of at least one of upscaling, downscaling, a convolution layer, a transposed convolution layer, and an activation function.

For example, in FIG. 6, when p1 to p4 have the same channel count but a height and width that double from one map to the next, {Process 2, Process 3, Process 4} may each be built from at least one of upscaling, downscaling, a convolution layer, a transposed convolution layer, and an activation function so as to match the growing height and width. As another example, in FIG. 6, when the input feature has C' channels and the features p1 to p4 to be restored all have C channels, {Process 2, Process 3, Process 4} may include the same procedure for converting the channel count from C' to C.

According to one embodiment, when the size of the feature map is changed by at least one of the feature map alignment step (S310), the feature channel conversion step (S320), and the feature map compression step (S330), additional information may be required in the restoration process. In this case, the additional information can be transmitted and used in the restoration step. For example, when a fixed restoration process would yield a size different from that of the original feature map, the size information of the original feature map can be transmitted so that a matching restoration process is applied.

Continuing to refer to FIG. 3, the decoding step (S340) for the encoded feature map may include a denormalization process (S342) for the decoded feature map. More specifically, denormalization may be applied to a feature map on which at least one of the feature map alignment step (S310), the feature channel conversion step (S320), and the feature map compression step (S330) has been performed. The denormalization process (S342) may be performed only when the feature map was normalized in at least one of the feature map alignment step (S310), the feature channel conversion step (S320), and the feature map compression step (S330).

The denormalized feature map may be expressed in at least one of the forms (H, W), (C, H, W), and (B, C, H, W). Here, C is the number of channels of the feature map, H is its height, and W is its width, each of which may be an integer greater than or equal to 0. B is the number of feature maps of size (C, H, W) and may be an integer greater than or equal to 0.

The feature map generated in the denormalization process may be expressed in at least one of the forms (2,1,1) and (B,2,1,1). B is the number of feature maps of size (2,1,1) and may be an integer greater than or equal to 0.

According to one embodiment, when the original feature map has been transformed in at least one of the feature map alignment step (S310), the feature channel conversion step (S320), and the feature map compression step (S330), the minimum and maximum values of the feature map can be generated through at least one of a fully connected layer and a convolution layer, and the feature map can be denormalized using them.

In generating the minimum and maximum values required to denormalize the feature map, inter-channel readjustment and channel reduction of the feature map can be performed. Inter-channel readjustment can be performed by at least one of the methods below.

In inter-channel readjustment, at least one representative value can be extracted from the single-resolution feature map. For example, the representative value of each channel may be at least one of the {mean, maximum, minimum, median} of the pixels in that channel. As another example, the representative value of each channel may be extracted through at least one convolution layer. For example, when the single-resolution feature map has size (C, H, W), it can be expressed as (C, 1, 1) using the representative value of each channel.

In channel reduction, the representative values of the channels can be passed through at least one fully connected layer to generate the minimum and maximum values. For example, the (C,1,1) feature map formed by the representative values can be adjusted through at least one fully connected layer to generate a (2,1,1) output holding the minimum and maximum values.

The minimum and maximum values generated in the channel reduction process can be learned by constructing a separate neural-network loss function. For example, when the loss function is built using the minimum and maximum values extracted from the feature map during feature map compression, the parameters that determine how the minimum and maximum values for denormalization are generated can be optimized. Here, the loss function may be at least one of {MSE, MAE}.
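The two loss options named above can be written directly. In this minimal sketch, `target` stands for the (min, max) pair measured on the encoder side and `pred` for the pair produced by the channel-reduction network; both names and the numeric values are illustrative assumptions.

```python
def mse(pred, target):
    """Mean squared error between two equal-length sequences."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def mae(pred, target):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


target = (-1.0, 6.0)   # (min, max) extracted during feature map compression
pred = (-0.5, 5.0)     # hypothetical decoder-side estimate

print(mse(pred, target))  # 0.625
print(mae(pred, target))  # 0.75
```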

FIG. 9 is a diagram showing an example of denormalizing the feature map.

Process 1 in FIG. 9 denotes the channel readjustment of the denormalization and may take at least one of the forms global max pooling, global average pooling, global min pooling, and global median pooling. For example, when a feature map of size [C,H,W] is input in FIG. 9, the feature map can be readjusted to size [C,1,1] by at least one of global max pooling, global average pooling, global min pooling, and global median pooling.
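The four global pooling options for Process 1 can be sketched as follows. The function `global_pool` and the `POOLERS` table are hypothetical names; the result is returned as a flat list of C values, which corresponds to the [C,1,1] shape in the text.

```python
from statistics import mean, median

# Map each pooling mode to the reduction applied over all H*W values of a channel.
POOLERS = {"max": max, "avg": mean, "min": min, "median": median}


def global_pool(fmap, mode="avg"):
    """Reduce a (C, H, W) nested-list feature map to C representative values."""
    pool = POOLERS[mode]
    return [pool([v for row in channel for v in row]) for channel in fmap]


fmap = [[[1.0, 2.0], [3.0, 6.0]], [[0.0, 0.0], [4.0, 8.0]]]  # (C=2, H=2, W=2)

print(global_pool(fmap, "max"))     # [6.0, 8.0]
print(global_pool(fmap, "min"))     # [1.0, 0.0]
print(global_pool(fmap, "median"))  # [2.5, 2.0]
```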

Process 2 in FIG. 9 denotes channel reduction and may take the form of at least one of a convolutional neural network and a fully connected neural network. For example, in FIG. 9, the feature map of size [C,1,1] produced by the channel readjustment above can be reduced to a feature map of size [2,1,1] through one or more layers of a convolutional neural network or a fully connected neural network. The two values of the reduced feature map represent the minimum value and the maximum value, respectively.

Process 3 in FIG. 9 denotes denormalizing the feature map using the generated minimum and maximum values. For example, in FIG. 9, the feature map of size [C,H,W] that was initially input can be denormalized using the minimum and maximum values generated through the channel readjustment and channel reduction.
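Processes 1 to 3 can be chained in a short sketch. It assumes that the encoder applied min-max normalization to [0, 1], so that denormalization is x * (max - min) + min; this excerpt does not state the normalization formula, so that choice is an assumption. The single fully connected layer and its weights are likewise hypothetical stand-ins for a trained network.

```python
def fully_connected(values, weights, bias):
    """Dense layer: weights has shape (out, in), bias has length out."""
    return [b + sum(w * v for w, v in zip(row, values)) for row, b in zip(weights, bias)]


def denormalize(fmap, lo, hi):
    """Invert min-max normalization (assumed): x * (hi - lo) + lo per element."""
    return [[[v * (hi - lo) + lo for v in row] for row in ch] for ch in fmap]


normalized = [[[0.0, 0.5]], [[1.0, 0.25]]]   # (C=2, H=1, W=2), values in [0, 1]

# Process 1: channel readjustment by global max pooling -> C representative values.
reps = [max(v for row in ch for v in row) for ch in normalized]

# Process 2: channel reduction C -> 2 with hypothetical weights of shape (2, C).
weights = [[-2.0, 0.0], [0.0, 6.0]]
lo, hi = fully_connected(reps, weights, [0.0, 0.0])

# Process 3: denormalize the original input with the generated (min, max).
restored = denormalize(normalized, lo, hi)
```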

FIG. 10 is a flowchart showing a feature map processing method according to another embodiment of the present invention. The processing method shown in FIG. 10 differs from that shown in FIG. 3 in that it further includes a step (S405) of extracting a feature map of the image. That is, steps S410, S420, S430, and S440 (including S442) of the processing method shown in FIG. 10 correspond to steps S310, S320, S330, and S340 (including S342), respectively, of the processing method shown in FIG. 3. To avoid unnecessary repetition, detailed descriptions of steps S410, S420, S430, and S440 (including S442) are omitted below.

Referring to FIG. 10, a feature map is first extracted from the image (S405). In this embodiment, the feature map is extracted in consideration of data loss.

The feature map extracted in this step may be at least one of a feature, a feature map, a feature vector, and a feature set extracted from an arbitrary layer of a machine vision network applied to a general image.

The feature map may be expressed in at least one of the forms [H, W], [C, H, W], and [B, C, H, W]. Here, C is the number of channels of the feature map, H is its height, and W is its width, each of which may be expressed as an integer greater than or equal to 0. B is the number of feature maps of size [C, H, W] and may be expressed as an integer greater than or equal to 0. A feature set may be a set of one or more feature maps that have the same [C, H, W] or different [C, H, W].

The feature map extracted in this step is obtained by defining a loss function that accounts for the data loss that may be caused by at least one of dimensionality reduction, quantization, and channel truncation.

Dimensionality reduction may refer to any method by which the spatial or temporal size can be reduced, such as downsampling, pooling, or an arbitrary network layer. For example, it may mean that, for an image or feature map of size [C, H, W], at least one of C, H, and W becomes smaller than its original value through at least one of these methods.

Quantization may refer to any method by which the range of data representation can be reduced, such as uniform quantization, non-uniform quantization, vector quantization, or compression using an existing video codec. For example, it may mean that an image or feature map represented with n bits is reduced to a representation with m bits, where n is a positive integer greater than m.
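The n-bit to m-bit reduction can be illustrated with the simplest uniform scheme, dropping low-order bits of an unsigned integer. This is only one of the quantization families the text lists, and the function names are hypothetical.

```python
def requantize(value, n_bits, m_bits):
    """Map an n-bit unsigned integer to m bits (n > m) by dropping low-order bits."""
    if not n_bits > m_bits > 0:
        raise ValueError("expected n_bits > m_bits > 0")
    return value >> (n_bits - m_bits)


def dequantize(code, n_bits, m_bits):
    """Approximate inverse: scale the m-bit code back to the n-bit range."""
    return code << (n_bits - m_bits)


code = requantize(1003, n_bits=10, m_bits=8)      # 10-bit value stored in 8 bits
approx = dequantize(code, n_bits=10, m_bits=8)    # reconstruction in the 10-bit range
print(code, approx, 1003 - approx)  # 250 1000 3
```

The residual of 3 in this example is the kind of data loss that the loss function of this step is meant to account for.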

Channel truncation may refer to any method by which an image or feature map expressed in three dimensions [C, H, W] is truncated channel by channel. Through channel truncation, an arbitrary number of upper-index channels of the image or feature map expressed as [C, H, W] can be removed. For example, removing the n upper-index channels of a feature map with c channels yields a feature map with c-n channels. As another example, removing the n lower-index channels of a feature map with c channels yields a feature map with c-n channels. As a further example, removing n channels at arbitrary indices of a feature map with c channels yields a feature map with c-n channels.
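The three truncation variants (upper-index, lower-index, and arbitrary-index channels) can be sketched with one helper. The function name `truncate_channels` and its parameters are illustrative assumptions.

```python
def truncate_channels(fmap, n, where="upper", indices=None):
    """Drop n channels from a list of C channel planes, returning C - n planes."""
    c = len(fmap)
    if where == "upper":
        drop = set(range(c - n, c))
    elif where == "lower":
        drop = set(range(n))
    else:  # "arbitrary": the caller supplies exactly n distinct channel indices
        drop = set(indices)
        if len(drop) != n:
            raise ValueError("indices must contain n distinct channel indices")
    return [ch for i, ch in enumerate(fmap) if i not in drop]


fmap = [[[i]] for i in range(5)]   # C=5 channels, each of shape (1, 1)

upper = truncate_channels(fmap, 2, "upper")                      # keeps channels 0, 1, 2
lower = truncate_channels(fmap, 2, "lower")                      # keeps channels 2, 3, 4
picked = truncate_channels(fmap, 2, "arbitrary", indices=[1, 3]) # keeps channels 0, 2, 4
```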

In this step, a loss function is defined in consideration of the data loss caused by at least one of the data loss methods above, and the feature map can be extracted by applying the defined loss function. Equation 1 is an example of the loss function defined in this step.

[Equation 1]

Here, L_task denotes an arbitrary loss function defined for training the neural network, evaluated on the parameters of the existing network. The term in Equation 1 that accounts for data loss is the difference term weighted by λ, where d(A, B) denotes the difference between A and B. The first argument of d is the existing network loss function, and the second is the loss function obtained with network parameters trained on data that includes the loss introduced by the data loss methods above. λ may be any real number that scales this term.
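Equation 1 is not reproduced in this text. A form consistent with the surrounding description, offered only as a reconstruction, is shown below. The symbols θ (parameters of the existing network) and θ̃ (parameters trained with the data loss included) are assumed names and do not come from the source.

```latex
% Reconstruction under assumptions: \theta and \tilde{\theta} are assumed symbols.
L = L_{\mathrm{task}}(\theta)
  + \lambda \, d\!\left( L_{\mathrm{task}}(\theta),\; L_{\mathrm{task}}(\tilde{\theta}) \right)
```

Under this reading, setting λ to 0 recovers the existing task loss, and larger λ penalizes any gap between the loss with and without data loss more strongly.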

Accordingly, this step defines a loss function L that includes the existing loss function and that minimizes its difference from the loss function obtained with network parameters that include the data lost through the data loss methods above.

For example, the final loss function is formed by taking the difference between the loss computed from features truncated by the channel truncation method and the loss computed from untruncated features, multiplying it by an arbitrary coefficient, and adding the result to the loss computed from the untruncated features. As another example, the final loss function is formed by taking the difference between the loss computed from features quantized by the quantization method and the loss computed from unquantized features, multiplying it by an arbitrary coefficient, and adding the result to the loss computed from the unquantized features.
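The composition described in these examples can be sketched numerically. The sketch takes the two task losses as already-computed numbers and assumes the absolute difference for d(A, B); the specification leaves d unspecified, so that choice, the function name `total_loss`, and the values are assumptions.

```python
def total_loss(loss_clean, loss_degraded, lam, d=lambda a, b: abs(a - b)):
    """Loss on undegraded features plus lam times the gap to the degraded-feature loss."""
    return loss_clean + lam * d(loss_clean, loss_degraded)


# Hypothetical values: task loss without and with channel truncation (or quantization).
loss = total_loss(loss_clean=0.5, loss_degraded=0.75, lam=2.0)
print(loss)  # 1.0
```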

In this way, by minimizing the difference between the loss computed when data loss from the above methods occurs and the loss computed when no data loss occurs, the data loss in the feature map to be extracted can be minimized.

Although the present invention has been described in detail with reference to preferred embodiments, it is not limited to those embodiments, and various modifications may be made by those of ordinary skill in the art within the scope of the technical idea of the present invention.

Claims

A method of processing a feature map of an image for machine vision, the method comprising:
aligning multi-channel, multi-scale feature maps extracted from the image into single-scale feature maps;
generating a converted feature map by performing feature channel conversion on one or more of the single-scale feature maps; and
encoding the converted feature map.

The method of claim 1,
wherein the aligning step scales feature maps having different resolutions to the same resolution.

The method of claim 2,
wherein the aligning step includes combining the single-scale feature maps.

The method of claim 1,
further comprising decoding the encoded feature map after performing the encoding,
wherein performing the encoding includes normalizing the converted feature map, and
performing the decoding includes denormalizing the decoded feature map.

The method of claim 1,
further comprising extracting the multi-scale feature maps from the image before the aligning step,
wherein, in the extracting step, the multi-scale feature maps are extracted from the image in consideration of data loss.

The method of claim 1,
wherein generating the converted feature map includes extracting a representative value for each channel of the single-scale feature maps.

The method of claim 6,
wherein the representative value for each channel is adjusted through a fully connected layer.

The method of claim 6,
wherein the single-scale feature maps are adjusted by the representative value for each channel.

The method of claim 1,
wherein generating the converted feature map is performed based on a transformation network including one or more of a pooling layer, a fully connected layer, a convolution layer, and an activation function layer.

The method of claim 1,
wherein the converted feature maps have a different number of channels than the multi-scale feature maps, and
generating the converted feature maps includes adjusting quantization coefficients based on the number of channels of the converted feature maps.

The method of claim 1,
wherein generating the converted feature maps includes calculating the importance of each channel of the multi-scale feature maps and converting the number of channels based on the importance of each channel.

The method of claim 1,
wherein performing the encoding includes rearranging the channels of the converted feature maps based on a per-channel correlation of the converted feature maps.

The method of claim 12,
wherein the per-channel correlation is calculated based on the mean of each channel, the median of each channel, or the mean squared error of each channel.

The method of claim 1,
wherein performing the encoding includes converting the converted feature maps into feature maps of one frame.

The method of claim 14,
wherein performing the encoding includes performing tile-based encoding on the feature maps converted into one frame.

A method of processing a feature map, the method comprising:
performing size restoration on converted feature maps; and
generating restored feature maps by performing channel-count restoration on the converted feature maps, wherein the restored feature maps include feature maps having different resolutions or numbers of channels.

The method of claim 16,
wherein performing the size restoration includes configuring the layers used for restoration differently depending on the resolution of the feature map to be restored.

The method of claim 16,
wherein the restored feature maps are generated based on a transformation network including one or more of a pooling layer, a fully connected layer, a convolution layer, and an activation function layer.

The method of claim 16,
wherein performing the size restoration uses size information of the original feature map.

The method of claim 16,
wherein the size restoration or the channel-count restoration is performed using residual-based restoration.