KR20230063307A

KR20230063307A - Apparatus for switching main screen based on distributed telepresence and method using the same

Info

Publication number: KR20230063307A
Application number: KR1020220116425A
Authority: KR
Inventors: 윤정일; 곽상운; 김준수
Original assignee: 한국전자통신연구원
Priority date: 2021-11-01
Filing date: 2022-09-15
Publication date: 2023-05-09

Abstract

Disclosed are an image processing method for performing machine vision tasks based on feature importance and an apparatus therefor. An image processing method according to an embodiment of the present invention may include: extracting, by an image processing apparatus for performing a machine vision task, features from each of an original image and a transformed image obtained by uniformly degrading the original image; generating a gradient map for the original image using a feature loss calculated from the extracted features; and performing image processing on the original image in consideration of the gradient value of each region included in the gradient map.

Description

Image processing method for machine vision mission performance based on feature importance and apparatus therefor

The present invention relates to image processing technology for performing machine vision tasks based on feature importance, and more particularly to technology that improves encoding compression efficiency while minimizing the degradation in machine vision task performance caused by the image quality loss that occurs when image information is encoded and decoded.

A conventional pipeline for performing machine vision tasks, including the image encoding and decoding processes, is shown in FIG. 1.

The encoding unit and the decoding unit apply existing video compression technologies designed for human viewers, such as HEVC (High Efficiency Video Coding) and VVC (Versatile Video Coding): the input image for the machine vision task is encoded into a compressed bitstream, which is then decoded. The machine vision task execution unit performs the machine vision task by applying various deep learning-based techniques such as image classification, object detection and segmentation, and tracking.

Here, the byte size of the encoded bitstream and the quality of the decoded image change according to the value set for a parameter that controls the compression rate, such as the QP (Quantization Parameter). As a result, the performance of the machine vision task execution unit, measured by a metric such as mAP (Mean Average Precision), also changes.

By measuring this R-D (rate-distortion) relationship for the machine vision task, task performance is compared across degrees of image encoding, and a compression rate suitable for the machine vision task is determined on that basis.

However, traditional video encoding technology was developed with the characteristics of human perceptual image quality in mind. When it is used to encode the input images a machine needs for its task, the encoded data contains, in addition to the image information essential to the machine, unnecessary information that serves only human visual perception.

Korean Patent Application Publication No. 10-2021-0020971, published February 24, 2021 (Title: Image Change Encoding/Decoding Method and Device)

An object of the present invention is to provide image processing technology that improves compression efficiency while maintaining machine vision task performance, by excluding unnecessary information that serves only human visual perception and minimizing the difference in the features extracted for the machine vision task.

Another object of the present invention is to determine the importance of the image regions that strongly affect machine vision task performance and, based on that importance, to improve image compression efficiency while maintaining the machine's task performance even when existing image encoding technology is applied.

A further object of the present invention is to minimize the degradation of machine vision task performance and to improve the image encoding compression gain by applying existing encoding technology designed for humans, without additional training or changes to the network.

To achieve the above objects, an image processing method according to the present invention includes: extracting, by an image processing apparatus for performing a machine vision task, features from each of an original image and a transformed image obtained by uniformly degrading the original image; generating a gradient map for the original image using a feature loss calculated from the extracted features; and performing image processing on the original image in consideration of the gradient value of each region included in the gradient map.

Here, performing the image processing may include determining a per-region importance for the original image, and the image processing may be performed using at least one of a first image processing method, which generates a preprocessed image based on the per-region importance and then performs encoding and decoding, and a second image processing method, which performs encoding and decoding by applying per-region encoding parameters to the original image in accordance with the per-region importance.

When the first image processing method is used, the preprocessed image may be generated by using the original image for regions whose importance is greater than or equal to a preset reference value and the transformed image for regions whose importance is less than the preset reference value.
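The region selection rule of the first image processing method can be illustrated with a minimal sketch. It assumes a per-pixel importance map and images stored as nested lists; the function name `compose_preprocessed` and the sample values are hypothetical and do not come from the specification.

```python
def compose_preprocessed(original, transformed, importance, threshold):
    """Build a preprocessed image pixel by pixel.

    Pixels whose importance is >= threshold are taken from the original
    image; all other pixels are taken from the transformed (degraded) image.
    """
    return [
        [o if w >= threshold else t for o, t, w in zip(o_row, t_row, w_row)]
        for o_row, t_row, w_row in zip(original, transformed, importance)
    ]


# Hypothetical 2x2 example values, for illustration only.
original = [[10, 20], [30, 40]]
transformed = [[12, 18], [28, 36]]
importance = [[0.9, 0.1], [0.5, 0.2]]

preprocessed = compose_preprocessed(original, transformed, importance, threshold=0.5)
print(preprocessed)  # [[10, 18], [30, 36]]
```

The comparison uses "greater than or equal to" so that a region sitting exactly on the reference value keeps the original pixels, matching the wording above.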

When the second image processing method is used, a per-region encoding weight for the original image may be set according to the per-region importance, and the per-region encoding parameter may be set in accordance with the per-region encoding weight.

Here, the larger the absolute value of a region's gradient value, the higher the importance determined for that region, and the higher a region's importance, the larger the encoding weight set for that region.
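The chain from gradient value to importance to encoding weight to encoding parameter might look like the sketch below. Only the monotonic relations (larger absolute gradient, higher importance, larger weight) come from the text. Everything else is an assumption made for illustration: the normalization by the maximum, the use of QP as the encoding parameter, the rule that a larger weight lowers the QP, the base QP of 32, the offset of 6, and the clamp to 0..51 (the HEVC QP range for 8-bit video).

```python
def importance_from_gradients(region_gradients):
    """Importance grows with the absolute gradient value (normalized to [0, 1])."""
    magnitudes = [abs(g) for g in region_gradients]
    peak = max(magnitudes)
    if peak == 0:
        return [0.0] * len(magnitudes)
    return [m / peak for m in magnitudes]


def qp_for_regions(importance, base_qp=32, max_offset=6):
    """Map importance to a per-region QP (hypothetical mapping).

    Assumption: the encoding weight equals the importance, and a larger
    weight means finer quantization, i.e. a smaller QP.
    """
    qps = []
    for weight in importance:
        # weight 0 -> +max_offset, weight 0.5 -> 0, weight 1 -> -max_offset
        offset = round(max_offset * (1.0 - 2.0 * weight))
        qps.append(min(51, max(0, base_qp + offset)))
    return qps


# Hypothetical per-region gradient values.
gradients = [-0.8, 0.2, 0.0, 0.4]
importance = importance_from_gradients(gradients)
qps = qp_for_regions(importance)
print(importance, qps)
```

Note that the sign of the gradient is discarded: a strongly negative gradient marks a region as just as important as a strongly positive one.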

Performing the image processing may further include generating, at the encoding end, information about the preprocessed image or the per-region encoding parameters as metadata and transmitting the metadata to the decoding end.

The method may further include generating the transformed image using at least one of an operation that blurs the image to alter the distribution characteristics between adjacent pixels and an operation that adds noise to the image to alter the distribution characteristics between pixels irregularly.

The feature loss may be calculated by computing the per-dimension and per-element feature differences between the features extracted from the original image and the features extracted from the transformed image.

Generating the gradient map may include calculating the per-region gradient values by performing back-propagation based on the feature loss.

In addition, an image processing apparatus according to an embodiment of the present invention includes: a processor that extracts features from each of an original image and a transformed image obtained by uniformly degrading the original image, calculates a feature loss based on the extracted features, computes a gradient map for the original image using the feature loss, determines per-region encoding weights for the original image based on the gradient map, and applies the per-region encoding weights to perform, on the original image, image processing for a machine vision task; and a memory that stores the original image and the transformed image.

Here, the processor may determine a per-region importance for the original image, and may perform the image processing using at least one of a first image processing method, which generates a preprocessed image based on the per-region importance and then performs encoding and decoding, and a second image processing method, which performs encoding and decoding by applying per-region encoding parameters to the original image in accordance with the per-region importance.

When the first image processing method is used, the processor may generate the preprocessed image by using the original image for regions whose importance is greater than or equal to a preset reference value and the transformed image for regions whose importance is less than the preset reference value.

When the second image processing method is used, the processor may set a per-region encoding weight for the original image according to the per-region importance, and may set the per-region encoding parameter in accordance with the per-region encoding weight.

The processor may generate, at the encoding end, information about the preprocessed image or the per-region encoding parameters as metadata and transmit the metadata to the decoding end.

The processor may generate the transformed image using at least one of an operation that blurs the image to alter the distribution characteristics between adjacent pixels and an operation that adds noise to the image to alter the distribution characteristics between pixels irregularly.

The processor may calculate the per-region gradient values by performing back-propagation based on the feature loss.

According to the present invention, it is possible to provide image processing technology that improves compression efficiency while maintaining machine vision task performance, by excluding unnecessary information that serves only human visual perception and minimizing the difference in the features extracted for the machine vision task.

In addition, the present invention can determine the importance of the image regions that strongly affect machine vision task performance and, based on that importance, improve image compression efficiency while maintaining the machine's task performance even when existing image encoding technology is applied.

In addition, the present invention can minimize the degradation of machine vision task performance and improve the image encoding compression gain by applying existing encoding technology designed for humans, without additional training or changes to the network.

FIG. 1 is a diagram showing an example of a conventional image processing pipeline structure for performing a machine vision task.
FIG. 2 is a block diagram showing an example of the machine vision task execution unit in an image processing apparatus according to the present invention.
FIG. 3 is an operation flowchart showing an image processing method for performing a machine vision task based on feature importance according to an embodiment of the present invention.
FIG. 4 is a diagram showing an example of an image processing pipeline structure for performing a machine vision task according to the present invention.
FIGS. 5 and 6 are diagrams showing an example of the operating concept of the image important region and encoding weight determination unit shown in FIG. 4.
FIG. 7 is a diagram showing an example of the process of determining importance weights through FIGS. 5 and 6.
FIG. 8 is a diagram showing an example of performing image processing using the first image processing method according to the present invention.
FIG. 9 is a diagram showing an example of generating a preprocessed image according to the present invention.
FIG. 10 is a diagram showing the process of FIG. 9 in more detail.
FIG. 11 is an operation flowchart showing in detail the processing of the first image processing method according to the present invention.
FIG. 12 is a diagram showing an example of performing image processing using the second image processing method according to the present invention.
FIG. 13 is an operation flowchart showing in detail the processing of the second image processing method according to the present invention.
FIG. 14 is a diagram showing an example of performing image processing using both the first image processing method and the second image processing method according to the present invention.
FIG. 15 is a diagram showing an image processing apparatus for performing a machine vision task based on feature importance according to an embodiment of the present invention.

The present invention is described in detail below with reference to the accompanying drawings. Repeated descriptions, and detailed descriptions of well-known functions and configurations that could unnecessarily obscure the gist of the present invention, are omitted. The embodiments of the present invention are provided to explain the invention more completely to a person of ordinary skill in the art. Accordingly, the shapes and sizes of elements in the drawings may be exaggerated for clarity.

In this document, each of the phrases "A or B", "at least one of A and B", "at least one of A or B", "A, B or C", "at least one of A, B and C", and "at least one of A, B, or C" may include any one of the items listed together in the corresponding phrase, or all possible combinations thereof.

Hereinafter, preferred embodiments according to the present invention are described in detail with reference to the accompanying drawings.

The present invention relates to technology that encodes the image information required for a machine vision task into a compressed bitstream for storage and transmission, and that improves encoding compression efficiency while minimizing the degradation in machine vision task performance caused by the image quality loss that arises when the bitstream is decoded.

To this end, the present invention proposes a deep learning-based neural network structure that, as shown in FIG. 2, is divided into a feature extraction unit, which extracts feature information from an image, and a detection and recognition unit, which uses the extracted feature information to detect the visual and semantic feature information of the image and to perform the task for the given purpose.

The deep learning-based neural network proposed by the present invention is used in a broad sense and is not limited to a specific structure. For example, various deep learning techniques may be used, such as a DNN (Deep Neural Network), a CNN (Convolutional Neural Network), or an RNN (Recurrent Neural Network).

Hereinafter, the term 'network' as used in the present invention may refer to all or part of a deep learning-based neural network.

In addition, the machine vision task execution unit of FIG. 2 is assumed to have already been trained on a separately partitioned training dataset, so that the network weights are fixed. It is also assumed that there is no fine-tuning (an additional training process that could affect performance) and no change or addition to the network structure.

The feature extraction unit shown in FIG. 2 may extract visual or semantic features of the input image according to the purpose of the machine vision task. Here, a feature typically refers to the high-dimensional tensor data output by the feature extraction unit, and the presence or absence of a feature can be determined from whether these values are large, nonzero values. That is, the output feature values and their distribution may change according to the distribution characteristics of the pixel values of the input image.

For example, the outputs {P2, P3, P4, P5} of the FPN (Feature Pyramid Network) used in Mask R-CNN, a representative deep learning technique for the machine vision task of pixel-level segmentation, can be regarded as the features output by the feature extraction unit.

In addition, the detection and recognition unit shown in FIG. 2 may perform the machine vision task by detecting the type and position of objects from the features extracted by the feature extraction unit, or by recognizing semantic features of the image.

For example, it may detect the objects in an image and determine their positions and classes.

FIG. 3 is an operation flowchart showing an image processing method for performing a machine vision task based on feature importance according to an embodiment of the present invention.

Referring to FIG. 3, in the image processing method for performing a machine vision task based on feature importance according to an embodiment of the present invention, an image processing apparatus for performing the machine vision task extracts features from each of an original image and a transformed image obtained by uniformly degrading the original image (S310).

The transformed image is generated and its features are extracted in order to identify the relative relationship between a change in the distribution characteristics of the image pixel values input to the feature extraction unit of FIG. 2 and the resulting change in the feature values and their distribution characteristics.

For example, FIG. 4 shows an example of an image processing pipeline structure for performing a machine vision task according to the present invention. The original image (I) may be input to the image transformation unit 410 shown in FIG. 4 to generate the transformed image (I').

Here, the transformed image may be generated using at least one of an operation that blurs the image to alter the distribution characteristics between adjacent pixels and an operation that adds noise to the image to alter the distribution characteristics between pixels irregularly.

For example, the image may be transformed by blurring alone or by noise alone, or by combining the blurring and noise operations.
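The two degradation operations can be sketched in plain Python as follows. The specification does not fix a blur kernel or a noise model, so the box blur, the Gaussian noise, the parameter values, and the function names `box_blur`, `add_noise`, and `degrade` are all assumptions chosen for illustration. The same operation is applied to every pixel, which is one reading of "uniformly degrading" the image.

```python
import random


def box_blur(image, radius=1):
    """Replace each pixel with the mean of its neighborhood.

    Near the border the window is truncated to the pixels that exist.
    """
    height, width = len(image), len(image[0])
    blurred = []
    for y in range(height):
        row = []
        for x in range(width):
            ys = range(max(0, y - radius), min(height, y + radius + 1))
            xs = range(max(0, x - radius), min(width, x + radius + 1))
            values = [image[j][i] for j in ys for i in xs]
            row.append(sum(values) / len(values))
        blurred.append(row)
    return blurred


def add_noise(image, sigma=2.0, seed=0):
    """Add zero-mean Gaussian noise to every pixel, clamped to [0, 255]."""
    rng = random.Random(seed)
    return [
        [min(255.0, max(0.0, p + rng.gauss(0.0, sigma))) for p in row]
        for row in image
    ]


def degrade(image, blur=True, noise=True):
    """Apply blurring, noise, or both, to the whole image."""
    out = image
    if blur:
        out = box_blur(out)
    if noise:
        out = add_noise(out)
    return out


# Hypothetical 3x3 image with a single bright pixel in the center.
original = [[0, 0, 0], [0, 9, 0], [0, 0, 0]]
blurred_only = degrade(original, noise=False)
transformed = degrade(original)
```

In `blurred_only` the bright center is spread over its neighbors, which changes the distribution between adjacent pixels, while `transformed` additionally perturbs every pixel irregularly.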

The process of extracting features from the original image and the transformed image may be performed by the image important region and encoding weight determination unit 430 shown in FIG. 4.

Hereinafter, the process by which the image important region and encoding weight determination unit 430 of FIG. 4 extracts features from each of the original image and the transformed image is described in more detail with reference to FIG. 5.

Referring to FIG. 5, the image important region and encoding weight determination unit 430 according to the present invention may extract features from each of the original image and the transformed image using multiple instances of the same network as the feature extraction unit shown in FIG. 2. That is, the original image (I) is input to one feature extraction unit 510 to obtain feature data (F) in the form of a high-dimensional tensor, and the transformed image (I') is input to the other feature extraction unit 520 to obtain further feature data (F').

In addition, in the image processing method for performing a machine vision task based on feature importance according to an embodiment of the present invention, the image processing apparatus generates a gradient map for the original image using a feature loss calculated from the extracted features (S320).

Here, the feature loss may be calculated by computing the per-dimension and per-element feature differences between the features extracted from the original image and the features extracted from the transformed image.

The per-region gradient values may then be calculated by performing back-propagation based on the feature loss.

For example, the process of calculating the feature loss and using it to generate the gradient map may be performed by the image important region and encoding weight determination unit 430 shown in FIG. 4.

Hereinafter, the image important region and encoding weight determination unit 430 shown in FIG. 4 is described in more detail with reference to FIGS. 5 and 6.

First, referring to FIG. 5, the feature loss calculation unit 530 included in the image important region and encoding weight determination unit 430 may perform a forward pass that computes the per-dimension and per-element feature differences between F and F', the outputs of the feature extraction units 510 and 520, and outputs the feature loss value L.

For example, the commonly used MSE loss may be applied as the loss function.
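As a minimal sketch of the forward pass with an MSE loss, the code below flattens two feature tensors and averages the squared element-wise differences. Representing the tensors as nested lists, and the names `flatten` and `mse_feature_loss`, are illustrative assumptions, as are the sample values.

```python
def flatten(tensor):
    """Flatten an arbitrarily nested list of numbers into a flat list."""
    if isinstance(tensor, (int, float)):
        return [float(tensor)]
    flat = []
    for item in tensor:
        flat.extend(flatten(item))
    return flat


def mse_feature_loss(features, features_prime):
    """Mean squared error over all corresponding elements of F and F'."""
    a, b = flatten(features), flatten(features_prime)
    if len(a) != len(b):
        raise ValueError("feature tensors must have the same number of elements")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


# Hypothetical 1x2x2 feature tensors for the original and transformed images.
F = [[[1.0, 2.0], [3.0, 4.0]]]
F_prime = [[[1.0, 1.0], [3.0, 2.0]]]

L = mse_feature_loss(F, F_prime)
print(L)  # (0 + 1 + 0 + 4) / 4 = 1.25
```

When the extractor returns several tensors, such as the pyramid levels {P2, P3, P4, P5}, one option is to compute this loss per level and sum the results; the specification text here does not say how the levels are combined.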

Then, to identify, for each input image, the relative importance with which changes in the image's pixel values affect the feature loss value L, a backward pass, that is, back-propagation, may be performed to compute the derivatives ∂L/∂I and ∂L/∂I' as shown in FIG. 6.

Because ∂L/∂I and ∂L/∂I' are mathematically gradients, they may be positive or negative, and a value of 0 means that a change in the input produces no change in the output. Conversely, when the absolute value of ∂L/∂I or ∂L/∂I' is large, the change in the input can be judged to be a factor that affects the change in the output.

Accordingly, a gradient map for the original image may be generated by computing gradient values for all pixels of the original image through the processes of FIGS. 5 and 6.
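The idea of a per-pixel gradient map can be sketched without a deep learning framework. The code below is not back-propagation: it swaps in central finite differences to approximate ∂L/∂I, and it replaces the trained feature extractor with a hypothetical stand-in (`toy_features`, which takes horizontal pixel differences). In practice an automatic differentiation framework would compute the same quantity in a single backward pass through the frozen network.

```python
def toy_features(image):
    """Hypothetical stand-in for the frozen feature extractor:
    the difference between horizontally adjacent pixels."""
    return [[row[x + 1] - row[x] for x in range(len(row) - 1)] for row in image]


def feature_loss(image, reference_features):
    """MSE between the features of `image` and fixed reference features."""
    feats = toy_features(image)
    squared = [
        (a - b) ** 2
        for f_row, r_row in zip(feats, reference_features)
        for a, b in zip(f_row, r_row)
    ]
    return sum(squared) / len(squared)


def gradient_map(image, reference_features, eps=1e-3):
    """Approximate dL/dI for every pixel by central finite differences."""
    grads = []
    for y, row in enumerate(image):
        grad_row = []
        for x in range(len(row)):
            plus = [list(r) for r in image]
            minus = [list(r) for r in image]
            plus[y][x] += eps
            minus[y][x] -= eps
            delta = feature_loss(plus, reference_features) - feature_loss(
                minus, reference_features
            )
            grad_row.append(delta / (2 * eps))
        grads.append(grad_row)
    return grads


# Hypothetical images: the top row has an edge, the bottom row is flat.
original = [[0.0, 10.0, 10.0], [5.0, 5.0, 5.0]]
transformed = [[3.0, 7.0, 10.0], [5.0, 5.0, 5.0]]

grads = gradient_map(original, toy_features(transformed))
```

With these values the top row receives gradients of roughly -3, 4.5, and -1.5, while the flat bottom row receives gradients of about 0, so only the row containing the edge would be treated as important.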

이 때, 도 5 및 도 6에 도시된 처리 과정은 일반적인 딥러닝 학습 과정에서 사용되는 forward/backward propagation에 해당할 수 있다.At this time, the processing shown in FIGS. 5 and 6 may correspond to forward/backward propagation used in a general deep learning process.

또한, 본 발명의 일실시예에 따른 특징 중요도 기반의 머신비전 임무 수행을 위한 영상 처리 방법은, 머신비전 임무 수행을 위한 영상 처리 장치가, 기울기 맵에 포함된 영역 별 기울기 값을 고려하여 원본 영상에 대한 영상 처리를 수행한다(S330).In addition, in an image processing method for performing a machine vision task based on feature importance according to an embodiment of the present invention, an image processing device for performing a machine vision task considers a gradient value for each region included in a gradient map to obtain an original image Image processing is performed on (S330).

이 때, 원본 영상에 대한 영역 별 중요도를 결정할 수 있다.At this time, the importance of each region of the original image may be determined.

이 때, 본 발명에서 제안하는 기술의 핵심은, 특징 추출부에 입력되는 영상 화소 값들의 분포 특성이 변화했을 때에 이에 따른 특징 값 및 그 분포 특성의 변화가 상대적으로 큰 경우에는 머신비전 임무 수행을 위한 중요도가 높다고 판단하고, 이에 따른 특징 값 및 그 분포 특성의 변화가 상대적으로 작은 경우에는 머신비전 임무 수행을 위한 중요도가 낮다고 판단함으로써 입력된 원본 영상의 영역 별 중요도와 가중치를 결정하는 것이다. At this time, the core of the technology proposed by the present invention is that when the distribution characteristics of the image pixel values input to the feature extraction unit change, the machine vision task can be performed when the corresponding feature values and the change in the distribution characteristics are relatively large. When it is determined that the importance for the image is high, and the change in the feature value and its distribution characteristics are relatively small, the importance and weight for each region of the original image are determined by determining that the importance for performing the machine vision task is low.

To this end, the feature-importance-based encoding weights, that is, the importance of each region of the original image, may be determined using the image important region and encoding weight determining unit shown in FIG. 4.

Hereinafter, the process of determining the importance of each region of the original image is described in detail with reference to FIG. 7.

FIG. 7 shows the process of determining the importance weight of each region of the original image using ∂L/∂I and ∂L/∂I' calculated through the forward and backward processes of FIGS. 5 and 6.

First, because the gradient calculation result of FIG. 6 contains both positive and negative values, the values may be made non-negative through an element-wise square operation.

Then, to focus on cases where a large squared gradient occurs within the neighborhood that can be considered for each pixel, (∂L/∂I)² and (∂L/∂I')² may be concatenated along the channel direction (the horizontal and vertical dimensions of the image are preserved), and a MaxPooling3D operation may be performed along the channel direction. Through this process, degradation of machine vision performance may be minimized.

Here, the horizontal and vertical window sizes for the MaxPooling3D operation may be determined in consideration of the receptive field of the neighboring pixels that can influence the feature extraction process.

The window size in the channel direction may then be set to 1, and the stride with which the window is moved may also be set to 1.

Here, G_max, which serves as the importance reference value of each pixel, may be obtained by pooling the maximum gradient value in the channel direction within the window region specified around each pixel position of the image.

Then, a normalization process may be performed to obtain from G_max an importance weight W with values between 0 and 1. To this end, a maximum threshold value may be determined based on the statistical distribution characteristics of G_max, a clamping operation may be performed with it, and Min-Max normalization may then be applied.
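The squaring, pooling, clamping and normalization steps can be sketched in NumPy as follows. This is an illustrative sketch, not the patented implementation: the function name, the 3×3 default window, and the use of a percentile as the "statistical" clamping threshold are assumptions, and the final maximum over channels is only one reading of "pooling the maximum in the channel direction".

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def importance_weights(grad_original, grad_degraded, window=3, clamp_percentile=95.0):
    """Map two gradient tensors of shape (C, H, W) to weights W in [0, 1].

    Assumptions: `window` is odd, and the clamping threshold is taken as a
    percentile of G_max (the source only says it is chosen statistically).
    """
    # Element-wise square makes every gradient value non-negative.
    squared = np.concatenate([grad_original ** 2, grad_degraded ** 2], axis=0)
    # Zero padding is safe for a maximum because squared values are >= 0.
    pad = window // 2
    padded = np.pad(squared, ((0, 0), (pad, pad), (pad, pad)))
    # Stride-1 spatial max pooling, then the maximum over all channels.
    windows = sliding_window_view(padded, (window, window), axis=(1, 2))
    g_max = windows.max(axis=(-2, -1)).max(axis=0)
    # Clamp at the chosen upper threshold, then apply Min-Max normalization.
    upper = np.percentile(g_max, clamp_percentile)
    clamped = np.clip(g_max, 0.0, upper)
    span = clamped.max() - clamped.min()
    if span == 0.0:
        return np.zeros_like(clamped)  # flat map: no region stands out
    return (clamped - clamped.min()) / span

# A single strong gradient at the center of a 5x5 image.
g = np.zeros((1, 5, 5))
g[0, 2, 2] = -3.0
w = importance_weights(g, np.zeros((1, 5, 5)), window=3, clamp_percentile=100.0)
```

With this input, the 3×3 neighborhood around the center receives the maximum weight and the remaining pixels receive zero.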

Here, the average value of the importance weight W may change according to the maximum threshold value set in the normalization process.

For example, if the maximum threshold value is close to 1, an image that differs little from the original is encoded, so the compression gain is lower but machine vision task performance is less likely to degrade. Conversely, the closer the maximum threshold value is to 0, the higher the compression gain but the more likely machine vision task performance is to degrade. Accordingly, the performance trade-off along the rate-distortion (R-D) relationship can be adjusted through the maximum threshold value set in the normalization process.

In addition, when the second image processing method described later is applied, it may be appropriate to set the weights per coding block. In this case, the importance weight W may be determined by performing an additional MaxPooling process per coding block.
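The per-coding-block pooling can be sketched as a non-overlapping block maximum. The function name and the default block size of 64 are hypothetical; the source does not specify a block size.

```python
import numpy as np

def block_max_weights(weights, block=64):
    """Reduce a per-pixel weight map (H, W) to one weight per coding block.

    Assumption: weights are non-negative, so zero padding at the right and
    bottom edges cannot change any block maximum.
    """
    h, w = weights.shape
    rows = -(-h // block)   # ceiling division
    cols = -(-w // block)
    padded = np.zeros((rows * block, cols * block), dtype=weights.dtype)
    padded[:h, :w] = weights
    # Split into (rows, block, cols, block) tiles and take each tile's maximum.
    return padded.reshape(rows, block, cols, block).max(axis=(1, 3))

pixel_weights = np.zeros((4, 6))
pixel_weights[3, 5] = 0.7
per_block = block_max_weights(pixel_weights, block=4)  # shape (1, 2)
```

Taking the maximum rather than the mean keeps a block important whenever any pixel inside it is important, which matches the conservative intent of the MaxPooling step.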

Here, what is shown in FIG. 7 is only one embodiment; the importance of each region of the original image may be determined in various other ways, such as taking the average in the channel direction or using a method that considers the distribution characteristics.

Here, the image processing may be performed using at least one of a first image processing method, which generates a preprocessed image based on the importance of each region and then performs encoding and decoding, and a second image processing method, which performs encoding and decoding by applying encoding parameters for each region to the original image in accordance with the importance of each region.

When the first image processing method is used, the preprocessed image may be generated by using the original image for regions whose importance is greater than or equal to a preset reference value and the transformed image for regions whose importance is less than the preset reference value.

That is, the first image processing method reflects the encoding weights according to the importance of each region directly in the original image: as shown in FIG. 8, the weighted image generator generates a preprocessed image that can raise the compression efficiency of the encoder according to the importance of each region.

For example, FIGS. 9 and 10 show the process of generating a preprocessed image by applying the first image processing method based on the original images (910, 1010) and the transformed images (920, 1020), which are obtained by applying Gaussian blurring that selectively reduces the high-frequency components of the image according to the weights. Here, in consideration of the gradient maps (930, 1030), the portions judged to be important for the machine vision task receive high weights and are composed of the same pixels as the original, while the portions of relatively low importance are composed of pixels to which Gaussian blurring of a strength inversely proportional to the weight has been applied, thereby generating the preprocessed image (940). The preprocessed image (940) may then be encoded and decoded to perform the machine vision task.
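The composition of the preprocessed image can be sketched as below. This is a simplified stand-in, not the method of FIGS. 9 and 10: it takes an already blurred image as input and approximates "blur strength inversely proportional to the weight" with a linear blend between the original and blurred pixels. The function name and the `keep_threshold` parameter are assumptions.

```python
import numpy as np

def preprocess_image(original, blurred, weights, keep_threshold=0.5):
    """Compose a preprocessed image from an original and a blurred copy.

    original, blurred: float arrays of shape (H, W) or (H, W, channels).
    weights: importance map of shape (H, W) with values in [0, 1].
    """
    # Broadcast the weight map over color channels when present.
    w = weights[..., None] if original.ndim == 3 else weights
    # Lower weight -> more of the blurred image (a proxy for stronger blur).
    blended = w * original + (1.0 - w) * blurred
    # Regions at or above the reference value keep the original pixels.
    return np.where(w >= keep_threshold, original, blended)

original = np.ones((2, 2))
blurred = np.zeros((2, 2))
weights = np.array([[1.0, 0.0], [0.6, 0.25]])
pre = preprocess_image(original, blurred, weights)
```

Here the two pixels with weights 1.0 and 0.6 keep their original value, while the other two are pulled toward the blurred image in proportion to their low weight.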

Here, referring to FIG. 8, the encoder may generate information about the preprocessed image as metadata and transmit it to the decoder, and the decoder may reconstruct the image by applying the metadata in the process of decoding the image.

Hereinafter, the process of performing a machine vision task after image processing with the first image processing method is described with reference to FIG. 11.

First, a preprocessed image may be generated by using the original image for regions whose importance, as determined for the original image, is greater than or equal to a preset reference value and the transformed image for regions whose importance is less than the preset reference value (S1110).

This process may be performed by the weighted image generator shown in FIG. 8.

Then, the preprocessed image may be encoded and decoded by the encoder and decoder shown in FIG. 8 (S1120), and the decoded result may be input to the machine vision task execution unit shown in FIG. 8 to perform the machine vision task.

When the second image processing method is used, an encoding weight for each region of the original image may be set according to the importance of each region, and an encoding parameter for each region may be set in accordance with the encoding weight for each region.

That is, as shown in FIG. 12, the second image processing method inputs the per-region encoding parameters, set in accordance with the per-region encoding weights, directly to the encoder, and can raise compression efficiency by selectively applying a compression rate to each region of the original image based on those parameters.
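One way to turn per-region encoding weights into per-region encoding parameters is sketched below. The source does not name the parameter, so this sketch assumes it is a quantization parameter (QP), where a larger QP means coarser quantization and stronger compression. The linear mapping, `base_qp`, `max_offset`, and the 0 to 51 range (the QP range of 8-bit HEVC) are illustrative choices.

```python
import numpy as np

def region_qp(block_weights, base_qp=32, max_offset=10, qp_range=(0, 51)):
    """Map per-block importance weights in [0, 1] to per-block QP values.

    Hypothetical mapping: the most important blocks (weight 1) are coded at
    base_qp, the least important (weight 0) at base_qp + max_offset.
    """
    offsets = np.rint((1.0 - block_weights) * max_offset).astype(int)
    return np.clip(base_qp + offsets, qp_range[0], qp_range[1])

qp_map = region_qp(np.array([[1.0, 0.0, 0.5]]))
```

A map like `qp_map` would then be passed to the encoder as the per-region compression setting and carried to the decoder as metadata.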

Here, the encoder may generate the per-region encoding parameters, which are the compression rate setting information for each region of the original image, as metadata and transmit them to the decoder, and the decoder may reconstruct the image by applying the metadata in the process of decoding the image.

Hereinafter, the process of performing a machine vision task after image processing with the second image processing method is described with reference to FIG. 13.

First, an encoding weight for each region of the original image may be set according to the importance of each region determined for the original image, and an encoding parameter for each region may be set in accordance with the encoding weight for each region (S1310).

Then, the encoder shown in FIG. 12 may encode the original image with the compression rate of each region set in accordance with the per-region encoding parameters, and the decoder may perform decoding based on the metadata (S1320).

Then, the decoded result may be input to the machine vision task execution unit shown in FIG. 12 to perform the machine vision task (S1330).

Here, the larger the absolute value of the gradient value of a region, the higher its importance may be determined to be, and the higher the importance of a region, the larger its encoding weight may be set.

For example, a region with a large absolute gradient value is one where the gradient that changes the feature loss (LOSS) is large, so even a slight modification of that region can increase the feature loss. Conversely, a region with a small absolute gradient value is one where that gradient is small, so modifying the region may cause no significant change in the feature loss.

Accordingly, a region where image modification causes a large change in the feature loss may be assigned a relatively high importance and, in turn, a large encoding weight.

However, because the gradient value itself may vary depending on the shape of the hyperplane learned by the network, it should be interpreted only to the extent that the closer the absolute value of the gradient is to 0, the lower the importance.
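The link between gradient magnitude and sensitivity of the feature loss can be made explicit with a first-order approximation. This is a standard Taylor-expansion argument rather than a formula from the source; the pixel index p and the small modification ΔI_p are notation introduced here for illustration.

```latex
\Delta L \;\approx\; \sum_{p} \frac{\partial L}{\partial I_p}\,\Delta I_p ,
\qquad
|\Delta L| \;\lesssim\; \sum_{p} \left|\frac{\partial L}{\partial I_p}\right| \, |\Delta I_p|
```

To first order, a modification of a region whose gradient magnitudes are all near zero can therefore change the feature loss only slightly, which is the sense in which such a region is treated as less important.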

In addition, in the present invention, image processing for a machine vision task may be performed by applying both the first image processing method and the second image processing method.

For example, referring to FIG. 14, the weighted image generator may first generate a preprocessed image based on the importance of each region of the original image determined by the image important region and encoding weight determining unit.

Then, the encoder encodes the preprocessed image, and in doing so may apply the per-region encoding parameters set in accordance with the importance of each region.

That is, encoding may be performed using both the preprocessed image and the per-region encoding parameters, and the decoder may perform decoding with reference to the information about the preprocessed image and the metadata about the per-region encoding parameters.

Through this feature-importance-based image processing method for machine vision tasks, compression efficiency can be improved while maintaining machine vision task performance, by excluding unnecessary information that serves only human perceptual visual quality and by minimizing the difference in the features extracted for the machine vision task.

In addition, by determining the importance of the image regions that strongly affect performance in machine vision tasks and using it as a basis, the compression efficiency of the image can be improved while maintaining the machine's task performance even when existing image encoding technology is applied.

In addition, without additional training or network changes, existing encoding technology designed for humans can be applied while minimizing degradation of machine vision task performance and improving the image encoding compression gain.

FIG. 15 is a diagram showing an image processing device for performing a machine vision task based on feature importance according to an embodiment of the present invention.

Referring to FIG. 15, the image processing device for performing a machine vision task based on feature importance according to an embodiment of the present invention may be implemented in a computer system such as a computer-readable recording medium. As shown in FIG. 15, the computer system (1500) may include one or more processors (1510), a memory (1530), a user input device (1540), a user output device (1550), and storage (1560), which communicate with one another via a bus (1520). The computer system (1500) may further include a network interface (1570) connected to a network (1580). The processor (1510) may be a central processing unit or a semiconductor device that executes processing instructions stored in the memory (1530) or the storage (1560). The memory (1530) and the storage (1560) may be various types of volatile or non-volatile storage media. For example, the memory may include a ROM (1531) or a RAM (1532).

Accordingly, an embodiment of the present invention may be implemented as a computer-implemented method or as a non-transitory computer-readable medium on which computer-executable instructions are recorded. When executed by a processor, the computer-readable instructions may perform a method according to at least one aspect of the present invention.

The processor (1510) extracts features from the original image and from the transformed image obtained by uniformly degrading the original image.

In addition, the processor (1510) generates a gradient map (GRADIENT MAP) for the original image using the feature loss (LOSS) calculated based on the extracted features.

In addition, the processor (1510) performs image processing on the original image in consideration of the gradient value for each region included in the gradient map.

Here, the processor may determine the importance of each region of the original image and perform the image processing using at least one of a first image processing method, which generates a preprocessed image based on the importance of each region and then performs encoding and decoding, and a second image processing method, which performs encoding and decoding by applying encoding parameters for each region to the original image in accordance with the importance of each region.

Here, the encoding stage may generate the information about the preprocessed image or the per-region encoding parameters as metadata and transmit it to the decoding stage.

By using this feature-importance-based image processing device for machine vision tasks, compression efficiency can be improved while maintaining machine vision task performance, by excluding unnecessary information that serves only human perceptual visual quality and by minimizing the difference in the features extracted for the machine vision task.

In addition, without additional training or network changes, existing encoding technology designed for humans can be applied while minimizing degradation of machine vision task performance and improving the image encoding compression gain.

As described above, the image processing method for performing a machine vision task based on feature importance and the device therefor according to the present invention are not limited to the configurations and methods of the embodiments described above; rather, all or part of the embodiments may be selectively combined so that various modifications can be made.

410: image transformation unit
420: image important region and encoding weight determining unit
421: weighted image generator 430: encoder
440: decoder 450: machine vision task execution unit
510, 520, 610, 620: feature extraction unit 530: feature loss calculation unit
910: original image 911: original image feature map
920: transformed image 921: transformed image feature map
930: gradient map 940: preprocessed image
1500: computer system 1510: processor
1520: bus 1530: memory
1531: ROM 1532: RAM
1540: user input device 1550: user output device
1560: storage 1570: network interface
1580: network

Claims

An image processing method comprising:
extracting, by an image processing device for performing machine vision tasks, features from an original image and from a transformed image obtained by uniformly degrading the original image;
generating a gradient map for the original image using a feature loss (LOSS) calculated based on the extracted features; and
performing image processing on the original image in consideration of a gradient value for each region included in the gradient map.

The method of claim 1,
wherein performing the image processing comprises
determining an importance for each region of the original image, and
performing the image processing using at least one of a first image processing method, which generates a preprocessed image based on the importance of each region and performs encoding and decoding, and a second image processing method, which performs encoding and decoding by applying encoding parameters for each region to the original image in accordance with the importance of each region.

The method of claim 2,
wherein, when the first image processing method is used,
the preprocessed image is generated by using the original image for regions whose importance is greater than or equal to a preset reference value and using the transformed image for regions whose importance is less than the preset reference value.

The method of claim 2,
wherein, when the second image processing method is used,
an encoding weight for each region of the original image is set according to the importance of each region, and an encoding parameter for each region is set in accordance with the encoding weight for each region.

The method of claim 4,
wherein the higher the absolute value of the gradient value for each region, the higher the importance of each region is determined to be.

The method of claim 2,
wherein performing the image processing
further comprises generating, at an encoding stage, information about the preprocessed image or the encoding parameters for each region as metadata and transmitting the metadata to a decoding stage.

The method of claim 1,
further comprising generating the transformed image using at least one of an operation of blurring an image to change the distribution characteristics between adjacent pixels and an operation of adding noise to an image to irregularly change the distribution characteristics between pixels.

The method of claim 1,
wherein the feature loss is calculated by computing a per-dimension feature difference and a per-element feature difference between the feature extracted from the original image and the feature extracted from the transformed image.

The method of claim 1,
wherein generating the gradient map
comprises calculating the gradient value for each region by performing a back-propagation process based on the feature loss.

An image processing device comprising:
a processor configured to extract features from an original image and from a transformed image obtained by uniformly degrading the original image, calculate a feature loss (LOSS) based on the extracted features, generate a gradient map (GRADIENT MAP) for the original image using the feature loss, determine an encoding weight for each region of the original image based on the gradient map, and perform image processing for a machine vision task on the original image by applying the encoding weight for each region; and
a memory configured to store the original image and the transformed image.

The device of claim 10,
wherein the processor
determines an importance for each region of the original image, and performs the image processing using at least one of a first image processing method, which generates a preprocessed image based on the importance of each region and performs encoding and decoding, and a second image processing method, which performs encoding and decoding by applying encoding parameters for each region to the original image in accordance with the importance of each region.

The device of claim 11,
wherein the processor,
when the first image processing method is used,
generates the preprocessed image by using the original image for regions whose importance is greater than or equal to a preset reference value and using the transformed image for regions whose importance is less than the preset reference value.

The device of claim 11,
wherein the processor,
when the second image processing method is used,
sets an encoding weight for each region of the original image according to the importance of each region, and sets an encoding parameter for each region in accordance with the encoding weight for each region.

The device of claim 13,
wherein the higher the absolute value of the gradient value for each region, the higher the importance of each region is determined to be, and the higher the importance of each region, the larger the encoding weight for each region is set.

The device of claim 11,
wherein the processor
generates, at an encoding stage, information about the preprocessed image or the encoding parameters for each region as metadata and transmits the metadata to a decoding stage.

The device of claim 10,
wherein the processor
generates the transformed image using at least one of an operation of blurring an image to change the distribution characteristics between adjacent pixels and an operation of adding noise to an image to irregularly change the distribution characteristics between pixels.

The device of claim 10,
wherein the processor
calculates the feature loss by computing a per-dimension feature difference and a per-element feature difference between the feature extracted from the original image and the feature extracted from the transformed image.

The device of claim 10,
wherein the processor
calculates the gradient value for each region by performing a back-propagation process based on the feature loss.