KR20210088546A

KR20210088546A - Image semantic segmentation method and apparatus, storage medium

Info

Publication number: KR20210088546A
Application number: KR1020217011555A
Authority: KR
Inventors: 잔펭 짱; 후이 청; 카이펑 짱
Original assignee: 선전 센스타임 테크놀로지 컴퍼니 리미티드
Priority date: 2019-12-30
Filing date: 2020-04-09
Publication date: 2021-07-14
Also published as: CN111179283A; TW202125408A; JP2022518647A; WO2021134970A1; TWI728791B

Abstract

Embodiments of the present invention disclose an image semantic segmentation method and apparatus, and a storage medium. The image semantic segmentation method includes: performing feature extraction on an obtained image to be processed to obtain a first feature image; synchronously extracting context features of a plurality of different ranges from the first feature image to obtain a plurality of second feature images; determining a target image according to at least the plurality of second feature images, and using the target image as a new first feature image to again synchronously extract context features of a plurality of different ranges; and, in response to the number of times that context features of a plurality of different ranges have been synchronously extracted from the first feature image reaching a target number, generating a semantic image corresponding to the image to be processed based on the most recently obtained target image.

Description

Image semantic segmentation method and apparatus, storage medium

Cross-Reference to Related Applications

The present application is filed on the basis of, and claims priority to, Chinese Patent Application No. 201911397645.2, filed on December 30, 2019, the entire contents of which are incorporated herein by reference.

The present invention relates to the field of deep learning, and in particular to an image semantic segmentation method and apparatus, and a storage medium.

A movable machine device can perform semantic segmentation on images collected by a mounted camera to obtain a semantic understanding of the scene, and can thereby implement functions such as obstacle avoidance and navigation.

At present, on the one hand, for reasons of cost and maneuverability, the computing resources of a movable machine device are often relatively limited. On the other hand, a movable machine device needs to interact with the real environment in real time. How to perform real-time semantic segmentation with limited computing resources is therefore a challenging technical problem.

Embodiments of the present invention provide an image semantic segmentation method and apparatus, and a storage medium.

According to a first aspect of the embodiments of the present invention, an image semantic segmentation method is provided. The method includes: performing feature extraction on an obtained image to be processed to obtain a first feature image; synchronously extracting context features of a plurality of different ranges from the first feature image to obtain a plurality of second feature images; determining a target image according to at least the plurality of second feature images, and using the target image as a new first feature image to again synchronously extract context features of a plurality of different ranges; and, in response to the number of times that context features of a plurality of different ranges have been synchronously extracted from the first feature image reaching a target number, generating a semantic image corresponding to the image to be processed based on the most recently obtained target image.
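By way of illustration only, the control flow of the method of the first aspect may be sketched as follows. The function and parameter names (`segment`, `extract_contexts`, `determine_target`, `to_semantic`) are hypothetical and do not appear in the original text, and the sequential call to `extract_contexts` merely stands in for branches that the method describes as running synchronously.

```python
from typing import Any, Callable, Sequence


def segment(
    first_feature: Any,
    extract_contexts: Callable[[Any], Sequence[Any]],
    determine_target: Callable[[Sequence[Any], int], Any],
    to_semantic: Callable[[Any], Any],
    target_count: int,
) -> Any:
    """Repeat multi-range context extraction until target_count is reached.

    All four callables are placeholders (assumptions for illustration):
    extract_contexts  -> yields the "second feature images"
    determine_target  -> fuses them into the "target image"
    to_semantic       -> turns the last target image into the semantic image
    """
    if target_count < 1:
        raise ValueError("target_count must be at least 1")

    current = first_feature
    target = first_feature
    for count in range(1, target_count + 1):
        # One round of extracting context features of several ranges.
        second_features = extract_contexts(current)
        # Fuse the branch outputs; the round number is passed so that the
        # last round can be treated differently if an embodiment requires it.
        target = determine_target(second_features, count)
        # The target image becomes the new first feature image.
        current = target
    return to_semantic(target)
```

In this sketch the stopping condition is simply the loop bound; an implementation could equally track the count explicitly and branch on it.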

In some optional embodiments, synchronously extracting context features of a plurality of different ranges from the first feature image to obtain a plurality of second feature images includes: synchronously performing dimension reduction processing on the first feature image in a plurality of channels to obtain a plurality of third feature images; and extracting context features of different ranges from at least two of the plurality of third feature images to obtain the plurality of second feature images.

In some optional embodiments, extracting context features of different ranges from at least two of the plurality of third feature images to obtain the plurality of second feature images includes: extracting context features of different ranges from at least two of the plurality of third feature images by means of depthwise separable convolution and dilated convolution whose convolution kernels correspond to different dilation coefficients, to obtain the plurality of second feature images.
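As a general illustration of why different dilation coefficients yield context of different ranges, the following sketch computes the span covered by a dilated kernel and applies a one-dimensional dilated convolution without padding. The function names are hypothetical, and the one-dimensional setting is an assumption made only to keep the example short; the embodiments operate on feature images.

```python
from typing import List, Sequence


def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Span covered by a kernel whose taps are `dilation` samples apart."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def dilated_conv1d(
    signal: Sequence[float], kernel: Sequence[float], dilation: int
) -> List[float]:
    """1-D dilated cross-correlation with no padding and stride 1."""
    span = effective_kernel_size(len(kernel), dilation)
    out = []
    for start in range(len(signal) - span + 1):
        # Each kernel tap reads the input `dilation` positions further on.
        out.append(
            sum(w * signal[start + i * dilation] for i, w in enumerate(kernel))
        )
    return out
```

With a kernel of three taps, a dilation coefficient of 1 covers 3 input samples, a coefficient of 2 covers 5, and a coefficient of 4 covers 9, while the number of weights stays the same.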

In some optional embodiments, determining the target image according to at least the plurality of second feature images includes: fusing at least the plurality of second feature images to obtain a fourth feature image; and determining the target image according to at least the fourth feature image.

In some optional embodiments, fusing at least the plurality of second feature images to obtain the fourth feature image includes: overlaying the plurality of second feature images to obtain the fourth feature image; or overlaying the plurality of second feature images and at least one of the plurality of third feature images to obtain the fourth feature image.

In some optional embodiments, determining the target image according to at least the fourth feature image includes: performing upsampling on the fourth feature image to obtain the target image; or performing sub-pixel convolution on the fourth feature image to obtain the target image.
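By way of illustration only, sub-pixel convolution is commonly realized as an ordinary convolution that produces r*r times as many channels, followed by a rearrangement of those channels into an image r times larger in each spatial direction. The sketch below shows only the rearrangement step on nested lists. The function name `pixel_shuffle` and the channel ordering are assumptions that follow a widely used convention; the original text does not specify them.

```python
from typing import List

Tensor3D = List[List[List[float]]]  # indexed as [channel][row][column]


def pixel_shuffle(features: Tensor3D, r: int) -> Tensor3D:
    """Rearrange [C*r*r][H][W] into [C][H*r][W*r] (assumed channel order)."""
    channels_in = len(features)
    if channels_in % (r * r) != 0:
        raise ValueError("channel count must be divisible by r*r")
    height, width = len(features[0]), len(features[0][0])
    channels_out = channels_in // (r * r)

    out = [
        [[0.0] * (width * r) for _ in range(height * r)]
        for _ in range(channels_out)
    ]
    for c in range(channels_out):
        for i in range(r):
            for j in range(r):
                # Input channel that supplies sub-pixel offset (i, j).
                src = features[c * r * r + i * r + j]
                for y in range(height):
                    for x in range(width):
                        out[c][y * r + i][x * r + j] = src[y][x]
    return out
```

For example, four 1x1 channels holding 1, 2, 3 and 4 are rearranged with r = 2 into a single 2x2 channel whose rows are (1, 2) and (3, 4).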

In some optional embodiments, the image semantic segmentation method further includes: obtaining a fifth feature image after performing feature extraction and dimension reduction processing on the image to be processed, wherein the number of feature extraction layers corresponding to the fifth feature image is smaller than the number of feature extraction layers corresponding to the first feature image. Determining the target image according to at least the fourth feature image includes: when the number of times is smaller than the target number, overlaying the fourth feature image and the fifth feature image and then upsampling the result to obtain the target image; or, when the number of times is smaller than the target number, overlaying the fifth feature image and an image obtained by performing sub-pixel convolution on the fourth feature image to obtain the target image.

In some optional embodiments, the dimension corresponding to the most recently obtained target image is a target dimension, and the target dimension is determined according to a preset total number of object categories included in the semantic image.

In some optional embodiments, after generating the semantic image corresponding to the image to be processed, the image semantic segmentation method further includes: performing machine device navigation according to the semantic image.

According to a second aspect of the embodiments of the present invention, an image semantic segmentation apparatus is provided. The apparatus includes: a feature extraction module, configured to perform feature extraction on an obtained image to be processed to obtain a first feature image; a context feature extraction module, configured to synchronously extract context features of a plurality of different ranges from the first feature image to obtain a plurality of second feature images; a determining module, configured to determine a target image according to at least the plurality of second feature images, and to use the target image as a new first feature image to again synchronously extract context features of a plurality of different ranges; and a semantic image generation module, configured to generate, in response to the number of times that context features of a plurality of different ranges have been synchronously extracted from the first feature image reaching a target number, a semantic image corresponding to the image to be processed based on the most recently obtained target image.

In some optional embodiments, the context feature extraction module includes: a first processing sub-module, configured to synchronously perform dimension reduction processing on the first feature image in a plurality of channels to obtain a plurality of third feature images; and a second processing sub-module, configured to extract context features of different ranges from at least two of the plurality of third feature images to obtain the plurality of second feature images.

In some optional embodiments, the second processing sub-module is configured to extract context features of different ranges from at least two of the plurality of third feature images by means of depthwise separable convolution and dilated convolution whose convolution kernels correspond to different dilation coefficients, to obtain the plurality of second feature images.

In some optional embodiments, the determining module includes: a first determining sub-module, configured to fuse at least the plurality of second feature images to obtain a fourth feature image; and a second determining sub-module, configured to determine the target image according to at least the fourth feature image.

In some optional embodiments, the first determining sub-module is configured to: overlay the plurality of second feature images to obtain the fourth feature image; or overlay the plurality of second feature images and at least one of the plurality of third feature images to obtain the fourth feature image.

In some optional embodiments, the second determining sub-module is configured to: perform upsampling on the fourth feature image to obtain the target image; or perform sub-pixel convolution on the fourth feature image to obtain the target image.

In some optional embodiments, the image semantic segmentation apparatus further includes: a processing module, configured to obtain a fifth feature image after performing feature extraction and dimension reduction processing on the image to be processed, wherein the number of feature extraction layers corresponding to the fifth feature image is smaller than the number of feature extraction layers corresponding to the first feature image; and a second determining sub-module, configured to, when the number of times is smaller than the target number, overlay the fourth feature image and the fifth feature image and then upsample the result to obtain the target image, or, when the number of times is smaller than the target number, overlay the fifth feature image and an image obtained by performing sub-pixel convolution on the fourth feature image to obtain the target image.

In some optional embodiments, the image semantic segmentation apparatus further includes a navigation module, configured to perform machine device navigation according to the semantic image.

According to a third aspect of the embodiments of the present invention, a computer-readable storage medium is provided. The storage medium stores a computer program, and the computer program is used to execute the image semantic segmentation method according to any embodiment of the first aspect.

According to a fourth aspect of the embodiments of the present invention, an image semantic segmentation apparatus is provided. The apparatus includes: a processor; and a memory for storing instructions executable by the processor, wherein the processor is configured to call the executable instructions stored in the memory to implement the image semantic segmentation method according to any embodiment of the first aspect.

According to a fifth aspect of the embodiments of the present invention, a computer program is provided. The computer program causes a computer to execute the image semantic segmentation method according to any embodiment of the first aspect of the embodiments of the present invention.

The technical solutions provided by the embodiments of the present invention may include the following beneficial effects.

In the embodiments of the present invention, feature extraction is performed on the obtained image to be processed to obtain a first feature image, and context features of a plurality of different ranges are then synchronously extracted from the first feature image to obtain a plurality of second feature images. A target image is determined according to at least the plurality of second feature images, and the target image is used as a new first feature image to again synchronously extract context features of a plurality of different ranges. When the number of times that context features of a plurality of different ranges have been synchronously extracted from the first feature image reaches the target number, semantic segmentation is performed based on the most recently obtained target image to generate a semantic image corresponding to the image to be processed. By synchronously extracting context features of a plurality of different ranges, several times over, from the feature image corresponding to the image to be processed, the embodiments of the present invention fully fuse context information of different scales and improve the accuracy of semantic segmentation.

In the embodiments of the present invention, dimension reduction processing may first be performed on the first feature image in a plurality of channels to obtain a plurality of third feature images, and context features of different ranges may then be extracted from at least two of the plurality of third feature images to obtain the corresponding plurality of second feature images. This achieves the purpose of synchronously extracting context features of a plurality of different ranges from the first feature image, which helps improve the accuracy of semantic segmentation and reduces the amount of computation in the semantic segmentation process.

In the embodiments of the present invention, context features of different ranges may be extracted from at least two of the plurality of third feature images by means of depthwise separable convolution and dilated convolution whose convolution kernels correspond to different dilation coefficients. This achieves the purpose of synchronously extracting context features of a plurality of different ranges from the first feature image, which helps improve the accuracy of semantic segmentation.
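The text does not quantify the cost of the convolutions it names. As a general illustration, and not as a figure taken from the embodiments, the following sketch compares the number of weights in a standard convolution with that of a depthwise separable convolution, which consists of one k x k filter per input channel followed by a 1 x 1 convolution across channels. The function names and the example sizes are hypothetical, and bias terms are ignored.

```python
def standard_conv_weights(kernel_size: int, c_in: int, c_out: int) -> int:
    """Weights in a standard k x k convolution (bias ignored)."""
    return kernel_size * kernel_size * c_in * c_out


def depthwise_separable_weights(kernel_size: int, c_in: int, c_out: int) -> int:
    """Weights in a depthwise k x k step plus a pointwise 1 x 1 step."""
    depthwise = kernel_size * kernel_size * c_in  # one filter per input channel
    pointwise = c_in * c_out                      # 1 x 1 mixing across channels
    return depthwise + pointwise


if __name__ == "__main__":
    # Hypothetical layer: 3 x 3 kernel, 64 input and 64 output channels.
    print(standard_conv_weights(3, 64, 64))        # 36864
    print(depthwise_separable_weights(3, 64, 64))  # 4672
```

Because dilation spreads the kernel taps without adding weights, the same counts apply for any dilation coefficient.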

In the embodiments of the present invention, the fourth feature image may be obtained by directly overlaying the plurality of second feature images, or by overlaying the plurality of second feature images and at least one of the plurality of third feature images. This offers high usability, allows information of more scales to be fused, and improves the accuracy of semantic segmentation.

In the embodiments of the present invention, in order to maintain the dimension of the target image, the target image may be obtained by performing upsampling on the fourth feature image. Alternatively, sub-pixel convolution may be performed on the fourth feature image, which improves the effect of semantic segmentation and makes the segmentation result more accurate.

In the embodiments of the present invention, a fifth feature image may be obtained before the target image is determined. The fifth feature image is an image obtained by extracting low-dimensional image features from the image to be processed, and the number of feature extraction layers corresponding to the fifth feature image is smaller than the number of feature extraction layers corresponding to the first feature image. The fourth feature image and the fifth feature image are overlaid and then upsampled to obtain the target image; when the number of times that context features of a plurality of different ranges have been synchronously extracted from the first feature image reaches the target number, upsampling may be performed on the fourth feature image alone to obtain the target image. This lowers the likelihood that some important features of the image to be processed are lost after dimension reduction processing, and improves the accuracy of semantic segmentation.

In the embodiments of the present invention, the dimension of the most recently obtained target image is the target dimension, where the target dimension is determined according to a preset total number of object categories included in the semantic image. This ensures that the dimension of the finally obtained semantic image is consistent with the dimension of the image to be processed.

In the embodiments of the present invention, machine device navigation may be performed according to the generated semantic image corresponding to the image to be processed, which offers high usability.

It should be understood that the above general description and the following detailed description are merely exemplary and explanatory, and do not limit the embodiments of the present invention.

The accompanying drawings are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present invention, and together with the specification serve to explain the principles of the present invention.
Fig. 1A is a color image according to an exemplary embodiment of the present invention.
Fig. 1B is a semantic image according to an exemplary embodiment of the present invention.
Fig. 2 is a flowchart of an image semantic segmentation method according to an exemplary embodiment of the present invention.
Fig. 3 is a flowchart of another image semantic segmentation method according to an exemplary embodiment of the present invention.
Fig. 4 is an exemplary diagram of a scenario in which context features of different ranges are extracted according to an exemplary embodiment of the present invention.
Fig. 5 is a flowchart of another image semantic segmentation method according to an exemplary embodiment of the present invention.
Fig. 6 is a flowchart of another image semantic segmentation method according to an exemplary embodiment of the present invention.
Fig. 7 is an exemplary diagram of a neural network architecture for obtaining a semantic image according to an exemplary embodiment of the present invention.
Fig. 8A is an exemplary architecture diagram of a back-end sub-network according to an exemplary embodiment of the present invention.
Fig. 8B is an exemplary architecture diagram of another back-end sub-network according to an exemplary embodiment of the present invention.
Fig. 8C is an exemplary architecture diagram of another back-end sub-network according to an exemplary embodiment of the present invention.
Fig. 8D is an exemplary architecture diagram of another back-end sub-network according to an exemplary embodiment of the present invention.
Fig. 9 is a flowchart of yet another image semantic segmentation method according to an exemplary embodiment of the present invention.
Fig. 10 is a block diagram of an image semantic segmentation apparatus according to an exemplary embodiment of the present invention.
Fig. 11 is an exemplary structural diagram of an image semantic segmentation apparatus according to an exemplary embodiment of the present invention.

Exemplary embodiments will now be described in detail, examples of which are shown in the drawings. Where the following description refers to the drawings, the same numbers in different drawings denote the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present invention. Rather, they are merely examples of apparatuses and methods consistent with some aspects of the present invention as detailed in the claims.

The terminology used in the present invention is for the purpose of describing particular embodiments only and is not intended to limit the present invention. The singular forms "a" and "the" used in the present invention and the claims are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and includes any or all possible combinations of one or more of the associated listed items.

It should be understood that although terms such as "first", "second", and "third" are used in the present invention to describe various pieces of information, the information should not be limited by these terms. These terms are used only to distinguish information of the same type from one another. For example, without departing from the scope of the present invention, first information may also be referred to as second information, and likewise second information may also be referred to as first information. Depending on the context, the word "if" as used herein may be interpreted as "when", "upon", or "in response to determining".

Embodiments of the present invention provide an image semantic segmentation method that may be used, for example, in a movable machine device such as a robot, an autonomous vehicle, or a drone. Alternatively, the method provided by the embodiments of the present invention may be implemented by a processor running computer-executable code.

Image semantic segmentation predicts, for each pixel of an input red-green-blue (RGB) image, the type of object to which the pixel belongs. The object types may include, but are not limited to, grass, people, vehicles, buildings, and sky. The result is a semantic map that has the same size and dimensions as the RGB image and carries an object-type label for each pixel. For example, Fig. 1A is an RGB image, and Fig. 1B is the corresponding semantic image.
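By way of illustration only, one common way to turn per-category scores into a per-pixel label map is to choose, at each pixel, the category with the highest score. The original text does not state how labels are selected, so the selection rule, the function name `to_label_map`, and the three category names below are assumptions made for this sketch.

```python
from typing import List

# Hypothetical category list; the count of entries plays the role of the
# "target dimension" (one score channel per object category).
CATEGORIES = ["grass", "person", "vehicle"]


def to_label_map(scores: List[List[List[float]]]) -> List[List[int]]:
    """Convert scores[category][row][col] into labels[row][col].

    Each output entry is the index of the highest-scoring category at
    that pixel; ties resolve to the lowest index.
    """
    num_categories = len(scores)
    height, width = len(scores[0]), len(scores[0][0])
    return [
        [
            max(range(num_categories), key=lambda c: scores[c][y][x])
            for x in range(width)
        ]
        for y in range(height)
    ]
```

The resulting label map has the same height and width as the score channels, which matches the statement that the semantic map has the same size as the input image.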

In the embodiments of the present invention, feature extraction is performed on the image to be processed obtained by the machine device to obtain a first feature image; context features of a plurality of different ranges are then synchronously extracted from the first feature image over several rounds to obtain a plurality of second feature images, and a target image is determined according to at least the plurality of second feature images; finally, a semantic image may be generated based on the most recently obtained target image. Through several rounds of context feature extraction and fusion, the embodiments of the present invention fully fuse context information of different scales and can improve the accuracy of semantic segmentation. According to the semantic image corresponding to the image to be processed, the machine device can avoid obstacles in front of it and plan a reasonable travel route, which offers high usability.

The above is only an exemplary application scenario of the present invention; other scenarios in which the image semantic segmentation method of the present invention can be used all fall within the protection scope of the present invention.

FIG. 2 is a flowchart of an image semantic segmentation method according to an exemplary embodiment. The method includes the following steps.

In step 101, feature extraction is performed on the acquired image to be processed to obtain a first feature image.

In an embodiment of the present invention, the image to be processed may be a real-time image collected by a camera preinstalled on the mechanical device, and the collected image may contain various objects located ahead of the mechanical device on its movement path. The image to be processed may also be an image already collected by the mechanical device (for example, an image stored on the mechanical device), or an image sent to the mechanical device by another device for semantic segmentation.

The first feature image may be obtained by converting the original image information contained in the image to be processed into a group of features having a clear physical or statistical meaning. Alternatively, the first feature image may be obtained by extracting high-dimensional image features from the image to be processed through a convolutional network, for example a residual network (Residual Networks, ResNet) or a Visual Geometry Group (VGG) network.

In some embodiments, when feature extraction is performed on the image to be processed, features such as Haar-like features (Haar), local binary patterns (Local Binary Pattern, LBP), or histograms of oriented gradients (Histogram of Oriented Gradient, HOG) may be extracted. Haar-like features describe changes in pixel intensity (light and dark) within a local range of the image, LBP describes texture information within a local range, and HOG describes shape edge gradient information within a local range. In other embodiments, high-dimensional visual features of the image to be processed may be extracted.

In step 102, context features of a plurality of different ranges are synchronously extracted from the first feature image to obtain a plurality of second feature images.

In an embodiment of the present invention, context feature extraction computes statistics on the distribution of other pixels in the neighborhood of a pixel in the first feature image.

Context feature extraction with different ranges means context feature extraction performed at different pixel intervals. For example, when context feature extraction is performed on the first feature image, it may be performed synchronously at a plurality of pixel intervals (for example, intervals of 3, 7, and 12 pixels), yielding one second feature image per interval.

In step 103, a target image is determined according to at least the plurality of second feature images, and the target image is used as the new first feature image, from which context features of a plurality of different ranges are synchronously extracted again.

In an embodiment of the present invention, the target image is the image obtained in each round according to at least the plurality of second feature images. After the target image is determined, it is used as the new first feature image, and the method returns to step 102.

In step 104, in response to the number of times that context features of a plurality of different ranges have been synchronously extracted from the first feature image reaching a target number, a semantic image corresponding to the image to be processed is generated based on the last obtained target image.

In an embodiment of the present invention, the target number may be a positive integer greater than or equal to 2.

In the above embodiment, feature extraction is performed on the acquired image to be processed to obtain a first feature image, and context features of a plurality of different ranges are synchronously extracted from the first feature image to obtain a plurality of second feature images. A target image is determined according to at least the plurality of second feature images, and the target image is used as the new first feature image, from which context features of a plurality of different ranges are synchronously extracted again. When the number of such synchronous extractions reaches the target number, semantic segmentation is performed based on the last obtained target image to generate the semantic image corresponding to the image to be processed. By synchronously extracting context features of a plurality of different ranges multiple times from the feature image corresponding to the image to be processed, the embodiment sufficiently fuses context information of different scales and improves the accuracy of semantic segmentation.
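
The control flow of steps 101 to 104 can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `segment` and the four callables passed to it are hypothetical placeholders for the feature extraction, multi-range context extraction, fusion, and semantic image generation described above, and rejecting a target number below 2 is an assumption based on the range given above.

```python
def segment(image, extract_features, extract_contexts, fuse, to_semantic,
            target_count=2):
    """Run the iterative scheme of steps 101-104 with caller-supplied stages.

    All four callables are hypothetical stand-ins for real network layers.
    """
    if target_count < 2:
        # Assumption: mirror the stated range (integer >= 2) as a hard check.
        raise ValueError("target_count must be an integer >= 2")

    first_feature = extract_features(image)            # step 101
    target = first_feature
    for _ in range(target_count):
        second_features = extract_contexts(target)     # step 102
        target = fuse(second_features)                 # step 103: new first feature image
    return to_semantic(target)                         # step 104


# Toy usage with numbers standing in for images.
result = segment(
    0,
    extract_features=lambda img: img,
    extract_contexts=lambda feat: [feat + 1, feat + 2],
    fuse=sum,
    to_semantic=lambda tgt: tgt,
    target_count=2,
)
```

Passing the stages as callables keeps the sketch independent of any particular deep learning library.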

In some optional embodiments, for step 101, the collected image to be processed may be input into a feature extraction network, which outputs the first feature image. The feature extraction network may be a neural network capable of feature extraction, such as ResNet or VGG.

In some optional embodiments, for example as shown in FIG. 3, step 102 may include the following steps.

In step 102-1, the first feature image is split into a plurality of channels and dimensionality reduction is performed synchronously on those channels to obtain a plurality of third feature images.

In an embodiment of the present invention, dimensionality reduction is performed on the first feature image so that subsequent context feature extraction can be performed more effectively, and it helps reduce the amount of subsequent computation. After the first feature image is split into a plurality of channels and reduced in dimension, context features of different ranges can be extracted from the dimension-reduced images corresponding to the respective channels, which helps improve the accuracy of semantic segmentation and reduces the computation of the segmentation process.

In an embodiment of the present invention, the first feature image may be split into a plurality of channels and reduced synchronously to the same dimension. For example, as shown in FIG. 4, after a convolutional layer with a 1×1 convolution kernel performs the multi-channel dimensionality reduction, each of the resulting third feature images may have dimensions of 1×1×256.

In step 102-2, context features of different ranges are extracted from at least two of the plurality of third feature images to obtain a plurality of second feature images.

In an embodiment of the present invention, context features of different ranges may be extracted from at least two of the plurality of third feature images through depthwise separable convolution and dilated convolutions whose kernels have different dilation rates, thereby obtaining the plurality of second feature images. This achieves the purpose of synchronously extracting context features of a plurality of different ranges from the first feature image and helps improve the accuracy of semantic segmentation. The dilated convolution may use a 3×3 convolution kernel, or a kernel of another size such as 5×5 or 7×7; the embodiments of the present invention do not limit the kernel size of the dilated convolution. The dilation rate r may be set to different values depending on the semantic segmentation scenario, for example 6, 12, 18, or 32, and context feature extraction is performed at a pixel interval determined by the value of r.

For example, as shown in FIG. 4, four-channel dimensionality reduction is performed on the first feature image to obtain four third feature images, denoted third feature image 1 through third feature image 4. Context feature extraction may be omitted for third feature image 1, while third feature images 2, 3, and 4 correspond to dilation rates r of 6, 12, and 18, respectively. That is, context features are extracted from third feature images 2, 3, and 4 at intervals of 6, 12, and 18 pixels, respectively, to obtain three second feature images.
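
To make the role of the dilation rate r concrete, the following minimal sketch applies a single-channel dilated convolution in plain Python. It illustrates the general technique only: the function names, the zero padding at the borders, and the single-channel layout are assumptions made for this example, and the depthwise separable part of the described network is omitted.

```python
def dilated_conv2d(image, kernel, rate):
    """Convolve `image` with a k x k `kernel` whose taps are `rate` pixels apart.

    Out-of-range taps are treated as zero (zero padding, an assumption here).
    """
    height, width = len(image), len(image[0])
    k = len(kernel)
    half = k // 2
    out = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            acc = 0.0
            for i in range(k):
                for j in range(k):
                    yy = y + (i - half) * rate
                    xx = x + (j - half) * rate
                    if 0 <= yy < height and 0 <= xx < width:
                        acc += kernel[i][j] * image[yy][xx]
            out[y][x] = acc
    return out


def receptive_span(kernel_size, rate):
    """Pixels covered along one axis by a kernel with the given dilation rate."""
    return (kernel_size - 1) * rate + 1


# A 5 x 5 image with a single bright pixel in the centre.
image = [[0.0] * 5 for _ in range(5)]
image[2][2] = 1.0
ones = [[1.0] * 3 for _ in range(3)]
response = dilated_conv2d(image, ones, rate=2)

# Spans of a 3 x 3 kernel at the example rates 6, 12 and 18.
spans = [receptive_span(3, r) for r in (6, 12, 18)]
```

With rate 2, the corner output `response[0][0]` sees the centre pixel while `response[1][1]` does not, which shows how a larger rate gathers context from farther away without adding kernel weights.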

In the above embodiment, the first feature image is first split into a plurality of channels and reduced in dimension to obtain a plurality of third feature images, and context features of different ranges are then extracted from at least two of the third feature images to obtain at least two corresponding second feature images. This achieves the purpose of synchronously extracting context features of a plurality of different ranges from the first feature image, which helps improve the accuracy of semantic segmentation and reduces the computation of the segmentation process.

In some optional embodiments, for example as shown in FIG. 5, determining the target image according to at least the plurality of second feature images in step 103 may include the following steps.

In step 103-1, at least the plurality of second feature images are fused to obtain a fourth feature image.

In an embodiment of the present invention, the fourth feature image may be obtained by overlaying at least the plurality of second feature images obtained in the above step.

For example, the plurality of second feature images are gathered together and the multi-scale context features are fused through a convolution operation to obtain the fourth feature image. The fourth feature image may also be obtained by concatenating the plurality of second feature images.
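
One common way to realize this kind of fusion is to concatenate the feature images along the channel axis and mix the channels with a 1×1 convolution. The sketch below assumes that approach; the text above does not fix the kernel size of the fusing convolution, and the function names and the nested-list layout `[channel][row][column]` are hypothetical.

```python
def concat_channels(feature_maps):
    """Concatenate feature maps, each given as [channel][row][col], by channel."""
    merged = []
    for fmap in feature_maps:
        merged.extend(fmap)
    return merged


def conv1x1(channels, weights):
    """Mix channels per pixel; weights[o][c] scales input channel c for output o."""
    height, width = len(channels[0]), len(channels[0][0])
    return [
        [
            [sum(w * ch[y][x] for w, ch in zip(row, channels)) for x in range(width)]
            for y in range(height)
        ]
        for row in weights
    ]


# Two single-channel 1 x 2 "second feature images".
second_a = [[[1.0, 2.0]]]
second_b = [[[3.0, 4.0]]]
stacked = concat_channels([second_a, second_b])   # 2 channels
fourth = conv1x1(stacked, [[0.5, 0.5]])           # fuse down to 1 channel
```

The same two helpers would also cover the variant in which a third feature image is added to the list before fusion.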

In step 103-2, the target image is determined according to at least the fourth feature image.

In one possible implementation, the fourth feature image may be used directly as the target image. In another possible implementation, processing that can improve the semantic segmentation effect may be performed on the fourth feature image to obtain the target image. In yet another possible implementation, the target image may be determined according to the fourth feature image and another feature image related to the image to be processed.

In the above embodiment, the target image can be determined according to at least the plurality of second feature images, so usability is high.

In some optional embodiments, for step 103-1, in one possible implementation the fourth feature image may be obtained by overlaying the plurality of second feature images. To better preserve the feature information corresponding to the image to be processed and to improve the accuracy of semantic segmentation, in another possible implementation the plurality of second feature images and at least one of the plurality of third feature images are overlaid, and the resulting image is used as the fourth feature image.

The plurality of third feature images are the images obtained by splitting the first feature image into a plurality of channels and synchronously performing dimensionality reduction. In an embodiment of the present invention, the plurality of second feature images and at least one of the plurality of third feature images are overlaid, and the multi-scale context features are fused through a convolution operation to obtain the fourth feature image. The fourth feature image may also be obtained by concatenating the plurality of second feature images and at least one of the plurality of third feature images.

In the above embodiment, the fourth feature image may be obtained by directly overlaying the plurality of second feature images, or by overlaying the plurality of second feature images and at least one third feature image on which context feature extraction has not been performed. Usability is therefore high, information of more scales can be fused, and the accuracy of semantic segmentation is improved.

In some optional embodiments, for step 103-2, the target image may be determined in any one of the following ways.

In one possible implementation, determining the target image according to at least the fourth feature image includes: performing upsampling on the fourth feature image to obtain the target image.

In an embodiment of the present invention, the target image subsequently undergoes dimensionality reduction again or is used to generate the semantic image, so upsampling is performed on the fourth feature image to maintain the dimensions of the target image. After the fourth feature image is determined, upsampling (for example, linear interpolation) is performed directly on it to obtain the target image. The target image is then used as the new first feature image, and the method returns to step 102.

When upsampling is performed on the fourth feature image, the corresponding upsampling factor t may be 2, 4, 8, and so on, and the same or a different factor may be used each time the fourth feature image is upsampled. Here, the upsampling factor is the number of new pixels inserted between pixels, using a suitable interpolation algorithm, when the original image is enlarged. For example, when the upsampling factor t is 2, two new pixels may be inserted between two adjacent pixels using a linear interpolation algorithm.
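
The insertion of new pixels by linear interpolation can be illustrated on a single row of values. The sketch below follows the definition given above, in which the factor is the number of new pixels inserted between each pair of adjacent pixels; the function name `upsample_row` and the one-dimensional setting are assumptions made for brevity.

```python
def upsample_row(row, new_points):
    """Insert `new_points` linearly interpolated values between adjacent entries."""
    if len(row) < 2:
        return list(row)
    out = []
    for left, right in zip(row, row[1:]):
        out.append(left)
        for k in range(1, new_points + 1):
            # Evenly spaced positions strictly between `left` and `right`.
            out.append(left + (right - left) * k / (new_points + 1))
    out.append(row[-1])
    return out


# With t = 2, two new values are inserted between each adjacent pair.
upsampled = upsample_row([0.0, 3.0, 6.0], 2)
```

A two-dimensional version would apply the same interpolation along rows and then along columns.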

In another possible implementation, determining the target image according to at least the fourth feature image includes: performing sub-pixel convolution on the fourth feature image to obtain the target image.

Sub-pixel convolution tiles the pixels along the depth direction of the output feature map onto the two-dimensional plane, so that the depth of the feature map decreases and its spatial extent increases, thereby improving the spatial resolution of the feature map.
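
The depth-to-space rearrangement at the heart of sub-pixel convolution can be sketched as follows. The convolution that produces the deep feature map is omitted, and the function name `pixel_shuffle`, the channel ordering, and the `[channel][row][column]` layout are assumptions of this example rather than details given above.

```python
def pixel_shuffle(channels, factor):
    """Rearrange [C * f * f][H][W] into [C][H * f][W * f] (depth to space)."""
    block = factor * factor
    if len(channels) % block != 0:
        raise ValueError("channel count must be divisible by factor squared")
    height, width = len(channels[0]), len(channels[0][0])
    out_depth = len(channels) // block
    out = [
        [[0.0] * (width * factor) for _ in range(height * factor)]
        for _ in range(out_depth)
    ]
    for c in range(out_depth):
        for i in range(factor):
            for j in range(factor):
                source = channels[c * block + i * factor + j]
                for y in range(height):
                    for x in range(width):
                        # Each depth slice fills one position of every f x f tile.
                        out[c][y * factor + i][x * factor + j] = source[y][x]
    return out


# Four 1 x 1 channels become one 2 x 2 channel.
deep = [[[1.0]], [[2.0]], [[3.0]], [[4.0]]]
shuffled = pixel_shuffle(deep, 2)
```

The depth shrinks by a factor of 4 while each spatial side doubles, matching the trade of depth for resolution described above.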

Performing sub-pixel convolution on the fourth feature image can improve the effect of semantic segmentation and make the segmentation result more accurate. Upsampling may also be performed after the sub-pixel convolution; the target image is obtained through the upsampling, the target image is then used as the new first feature image, and the method returns to step 102.

In another possible implementation, the first feature image has previously undergone dimensionality reduction and all subsequent images are derived from the dimension-reduced third feature images, whereas the finally generated semantic image is a high-dimensional image with the same dimensions as the image to be processed. To lower the possibility that important features of the image to be processed are lost after dimensionality reduction, and to improve the accuracy of semantic segmentation, a fifth feature image may be acquired before the target image is determined.

Here, the fifth feature image is an image obtained by extracting low-dimensional image features from the image to be processed. The number of feature extraction layers corresponding to the fifth feature image is smaller than the number corresponding to the first feature image. For example, if the first feature image is obtained by performing 10 layers of feature extraction on the image to be processed, the image obtained after the first 4 layers of feature extraction may be used as the fifth feature image.

Correspondingly, for example as shown in FIG. 6, the image semantic segmentation method may further include the following step.

In step 105, feature extraction and dimensionality reduction are performed on the image to be processed to obtain a fifth feature image.

In an embodiment of the present invention, the fourth feature image and the fifth feature image may be overlaid and then upsampled to obtain the target image.

Here, when the number of times that context features of a plurality of different ranges have been synchronously extracted from the first feature image is less than the target number, step 103-2 overlays the fourth feature image and the fifth feature image and then upsamples the result to obtain the target image. When that number reaches the target number, only the fourth feature image is upsampled to obtain the target image. The target image is used as the new first feature image, and the method returns to step 102. The upsampling factor t used each time may be the same or different.
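
The branch on the extraction count can be written as a small helper. This is only a sketch of the decision logic: `determine_target`, `overlay`, and `upsample` are hypothetical names, and the two callables stand in for whatever overlay and upsampling operations an implementation actually uses.

```python
def determine_target(fourth, fifth, rounds_done, target_count, overlay, upsample):
    """Choose how the target image is formed from the fourth feature image.

    Before the target count is reached, the fifth (low-level) feature image is
    overlaid first; in the final round only the fourth feature image is used.
    """
    if rounds_done < target_count:
        return upsample(overlay(fourth, fifth))
    return upsample(fourth)


# Toy usage with numbers in place of images.
add = lambda a, b: a + b
double = lambda value: value * 2
intermediate = determine_target(1, 10, rounds_done=1, target_count=2,
                                overlay=add, upsample=double)
final = determine_target(1, 10, rounds_done=2, target_count=2,
                         overlay=add, upsample=double)
```

Swapping `upsample` for a sub-pixel convolution, and overlaying after it rather than before, would give the alternative variant described in the text.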

In another possible implementation, the target image may likewise be determined according to the fourth feature image and the fifth feature image.

When the number of times that context features of a plurality of different ranges have been synchronously extracted from the first feature image is less than the target number, the image obtained by performing sub-pixel convolution on the fourth feature image is overlaid with the fifth feature image to obtain the target image. When that number reaches the target number, sub-pixel convolution is performed directly on the fourth feature image to obtain the target image.

In an embodiment of the present invention, to ensure the effect of semantic segmentation, when the number of synchronous extractions of context features of a plurality of different ranges from the first feature image is less than the target number, sub-pixel convolution may likewise be performed on the fourth feature image and the resulting image overlaid with the fifth feature image to obtain the target image. If the number reaches the target number, sub-pixel convolution may be performed directly on the fourth feature image to obtain the target image. Upsampling may additionally be performed after the sub-pixel convolution on the fourth feature image. The target image is then used as the new first feature image, and the method returns to step 102.

It should be noted that each time a target image is determined and used as the new first feature image for another round of synchronous extraction of context features of a plurality of different ranges, the dimensions of the new third feature images obtained by splitting the new first feature image into a plurality of channels and synchronously reducing its dimension may be the same as or different from the dimensions of the third feature images obtained in the previous dimensionality reduction. For example, the previous round may yield a plurality of third feature images of dimensions 1×1×256, while the new first feature image may yield a plurality of new third feature images of dimensions 1×1×128.

In addition, the dilation rates used each time dilated convolution is performed on the plurality of third feature images may also be the same or different. For example, when dilated convolution was performed on at least two of the third feature images in the previous round, the corresponding dilation rates may have been 6, 12, and 18, whereas when dilated convolution is performed on at least two of the new third feature images, the corresponding dilation rates may be 6 and 12.

In the above embodiment, a target image can be determined according to at least the fourth feature image, which ensures the precision and accuracy of semantic segmentation, and usability is high.

In some optional embodiments, to ensure that the dimensions of the finally obtained semantic image match those of the image to be processed, dimensionality reduction and/or dimensionality expansion may be performed before the target image is output, so that the dimension of the target image is a target dimension. The target dimension is determined according to the preset total number of object categories included in the semantic image.

For example, the target dimension may be 1×1×16N, where N is the preset total number of object categories included in the semantic image. If four object categories need to be distinguished in the semantic image, the target dimension may be 1×1×64.

In the above embodiment, before the last obtained target image is output, dimensionality reduction and/or dimensionality expansion (for example, a convolution operation using a convolutional layer with a preset number of channels) is performed so that the dimension of the target image is the target dimension, which improves the accuracy and precision of semantic segmentation.

일부 선택 가능한 실시예에 있어서, 단계 104에 대해, 제일 마지막으로 타깃 이미지를 획득한 후, 보간 알고리즘을 사용하여 상기 시맨틱 이미지를 생성할 수 있고, 상기 보간 알고리즘은 이중 선형 보간 알고리즘을 포함할 수 있지만 이에 한정되지 않는다.In some selectable embodiments, for step 104, after the last target image is obtained, an interpolation algorithm may be used to generate the semantic image, wherein the interpolation algorithm may include a bilinear interpolation algorithm; However, the present invention is not limited thereto.
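As an illustration of the bilinear interpolation mentioned here, the following is a minimal pure-Python sketch that resizes a single-channel 2D grid. The function name `bilinear_resize` and the corner-aligned sampling convention are assumptions made for the example; the source does not specify how the interpolation is configured.

```python
def bilinear_resize(grid, out_h, out_w):
    """Resize a 2D list of floats with corner-aligned bilinear interpolation."""
    in_h, in_w = len(grid), len(grid[0])
    # Map output indices onto input coordinates so the corner samples coincide.
    sy = (in_h - 1) / (out_h - 1) if out_h > 1 else 0.0
    sx = (in_w - 1) / (out_w - 1) if out_w > 1 else 0.0
    out = []
    for i in range(out_h):
        y = i * sy
        y0 = int(y)
        y1 = min(y0 + 1, in_h - 1)
        wy = y - y0  # vertical weight of the lower neighbour
        row = []
        for j in range(out_w):
            x = j * sx
            x0 = int(x)
            x1 = min(x0 + 1, in_w - 1)
            wx = x - x0  # horizontal weight of the right neighbour
            top = grid[y0][x0] * (1 - wx) + grid[y0][x1] * wx
            bottom = grid[y1][x0] * (1 - wx) + grid[y1][x1] * wx
            row.append(top * (1 - wy) + bottom * wy)
        out.append(row)
    return out


print(bilinear_resize([[0.0, 1.0], [2.0, 3.0]], 3, 3))
```

In practice each channel of the target image would be resized this way to the resolution of the image to be processed.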

To further illustrate the above embodiments, as shown in FIG. 7, the collected image to be processed (for example, the real-time image shown in the figure) may be input into a fully convolutional neural network, which outputs the corresponding semantic image.

The fully convolutional neural network may include a front-end subnetwork and a back-end subnetwork.

Here, the front-end subnetwork may be a feature extraction network and may use a neural network such as ResNet or VGG.

When training the front-end subnetwork, a manually labeled image classification sample data set such as ImageNet may be used. The ImageNet set contains images and the corresponding image feature labels; the network parameters of the front-end subnetwork are adjusted so that its output matches the label content in the ImageNet sample set or falls within a tolerance range.

The first feature image corresponding to the image to be processed may be obtained through the front-end subnetwork, and the first feature image is then input into the back-end subnetwork to obtain the semantic image output by the back-end subnetwork.

When training the back-end subnetwork, a manually labeled image semantic segmentation sample set such as CityScapes may be used, and the network parameters of the entire neural network, including those of the front-end subnetwork and the back-end subnetwork, may be trained through a back-propagation algorithm, so that the output of the back-end subnetwork matches the label content in the CityScapes sample set or falls within a tolerance range.

For convenience in introducing the network architecture used by the back-end subnetwork, the embodiments of the present invention are described using only the case where the target number of times is 2 as an example. It should be noted that cases where the target number of times is another positive integer greater than 2 all fall within the protection scope of the present invention.

In a possible implementation, the network architecture of the back-end subnetwork may be as shown in FIG. 8A.

Through subnetwork 1, the first feature image is first split into a plurality of channels and subjected synchronously to dimension reduction processing to obtain a plurality of third feature images. Context features of different ranges are then extracted from at least two of the plurality of third feature images; the plurality of second feature images may be obtained through depthwise separable convolution and dilated convolution whose convolution kernels correspond to different dilation rates.
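To make "context features of different ranges" concrete, the sketch below shows in one dimension how the dilation rate widens the span covered by a fixed-size kernel. The function names and the 1D simplification are assumptions for illustration only; the patent applies dilated convolution to 2D feature images.

```python
def effective_kernel_size(kernel_size, dilation):
    """Span covered by a kernel whose taps are spaced `dilation` samples apart."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def dilated_conv1d(signal, kernel, dilation):
    """'Valid' 1D dilated convolution (cross-correlation form, no padding)."""
    span = effective_kernel_size(len(kernel), dilation)
    out = []
    for start in range(len(signal) - span + 1):
        # Each kernel tap reads the input `dilation` positions after the previous tap.
        out.append(sum(w * signal[start + k * dilation] for k, w in enumerate(kernel)))
    return out


signal = [1, 2, 3, 4, 5, 6, 7]
print(dilated_conv1d(signal, [1, 1, 1], 1))  # [6, 9, 12, 15, 18]
print(dilated_conv1d(signal, [1, 1, 1], 2))  # [9, 12, 15]
```

With the same three weights, a dilation rate of 1 covers 3 samples and a rate of 2 covers 5, so parallel branches with different rates see context of different ranges at no extra weight cost.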

Further, the plurality of second feature images may be overlaid and then upsampled (the upsampling process is not shown in FIG. 8A) to obtain the target image; alternatively, the plurality of second feature images and at least one third feature image on which context feature extraction has not been performed may be overlaid and then upsampled to obtain the target image.

The target image is used directly as the new first feature image. Through subnetwork 2, the new first feature image is first split into a plurality of channels and subjected synchronously to dimension reduction processing, again yielding a plurality of third feature images. Context features of different ranges are then extracted from at least two of the plurality of third feature images, for example through depthwise separable convolution and dilated convolution whose convolution kernels correspond to different dilation rates, to obtain a plurality of second feature images. The plurality of second feature images may then be overlaid and upsampled to obtain the target image; alternatively, the plurality of second feature images and at least one third feature image on which context feature extraction has not been performed may be overlaid and then upsampled to obtain the target image.

A bilinear interpolation algorithm is applied to the target image output by subnetwork 2 to generate the semantic image.
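The control flow described for FIG. 8A can be sketched as follows. This is only a structural outline under stated assumptions: the names `run_subnetwork` and `segment`, and the representation of every stage as a plain callable, are hypothetical. Real stages would be convolutional layers operating on tensors.

```python
def run_subnetwork(first_feature, reducers, extractors, fuse):
    """One pass: reduce per branch, extract context on some branches, then fuse."""
    # Dimension reduction on every branch gives the third feature images.
    thirds = [reduce(first_feature) for reduce in reducers]
    # Context extraction on the leading branches gives the second feature images.
    seconds = [extract(third) for extract, third in zip(extractors, thirds)]
    # Branches without an extractor are passed through unchanged.
    skipped = thirds[len(extractors):]
    # `fuse` stands in for overlaying plus upsampling.
    return fuse(seconds + skipped)


def segment(image, extract_features, subnetworks, to_semantic_image):
    """Run the front end, then each back-end subnetwork in turn."""
    target = extract_features(image)  # first feature image
    for subnetwork in subnetworks:    # len(subnetworks) is the target number of times
        target = subnetwork(target)   # the target image becomes the new first feature image
    return to_semantic_image(target)  # e.g. bilinear interpolation
```

With a target number of times of 2, `subnetworks` would hold two entries corresponding to subnetwork 1 and subnetwork 2.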

In the above embodiment, context features of a plurality of different ranges can be synchronously extracted from the first feature image and fused over multiple passes, so that context information of different scales is fully fused and the accuracy of semantic segmentation is improved. In addition, because depthwise separable dilated convolution is used, the amount of computation in the semantic segmentation process is reduced.
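The computation saving from depthwise separable convolution can be estimated by counting weights. The sketch below uses the standard textbook counts and ignores bias terms; the function names and the 3×3, 64-channel example are assumptions, not values from the source.

```python
def standard_conv_weights(kernel_size, in_channels, out_channels):
    """Weights in a standard k x k convolution."""
    return kernel_size * kernel_size * in_channels * out_channels


def depthwise_separable_weights(kernel_size, in_channels, out_channels):
    """Weights in a k x k depthwise convolution followed by a 1 x 1 pointwise convolution."""
    depthwise = kernel_size * kernel_size * in_channels
    pointwise = in_channels * out_channels
    return depthwise + pointwise


print(standard_conv_weights(3, 64, 64))        # 36864
print(depthwise_separable_weights(3, 64, 64))  # 4672
```

Dilation spaces the kernel taps further apart without adding weights, so these counts are the same for every dilation rate.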

In another possible implementation, the network architecture of the back-end subnetwork may be as shown in FIG. 8B.

Through subnetwork 1, the first feature image is first split into a plurality of channels and subjected synchronously to dimension reduction processing to obtain a plurality of third feature images. Context features of different ranges are then extracted from at least two of the plurality of third feature images, for example by performing depthwise separable dilated convolution operations with mutually different dilation rates, to obtain a plurality of second feature images.

To improve the effect of semantic segmentation, the plurality of second feature images may be overlaid and subjected to sub-pixel convolution and upsampling to obtain the target image. Alternatively, the plurality of second feature images and at least one third feature image on which context feature extraction has not been performed may be overlaid and subjected to sub-pixel convolution and upsampling (the upsampling process is not shown in FIG. 8B) to obtain the target image.

The target image is used directly as the new first feature image. Through subnetwork 2, the new first feature image is first split into a plurality of channels and subjected synchronously to dimension reduction processing, again yielding a plurality of third feature images. Context features of different ranges are then extracted from at least two of the plurality of third feature images, for example by performing depthwise separable dilated convolution operations with mutually different dilation rates, to obtain a plurality of second feature images. The plurality of second feature images are then overlaid and subjected to sub-pixel convolution and upsampling to obtain the target image; alternatively, the plurality of second feature images and at least one third feature image on which context feature extraction has not been performed may be overlaid and subjected to sub-pixel convolution and upsampling to obtain the target image.

In the above embodiment, context features of a plurality of different ranges can be synchronously extracted from the first feature image and fused over multiple passes, so that context information of different scales is fully fused and the accuracy of semantic segmentation is improved. In addition, because depthwise separable dilated convolution is used, the amount of computation in the semantic segmentation process is reduced. Furthermore, sub-pixel convolution can improve the effect of semantic segmentation.
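The rearrangement step at the heart of sub-pixel convolution (often called pixel shuffle) can be sketched in pure Python. The function name `pixel_shuffle`, the nested-list layout `[channel][row][column]`, and the channel ordering are assumptions for illustration; the source does not specify the upscaling factor or the ordering.

```python
def pixel_shuffle(feature, r):
    """Rearrange a [C*r*r][H][W] nested list into [C][H*r][W*r]."""
    channels = len(feature)
    if channels % (r * r) != 0:
        raise ValueError("channel count must be divisible by r * r")
    h, w = len(feature[0]), len(feature[0][0])
    c_out = channels // (r * r)
    out = [[[0] * (w * r) for _ in range(h * r)] for _ in range(c_out)]
    for c in range(c_out):
        for y in range(h * r):
            for x in range(w * r):
                # Each group of r*r input channels fills one r x r output block.
                src = c * r * r + (y % r) * r + (x % r)
                out[c][y][x] = feature[src][y // r][x // r]
    return out


# Four 1x1 channels become one 2x2 channel.
print(pixel_shuffle([[[1]], [[2]], [[3]], [[4]]], 2))  # [[[1, 2], [3, 4]]]
```

Because resolution is gained by reshuffling learned channels rather than by fixed interpolation, the preceding convolution can learn how to fill in the finer grid.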

In another possible implementation, the network architecture of the back-end subnetwork may be as shown in FIG. 8C.

Through subnetwork 1, the first feature image is first split into a plurality of channels and subjected synchronously to dimension reduction processing to obtain a plurality of third feature images. Context features of different ranges are then extracted from at least two of the plurality of third feature images, for example by performing depthwise separable dilated convolution operations with mutually different dilation rates, to obtain a plurality of second feature images.

Further, the plurality of second feature images and at least one third feature image on which context feature extraction has not been performed are overlaid, the result is then overlaid with a fifth feature image, and the overlaid image is upsampled (the upsampling process is not shown in FIG. 8C) to obtain the target image. Here, the number of feature extraction layers corresponding to the fifth feature image is smaller than the number of feature extraction layers corresponding to the first feature image.

The target image is used directly as the new first feature image. Through subnetwork 2, the new first feature image is first split into a plurality of channels and subjected synchronously to dimension reduction processing, again yielding a plurality of third feature images. Context features of different ranges are then extracted from at least two of the plurality of third feature images, for example by performing depthwise separable dilated convolution operations with mutually different dilation rates, to obtain a plurality of second feature images. The plurality of second feature images and at least one third feature image on which context feature extraction has not been performed are then overlaid, and the overlaid image is upsampled to obtain the target image.

In another possible implementation, the network architecture of the back-end subnetwork may be as shown in FIG. 8D.

Further, the plurality of second feature images and at least one third feature image on which context feature extraction has not been performed are overlaid and subjected to sub-pixel convolution and upsampling (the upsampling process is not shown in FIG. 8D), and the result is then overlaid with a fifth feature image to obtain the target image. Here, the number of feature extraction layers corresponding to the fifth feature image is smaller than the number of feature extraction layers corresponding to the first feature image.

The target image is used directly as the new first feature image. Through subnetwork 2, the new first feature image is first split into a plurality of channels and subjected synchronously to dimension reduction processing, again yielding a plurality of third feature images. Context features of different ranges are then extracted from at least two of the plurality of third feature images, for example by performing depthwise separable dilated convolution operations with mutually different dilation rates, to obtain a plurality of second feature images. The plurality of second feature images and at least one third feature image on which context feature extraction has not been performed are then overlaid, and the overlaid image is subjected to sub-pixel convolution and upsampling to obtain the target image.

In addition, to ensure that the dimension of the target image is the target dimension, the plurality of second feature images and at least one third feature image on which context feature extraction has not been performed may be overlaid, dimension reduction and dimension expansion processing may be performed on the overlaid image, and sub-pixel convolution and upsampling may then be performed to obtain the target image.

In the above embodiment, context features of a plurality of different ranges can be synchronously extracted from the first feature image and fused over multiple passes, so that context information of different scales is fully fused and the accuracy of semantic segmentation is improved. Because depthwise separable dilated convolution is used, the amount of computation in the semantic segmentation process is reduced. In addition, using the fifth feature image to determine the target image ensures that important information in the image to be processed is not lost, which likewise improves the accuracy of semantic segmentation.

In some optional embodiments, as shown for example in FIG. 9, after step 104 is completed, the image semantic segmentation method may further include step 106: performing machine device navigation according to the semantic image.

In the embodiments of the present invention, the machine device may be navigated according to the generated semantic image. For example, if the semantic image contains an obstacle, obstacle-avoiding navigation may be performed; if the semantic image contains a fork in the road, whether to go straight or turn may be determined according to the designated route.
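A toy sketch of such a decision is given below, treating the semantic image as a grid of integer labels. Everything specific here is a hypothetical assumption for illustration: the label ids, the rule that the bottom rows are nearest the device, and the names `obstacle_ahead` and `choose_action`. The source does not describe how navigation decisions are computed.

```python
ROAD, OBSTACLE = 0, 1  # hypothetical label ids, not taken from the source


def obstacle_ahead(semantic_image, rows_ahead=2, obstacle_label=OBSTACLE):
    """Return True if any of the bottom `rows_ahead` rows contains an obstacle pixel."""
    # Assumption: the bottom rows of the image are the area nearest the device.
    nearest_rows = semantic_image[-rows_ahead:]
    return any(obstacle_label in row for row in nearest_rows)


def choose_action(semantic_image):
    """Pick a coarse action from the per-pixel labels."""
    return "avoid" if obstacle_ahead(semantic_image) else "continue"


blocked = [
    [ROAD, ROAD, ROAD],
    [ROAD, OBSTACLE, ROAD],
    [ROAD, ROAD, ROAD],
]
print(choose_action(blocked))  # avoid
```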

In the above embodiment, machine device navigation can be performed according to the generated semantic image corresponding to the image to be processed, which offers high usability.

Corresponding to the foregoing method embodiments, the present invention further provides apparatus embodiments.

FIG. 10 is a block diagram of an image semantic segmentation apparatus according to an exemplary embodiment of the present invention. As shown in FIG. 10, the image semantic segmentation apparatus includes: a feature extraction module 210 configured to perform feature extraction on an obtained image to be processed to obtain a first feature image; a context feature extraction module 220 configured to synchronously extract a plurality of context features having different ranges from the first feature image to obtain a plurality of second feature images; a determination module 230 configured to determine a target image according to at least the plurality of second feature images, and to synchronously extract a plurality of context features having different ranges again by using the target image as the new first feature image; and a semantic image generation module 240 configured to generate, in response to the number of times that the plurality of context features having different ranges have been synchronously extracted from the first feature image reaching a target number of times, a semantic image corresponding to the image to be processed based on the last-obtained target image.

In some optional embodiments, the context feature extraction module 220 includes: a first processing submodule configured to split the first feature image into a plurality of channels and synchronously perform dimension reduction processing to obtain a plurality of third feature images; and a second processing submodule configured to extract context features having different ranges from at least two of the plurality of third feature images to obtain a plurality of second feature images.

In some optional embodiments, the first determination submodule is configured to: overlay the plurality of second feature images to obtain the fourth feature image; or overlay the plurality of second feature images and at least one third feature image among the plurality of third feature images to obtain the fourth feature image.

In some optional embodiments, the second determination submodule is configured to: perform upsampling on the fourth feature image to obtain the target image; or perform sub-pixel convolution on the fourth feature image to obtain the target image.

In some optional embodiments, the dimension of the last-obtained target image is the target dimension, and the target dimension is determined according to the preset total number of object categories included in the semantic image.

Since the apparatus embodiments substantially correspond to the method embodiments, reference may be made to the description of the method embodiments for relevant details. The apparatus embodiments described above are merely illustrative: units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present invention. A person of ordinary skill in the art can understand and implement this without creative effort.

An embodiment of the present invention further provides a computer-readable storage medium storing a computer program, the computer program being used to execute the image semantic segmentation method according to any one of the above.

An embodiment of the present invention further provides a computer program that causes a computer to execute the image semantic segmentation method according to any one of the above.

In some optional embodiments, an embodiment of the present invention provides a computer program product including computer-readable code; when the computer-readable code runs on a device, a processor in the device executes instructions for implementing the image semantic segmentation method provided by any of the above embodiments.

In some optional embodiments, an embodiment of the present invention further provides another computer program product for storing computer-readable instructions; when executed, the instructions cause a computer to perform the operations of the image semantic segmentation method provided by any of the above embodiments.

The computer program product may be implemented by hardware, software, or a combination thereof. In one optional embodiment, the computer program product is embodied as a computer storage medium; in another optional embodiment, the computer program product is embodied as a software product, such as a Software Development Kit (SDK).

An embodiment of the present invention further provides an image semantic segmentation apparatus including: a processor; and a memory for storing instructions executable by the processor, wherein the processor is configured to call the executable instructions stored in the memory to implement the image semantic segmentation method according to any one of the above.

FIG. 11 is a schematic diagram of the hardware structure of an image semantic segmentation apparatus provided by an embodiment of the present application. The image semantic segmentation apparatus 310 includes a processor 311 and may further include an input device 312, an output device 313, and a memory 314. The input device 312, the output device 313, the memory 314, and the processor 311 are interconnected via a bus.

The memory includes, but is not limited to, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM), or compact disc read-only memory (CD-ROM), and is used for related instructions and data.

The input device is used to input data and/or signals, and the output device is used to output data and/or signals. The output device and the input device may be separate components or a single integrated component.

The processor may include one or more processors, for example one or more central processing units (CPUs). When the processor is a single CPU, the CPU may be a single-core CPU or a multi-core CPU.

The memory is used to store the program code and data of the network device.

The processor is used to call the program code and data in the memory to execute the steps in the above image semantic segmentation method embodiments. For details, reference may be made to the description in the method embodiments, which is not repeated here.

It can be understood that FIG. 11 shows only a simplified design of one image semantic segmentation apparatus. In practical applications, the image semantic segmentation apparatus may further include other necessary components, including but not limited to any number of input/output devices, processors, controllers, and memories, and all image semantic segmentation apparatuses capable of implementing the embodiments of the present application fall within the protection scope of the embodiments of the present invention.

In some embodiments, the functions or modules of the apparatus provided by the embodiments of the present invention may be used to execute the methods described in the foregoing method embodiments. For their specific implementation, reference may be made to the description of the foregoing method embodiments, which is not repeated here for brevity.

After considering the specification and practicing the invention disclosed herein, those skilled in the art will readily conceive of other implementations of the embodiments of the present invention. The embodiments of the present invention are intended to cover any variations, uses, or adaptations of the present invention that follow its general principles and include common knowledge or conventional technical means in the art not disclosed herein. The specification and embodiments are to be regarded as illustrative only, and the true scope and spirit of the embodiments of the present invention are indicated by the claims below.

The foregoing describes only optional embodiments of the present invention and is not intended to limit them. Any modification, equivalent replacement, or improvement made within the spirit and principles of the embodiments of the present invention shall fall within their protection scope.

Claims

1. An image semantic segmentation method, comprising:
performing feature extraction on an obtained image to be processed to obtain a first feature image;
synchronously extracting a plurality of context features having different ranges from the first feature image to obtain a plurality of second feature images;
determining a target image according to at least the plurality of second feature images, and using the target image as the new first feature image to again synchronously extract a plurality of context features having different ranges; and
in response to the number of times that the plurality of context features having different ranges have been synchronously extracted from the first feature image reaching a target number, generating a semantic image corresponding to the image to be processed based on the most recently obtained target image.
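
As an informal illustration of the control flow recited in claim 1, the loop can be sketched as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: the function name `iterative_segmentation`, the toy `Feature` type, and all callables passed in are hypothetical placeholders, and the branches run one after another here, whereas the claim describes synchronous extraction.

```python
from typing import Callable, List, Sequence

# Hypothetical stand-in for a feature image; a real system would use tensors.
Feature = List[float]


def iterative_segmentation(
    image: Feature,
    extract_features: Callable[[Feature], Feature],
    context_branches: Sequence[Callable[[Feature], Feature]],
    determine_target: Callable[[List[Feature]], Feature],
    to_semantic: Callable[[Feature], Feature],
    target_count: int,
) -> Feature:
    """Control-flow sketch only; every callable is an illustrative placeholder."""
    first = extract_features(image)  # first feature image
    target = first
    for _ in range(target_count):
        # Each branch extracts context of a different range from the same input.
        # (Run sequentially here; the claim describes synchronous extraction.)
        seconds = [branch(first) for branch in context_branches]
        target = determine_target(seconds)  # target image for this round
        first = target  # the target image becomes the new first feature image
    # The semantic image is generated from the last target image.
    return to_semantic(target)
```

The loop ends once the extraction count reaches `target_count`, which corresponds to the "target number" in the claim.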

2. The method of claim 1, wherein synchronously extracting the plurality of context features having different ranges from the first feature image to obtain the plurality of second feature images comprises:
dividing the first feature image into a plurality of channels on which dimension reduction processing is performed synchronously, to obtain a plurality of third feature images; and
extracting context features having different ranges from at least two third feature images among the plurality of third feature images to obtain the plurality of second feature images.

3. The method of claim 2, wherein extracting the context features having different ranges from the at least two third feature images among the plurality of third feature images to obtain the plurality of second feature images comprises:
extracting the context features having different ranges from the at least two third feature images among the plurality of third feature images by means of depthwise separable convolution and dilated convolution whose convolution kernels correspond to different dilation coefficients, to obtain the plurality of second feature images.
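
To make concrete how different dilation coefficients yield context of different ranges, the following pure-Python sketch applies a per-channel (depthwise) dilated convolution followed by a 1x1 (pointwise) mixing step. It is only an illustration under assumptions: the function names, the zero padding, and the square odd-sized kernels are choices made here, not details taken from the claims.

```python
from typing import List

Grid = List[List[float]]  # one channel as a 2-D list (hypothetical toy type)


def dilated_conv2d(channel: Grid, kernel: Grid, dilation: int) -> Grid:
    """Filter one channel with one k x k kernel; zero padding keeps the size."""
    h, w, k = len(channel), len(channel[0]), len(kernel)
    half = k // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for i in range(k):
                for j in range(k):
                    # Kernel taps are spaced `dilation` pixels apart.
                    yy = y + (i - half) * dilation
                    xx = x + (j - half) * dilation
                    if 0 <= yy < h and 0 <= xx < w:
                        acc += kernel[i][j] * channel[yy][xx]
            out[y][x] = acc
    return out


def depthwise_dilated(features: List[Grid], kernels: List[Grid], dilation: int) -> List[Grid]:
    """Depthwise step: each channel is filtered by its own kernel."""
    return [dilated_conv2d(c, k, dilation) for c, k in zip(features, kernels)]


def pointwise(features: List[Grid], weights: List[List[float]]) -> List[Grid]:
    """1x1 step: each output channel is a weighted sum of the input channels."""
    h, w = len(features[0]), len(features[0][0])
    return [
        [[sum(wt * ch[y][x] for wt, ch in zip(row, features)) for x in range(w)]
         for y in range(h)]
        for row in weights
    ]


def receptive_field(kernel_size: int, dilation: int) -> int:
    """Side length of the area covered by one dilated kernel."""
    return dilation * (kernel_size - 1) + 1
```

With a 3x3 kernel, dilation coefficients 1, 2, and 4 cover windows of side 3, 5, and 9 respectively without adding weights, which is one way branches can capture context of different ranges.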

4. The method according to any one of claims 1 to 3, wherein determining the target image according to at least the plurality of second feature images comprises:
fusing at least the plurality of second feature images to obtain a fourth feature image; and
determining the target image according to at least the fourth feature image.

5. The method of claim 4, wherein fusing at least the plurality of second feature images to obtain the fourth feature image comprises:
overlaying the plurality of second feature images to obtain the fourth feature image; or
overlaying the plurality of second feature images and at least one third feature image among the plurality of third feature images to obtain the fourth feature image.

6. The method according to claim 4 or 5, wherein determining the target image according to at least the fourth feature image comprises:
performing up-sampling on the fourth feature image to obtain the target image; or
performing sub-pixel convolution on the fourth feature image to obtain the target image.
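
Sub-pixel convolution is commonly realized as an ordinary convolution that produces r*r times as many channels, followed by a rearrangement of those channels into an r-times larger spatial grid. The sketch below shows only that rearrangement step in pure Python. It is an assumption-laden illustration: the function name `pixel_shuffle` and the channel ordering are choices made here (matching a widely used convention), and the claims do not specify either.

```python
from typing import List

Grid = List[List[float]]  # one channel as a 2-D list (hypothetical toy type)


def pixel_shuffle(features: List[Grid], r: int) -> List[Grid]:
    """Rearrange C*r*r channels of size H x W into C channels of size rH x rW."""
    c_in, h, w = len(features), len(features[0]), len(features[0][0])
    if c_in % (r * r) != 0:
        raise ValueError("channel count must be divisible by r*r")
    c_out = c_in // (r * r)
    out = [[[0.0] * (w * r) for _ in range(h * r)] for _ in range(c_out)]
    for c in range(c_out):
        for y in range(h):
            for x in range(w):
                for i in range(r):
                    for j in range(r):
                        # Input channel (c, i, j) fills offset (i, j) of each r x r block.
                        out[c][y * r + i][x * r + j] = features[c * r * r + i * r + j][y][x]
    return out
```

Unlike interpolation-based up-sampling, the values placed in the enlarged grid come from learned convolution outputs, so no information is invented by a fixed interpolation kernel.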

7. The method according to claim 4 or 5, further comprising:
performing feature extraction and dimension reduction processing on the image to be processed to obtain a fifth feature image, wherein the number of feature extraction layers corresponding to the fifth feature image is smaller than the number of feature extraction layers corresponding to the first feature image;
wherein determining the target image according to at least the fourth feature image comprises:
when the number of times is smaller than the target number, overlaying the fourth feature image and the fifth feature image and then performing up-sampling to obtain the target image; or
when the number of times is smaller than the target number, overlaying the fifth feature image and an image obtained by performing sub-pixel convolution on the fourth feature image to obtain the target image.

8. The method according to any one of claims 1 to 7, wherein the dimension corresponding to the most recently obtained target image is a target dimension, and the target dimension is determined according to a preset total number of object categories included in the semantic image.
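
One way to read the link between the target dimension and the number of object categories is that the last target image carries one channel per category, from which a label map can be derived. The sketch below assumes a per-pixel argmax over channels; that decoding rule and the function name `to_label_map` are assumptions made for illustration and are not recited in the claims.

```python
from typing import List

Grid = List[List[float]]  # one channel as a 2-D list (hypothetical toy type)


def to_label_map(scores: List[Grid]) -> List[List[int]]:
    """Pick, for every pixel, the index of the channel with the highest score."""
    num_classes = len(scores)  # assumed equal to the total number of categories
    h, w = len(scores[0]), len(scores[0][0])
    return [
        [max(range(num_classes), key=lambda c: scores[c][y][x]) for x in range(w)]
        for y in range(h)
    ]
```

For example, with three categories the last target image would have three channels, and each output pixel would hold 0, 1, or 2.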

9. The method according to any one of claims 1 to 8, further comprising, after generating the semantic image corresponding to the image to be processed:
performing machine device navigation according to the semantic image.

10. An image semantic segmentation apparatus, comprising:
a feature extraction module configured to perform feature extraction on an obtained image to be processed to obtain a first feature image;
a context feature extraction module configured to synchronously extract a plurality of context features having different ranges from the first feature image to obtain a plurality of second feature images;
a determining module configured to determine a target image according to at least the plurality of second feature images, and to use the target image as the new first feature image to again synchronously extract a plurality of context features having different ranges; and
a semantic image generation module configured to, in response to the number of times that the plurality of context features having different ranges have been synchronously extracted from the first feature image reaching a target number, generate a semantic image corresponding to the image to be processed based on the most recently obtained target image.

11. The apparatus of claim 10, wherein the context feature extraction module comprises:
a first processing sub-module configured to divide the first feature image into a plurality of channels on which dimension reduction processing is performed synchronously, to obtain a plurality of third feature images; and
a second processing sub-module configured to extract context features having different ranges from at least two third feature images among the plurality of third feature images to obtain the plurality of second feature images.

12. The apparatus of claim 11, wherein the second processing sub-module is configured to extract the context features having different ranges from the at least two third feature images among the plurality of third feature images by means of depthwise separable convolution and dilated convolution whose convolution kernels correspond to different dilation coefficients, to obtain the plurality of second feature images.

13. The apparatus according to any one of claims 10 to 12, wherein the determining module comprises:
a first determining sub-module configured to fuse at least the plurality of second feature images to obtain a fourth feature image; and
a second determining sub-module configured to determine the target image according to at least the fourth feature image.

14. The apparatus of claim 13, wherein the first determining sub-module is configured to: overlay the plurality of second feature images to obtain the fourth feature image; or overlay the plurality of second feature images and at least one third feature image among the plurality of third feature images to obtain the fourth feature image.

15. The apparatus of claim 13 or 14, wherein the second determining sub-module is configured to: perform up-sampling on the fourth feature image to obtain the target image; or perform sub-pixel convolution on the fourth feature image to obtain the target image.

16. The apparatus of claim 13 or 14, further comprising:
a processing module configured to perform feature extraction and dimension reduction processing on the image to be processed to obtain a fifth feature image, wherein the number of feature extraction layers corresponding to the fifth feature image is smaller than the number of feature extraction layers corresponding to the first feature image;
wherein the second determining sub-module is configured to: when the number of times is smaller than the target number, overlay the fourth feature image and the fifth feature image and then perform up-sampling to obtain the target image; or, when the number of times is smaller than the target number, overlay the fifth feature image and an image obtained by performing sub-pixel convolution on the fourth feature image to obtain the target image.

17. The apparatus according to any one of claims 10 to 16, wherein the dimension corresponding to the most recently obtained target image is a target dimension, and the target dimension is determined according to a preset total number of object categories included in the semantic image.

18. The apparatus according to any one of claims 10 to 17, further comprising a navigation module configured to perform machine device navigation according to the semantic image.

19. A computer-readable storage medium storing a computer program, wherein the computer program is used to execute the image semantic segmentation method according to any one of claims 1 to 9.

20. An image semantic segmentation apparatus, comprising:
a processor; and
a memory for storing instructions executable by the processor;
wherein the processor is configured to implement the image semantic segmentation method according to any one of claims 1 to 9 by calling the executable instructions stored in the memory.

21. A computer program that causes a computer to execute the image semantic segmentation method according to any one of claims 1 to 9.