KR20200131425A - Improvement apparatus and method of image processing speed and accuracy by multiple semantic segmentation models

Info

Publication number: KR20200131425A
Application number: KR1020190056024A
Authority: KR
Inventors: 정한별; 김경훈
Original assignee: 주식회사 아이에스피디
Priority date: 2019-05-14
Filing date: 2019-05-14
Publication date: 2020-11-24
Also published as: KR102217547B1

Abstract

Disclosed are an apparatus and a method for improving image processing speed and accuracy by combining multiple artificial intelligence semantic segmentation models, which resolve the trade-off between speed and accuracy in the semantic segmentation process. The proposed apparatus includes: a motion detection unit that detects the motion of an object in an input image; a model selection unit that selects, in real time, one semantic segmentation model from among a plurality of semantic segmentation models according to whether the object is moving; and an object region segmentation unit that recognizes and segments the region of the object based on the semantic segmentation model selected in real time.

Description

[Improvement apparatus and method of image processing speed and accuracy by multiple semantic segmentation models]

The present invention relates to an apparatus and method for improving image processing speed and accuracy by combining multiple artificial intelligence semantic segmentation models, and more particularly to such an apparatus and method that use semantic segmentation, an artificial intelligence image recognition and segmentation algorithm.

Semantic segmentation, which is used for artificial intelligence image recognition and segmentation, is a deep learning algorithm that associates a label or category with every pixel of an image (hereinafter also called an "artificial intelligence image recognition and segmentation algorithm").

Semantic segmentation classifies and segments the people, objects, background, and other content of an input image.

A specific subject segmented by semantic segmentation, or a region other than that subject (e.g., the background), can then be composited or given special effects.

However, when a currently commercialized semantic segmentation model is applied to a real-time (preview) image, a trade-off arises in which the inference time and the accuracy of the segmented result cannot both be satisfied. Here, inference time means the time (speed; FPS) required to analyze an input image and derive the output data (i.e., the result). That is, the shorter the inference time, the lower the accuracy (mIoU, %).
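The accuracy figure above is given as mIoU (mean intersection over union). As a rough illustration of how such a score is computed, the following minimal sketch compares a predicted label map with a ground-truth label map. The function name `mean_iou`, the flattened-list representation, and the sample label maps are assumptions made for this example and do not appear in the patent.

```python
def mean_iou(pred, truth, num_classes):
    """Mean intersection-over-union across the classes present in pred or truth.

    pred and truth are flattened label maps (one integer class label per pixel).
    """
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(pred, truth) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, truth) if p == c or t == c)
        if union:  # skip classes that occur in neither map
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


# Hypothetical flattened 2x3 label maps: 0 = background, 1 = person.
truth = [0, 0, 1, 1, 1, 0]
pred = [0, 1, 1, 1, 0, 0]
score = mean_iou(pred, truth, num_classes=2)
```

A model that trades accuracy for speed would typically score lower on a measure of this kind than a slower, more precise model evaluated on the same frames.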

In particular, although various semantic segmentation models exist, conventional applications of semantic segmentation in each field use only one of those models.

To apply a single semantic segmentation model to the camera of a mobile device (such as a smartphone) and shoot in real time (preview), an inference speed of roughly 30 FPS is required. To reach a high FPS, the size of the input image is generally reduced, which lowers accuracy.

Prior Art 1: Korean Patent Registration No. 10-0968244 (System and method for recognizing a specific subject in an image)
Prior Art 2: Korean Patent Registration No. 10-1507662 (Semantic parsing of objects in video)

The present invention has been proposed to solve the conventional problem described above. Its object is to provide an apparatus and method for improving image processing speed and accuracy by combining multiple artificial intelligence semantic segmentation models, which resolve the trade-off between speed and accuracy in the semantic segmentation process.

To achieve this object, an apparatus for improving image processing speed and accuracy by combining multiple artificial intelligence semantic segmentation models according to a preferred embodiment of the present invention includes: a motion detection unit that detects the motion of an object in an input image; a model selection unit that selects, in real time, one semantic segmentation model from among a plurality of semantic segmentation models according to whether the object is moving; and an object region segmentation unit that recognizes and segments the region of the object based on the semantic segmentation model selected in real time.

The plurality of semantic segmentation models may include a first semantic segmentation model that satisfies a first speed and a first accuracy, and a second semantic segmentation model that satisfies a second speed and a second accuracy, where the first speed is higher than the second speed and the first accuracy is lower than the second accuracy.

The first speed may be an artificial intelligence analysis speed of 20 frames per second or more, and the first accuracy may be an output accuracy of less than 70%. The second speed may be an artificial intelligence analysis speed of 5 frames per second or less, and the second accuracy may be an output accuracy of 70% or more.

The model selection unit may select the first semantic segmentation model in real time when the object is moving, and may select the second semantic segmentation model in real time when the object is not moving.

The apparatus may further include a multi-model merging unit that applies the plurality of semantic segmentation models to an initial input image and generates an object region for each model in advance, so that artificial intelligence image recognition and segmentation according to whether the object is moving can be performed more quickly.

Meanwhile, a method for improving image processing speed and accuracy by combining multiple artificial intelligence semantic segmentation models according to a preferred embodiment of the present invention includes: detecting, by a motion detection unit, the motion of an object in an input image; selecting, by a model selection unit, one semantic segmentation model in real time from among a plurality of semantic segmentation models according to whether the object is moving; and recognizing and segmenting, by an object region segmentation unit, the region of the object based on the semantic segmentation model selected in real time.

According to the present invention configured in this way, whenever the subject moves or stops, the semantic segmentation model suited to that situation is selected and used. This resolves the trade-off between speed and accuracy in the semantic segmentation process (i.e., the process of using an artificial intelligence image recognition and segmentation algorithm).

FIG. 1 is a block diagram of an apparatus for improving image processing speed and accuracy by combining multiple artificial intelligence semantic segmentation models according to an embodiment of the present invention.
FIG. 2 is an exemplary view illustrating multi-model merging in the multi-model merging unit shown in FIG. 1.
FIG. 3 is a flowchart illustrating a method for improving image processing speed and accuracy by combining multiple artificial intelligence semantic segmentation models according to an embodiment of the present invention.
FIG. 4 is an exemplary view used to explain the flowchart of FIG. 3.

The present invention may be modified in various ways and may have various embodiments. Specific embodiments are illustrated in the drawings and described in detail below.

However, this is not intended to limit the present invention to a specific embodiment, and the invention should be understood to include all modifications, equivalents, and substitutes that fall within its spirit and technical scope.

The terms used in this application are used only to describe specific embodiments and are not intended to limit the present invention. A singular expression includes the plural unless the context clearly indicates otherwise. In this application, terms such as "include" or "have" specify the presence of the features, numbers, steps, operations, components, parts, or combinations thereof described in the specification, and do not preclude the presence or addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

Unless otherwise defined, all terms used herein, including technical and scientific terms, have the same meaning as commonly understood by a person of ordinary skill in the art to which the present invention belongs. Terms defined in commonly used dictionaries should be interpreted as having meanings consistent with their meanings in the context of the related art, and should not be interpreted in an idealized or excessively formal sense unless explicitly so defined in this application.

Hereinafter, preferred embodiments of the present invention are described in more detail with reference to the accompanying drawings. To aid overall understanding, the same reference numerals are used for the same components in the drawings, and duplicate descriptions of the same components are omitted.

FIG. 1 is a block diagram of an apparatus for improving image processing speed and accuracy by combining multiple artificial intelligence semantic segmentation models according to an embodiment of the present invention, and FIG. 2 is an exemplary view illustrating multi-model merging in the multi-model merging unit shown in FIG. 1.

The apparatus according to an embodiment of the present invention includes a multi-model merging unit 10, a motion detection unit 12, a model selection unit 14, and an object region segmentation unit 16.

The multi-model merging unit 10 may apply a plurality of semantic segmentation models (i.e., artificial intelligence image recognition and segmentation algorithms) to an input image (e.g., a preview image) and generate (estimate) an object region for each model.

In this embodiment, the multi-model merging unit 10 is assumed to use a first semantic segmentation model that provides high speed and low accuracy, and a second semantic segmentation model that provides low speed and high accuracy.

Taking FIG. 2 as an example, reference numeral 20 denotes the object region estimated by the first semantic segmentation model, and reference numeral 22 denotes the object region estimated by the second semantic segmentation model.

Assuming that a mobile camera can generate a preview image at 30 frames per second, the first semantic segmentation model may be a model that can analyze 20 or more frames per second (this rate may be called the artificial intelligence analysis speed) and whose output accuracy is less than 70%. The second semantic segmentation model may be a model that can analyze 5 or fewer frames per second and whose output accuracy is 70% or more. Here, the accuracy of the output is a value that indicates how closely the edge line of the estimated object region matches the actual edge line of the object.

For example, TensorFlow, released as open source by Google, may be used as the first semantic segmentation model, and Caffe2, released as open source by Facebook, may be used as the second semantic segmentation model.

The multi-model merging unit 10 described above can be regarded as performing multi-model merging in advance, so that the artificial intelligence image recognition and segmentation algorithm chosen according to whether the object (subject) is moving (i.e., the first or the second semantic segmentation model) can be executed more quickly. In other words, the multi-model merging unit 10 performs preprocessing: it generates the object region for each model in advance from the initial input image (which is likely to be stationary) and keeps the regions ready.

The multi-model merging unit 10 may of course be omitted if it is not needed.
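The preprocessing role of the multi-model merging unit can be sketched as running every available model once on the initial frame and keeping the results ready. This is only an illustrative sketch: the function name `premerge`, the dictionary of callables, and the stand-in models are assumptions, and the patent does not specify how the regions are stored.

```python
def premerge(initial_frame, models):
    """Run each model once on the initial frame and return {model_name: region}.

    models maps a name to a callable that takes a frame and returns an
    estimated object region (any representation the caller chooses).
    """
    return {name: model(initial_frame) for name, model in models.items()}


# Stand-in "models" for illustration only; real ones would run inference.
def fast_stub(frame):
    return ("coarse region", frame)


def accurate_stub(frame):
    return ("fine region", frame)


regions = premerge("frame-0", {"first": fast_stub, "second": accurate_stub})
```

With the regions for both models available up front, whichever model is selected for the following frames already has an initial result to display.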

The motion detection unit 12 detects whether an object (a person or a thing) in the input image is moving and how much it is moving.

Specifically, the motion detection unit 12 may detect the presence and amount of motion of the object based on the difference image between the current frame and the previous frame of the input image.
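A frame-difference check of this kind can be sketched as follows. The patent does not give thresholds or a pixel format, so the grayscale list-of-rows frames, the function names `motion_amount` and `is_moving`, and both threshold values are assumptions chosen for illustration.

```python
def motion_amount(prev_frame, curr_frame, pixel_threshold=25):
    """Fraction of pixels whose grayscale value changed by more than pixel_threshold.

    Frames are equally sized lists of rows of integer intensities (0-255).
    """
    changed = 0
    total = 0
    for prev_row, curr_row in zip(prev_frame, curr_frame):
        for p, c in zip(prev_row, curr_row):
            total += 1
            if abs(c - p) > pixel_threshold:
                changed += 1
    return changed / total if total else 0.0


def is_moving(prev_frame, curr_frame, area_threshold=0.02):
    """Report motion when more than area_threshold of the pixels changed."""
    return motion_amount(prev_frame, curr_frame) > area_threshold


# Hypothetical 2x2 frames: one pixel brightens sharply between frames.
previous = [[10, 10], [10, 10]]
current = [[10, 200], [10, 10]]
amount = motion_amount(previous, current)
```

The returned fraction serves as the "amount of motion", and comparing it with a threshold gives the yes/no result used for model selection.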

Based on the information from the motion detection unit 12, the model selection unit 14 selects, in real time, one of the semantic segmentation models used in the multi-model merging unit 10.

For example, when the object is moving, the model selection unit 14 selects the high-speed, low-accuracy first semantic segmentation model in real time; when the object is not moving, it selects the low-speed, high-accuracy second semantic segmentation model in real time.

That is, the model selection unit 14 may select the first or the second semantic segmentation model in real time according to whether the object is moving.
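The selection rule itself is a two-way branch on the motion flag. In the minimal sketch below, the `SegModel` record, the constant names, and the specific speed and accuracy figures are hypothetical values chosen to fall within the ranges stated in the description (20 FPS or more with accuracy under 70%, and 5 FPS or less with accuracy of 70% or more).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SegModel:
    name: str
    fps: float       # frames analyzed per second
    accuracy: float  # output accuracy as a fraction (0.0-1.0)


# Hypothetical figures inside the ranges given in the description.
FIRST_MODEL = SegModel("first (high speed, low accuracy)", fps=20.0, accuracy=0.65)
SECOND_MODEL = SegModel("second (low speed, high accuracy)", fps=5.0, accuracy=0.80)


def select_model(moving: bool) -> SegModel:
    """Pick the fast model while the object moves, the accurate one when it is still."""
    return FIRST_MODEL if moving else SECOND_MODEL
```

Calling `select_model` once per frame with the latest motion flag gives the real-time switching behavior described above.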

The model selection unit 14 has the plurality of semantic segmentation models built in.

The object region segmentation unit 16 recognizes and segments the region of the object (which may also be called the subject) in the input image, based on the semantic segmentation model selected in real time by the model selection unit 14. In doing so, the object region segmentation unit 16 may segment a predetermined region of the input image that contains the object (a person or a thing).

Although the embodiment above describes the multi-model merging unit 10 as using two semantic segmentation models, the number of models may be increased as needed. If it is increased, the artificial intelligence analysis speed and accuracy can simply be divided into finer levels. The model selection unit 14 can then select, in real time, one model from among the two or more semantic segmentation models based on the information from the motion detection unit 12.
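One way to picture this finer division is a tier table keyed on the amount of motion. The patent does not define the tiers, so the function name `select_by_motion`, the tier boundaries, and the model labels below are all hypothetical.

```python
def select_by_motion(motion, tiers):
    """Return the model name of the first tier whose minimum motion is met.

    tiers is a list of (min_motion, model_name) pairs sorted by descending
    min_motion; the last tier acts as the fallback.
    """
    for min_motion, name in tiers:
        if motion >= min_motion:
            return name
    return tiers[-1][1]


# Hypothetical three-level split: more motion selects a faster model.
TIERS = [
    (0.10, "fast"),
    (0.02, "balanced"),
    (0.00, "accurate"),
]
```

For example, `select_by_motion(0.05, TIERS)` falls into the middle tier, while a fully still scene falls through to the most accurate model.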

FIG. 3 is a flowchart illustrating a method for improving image processing speed and accuracy by combining multiple artificial intelligence semantic segmentation models according to an embodiment of the present invention, and FIG. 4 is an exemplary view used to explain the flowchart of FIG. 3.

First, as a preprocessing step, the multi-model merging unit 10 performs multi-model merging in advance so that the artificial intelligence image recognition and segmentation algorithm chosen according to whether the object (subject) is moving (i.e., the first or the second semantic segmentation model) can be executed more quickly. That is, the multi-model merging unit 10 applies a plurality of semantic segmentation models to the input image (e.g., a preview image) and generates an object region for each model (S10). Here, the multi-model merging unit 10 is assumed to use the high-speed, low-accuracy first semantic segmentation model and the low-speed, high-accuracy second semantic segmentation model. Accordingly, as illustrated in the leftmost picture of FIG. 4, the multi-model merging unit 10 can show the object region 20 estimated by the first semantic segmentation model together with the object region 22 estimated by the second semantic segmentation model.

The multi-model merging step (S10), which the multi-model merging unit 10 performs as preprocessing, may of course be omitted if it is not needed.

Next, the motion detection unit 12 detects whether an object (subject; a person or a thing) in the input image is moving and how much, and sends the result to the model selection unit 14.

If the object is moving ("Yes" in S20), the model selection unit 14 selects the first semantic segmentation model in real time (S30). If the object is not moving ("No" in S20), the model selection unit 14 selects the second semantic segmentation model in real time (S40).

The object region segmentation unit 16 then segments the region of the object from the input image based on the semantic segmentation model selected in real time by the model selection unit 14 (S50). For example, while the object (subject; e.g., a person) is turning its head, the high-speed, low-accuracy first semantic segmentation model is selected, so the object region segmentation unit 16 recognizes and segments the object region 20 shown in the middle picture of FIG. 4 based on the first semantic segmentation model. Once the object has finished turning its head and has stopped, the low-speed, high-accuracy second semantic segmentation model is selected, so the object region segmentation unit 16 recognizes and segments the object region 22 shown in the rightmost picture of FIG. 4 based on the second semantic segmentation model.
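Steps S20 through S50 can be read as a per-frame loop. The following sketch wires them together with caller-supplied callables; the function name `segment_stream`, its parameters, and the choice to treat the very first frame as stationary are assumptions made for illustration, not details taken from the patent.

```python
def segment_stream(frames, detect_motion, first_model, second_model):
    """Yield (model_label, region) for each frame, following steps S20 to S50.

    detect_motion(prev, curr) returns True when the object is moving.
    first_model and second_model take a frame and return an object region.
    The first frame has no predecessor and is treated as stationary here.
    """
    prev = None
    for frame in frames:
        moving = prev is not None and detect_motion(prev, frame)  # S20
        if moving:
            label, model = "first", first_model                   # S30
        else:
            label, model = "second", second_model                 # S40
        yield label, model(frame)                                 # S50
        prev = frame


# Stand-in inputs for illustration: frames are plain integers, motion means
# "the value changed", and each stub model tags the frame it was given.
frames = [1, 1, 5, 5]
results = list(
    segment_stream(
        frames,
        detect_motion=lambda a, b: a != b,
        first_model=lambda f: ("coarse", f),
        second_model=lambda f: ("fine", f),
    )
)
labels = [label for label, _ in results]
```

In this toy run the fast model is used only for the frame on which the value changes, and the accurate model takes over again as soon as the input settles.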

Comparing the object region 20 produced by the first semantic segmentation model with the object region 22 produced by the second semantic segmentation model shows that they differ in how closely they follow the actual edge line of the object.

In other words, while a person is turning their head, as in the middle picture of FIG. 4, there is too little time to follow the actual edge line of the moving head accurately and render it in real time. The first semantic segmentation model is therefore used to generate the object region 20 for the object at nearly the same moment as the head is moving, even though the output is somewhat less accurate. In this way, an object region 20 that moves naturally with the object can be rendered quickly in real time.

When the person has finished turning their head and has stopped, as in the rightmost picture of FIG. 4, object recognition and segmentation can be performed more easily and accurately than while the head is moving. The second semantic segmentation model is therefore used to generate the object region 22, which is somewhat slower to produce but more accurate. In this way, an object region 22 that is more accurate than the object region 20 from the first semantic segmentation model can be rendered in real time.

Conventional mobile devices prioritized either speed or accuracy, and therefore used only the one semantic segmentation model that matched that priority. If a mobile device used only a model that satisfies the accuracy requirement, the model's processing speed was too slow to keep up with the object when the object in the input image moved. Conversely, if a mobile device used only a model that satisfies the speed requirement, the accuracy of the segmentation output was poor when the object in the input image was stationary.

In the embodiment of the present invention described above, however, the high-speed, low-accuracy first semantic segmentation model is selected in real time when the object is moving, and the low-speed, high-accuracy second semantic segmentation model is selected in real time when the object is not moving, and object region segmentation is performed with the selected model. This resolves the trade-off between speed and accuracy that arises in artificial intelligence image recognition and segmentation.

The method described above for improving image processing speed and accuracy by combining multiple artificial intelligence semantic segmentation models can also be implemented as computer-readable code on a computer-readable recording medium. Computer-readable recording media include all types of recording devices that store data readable by a computer system. Examples include ROM, RAM, CD-ROM, magnetic tape, floppy disks, and optical data storage devices. The computer-readable recording medium can also be distributed over computer systems connected through a network, so that the computer-readable code is stored and executed in a distributed manner. Functional programs, code, and code segments for implementing the method can be readily inferred by programmers in the technical field to which the present invention belongs.

As described above, the best-mode embodiments have been disclosed in the drawings and the specification. Although specific terms have been used, they are used only to describe the present invention, and not to limit its meaning or to restrict the scope of the invention set forth in the claims. A person of ordinary skill in the art will therefore understand that various modifications and other equivalent embodiments are possible. Accordingly, the true scope of technical protection of the present invention should be determined by the technical spirit of the appended claims.

10: multi-model merging unit
12: motion detection unit
14: model selection unit
16: object region segmentation unit

Claims

An apparatus for improving image processing speed and accuracy by combining multiple artificial intelligence semantic segmentation models, comprising:
a motion detection unit that detects motion of an object in an input image;
a model selection unit that selects, in real time, one semantic segmentation model from among a plurality of semantic segmentation models according to whether the object is moving; and
an object region segmentation unit that recognizes and segments a region of the object based on the semantic segmentation model selected in real time.

The apparatus according to claim 1,
wherein the plurality of semantic segmentation models comprises
a first semantic segmentation model that satisfies a first speed and a first accuracy, and a second semantic segmentation model that satisfies a second speed and a second accuracy,
wherein the first speed is higher than the second speed, and the first accuracy is lower than the second accuracy.

The apparatus according to claim 2,
wherein the first speed is an artificial intelligence analysis speed of 20 frames per second or more,
the first accuracy is an output result accuracy of less than 70%,
the second speed is an artificial intelligence analysis speed of 5 frames per second or less, and
the second accuracy is an output result accuracy of 70% or more.
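For illustration only, and not as part of the claims, the speed and accuracy bounds recited above can be written as a simple check. The Python sketch below assumes accuracy is expressed as a fraction between 0 and 1. The names `ModelProfile` and `classify_tier` are hypothetical and do not appear in the specification.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ModelProfile:
    """Measured characteristics of one segmentation model (hypothetical type)."""
    name: str        # illustrative label only
    fps: float       # analysis speed in frames per second
    accuracy: float  # accuracy of the output result, as a fraction in [0, 1]


def classify_tier(profile: ModelProfile) -> Optional[str]:
    """Return "first" or "second" per the recited bounds, else None."""
    # First model: 20 frames per second or more, accuracy below 70%.
    if profile.fps >= 20.0 and profile.accuracy < 0.70:
        return "first"
    # Second model: 5 frames per second or less, accuracy of 70% or more.
    if profile.fps <= 5.0 and profile.accuracy >= 0.70:
        return "second"
    # Any other combination matches neither set of bounds.
    return None
```

A model profiled at 25 frames per second and 65% accuracy would be classified as the first model under this check, and one profiled at 4 frames per second and 82% accuracy as the second.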

The apparatus according to claim 2,
wherein the model selection unit
selects the first semantic segmentation model in real time when the object is moving, and selects the second semantic segmentation model in real time when the object is not moving.
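The selection rule above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation disclosed in the specification. Motion is detected here by simple frame differencing, which is a stand-in technique because the claims do not say how motion is detected. Frames are assumed to be grayscale grids of integers, the first frame is treated as "not moving", and all names (`is_moving`, `select_model`, `segment_stream`) and threshold values are hypothetical.

```python
from typing import Callable, Iterable, List, Optional

Frame = List[List[int]]   # assumed: grayscale frame as rows of 0-255 values
Mask = List[List[int]]    # assumed: per-pixel label map
Segmenter = Callable[[Frame], Mask]


def is_moving(prev: Frame, curr: Frame,
              pixel_delta: int = 25, changed_ratio: float = 0.01) -> bool:
    """Frame differencing (assumed stand-in for the motion detection unit)."""
    total = 0
    changed = 0
    for row_prev, row_curr in zip(prev, curr):
        for a, b in zip(row_prev, row_curr):
            total += 1
            if abs(a - b) > pixel_delta:
                changed += 1
    # Motion is reported when enough pixels changed by more than pixel_delta.
    return total > 0 and changed / total > changed_ratio


def select_model(moving: bool, first: Segmenter, second: Segmenter) -> Segmenter:
    """Moving object: first (faster) model. Stationary: second (more accurate)."""
    return first if moving else second


def segment_stream(frames: Iterable[Frame],
                   first: Segmenter, second: Segmenter) -> List[Mask]:
    """Per frame: detect motion, select a model, then segment with it."""
    masks: List[Mask] = []
    prev: Optional[Frame] = None
    for frame in frames:
        # Assumption: with no previous frame, treat the object as not moving.
        moving = prev is not None and is_moving(prev, frame)
        masks.append(select_model(moving, first, second)(frame))
        prev = frame
    return masks
```

In this sketch the model is chosen anew for every frame, so the faster model is used only while differences between consecutive frames exceed the assumed thresholds.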

The apparatus according to claim 1,
further comprising a multi-model merging unit that generates an object region for each model in advance by applying the plurality of semantic segmentation models to an initial input image, so that artificial intelligence image recognition and segmentation according to whether the object is moving are performed faster.
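One possible reading of this pre-generation step is sketched below in Python: every model is run once on the initial input image and the resulting object regions are stored by model, so a region is already available whichever model is selected first. The claims do not specify how the stored regions are used afterwards, so the lookup helper is an assumption, as are the names `precompute_regions` and `initial_region` and the dictionary keys `"first"` and `"second"`.

```python
from typing import Callable, Dict, List

Frame = List[List[int]]   # assumed: grayscale frame as rows of 0-255 values
Mask = List[List[int]]    # assumed: per-pixel label map
Segmenter = Callable[[Frame], Mask]


def precompute_regions(initial_frame: Frame,
                       models: Dict[str, Segmenter]) -> Dict[str, Mask]:
    """Apply every model to the initial image and keep one region per model."""
    return {name: model(initial_frame) for name, model in models.items()}


def initial_region(regions: Dict[str, Mask], moving: bool) -> Mask:
    """Look up the stored region for the model the selection rule would pick.

    Assumption: keys "first" and "second" name the faster and the more
    accurate model respectively.
    """
    return regions["first" if moving else "second"]
```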

A method for improving image processing speed and accuracy by combining multiple artificial intelligence semantic segmentation models, the method comprising:
detecting, by a motion detection unit, motion of an object in an input image;
selecting, in real time, one semantic segmentation model from among a plurality of semantic segmentation models according to whether the object is moving; and
recognizing and segmenting, by an object region segmentation unit, a region of the object based on the semantic segmentation model selected in real time.

The method of claim 6,
wherein the plurality of semantic segmentation models comprises
a first semantic segmentation model that satisfies a first speed and a first accuracy, and a second semantic segmentation model that satisfies a second speed and a second accuracy,
wherein the first speed is higher than the second speed, and the first accuracy is lower than the second accuracy.

The method of claim 7,
wherein the first speed is an artificial intelligence analysis speed of 20 frames per second or more,
the first accuracy is an output result accuracy of less than 70%,
the second speed is an artificial intelligence analysis speed of 5 frames per second or less, and
the second accuracy is an output result accuracy of 70% or more.

The method of claim 7,
wherein the selecting step comprises
selecting the first semantic segmentation model in real time when the object is moving, and selecting the second semantic segmentation model in real time when the object is not moving.

The method of claim 6,
further comprising, prior to the detecting step,
generating, by a multi-model merging unit, an object region for each model in advance by applying the plurality of semantic segmentation models to an initial input image, so that artificial intelligence image recognition and segmentation according to whether the object is moving are performed faster.