KR102566614B1

KR102566614B1 - Apparatus, method and computer program for classifying object included in image

Info

Publication number: KR102566614B1
Application number: KR1020190172178A
Authority: KR
Inventors: 김광중; 박진욱
Original assignee: 주식회사 케이티 (KT Corporation)
Priority date: 2019-12-20
Filing date: 2019-12-20
Publication date: 2023-08-14
Also published as: KR20210079922A; WO2021125539A1

Abstract

An object classification apparatus for classifying objects included in an image includes: an image input unit that receives an image; a foreground object extraction unit that extracts a foreground object from the received image; a classification region extraction unit that generates a segmentation class activation map (S-CAM, Segmentation-Class Activation Map) from the extracted foreground object based on a deep learning algorithm and extracts a classification region using the S-CAM; and an object classification unit that classifies an object included in the extracted classification region.

Description

Apparatus, method and computer program for classifying objects included in images

The present invention relates to an apparatus, method, and computer program for classifying objects included in an image.

Video-based surveillance systems are used in many fields, such as security management and traffic analysis. However, their performance can degrade significantly because of factors present in real environments, such as weather changes, lighting changes, and foreign matter such as insects.

Current video surveillance systems mainly use event detection based on motion detection. In this approach, a separate object classifier must be used to determine whether an object whose motion has been detected is a person or a specific thing, which incurs unnecessary cost and time.

Recently, object detection and classification methods based on deep-learning convolutional neural networks (CNNs) have been widely used. However, deep-learning-based algorithms are limited in that they require substantial computational resources and take a relatively long time to process.

In addition, conventional object classification methods may fail to classify an object because of shadows, occlusion between objects, and the like, and may fail to classify each object individually when a plurality of objects are present.

Korean Patent Publication No. 2016-0037643 discloses a configuration for setting object candidate regions for object recognition.

The present invention aims to provide an apparatus, method, and computer program that classify objects by clearly distinguishing objects from non-object parts of an image. It further aims to provide an apparatus, method, and computer program capable of classifying each of a plurality of objects when an image contains more than one object.

However, the technical problems to be solved by the present embodiment are not limited to those described above, and other technical problems may exist.

As a means for solving the above technical problems, an embodiment of the present invention may provide an object classification apparatus for classifying objects included in an image, the apparatus including: an image input unit that receives an image; a foreground object extraction unit that extracts a foreground object from the received image; a classification region extraction unit that generates a segmentation class activation map (S-CAM, Segmentation-Class Activation Map) from the extracted foreground object based on a deep learning algorithm and extracts a classification region using the S-CAM; and an object classification unit that classifies an object included in the extracted classification region.

In an embodiment, the classification region extraction unit may include a CAM generation unit that generates a class activation map (CAM) using channels corresponding to at least one category into which the object is to be classified.

In an embodiment, the classification region extraction unit may further include an S-CAM generation unit that generates the S-CAM by segmenting the generated CAM.

In an embodiment, the S-CAM generation unit may divide the CAM into a plurality of intervals, derive a weighted variance for each of the intervals, identify the interval having the minimum weighted variance, and generate the S-CAM by binary-classifying the CAM at the identified interval.

In an embodiment, the S-CAM generation unit may derive the maximum and minimum values of the CAM and divide the range between the derived maximum and minimum values into the plurality of intervals.

In an embodiment, when a plurality of objects are present in the received image, the classification region extraction unit may use the S-CAM to separately extract a plurality of classification regions, one for each of the objects.

In an embodiment, the object classification unit may classify an object included in the classification region by determining whether it is a person, a vehicle, or an animal, thereby determining the category of the object.

In an embodiment, the apparatus may further include a storage unit that stores metadata of the objects classified by the object classification unit.

Another embodiment of the present invention may provide an object classification method for classifying objects included in an image, the method including: receiving an image; extracting an object region from the received image; generating a class activation map (CAM) from the extracted object region based on a deep learning algorithm; generating a segmentation CAM based on the generated CAM; extracting a classification region using the segmentation CAM; and performing object classification on the extracted classification region.

Yet another embodiment of the present invention may provide a computer program stored in a medium and including a sequence of instructions for classifying objects included in an image, wherein the computer program, when executed by a computing device, causes the device to extract an object region from an image, generate a segmentation class activation map (S-CAM, Segmentation-Class Activation Map) from the extracted object region based on a deep learning algorithm, and classify an object included in the image using the S-CAM.

The means for solving the problems described above are merely illustrative and should not be construed as limiting the present invention. In addition to the exemplary embodiments described above, there may be additional embodiments described in the drawings and the detailed description.

According to any one of the above-described means for solving the problems of the present invention, it is possible to provide an apparatus, method, and computer program that classify objects by accurately extracting a classification region.

It is also possible to provide an object classification apparatus, method, and computer program that separate a plurality of objects and extract a classification region corresponding to each of them.

In addition, the cost and time required for object classification can be reduced.

FIG. 1 illustrates problems that occur in a conventional object classification method.
FIG. 2 is a block diagram of an object classification apparatus according to an embodiment of the present invention.
FIG. 3 illustrates a process in which the object classification apparatus according to an embodiment of the present invention performs object classification.
FIG. 4 illustrates a problem that occurs when a classification region is extracted using only a class activation map (CAM).
FIG. 5 illustrates another problem that occurs when a classification region is extracted using only a class activation map (CAM).
FIG. 6 illustrates a classification region extracted by the object classification apparatus according to an embodiment of the present invention using a segmentation class activation map (S-CAM).
FIG. 7 illustrates a classification region extracted by the object classification apparatus according to an embodiment of the present invention using a segmentation class activation map (S-CAM).
FIG. 8 illustrates a process in which the object classification apparatus according to an embodiment of the present invention extracts a classification region for a single object.
FIG. 9 illustrates a process in which the object classification apparatus according to an embodiment of the present invention separately extracts classification regions for a plurality of objects.
FIG. 10 illustrates a classification region extracted using a class activation map (CAM) and a classification region extracted using a segmentation class activation map (S-CAM).
FIG. 11 illustrates a service in which the object classification apparatus according to an embodiment of the present invention may be used.
FIG. 12 is a flowchart of an object classification method according to an embodiment of the present invention.

Hereinafter, embodiments of the present invention will be described in detail with reference to the accompanying drawings so that those of ordinary skill in the art to which the present invention pertains can easily practice them. However, the present invention may be embodied in many different forms and is not limited to the embodiments described herein. In the drawings, parts irrelevant to the description are omitted to explain the present invention clearly, and similar reference numerals are assigned to similar parts throughout the specification.

Throughout the specification, when a part is said to be "connected" to another part, this includes not only the case where it is "directly connected" but also the case where it is "electrically connected" with another element interposed between them. In addition, when a part is said to "include" a component, this means that it may further include other components rather than excluding them, unless otherwise stated, and it should be understood that this does not preclude the presence or possible addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

In this specification, a "unit" includes a unit realized by hardware, a unit realized by software, and a unit realized by both. One unit may be realized using two or more pieces of hardware, and two or more units may be realized by one piece of hardware. A "unit" is not limited to software or hardware; it may be configured to reside in an addressable storage medium or configured to execute on one or more processors. Thus, as an example, a "unit" includes components such as software components, object-oriented software components, class components, and task components, as well as processes, functions, attributes, procedures, subroutines, segments of program code, drivers, firmware, microcode, circuitry, data, databases, data structures, tables, arrays, and variables. The functions provided by the components and "units" may be combined into a smaller number of components and "units" or further separated into additional components and "units". In addition, the components and "units" may be implemented to execute on one or more CPUs in a device or a secure multimedia card.

In this specification, some of the operations or functions described as being performed by a terminal or device may instead be performed by a server connected to that terminal or device. Likewise, some of the operations or functions described as being performed by a server may be performed by a terminal or device connected to that server.

Hereinafter, an embodiment of the present invention will be described in detail with reference to the accompanying drawings.

FIG. 1 illustrates problems that occur in a conventional object classification method. The conventional method performs object classification by extracting, based on motion in the image, a region in which an object is estimated to exist.

FIGS. 1A and 1B show cases in which the region on which object classification is performed by the conventional method contains many non-object parts. Because motion is detected not only where the object (a person) is present but also where its shadow falls, the region used for classification includes a large amount of background. As a result, when object classification was performed on that region, the object in the image was not classified as a person.

FIG. 1C shows a case in which the conventional method cannot distinguish a plurality of objects present in an image. The method could not determine whether the image contained multiple objects, and therefore could not perform object classification on each of them.

FIG. 2 is a block diagram of an object classification apparatus according to an embodiment of the present invention. Referring to FIG. 2, the object classification apparatus 200 may include an image input unit 210, a foreground object extraction unit 220, a classification region extraction unit 230, and an object classification unit 240.

The object classification apparatus 200 may be, for example, a server, a desktop computer, a laptop computer, a kiosk, a smartphone, or a tablet PC. However, the object classification apparatus 200 is not limited to these examples; it may be any device equipped with a processor that performs the method, described below, of classifying objects included in an image.

The object classification apparatus 200 may classify objects included in an image. In an embodiment, the object classification apparatus 200 may clearly distinguish objects from non-object parts of an image. In an embodiment, when an image contains a plurality of objects, the object classification apparatus 200 may detect the objects separately and perform object classification on each of them.

The image input unit 210 may receive an image. For example, the image input unit 210 may receive an image from an external device such as a user terminal, or through communication with an external server. The image input unit 210 may perform preprocessing, such as point-noise removal, on the received image.
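The text does not specify how point noise is removed. A common choice for isolated "salt-and-pepper" pixels is a 3×3 median filter, so the sketch below assumes that filter and a grayscale image stored as a list of rows; the function name `median_filter_3x3` is hypothetical.

```python
from statistics import median


def median_filter_3x3(img):
    """Apply a 3x3 median filter to a grayscale image given as a list of rows.

    Border pixels use only the neighbours that lie inside the image.
    """
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [
                img[j][i]
                for j in range(max(0, y - 1), min(h, y + 2))
                for i in range(max(0, x - 1), min(w, x + 2))
            ]
            out[y][x] = median(window)
    return out


# A flat 5x5 patch with one isolated bright pixel ("point noise").
noisy = [[10] * 5 for _ in range(5)]
noisy[2][2] = 255
clean = median_filter_3x3(noisy)
print(clean[2][2])  # 10
```

A median, unlike a mean, ignores a single outlier in the window, which is why it suits point noise.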

The foreground object extraction unit 220 may extract a foreground object from the received image, that is, a region of the image in which an object is estimated to exist.

The foreground object extraction unit 220 may extract the foreground object from the received image using background subtraction. The background subtraction may use, for example, one or more of the KNN (K-Nearest Neighbor), MOG (Mixture of Gaussians), and GMG (Godbehere-Matsukawa-Goldberg) algorithms, but is not limited thereto.
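KNN, MOG, and GMG are statistical background models; OpenCV exposes the first two as, for example, `cv2.createBackgroundSubtractorKNN` and `cv2.createBackgroundSubtractorMOG2`. The sketch below is not any of those algorithms. It is a deliberately simplified running-average model that shows the shared idea: keep a background estimate and mark as foreground the pixels that differ from it by more than a threshold. The function names and the `alpha` and `threshold` values are illustrative assumptions.

```python
def update_background(background, frame, alpha=0.05):
    """Exponential running average: B <- (1 - alpha) * B + alpha * F."""
    return [
        [(1 - alpha) * b + alpha * f for b, f in zip(b_row, f_row)]
        for b_row, f_row in zip(background, frame)
    ]


def foreground_mask(background, frame, threshold=30):
    """Return 1 where |F - B| > threshold (foreground) and 0 elsewhere."""
    return [
        [1 if abs(f - b) > threshold else 0 for b, f in zip(b_row, f_row)]
        for b_row, f_row in zip(background, frame)
    ]


background = [[50] * 4 for _ in range(4)]   # empty scene
frame = [row[:] for row in background]
frame[1][1] = frame[1][2] = 200             # a bright object enters
mask = foreground_mask(background, frame)
background = update_background(background, frame)  # adapt slowly to the scene
```

In a deployed system one of the library subtractors named above would replace both functions; they additionally model per-pixel variation and, in some configurations, shadows.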

FIG. 3(a) illustrates a process in which the foreground object extraction unit 220 extracts a foreground object. As in the example shown in FIG. 3(a), removing the background 302 from the received image 301 with a background subtraction algorithm yields the foreground object 303.

The classification region extraction unit 230 may generate a segmentation class activation map (S-CAM, Segmentation-Class Activation Map) from the extracted foreground object based on a deep learning algorithm and extract a classification region using the S-CAM.

Referring back to FIG. 2, the classification region extraction unit 230 may include a CAM generation unit 231. The CAM generation unit 231 may generate a class activation map (CAM) from the foreground object extracted by the foreground object extraction unit 220. The position of the object may be estimated using the CAM generated by the CAM generation unit 231.

A typical CNN (Convolutional Neural Network) consists of a feature extraction part and a classification part. The feature extraction part generally stacks alternating convolution layers and pooling layers, and the classification part consists of fully connected layers followed by a softmax layer at the output.

In an embodiment, unlike the CNN described above, the CAM generation unit 231 may generate the CAM using a global average pooling (GAP) layer instead of a fully connected layer.

In an embodiment, the CAM generation unit 231 may also generate the CAM using a convolution layer whose number of channels equals the number of categories into which objects are to be classified. That is, the CAM generation unit 231 may generate the CAM using channels that correspond to the at least one category into which the object is to be classified.

For example, for an apparatus that classifies objects into categories 1 to 5, the CAM generation unit 231 may use a convolution layer with five channels, one per category. The feature map output by that convolution layer then has five channels. The average of each channel can serve as the score for the corresponding category, and the object may be classified into the category with the largest score.
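The per-category scoring described above can be sketched as global average pooling followed by an argmax. This is a minimal illustration, assuming the convolution layer's output is available as one H × W map per category; the function names and channel values are hypothetical, and the three categories are taken from the person/vehicle/animal embodiment.

```python
def global_average_pool(feature_maps):
    """Mean of each channel; feature_maps is a list of H x W channels."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in feature_maps]


def classify(feature_maps, categories):
    """Score each category by its channel mean and return the best one."""
    scores = global_average_pool(feature_maps)
    best = max(range(len(scores)), key=scores.__getitem__)
    return categories[best], scores


# One 2x2 channel per category (hypothetical values).
maps = [
    [[1, 3], [2, 2]],   # person
    [[0, 0], [0, 4]],   # vehicle
    [[5, 5], [5, 5]],   # animal
]
label, scores = classify(maps, ["person", "vehicle", "animal"])
print(label, scores)  # animal [2.0, 1.0, 5.0]
```

A softmax over `scores` would not change the chosen category, since it preserves the ordering of the scores.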

The CAM generation unit 231 may generate the CAM using Equation 1 below.

[Equation 1]
$$M_c(x, y) = \sum_{k} w_k^c \, f_k(x, y)$$

Here, x and y are coordinates, c is the category (discrimination class) into which the object is to be classified, k indexes the channels, M is the class activation map (CAM), w is the per-channel weight of the discrimination layer, and f is the feature map. The CAM generation unit 231 may generate the CAM by multiplying the n × n matrices that have passed through the convolution and pooling layers by the weights of the final discrimination layer and summing the results over the channels.
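Equation 1, M_c(x, y) = Σ_k w_k^c f_k(x, y), is a per-pixel weighted sum of the K feature maps. A minimal sketch, assuming the feature maps are given as K lists of H × W values and the discrimination-layer weights as a nested list `weights[c][k]` (both layouts are assumptions):

```python
def class_activation_map(feature_maps, weights, c):
    """Equation 1: M_c(x, y) = sum_k w_k^c * f_k(x, y).

    feature_maps: K channels, each an H x W list of rows (f_k).
    weights: weights[c][k] is the discrimination-layer weight w_k^c.
    """
    h, w = len(feature_maps[0]), len(feature_maps[0][0])
    return [
        [sum(weights[c][k] * f_k[y][x] for k, f_k in enumerate(feature_maps))
         for x in range(w)]
        for y in range(h)
    ]


f = [
    [[1, 0], [0, 1]],   # f_0
    [[0, 2], [2, 0]],   # f_1
]
wts = [[1.0, 0.5],      # class 0
       [0.0, 1.0]]      # class 1
print(class_activation_map(f, wts, 0))  # [[1.0, 1.0], [1.0, 1.0]]
```

In common CAM implementations the resulting n × n map is then upsampled to the input resolution; that step is omitted here.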

FIG. 3(b) shows an example of a class activation map (CAM) 312 generated by the CAM generation unit 231 from the extracted foreground object 311.

If object classification is performed directly on the CAM generated by the CAM generation unit 231, the classification region may not be extracted properly.

FIGS. 4(a) to 4(d) illustrate a problem that occurs when a classification region is extracted using only the CAM. When the received image contains a plurality of objects, particularly when the objects are close together as shown in FIG. 4(a), a CAM that does not separate the objects may be generated, as in the results of FIGS. 4(b) and 4(c). Consequently, as shown in FIG. 4(d), separate classification regions corresponding to each of the objects may not be extracted.

FIGS. 5(a) to 5(d) illustrate another problem that occurs when a classification region is extracted using only the CAM. When an image such as that of FIG. 5(a) is received, a CAM in which non-object parts are misclassified as the object may be generated, as in the results of FIGS. 5(b) and 5(c). Consequently, as shown in FIG. 5(d), the misclassified part may be extracted as a classification region.

To solve these problems of extracting a classification region using only the CAM, an embodiment may use a segmentation class activation map (S-CAM). Referring back to FIG. 2, the classification region extraction unit 230 may further include an S-CAM generation unit 232.

The S-CAM generation unit 232 may generate the S-CAM by segmenting the generated CAM, that is, by applying a segmentation algorithm to the class activation map (CAM).

The S-CAM generation unit 232 may divide the CAM into a plurality of intervals. Specifically, the S-CAM generation unit 232 may derive the maximum and minimum values of the CAM and divide the range between the derived maximum and minimum values into the plurality of intervals.
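Dividing the CAM's value range into equal intervals and counting the values in each can be sketched as follows. This assumes equal-width intervals and a 0-based interval index (so interval 4 spans 0.4 to 0.5 when the CAM ranges from 0.0 to 1.0); the function name is hypothetical.

```python
def interval_counts(cam, n=10):
    """Split [min(cam), max(cam)] into n equal intervals and count the CAM
    values in each; counts[i] plays the role of X_i (0-based index here)."""
    values = [v for row in cam for v in row]
    lo, hi = min(values), max(values)
    width = (hi - lo) / n or 1.0           # avoid division by zero for a constant map
    counts = [0] * n
    for v in values:
        i = min(int((v - lo) / width), n - 1)  # the maximum goes in the last interval
        counts[i] += 1
    return counts, lo, width


cam = [[0.0, 0.05, 0.44],
       [0.41, 0.46, 1.0]]
counts, lo, width = interval_counts(cam)
print(counts)  # [2, 0, 0, 0, 3, 0, 0, 0, 0, 1]
```

Values lying exactly on an interval boundary may land on either side because of floating-point rounding; a production version would need a consistent boundary rule.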

The S-CAM generation unit 232 may derive a weighted variance for each of the intervals using Equation 2 below.

[Equation 2]
$$\sigma_w^2 = W_a \sigma_a^2 + W_b \sigma_b^2$$

Here, σ_w² is the weighted variance for one of the intervals, and a and b each denote one of the two classes of the binary classification; for example, a may be the foreground and b the background. W_a is the weight of a, μ_a is the mean of a, and σ_a² is the variance of a. Likewise, W_b is the weight of b, μ_b is the mean of b, and σ_b² is the variance of b.

For example, when there are 10 intervals in total, deriving the weighted variance σ_w² for interval s requires W_a, μ_a, and σ_a², and W_b, μ_b, and σ_b², each computed from the interval counts X_i on one side of the split at interval s. X_i is the number of CAM values that fall in the i-th interval when the real-valued CAM is divided into intervals between its minimum and maximum values. For example, if the CAM values falling in interval 4 (0.4 to 0.5) are 0.44, 0.41, and 0.46, then X₄ is 3.
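Equation 2 has the same form as the within-class variance minimized by Otsu's thresholding method. The sketch below assumes Otsu-style conventions that the text does not spell out: each interval is represented by its 0-based index i, b (background) is the set of intervals below the split s, a (foreground) is interval s and above, W is the fraction of all CAM values in a class, and μ and σ² are the count-weighted mean and variance of the interval indices in that class. The function names are hypothetical.

```python
def _class_stats(counts, indices, total):
    """Weight W, mean mu and variance sigma^2 of the interval indices in one class."""
    n_c = sum(counts[i] for i in indices)
    if n_c == 0:
        return 0.0, 0.0, 0.0
    mu = sum(i * counts[i] for i in indices) / n_c
    var = sum((i - mu) ** 2 * counts[i] for i in indices) / n_c
    return n_c / total, mu, var


def weighted_variance(counts, s):
    """Equation 2: sigma_w^2 = W_a * sigma_a^2 + W_b * sigma_b^2, with
    b = intervals [0, s) and a = intervals [s, n) (assumed split convention)."""
    total = sum(counts)
    w_b, _mu_b, var_b = _class_stats(counts, range(0, s), total)
    w_a, _mu_a, var_a = _class_stats(counts, range(s, len(counts)), total)
    return w_a * var_a + w_b * var_b


counts = [2, 2, 0, 2, 2]
print({s: round(weighted_variance(counts, s), 4) for s in range(1, len(counts))})
# {1: 1.1667, 2: 0.25, 3: 0.25, 4: 1.1667}
```

Using interval centres instead of indices would scale every σ_w² by the same factor (the interval width squared), so the interval with the minimum weighted variance would not change.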

Among the intervals, the S-CAM generation unit 232 may identify the interval with the minimum weighted variance, and may generate the segmentation class activation map (S-CAM) by binary-classifying the CAM at the identified interval.

For example, when the minimum value of the class activation map (CAM) is 0.0 and its maximum value is 1.0, the range from 0.0 to 1.0 may be divided at intervals of 0.1. This splits the CAM into 10 sections in total: a first section from 0.0 to 0.1, a second section from 0.1 to 0.2, and so on. In this case, the weighted variance of each section may be derived as shown in Table 1 below.

In Table 1, the weighted variance is smallest in the fourth section. Accordingly, the S-CAM generation unit 232 may generate the segmentation class activation map (S-CAM) by binary-classifying the class activation map (CAM) at the fourth section.
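A sketch of the whole S-CAM step under the same assumptions (NumPy, equal-width sections, Otsu-style within-class variance): every candidate split is scored, the split with the smallest weighted variance is kept, and the CAM is binarized there. The name `generate_scam` is hypothetical, and treating values at or above the chosen split as foreground is an assumption rather than something the text specifies.

```python
import numpy as np


def generate_scam(cam, num_sections=10):
    """Split the CAM's [min, max] range into equal sections, pick the split
    with the smallest within-class weighted variance (Equation 2), and
    binarize the CAM at that split."""
    cam = np.asarray(cam, dtype=float)
    counts, edges = np.histogram(cam, bins=num_sections,
                                 range=(cam.min(), cam.max()))
    idx = np.arange(num_sections)
    total = counts.sum()

    def within_class_variance(s):
        # Class a = sections with index < s, class b = the remaining sections.
        result = 0.0
        for mask in (idx < s, idx >= s):
            n = counts[mask].sum()
            if n == 0:
                continue
            mean = (idx[mask] * counts[mask]).sum() / n
            var = (((idx[mask] - mean) ** 2) * counts[mask]).sum() / n
            result += (n / total) * var
        return result

    # Splits s = 1 .. num_sections - 1 give each class at least one section.
    variances = [within_class_variance(s) for s in range(1, num_sections)]
    best_s = 1 + int(np.argmin(variances))   # the first minimum wins ties
    threshold = edges[best_s]                # lower edge of class b's first section
    scam = (cam >= threshold).astype(np.uint8)
    return scam, best_s, variances
```

The returned `variances` list plays the role of Table 1, one entry per candidate split, and the binary `scam` is the map from which classification areas are then extracted.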

(c) of FIG. 3 shows example segmentation class activation maps (S-CAMs) 321, 322, and 323 generated by the S-CAM generation unit 232. Reference numerals 321, 322, and 323 each denote an S-CAM generated by binary-classifying the class activation map (CAM) at a different section. As (c) of FIG. 3 shows, the resulting S-CAM differs depending on which section of the CAM is used.

The classification area extraction unit 230 may extract a classification area using the segmentation class activation map (S-CAM).

For example, as shown in (d) of FIG. 3, the classification area extraction unit 230 may extract a classification area (the boxed portion of 332) from the segmentation class activation map (S-CAM) 331 of the section with the minimum weighted variance.

When the input image contains a plurality of objects, the classification area extraction unit 230 may use the segmentation class activation map (S-CAM) to separate and extract a plurality of classification areas, one for each of the objects.
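The text does not say how separate classification areas are obtained from the binary S-CAM. One common technique is connected-component labeling with one bounding box per component; the sketch below uses a 4-connected breadth-first search. The function `extract_regions` and its `min_pixels` noise filter are hypothetical.

```python
from collections import deque

import numpy as np


def extract_regions(scam, min_pixels=1):
    """Return one inclusive bounding box (top, left, bottom, right) per
    4-connected foreground blob in a binary S-CAM, in raster-scan order.
    Blobs smaller than min_pixels are dropped as noise."""
    mask = np.asarray(scam, dtype=bool)
    h, w = mask.shape
    seen = np.zeros_like(mask)
    boxes = []
    for r in range(h):
        for c in range(w):
            if not mask[r, c] or seen[r, c]:
                continue
            # Flood-fill the blob that starts at (r, c), tracking its extent.
            queue = deque([(r, c)])
            seen[r, c] = True
            top, left, bottom, right = r, c, r, c
            size = 0
            while queue:
                y, x = queue.popleft()
                size += 1
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < h and 0 <= nx < w
                            and mask[ny, nx] and not seen[ny, nx]):
                        seen[ny, nx] = True
                        queue.append((ny, nx))
            if size >= min_pixels:
                boxes.append((top, left, bottom, right))
    return boxes
```

Each box can then be cropped from the foreground image and passed to the classifier separately, which is what lets two nearby people yield two classification areas instead of one merged area.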

(a) of FIG. 6 is a segmentation class activation map (S-CAM) generated by binary-classifying the class activation map (CAM) at a section whose weighted variance is 0.750, and (b) of FIG. 6 shows the classification area extracted using the S-CAM of (a).

(c) of FIG. 6 is a segmentation class activation map (S-CAM) generated by binary-classifying the class activation map (CAM) at a section whose weighted variance is 0.409, and (d) of FIG. 6 shows the classification areas extracted using the S-CAM of (c).

(e) of FIG. 6 is a segmentation class activation map (S-CAM) generated by binary-classifying the class activation map (CAM) at a section whose weighted variance is 2.280, and (f) of FIG. 6 shows the classification area extracted using the S-CAM of (e).

Referring to FIG. 6, for an image containing a plurality of objects, panels (c) and (d), which correspond to the section with the minimum weighted variance, separate and extract the classification areas of the individual objects most appropriately. In other words, the problem shown in FIG. 4, where extracting classification areas using only the class activation map (CAM) fails to separate multiple objects, is resolved by using the segmentation class activation map (S-CAM).

(a) of FIG. 7 is a segmentation class activation map (S-CAM) generated by binary-classifying the class activation map (CAM) at a section whose weighted variance is 0.579, and (b) of FIG. 7 shows the classification area extracted using the S-CAM of (a).

(c) of FIG. 7 is a segmentation class activation map (S-CAM) generated by binary-classifying the class activation map (CAM) at a section whose weighted variance is 0.345, and (d) of FIG. 7 shows the classification area extracted using the S-CAM of (c).

(e) of FIG. 7 is a segmentation class activation map (S-CAM) generated by binary-classifying the class activation map (CAM) at a section whose weighted variance is 2.067, and (f) of FIG. 7 shows the classification area extracted using the S-CAM of (e).

Referring to FIG. 7, in panels (c) and (d), which correspond to the section with the minimum weighted variance, the object is clearly distinguished from non-object parts and the most appropriate, noise-free classification area is extracted. In other words, the problem shown in FIG. 5, where extracting a classification area using only the class activation map (CAM) fails to remove noise, is resolved by using the segmentation class activation map (S-CAM).

The object classification unit 240 may classify an object included in the extracted classification area. The object classification unit 240 may determine whether an object in the input image belongs to a single category, or it may determine in a single pass which of a plurality of categories the object belongs to.

The object classification unit 240 may determine the category of an object using a convolutional neural network (CNN) classifier. The CNN classifier may include a part that extracts image features and a part that determines (classifies) the class of the object.

The feature-extraction part may include a plurality of convolution layers and a plurality of pooling layers, and the class-determination part may include a layer for image classification, for example a fully connected layer. Because the CAM generation unit 231 generates the class activation map (CAM) from the product of the per-channel weights (w) of the classification layer and the feature maps (f) (see Equation 1), the CNN classifier and the CAM must share the same structure for the convolution layers and all layers before them.
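Equation 1 is not reproduced in this section, but the description (per-channel classification-layer weight w multiplied by feature map f) matches the standard CAM formulation, CAM_c(x, y) = Σ_k w_k^c f_k(x, y). A NumPy sketch, assuming a global-average-pooling plus fully connected head; it omits the usual upsampling of the map to the input resolution, and `class_activation_map` is a hypothetical name.

```python
import numpy as np


def class_activation_map(features, fc_weights, class_index):
    """Weighted sum of feature maps for one class.

    features:   array of shape (K, H, W), the last convolutional feature maps f_k
    fc_weights: array of shape (C, K), the classification-layer weights w
    Returns an (H, W) map equal to sum_k w[class_index, k] * f_k.
    """
    f = np.asarray(features, dtype=float)
    w = np.asarray(fc_weights, dtype=float)[class_index]   # shape (K,)
    return np.tensordot(w, f, axes=(0, 0))                 # contract over K
```

Because the map reuses the classifier's own weights and feature maps, the classifier and the CAM generator must share their convolutional layers, which is the structural constraint stated above.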

A CNN classifier with the structure described above may be trained on images of objects in a plurality of categories. For example, if the CNN classifier is trained on images of people, cars, and animals, the trained classifier may derive, for an object in the classification area, the probability that it belongs to each category, and may classify the object according to the highest probability.
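The final decision, taking the category with the highest probability, can be sketched with a softmax over the classifier's output scores. The three-category label set and the use of a softmax are assumptions for illustration; `classify` is a hypothetical name.

```python
import numpy as np

CATEGORIES = ("person", "car", "animal")  # illustrative label set


def classify(logits, categories=CATEGORIES):
    """Convert raw classifier scores into per-category probabilities and
    return the most probable category together with all probabilities."""
    scores = np.asarray(logits, dtype=float)
    z = np.exp(scores - scores.max())      # subtract the max for numerical stability
    probs = z / z.sum()
    best = int(np.argmax(probs))
    return categories[best], dict(zip(categories, probs.tolist()))
```

For example, `classify([5.0, 1.0, 0.0])` returns `"person"` with a probability of about 0.98.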

In one embodiment, the object classification unit 240 may classify the object by determining whether the object in the classification area is a person, a vehicle, or an animal, thereby determining its category. In another embodiment, the object classification unit 240 may derive, for each of a plurality of categories, the probability that the object in the classification area belongs to that category.

For example, the object classification unit 240 may determine whether the object in the classification area shown in (e) of FIG. 3 is a person and derive a 98% probability that it is. Accordingly, the object may be classified as a person.

(a) to (d) of FIG. 8 illustrate how the object classification apparatus 200 according to an embodiment of the present invention extracts a classification area for a single object. The image input unit 210 may receive an image such as the one shown in (a) of FIG. 8, and the foreground object extraction unit 220 may extract a foreground object from the input image. The classification area extraction unit 230 may generate a segmentation class activation map (S-CAM) as shown in (b) of FIG. 8 and use it to extract a classification area as shown in (c) of FIG. 8. The object classification unit 240 may classify the object in the classification area shown in (d) of FIG. 8 as a person.

(a) to (d) of FIG. 9 illustrate how the object classification apparatus 200 according to an embodiment of the present invention extracts classification areas for a plurality of objects. The image input unit 210 may receive an image containing a plurality of objects, as shown in (a) of FIG. 9, and the foreground object extraction unit 220 may extract foreground objects from the input image. The classification area extraction unit 230 may generate a segmentation class activation map (S-CAM) as shown in (b) of FIG. 9 and use it to separate and extract a plurality of classification areas, one for each object, as shown in (c) of FIG. 9. The object classification unit 240 may classify the object in each classification area shown in (d) of FIG. 9 as a person.

(a) of FIG. 10 shows a classification area extracted using only the class activation map (CAM), and (b) of FIG. 10 shows classification areas extracted using the segmentation class activation map (S-CAM). For example, the object classification unit 240 may derive a 56% probability that the object in the classification area of (a) of FIG. 10 is a person, but a 98% probability that the object in each classification area of (b) of FIG. 10 is a person.

The object classification apparatus 200 may further include a storage unit (not shown) that stores metadata of the objects classified by the object classification unit 240. In another embodiment, the object classification apparatus 200 may transmit the metadata of the objects classified by the object classification unit 240 to an external device.
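The metadata format is not specified here. As a purely hypothetical example, each classified object could be stored, or sent to an external device, as a small JSON record; the `ObjectMetadata` class and its field names are illustrative only.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class ObjectMetadata:
    camera_id: str      # hypothetical source identifier
    timestamp: float    # capture time in seconds
    category: str       # e.g. "person", "car", "animal"
    probability: float  # classifier confidence for the category
    box: tuple          # classification area as (top, left, bottom, right)


def to_json(records):
    """Serialize a list of metadata records for storage or transmission."""
    return json.dumps([asdict(r) for r in records])
```

JSON is used here only because it is easy to store or send; the tuple `box` is written as a JSON array.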

(a) to (c) of FIG. 11 illustrate services that can be provided using the object classification apparatus 200 according to an embodiment of the present invention.

As shown in (a) of FIG. 11, the object classification apparatus 200 may be used to provide a service that detects intrusion by an object in an image. It can detect the position of a moving object in a specific area and classify what the object is. It can classify objects in images captured by a plurality of cameras or in real-time video, and can distinguish pets, cars, people, and so on to determine whether an intrusion has occurred.
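For the intrusion-detection service, one simple rule would flag a classified object whose classification area overlaps a monitored zone. The rule, the inclusive (top, left, bottom, right) box convention, the watched categories, and the name `is_intrusion` are all assumptions for illustration.

```python
def is_intrusion(box, category, zone, watched=("person", "car")):
    """Return True if an object of a watched category overlaps the zone.
    Boxes and zones are inclusive (top, left, bottom, right) tuples."""
    top, left, bottom, right = box
    z_top, z_left, z_bottom, z_right = zone
    overlaps = not (right < z_left or left > z_right
                    or bottom < z_top or top > z_bottom)
    return overlaps and category in watched
```

Leaving "animal" out of `watched` would let a pet crossing the zone pass without an alert, which is the kind of distinction between pets, cars, and people that the service relies on.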

As shown in (b) of FIG. 11, the object classification apparatus 200 may be used to provide a service that determines whether a specific object is present in an image. For example, it may count the cars in a parking lot or determine whether a parking space is available.

As shown in (c) of FIG. 11, the object classification apparatus 200 may be used to determine the number of objects in an image, in particular population density. The object classification apparatus 200 may also be used to detect the density of amorphous objects such as water or fog.

FIG. 12 is a flowchart of an object classification method according to an embodiment of the present invention. The method 1200 of classifying objects, performed by the apparatus 200, includes steps processed in time series by the apparatus 200 according to the embodiment shown in FIG. 2. Therefore, the description of the apparatus 200 according to the embodiment of FIG. 2 also applies to this method, even where it is omitted below.

In step S1201, the apparatus 200 may receive an image.

In step S1202, the apparatus 200 may extract an object area from the input image.

In step S1203, the apparatus 200 may generate a class activation map (CAM) from the extracted object area based on a deep learning algorithm.

In step S1204, the apparatus 200 may generate a segmentation CAM (S-CAM) based on the generated CAM.

In step S1205, the apparatus 200 may extract a classification area using the S-CAM.

In step S1206, the apparatus 200 may classify the object in the extracted classification area.
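The order of steps S1201 to S1206 can be expressed as a pipeline in which each stage is a pluggable function. This is a structural sketch only: `classify_objects` and every stage callable are hypothetical placeholders, and step S1201 corresponds to passing in the input frame.

```python
from typing import Any, Callable, Iterable, List, Tuple


def classify_objects(
    frame: Any,                                        # S1201: input image
    extract_object_area: Callable[[Any], Any],         # S1202
    make_cam: Callable[[Any], Any],                    # S1203
    make_scam: Callable[[Any], Any],                   # S1204
    extract_regions: Callable[[Any], Iterable[Any]],   # S1205
    classify_region: Callable[[Any, Any], str],        # S1206
) -> List[Tuple[Any, str]]:
    """Run the stages in flowchart order and return (region, label) pairs."""
    area = extract_object_area(frame)
    cam = make_cam(area)
    scam = make_scam(cam)
    return [(region, classify_region(area, region))
            for region in extract_regions(scam)]
```

Keeping each stage injectable makes it easy to split, merge, or reorder steps, as the description allows.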

In step S1204 of generating the S-CAM, the CAM may be divided and processed to generate an S-CAM that separates objects from one another and removes noise, making the object estimate clearer.

In the description above, steps S1201 to S1206 may be further divided into additional steps or combined into fewer steps, depending on the implementation of the present invention. Some steps may also be omitted as needed, and the order of the steps may be changed.

The method of classifying objects in the object classification apparatus described with reference to FIGS. 1 to 12 may also be implemented as a computer program stored in a medium and executed by a computer, or as a recording medium containing computer-executable instructions.

Computer-readable media can be any available media that can be accessed by a computer, and include both volatile and nonvolatile media and both removable and non-removable media. Computer-readable media may also include computer storage media. Computer storage media include both volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storing information such as computer-readable instructions, data structures, program modules, or other data.

The foregoing description of the present invention is illustrative, and those of ordinary skill in the art to which the present invention pertains will understand that it can easily be modified into other specific forms without changing its technical spirit or essential features. Therefore, the embodiments described above should be understood as illustrative in all respects and not limiting. For example, each component described as a single unit may be implemented in a distributed manner, and components described as distributed may likewise be implemented in a combined form.

The scope of the present invention is defined by the claims below rather than by the detailed description above, and all changes or modifications derived from the meaning and scope of the claims and their equivalents should be construed as falling within the scope of the present invention.

200: object classification device
220: foreground object extraction unit
230: classification area extraction unit
240: object classification unit

Claims

An object classification apparatus for classifying objects included in an image,
an image input unit that receives an image;
a foreground object extraction unit extracting a foreground object from the input image;
a classification area extraction unit that generates a class activation map (CAM) from the extracted foreground object based on a deep learning algorithm, generates a segmentation class activation map (S-CAM) based on the generated CAM, and extracts a classification area using the S-CAM; and
an object classification unit that classifies an object included in the extracted classification area,
including,
wherein the classification area extraction unit further includes an S-CAM generation unit that divides and processes the generated CAM to generate the S-CAM,
the S-CAM generation unit divides the CAM into a plurality of sections, derives a weighted variance for each of the plurality of sections based on the variances of the foreground object and the background included in each of the plurality of sections, and generates the S-CAM based on the section having the minimum value among the derived weighted variances, and
the section having the minimum value is a section from which either a plurality of classification areas corresponding respectively to a plurality of objects, or a classification area from which noise has been removed by distinguishing the object from non-object parts, is extracted.

According to claim 1,
Wherein the classification area extraction unit includes a CAM generation unit that generates the CAM using a channel corresponding to at least one category into which the object is to be classified.

delete

According to claim 1,
Wherein the S-CAM generation unit generates the S-CAM by binary-classifying the CAM of the section having the minimum value among the derived weighted variances.

According to claim 4,
The S-CAM generation unit derives the maximum value and the minimum value of the CAM, and divides a section between the derived maximum value and the derived minimum value into the plurality of sections.

According to claim 1,
wherein the classification area extraction unit separates and extracts a plurality of classification areas corresponding to each of the plurality of objects using the S-CAM when a plurality of objects exist in the input image.

According to claim 1,
wherein the object classification unit classifies the object by determining its category as one of a person, a vehicle, and an animal.

According to claim 1,
The object classification apparatus further comprising a storage unit for storing metadata of objects classified by the object classification unit.

An object classification method for classifying objects included in an image,
Receiving an image;
extracting an object area from the input image;
generating a class activation map (CAM) from the extracted object region based on a deep learning algorithm;
generating a segmented CAM based on the generated CAM;
extracting a classification area using the segmented CAM; and
Performing object classification in the extracted classification area
including,
The step of generating the segmented CAM,
Dividing the CAM into a plurality of sections;
deriving a weighted variance for each of the plurality of sections based on the variances of the foreground object and the background included in each of the plurality of sections;
deriving the section having the minimum value among the derived weighted variances; and
generating the segmented CAM based on the section having the minimum value,
wherein the section having the minimum value is a section from which either a plurality of classification areas corresponding respectively to a plurality of objects, or a classification area from which noise has been removed by distinguishing the object from non-object parts, is extracted.

According to claim 9,
Wherein the generating of the CAM uses a channel corresponding to at least one category into which the object is to be classified.

According to claim 10,
Wherein the generating of the segmented CAM generates a segmented CAM that clarifies object estimation by dividing and processing the CAM to separate objects and remove noise.

According to claim 11,
The step of generating the segmented CAM is
Further comprising the step of generating the segmented CAM by binary classifying the CAM of the derived section.

According to claim 11,
Wherein the extracting of the classification area comprises separating and extracting a plurality of classification areas corresponding respectively to a plurality of objects using the segmented CAM when the plurality of objects exist in the input image.

According to claim 9,
Wherein the classifying of the object classifies the object by determining its category as one of a person, a vehicle, and an animal.

According to claim 9,
The object classification method further comprising the step of storing metadata of the object classified in the step of classifying the object.

A computer program stored in a medium containing a sequence of instructions for classifying an object included in an image,
When the computer program is executed by a computing device,
extracting an object region from an image;
Based on a deep learning algorithm, a Class Activation Map (CAM) is generated from the extracted object region, and a Segmentation-Class Activation Map (S-CAM) is generated based on the generated CAM,
A sequence of instructions for classifying an object included in an image using the S-CAM,
the sequence of instructions further causing the computing device to divide the CAM into a plurality of sections, derive a weighted variance for each of the plurality of sections based on the variances of the foreground object and the background included in each of the plurality of sections, and generate the S-CAM based on the section having the minimum value among the derived weighted variances,
wherein the section having the minimum value is a section from which either a plurality of classification areas corresponding respectively to a plurality of objects, or a classification area from which noise has been removed by distinguishing the object from non-object parts, is extracted.