KR20210027006A

KR20210027006A - Apparatus, method and computer program for categorizing video based on machine learning

Info

Publication number: KR20210027006A
Application number: KR1020190173363A
Authority: KR
Inventors: 김상백
Original assignee: 주식회사 카이
Priority date: 2019-08-29
Filing date: 2019-12-23
Publication date: 2021-03-10
Also published as: KR102314990B1

Abstract

Provided is an apparatus for classifying video, which includes: an input unit that receives a training dataset for video data; an extraction unit that extracts feature frames of the video data from the input training dataset; and a classification unit that classifies the frames of the video in consideration of a Balanced Maximal Margin Evidence Rate problem for the extracted feature frames and the training dataset, and classifies the video based on the classified frames.

Description

Apparatus, method, and computer program for classifying video based on machine learning {APPARATUS, METHOD AND COMPUTER PROGRAM FOR CATEGORIZING VIDEO BASED ON MACHINE LEARNING}

The present invention relates to an apparatus, a method, and a computer program for classifying video based on machine learning.

With the spread of the Internet and the rapid development of computer communication technology, demand for new multimedia information services has been increasing. As application fields that require multimedia information processing emerge, there is a growing need for technologies that can efficiently process, store, search, and play back large amounts of multimedia information.

Among multimedia information, video information is becoming an important element in fields such as broadcasting, education, publishing, and libraries. Video search, which studies how to efficiently find desired information among the video data stored in a large database, has therefore become a core topic of video information processing research.

In this regard, Korean Patent Registration No. 10-1826669, a prior-art document, discloses a video search system and a method thereof.

Recently, it has become possible to classify video categories automatically by using artificial intelligence to grasp the situation that an image represents. However, automatic classification of video categories remains very difficult because the situation a video represents is ambiguous and the available information is insufficient.

An object of the present invention is to provide a machine-learning-based video classification apparatus, method, and computer program that, upon receiving a training dataset for video data, extract feature frames of the video data from the input training dataset.

Another object is to provide a machine-learning-based video classification apparatus, method, and computer program that classify the frames of a video in consideration of a Balanced Maximal Margin Evidence Rate problem for the feature frames of the video data and the training dataset, and classify the video based on the classified frames.

However, the technical problems to be solved by the present embodiment are not limited to those described above, and other technical problems may exist.

As a means for achieving the above technical objects, an embodiment of the present invention may provide a video classification apparatus including: an input unit that receives a training dataset for video data; an extraction unit that extracts feature frames of the video data from the input training dataset; and a classification unit that classifies the frames of the video in consideration of a Balanced Maximal Margin Evidence Rate problem for the extracted feature frames and the training dataset, and classifies the video based on the classified frames.

Another embodiment of the present invention may provide a video classification method including: receiving a training dataset for video data; extracting feature frames of the video data from the input training dataset; and classifying the frames of the video in consideration of a Balanced Maximal Margin Evidence Rate problem for the extracted feature frames and the training dataset, and classifying the video based on the classified frames.

Yet another embodiment of the present invention may provide a computer program stored in a medium, the computer program including a sequence of instructions that, when executed by a computing device, cause it to receive a training dataset for video data, extract feature frames of the video data from the input training dataset, classify the frames of the video in consideration of a Balanced Maximal Margin Evidence Rate problem for the extracted feature frames and the training dataset, and classify the video based on the classified frames.

The above means for solving the problems are merely exemplary and should not be construed as limiting the present invention. In addition to the exemplary embodiments described above, there may be additional embodiments described in the drawings and the detailed description.

According to any one of the above means, the following effect can be obtained. When the training dataset is given in the form of classified videos, frame classification and video classification could not previously be performed because no frame-level classification information exists for those videos. The present invention can provide an apparatus, a method, and a computer program that classify frames and videos in this setting by taking the Balanced Maximal Margin Evidence Rate problem into account.

It is also possible to provide an apparatus, a method, and a computer program that assign positive frames to video data so that it becomes a positive video, classify the frames of the video by performing simulated annealing in consideration of the balanced maximal evidence rate problem so that positive frames are distributed evenly across the positive videos, and classify the video based on the classified frames.

It is also possible to provide an apparatus, a method, and a computer program that determine the class of a video based on the class into which the largest number of its frames were classified.

FIG. 1 is a block diagram of a machine-learning-based video classification apparatus according to an embodiment of the present invention.
FIG. 2 is an exemplary diagram illustrating a process of extracting feature frames in a video classification apparatus according to an embodiment of the present invention.
FIG. 3 is a flowchart of a method of classifying a video in a machine-learning-based video classification apparatus according to an embodiment of the present invention.

Hereinafter, embodiments of the present invention are described in detail with reference to the accompanying drawings so that those of ordinary skill in the art can readily practice the invention. The present invention may, however, be implemented in many different forms and is not limited to the embodiments described herein. In the drawings, parts irrelevant to the description are omitted for clarity, and similar parts are given similar reference numerals throughout the specification.

Throughout the specification, when a part is said to be "connected" to another part, this includes not only the case where it is "directly connected" but also the case where it is "electrically connected" with another element interposed between them. In addition, when a part is said to "include" a component, this means that it may further include other components rather than excluding them, unless specifically stated otherwise, and it does not preclude the presence or addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

In this specification, a "unit" includes a unit realized by hardware, a unit realized by software, and a unit realized using both. One unit may be realized using two or more pieces of hardware, and two or more units may be realized by one piece of hardware.

In this specification, some of the operations or functions described as being performed by a terminal or device may instead be performed by a server connected to that terminal or device. Likewise, some of the operations or functions described as being performed by a server may be performed by a terminal or device connected to that server.

Hereinafter, an embodiment of the present invention is described in detail with reference to the accompanying drawings.

FIG. 1 is a block diagram of a machine-learning-based video classification apparatus according to an embodiment of the present invention. Referring to FIG. 1, the video classification apparatus 100 may include an input unit 110, an extraction unit 120, and a classification unit 130.

The input unit 110 may receive a training dataset for video data.

The training dataset may include the training data of frames sampled from the video data at predetermined time intervals, the class of the video data, the number of frame classes (L), the number of training data samples (N), and the like.

A sampled frame may be a frame that contains the face of an object and from which an event can therefore be recognized. The face of an object plays an important role in characterizing an event, and using it has the advantage that the event of a video can be recognized from a single picture without a long clip. Accordingly, frames that contain no object are excluded and the remaining frames are sampled at a specific time interval (for example, 3 seconds), which reduces the input data and lowers the correlation between frames so that features do not overlap.
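The filtering and fixed-interval sampling described above can be sketched as follows. This is a minimal illustration rather than the implementation of the apparatus: the function name `sample_frames`, the per-frame `has_face` flags (assumed to come from some face detector), and the timestamp representation are all hypothetical. Only the 3-second interval is taken from the text.

```python
def sample_frames(timestamps, has_face, interval=3.0):
    """Return the timestamps of frames kept after filtering and sampling.

    timestamps: frame times in seconds, in ascending order (assumed input).
    has_face:   one boolean per frame from a hypothetical face detector.
    interval:   minimum spacing between kept frames (3 s in the text's example).
    """
    kept = []
    next_time = None
    for t, face in zip(timestamps, has_face):
        if not face:
            continue  # frames without a detected face are excluded first
        if next_time is None or t >= next_time:
            kept.append(t)
            next_time = t + interval
    return kept


times = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
faces = [True, True, True, False, True, True, True, True, True, True]
print(sample_frames(times, faces))  # [0, 4, 7]
```

In this sketch the frame at 3 s is skipped because it has no face, so the next kept frame is the first face-bearing frame at or after the 3-second mark.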

The extraction unit 120 may extract feature frames of the video data from the input training dataset. Here, a feature frame of the video data may be a frame for which no label exists. The process of extracting feature frames is described in detail with reference to FIG. 2.

FIG. 2 is an exemplary diagram illustrating a process of extracting feature frames in a video classification apparatus according to an embodiment of the present invention.

The extraction unit 120 may extract feature frames using a Reduced Frame Network (RFN). The Reduced Frame Network generates the sampled frame input and uses a feature extraction model with Domain Adaptation (DA) to secure independence between frames. Through domain adaptation, it overcomes the difference between the distribution of the training dataset and that of the test dataset, which improves classification accuracy on the test dataset and minimizes test error.

The extraction unit 120 may apply a preset kernel to the test domain to extract component values of the test dataset, and generate a subspace for the test domain based on the extracted component values. Here, the test dataset may include the test data of frames sampled from the video data at predetermined time intervals, the number of frames of the test data, and the like.

The extraction unit 120 may project the training dataset onto the subspace and extract, from the projected training dataset, feature frames adapted to the test domain.

Referring to FIG. 2, the extraction unit 120 may perform sampling 210 on the video data 200 at intervals of about 3 seconds and apply kernel PCA 240 to each sampled frame 230 to extract feature frames 250. For domain adaptation 220 (DA: Domain Adaptation), the extraction unit 120 maps the training data of the training domain onto the test domain, which is expressed as a subspace by applying kernel PCA so that the characteristics of the domain are well represented, and thereby extracts the feature frames 250.
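The idea of fitting a subspace on the test domain with kernel PCA and then projecting training frames onto it can be illustrated with a small pure-Python sketch. Several things here are assumptions that the text does not specify: the RBF kernel and its `gamma` value, the use of a single leading component found by power iteration, and the names `fit_test_subspace` and `project`. Frames are represented as short numeric vectors purely for illustration.

```python
import math


def rbf(x, y, gamma=0.5):
    """RBF kernel between two feature vectors (the kernel choice is an assumption)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def fit_test_subspace(test_frames, gamma=0.5, iters=500):
    """Find the leading kernel-PCA direction of the test domain by power iteration.

    Assumes the test frames are not all identical, so the top eigenvalue is positive.
    """
    n = len(test_frames)
    K = [[rbf(a, b, gamma) for b in test_frames] for a in test_frames]
    row_mean = [sum(row) / n for row in K]
    total_mean = sum(row_mean) / n
    # Double-centre the kernel matrix so the mapped features have zero mean.
    Kc = [[K[i][j] - row_mean[i] - row_mean[j] + total_mean for j in range(n)]
          for i in range(n)]
    v = [float(i + 1) for i in range(n)]  # non-constant start vector
    eigval = 0.0
    for _ in range(iters):
        w = [sum(Kc[i][j] * v[j] for j in range(n)) for i in range(n)]
        eigval = math.sqrt(sum(x * x for x in w))
        v = [x / eigval for x in w]
    # Scale the coefficients so the direction has unit norm in feature space.
    alpha = [x / math.sqrt(eigval) for x in v]
    return {"frames": test_frames, "alpha": alpha, "row_mean": row_mean,
            "total_mean": total_mean, "gamma": gamma}


def project(model, x):
    """Project one training-domain frame onto the test-domain direction."""
    k = [rbf(x, t, model["gamma"]) for t in model["frames"]]
    k_mean = sum(k) / len(k)
    kc = [k[j] - k_mean - model["row_mean"][j] + model["total_mean"]
          for j in range(len(k))]
    return sum(a * c for a, c in zip(model["alpha"], kc))


test_frames = [[0.0, 0.0], [1.0, 0.2], [0.3, 1.1], [2.0, 1.5]]
train_frames = [[0.5, 0.5], [1.5, 1.0]]
model = fit_test_subspace(test_frames)
adapted = [project(model, x) for x in train_frames]
```

A practical version would keep several components and would likely rely on a numerical library; the single-component form is used here only to keep the sketch self-contained.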

Returning to FIG. 1, the classification unit 130 may classify the frames of the video in consideration of a Balanced Maximal Margin Evidence Rate problem for the extracted feature frames and the training dataset, and classify the video based on the classified frames. To classify the frames in consideration of the Balanced Maximal Margin Evidence Rate problem, the class of the video data, the number of training data samples (N), and the number of frames of the video data may be used.

The classification unit 130 may perform initialization for each class of the extracted feature frames. For example, the classification unit 130 may perform the initialization for each class using Equation 1 below.

Thereafter, the classification unit 130 may classify the frames of the video in consideration of the Balanced Maximal Margin Evidence Rate problem, based on each class of the feature frames. The process of classifying the frames of a video is described with reference to Equations 2 and 3 below.

Referring to Equations 2 and 3, the classification unit 130 may convert the Balanced Maximal Margin Evidence Rate problem of Equation 2 into a simple margin problem as in Equation 3. At this time, the classification unit 130 may initialize […] to […].

Based on the converted simple margin problem, the classification unit 130 may initialize the frames of negative video data to negative frames (-1) and the frames of positive video data to positive frames (+1). As simulated annealing proceeds, the labels of the frames of positive video data change and converge to an optimal solution, whereas the frames of negative video data remain fixed at -1.
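The initialization just described can be sketched in a few lines. The function names and the list-of-lists layout are hypothetical; the sketch only shows that every frame inherits its video's label and that only frames of positive videos are later eligible for a label change.

```python
def init_frame_labels(video_labels, frames_per_video):
    """Give every frame the label of its video: +1 for positive, -1 for negative."""
    return [[+1 if y == +1 else -1] * n
            for y, n in zip(video_labels, frames_per_video)]


def flippable(video_labels, frames_per_video):
    """List (video index, frame index) pairs whose label may change during annealing."""
    return [(v, f)
            for v, (y, n) in enumerate(zip(video_labels, frames_per_video))
            if y == +1
            for f in range(n)]


labels = init_frame_labels([+1, -1], [3, 2])
print(labels)                      # [[1, 1, 1], [-1, -1]]
print(flippable([+1, -1], [3, 2]))  # [(0, 0), (0, 1), (0, 2)]
```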

The classification unit 130 may derive the objective function value for the Balanced Maximal Margin Evidence Rate problem based on the smallest number of positive frames assigned to any positive video and on the optimal objective function value of the simple margin problem. The derivation of this objective function value is described with reference to Equations 4 and 5.

Referring to Equation 4, […] in Equation 4 denotes the set of […] for which […] is assigned '+1'. Once […] is given, all values of […] are determined, so the Balanced Maximal Margin Evidence Rate problem reduces to a simple margin problem with an RBF kernel. If the optimal objective function value of the simple margin problem is denoted […], the objective function value for the Balanced Maximal Margin Evidence Rate problem can be derived as in Equation 5. Here, […] denotes the number of positive frames assigned to a positive video, and […] denotes the smallest number of positive frames assigned to any positive video.
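The two quantities named here, the number of positive frames in each positive video and the smallest such number, can be computed as in the sketch below. Equation 5 itself is not reproduced in this text, so the sketch deliberately stops short of combining them into an objective value; the function name and data layout are hypothetical.

```python
def positive_frame_counts(frame_labels, video_labels):
    """Count +1 frames in each positive video and return (counts, smallest count).

    frame_labels: one list of +1/-1 frame labels per video (assumed layout).
    video_labels: one +1/-1 label per video.
    """
    counts = [sum(1 for y in frames if y == +1)
              for frames, v in zip(frame_labels, video_labels)
              if v == +1]
    return counts, min(counts, default=0)


frame_labels = [[+1, -1, +1], [+1, +1, +1, -1], [-1, -1]]
video_labels = [+1, +1, -1]
print(positive_frame_counts(frame_labels, video_labels))  # ([2, 3], 2)
```

Tracking the minimum is what lets a balance term reward solutions in which no positive video is left with very few positive frames.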

Once initialization is complete, the classification unit 130 may perform simulated annealing using parameters including a regularization constant for the margin error (C), a regularization constant for the maximum evidence rate (C'), a kernel parameter, an initial temperature (T), a temperature reduction rate, and a maximum number of inner iterations. Here, the regularization constant for the margin error (C) and the kernel parameter relate to the kernel SVM (Support Vector Machine), and the regularization constant for the maximum evidence rate (C') is a regularization constant for balancing positive instances, whose optimal value may be found through repeated trials.

The parameters for simulated annealing may be set, for example, to an initial temperature T = 100, a temperature reduction rate of 0.95, and a maximum number of inner iterations of 100. Here, the maximum number of inner iterations is the maximum number of iterations allowed at a given temperature, also called the temperature length, and may be set larger in some cases.
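To get a feel for the search budget these example values imply, the sketch below counts temperature levels under two assumptions that the text does not state: that the temperature is multiplied by the reduction rate after each level (geometric cooling), and that annealing stops once the temperature drops below a hypothetical floor `t_floor`.

```python
def count_temperature_levels(t0=100.0, alpha=0.95, t_floor=1.0):
    """Count how many temperature levels are visited before T falls below t_floor."""
    levels = 0
    t = t0
    while t >= t_floor:
        levels += 1
        t *= alpha  # geometric cooling (assumed form of the reduction rate)
    return levels


levels = count_temperature_levels()
evaluations = levels * 100  # 100 inner iterations per temperature level
print(levels, evaluations)  # 90 9000
```

Under these assumptions the search visits 90 temperature levels, or about 9,000 candidate label changes, which is why a larger temperature length directly increases the cost of training.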

The classification unit 130 may derive at least one optimal solution to the Balanced Maximal Margin Evidence Rate problem through the simulated annealing of Table 1 below, and classify the frames of the video based on the derived at least one optimal solution.

repeat
    […]
    select a random […] such that […]
    change the sign of […];
    […]
    if […] then […]
    else
        choose a random number between […]
        if […] then […]
        endif
    endif
until […]
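The overall shape of this procedure can be sketched as a standard simulated annealing loop over frame labels. This is an illustration under stated assumptions, not the procedure of Table 1: the `objective` callable stands in for solving the simple margin problem and is supplied by the caller, the objective is assumed to be minimized, the acceptance probability `exp(-delta / T)` is the usual Metropolis rule, and the stopping temperature `t_min` is hypothetical.

```python
import math
import random


def anneal_labels(labels, is_positive_video, objective,
                  T=100.0, alpha=0.95, inner_max=100, t_min=1.0, seed=0):
    """Flip labels of frames in positive videos to reduce `objective` (assumed to be minimized)."""
    rng = random.Random(seed)
    current = list(labels)
    cur_val = objective(current)
    best, best_val = list(current), cur_val
    # Only frames that belong to positive videos may change their label.
    candidates = [i for i, pos in enumerate(is_positive_video) if pos]
    while T > t_min and candidates:
        for _ in range(inner_max):
            i = rng.choice(candidates)
            current[i] = -current[i]          # change the sign of one label
            new_val = objective(current)
            delta = new_val - cur_val
            if delta <= 0 or rng.random() < math.exp(-delta / T):
                cur_val = new_val             # accept the move
                if cur_val < best_val:
                    best, best_val = list(current), cur_val
            else:
                current[i] = -current[i]      # reject: undo the flip
        T *= alpha                            # cool down (assumed geometric)
    return best, best_val


# Toy objective for demonstration only: mismatches against a hidden labelling.
target = [1, -1, 1, 1, -1, -1]
is_pos = [True, True, True, True, False, False]
init = [1, 1, 1, 1, -1, -1]


def toy(y):
    return sum(a != b for a, b in zip(y, target))


best, val = anneal_labels(init, is_pos, toy)
```

In the setting of the text, each call to `objective` would involve solving a kernel SVM for the current labelling, so the number of evaluations dominates the running time.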

The classification unit 130 may derive […] and […] as the at least one optimal solution. Here, […] is the label of […] under the final […] and consists of […], and […] may denote the dual optimal solution of the balanced maximal evidence rate problem for each class (k).

The classification unit 130 may derive the final output for the Balanced Maximal Margin Evidence Rate problem. The final output is described with reference to Equation 6.

Referring to Equation 6, the classification unit 130 may derive the intercept using the derived at least one optimal solution, and derive the final output for the Balanced Maximal Margin Evidence Rate problem.

Based on Table 1, the classification unit 130 may randomly select at least one frame of the positive video data and change its label. For example, if […], the classification unit 130 may add […] to […] (that is, change it to […]); conversely, if […], it may remove […] from […] (that is, change it to […]).

The classification unit 130 may calculate, for the positive video data, an effect value reflecting the improvement due to the changed label, and classify the frames of the video based on the calculated effect value. For example, the classification unit 130 may solve […] and compute […] to obtain the effect of changing the label of […]. If the solution improves, the search moves to […]; otherwise, it moves with probability […].

The classification unit 130 may classify the frames of the video based on the final output of the Balanced Maximal Margin Evidence Rate problem, and classify the video based on the classified frames. For example, the classification unit 130 may determine the class of a video based on the class into which the largest number of its frames were classified. The process of classifying the frames and the class of a video is described with reference to Equations 7 and 8.

Referring to Equation 7, the classification unit 130 may classify the frames of the video using the frame classifier of Equation 7, which is built from the final output of the Balanced Maximal Margin Evidence Rate problem in Equation 6.

The process by which the classification unit 130 determines the class of a video is described with reference to Equation 8.

Referring to Equation 8, the classification unit 130 may use the video classifier of Equation 8 to classify the class and category of new video data by majority voting, that is, based on the frame class that occurs most often in that video.
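Majority voting over per-frame classes can be sketched as follows. The function name and the use of string class names are illustrative assumptions; Equation 8 is not reproduced in this text, so the sketch shows only the voting rule described in the prose.

```python
from collections import Counter


def classify_video(frame_classes):
    """Return the class assigned to the largest number of frames in a video."""
    if not frame_classes:
        raise ValueError("a video needs at least one classified frame")
    return Counter(frame_classes).most_common(1)[0][0]


print(classify_video(["sports", "news", "sports", "sports", "news"]))  # sports
```

With `Counter.most_common`, ties are resolved in favor of the class that was encountered first; the text does not say how ties should be handled, so a real system would need an explicit rule.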

The video classification apparatus 100 may be run by a computer program that is stored in a medium and includes a sequence of instructions for classifying the frames of a video. When executed by a computing device, the computer program may include a sequence of instructions that cause it to receive a training dataset for video data, extract feature frames of the video data from the input training dataset, classify the frames of the video in consideration of the Balanced Maximal Margin Evidence Rate problem for the extracted feature frames and the training dataset, and classify the video based on the classified frames.

When training data is given in the form of classified videos, no frame-level classification information exists for those videos, so video classification techniques that rely on frame classification cannot be applied directly. Through the process described above, the video classification apparatus 100 can classify the category of a video using the VCMIL technique, which improves on the existing simple mi-MIL technique by taking the Balanced Maximal Margin Evidence Rate into account.

In addition, the video classification apparatus 100 may classify the category of a video using a balanced maximal evidence rate margin (BER, Balanced Evidence Rate) SVM, so that as many positive frames as possible are assigned to the positive videos while the positive frames are distributed evenly across them.

FIG. 3 is a flowchart of a method of classifying the frames of a video in a machine-learning-based video classification apparatus according to an embodiment of the present invention. The method shown in FIG. 3 includes steps that are processed in time series by the video classification apparatus 100 according to the embodiments shown in FIGS. 1 and 2. Therefore, even where details are omitted below, the description given for FIGS. 1 and 2 also applies to the method of classifying the frames of a video performed by the video classification apparatus 100.

In step S310, the video classification apparatus 100 may receive a training dataset for video data.

In step S320, the video classification apparatus 100 may extract feature frames of the video data from the input training dataset.

In step S330, the video classification apparatus 100 may classify the frames of the video in consideration of the Balanced Maximal Margin Evidence Rate problem for the extracted feature frames and the training dataset.

In step S340, the video classification apparatus 100 may classify the video based on the classified frames.

In the above description, steps S310 to S340 may be divided into additional steps or combined into fewer steps, depending on the implementation of the present invention. In addition, some steps may be omitted as necessary, and the order of the steps may be changed.

The method of classifying the frames of a video in the video classification apparatus described with reference to FIGS. 1 to 3 may also be implemented in the form of a computer program stored in a medium and executed by a computer, or in the form of a recording medium containing instructions executable by a computer.

A computer-readable medium may be any available medium that can be accessed by a computer, and includes volatile and nonvolatile media as well as removable and non-removable media. A computer-readable medium may also include a computer storage medium. Computer storage media include volatile and nonvolatile, removable and non-removable media implemented by any method or technology for storing information such as computer-readable instructions, data structures, program modules, or other data.

The above description of the present invention is for illustrative purposes, and those of ordinary skill in the art to which the present invention pertains will understand that it can be easily modified into other specific forms without changing the technical spirit or essential features of the present invention. Therefore, the embodiments described above should be understood as illustrative and non-limiting in all respects. For example, each component described as a single unit may be implemented in a distributed manner, and likewise, components described as distributed may be implemented in a combined form.

The scope of the present invention is defined by the claims below rather than by the detailed description, and all changes or modifications derived from the meaning and scope of the claims and their equivalents should be construed as falling within the scope of the present invention.

100: video classification device
110: input unit
120: extraction unit
130: classification unit

Claims

A device for classifying videos, comprising:
an input unit for receiving a training dataset for video data;
an extraction unit for extracting a feature frame of the video data from the input training dataset; and
a classification unit for classifying frames of the video in consideration of a Balanced Maximal Margin Evidence Rate problem for the extracted feature frame and the training dataset, and classifying the video based on the classified frames.

The device of claim 1,
wherein the extraction unit projects the training dataset onto a subspace of a test domain generated based on component values of a test dataset, and extracts the feature frame adapted to the test domain from the training dataset projected onto the subspace.

The device of claim 2,
wherein the training dataset includes at least one of training data of frames sampled from the video data at a predetermined time interval, a label and a class of the training data, and the number of classes of the frames, and
wherein the test dataset includes at least one of test data of frames sampled from the video data at a predetermined time interval and the number of frames of the test data.

The device of claim 2,
wherein the feature frame of the video data is a frame for which no label exists.

The device of claim 1,
wherein the classification unit performs initialization for each class of the extracted feature frame.

The device of claim 5,
wherein the classification unit converts the Balanced Maximal Margin Evidence Rate problem into a simple margin problem based on each class of the feature frame and, based on the converted simple margin problem, initializes frames of negative video data among the video data as negative frames and frames of positive video data as positive frames.

The device of claim 6,
wherein the classification unit derives an objective function value for the Balanced Maximal Margin Evidence Rate problem based on the smallest number of positive frames assigned among the positive video data and an optimal objective function value for the simple margin problem.

The device of claim 6,
wherein, when the initialization is completed, the classification unit performs simulated annealing using parameters including at least one of a regularization constant for a margin error, a regularization constant for a maximum evidence rate, a kernel parameter, an initial temperature, a temperature reduction rate, and a maximum number of inner iterations.

The device of claim 8,
wherein the classification unit derives at least one optimal solution to the Balanced Maximal Margin Evidence Rate problem through the simulated annealing and classifies the frames of the video based on the derived at least one optimal solution.

The device of claim 9,
wherein the classification unit randomly selects at least one frame from the positive video data and changes its label, calculates an improvement effect value resulting from the changed label for the positive video data, and classifies the frames of the video based on the calculated effect value.

The device of claim 1,
wherein the classification unit determines the class of the video based on the class into which the largest number of frames of the video data are classified.

A method for classifying a video in a video classification device, the method comprising:
receiving a training dataset for video data;
extracting a feature frame of the video data from the input training dataset;
classifying frames of the video in consideration of a Balanced Maximal Margin Evidence Rate problem for the extracted feature frame and the training dataset; and
classifying the video based on the classified frames.

The method of claim 12,
wherein extracting the feature frame of the video data comprises:
projecting the training dataset onto a subspace of a test domain generated based on component values of a test dataset; and
extracting the feature frame adapted to the test domain from the training dataset projected onto the subspace.

The method of claim 12,
wherein classifying the frames of the video comprises
performing initialization for each class of the extracted feature frame.

The method of claim 14,
wherein classifying the frames of the video comprises:
converting the Balanced Maximal Margin Evidence Rate problem into a simple margin problem based on each class of the feature frame; and
initializing, based on the converted simple margin problem, frames of negative video data among the video data as negative frames and frames of positive video data as positive frames.

The method of claim 15,
wherein classifying the frames of the video comprises
deriving an objective function value for the Balanced Maximal Margin Evidence Rate problem based on the smallest number of positive frames assigned among the positive video data and an optimal objective function value for the simple margin problem.

The method of claim 15,
wherein classifying the frames of the video comprises
performing, when the initialization is completed, simulated annealing using parameters including at least one of a regularization constant for a margin error, a regularization constant for a maximum evidence rate, a kernel parameter, an initial temperature, a temperature reduction rate, and a maximum number of inner iterations.

The method of claim 17,
wherein classifying the frames of the video comprises:
deriving at least one optimal solution to the Balanced Maximal Margin Evidence Rate problem through the simulated annealing; and
classifying the frames of the video based on the derived at least one optimal solution.

The method of claim 12,
wherein classifying the video comprises
determining the class of the video based on the class into which the largest number of frames of the video data are classified.

A computer program stored in a computer-readable medium and comprising a sequence of instructions for classifying a video,
wherein the sequence of instructions, when executed by a computing device, causes the computing device to:
receive a training dataset for video data;
extract a feature frame of the video data from the input training dataset;
classify frames of the video in consideration of a Balanced Maximal Margin Evidence Rate problem for the extracted feature frame and the training dataset; and
classify the video based on the classified frames.
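
The simulated annealing recited in the claims (an initial temperature, a temperature reduction rate, a maximum number of inner iterations, and random label changes on frames of positive video data) can be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not the claimed implementation: the margin-based objective is replaced by a caller-supplied `objective` function, acceptance uses the standard Metropolis rule, and the names `anneal_labels`, `cooling_rate`, `min_temperature`, and `seed` are hypothetical.

```python
import math
import random
from typing import Callable, List


def anneal_labels(
    labels: List[int],
    objective: Callable[[List[int]], float],
    initial_temperature: float = 1.0,
    cooling_rate: float = 0.9,       # temperature reduction rate
    inner_iterations: int = 50,      # maximum inner iterations per temperature
    min_temperature: float = 1e-3,   # stopping threshold (assumption)
    seed: int = 0,
) -> List[int]:
    """Minimize `objective` over +1/-1 frame labels by random single-label flips.

    `labels` are the initial labels of frames from positive videos.
    Returns the best labeling found.
    """
    if not labels:
        return []
    rng = random.Random(seed)
    current = list(labels)
    current_value = objective(current)
    best, best_value = list(current), current_value
    temperature = initial_temperature
    while temperature > min_temperature:
        for _ in range(inner_iterations):
            i = rng.randrange(len(current))
            current[i] = -current[i]  # flip one randomly chosen label
            candidate_value = objective(current)
            improvement = current_value - candidate_value
            # Metropolis rule: always accept improvements,
            # accept worse moves with probability exp(improvement / T)
            if improvement >= 0 or rng.random() < math.exp(improvement / temperature):
                current_value = candidate_value
                if current_value < best_value:
                    best, best_value = list(current), current_value
            else:
                current[i] = -current[i]  # reject: undo the flip
        temperature *= cooling_rate  # cool down
    return best
```

In practice, `objective` would evaluate the margin-based problem for a given labeling, for example by training a classifier with the regularization constants and kernel parameter listed in the claims; the sketch leaves that part to the caller.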