KR20210064587A

KR20210064587A - High speed split device and method for video section

Info

Publication number: KR20210064587A
Application number: KR1020190152975A
Authority: KR
Inventors: 강제원; 박종재; 추성권; 하성종
Original assignee: 주식회사 엔씨소프트; 이화여자대학교 산학협력단
Priority date: 2019-11-26
Filing date: 2019-11-26
Publication date: 2021-06-03

Abstract

According to one embodiment of the present invention, provided is a video section high-speed segmentation device, which includes at least one processor. The at least one processor determines I frames in the video, and extracts visual features for each of the determined I frames.

Description

High-speed video segmentation device and high-speed segmentation method {HIGH SPEED SPLIT DEVICE AND METHOD FOR VIDEO SECTION}

The following embodiments relate to a high-speed video section segmentation device and a high-speed segmentation method.

A codec is hardware or software that can encode, decode, or both encode and decode a data stream or signal. In general, the term refers to technology for compressing media information such as video and audio.

Frames are broadly divided into independently encoded frames and predicted frames. The former are I-frames (intra-coded frames); the latter are further divided into P-frames (predictive-coded frames) and B-frames (bidirectional-coded frames). An I-frame can appear anywhere in the data stream, is used for random access to the data, and is coded without reference to other images. A P-frame uses information from the preceding I-frame and the preceding P-frame when it is encoded and decoded. A B-frame uses both the preceding and the following I-frames and P-frames when it is encoded and decoded.

Machine learning is a field of artificial intelligence that evolved from the study of pattern recognition and computational learning theory, and refers to the field that develops algorithms and techniques that allow computers to learn.

Deep learning is defined as a set of machine learning algorithms that attempt high-level abstraction through a combination of several nonlinear transformation techniques. Broadly speaking, it is a field of machine learning that teaches computers the human way of thinking.

According to an embodiment of the present invention, a high-speed video section segmentation device and a high-speed segmentation method capable of segmenting a video into sections at high speed based on visual features for each of the I frames can be provided.

According to another embodiment of the present invention, a high-speed video section segmentation device and a high-speed segmentation method capable of segmenting a video into sections at high speed based on language features for each of the I frames can be provided.

According to another embodiment of the present invention, a high-speed video section segmentation device and a high-speed segmentation method capable of segmenting a video into sections at high speed based on visual features and language features for each of the I frames can be provided.

According to another embodiment of the present invention, a high-speed video section segmentation device and a high-speed segmentation method capable of segmenting a video into sections at high speed based on visual features, language features, and the data unit length for each of the I frames can be provided.

According to another embodiment of the present invention, a high-speed video section segmentation device and a high-speed segmentation method capable of segmenting a video into sections at high speed based on visual features and the data unit length for each of the I frames can be provided.

According to another embodiment of the present invention, a high-speed video section segmentation device and a high-speed segmentation method capable of segmenting a video into sections at high speed based on language features and the data unit length for each of the I frames can be provided.

According to one embodiment of the present invention, a high-speed video section segmentation device includes at least one processor, and the at least one processor determines I frames in a video and extracts visual features for each of the determined I frames.

The at least one processor may also divide the video into sections based on the extracted visual features.

The at least one processor may also generate an image caption for each of the determined I frames and extract language features from the generated image captions.

The at least one processor may also divide the video into sections based on the extracted visual features or the extracted language features.

The at least one processor may also calculate the data unit length for each of the determined I frames.

The at least one processor may also divide the video into sections based on the extracted visual features, the extracted language features, or the calculated data unit length.

The at least one processor may also divide the video into sections based on the extracted visual features or the calculated data unit length.

According to another embodiment of the present invention, a high-speed video section segmentation device includes at least one processor, and the at least one processor determines I frames in a video, generates an image caption for each of the determined I frames, and extracts language features from the generated image captions.

The at least one processor may also divide the video into sections based on the extracted language features.

The at least one processor may also calculate the data unit length for each of the determined I frames, and divide the video into sections based on the extracted language features or the calculated data unit length.

According to another embodiment of the present invention, a high-speed video section segmentation method includes an operation of determining I frames in a video and an operation of extracting visual features for each of the determined I frames.

The high-speed video section segmentation method may further include an operation of dividing the video into sections based on the extracted visual features.

The high-speed video section segmentation method may further include an operation of generating an image caption for each of the determined I frames and an operation of extracting language features from the generated image captions.

The high-speed video section segmentation method may further include an operation of dividing the video into sections based on the extracted visual features or the extracted language features.

The high-speed video section segmentation method may further include an operation of calculating the data unit length for each of the determined I frames.

The high-speed video section segmentation method may further include an operation of dividing the video into sections based on the extracted visual features, the extracted language features, or the calculated data unit length.

The high-speed video section segmentation method may further include an operation of dividing the video into sections based on the extracted visual features or the calculated data unit length.

According to another embodiment of the present invention, a high-speed video section segmentation method includes an operation of determining I frames in a video, an operation of generating an image caption for each of the determined I frames, and an operation of extracting language features from the generated image captions.

The high-speed video section segmentation method may further include an operation of dividing the video into sections based on the extracted language features.

The high-speed video section segmentation method may further include an operation of calculating the data unit length for each of the determined I frames, and an operation of dividing the video into sections based on the extracted language features or the calculated data unit length.

According to one embodiment of the present invention, a video can be segmented into sections at high speed based on visual features for each of the I frames.

A video can also be segmented into sections at high speed based on language features for each of the I frames.

A video can also be segmented into sections at high speed based on visual features and language features for each of the I frames.

A video can also be segmented into sections at high speed based on visual features, language features, and the data unit length for each of the I frames.

A video can also be segmented into sections at high speed based on visual features and the data unit length for each of the I frames.

A video can also be segmented into sections at high speed based on language features and the data unit length for each of the I frames.

FIG. 1 is a diagram illustrating the configuration of a high-speed video section segmentation device according to one embodiment.
FIG. 2 is a flowchart illustrating a high-speed video section segmentation method according to one embodiment.
FIG. 3 is a flowchart illustrating a high-speed video section segmentation method according to another embodiment.
FIG. 4 is a diagram illustrating high-speed segmentation of a video into sections based on visual features for each of the I frames according to one embodiment.
FIG. 5 is a diagram illustrating high-speed segmentation of a video into sections based on visual features and language features for each of the I frames according to another embodiment.
FIG. 6 is a diagram illustrating a video section and the frames constituting the video.

The specific structural or functional descriptions of the embodiments according to the concept of the present invention disclosed in this specification are given only for the purpose of explaining those embodiments. The embodiments according to the concept of the present invention may be implemented in various forms and are not limited to the embodiments described herein.

Because the embodiments according to the concept of the present invention may be modified in various ways and may take various forms, the embodiments are illustrated in the drawings and described in detail in this specification. This is not intended to limit the embodiments according to the concept of the present invention to the specific forms disclosed; they include all modifications, equivalents, and substitutes falling within the spirit and technical scope of the present invention.

Terms such as "first" and "second" may be used to describe various elements, but the elements should not be limited by these terms. The terms are used only to distinguish one element from another. For example, without departing from the scope of rights according to the concept of the present invention, a first element may be called a second element, and similarly a second element may be called a first element.

When an element is described as being "connected" or "coupled" to another element, it may be directly connected or coupled to that other element, but it should be understood that other elements may exist in between. In contrast, when an element is described as being "directly connected" or "directly coupled" to another element, it should be understood that no other element exists in between. Other expressions describing the relationship between elements, such as "between" and "directly between" or "adjacent to" and "directly adjacent to", should be interpreted in the same way.

The terms used in this specification are used only to describe specific embodiments and are not intended to limit the present invention. A singular expression includes the plural unless the context clearly indicates otherwise.

In this specification, terms such as "include" or "have" are intended to indicate that the described features, numbers, steps, operations, elements, parts, or combinations thereof exist, and should be understood not to preclude in advance the existence or possible addition of one or more other features, numbers, steps, operations, elements, parts, or combinations thereof.

Unless defined otherwise, all terms used herein, including technical and scientific terms, have the same meaning as commonly understood by a person of ordinary skill in the art to which the present invention belongs.

Terms such as those defined in commonly used dictionaries should be interpreted as having meanings consistent with their meanings in the context of the related art, and are not to be interpreted in an idealized or excessively formal sense unless expressly so defined in this specification.

In the following description, the same reference symbols denote the same elements, and unnecessary redundant descriptions and descriptions of well-known technology are omitted.

In the embodiments of the present invention, "communication", "communication network", and "network" may be used with the same meaning. The three terms refer to wired or wireless short-range and wide-area data networks capable of transmitting and receiving files between a user terminal, the terminals of other users, and a download server.

Hereinafter, the present invention is described in detail by describing preferred embodiments of the present invention with reference to the accompanying drawings.

FIG. 1 is a diagram illustrating the configuration of a high-speed video section segmentation device according to one embodiment.

Referring to FIG. 1, the high-speed video section segmentation device 100 includes a processor 110, an input/output interface module 120, and a memory 130.

The processor 110, the input/output interface module 120, and the memory 130 of the high-speed video section segmentation device 100 are interconnected and can transmit data to one another.

The processor 110 may execute programs or instructions stored in the memory 130. The memory 130 may store an operating program (e.g., an OS) for operating the high-speed video section segmentation device 100.

The processor 110 may execute a program for managing information about the high-speed video section segmentation device 100.

The processor 110 may execute a program for managing the operation of the high-speed video section segmentation device 100.

The processor 110 may execute a program for managing the operation of the input/output interface module 120.

The processor 110 may acquire a video (a compressed video stream or compressed bitstream) through the input/output interface module 120.

The processor 110 may determine the I frames, P frames, or B frames in the acquired video.

The processor 110 may select and decode only the I frames from among the determined frames.
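The frame selection step described above can be sketched as follows. This is a minimal illustration, not the implementation disclosed in the specification: the `Frame` record and the `select_i_frames` helper are hypothetical names, and the sketch assumes that a bitstream parser has already labeled each frame as I, P, or B.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Frame:
    """Hypothetical record for one parsed frame of a compressed stream."""
    index: int      # position of the frame in the stream
    kind: str       # "I", "P", or "B", as reported by an assumed parser
    payload: bytes  # compressed data unit of the frame


def select_i_frames(frames: List[Frame]) -> List[Frame]:
    """Keep only the I frames; P and B frames are skipped without decoding."""
    return [frame for frame in frames if frame.kind == "I"]


# Toy stream with hypothetical payload sizes.
stream = [
    Frame(0, "I", b"\x00" * 900),
    Frame(1, "P", b"\x00" * 120),
    Frame(2, "B", b"\x00" * 60),
    Frame(3, "I", b"\x00" * 950),
]
i_frames = select_i_frames(stream)
print([frame.index for frame in i_frames])
```

Because I frames are coded without reference to other images, only the selected frames need to be decoded, which is where the speed-up over decoding every frame comes from.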

i) Visual features

The processor 110 may extract visual features for each of the decoded I frames. The processor 110 may use a trained deep neural network to extract the visual features for each of the decoded I frames, but the means that the processor 110 may use for this extraction are not limited thereto.

The processor 110 may divide the video into sections based on the visual features extracted from each of the decoded I frames.

The processor 110 may set a point at which the visual features change abruptly as a reference point for dividing the video.

The processor 110 may set the span between one reference point and the next as a section in which the visual features are the same or similar.

The processor 110 may divide the video into a plurality of sections based on the sections thus set.
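One way to picture the reference-point logic is the sketch below. It rests on assumptions that the specification does not state: each I frame's visual features are taken to be a plain numeric vector, "changes abruptly" is modeled as the cosine distance between consecutive vectors exceeding a hypothetical `threshold`, and the function names are illustrative only.

```python
import math
from typing import List, Sequence, Tuple


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Return 1 - cosine similarity; 0 means same direction, 1 means orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0  # treat an all-zero vector as maximally different
    return 1.0 - dot / (norm_a * norm_b)


def find_reference_points(features: Sequence[Sequence[float]],
                          threshold: float = 0.5) -> List[int]:
    """Indices of I frames whose features differ sharply from the previous I frame."""
    return [i for i in range(1, len(features))
            if cosine_distance(features[i - 1], features[i]) > threshold]


def split_into_sections(count: int,
                        reference_points: Sequence[int]) -> List[Tuple[int, int]]:
    """Turn reference points into half-open (start, end) ranges over I-frame indices."""
    bounds = [0] + list(reference_points) + [count]
    return [(bounds[k], bounds[k + 1])
            for k in range(len(bounds) - 1) if bounds[k] < bounds[k + 1]]


# Toy feature vectors: the first two frames are alike, then the content changes.
features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
points = find_reference_points(features)
print(points, split_into_sections(len(features), points))
```

In this toy input the only large jump is between the second and third I frames, so the sketch yields a single reference point and two sections. The distance measure and threshold are design choices that would need tuning for real feature vectors.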

ii) Language features

The processor 110 may generate an image caption for each of the decoded I frames by applying image captioning to each of them.

The processor 110 may process the image corresponding to each of the decoded I frames using at least one trained deep neural network (e.g., a convolutional neural network, CNN), and may generate a sentence (image caption) for each processed image using at least one other trained deep neural network (e.g., a recurrent neural network, RNN) that is different from the network that processed the image.

The processor 110 may use a trained deep neural network to extract language features from each sentence (image caption) generated for the processed images. However, the means that the processor 110 may use to extract the language features from each generated sentence (image caption) are not limited thereto.

The processor 110 may divide the video into sections based on the language features extracted from each sentence (image caption) generated for the processed images.

The processor 110 may set a point at which the language features change abruptly as a reference point for dividing the video.

The processor 110 may set the span between one reference point and the next as a section in which the language features are the same or similar.
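The caption-based variant can be illustrated with a deliberately simple stand-in. The specification mentions a trained deep neural network for language feature extraction without limiting it to that; the sketch below instead uses a bag-of-words token set per caption and the Jaccard distance between consecutive captions. The captions, function names, and `threshold` are all hypothetical.

```python
from typing import List, Sequence, Set


def caption_tokens(caption: str) -> Set[str]:
    """Bag-of-words stand-in for language features: the set of lowercase tokens."""
    return set(caption.lower().split())


def jaccard_distance(a: Set[str], b: Set[str]) -> float:
    """Return 1 - |a & b| / |a | b|; 0 for identical token sets, 1 for disjoint ones."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def caption_reference_points(captions: Sequence[str],
                             threshold: float = 0.7) -> List[int]:
    """Indices of I frames whose caption differs sharply from the previous caption."""
    tokens = [caption_tokens(caption) for caption in captions]
    return [i for i in range(1, len(tokens))
            if jaccard_distance(tokens[i - 1], tokens[i]) > threshold]


# Hypothetical captions, one per decoded I frame.
captions = [
    "a man rides a horse",
    "a man rides a brown horse",
    "two players on a soccer field",
    "players run on a soccer field",
]
print(caption_reference_points(captions))
```

Here the topic shifts from a horse rider to a soccer field at the third caption, which is the only place where the token overlap drops below the threshold. A learned sentence embedding could replace the token sets without changing the surrounding logic.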

iii) Visual features or language features

The processor 110 may divide the video into a plurality of sections based on the extracted visual features or the extracted language features.

To divide the video into a plurality of sections based on the extracted visual features or the extracted language features, the processor 110 may assign a weight to at least one of the extracted visual features and the extracted language features.

The processor 110 may set a point at which the extracted visual features and the extracted language features change abruptly at the same time as a reference point for dividing the video.

The processor 110 may set a point at which the weighted one of the extracted visual features and the extracted language features changes abruptly as a reference point for dividing the video.

The processor 110 may set the span between one reference point and the next as a section in which the extracted visual features or the extracted language features are the same or similar.
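The specification does not say how the weights are applied, so the following is only one possible reading: each cue is reduced to a per-transition change score in the range 0 to 1, and the scores are combined as a weighted sum that is compared with a threshold. A second helper models the "change abruptly at the same time" rule. The names `weighted_reference_points` and `simultaneous_reference_points`, the weights, and the thresholds are assumptions made for illustration.

```python
from typing import List, Sequence


def weighted_reference_points(visual_change: Sequence[float],
                              language_change: Sequence[float],
                              w_visual: float = 0.5,
                              w_language: float = 0.5,
                              threshold: float = 0.5) -> List[int]:
    """Reference points where the weighted sum of both change scores is large.

    Entry i of each sequence is the change between I frame i and I frame i + 1,
    so a hit at position i places the reference point at I frame i + 1.
    """
    if len(visual_change) != len(language_change):
        raise ValueError("both change sequences must have the same length")
    points = []
    for i, (v, l) in enumerate(zip(visual_change, language_change)):
        if w_visual * v + w_language * l > threshold:
            points.append(i + 1)
    return points


def simultaneous_reference_points(visual_change: Sequence[float],
                                  language_change: Sequence[float],
                                  threshold: float = 0.5) -> List[int]:
    """Reference points where both cues change sharply at the same transition."""
    return [i + 1
            for i, (v, l) in enumerate(zip(visual_change, language_change))
            if v > threshold and l > threshold]


# Hypothetical change scores for three transitions between four I frames.
visual = [0.1, 0.9, 0.2]
language = [0.2, 0.8, 0.9]
print(weighted_reference_points(visual, language))
print(weighted_reference_points(visual, language, w_visual=0.8, w_language=0.2))
print(simultaneous_reference_points(visual, language))
```

With equal weights, the last transition is flagged because the caption change alone is strong enough; weighting the visual cue more heavily suppresses it, which matches the result of requiring both cues to change together.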

iv) Visual features, language features, or data unit length

The processor 110 may calculate the data unit length for each of the decoded I frames.

The processor 110 may divide the video into a plurality of sections based on the extracted visual features, the extracted language features, or the calculated data unit length.

To divide the video into a plurality of sections based on the extracted visual features, the extracted language features, or the calculated data unit length, the processor 110 may assign a weight to at least one of the extracted visual features, the extracted language features, and the calculated data unit length.

The processor 110 may set, as a reference point for dividing the video, a point at which the extracted visual features and the extracted language features change abruptly at the same time, or a point at which the calculated data unit length increases sharply.

The processor 110 may set, as a reference point for dividing the video, a point at which the weighted one of the extracted visual features, the extracted language features, and the calculated data unit length changes abruptly.
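The data unit length cue can be sketched without decoding any pixels. The specification only says that the length "increases sharply"; the ratio test below, the `ratio` parameter, and the sample byte counts are assumptions chosen for illustration.

```python
from typing import List, Sequence


def length_reference_points(lengths: Sequence[int], ratio: float = 1.5) -> List[int]:
    """Indices of I frames whose data unit is much longer than the previous one.

    `lengths[i]` is the size in bytes of the data unit of I frame i. A frame is
    flagged when its length exceeds the previous length by more than `ratio`.
    """
    points = []
    for i in range(1, len(lengths)):
        previous = lengths[i - 1]
        if previous > 0 and lengths[i] / previous > ratio:
            points.append(i)
    return points


# Hypothetical data unit lengths in bytes for five consecutive I frames.
lengths = [1000, 1050, 2400, 2300, 2350]
print(length_reference_points(lengths))
```

Only the jump from 1050 to 2400 bytes exceeds the ratio, so a single reference point is reported at the third I frame. A sliding average instead of the immediately preceding frame would be a reasonable alternative if single-frame noise is a concern.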

v) Visual features or data unit length

The processor 110 may divide the video into a plurality of sections based on the extracted visual features or the calculated data unit length.

To divide the video into a plurality of sections based on the extracted visual features or the calculated data unit length, the processor 110 may assign a weight to at least one of the extracted visual features and the calculated data unit length.

The processor 110 may set, as a reference point for dividing the video, a point at which the extracted visual features change abruptly or a point at which the calculated data unit length increases sharply.

The processor 110 may set, as a reference point for dividing the video, a point at which whichever of the extracted visual features or the calculated data unit length has been assigned a weight changes abruptly.

The processor 110 may set the span between one set reference point and the next as a section in which the extracted visual features are the same or similar.
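The reference-point rule described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the description does not specify a distance measure or thresholds, so the cosine distance, the names `cosine_distance` and `find_reference_points`, and the parameters `dist_threshold` and `growth_ratio` are all assumptions.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm if norm else 0.0


def find_reference_points(features, unit_lengths, dist_threshold=0.5, growth_ratio=2.0):
    """Return indices (>= 1) of I frames at which a new section starts.

    features[i] is the visual feature vector of the i-th I frame and
    unit_lengths[i] is its data unit length. Both thresholds are assumed values.
    """
    points = []
    for i in range(1, len(features)):
        # Abrupt change in visual features between consecutive I frames.
        feature_jump = cosine_distance(features[i - 1], features[i]) > dist_threshold
        # Sharp increase of the data unit length relative to the previous I frame.
        length_jump = unit_lengths[i] > growth_ratio * unit_lengths[i - 1]
        if feature_jump or length_jump:
            points.append(i)
    return points
```

Raising or lowering one of the two thresholds plays the role of weighting one criterion over the other.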

vi) Language features or data unit length

The processor 110 may divide the video into a plurality of sections based on the extracted language features or the calculated data unit length.

To divide the video into a plurality of sections based on the extracted language features or the calculated data unit length, the processor 110 may assign a weight to at least one of the extracted language features and the calculated data unit length.

The processor 110 may set, as a reference point for dividing the video, a point at which the extracted language features change abruptly or a point at which the calculated data unit length increases sharply.

The processor 110 may set, as a reference point for dividing the video, a point at which whichever of the extracted language features or the calculated data unit length has been assigned a weight changes abruptly.

The processor 110 may set the span between one set reference point and the next as a section in which the extracted language features are the same or similar.

The processor 110 may output, through the input/output interface module 120, the plurality of sections into which the video has been divided.

The input/output interface module 120 may be connected to an external device (for example, a server) through a network.

The input/output interface module 120 may obtain data from an external device.

The input/output interface module 120 may obtain a video (compressed video stream, compressed bitstream) from an external device.

The input/output interface module 120 may obtain a user's input.

The input/output interface module 120 may output the plurality of sections of the video divided by the processor 110.

The input/output interface module 120 may be provided integrally with the video section high-speed segmentation apparatus 100.

The input/output interface module 120 may be provided separately from the video section high-speed segmentation apparatus 100.

The input/output interface module 120 may be a separate device that is communicatively connected to the video section high-speed segmentation apparatus 100.

The input/output interface module 120 may include a port (for example, a USB port) for connecting to an external device.

The input/output interface module 120 may include a monitor, a touchscreen, a mouse, an electronic pen, a microphone, a keyboard, a speaker, earphones, headphones, or a touchpad.

The memory 130 may store data obtained through the input/output interface module 120.

The memory 130 may store a video (compressed video stream, compressed bitstream) obtained through the input/output interface module 120.

The memory 130 may store the frames identified by the processor 110.

The memory 130 may store the I frames decoded by the processor 110.

The memory 130 may store the reference points set by the processor 110.

The memory 130 may store the visual features extracted by the processor 110.

The memory 130 may store the sections of the video divided by the processor 110 based on the visual features.

The memory 130 may store the language features extracted by the processor 110.

The memory 130 may store the image captions generated by the processor 110.

The memory 130 may store the images processed by the processor 110.

The memory 130 may store the sentences (image captions) generated by the processor 110.

The memory 130 may store the weights assigned by the processor 110.

The memory 130 may store the data unit lengths calculated by the processor 110.

The term "module" as used herein denotes a logical structural unit, and it is obvious to those skilled in the art to which the present invention pertains that a module is not necessarily a physically separate component.

FIG. 2 is a flowchart illustrating a video section high-speed segmentation method according to an embodiment.

Referring to FIG. 2, the video section high-speed segmentation apparatus identifies the I frames in a video (200).

Here, the video section high-speed segmentation apparatus may decode only the I frames identified in the video.
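The description does not name a codec. As a minimal sketch, assuming an H.264 Annex B byte stream (NAL units separated by start codes, each with a one-byte NAL header), I-frame candidates can be located without decoding any picture data by reading `nal_unit_type` from each header, where type 5 marks a slice of an IDR picture. The function names are illustrative. The sketch omits non-IDR I slices (type 1), which would require parsing the slice header, and it treats each NAL unit separately even though a frame coded as several slices spans several units. Lengths are counted as stored, including any emulation-prevention bytes.

```python
START_CODE = b"\x00\x00\x01"
NAL_TYPE_IDR = 5  # H.264 nal_unit_type for a coded slice of an IDR picture


def iter_nal_units(stream: bytes):
    """Yield (nal_unit_type, data_unit_length) for each NAL unit in the stream."""
    starts = []
    pos = stream.find(START_CODE)
    while pos != -1:
        starts.append(pos + len(START_CODE))  # first byte after the start code
        pos = stream.find(START_CODE, pos + len(START_CODE))
    for k, begin in enumerate(starts):
        end = starts[k + 1] - len(START_CODE) if k + 1 < len(starts) else len(stream)
        # Drop zero bytes that belong to the next 4-byte start code or to padding.
        while end > begin and stream[end - 1] == 0:
            end -= 1
        if end <= begin:
            continue
        nal_unit_type = stream[begin] & 0x1F  # low 5 bits of the one-byte header
        yield nal_unit_type, end - begin - 1  # length excluding the header byte


def idr_unit_lengths(stream: bytes):
    """Return the data unit length of every IDR NAL unit, in stream order."""
    return [length for kind, length in iter_nal_units(stream) if kind == NAL_TYPE_IDR]
```

Because only headers are inspected, the same pass yields both the I-frame positions and the data unit lengths used later in the method.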

The video section high-speed segmentation apparatus extracts visual features for each of the identified I frames (210).

Here, the apparatus may use a trained deep neural network to extract the visual features for each of the identified I frames.

The video section high-speed segmentation apparatus extracts language features for each of the identified I frames (220).

Here, the apparatus may process the image corresponding to each decoded I frame using at least one trained convolutional neural network (CNN).

The apparatus may also generate a sentence (image caption) for each processed image using a recurrent neural network (RNN).

The apparatus may also use a trained deep neural network to extract language features from each sentence (image caption) generated for each processed image.

The video section high-speed segmentation apparatus calculates the data unit length for each of the identified I frames (230).

The video section high-speed segmentation apparatus divides the video into sections based on the extracted visual features, the extracted language features, or the calculated data unit length (240).

Here, to divide the video into a plurality of sections based on the extracted visual features, the extracted language features, or the calculated data unit length, the apparatus may assign a weight to at least one of the extracted visual features, the extracted language features, and the calculated data unit length.

The apparatus may also set, as a reference point for dividing the video, a point at which the extracted visual features or the extracted language features change abruptly, or a point at which the calculated data unit length increases sharply.

The apparatus may also set the span between one set reference point and the next as one section.

The apparatus may also divide the video into a plurality of sections based on the sections thus set.
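Turning reference points into sections, as described above, can be sketched as follows. The function name `split_into_sections` and the convention of half-open index ranges over the sequence of I frames are assumptions made for illustration.

```python
def split_into_sections(num_frames, reference_points):
    """Return half-open (start, end) I-frame index ranges, one per section.

    reference_points holds the indices of the I frames that begin a new section.
    Out-of-range and duplicate indices are ignored.
    """
    if num_frames <= 0:
        return []
    inner = sorted({p for p in reference_points if 0 < p < num_frames})
    bounds = [0] + inner + [num_frames]
    # Each pair of neighbouring bounds delimits one section.
    return [(bounds[k], bounds[k + 1]) for k in range(len(bounds) - 1)]
```

With eight I frames and reference points at indices 2 and 3, the result is three sections covering frames 0 to 1, frame 2, and frames 3 to 7.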

The video section high-speed segmentation apparatus may output the plurality of sections into which the video has been divided (250).

Here, the apparatus may output the images corresponding to decoded I frames in order to indicate the divided sections.

Although FIG. 2 illustrates operations 210, 220, and 230 as being performed sequentially, they may be performed simultaneously, in parallel, or in an overlapping manner, and operation 210 may be performed after operations 220 and 230. Operation 220 may also be performed after operation 230.

Also, FIG. 2 illustrates the video being divided into sections based on the results of operations 210, 220, and 230 after those operations have completed. According to another embodiment, however, the video may be divided by combining the results of at least any two of operations 210, 220, and 230, and according to yet another embodiment, the video may be divided based on the result of each of operations 210, 220, and 230 individually.
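Since operations 210, 220, and 230 do not depend on one another, they can be dispatched concurrently. The following is a minimal sketch using Python's standard thread pool. The three worker functions are hypothetical placeholders that stand in for the neural networks and the bitstream parser, and they return dummy values only so that the example runs.

```python
from concurrent.futures import ThreadPoolExecutor


def extract_visual_features(frames):
    """Placeholder for operation 210: one dummy number per frame."""
    return [sum(frame) for frame in frames]


def extract_language_features(frames):
    """Placeholder for operation 220: one dummy caption per frame."""
    return [f"caption-{len(frame)}" for frame in frames]


def compute_unit_lengths(frames):
    """Placeholder for operation 230: byte length of each frame."""
    return [len(frame) for frame in frames]


def run_operations_concurrently(frames):
    """Run the three independent operations in parallel and collect their results."""
    with ThreadPoolExecutor(max_workers=3) as pool:
        visual = pool.submit(extract_visual_features, frames)
        language = pool.submit(extract_language_features, frames)
        lengths = pool.submit(compute_unit_lengths, frames)
        # result() blocks until each operation has finished (operation 240 needs all three).
        return visual.result(), language.result(), lengths.result()
```

For CPU-bound feature extraction a process pool or a GPU batch would likely be more suitable; threads are used here only to keep the sketch short.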

FIG. 3 is a flowchart illustrating a video section high-speed segmentation method according to another embodiment.

Referring to FIG. 3, the video section high-speed segmentation apparatus identifies the I frames in a video (300).

The video section high-speed segmentation apparatus generates an image caption for each of the identified I frames (310).

The video section high-speed segmentation apparatus extracts language features from the generated sentences (image captions) (320).

Here, the apparatus may use a trained deep neural network to extract language features from each sentence (image caption) generated for each processed image.

The video section high-speed segmentation apparatus calculates the data unit length for each of the identified I frames (330).

The video section high-speed segmentation apparatus divides the video into sections based on the extracted language features or the calculated data unit length (340).

Here, to divide the video into a plurality of sections based on the extracted language features or the calculated data unit length, the apparatus may assign a weight to at least one of the extracted language features and the calculated data unit length.

The apparatus may also set, as a reference point for dividing the video, a point at which the extracted language features change abruptly or a point at which the calculated data unit length increases sharply.

The video section high-speed segmentation apparatus may output the plurality of sections into which the video has been divided (350).

Although FIG. 3 illustrates operations 300 to 320 as being performed before operation 330, operations 300 to 320 and operation 330 may be performed simultaneously, in parallel, or in an overlapping manner, and operations 300 to 320 may be performed after operation 330.

Also, FIG. 3 illustrates the video being divided into sections based on the results of operations 320 and 330 after operations 300 to 330 have completed. According to another embodiment, however, the video may be divided based on the result of each of operations 320 and 330 individually.

FIG. 4 is a diagram illustrating high-speed division of a video into sections based on the visual features of each I frame, according to an embodiment.

Referring to FIG. 4, the video section high-speed segmentation apparatus may identify the frames (400, 410, 420, 430, 440, 450, 460, 470) constituting an acquired video.

The apparatus may identify, in the acquired video, the I frames (400, 420, 440, 460), which according to an embodiment occur once every 32 frames.

To divide the acquired video into sections at high speed, the apparatus may decode only the identified I frames (400, 420, 440, 460).

The apparatus may extract visual features for each of the decoded I frames (400, 420, 440, 460) using a trained deep neural network.

The apparatus may divide the acquired video into sections at high speed based on the visual features extracted for each of the I frames (400, 420, 440, 460).

The apparatus may output the images corresponding to decoded I frames in order to indicate the divided sections.

According to an embodiment, in the case of FIG. 4, the apparatus may output the images corresponding to I frames 400, 420, and 460 among the decoded I frames in order to indicate the divided sections.

FIG. 5 is a diagram illustrating high-speed division of a video into sections based on the visual features and language features of each I frame, according to another embodiment.

Referring to FIG. 5, the video section high-speed segmentation apparatus may identify the frames (500, 510, 520, 530, 540, 550, 560, 570) constituting an acquired video.

The apparatus may identify, in the acquired video, the I frames (500, 520, 540, 560), which according to an embodiment occur once every 32 frames.

To divide the acquired video into sections at high speed, the apparatus may decode only the identified I frames (500, 520, 540, 560).

The apparatus may extract visual features for each of the decoded I frames (500, 520, 540, 560) using a trained deep neural network.

The apparatus may process the image corresponding to each of the decoded I frames (500, 520, 540, 560) using at least one trained convolutional neural network (CNN).

The apparatus may generate a sentence (image caption) (501, 521, 541, 561) for each of the processed images corresponding to the decoded I frames (500, 520, 540, 560) using a recurrent neural network (RNN).

The apparatus may extract language features from the generated sentences (image captions) (501, 521, 541, 561) using a trained deep neural network.

The apparatus may divide the acquired video into sections based on the extracted visual features and the extracted language features.

To divide the acquired video into a plurality of sections based on the extracted visual features or the extracted language features, the apparatus may assign a weight to at least one of the extracted visual features and the extracted language features.

The apparatus may set, as a reference point for dividing the acquired video, a point at which the extracted visual features or the extracted language features change abruptly.

The apparatus may set the span between one set reference point and the next as one section.

The apparatus may divide the acquired video into a plurality of sections based on the sections thus set.
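One way to read the weighting of visual and language features is as a weighted sum of per-transition change scores. The sketch below follows that reading, which is an assumption: the description does not state how the weights are applied, and the names `fused_change_scores`, `reference_points_from_scores`, `w_visual`, `w_language`, and `threshold` are illustrative.

```python
def fused_change_scores(visual_dists, language_dists, w_visual=0.5, w_language=0.5):
    """Combine per-transition change scores with a weighted sum.

    visual_dists[k] and language_dists[k] measure how much the visual and the
    language features change between I frame k and I frame k + 1.
    """
    if len(visual_dists) != len(language_dists):
        raise ValueError("both score lists must cover the same transitions")
    return [w_visual * v + w_language * l for v, l in zip(visual_dists, language_dists)]


def reference_points_from_scores(scores, threshold):
    """Return the indices of the I frames that follow an abrupt change."""
    # scores[k] belongs to the transition k -> k + 1, so the new section starts at k + 1.
    return [k + 1 for k, score in enumerate(scores) if score > threshold]
```

Shifting weight toward the language scores makes caption changes decisive: with visual distances `[0.1, 0.9, 0.2]`, language distances `[0.0, 0.8, 0.9]`, and a threshold of 0.5, weights of 0.7 and 0.3 give one reference point, while weights of 0.3 and 0.7 give two.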

According to an embodiment, in the case of FIG. 5, the apparatus may output the images corresponding to I frames 500, 520, and 560 among the decoded I frames in order to indicate the divided sections.

FIG. 6 is a diagram illustrating the sections of a video and the frames constituting the video.

Referring to FIG. 6, a video (compressed video stream, compressed bitstream) is formed by joining a plurality of sections (600, 650).

A compressed video stream (video, compressed bitstream) consists of NAL headers (NH) 611 and data units 612. Here, the Network Abstraction Layer (NAL) header 611 carries data-related information at the video frame level, and the data unit 612 is the actual data portion.

The NAL header (NH) 611 and the data unit 612 are formed per frame (610, 620, 630, 640, 650) and contain the information needed to construct one frame.

When a scene change occurs between I frames, accurate motion prediction becomes difficult, so the residual information increases; the increase in residual information raises the bit count, and ultimately the length of the data unit increases.
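The effect described above suggests a decode-free test: flag a data unit whose length is much larger than that of its recent predecessors. The sketch below compares each length against the median of a short trailing window. The median baseline, the function name `length_spikes`, and the parameters `window` and `ratio` are assumptions, since the description only speaks of a sharp increase.

```python
from statistics import median


def length_spikes(unit_lengths, window=4, ratio=1.5):
    """Return indices whose data unit length sharply exceeds the recent baseline."""
    spikes = []
    for i in range(1, len(unit_lengths)):
        # Baseline: median of up to `window` preceding lengths.
        baseline = median(unit_lengths[max(0, i - window):i])
        if baseline > 0 and unit_lengths[i] > ratio * baseline:
            spikes.append(i)
    return spikes
```

Using the median rather than the mean keeps a single large data unit from inflating the baseline for the frames that follow it.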

The video section high-speed segmentation apparatus may identify, in the acquired video, the I frames (610, 660), which according to an embodiment occur once every 32 frames.

To divide the acquired video into sections at high speed, the apparatus may decode only the identified I frames (610, 660).

The apparatus may extract visual features for each of the decoded I frames (610, 660) using a trained deep neural network.

The apparatus may process the image corresponding to each of the decoded I frames (610, 660) using at least one trained convolutional neural network (CNN).

The apparatus may generate a sentence (image caption) for each of the processed images corresponding to the decoded I frames (610, 660) using a recurrent neural network (RNN).

The apparatus may extract language features from the generated sentences (image captions) using a trained deep neural network.

The apparatus may calculate the length of the data unit 612 for each of the identified I frames (610, 660).

The apparatus may divide the acquired video into sections based on the extracted visual features, the extracted language features, and the length of the data unit 612 calculated for each of the I frames (610, 660).

To divide the acquired video into a plurality of sections based on the extracted visual features, the extracted language features, or the length of the data unit 612 calculated for each of the I frames (610, 660), the apparatus may assign a weight to at least one of the extracted visual features, the extracted language features, and the calculated length of the data unit 612.

The apparatus may set, as a reference point for dividing the acquired video, a point at which the extracted visual features or the extracted language features change abruptly, or a point at which the length of the data unit 612 calculated for each of the I frames (610, 660) increases sharply.

The apparatus may output the images corresponding to decoded I frames in order to indicate the divided sections.

이상에서, 본 발명의 실시예를 구성하는 모든 구성 요소들이 하나로 결합되거나 결합되어 동작하는 것으로 설명되었다고 해서, 본 발명이 반드시 이러한 실시예에 한정되는 것은 아니다. 즉, 본 발명의 목적 범위 안에서라면, 그 모든 구성 요소들이 적어도 하나로 선택적으로 결합하여 동작할 수도 있다. In the above, even though all the components constituting the embodiment of the present invention are described as being combined or operating in combination, the present invention is not necessarily limited to this embodiment. That is, within the scope of the object of the present invention, all the components may operate by selectively combining at least one.

또한, 그 모든 구성 요소들이 각각 하나의 독립적인 하드웨어로 구현될 수 있지만, 각 구성 요소들의 그 일부 또는 전부가 선택적으로 조합되어 하나 또는 복수 개의 하드웨어에서 조합된 일부 또는 전부의 기능을 수행하는 프로그램 모듈을 갖는 컴퓨터 프로그램으로서 구현될 수도 있다. 그 컴퓨터 프로그램을 구성하는 코드들 및 코드 세그먼트들은 본 발명의 기술 분야의 당업자에 의해 용이하게 추론될 수 있을 것이다. In addition, although all of the components may be implemented as one independent hardware, some or all of the components are selectively combined to perform some or all functions of the combined components in one or a plurality of hardware program modules It may be implemented as a computer program having Codes and code segments constituting the computer program can be easily deduced by those skilled in the art of the present invention.

이러한 컴퓨터 프로그램은 컴퓨터가 읽을 수 있는 저장매체(Computer Readable Media)에 저장되어 컴퓨터에 의하여 읽혀지고 실행됨으로써, 본 발명의 실시예를 구현할 수 있다. 컴퓨터 프로그램의 저장매체로서는 자기 기록매체, 광 기록매체, 등이 포함될 수 있다.Such a computer program is stored in a computer readable storage medium (Computer Readable Media), read and executed by the computer, thereby implementing the embodiment of the present invention. The storage medium of the computer program may include a magnetic recording medium, an optical recording medium, and the like.

또한, 이상에서 기재된 "포함하다", "구성하다" 또는 "가지다" 등의 용어는, 특별히 반대되는 기재가 없는 한, 해당 구성 요소가 내재될 수 있음을 의미하는 것이므로, 다른 구성 요소를 제외하는 것이 아니라 다른 구성 요소를 더 포함할 수 있는 것으로 해석되어야 한다. In addition, terms such as "comprises", "comprises" or "have" described above mean that the corresponding component may be embedded, unless otherwise specified, excluding other components. Rather, it should be construed as being able to further include other components.

기술적이거나 과학적인 용어를 포함한 모든 용어들은, 다르게 정의되지 않는 한, 본 발명이 속하는 기술 분야에서 통상의 지식을 가진 자에 의해 일반적으로 이해되는 것과 동일한 의미를 가진다. 사전에 정의된 용어와 같이 일반적으로 사용되는 용어들은 관련 기술의 문맥 상의 의미와 일치하는 것으로 해석되어야 하며, 본 발명에서 명백하게 정의하지 않는 한, 이상적이거나 과도하게 형식적인 의미로 해석되지 않는다.All terms, including technical and scientific terms, have the same meaning as commonly understood by one of ordinary skill in the art to which the present invention belongs, unless otherwise defined. Terms commonly used, such as those defined in the dictionary, should be interpreted as being consistent with the meaning in the context of the related art, and are not interpreted in an ideal or excessively formal meaning unless explicitly defined in the present invention.

본 발명에서 개시된 방법들은 상술된 방법을 달성하기 위한 하나 이상의 동작들 또는 단계들을 포함한다. 방법 동작들 및/또는 단계들은 청구항들의 범위를 벗어나지 않으면서 서로 상호 교환될 수도 있다. 다시 말해, 동작들 또는 단계들에 대한 특정 순서가 명시되지 않는 한, 특정 동작들 및/또는 단계들의 순서 및/또는 이용은 청구항들의 범위로부터 벗어남이 없이 수정될 수도 있다.The methods disclosed herein include one or more acts or steps for achieving the method described above. Method acts and/or steps may be interchanged with each other without departing from the scope of the claims. In other words, unless a specific order for acts or steps is specified, the order and/or use of specific acts and/or steps may be modified without departing from the scope of the claims.

본 발명에서 이용되는 바와 같이, 아이템들의 리스트 중 "그 중 적어도 하나" 를 지칭하는 구절은 단일 멤버들을 포함하여, 이들 아이템들의 임의의 조합을 지칭한다. 일 예로서, "a, b, 또는 c: 중의 적어도 하나" 는 a, b, c, a-b, a-c, b-c, 및 a-b-c 뿐만 아니라 동일한 엘리먼트의 다수의 것들과의 임의의 조합 (예를 들어, a-a, a-a-a, a-a-b, a-a-c, a-b-b, a-c-c, b-b, b-b-b, b-b-c, c-c, 및 c-c-c 또는 a, b, 및 c 의 다른 임의의 순서 화한 것) 을 포함하도록 의도된다.As used herein, a phrase referring to “at least one of” in a list of items refers to any combination of these items, including single members. As an example, "at least one of a, b, or c:" means a, b, c, ab, ac, bc, and abc, as well as any combination with multiples of the same element (e.g., aa , aaa, aab, aac, abb, acc, bb, bbb, bbc, cc, and ccc or any other ordering of a, b, and c).

As used herein, the term "determining" encompasses a wide variety of operations. For example, "determining" may include calculating, computing, processing, deriving, investigating, looking up (e.g., looking up in a table, a database, or another data structure), ascertaining, and the like. "Determining" may also include receiving (e.g., receiving information), accessing (e.g., accessing data in a memory), and the like. "Determining" may further include resolving, selecting, choosing, establishing, and the like.

The above description merely illustrates the technical idea of the present invention, and those of ordinary skill in the art to which the present invention pertains will be able to make various modifications and variations without departing from the essential characteristics of the present invention.

Therefore, the embodiments disclosed in the present invention are intended to explain, not to limit, the technical idea of the present invention, and the scope of that technical idea is not limited by these embodiments. The scope of protection of the present invention should be construed according to the following claims, and all technical ideas within a scope equivalent thereto should be construed as falling within the scope of the present invention.

100: high-speed segmentation device for video sections

Claims

1. A video section high-speed segmentation apparatus comprising:
at least one processor,
wherein the at least one processor:
determines I frames in a video, and
extracts visual features for each of the determined I frames.

2. The apparatus of claim 1,
wherein the at least one processor
segments a section of the video based on the extracted visual features.

3. The apparatus of claim 1,
wherein the at least one processor:
generates an image caption for each of the determined I frames, and
extracts language features from the generated image caption.

4. The apparatus of claim 3,
wherein the at least one processor
segments a section of the video based on the extracted visual features or the extracted language features.

5. The apparatus of claim 3,
wherein the at least one processor
calculates the length of a data unit for each of the determined I frames.

6. The apparatus of claim 5,
wherein the at least one processor
segments a section of the video based on the extracted visual features, the extracted language features, or the calculated length of the data unit.

7. The apparatus of claim 1,
wherein the at least one processor
calculates the length of a data unit for each of the determined I frames.

8. The apparatus of claim 7,
wherein the at least one processor
segments a section of the video based on the extracted visual features or the calculated length of the data unit.

9. A video section high-speed segmentation apparatus comprising:
at least one processor,
wherein the at least one processor:
determines I frames in a video,
generates an image caption for each of the determined I frames, and
extracts language features from the generated image caption.

10. The apparatus of claim 9,
wherein the at least one processor
segments a section of the video based on the extracted language features.

11. The apparatus of claim 9,
wherein the at least one processor:
calculates the length of a data unit for each of the determined I frames, and
segments a section of the video based on the extracted language features or the calculated length of the data unit.

12. A video section high-speed segmentation method comprising:
determining I frames in a video; and
extracting visual features for each of the determined I frames.

13. The method of claim 12,
further comprising
segmenting a section of the video based on the extracted visual features.

14. The method of claim 12,
further comprising:
generating an image caption for each of the determined I frames; and
extracting language features from the generated image caption.

15. The method of claim 14,
further comprising
segmenting a section of the video based on the extracted visual features or the extracted language features.

16. The method of claim 14,
further comprising
calculating the length of a data unit for each of the determined I frames.

17. The method of claim 16,
further comprising
segmenting a section of the video based on the extracted visual features, the extracted language features, or the calculated length of the data unit.

18. The method of claim 12,
further comprising
calculating the length of a data unit for each of the determined I frames.

19. The method of claim 18,
further comprising
segmenting a section of the video based on the extracted visual features or the calculated length of the data unit.

20. A video section high-speed segmentation method comprising:
determining I frames in a video;
generating an image caption for each of the determined I frames; and
extracting language features from the generated image caption.

21. The method of claim 20,
further comprising
segmenting a section of the video based on the extracted language features.

22. The method of claim 20,
further comprising:
calculating the length of a data unit for each of the determined I frames; and
segmenting a section of the video based on the extracted language features or the calculated length of the data unit.
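
The flow recited in the method claims (determining I frames, obtaining a visual feature and a data-unit length for each, and splitting the video where either changes) can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the `Frame` record, the precomputed `feature` vectors, the use of cosine distance, the byte length of `payload` as a stand-in for the "length of the data unit", and both thresholds are hypothetical choices introduced here.

```python
from dataclasses import dataclass
from math import sqrt
from typing import List, Sequence


@dataclass
class Frame:
    index: int                     # position of the frame in the video
    kind: str                      # "I", "P", or "B"
    payload: bytes                 # encoded data unit (hypothetical stand-in)
    feature: Sequence[float] = ()  # visual feature vector (assumed precomputed)


def determine_i_frames(frames: List[Frame]) -> List[Frame]:
    """Keep only intra-coded frames; P and B frames are skipped entirely."""
    return [f for f in frames if f.kind == "I"]


def cosine_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """Return 1 - cosine similarity, or 1.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)


def segment_starts(frames: List[Frame],
                   feature_threshold: float = 0.5,
                   length_ratio: float = 2.0) -> List[int]:
    """Return the frame indices at which a new section starts.

    A boundary is placed at an I frame when its feature differs strongly
    from the previous I frame, or when the data-unit lengths of the two
    I frames differ by more than length_ratio.
    """
    i_frames = determine_i_frames(frames)
    if not i_frames:
        return []
    starts = [i_frames[0].index]
    for prev, cur in zip(i_frames, i_frames[1:]):
        feature_jump = cosine_distance(prev.feature, cur.feature) > feature_threshold
        shorter = max(1, min(len(prev.payload), len(cur.payload)))
        longer = max(len(prev.payload), len(cur.payload))
        length_jump = longer > length_ratio * shorter
        if feature_jump or length_jump:
            starts.append(cur.index)
    return starts
```

Because only I frames are inspected, the cost of this sketch grows with the number of I frames rather than with the total number of frames. Language features extracted from image captions could be compared in the same loop as a third boundary criterion.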