KR102131751B1

KR102131751B1 - Method for processing interval division information based on recognition meta information and service device supporting the same

Info

Publication number: KR102131751B1
Application number: KR1020180145662A
Authority: KR
Inventors: 송민경; 윤종철
Original assignee: 에스케이텔레콤 주식회사
Priority date: 2018-11-22
Filing date: 2018-11-22
Publication date: 2020-07-08
Also published as: KR20200060110A

Abstract

A service device according to an embodiment of the present invention may include a memory and a processor operatively connected to the memory. The processor may be configured to acquire video content stored in the memory, extract recognition meta information from the acquired video content, cluster the detection time information of the extracted recognition meta information, divide the content into sections according to the distribution of the resulting clusters, and then generate section division information.

Description

Method for processing interval division information based on recognition meta information and service device supporting the same

The present invention relates to processing section division information for video content, and more particularly to a method of obtaining section division information using recognition meta information about the video content and operating on that information, and to a service device supporting the method.

Dividing video content into sections, in particular into sequences, is a necessary task for purposes such as advertisement insertion. Conventional approaches to dividing video content into sections include methods that use audio information. For example, the point at which one sound source ends and a new sound source begins may be located and used as section division information. However, the end and start points of a sound source often differ from the points at which the story of the video content branches, so dividing sequences by sound source suffers from low accuracy.

As described above, searching for sequence information, that is, for sections divided by story unit or by context unit, is one of the most basic and essential technologies in media services. A sequence is a unit of content division defined by context, and it can be used when a mid-roll advertisement or the like must be inserted at a natural point during content playback in a media service. Accordingly, a sequence boundary should be chosen as a point at which a mid-roll advertisement can be inserted naturally, without being awkward in context. Detecting sequences, however, requires a person to recognize the contextual content and distinguish the natural flow, so a person has had to watch the content directly and divide the sequences by hand. That is, unlike a shot, which is detected from a camera change, a sequence is difficult to define by visual distinction alone. It must be dividable dynamically according to the context, the flow of the content, and the requirements of the service, and the result must look natural in context to a human viewer. It has therefore been difficult to divide sequences unless a person manually performs the division for as many sequences as are required.

Korean Laid-Open Patent Publication No. 10-2016-0110433, registered on September 21, 2016 (title: Content analysis method and device)

The present invention is intended to meet the need described above. Its object is to provide a method for processing section division information using recognition meta information, and a service device supporting the method, that perform natural sequence division by using the various recognition meta information detected in video content and can provide various sequence-based services on that basis.

A service device according to an embodiment of the present invention may include a memory and a processor operatively connected to the memory. The processor may be configured to acquire video content stored in the memory, extract recognition meta information from the acquired video content, cluster the detection time information of the extracted recognition meta information, divide the content into sections according to the distribution of the resulting clusters, and then generate section division information.

In particular, the processor may be configured to generate integrated recognition meta information by integrating recognition meta information that appears repeatedly within a specified time range, and to perform clustering on the integrated recognition meta information.
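The integration step above can be illustrated with a minimal sketch. It assumes that each recognition meta item is a labeled time interval and that "repeatedly appearing within a specified time range" means the gap between two detections of the same label is at most `max_gap` seconds. The `Detection` type, the `merge_detections` function, and the 5-second default are hypothetical names and values chosen for illustration, not taken from the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    """One recognition meta item (hypothetical structure)."""
    label: str    # e.g. a recognized person or sound source identifier
    start: float  # detection start time in seconds
    end: float    # detection end time in seconds


def merge_detections(detections, max_gap=5.0):
    """Merge detections of the same label whose gap is at most max_gap seconds."""
    merged = []
    last_index_by_label = {}
    # Process each label's detections in chronological order.
    for d in sorted(detections, key=lambda d: (d.label, d.start)):
        idx = last_index_by_label.get(d.label)
        if idx is not None and d.start - merged[idx].end <= max_gap:
            prev = merged[idx]
            # Extend the previous interval instead of adding a new one.
            merged[idx] = Detection(prev.label, prev.start, max(prev.end, d.end))
        else:
            last_index_by_label[d.label] = len(merged)
            merged.append(d)
    # Return the integrated items in playback order.
    return sorted(merged, key=lambda d: (d.start, d.label))


raw = [
    Detection("A", 0, 2),
    Detection("A", 4, 6),    # 2 s after the previous "A": merged
    Detection("A", 30, 32),  # 24 s after the previous "A": kept separate
    Detection("B", 1, 3),
]
integrated = merge_detections(raw)
```

With these inputs, the first two "A" detections collapse into one interval from 0 to 6 seconds, which reduces the number of points fed to the later clustering step.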

Alternatively, the processor may be configured to extract the recognition meta information using a plurality of recognition technologies, generate integrated recognition meta information for each type of extracted recognition meta information, and then gather recognition metas whose start points and end points are similar to within a specified range and process them as the input values of a single cluster.

Meanwhile, the processor may be configured to check the content-related information of the video content and determine the types of recognition meta to extract.

In this case, the processor may be configured to extract sound source recognition meta and person recognition meta when the video content is movie content.

Alternatively, the processor may be configured to extract recognition meta for a specific object or a specific animal, together with sound source recognition meta, when the video content is a documentary.
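The genre-dependent choice of recognition meta types described above could be expressed as a simple lookup. The sketch below is only illustrative: the dictionary keys, the type names, the `select_meta_types` function, and the fallback used for genres the text does not mention are all assumptions.

```python
# Hypothetical mapping from content genre to the recognition meta types to extract.
META_TYPES_BY_GENRE = {
    "movie": ("sound_source", "person"),
    "documentary": ("object_or_animal", "sound_source"),
}


def select_meta_types(content_info, default=("person",)):
    """Pick recognition meta types from content-related information.

    content_info is assumed to be a dict with an optional "genre" key.
    The default for unlisted genres is an assumption, not from the source.
    """
    genre = content_info.get("genre", "").lower()
    return META_TYPES_BY_GENRE.get(genre, default)


movie_types = select_meta_types({"title": "Example", "genre": "Movie"})
documentary_types = select_meta_types({"genre": "documentary"})
```

Keeping the mapping in data rather than in branching code makes it straightforward to add further genres without touching the extraction logic.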

Meanwhile, the service device may further include a communication circuit that forms a communication channel with an external electronic device, and the processor may be configured to receive a sequence count from the external electronic device through the communication circuit and to determine, according to the sequence count, the number of clusters to be distinguished by the clustering.

In this case, the processor may be configured to rank the clusters, based on the clustering distribution, according to the degree of change associated with the appearance of recognition meta, and to select, according to the sequence count, the clusters that fall within a certain rank as the sequences to be used for generating the section division information.
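The text does not define how the "degree of change" is computed, so the following sketch makes one possible assumption explicit: a cluster's score is the number of recognition labels that appear or disappear relative to the preceding cluster. The function names `rank_clusters` and `select_sequences`, and the representation of a cluster as a set of labels, are hypothetical.

```python
def rank_clusters(clusters):
    """Rank clusters by an assumed change score.

    clusters: list of sets of recognition labels, in playback order.
    Returns cluster indices ordered from highest to lowest change score.
    """
    scored = []
    previous = set()
    for index, labels in enumerate(clusters):
        # Symmetric difference: labels that newly appear or drop out.
        change = len(labels ^ previous)
        scored.append((change, index))
        previous = labels
    # Higher change first; ties keep playback order.
    return [index for change, index in sorted(scored, key=lambda s: (-s[0], s[1]))]


def select_sequences(clusters, sequence_count):
    """Keep the top-ranked clusters, returned in playback order."""
    return sorted(rank_clusters(clusters)[:sequence_count])


clusters = [{"A", "B"}, {"A", "B", "C"}, {"D"}, {"D", "E"}]
ranking = rank_clusters(clusters)          # [2, 0, 1, 3]
chosen = select_sequences(clusters, 2)     # [0, 2]
```

Here the third cluster ranks first because the cast changes completely at that point, which matches the intuition that a large change in recognized content marks a sequence boundary. Other scoring rules would fit the text equally well.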

A method for processing section division information using recognition meta according to an embodiment of the present invention may include: extracting, by a service device, recognition meta information from video content stored in a memory; generating integrated recognition meta information by integrating the recognition meta information according to a specified criterion; clustering the detection time information of the integrated recognition meta information; and dividing the content into sections according to the distribution of the resulting clusters and then generating section division information.
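The clustering and section division steps of the method can be sketched as follows. This passage does not name a clustering algorithm, so the example assumes a plain one-dimensional k-means over detection times, with boundaries placed midway between neighboring clusters. The functions `kmeans_1d` and `section_boundaries`, the evenly spread initialization, and the midpoint rule are all illustrative assumptions.

```python
def _assign(points, centers):
    """Group each point with its nearest center."""
    clusters = [[] for _ in centers]
    for p in points:
        nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
        clusters[nearest].append(p)
    return clusters


def kmeans_1d(times, k, iterations=50):
    """Cluster detection times (seconds) into at most k groups."""
    pts = sorted(times)
    if not 1 <= k <= len(pts):
        raise ValueError("k must be between 1 and the number of points")
    # Deterministic start: spread the initial centers across the sorted points.
    centers = [pts[(2 * i + 1) * len(pts) // (2 * k)] for i in range(k)]
    for _ in range(iterations):
        clusters = _assign(pts, centers)
        updated = [sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)]
        if updated == centers:
            break  # assignments no longer change
        centers = updated
    return [c for c in _assign(pts, centers) if c]


def section_boundaries(clusters):
    """Place a boundary midway between each pair of neighboring clusters."""
    ordered = sorted(clusters, key=min)
    return [(max(a) + min(b)) / 2 for a, b in zip(ordered, ordered[1:])]


times = [10, 12, 15, 300, 305, 310, 620, 640]
clusters = kmeans_1d(times, k=3)
boundaries = section_boundaries(clusters)  # [157.5, 465.0]
```

In this toy input the detections fall into three well-separated groups, so the two boundaries land in the gaps between them. In a real service `k` would presumably follow from the requested sequence count.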

Here, the method may further include ranking the clusters, based on the clustering distribution, according to the degree of change associated with the appearance of recognition meta.

In this case, generating the section division information may further include receiving a sequence count from an external electronic device through a communication circuit, and selecting, according to the sequence count, the clusters that fall within a certain rank as the sequences to be used for generating the section division information.

A computer recording medium according to an embodiment of the present invention stores at least one instruction executable by a processor. The at least one instruction may be configured to cause a service device to perform: extracting recognition meta information from video content stored in a memory; generating integrated recognition meta information by integrating the recognition meta information according to a specified criterion; clustering the detection time information of the integrated recognition meta information; and dividing the content into sections according to the distribution of the resulting clusters and then generating section division information.

The present invention extracts recognition meta information from video content, integrates the extracted recognition meta information, and then uses clustering to appropriately divide the content into a specified number of sequences. It can therefore divide video content into sections more quickly than manual division by a person, and, by applying a uniform division criterion, it can provide section division of good quality.

Other effects of the present invention will be mentioned in the description below.

FIG. 1 is a diagram illustrating an example of a network environment related to processing section division information for video content according to an embodiment of the present invention.
FIG. 2 is a diagram schematically illustrating a service device according to an embodiment of the present invention.
FIG. 3 is a diagram illustrating an example of the processor of the service device according to an embodiment of the present invention.
FIG. 4 is a graph showing an example of recognition meta information obtained by person recognition on specific video content.
FIG. 5 shows the result of integrating the recognition meta information shown in FIG. 4 according to a certain criterion.
FIG. 6 shows values obtained by manually dividing and tagging sequences for the integrated recognition meta information described with reference to FIG. 5, and FIG. 7 shows values obtained by integrating the tagged information according to a specified rule.
FIG. 8 shows sequence candidates constructed from the divided clusters, with the center value (k) of the clustering set dynamically according to the requested number of sequences.
FIG. 9 shows an example of a sequence division method based on the maximum/minimum values of the clustering.
FIG. 10 is a diagram showing an example in which the start of a sequence is set to the minimum value.
FIG. 11 is a diagram illustrating an example of a method for processing section division information using recognition meta information according to an embodiment of the present invention.

To make the features and advantages of the means for solving the problem clearer, the present invention is described in more detail with reference to specific embodiments shown in the accompanying drawings.

However, in the following description and the accompanying drawings, detailed descriptions of well-known functions or configurations that may obscure the subject matter of the present invention are omitted. Note also that the same components are denoted by the same reference numerals throughout the drawings wherever possible.

The terms and words used in the following description and drawings should not be construed as limited to their ordinary or dictionary meanings. They should be interpreted with meanings and concepts consistent with the technical idea of the present invention, based on the principle that an inventor may appropriately define the concepts of terms in order to describe the invention in the best way. Accordingly, the embodiments described in this specification and the configurations shown in the drawings are merely the most preferred embodiments of the present invention and do not represent all of its technical idea, so it should be understood that various equivalents and modifications capable of replacing them may exist at the time of filing.

In addition, terms including ordinal numbers, such as first and second, are used to describe various components. They are used only to distinguish one component from another and are not used to limit the components. For example, a second component may be referred to as a first component, and similarly a first component may be referred to as a second component, without departing from the scope of the present invention.

In addition, the terms used in this specification are used only to describe specific embodiments and are not intended to limit the present invention. Singular expressions include plural expressions unless the context clearly indicates otherwise. Terms such as "include" or "have" in this specification are intended to indicate the presence of the features, numbers, steps, operations, components, parts, or combinations thereof described in the specification, and should be understood not to exclude in advance the presence or possible addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

In addition, terms such as "unit", "-er (-or)", and "module" in the specification refer to a unit that processes at least one function or operation, and such a unit may be implemented in hardware, in software, or in a combination of hardware and software. Furthermore, "a" or "an", "one", "the", and similar terms may be used, in the context of describing the present invention (particularly in the context of the following claims), to cover both the singular and the plural unless otherwise indicated in this specification or clearly contradicted by context.

In addition to the terms described above, the specific terms used in the following description are provided to aid understanding of the present invention, and the use of these specific terms may be changed to other forms without departing from the technical idea of the present invention.

In addition, embodiments within the scope of the present invention include computer-readable media that carry or have stored on them computer-executable instructions or data structures. Such computer-readable media may be any available media that can be accessed by a general-purpose or special-purpose computer system. By way of example, and not limitation, such computer-readable media may include physical storage media such as RAM, ROM, EPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store or carry desired program code means in the form of computer-executable instructions, computer-readable instructions, or data structures and that can be accessed by a general-purpose or special-purpose computer system.

FIG. 1 is a diagram illustrating an example of a network environment related to processing section division information for video content according to an embodiment of the present invention.

Referring to FIG. 1, a network environment 10 related to processing section division information for video content according to an embodiment of the present invention may include a network 50, a terminal device 100, a content providing server 300, and a service device 200.

In the network environment 10 described above, when the terminal device 100 accesses the service device 200 through the network 50, the service device 200 provides a specified content use screen to the terminal device 100, and, in response to search request information from the terminal device 100, the service device 200 may search for the corresponding content and provide it to the terminal device 100. In this process, the service device 200 may use recognition meta to provide the terminal device 100 with content having a specified number of sequences. Alternatively, the service device 200 may provide the terminal device 100 with content in which a specified advertisement or additional information is inserted at each of the specified number of sequences.

The network 50 carries data exchanged between the terminal device 100 and the service device 200, or between the content providing server 300 and the service device 200. Depending on the system implementation, the network 50 may use a wired communication method such as Ethernet, xDSL (ADSL, VDSL), HFC (Hybrid Fiber Coaxial Cable), FTTC (Fiber To The Curb), or FTTH (Fiber To The Home), or a wireless communication method such as WLAN (Wireless LAN), Wi-Fi, WiBro, WiMAX, HSDPA (High Speed Downlink Packet Access), LTE (Long Term Evolution), or LTE-A (Long Term Evolution Advanced), and it may also include any other communication method that is widely known or developed in the future. Alternatively, when the terminal device 100 receives a broadcast signal transmitted by the service device 200 using a broadcast antenna or through a set-top box, the network 50 may include a broadcast network.

The terminal device 100 is a user device that can exchange various data with the service device 200 through the network 50 according to the user's operation. The terminal device 100 can transmit and receive video content through the network 50 (an Internet network or a broadcast network), and may include a memory that stores programs and protocols for transmitting, receiving, and processing video content and a processor that executes the programs to perform computation and control. The terminal device 100 may be implemented in various forms. For example, the terminal device 100 described in this specification may be a mobile terminal such as a smartphone, a tablet PC, a PDA (Personal Digital Assistant), a PMP (Portable Multimedia Player), or an MP3 player, or a fixed terminal such as a smart TV, an IPTV, an IPTV set-top box, a laptop computer, or a desktop computer. Any device capable of receiving data related to the video content according to the present invention may be used as the terminal device 100 according to an embodiment of the present invention.

Meanwhile, the terminal device 100 according to an embodiment of the present invention may include a communication circuit, a display, an input device, an audio device, a processor, a memory, and the like, and may connect to the service device 200 through the network 50 (e.g., an Internet network or a broadcast network) using the communication circuit according to user operation. For example, in response to user input, the terminal device 100 may run a web browser or a corresponding application and access the service device 200 based on pre-entered Internet address information or address information entered by the user. The terminal device 100 may receive a specified web page from the service device 200 and output it on the display. For example, the web page may include a virtual page on which at least one item of video content can be searched. As another example, the terminal device 100 may include an IPTV or an IPTV set-top box and may receive and output, through an antenna, video content transmitted by the service device 200.

The content providing server 300 may provide at least one item of content to the service device 200. To this end, the content providing server 300 may receive content from content producers and the like, and store and manage it. When new content is collected, the content providing server 300 may provide the collected content to the service device 200. The content providing server 300 may provide video content and content-related information to the service device 200. For example, the content-related information may include the type, genre, director, actors, plot, and highlight scenes of the content.

The service device 200 may remain in a communication standby state so that the terminal device 100 can access it through the network 50. When the terminal device 100 accesses it, the service device 200 may provide the terminal device 100 with a content use screen (or virtual page) on which at least one item of content can be used. The service device 200 may receive content search request information from the terminal device 100, search for content corresponding to the content search request information, and provide the found content to the terminal device 100. In this process, the service device 200 may provide the terminal device 100 with video content that includes clustering-based section division information according to an embodiment of the present invention, or with video content into which a mid-roll advertisement has been inserted based on that section division information. Meanwhile, for content collection, the service device 200 may form a communication channel with the content providing server 300, and may receive and store video content and content-related information from the content providing server 300.

The technique described above collects recognition meta information from video content using at least one recognition technology, integrates it, and clusters (or groups) it to generate a plurality of items of section division information (or a plurality of items of sequence information). This allows the sequences of video content to be divided according to a sound criterion, and the required number of items of section division information (e.g., a number requested by an advertisement provider, a content provider, or an administrator of the service device) can be obtained easily. Accordingly, by providing mid-roll advertisements and the like at points that minimize the impact on viewing, the present invention can reduce viewers' resistance to additional information such as mid-roll advertisements and maximize the recognition rate of the additional information or the advertising effect. In addition, by increasing the reliability of the sequence division points and providing various information at those points, the present invention helps maximize the effect of delivering information to viewers.

FIG. 2 is a diagram schematically illustrating a service device according to an embodiment of the present invention.

Referring to FIG. 2, the service device 200 of the present invention may include a communication circuit 210, a memory 240, and a processor 250.

The communication circuit 210 may form the communication channels of the service device 200. For example, the communication circuit 210 may form a communication channel with the terminal device 100 in response to a connection request from the terminal device 100. Under the control of the processor 250, the communication circuit 210 may provide a designated virtual page (e.g., a content use screen) to the terminal device 100. The communication circuit 210 may receive search request information related to a content search from the terminal device 100 and may provide the terminal device 100 with the content designated in response to that search request information. The communication circuit 210 may transmit content selected at the request of the terminal device 100 to the terminal device 100 by streaming, download, or a similar method. The communication circuit 210 may also form a communication channel between the service device 200 and the content providing server 300, and may receive video content or content-related information stored in the content providing server 300 (e.g., information describing the video content, such as its title, genre, director, actors, plot, and highlights).

The memory 240 may store various data, programs, and algorithms related to the operation of the service device 200. For example, the memory 240 may include a video content DB 241 and sequence information 243.

The video content DB 241 may include at least one piece of video content. For example, the video content DB 241 may store various kinds of video content such as movies, dramas, music videos, and documentaries. The video content stored in the video content DB 241 may be mapped to the section classification information stored in the sequence information 243.

The sequence information 243 may include section classification information related to the video content stored in the video content DB 241. The section classification information may include time-point information that separates the sequences applied to a specific piece of video content. The section classification information included in the sequence information 243 may be used for purposes such as inserting advertisements into video content. In this regard, the memory 240 may also include at least one virtual advertisement that can be inserted partway through playback of the video content.

The processor 250 may transfer or process data related to the operation of the service device 200. For example, in connection with the section classification information processing and content providing functions of the present invention, the processor 250 may handle forming a communication channel with the content providing server 300, receiving video content or content-related information from the content providing server 300, connection of the terminal device 100, provision of the content use screen, reception of search request information from the terminal device 100, and provision of content corresponding to the received search request information. In particular, to generate section classification information, the processor 250 may extract recognition meta information from scene analysis of the video content, integrate the extracted recognition meta information, and cluster (or group) the integrated recognition meta information to obtain section classification information. The processor 250 may include the components shown in FIG. 3.

FIG. 3 is a diagram showing an example of the processor of a service device according to an embodiment of the present invention.

Referring to FIG. 3, the processor 250 may include a content collection unit 251, a meta information extraction unit 253, a meta information integration unit 255, a sequence extraction unit 257, and a service providing unit 259.

The content collection unit 251 may collect video content, store the collected video content in the memory 240, and process the original file so that it can be recognized by at least one recognition technology. For example, the content collection unit 251 may form a communication channel with the content providing server 300 that provides specific video content and collect at least one piece of video content from the content providing server 300. Once video content is collected, the content collection unit 251 may extract images in which a specific object (e.g., an actor, an animal, or a specific item) is recognized, or images in which a specific sound source (e.g., background music) is detected. As basic preparation for extracting recognition meta information, the content collection unit 251 may convert the video content into frame images at the required unit when images containing a specific object are to be extracted, or may extract only the sound source WAV file from the VOD file of the video content and split it when music is to be recognized. The content collection unit 251 may collect video content from the content providing server 300 in real time or at regular intervals and perform this basic preparation on the collected video content.

The meta information extraction unit 253 extracts recognition meta information for each piece of video content stored in the video content DB 241 based on at least one recognition technology. The meta information extraction unit 253 is the component that extracts various kinds of recognition meta information, such as persons and sound sources, from the video content, and it does so per recognition unit according to the designated recognition technology. Here, the recognition unit may be a specific number of frames in the case of a person, and the minimum time unit in which the type of sound source can be detected in the case of a sound source. The recognition unit may differ depending on the recognition method. For example, the meta information extraction unit 253 may use face recognition technology to extract the meta information: it may detect images in which persons or actors appear in the video content and extract recognition meta information for each image (e.g., the identity of the person and the extraction time). Alternatively, the meta information extraction unit 253 may use sound source title extraction technology. In that case, it may collect, as a recognition unit, at least a partial segment of a sound source included in the video content that can be compared against previously stored complete sound sources, and use the collected segment to extract the title of the complete sound source. The meta information extraction unit 253 may combine and operate various deep-learning-based recognition engines to extract recognition meta information. Recognition meta information extracted with a unit time specific to each kind of recognition meta may be integrated into time units of meaningful length by the integration unit in the next stage.

The meta information integration unit 255 integrates the recognition meta information that the meta information extraction unit 253 extracted in fragments of one recognition unit time each, so that the fragments form single, continuous, meaningful sections. For example, in person recognition, if a person is recognized at 10-frame intervals in video running at 29.97 fps and reappears within 10 seconds, the meta information integration unit 255 may join those detections into one continuous appearance of that person. This process may later be used to bring meta with different recognition units to a similar level and to change the level of recognition meta information that was recognized in overly fine fragments.
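The integration step can be illustrated with a minimal sketch. It assumes that detections for one person are available as timestamps in seconds and that two detections belong to the same appearance when they are at most `max_gap` seconds apart; the function name `merge_detections` and the sample frame numbers are illustrative and do not come from the patent.

```python
from typing import List, Tuple


def merge_detections(times: List[float], max_gap: float = 10.0) -> List[Tuple[float, float]]:
    """Merge detection timestamps (seconds) into continuous appearance intervals.

    Consecutive detections at most `max_gap` seconds apart are treated as one
    continuous appearance (an assumed reading of the "reappears within 10 seconds" rule).
    """
    intervals: List[Tuple[float, float]] = []
    for t in sorted(times):
        if intervals and t - intervals[-1][1] <= max_gap:
            # Close enough to the previous detection: extend the current interval.
            intervals[-1] = (intervals[-1][0], t)
        else:
            # Too far from the previous detection: start a new appearance.
            intervals.append((t, t))
    return intervals


# Hypothetical frame indices sampled every 10 frames, converted to seconds at 29.97 fps.
FPS = 29.97
frame_hits = [0, 10, 20, 150, 600, 610, 1500]
appearances = merge_detections([f / FPS for f in frame_hits], max_gap=10.0)
```

Each resulting `(start, end)` pair is one integrated appearance whose start and end points can serve as inputs to the later clustering step.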

In a given work of video content, recognition meta information such as person, sound source, and situation information differs according to the recognition unit of each recognition engine, and plotting all of it as a graph yields fine-grained recognition results like those in FIG. 4. FIG. 4 is a graph showing an example of recognition meta information obtained by person recognition in a specific piece of video content. That is, when a specific piece of video content is recognized in units of 10 frames and integrated every 100 frames, detailed recognition meta information may appear as shown in FIG. 4. Depending on the characteristics of each recognition meta, the raw data actually output by recognition may be divided even more finely than the graph in FIG. 4. If specific information were displayed directly from the result shown in FIG. 4, the information about whether a specific person appears would repeatedly come and go at each time point.

Meanwhile, when the recognition meta information shown in FIG. 4 is integrated at a meaningful fixed interval, a distribution like that of FIG. 5 may result. FIG. 5 shows the result of integrating the recognition meta information of FIG. 4 according to a certain criterion. For example, the graph in FIG. 5 may show the result of merging detections into one continuous section whenever a specific person appears during a designated time interval (e.g., a 25-second or 30-second interval), based on the recognition meta information of FIG. 4. Grouping the data into meaningful units in this way makes it possible to define the start and end points of a person's appearance in a meaningful form, and to clean up recognition metadata that otherwise appears and disappears repeatedly without meaning.

The sequence extraction unit 257 is the component that uses each piece of integrated recognition meta information to find the positions that become sequence points. For the recognition meta information that the meta information integration unit 255 has integrated over fixed recognition meta intervals, the sequence extraction unit 257 identifies the start and end points of the appearance time of each recognition meta. These start and end points of the integrated recognition meta information may serve as the most important input values for extracting sequence information. That is, the new appearance of a person, a change of background sound source, or the recognition meta information value of some other situation may be an important parameter for dividing sequences. The sequence extraction unit 257 of the present invention may extract a ranking of the points where the recognition meta as a whole changes frequently. It may thus produce a ranking of candidate points where the sequence changes in context, and judge contextual changes in descending order of the degree of change.

The sequence extraction unit 257 identifies the transition points of the various recognition meta, that is, their start and end points, and performs k-means nearest clustering using these points. Through k-means nearest clustering, the sequence extraction unit 257 may gather the input values into the K clusters into which the content is to be divided. Because all the recognition meta information used to extract section classification information in the present invention consists of time-point values of time codes, a one-dimensional clustering method along the x-axis, with no y-axis, may be applied. The sequence extraction unit 257 may dynamically generate k sequence clusters, matching the number of contexts into which the content is to be divided, that is, according to the length of the video content or the service requirements. In this regard, the sequence extraction unit 257 may assign rankings to change values such as the start and end points of the various kinds of recognition meta information, and generate section classification information by matching the assigned rankings to the number of clusters. In this operation, a higher rank may be assigned where the change value at the start and end points of the recognition meta is relatively high, and a lower rank where it is relatively low.
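As an illustration of one-dimensional clustering over time codes, the following is a minimal sketch that assumes the standard Lloyd's k-means algorithm; the patent names "k-means nearest clustering" without giving details, so the initialization scheme, the function name `kmeans_1d`, and the sample time codes are all assumptions made for the example.

```python
from typing import List, Tuple


def kmeans_1d(points: List[float], k: int, iterations: int = 100) -> Tuple[List[float], List[List[float]]]:
    """Cluster 1-D time codes (seconds) into k groups with Lloyd's k-means.

    Returns (centers, clusters). Initial centers are spread evenly over the
    sorted distinct values so that the result is deterministic.
    """
    distinct = sorted(set(points))
    if not 1 <= k <= len(distinct):
        raise ValueError("k must be between 1 and the number of distinct points")
    if k == 1:
        centers = [float(distinct[0])]
    else:
        n = len(distinct)
        centers = [float(distinct[(i * (n - 1)) // (k - 1)]) for i in range(k)]

    clusters: List[List[float]] = []
    for _ in range(iterations):
        # Assignment step: each point joins the nearest center.
        clusters = [[] for _ in range(k)]
        for p in sorted(points):
            nearest = min(range(k), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster.
        new_centers = [
            sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)
        ]
        if new_centers == centers:
            break  # Converged: assignments can no longer change.
        centers = new_centers
    return centers, clusters


# Hypothetical start/end time codes (seconds) of integrated recognition meta.
change_points = [12, 15, 18, 300, 310, 305, 900, 905, 910]
centers, clusters = kmeans_1d(change_points, k=3)
```

Because `k` is an ordinary argument, the same routine can be rerun with a different cluster count whenever the requested number of sequences changes.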

In addition, the input values used when generating section classification information may also be specified dynamically. For example, for content such as a documentary in which no people appear, the sequence extraction unit 257 may exclude cast information meta, use only animal recognition data or sound source information as the start and end point inputs of the recognition meta information, and generate k clusters on that basis. Alternatively, for movie content in which the sound source is highly significant, the sequence extraction unit 257 may drop object information such as animals or items, select only sound source and cast information as the recognition meta information, and perform clustering to produce the designated number of clusters from the selected recognition meta information. As described above, the sequence clustering operated by the sequence extraction unit 257 of the present invention can be applied dynamically to the service requirements for division and can dynamically reflect meta values according to the characteristics of the VOD content. To apply this function, the sequence extraction unit 257 may check the content-related information and decide, according to that information, which kinds of recognition meta information to extract and use for sequence detection.
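The dynamic choice of inputs can be sketched as a lookup from content-related information to the meta types that feed the clustering. The genre keys, meta type names, and the `change_points` helper below are hypothetical placeholders chosen for the example; the patent does not define a concrete mapping.

```python
from typing import Dict, List, Set, Tuple

Interval = Tuple[float, float]  # (start, end) of one integrated recognition meta, in seconds

# Hypothetical mapping from genre to the meta types used as clustering inputs.
META_TYPES_BY_GENRE: Dict[str, Set[str]] = {
    "documentary": {"object", "music"},  # no cast information
    "movie": {"person", "music"},        # object information dropped
}
DEFAULT_META_TYPES: Set[str] = {"person", "object", "music"}


def change_points(intervals_by_type: Dict[str, List[Interval]], genre: str) -> List[float]:
    """Collect start and end points of the meta types selected for this genre."""
    allowed = META_TYPES_BY_GENRE.get(genre, DEFAULT_META_TYPES)
    points: List[float] = []
    for meta_type, intervals in intervals_by_type.items():
        if meta_type not in allowed:
            continue  # This meta type is excluded for the given genre.
        for start, end in intervals:
            points.extend((start, end))
    return sorted(points)


sample = {"person": [(10, 50)], "object": [(5, 20)], "music": [(0, 30)]}
documentary_points = change_points(sample, "documentary")
```

The returned list is the one-dimensional input for clustering, so changing the mapping changes which transitions can influence the sequence boundaries.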

FIG. 6 shows values obtained by manually dividing the integrated recognition meta information described with reference to FIG. 5 into sequences and tagging them, and FIG. 7 shows the values obtained by integrating the tagged information according to a designated rule. For example, among the recognition meta information obtained through a plurality of recognition technologies, the start and end points of actor meta and music meta may coincide or lie within a certain range of each other. Using this property, the start and end points of the various recognition meta are recorded, points lying within a designated proximity of each other are gathered together, and k-means nearest clustering is performed to find the sequence clusters where changes are frequent.
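Gathering points that lie within a designated proximity can be sketched as a single pass over the sorted time codes. The function name `group_nearby` and the 5-second threshold are assumptions for illustration; the patent does not state a specific distance or grouping rule.

```python
from typing import List


def group_nearby(points: List[float], max_distance: float) -> List[List[float]]:
    """Group sorted time codes so that neighbors within `max_distance` share a group."""
    groups: List[List[float]] = []
    for p in sorted(points):
        if groups and p - groups[-1][-1] <= max_distance:
            groups[-1].append(p)  # Close to the previous point: same group.
        else:
            groups.append([p])    # Otherwise open a new group.
    return groups


# Hypothetical start/end points of actor and music meta, in seconds.
grouped = group_nearby([100, 3, 5, 104, 50], max_distance=5)
```

Groups with many members mark places where several kinds of recognition meta change at nearly the same time.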

As the clustering process is performed, the sequence extraction unit 257 may separate the distribution of the resulting clusters based on the places where recognition meta changes occur more often than a designated value. The sequence extraction unit 257 may divide the clusters (or sequence sections) differently according to this distribution. The cluster points gathered in this way can each be used as one sequence division unit. FIG. 8 shows values obtained by forming sequence candidates from the divided clusters and dynamically setting the number of cluster centers (k) according to the requested number of sequences. The greatest advantage of specifying the number of clusters dynamically with this clustering technique is that sequence division requirements, which vary with the actual service requirements, can be met with a single implementation. Whether the video content is one in which recognition meta appear and disappear often, that is, content with relatively frequent changes, or one in which recognition meta change relatively little, the content can be divided effectively into the requested number of sequence clusters.

The sequence extraction unit 257 may also apply the importance of each kind of recognition meta to the sequence division. For example, for the variety show genre, the sequence extraction unit 257 may determine, according to a predefined setting, that the sound source is a key element, and double the input points of the sound source recognition meta when separating the cluster distribution. Alternatively, if the lead actor is determined to be important in a specific drama or movie according to a predefined setting, the sequence extraction unit 257 may roughly double the detected appearance points of the lead actor, so that the lead actor's appearance points have more influence on the cluster division, and then extract and store the sequences.
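One simple way to realize the "double the input points" idea is to repeat the time codes of the emphasized meta type before clustering, which pulls cluster centers toward them. The sketch below assumes integer weights and uses illustrative names (`weighted_points`, the `"music"` and `"person"` keys) that are not taken from the patent.

```python
from typing import Dict, List


def weighted_points(points_by_type: Dict[str, List[float]], weights: Dict[str, int]) -> List[float]:
    """Repeat each meta type's time codes according to its integer weight (default 1)."""
    out: List[float] = []
    for meta_type, pts in points_by_type.items():
        out.extend(pts * weights.get(meta_type, 1))  # list * n repeats the points n times
    return sorted(out)


# Hypothetical variety-show setting: sound source transitions count twice.
inputs = weighted_points({"music": [10, 200], "person": [15]}, {"music": 2})
```

With a k-means implementation that accepts repeated values, the duplicated points act like per-point weights without changing the clustering routine itself.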

One way to divide a result such as that of FIG. 8 into actual sequences is to split on the maximum or minimum value of each cluster, as shown in FIG. 9. FIG. 9 shows an example of a sequence division method based on the maximum or minimum value of each cluster. Alternatively, instead of the maximum or minimum value, the sequence may be divided at the point closest to the cluster center value (marked with an asterisk), and the result used according to the needs of the service.

For example, when the actual clustering result is as shown in FIG. 10, the sequence extraction unit 257 may, for each cluster, select the minimum value among its points, the maximum value, or the point closest to the center value (that is, the mean of the cluster) as the division point. For instance, the sequence extraction unit 257 may divide on the minimum value when sequences are to be separated at the point where recognition meta changes begin, or on the point closest to the center value when sequences are to be separated at the point where changes are most frequent.
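The three selection rules can be written as one small function. This is a sketch under the assumption that each cluster is a non-empty list of time codes in seconds; the mode names `"min"`, `"max"`, and `"center"` and the sample clusters are illustrative.

```python
from typing import List


def boundary(cluster: List[float], mode: str = "min") -> float:
    """Pick the division point of one cluster: min, max, or the point nearest the mean."""
    if mode == "min":
        return min(cluster)   # where the recognition meta changes begin
    if mode == "max":
        return max(cluster)   # where the recognition meta changes end
    if mode == "center":
        mean = sum(cluster) / len(cluster)
        return min(cluster, key=lambda p: abs(p - mean))  # actual point closest to the mean
    raise ValueError(f"unknown mode: {mode}")


# Hypothetical clusters of start/end time codes.
sample_clusters = [[12, 15, 18], [300, 302, 340]]
starts = [boundary(c, "min") for c in sample_clusters]
centers = [boundary(c, "center") for c in sample_clusters]
```

The `"center"` mode returns an observed time code rather than the mean itself, so the division point always coincides with a real transition in the recognition meta.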

FIG. 10 is a diagram showing an example in which the start of each sequence is set to the minimum value. Each number shown in FIG. 10 is assumed to be a time code for the start or end point of some recognition meta (the appearance of a person, a sound source, a situation meta, and so on). When the start points of the sequences are determined in this way, the content can be divided into sequences as shown in FIG. 10.

The service providing unit 259 may provide a designated service based on the section classification information stored in the sequence information 243. For example, the service providing unit 259 may provide an API so that other application services can use the sequence information 243, which the sequence extraction unit 257 defined through comprehensive orchestration of the various recognition meta information. The service providing unit 259 may also present the input values and result values used by the sequence extraction unit 257 as a screen or as data. Alternatively, the service providing unit 259 may use the sequence information 243 to insert an intermediate advertisement or additional information into specific video content, and provide the video content with the inserted intermediate advertisement to the terminal device 100 in response to a request from the terminal device 100.

FIG. 11 is a diagram showing an example of a method for processing section classification information using recognition meta information according to an embodiment of the present invention.

Referring to FIG. 11, in the method for processing section classification information according to an embodiment of the present invention, the processor 250 of the service device 200 may check in step 1101 whether video content is being collected. If no video content is collected, the processor 250 may perform the corresponding function in step 1103. For example, the processor 250 may support connection of the terminal device 100 and provide designated video content to the terminal device 100 at its request. The video content provided to the terminal device 100 may include video content into which at least one piece of additional information has been inserted according to the sequence information 243. The additional information may include, for example, a designated video advertisement. For video content collection, the processor 250 may form a communication channel with the content providing server 300 at regular intervals or upon a notification from the content providing server 300 (e.g., a notice that new video content has been registered), and collect video content from the content providing server 300.

When video content is collected in step 1101, the processor 250 may extract meta information in step 1105. To do so, the processor 250 may extract various recognition meta information from the video content using a plurality of recognition technologies. For example, the processor 250 may use face recognition technology to extract recognition meta related to at least one specific person, object recognition technology to extract recognition meta related to at least one specific object or animal, or sound source recognition technology to extract recognition meta related to a sound source applied to at least part of the video content. For these operations, the processor 250 may divide the frames of the video content into recognition units according to each recognition technology.

In step 1107, the processor 250 may integrate the meta information. For example, for each extracted recognition meta, the processor 250 may merge recognition meta information that is continuous within a designated time range and treat it as one integrated recognition meta. For instance, when a specific person repeatedly appears and leaves within a certain time range, that time range may be treated as one integrated recognition meta.

In step 1109, the processor 250 may perform clustering and sequence extraction. The processor 250 may perform the clustering according to the requested number of sequences. In this operation, the processor 250 may detect changes in the recognition meta and assign rankings according to the degree of recognition meta change. The processor 250 may then check the rankings and extract sequences according to the requested number of sequences. For example, when three sequences are requested, the processor 250 may select the clusters ranked in the top three. The number of sequences may be received, for example, from the content providing server 300 that provided the video content or from an advertiser device.
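The rank-and-select step can be sketched as follows, assuming that the "degree of change" of a cluster is measured simply by how many change points it contains; the patent does not define the measure, so this scoring rule and the name `top_clusters` are assumptions for illustration.

```python
from typing import List


def top_clusters(clusters: List[List[float]], n: int) -> List[List[float]]:
    """Keep the n clusters with the most change points, returned in time order."""
    non_empty = [c for c in clusters if c]
    # Assumed ranking rule: more change points in a cluster means a higher rank.
    ranked = sorted(non_empty, key=len, reverse=True)
    return sorted(ranked[:n], key=min)


# Hypothetical clusters of time codes (seconds); three sequences are requested.
candidates = [[10, 12], [100, 101, 103, 105], [300], [500, 502, 504], [700, 701, 702, 703, 704]]
selected = top_clusters(candidates, 3)
```

A different scoring rule, for example one that weights certain meta types more heavily, could be substituted by changing the `key` used for ranking.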

In step 1111, the processor 250 may generate section division information. For example, the processor 250 may obtain, as the section division information, the time points that separate the clusters selected according to the requested number of sequences. Once the section division information is obtained, the processor 250 may store it in step 1113.
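One way to picture step 1111 is the sketch below, which derives a dividing time point between each pair of adjacent selected clusters. The choice of the midpoint between the end of one cluster and the start of the next is an assumption made for illustration; the specification only states that time points separating the selected clusters are obtained. The function name `division_points` is hypothetical, and each cluster is assumed to be a non-empty list of times in seconds.

```python
from typing import List


def division_points(clusters: List[List[float]]) -> List[float]:
    """Return one dividing time between each pair of adjacent clusters.

    Assumption: the divider is the midpoint between the last time of one
    cluster and the first time of the next.
    """
    ordered = sorted(clusters, key=min)
    return [(max(a) + min(b)) / 2 for a, b in zip(ordered, ordered[1:])]


selected = [[30.0, 31.0, 33.0], [300.0, 302.0, 303.0, 305.0], [600.0, 601.0]]
print(division_points(selected))  # [166.5, 452.5]
```

Three selected clusters yield two dividing time points, which could then be stored as the section division information.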

In step 1115, the processor 250 may check whether an event related to terminating the section division information processing has occurred. If no such event has occurred, the processor 250 may branch back to before step 1101 and repeat the subsequent operations. If such an event occurs, the processor 250 may end the sequence extraction and section division information generation for the video content.

As another example, the processor 250 of the service device 200 may receive the number of sections and advertisement material from an external electronic device (e.g., the content providing server 300 or an advertisement providing server). When the advertisement material is received, the processor 250 may insert it into the video content according to the sequence information.
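The insertion of advertisement material according to sequence information might look like the following sketch. It assumes that the sequence information is available as a list of dividing time points in seconds and that each advertisement is identified by a file name. Both the function `schedule_ads` and the file names are hypothetical.

```python
from typing import List, Tuple


def schedule_ads(points: List[float], ads: List[str]) -> List[Tuple[float, str]]:
    """Pair each advertisement with one dividing time point, in time order.

    If there are more ads than points (or the reverse), the extras are unused.
    """
    return list(zip(sorted(points), ads))


plan = schedule_ads([452.5, 166.5], ["ad_a.mp4", "ad_b.mp4", "ad_c.mp4"])
print(plan)  # [(166.5, 'ad_a.mp4'), (452.5, 'ad_b.mp4')]
```

A real service would also need to decide what to do with leftover advertisements; this sketch simply ignores them.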

As described above, the present invention can provide a method of automatically detecting sequences by using the various meta recognized in video content to find contextually natural points. By configuring a contextually natural sequence detection system, the present invention supports natural mid-roll advertisement insertion, or services suited to such division, without requiring a person to find sequences by hand.

Conventional automatic sequence detection technologies either provided sequences simply as bundles of scene/shot information using image recognition alone, or relied on detecting the amount of change within the image. In contrast, the present invention combines multiple kinds of metadata efficiently and, rather than using them directly as information for distinguishing a specific scene/shot, dynamically detects points where the input values change frequently and forms them into sequence clusters. The present invention can thereby provide a method and device for configuring a service that performs natural sequence detection.

The method according to the present invention may be implemented as software readable by various computer means and recorded on a computer-readable recording medium. Here, the recording medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the recording medium may be specially designed and configured for the present invention, or may be known and available to those skilled in computer software. For example, the recording medium includes magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROM (Compact Disk Read Only Memory) and DVD (Digital Video Disk); magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM (Random Access Memory), and flash memory. Examples of program instructions include not only machine language code such as that produced by a compiler, but also high-level language code that can be executed by a computer using an interpreter or the like. Such a hardware device may be configured to operate as one or more software modules to perform the operations of the present invention, and vice versa.

In addition, implementations of the functional operations and subject matter described in this specification may be implemented in other types of digital electronic circuitry, or in computer software, firmware, or hardware, including the structures disclosed in this specification and their structural equivalents, or in combinations of one or more of them. Implementations of the subject matter described in this specification may be implemented as one or more computer program products, that is, one or more modules of computer program instructions encoded on a tangible program storage medium for execution by, or to control the operation of, a device according to the present invention. The computer-readable medium may be a machine-readable storage device, a machine-readable storage substrate, a memory device, a composition of matter affecting a machine-readable propagated signal, or a combination of one or more of them.

Moreover, although this specification contains many specific implementation details, these should not be construed as limitations on the scope of any invention or of what may be claimed, but rather as descriptions of features that may be specific to particular embodiments of particular inventions. Certain features that are described in this specification in the context of separate embodiments may also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment may also be implemented in multiple embodiments separately or in any suitable subcombination. Furthermore, although features may be described as operating in certain combinations and even initially claimed as such, one or more features from a claimed combination may in some cases be excluded from the combination, and the claimed combination may be changed to a subcombination or a variation of a subcombination.

Likewise, although operations are depicted in the drawings in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results. In certain cases, multitasking and parallel processing may be advantageous. Moreover, the separation of various system components in the embodiments described above should not be understood as requiring such separation in all embodiments, and it should be understood that the described program components and systems can generally be integrated together in a single software product or packaged into multiple software products.

Particular embodiments of the subject matter described in this specification have been described. Other embodiments are within the scope of the following claims. For example, the operations recited in the claims can be performed in a different order and still achieve desirable results. As one example, the processes depicted in the accompanying drawings do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In certain implementations, multitasking and parallel processing may be advantageous.

This description sets forth the best mode of the present invention and provides examples to describe the present invention and to enable a person skilled in the art to make and use it. The specification so written does not limit the present invention to the specific terms set forth. Accordingly, although the present invention has been described in detail with reference to the above examples, a person skilled in the art may make alterations, changes, and modifications to these examples without departing from the scope of the present invention.

Therefore, the scope of the present invention should be determined not by the described embodiments but by the claims.

The present invention is applicable to the field of section division of video content.

In particular, by performing section division using the recognition meta of video content, the present invention increases the reliability of sequence division and, on that basis, inserts various additional information into the video content, so that the section division work previously done by hand can be handled more easily and quickly. Accordingly, the present invention can quickly and accurately meet the needs of advertisers and others who need to insert additional information, and can maximize the advertising effect by minimizing viewers' resistance to the additional information.

10: network environment
100: terminal device
150: network
200: service device
210: communication circuit
240: memory
250: processor
251: content collection unit
253: meta information extraction unit
255: meta information integration unit
257: sequence extraction unit
259: service providing unit

Claims

◈ Claim 1 was abandoned when payment of the set registration fee was made.◈

A service device comprising:
a memory;
a communication circuit forming a communication channel with an external electronic device; and
a processor functionally connected to the memory,
wherein the processor is configured to:
acquire video content stored in the memory,
extract recognition meta information from the acquired video content,
cluster detection time information of the extracted recognition meta information,
divide sections according to the distribution of the resulting clusters and then generate section division information, and
when a requested number of sequences is received from the external electronic device through the communication circuit, configure sequence candidates based on the clusters and divide the sections by dynamically designating a median of the clustering according to the requested number of sequences.

◈ Claim 2 was abandoned when payment of the set registration fee was made.◈

The service device of claim 1,
wherein the processor is configured to:
integrate recognition meta information that repeatedly appears within a designated time range to generate integrated recognition meta information, and
perform the clustering on the integrated recognition meta information.

◈ Claim 3 was abandoned when payment of the set registration fee was made.◈

The service device of claim 1,
wherein the processor is configured to:
extract the recognition meta information using a plurality of recognition technologies,
generate integrated recognition meta information for each kind of extracted recognition meta information, and then
set the start and end points of the recognition meta so that similar input metadata within a designated range are processed as input values of one cluster.

◈ Claim 4 was abandoned when payment of the set registration fee was made.◈

The service device of claim 1,
wherein the processor is configured to
check content-related information of the video content and determine the type of recognition meta information to be extracted.

◈ Claim 5 was abandoned when payment of the set registration fee was made.◈

The service device of claim 4,
wherein the processor is configured to
extract sound source recognition meta and person recognition meta when the video content is movie content.

◈ Claim 6 was abandoned when payment of the set registration fee was made.◈

The service device of claim 4,
wherein the processor is configured to
extract recognition meta for a specific object or a specific animal, and sound source recognition meta, when the video content is a documentary.

◈ Claim 7 was abandoned when payment of the set registration fee was made.◈

The service device of claim 1,
wherein the processor is configured to
divide the sections by assigning a weight according to the importance of each piece of recognition meta information.

◈ Claim 8 was abandoned when payment of the set registration fee was made.◈

The service device of claim 1,
wherein the processor is configured to:
rank the clusters, based on the distribution of the clusters, according to the degree of change caused by the appearance of recognition meta, and
select clusters within a certain rank, according to the number of sequences, as the sequences to be used for generating the section division information.

A method of generating section division information using recognition meta information, the method comprising, by a service device:
extracting recognition meta information from video content stored in a memory;
generating integrated recognition meta information by integrating the recognition meta information according to a designated criterion;
clustering detection time information of the integrated recognition meta information; and
dividing sections according to the distribution of the resulting clusters and then generating section division information,
wherein, when a requested number of sequences is received from an external electronic device,
the generating of the section division information comprises
configuring sequence candidates based on the clusters and dynamically designating a median of the clustering according to the requested number of sequences.

The method of claim 9,
further comprising ranking the clusters, based on the distribution of the clusters, according to the degree of change caused by the appearance of recognition meta.

delete

A computer recording medium storing at least one instruction executable by a processor,
wherein the at least one instruction
is configured to cause a service device to perform:
extracting recognition meta information from video content stored in a memory;
generating integrated recognition meta information by integrating the recognition meta information according to a designated criterion;
clustering detection time information of the integrated recognition meta information; and
dividing sections according to the distribution of the resulting clusters and then generating section division information,
wherein the generating of the section division information comprises
configuring sequence candidates based on the clusters and dividing the sections by dynamically designating a median of the clustering according to the requested number of sequences.