KR102423760B1

KR102423760B1 - System and method for video segmentation based on events

Info

Publication number: KR102423760B1
Application number: KR1020170014815A
Authority: KR
Inventors: 함경준; 박재혁; 조기성
Original assignee: 한국전자통신연구원
Priority date: 2017-02-02
Filing date: 2017-02-02
Publication date: 2022-07-22
Also published as: KR20180089977A

Abstract

The present invention relates to a video event unit segmentation system and method and, more particularly, to a system and method that segment a relay video into event units using text relay information and video information.
The video event unit segmentation system according to the present invention comprises an information collection unit that receives a relay video and text relay information and extracts video information, an event section identification unit that identifies event sections within the video, and a video service unit that serves video segments generated on the basis of the event sections.

Description

Video event unit segmentation system and method

The present invention relates to a video event unit segmentation system and a method therefor and, more particularly, to a system and method for segmenting a relay video into event units using text relay information and video information.

With the expansion of the IPTV and OTT markets, a vast amount of content is being provided to viewers; for sports broadcasts in particular, most domestic games and the major overseas games are available.

Because the time a viewer can spend consuming content is limited relative to this vast amount of sports video, the average reusability of sports content tends to be lower than that of other content types such as dramas and documentaries.

In addition, viewers want to select and watch only the plays of a desired team or player, but one-way broadcast programming makes such a personalized service impossible.

Recently, services that edit and provide the main scenes of games in the sport a viewer wants have been offered, mainly by large portals. However, because people intervene directly to edit and tag the video event by event, the range of content provided is often concentrated on particular players or particular games.

The present invention has been proposed to solve the above problems. Its object is to provide an event-unit video segmentation system and method that automatically perform semantic indexing on a relay video so that viewers can select and watch a desired player or event type.

The video event unit segmentation system according to the present invention comprises an information collection unit that receives a relay video and text relay information and extracts video information, an event section identification unit that identifies event sections within the video, and a video service unit that serves video segments generated on the basis of the event sections.

The video event unit segmentation method according to the present invention comprises the steps of receiving a relay video and text relay information, extracting video information from the relay video, and using the video information to identify the event section corresponding to an event included in the text relay information and generate a video segment.

The video event unit segmentation system and method according to an embodiment of the present invention improve the accuracy with which event video sections are identified by using multiple features: not only the shot class and whether a frame belongs to a replay section, but also camera panning information, whether the game is in progress, whether the game time is displayed in the video, and the time remaining until the shot class changes from its current value.

According to the present invention, semantic index information is generated for a video, so that in a media service environment content can be selected and provided by the player, team, or event type that a viewer prefers. For example, a highlight video of a game can be generated automatically by filtering only the scoring scenes.

The effects of the present invention are not limited to those mentioned above, and other effects not mentioned will be clearly understood by those skilled in the art from the following description.

FIG. 1 is a diagram showing the configuration of a video event unit segmentation system according to an embodiment of the present invention.
FIG. 2 is a diagram showing text relay information according to an embodiment of the present invention.
FIG. 3 is a diagram showing a rule-based event section identification process according to an embodiment of the present invention.
FIG. 4 is a diagram showing training data for learning-model-based event section identification according to an embodiment of the present invention.
FIG. 5 is a flowchart showing a video event unit segmentation method according to an embodiment of the present invention.

The above and other objects, advantages, and features of the present invention, and the methods of achieving them, will become apparent from the embodiments described in detail below in conjunction with the accompanying drawings.

However, the present invention is not limited to the embodiments disclosed below and may be implemented in various different forms. The following embodiments are provided only to make the object, configuration, and effects of the invention readily understood by those of ordinary skill in the art to which the invention pertains, and the scope of the invention is defined by the claims.

The terminology used herein is for describing the embodiments and is not intended to limit the present invention. In this specification, the singular form also includes the plural unless the phrase specifically states otherwise. As used herein, "comprises" and/or "comprising" do not exclude the presence or addition of one or more components, steps, operations, and/or elements other than those mentioned.

Before the preferred embodiments of the present invention are described, the background against which the invention was proposed is reviewed first to aid the understanding of those skilled in the art.

Highlight content generation according to the prior art proposes analyzing the game video and extracting its main parts. As described above, however, a person intervenes directly to edit and tag the video event by event, so a personalized video service for the events a viewer wants cannot be provided, and the viewer can watch only the highlight videos created and provided by the service provider.

The present invention has been proposed to solve the above problems. It proposes a video event unit segmentation system and method that automatically perform semantic indexing on a sports relay video so that, when a viewer requests a desired player or event type, the corresponding video segments can be extracted and served.

In general, when an event such as a shot occurs, the point in the video that corresponds to the event is the single scene in which the ball enters the goal, but the event video that interests a viewer is one that includes the process leading up to the event and a certain amount of footage after it.

If scenes with little relevance to the event are included in the event video, user satisfaction drops sharply, so it is important to identify the exact start and end points of each event when providing event videos. Embodiments of the present invention proposed to achieve this technical task are described in detail below.

FIG. 1 is a diagram showing the configuration of a video event unit segmentation system according to an embodiment of the present invention.

The video event unit segmentation system according to an embodiment of the present invention extracts the time and content of each event from the text relay, finds the corresponding point in the relay video, and then identifies and extracts the event section, thereby automatically generating event-unit video clips and serving them to viewers.

The video event unit segmentation system according to an embodiment of the present invention includes an information collection unit 310 that receives a relay video and text relay information and extracts video information, an event section identification unit 320 that identifies event sections in the video using the text relay information and the video information, and a video service unit 330 that serves video segments generated on the basis of the event sections.

The information collection unit 310 according to an embodiment of the present invention includes a video information extraction unit 311 and a text relay information preprocessing unit 312, receives the relay video from the video streaming capture unit 100, and receives the text relay information from the real-time crawling unit 200.

The information collection unit 310 according to an embodiment of the present invention extracts at least one of time information, shot class information, and camera panning information from the relay video. Preferably, it extracts the time information and the shot class (court view, close-up) using a convolutional neural network (CNN), and extracts the camera panning information using optical flow estimation and flow segmentation algorithms.

The information collection unit 310 according to an embodiment of the present invention further processes the extracted information to construct additional features. That is, it generates as features at least one of whether the game is in progress, whether the game time is displayed in the video, and the time remaining until the shot class changes, and includes them in the extracted video information.

The video service unit 330 according to an embodiment of the present invention stores the video segments generated on the basis of the event sections as sub video clip files, stores the event information for each video clip file as metadata to build a database, and provides an event-unit service to a viewer terminal 400 such as a TV or tablet according to the viewer's selection.

As shown in FIG. 2, the text relay information received by the information collection unit 310 according to an embodiment of the present invention includes the time at which each event occurred and which player performed which action (in basketball: scoring, rebound, assist, foul, substitution, and so on).

The information collection unit 310 according to an embodiment of the present invention extracts video information from the relay video so that the event section identification unit 320, described later, can accurately identify the start and end points of each event.

The most basic information for accurately locating the video section of an event is the game time.

That is, as the example of text relay information in FIG. 2 shows, every event has an occurrence time, so the point at which that time appears on screen can be found by applying digit recognition to the video.

However, viewers tend to want to see how the event unfolded and a certain amount of footage after it, so the process leading up to the event and the footage after it must also be included in the event section. Because user satisfaction falls when scenes with little relevance to the event are included, accurately identifying the start and end points of the event is an important task.

Moreover, the event time in the text relay deviates from the actual time of occurrence in the game by about 1 to 3 seconds, and the start and end points differ from event to event.

Therefore, according to an embodiment of the present invention, the general production rules of sports broadcasts are used to find event sections automatically.

For example, in a basketball game, after most shot or goal events the camera switches from a court-view shot to a close-up shot of the player involved.

Many events also occur at either end of the court, where the goals are, and at those moments camera panning (a technique of moving the camera to follow the speed or direction of a moving subject) is absent or slow.

In addition, when the game clock stops or disappears, play has been interrupted, so it is desirable to filter out those portions when selecting the event section.

According to an embodiment of the present invention, rule-based event section identification is combined with event section identification using a machine learning algorithm (e.g., CRFs), and the broadcast production rules described above are naturally absorbed into the learning model.

FIG. 3 is a diagram showing the rule-based event section identification process according to an embodiment of the present invention, and FIG. 4 is a diagram showing training data for learning-model-based event section identification according to an embodiment of the present invention.

The event section identification unit 320 according to an embodiment of the present invention identifies event sections in the video on the basis of rules and a learning model, using weights and average values for the event start and end points.

The learning model is built using the video information extracted from the relay video as features; this is described later in the detailed description of FIG. 4. The rule-based event section identification process is described first, with reference to FIG. 3.

The event section identification unit 320 according to an embodiment of the present invention uses the text relay information to find the point in the relay video that corresponds to the game time given in the text relay, while taking errors in the text relay information into account.

As described above, the event time in the text relay usually runs about 1 to 3 seconds behind the actual time in the game; for example, an event recorded at 11 minutes 40 seconds in the text relay usually occurs at 11 minutes 42 to 44 seconds in the actual video.

The event section identification unit 320 according to an embodiment of the present invention allows for this error in the game time of the text relay information when it recognizes the point of that game time in the relay video.
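The time-matching step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the embodiment: the game clock is assumed to have been read from each frame already (for example by digit recognition) and converted to seconds, with -1 where no clock is visible. The function name `find_recognition_frame` and the default offsets of 2 to 4 seconds, taken from the 11:40 example, are hypothetical.

```python
from typing import Optional, Sequence


def find_recognition_frame(
    frame_clocks: Sequence[int],
    relay_clock: int,
    offsets: Sequence[int] = (2, 3, 4),
) -> Optional[int]:
    """Return the index of the first frame whose on-screen clock equals the
    text relay time shifted by one of the allowed offsets, or None."""
    # Clock values (in seconds) that are accepted as a match.
    candidates = {relay_clock + off for off in offsets}
    for index, clock in enumerate(frame_clocks):
        # -1 marks a frame with no visible clock; it can never match.
        if clock != -1 and clock in candidates:
            return index
    return None


# Hypothetical per-frame clock readings at one frame per second.
clocks = [-1, 698, 699, 700, 701, 702, 703, 704, 705]
# A text relay event at 11:40 (700 s) is searched for at 702 s to 704 s.
recognized = find_recognition_frame(clocks, 700)
```

Passing the offsets as a parameter keeps the error model adjustable, since the text gives the deviation only as an approximate range.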

The event section identification unit 320 according to an embodiment of the present invention defines a primary event section by applying preset lead and lag times around the recognized point.

Referring to FIG. 3, the primary event section is defined as running from 15 seconds before to 5 seconds after the recognized point, so that it mainly covers the process leading up to the event and secondarily covers the period after it.
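The primary event section can be expressed in a few lines. The sketch below assumes times in seconds from the start of the video; the function name and the clamping to the video length are illustrative additions that the text does not describe.

```python
from typing import Optional, Tuple


def primary_event_section(
    recognition_sec: float,
    before: float = 15.0,
    after: float = 5.0,
    video_length: Optional[float] = None,
) -> Tuple[float, float]:
    """Return (start, end) of the primary event section in seconds."""
    # The section mainly covers the lead-up, hence the longer `before`.
    start = max(0.0, recognition_sec - before)
    end = recognition_sec + after
    if video_length is not None:
        # Assumption: never extend past the end of the video.
        end = min(end, video_length)
    return start, end


section = primary_event_section(702.0)
```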

The event section identification unit 320 according to an embodiment of the present invention uses the shot class information in the video information to define a secondary event section, which is the primary event section with some portions removed.

That is, using the shot class information extracted for each frame of the primary event section, the event section identification unit 320 removes any close-up shot class frames found at the section boundaries to define the secondary event section.
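A minimal sketch of this boundary trimming is shown below. It assumes a per-frame list of shot class labels using the values "far" and "close", and inclusive frame indices for the section; the function name `trim_close_ups` is hypothetical. Only close-up frames touching a boundary are removed, so a close-up in the middle of the section is kept.

```python
from typing import Optional, Sequence, Tuple


def trim_close_ups(
    shot_classes: Sequence[str], start: int, end: int
) -> Optional[Tuple[int, int]]:
    """Shrink [start, end] (inclusive) while a boundary frame is a close-up.

    Returns None if the whole section consists of close-up frames.
    """
    # Advance the start boundary past leading close-up frames.
    while start <= end and shot_classes[start] == "close":
        start += 1
    # Pull the end boundary back past trailing close-up frames.
    while end >= start and shot_classes[end] == "close":
        end -= 1
    return (start, end) if start <= end else None


shots = ["close", "close", "far", "far", "close", "far", "close"]
secondary = trim_close_ups(shots, 0, 6)
```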

In most sports broadcasts, a far shot is used while play is in progress, and a particular player is sometimes shown in close-up when play is briefly interrupted or an event has occurred.

The event section identification unit 320 according to an embodiment of the present invention takes this broadcast production pattern into account and refines the primary event section into the secondary event section.

The event section identification unit 320 according to an embodiment of the present invention uses the camera panning information in the video information to select a final event section, which is the secondary event section with some portions removed.

Here, if the camera panning information shows that panning increases rapidly near a boundary of the secondary event section, the event section identification unit 320 removes that portion to select the final event section.

In a basketball game, for example, most events occur near one of the two goals, and there is almost no camera panning at those moments.

Panning means that the players have moved from left to right (or vice versa), so removing such portions allows the start and end points of the event to be located more accurately.

In another embodiment, even when camera panning occurs, if the time at which the panning occurs and the time at which the event occurs are within a preset time of each other, the event section identification unit 320 selects the final event section without removing the portion in which panning increases rapidly.

For example, when a fast break succeeds in a basketball game, camera panning increases rapidly, but because the portion of rapid panning and the event time are within the preset time (e.g., 1.5 seconds) of each other, that portion is judged to be a main scene associated with the event rather than an unnecessary scene caused by player movement.
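The panning rule and its fast-break exception can be sketched together. The code below is an illustration under simplifying assumptions: panning is given as one numeric value per frame, "rapid increase" is approximated by the value exceeding a threshold, and the preset time is expressed in frames. The names `trim_panning`, `pan_threshold`, and `keep_within` are hypothetical.

```python
from typing import Sequence, Tuple


def trim_panning(
    panning: Sequence[float],
    start: int,
    end: int,
    event_idx: int,
    pan_threshold: float,
    keep_within: int,
) -> Tuple[int, int]:
    """Shrink [start, end] (inclusive) past boundary frames with strong
    panning, unless they lie within `keep_within` frames of the event."""

    def removable(i: int) -> bool:
        strong = panning[i] > pan_threshold
        # Exception: panning close to the event (e.g. a fast break) is kept.
        near_event = abs(i - event_idx) <= keep_within
        return strong and not near_event

    while start < end and removable(start):
        start += 1
    while end > start and removable(end):
        end -= 1
    return start, end


pans = [9, 8, 1, 1, 1, 7, 1, 9]
final = trim_panning(pans, 0, 7, event_idx=5, pan_threshold=5, keep_within=1)
```

With `keep_within=2` the last frame lies close enough to the event to be kept, which mirrors the fast-break case described above.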

The learning-model-based event section identification process is described below with reference to FIG. 4.

According to an embodiment of the present invention, the video information extracted from the relay video (time information, shot class information, camera panning information, and so on) is used as features for building the learning model.

As additional features, whether the game is in progress, whether the game time is displayed in the video, and the time remaining until the shot class changes from its current value are used to construct the training data.

FIG. 4 shows the training data for a single frame used to train the CRFs, the machine learning algorithm according to an embodiment of the present invention.

Referring to FIG. 4, time is the time information of the current frame: -1 means that no game time is present, and ##:## is the game time.

panning is a numerical value of the degree of camera panning, obtained by comparing the current frame with the next frame.

shotcls is the shot class of the current frame: far denotes a far shot and close denotes a close-up shot.

board indicates whether the game time is displayed in the current frame.

isStop indicates whether the game clock has stopped, determined by comparing the current frame with the frame one second later.

close2far is the length of video until a far-shot frame appears when the current frame is a close-up, and far2close is the length of video until a close-up frame appears when the current frame is a far shot.

An item prefixed with '-#' carries the information of the #th preceding frame, and an item prefixed with '#' carries the information of the #th following frame.

The output sequence indicates whether the current frame belongs to an event section: none denotes ordinary video and yes denotes an event section.
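The feature set described above can be assembled per frame roughly as follows. This is a sketch, not the format of FIG. 4: the exact encoding is not given in the text, so the dictionary layout, the ':' separator after the '-#' and '#' prefixes, the use of frame counts for close2far and far2close, and the function names are all assumptions.

```python
from typing import Dict, List, Union

Feature = Union[str, float, int, bool]


def frames_until(shot_classes: List[str], i: int, target: str) -> int:
    """Number of frames after frame i until `target` first appears, or -1."""
    for j in range(i + 1, len(shot_classes)):
        if shot_classes[j] == target:
            return j - i
    return -1


def frame_features(
    times: List[str],
    shot_classes: List[str],
    pannings: List[float],
    i: int,
    fps: int = 1,
    window: int = 1,
) -> Dict[str, Feature]:
    """Features of frame i plus those of its neighbours within `window`."""
    feats: Dict[str, Feature] = {}
    n = len(times)
    for off in range(-window, window + 1):
        j = i + off
        if not 0 <= j < n:
            continue  # no neighbouring frame on this side of the video
        prefix = "" if off == 0 else f"{off}:"  # '-1:' previous, '1:' next
        later = j + fps  # index of the frame one second after frame j
        shot = shot_classes[j]
        feats[prefix + "time"] = times[j]
        feats[prefix + "panning"] = pannings[j]
        feats[prefix + "shotcls"] = shot
        feats[prefix + "board"] = times[j] != "-1"
        feats[prefix + "isStop"] = (
            times[j] != "-1" and later < n and times[later] == times[j]
        )
        feats[prefix + "close2far"] = (
            frames_until(shot_classes, j, "far") if shot == "close" else -1
        )
        feats[prefix + "far2close"] = (
            frames_until(shot_classes, j, "close") if shot == "far" else -1
        )
    return feats


times = ["11:42", "11:43", "11:43", "-1"]
shots = ["far", "far", "close", "far"]
pans = [0.1, 0.2, 0.0, 0.5]
row = frame_features(times, shots, pans, 1)
```

One such dictionary per frame, paired with a yes or none label, would form one position of a training sequence for a sequence labeller such as a CRF.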

CRFs are undirected graphical models trained to maximize a conditional probability. A linear-chain CRF with parameters Λ={λ,…} is defined, as in [Equation 1] below, as the conditional probability of the label random variable y given an input sequence x.
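For reference, the standard definition of a linear-chain CRF, which is consistent with the symbols used here, can be written as below. The feature functions f_k, the sequence length T, and the index k are notation assumed for this sketch; they are not named in the text, and the exact form of [Equation 1] may differ.

```latex
P_{\Lambda}(y \mid x) = \frac{1}{Z_x}
  \exp\!\left( \sum_{t=1}^{T} \sum_{k} \lambda_k \, f_k(y_{t-1}, y_t, x, t) \right),
\qquad
Z_x = \sum_{y'} \exp\!\left( \sum_{t=1}^{T} \sum_{k} \lambda_k \, f_k(y'_{t-1}, y'_t, x, t) \right)
```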

Here, Zx is a normalization constant that makes the probabilities of all label sequences for the input data sequence sum to 1.

As described above, the event section identification unit 320 according to an embodiment of the present invention combines rule-based and learning-model-based event section identification. It identifies the final event section by applying weights and average values to the start and end points of the event sections identified by the two approaches, thereby increasing identification accuracy.
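One plausible reading of "weights and average values" is a weighted average of the two sets of boundaries. The sketch below illustrates that reading only; the text does not specify the weighting scheme, and the function name and the single `rule_weight` parameter are assumptions.

```python
from typing import Tuple

Section = Tuple[float, float]


def combine_sections(
    rule_section: Section, model_section: Section, rule_weight: float = 0.5
) -> Section:
    """Weighted average of the start and end points of two event sections."""
    if not 0.0 <= rule_weight <= 1.0:
        raise ValueError("rule_weight must be between 0 and 1")
    model_weight = 1.0 - rule_weight
    start = rule_weight * rule_section[0] + model_weight * model_section[0]
    end = rule_weight * rule_section[1] + model_weight * model_section[1]
    return start, end


# Equal weights reduce to the plain average of the two estimates.
combined = combine_sections((10.0, 30.0), (12.0, 28.0))
```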

FIG. 5 is a flowchart showing a video event unit segmentation method according to an embodiment of the present invention.

The video event unit segmentation method according to an embodiment of the present invention includes a step (S100) of receiving a relay video and text relay information, a step (S200) of extracting video information from the relay video, and a step (S300) of using the video information to identify the event section corresponding to an event included in the text relay information and generating a video segment.

In step S100, the relay video is received from the video streaming capture unit and the text relay information is received through real-time crawling.

In step S200, at least one of time information, shot class information, and camera panning information is extracted, and the extracted information is processed to generate as features at least one of whether the game is in progress, whether the game time is displayed in the video, and the time remaining until the shot class changes; together these form the video information.

In step S300, event sections are identified on the basis of rules and of the learning model built using the video information extracted in step S200 as features, and the final event section is selected by combining the rule-based and learning-model-based identifications.

In step S300, the point in the relay video that corresponds to the game time in the text relay is found, taking errors in the text relay information into account, and the event section is identified relative to that point.

Because the event time in the text relay differs from the actual time in the game by about 1 to 3 seconds, step S300 recognizes the point at which the event occurred with this error taken into account.

In step S300, a primary event section is defined by applying preset lead and lag times (e.g., 15 seconds before and 5 seconds after) around the recognized event point, and the shot class information is then used to define a secondary event section, which is the primary event section with the portions containing the close-up shot class removed.

In step S300, the camera panning information is further used to select a final event section, which is the secondary event section with any portion removed in which camera panning near a section boundary increases more rapidly than a preset value.

In step S300, the video segments generated from the event identification results are stored as sub video clip files so that an event-unit service can be provided according to the viewer's selection, and the event information for each video clip file is stored as metadata to build a database.
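The clip database and viewer-side selection can be illustrated with Python's built-in sqlite3 module. Everything specific here is hypothetical: the table layout, the column names, the file names, and the player and team labels are placeholders, since the text does not describe a schema.

```python
import sqlite3
from typing import List, Optional


def build_clip_db(clips) -> sqlite3.Connection:
    """Create an in-memory database of clip files and their event metadata."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE clips (path TEXT, player TEXT, team TEXT, "
        "event_type TEXT, start_sec REAL, end_sec REAL)"
    )
    conn.executemany("INSERT INTO clips VALUES (?, ?, ?, ?, ?, ?)", clips)
    conn.commit()
    return conn


def find_clips(
    conn: sqlite3.Connection,
    player: Optional[str] = None,
    event_type: Optional[str] = None,
) -> List[str]:
    """Return clip paths matching the viewer's selection, in play order."""
    # A filter passed as None is ignored via the `? IS NULL` test.
    query = (
        "SELECT path FROM clips "
        "WHERE (? IS NULL OR player = ?) AND (? IS NULL OR event_type = ?) "
        "ORDER BY start_sec"
    )
    rows = conn.execute(query, (player, player, event_type, event_type))
    return [row[0] for row in rows.fetchall()]


conn = build_clip_db([
    ("g1_0001.mp4", "Player A", "Team X", "score", 687.0, 707.0),
    ("g1_0002.mp4", "Player B", "Team Y", "rebound", 720.0, 735.0),
    ("g1_0003.mp4", "Player A", "Team X", "foul", 800.0, 812.0),
])
highlights = find_clips(conn, event_type="score")
```

Filtering on the event type "score" corresponds to the automatic highlight generation mentioned among the effects of the invention.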

The present invention has been described above with a focus on its embodiments. Those of ordinary skill in the art to which the present invention pertains will understand that it may be implemented in modified forms without departing from its essential characteristics. The disclosed embodiments should therefore be considered in an illustrative rather than a restrictive sense. The scope of the present invention is set forth in the claims rather than in the foregoing description, and all differences within a scope equivalent thereto should be construed as being included in the present invention.

100: video streaming capture unit 200: real-time crawling unit
310: information collection unit 311: image information extraction unit
312: text relay information preprocessing unit 320: event section identification unit
330: video service unit 400: viewer terminal

Claims

1. A video event unit segmentation system comprising:
an information collection unit that receives a relay video and text relay information and extracts image information;
an event section identification unit that identifies an event section in the video by using the text relay information and the image information; and
a video service unit that services a video segment generated based on the event section,
wherein the event section identification unit identifies a final event section by using camera panning information for the event section to remove a portion according to the degree to which the camera panning increases near the section boundary, but, when the time at which the camera panning occurs and the time at which the event occurs are within a preset time of each other, selects the final event section without removing that portion.

2. The video event unit segmentation system of claim 1, wherein the information collection unit receives the relay video from a video streaming capture unit and receives the text relay information from a real-time crawling unit.

3. The video event unit segmentation system of claim 1, wherein the information collection unit extracts at least one of time information, shot class information, and the camera panning information from the relay video.

4. The video event unit segmentation system of claim 3, wherein the information collection unit extracts the image information by processing the extracted information to generate, as a feature, at least one of whether the game is in progress, whether the game time is displayed in the video, and the time elapsed at the point at which the shot class changes.

5. The video event unit segmentation system of claim 1, wherein the event section identification unit identifies the event section in the video based on rules and a learning model, using weights and average values for the event start point and end point, and the learning model is built using the image information extracted from the relay video as features.

6. The video event unit segmentation system of claim 1, wherein the event section identification unit, in consideration of errors in the text relay information, finds the point in the relay video that corresponds to the game time in the text relay and defines the event section based on that recognition point.

7. The video event unit segmentation system of claim 6, wherein the event section identification unit selects the final event section from the event section by using the image information.

8. The video event unit segmentation system of claim 7, wherein the event section identification unit defines a first event section by applying preset times before and after the recognition point, defines a second event section in which a partial region of the first event section is removed using shot class information, and selects the final event section in which a partial region of the second event section is removed using the camera panning information.

9. The video event unit segmentation system of claim 8, wherein the event section identification unit defines the second event section by removing, from the first event section, the portion containing the close-up shot class.

10. The video event unit segmentation system of claim 8, wherein the event section identification unit selects the final event section by removing, from the second event section, a portion near the section boundary in which the camera panning increases more rapidly than a preset value.

11. The video event unit segmentation system of claim 1, wherein the video service unit stores the video segment generated based on the event section as a sub-video clip file, builds a database by storing event information for the video clip file as metadata, and provides an event unit service according to a viewer's selection.

12. A video event unit segmentation method performed by a video event unit segmentation system, the method comprising:
(a) receiving a relay video and text relay information;
(b) extracting image information from the relay video; and
(c) identifying, using the image information, an event section corresponding to an event included in the text relay information, and generating a video segment,
wherein, in step (c), a final event section is identified by using camera panning information for the event section to remove a portion according to the degree to which the camera panning increases near the section boundary, but, when the time at which the camera panning occurs and the time at which the event occurs are within a preset time of each other, the final event section is selected without removing that portion.

13. The video event unit segmentation method of claim 12, wherein step (a) receives the text relay information through real-time crawling.

14. The video event unit segmentation method of claim 12, wherein step (b) extracts at least one of time information, shot class information, and the camera panning information, and extracts the image information by processing the extracted information to generate, as a feature, at least one of whether the game is in progress, whether the game time is displayed in the video, and the time elapsed at the point at which the shot class changes.

15. The video event unit segmentation method of claim 12, wherein step (c) identifies the event section based on rules and a learning model built using the image information extracted from the relay video as features.

16. The video event unit segmentation method of claim 15, wherein step (c), in consideration of errors in the text relay information, finds the point in the relay video that corresponds to the game time in the text relay and identifies the event section based on that point.

17. The video event unit segmentation method of claim 15, wherein step (c) defines a first event section by applying preset times before and after the event occurrence recognition point, defines a second event section in which a partial region of the first event section is removed using shot class information, and selects the final event section in which a partial region of the second event section is removed using the camera panning information.

18. (Deleted)

19. The video event unit segmentation method of claim 12, wherein step (c) stores the video segment generated using the event identification result as a sub-video clip file and provides an event unit service according to a viewer's selection.

20. The video event unit segmentation method of claim 19, wherein step (c) builds a database by storing event information for the video clip file as metadata.