KR20060136413A

KR20060136413A - Replay of media stream from a prior change location

Info

Publication number: KR20060136413A
Application number: KR1020067014999A
Authority: KR
Inventors: 게라르드 홀레만스
Original assignee: 코닌클리케 필립스 일렉트로닉스 엔.브이.
Priority date: 2004-01-26
Filing date: 2005-01-24
Publication date: 2007-01-02

Abstract

A replay option available to a user causes the video stream 30 to move backward sequentially through prior change locations L_N - L₁ of the video stream 30, and then plays the video stream 30 forward from the prior change location selected by the user. The change locations of the video stream that occur prior to the current play position T of the video stream 30 are either generated in real time or included in the video stream 30. The change locations L_N - L₁ may be speech breaks, shot cuts, or movements of a person or object in the video stream 30.

Description

REPLAY OF MEDIA STREAM FROM A PRIOR CHANGE LOCATION

The present invention relates to the retrieval of video content. In particular, the invention relates to the retrieval and replay of prior portions of a video stream.

There are known methods for video replay. However, these replay techniques are limited. In some systems, the user may enter a specific time stamp from which replay of the video stream is to begin. If the user does not know the specific point in time in the video stream that he or she is interested in replaying, the best that can be entered is an approximation. This may place the user at a location in the video stream before or after the location of interest, which may confuse or frustrate the user. Replay may also begin in the middle of a sentence, which may likewise confuse or frustrate the user. The user's confusion can be exacerbated in systems that do not render the video stream in reverse when returning to the prior location, since such reverse play can give the user visual context regarding the re-start location.

Another video replay feature allows the user to initiate a reverse function, for example via a remote. The play position moves back in time through the video stream until the user disengages the reverse function (e.g., by pressing a "stop" button on the remote). Often such a reverse feature renders the video content in reverse to the user, giving the user some general sense of how far back he or she has moved in the video stream. (Such a reverse function is well known to users of VCRs, who rewind the tape and watch it play in reverse until they arrive at an approximate prior location of interest.) However, such a reverse function is a crude control, and the user is often unable to identify the exact location of interest in the video stream or to stop the reverse function at the location of interest. In addition, no sound is rendered during the reverse function to assist the user. For example, if the user is interested in replaying recent speech, the user must determine the approximate prior location of interest from the video rendered in reverse (e.g., by watching the actors). By the time the user disengages the reverse function, a considerable amount of extra reverse movement in the video stream has often occurred. Restarting the tape may also begin in the middle of a spoken sentence, again leaving the user confused and frustrated. In addition, if the content is not rendered in reverse during the reverse function, the user must guess when to stop and does not know the location at which the video stream will restart.

The above video replay features (and their attendant disadvantages) may be found on video systems that use tape, a hard drive or an optical disc to generate the video stream. Some systems also allow the user to replay a portion of the video stream that has just played by pressing a "jump-back", "repeat", or similar button. This typically stops the current play of the video stream and re-starts it from a fixed amount of time earlier in the video stream. For example, when the user selects a jump-back button (e.g., on a remote), the video stream stops playing, goes back 30 seconds in the video stream and re-starts play. Thus, for a VCR application, pressing the jump-back button causes the tape to rewind 30 seconds of play time and to restart the play function at that location. Similar features are also found in hard drive and optical based video systems.

From the user's point of view, however, such a fixed amount of time has many disadvantages. A fixed amount of time will generally reposition the video stream at a location that is before or after the particular moment in the video stream that interests the user. Such an arbitrary location can disorient, confuse or frustrate the user. For example, the user may have missed a single word of recent dialogue and does not want to replay at least 30 seconds of video. In addition, in some systems the jump-back feature jumps back discretely to the prior location without rendering to the user the video spanning the jump back. Thus, the user may not know where in the video stream the location of interest lies. The user may simply resign himself or herself to the problem, either by letting the video play forward from that location or by jumping back another 30 seconds. In addition, pressing the jump-back button may present a portion of video from a previous shot and may provide an incomplete portion of a previous dialogue. Again, this can confuse the user.

In addition, certain systems, such as hard drive and optical video systems, may allow the user to access a menu that presents the chapters of a video stream. The DVD is a well known example of this kind of option. The user can thus access the menu and replay the video stream from the start of a prior chapter. A chapter, however, is a group of shots that may be created to provide the user with a visual narrative (or table of contents). Chapters are thus another party's subjective grouping of shots. Among other disadvantages, moving to the start of a chapter does not allow users to select the location from which they wish to replay. For example, if the user is only interested in a short amount of replay, such as from when the current speaker began talking, selecting the start of the current chapter may position the user at a location in the video stream long before the location of interest.

In another area, techniques for video browsing are a topic of interest and development. Browsing generally focuses on helping the user determine whether video content is of interest, typically by providing the user with some form of summary of the video content. For example, among others, in Li et al., "Browsing Digital Video", Proceedings of ACM CHI'00 (The Hague, The Netherlands, April 2000), ACM Press, pp. 169-176, the user is provided with a video index that includes shot boundary frames. According to Li, the shot boundary frames may be generated by a detection algorithm that records their locations in the index. As the video stream plays, the shot boundary frame for the current shot is highlighted, and the user can select a different part of the video by clicking another shot boundary frame in the index. Because the shot boundary index is complete for the entire video, the user can move forward or backward from the current location.

Similarly, Van Houten et al., "Video Browsing & Summarisation" (copyright 2000, Telematica Instituut, TI ref: TI/RS/2000/163) refers to the use of shots as a storyboard (section 2.3) and again refers to the Li publication (section 2.4.3). Van Houten also mentions the use of speech recognition of dialogue in indexing (section 2.4.1).

The present invention includes a method of detecting or utilizing data that identifies content changes in a video stream that occur prior to the current play position of the video stream. The content changes comprise breaks in the speech in the video (generally referred to below as "speech breaks"). A speech break in the video may be a location where speaking begins after a period of relative silence. The content changes may also include other significant changes of content in the video stream, such as shot cuts in the video. A replay option available to the user causes the video stream to move backward sequentially through the prior content changes in the video stream, and then plays the video stream forward from the location of the prior content change selected by the user.

Thus, in one aspect of the invention, a video stream is received and played for a user by a video display system. As it plays, the video stream is processed in substantially real time to detect speech breaks in the playing video stream. The locations of speech breaks in the video stream prior to the current play position of the video stream are maintained in memory. As the video stream plays, additional speech breaks are detected and their locations in the video stream are added to the memory. If the user engages the replay option, output of the video stream stops and re-starts at the closest prior speech break location. Thus, unlike prior art replay systems, the video is replayed from a location in the video that is coherent to the user.

The user may engage the replay option multiple times, each time causing the video stream to move back one additional speech break in the video stream. Thus, the user can move back to the start of the particular speech break in the video that he or she is interested in replaying. When the user stops engaging the replay option, the video stream restarts play from the location of the selected prior speech break. Again, the user can move back within the video so that playback begins from a specific location in the video, such as a speech break where a person begins speaking.

Other types of prior content changes, such as shot cuts, may also be detected in the video stream. Their locations may be stored together with the detected speech breaks, thereby forming an integrated list of prior change locations. Replay may begin from any one of these prior change locations.
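The integrated list of prior change locations described above can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that each detector reports its locations as an ascending list of time stamps in seconds; the function name `merge_change_locations` and the sample values are hypothetical and do not come from the specification.

```python
import heapq


def merge_change_locations(speech_breaks, shot_cuts):
    """Merge two ascending lists of positions into one ascending list.

    A position reported by both detectors is kept only once.
    """
    merged = []
    for position in heapq.merge(speech_breaks, shot_cuts):
        if not merged or position != merged[-1]:
            merged.append(position)
    return merged


# Hypothetical time stamps in seconds.
speech_breaks = [12.0, 47.5, 63.2]
shot_cuts = [30.0, 47.5, 70.1]

change_locations = merge_change_locations(speech_breaks, shot_cuts)
print(change_locations)  # [12.0, 30.0, 47.5, 63.2, 70.1]
```

Keeping a single ascending list means a replay request can treat a speech break and a shot cut identically when stepping back.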

In another aspect of the invention, the change locations are pre-identified and included as part of the video stream as it is played for the user. As in the case noted above, the user may engage the replay option to restart play of the video stream from a prior change location as identified in the video stream data.

In a further variation of the invention, other prior changes in the video stream, in addition to speech breaks and shot cuts, are made available for replay. For example, changes in the movement of objects and persons may be detected and used as prior locations in the video stream from which replay can begin.

Thus, in general, the invention includes a method of replaying a media stream from a prior location in the media stream, comprising replaying the media stream from a selected one of a number of previously identified content changes in the media stream, wherein the content changes include prior speech breaks in the media stream. The invention also includes a method of replaying a digital media stream from a location in the media stream prior to the current play position T of the media stream. The method includes detecting content change locations in real time as the media stream plays. At least a number of the closest change locations detected prior to play position T are stored. One or more input signals comprising a number m are received, and the m-th closest change location prior to position T in the media stream is retrieved. The media stream is replayed from the m-th closest change location prior to T in the media stream.
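Retrieval of the m-th closest change location prior to T can be sketched in Python as follows. This is only an illustration under stated assumptions: the stored change locations are held in an ascending list, positions are plain numbers (time stamps or frame numbers), and the name `mth_closest_change` is hypothetical.

```python
from bisect import bisect_left


def mth_closest_change(change_locations, t, m):
    """Return the m-th closest stored change location strictly before t.

    change_locations must be in ascending order. Returns None when fewer
    than m change locations lie before t.
    """
    if m < 1:
        raise ValueError("m must be at least 1")
    count_before = bisect_left(change_locations, t)  # locations with value < t
    if m > count_before:
        return None
    return change_locations[count_before - m]


stored = [5, 20, 42, 58]                 # hypothetical positions
print(mth_closest_change(stored, 60, 1))  # 58
print(mth_closest_change(stored, 60, 3))  # 20
print(mth_closest_change(stored, 60, 5))  # None
```

Here m = 1 corresponds to the most recent change location before T, and each further increment of m steps back one more change location.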

In addition, the invention includes a system that replays a media stream from a prior location in the media stream. The system comprises a processor and a memory, the processor receiving one or more input signals that select one of a number of previously identified content changes in the media stream. The processor further retrieves from the memory the location corresponding to the selected content change and initiates replay of the media stream from the selected change location, wherein the identified content changes include prior speech breaks in the media stream.

Still further, there is provided a computer program product embodied in a computer-readable medium for replaying a media stream from a selected prior location in the media stream, the computer program product carrying out the methods of the invention.

FIG. 1 is a representative drawing of a device and system that support the present invention.

FIG. 2 is a representation of prior change locations in a video stream at play position T.

FIG. 3 is a flowchart of an embodiment of the present invention.

FIG. 1 shows a system 10 that operates in accordance with the invention. Video device 20 generates and provides a video stream 30 that is displayed to a user via display 40. Video device 20 may be any of many typical devices, such as a video cassette recorder that plays a tape or a DVD player that plays a disc. Video device 20 may generate the video stream 30 by playing a pre-recorded video tape or DVD inserted therein. Video device 20 may have hard drive storage for storing a video stream, in which case the video stream 30 may be generated by playing a video program stored on the hard drive. Where video device 20 has tape, hard drive, or similar recording capability, the device may receive and record an incoming video stream 30a while it is played as the displayed video stream 30. The incoming stream may be received, for example, over a wired interface (e.g., cable television broadcast, webcast from a server, etc.) or wirelessly (e.g., conventional over-the-air (OTA) television broadcast, satellite television broadcast, or other broadcast via an air interface). In such a device, the displayed video stream 30 may initially be the incoming video stream 30a (e.g., not a stored stream). Once replay is initiated, the displayed stream 30 lags behind the incoming stream 30a and is supplied from the stream stored in memory. Although device 20 is shown as separate from display 40, the two may reside in the same device, such as a TV with an internal hard drive.

The video stream 30 is subjected to real-time internal processing by processor 50. (Although processor 50 is shown as internal to device 20, processor 50 may alternatively be located external to device 20.) Processor 50 is programmed to detect speech breaks in the video stream. There are many known methods that may be used in the invention to detect speech breaks. For example, the received video stream 30 of FIG. 1 may be processed in an audio characterization module of processor 50 to segment the audio portion into categories such as speech and silence. Each frame of the video stream is generally characterized by a set of audio features such as mel-frequency cepstrum coefficients (MFCC), Fourier coefficients, fundamental frequency, bandwidth and the like. (Depending on the format of the video stream, certain pre-processing may be required to extract the audio features.) The audio features are analyzed for features corresponding to human speech parameters following a period of relative silence. A location in the video stream where speaking begins after a period of relative silence is identified and stored by processor 50 as a speech break constituting the start of speech.
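The idea of marking a location where speaking begins after a period of relative silence can be sketched in Python. This sketch deliberately swaps in a plain energy gate for the feature-based speech and silence classification described above: it assumes one energy value per frame is already available, and the function name `find_speech_breaks`, the threshold and the minimum silence length are all hypothetical illustration parameters rather than values from the specification.

```python
def find_speech_breaks(frame_energies, speech_threshold, min_silence_frames):
    """Return frame indices where sound resumes after a run of quiet frames.

    A frame counts as quiet when its energy is below speech_threshold. A
    speech break is recorded at the first non-quiet frame that follows at
    least min_silence_frames consecutive quiet frames.
    """
    breaks = []
    silent_run = 0  # number of consecutive quiet frames seen so far
    for index, energy in enumerate(frame_energies):
        if energy < speech_threshold:
            silent_run += 1
        else:
            if silent_run >= min_silence_frames:
                breaks.append(index)
            silent_run = 0
    return breaks


# Hypothetical per-frame energies: a short pause at frame 7 is ignored,
# while the longer pauses before frames 5 and 13 produce speech breaks.
energies = [0.9, 0.8, 0.1, 0.1, 0.1, 0.7, 0.9, 0.1, 0.8,
            0.1, 0.1, 0.1, 0.1, 0.6]
print(find_speech_breaks(energies, speech_threshold=0.5, min_silence_frames=3))
# [5, 13]
```

An energy gate cannot tell speech from music or noise; a classifier over features such as MFCC would take the place of the simple threshold test in a fuller treatment.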

FIG. 2 represents the locations of speech breaks (e.g., speech start locations) in the video stream 30 as identified by processor 50 in the manner described above. T represents the current play position in the video stream, while points to the left of T represent prior play positions in the video stream. Point O represents the start of the video stream. Points L_N, ..., L₁ represent the locations of the N prior speech breaks in the video stream identified and stored by processor 50 through time T. (The location points L in FIG. 2 are representations of speech break locations in the video stream; the speech break location data actually stored in memory is generally a time stamp, frame number, or similar indication of the break location in the video stream.) For convenience, the representative prior speech break locations L of FIG. 2 are labeled in descending order, from the oldest (L_N) to the most recent (L₁) relative to the current play time T. Of course, as play proceeds, new speech breaks are detected after location L₁ and their locations are stored in memory. FIG. 2, however, generally represents the N total prior change locations detected and stored through any given time T of the video stream.

Thus, L_N represents the first speech break location in the video stream and L₁ represents the most recent speech break location in the video stream 30 through play time T. Accordingly, if a person is speaking at time T, location L₁ represents the closest (or most recent) prior speech break location relative to the current play position T in the video stream. The prior location L₂ is the second closest prior location in the video stream at which a person began speaking.

Video device 20 includes a replay feature. When the replay feature is engaged at time T, device 20 accesses the prior speech break locations stored by processor 50 and retrieves the closest prior speech break location L₁. Device 20 stops the current output of the video stream and begins replay from location L₁. By replaying from location L₁, replay begins from the most recent relevant point in the video stream, namely from when the most recent speaker in the video stream began speaking. If the replay feature is engaged twice, replay begins from the second prior speech break location L₂. If the replay feature is engaged "m" times in succession, device 20 retrieves the location of the m-th closest prior speech break L_m before T in the video stream and begins replay of the video stream from that location.

Thus, for example, if device 20 is a VCR, the stored locations of the identified prior speech breaks may be time stamps of frames in the video stream. Device 20 rewinds the tape to the time stamp of the selected prior speech break. If device 20 is, for example, a DVD player and the identified prior speech breaks are stored as tracking data, device 20 moves the laser to the track position of the selected prior speech break and resumes play. If device 20 is a hard drive based system, the prior speech breaks may be identified by the memory address of the corresponding frame of the stored video stream. When a replay command is received, the video stream 30 is output starting at the memory address for the selected prior speech break.

The replay feature may be engaged manually, for example by pressing a button on video device 20, or alternatively by pressing a button on a remote (not shown) that sends an appropriate IR signal to device 20. Alternatively, the replay feature may be engaged by voice activation, gesture recognition or other suitable command input. For example, with voice recognition, the replay feature may be engaged, and moved back one speech break, each time the user speaks the word "replay". A user's gesture may be detected by device 20 using an external camera that captures the user's movements; the captured images may be processed in a subroutine by processor 50 using well-known image detection algorithms to detect an input gesture. (For example, the gesture recognition may use a radial basis function, as described below for detecting motion in the video stream.) Similarly, voice activation may use an external microphone connected to device 20 that captures the user's voice and supplies it to processor 50, which analyzes it for command words using well-known speech recognition processing. (For example, the speech recognition may analyze audio features, such as those described above for detecting speech breaks in the video stream 30.)

Device 20 may preferably render the content of the video stream in reverse on display 40 as it moves from the current position in the video stream to the location of the selected prior speech break. (Such rendering is a standard feature of the manual reverse functions of VCRs and DVD players.) This gives the user a visual frame of reference as to how far back he or she has moved in the video stream. In addition, when the replay feature is engaged and the video stream has moved back to the selected prior speech break, the play feature need not be re-engaged immediately. Instead, the video output on the display may be "frozen" on the first frame of the speech break, allowing the user to determine visually whether this is the desired replay location. If so, the user can press the play button and output of the video stream restarts. If not, the user can press the replay button again. In addition, once the user has moved back at least one prior change location, device 20 may offer a "forward move" feature that, when pressed, moves forward to the next speech break in the video stream. Thus, if the user goes back too far using the replay button, the user can move forward to the desired location.
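The interplay of the replay button, the frozen frame and the forward move feature can be sketched as a small Python state holder. The class name `ReplayCursor` and its methods are hypothetical. The sketch further assumes that positions are plain numbers and that moving forward from the most recent prior change location returns to the original play position, a detail the description leaves open.

```python
class ReplayCursor:
    """Tracks which prior change location the frozen frame currently shows."""

    def __init__(self, change_locations, play_position):
        # Keep only the change locations before the play position, ascending.
        self._locations = sorted(p for p in change_locations if p < play_position)
        self._play_position = play_position
        self._steps_back = 0  # 0 means no replay has been requested yet

    def position(self):
        """Position at which output would restart if play were pressed now."""
        if self._steps_back == 0:
            return self._play_position
        return self._locations[-self._steps_back]

    def replay(self):
        """Step back one change location; stays put at the oldest one."""
        if self._steps_back < len(self._locations):
            self._steps_back += 1
        return self.position()

    def forward(self):
        """Step forward one change location after going back too far."""
        if self._steps_back > 0:
            self._steps_back -= 1
        return self.position()


cursor = ReplayCursor([10, 25, 40, 55], play_position=60)
print(cursor.replay())   # 55
print(cursor.replay())   # 40
print(cursor.replay())   # 25
print(cursor.forward())  # 40
```

Pressing play would then start output from `cursor.position()`, here 40.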

In addition, the processor 50 need not retain the locations of all speech breaks prior to the current play point. A user does not normally replay from a change location that lies far before the current play point. Thus, the processor 50 may store, for example, only the last ten change locations (L₁₀ - L₁ in FIG. 2) relative to the current play point of the video stream. As each new change location is detected in the video stream and added to memory, the oldest change location (that is, the tenth most recent location in the example above) is dropped.
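By way of illustration only, such a bounded store of change locations may be sketched as follows. The class name `ChangeLocationBuffer` and the use of frame numbers as positions are assumptions made for this sketch; the specification does not prescribe a particular data structure.

```python
from collections import deque


class ChangeLocationBuffer:
    """Hypothetical store that keeps only the most recent change locations."""

    def __init__(self, capacity=10):
        # A deque with maxlen drops its oldest entry automatically when full.
        self._locations = deque(maxlen=capacity)

    def add(self, position):
        """Record a newly detected change location (e.g. a frame number)."""
        self._locations.append(position)

    def nth_previous(self, m):
        """Return L_m, the m-th most recent stored location (m = 1 is the latest)."""
        if not 1 <= m <= len(self._locations):
            raise IndexError("m must be between 1 and the number of stored locations")
        return self._locations[-m]

    def __len__(self):
        return len(self._locations)
```

With a capacity of ten, adding an eleventh location discards the oldest one, so `nth_previous(10)` always refers to the tenth most recent change.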

In the particular embodiment described above, speech breaks are detected and compiled concurrently with playback of the video stream. Alternatively, the stream input to or generated by the device 20 may be pre-processed to identify the speech break locations. Thus, for example, if the device 20 is a VCR, the video tape may include a data field that identifies the speech breaks in the video stream as the video stream plays. The device 20 then stores the locations of the speech breaks in a buffer memory as they are identified in the video stream and uses those locations in the replay function as described above. Alternatively, when the replay function is engaged, the device 20 may detect the locations of previous speech breaks from the data field as the tape is rewound. The tape can thus be rewound by a selected number of speech breaks. In another variation, the speech break locations may be included as a data set at the beginning of the tape. The data set is downloaded from the tape to the device 20 before the video stream is output and is used during the replay function to identify the locations of the speech breaks prior to the current position in the video stream. Although the description here focuses on the VCR embodiment, similar variations apply to other kinds of video devices.

FIG. 3 provides a flowchart of the procedure and processing performed in an embodiment of the present invention. In step 100, a video stream is received or generated. In step 110, it is determined whether the received or generated video stream includes data that pre-identifies the speech breaks. If not, the video stream is processed, speech breaks are detected, and the locations of the speech breaks in the video stream are stored in real time, that is, as the video stream plays (step 120). As the video stream is output, the processing monitors whether the replay feature is engaged (step 130). If so, the video stream is replayed from the location of the nearest previous speech break (L₁) or, if the replay feature is engaged m times, from the location of the m-th nearest previous speech break (L_m) (step 140). (The number of times m that the replay feature can be engaged is any integer 1, 2, ... that is less than or equal to the number of stored speech break locations.) Processing then returns to step 120, where the video stream output and the detection of speech breaks continue. (In this case, speech break detection may be disabled until the video stream passes the point from which it was previously replayed, since those breaks have already been detected and stored.) If the replay feature is not engaged in step 130, it is determined in step 150 whether the video stream has ended. If so, processing ends (step 160). If not, processing likewise returns to step 120.

If the speech break data is found in step 110 to be pre-identified within the video data stream, the video stream is output in step 120a. As the video stream is output, the processing monitors whether the replay feature is engaged (step 130a). If so, the video stream is replayed from the location of the nearest previous speech break or, if the replay feature is engaged m times, from the m-th nearest previous speech break (step 140a). This uses the speech break locations included in the video stream of step 120a. Processing then returns to step 120a, where the video stream output continues. If the replay feature is not engaged in step 130a, it is determined in step 150a whether the video stream has ended. If so, processing ends (step 160). If not, processing likewise returns to step 120a.
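By way of illustration only, the two branches of the FIG. 3 flowchart may be simulated as follows. The function `play_with_replay`, its parameters, and the modeling of the stream as integer frame positions are assumptions made for this sketch; replay requests are scripted in advance as a mapping from frame position to the number of presses m.

```python
def play_with_replay(num_frames, replay_requests, is_speech_break=None, preidentified=None):
    """Simulate the FIG. 3 loop and return the frame positions in output order.

    If `preidentified` is given (steps 120a-150a), break positions come with the
    stream; otherwise `is_speech_break(pos)` detects them during playback (step 120).
    """
    detect = preidentified is None                      # step 110
    stored = [] if detect else sorted(preidentified)
    furthest_analyzed = -1
    pending = dict(replay_requests)                     # frame position -> m presses
    output = []
    pos = 0
    while pos < num_frames:                             # steps 150 / 150a: end of stream?
        output.append(pos)                              # steps 120 / 120a: output frame
        if detect and pos > furthest_analyzed:          # skip frames already analyzed
            if is_speech_break(pos):
                stored.append(pos)
            furthest_analyzed = pos
        m = pending.pop(pos, 0)                         # steps 130 / 130a: replay engaged?
        if m:
            prior = [p for p in stored if p <= pos]     # breaks up to the play point
            if prior:
                m = min(m, len(prior))                  # m cannot exceed stored breaks
                pos = prior[-m]                         # steps 140 / 140a: jump to L_m
                continue
        pos += 1
    return output                                       # step 160: processing ends


# Breaks detected at frames 2 and 5; the user presses replay once at frame 7.
frames = play_with_replay(10, {7: 1}, is_speech_break=lambda p: p in {2, 5})
```

In this sketch detection is skipped for frames that were already analyzed before the jump back, mirroring the parenthetical remark about previously replayed portions.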

The devices, systems and methods described above focus on speech breaks as replay points. By replaying from a previous speech break relative to the current play position (T) of the video stream, the video stream is replayed from a natural change location in the audio content, providing the user with a coherent prior segment of audio and video. Other replay locations also provide coherence to the user and may likewise be included as replay locations in the processing of the present invention. Other significant content changes in the video stream that can provide a coherent replay location include scene changes or shot cuts. For example, a user may be momentarily distracted and wish to return to the beginning of the current scene. Thus, the processor 50 of the device 20 in FIG. 1 detects and stores the locations of shot cuts in the video stream. Although in many cases one of the speech breaks occurs at nearly the same time as a shot cut, making both types of change location available as replay points gives the user additional flexibility.

For example, the video stream 30 of FIG. 1 may be further processed by the processor 50 to detect shot cuts in the video stream. "Scene cuts" and "shot cuts" refer to similar concepts and are used interchangeably hereinafter. A scene cut or shot cut typically refers to a substantial change in video content between consecutive frames. (More generally, it refers to a substantial change in video content over a small number of frames, such that the video stream appears to undergo a discrete change in video content.) In other words, consecutive frames that are highly uncorrelated indicate a scene cut or shot cut. The term "shot cut" is used below but is not intended to be limiting.

A typical shot cut involves a change from one setting (location) to another. A shot cut may also involve a change in time, even though the location remains the same. For example, an outdoor shot cut may involve an abrupt change from daylight to night with no change in location, because there is a substantial change in content between consecutive video frames. Another related example of a shot cut uses the same location but involves a change in the view of that location. Well-known examples of shot cuts occur in music videos, where performers may be shown from many different perspectives in rapid succession.

The video stream 30 is thus also subjected to real-time internal processing by the processor 50 to detect shot cuts within the video stream. Many known techniques are available for analyzing a video stream and detecting shot cuts, and these may be used in the present invention. Various techniques that may be used in the present invention detect shot cuts while the video is playing in real time. For example, many techniques rely generally on identifying shot cuts in the video stream by analyzing discrete cosine transform (DCT) coefficients between consecutive frames. When the video stream is compressed, for example according to an MPEG standard, the DCT coefficients can be extracted as the video stream is being decoded (that is, in real time). In general, DCT values for a number of macroblocks of frame pixels are determined and compared for consecutive frames according to one of many available comparison algorithms. When the difference in DCT values between frames exceeds a threshold according to the particular algorithm, a shot cut is indicated. If the video stream is not MPEG encoded, a fast DCT transform may be applied to macroblocks of the received frames, permitting the same real-time processing for shot cut detection.
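By way of illustration only, a threshold comparison of DCT values between consecutive frames may be sketched as follows. The function `detect_shot_cuts`, the representation of each frame as a list of one DCT value per macroblock, the mean-absolute-difference measure, and the threshold value are all assumptions made for this sketch; it is not the comparison algorithm of any particular reference.

```python
def detect_shot_cuts(frames_dct, threshold):
    """Return indices of frames whose DCT values differ sharply from the prior frame.

    frames_dct: list of frames, each a list with one DCT value per macroblock
                (all frames are assumed to have the same number of macroblocks).
    threshold:  mean absolute difference above which a shot cut is indicated.
    """
    cuts = []
    for i in range(1, len(frames_dct)):
        prev, cur = frames_dct[i - 1], frames_dct[i]
        # Mean absolute difference over corresponding macroblocks.
        diff = sum(abs(a - b) for a, b in zip(prev, cur)) / len(cur)
        if diff > threshold:
            cuts.append(i)  # frame i is the first frame of a new shot
    return cuts


frames_dct = [
    [10, 10, 10, 10],
    [11, 10, 9, 10],   # small change: same shot
    [80, 75, 90, 85],  # large change: shot cut
    [81, 76, 89, 85],
]
cuts = detect_shot_cuts(frames_dct, threshold=20)
```

Because each frame is compared only with its predecessor, the check can run as frames are decoded, which is what permits real-time detection.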

An example of such a technique is described in N. Dimitrova, T. McGee & H. Elenbaas, "Video Keyframe Extraction and Filtering: A Keyframe is Not A Keyframe To Everyone", Proceedings of the Sixth International Conference on Information and Knowledge Management (ACM CIKM '97), Las Vegas, NV (Nov. 10-14, 1997), ACM 1997, pp. 113-120, the contents of which are incorporated herein by reference. (See, for example, section 2.1, "Video Cut Detection".)

Accordingly, the processor 50 uses at least one such technique to identify shot cuts in the video stream 30 in real time. The shot cut locations identified in the video stream are stored in sequence together with the speech break locations, as previously described. Locations in the video stream may be identified by frame number, time stamp, and the like. Thus, referring again to FIG. 2, L_N - L₁ as depicted represent in this case the locations of the N previous "content changes" (speech breaks or shot cuts) in the video stream up to the play point (T). For example, the last change location (L₁) may represent the location in the video stream at which the actor speaking at time T began speaking. L₂ - L₅ may represent similar earlier speech break locations in the stream, L₆ may represent the last shot cut location, and so on. When the user engages the replay function, the video stream is replayed from the last change location, in this case L₁. Thus, if the user missed some of the current speaker's words, for example, engaging the replay feature once restarts the video stream at the point where the current speaker began speaking.

Similarly, engaging the replay function twice replays the video stream from the next previous speech break (L₂). (The next previous speech break may be the start of another speaker's speech. It may also be the start of another utterance by the speaker at time T, if that speaker paused markedly between the speech start locations L₁ and L₂.) Engaging the replay function m times replays the video stream from the m-th previous change location. Preferably, the video stream is rendered in reverse as the replay feature is engaged. This allows the user to identify a particular change of interest (for example, the last shot cut, which may be at point L₆) and have forward play restart from there.

Note that all change locations, including shot cut locations and speech break locations (such as locations where speech begins after a relative silence), may also be pre-identified in the data stream. Thus, as described above, the processor 50 may use the change locations as pre-identified in the video stream during the replay function. In addition, FIG. 3 may represent the processing steps used when shot cuts and speech breaks are both detected by the processor 50 and stored in memory in an integrated manner. Thus, for each of the steps depicted in FIG. 3, the focus on "speech breaks" may be generalized to "content changes", which comprise, for example, both speech breaks and shot cuts.

As noted above, shot cuts may be detected in a number of ways by monitoring changes in the DCT coefficients for macroblocks of consecutive frames in order to detect a substantial change between frames. However, certain changes may occur within the same shot that are less substantial but may nonetheless be important change points for the user. For example, an actor (or object) that begins to move within the shot may be a change of interest to the user. Similarly, another actor being added to the shot (for example, entering the shot through a door) may be a change of interest. Such changes are analogous to an actor beginning to speak after a period of relative silence, as noted above. They may be changes of interest to the user, yet they occur within a single shot. Thus, a change in the movement of an actor (or object) within a scene may constitute a significant content change for purposes of the present invention.

Accordingly, replaying from the location where such a change in movement begins gives the user a coherent replay and may be included as a replay location in the processing of the present invention. Thus, for example, a user may wish to return to the most recent point in the video stream at which an actor in the scene begins walking toward a door. Accordingly, the processor 50 of the device 20 in FIG. 1 may identify people or objects within the scene and store the locations in the video stream at which a person or object begins to move after being stationary.

For example, the video stream 30 of FIG. 1 may be further processed in the processor 50 to identify the contours and/or faces of people within the shot and to detect their movement between frames. Real-time image recognition and motion detection methods and techniques that can be programmed into the processor 50 for this purpose are available in the art. For example, a technique that may be used to identify people moving in a video stream is described in commonly-owned and co-pending U.S. Patent Application No. 09/794,443, filed February 27, 2001 by Gutta et al. and entitled "Classification Of Objects Through Model Ensembles", the contents of which are incorporated herein by reference. (Note that U.S. Patent Application No. 09/794,443 also corresponds to a PCT application published by WIPO under International Publication No. WO 02/069267 A2.) Thus, the locations in the video stream at which people begin to move after being stationary are identified and stored by the processor 50.
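By way of illustration only, once a per-frame measure of a person's movement is available, the locations at which movement begins after a stationary period may be found as sketched below. The function `motion_onsets`, the scalar per-frame motion measure, and both parameters are assumptions made for this sketch; the recognition technique that would produce the motion measure is not modeled.

```python
def motion_onsets(motion, threshold, min_still_frames=3):
    """Return frame indices where motion starts after a stationary period.

    motion:           per-frame motion magnitude of a tracked person or object
    threshold:        magnitude above which the frame counts as "moving"
    min_still_frames: how many consecutive still frames must precede an onset
    """
    onsets = []
    still_run = 0  # length of the current run of stationary frames
    for i, value in enumerate(motion):
        if value > threshold:
            if still_run >= min_still_frames:
                onsets.append(i)  # movement begins here after being stationary
            still_run = 0
        else:
            still_run += 1
    return onsets


motion = [0, 0, 0, 5, 6, 0, 4, 0, 0, 0, 0, 7]
onsets = motion_onsets(motion, threshold=1)
```

Requiring a minimum stationary run keeps a brief pause in continuous movement (frame 5 in the example) from producing a spurious replay location.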

The locations in the video stream corresponding to such starts of movement by people are integrated in storage with the locations of detected shot cuts and speech breaks in the same manner as previously described. Thus, each stored change location shown in FIG. 2 may be a previous location of a start of speech, a start of movement, or a shot cut in the video stream. For example, L₁ may represent the location at which an actor in the current shot begins reaching for an object, L₂ may represent the location at which the actor currently speaking in the shot began speaking, L₃ may represent the last shot cut, and so on. When the user engages the replay function, the video stream is replayed from L₁, the nearest prior change location relative to the current play position (T). This starts the video stream at the point where the actor begins reaching for the object. Pressing replay again replays the video stream from L₂, the start of speech by the current actor.

Different users may have particular replay tendencies that the systems and devices of the present invention can use to customize the replay function. For example, if one or more members of a particular group of users (such as a family) typically use the replay function to return to the last shot cut location in the video stream, the device 20 may set the most recent prior shot cut as the default replay location. The device 20 may include a learning algorithm that monitors replay inputs over time and adjusts the replay criteria to reflect the collective preferences of one or more users of the system. These may change over time. In a similar manner, the systems and devices may customize the replay function for different individual users of the system and device. In that case, the device 20 has an identification procedure for each user (such as a login process) and monitors and stores the tendencies of the various users. In addition, the stored change locations for the video stream also include the type of change (shot cut, speech, movement, etc.), so that a replay can skip over intervening change locations that do not correspond to the current user's preferences. Such preference-based replay may be initiated by a different input (for example, a "repeat-2" input), leaving the original replay feature in place to allow the user to move back sequentially through all locations.
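By way of illustration only, a replay that skips change locations of types the current user does not prefer may be sketched as follows. The function `preferred_replay_position`, the `(position, kind)` tuples, and the type labels are assumptions made for this sketch; how preferences are learned is not modeled.

```python
def preferred_replay_position(changes, current_position, preferred_kinds, m=1):
    """Return the m-th previous change location whose type the user prefers.

    changes:          list of (position, kind) tuples sorted by position
    current_position: current play point T
    preferred_kinds:  set of change types the current user replays from
    Returns None when fewer than m matching locations precede T.
    """
    candidates = [
        pos for pos, kind in changes
        if pos < current_position and kind in preferred_kinds
    ]
    if m < 1 or len(candidates) < m:
        return None
    return candidates[-m]  # intervening non-preferred changes are skipped


changes = [(100, "shot_cut"), (180, "speech"), (240, "motion"), (300, "speech")]
# A user who habitually returns to shot cuts skips the speech and motion changes.
position = preferred_replay_position(changes, 350, {"shot_cut"})
```

Passing all types as `preferred_kinds` reproduces the original replay feature, which steps back through every stored location in sequence.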

In addition, where the locations (L_N - L₁) comprise different types of content change (shot cuts, speech breaks, etc.), separate replay functions may be provided for replaying from each type of change. In that case, the processor 50 stores the type of change together with the change location.

In addition, referring again to FIG. 1, the device 20 may alternatively be located at a service provider that supplies the video stream 30 to the user's display device 40 over a wired or wireless interface. The device 20 processes the video stream to detect and determine the change locations in the video stream in the same manner as described above. When the user engages the replay feature, the request is conveyed to the service provider, which replays the video stream from a prior change location as described above.

In addition, in the exemplary embodiments above, each individual use of the replay feature moves back one change point in the video stream. Thus, for example, to move back "m" change locations in the video stream, the replay option is described as being engaged "m" times. Other ways of using the replay feature are possible and are encompassed by the present invention. For example, a single control input may cause the replay feature to move back "m" change locations. For example, when input is via a remote, channel number "5" may be pressed on the remote to cause the replay feature to move back five change locations in the video stream. Alternatively, where input is via gesture recognition, holding up three fingers may cause the replay feature to move back three change locations in the video stream.

In addition, the content changes exemplified above are not intended to be limiting. The present invention encompasses any type of significant content change that can be detected (or pre-identified) and used as a replay location. For example, the embodiments above illustrate speech breaks comprising the start of speech and changes in movement comprising the start of movement. Alternatively (or in addition), the end of speech and the end of movement may be used as content change locations. Other content changes, such as changes in color balance or audio volume, or the start and end of music, may also be used.

In addition, although the above embodiments of the present invention focus on a video stream having an audio component, the present invention is not limited to media streams that include a video component. Thus, the present invention encompasses other media streams. For example, the present invention also encompasses analogous processing of an audio-only stream. In this context, the audio stream may be generated, for example, from a tape player, a CD player or a hard drive based device. (Initially, before the user engages the replay function, an external audio stream may be received and output by the device in real time while simultaneously being recorded. Once the replay feature is engaged, the audio stream lags behind the received stream and is generated from the storage medium.) Processing of the audio stream to detect and store the previous speech breaks it contains proceeds in a manner analogous to the processing of the video stream described above. When the user engages the replay feature, for example, the audio stream is interrupted and replayed from a previous speech break determined according to the input received from the user via the replay feature.

Although the present invention has been described with reference to several embodiments, those skilled in the art will understand that the invention is not limited to the specific forms shown and described. Accordingly, various changes in form and detail may be made without departing from the spirit and scope of the invention as defined by the appended claims. For example, as noted above, there are many techniques that may be used in the present invention for detecting speech breaks and shot cuts and for image recognition and motion detection. Thus, the particular techniques described above for speech break detection, shot cut detection, image recognition and motion detection are given by way of example only and do not limit the scope of the present invention.

The present invention relates to the retrieval of video content and is particularly applicable to the retrieval and replay of previous portions of a video stream.

Claims

A method of playing a media stream 30 from a previous position L _N -L ₁ in a media stream 30, wherein the method includes a large number of previously identified content changes 120, 120a in the media stream 30. Playing a media stream (140, 140a) from a selected one of the above, wherein the content change comprises a previous voice break in the media stream (30).

The method of claim 1, wherein the media stream 30 is a video stream 30 and the previously identified content changes 120, 120a further comprise at least one of a change in screen cut and motion. Way.

The method of claim 1, wherein the previous voice break comprises the start of voice after a relative silence period in the media stream (30).

2. The media stream of claim 1, further comprising the step of receiving a control command (130, 130a) used to select one previous content change in the media stream (30) to be played (140, 140a). How to play.

5. The media of claim 4, wherein the control command (130, 130a) comprises m input signals, wherein the m input signals are used to select the m-th previous content change in the media stream to start playback (140, 140a). How to play the stream.

5. The method of claim 4, wherein the control command (130, 130a) is used to select one content change to play (140, 140a) to be processed based on a previous control command received.

5. The method of claim 4, wherein the received control command (130, 130a) is generated by at least one of manual input, voice input, and gesture recognition.

2. The method of claim 1, further comprising the step 120 of locating and storing a previous content change in real time while media stream 30 is executing, wherein the media stream from the selected previous content change is replayed. Step 140 uses the stored location corresponding to the selected content change.

9. The method of claim 1, further comprising the step (120a) of identifying locations of previous content changes in the media stream from data included in the media stream, wherein the step (140a) of replaying the media stream from the selected previous content change uses the location of the selected content change included in the media stream (30).

10. The method of claim 1, further comprising generating (100) the media stream from at least one of a magnetic tape, an optical disc, a server, and a hard drive.

11. The method of claim 1, further comprising receiving (100) the media stream from an external source.

12. The method of claim 11, further comprising recording the received media stream and playing from the recorded media stream.

13. The method of claim 1, wherein replaying the media stream from the selected one of the plurality of previously identified content changes (120, 120a) in the media stream (30) is a function of the type of content change.

14. A method of replaying a digital media stream (30) from a location in the media stream before the current playback location (T) of the media stream (30), the method comprising:

a) detecting (120) content change locations (L_N - L₁) in real time as the media stream runs;

b) storing (120) at least a plurality of the most recent change locations detected before the playback location T;

c) receiving (130) one or more input signals comprising a number m;

d) retrieving from memory the m-th nearest change location before location T in the media stream; and

e) replaying (140) the media stream from the m-th nearest change location in the media stream.
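
By way of illustration only, the storing and retrieving steps above might be sketched as follows. The sketch is not part of the claims: the class name `ChangeLocationBuffer`, its methods, the default capacity of 10 and the use of seconds as the location unit are all assumptions introduced here.

```python
from collections import deque
from typing import Deque, Optional


class ChangeLocationBuffer:
    """Keep the most recently detected content change locations."""

    def __init__(self, capacity: int = 10) -> None:
        # A bounded deque silently discards the oldest location when full.
        self._locations: Deque[float] = deque(maxlen=capacity)

    def store(self, location: float) -> None:
        """Step b): record a change location detected during playback."""
        self._locations.append(location)

    def mth_nearest_before(self, position: float, m: int) -> Optional[float]:
        """Step d): return the m-th nearest stored location before `position`.

        Returns None when m is not positive or fewer than m locations
        precede `position`.
        """
        if m < 1:
            return None
        earlier = sorted(loc for loc in self._locations if loc < position)
        if m > len(earlier):
            return None
        return earlier[-m]
```

With a capacity of three and stored locations at 5, 12, 20 and 31 seconds, `mth_nearest_before(35.0, 2)` returns 20, and a request for m = 4 returns `None` because the oldest location has already been discarded. A player would then seek to the returned location and resume forward playback, which corresponds to step e).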

15. The method of claim 14, wherein the media stream (30) is at least one of an audio stream and a video stream.

16. The method of claim 15, wherein the change locations comprise speech break locations in the media stream.

17. The method of claim 16, wherein the media stream (30) is a video stream and the change locations further comprise at least one of shot cut locations and locations of changes in motion.

18. A system (10) for replaying a media stream from a previous location (L_N - L₁) in the media stream (30), the system (10) comprising a processor (50) and a memory, wherein the processor (50) receives one or more input signals selecting one of a plurality of previously identified content changes in the media stream (30), and the processor (50) further retrieves from the memory the location (L_N - L₁) corresponding to the selected content change and initiates replay of the media stream (30) from the selected change location (L_N - L₁), wherein the identified content changes include previous speech breaks in the media stream (30).

19. The system of claim 18, wherein the processor (50) further identifies content changes in the media stream (30) and stores their locations (L_N - L₁) as the media stream (30) is played.

20. The system of claim 18, wherein the system (10) further generates the media stream (30).

21. The system of claim 18, wherein the system (10) further receives the media stream (30) and records the media stream (30).

22. The system of claim 18, wherein the system (10) consists of a single device (20) that houses the processor (50) and the memory, receives the input signals, and initiates the replay.

23. The system of claim 22, wherein the device (20) is one of a VCR, a CD player, a DVD player and a PC.

24. A computer program product embodied in a computer-readable medium for replaying a media stream (30) from a selected previous location (L_N - L₁) in the media stream (30), the computer program product comprising:

a) computer-readable program code for detecting (120) content changes in real time as the media stream runs;

b) computer-readable program code for storing (120) at least a plurality of locations (L_N - L₁) of the most recent content changes in the media stream detected before playback location T;

c) computer-readable program code for receiving (130) one or more input signals comprising a number m;

d) computer-readable program code for retrieving from memory the m-th nearest change location before location T in the media stream; and

e) computer-readable program code for generating (140) an output signal for replaying the media stream from the m-th nearest change location before T.