KR20020060964A

KR20020060964A - System to index/summarize audio/video content

Info

Publication number: KR20020060964A
Application number: KR1020027006025A
Authority: KR
Inventors: Eric Cohen-Solal; Hugo Strubbe; Mi-Suen Lee
Original assignee: Koninklijke Philips Electronics N.V.
Priority date: 2000-09-11
Filing date: 2001-08-27
Publication date: 2002-07-19
Also published as: WO2002021843A2; EP1393568A2; JP2004508776A; WO2002021843A3

Abstract

"Background information" that is available during the production of content material is correlated with that content material to facilitate selective access to the source audio/video content material. This production information includes, for example, the camera settings used during the collection of the content material. Other production information includes sound controls, scene identifiers, source identifiers, instructions communicated to the staff producing the content material, and the like. To facilitate indexing, the production information is processed and filtered to produce a collection of symbolic representations, each symbol corresponding to a determined event or characteristic. This symbolic production information is preferably combined with other sources of indexing and summarizing information to provide a set of annotations to the content material that facilitates efficient and effective selective retrieval.

Description

System to index/summarize audio/video content

2. Description of the prior art

Improvements continue to be made in the field of automated story segmentation and identification, as demonstrated by the BNE (Broadcast News Editor) and BNN (Broadcast News Navigator) of the MITRE Corporation (Andrew Merlino, Daryl Morey, and Mark Maybury, MITRE Corporation, Bedford MA, "Broadcast News Navigation using Story Segmentation", Proceedings of the ACM Multimedia Conference 1997, pp. 381-389). Using the BNE, news broadcasts are automatically partitioned into individual story segments, and the first line of the closed-caption text associated with each segment is used as a summary of that story. Keywords from the closed-caption text or audio are also used to summarize each story. The BNN allows the consumer to enter search words, and sorts the story segments by the number of keywords in each story segment that match those search words. Based on the frequency of occurrence of the matching keywords, the user selects the stories of interest. Similar search and retrieval techniques are commonplace in the art. For example, conventional text-search techniques can be applied to a computer-based television guide, so that a person can search for a particular title, a particular performer, shows of a particular type, and the like.
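The ranking step described above can be sketched as follows. This is a minimal illustration, not the BNN implementation: the function name `rank_segments`, the segment identifiers, and the keyword lists are all hypothetical, and the score is simply the number of keyword occurrences in a segment that match the search words.

```python
from collections import Counter

def rank_segments(segments, search_words):
    """Sort story segments by how many keyword occurrences match the search words.

    segments: dict mapping a segment id to its list of keywords (hypothetical data).
    Returns a list of (segment_id, match_count), best match first.
    """
    wanted = {w.lower() for w in search_words}
    scored = []
    for seg_id, keywords in segments.items():
        counts = Counter(k.lower() for k in keywords)
        matches = sum(n for word, n in counts.items() if word in wanted)
        scored.append((matches, seg_id))
    # Highest match count first; ties are broken by segment id for stable output.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(seg_id, matches) for matches, seg_id in scored]

segments = {
    "story-1": ["election", "senate", "vote", "election"],
    "story-2": ["weather", "storm"],
    "story-3": ["election", "storm"],
}
ranking = rank_segments(segments, ["election", "vote"])
```

The user would then pick stories from the top of `ranking`, which mirrors the selection "based on the frequency of occurrence" described in the text.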

These indexing and retrieval techniques are also being developed for recorded entertainment material, such as audio and video CDs and DVDs. Users will be able to search for a particular scene by specifying, for example, a performer's name and a characteristic of the scene, such as "monologue, John Smith". In response, the search device will present one or more scenes containing John Smith performing a monologue.

One of the difficulties encountered in classifying and indexing material for retrieval is the need to "annotate" the material with relevant information that facilitates efficient retrieval. Although a manual process can be used to add indexing information to each recorded and broadcast set of content material, such a process is costly and labor-intensive, hence the need for automated indexing systems such as the BNE mentioned above. Conventionally, for example, an automated indexing system first recognizes each change of scene, or cut, by searching for frames that differ substantially from the immediately preceding frame. Then, if the frame contains a close-up shot of a face and the context of the program is "news broadcast", the following sequence of frames may be identified as a "newscaster" clip, whereas if the frame contains a full-figure shape, the following sequence of frames may be identified as an "on-location" clip. As noted above, any closed-caption text is also used to identify and classify scenes. Although these techniques have proven to be reasonably effective, they rely heavily on the content of the material, such as the images and sounds in each scene, to determine an appropriate set of indexing parameters that characterize the material.
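The conventional cut detection mentioned above, searching for frames that differ substantially from the preceding frame, can be sketched as below. This is only an assumed illustration: frames are modeled as flat lists of pixel intensities, the difference measure (mean absolute difference) and the threshold of 50 are arbitrary choices, and the names `mean_abs_diff` and `find_cuts` are hypothetical.

```python
def mean_abs_diff(frame_a, frame_b):
    """Mean absolute per-pixel difference between two equally sized frames."""
    return sum(abs(a - b) for a, b in zip(frame_a, frame_b)) / len(frame_a)

def find_cuts(frames, threshold=50.0):
    """Return each index i where frame i differs substantially from frame i - 1."""
    return [
        i for i in range(1, len(frames))
        if mean_abs_diff(frames[i - 1], frames[i]) > threshold
    ]

# Four tiny hypothetical frames: two dark frames followed by two bright ones.
frames = [
    [10, 10, 10, 10],
    [12, 11, 10, 9],
    [200, 210, 190, 205],
    [198, 209, 191, 204],
]
cuts = find_cuts(frames)
```

Here the only large jump is between the second and third frames, so the third frame (index 2) is reported as the start of a new clip.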

The MPEG-7 standard addresses the need for effective indexing and retrieval capabilities, and calls for a "Multimedia Content Description Interface" intended to standardize a set of description schemes and descriptors, a description definition language, and a scheme for coding the description. In particular, the MPEG-7 standard calls for the ability to associate descriptive information within video streams at various stages of video production, including pre- and post-production scripts, information captured or annotated during shooting, and post-production edit lists. Adding this information during the production of the video material is expected to substantially improve the quality and efficiency of the annotations, compared with post-production annotation of material in a video archive.

1. Field of the invention

The present invention relates to the field of consumer electronics, and in particular to a system that facilitates the indexing and summarizing of audio/video content for efficient search and retrieval of selected content.

FIG. 1 illustrates an example production scene for gathering audio/video information.

FIG. 2 illustrates an example block diagram of a production recording system in accordance with this invention.

FIG. 3 illustrates an example block diagram of an indexing/summarizing system in accordance with this invention.

FIG. 4 illustrates an example block diagram of a production recorder in accordance with this invention.

Brief summary of the invention

It is an object of this invention to improve the effectiveness of content indexing and summarizing systems for audio/video content material. It is a further object of this invention to provide additional ancillary information that facilitates the indexing and summarizing of content material.

These objects and others are achieved by correlating the "background information" that is available during the production of content material with that content material. This production information includes, for example, the camera settings used during the collection of the content material. Other production information includes sound controls, scene identifiers, source identifiers, instructions communicated to the staff producing the content material, and the like. Instructions from a director to each camera operator, for example, can provide insight into the content of the subsequent images from the cameras. In like manner, instructions generated automatically by automated camera systems can also provide insight. To facilitate indexing, the production information is processed and filtered to produce a collection of symbolic representations of the production information, each symbol corresponding to a determined event or characteristic. This symbolic production information is preferably combined with other sources of indexing and summarizing information to provide a set of annotations to the content material that facilitates efficient and effective selective retrieval. The techniques presented herein are also particularly well suited for annotating the content material of videoconferences, to facilitate the identification of key segments of a videoconference recording.

The invention is explained in further detail, and by way of example, with reference to the accompanying drawings.

Throughout the drawings, the same reference numerals indicate similar or corresponding features or functions.

For ease of reference and understanding, the terms "indexing" and "summarizing" are used herein to refer to particular applications of this invention. The invention addresses a method and device for providing information associated with content material, and is not limited by how that information is used. Although the provided information may be particularly well suited for use as an index to facilitate searching the material, or as a synopsis to facilitate quick review and preview of the material, one of ordinary skill in the art will recognize that the invention is not limited to these particular applications.

FIG. 1 illustrates an example production scene for gathering audio/video information. The scene includes a director 110 who directs the actions of camera operators 120, 121, as well as performers 130, 131 and objects 140. The example scene represents a directed scene. As will be evident to one of ordinary skill in the art, the invention is also applicable to undirected scenes, such as the recording of a news or sports event. In a live sports broadcast, for example, the production director continually decides which camera is the "on-line" camera, depending on the current action, and directs the "off-line" cameras to capture particular scenes for potential selection as the "on-line" camera, or for potential use in an "instant replay". In a news broadcast, the production director performs a similar selection of source material for transmission. In a videoconference, the participants on either side are typically able to adjust the camera settings in concert with the activities of the videoconference.

As can be imagined, the director 110 of FIG. 1 might give instructions such as: "Camera 1, follow Joe 130 as he runs into the crowd. Camera 2, follow Jim 131. Both of you, make sure the briefcases are in view when they exchange briefcases." This invention is premised on the observation that such "behind-the-scenes" instructions, which form part of the production information corresponding to the scenes, convey a substantial amount of information for interpreting the images of those scenes. For example, the three instruction sentences above give the scene of FIG. 1 a meaning that would be difficult to infer from the image content alone. And if there is no dialog associated with the scene of FIG. 1, closed-caption text will be of minimal help in identifying that meaning. The published scripts, synopses, and scene edit lists referred to above with regard to MPEG-7 may contain suitably descriptive information, but that published information may not reflect the events as they actually occurred at the production site. The three example sentences above, on the other hand, convey that Joe and Jim are in the scene, that this is the scene in which the briefcases are exchanged, and so on. The production information also conveys information that can be used indirectly to facilitate the interpretation of other sources of information. For example, an image processing system would probably identify this scene as a "group" scene, and may or may not identify Joe or Jim within the group, whereas the example sentences make clear that the crowd is merely "background". In like manner, the presence of the automobiles 140 in the scene can be interpreted as insignificant background information, based on the absence of any reference to them in the production instructions. That is, the production information not only conveys direct information about the content of the subsequent images, but also provides cues that facilitate efficient processing by other sources of indexing or summarizing information.
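As a rough sketch of the indirect use described above, objects detected in the images can be split into those that the production instructions mention and those they do not. Everything here is an assumption made for illustration: the function `split_by_mention`, the detected labels, and the simple word match stand in for whatever speech recognition and image analysis a real system would use.

```python
import re

def split_by_mention(detected_labels, instructions):
    """Split detected labels into (mentioned, unmentioned) by the instructions."""
    words = set(re.findall(r"[a-z]+", " ".join(instructions).lower()))
    mentioned = [label for label in detected_labels if label.lower() in words]
    unmentioned = [label for label in detected_labels if label.lower() not in words]
    return mentioned, unmentioned

# The three instruction sentences from the example, without reference numerals.
instructions = [
    "Camera 1, follow Joe as he runs into the crowd.",
    "Camera 2, follow Jim.",
    "Both of you, make sure the briefcases are in view when they exchange briefcases.",
]
detected = ["Joe", "Jim", "briefcases", "automobiles"]  # hypothetical image-analysis output
foreground, background = split_by_mention(detected, instructions)
```

A bare mention is not a reliable sign of importance: the crowd is named in the first instruction, yet the text treats it as background. A practical system would need to parse the role each word plays in the instruction rather than rely on word matching alone.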

FIG. 2 illustrates an example block diagram of a production recording system 200 in accordance with this invention. A production recorder 210 receives information from a variety of sources 220, 230, 240 and creates a database 215 of production information that captures the production information in a form that is efficient for subsequent processing. A primary source of production information is vocal input 220. A variety of vocal sources 220 typically provide production information. In a news broadcast, for example, the production booth provides one source of vocal information; an on-site reporter may give instructions to an on-site camera operator; the news anchor may advise the studio newscasters or on-site reporters before the broadcast goes to air; and so on.

The production recorder 210 is configured to process each of these sources of information, extract the pertinent information, and record it for subsequent use in the retrieval process. Depending on the complexity of the analysis, this processing may be performed in real time, as the images are being recorded, or as a post-processing task. In a preferred embodiment, the production recorder 210 also records the information from the sources 220-240 directly, to allow selectable degrees of subsequent processing and analysis. For example, a real-time analysis of the production information may use a "primary" source of production information, such as the vocal information from the microphone associated with the director, while the other sources of production information are processed at a later time, as required, for more detailed analysis and evaluation.

In addition to vocal instructions, the production recording system 200 of the preferred embodiment includes an input 230 corresponding to parameters associated with the individual cameras. For example, a camera's zoom setting can be used to characterize a scene. A narrow-angle, high-zoom setting typically indicates a focus on an individual or event, and a change of zoom setting indicates a change of emphasis on that individual or event. A wide-angle, low-zoom setting is typically associated with a "background" or "mood-setting" scene. The identification of a series of images captured at a low zoom setting can be used by an image-based classification system, for example, to "skip ahead" to images captured at a higher zoom setting. Alternatively, in a learning system, each new scene may be evaluated regardless of the zoom setting, and the "skip ahead" function may be activated if the image processor finds little or no discernible information at a particular zoom setting. Assuming consistency in video capture techniques, particularly within the same production, the decision to "skip ahead" can be made increasingly quickly, thereby improving the efficiency of the image-based classification system. In the same way, the camera orientation, or the rate of change of orientation, can also be used to characterize a scene. In sports, for example, the capture of a kick-off, forward pass, or home run will involve a relatively rapid change of camera orientation, whereas an end-line rush or strike-out probably will not.
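The "skip ahead" idea can be sketched as follows, under stated assumptions: camera parameters are logged as timestamped samples, zoom is expressed on a hypothetical scale where 1.0 is the widest angle, and a fixed threshold of 3.0 separates "background" from "focus" shots. The names `CameraSample`, `label_sample`, and `skip_ahead` are illustrative and do not come from the source.

```python
from dataclasses import dataclass

@dataclass
class CameraSample:
    time_s: float  # time reference of the sample, in seconds
    zoom: float    # hypothetical scale: 1.0 is the widest angle

def label_sample(sample, zoom_threshold=3.0):
    """Label a sample 'focus' (high zoom) or 'background' (low zoom)."""
    return "focus" if sample.zoom >= zoom_threshold else "background"

def skip_ahead(samples, start, zoom_threshold=3.0):
    """Index of the first 'focus' sample at or after start, or None if there is none."""
    for i in range(start, len(samples)):
        if label_sample(samples[i], zoom_threshold) == "focus":
            return i
    return None

samples = [
    CameraSample(0.0, 1.0),
    CameraSample(1.0, 1.2),
    CameraSample(2.0, 4.5),
    CameraSample(3.0, 5.0),
]
next_focus = skip_ahead(samples, 0)
```

An image-based classifier could then begin its more expensive analysis at `samples[next_focus]` instead of processing the two wide-angle samples first. The learning variant described in the text would adjust `zoom_threshold` from experience rather than fixing it in advance.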

During recording, other sources 240 of production information, such as the position of sound booms and other means of ascertaining the "focus" of the scene, are also used to facilitate the indexing or summarizing of the recorded content material. In like manner, the "source" of the content material can convey information about the scene. In a news broadcast, for example, the identification of a scene as coming from "file footage" can be used either to minimize the processing of that scene or to provide a link to a prior processing and characterization of that footage. Note that, as production tasks become increasingly automated, or at least managed via computer resources, the sources of production information become substantial. For example, the sequencing and selection of sources during a news broadcast can be expected to be controlled via a computer. Capturing this information for each production substantially increases the efficiency and effectiveness of other content indexing and summarizing tools.

Note that the choice of sources 220-240 of production information is arbitrary and need not conform to traditional production techniques. For example, U.S. patent application "HANDS-FREE HOME VIDEO PRODUCTION CAMCORDER", serial number 09/532,820, filed 21 March 2000 for Mi-Suen Lee, Attorney Docket US000063, incorporated by reference herein, describes a technique and device for automatically adjusting a camera's field of view to capture scenes that are likely to be of interest. The adjustment is based, for example, on object tracking, sound location, focusing, and the like, and incorporates knowledge-based system techniques that emulate the actions of an experienced camera operator. As used in this invention, the resulting automatically generated camera settings provide the camera input 230 to the production recorder 210 described above, with or without any other sources of production information. In like manner, copending U.S. patent application "MULTIMODAL VIDEO TARGET ACQUISITION AND RE-DIRECTION SYSTEM AND METHOD", serial number 09/488,028, filed 20 January 2000 for Eric Cohen-Solal and Mi-Suen Lee, Attorney Docket US000015, incorporated by reference herein, discloses a technique and device for adjusting a camera's field of view based on gestures and key words. Similarly, copending U.S. patent application "METHOD AND APPARATUS FOR TRACKING MOVING OBJECTS USING COMBINED VIDEO AND AUDIO INFORMATION IN VIDEO CONFERENCING AND OTHER APPLICATION", serial number 09/548,734, filed 13 April 2000 for Hugo Strubbe and Mi-Suen Lee, Attorney Docket US000103, describes a technique and device for adjusting a camera's field of view based on an analysis of the video and audio content being recorded. As used in this invention, the resulting camera settings, and the gestures, sounds, and movements (or the analysis of the gestures, sounds, or movements) that caused those settings, can be provided to the production recorder 210 for use in creating the production information database 215.

Camera settings during a videoconference can similarly be used to facilitate the characterization of a videoconference session. Advanced videoconferencing systems are expected to include the automated and semi-automated camera control features described above, and even relatively simple systems allow the participants of a videoconference to adjust the camera's field of view at their own site or at a remote site. Alternatively, a camera operator may be provided at a central site of the videoconference, at the site of a key speaker, and so on. A fixed camera position for an extended duration, perhaps with slight variations in zoom, may indicate a keynote address, particularly when correlated with the audio content from each videoconference site. In like manner, continual back-and-forth panning of the camera may indicate a key discussion period. As noted above, the combination of such production information with the content material provides insights that may not be readily apparent from the content material alone, and can thereby improve the quality and efficiency of providing summaries of each videoconference. For example, once the camera settings corresponding to each of a plurality of speakers have been determined (or explicitly provided), character identification in the images is considerably simplified if the camera settings corresponding to the images are used to pre-filter the choice of characters presented to the character identification process. Speaker identification processing can similarly be improved by providing the camera settings corresponding to each audio track. In like manner, a quick determination that the audio track does not match the identified participant in the field of view corresponding to the current camera settings can be used to change the current camera settings to search for the current speaker. These and other synergistic effects will be evident to one of ordinary skill in the art as the use of this invention becomes commonplace.
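The pre-filtering step can be illustrated with a minimal sketch. It assumes that each participant has a known camera preset expressed as pan and tilt angles in degrees, and that the current camera settings are compared against those presets with a fixed tolerance. The participant names, the preset values, and the function `candidates_for_setting` are hypothetical.

```python
def candidates_for_setting(presets, pan, tilt, tolerance=5.0):
    """Names whose known camera preset lies within tolerance of (pan, tilt)."""
    return sorted(
        name
        for name, (preset_pan, preset_tilt) in presets.items()
        if abs(preset_pan - pan) <= tolerance and abs(preset_tilt - tilt) <= tolerance
    )

# Hypothetical presets: (pan, tilt) in degrees for each participant's seat.
presets = {
    "Alice": (-30.0, 0.0),
    "Bob": (0.0, 2.0),
    "Carol": (28.0, -1.0),
}
shortlist = candidates_for_setting(presets, pan=1.5, tilt=0.0)
```

Only the names in `shortlist` would then be passed to the identification process, so it compares the image against one candidate here instead of three. An empty shortlist would suggest that the camera is pointing somewhere with no known participant.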

To facilitate synchronization of the production information with the content material, a time reference 201 is associated with the recorded production information 215. As will be evident to one of ordinary skill in the art, some production information, such as the camera settings 230, will be coincident in time with the scenes of the content material. Other information, such as the vocal instructions 220, typically precedes the scenes to which it applies. Knowledge-based and heuristic techniques are used to determine the correlation between particular instructions 220 and the content material. For example, an instruction 220 that precedes a significant adjustment of the camera settings 230, or the occurrence of a "cut" in the scenes, is likely to contain information regarding the next clip. Otherwise, for example when there is no significant change in the other inputs to the production recorder 210 or in the content material, the instructions are probably pertinent to the current clip. Other techniques for determining cause-and-effect relationships are common in the art. Other scene identification and synchronization inputs 202 may also be provided, particularly for directed scenes, where an explicit identification of the scene (for example, "Rocky IX, Scene 32, Take 3") is available.
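The heuristic in the paragraph above can be sketched as follows. This is one assumed reading of it, not the method of the source: an instruction is attributed to the next clip when a cut or significant camera adjustment follows it within a short look-ahead window, and to the current clip otherwise. The five-second window and the names `assign_instruction`, `clip_bounds`, and `change_times` are hypothetical.

```python
def assign_instruction(instr_time, clip_bounds, change_times, lookahead=5.0):
    """Return the index of the clip that an instruction most likely describes.

    instr_time: time reference of the instruction, in seconds.
    clip_bounds: list of (start, end) times, in order, one pair per clip.
    change_times: times of cuts or significant camera adjustments.
    """
    current = None
    for i, (start, end) in enumerate(clip_bounds):
        if start <= instr_time < end:
            current = i
            break
    if current is None:
        return None  # the instruction falls outside every known clip
    change_soon = any(
        instr_time < t <= instr_time + lookahead for t in change_times
    )
    if change_soon and current + 1 < len(clip_bounds):
        return current + 1  # a change follows shortly: attribute to the next clip
    return current          # no change nearby: attribute to the current clip

clips = [(0.0, 20.0), (20.0, 45.0), (45.0, 60.0)]
changes = [20.0, 45.0]
before_cut = assign_instruction(17.0, clips, changes)
mid_clip = assign_instruction(5.0, clips, changes)
```

An instruction given at 17 s is followed by the cut at 20 s, so it is attributed to the second clip (index 1). An instruction at 5 s has no change within the window and stays with the first clip (index 0).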

FIG. 3 illustrates an example block diagram of an indexing/summarizing system 300 in accordance with this invention. As illustrated, an indexer/summarizer 310 preferably has access to a variety of information 215, 320-323 that facilitates the characterization, indexing, or summarizing of the content information. Closed-caption information 320 is commonly used to characterize content material, as in the BNE and BNN systems described above. In accordance with this invention, the production information can be used to improve the efficiency and effectiveness of this classification process. For example, the scene of FIG. 1 may include dialog that appears in the closed-caption material, but this dialog may serve merely as a diversion from the exchange of briefcases, or merely as filler material while the exchange takes place. The indexer/summarizer 310 in a preferred embodiment uses the production information 215 associated with the scene of FIG. 1, for example, to decrease the significance-weighting factor associated with the closed-caption information 320 while increasing the corresponding significance-weighting factor associated with the image information 321.
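A minimal sketch of the weighting adjustment follows. The source does not specify how the factors are computed, so the scaling by 0.5 and 1.5, the renormalization to a sum of one, and the cue name `dialog_is_filler` are all assumptions chosen only to show the direction of the adjustment.

```python
def adjust_weights(weights, production_cues):
    """Shift significance from closed captions to images when a cue calls for it.

    weights: dict with 'closed_caption' and 'image' significance factors.
    production_cues: dict of hypothetical flags derived from production information.
    Returns a new dict whose values sum to 1.
    """
    adjusted = dict(weights)
    if production_cues.get("dialog_is_filler"):
        adjusted["closed_caption"] *= 0.5  # assumed down-weighting factor
        adjusted["image"] *= 1.5           # assumed up-weighting factor
    total = sum(adjusted.values())
    return {source: value / total for source, value in adjusted.items()}

weights = {"closed_caption": 0.5, "image": 0.5}
new_weights = adjust_weights(weights, {"dialog_is_filler": True})
```

With equal starting weights, the image information ends up carrying three times the significance of the closed captions for this scene, while scenes without the cue keep their original balance.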

The image information 321 in the content material is used to classify the content material based on the visual characteristics of each image. For example, the scene of FIG. 1 may be characterized as "outdoor, group, cars, pedestrians" based on simple pattern and context recognition techniques. Depending on the capabilities of the system, the indexer/summarizer 310 may also include recognition of one or more actors and actresses in the scene. Copending U.S. patent application "PROGRAM CLASSIFICATION USING OBJECT TRACKING", serial number 9/452,581, Attorney Docket PHA 23,846, filed December 1, 1999 for Nevenka Dimitrova and Lalitha Agnihotri, discloses a content-based classification system that detects the presence of face images and text images within a frame and determines the path, or trajectory, of each image through multiple frames of a video segment. The combination of face trajectory and text trajectory information is used to classify each segment of a video sequence. In the example of FIG. 1, the production information 215 facilitates this object tracking based, for example, on references to "Joe" and "Jim" in the instruction statements. In like manner, the object tracking facilitates the synchronization of the instructions with the content material, by associating the instruction statements with the current or the next scene depending on whether "Joe" and "Jim" are in each scene. These and other techniques for combining production information with image information to facilitate the characterization of scenes or clips will be evident to those skilled in the art in view of this disclosure. The aforementioned MPEG-7 standard is expected to provide semantic descriptors useful for the efficient and effective indexing and summarization of audio/video content material, as well as the syntax required to facilitate cross-platform use of this information.

Context information 322 also facilitates the characterization of the content material provided by the indexer/summarizer 310. For example, if the context of the scene is a sports broadcast, the interpretation of terminology contained in the production information 215 or the closed-caption information 320 may change; the correlation between the closed-caption information 320 and the image information 321 may change, based on the likelihood that the closed-caption information corresponds to the broadcaster and not necessarily to the individuals depicted in the images; and so on.

When the indexer/summarizer 310 is located in the user's home, or is customized for a particular user, user information 323 may also be used to facilitate the characterization of the content material. Copending U.S. patent application "PERSONALIZED NEWS RETRIEVAL SYSTEM", serial number 09/220,277, Attorney Docket PHA 23,590, filed December 23, 1998 for Jan H. Elenbaas, Tomas McGee, Nevenka Dimitrova, and Mark Simpson, presents techniques for customizing the classification and retrieval of information based on a user's preferences or viewing habits, and is incorporated by reference herein. In the context of this application, knowledge of the user's preferences and/or habits facilitates the processing of the production information 215 and the other information 320-322 by providing a prioritization of particular aspects of the indexer/summarizer 310. For example, if the user seldom searches for information based on a performer's name, the indexer/summarizer 310 need not expend the additional time and resources required to track each performer using the multiple information sources 215, 320-322 in order to provide a comprehensive index of the performers appearing in each scene or clip. These and other techniques for optimizing the efficiency and effectiveness of the indexer/summarizer 310 will be evident to those skilled in the art in view of this disclosure.

The indexer/summarizer 310 typically provides the information appended to the content material, as an annotated version 350 of the content material. For example, a DVD provider would use the system 300 to provide a DVD that contains indexing or summary information associated with each scene of the content material on the DVD. A corresponding DVD player is configured to facilitate a search for particular scenes of the content material based on the contained indexing or summary information. Alternatively, the indexing/summarizing system 300 may be independent of the provider of the content material, and may provide the indexing or summary information as an adjunct that is separate from the content material. For example, a vendor may provide the indexing and summary information on an Internet site, together with an application program that allows a user to carry out the aforementioned search via an Internet access device such as a Web-TV device or a personal computer (PC).
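For illustration only, a search of per-scene index information of the kind a player or an application program might perform can be sketched in Python as follows. The `SceneAnnotation` record, its fields, the `find_scenes` function, and the sample annotations are hypothetical; the disclosure does not prescribe a storage format or a ranking rule, and ranking by the number of matching keywords is merely one plausible choice.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List


@dataclass(frozen=True)
class SceneAnnotation:
    scene_id: str
    start_s: float            # scene start time in seconds (assumed unit)
    keywords: FrozenSet[str]  # lower-case index terms attached to the scene


def find_scenes(annotations: Iterable[SceneAnnotation], query_terms: Iterable[str]) -> List[str]:
    """Return scene ids ranked by the number of matching keywords, then by start time."""
    query = {term.lower() for term in query_terms}
    hits = []
    for annotation in annotations:
        score = len(query & annotation.keywords)
        if score > 0:
            hits.append((score, annotation))
    hits.sort(key=lambda pair: (-pair[0], pair[1].start_s))
    return [annotation.scene_id for _, annotation in hits]


ANNOTATIONS = [
    SceneAnnotation("scene-1", 0.0, frozenset({"outdoor", "cars"})),
    SceneAnnotation("scene-2", 95.0, frozenset({"outdoor", "briefcase", "joe"})),
    SceneAnnotation("scene-3", 240.0, frozenset({"indoor", "jim"})),
]
```

Because the search operates only on the annotations, the same sketch applies whether the index information is carried on the medium itself or obtained separately, for example from an Internet site.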

FIG. 4 shows an example block diagram of a production recorder 210 in accordance with the invention. In a preferred embodiment, the production recorder 210 includes a speech recognizer 420, a field of view processor 430, and a scene synchronizer 410 for processing production-related inputs 201-240 from a variety of sources. The speech recognizer 420 provides a translation from spoken sounds to recognized terms, and is used to process the spoken inputs 220. The field of view processor 430 provides an interpretation of the camera settings to characterize the current field of view of each camera 230. The scene synchronizer 410 processes the synchronization inputs 201, 202 to facilitate synchronization between the production information and the content material. Other processors 440 are provided to process the input from other sources 240 of production information, as required.

In a preferred embodiment, a symbol encoder 450 encodes the production information 215 in symbolic form, to facilitate subsequent processing by the indexer/summarizer 310 (FIG. 3). The production recorder 210 includes a symbol library 460 that facilitates this symbolic encoding. For example, in a preferred embodiment, the symbol library 460 contains symbols for the keywords that the symbol encoder 450 uses to encode the terms provided by the speech recognizer 420. In like manner, the symbol library contains symbols corresponding to particular camera settings, or combinations of settings, to facilitate the encoding of the camera characteristics provided by the field of view processor 430. A variety of techniques can be used to maintain an efficient symbol library 460. Copending U.S. patent application "IMAGE CLASSIFICATION USING EVOLVED PARAMETERS", serial number 09/343,649, Attorney Docket PHA 23,696, filed June 29, 1999 for Keith Mathias, J. David Schaffer, and Murali Mani, discloses the use of genetic, or evolutionary, algorithms to optimize the parameters used to classify images.
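A minimal sketch of such symbolic encoding, written in Python under stated assumptions, is given below. The library contents, the symbol names, and the zoom and pan thresholds are all hypothetical and serve only to show how recognized terms and camera settings could each be mapped onto a small set of symbols.

```python
from typing import Dict, Iterable, List

# Hypothetical symbol library: symbols for keywords in recognized speech.
KEYWORD_SYMBOLS: Dict[str, str] = {
    "cut": "K_CUT",
    "zoom": "K_ZOOM",
    "close-up": "K_CLOSE_UP",
}


def encode_terms(terms: Iterable[str], library: Dict[str, str] = KEYWORD_SYMBOLS) -> List[str]:
    """Map recognized terms to symbols, dropping terms that are not in the library."""
    symbols = []
    for term in terms:
        symbol = library.get(term.lower())
        if symbol is not None:
            symbols.append(symbol)
    return symbols


def encode_camera(zoom_ratio: float, pan_rate_deg_s: float) -> List[str]:
    """Map a combination of camera settings to symbols (thresholds are assumed)."""
    symbols = ["C_TIGHT" if zoom_ratio >= 3.0 else "C_WIDE"]
    if abs(pan_rate_deg_s) > 10.0:
        symbols.append("C_PANNING")
    return symbols
```

In this sketch, terms that have no entry in the library are simply discarded, which reflects the filtering role of the encoding: only events or characteristics for which a symbol exists are passed on for indexing.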

The foregoing merely illustrates the principles of the invention. It will thus be appreciated that those skilled in the art will be able to devise various arrangements that, although not explicitly described or shown herein, embody the principles of the invention and are thus within its spirit and scope. For example, video content information has been used as the paradigm for illustrating applications of the invention, although the classification and indexing of other forms of content information would also be facilitated by the inclusion of production information. The instructions and equipment settings in a sound studio, for example, can be used to index and summarize audio content material. Similarly, in addition to its use for indexing and retrieval, the production information 215 contained in the annotated audio/video content 350, particularly the director's spoken instructions, may be of direct interest to a user, and may provide a marketing advantage for media that contain such information. These and other system configurations and optimization features will be evident to those skilled in the art in view of this disclosure, and are included within the scope of the following claims.

Claims

A method of providing ancillary information related to content material, comprising:

collecting source information (220-240) related to the production of the content material, and

processing the source information (220-240) to provide the ancillary information,

wherein the source information (220-240) includes at least one of:

one or more instructions (220) issued during the production of the content material, and

one or more parameters (230, 240) associated with one or more equipment items used during the production of the content material.

The method of claim 1, wherein

processing the source information (220-240) includes at least one of:

recognizing terms associated with a vocal input corresponding to the one or more instructions (220), and

recognizing a field of view setting associated with at least one camera (230) corresponding to the one or more equipment items.

The method of claim 1, wherein

processing the source information (220-240) includes

also processing other information (320-323) to provide the ancillary information, and

the other information (320-323) includes at least one of:

closed-caption information (320) associated with the content material,

image information (321) associated with the content material,

audio information associated with the content material,

context information (322) associated with the content material, and

user information (323) associated with a user of the content material.

The method of claim 3, wherein

processing the source information (220-240) facilitates summarizing the content material.

The method of claim 1, wherein

the content material corresponds to scenes of a video conference.

The method of claim 1, further comprising

synchronizing the content material with the ancillary information to facilitate a search, based on the ancillary information, for specific segments of the content material (350).

The method of claim 1, further comprising

classifying production information (215) associated with the source information (220-240) in symbolic form to facilitate processing of the source material.

The method of claim 1, wherein

the ancillary information is provided in accordance with the MPEG-7 specification.

A method of providing ancillary information related to content material, comprising:

collecting camera parameters (230) associated with one or more camera settings used during the production of the content material, and

producing the ancillary information based on the camera parameters (230).

The method of claim 1, wherein

providing the ancillary information includes

processing other information (320-323) to provide the ancillary information, and

the other information (320-323) includes at least one of:

closed-caption information (320) associated with the content material,

image information (321) associated with the content material,

audio information associated with the content material,

context information (322) associated with the content material, and

A recording system (210) comprising

an encoder (450) that accepts as input:

source information (220-240) associated with the production of content material, and

synchronization data (201, 202) associated with the content material,

and produces therefrom production information (215) that facilitates selective access to the content material.

The recording system (210) of claim 11, wherein

the source information (220-240) includes at least one of:

instructions (220) associated with the production of the content material, and

settings (230-240) associated with equipment used to produce the content material.

The recording system (210) of claim 11, wherein

the encoder (450) includes at least one of:

a speech recognition system (420) for processing spoken source information (220) associated with the production of the content material, and

a field of view processor (430) for processing parameters (230) associated with at least one camera associated with the production of the content material.

An information processing system (300) comprising:

a source of production information (215) relating to the production of content material, and

a processor (310), operably coupled to the source of production information (215), configured to provide ancillary information related to the content material,

wherein the production information (215) includes at least one of:

one or more instructions (220) issued during the production of the content material, and

one or more parameters (230, 240) associated with one or more equipment items used during the production of the content material.

The information processing system (300) of claim 14, wherein

the source of production information (215) includes at least one of:

a speech recognition system (420) configured to recognize terms associated with vocal input corresponding to the one or more instructions (220), and

a field of view processor (430) for processing parameters (230) associated with at least one camera corresponding to the one or more equipment items.

The information processing system (300) of claim 14, further comprising

at least one source of other information (320-323),

wherein the other information (320-323) includes at least one of:

closed-caption information (320) associated with the content material,

image information (321) associated with the content material,

context information (322) associated with the content material, and

user information (323) associated with a user of the content material, and

the processor (310) is operably coupled to the at least one source of other information (320-323), and is further configured to provide the ancillary information further based on the other information (320-323).

The information processing system (300) of claim 14, further comprising

a synchronizer (410) configured to provide a correlation between the ancillary information and the content material.

The information processing system (300) of claim 14, wherein

the ancillary information facilitates identification of characters associated with the content material.

The information processing system (300) of claim 14, wherein

the ancillary information is provided in accordance with the MPEG-7 specification.

An information processing system (300) comprising:

an input for receiving production information (215) related to the production of content material, and

a processor (310), operably coupled to the input, configured to provide ancillary information related to the content material,

wherein the production information (215) includes one or more parameters (230) associated with one or more cameras used during the production of the content material.

The information processing system (300) of claim 20, further comprising

at least one source (320-323) of other information,

wherein the other information (320-323) includes at least one of:

closed-caption information (320) associated with the content material,

image information (321) associated with the content material,

context information (322) associated with the content material, and