KR20010108147A

KR20010108147A - Method and apparatus for editing a video recording with audio selections

Info

Publication number: KR20010108147A
Application number: KR1020017009524A
Authority: KR
Inventors: Theresa A. Alexander
Original assignee: Peter N. Detkin; Intel Corporation
Priority date: 1999-01-28
Filing date: 1999-01-28
Publication date: 2001-12-07
Also published as: IL144017A; JP2002536887A; GB2362986A; GB2362986B; KR100420293B1; DE19983916B4; DE19983916T1; IL144017A0; GB0116270D0

Abstract

The present invention relates to a method of editing a video recording, comprising receiving a signal that includes video content (124) and analyzing the video content of the received signal (128) to identify a visual attribute that characterizes the video content. Based at least in part on the identified visual attribute of the video, an audio selection (128) with which to augment the received signal is identified from a plurality of available audio selections.

Description

METHOD AND APPARATUS FOR EDITING A VIDEO RECORDING WITH AUDIO SELECTIONS

Recent years have seen many advances in consumer electronics in general and in entertainment systems in particular. Indeed, many homes in the United States now have a television and video recording/playback devices such as a videocassette recorder, a digital versatile disk (also known as a digital video disk, or DVD) player, a laserdisc player, and the like. In addition, a growing number of households own a video camera, commonly called a "cam-corder", with which they can make their own movies documenting children's birthdays, soccer games, vacations, and the like in audio and video. Similarly, although video has not completely replaced "still" photography, weddings are often captured on videotape as well as in photo albums.

More advanced models of these conventional video cameras let the user edit and manipulate the recording. For example, some video cameras allow the user to add a title or credits to the beginning of a recording. Others allow the user to "fade in/fade out", that is, to adjust the focus gradually so that the picture goes from "fuzzy" to clear, or from completely black to clear, or the reverse. Similarly, some more advanced video recorder/playback devices let the user edit a video recording to add a title or credits.

None of the A/V devices described above, however, allows the user to edit a recording to add audio content, such as a sound track, a poem, a sonnet, or other enhancing audio, without replacing the original audio content of the video recording. For example, a person who wants to add background music to a video recording of the ocean using conventional home A/V equipment must re-record the audio track of the video recording, replacing the sound of the ocean with the "background" music, which then becomes the primary audio content of the recording. Those skilled in the art will appreciate that an editor can resolve this dilemma with an audio "mixer", a device commonly found in professional editing equipment that receives, for example, two signals and combines them into a composite signal, and so augment the video recording with audio. Such professional editing equipment, however, is expensive and difficult to set up and use. With a typical conventional audio mixer, the user must select the audio content with which to augment the video recording, synchronize the audio selection with the primary audio content, and set the various audio levels (e.g., volume).

Thus, even if the cost of such a professional mixer were not prohibitive, the user of a typical home entertainment system would find a professional audio mixer difficult to use for editing home movies.

Consequently, although adding background music or other sound effects to home movies is desirable, the consumer electronics industry has so far failed to meet this need.

What is needed, therefore, is a method and apparatus for editing a video recording with audio selections that is not subject to the deficiencies and limitations of the prior art described above.

The present invention relates to the field of entertainment systems and, more particularly, to a method and apparatus for editing a video recording with audio selections.

Summary of the Invention

In accordance with the teachings of the present invention, a method and apparatus for editing a video recording with audio selections is provided. In a first embodiment of the invention, a method of editing a video recording comprises receiving a signal that includes video content and analyzing the video content of the received signal to identify a visual attribute that characterizes the video content. Based at least in part on the identified visual attribute of the video content, an audio selection with which to augment the received signal is identified from a plurality of available audio selections.

The present invention is described by way of exemplary embodiments, but not limitations, illustrated in the accompanying drawings, in which like reference numerals denote like elements.

[Detailed Description of the Drawings]

FIG. 1 is a block diagram of an entertainment system incorporating the teachings of the present invention.

FIG. 2 is a block diagram of an A/V editing system according to one embodiment of the present invention.

FIG. 3 is a flowchart of an example method for automatically augmenting a video recording with an audio selection, in accordance with the teachings of the present invention.

FIG. 4 illustrates an example of the video channel of a received A/V signal, depicting the quantization fields used to characterize the visual attributes of the A/V signal according to one embodiment of the present invention.

FIG. 5 illustrates one example of a method for automatically characterizing the primary audio content of a video recording according to one embodiment of the present invention.

FIG. 6 illustrates an example of an audio selection database according to one embodiment of the present invention.

FIG. 7 is a block diagram of an exemplary computer system suitable for use as an A/V editing system according to one embodiment of the present invention.

FIG. 8 is a block diagram of an exemplary software architecture for implementing an A/V editing system according to one embodiment of the present invention.

In the following detailed description, for purposes of explanation, specific numbers, materials, and configurations are set forth in order to provide a thorough understanding of the present invention. It will be apparent to those skilled in the art, however, that the invention may be practiced without these specific details. In other instances, well-known features are omitted or simplified so as not to obscure the description. In addition, for ease of understanding, certain method steps are described as separate steps; these separately described steps should not be construed as necessarily order-dependent in their performance.

FIG. 1 is a block diagram of an example entertainment system incorporating the teachings of the present invention. In the embodiment illustrated in FIG. 1, entertainment system 100 includes an audio/video (A/V) editing system 128 that incorporates the teachings of the present invention.

As described in more detail below, in one embodiment of the present invention, A/V editing system 128 receives a signal (hereinafter generally referred to as an A/V signal) from a video recording/playback device; the signal carries video content in a video stream and, optionally, audio content in an audio stream. A/V editing system 128 analyzes the video content of the received A/V signal to identify a visual attribute that characterizes the video content and, based at least in part on the identified visual attribute, identifies an appropriate audio selection from a plurality of available audio selections with which to augment the received A/V signal. Those skilled in the art will therefore appreciate that entertainment system 100, with the innovative A/V editing system 128, gives the user a means of automatically editing and enhancing home movies and other video recordings with automatically selected audio selections.
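By way of illustration only, the selection step described above can be sketched in a few lines of Python. The sketch assumes that the identified visual attribute has already been reduced to a single numeric score and that each available audio selection carries a comparable tag; the `AudioSelection` class, the `score` field, the catalogue entries, and the nearest-match rule are all hypothetical and are not taken from the disclosure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AudioSelection:
    """Hypothetical catalogue entry: a title plus a numeric attribute tag."""
    title: str
    score: float  # assumed to share a scale with the visual attribute


def pick_audio_selection(visual_score, catalogue):
    """Return the entry whose tag lies closest to the visual attribute score."""
    if not catalogue:
        raise ValueError("no audio selections available")
    return min(catalogue, key=lambda s: abs(s.score - visual_score))


# Hypothetical catalogue used only for this illustration.
catalogue = [
    AudioSelection("calm piano", 2.0),
    AudioSelection("acoustic folk", 5.0),
    AudioSelection("latin brass", 9.0),
]
```

A nearest-match rule is only one possible policy; a practical system might instead weigh several attributes at once.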

As shown in the embodiment of FIG. 1, A/V editing system 128 can be used with a wide variety of A/V components. According to FIG. 1, entertainment system 100 includes routing multiplexers 108 and 112; a plurality of video recording/playback devices, such as videocassette recorder/player (VCR) 116, digital versatile disk (also known as digital video disk, or DVD) player 118, laserdisc player 120, and video camera 122; television/monitor 126; and a plurality of audio components 132, referred to cumulatively as the audio system, each communicatively coupled within system 100 as shown. Except for the A/V editing system incorporating the teachings of the present invention, each element of system 100 is intended to represent any of a wide variety of commonly available A/V components. Because the functions and features of these elements are well known in the art, they need not be described further.

As illustrated by entertainment system 100 of FIG. 1, an A/V signal may originate from any of a number of sources. In the embodiment of FIG. 1, entertainment system 100 receives A/V signals from wireless sources and/or wireline sources. That is, A/V editing system 128 may receive an A/V signal from any of a number of broadcast sources including, for example, television broadcast 103 received by antenna 102 and satellite broadcast 105 received by satellite dish antenna 104. Similarly, entertainment system 100 also receives A/V signals from wireline sources, such as Internet resources, intranet resources, and cable television broadcasts, via line 106. In the embodiment of FIG. 1, line 106 is thus intended to represent any of a variety of wireline transmission media, including but not limited to a plain old telephone service (POTS) line, an Integrated Services Digital Network (ISDN) line, a cable line, an Ethernet line, a T1/E1 line, and the like, carrying A/V signals from the corresponding wireline services. Similarly, A/V editing system 128 may receive an A/V signal from any of the video recording/playback devices 116-122 described above. In an alternative embodiment, television/monitor 126 and A/V editing system 128 may receive broadcast A/V signals directly from individual antenna/wireline sources, or from MUX 108 via line 110. In short, system 100 is an example intended to illustrate the variety of signal sources available to editing system 128; those skilled in the art will appreciate that systems with greater or lesser capability may be substituted without departing from the spirit or scope of the present invention.

In one embodiment of the present invention, A/V editing system 128 is a computer system incorporating the teachings of the present invention, as described below with reference to FIG. 7. In another embodiment, A/V editing system 128 may be a "set-top box" endowed with the necessary processing power and incorporating the teachings of the present invention. Alternatively, A/V editing system 128 may well be embedded within an individual component of system 100 (e.g., a television or a videocassette recorder). Thus, in the embodiment of FIG. 1, system 100 is intended to represent any of the many entertainment systems found in many homes that can receive A/V signals from any of a number of alternate sources.

Having introduced the innovative A/V editing system 128 in the context of entertainment system 100, FIG. 2 is a block diagram of an exemplary architecture for an A/V editing system 200 suitable for use in an entertainment system incorporating the teachings of the present invention. In the exemplary architecture of FIG. 2, A/V editing system 200 includes video analysis module 202, audio analysis module 208, controller 206, display device 216, user input device 218, and audio file unit 212, each communicatively coupled as shown. Although audio file unit 212 is shown outside controller 206 in the exemplary embodiment of FIG. 2, those skilled in the art will appreciate that audio file unit 212 may well be stored in a mass storage device (not shown) within controller 206. In alternative embodiments, audio file unit 212 may well reside at a remote location accessible via the Internet and line 106, or within an audio system (e.g., audio system 132), in which case A/V editing system 200 and the audio system are interconnected by line 214. Similarly, in an alternative embodiment, television (TV)/monitor 126 serves as the video display for A/V editing system 200 in place of display device 216.

Given the exemplary architecture of FIG. 2, an exemplary method for automatically augmenting a video recording with an audio selection, in accordance with the teachings of the present invention, is now described with reference to the flowchart of FIG. 3. For ease of explanation, and not limitation, the operation of editing system 200 is described with reference to FIG. 3 and with continued reference to FIG. 2. In one embodiment, the user of A/V editing system 200 can enable the audio augmentation feature using user input device 218. If it is determined in step 302 that the audio augmentation feature of A/V editing system 200 is not enabled, the editing session continues without automatic audio augmentation (step 304).

If, however, the audio augmentation feature of A/V editing system 200 is enabled, A/V editing system 200 loads a predetermined amount of the video recording, as an A/V signal, for analysis (step 306). In one embodiment, A/V editing system 200 loads the entire video recording into buffers (not shown) in video analysis module 202 and audio analysis module 208 for analysis and audio augmentation. In an alternative embodiment, A/V editing system 200 loads a subset of the video recording. More specifically, in this latter embodiment, A/V editing system 200 loads segments, or samples, of the video recording roughly two to three minutes long into the buffers for analysis and augmentation; the length of these samples corresponds to the average length of the audio files in audio file unit 212. In another embodiment, A/V editing system 200 loads the video recording into the buffers scene by scene. As described in more detail below, in one embodiment A/V editing system 200 performs an initial analysis of the video recording to identify each of the scenes that make up the recording, and then loads each scene into the analysis buffers in turn for audio augmentation.
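The fixed-length loading described above might be sketched as follows. This is a minimal illustration under stated assumptions: the function name `segment_bounds` and the default of 150 seconds (a value inside the two-to-three-minute range) are hypothetical, and the sketch only computes segment boundaries rather than loading any video data.

```python
def segment_bounds(total_seconds, segment_seconds=150.0):
    """Yield (start, end) times in seconds that cover the whole recording.

    The last segment is shortened so that it ends with the recording.
    """
    if total_seconds < 0 or segment_seconds <= 0:
        raise ValueError("durations must be non-negative and segments positive")
    start = 0.0
    while start < total_seconds:
        end = min(start + segment_seconds, total_seconds)
        yield (start, end)
        start = end
```

For a recording of 400 seconds, the generator produces two full segments followed by a shorter final one. Scene-by-scene loading would replace the fixed step with boundaries found by a scene detector, which is not sketched here.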

Once the predetermined amount of the video recording has been loaded as an A/V signal in step 306, video analysis module 202 and audio analysis module 208 analyze the received A/V signal concurrently in steps 308 and 310. That is, in the embodiment of FIG. 3, the A/V signal is provided at the same time to video analysis module 202, which analyzes its video content, and to audio analysis module 208, which analyzes its audio content. In particular, video analysis module 202 analyzes the video content in the video stream of the received A/V signal and characterizes it in terms of any of a number of visual attributes (step 308). In one embodiment, video analysis module 202 "splits" the received video stream into a number of quantization regions, or quadrants, and analyzes the video content within each quantization region. An example of a video stream broken into such quantization regions is shown in FIG. 4.

Turning to FIG. 4, a video stream and its quantization regions are shown. In particular, video stream 400 is shown comprising a number of "frames" of the video stream (e.g., the predetermined amount of the video stream), denoted 402a, 402b through 402n. As shown in FIG. 4, frame 402b is divided into quantization region 1 (Q₁) 404a through quantization region 9 (Q₉) 404n. Thus, in one embodiment of the present invention, video analysis module 202 analyzes the video content within the quantization regions of each frame of the video stream to characterize the video content in terms of its visual attributes.
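As a rough sketch of the division into nine quantization regions, the following Python function splits one frame into a grid. It assumes a frame is represented as a list of pixel rows and that the nine regions form a 3 x 3 grid in row-major order; both the representation and the grid layout are assumptions made for illustration, since the text states only that regions Q₁ through Q₉ exist.

```python
def split_into_regions(frame, rows=3, cols=3):
    """Split a frame (a list of pixel rows) into rows * cols regions.

    Regions are returned in row-major order, so index 0 is the top-left
    region and index rows * cols - 1 is the bottom-right region.
    """
    height = len(frame)
    width = len(frame[0]) if height else 0
    if height < rows or width < cols:
        raise ValueError("frame is smaller than the requested grid")
    regions = []
    for r in range(rows):
        top, bottom = r * height // rows, (r + 1) * height // rows
        for c in range(cols):
            left, right = c * width // cols, (c + 1) * width // cols
            regions.append([row[left:right] for row in frame[top:bottom]])
    return regions
```

Integer division keeps the regions contiguous and non-overlapping even when the frame dimensions are not multiples of three.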

For example, in one embodiment, video analysis module 202 analyzes each quantization region 404a through 404n of video stream 400 for a color attribute, with appropriate gradations in between, for example whether the video content of the received A/V signal is of "cool" colors (blue, white), "hot" colors (red, yellow), or "warm" or "earthy" colors (brown, orange), and produces an output on a scale of 0 ("cold" (white)) to 10 ("hot" (red)). In another embodiment, video analysis module 202 analyzes each quantization region 404a through 404n of video stream 400 for a visual attribute such as whether the video content is "bright" or "dark". In one embodiment, video analysis module 202 analyzes each quantization region 404a through 404n for content and motion visual attributes, for example whether the video shows a cityscape or a rural landscape, whether it includes people, and whether those people are moving or sitting still. An example of a system for quantifying motion/action in video content is disclosed in co-pending U.S. patent application Ser. No. 08/918,681 by Adnan Allatar, entitled "Bit-Rate Control of Video Data Compression", assigned to the assignee of the present invention. In yet another embodiment, video analysis module 202 analyzes quantization regions 404a through 404n of the video stream for each of the visual attributes described above.
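One possible way to place a region on the 0 ("cold") to 10 ("hot") scale is sketched below. The formula, which compares the mean red and blue channel values, is a hypothetical heuristic chosen only because it maps white to 0 and pure red to 10; the text does not say how the scale is computed. Pixels are assumed to be `(r, g, b)` tuples with 8-bit channels.

```python
def warmth_score(region):
    """Score a region of (r, g, b) pixels from 0.0 (cold) to 10.0 (hot).

    Hypothetical heuristic: the excess of mean red over mean blue,
    rescaled to the range 0..10 and clamped at both ends.
    """
    pixels = [pixel for row in region for pixel in row]
    if not pixels:
        raise ValueError("region contains no pixels")
    mean_red = sum(p[0] for p in pixels) / len(pixels)
    mean_blue = sum(p[2] for p in pixels) / len(pixels)
    return max(0.0, min(10.0, 10.0 * (mean_red - mean_blue) / 255.0))
```

Under this heuristic, blue and white regions both score 0, matching the "cool" grouping in the text, while earthy browns fall in the middle of the scale.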

Returning to the exemplary method of FIG. 3, in addition to the video analysis of step 308, audio analysis module 208 analyzes the audio content, if any, received in the audio stream of the received A/V signal (hereinafter the primary audio content) and identifies audio attribute information that characterizes it (step 310). In the illustrated embodiment, the purpose of identifying audio attribute information in step 310 is to set the levels, e.g., the volume, of the augmenting audio selection so that its relative level does not "smother" or "drown out" the primary audio content, if any. An example method for analyzing the primary audio content of the received A/V signal (i.e., step 310) is shown in FIG. 5.

Referring to FIG. 5, an example method for analyzing the primary audio content of a received A/V signal is described in accordance with one embodiment of the present invention. In the embodiment of FIG. 5, audio analysis begins with audio analysis module 208 determining whether the received A/V signal includes an audio stream carrying audio content (i.e., primary audio content) (step 502). If audio analysis module 208 determines that the audio stream carries no audio content, the audio selection ultimately chosen by A/V editing system 200 will be the only audio content of the recording, and audio analysis module 208 provides a level indication to controller 206 via line 214 (step 504). Because the chosen audio selection supplies the only audio content for that portion of the video recording, this level indication sets its volume to a "high" level. If, however, audio analysis module 208 determines in step 502 that the received A/V signal does include audio content, it next determines whether the primary audio content includes speech (step 506). In one embodiment, audio analysis module 208 uses any of a number of available speech recognition devices to perform this task.

If, in step 506, the audio analysis module 208 determines that the primary audio content consists of speech, it outputs a level indication to the controller via line 214 that ensures the speech remains audible. Alternatively, if the audio analysis module 208 determines in step 506 that the primary audio content is not speech, it determines in step 510 whether the primary audio content consists of music. In one embodiment, the audio analysis module 208 makes this determination by analyzing the level and breadth of the frequency spectrum that characterizes the primary audio content. For example, spectral analysis indicates that the primary audio content consists of music where that content spans a wide frequency spectrum with large gradients in the spectrum that change over time.

In an alternative embodiment, the audio analysis module 208 may be preprogrammed with a number of music selections against which the received primary audio content is compared.
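The spectral test for music described for step 510 can be illustrated with a minimal sketch. The description gives no formula, so everything here is an assumption made for illustration: the function names, the naive DFT, the "fraction of bins near the peak" measure of spectral breadth, the frame-to-frame change measure, and both thresholds are hypothetical and are not taken from the patent.

```python
import cmath
import math


def magnitude_spectrum(frame):
    """Naive DFT magnitudes for the bins below the Nyquist frequency."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame)))
        for k in range(n // 2)
    ]


def spectral_width(mags, floor=0.1):
    """Fraction of bins whose magnitude reaches `floor` times the peak (assumed measure)."""
    peak = max(mags)
    if peak == 0:
        return 0.0
    return sum(1 for m in mags if m >= floor * peak) / len(mags)


def spectral_change(prev, curr):
    """Normalized difference between consecutive spectra (0.0 means identical)."""
    total = sum(prev) + sum(curr)
    if total == 0:
        return 0.0
    return sum(abs(a - b) for a, b in zip(prev, curr)) / total


def looks_like_music(frames, min_width=0.25, min_change=0.2):
    """Hypothetical heuristic: a wide spectrum that also changes over time."""
    spectra = [magnitude_spectrum(f) for f in frames]
    if len(spectra) < 2:
        return False
    width = sum(spectral_width(s) for s in spectra) / len(spectra)
    pairs = list(zip(spectra, spectra[1:]))
    change = sum(spectral_change(a, b) for a, b in pairs) / len(pairs)
    return width >= min_width and change >= min_change


def tones(bins, n=64):
    """Synthetic test frame: equal-amplitude sines at the given DFT bins."""
    return [sum(math.sin(2 * math.pi * k * t / n) for k in bins) for t in range(n)]
```

Under these assumptions, two frames whose energy moves between broad groups of frequencies are classified as music, while a steady single tone is not. A practical implementation would likely use an FFT library and tuned thresholds.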

Regardless of the analysis method, once the primary audio content has been found not to be speech, if the audio analysis module 208 also determines that it is not music (step 510), the audio selection ultimately chosen by the A/V editing system 200 becomes background audio, and the module accordingly outputs a level indication to the controller 206 via line 214 so that the primary audio content is not completely "smothered" by the background audio selection (step 508). If, however, the audio analysis module 208 determines in step 510 that the primary audio content consists of music, it provides this audio attribute information to the controller 206 via line 214. The controller 206 then offers the user of the A/V editing system 200, via the display device 216, the option of overwriting the primary audio content (e.g., the music). Although those skilled in the art will appreciate that this step may be completed later in the method 300 of FIG. 3, the controller's offering of this option is shown as step 512 of FIG. 5 solely for continuity and ease of explanation.

If the user elects to overwrite the primary audio content of the received A/V signal, the audio selection chosen by the A/V editing system 200 becomes the primary audio content of the composite signal generated by the A/V editing system 200 and is therefore set to an appropriate level (step 504). If, however, the user elects in step 512 not to overwrite primary audio content consisting of music, editing of a predetermined amount of the received A/V signal continues without any audio being synthesized.
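The decision flow of FIG. 5 (steps 502 through 512) can be summarized in a short sketch. The `Level` names and the function signature are hypothetical. The description says only that speech yields a level indication that keeps the speech audible, so mapping that case to the same background level as step 508 is an assumption.

```python
from enum import Enum


class Level(Enum):
    HIGH = "high"              # step 504: the selection is the only (or the new primary) audio
    BACKGROUND = "background"  # step 508: the selection must not smother the primary audio
    NONE = "none"              # user keeps the primary music, so nothing is synthesized


def level_indication(has_audio, is_speech, is_music, user_overwrites=False):
    """Return the level indication sent to the controller (hypothetical API)."""
    if not has_audio:            # step 502: no primary audio content
        return Level.HIGH        # step 504
    if is_speech:                # step 506: keep the speech audible (assumed same as 508)
        return Level.BACKGROUND
    if not is_music:             # step 510: neither speech nor music
        return Level.BACKGROUND  # step 508
    # Primary audio is music: step 512 asks the user whether to overwrite it.
    return Level.HIGH if user_overwrites else Level.NONE
```

For example, `level_indication(True, False, True, user_overwrites=False)` yields `Level.NONE`, matching the case in which the user keeps the existing music.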

Thus, in accordance with one embodiment of the present invention, the content of the audio stream is analyzed solely to set the level at which the audio selection is synthesized with the received A/V signal. That is, in the exemplary embodiment, the function of the audio analysis module 208 is to identify audio attributes of the primary audio content so that the recording level (e.g., volume) of the synthesized audio, as set by the controller 206, does not completely "smother" the primary audio content. Those skilled in the art will appreciate, however, that in other embodiments the audio content may be analyzed to perform additional functions without departing from the scope and spirit of the present invention.

Continuing with the example method of FIG. 3, upon receiving visual attribute information from the video analysis module 202 in step 308, the controller 206 identifies an appropriate audio selection from the plurality of audio selections contained in the audio files 212, based at least in part on the received visual attribute information (step 312). In one embodiment, the controller 206 relies on a database that cross-references suitable audio selections against any of a number of corresponding visual attributes. One example of a database suitable for use by the controller 206 is shown in FIG. 6.

In the embodiment of FIG. 6, database 600 is shown containing a number of audio selections cross-referenced against a number of visual attributes. As shown, database 600 is a two-dimensional database in which audio selections are cross-referenced against their corresponding visual attributes. In one embodiment, the y-axis is characterized by the genre 602 of the audio selections. Those skilled in the art will appreciate, however, that genre information 602 is only one of many alternative ways of organizing the information within database 600, and that many suitable alternative approaches exist for organizing this information without departing from the spirit and scope of the present invention.

The x-axis of database 600 is characterized by the audio selections 604 cross-referenced against corresponding visual attributes such as, for example, a color attribute 606, a lighting attribute 608, and a content/motion attribute 610. In accordance with one embodiment of the present invention, the A/V editing system 200 may be preloaded with a number of audio selections, which database 600 cross-references according to the various attribute information. In another embodiment, the A/V editing system 200 provides a user interface through which audio selections can be added to or deleted from the audio files 212, and the controller 206 automatically updates database 600 to reflect each addition or deletion. Further, although database 600 is shown as a two-dimensional database, those skilled in the art will appreciate that this is for ease of explanation only; database 600 may well be replaced with a more complex or less complex database, with a corresponding effect on the amount and complexity of the information it contains.
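The cross-referenced database and the lookup of step 312 can be sketched as follows. The record layout, the sample entries, the attribute values, and the "most matching attributes wins" rule are all assumptions made for illustration; the description does not specify how entries are stored or scored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AudioSelection:
    title: str      # audio selection 604
    genre: str      # genre 602
    color: str      # color attribute 606
    lighting: str   # lighting attribute 608
    motion: str     # content/motion attribute 610


# Hypothetical contents standing in for database 600.
DATABASE_600 = [
    AudioSelection("Selection A", "classical", "warm", "dim", "slow"),
    AudioSelection("Selection B", "rock", "vivid", "bright", "fast"),
    AudioSelection("Selection C", "jazz", "cool", "dim", "moderate"),
]


def identify_selection(database, color, lighting, motion):
    """Return the entry matching the most visual attributes, or None if nothing matches."""
    def matches(entry):
        return (entry.color == color) + (entry.lighting == lighting) + (entry.motion == motion)

    best = max(database, key=matches, default=None)
    if best is None or matches(best) == 0:
        return None
    return best
```

When several entries tie, `max` returns the first one in the list, so the ordering of the database acts as an implicit preference in this sketch.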

Returning to the embodiment of FIG. 3, having automatically selected in step 312 an audio selection to synthesize with the video recording, based at least in part on the identified visual attributes of the video recording, the controller 206 offers the user of the A/V editing system 200 the option of accepting the controller's audio selection or rejecting it in favor of another (step 314). If the user accepts the controller's audio selection in step 314, the controller 206 synchronizes the audio selection with the video recording, mixes the audio selection with the primary audio content, if any, at the level automatically determined by the audio analysis module 208, and outputs a composite signal comprising the received A/V signal synthesized with the automatically identified audio selection. In one embodiment, while mixing the automatically identified audio selection with the primary audio content, the controller 206 adjusts the tempo (e.g., the speed) of the audio selection to the rate of motion identified in the video content or to the tempo of the primary audio content.
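Mixing the audio selection with the primary audio content at a given level can be illustrated with a minimal sketch. The description does not say how the mix is computed, so the sample representation (floats in the range -1.0 to 1.0), the gain table, and the clipping are assumptions; tempo adjustment is not modeled.

```python
from itertools import zip_longest

# Hypothetical gains for the level indications; the description gives no numeric values.
GAIN = {"high": 1.0, "background": 0.25}


def mix(primary, selection, level):
    """Add the scaled audio selection to the primary audio, sample by sample."""
    gain = GAIN[level]
    mixed = []
    # The shorter stream is padded with silence so that no samples are dropped.
    for p, s in zip_longest(primary, selection, fillvalue=0.0):
        sample = p + gain * s
        mixed.append(max(-1.0, min(1.0, sample)))  # clip to the valid range
    return mixed
```

With no primary audio, `mix([], selection, "high")` simply returns the selection, which corresponds to the case in which the selection supplies the only audio content.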

If, however, the user rejects the controller's audio selection in step 314, the user is presented with an interface for accessing database 600 of available audio selections and chooses the audio selection to synthesize with the video recording (step 316). In step 318, the controller 206 determines whether the end of the video recording has been reached. If so, the method ends. If the controller 206 determines that the end of the video recording has not yet been reached, the method proceeds to step 306, and a predetermined amount of video is loaded into the A/V editing system 200 for editing.
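The overall loop of steps 306 through 318 can be outlined as below. This is only a sketch: the function name and the callback parameters are hypothetical stand-ins for the analysis modules, the database lookup, and the user interface, and a "segment" stands for the predetermined amount of video loaded in step 306.

```python
def edit_recording(segments, analyze_video, analyze_audio, identify, user_accepts, user_picks):
    """Walk the recording segment by segment and pair each segment with an audio selection."""
    edited = []
    for segment in segments:                      # step 306: load a predetermined amount
        visual = analyze_video(segment)           # step 308: visual attribute information
        level = analyze_audio(segment)            # step 310: level indication
        selection = identify(visual)              # step 312: automatic selection
        if not user_accepts(segment, selection):  # step 314: accept or reject
            selection = user_picks(segment)       # step 316: manual choice from the database
        edited.append((segment, selection, level))
    return edited                                 # step 318: end of the recording reached
```

Passing the analysis and user-interaction steps in as callables keeps the loop independent of how each step is actually implemented.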

Thus, in accordance with the teachings of the present invention, the A/V editing system 200 analyzes the video content of a received A/V signal and characterizes that content by a number of visual attributes. The A/V editing system 200 also analyzes the audio content of the received A/V signal, e.g., the primary audio content, to automatically identify the level at which to "mix" an audio selection with the primary audio content. Having generated the visual attribute information, the A/V editing system 200 chooses an audio selection from the plurality of available audio selections based at least in part on the identified visual attribute information. Having identified the audio selection to synthesize with the primary audio content, and upon receiving user approval, the A/V editing system 200 mixes the audio selection with the primary audio content at a level determined automatically from the identified audio attribute information of the audio stream. In accordance with one embodiment of the present invention, the audio selection is a musical composition, e.g., a song. In an alternative embodiment, the audio selection is a poem, a sonnet, or another lyrical composition automatically selected by the A/V editing system 200 to enhance the mood portrayed by the visual attributes of the received A/V signal.

According to one embodiment of the present invention, the A/V editing system 200 is a computer system suitably configured to analyze a video stream and characterize the video content of the received A/V signal in terms of its visual attributes; based at least in part on those visual attributes, the computer system identifies an audio selection to synthesize with the received A/V signal and generates, for recording, a combination of the received A/V signal and the automatically identified audio selection. As described in more detail below, the A/V editing system 200 is intended to represent a broad range of computer systems known in the art. One example is a desktop system equipped with a high-performance microprocessor, such as the Pentium® processor, Pentium® Pro processor, or Pentium® III processor commonly available from Intel Corporation of Santa Clara, California, together with any of a number of audio and video input/output peripherals and interfaces for receiving, digitizing, compressing, and decompressing audio and video signals. The external size and design of the A/V editing system 200 may be changed to one that fits more aesthetically within an entertainment system, e.g., entertainment system 100. Accordingly, the A/V editing system 200 may well be embodied in a "set-top" box incorporating the teachings of the present invention.

FIG. 7 is a block diagram of a computer system (e.g., system 700) incorporating the teachings of the present invention. In one embodiment, system 700 is the A/V editing system 128 of FIG. 1. In the illustrated embodiment, system 700 includes at least one processor (e.g., processor 702) and a cache memory 704 coupled to each other as shown. System 700 also includes a high-performance input/output (I/O) bus 706 and a standard I/O bus 708, as shown. A host bridge 710 couples processor 702 to the high-performance I/O bus 706, while an I/O bus bridge 712 couples the high-performance I/O bus 706 to the standard I/O bus 708. Coupled to the high-performance I/O bus 706 are a network/communication interface 724, system memory 714, an audio/video interface board 730, an A/V editor 732, and video memory 716. A display device 718 is in turn coupled to video memory 716. Coupled to the standard I/O bus 708 are a mass storage device 720, a keyboard and pointing device 722, and I/O ports 726. In one embodiment, the keyboard and pointing device are coupled to the standard I/O bus 708 by a serial communication interface cable, while in alternative embodiments they may be communicatively coupled by an infrared (IR) interface or a radio-frequency (RF) interface.

With continued reference to FIG. 7, elements 702 through 730 perform their conventional functions known in the art. In particular, the network/communication interface 724 is used to provide communication between system 700 and any of a wide range of conventional networks, such as Ethernet, token ring, the Internet, and the like. Similarly, the audio/video interface board 730 is used to receive broadcast communications from any of a wide range of conventional wired and wireless broadcast media, such as RF broadcast, satellite broadcast, cable broadcast, and the like. Mass storage device 720 provides permanent storage for the data and programming instructions that implement the functions described above, whereas system memory 714 provides temporary storage for the data and programming instructions while they are executed by processor 702. I/O ports 726 are one or more serial and/or parallel communication ports used to provide communication between system 700 and additional peripheral devices that may be coupled to it (e.g., a stereo, speakers, and the like). Collectively, the elements coupled to system 700 are intended to represent a broad category of hardware systems, including but not limited to general-purpose computer systems based on the Pentium® processor, Pentium® Pro processor, or Pentium® II processor commonly available from Intel Corporation of Santa Clara, California.

In one embodiment, the A/V editor 732 comprises the video analysis module 202 and audio analysis module 208 of the A/V editing system 200, while the controller 206, display device 216, and user interface device 218 of the A/V editing system 200 correspond, respectively, to the processor 702, display device 718, and keyboard and pointing device 722 of system 700 shown in FIG. 7. In one embodiment, the audio files 212 are stored in mass storage device 720 or are located remotely and communicatively coupled to system 700 via the network/communication interface 724. In one embodiment, in accordance with the teachings above, system 700 receives an A/V signal from the network/communication interface 724 and/or the audio/video tuner interface 730, analyzes the video content for visual attribute information, and automatically identifies an audio selection to synthesize with the received A/V signal. In an alternative embodiment, system 700 receives the A/V signal via an antenna (not shown) coupled to one of the I/O ports 726 and automatically identifies an appropriate audio selection to synthesize with the received A/V signal.

It will be appreciated that the various elements of system 700 may be rearranged. For example, cache 704 may be on-chip with processor 702. Alternatively, cache 704 and processor 702 may be packaged together as a "processor module," with processor 702 being referred to as the "processor core." Furthermore, mass storage device 720, keyboard and pointing device 722, and/or display device 718 and video memory 716 may not be included in system 700. Additionally, the peripheral devices shown coupled to the standard I/O bus 708 may, in alternative embodiments, be coupled to the high-performance I/O bus 706; or, in some implementations, only a single bus may exist, with the elements of system 700 coupled to that single bus. Further, additional components, such as additional processors, storage devices, or memories, may be included in system 700.

In one embodiment, rather than including a separate A/V editor 732, the innovative features of the present invention described above are implemented as a series of software routines run by system 700 of FIG. 7. These software routines comprise a plurality or series of instructions to be executed by a processor, such as processor 702 of system 700. Initially, the series of instructions is stored on a storage device, such as mass storage device 720. It will be appreciated that the series of instructions can be stored on any conventional storage medium, such as a diskette, CD-ROM, magnetic tape, digital versatile disk (DVD, also referred to as a digital video disk), laser disk, ROM, flash memory, and the like. The series of instructions need not be stored locally and may instead be received from a remote storage device, such as a server on a network, via the network/communication interface 724. The instructions are copied from the storage device, such as mass storage device 720, into system memory 714 and then accessed and executed by processor 702. In one embodiment, these software routines are written in the C++ programming language. It will be appreciated, however, that these routines may be implemented in any of a wide variety of programming languages. In alternative embodiments, the present invention is implemented in discrete hardware or firmware. For example, an application-specific integrated circuit (ASIC) could be programmed with the functions of the present invention described above.

FIG. 8 is a block diagram illustrating the software elements of an example software architecture in accordance with one embodiment of the present invention. In particular, example software architecture 800 is shown comprising an A/V editor application 802, an A/V editor agent 804 with an associated video analysis module 806 and audio analysis module 808, and an operating system 810 with associated drivers and BIOS 822. As shown in the embodiment of FIG. 8, the A/V editor application 802 interfaces with the A/V editor agent 804 and provides a user interface to the A/V editing system 128 of FIG. 1.

In one embodiment, the A/V editor agent 804 is coupled to database 812 and audio files 814 to obtain information. In an alternative embodiment, the audio files 814 and/or database 812 are integrated modules of the A/V editor agent 804. As shown in FIG. 8, the A/V editor agent 804 receives a video signal from an appropriate driver within operating system 810 via a communication port. In one embodiment, in accordance with the teachings of the present invention described above, the video analysis module 806 analyzes the video content of the received A/V signal for any of a number of visual attributes, while the audio analysis module 808 analyzes the audio content (e.g., the primary audio content), if any, of the recorded A/V signal to determine the relative level (e.g., volume) at which the audio selection is to be recorded. Based at least in part on the output of the video analysis module 806, the A/V editor agent 804 accesses database 812 to identify an audio selection that substantially corresponds to the identified visual attributes of the video content of the received signal. Once the user accepts the identified audio selection, the A/V editor application 802 combines the received A/V signal with the identified audio selection, at the level automatically selected by the audio analysis module 808, to provide a composite A/V signal, including the synthesized audio selection, that is output through the appropriate driver of operating system 810.

As alluded to above, BIOS 822 provides an interface between operating system 810 and the various I/O devices coupled to the hardware system. Operating system 810 is a software service that provides an interface between BIOS 822 and the A/V editor agent 804, as well as any other software applications being executed by the computer system on which the invention is practiced (e.g., system 700). Operating system 810 provides an interface, such as a graphical user interface (GUI), between the user and the system controller. According to one embodiment of the present invention, operating system 810 is the Windows™ 95 operating system available from Microsoft Corporation of Redmond, Washington. It will be appreciated, however, that the present invention may be used with any of a number of other conventional operating systems, such as other versions of Microsoft Windows™ (e.g., Windows™ 3.0, Windows™ 3.1, Windows™ NT, or Windows™ CE), Microsoft DOS, OS/2 available from International Business Machines Corporation of Armonk, New York, the Apple Macintosh Operating System and NeXTSTEP®, both available from Apple Computer Incorporated of Cupertino, California, and the UNIX operating system available from Santa Cruz Operations of Santa Cruz, California.

Thus, in accordance with the teachings of the present invention, an A/V editing system analyzes the video content of a video recording and, based at least in part on the visual attributes of that content, automatically identifies an audio selection to synthesize with the video recording at a recording level, automatically selected by the A/V editing system, that does not overwhelm the primary audio content, if any, of the video recording.

While the method and apparatus of the present invention have been described in terms of the above embodiments, those skilled in the art will recognize that the invention is not limited to the embodiments described. Various modifications and alterations may be made to these specific embodiments without departing from the scope and spirit of the invention. For example, although depicted as a separate component, the A/V editing system 128 may well be incorporated into any of the system components of system 100 (e.g., the television/monitor or the videocassette recorder/player). Further, editing system 128 may not include all of the elements depicted in FIGS. 2 through 7 or, alternatively, may include additional elements without departing from the scope and spirit of the present invention. Accordingly, the description is to be regarded as illustrative rather than restrictive of the present invention.

Thus, a method and apparatus for editing a video recording with audio selections has been described.

Claims

A machine-implemented method comprising:

receiving a signal comprising video content;

analyzing the video content of the received signal to identify a visual attribute of the video content; and

identifying, from a plurality of available audio selections, an appropriate audio selection to composite with the received signal, based at least in part on the identified visual attribute of the video content.

The method of claim 1, further comprising:

compositing the received signal with the identified audio selection to form a composite audio/video (A/V) signal comprising at least one of the video content and the identified audio selection.

The method of claim 1,

wherein analyzing the video content comprises characterizing the video content of the received signal by a color visual attribute.

The method of claim 1,

wherein analyzing the video content comprises characterizing the video content of the received signal by a lighting visual attribute.

The method of claim 1,

wherein analyzing the video content comprises characterizing the video content of the received signal by a motion visual attribute.

The method of claim 1,

wherein analyzing the video content comprises at least one of:

characterizing the video content of the received signal by a color visual attribute;

characterizing the video content of the received signal by a brightness visual attribute; and

characterizing the video content of the received signal by a motion visual attribute.

The method of claim 1, further comprising:

identifying a recording level at which to composite the received signal with the identified audio selection.

The method of claim 7, wherein

the machine-implemented method further comprising.

The method of claim 7,

wherein identifying the recording level comprises:

determining whether the received signal includes audio content;

if the received signal includes audio content, identifying an audio attribute representing a characteristic of the audio content; and

selecting a recording level at which to composite the audio content of the received signal with the identified audio selection.

An apparatus comprising:

an input port to receive a signal having video content;

a video analysis circuit, coupled to the input port, to analyze the video content of the received signal to identify a visual attribute of the video content; and

a controller, coupled to the video analysis circuit, to identify and retrieve, from a plurality of audio selections, an appropriate audio selection to composite with the received signal, based at least in part on the identified visual attribute of the video content of the received signal.

The apparatus of claim 10,

further comprising a mass storage device, coupled to the controller, to store and retrieve each of the plurality of audio selections.

The apparatus of claim 10,

wherein the video analysis circuit quantifies a color visual attribute of the video content.

The apparatus of claim 10,

wherein the video analysis circuit identifies a brightness visual attribute of the video content.

The apparatus of claim 10,

wherein the video analysis circuit identifies a motion visual attribute of the video content.

The apparatus of claim 10,

wherein the video analysis circuit identifies a color visual attribute, a brightness visual attribute, and a motion visual attribute of the video content.

The apparatus of claim 10,

wherein the controller automatically mixes the identified audio selection with main audio content of the received signal, if any, at an identified recording level.

The apparatus of claim 16,

further comprising an audio analysis circuit, coupled to the input port and the controller, to analyze the main audio content of the received signal, if any, to identify an audio attribute of the main audio content.

The apparatus of claim 17,

wherein the audio analysis circuit selects the recording level for the identified audio selection based at least in part on the identified audio attribute of the main audio content, and provides the recording level to the controller.

The apparatus of claim 10, wherein:

the plurality of audio selections are stored in a remote audio system communicatively coupled to the apparatus via an audio interface; and

the controller identifies and retrieves the appropriate audio selection from the remote audio system via the audio interface.

The apparatus of claim 10, wherein:

the plurality of audio selections are stored in a network server communicatively coupled to the apparatus via a network connection; and

the controller identifies and retrieves the appropriate audio selection from the network server via the network connection.

A video editing system comprising:

an input port to receive a signal having video content;

a video analysis circuit, coupled to the input port, to analyze the video content of the received signal to identify a visual attribute of the video content; and

a controller, coupled to the video analysis circuit, to identify and retrieve, from a plurality of audio selections, an appropriate audio selection to composite with the received signal, based at least in part on the identified visual attribute of the video content.

The video editing system of claim 21,

wherein the video analysis circuit quantifies a color visual attribute, a brightness visual attribute, and/or a motion visual attribute of the video content.

The video editing system of claim 21,

wherein the controller automatically composites the identified audio selection with main audio content of the received signal, if any, at an identified recording level.

The video editing system of claim 21,

further comprising an audio analysis circuit, coupled to the input port and the controller, to analyze main audio content of the received signal, if any, to identify an audio attribute of the main audio content.

The video editing system of claim 25,

wherein the audio analysis circuit selects a recording level for the identified audio selection based at least in part on the identified audio attribute of the main audio content, and provides the recording level to the controller.

The video editing system of claim 21,

wherein the plurality of audio selections are stored in a remote audio system communicatively coupled to the video editing system via an audio interface.

The video editing system of claim 21, wherein:

the plurality of audio selections are stored in a network server communicatively coupled to the video editing system via a network connection; and

the controller identifies and retrieves the appropriate audio selection from the network server via the network connection.

An electronic device comprising:

a main audio/video (A/V) functional unit to provide a signal comprising video content; and

a video editing system, responsive to the main A/V functional unit, including: an input port, coupled to the main A/V functional unit, to receive the signal having video content; a video analysis circuit, coupled to the input port, to analyze the video content of the received signal to identify a visual attribute of the video content; and a controller, coupled to the video analysis circuit, to identify and retrieve, from a plurality of audio selections, an appropriate audio selection to composite with the received signal, based at least in part on the identified visual attribute of the video content of the received signal.

The electronic device of claim 29,

wherein the electronic device is a television.

The electronic device of claim 29,

wherein the electronic device is a video recorder/playback device.

The electronic device of claim 29,

wherein the auxiliary video editing system further comprises an audio analysis circuit, coupled to the input port, to analyze main audio content included in the received signal, if any, to identify an audio attribute of the main audio content.

The electronic device of claim 32,

wherein the audio analysis circuit identifies a recording level suitable for compositing the received signal with the identified audio selection, based at least in part on the identified audio attribute of the main audio content.

The electronic device of claim 29,

wherein the auxiliary video editing system composites the received signal with the identified audio selection to generate a composite audio/video (A/V) signal comprising at least one of the video content and the identified audio selection.

A machine-readable medium having stored thereon a plurality of instructions to implement a video editing service,

wherein the video editing service includes a service to analyze video content of a received signal to identify a visual attribute of the video content, and a service to identify, from a plurality of available audio selections, an appropriate audio selection to composite with the video content of the received signal, based at least in part on the identified visual attribute of the video content of the received signal.

The machine-readable medium of claim 35,

wherein the video editing service further includes a service to analyze main audio content of the received signal, if any, to identify an audio attribute of the main audio content and to identify, based at least in part on the audio attribute, a recording level suitable for compositing the received video recording with the identified audio selection.

The machine-readable medium of claim 35,

wherein the video editing service further includes a service to composite the received signal with the identified audio selection to generate a composite audio/video (A/V) signal comprising at least one of the video content and the identified audio selection.