JP2009060325A - Editing apparatus, editing method and program

Info

Publication number: JP2009060325A
Application number: JP2007225206A
Authority: JP
Inventors: Mitsutoshi Magai; 光俊真貝; Yoshiaki Shibata; 賀昭柴田
Original assignee: Sony Corp
Current assignee: Sony Corp
Priority date: 2007-08-31
Filing date: 2007-08-31
Publication date: 2009-03-19

Abstract

PROBLEM TO BE SOLVED: To make it easy to apply predetermined processing to the speech of a desired speaker in the editing result when video with audio is edited non-destructively. SOLUTION: An EM creating section 153 creates electronic mark data for the editing result on the basis of the electronic mark data of a clip, which describes a speaker EM (start), an electronic mark that indicates the start point of an utterance as a feature of the clip's audio and to which the speaker ID of the speaker of that utterance is added. An operation section I/F designates the speaker whose voice, within the audio of the editing result, is to be subjected to duck voice processing. On the basis of the electronic mark data of the editing result, an edit list creating section 152 includes in an edit list information designating the sections of the voice of the speaker designated via the operation section I/F as the sections to be subjected to duck voice processing. The invention is applicable to, for example, an editing apparatus. COPYRIGHT: (C)2009,JPO&INPIT

Description

The present invention relates to an editing apparatus, an editing method, and a program, and in particular to an editing apparatus, an editing method, and a program that make it easy to apply predetermined processing to the voice of a desired speaker in the editing result when video with audio is edited non-destructively.

Conventionally, in news programs and documentary programs, when it is necessary to hide a speaker's identity, for example, the speaker's voice is sometimes processed to change its pitch or formants. The processed voice may resemble the voice of a large man or a child, but it most often resembles the voice of Donald Duck, so it is called a duck voice. In the following, the processing that turns a voice into a duck voice is referred to as duck voice processing.

In recent years, to make editing work more efficient, editing apparatuses that perform non-destructive editing, in which the captured video and audio are left as they are and only cut points are described, have become widespread. A cut point is either an In point, which represents the start position of a section of the captured video or audio to be included in the editing result, or an Out point, which represents the end position of that section.
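The idea of non-destructive editing described above can be sketched as follows. This is only an illustration: the function name `render`, the representation of the edit list as (In point, Out point) frame pairs, and the choice to treat the Out point as inclusive are assumptions and do not come from the specification.

```python
from typing import List, Sequence, Tuple


def render(source_frames: Sequence[int], edit_list: List[Tuple[int, int]]) -> List[int]:
    """Assemble an editing result from untouched source frames.

    Each edit list entry is an (in_point, out_point) pair of frame
    numbers on the source; the Out point is treated as inclusive here.
    """
    result: List[int] = []
    for in_point, out_point in edit_list:
        # Only read from the source; the captured material is never modified.
        result.extend(source_frames[in_point:out_point + 1])
    return result


source = list(range(10))          # stand-in for ten captured frames
edit_list = [(2, 4), (7, 8)]      # two sections chosen by cut points
edited = render(source, edit_list)
```

Because only the list of cut points is stored, changing an edit means rewriting a few numbers rather than re-recording the material.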

One such editing apparatus, when the user designates a cut point, adds an electronic mark indicating the cut point to an electronic mark list, which is a list of electronic marks such as EssenceMark (registered trademark) that indicate features of the video, and generates an edit list for editing the video and audio based on the cut points (see, for example, Patent Document 1).

Patent Document 1: JP 2004-180279 A

However, for editing apparatuses that perform non-destructive editing, applying predetermined processing such as duck voice processing to the voice of a desired speaker in the editing result has not been considered.

The present invention has been made in view of this situation, and makes it easy to apply predetermined processing to the voice of a desired speaker in the editing result when video with audio is edited non-destructively.

An editing apparatus according to one aspect of the present invention is an editing apparatus that edits video with audio, and includes: editing means for editing the video with audio and creating editing information about the editing result; creating means for creating post-edit electronic mark data, which is the electronic mark data of the editing result, based on pre-edit electronic mark data that describes electronic marks indicating features of the audio of the video with audio and carrying information unique to the speaker of that audio; and designating means for designating the speaker whose voice, within the audio of the editing result, is to be subjected to predetermined processing. Based on the post-edit electronic mark data, the editing means includes in the editing information information that designates the sections of the voice of the speaker designated by the designating means as the sections of audio to be subjected to the predetermined processing.

In the editing apparatus according to one aspect of the present invention, the editing means can edit the video with audio based on cut points, designated by the user, each representing the start position or end position of a section of the video with audio to be included in the editing result. The creating means can create the post-edit electronic mark data by copying the pre-edit electronic mark data of the audio included in the editing result as post-edit electronic mark data and newly describing, in the post-edit electronic mark data, electronic marks at the positions on the editing result that correspond to the cut points.
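A minimal sketch of this copy-and-add step is given below. The `Mark` and `Segment` types, the mark values `"speaker_start"` and `"cut"`, and the decision to write one cut mark at the start of each section on the result timeline are all assumptions made for illustration; the specification does not define these names. The sketch also ignores the case where an utterance begins before the In point of a section.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Mark:
    frame: int                        # frame number the mark is attached to
    value: str                        # hypothetical mark type, e.g. "speaker_start"
    speaker_id: Optional[str] = None  # unique speaker information, if any


@dataclass(frozen=True)
class Segment:
    in_frame: int    # In point on the clip (inclusive)
    out_frame: int   # Out point on the clip (inclusive)


def build_edited_marks(clip_marks: List[Mark], segments: List[Segment]) -> List[Mark]:
    """Create post-edit marks from pre-edit marks and the chosen sections."""
    edited: List[Mark] = []
    offset = 0  # position of the current section on the result timeline
    for seg in segments:
        # Newly describe a mark at the result position of the cut point.
        edited.append(Mark(offset, "cut"))
        # Copy the pre-edit marks that fall inside the section,
        # shifting them from clip frames to result frames.
        for m in clip_marks:
            if seg.in_frame <= m.frame <= seg.out_frame:
                edited.append(Mark(m.frame - seg.in_frame + offset, m.value, m.speaker_id))
        offset += seg.out_frame - seg.in_frame + 1
    return sorted(edited, key=lambda m: m.frame)
```

The sort is stable, so a cut mark and a copied mark that land on the same result frame keep the order in which they were written.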

In the editing apparatus according to one aspect of the present invention, the creating means can add information indicating that the predetermined processing is to be applied to each post-edit electronic mark that carries the unique information of the speaker designated by the designating means.

An editing method according to one aspect of the present invention is an editing method for an editing apparatus that edits video with audio, and includes the steps of: editing the video with audio and creating editing information about the editing result; creating post-edit electronic mark data, which is the electronic mark data of the editing result, based on pre-edit electronic mark data that describes electronic marks indicating features of the audio of the video with audio and carrying information unique to the speaker of that audio; designating the speaker whose voice, within the audio of the editing result, is to be subjected to predetermined processing; and, based on the post-edit electronic mark data, including in the editing information information that designates the sections of the designated speaker's voice as the sections of audio to be subjected to the predetermined processing.

A program according to one aspect of the present invention causes a computer to perform editing processing for editing video with audio, the editing processing including the steps of: editing the video with audio and creating editing information about the editing result; creating post-edit electronic mark data, which is the electronic mark data of the editing result, based on pre-edit electronic mark data that describes electronic marks indicating features of the audio of the video with audio and carrying information unique to the speaker of that audio; designating the speaker whose voice, within the audio of the editing result, is to be subjected to predetermined processing; and, based on the post-edit electronic mark data, including in the editing information information that designates the sections of the designated speaker's voice as the sections of audio to be subjected to the predetermined processing.

In one aspect of the present invention, video with audio is edited and editing information about the editing result is created; post-edit electronic mark data, which is the electronic mark data of the editing result, is created based on pre-edit electronic mark data that describes electronic marks indicating features of the audio of the video with audio and carrying information unique to the speaker of that audio; the speaker whose voice, within the audio of the editing result, is to be subjected to predetermined processing is designated; and, based on the post-edit electronic mark data, information that designates the sections of the designated speaker's voice as the sections of audio to be subjected to the predetermined processing is included in the editing information.
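The last step, turning the post-edit electronic mark data into sections to be processed, might look like the sketch below. It assumes that each mark is a (frame, speaker ID) pair marking the start of an utterance on the result timeline, and that a speaker's section runs until the next speaker-start mark or the end of the result; both the function name `duck_voice_sections` and this end-of-section rule are assumptions, not something this part of the specification states.

```python
from typing import List, Tuple


def duck_voice_sections(
    marks: List[Tuple[int, str]],
    target_speaker: str,
    total_frames: int,
) -> List[Tuple[int, int]]:
    """Return half-open (start, end) frame sections spoken by target_speaker."""
    ordered = sorted(marks)
    sections: List[Tuple[int, int]] = []
    for i, (frame, speaker) in enumerate(ordered):
        if speaker != target_speaker:
            continue
        # The section ends where the next utterance starts, or at the end of the result.
        end = ordered[i + 1][0] if i + 1 < len(ordered) else total_frames
        if sections and sections[-1][1] == frame:
            # Merge back-to-back sections of the same speaker.
            sections[-1] = (sections[-1][0], end)
        else:
            sections.append((frame, end))
    return sections


marks = [(0, "A"), (30, "B"), (60, "A"), (90, "A")]
sections_a = duck_voice_sections(marks, "A", total_frames=120)
```

The resulting list of sections is the kind of information that would be written into the editing information, so that the processing can be applied later without altering the recorded audio.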

As described above, according to one aspect of the present invention, predetermined processing can easily be applied to the voice of a desired speaker in the editing result when video with audio is edited non-destructively.

Embodiments of the present invention are described below. The correspondence between the constituent features of the present invention and the embodiments described in the specification or drawings is illustrated as follows. This description is intended to confirm that embodiments supporting the present invention are described in the specification or drawings. Therefore, even if an embodiment is described in the specification or drawings but is not described here as corresponding to a constituent feature of the present invention, that does not mean that the embodiment does not correspond to that constituent feature. Conversely, even if an embodiment is described here as corresponding to a constituent feature, that does not mean that the embodiment does not correspond to constituent features other than that one.

An editing apparatus according to one aspect of the present invention is
an editing apparatus that edits video with audio (for example, the editing apparatus 41 in FIG. 1), and includes:
editing means (for example, the edit list creating unit 152 in FIG. 10) for editing the video with audio and creating editing information about the editing result;
creating means (for example, the EM creating unit 153 in FIG. 10) for creating post-edit electronic mark data, which is the electronic mark data of the editing result, based on pre-edit electronic mark data that describes electronic marks indicating features of the audio of the video with audio and carrying information unique to the speaker of that audio; and
designating means (for example, the operation unit I/F 114 in FIG. 9) for designating the speaker whose voice, within the audio of the editing result, is to be subjected to predetermined processing.
Based on the post-edit electronic mark data, the editing means includes in the editing information information that designates the sections of the voice of the speaker designated by the designating means as the sections of audio to be subjected to the predetermined processing.

An editing method according to one aspect of the present invention is
an editing method for an editing apparatus that edits video with audio (for example, the editing apparatus 41 in FIG. 1), and includes the steps of:
editing the video with audio and creating editing information about the editing result (for example, step S83 in FIG. 23);
creating post-edit electronic mark data, which is the electronic mark data of the editing result, based on pre-edit electronic mark data that describes electronic marks indicating features of the audio of the video with audio and carrying information unique to the speaker of that audio (for example, step S90 in FIG. 23);
designating the speaker whose voice, within the audio of the editing result, is to be subjected to predetermined processing (for example, step S91 in FIG. 23); and
including in the editing information, based on the post-edit electronic mark data, information that designates the sections of the designated speaker's voice as the sections of audio to be subjected to the predetermined processing (for example, step S92 in FIG. 23).

Specific embodiments to which the present invention is applied are described in detail below with reference to the drawings.

FIG. 1 shows a configuration example of a first embodiment of a shooting and editing system to which the present invention is applied.

The shooting and editing system 10 in FIG. 1 is used, for example, to shoot and edit television material, that is, the material from which a television program is made.

The shooting and editing system 10 includes a video camera 21, such as a camcorder (trademark), equipped with a microphone 21A; a video camera 22; and an editing apparatus 41.

The video cameras 21 and 22 are devices used to record television material for television programs such as news programs and documentary programs. The video camera 21 shoots video of the television material and picks up the surrounding sound with the microphone 21A. The video camera 21 records the resulting data of video with audio on the optical disc 31 as material data, that is, the data of the television material.

The video camera 22 shoots video of the television material and records the resulting video data on the optical disc 32 as material data. In addition, based on user input, each of the video cameras 21 and 22 generates information that will be useful in later editing, such as information about the recording, and associates it with the material data as metadata.

The optical disc 31 or 32, on which the material data and its associated metadata are recorded, is loaded into the optical disc drive 41A of the editing apparatus 41.

The editing apparatus 41 is used to consolidate the material data recorded on the optical discs 31 and 32 loaded in the optical disc drive 41A onto the single optical disc 31, and to edit the material data consolidated on the optical disc 31.

The editing apparatus 41 copies the material data recorded on the optical disc 32 to the optical disc 31 as necessary. In response to user input, the editing apparatus 41 also performs non-destructive editing of the material data consolidated on the optical disc 31, creates an edit list, which is information about the editing result, and records the edit list on the optical disc 31. Furthermore, in response to user input, the editing apparatus 41 applies duck voice processing to the voice of a desired speaker in the editing result.

In the shooting and editing system 10 of FIG. 1, the video camera 21 or 22 and the editing apparatus 41 are separate devices, but they may be integrated.

Also, in the shooting and editing system 10, the optical discs 31 and 32 are loaded into the optical disc drive 41A of the editing apparatus 41, and reading from or recording to the optical discs 31 and 32 is performed there. Alternatively, the editing apparatus 41 may be connected via a network to the video camera 21 with the optical disc 31 loaded and to the video camera 22 with the optical disc 32 loaded, and reading from or recording to the optical discs 31 and 32 may be performed via that network.

FIG. 2 is a block diagram showing a hardware configuration example of the video camera 21 of FIG. 1.

In the video camera 21 of FIG. 2, a video input I/F (Interface) 60, an audio input I/F 61, a microcomputer 62, a temporary storage memory I/F 63, an optical disc drive I/F 64, an operation unit I/F 65, an audio output I/F 66, a serial data I/F 67, a video display I/F 68, a memory card I/F 69, a network I/F 70, a hard disk drive I/F 71, and a drive I/F 72 are connected to a system bus 73.

A camera 74 is connected to the video input I/F 60, and the video signal obtained by shooting with the camera 74 is input from the camera 74. The video input I/F 60 performs A/D (Analog/Digital) conversion on the synchronous signals contained in that video signal, such as signals conforming to the SDI (Serial Digital Interface) standard, composite signals, and component signals, and supplies the resulting digital signal as video data to the microcomputer 62, the video display I/F 68, or the temporary storage memory I/F 63 via the system bus 73.

The externally provided microphone 21A is connected to the audio input I/F 61, and an audio signal, which is an analog signal of the surrounding sound picked up by the microphone 21A, is input to it. The audio input I/F 61 performs A/D conversion on the audio signal and supplies the resulting digital signal as audio data to the microcomputer 62 or the temporary storage memory I/F 63 via the system bus 73.

The microcomputer 62 includes a CPU (Central Processing Unit), a ROM (Read Only Memory), and a RAM (Random Access Memory). The CPU of the microcomputer 62 controls each part of the video camera 21 in response to operation signals from the operation unit I/F 65 and the like, in accordance with a program recorded in the ROM or on the hard disk 81.

For example, the CPU uses the material data, which consists of the video data supplied from the video input I/F 60 and the audio data supplied from the audio input I/F 61, to create proxy data in which the resolution of the video data is reduced. The CPU supplies the proxy data and the material data to the temporary storage memory I/F 63 and has them stored in the temporary storage memory 75. The CPU also creates, according to the level of the audio data supplied from the audio input I/F 61, electronic mark data describing electronic marks that indicate features of the audio, and supplies the electronic mark data to the optical disc drive I/F 64.

Furthermore, the CPU supplies the audio data in the material data or proxy data supplied from the temporary storage memory I/F 63 to the audio output I/F 66 via the system bus 73, and has the sound corresponding to that audio data output from the speaker 78.

The CPU also supplies the video data in the material data or proxy data supplied from the temporary storage memory I/F 63 to the video display I/F 68 via the system bus 73, and has the video corresponding to that video data displayed on the display device 79. The RAM stores, as appropriate, the programs executed by the CPU and related data.

A temporary storage memory 75, such as a buffer, is connected to the temporary storage memory I/F 63. The temporary storage memory I/F 63 stores in the temporary storage memory 75 the material data consisting of the video data from the video input I/F 60 and the audio data from the audio input I/F 61. The temporary storage memory I/F 63 also stores the proxy data supplied from the microcomputer 62 in the temporary storage memory 75.

Furthermore, the temporary storage memory I/F 63 reads out the proxy data and the material data, consisting of the video data from the video input I/F 60 and the audio data from the audio input I/F 61, that are stored in the temporary storage memory 75. The temporary storage memory I/F 63 then supplies the material data and proxy data to the optical disc drive I/F 64 via the system bus 73 and has them recorded on the optical disc 31.

The temporary storage memory I/F 63 also stores in the temporary storage memory 75 the material data or proxy data of a clip (described in detail later) supplied from the optical disc drive I/F 64. In addition, the temporary storage memory I/F 63 reads out the material data or proxy data that was supplied from the optical disc drive I/F 64 and stored in the temporary storage memory 75, and supplies it to the microcomputer 62 via the system bus 73.

A clip is, for example, the collection of material data, metadata, proxy data, and so on obtained by a single shooting process (the shooting process from the start of shooting to the end of shooting).

An optical disc drive 76, into which the optical disc 31 is loaded, is connected to the optical disc drive I/F 64. The optical disc drive I/F 64 controls the optical disc drive 76 to read out the material data or proxy data of a clip and supplies it to the temporary storage memory I/F 63 via the system bus 73.

The optical disc drive I/F 64 also controls the optical disc drive 76 to record the material data, proxy data, and so on from the temporary storage memory I/F 63 on the optical disc 31 in units of clips. Furthermore, the optical disc drive I/F 64 controls the optical disc drive 76 to record the electronic mark data from the microcomputer 62 on the optical disc 31.

An operation unit 77, such as operation buttons or a receiving unit that receives commands transmitted from a remote controller, is connected to the operation unit I/F 65. In response to the user's operation of the operation unit 77, the operation unit I/F 65 generates an operation signal representing that operation and supplies it to the microcomputer 62 via the system bus 73.

A speaker 78 is connected to the audio output I/F 66. The audio output I/F 66 performs D/A (Digital/Analog) conversion on the audio data supplied from the microcomputer 62, amplifies the resulting analog signal, and supplies it to the speaker 78. The speaker 78 outputs sound to the outside based on the analog signal from the audio output I/F 66. Alternatively, the audio output I/F 66 may supply the audio data to the speaker 78 as it is, and the speaker 78 may perform the D/A conversion and so on and output sound to the outside based on the resulting analog signal.

The serial data I/F 67 exchanges data, as necessary, with digital equipment such as an external computer (not shown). A display device 79 is connected to the video display I/F 68. The video display I/F 68 performs D/A conversion on the video data from the video input I/F 60 or the microcomputer 62, amplifies the resulting analog signal, such as a composite signal or component signal, and supplies it to the display device 79. The display device 79 displays video based on the analog signal from the video display I/F 68.

Alternatively, the video display I/F 68 may supply the video data to the display device 79 as it is, and the display device 79 may perform the D/A conversion and so on and output the video based on the resulting analog signal.

The memory card I/F 69 reads and writes material data, various setting data, and the like on a memory card (not shown) that is loaded into the video camera 21 as necessary. The network I/F 70 exchanges data, as necessary, with other devices connected via a wired or wireless network such as the Internet or a local area network.

For example, the network I/F 70 acquires a program from another device via the network and has it recorded on the hard disk 81 via the system bus 73, the hard disk drive I/F 71, and the hard disk drive 80.

A hard disk drive 80, in which the hard disk 81 is mounted, is connected to the hard disk drive I/F 71. The hard disk drive I/F 71 controls the hard disk drive 80 to read and write data on the hard disk 81. For example, the hard disk drive I/F 71 controls the hard disk drive 80 to record on the hard disk 81 a program supplied via the network I/F 70 and the system bus 73.

A drive 82 is connected to the drive I/F 72. The drive I/F 72 controls the drive 82 and, when a removable medium 51 such as a magnetic disk, an optical disc, a magneto-optical disc, or a semiconductor memory is loaded into the drive 82, drives the medium and acquires the programs, data, and so on recorded on it. The acquired programs and data are transferred to and recorded on the hard disk 81 via the hard disk drive I/F 71 and the like, as necessary.

The system bus 73 mediates the exchange of data among the units connected to it.

The video camera 22 of FIG. 1 is configured in the same way as the video camera 21 of FIG. 2, except that no microphone is connected to the video camera 22 and no audio signal is input from a microphone. That is, the video camera 22 shoots only the video of the television material. The video camera 22 is therefore the same as the video camera 21 apart from the audio-related parts, and its description is omitted below.

Next, in the video camera 21 of FIG. 2, the microcomputer 62 executes a predetermined program and thereby functions as a shooting processing unit that shoots video with audio of the television material.

図３は、そのような撮影処理部の機能的な構成例を示している。 FIG. 3 shows a functional configuration example of such a photographing processing unit.

図３の撮影処理部９０は、制御部９１、判定部９２、および作成部９３により構成される。 The photographing processing unit 90 of FIG. 3 includes a control unit 91, a determination unit 92, and a creation unit 93.

制御部９１は、撮影に関する各種の制御を行う。例えば、制御部９１は、操作部I/F６５から供給される、撮影の開始を指令するための操作を表す操作信号に応じて、映像入力I/F６０と音声入力I/F６１を制御し、素材データの取得を開始する。また、制御部９１は、取得した素材データのうちの音声データを判定部９２に供給する。 The control unit 91 performs various types of control related to shooting. For example, in response to an operation signal that is supplied from the operation unit I/F 65 and represents an operation commanding the start of shooting, the control unit 91 controls the video input I/F 60 and the audio input I/F 61 and starts acquiring material data. The control unit 91 also supplies the audio data of the acquired material data to the determination unit 92.

また、制御部９１は、取得した素材データを用いてプロキシデータを作成する。さらに、制御部９１は、素材データとプロキシデータを一時記憶メモリI/F６３に供給して、一時記憶メモリ７５に記憶させる。 Further, the control unit 91 creates proxy data using the acquired material data. Further, the control unit 91 supplies the material data and proxy data to the temporary storage memory I / F 63 and stores them in the temporary storage memory 75.

判定部９２は、制御部９１から供給される音声データのレベルに応じて、その音声データが、テレビジョン素材における未定の発言者の発言の開始時の音声データであるか、および、発言の終了時の音声データであるかを判定する。判定部９２は、その判定の結果に基づいて、未定の発言者の発言の開始時または終了時の音声データに対応するフレーム番号を、作成部９３に供給する。 Based on the level of the audio data supplied from the control unit 91, the determination unit 92 determines whether that audio data is the audio data at the start of an utterance by an undetermined speaker in the television material, and whether it is the audio data at the end of the utterance. Based on the result of the determination, the determination unit 92 supplies the frame number corresponding to the audio data at the start or end of the undetermined speaker's utterance to the creation unit 93.

作成部９３は、判定部９２から供給される未定の発言者の発言の開始時または終了時の音声データに対応するフレーム番号に基づいて、そのフレーム番号のフレームに付与する、未定の発言者の発言の開始位置または終了位置を音声の特徴として示す電子マークを作成する。作成部９３は、その電子マークを記述した電子マークデータを、光ディスクドライブI/F６４に供給して、光ディスク３１に記録させる。 Based on the frame number, supplied from the determination unit 92, that corresponds to the audio data at the start or end of the undetermined speaker's utterance, the creation unit 93 creates an electronic mark to be assigned to the frame with that frame number. The electronic mark indicates, as a feature of the audio, the start position or end position of the undetermined speaker's utterance. The creation unit 93 supplies electronic mark data describing the electronic mark to the optical disc drive I/F 64, which records it on the optical disc 31.

図４は、図１の光ディスク３１に記録されているファイルのディレクトリ構造の例を示している。 FIG. 4 shows an example of a directory structure of files recorded on the optical disc 31 of FIG.

図４において、シンボル９５は、１つのディレクトリを表している。なお、符号は付していないが、シンボル（ディレクトリ）９５と同一のその他のシンボルも、１つのディレクトリを表している。また、シンボル９６は、１つのファイルを示している。なお、符号は付していないが、シンボル（ファイル）９６と同一のその他のシンボルも、１つのファイルを示している。 In FIG. 4, a symbol 95 represents one directory. Although they carry no reference numerals, the other symbols identical to the symbol (directory) 95 also each represent one directory. A symbol 96 represents one file. Likewise, although they carry no reference numerals, the other symbols identical to the symbol (file) 96 also each represent one file.

なお、以下、特に断りの無い限り、ディレクトリとディレクトリのシンボルとは同一であるとみなして説明する。同様に、ファイルとファイルのシンボルとは同一であるとみなして説明する。また、各ディレクトリのそれぞれ、および、各ファイルのそれぞれの識別を容易なものとするために、以下、ファイルまたはディレクトリの後方に括弧（）書きでその名称を記載する。 In the following description, a directory and a directory symbol are assumed to be the same unless otherwise specified. Similarly, the description will be made assuming that the file and the file symbol are the same. Further, in order to facilitate identification of each directory and each file, the name is written in parentheses () after the file or directory.

図４の例では、光ディスク３１には、目次を記述するデータのファイルであり、クリップを管理するための情報を記述するインデックスファイル(INDEX.XML)９６と、光ディスク３１の代表画のパス、光ディスク３１のタイトルやコメントなどから構成されるディスクメタデータのファイルであるディスクメタファイル（DISCMETA.XML）とが設けられている。 In the example of FIG. 4, the optical disc 31 holds an index file (INDEX.XML) 96, which is a file of data describing a table of contents and describes information for managing clips, and a disc meta file (DISCMETA.XML), which is a file of disc metadata consisting of the path of a representative image of the optical disc 31, the title of the optical disc 31, comments, and the like.

また、光ディスク３１には、クリップの素材データとメタデータのファイルが下位に設けられるクリップディレクトリ（Clip）９５と、クリップのプロキシデータのファイルが下位に設けられるプロキシディレクトリ（Sub）が設けられている。 In addition, the optical disc 31 is provided with a clip directory (Clip) 95 in which clip material data and metadata files are provided in a lower level, and a proxy directory (Sub) in which clip proxy data files are provided in a lower level. .

クリップディレクトリ(Clip)９５には、光ディスク３１に記録されているクリップのうちの素材データとメタデータが、それぞれ、クリップ毎に異なるファイルとして記録される。 In the clip directory (Clip) 95, material data and metadata of clips recorded on the optical disc 31 are recorded as different files for each clip.

具体的には、例えば、図４は、光ディスク３１に３つのクリップのデータが記録されている場合の例を示している。 Specifically, for example, FIG. 4 shows an example in which data of three clips is recorded on the optical disc 31.

即ち、例えば、クリップディレクトリ９５の下位には、光ディスク３１に記録された最初のクリップの素材データのファイルである第１のクリップファイル（C0001.MXF）と、このクリップの素材データに対応する、リアルタイム性を要求されない電子マークデータなどのメタデータ（以下、ノンリアルタイムメタデータ（NRTデータ）という）を含むファイルである第１のNRTファイル（C0001M01.XML）とが設けられている。 That is, for example, below the clip directory 95 there are provided a first clip file (C0001.MXF), which is the file of the material data of the first clip recorded on the optical disc 31, and a first NRT file (C0001M01.XML), which is a file containing metadata that corresponds to the material data of this clip and does not need to be handled in real time, such as electronic mark data (hereinafter referred to as non-real-time metadata (NRT data)).

また、クリップディレクトリ９５の下位には、第１のクリップファイル（C0001.MXF）および第１のNRTファイル（C0001M01.XML）と同様に、第２のクリップファイル（C0002.MXF）および第２のNRTファイル（C0002M01.XML）、並びに、第３のクリップファイル（C0003.MXF）および第３のNRTファイル（C0003M01.XML）が設けられている。 Similarly to the first clip file (C0001.MXF) and the first NRT file (C0001M01.XML), the second clip file (C0002.MXF) and the second NRT are located below the clip directory 95. A file (C0002M01.XML), a third clip file (C0003.MXF), and a third NRT file (C0003M01.XML) are provided.

また、図４において、このようなクリップディレクトリ（Clip）の下方に示される、プロキシディレクトリ（Sub）には、光ディスク３１に記録されているクリップのプロキシデータが、クリップ毎に異なるファイルとして記録されている。 Also, in the proxy directory (Sub) shown below the clip directory (Clip) in FIG. 4, the proxy data of the clips recorded on the optical disc 31 is recorded as a separate file for each clip.

例えば、図４の例の場合、プロキシディレクトリ（Sub）の下位には、光ディスク３１に記録された最初のクリップのプロキシデータのファイルである第１のプロキシファイル（C0001S01.MXF）、第２のクリップのプロキシデータのファイルである第２のプロキシファイル（C0002S01.MXF）、および第３のクリップのプロキシデータのファイルである第３のプロキシファイル（C0003S01.MXF）が設けられる。 For example, in the case of the example of FIG. 4, a first proxy file (C0001S01.MXF), which is a proxy data file of the first clip recorded on the optical disc 31, and a second clip are located below the proxy directory (Sub). A second proxy file (C0002S01.MXF), which is a proxy data file, and a third proxy file (C0003S01.MXF), which is a third clip proxy data file, are provided.

さらに、光ディスク３１には、クリップ以外のデータのファイルが設けられる一般ディレクトリ（General）が設けられている。 Further, the optical disk 31 is provided with a general directory (General) in which data files other than clips are provided.
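
The file-naming pattern visible in the FIG. 4 example (C0001.MXF, C0001M01.XML, C0001S01.MXF) can be sketched as a small helper. This is only an illustration: the function name clip_paths is hypothetical, and the assumption that the four-digit number simply counts clips in recording order is inferred from the three clips shown, not stated in the specification.

```python
from pathlib import PurePosixPath


def clip_paths(clip_number: int) -> dict[str, PurePosixPath]:
    """Return the three per-clip files named in the FIG. 4 example.

    Assumption: the numeric part is a zero-padded, 1-based clip counter.
    """
    stem = f"C{clip_number:04d}"
    return {
        # Material data (video and audio) of the clip
        "clip": PurePosixPath("Clip") / f"{stem}.MXF",
        # Non-real-time metadata, including the electronic mark data
        "nrt": PurePosixPath("Clip") / f"{stem}M01.XML",
        # Proxy data of the clip
        "proxy": PurePosixPath("Sub") / f"{stem}S01.MXF",
    }


if __name__ == "__main__":
    for number in (1, 2, 3):
        print({kind: str(path) for kind, path in clip_paths(number).items()})
```

For the three clips of FIG. 4 this reproduces the names listed above, for example Clip/C0002M01.XML for the second NRT file.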

図５は、図４のクリップファイルのフォーマットの例を示している。 FIG. 5 shows an example of the format of the clip file of FIG.

図５Ａに示すように、クリップファイルは、素材データが１クリップ分まとめてボディに配置され、さらに、そのボディにヘッダとフッタが付加されることにより構成される。 As shown in FIG. 5A, the clip file is configured by arranging material data for one clip together in a body and adding a header and a footer to the body.

ヘッダには、その先頭から、ヘッダパーティションパック(Header Partition Pack)、ヘッダメタデータ(Header Metadata)、インデックステーブル(Index Table)が順次配置される。ヘッダパーティションパックには、ファイルフォーマット(例えば、MXF(Material exchange Format))を表すデータであるパーティションメタデータ、ボディの長さ、ボディの開始位置、ボディに配置されるデータの形式を表すデータなどが配置される。ヘッダメタデータには、例えば、UMID（Unique Material Identifier）、先頭タイムコード、ファイルの作成日、ボディに配置されたデータに関する情報(例えば、映像の画素数、アスペクト比など)などが配置される。 In the header, a header partition pack, header metadata, and an index table are arranged in that order from the top. The header partition pack holds partition metadata, which is data representing the file format (for example, MXF (Material Exchange Format)), the length of the body, the start position of the body, data representing the format of the data placed in the body, and the like. The header metadata holds, for example, a UMID (Unique Material Identifier), the head time code, the file creation date, and information on the data placed in the body (for example, the number of pixels of the video and the aspect ratio).

なお、UMIDとは、各ファイルをグローバルユニークに識別するためのファイル固有の識別子であって、SMPTE（Society of Motion Picture and Television Engineers）により定められる識別子を指す。 The UMID is a file-specific identifier for uniquely identifying each file, and refers to an identifier determined by SMPTE (Society of Motion Picture and Television Engineers).

インデックステーブルには、ボディに配置されたデータを管理するためのデータなどが配置される。フッタは、フッタパーティションパック(Footer Partition Pack)により構成され、フッタパーティションパックには、フッタを特定するためのデータなどが配置される。 In the index table, data for managing data arranged in the body is arranged. The footer is composed of a footer partition pack, and data for specifying the footer is arranged in the footer partition pack.

図５Ｂに示すように、クリップファイルのボディには、１フレーム分のリアルタイム性を要求されるメタデータ(以下、リアルタイムメタデータという)が配置されるシステムアイテム、D10と呼ばれるMPEG(Moving Picture Experts Group) IMX方式で符号化された映像データ、および、AES(Audio Engineering Society)3形式の非圧縮の音声データが、KLV(Key,Length,Value)構造にKLVコーディングされて配置される。 As shown in FIG. 5B, the body of the clip file contains a system item holding one frame's worth of metadata that must be handled in real time (hereinafter referred to as real-time metadata), video data encoded in the MPEG (Moving Picture Experts Group) IMX format known as D10, and uncompressed audio data in AES (Audio Engineering Society) 3 format, each KLV-coded into a KLV (Key, Length, Value) structure.

KLV構造とは、その先頭から、キー(Key)、レングス(Length)、バリュー(Value)が順次配置された構造であり、キーには、バリューに配置されるデータがどのようなデータであるかを表す、SMPTE 298Mの規格に準拠した16バイトのラベルが配置される。レングスには、バリューに配置されるデータのデータ長が配置される。バリューには、実データ、即ち、ここでは、システムアイテム、映像データ、または音声データが配置される。 The KLV structure is a structure in which a key, a length, and a value are arranged in that order from the top. The key holds a 16-byte label, compliant with the SMPTE 298M standard, that indicates what kind of data is placed in the value. The length holds the data length of the data placed in the value. The value holds the actual data, which here is a system item, video data, or audio data.
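
The key, length, value layout described above can be illustrated with a minimal reader. This is a sketch under stated assumptions, not the parser used by the apparatus: the specification says only that the key is a 16-byte label and that the length field holds the size of the value. The BER-style length encoding used here (one byte for lengths below 128, otherwise a byte 0x80+n followed by n big-endian length bytes) is assumed from common MXF practice, and the function name read_klv is hypothetical.

```python
def read_klv(buf: bytes, offset: int = 0) -> tuple[bytes, bytes, int]:
    """Read one KLV triplet from buf starting at offset.

    Returns (key, value, offset_of_next_triplet).
    Assumption: the length field uses BER short or long form.
    """
    key = buf[offset:offset + 16]
    if len(key) != 16:
        raise ValueError("truncated key")
    pos = offset + 16
    if pos >= len(buf):
        raise ValueError("missing length field")

    first = buf[pos]
    pos += 1
    if first < 0x80:
        # Short form: the byte itself is the length.
        length = first
    else:
        # Long form: the low 7 bits give the number of length bytes.
        n = first & 0x7F
        if pos + n > len(buf):
            raise ValueError("truncated length field")
        length = int.from_bytes(buf[pos:pos + n], "big")
        pos += n

    value = buf[pos:pos + length]
    if len(value) != length:
        raise ValueError("truncated value")
    return key, value, pos + length


if __name__ == "__main__":
    # A made-up 16-byte key, a 3-byte long-form length of 5, and a 5-byte value.
    packet = bytes(range(16)) + bytes([0x83, 0x00, 0x00, 0x05]) + b"hello"
    key, value, nxt = read_klv(packet)
    print(key.hex(), value, nxt)
```

Calling read_klv repeatedly with the returned offset walks through consecutive triplets such as the system item, video data, and audio data of one frame.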

また、KLVコーディングされたシステムアイテム、映像データ、および音声データは、そのデータ長が、KAG(KLV Alignment Grid)を基準とする固定長となっている。そして、KLVコーディングされたシステムアイテム、映像データ、および音声データを固定長とするのに、スタッフィング(stuffing)のためのデータとしてのフィラー(Filler)が、やはりKLV構造とされて、KLVコーディングされたシステムアイテム、映像データ、および音声データのそれぞれの後に配置される。 The KLV-coded system item, video data, and audio data each have a fixed data length based on the KAG (KLV Alignment Grid). To give them this fixed length, a filler serving as stuffing data, itself in KLV structure, is placed after each of the KLV-coded system item, video data, and audio data.
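
How much filler is needed to reach the next grid boundary can be sketched as follows. This is an illustrative calculation only. The specification does not give the KAG size or the minimum size of a filler, so both are parameters here; the default minimum of 20 bytes (a 16-byte key plus an assumed 4-byte length field and an empty value) and the function name filler_size are assumptions.

```python
def filler_size(offset: int, kag: int, min_fill: int = 20) -> int:
    """Return the total size in bytes of the filler KLV needed so that
    the data ending at offset is followed by a KAG-aligned position.

    Assumption: a filler cannot be smaller than min_fill bytes, because
    it is itself a KLV triplet with its own key and length field.
    """
    gap = (-offset) % kag  # bytes remaining up to the next grid boundary
    if gap == 0:
        return 0           # already aligned, no filler required
    while gap < min_fill:
        gap += kag         # too small for a filler, so skip to the next boundary
    return gap


if __name__ == "__main__":
    for end in (512, 500, 100):
        print(end, filler_size(end, kag=512))
```

With a 512-byte grid, data ending at byte 100 needs a 412-byte filler, while data ending at byte 500 leaves only 12 bytes and therefore needs 524 bytes under the assumed minimum.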

図６は、未定の発言者の発言の開始位置または終了位置を示す電子マークを記述した電子マークデータの例を示している。 FIG. 6 shows an example of electronic mark data in which an electronic mark indicating a start position or an end position of an undetermined speaker is described.

なお、図６の例では、電子マークデータは、XML（Extensible Markup Language）で記述されている。また、図６において、各行頭の数字は、説明の便宜上付加したものであり、XML記述の一部ではない。これらのことは、後述する図１３、図１８、図２０、図３３、および図３４においても同様である。 In the example of FIG. 6, the electronic mark data is described in XML (Extensible Markup Language). In FIG. 6, the number at the beginning of each line is added for convenience of explanation and is not part of the XML description. The same applies to FIG. 13, FIG. 18, FIG. 20, FIG. 33, and FIG. 34, which are described later.

図６に示すように、電子マークデータのXML記述は、主に電子マークテーブル（<EssenceMark Table> </EssenceMark Table>）で囲まれる電子マークテーブル部により構成される。図６の例では、この電子マークテーブル部は、２乃至１１行目に記述されている。 As shown in FIG. 6, the XML description of the electronic mark data consists mainly of an electronic mark table section enclosed by the electronic mark table tags (<EssenceMark Table> </EssenceMark Table>). In the example of FIG. 6, this electronic mark table section is described in the 2nd to 11th lines.

なお、２行目の「targetMedia="Original-Material"」の記述は、この電子マークデータが、クリップの素材データに付与される電子マークを記述した電子マークデータであることを示している。 Note that the description targetMedia="Original-Material" on the 2nd line indicates that this electronic mark data describes electronic marks assigned to the material data of the clip.

また、詳細には、電子マークテーブル部には、クリップの素材データに付与される全ての電子マークの情報がリスト化されてまとめて記述される。図６の例では、EssenceMark要素は、各電子マークに対応しており、value属性において電子マークが示す特徴を示し、frameCount属性において電子マークが付与される付与位置の、クリップの先頭からのフレーム数を示している。 More specifically, the electronic mark table section lists the information on all the electronic marks assigned to the material data of the clip. In the example of FIG. 6, each EssenceMark element corresponds to one electronic mark. Its value attribute gives the feature indicated by the electronic mark, and its frameCount attribute gives the position at which the electronic mark is assigned, expressed as the number of frames from the beginning of the clip.

例えば、図６の３行目の「EssenceMark value="Speaker-X:start"frameCount="0"」の記述は、この電子マークが示す特徴が未定の発言者による発言の開始位置であり、付与位置がクリップの先頭から0フレーム目であることを示している。 For example, the description EssenceMark value="Speaker-X:start" frameCount="0" on the 3rd line of FIG. 6 indicates that the feature indicated by this electronic mark is the start position of an utterance by an undetermined speaker, and that the mark is assigned at the 0th frame from the beginning of the clip.

また、図６の４行目の「EssenceMark value="Speaker-X:end"frameCount="564"」の記述は、この電子マークが示す特徴が未定の発言者による発言の終了位置であり、付与位置がクリップの先頭から564フレーム目であることを示している。 The description EssenceMark value="Speaker-X:end" frameCount="564" on the 4th line of FIG. 6 indicates that the feature indicated by this electronic mark is the end position of an utterance by an undetermined speaker, and that the mark is assigned at the 564th frame from the beginning of the clip.

同様に、５行目の「EssenceMark value="Speaker-X:start"frameCount="564"」、７行目の「EssenceMark value="Speaker-X:start"frameCount="924"」、９行目の「EssenceMark value="Speaker-X:start"frameCount="1804"」の記述は、この電子マークが示す特徴が未定の発言者による発言の開始位置であり、付与位置が、それぞれ、クリップの先頭から564フレーム目、924フレーム目、1804フレーム目であることを示している。 Similarly, the descriptions EssenceMark value="Speaker-X:start" frameCount="564" on the 5th line, EssenceMark value="Speaker-X:start" frameCount="924" on the 7th line, and EssenceMark value="Speaker-X:start" frameCount="1804" on the 9th line indicate that the feature indicated by each electronic mark is the start position of an utterance by an undetermined speaker, and that the marks are assigned at the 564th, 924th, and 1804th frames from the beginning of the clip, respectively.

また、６行目の「EssenceMark value="Speaker-X:end"frameCount="924"」、８行目の「EssenceMark value="Speaker-X:end"frameCount="1804"」、１０行目の「EssenceMark value="Speaker-X:end"frameCount="2100"」の記述は、この電子マークが示す特徴が未定の発言者による発言の終了位置であり、付与位置が、それぞれ、クリップの先頭から924フレーム目、1804フレーム目、2100フレーム目であることを示している。 The descriptions EssenceMark value="Speaker-X:end" frameCount="924" on the 6th line, EssenceMark value="Speaker-X:end" frameCount="1804" on the 8th line, and EssenceMark value="Speaker-X:end" frameCount="2100" on the 10th line indicate that the feature indicated by each electronic mark is the end position of an utterance by an undetermined speaker, and that the marks are assigned at the 924th, 1804th, and 2100th frames from the beginning of the clip, respectively.
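
The marks of FIG. 6 can be read back into utterance sections with a few lines of code. The following is a minimal sketch, not the apparatus's own reader. The sample document is rebuilt from the attribute values quoted above; the table element is written as EssenceMarkTable without a space and without a namespace, which is an assumption made so that the sample is well-formed XML, and the function name speech_sections is hypothetical.

```python
import xml.etree.ElementTree as ET

# Sample rebuilt from the values described for FIG. 6 (element naming assumed).
SAMPLE = """<EssenceMarkTable targetMedia="Original-Material">
  <EssenceMark value="Speaker-X:start" frameCount="0"/>
  <EssenceMark value="Speaker-X:end" frameCount="564"/>
  <EssenceMark value="Speaker-X:start" frameCount="564"/>
  <EssenceMark value="Speaker-X:end" frameCount="924"/>
  <EssenceMark value="Speaker-X:start" frameCount="924"/>
  <EssenceMark value="Speaker-X:end" frameCount="1804"/>
  <EssenceMark value="Speaker-X:start" frameCount="1804"/>
  <EssenceMark value="Speaker-X:end" frameCount="2100"/>
</EssenceMarkTable>"""


def speech_sections(xml_text: str) -> list[tuple[str, int, int]]:
    """Pair each start mark with the following end mark of the same label.

    Returns a list of (label, start_frame, end_frame) tuples.
    """
    root = ET.fromstring(xml_text)
    sections = []
    pending = None  # (label, start_frame) of a start mark awaiting its end
    for mark in root.iter("EssenceMark"):
        label, _, kind = mark.get("value", "").rpartition(":")
        frame = int(mark.get("frameCount"))
        if kind == "start":
            pending = (label, frame)
        elif kind == "end" and pending is not None and pending[0] == label:
            sections.append((label, pending[1], frame))
            pending = None
    return sections


if __name__ == "__main__":
    for section in speech_sections(SAMPLE):
        print(section)
```

For the FIG. 6 values this yields four sections for the undetermined speaker: frames 0 to 564, 564 to 924, 924 to 1804, and 1804 to 2100.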

次に、図７を参照して、ユーザがビデオカメラ２１を用いて行う撮影作業について説明する。 Next, with reference to FIG. 7, a photographing operation performed by the user using the video camera 21 will be described.

図７の表では、撮影作業の各ステップの番号に対応付けて、そのステップにおける撮影作業の内容、ビデオカメラ２１による主な処理の内容、および、その処理の対象となるデータが記述されている。 In the table of FIG. 7, the contents of the photographing work at that step, the contents of the main processing by the video camera 21 and the data to be processed are described in association with the number of each step of the photographing work. .

図７に示すように、ステップＳ１１において、ユーザは、操作部７７を操作して、撮影の開始を指令する。このとき、ビデオカメラ２１の制御部９１は、クリップのNRTファイル（図４）を光ディスク３１に作成する。また、制御部９１は、クリップファイルを光ディスク３１に作成する。さらに、制御部９１は、映像入力I/F６０と音声入力I/F６１から供給される素材データのクリップファイルへの記録を開始するとともに、その素材データのうちの音声データの判定部９２への供給を開始する。 As shown in FIG. 7, in step S11, the user operates the operation unit 77 to command the start of shooting. At this time, the control unit 91 of the video camera 21 creates the NRT file of the clip (FIG. 4) on the optical disc 31. The control unit 91 also creates a clip file on the optical disc 31. Further, the control unit 91 starts recording the material data supplied from the video input I/F 60 and the audio input I/F 61 in the clip file, and starts supplying the audio data of the material data to the determination unit 92.

また、判定部９２は、制御部９１から供給される音声データの所定の閾値以上のレベルが所定時間以上連続したことを検出する。そして、判定部９２は、音声データの所定の閾値以上のレベルが所定時間以上連続したとき、その音声データが、テレビジョン素材における未定の発言者の発言の開始時の音声データであると判定し、その連続区間の開始地点の音声データに対応するフレーム番号を作成部９３に供給する。 Further, the determination unit 92 detects that the level of the audio data supplied from the control unit 91 is equal to or higher than a predetermined threshold for a predetermined time or more. Then, when the level equal to or higher than the predetermined threshold of the audio data continues for a predetermined time or more, the determination unit 92 determines that the audio data is the audio data at the start time of an undetermined speaker in the television material. The frame number corresponding to the audio data at the start point of the continuous section is supplied to the creation unit 93.

作成部９３は、判定部９２から供給される未定の発言者の発言の開始時の音声データに対応するフレーム番号に基づいて、そのフレーム番号のフレームに付与する未定の発言者の発言の開始位置を音声の特徴として示す電子マーク（以下、発言者未定EM(start)という）を作成する。そして、作成部９３は、その発言者未定EM(start)を、クリップのNRTファイルの電子マークデータに記述する。 Based on the frame number, supplied from the determination unit 92, that corresponds to the audio data at the start of the undetermined speaker's utterance, the creation unit 93 creates an electronic mark to be assigned to the frame with that frame number. This electronic mark indicates, as a feature of the audio, the start position of the undetermined speaker's utterance (hereinafter referred to as the speaker-undetermined EM (start)). The creation unit 93 then describes the speaker-undetermined EM (start) in the electronic mark data of the NRT file of the clip.

また、判定部９２は、音声データの所定の閾値未満のレベルが所定時間以上連続したことを検出する。そして、判定部９２は、音声データの所定の閾値未満のレベルが所定時間以上連続したとき、その音声データが、テレビジョン素材における未定の発言者の発言の終了時の音声データであると判定し、その連続区間の開始地点の音声データに対応するフレーム番号を作成部９３に供給する。 In addition, the determination unit 92 detects that a level of audio data that is less than a predetermined threshold value continues for a predetermined time or more. Then, the determination unit 92 determines that the audio data is the audio data at the end of the utterance of the undetermined speaker in the television material when the level of the audio data below the predetermined threshold continues for a predetermined time or more. The frame number corresponding to the audio data at the start point of the continuous section is supplied to the creation unit 93.

作成部９３は、判定部９２から供給される未定の発言者の発言の終了時の音声データに対応するフレーム番号に基づいて、そのフレーム番号のフレームに付与する未定の発言者の発言の終了位置を音声の特徴として示す電子マーク（以下、発言者未定EM(end)という）を作成する。そして、作成部９３は、その発言者未定EM(end)を、クリップのNRTファイルの電子マークデータに記述する。 Based on the frame number, supplied from the determination unit 92, that corresponds to the audio data at the end of the undetermined speaker's utterance, the creation unit 93 creates an electronic mark to be assigned to the frame with that frame number. This electronic mark indicates, as a feature of the audio, the end position of the undetermined speaker's utterance (hereinafter referred to as the speaker-undetermined EM (end)). The creation unit 93 then describes the speaker-undetermined EM (end) in the electronic mark data of the NRT file of the clip.

ステップＳ１２において、ユーザは、操作部７７を操作して撮影の終了を指令する。このとき、制御部９１は、素材データのクリップファイルへの記録を終了するとともに、その素材データのうちの音声データの判定部９２への供給を終了する。 In step S 12, the user operates the operation unit 77 to command the end of shooting. At this time, the control unit 91 ends the recording of the material data in the clip file, and ends the supply of the audio data of the material data to the determination unit 92.

次に、図８のフローチャートを参照して、図３の撮影処理部９０による撮影処理について説明する。この撮影処理は、例えば、ユーザが操作部７７を操作することにより、撮影の開始を指令したとき開始される。 Next, imaging processing by the imaging processing unit 90 in FIG. 3 will be described with reference to the flowchart in FIG. This photographing process is started, for example, when the user commands the start of photographing by operating the operation unit 77.

ステップＳ３１において、撮影処理部９０の制御部９１は、クリップのNRTファイルを光ディスク３１に作成する。ステップＳ３２において、制御部９１は、クリップファイルを光ディスク３１に作成する。ステップＳ３３において、制御部９１は、映像入力I/F６０と音声入力I/F６１から供給される素材データのクリップファイルへの記録を開始する。また、制御部９１は、その素材データのうちの音声データの判定部９２への供給を開始する。 In step S 31, the control unit 91 of the imaging processing unit 90 creates an NRT file of a clip on the optical disc 31. In step S 32, the control unit 91 creates a clip file on the optical disc 31. In step S33, the control unit 91 starts recording the material data supplied from the video input I / F 60 and the audio input I / F 61 in the clip file. Further, the control unit 91 starts supplying the audio data of the material data to the determination unit 92.

ステップＳ３４において、判定部９２は、制御部９１から供給される音声データの閾値以上のレベルが所定時間以上連続したか、即ち、音声データのレベルが所定時間以上の間閾値以上であるかを判定する。ステップＳ３４で音声データの閾値以上のレベルが所定時間以上連続していないと判定された場合、判定部９２は、音声データの閾値以上のレベルが所定時間以上連続するまで待機する。 In step S34, the determination unit 92 determines whether the level of the audio data supplied from the control unit 91 has remained at or above the threshold for a predetermined time or more. If it is determined in step S34 that the level has not remained at or above the threshold for the predetermined time or more, the determination unit 92 waits until it does.

ステップＳ３４で音声データの閾値以上のレベルが所定時間以上連続したと判定された場合、判定部９２は、その音声データが、テレビジョン素材における未定の発言者の発言の開始時の音声データであると判定し、その連続区間の開始地点の音声データに対応するフレーム番号を作成部９３に供給する。 When it is determined in step S34 that the level equal to or higher than the threshold value of the audio data has continued for a predetermined time or more, the determination unit 92 is the audio data at the start of the utterance of an undetermined speaker in the television material. And the frame number corresponding to the audio data at the start point of the continuous section is supplied to the creation unit 93.

そして、ステップＳ３５において、作成部９３は、判定部９２から供給される未定の発言者の発言の開始時の音声データに対応するフレーム番号に基づいて、そのフレーム番号のフレームに付与する発言者未定EM(start)を作成し、その発言者未定EM(start)をクリップのNRTファイルの電子マークデータに記述する。 Then, in step S35, based on the frame number, supplied from the determination unit 92, that corresponds to the audio data at the start of the undetermined speaker's utterance, the creation unit 93 creates a speaker-undetermined EM (start) to be assigned to the frame with that frame number, and describes it in the electronic mark data of the NRT file of the clip.

ステップＳ３６において、判定部９２は、制御部９１から供給される音声データの閾値未満のレベルが所定時間以上連続したか、即ち音声データのレベルが所定時間以上の間閾値未満であるかを判定する。ステップＳ３６で、音声データの閾値未満のレベルが所定時間以上連続していないと判定された場合、判定部９２は、音声データの閾値未満のレベルが所定時間以上連続するまで待機する。 In step S36, the determination unit 92 determines whether the level of the audio data supplied from the control unit 91 is less than the threshold for a predetermined time or more, that is, whether the level of the audio data is less than the threshold for a predetermined time or more. . If it is determined in step S36 that the level less than the threshold value of the audio data is not continuous for the predetermined time or more, the determination unit 92 waits until the level less than the threshold value of the audio data continues for the predetermined time or more.

一方、ステップＳ３６で音声データの閾値未満のレベルが所定時間以上連続したと判定された場合、判定部９２は、その音声データが、テレビジョン素材における未定の発言者の発言の終了時の音声データであると判定し、その連続区間の開始地点の音声データに対応するフレーム番号を作成部９３に供給する。 On the other hand, when it is determined in step S36 that the level below the threshold of the audio data has continued for a predetermined time or more, the determination unit 92 determines that the audio data is the audio data at the end of the utterance of the undetermined speaker in the television material. And the frame number corresponding to the audio data at the start point of the continuous section is supplied to the creation unit 93.

そして、ステップＳ３７において、作成部９３は、判定部９２から供給される未定の発言者の発言の終了時に対応するフレーム番号に基づいて、そのフレーム番号のフレームに付与する発言者未定EM(end)を作成し、その発言者未定EM(end)をクリップのNRTファイルの電子マークデータに記述する。 Then, in step S37, based on the frame number, supplied from the determination unit 92, that corresponds to the end of the undetermined speaker's utterance, the creation unit 93 creates a speaker-undetermined EM (end) to be assigned to the frame with that frame number, and describes it in the electronic mark data of the NRT file of the clip.

ステップＳ３８において、制御部９１は、操作部７７からの操作信号に基づいて、ユーザにより撮影の終了が指令されたかを判定する。ステップＳ３８で撮影の終了が指令されていないと判定された場合、処理はステップＳ３４に戻り、上述した処理を繰り返す。 In step S 38, the control unit 91 determines whether or not the user has commanded the end of shooting based on the operation signal from the operation unit 77. If it is determined in step S38 that the end of shooting has not been commanded, the process returns to step S34 and the above-described process is repeated.

ステップＳ３８で、ユーザにより撮影の終了が指令されたと判定された場合、ステップＳ３９において、制御部９１は、素材データのクリップファイルへの記録を終了する。また、制御部９１は、その素材データのうちの音声データの判定部９２への供給を終了する。そして処理は終了する。 If it is determined in step S38 that the user has commanded the end of shooting, in step S39 the control unit 91 ends recording of the material data in the clip file. The control unit 91 also ends the supply of the audio data of the material data to the determination unit 92. The process then ends.

以上のように、ビデオカメラ２１は、音声データのレベルが所定時間以上の間閾値以上である場合、または、音声データのレベルが所定の時間以上の間閾値未満である場合、その音声データに対応するフレームに、発言者未定EM（start）または発言者未定EM（end）を付与するので、この発言者未定EM（start）と発言者未定EM（end）により、後述する編集装置４１において発言の開始位置と終了位置を容易に認識することができる。 As described above, when the audio data level is equal to or higher than the threshold value for a predetermined time or more, or the audio data level is lower than the threshold value for the predetermined time or longer, the video camera 21 supports the audio data. Since the speaker undetermined EM (start) or the speaker undetermined EM (end) is assigned to the frame to be executed, the editing device 41 (to be described later) uses the speaker undetermined EM (start) and the speaker undetermined EM (end). The start position and end position can be easily recognized.
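
The determination in steps S34 to S37 can be sketched as a small state machine. This is a minimal illustration, not the camera's actual implementation. The specification speaks of an audio level, a threshold, and a predetermined time without giving units or values, so this sketch assumes one level value per frame and expresses the predetermined time as a frame count; the function name detect_marks and its parameters are hypothetical. As in the description, each mark is placed at the frame where the qualifying run began.

```python
def detect_marks(levels, threshold, min_frames):
    """Return (mark, frame_number) tuples for utterance starts and ends.

    levels     : one audio level per frame (assumed representation)
    threshold  : level at or above which a frame counts as speech
    min_frames : run length standing in for the "predetermined time"
    """
    marks = []
    speaking = False   # current state: inside an utterance or not
    run_start = 0      # first frame of the run that contradicts the state
    run_len = 0        # length of that run so far

    for frame, level in enumerate(levels):
        loud = level >= threshold
        if loud != speaking:
            # This frame argues for a state change; extend the run.
            if run_len == 0:
                run_start = frame
            run_len += 1
            if run_len >= min_frames:
                speaking = loud
                name = "Speaker-X:start" if loud else "Speaker-X:end"
                marks.append((name, run_start))
                run_len = 0
        else:
            # The run was interrupted before reaching min_frames.
            run_len = 0
    return marks


if __name__ == "__main__":
    levels = [0, 0, 5, 5, 5, 5, 0, 5, 5, 0, 0, 0, 0]
    print(detect_marks(levels, threshold=3, min_frames=3))
```

In the sample input, the single quiet frame at index 6 is too short to end the utterance, so only one start mark (frame 2) and one end mark (frame 9) are produced.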

図９は、図１の編集装置４１のハードウェア構成例を示すブロック図である。 FIG. 9 is a block diagram illustrating a hardware configuration example of the editing apparatus 41 in FIG.

図９の編集装置４１では、マイコン１１１、一時記憶メモリI/F１１２、光ディスクドライブI/F１１３、操作部I/F１１４、音声出力I/F１１５、シリアルデータI/F１１６、映像表示I/F１１７、メモリカードI/F１１８、ネットワークI/F１１９、ハードディスクドライブI/F１２０、およびドライブI/F１２１が、システムバス１２２に接続されている。 In the editing device 41 of FIG. 9, a microcomputer 111, a temporary storage memory I/F 112, an optical disc drive I/F 113, an operation unit I/F 114, an audio output I/F 115, a serial data I/F 116, a video display I/F 117, a memory card I/F 118, a network I/F 119, a hard disk drive I/F 120, and a drive I/F 121 are connected to a system bus 122.

マイコン１１１は、CPU、ROM、およびRAMにより構成される。マイコン１１１のCPUは、ROMまたはハードディスク１２８に記録されているプログラムにしたがって、操作部I/F１１４からの操作信号などに応じて、編集装置４１の各部を制御する。 The microcomputer 111 is constituted by a CPU, a ROM, and a RAM. The CPU of the microcomputer 111 controls each unit of the editing device 41 according to an operation signal from the operation unit I / F 114 according to a program recorded in the ROM or the hard disk 128.

例えば、CPUは、光ディスクドライブI/F１１３から供給される、光ディスクドライブ４１Ａに装着された光ディスク３１または光ディスク３２から読み出されたクリップを、一時記憶メモリI/F１１２に供給する。また、CPUは、一時記憶メモリI/F１１２から供給される、光ディスク３２に記録されているクリップを、光ディスクドライブI/F１１３を介して光ディスクドライブ４１Ａに供給し、光ディスク３１に集約する。 For example, the CPU supplies to the temporary storage memory I/F 112 a clip that is supplied from the optical disc drive I/F 113 after being read from the optical disc 31 or the optical disc 32 mounted in the optical disc drive 41A. The CPU also supplies the clips recorded on the optical disc 32, which are supplied from the temporary storage memory I/F 112, to the optical disc drive 41A via the optical disc drive I/F 113, thereby consolidating them on the optical disc 31.

さらに、CPUは、操作信号に応じてエディットリストを作成することにより、非破壊編集を行う。CPUは、エディットリストを光ディスクドライブI/F１１３を介して光ディスクドライブ４１Ａに供給し、光ディスク３１に記録させる。 Furthermore, the CPU performs nondestructive editing by creating an edit list according to the operation signal. The CPU supplies the edit list to the optical disc drive 41A via the optical disc drive I / F 113 and records it on the optical disc 31.

また、CPUは、操作信号に応じて、一時記憶メモリI/F１１２から供給されるクリップの電子マークデータに記述される、発言者未定EM(start)と発言者未定EM（end）に、ユーザにより入力された発言者の固有の情報としての発言者ＩＤを付加する。そして、CPUは、発言者ＩＤが付加された発言者未定EM(start)である発言者EM(start)と、発言者ＩＤが付加された発言者未定EM（end）である発言者EM(end)とを記述した電子マークデータを、光ディスクドライブI/F１１３に供給して、光ディスク３１のクリップのNRTファイルに記録させる。 In addition, according to the operation signal, the CPU adds the speaker undecided EM (start) and the speaker undecided EM (end) described in the electronic mark data of the clip supplied from the temporary storage memory I / F 112 by the user. A speaker ID is added as unique information of the input speaker. The CPU then determines the speaker EM (start) that is the speaker undetermined EM (start) to which the speaker ID is added and the speaker EM (end) that is the speaker undetermined EM (end) to which the speaker ID is added. ) Is supplied to the optical disc drive I / F 113 and recorded in the NRT file of the clip on the optical disc 31.

さらに、CPUは、エディットリストとクリップのNRTファイルの電子マークデータとに基づいて、編集結果の電子マークデータを作成する。そして、CPUは、その電子マークデータを、光ディスクドライブI/F１１３に供給して、光ディスク３１に記録させる。 Further, the CPU creates electronic mark data as an editing result based on the edit list and the electronic mark data of the NRT file of the clip. Then, the CPU supplies the electronic mark data to the optical disc drive I / F 113 and records it on the optical disc 31.

また、CPUは、操作信号と編集結果の電子マークデータとに基づいて、編集結果の音声のうちの、ユーザにより指定された発言者ＩＤの発言者の発言にダックボイス加工を施すように、エディットリストを変更する。 In addition, the CPU edits so as to perform duck voice processing on the speech of the speaker with the speaker ID specified by the user in the edited speech based on the operation signal and the electronic mark data of the edited result. Change the list.

さらに、CPUは、一時記憶メモリI/F１１２から供給されるクリップのうちの音声データを、システムバス１２２を介して音声出力I/F１１５に供給して、クリップの音声をスピーカ１２５から出力させる。また、CPUは、一時記憶メモリI/F１１２から供給されるクリップのうちの映像データを、システムバス１２２を介して映像表示I/F１１７に供給して、クリップの映像を表示装置１２６に表示させる。RAMには、CPUが実行するプログラムやデータなどが適宜記憶される。 Further, the CPU supplies the audio data of the clip supplied from the temporary storage memory I / F 112 to the audio output I / F 115 via the system bus 122 and causes the audio of the clip to be output from the speaker 125. Further, the CPU supplies the video data of the clip supplied from the temporary storage memory I / F 112 to the video display I / F 117 via the system bus 122 and causes the display device 126 to display the video of the clip. The RAM appropriately stores programs executed by the CPU, data, and the like.

一時記憶メモリI/F１１２には、バッファなどの一時記憶メモリ１２３が接続されており、一時記憶メモリI/F１１２は、マイコン１１１から供給される、光ディスク３１または光ディスク３２に記録されているクリップを、一時記憶メモリ１２３に記憶させる。また、一時記憶メモリI/F１１２は、一時記憶メモリ１２３に記憶されているクリップを読み出し、マイコン１１１に供給する。 A temporary storage memory 123 such as a buffer is connected to the temporary storage memory I / F 112, and the temporary storage memory I / F 112 supplies clips recorded on the optical disk 31 or the optical disk 32 supplied from the microcomputer 111. It is stored in the temporary storage memory 123. In addition, the temporary storage memory I / F 112 reads the clip stored in the temporary storage memory 123 and supplies the clip to the microcomputer 111.

光ディスクドライブI/F１１３には、光ディスク３１または光ディスク３２が装着される光ディスクドライブ４１Ａが接続されている。光ディスクドライブI/F１１３は、光ディスクドライブ４１Ａを制御して、光ディスクドライブ４１Ａに装着されている光ディスク３１または光ディスク３２からクリップを読み出し、システムバス１２２を介して一時記憶メモリI/F１１２に供給する。 An optical disk drive 41A to which the optical disk 31 or the optical disk 32 is mounted is connected to the optical disk drive I / F 113. The optical disk drive I / F 113 controls the optical disk drive 41A to read a clip from the optical disk 31 or the optical disk 32 loaded in the optical disk drive 41A, and supplies the clip to the temporary storage memory I / F 112 via the system bus 122.

また、光ディスクドライブI/F１１３は、光ディスクドライブ４１Ａを制御し、マイコン１１１から供給される、光ディスク３２に記録されているクリップ、エディットリスト、発言者EM（start）と発言者EM(end)を記述した電子マークデータ、および編集結果の電子マークデータを、光ディスク３１に記録させる。 The optical disc drive I/F 113 also controls the optical disc drive 41A to record, on the optical disc 31, the following items supplied from the microcomputer 111: the clips recorded on the optical disc 32, the edit list, the electronic mark data describing the speaker EM (start) and the speaker EM (end), and the electronic mark data of the editing result.

操作部I/F１１４には、操作ボタン、キーボード、マウス、リモートコントローラから送信されてくる指令を受信する受信部などの操作部１２４が接続される。操作部I/F１１４は、ユーザによる操作部１２４の操作に応じて、その操作を表す操作信号を生成し、その操作信号を、システムバス１２２を介してマイコン１１１に供給する。 The operation unit I / F 114 is connected to an operation unit 124 such as an operation button, a keyboard, a mouse, and a reception unit that receives a command transmitted from a remote controller. The operation unit I / F 114 generates an operation signal indicating the operation in accordance with the operation of the operation unit 124 by the user, and supplies the operation signal to the microcomputer 111 via the system bus 122.

音声出力I/F１１５には、スピーカ１２５が接続される。音声出力I/F１１５は、マイコン１１１から供給される音声データに対してD/A変換を行い、その結果得られるアナログ信号を増幅して、スピーカ１２５に供給する。スピーカ１２５は、音声出力I/F１１５からのアナログ信号に基づいて、音声を外部に出力する。なお、音声出力I/F１１５は、音声データをそのままスピーカ１２５に供給し、スピーカ１２５が、D/A変換等を行い、その結果得られるアナログ信号に基づいて音声を外部に出力するようにしてもよい。 A speaker 125 is connected to the audio output I/F 115. The audio output I/F 115 performs D/A conversion on the audio data supplied from the microcomputer 111, amplifies the resulting analog signal, and supplies it to the speaker 125. The speaker 125 outputs sound to the outside based on the analog signal from the audio output I/F 115. Alternatively, the audio output I/F 115 may supply the audio data to the speaker 125 as it is, and the speaker 125 may perform the D/A conversion and the like and output sound to the outside based on the resulting analog signal.

シリアルデータI/F１１６は、必要に応じて、図示せぬ外部のコンピュータ等のデジタル機器との間で、データをやり取りする。映像表示I/F１１７には、表示装置１２６が接続され、映像表示I/F１１７は、マイコン１１１からの映像データに対してD/A変換を行い、その結果得られるコンポジット信号、コンポーネント信号などのアナログ信号を増幅して、表示装置１２６に供給する。表示装置１２６は、映像表示I/F１１７からのアナログ信号に基づいて映像を表示する。 The serial data I / F 116 exchanges data with a digital device such as an external computer (not shown) as necessary. A display device 126 is connected to the video display I / F 117, and the video display I / F 117 performs D / A conversion on the video data from the microcomputer 111, and analog signals such as composite signals and component signals obtained as a result thereof. The signal is amplified and supplied to the display device 126. The display device 126 displays a video based on the analog signal from the video display I / F 117.

なお、映像表示I/F１１７は、映像データをそのまま表示装置１２６に供給し、表示装置１２６がD/A変換等を行い、その結果得られるアナログ信号に基づいて映像を外部に出力するようにしてもよい。 Alternatively, the video display I/F 117 may supply the video data to the display device 126 as it is, and the display device 126 may perform the D/A conversion and the like and output the video based on the resulting analog signal.

メモリカードI/F１１８は、必要に応じて編集装置４１に装着されるメモリカード(図示せず)に対して、素材データ、各種の設定データなどの読み書きを行う。ネットワークI/F１１９は、必要に応じて、インターネットやローカルエリアネットワークといった、有線または無線のネットワークを介して接続される他の装置との間で、データのやり取りを行う。 The memory card I / F 118 reads / writes material data, various setting data, and the like from / to a memory card (not shown) attached to the editing device 41 as necessary. The network I / F 119 exchanges data with other devices connected via a wired or wireless network such as the Internet or a local area network as necessary.

例えば、ネットワークI/F１１９は、他の装置からネットワークを介してプログラムを取得し、システムバス１２２、ハードディスクドライブI/F１２０、およびハードディスクドライブ１２７を介して、ハードディスク１２８に記録させる。 For example, the network I / F 119 acquires a program from another device via the network, and records the program on the hard disk 128 via the system bus 122, the hard disk drive I / F 120, and the hard disk drive 127.

ハードディスクドライブI/F１２０には、ハードディスク１２８が装着されるハードディスクドライブ１２７が接続されている。ハードディスクドライブI/F１２０は、ハードディスクドライブ１２７を制御し、ハードディスク１２８に対するデータの読み書きを行う。例えば、ハードディスクドライブI/F１２０は、ハードディスクドライブ１２７を制御し、ネットワークI/F１１９とシステムバス１２２を介して供給されるプログラムを、ハードディスク１２８に記録させる。 A hard disk drive 127 to which the hard disk 128 is attached is connected to the hard disk drive I / F 120. The hard disk drive I / F 120 controls the hard disk drive 127 to read / write data from / to the hard disk 128. For example, the hard disk drive I / F 120 controls the hard disk drive 127 and causes the hard disk 128 to record a program supplied via the network I / F 119 and the system bus 122.

ドライブI/F１２１には、ドライブ１２９が接続されている。ドライブI/F１２１は、ドライブ１２９を制御し、ドライブ１２９に磁気ディスク、光ディスク、光磁気ディスク、或いは半導体メモリなどのリムーバブルメディア１０１が装着されたとき、それらを駆動し、そこに記録されているプログラムやデータなどを取得する。取得されたプログラムやデータは、必要に応じてハードディスクドライブI/F１２０などを介してハードディスク１２８に転送され、記録される。 A drive 129 is connected to the drive I/F 121. The drive I/F 121 controls the drive 129 and, when a removable medium 101 such as a magnetic disk, an optical disc, a magneto-optical disk, or a semiconductor memory is loaded in the drive 129, drives the medium and acquires the programs, data, and the like recorded on it. The acquired programs and data are transferred to and recorded on the hard disk 128 via the hard disk drive I/F 120 or the like as necessary.

システムバス１２２は、そこに接続されている各部の間でのデータのやり取りを仲介する。 The system bus 122 mediates exchange of data between the units connected thereto.

次に、図９の編集装置４１において、マイコン１１１は、所定のプログラムを実行することにより、テレビジョン素材の音声付き映像を編集する編集処理部として機能する。 Next, in the editing device 41 of FIG. 9, the microcomputer 111 functions as an editing processing unit that edits video with sound of a television material by executing a predetermined program.

図１０は、そのような編集処理部１５０の機能的な構成例を示している。 FIG. 10 shows a functional configuration example of such an editing processing unit 150.

図１０の編集処理部１５０は、付加部１５１、エディットリスト作成部１５２、EM作成部１５３により構成される。 The editing processing unit 150 in FIG. 10 includes an adding unit 151, an edit list creation unit 152, and an EM creation unit 153.

付加部１５１は、ハードディスク１２８に記録されている、発言者ＩＤと発言者の名前を対応付けた発言者リストを読み出す。付加部１５１は、その発言者リストに基づいて、発言者ＩＤを入力するための入力画面(後述する図１２)の映像データを生成する。付加部１５１は、その入力画面の映像データを映像表示I/F１１７に供給して、入力画面を表示装置１２６に表示させる。 The adding unit 151 reads a speaker list recorded on the hard disk 128 in which a speaker ID is associated with a speaker name. The adding unit 151 generates video data of an input screen (FIG. 12 described later) for inputting a speaker ID based on the speaker list. The adding unit 151 supplies the video data of the input screen to the video display I / F 117 and causes the display device 126 to display the input screen.

また、付加部１５１は、入力画面においてユーザが操作部１２４を操作することにより操作部I/F１１４から供給される操作信号に応じて、その操作信号に対応する発言者ＩＤを、一時記憶メモリI/F１１２から供給されるクリップの電子マークデータに記述される、発言者未定EM(start)と発言者未定EM（end）に付加する。そして、付加部１５１は、発言者ＩＤを付加した後の電子マークデータを、光ディスクドライブI/F１１３に供給して、光ディスク３１のNRTファイルに記録させる。 In addition, in response to the operation signal supplied from the operation unit I/F 114 when the user operates the operation unit 124 on the input screen, the adding unit 151 adds the speaker ID corresponding to that operation signal to the speaker undetermined EM (start) and the speaker undetermined EM (end) described in the electronic mark data of the clip supplied from the temporary storage memory I/F 112. The adding unit 151 then supplies the electronic mark data with the speaker ID added to the optical disc drive I/F 113, which records it in the NRT file on the optical disc 31.

エディットリスト作成部１５２は、光ディスクドライブI/F１１３から供給される、光ディスク３１または光ディスク３２から読み出されたクリップを、一時記憶メモリI/F１１２に供給する。また、エディットリスト作成部１５２は、一時記憶メモリI/F１１２から供給される、光ディスク３２に記録されているクリップを、光ディスクドライブI/F１１３に供給して、光ディスク３１に集約する。 The edit list creation unit 152 supplies the clip read from the optical disc 31 or the optical disc 32 supplied from the optical disc drive I / F 113 to the temporary storage memory I / F 112. The edit list creation unit 152 supplies clips recorded on the optical disc 32 supplied from the temporary storage memory I / F 112 to the optical disc drive I / F 113 and collects them on the optical disc 31.

さらに、エディットリスト作成部１５２は、一時記憶メモリI/F１１２から供給されるプロキシデータのうちの音声データを音声出力I/F１１５に供給して、クリップの音声をスピーカ１２５から出力させるとともに、プロキシデータのうちの映像データを映像表示I/F１１７に供給して、クリップの低解像度の映像を、編集を行うための編集画面として表示装置１２６に表示させる。このとき、ユーザは、スピーカ１２５からの音声を聞きつつ、編集画面を見ながら、操作部１２４を操作して編集作業を行う。 Further, the edit list creation unit 152 supplies the audio data of the proxy data supplied from the temporary storage memory I / F 112 to the audio output I / F 115 to output the audio of the clip from the speaker 125, and the proxy data The video data is supplied to the video display I / F 117, and the low-resolution video of the clip is displayed on the display device 126 as an editing screen for editing. At this time, the user performs an editing operation by operating the operation unit 124 while listening to the sound from the speaker 125 and viewing the editing screen.

エディットリスト作成部１５２は、ユーザの編集作業により操作部I/F１１４から供給される操作信号に応じて、エディットリストを作成することにより、非破壊編集を行う。そして、エディットリスト作成部１５２は、エディットリストを光ディスクドライブI/F１１３に供給して光ディスク３１に記録させるとともに、EM作成部１５３に供給する。 The edit list creation unit 152 performs nondestructive editing by creating an edit list in accordance with an operation signal supplied from the operation unit I / F 114 by the user's editing work. Then, the edit list creation unit 152 supplies the edit list to the optical disc drive I / F 113 to record it on the optical disc 31 and also supplies it to the EM creation unit 153.

また、エディットリスト作成部１５２は、操作部I/F１１４から供給される操作信号と、EM作成部１５３から供給される編集結果の電子マークデータとに基づいて、編集結果の音声のうちの、ユーザにより指定された発言者ＩＤの発言者の発言にダックボイス加工を施すように、エディットリストを変更する。 In addition, based on the operation signal supplied from the operation unit I/F 114 and the electronic mark data of the editing result supplied from the EM creation unit 153, the edit list creation unit 152 changes the edit list so that duck voice processing is applied to the speech of the speaker with the user-specified speaker ID in the audio of the editing result.

EM作成部１５３は、エディットリスト作成部１５２から供給されるエディットリストと、一時記憶メモリI/F１１２に記憶されている、クリップの電子マークデータとに基づいて、編集結果の電子マークデータを作成する。そして、EM作成部１５３は、その電子マークデータを光ディスクドライブI/F１１３に供給して、光ディスク３１に記録させるとともに、エディットリスト作成部１５２に供給する。 The EM creation unit 153 creates electronic mark data as a result of editing based on the edit list supplied from the edit list creation unit 152 and the electronic mark data of the clip stored in the temporary storage memory I / F 112. . Then, the EM creation unit 153 supplies the electronic mark data to the optical disc drive I / F 113 to record it on the optical disc 31 and also to the edit list creation unit 152.

また、EM作成部１５３は、操作部I/F１１４から供給される操作信号に応じて、編集結果の電子マークデータに記述される、ユーザにより指定された発言者ＩＤが付加された発言者EM(start)と発言者EM(end)に、ダックボイス加工の有無を表す情報を付加する。 In addition, in response to the operation signal supplied from the operation unit I/F 114, the EM creation unit 153 adds information indicating whether duck voice processing is applied to the speaker EM (start) and the speaker EM (end) that carry the user-specified speaker ID in the electronic mark data of the editing result.

図１１は、非破壊編集後の光ディスク３１に記録されているファイルのディレクトリ構造の例を示している。 FIG. 11 shows an example of the directory structure of files recorded on the optical disc 31 after nondestructive editing.

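なお、図１１において、図４と同一のものには同一の符号を付してあり、説明は繰り返しになるので省略する。 In FIG. 11, components identical to those in FIG. 4 are denoted by the same reference numerals, and their description is omitted to avoid repetition.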

図１１の例では、光ディスク３１には、インデックスファイル(INDEX.XML)９６とディスクメタファイル（DISCMETA.XML）が設けられている。図１１のインデックスファイル９６には、クリップを管理するための情報だけでなく、エディットリストを管理するための情報も記述される。 In the example of FIG. 11, the optical disc 31 is provided with an index file (INDEX.XML) 96 and a disc metafile (DISCMETA.XML). The index file 96 in FIG. 11 describes not only information for managing clips but also information for managing edit lists.

また、光ディスク３１には、クリップディレクトリ（Clip）９５、エディットリストのファイルが下位に設けられるエディットリストディレクトリ（Edit）、およびプロキシディレクトリ（Sub）が設けられている。 In addition, the optical disk 31 is provided with a clip directory (Clip) 95, an edit list directory (Edit) in which edit list files are provided at a lower level, and a proxy directory (Sub).

図１１の例では、光ディスク３１には、ビデオカメラ２１またはビデオカメラ２２により撮影された４つのクリップのデータが集約されている。 In the example of FIG. 11, data of four clips photographed by the video camera 21 or the video camera 22 are collected on the optical disc 31.

即ち、例えば、クリップディレクトリ９５の下位には、ビデオカメラ２１により撮影された第１のクリップファイル（C0001.MXF）および第１のNRTファイル（C0001M01.XML）、第２のクリップファイル（C0002.MXF）および第２のNRTファイル（C0002M01.XML）、並びに第３のクリップファイル（C0003.MXF）および第３のNRTファイル（C0003M01.XML）と、ビデオカメラ２２により撮影された第４のクリップファイル(C0004.MXF)および第４のNRTファイル（C0004M01.XML）とが設けられている。 That is, for example, the clip directory 95 contains the first clip file (C0001.MXF) and first NRT file (C0001M01.XML), the second clip file (C0002.MXF) and second NRT file (C0002M01.XML), and the third clip file (C0003.MXF) and third NRT file (C0003M01.XML), all shot by the video camera 21, as well as the fourth clip file (C0004.MXF) and fourth NRT file (C0004M01.XML), shot by the video camera 22.

図１１において、このようなクリップディレクトリ９５の下方に示されるエディットディレクトリ（Edit）には、エディットリストが、編集処理ごとに異なるファイルとして記録されている。 In FIG. 11, in the edit directory (Edit) shown below such a clip directory 95, edit lists are recorded as different files for each editing process.

例えば、図１１の例の場合、エディットディレクトリ（Edit）の下位には、光ディスク３１に記録された第１乃至第４のクリップの１回目の編集処理の編集結果に関するエディットリストを含むファイルであるエディットリストファイル（E0001E01.SMI）と、１回目の編集結果を構成する素材データに対応するNRTデータ、または、そのNRTデータに基づいて新たに生成されたNRTデータを含むファイルであるエディットリスト用NRTファイル（E0001M01.XML）が設けられている。また、同様に、２回目の編集処理のエディットリストファイル（E0002E01.SMI）と、エディットリスト用NRTファイル（E0002M01.XML）が設けられている。 For example, in the case of FIG. 11, the edit directory (Edit) contains two files for the first editing process performed on the first to fourth clips recorded on the optical disc 31. One is the edit list file (E0001E01.SMI), which contains the edit list for the editing result. The other is the edit list NRT file (E0001M01.XML), which contains either the NRT data corresponding to the material data that makes up the first editing result or NRT data newly generated from that NRT data. Similarly, an edit list file (E0002E01.SMI) and an edit list NRT file (E0002M01.XML) are provided for the second editing process.

また、図１１において、このようなクリップディレクトリ（Clip）の下方に示される、プロキシディレクトリ（Sub）には、光ディスク３１に記録された４つのクリップのプロキシデータが集約されている。 In FIG. 11, proxy data of four clips recorded on the optical disc 31 is collected in a proxy directory (Sub) shown below such a clip directory (Clip).

例えば、図１１の例の場合、プロキシディレクトリ（Sub）の下位には、ビデオカメラ２１により撮影された第１のクリップのプロキシファイル（C0001S01.MXF）、第２のクリップのプロキシファイル（C0002S01.MXF）、および第３のクリップのプロキシファイル（C0003S01.MXF）と、ビデオカメラ２２により撮影された第４のクリップのプロキシファイル（C0004S01.MXF）とが設けられる。 For example, in the case of FIG. 11, the proxy directory (Sub) contains the proxy file of the first clip (C0001S01.MXF), the proxy file of the second clip (C0002S01.MXF), and the proxy file of the third clip (C0003S01.MXF), all shot by the video camera 21, as well as the proxy file of the fourth clip (C0004S01.MXF), shot by the video camera 22.

さらに、光ディスク３１には、一般ディレクトリ（General）が設けられている。この一般ディレクトリ（General）には、クリップとエディットリスト以外のデータのファイルが設けられる。 Further, the optical disk 31 is provided with a general directory (General). In this general directory (General), files of data other than clips and edit lists are provided.
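
The file names in the FIG. 11 example follow a regular pattern, which the following minimal Python sketch reproduces. The pattern is only inferred from the names quoted above; the helper names `clip_files` and `edit_files`, the `/` path separator, and the assumption that every clip and edit list uses the `01` suffix are illustrative and are not stated in the source.

```python
# Sketch of the FIG. 11 naming pattern (inferred from the example names only).

def clip_files(n: int) -> dict:
    """Files associated with the n-th clip (hypothetical helper)."""
    stem = f"C{n:04d}"
    return {
        "clip": f"Clip/{stem}.MXF",      # clip file in the clip directory
        "nrt": f"Clip/{stem}M01.XML",    # NRT file holding electronic mark data
        "proxy": f"Sub/{stem}S01.MXF",   # low-resolution proxy file
    }

def edit_files(n: int) -> dict:
    """Files produced by the n-th editing process (hypothetical helper)."""
    stem = f"E{n:04d}"
    return {
        "edit_list": f"Edit/{stem}E01.SMI",  # edit list file
        "nrt": f"Edit/{stem}M01.XML",        # edit list NRT file
    }

# Four clips and two editing processes, as in the FIG. 11 example.
layout = ["INDEX.XML", "DISCMETA.XML"]
for n in range(1, 5):
    layout.extend(clip_files(n).values())
for n in range(1, 3):
    layout.extend(edit_files(n).values())

for path in layout:
    print(path)
```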

次に、図１２は入力画面の例を示している。 Next, FIG. 12 shows an example of the input screen.

図１２の入力画面には、発言者ＩＤと発言者の名前が対応付けて表示される。図１２の例では、発言者ＩＤ「Ａ」を表す「Speaker-A」と発言者の名前「○○さん」、発言者ＩＤ「Ｂ」を表す「Speaker-B」と発言者の名前「××さん」、発言者ＩＤ「Ｃ」を表す「Speaker-C」と発言者の名前「△△さん」が、それぞれ対応付けて表示される。 On the input screen of FIG. 12, each speaker ID is displayed in association with the speaker's name. In the example of FIG. 12, "Speaker-A", representing the speaker ID "A", is displayed with the speaker name "Mr. ○○"; "Speaker-B", representing the speaker ID "B", is displayed with the speaker name "Mr. ××"; and "Speaker-C", representing the speaker ID "C", is displayed with the speaker name "Mr. △△".

また、入力画面には、いずれか１つの発言者ＩＤと発言者の名前の表示位置にカーソル１６０が配置される。このカーソル１６０は、発言者未定EM(start)と発言者未定EM（end）に、発言者ＩＤを付加するときに操作される。 On the input screen, a cursor 160 is placed at the display position of any one speaker ID and the name of the speaker. The cursor 160 is operated when a speaker ID is added to a speaker undetermined EM (start) and a speaker undetermined EM (end).

具体的には、ユーザは、例えばスピーカ１２５から出力されるクリップの音声を聞きながら、操作部１２４を操作し、その音声を発した発言者の発言者ＩＤと名前の表示位置にカーソル１６０を移動させて、決定の指令を行う。付加部１５１は、この操作を表す操作信号に応じて、決定の指令時に再生中の音声に対応するフレームの直前に付与されている発言者未定EM(start)と、直後に付与されている発言者未定EM（end）に、カーソル１６０の位置に対応する発言者ＩＤを付加する。 Specifically, while listening to the audio of the clip output from the speaker 125, for example, the user operates the operation unit 124 to move the cursor 160 to the display position of the speaker ID and name of the speaker who uttered that audio, and then issues a confirm command. In response to the operation signal representing this operation, the adding unit 151 adds the speaker ID corresponding to the position of the cursor 160 to the speaker undetermined EM (start) assigned immediately before the frame corresponding to the audio being played back when the confirm command was issued, and to the speaker undetermined EM (end) assigned immediately after that frame.
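
The assignment step described above can be sketched in Python as follows. This is a minimal illustration rather than the apparatus's actual implementation: the `Mark` class, the use of `None` to represent an undetermined speaker, and the boundary rule (a start mark on the current frame counts as "immediately before", an end mark must lie strictly after it) are all assumptions made for the sketch.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class Mark:
    kind: str                      # "start" or "end"
    frame: int                     # frame the mark is assigned to
    speaker: Optional[str] = None  # None means "speaker undetermined"

def assign_speaker(marks: List[Mark], current_frame: int,
                   speaker_id: str) -> Tuple[Mark, Mark]:
    """Attach speaker_id to the start mark nearest before current_frame
    and to the end mark nearest after it."""
    starts = [m for m in marks if m.kind == "start" and m.frame <= current_frame]
    ends = [m for m in marks if m.kind == "end" and m.frame > current_frame]
    if not starts or not ends:
        raise ValueError("current_frame is not inside a marked speech section")
    start = max(starts, key=lambda m: m.frame)
    end = min(ends, key=lambda m: m.frame)
    start.speaker = speaker_id
    end.speaker = speaker_id
    return start, end

# Undetermined marks at the frame positions used in the FIG. 13 example.
marks = [Mark("start", 0), Mark("end", 564), Mark("start", 564), Mark("end", 924),
         Mark("start", 924), Mark("end", 1804), Mark("start", 1804), Mark("end", 2100)]

# The user confirms "Speaker-B" while frame 700 is being played back.
start, end = assign_speaker(marks, 700, "B")
print(start, end)
```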

次に、図１３は、発言者EM(start)または発言者EM（end）を記述した電子マークデータの例を示している。なお、図１３では、図６の発言者未定EM(start)と発言者未定EM（end）に発言者ＩＤが付加された発言者EM(start)と発言者EM（end）を記述した電子マークデータを示している。 Next, FIG. 13 shows an example of electronic mark data describing the speaker EM (start) and the speaker EM (end). Specifically, FIG. 13 shows electronic mark data describing the speaker EM (start) and speaker EM (end) obtained by adding speaker IDs to the speaker undetermined EM (start) and speaker undetermined EM (end) of FIG. 6.

図１３の例では、電子マークテーブル（<EssenceMark Table> </EssenceMark Table>）で囲まれる電子マークテーブル部は、２乃至１１行目に記述されている。 In the example of FIG. 13, the electronic mark table portion surrounded by the electronic mark table (<EssenceMark Table> </ EssenceMark Table>) is described in the 2nd to 11th lines.

図１３の２行目の「targetMedia="Original-Material"」は、この電子マークデータが、クリップの素材データに付与される電子マークを記述した電子マークデータであることを示している。 The description targetMedia="Original-Material" on the second line of FIG. 13 indicates that this electronic mark data describes electronic marks assigned to the material data of a clip.

また、３行目の「EssenceMark value="Speaker-A:start"frameCount="0"」の記述は、この電子マークが示す特徴が発言者ＩＤ「Ａ」の発言者による発言の開始位置であり、付与位置がクリップの先頭から0フレーム目であることを示している。即ち、図１３の３行目の記述は、図６の３行目の記述が示す発言者未定EM(start)に発言者ＩＤ「Ａ」が付加された発言者EM(start)を示している。 The description EssenceMark value="Speaker-A:start" frameCount="0" on the third line indicates that the feature represented by this electronic mark is the start position of speech by the speaker with the speaker ID "A" and that the mark is assigned to frame 0 from the beginning of the clip. That is, the description on the third line of FIG. 13 represents the speaker EM (start) obtained by adding the speaker ID "A" to the speaker undetermined EM (start) described on the third line of FIG. 6.

また、４行目の「EssenceMark value="Speaker-A:end"frameCount="564"」の記述は、この電子マークが示す特徴が発言者ＩＤ「Ａ」の発言者による発言の終了位置であり、付与位置がクリップの先頭から564フレーム目であることを示している。 The description EssenceMark value="Speaker-A:end" frameCount="564" on the fourth line indicates that the feature represented by this electronic mark is the end position of speech by the speaker with the speaker ID "A" and that the mark is assigned to frame 564 from the beginning of the clip.

同様に、５行目の「EssenceMark value="Speaker-B:start"frameCount="564"」、７行目の「EssenceMark value="Speaker-A:start"frameCount="924"」、９行目の「EssenceMark value="Speaker-B:start"frameCount="1804"」の記述は、この電子マークが示す特徴が、それぞれ、発言者ＩＤ「Ｂ」の発言者、発言者ＩＤ「Ａ」の発言者、発言者ＩＤ「Ｂ」の発言者による発言の開始位置であり、付与位置が、それぞれ、クリップの先頭から564フレーム目、924フレーム目、1804フレーム目であることを示している。 Similarly, the descriptions EssenceMark value="Speaker-B:start" frameCount="564" on the fifth line, EssenceMark value="Speaker-A:start" frameCount="924" on the seventh line, and EssenceMark value="Speaker-B:start" frameCount="1804" on the ninth line indicate that the features represented by these electronic marks are the start positions of speech by the speakers with the speaker IDs "B", "A", and "B", respectively, and that the marks are assigned to frames 564, 924, and 1804 from the beginning of the clip, respectively.

また、６行目の「EssenceMark value="Speaker-B:end"frameCount="924"」、８行目の「EssenceMark value="Speaker-A:end"frameCount="1804"」、１０行目の「EssenceMark value="Speaker-B:end"frameCount="2100"」の記述は、この電子マークが示す特徴が、それぞれ、発言者ＩＤ「Ｂ」の発言者、発言者ＩＤ「Ａ」の発言者、発言者ＩＤ「Ｂ」の発言者による発言の終了位置であり、付与位置が、それぞれ、クリップの先頭から924フレーム目、1804フレーム目、2100フレーム目であることを示している。 Likewise, the descriptions EssenceMark value="Speaker-B:end" frameCount="924" on the sixth line, EssenceMark value="Speaker-A:end" frameCount="1804" on the eighth line, and EssenceMark value="Speaker-B:end" frameCount="2100" on the tenth line indicate that the features represented by these electronic marks are the end positions of speech by the speakers with the speaker IDs "B", "A", and "B", respectively, and that the marks are assigned to frames 924, 1804, and 2100 from the beginning of the clip, respectively.
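
The start and end marks of FIG. 13 pair up into per-speaker speech sections. The following Python sketch shows one way to derive those sections from the `value` and `frameCount` pairs quoted above. The tuple representation and the function name `speaker_sections` are assumptions for illustration; the sketch does not parse the actual XML file.

```python
# (value, frameCount) pairs as quoted from lines 3 to 10 of FIG. 13.
FIG13_MARKS = [
    ("Speaker-A:start", 0), ("Speaker-A:end", 564),
    ("Speaker-B:start", 564), ("Speaker-B:end", 924),
    ("Speaker-A:start", 924), ("Speaker-A:end", 1804),
    ("Speaker-B:start", 1804), ("Speaker-B:end", 2100),
]

def speaker_sections(marks):
    """Pair each start mark with the next end mark of the same speaker.

    Returns a list of (speaker_id, start_frame, end_frame) tuples.
    """
    sections = []
    open_start = {}  # speaker_id -> frame of the pending start mark
    for value, frame in marks:
        label, kind = value.rsplit(":", 1)   # "Speaker-A", "start"
        speaker_id = label.split("-", 1)[1]  # "A"
        if kind == "start":
            open_start[speaker_id] = frame
        elif kind == "end":
            sections.append((speaker_id, open_start.pop(speaker_id), frame))
    return sections

sections = speaker_sections(FIG13_MARKS)
for speaker_id, start, end in sections:
    print(f"speaker {speaker_id}: frames {start} to {end}")
```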

次に、図１４乃至図２０を参照して、編集装置４１における非破壊編集について説明する。 Next, non-destructive editing in the editing device 41 will be described with reference to FIGS. 14 to 20.

なお、ここでは、ビデオカメラ２１が、発言者ＩＤ「Ａ」の発言者「○○さん」と発言者ＩＤ「Ｂ」の発言者「××さん」の２人を被写体として撮影するとともに対話の音声を取得し、ビデオカメラ２２が、発言者「××さん」のみを被写体として撮影することにより、２台のビデオカメラ２１および２２が、２人の発言者の対話をテレビジョン素材として撮影したものとする。 Here, it is assumed that the video camera 21 shoots two people, the speaker "Mr. ○○" with the speaker ID "A" and the speaker "Mr. ××" with the speaker ID "B", as subjects while also capturing the audio of their dialogue, and that the video camera 22 shoots only the speaker "Mr. ××" as a subject, so that the two video cameras 21 and 22 together shoot the dialogue between the two speakers as television material.

そして、ユーザは、そのテレビジョン素材の所定の連続する区間の音声を切り取って編集結果の音声として使用するとともに、所定の区間の映像を切り取って編集結果の映像として使用し、発言者「××さん」の発言にダックボイス加工を施すように、非破壊編集を行う。 The user then performs non-destructive editing so that the audio of a predetermined continuous section of the television material is cut out and used as the audio of the editing result, the video of predetermined sections is cut out and used as the video of the editing result, and duck voice processing is applied to the speech of the speaker "Mr. ××".

まず最初に、図１４と図１５を参照して、光ディスク３１に記録されている編集対象のクリップと編集結果について説明する。なお、図１４Ａにおいて、横軸は撮影時刻を表しており、図１４Ｂと図１５において、横軸はフレーム番号を表している。 First, with reference to FIG. 14 and FIG. 15, a clip to be edited and an editing result recorded on the optical disc 31 will be described. In FIG. 14A, the horizontal axis represents the shooting time, and in FIGS. 14B and 15, the horizontal axis represents the frame number.

図１４Ａの上段の棒は、ビデオカメラ２１により撮影された編集対象である第１のクリップの映像の長さを示しており、棒の上に記述されている数字は、その記述位置に対応する撮影時刻に撮影された映像のフレーム番号を示している。即ち、図１４の例では、第１のクリップの映像のフレーム数は2525フレームであり、各フレームには、フレーム番号が「0」から順に「2524」まで付与されている。 The upper bar in FIG. 14A indicates the length of the video of the first clip to be edited, which is captured by the video camera 21, and the number described on the bar corresponds to the description position. The frame number of the video shot at the shooting time is shown. That is, in the example of FIG. 14, the number of frames of the first clip video is 2525 frames, and frame numbers are assigned from “0” to “2524” in order.

また、図１４Ａの中段の棒は、第１のクリップの音声の長さを示しており、棒の中のアルファベットは、その位置に対応する音声を発した発言者の発言者ＩＤである。 Further, the middle bar in FIG. 14A indicates the length of the sound of the first clip, and the alphabet in the bar is the speaker ID of the speaker who uttered the sound corresponding to the position.

なお、図１４の例では、第１のクリップには、図１３に示した発言者EM(start)と発言者EM（end）が付与されている。従って、図１４Ａの中段の棒には、図１３の３行目の記述が示す発言者EM(start)が付与されたフレームのフレーム番号「0」から、図１３の４行目の記述が示す発言者EM(end)が付与されたフレームのフレーム番号「564」までに対応する位置に、その区間の音声の発言者の発言者ＩＤ「Ａ」が記述されている。 In the example of FIG. 14, the speaker EM (start) and speaker EM (end) shown in FIG. 13 are assigned to the first clip. Accordingly, in the middle bar of FIG. 14A, the speaker ID "A" of the speaker of the audio in the section is written at the position corresponding to the range from frame number "0", the frame to which the speaker EM (start) described on the third line of FIG. 13 is assigned, to frame number "564", the frame to which the speaker EM (end) described on the fourth line of FIG. 13 is assigned.

同様に、図１４Ａの中段の棒には、図１３の５行目の記述が示す発言者EM(start)が付与されたフレームのフレーム番号「564」から、図１３の６行目の記述が示す発言者EM(end)が付与されたフレームのフレーム番号「924」までに対応する位置に、その区間の音声の発言者の発言者ＩＤ「Ｂ」が記述されている。 Similarly, in the middle bar of FIG. 14A, the speaker ID "B" of the speaker of the audio in the section is written at the position corresponding to the range from frame number "564", the frame to which the speaker EM (start) described on the fifth line of FIG. 13 is assigned, to frame number "924", the frame to which the speaker EM (end) described on the sixth line of FIG. 13 is assigned.

また、図１４Ａの中段の棒には、図１３の７行目の記述が示す発言者EM(start)が付与されたフレームのフレーム番号「924」から、図１３の８行目の記述が示す発言者EM(end)が付与されたフレームのフレーム番号「1804」までに対応する位置に、その区間の音声の発言者の発言者ＩＤ「Ａ」が記述されている。 In the middle bar of FIG. 14A, the speaker ID "A" of the speaker of the audio in the section is also written at the position corresponding to the range from frame number "924", the frame to which the speaker EM (start) described on the seventh line of FIG. 13 is assigned, to frame number "1804", the frame to which the speaker EM (end) described on the eighth line of FIG. 13 is assigned.

さらに、図１４Ａの中段の棒には、図１３の９行目の記述が示す発言者EM(start)が付与されたフレームのフレーム番号「1804」から、図１３の１０行目の記述が示す発言者EM(end)が付与されたフレームのフレーム番号「2100」までに対応する位置に、その区間の音声の発言者の発言者ＩＤ「Ｂ」が記述されている。 Furthermore, in the middle bar of FIG. 14A, the speaker ID "B" of the speaker of the audio in the section is written at the position corresponding to the range from frame number "1804", the frame to which the speaker EM (start) described on the ninth line of FIG. 13 is assigned, to frame number "2100", the frame to which the speaker EM (end) described on the tenth line of FIG. 13 is assigned.

図１４Ａの下段の棒は、ビデオカメラ２２により撮影された編集対象である第４のクリップの映像の長さを示しており、棒の上に記述されている数字は、その記述位置に対応する撮影時刻に撮影された映像のフレーム番号を示している。即ち、図１４の例では、第４のクリップの映像のフレーム数は2415フレームであり、各フレームには、フレーム番号が「0」から順に「2414」まで付与されている。 The bar in the lower part of FIG. 14A indicates the length of the video of the fourth clip that is the editing target photographed by the video camera 22, and the number described on the bar corresponds to the description position. The frame number of the video shot at the shooting time is shown. That is, in the example of FIG. 14, the number of frames of the fourth clip video is 2415 frames, and frame numbers are assigned from “0” to “2414” in order to each frame.

図１４Ｂの上段の棒は、図１４Ａに示した第１のクリップと第４のクリップを編集対象として非破壊編集が行われた結果得られる編集結果の映像の長さを示しており、棒の上に記述されている数字は、その記述位置に対応する編集結果上の映像のフレーム番号を示している。 The upper bar in FIG. 14B shows the length of the video of the editing result obtained as a result of non-destructive editing performed on the first clip and the fourth clip shown in FIG. 14A. The numbers described above indicate the frame number of the video on the editing result corresponding to the description position.

即ち、図１４の例では、ユーザが図１４Ａに示した第１のクリップのフレーム番号「284」を映像のイン点として指定し、フレーム番号「564」を映像のアウト点として指定している。これにより、図１４Ｂの上段に示すように、編集結果には、第１のクリップのフレーム番号「284」から「564」までの編集区間の映像データが、編集結果のフレーム番号「0」から「280」までの映像データ（以下、第１の映像サブクリップという）として含まれる。 That is, in the example of FIG. 14, the user designates frame number "284" of the first clip shown in FIG. 14A as the video In point and frame number "564" as the video Out point. As a result, as shown in the upper part of FIG. 14B, the video data of the editing section from frame number "284" to "564" of the first clip is included in the editing result as the video data from frame number "0" to "280" of the editing result (hereinafter referred to as the first video sub-clip).

また、図１４の例では、ユーザが図１４Ａに示した第４のクリップのフレーム番号「454」を映像のイン点として指定し、フレーム番号「1054」を映像のアウト点として指定している。これにより、図１４Ｂの上段に示すように、編集結果には、第４のクリップのフレーム番号「454」から「1054」までの編集区間の映像データが、編集結果のフレーム番号「280」から「880」までの映像データ（以下、第２の映像サブクリップという）として含まれる。 In the example of FIG. 14, the user also designates frame number "454" of the fourth clip shown in FIG. 14A as the video In point and frame number "1054" as the video Out point. As a result, as shown in the upper part of FIG. 14B, the video data of the editing section from frame number "454" to "1054" of the fourth clip is included in the editing result as the video data from frame number "280" to "880" of the editing result (hereinafter referred to as the second video sub-clip).

さらに、図１４の例では、ユーザが図１４Ａに示した第１のクリップのフレーム番号「1164」を映像のイン点として指定し、フレーム番号「1644」を映像のアウト点として指定している。これにより、図１４Ｂの上段に示すように、編集結果には、第１のクリップのフレーム番号「1164」から「1644」までの編集区間の映像データが、編集結果のフレーム番号「880」から「1360」までの映像データ（以下、第３の映像サブクリップという）として含まれる。 Furthermore, in the example of FIG. 14, the user designates frame number "1164" of the first clip shown in FIG. 14A as the video In point and frame number "1644" as the video Out point. As a result, as shown in the upper part of FIG. 14B, the video data of the editing section from frame number "1164" to "1644" of the first clip is included in the editing result as the video data from frame number "880" to "1360" of the editing result (hereinafter referred to as the third video sub-clip).

また、図１４の例では、ユーザが図１４Ａに示した第４のクリップのフレーム番号「1534」を映像のイン点として指定し、フレーム番号「1974」を映像のアウト点として指定している。これにより、図１４Ｂの上段に示すように、編集結果には、第４のクリップのフレーム番号「1534」から「1974」までの編集区間の映像データが、編集結果のフレーム番号「1360」から「1800」までの映像データ（以下、第４の映像サブクリップという）として含まれる。 In the example of FIG. 14, the user also designates frame number "1534" of the fourth clip shown in FIG. 14A as the video In point and frame number "1974" as the video Out point. As a result, as shown in the upper part of FIG. 14B, the video data of the editing section from frame number "1534" to "1974" of the fourth clip is included in the editing result as the video data from frame number "1360" to "1800" of the editing result (hereinafter referred to as the fourth video sub-clip).

さらに、図１４の例では、ユーザが図１４Ａに示した第１のクリップのフレーム番号「284」を音声のイン点として指定し、フレーム番号「2084」を音声のアウト点として指定している。これにより、図１４Ｂと図１５の下段に示すように、編集結果には、第１のクリップのフレーム番号「284」から「2084」までの編集区間の音声データが、編集結果のフレーム番号「0」から「1800」までの音声データ（以下、音声サブクリップという）として含まれる。 Furthermore, in the example of FIG. 14, the user designates frame number "284" of the first clip shown in FIG. 14A as the audio In point and frame number "2084" as the audio Out point. As a result, as shown in the lower parts of FIG. 14B and FIG. 15, the audio data of the editing section from frame number "284" to "2084" of the first clip is included in the editing result as the audio data from frame number "0" to "1800" of the editing result (hereinafter referred to as the audio sub-clip).
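
The frame numbers on the editing result follow from laying the editing sections end to end: each sub-clip starts where the previous one ended and lasts Out point minus In point frames. The Python sketch below reproduces the FIG. 14B numbers from the In and Out points given above. The function name `place_subclips` and the tuple layout are assumptions for illustration only.

```python
def place_subclips(cuts):
    """Lay editing sections end to end on the editing-result timeline.

    cuts: list of (clip_no, in_frame, out_frame) in playback order.
    Returns a list of (clip_no, in_frame, out_frame, result_begin, result_end).
    """
    timeline = []
    cursor = 0  # next free frame number on the editing result
    for clip_no, in_frame, out_frame in cuts:
        length = out_frame - in_frame
        timeline.append((clip_no, in_frame, out_frame, cursor, cursor + length))
        cursor += length
    return timeline

# In and Out points designated by the user in the FIG. 14 example.
VIDEO_CUTS = [(1, 284, 564), (4, 454, 1054), (1, 1164, 1644), (4, 1534, 1974)]
AUDIO_CUTS = [(1, 284, 2084)]

video_timeline = place_subclips(VIDEO_CUTS)
audio_timeline = place_subclips(AUDIO_CUTS)

for clip_no, in_f, out_f, begin, end in video_timeline:
    print(f"video: clip {clip_no} frames {in_f}-{out_f} -> result {begin}-{end}")
for clip_no, in_f, out_f, begin, end in audio_timeline:
    print(f"audio: clip {clip_no} frames {in_f}-{out_f} -> result {begin}-{end}")
```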

ここで、図１４Ａに示したように、第１のクリップのフレーム番号「0」から「564」までの音声データに対応する発言者ＩＤは「Ａ」であるので、図１４Ｂの下段に示すように、第１のクリップのフレーム番号「284」から「564」までの音声データである、編集結果のフレーム番号「0」から「280」までの音声データに対応する発言者ＩＤは「Ａ」である。 Here, as shown in FIG. 14A, the speaker ID corresponding to the audio data from frame number "0" to "564" of the first clip is "A". Therefore, as shown in the lower part of FIG. 14B, the speaker ID corresponding to the audio data from frame number "0" to "280" of the editing result, which is the audio data from frame number "284" to "564" of the first clip, is "A".

また、図１４Ａに示したように、第１のクリップのフレーム番号「564」から「924」までの音声データに対応する発言者ＩＤは「Ｂ」であるので、図１４Ｂの下段に示すように、第１のクリップのフレーム番号「564」から「924」までの音声データである、編集結果のフレーム番号「280」から「640」までの音声データに対応する発言者ＩＤは「Ｂ」である。 Also, as shown in FIG. 14A, the speaker ID corresponding to the audio data from frame number "564" to "924" of the first clip is "B". Therefore, as shown in the lower part of FIG. 14B, the speaker ID corresponding to the audio data from frame number "280" to "640" of the editing result, which is the audio data from frame number "564" to "924" of the first clip, is "B".

さらに、図１４Ａに示したように、第１のクリップのフレーム番号「924」から「1804」までの音声データに対応する発言者ＩＤは「Ａ」であるので、図１４Ｂの下段に示すように、第１のクリップのフレーム番号「924」から「1804」までの音声データである、編集結果のフレーム番号「640」から「1520」までの音声データに対応する発言者ＩＤは「Ａ」である。 Furthermore, as shown in FIG. 14A, the speaker ID corresponding to the audio data from frame number "924" to "1804" of the first clip is "A". Therefore, as shown in the lower part of FIG. 14B, the speaker ID corresponding to the audio data from frame number "640" to "1520" of the editing result, which is the audio data from frame number "924" to "1804" of the first clip, is "A".

また、図１４Ａに示したように、第１のクリップのフレーム番号「1804」から「2100」までの音声データに対応する発言者ＩＤは「Ｂ」であるので、図１４Ｂの下段に示すように、第１のクリップのフレーム番号「1804」から「2084」までの音声データである、編集結果のフレーム番号「1520」から「1800」までの音声データに対応する発言者ＩＤは「Ｂ」である。 Also, as shown in FIG. 14A, the speaker ID corresponding to the audio data from frame number "1804" to "2100" of the first clip is "B". Therefore, as shown in the lower part of FIG. 14B, the speaker ID corresponding to the audio data from frame number "1520" to "1800" of the editing result, which is the audio data from frame number "1804" to "2084" of the first clip, is "B".
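
The mapping of speaker sections from the clip onto the editing result amounts to clipping each section to the audio editing section and shifting it by the audio In point. The Python sketch below reproduces the four sections described above and then selects the sections of a designated speaker, which is the kind of information the edit list needs in order to mark sections for duck voice processing. It handles only the single audio sub-clip of this example; the function names are hypothetical and the sketch is not the apparatus's actual implementation.

```python
# Speaker sections on the first clip, as shown in the middle bar of FIG. 14A.
CLIP_SECTIONS = [("A", 0, 564), ("B", 564, 924), ("A", 924, 1804), ("B", 1804, 2100)]

def sections_in_result(clip_sections, audio_in, audio_out):
    """Map speaker sections of a clip onto the editing-result timeline.

    Each section is clipped to [audio_in, audio_out] and shifted so that
    audio_in becomes frame 0 of the editing result.
    """
    result = []
    for speaker_id, start, end in clip_sections:
        lo, hi = max(start, audio_in), min(end, audio_out)
        if lo < hi:  # keep only sections that overlap the editing section
            result.append((speaker_id, lo - audio_in, hi - audio_in))
    return result

def sections_for_speaker(result_sections, speaker_id):
    """Frame ranges on the editing result spoken by the designated speaker."""
    return [(start, end) for sid, start, end in result_sections if sid == speaker_id]

result_sections = sections_in_result(CLIP_SECTIONS, audio_in=284, audio_out=2084)
duck_voice_sections = sections_for_speaker(result_sections, "B")

print(result_sections)
print(duck_voice_sections)
```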

As described above, in the example of FIG. 14, the in point and out point of the third video sub-clip differ from the speaker switching points. That is, in the editing result shown in FIG. 14B, as shown in FIG. 15, video showing only the speaker “Mr. ××” is displayed at the beginning and at the end of the speech by the speaker “Mr. ○○”, which runs from frame number “640” to “1520” of the editing result.

FIG. 16 shows the edit list of the editing result shown in FIG. 14B and FIG. 15.

That is, FIG. 16 is a diagram showing a specific description example of the edit list file (FIG. 11) written in XML. In FIG. 16, the number at the beginning of each line is added for convenience of explanation and is not part of the XML description. The same applies to FIG. 19, described later.

The edit list file is a file containing the edit list for the editing result, and it also describes how the editing result is to be reproduced.

As shown in FIG. 16, the XML description of the edit list file consists mainly of a body part enclosed by body tags (<body> </body>), which are in turn enclosed by smil tags (<smil> </smil>). In the example of FIG. 16, this body part is described in the 3rd to 16th lines. The description “name="Initial-EditList"” on the 2nd line indicates that this file is an edit list file.

The body part describes information related to the temporal behavior of the edit description. In the example of FIG. 16, the par element described between the start tag “<par>” on the 4th line and the end tag “</par>” on the 15th line defines a simple time group in which a plurality of elements are reproduced simultaneously.

In the example of FIG. 16, the following are defined to be reproduced simultaneously: the first cut (written as Cut1 in FIG. 16; the first video sub-clip of FIG. 14B), the second cut (written as Cut2; the second video sub-clip of FIG. 14B), the third cut (written as Cut3; the third video sub-clip of FIG. 14B), the fourth cut (written as Cut4; the fourth video sub-clip of FIG. 14B), and the audio (written as “audio in Cam1-Clip.mxf” in FIG. 16; the audio sub-clip of FIG. 14B).

However, in the example of FIG. 16, as described later, the reproduction start times of the first to fourth video sub-clips are offset from one another, so in practice the first to fourth video sub-clips are reproduced one after another.

Specifically, in FIG. 16, the video elements on the 6th, 8th, 10th, and 12th lines describe the clip file referenced as the video of the editing result, the reproduction range of that clip file, and so on.

The description “src="Cam1-Clip1.mxf"” on the 6th line indicates that the referenced clip file is the first clip recorded by the video camera 21.

The description “clipBegin="284"” on the 6th line indicates, as a frame number of the first clip, the position at which reproduction of the video starts for the first video sub-clip. The description “clipEnd="564"” on the 6th line indicates, as a frame number of the first clip, the position at which reproduction of the video ends for the first video sub-clip.

Further, the description “begin="0"” that follows on the 6th line indicates, as a frame number on the editing result, the position at which the first video sub-clip starts in the editing result. The description “end="280"” on the 6th line indicates, as a frame number on the editing result, the position at which the first video sub-clip ends in the editing result.

In this way, in the example of FIG. 16, the edit list describes that the video of the first clip from the frame with frame number “284” to the frame with frame number “564” is reproduced as the video of the editing result from the frame with frame number “0” to the frame with frame number “280”.

The second video sub-clip is described on the 8th line in the same manner as the first video sub-clip. In the example of FIG. 16, the edit list describes that the video of the fourth clip recorded by the video camera 22, from the frame with frame number “454” to the frame with frame number “1054”, is reproduced as the video of the editing result from the frame with frame number “280” to the frame with frame number “880”.

The third video sub-clip is described on the 10th line in the same manner as the first and second video sub-clips. In the example of FIG. 16, the edit list describes that the video of the first clip from the frame with frame number “1164” to the frame with frame number “1644” is reproduced as the video of the editing result from the frame with frame number “880” to the frame with frame number “1360”.

The fourth video sub-clip is described on the 12th line in the same manner as the first to third video sub-clips. In the example of FIG. 16, the edit list describes that the video of the fourth clip from the frame with frame number “1534” to the frame with frame number “1974” is reproduced as the video of the editing result from the frame with frame number “1360” to the frame with frame number “1800”.

Further, in FIG. 16, the audio element on the 14th line describes the clip file referenced as the audio of the editing result, the reproduction range of that clip file, and so on. The description “src="Cam1-Clip1.mxf"” on the 14th line indicates that the referenced clip file is the first clip recorded by the video camera 21.

The description “channel=1” on the 14th line indicates the channel on which the audio of the first clip is reproduced. The description “clipBegin="284"” on the 14th line indicates, as a frame number of the first clip, the position at which reproduction of the audio starts for the audio sub-clip. The description “clipEnd="2084"” on the 14th line indicates, as a frame number of the first clip, the position at which reproduction of the audio ends for the audio sub-clip.

Further, the description “begin="0"” that follows on the 14th line indicates, as a frame number on the editing result, the position at which the audio sub-clip starts in the editing result. The description “end="1800"” on the 14th line indicates, as a frame number on the editing result, the position at which the audio sub-clip ends in the editing result.

As described above, in the example of FIG. 16, the edit list describes that the audio of the first clip from the frame with frame number “284” to the frame with frame number “2084” is reproduced as the channel 1 audio of the editing result from the frame with frame number “0” to the frame with frame number “1800”.

Therefore, according to the edit list of FIG. 16, as shown in FIG. 14B, the first to fourth video sub-clips are reproduced in succession as the video of the editing result from the frame with frame number “0” to the frame with frame number “1800”. At the same time, the audio sub-clip is reproduced as the channel 1 audio of the editing result from the frame with frame number “0” to the frame with frame number “1800”.
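The frame numbers described for FIG. 16 can be checked for internal consistency. The Python sketch below is only an illustration under stated assumptions: the `Entry` class, the `check` function, and the labels `clip1` and `clip4` are hypothetical names for this example, not the actual file format or file names of the edit list. It verifies that each entry uses as many source frames as it occupies on the editing result, and that the four video sub-clips follow one another without gaps.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    kind: str        # "video" or "audio"
    source: str      # label of the referenced clip (illustrative, not a file name)
    clip_begin: int  # clipBegin: first frame used, in clip frame numbers
    clip_end: int    # clipEnd: last frame used, in clip frame numbers
    begin: int       # begin: start position on the editing result
    end: int         # end: end position on the editing result


# Values as described for lines 6, 8, 10, 12 and 14 of FIG. 16.
EDIT_LIST = [
    Entry("video", "clip1", 284, 564, 0, 280),
    Entry("video", "clip4", 454, 1054, 280, 880),
    Entry("video", "clip1", 1164, 1644, 880, 1360),
    Entry("video", "clip4", 1534, 1974, 1360, 1800),
    Entry("audio", "clip1", 284, 2084, 0, 1800),
]


def check(entries):
    """Return a list of problems found; an empty list means consistent."""
    problems = []
    for e in entries:
        # The source range and the result range must have the same length.
        if e.clip_end - e.clip_begin != e.end - e.begin:
            problems.append(f"length mismatch in {e}")
    videos = sorted((e for e in entries if e.kind == "video"), key=lambda e: e.begin)
    for prev, cur in zip(videos, videos[1:]):
        # Consecutive video sub-clips must meet exactly.
        if prev.end != cur.begin:
            problems.append(f"gap or overlap at result frame {prev.end}")
    return problems


print(check(EDIT_LIST))  # []
```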

Next, the speaker EM(start) and speaker EM(end) attached to the editing result shown in FIG. 14B and FIG. 15 are described with reference to FIG. 17. In FIG. 17, the horizontal axis represents the frame number.

The upper part of FIG. 17 shows the speaker EM(start) and speaker EM(end) attached to the first clip, as described in the electronic mark data of FIG. 13. That is, as shown in the upper part of FIG. 17, in the first clip a speaker EM(start) with speaker ID “A” is attached to the frame with frame number “0” (A11s), and a speaker EM(end) with speaker ID “A” is attached to the frame with frame number “564” (A11e).

Likewise, as shown in the upper part of FIG. 17, in the first clip a speaker EM(start) with speaker ID “B” is attached to the frame with frame number “564” (B11s), and a speaker EM(end) with speaker ID “B” is attached to the frame with frame number “924” (B11e).

Further, as shown in the upper part of FIG. 17, in the first clip a speaker EM(start) with speaker ID “A” is attached to the frame with frame number “924” (A12s), and a speaker EM(end) with speaker ID “A” is attached to the frame with frame number “1804” (A12e).

Likewise, as shown in the upper part of FIG. 17, in the first clip a speaker EM(start) with speaker ID “B” is attached to the frame with frame number “1804” (B12s), and a speaker EM(end) with speaker ID “B” is attached to the frame with frame number “2100” (B12e).

When non-destructive editing that yields the editing result of FIG. 14B and FIG. 15 is performed on the first clip to which the above speaker EM(start) and speaker EM(end) are attached, the speaker EM(start) attached to the frame immediately preceding the frame of the first clip designated as the audio in point is attached to the frame on the editing result corresponding to that in point.

In the example of FIG. 17, the speaker EM(start) with speaker ID “A”, which is attached to the frame with frame number “0” immediately preceding the frame with frame number “284” of the first clip designated as the audio in point, is attached to the frame with frame number “0” on the editing result corresponding to that in point (A21s).

In addition, the electronic marks attached to the frames between the frame of the first clip designated as the audio in point and the frame of the first clip designated as the audio out point are attached to the corresponding frames on the editing result.

In the example of FIG. 17, the speaker EM(end) with speaker ID “A” and the speaker EM(start) with speaker ID “B”, which are attached to the frame with frame number “564” lying between the frame with frame number “284” of the first clip designated as the audio in point and the frame with frame number “2084” of the first clip designated as the audio out point, are attached to the corresponding frame with frame number “280” on the editing result (A21e, B21s).

Likewise, the speaker EM(start) with speaker ID “A” and the speaker EM(end) with speaker ID “B”, which are attached to the frame with frame number “924”, are attached to the corresponding frame with frame number “640” on the editing result (A22s, B21e). Further, the speaker EM(end) with speaker ID “A” and the speaker EM(start) with speaker ID “B”, which are attached to the frame with frame number “1804”, are attached to the corresponding frame with frame number “1520” on the editing result (A22e, B22s).

Further, the speaker EM(end) attached to the frame immediately following the frame of the first clip designated as the audio out point is attached to the frame on the editing result corresponding to that out point. In the example of FIG. 17, the speaker EM(end) with speaker ID “B”, which is attached to the frame with frame number “2100” immediately following the frame with frame number “2084” of the first clip designated as the audio out point, is attached to the frame with frame number “1800” on the editing result corresponding to that out point (B22e).
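The three rules illustrated with FIG. 17 (carry over the start mark preceding the in point, copy the marks between the in and out points with shifted positions, and carry over the end mark following the out point) can be sketched in Python as follows. This is a minimal illustration, not the embodiment's implementation: the function name `make_result_marks` and the tuple layout are assumptions, and “immediately preceding/following” is interpreted here as the nearest marked frame at or before the in point and at or after the out point, which is what the example values suggest.

```python
# Marks on the first clip as (frame, speaker_id, kind), per the upper part of FIG. 17.
CLIP_MARKS = [
    (0, "A", "start"), (564, "A", "end"), (564, "B", "start"),
    (924, "B", "end"), (924, "A", "start"), (1804, "A", "end"),
    (1804, "B", "start"), (2100, "B", "end"),
]


def make_result_marks(clip_marks, audio_in, audio_out):
    """Build the marks of the editing result from clip marks sorted by frame
    (hypothetical helper; the interpretation of the rules is an assumption)."""
    result = []
    # Rule 1: the nearest start mark at or before the in point goes to frame 0.
    starts_before = [m for m in clip_marks if m[2] == "start" and m[0] <= audio_in]
    if starts_before:
        result.append((0, starts_before[-1][1], "start"))
    # Rule 2: marks strictly between the in and out points are copied,
    # with positions shifted so that the in point becomes frame 0.
    for frame, speaker, kind in clip_marks:
        if audio_in < frame < audio_out:
            result.append((frame - audio_in, speaker, kind))
    # Rule 3: the nearest end mark at or after the out point goes to the
    # frame of the editing result that corresponds to the out point.
    ends_after = [m for m in clip_marks if m[2] == "end" and m[0] >= audio_out]
    if ends_after:
        result.append((audio_out - audio_in, ends_after[0][1], "end"))
    return result


for mark in make_result_marks(CLIP_MARKS, 284, 2084):
    print(mark)
```

With the in point “284” and out point “2084”, the sketch yields marks at frames 0, 280, 280, 640, 640, 1520, 1520, and 1800, corresponding to A21s, A21e, B21s, B21e, A22s, A22e, B22s, and B22e.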

The electronic mark data describing the speaker EM(start) and speaker EM(end) attached to the editing result in this way is as shown in FIG. 18.

In the example of FIG. 18, the electronic mark table part enclosed by the electronic mark table tags (<EssenceMark Table> </EssenceMark Table>) is described in the 2nd to 11th lines.

The description “targetMedia="Initial-EditList"” on the 2nd line of FIG. 18 indicates that this electronic mark data describes the electronic marks attached to the editing result.

The description “EssenceMark value="Speaker-A:start" frameCount="0"” on the 3rd line indicates that the feature represented by this electronic mark is the start position of speech by the speaker with speaker ID “A”, and that the mark is attached to the 0th frame from the beginning of the editing result.

The description “EssenceMark value="Speaker-A:end" frameCount="280"” on the 4th line indicates that the feature represented by this electronic mark is the end position of speech by the speaker with speaker ID “A”, and that the mark is attached to the 280th frame from the beginning of the editing result.

Similarly, the descriptions “EssenceMark value="Speaker-B:start" frameCount="280"” on the 5th line, “EssenceMark value="Speaker-A:start" frameCount="640"” on the 7th line, and “EssenceMark value="Speaker-B:start" frameCount="1520"” on the 9th line indicate that the features represented by these electronic marks are the start positions of speech by the speaker with speaker ID “B”, the speaker with speaker ID “A”, and the speaker with speaker ID “B”, respectively, and that the marks are attached to the 280th, 640th, and 1520th frames from the beginning of the editing result, respectively.

Likewise, the descriptions “EssenceMark value="Speaker-B:end" frameCount="640"” on the 6th line, “EssenceMark value="Speaker-A:end" frameCount="1520"” on the 8th line, and “EssenceMark value="Speaker-B:end" frameCount="1800"” on the 10th line indicate that the features represented by these electronic marks are the end positions of speech by the speaker with speaker ID “B”, the speaker with speaker ID “A”, and the speaker with speaker ID “B”, respectively, and that the marks are attached to the 640th, 1520th, and 1800th frames from the beginning of the editing result, respectively.
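For illustration, the value strings and frame counts described for FIG. 18 can be paired into per-speaker speech sections. The Python sketch below assumes the marks have already been read into `(value, frameCount)` pairs; the function name `speaker_sections` and this in-memory representation are assumptions for the example and not part of the electronic mark data format itself.

```python
# (value, frameCount) pairs as described for lines 3 to 10 of FIG. 18.
MARKS = [
    ("Speaker-A:start", 0), ("Speaker-A:end", 280),
    ("Speaker-B:start", 280), ("Speaker-B:end", 640),
    ("Speaker-A:start", 640), ("Speaker-A:end", 1520),
    ("Speaker-B:start", 1520), ("Speaker-B:end", 1800),
]


def speaker_sections(marks):
    """Pair each start mark with the next end mark of the same speaker and
    return {speaker_id: [(start_frame, end_frame), ...]} (hypothetical helper)."""
    open_start = {}  # speaker_id -> frame of the start mark not yet closed
    sections = {}
    for value, frame in marks:
        name, kind = value.split(":")[:2]   # "Speaker-A", "start"
        speaker = name.split("-", 1)[1]     # "A"
        if kind == "start":
            open_start[speaker] = frame
        elif kind == "end" and speaker in open_start:
            sections.setdefault(speaker, []).append((open_start.pop(speaker), frame))
    return sections


print(speaker_sections(MARKS))
# {'A': [(0, 280), (640, 1520)], 'B': [(280, 640), (1520, 1800)]}
```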

FIG. 19 shows an example of the edit list in the case where duck voice processing is applied to the voice of the speaker with speaker ID “B” in the editing result shown in FIG. 14B and FIG. 15.

In the edit list of FIG. 19, audio filter parts enclosed by audio filter tags (<audioFilter> </audioFilter>) are provided after the audio element on the 14th line of FIG. 16. Each audio filter part describes information designating a section of audio to which predetermined processing is applied.

Specifically, the first audio filter part provided after the audio element on the 14th line is described in the 15th to 18th lines, and the second audio filter part is described in the 19th to 22nd lines.

The description “type="duckVoice"” on the 15th line indicates that duck voice processing is applied. The description “begin="280"” that follows on the 15th line indicates, as a frame number on the editing result, the start position of the audio to be subjected to duck voice processing. As shown in FIG. 18, the speaker EM(start) indicating the first start position of speech by the speaker with speaker ID “B” is attached to the frame with frame number “280” of the editing result, so the description “begin="280"” on the 15th line gives frame number “280” as the start position of the audio to be subjected to duck voice processing.

The description “end="640"” on the 15th line indicates, as a frame number on the editing result, the end position of the audio to be subjected to duck voice processing. As shown in FIG. 18, the speaker EM(end) indicating the first end position of speech by the speaker with speaker ID “B” is attached to the frame with frame number “640” of the editing result, so the description “end="640"” on the 15th line gives frame number “640” as the end position of the audio to be subjected to duck voice processing.

As described above, the descriptions “begin="280"” and “end="640"” on the 15th line designate the section from the 280th frame to the 640th frame, which is a section of the voice of the speaker with speaker ID “B”, as a section to be subjected to duck voice processing.

Further, the param elements on the 16th and 17th lines describe the setting values of parameters for the duck voice processing. Specifically, the description “name="pitch"” on the 16th line indicates that the parameter being set is the pitch, and the description “value="0.5"” on the 16th line indicates that its setting value is 0.5.

The description “name="formant"” on the 17th line indicates that the parameter being set is the formant, and the description “value="1.0"” on the 17th line indicates that its setting value is 1.0.

Similarly, the 19th line describes, as the start position and end position of the audio to be subjected to duck voice processing, the frame number of the frame to which the second speaker EM(start) with speaker ID “B” is attached and the frame number of the frame to which the corresponding speaker EM(end) is attached, both as described in the electronic mark data of FIG. 18. The 20th and 21st lines describe 0.5 as the pitch setting value and 1.0 as the formant setting value, these being the parameters of this duck voice processing.
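The derivation of the audio filter parts from the electronic mark data can be sketched as follows. This Python example is an illustration only: the element and attribute names (`audioFilter`, `type`, `begin`, `end`, `param`, `name`, `value`) follow the description of FIG. 19, while the function name `duck_voice_filters`, the `(value, frameCount)` input representation, and the use of Python's standard `xml.etree.ElementTree` module are assumptions made for this sketch.

```python
import xml.etree.ElementTree as ET

# (value, frameCount) pairs as described for FIG. 18.
MARKS = [
    ("Speaker-A:start", 0), ("Speaker-A:end", 280),
    ("Speaker-B:start", 280), ("Speaker-B:end", 640),
    ("Speaker-A:start", 640), ("Speaker-A:end", 1520),
    ("Speaker-B:start", 1520), ("Speaker-B:end", 1800),
]


def duck_voice_filters(marks, speaker_id, pitch="0.5", formant="1.0"):
    """Create one audioFilter element per speech section of the designated
    speaker (hypothetical helper; parameter defaults follow FIG. 19)."""
    filters = []
    start = None
    for value, frame in marks:
        if value == f"Speaker-{speaker_id}:start":
            start = frame  # remember where this speaker's section begins
        elif value == f"Speaker-{speaker_id}:end" and start is not None:
            elem = ET.Element(
                "audioFilter",
                {"type": "duckVoice", "begin": str(start), "end": str(frame)},
            )
            ET.SubElement(elem, "param", {"name": "pitch", "value": pitch})
            ET.SubElement(elem, "param", {"name": "formant", "value": formant})
            filters.append(elem)
            start = None
    return filters


for elem in duck_voice_filters(MARKS, "B"):
    print(ET.tostring(elem, encoding="unicode"))
```

For speaker ID “B” this produces two filter elements, one for frames 280 to 640 and one for frames 1520 to 1800, each with a pitch of 0.5 and a formant of 1.0.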

FIG. 20 shows an example of the electronic mark data of the editing result in the case where duck voice processing is applied to the voice of the speaker with speaker ID “B” in the editing result shown in FIG. 14B and FIG. 15.

In the example of FIG. 20, the electronic mark table part enclosed by the electronic mark table tags (<EssenceMark Table> </EssenceMark Table>) is described in the 2nd to 11th lines.

The description “targetMedia="Initial-EditList"” on the 2nd line of FIG. 20 indicates that this electronic mark data describes the electronic marks attached to the editing result.

The description “EssenceMark value="Speaker-A:start:normal" frameCount="0"” on the 3rd line indicates that the feature represented by this electronic mark is the start position of speech by the speaker with speaker ID “A”, that this speech is output as it is without duck voice processing, and that the mark is attached to the 0th frame from the beginning of the editing result.

The description “EssenceMark value="Speaker-A:end:normal" frameCount="280"” on the 4th line indicates that the feature represented by this electronic mark is the end position of speech by the speaker with speaker ID “A”, that this speech is output as it is without duck voice processing, and that the mark is attached to the 280th frame from the beginning of the editing result.

The description “EssenceMark value="Speaker-B:start:duckVoice" frameCount="280"” on the 5th line indicates that the feature represented by this electronic mark is the start position of speech by the speaker with speaker ID “B”, that this speech is output after duck voice processing, and that the mark is attached to the 280th frame from the beginning of the editing result.

The description “EssenceMark value="Speaker-B:end:duckVoice" frameCount="640"” on the 6th line indicates that the feature represented by this electronic mark is the end position of speech by the speaker with speaker ID “B”, that this speech is output after duck voice processing, and that the mark is attached to the 640th frame from the beginning of the editing result.

Similarly, the descriptions on the 7th to 10th lines indicate that duck voice processing is not applied to the speech by the speaker with speaker ID “A” from frame number “640” to frame number “1520”, and that duck voice processing is applied to the speech by the speaker with speaker ID “B” from frame number “1520” to frame number “1800”.
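The relationship between the mark values of FIG. 18 and those of FIG. 20 can be sketched as appending a processing label to each value, depending on whether the speaker has been designated for duck voice processing. The Python example below is a hedged illustration; the function name `annotate` and the `(value, frameCount)` representation are assumptions for this sketch, while the suffixes `normal` and `duckVoice` are taken from the description of FIG. 20.

```python
# (value, frameCount) pairs as described for FIG. 18.
MARKS = [
    ("Speaker-A:start", 0), ("Speaker-A:end", 280),
    ("Speaker-B:start", 280), ("Speaker-B:end", 640),
    ("Speaker-A:start", 640), ("Speaker-A:end", 1520),
    ("Speaker-B:start", 1520), ("Speaker-B:end", 1800),
]


def annotate(marks, duck_speakers):
    """Append ':duckVoice' or ':normal' to each mark value depending on
    whether its speaker ID is in duck_speakers (hypothetical helper)."""
    annotated = []
    for value, frame in marks:
        speaker = value.split(":")[0].split("-", 1)[1]  # "Speaker-B:start" -> "B"
        mode = "duckVoice" if speaker in duck_speakers else "normal"
        annotated.append((f"{value}:{mode}", frame))
    return annotated


for value, frame in annotate(MARKS, {"B"}):
    print(value, frame)
```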

Next, the editing work that the user performs using the editing device 41 is described with reference to FIG. 21.

The table of FIG. 21 lists, for each step number of the editing work, the content of the editing work in that step, the main processing performed by the editing device 41, and the data subject to that processing.

As shown in FIG. 21, in step S51 the user loads the optical disc 31 into the optical disc drive 41A of the editing device 41 and instructs display of the input screen (FIG. 12). At this time, the adding unit 151 of the editing device 41 causes the display device 126 to display the input screen based on the speaker list registered in advance on the hard disk 128.

In step S52, the user operates the operation unit 124 to instruct reproduction of a clip recorded on the optical disc 31. At this time, the adding unit 151 of the editing device 41 reproduces the clip file of that clip from the optical disc 31. As a result, the audio of the clip is output from the speaker 125 and the video is displayed on the display device 126.

In step S53, the user listens to the audio of the clip and, when the speech of each speaker is heard, operates the operation unit 124 on the input screen to input the speaker ID of that speaker. At this time, the adding unit 151 adds the input speaker ID to the speaker-undetermined EM(start) attached to the frame immediately preceding the frame corresponding to the audio being reproduced, and to the speaker-undetermined EM(end) attached to the frame immediately following it, both of which are described in the electronic mark data of the clip.

In step S54, the user operates the operation unit 124 to instruct display of the editing screen. At this time, the edit list creation unit 152 causes the display device 126 to display the editing screen and causes the speaker 125 to output the audio of the clip, based on the proxy data of the proxy file.

In step S55, the user operates the operation unit 124 to perform editing by designating the in points and out points of the video and audio on the editing screen. At this time, the edit list creation unit 152 creates an edit list based on the in points and out points of the video and audio designated by the user. The edit list creation unit 152 then records the edit list in the edit list file on the optical disc 31 and supplies it to the EM creation unit 153.

The EM creation unit 153 creates the electronic mark data of the editing result based on the edit list supplied from the edit list creation unit 152 and on the electronic mark data of the clip, in which the speaker EM(start) and speaker EM(end) are described. It does so by interpolating a speaker EM(start) or speaker EM(end) at each audio cut point and by copying the speaker EM(start) and speaker EM(end) attached between the audio in point and out point to the corresponding positions on the editing result.

即ち、編集結果の電子マークデータは、クリップの電子マークデータの記述のうち、音声のイン点からアウト点までに付与されている発言者EM(start)または発言者EM(end)の記述を複写して、その発言者EM(start)または発言者EM(end)の付与位置の記述を変更し、さらに、音声のカット点に対応する編集結果上の位置に付与された発言者EM(start)または発言者EM(end)を、新たに記述することにより作成される。 That is, the electronic mark data of the editing result is created by copying, from the description of the electronic mark data of the clip, the descriptions of the speaker EM (start) or speaker EM (end) assigned between the audio In point and Out point, changing the description of the position to which each copied speaker EM (start) or speaker EM (end) is assigned, and newly describing the speaker EM (start) or speaker EM (end) assigned to the position on the editing result corresponding to each audio cut point.
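
The copy, reposition, and interpolate operations described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the format recorded in the NRT file: the `Mark` and `SubClip` types, their field names, the treatment of the Out point as exclusive, and the placement of an interpolated end mark on the last frame of the sub clip are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Mark:
    frame: int                  # frame number the mark is assigned to
    kind: str                   # "start" or "end"
    speaker_id: Optional[str]   # speaker ID attached to the mark


@dataclass
class SubClip:
    clip_in: int        # audio In point (frame number in the clip)
    clip_out: int       # audio Out point (assumed exclusive here)
    result_begin: int   # position of the In point on the editing result


def build_result_marks(clip_marks: List[Mark], sub_clips: List[SubClip]) -> List[Mark]:
    ordered = sorted(clip_marks, key=lambda m: m.frame)
    result: List[Mark] = []
    for sc in sub_clips:
        offset = sc.result_begin - sc.clip_in
        # Find the utterance, if any, that is already in progress at the In point.
        open_start: Optional[Mark] = None
        for m in ordered:
            if m.frame >= sc.clip_in:
                break
            open_start = m if m.kind == "start" else None
        # Interpolate a start mark at the cut point when speech is in progress.
        if open_start is not None:
            result.append(Mark(sc.result_begin, "start", open_start.speaker_id))
        # Copy the marks between the In and Out points, shifting their positions.
        for m in ordered:
            if sc.clip_in <= m.frame < sc.clip_out:
                result.append(Mark(m.frame + offset, m.kind, m.speaker_id))
                open_start = m if m.kind == "start" else None
        # Interpolate an end mark at the cut point when speech is still in progress.
        if open_start is not None:
            result.append(Mark(sc.clip_out - 1 + offset, "end", open_start.speaker_id))
    return result
```

With this sketch, an utterance that straddles an In point or Out point still yields a matched start and end pair on the editing result, which is what allows the per-speaker sections to be found later.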

そして、EM作成部１５３は、編集結果の電子マークデータを、光ディスク３１のエディットリスト用NRTファイルに記録させる。 Then, the EM creation unit 153 records the electronic mark data of the edited result in the edit list NRT file of the optical disc 31.

ステップＳ５６において、ユーザは、操作部１２４を操作することにより、編集結果においてダックボイス加工を施す発言の発言者の発言者ＩＤを指定する。このとき、エディットリスト作成部１５２は、ユーザにより指定された発言者ＩＤと、EM作成部１５３により作成された編集結果の電子マークデータに基づいて、ダックボイス加工を施す区間を特定し、その区間にダックボイス加工を施すことを、エディットリストファイルのエディットリストに記述する。 In step S56, the user operates the operation unit 124 to designate the speaker ID of the speaker whose speech in the editing result is to be subjected to duck voice processing. At this time, the edit list creation unit 152 identifies the section to be subjected to duck voice processing based on the speaker ID designated by the user and the electronic mark data of the editing result created by the EM creation unit 153, and describes in the edit list of the edit list file that duck voice processing is to be applied to that section.

ステップＳ５７において、ユーザは、操作部１２４を操作して、所望の発言者の発言にダックボイス加工を施した編集結果の再生を指令する。このとき、マイコン１１１のCPUは、ダックボイス加工を施すことが記述されたエディットリストにしたがって、光ディスク３１から編集結果を再生する。 In step S 57, the user operates the operation unit 124 to instruct reproduction of an edited result obtained by performing duck voice processing on a desired speaker's speech. At this time, the CPU of the microcomputer 111 reproduces the edited result from the optical disc 31 according to the edit list in which the duck voice processing is described.

具体的には、CPUは、ダックボイス加工を施すことが記述されたエディットリストにしたがって、所定のクリップの所定の区間の映像データおよび音声データを光ディスク３１から読み出す。そして、CPUは、読み出した音声データのうちの所定の発言者の発言に対応する音声データに対してダックボイス加工を施し、その結果得られる音声データを音声出力I/F１１５に供給することにより、編集結果の音声をスピーカ１２５から出力させる。また、CPUは、読み出した映像データを映像表示I/F１１７に供給することにより、編集結果の映像を表示装置１２６に表示させる。 Specifically, the CPU reads the video data and audio data of a predetermined section of a predetermined clip from the optical disc 31 according to an edit list in which duck voice processing is described. Then, the CPU performs duck voice processing on the voice data corresponding to the speech of a predetermined speaker among the read voice data, and supplies the resulting voice data to the voice output I / F 115. The edited sound is output from the speaker 125. Further, the CPU supplies the read video data to the video display I / F 117, thereby causing the display device 126 to display the edited video.

次に、図２２のフローチャートを参照して、図１０の付加部１５１による発言者ＩＤを発言者未定EM（start）と発言者未定EM（end）に付加する付加処理について説明する。この付加処理は、例えば、ユーザが操作部１２４を操作することにより、図１２の入力画面の表示を指令したとき開始される。 Next, with reference to the flowchart of FIG. 22, an adding process for adding the speaker ID by the adding unit 151 of FIG. 10 to the speaker undecided EM (start) and the speaker undecided EM (end) will be described. This addition processing is started, for example, when the user instructs display of the input screen in FIG. 12 by operating the operation unit 124.

ステップＳ７１において、付加部１５１は、予めハードディスク１２８に登録されている発言者リストに基づいて、表示装置１２６に入力画面を表示させる。ステップＳ７２において、付加部１５１は、ユーザにより光ディスク３１に記録されているクリップの再生が指令されたかどうかを判定する。ステップＳ７２で、再生が指令されていないと判定された場合、付加部１５１は、再生が指令されるまで待機する。 In step S 71, the adding unit 151 displays an input screen on the display device 126 based on the speaker list registered in the hard disk 128 in advance. In step S72, the adding unit 151 determines whether or not the user has instructed playback of the clip recorded on the optical disc 31. If it is determined in step S72 that the reproduction is not instructed, the adding unit 151 waits until the reproduction is instructed.

一方、ステップＳ７２で、クリップの再生が指令されたと判定された場合、ステップＳ７３において、付加部１５１は、そのクリップの再生を開始する。ステップＳ７４において、付加部１５１は、操作部I/F１１４から供給される操作信号に応じて、ユーザにより発言者ＩＤが入力されたかを判定する。 On the other hand, when it is determined in step S72 that the reproduction of the clip has been instructed, in step S73, the adding unit 151 starts reproduction of the clip. In step S 74, the adding unit 151 determines whether a speaker ID is input by the user according to the operation signal supplied from the operation unit I / F 114.

具体的には、ユーザは、操作部１２４を操作することにより入力画面においてカーソル１６０を移動し決定の指令を行う。操作部I/F１１４は、この操作により発言者ＩＤの入力を受け付け、その操作を表す操作信号を付加部１５１に供給する。付加部１５１は、この操作信号が供給された場合、ユーザにより発言者ＩＤが入力されたと判定する。 Specifically, the user operates the operation unit 124 to move the cursor 160 on the input screen and issue a determination command. The operation unit I / F 114 receives an input of a speaker ID through this operation, and supplies an operation signal representing the operation to the adding unit 151. When this operation signal is supplied, the adding unit 151 determines that the speaker ID is input by the user.

ステップＳ７４で、ユーザにより発言者ＩＤが入力されていないと判定された場合、付加部１５１は、発言者ＩＤが入力されるまで待機する。また、ステップＳ７４で、ユーザにより発言者ＩＤが入力されたと判定された場合、処理はステップＳ７５に進む。 If it is determined in step S74 that the speaker ID has not been input by the user, the adding unit 151 waits until the speaker ID is input. If it is determined in step S74 that the speaker ID has been input by the user, the process proceeds to step S75.

ステップＳ７５において、付加部１５１は、現在再生中のフレームのフレーム番号と入力された発言者ＩＤとに基づいて、発言者ＩＤの入力に対応する位置に付与された、現在再生中のフレームの直前の発言者未定EM（start）と直後の発言者未定EM(end)に、入力された発言者ＩＤを付加する。その結果、例えば図６に示したクリップの電子マークデータは、図１３に示したクリップの電子マークデータに変更される。 In step S75, based on the frame number of the frame currently being reproduced and the input speaker ID, the adding unit 151 adds the input speaker ID to the speaker undetermined EM (start) immediately before the frame currently being reproduced and to the speaker undetermined EM (end) immediately after it, which are assigned to the positions corresponding to the input of the speaker ID. As a result, for example, the electronic mark data of the clip shown in FIG. 6 is changed to the electronic mark data of the clip shown in FIG. 13.
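
As a rough illustration of step S75, the following sketch looks up the nearest start mark at or before the frame being reproduced and the nearest end mark at or after it, and attaches the input speaker ID to both. The `Mark` type, its field names, and the frame numbers in the usage lines are hypothetical; a `speaker_id` of `None` stands for a speaker undetermined EM.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Mark:
    frame: int                         # frame number the mark is assigned to
    kind: str                          # "start" or "end"
    speaker_id: Optional[str] = None   # None means the speaker is undetermined


def assign_speaker(marks: List[Mark], current_frame: int, speaker_id: str) -> None:
    # Start marks at or before the frame being reproduced.
    starts = [m for m in marks if m.kind == "start" and m.frame <= current_frame]
    # End marks at or after the frame being reproduced.
    ends = [m for m in marks if m.kind == "end" and m.frame >= current_frame]
    if starts:
        max(starts, key=lambda m: m.frame).speaker_id = speaker_id
    if ends:
        min(ends, key=lambda m: m.frame).speaker_id = speaker_id


# Hypothetical clip with two utterances; the user inputs ID "A" at frame 300.
marks = [Mark(100, "start"), Mark(500, "end"), Mark(700, "start"), Mark(1500, "end")]
assign_speaker(marks, 300, "A")
```

Only the pair of marks enclosing frame 300 receives the ID; the second utterance stays undetermined until the user inputs an ID while it is being reproduced.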

ステップＳ７６において、付加部１５１は、再生中のクリップが終端まで再生されたかを判定し、終端まで再生されていないと判定した場合、処理はステップＳ７４に戻り、上述した処理が繰り返される。 In step S76, the adding unit 151 determines whether the clip being played has been played to the end, and if it is determined that the clip has not been played to the end, the process returns to step S74 and the above-described process is repeated.

一方、ステップＳ７６において、再生中のクリップが終端まで再生されたと判定された場合、ステップＳ７７において、付加部１５１は、クリップの再生を終了する。そして処理は終了する。 On the other hand, when it is determined in step S76 that the clip being played has been played to the end, in step S77, the adding unit 151 ends the playback of the clip. Then, the process ends.

以上のように、編集装置４１は、発言者未定EM(start)と発言者未定EM（end）に発言者ＩＤを付加するので、編集結果のうちの所望の発言者の音声にダックボイス加工を施す場合に、この発言者ＩＤが付加された発言者EM(start)と発言者EM（end）により、ダックボイス加工を施す音声の区間を容易に認識することができる。 As described above, the editing device 41 adds the speaker ID to the speaker undetermined EM (start) and the speaker undetermined EM (end). Therefore, when duck voice processing is applied to the voice of a desired speaker in the editing result, the section of audio to be subjected to duck voice processing can be easily recognized from the speaker EM (start) and the speaker EM (end) to which the speaker ID has been added.

従って、発言者EM(start)と発言者EM（end）は、編集結果のうちの所望の発言者の音声に対してダックボイス加工を容易に施すために有用な電子マークであるといえる。 Therefore, the speaker EM (start) and the speaker EM (end) can be said to be useful electronic marks for easily performing duck voice processing on the voice of a desired speaker in the edited result.

次に、図２３を参照して、図１０の編集処理部１５０による、音声を非破壊編集する音声編集処理について説明する。この音声編集処理は、例えば、ユーザが操作部１２４を操作して、編集画面の表示を指令したとき、開始される。 Next, with reference to FIG. 23, the audio editing process for nondestructive editing of audio by the editing processing unit 150 of FIG. 10 will be described. This voice editing process is started, for example, when the user operates the operation unit 124 to instruct the display of the editing screen.

ステップＳ８１において、エディットリスト作成部１５２は、編集対象とするクリップのプロキシデータの光ディスク３１からの再生を開始する。その結果、表示装置１２６には、編集対象とするクリップの低解像度の映像が編集画面として表示され、スピーカ１２５から、そのクリップの音声が出力される。 In step S81, the edit list creation unit 152 starts reproduction of proxy data of the clip to be edited from the optical disc 31. As a result, the low-resolution video of the clip to be edited is displayed on the display device 126 as an editing screen, and the audio of the clip is output from the speaker 125.

ステップＳ８２において、エディットリスト作成部１５２は、編集画面においてユーザにより音声のイン点が入力されたかを判定し、音声のイン点が入力されていないと判定した場合、音声のイン点が入力されるまで待機する。 In step S82, the edit list creation unit 152 determines whether an audio In point has been input by the user on the editing screen. If it determines that no audio In point has been input, it waits until one is input.

一方、ステップＳ８２で音声のイン点が入力されたと判定された場合、ステップＳ８３において、エディットリスト作成部１５２は、現在再生中のフレームのフレーム番号を、音声サブクリップとして音声の再生を開始する位置として、エディットリストに記述する。例えば、図１４の例では、ユーザにより第１のクリップのフレーム番号「284」のフレームの再生中に音声のイン点が入力され、エディットリスト作成部１５２は、そのフレーム番号「284」をエディットリストに記述する。 On the other hand, if it is determined in step S82 that an audio In point has been input, in step S83 the edit list creation unit 152 describes the frame number of the frame currently being reproduced in the edit list as the position at which audio reproduction starts for an audio sub clip. For example, in the example of FIG. 14, the user inputs an audio In point during reproduction of the frame with frame number “284” of the first clip, and the edit list creation unit 152 describes that frame number “284” in the edit list.

ステップＳ８４において、エディットリスト作成部１５２は、編集画面においてユーザにより音声のアウト点が入力されたかを判定し、音声のアウト点が入力されていないと判定した場合、音声のアウト点が入力されるまで待機する。 In step S84, the edit list creation unit 152 determines whether an audio Out point has been input by the user on the editing screen. If it determines that no audio Out point has been input, it waits until one is input.

一方、ステップＳ８４で音声のアウト点が入力されたと判定された場合、ステップＳ８５において、エディットリスト作成部１５２は、現在再生中のフレームのフレーム番号を、音声サブクリップとして音声の再生を終了する位置として、エディットリストに記述する。例えば、図１４の例では、ユーザにより第１のクリップのフレーム番号「2084」のフレームの再生中に音声のアウト点が入力され、エディットリスト作成部１５２は、そのフレーム番号「2084」をエディットリストに記述する。 On the other hand, if it is determined in step S84 that an audio Out point has been input, in step S85 the edit list creation unit 152 describes the frame number of the frame currently being reproduced in the edit list as the position at which audio reproduction ends for the audio sub clip. For example, in the example of FIG. 14, the user inputs an audio Out point during reproduction of the frame with frame number “2084” of the first clip, and the edit list creation unit 152 describes that frame number “2084” in the edit list.
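
Steps S82 to S85 amount to pairing each In point with the Out point that follows it. The sketch below shows that pairing on a list of user inputs; the event tuples and the function name are hypothetical, and the frame numbers 284 and 2084 are the ones given for the example of FIG. 14.

```python
from typing import List, Optional, Tuple


def build_audio_sub_clips(events: List[Tuple[str, int]]) -> List[Tuple[int, int]]:
    """Pair each ("in", frame) input with the next ("out", frame) input."""
    sub_clips: List[Tuple[int, int]] = []
    pending_in: Optional[int] = None
    for kind, frame in events:
        if kind == "in":
            # Step S83: remember the frame at which audio reproduction starts.
            pending_in = frame
        elif kind == "out" and pending_in is not None:
            # Step S85: record the frame at which audio reproduction ends.
            sub_clips.append((pending_in, frame))
            pending_in = None
    return sub_clips


sub_clips = build_audio_sub_clips([("in", 284), ("out", 2084)])
```

An Out point with no preceding In point is ignored here, which mirrors the flowchart waiting for an In point before it accepts an Out point.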

ステップＳ８６において、エディットリスト作成部１５２は、ユーザにより編集対象とするクリップの変更が指令されたかを判定する。なお、図１４の例では、編集対象とするクリップのうち音声が含まれるクリップは、第１のクリップのみであるので、ステップＳ８６の判定の判定結果は常に否（No）となる。 In step S86, the edit list creation unit 152 determines whether the user has instructed a change of the clip to be edited. Note that in the example of FIG. 14, the first clip is the only clip to be edited that contains audio, so the result of the determination in step S86 is always negative (No).

ステップＳ８６で、編集対象とするクリップの変更が指令されたと判定された場合、ステップＳ８７において、エディットリスト作成部１５２は、現在の編集対象であるクリップのプロキシデータの再生を終了する。そして、処理はステップＳ８１に戻り、新たに編集対象とするクリップのプロキシデータの再生が開始され、以降の処理が繰り返される。 If it is determined in step S86 that a change of the clip to be edited has been instructed, in step S87, the edit list creation unit 152 ends the reproduction of the proxy data of the clip that is the current editing target. Then, the process returns to step S81, the reproduction of the proxy data of the clip to be edited newly is started, and the subsequent processes are repeated.

一方、ステップＳ８６で編集対象とするクリップの変更が指令されていないと判定された場合、ステップＳ８８において、エディットリスト作成部１５２は、ユーザにより音声の編集の終了が指令されたかを判定する。ステップＳ８８でユーザにより音声の編集の終了が指令されていないと判定された場合、処理はステップＳ８２に戻り、上述した処理が繰り返される。 On the other hand, when it is determined in step S86 that the change of the clip to be edited is not instructed, in step S88, the edit list creation unit 152 determines whether or not the user has instructed the end of audio editing. If it is determined in step S88 that the user has not commanded the end of voice editing, the process returns to step S82, and the above-described processes are repeated.

また、ステップＳ８８で音声の編集の終了が指令されたと判定された場合、ステップＳ８９において、エディットリスト作成部１５２は、編集対象であるクリップのプロキシデータの再生を終了し、エディットリストをEM作成部１５３に供給する。 If it is determined in step S88 that the end of audio editing has been instructed, in step S89 the edit list creation unit 152 ends the reproduction of the proxy data of the clip being edited and supplies the edit list to the EM creation unit 153.

ステップＳ９０において、EM作成部１５３は、エディットリスト作成部１５２から供給されるエディットリストと、クリップの電子マークデータとに基づいて、編集結果の電子マークデータを作成する。 In step S90, the EM creation unit 153 creates electronic mark data as a result of editing based on the edit list supplied from the edit list creation unit 152 and the electronic mark data of the clip.

例えば、EM作成部１５３は、図１６に示したエディットリストと、図１３に示したクリップの電子マークデータとに基づいて、図１８に示した編集結果の電子マークデータを作成する。そして、EM作成部１５３は、編集結果の電子マークデータを、光ディスク３１のエディットリスト用NRTファイルに記録させるとともに、エディットリスト作成部１５２に供給する。 For example, the EM creation unit 153 creates the electronic mark data of the editing result shown in FIG. 18 based on the edit list shown in FIG. 16 and the electronic mark data of the clip shown in FIG. 13. Then, the EM creation unit 153 records the electronic mark data of the editing result in the edit list NRT file of the optical disc 31 and supplies it to the edit list creation unit 152.

ステップＳ９１において、エディットリスト作成部１５２は、操作部I/F１１４からの操作信号に応じて、ユーザによりダックボイス加工を施す発言の発言者の発言者ＩＤが入力されたかを判定する。 In step S91, the edit list creation unit 152 determines, according to the operation signal from the operation unit I/F 114, whether the user has input the speaker ID of the speaker whose speech is to be subjected to duck voice processing.

具体的には、ユーザは、操作部１２４を操作して、ダックボイス加工を施す発言の発言者の発言者ＩＤを入力する。操作部I/F１１４は、この操作を表す操作信号を、エディットリスト作成部１５２に供給することにより、ダックボイス加工を施す発言の発言者の発言者ＩＤを指定する。エディットリスト作成部１５２は、この操作信号が操作部I/F１１４から供給された場合、ユーザによりダックボイス加工を施す発言の発言者の発言者ＩＤが入力されたと判定する。 Specifically, the user operates the operation unit 124 to input the speaker ID of the speaker whose speech is to be subjected to duck voice processing. The operation unit I/F 114 designates that speaker ID by supplying an operation signal representing this operation to the edit list creation unit 152. When this operation signal is supplied from the operation unit I/F 114, the edit list creation unit 152 determines that the user has input the speaker ID of the speaker whose speech is to be subjected to duck voice processing.

ステップＳ９１で、ダックボイス加工を施す発言の発言者の発言者ＩＤが入力されたと判定された場合、ステップＳ９２において、エディットリスト作成部１５２は、入力された発言者ＩＤと、ステップＳ９０で作成された編集結果の電子マークデータとに基づいて、その発言者の発言に対応する区間の音声にダックボイス加工を施すことを示す記述をエディットリストに行う。その結果、例えば図１６に示したエディットリストは、図１９に示したエディットリストに変更される。 If it is determined in step S91 that the speaker ID of the speaker whose speech is to be subjected to duck voice processing has been input, in step S92 the edit list creation unit 152 adds to the edit list, based on the input speaker ID and the electronic mark data of the editing result created in step S90, a description indicating that duck voice processing is to be applied to the audio of the sections corresponding to that speaker's speech. As a result, for example, the edit list shown in FIG. 16 is changed to the edit list shown in FIG. 19.
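
One way to picture step S92 is as a scan over the electronic mark data of the editing result that pairs each speaker EM (start) of the designated speaker with the speaker EM (end) that follows it. The sketch below is an assumption-laden illustration: the `Mark` type, the frame numbers, and the representation of a section as a `(start_frame, end_frame)` tuple are hypothetical, and the actual description written to the edit list is not reproduced.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Mark:
    frame: int                  # position on the editing result
    kind: str                   # "start" or "end"
    speaker_id: Optional[str]   # speaker ID attached to the mark


def duck_voice_sections(result_marks: List[Mark], speaker_id: str) -> List[Tuple[int, int]]:
    sections: List[Tuple[int, int]] = []
    start_frame: Optional[int] = None
    for m in sorted(result_marks, key=lambda m: m.frame):
        if m.speaker_id != speaker_id:
            continue  # marks of other speakers do not affect this speaker's sections
        if m.kind == "start":
            start_frame = m.frame
        elif m.kind == "end" and start_frame is not None:
            sections.append((start_frame, m.frame))
            start_frame = None
    return sections


# Hypothetical editing result containing speech by speakers "A" and "B".
result_marks = [
    Mark(0, "start", "A"), Mark(216, "end", "A"),
    Mark(416, "start", "B"), Mark(715, "end", "B"),
]
sections_for_b = duck_voice_sections(result_marks, "B")
```

Because the sections are derived from the marks each time, designating a different speaker ID only requires running the same scan again.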

ステップＳ９３において、EM作成部１５３は、ユーザによりダックボイス加工を施す発言の発言者の発言者ＩＤとして入力された発言者ＩＤに基づいて、ステップＳ９１で作成された編集結果の電子マークデータに記述される発言者EM(start)と発言者EM(end)に、ダックボイス加工の有無を表す情報としての「duckVoice」または「normal」を付加する。その結果、例えば、図１８に示した編集結果の電子マークデータは、図２０に示した編集結果の電子マークデータに変更される。そして、処理は終了する。 In step S93, based on the speaker ID input by the user as the speaker ID of the speaker whose speech is to be subjected to duck voice processing, the EM creation unit 153 adds “duckVoice” or “normal”, as information indicating whether duck voice processing is applied, to the speaker EM (start) and the speaker EM (end) described in the electronic mark data of the editing result created in step S91. As a result, for example, the electronic mark data of the editing result shown in FIG. 18 is changed to the electronic mark data of the editing result shown in FIG. 20. Then, the process ends.
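
Step S93 can be illustrated as a pass over the marks that records “duckVoice” for the designated speaker and “normal” for everyone else. Only those two values come from the description above; the `Mark` type and the `processing` field that holds the value are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class Mark:
    frame: int
    kind: str                         # "start" or "end"
    speaker_id: Optional[str]
    processing: Optional[str] = None  # hypothetical field: "duckVoice" or "normal"


def tag_duck_voice(marks: List[Mark], duck_speaker_ids: Set[str]) -> None:
    for m in marks:
        # Marks of a designated speaker are tagged "duckVoice", all others "normal".
        m.processing = "duckVoice" if m.speaker_id in duck_speaker_ids else "normal"


marks = [Mark(0, "start", "A"), Mark(216, "end", "A"), Mark(416, "start", "B")]
tag_duck_voice(marks, {"B"})
```

Running the function again with a different set of speaker IDs overwrites the earlier tags, so a change of the designated speaker needs no other bookkeeping in this sketch.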

なお、図２３では、音声を非破壊編集する音声編集処理について説明したが、映像を非破壊編集する映像編集処理も同様に行われ、エディットリストには、ユーザにより入力された映像のイン点またはアウト点に対応して、映像サブクリップとして映像の再生を開始する位置または終了する位置を示す情報などが記述される。但し、映像の編集処理では、図２３のステップＳ９０乃至Ｓ９３の処理は行われない。 Note that although FIG. 23 describes the audio editing process for nondestructively editing audio, the video editing process for nondestructively editing video is performed in the same manner, and the edit list describes, in correspondence with the video In point or Out point input by the user, information such as the position at which video reproduction starts or ends for a video sub clip. However, in the video editing process, steps S90 to S93 of FIG. 23 are not performed.

以上のように、図１の撮影編集システム１０では、編集結果の電子マークデータに発言者EM（start）または発言者EM（end）が記述されるので、その電子マークデータに基づいて、編集結果の音声のうちの各発言者の発言の区間を容易に認識することができる。 As described above, in the photographing and editing system 10 of FIG. 1, the speaker EM (start) or the speaker EM (end) is described in the electronic mark data of the editing result, so that the section of each speaker's speech in the audio of the editing result can be easily recognized based on that electronic mark data.

従って、ユーザは、ダックボイス加工を施す発言の発言者の発言者ＩＤを入力することにより、その発言者の発言に対応する区間の音声にダックボイス加工を施すことを示す記述を容易にエディットリストに行うことができる。その結果、ユーザは、編集結果の音声のうちの特定の発言者の発言に対して容易にダックボイス加工を施すことができる。 Therefore, simply by inputting the speaker ID of the speaker whose speech is to be subjected to duck voice processing, the user can easily have the edit list describe that duck voice processing is to be applied to the audio of the sections corresponding to that speaker's speech. As a result, the user can easily apply duck voice processing to the speech of a specific speaker in the audio of the editing result.

また、ユーザは、ダックボイス加工を施す発言の発言者を変更または削除する場合においても、変更または削除後の発言者の発言者ＩＤを入力することにより、編集結果の音声のうちのダックボイス加工を施す発言の発言者を容易に変更または削除することができる。特にニュース番組では、短時間での編集が要求されるため、ダックボイス加工を施す発言の発言者を容易に変更または削除可能であることは有用である。 In addition, even when changing or deleting the speaker whose speech is subjected to duck voice processing, the user can easily do so for the audio of the editing result by inputting the speaker ID of the speaker after the change or deletion. News programs in particular require editing in a short time, so it is useful to be able to easily change or delete the speaker whose speech is subjected to duck voice processing.

なお、図１の撮影編集システム１０では、２つのビデオカメラ２１とビデオカメラ２２によりテレビジョン素材が撮影されたが、１つのビデオカメラによりテレビジョン素材が撮影されるようにしてもよい。この場合、編集装置４１がクリップを１つの光ディスクに集約する必要はない。 In the photographing / editing system 10 of FIG. 1, the television material is photographed by the two video cameras 21 and 22, but the television material may be photographed by one video camera. In this case, it is not necessary for the editing apparatus 41 to collect the clips on one optical disc.

また、撮影編集システム１０では、１つのビデオカメラ２１で音声が取得されたが、２つのビデオカメラ２１および２２で音声が取得されるようにしてもよい。この場合、編集装置４１は、映像と音声を同時に非破壊編集することができる。 In the shooting and editing system 10, audio is acquired by one video camera 21, but audio may be acquired by two video cameras 21 and 22. In this case, the editing device 41 can non-destructively edit video and audio at the same time.

次に、図２４は、本発明を適用した撮影編集システムの第２の実施の形態の構成例を示している。なお、図１と同一のものには同一の符号を付してあり、説明は繰り返しになるので省略する。 Next, FIG. 24 shows a configuration example of the second embodiment of the photographing and editing system to which the present invention is applied. The same components as those in FIG. 1 are denoted by the same reference numerals, and description thereof will be omitted because it will be repeated.

図２４の撮影編集システム１７０では、撮影中に、ユーザがビデオカメラ１７１に発言者ＩＤを入力する。 In the photographing and editing system 170 of FIG. 24, the user inputs a speaker ID to the video camera 171 during shooting.

詳細には、ビデオカメラ１７１は、図１のビデオカメラ２１やビデオカメラ２２と同様に、テレビジョン素材の収録に使用される装置である。ビデオカメラ１７１は、ビデオカメラ２１と同様に、テレビジョン素材の映像を撮影するとともに、マイクロフォン２１Ａにより音声を取得する。ビデオカメラ１７１は、ビデオカメラ２１と同様に、その結果得られる音声付き映像のデータを素材データとして、光ディスク１７２のクリップファイルに記録する。 Specifically, the video camera 171 is a device used for recording television material, like the video camera 21 and the video camera 22 of FIG. Similar to the video camera 21, the video camera 171 captures a video of a television material and acquires sound by the microphone 21A. Similar to the video camera 21, the video camera 171 records the resulting video with audio data as material data in a clip file on the optical disk 172.

また、ビデオカメラ１７１は、テレビジョン素材の音声の取得中にユーザにより入力された、その音声を発した発言者の発言者ＩＤを取得する。ビデオカメラ１７１は、ユーザにより入力された発言者ＩＤに応じて、その発言者ＩＤが付加された発言者EM（start）を、取得中の音声のフレームに付与する。ビデオカメラ１７１は、その発言者EM(start)を記述した電子マークデータを、光ディスク１７２のクリップのNRTファイルに記録させる。光ディスク１７２は、編集装置１７３の光ディスクドライブ４１Ａに装着される。 In addition, the video camera 171 acquires the speaker ID of the speaker who uttered the sound input by the user during the acquisition of the sound of the television material. In response to the speaker ID input by the user, the video camera 171 adds the speaker EM (start) to which the speaker ID is added to the audio frame being acquired. The video camera 171 records the electronic mark data describing the speaker EM (start) in the NRT file of the clip on the optical disk 172. The optical disk 172 is mounted on the optical disk drive 41A of the editing device 173.

編集装置１７３は、編集装置４１と同様に、光ディスクドライブ４１Ａに装着される光ディスク１７２に記録された素材データの編集などに使用される装置である。編集装置１７３は、編集装置４１と同様に、ユーザの入力に応じて、光ディスク１７２に記録されている素材データの非破壊編集を行い、エディットリストを作成して光ディスク１７２のエディットリストファイルに記録する。 Similar to the editing device 41, the editing device 173 is a device used for, among other things, editing material data recorded on the optical disk 172 mounted on the optical disk drive 41A. Similar to the editing device 41, the editing device 173 performs nondestructive editing of the material data recorded on the optical disk 172 in accordance with user input, creates an edit list, and records it in the edit list file on the optical disk 172.

また、編集装置１７３は、エディットリストとクリップの電子マークデータに基づいて、編集結果に発言者EM(start)を付与する。そして、編集装置１７３は、その発言者EM(start)を記述した電子マークデータを、編集結果の電子マークデータとして、光ディスク１７２のエディットリスト用NRTファイルに記録させる。さらに、編集装置１７３は、編集装置４１と同様に、ユーザの入力に応じて、編集結果のうちの所定の発言者の音声にダックボイス加工を施す。 In addition, the editing device 173 adds a speaker EM (start) to the editing result based on the edit list and the electronic mark data of the clip. Then, the editing device 173 records the electronic mark data describing the speaker EM (start) in the edit list NRT file of the optical disc 172 as the electronic mark data of the editing result. Further, like the editing device 41, the editing device 173 performs duck voice processing on the voice of a predetermined speaker in the editing result in accordance with a user input.

なお、図２４では、ビデオカメラ１７１と編集装置１７３が、それぞれ別々の装置であるものとしたが、それらが一体化されていてもよい。 In FIG. 24, the video camera 171 and the editing device 173 are assumed to be separate devices, but they may be integrated.

また、図２４では、光ディスク１７２が、編集装置１７３の光ディスクドライブ４１Ａに装着され、その光ディスク１７２に対する読み出しまたは記録が行われるものとしたが、編集装置１７３が、光ディスク１７２が装着されたビデオカメラ１７１とネットワークを介して接続され、そのネットワークを介して、光ディスク１７２に対する読み出しまたは記録が行われるようにしてもよい。 In FIG. 24, the optical disk 172 is mounted on the optical disk drive 41A of the editing device 173, and reading from or recording to the optical disk 172 is performed there. However, the editing device 173 may instead be connected via a network to the video camera 171 in which the optical disk 172 is mounted, and reading from or recording to the optical disk 172 may be performed via that network.

図２５は、図２４のビデオカメラ１７１のハードウェア構成例を示すブロック図である。 FIG. 25 is a block diagram illustrating a hardware configuration example of the video camera 171 of FIG. 24.

図２５のビデオカメラ１７１では、映像入力I/F６０、音声入力I/F６１、一時記憶メモリI/F６３、光ディスクドライブI/F６４、操作部I/F６５、音声出力I/F６６、シリアルデータI/F６７、映像表示I/F６８、メモリカードI/F６９、ネットワークI/F７０、ハードディスクドライブI/F７１、ドライブI/F７２、およびマイコン１８１が、システムバス７３に接続されている。 In the video camera 171 of FIG. 25, a video input I / F 60, an audio input I / F 61, a temporary storage memory I / F 63, an optical disk drive I / F 64, an operation unit I / F 65, an audio output I / F 66, a serial data I / F 67. The video display I / F 68, the memory card I / F 69, the network I / F 70, the hard disk drive I / F 71, the drive I / F 72, and the microcomputer 181 are connected to the system bus 73.

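なお、図２５において、図２と同一のものには同一の符号を付してあり、説明は繰り返しになるので省略する。 Note that, in FIG. 25, the same components as those in FIG. 2 are denoted by the same reference numerals, and their description is omitted to avoid repetition.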

マイコン１８１は、CPU、ROM、およびRAMにより構成される。マイコン１８１のCPUは、ROMまたはハードディスク８１に記録されているプログラムにしたがって、操作部I/F６５からの操作信号などに応じて、ビデオカメラ１７１の各部を制御する。 The microcomputer 181 includes a CPU, a ROM, and a RAM. The CPU of the microcomputer 181 controls each unit of the video camera 171 in accordance with an operation signal from the operation unit I / F 65 in accordance with a program recorded in the ROM or the hard disk 81.

例えば、CPUは、図２のマイコン６２のCPUと同様に、映像入力I/F６０から供給される映像データと、音声入力I/F６１から供給される音声データとからなる素材データを用いてプロキシデータを作成し、一時記憶メモリ７５に記憶させる。また、CPUは、操作部I/F６５から入力される操作信号に応じて、撮影中のフレームに発言者EM(start)を付与する。そして、CPUは、その発言者EM(start)を記述する電子マークデータを作成し、光ディスクドライブI/F６４に供給して、光ディスク１７２のクリップのNRTファイルに記録させる。 For example, like the CPU of the microcomputer 62 of FIG. 2, the CPU creates proxy data using material data composed of video data supplied from the video input I/F 60 and audio data supplied from the audio input I/F 61, and stores the proxy data in the temporary storage memory 75. In addition, the CPU assigns a speaker EM (start) to the frame being shot in accordance with the operation signal input from the operation unit I/F 65. The CPU then creates electronic mark data describing that speaker EM (start), supplies it to the optical disk drive I/F 64, and has it recorded in the NRT file of the clip on the optical disk 172.

さらに、CPUは、マイコン６２のCPUと同様に、一時記憶メモリI/F６３から供給される素材データまたはプロキシデータのうちの音声データを、システムバス７３を介して音声出力I/F６６に供給して、その音声データに対応する音声をスピーカ７８から出力させる。 Further, like the CPU of the microcomputer 62, the CPU supplies the audio data of the material data or proxy data supplied from the temporary storage memory I / F 63 to the audio output I / F 66 via the system bus 73. The sound corresponding to the sound data is output from the speaker 78.

また、CPUは、マイコン６２のCPUと同様に、一時記憶メモリI/F６３から供給される素材データまたはプロキシデータのうちの映像データを、システムバス７３を介して映像表示I/F６８に供給して、その映像データに対応する映像を表示装置７９に表示させる。RAMには、CPUが実行するプログラムやデータなどが適宜記憶される。 Further, the CPU supplies the video data of the material data or proxy data supplied from the temporary storage memory I / F 63 to the video display I / F 68 via the system bus 73 in the same manner as the CPU of the microcomputer 62. Then, a video corresponding to the video data is displayed on the display device 79. The RAM appropriately stores programs executed by the CPU, data, and the like.

図２６は、図２５のビデオカメラ１７１における撮影処理部の機能的な構成例を示している。 FIG. 26 shows a functional configuration example of the shooting processing unit in the video camera 171 of FIG. 25.

図２６の撮影処理部１９０は、制御部１９１と作成部１９２により構成される。 The shooting processing unit 190 of FIG. 26 includes a control unit 191 and a creation unit 192.

制御部１９１は、撮影に関する各種の制御を行う。例えば、制御部１９１は、図３の制御部９１と同様に、操作部I/F６５から供給される、撮影の開始を指令するための操作を表す操作信号に応じて、映像入力I/F６０と音声入力I/F６１を制御し、素材データの取得を開始する。 The control unit 191 performs various types of control related to shooting. For example, like the control unit 91 of FIG. 3, the control unit 191 controls the video input I/F 60 and the audio input I/F 61 in response to an operation signal, supplied from the operation unit I/F 65, that represents an operation for instructing the start of shooting, and starts acquiring material data.

また、制御部１９１は、制御部９１と同様に、取得した素材データを用いてプロキシデータを作成する。さらに、制御部１９１は、素材データとプロキシデータを一時記憶メモリI/F６３に供給して、一時記憶メモリ７５に記憶させる。 Further, the control unit 191 creates proxy data using the acquired material data in the same manner as the control unit 91. Further, the control unit 191 supplies the material data and the proxy data to the temporary storage memory I / F 63 and stores them in the temporary storage memory 75.

作成部１９２は、操作部I/F６５から供給される、発言者ＩＤを入力するための操作を表す操作信号に応じて、その発言者ＩＤを付加した発言者EM（start）を、撮影中のフレームに付与する。そして、作成部１９２は、その発言者EM（start）を記述した電子マークデータを作成し、光ディスクドライブI/F６４に供給して、光ディスク１７２のクリップのNRTファイルに記録させる。 In response to an operation signal, supplied from the operation unit I/F 65, that represents an operation for inputting a speaker ID, the creation unit 192 assigns a speaker EM (start) with that speaker ID added to the frame being shot. The creation unit 192 then creates electronic mark data describing that speaker EM (start), supplies it to the optical disk drive I/F 64, and has it recorded in the NRT file of the clip on the optical disk 172.

次に、図２７を参照して、ユーザがビデオカメラ１７１を用いて行う撮影作業について説明する。 Next, with reference to FIG. 27, a photographing operation performed by the user using the video camera 171 will be described.

図２７の表では、撮影作業の各ステップの番号に対応付けて、そのステップにおける撮影作業の内容、ビデオカメラ１７１による主な処理の内容、および、その処理対象となるデータが記述されている。 In the table of FIG. 27, the contents of the photographing work at that step, the contents of the main processing by the video camera 171 and the data to be processed are described in association with the number of each step of the photographing work.

図２７に示すように、ステップＳ１０１において、ユーザは、操作部７７を操作して、図１２の入力画面の表示を指令する。このとき、ビデオカメラ１７１の作成部１９２は、予めハードディスク８１に登録されている発言者リストに基づいて、表示装置７９に入力画面を表示させる。 As shown in FIG. 27, in step S101, the user operates the operation unit 77 to instruct display of the input screen of FIG. 12. At this time, the creation unit 192 of the video camera 171 displays the input screen on the display device 79 based on the speaker list registered in advance in the hard disk 81.

ステップＳ１０２において、ユーザは、操作部７７を操作して、撮影の開始を指令する。このとき、ビデオカメラ１７１の制御部１９１は、クリップのNRTファイルを光ディスク１７２に作成する。また、制御部１９１は、クリップファイルを光ディスク１７２に作成する。さらに、制御部１９１は、映像入力I/F６０と音声入力I/F６１から供給される素材データのクリップファイルへの記録を開始する。 In step S102, the user operates the operation unit 77 to instruct the start of shooting. At this time, the control unit 191 of the video camera 171 creates an NRT file of the clip on the optical disc 172. In addition, the control unit 191 creates a clip file on the optical disc 172. Further, the control unit 191 starts recording the material data supplied from the video input I / F 60 and the audio input I / F 61 in the clip file.

ステップＳ１０３において、ユーザは、各発言者の発言の開始時に、入力画面において操作部７７を操作し、その発言者の発言者ＩＤを入力する。このとき、作成部１９２は、その発言者ＩＤが付加された発言者EM(start)を撮影中のフレームに付与し、その発言者EM(start)を、クリップのNRTファイルの電子マークデータに記述する。 In step S103, at the start of each speaker's speech, the user operates the operation unit 77 on the input screen to input that speaker's speaker ID. At this time, the creation unit 192 assigns a speaker EM (start) with that speaker ID added to the frame being shot, and describes that speaker EM (start) in the electronic mark data of the NRT file of the clip.

ステップＳ１０４において、ユーザは、操作部７７を操作して撮影の終了を指令する。このとき、制御部１９１は、素材データのクリップファイルへの記録を終了する。 In step S104, the user operates the operation unit 77 to instruct the end of shooting. At this time, the control unit 191 ends the recording of the material data into the clip file.

次に、図２８のフローチャートを参照して、図２６の撮影処理部１９０による撮影処理の詳細について説明する。この撮影処理は、例えば、ユーザが操作部７７を操作することにより、入力画面の表示を指令したとき、開始される。 Next, the details of the photographing process by the photographing processing unit 190 in FIG. 26 will be described with reference to the flowchart in FIG. This photographing process is started, for example, when the user instructs display of the input screen by operating the operation unit 77.

ステップＳ１１１において、ビデオカメラ１７１の作成部１９２は、予めハードディスク８１に登録されている発言者リストに基づいて、表示装置７９に入力画面を表示させる。ステップＳ１１２において、制御部１９１は、操作部I/F６５から供給される操作信号に応じて、ユーザにより撮影の開始が指令されたかを判定する。ステップＳ１１２で撮影の開始が指令されていないと判定された場合、撮影の開始が指令されるまで待機する。 In step S 111, the creation unit 192 of the video camera 171 displays an input screen on the display device 79 based on a speaker list registered in the hard disk 81 in advance. In step S 112, the control unit 191 determines whether or not the user has instructed the start of shooting according to the operation signal supplied from the operation unit I / F 65. If it is determined in step S112 that the start of shooting is not instructed, the process waits until the start of shooting is instructed.

一方、ステップＳ１１２で撮影の開始が指令されたと判定された場合、ステップＳ１１３において、制御部１９１は、図８のステップＳ３１の処理と同様に、クリップのNRTファイルを光ディスク１７２に作成する。ステップＳ１１４において、制御部１９１は、図８のステップＳ３２の処理と同様に、クリップファイルを光ディスク１７２に作成する。ステップＳ１１５において、制御部１９１は、映像入力I/F６０と音声入力I/F６１から供給される素材データのクリップファイルへの記録を開始する。 On the other hand, if it is determined in step S112 that the start of shooting has been instructed, in step S113, the control unit 191 creates an NRT file of the clip on the optical disc 172 in the same manner as in step S31 of FIG. In step S114, the control unit 191 creates a clip file on the optical disc 172 in the same manner as in step S32 in FIG. In step S115, the control unit 191 starts recording the material data supplied from the video input I / F 60 and the audio input I / F 61 in the clip file.

In step S116, the creation unit 192 determines, according to the operation signal supplied from the operation unit I/F 65, whether the user has input a speaker ID. If it is determined in step S116 that no speaker ID has been input, the process skips step S117 and proceeds to step S118.

On the other hand, if it is determined in step S116 that a speaker ID has been input, then in step S117 the creation unit 192 takes the frame number of the frame currently being shot and assigns to that frame a speaker EM (start) to which the speaker ID input by the user is added. The creation unit 192 then describes that speaker EM (start) in the electronic mark data of the NRT file of the clip.

In step S118, the control unit 191 determines, according to the operation signal from the operation unit 77, whether the user has commanded the end of shooting. If it is determined in step S118 that the end of shooting has not been commanded, the process returns to step S116 and the processing described above is repeated.

If it is determined in step S118 that the end of shooting has been commanded, then in step S119 the control unit 191 ends the recording of the material data to the clip file, and the process ends.
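The control flow of steps S112 through S119 can be sketched as a small event loop. This is only an illustrative sketch, not the implementation described in the patent: the `NrtFile` class, the `run_shooting` function, and the `(frame, kind, payload)` event tuples are hypothetical names introduced here, and file creation on the optical disc is reduced to creating an in-memory object.

```python
from dataclasses import dataclass, field


@dataclass
class NrtFile:
    """Hypothetical stand-in for the clip's NRT file (electronic mark data only)."""
    marks: list = field(default_factory=list)  # list of (frame_number, speaker_id)


def run_shooting(events):
    """Replay (frame_number, kind, payload) events; kind is 'start', 'speaker' or 'stop'.

    Returns the NrtFile holding the speaker EM (start) marks, or None if
    shooting was never started.
    """
    nrt = None
    recording = False
    for frame, kind, payload in events:
        if not recording:
            if kind == "start":          # S112 satisfied, then S113 to S115
                nrt = NrtFile()
                recording = True
            continue                     # S112: keep waiting for the start command
        if kind == "speaker":            # S116 satisfied, then S117
            nrt.marks.append((frame, payload))
        elif kind == "stop":             # S118 satisfied, then S119
            break
    return nrt


events = [
    (0, "speaker", "A"),    # ignored: shooting has not started yet
    (0, "start", None),
    (100, "speaker", "A"),
    (350, "speaker", "B"),
    (600, "speaker", "C"),
    (1000, "stop", None),
]
clip_nrt = run_shooting(events)
```

With the events above, `clip_nrt.marks` holds the three marks at frames 100, 350 and 600 that the first clip of FIG. 31 is described as carrying.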

FIG. 29 is a block diagram illustrating a hardware configuration example of the editing device 173 of FIG. 24.

In the editing device 173 of FIG. 29, the temporary storage memory I/F 112, the optical disc drive I/F 113, the operation unit I/F 114, the audio output I/F 115, the serial data I/F 116, the video display I/F 117, the memory card I/F 118, the network I/F 119, the hard disk drive I/F 120, the drive I/F 121, and the microcomputer 195 are connected to the system bus 122. In FIG. 29, components identical to those in FIG. 9 are denoted by the same reference numerals, and their description is omitted to avoid repetition.

The microcomputer 195 includes a CPU, a ROM, and a RAM. The CPU of the microcomputer 195 controls each unit of the editing device 173 in response to operation signals from the operation unit I/F 114 and the like, in accordance with a program recorded in the ROM or on the hard disk 128.

For example, like the CPU of the microcomputer 111 of FIG. 9, the CPU supplies to the temporary storage memory I/F 112 the clips that are read from the optical disc 172 mounted in the optical disc drive 41A and supplied from the optical disc drive I/F 113.

Like the CPU of the microcomputer 111, the CPU also performs nondestructive editing by creating an edit list according to the operation signal, and records the edit list on the optical disc 172.

Further, like the CPU of the microcomputer 111, the CPU creates the electronic mark data of the editing result based on the edit list and the electronic mark data in the NRT file of the clip. The CPU then records that electronic mark data in the NRT file for the edit list on the optical disc 172.

Like the CPU of the microcomputer 111, the CPU also changes the edit list, based on the operation signal and the electronic mark data of the editing result, so that duck voice processing is applied to the speech of the speaker whose speaker ID the user has designated within the audio of the editing result.

Further, like the CPU of the microcomputer 111, the CPU supplies the audio data of the clip supplied from the temporary storage memory I/F 112 to the audio output I/F 115 via the system bus 122, so that the audio of the clip is output from the speaker 125. Likewise, the CPU supplies the video data of the clip supplied from the temporary storage memory I/F 112 to the video display I/F 117 via the system bus 122, so that the video of the clip is displayed on the display device 126. The RAM stores, as appropriate, the programs executed by the CPU and related data.

FIG. 30 illustrates a functional configuration example of the editing processing unit in the editing device 173 of FIG. 29.

The editing processing unit 200 of FIG. 30 includes an edit list creation unit 201 and an EM creation unit 202.

Like the edit list creation unit 152 of FIG. 10, the edit list creation unit 201 supplies to the temporary storage memory I/F 112 the clips that are read from the optical disc 172 and supplied from the optical disc drive I/F 113.

Like the edit list creation unit 152, the edit list creation unit 201 also supplies the audio data of the proxy data supplied from the temporary storage memory I/F 112 to the audio output I/F 115, so that the audio of the clip is output from the speaker 125, and supplies the video data of the proxy data to the video display I/F 117, so that the low-resolution video of the clip is displayed on the display device 126 as the editing screen. The user then performs the editing work by operating the operation unit 124 while listening to the audio from the speaker 125 and watching the editing screen.

Like the edit list creation unit 152, the edit list creation unit 201 performs nondestructive editing by creating an edit list according to the operation signal that the operation unit I/F 114 supplies as a result of the user's editing work. The edit list creation unit 201 then records the edit list on the optical disc 172 and supplies it to the EM creation unit 202.

Like the edit list creation unit 152, the edit list creation unit 201 also changes the edit list, based on the operation signal supplied from the operation unit I/F 114 and the electronic mark data of the editing result supplied from the EM creation unit 202, so that duck voice processing is applied to the speech of the speaker whose speaker ID the user has designated within the audio of the editing result.

Like the EM creation unit 153, the EM creation unit 202 creates the electronic mark data of the editing result based on the edit list supplied from the edit list creation unit 201 and the electronic mark data of the clip stored in the temporary storage memory I/F 112. The EM creation unit 202 then records that electronic mark data in the NRT file for the edit list on the optical disc 172 and supplies it to the edit list creation unit 201.

Further, in response to the operation signal supplied from the operation unit I/F 114, the EM creation unit 202 adds information indicating the presence or absence of duck voice processing to the speaker EM (start) that is described in the electronic mark data of the editing result and that carries the speaker ID designated by the user.

Next, nondestructive editing in the editing device 173 will be described with reference to FIGS. 31 to 34.

Here it is assumed that the video camera 171 has shot three people as subjects, namely the speaker "Mr. ○○" with speaker ID "A", the speaker "Mr. ××" with speaker ID "B", and the speaker "Mr. △△" with speaker ID "C", and has acquired the audio of their conversation, so that the conversation among the three speakers has been shot as television material.

The user then performs nondestructive editing so as to cut out the audio of predetermined sections of the television material for use as the audio of the editing result, cut out the video of predetermined sections for use as the video of the editing result, and apply duck voice processing to the speech of at least one of the three speakers.

First, the clip to be edited, which is recorded on the optical disc 172, and the editing result will be described with reference to FIGS. 31 and 32. In FIG. 31, the horizontal axis represents the frame number.

The upper bar in FIG. 31A indicates the length of the video of the first clip, which was shot by the video camera 171 and is the editing target. The numbers written above the bar are the frame numbers of the video shot at the shooting times corresponding to the positions where they are written. That is, in the example of FIG. 31, the video of the first clip has 1001 frames, numbered in order from "0" to "1000".

The lower bar in FIG. 31A indicates the length of the audio of the first clip. Each letter inside the bar is the speaker ID of the speaker who uttered the audio at that position, and each arrow below the bar represents a speaker EM (start).

Specifically, in the example of FIG. 31, the first clip has a speaker EM (start) with speaker ID "A" at frame 100, a speaker EM (start) with speaker ID "B" at frame 350, and a speaker EM (start) with speaker ID "C" at frame 600.

The bar in FIG. 31B indicates the length of the material data of the editing result obtained by nondestructively editing the first clip shown in FIG. 31A. The numbers written above the bar are the frame numbers on the editing result corresponding to the positions where they are written. In the example of FIG. 31, the editing result has 601 frames, numbered in order from "0" to "600".

Specifically, in the example of FIG. 31, the user designates frame 200 of the first clip as an In point and frame 300 as an Out point. Accordingly, the material data of frames 200 to 300 of the first clip becomes the material data of frames 0 to 100 of the editing result (hereinafter referred to as the first material sub clip).

In the example of FIG. 31, the user also designates frame 400 of the first clip as an In point and frame 750 as an Out point. Accordingly, the material data of frames 400 to 750 of the first clip becomes the material data of frames 100 to 450 of the editing result (hereinafter referred to as the second material sub clip).

Further, in the example of FIG. 31, the user designates frame 850 of the first clip as an In point and frame 1000 as an Out point. Accordingly, the material data of frames 850 to 1000 of the first clip becomes the material data of frames 450 to 600 of the editing result (hereinafter referred to as the third material sub clip).

When In points and Out points are designated in this way and nondestructive editing is performed, the speaker EM (start) assigned immediately before each In point of the first clip is assigned to the position on the editing result that corresponds to that In point.

In the example of FIG. 31, the speaker EM (start) with speaker ID "A", assigned at frame 100 immediately before frame 200 of the first clip designated as an In point, is assigned to frame 0, the position on the editing result corresponding to that In point.

Likewise, the speaker EM (start) with speaker ID "B", assigned at frame 350 immediately before frame 400 of the first clip designated as an In point, is assigned to frame 100, the position on the editing result corresponding to that In point.

Further, the speaker EM (start) with speaker ID "C", assigned at frame 600 immediately before frame 850 of the first clip designated as an In point, is assigned to frame 450, the position on the editing result corresponding to that In point.

In addition, a speaker EM (start) assigned to a position inside an editing section, that is, between an In point and an Out point of the first clip, is assigned to the corresponding position on the editing result. In the example of FIG. 31, the speaker EM (start) assigned at frame 600, inside the editing section from frame 400 of the first clip designated as the In point to frame 750 of the first clip designated as the Out point, is assigned to frame 300, the corresponding position on the editing result.

The position Tee3 on the editing result at which this speaker EM (start) is assigned (300 in the example of FIG. 31) is obtained by the following equation (1) from the position Tec1 on the editing result corresponding to the In point (100 in the example of FIG. 31), the position Tme3 on the clip at which the speaker EM (start) is assigned (600 in the example of FIG. 31), and the In point Tmc3 (400 in the example of FIG. 31). With these values, 100 + 600 - 400 = 300.

Tee3 = Tec1 + Tme3 - Tmc3 ... (1)
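The two placement rules (a mark at or immediately before an In point moves to the start of that cut; a mark inside a cut moves according to equation (1)) can be sketched as follows. This is an illustrative sketch under stated assumptions: `map_marks` and its argument layout are hypothetical names, the clip marks are assumed to be sorted by frame, and a mark lying exactly on an Out point is assumed not to be copied, a case the description does not address.

```python
import bisect


def map_marks(clip_marks, cuts):
    """Map speaker EM (start) marks of a clip onto the editing result.

    clip_marks: list of (frame, speaker_id), sorted by frame.
    cuts: list of (in_point, out_point) in the order they appear in the result.
    Returns a list of (result_frame, speaker_id).
    """
    frames = [frame for frame, _ in clip_marks]
    result = []
    t_ec = 0  # position on the editing result corresponding to the current In point
    for t_mc, t_out in cuts:
        # Rule 1: the mark at or immediately before the In point goes to t_ec.
        i = bisect.bisect_right(frames, t_mc) - 1
        if i >= 0:
            result.append((t_ec, clip_marks[i][1]))
        # Rule 2: marks strictly inside the cut follow equation (1):
        # Tee = Tec + Tme - Tmc
        for t_me, speaker in clip_marks:
            if t_mc < t_me < t_out:
                result.append((t_ec + t_me - t_mc, speaker))
        t_ec += t_out - t_mc
    return result


clip_marks = [(100, "A"), (350, "B"), (600, "C")]
cuts = [(200, 300), (400, 750), (850, 1000)]
result_marks = map_marks(clip_marks, cuts)
```

For the cuts of FIG. 31, `result_marks` contains marks for "A" at frame 0, "B" at frame 100, and "C" at frames 300 and 450.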

The editing result and the speaker IDs corresponding to the audio of the editing result are as shown in FIG. 32. In FIG. 32, the horizontal axis indicates the frame number.

That is, the editing result consists of the first material sub clip, the second material sub clip, and the third material sub clip arranged in that order from the beginning. The audio of frames "0" to "100" of the editing result is the speech of the speaker with speaker ID "A", the audio of frames "100" to "300" is the speech of the speaker with speaker ID "B", and the audio of frames "300" to "600" is the speech of the speaker with speaker ID "C".

As described above, in the editing result shown in FIGS. 31B and 32, the Out point of the second material sub clip does not coincide with the point at which the speaker changes. That is, partway through the second material sub clip, the speech switches from that of the speaker "Mr. ××" to that of "Mr. △△".
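Because each speaker EM (start) marks only where a speech begins, the section spoken by a given speaker runs from that mark to the next mark (or to the end of the editing result). The sketch below derives the sections to which duck voice processing would be applied for a set of designated speaker IDs. It is an assumption-laden illustration: `duck_voice_sections` is a hypothetical name, sections are expressed as (start, end) frame pairs, and adjacent sections of the same selection are merged for convenience.

```python
def duck_voice_sections(result_marks, last_frame, target_ids):
    """Return (start, end) frame sections of the editing result spoken by target_ids.

    result_marks: list of (frame, speaker_id) on the editing result, sorted by frame.
    last_frame: final frame number of the editing result.
    target_ids: set of speaker IDs designated by the user.
    """
    sections = []
    for idx, (start, speaker) in enumerate(result_marks):
        # A speech lasts until the next speaker EM (start), or until the end.
        if idx + 1 < len(result_marks):
            end = result_marks[idx + 1][0]
        else:
            end = last_frame
        if speaker not in target_ids or end <= start:
            continue
        if sections and sections[-1][1] == start:
            # Merge with the previous section when they touch.
            sections[-1] = (sections[-1][0], end)
        else:
            sections.append((start, end))
    return sections


result_marks = [(0, "A"), (100, "B"), (300, "C"), (450, "C")]
sections_c = duck_voice_sections(result_marks, 600, {"C"})
```

For speaker ID "C" this yields the single section from frame 300 to frame 600, which starts inside the second material sub clip rather than at a cut point.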

FIG. 33 shows the electronic mark data of the first clip shown in FIG. 31, and FIG. 34 shows the electronic mark data of the editing result shown in FIGS. 31 and 32.

In the example of FIG. 33, the electronic mark table portion, enclosed by the electronic mark table tags (<EssenceMark Table> </EssenceMark Table>), is described in the second to sixth lines.

The description targetMedia="Original-Material" on the second line of FIG. 33 indicates that this electronic mark data describes the electronic marks assigned to the material data of a clip.

The description EssenceMark value="Speaker-A" frameCount="100" on the third line indicates that the feature represented by this electronic mark is the start position of a speech by the speaker with speaker ID "A", and that the mark is assigned at frame 100 from the beginning of the clip.

Similarly, the descriptions EssenceMark value="Speaker-B" frameCount="350" on the fourth line and EssenceMark value="Speaker-C" frameCount="600" on the fifth line indicate that the features represented by these electronic marks are the start positions of speeches by the speakers with speaker IDs "B" and "C", respectively, and that the marks are assigned at frames 350 and 600 from the beginning of the clip, respectively.
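As an illustration of how such electronic mark data might be produced, the sketch below builds an XML fragment from a list of marks with Python's standard `xml.etree.ElementTree` module. The attribute names `targetMedia`, `value` and `frameCount` are taken from the description; the element name `EssenceMarkTable` (written without a space, since an XML name cannot contain one) and the helper `build_mark_table` are assumptions made for this sketch, and the actual file in FIG. 33 may contain further lines not reproduced here.

```python
import xml.etree.ElementTree as ET


def build_mark_table(target_media, marks):
    """Build an electronic mark table element from (frame, speaker_id) pairs."""
    # "EssenceMarkTable" is an assumed well-formed spelling of the table tag.
    table = ET.Element("EssenceMarkTable", {"targetMedia": target_media})
    for frame, speaker_id in marks:
        ET.SubElement(
            table,
            "EssenceMark",
            {"value": f"Speaker-{speaker_id}", "frameCount": str(frame)},
        )
    return table


clip_table = build_mark_table(
    "Original-Material", [(100, "A"), (350, "B"), (600, "C")]
)
xml_text = ET.tostring(clip_table, encoding="unicode")
```

Passing "Initial-EditList" and the marks of the editing result instead would give the corresponding table for the edit list.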

In the example of FIG. 34, the electronic mark table portion, enclosed by the electronic mark table tags (<EssenceMark Table> </EssenceMark Table>), is described in the second to seventh lines.

The description targetMedia="Initial-EditList" on the second line indicates that this electronic mark data describes the electronic marks assigned to the editing result.

The description EssenceMark value="Speaker-A" frameCount="0" on the third line indicates that the feature represented by this electronic mark is the start position of a speech by the speaker with speaker ID "A", and that the mark is assigned at frame 0 from the beginning of the editing result.

Similarly, the descriptions EssenceMark value="Speaker-B" frameCount="100" on the fourth line, EssenceMark value="Speaker-C" frameCount="300" on the fifth line, and EssenceMark value="Speaker-C" frameCount="450" on the sixth line indicate that the features represented by these electronic marks are the start positions of speeches by the speakers with speaker IDs "B", "C", and "C", respectively, and that the marks are assigned at frames 100, 300, and 450 from the beginning of the editing result, respectively.

In the description above, speaker EM (start) marks with the same speaker ID "C" are assigned consecutively at frames 300 and 450 of the editing result. When speaker EM (start) marks with the same speaker ID occur consecutively, only the first of them may be assigned instead.
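The optional simplification just described, keeping only the first of consecutive marks that carry the same speaker ID, amounts to a single pass over the marks. The function name `drop_repeated_speakers` is hypothetical and the sketch assumes the marks are already sorted by frame.

```python
def drop_repeated_speakers(marks):
    """Keep only the first of consecutive marks that share a speaker ID.

    marks: list of (frame, speaker_id), sorted by frame.
    """
    kept = []
    for frame, speaker_id in marks:
        if not kept or kept[-1][1] != speaker_id:
            kept.append((frame, speaker_id))
    return kept


result_marks = [(0, "A"), (100, "B"), (300, "C"), (450, "C")]
simplified = drop_repeated_speakers(result_marks)
```

Applied to the marks of FIG. 34, this drops the mark at frame 450 and leaves the marks at frames 0, 100 and 300. A later mark for a speaker who has already spoken is still kept when a different speaker intervenes.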

Next, the editing work that the user performs with the editing device 173 will be described with reference to FIG. 35.

The table of FIG. 35 lists, for each step number of the editing work, the content of the editing work in that step, the main processing performed by the editing device 173, and the data on which that processing operates.

As shown in FIG. 35, in step S131 the user mounts the optical disc 172 in the optical disc drive 41A of the editing device 173 and operates the operation unit 124 to command display of the editing screen. At this time, based on the proxy data in the proxy file, the edit list creation unit 201 causes the display device 126 to display the editing screen and causes the speaker 125 to output the audio of the clip.

In step S132, the user operates the operation unit 124 and performs editing by designating In points and Out points on the editing screen. At this time, the edit list creation unit 201 creates an edit list based on the In points and Out points designated by the user. The edit list creation unit 201 then records the edit list in the edit list file on the optical disc 172 and supplies it to the EM creation unit 202.

The EM creation unit 202 creates the electronic mark data of the editing result based on the edit list supplied from the edit list creation unit 201 and the electronic mark data in which the speaker EM (start) marks of the clip are described. It does so by interpolating a speaker EM (start) at each cut point and by copying the speaker EM (start) marks assigned between each In point and Out point to the corresponding positions on the editing result. The EM creation unit 202 then records the electronic mark data of the editing result in the NRT file for the edit list on the optical disc 172.

The processing in steps S133 and S134 is the same as the processing in steps S56 and S57 of FIG. 21, and its description is therefore omitted.

Although not illustrated, the editing process in which the editing processing unit 200 nondestructively edits audio and video is the same as the audio editing process of FIG. 23. In the editing process of the editing processing unit 200, however, in steps S83 and S85 of FIG. 23 the frame number of the frame currently being played back is described in the edit list as the position at which playback of the audio and video as a material sub clip starts or ends.

In the shooting and editing system 170 of FIG. 24, the video camera 171 performs both the shooting and the assignment of speaker EM (start) marks. As shown in FIG. 36, however, an assigning device 206 that assigns speaker EM (start) marks may be provided separately from a video camera 205 that performs the shooting.

Further, in the shooting and editing system 170 of FIG. 24, the television material is shot by a single video camera 171, but it may instead be shot by a plurality of video cameras.

In that case, like the editing device 41 of FIG. 1, the editing device 173 collects the clips shot by the respective video cameras onto a single optical disc. Also in that case, audio may be acquired by each of the video cameras or by only one of them. When audio is acquired by only one of the video cameras, the editing device 173, like the editing device 41, nondestructively edits the video and the audio separately.

Next, FIG. 37 shows a configuration example of a third embodiment of a shooting and editing system to which the present invention is applied. Components identical to those in FIGS. 1 and 24 are denoted by the same reference numerals, and their description is omitted to avoid repetition.

In the shooting and editing system 210 of FIG. 37, the user inputs speaker IDs on the editing device 213 while the editing result is being played back.

Specifically, the video camera 211, like the video camera 171 of FIG. 24, is a device used for recording television material. Like the video camera 171, the video camera 211 shoots the video of the television material and acquires audio with the microphone 21A. It then records the resulting video data with audio, as material data, in a clip file on the optical disc 212.

The optical disc 212 is mounted in the optical disc drive 41A of the editing device 213. Like the editing device 173, the editing device 213 is a device used for, among other things, editing the material data recorded on the optical disc 212 mounted in the optical disc drive 41A.

Like the editing device 173, the editing device 213 performs nondestructive editing of the material data recorded on the optical disc 212 in response to user input, creates an edit list, and records it in the edit list file on the optical disc 212. In addition, in response to user input, the editing device 213 assigns speaker EM (start) marks to the editing result and records electronic mark data describing those marks, as the electronic mark data of the editing result, in the NRT file for the edit list on the optical disc 212.

Further, like the editing device 173, the editing device 213 applies duck voice processing to the voice of a predetermined speaker in the editing result in response to user input.

In FIG. 37, the video camera 211 and the editing device 213 are shown as separate devices, but they may be integrated into one.

Also, in FIG. 37, the optical disc 212 is mounted in the optical disc drive 41A of the editing device 213, and reading from or recording to the optical disc 212 is performed there. Alternatively, the editing device 213 may be connected via a network to the video camera 211 in which the optical disc 212 is mounted, and reading from or recording to the optical disc 212 may be performed via that network.

FIG. 38 is a block diagram illustrating a hardware configuration example of the video camera 211 of FIG. 37.

In the video camera 211 of FIG. 38, the video input I/F 60, the audio input I/F 61, the temporary storage memory I/F 63, the optical disc drive I/F 64, the operation unit I/F 65, the audio output I/F 66, the serial data I/F 67, the video display I/F 68, the memory card I/F 69, the network I/F 70, the hard disk drive I/F 71, the drive I/F 72, and the microcomputer 221 are connected to the system bus 73.

なお、図３８において、図２や図２５と同一のものには同一の符号を付してあり、説明は繰り返しになるので省略する。 In FIG. 38, the same components as those in FIGS. 2 and 25 are denoted by the same reference numerals, and the description thereof will be omitted because it will be repeated.

マイコン２２１は、CPU、ROM、およびRAMにより構成される。マイコン２２１のCPUは、ROMまたはハードディスク８１に記録されているプログラムにしたがって、操作部I/F６５からの操作信号などに応じて、ビデオカメラ２１１の各部を制御する。 The microcomputer 221 includes a CPU, a ROM, and a RAM. The CPU of the microcomputer 221 controls each unit of the video camera 211 according to an operation signal from the operation unit I / F 65 according to a program recorded in the ROM or the hard disk 81.

例えば、CPUは、図２５のマイコン１８１のCPUと同様に、映像入力I/F６０から供給される映像データと、音声入力I/F６１から供給される音声データとからなる素材データを用いてプロキシデータを作成し、一時記憶メモリ７５に記憶させる。また、CPUは、マイコン１８１のCPUと同様に、一時記憶メモリI/F６３から供給される素材データまたはプロキシデータのうちの音声データを、システムバス７３を介して音声出力I/F６６に供給して、その音声データに対応する音声をスピーカ７８から出力させる。 For example, like the CPU of the microcomputer 181 in FIG. 25, the CPU creates proxy data from the material data consisting of the video data supplied from the video input I/F 60 and the audio data supplied from the audio input I/F 61, and stores it in the temporary storage memory 75. Also like the CPU of the microcomputer 181, the CPU supplies the audio data of the material data or proxy data supplied from the temporary storage memory I/F 63 to the audio output I/F 66 via the system bus 73, so that the sound corresponding to that audio data is output from the speaker 78.

また、CPUは、マイコン１８１のCPUと同様に、一時記憶メモリI/F６３から供給される素材データまたはプロキシデータのうちの映像データを、システムバス７３を介して映像表示I/F６８に供給して、その映像データに対応する映像を表示装置７９に表示させる。RAMには、CPUが実行するプログラムやデータなどが適宜記憶される。 Further, the CPU supplies the video data of the material data or proxy data supplied from the temporary storage memory I / F 63 to the video display I / F 68 via the system bus 73 in the same manner as the CPU of the microcomputer 181. Then, a video corresponding to the video data is displayed on the display device 79. The RAM appropriately stores programs executed by the CPU, data, and the like.

図３９は、図３８のビデオカメラ２１１における撮影処理部の機能的な構成例を示している。図３９に示すように、撮影処理部２３０は、図２６の制御部１９１により構成されるので、説明は省略する。 FIG. 39 shows a functional configuration example of the photographing processing unit in the video camera 211 of FIG. 38. As shown in FIG. 39, the photographing processing unit 230 consists of the control unit 191 of FIG. 26, so its description is omitted.

次に、図４０を参照して、ユーザがビデオカメラ２１１を用いて行う撮影作業について説明する。 Next, with reference to FIG. 40, a photographing operation performed by the user using the video camera 211 will be described.

図４０の表では、撮影作業の各ステップの番号に対応付けて、そのステップにおける撮影作業の内容、ビデオカメラ２１１による主な処理の内容、および、その処理対象となるデータが記述されている。 In the table of FIG. 40, the contents of the photographing work at that step, the contents of the main processing by the video camera 211, and the data to be processed are described in association with the number of each step of the photographing work.

図４０のステップＳ１７１およびＳ１７２は、図２７のステップＳ１０２およびＳ１０４と同様である。即ち、図４０の編集作業は、図２７の編集作業において、発言者EM(start)の付与に関する作業であるステップＳ１０１とＳ１０３を削除したものである。 Steps S171 and S172 in FIG. 40 are the same as steps S102 and S104 in FIG. 27. That is, the work of FIG. 40 is the work of FIG. 27 with steps S101 and S103, which concern the assignment of speaker EM(start) marks, removed.

次に、図４１のフローチャートを参照して、図３９の撮影処理部２３０による撮影処理について説明する。この撮影処理は、例えば、ユーザが操作部７７を操作することにより、撮影の開始を指令したとき開始される。 Next, the photographing processing by the photographing processing unit 230 of FIG. 39 will be described with reference to the flowchart of FIG. 41. This photographing processing starts, for example, when the user operates the operation unit 77 to command the start of photographing.

ステップＳ１９１乃至Ｓ１９５の処理は、図２８のステップＳ１１３乃至Ｓ１１５、Ｓ１１８、およびＳ１１９の処理と同様であるので、説明は省略する。 The processing in steps S191 through S195 is the same as the processing in steps S113 through S115, S118, and S119 of FIG. 28, so its description is omitted.

図４２は、図３７の編集装置２１３のハードウェア構成例を示すブロック図である。 FIG. 42 is a block diagram illustrating a hardware configuration example of the editing device 213 of FIG. 37.

図４２の編集装置２１３では、一時記憶メモリI/F１１２、光ディスクドライブI/F１１３、操作部I/F１１４、音声出力I/F１１５、シリアルデータI/F１１６、映像表示I/F１１７、メモリカードI/F１１８、ネットワークI/F１１９、ハードディスクドライブI/F１２０、ドライブI/F１２１、およびマイコン２４１が、システムバス１２２に接続されている。なお、図４２において、図９や図２９と同一のものには同一の符号を付してあり、説明は繰り返しになるので省略する。 In the editing device 213 of FIG. 42, the temporary storage memory I/F 112, the optical disc drive I/F 113, the operation unit I/F 114, the audio output I/F 115, the serial data I/F 116, the video display I/F 117, the memory card I/F 118, the network I/F 119, the hard disk drive I/F 120, the drive I/F 121, and the microcomputer 241 are connected to the system bus 122. In FIG. 42, components identical to those in FIGS. 9 and 29 are denoted by the same reference numerals, and their description is omitted to avoid repetition.

マイコン２４１は、CPU、ROM、およびRAMにより構成される。マイコン２４１のCPUは、ROMまたはハードディスク１２８に記録されているプログラムにしたがって、操作部I/F１１４からの操作信号などに応じて、編集装置２１３の各部を制御する。 The microcomputer 241 includes a CPU, a ROM, and a RAM. The CPU of the microcomputer 241 controls each unit of the editing device 213 according to an operation signal from the operation unit I / F 114 according to a program recorded in the ROM or the hard disk 128.

例えば、CPUは、図２９のマイコン１９５のCPUと同様に、光ディスクドライブI/F１１３から供給される、光ディスクドライブ４１Ａに装着された光ディスク２１２から読み出されたクリップを一時記憶メモリI/F１１２に供給する。 For example, like the CPU of the microcomputer 195 in FIG. 29, the CPU supplies to the temporary storage memory I/F 112 the clip that was read from the optical disc 212 mounted in the optical disc drive 41A and supplied from the optical disc drive I/F 113.

また、CPUは、マイコン１９５のCPUと同様に、操作信号に応じてエディットリストを作成することにより、非破壊編集を行う。CPUは、マイコン１９５のCPUと同様に、エディットリストを光ディスク２１２に記録させる。 The CPU performs nondestructive editing by creating an edit list in accordance with the operation signal, similarly to the CPU of the microcomputer 195. The CPU records the edit list on the optical disk 212 in the same manner as the CPU of the microcomputer 195.

さらに、CPUは、操作部I/F１１４からの操作信号に応じて、編集結果の電子マークデータを作成する。そして、CPUは、マイコン１９５のCPUと同様に、その電子マークデータを、光ディスク２１２のエディットリスト用NRTファイルに記録させる。 Further, the CPU creates electronic mark data as a result of editing in response to an operation signal from the operation unit I / F 114. Then, the CPU records the electronic mark data in the edit list NRT file of the optical disk 212, similarly to the CPU of the microcomputer 195.

また、CPUは、マイコン１９５のCPUと同様に、操作信号と編集結果の電子マークデータとに基づいて、編集結果の音声のうちの、ユーザにより指定された発言者ＩＤの発言者の発言にダックボイス加工を施すように、エディットリストを変更する。 Also like the CPU of the microcomputer 195, the CPU changes the edit list, based on the operation signal and the electronic mark data of the editing result, so that duck voice processing is applied to the speech, within the audio of the editing result, of the speaker having the speaker ID designated by the user.

さらに、CPUは、マイコン１９５のCPUと同様に、一時記憶メモリI/F１１２から供給されるクリップのうちの音声データを、システムバス１２２を介して音声出力I/F１１５に供給して、クリップの音声をスピーカ１２５から出力させる。また、CPUは、マイコン１９５のCPUと同様に、一時記憶メモリI/F１１２から供給されるクリップのうちの映像データを、システムバス１２２を介して映像表示I/F１１７に供給して、クリップの映像を表示装置１２６に表示させる。RAMには、CPUが実行するプログラムやデータなどが適宜記憶される。 Further, like the CPU of the microcomputer 195, the CPU supplies the audio data of the clip supplied from the temporary storage memory I/F 112 to the audio output I/F 115 via the system bus 122, so that the audio of the clip is output from the speaker 125. Also like the CPU of the microcomputer 195, the CPU supplies the video data of the clip supplied from the temporary storage memory I/F 112 to the video display I/F 117 via the system bus 122, so that the video of the clip is displayed on the display device 126. The RAM stores, as appropriate, the programs executed by the CPU and related data.

図４３は、図４２の編集装置２１３における編集処理部の機能的な構成例を示している。 FIG. 43 shows a functional configuration example of the editing processing unit in the editing device 213 of FIG. 42.

図４３の編集処理部２５０は、エディットリスト作成部２０１とEM作成部２５１により構成される。なお、図４３において、図３０と同一のものには同一の符号を付してあり、説明は繰り返しになるので、省略する。 The edit processing unit 250 in FIG. 43 includes an edit list creation unit 201 and an EM creation unit 251. In FIG. 43, the same components as those in FIG. 30 are denoted by the same reference numerals, and description thereof will be omitted.

EM作成部２５１は、操作部I/F１１４からの操作信号に応じて、編集結果の電子マークデータを作成する。そして、EM作成部２５１は、図３０のEM作成部２０２と同様に、その電子マークデータを、光ディスク２１２のエディットリスト用NRTファイルに記録させるとともに、エディットリスト作成部２０１に供給する。 The EM creation unit 251 creates electronic mark data as a result of editing in response to an operation signal from the operation unit I / F 114. Then, the EM creation unit 251 records the electronic mark data in the edit list NRT file of the optical disc 212 and supplies it to the edit list creation unit 201 in the same manner as the EM creation unit 202 of FIG.

また、EM作成部２５１は、EM作成部２０２と同様に、操作部I/F１１４から供給される操作信号に応じて、編集結果の電子マークデータに記述される、ユーザにより指定された発言者ＩＤが付加された発言者EM(start)に、ダックボイス加工の有無を表す情報を付加する。 Also, like the EM creation unit 202, the EM creation unit 251 adds, in accordance with the operation signal supplied from the operation unit I/F 114, information indicating whether duck voice processing is applied to each speaker EM(start) that is described in the electronic mark data of the editing result and carries the speaker ID designated by the user.
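The flagging step described above can be illustrated with a short Python sketch. The `SpeakerMark` class, its `duck_voice` field, and the `flag_duck_voice` function are hypothetical names chosen for this example; this section does not show the actual electronic mark data format, so the in-memory representation is an assumption.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class SpeakerMark:
    """Hypothetical in-memory form of one speaker EM(start)."""
    frame: int                # frame number on the editing result
    speaker_id: str           # speaker ID added to the mark
    duck_voice: bool = False  # assumed flag: is duck voice processing applied?


def flag_duck_voice(marks: List[SpeakerMark], target_speaker_id: str) -> List[SpeakerMark]:
    """Return new marks in which only the designated speaker's marks are flagged."""
    return [
        replace(mark, duck_voice=(mark.speaker_id == target_speaker_id))
        for mark in marks
    ]


marks = [SpeakerMark(0, "A"), SpeakerMark(120, "B"), SpeakerMark(300, "A")]
flagged = flag_duck_voice(marks, "B")
```

Returning new objects rather than mutating the input keeps the original mark list available, which matches the nondestructive spirit of the editing workflow, although the document does not prescribe this.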

次に、図４４を参照して、ユーザが編集装置２１３を用いて行う編集作業について説明する。 Next, with reference to FIG. 44, an editing operation performed by the user using the editing device 213 will be described.

図４４の表では、編集作業の各ステップの番号に対応付けて、そのステップにおける編集作業の内容、編集装置２１３による主な処理の内容、および、その処理の対象となるデータが記述されている。 In the table of FIG. 44, the content of the editing work in each step, the content of the main processing by the editing device 213, and the data subject to that processing are described in association with the number of each step of the editing work.

図４４に示すように、ステップＳ２１１において、図３５のステップＳ１３１と同様に、ユーザは、編集装置２１３の光ディスクドライブ４１Ａに光ディスク２１２を装着し、操作部１２４を操作して編集画面の表示の指令を行う。このとき、編集装置２１３のエディットリスト作成部２０１は、プロキシファイルのプロキシデータに基づいて、編集画面を表示装置１２６に表示させ、クリップの音声をスピーカ１２５から出力させる。 As shown in FIG. 44, in step S211, as in step S131 of FIG. 35, the user mounts the optical disc 212 in the optical disc drive 41A of the editing device 213 and operates the operation unit 124 to command display of the editing screen. At this time, the edit list creation unit 201 of the editing device 213 displays the editing screen on the display device 126 and outputs the audio of the clip from the speaker 125, based on the proxy data of the proxy file.

ステップＳ２１２において、ユーザは、操作部１２４を操作して、編集画面においてイン点およびアウト点を指定することにより編集を行う。このとき、エディットリスト作成部２０１は、ユーザにより指定されたイン点およびアウト点に基づいて、エディットリストを作成する。そして、エディットリスト作成部２０１は、そのエディットリストを光ディスク２１２のエディットリストファイルに記録させるとともに、EM作成部２５１に供給する。 In step S212, the user operates the operation unit 124 to perform editing by specifying an in point and an out point on the editing screen. At this time, the edit list creation unit 201 creates an edit list based on the in and out points specified by the user. Then, the edit list creation unit 201 records the edit list in the edit list file of the optical disc 212 and supplies it to the EM creation unit 251.
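As a rough illustration of how an edit list could be built from user-specified in and out points without touching the source clips, consider the Python sketch below. The `SubClip` class, its field names, and the half-open frame convention are assumptions made for this example; the edit list file format actually used on the optical disc is not reproduced here.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class SubClip:
    """Hypothetical edit list entry referring to a range of a source clip."""
    clip_name: str     # name of the source clip (the clip itself is never modified)
    in_frame: int      # first frame used (inclusive)
    out_frame: int     # frame at which use stops (exclusive, by assumption)
    result_start: int  # frame position of this sub-clip on the editing result


def build_edit_list(cuts: List[Tuple[str, int, int]]) -> List[SubClip]:
    """Lay the selected ranges end to end on the editing result timeline."""
    edit_list: List[SubClip] = []
    position = 0
    for clip_name, in_frame, out_frame in cuts:
        if out_frame <= in_frame:
            raise ValueError("out point must come after in point")
        edit_list.append(SubClip(clip_name, in_frame, out_frame, position))
        position += out_frame - in_frame
    return edit_list


edit_list = build_edit_list([("clip1", 100, 250), ("clip1", 400, 500)])
```

Because each entry only references frame ranges, the material data stays intact and the editing result exists purely as this list.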

ステップＳ２１３において、ユーザは、操作部１２４を操作して入力画面（図１２）の表示を指令する。このとき、EM作成部２５１は、予めハードディスク１２８に登録されている発言者リストに基づいて、表示装置１２６に入力画面を表示させる。 In step S213, the user operates the operation unit 124 to instruct display of the input screen (FIG. 12). At this time, the EM creation unit 251 displays an input screen on the display device 126 based on a speaker list registered in the hard disk 128 in advance.

ステップＳ２１４において、ユーザは、編集結果の再生を指令する。このとき、EM作成部２５１は、エディットリストに基づいて、光ディスク２１２のクリップファイルから編集結果を構成する素材データを再生する。その結果、編集結果の音声がスピーカ１２５から出力され、映像が表示装置１２６に表示される。 In step S214, the user commands the reproduction of the edited result. At this time, the EM creation unit 251 reproduces material data constituting the editing result from the clip file on the optical disc 212 based on the edit list. As a result, the edited audio is output from the speaker 125 and the video is displayed on the display device 126.

ステップＳ２１５において、ユーザは、クリップの音声を聞き、各発言者の発言の開始時に、入力画面において操作部１２４を操作して、その発言者の発言者ＩＤを入力する。このとき、EM作成部２５１は、再生中の音声に対応するフレームに、入力された発言者ＩＤを付加した発言者EM（start）を付与し、その発言者EM(start)をエディットリスト用NRTファイルの電子マークデータに記述する。 In step S215, the user listens to the audio of the clip and, at the start of each speaker's speech, operates the operation unit 124 on the input screen to enter that speaker's speaker ID. At this time, the EM creation unit 251 assigns a speaker EM(start), with the entered speaker ID added, to the frame corresponding to the audio being played back, and describes that speaker EM(start) in the electronic mark data of the edit list NRT file.

ステップＳ２１６およびＳ２１７の処理は、図３５のステップＳ１３３およびＳ１３４の処理と同様であるので、説明は省略する。 The processing in steps S216 and S217 is the same as the processing in steps S133 and S134 of FIG. 35, so its description is omitted.

次に、図４５のフローチャートを参照して、図４３のEM作成部２５１による編集結果に発言者EM（start）を付与する付与処理について説明する。この付与処理は、例えば、ユーザが操作部１２４を操作することにより、図１２の入力画面の表示を指令したとき開始される。 Next, the assignment processing in which the EM creation unit 251 of FIG. 43 assigns speaker EM(start) marks to the editing result will be described with reference to the flowchart of FIG. 45. This assignment processing starts, for example, when the user operates the operation unit 124 to command display of the input screen of FIG. 12.

ステップＳ２３１において、EM作成部２５１は、予めハードディスク１２８に登録されている発言者リストに基づいて、表示装置１２６に入力画面を表示させる。ステップＳ２３２において、EM作成部２５１は、ユーザにより編集結果の再生が指令されたかどうかを判定する。ステップＳ２３２で、編集結果の再生が指令されていないと判定された場合、EM作成部２５１は、再生が指令されるまで待機する。 In step S 231, the EM creation unit 251 displays an input screen on the display device 126 based on a speaker list registered in the hard disk 128 in advance. In step S232, the EM creation unit 251 determines whether or not the user has instructed playback of the edited result. If it is determined in step S232 that reproduction of the edited result is not instructed, the EM creation unit 251 waits until reproduction is instructed.

一方、ステップＳ２３２で、編集結果の再生が指令されたと判定された場合、ステップＳ２３３において、EM作成部２５１は、その編集結果の再生を開始する。ステップＳ２３４において、EM作成部２５１は、操作部I/F１１４から供給される操作信号に応じて、ユーザにより発言者ＩＤが入力されたかを判定する。 On the other hand, when it is determined in step S232 that reproduction of the editing result has been instructed, in step S233, the EM creation unit 251 starts reproduction of the editing result. In step S234, the EM creation unit 251 determines whether a speaker ID is input by the user in accordance with the operation signal supplied from the operation unit I / F 114.

ステップＳ２３４で、ユーザにより発言者ＩＤが入力されていないと判定された場合、EM作成部２５１は、発言者ＩＤが入力されるまで待機する。また、ステップＳ２３４で、ユーザにより発言者ＩＤが入力されたと判定された場合、ステップＳ２３５において、EM作成部２５１は、その発言者ＩＤの入力に対応する位置である現在再生中のフレームのフレーム番号に基づいて、現在再生中のフレームに、入力された発言者ＩＤが付加された発言者EM(start)を付与し、その発言者EM(start)をエディットリスト用NRTファイルの電子マークデータに記述する。 If it is determined in step S234 that the user has not entered a speaker ID, the EM creation unit 251 waits until a speaker ID is entered. If it is determined in step S234 that the user has entered a speaker ID, then in step S235 the EM creation unit 251 assigns a speaker EM(start), with the entered speaker ID added, to the frame currently being played back, based on the frame number of that frame, which is the position corresponding to the input of the speaker ID, and describes that speaker EM(start) in the electronic mark data of the edit list NRT file.

ステップＳ２３６において、EM作成部２５１は、再生中の編集結果が終端まで再生されたかを判定し、終端まで再生されていないと判定した場合、処理はステップＳ２３４に戻り、上述した処理が繰り返される。 In step S236, the EM creation unit 251 determines whether the editing result being played has been played to the end, and if it is determined that the editing result has not been played to the end, the process returns to step S234 and the above-described processing is repeated.

一方、ステップＳ２３６において、再生中の編集結果が終端まで再生されたと判定された場合、ステップＳ２３７において、EM作成部２５１は、編集結果の再生を終了する。そして処理は終了する。 On the other hand, when it is determined in step S236 that the editing result being reproduced has been reproduced to the end, in step S237, the EM creation unit 251 ends the reproduction of the editing result. Then, the process ends.
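The loop formed by steps S233 through S237 can be sketched in Python as follows. This is a simplified simulation, not the device's implementation: playback is modelled as a frame counter, and user input is modelled as a hypothetical mapping `inputs` from frame number to the speaker ID entered while that frame was playing. The function name and the dictionary layout of a mark are likewise assumptions.

```python
from typing import Dict, List


def assign_speaker_marks(total_frames: int, inputs: Dict[int, str]) -> List[dict]:
    """Simulate assigning speaker EM(start) marks while the editing result plays.

    inputs maps a frame number to the speaker ID the user entered while
    that frame was being played back (a stand-in for real operation signals).
    """
    marks: List[dict] = []
    # S233: playback starts; each loop iteration stands for one played frame.
    for frame in range(total_frames):
        # S234: check whether a speaker ID was entered at this point.
        speaker_id = inputs.get(frame)
        if speaker_id is not None:
            # S235: attach a speaker EM(start) carrying the entered ID
            # to the frame currently being played back.
            marks.append({"frame": frame, "speaker_id": speaker_id})
        # S236: the loop condition plays the role of the end-of-result check.
    # S237: playback has reached the end of the editing result.
    return marks


marks = assign_speaker_marks(10, {0: "A", 4: "B"})
```

In the real device the marks would be written to the electronic mark data of the edit list NRT file; here they are simply collected in a list.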

以上のように、編集装置２１３は、ユーザからの入力に応じて、編集結果に発言者EM(start)を付与するので、編集結果のうちの所望の発言者の音声にダックボイス加工を施す場合に、この発言者EM(start)により、ダックボイス加工を施す音声の区間を容易に認識することができる。 As described above, the editing device 213 assigns speaker EM(start) marks to the editing result in accordance with user input. Therefore, when duck voice processing is to be applied to the voice of a desired speaker in the editing result, the sections of audio to be processed can be easily recognized from these speaker EM(start) marks.
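To make concrete how start marks alone can identify the sections to process, the Python sketch below derives frame ranges for one speaker. It rests on an assumption that is not stated in this passage: a speaker's section is taken to run from that speaker's EM(start) up to the next EM(start) of any speaker, or to the end of the editing result. The function name and the tuple representation of marks are hypothetical.

```python
from typing import List, Tuple


def duck_voice_sections(
    marks: List[Tuple[int, str]],
    target_speaker_id: str,
    total_frames: int,
) -> List[Tuple[int, int]]:
    """Return (start, end) frame ranges, end exclusive, for the target speaker.

    marks is a list of (frame, speaker_id) pairs, one per speaker EM(start).
    Assumption: each section ends where the next EM(start) begins.
    """
    ordered = sorted(marks, key=lambda mark: mark[0])
    sections: List[Tuple[int, int]] = []
    for index, (frame, speaker_id) in enumerate(ordered):
        if index + 1 < len(ordered):
            end = ordered[index + 1][0]
        else:
            end = total_frames
        if speaker_id == target_speaker_id:
            sections.append((frame, end))
    return sections


sections = duck_voice_sections([(0, "A"), (120, "B"), (300, "A")], "A", 450)
```

If speaker EM(end) marks are also assigned, as the description later permits, the end of each section could be read from those marks instead of being inferred.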

なお、図示は省略するが、編集処理部２５０による、音声と映像を非破壊編集する編集処理は、図２３の音声編集処理と同様である。但し、編集処理部２５０による編集処理では、図２３のステップＳ８３とＳ８５において、現在再生中のフレームのフレーム番号が、素材サブクリップとして音声と映像の再生を開始する位置または終了する位置として、エディットリストに記述される。 Although not illustrated, the editing processing by the editing processing unit 250, which nondestructively edits audio and video, is the same as the audio editing processing of FIG. 23. However, in the editing processing by the editing processing unit 250, in steps S83 and S85 of FIG. 23 the frame number of the frame currently being played back is described in the edit list as the position at which playback of the audio and video as a material sub-clip starts or ends.

また、図３７の撮影編集システム２１０では、１つのビデオカメラ２１１によりテレビジョン素材が撮影されたが、複数のビデオカメラによりテレビジョン素材が撮影されるようにしてもよい。 In the shooting and editing system 210 of FIG. 37, the television material is shot by a single video camera 211, but the television material may instead be shot by a plurality of video cameras.

この場合、編集装置２１３は、図１の編集装置４１と同様に、各ビデオカメラで撮影されたクリップを１つの光ディスクに集約する。また、この場合、複数のビデオカメラのそれぞれで音声が取得されるようにしてもよいし、いずれか１つのビデオカメラで音声が取得されるようにしてもよい。いずれか１つのビデオカメラで音声が取得される場合、編集装置２１３は、編集装置４１と同様に、映像と音声を別々に非破壊編集する。 In this case, the editing device 213 aggregates the clips taken by each video camera onto one optical disc, similarly to the editing device 41 in FIG. In this case, the sound may be acquired by each of the plurality of video cameras, or the sound may be acquired by any one of the video cameras. When audio is acquired by any one of the video cameras, the editing device 213 performs nondestructive editing on the video and the audio separately, similarly to the editing device 41.

さらに、上述した説明では、光ディスクにクリップが記録されるものとしたが、クリップが記録される記録媒体は、勿論、光ディスクに限定されない。 Further, in the above description, the clip is recorded on the optical disc. However, the recording medium on which the clip is recorded is not limited to the optical disc.

また、上述した説明では、ハードディスクに発言者リストが記録されるものとしたが、光ディスクなどの記録媒体に、クリップとともに記録されるようにしてもよい。 In the above description, the speaker list is recorded on the hard disk. However, the speaker list may be recorded together with the clip on a recording medium such as an optical disk.

さらに、ダックボイス加工を施した発言の発言者の映像には、モザイクを施すようにしてもよい。 Furthermore, a mosaic may be applied to the video of the speaker whose speech has been subjected to duck voice processing.

また、図２４や図３６の撮影編集システム１７０と図３７の撮影編集システム２１０では、発言者ID（start）が付与されたが、発言者ID（start）と発言者ID（end）の両方が付与されるようにしてもよい。 In addition, in the shooting / editing system 170 in FIGS. 24 and 36 and the shooting / editing system 210 in FIG. 37, the speaker ID (start) is assigned, but both the speaker ID (start) and the speaker ID (end) are assigned. It may be given.

さらに、上述した説明では、ユーザによりダックボイス加工を施す発言の発言者の発言者ＩＤが入力されると、編集結果の電子マークデータに記述されている発言者EM(start)と発言者EM(end)、または、発言者EM（start）に、ダックボイス加工の有無を表す情報が付加されたが、この情報が付加されないようにしてもよい。 Further, in the above description, when the user enters the speaker ID of the speaker whose speech is to be subjected to duck voice processing, information indicating whether duck voice processing is applied is added to the speaker EM(start) and speaker EM(end), or to the speaker EM(start), described in the electronic mark data of the editing result; however, this information need not be added.

なお、本明細書において、プログラム記録媒体に格納されるプログラムを記述するステップは、記載された順序に沿って時系列的に行われる処理はもちろん、必ずしも時系列的に処理されなくとも、並列的あるいは個別に実行される処理をも含むものである。 In this specification, the steps describing the program stored in the program recording medium include not only processing performed in time series in the described order but also processing that is executed in parallel or individually without necessarily being processed in time series.

また、本明細書において、システムとは、複数の装置により構成される装置全体を表すものである。 Further, in this specification, the system represents the entire apparatus constituted by a plurality of apparatuses.

さらに、本発明の実施の形態は、上述した実施の形態に限定されるものではなく、本発明の要旨を逸脱しない範囲において種々の変更が可能である。 Furthermore, the embodiments of the present invention are not limited to the above-described embodiments, and various modifications can be made without departing from the gist of the present invention.

本発明を適用した撮影編集システムの第１の実施の形態の構成例を示す図である。 FIG. 1 is a diagram showing a configuration example of a first embodiment of a shooting and editing system to which the present invention is applied.
図１のビデオカメラのハードウェア構成例を示すブロック図である。 FIG. 2 is a block diagram showing a hardware configuration example of the video camera of FIG. 1.
図１のビデオカメラの撮影処理部の機能的な構成例を示すブロック図である。 FIG. 3 is a block diagram showing a functional configuration example of the photographing processing unit of the video camera of FIG. 1.
図１の光ディスクに記録されているファイルのディレクトリ構造の例を示す図である。 FIG. 4 is a diagram showing an example of the directory structure of the files recorded on the optical disc of FIG. 1.
図４のクリップファイルのフォーマットの例を示す図である。 FIG. 5 is a diagram showing an example of the format of the clip file of FIG. 4.
発言者未定EM（start）と発言者未定EM（end）を記述した電子マークデータの例を示す図である。 FIG. 6 is a diagram showing an example of electronic mark data describing speaker-undetermined EM(start) and speaker-undetermined EM(end).
図１のビデオカメラを用いて行う撮影作業について説明する図である。 FIG. 7 is a diagram explaining the photographing work performed using the video camera of FIG. 1.
図３の撮影処理部による撮影処理について説明するフローチャートである。 FIG. 8 is a flowchart explaining the photographing processing by the photographing processing unit of FIG. 3.
図１の編集装置のハードウェア構成例を示すブロック図である。 FIG. 9 is a block diagram showing a hardware configuration example of the editing device of FIG. 1.
図９の編集装置の編集処理部の機能的な構成例を示すブロック図である。 FIG. 10 is a block diagram showing a functional configuration example of the editing processing unit of the editing device of FIG. 9.
非破壊編集後の光ディスクに記録されているファイルのディレクトリ構造の例を示す図である。 FIG. 11 is a diagram showing an example of the directory structure of the files recorded on the optical disc after nondestructive editing.
入力画面の例を示す図である。 FIG. 12 is a diagram showing an example of the input screen.
発言者EM(start)または発言者EM（end）を記述した電子マークデータの例を示す図である。 FIG. 13 is a diagram showing an example of electronic mark data describing speaker EM(start) or speaker EM(end).
編集対象のクリップと編集結果について説明する図である。 FIG. 14 is a diagram explaining the clips to be edited and the editing result.
編集結果について説明する図である。 FIG. 15 is a diagram explaining an editing result.
図１５の編集結果のエディットリストを示す図である。 FIG. 16 is a diagram showing the edit list of the editing result of FIG. 15.
図１５の編集結果に付与される発言者EM(start)と発言者EM(end)について説明する図である。 FIG. 17 is a diagram explaining the speaker EM(start) and speaker EM(end) assigned to the editing result of FIG. 15.
編集結果に付与された発言者EM(start)と発言者EM(end)を記述した電子マークデータの例を示す図である。 FIG. 18 is a diagram showing an example of electronic mark data describing the speaker EM(start) and speaker EM(end) assigned to the editing result.
ダックボイス加工を施す場合のエディットリストの例を示す図である。 FIG. 19 is a diagram showing an example of the edit list when duck voice processing is applied.
ダックボイス加工を施す場合の編集結果の電子マークデータの例を示す図である。 FIG. 20 is a diagram showing an example of the electronic mark data of the editing result when duck voice processing is applied.
図１の編集装置を用いて行う編集作業について説明する図である。 FIG. 21 is a diagram explaining the editing work performed using the editing device of FIG. 1.
図１０の付加部による付加処理について説明するフローチャートである。 FIG. 22 is a flowchart explaining the addition processing by the addition unit of FIG. 10.
図１０の編集処理部による音声編集処理について説明するフローチャートである。 FIG. 23 is a flowchart explaining the audio editing processing by the editing processing unit of FIG. 10.
本発明を適用した撮影編集システムの第２の実施の形態の構成例を示す図である。 FIG. 24 is a diagram showing a configuration example of a second embodiment of a shooting and editing system to which the present invention is applied.
図２４のビデオカメラのハードウェア構成例を示すブロック図である。 FIG. 25 is a block diagram showing a hardware configuration example of the video camera of FIG. 24.
図２５のビデオカメラにおける撮影処理部の機能的な構成例を示すブロック図である。 FIG. 26 is a block diagram showing a functional configuration example of the photographing processing unit in the video camera of FIG. 25.
図２４のビデオカメラを用いて行う撮影作業について説明する図である。 FIG. 27 is a diagram explaining the photographing work performed using the video camera of FIG. 24.
図２６の撮影処理部による撮影処理の詳細について説明するフローチャートである。 FIG. 28 is a flowchart explaining the details of the photographing processing by the photographing processing unit of FIG. 26.
図２４の編集装置のハードウェア構成例を示すブロック図である。 FIG. 29 is a block diagram showing a hardware configuration example of the editing device of FIG. 24.
図２９の編集装置における編集処理部の機能的な構成例を示すブロック図である。 FIG. 30 is a block diagram showing a functional configuration example of the editing processing unit in the editing device of FIG. 29.
編集対象のクリップと編集結果について説明する図である。 FIG. 31 is a diagram explaining the clips to be edited and the editing result.
編集結果について説明する図である。 FIG. 32 is a diagram explaining an editing result.
第１のクリップの電子マークデータを示す図である。 FIG. 33 is a diagram showing the electronic mark data of the first clip.
編集結果の電子マークデータを示す図である。 FIG. 34 is a diagram showing the electronic mark data of the editing result.
編集装置を用いて行う編集作業について説明する図である。 FIG. 35 is a diagram explaining the editing work performed using the editing device.
図２４の撮影編集システムの他の構成例を示す図である。 FIG. 36 is a diagram showing another configuration example of the shooting and editing system of FIG. 24.
本発明を適用した撮影編集システムの第３の実施の形態の構成例を示す図である。 FIG. 37 is a diagram showing a configuration example of a third embodiment of a shooting and editing system to which the present invention is applied.
図３７のビデオカメラのハードウェア構成例を示すブロック図である。 FIG. 38 is a block diagram showing a hardware configuration example of the video camera of FIG. 37.
図３８のビデオカメラにおける撮影処理部の機能的な構成例を示すブロック図である。 FIG. 39 is a block diagram showing a functional configuration example of the photographing processing unit in the video camera of FIG. 38.
図３７のビデオカメラを用いて行う撮影作業について説明する図である。 FIG. 40 is a diagram explaining the photographing work performed using the video camera of FIG. 37.
図３９の撮影処理部による撮影処理について説明するフローチャートである。 FIG. 41 is a flowchart explaining the photographing processing by the photographing processing unit of FIG. 39.
図３７の編集装置のハードウェア構成例を示すブロック図である。 FIG. 42 is a block diagram showing a hardware configuration example of the editing device of FIG. 37.
図４２の編集装置における編集処理部の機能的な構成例を示すブロック図である。 FIG. 43 is a block diagram showing a functional configuration example of the editing processing unit in the editing device of FIG. 42.
図３７の編集装置を用いて行う編集作業について説明する図である。 FIG. 44 is a diagram explaining the editing work performed using the editing device of FIG. 37.
図４３のEM作成部による付与処理について説明するフローチャートである。 FIG. 45 is a flowchart explaining the assignment processing by the EM creation unit of FIG. 43.

Explanation of symbols

４１編集装置，１１４操作部I/F，１５２エディットリスト作成部，１５３ EM作成部，２０１エディットリスト作成部，２０２ EM作成部，２５１ EM作成部 41 editing device, 114 operation unit I / F, 152 edit list creation unit, 153 EM creation unit, 201 edit list creation unit, 202 EM creation unit, 251 EM creation unit

Claims

An editing device for editing video with audio, comprising:
editing means for editing the video with audio and creating editing information related to the editing result;
creating means for creating edited electronic mark data, which is the electronic mark data of the editing result, based on pre-edit electronic mark data describing an electronic mark that indicates a feature of the audio of the video with audio and to which unique information of the speaker of that audio is added; and
designating means for designating the speaker whose voice, within the audio of the editing result, is to be subjected to predetermined processing,
wherein the editing means includes, in the editing information, based on the edited electronic mark data, information that specifies the voice sections of the speaker designated by the designating means as the voice sections to be subjected to the predetermined processing.

The editing device according to claim 1, wherein the editing means edits the video with audio based on a cut point, designated by the user, that represents a start position or an end position of a section of the video with audio to be included in the editing result, and
the creating means creates the edited electronic mark data by copying, as the edited electronic mark data, the pre-edit electronic mark data of the audio included in the editing result, and by newly describing in the edited electronic mark data an electronic mark at the position on the editing result corresponding to the cut point.

The editing device according to claim 1, wherein the creating means adds information indicating that the predetermined processing is to be performed to the electronic mark, described in the edited electronic mark data, to which the unique information of the speaker designated by the designating means is added.

An editing method for an editing device that edits video with audio, the method comprising the steps of:
editing the video with audio and creating editing information related to the editing result;
creating edited electronic mark data, which is the electronic mark data of the editing result, based on pre-edit electronic mark data describing an electronic mark that indicates a feature of the audio of the video with audio and to which unique information of the speaker of that audio is added;
designating the speaker whose voice, within the audio of the editing result, is to be subjected to predetermined processing; and
including in the editing information, based on the edited electronic mark data, information that specifies the voice sections of the designated speaker as the voice sections to be subjected to the predetermined processing.

A program that causes a computer to perform editing processing for editing video with audio, the editing processing comprising the steps of:
editing the video with audio and creating editing information related to the editing result;
creating edited electronic mark data, which is the electronic mark data of the editing result, based on pre-edit electronic mark data describing an electronic mark that indicates a feature of the audio of the video with audio and to which unique information of the speaker of that audio is added;
designating the speaker whose voice, within the audio of the editing result, is to be subjected to predetermined processing; and
including in the editing information, based on the edited electronic mark data, information that specifies the voice sections of the designated speaker as the voice sections to be subjected to the predetermined processing.