JP2013131871A

JP2013131871A - Editing device, remote controller, television receiver, specific audio signal, editing system, editing method, program, and recording medium

Info

Publication number: JP2013131871A
Application number: JP2011279036A
Authority: JP
Inventors: Takeshi Shibata; 健柴田; Yasuhisa Nogami; 康久野上; Matsuo Kamei; 松雄亀井
Original assignee: Sharp Corp
Current assignee: Sharp Corp
Priority date: 2011-12-20
Filing date: 2011-12-20
Publication date: 2013-07-04

Abstract

PROBLEM TO BE SOLVED: To provide an editing device capable of easily editing plural pieces of audio/video data.
SOLUTION: An audio analysis unit 15 of an editing device 10 extracts a specific audio signal embedded in each piece of audio/video data and identifies a reproduction start point in each piece of audio/video data on the basis of each extracted specific audio signal. A control unit 16 performs control to reproduce each piece of audio/video data from its reproduction start point.

Description

The present invention relates to an editing device for editing audio/video data. The present invention also relates to a television receiver including such an editing device, and to a remote control device for remotely controlling such an editing device.

In recent years, with the digitalization of video cameras, even ordinary users have become able to easily shoot digital video with high image quality and high sound quality. In addition, as video cameras have become cheaper, there is a growing need, even among ordinary users, to shoot a subject from a plurality of angles using a plurality of video cameras and to create an edited video by editing the resulting pieces of audio/video data.

Patent Document 1 discloses an operation device for video cameras and video recording decks that can control a plurality of video recording decks at once.

Patent Document 1: JP-A-10-271431 (published October 9, 1998)

Editing digital audio/video data requires an editing environment such as a personal computer and editing software. Editing also generally requires advanced knowledge and is time-consuming. Many users are therefore averse to editing.

The present invention has been made to solve the above problems, and its main object is to realize an editing device with which a plurality of pieces of audio/video data can be edited easily.

To solve the above problems, an editing device according to the present invention is an editing device for editing a plurality of videos represented by respective pieces of audio/video data, and includes: extraction means for extracting a specific audio signal embedded in each piece of audio/video data; identification means for identifying a reproduction start point in each piece of audio/video data based on each specific audio signal extracted by the extraction means; and control means for cueing each piece of audio/video data so that it is reproduced from its reproduction start point.

According to the editing device configured as described above, the reproduction start point is identified based on the specific audio signal embedded in each piece of audio/video data, and each piece is cued so that it is reproduced from the identified reproduction start point. The pieces of audio/video data can therefore be synchronized easily. Because the user does not need to perform cumbersome work to synchronize them, the audio/video data can be edited easily.

Preferably, the specific audio signal repeatedly includes a fixed frequency signal having a fixed frequency and a variable frequency signal having a variable frequency that changes stepwise, and the identification means identifies the reproduction start point with reference to the fixed frequency signal and the variable frequency signal.

According to the above configuration, the identification means identifies the reproduction start point with reference to the fixed frequency signal and the variable frequency signal included in the specific audio signal, so the reproduction start point can be identified with a simple configuration for frequency analysis.
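To make the structure of such a signal concrete, the following is a minimal sketch that synthesizes one possible specific audio signal. It assumes one reading of the description: each repetition consists of a fixed tone followed by a tone whose frequency rises by one step per repetition. The function name, sample rate, frequencies, and segment length are hypothetical values chosen for illustration; the document does not specify them here.

```python
import math


def make_specific_audio_signal(sample_rate=8000, fixed_hz=1000.0,
                               step_hz=(1200.0, 1400.0, 1600.0),
                               segment_s=0.05):
    """Return samples that alternate a fixed tone with a stepped tone.

    All parameter values are illustrative assumptions.
    """
    n = round(sample_rate * segment_s)  # samples per segment

    def tone(freq_hz):
        return [math.sin(2.0 * math.pi * freq_hz * i / sample_rate)
                for i in range(n)]

    samples = []
    for variable_hz in step_hz:
        samples.extend(tone(fixed_hz))     # fixed frequency signal
        samples.extend(tone(variable_hz))  # variable frequency signal (one step)
    return samples


signal = make_specific_audio_signal()
```

Because the variable frequency differs in every repetition, an analyzer that sees only part of the signal could, under this assumed layout, tell which repetition it is looking at.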

In the editing device according to the present invention, it is preferable that each piece of audio/video data is supplied from a respective external device, and that the control means controls each external device so that each piece of audio/video data is reproduced from its reproduction start point.

According to the editing device configured as described above, the reproduction start point is identified based on the specific audio signal embedded in each piece of audio/video data supplied from each external device, and each external device is controlled so that its audio/video data is reproduced from that reproduction start point. The audio/video data supplied from the external devices can therefore be synchronized easily. Because the user does not need to perform cumbersome work to synchronize the audio/video data supplied from the external devices, the audio/video data can be edited easily.

The editing device preferably further includes editing means for generating edited audio/video data by selecting, at each point in time, one of the pieces of audio/video data being reproduced.

According to the above configuration, the user can easily generate edited audio/video data from the pieces of audio/video data by using the editing means.

A remote control device according to the present invention is a remote control device for remotely controlling an editing device that edits a plurality of videos represented by audio/video data, and includes output means for outputting a specific audio signal to which the editing device refers in order to identify the reproduction start point of each piece of audio/video data.

Because the remote control device configured as described above includes the output means for outputting the specific audio signal to which the editing device refers in order to identify the reproduction start point of each piece of audio/video data, the user can use the remote control device to embed the specific audio signal in part of the audio/video data recorded by each external device.

An editing device that edits such audio/video data can identify the reproduction start point with reference to the specific audio signal, so the user can edit easily.

An editing system according to the present invention includes the editing device, the remote control device, and a plurality of video cameras serving as the external devices.

According to the above configuration, the editing device identifies the reproduction start point based on the specific audio signal embedded in the audio/video data supplied from each video camera, so the user can edit easily.

A television receiver according to the present invention includes the editing device.

Because the television receiver configured as described above includes the editing device, the user can edit easily. In addition, the user does not need to prepare a personal computer or editing software for editing, which improves convenience.

A specific audio signal according to the present invention is a specific audio signal that is embedded in an audio signal and referred to in order to identify a specific position in that audio signal. It repeatedly includes a fixed frequency signal having a fixed frequency and a variable frequency signal having a variable frequency that changes stepwise, and an analysis device that analyzes the audio signal identifies the specific position with reference to the fixed frequency signal and the variable frequency signal.

An analysis device that analyzes the specific audio signal configured as described above identifies the specific position with reference to the fixed frequency signal and the variable frequency signal. The specific audio signal can therefore be used to identify a specific position in an audio signal with a simple configuration.

An editing method according to the present invention is an editing method for editing a plurality of videos represented by respective pieces of audio/video data, and includes: an extraction step of extracting a specific audio signal embedded in each piece of audio/video data; an identification step of identifying a reproduction start point in each piece of audio/video data based on each specific audio signal extracted in the extraction step; and a control step of cueing each piece of audio/video data so that it is reproduced from its reproduction start point.

The editing method configured as described above provides the same effects as the editing device.

A program for causing a computer to operate as each of the means included in the editing device according to the present invention, and a computer-readable recording medium on which such a program is recorded, also fall within the scope of the present invention.

As described above, the editing device according to the present invention is an editing device for editing a plurality of videos represented by respective pieces of audio/video data, and includes: extraction means for extracting a specific audio signal embedded in each piece of audio/video data; identification means for identifying a reproduction start point in each piece of audio/video data based on each specific audio signal extracted by the extraction means; and control means for cueing each piece of audio/video data so that it is reproduced from its reproduction start point.

According to the editing device configured as described above, the user can easily edit audio/video data.

FIG. 1 is a block diagram showing the configuration of an editing device according to an embodiment of the present invention.
FIG. 2 is a block diagram showing the configuration of a remote controller according to an embodiment of the present invention.
FIG. 3 is a block diagram showing the configuration of a digital video camera according to an embodiment of the present invention.
FIG. 4 is a diagram showing the waveform of a specific audio signal according to an embodiment of the present invention.
FIG. 5 is a flowchart showing the flow of edit start point identification processing performed by the editing device according to an embodiment of the present invention.
FIG. 6 is a diagram for explaining an application example of the present invention, showing a case where a track course used at an athletic meet or the like is shot from a plurality of different angles.
FIG. 7 is a diagram for explaining an application example of the present invention, in which (a) shows the digital video cameras and the recorder used in the application example, and (b) shows the waveforms of the specific audio signal recorded by the devices shown in (a).
FIG. 8 is a diagram for explaining an application example of the present invention, showing the pieces of audio data analyzed by the audio analysis unit according to the embodiment and the time intervals to the edit start point calculated as a result of the analysis.
FIG. 9 is a diagram showing an example of a display screen during editing with a television receiver including the editing device according to an embodiment of the present invention.

(Television receiver 30)
A television receiver 30 according to an embodiment of the present invention will be described with reference to FIG. 1. The television receiver 30 includes an editing device 10 according to an embodiment of the present invention, a video processing unit 31, an LCD panel 32, a recording unit 33, an amplifier 34, and a speaker 35. As shown in FIG. 1, the television receiver 30 is operated by a remote controller (remote control device) 40.

(Configuration of editing device 10)
As shown in FIG. 1, the editing device 10 includes a first external input/output unit 11a, a second external input/output unit 11b, a third external input/output unit 11c, a USB interface 12a, an SD card interface 12b, a memory 13, a decoding unit 14, an audio analysis unit 15 (extraction means, identification means), a control unit 16 (control means), an infrared light receiving unit 17, and an editing unit 20 (editing means). The editing unit 20 includes a video selection unit 21, an audio selection unit 22, and an edit data generation unit 23.

The first external input/output unit 11a, the second external input/output unit 11b, and the third external input/output unit 11c are interfaces for connecting external devices to the editing device 10. Each interface receives audio/video data from the external device connected to the editing device 10 and conforms to a standard that allows the connected external device to be controlled.

Examples of such interface standards include HDMI and HDV (IEEE 1394). In this embodiment, the first external input/output unit 11a and the second external input/output unit 11b conform to HDMI, and the third external input/output unit 11c conforms to HDV. This does not limit the embodiment, however; any interface may be used as long as it can carry a control signal for controlling the cueing of audio/video data by the digital video camera 50 described later.

The USB interface 12a is an interface for importing audio/video data from an external device connected via USB, and the SD card interface 12b is an interface for importing audio/video data recorded on an SD card.

Audio/video data imported through the USB interface 12a and the SD card interface 12b is stored in the memory 13. A large-capacity flash memory, for example, can be used as the memory 13.

In the following description, it is assumed that first to third digital video cameras 50a to 50c are connected, as external devices, to the first to third external input/output units, respectively. The first external input/output unit 11a, the second external input/output unit 11b, and the third external input/output unit 11c are also collectively referred to as the external input/output unit 11.

In this embodiment, the pieces of audio/video data recorded in the digital video cameras 50a to 50c are used as an example of the plurality of pieces of audio/video data edited by the editing device 10. The audio/video data edited by the editing device 10 is not limited to this, however. The editing device 10 can also be used to edit a plurality of pieces of audio/video data recorded in advance in the memory 13 or in the recording unit 33 of the television receiver 30.

(Decoding unit 14)
The decoding unit 14 receives the first to third pieces of audio/video data supplied from the first to third digital video cameras 50a to 50c via the external input/output unit 11, as well as audio/video data stored in the memory 13, and generates audio/video signals by decoding each piece of audio/video data. The generated audio/video signals are supplied to the video selection unit 21 and the audio analysis unit 15. Each piece of audio/video data is, for example, an MPEG2 stream or an MPEG4 stream.

In the following, the audio/video signals obtained by decoding the first to third pieces of audio/video data are also referred to as the first to third audio/video signals, respectively.

(Audio analysis unit 15)
The audio analysis unit 15 detects and extracts the specific audio signal from each audio signal decoded by the decoding unit 14, and identifies a synchronization position in each audio signal based on the extracted specific audio signal. The audio analysis unit 15 also supplies synchronization position information representing the identified synchronization position to the control unit 16.

Here, the specific audio signal is a particular audio signal embedded in each audio signal. By analyzing the specific audio signal, the audio analysis unit 15 can identify the synchronization position in each piece of audio/video data. In this embodiment, the synchronization position is used mainly as the starting point of editing, so it is also referred to below as the edit start point (reproduction start point) P. The edit start points P in the first to third pieces of audio/video data are also written as edit start points P1 to P3, respectively.

The specific processing for identifying the edit start point P based on the specific audio signal is described later and is therefore omitted here.
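Purely as an illustration of what locating a fixed frequency signal in decoded audio can look like, the sketch below uses the Goertzel algorithm, a standard single-frequency detector that this document does not name. It is not the edit start point identification processing of the embodiment. The function names, block size, and threshold are hypothetical, and the example input (silence followed by a 1000 Hz tone at 8000 samples per second) is made up.

```python
import math


def goertzel_power(block, sample_rate, target_hz):
    """Squared magnitude of target_hz in block (Goertzel recurrence)."""
    k = 2.0 * math.cos(2.0 * math.pi * target_hz / sample_rate)
    s1 = s2 = 0.0
    for x in block:
        s0 = x + k * s1 - s2
        s2, s1 = s1, s0
    return s1 * s1 + s2 * s2 - k * s1 * s2


def find_tone_start(samples, sample_rate, target_hz,
                    block_size=200, threshold=0.25):
    """Return the time (s) of the first block containing target_hz, or None."""
    for start in range(0, len(samples) - block_size + 1, block_size):
        block = samples[start:start + block_size]
        # Normalize so that a full-scale sine at target_hz gives about 1.0.
        level = goertzel_power(block, sample_rate, target_hz) / (block_size / 2.0) ** 2
        if level >= threshold:
            return start / sample_rate
    return None


# Made-up input: 0.5 s of silence, then 0.1 s of a 1000 Hz tone.
silence = [0.0] * 4000
tone = [math.sin(2.0 * math.pi * 1000.0 * i / 8000) for i in range(800)]
start_s = find_tone_start(silence + tone, 8000, 1000.0)
```

The time resolution of this sketch is one block (25 ms with the values above); a smaller block gives finer timing at the cost of coarser frequency discrimination.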

(Control unit 16)
The control unit 16 is connected to the external input/output unit 11, the decoding unit 14, the audio analysis unit 15, the infrared light receiving unit 17, and the editing unit 20, and controls each of these units.

The infrared light receiving unit 17 receives an operation signal transmitted from the remote controller 40 operated by the user and outputs the received operation signal to the control unit 16. The control unit 16 controls each of the above units according to the operation signal.

Operations that the user performs via the remote controller 40 include, for example, an operation for placing the first to third pieces of audio/video data on standby at the edit start point P (standby operation), an operation for starting editing (edit start operation), and an operation for selecting audio/video data during editing (audio/video data selection operation). By acquiring the operation signal via the infrared light receiving unit 17, the control unit 16 can detect what action the user is requesting of the editing device 10.

Based on the audio/video data selection operation that the user performs via the remote controller 40, the control unit 16 generates a selection signal indicating which of the first to third audio/video signals should be selected, and supplies it to the video selection unit 21, the audio selection unit 22, and the video processing unit 31. More specifically, the selection signal indicates which of the first to third audio/video signals should be selected at each point in time.
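As an illustration of a selection signal that says which signal is selected at each point in time, the following minimal sketch turns a list of selection operations into time segments. The function name, the (time, source index) representation, and the sample values are assumptions made for illustration; the document does not define the format of the selection signal.

```python
def segments_from_selections(selections, end_time):
    """Convert selection operations into (start_s, end_s, source) segments.

    selections: list of (time_s, source_index) pairs sorted by time, where
    source_index is 1, 2, or 3 for the first to third audio/video signals.
    """
    segments = []
    for i, (start, source) in enumerate(selections):
        # A selection stays in effect until the next one, or until the end.
        stop = selections[i + 1][0] if i + 1 < len(selections) else end_time
        if stop > start:
            segments.append((start, stop, source))
    return segments


# Made-up example: signal 1 from 0 s, signal 3 from 4 s, signal 2 from 9.5 s.
timeline = segments_from_selections([(0.0, 1), (4.0, 3), (9.5, 2)], end_time=12.0)
```

Each resulting segment names exactly one source, which matches the idea that one of the three signals is selected at any given time.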

In addition, based on the standby operation that the user performs via the remote controller 40, the control unit 16 supplies control signals to the first to third digital video cameras 50a to 50c via the first to third external input/output units 11a to 11c, and places each piece of audio/video data on standby at the edit start point P.
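The effect of the standby operation can be sketched as follows: once every stream is cued to its own edit start point, starting all of them together keeps them aligned. The camera names, the edit start point values, and the ("seek", name, position) command tuples are hypothetical; the actual control signals depend on the interface standard in use.

```python
def stream_positions(edit_start_points, elapsed_s):
    """Position (s) shown by each stream elapsed_s after the common start."""
    return {name: p + elapsed_s for name, p in edit_start_points.items()}


# Hypothetical edit start points P1 to P3, in seconds from the start of
# each recording.
edit_start_points = {"camera_50a": 12.40, "camera_50b": 3.15, "camera_50c": 27.80}

# One cue command per camera: seek to the edit start point and wait there.
cue_commands = [("seek", name, p) for name, p in edit_start_points.items()]
```

The streams differ only by a constant offset, so no further adjustment is needed after the initial cue as long as all devices play at the same rate.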

(Editing unit 20)
The editing unit 20 includes the video selection unit 21, the audio selection unit 22, and the edit data generation unit 23. Of the first to third audio/video signals decoded by the decoding unit 14, the video components, namely the first to third video signals, are supplied to the video selection unit 21. The audio components, namely the first to third audio signals, are supplied to the audio selection unit 22.

Based on the selection signal supplied from the control unit 16, the video selection unit 21 selects one of the first to third video signals and outputs it to the edit data generation unit 23. The video selection unit 21 also supplies the first to third video signals to the video processing unit 31.

The audio selection unit 22 selects one of the first to third audio signals and outputs it to the edit data generation unit 23 and the amplifier 34.

The video processing unit 31 displays, on the LCD panel 32, each video represented by the first to third video signals supplied from the video selection unit 21. More specifically, the video processing unit 31 divides one screen into a plurality of regions and displays, in the respective regions, each of these videos and the video selected by the user. FIG. 9 shows an example of a screen that the video processing unit 31 displays on the LCD panel 32.

The amplifier 34 amplifies the audio signal supplied from the audio selection unit 22 and outputs the sound represented by that audio signal through the speaker 35.

Here, the audio selection unit 22 selects an audio signal based on the selection signal supplied from the control unit 16. The audio selection unit 22 therefore selects the audio signal that corresponds to the video signal selected by the video selection unit 21. For example, while the video selection unit 21 is selecting the first video signal, the audio selection unit 22 selects the first audio signal.

However, as another selection method, the audio selection unit 22 may always select the same one of the first to third audio signals. For example, the audio selection unit 22 may compare the sound quality of the audio signals and, when the first audio signal has the best sound quality, always select the first audio signal. The audio selection unit 22 may also select audio data stored in the memory 13.
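The two audio selection methods described above can be contrasted in a short sketch. The mode names, the function names, and the numeric quality scores are hypothetical; the document does not say how sound quality is measured.

```python
def best_quality_index(quality_scores):
    """Index of the audio signal with the highest (hypothetical) quality score."""
    return max(quality_scores, key=quality_scores.get)


def select_audio(video_index, mode="follow_video", fixed_index=1):
    """Return the index (1 to 3) of the audio signal to output."""
    if mode == "follow_video":
        return video_index  # audio matches the currently selected video
    if mode == "fixed":
        return fixed_index  # always the same audio signal
    raise ValueError(f"unknown mode: {mode}")


# Made-up scores: the first audio signal is judged best, so it is used
# regardless of which video is selected.
fixed = best_quality_index({1: 0.9, 2: 0.4, 3: 0.7})
audio_while_video_3 = select_audio(3, mode="fixed", fixed_index=fixed)
```

With a fixed audio source, cuts between camera angles do not interrupt the soundtrack, which is one plausible reason to prefer the second method.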

The edit data generation unit 23 generates edited audio/video data by multiplexing the video signal selected by the video selection unit 21 and the audio signal selected by the audio selection unit 22. It also encodes the generated audio/video data, as appropriate, using an encoding method suited to the recording unit 33, and supplies the encoded data to the recording unit 33. The encoded data is, for example, an MPEG2 stream or an MPEG4 stream.

The recording unit 33 records the encoded data supplied from the edit data generation unit 23 on a recording medium.

Examples of the recording medium include an HD (hard disk) built into the television receiver 30, a BD (Blu-ray Disc) inserted in the television receiver 30, and a DLNA-compatible HD connected to the television receiver 30. In this embodiment, the recording medium is not limited to these; any recording medium may be used as long as the edited audio/video data can be recorded on it.

(Remote controller 40)
The remote controller (remote control device) 40 converts an operation selected by the user into an operation signal and transmits the converted operation signal to the infrared light receiving unit 17 of the television receiver 30. Separately from this function of transmitting operation signals, the remote controller 40 has a function of generating a specific audio signal.

FIG. 2 is a block diagram showing the configuration of the remote controller 40. As shown in FIG. 2, the remote controller 40 includes a control unit 41, an operation unit 42, an infrared transmission unit 43, a specific sound generation unit 44, and a speaker 45.

Examples of operations that the user performs with the remote controller 40 include operations that are common for a television receiver, such as changing the channel being viewed and changing the volume. As described later, selecting one of a plurality of sub-images is also an example of an operation that the user performs via the remote controller 40.

Operations specific to the television receiver 30 according to this embodiment include, for example, detecting the edit start point P, starting editing, and selecting audio/video data during editing.

An operation signal indicating the content of the operation that the user performed on the operation unit 42 is supplied to the control unit 41. The control unit 41 interprets the specific operation indicated by the supplied operation signal and, if that operation is one that should be transmitted to the television receiver 30, supplies the operation signal to the infrared transmission unit 43. The infrared transmission unit 43 transmits the operation signal supplied from the control unit 41 to the outside of the remote controller 40 as an infrared signal.

一方、ユーザがリモコン４０から特定音声信号を発生させたい場合、ユーザは、例えば、操作部４２に設けられている特定音声出力ボタンを押下する。すると、操作部４２は、制御部４１に対して、特定音声信号を発生させる旨の操作信号を供給し、制御部４１は、この操作信号を受けると、特定音声信号を発生させる旨の制御信号を特定音声生成部４４に供給する。 On the other hand, when the user wants the remote controller 40 to emit the specific audio signal, the user presses, for example, a specific audio output button provided on the operation unit 42. The operation unit 42 then supplies the control unit 41 with an operation signal requesting generation of the specific audio signal, and the control unit 41, on receiving this operation signal, supplies the specific sound generation unit 44 with a control signal instructing it to generate the specific audio signal.

特定音声生成部４４は、当該制御信号を受けると、特定音声信号を生成し、スピーカー４５を介して出力する。 When receiving the control signal, the specific sound generation unit 44 generates a specific sound signal and outputs it through the speaker 45.

なお、特定音声信号の詳細については、後述するため、ここでは説明を省略する。また、特定音声生成部４４が生成することができる特定音声信号は、１つの波形に限られない。たとえば、特定音声生成部４４は、それぞれ波形の異なる複数の特定音声信号を生成できるように予め設定されていてもよい。このような構成の場合、リモコン４０の操作部４２に、特定音声信号を選択するためのボタンを備えておき、ユーザは、当該ボタンを操作することによって特定音声信号を選択することができる。 Since details of the specific audio signal will be described later, the description thereof is omitted here. Further, the specific sound signal that can be generated by the specific sound generation unit 44 is not limited to one waveform. For example, the specific sound generation unit 44 may be set in advance so as to generate a plurality of specific sound signals having different waveforms. In such a configuration, the operation unit 42 of the remote controller 40 is provided with a button for selecting a specific audio signal, and the user can select the specific audio signal by operating the button.

（デジタルビデオカメラ５０） (Digital video camera 50)
リモコン４０から出力された特定音声信号は、第１〜第３のデジタルビデオカメラ５０ａ〜５０ｃによって、それぞれの映像音声データの一部として記録される。 The specific audio signal output from the remote controller 40 is recorded by the first to third digital video cameras 50a to 50c as part of their respective video/audio data.

テレビジョン受像器３０を用いて音声映像データを編集する際、第１〜第３の外部入出力部には、それぞれ、第１〜第３のデジタルビデオカメラ５０ａ〜５０ｃが接続される。なお、第１〜第３のデジタルビデオカメラ５０ａ〜５０ｃを総称して、デジタルビデオカメラ５０と呼ぶ。 When audio / video data is edited using the television receiver 30, the first to third digital video cameras 50a to 50c are connected to the first to third external input / output units, respectively. The first to third digital video cameras 50a to 50c are collectively referred to as a digital video camera 50.

デジタルビデオカメラ５０としては、以下の条件を満たしていれば、汎用のデジタルビデオカメラを用いることができる。 As the digital video camera 50, a general-purpose digital video camera can be used as long as it satisfies the following conditions.
・テレビジョン受像機３０の備える外部入出力部１１に接続することが可能なインターフェースを備える。 - It has an interface that can be connected to the external input/output unit 11 of the television receiver 30.
・特定音声信号を音声映像データの一部として記録可能である。 - It can record the specific audio signal as part of the audio/video data.

なお、外部入出力部１１に接続することが可能なインターフェースとは、上述したように、例えばＨＤＭＩおよびＨＤＶに適合しているインターフェースが挙げられるが、これらに限定されるものではなく、デジタルビデオカメラ５０による音声映像データの頭出しを制御するための制御信号を伝送できるものであればよい。 As described above, examples of an interface that can be connected to the external input/output unit 11 include interfaces conforming to HDMI and HDV. The interface is not limited to these, however; any interface that can carry a control signal for controlling cueing of the audio/video data by the digital video camera 50 may be used.

本実施形態において、後述するように、特定音声信号の周波数として２０ＫＨｚ以下の周波数領域を用いている。したがって、デジタルビデオカメラ５０は、２０ＫＨｚ以下の周波数領域を音声映像データの一部として記録可能であればよい。 In the present embodiment, as will be described later, a frequency region of 20 KHz or less is used as the frequency of the specific audio signal. Therefore, the digital video camera 50 only needs to be able to record a frequency region of 20 KHz or less as part of the audio video data.

上述の２点の条件は、出願時において流通しているほとんどの汎用デジタルビデオカメラが備えている条件である。本実施形態では、テレビジョン受像器３０を用いて音声映像データを編集するために特別なデジタルビデオカメラを用意する必要はない。例えば、デジタルビデオカメラ５０は、他のデジタルビデオカメラとの間で、音声映像データの同期をとるための特別な通信を行う構成を備えている必要はない。 The above-mentioned two conditions are conditions that are included in most general-purpose digital video cameras distributed at the time of filing. In this embodiment, it is not necessary to prepare a special digital video camera in order to edit the audio / video data using the television receiver 30. For example, the digital video camera 50 does not need to have a configuration for performing special communication with other digital video cameras to synchronize audio / video data.

図３は、デジタルビデオカメラ５０の基本的な構成を示すブロック図である。図３に示すように、デジタルビデオカメラ５０は、制御部５１、操作部５２、撮像素子５３、マイク５４、外部入出力部５５、および、記録媒体である記録部５６を備えている。 FIG. 3 is a block diagram showing a basic configuration of the digital video camera 50. As shown in FIG. 3, the digital video camera 50 includes a control unit 51, an operation unit 52, an image sensor 53, a microphone 54, an external input / output unit 55, and a recording unit 56 that is a recording medium.

ユーザは、操作部５２を介して、録画の開始および停止、画角の調整、フォーカスの調整といった操作を行う。操作部５２は、ユーザが行う操作に対応した信号を制御部５１に供給する。制御部５１は、当該信号に応じて上記の各部を制御する。 The user performs operations such as recording start and stop, angle of view adjustment, and focus adjustment via the operation unit 52. The operation unit 52 supplies a signal corresponding to the operation performed by the user to the control unit 51. The control unit 51 controls each unit described above according to the signal.

撮像素子５３は、フォーカスおよび倍率を調節する光学系から入力される光を、映像データに変換し制御部５１に供給する。 The image sensor 53 converts light input from an optical system that adjusts focus and magnification into video data and supplies the video data to the controller 51.

マイク５４は、デジタルビデオカメラ５０の周辺環境における音声を音声データに変換し、制御部５１に供給する。 The microphone 54 converts sound in the surrounding environment of the digital video camera 50 into sound data and supplies the sound data to the control unit 51.

制御部５１は、撮像素子５３から供給される映像データと、マイク５４から供給される音声データとを多重化することによって、音声映像データを生成する。さらに、生成した音声映像データを、適宜記録部５６に適合した符号化方法によって符号化し、符号化後の音声映像データを記録部５６に供給する。符号化された音声映像データは、例えば、ＭＰＥＧ２ストリームやＭＰＥＧ４ストリームである。 The control unit 51 generates audio / video data by multiplexing the video data supplied from the imaging element 53 and the audio data supplied from the microphone 54. Further, the generated audio / video data is appropriately encoded by an encoding method suitable for the recording unit 56, and the encoded audio / video data is supplied to the recording unit 56. The encoded audio / video data is, for example, an MPEG2 stream or an MPEG4 stream.

記録部５６は、制御部５１から供給される符号化された音声映像データを記録媒体に記録する。記録媒体の例としては、ＨＤ、ＳＤカードまたはＢＤなど、音声映像データを記録可能な記録媒体であれば、いかなる記録媒体でもよい。 The recording unit 56 records the encoded audio/video data supplied from the control unit 51 on a recording medium. Any recording medium capable of recording audio/video data may be used, for example an HD, an SD card, or a BD.

外部入出力部５５は、編集装置１０が備える外部入出力部１１に対応するインターフェースであればよい。たとえば、ＨＤＭＩ、または、ＨＤＶの規格に適合しているインターフェースであればよい。 The external input / output unit 55 may be an interface corresponding to the external input / output unit 11 included in the editing apparatus 10. For example, any interface that conforms to the HDMI or HDV standard may be used.

記録部５６により記録媒体に記録された音声映像データは、外部入出力部５５を介してテレビジョン受像機３０に供給される。また、外部入出力部５５には、テレビジョン受像機３０からの制御信号が入力され、制御部５１は、当該制御信号を参照して、デジタルビデオカメラ５０の備える各部を制御する。 The audio / video data recorded on the recording medium by the recording unit 56 is supplied to the television receiver 30 via the external input / output unit 55. Further, a control signal from the television receiver 30 is input to the external input / output unit 55, and the control unit 51 controls each unit included in the digital video camera 50 with reference to the control signal.

例えば、テレビジョン受像機３０から、音声映像データ上の所定の時間的位置でスタンバイする旨の制御信号を受けた場合、制御部５１は、音声映像データ上の所定の時間的位置まで巻き戻しまたは早送りを行い、当該所定の時間的位置から再生を行うことができるようスタンバイする。 For example, on receiving from the television receiver 30 a control signal instructing it to stand by at a predetermined temporal position in the audio/video data, the control unit 51 rewinds or fast-forwards to that position and stands by so that playback can start from it.

（特定音声信号） (Specific audio signal)
テレビジョン受像機３０の備える音声解析部１５によって各音声信号から抽出される特定音声信号について、図４を参照しながら説明する。特定音声信号は、リモコン４０の備える特定音声生成部４４によって生成され、第１〜第３の音声映像データ、およびメモリ１３に格納された音声データ（または映像音声データ）のそれぞれの一部として記録されている信号である。 The specific audio signal that the audio analysis unit 15 of the television receiver 30 extracts from each audio signal will be described with reference to FIG. 4. The specific audio signal is generated by the specific sound generation unit 44 of the remote controller 40 and is recorded as part of each of the first to third audio/video data and of the audio data (or video/audio data) stored in the memory 13.

特定音声信号は、第１〜第３の音声信号データの各々に対応する音声信号、およびメモリ１３に格納された音声データ（または映像音声データ）に埋め込まれている。特定音声信号は、各音声映像データにおける編集開始点Ｐを特定するために音声解析部１５によって抽出される。 The specific audio signal is embedded in the audio signal corresponding to each of the first to third audio signal data and the audio data (or video / audio data) stored in the memory 13. The specific audio signal is extracted by the audio analysis unit 15 in order to specify the editing start point P in each audio video data.

図４は、本実施形態における特定音声信号の一例を示す波形図である。図４に示すように、特定音声信号は、固定周波数からなる固定周波数信号ＦＦと、段階的に変化していく可変周波数からなる可変周波数信号ＶＦとが繰り返される音声信号である。 FIG. 4 is a waveform diagram showing an example of the specific audio signal in the present embodiment. As shown in FIG. 4, the specific audio signal is an audio signal in which a fixed frequency signal FF having a fixed frequency and a variable frequency signal VF having a variable frequency that changes stepwise are repeated.

図４に示す例においては、固定周波数信号ＦＦは１７．０ＫＨｚの周波数を有している信号である。可変周波数信号ＶＦは１８．０ＫＨｚから１８．９ＫＨｚまで、０．１ＫＨｚ間隔で段階的に変化する信号である。図４に示すように、特定音声信号には、所定の期間連続する固定周波数信号ＦＦと、所定の期間連続する可変周波数信号ＶＦとが含まれている。ここで、上記所定の時間をｔと表記することにする。本実施形態において、ｔ＝０．５秒である。すなわち、本実施形態に係る音声特定信号は、１７．０ＫＨｚ、１８．０ＫＨｚ、１７．０ＫＨｚ、１８．１ＫＨｚ、１７．０ＫＨｚ、１８．２ＫＨｚ、１７．０ＫＨｚ、１８．３ＫＨｚ、・・・、１７．０ＫＨｚ、１８．８ＫＨｚ、１７．０ＫＨｚおよび１８．９ＫＨｚの各周波数を有する各信号が、０．５秒間づつ連続している音声信号であり、特定音声信号の長さは１０秒間である。 In the example shown in FIG. 4, the fixed frequency signal FF has a frequency of 17.0 KHz. The variable frequency signal VF changes stepwise from 18.0 KHz to 18.9 KHz in 0.1 KHz steps. As shown in FIG. 4, the specific audio signal consists of fixed frequency signals FF and variable frequency signals VF, each lasting a predetermined time, denoted t. In this embodiment, t = 0.5 seconds. That is, the specific audio signal according to this embodiment is an audio signal in which signals at 17.0 KHz, 18.0 KHz, 17.0 KHz, 18.1 KHz, 17.0 KHz, 18.2 KHz, 17.0 KHz, 18.3 KHz, ..., 17.0 KHz, 18.8 KHz, 17.0 KHz, and 18.9 KHz follow one another for 0.5 seconds each, and its total length is 10 seconds.
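As a rough illustration of the waveform described above, the following Python sketch lists the frequency segments of the specific audio signal in order. The constant and function names are hypothetical and do not appear in the patent; frequencies are held as integer hertz so that the 0.1 KHz steps do not accumulate rounding error.

```python
# Illustrative parameters taken from the embodiment (names are assumptions).
FF_HZ = 17_000        # fixed frequency signal FF
VF_FIRST_HZ = 18_000  # first variable frequency
VF_LAST_HZ = 18_900   # last variable frequency (f_fVF)
STEP_HZ = 100         # frequency step D
T_SEC = 0.5           # duration t of each segment


def build_schedule(ff=FF_HZ, vf_first=VF_FIRST_HZ, vf_last=VF_LAST_HZ,
                   step=STEP_HZ, t=T_SEC):
    """Return [(frequency_hz, duration_s), ...], alternating FF and VF."""
    schedule = []
    for vf in range(vf_first, vf_last + step, step):
        schedule.append((ff, t))  # fixed frequency segment
        schedule.append((vf, t))  # variable frequency segment
    return schedule


schedule = build_schedule()
total_seconds = sum(duration for _, duration in schedule)
```

With the embodiment's values this yields 20 segments of 0.5 seconds each, which matches the stated total length of 10 seconds.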

編集開始点Ｐの位置は、特定音声信号が終了する位置として定めてもよいし、特定音声信号が終了してから任意時間後の位置として定めてもよい。この任意時間をαと定義する。本実施形態においては、α＝１秒として編集開始点Ｐを定めている（図４参照）。 The position of the edit start point P may be determined as a position where the specific audio signal ends, or may be determined as a position after an arbitrary time after the specific audio signal ends. This arbitrary time is defined as α. In the present embodiment, the editing start point P is determined with α = 1 second (see FIG. 4).

なお、固定周波数信号ＦＦおよび可変周波数信号ＶＦの繰り返し周期は、所定の時間ｔではなく、信号が連続する周期数によって決定してもよい。たとえば、８８５０周期ごとに固定周波数信号ＦＦおよび可変周波数信号ＶＦが繰り返されるように、特定音声信号の波形を定めてもよい。この場合、特定音声信号の長さは１０秒間になる。 Note that the repetition period of the fixed frequency signal FF and the variable frequency signal VF may be determined not by the predetermined time t but by the number of periods in which the signal continues. For example, the waveform of the specific audio signal may be determined so that the fixed frequency signal FF and the variable frequency signal VF are repeated every 8850 periods. In this case, the length of the specific audio signal is 10 seconds.
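The cycle-count variant can be checked numerically. The sketch below, with hypothetical names, assumes every FF and VF segment lasts exactly 8850 cycles and that the frequencies are those of the embodiment; because segment duration then depends on frequency, the total comes out close to, rather than exactly, 10 seconds.

```python
CYCLES_PER_SEGMENT = 8850  # value given in the text


def duration_by_cycles(cycles=CYCLES_PER_SEGMENT, ff_hz=17_000,
                       vf_first_hz=18_000, vf_last_hz=18_900, step_hz=100):
    """Total signal length in seconds when each segment lasts `cycles` cycles."""
    total = 0.0
    for vf_hz in range(vf_first_hz, vf_last_hz + step_hz, step_hz):
        total += cycles / ff_hz  # FF segment: cycles divided by frequency
        total += cycles / vf_hz  # VF segment
    return total


total_seconds = duration_by_cycles()
```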

（編集開始点Ｐの特定） (Identification of editing start point P)
音声解析部１５は、解析対象の音声信号から特定音声信号を抽出すると共に、抽出した特定音声信号を解析し編集開始点Ｐを特定する。音声解析部１５は、特定音声信号において、固定周波数信号ＦＦから可変周波数信号ＶＦへ（ケース１と呼ぶ）、または、可変周波数信号ＶＦから固定周波数信号ＦＦへ（ケース２と呼ぶ）、周波数が変化する音声信号上の時点（周波数変化点とも呼ぶ）を検出する。この周波数変化点をＴ_ｃｈと定義する。 The audio analysis unit 15 extracts the specific audio signal from the audio signal under analysis and analyzes it to identify the editing start point P. Within the specific audio signal, the audio analysis unit 15 detects a point in time at which the frequency changes, either from the fixed frequency signal FF to the variable frequency signal VF (case 1) or from the variable frequency signal VF to the fixed frequency signal FF (case 2). Such a point is called a frequency change point and is denoted T_ch.

本実施形態においてｔ＝０．５秒なので、少なくとも１回のＴ_ｃｈを検出するために、音声解析部１５は０．５秒より長い時間の特定音声信号を解析することが好ましい。これに対応して、解析対象の音声信号には、０．５秒より長い時間の特定音声信号が含まれていることが好ましい。 Since t = 0.5 seconds in this embodiment, the audio analysis unit 15 preferably analyzes more than 0.5 seconds of the specific audio signal so as to detect at least one T_ch. Correspondingly, the audio signal under analysis preferably contains more than 0.5 seconds of the specific audio signal.

特定音声信号は、音声映像データを復号した音声信号に埋め込まれている音声信号であり、他の音声に埋もれている可能性もある。したがって、編集開始点Ｐをより正確に特定するために、音声解析部１５は、複数のＴ_ｃｈを検出し解析することが更に好ましい。このためには、音声解析部１５は１．０秒より長い時間の特定音声信号を解析することが更に好ましい。これに対応して、解析対象の音声信号には、１．０秒より長い時間の特定音声信号が含まれていることが好ましい。 The specific audio signal is embedded in the audio signal obtained by decoding the audio/video data and may be masked by other sounds. To identify the editing start point P more accurately, it is therefore more preferable that the audio analysis unit 15 detect and analyze a plurality of T_ch. To that end, the audio analysis unit 15 more preferably analyzes more than 1.0 second of the specific audio signal. Correspondingly, the audio signal under analysis preferably contains more than 1.0 second of the specific audio signal.
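The following Python sketch illustrates why a window longer than 1.0 second exposes more than one change point. It assumes, as a simplification not stated in the patent, that a dominant frequency has already been estimated for each short analysis frame (for example by a Fourier transform, not shown); the function name and the frame length are hypothetical.

```python
def find_change_points(frame_freqs_hz, frame_sec):
    """Return [(time_s, freq_before_hz, freq_after_hz), ...].

    A change point is reported wherever two consecutive frames
    carry different dominant frequencies.
    """
    changes = []
    for i in range(1, len(frame_freqs_hz)):
        before, after = frame_freqs_hz[i - 1], frame_freqs_hz[i]
        if before != after:
            changes.append((i * frame_sec, before, after))
    return changes


# 1.5 s of signal in 0.1 s frames: FF, then VF at 18.3 kHz, then FF again.
frames = [17_000] * 5 + [18_300] * 5 + [17_000] * 5
changes = find_change_points(frames, frame_sec=0.1)
```

Here the window contains one FF-to-VF transition and one VF-to-FF transition, so two independent estimates of the editing start point can be compared.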

なお、図４に示す音声特定信号において、Ｔ_ｃｈにおける可変周波数信号ＶＦの周波数をｆ_ＶＦと定義する。たとえば、特定音声信号の周波数が１７．０ＫＨｚから１８．３ＫＨｚに変化する場合において、ｆ_ＶＦ＝１８．３ＫＨｚである。特定音声信号の周波数が１８．６ＫＨｚから１７．０ＫＨｚに変化する場合において、ｆ_ＶＦ＝１８．６ＫＨｚである。また、特定音声信号において、最後の可変周波数信号ＶＦが有する周波数をｆ_ｆＶＦと定義する。本実施形態において、ｆ_ｆＶＦ＝１８．９ＫＨｚである。さらに、可変周波数ＶＦの周波数が段階的に変化する際の間隔をＤと定義する。本実施形態において、Ｄ＝０．１ＫＨｚである。 In the specific audio signal shown in FIG. 4, the frequency of the variable frequency signal VF at T_ch is denoted f_VF. For example, when the frequency of the specific audio signal changes from 17.0 KHz to 18.3 KHz, f_VF = 18.3 KHz; when it changes from 18.6 KHz to 17.0 KHz, f_VF = 18.6 KHz. The frequency of the last variable frequency signal VF in the specific audio signal is denoted f_fVF; in this embodiment, f_fVF = 18.9 KHz. The step by which the frequency of the variable frequency signal VF changes is denoted D; in this embodiment, D = 0.1 KHz.

以下、解析対象の音声信号が第１の音声信号であるとして、音声解析部１５が、第１の編集開始点Ｐ_１を特定する処理について図５を参照して説明する。なお、第２、第３の音声信号を解析し、第２、第３の編集開始点Ｐ_２、Ｐ_３を特定する処理についても同様である。 The following describes, with reference to FIG. 5, the process by which the audio analysis unit 15 identifies the first editing start point P_1, taking the first audio signal as the audio signal under analysis. The second and third audio signals are analyzed in the same way to identify the second and third editing start points P_2 and P_3.

図５は、音声解析部１５によって第１の編集開始点Ｐ_１を特定する処理の流れを示すフローチャートである。 FIG. 5 is a flowchart showing the flow of the process by which the audio analysis unit 15 identifies the first editing start point P_1.

（ステップＳ１０１） (Step S101)
音声解析部１５は、まず、復号部１４より第１の音声信号を取得する。 The audio analysis unit 15 first acquires the first audio signal from the decoding unit 14.

（ステップＳ１０２） (Step S102)
続いて、音声解析部１５は、ステップＳ１０１にて取得した第１の音声信号に埋め込まれている特定音声信号を抽出する。 Next, the audio analysis unit 15 extracts the specific audio signal embedded in the first audio signal acquired in step S101.

（ステップＳ１０３） (Step S103)
続いて、音声解析部１５は、ステップＳ１０２にて抽出した特定音声信号において、周波数が変化する時点であるＴ_ｃｈ１を検出する。 Next, in the specific audio signal extracted in step S102, the audio analysis unit 15 detects T_ch1, a point in time at which the frequency changes.

（ステップＳ１０４） (Step S104)
続いて、音声解析部１５は、特定音声信号におけるＴ_ｃｈ１の前後の周波数を比較することによって、Ｔ_ｃｈ１が、ケース１およびケース２のいずれに対応するものであるのかを判定する。 Next, the audio analysis unit 15 compares the frequencies of the specific audio signal before and after T_ch1 to determine whether T_ch1 corresponds to case 1 or case 2.

（ステップＳ１０５） (Step S105)
Ｔ_ｃｈ１がケース１に対応するものである場合、音声解析部１５は、Ｔ_ｃｈ１から編集開始点Ｐ_１までの時間Ｔ_Ｐ１を以下の数式（１）によって決定する。 When T_ch1 corresponds to case 1, the audio analysis unit 15 determines the time T_P1 from T_ch1 to the editing start point P_1 by the following formula (1).

たとえば、α＝１秒であり、特定音声信号の周波数が１７．０ＫＨｚから１８．３ＫＨｚに変化するＴ_ｃｈ１の場合、音声解析部１５は、Ｔ_Ｐ１＝７．５秒と決定する。 For example, when α = 1 second and T_ch1 is the point at which the frequency of the specific audio signal changes from 17.0 KHz to 18.3 KHz, the audio analysis unit 15 determines T_P1 = 7.5 seconds.
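The image of formula (1) is not reproduced in this text. The expression below is a reconstruction, offered as an assumption, that is consistent with the definitions of t, D, f_fVF, f_VF and α given earlier and with the worked example above: at an FF-to-VF change point, the VF segment at f_VF is still entirely ahead, followed by one FF segment and one VF segment for each remaining frequency step.

```latex
T_{P1} = \left( 2\,\frac{f_{fVF} - f_{VF}}{D} + 1 \right) t + \alpha \tag{1}
```

Substituting f_fVF = 18.9 KHz, f_VF = 18.3 KHz, D = 0.1 KHz, t = 0.5 seconds and α = 1 second gives (2 × 6 + 1) × 0.5 + 1 = 7.5 seconds, in agreement with the example.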

（ステップＳ１０６） (Step S106)
一方で、Ｔ_ｃｈ１がケース２に対応するものである場合、音声解析部１５は、Ｔ_Ｐ１を以下の数式（２）によって決定する。 On the other hand, when T_ch1 corresponds to case 2, the audio analysis unit 15 determines T_P1 by the following formula (2).

たとえば、α＝１秒であり、特定音声信号の周波数が１８．６ＫＨｚから１７．０ＫＨｚに変化するＴ_ｃｈ１の場合、音声解析部１５は、Ｔ_Ｐ１＝４．０秒と決定する。 For example, when α = 1 second and T_ch1 is the point at which the frequency of the specific audio signal changes from 18.6 KHz to 17.0 KHz, the audio analysis unit 15 determines T_P1 = 4.0 seconds.
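The image of formula (2) is likewise not reproduced in this text. The following reconstruction is an assumption that agrees with the parameter definitions and with the worked example above: at a VF-to-FF change point, the VF segment at f_VF has just ended, so only one FF segment and one VF segment remain for each remaining frequency step.

```latex
T_{P1} = 2\,\frac{f_{fVF} - f_{VF}}{D}\, t + \alpha \tag{2}
```

Substituting f_fVF = 18.9 KHz, f_VF = 18.6 KHz, D = 0.1 KHz, t = 0.5 seconds and α = 1 second gives 2 × 3 × 0.5 + 1 = 4.0 seconds, in agreement with the example.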

（ステップＳ１０７） (Step S107)
続いて、音声解析部１５は、音声信号上の時点Ｔ_ｃｈ１にＴ_Ｐ１を加算して得られる時点を編集開始点Ｐ_１として特定する。また、音声解析部１５は、当該編集開始点Ｐを示す編集開始点情報を制御部１６に供給する。 Next, the audio analysis unit 15 identifies, as the editing start point P_1, the point in time obtained by adding T_P1 to the point T_ch1 on the audio signal. The audio analysis unit 15 also supplies editing start point information indicating this editing start point to the control unit 16.
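Steps S104 to S107 can be sketched in Python as follows. This is an illustration only: the function names are hypothetical, the formula images are not available in this text, and the arithmetic used is an assumed form chosen to agree with the two worked examples (7.5 seconds and 4.0 seconds). Frequencies are integer hertz.

```python
def time_to_edit_point(freq_before_hz, freq_after_hz, *, ff_hz=17_000,
                       last_vf_hz=18_900, step_hz=100, t=0.5, alpha=1.0):
    """Return T_P, the time from a change point T_ch to the edit start point P."""
    if freq_before_hz == ff_hz and freq_after_hz != ff_hz:
        # Case 1 (FF -> VF): the VF segment at f_VF is still ahead.
        f_vf, extra_segments = freq_after_hz, 1
    elif freq_after_hz == ff_hz and freq_before_hz != ff_hz:
        # Case 2 (VF -> FF): the VF segment at f_VF has just ended.
        f_vf, extra_segments = freq_before_hz, 0
    else:
        raise ValueError("not a transition between FF and VF")
    remaining_steps = (last_vf_hz - f_vf) // step_hz
    # Each remaining step contributes one FF and one VF segment of length t.
    return (2 * remaining_steps + extra_segments) * t + alpha


def edit_start_point(t_ch, freq_before_hz, freq_after_hz, **params):
    """Step S107: add T_P to the change point time T_ch."""
    return t_ch + time_to_edit_point(freq_before_hz, freq_after_hz, **params)
```

For instance, a change from 17.0 KHz to 18.3 KHz detected 12.0 seconds into a recording would place the editing start point at 19.5 seconds under these assumptions.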

このように、音声解析部１５は、音声特定信号の周波数が変化するタイミングであるＴ_ｃｈを検出し、上記式を用いてＴ_Ｐを特定することによって、特定音声信号が埋め込まれている音声信号における編集開始点Ｐを特定することができる。換言すれば、音声解析部１５は、上記の解析を行うことによって上記音声信号に対応する音声映像データにおける編集開始点Ｐを特定することができる。また、上記式を用いて編集開始点Ｐを特定することによって、様々な波形の特定音声信号に対応することができる。すなわち、上述のパラメータｔ、ｆ_ｆＶＦ、Ｄとして様々な値を有する特定音声信号によっても、編集開始点Ｐを特定することができる。 In this way, the audio analysis unit 15 detects T_ch, the timing at which the frequency of the specific audio signal changes, and determines T_P using the above formulas, thereby identifying the editing start point P in the audio signal in which the specific audio signal is embedded. In other words, by performing the above analysis, the audio analysis unit 15 can identify the editing start point P in the audio/video data corresponding to that audio signal. Moreover, because the editing start point P is identified using the above formulas, specific audio signals with various waveforms can be handled; that is, the editing start point P can be identified from specific audio signals having various values of the parameters t, f_fVF, and D.

なお、パラメータｔ、ｆ_ｆＶＦ、Ｄおよびαの具体的な値を、予め定められたものとして、編集装置１０の備えるメモリ１３に格納しておき、音声解析部１５がそれらの値を読み出して上記の解析処理に用いる構成としてもよい。編集装置１０がリモコン４０からそれらの具体的な数値に関する情報を取得し、取得した情報に基づいて、音声解析部１５がそれらの具体的な数値を決定し、上記の解析処理に用いる構成としてもよい。当該構成は、特定音声信号を複数種類用いる構成とする場合に好適である。 Specific values of the parameters t, f_fVF, D, and α may be stored in advance, as predetermined values, in the memory 13 of the editing apparatus 10, and the audio analysis unit 15 may read them out and use them in the above analysis. Alternatively, the editing apparatus 10 may acquire information on those values from the remote controller 40, and the audio analysis unit 15 may determine the values from the acquired information and use them in the above analysis. The latter configuration is suitable when multiple types of specific audio signal are used.

また、音声解析部１５が編集開始点Ｐを特定する方法として、ルックアップテーブル（ＬＵＴ）を用いる方法を採用してもよい。例えば、ｔ、ｆ_ｆＶＦ、およびαとしてそれぞれ予め定められた値を用いる場合、ｆ_ＶＦの各値に対するＴ_Ｐの各値を予め求めておき、音声解析部１５が、これらの値を含むＬＵＴを参照して、ｆ_ＶＦからＴ_Ｐを決定する構成としてもよい。このようなＬＵＴは、例えば、編集装置１０の備えるメモリ１３に格納しておけばよい。ＬＵＴを用いることによって、Ｔ_Ｐの算出に伴う処理量を低減することができるので、処理速度が向上する。 The audio analysis unit 15 may also identify the editing start point P using a look-up table (LUT). For example, when predetermined values are used for t, f_fVF, and α, the value of T_P for each value of f_VF may be calculated in advance, and the audio analysis unit 15 may determine T_P from f_VF by referring to an LUT containing these values. Such an LUT may be stored, for example, in the memory 13 of the editing apparatus 10. Using an LUT reduces the amount of processing needed to calculate T_P, which improves processing speed.
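A look-up table of this kind might be precomputed as in the Python sketch below. The table layout (a dictionary keyed by case and f_VF) and the names are assumptions made for illustration; the patent does not specify how the LUT is organized, and the values are derived from an assumed form of formulas (1) and (2) that matches the worked examples.

```python
def build_lut(vf_first_hz=18_000, last_vf_hz=18_900, step_hz=100,
              t=0.5, alpha=1.0):
    """Precompute T_P for every f_VF, for case 1 and case 2."""
    lut = {}
    for f_vf in range(vf_first_hz, last_vf_hz + step_hz, step_hz):
        remaining_steps = (last_vf_hz - f_vf) // step_hz
        lut[("case1", f_vf)] = (2 * remaining_steps + 1) * t + alpha  # FF -> VF
        lut[("case2", f_vf)] = 2 * remaining_steps * t + alpha        # VF -> FF
    return lut


LUT = build_lut()
# At analysis time a dictionary lookup replaces the arithmetic.
t_p = LUT[("case1", 18_300)]
```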

特定音声信号を特徴付ける固定周波数および可変周波数は、汎用のデジタルビデオカメラによって記録可能な周波数、例えば、２０ＫＨｚ以下であることが好ましい。また、デジタルビデオカメラが記録可能な周波数の範囲内において、特定音声信号の周波数は高い周波数帯に属することが好ましい。特定音声信号は、音声映像データにおける時間的な位置を特定するための信号である。特定音声信号の周波数が高い程、編集開始点Ｐをより精度よく特定することができる。 The fixed frequency and variable frequency that characterize the specific audio signal are preferably frequencies that can be recorded by a general-purpose digital video camera, for example, 20 KHz or less. Moreover, it is preferable that the frequency of the specific audio signal belongs to a high frequency band within a frequency range that can be recorded by the digital video camera. The specific audio signal is a signal for specifying a temporal position in the audio video data. As the frequency of the specific audio signal is higher, the editing start point P can be specified with higher accuracy.

（特定音声信号の変形例） (Modified example of specific audio signal)
特定音声信号が連続する時間は１０秒間に限られず、任意の時間に設定することが可能である。特定音声信号が連続する時間を上述の例よりも長く設定する場合、例えば、次の２つの方法を用いればよい。 The duration of the specific audio signal is not limited to 10 seconds and can be set to any length. To make the specific audio signal longer than in the above example, the following two methods may be used, for example.

１つ目の方法は、可変周波数信号ＶＦの範囲（１８．０ＫＨｚから１８．９ＫＨｚ）、および、周波数間隔Ｄは変化させずに、各固定周波数信号ＦＦおよび各可変周波数信号ＶＦの時間ｔを長くする方法である。本実施形態においてｔ＝０．５秒であるが、ｔ＝１．０秒とすることによって、特定音声信号は２０秒間になる。 The first method is to increase the time t of each fixed frequency signal FF and each variable frequency signal VF without changing the range of the variable frequency signal VF (18.0 KHz to 18.9 KHz) and the frequency interval D. It is a method to do. In this embodiment, t = 0.5 seconds, but by setting t = 1.0 seconds, the specific audio signal becomes 20 seconds.

もう１つの方法は、ｔ＝０．５秒は変化させずに、可変周波数信号ＶＦが順次変化する回数を増やす方法である。可変周波数信号ＶＦの周波数を１８．０ＫＨｚから１９．９ＫＨｚまでＤ＝０．１ＫＨｚで変化させることによって、可変周波数信号ＶＦの周波数は２０回変化することになる。この場合も、特定音声信号は２０秒間になる。 Another method is to increase the number of times that the variable frequency signal VF changes sequentially without changing t = 0.5 seconds. By changing the frequency of the variable frequency signal VF from 18.0 KHz to 19.9 KHz at D = 0.1 KHz, the frequency of the variable frequency signal VF changes 20 times. Also in this case, the specific audio signal is 20 seconds.

また、ｔ＝０．５秒であり、可変周波数信号ＶＦの周波数範囲が１８．０ＫＨｚから１８．９ＫＨｚであっても、Ｄ＝０．０５ＫＨｚとすることによって、可変周波数信号ＶＦの変化する回数を２０回にすることができる。この場合も、特定音声信号は２０秒間になる。 Further, even when t = 0.5 seconds and the frequency range of the variable frequency signal VF is 18.0 KHz to 18.9 KHz, by setting D = 0.05 KHz, the number of times the variable frequency signal VF changes can be set. Can be 20 times. Also in this case, the specific audio signal is 20 seconds.

一般に、特定音声信号の時間が長ければユーザの利便性を高めることができる。ユーザにとって使い勝手が良いように、特定音声信号の具体的な波形を定めておけばよい。 In general, a longer specific audio signal is more convenient for the user. The specific waveform of the specific audio signal should be chosen so that it is easy for the user to use.
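The two lengthening methods described above (a longer segment time t, or a wider range of variable frequencies) can be checked with a short calculation. The helper below is a hypothetical sketch that assumes the signal always consists of one FF segment and one VF segment per variable frequency, each lasting t.

```python
def signal_length(vf_first_hz, vf_last_hz, step_hz, t):
    """Total length in seconds of the alternating FF/VF signal."""
    vf_count = (vf_last_hz - vf_first_hz) // step_hz + 1
    return 2 * vf_count * t  # one FF and one VF segment per frequency


baseline = signal_length(18_000, 18_900, 100, 0.5)    # embodiment
longer_t = signal_length(18_000, 18_900, 100, 1.0)    # first method
wider_range = signal_length(18_000, 19_900, 100, 0.5) # second method
```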

＜適用例＞ <Application example>
編集装置１０を備えるテレビジョン受像器３０を用いた編集処理の特徴は、各音声映像データに埋め込まれている特定音声信号を解析することによって、複数の音声映像データの同期をとることにある。このため、テレビジョン受像器３０を用いて複数の音声映像データを編集するための前工程として、各音声映像データを録画および録音する段階において、特定音声信号を各音声映像データに埋め込んでおく（録音しておく）。 A feature of the editing process using the television receiver 30 including the editing apparatus 10 is that a plurality of audio/video data are synchronized by analyzing the specific audio signal embedded in each of them. Therefore, as a preliminary step to editing a plurality of audio/video data with the television receiver 30, the specific audio signal is embedded (recorded) in each audio/video data at the stage where that data is recorded.

以下では、図６〜図９を参照しながら、テレビジョン受像器３０を用いた編集の対象となる複数の音声映像データを録画および録音する方法の一例について説明する。以下では、編集の対象となる複数の音声映像データが、同場面を異なる角度から撮影したものである場合を例にとり説明を行うが、これは本実施形態を限定するものではない。 The following describes, with reference to FIGS. 6 to 9, an example of a method for recording the plurality of audio/video data to be edited using the television receiver 30. The description takes as an example the case where the plurality of audio/video data to be edited capture the same scene from different angles, but this does not limit the present embodiment.

図６は、運動会等で用いられるトラックコースを互いに異なる複数の角度から撮影する場合を示す図である。 FIG. 6 is a diagram illustrating a case where a track course used in an athletic meet or the like is photographed from a plurality of different angles.

図６に示す第１のデジタルビデオカメラ５０ａおよび第２のデジタルビデオカメラ５０ｂは、撮影位置が固定されているデジタルビデオカメラである。第１のデジタルビデオカメラ５０ａは、広い画角での撮影を行うためのものであり、運動会全体の雰囲気を中心に撮影するためのものである。一方、第２のデジタルビデオカメラ５０ｂは、狭い画角での撮影を行うためのものである。すなわち、走者の顔をアップにして撮影したり、一人の走者に合わせてズーミングを調整しながら撮影したりするためのものである。第３のデジタルビデオカメラ５０ｃは、ユーザの手持ちのカメラであり、撮影位置および撮影する画角は特に定められておらず、ユーザが自由に移動しながら撮影するためのものである。 The first digital video camera 50a and the second digital video camera 50b shown in FIG. 6 are digital video cameras whose shooting positions are fixed. The first digital video camera 50a shoots with a wide angle of view and mainly captures the overall atmosphere of the athletic meet. The second digital video camera 50b, on the other hand, shoots with a narrow angle of view; that is, it is for shooting close-ups of a runner's face or for shooting while adjusting the zoom to follow a single runner. The third digital video camera 50c is a camera held by the user; its shooting position and angle of view are not fixed, and it is for shooting while the user moves about freely.

また、図７（ａ）に示すように、第１〜第３のデジタルビデオカメラ５０ａ〜５０ｃに加えて、録音機６０を用いて、運動会の音声を別途録音するものとする。 Further, as shown in FIG. 7A, in addition to the first to third digital video cameras 50a to 50c, the sound of the athletic meet is separately recorded using the recorder 60.

図７（ｂ）は、リモコン４０から発せられる特定音声信号、第１のデジタルビデオカメラ５０ａによって録音される特定音声信号、第２のデジタルビデオカメラ５０ｂによって録音される特定音声信号、第３のデジタルビデオカメラ５０ｃによって録音される特定音声信号、および、録音機６０によって録音される特定音声信号の各波形を示す図である。 FIG. 7B shows the waveforms of the specific audio signal emitted from the remote controller 40 and of the specific audio signals recorded by the first digital video camera 50a, the second digital video camera 50b, the third digital video camera 50c, and the recorder 60.

図７（ｂ）に示すように、ユーザは、リモコン４０を、第１〜第３のデジタルビデオカメラ５０ａ〜５０ｃ、および録音機６０の近傍に順次近づけることによって、各機器に特定音声信号の少なくとも一部分を録音させる。 As shown in FIG. 7B, the user sequentially brings the remote controller 40 closer to the vicinity of the first to third digital video cameras 50a to 50c and the recorder 60, so that at least a specific audio signal is transmitted to each device. Record a part.

ここで、各機器に録音される特定音声信号には、図７（ｂ）に示すように、少なくとも一つの周波数変化点（上述のＴ_ｃｈ）とその前後の周波数が含まれる。 Here, as shown in FIG. 7B, the specific audio signal recorded by each device contains at least one frequency change point (T_ch described above) and the frequencies before and after it.

編集装置１０の備える音声解析部１５は、第１〜第３のデジタルビデオカメラ５０ａ〜５０ｃ、および録音機６０に記録された音声データを解析する。 The audio analysis unit 15 included in the editing apparatus 10 analyzes audio data recorded in the first to third digital video cameras 50 a to 50 c and the recorder 60.

図８は、音声解析部１５によって解析される第１〜第３のデジタルビデオカメラ５０ａ〜５０ｃ、および録音機６０に記録された音声データ、並びに、解析の結果算出される編集開始点Ｐまでの時間的間隔を示す図である。 FIG. 8 shows audio data recorded in the first to third digital video cameras 50a to 50c and the recorder 60 analyzed by the audio analyzing unit 15, and up to an editing start point P calculated as a result of the analysis. It is a figure which shows a time interval.

図８に示すように、音声解析部１５は、第１のデジタルビデオカメラ５０ａが記録した音声映像データに含まれる特定音声信号を、上述の処理によって解析し、周波数１８ＫＨｚから周波数１７ＫＨｚへの周波数変化点から編集開始点Ｐまでの時間的間隔を１０秒（ｓｅｃ）と算出する。同様に、音声解析部１５は、第２のデジタルビデオカメラ５０ｂが記録した音声映像データに含まれる特定音声信号を、上述の処理によって解析し、周波数１８．１ＫＨｚから周波数１７ＫＨｚへの周波数変化点から編集開始点Ｐまでの時間的間隔を９秒（ｓｅｃ）と算出する。同様に、音声解析部１５は、第３のデジタルビデオカメラ５０ｃが記録した音声映像データに含まれる特定音声信号を、上述の処理によって解析し、周波数１８．３ＫＨｚから周波数１７ＫＨｚへの周波数変化点から編集開始点Ｐまでの時間的間隔を７秒（ｓｅｃ）と算出する。同様に、音声解析部１５は、録音機６０が記録した音声データに含まれる特定音声信号を、上述の処理によって解析し、周波数１８．８ＫＨｚから周波数１７ＫＨｚへの周波数変化点から編集開始点Ｐまでの時間的間隔を２秒（ｓｅｃ）と算出する。 As shown in FIG. 8, the audio analysis unit 15 analyzes the specific audio signal included in the audio video data recorded by the first digital video camera 50a by the above-described processing, and changes the frequency from the frequency 18 KHz to the frequency 17 KHz. The time interval from the point to the editing start point P is calculated as 10 seconds (sec). Similarly, the audio analysis unit 15 analyzes the specific audio signal included in the audio video data recorded by the second digital video camera 50b by the above-described processing, and starts from the frequency change point from the frequency 18.1 KHz to the frequency 17 KHz. The time interval to the edit start point P is calculated as 9 seconds (sec). Similarly, the audio analysis unit 15 analyzes the specific audio signal included in the audio video data recorded by the third digital video camera 50c by the above-described processing, and starts from the frequency change point from the frequency 18.3 KHz to the frequency 17 KHz. The time interval to the editing start point P is calculated as 7 seconds (sec). Similarly, the voice analysis unit 15 analyzes the specific voice signal included in the voice data recorded by the recorder 60 by the above-described processing, and from the frequency change point from the frequency 18.8 KHz to the frequency 17 KHz to the editing start point P. Is calculated as 2 seconds (sec).
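The four intervals quoted for FIG. 8 can be reproduced with a few lines of Python. The sketch below is illustrative: the device labels and function name are hypothetical, and the calculation uses an assumed form of the VF-to-FF (case 2) formula with t = 0.5 seconds, D = 0.1 KHz, f_fVF = 18.9 KHz and α = 1 second.

```python
def interval_after_vf_to_ff(f_vf_hz, last_vf_hz=18_900, step_hz=100,
                            t=0.5, alpha=1.0):
    """Time from a VF -> FF change point to the edit start point P."""
    remaining_steps = (last_vf_hz - f_vf_hz) // step_hz
    return 2 * remaining_steps * t + alpha


# f_VF observed just before the change to 17 kHz in each recording (FIG. 8).
observed_f_vf_hz = {
    "camera_50a": 18_000,
    "camera_50b": 18_100,
    "camera_50c": 18_300,
    "recorder_60": 18_800,
}
intervals = {name: interval_after_vf_to_ff(f_vf)
             for name, f_vf in observed_f_vf_hz.items()}
```

Adding each interval to the time of the corresponding change point within its own recording gives the position of the editing start point P in that recording, which is what allows the recordings to be aligned.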

ユーザが編集装置１０に対して、リモコン４０を介して、編集スタンバイを指示することによって、編集装置１０の備える制御部１６は、第１〜第３のデジタルビデオカメラに対して、音声解析部１５によって特定された編集開始点Ｐの位置まで、映像音声データ（録音機６０については音声データ、以下同様）を早送りまたは巻き戻しさせると共に、当該編集開始点Ｐの位置で、各音声映像データの再生を一時停止させることによって、各音声映像データの頭出しを行う。また、制御部１６は、録音機６０によって記録され、メモリ１３に格納されている音声データを、編集開始点Ｐの位置から再生するよう頭出しを行う。 When the user instructs editing standby to the editing apparatus 10 via the remote controller 40, the control unit 16 included in the editing apparatus 10 performs the audio analysis unit 15 on the first to third digital video cameras. The video / audio data (audio data for the recorder 60, the same applies hereinafter) is fast-forwarded or rewound to the position of the editing start point P specified by the above, and each audio / video data is reproduced at the position of the editing start point P. Is paused to cue each audio-video data. In addition, the control unit 16 performs cueing so that the audio data recorded by the recording device 60 and stored in the memory 13 is reproduced from the position of the editing start point P.

ここで、上述の説明から明らかなように、編集開始点Ｐは、各音声映像データにおいて、同一の時点、例えば、運動会の開会式がスタートする時点を指している。 Here, as is clear from the above description, the editing start point P indicates the same time point in each audio video data, for example, the time point when the opening ceremony of the athletic meet starts.

ユーザが編集装置１０に対して、リモコン４０を介して、編集開始を指示することによって、制御部１６は、第１〜第３のデジタルビデオカメラおよび録音機６０に対して、各音声映像データを、編集開始点Ｐから再生させる。 When the user instructs the editing apparatus 10, via the remote controller 40, to start editing, the control unit 16 causes the first to third digital video cameras and the recorder 60 to play back their respective audio/video data from the editing start point P.

FIG. 9 shows an example of the display screen during editing work using the television receiver 30 equipped with the editing device 10. As shown in FIG. 9, the first to third digital video cameras 50a to 50c are connected to the first to third external input/output units 11a to 11c of the television receiver 30, respectively, and the SD card 61, on which audio data has been recorded by the recorder 60, is connected to the SD card interface 12b.

Also as shown in FIG. 9, the video data from the first to third digital video cameras 50a to 50c are each displayed as sub-images on the LCD panel 32 of the television receiver 30. The user selects one of the sub-images by performing an audio-video data selection operation via the remote controller 40. The video data corresponding to the selected sub-image is displayed as the main image. In the example of FIG. 9, sub-image c is selected as the main image. The currently selected sub-image is preferably highlighted with a surrounding frame, as shown in FIG. 9.

The user can also switch the selected image at any time by performing the audio-video data selection operation. In this case, it is preferable to place a frame around the sub-image scheduled for selection and to switch the main image to that sub-image when the user presses the enter button on the remote controller 40. FIG. 9 shows the case in which sub-image a is scheduled for selection.

The frame around the currently selected sub-image and the frame around the sub-image scheduled for selection are preferably displayed so that they can be told apart, for example by using different colors or different line thicknesses. Alternatively, as shown in FIG. 9, the frame around the currently selected sub-image may be drawn with a solid line and the frame around the sub-image scheduled for selection with a dotted line.

The video selection unit 21 selects the video data corresponding to the sub-image selected by the user at each point in time (that is, the main image), and the edited data generation unit 23 outputs the video data selected at each point in time as the edited audio-video data. As described above, the audio data included in the edited audio-video data may be the audio data contained in the audio-video data from the first to third digital video cameras 50a to 50c, may be the audio data from the recorder 60, or may be other audio data used as background music.
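The selection of one video source at each point in time can be illustrated by turning the user's selection events into a list of segments. This is only a sketch under assumptions: the function name `build_edit_list`, the use of seconds measured from the editing start point P, and the tuple layout are all hypothetical and do not describe how the edited data generation unit 23 is actually implemented.

```python
def build_edit_list(selections, end_s):
    """Turn (time_s, source) selection events into (start_s, end_s, source) segments.

    Times are measured from the editing start point P. Each selection stays
    in effect until the next selection or until end_s.
    """
    events = sorted(selections, key=lambda event: event[0])
    segments = []
    for index, (start_s, source) in enumerate(events):
        next_s = events[index + 1][0] if index + 1 < len(events) else end_s
        if next_s > start_s:  # skip zero-length segments
            segments.append((start_s, next_s, source))
    return segments


# Example: sub-image c is the main image first, then a, then b.
edit_list = build_edit_list([(0.0, "50c"), (12.5, "50a"), (30.0, "50b")], end_s=45.0)
for segment in edit_list:
    print(segment)
```

Since all sources are played back from the same point P, a segment boundary at a given time refers to the same instant of the event in every source.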

As described above, the editing device 10 according to the present embodiment can automatically cue the audio-video data from a plurality of digital video cameras, so the user can easily carry out editing work based on synchronized audio-video data. Moreover, since the digital video cameras need no special configuration for synchronization, no extra cost is incurred.

In the example above, the editing device 10 performs editing by directly using the audio-video data supplied from the first to third digital video cameras 50a to 50c, which are external devices connected to the editing device 10. The present embodiment, however, is not limited to this.

For example, the editing device 10 can also be used to edit a plurality of pieces of audio-video data recorded in advance in the memory 13. As another example, the editing device 10 can be used to edit a plurality of pieces of audio-video data recorded in advance in a recording unit. The recording unit may be the recording unit 33 of the television receiver 30, or may be an external recording unit connected to the editing device 10 via, for example, the USB interface 12a.

In such cases, the memory 13 may hold a large number of pieces of audio-video data with different shooting dates and times. The editing device 10 is preferably configured to store these pieces of audio-video data in association with their shooting dates and times. The editing device 10 may then refer to the shooting date and time associated with each piece, search for pieces of audio-video data having substantially the same shooting date and time, and identify the editing start point P by extracting the specific audio signal from the pieces found. The editing device 10 may then reproduce each piece from the identified editing start point P and perform the editing described above. With this configuration, the user can easily carry out editing work even when editing a large number of pieces of audio-video data with different shooting dates and times stored in at least one of the memory 13, the recording unit 33, and the external recording unit.
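The search for recordings with substantially the same shooting date and time could look like the sketch below. The text does not say how close two shooting times must be to count as "substantially the same", so the 10-minute tolerance is an assumption, as are the function name `group_by_shooting_time` and the file names.

```python
from datetime import datetime, timedelta


def group_by_shooting_time(clips, tolerance=timedelta(minutes=10)):
    """Group (name, shot_at) pairs whose shooting times lie close together.

    A clip joins the current group when it was shot within `tolerance`
    (an assumed threshold) of the previous clip in that group.
    """
    groups = []
    for name, shot_at in sorted(clips, key=lambda clip: clip[1]):
        if groups and shot_at - groups[-1][-1][1] <= tolerance:
            groups[-1].append((name, shot_at))
        else:
            groups.append([(name, shot_at)])
    return groups


# Hypothetical stored recordings with their shooting dates and times.
clips = [
    ("cam_a_001", datetime(2011, 10, 9, 9, 0, 5)),
    ("cam_b_014", datetime(2011, 10, 9, 9, 1, 40)),
    ("cam_c_003", datetime(2011, 10, 9, 8, 58, 30)),
    ("cam_a_002", datetime(2011, 11, 3, 14, 20, 0)),
]

for group in group_by_shooting_time(clips):
    print([name for name, _ in group])
```

Each group with more than one member is then a candidate set for extracting the specific audio signal and identifying the editing start point P.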

(Appendix 1)
The above description uses digital audio-video data as an example, but this does not limit the present embodiment. As is clear from the above description, the editing process performed by the editing device 10 according to the present embodiment can also be applied to analog audio-video data. Analog video cameras may therefore be used in place of the first to third digital video cameras 50a to 50c.

(Appendix 2)
Each block of the editing device 10 described above may be realized in hardware by a logic circuit formed on an integrated circuit (IC chip), or may be realized in software using a CPU (Central Processing Unit).

In the latter case, the device includes a CPU that executes the instructions of a program realizing each function, a ROM (Read Only Memory) that stores the program, a RAM (Random Access Memory) into which the program is loaded, and a storage device (recording medium) such as a memory that stores the program and various data. The object of the present invention can also be achieved by supplying the device with a recording medium on which the program code (executable program, intermediate code program, or source program) of the control program for the device, which is software realizing the functions described above, is recorded in computer-readable form, and by having its computer (or CPU or MPU) read and execute the program code recorded on the recording medium.

Examples of the recording medium include tapes such as magnetic tapes and cassette tapes; disks, including magnetic disks such as floppy (registered trademark) disks and hard disks and optical disks such as CD-ROM, MO, MD, DVD, and CD-R; cards such as IC cards (including memory cards) and optical cards; semiconductor memories such as mask ROM, EPROM, EEPROM, and flash ROM; and logic circuits such as PLDs (Programmable Logic Devices) and FPGAs (Field Programmable Gate Arrays).

Each of the above devices may also be configured to be connectable to a communication network, and the program code may be supplied via the communication network. The communication network is not particularly limited as long as it can transmit the program code. For example, the Internet, an intranet, an extranet, a LAN, ISDN, a VAN, a CATV communication network, a virtual private network, a telephone network, a mobile communication network, or a satellite communication network can be used. The transmission medium constituting the communication network is likewise not limited to a specific configuration or type as long as it can transmit the program code. For example, wired media such as IEEE 1394, USB, power line carrier, cable TV lines, telephone lines, and ADSL (Asymmetric Digital Subscriber Line) lines can be used, as can wireless media such as infrared (for example IrDA or a remote controller), Bluetooth (registered trademark), IEEE 802.11 wireless, HDR (High Data Rate), NFC (Near Field Communication), DLNA (Digital Living Network Alliance), mobile phone networks, satellite links, and terrestrial digital networks.

The present invention is not limited to the embodiments described above. Various modifications are possible within the scope of the claims, and embodiments obtained by appropriately combining technical means disclosed in different embodiments are also included in the technical scope of the present invention.

The present invention can be suitably applied as an editing device for editing audio-video data.

Description of Symbols
10 Editing device
11a First external input/output unit
11b Second external input/output unit
11c Third external input/output unit
12a USB interface
12b SD card interface
13 Memory
14 Decoding unit
15 Audio analysis unit (extraction means, identification means)
16 Control unit (control means)
17 Infrared light receiving unit
20 Editing unit (editing means)
21 Video selection unit
22 Audio selection unit
23 Edited data generation unit
30 Television receiver
31 Video processing unit
32 LCD panel
33 Recording unit
34 Amplifier
35 Speaker
40 Remote controller (remote control device)
50a First digital video camera (external device)
50b Second digital video camera (external device)
50c Third digital video camera (external device)

Claims

1. An editing device for editing a plurality of videos each represented by audio-video data, comprising:
extraction means for extracting a specific audio signal embedded in each piece of audio-video data;
identification means for identifying a reproduction start point in each piece of audio-video data on the basis of the specific audio signal extracted by the extraction means; and
control means for cueing each piece of audio-video data so that it is reproduced from the reproduction start point.

2. The editing device according to claim 1, wherein
the specific audio signal repeatedly includes a fixed-frequency signal having a fixed frequency and a variable-frequency signal having a variable frequency that changes in steps, and
the identification means identifies the reproduction start point with reference to the fixed-frequency signal and the variable-frequency signal.

3. The editing device according to claim 1 or 2, wherein
each piece of audio-video data is supplied from a respective external device, and
the control means controls each external device so as to reproduce each piece of audio-video data from the reproduction start point.

4. The editing device according to any one of claims 1 to 3, further comprising
editing means for generating edited audio-video data by selecting, at each point in time, one of the pieces of audio-video data being reproduced.

5. A remote control device for remotely controlling an editing device that edits a plurality of videos each represented by audio-video data, comprising
output means for outputting a specific audio signal to which the editing device refers in order to identify a reproduction start point of each piece of audio-video data.

6. An editing system comprising the editing device according to claim 3 or 4, the remote control device according to claim 5, and a plurality of video cameras serving as the external devices.

7. A television receiver comprising the editing device according to claim 1.

8. A specific audio signal that is embedded in an audio signal and referred to in order to identify a specific position in the audio signal, the specific audio signal
including a fixed-frequency signal having a fixed frequency and a variable-frequency signal having a variable frequency that changes in steps, wherein
an analysis device that analyzes the audio signal identifies the specific position with reference to the fixed-frequency signal and the variable-frequency signal.

9. An editing method for editing a plurality of videos each represented by audio-video data, comprising:
an extraction step of extracting a specific audio signal embedded in each piece of audio-video data;
an identification step of identifying a reproduction start point in each piece of audio-video data on the basis of the specific audio signal extracted in the extraction step; and
a control step of cueing each piece of audio-video data so that it is reproduced from the reproduction start point.

10. A program for causing a computer to operate as each of the means of the editing device according to any one of claims 1 to 4.

11. A computer-readable recording medium on which the program according to claim 10 is recorded.