JP2018170678A

JP2018170678A - Live video processing system, live video processing method, and program

Info

Publication number: JP2018170678A
Application number: JP2017067451A
Authority: JP
Inventors: 真史庄司; Sanefumi Shoji; 佑介小林; Yusuke Kobayashi; 貴久篠木; Takahisa Shinoki
Original assignee: Livearth Co Ltd
Current assignee: Livearth Co Ltd
Priority date: 2017-03-30
Filing date: 2017-03-30
Publication date: 2018-11-01

Abstract

PROBLEM TO BE SOLVED: To switch automatically and appropriately among the cameras recording a live video of musical instruments or the like being played, in accordance with the music being played. SOLUTION: The system detects the beats of the music from input audio signals, detects a feature amount of the audio signal for each musical instrument or player, and determines the instrument or player whose detected feature amount has the strongest correlation with its recorded maximum feature amount. For each interval corresponding to a beat, the video of the instrument or player so determined is selected. SELECTED DRAWING: Figure 1

Description

The present invention relates to a live video processing system, a live video processing method, and a program suitable for recording live video of, for example, concerts, debates, and theatrical performances.

Conventionally, live video of a concert or similar event is recorded by placing several cameras and shooting from various angles. For example, to record live video of four performers playing on a stage, four cameras are prepared, one for each performer, and a switching operator produces the finished live video by switching among the four camera feeds.

When an operator switches camera feeds in this way, the operator must switch to the appropriate camera at every moment according to the piece and the instruments being played. Switching to the appropriate camera in sync with the performance therefore requires a reasonably skilled operator. In addition, the operation must continue for a long time, from the start to the end of recording, which places a heavy burden on the switching operator.

It is therefore desirable to develop a video switching device that can switch camera feeds automatically, with no switching operator.
For example, Patent Document 1 describes a technique for slide shows in which the displayed images are switched in synchronization with the playback of music data. The technique analyzes the rhythm of the music data and sets the image-switching timing according to the analyzed rhythm.

Patent Document 1: JP 2007-25242 A

Applying the technique of Patent Document 1 makes it possible to analyze the rhythm of music data and match the image-switching timing of a slide show to that rhythm. However, when switching among several prepared videos, it is not easy to determine automatically which video is the appropriate one to switch to.

Consider four performers playing four different instruments on a stage. One conceivable switching method is to record each instrument with its own microphone and, at each point in time, switch to the video of the loudest instrument. However, some instruments, such as drums and keyboard instruments, produce relatively loud sounds, while others, such as stringed instruments, produce relatively quiet ones. If video is selected uniformly by loudness, the loud instruments will be shown most of the time, which is undesirable for an actual live video.
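The sketch below implements the naive loudest-microphone rule to make the problem concrete. The channel order and level values are hypothetical. They were chosen only to show how a loud instrument such as the drums monopolizes the selection.

```python
def loudest_channel(levels):
    """Return the index of the channel whose level is highest."""
    return max(range(len(levels)), key=lambda i: levels[i])

names = ["bass", "drums", "keyboard", "piano"]

# Hypothetical levels (arbitrary units) at three successive moments.
timeline = [
    [52, 78, 60, 55],
    [50, 80, 58, 70],  # the piano swells, but the drums are still louder
    [55, 76, 66, 62],
]

picks = [names[loudest_channel(levels)] for levels in timeline]
print(picks)  # ['drums', 'drums', 'drums']
```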

An object of the present invention is to provide a video processing system, a video processing method, and a program that can automatically and appropriately switch cameras for live video of musical instruments or the like being played, in accordance with the music being played.

The video processing system of the present invention includes the following units:
a video switching unit that switches among a plurality of video signals captured by a plurality of cameras provided for the respective instruments or players;
a beat detection unit that detects the beats of a piece of music from an input audio signal;
a feature amount detection unit that detects a feature amount for each instrument or player from the input audio signal;
a feature amount comparison unit that records the maximum feature amount detected for each instrument or player and, for each section set on the basis of the detected beats, determines the instrument or player whose detected feature amount has the strongest correlation with its maximum feature amount; and
a camera selection unit that causes the video switching unit to select, for each set section, the video signal of the instrument or player determined to have the strongest correlation.

The video processing method of the present invention includes the following processes:
a beat detection process that detects the beats of a piece of music from an input audio signal;
a feature amount detection process that detects a feature amount for each instrument or player from the input audio signal;
a feature amount comparison process that records the maximum feature amount detected for each instrument or player and, for each section set on the basis of the detected beats, determines the instrument or player whose detected feature amount has the strongest correlation with its maximum feature amount; and
a camera selection process that, from among a plurality of video signals captured by cameras provided for the respective instruments or players, selects for each set section the video signal of the instrument or player determined to have the strongest correlation.

The program of the present invention causes a computer to execute each process of the above video processing method as a procedure.

According to the present invention, the video switches to the instrument or player whose feature amount has the highest correlation, section by section, where the sections follow the beats of the music. During a performance, the video therefore switches automatically to the instrument or player that deserves attention at each moment, and appropriate live video can be recorded.

FIG. 1 is a diagram showing an example configuration of the entire system according to an embodiment of the present invention.
FIG. 2 is a block diagram showing an example configuration of the control device, the video switching device, and the video processing device that make up the video processing system according to the embodiment.
FIG. 3 is a frequency characteristic diagram showing an example of feature amount detection according to the embodiment.
FIG. 4 is a flowchart showing the flow of processing according to the embodiment.
FIG. 5 is an explanatory diagram showing an example of camera selection and effect states according to the embodiment.

An example of an embodiment of the present invention (hereinafter "this example") is described below with reference to FIGS. 1 to 5.

[1. Example of overall system configuration]
FIG. 1 shows an overview of the entire shooting system of this example.
A bass 1, a drum 2, a keyboard 3, and a piano 4 are placed on a stage 5, and a different player plays each of the instruments 1 to 4. Microphones M1 to M4 are placed at the respective instruments 1 to 4 to pick up their performance sounds. Cameras C1 to C4 are placed near the stage 5 (or on it) to shoot the instruments 1 to 4 and their players.
Each of the cameras C1 to C4 is a video camera installed at a fixed position to capture the performance.
This arrangement of the microphones M1 to M4 and the cameras C1 to C4 is only an example. Additional microphones and cameras may be placed on or near the stage 5, such as a camera that shoots the entire stage 5 or a microphone that picks up sound from the entire stage 5.

The audio signals from the four microphones M1 to M4 are supplied to an audio mixer 6. The mixer combines them into one audio signal, using either preset mixing levels or levels adjusted as needed during the performance. The mixed audio signal is supplied, for example via an output terminal 6a, to a speaker (not shown) for output. It is also supplied to a recording device 8 and a distribution device 9, described later.
In addition, the individual audio signals from the four microphones M1 to M4 received by the audio mixer 6 are supplied, unmixed, to a control device 20 of a video processing system 10.

The video processing system 10 consists of the control device 20, a video switching device 30, and a video processing device 40.
The control device 20 is a computer. It analyzes the audio signals from the four microphones M1 to M4 and, based on the results, controls which of the videos captured by the cameras C1 to C4 is selected and which effects are applied. The control performed by the control device 20 is described in detail later.

The video switching device 30 receives the four video signals captured by the cameras C1 to C4, switches among them, and outputs one selected video signal. The video processing device 40 applies effect processing, such as zooming and color conversion, to the video signal output by the video switching device 30 and outputs the processed video signal.

The video signal output by the video processing device 40 of the video processing system 10 is supplied to the following devices:
the projector device 7, which projects an image based on the supplied video signal;
the recording device 8, which records the supplied video signal; and
the distribution device 9, which distributes the supplied video signal externally.
When the recording device 8 records or the distribution device 9 distributes, the audio signal output by the audio mixer 6 is recorded or distributed as well. Depending on the purpose and form of the live video recording, not all three devices are necessarily required. A configuration with any one of them, or any two in combination, is also conceivable.

[2. Example configuration of the video processing system]
FIG. 2 is a block diagram showing the functional configuration of the video processing system 10.
The control device 20 analyzes the audio signals from the four microphones M1 to M4 and controls the video switching device 30 based on the analysis results. As described above, the control device 20 is a computer. Its functions are realized when the computer's arithmetic processing function executes a program that performs the analysis and related processing. Configuring the control device 20 as a computer is only an example; it may instead be built from dedicated hardware.

As shown in FIG. 2, the control device 20 includes an audio input unit 21, which receives the four-channel audio signals carrying the performance sounds of the instruments 1 to 4. The audio input unit 21 serves, for example, as an analog-to-digital (AD) converter that converts the analog audio signals into digital audio signals.
The audio input unit 21 then mixes the converted digital audio signals and supplies the mix to the beat detection unit 22. The beat detection unit 22 detects the beats of the piece as a whole from periodic changes in the level of the audio signal. It supplies the detected beat timing data to the feature amount comparison unit 25.
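The publication says only that the beat detection unit 22 finds beats from periodic level changes of the mixed signal. The sketch below assumes one common way to do this: autocorrelation of an onset-strength envelope. The frame size and tempo range are hypothetical parameters. A practical beat tracker would also estimate beat phase, not just the period.

```python
import math

def energy_envelope(signal, frame_size):
    """RMS level of consecutive, non-overlapping frames."""
    return [
        math.sqrt(sum(x * x for x in signal[i:i + frame_size]) / frame_size)
        for i in range(0, len(signal) - frame_size + 1, frame_size)
    ]

def estimate_beat_period(signal, sample_rate, frame_size=100,
                         min_bpm=60.0, max_bpm=180.0):
    """Estimate the beat period (seconds) from periodic level changes.

    Onset strength is the positive frame-to-frame rise of the RMS envelope;
    the beat period is the autocorrelation lag of that onset curve with the
    highest score inside the allowed tempo range.
    """
    env = energy_envelope(signal, frame_size)
    onset = [max(0.0, b - a) for a, b in zip(env, env[1:])]
    frame_rate = sample_rate / frame_size
    min_lag = max(1, int(frame_rate * 60.0 / max_bpm))
    max_lag = min(int(frame_rate * 60.0 / min_bpm), len(onset) - 1)
    best_lag, best_score = min_lag, -1.0
    for lag in range(min_lag, max_lag + 1):
        score = sum(onset[i] * onset[i + lag] for i in range(len(onset) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag / frame_rate

# Synthetic test signal: a short 200 Hz burst every 0.5 s (120 BPM) for 4 s.
sr = 8000
signal = [0.0] * (4 * sr)
for k in range(8):
    start = k * sr // 2
    for n in range(start, start + 400):
        signal[n] = math.sin(2 * math.pi * 200 * n / sr)

period = estimate_beat_period(signal, sr)
print(period, period / 2)  # 0.5 0.25  (beat and half-beat length in seconds)
```

Halving the estimated period gives the half-beat section length used later in this example.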

The digital audio signal of each channel from the audio input unit 21 is also supplied to a fast Fourier transform (FFT) unit 23. The FFT unit 23 applies the fast Fourier transform to each channel's audio signal separately. From the result, it detects the intensity (level) or amplitude in three bands: low (200 Hz band), middle (1200 Hz band), and high (8000 Hz band).
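The band analysis can be sketched as follows. The three center frequencies come from the text. The sample rate and the 50 ms frame are assumptions. So is computing one DFT coefficient per band instead of the full FFT the publication names; this keeps the example short and dependency-free.

```python
import math

BANDS_HZ = (200.0, 1200.0, 8000.0)  # f1 (low), f2 (middle), f3 (high)

def band_intensities(frame, sample_rate, bands_hz=BANDS_HZ):
    """Amplitude of one channel's frame at each band's center frequency.

    Evaluates a single DFT coefficient per band instead of a full FFT. When a
    band frequency is a multiple of sample_rate / len(frame), the result is
    the magnitude of that FFT bin scaled by 2 / len(frame).
    """
    n = len(frame)
    out = []
    for f in bands_hz:
        w = 2.0 * math.pi * f / sample_rate
        re = sum(x * math.cos(w * i) for i, x in enumerate(frame))
        im = -sum(x * math.sin(w * i) for i, x in enumerate(frame))
        out.append(2.0 * math.hypot(re, im) / n)  # sine of amplitude A -> A
    return out

# 50 ms frame at 48 kHz (20 Hz bin spacing) holding a 1200 Hz tone of amplitude 0.5.
sr, n = 48000, 2400
frame = [0.5 * math.sin(2 * math.pi * 1200 * i / sr) for i in range(n)]
print([round(v, 3) for v in band_intensities(frame, sr)])  # [0.0, 0.5, 0.0]
```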

FIG. 3 shows an example of the result of analyzing the audio signals at a certain moment with the FFT unit 23. Its four panels show the frequency analysis characteristic of each instrument's performance sound:
FIG. 3A: characteristic f_A of the bass 1, picked up by the microphone M1;
FIG. 3B: characteristic f_B of the drum 2, picked up by the microphone M2;
FIG. 3C: characteristic f_C of the keyboard 3, picked up by the microphone M3;
FIG. 3D: characteristic f_D of the piano 4, picked up by the microphone M4.
In FIGS. 3A to 3D, the vertical axis represents intensity (level) and the horizontal axis represents frequency. The frequency bands f1, f2, and f3 indicate the low band (200 Hz band), the middle band (1200 Hz band), and the high band (8000 Hz band), respectively.
In this way, the FFT unit 23 obtains per-band intensity data for each channel's audio signal and supplies the data to the feature amount detection unit 24.

From the intensity data of each channel's audio signal, the feature amount detection unit 24 obtains feature amounts indicating the intensities of the low band f1, the middle band f2, and the high band f3. This feature amount is, for example, a vector whose elements are the intensities of the individual frequency bands. The feature amount data for the three bands f1, f2, and f3 of each channel are supplied to the feature amount comparison unit 25.
The feature amount comparison unit 25 passes the maximum values among the supplied feature amounts of the three bands f1, f2, and f3 of each channel to the maximum feature amount recording unit 28, which records them. The recording unit 28 updates this record, for example, every half beat from the start of a piece, and the record is reset when the piece ends.
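The following sketch shows the feature vector and the running maximum record. The publication leaves open how the "maximum" of a three-band vector is taken. This example assumes a band-by-band (element-wise) maximum. The class and method names are hypothetical.

```python
class MaxFeatureRecorder:
    """Per-channel running maximum of (f1, f2, f3) feature vectors.

    Assumption: the maximum is kept band by band (element-wise); the
    publication does not fix this detail.
    """

    def __init__(self, num_channels, num_bands=3):
        self.num_channels = num_channels
        self.num_bands = num_bands
        self.reset()

    def reset(self):
        """Clear the record, e.g. when a piece of music ends."""
        self.maxima = [[0.0] * self.num_bands for _ in range(self.num_channels)]

    def update(self, features):
        """Fold one half-beat's feature vectors (one per channel) into the record."""
        for ch, vec in enumerate(features):
            self.maxima[ch] = [max(m, v) for m, v in zip(self.maxima[ch], vec)]
        return self.maxima

rec = MaxFeatureRecorder(num_channels=2)
rec.update([[0.2, 0.5, 0.1], [0.9, 0.3, 0.0]])
rec.update([[0.4, 0.1, 0.3], [0.5, 0.6, 0.2]])
print(rec.maxima)  # [[0.4, 0.5, 0.3], [0.9, 0.6, 0.2]]
rec.reset()        # end of the piece
```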

For each channel, the feature amount comparison unit 25 then detects the correlation between the maximum feature amount recorded in the maximum feature amount recording unit 28 and the real-time feature amount supplied by the feature amount detection unit 24. It compares the correlation values of the four channels and ranks the channels in descending order of correlation. This ranking is done for each section based on the beats of the piece being played, as detected by the beat detection unit 22. For example, the comparison unit 25 divides the piece into half-beat sections using the detected beats and ranks the correlation values in each half-beat section. The feature amount may be taken from the audio signal at one specific moment within the half-beat section and the correlation determined from it. Alternatively, feature amounts may be detected continuously over the half-beat section and the correlation determined from that sequence.
The ranking data determined by the feature amount comparison unit 25 are supplied to the camera selection unit 26. The correlation value data are supplied to the effect selection unit 27.
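The text does not define the correlation measure. It says only that the value is high when the current feature amount is close to the recorded maximum and low when it is small and far from it. This sketch therefore assumes a hypothetical closeness score, 1 - ||max - current|| / ||max||, clipped at zero. Half-beat sections are derived by inserting the midpoint between detected beat times. All function names are illustrative.

```python
import math

def half_beat_boundaries(beat_times):
    """Insert the midpoint between consecutive beats to get half-beat sections."""
    out = []
    for a, b in zip(beat_times, beat_times[1:]):
        out.extend([a, (a + b) / 2.0])
    if beat_times:
        out.append(beat_times[-1])
    return out

def closeness(current, maximum):
    """Hypothetical correlation score in [0, 1]: 1 when current equals the
    recorded maximum, falling toward 0 as it moves away from it."""
    norm_max = math.sqrt(sum(m * m for m in maximum))
    if norm_max == 0.0:
        return 0.0
    dist = math.sqrt(sum((m - c) ** 2 for m, c in zip(maximum, current)))
    return max(0.0, 1.0 - dist / norm_max)

def rank_channels(current_features, max_features):
    """Channel indices ordered from strongest to weakest correlation."""
    scores = [closeness(c, m) for c, m in zip(current_features, max_features)]
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order, scores

print(half_beat_boundaries([0.0, 0.5, 1.0]))  # [0.0, 0.25, 0.5, 0.75, 1.0]
maxima = [[0.9, 0.6, 0.2], [0.4, 0.5, 0.3]]
current = [[0.45, 0.3, 0.1], [0.4, 0.5, 0.3]]
order, scores = rank_channels(current, maxima)
print(order)  # [1, 0]
```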

Each time ranking data are supplied, the camera selection unit 26 chooses among the videos of the four cameras C1 to C4. For example, suppose that in a certain half-beat section the correlation value from the microphone M2, which picks up the drum 2, ranks first. The video of the camera C2, which shoots the drum 2, is then selected for that half-beat section.
However, the camera selection unit 26 also monitors the selection history of the four cameras C1 to C4. To avoid one camera's video being selected continuously, it may choose a camera ranked second or lower in correlation even when another camera ranks first.
The camera selection data obtained by the camera selection unit 26 are supplied to the video switching unit 31 of the video switching device 30.

The effect selection unit 27 acquires which camera was selected by the camera selection unit 26. It also acquires the correlation value of the feature amount of the audio signal of the instrument shown in that camera's video. It then selects an effect to apply to the video according to that correlation value and outputs the resulting effect selection data. Various effects may be selected, for example zooming in on the center of the video or rendering the video in unusual colors.
Using the correlation value to select the effect is only one example. A feature value obtained from the audio signal, a value obtained by image analysis of the video signal, or the like may be used instead. The effect selection unit 27 may also combine various values obtained from the audio and video signals, or select the effect at random.
The effect selection data obtained by the effect selection unit 27 are supplied to the video processing unit 41 of the video processing device 40.
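One way the effect selection unit 27 could map a correlation value to an effect is sketched below. The effect names, the thresholds, and the direction of the mapping are all hypothetical. The publication says only that the choice may depend on the correlation value or on other audio or video values, or may be random.

```python
import random

# Hypothetical effect table; names and thresholds are illustrative only.
EFFECTS = ("none", "center_zoom", "sepia")

def select_effect(correlation, rng=None, randomize=False):
    """Pick an effect for the selected camera's video.

    Assumed rule: a very strong correlation (the instrument is near its own
    peak) gets a center zoom, a moderate one gets no effect, and a weak one
    (e.g. a camera chosen only to avoid monotony) gets a color effect.
    With randomize=True the choice ignores the correlation entirely.
    """
    if randomize:
        return (rng or random).choice(EFFECTS)
    if correlation >= 0.9:
        return "center_zoom"
    if correlation >= 0.5:
        return "none"
    return "sepia"

print(select_effect(0.95), select_effect(0.6), select_effect(0.2))
```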

The video switching device 30 includes a video switching unit 31 that receives the video signals captured by the four cameras C1 to C4. The video switching unit 31 switches to the video signal designated by the camera selection unit 26 and supplies it to the video processing device 40.
The video processing device 40 includes a video processing unit 41 that applies effect processing to the video signal switched by the video switching unit 31. The video processing unit 41 applies the effect designated by the effect selection unit 27 and outputs the processed video signal. When instructed to apply no effect, it outputs the video signal from the video switching unit 31 unchanged.
The video signal output by the video processing unit 41 is supplied to the projector device 7, the recording device 8, and the distribution device 9.

[3. Flow of video selection processing]
FIG. 4 is a flowchart showing the flow of the video selection and effect selection processing performed under the control of the control device 20 of the video processing system 10.
First, the beat detection unit 22 performs beat detection. Each time it detects a section of fixed beat length (here, a half beat), the feature amount comparison unit 25 acquires the feature amounts indicating the intensities of the bands f1, f2, and f3 (FIG. 3) from the feature amount detection unit 24 (step S11). The maximum feature amount recording unit 28 then records the maximum feature amount of each instrument's audio signal (step S12). As the performance progresses from the start of the piece, this record is updated whenever a new maximum occurs.

Next, the feature amount comparison unit 25 calculates a correlation value for each instrument's audio signal from the maximum feature amount recorded in the maximum feature amount recording unit 28 and the current feature amount from the feature amount detection unit 24 (step S13). This correlation value is, for example, high when the current feature amount is close to the maximum feature amount and low when the current feature amount is small and far from it.
The feature amount comparison unit 25 then performs a feature amount comparison process that ranks the four audio channels, corresponding to the four instruments 1 to 4, in descending order of correlation (step S14). This ranking is done for each half-beat section of the piece, based on the beats detected by the beat detection unit 22.

Next, the camera selection unit 26 determines whether selecting the camera video of the instrument ranked first in step S14 is appropriate (step S15). For example, if one camera's video has remained selected for some time, selecting it again is judged inappropriate. Likewise, even when the cameras are being switched frequently, a camera's video is judged inappropriate if that camera is being chosen relatively often.

If selecting the first-ranked instrument's camera video is judged appropriate in step S15 (YES in step S15), the camera selection unit 26 instructs the video switching unit 31 to select that camera's video (step S16). If it is judged inappropriate (NO in step S15), the camera selection unit 26 instructs the video switching unit 31 to select a suitable video from the cameras ranked second or lower (step S17). In step S17, for example, the camera ranked second in correlation is selected. Alternatively, regardless of the correlation values, the camera selected least often within a fixed recent period may be chosen.
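Steps S15 to S17 can be sketched as a small stateful selector. It assumes that camera index i shoots the instrument on channel i, as the microphone M2 and the camera C2 both belong to the drum 2 in this example. The run-length and share limits, the window size, and the class itself are hypothetical, since the publication gives no concrete thresholds.

```python
from collections import Counter, deque

class CameraSelector:
    """Sketch of steps S15 to S17; every threshold here is hypothetical."""

    def __init__(self, num_cameras, max_run=4, window=16, max_share=0.5,
                 fallback="second"):
        self.num_cameras = num_cameras
        self.max_run = max_run        # longest allowed unbroken run of one camera
        self.max_share = max_share    # largest allowed share of a full window
        self.fallback = fallback      # "second" or "least_used"
        self.history = deque(maxlen=window)

    def _run_length(self, cam):
        run = 0
        for c in reversed(self.history):
            if c != cam:
                break
            run += 1
        return run

    def _is_suitable(self, cam):
        """S15: reject a camera held too long or chosen too often recently."""
        if self._run_length(cam) >= self.max_run:
            return False
        if len(self.history) == self.history.maxlen:
            share = Counter(self.history)[cam] / len(self.history)
            if share > self.max_share:
                return False
        return True

    def select(self, ranking):
        """ranking: camera indices from highest to lowest correlation."""
        top = ranking[0]
        if self._is_suitable(top):
            cam = top                                   # S16: take the top rank
        elif self.fallback == "least_used":
            counts = Counter(self.history)              # S17: rarest other camera
            cam = min((c for c in range(self.num_cameras) if c != top),
                      key=lambda c: counts[c])
        else:
            cam = ranking[1]                            # S17: second rank
        self.history.append(cam)
        return cam

sel = CameraSelector(4, max_run=3)
print([sel.select([1, 0, 2, 3]) for _ in range(5)])  # [1, 1, 1, 0, 1]
```

The two fallback modes correspond to the two alternatives for step S17 in the text: the second-ranked camera, or the least recently used one regardless of correlation.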

After the video selection instruction in step S16 or step S17, the effect selection unit 27 instructs the video processing unit 41 on effect processing and sets the effect state of the video (step S18). The effect state may be chosen according to the correlation value, or set at random regardless of it.

[4. Actual video switching example]
FIG. 5 shows an example of the video switching process in action. For each half-beat detection timing b1, b2, b3, b4, ..., it shows which of the cameras C1 to C4 is selected and which effect is set:
b1 to b5 (two beats): the camera C3 is selected, with no effect.
b5 to b7 (one beat): the camera C2 is selected, and the center of its video is zoomed in. This zoom is performed by so-called digital zoom processing.
b7 to b8 (half a beat): the camera C1 is selected, with no effect.
b8 to b10 (one beat): the camera C2 is selected, with an effect that changes the color to sepia.
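The two effects in this example, a center digital zoom and a sepia tone, can be sketched on a toy frame representation. The nested-list frame format and the 2x zoom factor are assumptions for illustration. The sepia coefficients are a widely circulated convention, not values given in this publication. A real system would apply such operations to every video frame.

```python
def center_zoom(frame, factor=2):
    """Digital zoom: crop the central 1/factor of the frame and scale it back
    to full size with nearest-neighbour sampling.

    frame is a list of rows, each row a list of (r, g, b) tuples.
    """
    h, w = len(frame), len(frame[0])
    ch, cw = h // factor, w // factor
    top, left = (h - ch) // 2, (w - cw) // 2
    return [
        [frame[top + (y * ch) // h][left + (x * cw) // w] for x in range(w)]
        for y in range(h)
    ]

def sepia(frame):
    """Sepia tone with a commonly used 3x3 RGB matrix, clipped at 255."""
    def tone(pixel):
        r, g, b = pixel
        return (
            min(255, int(0.393 * r + 0.769 * g + 0.189 * b)),
            min(255, int(0.349 * r + 0.686 * g + 0.168 * b)),
            min(255, int(0.272 * r + 0.534 * g + 0.131 * b)),
        )
    return [[tone(p) for p in row] for row in frame]

# 4x4 test frame whose pixels encode their own (row, column) position.
frame = [[(y, x, 0) for x in range(4)] for y in range(4)]
zoomed = center_zoom(frame)
print(zoomed[0][0], zoomed[3][3])  # (1, 1, 0) (2, 2, 0)
print(sepia([[(100, 100, 100)]]))  # [[(135, 120, 93)]]
```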

As described above, the camera video is switched at fixed beat intervals based on the correlation with the maximum feature amount of the audio signal. As a result, the videos shot by the cameras C1 to C4 are switched automatically in time with the performance. The beats of the performance sound are detected, and the switching decision is made for each section set on the basis of those beats (here, each half-beat section). Switching is therefore synchronized with the music, and the transitions look natural rather than jarring.

In addition, the selected video is one in which the sound of the instrument shown is relatively strong. The video of the performer who deserves attention among the several performers is therefore selected, and the switching is appropriate. The loudness of the performance sound differs from instrument to instrument. In the example of FIG. 3, the frequency analysis characteristic f_B of the drum (FIG. 3B) tends to be relatively strong. If the switching decision were based on sound intensity alone, the drum video would be selected over and over. In this example, however, the decision is based on the correlation with the maximum feature amount. This makes it possible to judge whether each instrument's performance sound has reached a high-feature, climactic state, and to select the video accordingly.
For example, when the performance sounds shown in FIGS. 3A to 3D are obtained, the drum has the highest absolute feature level. Even so, the performance sound of another instrument, such as the piano, can be judged to have the highest correlation, and that instrument's video can be selected. Video switching can thus follow the actual playing state of the instruments. As the feature amount, a vector of intensities for each frequency band, for example, can be used.
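The selection rule can be sketched as follows. The source states only three things: each instrument's feature amount (for example, a vector of per-band intensities) is compared with its recorded maximum, the instrument with the strongest correlation is chosen, and the correlation measure is left undefined. This sketch therefore makes two assumptions:

- **(a)** The maximum is tracked band by band as a running element-wise maximum.
- **(b)** Correlation is the projection of the current vector onto that maximum, normalized by the maximum's squared norm. A value near 1.0 then means the instrument is close to its own peak.

`MaxTracker`, `correlation`, and `select_instrument` are hypothetical names.

```python
from typing import Dict, List, Sequence

Vector = Sequence[float]


class MaxTracker:
    """Running element-wise maximum of each instrument's feature vector."""

    def __init__(self) -> None:
        self.maxima: Dict[str, List[float]] = {}

    def update(self, name: str, features: Vector) -> List[float]:
        current = self.maxima.get(name)
        if current is None:
            self.maxima[name] = list(features)
        else:
            self.maxima[name] = [max(a, b) for a, b in zip(current, features)]
        return self.maxima[name]


def correlation(features: Vector, maximum: Vector) -> float:
    """Assumed measure: projection of `features` onto `maximum`, scaled so that
    features == maximum gives 1.0. Returns 0.0 for an all-zero maximum."""
    denom = sum(m * m for m in maximum)
    if denom == 0.0:
        return 0.0
    return sum(f * m for f, m in zip(features, maximum)) / denom


def select_instrument(tracker: MaxTracker, section: Dict[str, Vector]) -> str:
    """For one beat-based section, record maxima and return the instrument
    whose current features correlate most strongly with its own maximum."""
    scores = {}
    for name, feats in section.items():
        maximum = tracker.update(name, feats)
        scores[name] = correlation(feats, maximum)
    return max(scores, key=scores.get)


if __name__ == "__main__":
    tracker = MaxTracker()
    # Both instruments are at their own peak; the tie goes to the first key.
    print(select_instrument(tracker, {"drum": [9.0, 8.0, 7.0], "piano": [1.0, 2.0, 1.0]}))  # drum
    # The drum is still louder in absolute terms but well below its peak.
    print(select_instrument(tracker, {"drum": [6.0, 5.0, 4.0], "piano": [1.0, 2.0, 1.0]}))  # piano
```

The second call mirrors the FIG. 3 discussion. The drum keeps the larger absolute intensities but loses because it sits further below its own maximum.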

[5. Modified examples]
The embodiment described above performs video switching based on acoustic analysis alone. Alternatively, the acoustic analysis of the present invention may be combined with real-time image analysis to make switching more accurate. The image analysis can detect feature changes such as accelerating finger movement, a marked change in the performer's facial expression as emotions run high, or more pronounced body gestures. These changes may be combined with a rise in a particular performer's performance sound from the acoustic analysis to drive the video switching. Existing techniques can be used for the image analysis.
The execution of the effect processing may also be controlled by such feature changes detected through image analysis, for example accelerated finger movement or a large change in the performer's facial expression.
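One simple way to combine the two analyses is a weighted score per performer. The source leaves the combination method open and refers to image analysis only in general terms, so everything below is an assumption:

- The image score is taken as an already-computed number in [0, 1], for example from a hypothetical motion or expression detector.
- The weight `alpha` is illustrative.

```python
from typing import Dict


def fused_scores(acoustic: Dict[str, float], visual: Dict[str, float],
                 alpha: float = 0.7) -> Dict[str, float]:
    """Blend per-performer acoustic correlation with an image-analysis score.

    acoustic: correlation with each performer's maximum feature amount.
    visual:   hypothetical score from image analysis (finger speed,
              facial expression, gesture size), assumed in [0, 1].
    alpha:    weight of the acoustic term; 1.0 reproduces audio-only switching.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    return {name: alpha * acoustic[name] + (1.0 - alpha) * visual.get(name, 0.0)
            for name in acoustic}


def pick(acoustic: Dict[str, float], visual: Dict[str, float],
         alpha: float = 0.7) -> str:
    """Return the performer with the highest fused score."""
    scores = fused_scores(acoustic, visual, alpha)
    return max(scores, key=scores.get)
```

With `alpha=1.0` the visual term drops out, so the audio-only embodiment is a special case of this sketch.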

In the embodiment described above, the video processing system 10 consists of three separate devices:

- the control device 20, which controls video switching based on the audio signal;
- the video switching device 30, which switches video in response to instructions from the control device 20;
- the video processing device 40, which performs effect processing.

Alternatively, a single device may provide the functions of the control device 20, the video switching device 30, and the video processing device 40. Any two of the three devices 20, 30, and 40 may also be combined into one device. When the video processing system 10 is built as a single device, it may be implemented either on a computer device (and its peripherals) or with dedicated hardware.
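When all three roles run in one device, they can still be kept as separate components wired together in a single process. The sketch below shows one hypothetical arrangement; the class and method names are not from the source:

- A control component decides the camera and effect for each section.
- A switching component and an effect component act on frames.
- Plain strings stand in for real video frames.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

Frame = str  # stand-in for a real video frame


@dataclass
class VideoSwitcher:            # role of the video switching device 30
    def switch(self, inputs: Dict[str, Frame], camera: str) -> Frame:
        return inputs[camera]


@dataclass
class EffectProcessor:          # role of the video processing device 40
    def apply(self, frame: Frame, effect: str) -> Frame:
        return frame if effect == "none" else f"{effect}({frame})"


@dataclass
class Controller:               # role of the control device 20
    decide: Callable[[int], Tuple[str, str]]  # section index -> (camera, effect)


@dataclass
class CombinedSystem:           # all three roles hosted in one device
    controller: Controller
    switcher: VideoSwitcher
    processor: EffectProcessor

    def process_section(self, index: int, inputs: Dict[str, Frame]) -> Frame:
        camera, effect = self.controller.decide(index)
        return self.processor.apply(self.switcher.switch(inputs, camera), effect)
```

Keeping the roles as separate objects means any one of them can later be moved to its own device without changing the others.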

In the embodiment described above, a microphone is placed at each of the instruments 1 to 4, so each instrument's performance sound is obtained as the audio signal of a separate channel. Alternatively, the audio signal may have fewer channels than there are instruments. In that case, the audio signal is analyzed to determine the intensity (feature amount) of each instrument's performance sound.
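With fewer channels than instruments, the per-instrument intensity has to be estimated from a mixed signal, and the source does not say how. The sketch below uses one simple assumption: each instrument dominates a distinct frequency range, so its intensity can be approximated by the spectral energy in that range. Note the following about the sketch:

- The band edges in `INSTRUMENT_BANDS` are hypothetical.
- A naive DFT keeps the example dependency-free.
- A real system would more likely use an FFT library and a proper source-separation method.

```python
import cmath
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical frequency ranges (Hz) in which each instrument is assumed to dominate.
INSTRUMENT_BANDS: Dict[str, Tuple[float, float]] = {
    "bass": (40.0, 250.0),
    "piano": (250.0, 2000.0),
    "cymbals": (2000.0, 8000.0),
}


def dft_magnitudes(samples: Sequence[float]) -> List[float]:
    """Magnitudes of the naive DFT for bins 0 .. N/2 (O(N^2), for illustration)."""
    n = len(samples)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(samples)))
            for k in range(n // 2 + 1)]


def instrument_intensities(samples: Sequence[float], rate: float) -> Dict[str, float]:
    """Estimate each instrument's intensity as the spectral energy in its assumed band."""
    mags = dft_magnitudes(samples)
    n = len(samples)
    result = {}
    for name, (lo, hi) in INSTRUMENT_BANDS.items():
        # Bin k corresponds to frequency k * rate / n.
        result[name] = sum(m * m for k, m in enumerate(mags)
                           if lo <= k * rate / n < hi)
    return result
```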

In the embodiment described above, video switching is based on analysis of the instruments' performance sounds. Sounds other than instrument performance sounds may also be analyzed for switching. For example, at least one of the sounds picked up by the microphones may be a performer's singing voice. The correlation value of the singing voice's feature amount is then obtained and compared with the correlation values of the instruments' feature amounts to perform video switching.

Also, in the embodiment described above, the video is switched by the video switching unit 31 and then effect processing is applied by the effect processing unit 41. However, only video switching may be performed, with no effect processing. Alternatively, the video processing system 10 may perform only the video switching automatically, while an operator applies effects through the effect processing unit 41 manually as needed. Effects other than zooming in and changing the color may also be applied.

Further, as described with reference to FIG. 5, the video switching unit 31 selects exactly one video for each half-beat section. Instead, the video switching unit 31 may composite several videos based on the correlation values of the instruments' feature amounts. For example, two instruments may have nearly equally high correlation values in a given section. The screen may then be split in two, with the video of each of those two players placed in one half.
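The split-screen variant needs a rule for when two correlation values count as "nearly the same." The sketch below assumes an absolute tolerance between the two highest values; `tolerance=0.05` is illustrative. The function name and the returned tuple layout are hypothetical.

```python
from typing import Dict, Tuple


def choose_layout(scores: Dict[str, float], tolerance: float = 0.05) -> Tuple[str, ...]:
    """Return one instrument for a full-screen shot, or two for a split screen.

    scores: correlation value per instrument for the current section (non-empty).
    If the two highest values differ by at most `tolerance`, both players
    are shown side by side; otherwise only the top one is shown.
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    if len(ranked) >= 2 and scores[ranked[0]] - scores[ranked[1]] <= tolerance:
        return (ranked[0], ranked[1])
    return (ranked[0],)
```

A relative tolerance (a ratio between the two values) would be an equally plausible reading of the text. The absolute form is used here only because it is simpler.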

Also, at least one of the cameras may shoot the entire stage, and the output may be switched to that whole-stage video as appropriate. For example, the correlation values of the feature amounts of many instruments (here, three or four) may be detected to exceed a threshold. In that case, almost all the players are judged to be at a climax together, and the video from the whole-stage camera may be inserted.
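The whole-stage rule reduces to counting how many instruments exceed a threshold in the current section. In this sketch, the following values are assumed for illustration:

- the threshold, 0.8;
- the minimum count, 3;
- the camera name `STAGE_CAMERA`.

```python
from typing import Dict

STAGE_CAMERA = "C_stage"  # hypothetical camera covering the whole stage


def maybe_stage_shot(scores: Dict[str, float], cameras: Dict[str, str],
                     threshold: float = 0.8, min_count: int = 3) -> str:
    """Return the whole-stage camera when many players peak together,
    otherwise the camera of the single highest-scoring instrument."""
    peaking = sum(1 for value in scores.values() if value >= threshold)
    if peaking >= min_count:
        return STAGE_CAMERA
    best = max(scores, key=scores.get)
    return cameras[best]
```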

Furthermore, in the embodiment described above, the video is switched every half beat based on detection by the beat detection unit 22. The half-beat section is only an example; the switching section may instead be set to one beat or to several beats.
The switching section itself may also be set randomly, switching every half beat at some points and every beat or every several beats at others. The section length may also be variable. For example, it may be shorter when the feature correlation value is relatively high (that is, when the performance sound is relatively loud). It may be longer when the correlation value is relatively low (when the performance sound is relatively quiet).
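The variable-length rule can be expressed as a mapping from the current correlation value to a section length in half beats. Optional random choice covers the random variant. The breakpoints (0.8 and 0.4) and the candidate lengths are assumptions; the source specifies only that higher correlation should give shorter sections.

```python
import random
from typing import Optional


def section_length_half_beats(correlation: float,
                              rng: Optional[random.Random] = None) -> int:
    """Number of half beats until the next switching decision.

    Assumed mapping: high correlation (loud passage) -> short sections,
    low correlation (quiet passage) -> long sections. With `rng`, the length
    is drawn at random from the candidates allowed for that correlation range;
    without it, the shortest candidate is used.
    """
    if correlation >= 0.8:
        choices = (1, 2)        # half beat or one beat
    elif correlation >= 0.4:
        choices = (2, 4)        # one or two beats
    else:
        choices = (4, 8)        # two or four beats
    return rng.choice(choices) if rng is not None else choices[0]
```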

In the embodiment described above, the maximum feature amount of each instrument's performance sound is updated successively from the start of the piece. Alternatively, an expected maximum value for each instrument may be set in the maximum feature amount recording unit 28 in advance.
Furthermore, the feature amount of each channel's sound is obtained from frequency analysis by fast Fourier transform. Band-by-band feature amounts may instead be obtained by filter processing other than the fast Fourier transform. Using three bands (low, middle, and high) and selecting from the feature amounts in those bands is likewise only an example. Feature amounts may be detected with a different number of bands.
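The filter-based alternative and the preset-maximum variant can be sketched together. The bands are split by two first-order IIR low-pass filters, a crude crossover in which the low, mid, and high outputs sum back to the input exactly. The following are assumptions:

- The band edges, 250 Hz and 2 kHz, are assumed values.
- `MaxRecorder` is a hypothetical stand-in for the maximum feature amount recording unit 28.

```python
import math
from typing import Dict, List, Optional, Sequence


def one_pole_lowpass(samples: Sequence[float], cutoff: float, rate: float) -> List[float]:
    """First-order IIR low-pass: y[n] = y[n-1] + a * (x[n] - y[n-1])."""
    a = 1.0 - math.exp(-2.0 * math.pi * cutoff / rate)
    out, y = [], 0.0
    for x in samples:
        y += a * (x - y)
        out.append(y)
    return out


def three_band_features(samples: Sequence[float], rate: float,
                        low_edge: float = 250.0, high_edge: float = 2000.0) -> List[float]:
    """Mean-square energy in assumed low / mid / high bands, using filters (no FFT)."""
    lp_low = one_pole_lowpass(samples, low_edge, rate)
    lp_high = one_pole_lowpass(samples, high_edge, rate)
    low = lp_low
    mid = [h - l for h, l in zip(lp_high, lp_low)]
    high = [x - h for x, h in zip(samples, lp_high)]
    n = len(samples)
    return [sum(v * v for v in band) / n for band in (low, mid, high)]


class MaxRecorder:
    """Hypothetical stand-in for unit 28: running maxima, or fixed presets."""

    def __init__(self, presets: Optional[Dict[str, List[float]]] = None) -> None:
        self.presets = dict(presets or {})
        self.running: Dict[str, List[float]] = {}

    def maximum(self, name: str, features: Sequence[float]) -> List[float]:
        if name in self.presets:          # expected maximum set in advance
            return self.presets[name]
        prev = self.running.get(name)     # otherwise update from the start of the piece
        self.running[name] = (list(features) if prev is None
                              else [max(p, f) for p, f in zip(prev, features)])
        return self.running[name]
```

First-order filters have gentle slopes, so neighbouring bands leak into each other. Steeper designs (for example, higher-order Butterworth filters) would separate the bands more cleanly at the cost of more computation.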

In the example of FIG. 5, when the correlation of a particular instrument's performance sound stays high, the same video is selected in consecutive sections. Alternatively, the output may be forced to switch to a different video at every predetermined period, such as every half beat or every beat.
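The forced-switch variant can be sketched as a selector that remembers how many consecutive sections the current choice has been on air. The limit `max_hold`, counted in sections, is an assumed parameter. When the limit is reached, the next-ranked instrument is chosen instead. This is close in spirit to the second-place selection recited in the claims. The class name is hypothetical.

```python
from typing import Dict, Optional


class HoldLimitedSelector:
    """Pick the top-correlated instrument, but never hold the same one for
    more than `max_hold` consecutive sections."""

    def __init__(self, max_hold: int = 2) -> None:
        if max_hold < 1:
            raise ValueError("max_hold must be >= 1")
        self.max_hold = max_hold
        self.current: Optional[str] = None
        self.held = 0

    def select(self, scores: Dict[str, float]) -> str:
        ranked = sorted(scores, key=scores.get, reverse=True)
        choice = ranked[0]
        if choice == self.current and self.held >= self.max_hold and len(ranked) > 1:
            choice = ranked[1]            # force a different video
        if choice == self.current:
            self.held += 1
        else:
            self.current, self.held = choice, 1
        return choice
```

With `max_hold=2` and constant scores favouring the piano, the output sequence is piano, piano, bass, piano. The top instrument returns as soon as the forced cut has been made.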

Further, FIG. 1 shows the case of shooting (recording) live video of a concert. The present invention is also applicable to shooting (recording) live video of various other events, such as debates and theater performances.

Description of reference signs: 1 ... Bass, 2 ... Drum, 3 ... Keyboard, 4 ... Piano, 5 ... Stage, 6 ... Audio mixer, 7 ... Projector device, 8 ... Recording device, 9 ... Distribution device, 10 ... Video processing system, 20 ... Control device, 21 ... Audio input unit, 22 ... Beat detection unit, 23 ... Fast Fourier transform unit (FFT unit), 24 ... Feature amount detection unit, 25 ... Feature amount comparison unit, 26 ... Camera selection unit, 27 ... Effect selection unit, 28 ... Maximum feature amount recording unit, 30 ... Video switching device, 31 ... Video switching unit, 40 ... Video processing device, 41 ... Video processing unit, C1-C4 ... Cameras, M1-M4 ... Microphones

Claims

A live video processing system comprising:
a video switching unit that receives a plurality of video signals captured by a plurality of cameras provided for each instrument or player and switches among the plurality of video signals;
a beat detection unit that detects the beats of a piece of music from an input audio signal;
a feature amount detection unit that detects a feature amount for each instrument or player from the audio signal;
a feature amount comparison unit that records the maximum feature amount for each instrument or player detected by the feature amount detection unit and, for each section set on the basis of the beats detected by the beat detection unit, determines the instrument or player whose feature amount detected by the feature amount detection unit has the strongest correlation with the maximum feature amount; and
a camera selection unit that, for each set section, causes the video switching unit to select the video signal of the instrument or player determined by the feature amount comparison unit to have the strongest correlation.

The live video processing system according to claim 1, wherein
the audio signal consists of signals on a plurality of channels provided for each instrument or player, and the feature amount detection unit detects the intensity of a specific frequency band in each of the audio signals of the plurality of channels.

The live video processing system according to claim 1, wherein, when the frequency with which the camera selection unit has recently selected the video signal of the instrument or player having the strongest correlation is equal to or higher than a certain frequency, the camera selection unit selects the video signal of an instrument or player whose correlation, as determined by the feature amount comparison unit, ranks second or lower.

The live video processing system according to claim 1, further comprising:
an effect selection unit that selects effect processing to be applied to the video signal; and
a video processing unit that applies the effect processing selected by the effect selection unit to the video signal selected by the camera selection unit.

A live video processing method comprising:
beat detection processing of detecting the beats of a piece of music from an input audio signal;
feature amount detection processing of detecting a feature amount for each instrument or player from the audio signal;
feature amount comparison processing of recording the maximum feature amount for each instrument or player detected by the feature amount detection processing and, for each section set on the basis of the beats detected by the beat detection processing, determining the instrument or player whose feature amount detected by the feature amount detection processing has the strongest correlation with the maximum feature amount; and
camera selection processing of selecting, for each set section and from a plurality of video signals captured by a plurality of cameras provided for each instrument or player, the video signal of the instrument or player determined by the feature amount comparison processing to have the strongest correlation.

The live video processing method according to claim 5, further comprising:
effect selection processing of selecting effect processing to be applied to the video signal; and
video processing of applying the effect processing selected by the effect selection processing to the video signal selected by the camera selection processing.

A program that causes a computer to execute:
a beat detection procedure for detecting the beats of a piece of music from an input audio signal;
a feature amount detection procedure for detecting a feature amount for each instrument or player from the audio signal;
a feature amount comparison procedure for recording the maximum feature amount for each instrument or player detected by the feature amount detection procedure and, for each section set on the basis of the beats detected by the beat detection procedure, determining the instrument or player whose feature amount detected by the feature amount detection procedure has the strongest correlation with the maximum feature amount; and
a camera selection procedure for selecting, for each set section and from a plurality of video signals captured by a plurality of cameras provided for each instrument or player, the video signal of the instrument or player determined by the feature amount comparison procedure to have the strongest correlation.

The program according to claim 7, further causing the computer to execute:
an effect selection procedure for selecting effect processing to be applied to the video signal; and
a video processing procedure for applying the effect processing selected by the effect selection procedure to the video signal selected by the camera selection procedure.