JP2018074252A

JP2018074252A - Acoustic system and control method of same, signal generating device, computer program

Info

Publication number: JP2018074252A
Application number: JP2016208845A
Authority: JP
Inventors: 恭平北澤; Kyohei Kitazawa
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2016-10-25
Filing date: 2016-10-25
Publication date: 2018-05-10
Anticipated expiration: 2036-10-25
Also published as: JP6821390B2

Abstract

PROBLEM TO BE SOLVED: To provide a technique enabling efficient processing in a configuration in which sound is acquired from a plurality of areas into which a space is divided to generate a reproduction signal. SOLUTION: The acoustic system includes: a microphone array for collecting sound; separation means for separating the sound collected by the microphone array into sounds in a plurality of separation areas obtained by dividing a certain space; control means for controlling the division of the certain space into the plurality of separation areas; and generating means for generating a reproduction signal based on the separated sounds. SELECTED DRAWING: Figure 1

Description

The present invention relates to an acoustic system, a control method thereof, a signal generation device, and a computer program.

A technique is known in which a space is divided into a plurality of areas and the sound of each area is acquired (Patent Document 1).

Japanese Unexamined Patent Application Publication No. 2014-72708 (JP 2014-72708 A)

However, when the sound of the areas into which a space has been divided is processed in real time for broadcast, processing or transmission may fail to keep up, so that data is lost and the sound is interrupted.

The present invention has been made in view of the above problem, and its object is to provide a technique that enables more efficient processing in a configuration in which sound is acquired from a plurality of areas into which a space is divided and a reproduction signal is generated.

To achieve the above object, an acoustic system according to the present invention has the following configuration. Namely, it comprises:
a microphone array that collects sound;
separation means for separating the sound collected by the microphone array into the sounds of a plurality of separation areas obtained by dividing a certain space;
control means for controlling the division of the certain space into the plurality of separation areas; and
generating means for generating a reproduction signal based on the separated sounds.

According to the present invention, processing can be made more efficient in a configuration in which sound is acquired from a plurality of areas into which a space is divided and a reproduction signal is generated.

Block diagram showing the configuration of an audio signal processing device.
Explanatory diagram of separation area control.
Explanatory diagram showing how separation area control changes over time.
Block diagram showing the hardware configuration of the audio signal processing device.
Flowchart showing audio signal processing.
Diagram explaining a display device for separation area control.
Diagram explaining an acoustic system.
Block diagram showing details of the configuration of the acoustic system.
Explanatory diagram of separation area control.
Flowchart showing audio signal processing.

Embodiments of the present invention are described below with reference to the drawings. The following embodiments do not limit the present invention, and not all combinations of the features described in the embodiments are necessarily essential to the solution provided by the present invention. Identical components are denoted by the same reference numerals.

<Embodiment 1>
The first embodiment (Embodiment 1) of the present invention describes a configuration that reduces the number of separation areas in use when sound source separation processing can no longer keep up with real-time reproduction.

(Audio signal processing device)
FIG. 1 is a block diagram showing the configuration of the audio signal processing device 100. The audio signal processing device 100 collects sound from a predetermined spatial area with a microphone array, separates the collected sound into a plurality of audio signals based on a plurality of separation areas, applies audio processing, and mixes the result to generate a reproduction signal. The audio signal processing device 100 includes a microphone array 111, a sound source separation unit 112, a separation area control unit 113, an audio signal processing unit 114, a storage unit 115, a real-time reproduction signal generation unit 116, and a replay reproduction signal generation unit 117.

The microphone array 111 consists of a plurality of microphones and collects the sound of the space assigned to it. Because each microphone in the microphone array 111 picks up sound individually, the sound collected by the array as a whole is a multichannel signal made up of the sounds collected by the individual microphones. The microphone array 111 picks up the sound of the space with its microphones, applies A/D (analog-to-digital) conversion to the picked-up signals, and outputs them to the sound source separation unit 112.

The sound source separation unit 112, the separation area control unit 113, the audio signal processing unit 114, the real-time reproduction signal generation unit 116, and the replay reproduction signal generation unit 117 are implemented by an arithmetic processing device such as a CPU (central processing unit), a DSP, or an MPU. DSP stands for Digital Signal Processor, and MPU stands for Micro-processing Unit.

With the space covered by the microphone array 111 divided into N (N > 1) areas (hereinafter "separation areas"), the sound source separation unit 112 performs sound source separation processing that separates the signal input from the microphone array 111 into the sound of each separation area. As described above, the signal input from the microphone array 111 is a multichannel signal made up of the sounds collected by the individual microphones. The sound of any separation area can therefore be reproduced by applying phase control and weighting to the audio signals collected by the microphones, based on the positional relationship between each microphone of the microphone array 111 and the separation area to be captured, and then summing them. This embodiment describes an example in which the arrangement of the separation areas is determined in advance. The sound source separation unit 112 uses the signal input from the microphone array 111 to perform sound source separation processing that divides the space into N (N > 1) areas. Separation is performed for each processing frame, that is, at predetermined time intervals. For example, beamforming is performed at each predetermined interval to obtain the sound of each area. The separated sound is output to the audio signal processing unit 114 and the storage unit 115.
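The phase control, weighting, and summation described above can be illustrated with a delay-and-sum beamformer. The following is only a minimal sketch under stated assumptions: the function names, the integer-sample delays, the uniform default weights, and the speed of sound of 343 m/s are illustrative choices and are not specified in this publication.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed value, not given in the source


def steering_delays(mic_positions, area_center, sample_rate):
    """Per-microphone delay in samples that time-aligns sound from area_center."""
    distances = [math.dist(m, area_center) for m in mic_positions]
    farthest = max(distances)
    # Delay the closer microphones so every channel lines up with the farthest one.
    return [round((farthest - d) / SPEED_OF_SOUND * sample_rate) for d in distances]


def delay_and_sum(channels, delays, weights=None):
    """Shift each channel by its delay, weight it, and add the results."""
    n = len(channels[0])
    if weights is None:
        weights = [1.0 / len(channels)] * len(channels)  # uniform weighting
    out = [0.0] * n
    for signal, delay, weight in zip(channels, delays, weights):
        for t in range(delay, n):
            out[t] += weight * signal[t - delay]
    return out
```

Sound arriving from the chosen area adds coherently after alignment, while sound from other positions is misaligned and partially cancels. Running `delay_and_sum` once per separation area, each with its own delays, yields one signal per area.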

The separation area control unit 113 controls how the certain space covered by the microphone array is divided into a plurality of separation areas, according to the processing load of sound source separation, reproduction signal generation, and the like. Specifically, it controls the arrangement and number of the separation areas. For example, when the load on the processing device is high and separating all areas would not finish in time for real-time reproduction, the separation area control unit 113 combines the areas used by the sound source separation unit 112 to reduce the number of areas. While processing comfortably keeps up, the sound collection space A1 is finely divided into 8 × 8 = 64 separation areas A2, as shown in FIG. 2(A). When processing can no longer keep up, the unit determines, for example, whether the sound of each area was at or above a predetermined level in the previous frame, and combines the areas that were below that level to reduce the number of areas, as shown in FIG. 2(B). Sound at or above the predetermined level is likely to be meaningful, whereas sound below it is likely to be insignificant, such as noise. Fine separation areas are therefore preferentially assigned where the sound is at or above the predetermined level, so that meaningful sound is reproduced faithfully, while separation areas are merged where the sound is below the predetermined level, which speeds up processing.
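One possible way to picture this merging is a quadtree-style plan over the 8 × 8 grid. The sketch below is an assumption-laden illustration rather than the disclosed method: `plan_areas`, the `(row, col, size)` area representation, and the rule of merging any square block whose cells were all below the threshold in the previous frame are hypothetical.

```python
def plan_areas(levels, threshold, max_block=4):
    """Return (row, col, size) areas: quiet blocks stay merged, loud blocks are split.

    levels[r][c] is the previous-frame sound level of the finest cell (r, c).
    """
    n = len(levels)
    areas = []

    def visit(row, col, size):
        loud = any(levels[r][c] >= threshold
                   for r in range(row, row + size)
                   for c in range(col, col + size))
        if not loud or size == 1:
            areas.append((row, col, size))  # keep as one (possibly merged) area
            return
        half = size // 2
        for dr in (0, half):
            for dc in (0, half):
                visit(row + dr, col + dc, half)

    # Start from blocks whose side is max_block cells (half the grid for 8 x 8).
    for row in range(0, n, max_block):
        for col in range(0, n, max_block):
            visit(row, col, max_block)
    return areas
```

With an entirely quiet 8 × 8 grid this yields four large areas; a single loud cell keeps fine areas only around that cell, so the number of areas to separate stays far below 64.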

FIG. 3 shows an example of how the size of the separation areas changes. FIG. 3(D) shows whether area control based on the processing load is being performed (area control ON) or not (area control OFF). fp to fp+7 are frame numbers. FIG. 3(C) shows whether the level of the sound separated for each area is at or above the predetermined level (sound present) or below it (no sound). Here, sound is present in frames fp+1 and fp+3. FIG. 3(B) shows the size of the most finely divided area, expressed as the area of the smallest separation area when the area of the sound collection space A1 is taken as 1. For example, in frame fp the space is divided equally into 64 areas, so the smallest area size is 1/64. FIG. 3(A) shows how each frame is separated into a plurality of areas.

Frames fp+1 to fp+6 are the period in which the processing load is high and the number of areas must be reduced. In frame fp, the sound level did not exceed the predetermined value in any area (no sound in FIG. 3(C)). In frame fp+1, the sound collection space is therefore divided into four large areas, each with sides half those of the sound collection space (1/4 in FIG. 3(B)).

In frame fp+1, there was an area in which the sound level exceeded the predetermined value (sound present in FIG. 3(C)). In frame fp+2, the area A3 in which the sound occurred is therefore divided again into small areas whose sides are 1/8 of those of the sound collection space A1 (1/64 in FIG. 3(B)).

In frame fp+2, the sound level did not exceed the predetermined value in any area (no sound in FIG. 3(C)). In frame fp+3, some areas are therefore combined, giving medium-sized areas whose sides are 1/4 of those of the sound collection space (1/16 in FIG. 3(B)).

In frame fp+3, there was an area in which the sound level exceeded the predetermined value (sound present in FIG. 3(C)). In frame fp+4, the area A3 in which the sound occurred is therefore divided again into small areas whose sides are 1/8 of those of the sound collection space (1/64 in FIG. 3(B)).

In frames fp+4 and fp+5, the sound level did not exceed the predetermined value in any area (no sound in FIG. 3(C)). The areas are therefore combined, and in frame fp+6 the sound collection space is divided into four large areas, each with sides half those of the sound collection space.

In this way, the separation area control unit 113 increases or decreases the number of separation areas according to whether sound is detected. The description above covers an example in which the separation area control unit 113 combines sound source separation areas to reduce their number. In practice, however, the sound source separation unit 112 may hold beamforming filters for separating into a plurality of area sizes, and the separation area control unit 113 may control which filter is used.

The separation area control unit 113 also manages, as a separation area control list, each frame in which areas were combined by separation area control together with information on the combined areas. For example, when four areas are combined in frame fq, frame fq and the four areas are recorded in the list. The areas are given IDs or the like in advance so that they can be distinguished. When the processing load decreases, the separation area control unit 113 instructs the sound source separation unit 112 to perform sound source separation for each of the individual areas that were combined in the frames recorded in the separation area control list. Once sound source separation has been performed, the frame and areas are deleted from the list.
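The separation area control list can be pictured as a small bookkeeping structure. The following is a minimal sketch, assuming a hypothetical class `SeparationAreaControlList`, integer area IDs, a scalar load value compared against a threshold, and a caller-supplied `separate(frame, area_id)` callback; none of these names or the load metric come from this publication.

```python
from dataclasses import dataclass, field


@dataclass
class SeparationAreaControlList:
    # Maps frame number -> list of groups of area IDs merged in that frame.
    pending: dict = field(default_factory=dict)

    def record_merge(self, frame, area_ids):
        """Remember that these areas were combined in the given frame."""
        self.pending.setdefault(frame, []).append(tuple(area_ids))

    def backfill(self, load, load_threshold, separate):
        """When the load is low, re-separate every recorded area and clear the list."""
        if load >= load_threshold:
            return []  # still busy; leave the list untouched
        done = []
        for frame in sorted(self.pending):
            for group in self.pending[frame]:
                for area_id in group:
                    separate(frame, area_id)  # per-area separation of the stored frame
                    done.append((frame, area_id))
        self.pending.clear()  # processed frames and areas are removed from the list
        return done
```

For example, `record_merge(7, [0, 1, 8, 9])` notes that four areas were combined in frame 7; a later `backfill` call made while the load is below the threshold separates each of those areas individually and empties the list.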

The audio signal processing unit 114 processes the audio signal of each frame and area. The processing it performs includes, for example, delay correction and gain correction to compensate for the effect of the distance between an area and the sound collection device, and echo removal.
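As a rough illustration of distance-dependent delay and gain correction, the sketch below advances an area signal by its propagation delay and scales it to undo attenuation. The 1/r spherical-spreading model, the speed of sound of 343 m/s, the reference distance, and the function name `correct_for_distance` are all assumptions made for this example; the publication does not state how the corrections are computed.

```python
SPEED_OF_SOUND = 343.0  # m/s; assumed value


def correct_for_distance(signal, distance_m, sample_rate, reference_m=1.0):
    """Remove propagation delay and 1/r attenuation for one separation area."""
    delay = round(distance_m / SPEED_OF_SOUND * sample_rate)  # delay in samples
    gain = distance_m / reference_m  # undo 1/r loss relative to the reference distance
    # Advance the signal by `delay` samples, padding the tail with silence.
    advanced = signal[delay:] + [0.0] * min(delay, len(signal))
    return [gain * s for s in advanced]
```

After this step, the signals of near and far areas are aligned in time and comparable in level, which simplifies the subsequent mixing.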

The storage unit 115 is a storage device such as an HDD (hard disk drive), an SSD (solid state drive), or a memory. The storage unit 115 records, together with time information, the signals of all audio channels for frames on which separation area control was performed in the sound source separation unit 112, and the signals processed by the audio signal processing unit 114.

The real-time reproduction signal generation unit 116 generates and outputs a signal for real-time reproduction by mixing the per-area sound obtained from the sound source separation unit 112 within a predetermined time after sound collection. For example, it acquires from outside a virtual listening point in the space and a virtual listener orientation, both of which change over time (hereinafter simply "listening point" and "listener orientation"), together with information on the reproduction environment, and mixes the sound sources accordingly. Here, the reproduction environment refers to the configuration of the device that plays back the signal generated by the real-time reproduction signal generation unit 116, such as whether it is a loudspeaker system (stereo, surround, or other multichannel) or headphones. In other words, mixing combines and converts the audio signals of the separation areas to suit the environment, such as the number of channels of the playback device.
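To make the role of the listening point and listener orientation concrete, the following sketch mixes per-area signals down to two channels. It substitutes a simple constant-power panning law with 1/distance attenuation for whatever rendering the actual system uses; the function name, the coordinate convention (heading in radians, 0 along the +x axis), and the clamping of distances below 1 are assumptions for illustration only.

```python
import math


def mix_to_stereo(area_signals, area_centers, listener_pos, listener_heading):
    """Mix per-area signals into (left, right) for a virtual listener."""
    n = len(area_signals[0])
    left = [0.0] * n
    right = [0.0] * n
    for signal, (ax, ay) in zip(area_signals, area_centers):
        dx, dy = ax - listener_pos[0], ay - listener_pos[1]
        distance = max(math.hypot(dx, dy), 1.0)  # clamp to avoid huge gains nearby
        # Angle of the area relative to the facing direction; positive is to the left.
        azimuth = math.atan2(dy, dx) - listener_heading
        pan = -math.sin(azimuth)  # -1 = fully left, +1 = fully right
        theta = (pan + 1.0) * math.pi / 4.0  # constant-power pan angle
        gain_left = math.cos(theta) / distance
        gain_right = math.sin(theta) / distance
        for t, s in enumerate(signal):
            left[t] += gain_left * s
            right[t] += gain_right * s
    return left, right
```

This simple law does not distinguish front from back; a surround or headphone renderer would replace the two gains with per-speaker gains or head-related filters, while the overall structure (one gain set per area, summed per output channel) stays the same.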

When replay reproduction is requested, the replay reproduction signal generation unit 117 acquires the data for the relevant time from the storage unit 115, performs the same processing as the real-time reproduction signal generation unit 116, and outputs the result.

FIG. 4 is a block diagram showing an example hardware configuration of the audio signal processing device 100. The audio signal processing device 100 is implemented by, for example, a personal computer (PC), an embedded system, a tablet terminal, or a smartphone.

In FIG. 4, the CPU 990 is a central processing unit that cooperates with the other components under a computer program to control the overall operation of the audio signal processing device 100. The ROM 991 is a read-only memory that stores basic programs, data used in basic processing, and the like. The RAM 992 is a writable memory that serves as a work area for the CPU 990, among other uses.

The external storage drive 993 provides access to recording media and can load computer programs and data stored on a medium (recording medium) 994, such as a USB memory, into the system. The storage 995 is a device that functions as a large-capacity memory, such as an SSD (solid state drive), and stores various computer programs and data.

The operation unit 996 is a device that accepts instructions and commands from the user, such as a keyboard, a pointing device, or a touch panel. The display 997 is a display device that shows commands entered from the operation unit 996, the responses of the audio signal processing device 100 to them, and the like. The interface (I/F) 998 is a device that relays data exchanged with external devices. The microphone array 111 is connected to the audio signal processing device 100 via the interface 998. The system bus 999 is a data bus that carries data within the audio signal processing device 100.

Each functional element in FIG. 1 is realized by the CPU 990 controlling the entire device under a computer program. Software that implements functions equivalent to those of the devices described above may also be used in place of the hardware devices.

(Processing procedure)
Next, the procedure of the processing executed by the audio signal processing device 100 is described with reference to FIG. 5. FIGS. 5(A) to 5(C) are flowcharts showing the processing executed by the audio signal processing device 100 of this embodiment.

FIG. 5(A) shows the flow from sound collection to generation of the real-time reproduction signal. First, the microphone array 111 collects the sound in the space (S111). The collected audio signal of each channel is output to the sound source separation unit 112.

Next, the separation area control unit 113 determines, in terms of processing load, whether sound source separation will finish in time for real-time reproduction (S112). As described with reference to FIG. 3, this processing is based on, for example, whether sound of a predetermined level is present.

If it is determined that processing will not finish in time for real-time reproduction (NO in S112), the separation area control unit 113 controls the number of areas so that there are fewer sound source separation areas (S113). Specifically, it reduces the number of separation areas by merging areas of low importance, such as areas in which no sound at or above a certain level is detected. It then outputs information on the areas to be used for separation to the sound source separation unit 112. The separation area control unit 113 also creates the separation area control list.

Next, the storage unit 115 records the audio signal of the frame on which separation area control was performed (S114).

If it is determined that processing will finish in time for real-time reproduction, or after the recording in S114, the sound source separation unit 112 performs sound source separation (S115). That is, the sound of each separation area is synthesized from the multichannel signal collected in S111. As described above, the sound of a separation area can be reproduced by applying phase control and weighting to the audio signals collected by the microphones, based on the relationship between the microphones of the microphone array 111 and the position of the separation area, and then summing them. The separated audio signal of each area is output to the audio signal processing unit 114.

Next, the audio signal processing unit 114 processes the audio signal of each separation area (S116). As described above, this processing includes, for example, delay correction and gain correction to compensate for the effect of the distance between the separation area and the sound collection device, and noise processing by echo removal. The processed audio signals are output to the real-time reproduction signal generation unit 116 and the storage unit 115.

Next, the real-time reproduction signal generation unit 116 mixes the sound for real-time reproduction (S117). In mixing, the signals are combined and converted so that they can be played back according to the specifications of the playback equipment (for example, the number of channels). The sound mixed for real-time reproduction is output to an external playback device or as a broadcast signal.

Next, the storage unit 115 records the sound of each area (S118). The audio signal for replay reproduction is created from the per-area sound held in the storage unit 115.
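The flow of FIG. 5(A) for one already captured frame (S112 to S118) can be summarized in a short sketch. Everything below is hypothetical scaffolding for illustration: the `stages` dictionary of callables, the `storage` dictionary keyed by tuples, and the list used as the control list merely stand in for the units described above.

```python
def process_frame(frame_no, channels, can_keep_up, stages, storage, control_list):
    """One pass of the real-time flow for a frame captured in S111."""
    plan = stages["full_plan"]()                 # finest division by default
    if not can_keep_up:                          # S112: NO
        plan = stages["reduced_plan"]()          # S113: fewer, larger areas
        control_list.append((frame_no, plan))    # remember what was merged
        storage[("raw", frame_no)] = channels    # S114: keep the multichannel input
    # S115: separate the multichannel signal into one signal per planned area.
    areas = {area: stages["separate"](channels, area) for area in plan}
    # S116: per-area corrections such as delay, gain, and echo removal.
    areas = {area: stages["post_process"](sig) for area, sig in areas.items()}
    output = stages["mix"](areas)                # S117: real-time mix
    storage[("areas", frame_no)] = areas         # S118: record per-area sound
    return output
```

Keeping the raw multichannel input only for frames whose areas were merged is what later allows those frames to be separated again at full resolution.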

Next, the processing for frames for which processing could not keep up with real-time reproduction in S112 of FIG. 5(A) (NO in S112) is described with reference to FIG. 5(B).

When the load on the processing device is lower than a predetermined value, the separation area control unit 113 reads data from the storage unit 115 based on the separation area control list (S121).

Next, for the areas recorded in the separation area control list, which were combined when sound source separation was performed, sound source separation is performed again on the areas as they were before being combined (S122). The processed audio signals are output to the audio signal processing unit 114. When processing is complete, the corresponding frame and areas are deleted from the separation area control list. S123 is the same as S116, so a detailed description is omitted.

Next, the storage unit 115 records the input audio signals of the areas, overwriting the previous data (S124).

Next, the processing flow when a replay is requested is described with reference to FIG. 5(C). When a replay is requested, the replay reproduction signal generation unit 117 reads the per-area audio signals corresponding to the replay time from the storage unit 115 (S131).

Next, the replay reproduction signal generation unit 117 mixes the sound for replay reproduction (S132). The sound mixed for replay reproduction is output to an external playback device or as a broadcast signal.

As described above, the separation areas are controlled according to the processing load. That is, within the certain space, control is performed so that a region in which the load of at least one of sound source separation and reproduction signal generation is larger is divided into finer separation areas. As a result, the degree of separation falls in areas whose volume level is below the predetermined value, but areas whose volume level is at or above the predetermined value are processed at high resolution in time for generation of the real-time reproduction signal. Furthermore, by separating the areas that were subject to separation area control when the processing load is light, data of sufficient resolution can be obtained for replay.

Although this embodiment describes an example in which the microphone array 111 consists of microphones, the array may be combined with a structure such as a reflector. The microphones used in the microphone array 111 may be omnidirectional, directional, or a mixture of both.

Although this embodiment describes an example in which the sound source separation unit 112 uses beamforming to collect the sound of each area, other sound source separation methods may be used. For example, the power spectral density (PSD) of each area may be estimated, and separation may be performed with a Wiener filter based on the estimated PSDs.
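As a hedged illustration of PSD-based Wiener filtering, the sketch below computes, for each area and frequency bin, the gain obtained by dividing that area's estimated PSD by the sum of all areas' PSDs. The function name `wiener_gains`, the nested-list layout `area_psds[i][k]`, and the small `floor` term are assumptions; how the PSDs themselves are estimated is outside the scope of this example.

```python
def wiener_gains(area_psds, floor=1e-12):
    """Per-area, per-bin Wiener gains from estimated power spectral densities.

    area_psds[i][k] is the estimated PSD of area i at frequency bin k.
    """
    num_bins = len(area_psds[0])
    # Total power per bin across all areas; `floor` avoids division by zero.
    totals = [sum(psd[k] for psd in area_psds) + floor for k in range(num_bins)]
    return [[psd[k] / totals[k] for k in range(num_bins)] for psd in area_psds]
```

Multiplying the spectrum of the observed mixture by the gains of area i gives an estimate of that area's spectrum; the gains of all areas sum to approximately 1 in every bin that contains energy.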

Although this embodiment describes an example in which the separation area control unit 113 controls the separation areas according to whether the sound level of an area is at or above a predetermined value, other criteria may be used. For example, still using the sound itself, the device may be configured to detect acoustic features rather than level and to decide based on whether those features are present. Specifically, when feature analysis detects sound with predetermined characteristics, such as a scream, a gunshot, the sound of a ball, or the sound of a car, the separation areas may be made smaller so that the sound is reproduced in detail. Alternatively, the space containing all the areas may be captured on video and the separation areas controlled from the captured moving image. For example, a specific subject such as a person, an animal, or a marker may be detected in the video, and the separation areas around that subject may be made smaller.

In live broadcasts such as television, systems that broadcast with a fixed delay of several seconds to several minutes after the actual shooting, in order to adjust timing or handle unexpected events, are generally known. With such a system, the separation area control unit 113 may control the separation order according to events contained in the video and audio within the delay time. For example, when a live sports broadcast has a two-minute delay, the separation areas may be set from the two minutes of play before sound source separation is performed. In soccer, for instance, when a goal is scored, the movements of the scoring player and the ball may be detected from the two minutes of video, and the separation areas around their trajectories may be set finer. Conversely, areas that neither players nor the ball enter are preferably set to be divided coarsely.

In this embodiment, the separation area control unit 113 reduced the number of areas as far as possible, but it may instead calculate the number of areas according to the processing load and reduce the number of areas only by the minimum necessary.

In this embodiment, the separation area control unit 113 controlled the separation areas using the sound level of the previous frame, but it may instead use information from the frame being processed. That is, if the sound level of a separated area is equal to or higher than a predetermined value, the separation area control unit 113 instructs the sound source separation unit 112 to perform sound source separation on areas obtained by dividing that area more finely. The separation area control unit 113 and the sound source separation unit 112 repeat this process until the areas have shrunk to a predetermined size. In this way, separation area control does not lag by one frame. However, because the amount of processing grows as the number of sound sources increases, this method is best used in scenes where the number of sound sources is known in advance to be small, or with the number of repetitions limited to what the processing load allows.
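The repeated refinement within a single frame can be sketched as a recursive subdivision. This is only an illustration under several assumptions that are not in the publication: areas are axis-aligned rectangles split into four equal parts, `level_of` is a hypothetical callback that returns the separated sound level of an area, and `max_depth` stands in for the limit on the number of repetitions.

```python
from typing import Callable, List, Tuple

Area = Tuple[float, float, float, float]  # (x, y, width, height)


def split4(area: Area) -> List[Area]:
    """Split a rectangle into four equal quadrants."""
    x, y, w, h = area
    hw, hh = w / 2, h / 2
    return [(x, y, hw, hh), (x + hw, y, hw, hh),
            (x, y + hh, hw, hh), (x + hw, y + hh, hw, hh)]


def refine_areas(area: Area, level_of: Callable[[Area], float],
                 threshold: float, min_size: float, max_depth: int) -> List[Area]:
    """Subdivide areas whose level reaches the threshold.

    Stops when the area is at the minimum size, when the repetition
    budget (max_depth) is used up, or when the level is below threshold.
    """
    _, _, w, h = area
    if max_depth <= 0 or w <= min_size or h <= min_size:
        return [area]
    if level_of(area) < threshold:
        return [area]
    result: List[Area] = []
    for child in split4(area):
        result.extend(refine_areas(child, level_of, threshold,
                                   min_size, max_depth - 1))
    return result
```

Lowering `max_depth` when the processing load is high bounds the number of separation passes per frame, at the cost of coarser areas around loud sources.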

In this embodiment, the audio signal processing unit 114 performs delay correction, gain correction, and echo removal, but it may also perform other processing, such as noise removal for each area.

In this embodiment, the replay reproduction signal generation unit 117 and the real-time reproduction signal generation unit 116 have been described as performing the same processing. However, the two units may mix differently. For example, because the real-time reproduction signal generation unit 116 may receive sound from coarsely divided separation areas, it may, depending on whether the processing has already been performed, lower the mixing level of areas with a large area size.

Although not shown in this embodiment, display control may be performed so that the status of area control is shown on a display device, as illustrated in FIG. 6. For example, the display screen shows a time bar 501, a time cursor 502, an area division display 503, an area division ratio display 504, and the like. The time bar 501 represents the recording time up to the present, and the position of the time cursor 502 indicates the time shown on the display screen. The area division display 503 shows the state of area division at the time indicated by the time cursor 502. The image showing this division state may be superimposed on an image of the actual space, on CG reproducing the actual space, or the like. The area division ratio display 504 shows the proportion of each area division size. Alternatively, a screen like that of FIG. 3 may be displayed. Displaying the information in this way makes the state of area division intuitively easy to understand. The display device may further include an input device such as a touch panel. For example, the user may select, by touch or the like, an area whose area size has become large and set the system to give priority to dividing that area more finely.

<Embodiment 2>
The second embodiment (Embodiment 2) of the present invention relates to an acoustic system in which a plurality of users each set a listening point and sound corresponding to each listening point is reproduced by a reproduction device.

(Acoustic system)
FIG. 7 is a block diagram showing the configuration of the acoustic system 20. The acoustic system 20 includes a sound collection unit 21, a reproduction signal generation unit 22, and a plurality of reproduction units 23. The sound collection unit 21, the reproduction signal generation unit 22, and the plurality of reproduction units 23 exchange data with one another over wired or wireless transmission paths. The transmission paths between the sound collection unit 21, the reproduction signal generation unit 22, and the reproduction units 23 are realized by a dedicated communication path such as a LAN, but may also pass through a public communication network such as the Internet.

FIG. 8(A) is a block diagram showing the configuration of the sound collection unit 21, FIG. 8(B) is a block diagram showing the configuration of the reproduction signal generation unit 22, and FIG. 8(C) is a block diagram showing the configuration of the reproduction unit 23. The sound collection unit 21 in FIG. 8(A) includes a microphone array 111 and a collected sound signal transmission unit 211. The microphone array 111 is the same as in Embodiment 1, so a detailed description is omitted. The collected sound signal transmission unit 211 transmits the microphone signals input from the microphone array 111.

The reproduction signal generation unit 22 in FIG. 8(B) includes a sound source separation unit 112, a separation area control unit 113, an audio signal processing unit 114, a storage unit 115, a collected sound signal reception unit 221, a listening point reception unit 222, a reproduction signal generation unit 223, and a reproduction signal transmission unit 224. The sound source separation unit 112, the audio signal processing unit 114, and the storage unit 115 are substantially the same as in Embodiment 1, so a detailed description is omitted.

The separation area control unit 113 controls the areas in which the sound source separation unit 112 performs sound source separation, based on a plurality of listening points input from the listening point reception unit 222 described later. Here, a listening point is information consisting of the position and orientation of a virtual listener in the space, as set by a user, together with a time. For example, the separation area control unit 113 monitors the processing load of the reproduction signal generation unit 22 and, when the load becomes large, controls the areas so as to reduce the number of separation areas based on the distribution of the listening points. Suppose that the listener positions set by users listening in real time are distributed as shown in FIG. 9(A). In that case, as shown in FIG. 9(B), the areas are controlled so that the surroundings of areas where more listening points are set are divided finely and areas with few listening points are divided coarsely.
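One way to picture division by listening-point distribution is a quadtree-style split that keeps dividing only where listening points cluster. The sketch below is an assumption-laden illustration, not the method of the publication: the rectangular areas, the four-way split, and the parameters `max_points` and `min_size` are all hypothetical.

```python
from typing import List, Tuple

Area = Tuple[float, float, float, float]  # (x, y, width, height)
Point = Tuple[float, float]               # listener position (x, y)


def count_inside(area: Area, points: List[Point]) -> int:
    """Count the listening points that fall inside an area."""
    x, y, w, h = area
    return sum(1 for px, py in points if x <= px < x + w and y <= py < y + h)


def divide_by_listening_points(area: Area, points: List[Point],
                               max_points: int, min_size: float) -> List[Area]:
    """Divide finely where listening points are dense, coarsely elsewhere.

    An area is split while it holds more than max_points listening points
    and is still larger than min_size.
    """
    x, y, w, h = area
    if count_inside(area, points) <= max_points or w <= min_size or h <= min_size:
        return [area]
    hw, hh = w / 2, h / 2
    children = [(x, y, hw, hh), (x + hw, y, hw, hh),
                (x, y + hh, hw, hh), (x + hw, y + hh, hw, hh)]
    result: List[Area] = []
    for child in children:
        result.extend(divide_by_listening_points(child, points,
                                                 max_points, min_size))
    return result
```

Under this sketch, raising `max_points` when the monitored load grows yields fewer areas overall while still keeping the densest clusters of listening points in the smallest areas.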

When a user designates a listening point at a past time, that is, when a replay is requested, the separation area control unit 113 determines whether sound source separation is needed based on the state of the separation areas at that time and the designated listening point, and, if it is needed, performs sound source separation according to the processing load. For example, if no area control was performed at the designated time, or if area control was performed but the surroundings of the newly designated listening point were already separated in sufficiently fine areas, there is no need to perform separation again. On the other hand, if area control was performed at the designated time and the division around the newly designated listening point is coarse, the separation area control unit 113 outputs a control signal to the sound source separation unit 112 so that the area division around the listening point becomes finer.
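The replay decision described above reduces to a small predicate. The following sketch assumes, for simplicity, that "around the listening point" means the single recorded area containing the point, and that an area counts as fine when its longer side does not exceed `fine_size`. The names `RecordedArea`, `needs_reseparation`, and `fine_size` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class RecordedArea:
    """A separation area as it was used at the designated past time."""
    x: float
    y: float
    width: float
    height: float

    def contains(self, point: Tuple[float, float]) -> bool:
        px, py = point
        return (self.x <= px < self.x + self.width
                and self.y <= py < self.y + self.height)


def needs_reseparation(area_controlled: bool, areas: List[RecordedArea],
                       listening_pos: Tuple[float, float],
                       fine_size: float) -> bool:
    """Return True only if the frame was area-controlled and the area
    around the listening point is coarser than fine_size."""
    if not area_controlled:
        return False  # areas were never merged, so nothing to redo
    for area in areas:
        if area.contains(listening_pos):
            return max(area.width, area.height) > fine_size
    return False  # listening point lies outside every recorded area
```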

The collected sound signal reception unit 221 receives the collected sound signal from the sound collection unit 21. The listening point reception unit 222 receives a listening point from each of the plurality of reproduction units 23 and outputs the received listening points to the separation area control unit 113 and the reproduction signal generation unit 223. The reproduction signal generation unit 223 combines the functions of the real-time reproduction signal generation unit 116 and the replay reproduction signal generation unit 117 of Embodiment 1. It generates a reproduction signal according to the listener position, orientation, and time input from the listening point reception unit 222. If the input time is real time, it operates in the same way as the real-time reproduction signal generation unit 116; if the time is in the past, it operates in the same way as the replay reproduction signal generation unit 117. The audio signal generated for each listening point is output to the reproduction signal transmission unit 224, which outputs each received audio signal to the corresponding reproduction unit 23.

The reproduction unit 23 in FIG. 8(C) includes a listening point input unit 231, a listening point transmission unit 232, a reproduction signal reception unit 233, and a speaker 234. The listening point input unit 231 is an input device with which the user can set a time and the position and orientation of a virtual listener in the space where sound is being collected. It is realized by a keyboard, a pointing device, a touch panel, or the like. The listening point that has been set is output to the listening point transmission unit 232.

The listening point transmission unit 232 outputs the listening point set by the user to the listening point reception unit 222. The reproduction signal reception unit 233 receives the audio signal corresponding to the listening point set with the listening point input unit 231 and outputs it to the speaker 234. The speaker 234 performs D/A conversion on the input audio signal and emits the sound.

(Processing procedure)
Next, the procedure of the processing executed by the acoustic system 20 will be described with reference to FIG. 10. FIGS. 10A to 10C are flowcharts showing the procedure of the processing executed by the acoustic system 20 of this embodiment.

As shown in FIG. 10A, the microphone array 111 first collects the sound in the space (S201). The collected sound is output to the collected sound signal transmission unit 211. The collected sound signal is then transmitted from the collected sound signal transmission unit 211 of the sound collection unit 21 and received by the collected sound signal reception unit 221 of the reproduction signal generation unit 22 (S202). The received signal is output to the sound source separation unit 112. Next, listening points are input at the listening point input units 231 of the plurality of reproduction units 23 (S203). Each input listening point is output to the listening point transmission unit 232.

The listening points are then transmitted from the listening point transmission units 232 and received by the listening point reception unit 222 of the reproduction signal generation unit 22 (S204). The received listening points are output to the separation area control unit 113 and the reproduction signal generation unit 223.

Next, the separation area control unit 113 determines whether the processing will be in time for real-time reproduction (S205). If it determines that the processing will be in time (YES in S205), the process proceeds to S208. If it determines that the processing will not be in time (NO in S205), the process proceeds to S206.

In S206, the separation area control unit 113 controls the separation areas. That is, it outputs to the sound source separation unit 112 a control that merges a plurality of areas and reduces the number of areas. It also generates a separation area control list and manages the control information for the separation areas. Then, once the areas have been controlled, the sound source separation unit 112 outputs the collected sound signal of that frame to the storage unit 115, which records it (S207). The process then proceeds to S208.

In S208, the sound source separation unit 112 performs sound source separation for each area. The separated audio signal of each area is output to the audio signal processing unit 114.

Next, the audio signal processing unit 114 processes the audio signals (S209). The processed audio signals are output to the storage unit 115.

The storage unit 115 then records the processed audio signal of each area (S210). Next, the reproduction signal generation unit 223 acquires from the storage unit 115 the sound of each area according to the times of the plurality of listening points input from the listening point reception unit 222, and mixes the sound for reproduction for each listening point (S211). The mixed reproduction signals are output to the reproduction signal transmission unit 224.

The reproduction signals generated for the respective listening points are then transmitted from the reproduction signal transmission unit 224, and the reproduction signal corresponding to each input listening point is received by the reproduction signal reception unit 233 of the corresponding reproduction unit 23 (S212). Finally, the reproduction signal received by the reproduction signal reception unit 233 is reproduced from the speaker (S213).

Next, with reference to FIG. 10B, the processing performed after it was determined in S205 of FIG. 10A that processing would not be in time and the number of areas was reduced will be described.

When the processing load falls below a predetermined value, the separation area control unit 113 refers to the separation area control list and determines the time (frame) and area to be separated (S221). Information on the area and time to be separated is output to the sound source separation unit 112.

Next, the sound source separation unit 112 reads the collected sound signal from the storage unit 115 based on the input time information (S222). S223 to S225 are the same as S208 to S210, so a detailed description is omitted.
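The interplay between the real-time path (S205 to S207) and the deferred path (S221 and S222) can be sketched as a small bookkeeping class. This is a simplified illustration under stated assumptions: the separation area control list is reduced to a list of frame numbers, the stored collected sound signal is a plain list of samples, and the names `SeparationControl`, `on_frame`, and `next_deferred` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class SeparationControl:
    # Raw collected sound signals kept for frames processed with merged areas (S207).
    raw_frames: Dict[int, List[float]] = field(default_factory=dict)
    # Simplified separation area control list: frames still needing fine separation.
    control_list: List[int] = field(default_factory=list)

    def on_frame(self, frame_no: int, samples: List[float], in_time: bool) -> str:
        """S205 to S207: choose full or merged-area separation for this frame."""
        if in_time:
            return "full"
        self.control_list.append(frame_no)
        self.raw_frames[frame_no] = list(samples)
        return "merged"

    def next_deferred(self, load: float,
                      load_threshold: float) -> Optional[Tuple[int, List[float]]]:
        """S221 and S222: when the load is low, pick a recorded frame to re-separate."""
        if load >= load_threshold or not self.control_list:
            return None
        frame_no = self.control_list.pop(0)
        return frame_no, self.raw_frames.pop(frame_no)
```

A caller would feed each returned frame back through the same separation, signal processing, and recording steps used for S208 to S210.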

As described above, separation areas are merged based on the processing load and the distribution of the plurality of listening points, and the number of areas is thereby reduced. Important audio signals can therefore be reproduced faithfully, and processing can be made efficient enough to achieve real-time operation. Furthermore, at replay time, a reproduction signal can be generated using separated sound even for areas for which transmission was not in time during real-time reproduction.

In this embodiment, all the reproduction units 23 have the same configuration for simplicity, but their configurations may differ. Although not described in this embodiment, the system may be used in combination with a free viewpoint video generation system. For example, a plurality of imaging devices capture, from all directions, substantially the same space as that in which sound is collected, and a free viewpoint video is generated from the captured images. In that case, the listening point may be calculated from the viewpoint, or the free viewpoint video may be generated in conjunction with the listening point.

In this embodiment, the reproduction signal generation unit 223 is provided in the reproduction signal generation unit 22, but it may instead be provided in the reproduction unit 23. Also in this embodiment, the separation area control unit 113 determined the separation areas using only the positions of the plurality of listeners. However, as shown in FIG. 9(C), it may take the orientation of each listener into account and divide the region in front of the listener, in the listening direction, finely while dividing the region behind the listener coarsely.
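A front/back test of this kind can be expressed with a dot product between the listening direction and the vector from the listener to an area's center. The sketch below is illustrative only: the two-dimensional geometry, the angle convention (radians, measured from the x axis), and the names `is_in_front` and `cell_size_for` are assumptions, not details from the publication.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def is_in_front(listener_pos: Point, listener_angle_rad: float,
                point: Point) -> bool:
    """True if the point lies in the half-plane the listener is facing."""
    dx = point[0] - listener_pos[0]
    dy = point[1] - listener_pos[1]
    # Dot product of the facing direction with the vector to the point.
    dot = dx * math.cos(listener_angle_rad) + dy * math.sin(listener_angle_rad)
    return dot > 0.0


def cell_size_for(area_center: Point, listener_pos: Point,
                  listener_angle_rad: float,
                  fine: float, coarse: float) -> float:
    """Choose a fine area size in front of the listener, coarse behind."""
    if is_in_front(listener_pos, listener_angle_rad, area_center):
        return fine
    return coarse
```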

When area control is performed in this embodiment, the listening positions that can be input with the listening point input unit 231 may be restricted. In this embodiment, the reproduction units 23 were treated uniformly, but each listening point may be given a different weight for controlling the separation areas. As in Embodiment 1, the system may also include a display device that shows the status of area control and an input device for instructing separation area control.

In each embodiment of the present invention, even in real-time reproduction, where the time available before reproduction is limited, controlling the number of areas subject to sound source separation makes it possible to collect sound from the entire space and to reproduce it while maintaining the resolution of important areas.

<Other embodiments>
The present invention can also be realized by processing in which a program that implements one or more functions of the above embodiments is supplied to a system or apparatus via a network or a storage medium, and one or more processors in a computer of the system or apparatus read and execute the program. It can also be realized by a circuit (for example, an ASIC) that implements one or more functions.

100: audio signal processing device, 111: microphone array, 112: sound source separation unit, 113: separation area control unit, 114: audio signal processing unit, 115: storage unit, 116: real-time reproduction signal generation unit, 117: replay reproduction signal generation unit

Claims

An acoustic system comprising:
a microphone array that collects sound;
separation means for separating the sound collected by the microphone array into sounds in a plurality of separation areas into which a certain space is divided;
control means for controlling the division of the certain space into the plurality of separation areas; and
generation means for generating a reproduction signal based on the separated sounds.

The acoustic system according to claim 1, wherein the control means controls the arrangement and the number of the plurality of separation areas.

The acoustic system according to claim 2, wherein the control means controls the arrangement and the number of the plurality of separation areas based on a processing load of at least one of the separation means and the generation means.

The acoustic system according to claim 3, wherein the control means performs control so as to divide a region of the certain space in which the processing load of at least one of the separation means and the generation means is larger into finer separation areas.

The acoustic system according to claim 2, wherein the control means controls the arrangement and the number of the plurality of separation areas based on sound detected in the certain space.

The acoustic system according to claim 5, wherein the control means performs control so as to divide a region of the certain space in which a higher sound level is detected into finer separation areas.

The acoustic system according to claim 5, wherein the control means performs control so as to divide a region of the certain space in which sound having a predetermined characteristic is detected into smaller separation areas than other regions.

The acoustic system according to claim 2, further comprising photographing means for photographing the certain space and generating an image, wherein the control means controls the arrangement and the number of the plurality of separation areas based on the image generated by the photographing means.

The acoustic system according to claim 8, wherein the control means performs control so as to divide a region in which more of a specific subject appears in the image into finer separation areas.

The acoustic system according to claim 2, wherein, when the position of a listening point at which sound is to be heard is designated, the control means performs control so as to divide a region closer to the listening point into finer separation areas.

The acoustic system according to claim 10, wherein, when the listening direction at the listening point is designated, the control means performs control so as to divide a region lying in the listening direction into finer separation areas.

The acoustic system according to claim 10, wherein the generation means generates, as the reproduction signal, a signal of the sound to be heard at the listening point.

The acoustic system according to any one of claims 1 to 12, further comprising display control means for causing a display means to display an image showing an arrangement of the plurality of separation areas divided by the control means.

The acoustic system according to claim 2, wherein the control means controls the arrangement and the number of the plurality of separation areas according to an instruction from a user.

A method for controlling an acoustic system including a microphone array that collects sound, the method comprising:
a separation step of separating the sound collected by the microphone array into sounds in a plurality of separation areas into which a certain space is divided;
a generation step of generating a reproduction signal based on the separated sounds; and
a control step of controlling, before the separation step, the division of the certain space into the plurality of separation areas.

A signal generation device that generates a reproduction signal based on sound collected by a microphone array, the device comprising:
separation means for separating the sound collected by the microphone array into sounds in a plurality of separation areas into which a certain space is divided;
control means for controlling the division of the certain space into the plurality of separation areas; and
generation means for generating a reproduction signal based on the separated sounds.

A computer program for causing a computer to function as each of the means of the signal generation device according to claim 16.