JP2014063145A

JP2014063145A - Environmental sound synthesizer, environmental sound transmission system, environmental sound synthesizing method, environmental sound transmission method, and program

Info

Publication number: JP2014063145A
Application number: JP2013169037A
Authority: JP
Inventors: Masaru Kamamoto; 優鎌本; Takehiro Moriya; 健弘守谷; Akira Omoto; 章尾本; Kazuhiko Kawahara; 一彦河原
Original assignee: Kyushu University NUC; Nippon Telegraph and Telephone Corp
Current assignee: Kyushu University NUC; Nippon Telegraph and Telephone Corp
Priority date: 2012-08-27
Filing date: 2013-08-16
Publication date: 2014-04-10
Anticipated expiration: 2033-08-16
Also published as: JP6095223B2

Abstract

PROBLEM TO BE SOLVED: To efficiently transmit environmental sound such as applause, rhythmic hand clapping, cheering, and shouting recorded at a transmission source, and to reproduce the atmosphere of the transmission source at a transmission destination. SOLUTION: An environmental sound synthesizer that generates environmental sound by acquiring, from an environmental sound analyzer, an environmental volume parameter relating to the volume of an acoustic signal at the transmission source comprises: a data receiving unit configured to receive the environmental volume parameter from the environmental sound analyzer; a template storage unit configured to store a one-frame template of environmental sound (hereinafter referred to as a template) in association with information corresponding to the volume of the environmental sound of the template; and a sound source synthesis unit configured to generate environmental sound by selecting from the template storage unit a template that has the same volume as the environmental volume parameter and synthesizing the selected template.

Description

The present invention relates to an environmental sound synthesizer, an environmental sound transmission system, an environmental sound synthesis method, an environmental sound transmission method, and a program for reproducing, at a transmission destination, environmental sound collected at a transmission source.

A technique has been proposed that synthesizes and outputs multiple applause sounds synchronized with a single user, using individual differences calculated from measured data and the degree of fluctuation in tempo and loudness (Non-Patent Document 1). Acoustic coding is also known as a technique for transmitting sound from one location and reproducing it at another. For example, Non-Patent Document 2 proposes a high-quality, low-bit-rate acoustic coding technique based on a model matched to the characteristics of musical sound: it makes skillful use of auditory masking and exploits the characteristics of musical instruments by copying low-frequency components to the high-frequency band.

Non-Patent Document 1: Ryuichi Nishimura, Tsutomu Miyazato, "Synthesis of clap sound by virtual group", IEICE Technical Report, MVE (Multimedia and Virtual Environment), 98(684), pp. 17-24, IEICE, March 1999.
Non-Patent Document 2: Stefan Meltzer and Gerald Moser, "MPEG-4 HE-AAC v2 - audio coding for today's digital media world," EBU Technical Review, Jan. 2006.

Non-Patent Document 1 aims to virtually create an environment in which several people in sync with the user appear to be present, and it synthesizes virtual applause matched to the tempo of the user's own clapping. It therefore cannot transmit the actual situation at a real remote location (applause or rhythmic clapping) to another location and reproduce it there. Nor does it cover transmitting and reproducing environmental sounds other than applause, such as cheering and shouting. In addition, environmental sounds such as applause, cheering, and shouting are close to white noise, unlike pure speech or instrument sounds, so conventional acoustic coding techniques such as that of Non-Patent Document 2 cannot represent them well and the sound quality deteriorates. An object of the present invention is therefore to provide an environmental sound synthesizer that can efficiently transmit environmental sounds such as applause, rhythmic clapping, cheering, and shouting collected at a transmission source and reproduce the atmosphere of the transmission source at the transmission destination.

The environmental sound synthesizer of the present invention generates environmental sound by acquiring, from an environmental sound analyzer, an environmental volume parameter relating to the volume of an acoustic signal at the transmission source, and comprises a data receiving unit, a template storage unit, and a sound source synthesis unit.

The data receiving unit receives the environmental volume parameter from the environmental sound analyzer. The template storage unit stores a one-frame environmental sound template (hereinafter referred to as a template) in association with information corresponding to the volume of the environmental sound of that template. The sound source synthesis unit selects from the template storage unit a template whose volume equals the environmental volume parameter, and synthesizes the selected template to generate environmental sound.

According to the environmental sound synthesizer of the present invention, environmental sounds such as applause, rhythmic clapping, cheering, and shouting collected at the transmission source can be transmitted efficiently, and the atmosphere of the transmission source can be reproduced at the transmission destination.

FIG. 1 is a block diagram showing a configuration example of the environmental sound transmission system of the present invention.
FIG. 2 is a block diagram showing the configuration of the environmental sound analyzer of Embodiment 1.
FIG. 3 is a flowchart showing the operation of the environmental sound analyzer of Embodiment 1.
FIG. 4 is a block diagram showing the configuration of the environmental sound analyzer of Embodiment 2.
FIG. 5 is a flowchart showing the operation of the environmental sound analyzer of Embodiment 2.
FIG. 6 is a diagram illustrating the parameter generation procedure of the parameter conversion unit of Embodiment 2.
FIG. 7 is a block diagram showing the configuration of the environmental sound analyzer of Modification 1 of Embodiment 2.
FIG. 8 is a flowchart showing the operation of the environmental sound analyzer of Modification 1 of Embodiment 2.
FIG. 9 is a block diagram showing the configuration of the environmental sound synthesizer of Embodiment 3.
FIG. 10 is a flowchart showing the operation of the environmental sound synthesizer of Embodiment 3.
FIG. 11 is a block diagram showing the configuration of the environmental sound synthesizer of Embodiment 4.
FIG. 12 is a flowchart showing the operation of the environmental sound synthesizer of Embodiment 4.
FIG. 13 is a diagram illustrating the procedure by which the sound source synthesis unit of Embodiment 4 synthesizes environmental sound segment templates.
FIG. 14 is a block diagram showing the configuration of the environmental sound analyzer of Embodiment 5 and its modification.
FIG. 15 is a flowchart showing the operation of the environmental sound analyzer of Embodiment 5 and its modification.
FIG. 16 is a diagram illustrating the change over time of the frequency components of applause and rhythmic clapping sounds.
FIG. 17 is a block diagram showing the configuration of the environmental sound synthesizer of Embodiment 6.
FIG. 18 is a flowchart showing the operation of the environmental sound synthesizer of Embodiment 6.
FIG. 19 is a block diagram showing the configuration of the environmental sound synthesizers of Embodiments 7 and 8.
FIG. 20 is a flowchart showing the operation of the environmental sound synthesizers of Embodiments 7 and 8.
FIG. 21 is a diagram illustrating an example in which environmental sound segment templates and output probabilities are stored in association with each other in the template storage unit.

Hereinafter, embodiments of the present invention will be described in detail. Components having the same function are given the same reference numeral, and duplicate description is omitted.

The total volume of environmental sounds such as applause, rhythmic clapping, cheering, and shouting increases with the number of spectators. The present invention does not transmit the environmental sound itself; it transmits only information representing the volume of the environmental sound. At the transmission destination, a pre-stored environmental sound template is converted according to the information representing the volume, thereby reproducing the environmental sound of the transmission source (or a sound similar to it).

In addition, the acoustic power of a single clap (one strike of both hands together), whether in applause or rhythmic clapping, varies little between individuals. The time interval between one clap and the next (hereinafter also called the clap interval) also varies little between individuals and is about 200 ms to 300 ms. Therefore, by preparing one person's clap sound (a single clap) as an environmental sound segment template and playing it back repeatedly at intervals that fluctuate according to individual differences (200 ms to 300 ms), a sound similar to another person's applause can be constructed.
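The repeated playback with a fluctuating clap interval can be sketched as a schedule of onset times. This is only an illustration: the function name `clap_onsets`, the uniform distribution of the interval, and the random start offset are assumptions; the text only states that the interval lies between about 200 ms and 300 ms.

```python
import random


def clap_onsets(duration_ms, min_gap_ms=200.0, max_gap_ms=300.0, seed=None):
    """Return onset times (in ms) at which one simulated clapper would
    trigger the single-clap template.

    Assumption: each clap interval is drawn uniformly from
    [min_gap_ms, max_gap_ms]; the source gives only the range.
    """
    rng = random.Random(seed)
    onsets = []
    # Assumed random start offset so that several clappers do not align.
    t = rng.uniform(0.0, max_gap_ms)
    while t < duration_ms:
        onsets.append(t)
        t += rng.uniform(min_gap_ms, max_gap_ms)
    return onsets


if __name__ == "__main__":
    # Five seconds of claps for one hypothetical spectator.
    for onset in clap_onsets(5000.0, seed=1):
        print(f"{onset:8.1f} ms")
```

Generating one such schedule per simulated spectator and mixing the template at each onset would give a crowd-like result, under the same assumptions.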

<Environmental sound transmission system>
The environmental sound transmission system of the present invention is described below with reference to FIG. 1, a block diagram showing a configuration example of the system. As shown in FIG. 1, the system consists of an environmental sound analyzer at the transmission source and an environmental sound synthesizer at the transmission destination. The environmental sound analyzer extracts and outputs information corresponding to the volume of the input acoustic signal (environmental sound); this information is the environmental volume parameter P_j, hereinafter also simply called the parameter. The environmental sound synthesizer synthesizes and outputs environmental sound by converting a pre-stored environmental sound template according to the input environmental volume parameter P_j. In the following, Embodiment 1 describes the environmental sound analyzer 1, Embodiment 2 the environmental sound analyzer 2, Modification 1 of Embodiment 2 the environmental sound analyzer 2', Embodiment 3 the environmental sound synthesizer 3, Embodiment 4 the environmental sound synthesizer 4, Embodiment 5 the environmental sound analyzer 5, Embodiment 6 the environmental sound synthesizer 6, Embodiment 7 the environmental sound synthesizer 7, Embodiment 8 the environmental sound synthesizer 8, Embodiment 9 the environmental sound synthesizer 9, and Embodiment 10 the environmental sound synthesizer 10.
The combinations of analyzer and synthesizer are named as follows: the environmental sound analyzer 1 with the environmental sound synthesizer 3 is the environmental sound transmission system 1000; the analyzer 2 with the synthesizer 3 is the system 2000; the analyzer 2' with the synthesizer 3 is the system 2000'; the analyzer 1 with the synthesizer 4 is the system 3000; the analyzer 2 with the synthesizer 4 is the system 4000; the analyzer 2' with the synthesizer 4 is the system 4000'; and the analyzer 5 with the synthesizer 6 is the system 5000.

The environmental sound analyzer of Embodiment 1 of the present invention is described below with reference to FIGS. 2 and 3. FIG. 2 is a block diagram showing the configuration of the environmental sound analyzer 1 of this embodiment. FIG. 3 is a flowchart showing the operation of the environmental sound analyzer 1 of this embodiment. As shown in FIG. 2, the environmental sound analyzer 1 of this embodiment includes a sound collection unit 11, a volume calculation unit 12, a parameter conversion unit 13, and a data transmission unit 14.

<Sound collection unit 11>
The sound collection unit 11 collects sound at the transmission source (S11). Here, it is assumed that applause at the transmission source is input to the sound collection unit 11.

<Volume calculation unit 12>
The volume calculation unit 12 acquires the acoustic signal of the applause. This signal is a sequence sampled at a predetermined sampling frequency. Let X_j denote the acoustic signal of the j-th frame, with X_j = (x_j(1), x_j(2), ..., x_j(N)), where N is the number of samples per frame. For example, with 8 kHz sampling and a frame length of 20 ms, N = 160. If a shorter delay is preferred, the frame length can be shortened; if a longer delay is acceptable, the frame length can be increased. For each frame, the volume calculation unit 12 obtains and outputs a value corresponding to the volume of the input applause signal (hereinafter also called the "value corresponding to the applause volume"). Specifically, for each frame, the volume calculation unit 12 calculates the average energy of the input applause signal X_j = (x_j(1), x_j(2), ..., x_j(N)),

$$E_j = \frac{1}{N} \sum_{n=1}^{N} x_j(n)^2$$

(S12).
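As a minimal sketch of step S12, the per-frame value can be computed as below. It assumes that "average energy" means the mean of the squared sample values, which is consistent with the stated maximum of 2^30 for signed 16-bit input; the function names are illustrative and not taken from the source.

```python
def samples_per_frame(sample_rate_hz, frame_ms):
    """Number of samples N in one frame, e.g. 8000 Hz and 20 ms give 160."""
    return sample_rate_hz * frame_ms // 1000


def frame_average_energy(frame):
    """Average energy E_j of one frame X_j = (x_j(1), ..., x_j(N)).

    Assumption: E_j is the mean of the squared samples.
    """
    n = len(frame)
    if n == 0:
        raise ValueError("frame must contain at least one sample")
    return sum(x * x for x in frame) / n


if __name__ == "__main__":
    n = samples_per_frame(8000, 20)
    loudest = [-32768] * n  # worst case for signed 16-bit input
    print(n, frame_average_energy(loudest))
```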

<Parameter conversion unit 13>
The parameter conversion unit 13 acquires the value corresponding to the applause volume output from the volume calculation unit 12, quantizes it, and outputs an environmental volume parameter. Specifically, the parameter conversion unit 13 quantizes the range that the average energy E_j can take (for example, when x_j(n) (n = 1, 2, ..., N) is signed 16-bit, the minimum is 0 and the maximum is 2^30) into a predetermined number of levels (for example, 16 bits) and outputs the resulting index as the environmental volume parameter P_j (S13).
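One possible reading of step S13 is a uniform quantizer over the range 0 to 2^30. The source does not say how the range is divided, so the uniform step size, the clipping, and the name `quantize_energy` are assumptions made for illustration only.

```python
def quantize_energy(energy, bits=16, max_energy=2 ** 30):
    """Map an average energy E_j in [0, max_energy] to an index P_j.

    Assumption: the range is split into 2**bits equal-width cells;
    the source only says it is quantized to a predetermined number of levels.
    """
    levels = 1 << bits
    clipped = min(max(energy, 0), max_energy)
    index = int(clipped * levels / max_energy)
    # The top of the range falls into the last cell.
    return min(index, levels - 1)


if __name__ == "__main__":
    for e in (0, 2 ** 20, 2 ** 29, 2 ** 30):
        print(e, quantize_energy(e))
```

A non-uniform (for example logarithmic) partition would spend more levels on quiet frames; the choice is left open by the text.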

<Data transmission unit 14>
The data transmission unit 14 transmits the environmental volume parameter P_j output by the parameter conversion unit 13 to the environmental sound synthesizer 3 (or 4) at the transmission destination (S14). The environmental sound synthesizer 3 is described in Embodiment 3, and the environmental sound synthesizer 4 in Embodiment 4.

Thus, the environmental sound analyzer 1 of this embodiment can transmit applause collected at the transmission source efficiently and with low delay.

[Operation Example 2 of Embodiment 1]
Embodiment 1 above took applause as an example of the environmental sound at the transmission source and described an operation example of the environmental sound analyzer 1 that analyzes that applause. The invention is not limited to this, and environmental sounds other than applause may also be targeted. For example, cheering or shouting may be treated as the environmental sound, or the acoustic signal (including noise) obtained by removing the sound of the main content of the venue from the sound collected at the transmission source may be treated as the environmental sound.

The environmental sound analyzer 1 in Operation Example 2 of Embodiment 1 is the same as in the operation example above, except that the applause and applause volume handled by the sound collection unit 11, the volume calculation unit 12, the parameter conversion unit 13, and the data transmission unit 14 are replaced by the environmental sound and the volume of the environmental sound.

Applause, cheering, shouting, and noise are all important elements that determine the atmosphere of the venue at the transmission source, yet they are signals close to white noise in which many acoustic signals are mixed. As stated above, these sounds are called environmental sounds. As long as the timing and volume at which the environmental sound was produced at the transmission source are preserved, the atmosphere of the venue can be reproduced even if the signal itself is not identical to the environmental sound at the transmission source. Therefore, by extracting a parameter relating to the volume of the environmental sound at the transmission source, the environmental sound analyzer 1 can transmit the environmental sound collected at the transmission source efficiently and with low delay.

The environmental sound analyzer of Embodiment 2 of the present invention is described below with reference to FIGS. 4, 5, and 6. FIG. 4 is a block diagram showing the configuration of the environmental sound analyzer 2 of this embodiment. FIG. 5 is a flowchart showing the operation of the environmental sound analyzer 2 of this embodiment. FIG. 6 is a diagram illustrating the parameter generation procedure of the parameter conversion unit 23 of this embodiment. As shown in FIG. 4, the environmental sound analyzer 2 of this embodiment includes a sound collection unit 11, a volume calculation unit 12, a parameter conversion unit 23, and a data transmission unit 14. The sound collection unit 11, the volume calculation unit 12, and the data transmission unit 14 are the same as the components with the same numbers in the environmental sound analyzer 1 of Embodiment 1, so their description is omitted where appropriate.

<Volume calculation unit 12>
The volume calculation unit 12 acquires a signal sequence X_j = (x_j(1), x_j(2), ..., x_j(6)) sampled at 48 kHz, with each frame consisting of 6 samples (N = 6). For each frame, the volume calculation unit 12 calculates, from the input applause signal X_j = (x_j(1), x_j(2), ..., x_j(6)), the average energy

$$E_j = \frac{1}{6} \sum_{n=1}^{6} x_j(n)^2$$

(S12).

<Parameter conversion unit 23>
The parameter conversion unit 23 obtains a sequence F_j by transforming the obtained average energy E_j according to expression (1).

That is, as shown in FIG. 6, among the values that F_j can take after being converted to an integer by the Gauss function or the floor function (0 to 32768), each odd value is given a negative sign and then decremented by 1. As a result, every F_j is even. Next, each of these even values F_j is divided by 2 (a 1-bit right shift may be used instead). To fit this value within the range handled by G.711, if μ-law is used, it is divided by 2 once more (again, a 1-bit right shift may be used) to obtain the value G_j. G_j is then encoded according to ITU-T G.711 and thereby converted into a G.711 code (number). Grouping six samples at 48 kHz into one block (one frame) corresponds to one sample at 8 kHz, so one G.711 symbol can be assigned to each G_j. The assigned symbol sequence is output as the parameter P_j (S23). If the parameter P_j is transmitted over a fixed telephone line in the same way as ordinary speech, the delay is short. A logarithm may be used, as in expression (2), instead of expression (1).
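The sign-folding step described for FIG. 6 can be sketched as follows. The code follows only the verbal description (odd values are negated and decremented by 1, then everything is halved, and halved once more for μ-law); the function names are illustrative, the rounding of the final halving is an assumption, and the G.711 encoding itself is not shown.

```python
def fold_to_signed(f):
    """Fold an integer F_j in 0..32768 into the signed range -16384..16384."""
    if not 0 <= f <= 32768:
        raise ValueError("F_j is expected to lie in 0..32768")
    if f % 2 == 1:
        f = -f - 1  # odd value: negate, then subtract 1 (result is even)
    return f // 2   # every value is even here, so the division is exact


def to_g711_input(f, mu_law=True):
    """Value G_j handed to the G.711 encoder (the encoder is not implemented).

    Assumption: the extra halving for mu-law rounds toward minus infinity,
    as an arithmetic right shift does.
    """
    g = fold_to_signed(f)
    return g >> 1 if mu_law else g


if __name__ == "__main__":
    for f in (0, 1, 2, 3, 32767, 32768):
        print(f, fold_to_signed(f), to_g711_input(f))
```

The folding is one-to-one, so small volumes map to values near zero of either sign and the non-negative input spreads over both halves of the signed range.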

The square root and logarithm operations may also be replaced by polynomial approximations (such as a Taylor expansion) to reduce the amount of computation.

[Modification 1 of Embodiment 2]
The environmental sound analyzer of Modification 1, in which the parameter conversion unit 23 of Embodiment 2 is changed, is described below with reference to FIGS. 7 and 8. FIG. 7 is a block diagram showing the configuration of the environmental sound analyzer 2' of this modification. FIG. 8 is a flowchart showing the operation of the environmental sound analyzer 2' of this modification. As shown in FIG. 7, the environmental sound analyzer 2' of this modification includes a sound collection unit 11, a volume calculation unit 12, a parameter conversion unit 23', and a data transmission unit 14. The sound collection unit 11, the volume calculation unit 12, and the data transmission unit 14 are the same as the components with the same numbers in the environmental sound analyzer 2 of Embodiment 2, so their description is omitted where appropriate.

<Parameter conversion unit 23'>
Instead of the mapping operation shown in FIG. 6, the parameter conversion unit 23' holds in advance a mapping table 23A that maps the values 0 to 32768 that F_j can take directly to 8-bit symbols, and obtains the parameter P_j by referring to the mapping table 23A (S23'). Alternatively, the parameter conversion unit 23' may first reduce the number of distinct values of F_j (0 to 32768), for example by a bit shift, and then obtain the parameter P_j using the mapping table 23A. In this case the size of the mapping table 23A can be reduced. F_j may also be converted to decibels before use.
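The table-based variant with a preliminary bit shift might look like the sketch below. The contents of mapping table 23A are not given in the text, so the table built here (a simple clamp to 8 bits), the shift amount `SHIFT`, and the function names are all hypothetical; the point is only that shifting first shrinks the table from 32769 entries to 257.

```python
SHIFT = 7  # hypothetical shift: reduces 0..32768 to 0..256 before the lookup


def build_mapping_table(shift=SHIFT):
    """Build a stand-in for mapping table 23A.

    Hypothetical contents: each reduced value is clamped to an 8-bit symbol.
    The real table contents are not specified in the source.
    """
    return [min(i, 255) for i in range((32768 >> shift) + 1)]


MAPPING_TABLE = build_mapping_table()


def f_to_symbol(f, table=MAPPING_TABLE, shift=SHIFT):
    """Look up the 8-bit symbol P_j for an integer F_j in 0..32768."""
    if not 0 <= f <= 32768:
        raise ValueError("F_j is expected to lie in 0..32768")
    return table[f >> shift]


if __name__ == "__main__":
    print(len(MAPPING_TABLE), f_to_symbol(0), f_to_symbol(32768))
```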

The environmental sound analyzers of Embodiment 2 and Modification 1 have the following effect. Because the volume value obtained from the collected applause signal is never negative, the square root of E_j takes only non-negative integer values; for example, when x_j(n) (n = 1, 2, ..., N) is signed 16-bit, the minimum is 0 and the maximum is 32768. If the parameter conversion unit applied ITU-T G.711 encoding to this value as it is, the coding efficiency would be poor. With the transformation of expression (1), when x_j(n) (n = 1, 2, ..., N) is signed 16-bit, the range that F_j can take becomes -16384 to 16384. By using the value F_j, converted in the parameter conversion unit so that its range extends from negative to positive integers, the coding efficiency can be improved and the amount of information in the parameter P_j can be reduced. In other words, the transmission delay can be reduced further.

[Operation Example 2 of Embodiment 2]
Embodiment 2 and Modification 1 of Embodiment 2 above took applause as an example of the environmental sound at the transmission source and described an operation example of the environmental sound analyzer 2 (2') that analyzes that applause. The invention is not limited to this, and environmental sounds other than applause may also be targeted. For example, cheering or shouting may be treated as the environmental sound, or the acoustic signal (including noise) obtained by removing the sound of the main content of the venue from the sound collected at the transmission source may be treated as the environmental sound.

Operation Example 2 of Embodiment 2 is the same as the operation example above, except that the applause and applause volume handled by the sound collection unit 11, the volume calculation unit 12, the parameter conversion unit 23 or 23', and the data transmission unit 14 of the environmental sound analyzer 2 (2') are replaced by the environmental sound and the volume of the environmental sound.

The environmental sound synthesizer of Embodiment 3 of the present invention is described below with reference to FIGS. 9 and 10. FIG. 9 is a block diagram showing the configuration of the environmental sound synthesizer 3 of this embodiment. FIG. 10 is a flowchart showing the operation of the environmental sound synthesizer 3 of this embodiment. As shown in FIG. 9, the environmental sound synthesizer 3 of this embodiment includes a data receiving unit 31, a sound source synthesis unit 32, a template storage unit 33, and a reproduction unit 34. The environmental sound synthesizer 3 generates environmental sound by acquiring, from the environmental sound analyzer 1 (2, 2'), an environmental volume parameter relating to the volume of the acoustic signal at the transmission source. The description below follows the operation examples detailed in Embodiments 1 and 2 and uses applause as the example of environmental sound.

＜データ受信部３１＞
データ受信部３１は、環境音分析装置から環境音量パラメタＰ_ｊを受信する（Ｓ３１）。 <Data receiving unit 31>
The data receiving unit 31 receives the environmental volume parameter P_j from the environmental sound analyzer (S31).

＜テンプレート記憶部３３＞
テンプレート記憶部３３には、拍手音の各音量バリエーションに対して複数の拍手音（１フレーム分）のテンプレートが記憶されている。つまり、テンプレート記憶部３３には、ｉをフレームのインデックスとした場合に、１フレーム分の拍手音を含む環境音のテンプレートＴ_ｉと当該テンプレートの環境音の音量に対応する情報Ｅ’_ｉとが対応付けて記憶されているものとする。なお、テンプレートの環境音の音量に対応する値は、各テンプレートＴ_ｉを入力として、上記実施例１または２の音量計算部１２及びパラメタ変換部１３（２３）と同じ方法により求めることができる。なお、実施例１または２のどの方法を用いるかは、環境音分析装置と環境音合成装置との間で統一しておくものとする。 <Template storage unit 33>
The template storage unit 33 stores a plurality of applause templates (one frame each) for each volume variation of the applause sound. That is, with i as the frame index, the template storage unit 33 stores each environmental sound template T_i, which contains one frame of applause, in association with information E'_i corresponding to the volume of that template's environmental sound. The value corresponding to the volume of a template's environmental sound can be obtained by feeding each template T_i to the same method as the volume calculation unit 12 and the parameter conversion unit 13 (23) of the first or second embodiment. Which of the methods of the first or second embodiment is used must be agreed between the environmental sound analyzer and the environmental sound synthesizer.

＜音源合成部３２＞
音源合成部３２は、入力された環境音量パラメタＰ_ｊと同じ音量大きさのテンプレートのうちいずれか１つをテンプレート記憶部３３からランダムに選択する。つまり、Ｐ_ｊ＝Ｅ’_ｉを満たすＥ’_ｉに対応づけられているテンプレートＴ_ｉのうち、いずれか１つをランダムに選択する。音源合成部３２は、選択したテンプレートを、必要に応じて前のフレームと補間をして、１フレーム分の音響信号を合成して環境音（この動作例では拍手音）を生成する（Ｓ３２）。例えば、２０ｍｓのフレームあたり環境音量パラメタに８ｂｉｔのバリエーションがあったとすると、４００ｂｉｔ／ｓｅｃで拍手音を伝送できる。 <Sound source synthesis unit 32>
The sound source synthesis unit 32 randomly selects, from the template storage unit 33, one of the templates whose volume equals the input environmental volume parameter P_j. That is, it randomly selects one of the templates T_i associated with an E'_i satisfying P_j = E'_i. The sound source synthesis unit 32 interpolates the selected template with the previous frame as necessary and synthesizes one frame of acoustic signal to generate the environmental sound (applause in this operation example) (S32). For example, if the environmental volume parameter takes 8 bits per 20 ms frame, applause can be transmitted at 400 bit/s (8 bits × 50 frames per second).
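The selection rule and the bit-rate figure above can be illustrated with a minimal sketch. The function names, the (template id, volume) pair layout, and the dictionary index are assumptions made for illustration only; the text does not specify how the template storage unit 33 is organized.

```python
import random

def build_index(templates):
    """Group template ids by their volume value E'_i.

    `templates` is a list of (template_id, volume) pairs (hypothetical layout).
    """
    index = {}
    for template_id, volume in templates:
        index.setdefault(volume, []).append(template_id)
    return index

def select_template(index, p_j, rng=random):
    """Pick one template at random among those with E'_i == P_j."""
    candidates = index.get(p_j)
    if not candidates:
        raise KeyError(f"no template stored for volume parameter {p_j}")
    return rng.choice(candidates)

def bit_rate(bits_per_frame, frame_ms):
    """Bits per second when one parameter is sent per frame."""
    return bits_per_frame * 1000 / frame_ms
```

Holding several templates per volume value is what lets the random choice avoid an audibly repeating pattern.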

＜再生部３４＞
再生部３４は、音源合成部３２が合成した拍手音を再生する（Ｓ３４）。 <Playback unit 34>
The reproducing unit 34 reproduces the clap sound synthesized by the sound source synthesizing unit 32 (S34).

このように、本実施例の環境音合成装置３によれば、テンプレート記憶部３３に拍手音の各音量バリエーションに対して複数のテンプレートを保持しておき、音源合成部３２が音量の条件を充たす複数のテンプレートから１つのテンプレートをランダムに選択するため、合成された拍手音が定常的なパターンとして聞こえないようにすることができる。 Thus, according to the environmental sound synthesizer 3 of the present embodiment, the template storage unit 33 holds a plurality of templates for each volume variation of the applause sound, and the sound source synthesizer 32 satisfies the volume condition. Since one template is randomly selected from a plurality of templates, the synthesized applause sound can be prevented from being heard as a steady pattern.

[実施例３の動作例２]
実施例３では、伝送元の環境音の例として拍手音を対象とし、伝送元の拍手音の音量に関するパラメタを取得して、伝送先で拍手音を生成する環境音合成装置３の動作例を説明したが、これに限らず拍手音以外の環境音を対象としても良い。例えば、声援や掛け声や、伝送元で収音される音の中から伝送元会場のメインコンテンツの音を除いた音響信号（雑音を含む）を環境音とし、伝送元の環境音量パラメタが入力され、伝送先で環境音を合成してもよい。 [Operation Example 2 of Example 3]
In the third embodiment, the operation of the environmental sound synthesizer 3 was described for the case where applause is the environmental sound at the transmission source: a parameter related to the volume of the applause at the transmission source is acquired, and applause is generated at the transmission destination. However, the present invention is not limited to this, and environmental sounds other than applause may be targeted. For example, cheering, shouting, or an acoustic signal (including noise) obtained by removing the main content sound at the transmission source venue from the sound collected at the transmission source may be used as the environmental sound; the environmental volume parameter of the transmission source is then input and the environmental sound is synthesized at the transmission destination.

実施例３の動作例２では、実施例３の環境音合成装置３のデータ受信部３１と、音源合成部３２と、テンプレート記憶部３３と、再生部３４において、拍手音が環境音に置き換わる点を除いては、上述の動作例と同じである。 Operation example 2 of the third embodiment is the same as the operation example described above, except that applause is replaced with the environmental sound in the data receiving unit 31, the sound source synthesis unit 32, the template storage unit 33, and the reproduction unit 34 of the environmental sound synthesizer 3.

以下、図１１、図１２、図１３を参照して本発明の実施例４の環境音合成装置について説明する。図１１は本実施例の環境音合成装置４の構成を示すブロック図である。図１２は本実施例の環境音合成装置４の動作を示すフローチャートである。図１３は本実施例の音源合成部４２の環境音素片テンプレート合成手順を例示する図である。図１１に示すように、本実施例の環境音合成装置４は、データ受信部３１と、音源合成部４２と、テンプレート記憶部４３と、再生部３４と、人数推定部４５と、テンプレート音量記憶部４６とを備える。データ受信部３１、再生部３４は実施例３の環境音合成装置３における同一番号の各構成部と同じであるから説明を省略する。 Hereinafter, an environmental sound synthesizer according to a fourth embodiment of the present invention will be described with reference to FIGS. 11, 12, and 13. FIG. 11 is a block diagram showing the configuration of the environmental sound synthesizer 4 of this embodiment. FIG. 12 is a flowchart showing the operation of the environmental sound synthesizer 4 of this embodiment. FIG. 13 is a diagram illustrating the procedure by which the sound source synthesis unit 42 of this embodiment synthesizes environmental sound segment templates. As shown in FIG. 11, the environmental sound synthesizer 4 of this embodiment includes a data receiving unit 31, a sound source synthesis unit 42, a template storage unit 43, a reproduction unit 34, a number-of-people estimation unit 45, and a template volume storage unit 46. Since the data receiving unit 31 and the reproduction unit 34 are the same as the components with the same numbers in the environmental sound synthesizer 3 of the third embodiment, their description is omitted.

＜テンプレート記憶部４３＞
テンプレート記憶部４３には、一人の人間による一拍分の拍手音（３００ｍｓ程度）のテンプレートの複数のバリエーションが記憶されている。本実施例では環境音の例として拍手音を扱うため、拍手音のテンプレートを環境音素片テンプレートのバリエーションのひとつとする。従って、以下では拍手音のテンプレートを環境音素片テンプレートともいう。例えば、異なる人の一拍分の拍手音をそれぞれ異なる環境音素片テンプレートとして記憶しておく。以下、単にテンプレートという場合には、所定フレーム長の複数人による拍手音（環境音）全体を収録したテンプレートを指すものとし、環境音素片テンプレートという場合には、一人の人間による一拍分の拍手音（環境音）のテンプレートを指すものとする。 <Template storage unit 43>
The template storage unit 43 stores a plurality of variations of a template of one beat of applause (about 300 ms) by one person. Since this embodiment treats applause as the example of environmental sound, the applause template is one variation of the environmental sound segment template. Accordingly, the applause template is also referred to below as an environmental sound segment template. For example, one beat of applause from each of several different people is stored as a separate environmental sound segment template. Hereinafter, the plain term "template" refers to a template that records, for a predetermined frame length, the whole applause (environmental sound) of a plurality of people, while "environmental sound segment template" refers to a template of one beat of applause (environmental sound) by one person.

＜テンプレート音量記憶部４６＞
テンプレート音量記憶部４６には、テンプレート記憶部４３に記憶されている環境音素片テンプレートの音量に対応する情報（具体的には、実施例１または２の音量計算部１２により計算される、平均エネルギー）が記憶されている。なお、１人分の拍手音の音量の差は小さいので、テンプレート記憶部４３に記憶されている環境音素片テンプレートのいずれか一つについて計算された平均エネルギーを環境音素片テンプレートの音量に対応する情報として記憶しておいてもよい。また、テンプレート記憶部４３に記憶されている全環境音素片テンプレートの平均エネルギーの平均値を、環境音素片テンプレートの音量に対応する情報としてテンプレート音量記憶部４６に記憶しておいてもよい。あるいは、予め定めた定数を音量に対応する情報としてテンプレート音量記憶部４６に記憶しておいても良い。 <Template Volume Storage Unit 46>
The template volume storage unit 46 stores information corresponding to the volume of the environmental sound segment templates stored in the template storage unit 43 (specifically, the average energy calculated by the volume calculation unit 12 of the first or second embodiment). Since the volume of one person's applause varies little, the average energy calculated for any one of the environmental sound segment templates stored in the template storage unit 43 may be stored as the information corresponding to the volume of the environmental sound segment templates. Alternatively, the mean of the average energies of all environmental sound segment templates stored in the template storage unit 43 may be stored in the template volume storage unit 46 as that information, or a predetermined constant may be stored instead.

なお、テンプレート音量記憶部４６に予め環境音素片テンプレートの音量に対応する情報を記憶せず、その都度テンプレート記憶部４３からランダムに選択した環境音素片テンプレートについて計算した平均エネルギーを環境音素片テンプレートの音量に対応する情報として用いても良い。 Alternatively, instead of storing the information corresponding to the volume of the environmental sound segment template in the template volume storage unit 46 in advance, the average energy calculated each time for an environmental sound segment template randomly selected from the template storage unit 43 may be used as the information corresponding to the volume of the environmental sound segment template.

＜人数推定部４５＞
人数推定部４５は、環境音量パラメタＰ_ｊに応じて音量のゲイン調整を行うための構成である。人数推定部４５は、伝送元から出力された環境音量パラメタＰ_ｊを取得し、当該環境音量パラメタＰ_ｊから音量に対応する情報Ｅ’_ｊを求める。具体的には、実施例１または２のパラメタ変換部１３（２３）と逆の処理を行うことにより、音量に対応する情報Ｅ’_ｊを得る。人数推定部４５は、音量に対応する情報Ｅ’_ｊを環境音素片テンプレートの音量に対応する情報で除算した値の整数値（小数点以下を四捨五入、または切り捨てた値）を拍手の人数Ｍとして出力する（Ｓ４５）。 <Number of people estimation unit 45>
The number-of-people estimation unit 45 is a component for adjusting the volume gain according to the environmental volume parameter P_j. It acquires the environmental volume parameter P_j output from the transmission source and obtains from it the information E'_j corresponding to the volume. Specifically, E'_j is obtained by performing the inverse of the processing of the parameter conversion unit 13 (23) of the first or second embodiment. The number-of-people estimation unit 45 divides E'_j by the information corresponding to the volume of the environmental sound segment template, converts the quotient to an integer (by rounding to the nearest integer or by truncating the fractional part), and outputs the result as the number of people M applauding (S45).
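The division step can be sketched as follows. This is only an illustration: it assumes E'_j has already been recovered from P_j by the inverse parameter conversion, and the function name, the `mode` argument, and the use of round-half-up for the rounding case are assumptions, not details given in the text.

```python
import math

def estimate_headcount(e_j, unit_energy, mode="round"):
    """Estimate M as E'_j divided by the per-person segment energy.

    e_j         -- volume information E'_j for the current frame
    unit_energy -- volume information of one environmental sound segment template
    mode        -- "round" (round half up) or "truncate" (drop the fraction)
    """
    if unit_energy <= 0:
        raise ValueError("unit_energy must be positive")
    ratio = e_j / unit_energy
    if mode == "round":
        return int(math.floor(ratio + 0.5))
    if mode == "truncate":
        return int(math.floor(ratio))
    raise ValueError(f"unknown mode: {mode}")
```

For example, a frame energy of 7.5 against a per-person energy of 1.0 gives M = 8 with rounding and M = 7 with truncation.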

＜音源合成部４２＞
音源合成部４２は、テンプレート記憶部４３から環境音素片テンプレートをランダムに選択して合成する（Ｓ４２）。環境音量パラメタにより一人分の拍手を合成する場合（人数推定部４５においてＭ＝１となった場合）は、図１３Ａのように、約３００ｍｓごとにランダムに選択された環境音素片テンプレートＴ_ｉを用いて合成した波形を拍手音として出力する。前述のように合成の時間間隔は約３００ｍｓでよいが、より好ましくは３００ｍｓを中心として時間間隔に揺らぎを持たせてもよい。時間間隔に揺らぎを持たせることによってさらに自然な拍手音を合成することができる。たとえば３００ｍｓを中心としてガウス分布にしたがう乱数により、±数１０ｍｓの揺らぎを持たせればよい。例えば音源合成部４２は <Sound source synthesis unit 42>
The sound source synthesis unit 42 randomly selects environmental sound segment templates from the template storage unit 43 and synthesizes them (S42). When the environmental volume parameter calls for one person's applause (M = 1 in the number-of-people estimation unit 45), a waveform synthesized from environmental sound segment templates T_i selected at random about every 300 ms is output as applause, as shown in FIG. 13A. The synthesis interval may be about 300 ms as stated, but it is preferable to let the interval fluctuate around 300 ms, which yields more natural applause. For example, a fluctuation of plus or minus several tens of milliseconds may be applied using random numbers that follow a Gaussian distribution centered on 300 ms. For example, the sound source synthesis unit 42

によりテンプレートを変換した拍手音Ｙ_ｉ（ｉ＝０，１，２，・・・）を出力する（Ｓ４２）。別の表現方法で書くと、時系列テンプレート信号Ｔ_ｉ＝（ｔ_ｉ［１］ｔ_ｉ［２］ … ｔ_ｉ［Ｐ］）と拍手タイミングを表すインパルスδ（ｉ・τ＋σ_ｉ）の畳み込み演算でＹ_ｉを出力とする。 The applause sound Y_i (i = 0, 1, 2, ...) obtained by converting the template in this way is output (S42). Expressed another way, Y_i is output as the convolution of the time-series template signal T_i = (t_i[1] t_i[2] ... t_i[P]) with the impulse δ(i·τ + σ_i) representing the applause timing.

ここで＊は畳み込み演算を表す。ここで、τ＝３００ｍｓであり、σ_ｉは−１０ｍｓ≦σ_ｉ≦＋１０ｍｓの範囲で生成した乱数である。環境音量パラメタによりＭ人分の拍手を合成する場合は、図１３Ｂのように、時間間隔を約３００／Ｍ（ｍｓ）ごとにランダムに選択された環境音素片テンプレートを用いて合成された波形を拍手音として出力する。人数Ｍの逆数を使って、時間間隔を約３００／Ｍ（ｍｓ）と設定することで、拍手の人数Ｍが増えるに従って時間間隔が小さくなるように設定することができる。この場合もガウス分布やラプラス分布に従う乱数によって、揺らぎを持たせることができる。例えば音源合成部４２は、 Here, * denotes convolution, τ = 300 ms, and σ_i is a random number generated in the range −10 ms ≤ σ_i ≤ +10 ms. When the environmental volume parameter calls for the applause of M people, a waveform synthesized from environmental sound segment templates selected at random about every 300/M ms is output as applause, as shown in FIG. 13B. Setting the interval to about 300/M ms, using the reciprocal of the number of people M, makes the interval shorter as M increases. In this case too, the interval can be made to fluctuate using random numbers that follow a Gaussian or Laplace distribution. For example, the sound source synthesis unit 42

によりテンプレートを変換した環境音Ｙ_ｉ（ｉ＝０，１，２，・・・）を出力する（Ｓ４２）。 The environmental sound Y _i (i = 0, 1, 2,...) Converted from the template is output (S42).
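The placement of segment templates described above can be sketched as a simple overlap-add. This is a minimal illustration, not the equations of the patent: the function names are hypothetical, segments are plain lists of samples, the jitter σ_i is drawn uniformly from ±10 ms (the text also mentions Gaussian and Laplace distributions), and onsets are clamped at zero.

```python
import random

def clap_onsets(duration_ms, m=1, tau_ms=300.0, jitter_ms=10.0, rng=None):
    """Onset times i * (tau / M) + sigma_i, in milliseconds."""
    rng = rng or random.Random()
    step = tau_ms / m          # interval shrinks as the head count M grows
    onsets = []
    i = 0
    while i * step < duration_ms:
        sigma = rng.uniform(-jitter_ms, jitter_ms)  # assumed uniform jitter
        onsets.append(max(0.0, i * step + sigma))
        i += 1
    return onsets

def synthesize(segments, onsets_ms, sample_rate, duration_ms, rng=None):
    """Add a randomly chosen segment template at each onset."""
    rng = rng or random.Random()
    out = [0.0] * int(sample_rate * duration_ms / 1000)
    for onset in onsets_ms:
        segment = rng.choice(segments)
        start = int(round(onset * sample_rate / 1000))
        for k, sample in enumerate(segment):
            if start + k >= len(out):
                break              # drop samples past the end of the buffer
            out[start + k] += sample
    return out
```

With M = 3 the nominal spacing becomes 100 ms, so three times as many segments fall in the same duration as with M = 1.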

このように、本実施例の環境音合成装置４によれば、実施例３のように音量ごとにテンプレートを用意しておく必要がなく、テンプレート記憶部４３に記憶しておく環境音素片テンプレートの数も少なくてよいため、環境音合成装置４のメモリ量を削減することができる。 Thus, according to the environmental sound synthesizer 4 of the present embodiment, it is not necessary to prepare a template for each volume as in the third embodiment, and the environmental sound segment template stored in the template storage unit 43 is not necessary. Since the number may be small, the memory amount of the environmental sound synthesizer 4 can be reduced.

[実施例４の動作例２]
実施例４は、伝送元の伝送元の環境音の例として拍手音を対象とし、伝送元の拍手音の音量に関するパラメタを取得して、伝送先で拍手音を生成する環境音合成装置４を説明したが、これに限らず拍手音以外の環境音を対象としても良い。上述では、一人の人間による一拍分の拍手音（３００ｍｓ程度）のテンプレートを環境音素片テンプレートの例として示したが、これに限らず、たとえば、一人の人間による一拍分の声援、掛け声のテンプレートを環境音素片テンプレートとしてもよい。 [Operation Example 2 of Example 4]
In the fourth embodiment, the environmental sound synthesizer 4 was described for the case where applause is the environmental sound at the transmission source: a parameter related to the volume of the applause at the transmission source is acquired, and applause is generated at the transmission destination. However, the present invention is not limited to this, and environmental sounds other than applause may be targeted. Above, a template of one beat of applause (about 300 ms) by one person was given as an example of the environmental sound segment template, but this is not a limitation; for example, a template of one beat of cheering or shouting by one person may be used as the environmental sound segment template.

実施例４の動作例２では、実施例４の環境音合成装置４のデータ受信部３１と、音源合成部４２と、テンプレート記憶部４３と、再生部３４と、人数推定部４５と、テンプレート音量記憶部４６において取り扱われるデータが拍手音から環境音に置き換わる点を除いては、上述の動作例と同じである。 In the operation example 2 of the fourth embodiment, the data reception unit 31, the sound source synthesis unit 42, the template storage unit 43, the playback unit 34, the number of people estimation unit 45, and the template volume of the environmental sound synthesizer 4 of the fourth embodiment. The operation example is the same as that described above except that the data handled in the storage unit 46 is replaced with the ambient sound from the applause sound.

なお、音源合成部４２において、式（３）の代わりに、時系列テンプレート信号Ｔ_ｉ＝（ｔ_ｉ［１］ｔ_ｉ［２］ … ｔ_ｉ［Ｐ］）と環境音タイミングを表すインパルスδ（ｍ・τ＋σ_ｍ）の畳み込み演算でＹ_ｉを出力としても良い。 In the sound source synthesis unit 42, instead of equation (3), Y_i may be output as the convolution of the time-series template signal T_i = (t_i[1] t_i[2] ... t_i[P]) with the impulse δ(m·τ + σ_m) representing the environmental sound timing.

ここで＊は畳み込み演算を表す。 Here, * represents a convolution operation.
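Read in discrete time, convolving a template with a shifted unit impulse simply delays the template. The identity below is a sketch under assumed notation: the sample index n, the summation index k, and the delay d (τ and σ expressed in samples) do not appear in the original text, and the expression is not a reproduction of the patent's own equations.

```latex
\[
  (T_i * \delta_d)[n] \;=\; \sum_{k=1}^{P} t_i[k]\,\delta[n - k - d] \;=\; t_i[n - d],
  \qquad d = i\tau + \sigma_i \ \text{or}\ d = m\tau + \sigma_m ,
\]
```

where the result is taken as 0 whenever n − d falls outside 1, ..., P.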

また、テンプレート記憶部４３に記憶しておく環境音素片テンプレートの波形のエネルギーをあらかじめ正規化してあってもよい。その場合は、人数推定部４５のパラメタに応じで、音量（ゲイン）を調整すればよい。この場合もメモリ量を少なくしながらバリエーションを増やすことができる。 Further, the energy of the waveform of the environmental sound segment template stored in the template storage unit 43 may be normalized in advance. In that case, the volume (gain) may be adjusted according to the parameter of the number of persons estimation unit 45. In this case as well, variations can be increased while reducing the amount of memory.

以下、図１４、図１５、図１６を参照して本発明の実施例５及びその変形例の環境音分析装置について説明する。図１４は本実施例及びその変形例の環境音分析装置５の構成を示すブロック図である。図１５は本実施例及びその変形例の環境音分析装置５の動作を示すフローチャートである。図１６は拍手音や手拍子音の周波数成分の時間変化を例示する図である。図１６Ａは、拍手音が鳴りやんで無音状態に移行した場合の周波数成分の時間変化の例を示す図である。図１６Ｂは、手拍子が行われている場合の周波数成分の時間変化の例を示す図である。図１６Ｃは、拍手が行われている場合の周波数成分の時間変化の例を示す図である。図１４に示すように、本実施例の環境音分析装置５は、収音部１１と、音量計算部１２と、パラメタ変換部１３と、データ送受信部５４と、拍手区間検出部５５と、周期性判定部５６とを備える。本実施例の変形例としてパラメタ変換部１３をパラメタ変換部２３、２３’に適宜変更可能である。また、収音部１１、音量計算部１２、パラメタ変換部１３、２３、２３’は実施例１、実施例２、変形例１の環境音分析装置１、２，２’における同一番号の各構成部と同じであるから適宜説明を略する。 Hereinafter, an environmental sound analyzer according to a fifth embodiment of the present invention and its modifications will be described with reference to FIGS. 14, 15, and 16. FIG. 14 is a block diagram showing the configuration of the environmental sound analyzer 5 of this embodiment and its modifications. FIG. 15 is a flowchart showing the operation of the environmental sound analyzer 5 of this embodiment and its modifications. FIG. 16 illustrates how the frequency components of applause and rhythmic hand clapping change over time. FIG. 16A shows an example in which applause stops and gives way to silence. FIG. 16B shows an example during rhythmic hand clapping. FIG. 16C shows an example during applause. As shown in FIG. 14, the environmental sound analyzer 5 of this embodiment includes a sound collection unit 11, a volume calculation unit 12, a parameter conversion unit 13, a data transmission/reception unit 54, an applause section detection unit 55, and a periodicity determination unit 56. As a modification of this embodiment, the parameter conversion unit 13 may be replaced with the parameter conversion unit 23 or 23' as appropriate.
The sound collection unit 11, the volume calculation unit 12, and the parameter conversion units 13, 23, and 23' are the same as the components with the same numbers in the environmental sound analyzers 1, 2, and 2' of the first embodiment, the second embodiment, and the first modification, so their description is omitted as appropriate.

＜拍手区間検出部５５＞
拍手区間検出部５５は、音響信号が拍手音であるか否かを判別する機能を有する。具体的には、拍手区間検出部５５は、ＶＡＤ（Ｖｏｉｃｅ＿Ａｃｔｉｖｉｔｙ＿Ｄｅｔｅｃｔｉｏｎ）やＳＡＤ（Ｓｏｕｎｄ＿Ａｃｔｉｖｉｔｙ＿Ｄｅｔｅｃｔｉｏｎ）を用いて入力された音響信号をフレーム毎に分析し、「無音区間」「音声（音楽）区間」「その他」のいずれに該当するかを判別し、音響信号が無音区間または音声区間と判別された場合には、環境音量パラメタＰ_ｊ＝０と設定する。例えば拍手区間検出部５５が、ＩＴＵ−Ｔ＿Ｇ．７２０．１を用いてＶＡＤ・ＧＳＡＤ分析する場合、拍手区間検出部５５は判別結果として０：無音、１：ノイズ、２：音楽、３：音声の何れかのフラグを生成する。この場合、拍手区間検出部５５は、判別結果が無音区間（フラグ０）、音声区間（フラグ２または３）である場合（つまり、拍手音でない場合）には、環境音量パラメタＰ_ｊ＝０と設定する。一方、拍手区間検出部５５は、判別結果がノイズ（フラグ１）である場合には、これを拍手音であるものとして、フラグ＝１と設定する。 <Applause section detector 55>
The applause section detection unit 55 has the function of determining whether the acoustic signal is applause. Specifically, it analyzes the input acoustic signal frame by frame using VAD (Voice Activity Detection) or SAD (Sound Activity Detection), determines whether each frame is a "silent section", a "speech (music) section", or "other", and sets the environmental volume parameter P_j = 0 when the frame is determined to be a silent section or a speech section. For example, when the applause section detection unit 55 performs VAD/GSAD analysis using ITU-T G.720.1, it generates one of the following flags as the determination result: 0 (silence), 1 (noise), 2 (music), or 3 (speech). In this case, when the result is a silent section (flag 0) or a speech section (flag 2 or 3), that is, when the frame is not applause, the applause section detection unit 55 sets P_j = 0. When the result is noise (flag 1), it treats the frame as applause and sets flag = 1.

環境音量パラメタＰ_ｊ＝０の場合（Ｓ５Ｙ）、拍手区間検出部５５はデータ送信部５４に環境音量パラメタＰ_ｊ＝０、および周期情報Ｔ＝０、または周期なしを示す周期情報を出力し、ステップＳ５４に移行する。一方、環境音量パラメタＰ_ｊが０でない（フラグ＝１）場合（Ｓ５Ｎ）、フラグ＝１を出力してステップＳ５６に移行する。 When the environmental volume parameter P _j = 0 (S5Y), the applause interval detection unit 55 outputs the environmental volume parameter P _j = 0 and the period information T = 0 or the period information indicating no period to the data transmission unit 54, Control goes to step S54. On the other hand, when the environmental sound volume parameter P _j is not 0 (flag = 1) (S5N), the flag = 1 is output and the process proceeds to step S56.
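The branching on the detector flag can be sketched as below. The flag values are those listed in the text; the function name and the dictionary used for the output are hypothetical, and no real G.720.1 API is called here.

```python
# Flag values as listed in the text for the G.720.1-based analysis.
SILENCE, NOISE, MUSIC, SPEECH = 0, 1, 2, 3

def gate_frame(flag):
    """Return fixed parameters for non-applause frames, or None to continue.

    None means "treat the frame as applause and go on to the
    periodicity determination (S56)".
    """
    if flag == NOISE:
        return None
    if flag in (SILENCE, MUSIC, SPEECH):
        # P_j = 0 and "no period" are passed straight to transmission (S54).
        return {"P": 0, "T": 0}
    raise ValueError(f"unknown flag: {flag}")
```

Only frames flagged as noise reach the volume calculation and the periodicity check; all others are transmitted with P_j = 0.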

＜周期性判定部５６＞
周期性判定部５６は、環境音量パラメタＰ_ｊが０でない場合（Ｇ．７２０．１を用いてＶＡＤ・ＧＳＡＤ分析する場合、フラグ１：ノイズと判別された場合）当該フレームについて、１秒から数秒程度（例えば３００フレーム：３秒）の窓幅で分析を行う。分析したフレームが図１６Ｂに示すように一定の周期性がある場合には、周期性判定部５６は、当該フレームの音響信号を手拍子の音響信号と判断し、周期が検出された場合には、自己相関関数のピーク間隔を周期情報として出力する（Ｓ５６）。演算量削減のために、Ｇ．７２０．１の内部変数であるＲＭＳ値（フレームのパワーの平方根）を用いて、３００フレーム分の自己相関関数を求め、周期Ｔを求めてもよい（フレームの自己相関ではなく信号の自己相関を用いても良い）。 <Periodicity determination unit 56>
When the environmental volume parameter P_j is not 0 (that is, when the frame is classified as flag 1: noise in a VAD/GSAD analysis using G.720.1), the periodicity determination unit 56 analyzes the frame using a window of about one to several seconds (for example, 300 frames: 3 seconds). When the analyzed frames show a regular periodicity as in FIG. 16B, the periodicity determination unit 56 judges the acoustic signal to be rhythmic hand clapping, and when a period is detected it outputs the peak interval of the autocorrelation function as the period information (S56). To reduce computation, the autocorrelation function over 300 frames may be computed from the RMS value (the square root of the frame power), which is an internal variable of G.720.1, and the period T obtained from it (the autocorrelation of the signal itself may be used instead of the autocorrelation of the frame values).

周期情報の検出には、例えば次のような方法を用いる。Ｇ．７２０．１の内部変数である１０ｍｓ毎のＲＭＳ値（フレームのパワーの平方根）の自己相関関数をＲ（ｉ）（ｉ＝０，１，２，…，３００）とする。この例では３秒の分析フレームで分析したことになる。この例では、１００ｍｓ以上の周期を持つ場合には手拍子、そうでない場合には拍手と判定されることになる。手拍子の間隔が１００ｍｓ以上であるとすると、Ｒ（ｉ）のｉの値を大きくしていき、相関の値Ｒ（ｉ）が増加から減少に転じたときのｉをＴ１とする。さらにｉの値を増加させていき、相関の値Ｒ（ｉ）が増加から減少に転じたときのｉをＴ２とする。Ｔ２−Ｔ１の値が閾値の範囲内（例えば１１〜２６５）であれば、Ｔ２−Ｔ１を周期としてＴ＝Ｔ２−Ｔ１を出力する。なお、ピークの検出は、心電図のＲ−Ｒ間隔の検出など様々な方法があるので、既存の技術を適宜用いれば良い。 The period information is detected, for example, as follows. Let R(i) (i = 0, 1, 2, ..., 300) be the autocorrelation function of the RMS value (the square root of the frame power) computed every 10 ms, which is an internal variable of G.720.1. This corresponds to analysis over a 3-second analysis frame. In this example, a signal with a period of 100 ms or more is judged to be rhythmic hand clapping, and any other signal is judged to be applause. Assuming the hand-clapping interval is 100 ms or more, i is increased, and the value of i at which R(i) first turns from increasing to decreasing is taken as T1. Then i is increased further, and the value of i at which R(i) next turns from increasing to decreasing is taken as T2. If T2 − T1 lies within the threshold range (for example, 11 to 265), T = T2 − T1 is output as the period. Peaks can be detected in various ways, such as the methods used to detect R-R intervals in electrocardiograms, so existing techniques may be used as appropriate.
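The T1/T2 search described above can be sketched as follows. This is an illustration under assumptions: the input is a plain list of per-frame RMS values (one value per 10 ms), the autocorrelation is the unnormalized sum of products, and a "turn from increasing to decreasing" is taken to be a local maximum; the function names are hypothetical.

```python
def autocorrelation(x, max_lag):
    """Unnormalized autocorrelation R(0..max_lag) of a sequence of RMS values."""
    n = len(x)
    return [sum(x[k] * x[k + lag] for k in range(n - lag))
            for lag in range(max_lag + 1)]

def detect_period(r, lo=11, hi=265):
    """Return T = T2 - T1 (in frames) from the first two peaks of R, or 0."""
    peaks = []
    for i in range(1, len(r) - 1):
        # R(i) stops increasing here: treat i as a peak position.
        if r[i - 1] < r[i] and r[i] >= r[i + 1]:
            peaks.append(i)
            if len(peaks) == 2:
                break
    if len(peaks) < 2:
        return 0                      # no period found
    period = peaks[1] - peaks[0]
    return period if lo <= period <= hi else 0
```

For an RMS sequence with a pulse every 50 frames, the first two peaks fall at lags 50 and 100, giving T = 50 frames, which is 500 ms at 10 ms per frame. A constant sequence yields a monotonically decreasing R and therefore T = 0.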

周期が検出されない場合には、周期性判定部５６は、Ｔ＝０または周期なしを示す周期情報を出力する。周期を表すインデックスが上記のように例えば１１〜２６５であれば２５５通りの周期を表すことができ、周期が無いというインデックスを加えて２５６通りの条件を表す８ビットを伝送すればよい。 When the period is not detected, the periodicity determination unit 56 outputs period information indicating T = 0 or no period. For example, if the index representing the period is 11 to 265 as described above, 255 periods can be represented, and an index indicating that there is no period may be added to transmit 8 bits representing 256 conditions.
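The count of 256 conditions can be checked with a small sketch. The specific code assignment (0 for "no period", 1 to 255 for periods 11 to 265) is an assumption for illustration; the text only states that 255 periods plus one "no period" index fit in 8 bits.

```python
def encode_period(t):
    """Map period T (0 = none, else 11..265 frames) to an 8-bit code."""
    if t == 0:
        return 0
    if not 11 <= t <= 265:
        raise ValueError("period outside the transmittable range")
    return t - 10          # 11 -> 1, ..., 265 -> 255

def decode_period(code):
    """Inverse of encode_period."""
    if not 0 <= code <= 255:
        raise ValueError("code must fit in 8 bits")
    return 0 if code == 0 else code + 10
```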

音量計算部１２とパラメタ変換部１３（２３、２３’）は、環境音量パラメタＰ_ｊが０でない（Ｇ．７２０．１を用いてＶＡＤ・ＧＳＡＤ分析する場合、フラグ１：ノイズと判別された）フレームについて、実施例１または実施例２、あるいは変形例２の音量計算部１２とパラメタ変換部１３（２３、２３’）と同様の処理を行うことにより、環境音量パラメタＰ_ｊを計算し出力する（Ｓ１２、Ｓ１３、Ｓ２３、Ｓ２３’）。なお、音量計算部１２が出力するＥ_ｊとして、Ｇ．７２０．１の内部変数であるＲＭＳ値（フレームのパワーの平方根）を出力してもよい。 For frames whose environmental volume parameter P_j is not 0 (frames classified as flag 1: noise in a VAD/GSAD analysis using G.720.1), the volume calculation unit 12 and the parameter conversion unit 13 (23, 23') calculate and output the environmental volume parameter P_j by performing the same processing as the volume calculation unit 12 and the parameter conversion unit 13 (23, 23') of the first embodiment, the second embodiment, or modification 2 (S12, S13, S23, S23'). The RMS value (the square root of the frame power), which is an internal variable of G.720.1, may be output as the E_j output by the volume calculation unit 12.

＜データ送信部５４＞
データ送信部５４は、環境音量パラメタＰ_ｊとともに周期情報を後述する環境音合成装置６に送信する（Ｓ５４）。 <Data transmission unit 54>
The data transmission unit 54 transmits the period information together with the environmental sound volume parameter P _j to the environmental sound synthesizer 6 described later (S54).

以下、図１７、図１８を参照して本発明の実施例６の環境音合成装置について説明する。実施例６の環境音合成装置６は、実施例５の環境音分析装置５に対応する装置である。図１７は本実施例の環境音合成装置６の構成を示すブロック図である。図１８は本実施例の環境音合成装置６の動作を示すフローチャートである。図１７に示すように、本実施例の環境音合成装置６は、データ受信部６１と、音源合成部６２と、テンプレート記憶部４３と、再生部３４と、人数推定部４５と、テンプレート音量記憶部４６とを備える。データ受信部６１と音源合成部６２以外の各構成部は実施例３、または実施例４の環境音合成装置３、４における同一番号の各構成部と同じであるから説明を省略する。 Hereinafter, an environmental sound synthesizer according to a sixth embodiment of the present invention will be described with reference to FIGS. 17 and 18. The environmental sound synthesizer 6 of the sixth embodiment is the counterpart of the environmental sound analyzer 5 of the fifth embodiment. FIG. 17 is a block diagram showing the configuration of the environmental sound synthesizer 6 of this embodiment. FIG. 18 is a flowchart showing the operation of the environmental sound synthesizer 6 of this embodiment. As shown in FIG. 17, the environmental sound synthesizer 6 of this embodiment includes a data receiving unit 61, a sound source synthesis unit 62, a template storage unit 43, a reproduction unit 34, a number-of-people estimation unit 45, and a template volume storage unit 46. The components other than the data receiving unit 61 and the sound source synthesis unit 62 are the same as the components with the same numbers in the environmental sound synthesizers 3 and 4 of the third or fourth embodiment, so their description is omitted.

＜データ受信部６１＞
データ受信部６１は、環境音量パラメタＰ_ｊとともに周期情報を環境音分析装置５から受信する（Ｓ６１）。 <Data receiving unit 61>
The data receiving unit 61 receives the period information together with the environmental sound volume parameter P _j from the environmental sound analyzer 5 (S61).

＜音源合成部６２＞
音源合成部６２は、あるフレームの音響信号の環境音量パラメタＰ_ｊが０でなく（Ｓ６ＡＮ）、周期情報がＴ＝０、または周期が存在しないことを示している場合に（Ｓ６ＢＮ）、当該フレームの音響信号が拍手音であると判定して、実施例４の環境音合成装置４の音源合成部４２と同じ処理を行い、拍手音を出力する（Ｓ６２Ａ）。一方、音源合成部６２は、あるフレームの音響信号の環境音量パラメタＰ_ｊが０でなく（Ｓ６ＡＮ）、周期情報がＴ≠０である（周期情報ありを示す）場合に（Ｓ６ＢＹ）、当該フレームの音響信号が手拍子であると判定して、その周期（たとえば５００ｍｓ）が中心となるような実施例４の時よりも分散の小さいガウス分布やラプラス分布に従う揺らぎを持たせた波形を合成し、拍手音（実際は手拍子音）として出力する（Ｓ６２Ｂ）。音源合成部６２は、例えば上記式（３）または（４）において、τの値を周期情報（例えばτ＝５００ｍｓ）としてテンプレートを変換した拍手音Ｙ_ｉ（ｉ＝０，１，２，・・・）を出力する（Ｓ６２Ｂ）。この例では、手拍子音におけるτ＝５００（ｍｓ）として前述の拍手音の合成の際に設定されたτ＝３００（ｍｓ）より長い時間間隔に設定しているため、拍手音と判定された場合と比較して手拍子音と判定された場合の時間間隔が長くなるように環境音素片テンプレートが配置され合成される。 <Sound source synthesis unit 62>
When the environmental volume parameter P_j of the acoustic signal of a frame is not 0 (S6AN) and the period information is T = 0 or indicates that no period exists (S6BN), the sound source synthesis unit 62 determines that the acoustic signal of the frame is applause, performs the same processing as the sound source synthesis unit 42 of the environmental sound synthesizer 4 of the fourth embodiment, and outputs applause (S62A). On the other hand, when P_j of the frame is not 0 (S6AN) and the period information is T ≠ 0 (indicating that a period exists) (S6BY), the sound source synthesis unit 62 determines that the acoustic signal of the frame is rhythmic hand clapping, synthesizes a waveform centered on that period (for example, 500 ms) with fluctuations that follow a Gaussian or Laplace distribution of smaller variance than in the fourth embodiment, and outputs it as applause (in practice, hand-clapping sound) (S62B). For example, the sound source synthesis unit 62 outputs the applause sound Y_i (i = 0, 1, 2, ...) obtained by converting the template with equation (3) or (4), using the period information as the value of τ (for example, τ = 500 ms) (S62B). In this example, τ = 500 ms for hand clapping is longer than the τ = 300 ms set for applause synthesis, so the environmental sound segment templates are placed and synthesized at longer intervals when the signal is judged to be hand clapping than when it is judged to be applause.
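The two branches can be summarized in a small decision sketch. The function name, the returned dictionary, and the concrete jitter values are hypothetical; the text only requires that the hand-clapping jitter have smaller variance than the applause jitter. Returning None for P_j = 0 is also an assumption, since the text does not spell out that case for the synthesizer.

```python
def synthesis_plan(p_j, period_ms,
                   applause_tau_ms=300.0,
                   applause_jitter_ms=10.0,
                   clap_jitter_ms=3.0):
    """Choose the interval tau and jitter width for the current frame."""
    if p_j == 0:
        return None  # assumed: nothing to synthesize for this frame
    if period_ms == 0:
        # No period: applause, synthesized as in the fourth embodiment (S62A).
        return {"kind": "applause",
                "tau_ms": applause_tau_ms,
                "jitter_ms": applause_jitter_ms}
    # Period present: rhythmic hand clapping with tau set to the period (S62B).
    return {"kind": "hand_clapping",
            "tau_ms": float(period_ms),
            "jitter_ms": clap_jitter_ms}
```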

このように、実施例５の環境音分析装置５と本実施例の環境音合成装置６により構成される環境音伝送システム５０００によれば、周期性判定部５６がフレームを分析して周期情報を生成し、音源合成部６２において、周期情報の有無を考慮して拍手、手拍子の何れかの音響信号を生成するため、伝送元において収音された拍手や手拍子音を効率よく伝送する効果に加えて、拍手、手拍子の双方をより正確に合成することができる。 As described above, according to the environmental sound transmission system 5000 including the environmental sound analysis device 5 according to the fifth embodiment and the environmental sound synthesis device 6 according to the present embodiment, the periodicity determination unit 56 analyzes the frame to obtain the periodic information. In addition to the effect of efficiently transmitting applause and hand clapping sounds collected at the transmission source, the sound source synthesizer 62 generates an acoustic signal of either applause or hand clapping in consideration of the presence or absence of periodic information. Thus, both applause and clapping can be synthesized more accurately.

以下、図１９、図２０を参照して実施例４の変形実施例である本発明の実施例７の環境音合成装置について説明する。図１９は本実施例の環境音合成装置７の構成を示すブロック図である。図２０は本実施例の環境音合成装置７の動作を示すフローチャートである。図１９に示すように、本実施例の環境音合成装置７は、データ受信部３１と、音源合成部７２と、テンプレート記憶部７３と、再生部３４と、人数推定部４５と、テンプレート音量記憶部４６とを備える。実施例４との違いは、実施例４における音源合成部４２と、テンプレート記憶部４３が、本実施例においてそれぞれ音源合成部７２と、テンプレート記憶部７３に変更されている点のみである。よって、音源合成部７２、テンプレート記憶部７３以外の各構成については説明を省略する。 Hereinafter, an environmental sound synthesizer according to a seventh embodiment of the present invention, which is a variation of the fourth embodiment, will be described with reference to FIGS. 19 and 20. FIG. 19 is a block diagram showing the configuration of the environmental sound synthesizer 7 of this embodiment. FIG. 20 is a flowchart showing the operation of the environmental sound synthesizer 7 of this embodiment. As shown in FIG. 19, the environmental sound synthesizer 7 of this embodiment includes a data receiving unit 31, a sound source synthesis unit 72, a template storage unit 73, a reproduction unit 34, a number-of-people estimation unit 45, and a template volume storage unit 46. The only difference from the fourth embodiment is that the sound source synthesis unit 42 and the template storage unit 43 of the fourth embodiment are replaced in this embodiment by the sound source synthesis unit 72 and the template storage unit 73, respectively. Description of the components other than the sound source synthesis unit 72 and the template storage unit 73 is therefore omitted.

実施例４では音源合成部４２においてテンプレート記憶部４３から環境音素片テンプレートをランダムに１つ選択して、伝送先での環境音を合成していたが、本実施例では、音源合成部４２に代わって、音源合成部７２が、テンプレート記憶部７３から環境音素片テンプレートを出力確率に応じて１以上選択して合成する（Ｓ７２）。 In the fourth embodiment, the sound source synthesis unit 42 randomly selected one environmental sound segment template from the template storage unit 43 to synthesize the environmental sound at the transmission destination. In this embodiment, the sound source synthesis unit 72 instead selects one or more environmental sound segment templates from the template storage unit 73 according to their output probabilities and synthesizes them (S72).

In particular, cheers and shouts vary more from person to person than applause does, and their content comes in several variations. For example, when there are multiple performers, cheers and shouts may include a performer's personal name or nickname, so a different cheer or shout pattern exists for each performer.

However, the sound patterns of cheers and shouts (performing them is also referred to as "hitting a mix") are often fixed by the content, so the wording of cheers and shouts does not vary much. They are also often uttered at fixed points in a given song, so the time differences in acoustic power are small. Therefore, by preparing several kinds of fixed cheer and shout phrases as environmental sound segment templates and reproducing them not as a single template but as a mixed sound with fluctuation according to the volume, the synthesizer can produce a sound resembling a different environmental sound depending on the timing of utterance, rather than the same cheer or shout every time.

The following describes an operation example of the environmental sound synthesizer 7, which synthesizes environmental sound at a destination venue different from the transmission source, taking as an example the case where content featuring multiple performers is being performed (or played back) at the transmission source.

As described above, the environmental sound synthesizer 7 of this embodiment has almost the same configuration as the environmental sound synthesizer 4 of Embodiment 4. It differs in that the template storage unit 73 stores multiple kinds of environmental sound segment templates in association with the output probability of each template. The multiple kinds of environmental sound segment templates are, for example, acoustic signals of cheers and shouts corresponding to the personal name or nickname of each performer in the content, and are prepared in advance.

The description continues below with reference to FIG. 21. FIG. 21 illustrates examples of storing environmental sound segment templates in the template storage unit in association with output probabilities.

In A of FIG. 21, the value Pi (i = 1, 2, ..., n) following each environmental sound segment template, for example P1 following "Alice", is the probability that the corresponding template is output, and P1 + P2 + ... + Pn = 1. Operation example 2 of Embodiment 4 corresponds to the case where this probability is common to all templates (P1 = P2 = ... = Pn = 1/n) and one environmental sound segment template is selected at random and synthesized. In this embodiment, the probability differs from template to template. A of FIG. 21 is an example in which one environmental sound segment template is stored per performer, but as in B of FIG. 21, multiple templates may be stored per performer, each associated with an output probability. The probability of each template may be set in advance. For example, audience members at the concert hall of the transmission source or destination may vote for, or register in advance, the performers they want to support; this yields a popularity ranking of the performers among the audience at that venue, and the output probability of each template is set so that a higher rank gives a higher output probability. Alternatively, the probabilities may be based on a popularity ranking obtained from a population not limited to the destination audience. A bias may also be applied to raise the output probability of the templates for a given performer in consideration of special events such as that performer's birthday or retirement.
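The text requires only that a higher popularity rank give a higher output probability and that the probabilities sum to 1; it does not fix a formula. As a minimal sketch under that assumption, the following normalizes vote counts into output probabilities, optionally applies a per-template bias (for example for a birthday), and draws templates by weighted random selection. The function names and the template labels other than "alice" are hypothetical.

```python
import random


def probabilities_from_votes(votes, bias=None):
    """Turn vote counts into output probabilities that sum to 1.

    `bias` optionally maps a template name to a multiplier (> 1 raises its probability).
    """
    bias = bias or {}
    weighted = {name: count * bias.get(name, 1.0) for name, count in votes.items()}
    total = sum(weighted.values())
    if total <= 0:
        # No votes at all: fall back to the uniform case P1 = ... = Pn = 1/n.
        return {name: 1.0 / len(votes) for name in votes}
    return {name: w / total for name, w in weighted.items()}


def select_templates(probabilities, count, rng=random):
    """Draw `count` template names (with replacement) according to their output probabilities."""
    names = list(probabilities)
    weights = [probabilities[name] for name in names]
    return rng.choices(names, weights=weights, k=count)


votes = {"alice": 60, "bob": 30, "carol": 10}  # hypothetical vote counts
probabilities = probabilities_from_votes(votes)
birthday = probabilities_from_votes(votes, bias={"carol": 10.0})
selected = select_templates(probabilities, count=3)
```

Drawing with replacement is a design choice of this sketch: it lets a popular template be picked more than once in the same frame, which a draw without replacement would not allow.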

These output probabilities may be changed in real time while the system is running. For example, the output probability values may be varied over time according to the number of real-time votes, such as listeners pressing a button.
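One way to picture real-time updating is a running score per template that rises with each button press and relaxes back toward a baseline over time. This is only a sketch: the class name, the `prior` baseline, and the `decay` factor are assumptions introduced here, since the text does not specify how votes are converted into probabilities.

```python
class LiveOutputProbabilities:
    """Tracks listener votes and exposes them as output probabilities."""

    def __init__(self, template_names, prior=1.0, decay=0.9):
        # `prior` keeps every template selectable even with zero votes (assumed).
        self.prior = prior
        self.decay = decay
        self.scores = {name: prior for name in template_names}

    def vote(self, name):
        """Register one button press for the given template."""
        self.scores[name] += 1.0

    def tick(self):
        """Call once per update period so that old votes fade toward the prior."""
        for name, score in self.scores.items():
            self.scores[name] = self.prior + (score - self.prior) * self.decay

    def probabilities(self):
        """Current output probabilities, normalized to sum to 1."""
        total = sum(self.scores.values())
        return {name: score / total for name, score in self.scores.items()}


live = LiveOutputProbabilities(["alice", "bob"], prior=1.0, decay=0.5)
for _ in range(3):
    live.vote("alice")
current = live.probabilities()
```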

The sound source synthesis unit 72 selects one or more templates according to the output probabilities stored in the template storage unit 73 and synthesizes the acoustic signal of the environmental sound according to equation (3') or equation (4). Suppose that the number of people M estimated by the number-of-people estimation unit 45 is two or more and that two or more templates are selected. Letting S_i (an integer of 2 or more) be the number of templates selected for frame i, the sound source synthesis unit 72 computes, for each frame i,

[equation omitted in the source text]

and outputs the resulting synthesized acoustic signal Y_i. Here, T_{i,k} is the k-th template selected for frame i, and

[equation omitted in the source text]

holds. M_k is the number of people corresponding to template T_{i,k}, and the value of each M_k is determined according to the output probability of template T_{i,k} so that

[equation omitted in the source text]

is satisfied.
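The equations for this step are not reproduced in this text, so the following is a sketch under two stated assumptions rather than the formula of the embodiment: that the per-template head counts M_k must add up to the estimated number of people M, and that each M_k is proportional to the output probability of its template. The function name and the largest-remainder rounding are illustrative choices.

```python
def allocate_people(total_people, selected_probabilities):
    """Split `total_people` into integer counts M_k proportional to the probabilities.

    The probabilities of the selected templates are renormalized first, because the
    selected subset need not sum to 1. Rounding uses the largest-remainder method so
    that the counts always add up to `total_people`.
    """
    norm = sum(selected_probabilities)
    if norm <= 0:
        raise ValueError("at least one selected template needs a positive probability")
    quotas = [total_people * p / norm for p in selected_probabilities]
    counts = [int(q) for q in quotas]  # floor of each quota
    remaining = total_people - sum(counts)
    # Hand the leftover people to the templates with the largest fractional parts.
    order = sorted(range(len(quotas)), key=lambda k: quotas[k] - counts[k], reverse=True)
    for k in order[:remaining]:
        counts[k] += 1
    return counts


m_k = allocate_people(10, [0.5, 0.3, 0.2])  # -> [5, 3, 2]
```

Each M_k could then drive how densely the corresponding template is arranged in time, in the same way the single head count does in Embodiment 4.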

With the above configuration, the environmental sound for each performer is synthesized at a volume that reflects the performer's popularity, so the synthesized environmental sound better conveys the excitement and atmosphere of the venue.

Next, the environmental sound synthesizer 8 of Embodiment 8 is described with reference to FIGS. 19, 20, and 21. As shown in FIG. 19, the environmental sound synthesizer 8 of this embodiment includes a data receiving unit 31, a sound source synthesis unit 82, a template storage unit 83, a playback unit 34, a number-of-people estimation unit 45, and a template volume storage unit 46. The only difference from Embodiment 7 is that the sound source synthesis unit 72 and the template storage unit 73 of Embodiment 7 are replaced by the sound source synthesis unit 82 and the template storage unit 83, respectively.

In Embodiment 7, a probability value was set for each template. In this embodiment, as in the template storage unit 83 shown in C of FIG. 21, the templates may instead be clustered by performer and an output probability set for each cluster. In this case, the sound source synthesis unit 82 selects one or more performer clusters according to the output probabilities stored in the template storage unit 83, and randomly selects one or more environmental sound segment templates from among the templates belonging to each selected cluster. When multiple environmental sound segment templates are selected within a cluster, the volume of each template is set so that the total volume of the acoustic signals output from those templates equals the volume assigned to the cluster.
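The two-stage selection described above can be sketched as follows. The cluster is drawn by output probability and the templates inside it uniformly at random, as the text describes; splitting the cluster volume equally among the chosen templates is an assumption of this sketch (the text requires only that the per-template volumes add up to the cluster volume). All names and template labels are hypothetical.

```python
import random


def select_from_clusters(clusters, cluster_probabilities, cluster_volume,
                         per_cluster=2, rng=random):
    """Pick one performer cluster by probability, then templates inside it at random.

    Returns the cluster name and a list of (template, volume) pairs whose volumes
    sum to `cluster_volume`.
    """
    names = list(clusters)
    weights = [cluster_probabilities[name] for name in names]
    chosen = rng.choices(names, weights=weights, k=1)[0]
    templates = clusters[chosen]
    picked = rng.sample(templates, min(per_cluster, len(templates)))
    share = cluster_volume / len(picked)  # equal split: an illustrative choice
    return chosen, [(template, share) for template in picked]


clusters = {
    "alice": ["alice_cheer_1", "alice_cheer_2", "alice_cheer_3"],
    "bob": ["bob_cheer_1"],
}
cluster_probabilities = {"alice": 0.7, "bob": 0.3}
chosen, mix = select_from_clusters(clusters, cluster_probabilities, cluster_volume=6.0)
```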

The environmental sound synthesizer 9 of Embodiment 9 is described below. The environmental sound synthesizer 9 of this embodiment outputs the sound synthesized by the environmental sound synthesizer of Embodiment 7 or 8 to a head-mounted display.

Here, the output probability of each template or cluster stored in the template storage unit 73 (83) is set in advance according to the preferences of the user wearing the head-mounted display. This makes it possible to present each individual with the environmental sound they prefer. By selecting environmental sound segment templates from the template storage unit 73 (83) according to probability values predetermined for each wearer and synthesizing them, a different environmental sound can be presented to each individual. The approach is not limited to head-mounted displays: when users use the system in acoustically independent locations, environmental sound segment templates can be selected from the template storage unit 73 (83) and synthesized with different probability values for each location.

The environmental sound synthesizer 10 of Embodiment 10 is described below. In Embodiments 7, 8, and 9 above, a different environmental sound segment template was associated with each performer; in the environmental sound synthesizer 10 of this embodiment, environmental sound segment templates common to all performers are defined. For example, the environmental sound segment template selected according to the probability values or at random may be a so-called mix (a sound pattern of cheers and shouts).

The various processes described above may be executed not only in time series in the order described but also in parallel or individually, according to the processing capability of the device executing them or as needed. It goes without saying that other modifications may be made as appropriate without departing from the spirit of the present invention.

When the above configuration is implemented by a computer, the processing of the functions that each device should have is described by a program. Executing this program on the computer implements the above processing functions on the computer.

The program describing this processing can be recorded on a computer-readable recording medium. The computer-readable recording medium may be of any kind, for example a magnetic recording device, an optical disc, a magneto-optical recording medium, or a semiconductor memory.

The program is distributed, for example, by selling, transferring, or lending a portable recording medium such as a DVD or CD-ROM on which the program is recorded. Alternatively, the program may be stored in a storage device of a server computer and distributed by transferring it from the server computer to other computers via a network.

A computer that executes such a program first, for example, temporarily stores in its own storage device the program recorded on the portable recording medium or transferred from the server computer. When executing the processing, the computer reads the program stored in its own recording medium and executes processing according to the read program. As another mode of execution, the computer may read the program directly from the portable recording medium and execute processing according to it, or may execute processing according to the received program each time a program is transferred to it from the server computer. The above processing may also be executed by a so-called ASP (Application Service Provider) type service, in which no program is transferred from the server computer to the computer and the processing functions are realized solely through execution instructions and result acquisition.

The program in this embodiment includes information that is provided for processing by an electronic computer and is equivalent to a program (such as data that is not a direct command to the computer but has the property of defining the computer's processing). Although in this embodiment the device is configured by executing a predetermined program on a computer, at least part of the processing may be implemented in hardware.

Claims

An environmental sound synthesizer that generates an environmental sound by acquiring an environmental volume parameter related to the volume of an acoustic signal of a transmission source from an environmental sound analyzer,
A data receiver for receiving the environmental volume parameter from the environmental sound analyzer;
A template storage unit that stores an environmental sound template for one frame (hereinafter referred to as a template) and information corresponding to the volume of the environmental sound of the template in association with each other;
A sound source synthesis unit that selects, from the template storage unit, a template having the same volume as the environmental volume parameter and synthesizes the selected template to generate environmental sound;
An environmental sound synthesizer characterized by comprising:

An environmental sound synthesizer that generates an environmental sound by acquiring an environmental volume parameter related to the volume of an acoustic signal of a transmission source from an environmental sound analyzer,
A data receiver for receiving the environmental volume parameter from the environmental sound analyzer;
A template storage unit that stores a template of the environmental sound of one beat by one person (hereinafter referred to as an environmental sound segment template);
A template volume storage unit that stores information corresponding to the volume of the environmental sound segment template stored in the template storage unit;
A number-of-people estimation unit that outputs, as the number of people, the value obtained by dividing the information corresponding to the volume of the environmental volume parameter by the information corresponding to the volume of the environmental sound segment template;
A sound source synthesis unit that selects the environmental sound segment template from the template storage unit and generates environmental sound by arranging and synthesizing the selected template at time intervals determined so as to become shorter as the number of people increases;
An environmental sound synthesizer characterized by comprising:

The environmental sound synthesizer according to claim 2,
The template storage unit stores a plurality of types of environmental sound segment templates in association with the output probability of each environmental sound segment template,
The sound source synthesis unit selects one or more environmental sound segment templates from the template storage unit according to the output probabilities and synthesizes them.

An environmental sound transmission system including an environmental sound analyzer and an environmental sound synthesizer,
The environmental sound analyzer includes:
A sound volume calculation unit that obtains an acoustic signal and calculates a value corresponding to the volume of the acoustic signal;
A parameter conversion unit that obtains a value corresponding to the volume of the acoustic signal, quantizes the value corresponding to the volume of the acoustic signal, and outputs the index as an environmental volume parameter;
A data transmission unit for transmitting the environmental volume parameter to the environmental sound synthesizer,
The environmental sound synthesizer includes:
A data receiver for receiving the environmental volume parameter from the environmental sound analyzer;
A template storage unit that stores an environmental sound template for one frame (hereinafter referred to as a template) and information corresponding to the volume of the environmental sound of the template in association with each other;
A sound source synthesis unit that selects, from the template storage unit, a template having the same volume as the environmental volume parameter and synthesizes the selected template to generate environmental sound;
An environmental sound transmission system comprising:

An environmental sound transmission system including an environmental sound analyzer and an environmental sound synthesizer,
The environmental sound analyzer includes:
A sound volume calculation unit that obtains an acoustic signal and calculates a value corresponding to the volume of the acoustic signal;
A parameter conversion unit that obtains a value corresponding to the volume of the acoustic signal, quantizes the value corresponding to the volume of the acoustic signal, and outputs the index as an environmental volume parameter;
A data transmission unit for transmitting the environmental volume parameter to the environmental sound synthesizer,
The environmental sound synthesizer includes:
A data receiver for receiving the environmental volume parameter from the environmental sound analyzer;
A template storage unit that stores a template of the environmental sound of one beat by one person (hereinafter referred to as an environmental sound segment template);
A template volume storage unit that stores information corresponding to the volume of the environmental sound segment template stored in the template storage unit;
A number-of-people estimation unit that outputs, as the number of people, the value obtained by dividing the information corresponding to the volume of the environmental volume parameter by the information corresponding to the volume of the environmental sound segment template;
A sound source synthesis unit that selects the environmental sound segment template from the template storage unit and generates environmental sound by arranging and synthesizing the selected template at time intervals determined so as to become shorter as the number of people increases;
An environmental sound transmission system comprising:

The environmental sound transmission system according to claim 4 or 5,
The parameter conversion unit of the environmental sound analyzer
Calculates an average energy E_j (j is an integer of 1 or more representing a frame index) as the value corresponding to the volume of the acoustic signal,
Calculates, from the average energy E_j, a sequence F_j expressed by

[equation omitted in the source text]

Obtains a value G_j by doubling the sequence F_j and dividing by 2, assigns a symbol to each value G_j by converting the value G_j into a code, and outputs the symbol assigned to the value G_j as the parameter P_j of the j-th frame.

The environmental sound transmission system according to claim 5,
The environmental sound analyzer further includes:
An applause section detection unit that determines whether the acoustic signal is an applause sound, outputs environmental volume parameter = 0 and period information indicating no period when the determination result is not an applause sound, and outputs a flag indicating an applause sound when the determination result is an applause sound; and
A periodicity determination unit that obtains the flag indicating the applause sound, detects the period of the acoustic signal using an autocorrelation function, outputs the peak interval of the autocorrelation function as period information when a period is detected, and outputs period information indicating no period when no period is detected;
The data transmission unit transmits the period information together with the environmental volume parameter to the environmental sound synthesizer;
The data receiving unit receives the period information together with the environmental volume parameter from the environmental sound analyzer; and
The sound source synthesis unit determines that the acoustic signal is an applause sound when the environmental volume parameter is not 0 and the period information indicates no period, determines that the acoustic signal is a hand-clapping sound when the environmental volume parameter is not 0 and the period information indicates that a period is present, and arranges and synthesizes the environmental sound segment templates so that the time interval is longer when the acoustic signal is determined to be a hand-clapping sound than when it is determined to be an applause sound.

An environmental sound synthesis method for generating an environmental sound by acquiring an environmental volume parameter related to a volume of an acoustic signal of a transmission source,
A data receiving step for receiving the environmental volume parameter;
A sound source synthesis step of referring to a template storage unit that stores an environmental sound template for one frame (hereinafter referred to as a template) in association with information corresponding to the volume of the environmental sound of the template, selecting a template having the same volume as the environmental volume parameter, and synthesizing the selected template to generate environmental sound;
An environmental sound synthesis method comprising:

An environmental sound synthesis method for generating an environmental sound by acquiring an environmental volume parameter related to a volume of an acoustic signal of a transmission source,
A data receiving step for receiving the environmental volume parameter;
A number-of-people estimation step of referring to a template volume storage unit that stores information corresponding to the volume of a template of the environmental sound of one beat by one person (hereinafter referred to as an environmental sound segment template), and outputting, as the number of people, the value obtained by dividing the information corresponding to the volume of the environmental volume parameter by the information corresponding to the volume of the environmental sound segment template;
A sound source synthesis step of referring to a template storage unit that stores the environmental sound segment template, selecting the environmental sound segment template, and generating environmental sound by arranging and synthesizing the selected template at time intervals determined so as to become shorter as the number of people increases;
An environmental sound synthesis method comprising:

An environmental sound transmission method executed by an environmental sound analyzer and an environmental sound synthesizer,
The environmental sound analyzer performs:
A volume calculation step of obtaining an acoustic signal and calculating a value corresponding to the volume of the acoustic signal;
A parameter conversion step of obtaining the value corresponding to the volume of the acoustic signal, quantizing that value, and outputting the resulting index as an environmental volume parameter; and
A data transmission step of transmitting the environmental volume parameter to the environmental sound synthesizer; and
The environmental sound synthesizer performs:
A data receiving step of receiving the environmental volume parameter from the environmental sound analyzer; and
A sound source synthesis step of referring to a template storage unit that stores an environmental sound template for one frame (hereinafter referred to as a template) in association with information corresponding to the volume of the environmental sound of the template, selecting a template having the same volume as the environmental volume parameter, and synthesizing the selected template to generate environmental sound.

An environmental sound transmission method executed by an environmental sound analyzer and an environmental sound synthesizer,
The environmental sound analyzer performs:
A volume calculation step of obtaining an acoustic signal and calculating a value corresponding to the volume of the acoustic signal;
A parameter conversion step of obtaining the value corresponding to the volume of the acoustic signal, quantizing that value, and outputting the resulting index as an environmental volume parameter; and
A data transmission step of transmitting the environmental volume parameter to the environmental sound synthesizer; and
The environmental sound synthesizer performs:
A data receiving step of receiving the environmental volume parameter from the environmental sound analyzer;
A number-of-people estimation step of referring to a template volume storage unit that stores information corresponding to the volume of a template of the environmental sound of one beat by one person (hereinafter referred to as an environmental sound segment template), and outputting, as the number of people, the value obtained by dividing the information corresponding to the volume of the environmental volume parameter by the information corresponding to the volume of the environmental sound segment template; and
A sound source synthesis step of referring to a template storage unit that stores the environmental sound segment template, selecting the environmental sound segment template, and generating environmental sound by arranging and synthesizing the selected template at time intervals determined so as to become shorter as the number of people increases.

The environmental sound transmission method according to claim 10 or 11,
In the parameter conversion step executed by the environmental sound analyzer,
An average energy E_j (j is an integer of 1 or more representing a frame index) is calculated as the value corresponding to the volume of the acoustic signal,
A sequence F_j expressed by the following equation is calculated from the average energy E_j,

[equation omitted in the source text]

A value G_j is obtained by doubling the sequence F_j and dividing by 2, a symbol is assigned to each value G_j by converting the value G_j into a code, and the symbol assigned to the value G_j is output as the parameter P_j of the j-th frame.

The environmental sound transmission method according to claim 11,
The environmental sound analyzer further performs:
An applause section detection step of determining whether the acoustic signal is an applause sound, outputting environmental volume parameter = 0 and period information indicating no period when the determination result is not an applause sound, and outputting a flag indicating an applause sound when the determination result is an applause sound; and
A periodicity determination step of obtaining the flag indicating the applause sound, detecting the period of the acoustic signal using an autocorrelation function, outputting the peak interval of the autocorrelation function as period information when a period is detected, and outputting period information indicating no period when no period is detected;
In the data transmission step, the period information is transmitted together with the environmental volume parameter to the environmental sound synthesizer;
In the data receiving step, the period information is received together with the environmental volume parameter from the environmental sound analyzer; and
In the sound source synthesis step, the acoustic signal is determined to be an applause sound when the environmental volume parameter is not 0 and the period information indicates no period, the acoustic signal is determined to be a hand-clapping sound when the environmental volume parameter is not 0 and the period information indicates that a period is present, and the environmental sound segment templates are arranged and synthesized so that the time interval is longer when the acoustic signal is determined to be a hand-clapping sound than when it is determined to be an applause sound.

A program for causing a computer to function as the environmental sound synthesizer according to any one of claims 1 to 3.

A program for causing a computer to function as the environmental sound transmission system according to any one of claims 4 to 7.