JP2019126076A

JP2019126076A - Tone signal control method and display control method

Info

Publication number: JP2019126076A
Application number: JP2019041824A
Authority: JP
Inventors: Hiroshi Kayama; Masashi Yoshida; Yoshitaka Uratani; Takashi Mori; Toshifumi Kunimoto; Kazunobu Kondo; Hayato Oshita; Makoto Tachibana
Original assignee: Yamaha Corp
Current assignee: Yamaha Corp
Priority date: 2019-03-07
Filing date: 2019-03-07
Publication date: 2019-07-25
Anticipated expiration: 2034-10-17
Also published as: JP6881488B2

Abstract

PROBLEM TO BE SOLVED: To provide a mixing device that mixes a plurality of tone signals while applying appropriate acoustic processing to each of them, without requiring complex operations.

SOLUTION: An analysis unit 101 sets a main voice order for tone signals Ak (k = 1 to n) on the basis of the sound volume extracted from each of the tone signals Ak. The main voice order thus set is associated with the tone signals Ak and output to a generation unit 102. The generation unit 102 generates control data for controlling the sound image localization of the tone signals Ak according to the main voice order. A synthesis unit 103 applies sound image localization processing to the tone signals Ak according to the control data and mixes the processed tone signals.

SELECTED DRAWING: Figure 2

Description

The present invention relates to a sound signal control method and a display control method suitable for a mixing apparatus or the like.

Mixing apparatuses that mix a plurality of audio signals input via microphones or the like are known. To enhance the acoustic effect when the mixing result is emitted as sound, this type of mixing apparatus may apply various kinds of acoustic processing, such as sound image localization processing, to each audio signal to be mixed.

Patent Document 1: Japanese Patent No. 4068069

When the singing voice signals of a plurality of singers are mixed, for example, the state of each singing voice signal changes from moment to moment. To achieve an excellent acoustic effect, the acoustic processing applied to each singing voice signal therefore has to be switched flexibly according to the state of each singing voice. However, such switching is difficult for anyone other than an expert accustomed to operating the mixing apparatus.

The present invention has been made in view of the circumstances described above. Its object is to provide a mixing apparatus that can apply appropriate acoustic processing to each sound signal according to the state of a plurality of sound signals and mix them, without requiring the user to perform complicated operations.

The present invention provides a mixing apparatus comprising: an analysis unit that extracts a feature quantity from each of a plurality of sound signals and sets an order for each of the plurality of sound signals based on the extracted feature quantities; and a generation unit that generates, based on the orders set for the plurality of sound signals, a plurality of control data for respectively controlling the acoustic processing applied to the plurality of sound signals.

According to this mixing apparatus, when the feature quantities extracted from the plurality of sound signals change, the order set for each sound signal may change accordingly. In that case, the acoustic processing applied to each sound signal is controlled according to the order of each sound signal after the change. The content of the acoustic processing applied to each sound signal can therefore be controlled according to the state of the plurality of sound signals, which changes from moment to moment.

Patent Document 1 discloses a technique for controlling acoustic processing during mixing. In Patent Document 1, when a karaoke apparatus recognizes that a singer is singing a certain vocal part along with the karaoke accompaniment, it lowers the playback volume of the backing chorus part that is mixed with that vocal part. The present invention, however, does not control the volume of the audio signal of another part based on the presence or absence of the audio signal of one part to be mixed, as Patent Document 1 does. Instead, it sets an order for a plurality of sound signals to be mixed based on their feature quantities, and controls the acoustic processing applied to each sound signal according to that order. The present invention is thus entirely different from the invention disclosed in Patent Document 1.

FIG. 1 is a block diagram showing the configuration of a mixing apparatus 10 according to a first embodiment of the present invention.
FIG. 2 is a diagram for explaining the configuration of a mixing control program 100 executed by the CPU 1 in the first embodiment.
FIG. 3 is a flowchart showing the processing of the mixing control program 100.
FIG. 4 is a diagram for explaining the configuration of a mixing control program 200 executed by the CPU 1 in a mixing apparatus 20 according to a second embodiment of the present invention.
FIG. 5 is a flowchart showing the processing of the mixing control program 200.

<First Embodiment>
FIG. 1 is a block diagram showing the configuration of a mixing apparatus 10 according to the first embodiment of the present invention. The mixing apparatus 10 shown in FIG. 1 comprises a CPU 1, a ROM 2, a RAM 3, a display unit 4, an operation unit 5, a data I/O 6, sound collectors 7-k (k = 1 to n), A/D converters 8-k (k = 1 to n), D/A converters 9-j (j = 1 to m), amplifiers 10-j (j = 1 to m), and loudspeakers 11-j (j = 1 to m). Each component inputs and outputs data via a bus 12. The bus 12 is a collective term for an audio bus, a data bus, and the like.

The CPU 1 is a processor that controls the operation of the entire mixing apparatus via the bus 12. The ROM 2 is a read-only memory that stores a program (hereinafter referred to as the mixing control program) executed by the CPU 1 to control the basic operation of the mixing apparatus 10. The RAM 3 is a volatile memory used by the CPU 1 as a work area. The display unit 4 is, for example, a liquid crystal display and its drive circuit, and displays various screens based on display control signals supplied from the CPU 1 via the bus 12. The operation unit 5 is a means by which the user inputs various kinds of information, and consists of a plurality of operators, a touch panel, and the like. The data I/O 6 is an interface that receives performance data in MIDI (Musical Instrument Digital Interface; registered trademark) format or waveform data in an audio format from outside and outputs it as sound signals. The sound collectors 7-k (k = 1 to n) consist of n microphones or the like; they convert the input singing voices of singers and similar sounds into analog electrical signals and output them to the A/D converters 8-k (k = 1 to n). The A/D converters 8-k (k = 1 to n) convert the analog sound signals output from the sound collectors 7-k (k = 1 to n) into digital sound signals Ak (k = 1 to n). The D/A converters 9-j (j = 1 to m) convert the digital sound signals Bj (j = 1 to m) obtained as a result of the mixing processing into analog sound signals. The amplifiers 10-j (j = 1 to m) amplify the analog sound signals output from the D/A converters 9-j (j = 1 to m). The loudspeakers 11-j (j = 1 to m) emit the analog sound signals output from the amplifiers 10-j (j = 1 to m) as sound.

FIG. 2 is a diagram for explaining the configuration of the mixing control program 100 executed by the CPU 1 in the present embodiment. The mixing control program 100 includes an analysis unit 101, a generation unit 102, and a synthesis unit 103. The analysis unit 101 extracts a feature quantity from each of the sequentially input sound signals Ak (k = 1 to n) and, based on the extracted feature quantities, sets an order (hereinafter referred to as the main voice order) for the sound signals Ak (k = 1 to n). It then associates the main voice order with the sound signals Ak (k = 1 to n) and outputs the result to the generation unit 102 as analysis data. In the present embodiment, the analysis unit 101 extracts the volume from the sound signals Ak (k = 1 to n) as the feature quantity.

Upon receiving the analysis data, the generation unit 102 generates control data for controlling the acoustic processing applied to the sound signals Ak (k = 1 to n), according to the main voice order set for the sound signals Ak (k = 1 to n). In the present embodiment, sound image localization processing is applied to the sound signals Ak (k = 1 to n) as the acoustic processing. The generation unit 102 therefore selects, according to the main voice order, the sound image position to be used in the sound image localization processing of each sound signal Ak, and generates control data for localizing the sound image of each sound signal Ak at the selected position. Suppose, for example, that sound signals A1 to A3 are input to the analysis unit 101 and that the main voice order of the sound signal A1 is third, that of the sound signal A2 is second, and that of the sound signal A3 is first. In this case, the generation unit 102 generates control data that localizes the sound image of the sound signal A3 (first) at the center, which is the most favored position, the sound image of the sound signal A2 (second) at the left, which is the next most favored position, and the sound image of the sound signal A1 (third) at the right, which is the least favored position, and outputs the control data to the synthesis unit 103. The sound image positions selected by the generation unit 102 are arbitrary, and various patterns are conceivable. In the above example, the sound image of the sound signal A3 (first) could instead be localized at the left, that of the sound signal A2 (second) at the right, and that of the sound signal A1 (third) at the center.
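The mapping from main voice order to sound image position described above can be sketched as a simple table lookup. This is only an illustration: the function name `assign_positions`, the numeric pan encoding (-1.0 for left, 0.0 for center, +1.0 for right), and the handling of ranks beyond the table are assumptions, not details taken from the patent.

```python
# Hypothetical pan positions, ordered from most favored to least favored.
# Encoding (an assumption): 0.0 = center, -1.0 = left, +1.0 = right.
FAVORED_POSITIONS = [0.0, -1.0, 1.0]


def assign_positions(main_voice_order, positions=FAVORED_POSITIONS):
    """Map each signal name to a pan position according to its rank (1 = highest)."""
    control_data = {}
    for name, rank in main_voice_order.items():
        # Ranks beyond the end of the table reuse the least favored position.
        index = min(rank - 1, len(positions) - 1)
        control_data[name] = positions[index]
    return control_data


# Example from the text: A1 is third, A2 is second, A3 is first.
control = assign_positions({"A1": 3, "A2": 2, "A3": 1})
print(control)
```

Swapping the contents of `FAVORED_POSITIONS` gives the alternative patterns mentioned in the text, since the choice of positions is arbitrary.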

Upon receiving the control data from the generation unit 102, the synthesis unit 103 applies acoustic processing (in this example, sound image localization processing) to the sound signals Ak (k = 1 to n) according to the control data, and mixes the sound signals resulting from the acoustic processing.

FIG. 3 is a flowchart showing the processing of the mixing control program 100 in the present embodiment. The operation of the present embodiment is described below with reference to FIG. 3. When a plurality of singers sing, their sound signals are input to the A/D converters 8-k (k = 1 to n) via the sound collectors 7-k (k = 1 to n). The sound signals Ak (k = 1 to n) produced by A/D conversion in the A/D converters 8-k (k = 1 to n) are then input to the analysis unit 101 and the synthesis unit 103.

Upon receiving the sound signals Ak (k = 1 to n), the analysis unit 101 extracts the volume (the amplitude values of the sound signals Ak) from the sound signals Ak as the feature quantity by the following procedure, and sets the main voice order for the sound signals Ak based on the extracted volume (step SA1).

First, as the sound signals Ak (k = 1 to n) are sequentially input from the A/D converters 8-k (k = 1 to n), the analysis unit 101 extracts the amplitude values of the sound signals Ak in a predetermined time unit and calculates their amplitude envelopes. The time unit may be a predetermined constant value, the playback time of the entire song sung by the singers, the playback time of the first verse only, or the like. The time unit may also be determined by VAD (Voice Activity Detection), by VAD combined with hangover processing, or the like.
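The text names VAD with hangover processing as one way to delimit the time unit but does not specify an algorithm. As a hedged sketch, a level-threshold detector with a hangover counter might look like the following; the function name, the threshold, the hangover length, and the use of per-frame levels as input are all assumptions.

```python
def vad_with_hangover(frame_levels, threshold=0.1, hangover_frames=2):
    """Return one boolean per frame marking it as voiced or not.

    A frame is voiced when its level reaches the threshold. After the level
    drops below the threshold, the decision stays voiced for another
    `hangover_frames` frames so that short pauses do not split a phrase.
    """
    decisions = []
    remaining = 0  # hangover frames still available
    for level in frame_levels:
        if level >= threshold:
            remaining = hangover_frames
            decisions.append(True)
        elif remaining > 0:
            remaining -= 1
            decisions.append(True)
        else:
            decisions.append(False)
    return decisions


flags = vad_with_hangover([0.0, 0.5, 0.6, 0.0, 0.0, 0.0, 0.0, 0.4])
print(flags)
```

A run of consecutive voiced frames could then serve as one time unit for the envelope calculation.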

Next, the analysis unit 101 smooths the calculated amplitude envelopes of the sound signals Ak (k = 1 to n) to remove noise superimposed on the envelope waveforms. The analysis unit 101 then identifies, among the amplitude envelopes of the sound signals Ak, the one having the largest amplitude value, and normalizes the amplitude envelopes of the sound signals Ak by dividing them by that maximum amplitude value. If a normalized amplitude envelope contains amplitude values below a predetermined threshold, the corresponding sound signal Ak is treated as not having been input in the sections corresponding to those amplitude values.
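The envelope calculation, smoothing, normalization, and thresholding steps can be sketched as follows. The patent does not specify the envelope detector, the smoothing filter, or the threshold value, so the per-frame peak detector, the moving average, and the default parameters here are assumptions chosen for illustration.

```python
def amplitude_envelope(samples, frame_size):
    """Peak absolute amplitude of each non-overlapping frame (assumed detector)."""
    return [max(abs(s) for s in samples[i:i + frame_size])
            for i in range(0, len(samples), frame_size)]


def smooth(envelope, window=3):
    """Moving average over `window` frames (assumed smoothing filter)."""
    half = window // 2
    smoothed = []
    for i in range(len(envelope)):
        segment = envelope[max(0, i - half):i + half + 1]
        smoothed.append(sum(segment) / len(segment))
    return smoothed


def normalize(envelopes, threshold=0.05):
    """Divide every envelope by the largest value found in any envelope.

    Normalized values below `threshold` are set to 0.0, meaning the signal
    is treated as not input in that section.
    """
    peak = max(max(env) for env in envelopes.values())
    if peak == 0:
        return {name: [0.0] * len(env) for name, env in envelopes.items()}
    normalized = {}
    for name, env in envelopes.items():
        scaled = [value / peak for value in env]
        normalized[name] = [v if v >= threshold else 0.0 for v in scaled]
    return normalized


signals = {"A1": [0.1, -0.2, 0.1, 0.0], "A2": [0.4, -0.8, 0.2, 0.1]}
envelopes = {name: smooth(amplitude_envelope(s, frame_size=2))
             for name, s in signals.items()}
print(normalize(envelopes))
```

Dividing by a single global maximum, rather than by each signal's own maximum, keeps the relative loudness between singers intact, which is what the subsequent ranking step depends on.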

Next, the analysis unit 101 compares the normalized amplitude envelopes of the sound signals Ak (k = 1 to n) with one another and assigns the main voice order in descending order of amplitude envelope value. That is, among the sound signals Ak, the analysis unit 101 ranks the signal with the largest amplitude envelope value first, the signal with the next largest amplitude envelope value second, and so on, down to the signal with the smallest amplitude envelope value, which is ranked n-th. Accordingly, the sound signal of the singer who sings loudest has the greatest presence among the singing voices of the plurality of singers and is ranked first in the main voice order. Conversely, the sound signal of the singer who sings most quietly has the least presence among the voices of the plurality of singers and is ranked n-th (last). Having set the main voice order for the sound signals Ak (step SA1), the analysis unit 101 associates it with the sound signals Ak and outputs the result to the generation unit 102 as analysis data (step SA2).
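The ranking step amounts to sorting the signals by level in descending order. The sketch below assumes that each signal has already been reduced to one representative level for the current time unit (for example, the peak of its normalized envelope); that reduction, the function name, and the tie handling are assumptions.

```python
def main_voice_order(levels):
    """Rank signals by level, largest first.

    Returns {name: rank} with rank 1 for the loudest signal. Signals with
    equal levels keep their input order (an assumed tie-breaking rule).
    """
    ordered = sorted(levels, key=lambda name: levels[name], reverse=True)
    return {name: rank for rank, name in enumerate(ordered, start=1)}


order = main_voice_order({"A1": 0.2, "A2": 0.5, "A3": 0.9})
print(order)
```

Because the ranks are recomputed for every time unit, a singer who becomes louder than the others moves up in the main voice order on the next pass.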

Upon receiving the analysis data from the analysis unit 101, the generation unit 102 sets the control data for the sound image localization processing of the sound signals Ak (k = 1 to n) based on the analysis data (step SA3). More specifically, the generation unit 102 refers to the analysis data and, for example, sets control data that localizes the sound image at the center for the sound signal with the highest main voice order. For the sound signal with the lowest main voice order, it sets control data that localizes the sound image at the right, for example. The generation unit 102 outputs the sound image localization control data set for each of the sound signals Ak to the synthesis unit 103.

Upon receiving the control data from the generation unit 102, the synthesis unit 103 applies sound image localization processing to the sound signals Ak (k = 1 to n) according to the control data (step SA4). It then mixes the sound signals Ak that have undergone sound image localization processing and outputs the sound signals Bj (j = 1 to m) resulting from the mixing to the D/A converters 9-j (j = 1 to m) (step SA5). When step SA5 is complete, the processing returns to step SA1, and steps SA1 to SA5 described above are repeated.
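Steps SA4 and SA5 can be illustrated for the two-output case (m = 2). The patent does not state how sound image localization is implemented; the sketch below substitutes constant-power stereo panning, a common technique, and the function names and the pan encoding (-1.0 for left, +1.0 for right) are assumptions.

```python
import math


def pan_gains(position):
    """Constant-power gains (left, right) for a position in [-1.0, +1.0]."""
    angle = (position + 1.0) * math.pi / 4.0  # 0 at hard left, pi/2 at hard right
    return math.cos(angle), math.sin(angle)


def mix(signals, control_data):
    """Pan each signal to its assigned position and sum into two outputs."""
    length = max(len(samples) for samples in signals.values())
    left = [0.0] * length
    right = [0.0] * length
    for name, samples in signals.items():
        gain_left, gain_right = pan_gains(control_data[name])
        for i, sample in enumerate(samples):
            left[i] += gain_left * sample
            right[i] += gain_right * sample
    return left, right


signals = {"A1": [0.2, 0.2], "A2": [0.5, -0.5], "A3": [0.9, 0.1]}
control_data = {"A1": 1.0, "A2": -1.0, "A3": 0.0}
b1, b2 = mix(signals, control_data)
print(b1, b2)
```

Here `b1` and `b2` play the role of the mixed sound signals Bj for j = 1 and 2. With more than two loudspeakers, a different panning scheme would be needed.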

The D/A converters 9-j (j = 1 to m) convert the sound signals Bj (j = 1 to m) resulting from the mixing into analog sound signals and output them toward the loudspeakers 11-j (j = 1 to m). The loudspeakers 11-j (j = 1 to m) emit the analog sound signals from the D/A converters 9-j (j = 1 to m) as sound from m speakers. As a result, the listener hears each of the sound signals Ak (k = 1 to n) as a sound whose sound image is localized at the position determined by the control data.

In the present embodiment, the control data for the sound image localization processing of the sound signals Ak (k = 1 to n) is set according to the main voice order set for the plurality of sound signals Ak. The listener then hears sounds whose sound images are localized at the positions determined by the control data. According to the present embodiment, therefore, the localization applied to the sound signals Ak can be switched appropriately according to the state of the plurality of sound signals Ak (in this case, their relative volumes) without the user performing any complicated operation.

In addition, in the present embodiment, the main voice order is set for the sound signals Ak (k = 1 to n) according to their volume, and the control data for the sound image localization processing is set so that the sound signal with the highest main voice order is localized at the center and the sound signal with the lowest main voice order is localized at the left or right. The present embodiment can therefore motivate singers to sing loudly in order to have their own singing voice localized at the center.

<Second Embodiment>
FIG. 4 is a diagram for explaining the configuration of the mixing control program 200 executed by the CPU 1 in the mixing apparatus 20 according to the second embodiment of the present invention. The mixing control program 200 includes an analysis unit 201, a generation unit 202, a synthesis unit 203, and a UI (User Interface) 204. The mixing control program 200 in the present embodiment performs mixing processing on sound signals Ak (k = 1 to n) obtained by playing back waveform data in an audio format in which the singing voices of singers and the like are recorded. That is, in addition to the processing of mixing a plurality of sound signals Ak (k = 1 to n) input in real time, as described in the first embodiment, the mixing apparatus 20 of the present embodiment performs processing of mixing a plurality of recorded audio data and the like. The mixing control program 200 is configured by adding the UI 204 to the mixing control program 100 of the first embodiment. The UI 204 transmits operation commands to the analysis unit 201, the generation unit 202, and the synthesis unit 203 in response to user operations.

In the present embodiment, the analysis unit 201 extracts not only the volume but also various other feature quantities, such as timbre, localization, pitch, and the duration of the singing voice, from the sound signals Ak (k = 1 to n). The generation unit 202 sets not only control data for the sound image localization processing of the sound signals Ak but also control data for various acoustic effects such as pitch, volume, and timbre. The synthesis unit 203 not only controls the sound image localization processing of the sound signals Ak but also applies various kinds of acoustic processing, such as pitch, volume, and timbre control, to the sound signals Ak.

FIG. 5 is a flowchart showing the processing of the mixing control program 200 in the present embodiment. The operation of the present embodiment is described below with reference to FIG. 5. When the waveform data in an audio format stored in the data I/O 6 is played back in response to an instruction from the CPU 1, a plurality of sound signals Ak (k = 1 to n) are input to the analysis unit 201.

Upon receiving the sound signals Ak (k = 1 to n), the analysis unit 201 extracts various feature quantities from the sound signals Ak (step SB1). More specifically, the analysis unit 201 extracts one or more feature quantities from among various feature quantities such as timbre, localization, pitch, and the duration of the singing voice. The feature quantities extracted by the analysis unit 201 are selected by an instruction from the user. That is, the user transmits an operation command corresponding to the feature quantities to be extracted to the analysis unit 201 via the UI 204. In response, the analysis unit 201 extracts the one or more feature quantities specified by the user from the sound signals Ak.

Having extracted one or more feature quantities from the sound signals Ak (k = 1 to n), the analysis unit 201 sets the main voice order of the sound signals Ak. If a plurality of feature quantities have been extracted, the analysis unit 201 integrates the main voice orders set for the individual feature quantities by weighted addition. For example, when the volume and the timbre are extracted as feature quantities, weights are assigned to the main voice order based on volume and the main voice order based on timbre, and the main voice order of the sound signals Ak is calculated by weighted addition. The result of the weighted addition is taken as the final main voice order. The analysis unit 201 associates the calculated main voice order with the sound signals Ak and outputs the result to the generation unit 202 as analysis data (step SB2). The weights are specified by the user transmitting an operation command to the analysis unit 201 via the UI 204.
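The weighted addition of per-feature main voice orders can be sketched as follows. The text states only that the ranks are weighted and added; converting the resulting scores back into ranks 1 to n, the function name, and the example weights are assumptions.

```python
def combine_orders(orders_by_feature, weights):
    """Integrate per-feature ranks by weighted addition.

    `orders_by_feature` maps a feature name to {signal: rank}; `weights`
    maps the same feature names to user-specified weights. A lower weighted
    sum means a higher final order (assumed re-ranking rule).
    """
    names = list(next(iter(orders_by_feature.values())))
    scores = {
        name: sum(weights[feature] * order[name]
                  for feature, order in orders_by_feature.items())
        for name in names
    }
    ordered = sorted(scores, key=lambda name: scores[name])
    return {name: rank for rank, name in enumerate(ordered, start=1)}


orders = {
    "volume": {"A1": 1, "A2": 2, "A3": 3},
    "timbre": {"A1": 3, "A2": 1, "A3": 2},
}
final_order = combine_orders(orders, {"volume": 0.25, "timbre": 0.75})
print(final_order)
```

With these example weights the timbre ranking dominates: A1 is loudest but ends up last because its timbre rank is the lowest.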

The generation unit 202 sets control data for the various kinds of acoustic processing to be applied to the sound signals Ak (k = 1 to n), according to the main voice order set for the sound signals Ak. Here, the generation unit 202 sets control data for one or more kinds of acoustic processing from among various kinds such as localization, volume, and timbre control (step SB3).

For example, when the volume is controlled as the acoustic processing, the generation unit 202 sets the volume control data so that the sound signal with the highest main voice order has the maximum volume, and so that the sound signal with the lowest main voice order has the minimum volume.
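One possible way to derive volume control data from the main voice order is shown below. The text fixes only the two endpoints (maximum volume for the highest order, minimum for the lowest); the linear interpolation for the ranks in between, the function name, and the gain range are assumptions.

```python
def volume_gains(main_voice_order, max_gain=1.0, min_gain=0.25):
    """Assign a gain to each signal: max_gain for rank 1, min_gain for the last rank.

    Intermediate ranks are interpolated linearly (an assumption).
    """
    count = len(main_voice_order)
    if count <= 1:
        return {name: max_gain for name in main_voice_order}
    step = (max_gain - min_gain) / (count - 1)
    return {name: max_gain - step * (rank - 1)
            for name, rank in main_voice_order.items()}


gains = volume_gains({"A1": 3, "A2": 2, "A3": 1})
print(gains)
```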

When the timbre is controlled as the acoustic processing, the generation unit 202 sets the equalizer control data so that the sound pressure level in the high-frequency range is emphasized for the sound signal with the highest main voice order. Alternatively, the equalizer control data may be set so that the sound pressure level in the voice frequency band is emphasized for the sound signal with the highest main voice order.
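The equalizer itself is not specified in the text. As a rough stand-in for high-frequency emphasis, the sketch below uses a first-order filter that adds a scaled sample-to-sample difference to the signal, which leaves slowly varying content almost unchanged and boosts rapidly varying content. The filter, the function name, and the `amount` parameter are assumptions; a real equalizer would typically use a shelving or peaking filter.

```python
def emphasize_treble(samples, amount=0.5):
    """Boost high frequencies with y[n] = x[n] + amount * (x[n] - x[n-1]).

    The first sample is passed through unchanged to avoid a start-up click.
    """
    if not samples:
        return []
    output = []
    previous = samples[0]
    for sample in samples:
        output.append(sample + amount * (sample - previous))
        previous = sample
    return output


print(emphasize_treble([1.0, 1.0, 1.0]))         # constant input is unchanged
print(emphasize_treble([1.0, -1.0, 1.0, -1.0]))  # fast alternation is amplified
```

In a mixer following this scheme, such a filter would be applied only to the sound signal ranked first in the main voice order.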

The acoustic processing to be applied to the sound signals Ak (k = 1 to n) is selected by an instruction from the user. That is, the user transmits an operation command specifying the desired acoustic processing to the generation unit 202 via the UI 204. In response, the generation unit 202 sets control data for the one or more kinds of acoustic processing specified by the user. The generation unit 202 transmits the control data for the acoustic processing to be applied to the sound signals Ak to the synthesis unit 203.

Upon receiving the control data from the generation unit 202, the synthesis unit 203 applies acoustic processing to the sound signals Ak (k = 1 to n) according to the control data (step SB4). It then mixes the processed sound signals and outputs the sound signals Bj (j = 1 to m) resulting from the mixing to the data I/O 6 (step SB5). When step SB5 is complete, the processing returns to step SB1, and steps SB1 to SB5 described above are repeated. Upon receiving the sound signals Bj (j = 1 to m), the data I/O 6 stores them in a memory (not shown) as waveform data in an audio format.

In the present embodiment, the main voice order of the sound signals Ak (k = 1 to n) is set based on one or more feature quantities extracted from the plurality of sound signals Ak. One or more acoustic effects are then applied to the sound signals Ak according to this main voice order. According to the present embodiment, therefore, a rich variety of acoustic processing that takes various features of the sound signals Ak into account can be applied to the sound signals Ak.

本実施形態では、オーディオ形式の波形データを再生して得られる音信号Ａｋ（ｋ＝１〜ｎ）に音響処理を施してミキシングする。従って、歌唱者は、自分の歌声等を録音して動画投稿サイトに投稿する場合に、複雑な操作を伴わずに歌声等に音響処理を施して、その歌声等をミキシングすることができる。 In the present embodiment, the sound signals Ak (k = 1 to n) obtained by reproducing audio-format waveform data are subjected to acoustic processing and mixed. Therefore, a singer who records his or her singing voice or the like and posts it to a video posting site can apply acoustic processing to the recording and mix it without complicated operations.

また、本実施形態によると、合成部２０３に音信号Ａｋ（ｋ＝１〜ｎ）の音像定位処理の制御を行わせることにより、最も上手に歌う歌唱者の歌声をセンタに定位させ、上手に歌うことができない歌唱者の歌声を左右に定位させることができる。従って、歌唱者に自分の歌声の音像をセンタに定位させるために、歌唱力を向上させようとする動機づけを行わせることができる。 Further, according to the present embodiment, by having the synthesis unit 203 control the sound image localization processing of the sound signals Ak (k = 1 to n), the singing voice of the singer who sings best can be localized at the center, and the singing voice of a singer who cannot sing well can be localized to the left or right. This can motivate singers to improve their singing ability so that the sound image of their own voice is localized at the center.
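
As a rough illustration of this localization behavior, the following Python sketch maps a main voice order to a stereo pan position and then to left and right gains. The function names, the alternating left/right spread, and the constant-power pan law are assumptions made for the example; the specification does not state how the control data is computed.

```python
import math

def pan_for_rank(rank: int, n: int) -> float:
    """Return a pan position in [-1.0, 1.0] (0.0 = center) for a 1-based rank.

    Assumption: rank 1 sits at the center and lower ranks are pushed
    progressively outward, alternating between left and right.
    """
    if not 1 <= rank <= n:
        raise ValueError("rank must be between 1 and n")
    if n == 1:
        return 0.0
    spread = (rank - 1) / (n - 1)          # 0.0 for rank 1, 1.0 for rank n
    side = -1.0 if rank % 2 == 0 else 1.0  # even ranks left, odd ranks right
    return side * spread

def constant_power_gains(pan: float) -> tuple[float, float]:
    """Convert a pan position to (left, right) gains with a constant-power law."""
    angle = (pan + 1.0) * math.pi / 4.0    # maps [-1, 1] onto [0, pi/2]
    return math.cos(angle), math.sin(angle)
```

The constant-power law keeps the summed power of the two channels equal for every pan position, so a voice does not get quieter merely because it is moved away from the center.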

＜他の実施形態＞ <Other Embodiments>
以上、この発明の各種の実施形態について説明したが、この発明には他にも実施形態が考えられる。 Although various embodiments of the present invention have been described above, other embodiments are also conceivable.

（１）第１実施形態において、合成部１０３は、制御データに従い音信号Ａｋ（ｋ＝１〜ｎ）に音像定位処理の制御を施すことにより、音像を水平方向の所定の位置に定位させた。しかし、音像が垂直方向の所定の位置に定位するように、生成部１０２に制御データを生成させてもよい。 (1) In the first embodiment, the synthesis unit 103 localizes the sound image at a predetermined position in the horizontal direction by controlling the sound image localization processing of the sound signals Ak (k = 1 to n) according to the control data. However, the generation unit 102 may instead generate control data such that the sound image is localized at a predetermined position in the vertical direction.

（２）第２実施形態において、分析部２０１は、オーディオ形式の波形データの全再生区間において音信号Ａｋ（ｋ＝１〜ｎ）から特徴量を抽出し、音信号Ａｋ（ｋ＝１〜ｎ）に主音声順位を設定してもよい。また、生成部２０２は、この主音声順位に従い、音信号Ａｋ（ｋ＝１〜ｎ）に付与する音響効果の制御データを設定してもよい。さらに、合成部２０３は、この制御データに基づき、音信号Ａｋ（ｋ＝１〜ｎ）に音響処理を施してもよい。これにより、音信号Ａｋ（ｋ＝１〜ｎ）全体の音楽的な特徴を考慮した音響処理を音信号Ａｋ（ｋ＝１〜ｎ）に施すことができる。 (2) In the second embodiment, the analysis unit 201 may extract feature quantities from the sound signals Ak (k = 1 to n) over the entire reproduction interval of the audio-format waveform data and set the main voice order of the sound signals Ak (k = 1 to n). The generation unit 202 may then set control data for the acoustic effects to be applied to the sound signals Ak (k = 1 to n) according to this main voice order. Furthermore, the synthesis unit 203 may apply acoustic processing to the sound signals Ak (k = 1 to n) based on this control data. This makes it possible to apply acoustic processing that takes into account the musical features of the sound signals Ak (k = 1 to n) as a whole.

（３）第２実施形態において、分析部２０１は、音信号Ａｋ（ｋ＝１〜ｎ）から抽出した特徴量と模範データから抽出した特徴量との類似性に基づき、音信号Ａｋ（ｋ＝１〜ｎ）の主音声順位を決定してもよい。ここで、模範データとは、例えば、模範ボーカルや模範コーラスの歌声、ＭＩＤＩ形式の演奏データ、楽譜データ等のことをいう。模範ボーカルや模範コーラスから抽出する特徴量は、音量、音高、歌声の継続時間等の種々の特徴量のうち１または複数の特徴量であってもよい。この場合、分析部２０１が抽出する特徴量は、ユーザがＵＩ２０４を介して所定の操作コマンドを送信することにより指定される。分析部２０１は、音信号Ａｋ（ｋ＝１〜ｎ）から抽出した特徴量と模範データから抽出した特徴量との類似性が最も高い音信号の主音声順位を第１位とし、最も低い音信号の主音声順位を第ｎ位とする。 (3) In the second embodiment, the analysis unit 201 may determine the main voice order of the sound signals Ak (k = 1 to n) based on the similarity between feature quantities extracted from the sound signals Ak (k = 1 to n) and feature quantities extracted from model data. Here, model data refers to, for example, the singing voice of a model vocal or model chorus, MIDI-format performance data, musical score data, and the like. The feature quantities extracted from the model vocal or model chorus may be one or more of various feature quantities such as volume, pitch, and duration of the singing voice. In this case, the feature quantities that the analysis unit 201 extracts are designated by the user transmitting a predetermined operation command via the UI 204. The analysis unit 201 assigns first place in the main voice order to the sound signal whose extracted feature quantities are most similar to those extracted from the model data, and n-th place to the sound signal whose feature quantities are least similar.
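
The similarity-based ordering in (3) can be sketched as follows. The specification does not say which similarity measure is used, so cosine similarity over a feature vector is assumed here only as an example, and the names `cosine_similarity` and `rank_by_similarity` are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity of two equal-length feature vectors (assumed measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)

def rank_by_similarity(signal_features, model_features):
    """Map each signal index k to its main voice order.

    signal_features: dict from signal index k to its feature vector.
    model_features:  feature vector extracted from the model data.
    The most similar signal gets order 1; ties go to the smaller k.
    """
    sims = {k: cosine_similarity(f, model_features)
            for k, f in signal_features.items()}
    ordered = sorted(sims, key=lambda k: (-sims[k], k))
    return {k: rank for rank, k in enumerate(ordered, start=1)}
```

In practice the feature vector might hold quantities such as volume, pitch, and note duration, normalized to comparable scales before the comparison.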

（４）第２実施形態において、合成部２０３は、模範データから抽出した特徴量をリファレンスとして、音信号Ａｋ（ｋ＝１〜ｎ）の特徴量を補正してもよい。ここで、模範データとは、例えば、ＭＩＤＩ形式の演奏データや模範ボーカルの歌声等のことをいう。例えば、合成部２０３は、ある演奏区間において、分析部２０１がＭＩＤＩ形式の演奏データから取得したピッチカーブデータをリファレンスとして、当該演奏区間における音信号Ａｋ（ｋ＝１〜ｎ）のピッチカーブを補正する。また、合成部２０３は、ある演奏区間において分析部２０１がＭＩＤＩ形式の演奏データから取得したベロシティ（音の強弱）データをリファレンスとして、当該演奏区間における音信号Ａｋ（ｋ＝１〜ｎ）のアーティキュレーション（例えば、音量・音韻遷移時間）を補正する。また、合成部２０３は、ある演奏区間において分析部２０１がＭＩＤＩ形式の演奏データから取得したビブラート（例えば、音高変化、音量変化）データをリファレンスとして、当該演奏区間における音信号Ａｋ（ｋ＝１〜ｎ）のビブラートを補正する。模範データから取得する特徴量は、ユーザがＵＩ２０４を介して所定の操作コマンドを送信することにより設定される。 (4) In the second embodiment, the synthesis unit 203 may correct the feature quantities of the sound signals Ak (k = 1 to n) using feature quantities extracted from the model data as a reference. Here, model data refers to, for example, MIDI-format performance data, the singing voice of a model vocal, and the like. For example, for a given performance section, the synthesis unit 203 corrects the pitch curve of the sound signals Ak (k = 1 to n) in that section using, as a reference, pitch curve data that the analysis unit 201 acquired from MIDI-format performance data. The synthesis unit 203 also corrects the articulation (for example, volume and phoneme transition time) of the sound signals Ak (k = 1 to n) in a given performance section using, as a reference, velocity (note intensity) data that the analysis unit 201 acquired from MIDI-format performance data. Likewise, the synthesis unit 203 corrects the vibrato of the sound signals Ak (k = 1 to n) in a given performance section using, as a reference, vibrato (for example, pitch variation and volume variation) data that the analysis unit 201 acquired from MIDI-format performance data. The feature quantities acquired from the model data are set by the user transmitting a predetermined operation command via the UI 204.

また、合成部２０３は、模範ボーカルの歌声から抽出した声質をリファレンスとして、音信号Ａｋ（ｋ＝１〜ｎ）が示す歌声の声質を補正してもよい。 In addition, the synthesis unit 203 may correct the voice quality of the singing voice indicated by the sound signal Ak (k = 1 to n) using the voice quality extracted from the singing voice of the model vocal as a reference.
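
The reference-based correction in (4) can be illustrated for the pitch curve case. In this sketch a pitch curve is assumed to be a list of per-frame pitches in MIDI note numbers, and the correction is a simple linear pull toward the reference; the `strength` parameter and the function name are hypothetical and do not come from the specification.

```python
def correct_pitch_curve(sung, reference, strength=1.0):
    """Pull a sung pitch curve toward a reference curve.

    sung, reference: per-frame pitches (assumed MIDI note numbers), same length.
    strength: 0.0 leaves the sung curve untouched, 1.0 replaces it with the
              reference, and values in between blend the two.
    """
    if len(sung) != len(reference):
        raise ValueError("pitch curves must cover the same frames")
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be within [0.0, 1.0]")
    return [s + strength * (r - s) for s, r in zip(sung, reference)]
```

A partial strength keeps some of the singer's own intonation, whereas full strength snaps every frame to the reference. Articulation and vibrato could be handled along the same lines once they are expressed as per-frame curves.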

（５）第２実施形態において、合成部２０３は、模範データから抽出した特徴量と音信号Ａｋ（ｋ＝１〜ｎ）から抽出した特徴量とを基に新たな波形データを生成し、当該波形データからなる音信号Ａｋ（ｋ＝１〜ｎ）を、音信号Ａｋ（ｋ＝１〜ｎ）にミキシングしてもよい。例えば、分析部２０１は、ＭＩＤＩ形式の演奏データからピッチカーブ、楽曲のコード進行情報、ダイヤトニックスケール等の特徴量を抽出する。合成部２０３は、この特徴量が音信号Ａｋ（ｋ＝１〜ｎ）から抽出した特徴量と調和するように、コーラス音声やダブリング音声等の波形を生成する。そして、生成したコーラス音声やダブリング音声が示す音信号Ａｋ（ｋ＝１〜ｎ）と各入力音信号Ａｋ（ｋ＝１〜ｎ）とをミキシングすることにより、音信号Ａｋ（ｋ＝１〜ｎ）が示す音声にコーラス音声やダブリング音声を重畳させる。模範データから抽出する特徴量は、ユーザがＵＩ２０４を介して所定の操作コマンドを送信することにより設定される。 (5) In the second embodiment, the synthesis unit 203 may generate new waveform data based on feature quantities extracted from the model data and feature quantities extracted from the sound signals Ak (k = 1 to n), and mix the sound signals Ak (k = 1 to n) consisting of that waveform data with the input sound signals Ak (k = 1 to n). For example, the analysis unit 201 extracts feature quantities such as a pitch curve, chord progression information of the piece, and a diatonic scale from MIDI-format performance data. The synthesis unit 203 generates waveforms such as chorus voices or doubling voices so that these feature quantities harmonize with the feature quantities extracted from the sound signals Ak (k = 1 to n). It then mixes the sound signals Ak (k = 1 to n) representing the generated chorus or doubling voices with each input sound signal Ak (k = 1 to n), thereby superimposing the chorus or doubling voices on the voices represented by the sound signals Ak (k = 1 to n). The feature quantities extracted from the model data are set by the user transmitting a predetermined operation command via the UI 204.

（６）第２実施形態において、合成部２０３は、音信号Ａｋ（ｋ＝１〜ｎ）から抽出された特徴量を基に、当該特徴量を取得したパートまたはそれ以外のパートの音信号Ａｋ（ｋ＝１〜ｎ）の特徴量を加工してもよい。 (6) In the second embodiment, the synthesis unit 203 may, based on a feature quantity extracted from the sound signals Ak (k = 1 to n), modify the feature quantity of the sound signal Ak (k = 1 to n) of the part from which that feature quantity was acquired, or of another part.

例えば、合成部２０３は、分析部２０１が音信号Ａｋ（ｋ＝１〜ｎ）から抽出したピッチカーブデータを基に、当該ピッチカーブデータを抽出したパートのピッチカーブを加工する。これにより、当該パートのピッチカーブの特徴を適量だけ変化させることができる。また、合成部２０３は、分析部２０１が音信号Ａｋ（ｋ＝１〜ｎ）から抽出したピッチカーブデータを基に、当該ピッチカーブデータを抽出したパートとは別のパートのピッチカーブを加工してもよい。これにより、あるパートのピッチカーブの特徴を、他のパートにも付与することができる。 For example, based on pitch curve data that the analysis unit 201 extracted from the sound signals Ak (k = 1 to n), the synthesis unit 203 modifies the pitch curve of the part from which the pitch curve data was extracted. This allows the pitch curve characteristics of that part to be changed by an appropriate amount. The synthesis unit 203 may also, based on pitch curve data that the analysis unit 201 extracted from the sound signals Ak (k = 1 to n), modify the pitch curve of a part other than the one from which the pitch curve data was extracted. This allows the pitch curve characteristics of one part to be imparted to other parts.

また、合成部２０３は、分析部２０１が音信号Ａｋ（ｋ＝１〜ｎ）から抽出したベロシティデータを基に、当該ベロシティデータを抽出したパートのアーティキュレーションを加工する。これにより、当該パートのアーティキュレーションの特徴を適量だけ変化させることができる。また、合成部２０３は、分析部２０１が音信号Ａｋ（ｋ＝１〜ｎ）から抽出したベロシティデータを基に、当該ベロシティデータを抽出したパートとは別のパートのアーティキュレーションを加工してもよい。これにより、あるパートのアーティキュレーションの特徴を、他のパートにも付与することができる。 Further, based on velocity data that the analysis unit 201 extracted from the sound signals Ak (k = 1 to n), the synthesis unit 203 modifies the articulation of the part from which the velocity data was extracted. This allows the articulation characteristics of that part to be changed by an appropriate amount. The synthesis unit 203 may also, based on velocity data that the analysis unit 201 extracted from the sound signals Ak (k = 1 to n), modify the articulation of a part other than the one from which the velocity data was extracted. This allows the articulation characteristics of one part to be imparted to other parts.

また、合成部２０３は、分析部２０１が音信号Ａｋ（ｋ＝１〜ｎ）から抽出したビブラートデータを基に、当該ビブラートデータを抽出したパートのビブラートを加工する。これにより、当該パートのビブラートの特徴を適量だけ変化させることができる。また、合成部２０３は、分析部２０１が音信号Ａｋ（ｋ＝１〜ｎ）から抽出したビブラートデータを基に、当該ビブラートデータを抽出したパートとは別のパートのビブラートを加工してもよい。これにより、あるパートのビブラートの特徴を、他のパートにも付与することができる。 Further, based on vibrato data that the analysis unit 201 extracted from the sound signals Ak (k = 1 to n), the synthesis unit 203 modifies the vibrato of the part from which the vibrato data was extracted. This allows the vibrato characteristics of that part to be changed by an appropriate amount. The synthesis unit 203 may also, based on vibrato data that the analysis unit 201 extracted from the sound signals Ak (k = 1 to n), modify the vibrato of a part other than the one from which the vibrato data was extracted. This allows the vibrato characteristics of one part to be imparted to other parts.

また、合成部２０３は、分析部２０１が音信号Ａｋ（ｋ＝１〜ｎ）から取得した歌唱者の声質データを基に、当該声質データを取得したパートの声質を加工する。これにより、当該パートの声質の特徴を適量だけ変化させることができる。また、合成部２０３は、分析部２０１が音信号Ａｋ（ｋ＝１〜ｎ）から抽出した声質データを基に、当該声質データを抽出したパートとは別のパートの声質を加工してもよい。これにより、あるパートの声質の特徴を、他のパートにも付与することができる。 Further, based on voice quality data of a singer that the analysis unit 201 acquired from the sound signals Ak (k = 1 to n), the synthesis unit 203 modifies the voice quality of the part from which the voice quality data was acquired. This allows the voice quality characteristics of that part to be changed by an appropriate amount. The synthesis unit 203 may also, based on voice quality data that the analysis unit 201 extracted from the sound signals Ak (k = 1 to n), modify the voice quality of a part other than the one from which the voice quality data was extracted. This allows the voice quality characteristics of one part to be imparted to other parts.

分析部２０１が音信号Ａｋ（ｋ＝１〜ｎ）から抽出する特徴量は、ユーザがＵＩ２０４を介して所定の操作コマンドを送信することにより設定される。また、合成部２０３により加工されるパートは、ユーザがＵＩ２０４を介して、所定の操作コマンドを送信することにより設定される。 The feature amount extracted from the sound signal Ak (k = 1 to n) by the analysis unit 201 is set by the user transmitting a predetermined operation command via the UI 204. Further, the part processed by the synthesis unit 203 is set by the user transmitting a predetermined operation command via the UI 204.
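
Imparting a characteristic of one part to another, as described in (6), might look like the following sketch for pitch detail such as vibrato. It assumes per-frame pitch curves of equal length, approximates the characteristic as the deviation of the source curve from its own moving average, and adds a scaled copy of that deviation to the target part. The function names, the `amount` and `window` parameters, and the moving-average approach are illustrative assumptions only.

```python
def moving_average(values, window):
    """Centered moving average; the window shrinks at the edges."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed

def transfer_pitch_detail(source, target, amount=0.5, window=5):
    """Add a scaled copy of the source part's fast pitch deviation to the target.

    source, target: per-frame pitch curves of equal length.
    amount: how much of the source deviation to impart (0.0 = none).
    """
    if len(source) != len(target):
        raise ValueError("pitch curves must cover the same frames")
    slow = moving_average(source, window)
    # The deviation from the smoothed curve stands in for vibrato-like detail.
    return [t + amount * (s - m) for t, s, m in zip(target, source, slow)]
```

Passing the same curve as both source and target changes a part's own characteristic by the chosen amount, which corresponds to the first case described in (6).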

（７）第２実施形態において、合成部２０３は、模範データから抽出された特徴量を基に、所定の区間を設定し、この区間においてのみミキシングされた音信号Ａｋ（ｋ＝１〜ｎ）を出力させてもよい。例えば、合成部２０３は、ＭＩＤＩ形式の演奏データ等の模範データから各種特徴量を抽出し、歌い出し〜Ａメロ〜サビに至るまでの区間、歌い出し〜最大音量付近に至るまでの区間、歌いだし〜最小音量付近に至るまでの区間等を特定する。そして、これらの指定された区間においてのみミキシングされた音信号Ａｋ（ｋ＝１〜ｎ）を出力する。 (7) In the second embodiment, the synthesis unit 203 may set predetermined sections based on feature quantities extracted from the model data, and output the mixed sound signals Ak (k = 1 to n) only in those sections. For example, the synthesis unit 203 extracts various feature quantities from model data such as MIDI-format performance data and identifies sections such as the section from the start of singing through the A-melody (verse) to the sabi (chorus), the section from the start of singing to around the maximum volume, and the section from the start of singing to around the minimum volume. It then outputs the mixed sound signals Ak (k = 1 to n) only in these designated sections.

また、合成部２０３は、設定された複数の区間を時系列に接続したダイジェストを作成し、このダイジェストに従い順次ミキシングされた音信号Ａｋ（ｋ＝１〜ｎ）を出力してもよい。この場合、ダイジェストの時間長は、ネットワークの混雑状況等を考慮して適宜変更できるようにしてもよい。これらの区間やダイジェストの時間長は、ユーザがＵＩ２０４を介して、所定の操作コマンドを送信することにより設定される。 The synthesizing unit 203 may create a digest in which a plurality of set sections are connected in time series, and may output sound signals Ak (k = 1 to n) sequentially mixed according to the digest. In this case, the time length of the digest may be appropriately changed in consideration of the congestion state of the network and the like. The time length of these sections and the digest is set by the user transmitting a predetermined operation command via the UI 204.
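
The digest described in (7) can be sketched as a list of sections joined in time order and capped at a total duration. The representation of a section as a `(start, end)` pair in seconds, the name `build_digest`, and the policy of truncating the last section that does not fit are assumptions for illustration; overlapping sections are not merged in this sketch.

```python
def build_digest(sections, max_duration):
    """Join sections in chronological order, capped at max_duration seconds.

    sections: list of (start, end) pairs in seconds.
    Returns the sections to play, with the last one shortened if needed.
    """
    digest = []
    remaining = max_duration
    for start, end in sorted(sections):
        if remaining <= 0:
            break
        if end <= start:
            continue  # skip empty or inverted sections
        length = min(end - start, remaining)
        digest.append((start, start + length))
        remaining -= length
    return digest
```

Lowering `max_duration`, for instance when the network is congested, shortens the digest without changing which sections were selected.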

（８）第１実施形態および第２実施形態において、合成部１０３および２０３に、主音声順位に従い、歌唱者等の画像を表示部４または他の表示手段に表示させるための表示制御信号を出力させてもよい。この場合、主音声順位が最も高い歌唱者の画像を表示部４または他の表示手段のセンタに表示させ、主音声順位が最も低い歌唱者の画像を表示部４または他の表示手段の左右に小さく表示させる。これにより、歌唱者に自身の画像をセンタに表示させるために、歌唱力を向上させようとする動機づけを行わせることができる。 (8) In the first and second embodiments, the synthesis units 103 and 203 may output a display control signal for displaying images of the singers or the like on the display unit 4 or other display means according to the main voice order. In this case, the image of the singer with the highest main voice order is displayed at the center of the display unit 4 or other display means, and the image of the singer with the lowest main voice order is displayed small at the left or right of the display unit 4 or other display means. This can motivate singers to improve their singing ability so that their own image is displayed at the center.

（９）上記（１）〜（８）に示す制御を実行するか否かの判断は、ユーザがＵＩ２０４を介して、所定の操作コマンドを送信することにより決定してもよい。また、第２実施形態において、逐次入力される複数の音信号Ａｋ（ｋ＝１〜ｎ）をリアルタイムでミキシングする処理、または録音された複数の音声データが示す音信号Ａｋ（ｋ＝１〜ｎ）をミキシングする処理のいずれを行うかの判断は、ユーザがＵＩ２０４を介して、所定の操作コマンドを送信することにより決定してもよい。 (9) Whether to execute the controls described in (1) to (8) above may be decided by the user transmitting a predetermined operation command via the UI 204. Also, in the second embodiment, whether to perform processing that mixes a plurality of sequentially input sound signals Ak (k = 1 to n) in real time or processing that mixes the sound signals Ak (k = 1 to n) represented by a plurality of recorded audio data may be decided by the user transmitting a predetermined operation command via the UI 204.

（１０）第１実施形態および第２実施形態に示すミキシング装置は、クライアントサーバシステム（分散型コンピュータシステム）としてもよい。すなわち、クライアント側に集音器７−ｋ（ｋ＝１〜ｎ）およびＡ／Ｄ変換器８−ｋ（ｋ＝１〜ｎ）を設置し、歌声等の集音および音信号Ａｋ（ｋ＝１〜ｎ）のＡ／Ｄ変換を行わせる。そして、Ａ／Ｄ変換後の音信号Ａｋ（ｋ＝１〜ｎ）をサーバにアップロードし、サーバ側に設置されたＣＰＵ１にミキシング制御プログラム１００または２００を実行させる。そして、ミキシングが施された音信号Ｂｊ（ｊ＝１〜ｍ）をクライアント側にダウンロードする構成としてもよい。 (10) The mixing device described in the first and second embodiments may be implemented as a client-server system (distributed computer system). That is, the sound collectors 7-k (k = 1 to n) and the A/D converters 8-k (k = 1 to n) are installed on the client side, which collects singing voices and the like and performs A/D conversion of the sound signals Ak (k = 1 to n). The A/D-converted sound signals Ak (k = 1 to n) are then uploaded to the server, and the CPU 1 installed on the server side executes the mixing control program 100 or 200. The mixed sound signals Bj (j = 1 to m) may then be downloaded to the client side.

また、クライアント側で、集音器７−ｋ（ｋ＝１〜ｎ）による集音、Ａ／Ｄ変換器８−ｋ（ｋ＝１〜ｎ）によるＡ／Ｄ変換、分析部１０１および２０１による分析データの生成を行わせてもよい。この場合、Ａ／Ｄ変換後の音信号Ａｋ（ｋ＝１〜ｎ）および分析データをサーバにアップロードし、サーバ側に生成部１０２および２０２による制御データの生成、合成部１０３および２０３によるミキシングを行わせる。そして、ミキシングが施された音信号Ｂｊ（ｊ＝１〜ｍ）をクライアント側にダウンロードする構成としてもよい。 Alternatively, the client side may perform sound collection by the sound collectors 7-k (k = 1 to n), A/D conversion by the A/D converters 8-k (k = 1 to n), and generation of analysis data by the analysis units 101 and 201. In this case, the A/D-converted sound signals Ak (k = 1 to n) and the analysis data are uploaded to the server, and the server side generates control data with the generation units 102 and 202 and performs mixing with the synthesis units 103 and 203. The mixed sound signals Bj (j = 1 to m) may then be downloaded to the client side.

また、クライアント側で、集音器７−ｋ（ｋ＝１〜ｎ）による集音、Ａ／Ｄ変換器８−ｋ（ｋ＝１〜ｎ）によるＡ／Ｄ変換、分析部１０１および２０１による分析データの生成、生成部１０２および２０２による制御データの生成を行わせてもよい。この場合、Ａ／Ｄ変換後の音信号Ａｋ（ｋ＝１〜ｎ）および制御データをサーバにアップロードし、サーバ側に合成部１０３および２０３によるミキシングを行わせる。そして、ミキシングが施された音信号Ｂｊ（ｊ＝１〜ｍ）をクライアント側にダウンロードする構成としてもよい。 Further, on the client side, sound collection by the sound collector 7-k (k = 1 to n), A / D conversion by the A / D converter 8-k (k = 1 to n), analysis by the analysis units 101 and 201 The generation of analysis data and the generation of control data by the generation units 102 and 202 may be performed. In this case, the sound signal Ak (k = 1 to n) after A / D conversion and the control data are uploaded to the server, and the server side performs mixing by the synthesizing units 103 and 203. Then, the sound signal Bj (j = 1 to m) subjected to the mixing may be downloaded to the client side.

また、クライアントサーバシステムにした場合、サーバ側の処理結果を随時クライアント側でモニタリングできるようにすることで、クライアント側はサーバの処理能力に応じて処理量を調整することができる。 Further, in the case of a client server system, the client side can adjust the processing amount according to the processing capacity of the server by enabling the client side to monitor the processing result on the server side as needed.

（１１）上記各実施形態において、主音声順位が同順位の音信号Ａｋ（ｋ＝１〜ｎ）が複数ある場合、以下のような処理を実行してもよい。例えば、主音声順位が第１位の音信号が２つある場合、各々を同率１位とする。そして、第２位を欠番とし、他の音信号に第３位〜第ｎ位までの主音声順位を設定する。あるいは、第２位を欠番とせず、他の音信号Ａｋに第２位〜第ｎ−１位までの主音声順位を設定する。あるいは、主音声順位が第１位の音信号Ａｋ（ｋ＝１〜ｎ）が２つある場合、各々を同率１位とせず、添え字の番号ｋ（１〜ｎ）の小さい方の音信号の主音声順位を第１位、大きい方の音信号の主音声順位を第２位と設定してもよい。 (11) In each of the above embodiments, when a plurality of sound signals Ak (k = 1 to n) share the same main voice order, processing such as the following may be performed. For example, when two sound signals are ranked first, both are treated as tied for first place. Second place is then left vacant, and the other sound signals are assigned third through n-th place. Alternatively, second place is not left vacant, and the other sound signals Ak are assigned second through (n-1)-th place. Alternatively, when two sound signals Ak (k = 1 to n) are ranked first, they need not be treated as tied; instead, the sound signal with the smaller subscript k (1 to n) may be assigned first place and the one with the larger subscript second place.
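
The three tie-handling options in (11) correspond to well-known ranking schemes, sketched below in Python. The function names are hypothetical, and each function assumes a list of scores in which a higher score is better and position i holds the score of sound signal A(i+1).

```python
def competition_ranks(scores):
    """Tied signals share a place and the following place is left vacant (1, 1, 3)."""
    return [1 + sum(other > s for other in scores) for s in scores]

def dense_ranks(scores):
    """Tied signals share a place and no place is left vacant (1, 1, 2)."""
    distinct = sorted(set(scores), reverse=True)
    place = {value: i for i, value in enumerate(distinct, start=1)}
    return [place[s] for s in scores]

def index_tiebreak_ranks(scores):
    """Ties are broken in favor of the smaller subscript k (1, 2, 3)."""
    order = sorted(range(len(scores)), key=lambda i: (-scores[i], i))
    ranks = [0] * len(scores)
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank
    return ranks
```

For scores `[0.9, 0.9, 0.5, 0.7]` the three functions return `[1, 1, 4, 3]`, `[1, 1, 3, 2]`, and `[1, 2, 4, 3]` respectively.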

（１２）上記各実施形態では、歌唱音声信号をミキシングするミキシング装置にこの発明を適用したが、この発明は楽音信号をミキシングするミキシング装置や、歌唱音声信号と楽音信号をミキシングするミキシング装置にも適用可能である。 (12) In the above embodiments, the present invention is applied to a mixing device for mixing a singing voice signal, but the present invention is also applicable to a mixing device for mixing a musical tone signal and a mixing device for mixing a singing voice signal and a musical tone signal. It is applicable.

１…ＣＰＵ、２…ＲＯＭ、３…ＲＡＭ、４…表示部、５…操作部、６…データＩ／Ｏ、７−ｋ（ｋ＝１〜ｎ）…集音器、８−ｋ（ｋ＝１〜ｎ）…Ａ／Ｄ変換器、９−ｊ（ｊ＝１〜ｍ）…Ｄ／Ａ変換器、１０−ｊ（ｊ＝１〜ｍ）…増幅器、１１−ｊ（ｊ＝１〜ｍ）…拡声器、１２…バス、１０，２０…ミキシング装置、１００，２００…ミキシング制御プログラム、１０１，２０１…分析部、１０２，２０２…生成部、１０３，２０３…合成部、２０４…ＵＩ。 DESCRIPTION OF SYMBOLS 1 ... CPU, 2 ... ROM, 3 ... RAM, 4 ... display unit, 5 ... operation unit, 6 ... data I/O, 7-k (k = 1 to n) ... sound collector, 8-k (k = 1 to n) ... A/D converter, 9-j (j = 1 to m) ... D/A converter, 10-j (j = 1 to m) ... amplifier, 11-j (j = 1 to m) ... loudspeaker, 12 ... bus, 10, 20 ... mixing device, 100, 200 ... mixing control program, 101, 201 ... analysis unit, 102, 202 ... generation unit, 103, 203 ... synthesis unit, 204 ... UI.

Claims

Extracting a plurality of feature quantities from each sound signal in the plurality of sound signals;
The order of the plurality of sound signals is set based on a plurality of feature quantities extracted from each sound signal,
A sound signal control method, wherein control data for controlling the plurality of sound signals are generated based on the order set for each sound signal.

The sound signal control method according to claim 1, wherein the volume of each sound signal in the plurality of sound signals, the characteristic of an equalizer for processing each sound signal, the acoustic effect to be applied to each sound signal, or the localization position of each sound signal is controlled according to the control data corresponding to that sound signal.

The plurality of sound signals are sound signals of singing of a plurality of singers,
The sound signal control method according to claim 1 or 2, wherein display of the images of the plurality of singers is controlled according to the order of the sound signals corresponding to the plurality of singers.

A first feature amount and a second feature amount are extracted as the feature amounts from each sound signal in the plurality of sound signals,
The sound signal control method according to any one of claims 1 to 3, wherein the order of the plurality of sound signals is set based on the first feature amount and the second feature amount of each sound signal.

In setting the order of the plurality of sound signals,
A first rank is set for the first feature amount of each sound signal,
Setting a second rank for the second feature value of each sound signal;
The sound signal control method according to claim 4, wherein the order of each sound signal is set by performing weighted addition of the first order and the second order of each sound signal.

Extracting a plurality of feature quantities from each sound signal in sound signals of singing of a plurality of singers;
The order of the plurality of sound signals is set based on a plurality of feature quantities extracted from each sound signal,
A display control method of controlling display of images of the plurality of singers according to the order of sound signals corresponding to the respective singers.

Extract features from multiple sound signals,
The order of the plurality of sound signals is set based on the similarity between the feature amount of the plurality of sound signals and the feature amount of the sound signal as a reference,
A display control method for controlling display of images of a plurality of singers according to the order of sound signals corresponding to the singers.