JP6881488B2

JP6881488B2 - Sound signal control method and display control method

Info

Publication number: JP6881488B2
Application number: JP2019041824A
Authority: JP
Inventors: 嘉山 啓; 吉田 雅史; 浦谷 佳孝; 森 隆志; 国本 利文; 近藤 多伸; 大下 隼人; 橘 誠
Original assignee: Yamaha Corp
Current assignee: Yamaha Corp
Priority date: 2019-03-07
Filing date: 2019-03-07
Publication date: 2021-06-02
Anticipated expiration: 2034-10-17
Also published as: JP2019126076A

Description

The present invention relates to a sound signal control method and a display control method suitable for a mixing device or the like.

Mixing devices that mix a plurality of audio signals input via microphones or the like are known. To enhance the acoustic effect when the mixing result is emitted as sound, this type of mixing device may apply various kinds of acoustic processing, such as sound image localization processing, to each audio signal to be mixed.

Patent Document 1: Japanese Patent No. 4068069

When, for example, the singing voice signals of a plurality of singers are mixed, the state of each singing voice signal changes from moment to moment. To achieve a good acoustic effect, the acoustic processing applied to each singing voice signal must therefore be switched flexibly according to the state of each singing voice. Such switching is difficult, however, for anyone other than an expert accustomed to operating the mixing device.

The present invention has been made in view of the circumstances described above. Its object is to provide a mixing device that can apply appropriate acoustic processing to each of a plurality of sound signals according to the state of those signals and mix them, without requiring the user to perform complicated operations.

The present invention provides a mixing device comprising: an analysis unit that extracts a feature quantity from each of a plurality of sound signals and assigns a rank to each of the sound signals based on the extracted feature quantities; and a generation unit that generates, based on the ranks assigned to the sound signals, a plurality of control data items for controlling the acoustic processing applied to each of the sound signals.

With such a mixing device, when the feature quantities extracted from the sound signals change, the ranks assigned to the sound signals may change accordingly. In that case, the acoustic processing applied to each sound signal is controlled according to the ranks after the change. The content of the acoustic processing applied to each sound signal can therefore be controlled according to the state of the sound signals, which changes from moment to moment.

Patent Document 1 discloses a technique for controlling acoustic processing during mixing. In Patent Document 1, when the karaoke device recognizes that a singer is singing a certain vocal part along with the karaoke accompaniment, it lowers the playback volume of the backing chorus part that is mixed with that vocal part. The present invention, however, does not control the volume of the audio signal of another part based on the presence or absence of the audio signal of one part to be mixed, as Patent Document 1 does. Instead, it assigns ranks to a plurality of sound signals to be mixed based on their feature quantities and controls the acoustic processing applied to each sound signal according to those ranks. The present invention is thus entirely different from what is disclosed in Patent Document 1.

FIG. 1 is a block diagram showing the configuration of the mixing device 10 according to the first embodiment of the present invention.
FIG. 2 is a diagram for explaining the configuration of the mixing control program 100 executed by the CPU 1 in the first embodiment.
FIG. 3 is a flowchart showing the processing of the mixing control program 100.
FIG. 4 is a diagram for explaining the configuration of the mixing control program 200 executed by the CPU 1 in the mixing device 20 according to the second embodiment of the present invention.
FIG. 5 is a flowchart showing the processing of the mixing control program 200.

<First Embodiment>
FIG. 1 is a block diagram showing the configuration of a mixing device 10 according to a first embodiment of the present invention. The mixing device 10 shown in FIG. 1 comprises a CPU 1, a ROM 2, a RAM 3, a display unit 4, an operation unit 5, a data I/O 6, sound collectors 7-k (k = 1 to n), A/D converters 8-k (k = 1 to n), D/A converters 9-j (j = 1 to m), amplifiers 10-j (j = 1 to m), and loudspeakers 11-j (j = 1 to m). These components exchange data via a bus 12. The bus 12 is a collective term for an audio bus, a data bus, and the like.

The CPU 1 is a processor that controls the operation of the entire mixing device via the bus 12. The ROM 2 is a read-only memory that stores the program executed by the CPU 1 to control the basic operation of the mixing device 10 (hereinafter, the mixing control program). The RAM 3 is a volatile memory used by the CPU 1 as a work area. The display unit 4 is, for example, a liquid crystal display and its drive circuit, and displays various screens based on display control signals supplied from the CPU 1 via the bus 12. The operation unit 5 is a means by which the user inputs various kinds of information, and consists of a plurality of controls, a touch panel, and the like. The data I/O 6 is an interface that receives performance data in MIDI (Musical Instrument Digital Interface: registered trademark) format or waveform data in an audio format from the outside and outputs it as sound signals. The sound collectors 7-k (k = 1 to n) consist of n microphones or the like; they convert the input singing voices and other sounds into analog electrical signals and output them to the A/D converters 8-k (k = 1 to n). The A/D converters 8-k (k = 1 to n) convert the analog sound signals output from the sound collectors 7-k (k = 1 to n) into digital sound signals Ak (k = 1 to n). The D/A converters 9-j (j = 1 to m) convert the digital sound signals Bj (j = 1 to m) obtained as a result of the mixing process into analog sound signals. The amplifiers 10-j (j = 1 to m) amplify the analog sound signals output from the D/A converters 9-j (j = 1 to m). The loudspeakers 11-j (j = 1 to m) emit the analog sound signals output from the amplifiers 10-j (j = 1 to m) as sound.

FIG. 2 is a diagram for explaining the configuration of the mixing control program 100 executed by the CPU 1 in this embodiment. The mixing control program 100 includes an analysis unit 101, a generation unit 102, and a synthesis unit 103. The analysis unit 101 extracts a feature quantity from each of the sequentially input sound signals Ak (k = 1 to n) and, based on the extracted feature quantities, assigns a rank (hereinafter, the main voice rank) to each sound signal Ak (k = 1 to n). It then associates the assigned main voice ranks with the sound signals Ak (k = 1 to n) and outputs the result to the generation unit 102 as analysis data. In this embodiment, the analysis unit 101 extracts the volume of each sound signal Ak (k = 1 to n) as the feature quantity.

On receiving the analysis data, the generation unit 102 generates control data for controlling the acoustic processing applied to the sound signals Ak (k = 1 to n), in accordance with the main voice ranks assigned to them. In this embodiment, sound image localization processing is applied to the sound signals Ak (k = 1 to n) as the acoustic processing. The generation unit 102 therefore selects, according to the main voice ranks, the sound image position to be used in the localization processing of each sound signal Ak (k = 1 to n), and generates control data for localizing the sound image of each signal at the selected position. Suppose, for example, that sound signals A1 to A3 are input to the analysis unit 101 and that A1 is ranked third, A2 second, and A3 first. In this case, the generation unit 102 generates control data that localizes the sound image of A3 (ranked first) at the center, the most favored position; the sound image of A2 (ranked second) at the left, the next most favored position; and the sound image of A1 (ranked third) at the right, the least favored position. It outputs this control data to the synthesis unit 103. The sound image positions selected by the generation unit 102 are arbitrary, and various patterns are possible. In the above example, the sound image of A3 (ranked first) could instead be localized at the left, that of A2 (ranked second) at the right, and that of A1 (ranked third) at the center.
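
The rank-to-position selection in the example above can be sketched as a simple table lookup. This is only an illustrative sketch: the names `POSITIONS_BY_RANK` and `control_data_for_ranks`, the string labels for the positions, and the handling of ranks beyond the table are assumptions, not details given in the description.

```python
# Positions ordered from most to least favored, as in the A1-A3 example.
POSITIONS_BY_RANK = ["center", "left", "right"]


def control_data_for_ranks(ranks, positions=POSITIONS_BY_RANK):
    """Map {signal name: main voice rank} to {signal name: sound image position}."""
    control = {}
    for name, rank in ranks.items():
        # Assumption: ranks beyond the table reuse the least favored position.
        index = min(rank, len(positions)) - 1
        control[name] = positions[index]
    return control


ranks = {"A1": 3, "A2": 2, "A3": 1}
print(control_data_for_ranks(ranks))
# Alternative pattern mentioned in the text: first -> left, second -> right, third -> center.
print(control_data_for_ranks(ranks, ["left", "right", "center"]))
```

Passing a different position table is enough to switch between the localization patterns the text mentions.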

On receiving the control data from the generation unit 102, the synthesis unit 103 applies acoustic processing (in this example, sound image localization processing) to the sound signals Ak (k = 1 to n) in accordance with the control data, and mixes the resulting sound signals.

FIG. 3 is a flowchart showing the processing of the mixing control program 100 in this embodiment. The operation of this embodiment is described below with reference to FIG. 3. When a plurality of singers sing, their sound signals are input to the A/D converters 8-k (k = 1 to n) via the sound collectors 7-k (k = 1 to n). The sound signals Ak (k = 1 to n) obtained by A/D conversion in the A/D converters 8-k (k = 1 to n) are then input to the analysis unit 101 and the synthesis unit 103.

On receiving the sound signals Ak (k = 1 to n), the analysis unit 101 extracts the volume (the amplitude value of each sound signal Ak) as the feature quantity by the calculation procedure described below, and assigns main voice ranks to the sound signals Ak (k = 1 to n) based on the extracted volume (step SA1).

First, as the sound signals Ak (k = 1 to n) are sequentially input from the A/D converters 8-k (k = 1 to n), the analysis unit 101 extracts the amplitude values of each sound signal Ak over a predetermined time unit and calculates an amplitude envelope. The time unit may be a predetermined constant value, the playback time of the entire song being sung, the playback time of the first verse only, or the like. The time unit may also be set by VAD (Voice Activity Detection), by VAD combined with hangover processing, or by similar processing.
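
One way to picture the VAD-with-hangover option is the energy-based sketch below. The description does not say how VAD is implemented, so the frame energy measure, the threshold, and the names `frame_energies`, `vad_with_hangover`, `active_segments`, `energy_threshold`, and `hangover_frames` are all assumptions made for illustration.

```python
def frame_energies(samples, frame_len):
    """Mean squared amplitude of each non-overlapping frame."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energies.append(sum(x * x for x in frame) / frame_len)
    return energies


def vad_with_hangover(energies, energy_threshold, hangover_frames):
    """Return one flag per frame: True while voice is considered active.

    A frame is active when its energy reaches the threshold. After the last
    such frame, the decision stays active for `hangover_frames` more frames,
    so that short pauses do not split one phrase into many time units.
    """
    active = []
    remaining = 0
    for energy in energies:
        if energy >= energy_threshold:
            remaining = hangover_frames
            active.append(True)
        elif remaining > 0:
            remaining -= 1
            active.append(True)
        else:
            active.append(False)
    return active


def active_segments(active):
    """Convert per-frame flags into (first_frame, last_frame) time units."""
    segments = []
    start = None
    for i, flag in enumerate(active):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            segments.append((start, i - 1))
            start = None
    if start is not None:
        segments.append((start, len(active) - 1))
    return segments
```

Each returned segment could then serve as one time unit over which the amplitude envelope is computed; a longer hangover merges phrases separated by brief silences.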

Next, the analysis unit 101 smooths the calculated amplitude envelopes of the sound signals Ak (k = 1 to n) to remove noise superimposed on the envelope waveforms. The analysis unit 101 then identifies, among the amplitude envelopes of the sound signals Ak (k = 1 to n), the one containing the largest amplitude value, and normalizes the amplitude envelopes of the sound signals Ak (k = 1 to n) by dividing them by that maximum amplitude value. If a normalized amplitude envelope contains amplitude values below a predetermined threshold, the sound signal Ak is treated as not being input in the sections corresponding to those values.
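
The envelope, smoothing, normalization, and threshold steps above can be sketched as follows. The description does not fix the envelope detector or the smoothing filter, so the per-frame peak detector, the moving-average smoother, the use of `None` to mark "no input", and the function names are assumptions for illustration.

```python
def amplitude_envelope(samples, frame_len):
    """Peak absolute amplitude of each non-overlapping frame."""
    return [
        max(abs(x) for x in samples[i:i + frame_len])
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def smooth(envelope, width):
    """Centered moving average; the window is shortened at the edges."""
    half = width // 2
    out = []
    for i in range(len(envelope)):
        window = envelope[max(0, i - half):i + half + 1]
        out.append(sum(window) / len(window))
    return out


def normalize_envelopes(envelopes, silence_threshold):
    """Divide every envelope by the largest value found in any of them,
    then mark values below the threshold as 'no input' (None)."""
    peak = max((max(env) for env in envelopes if env), default=0.0)
    if peak == 0.0:
        return [[None] * len(env) for env in envelopes]
    normalized = []
    for env in envelopes:
        scaled = [v / peak for v in env]
        normalized.append([v if v >= silence_threshold else None for v in scaled])
    return normalized
```

Because all envelopes are divided by the same global maximum, their relative sizes are preserved, which is what the subsequent ranking step relies on.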

Next, the analysis unit 101 compares the normalized amplitude envelopes of the sound signals Ak (k = 1 to n) with one another and assigns main voice ranks in descending order of amplitude envelope value. That is, among the sound signals Ak (k = 1 to n), the signal with the largest amplitude envelope value is ranked first, the signal with the next largest value is ranked second, and so on, with the signal having the smallest amplitude envelope value ranked n-th. Consequently, the sound signal of the singer who sings loudest, which has the greatest presence among the voices of the singers, is ranked first. Conversely, the sound signal of the singer who sings most quietly, which has the least presence among the voices, is ranked n-th (last). Once the main voice ranks have been assigned, the analysis unit 101 associates them with the sound signals Ak (k = 1 to n) and outputs the result to the generation unit 102 as analysis data (step SA2).
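
The ranking step can be sketched as a descending sort. The sketch assumes that each sound signal has already been reduced to one representative envelope value per time unit; how that value is chosen, the function name `main_voice_ranks`, and the tie-breaking rule (the lower index wins) are assumptions not stated in the description.

```python
def main_voice_ranks(envelope_values):
    """Map each signal index to its main voice rank (1 = largest value).

    envelope_values[k] is the representative normalized envelope value of
    signal k. Ties keep their original order (assumed rule).
    """
    order = sorted(range(len(envelope_values)),
                   key=lambda k: envelope_values[k], reverse=True)
    return {k: rank for rank, k in enumerate(order, start=1)}


# Signals A1, A2, A3 at indices 0, 1, 2: A3 is loudest, A1 quietest.
print(main_voice_ranks([0.2, 0.6, 1.0]))
```

With these values, index 2 (A3) is ranked first and index 0 (A1) third, matching the A1 to A3 example used for the generation unit.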

On receiving the analysis data from the analysis unit 101, the generation unit 102 sets the control data for the sound image localization processing of the sound signals Ak (k = 1 to n) based on that data (step SA3). More specifically, the generation unit 102 refers to the analysis data and, for example, sets control data that localizes the sound image at the center for the sound signal with the highest main voice rank, and control data that localizes the sound image at the right, for example, for the sound signal with the lowest main voice rank. The generation unit 102 outputs the localization control data set for each sound signal Ak (k = 1 to n) to the synthesis unit 103.

On receiving the control data from the generation unit 102, the synthesis unit 103 applies sound image localization processing to the sound signals Ak (k = 1 to n) in accordance with the control data (step SA4). It then mixes the localized sound signals Ak (k = 1 to n) and outputs the mixing result, the sound signals Bj (j = 1 to m), to the D/A converters 9-j (j = 1 to m) (step SA5). When step SA5 is complete, processing returns to step SA1, and steps SA1 to SA5 described above are repeated.
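
Steps SA4 and SA5 can be illustrated for the two-output case (m = 2). The description does not specify how localization is realized, so the constant-power panning law, the pan values in `PAN`, and the names `pan_gains` and `localize_and_mix` are assumptions chosen only to make the sketch concrete.

```python
import math

# Assumed pan values for the three positions used in the examples.
PAN = {"left": -1.0, "center": 0.0, "right": 1.0}


def pan_gains(pan):
    """Constant-power stereo gains (left, right) for pan in [-1.0, 1.0]."""
    angle = (pan + 1.0) * math.pi / 4.0  # 0 (full left) .. pi/2 (full right)
    return math.cos(angle), math.sin(angle)


def localize_and_mix(signals, control):
    """Pan each signal according to its control data and sum the results.

    signals: {name: list of samples}; control: {name: position label}.
    Returns (B1, B2), the mixed left and right output signals.
    """
    length = max(len(samples) for samples in signals.values())
    left = [0.0] * length
    right = [0.0] * length
    for name, samples in signals.items():
        gain_l, gain_r = pan_gains(PAN[control[name]])
        for i, x in enumerate(samples):
            left[i] += gain_l * x
            right[i] += gain_r * x
    return left, right
```

In a real-time device this function would be called once per block of samples, with `control` refreshed each time the main voice ranks change.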

The D/A converters 9-j (j = 1 to m) convert the mixing result, the sound signals Bj (j = 1 to m), into analog sound signals and output them to the loudspeakers 11-j (j = 1 to m) via the amplifiers 10-j (j = 1 to m). The loudspeakers 11-j (j = 1 to m) emit the analog sound signals originating from the D/A converters 9-j (j = 1 to m) as sound from m speakers. As a result, the listener hears each sound signal Ak (k = 1 to n) as a sound whose image is localized at the position determined by the control data.

In this embodiment, the control data for the sound image localization processing of the sound signals Ak (k = 1 to n) is set according to the main voice ranks assigned to them, and the listener hears each sound with its image localized at the position determined by the control data. According to this embodiment, the localization applied to the sound signals Ak (k = 1 to n) can therefore be switched appropriately according to the state of the signals (here, their relative volume) without the user performing any complicated operation.

Furthermore, in this embodiment, main voice ranks are assigned to the sound signals Ak (k = 1 to n) according to their volume, and the localization control data is set so that the sound signal with the highest main voice rank is localized at the center and the sound signal with the lowest main voice rank is localized at the left or right. This embodiment can therefore motivate singers to sing loudly so that their own voice is localized at the center.

<Second Embodiment>
FIG. 4 is a diagram for explaining the configuration of the mixing control program 200 executed by the CPU 1 in the mixing device 20 according to a second embodiment of the present invention. The mixing control program 200 includes an analysis unit 201, a generation unit 202, a synthesis unit 203, and a UI (User Interface) 204. The mixing control program 200 in this embodiment performs mixing on sound signals Ak (k = 1 to n) obtained by playing back audio-format waveform data in which singing voices and the like have been recorded. That is, in addition to mixing a plurality of sound signals Ak (k = 1 to n) input in real time, as in the first embodiment, the mixing device 20 of this embodiment mixes a plurality of recorded audio data items and the like. The mixing control program 200 has the configuration of the mixing control program 100 of the first embodiment with the UI 204 added. In response to user operations, the UI 204 sends operation commands to the analysis unit 201, the generation unit 202, and the synthesis unit 203.

In this embodiment, the analysis unit 201 extracts not only the volume but also various other feature quantities, such as timbre, localization, pitch, and duration of the singing voice, from the sound signals Ak (k = 1 to n). The generation unit 202 sets not only the control data for sound image localization processing of the sound signals Ak (k = 1 to n) but also control data for various acoustic effects such as pitch, volume, and timbre. The synthesis unit 203 not only controls the sound image localization processing of the sound signals Ak (k = 1 to n) but also applies various other kinds of acoustic processing, such as pitch, volume, and timbre control, to them.

FIG. 5 is a flowchart showing the processing of the mixing control program 200 in this embodiment. The operation of this embodiment is described below with reference to FIG. 5. When the audio-format waveform data stored in the data I/O 6 is played back on instruction from the CPU 1, a plurality of sound signals Ak (k = 1 to n) are input to the analysis unit 201.

On receiving the sound signals Ak (k = 1 to n), the analysis unit 201 extracts feature quantities from them (step SB1). More specifically, the analysis unit 201 extracts one or more feature quantities from among timbre, localization, pitch, duration of the singing voice, and the like. The feature quantities to be extracted are selected by the user: the user sends an operation command corresponding to the desired feature quantities to the analysis unit 201 via the UI 204, and the analysis unit 201 extracts the one or more feature quantities specified by the user from the sound signals Ak (k = 1 to n).

After extracting one or more feature quantities from the sound signals Ak (k = 1 to n), the analysis unit 201 sets the main voice ranks of the sound signals Ak (k = 1 to n). When a plurality of feature quantities have been extracted, the analysis unit 201 integrates the main voice ranks set for the individual feature quantities by weighted addition. For example, when volume and timbre have been extracted as feature quantities, weights are applied to the main voice rank for volume and the main voice rank for timbre, and the main voice rank of each sound signal Ak (k = 1 to n) is calculated by weighted addition. The result of the weighted addition is used as the final main voice rank. The analysis unit 201 associates the calculated main voice ranks with the sound signals Ak (k = 1 to n) and outputs the result to the generation unit 202 as analysis data (step SB2). The weights are specified by the user by sending an operation command to the analysis unit 201 via the UI 204.
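
The weighted addition of per-feature ranks can be sketched as below. The description states only that the ranks are weighted and added; converting the weighted sums back into integer ranks 1 to n, as well as the name `combine_ranks` and the example weights, are assumptions made for illustration.

```python
def combine_ranks(ranks_by_feature, weights):
    """Integrate per-feature main voice ranks by weighted addition.

    ranks_by_feature: {feature: {signal: rank}}; weights: {feature: weight}.
    Returns {signal: final rank}, where 1 is the highest rank.
    """
    signals = list(next(iter(ranks_by_feature.values())).keys())
    score = {
        s: sum(weights[f] * ranks[s] for f, ranks in ranks_by_feature.items())
        for s in signals
    }
    # A smaller weighted sum of ranks means a better overall rank.
    # Assumption: the sums are re-ranked so the result is again 1..n.
    order = sorted(signals, key=lambda s: score[s])
    return {s: i for i, s in enumerate(order, start=1)}


ranks_by_feature = {
    "volume": {"A1": 1, "A2": 2, "A3": 3},
    "timbre": {"A1": 3, "A2": 1, "A3": 2},
}
print(combine_ranks(ranks_by_feature, {"volume": 0.7, "timbre": 0.3}))
print(combine_ranks(ranks_by_feature, {"volume": 0.3, "timbre": 0.7}))
```

Changing the weights, as the user would through the UI 204, changes which feature dominates the final ranking.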

The generation unit 202 sets control data for the acoustic processing to be applied to the sound signals Ak (k = 1 to n) in accordance with the main voice ranks assigned to them. Here, the generation unit 202 sets control data for one or more kinds of acoustic processing from among localization, volume, timbre control, and the like (step SB3).

For example, when volume is controlled as the acoustic processing, the generation unit 202 sets the volume control data so that the sound signal with the highest main voice rank has the largest volume and the sound signal with the lowest main voice rank has the smallest volume.
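
A minimal sketch of such volume control data follows. The description fixes only the two extremes (highest rank loudest, lowest rank quietest); the linear interpolation for intermediate ranks, the gain limits `max_gain` and `min_gain`, and the name `volume_control_data` are assumptions.

```python
def volume_control_data(ranks, max_gain=1.0, min_gain=0.25):
    """Map {signal: main voice rank} to {signal: linear gain}.

    Rank 1 receives max_gain and rank n receives min_gain; ranks in between
    are interpolated linearly (an assumed rule).
    """
    n = len(ranks)
    gains = {}
    for name, rank in ranks.items():
        if n == 1:
            gains[name] = max_gain
        else:
            t = (rank - 1) / (n - 1)  # 0.0 for rank 1, 1.0 for rank n
            gains[name] = max_gain + t * (min_gain - max_gain)
    return gains


print(volume_control_data({"A1": 3, "A2": 2, "A3": 1}))
```

The synthesis unit would multiply each sound signal by its gain before mixing.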

When timbre is controlled as the acoustic processing, the generation unit 202 sets equalizer control data so that the sound pressure level in the high-frequency range is emphasized for the sound signal with the highest main voice rank. Alternatively, the equalizer control data may be set so that the sound pressure level in the voice frequency band is emphasized for the sound signal with the highest main voice rank.

The acoustic processing applied to the sound signals Ak (k = 1 to n) is selected by the user. That is, the user sends an operation command designating the desired acoustic processing to the generation unit 202 via the UI 204. In response, the generation unit 202 sets control data for the one or more kinds of acoustic processing designated by the user, and sends the control data for the sound signals Ak (k = 1 to n) to the synthesis unit 203.

On receiving the control data from the generation unit 202, the synthesis unit 203 applies acoustic processing to the sound signals Ak (k = 1 to n) in accordance with the control data (step SB4). It then mixes the processed sound signals and outputs the mixing result, the sound signals Bj (j = 1 to m), to the data I/O 6 (step SB5). When step SB5 is complete, processing returns to step SB1, and steps SB1 to SB5 described above are repeated. On receiving the sound signals Bj (j = 1 to m), the data I/O 6 stores them in a memory (not shown) as audio-format waveform data.

In this embodiment, the main voice ranks of the sound signals Ak (k = 1 to n) are set based on one or more feature quantities extracted from them, and one or more acoustic effects are applied to the sound signals Ak (k = 1 to n) according to these ranks. This embodiment therefore makes it possible to apply a wide variety of acoustic processing that takes various characteristics of the sound signals Ak (k = 1 to n) into account.

In this embodiment, acoustic processing is applied to the sound signals Ak (k = 1 to n) obtained by playing back audio-format waveform data, and the results are mixed. A singer who records his or her own singing voice and posts it to a video-sharing site can therefore apply acoustic processing to the recorded voices and mix them without complicated operations.

Furthermore, according to this embodiment, by having the synthesis unit 203 control the sound image localization processing of the sound signals Ak (k = 1 to n), the voice of the singer who sings best can be localized at the center and the voices of singers who sing less well can be localized at the left or right. Singers can thus be motivated to improve their singing so that the sound image of their own voice is localized at the center.

<Other Embodiments>
Although various embodiments of the present invention have been described above, other embodiments are also conceivable.

(1) In the first embodiment, the synthesis unit 103 localized each sound image at a predetermined position in the horizontal direction by controlling the sound image localization processing applied to the sound signals Ak (k = 1 to n) according to the control data. However, the generation unit 102 may instead generate control data such that the sound image is localized at a predetermined position in the vertical direction.

(2) In the second embodiment, the analysis unit 201 may extract feature quantities from the sound signals Ak (k = 1 to n) over the entire reproduction section of the audio-format waveform data and set a main voice rank for each sound signal Ak (k = 1 to n). The generation unit 202 may set the control data for the acoustic effects to be applied to the sound signals Ak (k = 1 to n) according to these main voice ranks. The synthesis unit 203 may then apply acoustic processing to the sound signals Ak (k = 1 to n) based on this control data. This makes it possible to apply acoustic processing that takes into account the musical characteristics of the sound signals Ak (k = 1 to n) as a whole.

(3) In the second embodiment, the analysis unit 201 may determine the main voice ranks of the sound signals Ak (k = 1 to n) based on the similarity between the feature quantities extracted from the sound signals Ak (k = 1 to n) and the feature quantities extracted from model data. Here, model data refers to, for example, the singing voice of a model vocal or model chorus, performance data in MIDI format, or score data. The feature quantities extracted from the model vocal or model chorus may be one or more of various feature quantities such as volume, pitch, and duration of the singing voice. In this case, the feature quantities to be extracted by the analysis unit 201 are specified by the user sending a predetermined operation command via the UI 204. The analysis unit 201 assigns first place in the main voice ranking to the sound signal whose extracted feature quantities are most similar to those extracted from the model data, and nth place to the sound signal whose feature quantities are least similar.
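A minimal Python sketch of such similarity-based ranking is shown below. The text does not specify how similarity is measured; Euclidean distance between feature vectors is used here purely as an assumption, and `rank_by_similarity` is a hypothetical name.

```python
import math


def rank_by_similarity(features, reference):
    """Return 1-based main voice ranks, one per sound signal.

    features:  one feature vector per sound signal Ak
    reference: the feature vector extracted from the model data
    Assumption: a smaller Euclidean distance means a higher similarity.
    """
    distances = [math.dist(f, reference) for f in features]
    # Signal indices ordered from most similar to least similar.
    order = sorted(range(len(features)), key=lambda k: distances[k])
    ranks = [0] * len(features)
    for position, k in enumerate(order, start=1):
        ranks[k] = position
    return ranks


# Hypothetical two-dimensional feature vectors for three signals.
ranks = rank_by_similarity([[1.0, 0.0], [5.0, 5.0], [0.0, 2.0]], [0.0, 0.0])
```

In practice the feature quantities (volume, pitch, duration) have different units, so some normalization would likely be needed before a distance is meaningful; this is omitted here.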

(4) In the second embodiment, the synthesis unit 203 may correct the feature quantities of the sound signals Ak (k = 1 to n) using feature quantities extracted from model data as a reference. Here, model data refers to, for example, performance data in MIDI format or the singing voice of a model vocal. For example, in a given performance section, the synthesis unit 203 corrects the pitch curve of the sound signals Ak (k = 1 to n) in that section using, as a reference, the pitch curve data acquired by the analysis unit 201 from the MIDI-format performance data. The synthesis unit 203 also corrects the articulation (for example, volume and phoneme transition time) of the sound signals Ak (k = 1 to n) in a given performance section using, as a reference, the velocity (loudness) data acquired by the analysis unit 201 from the MIDI-format performance data. Likewise, the synthesis unit 203 corrects the vibrato of the sound signals Ak (k = 1 to n) in a given performance section using, as a reference, the vibrato data (for example, pitch variation and volume variation) acquired by the analysis unit 201 from the MIDI-format performance data. The feature quantities to be acquired from the model data are set by the user sending a predetermined operation command via the UI 204.

The synthesis unit 203 may also correct the voice quality of the singing voice represented by the sound signals Ak (k = 1 to n) using, as a reference, the voice quality extracted from the singing voice of the model vocal.

(5) In the second embodiment, the synthesis unit 203 may generate new waveform data based on the feature quantities extracted from the model data and the feature quantities extracted from the sound signals Ak (k = 1 to n), and mix a sound signal Ak (k = 1 to n) composed of that waveform data with the sound signals Ak (k = 1 to n). For example, the analysis unit 201 extracts feature quantities such as the pitch curve, the chord progression information of the piece, and the diatonic scale from the MIDI-format performance data. The synthesis unit 203 generates waveforms such as chorus voices and doubling voices so that these feature quantities harmonize with the feature quantities extracted from the sound signals Ak (k = 1 to n). Then, by mixing the sound signals Ak (k = 1 to n) represented by the generated chorus or doubling voices with each input sound signal Ak (k = 1 to n), the chorus or doubling voices are superimposed on the voices represented by the sound signals Ak (k = 1 to n). The feature quantities to be extracted from the model data are set by the user sending a predetermined operation command via the UI 204.

(6) In the second embodiment, the synthesis unit 203 may, based on feature quantities extracted from the sound signals Ak (k = 1 to n), process the feature quantities of the sound signal Ak (k = 1 to n) of the part from which those feature quantities were obtained, or of another part.

For example, based on pitch curve data extracted by the analysis unit 201 from the sound signals Ak (k = 1 to n), the synthesis unit 203 processes the pitch curve of the part from which the pitch curve data was extracted. This allows the characteristics of that part's pitch curve to be changed by an appropriate amount. Based on the same pitch curve data, the synthesis unit 203 may also process the pitch curve of a part other than the one from which the data was extracted. This allows the pitch curve characteristics of one part to be imparted to other parts.

Similarly, based on velocity data extracted by the analysis unit 201 from the sound signals Ak (k = 1 to n), the synthesis unit 203 processes the articulation of the part from which the velocity data was extracted. This allows the articulation characteristics of that part to be changed by an appropriate amount. Based on the same velocity data, the synthesis unit 203 may also process the articulation of a part other than the one from which the data was extracted. This allows the articulation characteristics of one part to be imparted to other parts.

Based on vibrato data extracted by the analysis unit 201 from the sound signals Ak (k = 1 to n), the synthesis unit 203 processes the vibrato of the part from which the vibrato data was extracted. This allows the vibrato characteristics of that part to be changed by an appropriate amount. Based on the same vibrato data, the synthesis unit 203 may also process the vibrato of a part other than the one from which the data was extracted. This allows the vibrato characteristics of one part to be imparted to other parts.

Based on a singer's voice quality data acquired by the analysis unit 201 from the sound signals Ak (k = 1 to n), the synthesis unit 203 processes the voice quality of the part from which the voice quality data was acquired. This allows the voice quality characteristics of that part to be changed by an appropriate amount. Based on the same voice quality data, the synthesis unit 203 may also process the voice quality of a part other than the one from which the data was extracted. This allows the voice quality characteristics of one part to be imparted to other parts.

The feature quantities that the analysis unit 201 extracts from the sound signals Ak (k = 1 to n) are set by the user sending a predetermined operation command via the UI 204. The part to be processed by the synthesis unit 203 is likewise set by the user sending a predetermined operation command via the UI 204.

(7) In the second embodiment, the synthesis unit 203 may set predetermined sections based on feature quantities extracted from the model data and output the mixed sound signals Ak (k = 1 to n) only in those sections. For example, the synthesis unit 203 extracts various feature quantities from model data such as MIDI-format performance data and identifies sections such as the section from the start of singing through the verse (A melody) to the chorus, the section from the start of singing to around the maximum volume, and the section from the start of singing to around the minimum volume. It then outputs the mixed sound signals Ak (k = 1 to n) only in these specified sections.

The synthesis unit 203 may also create a digest in which the plurality of set sections are connected in chronological order, and output the sound signals Ak (k = 1 to n) mixed sequentially according to this digest. In this case, the time length of the digest may be made changeable as appropriate in consideration of network congestion and the like. These sections and the time length of the digest are set by the user sending a predetermined operation command via the UI 204.
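The digest construction might look like the following Python sketch. The text only says that sections are connected in chronological order and that the digest length may be changed; the truncation policy used here (keep the earliest sections and cut the last one short once the length budget is used up) and the name `build_digest` are assumptions.

```python
def build_digest(sections, max_length):
    """Connect (start, end) sections, in seconds, in chronological order.

    The total duration of the returned sections never exceeds max_length.
    Assumed policy: earlier sections take priority, and the section that
    crosses the budget is shortened rather than dropped.
    """
    digest = []
    remaining = max_length
    for start, end in sorted(sections):
        if remaining <= 0:
            break
        length = min(end - start, remaining)
        if length > 0:
            digest.append((start, start + length))
            remaining -= length
    return digest


# Two hypothetical sections and a 30-second digest budget.
digest = build_digest([(60.0, 90.0), (0.0, 20.0)], 30.0)
```

A shorter `max_length` could be chosen when the network is congested, which corresponds to the adjustable digest length mentioned above.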

(8) In the first and second embodiments, the synthesis units 103 and 203 may output a display control signal for displaying images of the singers and the like on the display unit 4 or another display means according to the main voice rank. In this case, the image of the singer with the highest main voice rank is displayed at the center of the display unit 4 or other display means, and the image of the singer with the lowest main voice rank is displayed small at the left or right of the display unit 4 or other display means. This can motivate singers to improve their singing ability in order to have their own image displayed at the center.

(9) Whether to execute the controls described in (1) to (8) above may be determined by the user sending a predetermined operation command via the UI 204. Also, in the second embodiment, whether to perform processing that mixes a plurality of sequentially input sound signals Ak (k = 1 to n) in real time, or processing that mixes the sound signals Ak (k = 1 to n) represented by a plurality of recorded audio data, may be determined by the user sending a predetermined operation command via the UI 204.

(10) The mixing device described in the first and second embodiments may be implemented as a client-server system (a distributed computer system). That is, the sound collectors 7-k (k = 1 to n) and the A/D converters 8-k (k = 1 to n) are installed on the client side, where singing voices and the like are collected and the sound signals Ak (k = 1 to n) are A/D converted. The A/D-converted sound signals Ak (k = 1 to n) are then uploaded to the server, and the CPU 1 installed on the server side executes the mixing control program 100 or 200. The mixed sound signals Bj (j = 1 to m) may then be downloaded to the client side.

Alternatively, the client side may perform sound collection by the sound collectors 7-k (k = 1 to n), A/D conversion by the A/D converters 8-k (k = 1 to n), and generation of analysis data by the analysis units 101 and 201. In this case, the A/D-converted sound signals Ak (k = 1 to n) and the analysis data are uploaded to the server, and the server side performs generation of control data by the generation units 102 and 202 and mixing by the synthesis units 103 and 203. The mixed sound signals Bj (j = 1 to m) may then be downloaded to the client side.

Alternatively, the client side may perform sound collection by the sound collectors 7-k (k = 1 to n), A/D conversion by the A/D converters 8-k (k = 1 to n), generation of analysis data by the analysis units 101 and 201, and generation of control data by the generation units 102 and 202. In this case, the A/D-converted sound signals Ak (k = 1 to n) and the control data are uploaded to the server, and the server side performs mixing by the synthesis units 103 and 203. The mixed sound signals Bj (j = 1 to m) may then be downloaded to the client side.

When a client-server system is used, allowing the client side to monitor the server-side processing results at any time lets the client side adjust its processing load according to the processing capacity of the server.

(11) In each of the above embodiments, when a plurality of sound signals Ak (k = 1 to n) share the same main voice rank, the following processing may be executed. For example, when two sound signals rank first, both are treated as tied for first place. Second place is then left vacant, and the other sound signals are assigned main voice ranks from third to nth place. Alternatively, second place is not left vacant, and the other sound signals Ak are assigned main voice ranks from second to (n-1)th place. As a further alternative, when two sound signals Ak (k = 1 to n) rank first, they need not be treated as tied; instead, the sound signal with the smaller subscript k (1 to n) may be assigned first place and the one with the larger subscript second place.
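The three tie-handling alternatives in (11) can be sketched in Python as follows. The input is assumed to be a list of scores in which a larger score means a higher main voice rank; the score representation and the function names are hypothetical.

```python
def ranks_with_gap(scores):
    """Tied signals share a place and the following places are left vacant."""
    # Place = 1 + number of signals with a strictly higher score.
    return [1 + sum(other > s for other in scores) for s in scores]


def ranks_without_gap(scores):
    """Tied signals share a place and no place is left vacant."""
    distinct = sorted(set(scores), reverse=True)
    place = {s: i + 1 for i, s in enumerate(distinct)}
    return [place[s] for s in scores]


def ranks_by_subscript(scores):
    """Ties are broken in favor of the signal with the smaller subscript k."""
    order = sorted(range(len(scores)), key=lambda k: (-scores[k], k))
    ranks = [0] * len(scores)
    for position, k in enumerate(order, start=1):
        ranks[k] = position
    return ranks


# Hypothetical scores for A1..A4, where A1 and A3 tie for first place.
scores = [9, 7, 9, 5]
with_gap = ranks_with_gap(scores)         # [1, 3, 1, 4]
without_gap = ranks_without_gap(scores)   # [1, 2, 1, 3]
by_subscript = ranks_by_subscript(scores) # [1, 3, 2, 4]
```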

(12) In each of the above embodiments, the present invention was applied to a mixing device that mixes singing voice signals, but the invention is also applicable to a mixing device that mixes musical tone signals and to a mixing device that mixes singing voice signals with musical tone signals.

1 ... CPU, 2 ... ROM, 3 ... RAM, 4 ... display unit, 5 ... operation unit, 6 ... data I/O, 7-k (k = 1 to n) ... sound collector, 8-k (k = 1 to n) ... A/D converter, 9-j (j = 1 to m) ... D/A converter, 10-j (j = 1 to m) ... amplifier, 11-j (j = 1 to m) ... loudspeaker, 12 ... bus, 10, 20 ... mixing device, 100, 200 ... mixing control program, 101, 201 ... analysis unit, 102, 202 ... generation unit, 103, 203 ... synthesis unit, 204 ... UI.

Claims

A sound signal control method comprising:
extracting a plurality of feature quantities from each of a plurality of sound signals;
setting, for each sound signal, a main voice rank indicating the rank of that sound signal among the plurality of sound signals, based on the plurality of feature quantities extracted from that sound signal; and
generating control data for controlling each of the plurality of sound signals based on the rank indicated by the main voice rank set for that sound signal.

The sound signal control method according to claim 1, wherein the volume of each of the plurality of sound signals, the characteristics of an equalizer that processes each sound signal, the acoustic effect applied to each sound signal, or the localization position of each sound signal is controlled according to the control data corresponding to that sound signal.

The sound signal control method according to claim 1 or 2, wherein the plurality of sound signals are sound signals of singing by a plurality of singers, and
the display of an image of each of the plurality of singers is controlled according to the rank indicated by the main voice rank of the sound signal corresponding to that singer.

The sound signal control method according to any one of claims 1 to 3, wherein a first feature quantity and a second feature quantity are extracted as the feature quantities from each of the plurality of sound signals, and
the main voice rank of each of the plurality of sound signals is set based on the first feature quantities and the second feature quantities of the plurality of sound signals.

The sound signal control method according to claim 4, wherein the main voice rank of each sound signal takes a larger value as the rank of that sound signal among the plurality of sound signals is higher, and
setting the main voice ranks of the plurality of sound signals comprises:
setting a first main voice rank for the first feature quantity of each sound signal;
setting a second main voice rank for the second feature quantity of each sound signal; and
setting the main voice rank of each sound signal by weighted addition of the first main voice rank and the second main voice rank of that sound signal.