JP2011085731A

JP2011085731A - Musical signal processing device and program

Info

Publication number: JP2011085731A
Application number: JP2009238083A
Authority: JP
Inventors: Motoaki Takashima; 基明高島
Original assignee: Yamaha Corp
Current assignee: Yamaha Corp
Priority date: 2009-10-15
Filing date: 2009-10-15
Publication date: 2011-04-28
Anticipated expiration: 2029-10-15
Also published as: JP5703555B2

Abstract

PROBLEM TO BE SOLVED: To generate a lead tone together with an additional tone that is stable and free of pitch wobble, even when the input musical signal changes pitch many times within a short period.

SOLUTION: Two similar but independently structured control paths are provided. The first comprises a first pitch detecting means that detects a pitch corresponding to a note name, and a first musical tone generating means that generates a first musical tone signal pitch-controlled to the detected pitch. The second comprises a second pitch detecting means that detects a pitch corresponding to a note name, and a second musical tone generating means that generates a second musical tone signal pitch-controlled to an arbitrary pitch determined from the detected pitch. The two paths are controlled differently according to control information. As a result, when a musical signal whose pitch fluctuates up and down is input, the lead tone is generated with its pitch controlled by that signal, while the additional tone that follows the pitch changes of the lead tone is generated as a stable tone without pitch wobble.

COPYRIGHT: (C)2011, JPO&INPIT

Description

The present invention relates to a musical tone signal processing device and program that generate a lead sound whose pitch is controlled according to the pitch of an input musical sound, voice, or the like, and that also generate an additional sound whose pitch is controlled so as to follow the pitch variation of the generated lead sound. In particular, it relates to a technique for generating an additional sound that has no pitch wobble and sounds calm and stable even when the input musical sound or voice changes pitch many times within a short time.

Musical tone signal processing devices and programs with the following tone generation function are conventionally known. They detect the pitch of an input musical tone signal such as a musical sound or voice (ultimately, a specific pitch corresponding to one of the musical note names) and generate a lead-sound musical tone signal (first musical tone signal) at the detected pitch. They also determine a separate new pitch (likewise a specific pitch corresponding to one of the musical note names) from the detected pitch and chord information input from a keyboard or the like, and automatically generate a harmony-sound musical tone signal (second musical tone signal) at the determined pitch as an independent additional sound that takes the generated lead sound as its main sound. One example is the device described in Patent Document 1 below. In this specification, the term "musical tone signal" is not limited to signals of musical sounds; it may also cover voice or any other sound signal.

The conventionally known tone generation procedure used in devices such as the one described in Patent Document 1 is explained here. FIG. 4 is a conceptual diagram for explaining this conventional procedure. The left side of FIG. 4 shows the sequence of processing steps executed in the device, and the right side shows how the signal waveform changes as each step is executed, with frequency on the vertical axis and time on the horizontal axis. FIG. 5 is a conceptual diagram showing the data structure of a conventionally known pitch determination table, which is referred to when the pitch of the harmony sound is determined as described later.

First, an audio signal input via a microphone or the like is converted into a frequency signal by "frequency detection" processing. Any known technique may be used for this step, for example the zero-cross method, which is well known in the field of speech analysis, so a detailed description is omitted here. Next, "flattening" processing flattens (also called smoothing) the changes in the frequency signal. The flattened frequency signal is then discretized, at predetermined time intervals, into one of the note names of the 12-tone scale by "note-name detection" processing. Specifically, the flattened frequency signal is rounded to a predetermined pitch corresponding to one of the musical note names defined in semitone (100-cent) steps; the result is called the note-name signal. In this way the pitch of the input audio signal is detected. "Convergence curve" processing turns the note-name signal into a signal that changes continuously in time, and "output modulation" processing applied to this continuously changing signal produces the lead-sound output signal, in which the pitch of the input audio signal has been modulated. Note that this example shows the case in which the detected pitch of the audio signal is itself used as the pitch of the lead sound.
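The rounding step in "note-name detection" can be illustrated with a minimal Python sketch. It assumes a reference tuning of A4 = 440 Hz and an octave numbering in which MIDI note 60 is C4; neither is stated in the specification, and the function name `nearest_note` is hypothetical.

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]
A4_HZ = 440.0   # assumed reference tuning (not given in the specification)
A4_MIDI = 69    # MIDI note number of A4

def nearest_note(freq_hz):
    """Round a (flattened) frequency to the nearest semitone.

    Returns (MIDI note number, note name with octave).
    """
    # Distance from A4 in cents; one semitone is 100 cents.
    cents_from_a4 = 1200.0 * math.log2(freq_hz / A4_HZ)
    midi = A4_MIDI + round(cents_from_a4 / 100.0)
    octave = midi // 12 - 1  # assumed convention: MIDI 60 -> C4
    return midi, f"{NOTE_NAMES[midi % 12]}{octave}"
```

Under these assumptions, a flattened frequency of 167 Hz lies closer to E3 than to F3 and is therefore reported as E3.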

When a harmony sound is to be added, on the other hand, one of the note names of the 12-tone scale is determined at the same predetermined time intervals according to a pitch determination table prepared in advance, such as the one shown in FIG. 5. The determination is based on the pitch detection result of the audio signal obtained in the "note-name detection" processing (or the lead-sound pitch determined from that result) and on chord information input from a keyboard or the like. A plurality of pitch determination tables, one per chord, are stored in advance in ROM, RAM, or the like, and the one table corresponding to the chord information is selected. Only the table for the "C major" chord is shown here as an example. Immediately upon detection of the pitch of the input audio signal, which acts as the trigger, the selected table is looked up with the pitch detection result, a specific pitch corresponding to one of the musical note names is determined as the pitch of the harmony sound, and a harmony sound pitch-controlled to that pitch is generated. In the pitch determination table of FIG. 5, "E(0)" denotes the note E in the same octave as the detected lead-sound pitch, and "C(+1)" denotes the note C one octave above the octave of the detected lead-sound pitch. Therefore, when the pitch of the lead sound is "E3", for example, the pitch of the first harmony sound is determined as "G3" and the pitch of the second harmony sound as "C4".
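The table lookup can be sketched in Python as follows. Only the row for a lead note E is taken from the text (E3 gives G3 and C4); the remaining rows of FIG. 5 are not reproduced here, and the names `PITCH_TABLES` and `harmony_pitches` are hypothetical.

```python
# One table per chord. Each row maps the lead note name to a list of
# (harmony note name, octave offset) pairs, mirroring the "G(0)" / "C(+1)"
# notation of FIG. 5. Only the "E" row is stated in the text; the other
# rows are deliberately left out rather than guessed.
PITCH_TABLES = {
    "C major": {
        "E": [("G", 0), ("C", +1)],
    },
}

def harmony_pitches(lead_note, chord):
    """Return the harmony pitches for a lead note such as "E3".

    Assumes a single-digit octave number at the end of the note string.
    """
    name, octave = lead_note[:-1], int(lead_note[-1])
    table = PITCH_TABLES[chord]  # select the table from the chord information
    return [f"{note}{octave + offset}" for note, offset in table[name]]

first, second = harmony_pitches("E3", "C major")
```

Here `first` is the first harmony pitch and `second` the second harmony pitch for the lead note E3.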

The note-name signal obtained in this way consists of pitches corresponding to note names of the 12-tone scale determined according to a pitch determination table such as the one in FIG. 5. "Convergence curve" processing and "output modulation" processing are applied to it in sequence, in the same way as for the lead sound, to generate the output signals of one or more harmony sounds. The note-on timing of the lead sound and harmony sounds generated as described above is the moment at which the pitch of the input audio signal is detected, and the note-off timing is the moment at which the pitch of the input audio signal is no longer detected.
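The note-on and note-off rule above can be sketched in Python. The sketch assumes the detector reports one result per analysis frame, with `None` meaning that no pitch was detected; the frame representation and the function name `note_events` are assumptions, not part of the specification.

```python
def note_events(detected):
    """Derive note-on / note-off events from per-frame pitch detection results.

    detected: list with one entry per frame, a note name or None (no pitch).
    A pitch change while a note is sounding is not treated as a new note here;
    it would be handled by the pitch control of the sounding note.
    """
    events = []
    current = None  # pitch currently sounding, or None if silent
    for frame, pitch in enumerate(detected):
        if pitch is not None and current is None:
            events.append((frame, "note_on", pitch))     # pitch newly detected
        elif pitch is None and current is not None:
            events.append((frame, "note_off", current))  # pitch no longer detected
        current = pitch
    return events

events = note_events([None, "E3", "E3", "F3", None, None, "G3"])
```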

Japanese Patent Laid-Open No. 11-133954 (特開平11-133954号公報)

As described above, the conventional device determines the pitch of the harmony sound from the pitch detection result of the input audio signal (that is, from the pitch of the lead sound), so the pitch of the harmony sound depends on the pitch of the lead sound. Consequently, if an audio signal whose pitch wobbles up and down, as in vibrato, is input within a short time, such as the interval between the detection of one vowel and the detection of the next, a harmony sound may be generated whose pitch wobbles continuously and more widely than that of the lead sound. Such a harmony sound is restless and hard to listen to, which is a problem.

For example, according to the pitch determination table of FIG. 5, when the input audio signal (and hence the lead sound) is a vibrato that wobbles between the pitches "E3" and "F3", the first harmony sound becomes an output signal that changes continuously, wobbling between the pitches "G3" and "C4". In other words, although the input audio signal wobbles over only a semitone, the added harmony sound repeatedly jumps with a pitch swing as large as 4 semitones within a short time. A harmony sound like this is hardly usable as an expressive accompaniment to vibrato.

Another conceivable way to avoid this problem is to reduce the frequency with which the pitch of the input audio signal is detected. However, both the lead sound and the harmony sound are generated in response to pitch detection of the audio signal, so reducing the detection frequency reduces how often the lead sound itself is generated, not just the harmony sound. The musical individuality and expressiveness of the input audio signal may then be lost, so this approach is also unsatisfactory.

The present invention has been made in view of the above points. It aims to provide a musical tone signal processing device and program that, even when the pitch of the input musical tone signal varies many times within a short time, generate a lead sound reflecting the musical individuality and expressiveness of the input signal and also generate an additional sound that has no pitch wobble and sounds calm and stable.

The musical tone signal processing device according to the present invention generates a first musical tone signal whose pitch is controlled based on an input musical tone signal, and generates a second musical tone signal whose pitch is controlled so as to follow the pitch variation of the first musical tone signal. The device comprises: input means for inputting a musical tone signal; analysis means for frequency-analyzing the input musical tone signal; first pitch detection means for flattening the changes in the frequency signal obtained by the frequency analysis and sequentially detecting, according to the flattened frequency signal, one of the pitches corresponding to note names; first musical tone generation means for generating a first musical tone signal pitch-controlled to the pitch detected by the first pitch detection means; designation means for designating control information; second pitch detection means for flattening the changes in the frequency signal obtained by the frequency analysis and sequentially detecting, according to the flattened frequency signal, one of the pitches corresponding to note names, the second pitch detection means being controlled based on the control information; second musical tone generation means for generating a second musical tone signal pitch-controlled to an arbitrary pitch determined based on the pitch detected by the second pitch detection means; and output means for outputting at least one of the generated first and second musical tone signals. The device is characterized in that the second pitch detection means is controlled, based on the control information and independently of the first pitch detection means, so as to sequentially detect pitches whose changes over time differ from the pitch changes that follow the pitches detected by the first pitch detection means.

According to the present invention, two groups of means are configured separately. One group consists of the first pitch detection means, which detects a pitch corresponding to a note name, and the first musical tone generation means, which generates a first musical tone signal pitch-controlled to the pitch detected by the first pitch detection means. The other group consists of the second pitch detection means, which detects a pitch corresponding to a note name, and the second musical tone generation means, which generates a second musical tone signal pitch-controlled to an arbitrary pitch determined based on the pitch detected by the second pitch detection means. The second pitch detection means is made to perform control different from that of the first pitch detection means, based on the control information. That is, the means for generating the first musical tone signal and the means for generating the second musical tone signal perform substantially the same control but are provided separately as independent structures, and each is made to perform different control according to the control information. In particular, the second pitch detection means is controlled, based on the control information, so as to sequentially detect pitches whose changes over time differ from the pitch changes that follow the pitches detected by the first pitch detection means. As a result, even when a musical tone signal whose pitch changes while wobbling up and down, as in vibrato, is input, the first musical tone signal is generated with its pitch controlled based on the input signal, and the second musical tone signal, whose pitch is controlled so as to follow the pitch variation of the first musical tone signal, can be generated separately as a calm tone that sounds stable.

The present invention can be configured and carried out not only as a device invention but also as a method invention. It can also be carried out in the form of a program for a processor such as a computer or a DSP, or in the form of a storage medium storing such a program.

According to the present invention, the first musical tone signal generation means and the second musical tone signal generation means are configured separately, and each can be controlled separately so as to generate the first and second musical tone signals from the input musical tone signal. Therefore, even when an audio signal whose pitch changes while wobbling up and down, as in vibrato, is input, a calm second musical tone signal that sounds stable can be generated independently of the first musical tone signal, which reflects that pitch wobble.

A further advantage is that the pitch detection frequency for the input musical tone signal need not be reduced, so the first musical tone signal is generated as often as before and the musical individuality and expressiveness of the input musical tone signal are not lost.

FIG. 1 is a hardware block diagram showing an embodiment of the overall configuration of the musical tone signal processing device according to the present invention.
FIG. 2 is a functional block diagram for explaining the tone generation function of the musical tone signal processing device according to the present invention.
FIG. 3 is a conceptual diagram showing the frequency change of an audio signal, for explaining note-name detection.
FIG. 4 is a conceptual diagram for explaining a conventionally known tone generation procedure.
FIG. 5 is a conceptual diagram showing the data structure of a conventionally known pitch determination table.

Embodiments of the present invention are described in detail below with reference to the accompanying drawings.

FIG. 1 is a hardware block diagram showing an embodiment of the overall configuration of the musical tone signal processing device according to the present invention. The device of this embodiment is controlled by a microcomputer consisting of a microprocessor unit (CPU) 1, a read-only memory (ROM) 2, and a random access memory (RAM) 3. The CPU 1 controls the operation of the entire device. The ROM 2, the RAM 3, an input operation unit 4, a display unit 5, a sound source 6, a communication interface (I/F) 7, and a storage device 8 are each connected to the CPU 1 via a communication bus 1D (for example, a data and address bus).

The ROM 2 stores the various control programs executed or referred to by the CPU 1 and various data, such as the pitch determination table shown in FIG. 5. The RAM 3 is used as a working memory that temporarily stores data generated while the CPU 1 executes a given control program, or as a memory that temporarily stores the control program currently being executed and its related data. Predetermined address areas of the RAM 3 are assigned to the respective functions and used as registers, flags, tables, memories, and so on.

The input operation unit 4 includes, for example, an input device such as a microphone for inputting audio signals such as a human voice, and various operators such as a start/stop button for instructing the start and stop of automatic harmony generation and switches for setting the various tone generation parameters (described in detail later). It may also include a numeric keypad for numeric data, a keyboard for character data, a mouse, and the like. The input device is not limited to a microphone. It may be a performance operator such as a musical keyboard that, in response to user operation, generates musical tone signals such as the chord tones needed to generate harmony sounds, or an input device such as a sequencer that supplies musical tone signals stored in advance in the ROM 2 or elsewhere in performance order.

The display unit 5 consists of, for example, a liquid crystal display panel (LCD) or a CRT. It displays various information such as the score of the generated lead sound and/or harmony sound, the parameter settings made with the operators, a list of the data stored in advance, and the control state of the CPU 1.

The sound source 6 can generate musical tone signals on a plurality of channels simultaneously. Based on a signal supplied via the communication bus 1D, for example an audio signal input through a microphone, it generates musical tone signals such as the lead sound (first musical tone signal) and the harmony sound (second musical tone signal), and produces musical tones from them. The tones produced by the sound source 6 are sounded by a sound system 6A that includes an amplifier, speakers, and the like. When generating lead or harmony sounds, the sound source 6 can apply various effects, for example gender (type and depth of voice quality, such as male or female voice), vibrato (rate of change of depth and period, and delay time until vibrato onset), tremolo, volume, pan (localization), detune, and reverb (reverberation). Any conventional configuration may be used for the sound source 6 and the sound system 6A. For example, the sound source 6 may use any tone synthesis method, such as FM, PCM, physical modeling, or formant synthesis, and may be implemented in dedicated hardware or as software processing on the CPU 1 or a DSP.

The communication interface (I/F) 7 is an interface for exchanging various information, such as musical tone signals, pitch determination tables, and control programs, between this device and external equipment (not shown). The communication interface 7 may be, for example, a MIDI interface, a LAN, the Internet, or a telephone line. It may be wired or wireless, or it may provide both.

The storage device 8 stores various information such as pitch determination tables prepared in advance and the control programs executed by the CPU 1. It may also be made capable of storing musical tone signals such as the generated lead and harmony sounds.

If no control program is stored in the ROM 2, the control program can be stored in the storage device 8 (for example, a hard disk) and read into the RAM 3, so that the CPU 1 performs the same operations as when the control program is stored in the ROM 2. This makes it easy to add or upgrade control programs. The storage device 8 is not limited to a hard disk (HD); it may be a storage device that uses any of various removable external recording media, such as a flexible disk (FD), compact disc (CD), magneto-optical disk (MO), or DVD (Digital Versatile Disk). It may also be a semiconductor memory or the like.

The musical tone signal processing device described above need not have the input operation unit 4, the display unit 5, the sound source 6, and so on built into a single body. Needless to say, these may be separate devices connected to one another through a communication interface such as a MIDI interface or a network.

The musical tone signal processing device and program according to the present invention may be applied to any form of device or equipment, such as a karaoke machine, an electronic musical instrument, a personal computer, a portable communication terminal such as a mobile phone, or a game machine. When applied to a portable communication terminal, the functions need not be completed within the terminal alone; part of the functions may be placed on a server so that the system consisting of the terminal and the server realizes them as a whole.

The musical tone signal processing device according to the present invention also has a tone generation function. It detects the pitch of an audio signal input via, for example, a microphone (ultimately, a specific pitch corresponding to one of the musical note names) and generates a lead-sound musical tone signal at the detected pitch. It also determines a separate new pitch (likewise a specific pitch corresponding to one of the musical note names) from the detected pitch and chord information input from a keyboard or the like, and automatically generates a harmony-sound musical tone signal at the determined pitch. Unlike the conventional device, however, the device according to the present invention configures the function that generates the lead sound and the function that generates the harmony sound independently of each other. The tone generation function of the device is therefore explained with reference to FIG. 2, a functional block diagram in which the arrows represent the flow of signals.

As shown in FIG. 2, the sound source 6 has a tone generation function made up of a signal input unit I, a frequency detection unit F, a lead-sound tone generation unit M1, a harmony-sound tone generation unit M2, an effect applying unit E, and a signal output control unit O. When the tone generation function starts, the signal input unit I acquires the audio signal input via a microphone or the like and supplies it sequentially to the frequency detection unit F. The frequency detection unit F applies "frequency detection" processing to the received audio signal and converts it into a frequency signal. The frequency signal output from the frequency detection unit F is supplied to both the lead-sound tone generation unit M1 and the harmony-sound tone generation unit M2.

The lead-sound tone generation unit M1 and the harmony-sound tone generation unit M2 are tone generation units configured independently of each other to generate the lead-sound and harmony-sound musical tone signals, respectively. Each has a flattening unit H1 (H2), a pitch conversion unit C1 (C2), and a lead sound generation unit A1 or a harmony sound generation unit A2. The flattening unit H1 (H2) applies "flattening" processing to the frequency signal output from the frequency detection unit F, flattening (smoothing) its changes. As described later, however, the flattened frequency signals produced by the flattening units H1 and H2 may differ, depending on the parameter each is given (the flattening time constant, described later). This parameter is supplied to the flattening unit H1 as lead-sound parameter information from the lead-sound parameter setting unit T1, and to the flattening unit H2 as harmony-sound parameter information from the harmony-sound parameter setting unit T2.
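The effect of giving H1 and H2 different flattening time constants can be illustrated with a minimal Python sketch. It assumes that flattening behaves like a one-pole low-pass filter (exponential smoothing) and that the frequency signal is expressed in semitone units; the specification fixes neither, and the time constants, frame period, and vibrato values below are purely illustrative.

```python
import math

def flatten(values, time_constant_s, frame_period_s):
    """Exponential smoothing used here as a stand-in for the flattening units."""
    alpha = 1.0 - math.exp(-frame_period_s / time_constant_s)
    state = values[0]
    out = []
    for v in values:
        state += alpha * (v - state)  # move a fraction of the way toward the input
        out.append(state)
    return out

def spread(values):
    """Peak-to-peak range of a signal."""
    return max(values) - min(values)

FRAME_PERIOD_S = 0.005                       # assumed analysis frame period
LEAD_PARAMS = {"time_constant_s": 0.01}      # hypothetical setting from T1
HARMONY_PARAMS = {"time_constant_s": 0.5}    # hypothetical setting from T2

# One shared frequency signal from unit F: a 6 Hz vibrato around the E3/F3
# boundary, in semitone (MIDI-like) units.
frequency_signal = [
    52.5 + 0.6 * math.sin(2.0 * math.pi * 6.0 * n * FRAME_PERIOD_S)
    for n in range(400)
]

lead_flat = flatten(frequency_signal, LEAD_PARAMS["time_constant_s"], FRAME_PERIOD_S)
harmony_flat = flatten(frequency_signal, HARMONY_PARAMS["time_constant_s"], FRAME_PERIOD_S)
```

With the short time constant, `lead_flat` keeps most of the vibrato swing, while the long time constant leaves `harmony_flat` almost flat, so the two paths can detect different note sequences from the same input.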

The flattened frequency signals are supplied to the pitch conversion unit C1 and the pitch conversion unit C2, respectively. The pitch conversion unit C1 applies "note name detection" processing to the flattened frequency signal, discretizing it at predetermined time intervals into one of the note names (pitch names) of the 12-tone scale. That is, based on the pitch of the input audio signal, it detects a specific pitch corresponding to one of the musical pitch names, and that pitch is used as is as the pitch of the lead sound. Of course, the pitch of the lead sound need not be the pitch detection result itself; the detection result may instead be shifted up or down by a predetermined interval, such as one octave or three semitones, and the converted pitch used as the pitch of the lead sound.
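
The discretization step can be illustrated with a minimal Python sketch. It assumes 12-tone equal temperament with A4 = 440 Hz and MIDI-style note numbers; the description specifies neither, and the names `hz_to_note_number`, `nearest_note`, and `transpose` are hypothetical.

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def hz_to_note_number(freq_hz, a4_hz=440.0):
    """Convert a frequency to a fractional note number (assumed MIDI-style, A4 = 69)."""
    if freq_hz <= 0:
        raise ValueError("frequency must be positive")
    return 69.0 + 12.0 * math.log2(freq_hz / a4_hz)

def nearest_note(freq_hz):
    """Discretize a frequency to the nearest 12-tone note (number and name)."""
    number = round(hz_to_note_number(freq_hz))
    return number, NOTE_NAMES[number % 12]

def transpose(note_number, semitones):
    """Shift a detected note by a fixed interval, e.g. 12 (octave) or 3 semitones."""
    return note_number + semitones
```

For example, `nearest_note(440.0)` yields note 69 (A), and `transpose(69, 12)` gives the note one octave higher.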

The pitch conversion unit C2, on the other hand, also applies "note name detection" processing to the flattened frequency signal to discretize it at predetermined time intervals into one of the note names (pitch names) of the 12-tone scale. In doing so, it further converts the pitch detection result obtained by the "note name detection" processing into one of the note names of the 12-tone scale according to a pitch determination table prepared in advance, such as the one shown in FIG. 5, based on that pitch detection result and on chord information input from a keyboard or the like. The specific pitch thus obtained (there may be one or more) is used as the pitch of the harmony sound. However, as described later, the pitch detection results obtained from the "note name detection" processing can differ between the pitch conversion unit C1 and the pitch conversion unit C2, depending on the specified parameters (the note name change threshold and the offset value described later). These parameters are specified to the pitch conversion unit C1 by the lead sound parameter setting unit T1 as lead sound parameter information, and to the pitch conversion unit C2 by the harmony sound parameter setting unit T2 as harmony sound parameter information.
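
The pitch determination table of FIG. 5 is not reproduced in this section, so the following Python sketch substitutes a simple stand-in rule purely for illustration: pick the nearest chord tone above the detected note. The function name `harmony_above` and the representation of a chord as a set of pitch classes (0 = C ... 11 = B) are assumptions, not the table the description refers to.

```python
def harmony_above(detected_note, chord_pitch_classes):
    """Return the nearest chord tone strictly above the detected note.

    detected_note: integer note number (assumed MIDI-style).
    chord_pitch_classes: set of pitch classes 0..11 taken from the chord input.
    This is a stand-in rule, not the pitch determination table of FIG. 5.
    """
    for step in range(1, 13):
        candidate = detected_note + step
        if candidate % 12 in chord_pitch_classes:
            return candidate
    raise ValueError("chord has no pitch classes")

C_MAJOR = {0, 4, 7}  # C, E, G
```

With a C major chord, a detected C (60) maps to E (64) and a detected G (67) maps to the C above (72).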

The specific pitch corresponding to one of the musical pitch names detected by the pitch conversion unit C1 (C2) (the note name signal) is supplied to the lead sound generation unit A1 or the harmony sound generation unit A2, respectively. On receiving the note name signal, the lead sound generation unit A1 or the harmony sound generation unit A2 generates the lead sound or the harmony sound from it, each separately. That is, the lead sound generation unit A1 turns the note name signal into a signal that varies continuously in time through "convergence curve" processing, and applies "output modulation" processing to that signal to generate a lead sound output signal in which the pitch of the input audio signal is modulated. The harmony sound generation unit A2 likewise turns the note name signal into a signal that varies continuously in time through "convergence curve" processing and applies "output modulation" processing to it, generating a harmony sound output signal in which the pitch of the input audio signal is modulated, separately and independently of the lead sound generation by the lead sound generation unit A1.
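
The shape of the convergence curve is not specified in this section. As a rough Python sketch, assuming an exponential approach in which the continuous pitch moves a fixed fraction of the remaining distance toward the target note each frame (the function name `glide` and the `rate` parameter are hypothetical):

```python
def glide(note_targets, rate=0.5):
    """Turn a stepwise note signal into a continuously varying pitch signal.

    Each frame, the output moves `rate` of the way from its current value
    to the current target note. The exponential shape is an assumption.
    """
    if not 0.0 < rate <= 1.0:
        raise ValueError("rate must be in (0, 1]")
    out = []
    current = None
    for target in note_targets:
        if current is None:
            current = float(target)  # start exactly on the first note
        else:
            current += rate * (target - current)
        out.append(current)
    return out
```

With `rate=1.0` the output jumps to each new note immediately; smaller values give a slower slide between notes.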

The lead sound or harmony sound generated by the lead sound generation unit A1 or the harmony sound generation unit A2 is supplied to the effect applying unit E, which can apply various effects such as gender, vibrato, tremolo, volume, pan, detune, and reverb. The signal output control unit O outputs the lead sound and/or the harmony sound supplied from the effect applying unit E to the sound system 6A. The tone signal to be output can be selected as appropriate: the lead sound only, the harmony sound only, or both the lead sound and the harmony sound.

As described above, in the musical tone signal processing device according to the present invention, the function that generates the lead sound and the function that generates the harmony sound are configured independently. The lead sound tone generation unit M1 and the harmony sound tone generation unit M2 can therefore be controlled separately, so that the timing at which the lead sound is generated need not coincide with the timing at which the harmony sound is generated. Specifically, by specifying parameters with different values to the flattening unit H1 and the flattening unit H2, and to the pitch conversion unit C1 and the pitch conversion unit C2, the two chains can be given independent temporal behavior. This is explained below.

First, control of the flattening unit H1 and the flattening unit H2 is described. In flattening processing, when the input audio signal changes from silence to voiced sound, there is often a period during which the frequency cannot be detected well. By specifying as a parameter the grace period from detection of voiced sound (a vowel) until frequency detection is performed (this is called the flattening time constant), spurious frequency information that may be detected while the frequency cannot be detected well can be excluded. The flattening time constant is the time from the moment a vowel is detected until flattening starts, and is set, for example, between 0 and 50 ms (milliseconds). For example, when the flattening time constant is set to 0 ms, flattening starts at the moment the vowel is detected. When it is set to 10 ms, the detected pitch is output as is for the first 10 ms after the vowel is detected, and flattening is performed from 10 ms onward.
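
The grace-period behavior described above can be sketched in Python as follows. The description does not say which smoothing filter the flattening units use, so a one-pole exponential smoother is assumed here; `flatten`, `frame_ms`, `grace_ms`, and `alpha` are hypothetical names, and frame 0 is taken to be the moment the vowel is detected.

```python
def flatten(pitches, frame_ms, grace_ms, alpha=0.2):
    """Pass detected pitch through unchanged during the grace period, then smooth.

    pitches: detected pitch per frame, starting at vowel detection.
    frame_ms: frame spacing in milliseconds.
    grace_ms: flattening time constant (delay before flattening starts).
    alpha: smoothing coefficient of the assumed one-pole filter.
    """
    out = []
    state = None
    for i, pitch in enumerate(pitches):
        elapsed_ms = i * frame_ms
        if elapsed_ms < grace_ms or state is None:
            state = pitch  # grace period: output the detected pitch as is
        else:
            state = state + alpha * (pitch - state)  # flattening
        out.append(state)
    return out
```

With 5 ms frames and `grace_ms=10`, the first two frames are output unchanged and smoothing begins at the third; with `grace_ms=0`, smoothing begins right after the first frame seeds the filter.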

In general, slowing the flattening time constant makes it less likely that a wide pitch variation approaching a semitone is misrecognized as a note name change, but degrades the ability to follow pitch changes exceeding a semitone. Conversely, speeding up the flattening time constant improves the ability to follow pitch changes exceeding a semitone, but makes it more likely that a wide pitch variation approaching a semitone is misrecognized as a note name change. Conventionally, however, flattening processing was not independent for the lead sound and the harmony sound, and both could only be controlled with the same flattening time constant. That is, flattening could be controlled with only a single time constant, so slowing the time constant suited the harmony sound but not the lead sound, while speeding it up suited the lead sound but not the harmony sound. In the present invention, therefore, flattening processing is made independent for the lead sound and the harmony sound. The flattening time constant is made fast for the lead sound, which needs to follow pitch changes exceeding a semitone, and slow for the harmony sound, which does not, so that the flattening unit H1 and the flattening unit H2 can be controlled separately.

Next, control of the pitch conversion unit C1 and the pitch conversion unit C2 is described. When frequency information is discretized into note names during "note name detection", the conventional approach determines the note name according to which of the center frequencies of the upper and lower note names is closer. In this embodiment, once a note name has been determined, a new note name is not adopted unless the detected frequency moves away from that note's center frequency by at least a specified threshold. The parameter for this is the range within which transitions of the pitch correction target pitch are suppressed (called the note name detection threshold). For example, when the note name detection threshold is set to 75 cents, the current target pitch is maintained (no new note name is adopted) while the detected pitch is within ±75 cents of the current target pitch. This gives hysteresis to transitions of the pitch correction target pitch.

Consider, as an example, a case in which the signal is first detected as note name (specific pitch) x and the detected frequency then changes as shown in FIG. 3. FIG. 3 is a conceptual diagram showing the frequency change of an audio signal, for explaining note name detection. The vertical axis is frequency and the horizontal axis is time. In FIG. 3, note name x and note name y are adjacent note names (100 cents apart). Taking the frequency of note name x as the reference, the broken line marks the frequency 50 cents away, the dash-dot line above the broken line marks the frequency 75 cents away, and the dash-dot line below the broken line marks the frequency 25 cents away.

When the frequency changes and first exceeds the 50-cent difference at time a, the conventional approach changes the note name (specific pitch) from x to y at that time. Then, at time b, the frequency falls below the 50-cent difference again, so the note name changes from y back to x. In many cases, however, such a change is merely the pitch moving with somewhat deep musical expression and does not correspond to a note name change in the notation. In the present invention, therefore, a note name change threshold of, for example, 75 cents is specified. In this case, even though the frequency exceeds the 50-cent difference at time a, it has not reached the 75-cent difference, so note name x is maintained at that time. The 75-cent difference is first exceeded at time c, at which point the note name changes from x to y. Once the note name has changed from x to y, it does not change from y to another note name until the frequency moves 75 cents or more away from the frequency of note name y. In the example of FIG. 3, the note name changes from y to x only at time f, when the frequency has moved by 75 cents or more. The note name detection threshold of 75 cents may be set by the user to any suitable value, for example according to singing style.
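
The hysteresis described above can be sketched in Python as follows. Pitch is expressed as a fractional note number so that 100 cents equals 1.0; the class name `NoteTracker` and this representation are assumptions made for the sketch. Setting the threshold to 50 cents reproduces the conventional nearest-note behavior.

```python
class NoteTracker:
    """Hold the current note until the detected pitch leaves a threshold band."""

    def __init__(self, threshold_cents=75.0):
        self.threshold_cents = threshold_cents
        self.current = None  # current target note (integer note number)

    def update(self, pitch):
        """Feed one detected pitch (fractional note number); return the held note."""
        if self.current is None:
            self.current = round(pitch)
        elif abs(pitch - self.current) * 100.0 > self.threshold_cents:
            # Only now is a new note name adopted: the nearest note.
            self.current = round(pitch)
        return self.current

def track(pitches, threshold_cents):
    tracker = NoteTracker(threshold_cents)
    return [tracker.update(p) for p in pitches]

# Pitch wandering between note 60 (x) and note 61 (y).
wander = [60.0, 60.6, 60.4, 60.8, 60.3, 60.2]
conventional = track(wander, 50.0)  # flips at every 50-cent crossing
hysteresis = track(wander, 75.0)    # flips only beyond 75 cents
```

With the 50-cent threshold the note flips four times over the six frames; with the 75-cent threshold it changes once to note 61 and once back to note 60.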

The values indicated by the broken line and the dash-dot lines in FIG. 3 can also be offset up or down according to an offset value (parameter information) specified by the lead sound parameter setting unit T1 or the harmony sound parameter setting unit T2, respectively. Thus, when the pitch is determined from the pitch of the input audio signal, applying different offsets to the pitch determination criteria makes it possible to adjust the note name change points separately for the lead sound and the harmony sound. For example, for input voice that tends to go sharp, an offset value that shifts the note name change threshold (75 cents in the above example) up or down by about 5 cents overall may be specified to the pitch conversion unit C2 only. By giving parameters such as these to the pitch conversion unit C1 and the pitch conversion unit C2 respectively, neither unit detects superfluous note name changes even when an audio signal with wide vibrato is input, that is, when the pitch variation used as musical expression is large.
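
One possible reading of the offset parameter, sketched in Python: the upper and lower boundaries of the hold range are both shifted by the same signed amount. This interpretation, the function name `leaves_hold_range`, and the sign convention (positive values shift the boundaries upward) are assumptions; the description only states that the threshold can be offset up or down.

```python
def leaves_hold_range(deviation_cents, threshold_cents=75.0, offset_cents=0.0):
    """Return True if the detected pitch has left the hold range of the current note.

    deviation_cents: detected pitch minus current target pitch, in cents.
    offset_cents: signed shift applied to both boundaries (assumed convention).
    """
    upper = threshold_cents + offset_cents
    lower = -threshold_cents + offset_cents
    return deviation_cents > upper or deviation_cents < lower
```

With a +5 cent offset, a pitch 78 cents sharp stays inside the hold range (the upper boundary is now 80 cents), while a pitch 72 cents flat leaves it (the lower boundary is now -70 cents).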

As described above, the musical tone signal processing device according to the present invention generates the lead sound and the harmony sound by sequentially performing "frequency detection", "flattening", "note name detection", "convergence curve", and "output modulation" processing. The lead sound tone generation unit M1 and the harmony sound tone generation unit M2, which have the same configuration comprising the processes that follow "frequency detection", are provided separately for lead sound generation and harmony sound generation, and are given different parameter information so that each can be controlled differently. That is, even when pitch detection of the audio signal immediately triggers determination of a pitch from the detection result and generation of a lead sound, the harmony sound tone generation unit M2, unlike the conventional art, can in some cases be kept from detecting a pitch on that same trigger (parameter information to that effect is given to the harmony sound tone generation unit M2). In other words, harmony sound parameter information is specified by the harmony sound parameter setting unit T2 so that the harmony sound tone generation unit M2 sequentially detects pitches whose changes differ in time from the pitch changes of the pitches detected by the lead sound tone generation unit M1. As a result, even when an audio signal whose pitch fluctuates up and down, such as vibrato, is input, a calm harmony sound that sounds stable can be generated.
Moreover, since the frequency of pitch detection for the input audio signal need not be reduced, the lead sound is generated as often as before, and the musical individuality and expressiveness of the input audio signal are not lost.
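
The effect of giving the two chains different parameters can be illustrated with a small Python simulation. A vibrato of ±60 cents at 6 Hz around a single note is fed to two note detectors that differ only in their threshold. The parameter values, the `ChainParams` dataclass, and the helper names are hypothetical and chosen only to make the contrast visible; they are not values taken from the description.

```python
import math
from dataclasses import dataclass

@dataclass
class ChainParams:
    threshold_cents: float  # note name detection threshold for this chain

def count_note_changes(pitches, params):
    """Count how often the held note changes for a given threshold."""
    current = round(pitches[0])
    changes = 0
    for pitch in pitches[1:]:
        if abs(pitch - current) * 100.0 > params.threshold_cents:
            new_note = round(pitch)
            if new_note != current:
                current = new_note
                changes += 1
    return changes

def vibrato(center=60.0, depth_cents=60.0, rate_hz=6.0, frame_ms=5.0, duration_ms=1000.0):
    """Synthetic detected-pitch track: a sine vibrato around one note."""
    frames = int(duration_ms / frame_ms)
    return [
        center + depth_cents / 100.0 * math.sin(2.0 * math.pi * rate_hz * i * frame_ms / 1000.0)
        for i in range(frames)
    ]

LEAD = ChainParams(threshold_cents=50.0)     # follows the voice closely
HARMONY = ChainParams(threshold_cents=75.0)  # suppresses note changes

signal = vibrato()
lead_changes = count_note_changes(signal, LEAD)
harmony_changes = count_note_changes(signal, HARMONY)
```

Under these assumed settings the lead chain changes note repeatedly as the vibrato swings past 50 cents, while the harmony chain never leaves the original note because the swing stays inside its 75-cent band.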

In the embodiment described above, voice input through a microphone was used as the example of the musical tone signal from which the lead sound and the harmony sound are generated, but the signal may instead be, for example, an instrument performance sound input through a microphone. In the case of an instrument performance sound, the additional sound may be an accompaniment sound.
The harmony sound is not limited to a single sound generated at a time; a plurality of sounds may be generated simultaneously.

The chord information input for harmony sound generation may, as described above, be detected from input information entered in response to user operation of a performance operator, such as a keyboard, on this device or connected to this device, or it may be obtained by entering chord names one after another.
In the embodiment described above, the harmony sound is generated based on chord information, but this is not a limitation; any other known method that generates a harmony sound without chord information may be used. For example, a method that generates the harmony sound at a pitch a third above the pitch of the lead sound may be adopted.

Description of reference signs: 1 ... CPU, 2 ... ROM, 3 ... RAM, 4 ... input operation unit, 5 ... display unit, 6 ... tone generator, 6A ... sound system, 7 ... communication interface, 8 ... storage device, 1D ... communication bus, A1 ... lead sound generation unit, A2 ... harmony sound generation unit, C1 (C2) ... pitch conversion unit, E ... effect applying unit, F ... frequency detection unit, H1 (H2) ... flattening unit, I ... signal input unit, M1 ... lead sound tone generation unit, M2 ... harmony sound tone generation unit, O ... signal output control unit, T1 ... lead sound parameter setting unit, T2 ... harmony sound parameter setting unit

Claims

A musical tone signal processing device that generates a first musical tone signal whose pitch is controlled based on an input musical tone signal, and generates a second musical tone signal whose pitch is controlled following the pitch fluctuation of the first musical tone signal, the device comprising:
An input means for inputting a musical sound signal;
Analyzing means for frequency analysis of the input musical sound signal;
First pitch detection means for flattening a change in the frequency signal obtained according to the frequency analysis, and sequentially detecting any one of pitches corresponding to the pitch names according to the flattened frequency signal;
First musical sound generating means for generating a first musical sound signal pitch-controlled to the pitch detected by the first pitch detecting means;
A designation means for designating control information;
Second pitch detecting means for flattening a change in the frequency signal obtained according to the frequency analysis and sequentially detecting any one of pitches corresponding to pitch names according to the flattened frequency signal, the second pitch detecting means being controlled based on the control information;
Second musical sound generating means for generating a second musical sound signal pitch-controlled to an arbitrary pitch determined based on the pitch detected by the second pitch detecting means;
Output means for outputting at least one of the generated first and second musical sound signals,
wherein the second pitch detecting means is controlled, based on the control information and independently of the first pitch detecting means, so as to sequentially detect pitches accompanied by a pitch change that differs in time from the pitch change of the pitches detected by the first pitch detecting means.

The musical tone signal processing device according to claim 1, wherein the designation means designates, as the control information, at least one of: time information from vowel detection to the start of flattening, used when flattening a change in the frequency signal obtained according to the frequency analysis; threshold information for determining whether to move to the upper or the lower pitch when detecting any one of the pitches corresponding to pitch names according to the flattened frequency signal; and offset information for further offsetting the threshold up or down,
and wherein the second pitch detecting means detects pitches, based on at least one of these pieces of information, so that pitch changes over time are suppressed compared with the pitch changes detected by the first pitch detecting means.

A computer-executable program for generating a first musical tone signal whose pitch is controlled based on an input musical tone signal, and generating a second musical tone signal whose pitch is controlled following the pitch fluctuation of the first musical tone signal, the program causing a computer to execute:
The procedure for inputting musical sound signals,
A procedure for frequency analysis of the input musical sound signal;
A procedure for flattening a change in the frequency signal obtained according to the frequency analysis, and sequentially detecting any one of the pitches corresponding to the pitch names according to the flattened frequency signal;
Generating a first musical tone signal pitch-controlled to the detected pitch;
A procedure for specifying control information;
A detection procedure for, based on the control information, flattening a change in the frequency signal obtained according to the frequency analysis and sequentially detecting any one of pitches corresponding to pitch names according to the flattened frequency signal, so that the detected pitches are accompanied by a pitch change that differs in time from the pitch change of the previously detected pitches; a procedure for generating a second musical tone signal pitch-controlled to an arbitrary pitch determined based on the pitch detected by the detection procedure; and
A procedure for outputting at least one of the generated first and second musical tone signals.