JP2011197564A

JP2011197564A - Electronic music device and program

Info

Publication number: JP2011197564A
Application number: JP2010066796A
Authority: JP
Inventors: Akira Yamauchi; 明山内; Hikaru Kase; 光加瀬; Motoaki Takashima; 基明高島
Original assignee: Yamaha Corp
Current assignee: Yamaha Corp
Priority date: 2010-03-23
Filing date: 2010-03-23
Publication date: 2011-10-06

Abstract

PROBLEM TO BE SOLVED: To generate a harmony voice signal that rarely strikes the user as unnatural noise, even in a section where it is difficult to detect the pitch of an input voice signal. SOLUTION: When the pitch of an input voice signal can be detected, the detected pitch is held and the input pitch (a pitch corresponding to a note name) is specified based on it. When the pitch cannot be detected, the input pitch is specified based on the pitch held immediately before. Specifying the input pitch from the held pitch makes clear the original pitch to be shifted and allows the pitch shift amount to be calculated. Thus, without preparing special data, a harmony voice signal can be generated that, in a section where the pitch of the input voice signal cannot be detected, has little pitch discontinuity with the adjacent vowel sections, keeps the continuity of timbre, and rarely strikes the user as unnatural noise.

Description

The present invention relates to an electronic music apparatus and a program that generate one or more harmony voice signals by pitch-shifting an input voice signal. In particular, it relates to a technique for generating one or more harmony voice signals that rarely strike the user as unnatural noise, even in a pitch non-detection section where the pitch of the input voice signal is difficult to detect.

Electronic music apparatuses and programs are conventionally known that, based on an input voice signal such as an instrument performance sound or a voice input by a user via a microphone or the like, automatically generate one or more harmony voice signals whose note pitches lie above or below that of the input voice signal by a predetermined interval, such as a third or a fifth, and output them together with the input voice signal. Such a conventional apparatus specifies, for each predetermined section (or period), a fundamental frequency, that is, a note pitch corresponding to one of the musical note names, based on frequency information (pitch) obtained by frequency analysis of the input voice signal. It then pitch-shifts the input voice signal (more specifically, one period of waveform element data that is cut out by a window function corresponding to the specified note pitch and stored) by a predetermined pitch shift amount determined according to the specified note pitch. In this way it generates one or more harmony voice signals of predetermined note pitches (here, note pitches corresponding to musical note names) as separate, independent additional sounds.

Consider, for example, an input voice signal that transitions from a consonant section, which consists of an aperiodic waveform that nonetheless has a sense of pitch, to a vowel section, which consists of a periodic waveform. In a vowel non-detection section (the consonant section), where the input voice signal cannot be detected as a vowel with a clear harmonic structure, the pitch of the input voice signal cannot be detected correctly (clearly), so it is difficult to specify the note pitch (a note pitch corresponding to one of the note names). Unless the note pitch of the input voice signal can be specified, a harmony voice signal cannot be generated by pitch-shifting the input voice signal as described above. For vowel non-detection sections (also called pitch non-detection sections), where correct detection of the pitch is difficult, there was therefore no choice but to output the input voice signal as the harmony voice signal as it is, without pitch-shifting.

However, if the input voice signal is output unshifted as the harmony voice signal in the vowel non-detection section, then for a consonant that has a sense of pitch, even though its waveform is aperiodic, a pitch discontinuity inevitably arises with respect to the preceding and following vowel sections, and the timbre also becomes discontinuous (see FIG. 4(d), described later). As a result, the user perceives the harmony voice signal in the vowel non-detection section as unnatural noise, which is undesirable. Patent Document 1, listed below, discloses an apparatus that addresses this problem by determining the note pitch of the harmony voice signal in the vowel non-detection section from separately prepared sequence data of the melody the singer is to sing.

Patent Document 1: Japanese Patent No. 3173310

As described above, the conventional apparatus of Patent Document 1 needs special data such as sequence data to generate the harmony voice signal in vowel non-detection sections, where the pitch of the input voice signal is difficult to detect, so such data must be prepared in advance. However, the voice signals that a singer (user) may input via a microphone or the like are extremely varied, and preparing the special data in advance for all of them is unrealistic. Even if it could be prepared for all of them, creating and storing it would require enormous labor and storage capacity and would be costly.

The present invention has been made in view of the above points. Its object is to provide an electronic music apparatus and a program that can generate a harmony voice signal rarely perceived by the user as unnatural noise, even in a pitch non-detection section where the pitch of the input voice signal is difficult to detect (specifically, a consonant section, that is, a vowel non-detection section as described above), without preparing special data such as sequence data.

The electronic music apparatus according to claim 1 of the present invention generates one or more harmony voice signals whose note pitches are controlled so as to follow note-pitch variations of an input voice signal. It comprises: input means for inputting a voice signal; input note pitch specifying means for analyzing the input voice signal to detect its pitch and thereby specifying, for each predetermined section of the input voice signal, one of the note pitches corresponding to note names; harmony note pitch determining means for determining the note pitches of one or more harmony voices according to the specified note pitch; shift amount calculating means for calculating one or more pitch shift amounts based on the specified note pitch and the determined note pitches of the one or more harmony voices; and musical tone generating means for pitch-shifting the input voice signal in each predetermined section based on the calculated one or more pitch shift amounts, thereby generating one or more harmony voice signals controlled to the determined one or more note pitches. The apparatus is characterized in that the input note pitch specifying means, when it can detect the pitch of the input voice signal, holds the detected pitch and specifies the note pitch of the input voice signal based on that pitch, and, when it cannot detect the pitch of the input voice signal, specifies the note pitch of the input voice signal based on the pitch held immediately before.

According to this invention, when the pitch of the input voice signal can be detected, the detected pitch is held and the note pitch of the input voice signal (one of the note pitches corresponding to note names) is specified based on it; when the pitch cannot be detected, the note pitch of the input voice signal is specified based on the pitch held immediately before. Because a harmony voice signal is generated by pitch-shifting the input voice signal, the note pitch of the input voice signal is needed to calculate the pitch shift amount. If the pitch of the input voice signal cannot be detected, the note pitch cannot be specified, the pitch shift amount cannot be calculated, and the original note pitch to be shifted is unknown, so the harmony voice signal cannot be generated. Therefore, when the pitch of the input voice signal cannot be detected, the note pitch of the input voice signal is specified based on the pitch held immediately before, which makes clear the original note pitch to be shifted and allows the pitch shift amount to be calculated. In this way, without separately preparing special data, and without outputting the input voice signal unchanged in sections where its pitch cannot be detected, the pitch discontinuity with the preceding and following vowel sections is kept as small as possible and the continuity of timbre is maintained. A harmony voice signal that rarely strikes the user as unnatural noise can thus be generated.

The electronic music apparatus according to claim 2 of the present invention generates one or more harmony voice signals whose note pitches are controlled so as to follow note-pitch variations of an input voice signal. It comprises: input means for inputting a voice signal; input note pitch specifying means for analyzing the input voice signal to detect its pitch and thereby specifying, for each predetermined section of the input voice signal, one of the note pitches corresponding to note names; harmony note pitch determining means for determining the note pitches of one or more harmony voices according to the specified note pitch; shift amount calculating means for calculating one or more pitch shift amounts based on the specified note pitch and the determined note pitches of the one or more harmony voices; shift amount holding means for holding the calculated one or more pitch shift amounts; and musical tone generating means for pitch-shifting the input voice signal in each predetermined section based on the held one or more pitch shift amounts, thereby generating one or more harmony voice signals controlled to the determined one or more note pitches. The apparatus is characterized in that the musical tone generating means, when generating the harmony voice signal for a section in which the pitch of the input voice signal could not be detected, performs the pitch shift based on the one or more pitch shift amounts held for the immediately preceding section in which the pitch could be detected. This too makes it possible to generate one or more harmony voice signals that rarely strike the user as unnatural noise, without separately preparing special data for sections in which the pitch of the input voice signal cannot be detected. Moreover, in this case the pitch shift amount need not be calculated anew to generate the harmony voice signal for a vowel non-detection section (consonant section), which advantageously reduces the processing load.
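The shift-amount holding of claim 2 can be illustrated with a minimal Python sketch. The class and method names (`ShiftAmountHolder`, `shifts_for_section`) are hypothetical and do not appear in the source; the shift amounts are assumed to be expressed in cents, and a section without a detected pitch is represented by `None`.

```python
from typing import List, Optional, Sequence


class ShiftAmountHolder:
    """Hypothetical stand-in for the shift amount holding means of claim 2."""

    def __init__(self) -> None:
        # Nothing is held until a pitch has been detected at least once.
        self.held: Optional[List[int]] = None

    def shifts_for_section(
        self, calculated: Optional[Sequence[int]]
    ) -> Optional[List[int]]:
        """Return the shift amounts (cents) to apply to the current section.

        `calculated` holds the amounts computed for this section, or None
        when the pitch could not be detected and nothing was computed.
        """
        if calculated is not None:
            # Pitch detected: refresh the held amounts.
            self.held = list(calculated)
        # Pitch not detected: the previously held amounts are reused as-is.
        return self.held


holder = ShiftAmountHolder()
vowel = holder.shifts_for_section([800, 300, -900])  # pitch detected
consonant = holder.shifts_for_section(None)          # pitch not detected
```

In this sketch the consonant section reuses the amounts of the preceding vowel section without any recalculation, which corresponds to the reduced processing load noted for claim 2.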

The present invention can be configured and implemented not only as an apparatus invention but also as a method invention. It can also be implemented in the form of a program for a processor such as a computer or a DSP, or in the form of a storage medium storing such a program.

According to this invention, the detected pitch or the calculated pitch shift amount is held, and one or more harmony voice signals for a section in which the pitch of the input voice signal could not be detected are generated based on the held value. As a result, without preparing special data as in the prior art, one or more harmony voice signals can be generated that have little pitch discontinuity with the preceding and following vowel sections, maintain the continuity of timbre, and rarely strike the user as unnatural noise.

FIG. 1 is a hardware block diagram showing one embodiment of the overall configuration of the electronic music apparatus according to this invention.
FIG. 2 is a conceptual diagram showing one embodiment of the data structure of the harmony table.
FIG. 3 is a flowchart showing one embodiment of the harmony sound generation process.
FIG. 4 is a conceptual diagram showing a specific example for explaining the harmony sound generation process.
FIG. 5 is a flowchart showing another embodiment of the harmony sound generation process.

Embodiments of the present invention are described in detail below with reference to the accompanying drawings.

FIG. 1 is a hardware block diagram showing one embodiment of the overall configuration of the electronic music apparatus according to this invention. The electronic music apparatus of this embodiment is controlled by a microcomputer consisting of a microprocessor unit (CPU) 1, a read-only memory (ROM) 2, and a random access memory (RAM) 3. The CPU 1 controls the operation of the entire apparatus. The ROM 2, RAM 3, detection circuits 4 and 5, display circuit 6, tone generator/effect circuit 7, A/D conversion circuit 8, storage device 9, and communication interface (I/F) 10 are each connected to the CPU 1 via a data and address bus 1D.

The ROM 2 stores various control programs executed or referenced by the CPU 1 and various data such as the harmony table (note pitch determination table) shown in FIG. 2. The RAM 3 is used as working memory that temporarily stores various data generated when the CPU 1 executes a given program, or as memory that temporarily stores the program currently being executed and its related data. Predetermined address areas of the RAM 3 are assigned to the respective functions and are used as registers, flags, tables, temporary memory, and so on.

The performance operator 4A is, for example, a keyboard having a plurality of keys for selecting the note pitch of a musical tone, with a key switch for each key. The performance operator 4A (keyboard or the like) can of course be used for manual performance by the user. In addition, in this embodiment, by playing a chord by hand, the user can input chord information that specifies the "harmony table" (see FIG. 2, described later) referenced when generating the harmony voice signals. The detection circuit 4 produces a detection output by detecting the pressing and release of each key of the performance operator 4A.

The setting operators (switches and the like) 5A may be, for example, an automatic harmony generation start/stop button for instructing the start or stop of automatic generation of harmony voice signals based on the input voice signal, and setting switches for setting various parameters related to signal control. The setting operators 5A may of course also include various other operators, such as a numeric keypad for entering numerical data or a keyboard for entering character data used to select, set, and control note pitch, timbre, effects, and the like, or a mouse for operating a pointer displayed on the display 6A. The detection circuit 5 detects the operating state of the setting operators 5A and outputs switch information and the like corresponding to that state to the CPU 1 via the data and address bus 1D.

The display circuit 6 displays various information on a display 6A, which consists of, for example, a liquid crystal display panel (LCD) or a CRT. This information includes musical scores and lyrics that the user can refer to when inputting a voice signal via the microphone 8A or the like, the settings of the various parameters set with the setting operators (switches and the like) 5A, lists of various prestored data, and the control state of the CPU 1.

The tone generator/effect circuit 7 can generate musical tone signals on a plurality of channels simultaneously. It generates signals supplied via the data and address bus 1D, such as the input voice signal input through the microphone 8A and the harmony voice signals generated from that input voice signal. The signals generated by the tone generator/effect circuit 7 are sounded by a sound system 7A that includes an amplifier, speakers, and the like. When generating the input voice signal, harmony voice signals, and so on, the tone generator/effect circuit 7 may also be able to apply various effects, such as gender (type and depth of voice quality, such as male or female voice), vibrato (depth and period change rate, delay time until vibrato starts), tremolo, volume, pan (localization), detune, and reverb (reverberation). Any conventional configuration may be used for the tone generator/effect circuit 7 and the sound system 7A. For example, the tone generator/effect circuit 7 may employ any tone synthesis method, such as FM, PCM, physical modeling, or formant synthesis, and may be implemented as dedicated hardware or as software processing by the CPU 1 or a DSP (Digital Signal Processor).

The microphone 8A is a signal input device for inputting voice signals such as the voice uttered by the user or the sound of an instrument played by the user. The A/D conversion circuit 8 converts the input voice signal from the microphone 8A to digital form. The digitized input voice signal is input to the tone generator/effect circuit 7. The signal input device is not limited to the microphone 8A; it may be, for example, a playback device that reproduces and supplies a prestored voice signal.

The storage device 9 stores various information such as the harmony tables prepared in advance (see FIG. 2) and the various control programs executed by the CPU 1. It may also be able to store the input voice signal, the generated harmony voice signals, and the like. If no control program is stored in the ROM 2, the control program can be stored in the storage device 9 (for example, a hard disk) and read into the RAM 3, which lets the CPU 1 operate just as if the control program were stored in the ROM 2. This makes it easy to add or upgrade control programs. The storage device 9 is not limited to a hard disk (HD); it may be any storage device that uses a storage medium such as a flexible disk (FD), a compact disc (CD-ROM, CD-RAM), a magneto-optical disk (MO), or a DVD (Digital Versatile Disk). It may also be a semiconductor memory such as flash memory.

The communication interface (I/F) 10 is an interface for sending and receiving various information, such as control programs and data, between this apparatus and external equipment (not shown). The communication interface 10 may be, for example, a MIDI interface, a LAN, the Internet, or a telephone line, and may be wired, wireless, or both.

In the embodiment described above, the performance operator 4A is not limited to the form of a keyboard instrument; it may take any form, such as a stringed, wind, or percussion instrument. Needless to say, the electronic music apparatus is also not limited to one in which the performance operator 4A, the display 6A, the tone generator/effect circuit 7, and so on are built into a single body; these may be configured as separate devices connected through a communication interface 10 such as a MIDI interface or a network.

The electronic music apparatus according to the present invention may take the form of any apparatus or equipment, such as a karaoke apparatus, an electronic musical instrument, a personal computer, a mobile communication terminal such as a mobile phone, or a game device. In the case of a mobile communication terminal, the predetermined functions need not be completed within the terminal alone; some of the functions may be provided on a server so that the system as a whole, consisting of the terminal and the server, realizes them.

The electronic music apparatus shown in FIG. 1 has a harmony sound generation function (or harmony addition function). It detects the pitch of the input voice signal received through the microphone 8A by frequency analysis (ultimately specifying a note pitch corresponding to one of the musical note names), determines one or more new target note pitches (likewise note pitches corresponding to musical note names) based on the specified note pitch and the chord information input from the keyboard or the like, and automatically generates harmony voice signals at the determined target note pitches.
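Specifying a note name from a detected frequency can be sketched as follows. The source does not say how this mapping is performed; the sketch assumes twelve-tone equal temperament with A4 = 440 Hz and the octave numbering in which MIDI note 60 is C4, and the function name `nearest_note_name` is hypothetical.

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def nearest_note_name(freq_hz: float, a4_hz: float = 440.0) -> str:
    """Return the equal-tempered note name closest to freq_hz.

    Assumptions (not from the source): A4 = 440 Hz, and octave numbers
    follow the convention in which MIDI note 60 is written C4.
    """
    if freq_hz <= 0:
        raise ValueError("frequency must be positive")
    # Distance from A4 in semitones, rounded to the nearest note.
    midi = round(69 + 12 * math.log2(freq_hz / a4_hz))
    return f"{NOTE_NAMES[midi % 12]}{midi // 12 - 1}"


note = nearest_note_name(164.81)  # a frequency close to E3 under these assumptions
```

Rounding to the nearest semitone is one simple choice; an actual apparatus may use a different reference tuning or octave convention.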

Here, each target note pitch is determined as one of the note names of the twelve-tone scale according to the harmony table (note pitch determination table) shown in FIG. 2, which is prepared in advance, based on the note pitch obtained by frequency analysis of the input voice signal and the chord information input from the keyboard or the like. FIG. 2 is a conceptual diagram showing the data structure of the harmony table. The example shown is the table referenced when "C major" is specified as the chord information, and it is used to generate three harmony voice signal series simultaneously.

A plurality of harmony tables, one per chord, are stored in advance, and the table corresponding to the chord information is selected. As FIG. 2 shows, the harmony table defines the target note pitches of one or more harmony series for each input note pitch, that is, each note pitch corresponding to a musical note name obtained by frequency analysis of the input voice signal. In FIG. 2, the input note pitches are written as the note names "C, D, E, F, G, A, B"; input note pitches with a ♯ (sharp) or ♭ (flat) are omitted. The target note pitches of the three harmony series are likewise written as note names.

In the note-name notation for target note pitches, a target of "G" denotes the G in the same octave region as the input note pitch, "C+" denotes the C in the octave region one above the input note pitch, and "E−" denotes the E in the octave region one below. In the illustrated example, when the input note pitch is "E3", the target note pitch is therefore "C4" for the first harmony series, "G3" for the second, and "G2" for the third. In FIG. 2, the pitch shift amount (in cents), which is the difference between the input note pitch and each target note pitch, is shown in parentheses after each target note pitch for convenience. In this embodiment, octave regions are divided between "C" and "B".
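The table lookup and the resulting pitch shift amounts can be sketched in Python. Only the row for input note "E" is taken from the example in the text (targets C4, G3, G2 for input E3); the notation "G-" for the third series is inferred from that example, ASCII "+" and "-" stand in for the octave marks, and the cent values are computed at 100 cents per equal-tempered semitone rather than read from FIG. 2. All identifiers are hypothetical.

```python
NOTE_TO_SEMITONE = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}

# Partial table for chord "C major". Only the "E" row is grounded in the
# text's example; the remaining rows of FIG. 2 are not reproduced here.
C_MAJOR_TABLE = {"E": ["C+", "G", "G-"]}


def shift_in_cents(input_name: str, target: str) -> int:
    """Pitch shift from the input note to a target such as "C+", "G" or "G-".

    "+" / "-" move the target one octave region up / down relative to the
    input; octave regions are split between B and C, so a plain note name
    means "same octave number as the input".
    """
    octave_offset = target.count("+") - target.count("-")
    target_name = target.rstrip("+-")
    semitones = (
        NOTE_TO_SEMITONE[target_name]
        - NOTE_TO_SEMITONE[input_name]
        + 12 * octave_offset
    )
    return 100 * semitones  # 100 cents per equal-tempered semitone


def harmony_shifts(table: dict, input_name: str) -> list:
    """Shift amounts (cents) for every harmony series of one input note."""
    return [shift_in_cents(input_name, t) for t in table[input_name]]


shifts = harmony_shifts(C_MAJOR_TABLE, "E")  # [800, 300, -900]
```

For input E3 this gives +800 cents (C4), +300 cents (G3), and -900 cents (G2). A shift of c cents corresponds to a frequency ratio of 2 ** (c / 1200).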

Next, the "harmony sound generation process" that realizes the harmony sound generation function described above is explained with reference to FIGS. 3 and 4. FIG. 3 is a flowchart showing one embodiment of the process. The process starts when the start of automatic harmony generation is instructed, for example by operating the automatic harmony generation start/stop button, and is executed repeatedly until a stop is instructed. FIG. 4 is a conceptual diagram showing a specific example of the process, and it is referred to as appropriate in the explanation that follows.

Step S1 analyzes the digitized input audio signal and detects its pitch for each predetermined section (for example, every few milliseconds to every few tens of milliseconds). That is, the input audio signal received via the microphone 8A or the like is digitized by the A/D conversion circuit 8, and the digitized signal is converted into a frequency signal by a "frequency detection" process. Any known technique may be used for this frequency detection, for example the zero-crossing method well known in the field of speech analysis, so a detailed description is omitted here.
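The description leaves the frequency detection method open. Purely as an illustration, a zero-crossing estimator might look like the following sketch. The function name `detect_pitch_zero_cross`, the use of linear interpolation, and the rule of returning `None` when fewer than two crossings are found are all assumptions, not details taken from the source.

```python
import math


def detect_pitch_zero_cross(frame, sample_rate):
    """Estimate the pitch (Hz) of one frame from upward zero crossings.

    Returns None when fewer than two crossings are found, which this
    sketch treats as "pitch could not be detected" (an assumption).
    """
    crossings = []
    for i in range(1, len(frame)):
        a, b = frame[i - 1], frame[i]
        if a < 0.0 <= b:
            # Linear interpolation of the crossing position between samples.
            crossings.append((i - 1) + (-a) / (b - a))
    if len(crossings) < 2:
        return None
    period_samples = (crossings[-1] - crossings[0]) / (len(crossings) - 1)
    return sample_rate / period_samples


sample_rate = 44100
# 50 ms of a 220 Hz sine wave as a stand-in for a voiced (vowel) frame.
frame = [math.sin(2 * math.pi * 220.0 * n / sample_rate + 0.3) for n in range(2205)]
print(detect_pitch_zero_cross(frame, sample_rate))
print(detect_pitch_zero_cross([0.0] * 100, sample_rate))
```

A real detector would need additional safeguards against noise and strong harmonics. The sketch only shows how a per-frame result of "pitch" or "no pitch" could feed the decision in step S2.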

Step S2 determines whether the pitch of the input audio signal could be detected. If it could (YES in step S2), the detected pitch is held (step S3); that is, a pitch-holding storage area prepared in advance in the RAM 3 or the like is updated as needed with the detected pitch. At the same time, the storage area holding one period of waveform element data, cut out by a window function corresponding to the detected pitch, is updated with the newly cut-out waveform element data.

If, on the other hand, the pitch of the input audio signal cannot be detected (NO in step S2), the pitch that was most recently detected and held is kept as it is (step S9); that is, the pitch-holding storage area is not updated. The waveform element data is not updated either, so the one period of waveform element data stored most recently is retained. Step S9 may be omitted as an explicit step, because it is achieved simply by not executing the pitch-holding process of step S3, that is, by not updating the pitch-holding storage area with a newly detected pitch.
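The hold behavior of steps S2, S3, and S9 amounts to "update on success, otherwise leave the stored values alone". A minimal sketch follows, assuming a hypothetical `PitchHolder` class and a detector that reports failure as `None`; neither is specified in the source.

```python
class PitchHolder:
    """Holds the most recently detected pitch and one period of waveform data."""

    def __init__(self):
        self.held_pitch_hz = None   # pitch-holding storage area
        self.held_waveform = None   # one period of waveform element data

    def update(self, detected_pitch_hz, waveform_period=None):
        """Return the pitch to use for the current section."""
        if detected_pitch_hz is not None:
            # Step S2 YES -> step S3: overwrite both storage areas.
            self.held_pitch_hz = detected_pitch_hz
            self.held_waveform = waveform_period
        # Step S2 NO -> step S9: nothing to do, the held values remain.
        return self.held_pitch_hz


holder = PitchHolder()
print(holder.update(164.8, [0.0, 1.0]))  # first vowel section
print(holder.update(None))               # consonant section: held pitch reused
print(holder.update(196.0, [1.0, 0.0]))  # second vowel section
```

As the description notes, the "no detection" branch needs no code of its own; skipping the update is enough.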

Step S4 specifies (detects) the note pitch of the input audio signal by quantizing the held pitch to note units. First, a "flattening (quantizing)" process flattens (smooths) the variation of the frequency signal. A "note-name detection" process then discretizes the flattened frequency signal, at predetermined time intervals, into one of the note names of, for example, the twelve-tone scale. Specifically, the flattened frequency signal is rounded to the pitch corresponding to one of the musical note names defined in semitone (100 cent) steps: "C, D, E, F, G, A, B" and these names with a sharp (♯) or flat (♭). In this way the input audio signal is assigned to one of the pitches corresponding to a musical note name.
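Rounding a detected frequency to the nearest semitone can be sketched as follows. The function name `quantize_to_note`, equal temperament, the A4 = 440 Hz reference, and sharp-only spelling are assumptions made for illustration. The smoothing stage that precedes note detection in the description is not modeled here.

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def quantize_to_note(freq_hz, a4_hz=440.0):
    """Round a frequency to the nearest semitone and return (note name, octave).

    Assumes equal temperament with A4 = a4_hz and MIDI-style numbering (A4 = 69).
    """
    midi = round(69 + 12 * math.log2(freq_hz / a4_hz))
    return NOTE_NAMES[midi % 12], midi // 12 - 1


print(quantize_to_note(164.81))  # close to E3
print(quantize_to_note(167.0))   # slightly sharp of E3, still rounds to E3
print(quantize_to_note(196.0))   # close to G3
```

Small fluctuations of a few tens of cents around a note, like those in the vowel sections of FIG. 4(a), therefore map to the same note name.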

FIG. 4(a) shows a series of input audio signals that moves from a first vowel section, which has small pitch fluctuations (less than a semitone, on the order of a few cents to a few tens of cents) above and below the pitch of note name "E", through a consonant section (vowel non-detection section) shown at times t1 to t2, in which the pitch is difficult to detect clearly although a sense of pitch is present, to a second vowel section, which has similarly small fluctuations around the pitch of note name "G". For example, for the lyric "ika", the first vowel section corresponds to the phoneme "i" of the syllable "i", the consonant section to the phoneme "k" of the syllable "ka", and the second vowel section to the phoneme "a" of the syllable "ka".

For such an input audio signal, as shown in FIG. 4(b), the first vowel section, where the pitch can be detected, can be assigned the pitch of note name "E" according to the held pitch, and the second vowel section can likewise be assigned the pitch of note name "G". In the consonant section from t1 to t2, however, the pitch varies widely and irregularly over a short time and cannot be detected clearly, so no note pitch (a pitch corresponding to one of the note names) can be specified directly. In the present invention, therefore, the note pitch of the consonant section is specified according to the pitch last detected in the first vowel section and held (the held pitch shown in FIG. 4(a); see step S9 above). In this example the consonant section is thus provisionally assigned the pitch of note name "E" (labeled "assumed pitch" in FIG. 4(b)).

Step S5 acquires chord information, for example entered by the user operating a keyboard or the like. Step S6 determines the pitches (target pitches) of the one or more harmony audio signals by referring to the applicable harmony table stored in the ROM 2 or the storage device 9, based on the acquired chord information and the note pitch of the input audio signal obtained above. For example, if the entered chord is "C major" and the harmony table of FIG. 2 is used, the target pitches in the first vowel section of FIG. 4 are "C+", "G", and "G-", and those in the second vowel section are "E+", "C+", and "C". Because the consonant section is provisionally assigned the pitch of note name "E" as described above, its target pitches in this embodiment are the same as those of the first vowel section: "C+", "G", and "G-".
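The table lookup of step S6, combined with the held note for the consonant section, can be sketched as below. The dictionary contains only the two "C major" rows quoted in the text; the full table of FIG. 2 is not reproduced here. The function name `targets_per_section` and the handling of "nothing held yet" are assumptions.

```python
# Partial harmony table: only the two "C major" rows quoted in the text.
HARMONY_TABLE = {
    "C major": {
        "E": ("C+", "G", "G-"),
        "G": ("E+", "C+", "C"),
    },
}


def targets_per_section(chord, section_notes):
    """Look up target pitches per section; None marks a pitch non-detection section."""
    held_note = None
    result = []
    for note in section_notes:
        if note is not None:
            held_note = note          # pitch detected: update the held note
        if held_note is None:
            result.append(None)       # nothing held yet (behavior assumed here)
        else:
            result.append(HARMONY_TABLE[chord][held_note])
    return result


# First vowel "E", consonant (no pitch), second vowel "G".
for targets in targets_per_section("C major", ["E", None, "G"]):
    print(targets)
```

The consonant section receives the same targets as the first vowel section because the lookup uses the held note "E".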

Step S7 compares the one or more target pitches determined in step S6 with the pitch of the input audio signal detected in step S1 to obtain their differences, and derives one or more pitch shift amounts from those differences. Step S8 pitch-shifts the input audio signal (more precisely, the stored one period of waveform element data) by the obtained pitch shift amounts, thereby generating one or more series of harmony audio signals at the target pitches.
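A pitch shift amount expressed in cents corresponds to a frequency ratio of 2^(cents/1200), since 1200 cents make one octave. The sketch below shows that conversion only. The function names are hypothetical, and the actual resynthesis from the stored waveform element data is not modeled.

```python
def cents_to_ratio(cents):
    """Frequency ratio corresponding to a pitch shift given in cents."""
    return 2.0 ** (cents / 1200.0)


def shifted_frequency(freq_hz, cents):
    """Frequency after shifting freq_hz by the given number of cents."""
    return freq_hz * cents_to_ratio(cents)


# Shift amounts quoted for the first vowel section, applied to E3 (about 164.81 Hz).
for cents in (800, 300, -900):
    print(cents, round(shifted_frequency(164.81, cents), 2))
```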

FIG. 4(c) shows with solid lines the pitch shift amounts obtained in step S7 for each section. For comparison, the thick lines in the consonant section show the conventional case in which the input audio signal is output as the harmony audio signal without pitch shifting, that is, the pitch shift amounts of all harmony series are temporarily set to "0". As FIG. 4(c) shows, the first vowel section, whose target pitches were determined to be "C+", "G", and "G-", has pitch shift amounts of "+800", "+300", and "-900", and the second vowel section, whose target pitches were determined to be "E+", "C+", and "C", has "+900", "+500", and "-700". In the consonant section (pitch non-detection section) between the two vowel sections, where the pitch shift amounts of all harmony series would conventionally be set temporarily to "0", the pitch shift amounts "+800", "+300", and "-900" applied to the input audio signal immediately before the consonant section are substantially maintained. The discontinuity in pitch shift amount relative to the first and second vowel sections is therefore smaller than in the conventional case.

Thus, in the present invention, even in a consonant section where the pitch is difficult to detect clearly, the pitch shift amounts needed to reach the target pitches are not temporarily set to "0" for all harmony series; instead, a pitch shift amount can be obtained (set) for each harmony series just as in a vowel section. Pitch-shifting the input audio signal (the stored waveform element data) accordingly in each section yields the three series of harmony audio signals shown in FIG. 4(d).

As FIG. 4(d) shows, the first harmony series conventionally moved through "C+", "E", "E+", whereas in this embodiment it moves through "C+", "C+", "E+". The second harmony series conventionally moved through "G", "E", "C+", whereas in this embodiment it moves through "G", "G", "C+". In the first and second harmony series, therefore, this embodiment has a smaller pitch discontinuity in the harmony audio signal than the conventional approach both at time t1, the transition from the first vowel section to the consonant section, and at time t2, the transition from the consonant section to the second vowel section.

In the third harmony series, by contrast, the signal conventionally moved through "G-", "E", "C", whereas in this embodiment it moves through "G-", "G-", "C". In this series the pitch discontinuity at time t2, the transition from the consonant section to the second vowel section, is larger in this embodiment than in the conventional approach. However, the pitch discontinuity that conventionally occurred at time t1 is eliminated, so a discontinuity arises only at time t2, and what was conventionally just the unprocessed input voice during the consonant section is now heard as part of a harmony sound with a clear sense of pitch. These advantages outweigh the effect of the increased discontinuity at time t2.
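The size of the jumps at t1 and t2 can be compared numerically for the three series. The sketch below assumes the input stays in octave region 3 (as in the "E3" example) and treats the conventional consonant section as the note "E", following the labeling in the text. The helper names and the MIDI-style numbering are assumptions.

```python
SEMITONE = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}


def to_midi(label, input_octave=3):
    """Convert a relative label ('C+', 'G', 'G-') to a MIDI-style note number."""
    name, octave = label, input_octave
    if label.endswith("+"):
        name, octave = label[:-1], input_octave + 1
    elif label.endswith("-"):
        name, octave = label[:-1], input_octave - 1
    return 12 * (octave + 1) + SEMITONE[name]


def jumps_cents(track):
    """Pitch jumps in cents at t1 and t2 for a [vowel 1, consonant, vowel 2] track."""
    notes = [to_midi(label) for label in track]
    return [100 * (b - a) for a, b in zip(notes, notes[1:])]


conventional = {1: ["C+", "E", "E+"], 2: ["G", "E", "C+"], 3: ["G-", "E", "C"]}
embodiment = {1: ["C+", "C+", "E+"], 2: ["G", "G", "C+"], 3: ["G-", "G-", "C"]}

for series in (1, 2, 3):
    print(series, jumps_cents(conventional[series]), jumps_cents(embodiment[series]))
```

Under these assumptions the third series goes from jumps of +900 and -400 cents (conventional) to 0 and +500 cents (embodiment). The jump at t2 grows by 100 cents while the one at t1 disappears, which is the trade-off the paragraph describes.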

As described above, in the present invention, when the pitch of the input audio signal can be detected, the detected pitch is held and the note pitch of the input audio signal (one of the pitches corresponding to a note name) is specified from it; when the pitch cannot be detected, the note pitch is specified from the pitch held immediately before. Because a harmony audio signal is generated by pitch-shifting the input audio signal, the note pitch of the input audio signal is needed in order to calculate the pitch shift amount. Specifying the note pitch from the most recently held pitch when detection fails makes the original pitch to be shifted definite and allows the pitch shift amount to be calculated. As a result, in sections where the pitch cannot be detected, the input audio signal is not simply output as it is; pitch discontinuity with the adjacent vowel sections is kept as small as possible and continuity of timbre is preserved. In other words, one or more harmony audio signals that are unlikely to give the user an unnatural, noisy impression can be generated easily, without preparing any special data separately.
An additional advantage of the processing described above is that, even if the chord is changed during a pitch non-detection section, a harmony audio signal with pitches corresponding to the new chord can be generated.

An example embodiment has been described above with reference to the drawings, but the present invention is not limited to it, and various other embodiments are of course possible. For example, the harmony sound generation process that implements the harmony sound generation function is not limited to that of the embodiment described above. FIG. 5 is a flowchart showing another embodiment of the harmony sound generation process.

Step S11 analyzes the digitized input audio signal and detects its pitch. Step S12 determines whether the pitch could be detected. If it could not (NO in step S12), the process jumps to step S18. If it could (YES in step S12), the note pitch of the input audio signal is obtained by quantizing the detected pitch to note units (step S13). Step S14 acquires chord information. Step S15 obtains the pitches (target pitches) of the one or more harmony audio signals by referring to the applicable harmony table stored in the ROM 2 or the storage device 9, based on the acquired chord information and the obtained note pitch of the input audio signal.

Step S16 compares the one or more target pitches obtained in step S15 with the pitch of the input audio signal detected in step S11 to obtain their differences, and derives one or more pitch shift amounts from those differences. Step S17 holds the obtained pitch shift amounts. Step S18 pitch-shifts the input audio signal by the most recently held pitch shift amounts, thereby generating harmony audio signals of one or more harmony series at the target pitches. In a section where the pitch of the input audio signal could not be detected, however, the harmony audio signals for that section are generated by pitch-shifting the input audio signal of the immediately preceding section in which the pitch could be detected.
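The control flow of FIG. 5 differs from FIG. 3 in what is held: the pitch shift amounts rather than the pitch. A minimal sketch follows, assuming a hypothetical `ShiftHolder` class and a caller-supplied `compute_shifts` function standing in for steps S13 to S16. The shift values in the example are the ones quoted for FIG. 4(c).

```python
class ShiftHolder:
    """Holds the most recently calculated pitch shift amounts (step S17)."""

    def __init__(self):
        self.held_shifts = None

    def process(self, detected_note, compute_shifts):
        """Return the shift amounts to apply in the current section (step S18)."""
        if detected_note is not None:
            # Step S12 YES: steps S13 to S16 run, then step S17 holds the result.
            self.held_shifts = compute_shifts(detected_note)
        # Step S12 NO: jump straight to step S18 with the held amounts.
        return self.held_shifts


calls = []


def compute_shifts(note):
    """Stand-in for steps S13 to S16, using the values quoted for FIG. 4(c)."""
    calls.append(note)
    return {"E": (800, 300, -900), "G": (900, 500, -700)}[note]


holder = ShiftHolder()
for note in ("E", None, "G"):
    print(note, holder.process(note, compute_shifts))
print("shift calculations performed:", len(calls))
```

No shift calculation runs for the consonant section, which illustrates the reduced processing load that this embodiment aims for.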

As in the embodiment described above (see FIG. 3), this reduces the discontinuity in pitch shift amount relative to the first and second vowel sections compared with the conventional approach, and one or more series of harmony audio signals can be obtained in the consonant section as well. Because this embodiment simply maintains the pitch shift amounts applied to the input audio signal immediately before the consonant section, there is no need to calculate the difference between the held pitch and the target pitch to obtain the pitch shift amount, which has the advantage of reducing the processing load accordingly.

The chord information entered for generating the harmony audio signals may, as described above, be detected from input information entered by user operation of a performance operator, such as a keyboard, on this apparatus or connected to it, or it may be obtained by entering chord names one after another. Alternatively, chord information may be acquired by playing back automatic performance data that contains chord name data following the progression of the song.
The embodiments described above determine the pitches of the harmony audio signals from chord information (more precisely, from a harmony table), but this is not a limitation; any other known method that determines those pitches without chord information may be used. For example, the pitch of a harmony audio signal may be set a predetermined interval away from the note pitch of the input audio signal (for example, a third above).

The embodiments described above generate three series of harmony audio signals simultaneously for the input audio signal, but this is not a limitation; more or fewer than three series may be generated simultaneously.
The pitch detection result of the input audio signal need not be used as it is to determine the pitches of the harmony audio signals; the pitches may instead be determined from a pitch-converted version of the detection result, for example one raised or lowered by a predetermined amount such as one octave or three semitones.
In the embodiments described above, the input audio signal from which the harmony audio signals are generated was described as a voice input via the microphone 8A, but it may instead be, for example, a musical instrument performance sound input via the microphone 8A. For an instrument performance sound, the added sound may be an accompaniment sound.

1: CPU; 2: ROM; 3: RAM; 4, 5: detection circuits; 4A: performance operator; 5A: setting operator; 6: display circuit; 6A: display; 7: tone generator and effects circuit; 7A: sound system; 8: A/D conversion circuit; 8A: microphone; 9: storage device; 10: communication interface; 1D: data and address bus

Claims

An electronic music apparatus that generates one or more harmony audio signals whose pitches are controlled to follow pitch variations of an input audio signal, comprising:
input means for inputting an audio signal;
input pitch specifying means for analyzing the input audio signal to detect its pitch and thereby specifying, for each predetermined section of the input audio signal, one of the pitches corresponding to note names;
harmony pitch determining means for determining pitches of one or more harmony sounds according to the specified pitch;
shift amount calculating means for calculating one or more pitch shift amounts based on the specified pitch and the determined pitches of the one or more harmony sounds; and
musical sound generating means for pitch-shifting the input audio signal for each predetermined section based on the calculated one or more pitch shift amounts, thereby generating one or more harmony audio signals controlled to the determined one or more pitches,
wherein, when the pitch of the input audio signal can be detected, the input pitch specifying means holds the detected pitch and specifies the pitch of the input audio signal based on that pitch, and, when the pitch of the input audio signal cannot be detected, specifies the pitch of the input audio signal based on the pitch held immediately before.

An electronic music apparatus that generates one or more harmony audio signals whose pitches are controlled to follow pitch variations of an input audio signal, comprising:
input means for inputting an audio signal;
input pitch specifying means for analyzing the input audio signal to detect its pitch and thereby specifying, for each predetermined section of the input audio signal, one of the pitches corresponding to note names;
harmony pitch determining means for determining pitches of one or more harmony sounds according to the specified pitch;
shift amount calculating means for calculating one or more pitch shift amounts based on the specified pitch and the determined pitches of the one or more harmony sounds;
shift amount holding means for holding the calculated one or more pitch shift amounts; and
musical sound generating means for pitch-shifting the input audio signal for each predetermined section based on the held one or more pitch shift amounts, thereby generating one or more harmony audio signals controlled to the determined one or more pitches,
wherein, when generating a harmony audio signal in a section in which the pitch of the input audio signal could not be detected, the musical sound generating means performs the pitch shift based on the one or more pitch shift amounts held in the immediately preceding section in which the pitch could be detected.

A program for generating one or more harmony audio signals whose pitches are controlled to follow pitch variations of an input audio signal, the program causing a computer to execute:
a procedure for inputting an audio signal;
a procedure for analyzing the input audio signal to detect its pitch and thereby specifying, for each predetermined section of the input audio signal, one of the pitches corresponding to note names, wherein, when the pitch of the input audio signal can be detected, the detected pitch is held and the pitch of the input audio signal is specified based on that pitch, and, when the pitch of the input audio signal cannot be detected, the pitch of the input audio signal is specified based on the pitch held immediately before;
a procedure for determining pitches of one or more harmony sounds according to the specified pitch;
a procedure for calculating one or more pitch shift amounts based on the specified pitch and the determined pitches of the one or more harmony sounds; and
a procedure for pitch-shifting the input audio signal for each predetermined section based on the calculated one or more pitch shift amounts, thereby generating one or more harmony audio signals controlled to the determined one or more pitches.

A program for generating one or more harmony audio signals whose pitches are controlled to follow pitch variations of an input audio signal, the program causing a computer to execute:
a procedure for inputting an audio signal;
a procedure for analyzing the input audio signal to detect its pitch and thereby specifying, for each predetermined section of the input audio signal, one of the pitches corresponding to note names;
a procedure for determining pitches of one or more harmony sounds according to the specified pitch;
a procedure for calculating one or more pitch shift amounts based on the specified pitch and the determined pitches of the one or more harmony sounds;
a procedure for holding the calculated one or more pitch shift amounts; and
a procedure for pitch-shifting the input audio signal for each predetermined section based on the held one or more pitch shift amounts, thereby generating one or more harmony audio signals controlled to the determined one or more pitches, wherein, when a harmony audio signal is generated in a section in which the pitch of the input audio signal could not be detected, the pitch shift is performed based on the one or more pitch shift amounts held in the immediately preceding section in which the pitch could be detected.