JP3613859B2

JP3613859B2 - Karaoke equipment

Info

Publication number: JP3613859B2
Application number: JP30304795A
Authority: JP
Inventors: 保夫蔭山
Original assignee: Yamaha Corp
Current assignee: Yamaha Corp
Priority date: 1995-11-21
Filing date: 1995-11-21
Publication date: 2005-01-26
Anticipated expiration: 2020-01-26
Also published as: JPH08227296A

Description

【０００１】
【発明の属する技術分野】
この発明は、歌唱等の旋律音声信号に対してハーモニー音声を付加する音声信号処理装置に関し、特に、複数の旋律音声信号が入力されたときに、このうち主旋律の音声信号のみに対してハーモニーを付加する音声信号処理装置に関する。
【０００２】
【従来の技術】
カラオケの歌唱を盛り上げるために、歌唱者の歌唱に対してハーモニー（たとえば、歌唱の旋律に対して３度上の旋律）の音声を付加して出力するものが提案されている。ハーモニー機能としては、歌唱音声信号をピッチシフトしてハーモニー音を生成するものが一般的である。
【０００３】
また、カラオケ曲のなかには、いわゆるデュエット曲など複数（二人）で歌唱するものもある。
【０００４】
【発明が解決しようとする課題】
しかし、上記デュエット曲の場合、二人の歌唱音声信号が混ざって入力されるが、従来のハーモニー付加機能を有するカラオケ装置では、この全ての歌唱音声に対してハーモニーをつけてしまうため、複数パートが混ざりあった不明瞭な歌唱になってしまい、カラオケ歌唱を盛り上げることができず、かえって、二人の歌唱音声信号を損なってしまう欠点があった。
【０００５】
この発明は、複数の歌唱音声信号が入力された場合でも、そのなかから主旋律のみを抽出してハーモニーを付加することのできるカラオケ装置を提供することを目的とする。
【０００６】
【課題を解決するための手段】
請求項１の発明は、楽曲データ再生手段と、基本周波数情報抽出手段と、主旋律選択手段と、主旋律分離手段と、ピッチシフト手段と、歌唱分析手段とを備えるカラオケ装置であって、
楽曲データ再生手段は、楽曲データを再生して伴奏信号を出力すると共に、楽曲データに含まれる主旋律情報を主旋律選択手段に、ハーモニー情報をピッチシフト手段と歌唱分析手段に出力し、
基本周波数情報抽出手段は、マイクロホンからの音声信号からパート毎の基本周波数情報を抽出し、
主旋律選択手段は、パート毎の基本周波数情報のうち主旋律情報に合った基本周波数情報を選択情報として出力し、
主旋律分離手段は、マイクロホンからの音声信号を周波数分析し複数のパート音声信号に分離すると共に、主旋律選択手段からの選択情報に基づいてパート音声信号のうちの１つを主旋律音声信号として出力し、
ピッチシフト手段は、主旋律音声信号をハーモニー情報に基づいてピッチシフトしてハーモニー音声信号を出力し、
歌唱分析手段は、基本周波数情報に基づいて音声信号中の歌唱パート数を分析し、歌唱パート数が０の場合は第１の処理を、歌唱パート数が１の場合は第２の処理を、歌唱パート数が複数であって、ハーモニー情報に対応するハーモニーパートが含まれている場合は第３の処理を行い、
第１の処理は、主旋律選択手段と、主旋律分離手段と、ピッチシフト手段とを休止させ、
第２の処理は、主旋律選択手段と、主旋律分離手段とを休止させ、主旋律音声信号に代え音声信号をピッチシフト手段に入力し、
第３の処理は、主旋律分離手段と、ピッチシフト手段とを休止させる
ことを特徴とする。
請求項２の発明は、楽曲データ再生手段と、基本周波数情報抽出手段と、主旋律選択手段と、主旋律分離手段と、ピッチシフト手段と、歌唱分析手段とを備えるカラオケ装置であって、
楽曲データ再生手段は、楽曲データを再生して伴奏信号を出力すると共に、楽曲データに含まれるハーモニー情報をピッチシフト手段と歌唱分析手段に出力し、
基本周波数情報抽出手段は、マイクロホンからの音声信号からパート毎の基本周波数情報を抽出し、
主旋律選択手段は、パート毎の基本周波数情報のうち周波数の高さが所定番目の基本周波数情報を選択情報として出力し、
主旋律分離手段は、マイクロホンからの音声信号を周波数分析し複数のパート音声信号に分離すると共に、主旋律選択手段からの選択情報に基づいてパート音声信号のうちの１つを主旋律音声信号として出力し、
ピッチシフト手段は、主旋律音声信号をハーモニー情報に基づいてピッチシフトしてハーモニー音声信号を出力し、
歌唱分析手段は、基本周波数情報に基づいて音声信号中の歌唱パート数を分析し、歌唱パート数が０の場合は第１の処理を、歌唱パート数が１の場合は第２の処理を、歌唱パート数が複数であって、ハーモニー情報に対応するハーモニーパートが含まれている場合は第３の処理を行い、
第１の処理は、主旋律選択手段と、主旋律分離手段と、ピッチシフト手段とを休止させ、
第２の処理は、主旋律選択手段と、主旋律分離手段とを休止させ、主旋律音声信号に代え音声信号をピッチシフト手段に入力し、
第３の処理は、主旋律分離手段と、ピッチシフト手段とを休止させる
ことを特徴とする。
【０００７】
上記発明の音声信号処理装置は、音声信号入力手段から複数の旋律音声信号を入力する。この装置はたとえばカラオケ装置に適用されるものであり、この場合には、音声信号入力手段は、歌唱用のマイクとそのマイクに接続されるアンプ等の機器となる。主旋律判定手段が、入力された複数の音声信号から主旋律の音声信号を判定する。主旋律の判定は、予め記憶しておいた主旋律情報に基づき、これに対応するものを主旋律と判定するようにしてもよく、また、あるルールに基づいて、例えば最高音の音声信号を主旋律情報とするなどのルールに基づいて判定してもよい。このようにして主旋律として判定された音声信号を前記入力された複数の音声信号から抽出する。複数の音声信号が別系統で入力されている場合には、そのうちの主旋律の系統を選択すればよく、複数の歌唱音声信号が１系統で入力されている場合には、そのなかから主旋律の基本倍音に該当する周波数成分のみを分離抽出するなどの方式で主旋律を抽出する。この抽出された主旋律の音声信号をピッチシフトしてハーモニー音声信号を生成する。ピッチシフトの方式は単純に読出クロックを変える方式もあり、また、フォルマントを移動させずに、周波数成分のみシフトする方式もある。
【０００８】
このようにして生成されたハーモニー音声信号を入力された複数の音声信号に合成することによってハーモニーを伴う音声信号を出力することができる。
【０００９】
【発明の実施の形態】
図面を参照してこの発明の実施形態であるカラオケ装置について説明する。このカラオケ装置は、いわゆる音源カラオケ装置である。音源カラオケ装置とは、楽曲データで音源装置を駆動することによりカラオケ演奏音を発生するカラオケ装置である。楽曲データとは、音高や発音タイミングを指定する演奏データ列などの複数トラックからなるシーケンスデータである。
【００１０】
また、このカラオケ装置は、歌唱者の歌唱音声信号に３度や５度の音程のハーモニー音声信号を付加するハーモニー付加機能を有している。ハーモニー音声信号は、歌唱者の歌唱音声をピッチシフトすることにより３度や５度などの音程を有する音声信号を生成し、これをハーモニー音声信号として出力するものである。さらに、このカラオケ装置は、デュエット曲で２人が同時に歌っているときでも、そのうちどちらが主旋律であるかを判断しその主旋律の歌唱音声信号のみに対してハーモニーを付加する。
【００１１】
図１は同カラオケ装置の要部のブロック図である。同図はカラオケ演奏音（伴奏音）および歌唱音の音声信号処理部のみを図示しており、歌詞や背景画像などの表示処理部や選曲部は従来より一般的な構成であるため図示を省略している。カラオケ演奏を行うための楽曲データはＨＤＤ１５に記憶されている。ＨＤＤ１５には楽曲データが数千曲分記憶されており、図示しない選曲部によってそのうちの１曲が選択されると、シーケンサ１４が該選択された楽曲データを読み込む。シーケンサ１４は読み込んだ楽曲データを記憶するメモリと、この楽曲データをテンポクロックに基づいて順次読み出すシーケンスプログラム処理部を有しており、読み出されたデータはそのトラックに応じて所定の処理部に出力される。
【００１２】
ここで、図２を参照して楽曲データの構成を説明する。同図（Ａ）において、楽曲データは、曲名やジャンル等が書き込まれたヘッダに続いて、楽音トラック，主旋律トラック，ハーモニートラック，歌詞トラック，音声トラック，効果トラックおよび音声データ部からなっている。このうち、主旋律トラックは同図（Ｂ）に示すように複数のイベントデータと各イベントデータ間の時間間隔を示すデュレーションデータΔｔからなるシーケンスデータで構成されている。シーケンサ１４は、カラオケ演奏時に所定のテンポクロックでΔｔをカウントし、このΔｔをカウントアップしたときこれに続くイベントデータを読み出す。読み出されたこの主旋律トラックのイベントデータは、主旋律選択用のデータとして後述の主旋律選択部２３に出力される。
【００１３】
主旋律トラック以外のトラック、すなわち、楽音トラック，ハーモニートラック，歌詞トラック，音声トラック，効果トラックも主旋律トラックと同様、複数のイベントデータおよびデュレーションデータ列からなるシーケンスデータで構成されている。楽音トラックは、カラオケ演奏用のメロディトラック，リズムトラック，コードトラックなどの複数のシーケンストラックで構成されている。カラオケ演奏時にシーケンサ１４がこの楽音トラックからイベントデータを読み出すと、そのデータを音源１６に出力する。ハーモニートラックは、主旋律に付加すべきハーモニー旋律を記憶したトラックであり、このイベントデータは、歌唱分析部２２やピッチシフト部２６に出力される。音源１６はこのデータに基づいて楽音信号を発生する。また、歌詞トラックは、画面に歌詞を表示するためのシーケンストラックである。シーケンサ１４がこの歌詞トラックのイベントデータが読み出したとき、これを図示しない表示制御部に出力する。表示制御部はこのイベントデータに基づいて歌詞の表示を制御する。音声トラックは、音源１６で合成することが困難なコーラス音声や合いの手などの人声信号の再生タイミングを指定するトラックである。人声信号は音声データとして音声データ部に複数記憶されている。カラオケ演奏中にシーケンサ１４が音声トラックのイベントデータを読み出したとき、そのイベントデータで指定される音声データを後述の加算部２８に出力する。これにより、この音声データがカラオケ演奏としてミキシングされる。効果トラックは音源１６に含まれる効果部（ＤＳＰで構成される）を制御するためのトラックである。効果部が付与する効果としてはリバーブなどがある。このイベントデータは音源１６に出力される。
【００１４】
音源１６は、シーケンサ１４から入力された楽音イベントデータに基づいて、そのデータで指定される音色，音高，音量の楽音信号を形成する。この楽音信号はＤＳＰ１３内の加算部２８に入力される。
【００１５】
一方、このカラオケ装置は、１本の歌唱用のマイク１０を有しており、デュエット曲などで二人が歌唱した場合、２人の歌唱音は該１本のマイク１０に入力される。マイク１０から入力された歌唱の音声信号はアンプ１１で増幅され、ＡＤＣ１２によってディジタル信号に変換される。このディジタル信号に変換された音声信号がＤＳＰ１３に入力される。ＤＳＰはマイクロプログラムによって種々の機能を実現するが、このＤＳＰ１３は、同図のブロックに示すような機能を実現するためのマイクロプログラムを記憶しており、前記ディジタル信号の１サンプリング周期にこの図示の機能を全て実行する速度でこのマイクロプログラムを実行している。
【００１６】
同図において、ＡＤＣ１２から入力されたディジタル音声信号は自己相関分析部２１およびディレイ２４，２７に入力される。自己相関分析部２１は、入力された音声信号の各周波数成分の繰り返し周期を分析し、この繰り返し周期から複数の歌唱者の歌唱音声信号の基本周波数を検出する。
【００１７】
図３は、前記自己相関分析部２における自己相関分析の手法を説明する図である。周期信号の自己相関関数も、信号と同じ周期の周期関数となることから、周期Ｐサンプルの信号の自己相関関数は信号の時間原点に無関係に、０，±Ｐ，±２Ｐ，…サンプル目に極大値に達する。そこで、自己相関関数の最初の極大点を見つけることで、その周期を推定することができる。同図において、極大値は整数倍でない複数の位置に現れており、これらが２人の歌唱者による異なる周波数の歌唱信号波の周期を示していることがわかる。これで、基本周波数が割り出される。自己相関分析部２１は、この２つの基本周波数を歌唱分析部２２および主旋律選択部２３に入力する。また、有声音は明確な周期波形になるのに対し、無声音はノイズ的な波形になるため、これにより有声音／無声音の識別をすることができる。この識別結果は歌唱分析部２２に入力される。
【００１８】
主旋律選択部２３は、シーケンサ１４から入力される主旋律情報（主旋律トラックのイベントデータ）に基づいて、自己相関分析部２１から入力された複数パートの可能音声信号の基本周波数のうちどれが主旋律であるかを割り出す。この選択情報は主旋律成分分離部２５に入力される。
【００１９】
一方、歌唱分析部２２では自己相関分析部２１から入力される基本周波数を含む分析情報に基づいて、現在の歌唱状態を分析する。歌唱状態とは、現在歌っている歌唱者の人数が０人（間奏等の無音区間）であるか、１人（ソロまたは掛け合い）であるか、２人以上（デュエット中）であるかの状態である。歌唱分析部２２はこれを判断し、さらに２人以上が歌っている場合に主旋律以外の音声信号がハーモニーになっていないかなどの状態を検出する。ハーモニーの検出はシーケンサから入力されるハーモニー情報（ハーモニートラックのイベントデータ）に基づいて判断される。また、主旋律の発声が有声音であるか無声音であるかも判定する。
【００２０】
歌唱分析部２２は、この判定結果に基づいて主旋律選択部２３や主旋律成分分離部２５の動作内容を制御する。歌唱状態が無音区間であると判断した場合には、主旋律選択も主旋律成分分析も不要であるため、主旋律選択部２３および主旋律成分分離部２５の動作をこの期間休止させる。また、２人のうち一方が主旋律を歌唱し、他方がそれに対するハーモニーを歌唱している場合には、敢えてこれに重ねてハーモニー音を生成する必要がないため、主旋律成分分離部２５を休止させる。主旋律分離部２５が動作を休止すると後段のピッチシフト部２６に入力される音声信号がないためピッチシフトによるハーモニー音の生成も休止することになる。
【００２１】
また、現在歌唱者の一方しか歌っていないことが検出された場合には、歌われている旋律が主旋律であることは明らかであるため、主旋律選択部２３の動作を休止させ、主旋律成分分離部２５に入力された歌唱音声信号をスキップさせるように指示する。これにより、１人の歌唱音声信号がディレイ２４から直接ピッチシフト部２６に入力される。
【００２２】
さらに、現在の主旋律の音声が有声音であるか無声音であるかで主旋律成分分離部２５の分離アルゴリズムを切り換える。すなわち、有声音の場合には、比較的単純に基音（基本周波数）の倍音で歌唱音声信号が構成されているため、この原則に基づいて主旋律成分分離を行う。一方、無声音の場合には、非線形なノイズ成分が多く含まれているため、上記有声音とは異なる手法で主旋律の歌唱音声信号を分離する。
【００２３】
主旋律成分分離部２５により分離された主旋律の音声信号、または、主旋律成分分離部２５をスキップした単独歌唱の音声信号はピッチシフト部２６に入力される。ピッチシフト部２６は、入力された音声信号をシーケンサ１４から入力されるハーモニー情報に基づいてピッチシフトし、ハーモニーの音声信号として加算部２８に出力する。
【００２４】
ここで、ピッチシフト部２６は、図４に示すように前段から入力された音声信号のフォルマント（周波数成分の包絡線）を保存し、そのフォルマントを構成する各周波数成分のみをピッチシフトする。ピッチシフトされた各周波数成分はそのピッチの包絡線に一致されるようにレベル調整される。これにより、音質を変えずに音高（周波数）のみをシフトすることができる。
【００２５】
図１において、加算部２８には、このハーモニー音声信号以外に前記音源１６から入力されるカラオケ演奏音、シーケンサ１４から直接入力されるコーラス音およびＡＤＣ１２からディレイ２７を介して直接入力される歌唱音声信号が入力される。加算部２８は、これらの歌唱音声信号，ハーモニー音声信号，カラオケ演奏音およびコーラス音を加算合成してステレオ信号にミキシングする。このミキシングされた音声信号はＤＳＰ１３から出力され、ＤＡＣ１７に入力される。ＤＡＣ１７はこのディジタルステレオ信号をアナログ信号に変換してアンプ１８に入力する。アンプ１８はこのアナログ信号を増幅してスピーカ１９から出力する。
【００２６】
なお、ＤＳＰ１３のブロック中に挿入されている２つのディレイ２４，２７は、信号処理自己相関分析部２１，歌唱分析部２２および主旋律選択部２３等における信号処理のための遅れ時間を吸収するためのものである。
【００２７】
このようにこのカラオケ装置では、１本のマイク１０から入力される複数（２人）の歌唱音声のうち何方が主旋律であるかを分析し、その主旋律のみにハーモニー歌唱を付加して出力するため、デュエット曲などを一緒に歌っていても、その主旋律に対してのみハーモニーを付加することができる。
【００２８】
図５はこの発明の他の実施形態であるカラオケ装置の要部のブロック図である。このカラオケ装置と図１に示した第１の実施形態のカラオケ装置との相違点は、このカラオケ装置が歌唱者の人数分のマイク（同図では２本）を備え、各歌唱者の歌唱音声信号が別系統でＤＳＰに入力される点である。カラオケ演奏用の楽曲データの記憶・読出部および歌唱音声信号とカラオケ演奏信号とが加算されたのちの信号系は上記第１の実施形態と同一であるため構成部に同一番号を付して説明を省略する。
【００２９】
デュエット用の２本のマイク３０，３１はそれぞれ別系統のアンプ３２，３３で増幅され、ＡＤＣ３４，３５でディジタル信号に変換されてＤＳＰ３６に入力される。ＤＳＰ３６において、第１の歌唱音声信号（マイク３０から入力された歌唱音声信号）は自己相関分析部４１および加算部４７に入力される。また、第２の歌唱音声信号（マイク３１から入力された歌唱音声信号）は自己相関分析部４２および加算部４４，４７に入力される。自己相関分析部４１，４２では、それぞれ第１の歌唱信号，第２の歌唱信号の基本周波数を分析する。この構成では、自己相関分析部４１，４２は、複数の歌唱音の基本周波数をぞれぞれ分離して分析する必要はない。分析結果は歌唱分析部４３に入力される。歌唱分析部４３は、入力された２人の歌唱音声信号の基本周波数およびシーケンサ１４から入力される主旋律情報，ハーモニー情報に基づいて、歌唱人数判定動作，主旋律選択動作，ハーモニー検出動作を実行する。すなわち、２人が同時に歌唱しているか否か、二人で歌っている場合、どちらが主旋律か、また、他方の歌唱音が主旋律のハーモニーになっているかなどを分析する。主旋律選択動作により主旋律が選択されると、それに対応するセレクト信号をセレクタ４５に入力する。セレクタ４５は主旋律として選択された歌唱音声信号をピッチシフタ４６に入力するべく接続を切り換える。ピッチシフタ４６は、入力された音声信号をシーケンサ１４から入力されるハーモニー情報に基づいてピッチシフトし、ハーモニー音声信号を生成する。
【００３０】
ハーモニー音声信号は加算部４９に入力される。加算部４９には、このハーモニー音声信号以外に前記音源１６から入力されるカラオケ演奏音、シーケンサ１４から直接入力されるコーラス音およびＡＤＣ１２から加算部４７−ディレイ４８を経て入力される歌唱音声信号が入力される。加算部４９は、これらの歌唱音声信号，ハーモニー音声信号，カラオケ演奏音およびコーラス音を加算合成してステレオ信号にミキシングする。このミキシングされた音声信号はＤＳＰ３６から出力され、ＤＡＣ１７に入力される。
【００３１】
なお、上記実施形態には、特許請求の範囲に記載した発明以外の発明も含まれており、この発明を特許請求の範囲の請求項１の発明の従属形式で記載すると以下のようになる。
【００３２】
〔請求項２〕前記音声信号入力手段は、複数の歌唱音声信号を１系統で入力する手段である請求項１に記載の音声信号処理装置。
【００３３】
〔請求項３〕前記主旋律判定手段は、入力された複数の音声信号のそれぞれの基本周波数を検出する手段と、検出された基本周波数と予め記憶されている主旋律情報とを比較して一致するものを主旋律と判定する手段とを含む請求項１に記載の音声信号処理装置。
【００３４】
〔請求項４〕前記主旋律抽出手段は、１系統で入力された複数の音声信号から主旋律の音声信号成分を分離抽出する手段である請求項２に記載の音声信号処理装置。
【００３５】
〔請求項５〕前記ハーモニー生成手段は、予め記憶されているハーモニー情報に基づいて前記主旋律の音声信号をピッチシフトする手段である請求項１に記載の音声信号処理装置。
【００３６】
〔請求項６〕主旋律以外の音声信号が該主旋律に対するハーモニーになっているか否かを検出するハーモニー検出手段と、該ハーモニー検出手段がハーモニーになっている音声信号を検出したとき前記ハーモニー生成手段を無効にする手段とを備えたことを特徴とする請求項１に記載の音声信号処理装置。
【００３７】
〔請求項７〕主旋律以外の音声信号が前記ハーモニー情報と一致するか否かを判定するハーモニー検出手段と、該ハーモニー検出手段がハーモニーになっている音声信号を検出したとき前記ハーモニー生成手段を無効にする手段とを備えたことを特徴とする請求項５に記載の音声信号処理装置。
【００３８】
また、この発明はカラオケのような歌唱音声以外にも楽器演奏にも適用することができる。
【００３９】
【発明の効果】
以上のようにこの発明によれば、複数の音声信号が入力されても、そのなかから主旋律を歌唱／演奏する音声信号を判定して抽出し、この音声信号のみにハーモニー音声を付加するため、主旋律を引き立てるハーモニーのみを付加することができ、主旋律でないものに対してハーモニーを付加して却って歌唱／演奏を損なうことがなくなる。
【００４０】
また、入力された複数の音声信号から主旋律を判定するため、主旋律が交代するような演奏であってもそのなかから主旋律を抽出することができる。
【図面の簡単な説明】
【図１】この発明の実施形態であるカラオケ装置の要部の構成図
【図２】同カラオケ装置の楽曲データの構成を示す図
【図３】入力された歌唱音声信号の自己相関分析を説明する図
【図４】音声信号のピッチシフトの手法を説明する図
【図５】この発明の他の実施形態であるカラオケ装置の要部の構成図
[0001]
BACKGROUND OF THE INVENTION
The present invention relates to an audio signal processing device that adds a harmony voice to a melody audio signal such as a singing voice, and in particular to an audio signal processing device that, when a plurality of melody audio signals are input, adds harmony only to the audio signal of the main melody.
[0002]
[Prior art]
To liven up karaoke singing, devices have been proposed that add a harmony voice (for example, a melody a third above the sung melody) to the singer's voice and output the result. The harmony function is generally implemented by pitch-shifting the singing voice signal to generate the harmony sound.
[0003]
Some karaoke songs, such as so-called duet songs, are sung by more than one person (two people).
[0004]
[Problems to be solved by the invention]
However, in the case of such a duet song, the singing voice signals of the two singers are input mixed together, and a conventional karaoke apparatus with a harmony addition function adds harmony to all of these singing voices. The result is unclear singing in which multiple parts are jumbled together. Rather than livening up the karaoke singing, this has the drawback of spoiling the two singers' voices.
[0005]
An object of the present invention is to provide a karaoke apparatus that, even when a plurality of singing voice signals are input, can extract only the main melody from among them and add harmony to it.
[0006]
[Means for Solving the Problems]
The invention of claim 1 is a karaoke apparatus comprising music data reproduction means, fundamental frequency information extraction means, main melody selection means, main melody separation means, pitch shift means, and singing analysis means,
The music data reproduction means reproduces the music data and outputs an accompaniment signal, outputs the main melody information included in the music data to the main melody selection means, and outputs the harmony information to the pitch shift means and the singing analysis means,
The fundamental frequency information extracting means extracts fundamental frequency information for each part from the audio signal from the microphone,
The main melody selection means outputs, as selection information, the fundamental frequency information that matches the main melody information from among the fundamental frequency information for each part,
The main melody separation means frequency-analyzes the sound signal from the microphone and separates it into a plurality of part sound signals, and outputs one of the part sound signals as the main melody sound signal based on the selection information from the main melody selection means,
The pitch shift means pitch-shifts the main melody audio signal based on the harmony information and outputs a harmony audio signal.
The singing analysis means analyzes the number of singing parts in the audio signal based on the fundamental frequency information. When the number of singing parts is 0, the first process is performed. When the number of singing parts is 1, the second process is performed. If the number of singing parts is plural and the harmony part corresponding to the harmony information is included, the third process is performed.
The first process pauses the main melody selection means, the main melody separation means, and the pitch shift means,
In the second process, the main melody selection means and the main melody separation means are paused, and a voice signal is input to the pitch shift means instead of the main melody voice signal,
The third process pauses the main melody separation means and the pitch shift means.
It is characterized by that.
The invention of claim 2 is a karaoke apparatus comprising music data reproduction means, fundamental frequency information extraction means, main melody selection means, main melody separation means, pitch shift means, and singing analysis means,
The music data reproduction means reproduces the music data and outputs an accompaniment signal, and outputs the harmony information included in the music data to the pitch shift means and the singing analysis means,
The fundamental frequency information extracting means extracts fundamental frequency information for each part from the audio signal from the microphone,
The main melody selection means outputs, as selection information, the fundamental frequency information whose frequency ranks at a predetermined position in pitch among the fundamental frequency information for each part,
The main melody separation means frequency-analyzes the sound signal from the microphone and separates it into a plurality of part sound signals, and outputs one of the part sound signals as the main melody sound signal based on the selection information from the main melody selection means,
The pitch shift means pitch-shifts the main melody audio signal based on the harmony information and outputs a harmony audio signal.
The singing analysis means analyzes the number of singing parts in the audio signal based on the fundamental frequency information. When the number of singing parts is 0, the first process is performed. When the number of singing parts is 1, the second process is performed. If the number of singing parts is plural and the harmony part corresponding to the harmony information is included, the third process is performed.
The first process pauses the main melody selection means, the main melody separation means, and the pitch shift means,
In the second process, the main melody selection means and the main melody separation means are paused, and a voice signal is input to the pitch shift means instead of the main melody voice signal,
The third process pauses the main melody separation means and the pitch shift means.
It is characterized by that.
[0007]
The audio signal processing apparatus of the above invention receives a plurality of melody audio signals from audio signal input means. The apparatus is applied to, for example, a karaoke apparatus, in which case the audio signal input means consists of a singing microphone and equipment such as an amplifier connected to it. Main melody determination means determines which of the input audio signals carries the main melody. The determination may be made on the basis of main melody information stored in advance, treating the signal that corresponds to it as the main melody, or on the basis of some rule, for example taking the highest-pitched audio signal as the main melody. The audio signal determined to be the main melody is then extracted from the input audio signals. If the audio signals are input on separate channels, it suffices to select the channel carrying the main melody. If several singing voice signals are input on a single channel, the main melody is extracted by a method such as separating out only the frequency components that correspond to the fundamental and overtones of the main melody. The extracted main melody audio signal is pitch-shifted to generate a harmony audio signal. Pitch shifting can be done simply by changing the readout clock, or by shifting only the frequency components without moving the formants.
[0008]
By mixing the harmony audio signal generated in this way with the plurality of input audio signals, an audio signal accompanied by harmony can be output.
[0009]
DETAILED DESCRIPTION OF THE INVENTION
A karaoke apparatus according to an embodiment of the present invention will be described with reference to the drawings. This karaoke apparatus is a so-called sound source karaoke apparatus. The sound source karaoke device is a karaoke device that generates a karaoke performance sound by driving the sound source device with music data. The music data is sequence data composed of a plurality of tracks such as a performance data string for designating pitches and sound generation timings.
[0010]
This karaoke apparatus also has a harmony addition function that adds a harmony voice signal at an interval such as a third or a fifth to the singer's singing voice signal. The harmony voice signal is produced by pitch-shifting the singer's singing voice to generate an audio signal at an interval such as a third or a fifth, which is then output as the harmony voice signal. Furthermore, even when two people are singing simultaneously in a duet song, this karaoke apparatus determines which of them is singing the main melody and adds harmony only to the singing voice signal of that main melody.
[0011]
FIG. 1 is a block diagram of the main part of the karaoke apparatus. The figure shows only the audio signal processing section for the karaoke performance sound (accompaniment) and the singing sound. The display processing section for lyrics and background images and the song selection section are omitted because their configuration is conventional. Music data for karaoke performance is stored in the HDD 15. The HDD 15 stores music data for several thousand songs, and when one of them is selected by a song selection unit (not shown), the sequencer 14 reads the selected music data. The sequencer 14 has a memory that stores the read music data and a sequence program processing unit that reads this music data out sequentially according to a tempo clock. The data read out is output to a predetermined processing unit according to its track.
[0012]
Here, the structure of the music data is described with reference to FIG. 2. In FIG. 2(A), the music data consists of a header in which the song title, genre, and the like are written, followed by a musical tone track, a main melody track, a harmony track, a lyrics track, an audio track, an effect track, and an audio data section. Of these, the main melody track consists, as shown in FIG. 2(B), of sequence data made up of a plurality of event data and duration data Δt indicating the time interval between successive event data. During karaoke performance, the sequencer 14 counts Δt with a predetermined tempo clock and, when the count for Δt is complete, reads out the event data that follows. The event data read from the main melody track is output to the main melody selection unit 23, described later, as data for main melody selection.
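The duration-then-event layout described above can be sketched as follows. This is only an illustration, assuming a track is a list of (Δt, event) pairs; the names `Track` and `schedule` and the event strings are hypothetical and do not come from the patent.

```python
from typing import List, Tuple

# Hypothetical track layout: each entry is (delta_ticks, event), where
# delta_ticks is the duration Δt to wait before the event is read out.
Track = List[Tuple[int, str]]


def schedule(track: Track) -> List[Tuple[int, str]]:
    """Return (absolute_tick, event) pairs in the order a sequencer would emit them."""
    now = 0
    timeline = []
    for delta_ticks, event in track:
        now += delta_ticks              # count Δt with the tempo clock
        timeline.append((now, event))   # then read out the event that follows
    return timeline


main_melody = [
    (0, "note_on C4"),
    (48, "note_off C4"),
    (0, "note_on E4"),
    (48, "note_off E4"),
]
timeline = schedule(main_melody)
```

Storing intervals rather than absolute times means the same track data can be played at any tempo simply by changing the rate of the tempo clock.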
[0013]
Like the main melody track, the other tracks (the musical tone track, harmony track, lyrics track, audio track, and effect track) consist of sequence data made up of a plurality of event data and duration data. The musical tone track comprises several sequence tracks for the karaoke performance, such as a melody track, a rhythm track, and a chord track. When the sequencer 14 reads event data from the musical tone track during karaoke performance, it outputs that data to the sound source 16, and the sound source 16 generates a musical tone signal based on it. The harmony track stores the harmony melody to be added to the main melody, and its event data is output to the singing analysis unit 22 and the pitch shift unit 26. The lyrics track is a sequence track for displaying lyrics on the screen. When the sequencer 14 reads event data from the lyrics track, it outputs the data to a display control unit (not shown), which controls the display of lyrics based on it. The audio track specifies the playback timing of human voice signals, such as chorus voices and interjected calls, that are difficult for the sound source 16 to synthesize. A plurality of such human voice signals are stored as audio data in the audio data section. When the sequencer 14 reads event data from the audio track during karaoke performance, it outputs the audio data designated by that event data to the adder 28, described later, so that the audio data is mixed into the karaoke performance. The effect track controls an effect unit (implemented by a DSP) included in the sound source 16. Effects applied by the effect unit include reverb. The effect track event data is output to the sound source 16.
[0014]
Based on the musical tone event data input from the sequencer 14, the sound source 16 forms a musical tone signal having a tone color, pitch, and volume specified by the data. This musical sound signal is input to the adder 28 in the DSP 13.
[0015]
This karaoke apparatus has a single singing microphone 10, and when two people sing, as in a duet song, both voices are input to that one microphone 10. The singing voice signal input from the microphone 10 is amplified by the amplifier 11 and converted into a digital signal by the ADC 12. The digitized audio signal is input to the DSP 13. A DSP realizes various functions through microprograms. The DSP 13 stores a microprogram that implements the functions shown as blocks in the figure and executes it fast enough to carry out all of the illustrated functions within one sampling period of the digital signal.
[0016]
In the figure, the digital audio signal input from the ADC 12 is input to the autocorrelation analyzer 21 and delays 24 and 27. The autocorrelation analyzer 21 analyzes the repetition period of each frequency component of the input sound signal, and detects the fundamental frequency of the singing sound signals of a plurality of singers from this repetition period.
[0017]
FIG. 3 illustrates the autocorrelation analysis method used in the autocorrelation analysis unit 21. Because the autocorrelation function of a periodic signal is itself periodic with the same period as the signal, the autocorrelation function of a signal with a period of P samples reaches a maximum at lags of 0, ±P, ±2P, ... samples, regardless of the time origin of the signal. The period can therefore be estimated by finding the first maximum of the autocorrelation function. In the figure, maxima appear at several positions that are not integer multiples of one another, which shows that they correspond to the periods of the singing signals of two singers at different frequencies. The fundamental frequencies are derived from these periods. The autocorrelation analysis unit 21 inputs the two fundamental frequencies to the singing analysis unit 22 and the main melody selection unit 23. In addition, because a voiced sound has a clearly periodic waveform whereas an unvoiced sound has a noise-like waveform, voiced and unvoiced sounds can be distinguished in this way. The result of this distinction is input to the singing analysis unit 22.
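The period estimate described above can be sketched for the simplest case of a single voice. This is an illustration under stated assumptions, not the patent's implementation: the function names, the 8 kHz sample rate, and the lag search window are all hypothetical, and the sketch picks the strongest peak inside the window rather than the first maximum. Separating two simultaneous voices, as in FIG. 3, would need additional peak-picking logic that is not shown.

```python
import math


def autocorrelation(x, max_lag):
    """Unnormalized autocorrelation r[k] = sum over n of x[n] * x[n + k], for k = 0..max_lag."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(max_lag + 1)]


def estimate_period(x, min_lag, max_lag):
    """Return the lag (in samples) of the strongest autocorrelation peak in [min_lag, max_lag]."""
    r = autocorrelation(x, max_lag)
    return max(range(min_lag, max_lag + 1), key=lambda k: r[k])


sample_rate = 8000   # assumed sample rate in Hz
true_f0 = 200.0      # test tone; its period is 8000 / 200 = 40 samples
signal = [math.sin(2 * math.pi * true_f0 * n / sample_rate) for n in range(800)]

period = estimate_period(signal, min_lag=20, max_lag=60)
estimated_f0 = sample_rate / period
```

Restricting the search to a plausible lag window keeps the zero-lag maximum and the later multiples of the period from being mistaken for the fundamental period.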
[0018]
Based on the main melody information (event data of the main melody track) input from the sequencer 14, the main melody selection unit 23 determines which of the fundamental frequencies of the plural parts input from the autocorrelation analysis unit 21 belongs to the main melody. This selection information is input to the main melody component separation unit 25.
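The selection step can be sketched as a nearest-pitch match. This is a minimal illustration that assumes the main melody information is available as a MIDI-style note number (the patent does not specify the encoding); `hz_to_midi`, `select_main_melody`, and `select_by_rank` are hypothetical names. The second function illustrates the alternative rule in claim 2, where the part at a predetermined pitch rank is chosen.

```python
import math


def hz_to_midi(freq_hz):
    """Convert a frequency in Hz to a fractional MIDI note number (A4 = 440 Hz = note 69)."""
    return 69 + 12 * math.log2(freq_hz / 440.0)


def select_main_melody(detected_hz, melody_note):
    """Return the index of the detected fundamental closest in pitch to the reference note."""
    return min(
        range(len(detected_hz)),
        key=lambda i: abs(hz_to_midi(detected_hz[i]) - melody_note),
    )


def select_by_rank(detected_hz, rank=0):
    """Return the index of the fundamental at the given pitch rank (0 = highest)."""
    order = sorted(range(len(detected_hz)), key=lambda i: detected_hz[i], reverse=True)
    return order[rank]


detected = [262.0, 330.5]                                    # roughly C4 and E4
by_reference = select_main_melody(detected, melody_note=64)  # reference note E4
by_highest = select_by_rank(detected, rank=0)
```

Comparing on a logarithmic pitch scale rather than in Hz keeps the tolerance the same in every octave.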
[0019]
Meanwhile, the singing analysis unit 22 analyzes the current singing state based on the analysis information, including the fundamental frequencies, input from the autocorrelation analysis unit 21. The singing state indicates whether the number of singers currently singing is zero (a silent section such as an interlude), one (a solo or a call-and-response passage), or two or more (a duet passage). The singing analysis unit 22 makes this determination and, when two or more people are singing, also detects conditions such as whether an audio signal other than the main melody already forms a harmony. Harmony is detected on the basis of the harmony information (event data of the harmony track) input from the sequencer. The unit also determines whether the main melody is being sung as a voiced or an unvoiced sound.
[0020]
The singing analysis unit 22 controls the operation of the main melody selection unit 23 and the main melody component separation unit 25 based on this determination. If it determines that the singing state is a silent section, neither main melody selection nor main melody component analysis is needed, so it suspends the main melody selection unit 23 and the main melody component separation unit 25 for that period. If one of the two singers is singing the main melody and the other is singing the harmony to it, there is no need to generate a further harmony sound on top, so it suspends the main melody component separation unit 25. When the main melody component separation unit 25 is suspended, no audio signal reaches the subsequent pitch shift unit 26, so generation of the harmony sound by pitch shifting also stops.
[0021]
If it is detected that only one of the singers is currently singing, the melody being sung is clearly the main melody, so the singing analysis unit 22 suspends the main melody selection unit 23 and instructs the main melody component separation unit 25 to pass the input singing voice signal through without processing. As a result, the single singer's voice signal is input from the delay 24 directly to the pitch shift unit 26.
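The control decisions described in paragraphs [0019] to [0021] can be summarized as a small decision function. This is a sketch only: the `Control` record, its field names, and the `analyze` function are hypothetical, and the silent-section case also disables the pitch shifter, following the first process of claim 1.

```python
from dataclasses import dataclass


@dataclass
class Control:
    run_selection: bool      # main melody selection unit 23
    run_separation: bool     # main melody component separation unit 25
    run_pitch_shift: bool    # pitch shift unit 26
    bypass_separation: bool  # feed the unseparated voice straight to the pitch shifter


def analyze(num_parts: int, harmony_sung: bool) -> Control:
    """Map the detected singing state to the set of processing blocks that should run."""
    if num_parts == 0:
        # Silent section: nothing to select, separate, or shift.
        return Control(False, False, False, False)
    if num_parts == 1:
        # Solo: the sung melody is the main melody, so skip selection and separation.
        return Control(False, False, True, True)
    if harmony_sung:
        # A singer already provides the harmony: do not generate another one.
        return Control(True, False, False, False)
    # Two or more parts without a sung harmony: run the full chain.
    return Control(True, True, True, False)
```

For example, `analyze(2, harmony_sung=False)` enables selection, separation, and pitch shifting, which is the duet case the apparatus is designed for.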
[0022]
Further, the separation algorithm of the main melody component separation unit 25 is switched depending on whether the main melody is currently a voiced or an unvoiced sound. In the case of a voiced sound, the singing voice signal consists, relatively simply, of the fundamental (fundamental frequency) and its harmonics, so main melody component separation is performed on that principle. In the case of an unvoiced sound, the signal contains many nonlinear noise components, so the main melody singing voice signal is separated by a method different from that used for voiced sounds.
[0023]
The main melody audio signal separated by the main melody component separation unit 25, or the solo singing voice signal that bypassed the main melody component separation unit 25, is input to the pitch shift unit 26. The pitch shift unit 26 pitch-shifts the input audio signal based on the harmony information input from the sequencer 14 and outputs the result to the adder 28 as the harmony audio signal.
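One way to turn the harmony information into a shift amount is to take the ratio between the target harmony pitch and the sung pitch. The sketch below assumes the harmony information is a MIDI-style note number and uses equal temperament; both are assumptions, and `midi_to_hz` and `shift_ratio` are hypothetical names.

```python
def midi_to_hz(note):
    """Equal-tempered frequency in Hz of a MIDI note number (A4 = note 69 = 440 Hz)."""
    return 440.0 * 2 ** ((note - 69) / 12)


def shift_ratio(sung_hz, harmony_note):
    """Frequency ratio that moves the sung pitch onto the target harmony note."""
    return midi_to_hz(harmony_note) / sung_hz


# A singer on C4 (note 60) with a harmony note of E4 (note 64), a major third above.
ratio = shift_ratio(midi_to_hz(60), 64)
```

Deriving the ratio from the detected sung pitch, rather than from a fixed interval, would keep the generated harmony on the intended note even if the singer is slightly off pitch.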
[0024]
Here, as shown in FIG. 4, the pitch shift unit 26 preserves the formant (the envelope of the frequency components) of the audio signal input from the preceding stage and pitch-shifts only the individual frequency components that make up that formant. The level of each pitch-shifted frequency component is adjusted so that it matches the envelope at its new frequency. This makes it possible to shift only the pitch (frequency) without changing the tone quality.
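The formant-preserving shift can be illustrated on a toy spectrum. This sketch assumes the voice frame is already represented as a list of (frequency, amplitude) harmonic peaks and approximates the envelope by linear interpolation between those peaks; the representation, the interpolation, and the function names are assumptions, not details given in the patent.

```python
from bisect import bisect_left


def envelope_at(freq, peaks):
    """Linearly interpolate the spectral envelope defined by sorted (freq, amp) peaks."""
    freqs = [f for f, _ in peaks]
    if freq <= freqs[0]:
        return peaks[0][1]
    if freq >= freqs[-1]:
        return peaks[-1][1]
    j = bisect_left(freqs, freq)
    (f_lo, a_lo), (f_hi, a_hi) = peaks[j - 1], peaks[j]
    return a_lo + (a_hi - a_lo) * (freq - f_lo) / (f_hi - f_lo)


def shift_preserving_formant(peaks, ratio):
    """Move every harmonic by `ratio` and re-level it to the original envelope."""
    return [(f * ratio, envelope_at(f * ratio, peaks)) for f, _ in peaks]


# Toy harmonic spectrum of a 200 Hz voice, shifted up by a ratio of 1.25.
harmonics = [(200.0, 1.0), (400.0, 0.5), (600.0, 0.8), (800.0, 0.2)]
shifted = shift_preserving_formant(harmonics, ratio=1.25)
```

Because the amplitudes are re-read from the original envelope instead of travelling with the harmonics, the spectral peaks stay where they were, which is what keeps the voice from sounding unnatural after the shift.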
[0025]
In FIG. 1, in addition to this harmony audio signal, the adder 28 receives the karaoke performance sound from the sound source 16, the chorus sound input directly from the sequencer 14, and the singing voice signal input from the ADC 12 via the delay 27. The adder 28 adds these singing voice signals, harmony audio signals, karaoke performance sounds, and chorus sounds together and mixes them into a stereo signal. The mixed audio signal is output from the DSP 13 and input to the DAC 17. The DAC 17 converts this digital stereo signal into an analog signal and inputs it to the amplifier 18. The amplifier 18 amplifies the analog signal and outputs it from the speaker 19.
[0026]
The two delays 24 and 27 inserted among the blocks of the DSP 13 absorb the delay introduced by signal processing in the autocorrelation analysis unit 21, the singing analysis unit 22, the main melody selection unit 23, and so on.
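The role of such a compensating delay can be sketched with a fixed-length sample delay line. This is a generic illustration; the class name and the choice of a delay measured in whole samples are assumptions.

```python
from collections import deque


class DelayLine:
    """Delay a sample stream by a fixed number of samples, starting from silence."""

    def __init__(self, delay_samples):
        # Pre-fill with zeros so the first outputs are silence.
        self._buffer = deque([0.0] * delay_samples)

    def process(self, sample):
        """Push one input sample and return the sample from `delay_samples` steps earlier."""
        self._buffer.append(sample)
        return self._buffer.popleft()


delay = DelayLine(delay_samples=3)
outputs = [delay.process(s) for s in [1.0, 2.0, 3.0, 4.0, 5.0]]
```

Delaying the direct voice path by the same amount as the analysis latency keeps the original voice and the generated harmony aligned when they are mixed.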
[0027]
In this way, the karaoke apparatus analyzes which of the plural (two) singing voices input from the single microphone 10 is the main melody and adds the harmony voice only to that main melody before output. Even when a duet song or the like is sung together, harmony is therefore added only to the main melody.
[0028]
FIG. 5 is a block diagram of the main part of a karaoke apparatus according to another embodiment of the present invention. This karaoke apparatus differs from that of the first embodiment shown in FIG. 1 in that it has one microphone per singer (two in the figure) and each singer's singing voice signal is input to the DSP on a separate channel. The storage and readout section for the karaoke music data and the signal path after the singing voice signals and the karaoke performance signal have been added together are the same as in the first embodiment, so those components are given the same reference numerals and their description is omitted.
[0029]
The signals from the two duet microphones 30 and 31 are amplified by separate amplifiers 32 and 33, converted into digital signals by the ADCs 34 and 35, and input to the DSP 36. In the DSP 36, the first singing voice signal (the signal input from the microphone 30) is input to the autocorrelation analysis unit 41 and the adder 47. The second singing voice signal (the signal input from the microphone 31) is input to the autocorrelation analysis unit 42 and the adders 44 and 47. The autocorrelation analysis units 41 and 42 analyze the fundamental frequencies of the first and second singing voice signals, respectively. In this configuration, neither autocorrelation analysis unit needs to separate the fundamental frequencies of several singing voices. The analysis results are input to the singing analysis unit 43. Based on the fundamental frequencies of the two singing voice signals and the main melody information and harmony information input from the sequencer 14, the singing analysis unit 43 performs the singer count determination, main melody selection, and harmony detection operations. That is, it analyzes whether two people are singing at the same time and, if so, which of them is singing the main melody and whether the other voice forms a harmony to the main melody. When the main melody has been selected by the main melody selection operation, a corresponding select signal is input to the selector 45. The selector 45 switches its connection so that the singing voice signal selected as the main melody is input to the pitch shifter 46. The pitch shifter 46 pitch-shifts the input audio signal based on the harmony information input from the sequencer 14 to generate a harmony audio signal.
[0030]
The harmony audio signal is input to the adder 49. In addition to the harmony audio signal, the adder 49 receives the karaoke performance sound input from the sound source 16, the chorus sound input directly from the sequencer 14, and the singing voice signal input from the ADC 12 via the adder 47 and the delay 48. The adder 49 adds and synthesizes these singing voice signals, harmony audio signals, karaoke performance sounds, and chorus sounds, and mixes them into a stereo signal. The mixed audio signal is output from the DSP 36 and input to the DAC 17.
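The mixing path described above can be sketched as follows. This is a simplified illustration only: it mixes to a single channel rather than a stereo signal, the function name and the `delay_samples` parameter are hypothetical, and the text does not state the length of the delay 48 or the reason for it.

```python
def mix_outputs(voice_a, voice_b, harmony, accompaniment, chorus, delay_samples):
    """Mono sketch of adder 47 -> delay 48 -> adder 49.
    All inputs are equal-length lists of samples."""
    n = len(voice_a)
    # Adder 47: sum the two singing voice signals.
    summed = [a + b for a, b in zip(voice_a, voice_b)]
    # Delay 48: shift the summed voices later by delay_samples (assumed value).
    delayed = ([0.0] * delay_samples + summed)[:n]
    # Adder 49: add the harmony, karaoke performance sound, and chorus sound.
    return [
        d + h + k + c
        for d, h, k, c in zip(delayed, harmony, accompaniment, chorus)
    ]
```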
[0031]
The above embodiments include inventions other than those recited in the claims. These inventions are set out below in the form of claims dependent on claim 1.
[0032]
[Claim 2] The audio signal processing apparatus according to claim 1, wherein the audio signal input means is a means for inputting a plurality of singing audio signals in one system.
[0033]
[Claim 3] The audio signal processing apparatus according to claim 1, wherein the main melody determination means comprises means for detecting the fundamental frequency of each of a plurality of input audio signals, and means for determining the main melody by comparing the detected fundamental frequencies with main melody information stored in advance.
[0034]
[Claim 4] The audio signal processing apparatus according to claim 2, wherein the main melody extraction means is means for separating and extracting a main melody audio signal component from a plurality of audio signals input in one system.
[0035]
[Claim 5] The audio signal processing apparatus according to claim 1, wherein the harmony generation means is means for pitch-shifting the audio signal of the main melody based on previously stored harmony information.
[0036]
[Claim 6] The audio signal processing apparatus according to claim 1, further comprising harmony detection means for detecting whether or not an audio signal other than the main melody is in harmony with the main melody, and means for disabling the harmony generation means when the harmony detection means detects an audio signal in harmony.
[0037]
[Claim 7] The audio signal processing apparatus according to claim 5, further comprising harmony detection means for determining whether or not an audio signal other than the main melody matches the harmony information, and means for disabling the harmony generation means when the harmony detection means detects an audio signal in harmony.
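The interaction between harmony generation (claim 5) and harmony detection (claims 6 and 7) can be sketched in Python as below. This is an illustrative sketch under stated assumptions, not the disclosed circuit: the function names are hypothetical, the melody and harmony information are assumed to be available as frequencies in hertz, the 50-cent matching tolerance is an arbitrary choice, and the shift ratio is assumed to be the ratio of the stored harmony pitch to the stored melody pitch.

```python
import math

def harmony_already_sung(other_f0s, harmony_hz, tolerance_cents=50.0):
    """Return True when any voice other than the main melody lies within
    tolerance_cents of the pitch given by the harmony information."""
    for f0 in other_f0s:
        if f0 is None:
            continue  # that voice is silent
        if abs(1200.0 * math.log2(f0 / harmony_hz)) <= tolerance_cents:
            return True
    return False

def plan_harmony(main_f0, other_f0s, melody_hz, harmony_hz):
    """Return the pitch-shift ratio to apply to the main melody signal, or
    None when harmony generation is disabled for this frame."""
    if main_f0 is None:
        return None  # no main melody voice to shift
    if harmony_already_sung(other_f0s, harmony_hz):
        return None  # a singer is already providing the harmony part
    return harmony_hz / melody_hz
```

Returning `None` here stands for disabling the harmony generation means; a caller would skip the pitch shifter for that frame.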
[0038]
Further, the present invention can be applied to musical instrument performance in addition to singing voice such as karaoke.
[0039]
[Effect of the invention]
As described above, according to the present invention, even when a plurality of audio signals are input, the audio signal singing or playing the main melody is identified and extracted from among them, and harmony audio is added only to that signal. As a result, only harmony that enhances the main melody is added, and the singing or performance is not impaired by harmony being added to parts other than the main melody.
[0040]
In addition, since the main melody is determined from among the plurality of input audio signals, the main melody can be extracted even from a performance in which the part carrying the main melody alternates.
[Brief description of the drawings]
FIG. 1 is a block diagram of the main part of a karaoke apparatus according to an embodiment of the present invention.
FIG. 2 is a block diagram showing the structure of music data of the karaoke apparatus.
FIG. 4 is a diagram for explaining a pitch shift method of an audio signal.
FIG. 5 is a block diagram of the main part of a karaoke apparatus according to another embodiment of the present invention.

Claims

A karaoke apparatus comprising music data reproduction means, fundamental frequency information extraction means, main melody selection means, main melody separation means, pitch shift means, and singing analysis means, wherein
the music data reproduction means reproduces music data and outputs an accompaniment signal, and outputs main melody information included in the music data to the main melody selection means and harmony information to the pitch shift means and the singing analysis means,
the fundamental frequency information extraction means extracts fundamental frequency information for each part from the audio signal from a microphone,
the main melody selection means outputs, as selection information, the fundamental frequency information that matches the main melody information from among the fundamental frequency information for each part,
the main melody separation means frequency-analyzes the audio signal from the microphone and separates it into a plurality of part audio signals, and outputs one of the part audio signals as a main melody audio signal based on the selection information from the main melody selection means,
the pitch shift means pitch-shifts the main melody audio signal based on the harmony information and outputs a harmony audio signal,
the singing analysis means analyzes the number of singing parts in the audio signal based on the fundamental frequency information, and performs a first process when the number of singing parts is 0, a second process when the number of singing parts is 1, and a third process when the number of singing parts is plural and a harmony part corresponding to the harmony information is included,
the first process pauses the main melody selection means, the main melody separation means, and the pitch shift means,
the second process pauses the main melody selection means and the main melody separation means, and inputs the audio signal to the pitch shift means in place of the main melody audio signal, and
the third process pauses the main melody separation means and the pitch shift means.

A karaoke apparatus comprising music data reproduction means, fundamental frequency information extraction means, main melody selection means, main melody separation means, pitch shift means, and singing analysis means, wherein
the music data reproduction means reproduces music data and outputs an accompaniment signal, and outputs harmony information included in the music data to the pitch shift means and the singing analysis means,
the fundamental frequency information extraction means extracts fundamental frequency information for each part from the audio signal from a microphone,
the main melody selection means outputs, as selection information, the fundamental frequency information whose frequency ranks at a predetermined position in order of height among the fundamental frequency information for each part,
the main melody separation means frequency-analyzes the audio signal from the microphone and separates it into a plurality of part audio signals, and outputs one of the part audio signals as a main melody audio signal based on the selection information from the main melody selection means,
the pitch shift means pitch-shifts the main melody audio signal based on the harmony information and outputs a harmony audio signal,
the singing analysis means analyzes the number of singing parts in the audio signal based on the fundamental frequency information, and performs a first process when the number of singing parts is 0, a second process when the number of singing parts is 1, and a third process when the number of singing parts is plural and a harmony part corresponding to the harmony information is included,
the first process pauses the main melody selection means, the main melody separation means, and the pitch shift means,
the second process pauses the main melody selection means and the main melody separation means, and inputs the audio signal to the pitch shift means in place of the main melody audio signal, and
the third process pauses the main melody separation means and the pitch shift means.