JP3608674B2

JP3608674B2 - Score recognition device

Info

Publication number: JP3608674B2
Application number: JP27514295A
Authority: JP
Inventors: 誠至中野; 錬澄田; 鉄夫日野; 厚始大場
Original assignee: Kawai Musical Instrument Manufacturing Co Ltd
Current assignee: Kawai Musical Instrument Manufacturing Co Ltd
Priority date: 1995-09-29
Filing date: 1995-09-29
Publication date: 2005-01-12
Anticipated expiration: 2015-09-29
Also published as: JPH0997061A

Description

【０００１】
【発明の属する技術分野】
本発明は楽譜認識装置に関し、特に、楽譜の段落やパートを認識し、楽譜情報を適切に時系列化することが可能な楽譜認識装置に関するものである。
【０００２】
【従来の技術】
従来の楽譜認識装置においては、例えばスキャナによって読み込んだ楽譜画像データについて、五線、音符や各種記号を認識して、ＭＩＤＩファイルデータ等の演奏データを生成するものがあった。そして、このような楽譜認識装置では、スコア譜を認識する場合には、全ての段落でパート数が固定されているものを対象にしていた。この場合、認識できたパートを各段落内の五線に上から順に対応させればデータの時系列化が完了する。
【０００３】
【発明が解決しようとする課題】
実際のスコア譜では、全ての段落でパート数が等しいものだけでなく、ある段落からのパートの追加、パートの省略、パートの融合（単独譜表２つがある段落から大譜表になる）、パートの分離（その逆）、譜表の変更（大譜表←→単独譜表）などが行われる場合がある。このような場合には、段落内の各五線にパートを上から順に対応させるだけでは、データの時系列化を正しく行うことができないという問題点があった。
本発明の目的は、前記のような従来技術の問題点を解決し、スコア譜における段落の分離、パートの対応を認識し、楽譜認識したデータをトラックごと時系列化することが可能な楽譜認識装置を提供することにある。
【０００４】
【課題を解決するための手段】
本発明は、矩形領域毎に読み取られた楽譜画像データから、楽譜の各種記号を認識する楽譜認識装置において、五線を検出する五線検出手段と、矩形領域毎に読み取られた楽譜画像データから、隣接する五線間に延びる段落線、最上部の五線から上方へ延びる段落線および最下部の五線から下方へ延びる段落線を検出する段落線検出手段とを備え、上下方向で隣接する上記矩形領域毎に読み取られた楽譜画像データにおいて、上側の矩形領域内で読み取られた楽譜画像データの最下部の五線から下方へ延びる段落線が検出され、下側の矩形領域内で読み取られた楽譜画像データの最上部の五線から上方へ延びる段落線が検出された場合には、上記最下部の五線と上記最上部の五線は１つの段落に含まれると認識することを特徴とする。
また、矩形領域毎に読み取られた楽譜画像データから、複数の五線間の段落線の有無を検出する段落検出手段と、それぞれの段落線の左側で、大かっこの有無を検出する大かっこ検出手段とを備え、上記大かっこ検出手段により大かっこが有りと検出された場合には、上記大かっこで括られる複数の五線を１つのパートと認識することを特徴とする。
更に、楽譜画像データから、楽譜に記載されたパート名を認識するパート名認識手段と、上記パート名認識手段により認識されたパート名に基づき、段落間におけるパートの対応をとるパート対応検出手段とを備えたことを特徴とする。
【０００５】
本発明はこのような構成により、五線を各段落毎に的確にグループ化でき、更に段落間において各パートの対応を取ることができる。従って、楽譜に記載されている音符などを適切なタイミングおよび音色の演奏情報に自動的に変換することが可能である。
【０００６】
【発明の実施の形態】
以下、本発明の実施の形態を図面を参照して詳細に説明する。図１は本発明の楽譜認識装置の１実施例の構成を示すブロック図である。この装置は、パソコン等の一般的な計算機システムにスキャナやＭＩＤＩインターフェース回路を付加したものである。ＣＰＵ１は、ＲＯＭ２あるいはＲＡＭ３に格納されるプログラムに基づき、楽譜認識装置全体の制御を行う中央処理装置である。また、予め設定された所定の周期でＣＰＵ１に割り込みをかけるタイマ回路を内蔵している。ＲＡＭ３はプログラムエリアの他、画像データバッファ、ワークエリア等として使用される。ハードディスク装置ＨＤＤ４およびフロッピディスク装置ＦＤＤ５は、プログラムおよび画像データ、演奏データ等を格納する。ＣＲＴ６はＣＰＵ１の制御に基づき、ＣＲＴインターフェース回路７から出力される映像情報を表示し、キーボード８から入力された情報は、キーボードインターフェース回路９を経てＣＰＵ１に取り込まれる。プリンタ１０は、ＣＰＵ１の制御に基づき、プリンタインターフェース回路１１から出力される印字情報を印字する。
【０００７】
スキャナ１２は、（印刷された）楽譜を光学的に走査して、２値あるいはグレイスケールの画像データに変換するものであり、フラットベッド型、ハンディ型、フィーダー型等任意のタイプのものを使用できる。スキャナ１２によって読み取られた画像情報は、スキャナインターフェース回路１３を介して、ＲＡＭ３あるいはＨＤＤ４に取り込まれる。ＭＩＤＩインターフェース回路１４は、音源モジュール等の外部のＭＩＤＩ機器との間でＭＩＤＩデータの送受信を行う回路である。バス１５は楽譜認識装置内の各回路を接続している。なお、この他にマウス等のポインティングデバイス、ＲＳ２３２Ｃ等のシリアルインターフェース回路等を備えていてもよい。
【０００８】
図３は、ＣＰＵ１のメイン処理を示すフローチャートである。Ｓ１においては、スキャナ１２によって楽譜のイメージをＲＡＭ３に取り込む。画像は２値の画像として取り込む。Ｓ２においては、かすれやドットノイズなどを軽減するために、図形融合などの画質平滑化処理を行う。Ｓ３においては、画質チェック処理を行う。画質チェック処理においては、倍率と濃度の情報を得ると共に、後段における五線検出の基準データを得るために、まず五線の線幅と五線の各線間の間幅を検出する。線幅、間幅を求めるためには、まず、画像上の横（ｘ）方向の数箇所において縦（ｙ）方向に走査し、黒ラン（連続する黒画素）と白ランの長さを全て求めて、長さ毎に頻度分布（ヒストグラム）データを作成する。
【０００９】
楽譜上で最も多い記号は五線であるので、作成された黒ラン長ヒストグラムと、白ラン長ヒストグラムのピークをそれぞれ検出することで、五線の線幅、間幅が推定できる。そして、画像データの倍率は、例えば間幅から推定可能であり、また、濃度は線幅と間幅の比から推定することができる。楽譜の認識処理においては、倍率および濃度が所定の範囲から外れると認識率が低下してしまうので、Ｓ３においては、これらの値が、所定の範囲内に入っているか否かがチェックされる。Ｓ４においては、Ｓ３におけるチェック結果が画質ＯＫであるか否かが判定され、結果がＯＫでない場合にはＳ１に戻って、倍率や濃度を変えて再取り込みを行う。
【００１０】
Ｓ５においては、後述する五線認識を行う。五線認識処理は、大きく五線領域分離処理、五線走査開始位置検出処理、および五線シフト量の検出処理に分かれる。Ｓ６においては、後述する段落認識処理を行う。この処理は、大きく、段落認識処理と、大かっこ認識処理に分かれる。
【００１１】
Ｓ７においては、段落の認識結果を表示して、段落認識結果が正しいか否かを利用者にチェックさせることにより、ＯＫか否かが判定され、結果がＯＫでない場合にはＳ８に移行して、段落認識結果の修正が行われる。スコア譜においては、各段落のパート構成が等しいものの他に、途中でパートの省略や追加があったり、同じパートで単独譜表と大譜表が段落ごとに変化する場合もある。このようなパートの対応は、大かっこの対比等で行う。これは、全段落中から２つの段落を抜き出し、それぞれの段落の大譜表同士の対応の組み合わせの中から、大譜表間の大譜表でないパートの数の差が最も少なくなる組み合わせを選んでいくという方法で行うが、パートの対応が一意に決められない場合もあるので、予め段落認識結果の修正を行えるようにする。
【００１２】
なお、五線認識が失敗した場合には、その後の処理が行えないので、倍率や濃度を変更して再度画像を取り込む必要がある。従って、ステップＳ７においては、まず五線の認識結果を表示し、正しいか否かを利用者に判定させ、もし正しくない場合には、Ｓ１に戻ってイメージの再取り込みを行い、また五線が正しく認識されている場合には、段落認識結果を表示し、チェックさせるようにしてもよい。
【００１３】
Ｓ９においては、処理矩形の決定処理が行われる。求められた五線（大譜表の場合には譜表中の五線）を含む、ある程度広い矩形を採り、これを認識処理矩形とする。この際の矩形は、求められた五線から、まず、五線のずらし量を考慮してある程度の幅を持った矩形にし、五線側から見て、上下方向にある程度の幅の空白が存在した場合には矩形を縮小する。
【００１４】
Ｓ１０においては、五線傾き補正処理を行う。概略を述べると、先に求めた五線シフト量に基づいて、矩形内の画像の縦方向の画素列を上下にシフトする。大譜表の場合には、五線が複数になるので、一番上の五線のシフト量を採用するか、あるいはシフト量の平均を取ってもよい。この後、その矩形が上下の五線に重なっていた場合には、その部分を消去し、更に、矩形の上下端に接し、かつ認識する五線の部分に達していないラベルが存在した場合には、上下の五線のパートを構成するラベルとして消去する。この処理を行った後に、矩形の端から黒画素の存在する部分まで、矩形を更に縮小してもよい。
【００１５】
Ｓ１１〜Ｓ１５においては、各種記号の認識処理が行われる。楽譜記号は、形、位置に関して大まかに以下の３つの種類がある。（１）定型で、上下位置がほぼ決まっているもの（音部記号、拍子記号等）。（２）定型で、上下位置は自由度があるもの（臨時記号、休符等）。（３）不定型かつ位置も不定のもの（音符、スラー、タイ等）。これらをそれぞれに適した方式で、音部記号、拍子認識、音符認識、定型記号認識、文字列認識、スラー、タイ認識の順に認識する。
【００１６】
音部記号、拍子認識を最初に行うのは、処理コストの低い認識を最初に行って、この記号を削除することによって、後の認識の処理コストを軽減するためと、最初により確実なものを認識することで、後の認識での誤認識を減らすためである。また、音符認識の後に定型記号認識を行うのは、ラベルの接触に影響されにくい認識方式である音符認識を行って、この音符を削除することで、音符に接触した臨時記号等の認識を可能にするためである。スラー、タイ認識が最後になっているのは、処理コストの高いスラー認識の対象になるラベルをなるべく少なくするためである。また、先に検出された音符の周りのラベルだけをスラー、タイ認識の対象にすることによって、更にスラー、タイ認識の処理コストを下げ、かつ、スラー、タイの誤認識も減らすことができる。
【００１７】
Ｓ１１においては、五線に対して定位置にある記号として、音部記号と拍子記号を認識する。該処理においては、まず、求められた五線を含む矩形領域で縦に黒画素のヒストグラムを取っていき、黒画素量があるしきい値以上の帯域を、記号が存在する可能性のある場所としてマッチングの対象とする。マッチングは、五線間の数箇所について横方向のペリフェラル特徴によって行う。ペリフェラル特徴とは、マッチング対象となる記号のみを含む矩形領域の左右端から五線間の数箇所の白画素領域を内方向に走査し、黒画素領域に到達するまでの距離を、１次（最初）あるいは数次（２回目以降）まで求めたものである。また、マッチングに失敗した場合には、隣接した帯域を併合して再度認識を行う。そして、認識された記号は画像データから削除する。
【００１８】
Ｓ１２においては音符認識を行う。まず、矩形領域を横に走査して、所定の長さ以下の黒ランを検出し、分離する。分離された画像データは、横が細い部分であるので、ここから符尾や小節線の候補になる縦線を検出する。次に、縦に所定の長さ以下の黒ランを検出し分離する。分離された画像データは細い横線を構成する部分なので、ここから加線やクレッシェンドなどの候補になる横線を検出する。最後に、元の画像データから検出された縦横の細ランを消せば、画像中の太い部分（以下太ラベル）が抽出できる。楽譜の場合、４部音符より短い音符の符頭（以下黒玉符頭）や連鉤（複数の音符をつなぐ帯）が分離できる。
【００１９】
黒玉符頭は、太ラベルの境界線についての座標チェーンデータを求め、この座標データから公知の方法により楕円式を計算し、この形や太ラベルとのマッチング度をとって認識する。２部音符、全音符の符頭（以下白抜き符頭）は、画像の穴の座標チェーンから楕円式を計算する。
【００２０】
最後に、先に求めた符尾候補と結合して音符を検出する。連鉤は、これまでに求められた旗を考えない音符の符尾の周辺に存在する太ラベルを検出し、これの形状から連鉤の本数を計算する。また、この連鉤に連結している他の音符も検出する。連結する他の音符が無い場合には単独の旗を持つ音符と考える。連鉤の本数により、音符の情報を変更する。この後、分離した横線を使って音の高さ（加線）やクレッシェンド、横線と縦線を使ってくり返しかっこ等を認識する。残った縦線から小節線を認識する。そして、認識された記号は画像から削除する。
【００２１】
Ｓ１３においては、定型記号認識が行われる。この処理においては、まず、公知の輪郭線荷重方向指数を取り、辞書の各記号データについてラベルのサイズと荷重方向指数のマッチング度を計算して、各マッチング度を正規化し、統合した結果が最も高い記号を出力する。なお、サイズと荷重方向指数の他に、ペリフェラルなどの他の特徴を使ってもよい。また、五線消去によりラベルが切れたものの対策として、五線消去によって切れたラベルを辞書に登録し、この記号であると認識された場合には、その周りのラベルを結合して再認識する。認識された記号は画像から削除する。
【００２２】
Ｓ１４においては、文字列認識を行う。速度記号などの文字列を認識するために、定型記号認識で認識されたアルファベットその他の記号を使い、その記号を囲む矩形が文字列状に並んでいるものを抽出し、これと文字列辞書のマッチングをとることで、文字列状の記号を、それぞれの構成文字が多少間違っていても認識できるようにする。
【００２３】
Ｓ１５においては、スラータイ認識を行う。この処理においては、残ったラベルのうち、検出された音符の周りのラベルに関して、これを細線化し、これを多円弧近似する。そして、以前に消された記号により線が切れている場合があるので、求められた多円弧同士の連結を行う。最後に、求められた円弧の形や元画像の図の太さ、音符との関係などからスラー、タイを認識する。これが認識で最後のルーチンなので、認識された記号は画像から削除しなくてもよいが、認識したスラー、タイを削除し、この後で再度定型記号認識を行うようにすれば、スラー、タイと接触した記号を認識することができるようになる。
【００２４】
Ｓ１６においては、例えば認識結果に基づき、楽譜画像データを合成して表示し、正しいか否かを利用者にチェックさせることにより、ＯＫか否かが判定され、結果がＯＫでない場合にはＳ１７に移行して、マウス、キーボード等を用いて、手動により認識結果の修正が行われる。Ｓ１８においては、演奏データ作成処理が行われる。該処理においては、認識した各種の記号や音符情報に基づき、例えば公知の演奏データ形式であるＭＩＤＩファイルデータを生成する。
【００２５】
図４は、図３のＳ５の五線認識処理の詳細を示すフローチャートである。五線認識処理においては、まず、横方向に並んだコーダ状の五線の認識を行いやすくするために、段落矩形の検出を行う。この矩形検出処理は、後で正確な段落線の検出を行うので、おおまかなもので良く、段落矩形が、正しいものより小さく、例えば五線ごとに分離されていてもよい。
Ｓ２０においては、画像の各横軸ごとの黒画素数を計数し、ヒストグラムを作成する。図２は、ヒストグラムの説明図である。図２（ａ）の楽譜画像（音符等は存在するが記載は省略してある）の各ｙ座標における黒画素数を計数したものが（ｂ）のヒストグラムである。このヒストグラムは五線が存在する箇所において大きな値となるが、値の増減が激しいので、このままでは判定が困難である。そこで、Ｓ２１においては、波形を平滑化するために、各ｙ座標に関して所定幅のｙ座標範囲のヒストグラム値を合計した加算ヒストグラムを作成する。幅は例えば五線の隣接する２本の線間の幅程度でよく、加算範囲は、例えば任意のｙ座標に対してそれより下の所定範囲であってもよいし、ｙ座標を中心とする上下の範囲であってもよい。図２（ｃ）は加算ヒストグラムの例である。
【００２６】
Ｓ２２においては、上（ｙ座標の小さい方）から加算ヒストグラムを検索し、加算ヒストグラム値が、図２（ｃ）に点線で示す、先に求めた五線の線幅、五間の幅で正規化した所定のしきい値以上となるｙ座標（ｙ１）を検出する。なお、Ｓ２２において画像の下端に達した場合にはＳ２７に移行する。Ｓ２３においては、ｙ１から下に加算ヒストグラムを検索し、加算ヒストグラム値が所定のしきい値以下となるｙ座標（ｙ２）を検出する。更に、ｙ２から下に加算ヒストグラムを検索し、加算ヒストグラム値が再び所定のしきい値以上となるｙ座標（ｙ３）を検出する。ただし、もし（ｙ３−ｙ２）が所定の値以下である場合、即ち黒画素の少ない領域の幅が狭い場合には該ｙ２、ｙ３を破棄し、ｙ３より下において再度ｙ２を探索する。またｙ３検索中に下端に達した場合には、十分下方でｙ３を検出したものと見なして、Ｓ２５に移行する。
Ｓ２５においては、（ｙ２−ｙ１）が所定の値、例えば五線全体の幅以上であれば、段落あるいは五線が含まれると推定される矩形として、該矩形の座標情報等を保存する。Ｓ２６においては、画像の下端に達したか否かが判定され、結果が否定の場合にはＳ２２に戻って、残りの領域について検索を行う。
【００２７】
Ｓ２７からＳ３０においては、横軸方向に五線の切れ目を検索する。まず、Ｓ２７においては、Ｓ２５において検出された各矩形について、各縦軸ごとの黒画素数のヒストグラムを取る。Ｓ２８においては、ヒストグラムを左から検索し、ヒストグラム値が所定の値以上になるｘ座標（ｘ１）を検出する。しきい値を例えば五線４本分の黒画素数としておけば、五線が存在すれば必ずしきい値を超える。Ｓ２９においては、ヒストグラムをｘ１から右に検索し、ヒストグラム値が所定の値以下になるｘ座標（ｘ２）を検出する。Ｓ３０においては、（ｘ２−ｘ１）が所定値以上であれば、矩形を分割する。なお、五線の右に記号等が存在する可能性があるので、最右端の短い矩形は左の矩形に含める。
【００２８】
Ｓ３１においては、それぞれの段落矩形で五線認識を行う。まず、段落矩形を五線検出矩形とし、矩形内で五線の線幅、五間の幅を再計算する。そして、五線シフト量走査の開始位置を求め、更に五線シフト量を求める。このようにして求められた一つ目の五線から、五線検出矩形の上下端までを次の五線検出矩形とし、この矩形の縦幅があるしきい値以上あれば、これらの矩形中でも同様に五線の線幅、五間の幅を再計算し、五線検出を行う。五線の線幅、五間の幅をそれぞれの矩形中で再計算するのは、段落矩形中の全ての五線の幅が一定であるとは限らないからである。それぞれの段落矩形の中で、五線が検出できる矩形がなくなるまでこの処理を行う。
【００２９】
五線走査開始位置検出処理の概略を述べると、ｘ軸方向のある位置で、黒画素と白画素の幅を順に求め、求められた線幅と間幅が五線状に並んでいる位置を、ある程度の誤差を考慮して検出する。そして、加線（五線からはみ出した音符を記載するために付加した横線）の影響を除くために、五線状の並びの両側に間幅より大きな白画素幅があるという条件を加える。この条件に合う白黒画素の並びがあるｘ位置の各黒ランの中点を五線走査開始位置とする。
【００３０】
Ｓ３２においては、五線を追跡し、補正のためのシフト量を検出する。五線シフト量の検出処理の概略を述べると、求められたｘ位置の五線走査開始位置（５点の黒画素位置）から、１ドットずつ位置を左右に変えて五線を追跡していき、５点の内、黒画素がある個数（例えば３あるいは４個）以下になった場合に、５点を上下にずらして黒画素数をチェックし、黒画素の割合が高くなる方向へｙ座標をシフトする。そして開始位置からのシフト量を五線のシフト量とする。五線走査開始位置から左右に、黒画素個数が０になる位置まで追跡することにより五線の検出を行う。
【００３１】
図５は、図３のＳ６の段落認識処理の詳細を示すフローチャートである。Ｓ４０においては、検出した五線情報を上から順に並べる。Ｓ４１においては、隣合う五線について、五線の左端のｘ座標の差の絶対値ｄを求める。なお、段落が途中で切れている場合など、必ずしも段落内の五線がｙ座標において隣接しているとは限らないので、全ての五線の組み合わせについてチェックするか、あるいはある五線に対して所定のｙ軸距離内にある五線について全てチェックするようにする。
【００３２】
Ｓ４２においては、距離ｄが所定のしきい値ＴＨＲ未満であるか否かが判定され、結果が否定の場合にはＳ４６に移行するが、肯定の場合にはＳ４３に移行する。Ｓ４３においては、２つの五線の左端の間の領域に連続する段落線が存在するか否かがチェックされる。段落線の存在は、黒画素が上から下まで８連結で連続しているか否かをチェックすることにより判明するが、画像がかすれていても段落線の存在を認識できるようにするために、ある程度の黒画素が途切れても、その後連続していれば、段落線が存在するものと判定する。
【００３３】
図６は段落線の検出領域を示す説明図である。段落線の検出は、上下の五線の左端間の領域Ａにおいて行われる。領域Ａは２つの五線の左端の座標が判明しているので、該座標データから求めることができる。なお、ハンディスキャナ等の幅の狭いスキャナで楽譜画像を読み込んだ場合には、段落の途中で画像データが分かれてしまう場合がある。このような場合には、最上部の五線の上部の領域Ｂあるいは最下部の五線の下部の領域Ｃに画像データの端まで続く段落線が存在するか否かをチェックし、他の画像データとの連続性を検出する。例えばＣの領域に段落線が存在し、次の画像データのＢの領域にやはり段落線が存在した場合には２つの画像データの段落線は接続されているものと判断する。
【００３４】
Ｓ４４においては、Ｓ４３のチェック結果が段落線有りか否かが判定され、結果が否定の場合にはＳ４６に移行するが、肯定の場合にはＳ４５に移行する。Ｓ４５においては、上下の五線を同じ段落番号にする。Ｓ４６においては、残っている五線の組み合わせが無いか否かが判定され、まだ残っている場合にはＳ４１に戻って、段落線のチェックを繰り返す。なお、求められた段落矩形同士が横軸方向に重複せず、縦軸方向に重複していて、かつ、五線の数が等しい場合には、コーダ状の段落であるとする。この場合、２つの段落を結合して１つの段落とするか、あるいは、別の段落として記号、音符認識を行い、後で時系列処理を行う。
【００３５】
Ｓ４７においては、段落線が検出された各段落において、五線が大譜表か単独譜表かを区別するために、大かっこの認識を行う。求められた段落の五線の左端より左の領域で、ラベル抽出を行いながら記号認識を行う。この際、大かっこが段落線と接触している場合もあるので、あらかじめ段落線の部分でラベルを切り取っておく。この時の記号認識は、大かっこと大弧線とその他のものを区別できる程度の認識率でよい。また、大かっこは中心の細い部分でラベルが分離していることが多いので、分離した辞書データも用意し、上下の分離大かっこと認識された場合には認識結果を結合する。そして、段落内の五線の内、大かっこで括られていると認識された五線の組は、同じパートナンバーを割り振る。
【００３６】
以上実施例を開示したが、次に示すような変形例も考えられる。画像データはスキャナで読み込んだ画像データを１個づつ順に認識する方式を開示したが、例えばハンドスキャナで読み込んだ場合には、１曲分が複数のイメージファイルとして取り込まれる可能性が高い。このような場合には、複数の画像データファイルを予め１つのファイルに合成してから認識してもよい。また、予め画像データを複数に分割してから本発明の認識処理を行ってもよい。
【００３７】
段落認識処理においては、段落線の有無をチェックする方式を開示したが、予めｘ軸、ｙ軸方向に黒画素のヒストグラムを取り、これの完全な空白部分を検出することによって段落の存在を推定してもよい。
【００３８】
段落認識処理時に、大かっこ、大弧線の認識と同時に、パート名を認識するための文字列認識を行ってもよい。この場合、大かっこ認識時より広い範囲を認識対象にして、辞書も、英数字（日本語）を追加する。文字が認識された場合には、この認識矩形の並びから文字列対象の認識結果を抽出し、文字列一致ポイントの最大値選択しきい値判定によってパート名を認識する。これを行うためのパート名の文字列辞書も用意する。また、多くの場合、段落によって、パート名が略記されているので、略記とそうでないパート名の対応も取れるようにする。
【００３９】
認識処理矩形に於いて、矩形の上下端からも、五線からも離れているラベル、つまり譜表の構成要素は、上下の認識処理矩形の両方に入っている場合がある。この場合には、前の認識処理矩形で認識された記号情報を次の認識処理矩形の認識時にチェックし、次の認識処理矩形での同じ図形を予め消しておけば、認識の高速化に貢献できる。また、例えば、矩形の下端での下向きのスラーなど、記号によっては、予めどちらのパートに属するのかを決定できるものもあるので、この場合には、記号情報を対応するパートのみに入れる。
実施例においては、五線ごとにシフト量を計算し、矩形画像内でシフト補正を行う例を開示したが、シフト量は、取り込み画像全体で１つ計算し、画像全体をシフトしてもよい。
【００４０】
【発明の効果】
以上述べたように、本発明は、五線を各段落毎にグループ化でき、更に段落間において各パートの対応を取ることができる。従って、楽譜に記載されている音符などを適切なタイミングおよび音色の演奏情報に自動的に変換することが可能となるという効果がある。また、楽譜の倍率や記号の形などが異なっていても高い認識率を達成することが可能となるという効果もある。
【図面の簡単な説明】
【図１】本発明の楽譜認識装置の実施例の構成を示すブロック図である。
【図２】ヒストグラムの説明図である。
【図３】ＣＰＵ１のメイン処理を示すフローチャートである。
【図４】図３のＳ５の五線認識処理の詳細を示すフローチャートである。
【図５】図３のＳ６の段落認識処理の詳細を示すフローチャートである。
【図６】段落線の検出領域を示す説明図である。
【符号の説明】
１…ＣＰＵ、２…ＲＯＭ、３…ＲＡＭ、４…ハードディスク装置、５…フロッピディスク装置、６…ＣＲＴディスプレイ装置、７…ＣＲＴインターフェース回路、８…キーボード、９…キーボードインターフェース回路、１０…プリンタ、１１…プリンタインターフェース回路、１２…スキャナ、１３…スキャナインターフェース回路、１４…ＭＩＤＩインターフェース回路、１５…バス
[0001]
[Technical field to which the invention belongs]
The present invention relates to a score recognition apparatus, and more particularly to one that can recognize the paragraphs (systems, that is, groups of staves played together) and parts of a score and appropriately time-sequence the score information.
[0002]
[Prior art]
Some conventional score recognition apparatuses recognize staves, notes, and various symbols in score image data read by, for example, a scanner, and generate performance data such as MIDI file data. When recognizing a full score, such apparatuses handled only scores in which the number of parts is fixed across all paragraphs. In that case, time-sequencing of the data is completed simply by assigning the recognized parts to the staves in each paragraph in order from the top.
[0003]
[Problems to be solved by the invention]
In actual full scores, the number of parts is not always the same in every paragraph: from a certain paragraph onward, parts may be added, omitted, merged (two single staves becoming a grand staff), or separated (the reverse), or the staff type may change (grand staff ↔ single staff). In such cases, simply assigning the parts to the staves in a paragraph in order from the top does not time-sequence the data correctly.
The object of the present invention is to solve the above problems of the prior art and to provide a score recognition apparatus that can recognize the separation of paragraphs and the correspondence of parts in a full score, and can time-sequence the recognized data track by track.
[0004]
[Means for Solving the Problems]
The present invention is a score recognition apparatus that recognizes the various symbols of a score from score image data read for each rectangular area, comprising: staff detection means for detecting staves; and paragraph line detection means for detecting, from the score image data read for each rectangular area, a paragraph line extending between adjacent staves, a paragraph line extending upward from the uppermost staff, and a paragraph line extending downward from the lowermost staff. For score image data read for rectangular areas that are adjacent in the vertical direction, when a paragraph line extending downward from the lowermost staff is detected in the image data of the upper rectangular area, and a paragraph line extending upward from the uppermost staff is detected in the image data of the lower rectangular area, the apparatus recognizes that lowermost staff and that uppermost staff as belonging to one paragraph.
The apparatus also comprises paragraph detection means for detecting, from the score image data read for each rectangular area, the presence or absence of paragraph lines between a plurality of staves, and bracket detection means for detecting the presence or absence of a bracket on the left side of each paragraph line. When the bracket detection means detects a bracket, the plurality of staves enclosed by that bracket are recognized as one part.
Furthermore, the apparatus comprises part name recognition means for recognizing, from the score image data, the part names written in the score, and part correspondence detection means for establishing the correspondence of parts between paragraphs based on the part names recognized by the part name recognition means.
[0005]
With this configuration, the present invention can accurately group the staves paragraph by paragraph and, furthermore, establish the correspondence of each part between paragraphs. It is therefore possible to automatically convert the notes and other symbols written in the score into performance information with appropriate timing and timbre.
[0006]
DETAILED DESCRIPTION OF THE INVENTION
Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings. FIG. 1 is a block diagram showing the configuration of one embodiment of the score recognition apparatus of the present invention. This apparatus is obtained by adding a scanner and a MIDI interface circuit to a general computer system such as a personal computer. The CPU 1 is a central processing unit that controls the entire score recognition apparatus based on a program stored in the ROM 2 or the RAM 3. Further, a timer circuit for interrupting the CPU 1 at a predetermined cycle set in advance is incorporated. The RAM 3 is used as an image data buffer, a work area, etc. in addition to a program area. The hard disk device HDD4 and the floppy disk device FDD5 store programs, image data, performance data, and the like. The CRT 6 displays video information output from the CRT interface circuit 7 based on the control of the CPU 1, and information input from the keyboard 8 is taken into the CPU 1 through the keyboard interface circuit 9. The printer 10 prints the print information output from the printer interface circuit 11 based on the control of the CPU 1.
[0007]
The scanner 12 optically scans a (printed) score and converts it into binary or grayscale image data; any type, such as flatbed, handheld, or sheet-fed, can be used. Image information read by the scanner 12 is taken into the RAM 3 or the HDD 4 via the scanner interface circuit 13. The MIDI interface circuit 14 transmits and receives MIDI data to and from external MIDI equipment such as a sound module. The bus 15 connects the circuits within the score recognition apparatus. In addition, a pointing device such as a mouse, a serial interface circuit such as RS232C, and the like may be provided.
[0008]
FIG. 3 is a flowchart showing the main processing of the CPU 1. In S1, the score image is taken into the RAM 3 by the scanner 12. The image is captured as a binary image. In S2, image smoothing such as figure fusion is performed to reduce faint strokes and dot noise. In S3, an image quality check is performed. To obtain magnification and density information, and to obtain reference data for the later staff detection, the check first detects the line width of the staff lines and the spacing between adjacent staff lines. To find the line width and spacing, the image is scanned in the vertical (y) direction at several positions along the horizontal (x) direction, the lengths of all black runs (consecutive black pixels) and white runs are measured, and frequency distribution (histogram) data is created for each length.
[0009]
Since the most common symbol on the score is a staff, the line width and interval of the staff can be estimated by detecting the peaks of the created black run length histogram and white run length histogram. The magnification of the image data can be estimated from, for example, the gap width, and the density can be estimated from the ratio between the line width and the gap width. In the score recognition process, if the magnification and the density are out of the predetermined range, the recognition rate decreases. Therefore, in S3, it is checked whether or not these values are within the predetermined range. In S4, it is determined whether or not the check result in S3 is an image quality OK. If the result is not OK, the process returns to S1 and recapture is performed by changing the magnification and density.
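The run-length estimate described in S3 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `estimate_staff_metrics`, the 0/1 list-of-rows image representation, and the choice to count runs that touch the image border are all assumptions made for the example.

```python
from collections import Counter

def estimate_staff_metrics(image, columns):
    """Estimate (line_width, spacing) from vertical run lengths.

    image   -- list of rows, each a list of 0 (white) / 1 (black) pixels
    columns -- x positions at which to scan in the vertical (y) direction
    Both names are hypothetical; the source only describes the idea.
    """
    black_runs, white_runs = Counter(), Counter()
    for x in columns:
        current, length = None, 0
        for row in image:
            pixel = row[x]
            if pixel == current:
                length += 1
                continue
            if current is not None:
                # A run just ended: record its length in the matching histogram.
                (black_runs if current else white_runs)[length] += 1
            current, length = pixel, 1
        if current is not None:
            # Close the last run of the column (assumption: border runs count).
            (black_runs if current else white_runs)[length] += 1
    if not black_runs or not white_runs:
        return None  # no staff-like structure in the sampled columns
    # The histogram peaks approximate staff line width and line spacing.
    line_width = black_runs.most_common(1)[0][0]
    spacing = white_runs.most_common(1)[0][0]
    return line_width, spacing
```

Under these assumptions, the magnification check could compare `spacing` against an expected range, and the density check could compare the ratio `line_width / spacing`; the acceptable ranges are not given in the source.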
[0010]
In S5, staff recognition described later is performed. The staff recognition process is roughly divided into staff area separation processing, staff scanning start position detection processing, and staff shift amount detection processing. In S6, a paragraph recognition process described later is performed. This process is roughly divided into a paragraph recognition process and a bracket recognition process.
[0011]
In S7, the paragraph recognition result is displayed and the user checks whether it is correct; if it is not OK, the process proceeds to S8, where the paragraph recognition result is corrected. In full scores, besides those in which every paragraph has the same part structure, parts may be omitted or added partway through, or the same part may switch between a single staff and a grand staff from paragraph to paragraph. The correspondence of such parts is established by, for example, comparing brackets. Specifically, two paragraphs are taken from all the paragraphs and, among the possible pairings of the grand staves in one paragraph with those in the other, the pairing that minimizes the difference in the number of non-grand-staff parts lying between the grand staves is selected. Because the part correspondence cannot always be determined uniquely, the paragraph recognition result is made correctable beforehand.
[0012]
If the staff recognition fails, the subsequent processing cannot be performed, so it is necessary to change the magnification and density and capture the image again. Therefore, in step S7, first, the recognition result of the staff is displayed to let the user determine whether or not it is correct. If it is not correct, the process returns to S1 to re-import the image. If it is recognized correctly, the paragraph recognition result may be displayed and checked.
[0013]
In S9, the processing rectangle is determined. A reasonably large rectangle containing the detected staff (for a grand staff, the staves of that grand staff) is taken as the recognition processing rectangle. Starting from the detected staff, the rectangle is first given a certain margin that allows for the staff shift amount; then, if a blank band of a certain height exists above or below as seen from the staff, the rectangle is shrunk.
[0014]
In S10, staff skew correction is performed. In outline, each vertical pixel column of the image inside the rectangle is shifted up or down according to the staff shift amount obtained earlier. A grand staff contains several staves, so the shift amount of the topmost staff may be used, or the shift amounts may be averaged. After that, if the rectangle overlaps the staff above or below, that portion is erased; furthermore, any label that touches the top or bottom edge of the rectangle and does not reach the staff being recognized is erased as a label belonging to the part of the staff above or below. After this processing, the rectangle may be shrunk further, from its edges to where black pixels exist.
[0015]
In S11 to S15, the various symbols are recognized. In terms of shape and position, score symbols fall roughly into three types: (1) fixed in shape, with an almost fixed vertical position (clefs, time signatures, etc.); (2) fixed in shape, with some freedom in vertical position (accidentals, rests, etc.); (3) variable in both shape and position (notes, slurs, ties, etc.). These are recognized by methods suited to each type, in the following order: clef and time signature recognition, note recognition, standard symbol recognition, character string recognition, and slur and tie recognition.
[0016]
Clef and time signature recognition is performed first for two reasons: performing low-cost recognition first and deleting the recognized symbols reduces the processing cost of later recognition, and recognizing the more certain symbols first reduces misrecognition later. Standard symbol recognition follows note recognition because note recognition is a method that is little affected by touching labels; deleting the recognized notes makes it possible to recognize accidentals and other symbols that touch notes. Slur and tie recognition comes last so that as few labels as possible are subjected to the costly slur recognition. In addition, restricting slur and tie recognition to the labels around previously detected notes further lowers its processing cost and also reduces misrecognition of slurs and ties.
[0017]
In S11, clefs and time signatures are recognized as symbols at a fixed position relative to the staff. First, a histogram of black pixels is taken column by column in the rectangular area containing the detected staff, and bands whose black pixel count is at or above a certain threshold are treated as places where a symbol may exist and become matching targets. Matching uses horizontal peripheral features taken at several positions between the staff lines. A peripheral feature is obtained by scanning inward from the left and right edges of the rectangular area containing only the symbol to be matched, through the white pixels at several positions between the staff lines, and measuring the distance until a black pixel region is reached, to the first order (the first black region) or to higher orders (the second and later black regions). If matching fails, adjacent bands are merged and recognition is attempted again. Recognized symbols are deleted from the image data.
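As an illustration of the peripheral feature described in S11, the sketch below measures, for selected rows of a symbol's bounding region, the distance from the left and right edges to the first and later black regions. The names `scan_inward` and `peripheral_features`, the 0/1 pixel representation, and the convention of returning the row width when no further black region exists are assumptions for the example; how the features are compared with dictionary entries is not specified in the source and is not shown.

```python
def scan_inward(line, depth):
    """Return distances from the start of `line` to the 1st..depth-th black region.

    line -- list of 0 (white) / 1 (black) pixels
    If fewer than `depth` black regions exist, len(line) is reported
    for the missing orders (an assumption of this sketch).
    """
    distances = []
    x, n = 0, len(line)
    for _ in range(depth):
        while x < n and line[x] == 0:   # cross the white region
            x += 1
        distances.append(x)             # reached a black region (or the far edge)
        while x < n and line[x] == 1:   # step over this black region
            x += 1
    return distances

def peripheral_features(region, rows, depth=2):
    """Peripheral features of `region` at the given row indices.

    region -- list of rows covering only the symbol to be matched
    rows   -- row indices, e.g. positions between staff lines
    Returns one (from_left, from_right) pair of distance lists per row.
    """
    features = []
    for y in rows:
        line = region[y]
        features.append((scan_inward(line, depth),
                         scan_inward(line[::-1], depth)))
    return features
```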
[0018]
In S12, note recognition is performed. First, the rectangular area is scanned horizontally, and black runs of a predetermined length or less are detected and separated. Because the separated image data consists of parts that are thin horizontally, vertical lines that are candidates for stems and bar lines are detected from it. Next, black runs of a predetermined length or less are detected vertically and separated. Because this separated image data consists of the parts that make up thin horizontal lines, horizontal lines that are candidates for ledger lines, crescendos, and the like are detected from it. Finally, erasing the detected vertical and horizontal thin runs from the original image data leaves the thick parts of the image (hereinafter, thick labels). In a score, this separates the heads of notes shorter than quarter notes (hereinafter, black note heads) and beams (the bands connecting several notes).
[0019]
For a black note head, coordinate chain data is obtained for the boundary of the thick label, an ellipse equation is calculated from this coordinate data by a known method, and the head is recognized from its shape and its degree of match with the thick label. For the heads of half notes and whole notes (hereinafter, open note heads), the ellipse equation is calculated from the coordinate chain of the hole in the image.
[0020]
Finally, the note heads are combined with the stem candidates found earlier to detect notes. For beams, thick labels lying around the stems of the notes found so far (with flags not yet considered) are detected, and the number of beams is calculated from their shape. Other notes connected to the beam are also detected. If no other note is connected, the note is regarded as having a flag of its own. The note information is updated according to the number of beams. After this, the separated horizontal lines are used to recognize pitch (ledger lines) and crescendos, and the horizontal and vertical lines together are used to recognize repeat brackets and the like. Bar lines are recognized from the remaining vertical lines. The recognized symbols are then deleted from the image.
[0021]
In S13, standard symbol recognition is performed. First, the known contour weighted direction index is computed; for each symbol entry in the dictionary, the degrees of match of the label size and of the weighted direction index are calculated and normalized, and the symbol with the highest combined result is output. Besides size and the weighted direction index, other features such as peripheral features may be used. As a countermeasure for labels cut apart by staff erasure, such cut labels are registered in the dictionary; when a label is recognized as one of these, the labels around it are joined and recognition is performed again. The recognized symbols are deleted from the image.
[0022]
In S14, character string recognition is performed. To recognize character strings such as tempo markings, the letters and other symbols recognized by standard symbol recognition are used: those whose bounding rectangles are lined up like a character string are extracted and matched against a character string dictionary, so that string-like symbols can be recognized even if some of their constituent characters are slightly wrong.
[0023]
In S15, slur and tie recognition is performed. Among the remaining labels, those around the detected notes are thinned and approximated by multiple arcs. Because lines may have been broken by previously erased symbols, the resulting arcs are then connected to one another. Finally, slurs and ties are recognized from the shape of the arcs, the thickness of the figure in the original image, their relationship to the notes, and so on. Since this is the last recognition routine, the recognized symbols need not be deleted from the image; however, if the recognized slurs and ties are deleted and standard symbol recognition is then performed again, symbols that touch slurs or ties can also be recognized.
[0024]
In S16, for example, based on the recognition result, the score image data is synthesized and displayed, and it is determined whether or not it is OK by letting the user check whether it is correct. If the result is not OK, the process goes to S17. Then, the recognition result is manually corrected using a mouse, a keyboard, or the like. In S18, performance data creation processing is performed. In this processing, based on the recognized various symbols and note information, for example, MIDI file data in a known performance data format is generated.
[0025]
FIG. 4 is a flowchart showing details of the staff recognition process in S5 of FIG. 3. In the staff recognition process, paragraph rectangles are detected first, to make it easier to recognize coda-like staves arranged side by side in the horizontal direction. Because accurate paragraph line detection is performed later, this rectangle detection may be rough: a paragraph rectangle may be smaller than the correct one, for example split staff by staff.
In S20, the number of black pixels in each horizontal row of the image is counted and a histogram is created. FIG. 2 is an explanatory diagram of the histogram. Histogram (b) is obtained by counting the black pixels at each y coordinate of the score image of FIG. 2(a) (notes and the like are present but not drawn). The histogram takes large values where staves exist, but its values fluctuate so sharply that it is hard to judge from it directly. Therefore, in S21, to smooth the waveform, an addition histogram is created in which, for each y coordinate, the histogram values over a y range of predetermined width are summed. The width may be, for example, about the spacing between two adjacent staff lines, and the summation range may be, for example, a predetermined range below the given y coordinate or a range above and below centered on it. FIG. 2(c) is an example of an addition histogram.
[0026]
In S22, the addition histogram is searched from the top (from the smaller y coordinates), and the y coordinate (y1) is detected at which the addition histogram value is equal to or greater than a predetermined threshold (shown by the dotted line in the figure) that is normalized by the previously obtained staff line width and line spacing. If the lower end of the image is reached in S22, the process proceeds to S27. In S23, the addition histogram is searched downward from y1, and the y coordinate (y2) at which the addition histogram value is equal to or smaller than a predetermined threshold is detected. The addition histogram is then searched downward from y2, and the y coordinate (y3) at which the addition histogram value is again equal to or greater than a predetermined threshold is detected. However, if (y3−y2) is equal to or smaller than a predetermined value, that is, if the region with few black pixels is narrow, y2 and y3 are discarded and the search for y2 is performed again below y3. If the lower end is reached during the search for y3, y3 is regarded as having been detected sufficiently far below, and the process proceeds to S25.
In S25, if (y2−y1) is equal to or greater than a predetermined value, for example the height of an entire staff, the coordinate information of the rectangle is stored as a rectangle estimated to contain a paragraph or a staff. In S26, it is determined whether or not the lower end of the image has been reached. If the result is negative, the process returns to S22 to search the remaining area.
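The search of S22 to S26 can be illustrated with the following Python sketch. It is only one possible reading of the steps above: the names `find_bands`, `high`, `low`, `min_gap` and `min_height` are hypothetical, the use of two distinct thresholds is an assumption, and the handling of the case where the lower end is reached while searching for y2 is not specified in the text and is chosen here for simplicity.

```python
def first_index(values, start, predicate):
    """Index of the first element at or after `start` satisfying predicate."""
    for i in range(start, len(values)):
        if predicate(values[i]):
            return i
    return None


def find_bands(add_hist, high, low, min_gap, min_height):
    """Return (y1, y2) ranges estimated to contain a paragraph or staff."""
    if low >= high:
        raise ValueError("this sketch assumes low < high")
    n = len(add_hist)
    bands = []
    y = 0
    while True:
        # S22: first row whose value reaches the upper threshold.
        y1 = first_index(add_hist, y, lambda v: v >= high)
        if y1 is None:
            break
        start = y1
        while True:
            # S23: first row at or below the lower threshold.
            y2 = first_index(add_hist, start, lambda v: v <= low)
            if y2 is None:               # assumption: band runs to the bottom
                y2 = y3 = n
                break
            # Next row at or above the upper threshold.
            y3 = first_index(add_hist, y2, lambda v: v >= high)
            if y3 is None:               # lower end reached: gap counts as wide
                y3 = n
                break
            if y3 - y2 > min_gap:
                break
            start = y3                   # gap too narrow: discard y2 and y3
        # S25: keep only bands at least as tall as a whole staff.
        if y2 - y1 >= min_height:
            bands.append((y1, y2))
        # S26: stop at the lower end, otherwise continue from y3.
        if y3 >= n:
            break
        y = y3
    return bands
```

Discarding narrow gaps is what keeps the staffs of one paragraph, which are separated only by small blank strips, inside a single rectangle.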
[0027]
In S27 to S30, breaks in the staff are searched for in the horizontal direction. First, in S27, a histogram of the number of black pixels in each vertical column is taken for each rectangle detected in S25. In S28, the histogram is searched from the left, and the x coordinate (x1) at which the histogram value is equal to or greater than a predetermined value is detected. For example, if the threshold is set to the number of black pixels corresponding to four staff lines, the threshold is always exceeded wherever a staff is present. In S29, the histogram is searched from x1 to the right, and the x coordinate (x2) at which the histogram value is equal to or smaller than a predetermined value is detected. In S30, if (x2−x1) is equal to or greater than a predetermined value, the rectangle is divided. Since a symbol or the like may exist to the right of the staff, a short rightmost rectangle is merged into the rectangle to its left.
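The horizontal division of S27 to S30 might look like the following sketch. The text does not fully specify how the division is carried out, so this is an interpretation: a single threshold is used for both x1 and x2, each run of columns above the threshold becomes a candidate rectangle, and a run shorter than `min_width` is absorbed into the rectangle to its left. All names are hypothetical.

```python
def split_columns(col_hist, threshold, min_width):
    """Return (x1, x2) column ranges, x2 exclusive, for one paragraph rectangle."""
    segments = []
    n = len(col_hist)
    x = 0
    while x < n:
        # S28: skip columns until the staff threshold is reached.
        while x < n and col_hist[x] < threshold:
            x += 1
        if x >= n:
            break
        x1 = x
        # S29: advance until the value drops below the threshold again.
        while x < n and col_hist[x] >= threshold:
            x += 1
        x2 = x
        # S30: a sufficiently wide run becomes its own rectangle.
        if x2 - x1 >= min_width:
            segments.append([x1, x2])
        elif segments:
            # A short piece to the right is merged into the left rectangle.
            segments[-1][1] = x2
    return [tuple(s) for s in segments]
```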
[0028]
In S31, staff recognition is performed for each paragraph rectangle. First, the paragraph rectangle is set as the staff detection rectangle, and the staff line width and line spacing are recalculated within that rectangle. Then the staff scanning start position is obtained, followed by the staff shift amount. The regions from the first staff obtained in this way to the upper and lower ends of the staff detection rectangle become the next staff detection rectangles. In the same manner, the staff line width and line spacing are recalculated and a staff is detected. The line width and line spacing are recalculated in each rectangle because the line widths of all the staffs in a paragraph rectangle are not always constant. This process is repeated until no rectangle remains in each paragraph rectangle in which a staff can be detected.
[0029]
The outline of the staff scanning start position detection process is as follows. At a certain position in the x-axis direction, the run lengths of black pixels and white pixels are obtained in order, and a position where the previously obtained line widths and line spacings are arranged in the pattern of a staff is detected, allowing for some error. In order to remove the influence of ledger lines (horizontal lines added to notate notes that fall outside the staff), the condition is added that a white-pixel run wider than the line spacing exists on both sides of the staff. At the x position where the arrangement of black and white pixels meets this condition, the midpoint of each black run is set as the staff scanning start position.
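As a non-authoritative illustration, the start position detection can be sketched with run lengths taken along one column of the image. The names `runs` and `staff_start`, the single tolerance `tol` applied to both widths, and the exact form of the margin test are assumptions of this sketch.

```python
def runs(column):
    """Return (value, start, length) runs for a list of 0/1 pixels."""
    out = []
    start = 0
    for i in range(1, len(column) + 1):
        if i == len(column) or column[i] != column[start]:
            out.append((column[start], start, i - start))
            start = i
    return out


def staff_start(column, line_w, gap_w, tol):
    """Return the midpoints of five black runs arranged like a staff,
    or None if no such arrangement is found in this column."""
    r = runs(column)
    # A staff is 5 black runs separated by 4 white runs (9 runs in all),
    # with one white run before and one after.
    for i in range(1, len(r) - 9):
        window = r[i:i + 9]
        if window[0][0] != 1:
            continue
        blacks = window[0::2]
        whites = window[1::2]
        lines_ok = all(abs(b[2] - line_w) <= tol for b in blacks)
        gaps_ok = all(abs(w[2] - gap_w) <= tol for w in whites)
        # Ledger-line guard: white runs on both sides wider than a gap.
        margins_ok = r[i - 1][2] > gap_w + tol and r[i + 9][2] > gap_w + tol
        if lines_ok and gaps_ok and margins_ok:
            return [b[1] + b[2] // 2 for b in blacks]
    return None
```

A column crossing a ledger line fails the margin test because the white run next to the outermost staff line is then about one line spacing wide, so another x position has to be tried.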
[0030]
In S32, the staff is traced and a shift amount for correction is detected. The outline of the staff shift amount detection process is as follows. The staff is traced by moving left and right one dot at a time from the staff scanning start position (five black-pixel positions) at the obtained x position. When the number of black pixels among the five points falls below a certain number (for example, 3 or 4), the five points are shifted up and down to check the number of black pixels, and the y coordinates are shifted in the direction in which the proportion of black pixels increases. The shift from the start position is taken as the staff shift amount. The staff is detected by tracing left and right from the staff scanning start position until the position where the number of black pixels becomes zero.
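The tracing described above might be sketched as follows. This is a simplified reading: the five points are shifted by at most one pixel per column, and the names `trace_staff`, `ys` and `min_black` are hypothetical.

```python
def trace_staff(image, x0, ys, min_black=3):
    """Trace a staff left and right from column x0.

    image: list of rows of 0/1 pixels; ys: the five start y coordinates.
    Returns {x: shift}, the vertical shift relative to the start position.
    """
    height, width = len(image), len(image[0])

    def count(x, shift):
        # Number of the five probe points that are black at column x.
        return sum(1 for y in ys
                   if 0 <= y + shift < height and image[y + shift][x])

    def trace(step):
        shift, shifts = 0, {}
        x = x0 + step
        while 0 <= x < width:
            c = count(x, shift)
            if c < min_black:
                # Try one pixel up and down; follow the better direction.
                up, down = count(x, shift - 1), count(x, shift + 1)
                if max(up, down) > c:
                    shift += -1 if up >= down else 1
                    c = max(up, down)
            if c == 0:
                break                    # end of the staff
            shifts[x] = shift
            x += step
        return shifts

    result = {x0: 0}
    result.update(trace(-1))
    result.update(trace(1))
    return result
```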
[0031]
FIG. 5 is a flowchart showing details of the paragraph recognition process in S6 of FIG. 3. In S40, the detected staff information is arranged in order from the top. In S41, the absolute value d of the difference between the x coordinates of the left ends is obtained for adjacent staffs. Note that the staffs within a paragraph are not necessarily adjacent in the y coordinate, for example when the paragraph is cut off partway, so either all combinations of staffs are checked or, for a given staff, all staffs within a predetermined distance along the y axis are checked.
[0032]
In S42, it is determined whether or not the distance d is less than a predetermined threshold THR. If the result is negative, the process proceeds to S46; if the result is affirmative, the process proceeds to S43. In S43, it is checked whether or not a continuous paragraph line exists in the region between the left ends of the two staffs. The presence of a paragraph line can be determined by checking whether black pixels are 8-connected from top to bottom. However, so that a paragraph line can be recognized even when the image is faint, a paragraph line is judged to be present even if the black pixels are interrupted for up to a certain number of pixels, provided that they continue thereafter.
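A minimal sketch of the gap-tolerant check of S43 is given below, assuming a binary image stored as rows of 0/1 values. The function name, the rectangular search region and the parameter `max_gap` are illustrative assumptions; the sketch also requires at least one black pixel in the top row of the region, which the text does not state.

```python
def has_paragraph_line(image, x_lo, x_hi, y_top, y_bot, max_gap=2):
    """Check for a roughly vertical line running from y_top to y_bot
    inside columns x_lo..x_hi, tolerating short interruptions."""
    cols = range(x_lo, x_hi + 1)
    active = {x for x in cols if image[y_top][x]}
    if not active:
        return False
    gap = 0
    for y in range(y_top + 1, y_bot + 1):
        # 8-connectivity: a black pixel continues the line if the row
        # above had a line pixel at x - 1, x or x + 1.
        nxt = {x for x in cols
               if image[y][x] and any(n in active for n in (x - 1, x, x + 1))}
        if nxt:
            active, gap = nxt, 0
        else:
            gap += 1                     # interrupted row
            if gap > max_gap:
                return False
    return True
```

Setting `max_gap` to zero reduces the check to strict 8-connectivity.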
[0033]
FIG. 6 is an explanatory diagram showing the paragraph line detection areas. The paragraph line is detected in area A between the left ends of the upper and lower staffs. Since the coordinates of the left ends of the two staffs are known, area A can be obtained from the coordinate data. When a musical score image is read with a narrow scanner such as a handy scanner, the image data may be divided in the middle of a paragraph. In such a case, it is checked whether or not a paragraph line extending to the edge of the image data exists in area B above the uppermost staff or in area C below the lowermost staff, and continuity with the adjacent image data is detected. For example, if a paragraph line exists in area C and a paragraph line also exists in area B of the next image data, it is determined that the paragraph lines of the two pieces of image data are connected.
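The continuity rule for divided image data can be expressed compactly. The sketch below assumes each piece of image data has already been summarized as a dictionary; the keys `staffs`, `line_above_top` (area B) and `line_below_bottom` (area C) are hypothetical names introduced for illustration.

```python
def link_chunks(chunks):
    """Return pairs (lowest staff of upper image, highest staff of lower
    image) that belong to the same paragraph.

    chunks: list ordered top to bottom, each a dict with
      'staffs'            - staff identifiers, top to bottom
      'line_above_top'    - paragraph line found in area B
      'line_below_bottom' - paragraph line found in area C
    """
    links = []
    for upper, lower in zip(chunks, chunks[1:]):
        if (upper["staffs"] and lower["staffs"]
                and upper["line_below_bottom"] and lower["line_above_top"]):
            links.append((upper["staffs"][-1], lower["staffs"][0]))
    return links
```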
[0034]
In S44, it is determined whether or not the check in S43 found a paragraph line. If the result is negative, the process proceeds to S46; if the result is affirmative, the process proceeds to S45. In S45, the upper and lower staffs are assigned the same paragraph number. In S46, it is determined whether or not any staff combination remains. If a combination remains, the process returns to S41 and the paragraph line check is repeated. If the obtained paragraph rectangles do not overlap in the horizontal direction but do overlap in the vertical direction, and their numbers of staffs are equal, the paragraph is assumed to be a coda-like paragraph. In this case, either the two paragraphs are combined into one paragraph, or they are treated as separate paragraphs whose symbols and notes are recognized and which are ordered later in the time-series processing.
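The loop of S41 to S46 amounts to merging staffs into groups whenever a paragraph line joins them. One way to sketch this, under assumptions, is with a union-find structure; the embodiment itself only states that the two staffs receive the same paragraph number. The names `group_paragraphs`, `has_line`, `thr` and `max_dy` are hypothetical, and `has_line` stands in for the check of S43.

```python
def group_paragraphs(staffs, has_line, thr, max_dy):
    """Assign a paragraph number to each staff.

    staffs: list of (left_x, top_y), sorted top to bottom (S40).
    has_line(i, j): True if a paragraph line joins staffs i and j (S43).
    """
    parent = list(range(len(staffs)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i

    for i in range(len(staffs)):
        for j in range(i + 1, len(staffs)):
            # Only staffs within a given y distance are compared.
            if staffs[j][1] - staffs[i][1] > max_dy:
                continue
            d = abs(staffs[i][0] - staffs[j][0])       # S41
            if d < thr and has_line(i, j):             # S42 to S44
                parent[find(j)] = find(i)              # S45

    numbers = {}
    return [numbers.setdefault(find(i), len(numbers))
            for i in range(len(staffs))]
```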
[0035]
In S47, brackets are recognized in order to distinguish, in each paragraph where a paragraph line has been detected, whether the staffs form a grand staff or are single staffs. Symbol recognition is performed while extracting labels in the area to the left of the left end of the staffs of the obtained paragraph. Since a bracket may be in contact with the paragraph line, the label is cut at the paragraph line beforehand. The symbol recognition at this stage need only be accurate enough to distinguish a large arc from a large arc line. Also, since a bracket is often split at its thin central portion, separate dictionary data for the split halves is prepared, and the recognition results are combined when the upper and lower halves are recognized. Then, among the staffs in the paragraph, the group of staffs recognized as being enclosed by a bracket is assigned the same part number.
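Once brackets have been recognized, assigning part numbers is straightforward. The following sketch assumes that each staff and each recognized bracket has been reduced to a vertical extent; the names `assign_parts`, `staff_spans` and `bracket_spans`, and the use of the staff center to decide enclosure, are assumptions for illustration.

```python
def assign_parts(staff_spans, bracket_spans):
    """Return a part number for each staff of one paragraph.

    staff_spans:   list of (top_y, bottom_y), ordered top to bottom.
    bracket_spans: list of (top_y, bottom_y) of recognized brackets.
    """
    parts = []
    part = -1
    last_bracket = None
    for top, bottom in staff_spans:
        center = (top + bottom) / 2
        # Index of the bracket enclosing this staff, if any.
        bracket = next((k for k, (bt, bb) in enumerate(bracket_spans)
                        if bt <= center <= bb), None)
        # A new part starts unless this staff continues the same bracket.
        if bracket is None or bracket != last_bracket:
            part += 1
        parts.append(part)
        last_bracket = bracket
    return parts
```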
[0036]
Although an embodiment has been disclosed above, the following modifications are also conceivable. A method has been disclosed in which the image data read by the scanner is recognized one piece at a time in order. However, when the image data is read with a handy scanner, for example, one piece of music is likely to be captured as a plurality of image files. In such a case, the plurality of image data files may be combined into a single file and then recognized. Further, the recognition processing of the present invention may be performed after the image data has been divided into a plurality of pieces in advance.
[0037]
In the paragraph recognition process, a method of checking for the presence or absence of a paragraph line has been disclosed. Alternatively, black-pixel histograms may be taken in the x-axis and y-axis directions in advance, and the existence of a paragraph may be estimated by detecting completely blank portions in them.
[0038]
During the paragraph recognition process, character string recognition for recognizing part names may be performed at the same time as the recognition of brackets and arcs. In this case, recognition is performed over a wider range than for bracket recognition, and alphanumeric (and Japanese) characters are added to the dictionary. When characters are recognized, the recognition results that are candidates for a character string are extracted from the sequence of recognition rectangles, and the part name is determined by selecting the maximum character-string matching score and comparing it against a threshold. A character string dictionary of part names is prepared for this purpose. Since part names are often abbreviated in later paragraphs, the dictionary is arranged so that an abbreviation can be matched with the corresponding part name.
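The maximum-score selection and threshold test might be sketched as follows. The matching score of the embodiment is not described, so this sketch substitutes the similarity ratio of `difflib.SequenceMatcher` from the Python standard library; the function name, the dictionary layout, the sample part names and the threshold value are all assumptions.

```python
from difflib import SequenceMatcher


def match_part_name(text, dictionary, threshold=0.6):
    """Return the canonical part name best matching the recognized text,
    or None if the best score is below the threshold.

    dictionary: {canonical name: [full name and abbreviations]}.
    """
    best_name, best_score = None, 0.0
    for name, variants in dictionary.items():
        for variant in variants:
            score = SequenceMatcher(None, text.lower(), variant.lower()).ratio()
            if score > best_score:       # maximum value selection
                best_name, best_score = name, score
    # Threshold determination: reject weak matches.
    return best_name if best_score >= threshold else None


# Hypothetical dictionary listing each part with its abbreviations.
PART_NAMES = {
    "Violin I": ["Violin I", "Vln. I"],
    "Flute": ["Flute", "Fl."],
}
```

Because the full name and its abbreviations map to the same canonical entry, a part labeled in full in the first paragraph and abbreviated in later paragraphs resolves to one part.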
[0039]
Labels that are separated from the upper and lower edges of a recognition processing rectangle and from the staff within that rectangle, that is, components of the staff, may fall within both the upper and the lower recognition processing rectangles. In this case, the symbol information recognized in the previous recognition processing rectangle is checked when the next recognition processing rectangle is recognized, and erasing the same figure from the next recognition processing rectangle in advance can contribute to speeding up recognition. Further, some symbols, for example a downward slur at the bottom of a rectangle, can be determined in advance to belong to one particular part; in this case, the symbol information is entered only for the corresponding part.
In the embodiment, the shift amount is calculated for each staff and the shift correction is performed within the rectangular image. However, a single shift amount may instead be calculated for the entire captured image, and the entire image may be shifted.
[0040]
[Effects of the invention]
As described above, according to the present invention, staffs can be grouped for each paragraph, and each part can be matched between paragraphs. Therefore, there is an effect that it is possible to automatically convert notes and the like described in the score into performance information with appropriate timing and tone color. In addition, there is an effect that it is possible to achieve a high recognition rate even if the score magnification and the symbol shape are different.
[Brief description of the drawings]
FIG. 1 is a block diagram showing a configuration of an embodiment of a score recognition apparatus of the present invention.
FIG. 2 is an explanatory diagram of a histogram.
FIG. 3 is a flowchart showing the main processing of the CPU 1.
FIG. 4 is a flowchart showing details of the staff recognition process in S5 of FIG. 3.
FIG. 5 is a flowchart showing details of the paragraph recognition process in S6 of FIG. 3.
FIG. 6 is an explanatory diagram showing the paragraph line detection areas.
[Explanation of symbols]
1 ... CPU, 2 ... ROM, 3 ... RAM, 4 ... Hard disk device, 5 ... Floppy disk device, 6 ... CRT display device, 7 ... CRT interface circuit, 8 ... Keyboard, 9 ... Keyboard interface circuit, 10 ... Printer, 11 ... Printer interface circuit, 12 ... Scanner, 13 ... Scanner interface circuit, 14 ... MIDI interface circuit, 15 ... Bus

Claims

A score recognition device for recognizing various symbols of a score from score image data read for each rectangular area, comprising:
staff detection means for detecting staffs from the score image data read for each rectangular area; and
paragraph line detection means for detecting, from the score image data read for each rectangular area, a paragraph line extending between adjacent staffs, a paragraph line extending upward from the uppermost staff, and a paragraph line extending downward from the lowermost staff,
wherein, in the score image data read for each of the rectangular areas adjacent in the vertical direction, when a paragraph line extending downward from the lowermost staff of the score image data read in the upper rectangular area is detected and a paragraph line extending upward from the uppermost staff of the score image data read in the lower rectangular area is detected, the lowermost staff and the uppermost staff are recognized as being included in one paragraph.

A score recognition device for recognizing various symbols of a score from score image data read for each rectangular area, comprising:
staff detection means for detecting staffs from the score image data read for each rectangular area;
paragraph detection means for detecting, from the score image data read for each rectangular area, the presence or absence of paragraph lines between a plurality of the staffs; and
bracket detection means for detecting the presence or absence of a bracket on the left side of each paragraph line,
wherein, when the bracket detection means detects that a bracket is present, the plurality of staffs enclosed by the bracket are recognized as one part.

The score recognition device according to claim 2, further comprising: part name recognition means for recognizing, from the score image data, part names written in the score; and part correspondence detection means for establishing the correspondence of parts between paragraphs based on the part names recognized by the part name recognition means.