JP2682383B2

JP2682383B2 - Music score recognition device

Info

Publication number: JP2682383B2
Application number: JP5171000A
Authority: JP
Inventors: 一彦首藤
Original assignee: Yamaha Corp
Current assignee: Yamaha Corp
Priority date: 1992-08-03
Filing date: 1993-06-17
Publication date: 1997-11-26
Anticipated expiration: 2012-11-26
Also published as: JPH06102871A

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Field of Industrial Application] The present invention relates to a music score recognition device that recognizes the staves, notes, symbols, and their positions in a musical score from image data obtained by reading the score with an image scanner or the like, and that generates and outputs information such as the pitch, sounding timing, and sounding duration of musical tones based on the recognition result.

[0002]

[Prior Art] Attempts have been made to handle the whole process from reading a score to automatic performance in one pass: score information is recognized from image data obtained by reading the score with an image scanner or the like, MIDI (Musical Instrument Digital Interface) data is created from the recognition result, and the created MIDI data is supplied to a MIDI sound source device such as an electronic musical instrument.

[0003] The general processing flow of conventional systems of this kind is as follows.
(1) Score data capture.
(2) Staff detection and recognition.
(3) Staff erasure.
(4) Bar line detection and recognition.
(5) Bar line erasure.
(6) Note detection.
(7) Note recognition.
(8) Note erasure.
(9) Symbol detection.
(10) Symbol recognition.
(11) Symbol erasure.
(12) Performance data creation.
(13) Automatic performance (MIDI data creation and output).

[0004] Here, staff detection and recognition is performed by detecting long, continuous, equally spaced horizontal line segments, based on, for example, a histogram of the horizontal pixel counts projected onto the vertical axis. The vertical positions of the detected staff lines serve as information for determining the pitch of each detected note. The detected staff lines are erased so that they do not interfere with note and symbol recognition. The detected bar lines are used to recognize groups of staves that are played simultaneously. Note detection and recognition and symbol detection and recognition are performed using known pattern matching techniques. Since MIDI codes can be created once symbol recognition (10) is complete, symbol erasure (11) is not strictly necessary, but symbols are sometimes erased for convenience of processing.

[0005]

[Problems to Be Solved by the Invention] However, because the conventional score recognition processing described above presupposes processing on a large computer, it involves complex image processing; when the same processing is performed on an ordinary personal computer, the processing time becomes too long for practical use. In addition, a score contains chords, in which several notes sound at the same time, the notes making up a chord do not necessarily all have the same length, and rests also occur. It is therefore not easy to determine the actual sounding timing from the recognition results of events consisting of notes and symbols.

[0006] The present invention has been made in view of these circumstances, and its object is to provide a music score recognition device that can recognize a score efficiently with a small amount of computation and that enables accurate recognition of events on the score and accurate musical tone output.

[0007]

[Means for Solving the Problems] A music score recognition device according to the present invention recognizes the staves, notes, symbols, and their positions in a musical score from image data obtained by reading an image of the score, and generates information such as the pitch, sounding timing, and sounding duration of musical tones based on the recognition result. The device is characterized by comprising: event recognition means for obtaining, based on the recognition results for the staves, notes, symbols, and their positions in the score, the pitch of each note, the order of occurrence and note length of each event consisting of a note or symbol, and simultaneous-occurrence information that associates a plurality of events that should occur at the same time; and sounding interval determination means for determining the sounding interval of the musical tones based on the order of occurrence and note lengths of the events and, when a plurality of events associated by the simultaneous-occurrence information exist, determining the sounding interval based on the note length of the event with the shortest note length among them.

[0008] The sounding interval determination means is further characterized in that, when an event is a rest, it adds the note length of the rest to the sounding interval.

[0009]

[Operation] In the present invention, the order of occurrence and note length of each event are obtained based on the recognition results for the staves, notes, symbols, and their positions in the score, and events that occur simultaneously are associated with one another by simultaneous-occurrence information. When the sounding interval is determined and a plurality of events associated by the simultaneous-occurrence information exist, the interval is determined based on the note length of the event with the shortest note length among them, so the correct sounding interval can be determined by a simple decision process. When an event is a rest, the sounding timing can easily be kept consistent by the very simple method of adding its note length to the sounding interval.
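The interval rule described above can be illustrated with a short Python sketch. It is only one possible reading of the text, not the patented implementation: the `Event` record, the tick unit, and the integer `group` field standing in for the simultaneous-occurrence information are all assumptions made for illustration.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass
class Event:
    kind: str    # "note" or "rest" (hypothetical labels)
    length: int  # note length in ticks (assumed unit)
    group: int   # stand-in for the simultaneous-occurrence information


def sounding_intervals(events):
    """Return (lead_in, intervals): ticks before the first onset and
    the interval from each note onset to the next one."""
    intervals = []
    lead_in = 0
    for _, grp in groupby(events, key=lambda e: e.group):
        grp = list(grp)
        # Among simultaneous events, the shortest note length decides the interval.
        shortest = min(e.length for e in grp)
        if any(e.kind == "note" for e in grp):
            intervals.append(shortest)
        elif intervals:
            # A rest produces no onset; its length is added to the running interval.
            intervals[-1] += shortest
        else:
            lead_in += shortest
    return lead_in, intervals
```

For a chord of a 480-tick note and a 240-tick note followed by a 240-tick rest, the sketch first takes 240 ticks from the chord and then adds the rest, so the next note starts 480 ticks after the chord.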

[0010]

[Embodiment] An embodiment of the present invention is described below with reference to the drawings. FIG. 1 is a block diagram showing the configuration of a score recognition and automatic performance system according to one embodiment of the invention. The system comprises a music score recognition device 1, an external MIDI sound source device 2, and an output device 3. The music score recognition device 1 reads a printed score or the like, recognizes the staves, notes, symbols, and their positions in the score, generates performance data such as the pitch, sounding timing, and sounding duration of musical tones based on the recognition result, and generates and outputs MIDI data in accordance with that performance data. The external MIDI sound source device 2, such as an electronic musical instrument, executes automatic performance processing in accordance with the MIDI data output from the music score recognition device 1. The output device 3, such as a speaker or audio equipment, is driven by the external MIDI sound source device 2. The music score recognition device 1 is a computer system such as a personal computer or workstation and consists of an image scanner 12, a CPU 13, a ROM 14, a RAM 15, a timer 16, a switch 17, a display 18, and a MIDI interface 19, interconnected via a system bus 11.

[0011] The image scanner 12 is an image input device that optically reads the image of a score and takes binary image data consisting of dot data into the music score recognition device 1 as original image data. The CPU 13 executes the score recognition processing in accordance with a score recognition program stored in the ROM 14. The RAM 15 stores the binary image data captured by the image scanner 12 and provides a work area for the score recognition processing. The timer 16 determines the output timing of the MIDI data based on the performance data obtained by the score recognition processing and issues an interrupt to the CPU 13 for MIDI data output. The switch 17 and the display 18 form the man-machine interface between the music score recognition device 1 and the operator. The music score recognition device 1 outputs the generated MIDI data to the external MIDI sound source device 2 via the MIDI interface 19.

[0012] Next, the operation of the score recognition and automatic performance system configured in this way is described. FIG. 2 is a flowchart of the score recognition and automatic performance processing. In this embodiment the processing consists broadly of the following four parts.
(1) Preprocessing (staff and bar line recognition, inclination correction, staff erasure, and beam erasure).
(2) Object recognition (circumscribed rectangle search and matching processing).
(3) Event recognition processing (pitch recognition and note length recognition) and performance data creation.
(4) Automatic performance (MIDI data creation and output).

[0013] First, the score is read with the image scanner 12, and the original image data of the score is stored in the RAM 15 (step S1). When a score such as that shown in FIG. 3 is read with the image scanner 12, it may be read at an inclination to the X and Y axes. In that case, as shown in FIG. 4, the resulting original image data is inclined by some angle with respect to the X and Y axes. Such inclination in the original image data greatly affects the recognition processing. Specifically, for the inclination θ1 with respect to the X axis, because the staff lines are long in the X direction, even a slight inclination considerably blunts the profile of the projection P (the projection of the dot counts onto the Y axis), as shown in FIG. 4, and the staff line positions can no longer be detected. The inclination θ2 with respect to the Y axis makes it difficult to perform the matching processing in object recognition and to recognize chords that should sound simultaneously.

[0014] To correct such inclination with respect to the X and Y axes, staff recognition and inclination correction processing is executed as the first stage of preprocessing (step S2). FIG. 5 is a flowchart showing the details of this processing. First, to remove objects that would hinder staff recognition, horizontal runs, in which a predetermined number of dots are lined up consecutively in the horizontal direction (X-axis direction), are extracted from the original image data (step S11). To remove objects effectively, the number of dots of a horizontal run to be detected must be set to a value larger than the typical horizontal length of an object. FIG. 6 shows the result of applying the horizontal run extraction to the original image data of FIG. 4. Some image data other than the staff lines, such as beams containing relatively long horizontal components, may remain, but since most objects are removed this causes no significant problem.
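As a rough illustration of the horizontal run extraction of step S11, the following Python sketch keeps only runs of consecutive dots that reach a minimum length. The function name `extract_horizontal_runs`, the list-of-lists image representation (1 for a black dot, 0 for white), and the `min_run` parameter are assumptions made for this example and do not come from the patent text.

```python
def extract_horizontal_runs(image, min_run):
    """Return a copy of a binary image that keeps only horizontal
    runs of at least min_run consecutive black dots."""
    out = [[0] * len(row) for row in image]
    for y, row in enumerate(image):
        x = 0
        while x < len(row):
            if not row[x]:
                x += 1
                continue
            start = x
            while x < len(row) and row[x]:
                x += 1
            # Keep the run only if it is long enough to be part of a staff line.
            if x - start >= min_run:
                for cx in range(start, x):
                    out[y][cx] = 1
    return out
```

In practice `min_run` would be chosen larger than the widest note head or symbol, as the paragraph above requires.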

[0015] Next, the staff lines are identified from the extracted horizontal component image data (step S12). As shown in FIG. 6, the horizontal component image data is scanned in the Y-axis direction, starting from the left edge, and five evenly spaced horizontal lines are detected as a staff. More specifically, let the spacings between the horizontal runs detected by a Y-direction scan be d1, d2, d3, and d4, as shown in FIG. 6. With Dm (the mean spacing of the horizontal runs) given by Equation 1 below, the five horizontal runs are judged to be a staff if Ds (the sum of the deviations of each spacing from the mean), given by Equation 2 below, is less than 3 dots. In other words, if the variation in the spacing of the horizontal runs is below a predetermined value, the runs are regarded as a staff.

[0016]

[Equation 1] Dm = (d1 + d2 + d3 + d4) / 4

[0017]

[Equation 2] Ds = |Dm − d1| + |Dm − d2| + |Dm − d3| + |Dm − d4| < 3 dots
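The staff test of Equations 1 and 2 is small enough to write out directly. The Python sketch below assumes that the Y coordinates of five candidate horizontal runs are already available, sorted from top to bottom; the function name `is_staff` and the `max_deviation` parameter are hypothetical.

```python
def is_staff(line_ys, max_deviation=3):
    """Apply Equations 1 and 2 to five candidate line positions
    (Y coordinates in dots, top to bottom)."""
    if len(line_ys) != 5:
        return False
    gaps = [b - a for a, b in zip(line_ys, line_ys[1:])]  # d1..d4
    dm = sum(gaps) / 4                                    # Equation 1
    ds = sum(abs(dm - d) for d in gaps)                   # Equation 2
    return ds < max_deviation
```

With line positions 100, 118, 137, 154, 172 the spacings are 18, 19, 17, 18, so Dm = 18 and Ds = 2, which passes the 3-dot limit.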

[0018] At this point, the positions that first satisfy the above condition, shown as black dots in the figure, are taken as the staff line positions, and the Y coordinate of each staff line and the width Da from the top line to the bottom line of the staff are stored in the RAM 15. When the resolution of the image scanner 12 is 360 dpi, each staff line is about 5 to 6 dots thick; in that case, the Y coordinate of the center of each staff line is stored.

[0019] Next, as shown in FIG. 6, the Y-direction scan is repeated while shifting little by little in the X-axis direction, and the inclination θ of the staff is calculated by Equation 3 below (step S13).

[0020]

[Equation 3] ∠θ = tan^-1 (x / y)

[0021] Here, x is the number of dots in the X-axis direction and y is the number of dots in the Y-axis direction. θ may also be held as data of the form "y dots in the Y-axis direction per x dots in the X-axis direction". To speed up the Y-direction scanning, scans performed after the staff position has been detected skip the range of the staff width Da, as shown in FIG. 6. The scan need not cover the whole range in the X-axis direction; for example, scanning only the leftmost third is sufficient to determine the inclination, which speeds up the processing further.

[0022] Next, using the same technique as for staff detection, the original image data is scanned successively, for example in the X-axis direction, and lines extending a long way in the Y-axis direction are detected as bar lines (step S14). For example, when several staves are joined by common bar lines because a treble part (with a treble clef) and a bass part (with a bass clef) are played in parallel, as in a piano score, a Y-direction line at least three times as long as the staff width Da is regarded as a bar line. Such a condition may be specified with the switch 17 according to the type of score. The X coordinate and length of each bar line are stored in the RAM 15 as bar line information. The bar line position information can be used for checking when the beats do not add up in the note length determination described later. The bar line length information is used to determine the group of staves that are played in parallel.
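The bar line criterion for joined staves (a vertical line at least three times the staff width Da) might be sketched as follows in Python. This is a simplified illustration, not the patented procedure: it reports every column that contains a sufficiently long vertical run, so a bar line several dots thick yields several adjacent entries. The function name and the `factor` parameter are assumptions.

```python
def detect_bar_lines(image, staff_width, factor=3):
    """Return (x, top_y, length) for each vertical run of black dots
    whose length is at least factor * staff_width."""
    height, width = len(image), len(image[0])
    min_length = factor * staff_width
    bars = []
    for x in range(width):
        y = 0
        while y < height:
            if not image[y][x]:
                y += 1
                continue
            start = y
            while y < height and image[y][x]:
                y += 1
            if y - start >= min_length:
                bars.append((x, start, y - start))
    return bars
```

The stored length can then be compared against multiples of Da to decide how many staves a bar line spans.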

[0023] Next, inclination correction is applied to the original image data. This processing consists of Y-axis-direction inclination correction (step S15) and X-axis-direction inclination correction (step S16). In the Y-axis-direction inclination correction (step S15), based on the inclination angle θ obtained earlier and as shown in FIG. 7(a), the dots of the original image are shifted in the Y-axis direction, in units of dots, for each block in the X-axis direction; this is done separately for each staff and its vicinity (the range in the Y-axis direction that is relevant to the performance). The correction is performed per staff because, depending on the score, the inclination often differs from staff to staff. In the X-axis-direction inclination correction (step S16), likewise based on the inclination angle θ and as shown in FIG. 7(b), the dots of the original image are shifted in the X-axis direction, in units of dots, for each block of consecutive rows in the Y-axis direction, which corrects the inclination with respect to the Y axis. This X-axis-direction correction may, however, be executed only when the inclination angle θ is at or above a predetermined angle. The predetermined angle is set, taking the resolution of the image scanner into account, to the limit at which the inclination begins to affect object recognition and chord detection. This value may also be varied according to the length of the bar lines. In this way, for a small inclination that does not affect recognition, the inclination correction with respect to the Y axis can be omitted to shorten the processing time. FIG. 8 shows an example of the image data after the Y-axis-direction inclination correction, and FIG. 9 shows an example of the image data after the X-axis-direction inclination correction. As shown in FIG. 9, the profile P′ of the X-axis-direction projection after inclination correction has high peaks at the staff line positions.
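The block-wise Y-axis-direction correction of step S15 can be pictured with the following Python sketch. It assumes the inclination is supplied in the "y dots per x dots" form mentioned in paragraph [0021], here named `rise` and `run`, with a positive `rise` meaning that the staff drifts toward larger Y as X increases. The function name, the `block_width` parameter, and the choice to measure the drift at the left edge of each block are all assumptions for illustration.

```python
def correct_y_inclination(image, rise, run, block_width):
    """Shift each block of columns vertically, in whole dots, to undo
    a staff drift of `rise` rows per `run` columns."""
    height, width = len(image), len(image[0])
    out = [[0] * width for _ in range(height)]
    for x0 in range(0, width, block_width):
        # Drift accumulated at the left edge of this block, rounded to whole dots.
        shift = round(x0 * rise / run)
        for x in range(x0, min(x0 + block_width, width)):
            for y in range(height):
                src = y + shift
                if 0 <= src < height:
                    out[y][x] = image[src][x]
    return out
```

The X-axis-direction correction of step S16 would follow the same pattern with the roles of rows and columns exchanged.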

[0024] Next, the X-axis-direction projection of the inclination-corrected image data is obtained (step S17), and the Y coordinate of the center, in the Y-axis direction, of each peak of the resulting projection is taken as the definitive staff line position (step S18). The definitive staff line spacing D is obtained at the same time. As an alternative to obtaining the projection, an intermediate position on the X axis may be sampled so as to scan between the positions at which the staff lines peak, and the Y coordinates that satisfy the condition Ds < 3 dots may be taken as the definitive staff line positions. Since the image data of FIG. 9 has also undergone X-axis-direction inclination correction, obtaining the Y-axis-direction projection additionally makes it possible to determine the accurate X-direction position of each object during the object recognition described later.
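Steps S17 and S18 amount to counting dots per row and taking the center of each peak. The Python sketch below shows one way to do this; the rule that a row belongs to a peak when its count reaches a fraction `threshold_ratio` of the largest count is an assumption of this example, since the text does not say how peaks are delimited.

```python
def staff_line_centers(image, threshold_ratio=0.5):
    """Return the center Y coordinate of each peak in the row-wise
    dot-count projection of a binary image."""
    profile = [sum(row) for row in image]  # dots per row (the projection)
    peak = max(profile, default=0)
    if peak == 0:
        return []
    cutoff = peak * threshold_ratio
    centers = []
    start = None
    # A trailing 0 closes a peak that touches the bottom edge.
    for y, count in enumerate(profile + [0]):
        if count >= cutoff and start is None:
            start = y
        elif count < cutoff and start is not None:
            centers.append((start + y - 1) // 2)
            start = None
    return centers
```

The staff line spacing D can then be estimated from the differences between consecutive centers.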

[0025] When the staff recognition and inclination correction processing (step S2) is finished, only the staff lines are erased from the inclination-corrected image data (step S3). The staff lines are erased in step S3 because leaving them would cause problems in the object recognition described later. What requires care when erasing the staff lines is that, as shown in FIG. 13(a), the processing must not erase the objects that make up the score along with them.

[0026] The details of the staff erasure processing are described with reference to the flowchart of FIG. 10. First, from the inclination-corrected image data, runs of pixel dots that have a predetermined thickness corresponding to the average staff line thickness and lie along the recognized staff line positions are discriminated (step S21). Next, the thickness is checked along the run, and at the point where the predetermined thickness is no longer satisfied, the length of the run is discriminated (step S22). If the run is judged to be, for example, at least twice the staff line spacing D in length, it is erased (step S23). Taking FIG. 11, an enlarged and partly abbreviated view of portion A of FIG. 9, as an example: since a staff line is about 5 to 6 dots thick, a run in the X-axis direction that fits within a thickness Dl of about 8 dots, allowing a small tolerance, is tracked, and at the point t where the thickness condition is no longer satisfied, it is determined whether the run length Ll is 2D or more. If so, a small margin M is left at each end and the line segment between the margins (region Ai in the figure) is erased. As a result, most of the staff is erased as shown in FIG. 12, and, as shown in FIG. 13(b), an enlarged view of portion B of FIG. 12, the staff lines are erased with small remnants left behind but with the object information fully preserved. The average staff line thickness used as the discrimination criterion may be determined and stored in advance as the typical range of staff line thickness corresponding to the staff line spacing in ordinary scores, or the result of the staff recognition described above may be used.
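A simplified Python sketch of the erasure rule for a single staff line is given below. It measures, column by column, the vertical thickness of the black run passing through the staff line row, groups consecutive thin columns into stretches, and erases stretches that are long enough while leaving a margin at each end. The function names and parameters (`max_thickness` for Dl, `min_length` for 2D, `margin` for M) are hypothetical, and the column-wise thickness measure is an assumption rather than the tracking procedure of FIG. 10.

```python
def vertical_run(image, x, y):
    """Return (top, bottom) of the black run through (x, y), or None if white."""
    if not image[y][x]:
        return None
    top = bottom = y
    while top > 0 and image[top - 1][x]:
        top -= 1
    while bottom < len(image) - 1 and image[bottom + 1][x]:
        bottom += 1
    return top, bottom


def erase_staff_line(image, line_y, max_thickness, min_length, margin):
    """Erase, in place, thin stretches of the staff line on row line_y."""
    width = len(image[0])
    runs = [vertical_run(image, x, line_y) for x in range(width)]
    thin = [r is not None and r[1] - r[0] + 1 <= max_thickness for r in runs]
    x = 0
    while x < width:
        if not thin[x]:
            x += 1
            continue
        start = x
        while x < width and thin[x]:
            x += 1
        # Erase only long stretches, keeping a margin at both ends.
        if x - start >= min_length:
            for cx in range(start + margin, x - margin):
                top, bottom = runs[cx]
                for y in range(top, bottom + 1):
                    image[y][cx] = 0
    return image
```

Columns where a stem or note head crosses the staff line are thicker than `max_thickness`, so they end the stretch and are left untouched, which is what preserves the objects.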

[0027] When the staff erasure processing (step S3) is finished, beam recognition and erasure processing (step S4) is executed next. As with the staff lines, a beam might be mistaken during object recognition for, say, a cluster of filled note heads of a chord such as that shown in FIG. 14(b), so beams are erased from the image data at least until note length recognition is performed. However, many beams have shapes that do not satisfy any fixed conditions, and it is impossible to recognize all of them; for the convenience of subsequent processing, at least beams of typical shape are recognized and erased. Even if a beam of extreme shape is left unerased, that is preferable to erasing an object. As the conditions for a beam of typical shape, the thickness Db, length Lb, and angle θb are assumed to satisfy the following conditions, as shown in FIG. 14.

[0028]
Thickness: D/3 < Db < D
Length: Lb ≥ 2D
Angle: −45° ≤ θb ≤ 45°

[0029] FIG. 15 is a flowchart of the beam recognition and erasure processing. First, runs of dots satisfying the thickness condition for Db are discriminated from the image data from which the staff lines have been erased (step S31), and it is discriminated whether each run satisfies the length condition for Lb (step S32). If the run satisfies both conditions, it is finally discriminated whether it satisfies the angle condition for θb (step S33). Runs of dots that satisfy all the conditions are then erased (step S34).
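The three beam conditions can be expressed as a single predicate. The Python sketch below assumes that the thickness, length, and angle of a candidate run have already been measured by some earlier step; the function and parameter names are hypothetical.

```python
def is_typical_beam(thickness, length, angle_deg, staff_spacing):
    """Check the typical-beam conditions against the staff line spacing D."""
    d = staff_spacing
    if not (d / 3 < thickness < d):   # thickness: D/3 < Db < D (step S31)
        return False
    if length < 2 * d:                # length: Lb >= 2D (step S32)
        return False
    return -45 <= angle_deg <= 45     # angle: -45 deg <= theta_b <= 45 deg (step S33)
```

With D = 18 dots, a run 9 dots thick, 40 dots long, and inclined at 10° qualifies, whereas a run exactly 6 dots thick does not, because the thickness bound is strict.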

[0030] When the beam recognition and erasure processing (step S4) is finished, object recognition processing (step S5) is executed. The details are described with reference to the flowchart of FIG. 16. Because matching over the whole image would be inefficient, the range for matching is limited: the image data from which the staff lines and beams have been erased is scanned, and a circumscribed rectangle is set for each region containing a cluster of dots (step S41). For example, as shown in FIG. 17, the image from which the staff lines and beams have been erased is scanned horizontally, line by line, from the upper left. When a dot is hit at the point marked ＊ in the figure, the outline of the dot cluster (marked ◇) is traced in the direction of the arrows, the maximum values Xmax, Ymax and minimum values Xmin, Ymin of X and Y are obtained, and the rectangle bounded by these coordinates is taken as the circumscribed rectangle. Concretely, starting from the point marked ＊, the operation "examine the neighbors above, below, left, and right; if there is a dot, register that point and temporarily erase the dot" is repeated until no more dots are found. So that the image with staff lines and beams erased can be restored once the circumscribed rectangles of all dot clusters have been obtained, either that image is saved separately before the rectangles are sought, or the positions of all temporarily erased dots are registered.
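The "register and temporarily erase" procedure is essentially a 4-neighbor flood fill. A minimal Python sketch follows; it works on a copy of the image, which corresponds to the option of saving the image separately so that it can be restored. The function name and the `(xmin, ymin, xmax, ymax)` tuple layout are assumptions of this example.

```python
def circumscribed_rectangles(image):
    """Return (xmin, ymin, xmax, ymax) for each 4-connected cluster of
    black dots, in the order the clusters are met by a raster scan."""
    work = [row[:] for row in image]  # working copy; the input stays intact
    height, width = len(work), len(work[0])
    boxes = []
    for y0 in range(height):
        for x0 in range(width):
            if not work[y0][x0]:
                continue
            work[y0][x0] = 0          # temporarily erase the starting dot
            stack = [(x0, y0)]
            xmin = xmax = x0
            ymin = ymax = y0
            while stack:
                x, y = stack.pop()
                xmin, xmax = min(xmin, x), max(xmax, x)
                ymin, ymax = min(ymin, y), max(ymax, y)
                # Examine the neighbors above, below, left, and right.
                for nx, ny in ((x, y - 1), (x, y + 1), (x - 1, y), (x + 1, y)):
                    if 0 <= nx < width and 0 <= ny < height and work[ny][nx]:
                        work[ny][nx] = 0
                        stack.append((nx, ny))
            boxes.append((xmin, ymin, xmax, ymax))
    return boxes
```

Each returned rectangle then limits the area that has to be scanned with the reference patterns.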

[0031] Next, the inside of each circumscribed rectangle is scanned with reference patterns and the degree of matching is evaluated (step S42). As shown in FIG. 18, a reference pattern used as a matching template in step S42 is formed as a distribution pattern of feature points that effectively capture the characteristics of an object's shape: points where a pixel dot must be present (black circles in the figure) and points where a pixel dot must not be present (white circles in the figure). The patterns are registered in the ROM 14 in advance. FIG. 18 shows the reference patterns for the sharp sign (a), (b), the natural sign (c), the whole note (d), the half note (e), and the quarter note (f). The sharp sign (a), (b) and the natural sign (c) are quite similar, but the upper right and lower left portions are set as points where a dot must be present in the former and as points where a dot must not be present in the latter, so the two can be distinguished by this difference. The whole note (d) and the half note (e) are also quite similar, but they can be distinguished by using patterns like those illustrated, which take the inclination of the ellipse into account. The reference patterns are also designed so that they do not test the portions where staff line remnants remain (the margins M left during the staff erasure described above). Moreover, as shown in FIG. 18(b), when the object is shifted by a semitone relative to (a), the staff line remnants are at different positions; no feature points are placed at positions where a staff line remnant may be left, so the matching is unaffected even if part of a staff line remains.

[0032] The reference pattern read from the ROM 14 is reduced or enlarged according to the staff line spacing before being used for pattern matching. This makes it possible to handle a score image input at any resolution, and also scores in which the objects differ greatly in size from the usual, such as scores for children. Using the reference pattern scaled to the object size, the inside of the circumscribed rectangle is scanned. At each point, a match gives an evaluation value of "+1"; if the point does not match, the eight neighboring positions (above, below, left, right and the diagonals) are also examined as shown in FIG. 19, and a match there gives "+0.9". This allows for slight deformation of the object shape. If the evaluation value finally obtained is 95% or more of the value for a perfect match, the object is judged to be the one represented by that reference pattern (step S43). For example, if the reference pattern consists of eight feature points, an evaluation value of 7.6 (= 8 × 0.95) or more means that the object is judged to correspond to that reference pattern.
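The scoring rule of steps S42 and S43 can be illustrated with a short sketch. This is only one reading of the description, not the patented implementation: it assumes the score is accumulated per feature point, and the names `FeaturePoint`, `pixel`, `match_score` and `is_match` are hypothetical.

```python
from typing import List, Sequence, Tuple

# Hypothetical feature point: (dx, dy, must_be_black), relative to the pattern origin.
FeaturePoint = Tuple[int, int, bool]

# Offsets of the eight neighboring positions (up, down, left, right, diagonals).
NEIGHBORS = [(-1, -1), (0, -1), (1, -1), (-1, 0), (1, 0), (-1, 1), (0, 1), (1, 1)]


def pixel(image: Sequence[Sequence[int]], x: int, y: int) -> bool:
    """Return True for a black pixel; positions outside the image count as white."""
    if 0 <= y < len(image) and 0 <= x < len(image[0]):
        return bool(image[y][x])
    return False


def match_score(image, pattern: List[FeaturePoint], ox: int, oy: int) -> float:
    """Score a pattern placed at (ox, oy): +1 per exact hit, +0.9 per neighbor hit."""
    score = 0.0
    for dx, dy, want_black in pattern:
        x, y = ox + dx, oy + dy
        if pixel(image, x, y) == want_black:
            score += 1.0
        elif any(pixel(image, x + nx, y + ny) == want_black for nx, ny in NEIGHBORS):
            score += 0.9
    return score


def is_match(image, pattern: List[FeaturePoint], ox: int, oy: int,
             threshold: float = 0.95) -> bool:
    """Accept when the score reaches 95% of the perfect score (one point per feature)."""
    return match_score(image, pattern, ox, oy) >= threshold * len(pattern)
```

With eight feature points the acceptance threshold works out to 7.6, so a placement with one feature point displaced by a pixel (score 7.9) is still accepted.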

[0033] In the actual processing, the object discrimination of step S43 and the matching evaluation of step S42 are linked to speed up processing, for example by erasing from the image any pattern that has already been recognized, and by evaluating the degree of matching only partway and, if the fit is extremely poor, aborting the evaluation for that reference pattern and moving on to the next one. Augmentation dots and whole rests are difficult to recognize with this method, so they are recognized from their geometric features instead. A whole rest is recognized by calculating, from the staff line spacing, the area that a whole rest should have in that score and searching for a figure that fits that area and whose dots are densely packed. An augmentation dot is recognized during the event recognition described later, on the basis that its area is extremely small and its dots show little dispersion. The information for each reference pattern also includes its center coordinates; when an object is recognized, the position in the image data of the reference pattern's center is registered in the RAM 15 together with the information on the recognized object.

[0034] When the object recognition process (step S5) is finished, the event recognition process (step S6) is executed. FIG. 20 is a flowchart of this event recognition process. First, the pitch of each note is recognized from the staff information obtained by the staff recognition process and the object coordinates obtained by the object recognition process (step S51). In step S51, the musical symbols that control pitch, such as sharps, flats and naturals, obtained in step S43 are sorted along the X axis together with the white and black note heads (and, if necessary, the bar lines) so that they are arranged in time order, and each note is then fitted to the staff in turn to determine its pitch. For ledger lines, step S51 in principle determines the pitch at positions extended at the staff line spacing D, but in practice ledger lines are often spaced more widely than D. Taking this into account, the pitch is determined on the assumption that the spacing D widens by a coefficient of, for example, 1.2.

[0035] Next, the closeness of the X coordinates of the note-head centers obtained by the object discrimination of step S43 is evaluated, and notes that should sound simultaneously are grouped as a chord (step S52). Chords can take various forms, such as h1, h2 and h3 in FIG. 21; in the case of h2, the X coordinates of the head centers of notes that should sound together are offset by Dh. Taking this into account, notes are regarded as a chord when the distance Dh between the X coordinates of their head centers is within the staff line spacing D. For more accurate chord detection, the spacing of the stems described later may also be evaluated.
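The chord rule of step S52 (head centers whose X coordinates differ by no more than the staff line spacing D) might be sketched as follows. The function name `group_chords` is hypothetical, and comparing each head against the first head of the current group is an assumption; the description only states the Dh ≤ D criterion.

```python
from typing import List, Tuple

Head = Tuple[int, int]  # (x, y) center of a note head, in pixels


def group_chords(heads: List[Head], d: int) -> List[List[Head]]:
    """Group note heads whose X coordinates lie within d of the group's first head."""
    groups: List[List[Head]] = []
    for x, y in sorted(heads):  # sort along the X axis, i.e. in time order
        if groups and abs(x - groups[-1][0][0]) <= d:
            groups[-1].append((x, y))   # close enough: same chord
        else:
            groups.append([(x, y)])     # start a new event
    return groups
```

For example, with D = 8, heads at X = 10 and X = 14 fall into one chord, while a head at X = 40 starts a new event.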

[0036] Next, the position of the end of the vertical bar (stem) joined to the note head is evaluated (step S53). This is needed to decide where the search starts when the beams or flags are searched for and counted. As shown in FIG. 22, step S53 searches for the upper-right and lower-left extremities of the note, obtains the distance from each to the center of the black or white note head (marked * in the figure), and takes the farther one as the stem end (marked △ in the figure). To cope with distorted stems, dots are traced recursively from the head center in two sets of directions: upper left, up, upper right and right; and left, lower left, down and lower right. When the note head is white, as in FIG. 22(b), tracing proceeds up, down, left and right from the center point *, and the recursive tracing starts from the point where a black dot is first hit, so the search succeeds reliably even if the printing of the score is faint. If the head of a note recognized as an object is white and the difference between the distance from its upper-right extremity to the center point and the distance from its upper-left extremity to the center point is small, the note is regarded as a single whole note.
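The final choice made in step S53 (the farther of the two extremities becomes the stem end) can be written as a small sketch. It assumes the two extremities have already been found by the recursive tracing, which is not reproduced here; the function name `stem_end` and the tie-breaking rule are illustrative only.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def stem_end(center: Point, upper_right: Point, lower_left: Point) -> Point:
    """Return whichever extremity lies farther from the note-head center."""
    d_up = math.dist(center, upper_right)
    d_down = math.dist(center, lower_left)
    # On a tie the upper-right extremity is returned (an arbitrary choice).
    return upper_right if d_up >= d_down else lower_left
```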

[0037] Once the stem end has been detected, the beams or flags are counted with the stem end as the reference (step S54). As shown in FIG. 23, when the position detected as the stem end is to the upper right of the note head, tracing starts from the points offset by D/2 to the left and right of the stem end △ and proceeds downward along the Y direction. The number of consecutive pixel dots in the tracing direction is counted, and if this value falls within the beam-width margin described for step S31, a beam is counted. To avoid counting the black note heads of a chord as beams, tracing stops once blank space continues for a certain length. When the detected stem end is to the lower left of the note head, tracing starts from the points offset by D/2 to the left and right of the stem end and proceeds upward along the Y direction. The beam counts obtained on the left and right are then compared, and the shorter note length, that is, the larger beam count, is taken as the note length of the event. In FIG. 23, therefore, the middle note being examined is recognized as a sixteenth note because the two beams on its right side take priority.
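A rough sketch of the beam count of step S54 follows. It assumes the pixels along one tracing line have already been extracted into a list (1 = black), and the parameter names `min_width`, `max_width` and `max_gap` stand in for the beam-width margin and the blank-space limit, whose actual values the description does not give.

```python
from typing import Sequence


def count_beams(column: Sequence[int], min_width: int, max_width: int,
                max_gap: int) -> int:
    """Count black runs of beam-like thickness along one tracing line."""
    beams = 0
    run = 0   # length of the current black run
    gap = 0   # length of the current blank run
    for value in column:
        if value:
            run += 1
            gap = 0
        else:
            if min_width <= run <= max_width:
                beams += 1          # the run that just ended was a beam
            run = 0
            gap += 1
            if gap > max_gap:
                break               # too much blank space: stop before a note head
    else:
        # The line ended without a break; account for a run touching the end.
        if min_width <= run <= max_width:
            beams += 1
    return beams


def beams_for_note(left: int, right: int) -> int:
    """The larger count (the shorter note length) wins."""
    return max(left, right)
```

A note with one beam on its left and two on its right would thus be given two beams, matching the sixteenth-note example of FIG. 23.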

[0038] Next, the dot detection process (step S55) is executed. As shown for example in FIG. 22, a rectangular range extending the staff line spacing D horizontally and D/3 above and below the center point of the note head is searched, and any dot group with a sufficiently small area and little dispersion of pixel dots is recognized as an augmentation dot belonging to that note. Finally, the note length is determined from the note information recognized in step S43 and the number of beams or flags obtained in steps S52 and S54. When a dot has been detected, half of the original length is added to the length of the corresponding note (step S56). At this point, the bar-line positions obtained earlier can be used to make a final check of the note lengths measure by measure, so that an error can be reported when, for example, a measure in 4/4 time contains notes for only 3.5 beats.
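The dotted-note rule of step S56 and the per-measure check can be illustrated as follows. The sketch uses the unit of 24 clocks per quarter note that the description adopts for the recognition result; the function names and the time-signature parameters are hypothetical, and the lengths passed to `check_measure` are assumed to be those of successive events (a chord counted once).

```python
from typing import Sequence

CLOCKS_PER_QUARTER = 24  # clock unit used for the recognition result


def note_length(base_clocks: int, dotted: bool) -> int:
    """A dot adds half of the original length to the note."""
    return base_clocks + base_clocks // 2 if dotted else base_clocks


def check_measure(lengths: Sequence[int], beats: int = 4, beat_unit: int = 4) -> bool:
    """Return True when the event lengths fill the measure exactly."""
    expected = beats * CLOCKS_PER_QUARTER * 4 // beat_unit
    return sum(lengths) == expected
```

In 4/4 time the expected total is 96 clocks, so a measure holding only 3.5 beats (84 clocks) fails the check and can be reported as an error.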

[0039] The event recognition results obtained in this way are stored in the RAM 15 as the score recognition result. For example, the recognition result for the score of FIG. 24 is as shown in FIG. 25. The recognition result consists of an event number, an event type, a note length (the number of clocks when a quarter note is 24 clocks long) and a simultaneous-occurrence (chord) flag. The event number identifies the notes a to e and h, the rests f and g, and so on. The event type indicates whether the event is a note or a rest and, for a note, gives pitch information such as C4 or F3. When the simultaneous-occurrence flag is "1", the event forms a chord with the immediately preceding event.

[0040] When event recognition (step S6) is finished, performance data, which is intermediate data for creating MIDI data, is created from the recognition result (step S7). As shown in FIG. 26, controlling the external MIDI tone generator 2 requires calculating a gate time from the note length (the gate time is the interval between the actual output of the MIDI note-on and note-off data, and is shorter than the note length) and determining the note-on and note-off timings. The interval from the note-on timing of one event to the note-on timing of the next, that is, the duration, must also be determined. Performance data containing the duration, note number and gate time is therefore generated from the recognition result shown in FIG. 25.

[0041] FIG. 27 is a flowchart of performance data creation, and FIG. 28 shows an example of the performance data created by this process. First, "0" is written as the duration giving the timing of the first note event, and the duration register (DUR) is reset to "0" (step S61). Next, the event information, already sorted along the X axis as shown in FIG. 25, is taken out in ascending order of event number, and the simultaneous-occurrence flag of the event that follows each event is tested for "1" to determine whether there are simultaneous events (steps S62, S63). If there are none, the note length is stored in DUR as the duration up to the next event (step S65); if there are simultaneous events, the shortest note length among them is stored in DUR as the duration up to the next event (step S65).

[0042] Next, it is determined whether the event is a note (step S66). Only if it is a note, the note number corresponding to its pitch is written, and the note length multiplied by 0.8 is written as the gate time (steps S67, S68). This is because, in actual instrumental performance, the real sounding period of a note is often about 0.8 times its note length, except in slurred or staccato playing. When there are simultaneous events, the note test (step S66) and the writing of the note number and gate time (steps S67, S68) are repeated (step S69). When no further simultaneous event remains, the content of DUR is written as the duration (step S71). If this leaves two duration values in succession, that is, if no note number and gate time were written because the event was a rest, the two consecutive durations are added and rewritten as a single new duration. Once performance data has been created for all events, end data is written (steps S70, S72). This process yields performance data consisting of durations, note numbers and gate times, as shown in FIG. 28.
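The performance data creation of step S7 can be sketched as below. This is an illustration under stated assumptions rather than the flowchart of FIG. 27 itself: the `Event` record, the tuple layout of the output and the rounding of the gate time to whole clocks are all hypothetical choices. It writes an initial duration of 0, uses the shortest note length of a simultaneous group as the duration, writes a gate time of 0.8 times the note length, and merges consecutive durations left by rests.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Event:
    number: int
    note: Optional[int]          # MIDI note number, or None for a rest
    length: int                  # note length in clocks (24 per quarter note)
    simultaneous: bool = False   # True: forms a chord with the preceding event


def build_performance_data(events: List[Event]) -> List[Tuple]:
    """Build a list of ("duration", n), ("note", number, gate) and ("end",) records."""
    data: List[Tuple] = [("duration", 0)]   # timing of the first event
    i = 0
    while i < len(events):
        # Collect the event and any following events flagged as simultaneous.
        group = [events[i]]
        i += 1
        while i < len(events) and events[i].simultaneous:
            group.append(events[i])
            i += 1
        dur = min(e.length for e in group)   # the shortest note length sets the interval
        for e in group:
            if e.note is not None:           # rests write no note number or gate time
                data.append(("note", e.note, round(e.length * 0.8)))
        if data[-1][0] == "duration":
            # Two durations in a row (a rest): add them into one.
            data[-1] = ("duration", data[-1][1] + dur)
        else:
            data.append(("duration", dur))
    data.append(("end",))
    return data
```

For a quarter-note chord followed by a quarter rest and a half note, the rest contributes no note record; its 24 clocks are simply added to the preceding duration, giving an interval of 48 clocks before the half note sounds.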

[0043] Finally, MIDI data is created from the performance data and output to the external MIDI tone generator 2 to carry out the automatic performance (step S8). Clocks with a fixed period corresponding to the tempo are counted, and a MIDI note-on message is output when the duration contained in the performance data has elapsed. The clocks are also counted from the moment the note-on is output, and a MIDI note-off message is output when the gate time has elapsed. As shown for example in FIG. 29, each of these MIDI messages consists of three bytes: a status byte indicating note on or off and the performance channel, a note number indicating the pitch, and a velocity carrying information such as loudness. The velocity may be derived from the recognition results if dynamic markings such as accents, crescendos and decrescendos have also been recognized; if the recognized information contains no such data, it may be entered through the switch 17 as appropriate.
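The three-byte messages and their timing can be illustrated with a short sketch. The status values 0x90 (note on) and 0x80 (note off) combined with a 4-bit channel follow the MIDI 1.0 convention; everything else is an assumption made for illustration: the record layout of the performance data (`("duration", n)` and `("note", number, gate)` tuples), the default velocity of 64 and the function names.

```python
from typing import List, Sequence, Tuple


def note_on(channel: int, note: int, velocity: int) -> bytes:
    """Status byte 0x9n, then note number and velocity (7 bits each)."""
    return bytes([0x90 | (channel & 0x0F), note & 0x7F, velocity & 0x7F])


def note_off(channel: int, note: int, velocity: int = 0) -> bytes:
    """Status byte 0x8n, then note number and release velocity."""
    return bytes([0x80 | (channel & 0x0F), note & 0x7F, velocity & 0x7F])


def schedule(data: Sequence[Tuple], channel: int = 0,
             velocity: int = 64) -> List[Tuple[int, bytes]]:
    """Convert performance data into (clock, message) pairs ordered by clock."""
    clock = 0
    out: List[Tuple[int, bytes]] = []
    for record in data:
        if record[0] == "duration":
            clock += record[1]                 # wait until the next note-on time
        elif record[0] == "note":
            _, note, gate = record
            out.append((clock, note_on(channel, note, velocity)))
            out.append((clock + gate, note_off(channel, note)))  # gate time later
    out.sort(key=lambda item: item[0])
    return out
```

A real player would then count tempo clocks and send each message when its clock value is reached.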

[0044] In this way the score can be recognized accurately with simple processing, so the processing speed can be raised and the required memory capacity reduced, making practical score recognition possible on a personal computer system. In the embodiment above, performance data is first created from the score recognition result and MIDI data is then created from it, but the recognition result may instead be converted directly into MIDI data. The method of obtaining the inclination of the staff is not limited to that of the embodiment; the coordinates of the left and right ends of the staff may be found and the inclination obtained from them. The inclination may also be corrected in the X-axis direction first and then in the Y-axis direction.

【００４５】[0045]

[Effect of the Invention] As described above, according to the present invention, the order of occurrence and the note length of events are obtained from the recognition results for the staff, notes, symbols and their positions in the score, and multiple events that occur simultaneously are associated with one another by simultaneous-occurrence information. When the sounding interval of the musical tones is determined and there are multiple events associated by the simultaneous-occurrence information, the interval is determined from the note length of the event with the shortest note length, so the correct sounding interval can be determined by a simple decision process. Furthermore, when an event is a rest, its note length is simply added to the sounding interval, an extremely simple method that makes it easy to keep the sounding timing consistent.

[Brief description of the drawings]

FIG. 1 is a block diagram of a score recognition and automatic performance system according to an embodiment of the present invention.

FIG. 2 is a flowchart of the score recognition and automatic performance process of the system.

FIG. 3 is a diagram showing an example of a score to be recognized.

FIG. 4 is a diagram showing the original image data obtained by reading the score, together with its projection.

FIG. 5 is a flowchart of the staff recognition and inclination correction process.

FIG. 6 is a diagram showing the image of extracted horizontal components in that process.

FIG. 7 is a diagram showing the method of inclination correction in that process.

FIG. 8 is a diagram showing the image data after inclination correction in the Y-axis direction in that process.

FIG. 9 is a diagram showing the image data after inclination correction in the X-axis direction in that process, together with its projection.

FIG. 10 is a flowchart of the staff erasing process.

FIG. 11 is a diagram for explaining the method of discriminating thickness and length in that process.

FIG. 12 is a diagram showing the image data after the staff has been erased by that process.

FIG. 13 is a diagram showing an enlarged part of that image data together with a comparative example.

FIG. 14 is a diagram for explaining the beam recognition and erasing process.

FIG. 15 is a flowchart of that process.

FIG. 16 is a flowchart of the object recognition process.

FIG. 17 is a diagram showing the method of determining a circumscribed rectangle in that process.

FIG. 18 is a diagram showing the reference patterns used in that process.

FIG. 19 is a diagram showing the matching evaluation method in that process.

FIG. 20 is a flowchart of the event recognition process.

FIG. 21 is a diagram showing the method of chord discrimination in that process.

FIG. 22 is a diagram showing the method of stem end determination and dot detection in that process.

FIG. 23 is a diagram showing the method of counting beams and flags in that process.

FIG. 24 is a diagram showing an example of a score subjected to event recognition.

FIG. 25 is a diagram showing the recognition result for that score.

FIG. 26 is a time chart showing the timing at which an actual MIDI tone generator is driven from the recognition result.

FIG. 27 is a flowchart of the performance data creation process.

FIG. 28 is a diagram showing the performance data created by that process.

FIG. 29 is a diagram showing the MIDI data created from the performance data.

[Explanation of symbols]

1 ... Score recognition device, 2 ... External MIDI tone generator, 3 ... Output device, 11 ... System bus, 12 ... Image scanner, 13 ... CPU, 14 ... ROM, 15 ... RAM, 16 ... Timer, 17 ... Switch, 18 ... Display, 19 ... MIDI interface.

Claims

(57) [Claims]

1. A musical score recognition device that recognizes the staff, notes, symbols and their positions in a score from image data obtained by reading an image of the score and, based on the recognition result, generates information such as the pitch, sounding timing and sounding time of musical tones, the device comprising: event recognition means for obtaining, based on the recognition result for the staff, notes, symbols and their positions in the score, the pitch of the notes, the order of occurrence and the note length of events consisting of the notes and symbols, and simultaneous-occurrence information that associates a plurality of the events that are to occur simultaneously; and sounding interval determination means for determining the sounding interval of the musical tones based on the order of occurrence and the note length of the events and, when there are a plurality of events associated by the simultaneous-occurrence information, determining the sounding interval based on the note length of the event having the shortest note length among the plurality of events.

2. The musical score recognition device according to claim 1, wherein, when the event is a rest, the sounding interval determination means adds the note length of the rest to the sounding interval.