CA2001923A1 - Automatic music transcription - Google Patents

Automatic music transcription

Info

Publication number
CA2001923A1
Authority
CA
Canada
Prior art keywords
acoustic signals
information
cpu
memory
segmenting
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
CA002001923A
Other languages
French (fr)
Inventor
Yoshinari Utsumi
Shichiro Tsuruta
Hiromi Fujii
Masaki Fujimoto
Masanori Mizuno
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
NEC Home Electronics Ltd
NEC Corp
Original Assignee
NEC Home Electronics Ltd
NEC Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by NEC Home Electronics Ltd, NEC Corp filed Critical NEC Home Electronics Ltd
Publication of CA2001923A1 publication Critical patent/CA2001923A1/en
Abandoned legal-status Critical Current


Classifications

    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10GREPRESENTATION OF MUSIC; RECORDING MUSIC IN NOTATION FORM; ACCESSORIES FOR MUSIC OR MUSICAL INSTRUMENTS NOT OTHERWISE PROVIDED FOR, e.g. SUPPORTS
    • G10G1/00Means for the representation of music
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10HELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H1/00Details of electrophonic musical instruments
    • G10H1/36Accompaniment arrangements
    • G10H1/40Rhythm
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10HELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H1/00Details of electrophonic musical instruments
    • G10H1/0008Associated control or indicating means
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10HELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H2210/00Aspects or methods of musical processing having intrinsic musical character, i.e. involving musical theory or musical parameters or relying on musical knowledge, as applied in electrophonic musical tools or instruments
    • G10H2210/031Musical analysis, i.e. isolation, extraction or identification of musical elements or musical parameters from a raw acoustic signal or from an encoded audio signal
    • G10H2210/061Musical analysis, i.e. isolation, extraction or identification of musical elements or musical parameters from a raw acoustic signal or from an encoded audio signal for extraction of musical phrases, isolation of musically relevant segments, e.g. musical thumbnail generation, or for temporal structure analysis of a musical piece, e.g. determination of the movement sequence of a musical work
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10HELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H2210/00Aspects or methods of musical processing having intrinsic musical character, i.e. involving musical theory or musical parameters or relying on musical knowledge, as applied in electrophonic musical tools or instruments
    • G10H2210/031Musical analysis, i.e. isolation, extraction or identification of musical elements or musical parameters from a raw acoustic signal or from an encoded audio signal
    • G10H2210/071Musical analysis, i.e. isolation, extraction or identification of musical elements or musical parameters from a raw acoustic signal or from an encoded audio signal for rhythm pattern analysis or rhythm style recognition
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10HELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H2210/00Aspects or methods of musical processing having intrinsic musical character, i.e. involving musical theory or musical parameters or relying on musical knowledge, as applied in electrophonic musical tools or instruments
    • G10H2210/031Musical analysis, i.e. isolation, extraction or identification of musical elements or musical parameters from a raw acoustic signal or from an encoded audio signal
    • G10H2210/081Musical analysis, i.e. isolation, extraction or identification of musical elements or musical parameters from a raw acoustic signal or from an encoded audio signal for automatic key or tonality recognition, e.g. using musical rules or a knowledge base
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10HELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H2210/00Aspects or methods of musical processing having intrinsic musical character, i.e. involving musical theory or musical parameters or relying on musical knowledge, as applied in electrophonic musical tools or instruments
    • G10H2210/031Musical analysis, i.e. isolation, extraction or identification of musical elements or musical parameters from a raw acoustic signal or from an encoded audio signal
    • G10H2210/086Musical analysis, i.e. isolation, extraction or identification of musical elements or musical parameters from a raw acoustic signal or from an encoded audio signal for transcription of raw audio or music data to a displayed or printed staff representation or to displayable MIDI-like note-oriented data, e.g. in pianoroll format

Landscapes

  • Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Auxiliary Devices For Music (AREA)

Abstract

ABSTRACT OF THE DISCLOSURE

An arrangement (apparatus and method) for automatically transcribing music. Means are provided for capturing acoustic signals and storing them in a memory.
Auxiliary rhythms are generated and reported to a user in audio and/or video form. Once the acoustic signals are stored in memory along with the rhythms, pitch and power information is extracted from them. The acoustic signals are segmented based on the pitch and/or power information. This segmenting, in essence, divides the acoustic signals into sections, each of which can be regarded as forming a single level in musical interval. The segments are identified with a level on the axis of absolute musical interval on the basis of the pitch information. Then the results of the foregoing processing are displayed and/or printed.

Description

AUTOMATIC MUSIC TRANSCRIPTION

BACKGROUND OF THE INVENTION
Field of the Invention
This invention relates in general to automatic music transcription. More specifically, the invention is directed to an arrangement (method and apparatus) for preparing musical score data from acoustic signals. These acoustic signals may include vocal sounds, humming voices, and musical instrument sounds.
Description of the Related Art
An automatic music transcription system transforms such acoustic signals as those in vocals, hummed voices, and musical instrument sounds into musical score data. It is necessary for such a system to be able to detect from the acoustic signals basic items of information, such as, for example, sound lengths, musical intervals, keys, times, and tempos.
Acoustic signals comprise repetitions of fundamental waveforms in continuum. It is not possible to obtain directly from the acoustic signals the basic items of information needed to establish the musical score data.
According to a conventional method of automatic music transcription, the individual items of information are obtained by the following sequence of steps:


a) information is obtained regarding the repetitions of fundamental waveforms representing the levels of acoustic signals in pitch (which will hereafter be referred to as "the pitch information") and the power information of such signals for each of the analyzed cycles,
b) then, the acoustic signals are divided into those sections (i.e. segments) which can be considered to form one and the same level in musical interval (this process being called "segmentation") on the basis of the pitch information and/or the power information so extracted,
c) subsequently, the musical interval of each segment is determined with reference to the axis of absolute musical interval on the basis of the pitch information on the particular segment, and the key of the acoustic signal is determined on the basis of the information on the musical interval so determined, and
d) thereafter, the time and tempo of the acoustic signal are determined on the basis of the segments.
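Steps a) and b) can be sketched in outline as follows. This is an illustrative reconstruction, not the patented implementation; the frame size and the power-rise threshold are assumed parameters.

```python
def analyze(samples, frame_size=4):
    """Step (a): split the signal into analytical cycles and compute
    the power (sum of squares) of each cycle."""
    frames = [samples[i:i + frame_size]
              for i in range(0, len(samples) - frame_size + 1, frame_size)]
    powers = [sum(x * x for x in f) for f in frames]
    return frames, powers

def segment_by_power(powers, rise_threshold):
    """Step (b): start a new segment wherever the power rises sharply,
    on the assumption that a pitch change is accompanied by a power change."""
    boundaries = [0]
    for i in range(1, len(powers)):
        if powers[i] - powers[i - 1] > rise_threshold:
            boundaries.append(i)
    return boundaries
```

Steps c) and d) would then operate on the per-segment pitch information; they are discussed further in the detailed description below.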
Since the time and the tempo are also determined in the application of the existing automatic music transcription method, the user will eventually sing or play a desired song while keeping time and tempo for himself. However, for a user who is not accustomed to performance or singing, such an act is difficult to do. Moreover, there are users who like to perform music or sing a song, measuring the tempo with a metronome or the like.
Furthermore, the acoustic signals which are input from a music performance or a song by a user measuring the tempo or keeping the time for himself, above all those acoustic signals which occur in songs, contain fluctuations in power and pitch, and, because of this feature, it has been found difficult to perform segmentation even with the utilization of the power information and the pitch information. Segmentation is an element important for the compilation of musical score data, and a lower degree of accuracy in segmentation results in a considerably low degree of accuracy in the musical score data to be obtained ultimately. Therefore, it is desirable that the accuracy of segmentation be improved.
SUMMARY OF THE INVENTION
The present invention provides an automatic music transcription arrangement (apparatus and method) which is easier to use than known systems. Furthermore, the system according to the present invention provides more accurate segmentation than can be obtained from known systems.
According to a first aspect of the invention, there is provided an arrangement for capturing acoustic signals and storing them in the memory while reporting the input auxiliary rhythm information, including at least information on tempo, by an auditory sense process or a visual sense process, the system being incorporated in an automatic music transcription system which converts such acoustic signals into musical score data by a set of processes including at least the process for capturing such acoustic signals and storing them in the memory by means of an acoustic signal input means and thereafter extracting the pitch information, which represents the repetitive cycles of their waveforms and their sound pitch, and the power information of such acoustic signals out of the acoustic signals so stored in the memory, the process for segmentation, which consists in dividing the acoustic signals into sections each of which can be regarded to represent a single level in musical interval, by performing such segmentation on the basis of the pitch information and/or the power information, and the musical interval identifying process, which identifies each of the segments derived by such division with a level on the axis of absolute musical interval on the basis of the pitch information.
The system has been designed to give the users the input auxiliary information by an acoustic sense process and/or a visual sense process, so that they may have ease and simplicity in generating the acoustic signals when they capture acoustic signals and take such signals into the system for storing them in the memory for the purpose of performing the process for music transcription.
According to a second aspect of the present invention, there is provided an automatic music transcription system that is somewhat similar to the first aspect described above, but wherein the system stores input auxiliary rhythm information as well in the memory on the same time axis at the time when it performs the capturing and storing in memory of the acoustic signals and wherein the segmentation process is divided among the first process for dividing the acoustic signals into those sections which can be regarded as representing the same level of musical interval as determined on the basis of the input auxiliary rhythm information stored in the memory, the second process for dividing the acoustic signals into those segments which can be regarded as representing the same level in musical interval as determined on the basis of the pitch information and/or the power information, and the third process for making adjustments to the sections as divided by the first process and the second process.
The system is arranged so as to utilize the input auxiliary rhythm information, so that the accuracy of the segmentation process may be improved. In other words, the system stores in its memory also the input auxiliary rhythm information at the same time as the acoustic signals are captured and stored in the memory. Then, the system performs its segmentation process on the basis of this input auxiliary rhythm information, performs its segmentation process also on the basis of the pitch information and the power information, and then makes adjustments to the results of such segmentation processes.
According to a third aspect of the present invention, there is provided a system including an input auxiliary rhythm reporting means whereby the input auxiliary rhythm information including at least the tempo information is reported by an auditory sense process and/or a visual sense process at the time when the acoustic signals are captured and stored in the memory, the system being incorporated in an automatic music transcription system for converting the acoustic signals into musical score data, the system being provided at least in part with the means of capturing and taking the acoustic signals into the system, the means of storing in the memory the acoustic signals so taken into the system, the pitch and power extracting means, which extracts the pitch information representing the repetitive cycle of the waveforms in the acoustic signals stored in the memory and representing the level in pitch, and the power information from the acoustic signals, the segmentation means for dividing the acoustic signals into those sections which can be regarded as representing the same level in musical interval as determined on the basis of the pitch information and the power information, and the musical interval identifying means, which determines the musical interval of the acoustic signals, with respect to the sections so divided, with reference to the axis of absolute musical interval.
The system is so designed that its input auxiliary rhythm reporting means reports the input auxiliary rhythm information by an auditory sense process and/or a visual sense process at the time when the acoustic signals are captured and stored in the memory. As the result of this feature, it has been made possible for the user to perform the input operations on the basis of the input auxiliary rhythm information and consequently to enjoy greater ease in the input of signals.
According to a fourth aspect of the present invention, there is provided a system having a memory means designed to store also the input auxiliary rhythm information in memory on the same time axis at the time when the acoustic signal is captured and processed for storage in the memory, and provided also with a segmenting means including a first segmenting section for segmenting the acoustic signals into those sections each of which can be regarded as forming one and the same level of musical interval, as determined on the basis of the input auxiliary rhythm information stored in the memory, a second segmenting section for segmenting the acoustic signals into those sections each of which can be regarded as forming one and the same level of musical interval, as determined on the basis of the pitch information and the power information, and a third segmenting section for making adjustments to those sections as divided into segments by the first segmenting section and the second segmenting section.
The memory means, which stores the acoustic signals in its memory, keeps also the input auxiliary rhythm information in memory on the same time axis as reported when the captured acoustic signals are stored in the memory, and the system is so designed that the first segmenting section performs its segmentation process on the basis of this input auxiliary rhythm information and the third segmenting section makes adjustments to the results of this segmentation process and the results of the segmentation performed by the second segmenting section on the basis of the pitch information and the power information. As the result of this feature, the accuracy of segmentation can be improved.
BRIEF DESCRIPTION OF THE DRAWINGS
Preferred embodiment(s) of the invention will be described in detail with reference to the drawings, wherein like reference numerals denote like or corresponding parts throughout.
FIGURE 1 is a flow chart of the input process for the acoustic signals to be processed by one of the embodiments of the present invention;
FIGURE 2 is a block diagram of an automatic music transcription system incorporating the present invention;
FIGURE 3 is a flow chart of the automatic music transcription process;
FIGURE 4 is a flow chart of the segmentation process based on the measure information and the power information generated by the system;
FIGURE 5 is a flow chart showing greater details of the segmentation process based on the measure information and the power information;
FIGURE 6 is a characteristic curve chart representing one example of the segmentation; and
FIGURE 7 is a block diagram of other embodiments of the automatic music transcription system.
DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS
Preferred embodiment(s) of the invention will be described in detail with reference to the drawings, wherein like reference numerals denote like or corresponding parts throughout.
Automatic Music Transcription System

FIGURE 2 is a block diagram of an automatic music transcription system incorporating the present invention.
A Central Processing Unit (CPU) 1 performs overall control for the entire system. CPU 1 executes an acoustic signal input program shown in the FIGURE 1 flow chart and a music transcription processing program shown in the FIGURE 3 flow chart. The acoustic signal input and music transcription processing programs are stored in a main storage device 3 connected to CPU 1 via a bus 2. Also coupled to bus 2 are a keyboard 4, which serves as an input device, a display unit 5, which serves as an output device, an auxiliary memory device 6 for use as working memory, and an analog/digital (A/D) converter 7. An acoustic signal input device 8, which may comprise a microphone, etc., provides input to A/D converter 7. Acoustic signal input device 8 captures the acoustic signals in vocal songs or humming voices or like sound signals generated by musical instruments and then transforms the signals into electrical signals, thereafter outputting the electrical signals to the A/D converter 7.
Also connected to bus 2 is a speaker driving section 9 for driving a speaker 10. Speaker 10 generates, when necessary, scattered input auxiliary rhythm sounds representing the predetermined time and tempo under control of CPU 1.


CPU 1 operates in accordance with the acoustic signal input program flow charted in FIGURE 1 to input acoustic signals into the system. These signals are stored in main storage device 3. When there has been received a command to input the acoustic signals, together with a command to operate with the specified time and tempo, as entered on the keyboard 4, the input acoustic signals are stored in an orderly sequence into the auxiliary storage device 6. The system also temporarily stores input auxiliary rhythm information in auxiliary memory device 6.
Upon completion of the input of acoustic signals into the system, CPU 1 executes the music transcription processing program (flow charted in FIGURE 3) stored in the main storage device 3, thereby converting the input acoustic signals into musical score data and outputting such data to display unit 5 as required.
Input of Acoustic Signals
FIGURE 1 is a flow chart of the process for inputting acoustic signals. When the CPU 1 receives a command by way of keyboard 4 to operate in its input mode, the CPU 1 starts executing the program flow charted in FIGURE 1. It first displays on the display unit 5 a prompt for the user to input timing information. It then receives timing information from the user in response to the prompt via keyboard 4. Display unit 5 then displays a prompt to the user to input tempo information. The tempo information is received from the user in response to that prompt (Steps SP 1 and SP 2). Thereafter, the CPU 1 carries out arithmetic operations to determine the cycle and intensity of the input auxiliary rhythm information on the basis of the timing information and the tempo information. CPU 1 then stands by for the input of an input start command from keyboard 4 (Steps SP 3 and SP 4).
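The cycle computed at Step SP 3 can be sketched minimally as follows. The function and its interface are assumptions for illustration; the patent states only that the cycle and intensity are derived from the timing and tempo information.

```python
def rhythm_cycle(beats_per_measure, tempo_bpm):
    """Hypothetical helper for Step SP 3: the interval between auxiliary
    rhythm sounds (in seconds) and a predicate telling whether the n-th
    sound marks the beginning of a measure (the accented click)."""
    beat_period = 60.0 / tempo_bpm          # one beat at the given tempo
    def is_measure_start(n):
        return n % beats_per_measure == 0   # every measure opens with a click
    return beat_period, is_measure_start
```

For quadruple time at 120 beats per minute, clicks would fall every 0.5 seconds, with every fourth click marking a measure start.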
When an input start command is given by the user, the CPU 1 causes an input auxiliary rhythm sound to be generated from the speaker 10. It thereafter determines whether or not the input auxiliary rhythm sound so generated indicates the beginning of any measure.
If the sound indicates the beginning of a measure, then the CPU 1 stores the sound in the auxiliary storage device 6 and thereafter receives into the system the acoustic signals composed of digital data as processed through the acoustic signal input device 8 and the A/D converter 7. However, if the sound does not indicate any beginning of a measure, then the CPU 1 immediately inputs the acoustic signals (Steps SP 5 through SP 8). Thereafter, the CPU 1 stores the acoustic signals so input into the system in the auxiliary storage device 6 (Step SP 9).
When the one set of data on the acoustic signals is thus stored in the auxiliary storage device 6, the CPU 1 determines whether or not any command to finish the input operation has been given by way of the keyboard 4. When a finish command has been given, the CPU 1 stops its series of operations. However, if there has not been any finish command given, the CPU 1 further determines whether or not the system is at a timing for the generation of any input auxiliary rhythm sound (Steps SP 10 and SP 11). If it is not at any timing for the generation of such a sound, the CPU 1 returns to step SP 8 and proceeds to the step at which it takes acoustic signals into the system. If it is found that the operation of the system is at the timing for generating the input auxiliary rhythm sound, the CPU 1 returns to step SP 5 and moves on to the step for the generation of the next input auxiliary rhythm sound.
Thus, the system takes in acoustic signals generated by a user while generating the input auxiliary rhythm sound, and stores the signals one after another in orderly sequence, together with marks indicating the beginning of a measure, in the auxiliary storage device 6.
The feature of the system related to generating the input auxiliary rhythm sound makes it easy for a user to input the acoustic signals.
Music Score Transcription Process

FIGURE 3 is a flow chart of the automatic music transcription process. This process does not occur until after the input of acoustic signals.
First, the CPU 1 extracts the pitch information for the acoustic signals for each analytical cycle using autocorrelation analysis of the acoustic signals. It also extracts power information for each analytical cycle by processing the acoustic signals to find the square sum. Then, CPU 1 performs various pre-treatment processes, such as, for example, pre-treatments for noise elimination and smoothing (Steps SP 21 and SP 22).
Thereafter, the CPU 1 segments the input acoustic signals into predetermined sections on the basis of the marks placed at the beginning of each measure as stored in the auxiliary storage device 6. It then reviews such sections on the basis of the changes in power, thereby separating such sections to establish the segments which can be regarded as representing the same sound (Steps SP 23 and SP 24). Next, the CPU 1 performs a tuning process (Step SP 25).
CPU 1 calculates the amount of deviation of the musical interval axis that the acoustic signal has in relation to the axis of the absolute musical interval on the basis of the state of distribution of the pitch information, and effects a shift of the obtained pitch information in accordance with the amount of deviation thereof. In other words, the CPU 1 modifies the pitch information in such a way that there will remain a smaller difference between the axis of musical interval for the singer or the musical instrument that has generated the acoustic signal and the axis of absolute musical interval.
The CPU 1 thus identifies the musical interval of the particular segment with that on the axis of the absolute musical interval to which the relevant pitch information is considered to be closest, as seen on the basis of the pitch information of the segment obtained by the above-mentioned segmentation process, and further executes the segmentation process again on the basis of whether or not the musical intervals of the identified segments in continuum are identical (Steps SP 26 and SP 27).
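The tuning and interval-identification steps (SP 25 and SP 26) can be sketched as follows, assuming the absolute musical interval axis is the equal-tempered semitone scale referenced to A4 = 440 Hz; that reference is a convention chosen here for illustration, not stated in the patent.

```python
import math

A4_HZ = 440.0  # assumed reference for the absolute musical interval axis

def to_semitones(freq_hz):
    """Position of a pitch on the absolute interval axis, in semitones
    above (or below) the A4 reference."""
    return 12.0 * math.log2(freq_hz / A4_HZ)

def tuning_offset(pitches_hz):
    """Step SP 25: average deviation of the input pitch distribution from
    the nearest absolute intervals; the pitch information is then shifted
    by this amount."""
    devs = [to_semitones(f) - round(to_semitones(f)) for f in pitches_hz]
    return sum(devs) / len(devs)

def identify_interval(freq_hz, offset):
    """Step SP 26: the absolute interval closest to the shifted pitch."""
    return round(to_semitones(freq_hz) - offset)
```

A singer who is consistently a tenth of a semitone sharp, for example, produces a 0.1-semitone offset, and every pitch still snaps to the intended interval once that offset is removed.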
After that, the CPU 1 finds the product sum of the frequency of occurrence of the musical intervals, as obtained by working out the classified total of the pitch information after the tuning thereof, and the certain prescribed weighting coefficients as determined in correspondence to the key, and, on the basis of the maximum value information of this product sum, the CPU 1 determines the key, for example, the C-major key or the A-minor key, for the piece of music in the input acoustic signals, thereafter ascertaining and correcting the musical interval by reviewing the same musical interval in greater detail with respect to the pitch information regarding the prescribed musical interval on the musical scale for the determined key (Steps SP 28 and SP 29).
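Step SP 28 can be sketched as a product sum between a pitch-class histogram and a per-key weighting profile. The 0/1 scale-membership weights below are an illustrative stand-in; the patent's actual prescribed coefficients are not given here, and the tonic numbering (0 = C) is a convention for the example.

```python
# Illustrative weighting coefficients: 1 for pitch classes in the major
# scale, 0 otherwise (the patent's coefficients are not specified here).
MAJOR_WEIGHTS = [1, 0, 1, 0, 1, 1, 0, 1, 0, 1, 0, 1]

def key_product_sum(pitch_classes, tonic):
    """Product sum of the interval-occurrence counts and the weights
    rotated to the candidate tonic (Step SP 28)."""
    hist = [0] * 12
    for pc in pitch_classes:
        hist[pc % 12] += 1
    return sum(hist[(tonic + i) % 12] * MAJOR_WEIGHTS[i] for i in range(12))

def best_major_key(pitch_classes):
    """The candidate key whose product sum is maximal, e.g. 0 for C major."""
    return max(range(12), key=lambda t: key_product_sum(pitch_classes, t))
```

Feeding in notes drawn from the C-major scale makes tonic 0 the unique maximum, since every note falls on a weighted scale degree.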
Subsequently, the CPU 1 carries out a final segmentation by reviewing the segmentation results on the basis of whether or not the finally determined musical interval contains identical segments in continuum or whether or not there is any change in power among the segments in continuum (Step SP 30).
After the musical interval and the segments (i.e., the sound lengths) have been determined in this manner, the CPU 1 produces the finalized musical score data through adjustment of the information, including the timing information and the tempo information which were input at the time when the input of the acoustic signals was started (Step SP 31).
Segmentation Based on Measure and Power
FIGURE 4 is a flow chart of the segmentation process based on the measure information and the power information generated by the system, and FIGURE 5 is a flow chart showing greater details of the segmentation process based on the measure information and the power information. The following is a detailed explanation of the flow charts in FIGURE 4 and FIGURE 5 related to the segmentation process (Steps SP 23 and SP 24 in FIGURE 3) based on the measure information and the power information on the acoustic signals. In this regard, FIGURE 4 is a flow chart illustrating such a process at the functional level, while FIGURE 5 is a flow chart illustrating the greater details of what is shown in FIGURE 4.
The acoustic signals are brought to their squares with respect to the individual sampling points within the analytical cycle, and the sum total of those square values is used for the power information on the acoustic signals in the analytical cycle.
First, an outline of such a segmentation process is described with reference to FIGURE 4. For the purposes of illustration only, we now assume that the quadruple time has been selected for the measure of the signals. Of course, the invention is not limited to this assumption.
The assumption is made only to facilitate explanation of the invention.
The CPU 1 takes out the mark for the beginning of a measure as stored in the auxiliary storage device 6, divides each measure into four equal portions, and puts a mark indicating the beginning of a beat at the initial part of each of the equally divided portions (Step SP 40). In the case of the quadruple measure not being selected, but rather the triple measure having been selected, the measure is to be divided into three equal portions. Next, the CPU 1 makes a further division of each of the obtained beats into four equal portions, and puts a mark for the beginning of a semiquarter note at the initial part of each of the equally divided portions (Step SP 41). In this manner, the acoustic signals are divided into 16 portions of each measure on the basis of the measure information. In those cases where not the quadruple measure but the triple measure has been selected, one measure will be divided into twelve equal portions. Thereafter, the CPU 1 reviews these divided portions on the basis of the power information.
The system has been so arranged as to reflect the power information in the segmentation process because users may produce changes accompanied by some intensification of power when they change the pitch in the sounds, i.e., when they make a transition to the next sound.
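Steps SP 40 and SP 41 amount to placing equally spaced marks between successive measure marks; a minimal sketch follows, where mark positions are analytical-point indices (an assumption made for illustration).

```python
def subdivide_measures(measure_marks, beats_per_measure=4, notes_per_beat=4):
    """Steps SP 40-41: from each pair of adjacent measure marks, derive
    beats_per_measure * notes_per_beat equally spaced semiquarter-note
    marks (16 per measure for quadruple time, 12 for triple time)."""
    per_measure = beats_per_measure * notes_per_beat
    marks = []
    for a, b in zip(measure_marks, measure_marks[1:]):
        step = (b - a) / per_measure
        marks.extend(a + i * step for i in range(per_measure))
    return marks
```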
CPU 1 then extracts the points of a rise in the power information, putting a mark indicating a rising point at the appropriate place, and thereafter taking the mark which indicates the beginning of a semiquarter note and is located at a point closest to each of the rising points and putting a mark indicating the beginning of a semiquarter note at the rising point (Steps SP 42 and SP 43). The reason why these steps are carried out is that it is practically difficult for the user to make a change in sound in complete agreement with the timing of the input of acoustic signals, even if such signals are input with the generation of the input auxiliary rhythm sound. These steps are therefore provided to ensure that it can be judged with certainty whether or not the following section is a rest section, with the help of the division of a sound set at a point of change in the acoustic signal.
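Steps SP 42 and SP 43 relocate the nearest semiquarter-note mark onto each detected power rise. A sketch, under the same assumption that marks and rise points are analytical-point indices:

```python
def snap_marks_to_rises(semiquarter_marks, rise_points):
    """Steps SP 42-43: for each power-rise point, move the closest
    semiquarter-note mark onto that point, so that note boundaries agree
    with the actual sound changes rather than the nominal grid."""
    marks = sorted(semiquarter_marks)
    for r in rise_points:
        nearest = min(range(len(marks)), key=lambda k: abs(marks[k] - r))
        marks[nearest] = r
    return marks
```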
Subsequently, the CPU 1 counts the number of pieces of the pitch information in each semiquarter note section and puts a mark indicating the beginning of a rest at the initial point of each section where the number of pieces of such information is smaller than the threshold value (Step SP 44). Finally, the CPU 1 places a mark indicating the beginning of a segment at those points bearing a mark for the beginning of a measure, a rising point, or the beginning of a rest (Step SP 45). A mark indicating the beginning of a segment is made also at the point where a measure begins because one sound may extend over two measures, in which case it is the practice to show two musical notes in the respective measures indicated on the score.
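Steps SP 44 and SP 45 can then be sketched as follows; the pitch-count threshold is an assumed parameter, and section and mark positions are again illustrative indices.

```python
def rest_starts(pitch_counts_per_section, threshold):
    """Step SP 44: a section whose number of pitch-information pieces
    falls below the threshold is marked as the beginning of a rest."""
    return [i for i, n in enumerate(pitch_counts_per_section) if n < threshold]

def segment_starts(measure_marks, rise_marks, rest_marks):
    """Step SP 45: a segment begins wherever there is a measure start,
    a power-rise point, or a rest start."""
    return sorted(set(measure_marks) | set(rise_marks) | set(rest_marks))
```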
In this manner, the system obtains a plurality of segments through the division based on the measure information and the power information. Even if some of the segments obtained by this segmentation process should turn out to be inadequate, such segments will be rectified into proper segments by the segmentation to be executed at subsequent steps (Steps SP 27 and SP 30 given in FIGURE 3), as mentioned above.
Next, this process is explained in greater detail with reference to the flow chart in FIGURE 5. The CPU 1 first clears to zero the parameter i indicating each analytical cycle (such an analytical cycle is hereafter called an analytical point in view of the fact that it is very short), and then, ascertaining that the analytical point data (which include pitch information and power information) to be processed have not yet been exhausted, the CPU 1 judges whether or not any mark indicating the beginning of a measure is placed on that analytical point (Steps SP 50 through SP 52). In case no such mark is placed, the CPU 1 increments the parameter i for the analytical point and returns to the above-mentioned Step SP 51; but, in case such a mark is placed, the CPU 1 proceeds to perform the processes at Step SP 54 and the subsequent steps (Step SP 54). In this manner, the CPU 1 finds the mark indicating the beginning of the first measure.
Having detected a mark indicating the beginning of a measure, the CPU 1 sets i + 1 in the parameter j and, ascertaining that the analytical point data to be processed have not been exhausted, judges whether any mark indicating the beginning of a measure is placed on the particular analytical point (Steps SP 54 through SP 56).
In case no such mark is placed, the CPU 1 increments the parameter j and returns to Step SP 55 mentioned above; but, in case such a mark is placed, the CPU 1 proceeds to the processing of Step SP 58 and the subsequent steps (Step SP 57).
Here, at the time an affirmative result is generated at Step SP 56, the parameter i indicates the analytical point positioned at the former of the two consecutive marks which indicate the beginning of a measure, while the parameter j indicates the analytical point positioned at the latter of the two consecutive marks. Thus, the CPU 1 divides the section from the analytical point i to the analytical point j−1 into four equal portions (or into three equal portions in the case of a section with the triple beat) and puts a mark for the beginning of a beat on each of those portions, thereafter setting j in the parameter i, which indicates the analytical point positioned at the former of the marks indicating the beginning of a measure, and then returning to the above-mentioned Step SP 54 to proceed to the search for the analytical point bearing the mark indicating the beginning of a measure and positioned at the latter of the analytical points (Steps SP 58 and SP 59).
By the repeated execution of this loop process including Steps SP 54 through SP 59, the marks indicating the beginning of each beat are placed one by one in orderly sequence in the individual measure sections until the data on the final analytical point are taken out to produce an affirmative result at Step SP 55. At such a time, the CPU 1 places a mark indicating the beginning of a beat at the analytical point for the parameter i at that particular point in time, therewith completing the series of processes for putting the marks indicating the beginning of a beat, and thereafter proceeds to Step SP 61 and the subsequent steps for putting the marks indicating the beginning of each semiquarter note (Step SP 60).
If the CPU 1 obtains an affirmative result at Step SP 51 as it comes to the final data without finding any mark indicating the beginning of the initial measure, the CPU 1 proceeds, without placing any mark on such sections, to the processes for putting the marks indicating the beginning of the semiquarter notes. The portion of the process including Steps SP 50 through SP 60 corresponds to Step SP 40 in FIGURE 4.


The details of the processes corresponding to Step SP 41 in FIGURE 4, which are performed for putting the marks indicating the beginning of the semiquarter notes by finding two marks, one preceding the other, each indicating the beginning of a beat, and dividing the sections between such marks into four equal portions, are almost identical to the processes of Steps SP 50 through SP 60. Marks are placed by finding the marks indicating the beginning of the respectively preceding and following beats and dividing those sections into four equal portions. Therefore, a detailed discussion of that process is omitted (Steps SP 61 through SP 71).
Upon completion of the processes of placing marks indicating the beginning of the semiquarter notes, the CPU 1 clears to zero the parameter i for the analytical point and thereafter performs arithmetic operations to determine the function d(i) for extracting the rise in the power information with respect to that analytical point, ascertaining that the analytical point data to be processed have not yet been exhausted (Steps SP 72 through SP 74).
The rise extraction function d(i) for the power information, power(i), with respect to the analytical point i is determined by arithmetic operations according to the following equation and applied as such:

d(i) = {power(i+t) − power(i)} / {power(i+t) + power(i)} ... (1)

where t represents a natural number indicating an amount of time adequate for capturing the fluctuations in the rise of the power information.
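Equation (1) can be expressed directly in code. The Python sketch below is illustrative only; the list-based representation of power(i) and the guard against a zero denominator are assumptions not stated in the text.

```python
def rise_extraction(power, t, i):
    """Rise extraction function of equation (1):
    d(i) = (power(i+t) - power(i)) / (power(i+t) + power(i)).
    t is a small lag chosen to capture the rise in power."""
    num = power[i + t] - power[i]
    den = power[i + t] + power[i]
    return num / den if den else 0.0

# A jump in power produces a value near 1; flat power gives 0.
power = [1.0, 1.0, 9.0, 9.0]
print(round(rise_extraction(power, t=2, i=0), 2))  # → 0.8
print(rise_extraction(power, t=1, i=2))            # → 0.0
```

The normalized form keeps d(i) within −1 to 1 regardless of the overall signal level, which makes a fixed threshold θd workable.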
Thereafter, the CPU 1 judges whether or not the value of the rise extraction function d(i) so obtained is smaller than the threshold value θd, and, if it is smaller, the CPU 1 increments the parameter i for the analytical point and returns to Step SP 73 (Steps SP 75 and SP 76). On the other hand, in case the rise extraction function d(i) is found to be in excess of the threshold value θd, the CPU 1 places the mark indicating the beginning of a rise point at that analytical point (Step SP 77).
Thereafter, the CPU 1 ascertains that the processing has not yet been completed on the data for all the analytical points and then, performing arithmetic operations to determine the rise extraction function d(i), judges whether or not the rise extraction function d(i) is smaller than the threshold value θd (Steps SP 78 through SP 80). In case the rise extraction function d(i) is smaller than the threshold value, the CPU 1 increments the parameter i and returns to the above-mentioned Step SP 78 (Step SP 81).
The process of Steps SP 78 through SP 81 is a process for finding the analytical point at which the rise extraction function d(i) becomes smaller than the threshold value θd after the rise extraction function has once grown larger than the threshold value. Since there may be an analytical point where the rise extraction function rises again after the analytical point thus obtained, the CPU 1 returns to the above-mentioned Step SP 73 and resumes the process for extracting the rise points if it has found an analytical point where the rise extraction function becomes smaller than the threshold value, i.e. if it obtains an affirmative result at the above-mentioned Step SP 80.
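The loop of Steps SP 72 through SP 81 amounts to a threshold detector with hysteresis: a rise is marked when d(i) reaches the threshold, and the scan then waits for d(i) to fall back below the threshold before searching for the next rise. A hypothetical Python rendering, with the function and parameter names assumed:

```python
def extract_rise_points(power, t, theta_d):
    """Scan the analytical points, marking a rise where d(i) reaches
    theta_d, then skipping until d(i) falls below theta_d before
    searching again (Steps SP 72 through SP 81)."""
    def d(i):
        num = power[i + t] - power[i]
        den = power[i + t] + power[i]
        return num / den if den else 0.0

    rises, i, n = [], 0, len(power) - t
    while i < n:
        if d(i) >= theta_d:
            rises.append(i)                 # SP 77: mark the rise point
            while i < n and d(i) >= theta_d:
                i += 1                      # SP 78-81: wait for d to fall
        else:
            i += 1
    return rises

# Two distinct power rises yield two rise points.
print(extract_rise_points([1, 1, 1, 9, 9, 9, 1, 1, 1, 9, 9, 9],
                          t=1, theta_d=0.5))  # → [2, 8]
```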
By repeating the processing procedure mentioned above, the CPU 1 eventually detects at Step SP 73 or SP 78 that the processing of all the analytical points has been completed, and the CPU 1 proceeds to a review of the rise points on the basis of the length between adjacent rise points at Step SP 82 and the subsequent steps.
In this process, the CPU 1 clears to zero the parameter i for the analytical point and then, ascertaining that the data on the analytical points have not yet been exhausted, judges whether or not a mark indicating a rise point is placed on the analytical point (Steps SP 82 through SP 84). When the point is not a rise point, the CPU 1 increments the parameter i for the analytical point and then returns to Step SP 83 (Step SP 85). Upon the detection of a rise point through the repeated performance of this process, the CPU 1 sets the length parameter L at the initial value "1" in order to measure the length from the rise point to the next rise point (Step SP 86).
Thereafter, the CPU 1 increments the analytical point parameter i and then, ascertaining that the analytical point data have not yet been exhausted, further judges whether or not any mark indicating the beginning of a rise point is placed on the particular analytical point (Steps SP 87 through SP 89). If the CPU 1 finds as the result that the analytical point is not a rise point, the CPU 1 increments the length parameter L and also increments the analytical point parameter i, thereafter returning to the above-mentioned Step SP 88 (Steps SP 90 and SP 91).
By repeating the process of Steps SP 88 through SP 91, the CPU 1 will eventually come to an analytical point where the next mark indicating a rise point is placed, obtaining an affirmative result at Step SP 89. The length parameter L found at this time corresponds to the distance between the marked analytical point being taken up for processing and the immediately preceding marked analytical point, i.e. to the length between the respectively preceding and following rise points. If an affirmative result is obtained at Step SP 89, the CPU 1 judges whether or not this parameter L is shorter than the threshold value θL, and, when it is found to be at or above the threshold value θL, the CPU 1 returns to the above-mentioned Step SP 83 without eliminating the mark indicating a rise point; but, when it is smaller than the threshold value θL, the CPU 1 removes the former mark indicating the rise point and then returns to the above-mentioned Step SP 83 (Steps SP 92 and SP 93).
Moreover, in case the CPU 1 has returned to Step SP 83 from Step SP 92 or SP 93, the CPU 1 will immediately obtain an affirmative result at Step SP 84, unless the analytical point data have been exhausted, and the CPU 1 will proceed to the processing at the subsequent steps beginning with Step SP 86 and move on to the operation of searching for the mark next to the mark just found.
By repeating this sequence of steps, the CPU 1 will complete the review of the lengths between the rise points with respect to all the rise points, and when it eventually obtains an affirmative result at Step SP 83 or Step SP 88, the CPU 1 will complete the series of processes for the extraction of the rise points in the power information. The process of Steps SP 72 through SP 93 corresponds to the process of Step SP 42 shown in FIGURE 4.
The reason why this system has been arranged to review the rise points with reference to the distance between the respectively preceding and following rise points, after the rise points have been extracted with the rise extraction function d(i), is the necessity of preventing the occurrence of a plurality of rise points within a section shorter than the length of a single sound, a consequence of the fact that the power in acoustic signals may undergo fluctuations even when they are intended to be a single sound and the fact that acoustic signals may contain intrusive outside noise.
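The review described here can be pictured as a single pass that keeps only rise points separated by at least the threshold θL, removing the former of any pair that lies too close together. The names and the list representation in this Python sketch are assumptions:

```python
def review_rise_points(rise_points, theta_L):
    """Remove the former of two rise points whenever their distance is
    smaller than theta_L (Steps SP 82 through SP 93), so that a
    fluctuating or noisy single sound cannot yield two rise points."""
    kept = []
    for p in rise_points:
        if kept and p - kept[-1] < theta_L:
            kept[-1] = p        # drop the former mark, keep the latter
        else:
            kept.append(p)
    return kept

# 10/14 and 90/93 are too close: the former of each pair is removed.
print(review_rise_points([10, 14, 40, 90, 93], theta_L=8))  # → [14, 40, 93]
```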
When the CPU 1 thus completes the process for extracting the rise points in the power information by repeating this processing procedure, the CPU 1 first clears to zero the parameter i for the analytical point and then, ascertaining that the data to be processed are not yet finished, judges whether or not any mark indicating a rise point in the power information is placed on that analytical point (Steps SP 94 through SP 96). In case no such mark is placed, the CPU 1 increments the parameter i and then returns to Step SP 95 mentioned above (Step SP 97). When the CPU 1 finds a rise point in this manner, the CPU 1 judges whether or not any mark indicating a semiquarter note is placed on the analytical point i thereof (Step SP 98).
In case it is found that a mark indicating a semiquarter note is placed on the point, the CPU 1 increments the parameter i and then returns to Step SP 95 mentioned above, thereupon proceeding to the process of searching for the next rise point, because it is not necessary to perform any processing for the matching of that rise point and the beginning point of the semiquarter note (Step SP 99). On the other hand, in case the rise point so found does not have any mark indicating the beginning of a semiquarter note, the CPU 1 proceeds to the process of searching for the semiquarter-note mark positioned closest to this rise point.
First, the CPU 1 puts a mark indicating the beginning of a semiquarter note at the rise point, and then it sets the parameter j at its initial value "1" for finding the analytical point preceding the rise point and bearing a mark indicating the beginning of a semiquarter note (Steps SP 100 and SP 101).
Thereafter, ascertaining that the value i−j is not less than 0 (which means that the analytical point i−j is an analytical point loaded with data), the CPU 1 judges whether or not any mark indicating the beginning of a semiquarter note is placed on the analytical point i−j. In case no such mark is placed there, the CPU 1 increments the parameter j, thereafter returning to Step SP 102 (Steps SP 102 through SP 104). By repeating the process of Steps SP 102 through SP 104, the CPU 1 finds the analytical point i−j which is located closest on the side preceding the rise point and where a mark indicating a semiquarter note is placed, then obtaining an affirmative result at Step SP 103.
In such a case, the CPU 1 sets the parameter k, which is a parameter for finding the analytical point bearing a mark indicating a semiquarter note on the side following the rise point, at the initial value "1" (Step SP 105).
Thereafter, the CPU 1 ascertains that the analytical point i+k does not have a value larger than that of the final analytical point, which amounts to saying that the analytical point i+k is one where data are present, and then judges whether or not any mark indicating the beginning of a semiquarter note is placed on the analytical point i+k. If no such mark is placed there, the CPU 1 increments the parameter k, then returning to Step SP 106 (Steps SP 106 through SP 108). By repeating the process of Steps SP 106 through SP 108, the CPU 1 finds the analytical point i+k which is positioned closest to and following the rise point and which bears the mark indicating the beginning of a semiquarter note, then obtaining an affirmative result at Step SP 107.
Thus finding the analytical points positioned closest to the rise point and respectively preceding and following it, each bearing a mark indicating the beginning of a semiquarter note, the CPU 1 compares the two parameters j and k in terms of size and judges which of the two analytical points is closer to the rise point. In case the analytical point i−j positioned on the preceding side is closer to the rise point (including those cases where the two analytical points are equally close to the rise point), the CPU 1 removes the mark indicating the beginning of a semiquarter note from the analytical point i−j, where it has been placed, and thereafter increments the parameter i and proceeds to the process of searching for the next rise point. On the other hand, if the analytical point i+k positioned on the following side is closer to the rise point, the CPU 1 removes the mark indicating the beginning of a semiquarter note from that analytical point i+k, where it has been attached, and thereafter increments the parameter i and proceeds to the process of searching for the next rise point (Steps SP 109 through SP 113).
By repeating this process, the CPU 1 places a mark indicating the beginning of a semiquarter note on every rise point while it removes the mark indicating the beginning of a semiquarter note from the point closest to that rise point. When this process is completed with respect to all the analytical points, the CPU 1 finishes the process for matching the series of rise points and the points marking the beginning of the semiquarter notes at Step SP 95. The process of Steps SP 94 through SP 113 corresponds to Step SP 43 of FIGURE 4.
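A compact way to picture the matching of Steps SP 94 through SP 113: each unmatched rise point gains a semiquarter-note mark, and the nearest existing mark is removed, the preceding one winning a tie. This Python sketch uses an assumed set representation of the marks and is not the patent's implementation:

```python
def snap_marks_to_rises(semiquarter_marks, rise_points):
    """For each rise point lacking a semiquarter-note mark, mark the
    rise point itself and remove the nearest existing mark, preferring
    the preceding one on a tie (Steps SP 94 through SP 113)."""
    marks = set(semiquarter_marks)
    for r in rise_points:
        if r in marks:
            continue                       # already matched (SP 98)
        marks.add(r)                       # SP 100: mark the rise point
        before = max((m for m in marks if m < r), default=None)
        after = min((m for m in marks if m > r), default=None)
        # Remove whichever neighbour is closer; ties favour the preceding one.
        if before is not None and (after is None or r - before <= after - r):
            marks.discard(before)
        elif after is not None:
            marks.discard(after)
    return sorted(marks)

# A rise at 90 replaces the nearby grid mark at 100.
print(snap_marks_to_rises([0, 100, 200, 300], [90]))  # → [0, 90, 200, 300]
```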
Having thus completed the process of adjusting the rise points in the power information, the CPU 1 clears to zero the parameter i for the analytical point and then, ascertaining that the data to be processed with respect to the analytical points are not yet finished, judges whether or not a mark indicating the beginning of a semiquarter note is placed on that analytical point (Steps SP 114 through SP 116). In case no such mark is placed, the CPU 1 increments the parameter i and returns to the above-mentioned Step SP 115 (Step SP 117). When the first mark indicating the beginning of a semiquarter note has thus been located, the CPU 1 sets at i+1 the parameter j applied to finding the next mark indicating the beginning of a semiquarter note, and then, ascertaining that the data to be processed among the analytical data have not yet been finished, the CPU 1 judges whether or not a mark indicating the beginning of a semiquarter note is placed on the analytical point j (Steps SP 118 through SP 120). In case no such mark is placed, the CPU 1 then increments the parameter j and returns to Step SP 119 mentioned above (Step SP 121).
When the next mark indicating the beginning of a semiquarter note is found, the CPU 1 clears to zero the number-of-pieces parameter n for a segment with a pitch and thereafter sets at i the parameter k for the processing of a segment with the presence of pitch (Steps SP 122 and SP 123). Next, after ascertaining that the parameter k is smaller in value than the parameter j, the CPU 1 judges whether or not there is any pitch information present at the analytical point k, i.e. whether or not the analytical point k contains a voiced sound (Steps SP 124 and SP 125).
If an affirmative result has been obtained from this process, the CPU 1 increments the number-of-pieces parameter n and thereafter also increments the parameter k, then returning to Step SP 124 mentioned above. On the other hand, when a negative result has been obtained, the CPU 1 immediately increments the parameter k, thereafter returning to the above-mentioned Step SP 124 (Steps SP 125 and SP 126). The repetition of this process will soon result in obtaining an affirmative answer at Step SP 124. Here, the parameter k changes within the range from i to j−1, and, when an affirmative result is obtained at Step SP 124, the number-of-pieces parameter n indicates the number of analytical points with the presence of the pitch information between the analytical point i and the analytical point j−1, i.e. the number of analytical points where there is some pitch information between the preceding and the following marks each indicating the beginning of a semiquarter note.
The CPU 1 judges whether or not the value of the number-of-pieces parameter n is larger than the prescribed threshold value θn. If the value of the parameter is smaller than the threshold value θn, the CPU 1 puts a mark for the beginning of a rest at the analytical point i, which is the first analytical point in the count of the analytical points and where a mark indicating the beginning of a semiquarter note is placed, and thereafter sets the parameter i at j and returns to Step SP 118 mentioned above. On the other hand, if the value of the parameter is more than the threshold value θn, the CPU 1 immediately sets the parameter i at j, thereafter returning to the above-mentioned Step SP 118 and proceeding to the process of searching for the next analytical point where a mark indicating the beginning of a semiquarter note is placed (Steps SP 128 through SP 130). By repeating this process, marks indicating the beginning of a rest are placed one by one in orderly sequence at the first analytical point of each section that is bounded by the respectively preceding and following marks each indicating the beginning of a semiquarter note and that has a smaller number of analytical points with the presence of the pitch information; eventually an affirmative result is obtained at Step SP 115 or SP 119, and the series of processes for placing the marks indicating the beginning of a rest is brought to a finish. The process of Steps SP 114 through SP 130 corresponds to the process at Step SP 44 of FIGURE 4.
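The rest-marking pass of Steps SP 114 through SP 130 can be sketched as counting voiced analytical points per semiquarter section. The function names, the predicate representing the presence of pitch information, and the concrete values here are assumptions:

```python
def mark_rests(section_starts, pitched, theta_n):
    """Between consecutive semiquarter-note marks, count the analytical
    points carrying pitch information; if the count falls below theta_n,
    mark the section start as the beginning of a rest (SP 114-130).
    `pitched` maps an analytical point to True when a voiced pitch exists."""
    rests = []
    for i, j in zip(section_starts, section_starts[1:]):
        n = sum(1 for k in range(i, j) if pitched(k))  # SP 122-127
        if n < theta_n:
            rests.append(i)                            # SP 128
    return rests

# Points 0-9 are voiced, 10-19 are silent: the second section becomes a rest.
print(mark_rests([0, 10, 20], pitched=lambda k: k < 10, theta_n=3))  # → [10]
```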
Upon completion of the process of placing the marks indicating the beginning of a rest, the CPU 1 clears to zero the analytical point parameter i and, ascertaining that the analytical point data to be processed have not yet been finished, judges whether or not a mark indicating the beginning of a measure is placed on that analytical point (Steps SP 131 through SP 133). In case no mark indicating the beginning of a measure is placed, the CPU 1 further judges whether or not a mark indicating a rise point in the power information is placed there (Step SP 134). In case there is no mark indicating a rise point, the CPU 1 further judges whether or not a mark indicating the beginning of a rest is placed there (Step SP 135). In case the mark indicating the beginning of a rest is not placed, the CPU 1 increments the parameter i and returns to the above-mentioned Step SP 132, then ascertaining the presence of a mark on the next analytical point (Step SP 136).
Meanwhile, if any mark indicating the beginning of a measure, a rise point, or the beginning of a rest is placed on the analytical point i, the CPU 1 puts a mark on the analytical point to indicate the beginning of a segment, then increments the parameter i, thereafter returning to the above-mentioned Step SP 132 and ascertaining whether or not the prescribed mark is attached to the next analytical point (Steps SP 137 and SP 138).
In this manner, the CPU 1 places marks indicating the beginnings of segments one by one on those analytical points which bear a mark indicating the beginning of a measure, a rise point, or the beginning of a rest; the process soon comes to the final data, and an affirmative result is obtained at Step SP 132. Thereupon the series of processes for placing the marks indicating the beginning of a segment is finished. The process of Steps SP 131 through SP 138 corresponds to the process of Step SP 45 of FIGURE 4.
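Because Steps SP 131 through SP 138 simply place a segment mark wherever any of the three kinds of marks is present, the pass reduces to a union of the three mark sets. A minimal sketch, with the set-based representation assumed:

```python
def segment_marks(measure_starts, rise_points, rest_starts):
    """Place a segment-beginning mark at every analytical point that
    bears a measure mark, a rise-point mark, or a rest mark
    (Steps SP 131 through SP 138, i.e. Step SP 45 of FIGURE 4)."""
    return sorted(set(measure_starts) | set(rise_points) | set(rest_starts))

print(segment_marks([0, 160], [40, 95], [120]))  # → [0, 40, 95, 120, 160]
```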


Thus, the CPU 1 finishes the process of segmentation on the basis of the measure and power information, thereafter proceeding to the tuning process as described above.
FIGURE 6 presents the changes in the pitch information, PIT, the power information, POW, and the rise extraction function d(i) with respect to a one-measure section. Here, the "dual circle" mark represents the beginning of a measure, and the "white star" mark represents a rise point, while the "circle" mark indicates the beginning of a beat, the "X" mark indicates the beginning of a semiquarter note before the matching with a rise point is executed, and the "triangle" mark shows the beginning of a rest. Therefore, in the case of this example of a section corresponding to one measure, the marks indicating the beginning of a segment are placed as shown by the "black circle" marks as the result of the execution of the series of segmentation processes described above.
According to the embodiment described above, the system is designed to generate input auxiliary rhythm sounds in order to help users in their input of acoustic signals, thereby offering simplicity and ease of use with regard to the input of acoustic signals and enabling their input with accuracy in terms of rhythm, which results in greater facility in the segmenting of such signals and therefore in improvements in the precision of the produced musical score data.
The system is arranged in such a way that the information on the input auxiliary rhythm sounds generated at the time of the input is recorded on the same time axis as that of the acoustic signals, so that such information may be used for segmenting the signals. This feature enhances the accuracy of segmentation, which in turn leads to improvements in the precision of the musical score data produced.
Alternative Embodiments
The preferred embodiment, described above, employs the square sum of the acoustic signal as the power information, but another parameter may also be used. For example, the square root of the square sum may be used.
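The two power definitions mentioned here differ only in a final square root. A hypothetical per-frame computation in Python, where the frame length and function name are assumptions:

```python
import math

def frame_power(samples, frame_len, use_sqrt=False):
    """Power information per analytical frame: the square sum of the
    samples, or optionally its square root as in the alternative
    embodiment. The frame length is an assumed parameter."""
    powers = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        p = sum(s * s for s in samples[start:start + frame_len])
        powers.append(math.sqrt(p) if use_sqrt else p)
    return powers

print(frame_power([1, 2, 3, 4], 2))           # → [5, 25]
print(frame_power([3, 4], 2, use_sqrt=True))  # → [5.0]
```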
Moreover, the rise extraction function has been obtained in the manner expressed in equation (1), but another parameter may also be employed. It is acceptable to extract the rise in the power information by the application of a function representing only the numerator in equation (1). In the preferred embodiment the system takes away the mark of the rise point on the preceding side in case the distance between the preceding and following rise points is short, but it is acceptable to remove the mark of the rise point on the following side instead.
HowevQr, the rhythm information for assisting the user with the input procedure may be provided in the visual form.
For example, it is feasible to display on display unit 5 an image of a baton which moves with th~ appropriate rhythm.
Also, it is acceptable to use a combination of audio and visual means for indicating rhythm to the user. In this regard, the sounds of a metronome or rhythmic accompanying sounds could ~e provided as the input auxiliary sounds.
In the preferred embodiment, described above, the system makes use of the information on the beginnin~ of a measure, out of the input auxiliary rhythm information, for performing the segmentation process. HowPver, the information indicating the beginning of a beat, out of the input auxiliary rhythm information, may well be used for performing the ~esmentation process.
The preferred embodiment uses display unit 5 to output of tha musical ~core data, but a character printing device can be used in its place.
In the preferred embodiment CPU 1 executes all the 2s processes in accordance with the programs stored in memory w ~9 _ ' ., .
. .

,~

2~

in the main storage device 3. Yet, some or all of the processes can be executed by a hardware system or sub-system. For example, as illustrated in FIGURE 7, where the identical reference numbers are given for the parts s corresponding to thosQ shown in FIGURE 2, the acoustic signals input from the acoustic signal input device 8 can be amplified while they are passed through the amplifying circuit 11 and thereafter channeled through a pre-filter 12 and then fed into the A/D converter 13, where they are o converted into digital signals. The acoustic signals as thus converted into digîtal signals are then processed for autocorrelation analysis by the signal-processing processor 14, which thereby extracts the pitch information or may otherwise extract the power information by processing the signals to find their square sum, and the pitch information or the power information, as the case may be, can then be supplied to the CPU 1 for their processing with the softwara system. As a signal- processing processor 14 which can be utilized for such a hard~are construction (11 through 14), it is possible to use a processor which is capable of performing the real-time pro~essing of the signals and is also provided with the signals for establishing an interface with the host computer (for example, ~ PD 7720 made by Nippon ~lectric Corporation) .

The preferred embodiment performs the initial segmentation process on the basis of the input auxiliary rhythm information and the power information, but the system can be designed to perform the process on the basis of the input auxiliary rhythm information and the pitch information, or on the basis of the input auxiliary rhythm information together with both the power information and the pitch information.
The system according to this invention is arranged to provide a user with input auxiliary rhythm information while the user inputs acoustic signals, thereby enabling the user to input acoustic signals with greater ease and simplicity, so that the intended acoustic signals can be input with accuracy in terms of rhythm, with the result that greater facility is attained in the performance of the segmentation process for such acoustic signals and that the precision of the musical score data so prepared can be positively improved.
Moreover, the system is also designed to record the input auxiliary rhythm information provided to the users on the same time axis as the acoustic signals, so that the information so recorded may be made available for the segmentation process. This feature makes it possible to perform accurate segmentation, thereby enhancing the precision of the musical score data generated by the system.
While this invention has been described in connection with what is presently considered to be the most practical and preferred embodiment, it is to be understood that the invention is not limited to the disclosed embodiment, but, on the contrary, is intended to cover various modifications and equivalent arrangements included within the spirit and scope of the appended claims.

Claims (12)

1. A method for automatically transcribing music, comprising the steps of:
capturing acoustic signals;
storing the acoustic signals in a memory;
simultaneously with the step of capturing, reporting information on input auxiliary rhythms including at least information on tempo;
extracting, from said acoustic signals stored in memory, pitch information, which represents the repetitive cycles of their waveforms and their sound pitch, and power information;
segmenting the acoustic signals on the basis of the pitch information and/or the power information, the process of segmenting including dividing the acoustic signals into sections each of which can be regarded to form a single level in musical interval;
identifying each of the segments with a level on the axis of absolute musical interval on the basis of the pitch information; and displaying/reporting the results of the foregoing steps.
2. An automatic music transcription method according to claim 1, wherein the reporting step comprises the step of reporting with an audio signal.
3. An automatic music transcription method according to claim 1, wherein the reporting step comprises the step of reporting with a video signal.
4. An automatic music transcription method according to claim 1, wherein the reporting step comprises the step of reporting with both audio and video signals.
5. An automatic music transcription method according to claim 1 further comprising the step of storing the auxiliary rhythms in the memory on the same time axis as that of the acoustic signals.
6. An automatic music transcription method according to claim 1 wherein the segmenting step comprises the steps of:
first segmenting, on the basis of the input auxiliary rhythm information stored in the memory, the acoustic signals into sections each of which can be regarded as forming one and the same level of musical interval, second segmenting, on the basis of the pitch information and the power information, the acoustic signals into sections each of which can be regarded as forming one and the same level of musical interval, and third making adjustments to those sections as divided into segments by the first and the second steps.
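The three-stage segmentation of claim 6 can be sketched as follows: a first pass proposes boundaries from the tapped beat grid, a second pass proposes boundaries where the pitch level or sounding state changes, and a third pass reconciles the two. The snapping rule, tolerances, and function names below are assumptions for illustration, not taken from the patent.

```python
def segment_by_rhythm(n_frames, frames_per_beat):
    """First stage: candidate boundaries on the beat grid derived from
    the input auxiliary rhythm (tempo) information."""
    return set(range(0, n_frames, frames_per_beat))

def segment_by_pitch_power(pitches, powers, pitch_tol=0.5, power_floor=0.01):
    """Second stage: boundaries wherever the semitone level changes or
    the signal crosses between sounding and silent."""
    bounds = {0}
    for i in range(1, len(pitches)):
        silent_now = powers[i] < power_floor
        silent_prev = powers[i - 1] < power_floor
        if silent_now != silent_prev or abs(pitches[i] - pitches[i - 1]) > pitch_tol:
            bounds.add(i)
    return bounds

def reconcile(rhythm_bounds, signal_bounds, snap=2):
    """Third stage: adjust the signal-derived boundaries, snapping any
    that fall within `snap` frames of a beat-grid boundary onto it."""
    out = set()
    for b in signal_bounds:
        near = [r for r in rhythm_bounds if abs(r - b) <= snap]
        out.add(min(near, key=lambda r: abs(r - b)) if near else b)
    return sorted(out)
```

For example, a note change detected two frames after a beat boundary is pulled back onto the beat, on the assumption that the performer intended the change on the beat.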
7. An automatic music transcription system, comprising:
means for receiving acoustic signals to be transcribed;
means for providing auxiliary rhythm information including tempo information, at the time when the acoustic signals are being received;
a memory;
means for processing and storing into the memory the acoustic signals and rhythm information;
pitch and power extraction means for extracting from the acoustic signals stored in memory pitch information, which represents a repetitive cycle of the waveforms of the acoustic signals and a level of musical interval of such signals, and the power information;
segmentation means for dividing the acoustic signals into sections each of which can be regarded as forming one level of musical interval as determined on the basis of the pitch information and/or the power information; and musical interval identification means for identifying the musical interval of the above-mentioned acoustic signals with reference to an absolute axis of musical interval.
8. An automatic music transcription system according to claim 7, further comprising means for reporting the input auxiliary rhythm information in an audio form.
9. An automatic music transcription system according to claim 7, further comprising means for reporting the input auxiliary rhythm information in a visual form.
10. An automatic music transcription system according to claim 7, further comprising means for reporting the input auxiliary rhythm information in both audio and visual form.
11. An automatic music transcription system according to claim 7, wherein said means for processing and storing comprises means for storing the auxiliary rhythm information and the acoustic signals in memory on the same time axis at the time when the above-mentioned acoustic signals are captured and stored in the memory.
12. An automatic music transcription system according to claim 11, wherein the segmentation means comprises:
a first segmenting section for segmenting, on the basis of the input auxiliary rhythm information stored in the memory, the acoustic signals into sections each of which can be regarded as forming one and the same level of musical interval;
a second segmenting section for segmenting, on the basis of the pitch information and the power information, the acoustic signals into sections each of which can be regarded as forming one and the same level of musical interval; and a third segmenting section for making adjustments to those sections as divided into segments by the first and second segmenting sections.
CA002001923A 1988-10-31 1989-10-31 Automatic music transcription Abandoned CA2001923A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP63275740A JP3047068B2 (en) 1988-10-31 1988-10-31 Automatic music transcription method and device
JP275740/88 1988-10-31

Publications (1)

Publication Number Publication Date
CA2001923A1 true CA2001923A1 (en) 1990-04-30

Family

ID=17559732

Family Applications (1)

Application Number Title Priority Date Filing Date
CA002001923A Abandoned CA2001923A1 (en) 1988-10-31 1989-10-31 Automatic music transcription

Country Status (6)

Country Link
EP (1) EP0367191B1 (en)
JP (1) JP3047068B2 (en)
KR (1) KR920007206B1 (en)
AU (1) AU631573B2 (en)
CA (1) CA2001923A1 (en)
DE (1) DE68911858T2 (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE69423838T2 (en) * 1993-09-23 2000-08-03 Xerox Corp Semantic match event filtering for speech recognition and signal translation applications
US7386357B2 (en) 2002-09-30 2008-06-10 Hewlett-Packard Development Company, L.P. System and method for generating an audio thumbnail of an audio track
GB0229940D0 (en) * 2002-12-20 2003-01-29 Koninkl Philips Electronics Nv Audio signal analysing method and apparatus
US8208643B2 (en) 2007-06-29 2012-06-26 Tong Zhang Generating music thumbnails and identifying related song structure
CN109979483B (en) * 2019-03-29 2020-11-03 广州市百果园信息技术有限公司 Melody detection method and device for audio signal and electronic equipment

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
FR2279290A1 (en) * 1974-07-15 1976-02-13 Anvar Television display of musical notation - by binary signals on magnetic tape and shift register memory
JPS5924895A (en) * 1982-08-03 1984-02-08 ヤマハ株式会社 Processing of musical tone information for score display unit
US4510840A (en) * 1982-12-30 1985-04-16 Victor Company Of Japan, Limited Musical note display device
JPS6090376A (en) * 1983-10-24 1985-05-21 セイコーインスツルメンツ株式会社 Voice recognition type musical scale learning apparatus
JPS6090396A (en) * 1983-10-24 1985-05-21 セイコーインスツルメンツ株式会社 Voice recognition type scale scoring apparatus
US4771671A (en) * 1987-01-08 1988-09-20 Breakaway Technologies, Inc. Entertainment and creative expression device for easily playing along to background music
AU614582B2 (en) * 1988-02-29 1991-09-05 Nec Corporation Method for automatically transcribing music and apparatus therefore

Also Published As

Publication number Publication date
JP3047068B2 (en) 2000-05-29
KR900006908A (en) 1990-05-09
DE68911858T2 (en) 1994-05-26
AU4389489A (en) 1990-05-03
DE68911858D1 (en) 1994-02-10
KR920007206B1 (en) 1992-08-27
EP0367191A2 (en) 1990-05-09
EP0367191B1 (en) 1993-12-29
JPH02120893A (en) 1990-05-08
AU631573B2 (en) 1992-12-03
EP0367191A3 (en) 1990-07-25

Similar Documents

Publication Publication Date Title
CN101373593B (en) Speech analysis apparatus, speech analysis method and computer program
CN107767847B (en) Intelligent piano playing evaluation method and system
CN101123086B (en) Tempo detection apparatus
CN109920449B (en) Beat analysis method, audio processing method, device, equipment and medium
JP3789326B2 (en) Tempo extraction device, tempo extraction method, tempo extraction program, and recording medium
Goto et al. Issues in evaluating beat tracking systems
US9378719B2 (en) Technique for analyzing rhythm structure of music audio data
CN108038146B (en) Music playing artificial intelligence analysis method, system and equipment
CN111724812A (en) Audio processing method, storage medium and music practice terminal
JP2008040259A (en) Musical piece practice assisting device, dynamic time warping module, and program
CA2001923A1 (en) Automatic music transcription
JP4399958B2 (en) Performance support apparatus and performance support method
JPH09237088A (en) Playing analyzer, playing analysis method and memory medium
EP0331107B1 (en) Method for transcribing music and apparatus therefore
Sentürk et al. An approach for linking score and audio recordings in makam music of Turkey
JP2008040258A (en) Musical piece practice assisting device, dynamic time warping module, and program
EP0264955B1 (en) Apparatus for determining the pitch of a substantially periodic input signal
JP4159961B2 (en) Karaoke equipment
JPH0744163A (en) Automatic transcription device
JP2604414B2 (en) Automatic music transcription method and device
Yoshii et al. An error correction framework based on drum pattern periodicity for improving drum sound detection
JP2604405B2 (en) Automatic music transcription method and device
JP2604407B2 (en) Automatic music transcription method and device
JP2604413B2 (en) Automatic music transcription method and device
CN115691561A (en) Singing scoring method, device, equipment and storage medium

Legal Events

Date Code Title Description
FZDE Dead