AU2003204917B2 - Method and Apparatus for Synchronising a Keyframe with Sound
Publication number
AU2003204917B2
Authority
AU
Australia
Prior art keywords
time
tempo
beat
varying signal
animation
Prior art date
Legal status
Ceased
Application number
AU2003204917A
Other versions
AU2003204917A1 (en)
Inventor
Cameron Bolitho Browne
Gerard Anthony Hill
Current Assignee
Canon Inc
Original Assignee
Canon Inc
Priority date
Filing date
Publication date
Priority claimed from AUPS3153A (AUPS315302A0)
Application filed by Canon Inc
Priority to AU2003204917A
Publication of AU2003204917A1
Application granted
Publication of AU2003204917B2
Anticipated expiration
Ceased (current legal status)

Description

S&F Ref: 639006

AUSTRALIA

PATENTS ACT 1990

COMPLETE SPECIFICATION FOR A STANDARD PATENT

Name and Address of Applicant: Canon Kabushiki Kaisha, 30-2, Shimomaruko 3-chome, Ohta-ku, Tokyo 146, Japan

Actual Inventor(s): Gerard Anthony Hill, Cameron Bolitho Browne

Address for Service: Spruson & Ferguson, St Martins Tower, Level 31 Market Street, Sydney NSW 2000 (CCN 3710000177)

Invention Title: Method and Apparatus for Synchronising a Keyframe with Sound

ASSOCIATED PROVISIONAL APPLICATION DETAILS: [33] Country: AU; [31] Applic. No(s): PS3153; [32] Application Date: 24 Jun 2002

The following statement is a full description of this invention, including the best method of performing it known to me/us:

METHOD AND APPARATUS FOR SYNCHRONISING A KEYFRAME WITH SOUND
Field of the Invention

The present invention relates generally to computer animation and, in particular, to a method and apparatus for generating a keyframe animation sequence according to an audio signal, and to a computer program product including a computer readable medium having recorded thereon a computer program for generating a keyframe animation sequence according to an audio signal.
Background

Animation is based on a series of related still images viewed successively over a short period, the images being perceived by a viewer as continuous motion. Each individual still image is generally referred to as a frame.
Conventionally, the main drawback in generating an animation sequence has been the work required of an animator to produce the large number of frames needed.
A short period of animation (e.g. one minute) can require between 600 and 1800 separate image frames, depending on the quality of the animation. Thus, generating images manually is very labour intensive. The amount of labour required was the catalyst behind the development of a technique known as "keyframing".
The majority of frames of an animation sequence involve routine incremental changes from a previous frame directed toward a defined endpoint. Conventional animators, such as Walt Disney Studios™, realised that they could increase the productivity of their master artists by having them draw only the important frames, called "keyframes". Lower skilled assistants could then determine and produce the frames in between the keyframes. The in-between frames are generally referred to as "tweens".
Once all of the keyframes and tweens had been drafted, the images were inked or rendered to produce the final images. Even today, production of a traditional animation sequence requires many artists in order to produce the thousands of images required.
More recently it has become popular to create complex 3D models, photorealistic still images, and film quality animation through the utilisation of computer systems having high-resolution graphics displays. The graphic image animation production industry is undergoing rapid development, and complex, sophisticated image production systems are often utilised for modelling, animating and rendering scenes.
Typically, such image production systems generate computer based animations by creating a timeline including keyframes, each of which specifies the appearance of a frame of the animation. Tweens or intermediate frames on the timeline are derived by interpolating between two adjacent keyframes such that motion appears to be continuous and smooth. Generally, this is achieved by advancing a time factor, for which a keyframe is required, at a fixed rate defined by the rate at which new frames are required (i.e. the playback rate). The time factor can be advanced forwards or backwards in such conventional systems depending on the desired direction of the animation. The result is a plurality of frames that are snapshots of an animation at particular points in time.
Computer based animations typically consist of several layers of graphical objects containing animated elements that are independent of each other. These layers of animated objects are composited together on a frame by frame basis to create the final animation. During the compositing process, colour components (typically Red, Green and Blue (RGB) colour components) are added together for each pixel of each of the layers, resulting in the final colour components for each pixel. The colour components are generally pre-multiplied by an opacity value so that a simple summing algorithm may be used to produce the final image. Alternatively, other mathematical operations may be used to implement different combinations of the layers. However, it is often desirable to synchronise a keyframe based animation sequence with an externally generated asynchronous audio sequence such that particular events in the animation match with events in the audio stream. One major disadvantage of conventional image production systems is that they are limited in allowing the alteration of an animation according to an external input in such a manner.
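By way of illustration, the summing of pre-multiplied colour components described above can be sketched as follows. This is a minimal sketch only, not taken from the specification; the function name and the use of NumPy arrays are assumptions for the example.

```python
import numpy as np

def composite_layers(layers):
    """Composite animation layers by summing pre-multiplied RGB components.

    layers: list of float arrays of shape (height, width, 3), each holding
    RGB values already multiplied by the layer's opacity (alpha).
    Returns the final frame, clipped to the valid colour range.
    """
    frame = np.zeros_like(layers[0])
    for layer in layers:
        # Pre-multiplied colour allows a simple per-pixel sum.
        frame += layer
    return np.clip(frame, 0.0, 1.0)
```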
Summary

Disclosed are arrangements which seek to substantially overcome, or at least ameliorate, one or more disadvantages of existing arrangements by allowing the synchronisation of an animation with an external asynchronous input. Such synchronisation is achieved by alternating interpolation of tween frames and keyframes of an animation so that the frames match the dynamics, or any other chosen property, obtained from audio analysis of an input audio signal (e.g. music). In addition, other properties can be used to alter the appearance of frames of an animation.
The alternating of frame interpolation described above can be based on the prediction of an input event and the rendering of appropriate interim frames for display ensuring that the event occurs on a desired keyframe. Such prediction can reduce processing delay and allow an animation to proceed in real time.
According to a first aspect of the present disclosure, there is provided a method of generating an animation sequence according to a time varying signal, said animation sequence having a plurality of frames, one or more of said frames being a keyframe, said method comprising the steps of: determining a pre-selected property of said time varying signal; predicting a position in time in said animation sequence for generation of at least one of said keyframes depending on said property; and generating said animation sequence in real time according to the predicted position in time for generation of said at least one keyframe.
According to a second aspect of the present disclosure, there is provided an apparatus for generating an animation sequence according to a time varying signal, said animation sequence having a plurality of frames, one or more of said frames being a keyframe, said apparatus comprising:
property determining means for determining a pre-selected property of said time varying signal; keyframe prediction means for predicting a position in time in said animation sequence for generation of at least one of said keyframes depending on said property; and animation generation means for generating said animation sequence in real time according to the predicted position in time for generation of said at least one keyframe.
According to a third aspect of the present disclosure, there is provided a program for generating an animation sequence according to a time varying signal, said animation sequence having a plurality of frames, one or more of said frames being a keyframe, said program comprising: code for determining a pre-selected property of said time varying signal; code for predicting a position in time in said animation sequence for generation of at least one of said keyframes depending on said property; and code for generating said animation sequence in real time according to the predicted position in time for generation of said at least one keyframe.
Other aspects of the invention are also disclosed.
Brief Description of the Drawings

One or more embodiments of the present invention will now be described with reference to the drawings, in which:

Fig. 1 is a flow diagram showing a method of generating an animation sequence according to an audio signal;

Fig. 2 shows a schematic block diagram of a general purpose computer upon which the arrangements described can be practiced;

Fig. 3 is a block diagram showing preferred software components for implementing the arrangements described;

Fig. 4 shows a keyframe timeline representing an animation comprising four keyframes;

Fig. 5 shows a timeline representing a sequence of events received from an audio signal input source;

Fig. 6 is a graph showing the change in interpolation factor during rendering of the keyframes for the animation represented by the timeline of Fig. 4;

Fig. 7 is a graph showing the change in playback rate for the animation represented by the timeline of Fig. 4;

Fig. 8 is a flow diagram showing a method of synchronising an animation with the beat of an audio signal;

Fig. 9 is a flow diagram showing a method of determining a keyframe time as performed during the method of Fig. 8;

Fig. 10 is a flow diagram showing a method of predicting a next beat for an input audio signal stream;

Fig. 11 is a flow diagram showing a method for resetting a timer when a timer interrupt is received;

Fig. 12 is a flow diagram showing a method of detecting the tempo and beat of an audio signal;

Fig. 13(a) shows a portion of a real power signal waveform;

Fig. 13(b) shows the first derivative of the power signal waveform of Fig. 13(a);

Fig. 14 shows a music event list for the waveforms of Figs. 13(a) and 13(b);

Fig. 15 is a flow diagram showing a method of analysing the music event list of Fig. 14;

Fig. 16 is a flow diagram showing a method of processing tempo and beat lists; and

Fig. 17 is a graph showing the scoring behaviour of a particular beat.
Detailed Description including Best Mode

Where reference is made in any one or more of the accompanying drawings to steps and/or features which have the same reference numerals, those steps and/or features have for the purposes of this description the same function(s) or operation(s), unless the contrary intention appears.
It is to be noted that the discussions contained in the "Background" section relating to prior art arrangements relate to discussions of documents, devices or systems which form public knowledge through their respective publication and/or use. Such should not be interpreted as a representation by the present inventor(s) or patent applicant that such documents or devices in any way form part of the common general knowledge in the art.
A method 100, as seen in Fig. 1, of generating an animation sequence according to an audio signal is described below with reference to Figs. 1 to 17. The method 100 is preferably practiced using a general-purpose computer system 200, such as that shown in Fig. 2 wherein the processes of Fig. 1 and Figs. 3 to 17 (to be described) may be implemented as software, such as an application program executing within the computer system 200. In particular, the steps of method 100 are effected by instructions in the software that are carried out by the computer. The instructions may be formed as one or more code modules, each for performing one or more particular tasks. The software may also be divided into two separate parts, in which a first part performs the method and a second part manages a user interface between the first part and the user. The software may be stored in a computer readable medium, including the storage devices described below, for example. The software is loaded into the computer from the computer readable medium, and then executed by the computer. A computer readable medium having such software or computer program recorded on it is a computer program product.
The use of the computer program product in the computer preferably effects an advantageous apparatus for implementing the methods described herein.
The computer system 200 is formed by a computer module 201, input devices such as a keyboard 202 and mouse 203, and output devices including a printer 215, a display device 214 and loudspeakers 217. A Modulator-Demodulator (Modem) transceiver device 216 is used by the computer module 201 for communicating to and from a communications network 220, for example connectable via a telephone line 221 or other functional medium. The modem 216 can be used to obtain access to the Internet, and other network systems, such as a Local Area Network (LAN) or a Wide Area Network (WAN), and may be incorporated into the computer module 201 in some implementations.
The computer module 201 typically includes at least one processor unit 205, and a memory unit 206, for example formed from semiconductor random access memory (RAM) and read only memory (ROM). The module 201 also includes a number of input/output (I/O) interfaces including an audio-video interface 207 that couples to the video display 214 and loudspeakers 217, an I/O interface 213 for the keyboard 202 and mouse 203 and optionally a joystick (not illustrated), and an interface 208 for the modem 216 and printer 215. In some implementations, the audio-video interface 207 can include an analogue to digital converter (not shown). In this instance, the interface 207 can accept either an analogue or digital audio signal on an audio input 230 for processing by the processor 205 and/or reproduction on the loudspeakers 217. However, a person skilled in the relevant art would appreciate that such a converter can be implemented as software being controlled in its execution by the processor 205.
In some implementations, the modem 216 may be incorporated within the computer module 201, for example within the interface 208. A storage device 209 is provided and typically includes a hard disk drive 210 and a floppy disk drive 211. A magnetic tape drive (not illustrated) may also be used. A CD-ROM drive 212 is typically provided as a non-volatile source of data. The components 205 to 213 of the computer module 201 typically communicate via an interconnected bus 204 and in a manner which results in a conventional mode of operation of the computer system 200 known to those in the relevant art. Examples of computers on which the described arrangements can be practised include IBM-PCs and compatibles, Sun Sparcstations or similar computer systems evolved therefrom.
Typically, the application program is resident on the hard disk drive 210 and read and controlled in its execution by the processor 205. Intermediate storage of the program and any data fetched from the network 220 may be accomplished using the semiconductor memory 206, possibly in concert with the hard disk drive 210. In some instances, the application program may be supplied to the user encoded on a CD-ROM or floppy disk and read via the corresponding drive 212 or 211, or alternatively may be read by the user from the network 220 via the modem device 216. Still further, the software can also be loaded into the computer system 200 from other computer readable media. The term "computer readable medium" as used herein refers to any storage or transmission medium that participates in providing instructions and/or data to the computer system 200 for execution and/or processing. Examples of storage media include floppy disks, magnetic tape, CD-ROM, a hard disk drive, a ROM or integrated circuit, a magneto-optical disk, or a computer readable card such as a PCMCIA card and the like, whether or not such devices are internal or external of the computer module 201. Examples of transmission media include radio or infra-red transmission channels as well as a network connection to another computer or networked device, and the Internet or Intranets including e-mail transmissions and information recorded on Websites and the like.
The method 100 is preferably implemented as software resident on the hard disk drive 210 and being controlled in its execution by the processor 205. The method 100 begins at the first step 101, where the processor 205 determines a pre-selected property of an audio signal. The audio signal is input to the computer module 201 via the input 230.
Alternatively, a person skilled in the relevant art would appreciate that such an audio signal could be accessed from the computer network 220, via the modem 216 and interface 208, or from the CD-ROM drive 212, for example. In one arrangement, the property determined at step 101 is a time to a next event, such as a beat in the audio signal where the audio signal represents music. In this instance, the method 100 can be utilised in order to ensure that a keyframe of the animation is rendered at a time corresponding to the beat. That is, the keyframe is synchronised with the beat of the audio signal. In this connection, the processor 205 can be configured to determine a tempo for the audio signal, and a method 1200 for detecting the tempo of the audio signal will be explained in detail below with particular reference to Fig. 12. However, any method known to those in the relevant art can be utilised for determining the tempo of the audio signal.
In an alternative arrangement, the property of the audio signal determined at step 101 can be a parameter associated with one or more frames of the animation such as predominant pitch or current loudness. In such an arrangement, an animation can be rendered such that when the particular parameter reaches a threshold the animation can be configured to jump to a particular frame.
The method 100 continues at the next step 103 where the processor 205 determines a playback rate for at least one frame of the animation sequence depending on the property determined at step 101. The term playback rate refers to the rate at which one or more frames of an animation are played back. At the next step 105, the processor 205 generates one or more frames of the animation sequence according to the playback rate.
As will be explained in detail below, the rendering of the frames can be synchronised with the events (e.g. accented beats) of the audio signal by varying the playback rate of the frames. In one arrangement, a keyframe can be held for some duration rather than shown for a single frame time. The method 100 concludes at the next step 107 if all frames of the animation sequence have been rendered. Otherwise, the method 100 returns to step 101.
The method 100 of generating an animation sequence according to an audio signal will now be described in further detail by way of example with particular reference to Figs. 3 to 7.
As described above, the instructions for performing the steps of the method 100 may be formed as one or more code modules, each for performing one or more particular tasks. Turning to Fig. 3, one arrangement 300 of such code modules comprises an animation design code module 301, a frame rendering code module 302, a playback controller code module 304 and an audio analyser code module 303. The code modules 301 to 304 are preferably resident on the hard disk drive 210 and are read and controlled in their execution by the processor 205. The animation design code module 301 consists of the parameters for each keyframe of an animation. These parameters are passed to the frame renderer code module 302, which is responsible for reading the keyframe information and creating a frame of video for output by the computer module 201. The frame renderer code module 302 also accepts instructions from the playback controller code module 304, including a render time of a next frame to be rendered. The playback controller module 304 preferably comprises the software for implementing the method 100. The playback controller module 304 accepts input from the audio analyser code module 303, which in turn accepts an audio input signal 305 and processes the signal to extract parameters for controlling the playback of an animation. A person skilled in the relevant art would appreciate that the input 305 is different from the audio input 230, since the audio analyser module is a software module resident on the hard disk drive 210.
However, the audio signal present on the audio input 230 is substantially reflected on the input 305, possibly after analogue to digital conversion of the signal by a converter integrally formed within the interface 207.
Turning now to Fig. 4, there is shown an example of a keyframe timeline 410 showing the render times for four keyframes 400, 401, 402 and 403 marked on the timeline 410 and representing an animation (not shown). The keyframes 400, 401, 402 and 403 are rendered at 0, 2, 3.5 and 4 seconds, respectively, as shown on the timeline 410. The parameters for each of these keyframes 400, 401, 402 and 403 are preferably contained in the animation design code module 301.
Fig. 5 shows a timeline 510 indicating the positions of beats 500, 501, 502, 503, 504 and 505, extracted from an input audio signal stream by the audio analyser module 303.
As seen in Fig. 5, the beats 500, 501, 502, 503 and 504 occur at regular one second intervals commencing at t=0. Data representing these beats 500, 501, 502, 503 and 504 can be fed into the playback controller module 304 as a signal source with which the keyframes 400, 401, 402 and 403 of the animation represented by the timeline 410 can be synchronised.
In order to synchronise an animation with the beats of a particular audio signal, events extracted from the signal are used to influence a frame of the animation to be rendered by the frame rendering module 302, the parameters of which can be contained within the animation design module 301. In the case of the animation represented by the example timeline 410 of Fig. 4, the rendering of the keyframes 400, 401, 402 and 403 can be synchronised with the beats 500, 501, 502, 503, 504 and 505, by varying a "playback rate" of the keyframes 400, 401, 402 and 403. As described above, the term playback rate refers to the rate at which an original animation is played back relative to a specified output frame rate, in frames per second, in order to synchronise keyframes of the animation with the beats of an audio signal stream. The playback rate will be explained in more detail below by way of example.
In order to synchronise the keyframes 400, 401, 402 and 403 with the beats 500, 501, 502, 503, 504 and 505, the frame renderer module 302 can consume multiple beats between a keyframe pair (e.g. the keyframes 400 and 401). Conversely, multiple keyframes can be inserted between each of the beats 500, 501, 502, 503, 504 and 505.
Fig. 6 is a graph 600 showing the change in an interpolation factor (indicated by the vertical axis 611) during rendering of keyframes 400, 401, 402 and 403 at render times corresponding to the beats 500, 502, 504 and 505 indicated by the horizontal axis 610.
The interpolation factor is derived by the frame renderer module 302 for each frame (e.g. 402) that is being generated and indicates the proportion of contribution from each of the keyframes 400, 401, 402 and 403 prior to and following a particular point in time. In this connection, an intermediate frame (i.e. a tween) can be created by linearly combining parameters from each of two adjacent keyframes (e.g. 400 and 401).
Fig. 7 is a graph 700 showing the change in playback rate (indicated by the vertical axis 761) for the animation represented by the timeline 410 at render times corresponding to the beats 500, 502, 504 and 505 indicated by the horizontal axis 760.
Referring to Figs. 6 and 7, at time t=0, the processor 205, for example, renders keyframe 400. Between t=0 and t=2, the interpolation factor increases linearly (603), indicating that an increasing proportion of the parameters of keyframe 401 affect the calculation of intermediate frames, on the display 214, for example, as seen in Fig. 6. At time t=2, the processor 205 renders keyframe 401 since the beat 502 corresponds with keyframe 401. As such, the render time for the keyframes 400 and 401 progresses at a rate (750) corresponding to the timeline 410. However, during the rendering of the keyframe 400 of the animation represented by the timeline 410, there are more beats than keyframes (i.e. beats 501 and 502). Thus, the beat 501 is ignored by the processor 205 so that the animation plays back at the same speed as the timeline 410. In connection with the above, during the rendering of the interval 603 the playback rate is 1.0, as shown at 750 in Fig. 7.

Between t=2 and t=4, the interpolation factor increases linearly (601) as described above for the period t=0 to t=2. After time t=2, beats 503 and 504 occur at times t=3 and t=4, respectively, and the keyframe 402 original render time is t=3.5, as shown in Fig. 4.
According to the timeline 410, the keyframe 402, by occurring at time t=3.5, is to be rendered between the beats 503 and 504 on timeline 510. However, in order to synchronise the rendering of the keyframe 402 with the beats 500, 501, 502, 503, 504 and 505 of the timeline 510, the keyframe 402 can be matched to one of the beats 503 or 504.
In this example, the keyframe 402 is matched to the beat 504, resulting in the processor 205 rendering the keyframe 402 at time t=4. Note that the keyframe 402 original render time is t=3.5, but the actual render time corresponds to beat 504 at t=4. Thus, the animation represented by the timeline 410 has been slowed down to match the timing of the beat 504. Accordingly, this slowdown in the rendering of the animation is shown at 752 of Fig. 7, as a reduction in the playback rate of the animation to 0.75. The 1.5 seconds of animation from time t=2 to t=3.5 is played back over two seconds from t=2 to t=4.
Alternatively, the keyframe 402 can be matched to the beat 503 at t=3 which would result in the animation speeding up in comparison with the original render times of the animation as seen in the timeline 410.
Continuing the example of Figs. 4 to 7, the render time of the keyframe 403 is t=4 as seen on the timeline 410. However, it is the relative timing of the position of the keyframe 403 which is of particular importance. Since the keyframe 403 occurs 0.5 seconds after keyframe 402 (i.e. the presently rendered keyframe) and beat 505 occurs one second after beat 504, in order to synchronise the keyframe 403 with the beats 500 to 505, beat 505 is matched with keyframe 403. As seen in Fig. 6, keyframe 403 is rendered at t=5. Again, in the period t=4 to t=5, there are more keyframes than beats, the result being that the animation plays back at a slower rate than in the original timeline 410 of Fig. 4. Accordingly, as shown in Fig. 7, the playback rate for the animation is reduced to 0.5 during the period t=4 to t=5 (754). Further, the interpolation factor increases linearly (602) at a higher rate, as seen in Fig. 6.
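As the example shows, the playback rate reduces to the ratio of the animation interval between keyframes to the real-time interval between the corresponding beats. A minimal sketch of that arithmetic, with hypothetical variable names not taken from the specification, follows.

```python
def playback_rate(prev_keyframe_time, next_keyframe_time,
                  prev_beat_time, matched_beat_time):
    """Rate at which the animation timeline must advance so that the
    next keyframe lands exactly on the matched beat."""
    animation_interval = next_keyframe_time - prev_keyframe_time
    real_interval = matched_beat_time - prev_beat_time
    return animation_interval / real_interval

# Keyframe 402 (t=3.5 on timeline 410) matched to beat 504 (t=4):
# 1.5 s of animation over 2 s of real time.
print(playback_rate(2.0, 3.5, 2.0, 4.0))  # 0.75
# Keyframe 403 (t=4.0) matched to beat 505 (t=5): 0.5 s over 1 s.
print(playback_rate(3.5, 4.0, 4.0, 5.0))  # 0.5
```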
Fig. 8 is a flow diagram showing a method 800 of synchronising an animation with the beat of an audio signal. The method 800 is preferably implemented as software resident on the hard disk drive 210 and being controlled in its execution by the processor 205. The method 800 begins at step 802, where a playback rate parameter, stored in memory 206, is initialised for the animation to be rendered. An output frame rate in frames per second is also specified. At the next step 804, a keyframe time parameter is determined by the processor 205 and stored in memory 206. The keyframe time represents the time on the keyframe timeline (e.g. the timeline 410) for which a next frame of the animation is to be rendered. A method 900 of determining the keyframe time as performed at step 804 will be explained in more detail with reference to Fig. 9.
The method 800 continues at step 806, where the keyframe time represented by the keyframe time parameter is passed to the frame renderer module 302 and a video frame corresponding to the keyframe time is output for rendering on the display 214. At the next step 808, if there are more frames to be rendered then the method 800 proceeds to step 810. Otherwise, the method 800 concludes. At step 810, there is a wait of one frame time while the rendered frame is displayed on the display 214. The frame time is the reciprocal of the output frame rate. After step 810, the method 800 returns to step 804 and proceeds with the rendering of the next frame of the animation by repeating steps 804 to 810 until the generation of the animation has been completed.
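A minimal sketch of the loop formed by steps 802 to 810 might look like the following; the renderer and display objects, and the determine_keyframe_time callable standing in for method 900, are hypothetical and not part of the specification.

```python
import time

def run_animation(renderer, display, determine_keyframe_time,
                  output_frame_rate=25.0):
    """Render loop corresponding to steps 802-810 of method 800.

    determine_keyframe_time: callable implementing method 900; it updates
    state["keyframe_time"] (and possibly state["playback_rate"]) in place.
    """
    frame_time = 1.0 / output_frame_rate        # reciprocal of the frame rate
    state = {"playback_rate": 1.0, "keyframe_time": 0.0}   # step 802

    while renderer.has_more_frames():           # step 808
        determine_keyframe_time(state, frame_time)          # step 804
        frame = renderer.render(state["keyframe_time"])     # step 806
        display.show(frame)
        time.sleep(frame_time)                  # step 810: wait one frame time
```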
The method 900 begins at step 902, where the keyframe time parameter stored in memory 206 is advanced by the value of the current playback rate parameter stored in memory 206 multiplied by the frame time. At the next step 904, if there is an accented beat present on the input 305 to the playback control module 304, then the method 900 continues at the next step 906. Otherwise, the method 900 concludes. At step 906, if the present time of the currently executing animation is close to another keyframe on the keyframe timeline (e.g. timeline 410), then the method continues at the next step 908.
Otherwise, the method 900 proceeds to step 914. At step 908, the keyframe time parameter stored in memory 206 is set to the keyframe time associated with the keyframe identified at step 906 and the method 900 concludes. The comparison performed at step 906 can be based on a predetermined tolerance (e.g. half a frame time at the present playback rate).
At step 914, a next beat is determined. A method 1000, as seen in Fig. 10, of predicting the position of a next beat, as performed at step 914, will be described in detail below with reference to Figs. 10 and 11. However, any method known to those in the relevant art for predicting the position of a next beat can be utilised in the arrangements described herein. The method 1000 is preferably implemented as software being resident on the hard disk drive 210 and being executed by the processor 205. Such software is preferably implemented as part of the audio analyser module 303.
The method 900 continues at the next step 916 where if the processor 205 determines that the time to the next beat is less than the time to the next keyframe in the animation then the method 900 continues at the next step 918. Otherwise, the method 900 proceeds to step 924. At step 918, the processor 205 determines the number of beats to the next keyframe of the animation. Such a determination can be made based on the period of the previous beats, for example, or using the method 1000 which will be explained in detail below. At the next step 920, the value of the playback rate parameter stored in memory 206 is changed so that an integral number of beats occur before the next keyframe and the method 900 concludes. In this instance, the playback rate parameter can be either increased or decreased by the processor 205 in order to ensure that such an integral number of beats occur.
At step 924, the playback rate parameter stored in memory 206 is decreased so that the next beat falls at the same time that the next keyframe of the animation occurs, and the method 900 concludes.
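Under the same assumptions as the loop above, the decision logic of method 900 can be sketched as follows. The animation and audio handles (next_keyframe_time, beat_present, time_to_next_beat, beats_until, beat_period) are illustrative stand-ins for the animation design module 301 and audio analyser module 303, not actual interfaces from the specification.

```python
def determine_keyframe_time(state, frame_time, animation, audio):
    """Method 900 sketch: advance the keyframe time, resynchronise on beats."""
    # Step 902: advance by playback_rate * frame_time.
    state["keyframe_time"] += state["playback_rate"] * frame_time

    # Step 904: nothing more to do unless an accented beat is present.
    if not audio.beat_present():
        return

    next_kf = animation.next_keyframe_time(state["keyframe_time"])

    # Steps 906/908: snap to a keyframe that is already close
    # (tolerance of half a frame time at the present playback rate).
    tolerance = 0.5 * frame_time * state["playback_rate"]
    if abs(next_kf - state["keyframe_time"]) < tolerance:
        state["keyframe_time"] = next_kf
        return

    # Step 914: predicted time until the next beat (method 1000).
    time_to_beat = audio.time_to_next_beat()
    # Real time needed to reach the next keyframe at the current rate.
    time_to_kf = (next_kf - state["keyframe_time"]) / state["playback_rate"]

    if time_to_beat < time_to_kf:
        # Steps 918/920: adjust so an integral number of beats fit
        # before the next keyframe.
        n_beats = max(1, audio.beats_until(time_to_kf))
        state["playback_rate"] = ((next_kf - state["keyframe_time"])
                                  / (n_beats * audio.beat_period()))
    else:
        # Step 924: slow down so the next beat lands on the next keyframe.
        state["playback_rate"] = (next_kf - state["keyframe_time"]) / time_to_beat
```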
The methods described above can synchronise an accented beat to keyframes by adjusting a playback rate parameter associated with the keyframes so that the beat and the keyframes coincide. In one arrangement, the execution of a keyframe can be hard locked to a particular beat when a first beat of an audio signal is encountered by the processor 205. In this instance, the processor 205 can set the keyframe time parameter to the start time of the first keyframe at step 908, when the first beat is encountered.
A timer counter, configured within memory 206, can be used to store the time to a next beat in an input audio signal. The timer counter can be read at any time, for example at step 914 of the method 900. Fig. 11 is a flow diagram showing a method 1100 for resetting such a timer counter when a timer interrupt is received by the processor 205, typically at hundreds or thousands of times per second.
The method 1100 is preferably implemented as software resident on the hard disk drive 210 and being controlled in its execution by the processor 205. The method 1100 begins at the first step 1101, where a timer interrupt is received by the processor 205. At the next step 1102, a timer counter configured within memory 206 is decremented. The method 1100 continues at the next step 1104 where if the processor 205 determines that the timer counter is negative, then the method 1100 proceeds to step 1108. Otherwise, the method 1100 concludes. At step 1108, the processor 205 resets the timer counter to zero and the method 1100 concludes.
Fig. 10 is a flow diagram showing a method 1000 of predicting a next beat for an input audio signal stream. The method 1000 is preferably implemented as software being resident on the hard disk drive 210 and being controlled in its execution by the processor 205. The software for the method 1000 is preferably configured as part of the audio analyser code module 303. The method 1000 begins at the first step 1002, where a tempo value, determined by the audio analyser code module 303, is converted to a time value by taking the reciprocal of the tempo value. A method 1200 of detecting the tempo value of an audio signal, as performed by the audio analyser code module 303, will be explained in more detail below. However, any method known to those in the relevant art can be utilised by the audio analyser code module 303 in order to determine the tempo of the input audio signal stream.
One or more operations can be performed by the processor 205 in order to scale the tempo value to suit the timer stored in memory 206. For example, if the timer operates in milliseconds, and the tempo value is in beats per minute, the conversion is as follows:

timer_value = (1000 × 60) / tempo (1)

The method 1000 continues at the next step 1004 where, if a beat is present at the input of the audio analyser 303, then the method 1000 proceeds to step 1008. Otherwise, the method 1000 concludes. A beat presence indicator (e.g. a flag) configured within the audio analyser code module 303 can be used to determine the presence of a beat. At step 1008, the timer counter configured within memory 206 is set to the value determined for the timer counter at step 1002. The method 1000 concludes after step 1008.
The combination of setting the timer counter in the method 1000 and decrementing the timer counter in the event of an interrupt being received by the processor 205 in the method 1100, ensures that the timer counter is always set to the time of the next beat.
The timer counter provides a continuous estimate of the time until the next beat.
As discussed above with reference to step 1104 of the method 1100, the timer counter can reach negative values. This can indicate that a beat is late due to tempo reduction, or that the beat is missing altogether. In this instance, the time to the next beat can be interpreted as zero in order to allow an animation being rendered to continue at its intended render rate.
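Taken together, methods 1000 and 1100 maintain a countdown to the next predicted beat. A compact sketch combining the two is given below; the class and method names are hypothetical, and a real implementation would decrement the counter from a timer interrupt arriving hundreds or thousands of times per second rather than from a polled call.

```python
class BeatTimer:
    """Countdown (in milliseconds) to the next predicted beat."""

    def __init__(self):
        self.counter_ms = 0.0

    def on_beat(self, tempo_bpm):
        # Method 1000, steps 1002/1008: convert the tempo to a period and
        # reset the counter using Equation (1): 1000 * 60 / tempo.
        self.counter_ms = 1000.0 * 60.0 / tempo_bpm

    def on_timer_interrupt(self, elapsed_ms=1.0):
        # Method 1100: decrement on each interrupt; clamp at zero so a
        # late or missing beat reads as "beat due now" (steps 1104/1108).
        self.counter_ms = max(0.0, self.counter_ms - elapsed_ms)

    def time_to_next_beat_ms(self):
        # Continuous estimate of the time until the next beat.
        return self.counter_ms
```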
A method 1200 of detecting the tempo and beat of an input audio signal stream is described below with reference to Fig. 12. The tempo value determined using the method 1200 can be utilised by the processor 205 during the method 1000 of predicting a next beat for an input audio signal stream. The method 1200 processes short segments (e.g. 100 ms segments) of the audio signal in real time. For ease of explanation, the steps of the method 1200 are described with reference to music represented by a digital waveform audio signal. However, it is not intended that the present invention be limited to the described methods. For example, the method 1200 may have application to speech and other forms of audio data.
The steps of the method 1200 are preferably implemented as software as part of the audio analyser 303 resident on the hard disk drive 210 and being read and controlled in its execution by the processor 205. In the method 1200, a digital waveform audio signal is received by the processor 205 in real time as short blocks of audio data, via the audio input 230 and interface 207, for processing. As described above, a person skilled in the relevant art would appreciate that such an audio signal could alternatively be accessed from the computer network 220, via the modem 216 and interface 208, or from the CD-ROM drive 212, for example. The steps of the method 1200 are repeated for each new block (or current block) of audio data detected by the processor 205. At the first step 1201 of the method 1200, the processor 205 detects a current block of audio data. The length of the block of audio data is preferably 1024 samples, at a sampling rate of 11.025 kHz, which corresponds to approximately 93 ms of audio data. However, a person skilled in the relevant art would appreciate that any suitable sampling rate, which allows the computer system 200 to respond to changes in the tempo of music represented by the input audio signal, can be used. Further, in order for the processor 205 to predict a next beat for the input audio signal, using previously received audio data, the previously received audio data is preferably less than one second old. For a sampling rate producing 93 ms blocks of audio data, the processor 205 detects approximately ten blocks of audio data per second, which is very suitable for detecting the data and then predicting the next beat for the input audio signal. A high sampling rate, and therefore a high processing rate, is preferred, particularly if the system 200 is also being used to output other real-time attributes (e.g. instantaneous volume and frequency measures) of the audio signal.
The method 1200 continues at the next step 1203, where a series of frequency spectra in time are determined from the audio data block detected at step 1201. The frequency spectra can be determined by applying a frequency domain transformation to the block of data, such as a 512-point Fast Fourier Transform. The transformation is preferably applied to a segment of the data in the audio signal using an appropriate windowing algorithm (e.g. a Hamming window) at regular intervals (e.g. every 32 samples). For a 1024-sample block where the transformation is applied every thirty-two samples, thirty-two frequency spectra (1024 / 32 = 32) result. The spacing in time between these frequency spectra is therefore approximately 2.9 ms. In order to allow for such a transformation to be applied to the starting edge of a current block of audio data, the last portion of an immediately previous audio data block can be prepended to a current audio data block so that the frequency transformation window is always covering real audio data.
A filter can also be applied to the frequency spectra, at step 1203, to remove or enhance certain frequency ranges in order that tempo and beat analysis can be performed on particular desired frequencies in the audio data. For example, if only low frequency bass drums are to be used for beat and tempo analysis, then the frequency spectra for the input audio data can be low-pass filtered to allow such low frequency signals to be separated in the input audio data. Alternatively, beat and tempo analysis can be performed across the entire available frequency spectrum of the input audio data.
Also at step 1203, the power components of the frequency spectrum of the audio data block are each individually summed to produce a 'power signal' waveform across the audio data block. The power signal waveform can be smoothed using any method known to those in the relevant art in order to remove noise and minor signal variations from the power signal waveform.
As an example, a portion of a real power signal waveform 1310 is shown in Fig. 13(a), with the amplitude of the signal 1310 shown on a vertical axis 1301 and time extending horizontally on a horizontal axis 1303. The first derivative 1320 of the power signal waveform 1310, as shown in Fig. 13(b), representing the change in the power signal waveform 1310 over time, can also be obtained by calculating the difference between successive values in the signal 1310. The first derivative values are known as "delta" values, with the magnitude of such values, for the signal 1310, being shown on the vertical axis 1305 as seen in Fig. 13(b).
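A sketch of the spectral analysis of step 1203, assuming NumPy, is given below. The block length, FFT size, hop and window follow the figures quoted above; prepending the tail of the previous block (so that a spectrum is produced for every hop across the full block, 1024 / 32 = 32 of them), smoothing, and any band-pass filtering are omitted for brevity, and the function name is illustrative only.

```python
import numpy as np

def power_and_delta(block, fft_size=512, hop=32):
    """Step 1203 sketch: short-time spectra -> power signal -> delta waveform.

    block: 1-D float array of audio samples (e.g. 1024 samples at 11.025 kHz).
    Returns (power_signal, delta), one value per analysis hop.
    """
    window = np.hamming(fft_size)
    powers = []
    for start in range(0, len(block) - fft_size + 1, hop):
        spectrum = np.fft.rfft(block[start:start + fft_size] * window)
        # Sum the power components of the spectrum (optionally a filtered
        # sub-band, e.g. low frequencies only for bass-drum tracking).
        powers.append(np.sum(np.abs(spectrum) ** 2))
    power_signal = np.asarray(powers)
    # First derivative ("delta"): difference between successive values.
    delta = np.diff(power_signal, prepend=power_signal[0])
    return power_signal, delta
```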
The method 1200 continues at the next step 1205, where the power signal waveform (e.g. the waveform 1310), associated with the current audio data block, is analysed by the processor 205 to extract 'events' (e.g. music events) in the input audio signal. A music event corresponds to a significant event in the music represented by the current audio data block, such as a drum beat, or a note starting to be played on an instrument. Such events are typically represented by peaks or local maxima in the power signal waveform associated with an input audio signal, and are also seen as spikes in the delta waveform for the input audio signal.
As analysis proceeds across a current audio data block, at step 1205, running averages are dynamically determined by the processor 205 for both the power signal data waveform and the delta (i.e. first derivative) data waveform associated with the current audio data block. The running averages are determined by the processor 205 and are preferably stored in memory 206 such that the averages pertain to a recent portion in time of the power signal or delta data associated with the current audio data block being analysed.
One method for determining the running averages for the power signal and delta (i.e. first derivative) waveforms is to use an exponential decay formula as shown below:

new_average = f × average + (1 − f) × data (2)

f = e^(−1/width) (3)

where: average represents the current running average; new_average represents the new calculated average; data represents a new data value being incorporated into the running average; f represents a decay factor that determines the rate by which the average adjusts to new data; and width represents an approximate number of data values over which the running average is maintained.
Alternatively, a different decay factor (i.e. f) can be used when the data value, data, has a larger or smaller magnitude than the current running average. For example, the running average value, average, can be made to increase more rapidly for data values that are higher than the current running average. Inversely, the running average value can be made to fall less rapidly when the data values are lower than the current running average value. Thus, the calculated average value, new_average, can be configured to respond more quickly to a rising average value, but more slowly for a general fall in average value. Such a response provides advantages where the calculated average value is used as a threshold to accept or reject maxima, as will be described below.
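A sketch of the running average of Equations (2) and (3), including the asymmetric decay just described, might read as follows; the two width values are illustrative assumptions, not figures from the specification.

```python
import math

def running_average(average, data, width_up=10.0, width_down=50.0):
    """Exponential running average per Equations (2) and (3).

    A smaller width (i.e. a smaller decay factor f) is used when the data
    value exceeds the current average, so the average rises quickly but
    falls slowly -- useful when the average serves as a peak threshold.
    """
    width = width_up if data > average else width_down
    f = math.exp(-1.0 / width)               # Equation (3): f = e^(-1/width)
    return f * average + (1.0 - f) * data    # Equation (2)
```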
Maxima for the current audio data block can be identified analytically by examining the power signal waveform (e.g. 1310) or delta data waveform (e.g. 1320) associated with the current audio data block, in order to identify peaks in these associated waveforms. These peaks are referred to as candidate maxima (e.g. 1311), and once such a candidate maximum has been found, the amplitude at the peak is compared to a dynamically determined threshold 1312, represented by the broken line in Fig. 13(a). The threshold 1312 is based on the running average of the power signal as described above, multiplied by a predetermined factor. Typically the factor is quite small, for example 1.05.
Similarly, the corresponding spike (e.g. 1321) in the delta data waveform 1320 that precedes the power signal peak 1311 is also examined. The spike 1321 corresponds to the highest occurring gradient just prior to the peak 1311 in the power signal waveform 1310, and represents the onset of a note or sound in the input audio signal that is responsible for the power signal peak 1311. Again, the amplitude value of the spike 1321 is compared to a threshold 1322 based on the running average of the delta data waveform 1320, multiplied by another factor. For the delta data waveform threshold 1322, the factor can be higher in order to reject events that do not have a significant enough (i.e. steep enough) signature.
If both the power signal threshold 1312 and the delta data threshold 1322 are exceeded by corresponding maxima found in the respective waveforms 1310 and 1320, an event such as a music event is deemed to have been found. The details of such an event, including the power signal amplitude of the maximum 1311 in the power signal waveform 1310 and the magnitude of the delta spike 1321, are recorded by the processor 205 as a 'music event', at step 1205 of the method 1200, and are appended to a music event list stored in memory 206. The use of the delta waveform 1320 enables accurate location of a beat within a block of audio data, since the beat generally occurs at the time of the onset of the music event being observed. Such an event is characterised by a significantly abrupt spike in the delta waveform (e.g. the delta waveform 1320), and can therefore be additionally filtered by applying a minimum threshold to such spikes prior to accepting them as a candidate beat music event. The method 1200 provides a more accurate indication of where events occur in an input audio signal compared to conventional tempo detection methods, such as examining maxima in a raw audio signal and applying a single minimum threshold to the audio signal. In such a conventional method, non-abrupt events in an input audio signal that often do not indicate a beat, for example, can also be accepted as erroneous candidate beats.
An example of a music event list 1400 is shown in Fig. 14. The music event list 1400 is represented by a time line 1410. The list 1400 includes a marker 1430 indicating where in the music event list 1400 new events have been added, so that such new events can be identified during later analysis. A reliable indication of the time that an event occurred is the time index, read from the axis 1307, associated with the spike (e.g. 1321) found in the delta data (i.e. first derivative) waveform 1320 that preceded the signal maximum (e.g. 1311). The list 1400 of Fig. 14 includes several music events indicated by filled diamonds representing old events (e.g. 1420) and filled circles representing new events (e.g. 1440). Such events in the list 1400 represent possible instances in time where a beat is occurring in the input music signal being analysed.
Referring to the example of Fig. 13(a), each music event identified in the power signal waveform 1310 has been designated with an arrow above an associated peak (e.g. 1311), and the corresponding delta spike is similarly indicated (e.g. 1321). Note that at the point 1313 of the waveform 1310, the associated power signal maximum amplitude does not exceed the running average threshold 1312, so that no music event is created even though the associated spike 1323 in the delta waveform 1320 exceeded the delta threshold 1322. The extraction of music events from the power signal waveform 1310 continues in such a manner until the entire current block of audio data has been processed.
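Combining the pieces, the dual-threshold event extraction of step 1205 can be sketched as below. The power factor 1.05 follows the text; the delta factor and the use of the immediately preceding delta sample (rather than a search for the highest preceding gradient) are simplifying assumptions, and running_average is the helper sketched above.

```python
def extract_music_events(power_signal, delta, times,
                         power_factor=1.05, delta_factor=2.0):
    """Step 1205 sketch: find peaks exceeding both dynamic thresholds.

    times: time index for each sample of the power/delta waveforms.
    Returns a list of (time, peak_power, delta_magnitude) music events.
    delta_factor is an assumed value; the text says only that it is
    higher than the power factor, to reject non-abrupt events.
    """
    events = []
    power_avg = power_signal[0]
    delta_avg = abs(delta[0])
    for i in range(1, len(power_signal) - 1):
        power_avg = running_average(power_avg, power_signal[i])
        delta_avg = running_average(delta_avg, abs(delta[i]))
        # Candidate maximum: a local peak in the power signal.
        is_peak = (power_signal[i] >= power_signal[i - 1]
                   and power_signal[i] > power_signal[i + 1])
        if not is_peak:
            continue
        # Both the power threshold and the preceding delta spike threshold
        # must be exceeded before a music event is recorded.
        if (power_signal[i] > power_factor * power_avg
                and delta[i - 1] > delta_factor * delta_avg):
            # The event time is taken from the delta spike, which marks
            # the onset of the note or sound.
            events.append((times[i - 1], power_signal[i], delta[i - 1]))
    return events
```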
The method 1200 continues at the next step 1207, where the event details in the music event list 1400 (i.e. the time index, the power signal amplitude of the maximum 1311 in the power signal waveform 1310 and the magnitude of the delta spike 1321) are analysed by the processor 205. As described above, the music events list 1400 can have several new music events added to it during the processing of the current audio data block. The point (e.g. 1430) in the list 1400 at which the new events were added is also recorded. The remainder of the list 1400, prior to the point 1430 where new events were added, contains older music events that were added during processing of previous audio data blocks. The time interval between two events in the list 1400 can be used by the processor 205 to identify the time between successive beats. Hence, a 'pair' of events represents a candidate tempo for the input audio signal at a particular point, and also serves to locate the time of the actual beats of the tempo. The time interval between the events represents the period of the tempo.
The processor 205 also stores and updates the tempos and beats for the current audio data block in memory 206 as a tempo and beat list (not shown), at step 1207. Each tempo stored in the tempo list contains information describing the particular tempo. The information includes the period of the tempo, so that the tempo may be matched and detected in the future. The tempo also includes a score, which allows comparison between tempos.
Fig. 15 shows a method 1500 of analysing the music event list stored in memory 206 and updating the tempos and beat list associated with the current audio data block, as performed at step 1207. The steps of the method 1500 are preferably implemented as software resident on the hard disk drive 210 and being read and controlled in its execution by the processor 205. The method 1500 begins at step 1501, where the processor 205 truncates the music events list (e.g. the list 1400) by removing old events, so that the oldest music event in the list 1400 is no more than a predetermined period (i.e. max_tempo_period) older than the time index of the start of the current audio data block. Truncating the list 1400 can be performed since tempos with a period greater than max_tempo_period are not tracked by the system 200. There is therefore no need to retain music events, on the list 1400, which will pair up with another music event if the period of the tempo represented by such a pairing exceeds this limit. The music event list 1400 can be implemented in memory 206 or the hard disk drive 210 as an array or series of memory buckets, as known to those in the relevant art, where each of the buckets stores the details of one of the music events. Truncating the list 1400 in the manner described above provides savings in memory 206 and improves performance, by reducing the number of event pairs to be analysed, over conventional tempo detection methods.
As described below, each new music event (e.g. 1440) in the list 1400 is paired with each music event (e.g. 1420) having an earlier time index, from oldest to newest. At step 1513, if a new music event is available, then the method 1500 proceeds to step 1503.
Otherwise, the method 1500 concludes. At step 1503, each new music event in the list 1400 is paired with each music event having an earlier time index, from oldest to newest.
The first such pair in the example of Fig. 14 is represented by the arrow 1450. Each pair identified in this way is also analysed at step 1503. As mentioned above, a music event pair represents two candidate beats in the music. Using a pair of music events to match against existing tempos and beat phases reduces the number of false beats being detected.
Such false beats can result from erroneous noise in the audio, and often occur in isolation and not in pairs or longer sequences. If the period (i.e. period) between the two events of a music event pair is smaller than a predefined minimum tempo period, min_tempo_period, the pair is disregarded and a next pair is formed. The list of tempos stored in memory 206 is examined at the next step 1505 and, if an existing tempo is found with period, existing_period, that matches the period between the music event pair, then a tempo match is deemed to have occurred. A tolerance ε_period is used to allow for minor tempo fluctuations. For example, ε_period = 0.08. A tempo is matched when:

|existing_period / period − 1| < ε_period (4)

As described above, each tempo stored in the tempo list contains details describing the particular tempo. Information such as the period of the tempo is recorded, so that the tempo may be matched and detected in the future. The tempo also includes a score, which allows comparison between tempos. Running averages of the peak signal and delta data associated with music events that have been matched with the tempo are also stored in the tempo list. The running averages give a measure of how 'powerful' recent music events corresponding to a tempo were, and can be used along with the score when comparing tempos to determine which tempo most probably represents the tempo of the input music signal.
The method 1500 continues at the next step 1507, where for each matching tempo found, the details of the existing tempo are updated with the attributes of the corresponding music event pair. The period of the tempo is updated slightly by the new period of the music event pair being processed. For example, a factor of 1/3 may be applied to the period of the tempo to dampen the effect, so that the tempo period is updated smoothly as follows:

updated_period = (period − existing_period) × 1/3 + existing_period (5)

The peak signal and delta values of the music events in the pair are also incorporated into the running averages in the tempo, in a similar fashion to Equation (2). The time index of the newer of the music events in the pair is stored in memory 206 in order to record the last time that the tempo was detected. The score of the tempo is also increased by the inverse of the tempo period. Thus, faster tempos get a higher score.
Alternatively, scoring where the increase is not based on the tempo period can also be used. While a slower tempo will be detected less frequently than a faster tempo, in this case, the slower tempo may detect multiple beat phases (e.g. half-beats), which will also increase the overall score associated with a tempo. Slower tempos generally get a higher score in this case. A combination of the two above-mentioned methods for adjusting the score associated with a tempo can also be used, including adjusting the score based on the running averages of the power signal and delta data associated with the current audio data block. If no existing tempo is found that has a similar period to the music event pair, at step 1507, then a new tempo is created based solely on the music event pair being analysed.
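Steps 1505 and 1507 reduce to a period comparison and a damped update; a sketch implementing Equations (4) and (5) directly is shown below. The Tempo record is illustrative and holds only the fields discussed; the inverse-period scoring follows the first of the scoring methods described above.

```python
EPSILON_PERIOD = 0.08

class Tempo:
    def __init__(self, period, time_index):
        self.period = period
        self.score = 0.0
        self.last_matched = time_index

def match_and_update_tempo(tempos, pair_period, newer_event_time):
    """Steps 1505/1507 sketch: match an event pair against tracked tempos."""
    for tempo in tempos:
        # Equation (4): |existing_period / period - 1| < epsilon_period.
        if abs(tempo.period / pair_period - 1.0) < EPSILON_PERIOD:
            # Equation (5): damped update of the tempo period.
            tempo.period += (pair_period - tempo.period) / 3.0
            tempo.last_matched = newer_event_time
            tempo.score += 1.0 / tempo.period   # faster tempos score higher
            return tempo
    # No match: create a new tempo from the pair alone.
    tempo = Tempo(pair_period, newer_event_time)
    tempos.append(tempo)
    return tempo
```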
A maximum limit on the number of tempos being tracked can be set, and when the number of tempos being tracked reaches the maximum, an existing tempo can be removed from the tempo list stored in memory 206. One method for choosing a tempo to remove from the list is by examining all tempos and removing the tempo that has not been matched for the longest time. That is, remove the details of the tempo that has the earliest time index associated with the last time the tempo was matched. Choosing too low a limit on the number of tempos being tracked can sometimes cause the analysis to remove a tempo which is actually the true tempo of the music. However, this problem does not occur if a sufficiently high limit is used (e.g. 16), when the detectable tempo range is set for 30 to 165 beats per minute (i.e. max_tempo_period = 2 s and min_tempo_period = 0.36 s, respectively). The benefit of using a high limit is that it allows the system 200 to track several multiples of the tempo and beat of the music represented by the input audio signal. This information can be used to make better decisions regarding the choice of actual beat location, as well as identifying which beat is a downbeat, since the beats occur at multiples of the beat of the music.
Within each tempo that is being tracked for an input audio signal, a number of beat phases can also be monitored. A beat phase for a particular tempo is a beat being tracked in the music that is recurring at the rate of that tempo. Each beat phase stored with the tempo contains details describing the beat, such as a score to allow comparison with the other beats in the same tempo. Also, running averages of the peak signal and delta data associated with the music events that have been matched with this beat can be tracked. Such running averages represent to some extent the 'power' of recent music events corresponding to a beat, and can also be used with the score for comparing the beats. The last time the beat was detected is also recorded, and can be used for phase-checking the beat against future music events to determine if the beat has been detected more than once. The different beat phases in a tempo are therefore discriminated by the actual times at which the last beat (i.e. the last music event) was detected that matched that beat phase.
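A beat phase record might therefore carry the following fields. The names are assumptions mirroring the description, and the is_best flag anticipates the 'best' beat phase flag mentioned below.

    from dataclasses import dataclass

    @dataclass
    class BeatPhase:
        score: float = 0.0
        avg_peak: float = 0.0       # running average of matched peak values
        avg_delta: float = 0.0      # running average of matched delta values
        last_detected: float = 0.0  # time index used for phase-checking
        is_best: bool = False       # marks the tempo's current 'best' phase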
The method 1500 continues at the next step 1509, where a music event pair that has successfully matched a tempo (i.e. the period between the music events matches the period of the tempo) is checked by the processor 205 to determine if the music event pair is in phase with any of the beat phases from that tempo. Phase-checking consists of comparing the time index of the first event in the music event pair against the time index at which the beat was last detected. If these times differ by an integer multiple of the tempo period, then the music event pair is a match for the beat phase. A tolerance ε_phase can be used to allow for minor phase misalignment (e.g. ε_phase = 0.08), where the beat phase is matched when:

(f < ε_phase) OR (f > 1 - ε_phase)    (6)

where f represents the fractional part of the number of tempo periods elapsed (periods_elapsed) between the time of the older music event being checked (time_event) and the time at which the beat phase was last detected (time_last_beat):

periods_elapsed = (time_event - time_last_beat) / updated_period    (7)

f = periods_elapsed - floor(periods_elapsed)    (8)

and updated_period represents the new tempo period as calculated above.
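Equations (6) to (8) amount to checking that a near-whole number of tempo periods separates the two detections; a Python rendering under the same assumptions:

    import math

    EPS_PHASE = 0.08  # example tolerance for minor phase misalignment

    def in_phase(time_event: float, time_last_beat: float,
                 updated_period: float) -> bool:
        periods_elapsed = (time_event - time_last_beat) / updated_period  # Eq. (7)
        f = periods_elapsed - math.floor(periods_elapsed)                 # Eq. (8)
        return f < EPS_PHASE or f > 1.0 - EPS_PHASE                       # Eq. (6)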
At the next step 1511 of the method 1500, when a beat within the tempo is matched, the details of the beat stored in memory 206 are updated. The beat details are updated in a similar manner to the tempo details being updated when the tempo was matched, as described above. The running averages for signal and delta values being tracked for the beat are also adjusted, at step 1511, according to the peak values stored with the music events of the matched pair. The time index of the newer of the music events in the pair is recorded in memory 206 for future phase-checking and the score of the beat phase is also increased.
Similar to tempo scores, the beat score may be increased using one of several different methods. Again, one method is to increase the beat score by the inverse of the beat period. An additional modification to tempo and beat scores can also be made based on the aforementioned running averages of the power signal and delta value waveforms.
Since these running averages indicate the power of the music events that corresponded to the tempo or beat of an input audio signal, a higher running-average value can be made to lend more weight to the score of that particular tempo or beat. This is especially useful when choosing between beats within a particular tempo, where often the choice is being made between the true beat and a half-beat, for example. In many cases, the true beat has a higher power.
The beat score is preferably increased by the inverse of the period of the tempo, and can be further altered by multiplication by the actual peak power of the signal that constituted the beat, and by the magnitude of the first derivative at the point in time where the beat occurs. However, as will be described below, beat scores can also be automatically reduced by an amount based on the elapsed time since the previous processing cycle. Hence, the manner in which scores increase for beat detection needs to be compatible with the manner in which the scores are modified elsewhere.
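Under the same assumed records, the preferred weighted increase could read as follows; the multiplicative combination follows the sentence above, and is otherwise an assumption.

    def increase_beat_score(beat: BeatPhase, tempo_period: float,
                            peak_power: float, delta: float) -> None:
        # Inverse period, weighted by peak power and |first derivative|.
        beat.score += (1.0 / tempo_period) * peak_power * abs(delta)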
If no beat phase being tracked within the tempo is similar in phase to the beats represented by the music event pair being processed, at step 1511, a new beat phase can be created within the tempo. Similar to the tempo list, a maximum limit on the number of beats to be tracked per tempo can be imposed. Since often there are only a small number of phases for the actual beat of the music, this maximum limit can be quite small (e.g. 3 or 4 beats). If the system 200 is already tracking the maximum number of beat phases for a particular tempo, one of the beat phases being tracked must be removed. One method for choosing a beat phase to be removed is by selecting the beat that was last detected the longest time ago, with the proviso that the beat phase currently classed as the 'best' is not removed. The 'best' beat phase represents a best guess for where in time the actual beat is occurring for a given tempo in the input music signal and can be indicated in memory 206 by a flag associated with the beat phase details.
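A sketch of this eviction rule, protecting the flagged 'best' phase; the limit of 4 reflects the example above, and the names remain the earlier assumptions.

    MAX_BEAT_PHASES = 4

    def prune_beat_phases(tempo: Tempo) -> None:
        """Evict the longest-undetected beat phase, never the current 'best'."""
        if len(tempo.beats) >= MAX_BEAT_PHASES:
            candidates = [b for b in tempo.beats if not b.is_best]
            if candidates:
                stalest = min(candidates, key=lambda b: b.last_detected)
                tempo.beats.remove(stalest)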
If many slower multiples of the actual tempo and beat of the input music signal are required to be tracked, then more beat phases may be required for the slower tempos.
This is because there will be more 'apparent' beat phases detected for the slower tempos.
For example, the minimum tempo detectable can be set as low as twenty beats per minute, so that fourth-beats of an actual eighty beats per minute tempo can be detected. The twenty beat-per-minute tempo can then attempt to track four phases, since there are actually four beats occurring in one twenty beat-per-minute cycle (i.e. one for each beat of the eighty beats per minute tempo). Hence, the twenty beat-per-minute tempo is able to track at least four phases, although more would be preferable in the event that there are also half beats occurring. The number of beat phases tracked for a tempo can be made to depend on the period of the tempo, such that a tempo that is twice as slow as another will track double the number of beat phases.
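One way to realise this proportional rule; the base constants here are placeholders, not values from the specification.

    def max_beat_phases(period: float, base_period: float = 0.36,
                        base_phases: int = 2) -> int:
        # Phase limit grows linearly with the period, so a tempo twice as
        # slow as another tracks double the number of beat phases.
        return max(base_phases, round(base_phases * period / base_period))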
The method 1500 proceeds to step 1513, after step 1511, and analysis continues in the manner described above for steps 1503 to 1511 until all new music events added to the music event list 1400 have been paired with older events, and the pairs have been analysed with respect to the tempo list and beat phases. As seen in Fig. 14, analysis of event pair 1450 (i.e. new event 1430 and old event 1420) is followed by event pair 1460 (i.e. new event 1430 and old event 1470), and so on for the remaining event pairs until all new music events 1440, 1480 and 1485 have been paired in the manner described.
Returning to Fig. 12, after the music event list stored in memory 206 has been analysed and the tempo and beat lists associated with the current audio data block have been updated at step 1207, the method 1200 proceeds to step 1209. At step 1209, the tempo and beat phase lists are checked and scores associated with the tempos and beats are adjusted in order to identify which tempo and beat phase can be classified as the 'best', at the next step 1211. The tempo classified as the 'best' represents a best guess of the actual tempo of the input music at the current time. Within a tempo, a 'best' beat phase is also identified representing a best guess for where in time the actual beat is occurring in the input music signal. This information can then be used to predict the location of the next beat in the future since the next beat will occur one period after the time that the last beat was detected.
Fig. 16 shows a method 1600 of processing tempo and beat lists as performed at step 1209 of the method 1200. The steps of the method 1600 are preferably implemented as software resident on the hard disk drive 210 and being read and controlled in its execution by the processor 205. The method 1600 iteratively processes each tempo being tracked by the processor 205, and each beat phase within each tempo. The scores of each of the tempos and the beats within the tempos are reduced by a predetermined factor with the view that, as time passes, all tempos and beats consistently have their scores decayed in a specific way. However, if some of the tempos and beats are actually being detected, then the scores associated with the detected tempos are increased. Furthermore, tempos and beats that are consistently being detected (i.e. those that represent the actual tempo and beat of the music) have their associated scores increased consistently. Such consistency yields an equilibrium whereby the increase in the scores of tempos being detected is offset by the decay factor applied each cycle. Tempos and beats that are not detected consistently lose score gradually and tend towards zero. The reduction factor can be based on the elapsed time between processing cycles, and also chosen so that scores undergo a suitable decay over time if the tempo or beat is no longer detected.
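A sketch of the per-cycle decay, applied uniformly to every tempo and beat phase; the exponential form, which keeps the decay consistent across variable cycle lengths, is an assumption.

    import math

    DECAY_RATE = 0.5  # hypothetical decay rate per second

    def decay_scores(tempo_list: list, elapsed: float) -> None:
        factor = math.exp(-DECAY_RATE * elapsed)
        for tempo in tempo_list:
            tempo.score *= factor
            for beat in tempo.beats:
                beat.score *= factor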
Fig. 17 is a graph 1700 showing an example of the above described scoring behaviour for a particular beat 1701 repeating over a time period of about six beats. A beat being consistently detected by the processor 205 should ideally reach a score equilibrium point (maximum score), shown by the broken line at 1730. At the equilibrium point 1730, the recurring decay 1750 applied to the value of the score is offset exactly by the score increase 1760 associated with successful beat re-detection. The score increases 1760 and 1761 are applied to the beat during step 1511 of the method 1500 for a beat being re-detected. In the event that a beat is missed (e.g. at the point 1740), either by accident or because the beat is actually erroneous, there will be a subsequent greater decay of the score associated with the beat. However, if the beat is again consistently detected, the score associated with the beat will rise once more over time to the equilibrium score 1730.
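As an illustrative aside (this short derivation is not taken from the specification), the equilibrium of Fig. 17 can be made quantitative: if each processing cycle multiplies a score s by a decay factor d, where 0 < d < 1, and each successful re-detection then adds an increase i, the equilibrium score s_eq satisfies s_eq = d x s_eq + i, giving s_eq = i / (1 - d). A larger per-detection increase or a gentler decay therefore raises the level 1730 at which the score plateaus.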
The method 1600 begins at step 1601 where the processor 205 selects a next tempo from the tempo list stored in memory 206. At the next step 1603, the score associated with the tempo is reduced by a predetermined factor. As described above, the reduction factor can be based on the elapsed time between processing cycles, and can also be chosen so that scores undergo a suitable decay over time if the tempo or beat is no longer detected. Then at the next step 1605, a next beat phase associated with the tempo selected at step 1601 is selected from memory 206 by the processor 205. The method 1600 continues at the next step 1607 where the score associated with the beat phase selected at step 1605 is reduced. The method 1600 then returns to step 1605 if there are more beat phases associated with the selected tempo. Otherwise, the method 1600 proceeds to step 1611. At step 1611, the beat phase associated with the tempo and having the highest associated score is selected as the best beat phase for the tempo selected at step 1601. At the next step 1613, the method 1600 returns to step 1601 if there are more tempos to be processed. Otherwise, the method 1600 proceeds to step 1615, where the processor 205 selects the tempo with the highest associated score as the best tempo for the input music signal being processed. The best beat phase within the chosen tempo represents a best indication of the actual location of the beat of the music. As described above, the best beat phase information can then be used, among other things, to predict when in the future the next beat is to occur since the next beat will occur one period after the time that the last beat was detected.
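Steps 1611 and 1615 reduce to two maximum-score selections; a sketch using the assumed records:

    def select_best(tempo_list: list):
        """Flag the highest-scoring beat phase per tempo, then return the
        highest-scoring tempo together with its best phase."""
        for tempo in tempo_list:
            best_beat = max(tempo.beats, key=lambda b: b.score, default=None)
            for b in tempo.beats:
                b.is_best = (b is best_beat)
        best_tempo = max(tempo_list, key=lambda t: t.score)
        best_phase = next((b for b in best_tempo.beats if b.is_best), None)
        return best_tempo, best_phase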
Adjusting the decay factor, and also the manner by which scores are increased when a tempo or beat is detected, alters the behaviour of the system 200 in a number of ways. Firstly, the rate at which the score associated with a consistently-detected tempo or beat increases, and eventually surpasses scores of other tempos and beats, is changed.
Secondly, the rate at which the score associated with a tempo or beat reduces if the tempo or beat ceases being detected at some time is changed. Finally, the maximum score at which a consistently-detected tempo of a certain period reaches an equilibrium state is changed.
The aforementioned preferred method(s) comprise a particular control flow. There are many other variants of the preferred method(s) which use different control flows without departing from the spirit or scope of the invention. Furthermore, one or more of the steps of the preferred method(s) may be performed in parallel rather than sequentially.
The method described above may alternatively be implemented in dedicated hardware such as one or more integrated circuits performing the functions or sub-functions of the methods. Such dedicated hardware may include graphic processors, digital signal processors, or one or more microprocessors and associated memories.
The foregoing describes only some embodiments of the present invention, and modifications and/or changes can be made thereto without departing from the scope and spirit of the invention, the embodiments being illustrative and not restrictive.
In the context of this specification, the word "comprising" means "including principally but not necessarily solely" or "having" or "including", and not "consisting only of". Variations of the word "comprising", such as "comprise" and "comprises", have correspondingly varied meanings.

Claims (19)

1. A method of generating an animation sequence according to a time varying signal, said animation sequence having a plurality of frames, one or more of said frames being a keyframe, said method comprising the steps of: determining a pre-selected property of said time varying signal; predicting a position in time in said animation sequence for generation of at least one of said keyframes depending on said property; and generating said animation sequence in real time according to the predicted position in time for generation of said at least one keyframe.
2. A method according to claim 1, wherein a playback rate of one or more of said frames is dependent on the predicted position in time.
3. A method according to claim 1 or claim 2, said method comprising the further step of controlling one or more parameters associated with one or more of said frames of said animation according to said property.
4. A method according to any one of claims 1 to 3, wherein said property is a time to a next event in said time varying signal.
A method according to any one of claims 1 to 4, said method further comprising the step of generating one or more of said frames of said animation sequence by interpolating between two or more of said keyframes.
6. A method according to claim 4 or claim 5, said method further comprising the step of rendering one or more of said keyframes at a time corresponding to said event.
7. A method according to any one of claims 4 to 6, said method further comprising the step of rendering a plurality of said frames between two or more events.
8. A method according to any one of claims 4 to 7, wherein said next event is represented by a peak derived from said time varying signal.
9. A method according to any one of claims 1 to 8, said method further comprising the step of determining a tempo of said time varying signal.
10. A method according to claim 9, said method further comprising the step of storing a time to a next beat according to said tempo.
11. A method according to claim 9, said method further comprising the steps of: transforming a portion of said time varying signal to form a frequency representation of said portion; analysing said frequency representation to identify significant events in said portion of said time varying signal; grouping said significant events together depending on when a particular event occurred in said time varying signal; scoring each of said grouped events depending on frequency of occurrence within said signal; and comparing said scored events to determine the tempo of said time varying signal.
12. A method according to any one of claims 1 to 11, wherein said property is the amplitude of said time varying signal, averaged over a specified time.
13. A method according to any one of claims 1 to 12, wherein said property is a dominant frequency of said time varying signal, averaged over a specified time.
14. A method according to any one of claims 1 to 13, said method further comprising the steps of: determining a second pre-selected property of said time varying signal; and controlling one or more parameters associated with one or more of said frames of said animation according to said second property.
15. A method according to any one of claims 1 to 14, wherein said position in time for generation is predicted substantially simultaneously with the determination of said pre-selected property.
16. The method according to any one of claims 1 to 15, wherein said time varying signal is an audio signal.
17. An apparatus for generating an animation sequence according to a time varying signal, said animation sequence having a plurality of frames, one or more of said frames being a keyframe, said apparatus comprising: property determining means for determining a pre-selected property of said time varying signal; keyframe prediction means for predicting a position in time in said animation sequence for generation of at least one of said keyframes depending on said property; and animation generation means for generating said animation sequence in real time according to the predicted position in time for generation of said at least one keyframe.
18. A program for generating an animation sequence according to a time varying signal, said animation sequence having a plurality of frames, one or more of said frames being a keyframe, said program comprising: code for determining a pre-selected property of said time varying signal; code for predicting a position in time in said animation sequence for generation of at least one of said keyframes depending on said property; and code for generating said animation sequence in real time according to the predicted position in time for generation of said at least one keyframe.
19. An animation sequence generated according to the method of any one of claims 1 to 16.

20. A method of generating an animation sequence according to a time varying signal, substantially as herein before described with reference to Figs. 1 to 17.

21. An apparatus for generating an animation sequence according to a time varying signal, substantially as herein before described with reference to Figs. 1 to 17.

22. A program for generating an animation sequence according to a time varying signal, substantially as herein before described with reference to Figs. 1 to 17.

DATED this Twenty-Fourth Day of June 2003
CANON KABUSHIKI KAISHA
Patent Attorneys for the Applicant
SPRUSON & FERGUSON
AU2003204917A 2002-06-24 2003-06-24 Method and Apparatus for Synchronising a Keyframe with Sound Ceased AU2003204917B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU2003204917A AU2003204917B2 (en) 2002-06-24 2003-06-24 Method and Apparatus for Synchronising a Keyframe with Sound

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
AUPS3153 2002-06-24
AUPS3153A AUPS315302A0 (en) 2002-06-24 2002-06-24 Method and apparatus for synchronising a keyframe with sound
AU2003204917A AU2003204917B2 (en) 2002-06-24 2003-06-24 Method and Apparatus for Synchronising a Keyframe with Sound

Publications (2)

Publication Number Publication Date
AU2003204917A1 AU2003204917A1 (en) 2004-01-15
AU2003204917B2 true AU2003204917B2 (en) 2006-01-12

Family

ID=34137094

Family Applications (1)

Application Number Title Priority Date Filing Date
AU2003204917A Ceased AU2003204917B2 (en) 2002-06-24 2003-06-24 Method and Apparatus for Synchronising a Keyframe with Sound

Country Status (1)

Country Link
AU (1) AU2003204917B2 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102682458A (en) * 2011-03-15 2012-09-19 新奥特(北京)视频技术有限公司 Synchronous regulating method of multi-stunt multi-parameter of key frame animation curve

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5267334A (en) * 1991-05-24 1993-11-30 Apple Computer, Inc. Encoding/decoding moving images with forward and backward keyframes for forward and reverse display
GB2277847A (en) * 1993-05-03 1994-11-09 Grass Valley Group Creating concurrent video effects using keyframes
WO1998034181A2 (en) * 1997-02-03 1998-08-06 Koninklijke Philips Electronics N.V. A method and device for keyframe-based video displaying using a video cursor frame in a multikeyframe screen
WO1998034182A2 (en) * 1997-02-03 1998-08-06 Koninklijke Philips Electronics N.V. A method and device for navigating through video matter by means of displaying a plurality of key-frames in parallel
US6580437B1 (en) * 2000-06-26 2003-06-17 Siemens Corporate Research, Inc. System for organizing videos based on closed-caption information

Also Published As

Publication number Publication date
AU2003204917A1 (en) 2004-01-15

Similar Documents

Publication Publication Date Title
US7026536B2 (en) Beat analysis of musical signals
AU2021201716B2 (en) Rhythmic Synchronization Of Cross Fading For Musical Audio Section Replacement For Multimedia Playback
CN105513583B (en) song rhythm display method and system
JP3941417B2 (en) How to identify new points in a source audio signal
US8654250B2 (en) Deriving visual rhythm from video signals
EP2816550B1 (en) Audio signal analysis
US7500176B2 (en) Method and apparatus for automatically creating a movie
EP1377959B1 (en) System and method of bpm determination
KR20080066007A (en) Method and apparatus for processing audio for playback
US6881889B2 (en) Generating a music snippet
US8885841B2 (en) Audio processing apparatus and method, and program
EP2962299B1 (en) Audio signal analysis
US10460763B2 (en) Generating audio loops from an audio track
FR2972835A1 (en) METHOD FOR GENERATING A SCENARIO FROM A MUSIC, GAME AND SYSTEMS COMPRISING MEANS FOR IMPLEMENTING SUCH A METHOD
JP2002116754A (en) Tempo extraction device, tempo extraction method, tempo extraction program and recording medium
WO2019241785A1 (en) Systems and methods for dancification
CN102473415A (en) Audio control device, audio control program, and audio control method
US7276656B2 (en) Method for music analysis
Dixon et al. Real time tracking and visualisation of musical expression
JP2007520727A (en) How to process a sound sequence like a song
AU2003204917B2 (en) Method and Apparatus for Synchronising a Keyframe with Sound
US9990911B1 (en) Method for creating preview track and apparatus using the same
JP2007025242A (en) Image processing apparatus and program
US10395669B2 (en) Voice analysis apparatus, voice analysis method, and program
US11107504B1 (en) Systems and methods for synchronizing a video signal with an audio signal

Legal Events

Date Code Title Description
FGA Letters patent sealed or granted (standard patent)
MK14 Patent ceased section 143(a) (annual fees not paid) or expired