CN112352279A - Beat decomposition facilitating automatic video editing - Google Patents


Info

Publication number
CN112352279A
Authority
CN
China
Prior art keywords: events, musical, waveform, music, primary
Prior art date
Legal status
Granted
Application number
CN201980043850.0A
Other languages
Chinese (zh)
Other versions
CN112352279B (en)
Inventor
克里斯多夫·沃谢
Current Assignee
Sokolipu Co
Original Assignee
Sokolipu Co
Priority date
Filing date
Publication date
Application filed by Sokolipu Co
Priority claimed from PCT/IB2019/000699 (WO2020008255A1)
Publication of CN112352279A
Application granted
Publication of CN112352279B
Status: Active

Classifications

    • G: PHYSICS
    • G11: INFORMATION STORAGE
    • G11B: INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B27/00: Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
    • G11B27/02: Editing, e.g. varying the order of information signals recorded on, or reproduced from, record carriers
    • G11B27/031: Electronic editing of digitised analogue information signals, e.g. audio or video signals

Abstract

The disclosed technology relates to a method for detecting musical events within a musical composition. Detection of a musical event is based on analyzing the energy and frequency content of the digital signal of the musical composition. The musical events identified in the musical composition can then be used in conjunction with audio-video editing.

Description

Beat decomposition facilitating automatic video editing
Cross Reference to Related Applications
This application claims the benefit of U.S. non-provisional application No. 16/503,379, entitled "Beat decomposition for facilitating automatic video editing," filed on July 3, 2019, which itself claims the benefit of U.S. provisional application No. 62/693,799, entitled "Beat decomposition for facilitating automatic video editing," filed on July 3, 2018, which is incorporated herein by reference in its entirety.
Background
1. Field of the invention
The subject matter of the present disclosure relates generally to video editing and, more particularly, to systems and methods for beat decomposition to facilitate beat matching.
2. Introduction
When applied properly, a musical score combined with visual content such as images or video can produce an emotionally powerful multimedia production. However, for such content to be effective, musical and visual transitions must be carefully synchronized so that the visual effects match the musical changes well. This process is sometimes known to content editors as "beat matching," a manual process that is often difficult and time consuming. That is, conventional video editing typically requires beat matching to be performed by an editing specialist, making the production of professional content difficult or inaccessible to the average consumer. However, with the continued proliferation of social media and of video-enabled mobile devices such as smart phones, consumers increasingly want to generate and share their own mixed content productions.
Drawings
In order to describe the manner in which the above-recited and other advantages and features of the disclosure can be obtained, a more particular description of the principles briefly described above will be rendered by reference to specific embodiments thereof which are illustrated in the appended drawings. Understanding that these drawings depict only typical embodiments of the disclosure and are not therefore to be considered to be limiting of its scope, the principles herein are described and explained with additional specificity and detail through the use of the accompanying drawings in which:
FIG. 1 conceptually illustrates one example of identifying musical events (musical artifacts) in a musical piece.
Fig. 2 shows exemplary types of input audio files converted into the wav format.
FIG. 3 conceptually illustrates one example of identifying musical events in a musical piece based on energy.
Fig. 4 shows an example of the application of a high-pass filter to detect high normalized energy in an audio input.
Fig. 5 shows an example of applying a band pass filter to detect a specific musical event.
FIG. 6A illustrates one example of a method for detecting a musical event in a musical piece.
FIG. 6B illustrates another example of a method for detecting a musical event in a musical piece.
Fig. 7 shows an example of a hit/miss output file for a musical piece.
FIG. 8 illustrates one example of a processor-based computing device to implement various aspects of the present technology.
Disclosure of Invention
The detailed description set forth below is intended as a description of various configurations of embodiments and is not intended to represent the only configurations in which the presently disclosed subject matter may be practiced. The accompanying drawings are incorporated in and constitute a part of this detailed description. The detailed description includes specific details in order to provide a more thorough understanding of the presently disclosed subject matter. It will be apparent, however, that the subject matter of the present disclosure is not limited to the specific details set forth herein, but may be practiced without these details. In some instances, structures and components are shown in block diagram form in order to avoid obscuring the concepts of the subject matter of the present disclosure.
Disclosed herein are computer-implemented methods, computer-readable media, and systems for identifying musical events. Identification of a musical event is carried out by first receiving a primary waveform representing a musical piece having the musical event. The primary waveform is then filtered to generate a substitute waveform associated with the musical event. The substitute waveform is then analyzed to identify points in time in the primary waveform that correspond to the musical event.
In some embodiments, the filtering of the primary waveform comprises a first filtering using two or more interleaved bandpass filters that output two or more secondary waveforms. The first filtering includes: calculating sample groups (sample moduli) for the secondary waveforms, identifying sample groups that exceed a first predetermined frequency range threshold, identifying, for each of the musical events, the frequency range having the most sample groups exceeding the first predetermined frequency range threshold, and identifying a primary list of musical events based on those sample groups of the secondary waveforms that exceed the first predetermined frequency range threshold.
In another embodiment, the filtering of the primary waveform comprises a second filtering process of the primary waveform using a low pass filter or a resampling process that outputs a three-level waveform. The second filtering process includes: calculating a sample group moving average of the three-level waveform for the frequency range of each musical event, and identifying, for each musical event, at least one sample group moving average exceeding a second predetermined frequency range threshold. The second filtering process then identifies a secondary list of musical events that includes the primary list of musical events, wherein the secondary list additionally includes musical events whose time points lie within a predetermined time range (defined by a formula image not reproduced here) of the time points of the musical events included in the primary list and that have a sample group moving average exceeding the second predetermined frequency range threshold.
In yet another embodiment, the filtering of the primary waveform further includes a third filtering of the primary waveform using a high pass filter that outputs a four-level waveform. The third filtering process includes: identifying sample groups of the four-level waveform that exceed a third predetermined frequency range threshold, and identifying a tertiary list of musical events that includes the secondary list of musical events. The tertiary list additionally includes musical events whose time points lie within a predetermined time range (defined by a formula image not reproduced here) of the time points of the musical events included in the secondary list and for which a sample group of the four-level waveform exceeds the third predetermined frequency range threshold.
Detailed Description
As described herein, the present specification addresses the limitations of the conventional (manual) beat matching methods described above by providing methods, systems, and software for implementing automatic beat matching. As explained in more detail below, automatic beat matching may be used to facilitate automatic editing of visual media content (e.g., images and videos) by mixing these visual media content with various music selections or compositions.
In some aspects, beat matching is implemented by identifying high-energy music events, for example, corresponding to sharp musical transitions in a musical composition (song). The timing of these transitions with respect to the associated song may be recorded, for example, to generate an output file indicating the temporal locations (or points in time) of the transitions within the duration of the song (e.g., "hits") and the time periods (or song segments) that do not contain such events (e.g., "miss" segments). As explained in more detail below, the resulting hit/miss file (which may also be represented as a vector) may be used to indicate cut opportunities in the visual content, i.e., locations where a visual transition is paired with the corresponding music selection.
As described herein, musical events may be identified based on certain frequency and energy characteristics. While the present disclosure specifically provides representative examples regarding identifying drum hits, i.e., bass drum, snare drum, and hi-hat, identifying other types of music events is also contemplated.
By identifying these musical events, recognizable features of a musical composition can be used in conjunction with an automated audio-video editing process to produce professional-quality mixed media output. In particular, the disclosed techniques are used in conjunction with beat matching algorithms to automate the matching between an audio input file (e.g., a musical composition) and scene transitions of a video file. Edits to the video file are made in conjunction with the recognizable features of the audio input file that corresponds to the video file. This allows anyone, even users without extensive audio-video editing experience, to produce high-quality mixed media output in which scene transitions and other effects can be applied automatically based on an accompanying audio input file (e.g., a musical composition). Details regarding automated audio-video editing that may be performed using the identified musical events of a musical composition are described in U.S. provisional application No. 62/837,122, entitled "Music-based video editing," filed on April 22, 2019 and incorporated herein by reference. More details on how to identify musical events in an audio input file, such as a musical composition, are described below.
Aspects of the disclosed technology described below address the limitations of conventional (manual) beat matching processes by providing improved methods for automatically identifying musical events (e.g., drum hits) in musical compositions. By improving the automated method for identifying music events, the present technology facilitates video editing by automatically mixing videos using the identified music events and their temporal locations (also referred to as time points) in a musical composition. Synchronization between the temporal location of a music event in a musical composition and the corresponding visual effect/scene transition applied in video editing improves the quality of the video.
The present disclosure describes a method of analyzing a musical piece to identify musical events in the musical piece based on different thresholds associated with both frequency and energy. In some aspects, identification of a music event may be implemented to identify a particular drum type, such as a bass drum or a snare drum. However, it is to be understood that the present disclosure illustrating analysis of a musical composition to identify musical events is not limited to the types of musical events described in the various embodiments below. For example, other musical features may also be identified in a musical piece, including one or more of: floor toms, hanging/rack tom-toms, hi-hat cymbals, crash cymbals, ride cymbals, and/or china cymbals, or the like.
As explained in more detail below, musical events in a musical composition may be identified based on frequency and energy. For example, certain types of music events (e.g., drum hits) are identified by their relatively high prominence (energy) and relatively short duration (e.g., 10 ms). Drum hits can thus be distinguished from other instruments, such as guitars and pianos, which typically exhibit lower-energy features and longer durations. In another example, musical events may also be identified based on frequency. Certain types of music events (e.g., drum hits) may have a single constant resonant frequency based on the geometry of the drum itself.
Figure 1 conceptually illustrates one example of identifying musical events in a musical piece (e.g., a song). The figure is a chart 100 showing an energy characterization of a musical piece, in this example the intro of the song "Back in Black" by AC/DC. In particular, the frequency content of the audio signal (ranging from 0 Hz to 22 kHz) is plotted against time.
The chart also shows different musical events (e.g., elements 110-130) shown at specific temporal locations in the musical piece at which the musical piece exhibits various different energy peaks. A musical event (e.g., a drum beat) exists at a particular point in time in a musical composition. Depending on the desired implementation, musical events may be identified based on different frequency and/or energy characteristics. As an example, the identification of a musical event may be based on determining whether the musical event has: (1) energy above a predetermined threshold occurring over a predetermined length of time (e.g., a few milliseconds); and/or (2) a resonance frequency that concentrates most of the energy.
As shown in this figure, the musical events 110 and 120 are two different points in the graph 100 of the energy characterization of a musical composition that correspond to different musical events that may be detected using the present technology. The energy level of each of the music events 110 and 120 is above a predetermined minimum threshold. The graph 100 showing the energy characterization shows that the musical events 110 and 120 have their respective highest energy peaks at different points in time in the musical composition.
A music event, such as a drum beat, may be characterized by an energy above a first threshold for a predetermined length of time. Also, the music event may have an average energy level (based on tens of subsequent samples or sample groups) above a second threshold. The first and second thresholds used to evaluate the energy and average energy of a musical event may be customized based on, for example, the characteristics of the musical piece and of each of the musical events to be detected. This customization may be implemented, for example, by a user of the present technology. The first and second thresholds may also be customized based on the instrument (e.g., the type of drum) associated with the musical event (e.g., drum strike) to be detected.
Also, each of the music events 110 and 120 is associated with a peak energy centered around a different frequency. Based on the frequencies around which the peak energy of the audio signal is concentrated, it may be possible to specifically identify the type of musical event in the chart 100 of the energy representation of a musical piece. For example, a musical event having a peak energy centered at about 60 to 70 Hz may correspond to the bass drum 110. Conversely, if the peak energy of a music event is centered around 130 to 150 Hz, this may indicate that the music event is a snare drum 120.
Different musical events will have different concentrated frequency ranges in which their peak energy lies. These frequency ranges do not overlap for different musical events, which allows the present technique to distinguish between them. For example, for the bass drum 110, the threshold frequency range may be as wide as 40 to 100 Hz. Conversely, the threshold frequency range for detecting the snare drum 120 may be as wide as 110 to 170 Hz. Frequency ranges other than those described above for the bass and snare drums would be assigned to other music events.
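By way of a non-limiting illustration only, the following Python sketch classifies a candidate event according to the frequency band in which its peak energy is concentrated. The function name and the FFT-based peak search are assumptions of this sketch rather than requirements of the disclosed technology; the band edges follow the ranges quoted above and, as noted, may be customized.

    import numpy as np

    # Illustrative frequency bands (Hz) based on the ranges quoted above; the
    # disclosure allows these thresholds to be customized per piece and instrument.
    EVENT_BANDS = {
        "bass_drum": (40.0, 100.0),
        "snare_drum": (110.0, 170.0),
    }

    def classify_event(segment, fs):
        """Return the event type whose band contains the peak-energy frequency."""
        spectrum = np.abs(np.fft.rfft(segment)) ** 2       # energy per frequency bin
        freqs = np.fft.rfftfreq(len(segment), d=1.0 / fs)
        peak_freq = freqs[int(np.argmax(spectrum))]        # where the energy is concentrated
        for name, (lo, hi) in EVENT_BANDS.items():
            if lo <= peak_freq <= hi:
                return name
        return None                                        # no known event band matched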
As described above, music events 110 and 120 (e.g., drum hits) have energy above a first threshold in a high frequency range. The chart 100 of the energy characterization of a musical composition shows that the musical events 110 (e.g., a bass drum) and 120 (e.g., a snare drum) each have energy in a high frequency range 130. The high frequency range 130 represents music events having energy at high frequencies. The high frequency range 130 is used as an initial determination (or "gate") for identifying the potential presence of a musical event at a point in time in the musical piece. This initial determination associated with the high frequency range 130 is also referred to as an "O-zone". While the presence of a "gate" in the high frequency range 130 is intended to identify a temporal location in the musical piece where a musical event may be located, the presence of a "gate" in the high frequency range 130 does not necessarily mean that a musical event is present. It is possible that, despite energy being present in the high frequency range 130, it is determined that no musical event is present. However, the absence of a "gate" in the high frequency range 130 at a point in time in the musical piece does indicate that no musical event 110, 120 (e.g., a drum hit such as a bass drum or snare drum) is present at that particular time in the musical piece.
To obtain the energy and frequency associated with a musical piece, the musical piece is processed so that it can be converted from an audio format to an analyzable digital format (also referred to as a primary waveform). The audio input file is converted to a wav file as described herein (although other formats are possible).
Fig. 2 shows exemplary types of input audio files that may be converted to the .wav format (e.g., a primary waveform). For example, audio inputs in formats such as .mp3, .aac, .m4a, and .ogg may all be converted to the .wav format.
In some aspects, the .wav format will be used in connection with identifying musical events in a musical piece. It should be noted that audio input files of types other than .mp3, .aac, .m4a, and .ogg may also be used and converted to the .wav format. Also, digital formats other than .wav may be used without departing from the scope of the disclosed technology.
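As a minimal sketch of this conversion and loading step (assuming pydub with an ffmpeg backend for decoding the compressed formats, and using hypothetical file names), one possible implementation in Python is:

    import numpy as np
    from pydub import AudioSegment      # requires ffmpeg for .mp3/.aac/.m4a/.ogg decoding
    from scipy.io import wavfile

    # Convert a compressed input (hypothetical file name) to .wav, then load it.
    AudioSegment.from_file("song.mp3").export("song.wav", format="wav")

    fs, samples = wavfile.read("song.wav")   # fs in Hz, samples as integer PCM
    if samples.ndim == 2:                    # mix stereo down to mono for analysis
        samples = samples.mean(axis=1)
    primary_waveform = samples.astype(np.float64)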
FIG. 3 conceptually illustrates one example of identifying musical events in a musical piece based on energy. In particular, the figure shows a diagram 300 illustrating the application of a symmetric square window 310, which can be used to detect potential musical events in a musical piece based on the average energy of the audio signal 320. As shown, the energy is defined according to the following equation:
NRJ′(k) = Σ_k ABS(x_i)
where "k" corresponds to the number of sampling groups and "ABS ()" corresponds to an absolute value function of the energy of the musical piece.
As shown in FIG. 3, the absolute value of the energy associated with a musical piece is displayed over a period of time. The square window 310 is used to identify where the average energy of the musical piece over a period of time (corresponding to a predetermined number of samples or sample groups) is above a predetermined threshold (TH1) 330. In this way, the square window is used to identify points in time at which a musical event in the musical piece may be located. These time points are called "X-zones".
It should be noted that the predetermined threshold (TH1) 330 may initially be set at a default value, but may also be defined by the user. Making the predetermined threshold (TH1) 330 user-definable allows the present technology to take into account the different characteristics of the musical events that may be present in a musical piece, and allows the user to specify which musical events to detect.
Since the square window 310 takes into account multiple samples (or sample groups) when identifying whether a musical event is present in a musical piece, a threshold peak selection process is implemented to identify the sample (or sample group) having the highest peak to represent the musical event at that point in time.
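One possible reading of this windowed average-energy step and the accompanying peak selection, expressed as a Python sketch, is given below. The group size, window width, and threshold value are illustrative assumptions, since the disclosure leaves TH1 user-definable.

    import numpy as np

    def find_x_zones(x, group_size=512, window_groups=32, th1_ratio=0.3):
        """Locate candidate event positions ("X-zones") from average energy.

        x             -- normalized primary waveform (floats in [-1, 1])
        group_size    -- samples per sample group k
        window_groups -- width of the symmetric square window, in groups
        th1_ratio     -- illustrative stand-in for the predetermined threshold TH1,
                         expressed here relative to the maximum average energy
        """
        n_groups = len(x) // group_size
        groups = x[: n_groups * group_size].reshape(n_groups, group_size)
        nrj = np.abs(groups).sum(axis=1)                   # NRJ'(k) = sum over group k of |x_i|
        window = np.ones(window_groups) / window_groups    # symmetric square window
        avg_nrj = np.convolve(nrj, window, mode="same")    # average energy around each group
        above = np.flatnonzero(avg_nrj > th1_ratio * avg_nrj.max())
        x_zones = []
        if above.size:
            # within each contiguous run above the threshold, keep the group with the
            # largest energy peak to represent the potential event at that point in time
            runs = np.split(above, np.flatnonzero(np.diff(above) > 1) + 1)
            for run in runs:
                best = run[int(np.argmax(nrj[run]))]
                x_zones.append(int(best) * group_size)     # sample index of the candidate
        return x_zones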
Fig. 4 shows an example of the application of a high-pass filter to detect high normalized energy in an audio input (or primary waveform). The high pass filter is useful for detecting musical events in a musical piece. As described above with respect to FIG. 1, a musical event has energy in a high frequency range. By using the high pass filter 400, the musical piece 410 may be filtered to output only the high frequency portion 420 (the four-level waveform). The high frequency portion 420 may then correspond to points in time at which the musical piece has the required energy in the high frequency range (see FIG. 1, high frequency range 130). In this manner, the high pass filter 400 may be used to make an initial determination (or "gate") of where potential musical events may be detected in the musical piece (also referred to as "O-zones").
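A minimal sketch of such a high-pass "gate" detection is shown below; the filter order, cutoff frequency, and threshold are illustrative assumptions only, as the disclosure does not prescribe specific values.

    import numpy as np
    from scipy.signal import butter, sosfilt

    def find_o_zones(x, fs, cutoff_hz=8000.0, ratio=0.25, group_size=512):
        """Flag sample groups with significant high-frequency energy ("O-zones")."""
        sos = butter(4, cutoff_hz, btype="highpass", fs=fs, output="sos")
        high = sosfilt(sos, x)                              # the four-level waveform
        n_groups = len(high) // group_size
        groups = np.abs(high[: n_groups * group_size]).reshape(n_groups, group_size)
        energy = groups.sum(axis=1)
        gate = np.flatnonzero(energy > ratio * energy.max())
        return (gate * group_size).tolist()                 # sample indices of the "gates"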
Fig. 5 shows one example of the application of a band pass filter 500 (or possibly two or more interleaved band pass filters) to the detection of a particular musical event. One or more band pass filters 500 (e.g., a stage with interleaved band pass filters) may be used to remove high and low frequency components of the musical piece 510 by passing frequencies within a range of frequencies (e.g., a frequency band) and rejecting or attenuating frequencies outside the range of frequencies. Depending on the type of musical event detected (e.g., drum beat), the corresponding band pass filter 500 may be designed to filter the musical composition to provide only an output corresponding to the frequency associated with the type of musical event.
As described above with respect to FIG. 1, different musical events have peak energies concentrated around different frequency ranges. For example, the bass drum may be associated with a frequency range of 40 Hz to 100 Hz. Conversely, the snare drum may be associated with a frequency range of 120 Hz to 170 Hz. Other frequency ranges may be possible for other detectable types of musical events. In any case, the corresponding band pass filter 500 may be designed for the frequency range associated with the type of musical event to be detected. The output 520 (after filtering the musical piece 510 with the band pass filter 500, also referred to as an interleaved band pass filter) is used to identify the portions of the musical piece 510 where the particular musical event is located. The output 520 may also be referred to as a secondary waveform.
For example, one band pass filter may be designed to filter the musical piece 510 specifically for frequencies between 40 Hz and 100 Hz. The output 520 will then show the portions of the musical piece where a bass drum was detected. In particular, the output 520 (after filtering the musical piece 510 with the band pass filter 500) will identify the temporal locations in the musical piece where energy is concentrated in the 40 Hz to 100 Hz frequency range.
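A corresponding band pass stage could be sketched as follows; the Butterworth design and the filter order are assumptions of this illustration, since the disclosure does not prescribe a particular filter design.

    from scipy.signal import butter, sosfilt

    def bandpass_for_event(x, fs, low_hz, high_hz, order=4):
        """Return a secondary waveform containing only the event's frequency band.

        For example, low_hz=40, high_hz=100 targets bass drum hits, while
        low_hz=120, high_hz=170 targets snare drum hits (ranges quoted above).
        """
        sos = butter(order, [low_hz, high_hz], btype="bandpass", fs=fs, output="sos")
        return sosfilt(sos, x)

    # e.g. the secondary waveform for the bass drum band:
    # bass_track = bandpass_for_event(primary_waveform, fs, 40.0, 100.0)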
FIG. 6A illustrates one example of a method 600 for detecting a musical event in a musical piece. The method 600 is used to analyze a musical piece to identify a temporal location of a musical event within the musical piece.
In step 605, an audio signal representing a musical composition is received. A musical composition has an unknown number of musical events to be identified. As described above, musical events correspond to different identifiable characteristics of a musical composition. Some exemplary musical events include drumming (e.g., bass drum, snare drum), however, other types of events may be detected using the method 600 without departing from the techniques of this disclosure.
After receiving the audio signal of the musical piece, the audio signal is then digitized in step 610 to generate a digital version of the audio file (also referred to as a primary waveform). The digital version of the musical composition is provided in a format (e.g.,. wav) that can be further processed and analyzed.
In step 615, the digital audio signal is further processed. In particular, the digital audio signal is processed so that the waveform associated with the musical events can be normalized. In one example of a normalization process, the peak values associated with the digital audio signal are normalized to the range [-1, 1]. Other normalization limits may be considered and implemented in conjunction with the present techniques without departing from the scope of the disclosed technology.
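A minimal sketch of such peak normalization, assuming the [-1, 1] limits described above, is:

    import numpy as np

    def normalize_peak(x):
        """Scale the digital audio signal so its peaks lie within [-1, 1]."""
        peak = np.max(np.abs(x))
        return x / peak if peak > 0 else x

    # e.g. normalized = normalize_peak(primary_waveform)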
After normalizing the digital audio signal, the audio signal may be analyzed in step 620 to identify musical events. The analysis of the audio signal may comprise a number of different filtering processes. Also, a density of musical events detected over a period of time in the musical piece may be determined.
A temporal location of a musical event in a musical composition, and the type of musical event at that location, may be determined. Identifying a musical event in step 620 may include a number of different steps (described above in FIGS. 3-5). For example, a musical event is detected in a musical composition when the signal has the following indications: (1) the signal has an energy above a first predetermined threshold corresponding to a high frequency; (2) the signal has an average energy above a second predetermined threshold over a period of time; and (3) the signal has a resonant frequency, at a predetermined threshold, around which most of the energy is concentrated.
Based on the results of the above analysis, it may be determined what musical event was detected and where in the musical composition the musical event was located. The identification of specific music events (e.g., drum hits) and their locations in a musical composition will be used with U.S. provisional application 62/837,122, which is incorporated herein by reference, to automate the audio-video editing process.
Based on the different detected musical events, a further calculation may be performed that identifies a density of musical events for different portions of the musical composition. In particular, the density of music events corresponds to the number of different music events present within a predetermined time period. For example, an intro period (e.g., 5 seconds) of a musical piece may have two different detected musical events. However, during later portions of the same musical piece, more musical events (e.g., 15) may be detected within a span of 5 seconds. The duration used to evaluate the density of music events may be user-defined. For example, when selecting the duration over which the density of a portion of a musical piece is calculated, the user may consider the characteristics of the musical piece.
In calculating the musical event density, the number of musical events detected in a portion of a musical piece may be characterized and compared against different portions of the same musical piece. Comparisons may also be made against portions of different musical pieces. In general, the higher the density of a portion of a musical piece, the more musical events are detected within that duration.
In step 625, a hit/miss output file may be generated based on the calculation of the density of music events performed in step 620. The hit/miss output file may be used to identify portions of the musical piece where a threshold minimum musical event density (e.g., a number of musical events per time period) is detected. Portions of the musical piece having a number of musical events greater than the predetermined minimum density are labeled "hit" segments, while portions that do not have the desired minimum musical event density are labeled "miss" segments. It should be noted that the user can customize the minimum number of events over a period of time (e.g., the threshold density) that will be used to characterize portions of a musical piece as "hit" or "miss" segments.
The hit/miss output file may be used to indicate when a segment of a musical piece has a desired number of musical events (or has a predetermined density of musical events). When used in conjunction with an automated method for audio-video editing (as described in U.S. provisional application 62/837,122, which is incorporated herein by reference), a computing device may be instructed to skip portions of a musical piece classified as "misses." This makes the computing device more efficient at audio-video editing by skipping portions where no edits are applied. Meanwhile, when a portion of a musical piece contains a "hit" segment, this corresponds to an opportunity to edit the video based on the corresponding portion of the musical piece.
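By way of illustration, the sketch below labels fixed-length windows of a piece as "hit" or "miss" from a list of detected event time points. The window length and minimum event count are user-definable in the disclosure, so the defaults here are assumptions.

    import numpy as np

    def hit_miss_segments(event_times_s, duration_s, window_s=5.0, min_events=3):
        """Label fixed windows of a piece as "hit" or "miss" by musical event density.

        event_times_s -- time points (in seconds) of the detected musical events
        window_s      -- user-definable evaluation period (5 s as in the example above)
        min_events    -- user-definable minimum number of events per window (threshold density)
        """
        events = np.asarray(event_times_s, dtype=float)
        segments = []
        start = 0.0
        while start < duration_s:
            end = min(start + window_s, duration_s)
            count = int(np.count_nonzero((events >= start) & (events < end)))
            segments.append((start, end, "hit" if count >= min_events else "miss"))
            start += window_s
        return segments

    # A hit/miss output file could then be written with one segment per line, e.g.:
    # for start, end, label in hit_miss_segments(times, duration):
    #     print(f"{start:.2f}\t{end:.2f}\t{label}")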
FIG. 6B illustrates another example of a method 640 for detecting a musical event in a musical piece. In particular, method 640 begins with receiving, in step 650, a primary waveform that will be used to identify what musical events are present in a musical piece. The primary waveform (typically a .wav file, though other file types are possible) represents a musical piece that has been processed into a format that can be filtered and analyzed.
After the primary waveform is received in step 650, a number of different filtering processes are performed. A first filtering is performed in steps 660 to 667. The first filtering includes using band pass filters in step 660, which generate different secondary waveforms 662 (associated with different frequency ranges) based on the type of band pass filter used in step 660. From these secondary waveforms, a peak selection process is implemented, wherein for each of the different music events in the different frequency ranges, the sample having the highest peak is selected. These selected samples are assembled into a primary list of music events in step 667. This primary list will be used in later filtering steps (see steps 677 and 687) for comparison with the time points at which musical events were located, in order to create (in step 690) a final list of musical events corresponding to the actual musical events detected in the musical piece.
A second filtering process is also performed on the primary waveform in steps 670 to 677. The second filtering process uses a low pass filter (or resampling process) in step 670 to generate a three-level waveform in step 672. Based on the three-level waveform, an average energy is calculated in step 675. In step 677, indications of possible musical events are identified (and stored in a secondary list) based on detecting where the three-level waveform exceeds a predetermined threshold. These indications of possible musical events (also referred to as X-zones) are located within a time range of the musical events identified in the primary list.
A third filtering process is also performed on the primary waveform in steps 680-687. The third filtering process uses a high pass filter in step 680 to generate a four-level waveform in step 682. Based on the four-level waveform, different sample groups that exceed a predetermined threshold are identified in step 685. These sample groups are also identified within a time frame relative to where the musical events identified in the primary list are located. These sample groups are then classified as "O-zones" in step 687. An O-zone represents another possible indication that a musical event is present in the musical composition. These possible indications (the O-zones) are stored in a tertiary list, which can be referred to later.
After the various filtering steps described above have been performed, the primary list of music events (created in step 667) is compared in step 690 with the lists of possible music events (created in steps 677 and 687) to confirm the final list of music events. In particular, the final list of musical events will correspond to points in time at which the indications of possible musical events (i.e., the O-zones and X-zones) match. This is consistent with the discussion above, where a music event has: (1) an energy above a first predetermined threshold corresponding to a high frequency (detected by the high-pass filtering process); (2) an average energy above a second predetermined threshold over a period of time (detected by the low-pass filtering or resampling process); and (3) a resonant frequency, at a predetermined threshold, around which most of the energy is concentrated (detected by a band-pass filter).
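One possible way to express this final matching step in code is sketched below. The tolerance value is an assumption of this sketch, since the allowed time range is given in the disclosure by a formula image that is not reproduced in this text.

    import numpy as np

    def confirm_events(primary_list, x_zones, o_zones, fs, tolerance_ms=30.0):
        """Keep only primary-list events confirmed by both an X-zone and an O-zone."""
        tol = int(tolerance_ms * 1e-3 * fs)                 # tolerance in samples
        x_zones = np.asarray(x_zones)
        o_zones = np.asarray(o_zones)
        final_list = []
        for t in primary_list:                              # t: sample index of a candidate event
            near_x = x_zones.size > 0 and bool(np.any(np.abs(x_zones - t) <= tol))
            near_o = o_zones.size > 0 and bool(np.any(np.abs(o_zones - t) <= tol))
            if near_x and near_o:
                final_list.append(t)
        return final_list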
Fig. 7 shows an example of hit/miss output files of a musical piece. As depicted in the figure, the distinction between "hit" and "miss" segments found in a musical piece may be represented by their respective overall energy levels. The lower curve corresponds to a "missed" segment of the musical piece, while a "hit" segment of the musical piece will have a greater amount of energy corresponding to the higher curve. The "hit" and "miss" segments may also be represented as vectors.
FIG. 8 illustrates one example of a processor-based computing device 800 for implementing various aspects of the present technology.
For example, the processor-based computing device 800 may be used to implement a video editing device configured to mix and beat match audio and video inputs. It will also be appreciated that the processor-based computing device 800 may be used in conjunction with one or more other processor-based devices, for example, as part of a computer network or computing cluster.
The processor-based computing device 800 includes a main central processing unit (CPU) 862, interfaces 868, and a bus 815 (e.g., a PCI bus). The CPU 862 preferably performs all these functions under the control of software, including an operating system and any appropriate application software. CPU 862 may include one or more processors 863, such as a processor from the Motorola family of microprocessors or the MIPS family of microprocessors. In an alternative embodiment, the processor 863 is specially designed hardware for controlling the operations of the processor-based computing device 800. In a particular embodiment, memory 861 (e.g., non-volatile RAM and/or ROM) also forms part of CPU 862. However, there are many different ways in which memory could be coupled to the system.
Interfaces 868 may be provided as interface cards (sometimes referred to as "line cards"). Generally, they control the sending and receiving of data packets over the network, sometimes in support of other peripherals used with the router 810. Among the interfaces that may be provided are Ethernet interfaces, frame relay interfaces, cable interfaces, DSL interfaces, token ring interfaces, and the like. In addition, various very high-speed interfaces may be provided, such as fast token ring interfaces, wireless interfaces, Ethernet interfaces, Gigabit Ethernet interfaces, ATM interfaces, HSSI interfaces, POS interfaces, FDDI interfaces, and the like. Generally, these interfaces may include ports appropriate for communication with the appropriate media. In some cases, they may also include an independent processor and, in some cases, volatile RAM. The independent processors may control communications-intensive tasks such as packet switching, media control, and management. By providing separate processors for the communications-intensive tasks, these interfaces allow the CPU 862 to efficiently perform routing computations, network diagnostics, security functions, etc.
Although the system shown in fig. 8 is one specific network device of the present invention, it is by no means the only device architecture in which the present invention can be implemented. For example, an architecture having a single processor that handles communications as well as routing computations, etc. is often used. Also, other types of interfaces and media could be used with the router.
Regardless of the network device's configuration, it may employ one or more memories or memory modules (including memory 861) configured to store program instructions for the general-purpose network operations and mechanisms. The program instructions may control the operation of an operating system and/or one or more applications, for example.
For clarity of explanation, in some instances the present technology may be described as including individual functional modules, including functional modules that comprise devices, device components, steps or routines in a method embodied in software, or combinations of hardware and software.
In some embodiments, the computer-readable storage devices, media, and memories may comprise wired or wireless signals including bit streams and the like. However, when referred to, non-transitory computer readable storage media expressly exclude media such as energy, carrier wave signals, electromagnetic waves, and signals per se.
Methods according to the above examples may be implemented using computer-executable instructions stored or available from computer-readable media. Such instructions may include, for example, instructions and data which cause or configure a general purpose computer, special purpose computer, or special purpose processing device to perform a certain function or group of functions. Portions of the computer resources used are accessible over the network. The computer-executable instructions may be, for example, binaries, intermediate format instructions such as assembly language, firmware, or source code. Examples of such computer readable media that may be used to store instructions, information used, and/or information created during a method according to the described examples include magnetic or optical disks, flash memory, USB devices with non-volatile memory, network storage devices, and so forth.
An apparatus implementing methods in accordance with these disclosures may include hardware, firmware, and/or software, and may take any of a variety of forms. Typical examples of such forms include notebooks, smart phones, small personal computers, personal digital assistants, rack-mounted devices, stand-alone devices, and the like. The functionality described herein may also be implemented in a peripheral device or expansion card. As another example, such functionality may also be implemented on a circuit board, among different chips or different processes executing in a single device.
The instructions, media for transmitting such instructions, computing resources for executing them and other structures for supporting such computing resources are means for providing the functionality explained in this disclosure.
The statement of the present disclosure includes:
statement 1: a computer-implemented method for identifying musical events, the method comprising: receiving a primary waveform representing a musical piece, wherein the musical piece includes a plurality of musical events; filtering the primary waveform to generate a substitute waveform associated with the plurality of musical events; and automatically analyzing the substitute waveform to identify points in time in the primary waveform that correspond to the plurality of musical events.
Statement 2: the computer-implemented method of statement 1, wherein the filtering of the primary waveform comprises a first filtering process using two or more interleaved bandpass filters, the first filtering process outputting two or more secondary waveforms, and the first filtering process comprises: a) calculating a set of samples of the two or more secondary waveforms; b) identifying the set of samples that exceed a first predetermined frequency range threshold, wherein each of the musical events has a different frequency range threshold; c) identifying, for each of the musical events, which of the two or more secondary waveforms is characterized by a maximum number of sample groups that exceed the first predetermined frequency range threshold; and d) for each of the musical events, identifying a primary manifest for the musical event based on the sample set of the two or more secondary waveforms characterized by the most sample set exceeding the first predetermined frequency range threshold.
Statement 3: the computer-implemented method of statement 1 or 2, wherein the filtering of the primary waveform further comprises a second filtering process on the primary waveform using a low pass filter or a resampling process, wherein the second filtering outputs a three-level waveform, and the second filtering process further comprises: a) for each frequency range of music events, a sample set shift of the three-level waveform is calculatedAnd, for each of the musical events, identifying at least one sample group moving average exceeding a second predetermined frequency range threshold; and b) identifying a secondary list of musical events, the secondary list including the primary list of musical events, wherein the secondary list of musical events includes musical events that are: these music events being related to the point in time of the music events comprised in said primary list
Figure BDA0002865647040000171
Time points within a range have a sample group moving average exceeding the second predetermined frequency range threshold.
Statement 4: the computer-implemented method of any of the preceding claims 1-3, wherein the filtering of the primary waveform further comprises implementing a third filtering on the primary waveform using a high pass filter, wherein the third filtering outputs a four-level waveform, and the third filtering comprises: a) identifying a set of samples of the four-level waveform that exceed a third predetermined second frequency range threshold; and b) identifying a tertiary list of musical events, the tertiary list including the secondary list of musical events, wherein the tertiary list of musical events includes musical events that are: these music events being related to the point in time of the music events comprised in said secondary list
Figure BDA0002865647040000172
A set of samples within a range having the four-level waveform exceeding the third predetermined frequency range threshold.
Statement 5: the computer-implemented method of any of the preceding statements 1-4, wherein the automatic analysis of the substitute waveform further comprises identifying song segments in the primary waveform representing the musical piece for which a number of musical events exceeds a predetermined density threshold.
Statement 6: the computer-implemented method of any of the preceding claims 1-5, further comprising generating a hit/miss output file that identifies the song segments in the primary waveform for which the number of music events exceeds the predetermined density threshold.
Statement 7: the computer-implemented method of any of the preceding claims 1-6, wherein the plurality of musical events comprises one or more of a base drum and a snare drum.
Statement 8: a non-transitory computer-readable medium comprising instructions that identify a musical event, which when executed by a computing system, cause the computing system to: receiving a primary waveform representing a musical piece, wherein the musical piece includes a plurality of musical events; filtering the primary waveform to generate a substitute waveform associated with the plurality of musical events; and automatically analyzing the substitute waveform to identify points in time in the primary waveform that correspond to the plurality of musical events.
Statement 9: the non-transitory computer-readable medium of statement 8, wherein the instructions for the filtering of the primary waveform further comprise implementing a first filtering process that uses two or more interleaved bandpass filters, the first filtering process outputting two or more secondary waveforms, and the first filtering process comprising: a) calculating a set of samples of the two or more secondary waveforms; b) identifying the set of samples that exceed a first predetermined frequency range threshold, wherein each of the musical events has a different frequency range threshold; c) identifying, for each of the musical events, which of the two or more secondary waveforms is characterized by a maximum number of sample groups that exceed the first predetermined frequency range threshold; and d) for each of the musical events, identifying a primary manifest for the musical event based on the sample set of the two or more secondary waveforms characterized by the most sample set exceeding the first predetermined frequency range threshold.
Statement 10: the non-transitory computer-readable medium of statement 8 or 9, wherein the instructions for the filtering of the primary waveform further comprise implementing a second filtering process on the primary waveform using a low pass filter or a resampling process, wherein the second filtering outputs a three-level waveform, and theThe second filtering process further includes: a) calculating a sample set moving average of the three-level waveform for the frequency range of each music event, and, for each of the music events, identifying at least one sample set moving average that exceeds a second predetermined frequency range threshold; and b) identifying a secondary list of musical events, the secondary list including the primary list of musical events, wherein the secondary list of musical events includes musical events that are: these music events being related to the point in time of the music events comprised in said primary list
Figure BDA0002865647040000181
Time points within a range have a sample group moving average exceeding the second predetermined frequency range threshold.
Statement 11: the non-transitory computer-readable medium of any of the preceding claims 8-10, wherein the instructions for the filtering of the primary waveform further comprise implementing a third filtering process on the primary waveform that uses a high pass filter, wherein the third filtering outputs a four-level waveform, and the third filtering comprises: a) identifying a set of samples of the four-level waveform that exceed a third predetermined second frequency range threshold; and b) identifying a tertiary list of musical events, the tertiary list including the secondary list of musical events, wherein the tertiary list of musical events includes musical events that are: these music events being related to the point in time of the music events comprised in said secondary list
Figure BDA0002865647040000191
A set of samples within a range having the four-level waveform exceeding the third predetermined frequency range threshold.
Statement 12: the non-transitory computer-readable medium of any one of the preceding statements 8-11, wherein the instructions for the automatic analysis of the substitute waveform further comprise identifying song segments in the primary waveform representing the musical piece for which a number of musical events exceeds a predetermined density threshold.
Statement 13: the non-transitory computer-readable medium of any one of preceding claims 8 to 12, wherein the instructions to identify music events further comprise generating a hit/miss output file that identifies the song segments in the primary waveform for which the number of music events exceeds the predetermined density threshold.
Statement 14: the non-transitory computer readable medium of any one of the preceding claims 8-13, wherein the plurality of musical events includes one or more of a base drum and a snare drum.
Statement 15: a system for identifying musical events, the system comprising: a processor and a non-transitory computer-readable medium storing instructions that, when executed by the system, cause the system to: receiving a primary waveform representing a musical piece, wherein the musical piece includes a plurality of musical events; filtering the primary waveform to generate a substitute waveform associated with the plurality of musical events; and automatically analyzing the substitute waveform to identify points in time in the primary waveform that correspond to the plurality of musical events.
Statement 16: the system of statement 15, wherein the instructions for the filtering of the primary waveform further comprise implementing a first filtering process that uses two or more interleaved bandpass filters, the first filtering process outputting two or more secondary waveforms, and the first filtering process comprising: a) calculating a set of samples of the two or more secondary waveforms; b) identifying the set of samples that exceed a first predetermined frequency range threshold, wherein each of the musical events has a different frequency range threshold; c) identifying, for each of the musical events, which of the two or more secondary waveforms is characterized by a maximum number of sample groups that exceed the first predetermined frequency range threshold; and d) for each of the musical events, identifying a primary manifest for the musical event based on the sample set of the two or more secondary waveforms characterized by the most sample set exceeding the first predetermined frequency range threshold.
Statement 17: system of claims 15 or 16Wherein the instructions for the filtering of the primary waveform further comprise implementing a second filtering process on the primary waveform using a low pass filter or a resampling process, wherein the second filtering outputs a three-level waveform, and the second filtering process further comprises: a) calculating a sample set moving average of the three-level waveform for the frequency range of each music event, and, for each of the music events, identifying at least one sample set moving average that exceeds a second predetermined frequency range threshold; and b) identifying a secondary list of musical events, the secondary list including the primary list of musical events, wherein the secondary list of musical events includes musical events that are: these music events being related to the point in time of the music events comprised in said primary list
Figure BDA0002865647040000201
Time points within a range have a sample group moving average exceeding the second predetermined frequency range threshold.
Statement 18: the system of any of the preceding claims 15-17, wherein the instructions for the filtering of the primary waveform further comprise implementing a third filtering process on the primary waveform that uses a high pass filter, wherein the third filtering outputs a four-level waveform, and the third filtering comprises: a) identifying a set of samples of the four-level waveform that exceed a third predetermined second frequency range threshold; and b) identifying a tertiary list of musical events, the tertiary list including the secondary list of musical events, wherein the tertiary list of musical events includes musical events that are: these music events being related to the point in time of the music events comprised in said secondary list
Figure BDA0002865647040000202
A set of samples within a range having the four-level waveform exceeding the third predetermined frequency range threshold.
Statement 19: the system of any of the preceding claims 15-18, wherein the instructions for the automatic analysis of the substitute waveform further comprise identifying song segments in the primary waveform representing the musical piece for which a number of musical events exceeds a predetermined density threshold.
Statement 20: the system of any of the preceding claims 15-19, wherein the instructions to identify music events further comprise generating a hit/miss output file that identifies the song segments in the primary waveform for which the number of music events exceeds the predetermined density threshold.
Statement 21: the system of any of the preceding claims 15-20, wherein the plurality of musical events comprises one or more of a base drum and a snare drum.
While various examples and other information may be used to explain various aspects within the scope of the appended claims, no limitation to the claims should be inferred based on the specific features or arrangements of such examples, as one of ordinary skill in the art would be able to use these examples to derive various embodiments. Furthermore, although the subject matter may have been described in language specific to examples of structural features and/or methodological steps, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the described features or acts. For example, such functionality may be distributed in different ways or implemented in different components than those described herein. Rather, the described features and steps are disclosed as examples of system components and methods within the scope of the claims.

Claims (21)

1. A computer-implemented method for identifying musical events, the method comprising:
receiving a primary waveform representing a musical piece, wherein the musical piece includes a plurality of musical events;
filtering the primary waveform to generate a substitute waveform associated with the plurality of musical events; and
automatically analyzing the substitute waveform to identify points in time in the primary waveform that correspond to the plurality of musical events.
2. The computer-implemented method of claim 1, wherein the filtering of the primary waveform comprises a first filtering process using two or more interleaved bandpass filters, the first filtering process outputting two or more secondary waveforms, and the first filtering process comprises:
a) calculating a set of samples of the two or more secondary waveforms;
b) identifying the set of samples that exceed a first predetermined frequency range threshold, wherein each of the musical events has a different frequency range threshold;
c) identifying, for each of the musical events, which of the two or more secondary waveforms is characterized by a maximum number of sample groups that exceed the first predetermined frequency range threshold; and
d) for each of the music events, identifying a primary manifest for the music event based on the sample groups of the two or more secondary waveforms that are characterized by the most sample groups that exceed the first predetermined frequency range threshold.
3. The computer-implemented method of claim 2, wherein the filtering of the primary waveform further comprises a second filtering process of the primary waveform using a low pass filter or a resampling process, wherein the second filtering outputs a three-level waveform, and the second filtering process further comprises:
a) calculating a sample group moving average of the three-level waveform for the frequency range of each musical event, and, for each of the musical events, identifying at least one sample group moving average that exceeds a second predetermined frequency range threshold; and
b) identifying a secondary list of musical events, the secondary list including the primary list of musical events, wherein the secondary list further includes musical events whose time points lie within a range [formula FDA0002865647030000021] of the time points of the musical events included in the primary list and whose sample group moving averages exceed the second predetermined frequency range threshold.
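The second filtering process of claim 3 can be sketched as follows. The published text gives the time window only as a formula image, so a fixed window of 60 ms on either side is assumed here, as are the 200 Hz low-pass cutoff, the eight-group moving-average span and the threshold; for brevity the sketch also shows a single frequency range rather than one per event type.

import numpy as np
from scipy.signal import butter, lfilter

GROUP = 512

def low_pass(x, rate, cutoff_hz=200.0, order=4):
    b, a = butter(order, cutoff_hz, btype="lowpass", fs=rate)
    return lfilter(b, a, x)                      # the "three-level" (tertiary) waveform

def group_energy(x):
    n = x.size // GROUP
    return np.square(x[:n * GROUP]).reshape(n, GROUP).sum(axis=1)

def moving_average(values, span=8):
    return np.convolve(values, np.ones(span) / span, mode="same")

def secondary_list(primary_waveform, rate, primary_groups,
                   second_threshold=2.0, window_s=0.060):
    """Primary-list groups plus groups whose moving average clears the second
    threshold within +/- window_s of some primary-list group."""
    tertiary = low_pass(primary_waveform, rate)
    averaged = moving_average(group_energy(tertiary))
    candidates = np.flatnonzero(averaged > second_threshold * np.median(averaged))
    window_groups = int(round(window_s * rate / GROUP))
    primary_groups = np.asarray(primary_groups, dtype=int)
    confirmed = [g for g in candidates
                 if np.any(np.abs(primary_groups - g) <= window_groups)]
    return np.union1d(primary_groups, np.asarray(confirmed, dtype=int))

The returned array plays the role of the secondary list: every primary-list group plus any group whose moving average clears the second threshold within the assumed window of a primary-list group.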
4. The computer-implemented method of claim 3, wherein the filtering of the primary waveform further comprises implementing a third filtering on the primary waveform using a high pass filter, wherein the third filtering outputs a four-level waveform, and the third filtering comprises:
a) identifying a set of samples of the four-level waveform that exceed a third predetermined frequency range threshold; and
b) identifying a tertiary list of musical events, the tertiary list including the secondary list of musical events, wherein the tertiary list further includes musical events whose time points lie within a range [formula FDA0002865647030000022] of the time points of the musical events included in the secondary list and that have sample groups of the four-level waveform exceeding the third predetermined frequency range threshold.
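The third filtering process of claim 4 mirrors the previous step with a high-pass filter, so only the differences are sketched. Again the window stands in for the formula image, and the 2 kHz cutoff and the threshold are assumptions.

import numpy as np
from scipy.signal import butter, lfilter

GROUP = 512

def high_pass(x, rate, cutoff_hz=2000.0, order=4):
    b, a = butter(order, cutoff_hz, btype="highpass", fs=rate)
    return lfilter(b, a, x)                      # the "four-level" (quaternary) waveform

def group_energy(x):
    n = x.size // GROUP
    return np.square(x[:n * GROUP]).reshape(n, GROUP).sum(axis=1)

def tertiary_list(primary_waveform, rate, secondary_groups,
                  third_threshold=2.0, window_s=0.030):
    """Secondary-list groups plus groups of the high-passed waveform whose energy
    clears the third threshold within +/- window_s of some secondary-list group."""
    quaternary = high_pass(primary_waveform, rate)
    energy = group_energy(quaternary)
    hits = np.flatnonzero(energy > third_threshold * np.median(energy))
    window_groups = int(round(window_s * rate / GROUP))
    secondary_groups = np.asarray(secondary_groups, dtype=int)
    confirmed = [g for g in hits
                 if np.any(np.abs(secondary_groups - g) <= window_groups)]
    return np.union1d(secondary_groups, np.asarray(confirmed, dtype=int))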
5. The computer-implemented method of claim 1, wherein the automatic analysis of the substitute waveform further comprises identifying song segments in the primary waveform representative of the musical piece for which a number of musical events exceeds a predetermined density threshold.
6. The computer-implemented method of claim 5, further comprising generating a hit/miss output file that identifies the song segments in the primary waveform for which the number of music events exceeds the predetermined density threshold.
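Claims 5 and 6 describe a density analysis over the detected events and a hit/miss output file. A small sketch of one possible realization is given below; the four-second segment length, the density threshold of four events and the tab-separated file format are assumptions, since the claims leave all three open.

import numpy as np

def hit_miss_segments(event_times_s, duration_s, segment_s=4.0, density_threshold=4):
    """Return a list of (start, end, 'hit' or 'miss') tuples, one per segment."""
    edges = np.arange(0.0, duration_s + segment_s, segment_s)
    counts, _ = np.histogram(event_times_s, bins=edges)
    return [
        (edges[i], edges[i + 1], "hit" if counts[i] > density_threshold else "miss")
        for i in range(counts.size)
    ]

def write_hit_miss_file(path, segments):
    with open(path, "w") as out:
        for start, end, label in segments:
            out.write(f"{start:.2f}\t{end:.2f}\t{label}\n")

if __name__ == "__main__":
    # Illustrative event times (seconds) as a later editing stage might receive them.
    events = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 10.2, 14.8]
    segments = hit_miss_segments(events, duration_s=16.0)
    write_hit_miss_file("hit_miss.txt", segments)

The file marks which segments are dense in musical events, presumably so that an automatic video editor can favour beat-matched cuts inside the segments labelled "hit".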
7. The computer-implemented method of claim 1, wherein the plurality of musical events comprises one or more of a bass drum and a snare drum.
8. A non-transitory computer-readable medium comprising instructions for identifying musical events that, when executed by a computing system, cause the computing system to:
receiving a primary waveform representing a musical piece, wherein the musical piece includes a plurality of musical events;
filtering the primary waveform to generate a substitute waveform associated with the plurality of musical events; and
automatically analyzing the substitute waveform to identify points in time in the primary waveform that correspond to the plurality of musical events.
9. The non-transitory computer-readable medium of claim 8, wherein the instructions for the filtering of the primary waveform further comprise implementing a first filtering process using two or more interleaved bandpass filters, the first filtering process outputting two or more secondary waveforms, and the first filtering process comprising:
a) calculating a set of samples of the two or more secondary waveforms;
b) identifying the set of samples that exceed a first predetermined frequency range threshold, wherein each of the musical events has a different frequency range threshold;
c) identifying, for each of the musical events, which of the two or more secondary waveforms is characterized by a maximum number of sample groups that exceed the first predetermined frequency range threshold; and
d) for each of the musical events, identifying a primary list for the musical event based on the sample groups of the secondary waveform characterized by the greatest number of sample groups exceeding the first predetermined frequency range threshold.
10. The non-transitory computer-readable medium of claim 9, wherein the instructions for the filtering of the primary waveform further comprise implementing a second filtering process on the primary waveform using a low pass filter or a resampling process, wherein the second filtering outputs a three-level waveform, and the second filtering process further comprises:
a) calculating a sample group moving average of the three-level waveform for the frequency range of each musical event, and, for each of the musical events, identifying at least one sample group moving average that exceeds a second predetermined frequency range threshold; and
b) identifying a secondary list of musical events, the secondary list including the primary list of musical events, wherein the secondary list further includes musical events whose time points lie within a range [formula FDA0002865647030000041] of the time points of the musical events included in the primary list and whose sample group moving averages exceed the second predetermined frequency range threshold.
11. The non-transitory computer-readable medium of claim 10, wherein the instructions for the filtering of the primary waveform further comprise implementing a third filtering process on the primary waveform that uses a high pass filter, wherein the third filtering outputs a four-level waveform, and the third filtering comprises:
a) identifying a set of samples of the four-level waveform that exceed a third predetermined frequency range threshold; and
b) identifying a tertiary list of musical events, the tertiary list including the secondary list of musical events, wherein the tertiary list further includes musical events whose time points lie within a range [formula FDA0002865647030000042] of the time points of the musical events included in the secondary list and that have sample groups of the four-level waveform exceeding the third predetermined frequency range threshold.
12. The non-transitory computer-readable medium of claim 8, wherein the instructions for the automatic analysis of the substitute waveform further comprise identifying song segments in the primary waveform representing the musical piece for which a number of musical events exceeds a predetermined density threshold.
13. The non-transitory computer-readable medium of claim 12, wherein the instructions to identify music events further comprise generating a hit/miss output file that identifies the song segments in the primary waveform for which the number of music events exceeds the predetermined density threshold.
14. The non-transitory computer-readable medium of claim 8, wherein the plurality of musical events includes one or more of a bass drum and a snare drum.
15. A system for identifying musical events, the system comprising:
a processor; and
a non-transitory computer-readable medium storing instructions that, when executed by the system, cause the system to:
receiving a primary waveform representing a musical piece, wherein the musical piece includes a plurality of musical events;
filtering the primary waveform to generate a substitute waveform associated with the plurality of musical events; and
automatically analyzing the substitute waveform to identify points in time in the primary waveform that correspond to the plurality of musical events.
16. The system of claim 15, wherein the instructions for the filtering of the primary waveform further comprise implementing a first filtering process using two or more interleaved bandpass filters, the first filtering process outputting two or more secondary waveforms, and the first filtering process comprising:
a) calculating a set of samples of the two or more secondary waveforms;
b) identifying the set of samples that exceed a first predetermined frequency range threshold, wherein each of the musical events has a different frequency range threshold;
c) identifying, for each of the musical events, which of the two or more secondary waveforms is characterized by a maximum number of sample groups that exceed the first predetermined frequency range threshold; and
d) for each of the musical events, identifying a primary list for the musical event based on the sample groups of the secondary waveform characterized by the greatest number of sample groups exceeding the first predetermined frequency range threshold.
17. The system of claim 16, wherein the instructions for the filtering of the primary waveform further comprise implementing a second filtering process on the primary waveform using a low pass filter or a resampling process, wherein the second filtering outputs a three-level waveform, and wherein the second filtering process further comprises:
a) calculating a sample group moving average of the three-level waveform for the frequency range of each musical event, and, for each of the musical events, identifying at least one sample group moving average that exceeds a second predetermined frequency range threshold; and
b) identifying a secondary list of musical events, the secondary list including the primary list of musical events, wherein the secondary list further includes musical events whose time points lie within a range [formula FDA0002865647030000061] of the time points of the musical events included in the primary list and whose sample group moving averages exceed the second predetermined frequency range threshold.
18. The system of claim 17, wherein the instructions for the filtering of the primary waveform further comprise implementing a third filtering process on the primary waveform using a high pass filter, wherein the third filtering outputs a four-level waveform, and the third filtering comprises:
a) identifying a set of samples of the four-level waveform that exceed a third predetermined frequency range threshold; and
b) identifying a tertiary list of musical events, the tertiary list including the secondary list of musical events, wherein the tertiary list further includes musical events whose time points lie within a range [formula FDA0002865647030000071] of the time points of the musical events included in the secondary list and that have sample groups of the four-level waveform exceeding the third predetermined frequency range threshold.
19. The system of claim 15, wherein the instructions for the automatic analysis of the substitute waveform further comprise identifying song segments in the primary waveform representing the musical piece for which a number of musical events exceeds a predetermined density threshold.
20. The system of claim 19, wherein the instructions for identifying music events further comprise generating a hit/miss output file that identifies the song segments in the primary waveform for which the number of music events exceeds the predetermined density threshold.
21. The system of claim 15, wherein the plurality of musical events includes one or more of a bass drum and a snare drum.
CN201980043850.0A 2018-07-03 2019-07-03 Beat decomposition facilitating automatic video editing Active CN112352279B (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US201862693799P 2018-07-03 2018-07-03
US62/693,799 2018-07-03
PCT/IB2019/000699 WO2020008255A1 (en) 2018-07-03 2019-07-03 Beat decomposition to facilitate automatic video editing

Publications (2)

Publication Number Publication Date
CN112352279A true CN112352279A (en) 2021-02-09
CN112352279B CN112352279B (en) 2023-03-10

Family

ID=74358203

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201980043850.0A Active CN112352279B (en) 2018-07-03 2019-07-03 Beat decomposition facilitating automatic video editing

Country Status (2)

Country Link
EP (1) EP3818528A1 (en)
CN (1) CN112352279B (en)

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO1997037449A1 (en) * 1996-04-03 1997-10-09 Command Audio Corporation Digital audio data transmission system based on the information content of an audio signal
CN1245376A (en) * 1998-08-17 2000-02-23 英业达股份有限公司 Method for detecting squelch of IP telephone
US6243865B1 (en) * 1998-04-01 2001-06-05 Multiscience System Ptl Ltd. Method of relaying digital video & audio data via a communication media
CN101166019A (en) * 2006-10-18 2008-04-23 索尼株式会社 Audio reproducing apparatus
US20090157203A1 (en) * 2007-12-17 2009-06-18 Microsoft Corporation Client-side audio signal mixing on low computational power player using beat metadata
CN101652807A (en) * 2007-02-01 2010-02-17 缪斯亚米有限公司 Music transcription
CN102568456A (en) * 2011-12-23 2012-07-11 深圳市万兴软件有限公司 Notation recording method and a notation recording device based on humming input
CN103155031A (en) * 2010-10-15 2013-06-12 索尼公司 Encoding device and method, decoding device and method, and program
CN103854661A (en) * 2014-03-20 2014-06-11 北京百度网讯科技有限公司 Method and device for extracting music characteristics
CN104916288A (en) * 2014-03-14 2015-09-16 深圳Tcl新技术有限公司 Human voice highlighting processing method and device in audio

Also Published As

Publication number Publication date
CN112352279B (en) 2023-03-10
EP3818528A1 (en) 2021-05-12

Similar Documents

Publication Publication Date Title
US11688372B2 (en) Beat decomposition to facilitate automatic video editing
Peeters et al. The timbre toolbox: Extracting audio descriptors from musical signals
JP6027087B2 (en) Acoustic signal processing system and method for performing spectral behavior transformations
US20070083365A1 (en) Neural network classifier for separating audio sources from a monophonic audio signal
US8865993B2 (en) Musical composition processing system for processing musical composition for energy level and related methods
EP2499579B1 (en) Domain identification and separation for precision measurement of waveforms
KR101648931B1 (en) Apparatus and method for producing a rhythm game, and computer program for executing the method
US20230050565A1 (en) Audio detection method and apparatus, computer device, and readable storage medium
EP2328143B1 (en) Human voice distinguishing method and device
CN112352279B (en) Beat decomposition facilitating automatic video editing
US20230350943A1 (en) Methods and apparatus to identify media that has been pitch shifted, time shifted, and/or resampled
CN111243618B (en) Method, device and electronic equipment for determining specific voice fragments in audio
Pilia et al. Time scaling detection and estimation in audio recordings
US8750530B2 (en) Method and arrangement for processing audio data, and a corresponding corresponding computer-readable storage medium
JP5937125B2 (en) Sound identification condition setting support apparatus and sound identification condition setting support method
US20160277864A1 (en) Waveform Display Control of Visual Characteristics
Baskoro et al. Analysis of Voice Changes in Anti Forensic Activities Case Study: Voice Changer with Telephone Effect
CN104282315A (en) Voice frequency signal classified processing method, device and equipment
CN115484503B (en) Bullet screen generation method and device, electronic equipment and storage medium
Niţă et al. Fast algorithm for detecting butt-spliced edits
Belu et al. Improved Algorithm for Butt-Spliced Audio Edit Detection
Alcabasa et al. Simple audio processing approaches for guitar chord distinction
CN115206345A (en) Music and human voice separation method, device, equipment and medium based on time-frequency combination
US20180342260A1 (en) Music Detection
Aljinu Khadar et al. Multi Speaker Activity Detection Using Spectral Centroids

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant