US9653095B1 - Systems and methods for determining a repeatogram in a music composition using audio features - Google Patents

Info

Publication number
US9653095B1
Authority
US
United States
Prior art keywords
partition
audio
audio track
partitions
correlated
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
US15/251,571
Inventor
David Tcheng
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
GoPro Inc
Original Assignee
GoPro Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by GoPro Inc
Priority to US15/251,571
Assigned to GOPRO, INC. Assignment of assignors interest (see document for details). Assignors: TCHENG, DAVID
Assigned to JPMORGAN CHASE BANK, N.A., AS ADMINISTRATIVE AGENT. Security interest (see document for details). Assignors: GOPRO, INC.
Priority to US15/458,333 (US10068011B1)
Application granted
Publication of US9653095B1
Assigned to GOPRO, INC. Release of patent security interest. Assignors: JPMORGAN CHASE BANK, N.A., AS ADMINISTRATIVE AGENT
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/60 Information retrieval; Database structures therefor; File system structures therefor of audio data
    • G06F16/68 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/683 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H1/00 Details of electrophonic musical instruments
    • G10H1/02 Means for controlling the tone frequencies, e.g. attack or decay; Means for producing special musical effects, e.g. vibratos or glissandos
    • G10H1/06 Circuits for establishing the harmonic content of tones, or other arrangements for changing the tone colour
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H1/00 Details of electrophonic musical instruments
    • G10H1/36 Accompaniment arrangements
    • G10H1/40 Rhythm
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G10L25/06 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being correlation coefficients
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G10L25/18 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being spectral information of each sub-band
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H2210/00 Aspects or methods of musical processing having intrinsic musical character, i.e. involving musical theory or musical parameters or relying on musical knowledge, as applied in electrophonic musical tools or instruments
    • G10H2210/031 Musical analysis, i.e. isolation, extraction or identification of musical elements or musical parameters from a raw acoustic signal or from an encoded audio signal
    • G10H2210/051 Musical analysis, i.e. isolation, extraction or identification of musical elements or musical parameters from a raw acoustic signal or from an encoded audio signal for extraction or detection of onsets of musical sounds or notes, i.e. note attack timings
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H2210/00 Aspects or methods of musical processing having intrinsic musical character, i.e. involving musical theory or musical parameters or relying on musical knowledge, as applied in electrophonic musical tools or instruments
    • G10H2210/031 Musical analysis, i.e. isolation, extraction or identification of musical elements or musical parameters from a raw acoustic signal or from an encoded audio signal
    • G10H2210/061 Musical analysis, i.e. isolation, extraction or identification of musical elements or musical parameters from a raw acoustic signal or from an encoded audio signal for extraction of musical phrases, isolation of musically relevant segments, e.g. musical thumbnail generation, or for temporal structure analysis of a musical piece, e.g. determination of the movement sequence of a musical work
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H2210/00 Aspects or methods of musical processing having intrinsic musical character, i.e. involving musical theory or musical parameters or relying on musical knowledge, as applied in electrophonic musical tools or instruments
    • G10H2210/031 Musical analysis, i.e. isolation, extraction or identification of musical elements or musical parameters from a raw acoustic signal or from an encoded audio signal
    • G10H2210/066 Musical analysis, i.e. isolation, extraction or identification of musical elements or musical parameters from a raw acoustic signal or from an encoded audio signal for pitch analysis as part of wider processing for musical purposes, e.g. transcription, musical performance evaluation; Pitch recognition, e.g. in polyphonic sounds; Estimation or use of missing fundamental
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H2210/00 Aspects or methods of musical processing having intrinsic musical character, i.e. involving musical theory or musical parameters or relying on musical knowledge, as applied in electrophonic musical tools or instruments
    • G10H2210/031 Musical analysis, i.e. isolation, extraction or identification of musical elements or musical parameters from a raw acoustic signal or from an encoded audio signal
    • G10H2210/076 Musical analysis, i.e. isolation, extraction or identification of musical elements or musical parameters from a raw acoustic signal or from an encoded audio signal for extraction of timing, tempo; Beat detection
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H2250/00 Aspects of algorithms or signal processing methods without intrinsic musical character, yet specifically adapted for or used in electrophonic musical processing
    • G10H2250/131 Mathematical functions for musical analysis, processing, synthesis or composition
    • G10H2250/215 Transforms, i.e. mathematical transforms into domains appropriate for musical signal processing, coding or compression
    • G10H2250/235 Fourier transform; Discrete Fourier Transform [DFT]; Fast Fourier Transform [FFT]

Definitions

  • the disclosure relates to constructing a dataset representing similar segments in a music composition using audio energy.
  • Musical compositions may be characterized by being self-similar.
  • a musical composition may contain acoustically similar segments throughout its duration.
  • Musical composition recorded on an audio track may include segments or partitions that are repeated one or more times throughout a duration of the audio track.
  • An algorithm may compare a partition of the audio track to the audio track itself to determine which other partition of the audio track correlates best with the partition.
  • a minimum distance between the partition and other partitions being compared may be set. The minimum distance may be determined by a user, set by a minimum distance parameter, and/or otherwise determined. The minimum distance may be constant and/or may be varied.
  • a time to the most correlated partition may be determined and recorded.
  • a dataset creating a visual representation of partitions being repeated throughout the audio track may be constructed (a “Repeatogram”).
  • the dataset may be constructed by comparing a current partition of an audio track to all remaining partitions of the audio track, determining a correlated partition that most likely represents the same sound as the current partition, and plotting the time from the current partition to the correlated partition within the audio track on the y-axis while the time of the audio track duration is plotted on the x-axis. This process may be repeated iteratively for one or more partitions of one or more audio tracks.
  • a system configured for constructing a dataset for representing repeated sounds within an audio track may include one or more servers.
  • the server(s) may be configured to communicate with one or more client computing platforms according to a client/server architecture. The users of the system may access the system via client computing platform(s).
  • the server(s) may be configured to execute one or more computer program components.
  • the computer program components may include one or more of an audio track component, a partition component, a comparison component, a correlation component, a repeatogram component, and/or other components.
  • the audio track component may be configured to obtain one or more audio tracks from one or more media files.
  • an audio track and/or other audio tracks may be obtained from a media file and/or other media files.
  • the media file may be available within the repository of media files available via the system and/or available on a third party platform, which may be accessible and/or available via the system.
  • the audio track component may be configured to obtain audio content from one or more audio tracks obtained from one or more media files.
  • Audio content may include musical content.
  • musical content may be in the form of a musical composition such as a song performance, a classical music performance, an electronic music performance and/or other musical content.
  • the audio track component may be configured to obtain audio tracks from media files by extracting audio signals from media files, and/or by other techniques.
  • the audio track component may be configured to obtain an audio track by extracting audio signal from the media file.
  • Audio signal may include audio information and may contain sound information. Audio information contained within a media file may be extracted in the form of an audio track.
  • the audio track component may be configured to extract audio signals associated with one or more frequencies from one or more media files by applying one or more frequency bandpass filters.
  • a frequency bandpass filter applied to the media file may extract audio signal having frequencies between 1000 Hz and 5000 Hz.
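
As a rough illustration of the bandpass step, the following Python sketch keeps only the 1000 Hz to 5000 Hz band of a short signal by zeroing discrete Fourier transform bins outside that band. The function name `bandpass_dft`, the DFT-masking approach, and the toy sample rate are assumptions made for this example; the disclosure does not specify how the filter is implemented.

```python
import cmath
import math


def bandpass_dft(samples, sample_rate, low_hz=1000.0, high_hz=5000.0):
    """Keep only DFT bins whose frequency lies in [low_hz, high_hz]."""
    n = len(samples)
    # Forward DFT (O(n^2); fine for a short illustrative signal).
    spectrum = [
        sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    for k in range(n):
        # Bins k and n - k carry the same physical frequency for real input.
        freq = min(k, n - k) * sample_rate / n
        if not (low_hz <= freq <= high_hz):
            spectrum[k] = 0j
    # Inverse DFT; the imaginary part is numerical noise for real input.
    return [
        (sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
        for t in range(n)
    ]


# Toy input: a 500 Hz tone (outside the band) plus a 2000 Hz tone (inside it).
SAMPLE_RATE = 16000
N = 160
low_tone = [math.sin(2 * math.pi * 500 * t / SAMPLE_RATE) for t in range(N)]
high_tone = [math.sin(2 * math.pi * 2000 * t / SAMPLE_RATE) for t in range(N)]
filtered = bandpass_dft([a + b for a, b in zip(low_tone, high_tone)], SAMPLE_RATE)
```

In this toy case `filtered` should closely match `high_tone`, because the 500 Hz component falls outside the band. A production system would more likely use an FFT or a time-domain filter design; the quadratic DFT here is only for readability.
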
  • the audio track component may be configured to extract audio features of the audio information obtained from the audio track.
  • Audio features may include audio energy representations, audio frequency representations, harmonic sound information, and/or other features.
  • the audio track component may be configured to extract one or more audio energy representations from one or more audio tracks.
  • an audio energy representation and/or other representations may be extracted from the audio track.
  • the audio track component may be configured to transform one or more audio energy representations into a frequency domain to generate a spectral energy profile of the one or more audio energy representations.
  • the audio track component may be configured to transform an audio energy representation of an audio track into a frequency domain to generate a spectral energy profile of the audio energy representation.
  • the audio track component may be configured to obtain harmonic sound information representing a harmonic sound from one or more audio tracks.
  • Harmonic information may be obtained by transforming one or more audio energy representations of one or more audio tracks into a frequency space in which energy may be represented as a function of frequency to generate a harmonic energy spectrum of the one or more audio tracks.
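
The two transformations described above (extracting an audio energy representation, then moving it into the frequency domain) might be sketched as follows. This is a minimal example under stated assumptions: energy is taken to be the sum of squared samples per non-overlapping frame, and the spectral energy profile is taken to be the squared DFT magnitude of that energy sequence. The names `energy_representation` and `spectral_energy_profile` are hypothetical, and the disclosure does not commit to these definitions.

```python
import cmath
import math


def energy_representation(samples, frame_size):
    """Sum of squared samples per non-overlapping frame (an assumed energy measure)."""
    return [
        sum(s * s for s in samples[i:i + frame_size])
        for i in range(0, len(samples) - frame_size + 1, frame_size)
    ]


def spectral_energy_profile(energy):
    """Squared DFT magnitude of the energy sequence for bins 0 .. n // 2."""
    n = len(energy)
    profile = []
    for k in range(n // 2 + 1):
        coeff = sum(e * cmath.exp(-2j * math.pi * k * t / n) for t, e in enumerate(energy))
        profile.append(abs(coeff) ** 2)
    return profile


# Toy signal whose loudness pulses once every 4 frames.
frame_amplitudes = [1.0, 0.0, 0.0, 0.0] * 4
samples = [a for a in frame_amplitudes for _ in range(8)]  # 8 samples per frame
energy = energy_representation(samples, frame_size=8)
profile = spectral_energy_profile(energy)
```

Because the energy pulses every 4 of 16 frames, the profile concentrates at bins 0, 4, and 8 and is near zero elsewhere, which is the kind of periodicity a repetition analysis can exploit.
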
  • the partition component may be configured to obtain one or more partition sizes.
  • One or more partition sizes may include a partition size value that refers to a portion of an audio track duration.
  • the partition size value may be expressed in time units including seconds, milliseconds, and/or other units.
  • the partition component may be configured to obtain one or more partition size values that may include a partition size generated by a user, a randomly generated partition size, and/or otherwise obtained. By way of non-limiting illustration, a partition size may be obtained.
  • the partition component may be configured to partition one or more audio track durations of one or more audio tracks into multiple partitions of one or more partition sizes. Individual partitions of one or more partition sizes may span the entirety of the audio track comprised of audio information obtained via audio track component 106 from the audio wave content of one or more audio tracks.
  • the audio track may be partitioned into multiple partitions of the partition size. Individual partitions may occur at different time periods of the audio track duration.
  • individual partitions of the partition size partitioning the audio track may occur at different time periods of the audio track duration.
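
The partitioning step can be sketched in a few lines of Python. The helper `partition_track` is a hypothetical name, and the choice to let the final partition be shorter when the duration is not a multiple of the partition size is an assumption of this example, not something the disclosure states.

```python
def partition_track(duration_s, partition_size_s):
    """Return (start, end) times, in seconds, of consecutive partitions spanning the track."""
    if partition_size_s <= 0:
        raise ValueError("partition size must be positive")
    partitions = []
    index = 0
    while index * partition_size_s < duration_s:
        start = index * partition_size_s
        # Assumption: the last partition is truncated at the end of the track.
        end = min(start + partition_size_s, duration_s)
        partitions.append((start, end))
        index += 1
    return partitions


even = partition_track(10.0, 2.5)
uneven = partition_track(1.0, 0.4)
```

Here `even` holds four partitions of 2.5 seconds each, and `uneven` holds three partitions, the last of which ends at the track duration.
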
  • the comparison component may be configured to compare one or more partitions of one or more audio tracks to remaining one or more partitions of one or more audio tracks.
  • the comparison component may be configured to correlate audio features of one or more partitions of one or more audio tracks.
  • a current partition of the partition size of the audio track may be compared against all remaining partitions of the current partition size of the audio track to correlate individual audio features of individual remaining partitions.
  • the comparison component may be configured to compare one or more audio features of one or more partitions of one or more audio tracks.
  • the comparison component may be configured to compare one or more audio energy representations of one or more partitions of one or more audio tracks.
  • the comparison component may be configured to compare one or more audio frequency representations of one or more partitions of one or more audio tracks.
  • the comparison component may be configured to compare harmonic information of one or more partitions of one or more audio tracks, including pitch of the harmonic sound, harmonic energy of one or more partitions, and/or other features.
  • the comparison component may be configured to compare audio features of individual partitions of one or more audio tracks within the multi-resolution framework, which is incorporated by reference.
  • This process performed by the comparison component may be iterative such that the comparison component may compare audio features of the current partition of the partition size of the audio track against the remaining partitions of the partition size of the audio track for every partition of the audio track, thereby changing the position of the current partition within the audio track duration with each iteration until the end of the audio track duration has been reached. For example, if the number of partitions partitioning the audio track duration is x, the comparison component may be configured to perform the comparison process x times.
  • a partition at a first audio track duration position may be compared to x - 1 partitions; then, at the next iteration, the comparison component may be configured to compare a partition at a second audio track duration position to x - 1 partitions, and so on, until the last partition of the number of partitions is reached.
  • the system may accumulate a number of transmitted correlation results obtained from the comparison component. The correlation results may be transmitted to the system and a determination for the most accurate result during each iteration may be made.
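
The iteration described above (each of x partitions compared against the other x - 1) can be sketched as a double loop. Pearson correlation is used here only as a stand-in similarity measure; the disclosure does not name a specific correlation formula, and `pearson` and `compare_all` are hypothetical names.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length feature vectors (0.0 if either is constant)."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def compare_all(partitions):
    """For each current partition, score it against the other x - 1 partitions."""
    results = []
    for i, current in enumerate(partitions):
        scores = {j: pearson(current, other)
                  for j, other in enumerate(partitions) if j != i}
        results.append(scores)
    return results


features = [
    [0.0, 1.0, 0.0, 2.0],  # partition 0
    [3.0, 1.0, 2.0, 0.0],  # partition 1
    [0.0, 2.0, 0.0, 4.0],  # partition 2: a louder repeat of partition 0
]
results = compare_all(features)
best_for_0 = max(results[0], key=results[0].get)
```

With three partitions the outer loop runs three times and each iteration produces two scores. Partition 2 is the best match for partition 0 because it has the same shape at a different level.
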
  • the comparison component may be configured to apply one or more constraint parameters to control the comparison process.
  • the comparison constraint parameters may include one or more of setting a minimum distance between partitions being compared, limiting a comparison time, limiting frequency bands, limiting a number of comparison iterations, and/or other constraints.
  • the comparison component may be configured to apply a minimum distance parameter when comparing the current partition of the partition size of the audio track against the remaining partitions of the partition size of the audio track.
  • the minimum distance parameter may refer to a portion of the audio track duration between the current partition and the remaining partitions.
  • the minimum distance parameter applied may be constant and/or may be varied with every comparison iteration.
  • a certain portion of the track duration corresponding to a distance between a current partition and remaining partitions may be removed from a comparison process.
  • a minimum distance parameter corresponding to a shorter distance between the current partition and the remaining partitions may result in finding correlated partitions characterized by a short distance to the current partition (e.g., a drum beat repeating at every measure).
  • the minimum distance parameter corresponding to a longer distance between the current partition and the remaining partitions may result in finding correlated partitions characterized by a longer distance to the current partition.
  • the minimum distance parameter may be set by a system, selected by a user, and/or otherwise obtained.
  • the minimum distance parameter may include values that are periodic and cycle through a set of minimum distances with each iteration.
  • a minimum distance parameter may include values representing a distance from a current partition to remaining partitions equal to 0.5 second, 1 second, 2 seconds, 4 seconds, 8 seconds, 16 seconds, 32 seconds, 64 seconds, 128 seconds, 256 seconds, and/or include other values.
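
A minimal sketch of the minimum distance constraint follows, assuming distance is measured between partition start times and that the periodic values are cycled once per iteration. The helper name `candidates_with_min_distance` and the toy values are illustrative assumptions.

```python
from itertools import cycle


def candidates_with_min_distance(current_index, starts, min_distance_s):
    """Indices of partitions whose start is at least min_distance_s away from the current one."""
    current_start = starts[current_index]
    return [
        j for j, start in enumerate(starts)
        if j != current_index and abs(start - current_start) >= min_distance_s
    ]


starts = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5]  # partition start times in seconds
min_distances = cycle([0.5, 1.0, 2.0])             # periodic minimum distances

plan = []
for i in range(4):
    distance = next(min_distances)
    plan.append((i, distance, candidates_with_min_distance(i, starts, distance)))
```

Each entry of `plan` lists which partitions remain eligible for comparison at that iteration. A small minimum distance leaves nearby partitions in play (favoring short-range repeats such as a drum beat), while a larger one restricts the search to more distant partitions.
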
  • the comparison component may be configured to determine a time it took to compare the current partition of the partition size of the audio track against the remaining partitions of the partition size of the audio track. Time taken to compare audio features of the current partition of the audio track to audio features of the remaining individual partitions of the audio track may be transmitted to the system.
  • the comparison component may utilize the time taken to correlate audio features of the current partition in subsequent comparison iterations. For example, the time taken to compare a current partition to the remaining partitions may be equal to 5 seconds.
  • the comparison component may be configured to limit the next comparison iteration at a subsequent temporal window to 5 seconds. In one implementation, the time taken to compare the initial current partition may be utilized by the other comparison constraint parameters and/or used as a constant value.
  • the comparison component may be configured to limit the audio track duration of one or more audio tracks during the comparison process by applying a comparison window set by a comparison window parameter.
  • the comparison component may be configured to limit the audio track duration of one or more audio tracks being compared by applying the comparison window parameter (i.e., by setting a comparison window).
  • the comparison window parameter may include a time of audio track duration to which the comparison may be limited, a position of the comparison window, including a start position and an end position, and/or other constraints. This value may be predetermined by the system, set by a user, and/or otherwise obtained.
  • the comparison component may be configured to limit the audio track duration such that the comparison window parameter may not be greater than 50 percent of the audio track duration. For example, if an audio track is 500 seconds then the length of the comparison window set by the comparison window parameter may not be greater than 250 seconds.
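
The 50 percent cap on the comparison window can be expressed as a small clamping helper. The function name `clamp_comparison_window` and the decision to shift the start position so the window stays inside the track are assumptions of this sketch.

```python
def clamp_comparison_window(track_duration_s, requested_window_s,
                            start_s=0.0, max_fraction=0.5):
    """Return (start, end) of a window no longer than max_fraction of the track."""
    max_length = track_duration_s * max_fraction
    length = min(requested_window_s, max_length)
    # Assumption: move the start earlier if the window would run past the end.
    start = min(max(start_s, 0.0), track_duration_s - length)
    return start, start + length


capped = clamp_comparison_window(500.0, 400.0, start_s=300.0)
unchanged = clamp_comparison_window(500.0, 100.0, start_s=50.0)
```

For a 500 second track, a requested 400 second window is cut to 250 seconds, matching the example in the text, while a 100 second window is left as requested.
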
  • the comparison window parameter may have a predetermined start position that may be generated by the system and/or may be based on user input.
  • System 100 may generate a start position of the comparison window based on the audio track duration.
  • the start position may be randomly set to a portion of the audio track duration.
  • the user may generate the start position of the comparison window based on specific audio features of the audio track.
  • a user may know that an audio track may contain audio features in an introductory portion of the audio track that represent the same sound captured at a final portion of the audio track.
  • a musical composition may be characterized by a number of sections that may be recombined and repeated in different ways throughout the composition.
  • An introductory section may often contain a primary theme that may be repeated often, a middle section may contain an original theme that may contain elements of the primary theme, and a final section may contain a restatement of the primary theme.
  • audio features associated with the introductory section and the final section may be used to generate the Repeatogram.
  • the comparison component may be configured to exclude one or more portions of one or more audio tracks from the comparison process during every comparison iteration based on the comparison window parameter.
  • the comparison component may be configured to exclude the same and/or different portions of one or more audio tracks from the comparison process.
  • the comparison component may be configured to exclude a portion of the audio track during every iteration performed by the comparison component.
  • the comparison component may be configured to compare audio features of the current partition of the audio track against audio features of the remaining partitions of the audio track within the multi-resolution framework, which is incorporated by reference.
  • the comparison component may be configured to compare audio features of the current partition of the audio track against the remaining partitions of the audio track at a mid-resolution level. Audio features of individual partitions of the audio track may be compared at the mid-resolution level to correlate audio features between the current partition of the audio track and the remaining partitions of the audio track. The result of a first comparison may identify correlated audio features from the current partition and the remaining partitions of the audio track that may represent energy in the same sound. The result of the first comparison may be transmitted to the system after the first comparison is completed.
  • the second comparison may be performed at a level of resolution that may be higher than the mid-resolution level. Audio features of individual partitions of the audio track at the higher resolution level may be compared at the higher resolution level to correlate audio features between the current partition of the audio track and the remaining partitions of the audio track. The result of the second comparison may be transmitted to the system.
  • This process may be iterative such that the comparison component may compare audio features of the current partition of the audio track against audio features of the remaining partitions of the audio track at every resolution level, thereby increasing the resolution with each iteration until the highest level of resolution is reached. For example, if the number of resolution levels within an individual audio track is finite, the comparison component may be configured to compare audio features at a mid-resolution level first; then, at the next iteration, the comparison component may be configured to compare audio features at a resolution level higher than the resolution level of the previous iteration, and so on. The last iteration may be performed at the highest resolution level.
  • the system may accumulate a number of transmitted correlation results obtained from the comparison component. The correlation results may be transmitted to the system and a determination for the most accurate result may be made.
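
The multi-resolution framework itself is incorporated by reference and is not spelled out here, so the following is only a loose sketch of the coarse-to-fine idea: features are averaged in blocks to form lower resolutions, and the current partition is scored against each candidate at every level. The block-averaging scheme, the similarity measure (negative mean absolute difference), and all function names are assumptions.

```python
def downsample(values, factor):
    """Average consecutive blocks of `factor` values (factor 1 returns the values unchanged)."""
    return [
        sum(values[i:i + factor]) / factor
        for i in range(0, len(values) - factor + 1, factor)
    ]


def similarity(a, b):
    """Negative mean absolute difference: higher means more alike (an assumed measure)."""
    return -sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def multi_resolution_scores(current, candidates, factors=(4, 2, 1)):
    """Return [(factor, [score per candidate]), ...] ordered from coarse to fine."""
    return [
        (factor, [similarity(downsample(current, factor), downsample(c, factor))
                  for c in candidates])
        for factor in factors
    ]


current = [1.0, 3.0, 1.0, 3.0, 5.0, 7.0, 5.0, 7.0]
candidates = [
    [3.0, 1.0, 3.0, 1.0, 7.0, 5.0, 7.0, 5.0],  # same coarse shape, different detail
    [1.0, 3.0, 1.0, 3.0, 5.0, 7.0, 5.0, 7.0],  # exact repeat
]
levels = multi_resolution_scores(current, candidates)
finest_scores = levels[-1][1]
best_at_finest = max(range(len(candidates)), key=lambda j: finest_scores[j])
```

At the two coarser levels both candidates look identical to the current partition; only the finest level separates the exact repeat from the near miss, which is why results from every level are accumulated before a final determination is made.
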
  • the correlation component may be configured to determine a correlated partition for the current partition from among the remaining partitions of the audio track that is most likely to represent the same sounds as the current partition.
  • the correlation component may be configured to determine a correlated partition for the current partition from among the remaining partitions of the audio track based on the results obtained by the comparison component from comparing the current partition, of the partition size obtained by the partition component, to correlate audio features obtained by the audio track component, and/or based on other techniques.
  • the correlated partition may reflect a partition that most likely represents the same sound as the current partition.
  • the correlation component may be configured to determine multiple correlated partitions between the current partition of the audio track and the remaining partitions of the audio track. Individual correlated partitions may be based on comparing individual audio features of one or more partitions of the audio track via the comparison component, as described above.
  • the correlation component may be configured to assign a weight to individual correlated partitions.
  • the correlation component may be configured to determine a final correlated partition by computing weighted averages of multiple correlated partitions and/or by performing other computations.
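
One possible reading of the weighted-average step is sketched below: each audio feature nominates a correlated partition with a weight, the weighted average of the nominated start times is computed, and the partition nearest to that average is taken as the final result. This interpretation, the name `final_correlated_partition`, and the snapping to the nearest partition are assumptions; the disclosure leaves the computation open.

```python
def final_correlated_partition(estimates, starts):
    """Pick the partition closest to the weighted mean start time of the estimates.

    estimates: list of (partition_index, weight) pairs, one per audio feature.
    starts: start time in seconds of every partition.
    """
    total_weight = sum(weight for _, weight in estimates)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    mean_time = sum(starts[index] * weight for index, weight in estimates) / total_weight
    return min(range(len(starts)), key=lambda j: abs(starts[j] - mean_time))


starts = [0.0, 2.0, 4.0, 6.0, 8.0]
# Hypothetical per-feature results: energy and frequency agree on partition 3,
# while a low-weight harmonic estimate points at partition 1.
estimates = [(3, 0.6), (3, 0.3), (1, 0.1)]
final_index = final_correlated_partition(estimates, starts)
```

The weighted mean lands near 5.6 seconds, so partition 3 (starting at 6.0 seconds) is selected. When estimates disagree widely, an averaged time may fall on a partition that no feature nominated, so a weighted vote could be a reasonable alternative.
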
  • the repeatogram component may be configured to record the correlation between the current partition of the audio track and the correlated partition of the audio track.
  • the repeatogram component may be configured to record a time from the current partition to the most correlated partition determined by the correlation component, and/or based on other techniques. With every iteration performed by the comparison component, the repeatogram component may be configured to record a correlation such that a time from a next current partition to the most correlated partition with the next partition is recorded.
  • the system may accumulate a number of records associated with times between a current partition and a most correlated partition transmitted by the repeatogram component.
  • the repeatogram component may be configured to construct a dataset representing multiple correlations determined by the correlation component as a result of multiple iterations performed by the comparison component.
  • the repeatogram component may be configured to construct a dataset that may visually represent repeated partitions of the audio track by plotting multiple correlations in a two-dimensional time space as data points with the size of individual data points monotonically increasing with correlation strength.
  • the two-dimensional time space may be characterized by a two-coordinate system in which an x-axis may represent the audio duration time, including the current partition time, and a y-axis may represent a time from the current partition to the correlated partition.
  • the repeatogram component may be configured to plot the time from the current partition to the most correlated partition determined by the correlation component on the y-axis as a function of the current partition time on the x-axis.
  • the repeatogram component may be configured to construct the dataset representing every iteration performed by the comparison component such that every time from a next current partition to the most correlated partition with the next partition recorded is plotted in the two-dimensional time space.
  • the repeatogram component may be configured to include positive and negative values on the y-axis representing the time from the current partition to the correlated partition.
  • the value of the time from the current partition to the correlated partition may be based on the relative position of the current partition to the correlated partition within the audio track duration and/or based on other techniques.
  • the repeatogram component may be configured to assign a positive value to the time between the correlated partition and the current partition if the correlated partition occurs after the current partition on the audio track duration.
  • the repeatogram component may be configured to assign a negative value to the time between the correlated partition and the current partition if the correlated partition occurs before the current partition on the audio track duration.
  • the repeatogram component may be configured to plot the positive time value on the y-axis representing positive values.
  • the repeatogram component may be configured to plot the negative time value on the y-axis representing negative values.
  • FIG. 1 illustrates a system for constructing a dataset representing repeated sounds within an audio track, in accordance with one or more implementations.
  • FIG. 2 illustrates an exemplary representation of obtaining an audio track, in accordance with one or more implementations.
  • FIG. 3 illustrates an exemplary schematic of partitioning an audio track duration into partitions of varying partition size, in accordance with one or more implementations.
  • FIG. 4 illustrates an exemplary schematic of a comparison process between a current partition of an audio track and remaining partitions into which the audio track was partitioned, in accordance with one or more implementations.
  • FIG. 5 illustrates an exemplary schematic of a dataset constructed by comparing one or more partitions of the audio track to the remaining partitions, in accordance with one or more implementations.
  • FIG. 6 illustrates a method for constructing a dataset representing repeated sounds within an audio track, in accordance with one or more implementations.
  • FIG. 1 illustrates a system 100 for constructing a dataset representing repeated sounds within an audio track, in accordance with one or more implementations.
  • system 100 may include one or more server(s) 102 .
  • Server(s) 102 may be configured to communicate with one or more client computing platforms 104 according to a client/server architecture.
  • the users of system 100 may access system 100 via client computing platform(s) 104 .
  • Server(s) 102 may be configured to execute one or more computer program components.
  • the computer program components may include one or more of audio track component 106 , partition component 108 , comparison component 110 , correlation component 112 , repeatogram component 114 , and/or other components.
  • a repository of media files may be available via system 100 (e.g., via electronic storage 122 and/or other storage location).
  • the repository of media files may be associated with different users.
  • system 100 and/or server(s) 102 may be configured for various types of media files that may include video files that include audio content, audio files, and/or other types of files that include some audio content.
  • Other types of media items may include one or more of audio files (e.g., music, podcasts, audio books, and/or other audio files), multimedia presentations, photos, slideshows, and/or other media files.
  • the media files may be received from one or more storage locations associated with client computing platform(s) 104 , server(s) 102 , and/or other storage locations where media files may be stored.
  • Client computing platform(s) 104 may include one or more of a cellular telephone, a smartphone, a digital camera, a laptop, a tablet computer, a desktop computer, a television set-top box, a smart TV, a gaming console, and/or other client computing platforms.
  • the plurality of media files may include audio files that may not contain video content.
  • Audio track component 106 may be configured to obtain one or more audio tracks from one or more media files.
  • an audio track and/or other audio tracks may be obtained from a media file and/or other media files.
  • the media file may be available within the repository of media files available via system 100 and/or available on a third party platform, which may be accessible and/or available via system 100 .
  • Audio track component 106 may be configured to obtain audio content from one or more audio tracks obtained from one or more media files.
  • Audio content may include musical content.
  • musical content may be in the form of a musical composition such as a song performance, a classical music performance, an electronic music performance and/or other musical content.
  • Audio track component 106 may be configured to obtain audio tracks from media files by extracting audio signals from media files, and/or by other techniques.
  • audio track component 106 may be configured to obtain the audio track by extracting audio signal from the media file.
  • audio track 202 may contain audio information. Audio information may contain sound information which may be graphically visualized as a waveform of sound pressure 205 as a function of time.
  • the sound wave's amplitude is mapped on the vertical axis with time on the horizontal axis.
  • the audio information contained within a media file may be extracted in the form of an audio track.
  • audio track component 106 may be configured to extract audio signals associated with one or more frequencies from one or more media files by applying one or more frequency bandpass filters.
  • a frequency bandpass filter applied to the media file may extract audio signal having frequencies between 1000 Hz and 5000 Hz.
  • Audio track component 106 may be configured to extract audio features of the audio information obtained from the audio track. Audio features may include audio energy representations, audio frequency representations, harmonic sound information, and/or other features.
  • audio track component 106 may be configured to extract one or more audio energy representations from one or more audio tracks.
  • an audio energy representation and/or other representations may be extracted from the audio track.
  • audio track component 106 may be configured to transform one or more audio energy representations into a frequency domain to generate a spectral energy profile of the one or more audio energy representations.
  • audio track component 106 may be configured to transform the audio energy representation of the audio track into a frequency domain to generate a spectral energy profile of the audio energy representation.
  • audio track component 106 may be configured to obtain harmonic sound information representing a harmonic sound from one or more audio tracks. Harmonic information may be obtained by transforming one or more audio energy representations of one or more audio tracks into a frequency space in which energy may be represented as a function of frequency to generate a harmonic energy spectrum of the one or more audio tracks.
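The transformation of an audio energy representation into a frequency domain, and the restriction to a frequency band such as 1000 Hz to 5000 Hz, can be illustrated with a minimal sketch. The disclosure does not specify a transform or a filter design; the naive discrete Fourier transform, the function names `spectral_energy_profile` and `band_energy`, and the synthetic 2000 Hz test tone below are all assumptions made for illustration only.

```python
import cmath
import math


def spectral_energy_profile(samples, sample_rate):
    """Return (frequencies_hz, energies) for the non-negative frequency bins."""
    n = len(samples)
    freqs, energies = [], []
    for k in range(n // 2 + 1):
        # Naive O(n^2) discrete Fourier transform, evaluated at bin k.
        coeff = sum(s * cmath.exp(-2j * math.pi * k * i / n)
                    for i, s in enumerate(samples))
        freqs.append(k * sample_rate / n)
        energies.append(abs(coeff) ** 2)
    return freqs, energies


def band_energy(freqs, energies, low_hz, high_hz):
    """Sum the energy of the bins whose frequency lies in [low_hz, high_hz]."""
    return sum(e for f, e in zip(freqs, energies) if low_hz <= f <= high_hz)


# Hypothetical input: 10 ms of a 2000 Hz sine sampled at 16 kHz.
SAMPLE_RATE = 16000
tone = [math.sin(2 * math.pi * 2000 * i / SAMPLE_RATE) for i in range(160)]
freqs, energies = spectral_energy_profile(tone, SAMPLE_RATE)
peak_hz = freqs[max(range(len(energies)), key=energies.__getitem__)]
in_band = band_energy(freqs, energies, 1000, 5000)
```

In this sketch the energy as a function of frequency plays the role of the spectral energy profile, and `band_energy` stands in for the effect of a bandpass filter. A practical implementation would likely use a fast Fourier transform rather than this quadratic-time loop.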
  • Partition component 108 may be configured to obtain one or more partition sizes.
  • One or more partition sizes may include a partition size value that refers to a portion of an audio track duration.
  • Partition size value may be expressed in time units including seconds, milliseconds, and/or other units.
  • Partition component 108 may be configured to obtain partition size values that may include a partition size generated by a user, a randomly generated partition size, and/or otherwise obtained. By way of non-limiting illustration, a partition size may be obtained.
  • Partition component 108 may be configured to partition one or more audio track durations of one or more audio tracks into multiple partitions of one or more partition sizes. Individual partitions of one or more partition sizes may span the entirety of the audio track comprised of audio information obtained via audio track component 106 from the audio wave content of one or more audio tracks. By way of non-limiting illustration, the audio track may be partitioned into multiple partitions of the partition size. Individual partitions may occur at different time periods of the audio track duration. By way of non-limiting illustration, individual partitions of the partition size partitioning the audio track may occur at different time periods of the audio track duration.
  • audio track 302 of audio track duration 306 may be partitioned into multiple partitions of partition size 318 .
  • Audio track 302 of audio track duration 306 may be partitioned into multiple partitions of partition size 328 .
  • Partition size 318 may be different than partition size 328 .
  • Partition 304 of partition size 318 may occur at time period 308 of audio track duration 306 .
  • Partition 305 of partition size 318 may occur at time period 309 of audio track duration 306 .
  • Partition 307 of partition size 328 may occur at time period 310 of audio track duration 306 .
  • Partition 311 of partition size 328 may occur at time period 312 of audio track duration 306 .
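The partitioning of an audio track duration into partitions of a given partition size, each occurring at its own time period, can be sketched as follows. The function name `partition_track` is hypothetical, and truncating the final partition at the end of the track is an assumption; the disclosure does not state how a trailing remainder is handled.

```python
def partition_track(duration_s, partition_size_s):
    """Return (start, end) time periods, in seconds, spanning the whole track."""
    if partition_size_s <= 0:
        raise ValueError("partition size must be positive")
    partitions = []
    index = 0
    while index * partition_size_s < duration_s:
        start = index * partition_size_s
        # Assumption: the last partition is cut short at the track end.
        partitions.append((start, min(start + partition_size_s, duration_s)))
        index += 1
    return partitions


# The same track partitioned with two different partition sizes.
coarse = partition_track(10.0, 4.0)
fine = partition_track(10.0, 2.5)
```

Calling the function twice with different sizes mirrors the way audio track 302 is partitioned once with partition size 318 and once with partition size 328.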
  • comparison component 110 may be configured to compare one or more partitions of one or more audio tracks to remaining one or more partitions of one or more audio tracks.
  • comparison component 110 may be configured to correlate audio features of one or more partitions of one or more audio tracks.
  • a current partition of the partition size of the audio track may be compared against all remaining partitions of the current partition size of the audio track to correlate individual audio features of individual remaining partitions.
  • comparison process 404 may compare current partition 410 of partition size 411 of audio track 405 against remaining partitions 422 of partition size 411 .
  • Remaining partitions may be partition 412 , partition 414 , partition 416 , partition 418 , partition 420 , partition 422 , partition 424 , partition 426 .
  • Comparison process 404 may compare current partition 410 to partition 412 .
  • Comparison process 404 may compare current partition 410 to partition 414 .
  • Comparison process 404 may compare current partition 410 to partition 416 .
  • Comparison process 404 may compare current partition 410 to partition 418 .
  • Comparison process 404 may compare current partition 410 to partition 420 .
  • Comparison process 404 may compare current partition 410 to partition 422 .
  • Comparison process 404 may compare current partition 410 to partition 424 .
  • Comparison process 404 may compare current partition 410 to partition 426 .
  • comparison component 110 may be configured to compare one or more audio features of one or more partitions of one or more audio tracks.
  • comparison component 110 may be configured to compare one or more audio energy representations of one or more partitions of one or more audio tracks.
  • comparison component 110 may be configured to compare one or more audio frequency representations of one or more partitions of one or more audio tracks.
  • comparison component 110 may be configured to compare harmonic information of one or more partitions of one or more audio tracks, including pitch of the harmonic sound, harmonic energy, and/or other features.
  • comparison component 110 may be configured to compare audio features of individual partitions of one or more audio tracks within the multi-resolution framework, which is incorporated by reference.
  • this process performed by comparison component 110 may be iterative such that comparison component 110 may compare audio features of the current partition of the partition size of the audio track against remaining partitions of the partition size of the audio track for every partition of the audio track, changing the position of the current partition within the audio track duration with each iteration until the end of the audio track duration has been reached. For example, if the number of the partitions partitioning the audio track duration is x, comparison component 110 may be configured to perform the comparison process x times.
  • a partition at a first audio track duration position may be compared to x − 1 partitions; then, at the next iteration, comparison component 110 may be configured to compare a partition at a second audio track duration position to x − 1 partitions, and so on, until the last partition of the number of partitions is reached.
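The iterative comparison, in which each of x partitions takes a turn as the current partition and is compared against the x − 1 remaining partitions, can be sketched as below. The disclosure does not name a correlation measure; the Pearson correlation used here, the function names, and the small per-partition feature vectors are assumptions chosen only to make the loop concrete.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length feature vectors (assumed measure)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a constant vector carries no correlation information
    return cov / math.sqrt(var_a * var_b)


def compare_all(features):
    """For each current partition, score it against every remaining partition."""
    results = []
    for current, current_features in enumerate(features):
        scores = {other: pearson(current_features, other_features)
                  for other, other_features in enumerate(features)
                  if other != current}
        results.append(scores)  # x - 1 scores for this iteration
    return results


# Hypothetical feature vectors, one per partition (x = 4).
features = [[1, 2, 3], [3, 2, 1], [2, 4, 6], [1, 0, 1]]
results = compare_all(features)
most_correlated_with_first = max(results[0], key=results[0].get)
```

With x partitions the outer loop runs x times and each iteration produces x − 1 scores, which matches the iteration count described above.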
  • System 100 may accumulate a number of transmitted correlation results obtained from comparison component 110 . The correlation results may be transmitted to system 100 and a determination for the most accurate result during each iteration may be made.
  • comparison component 110 may be configured to apply one or more constraint parameters to control the comparison process.
  • the comparison constraint parameters may include one or more of setting a minimum distance between partitions being compared, limiting comparison time, limiting frequency bands, limiting a number of comparison iterations, and/or other constraints.
  • Comparison component 110 may be configured to apply a minimum distance parameter when comparing the current partition of the partition size of the audio track against the remaining partitions of the partition size of the audio track.
  • the minimum distance parameter may refer to a portion of the audio track duration between the current partition and the remaining partitions.
  • the minimum distance parameter applied may be constant or may be varied with each comparison iteration.
  • a certain portion of the track duration corresponding to a distance between a current partition and the remaining partitions may be removed from the comparison process.
  • a minimum distance parameter corresponding to a shorter distance between the current partition and the remaining partitions may result in finding correlated partitions characterized by a short distance to the current partition (e.g., a drum beat repeating at every measure).
  • the minimum distance parameter corresponding to a longer distance between the current partition and the remaining partitions may result in finding correlated partitions characterized by a longer distance to the current partition.
  • the minimum distance parameter may be set by a system, selected by a user, and/or otherwise obtained.
  • the minimum distance parameter may include values that are periodic and cycle through a set of minimum distances with each iteration.
  • the minimum distance parameter may include values representing a distance from a current partition to remaining partitions equal to 0.5 second, 1 second, 2 seconds, 4 seconds, 8 seconds, 16 seconds, 32 seconds, 64 seconds, 128 seconds, 256 seconds, and/or include other values.
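One way to picture the minimum distance parameter is as a filter on which remaining partitions are eligible for comparison, combined with a schedule that cycles through a set of minimum distances from one iteration to the next. The sketch below assumes that distance is measured between partition start times; the function names `eligible_partitions` and `distance_schedule` are hypothetical.

```python
from itertools import cycle


def eligible_partitions(current, starts, min_distance_s):
    """Indices of remaining partitions at least min_distance_s from the current one."""
    return [i for i, start in enumerate(starts)
            if i != current and abs(start - starts[current]) >= min_distance_s]


def distance_schedule(values, iterations):
    """Cycle periodically through the given minimum distances, one per iteration."""
    source = cycle(values)
    return [next(source) for _ in range(iterations)]


# Hypothetical partition start times in seconds.
starts = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
near_excluded = eligible_partitions(2, starts, 2.0)
schedule = distance_schedule([0.5, 1.0, 2.0], 5)
```

Partitions closer than the minimum distance are removed from the comparison for that iteration, so a larger value steers the search toward repeats that lie farther from the current partition.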
  • Comparison component 110 may be configured to determine a time it took to compare the current partition of the partition size of the audio track against the remaining partitions of the partition size of the audio track. Time taken to compare audio features of the current partition of the audio track to audio features of the remaining individual partitions of the audio track may be transmitted to system 100 . Comparison component 110 may utilize the time taken to correlate audio features of the current partition in subsequent comparison iterations. For example, time taken to compare a current partition to the remaining partitions may be equal to 5 seconds. Comparison component 110 may be configured to limit the next comparison iteration at a subsequent temporal window to 5 seconds. In one implementation, the time taken to compare the initial current partition may be utilized by the other comparison constraint parameters and/or used as a constant value.
  • Comparison component 110 may be configured to limit the audio track duration of one or more audio tracks during the comparison process by applying a comparison window set by a comparison window parameter. Comparison component 110 may be configured to limit the audio track duration of one or more audio tracks being compared by applying the comparison window parameter (i.e., by setting a comparison window).
  • the comparison window parameter may include a time of audio track duration to which the comparison may be limited, a position of the comparison window, including a start position and an end position, and/or other constraints. This value may be predetermined by system 100 , set by a user, and/or otherwise obtained.
  • comparison component 110 may be configured to limit the audio track duration such that the comparison window parameter may not be greater than 50 percent of the audio track duration. For example, if an audio track is 500 seconds then the length of the comparison window set by the comparison window parameter may not be greater than 250 seconds.
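The 50 percent limit on the comparison window can be sketched as a simple clamp. The function name `clamp_window` and the `max_fraction` parameter are hypothetical, and shifting the start position so that the window stays inside the track is an assumption not stated in the disclosure.

```python
def clamp_window(track_duration_s, start_s, requested_length_s, max_fraction=0.5):
    """Return (start, end) of a comparison window no longer than max_fraction of the track."""
    max_length = track_duration_s * max_fraction
    length = min(requested_length_s, max_length)
    # Assumption: the start is moved earlier if the window would overrun the track end.
    start = max(0.0, min(start_s, track_duration_s - length))
    return start, start + length


# A 500 second track: a requested 400 second window is limited to 250 seconds.
window = clamp_window(500.0, 100.0, 400.0)
late_window = clamp_window(500.0, 400.0, 300.0)
```

For the 500 second track in the example above, any requested window longer than 250 seconds is reduced to 250 seconds.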
  • the comparison window parameter may have a predetermined start position that may be generated by system 100 and/or may be based on user input.
  • System 100 may generate a start position of the comparison window based on the audio track duration.
  • the start position may be randomly set to a portion of the audio track duration.
  • the user may generate the start position of the comparison window based on specific audio features of the audio track. For example, the user may know that an audio track may contain audio features in an introductory portion of the audio track that represent the same sound captured at a final portion of the audio track.
  • a musical composition may be characterized by a number of sections that may be recombined and repeated in different ways throughout the composition.
  • An introductory section may often contain a primary theme that may be repeated often, a middle section may contain an original theme that may contain elements of the primary theme, and a final section may contain a restatement of the primary theme.
  • audio features associated with the introductory section and the final section may be used to generate the repeatogram.
  • Comparison component 110 may be configured to exclude one or more portions of one or more audio tracks from the comparison process based on the comparison window parameter during every comparison iteration. Comparison component 110 may be configured to exclude same and/or different portion of one or more audio tracks from the comparison process. For example, the comparison window parameter may be set such that a portion of the audio track is excluded during every iteration performed by comparison component 110 .
  • comparison component 110 may be configured to compare audio features of the current partition of the audio track against audio features of the remaining partitions of the audio track within the multi-resolution framework, which is incorporated by reference.
  • comparison component 110 may be configured to compare audio features of the current partition of the audio track against remaining partitions of the audio track at a mid-resolution level. Audio features of individual partitions of the audio track at the mid-resolution level may be compared at the mid-resolution level to correlate audio features between the current partition of the audio track and the remaining partitions of the audio track. The result of a first comparison may identify correlated audio features from the current partition and the remaining partitions of the audio track that may represent energy in the same sound. The result of the first comparison may be transmitted to system 100 after the first comparison is completed.
  • the second comparison may be performed at a level of resolution that may be higher than the mid-resolution level. Audio features of individual partitions of the audio track at the higher resolution level may be compared at the higher resolution level to correlate audio features between the current partition of the audio track and the remaining partitions of the audio track. The result of the second comparison may be transmitted to system 100 .
  • comparison component 110 may compare audio features of the current partition of the audio track against audio features of the remaining partitions of the audio track at every resolution level whereby increasing the resolution with individual iteration until the highest level of resolution is reached. For example, if the number of resolution levels within individual audio track is finite, comparison component 110 may be configured to compare audio features at a mid-resolution level first, then, at next iteration, comparison component 110 may be configured to compare audio features at a resolution level higher than the resolution level of previous iteration, and so on. The last iteration may be performed at the highest resolution level.
  • System 100 may accumulate a number of transmitted correlation results obtained from comparison component 110 . The correlation results may be transmitted to system 100 and a determination for the most accurate result may be made.
  • Correlation component 112 may be configured to determine a correlated partition for the current partition from among the remaining partitions of the audio track that is most likely to represent the same sounds as the current partition.
  • correlation component 112 may be configured to determine a correlated partition for the current partition from among the remaining partitions of the audio track based on the results of comparing, via comparison component 110 , the current partition of the partition size obtained by partition component 108 to correlate audio features obtained by audio track component 106 , and/or based on other techniques.
  • the correlated partition may reflect a partition that most likely represents the same sound as the current partition.
  • correlation component 112 may be configured to determine multiple correlated partitions between the current partition of the audio track and the remaining partitions of the audio track. Individual correlated partitions may be based on comparing individual audio features of one or more partitions of the audio track via comparison component 110 , as described above. Correlation component 112 may be configured to assign a weight to individual correlated partitions. Correlation component 112 may be configured to determine a final correlated partition by computing weighted averages of multiple correlated partitions and/or by performing other computations.
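One possible reading of the weighted determination is sketched below: each audio feature yields a correlation score for every remaining partition, the scores are combined by a weighted average, and the partition with the highest combined score is taken as the final correlated partition. This interpretation, the function name `final_correlated_partition`, the feature names, and the example weights are all assumptions; the disclosure does not define how weights are assigned or combined.

```python
def final_correlated_partition(scores_by_feature, weights):
    """Combine per-feature correlation scores by weighted average and pick the best."""
    total_weight = sum(weights[name] for name in scores_by_feature)
    # Only partitions scored by every feature can be combined.
    candidates = set.intersection(*(set(s) for s in scores_by_feature.values()))
    combined = {
        c: sum(weights[name] * scores[c]
               for name, scores in scores_by_feature.items()) / total_weight
        for c in candidates
    }
    best = max(combined, key=combined.get)
    return best, combined[best]


# Hypothetical scores: partition index -> correlation, per audio feature.
scores_by_feature = {
    "energy": {1: 0.9, 2: 0.4},
    "harmonic": {1: 0.2, 2: 0.8},
}
weights = {"energy": 1.0, "harmonic": 3.0}
best_partition, best_score = final_correlated_partition(scores_by_feature, weights)
```

Here the harmonic feature is weighted three times as heavily as the energy feature, so partition 2 is selected even though partition 1 has the higher energy correlation.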
  • Repeatogram component 114 may be configured to record the correlation between the current partition of the audio track and the correlated partition of the audio track.
  • repeatogram component 114 may be configured to record a time from the current partition to the most correlated partition determined by correlation component 112 , and/or based on other techniques. With every iteration performed by comparison component 110 , repeatogram component 114 may be configured to record a correlation such that a time from a next current partition to the most correlated partition with the next partition is recorded.
  • System 100 may accumulate a number of records associated with times between a current partition and a most correlated partition transmitted by repeatogram component 114 .
  • Repeatogram component 114 may be configured to construct a dataset representing multiple correlations determined by correlation component 112 as a result of multiple iterations performed by comparison component 110 .
  • Repeatogram component 114 may be configured to construct a dataset that may visually represent repeated partitions of the audio track by plotting multiple correlations in a two-dimensional time space as data points with the size of individual data points monotonically increasing with correlation strength.
  • the two-dimensional time space may be characterized by a two-coordinate system in which an x-axis may represent the audio duration time, including the current partition time, and a y-axis may represent a time from the current partition to the correlated partition.
  • repeatogram component 114 may be configured to plot the time from the current partition to the most correlated partition determined by correlation component 112 on the y-axis as a function of the current partition time on the x-axis.
  • Repeatogram component 114 may be configured to construct the dataset representing every iteration performed by comparison component 110 such that every time from a next current partition to the most correlated partition with the next partition recorded is plotted in the two-dimensional time space.
  • Repeatogram component 114 may be configured to include positive and negative values on the y-axis representing the time from the current partition to the correlated partition. The value of the time from the current partition to the correlated partition may be based on the relative position of the current partition to the correlated partition within the audio track duration and/or based on other techniques. By way of non-limiting illustration, repeatogram component 114 may be configured to assign a positive value to the time between the correlated partition and the current partition if the correlated partition occurs after the current partition on the audio track duration. Repeatogram component 114 may be configured to assign a negative value to the time between the correlated partition and the current partition if the correlated partition occurs before the current partition on the audio track duration. Repeatogram component 114 may be configured to plot the positive time value on the y-axis representing positive values. Repeatogram component 114 may be configured to plot the negative time value on the y-axis representing negative values.
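The construction of the repeatogram dataset, with the current partition time on the x-axis and the signed time to the most correlated partition on the y-axis, can be sketched as follows. The function name `build_repeatogram`, the dictionary layout of each data point, and the use of the correlation strength directly as the point size (which is one monotonically increasing mapping among many) are assumptions for illustration.

```python
def build_repeatogram(starts, best_matches):
    """Build data points from {current index: (correlated index, correlation strength)}."""
    points = []
    for current, (correlated, strength) in sorted(best_matches.items()):
        # Positive if the correlated partition occurs after the current one,
        # negative if it occurs before.
        offset = starts[correlated] - starts[current]
        points.append({"x": starts[current], "y": offset, "size": strength})
    return points


# Hypothetical partition start times and most correlated partitions.
starts = [0.0, 2.0, 4.0, 6.0]
best_matches = {0: (2, 0.9), 3: (1, 0.5)}
points = build_repeatogram(starts, best_matches)
```

Each data point could then be passed to any plotting tool, with `size` controlling the marker size so that stronger correlations appear as larger points.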
  • repeatogram 504 is constructed from audio track 502 partitioned into partitions of partition size 518 .
  • Repeatogram 504 is a visual representation of repeated partitions within audio track 502 .
  • Repeatogram 504 displays the dataset recorded by repeatogram component 114 .
  • the dataset displayed by repeatogram 504 is a plot of multiple correlations in a two-dimensional time space characterized by x-axis 506 representing audio track duration, including the current partition time, and y-axis 508 representing a time from a current partition to a correlated partition.
  • Multiple partitions being sequentially repeated along the audio track duration are represented as solid lines 510 on repeatogram 504 .
  • Partitions that do not sequentially repeat along the audio track duration are represented as broken lines 512 on repeatogram 504 .
  • Partitions that do not repeat are not represented by either solid or broken lines on repeatogram 504 .
  • server(s) 102 , client computing platform(s) 104 , and/or external resources 120 may be operatively linked via one or more electronic communication links.
  • electronic communication links may be established, at least in part, via a network such as the Internet and/or other networks. It will be appreciated that this is not intended to be limiting, and that the scope of this disclosure includes implementations in which server(s) 102 , client computing platform(s) 104 , and/or external resources 120 may be operatively linked via some other communication media.
  • a given client computing platform 104 may include one or more processors configured to execute computer program components.
  • the computer program components may be configured to enable a producer and/or user associated with the given client computing platform 104 to interface with system 100 and/or external resources 120 , and/or provide other functionality attributed herein to client computing platform(s) 104 .
  • the given client computing platform 104 may include one or more of a desktop computer, a laptop computer, a handheld computer, a NetBook, a Smartphone, a gaming console, and/or other computing platforms.
  • External resources 120 may include sources of information, hosts and/or providers of virtual environments outside of system 100 , external entities participating with system 100 , and/or other resources. In some implementations, some or all of the functionality attributed herein to external resources 120 may be provided by resources included in system 100 .
  • Server(s) 102 may include electronic storage 122 , one or more processors 124 , and/or other components. Server(s) 102 may include communication lines, or ports to enable the exchange of information with a network and/or other computing platforms. Illustration of server(s) 102 in FIG. 1 is not intended to be limiting. Server(s) 102 may include a plurality of hardware, software, and/or firmware components operating together to provide the functionality attributed herein to server(s) 102 . For example, server(s) 102 may be implemented by a cloud of computing platforms operating together as server(s) 102 .
  • Electronic storage 122 may include electronic storage media that electronically stores information.
  • the electronic storage media of electronic storage 122 may include one or both of system storage that is provided integrally (i.e., substantially non-removable) with server(s) 102 and/or removable storage that is removably connectable to server(s) 102 via, for example, a port (e.g., a USB port, a firewire port, etc.) or a drive (e.g., a disk drive, etc.).
  • Electronic storage 122 may include one or more of optically readable storage media (e.g., optical disks, etc.), magnetically readable storage media (e.g., magnetic tape, magnetic hard drive, floppy drive, etc.), electrical charge-based storage media (e.g., EEPROM, RAM, etc.), solid-state storage media (e.g., flash drive, etc.), and/or other electronically readable storage media.
  • the electronic storage 122 may include one or more virtual storage resources (e.g., cloud storage, a virtual private network, and/or other virtual storage resources).
  • Electronic storage 122 may store software algorithms, information determined by processor(s) 124 , information received from server(s) 102 , information received from client computing platform(s) 104 , and/or other information that enables server(s) 102 to function as described herein.
  • Processor(s) 124 may be configured to provide information processing capabilities in server(s) 102 .
  • processor(s) 124 may include one or more of a digital processor, an analog processor, a digital circuit designed to process information, an analog circuit designed to process information, a state machine, and/or other mechanisms for electronically processing information.
  • Although processor(s) 124 is shown in FIG. 1 as a single entity, this is for illustrative purposes only.
  • processor(s) 124 may include a plurality of processing units. These processing units may be physically located within the same device, or processor(s) 124 may represent processing functionality of a plurality of devices operating in coordination.
  • the processor(s) 124 may be configured to execute computer readable instruction components 106 , 108 , 110 , 112 , 114 and/or other components.
  • the processor(s) 124 may be configured to execute components 106 , 108 , 110 , 112 , 114 and/or other components by software; hardware; firmware; some combination of software, hardware, and/or firmware; and/or other mechanisms for configuring processing capabilities on processor(s) 124 .
  • Although components 106 , 108 , 110 , 112 , and 114 are illustrated in FIG. 1 as being co-located within a single processing unit, in implementations in which processor(s) 124 includes multiple processing units, one or more of components 106 , 108 , 110 , 112 , and/or 114 may be located remotely from the other components.
  • the description of the functionality provided by the different components 106 , 108 , 110 , 112 , and/or 114 described herein is for illustrative purposes, and is not intended to be limiting, as any of components 106 , 108 , 110 , 112 , and/or 114 may provide more or less functionality than is described.
  • processor(s) 124 may be configured to execute one or more additional components that may perform some or all of the functionality attributed herein to one of components 106 , 108 , 110 , 112 , and/or 114 .
  • FIG. 6 illustrates a method 600 for constructing a dataset representing repeated sounds within an audio track, in accordance with one or more implementations.
  • The operations of method 600 presented below are intended to be illustrative. In some implementations, method 600 may be accomplished with one or more additional operations not described, and/or without one or more of the operations discussed. Additionally, the order in which the operations of method 600 are illustrated in FIG. 6 and described below is not intended to be limiting.
  • Method 600 may be implemented in one or more processing devices (e.g., a digital processor, an analog processor, a digital circuit designed to process information, an analog circuit designed to process information, a state machine, and/or other mechanisms for electronically processing information).
  • The one or more processing devices may include one or more devices executing some or all of the operations of method 600 in response to instructions stored electronically on an electronic storage medium.
  • The one or more processing devices may include one or more devices configured through hardware, firmware, and/or software to be specifically designed for execution of one or more of the operations of method 600.
  • At an operation 602, audio information may be obtained from an audio track of an audio track duration. Operation 602 may be performed by one or more physical processors executing an audio track component that is the same as or similar to audio track component 106, in accordance with one or more implementations.
  • At an operation 604, an audio track may be partitioned into partitions of a partition size occurring at different time periods along the audio track duration. Operation 604 may be performed by one or more physical processors executing a partition component that is the same as or similar to partition component 108, in accordance with one or more implementations.
  • At an operation 606, a current partition of the audio track duration may be compared to remaining partitions. Operation 606 may be performed by one or more physical processors executing a comparison component that is the same as or similar to comparison component 110, in accordance with one or more implementations.
  • At an operation 608, a correlated partition for the current partition from among the remaining partitions of the track duration may be determined. Operation 608 may be performed by one or more physical processors executing a correlation component that is the same as or similar to correlation component 112, in accordance with one or more implementations.
  • At an operation 610, the correlation between the current partition and the correlated partition may be recorded.
  • At an operation 612, the recorded correlation may be organized to represent the partition time period of the correlated partition as a function of the partition time period of the current partition.
  • Operations 610 and 612 may be performed by one or more physical processors executing a repeatogram component that is the same as or similar to repeatogram component 114, in accordance with one or more implementations.
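By way of non-limiting illustration, the operations of method 600 may be sketched end to end as follows. The sketch assumes the audio information is a plain list of samples and uses Pearson correlation of raw samples as the similarity measure; the names `pearson` and `method_600` are hypothetical, and the disclosure does not prescribe a particular correlation measure.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def method_600(samples, sample_rate, partition_seconds):
    """Return (current_time, correlated_time, correlation) records, one per partition."""
    size = int(partition_seconds * sample_rate)
    # Operation 604: split the track into equal, non-overlapping partitions.
    partitions = [samples[i:i + size] for i in range(0, len(samples) - size + 1, size)]
    records = []
    for i, current in enumerate(partitions):
        # Operation 606: compare the current partition to every remaining one.
        scores = [(pearson(current, other), j)
                  for j, other in enumerate(partitions) if j != i]
        if not scores:
            continue
        # Operation 608: keep the best-scoring partition as the correlated one.
        best_score, best_j = max(scores)
        # Operations 610 and 612: record the correlation, keyed by partition time.
        records.append((i * partition_seconds, best_j * partition_seconds, best_score))
    return records
```

Each record pairs the time of a current partition with the time of its correlated partition, which is the relationship operation 612 organizes.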

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Acoustics & Sound (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Health & Medical Sciences (AREA)
  • Signal Processing (AREA)
  • Computational Linguistics (AREA)
  • Library & Information Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Auxiliary Devices For Music (AREA)

Abstract

A dataset representing repeated sounds within a musical composition recorded on an audio track may be constructed. An audio track duration of an audio track may be partitioned into partitions of a partition size. A current partition may be compared to remaining partitions of the audio track. Audio information for the current partition may be correlated to audio information for remaining partitions to determine a correlated partition for the current partition from among the remaining partitions of the track duration. The correlated partition determined may be identified as most likely to represent the same sound as the current partition. This comparison process may be performed iteratively, for individual ones of the remaining partitions. Correlation results of the comparison process may be recorded to represent the partition time period of the correlated partition as a function of partition time period of the current partition.

Description

FIELD
The disclosure relates to constructing a dataset representing similar segments in a music composition using audio energy.
BACKGROUND
Musical compositions may be characterized by being self-similar. A musical composition may contain acoustically similar segments throughout its duration.
SUMMARY
A musical composition recorded on an audio track may include segments or partitions that are repeated one or more times throughout a duration of the audio track. An algorithm may compare a partition of the audio track to the audio track itself to determine which other partition of the audio track correlates best with the partition. A minimum distance between the partition and other partitions being compared may be set. The minimum distance may be determined by a user, set by a minimum distance parameter, and/or otherwise determined. The minimum distance may be constant and/or may be varied. A time to the most correlated partition may be determined and recorded. A dataset creating a visual representation of partitions being repeated throughout the audio track may be constructed (a “Repeatogram”).
The dataset may be constructed by comparing a current partition of an audio track to all remaining partitions of the audio track, determining a correlated partition that most likely represents the same sound as the current partition, and plotting the time from the current partition to the correlated partition within the audio track on the y-axis while the time of the audio track duration is plotted on the x-axis. This process may be repeated iteratively for one or more partitions of one or more audio tracks.
In some implementations, a system configured for constructing a dataset for representing repeated sounds within an audio track may include one or more servers. The server(s) may be configured to communicate with one or more client computing platforms according to a client/server architecture. The users of the system may access the system via client computing platform(s). The server(s) may be configured to execute one or more computer program components. The computer program components may include one or more of an audio track component, a partition component, a comparison component, a correlation component, a repeatogram component, and/or other components.
The audio track component may be configured to obtain one or more audio tracks from one or more media files. By way of non-limiting illustration, an audio track and/or other audio tracks may be obtained from a media file and/or other media files. The media file may be available within the repository of media files available via the system and/or available on a third party platform, which may be accessible and/or available via the system.
The audio track component may be configured to obtain audio content obtained from one or more audio tracks obtained from one or more media files. Audio content may include musical content. As one example, musical content may be in the form of a musical composition such as a song performance, a classical music performance, an electronic music performance and/or other musical content.
The audio track component may be configured to obtain audio tracks from media files by extracting audio signals from media files, and/or by other techniques. By way of non-limiting illustration, the audio track component may be configured to obtain an audio track by extracting audio signal from the media file. Audio signal may include audio information and may contain sound information. Audio information contained within a media file may be extracted in the form of an audio track.
In some implementations, the audio track component may be configured to extract audio signals associated with one or more frequencies from one or more media files by applying one or more frequency bandpass filters. For example, a frequency bandpass filter applied to the media file may extract an audio signal having frequencies between 1000 Hz and 5000 Hz.
The audio track component may be configured to extract audio features of the audio information obtained from the audio track. Audio features may include audio energy representations, audio frequency representations, harmonic sound information, and/or other features.
In some implementations, the audio track component may be configured to extract one or more audio energy representations from one or more audio tracks. By way of non-limiting illustration, an audio energy representation and/or other representations may be extracted from the audio track.
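By way of non-limiting illustration, one plausible audio energy representation is the mean-square energy of consecutive frames of samples. The disclosure does not define the representation at this point, so the frame-based definition and the name `energy_envelope` below are assumptions.

```python
def energy_envelope(samples, frame_size):
    """Mean-square energy of consecutive, non-overlapping frames of samples.

    Assumption: trailing samples that do not fill a whole frame are ignored.
    """
    return [
        sum(s * s for s in samples[i:i + frame_size]) / frame_size
        for i in range(0, len(samples) - frame_size + 1, frame_size)
    ]
```

A loud frame yields a large value and a silent frame yields zero, so the resulting sequence tracks how energy evolves along the audio track duration.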
In some implementations, the audio track component may be configured to transform one or more audio energy representations into a frequency domain to generate a spectral energy profile of the one or more audio energy representations. By way of non-limiting illustration, the audio track component may be configured to transform an audio energy representation of an audio track into a frequency domain to generate a spectral energy profile of the audio energy representation.
In some implementations, the audio track component may be configured to obtain harmonic sound information representing a harmonic sound from one or more audio tracks. Harmonic information may be obtained by transforming one or more audio energy representations of one or more audio tracks into a frequency space in which energy may be represented as a function of frequency to generate a harmonic energy spectrum of the one or more audio tracks.
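By way of non-limiting illustration, representing energy as a function of frequency may be sketched with a naive discrete Fourier transform. The function name `energy_spectrum` is hypothetical, and a practical implementation would likely use a fast Fourier transform instead of this quadratic-time loop.

```python
import cmath
import math


def energy_spectrum(values, sample_rate):
    """Return (frequency_hz, energy) pairs for DFT bins 0..n//2.

    Energy is the squared magnitude of each DFT coefficient.
    """
    n = len(values)
    spectrum = []
    for k in range(n // 2 + 1):
        coeff = sum(v * cmath.exp(-2j * math.pi * k * t / n)
                    for t, v in enumerate(values))
        spectrum.append((k * sample_rate / n, abs(coeff) ** 2))
    return spectrum
```

The bin with the largest energy gives a rough indication of the dominant frequency, which is one way pitch information could be read from such a spectrum.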
The partition component may be configured to obtain one or more partition sizes. One or more partition sizes may include a partition size value that refers to a portion of an audio track duration. The partition size value may be expressed in time units including seconds, milliseconds, and/or other units. The partition component may be configured to obtain one or more partition size values that may include a partition size generated by a user, a randomly generated partition size, and/or otherwise obtained. By way of non-limiting illustration, a partition size may be obtained.
The partition component may be configured to partition one or more audio track durations of one or more audio tracks into multiple partitions of one or more partition sizes. Individual partitions of one or more partition sizes may span the entirety of the audio track comprised of audio information obtained via audio track component 106 from the audio wave content of one or more audio tracks. By way of non-limiting illustration, the audio track may be partitioned into multiple partitions of the partition size. Individual partitions may occur at different time periods of the audio track duration. By way of non-limiting illustration, individual partitions of the partition size partitioning the audio track may occur at different time periods of the audio track duration.
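By way of non-limiting illustration, the time periods at which partitions occur may be computed as follows. The sketch assumes equal, non-overlapping partitions and drops any remainder shorter than the partition size; the name `partition_track` is hypothetical.

```python
def partition_track(track_duration, partition_size):
    """Return (start, end) time periods, in seconds, of whole partitions.

    Assumption: a trailing remainder shorter than partition_size is dropped.
    """
    count = int(track_duration // partition_size)
    return [(i * partition_size, (i + 1) * partition_size) for i in range(count)]
```

For a 10 second track and a partition size of 2.5 seconds, this yields four partitions starting at 0, 2.5, 5, and 7.5 seconds.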
The comparison component may be configured to compare one or more partitions of one or more audio tracks to remaining one or more partitions of one or more audio tracks. For example, the comparison component may be configured to correlate audio features of one or more partitions of one or more audio tracks. By way of non-limiting illustration, a current partition of the partition size of the audio track may be compared against all remaining partitions of the current partition size of the audio track to correlate individual audio features of individual remaining partitions.
In various implementations, the comparison component may be configured to compare one or more audio features of one or more partitions of one or more audio tracks. For example, the comparison component may be configured to compare one or more audio energy representations of one or more partitions of one or more audio tracks. In some implementations, the comparison component may be configured to compare one or more audio frequency representations of one or more partitions of one or more audio tracks. In yet another implementation, the comparison component may be configured to compare harmonic information of one or more partitions of one or more audio tracks, including pitch of the harmonic sound, harmonic energy of one or more partitions, and/or other features.
In some implementations, the comparison component may be configured to compare audio features of individual partitions of one or more audio tracks within the multi-resolution framework, which is incorporated by reference.
This process performed by the comparison component may be iterative such that the comparison component may compare audio features of the current partition of the partition size of the audio track against remaining partitions of the partition size of the audio track for every partition of the audio track, thereby changing the position of the current partition within the audio track duration with each iteration until the end of the audio track duration has been reached. For example, if the number of the partitions partitioning the audio track duration is x, the comparison component may be configured to perform the comparison process x times. First, a partition at a first audio track duration position may be compared to x−1 partitions; then, at the next iteration, the comparison component may be configured to compare a partition at a second audio track duration position to x−1 partitions, and so on, until the last partition of the number of partitions is reached. The system may accumulate a number of transmitted correlation results obtained from the comparison component. The correlation results may be transmitted to the system and a determination for the most accurate result during each iteration may be made.
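By way of non-limiting illustration, the iteration described above may be sketched as a loop that accumulates every pairwise result, so that x partitions yield x iterations of x−1 comparisons each. The sketch assumes each partition has already been reduced to a feature vector and uses negative Euclidean distance as a stand-in similarity; `similarity` and `compare_all` are hypothetical names.

```python
import math


def similarity(a, b):
    """Negative Euclidean distance, so that a larger value means more alike."""
    return -math.dist(a, b)


def compare_all(features):
    """Compare each partition's feature vector against all remaining partitions.

    Returns (results, comparisons), where results[i][j] is the similarity of
    current partition i to remaining partition j.
    """
    results = {}
    comparisons = 0
    for i, current in enumerate(features):
        scores = {}
        for j, other in enumerate(features):
            if j == i:
                continue  # a partition is not compared with itself
            scores[j] = similarity(current, other)
            comparisons += 1
        results[i] = scores
    return results, comparisons
```

With four partitions the loop performs 4 × 3 = 12 comparisons, and the best match for partition `i` is `max(results[i], key=results[i].get)`.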
In various implementations, the comparison component may be configured to apply one or more constraint parameters to control the comparison process. The comparison constraint parameters may include one or more of setting a minimum distance between partitions being compared, limiting a comparison time, limiting frequency bands, limiting a number of comparison iterations, and/or other constraints.
The comparison component may be configured to apply a minimum distance parameter when comparing the current partition of the partition size of the audio track against the remaining partitions of the partition size of the audio track. The minimum distance parameter may refer to a portion of the audio track duration between the current partition and the remaining partitions. The minimum distance parameter applied may be constant and/or may be varied with every comparison iteration. By way of non-limiting illustration, a certain portion of the track duration corresponding to a distance between a current partition and remaining partitions may be removed from a comparison process. For example, a minimum distance parameter corresponding to a shorter distance between the current partition and the remaining partitions may result in finding correlated partitions characterized by a short distance to the current partition (e.g., a drum beat repeating at every measure). The minimum distance parameter corresponding to a longer distance between the current partition and the remaining partitions may result in finding correlated partitions characterized by a longer distance to the current partition.
The minimum distance parameter may be set by a system, selected by a user, and/or otherwise obtained. In some implementations, the minimum distance parameter may include values that are periodic and cycle through a set of minimum distances with each iteration. For example, a minimum distance parameter may include values representing a distance from a current partition to remaining partitions equal to 0.5 second, 1 second, 2 seconds, 4 seconds, 8 seconds, 16 seconds, 32 seconds, 64 seconds, 128 seconds, 256 seconds, and/or include other values.
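By way of non-limiting illustration, a minimum distance parameter that cycles through a set of values with each iteration may be sketched as follows. The sketch assumes distance is measured between partition start times; `eligible_partitions` and `schedule` are hypothetical names.

```python
from itertools import cycle


def eligible_partitions(current_index, count, partition_seconds, min_distance_seconds):
    """Indices of remaining partitions at least min_distance_seconds away.

    Assumption: distance is the difference between partition start times.
    """
    return [
        j for j in range(count)
        if j != current_index
        and abs(j - current_index) * partition_seconds >= min_distance_seconds
    ]


def schedule(count, partition_seconds, min_distances):
    """Pair each iteration with the next minimum distance, cycling through the set."""
    return [
        (i, d, eligible_partitions(i, count, partition_seconds, d))
        for i, d in zip(range(count), cycle(min_distances))
    ]
```

A small minimum distance leaves nearby partitions eligible, which favors short-range repeats, whereas a large one restricts the comparison to distant partitions.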
The comparison component may be configured to determine a time it took to compare the current partition of the partition size of the audio track against the remaining partitions of the partition size of the audio track. Time taken to compare audio features of the current partition of the audio track to audio features of the remaining individual partitions of the audio track may be transmitted to the system. The comparison component may utilize the time taken to correlate audio features of the current partition in subsequent comparison iterations. For example, the time taken to compare a current partition to the remaining partitions may be equal to 5 seconds. The comparison component may be configured to limit the next comparison iteration at a subsequent temporal window to 5 seconds. In one implementation, the time taken to compare the initial current partition may be utilized by the other comparison constraint parameters and/or used as a constant value.
The comparison component may be configured to limit the audio track duration of one or more audio tracks during the comparison process by applying a comparison window set by a comparison window parameter. The comparison window parameter may include a time of audio track duration to which the comparison may be limited, a position of the comparison window, including a start position and an end position, and/or other constraints. This value may be predetermined by the system, set by a user, and/or otherwise obtained.
In some implementations, the comparison component may be configured to limit the audio track duration such that the comparison window parameter may not be greater than 50 percent of the audio track duration. For example, if an audio track is 500 seconds long, then the length of the comparison window set by the comparison window parameter may not be greater than 250 seconds.
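By way of non-limiting illustration, the 50 percent limit on the comparison window may be sketched as a simple clamp. The name `clamp_comparison_window` and the choice to shift the start position so the window stays inside the track are assumptions.

```python
def clamp_comparison_window(track_duration, start, length, max_fraction=0.5):
    """Return (start, end) of a comparison window limited to a fraction of the track.

    Assumption: if the window would run past the end of the track, its start
    position is moved earlier rather than its length being reduced further.
    """
    length = min(length, max_fraction * track_duration)
    start = max(0.0, min(start, track_duration - length))
    return start, start + length
```

For a 500 second track, a requested 400 second window starting at 100 seconds is reduced to 250 seconds, giving a window from 100 to 350 seconds.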
The comparison window parameter may have a predetermined start position that may be generated by the system and/or may be based on user input. System 100 may generate a start position of the comparison window based on the audio track duration. For example, the start position may be randomly set to a portion of the audio track duration. In some implementations, the user may generate the start position of the comparison window based on specific audio features of the audio track. For example, a user may know that an audio track may contain audio features in an introductory portion of the audio track that represent the same sound captured at a final portion of the audio track. For example, a musical composition may be characterized by a number of sections that may be recombined and repeated in different ways throughout the composition. An introductory section may often contain a primary theme that may be repeated often, a middle section may contain an original theme that may contain elements of the primary theme, and a final section may contain a restatement of the primary theme. Thus, audio features associated with the introductory section and the final section may be used to generate the Repeatogram.
The comparison component may be configured to exclude one or more portions of one or more audio tracks from the comparison process during every comparison iteration based on the comparison window parameter. The comparison component may be configured to exclude the same and/or different portions of one or more audio tracks from the comparison process. For example, the comparison component may be configured to exclude a portion of the audio track during every iteration performed by the comparison component.
In some implementations, the comparison component may be configured to compare audio features of the current partition of the audio track against audio features of the remaining partitions of the audio track within the multi-resolution framework, which is incorporated by reference.
For example, the comparison component may be configured to compare audio features of the current partition of the audio track against remaining partitions of the audio track at a mid-resolution level. Audio features of individual partitions of the audio track at the mid-resolution level may be compared at the mid-resolution level to correlate audio features between the current partition of the audio track and the remaining partitions of the audio track. The result of a first comparison may identify correlated audio features from the current partition and the remaining partitions of the audio track that may represent energy in the same sound. The result of the first comparison may be transmitted to the system after the first comparison is completed.
The second comparison may be performed at a level of resolution that may be higher than the mid-resolution level. Audio features of individual partitions of the audio track at the higher resolution level may be compared at the higher resolution level to correlate audio features between the current partition of the audio track and the remaining partitions of the audio track. The result of the second comparison may be transmitted to the system.
This process may be iterative such that the comparison component may compare audio features of the current partition of the audio track against audio features of the remaining partitions of the audio track at every resolution level, thereby increasing the resolution with each iteration until the highest level of resolution is reached. For example, if the number of resolution levels within an individual audio track is finite, the comparison component may be configured to compare audio features at a mid-resolution level first; then, at the next iteration, the comparison component may be configured to compare audio features at a resolution level higher than the resolution level of the previous iteration, and so on. The last iteration may be performed at the highest resolution level. The system may accumulate a number of transmitted correlation results obtained from the comparison component. The correlation results may be transmitted to the system and a determination for the most accurate result may be made.
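By way of non-limiting illustration, comparing at successively higher resolution levels may be sketched as follows. The multi-resolution framework itself is defined in a document incorporated by reference and not reproduced here, so the sketch assumes that a lower resolution level is obtained by averaging groups of adjacent values and that similarity is a squared distance; all names are hypothetical.

```python
def downsample(values, factor):
    """Average consecutive groups of `factor` values (a lower resolution level)."""
    return [
        sum(values[i:i + factor]) / factor
        for i in range(0, len(values) - factor + 1, factor)
    ]


def squared_distance(a, b):
    """Sum of squared differences; smaller means more alike."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def multi_resolution_compare(current, candidates, factors=(4, 2, 1)):
    """Return [(factor, best_candidate_key), ...] from coarse to fine.

    `candidates` maps a partition key to its feature values. A factor of 1
    is the highest resolution level, whose result is accumulated last.
    """
    results = []
    for factor in factors:
        coarse_current = downsample(current, factor)
        best = min(
            candidates,
            key=lambda j: squared_distance(coarse_current,
                                           downsample(candidates[j], factor)),
        )
        results.append((factor, best))
    return results
```

In the accumulated results, a candidate that merely resembles the current partition on average may win at a coarse level yet lose at the highest resolution level, which is why the final determination is deferred until all levels have been compared.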
The correlation component may be configured to determine a correlated partition for the current partition from among the remaining partitions of the audio track that is most likely to represent the same sounds as the current partition. By way of non-limiting illustration, the correlation component may be configured to determine a correlated partition for the current partition from among the remaining partitions of the audio track based on the results of comparing the current partition of the partition size, obtained by the partition component, via the comparison component to correlate audio features obtained by the audio track component, and/or based on other techniques. The correlated partition may reflect a partition that most likely represents the same sound as the current partition.
In some implementations, the correlation component may be configured to determine multiple correlated partitions between the current partition of the audio track and the remaining partitions of the audio track. Individual correlated partitions may be based on comparing individual audio features of one or more partitions of the audio track via the comparison component, as described above. The correlation component may be configured to assign a weight to individual correlated partitions. The correlation component may be configured to determine a final correlated partition by computing weighted averages of multiple correlated partitions and/or by performing other computations.
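By way of non-limiting illustration, one reading of the weighted-average step is to combine per-feature correlation scores for each candidate partition and select the candidate with the highest combined score. This interpretation, the feature names, and the function name `final_correlated_partition` are assumptions.

```python
def final_correlated_partition(scores_by_feature, weights):
    """Pick the candidate with the highest weighted-average correlation score.

    scores_by_feature: {feature_name: {candidate_index: score}}
    weights:           {feature_name: weight}
    Assumption: every feature scores the same set of candidates.
    """
    total_weight = sum(weights.values())
    candidates = next(iter(scores_by_feature.values())).keys()
    combined = {
        j: sum(weights[f] * scores_by_feature[f][j] for f in weights) / total_weight
        for j in candidates
    }
    best = max(combined, key=combined.get)
    return best, combined
```

Raising the weight of one audio feature lets that feature dominate the final determination, so the same per-feature scores may yield different final correlated partitions under different weightings.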
The repeatogram component may be configured to record the correlation between the current partition of the audio track and the correlated partition of the audio track. By way of non-limiting illustration, the repeatogram component may be configured to record a time from the current partition to the most correlated partition determined by the correlation component, and/or based on other techniques. With every iteration performed by the comparison component, the repeatogram component may be configured to record a correlation such that a time from a next current partition to the most correlated partition with the next partition is recorded. The system may accumulate a number of records associated with times between a current partition and a most correlated partition transmitted by the repeatogram component.
The repeatogram component may be configured to construct a dataset representing multiple correlations determined by the correlation component as a result of multiple iterations performed by the comparison component. The repeatogram component may be configured to construct a dataset that may visually represent repeated partitions of the audio track by plotting multiple correlations in a two-dimensional time space as data points with the size of individual data points monotonically increasing with correlation strength. The two-dimensional time space may be characterized by a two-coordinate system in which an x-axis may represent the audio duration time, including the current partition time, and a y-axis may represent a time from the current partition to the correlated partition. By way of non-limiting illustration, the repeatogram component may be configured to plot the time from the current partition to the most correlated partition determined by the correlation component on the y-axis as a function of the current partition time on the x-axis. The repeatogram component may be configured to construct the dataset representing every iteration performed by the comparison component such that every time from a next current partition to the most correlated partition with the next partition recorded is plotted in the two-dimensional time space.
The repeatogram component may be configured to include positive and negative values on the y-axis representing the time from the current partition to the correlated partition. The value of the time from the current partition to the correlated partition may be based on the relative position of the current partition to the correlated partition within the audio track duration and/or based on other techniques. By way of non-limiting illustration, the repeatogram component may be configured to assign a positive value to the time between the correlated partition and the current partition if the correlated partition occurs after the current partition on the audio track duration. The repeatogram component may be configured to assign a negative value to the time between the correlated partition and the current partition if the correlated partition occurs before the current partition on the audio track duration. The repeatogram component may be configured to plot the positive time value on the portion of the y-axis representing positive values. The repeatogram component may be configured to plot the negative time value on the portion of the y-axis representing negative values.
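By way of non-limiting illustration, the data points of the dataset may be sketched as follows, with the current partition time on the x-axis, the signed time to the correlated partition on the y-axis, and a point size that increases monotonically with correlation strength. The linear size mapping and the names `marker_size` and `repeatogram_points` are assumptions.

```python
def marker_size(strength, base=1.0, scale=9.0):
    """Point size that increases monotonically with correlation strength.

    Assumption: negative strengths are drawn at the base size.
    """
    return base + scale * max(0.0, strength)


def repeatogram_points(matches, partition_seconds):
    """Build data points from (current_index, correlated_index, strength) tuples."""
    points = []
    for current, correlated, strength in matches:
        points.append({
            # x-axis: time of the current partition within the track
            "x": current * partition_seconds,
            # y-axis: positive if the correlated partition occurs later,
            # negative if it occurs earlier
            "y": (correlated - current) * partition_seconds,
            "size": marker_size(strength),
        })
    return points
```

A repeat found four partitions later and its mirror-image match four partitions earlier produce points with equal magnitude and opposite sign on the y-axis.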
These and other objects, features, and characteristics of the system and/or method disclosed herein, as well as the methods of operation and functions of the related elements of structure and the combination of parts and economies of manufacture, will become more apparent upon consideration of the following description and the appended claims with reference to the accompanying drawings, all of which form a part of this specification, wherein like reference numerals designate corresponding parts in the various figures. It is to be expressly understood, however, that the drawings are for the purpose of illustration and description only and are not intended as a definition of the limits of the invention. As used in the specification and in the claims, the singular form of “a”, “an”, and “the” include plural referents unless the context clearly dictates otherwise.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 illustrates a system for constructing a dataset representing repeated sounds within an audio track, in accordance with one or more implementations.
FIG. 2 illustrates an exemplary representation of obtaining an audio track, in accordance with one or more implementations.
FIG. 3 illustrates an exemplary schematic of partitioning an audio track duration into partitions of varying partition size, in accordance with one or more implementations.
FIG. 4 illustrates an exemplary schematic of a comparison process between a current partition of an audio track and remaining partitions into which the audio track was partitioned, in accordance with one or more implementations.
FIG. 5 illustrates an exemplary schematic of a dataset constructed by comparing one or more partition of the audio track to the remaining partitions, in accordance with one or more implementations.
FIG. 6 illustrates a method for constructing a dataset representing repeated sounds within an audio track, in accordance with one or more implementations.
DETAILED DESCRIPTION
FIG. 1 illustrates a system 100 for constructing a dataset representing repeated sounds within an audio track, in accordance with one or more implementations. As is illustrated in FIG. 1, system 100 may include one or more server(s) 102. Server(s) 102 may be configured to communicate with one or more client computing platforms 104 according to a client/server architecture. The users of system 100 may access system 100 via client computing platform(s) 104. Server(s) 102 may be configured to execute one or more computer program components. The computer program components may include one or more of audio track component 106, partition component 108, comparison component 110, correlation component 112, repeatogram component 114, and/or other components.
A repository of media files may be available via system 100 (e.g., via electronic storage 122 and/or other storage location). The repository of media files may be associated with different users. In some implementations, system 100 and/or server(s) 102 may be configured for various types of media files that may include video files that include audio content, audio files, and/or other types of files that include some audio content. Other types of media items may include one or more of audio files (e.g., music, podcasts, audio books, and/or other audio files), multimedia presentations, photos, slideshows, and/or other media files. The media files may be received from one or more storage locations associated with client computing platform(s) 104, server(s) 102, and/or other storage locations where media files may be stored. Client computing platform(s) 104 may include one or more of a cellular telephone, a smartphone, a digital camera, a laptop, a tablet computer, a desktop computer, a television set-top box, a smart TV, a gaming console, and/or other client computing platforms. In some implementations, the plurality of media files may include audio files that may not contain video content.
Audio track component 106 may be configured to obtain one or more audio tracks from one or more media files. By way of non-limiting illustration, an audio track and/or other audio tracks may be obtained from a media file and/or other media files. The media file may be available within the repository of media files available via system 100 and/or available on a third party platform, which may be accessible and/or available via system 100.
Audio track component 106 may be configured to obtain audio content from one or more audio tracks obtained from one or more media files. Audio content may include musical content. As one example, musical content may be in the form of a musical composition such as a song performance, a classical music performance, an electronic music performance, and/or other musical content.
Audio track component 106 may be configured to obtain audio tracks from media files by extracting audio signals from media files, and/or by other techniques. By way of non-limiting illustration, audio track component 106 may be configured to obtain the audio track by extracting an audio signal from the media file. For example and referring to FIG. 2, audio track 202 may contain audio information. Audio information may contain sound information which may be graphically visualized as a waveform of sound pressure 205 as a function of time.
The sound wave's amplitude is mapped on the vertical axis with time on the horizontal axis. Thus, the audio information contained within a media file may be extracted in the form of an audio track.
Referring back to FIG. 1, in some implementations, audio track component 106 may be configured to extract audio signals associated with one or more frequencies from one or more media files by applying one or more frequency bandpass filters. For example, a frequency bandpass filter applied to the media file may extract an audio signal having frequencies between 1000 Hz and 5000 Hz.
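By way of non-limiting illustration, the bandpass extraction described above can be sketched as follows. This is a minimal sketch and not the disclosed implementation: it assumes an idealized filter that zeroes every discrete Fourier transform bin outside the pass band, and the names `dft`, `inverse_dft`, and `bandpass`, the 16000 Hz sample rate, and the two test tones are all hypothetical.

```python
import cmath
import math


def dft(signal):
    """Naive discrete Fourier transform (O(n^2)); adequate for short illustrative signals."""
    n = len(signal)
    return [
        sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def inverse_dft(spectrum):
    """Inverse transform, returning the real part of each reconstructed sample."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
        for t in range(n)
    ]


def bandpass(signal, sample_rate, low_hz, high_hz):
    """Keep only frequency content between low_hz and high_hz (idealized filter)."""
    n = len(signal)
    spectrum = dft(signal)
    for k in range(n):
        # Bin k and its mirror bin n - k both carry frequency min(k, n - k) * sample_rate / n.
        freq = min(k, n - k) * sample_rate / n
        if not (low_hz <= freq <= high_hz):
            spectrum[k] = 0
    return inverse_dft(spectrum)


# Hypothetical input: a 500 Hz tone mixed with a 2000 Hz tone.
sample_rate = 16000
n = 160  # bin spacing of 100 Hz, so both tones fall exactly on a bin
low_tone = [math.sin(2 * math.pi * 500 * t / sample_rate) for t in range(n)]
high_tone = [math.sin(2 * math.pi * 2000 * t / sample_rate) for t in range(n)]
mixed = [a + b for a, b in zip(low_tone, high_tone)]

# Only the 2000 Hz tone lies inside the 1000 Hz to 5000 Hz pass band.
filtered = bandpass(mixed, sample_rate, 1000, 5000)
```

A practical system would more likely apply a designed filter from a signal-processing library; the bin-zeroing approach is used here only because it keeps the sketch self-contained.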
Audio track component 106 may be configured to extract audio features of the audio information obtained from the audio track. Audio features may include audio energy representations, audio frequency representations, harmonic sound information, and/or other features.
In some implementations, audio track component 106 may be configured to extract one or more audio energy representations from one or more audio tracks. By way of non-limiting illustration, an audio energy representation and/or other representations may be extracted from the audio track.
In some implementations, audio track component 106 may be configured to transform one or more audio energy representations into a frequency domain to generate a spectral energy profile of the one or more audio energy representations. By way of non-limiting illustration, audio track component 106 may be configured to transform the audio energy representation of the audio track into a frequency domain to generate a spectral energy profile of the audio energy representation.
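By way of non-limiting illustration, one possible reading of the two steps above is sketched below. The sketch assumes that the audio energy representation is the mean squared amplitude of non-overlapping frames and that the spectral energy profile is the squared magnitude of a discrete Fourier transform of that envelope; neither choice is stated in the disclosure, and the function names and synthetic signal are hypothetical.

```python
import cmath
import math


def energy_envelope(samples, frame_len):
    """Mean squared amplitude per non-overlapping frame (an assumed energy representation)."""
    return [
        sum(s * s for s in samples[i:i + frame_len]) / frame_len
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def spectral_energy_profile(values):
    """Squared DFT magnitude for bins 0 through n // 2 of a real-valued sequence."""
    n = len(values)
    profile = []
    for k in range(n // 2 + 1):
        coeff = sum(values[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        profile.append(abs(coeff) ** 2)
    return profile


# Hypothetical signal: 64 frames of 8 samples whose loudness pulses 4 times over the track.
samples = []
for frame in range(64):
    amplitude = 1 + 0.5 * math.cos(2 * math.pi * 4 * frame / 64)
    samples.extend([amplitude, -amplitude] * 4)

envelope = energy_envelope(samples, frame_len=8)
profile = spectral_energy_profile(envelope)

# Ignoring bin 0 (the average energy), the strongest bin reflects the pulse rate.
strongest = max(range(1, len(profile)), key=lambda k: profile[k])
```

In this sketch the strongest non-zero bin corresponds to the four loudness pulses, which is the kind of periodic structure a spectral energy profile may expose.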
In some implementations, audio track component 106 may be configured to obtain harmonic sound information representing a harmonic sound from one or more audio tracks. Harmonic information may be obtained by transforming one or more audio energy representations of one or more audio tracks into a frequency space in which energy may be represented as a function of frequency to generate a harmonic energy spectrum of the one or more audio tracks.
Partition component 108 may be configured to obtain one or more partition sizes. One or more partition sizes may include a partition size value that refers to a portion of an audio track duration. Partition size value may be expressed in time units including seconds, milliseconds, and/or other units. Partition component 108 may be configured to obtain partition size values that may include a partition size generated by a user, a randomly generated partition size, and/or otherwise obtained. By way of non-limiting illustration, a partition size may be obtained.
Partition component 108 may be configured to partition one or more audio track durations of one or more audio tracks into multiple partitions of one or more partition sizes. Individual partitions of one or more partition sizes may span the entirety of the audio track comprised of audio information obtained via audio track component 106 from the audio wave content of one or more audio tracks. By way of non-limiting illustration, the audio track may be partitioned into multiple partitions of the partition size. Individual partitions may occur at different time periods of the audio track duration. By way of non-limiting illustration, individual partitions of the partition size partitioning the audio track may occur at different time periods of the audio track duration.
For example, and as illustrated in FIG. 3, audio track 302 of audio track duration 306 may be partitioned into multiple partitions of partition size 318. Audio track 302 of audio track duration 306 may be partitioned into multiple partitions of partition size 328. Partition size 318 may be different than partition size 328. Partition 304 of partition size 318 may occur at time period 308 of audio track duration 306. Partition 305 of partition size 318 may occur at time period 309 of audio track duration 306. Partition 307 of partition size 328 may occur at time period 310 of audio track duration 306. Partition 311 of partition size 328 may occur at time period 312 of audio track duration 306.
Referring back to FIG. 1, comparison component 110 may be configured to compare one or more partitions of one or more audio tracks to remaining one or more partitions of one or more audio tracks. For example, comparison component 110 may be configured to correlate audio features of one or more partitions of one or more audio tracks. By way of non-limiting illustration, a current partition of the partition size of the audio track may be compared against all remaining partitions of the current partition size of the audio track to correlate individual audio features of individual remaining partitions. For example, and as illustrated by FIG. 4, comparison process 404 may compare current partition 410 of partition size 411 of audio track 405 against the remaining partitions of partition size 411. Remaining partitions may be partition 412, partition 414, partition 416, partition 418, partition 420, partition 422, partition 424, and partition 426. Comparison process 404 may compare current partition 410 to each of the remaining partitions in turn, i.e., to partition 412, partition 414, partition 416, partition 418, partition 420, partition 422, partition 424, and partition 426.
Referring back to FIG. 1, in various implementations, comparison component 110 may be configured to compare one or more audio features of one or more partitions of one or more audio tracks. For example, comparison component 110 may be configured to compare one or more audio energy representations of one or more partitions of one or more audio tracks. In some implementations, comparison component 110 may be configured to compare one or more audio frequency representations of one or more partitions of one or more audio tracks. In yet another implementation, comparison component 110 may be configured to compare harmonic information of one or more partitions of one or more audio tracks, including pitch of the harmonic sound, harmonic energy of one or more partitions, and/or other features.
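By way of non-limiting illustration, partitioning one track duration at two different partition sizes can be sketched as follows. The function name `partition_track`, the durations, and the choice to shorten the final partition so that the partitions span the entire track are assumptions made for this sketch.

```python
def partition_track(track_duration, partition_size):
    """Return (start, end) time periods, in seconds, that together span the whole track."""
    if partition_size <= 0:
        raise ValueError("partition_size must be positive")
    partitions = []
    index = 0
    while index * partition_size < track_duration:
        start = index * partition_size
        # Assumption: the last partition is truncated at the end of the track.
        end = min(start + partition_size, track_duration)
        partitions.append((start, end))
        index += 1
    return partitions


# The same hypothetical 10-second track partitioned at two different partition sizes.
coarse = partition_track(10.0, 4.0)
fine = partition_track(10.0, 2.5)
```

Each tuple plays the role of a time period of the track duration at which an individual partition occurs.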
In some implementations, comparison component 110 may be configured to compare audio features of individual partitions of one or more audio track within the multi-resolution framework, which is incorporated by reference.
Referring back to FIG. 1, this process performed by comparison component 110 may be iterative such that comparison component 110 may compare audio features of the current partition of the partition size of the audio track against remaining partitions of the partition size of the audio track for every partition of the audio track, changing the position of the current partition within the audio track duration with each iteration until the end of the audio track duration has been reached. For example, if the number of the partitions partitioning the audio track duration is x, comparison component 110 may be configured to perform the comparison process x times. First, a partition at a first audio track duration position may be compared to x−1 partitions; then, at the next iteration, comparison component 110 may be configured to compare a partition at a second audio track duration position to x−1 partitions, and so on, until the last partition of the number of partitions is reached. System 100 may accumulate a number of transmitted correlation results obtained from comparison component 110. The correlation results may be transmitted to system 100 and a determination for the most accurate result during each iteration may be made.
In various implementations, comparison component 110 may be configured to apply one or more constraint parameters to control the comparison process. The comparison constraint parameters may include one or more of setting a minimum distance between partitions being compared, limiting comparison time, limiting frequency bands, limiting a number of comparison iterations, and/or other constraints.
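By way of non-limiting illustration, the x iterations of x−1 comparisons described above can be sketched as follows. The disclosure does not name a similarity measure; this sketch assumes Pearson correlation between per-partition feature vectors, and the names `correlation` and `compare_all` and the feature values are hypothetical.

```python
import math


def correlation(a, b):
    """Pearson correlation of two equal-length feature vectors (0.0 if either is constant)."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    norm = math.sqrt(
        sum((x - mean_a) ** 2 for x in a) * sum((y - mean_b) ** 2 for y in b)
    )
    return cov / norm if norm else 0.0


def compare_all(partition_features):
    """For each current partition, score it against the x - 1 remaining partitions."""
    results = []
    for current, features in enumerate(partition_features):
        scores = {
            other: correlation(features, other_features)
            for other, other_features in enumerate(partition_features)
            if other != current
        }
        best = max(scores, key=scores.get)
        results.append((current, best, scores[best]))
    return results


# Hypothetical feature vectors for x = 4 partitions.
features = [
    [0.1, 0.9, 0.2, 0.8],  # partition 0
    [0.5, 0.4, 0.6, 0.1],  # partition 1
    [0.1, 0.9, 0.2, 0.8],  # partition 2 repeats partition 0
    [0.9, 0.1, 0.7, 0.3],  # partition 3
]
results = compare_all(features)  # one (current, best match, score) tuple per iteration
```

The total work in this sketch grows as x(x−1) comparisons, which is one reason the constraint parameters described in this disclosure may be useful.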
Comparison component 110 may be configured to apply a minimum distance parameter when comparing the current partition of the partition size of the audio track against the remaining partitions of the partition size of the audio track. The minimum distance parameter may refer to a portion of the audio track duration between the current partition and the remaining partitions. The minimum distance parameter applied may be constant or may be varied with each comparison iteration. By way of non-limiting illustration, a certain portion of the track duration corresponding to a distance between a current partition and the remaining partitions may be removed from the comparison process. For example, a minimum distance parameter corresponding to a shorter distance between the current partition and the remaining partitions may result in finding correlated partitions characterized by a short distance to the current partition (e.g., a drum beat repeating at every measure). The minimum distance parameter corresponding to a longer distance between the current partition and the remaining partitions may result in finding correlated partitions characterized by a longer distance to the current partition.
The minimum distance parameter may be set by a system, selected by a user, and/or otherwise obtained. In some implementations, the minimum distance parameter may include values that are periodic and cycle through a set of minimum distances with each iteration. For example, the minimum distance parameter may include values representing a distance from a current partition to remaining partitions equal to 0.5 second, 1 second, 2 seconds, 4 seconds, 8 seconds, 16 seconds, 32 seconds, 64 seconds, 128 seconds, 256 seconds, and/or include other values.
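By way of non-limiting illustration, applying a minimum distance parameter that cycles through a set of values can be sketched as follows. The sketch assumes that distance is measured between partition start times and uses a shortened, hypothetical cycle of three values; the name `eligible_partitions` is likewise hypothetical.

```python
from itertools import cycle


def eligible_partitions(current_start, partition_starts, min_distance):
    """Keep only partitions at least min_distance seconds away from the current partition."""
    return [
        start
        for start in partition_starts
        if start != current_start and abs(start - current_start) >= min_distance
    ]


# Hypothetical partition start times (seconds) for a partition size of 0.5 second.
starts = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5]
min_distances = cycle([0.5, 1.0, 2.0])  # periodic values, one per comparison iteration

schedule = []
for current in starts[:4]:
    distance = next(min_distances)
    schedule.append((current, distance, eligible_partitions(current, starts, distance)))
```

With a larger minimum distance, nearby partitions are removed from the comparison, so only repeats that occur farther from the current partition can be found in that iteration.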
Comparison component 110 may be configured to determine a time it took to compare the current partition of the partition size of the audio track against the remaining partitions of the partition size of the audio track. Time taken to compare audio features of the current partition of the audio track to audio features of the remaining individual partitions of the audio track may be transmitted to system 100. Comparison component 110 may utilize the time taken to correlate audio features of the current partition in subsequent comparison iterations. For example, time taken to compare a current partition to the remaining partitions may be equal to 5 seconds. Comparison component 110 may be configured to limit the next comparison iteration at a subsequent temporal window to 5 seconds. In one implementation, the time taken to compare the initial current partition may be utilized by the other comparison constraint parameters and/or used as a constant value.
Comparison component 110 may be configured to limit the audio track duration of one or more audio tracks during the comparison process by applying a comparison window set by a comparison window parameter. The comparison window parameter may include a time of audio track duration to which the comparison may be limited, a position of the comparison window, including a start position and an end position, and/or other constraints. This value may be predetermined by system 100, set by a user, and/or otherwise obtained.
In some implementations, comparison component 110 may be configured to limit the audio track duration such that the comparison window set by the comparison window parameter may not be greater than 50 percent of the audio track duration. For example, if an audio track is 500 seconds long, then the length of the comparison window set by the comparison window parameter may not be greater than 250 seconds.
The comparison window parameter may have a predetermined start position that may be generated by system 100 and/or may be based on user input. System 100 may generate a start position of the comparison window based on the audio track duration. For example, the start position may be randomly set to a portion of the audio track duration. In some implementations, the user may generate the start position of the comparison window based on specific audio features of the audio track. For example, a user may know that an audio track may contain audio features in an introductory portion of the audio track that represent the same sound captured at a final portion of the audio track. For example, a musical composition may be characterized by a number of sections that may be recombined and repeated in different ways throughout the composition. An introductory section may often contain a primary theme that may be repeated often, a middle section may contain an original theme that may contain elements of the primary theme, and a final section may contain a restatement of the primary theme. Thus, audio features associated with the introductory section and the final section may be used to generate the repeatogram.
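By way of non-limiting illustration, the 50 percent limit in the example above can be sketched as a clamp on a requested window. The function name `comparison_window`, its parameters, and the choice to shift a window earlier when it would run past the end of the track are assumptions of this sketch.

```python
def comparison_window(track_duration, start, requested_length, max_fraction=0.5):
    """Return (start, end) in seconds, limited to max_fraction of the track duration."""
    # Limit the window length to the allowed fraction of the track duration.
    length = min(requested_length, max_fraction * track_duration)
    # Assumption: a window that would overrun the track is moved earlier, not shortened.
    start = max(0.0, min(start, track_duration - length))
    return (start, start + length)


# A 500-second track: a requested 400-second window is limited to 250 seconds.
window = comparison_window(500.0, start=100.0, requested_length=400.0)

# A window requested near the end of the track is shifted to stay inside it.
late = comparison_window(500.0, start=450.0, requested_length=100.0)
```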
Comparison component 110 may be configured to exclude one or more portions of one or more audio tracks from the comparison process based on the comparison window parameter during every comparison iteration. Comparison component 110 may be configured to exclude same and/or different portion of one or more audio tracks from the comparison process. For example, the comparison window parameter may be set such that a portion of the audio track is excluded during every iteration performed by comparison component 110.
In some implementations, comparison component 110 may be configured to compare audio features of the current partition of the audio track against audio features of the remaining partitions of the audio track within the multi-resolution framework, which is incorporated by reference.
For example, comparison component 110 may be configured to compare audio features of the current partition of the audio track against remaining partitions of the audio track at a mid-resolution level. Audio features of individual partitions of the audio track at the mid-resolution level may be compared at the mid-resolution level to correlate audio features between the current partition of the audio track and the remaining partitions of the audio track. The result of a first comparison may identify correlated audio features from the current partition and the remaining partitions of the audio track that may represent energy in the same sound. The result of the first comparison may be transmitted to system 100 after the first comparison is completed.
The second comparison may be performed at a level of resolution that may be higher than the mid-resolution level. Audio features of individual partitions of the audio track at the higher resolution level may be compared at the higher resolution level to correlate audio features between the current partition of the audio track and the remaining partitions of the audio track. The result of the second comparison may be transmitted to system 100.
This process may be iterative such that comparison component 110 may compare audio features of the current partition of the audio track against audio features of the remaining partitions of the audio track at every resolution level, increasing the resolution with each iteration until the highest level of resolution is reached. For example, if the number of resolution levels within an individual audio track is finite, comparison component 110 may be configured to compare audio features at a mid-resolution level first, then, at the next iteration, comparison component 110 may be configured to compare audio features at a resolution level higher than the resolution level of the previous iteration, and so on. The last iteration may be performed at the highest resolution level. System 100 may accumulate a number of transmitted correlation results obtained from comparison component 110. The correlation results may be transmitted to system 100 and a determination for the most accurate result may be made.
Correlation component 112 may be configured to determine a correlated partition for the current partition from among the remaining partitions of the audio track that is most likely to represent the same sounds as the current partition. By way of non-limiting illustration, correlation component 112 may be configured to determine a correlated partition for the current partition from among the remaining partitions of the audio track based on the results of comparing, via comparison component 110, the current partition of the partition size obtained by partition component 108 to correlate audio features obtained by audio track component 106, and/or based on other techniques. The correlated partition may reflect a partition that most likely represents the same sound as the current partition.
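By way of non-limiting illustration, comparing at successively higher resolution levels can be sketched as follows. The multi-resolution framework itself is incorporated by reference and is not reproduced here; this sketch assumes that a lower resolution is obtained by averaging groups of feature values, that similarity is a sum of squared differences, and that only the closest candidates are carried into the next level. All names and values are hypothetical.

```python
def downsample(features, factor):
    """Average consecutive groups of `factor` values to obtain a coarser resolution."""
    return [
        sum(features[i:i + factor]) / factor
        for i in range(0, len(features) - factor + 1, factor)
    ]


def distance(a, b):
    """Sum of squared differences; a smaller value means more similar."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def coarse_to_fine_match(current, candidates, factors=(4, 2, 1), keep=2):
    """Rank candidates at each level, keeping the `keep` closest before refining."""
    surviving = list(range(len(candidates)))
    for factor in factors:  # each iteration uses a higher resolution; factor 1 is the highest
        coarse_current = downsample(current, factor)
        surviving.sort(
            key=lambda i: distance(coarse_current, downsample(candidates[i], factor))
        )
        surviving = surviving[:keep]
    return surviving[0]


current = [1, 3, 1, 3, 5, 7, 5, 7]
candidates = [
    [2, 2, 2, 2, 6, 6, 6, 6],  # identical to `current` at low resolution only
    [1, 3, 1, 3, 5, 7, 5, 8],  # nearly identical at the highest resolution
    [9, 9, 9, 9, 0, 0, 0, 0],  # dissimilar at every level
]
best = coarse_to_fine_match(current, candidates)
```

In this sketch the first candidate looks best at the two lower levels, and only the final, highest-resolution iteration identifies the second candidate as the closer match.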
In some implementations, correlation component 112 may be configured to determine multiple correlated partitions between the current partition of the audio track and the remaining partitions of the audio track. Individual correlated partitions may be based on comparing individual audio features of one or more partitions of the audio track via comparison component 110, as described above. Correlation component 112 may be configured to assign a weight to individual correlated partitions. Correlation component 112 may be configured to determine a final correlated partition by computing weighted averages of multiple correlated partitions and/or by performing other computations.
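By way of non-limiting illustration, one possible reading of the weighted-average determination above is sketched below. The sketch assumes that each audio feature yields a correlation score for every remaining partition and that the final correlated partition is the one with the highest weighted-average score; the feature names, scores, weights, and the function name `final_correlated_partition` are hypothetical.

```python
def final_correlated_partition(scores_by_feature, weights):
    """Pick the partition with the highest weighted-average score across features."""
    total_weight = sum(weights[name] for name in scores_by_feature)
    # Only partitions scored by every feature are considered.
    partitions = set.intersection(*(set(s) for s in scores_by_feature.values()))
    combined = {
        p: sum(weights[name] * scores[p] for name, scores in scores_by_feature.items())
        / total_weight
        for p in partitions
    }
    best = max(combined, key=combined.get)
    return best, combined[best]


# Hypothetical per-feature correlation scores for remaining partitions 1, 2, and 3.
scores_by_feature = {
    "energy": {1: 0.2, 2: 0.9, 3: 0.6},
    "spectrum": {1: 0.3, 2: 0.5, 3: 0.9},
    "harmonic": {1: 0.1, 2: 0.8, 3: 0.7},
}
weights = {"energy": 0.5, "spectrum": 0.2, "harmonic": 0.3}

best, score = final_correlated_partition(scores_by_feature, weights)
```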
Repeatogram component 114 may be configured to record the correlation between the current partition of the audio track and the correlated partition of the audio track. By way of non-limiting illustration, repeatogram component 114 may be configured to record a time from the current partition to the most correlated partition determined by correlation component 112, and/or based on other techniques. With every iteration performed by comparison component 110, repeatogram component 114 may be configured to record a correlation such that a time from a next current partition to the most correlated partition with the next partition is recorded. System 100 may accumulate a number of records associated with times between a current partition and a most correlated partition transmitted by repeatogram component 114.
Repeatogram component 114 may be configured to construct a dataset representing multiple correlations determined by correlation component 112 as a result of multiple iterations performed by comparison component 110. Repeatogram component 114 may be configured to construct a dataset that may visually represent repeated partitions of the audio track by plotting multiple correlations in a two-dimensional time space as data points with the size of individual data points monotonically increasing with correlation strength. The two-dimensional time space may be characterized by a two-coordinate system in which an x-axis may represent the audio duration time, including the current partition time, and a y-axis may represent a time from the current partition to the correlated partition. By way of non-limiting illustration, repeatogram component 114 may be configured to plot the time from the current partition to the most correlated partition determined by correlation component 112 on the y-axis as a function of the current partition time on the x-axis. Repeatogram component 114 may be configured to construct the dataset representing every iteration performed by comparison component 110 such that every time from a next current partition to the most correlated partition with the next partition recorded is plotted in the two-dimensional time space.
Repeatogram component 114 may be configured to include positive and negative values on the y-axis representing the time from the current partition to the correlated partition. The value of the time from the current partition to the correlated partition may be based on the relative position of the current partition to the correlated partition within the audio track duration and/or based on other techniques. By way of non-limiting illustration, repeatogram component 114 may be configured to assign a positive value to the time between the correlated partition and the current partition if the correlated partition occurs after the current partition on the audio track duration. Repeatogram component 114 may be configured to assign a negative value to the time between the correlated partition and the current partition if the correlated partition occurs before the current partition on the audio track duration. Repeatogram component 114 may be configured to plot the positive time value on the y-axis representing positive values. Repeatogram component 114 may be configured to plot the negative time value on the y-axis representing negative values.
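By way of non-limiting illustration, the dataset described above can be sketched as a list of plot-ready points. The tuple layout, the marker size formula, and the name `repeatogram_points` are assumptions of this sketch; only the sign convention and the requirement that size grow with correlation strength come from the description.

```python
def repeatogram_points(matches):
    """Convert (current_time, correlated_time, strength) into (x, y, marker_size) points."""
    points = []
    for current_time, correlated_time, strength in matches:
        # y > 0: the correlated partition occurs after the current partition;
        # y < 0: the correlated partition occurs before the current partition.
        offset = correlated_time - current_time
        # Hypothetical size mapping that grows with strength for strengths above zero.
        marker_size = 1.0 + 9.0 * max(0.0, strength)
        points.append((current_time, offset, marker_size))
    return points


# Hypothetical matches: (current partition time, correlated partition time, strength).
matches = [(0.0, 8.0, 0.9), (2.0, 10.0, 0.8), (8.0, 0.0, 0.9), (12.0, 4.0, 0.3)]
points = repeatogram_points(matches)
offsets = [y for _, y, _ in points]
```

The resulting points could be passed to any plotting tool, with the current partition time on the x-axis and the signed offset on the y-axis.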
For example, and as illustrated in FIG. 5, repeatogram 504 is constructed from audio track 502 partitioned into partitions of partition size 518. Repeatogram 504 is a visual representation of repeated partitions within audio track 502. Repeatogram 504 displays dataset recorded by repeatogram component 114. The dataset displayed by repeatogram 504 is a plot of multiple correlations in a two-dimensional time space characterized by x-axis 506 representing audio track duration, including the current partition time, and y-axis 508 representing a time from a current partition to a correlated partition. Multiple partitions being sequentially repeated along the audio track duration are represented as solid lines 510 on repeatogram 504. Partitions that do not sequentially repeat along the audio track duration are represented as broken lines 512 on repeatogram 504. Partitions that do not repeat are not represented by either solid or broken lines on repeatogram 504.
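By way of non-limiting illustration, sequentially repeated partitions can be located in a recorded dataset as runs of consecutive partitions that share the same offset. Treating such a run as the counterpart of a solid line is an interpretation made for this sketch, as are the name `constant_offset_runs`, the `min_length` threshold, and the use of `None` for a partition with no repeat.

```python
def constant_offset_runs(offsets, min_length=2):
    """Group consecutive equal offsets into (first_index, last_index, offset) runs."""
    runs = []
    start = 0
    for i in range(1, len(offsets) + 1):
        # Close the current run at the end of the list or when the offset changes.
        if i == len(offsets) or offsets[i] != offsets[start]:
            if offsets[start] is not None and i - start >= min_length:
                runs.append((start, i - 1, offsets[start]))
            start = i
    return runs


# Hypothetical offsets (seconds) for seven consecutive partitions; None means no repeat.
offsets = [8.0, 8.0, 8.0, None, -4.0, 2.0, 2.0]
runs = constant_offset_runs(offsets)
```

Here partitions 0 through 2 repeat together 8 seconds later, and partitions 5 and 6 repeat together 2 seconds later, while the isolated match at partition 4 does not form a run.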
Referring again to FIG. 1, in some implementations, server(s) 102, client computing platform(s) 104, and/or external resources 120 may be operatively linked via one or more electronic communication links. For example, such electronic communication links may be established, at least in part, via a network such as the Internet and/or other networks. It will be appreciated that this is not intended to be limiting, and that the scope of this disclosure includes implementations in which server(s) 102, client computing platform(s) 104, and/or external resources 120 may be operatively linked via some other communication media.
A given client computing platform 104 may include one or more processors configured to execute computer program components. The computer program components may be configured to enable a producer and/or user associated with the given client computing platform 104 to interface with system 100 and/or external resources 120, and/or provide other functionality attributed herein to client computing platform(s) 104. By way of non-limiting example, the given client computing platform 104 may include one or more of a desktop computer, a laptop computer, a handheld computer, a NetBook, a Smartphone, a gaming console, and/or other computing platforms.
External resources 120 may include sources of information, hosts and/or providers of virtual environments outside of system 100, external entities participating with system 100, and/or other resources. In some implementations, some or all of the functionality attributed herein to external resources 120 may be provided by resources included in system 100.
Server(s) 102 may include electronic storage 122, one or more processors 124, and/or other components. Server(s) 102 may include communication lines, or ports to enable the exchange of information with a network and/or other computing platforms. Illustration of server(s) 102 in FIG. 1 is not intended to be limiting. Server(s) 102 may include a plurality of hardware, software, and/or firmware components operating together to provide the functionality attributed herein to server(s) 102. For example, server(s) 102 may be implemented by a cloud of computing platforms operating together as server(s) 102.
Electronic storage 122 may include electronic storage media that electronically stores information. The electronic storage media of electronic storage 122 may include one or both of system storage that is provided integrally (i.e., substantially non-removable) with server(s) 102 and/or removable storage that is removably connectable to server(s) 102 via, for example, a port (e.g., a USB port, a firewire port, etc.) or a drive (e.g., a disk drive, etc.). Electronic storage 122 may include one or more of optically readable storage media (e.g., optical disks, etc.), magnetically readable storage media (e.g., magnetic tape, magnetic hard drive, floppy drive, etc.), electrical charge-based storage media (e.g., EEPROM, RAM, etc.), solid-state storage media (e.g., flash drive, etc.), and/or other electronically readable storage media. The electronic storage 122 may include one or more virtual storage resources (e.g., cloud storage, a virtual private network, and/or other virtual storage resources). Electronic storage 122 may store software algorithms, information determined by processor(s) 124, information received from server(s) 102, information received from client computing platform(s) 104, and/or other information that enables server(s) 102 to function as described herein.
Processor(s) 124 may be configured to provide information processing capabilities in server(s) 102. As such, processor(s) 124 may include one or more of a digital processor, an analog processor, a digital circuit designed to process information, an analog circuit designed to process information, a state machine, and/or other mechanisms for electronically processing information. Although processor(s) 124 is shown in FIG. 1 as a single entity, this is for illustrative purposes only. In some implementations, processor(s) 124 may include a plurality of processing units. These processing units may be physically located within the same device, or processor(s) 124 may represent processing functionality of a plurality of devices operating in coordination. The processor(s) 124 may be configured to execute computer readable instruction components 106, 108, 110, 112, 114 and/or other components. The processor(s) 124 may be configured to execute components 106, 108, 110, 112, 114 and/or other components by software; hardware; firmware; some combination of software, hardware, and/or firmware; and/or other mechanisms for configuring processing capabilities on processor(s) 124.
It should be appreciated that although components 106, 108, 110, 112, and 114 are illustrated in FIG. 1 as being co-located within a single processing unit, in implementations in which processor(s) 124 includes multiple processing units, one or more of components 106, 108, 110, 112, and/or 114 may be located remotely from the other components. The description of the functionality provided by the different components 106, 108, 110, 112, and/or 114 described herein is for illustrative purposes, and is not intended to be limiting, as any of components 106, 108, 110, 112, and/or 114 may provide more or less functionality than is described. For example, one or more of components 106, 108, 110, 112, and/or 114 may be eliminated, and some or all of its functionality may be provided by other ones of components 106, 108, 110, 112, and/or 114. As another example, processor(s) 124 may be configured to execute one or more additional components that may perform some or all of the functionality attributed herein to one of components 106, 108, 110, 112, and/or 114.
FIG. 6 illustrates a method 600 for constructing a dataset representing repeated sounds within an audio track, in accordance with one or more implementations. The operations of method 600 presented below are intended to be illustrative. In some implementations, method 600 may be accomplished with one or more additional operations not described, and/or without one or more of the operations discussed. Additionally, the order in which the operations of method 600 are illustrated in FIG. 6 and described below is not intended to be limiting.
In some implementations, method 600 may be implemented in one or more processing devices (e.g., a digital processor, an analog processor, a digital circuit designed to process information, an analog circuit designed to process information, a state machine, and/or other mechanisms for electronically processing information). The one or more processing devices may include one or more devices executing some or all of the operations of method 600 in response to instructions stored electronically on an electronic storage medium. The one or more processing devices may include one or more devices configured through hardware, firmware, and/or software to be specifically designed for execution of one or more of the operations of method 600.
At an operation 602, audio information may be obtained from an audio track of an audio track duration. Operation 602 may be performed by one or more physical processors executing an audio track component that is the same as or similar to audio track component 106, in accordance with one or more implementations.
At an operation 604, an audio track may be partitioned into partitions of a partition size occurring at different time periods along the audio track duration. Operation 604 may be performed by one or more physical processors executing a partition component that is the same as or similar to partition component 108, in accordance with one or more implementations.
At an operation 606, a current partition of the audio track duration may be compared to remaining partitions. Operation 606 may be performed by one or more physical processors executing a comparison component that is the same as or similar to comparison component 110, in accordance with one or more implementations.
At an operation 608, a correlated partition for the current partition from among the remaining partitions of the track duration may be determined. Operation 608 may be performed by one or more physical processors executing a correlation component that is the same as or similar to correlation component 112, in accordance with one or more implementations.
At an operation 610, the correlation between the current partition and the correlated partition may be recorded. At an operation 612, the recorded correlations may be organized to represent the partition time period of the correlated partition as a function of the partition time period of the current partition. Operations 610 and 612 may be performed by one or more physical processors executing a repeatogram component that is the same as or similar to repeatogram component 114, in accordance with one or more implementations.
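By way of non-limiting illustration, operations 602 through 612 can be chained in a compact sketch. The sketch assumes that raw samples serve as the audio features, that the closest partition by sum of squared differences is the correlated partition, and that a trailing partial partition is dropped; the name `build_repeatogram` and the sample values are hypothetical.

```python
def build_repeatogram(samples, sample_rate, partition_size):
    """Return (current_time, offset) pairs for a track with at least two partitions."""
    size = int(partition_size * sample_rate)
    # Operation 604: partition the track (a trailing partial partition is dropped here).
    partitions = [samples[i:i + size] for i in range(0, len(samples) - size + 1, size)]
    records = []
    for current, features in enumerate(partitions):
        # Operation 606: compare the current partition to the remaining partitions.
        def dissimilarity(other):
            return sum((a - b) ** 2 for a, b in zip(features, partitions[other]))

        remaining = [i for i in range(len(partitions)) if i != current]
        # Operation 608: the closest remaining partition is taken as the correlated partition.
        correlated = min(remaining, key=dissimilarity)
        # Operation 610: record the time from the current to the correlated partition.
        records.append((current * partition_size, (correlated - current) * partition_size))
    # Operation 612: organize the records by current partition time.
    return sorted(records)


# Operation 602: hypothetical audio information with the pattern A, B, A, C.
a, b, c = [1, 0, 1, 0], [0, 0, 1, 1], [1, 1, 1, 0]
samples = a + b + a + c
dataset = build_repeatogram(samples, sample_rate=4, partition_size=1.0)
```

In the resulting dataset, the first partition points 2 seconds forward to its repeat and the third partition points 2 seconds back to the first.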
Although the system(s) and/or method(s) of this disclosure have been described in detail for the purpose of illustration based on what is currently considered to be the most practical and preferred implementations, it is to be understood that such detail is solely for that purpose and that the disclosure is not limited to the disclosed implementations, but, on the contrary, is intended to cover modifications and equivalent arrangements that are within the spirit and scope of the appended claims. For example, it is to be understood that the present disclosure contemplates that, to the extent possible, one or more features of any implementation can be combined with one or more features of any other implementation.
Although the invention has been described in detail for the purpose of illustration based on what is currently considered to be the most practical and preferred implementations, it is to be understood that such detail is solely for that purpose and that the invention is not limited to the disclosed implementations, but, on the contrary, is intended to cover modifications and equivalent arrangements that are within the spirit and scope of the appended claims. For example, it is to be understood that the present invention contemplates that, to the extent possible, one or more features of any embodiment can be combined with one or more features of any other embodiment.

Claims (20)

What is claimed is:
1. A method for constructing a dataset for representing repeated sounds within an audio track, comprising:
(a) obtaining audio information for an audio track recorded over a track duration;
(b) partitioning the track duration into multiple partitions, the individual partitions occurring at different time periods of a partition size along the track duration;
(c) comparing a current partition of the track duration to remaining partitions;
(d) determining a correlated partition for the current partition from among the remaining partitions of the track duration, the correlated partition being identified as most likely to represent the same sound as the current partition;
(e) recording the correlation between the current partition and the correlated partition;
(f) iterating over operations (c)-(e) for individual ones of the remaining partitions; and
organizing the correlations recorded at iterations of operation (e) to represent the partition time period of the correlated partition as a function of partition time period of the current partition.
2. The method of claim 1, wherein the audio track is generated from a media file, the media file including audio and video information.
3. The method of claim 1, wherein the audio information includes an audio energy representation.
4. The method of claim 3, wherein the audio energy representation is filtered into a frequency band to produce a frequency energy representation, the frequency energy representation representing individual energy samples associated with sound in the frequency band captured on the audio track.
5. The method of claim 3, wherein the comparing step includes correlating individual energy samples in the frequency band of the current partition of the audio track with individual energy samples in the frequency band in the remaining partitions of the audio track.
6. The method of claim 3, wherein the audio energy representations of individual partitions are transformed into frequency space in which energy is represented as a function of frequency.
7. The method of claim 6, wherein individual transformed representations of individual partitions include identifying pitches of harmonic sound and determining magnitudes of harmonic energy at harmonics of the harmonic sound.
8. The method of claim 7, wherein the comparing step includes correlating pitch of the harmonic sound and harmonic energy of the transformed representation of the current partition of the audio track with pitch of the harmonic sound and harmonic energy of transformed representations in the remaining partitions of the audio track.
9. The method of claim 1, further comprising:
selecting a comparison window for at least one portion of the audio track, the comparison window having a start position and an end position, such that the start position corresponds with a point of the track duration, the point having been selected at random, and the end position corresponds with a point of the track duration having a predetermined value.
10. The method of claim 1, further comprising:
obtaining a correlation threshold;
comparing the correlated portion with the correlation threshold; and
determining whether to continue comparing the current partition of the audio track to the remaining partitions of the audio track based on the comparison of the correlated portion and the correlation threshold.
11. The method of claim 10, wherein determining whether to continue comparing the current partition of the audio track with the remaining partitions of the audio track includes determining to not continue comparing in response to the correlated partition being smaller than the correlation threshold.
12. The method of claim 1, further comprising:
determining whether to continue comparing the current partition of the audio track with the remaining partitions of the audio track by assessing whether a stopping criteria has been satisfied, such determination being based on the correlated partition and the stopping criteria.
13. The method of claim 12, wherein the stopping criteria is satisfied by multiple, consecutive determinations of the correlated partition falling within a specific range or ranges.
14. The method of claim 13, wherein the specific range or ranges are bounded by a correlation threshold or thresholds.
15. The method of claim 1, wherein operations (a)-(f) are performed by a system comprising one or more physical computer processors configured by computer readable instructions.
16. The method of claim 1, wherein the organizing the correlations recorded at iterations of operation (e) step is performed by a system comprising one or more physical computer processors configured by computer readable instructions.
17. The method of claim 9, wherein the selecting the comparison window step is performed by a system comprising one or more physical computer processors configured by computer readable instructions.
18. The method of claim 10, wherein the obtaining the correlation threshold step is performed by a system comprising one or more physical computer processors configured by computer readable instructions.
19. The method of claim 10, wherein the comparing the correlated portion with the correlation threshold step is performed by a system comprising one or more physical computer processors configured by computer readable instructions.
20. The method of claim 10, wherein the determining whether to continue comparing the current partition of the audio track to the remaining partitions of the audio track step is performed by a system comprising one or more physical computer processors configured by computer readable instructions.
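By way of non-limiting illustration, the threshold-based stopping behavior recited in claims 10 through 14 might be sketched as follows. The claims do not fix how the criterion is evaluated, so this sketch assumes one possible reading: comparison stops once a given number of consecutive correlation scores fall below a single correlation threshold. The names `should_stop` and `scan_until_stopped`, the parameter `consecutive`, and the sample scores are hypothetical.

```python
def should_stop(best_correlations, threshold, consecutive=3):
    """Return True once the last `consecutive` scores all fall below `threshold`."""
    if consecutive <= 0 or len(best_correlations) < consecutive:
        return False
    return all(score < threshold for score in best_correlations[-consecutive:])


def scan_until_stopped(scores, threshold, consecutive=3):
    """Consume scores in order; return how many were examined before stopping."""
    seen = []
    for score in scores:
        seen.append(score)
        # Assumed stopping criterion: a run of consecutive low correlations.
        if should_stop(seen, threshold, consecutive):
            break
    return len(seen)


# Illustrative scores only; the final value is never examined.
examined = scan_until_stopped(
    [0.9, 0.2, 0.8, 0.3, 0.1, 0.2, 0.95], threshold=0.5, consecutive=3
)
```

A range bounded by two thresholds, as contemplated by claim 14, could be handled by replacing the `score < threshold` test with a check against a lower and an upper bound.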

Priority Applications (2)

Application Number Priority Date Filing Date Title
US15/251,571 US9653095B1 (en) 2016-08-30 2016-08-30 Systems and methods for determining a repeatogram in a music composition using audio features
US15/458,333 US10068011B1 (en) 2016-08-30 2017-03-14 Systems and methods for determining a repeatogram in a music composition using audio features

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US15/251,571 US9653095B1 (en) 2016-08-30 2016-08-30 Systems and methods for determining a repeatogram in a music composition using audio features

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US15/458,333 Continuation US10068011B1 (en) 2016-08-30 2017-03-14 Systems and methods for determining a repeatogram in a music composition using audio features

Publications (1)

Publication Number Publication Date
US9653095B1 (en) 2017-05-16

Family

ID=58670560

Family Applications (2)

Application Number Title Priority Date Filing Date
US15/251,571 Active US9653095B1 (en) 2016-08-30 2016-08-30 Systems and methods for determining a repeatogram in a music composition using audio features
US15/458,333 Active US10068011B1 (en) 2016-08-30 2017-03-14 Systems and methods for determining a repeatogram in a music composition using audio features

Family Applications After (1)

Application Number Title Priority Date Filing Date
US15/458,333 Active US10068011B1 (en) 2016-08-30 2017-03-14 Systems and methods for determining a repeatogram in a music composition using audio features

Country Status (1)

Country Link
US (2) US9653095B1 (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9905208B1 (en) * 2017-02-21 2018-02-27 Speed of Sound Software, Inc. System and method for automatically forming a master digital audio track
US11024273B2 (en) * 2017-07-13 2021-06-01 Melotec Ltd. Method and apparatus for performing melody detection
GB2615321A (en) * 2022-02-02 2023-08-09 Altered States Tech Ltd Methods and systems for analysing an audio track

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7653550B2 (en) 2003-04-04 2010-01-26 Apple Inc. Interface for providing modeless timeline based selection of an audio or video file
US8452432B2 (en) 2006-05-25 2013-05-28 Brian Transeau Realtime editing and performance of digital audio tracks
US7842874B2 (en) 2006-06-15 2010-11-30 Massachusetts Institute Of Technology Creating music by concatenative synthesis
US8205148B1 (en) 2008-01-11 2012-06-19 Bruce Sharpe Methods and apparatus for temporal alignment of media
US8908874B2 (en) 2010-09-08 2014-12-09 Dts, Inc. Spatial audio encoding and reproduction
EP2485213A1 (en) 2011-02-03 2012-08-08 Fraunhofer-Gesellschaft zur Förderung der Angewandten Forschung e.V. Semantic audio track mixer
US9384741B2 (en) 2013-05-29 2016-07-05 Qualcomm Incorporated Binauralization of rotated higher order ambisonics
GB2581032B (en) 2015-06-22 2020-11-04 Time Machine Capital Ltd System and method for onset detection in a digital signal

Patent Citations (73)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5175769A (en) 1991-07-23 1992-12-29 Rolm Systems Method for time-scale modification of signals
US20080148924A1 (en) * 2000-03-13 2008-06-26 Perception Digital Technology (Bvi) Limited Melody retrieval system
US20070163425A1 (en) * 2000-03-13 2007-07-19 Tsui Chi-Ying Melody retrieval system
US6564182B1 (en) * 2000-05-12 2003-05-13 Conexant Systems, Inc. Look-ahead pitch determination
US20020133499A1 (en) 2001-03-13 2002-09-19 Sean Ward System and method for acoustic fingerprinting
US20040148159A1 (en) 2001-04-13 2004-07-29 Crockett Brett G Method for time aligning audio signals using characterizations based on auditory events
US20040165730A1 (en) 2001-04-13 2004-08-26 Crockett Brett G Segmenting audio signals into auditory events
US20040172240A1 (en) 2001-04-13 2004-09-02 Crockett Brett G. Comparing audio using characterizations based on auditory events
US7461002B2 (en) 2001-04-13 2008-12-02 Dolby Laboratories Licensing Corporation Method for time aligning audio signals using characterizations based on auditory events
US7012183B2 (en) * 2001-05-14 2006-03-14 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Apparatus for analyzing an audio signal with regard to rhythm information of the audio signal by using an autocorrelation function
US20040094019A1 (en) * 2001-05-14 2004-05-20 Jurgen Herre Apparatus for analyzing an audio signal with regard to rhythm information of the audio signal by using an autocorrelation function
US20030033152A1 (en) * 2001-05-30 2003-02-13 Cameron Seth A. Language independent and voice operated information management system
US8964865B2 (en) * 2002-05-02 2015-02-24 Cohda Wireless Pty Ltd Filter structure for iterative signal processing
US20040264561A1 (en) * 2002-05-02 2004-12-30 Cohda Wireless Pty Ltd Filter structure for iterative signal processing
US20080317150A1 (en) * 2002-05-02 2008-12-25 University Of South Australia Filter structure for iterative signal processing
US8411767B2 (en) * 2002-05-02 2013-04-02 University Of South Australia Filter structure for iterative signal processing
US20130201972A1 (en) * 2002-05-02 2013-08-08 Cohda Wireless Pty. Ltd. Filter structure for iterative signal processing
US7863513B2 (en) * 2002-08-22 2011-01-04 Yamaha Corporation Synchronous playback system for reproducing music in good ensemble and recorder and player for the ensemble
US7256340B2 (en) * 2002-10-01 2007-08-14 Yamaha Corporation Compressed data structure and apparatus and method related thereto
US20070240556A1 (en) * 2002-10-01 2007-10-18 Yamaha Corporation Compressed data structure and apparatus and method related thereto
US7619155B2 (en) * 2002-10-11 2009-11-17 Panasonic Corporation Method and apparatus for determining musical notes from sounds
US20060021494A1 (en) * 2002-10-11 2006-02-02 Teo Kok K Method and apparatus for determing musical notes from sounds
US20070061135A1 (en) * 2002-10-29 2007-03-15 Chu Wai C Optimized windows and interpolation factors, and methods for optimizing windows, interpolation factors and linear prediction analysis in the ITU-T G.729 speech coding standard
US20070055504A1 (en) * 2002-10-29 2007-03-08 Chu Wai C Optimized windows and interpolation factors, and methods for optimizing windows, interpolation factors and linear prediction analysis in the ITU-T G.729 speech coding standard
US20040083097A1 (en) * 2002-10-29 2004-04-29 Chu Wai Chung Optimized windows and interpolation factors, and methods for optimizing windows, interpolation factors and linear prediction analysis in the ITU-T G.729 speech coding standard
US20070055503A1 (en) * 2002-10-29 2007-03-08 Docomo Communications Laboratories Usa, Inc. Optimized windows and interpolation factors, and methods for optimizing windows, interpolation factors and linear prediction analysis in the ITU-T G.729 speech coding standard
US20040254660A1 (en) * 2003-05-28 2004-12-16 Alan Seefeldt Method and device to process digital media streams
US20050021325A1 (en) * 2003-07-05 2005-01-27 Jeong-Wook Seo Apparatus and method for detecting a pitch for a voice signal in a voice codec
US20050091045A1 (en) * 2003-10-25 2005-04-28 Samsung Electronics Co., Ltd. Pitch detection method and apparatus
US7593847B2 (en) * 2003-10-25 2009-09-22 Samsung Electronics Co., Ltd. Pitch detection method and apparatus
US20050234366A1 (en) * 2004-03-19 2005-10-20 Thorsten Heinz Apparatus and method for analyzing a sound signal using a physiological ear model
US7301092B1 (en) * 2004-04-01 2007-11-27 Pinnacle Systems, Inc. Method and apparatus for synchronizing audio and video components of multimedia presentations by identifying beats in a music signal
US7672836B2 (en) * 2004-10-12 2010-03-02 Samsung Electronics Co., Ltd. Method and apparatus for estimating pitch of signal
US20060080088A1 (en) * 2004-10-12 2006-04-13 Samsung Electronics Co., Ltd. Method and apparatus for estimating pitch of signal
US20060107823A1 (en) * 2004-11-19 2006-05-25 Microsoft Corporation Constructing a table of music similarity vectors from a music similarity graph
US8193436B2 (en) * 2005-06-07 2012-06-05 Matsushita Electric Industrial Co., Ltd. Segmenting a humming signal into musical notes
US20090170458A1 (en) * 2005-07-19 2009-07-02 Molisch Andreas F Method and Receiver for Identifying a Leading Edge Time Period in a Received Radio Signal
US7767897B2 (en) * 2005-09-01 2010-08-03 Texas Instruments Incorporated Beat matching for portable audio
US7745718B2 (en) * 2005-10-28 2010-06-29 Victor Company Of Japan, Ltd. Music-piece classifying apparatus and method, and related computer program
US20090217806A1 (en) * 2005-10-28 2009-09-03 Victor Company Of Japan, Ltd. Music-piece classifying apparatus and method, and related computer program
US8101845B2 (en) * 2005-11-08 2012-01-24 Sony Corporation Information processing apparatus, method, and program
US20090287323A1 (en) 2005-11-08 2009-11-19 Yoshiyuki Kobayashi Information Processing Apparatus, Method, and Program
US8223978B2 (en) * 2006-01-12 2012-07-17 Panasonic Corporation Target sound analysis apparatus, target sound analysis method and target sound analysis program
US20080304672A1 (en) * 2006-01-12 2008-12-11 Shinichi Yoshizawa Target sound analysis apparatus, target sound analysis method and target sound analysis program
US20090056526A1 (en) 2006-01-25 2009-03-05 Sony Corporation Beat extraction device and beat extraction method
US8428270B2 (en) 2006-04-27 2013-04-23 Dolby Laboratories Licensing Corporation Audio gain control using specific-loudness-based auditory event detection
US20140180637A1 (en) 2007-02-12 2014-06-26 Locus Energy, Llc Irradiance mapping leveraging a distributed network of solar photovoltaic systems
US7521622B1 (en) * 2007-02-16 2009-04-21 Hewlett-Packard Development Company, L.P. Noise-resistant detection of harmonic segments of audio signals
US8179475B2 (en) 2007-03-09 2012-05-15 Legend3D, Inc. Apparatus and method for synchronizing a secondary audio track to the audio track of a video source
US8111326B1 (en) 2007-05-23 2012-02-07 Adobe Systems Incorporated Post-capture generation of synchronization points for audio to synchronize video portions captured at multiple cameras
US20090049979A1 (en) 2007-08-21 2009-02-26 Naik Devang K Method for Creating a Beat-Synchronized Media Mix
US7985917B2 (en) * 2007-09-07 2011-07-26 Microsoft Corporation Automatic accompaniment for vocal melodies
US20100257994A1 (en) 2009-04-13 2010-10-14 Smartsound Software, Inc. Method and apparatus for producing audio tracks
US8785760B2 (en) * 2009-06-01 2014-07-22 Music Mastermind, Inc. System and method for applying a chain of effects to a musical composition
US20120297959A1 (en) * 2009-06-01 2012-11-29 Matt Serletic System and Method for Applying a Chain of Effects to a Musical Composition
US20130025437A1 (en) * 2009-06-01 2013-01-31 Matt Serletic System and Method for Producing a More Harmonious Musical Accompaniment
US20140053710A1 (en) * 2009-06-01 2014-02-27 Music Mastermind, Inc. System and method for conforming an audio input to a musical key
US20140053711A1 (en) * 2009-06-01 2014-02-27 Music Mastermind, Inc. System and method creating harmonizing tracks for an audio input
US20130220102A1 (en) * 2009-06-01 2013-08-29 Music Mastermind, LLC Method for Generating a Musical Compilation Track from Multiple Takes
US8378198B2 (en) * 2010-01-08 2013-02-19 Samsung Electronics Co., Ltd. Method and apparatus for detecting pitch period of input signal
US20110167989A1 (en) * 2010-01-08 2011-07-14 Samsung Electronics Co., Ltd. Method and apparatus for detecting pitch period of input signal
US8497417B2 (en) * 2010-06-29 2013-07-30 Google Inc. Intervalgram representation of audio for melody recognition
US20120103166A1 (en) * 2010-10-29 2012-05-03 Takashi Shibuya Signal Processing Device, Signal Processing Method, and Program
US20120127831A1 (en) 2010-11-24 2012-05-24 Samsung Electronics Co., Ltd. Position determination of devices using stereo audio
US20140307878A1 (en) 2011-06-10 2014-10-16 X-System Limited Method and system for analysing sound
US20130339035A1 (en) 2012-03-29 2013-12-19 Smule, Inc. Automatic conversion of speech into song, rap, or other audible expression having target meter or rhythm
US20130304243A1 (en) 2012-05-09 2013-11-14 Vyclone, Inc Method for synchronizing disparate content files
US9031244B2 (en) * 2012-06-29 2015-05-12 Sonos, Inc. Smart audio settings
US9418643B2 (en) 2012-06-29 2016-08-16 Nokia Technologies Oy Audio signal analysis
US20140067385A1 (en) 2012-09-05 2014-03-06 Honda Motor Co., Ltd. Sound processing device, sound processing method, and sound processing program
US20140123836A1 (en) * 2012-11-02 2014-05-08 Yakov Vorobyev Musical composition processing system for processing musical composition for energy level and related methods
US20150279427A1 (en) 2012-12-12 2015-10-01 Smule, Inc. Coordinated Audiovisual Montage from Selected Crowd-Sourced Content with Alignment to Audio Baseline
US20160192846A1 (en) * 2015-01-07 2016-07-07 Children's National Medical Center Apparatus and method for detecting heart murmurs

Also Published As

Publication number Publication date
US10068011B1 (en) 2018-09-04

Legal Events

Date Code Title Description
AS Assignment

Owner name: GOPRO, INC., CALIFORNIA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:TCHENG, DAVID;REEL/FRAME:039586/0507

Effective date: 20160829

AS Assignment

Owner name: JPMORGAN CHASE BANK, N.A., AS ADMINISTRATIVE AGENT, NEW YORK

Free format text: SECURITY INTEREST;ASSIGNOR:GOPRO, INC.;REEL/FRAME:040996/0652

Effective date: 20161215

STCF Information on status: patent grant

Free format text: PATENTED CASE

MAFP Maintenance fee payment

Free format text: PAYMENT OF MAINTENANCE FEE, 4TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1551); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

Year of fee payment: 4

AS Assignment

Owner name: GOPRO, INC., CALIFORNIA

Free format text: RELEASE OF PATENT SECURITY INTEREST;ASSIGNOR:JPMORGAN CHASE BANK, N.A., AS ADMINISTRATIVE AGENT;REEL/FRAME:055106/0434

Effective date: 20210122