US20110268284A1 - Audio analysis apparatus - Google Patents

Audio analysis apparatus

Publication number
US20110268284A1
Authority
US
United States
Prior art keywords
component
matrix
audio signal
difference
values
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
US13/081,408
Other versions
US8853516B2 (en
Inventor
Keita Arimoto
Sebastian Streich
Bee Suan Ong
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Yamaha Corp
Original Assignee
Yamaha Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Yamaha Corp
Assigned to YAMAHA CORPORATION. ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: ONG, BEE SUAN; STREICH, SEBASTIAN; ARIMOTO, KEITA
Publication of US20110268284A1
Application granted
Publication of US8853516B2
Expired - Fee Related
Adjusted expiration

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H1/00 Details of electrophonic musical instruments
    • G10H1/0008 Associated control or indicating means
    • G10H1/36 Accompaniment arrangements
    • G10H1/40 Rhythm
    • G10H2210/00 Aspects or methods of musical processing having intrinsic musical character, i.e. involving musical theory or musical parameters or relying on musical knowledge, as applied in electrophonic musical tools or instruments
    • G10H2210/031 Musical analysis, i.e. isolation, extraction or identification of musical elements or musical parameters from a raw acoustic signal or from an encoded audio signal
    • G10H2210/066 Musical analysis, i.e. isolation, extraction or identification of musical elements or musical parameters from a raw acoustic signal or from an encoded audio signal for pitch analysis as part of wider processing for musical purposes, e.g. transcription, musical performance evaluation; Pitch recognition, e.g. in polyphonic sounds; Estimation or use of missing fundamental
    • G10H2240/00 Data organisation or data communication aspects, specifically adapted for electrophonic musical tools or instruments
    • G10H2240/121 Musical libraries, i.e. musical databases indexed by musical parameters, wavetables, indexing schemes using musical parameters, musical rule bases or knowledge bases, e.g. for automatic composing methods
    • G10H2240/131 Library retrieval, i.e. searching a database or selecting a specific musical piece, segment, pattern, rule or parameter set
    • G10H2240/141 Library retrieval matching, i.e. any of the steps of matching an inputted segment or phrase with musical database contents, e.g. query by humming, singing or playing; the steps may include, e.g. musical analysis of the input, musical feature extraction, query formulation, or details of the retrieval process
    • G10H2250/00 Aspects of algorithms or signal processing methods without intrinsic musical character, yet specifically adapted for or used in electrophonic musical processing
    • G10H2250/131 Mathematical functions for musical analysis, processing, synthesis or composition
    • G10H2250/215 Transforms, i.e. mathematical transforms into domains appropriate for musical signal processing, coding or compression
    • G10H2250/235 Fourier transform; Discrete Fourier Transform [DFT]; Fast Fourier Transform [FFT]

Definitions

  • the present invention relates to a technology for analyzing features of sound.
  • a technology for analyzing features (for example, tone) of music has been suggested in the art.
  • Jouni Paulus and Anssi Klapuri, “Measuring the Similarity of Rhythmic Patterns”, Proc. ISMIR 2002, p. 150-156 describes a technology in which the time sequence of the feature amount of each of unit periods (frames) having a predetermined time length, into which an audio signal is divided, is compared between different pieces of music.
  • the feature amount of each unit period includes, for example, Mel-Frequency Cepstral Coefficients (MFCCs) indicating tonal features of an audio signal.
  • a DP matching (Dynamic Time Warping (DTW)) technology which specifies corresponding locations on the time axis (i.e., corresponding time-axis locations) in pieces of music, is employed to compare the feature amounts of the pieces of music.
  • the invention has been made in view of these circumstances and it is an object of the invention to reduce processing load required to compare tones of audio signals representing pieces of music while reducing the amount of data required to analyze tones of audio signals.
  • an audio analysis apparatus comprises: a component acquisition part that acquires a component matrix composed of an array of component values from an audio signal which is divided into a sequence of unit periods in a time-axis direction, columns of the component matrix corresponding to the sequence of unit periods of the audio signal and rows of the component matrix corresponding to a series of unit bands of the audio signal arranged in a frequency-axis direction, the component value representing a spectrum component of the audio signal belonging to the corresponding unit period and belonging to the corresponding unit band; a difference generation part that generates a plurality of shift matrices each obtained by shifting the columns of the component matrix in the time-axis direction with a different shift amount, and that generates a plurality of difference matrices each composed of an array of element values in correspondence to the plurality of the shift matrices, the element value representing a difference between the corresponding component value of the shift matrix and the corresponding component value of the component matrix; and a feature amount extraction part that generates a ton
  • the tendency of temporal change of the tone of the audio signal is represented by a plurality of feature value series. Accordingly, it is possible to reduce the amount of data required to estimate the tone of the audio signal, compared to the prior art configuration (for example, Jouni Paulus and Anssi Klapuri, “Measuring the Similarity of Rhythmic Patterns”, Proc. ISMIR 2002, p. 150-156) in which a feature amount is extracted for each unit period.
  • Since the number of the feature value series does not depend on the time length of the audio signal, it is possible to easily compare temporal changes of the tones of audio signals without requiring a process for matching the time axis of each audio signal even when the audio signals have different time lengths. Accordingly, there is an advantage in that the processing load required to compare tones of audio signals is reduced.
  • a typical example of the audio signal is a signal generated by receiving vocal sound or musical sound of a piece of music.
  • the term “piece of music” or “music” refers to a time sequence of a plurality of sounds, no matter whether it is all or part of a piece of music created as a single work.
  • Although the bandwidth of each unit band is arbitrary, each unit band may be set to a bandwidth corresponding to, for example, one octave.
  • the difference generation part comprises: a weight generation part that generates a sequence of weights from the component matrix in correspondence to the sequence of the unit periods, the weight corresponding to a series of component values arranged in the frequency axis direction at the corresponding unit period; a difference calculation part that generates each initial difference matrix composed of an array of difference values of component values between each shift matrix and the component matrix; and a correction part that generates each difference matrix by applying the sequence of the weights to each initial difference matrix.
  • a difference matrix in which the distribution of difference values arranged in the time-axis direction has been corrected based on the initial difference matrix by applying the weight sequence to the initial difference matrix, is generated. Accordingly, there is an advantage in that it is possible to, for example, generate a tonal feature amount in which the difference between the component matrix and the shift matrix is emphasized for each unit period having large component values of the component matrix (i.e., a tonal feature amount which emphasizes, especially, tones of unit periods, the strength of which is high in the audio signal).
  • the feature amount extraction part generates the tonal feature amount including a series of feature values derived from the component matrix in correspondence to the series of the unit bands, each feature value corresponding to a sequence of component values of the component matrix arranged in the time-axis direction at the corresponding unit band.
  • the advantage of ease of estimation of the tone of the audio signal is especially significant since the tonal feature amount includes a feature value series derived from the component matrix, in which the average tonal tendency (frequency characteristic) over the entirety of the audio signal is reflected, in addition to a plurality of feature value series derived from the plurality of difference matrices in which the temporal change tendency of the tone of the audio signal is reflected.
  • An audio analysis apparatus that is preferable for comparing tones of audio signals comprises a storage part that stores a tonal feature amount for each of first and second ones of an audio signal; and a feature comparison part that calculates a similarity index value indicating tonal similarity between the first audio signal and the second audio signal by comparing the tonal feature amounts of the first audio signal and the second audio signal with each other, wherein the tonal feature amount is derived based on a component matrix of the audio signal which is divided into a sequence of unit periods in a time-axis direction and based on a plurality of shift matrices derived from the component matrix, the component matrix being composed of an array of component values, columns of the component matrix corresponding to the sequence of unit periods of the audio signal and rows of the component matrix corresponding to a series of unit bands of the audio signal arranged in a frequency-axis direction, the component value representing a
  • Since the amount of data of the tonal feature amount is reduced by representing the tendency of temporal change of the tone of the audio signal by a plurality of feature value series, it is possible to reduce the capacity required for the storage part, compared to the prior art configuration (for example, Jouni Paulus and Anssi Klapuri, “Measuring the Similarity of Rhythmic Patterns”, Proc. ISMIR 2002, p. 150-156) in which a feature amount is extracted for each unit period.
  • Since the number of the feature value series does not depend on the time length of the audio signal, it is possible to easily compare temporal changes of the tones of audio signals even when the audio signals have different time lengths. Accordingly, there is also an advantage in that the processing load of the feature comparison part is reduced.
  • the audio analysis apparatus may not only be implemented by hardware (electronic circuitry) such as a Digital Signal Processor (DSP) dedicated to analysis of audio signals but may also be implemented through cooperation of a general arithmetic processing unit such as a Central Processing Unit (CPU) with a program.
  • the program according to the invention is executable by a computer to perform processes of: acquiring a component matrix composed of an array of component values from an audio signal which is divided into a sequence of unit periods in a time-axis direction, columns of the component matrix corresponding to the sequence of unit periods of the audio signal and rows of the component matrix corresponding to a series of unit bands of the audio signal arranged in a frequency-axis direction, the component value representing a spectrum component of the audio signal belonging to the corresponding unit period and belonging to the corresponding unit band; generating a plurality of shift matrices each obtained by shifting the columns of the component matrix in the time-axis direction with a different shift amount; generating a plurality of difference matrices each composed of an array of element values in correspondence to the plurality of the shift matrices, the element value representing a difference between the corresponding component value of the shift matrix and the corresponding component value of the component matrix; and generating a tonal feature amount including a plurality of series of feature values corresponding to the plurality of
  • the program achieves the same operations and advantages as those of the audio analysis apparatus according to the invention.
  • the program of the invention may be provided to a user through a computer readable storage medium storing the program and then installed on a computer and may also be provided from a server device to a user through distribution over a communication network and then installed on a computer.
  • FIG. 1 is a block diagram of an audio analysis apparatus according to an embodiment of the invention.
  • FIG. 2 is a block diagram of a signal analyzer.
  • FIGS. 3(A) and 3(B) are schematic diagrams illustrating relationships between a component matrix and a time sequence of the spectrum of an audio signal.
  • FIG. 4 is a block diagram of a difference generator.
  • FIG. 5 is a diagram illustrating operation of the difference generator.
  • FIG. 6 is a diagram illustrating operation of a feature amount extractor.
  • FIG. 7 is a schematic diagram of a tone image.
  • FIG. 1 is a block diagram of an audio analysis apparatus 100 according to an embodiment of the invention.
  • the audio analysis apparatus 100 is a device for analyzing the characteristics of sounds (musical sounds or vocal sounds) included in a piece of music and is implemented through a computer system including an arithmetic processing unit 12 , a storage device 14 , and a display device 16 .
  • the storage device 14 stores various data used by the arithmetic processing unit 12 and a program PGM executed by the arithmetic processing unit 12 .
  • Any known machine readable storage medium such as a semiconductor recording medium or a magnetic recording medium or a combination of various types of recording media may be employed as the storage device 14 .
  • the storage device 14 stores audio signals X (X 1 , X 2 ).
  • Each audio signal X is a signal representing temporal waveforms of sounds included in a piece of music and is prepared for, for example, a section, from which it is possible to identify a melody or a rhythm of the piece of music (for example, a section corresponding to a specific number of measures in the piece of music).
  • the audio signal X 1 and the audio signal X 2 represent parts of different pieces of music. However, it is also possible to employ a configuration in which the audio signal X 1 and the audio signal X 2 represent different parts of the same piece of music or a configuration in which the audio signal X represents the entirety of a piece of music.
  • the arithmetic processing unit 12 implements a plurality of functions (including a signal analyzer 22 , a display controller 24 , and a feature comparator 26 ) required to analyze each audio signal X through execution of the program PGM stored in the storage device 14 .
  • the signal analyzer 22 generates a tonal feature amount F(F 1 , F 2 ) representing the features of the tone color or timbre of the audio signal X.
  • the display controller 24 displays the tonal feature amount F generated by the signal analyzer 22 as an image on the display device 16 (for example, a liquid crystal display).
  • the feature comparator 26 compares the tonal feature amount F 1 of the first audio signal X 1 and the tonal feature amount F 2 of the second audio signal X 2 .
  • It is also possible to employ a configuration in which each function of the arithmetic processing unit 12 is implemented through a dedicated electronic circuit (DSP) or a configuration in which each function of the arithmetic processing unit 12 is distributed on a plurality of integrated circuits.
  • FIG. 2 is a block diagram of the signal analyzer 22 .
  • the signal analyzer 22 includes a component acquirer 32 , a difference generator 34 , and a feature amount extractor 36 .
  • the component acquirer 32 generates a component matrix A representing temporal changes of frequency characteristics of the audio signal X.
  • the component acquirer 32 includes a frequency analyzer 322 and a matrix generator 324 .
  • the frequency analyzer 322 generates a spectrum PX of the frequency domain for each of N unit periods (frames) ΔT[1] to ΔT[N] having a predetermined length into which the audio signal X is divided, where N is a natural number greater than 1.
  • FIG. 3(A) is a schematic diagram of a time sequence (i.e., a spectrogram) of the spectrum PX generated by the frequency analyzer 322 .
  • the spectrum PX of the audio signal X is a power spectrum in which the respective component values (strengths or magnitudes) x of frequency components of the audio signal X are arranged on the frequency axis.
  • the component acquirer 32 may use any known frequency analysis method such as, for example, short time Fourier transform to generate the spectrum PX.
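The per-frame analysis described above can be illustrated with a minimal sketch. It assumes non-overlapping unit periods, no window function, and a naive discrete Fourier transform; none of these choices is specified in the text, and the names `power_spectrum` and `spectrogram` are hypothetical.

```python
import cmath


def power_spectrum(frame):
    """Naive DFT power spectrum |X[f]|^2 for bins 0 .. len(frame) // 2."""
    n = len(frame)
    spectrum = []
    for f in range(n // 2 + 1):
        acc = sum(frame[t] * cmath.exp(-2j * cmath.pi * f * t / n)
                  for t in range(n))
        spectrum.append(abs(acc) ** 2)
    return spectrum


def spectrogram(signal, frame_len):
    """Split the signal into non-overlapping unit periods (an assumption)
    and compute one power spectrum PX per unit period."""
    n_frames = len(signal) // frame_len
    return [power_spectrum(signal[i * frame_len:(i + 1) * frame_len])
            for i in range(n_frames)]


# A cosine with two cycles per 8-sample frame concentrates its power in bin 2.
frame = [1, 0, -1, 0, 1, 0, -1, 0]
print(power_spectrum(frame))
```

A practical implementation would more likely use an FFT with overlapping, windowed frames; the naive transform is used here only to keep the sketch dependency-free.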
  • the matrix generator 324 of FIG. 2 generates a component matrix A from the time sequence of the spectrum PX generated by the frequency analyzer 322 .
  • the component matrix A is an M×N matrix of component values a[1, 1] to a[M, N] arranged in M rows and N columns, where M is a natural number greater than 1.
  • the matrix generator 324 calculates each component value a[m, n] of the component matrix A according to a plurality of component values x in the mth unit band ΔF[m] in the spectrum PX of the nth unit period ΔT[n] on the time axis. For example, the matrix generator 324 calculates, as the component value a[m, n], an average (arithmetic average) of a plurality of component values x in the unit band ΔF[m].
  • the component matrix A is a matrix of component values a[m, n], each corresponding to an average strength of a corresponding unit band ΔF[m] in a corresponding unit period ΔT[n] of the audio signal X, which are arranged in M rows and N columns, the M rows being arranged in the frequency-axis direction (i.e., in the vertical direction), the N columns being arranged in the time-axis direction (i.e., in the horizontal direction).
  • Each of the unit bands ΔF[1] to ΔF[M] is set to a bandwidth corresponding to one octave.
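As a rough illustration of the matrix generator 324, the following sketch averages the spectrum bins that fall inside each unit band. The representation of a unit band as a half-open range of bin indices, the doubling rule used to approximate one-octave bands, and the names `component_matrix` and `octave_band_edges` are assumptions made for this example.

```python
def octave_band_edges(first_bin, n_bins):
    """Bin edges that double from band to band, so that each band spans
    roughly one octave (an assumed way of realizing one-octave bands)."""
    if first_bin < 1:
        raise ValueError("first_bin must be at least 1")
    edges = [first_bin]
    while edges[-1] * 2 <= n_bins:
        edges.append(edges[-1] * 2)
    return edges


def component_matrix(spectra, band_edges):
    """spectra[n][f] is the power of bin f in unit period n.
    Unit band m covers bins band_edges[m] .. band_edges[m + 1] - 1.
    Returns A with M rows (unit bands) and N columns (unit periods)."""
    matrix = []
    for m in range(len(band_edges) - 1):
        lo, hi = band_edges[m], band_edges[m + 1]
        # a[m][n] is the arithmetic average of the bins in band m.
        matrix.append([sum(frame[lo:hi]) / (hi - lo) for frame in spectra])
    return matrix


spectra = [[1, 3, 5, 7], [2, 4, 6, 8]]           # N = 2 unit periods
print(component_matrix(spectra, [0, 2, 4]))      # M = 2 unit bands
print(octave_band_edges(1, 16))
```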
  • the difference generator 34 generates K different difference matrices D 1 to DK from the component matrix A, where K is a natural number greater than 1.
  • FIG. 4 is a block diagram of the difference generator 34 and FIG. 5 is a diagram illustrating operation of the difference generator 34 .
  • the difference generator 34 includes a shift matrix generator 42 , a difference calculator 44 , a weight generator 46 , and a corrector 48 .
  • the reference numbers of the elements of the difference generator 34 are written at locations corresponding to processes performed by the elements.
  • each shift matrix Bk is a matrix obtained by shifting each component value a[m, n] of the component matrix A by a shift amount kΔ different for each shift matrix Bk along the time-axis direction.
  • Each shift matrix Bk includes component values bk[1, 1] to bk[M, N] arranged in M rows and N columns, the M rows being arranged in the frequency-axis direction and the N columns being arranged in the time-axis direction.
  • a component value bk[m, n] located in the mth row and the nth column among the component values of the shift matrix Bk corresponds to a component value a[m, n+kΔ] located in the mth row and the (n+kΔ)th column of the component matrix A.
  • the unit Δ of the shift amount kΔ is set to a time length corresponding to one unit period ΔT[n]. That is, the shift matrix Bk is a matrix obtained by shifting each component value a[m, n] of the component matrix A by k unit periods ΔT[n] to the front side of the time-axis direction (i.e., backward in time).
  • component values a[m, n] of a number of columns of the component matrix A hatched in FIG.
  • the shift matrix B 1 is constructed by shifting the 1st column of the component matrix A to the Nth column and the shift matrix B 2 is constructed by shifting the 1st and 2nd columns of the component matrix A to the (N−1)th and the Nth columns.
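The shift described above, in which columns pushed off the front reappear at the end, amounts to a circular shift of the columns. A minimal sketch, using zero-based indices and a hypothetical function name `shift_matrix`:

```python
def shift_matrix(a, k):
    """Return Bk with bk[m][n] = a[m][(n + k) mod N].
    The first k columns of A wrap around to the last k columns."""
    n_cols = len(a[0])
    return [[row[(n + k) % n_cols] for n in range(n_cols)] for row in a]


a = [[1, 2, 3, 4],
     [5, 6, 7, 8]]
print(shift_matrix(a, 1))  # 1st column moves to the last column
print(shift_matrix(a, 2))  # 1st and 2nd columns move to the last two columns
```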
  • the difference calculator 44 of FIG. 4 generates an initial difference matrix Ck corresponding to the difference between the component matrix A and the shift matrix Bk for each of the K shift matrices B 1 to BK.
  • the initial difference matrix Ck is an array of difference values ck[ 1 , 1 ] to ck[M, N] arranged in M rows and N columns, the M rows being arranged in the frequency-axis direction and the N columns being arranged in the time-axis direction. As shown in FIG.
  • the difference value ck[m, n] of the initial difference matrix Ck is set to a greater number as a greater change is made to the strength of components in the unit band ΔF[m] of the audio signal X within a period that spans the shift amount kΔ from each unit period ΔT[n] on the time axis.
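One way to obtain a difference value that grows with the change in strength is to take the absolute difference between corresponding elements of the component matrix and the shift matrix. The use of the absolute value is an assumption of this sketch (the exact formula is not given in the text above), as is the name `initial_difference`; the circular shift is inlined so the example stands alone.

```python
def initial_difference(a, k):
    """Return Ck with ck[m][n] = |a[m][n] - bk[m][n]|, where
    bk[m][n] = a[m][(n + k) mod N] is the circularly shifted matrix.
    The absolute value is an assumed choice of difference measure."""
    n_cols = len(a[0])
    return [[abs(row[n] - row[(n + k) % n_cols]) for n in range(n_cols)]
            for row in a]


a = [[1, 4, 2, 2]]
print(initial_difference(a, 1))
```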
  • the weight generator 46 of FIG. 4 generates a weight sequence W used to correct the initial difference matrix Ck.
  • the weight sequence W is a sequence of N weights w[1] to w[N] corresponding to different unit periods ΔT[n] as shown in FIG. 5 .
  • the nth weight w[n] of the weight sequence W is set according to M component values a[1, n] to a[M, n] corresponding to the unit period ΔT[n] among component values of the component matrix A. For example, the sum or average of the M component values a[1, n] to a[M, n] is calculated as the weight w[n].
  • the weight w[n] increases as the strength (sound volume) of the unit period ΔT[n] over the entire band of the audio signal X increases. That is, a time sequence of the weights w[1] to w[N] corresponds to an envelope of the temporal waveform of the audio signal X.
  • the corrector 48 of FIG. 4 generates K difference matrices D 1 to DK corresponding to K initial difference matrices Ck by applying the weight sequence W generated by the weight generator 46 to the initial difference matrices Ck (C 1 to CK).
  • the difference matrix Dk is a matrix composed of an array of element values dk[ 1 , 1 ] to dk[M, N] arranged in M rows and N columns, the M rows being arranged in the frequency-axis direction (i.e., in the vertical direction), the N columns being arranged in the time-axis direction (i.e., in the horizontal direction).
  • the corrector 48 functions as an element for correcting (emphasizing levels of) the distribution of N difference values ck[m, 1] to ck[m, N] arranged in the time-axis direction in the unit band ΔF[m].
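The weight generation and correction steps can be sketched as follows. The weight is computed as the column sum, which is one of the options named above; applying the weight by element-wise multiplication is an assumption of this example, since the text only says that the weight sequence is "applied". The function names are hypothetical.

```python
def weight_sequence(a):
    """w[n] is the sum of the M component values in column n of A."""
    n_cols = len(a[0])
    return [sum(row[n] for row in a) for n in range(n_cols)]


def apply_weights(c, w):
    """Return Dk with dk[m][n] = w[n] * ck[m][n] (assumed correction rule).
    Differences in loud unit periods are emphasized."""
    return [[w[n] * value for n, value in enumerate(row)] for row in c]


a = [[1, 2],
     [3, 4]]
c = [[1, 1],
     [2, 0]]
w = weight_sequence(a)
print(w)
print(apply_weights(c, w))
```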
  • the feature amount extractor 36 of FIG. 2 generates a tonal feature amount F (F 1 , F 2 ) of the audio signal X using the component matrix A generated by the component acquirer 32 and the K difference matrices D 1 to DK generated by the difference generator 34 .
  • FIG. 6 is a diagram illustrating operation of the feature amount extractor 36 .
  • the tonal feature amount F generated by the feature amount extractor 36 is an M×(K+1) matrix in which a plurality of K feature value series E 1 to EK corresponding to a plurality of difference matrices Dk and one feature value series EK+1 corresponding to the component matrix A are arranged.
  • the number M of rows and the number (K+1) of columns of the tonal feature amount F do not depend on the time length of the audio signal X (i.e., the total number N of unit periods ΔT[n]).
  • the feature value series EK+1 located at the (K+1)th column of the tonal feature amount F is a sequence of M feature values eK+1[1] to eK+1[M] corresponding to different unit bands ΔF[m].
  • the feature value eK+1[m] is set according to N component values a[m, 1] to a[m, N] corresponding to the unit band ΔF[m] among component values of the component matrix A generated by the component acquirer 32 .
  • For example, the sum or average of the N component values a[m, 1] to a[m, N] is calculated as the feature value eK+1[m].
  • the feature value eK+1[m] increases as the strength of the components of the unit band ΔF[m] over the entire period of the audio signal X increases. That is, the feature value eK+1[m] serves as a feature amount representing an average tone (average frequency characteristics) of the audio signal X over the entire period of the audio signal X.
  • the feature value series Ek (E 1 to EK) is a sequence of M feature values ek[1] to ek[M] corresponding to different unit bands ΔF[m].
  • the mth feature value ek[m] of the feature value series Ek is set according to N element values dk[m, 1] to dk[m, N] corresponding to the unit band ΔF[m] among element values of the difference matrix Dk. For example, the sum or average of the N element values dk[m, 1] to dk[m, N] is calculated as the feature value ek[m].
  • the feature value ek[m] is set to a greater value as the strength of the components in the unit band ΔF[m] of the audio signal X in each of the unit periods ΔT[1] to ΔT[N] more significantly changes in a period that spans the shift amount kΔ from the unit period ΔT[n]. Accordingly, in the case where the K feature values e1[m] to eK[m] (arranged in the horizontal direction) corresponding to each unit band ΔF[m] in the tonal feature amount F include many great feature values ek[m], it is estimated that the components of the unit band ΔF[m] of the audio signal X are components of sound whose strength rapidly changes in a short time.
  • the K feature values e1[m] to eK[m] corresponding to each unit band ΔF[m] include many small feature values ek[m]
  • the K feature value series E 1 to EK included in the tonal feature amount F serve as a feature amount indicating temporal changes of the components of each unit band ΔF[m] of the audio signal X (i.e., temporal changes of tone of the audio signal X).
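The assembly of the tonal feature amount F can be sketched as a row-wise reduction of each matrix. This example uses the average, one of the two options named above, and the hypothetical name `tonal_feature`; the input matrices are made-up values.

```python
def tonal_feature(a, diffs):
    """a is the M x N component matrix; diffs is a list of K difference
    matrices, each M x N. Returns F as an M x (K + 1) matrix whose first K
    columns are the row averages of D1..DK and whose last column is the
    row average of A."""
    def row_mean(row):
        return sum(row) / len(row)

    return [[row_mean(d[m]) for d in diffs] + [row_mean(a[m])]
            for m in range(len(a))]


a = [[1, 3],
     [2, 2]]
d1 = [[2, 4],
      [0, 0]]
d2 = [[1, 1],
      [6, 2]]
print(tonal_feature(a, [d1, d2]))
```

Because every row is reduced to a single value, the shape of the result depends only on M and K, not on the number N of unit periods.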
  • the configuration and operation of the signal analyzer 22 of FIG. 1 have been described above.
  • the signal analyzer 22 sequentially generates the tonal feature amount F 1 of the first audio signal X 1 and the tonal feature amount F 2 of the second audio signal X 2 through the above procedure.
  • the tonal feature amounts F generated by the signal analyzer 22 are provided to the storage device 14 .
  • the display controller 24 displays tone images G (G 1 , G 2 ) of FIG. 7 schematically and graphically representing the tonal feature amounts F (F 1 , F 2 ) generated by the signal analyzer 22 on the display device 16 .
  • FIG. 7 illustrates an example in which the tone image G 1 of the tonal feature amount F 1 of the audio signal X 1 and the tone image G 2 of the tonal feature amount F 2 of the audio signal X 2 are displayed in parallel.
  • the tone image G 1 of the audio signal X 1 and the tone image G 2 of the audio signal X 2 are displayed in contrast with respect to the common horizontal axis (time axis).
  • a display form (color or gray level) of a unit figure u[m, κ] located at the mth row and the κth column in the tone image G 1 is variably set according to a feature value eκ[m] in the tonal feature amount F 1 .
  • a display form of each unit figure u[m, κ] of the tone image G 2 is variably set according to a feature value eκ[m] in the tonal feature amount F 2 . Accordingly, the user who has viewed the tone images G can intuitively identify and compare the tendencies of the tones of the audio signal X 1 and the audio signal X 2 .
  • the user can easily identify the tendency of the average tone (frequency characteristics) of the audio signal X over the entire period of the audio signal X from the M unit figures u(1, K+1) to u(M, K+1) (the feature value series EK+1) of the (K+1)th column among the unit figures of the tone image G.
  • the user can also easily identify the tendency of temporal changes of the components of each unit band ΔF[m] (i.e., each octave) of the audio signal X from the unit figures u(m, k) of the 1st to Kth columns among the unit figures of the tone image G.
  • the user can easily compare the tone of the audio signal X 1 and the tone of the audio signal X 2 since the number M of rows and the number (K+1) of columns of the unit figures u[m, κ] are common to the tone image G 1 and the tone image G 2 regardless of the time length of each audio signal X.
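A display form that varies with the feature value could, for instance, be a gray level. The sketch below assumes non-negative feature values and a linear mapping normalized by the largest value in the matrix; both the mapping and the name `gray_levels` are assumptions, as the text does not specify how the display form is derived.

```python
def gray_levels(feature, levels=256):
    """Map each feature value to an integer gray level in 0 .. levels - 1,
    scaled linearly so the largest value gets the brightest level."""
    peak = max(max(row) for row in feature)
    if peak <= 0:
        return [[0] * len(row) for row in feature]
    return [[round(value / peak * (levels - 1)) for value in row]
            for row in feature]


f1 = [[0, 1],
      [2, 4]]
print(gray_levels(f1, levels=5))
```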
  • the feature comparator 26 of FIG. 1 calculates a value (hereinafter referred to as a “similarity index value”) Q which is a measure of the tonal similarity between the audio signal X 1 and audio signal X 2 by comparing the tonal feature amount F 1 of the audio signal X 1 and the tonal feature amount F 2 of the audio signal X 2 .
  • Although any method may be employed to calculate the similarity index value Q, it is possible to employ a configuration in which differences between corresponding feature values eκ[m] in the tonal feature amount F 1 and the tonal feature amount F 2 (i.e., differences between feature values eκ[m] located at corresponding positions in the two matrices) are calculated and the sum or average of absolute values of the differences over the M rows and the (K+1) columns is calculated as the similarity index value Q. That is, the similarity index value Q decreases as the similarity between the tonal feature amount F 1 of the audio signal X 1 and the tonal feature amount F 2 of the audio signal X 2 increases.
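The comparison described above, using the average of absolute differences, can be sketched as follows. The function name `similarity_index` is hypothetical and the matrices are made-up values.

```python
def similarity_index(f1, f2):
    """Average absolute difference between corresponding feature values of
    two tonal feature amounts of identical shape. Smaller values of Q
    indicate more similar tones; identical matrices give 0."""
    total = 0.0
    count = 0
    for row1, row2 in zip(f1, f2):
        for v1, v2 in zip(row1, row2):
            total += abs(v1 - v2)
            count += 1
    return total / count


f1 = [[1, 2],
      [3, 4]]
f2 = [[1, 4],
      [0, 4]]
print(similarity_index(f1, f2))
```

No time alignment is needed before the comparison, since both matrices have M rows and K+1 columns regardless of the lengths of the underlying audio signals.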
  • the similarity index value Q calculated by the feature comparator 26 is displayed on the display device 16 , for example, together with the tone images G (G 1 , G 2 ) of FIG. 7 .
  • the user can quantitatively determine the tonal similarity between the audio signal X 1 and the audio signal X 2 from the similarity index value Q.
  • the tendency of the average tone of the audio signal X over the entire period of the audio signal X is represented by the feature value series EK+1 and the tendency of temporal changes of the tone of the audio signal X over the entire period of the audio signal X is represented by K feature value series E1 to EK corresponding to the number of shift matrices Bk (i.e., the number of shift amounts kΔ). Accordingly, it is possible to reduce the amount of data required to estimate the tone color or timbre of a piece of music, compared to the prior art configuration (for example, Jouni Paulus and Anssi Klapuri, “Measuring the Similarity of Rhythmic Patterns”, Proc. ISMIR 2002, p.
  • the user can easily estimate the tonal similarity between the tone of the audio signal X 1 and the tone of the audio signal X 2 by comparing the tone image G 1 and the tone image G 2 even when the time lengths of the audio signal X 1 and the audio signal X 2 are different.
  • the process for locating corresponding time points between the audio signal X1 and the audio signal X2 (for example, DP matching required in the technology of Jouni Paulus and Anssi Klapuri, “Measuring the Similarity of Rhythmic Patterns”, Proc. ISMIR 2002, p. 150-156) is unnecessary since the number M of rows and the number (K+1) of columns of the tonal feature amount F do not depend on the audio signal X. Therefore, there is also an advantage in that the load of processing for comparing the tones of the audio signal X1 and the audio signal X2 (i.e., the load of the feature comparator 26) is reduced.
  • the method of calculating the component value a[m, n] of each unit band σF[m] is not limited to the above method in which an average (arithmetic average) of a plurality of component values x in the unit band σF[m] is calculated as the component value a[m, n].
  • the bandwidth of the unit band σF[m] may be arbitrarily selected without being limited to one octave.
  • each unit band σF[m] is set to a bandwidth corresponding to a multiple of one octave or to a bandwidth obtained by dividing one octave by an integer.
  • Although the initial difference matrix Ck is corrected to the difference matrix Dk using the weight sequence W in the above embodiment, it is possible to omit correction using the weight sequence W.
  • the feature amount extractor 36 generates the tonal feature amount F using the initial difference matrix Ck calculated by the difference calculator 44 of FIG. 4 as the difference matrix Dk (such that the weight generator 46 , the corrector 48 , and the like are omitted).
  • Although the tonal feature amount F including the K feature value series E1 to EK generated from the difference matrices Dk and the feature value series EK+1 corresponding to the component matrix A is generated in the above embodiment, the feature value series EK+1 may be omitted from the tonal feature amount F.
  • Although each shift matrix Bk is generated by shifting the component values a[m, n] at the front edge of the component matrix A to the rear edge in the above embodiment, the method of generating the shift matrix Bk by the shift matrix generator 42 may be modified as appropriate.
  • the difference calculator 44 generates an initial difference matrix Ck of M rows and (N−kΔ) columns by calculating difference values ck[m, n] between the component values a[m, n] and the component values bk[m, n] only for an overlapping portion of the component matrix A and the shift matrix Bk.
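  • The overlap-only variant mentioned above can be sketched in Python as follows. This is a minimal illustration that assumes the shift is expressed as a whole number of unit periods and that the matrix is stored as a list of rows; the name `overlap_difference` and the sample values are hypothetical.

```python
def overlap_difference(a, shift):
    """Absolute differences between A and a non-circular shift of A.

    Only columns present in both matrices are compared, so the result has
    the same number of rows as A and (N - shift) columns.
    """
    n_cols = len(a[0])
    if not 0 < shift < n_cols:
        raise ValueError("shift must satisfy 0 < shift < N")
    return [
        [abs(row[n] - row[n + shift]) for n in range(n_cols - shift)]
        for row in a
    ]


# One unit band (M = 1) observed over N = 4 unit periods.
A = [[1, 2, 4, 4]]
C1 = overlap_difference(A, 1)
C2 = overlap_difference(A, 2)
```

  • Compared with the circular shift, this variant never compares the end of the signal with its beginning, at the cost of fewer columns for larger shifts.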
  • Although each component value a[m, n] of the component matrix A is shifted to the front side of the time axis in the above example, it is also possible to employ a configuration in which the shift matrix Bk is generated by shifting each component value a[m, n] to the rear side of the time axis (i.e., forward in time).
  • the component acquirer 32 may acquire the component matrix A using any other method. For example, it is possible to employ a configuration in which the component matrix A of the audio signal X is stored in the storage device 14 in advance (such that storage of the audio signal X may be omitted) and the component acquirer 32 acquires the component matrix A from the storage device 14 .
  • the component acquirer 32 may be any element for acquiring the component matrix A.
  • Although the audio analysis apparatus 100 includes both the signal analyzer 22 and the feature comparator 26 in the above example, the invention may also be realized as an audio analysis apparatus including only one of the signal analyzer 22 and the feature comparator 26. That is, an audio analysis apparatus used to analyze the tone of the audio signal X (i.e., used to extract the tonal feature amount F) (hereinafter referred to as a “feature extraction apparatus”) may have a configuration in which the signal analyzer 22 is provided while the feature comparator 26 is omitted.
  • an audio analysis apparatus used to compare the tones of the audio signal X1 and the audio signal X2 (i.e., used to calculate the similarity index value Q) (hereinafter referred to as a “feature comparison apparatus”) may have a configuration in which the feature comparator 26 is provided while the signal analyzer 22 is omitted.
  • the tonal feature amounts F (F1, F2) generated by the signal analyzer 22 of the feature extraction apparatus are provided to the feature comparison apparatus through, for example, a communication network or a portable recording medium and are then stored in the storage device 14.
  • the feature comparator 26 of the feature comparison apparatus calculates the similarity index value Q by comparing the tonal feature amount F 1 and the tonal feature amount F 2 stored in the storage device 14 .

Landscapes

  • Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Auxiliary Devices For Music (AREA)

Abstract

In an audio analysis apparatus, a component acquirer acquires a component matrix composed of an array of component values, columns of the component matrix corresponding to the sequence of unit periods of an audio signal and rows of the component matrix corresponding to a series of unit bands of the audio signal arranged in a frequency-axis direction. A difference generator generates a plurality of shift matrices each obtained by shifting the columns of the component matrix in the time-axis direction with a different shift amount, and generates a plurality of difference matrices each composed of an array of element values in correspondence to the plurality of the shift matrices, the element value representing a difference between the corresponding component values of the shift matrix and the component matrix. A feature amount extractor generates a tonal feature amount including a plurality of series of feature values corresponding to the plurality of difference matrices, one series of feature values corresponding to the series of unit bands of the difference matrix, one feature value representing a sequence of element values arranged in the time-axis direction at the corresponding unit band of the difference matrix.

Description

    BACKGROUND OF THE INVENTION
  • 1. Technical Field of the Invention
  • The present invention relates to a technology for analyzing features of sound.
  • 2. Description of the Related Art
  • A technology for analyzing features (for example, tone) of music has been suggested in the art. For example, Jouni Paulus and Anssi Klapuri, “Measuring the Similarity of Rhythmic Patterns”, Proc. ISMIR 2002, p. 150-156 describes a technology in which the time sequence of the feature amount of each of unit periods (frames) having a predetermined time length, into which an audio signal is divided, is compared between different pieces of music. The feature amount of each unit period includes, for example, Mel-Frequency Cepstral Coefficients (MFCCs) indicating tonal features of an audio signal. A DP matching (Dynamic Time Warping (DTW)) technology, which specifies corresponding locations on the time axis (i.e., corresponding time-axis locations) in pieces of music, is employed to compare the feature amounts of the pieces of music.
  • However, since respective feature amounts of unit periods over the entire period of an audio signal are required to represent the overall features of the audio signal, the technology of Jouni Paulus and Anssi Klapuri, “Measuring the Similarity of Rhythmic Patterns”, Proc. ISMIR 2002, p. 150-156 has a problem in that the amount of data representing feature amounts is large, especially in the case where the time length of the audio signal is long. In addition, since a feature amount extracted in each unit period is set regardless of the time length or tempo of music, an audio signal extension/contraction process such as the above-mentioned DP matching should be performed to compare the features of pieces of music, causing high processing load.
  • SUMMARY OF THE INVENTION
  • The invention has been made in view of these circumstances and it is an object of the invention to reduce processing load required to compare tones of audio signals representing pieces of music while reducing the amount of data required to analyze tones of audio signals.
  • In order to solve the above problems, an audio analysis apparatus according to the invention comprises: a component acquisition part that acquires a component matrix composed of an array of component values from an audio signal which is divided into a sequence of unit periods in a time-axis direction, columns of the component matrix corresponding to the sequence of unit periods of the audio signal and rows of the component matrix corresponding to a series of unit bands of the audio signal arranged in a frequency-axis direction, the component value representing a spectrum component of the audio signal belonging to the corresponding unit period and belonging to the corresponding unit band; a difference generation part that generates a plurality of shift matrices each obtained by shifting the columns of the component matrix in the time-axis direction with a different shift amount, and that generates a plurality of difference matrices each composed of an array of element values in correspondence to the plurality of the shift matrices, the element value representing a difference between the corresponding component value of the shift matrix and the corresponding component value of the component matrix; and a feature amount extraction part that generates a tonal feature amount including a plurality of series of feature values corresponding to the plurality of difference matrices, one series of feature values corresponding to the series of unit bands of the difference matrix, one feature value representing a sequence of element values arranged in the time-axis direction at the corresponding unit band of the difference matrix.
  • In this configuration, the tendency of temporal change of the tone of the audio signal is represented by a plurality of feature value series. Accordingly, it is possible to reduce the amount of data required to estimate the tone of the audio signal, compared to the prior art configuration (for example, Jouni Paulus and Anssi Klapuri, “Measuring the Similarity of Rhythmic Patterns”, Proc. ISMIR 2002, p. 150-156) in which a feature amount is extracted for each unit period. In addition, since the number of the feature value series does not depend on the time length of the audio signal, it is possible to easily compare temporal changes of the tones of audio signals without requiring a process for matching the time axis of each audio signal even when the audio signals have different time lengths. Accordingly, there is an advantage in that load of processing required to compare tones of audio signals is reduced.
  • A typical example of the audio signal is a signal generated by receiving vocal sound or musical sound of a piece of music. The term “piece of music” or “music” refers to a time sequence of a plurality of sounds, no matter whether it is all or part of a piece of music created as a single work. Although the bandwidth of each unit band is arbitrary, each unit band may be set to a bandwidth corresponding to, for example, one octave.
  • In a preferred embodiment of the invention, the difference generation part comprises: a weight generation part that generates a sequence of weights from the component matrix in correspondence to the sequence of the unit periods, the weight corresponding to a series of component values arranged in the frequency axis direction at the corresponding unit period; a difference calculation part that generates each initial difference matrix composed of an array of difference values of component values between each shift matrix and the component matrix; and a correction part that generates each difference matrix by applying the sequence of the weights to each initial difference matrix.
  • In this embodiment, a difference matrix, in which the distribution of difference values arranged in the time-axis direction has been corrected based on the initial difference matrix by applying the weight sequence to the initial difference matrix, is generated. Accordingly, there is an advantage in that it is possible to, for example, generate a tonal feature amount in which the difference between the component matrix and the shift matrix is emphasized for each unit period having large component values of the component matrix (i.e., a tonal feature amount which emphasizes, especially, tones of unit periods, the strength of which is high in the audio signal).
  • In a preferred embodiment of the invention, the feature amount extraction part generates the tonal feature amount including a series of feature values derived from the component matrix in correspondence to the series of the unit bands, each feature value corresponding to a sequence of component values of the component matrix arranged in the time-axis direction at the corresponding unit band.
  • In this embodiment, the advantage of ease of estimation of the tone of the audio signal is especially significant since the tonal feature amount includes a feature value series derived from the component matrix, in which the average tonal tendency (frequency characteristic) over the entirety of the audio signal is reflected, in addition to a plurality of feature value series derived from the plurality of difference matrices in which the temporal change tendency of the tone of the audio signal is reflected.
  • The invention may also be specified as an audio analysis apparatus that compares tonal feature amounts generated respectively for audio signals in each of the above embodiments. An audio analysis apparatus that is preferable for comparing tones of audio signals comprises a storage part that stores a tonal feature amount for each of first and second ones of an audio signal; and a feature comparison part that calculates a similarity index value indicating tonal similarity between the first audio signal and the second audio signal by comparing the tonal feature amounts of the first audio signal and the second audio signal with each other, wherein the tonal feature amount is derived based on a component matrix of the audio signal which is divided into a sequence of unit periods in a time-axis direction and based on a plurality of shift matrices derived from the component matrix, the component matrix being composed of an array of component values, columns of the component matrix corresponding to the sequence of unit periods of the audio signal and rows of the component matrix corresponding to a series of unit bands of the audio signal arranged in a frequency-axis direction, the component value representing a spectrum component of the audio signal belonging to the corresponding unit period and belonging to the corresponding unit band, each shift matrix being obtained by shifting the columns of the component matrix in the time-axis direction with a different shift amount, and wherein the tonal feature amount includes a plurality of series of feature values corresponding to a plurality of difference matrices which are derived from the plurality of the shift matrices, each difference matrix being composed of an array of element values each representing a difference between the corresponding component value of each shift matrix and the corresponding component value of the component matrix, one series of feature values corresponding to the series of unit bands of the difference matrix, one feature value representing a sequence of element values arranged in the time-axis direction at the corresponding unit band of the difference matrix.
  • In this configuration, since the amount of data of the tonal feature amount is reduced by representing the tendency of temporal change of the tone of the audio signal by a plurality of feature value series, it is possible to reduce capacity required for the storage part, compared to the prior art configuration (for example, Jouni Paulus and Anssi Klapuri, “Measuring the Similarity of Rhythmic Patterns”, Proc. ISMIR 2002, p. 150-156) in which a feature amount is extracted for each unit period. In addition, since the number of the feature value series does not depend on the time length of the audio signal, it is possible to easily compare temporal changes of the tones of audio signals even when the audio signals have different time lengths. Accordingly, there is also an advantage in that load of processing associated with the feature comparison part is reduced.
  • The audio analysis apparatus according to each of the above embodiments may not only be implemented by hardware (electronic circuitry) such as a Digital Signal Processor (DSP) dedicated to analysis of audio signals but may also be implemented through cooperation of a general arithmetic processing unit such as a Central Processing Unit (CPU) with a program. The program according to the invention is executable by a computer to perform processes of: acquiring a component matrix composed of an array of component values from an audio signal which is divided into a sequence of unit periods in a time-axis direction, columns of the component matrix corresponding to the sequence of unit periods of the audio signal and rows of the component matrix corresponding to a series of unit bands of the audio signal arranged in a frequency-axis direction, the component value representing a spectrum component of the audio signal belonging to the corresponding unit period and belonging to the corresponding unit band; generating a plurality of shift matrices each obtained by shifting the columns of the component matrix in the time-axis direction with a different shift amount; generating a plurality of difference matrices each composed of an array of element values in correspondence to the plurality of the shift matrices, the element value representing a difference between the corresponding component value of the shift matrix and the corresponding component value of the component matrix; and generating a tonal feature amount including a plurality of series of feature values corresponding to the plurality of difference matrices, one series of feature values corresponding to the series of unit bands of the difference matrix, one feature value representing a sequence of element values arranged in the time-axis direction at the corresponding unit band of the difference matrix.
  • The program achieves the same operations and advantages as those of the audio analysis apparatus according to the invention. The program of the invention may be provided to a user through a computer readable storage medium storing the program and then installed on a computer and may also be provided from a server device to a user through distribution over a communication network and then installed on a computer.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • FIG. 1 is a block diagram of an audio analysis apparatus according to an embodiment of the invention.
  • FIG. 2 is a block diagram of a signal analyzer.
  • FIGS. 3(A) and 3(B) are schematic diagrams illustrating relationships between a component matrix and a time sequence of the spectrum of an audio signal.
  • FIG. 4 is a block diagram of a difference generator.
  • FIG. 5 is a diagram illustrating operation of the difference generator.
  • FIG. 6 is a diagram illustrating operation of a feature amount extractor.
  • FIG. 7 is a schematic diagram of a tone image.
  • DETAILED DESCRIPTION OF THE INVENTION
  • A: First Embodiment
  • FIG. 1 is a block diagram of an audio analysis apparatus 100 according to an embodiment of the invention. The audio analysis apparatus 100 is a device for analyzing the characteristics of sounds (musical sounds or vocal sounds) included in a piece of music and is implemented through a computer system including an arithmetic processing unit 12, a storage device 14, and a display device 16.
  • The storage device 14 stores various data used by the arithmetic processing unit 12 and a program PGM executed by the arithmetic processing unit 12. Any known machine readable storage medium such as a semiconductor recording medium or a magnetic recording medium or a combination of various types of recording media may be employed as the storage device 14.
  • As shown in FIG. 1, the storage device 14 stores audio signals X (X1, X2). Each audio signal X is a signal representing temporal waveforms of sounds included in a piece of music and is prepared for, for example, a section, from which it is possible to identify a melody or a rhythm of the piece of music (for example, a section corresponding to a specific number of measures in the piece of music). The audio signal X1 and the audio signal X2 represent parts of different pieces of music. However, it is also possible to employ a configuration in which the audio signal X1 and the audio signal X2 represent different parts of the same piece of music or a configuration in which the audio signal X represents the entirety of a piece of music.
  • The arithmetic processing unit 12 implements a plurality of functions (including a signal analyzer 22, a display controller 24, and a feature comparator 26) required to analyze each audio signal X through execution of the program PGM stored in the storage device 14. The signal analyzer 22 generates a tonal feature amount F(F1, F2) representing the features of the tone color or timbre of the audio signal X. The display controller 24 displays the tonal feature amount F generated by the signal analyzer 22 as an image on the display device 16 (for example, a liquid crystal display). The feature comparator 26 compares the tonal feature amount F1 of the first audio signal X1 and the tonal feature amount F2 of the second audio signal X2. It is also possible to employ a configuration in which each function of the arithmetic processing unit 12 is implemented through a dedicated electronic circuit (DSP) or a configuration in which each function of the arithmetic processing unit 12 is distributed on a plurality of integrated circuits.
  • FIG. 2 is a block diagram of the signal analyzer 22. As shown in FIG. 2, the signal analyzer 22 includes a component acquirer 32, a difference generator 34, and a feature amount extractor 36. The component acquirer 32 generates a component matrix A representing temporal changes of frequency characteristics of the audio signal X. As shown in FIG. 2, the component acquirer 32 includes a frequency analyzer 322 and a matrix generator 324.
  • The frequency analyzer 322 generates a spectrum PX of the frequency domain for each of N unit periods (frames) σT[1] to σT[N] having a predetermined length into which the audio signal X is divided, where N is a natural number greater than 1. FIG. 3(A) is a schematic diagram of a time sequence (i.e., a spectrogram) of the spectrum PX generated by the frequency analyzer 322. As shown in FIG. 3(A), the spectrum PX of the audio signal X is a power spectrum in which the respective component values (strengths or magnitudes) x of frequency components of the audio signal X are arranged on the frequency axis. Since each unit period σT[n] (n=1˜N) is set to a predetermined length, the total number N of unit periods σT[n] varies depending on the time length of the audio signal X. The component acquirer 32 may use any known frequency analysis method such as, for example, short time Fourier transform to generate the spectrum PX.
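  • The framing and spectrum generation described above might be sketched as follows. This is only an illustrative Python example under simplifying assumptions that are not taken from the specification: the unit periods are non-overlapping, no window function is applied, and a naive discrete Fourier transform is used in place of an optimized short time Fourier transform. The name `power_spectra` and the test signal are hypothetical.

```python
import cmath
import math


def power_spectra(signal, frame_len):
    """Split the signal into N non-overlapping frames and return one
    power spectrum (bins 0 .. frame_len // 2) per frame."""
    n_frames = len(signal) // frame_len
    spectra = []
    for i in range(n_frames):
        frame = signal[i * frame_len:(i + 1) * frame_len]
        px = []
        for f in range(frame_len // 2 + 1):
            # Naive DFT of one frequency bin.
            acc = sum(
                x * cmath.exp(-2j * math.pi * f * t / frame_len)
                for t, x in enumerate(frame)
            )
            px.append(abs(acc) ** 2)
        spectra.append(px)
    return spectra


# A cosine that completes two cycles per 8-sample frame, 3 frames long.
signal = [math.cos(2 * math.pi * 2 * t / 8) for t in range(24)]
spectra = power_spectra(signal, 8)
```

  • The number of spectra returned grows with the length of the signal, which matches the statement that the total number N of unit periods depends on the time length of the audio signal X.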
  • The matrix generator 324 of FIG. 2 generates a component matrix A from the time sequence of the spectrum PX generated by the frequency analyzer 322. As shown in FIG. 3(B), the component matrix A is an M×N matrix of component values a[1, 1] to a[M, N] arranged in M rows and N columns, where M is a natural number greater than 1. Assuming that M unit bands σF[1] to σF[M] are defined on the frequency axis, the matrix generator 324 calculates each component value a[m, n] of the component matrix A according to a plurality of component values x in the mth unit band σF[m] in the spectrum PX of the nth unit period σT[n] on the time axis. For example, the matrix generator 324 calculates, as the component value a[m, n], an average (arithmetic average) of a plurality of component values x in the unit band σF[m]. As can be understood from the above description, the component matrix A is a matrix of component values a[m, n], each corresponding to an average strength of a corresponding unit band σF[m] in a corresponding unit period σT[n] of the audio signal X, which are arranged in M rows and N columns, the M rows being arranged in the frequency-axis direction (i.e., in the vertical direction), the N columns being arranged in the time-axis direction (i.e., in the horizontal direction). Each of the unit bands σF[1] to σF[M] is set to a bandwidth corresponding to one octave.
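  • A minimal Python sketch of this band averaging is given below. It assumes that each unit band is described by a pair of spectrum bin indices and that the spectra are stored as one list per unit period; the name `component_matrix`, the `band_edges` representation, and the sample values are assumptions made for the example. The edges double from band to band only to mimic octave-wide bands.

```python
def component_matrix(spectra, band_edges):
    """Build A with M rows (unit bands) and N columns (unit periods).

    spectra: N power spectra, each a list of component values x per bin.
    band_edges: M + 1 increasing bin indices; band m covers the bins
    band_edges[m] (inclusive) to band_edges[m + 1] (exclusive).
    """
    a = []
    for m in range(len(band_edges) - 1):
        lo, hi = band_edges[m], band_edges[m + 1]
        # a[m][n] is the arithmetic average of the bins of band m in period n.
        a.append([sum(px[lo:hi]) / (hi - lo) for px in spectra])
    return a


# Hypothetical octave-like bands: bins [1, 2), [2, 4) and [4, 8).
edges = [1, 2, 4, 8]
spectra = [
    [9.0, 1.0, 2.0, 4.0, 3.0, 1.0, 1.0, 1.0, 1.0],
    [9.0, 3.0, 0.0, 0.0, 2.0, 2.0, 2.0, 2.0, 5.0],
]
A = component_matrix(spectra, edges)
```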
  • The difference generator 34 generates K different difference matrices D1 to DK from the component matrix A, where K is a natural number greater than 1. FIG. 4 is a block diagram of the difference generator 34 and FIG. 5 is a diagram illustrating operation of the difference generator 34. As shown in FIG. 4, the difference generator 34 includes a shift matrix generator 42, a difference calculator 44, a weight generator 46, and a corrector 48. In FIG. 5, the reference numbers of the elements of the difference generator 34 are written at locations corresponding to processes performed by the elements.
  • The shift matrix generator 42 of FIG. 4 generates K shift matrices B1 to BK corresponding to the different difference matrices Dk (k=1˜K) from the single component matrix A. As shown in FIG. 5, each shift matrix Bk is a matrix obtained by shifting each component value a[m, n] of the component matrix A by a shift amount kΔ different for each shift matrix Bk along the time-axis direction. Each shift matrix Bk includes component values bk[1, 1] to bk[M, N] arranged in M rows and N columns, the M rows being arranged in the frequency-axis direction and the N columns being arranged in the time-axis direction. That is, a component value bk[m, n] located in the mth row and the nth column among the component values of the shift matrix Bk corresponds to a component value a[m, n+kΔ] located in the mth row and the (n+kΔ)th column of the component matrix A.
  • The unit Δ of the shift amount kΔ is set to a time length corresponding to one unit period σT[n]. That is, the shift matrix Bk is a matrix obtained by shifting each component value a[m, n] of the component matrix A by k unit periods σT[n] to the front side of the time-axis direction (i.e., backward in time). Here, component values a[m, n] of a number of columns of the component matrix A (hatched in FIG. 5), which correspond to the shift amount kΔ from the front edge in the time-axis direction of the component matrix A (i.e., from the 1st column), are added (i.e., circularly shifted) to the rear edge in the time-axis direction of the shift matrix Bk. That is, the 1st to kΔth columns of the component matrix A are used as the {N−(kΔ−1)}th to Nth columns of the shift matrix Bk. For example, in the case where the unit Δ is set to a time length corresponding to a single unit period σT[n], the shift matrix B1 is constructed by shifting the 1st column of the component matrix A to the Nth column and the shift matrix B2 is constructed by shifting the 1st and 2nd columns of the component matrix A to the (N−1)th and the Nth columns.
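  • The circular shift can be illustrated with a short Python sketch. It assumes that Δ equals one unit period, so that the shift amount is simply the integer k, and that the matrix is stored as a list of rows; the name `shift_matrix` and the sample matrix are hypothetical.

```python
def shift_matrix(a, shift):
    """Circularly shift every row of A toward the front of the time axis.

    Column n of the result holds column (n + shift) of A, and the first
    `shift` columns of A wrap around to the rear edge.
    """
    n_cols = len(a[0])
    s = shift % n_cols
    return [row[s:] + row[:s] for row in a]


# M = 2 unit bands, N = 4 unit periods.
A = [[1, 2, 3, 4],
     [5, 6, 7, 8]]
B1 = shift_matrix(A, 1)
B2 = shift_matrix(A, 2)
```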
  • The difference calculator 44 of FIG. 4 generates an initial difference matrix Ck corresponding to the difference between the component matrix A and the shift matrix Bk for each of the K shift matrices B1 to BK. The initial difference matrix Ck is an array of difference values ck[1, 1] to ck[M, N] arranged in M rows and N columns, the M rows being arranged in the frequency-axis direction and the N columns being arranged in the time-axis direction. As shown in FIG. 5, each difference value ck[m, n] of the initial difference matrix Ck is set to an absolute value of the difference between the component value a[m, n] of the component matrix A and the component value bk[m, n] of the shift matrix Bk (i.e., ck[m, n]=|a[m, n]−bk[m, n]|). Since the shift matrix Bk is generated by shifting the component matrix A, the difference value ck[m, n] of the initial difference matrix Ck is set to a greater number as a greater change is made to the strength of components in the unit band σF[m] of the audio signal X within a period that spans the shift amount kΔ from each unit period σT[n] on the time axis.
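  • The relation ck[m, n]=|a[m, n]−bk[m, n]| can be sketched in Python as follows, again assuming that the shift amount is a whole number of unit periods and that the shift is circular. The name `initial_difference` and the sample values are assumptions for illustration.

```python
def initial_difference(a, shift):
    """Return Ck, where ck[m][n] = |a[m][n] - a[m][(n + shift) mod N]|."""
    n_cols = len(a[0])
    s = shift % n_cols
    return [
        [abs(row[n] - row[(n + s) % n_cols]) for n in range(n_cols)]
        for row in a
    ]


# A band whose strength changes (row 0) and a band that stays constant (row 1).
A = [[1, 2, 4, 4],
     [3, 3, 3, 3]]
C1 = initial_difference(A, 1)
```

  • The constant band yields only zeros, while the changing band yields larger values where the strength differs between a unit period and the one located one shift later.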
  • The weight generator 46 of FIG. 4 generates a weight sequence W used to correct the initial difference matrix Ck. The weight sequence W is a sequence of N weights w[1] to w[N] corresponding to different unit periods σT[n] as shown in FIG. 5. The nth weight w[n] of the weight sequence W is set according to M component values a[1, n] to a[M, n] corresponding to the unit period σT[n] among component values of the component matrix A. For example, the sum or average of the M component values a[1, n] to a[M, n] is calculated as the weight w[n]. Accordingly, the weight w[n] increases as the strength (sound volume) of the unit period σT[n] over the entire band of the audio signal X increases. That is, a time sequence of the weights w[1] to w[N] corresponds to an envelope of the temporal waveform of the audio signal X.
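  • As a small illustration, the weight sequence can be computed column by column. The Python sketch below uses the sum of each column (the average would differ only by the constant factor 1/M); the name `weight_sequence` and the sample matrix are hypothetical.

```python
def weight_sequence(a):
    """Return w, where w[n] is the sum of the M component values of column n."""
    n_cols = len(a[0])
    return [sum(row[n] for row in a) for n in range(n_cols)]


# M = 2 unit bands, N = 3 unit periods; the last period is the loudest.
A = [[1, 2, 3],
     [4, 5, 6]]
W = weight_sequence(A)
```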
  • The corrector 48 of FIG. 4 generates K difference matrices D1 to DK corresponding to K initial difference matrices Ck by applying the weight sequence W generated by the weight generator 46 to the initial difference matrices Ck (C1 to CK). As shown in FIG. 5, the difference matrix Dk is a matrix composed of an array of element values dk[1, 1] to dk[M, N] arranged in M rows and N columns, the M rows being arranged in the frequency-axis direction (i.e., in the vertical direction), the N columns being arranged in the time-axis direction (i.e., in the horizontal direction). Each element value dk[m, n] of the difference matrix Dk is set to a value obtained by multiplying a difference value ck[m, n] in the nth column of the initial difference matrix Ck by the nth weight w[n] of the weight sequence W (i.e., dk[m, n]=w[n]×ck[m, n]). Accordingly, each element value dk[m, n] of the difference matrix Dk is emphasized to a greater value, compared to the difference value ck[m, n] of the initial difference matrix Ck, as the strength of the audio signal X in the unit period σT[n] increases. That is, the corrector 48 functions as an element for correcting (emphasizing levels of) the distribution of N difference values ck[m, 1] to ck[m, N] arranged in the time-axis direction in the unit band σF[m].
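  • The correction dk[m, n]=w[n]×ck[m, n] amounts to scaling each column of the initial difference matrix by the weight of the same unit period. A minimal Python sketch, with the hypothetical name `apply_weights` and made-up sample values, is:

```python
def apply_weights(c, w):
    """Return Dk, where dk[m][n] = w[n] * ck[m][n]."""
    if any(len(row) != len(w) for row in c):
        raise ValueError("each row of Ck needs one weight per column")
    return [[w_n * c_mn for c_mn, w_n in zip(row, w)] for row in c]


C = [[1, 0, 2],
     [0, 3, 1]]
W = [5, 7, 9]
D = apply_weights(C, W)
```

  • Differences located in loud unit periods are thus emphasized relative to differences located in quiet ones.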
  • The feature amount extractor 36 of FIG. 2 generates a tonal feature amount F (F1, F2) of the audio signal X using the component matrix A generated by the component acquirer 32 and the K difference matrices D1 to DK generated by the difference generator 34. FIG. 6 is a diagram illustrating operation of the feature amount extractor 36. As shown in FIG. 6, the tonal feature amount F generated by the feature amount extractor 36 is an M×(K+1) matrix in which a plurality of K feature value series E1 to EK corresponding to a plurality of difference matrices Dk and one feature value series EK+1 corresponding to the component matrix A are arranged. Thus, the number M of rows and the number (K+1) of columns of the tonal feature amount F do not depend on the time length of the audio signal X (i.e., the total number N of unit periods σT[n]).
  • The feature value series EK+1 located at the (K+1)th column of the tonal feature amount F is a sequence of M feature values eK+1[1] to eK+1[M] corresponding to different unit bands σF[m]. The element value eK+1[m] is set according to N component values a[m, 1] to a[m, N] corresponding to the unit band σF[m] among component values of the component matrix A generated by the component acquirer 32. For example, the sum or average of the N component values a[m, 1] to a[m, N] is calculated as the feature value eK+1[m]. Accordingly, the feature value eK+1[m] increases as the strength of the components of the unit band σF[m] over the entire period of the audio signal X increases. That is, the feature value eK+1[m] serves as a feature amount representing an average tone (average frequency characteristics) of the audio signal X over the entire period of the audio signal X.
  • The feature value series Ek (E1 to EK) is a sequence of M feature values ek[1] to ek[M] corresponding to different unit bands σF[m]. The mth feature value ek[m] of the feature value series Ek is set according to N element values dk[m, 1] to dk[m, N] corresponding to the unit band σF[m] among element values of the difference matrix Dk. For example, the sum or average of the N element values dk[m, 1] to dk[m, N] is calculated as the feature value ek[m]. As can be understood from the above description, the feature value ek[m] is set to a greater value as the strength of the components in the unit band σF[m] of the audio signal X in each of the unit periods σT[1] to σT[N] changes more significantly in a period that spans the shift amount kΔ from the unit period σT[n]. Accordingly, in the case where the K feature values e1[m] to eK[m] (arranged in the horizontal direction) corresponding to each unit band σF[m] in the tonal feature amount F include many large feature values ek[m], it is estimated that the components of the unit band σF[m] of the audio signal X are components of sound whose strength rapidly changes in a short time. On the other hand, in the case where the K feature values e1[m] to eK[m] corresponding to each unit band σF[m] include many small feature values ek[m], it is estimated that the components of the unit band σF[m] of the audio signal X are components of sound whose strength does not greatly change over a long time (or that the components of the unit band σF[m] are not generated). That is, the K feature value series E1 to EK included in the tonal feature amount F serve as a feature amount indicating temporal changes of the components of each unit band σF[m] of the audio signal X (i.e., temporal changes of tone of the audio signal X).
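As a rough illustration of how the M×(K+1) tonal feature amount F could be assembled, the sketch below averages each row of every difference matrix Dk and of the component matrix A. The function names and the nested-list layout are hypothetical, and the average is used only because the text names it as one option (the sum would work the same way).

```python
def row_means(matrix):
    """Average each row (one unit band) over its columns (unit periods)."""
    return [sum(row) / len(row) for row in matrix]


def tonal_feature(a, d_list):
    """Build F with M rows and K+1 columns.

    a:      component matrix A as M rows of N values (hypothetical layout).
    d_list: list of K difference matrices, each with M rows.
    """
    columns = [row_means(d_k) for d_k in d_list]  # E_1 .. E_K
    columns.append(row_means(a))                  # E_{K+1}
    # Transpose so each row of F corresponds to one unit band.
    return [list(row) for row in zip(*columns)]


a = [[1.0, 3.0],
     [2.0, 2.0]]
d_list = [
    [[0.0, 2.0], [1.0, 1.0]],  # D_1
    [[4.0, 0.0], [0.0, 0.0]],  # D_2
]
f = tonal_feature(a, d_list)
```

Because every row is collapsed to a single value, the shape of `f` depends only on M and K, not on the number N of unit periods.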
  • The configuration and operation of the signal analyzer 22 of FIG. 1 have been described above. The signal analyzer 22 sequentially generates the tonal feature amount F1 of the first audio signal X1 and the tonal feature amount F2 of the second audio signal X2 through the above procedure. The tonal feature amounts F generated by the signal analyzer 22 are provided to the storage device 14.
  • The display controller 24 displays tone images G (G1, G2) of FIG. 7 schematically and graphically representing the tonal feature amounts F (F1, F2) generated by the signal analyzer 22 on the display device 16. FIG. 7 illustrates an example in which the tone image G1 of the tonal feature amount F1 of the audio signal X1 and the tone image G2 of the tonal feature amount F2 of the audio signal X2 are displayed in parallel.
  • As shown in FIG. 7, each tone image G is a mapping pattern in which unit figures u[m, κ] corresponding to the element values eκ[m] of the tonal feature amount F (κ=1 to K+1) are mapped in a matrix of M rows and (K+1) columns along the horizontal axis corresponding to the time axis and along the frequency axis (vertical axis) perpendicular to the horizontal axis. The tone image G1 of the audio signal X1 and the tone image G2 of the audio signal X2 are displayed in contrast with respect to the common horizontal axis (time axis).
  • As shown in FIG. 7, a display form (color or gray level) of a unit figure u[m, κ] located at an mth row and a κth column in the tone image G1 is variably set according to a feature value eκ[m] in the tonal feature amount F1. Similarly, a display form of each unit figure u[m, κ] of the tone image G2 is variably set according to a feature value eκ[m] in the tonal feature amount F2. Accordingly, the user who has viewed the tone images G can intuitively identify and compare the tendencies of the tones of the audio signal X1 and the audio signal X2.
  • Specifically, the user can easily identify the tendency of the average tone (frequency characteristics) of the audio signal X over the entire period of the audio signal X from the M unit figures u[1, K+1] to u[M, K+1] (the feature value series EK+1) of the (K+1)th column among the unit figures of the tone image G. The user can also easily identify the tendency of temporal changes of the components of each unit band σF[m] (i.e., each octave) of the audio signal X from the unit figures u[m, k] of the 1st to Kth columns among the unit figures of the tone image G. In addition, the user can easily compare the tone of the audio signal X1 and the tone of the audio signal X2 since the number M of rows and the number (K+1) of columns of the unit figures u[m, κ] are common to the tone image G1 and the tone image G2 regardless of the time length of each audio signal X.
  • The feature comparator 26 of FIG. 1 calculates a value (hereinafter referred to as a “similarity index value”) Q which is a measure of the tonal similarity between the audio signal X1 and audio signal X2 by comparing the tonal feature amount F1 of the audio signal X1 and the tonal feature amount F2 of the audio signal X2. Although any method may be employed to calculate the similarity index value Q, it is possible to employ a configuration in which differences between corresponding feature values eκ[m] in the tonal feature amount F1 and the tonal feature amount F2 (i.e., differences between feature values eκ[m] located at corresponding positions in the two matrices) are calculated and the sum or average of absolute values of the differences over the M rows and the (K+1) columns is calculated as the similarity index value Q. That is, the similarity index value Q decreases as the similarity between the tonal feature amount F1 of the audio signal X1 and the tonal feature amount F2 of the audio signal X2 increases. The similarity index value Q calculated by the feature comparator 26 is displayed on the display device 16, for example, together with the tone images G (G1, G2) of FIG. 7. The user can quantitatively determine the tonal similarity between the audio signal X1 and the audio signal X2 from the similarity index value Q.
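One of the configurations mentioned above, the sum or average of absolute differences between corresponding feature values, might look like the following sketch. The function name `similarity_index` and the `average` flag are illustrative assumptions; the text leaves the method of calculating Q open.

```python
def similarity_index(f1, f2, average=True):
    """Sum or average of |f1[m][k] - f2[m][k]| over all M x (K+1) positions.

    A smaller value means the two tonal feature amounts are more alike.
    """
    if len(f1) != len(f2) or any(len(r1) != len(r2) for r1, r2 in zip(f1, f2)):
        raise ValueError("F1 and F2 must have the same number of rows and columns")
    diffs = [abs(x - y) for r1, r2 in zip(f1, f2) for x, y in zip(r1, r2)]
    total = sum(diffs)
    return total / len(diffs) if average else total


f1 = [[1.0, 2.0, 2.0],
      [1.0, 0.0, 2.0]]
f2 = [[1.0, 1.0, 2.0],
      [0.0, 0.0, 4.0]]
q_sum = similarity_index(f1, f2, average=False)
q_avg = similarity_index(f1, f2)
```

Since both matrices always have M rows and (K+1) columns, the comparison is a plain element-wise pass and needs no time alignment between the two signals.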
  • In the above embodiment, the tendency of the average tone of the audio signal X over the entire period of the audio signal X is represented by the feature value series EK+1 and the tendency of temporal changes of the tone of the audio signal X over the entire period of the audio signal X is represented by K feature value series E1 to EK corresponding to the number of shift matrices Bk (i.e., the number of shift amounts kΔ). Accordingly, it is possible to reduce the amount of data required to estimate the tone color or timbre of a piece of music, compared to the prior art configuration (for example, Jouni Paulus and Anssi Klapuri, “Measuring the Similarity of Rhythmic Patterns”, Proc. ISMIR 2002, p. 150-156) in which a feature amount such as an MFCC is extracted for each unit period σT[n]. In addition, since feature values eκ[m] of the tonal feature amount F are calculated using unit bands σF[m], each including a plurality of component values x, as frequency-axis units, the amount of data of the tonal feature amount F is reduced, for example, compared to the prior art configuration in which a feature value is calculated for each frequency corresponding to each component value x. There is also an advantage in that the user can easily identify the range of each feature value eκ[1] to eκ[M] of the tonal feature amount F since each unit band σF[m] is set to a bandwidth of one octave.
  • Further, since the number K of the feature value series E1 to EK representing the temporal change of the tone of the audio signal X does not depend on the time length of the audio signal X, the user can easily estimate the tonal similarity between the audio signal X1 and the audio signal X2 by comparing the tone image G1 and the tone image G2 even when the time lengths of the audio signal X1 and the audio signal X2 are different. Furthermore, in principle, the process for locating corresponding time points between the audio signal X1 and the audio signal X2 (for example, DP matching required in the technology of Jouni Paulus and Anssi Klapuri, “Measuring the Similarity of Rhythmic Patterns”, Proc. ISMIR 2002, p. 150-156) is unnecessary since the number M of rows and the number (K+1) of columns of the tonal feature amount F do not depend on the time length of the audio signal X. Therefore, there is also an advantage in that the processing load for comparing the tones of the audio signal X1 and the audio signal X2 (i.e., the load of the feature comparator 26) is reduced.
  • <Modifications>
  • Various modifications can be made to each of the above embodiments. The following are specific examples of such modifications. Two or more modifications selected from the following examples may be combined as appropriate.
  • (1) Modification 1
  • The method of calculating the component value a[m, n] of each unit band σF[m] is not limited to the above method in which an average (arithmetic average) of a plurality of component values x in the unit band σF[m] is calculated as the component value a[m, n]. For example, it is possible to employ a configuration in which the weighted sum, the sum, or the median of the plurality of component values x in the unit band σF[m] is calculated as the component value a[m, n] or a configuration in which each component value x is directly used as the component value a[m, n] of the component matrix A. In addition, the bandwidth of the unit band σF[m] may be arbitrarily selected without being limited to one octave. For example, it is possible to employ a configuration in which each unit band σF[m] is set to a bandwidth corresponding to a multiple of one octave or a bandwidth corresponding to a fraction of one octave (i.e., one octave divided by an integer).
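The alternative ways of reducing the component values x of one unit band to a single component value a[m, n] can be compared side by side. In this sketch the function name `band_value`, the `method` strings, and the example weights are hypothetical, and the median is used as the reading of the middle value of the component values.

```python
from statistics import median


def band_value(xs, method="mean", weights=None):
    """Reduce the component values xs of one unit band to a single value."""
    if method == "mean":
        return sum(xs) / len(xs)
    if method == "sum":
        return sum(xs)
    if method == "median":
        return median(xs)
    if method == "weighted_sum":
        if weights is None or len(weights) != len(xs):
            raise ValueError("weighted_sum needs one weight per component value")
        return sum(w * x for w, x in zip(weights, xs))
    raise ValueError(f"unknown method: {method}")


xs = [1.0, 2.0, 6.0]
results = {
    "mean": band_value(xs, "mean"),
    "sum": band_value(xs, "sum"),
    "median": band_value(xs, "median"),
    "weighted_sum": band_value(xs, "weighted_sum", weights=[0.5, 0.25, 0.25]),
}
```

The median is less sensitive than the mean to a single strong component within the band, which is one reason a configuration might prefer it.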
  • (2) Modification 2
  • Although the initial difference matrix Ck is corrected to the difference matrix Dk using the weight sequence W in the above embodiment, it is possible to omit correction using the weight sequence W. For example, it is possible to employ a configuration in which the feature amount extractor 36 generates the tonal feature amount F using the initial difference matrix Ck calculated by the difference calculator 44 of FIG. 4 as the difference matrix Dk (such that the weight generator 46, the corrector 48, and the like are omitted).
  • (3) Modification 3
  • Although the tonal feature amount F including the K feature value series E1 to EK generated from difference matrices Dk and the feature value series EK+1 corresponding to the component matrix A is generated in the above embodiment, the feature value series EK+1 may be omitted from the tonal feature amount F.
  • (4) Modification 4
  • Although each shift matrix Bk is generated by shifting the component values a[m, n] at the front edge of the component matrix A to the rear edge in the above embodiment, the method of generating the shift matrix Bk by the shift matrix generator 42 may be modified as appropriate. For example, it is possible to employ a configuration in which a shift matrix Bk of M rows and (N−kΔ) columns is generated by eliminating a number of columns corresponding to the shift amount kΔ at the front side of the component matrix A from among the columns of the component matrix A. The difference calculator 44 generates an initial difference matrix Ck of M rows and (N−kΔ) columns by calculating difference values ck[m, n] between the component values a[m, n] and the corresponding component values of the shift matrix Bk only for an overlapping portion of the component matrix A and the shift matrix Bk. Although each component value a[m, n] of the component matrix A is shifted to the front side of the time axis in the above example, it is also possible to employ a configuration in which the shift matrix Bk is generated by shifting each component value a[m, n] to the rear side of the time axis (i.e., forward in time).
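The truncated variant of this modification can be sketched as follows. Several details are assumptions made for the example and are not stated in this passage: the difference value is taken as an absolute difference, the overlap is taken to pair column n of the component matrix A with column n of the shortened shift matrix (that is, with column n+kΔ of A), and the name `truncated_difference` and its `shift` parameter (standing for kΔ) are hypothetical.

```python
def truncated_difference(a, shift):
    """Initial difference matrix C_k over the overlap of A and a truncated B_k.

    a:     component matrix A as M rows of N values (hypothetical layout).
    shift: number of columns k*delta removed from the front of A to form B_k.
    Returns M rows of N - shift values.
    """
    n_cols = len(a[0])
    if not 0 < shift < n_cols:
        raise ValueError("shift must be between 1 and N - 1")
    c_k = []
    for row in a:
        b_row = row[shift:]          # row of B_k: front `shift` columns dropped
        overlap = row[:len(b_row)]   # leading part of A that overlaps B_k
        # Assumption: the difference value is an absolute difference.
        c_k.append([abs(x - y) for x, y in zip(overlap, b_row)])
    return c_k


a = [[1.0, 4.0, 2.0, 7.0],
     [0.0, 0.0, 3.0, 3.0]]
c_1 = truncated_difference(a, shift=1)
```

Unlike a circular shift, this variant never compares the end of the signal with its beginning, at the cost of a difference matrix that is kΔ columns shorter.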
  • (5) Modification 5
  • Although the frequency analyzer 322 of the component acquirer 32 generates the spectrum PX from the audio signal X while the matrix generator 324 generates the component matrix A from the time sequence of the spectrum PX in the above embodiment, the component acquirer 32 may acquire the component matrix A using any other method. For example, it is possible to employ a configuration in which the component matrix A of the audio signal X is stored in the storage device 14 in advance (such that storage of the audio signal X may be omitted) and the component acquirer 32 acquires the component matrix A from the storage device 14. It is also possible to employ a configuration in which a time sequence of each spectrum PX of the audio signal X is stored in the storage device 14 in advance (such that storage of the audio signal X or the frequency analyzer 322 may be omitted) and the component acquirer 32 (the matrix generator 324) generates the component matrix A from the spectrum PX in the storage device 14. That is, the component acquirer 32 may be any element for acquiring the component matrix A.
  • (6) Modification 6
  • Although the audio analysis apparatus 100 includes both the signal analyzer 22 and the feature comparator 26 in the above example, the invention may also be realized as an audio analysis apparatus including only one of the signal analyzer 22 and the feature comparator 26. That is, an audio analysis apparatus used to analyze the tone of the audio signal X (i.e., used to extract the tonal feature amount F) (hereinafter referred to as a “feature extraction apparatus”) may have a configuration in which the signal analyzer 22 is provided while the feature comparator 26 is omitted. On the other hand, an audio analysis apparatus used to compare the tones of the audio signal X1 and the audio signal X2 (i.e., used to calculate the similarity index value Q) (hereinafter referred to as a “feature comparison apparatus”) may have a configuration in which the feature comparator 26 is provided while the signal analyzer 22 is omitted. The tonal feature amounts F (F1, F2) generated by the signal analyzer 22 of the feature extraction apparatus are provided to the feature comparison apparatus through, for example, a communication network or a portable recording medium and are then stored in the storage device 14. The feature comparator 26 of the feature comparison apparatus calculates the similarity index value Q by comparing the tonal feature amount F1 and the tonal feature amount F2 stored in the storage device 14.

Claims (6)

1. An audio analysis apparatus comprising:
a component acquisition part that acquires a component matrix composed of an array of component values from an audio signal which is divided into a sequence of unit periods in a time-axis direction, columns of the component matrix corresponding to the sequence of unit periods of the audio signal and rows of the component matrix corresponding to a series of unit bands of the audio signal arranged in a frequency-axis direction, the component value representing a spectrum component of the audio signal belonging to the corresponding unit period and belonging to the corresponding unit band;
a difference generation part that generates a plurality of shift matrices each obtained by shifting the columns of the component matrix in the time-axis direction with a different shift amount, and that generates a plurality of difference matrices each composed of an array of element values in correspondence to the plurality of the shift matrices, the element value representing a difference between the corresponding component value of the shift matrix and the corresponding component value of the component matrix; and
a feature amount extraction part that generates a tonal feature amount including a plurality of series of feature values corresponding to the plurality of difference matrices, one series of feature values corresponding to the series of unit bands of the difference matrix, one feature value representing a sequence of element values arranged in the time-axis direction at the corresponding unit band of the difference matrix.
2. The audio analysis apparatus according to claim 1, wherein the difference generation part comprises:
a weight generation part that generates a sequence of weights from the component matrix in correspondence to the sequence of the unit periods, the weight corresponding to a series of component values arranged in the frequency axis direction at the corresponding unit period;
a difference calculation part that generates each initial difference matrix composed of an array of difference values of component values between each shift matrix and the component matrix; and
a correction part that generates each difference matrix by applying the sequence of the weights to each initial difference matrix.
3. The audio analysis apparatus according to claim 1, wherein the feature amount extraction part generates the tonal feature amount including a series of feature values derived from the component matrix in correspondence to the series of the unit bands, each feature value corresponding to a sequence of component values of the component matrix arranged in the time-axis direction at the corresponding unit band.
4. An audio analysis apparatus comprising:
a storage part that stores a tonal feature amount for each of first and second ones of an audio signal; and
a feature comparison part that calculates a similarity index value indicating tonal similarity between the first audio signal and the second audio signal by comparing the tonal feature amounts of the first audio signal and the second audio signal with each other, wherein
the tonal feature amount is derived based on a component matrix of the audio signal which is divided into a sequence of unit periods in a time-axis direction and based on a plurality of shift matrices derived from the component matrix, the component matrix being composed of an array of component values, columns of the component matrix corresponding to the sequence of unit periods of the audio signal and rows of the component matrix corresponding to a series of unit bands of the audio signal arranged in a frequency-axis direction, the component value representing a spectrum component of the audio signal belonging to the corresponding unit period and belonging to the corresponding unit band, each shift matrix being obtained by shifting the columns of the component matrix in the time-axis direction with a different shift amount, and wherein
the tonal feature amount includes a plurality of series of feature values corresponding to a plurality of difference matrices which are derived from the plurality of the shift matrices, each difference matrix being composed of an array of element values each representing a difference between the corresponding component value of each shift matrix and the corresponding component value of the component matrix, one series of feature values corresponding to the series of unit bands of the difference matrix, one feature value representing a sequence of element values arranged in the time-axis direction at the corresponding unit band of the difference matrix.
5. A machine readable storage medium containing an audio analysis program being executable by a computer to perform processes of:
acquiring a component matrix composed of an array of component values from an audio signal which is divided into a sequence of unit periods in a time-axis direction, columns of the component matrix corresponding to the sequence of unit periods of the audio signal and rows of the component matrix corresponding to a series of unit bands of the audio signal arranged in a frequency-axis direction, the component value representing a spectrum component of the audio signal belonging to the corresponding unit period and belonging to the corresponding unit band;
generating a plurality of shift matrices each obtained by shifting the columns of the component matrix in the time-axis direction with a different shift amount;
generating a plurality of difference matrices each composed of an array of element values in correspondence to the plurality of the shift matrices, the element value representing a difference between the corresponding component value of the shift matrix and the corresponding component value of the component matrix; and
generating a tonal feature amount including a plurality of series of feature values corresponding to the plurality of difference matrices, one series of feature values corresponding to the series of unit bands of the difference matrix, one feature value representing a sequence of element values arranged in the time-axis direction at the corresponding unit band of the difference matrix.
6. A data structure of a tonal feature amount representing a tone color of an audio signal, wherein
the tonal feature amount is derived based on a component matrix of the audio signal which is divided into a sequence of unit periods in a time-axis direction and based on a plurality of shift matrices derived from the component matrix, the component matrix being composed of an array of component values, columns of the component matrix corresponding to the sequence of unit periods of the audio signal and rows of the component matrix corresponding to a series of unit bands of the audio signal arranged in a frequency-axis direction, the component value representing a spectrum component of the audio signal belonging to the corresponding unit period and belonging to the corresponding unit band, each shift matrix being obtained by shifting the columns of the component matrix in the time-axis direction with a different shift amount, and wherein
the tonal feature amount includes a plurality of series of feature values corresponding to a plurality of difference matrices which are derived from the plurality of the shift matrices, each difference matrix being composed of an array of element values each representing a difference between the corresponding component value of each shift matrix and the corresponding component value of the component matrix, one series of feature values corresponding to the series of unit bands of the difference matrix, one feature value representing a sequence of element values arranged in the time-axis direction at the corresponding unit band of the difference matrix.
US13/081,408 2010-04-07 2011-04-06 Audio analysis apparatus Expired - Fee Related US8853516B2 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2010-088354 2010-04-07
JP2010088354A JP5454317B2 (en) 2010-04-07 2010-04-07 Acoustic analyzer

Publications (2)

Publication Number Publication Date
US20110268284A1 true US20110268284A1 (en) 2011-11-03
US8853516B2 US8853516B2 (en) 2014-10-07

Family

ID=44303303

Family Applications (1)

Application Number Title Priority Date Filing Date
US13/081,408 Expired - Fee Related US8853516B2 (en) 2010-04-07 2011-04-06 Audio analysis apparatus

Country Status (3)

Country Link
US (1) US8853516B2 (en)
EP (1) EP2375406B1 (en)
JP (1) JP5454317B2 (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120113122A1 (en) * 2010-11-09 2012-05-10 Denso Corporation Sound field visualization system
US20130289756A1 (en) * 2010-12-30 2013-10-31 Barbara Resch Ranking Representative Segments in Media Data
US20140260913A1 (en) * 2013-03-15 2014-09-18 Exomens Ltd. System and method for analysis and creation of music
US8853516B2 (en) * 2010-04-07 2014-10-07 Yamaha Corporation Audio analysis apparatus
US20160092157A1 (en) * 2014-09-25 2016-03-31 Honeywell International Inc. Method of integrating a home entertainment system with life style systems which include searching and playing music using voice commands based upon humming or singing
US9705857B1 (en) * 2014-10-10 2017-07-11 Sprint Spectrum L.P. Securely outputting a security key stored in a UE
CN112885374A (en) * 2021-01-27 2021-06-01 吴怡然 Sound accuracy judgment method and system based on spectrum analysis

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9299362B2 (en) * 2009-06-29 2016-03-29 Mitsubishi Electric Corporation Audio signal processing device
JP5582123B2 (en) 2011-10-05 2014-09-03 三菱電機株式会社 Semiconductor device
JP5935503B2 (en) * 2012-05-18 2016-06-15 ヤマハ株式会社 Music analysis apparatus and music analysis method
US9681230B2 (en) 2014-10-17 2017-06-13 Yamaha Corporation Acoustic system, output device, and acoustic system control method
KR20180050947A (en) 2016-11-07 2018-05-16 삼성전자주식회사 Representative waveform providing apparatus and method
US10504504B1 (en) 2018-12-07 2019-12-10 Vocalid, Inc. Image-based approaches to classifying audio data
US11170043B2 (en) * 2019-04-08 2021-11-09 Deluxe One Llc Method for providing visualization of progress during media search
CN111292763B (en) * 2020-05-11 2020-08-18 新东方教育科技集团有限公司 Stress detection method and device, and non-transient storage medium

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6430533B1 (en) * 1996-05-03 2002-08-06 Lsi Logic Corporation Audio decoder core MPEG-1/MPEG-2/AC-3 functional algorithm partitioning and implementation
US20050117532A1 (en) * 2003-05-13 2005-06-02 Nokia Corporation Fourier-transform based linear equalization for CDMA downlink
US20080072741A1 (en) * 2006-09-27 2008-03-27 Ellis Daniel P Methods and Systems for Identifying Similar Songs
US20090005890A1 (en) * 2007-06-29 2009-01-01 Tong Zhang Generating music thumbnails and identifying related song structure
US7509294B2 (en) * 2003-12-30 2009-03-24 Samsung Electronics Co., Ltd. Synthesis subband filter for MPEG audio decoder and a decoding method thereof
US7659471B2 (en) * 2007-03-28 2010-02-09 Nokia Corporation System and method for music data repetition functionality
US20120237041A1 (en) * 2009-07-24 2012-09-20 Johannes Kepler Universität Linz Method And An Apparatus For Deriving Information From An Audio Track And Determining Similarity Between Audio Tracks
US20130289756A1 (en) * 2010-12-30 2013-10-31 Barbara Resch Ranking Representative Segments in Media Data
US20130322777A1 (en) * 2009-11-15 2013-12-05 Lester F. Ludwig High-Accuracy Centered Fractional Fourier Transform Matrix for Optical Imaging and Other Applications

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1143409B1 (en) 2000-04-06 2008-12-17 Sony France S.A. Rhythm feature extractor
US20030205124A1 (en) 2002-05-01 2003-11-06 Foote Jonathan T. Method and system for retrieving and sequencing music by rhythmic similarity
EP1577877B1 (en) 2002-10-24 2012-05-02 National Institute of Advanced Industrial Science and Technology Musical composition reproduction method and device, and method for detecting a representative motif section in musical composition data
JP4483561B2 (en) * 2004-12-10 2010-06-16 日本ビクター株式会社 Acoustic signal analysis apparatus, acoustic signal analysis method, and acoustic signal analysis program
US20080300702A1 (en) 2007-05-29 2008-12-04 Universitat Pompeu Fabra Music similarity systems and methods using descriptors
JP4973537B2 (en) * 2008-02-19 2012-07-11 ヤマハ株式会社 Sound processing apparatus and program
JP2010054802A (en) * 2008-08-28 2010-03-11 Univ Of Tokyo Unit rhythm extraction method from musical acoustic signal, musical piece structure estimation method using this method, and replacing method of percussion instrument pattern in musical acoustic signal
JP5454317B2 (en) * 2010-04-07 2014-03-26 ヤマハ株式会社 Acoustic analyzer

Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6430533B1 (en) * 1996-05-03 2002-08-06 Lsi Logic Corporation Audio decoder core MPEG-1/MPEG-2/AC-3 functional algorithm partitioning and implementation
US20050117532A1 (en) * 2003-05-13 2005-06-02 Nokia Corporation Fourier-transform based linear equalization for CDMA downlink
US7502312B2 (en) * 2003-05-13 2009-03-10 Nokia Corporation Fourier-transform based linear equalization for CDMA downlink
US7509294B2 (en) * 2003-12-30 2009-03-24 Samsung Electronics Co., Ltd. Synthesis subband filter for MPEG audio decoder and a decoding method thereof
US20080072741A1 (en) * 2006-09-27 2008-03-27 Ellis Daniel P Methods and Systems for Identifying Similar Songs
US7659471B2 (en) * 2007-03-28 2010-02-09 Nokia Corporation System and method for music data repetition functionality
US20090005890A1 (en) * 2007-06-29 2009-01-01 Tong Zhang Generating music thumbnails and identifying related song structure
US20120237041A1 (en) * 2009-07-24 2012-09-20 Johannes Kepler Universität Linz Method And An Apparatus For Deriving Information From An Audio Track And Determining Similarity Between Audio Tracks
US20130322777A1 (en) * 2009-11-15 2013-12-05 Lester F. Ludwig High-Accuracy Centered Fractional Fourier Transform Matrix for Optical Imaging and Other Applications
US8712185B2 (en) * 2009-11-15 2014-04-29 Lester F. Ludwig High-accuracy centered fractional fourier transform matrix for optical imaging and other applications
US20130289756A1 (en) * 2010-12-30 2013-10-31 Barbara Resch Ranking Representative Segments in Media Data

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8853516B2 (en) * 2010-04-07 2014-10-07 Yamaha Corporation Audio analysis apparatus
US20120113122A1 (en) * 2010-11-09 2012-05-10 Denso Corporation Sound field visualization system
US20130289756A1 (en) * 2010-12-30 2013-10-31 Barbara Resch Ranking Representative Segments in Media Data
US9313593B2 (en) * 2010-12-30 2016-04-12 Dolby Laboratories Licensing Corporation Ranking representative segments in media data
US9317561B2 (en) 2010-12-30 2016-04-19 Dolby Laboratories Licensing Corporation Scene change detection around a set of seed points in media data
US20140260913A1 (en) * 2013-03-15 2014-09-18 Exomens Ltd. System and method for analysis and creation of music
US9183821B2 (en) * 2013-03-15 2015-11-10 Exomens System and method for analysis and creation of music
US20160092157A1 (en) * 2014-09-25 2016-03-31 Honeywell International Inc. Method of integrating a home entertainment system with life style systems which include searching and playing music using voice commands based upon humming or singing
US10133537B2 (en) * 2014-09-25 2018-11-20 Honeywell International Inc. Method of integrating a home entertainment system with life style systems which include searching and playing music using voice commands based upon humming or singing
US9705857B1 (en) * 2014-10-10 2017-07-11 Sprint Spectrum L.P. Securely outputting a security key stored in a UE
CN112885374A (en) * 2021-01-27 2021-06-01 吴怡然 Sound accuracy judgment method and system based on spectrum analysis

Also Published As

Publication number Publication date
US8853516B2 (en) 2014-10-07
JP2011221157A (en) 2011-11-04
JP5454317B2 (en) 2014-03-26
EP2375406B1 (en) 2014-07-16
EP2375406A1 (en) 2011-10-12

Legal Events

Date Code Title Description
AS Assignment
Owner name: YAMAHA CORPORATION, JAPAN
Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:ARIMOTO, KEITA;STREICH, SEBASTIAN;ONG, BEE SUAN;SIGNING DATES FROM 20110614 TO 20110621;REEL/FRAME:026617/0508

FEPP Fee payment procedure
Free format text: MAINTENANCE FEE REMINDER MAILED (ORIGINAL EVENT CODE: REM.)

LAPS Lapse for failure to pay maintenance fees
Free format text: PATENT EXPIRED FOR FAILURE TO PAY MAINTENANCE FEES (ORIGINAL EVENT CODE: EXP.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

STCH Information on status: patent discontinuation
Free format text: PATENT EXPIRED DUE TO NONPAYMENT OF MAINTENANCE FEES UNDER 37 CFR 1.362

FP Lapsed due to failure to pay maintenance fee
Effective date: 20181007