CN103077706B - Method for extracting and representing music fingerprint characteristic of music with regular drumbeat rhythm - Google Patents


Info

Publication number
CN103077706B
CN103077706B (application CN201310027662.3A, published as CN103077706A)
Authority
CN
China
Prior art keywords
music
matrix
file
frequency
fingerprint
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201310027662.3A
Other languages
Chinese (zh)
Other versions
CN103077706A (en)
Inventor
林晓勇
蒋玲慧
张跃
赵静
穆祥女
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing Post and Telecommunication University
Nanjing University of Posts and Telecommunications
Original Assignee
Nanjing Post and Telecommunication University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing Post and Telecommunication University filed Critical Nanjing Post and Telecommunication University
Priority to CN201310027662.3A priority Critical patent/CN103077706B/en
Publication of CN103077706A publication Critical patent/CN103077706A/en
Application granted granted Critical
Publication of CN103077706B publication Critical patent/CN103077706B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Abstract

The invention discloses a method for extracting and representing the music fingerprint features of music with a regular drumbeat rhythm. For such music, measure parameters are extracted and estimated and a measure-position offset matrix is generated; perceptually relevant parameters are extracted from the music content and separated into nonlinear Bark subbands, yielding an energy parameter matrix for each subband; the matrices are interleaved block-wise; a two-dimensional music fingerprint image is finally determined and output; and the offset matrix and fingerprint image together form an independent, representable "music fingerprint" file. The method mainly targets (genuine) classical music with clear drumbeats, extracting a specific "music fingerprint" that serves as the work's individual "fingerprint"; it can likewise extract the fingerprint of a copied (pirated, illegally recorded, or duplicated) recording, compare the two, and judge from the resulting error whether the copy is genuine.

Description

Method for extracting and representing music fingerprint features of music with a regular drumbeat rhythm
Technical field
The present invention relates to a method for extracting and representing the music fingerprint features of the content of music (especially classical music) having a regular drumbeat rhythm, and belongs to the field of music signal feature extraction and processing.
Background art
Content-Based Music Retrieval (CBMR), content-based audio identification (CBID), and audio fingerprinting (AFP) together form a retrieval field in which the music signal itself is the principal feature. CBMR comprises two main parts: music fingerprint extraction and the matching algorithms used in fingerprint retrieval.
Many fingerprint-extraction algorithms have been published at home and abroad. The widely adopted approach selects features from the short-time Fourier transform spectrogram, models the resulting feature sequences, and uses the model parameters as the fingerprint of the fragment.
Early work mainly used Linear Prediction Coefficients (LPC) from the speech-processing field and Mel-Frequency Cepstral Coefficients (MFCC) to characterize music signals. Both transform the acoustic signal into the cepstral domain; MFCC generally outperforms LPC.
Because current "fingerprint" retrieval research mainly targets general audio such as speech passages, songs, and popular music, the methods adopted are broad and general, with relatively poor robustness, and they do not generalize well to classical music, for which copyright-protection demands are rising worldwide. Classical music is melodious with comparatively regular drumbeats (for example the strong beats of piano or guzheng music), yet no solution has been reported for "fingerprint" retrieval of such regular-drumbeat music.
Summary of the invention
The technical problem solved by this invention is the rapid extraction and visual representation of music fingerprint parameters for music with a regular drumbeat rhythm (classical music). Frequencies to which the human ear is sensitive are retained and processed; measure and beat offset matrices are extracted from the drumbeat features of the classical piece; subband energies are interleaved and difference-judged; and finally a "music fingerprint" feature file is generated, giving a unique fingerprint parameter representation of the legitimate recording.
To solve the above technical problem, the present invention adopts the following technical solution:
A method for extracting and representing the music fingerprint of music with a regular drumbeat rhythm comprises a pre-processing process for the original music, a two-dimensional fingerprint-image generation process, a music-rhythm onset extraction process, and a fingerprint-file generation process. The concrete steps are as follows:
A. The pre-processing process:
Step A1: frame the sample sequence of the original music file with a sliding-window scheme whose overlap coefficient is 31/32, obtaining a series of time-series data frames;
Step A2: apply pre-emphasis to the frames obtained in A1 to filter background noise and channel white noise;
Step A3: use a filter to remove the white noise and part of the short-duration high-frequency interference introduced by the recording equipment, obtaining continuous data frames;
Step A4: apply a Hanning window to the continuous data frames to obtain the windowed time-domain signal;
Step A5: transform the windowed signal of step A4 by FFT into the frequency-domain discrete signal, i.e. the matrix {H(i, j)}, and convert it to the corresponding frequency-energy matrix {E(i, j)} in dB form via E(i, j) = 10·log10(|H(i, j)|²); here H(i, j) is the short-time frame amplitude at continuous frame i and frequency index j, E(i, j) is the frequency energy at coordinate (i, j), and i, j are natural numbers;
B. The two-dimensional fingerprint-image generation process:
Step B1: separate the frequency-energy matrix {E(i, j)} produced by step A5 into nonlinear Bark subbands using a Bark curve table;
Step B2: filter each subband against an auditory-perception threshold, keeping the energy points to which human hearing responds sensitively;
Step B3: from the nonlinear Bark values, use the frequency indices of the consecutive subbands as subband division boundaries and sum the subband energies, obtaining a continuous matrix {J(m, n)}, with m ∈ (2, 32), n ∈ (1, ∞); then perform interleaved-block processing between adjacent blocks and output the decision by the ternary method, obtaining a matrix of the three values {−1, 0, 1}, i.e. the fingerprint feature values;
Step B4: display the output fingerprint feature values as a visual image, drawing the three values {−1, 0, 1} in distinct RGB colours;
C. The music-rhythm onset extraction process:
Step C1: from the energy matrix of step A, estimate the energy of successive frames; by thresholding the zero-crossing rate and average frame energy, distinguish silence and background noise, and obtain the set {T(k)} of per-frame position offsets, k ranging from 1 to the total number of onsets found;
Step C2: restrict the frequency index range, compute frequency differences, and filter out local power minima in the onset sequence; for the filtered sequence, compute the distances between adjacent T(k), recorded as the sequence {D(k)};
Step C3: run K-Means clustering on the {D(k)} sequence to obtain its maximal subset {Dm(p)}, where p ranges from 1 to the size of the subset and Dm denotes the maximal subset of {D(k)};
Step C4: extract the time positions corresponding to {Dm(p)} as the offset data of the final valid rhythm onsets;
D. The fingerprint-file generation process:
Combine the final results of steps B and C into one file, with the result of step C as the file header and the result of step B as the data body, finally generating a visual music fingerprint data file that uniquely identifies the piece.
Compared with the prior art, the above technical scheme yields the following technical effects:
1. The nonlinear Bark subband partition avoids the over-simplification of a uniform partition and accounts for the human auditory curve's varying sensitivity to classical-music content; filtering against the auditory-sensitivity threshold removes the parts of the content that do not affect the auditory impression while preserving the perceptually valid content.
2. The "ternary method" used to describe the visual fingerprint file is more expressive than the traditional black-and-white method, and avoids the fingerprint changes that minor fluctuations under noise interference cause in a binary scheme, so the method is more "robust".
3. A clustering algorithm obtains the maximal subset of rhythm onsets; this realizes better in practice than many theoretical algorithms, effectively filtering out pseudo-onset subsets, and although a few valid points are also removed, it probabilistically guarantees that the valid rhythm onsets survive.
4. The final fingerprint file has a coloured graphical representation, and the rhythm-onset positions in its header allow the retrieval start position to be located quickly during fingerprint retrieval, shortening the comparison from the whole file to only the rhythm-onset fragments.
Brief description of the drawings
Fig. 1 is a block diagram of the functions implementing the present invention.
Fig. 2 is the pre-processing flow chart for original CD music.
Fig. 3 is the flow chart of the two-dimensional fingerprint-image generation process.
Fig. 4 is the absolute-hearing-threshold Bark curve.
Fig. 5 illustrates the "ternary method" applied to the interleaved-block processing between adjacent blocks.
Fig. 6 is a schematic of the 32-row matrix formed by the three values {−1, 0, 1}.
Fig. 7 is a schematic of the visual image of a music fingerprint.
Fig. 8 is the flow chart of the music-onset extraction method.
Fig. 9 shows the power peaks before processing and after filtering.
Fig. 10 shows the data format of the visual fingerprint file.
Detailed description of the invention
The technical scheme of the present invention is described in further detail below with reference to the accompanying drawings.
As shown in Fig. 1, the present invention proposes a method for extracting and representing the music fingerprint of content having a regular drumbeat rhythm, mainly comprising: pre-processing of the original CD music, the two-dimensional fingerprint-image generation method, the music-onset extraction method, and the fingerprint-file representation method. The fingerprint-image generation method comprises: nonlinear Bark subband separation of the sample sequence, perceptual-threshold filtering of the subbands, subband energy summation, interleaved-block matrix processing, fingerprint feature representation, and two-dimensional colour fingerprint-image display. The onset-extraction method comprises: extracting the rhythmic stresses in the music, eliminating pseudo-stress samples, obtaining the valid rhythm onset data by clustering, and recording the onset offsets; the result is finally combined with the two-dimensional fingerprint-image file into one fingerprint file that uniquely identifies the piece.
Taking classical music as the example, the main implementation steps of the present method are as follows:
A. Pre-processing reads the original classical music file and prepares it for fingerprint extraction. As shown in Fig. 2, it comprises the following steps:
A1. Framing: every 16384 samples form one frame, with overlap coefficient 31/32, i.e. adjacent frames differ in only 512 sample values.
A2. Pre-emphasis: the filter H(z) = 1 − αz⁻¹ removes background noise and channel white noise, where the empirical coefficient α ∈ (0.9375, 1) and z is the z-transform variable.
A3. RASTA filtering: this method adopts the RASTA filter, a time-series IIR band-pass filter, to remove part of the short-duration high-frequency interference; its transfer function is H(z) = 0.1 × (2 + z⁻¹ − z⁻³ − 2z⁻⁴) / (z⁻⁴ (1 − 0.94z⁻¹)).
A4. The filtered time-series frames are multiplied by a Hanning window (window length 16384), with the empirical correction coefficient β = √(8/3).
A5. An FFT converts each frame to a frequency-domain discrete signal: H(r) = Σ_{t=0}^{N−1} x(t)·W_N^{rt}, where N is the number of consecutive time-domain samples in a group (a natural number), r is the continuous frequency index, x(t) is the sample value at time t, and W_N = e^(−j2π/N); this yields the frequency-domain matrix {H(i, j)}.
A6. The matrix is converted to the corresponding energy matrix {E(i, j)} in dB form via E(i, j) = 10·log10(|H(i, j)|²) (units: dB).
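As an illustration, the pre-processing chain of steps A1-A6 can be sketched in Python. The RASTA stage (A3) is omitted for brevity, and the function and parameter names (and the small epsilon guarding the logarithm) are assumptions for the sketch, not part of the patent:

```python
import numpy as np

def preprocess(x, frame_len=16384, alpha=0.96):
    """Sketch of steps A1-A6: pre-emphasis, framing with 31/32 overlap,
    Hanning window with correction factor beta, FFT, and dB energy."""
    # A2: pre-emphasis H(z) = 1 - alpha*z^-1, with alpha in (0.9375, 1)
    x = np.append(x[0], x[1:] - alpha * x[:-1])

    # A1: overlap coefficient 31/32 -> hop of frame_len/32 = 512 samples
    hop = frame_len // 32
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop: i * hop + frame_len] for i in range(n_frames)])

    # A4: Hanning window scaled by beta = sqrt(8/3) to compensate window loss
    win = np.sqrt(8.0 / 3.0) * np.hanning(frame_len)
    frames = frames * win

    # A5: FFT to the frequency-domain matrix H(i, j)
    H = np.fft.rfft(frames, axis=1)

    # A6: dB-scale energy matrix E(i, j) = 10*log10(|H|^2)
    E = 10.0 * np.log10(np.abs(H) ** 2 + 1e-12)  # epsilon avoids log(0)
    return E
```

With a 17920-sample input this yields four frames of 8193 frequency bins each, matching the hop of 512 samples implied by the 31/32 overlap.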
B. The two-dimensional fingerprint-image generation process, shown in Fig. 3, comprises nonlinear Bark subband filtering adapted to auditory perception, perceptual-threshold filtering, and the decision on the continuous envelope variation of the subbands.
Step B1: the {E(i, j)} produced by step A6 is separated into nonlinear Bark subbands. The Bark curve table adopted here, shown in Fig. 4, encodes the hearing thresholds corresponding to the human ear's sensitivity at different frequency points; from the curve, the table Freq_Bark_AbsThresh_Table is drawn up, containing the frequency index, critical frequency value, absolute subband rate, and absolute threshold of the vocal range.
B2: each subband is filtered against the auditory-perception threshold, keeping the energy points to which human hearing responds sensitively; this can be regarded as pre-processing before the nonlinear subband separation. The steps are as follows:
B2.1: within the 300-2000 Hz band, set the ten critical bands 3, 4, 5, 6, 7, 8, 9, 10, 11 and 12, with centre frequency f_s for each critical band;
B2.2: obtain the ten hearing thresholds, one per critical band:
T(f_s) = 3.64(f_s/1000)^(−0.8) − 6.5·exp{−0.6(f_s/1000 − 3.3)²} + (f_s/1000)⁴/1000,
where f_s runs over the ten critical-band centre frequencies;
B2.3: take the minimum point Min of each band;
B2.4: within each band of the matrix {E(i, j)}, amplitudes greater than Min·10^(T(f_s)/20) are kept and amplitudes below this value are zeroed, yielding the improved matrix {E_new(i, j)};
Step B3: from the nonlinear Bark values, take 34 critical frequencies, i.e. 33 frequency-domain subbands, specifically
[111, 118, 125, 132, 140, 148, 157, 166, 176, 187, 198, 209, 222, 235, 249, 264, 279, 296, 314, 332, 352, 373, 395, 418, 443, 470, 497, 527, 558, 591, 626, 663, 703, 743],
whose frequency indices serve as the subband division boundaries;
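The absolute-hearing-threshold formula of step B2.2 can be evaluated directly; the ten centre frequencies below are illustrative placeholders for bands 3-12, not values taken from the patent's table:

```python
import numpy as np

def hearing_threshold(f_hz):
    """Absolute threshold of hearing in dB (step B2.2):
    T(f) = 3.64(f/1000)^-0.8 - 6.5 exp(-0.6(f/1000-3.3)^2) + (f/1000)^4/1000."""
    f = np.asarray(f_hz, dtype=float) / 1000.0
    return 3.64 * f ** -0.8 - 6.5 * np.exp(-0.6 * (f - 3.3) ** 2) + f ** 4 / 1000.0

# Ten assumed critical-band centre frequencies (bands 3-12, roughly 300-2000 Hz)
centres = np.array([350, 450, 570, 700, 840, 1000, 1170, 1370, 1600, 1850])
thresholds = hearing_threshold(centres)  # one threshold per band (step B2.2)
```

The curve dips below 0 dB around 3.3 kHz, the ear's most sensitive region, which is consistent with the absolute-hearing curve of Fig. 4.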
Step B4: with the frequency indices given in B3, sum the subband energies of the matrix obtained above, producing the continuous 33-row matrix {J(m, n)}, as shown in Fig. 5.
B5: perform interleaved-block processing between adjacent blocks of {J(m, n)} (Fig. 5) and output the decision by the "ternary method", as follows:
Δ(m−1, n) = |J(m, n) − J(m−1, n)|, with m ∈ (2, 32), n ∈ (1, ∞);
K = Δ(i, j+1) − Δ(i, j), with i ∈ (1, 31), j ∈ (1, ∞);
when |K| does not exceed the decision threshold, F = 0;
otherwise, when K > 0, F = +1; when K < 0, F = −1.
B6: this yields a 32-row matrix of the three values {−1, 0, 1}, as shown in Fig. 6, each element (i, j) taking one of {−1, 0, 1}. Rendering this matrix in three colours, i.e. drawing with the RGB colours (255,0,0), (0,255,0) and (0,0,255) respectively, gives the visual image of the music fingerprint, as shown in Fig. 7.
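The ternary decision of steps B5-B6 can be sketched compactly with array differencing; the small decision threshold `thresh` is an assumption, since the text does not publish the value used for the F = 0 case:

```python
import numpy as np

def ternary_fingerprint(J, thresh=1e-6):
    """Sketch of steps B5-B6: ternary decision on the 33-row subband
    energy matrix J, returning a matrix of {-1, 0, +1} fingerprint values."""
    # Delta(m-1, n) = |J(m, n) - J(m-1, n)|: difference between adjacent subbands
    delta = np.abs(np.diff(J, axis=0))   # shape (32, n)
    # K = Delta(i, j+1) - Delta(i, j): difference between adjacent frames
    K = np.diff(delta, axis=1)           # shape (32, n-1)
    F = np.zeros_like(K, dtype=int)      # F = 0 where |K| <= thresh
    F[K > thresh] = 1                    # F = +1 where K > 0
    F[K < -thresh] = -1                  # F = -1 where K < 0
    return F
```

Mapping the three output values to (255,0,0), (0,255,0) and (0,0,255) then produces the colour image of Fig. 7.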
C. This step extracts the classical-music rhythm onsets, eliminates pseudo-onsets, obtains the maximal subset by a clustering algorithm to give the set of valid rhythm onsets, and records the offset positions of that subset, as shown in Fig. 8.
Step C1: from the energy matrix of step A, estimate the energy of successive frames and, by thresholding the zero-crossing rate and average frame energy, distinguish silence and background noise; T = 0 is judged as silence, while the background-noise threshold is trained and used only for processing the leading and trailing background noise. Specifically:
Z_v = Σ_{u=−N}^{N} { |sgn[x(u) − T] − sgn[x(u−1) − T]| + |sgn[x(u) + T] − sgn[x(u−1) + T]| } · w(v − u),
where x(u−1), x(u) are consecutive sample values, T is the decision threshold, taken as 20% of the average amplitude, and u is the summation variable, normally running over the (2N+1) samples around the moment of x(v), i.e. u ∈ (−N, N); since the sum cannot extend to infinity in practice, N is taken as a sufficiently large natural number, usually 1024; w(v−u) is the window function, equal to 1 at moment (v−u) and 0 at all other moments.
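The dual-threshold zero-crossing count of step C1 can be sketched as follows; for simplicity the window term w(v−u) is replaced by a rectangular window over the whole frame, which is an assumption of the sketch:

```python
import numpy as np

def thresholded_zcr(x):
    """Sketch of step C1's zero-crossing measure: count sign changes of the
    signal against the dual thresholds +T and -T, with T set to 20% of the
    average amplitude, suppressing low-level noise crossings."""
    x = np.asarray(x, dtype=float)
    T = 0.2 * np.mean(np.abs(x))  # decision threshold: 20% of average amplitude
    a = np.sign(x - T)            # crossings of the +T level
    b = np.sign(x + T)            # crossings of the -T level
    return np.abs(np.diff(a)).sum() + np.abs(np.diff(b)).sum()
```

For a clean 5-cycle sine, each threshold level is crossed 10 times and each crossing contributes 2 to the sum, so the measure is 40.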
Step C2: the onset sequence {T(k)} obtained by step C1 contains multiple pseudo-points, i.e. points mistaken for rhythm onsets, so it must be filtered again in the frequency domain: restrict the frequency index range to (111-743), compute the frequency differences within the onset sequence, and filter out the local power minima. Fig. 9 illustrates the power peaks before processing and after filtering.
Step C3: for the onset sequence with pseudo-points removed, compute the distances between adjacent T(k), recorded as the sequence {D(k)}, k ranging from 1 to the total number of onsets obtained. This sequence also reflects genuine onset positions deleted by mistake, whose adjacent-onset intervals are therefore much larger than the usual onset interval.
Step C4: run K-Means clustering on the {D(k)} sequence and take the maximal subset with aggregation factor (0.80-0.90), defined as {Dm(p)}, where p ranges from 1 to the size of the subset and Dm denotes the maximal subset of {D(k)}.
Step C5: extract the time positions corresponding to {Dm(p)} as the offset data T(p) of the final valid rhythm onsets.
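The clustering of steps C4-C5 can be sketched with a plain 1-D k-means over the inter-onset distances; two clusters and min/max initialization are assumptions standing in for the patent's K-Means configuration, which is not fully published:

```python
import numpy as np

def dominant_interval_subset(onsets, iters=50):
    """Sketch of steps C4-C5: cluster the inter-onset distances D(k) and keep
    the largest cluster Dm(p) as the valid beat intervals, returning the
    surviving intervals together with the onset times that start them."""
    onsets = np.asarray(onsets, dtype=float)
    D = np.diff(onsets)                      # D(k): adjacent onset distances
    centres = np.array([D.min(), D.max()])   # deterministic initialization
    for _ in range(iters):
        labels = np.argmin(np.abs(D[:, None] - centres[None, :]), axis=1)
        for c in range(2):                   # recompute each cluster centre
            if (labels == c).any():
                centres[c] = D[labels == c].mean()
    biggest = np.bincount(labels, minlength=2).argmax()
    mask = labels == biggest
    return D[mask], onsets[:-1][mask]
```

An onset list with one inflated gap (a missed beat) keeps the seven regular unit intervals and discards the outlier, which is the pseudo-point filtering effect the text describes.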
D. The fingerprint-file generation process combines the final results of steps B and C into one file, with the result of step C as the header and the result of step B as the data body, finally generating a visual music fingerprint data file that uniquely identifies the piece; the file suffix is "som" and the format is shown in Fig. 10.
In this format, the characters "somusic" and "fmt" mark the file as a music fingerprint file; "head length" gives the data length from the field "somusic" through "data"; "total length" is the byte size of the whole fingerprint file; "aggregation factor" is the tightness of the clustering used when extracting onsets, defaulting to 85, i.e. 0.85; "sample frequency" is the sampling rate used when reading the source file; "transfer rate" is the waveform data rate, i.e. the average bytes per second; the "index" characters mark the subsequent starting offset positions, each four bytes, increasing gradually until the "data" field is reached; after the "data" field follow the ternary {−1, 0, +1} combinations of step B.
When the fingerprint display software reads a "som" file, it checks the validity of the file in turn and finally displays the RGB-coloured fingerprint image of the music.
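A container along these lines can be sketched with `struct`; the exact field widths and ordering are not published in this text, so the layout below (magic strings "somusic"/"fmt"/"data", 4-byte little-endian integers, ternary values stored as bytes 0/1/2) is an assumption for illustration only:

```python
import struct

def build_som(offsets, fingerprint, sample_rate, byte_rate, poly_factor=85):
    """Sketch of the 'som' file of step D: header with onset offsets,
    followed by a 'data' marker and the ternary fingerprint body."""
    # 'index': one 4-byte little-endian offset per rhythm onset
    index = b"".join(struct.pack("<I", int(o)) for o in offsets)
    # body: map each ternary value {-1, 0, +1} to a byte {0, 1, 2}
    body = bytes((v + 1) for row in fingerprint for v in row)
    fields = (struct.pack("<I", sample_rate) + struct.pack("<I", byte_rate)
              + struct.pack("<B", poly_factor))
    # 'head length': everything from 'somusic' through the 'data' marker
    head_len = len(b"somusicfmt") + 4 + len(fields) + len(index) + len(b"data")
    return (b"somusic" + b"fmt" + struct.pack("<I", head_len)
            + fields + index + b"data" + body)
```

A reader would verify the "somusic"/"fmt" magic, seek past "data" using the head length, and reconstruct the colour image from the byte-coded ternary matrix.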
Thus the present invention proposes a fingerprint extraction and representation method mainly for music with a regular drumbeat rhythm. It not only simplifies earlier onset-detection methods that relied on probabilistic models for the decision, but also accounts for the nonlinear human auditory model: the masking effect of the human ear better preserves the character of the music content, the nonlinear Bark subband separation improves on uniform subband division, and the interleaved-block processing of continuous adjacent subbands, in both time and frequency, greatly enhances the stability of the fingerprint under noise interference. The "SOM"-format fingerprint file finally proposed will provide a better technical foundation for content-based music retrieval and related technologies.

Claims (1)

1. A method for extracting and representing the music fingerprint of music with a regular drumbeat rhythm, characterized in that it comprises a pre-processing process for the original music, a two-dimensional fingerprint-image generation process, a music-rhythm onset extraction process, and a fingerprint-file generation process; the concrete steps are as follows:
A. The pre-processing process:
Step A1: frame the sample sequence of the original music file with a sliding-window scheme whose overlap coefficient is 31/32, obtaining a series of time-series data frames;
Step A2: apply pre-emphasis to the frames obtained in A1 to filter background noise and channel white noise;
Step A3: use a filter to remove the white noise and part of the short-duration high-frequency interference introduced by the recording equipment, obtaining continuous data frames;
Step A4: apply a Hanning window to the continuous data frames to obtain the windowed time-domain signal;
Step A5: transform the windowed signal of step A4 by FFT into the frequency-domain discrete signal, i.e. the matrix {H(i, j)}, and convert it to the corresponding frequency-energy matrix {E(i, j)} in dB form via E(i, j) = 10·log10(|H(i, j)|²); here H(i, j) is the short-time frame amplitude at continuous frame i and frequency index j, E(i, j) is the frequency energy at coordinate (i, j), and i, j are natural numbers;
B. The two-dimensional fingerprint-image generation process:
Step B1: separate the frequency-energy matrix {E(i, j)} produced by step A5 into nonlinear Bark subbands using a Bark curve table;
Step B2: filter each subband against an auditory-perception threshold, keeping the energy points to which human hearing responds sensitively;
Step B3: from the nonlinear Bark values, use the frequency indices of the consecutive subbands as subband division boundaries and sum the subband energies, obtaining a continuous matrix {J(m, n)}, with m ∈ (2, 32), n ∈ (1, ∞); then perform interleaved-block processing between adjacent blocks and output the decision by the ternary method, obtaining a matrix of the three values {−1, 0, 1}, i.e. the fingerprint feature values;
Step B4: display the output fingerprint feature values as a visual image, drawing the three values {−1, 0, 1} in distinct RGB colours;
C. The music-rhythm onset extraction process:
Step C1: from the energy matrix of step A, estimate the energy of successive frames; by thresholding the zero-crossing rate and average frame energy, distinguish silence and background noise, and obtain the set {T(k)} of per-frame position offsets, k ranging from 1 to the total number of onsets found;
Step C2: restrict the frequency index range, compute frequency differences, and filter out local power minima in the onset sequence; for the filtered sequence, compute the distances between adjacent T(k), recorded as the sequence {D(k)};
Step C3: run K-Means clustering on the {D(k)} sequence to obtain its maximal subset {Dm(p)}, where p ranges from 1 to the size of the subset and Dm denotes the maximal subset of {D(k)};
Step C4: extract the time positions corresponding to {Dm(p)} as the offset data of the final valid rhythm onsets;
D. The fingerprint-file generation process:
Combine the final results of steps B and C into one file, with the result of step C as the file header and the result of step B as the data body, finally generating a visual music fingerprint data file that uniquely identifies the piece.
CN201310027662.3A 2013-01-24 2013-01-24 Method for extracting and representing music fingerprint characteristic of music with regular drumbeat rhythm Expired - Fee Related CN103077706B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310027662.3A CN103077706B (en) 2013-01-24 2013-01-24 Method for extracting and representing music fingerprint characteristic of music with regular drumbeat rhythm

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310027662.3A CN103077706B (en) 2013-01-24 2013-01-24 Method for extracting and representing music fingerprint characteristic of music with regular drumbeat rhythm

Publications (2)

Publication Number Publication Date
CN103077706A CN103077706A (en) 2013-05-01
CN103077706B true CN103077706B (en) 2015-03-25

Family

ID=48154217

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310027662.3A Expired - Fee Related CN103077706B (en) 2013-01-24 2013-01-24 Method for extracting and representing music fingerprint characteristic of music with regular drumbeat rhythm

Country Status (1)

Country Link
CN (1) CN103077706B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103354091B (en) * 2013-06-19 2015-09-30 北京百度网讯科技有限公司 Based on audio feature extraction methods and the device of frequency domain conversion
CN104599663B (en) * 2014-12-31 2018-05-04 华为技术有限公司 Accompanying song audio data processing method and device
CN108335687B (en) * 2017-12-26 2020-08-28 广州市百果园信息技术有限公司 Method for detecting beat point of bass drum of audio signal and terminal
CN108732571B (en) * 2018-03-28 2021-06-15 南京航空航天大学 Keyboard monitoring method based on combination of ultrasonic positioning and keystroke sound

Citations (3)

Publication number Priority date Publication date Assignee Title
WO2007105927A1 (en) * 2006-03-16 2007-09-20 Harmonicolor System Co., Ltd. Method and apparatus for converting image to sound
CN101151641A (en) * 2005-05-24 2008-03-26 三菱电机株式会社 Musical device with image display
CN102708859A (en) * 2012-06-20 2012-10-03 太仓博天网络科技有限公司 Real-time music voice identification system

Patent Citations (3)

Publication number Priority date Publication date Assignee Title
CN101151641A (en) * 2005-05-24 2008-03-26 三菱电机株式会社 Musical device with image display
WO2007105927A1 (en) * 2006-03-16 2007-09-20 Harmonicolor System Co., Ltd. Method and apparatus for converting image to sound
CN102708859A (en) * 2012-06-20 2012-10-03 太仓博天网络科技有限公司 Real-time music voice identification system

Non-Patent Citations (3)

Title
A speech signal enhancement algorithm for shortwave communication based on multi-subband analysis; Zhang Dongfang et al.; Journal of Information Engineering University; Feb. 2012; Vol. 13, No. 1; pp. 60-65 *
An Internet packet error concealment scheme based on classical music; Lin Xiaoyong et al.; Journal of Nanjing University of Posts and Telecommunications (Natural Science); Apr. 2012; Vol. 32, No. 2; pp. 1-6 *
A survey of music visualization research; Qu Tianxi et al.; Computer Science; Sep. 2007; Vol. 34, No. 9; pp. 16-22 *

Also Published As

Publication number Publication date
CN103077706A (en) 2013-05-01

Similar Documents

Publication Publication Date Title
Hu et al. Pitch‐based gender identification with two‐stage classification
CN103617799B (en) A kind of English statement pronunciation quality detection method being adapted to mobile device
CN108806668A (en) A kind of audio and video various dimensions mark and model optimization method
CN101023469B (en) Digital filtering method, digital filtering equipment
CN102982803A (en) Isolated word speech recognition method based on HRSF and improved DTW algorithm
CN107393554A (en) In a kind of sound scene classification merge class between standard deviation feature extracting method
CN109256150A (en) Speech emotion recognition system and method based on machine learning
CN107731233A (en) A kind of method for recognizing sound-groove based on RNN
CN110415728A (en) A kind of method and apparatus identifying emotional speech
CN103077706B (en) Method for extracting and representing music fingerprint characteristic of music with regular drumbeat rhythm
CN103985390A (en) Method for extracting phonetic feature parameters based on gammatone relevant images
CN104123934A (en) Speech composition recognition method and system
CN104978507A (en) Intelligent well logging evaluation expert system identity authentication method based on voiceprint recognition
CN108922541A (en) Multidimensional characteristic parameter method for recognizing sound-groove based on DTW and GMM model
CN113012720B (en) Depression detection method by multi-voice feature fusion under spectral subtraction noise reduction
CN101290766A (en) Syllable splitting method of Tibetan language of Anduo
CN110459241A (en) A kind of extracting method and system for phonetic feature
CN111724770A (en) Audio keyword identification method for generating confrontation network based on deep convolution
CN109714608A (en) Video data handling procedure, device, computer equipment and storage medium
CN104078051A (en) Voice extracting method and system and voice audio playing method and device
CN110299141A (en) The acoustic feature extracting method of recording replay attack detection in a kind of Application on Voiceprint Recognition
Zhang et al. An efficient perceptual hashing based on improved spectral entropy for speech authentication
CN109377981A (en) The method and device of phoneme alignment
CN109920447B (en) Recording fraud detection method based on adaptive filter amplitude phase characteristic extraction
CN101562016A (en) Totally-blind digital speech authentication method

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
EE01 Entry into force of recordation of patent licensing contract

Application publication date: 20130501

Assignee: Jiangsu Nanyou IOT Technology Park Ltd.

Assignor: Nanjing Post & Telecommunication Univ.

Contract record no.: 2016320000208

Denomination of invention: Method for extracting and representing music fingerprint characteristic of music with regular drumbeat rhythm

Granted publication date: 20150325

License type: Common License

Record date: 20161110

LICC Enforcement, change and cancellation of record of contracts on the licence for exploitation of a patent or utility model
EC01 Cancellation of recordation of patent licensing contract

Assignee: Jiangsu Nanyou IOT Technology Park Ltd.

Assignor: Nanjing Post & Telecommunication Univ.

Contract record no.: 2016320000208

Date of cancellation: 20180116

EC01 Cancellation of recordation of patent licensing contract
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20150325

Termination date: 20180124

CF01 Termination of patent right due to non-payment of annual fee