CN101667423A - Compressed domain high robust voice/music dividing method based on probability density ratio - Google Patents


Publication number
CN101667423A
CN101667423A CN200910196513A CN200910196513A CN101667423A CN 101667423 A CN101667423 A CN 101667423A CN 200910196513 A CN200910196513 A CN 200910196513A CN 200910196513 A CN200910196513 A CN 200910196513A CN 101667423 A CN101667423 A CN 101667423A
Authority
CN
China
Prior art keywords
music
voice
data
probability density
noise
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN200910196513A
Other languages
Chinese (zh)
Inventor
余小清
李昌莲
许雪琼
万旺根
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Shanghai for Science and Technology
Original Assignee
University of Shanghai for Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Shanghai for Science and Technology
Priority to CN200910196513A
Publication of CN101667423A
Legal status: Pending

Landscapes

  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The invention relates to a highly robust compressed-domain voice/music segmentation method based on the probability density ratio. The method comprises: extracting new feature parameters based on the probability density ratio from compressed-domain mixed voice/music data with a low signal-to-noise ratio; detecting the change points between voice and music in the compressed domain on the basis of these feature parameters; and segmenting the data at those change points to obtain separate voice segments and music segments. Experimental results show that, compared with the traditional segmentation method, the proposed method significantly improves accuracy, noise robustness and overall performance.

Description

Highly robust compressed-domain voice/music segmentation method based on the probability density ratio
Technical field
The present invention relates to a highly robust compressed-domain voice/music segmentation method based on the probability density ratio, and in particular to a voice/music change point detection method, based on the probability density ratio, that operates under low signal-to-noise-ratio conditions with different kinds of natural environmental noise.
Background art
Techniques such as classification-based retrieval and scene classification of compressed-domain voice/music use signal processing and statistical methods to search large compressed voice/music databases for specific voice or music content. Voice/music segmentation is one of the key problems in realizing such retrieval, particularly when the data must be processed under low signal-to-noise-ratio conditions with natural environmental noise.
Most of the voice/music segmentation methods in common use operate in the uncompressed domain. Few address segmentation directly in the compressed domain, and fewer still consider low signal-to-noise-ratio conditions. However, most compressed-domain voice/music material is not recorded in a standard recording studio, and some of it comes from noisy real-world environments, so research on compressed-domain voice/music segmentation under low signal-to-noise-ratio conditions with natural environmental noise is particularly important. Compressed-domain voice/music data come from the binary bitstream produced by encoding the original voice/music, but this bitstream alone does not directly reveal the key properties of the original signal. The first issue in segmenting compressed-domain voice/music data is therefore the data source for feature extraction: how to process the compressed data so that effective feature parameters can be extracted at minimal computational cost. Theoretical analysis and experimental results show that partially decoding the compressed data yields data whose spectral properties are similar to those of the original voice/music. Compressed-domain features extracted from these data reflect the marked differences between voice and music and can be used for subsequent segmentation and classification. The method of the present invention adopts exactly this idea. From compressed-domain voice/music data encoded with MPEG-1 Audio Layer III, it extracts two new feature parameters, the compressed probability density ratio (CPR) and the compressed probability density ratio crossing rate (CPRCR). It then detects the change points between voice and music in the compressed-domain data and finally obtains the segmentation result from those change points.
Summary of the invention
The object of the present invention is to address the shortcomings of the prior art by providing a highly robust compressed-domain voice/music segmentation method based on the probability density ratio. The method solves the problem of detecting voice/music change points in the compressed domain under low signal-to-noise-ratio conditions with different kinds of natural environmental noise, and its results can be used further for compressed-domain voice/music recognition, voice/music classification-based retrieval, voice/music scene classification and similar tasks. To achieve this object, the concept of the invention is as follows:
First, the method has good noise robustness: it can segment compressed-domain voice/music data under low signal-to-noise-ratio conditions with different kinds of natural environmental noise, at signal-to-noise ratios as low as 5 dB. This provides a sound basis for further processing of compressed-domain voice/music data, such as classification, retrieval, recognition and scene detection.
The method extracts voice/music feature parameters directly from the compressed-domain data, divides the data into voice and music segments by detecting voice/music change points, and makes the segmentation result available for compressed-domain voice/music classification, retrieval and similar applications.
The technical scheme adopted is: first extract feature parameters from the compressed-domain voice/music data under low signal-to-noise-ratio conditions with different kinds of natural environmental noise, then perform voice/music change point detection on these data, and finally obtain the segmentation result from the change points.
According to the above inventive concept, the present invention adopts the following technical scheme:
A highly robust compressed-domain voice/music segmentation method based on the probability density ratio, characterized in that: first, data that reflect the frequency-domain characteristics of the original voice/music are obtained from an MP3 (MPEG-1 Layer 3) file encoded with MPEG-1 Audio Layer III; second, the new compressed probability density ratio (CPR) feature parameter is extracted from these data; third, the compressed probability density ratio crossing rate (CPRCR) feature parameter, which reflects the different characteristics of voice and music, is derived from the CPR; finally, the change points between voice and music are detected in the compressed-domain voice/music data, and the segmented voice and music segments are obtained from these change points.
The method comprises the following five steps:
1) Preprocessing of the compressed-domain voice/music data: acquiring the compressed-domain mixed voice/music data, reading the frame header and side information, reading the main data, and Huffman decoding and inverse quantization;
2) Generating the modified discrete cosine transform (MDCT) matrix: finding the MDCT coefficients in each subband, arranging the coefficients within each subband, and forming the matrix;
3) Extracting feature parameters from the compressed-domain voice/music data: computing the compressed probability density ratio and the compressed probability density ratio crossing rate;
4) Detecting the change points between voice and music: detecting the voice/music segmentation points from the feature parameters extracted in step 3);
5) Detecting the change points between voice and music under low signal-to-noise-ratio conditions with different kinds of natural environmental noise: outputting the segmentation points of the compressed-domain voice/music data under these conditions and obtaining the segmented voice and music segments.
Compared with the prior art, the present invention has the following evident substantive features and notable advantages. It extracts, directly from the compressed-domain voice/music data, feature parameters that effectively reflect the marked differences between voice and music; compared with methods that fully decompress the data before extracting features, it is both simpler and computationally cheaper. The compressed probability density ratio crossing rate locates voice/music segmentation points effectively, and the method also segments well under different kinds of environmental noise, such as automobile noise, train noise and crowd babble. Experimental results show that, compared with the conventional segmentation method, the method of the invention significantly improves accuracy, noise robustness and overall performance.
Description of drawings
Fig. 1 is a flow chart of the highly robust compressed-domain voice/music segmentation method based on the probability density ratio according to the present invention.
Embodiment
A preferred embodiment of the method is described below with reference to the accompanying drawing. The method is divided into five steps:
Step 1: Preprocessing of the compressed-domain voice/music data
Processing of the compressed-domain voice/music data consists of reading the frame header, reading the side information, reading the main data, and Huffman decoding and inverse quantization.
1) Acquiring the compressed-domain mixed voice/music data
A) Obtain a segment of compressed-domain white noise from the audio noise library;
B) Obtain clean compressed-domain voice and music samples from the voice/music library;
C) Obtain compressed-domain mixed voice/music data with a signal-to-noise ratio of 5 dB;
2) Reading the frame header
A) Read the synchronization information in the frame;
B) Use the synchronization information to synchronize the decoder with the data stream;
C) Determine the start position of the frame data and obtain its frame header, head;
3) Reading the side information
A) Determine the start position of the side information of the frame, i.e. the end of its frame header;
B) Obtain the side information data, Side, of the frame;
4) Reading the main data
A) Calculate the length of the main data, Maindata, from the side information;
B) Read the main data of the frame, whose length is Maindata;
C) Obtain the scale factors, Scale, from the main data;
5) Huffman decoding and inverse quantization
A) Determine the start position of the Huffman data within the main data from the side information Side;
B) Decode the Huffman data to obtain the 32 × 18 Huffman-decoded array is;
C) Inverse-quantize the data in the array is.
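The frame synchronization and header reading of item 2) can be sketched as follows. This is only a minimal illustration in Python, not the inventors' decoder: the function names `find_frame_sync` and `parse_header_ids` are hypothetical, and only the sync word, version, layer and protection fields of the MPEG audio frame header are inspected. A real decoder would also validate the bitrate and sampling-rate fields before trusting a candidate header.

```python
def find_frame_sync(data: bytes, start: int = 0) -> int:
    """Return the offset of the first candidate frame header at or after start, or -1.

    A candidate is 0xFF followed by a byte whose top three bits are set,
    i.e. the 11-bit MPEG audio sync word.
    """
    for pos in range(start, len(data) - 1):
        if data[pos] == 0xFF and (data[pos + 1] & 0xE0) == 0xE0:
            return pos
    return -1


def parse_header_ids(data: bytes, pos: int) -> dict:
    """Decode the version, layer and protection bits of the header starting at pos."""
    second = data[pos + 1]
    version_bits = (second >> 3) & 0x03   # 0b11 means MPEG-1
    layer_bits = (second >> 1) & 0x03     # 0b01 means Layer III
    return {
        "is_mpeg1": version_bits == 0b11,
        "is_layer3": layer_bits == 0b01,
        "has_crc": (second & 0x01) == 0,  # protection bit 0 means a CRC follows
    }
```

In this sketch the side information would start right after the 4-byte header (or after the 2-byte CRC when `has_crc` is true), which corresponds to item 3) A) above.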
Step 2: Generating the modified discrete cosine transform (MDCT) matrix
The data of each granule consist of 32 subbands, each containing 18 coefficients. Arranged from low to high frequency, each granule forms a 32 × 18 matrix. The procedure is as follows:
1) Finding the coefficients of each subband
A) Obtain the MDCT coefficients of each of the 32 subbands from the Huffman-decoded array is;
B) Obtain the 18 coefficients of each subband from its MDCT coefficients;
C) Rearrange the coefficients within each subband in order of frequency to obtain a new subband coefficient array S;
2) Forming the matrix
A) Combine the row vectors of the subband coefficient array S in order of subband number to obtain the 32 × 18 array M;
B) In the same way, obtain the MDCT coefficient matrices M_1 and M_2 of the two granules in the frame.
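The matrix-forming step can be illustrated with a short Python sketch. It assumes, as a simplification not stated in the source, that the 576 inverse-quantized coefficients of one granule are already available as a flat sequence ordered from low to high frequency; the function name `granule_to_matrix` is hypothetical.

```python
SUBBANDS = 32
COEFFS_PER_SUBBAND = 18


def granule_to_matrix(coeffs):
    """Arrange 576 frequency-ordered coefficients into a 32 x 18 matrix.

    Row s holds the 18 coefficients of subband s, so rows run from the
    lowest-frequency subband to the highest.
    """
    expected = SUBBANDS * COEFFS_PER_SUBBAND
    if len(coeffs) != expected:
        raise ValueError(f"expected {expected} coefficients, got {len(coeffs)}")
    return [
        list(coeffs[s * COEFFS_PER_SUBBAND:(s + 1) * COEFFS_PER_SUBBAND])
        for s in range(SUBBANDS)
    ]
```

Calling the function once per granule would give the two matrices that the text names M_1 and M_2 for each frame.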
Step 3: Extracting feature parameters from the compressed-domain voice/music data
The compressed-domain features extracted are the probability density ratio (CPR) and the probability density ratio crossing rate (CPRCR).
1) Computing the compressed probability density ratio (CPR)
A) Based on the Bayesian criterion in statistics, set up two hypotheses H_0 and H_1:
H_0: Z = N (pure noise source)
H_1: Z = N + S (voice/music plus noise)
where H_1 is the mixed input of compressed voice, music and noise, and H_0 is the pure noise model.
B) Construct the noise model.
H_0 is assumed to be a compressed-domain white noise model. The MDCT matrix of the white noise is formed by the method of claims 3 and 4; the white noise constructed here must correspond to a high signal-to-noise-ratio environment relative to the compressed-domain voice/music data.
C) Compute the probability density ratio under the Bayesian criterion model:
$$\Lambda = \prod_{K=1}^{L} \frac{\lambda_N(K)}{\lambda_N(K) + \lambda_Z(K)} \exp\left\{ \frac{\lambda_Z(K)\,|Z_K|^2}{\left(\lambda_N(K) + \lambda_Z(K)\right) \cdot \lambda_N(K)} \right\}$$
where L is the number of MDCT coefficients in each frame of compressed audio and K is the coefficient index; Z_K is the K-th MDCT coefficient of each frame of mixed compressed voice/music data; λ_Z(K) and λ_N(K) are the variances of the audio and of the noise, respectively. λ_N(K) can be estimated from the noise model, and λ_Z(K) can be obtained from the input signal model by the following formula:
$$\frac{\lambda_Z(K)}{\lambda_N(K) + \lambda_Z(K)} = \epsilon_k^{n} = \alpha \cdot \frac{|Z_K^{\,n-1}|^2}{\lambda_N(k, n-1)} + (1 - \alpha)\, P[\lambda(k) - 1]$$
where
$$P(X) = \begin{cases} X, & X \ge 0 \\ 0, & \text{otherwise} \end{cases}$$
and α is a weighting coefficient; the present invention uses α = 0.98.
D) Compute the probability density ratio CPR from the Bayesian criterion model:
$$CPR_i = \log \Lambda = \frac{1}{L} \sum_{K=1}^{L} \log \Lambda_K$$
where Λ_K is the K-th factor of the product Λ.
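As a hedged illustration of the CPR computation, the Python sketch below evaluates the logarithm of each factor of Λ and averages it over one frame. It assumes the variances λ_N(K) and λ_Z(K) are already available as lists (the recursive estimate of λ_Z is not reproduced here) and that every λ_N(K) is positive; the function names `log_likelihood_term` and `frame_cpr` are hypothetical.

```python
import math


def log_likelihood_term(z, lam_n, lam_z):
    """Log of one factor of the product:
    log(lam_n / (lam_n + lam_z)) + lam_z * |z|^2 / ((lam_n + lam_z) * lam_n)."""
    total = lam_n + lam_z
    return math.log(lam_n / total) + lam_z * abs(z) ** 2 / (total * lam_n)


def frame_cpr(coeffs, noise_var, signal_var):
    """Average the per-coefficient log ratios over one frame (the CPR formula)."""
    terms = [
        log_likelihood_term(z, ln, lz)
        for z, ln, lz in zip(coeffs, noise_var, signal_var)
    ]
    return sum(terms) / len(terms)
```

Working in the log domain avoids multiplying several hundred small factors together, which is presumably why the CPR is defined as an average of logarithms.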
2) Computing the compressed probability density ratio crossing rate (CPRCR)
A) Compute the thresholds.
Thresholds on the compressed probability density ratio are computed for every half second. To capture the fine detail that distinguishes voice from music, two thresholds T_1 and T_2 are used, where T_1 is the mean CPR and T_2 is three times the mean CPR, that is:
$$T_1 = \frac{1}{N} \sum_{i=0}^{N-1} CPR[i]$$
$$T_2 = \left( \frac{1}{N} \sum_{i=0}^{N-1} CPR[i] \right) \times 3$$
where CPR[i] is the probability density ratio of frame i and N is the number of frames in half a second.
B) Compute the crossing rates CPRcr1 and CPRcr2:
$$CPRcr1 = \frac{1}{2} \sum_{m=0}^{N-1} \left| \operatorname{sgn}[CPR_n(m) - T_1] - \operatorname{sgn}[CPR_n(m-1) - T_1] \right|$$
$$CPRcr2 = \frac{1}{2} \sum_{m=0}^{N-1} \left| \operatorname{sgn}[CPR_n(m) - T_2] - \operatorname{sgn}[CPR_n(m-1) - T_2] \right|$$
The crossing rate of the segment is CPRcr = CPRcr1 + CPRcr2, where sgn is the sign function and CPR_n(m) is the m-th CPR value in the n-th half second.
C) Compute the final compressed probability density ratio crossing rate CPRCR for every half second:
$$CPRCR = \frac{CPRcr}{CPRcr_{max}}$$
where
$$CPRcr_{max} = \max_{i \in s} \left( CPRcr1(i) + CPRcr2(i) \right)$$
This step normalizes CPRcr.
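The threshold and crossing-rate computation can be sketched in Python as follows. Several points are assumptions made for the illustration rather than details given in the source: both sign terms of each crossing rate use the same threshold, sgn(0) is taken as +1, the sum starts at the second value of each window because the first has no predecessor inside it, and the normalization set s is taken to be all windows passed in. The function names are hypothetical.

```python
def sgn(x):
    """Sign function with sgn(0) = 1, the usual zero-crossing convention (assumed)."""
    return 1 if x >= 0 else -1


def crossing_count(values, threshold):
    """Half the summed sign changes of (value - threshold) between neighbours."""
    return 0.5 * sum(
        abs(sgn(values[m] - threshold) - sgn(values[m - 1] - threshold))
        for m in range(1, len(values))
    )


def window_cprcr_raw(cpr_window):
    """CPRcr = CPRcr1 + CPRcr2 for one half-second window of CPR values."""
    t1 = sum(cpr_window) / len(cpr_window)  # T1: mean CPR of the window
    t2 = 3.0 * t1                           # T2: three times the mean
    return crossing_count(cpr_window, t1) + crossing_count(cpr_window, t2)


def normalized_cprcr(windows):
    """Divide each window's CPRcr by the largest CPRcr over all windows."""
    raw = [window_cprcr_raw(w) for w in windows]
    peak = max(raw)
    return [r / peak if peak > 0 else 0.0 for r in raw]
```

Each entry of the returned list then lies between 0 and 1, which is what allows a fixed threshold such as 0.5 to be applied in the next step.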
Step 4: Detecting the change points in the compressed-domain voice/music data
To keep the segmentation continuous and to prevent misjudgment, the invention requires every segmented voice or music segment to be longer than one second: M (M = 2) consecutive CPRCR values must lie above or below the threshold r before a point is accepted as a valid CPRCR segmentation point.
1) As described in Step 3, compute the compressed probability density ratio of each frame of compressed voice/music data, then derive the CPRCR for every half second from it;
2) Correct the CPRCR;
A point whose value lies above or below the threshold r (r = 0.5) but which is not part of a run of M consecutive values on the same side of r is called a singular point. All singular points are located and processed by replacing the current value with the average of the values before and after it:
$$CPRCR[i] = \frac{1}{2}\left( CPRCR[i-1] + CPRCR[i+1] \right)$$
Locating all singular points before segmentation ensures that the segmentation is valid and reduces the probability of misjudgment.
3) Threshold comparison;
The CPRCR values are compared against the threshold ξ = 0.5.
4) Segmentation point detection;
Given the probability density ratio characteristics of voice and music, a voice sequence contains far more small probability density ratio values than a music sequence, so the CPRCR of voice is much smaller than that of music. Segments with CPRCR below ξ are therefore detected as voice, and segments with CPRCR above ξ are detected as music;
5) Output the segmentation points of the compressed voice/music data.
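The correction, thresholding and segmentation-point steps above can be sketched in Python as below. This is an illustration under stated assumptions, not the patented implementation: a singular point is interpreted, for M = 2, as a value on the opposite side of the threshold from both of its neighbours; values exactly equal to the threshold are treated as music; and a single smoothing pass is made. The function names are hypothetical.

```python
def smooth_singular_points(cprcr, threshold=0.5):
    """Replace isolated threshold crossings with the mean of their neighbours."""
    smoothed = list(cprcr)
    for i in range(1, len(cprcr) - 1):
        above = cprcr[i] >= threshold
        prev_above = cprcr[i - 1] >= threshold
        next_above = cprcr[i + 1] >= threshold
        # With M = 2, a singular point is a run of length one: it sits on the
        # opposite side of the threshold from both of its neighbours.
        if above != prev_above and above != next_above:
            smoothed[i] = 0.5 * (cprcr[i - 1] + cprcr[i + 1])
    return smoothed


def detect_change_points(cprcr, threshold=0.5, window_seconds=0.5):
    """Label each half-second window and return (labels, change times in seconds)."""
    smoothed = smooth_singular_points(cprcr, threshold)
    labels = ["music" if v >= threshold else "voice" for v in smoothed]
    changes = [
        i * window_seconds
        for i in range(1, len(labels))
        if labels[i] != labels[i - 1]
    ]
    return labels, changes
```

For example, an isolated high value inside a run of low values is averaged away before labelling, so it does not produce two spurious change points half a second apart.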
Step 5: Detecting compressed-domain voice/music change points under low signal-to-noise-ratio conditions with different kinds of natural environmental noise
1) Acquire compressed-domain mixed voice/music data under different kinds of natural environmental noise;
A) Obtain train and automobile sounds from the audio library as natural environmental noise;
B) Obtain clean compressed-domain voice and music samples from the voice/music library;
C) Using the natural environmental noise, obtain compressed-domain mixed voice/music data with a signal-to-noise ratio of 5 dB;
2) Repeat from item 2) of Step 1 through the end of Step 4, and output the segmentation points of the compressed-domain voice/music data under the corresponding natural environmental noise.
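The source does not say how the 5 dB mixtures are produced. One plausible way, sketched below in Python purely as an assumption, is to mix the clean signal and the noise as PCM samples before MP3 encoding, scaling the noise so that the power ratio matches the target signal-to-noise ratio. The function names `mean_power` and `mix_at_snr` are hypothetical.

```python
import math


def mean_power(samples):
    """Average squared amplitude of a sample sequence."""
    return sum(s * s for s in samples) / len(samples)


def mix_at_snr(signal, noise, snr_db=5.0):
    """Scale noise so that 10*log10(P_signal / P_noise) equals snr_db, then add it."""
    if len(signal) != len(noise):
        raise ValueError("signal and noise must have the same length")
    target_noise_power = mean_power(signal) / (10.0 ** (snr_db / 10.0))
    gain = math.sqrt(target_noise_power / mean_power(noise))
    return [s + gain * n for s, n in zip(signal, noise)]
```

The mixed samples would then be encoded to MP3 to obtain the compressed-domain test data; the encoding step is outside the scope of this sketch.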
Experimental results
The experiments use a speech library of news broadcasts from a provincial television station, a music library drawn from "Ban Derui" (Bandari) albums, and a natural environmental noise library (automobile noise, train noise, crowd babble and so on) taken from the sounddogs website. The compressed-domain voice/music data are in MP3 format with a sampling frequency of 44.1 kHz, and the total duration is about 270 minutes (92 mixed compressed-domain voice/music segments of 3 minutes each).
The traditional BIC segmentation detection method and the method of the present invention were each tested on the above compressed-domain voice/music data, and accuracy was evaluated as the correct-decision rate of the voice/music segmentation points. This rate is defined as the percentage of all points to be detected that are judged correctly, computed as:
$$AccuracyRate(\%) = \left( 1 - \frac{N_{S \to M}}{N} - \frac{N_{M \to S}}{N} \right) \times 100\%$$
where N_{S→M} is the number of points that are actually voice but are misjudged as music, N_{M→S} is the number of points that are actually music but are misjudged as voice, and N is the total number of CPRCR points in the sample being processed.
The correct-decision rate expresses the proportion of correctly classified points among all points to be detected and thus characterizes the correctness of the detection result.
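The accuracy formula translates directly into a small Python helper. The function name `accuracy_rate` and its parameter names are hypothetical, and the counts used in the usage line are made-up illustrative numbers, not results from the patent.

```python
def accuracy_rate(n_voice_as_music, n_music_as_voice, n_total):
    """Percentage of CPRCR points that were not misclassified."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    return (1.0 - n_voice_as_music / n_total - n_music_as_voice / n_total) * 100.0


# Illustrative counts only: 10 voice points and 15 music points misjudged out of 100.
example = accuracy_rate(10, 15, 100)
```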
The experimental statistics show the following. With the traditional BIC detection method, the segmentation point detection accuracy for compressed-domain voice/music data at a signal-to-noise ratio of 5 dB is only 30.56% under white noise, and lower still under natural noise: only 25.27% under train noise and only 22.15% under automobile noise. This falls far short of normal segmentation requirements, so the method cannot be regarded as segmenting compressed-domain voice/music data effectively. With the method of the present invention, the segmentation point detection accuracy at a signal-to-noise ratio of 5 dB reaches 82.25% under white noise, and good segmentation is also achieved under natural noise: 81.09% under train noise and 78.21% under automobile noise.
It can be seen that the method of the present invention detects voice/music segmentation points effectively in compressed-domain voice/music data under low signal-to-noise-ratio conditions with different kinds of natural environmental noise, and thus solves the compressed-domain voice/music change point detection problem under these conditions. The invention can be used further in applications such as compressed-domain voice/music recognition, voice/music classification-based retrieval and audio scene analysis.

Claims (7)

1. A highly robust compressed-domain voice/music segmentation method based on the probability density ratio, characterized in that: first, new feature parameters based on the probability density ratio, which reflect the different characteristics of voice and music, are extracted from compressed-domain mixed voice/music data with a low signal-to-noise ratio, namely the compressed probability density ratio and the compressed probability density ratio crossing rate; then change point detection is performed on the compressed-domain voice and music on the basis of these feature parameters; finally the data are segmented at the change points to obtain the voice segments and music segments.
2. The highly robust compressed-domain voice/music segmentation method based on the probability density ratio according to claim 1, characterized in that the specific steps are as follows:
1) Preprocessing of the compressed voice/music data: acquiring the compressed-domain mixed voice/music data, reading the frame header and side information, reading the main data, and Huffman decoding and inverse quantization;
2) Generating the modified discrete cosine transform (MDCT) matrix: finding the MDCT coefficients in each subband, arranging the coefficients within each subband, and forming the matrix;
3) Extracting feature parameters from the compressed-domain voice/music data: computing the compressed probability density ratio and the compressed probability density ratio crossing rate;
4) Detecting the change points between voice and music: detecting the voice/music segmentation points from the feature parameters extracted in step 3);
5) Detecting the change points between voice and music under low signal-to-noise-ratio conditions with different kinds of natural environmental noise: outputting the segmentation points of the compressed-domain voice/music data under these conditions and obtaining the segmented voice and music segments.
3. The highly robust compressed-domain voice/music segmentation method based on the probability density ratio according to claim 2, characterized in that the preprocessing of the compressed voice/music data in step 1) comprises:
(1) Acquiring the compressed-domain mixed voice/music data
A) Obtain a segment of compressed-domain white noise from the audio noise library;
B) Obtain clean compressed-domain voice and music samples from the voice/music library;
C) Obtain compressed-domain mixed voice/music data with a signal-to-noise ratio of 5 dB;
(2) Reading the frame header
A) Read the synchronization information in the frame;
B) Use the synchronization information to synchronize the decoder with the data stream;
C) Determine the start position of the frame data and obtain its frame header, head;
(3) Reading the side information
A) Determine the start position of the side information of the frame, i.e. the end of its frame header;
B) Obtain the side information data, Side, of the frame;
(4) Reading the main data
A) Calculate the length of the main data, Maindata, from the side information;
B) Read the main data of the frame, whose length is Maindata;
C) Obtain the scale factors, Scale, from the main data;
(5) Huffman decoding and inverse quantization
A) Determine the start position of the Huffman data within the main data from the side information Side;
B) Decode the Huffman data to obtain the 32 × 18 Huffman-decoded array is;
C) Inverse-quantize the data in the array is.
4. The highly robust compressed-domain voice/music segmentation method based on the probability density ratio according to claim 2, characterized in that the generation of the modified discrete cosine transform (MDCT) matrix in step 2) comprises:
(1) Finding the coefficients of each subband
A) Obtain the MDCT coefficients of each of the 32 subbands from the Huffman-decoded array is;
B) Obtain the 18 coefficients of each subband from its MDCT coefficients;
C) Rearrange the coefficients within each subband in order of frequency to obtain a new subband coefficient array S;
(2) Forming the matrix
A) Combine the row vectors of the subband coefficient array S in order of subband number to obtain the 32 × 18 array M;
B) In the same way, obtain the MDCT coefficient matrices M_1 and M_2 of the two granules in the frame.
5. The highly robust compressed-domain voice/music segmentation method based on the probability density ratio according to claim 2, characterized in that the extraction of feature parameters from the compressed-domain voice/music data in step 3) comprises:
(1) Computing the compressed probability density ratio (CPR)
A) Based on the Bayesian criterion in statistics, set up two hypotheses H_0 and H_1:
H_0: Z = N (pure noise source)
H_1: Z = N + S (voice/music plus noise)
where H_1 is the MP3 input of voice, music and noise, and H_0 is the pure noise model;
B) Construct the noise model: H_0 is assumed to be a compressed-domain white noise model, and the MDCT matrix of the white noise is formed by the method of sub-step (2) of step 2); the white noise constructed here must correspond to a high signal-to-noise-ratio environment relative to the compressed-domain voice/music data;
C) Compute the probability density ratio under the Bayesian criterion model:
$$\Lambda = \prod_{K=1}^{L} \frac{\lambda_N(K)}{\lambda_N(K) + \lambda_Z(K)} \exp\left\{ \frac{\lambda_Z(K)\,|Z_K|^2}{\left(\lambda_N(K) + \lambda_Z(K)\right) \cdot \lambda_N(K)} \right\}$$
where L is the number of MDCT coefficients in each frame of compressed audio and K is the coefficient index; Z_K is the K-th MDCT coefficient of each frame of mixed compressed voice/music data; λ_Z(K) and λ_N(K) are the variances of the audio and of the noise, respectively; λ_N(K) can be estimated from the noise model, and λ_Z(K) can be obtained from the input signal model by the following formula:
$$\frac{\lambda_Z(K)}{\lambda_N(K) + \lambda_Z(K)} = \epsilon_k^{n} = \alpha \cdot \frac{|Z_K^{\,n-1}|^2}{\lambda_N(k, n-1)} + (1 - \alpha)\, P[\lambda(k) - 1]$$
where
$$P(X) = \begin{cases} X, & X \ge 0 \\ 0, & \text{otherwise} \end{cases}$$
and α is a weighting coefficient, with α = 0.98.
D) Compute the compressed probability density ratio CPR from the Bayesian criterion model:
$$CPR_i = \log \Lambda = \frac{1}{L} \sum_{K=1}^{L} \log \Lambda_K$$
(2) Computing the compressed probability density ratio crossing rate (CPRCR)
A) Compute the thresholds
Thresholds on the compressed probability density ratio are computed for every half second. To capture the fine detail that distinguishes voice from music, two thresholds T_1 and T_2 are used, where T_1 is the mean CPR and T_2 is three times the mean CPR, that is:
$$T_1 = \frac{1}{N} \sum_{i=0}^{N-1} CPR[i]$$
$$T_2 = \left( \frac{1}{N} \sum_{i=0}^{N-1} CPR[i] \right) \times 3$$
where CPR[i] is the probability density ratio of frame i and N is the number of frames in half a second;
B) Compute the crossing rates CPRcr1 and CPRcr2
$$CPRcr1 = \frac{1}{2} \sum_{m=0}^{N-1} \left| \operatorname{sgn}[CPR_n(m) - T_1] - \operatorname{sgn}[CPR_n(m-1) - T_1] \right|$$
$$CPRcr2 = \frac{1}{2} \sum_{m=0}^{N-1} \left| \operatorname{sgn}[CPR_n(m) - T_2] - \operatorname{sgn}[CPR_n(m-1) - T_2] \right|$$
The crossing rate of the segment is CPRcr = CPRcr1 + CPRcr2, where sgn is the sign function and CPR_n(m) is the m-th CPR value in the n-th half second;
C) Compute the final probability density ratio crossing rate CPRCR for every half second
$$CPRCR = \frac{CPRcr}{CPRcr_{max}}$$
where
$$CPRcr_{max} = \max_{i \in s} \left( CPRcr1(i) + CPRcr2(i) \right)$$
This step normalizes CPRcr.
6. The compressed domain high robust voice/music dividing method based on probability density ratio according to claim 2, characterized in that the specific steps of the voice/music change point detection of said step 4) are:
1. Compute the compressed-domain probability density ratio parameter of each frame by said step 3), then derive from this parameter the per-half-second compressed-domain probability density ratio zero-crossing rate (CPRCR) feature parameter;
2. Correct CPRCR
A parameter point that lies above or below the threshold r (r = 0.5) without belonging to a run of N consecutive parameter values that are all greater than, or all less than, r is called a singular point. All singular points are found and processed, that is, each one is replaced using the data before and after it:
$$\mathrm{CPRCR}[i] = \frac{1}{2} \left( \mathrm{CPRCR}[i-1] + \mathrm{CPRCR}[i+1] \right);$$
Finding all singular points before segmentation ensures the validity of the segmentation and reduces the probability of misclassification;
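The correction step might look like the sketch below. It handles only the simplest case, an isolated point whose two neighbours both lie on the other side of r, which is an assumption: the required run length N is not given a value here. The function name `fix_singular_points` is hypothetical.

```python
def fix_singular_points(cprcr, r=0.5):
    """Replace isolated outliers by the mean of their two neighbours."""
    corrected = list(cprcr)
    for i in range(1, len(cprcr) - 1):
        above = cprcr[i] > r
        left_above = cprcr[i - 1] > r
        right_above = cprcr[i + 1] > r
        # Singular (in this simplified sense): both neighbours sit on the opposite side of r.
        if left_above != above and right_above != above:
            corrected[i] = 0.5 * (cprcr[i - 1] + cprcr[i + 1])
    return corrected
```

Detection reads from the original list and writes to a copy, so one correction does not influence whether the next point counts as singular.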
3. Threshold comparison
CPRCR is compared against the threshold ζ = 0.5.
4. Cut point detection
Given the probability density ratio characteristics of voice and music, a voice sequence contains far more small probability density ratio values than a music sequence does, so the CPRCR of voice is much smaller than the CPRCR of music. A segment whose CPRCR is smaller than ζ is therefore detected as a voice change point, and a segment whose CPRCR is larger than ζ as a music change point;
5. Output the cut points of the compressed voice/music data.
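The threshold comparison and cut point output of this claim can be sketched as follows. The names `label_segments` and `cut_points` are hypothetical, and treating a value exactly equal to the threshold as music is an assumption, since only the strictly smaller and strictly larger cases are described.

```python
def label_segments(cprcr, threshold=0.5):
    """Label each half-second segment: below the threshold is voice, otherwise music."""
    return ["voice" if value < threshold else "music" for value in cprcr]


def cut_points(labels, segment_seconds=0.5):
    """Return (time, previous_label, new_label) wherever the label changes."""
    points = []
    for i in range(1, len(labels)):
        if labels[i] != labels[i - 1]:
            points.append((i * segment_seconds, labels[i - 1], labels[i]))
    return points
```

For example, the CPRCR sequence 0.1, 0.2, 0.8, 0.9, 0.3 yields one cut from voice to music at 1.0 s and one from music to voice at 2.0 s.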
7. The compressed domain high robust voice/music dividing method based on probability density ratio according to claim 2, characterized in that the specific steps of said step 5), voice/music change point detection under low signal-to-noise ratio conditions with different physical environment noises, are:
1. Obtain compressed-domain voice/music mixed data under different physical environment noises
A) Obtain train sounds and automobile sounds from an audio library as the physical environment noise;
B) Obtain clean compressed-domain voice and music samples from a voice/music library;
C) Using the physical environment noise, obtain compressed-domain voice/music mixed data with a signal-to-noise ratio (SNR) of 5 dB;
2. Process this compressed-domain voice/music mixed data according to the steps of claims 3-6 and output the cut points of the compressed-domain voice/music data under physical environment noise at low SNR, thereby obtaining the segmented voice and music segments.
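One common way to obtain a mixture at a target SNR is to scale the noise before adding it to the clean signal. The sketch below assumes the mixing is done on decoded sample values prior to compression, which this claim does not specify; the names `mean_power` and `mix_at_snr` are hypothetical.

```python
import math


def mean_power(samples):
    """Average squared amplitude of a sample sequence."""
    return sum(s * s for s in samples) / len(samples)


def mix_at_snr(signal, noise, snr_db=5.0):
    """Scale the noise so that signal power / noise power matches snr_db, then add."""
    target_noise_power = mean_power(signal) / (10 ** (snr_db / 10))
    gain = math.sqrt(target_noise_power / mean_power(noise))
    return [s + gain * n for s, n in zip(signal, noise)]
```

The gain follows from SNR(dB) = 10·log10(P_signal / P_noise), solved for the noise power that gives 5 dB.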
CN200910196513A 2009-09-25 2009-09-25 Compressed domain high robust voice/music dividing method based on probability density ratio Pending CN101667423A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN200910196513A CN101667423A (en) 2009-09-25 2009-09-25 Compressed domain high robust voice/music dividing method based on probability density ratio

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN200910196513A CN101667423A (en) 2009-09-25 2009-09-25 Compressed domain high robust voice/music dividing method based on probability density ratio

Publications (1)

Publication Number Publication Date
CN101667423A true CN101667423A (en) 2010-03-10

Family

ID=41804014

Family Applications (1)

Application Number Title Priority Date Filing Date
CN200910196513A Pending CN101667423A (en) 2009-09-25 2009-09-25 Compressed domain high robust voice/music dividing method based on probability density ratio

Country Status (1)

Country Link
CN (1) CN101667423A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103325403A (en) * 2013-06-20 2013-09-25 富泰华工业(深圳)有限公司 Electronic device and video playing method thereof
CN108989882A (en) * 2018-08-03 2018-12-11 百度在线网络技术(北京)有限公司 Method and apparatus for exporting the snatch of music in video
CN112991476A (en) * 2021-02-18 2021-06-18 中国科学院自动化研究所 Scene classification method, system and equipment based on depth compression domain features

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103325403A (en) * 2013-06-20 2013-09-25 富泰华工业(深圳)有限公司 Electronic device and video playing method thereof
TWI511536B (en) * 2013-06-20 2015-12-01 Hon Hai Prec Ind Co Ltd A method for playing video and electronic device using the same
CN103325403B (en) * 2013-06-20 2016-04-13 富泰华工业(深圳)有限公司 Electronic installation and video broadcasting method thereof
CN108989882A (en) * 2018-08-03 2018-12-11 百度在线网络技术(北京)有限公司 Method and apparatus for exporting the snatch of music in video
CN112991476A (en) * 2021-02-18 2021-06-18 中国科学院自动化研究所 Scene classification method, system and equipment based on depth compression domain features

Similar Documents

Publication Publication Date Title
US7460994B2 (en) Method and apparatus for producing a fingerprint, and method and apparatus for identifying an audio signal
Chou et al. Robust singing detection in speech/music discriminator design
US7081581B2 (en) Method and device for characterizing a signal and method and device for producing an indexed signal
JP5440051B2 (en) Content identification method, content identification system, content search device, and content use device
US9208790B2 (en) Extraction and matching of characteristic fingerprints from audio signals
CN102799605B (en) A kind of advertisement detecting method and system
US9093120B2 (en) Audio fingerprint extraction by scaling in time and resampling
Seo et al. Audio fingerprinting based on normalized spectral subband moments
CN101221762A (en) MP3 compression field audio partitioning method
CN103959375A (en) Enhanced chroma extraction from an audio codec
US20060173692A1 (en) Audio compression using repetitive structures
CN101667423A (en) Compressed domain high robust voice/music dividing method based on probability density ratio
CN102214219B (en) Audio/video content retrieval system and method
Li et al. Robust audio identification for MP3 popular music
CN103294696A (en) Audio and video content retrieval method and system
Rizzi et al. Genre classification of compressed audio data
Ribbrock et al. A full-text retrieval approach to content-based audio identification
Huang et al. AAC audio compression detection based on QMDCT coefficient
CN102214218A (en) System and method for retrieving contents of audio/video
CN109785848B (en) AAC dual-compression audio detection method based on scale factor coefficient difference
Deng et al. An audio fingerprinting system based on spectral energy structure
CN102655000B (en) Method and device for classifying unvoiced sound and voiced sound
CN108877816A (en) AAC audio weight contracting detection method based on QMDCT coefficient
Yin et al. Robust online music identification using spectral entropy in the compressed domain
Fenet et al. A framework for fingerprint-based detection of repeating objects in multimedia streams

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C02 Deemed withdrawal of patent application after publication (patent law 2001)
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20100310