CN101667423A - Compressed domain high robust voice/music dividing method based on probability density ratio - Google Patents
Compressed domain high robust voice/music dividing method based on probability density ratio
- Publication number
- CN101667423A
- Authority
- CN
- China
- Prior art keywords
- music
- voice
- data
- probability density
- noise
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Compression, Expansion, Code Conversion, And Decoders (AREA)
Abstract
The invention relates to a highly robust compressed-domain voice/music segmentation method based on the probability density ratio, comprising the steps of: extracting new characteristic parameters based on the probability density ratio from compressed-domain voice/music mixed data with a low signal-to-noise ratio; detecting the change points between voice and music in the compressed domain based on the new characteristic parameters; and segmenting the data at these points to obtain the voice segments and the music segments. Experimental results show that, compared with the traditional segmentation method, the method markedly improves accuracy, noise resistance and overall performance.
Description
Technical field
The present invention relates to a highly robust compressed-domain voice/music segmentation method based on the probability density ratio, and in particular to a probability-density-ratio-based method for detecting voice/music change points under low signal-to-noise ratio conditions with different kinds of natural environmental noise.
Background technology
Technologies such as compressed-domain voice/music classification and retrieval and scene classification use signal processing and statistical methods to search for specific voice/music in large databases of compressed voice/music. Voice/music segmentation is one of the key problems in realizing classification and retrieval, particularly under low signal-to-noise ratio conditions with natural environmental noise.
Most commonly used voice/music segmentation methods operate in the uncompressed domain. Few address segmentation directly in the compressed domain, and fewer still consider low signal-to-noise ratio conditions. However, most compressed-domain voice/music is not recorded in a standard recording studio, and some comes from noisy real environments, so research on compressed-domain voice/music segmentation under low signal-to-noise ratio conditions with natural environmental noise is particularly important. Compressed-domain voice/music data come from the binary bitstream produced by encoding the original voice/music, but these bitstream data alone do not directly reflect the key properties of the original voice/music. Therefore, the first issue in segmenting compressed-domain voice/music data is the data source for feature extraction, that is, how to process the compressed data and extract effective characteristic parameters at minimum computational cost. Theoretical analysis and experimental results show that partially decoding the compressed data yields data whose spectral properties are similar to those of the original voice/music; features extracted from these data reflect the marked difference between voice and music and can be used for further segmentation and classification. The method of the present invention adopts exactly this idea. From compressed-domain voice/music data based on MPEG-1 Audio Layer III compression, it extracts the new characteristic parameters compressed-domain probability density ratio (Compressed probability density ratio, CPR) and compressed-domain probability density ratio zero-crossing rate (Compressed probability density ratio crossing rate, CPRCR), then detects the change points between voice and music in the compressed-domain voice/music data, and finally obtains the segmentation result from these change points.
Summary of the invention
The object of the present invention is to address the deficiencies of the prior art by providing a highly robust compressed-domain voice/music segmentation method based on the probability density ratio. The method solves the problem of detecting voice/music change points in the compressed domain under low signal-to-noise ratio conditions with different kinds of natural environmental noise, and can be further used for compressed-domain voice/music recognition, voice/music classification and retrieval, voice/music scene classification and the like. To achieve this object, the concept of the present invention is as follows:
First, the method has good noise immunity: it can segment compressed-domain voice/music data under low signal-to-noise ratio conditions with different kinds of natural environmental noise, at signal-to-noise ratios as low as 5 dB. This provides a good basis for further processing of compressed-domain voice/music data, such as classification and retrieval, recognition and scene detection.
The purpose of the method is to provide a way of segmenting compressed-domain voice/music data under such conditions: voice/music characteristic parameters are extracted directly from the compressed-domain data, change-point detection divides the data into voice and music segments, and the segmentation result is then used for compressed-domain voice/music classification, retrieval and the like.
The technical solution adopted is: first extract characteristic parameters from the compressed-domain voice/music data under low signal-to-noise ratio conditions with different kinds of natural environmental noise, then perform voice/music change-point detection on these data, and finally obtain the segmentation result from the change points.
According to the above inventive concept, the present invention adopts the following technical solution:
A highly robust compressed-domain voice/music segmentation method based on the probability density ratio, characterized in that: first, data reflecting the frequency-domain characteristics of the original voice/music are obtained from an MP3 (MPEG1-layer3) file based on MPEG-1 Audio Layer III compression; second, the new compressed-domain probability density ratio characteristic parameter (Compressed probability density ratio, CPR) is extracted from these data; then, based on this parameter, the compressed-domain probability density ratio zero-crossing rate characteristic parameter (Compressed probability density ratio crossing rate, CPRCR), which reflects the different characteristics of voice and music, is obtained; finally, the change points between voice and music are detected in the compressed-domain voice/music data, and the segmented voice and music segments are obtained from these change points.
The method comprises the following five steps:
1) Preprocessing of the compressed-domain voice/music data: obtaining the compressed-domain voice/music mixed data, reading the frame header and side information, reading the main data, and Huffman decoding and inverse quantization;
2) Generating the modified discrete cosine transform (MDCT) matrix: finding the MDCT coefficients in each subband, arranging the coefficients within the subbands, and forming the matrix;
3) Extracting the characteristic parameters of the compressed-domain voice/music data: computing the compressed-domain probability density ratio and the compressed-domain probability density ratio zero-crossing rate characteristic parameters;
4) Detecting the change points between voice and music: detecting the voice/music cut points based on the characteristic parameters extracted in step 3);
5) Detecting the change points between voice and music under low signal-to-noise ratio conditions with different kinds of natural environmental noise: outputting the cut points of the compressed-domain voice/music data under these conditions and obtaining the segmented voice and music segments.
Compared with the prior art, the present invention has the following evident substantive features and notable advantages. The invention extracts characteristic parameters that effectively reflect the marked difference between voice and music directly from the compressed-domain voice/music data; compared with methods that fully decompress the data before extracting features, this is both simpler and saves computation time. The compressed-domain probability density ratio zero-crossing rate characteristic parameter effectively locates the voice/music cut points, and the method also segments well under different kinds of environmental noise, such as automobile noise, train noise and crowd noise. Experimental results show that the segmentation method of the present invention improves considerably on traditional segmentation methods in accuracy, noise immunity and overall performance.
Description of drawings
Fig. 1 is a flow chart of the highly robust compressed-domain voice/music segmentation method based on the probability density ratio according to the present invention.
Embodiment
A preferred embodiment of the method is described below with reference to the accompanying drawing. The method is divided into five steps:
The first step: preprocessing of the compressed-domain voice/music data
Processing of the compressed-domain voice/music data is divided into reading the frame header information, reading the side information, reading the main data, and Huffman decoding and inverse quantization.
1) Obtaining the compressed-domain voice/music mixed data
A) Obtain a segment of compressed-domain white noise from the audio noise library;
B) Obtain clean compressed-domain voice and music samples from the voice/music library;
C) Obtain compressed-domain voice/music mixed data with a signal-to-noise ratio of 5 dB;
2) Reading the frame header information
A) Read the synchronization information in the frame;
B) Synchronize the decoder with the data stream according to the synchronization information;
C) Determine the start position of the frame data and obtain its frame header information head;
3) Reading the side information
A) Determine the start position of the side information of the frame, i.e. the end of its frame header;
B) Obtain the side information data Side of the frame;
4) Reading the main data
A) Calculate the length Maindata of the main data from the side information;
B) Read the main data of the frame, whose length is Maindata;
C) Obtain the scale factors Scale from the main data;
5) Huffman decoding and inverse quantization
A) Determine the start position of the Huffman data in the main data from the side information Side;
B) Decode the Huffman data to obtain the Huffman-decoded array is of dimension 32*18;
C) Inverse-quantize the data in the array is.
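The frame header reading in item 2) can be illustrated with a minimal sketch. It assumes a plain MPEG-1 Layer III stream and uses the standard 32-bit header layout; the function name `parse_mp3_header` and the returned dictionary keys are illustrative only and do not come from the patent. Free-format and reserved values are simply rejected.

```python
import struct

# MPEG-1 Layer III lookup tables (bitrate index 0 means "free format").
BITRATES_KBPS = [0, 32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320]
SAMPLE_RATES_HZ = [44100, 48000, 32000]


def parse_mp3_header(header_bytes):
    """Parse a 4-byte MPEG-1 Layer III frame header; return None if it does not match."""
    if len(header_bytes) < 4:
        return None
    (word,) = struct.unpack(">I", header_bytes[:4])
    if (word >> 21) & 0x7FF != 0x7FF:      # 11-bit synchronization word
        return None
    version = (word >> 19) & 0x3           # 0b11 = MPEG-1
    layer = (word >> 17) & 0x3             # 0b01 = Layer III
    bitrate_index = (word >> 12) & 0xF
    rate_index = (word >> 10) & 0x3
    padding = (word >> 9) & 0x1
    channel_mode = (word >> 6) & 0x3       # 0b11 = single channel
    if version != 0b11 or layer != 0b01:
        return None
    if bitrate_index in (0, 15) or rate_index == 3:
        return None  # free-format and reserved values are not handled in this sketch
    bitrate = BITRATES_KBPS[bitrate_index] * 1000
    sample_rate = SAMPLE_RATES_HZ[rate_index]
    return {
        "bitrate": bitrate,
        "sample_rate": sample_rate,
        "mono": channel_mode == 0b11,
        "has_crc": ((word >> 16) & 0x1) == 0,
        # Frame length in bytes for MPEG-1 Layer III.
        "frame_length": 144 * bitrate // sample_rate + padding,
        # The side information follows the header (and the CRC, if present).
        "side_info_length": 17 if channel_mode == 0b11 else 32,
    }
```

For a 128 kbit/s, 44.1 kHz stereo frame the header bytes `FF FB 90 00` give a frame length of 417 bytes and a 32-byte side information block, which is where the reading of Side in item 3) would start.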
The second step: generating the modified discrete cosine transform (MDCT) matrix
The data of each granule consist of 32 subbands, each containing 18 coefficients; arranged in order of increasing frequency, each granule forms a 32 × 18 matrix. The process is as follows:
1) Finding the coefficients of each subband
A) Obtain the MDCT coefficients of the 32 subbands from the Huffman-decoded array is;
B) Obtain the 18 coefficients of each subband from its MDCT coefficients;
C) Rearrange the coefficients within each subband by frequency to obtain a new subband coefficient array S;
2) Forming the matrix
A) Combine the row vectors of the subband coefficient array S in subband order to obtain the 32 × 18 array M;
B) In the same way, obtain the MDCT coefficient matrices M_1 and M_2 of the two granules in the frame.
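The matrix formation of the second step can be sketched as follows. The sketch assumes that the 576 inverse-quantized coefficients of a granule are already in ascending frequency order; the function names are illustrative and not taken from the patent.

```python
SUBBANDS = 32
COEFFS_PER_SUBBAND = 18


def granule_to_matrix(coeffs):
    """Arrange the 576 MDCT coefficients of one granule as a 32 x 18 matrix."""
    expected = SUBBANDS * COEFFS_PER_SUBBAND
    if len(coeffs) != expected:
        raise ValueError(f"expected {expected} coefficients, got {len(coeffs)}")
    # Row s holds the 18 coefficients of subband s, lowest frequency first.
    return [
        list(coeffs[s * COEFFS_PER_SUBBAND:(s + 1) * COEFFS_PER_SUBBAND])
        for s in range(SUBBANDS)
    ]


def frame_to_matrices(granule_1, granule_2):
    """Return the matrices M_1 and M_2 for the two granules of one frame."""
    return granule_to_matrix(granule_1), granule_to_matrix(granule_2)
```

Keeping one row per subband means that a later per-coefficient statistic can be computed either over the whole granule or subband by subband without reshaping again.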
The third step: extracting the characteristic parameters of the compressed-domain voice/music data
The extracted compressed-domain features are the probability density ratio parameter CPR and the probability density ratio zero-crossing rate parameter CPRCR.
1) Computing the compressed-domain probability density ratio (CPR) characteristic parameter
A) Based on the Bayesian criterion in statistics, set up two hypotheses H_0 and H_1:
H_0: Z = N (pure noise source)
H_1: Z = N + S (voice/music plus noise audio)
where H_1 is the mixed input of compressed voice, music and noise, and H_0 is the pure noise model.
B) Constructing the noise model;
H_0 is assumed to be a compressed-domain white noise model. The MDCT matrix of the white noise is formed by the methods of claims 3 and 4; the white noise constructed here must be a high signal-to-noise ratio environment relative to the compressed-domain voice/music data.
C) Computing the probability density ratio Bayesian criterion model;
In this model, L is the number of MDCT coefficients in each frame of compressed audio and K is the coefficient index; Z_K is the K-th MDCT coefficient of each frame of the mixed compressed voice/music data; λ_Z(K) and λ_N(K) are the variances of the audio and the noise, respectively. λ_N(K) can be estimated from the noise model, and λ_Z(K) is obtained from the input signal model using the weighting coefficient α; the present invention takes α = 0.98.
D) Computing the probability density ratio CPR from the probability density ratio Bayesian criterion model;
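The model formulas are not reproduced in this text, so the following is only a sketch of one common way to build such a ratio and should not be read as the patent's own equations. It assumes zero-mean Gaussian densities for each MDCT coefficient under H_0 and H_1, a first-order recursive variance update weighted by α, and an average over the L coefficients; the function names and the `floor` guard are hypothetical.

```python
import math

ALPHA = 0.98  # weighting coefficient stated in the text


def update_variance(prev_var, coeffs, alpha=ALPHA):
    """Recursively smooth the per-coefficient variance of the input (assumed form)."""
    return [alpha * p + (1.0 - alpha) * z * z for p, z in zip(prev_var, coeffs)]


def frame_log_ratio(coeffs, var_z, var_n, floor=1e-12):
    """Mean of log p(Z_K | H_1) - log p(Z_K | H_0) over the coefficients of a frame."""
    total = 0.0
    for z, lz, ln in zip(coeffs, var_z, var_n):
        lz = max(lz, floor)  # guard against zero variance
        ln = max(ln, floor)
        # Zero-mean Gaussian densities with variance lz under H_1 and ln under H_0.
        total += 0.5 * math.log(ln / lz) + 0.5 * z * z * (1.0 / ln - 1.0 / lz)
    return total / len(coeffs)
```

Working with the logarithm of the ratio avoids numerical overflow when a frame is much louder than the noise model; whether the patent uses the ratio itself or its logarithm cannot be determined from the text.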
2) Computing the compressed-domain probability density ratio zero-crossing rate (CPRCR) parameter
A) Computing the thresholds;
The compressed-domain probability density ratio thresholds are computed for every half second. To bring out the fine-detail differences between voice and music, two thresholds T_1 and T_2 are chosen, where T_1 is the mean CPR and T_2 is three times the mean CPR, that is:
T_1 = (1/N) Σ_i CPR[i], T_2 = 3 × T_1
where CPR[i] is the probability density ratio of frame i, the sum runs over the frames of the half second, and N is the number of frames in half a second.
B) Computing the zero-crossing rates CPRcr1 and CPRcr2;
The crossing rates CPRcr1 and CPRcr2 are computed with respect to T_1 and T_2, and CPRcr = CPRcr1 + CPRcr2 is obtained for the segment; here sgn is the sign function and CPR_n(m) denotes the m-th CPR parameter of the n-th half second.
C) Computing the final compressed-domain probability density ratio zero-crossing rate CPRCR for every half second;
CPRCR = CPRcr / CPRcr_max
where CPRcr_max is the maximum value of CPRcr. This step normalizes CPRcr.
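The threshold and crossing-rate computation can be sketched as below. The crossing-rate formula itself is not reproduced in this text, so the sketch assumes the usual sign-change count of (CPR − T) between neighbouring frames; the function names and the `frames_per_segment` parameter are illustrative. At 44.1 kHz an MPEG-1 Layer III frame carries 1152 samples, so half a second corresponds to roughly 19 frames.

```python
def _sign(x):
    """Sign function with the usual zero-crossing convention: +1 for x >= 0, else -1."""
    return 1 if x >= 0 else -1


def crossing_count(values, threshold):
    """Count sign changes of (value - threshold) between neighbouring frames."""
    count = 0
    for prev, cur in zip(values, values[1:]):
        count += abs(_sign(cur - threshold) - _sign(prev - threshold)) // 2
    return count


def cprcr_per_segment(cpr, frames_per_segment):
    """Return the normalized crossing rate of each block of frames_per_segment CPR values."""
    raw = []
    for start in range(0, len(cpr) - frames_per_segment + 1, frames_per_segment):
        block = cpr[start:start + frames_per_segment]
        t1 = sum(block) / len(block)   # T_1: mean CPR of the segment
        t2 = 3.0 * t1                  # T_2: three times the mean
        raw.append(crossing_count(block, t1) + crossing_count(block, t2))
    peak = max(raw, default=0)
    # Normalize by the largest CPRcr so that every value lies in [0, 1].
    return [r / peak if peak else 0.0 for r in raw]
```

Normalizing by the maximum over the whole recording means the values depend on the recording as a whole, which is why the fixed threshold of 0.5 used in the next step is meaningful across files.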
The fourth step: detecting the change points in the compressed-domain voice/music data
To ensure continuity of the segmentation of the compressed-domain voice/music data and to prevent misjudgment, the present invention requires every segmented voice or music section to be longer than one second: M (M = 2) consecutive CPRCR parameters must be greater than, or less than, the threshold r before a point is accepted as a valid CPRCR cut point.
1) As described in the third step, compute the compressed-domain probability density ratio parameter of each frame of compressed voice/music data, then obtain the compressed-domain probability density ratio zero-crossing rate CPRCR parameter for every half second from this characteristic parameter;
2) Correcting CPRCR;
A parameter point that crosses the threshold r (r = 0.5) without M consecutive parameter values being greater than, or less than, r is called a singular point. All singular points are found and processed, that is, the current point is replaced according to the data before and after the singular point.
Finding all singular points before segmentation ensures the validity of the segmentation and reduces the probability of misjudgment.
3) Threshold comparison;
CPRCR is compared with the threshold ξ = 0.5.
4) Cut-point detection;
Given the probability density ratio characteristics of voice and music, voice contains far more sequences with a small probability density ratio than music, so the CPRCR of voice is much smaller than that of music. Sections with CPRCR smaller than ξ are therefore detected as voice, and sections with CPRCR larger than ξ are detected as music;
5) Output the cut points of the compressed voice/music data.
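The detection logic of the fourth step can be sketched as follows. The singular-point replacement formula is not reproduced in this text, so the sketch assumes a simple rule: a run of labels shorter than M = 2 half-second values is merged into the run before it. The function names are illustrative only.

```python
def label_segments(cprcr, threshold=0.5):
    """Label each half-second value: below the threshold is voice, otherwise music."""
    return ["voice" if v < threshold else "music" for v in cprcr]


def smooth_labels(labels, min_run=2):
    """Merge runs shorter than min_run into the preceding run (assumed correction rule)."""
    out = list(labels)
    i = 0
    while i < len(out):
        j = i
        while j < len(out) and out[j] == out[i]:
            j += 1
        # A short run that is not at the very start is treated as a singular point.
        if j - i < min_run and i > 0:
            for k in range(i, j):
                out[k] = out[i - 1]
        i = j
    return out


def change_points(labels, seconds_per_label=0.5):
    """Return (time_in_seconds, new_label) for every position where the label changes."""
    return [
        (i * seconds_per_label, labels[i])
        for i in range(1, len(labels))
        if labels[i] != labels[i - 1]
    ]
```

With M = 2 and half-second values, the smoothing also enforces the one-second minimum section length mentioned above, because an isolated half-second label never survives.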
The fifth step: detecting voice/music change points in the compressed domain under low signal-to-noise ratio conditions with different kinds of natural environmental noise
1) Obtaining compressed-domain voice/music mixed data under different kinds of natural environmental noise;
A) Obtain train and automobile sounds from the audio library as the natural environmental noise;
B) Obtain clean compressed-domain voice and music samples from the voice/music library;
C) Using the natural environmental noise, obtain compressed-domain voice/music mixed data with a signal-to-noise ratio of 5 dB;
2) Repeat the process from 2) of the first step to the end of the fourth step, and output the cut points of the compressed-domain voice/music data under the corresponding natural environmental noise.
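The text does not say how the 5 dB mixtures are produced. One plausible way, shown here purely as an assumption, is to scale the noise against the clean signal in the sample domain before encoding to MP3; `mix_at_snr` is a hypothetical helper name.

```python
import math


def mix_at_snr(signal, noise, snr_db=5.0):
    """Add noise to signal after scaling it so the mixture has the requested SNR in dB."""
    if len(noise) < len(signal):
        raise ValueError("noise must be at least as long as the signal")
    noise = noise[:len(signal)]
    p_signal = sum(s * s for s in signal) / len(signal)
    p_noise = sum(n * n for n in noise) / len(noise)
    if p_signal == 0 or p_noise == 0:
        raise ValueError("signal and noise must both have non-zero power")
    # Required noise power is p_signal / 10^(snr_db / 10).
    gain = math.sqrt(p_signal / (p_noise * 10.0 ** (snr_db / 10.0)))
    return [s + gain * n for s, n in zip(signal, noise)]
```

The same helper would serve for the white-noise case of the first step and for the train and automobile noise used here, since only the noise samples change.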
Experimental result
The method of the present invention was tested with a speech corpus of news broadcasts from a provincial TV station, a music library from the "Ban Derui" albums, and a natural environmental noise library (automobile noise, train noise, crowd noise and the like) taken from the sounddogs website. The compressed-domain voice/music data are in MP3 format with a sampling frequency of 44.1 kHz, and the total duration is about 270 minutes (92 compressed-domain mixed voice/music segments of 3 minutes each).
The traditional BIC segmentation detection method and the method of the present invention were each tested on the above compressed-domain voice/music data, and detection performance was assessed by the accuracy of the voice/music cut-point decisions. The cut-point accuracy is defined as the percentage of correctly judged cut points among all cut points to be detected, and is computed as follows:
Accuracy = (N − N_S→M − N_M→S) / N × 100%
where N_S→M is the number of points that are actually voice but were misjudged as music, N_M→S is the number of points that are actually music but were misjudged as voice, and N is the total number of CPRCR points in the samples under test.
The cut-point accuracy reflects the proportion of correct cut points among all points to be detected and characterizes the correctness of the detection result.
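The accuracy measure can be written as a small helper. The formula is not reproduced in this text, so the sketch follows the verbal definition given above (correct points divided by all points); the function and argument names are illustrative.

```python
def cut_point_accuracy(n_total, n_voice_as_music, n_music_as_voice):
    """Percentage of CPRCR points that were judged correctly."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    errors = n_voice_as_music + n_music_as_voice
    return 100.0 * (n_total - errors) / n_total
```

For example, 200 points with 20 voice points judged as music and 15 music points judged as voice would give an accuracy of 82.5%; these numbers are made up for illustration and are not taken from the experiments.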
Statistics of the experimental results show the following. With the traditional BIC detection method, the cut-point detection accuracy for compressed-domain voice/music data at a signal-to-noise ratio of 5 dB is only 30.56% under white noise, and is lower still under natural noise: only 25.27% under train noise and only 22.15% under automobile noise at the same 5 dB signal-to-noise ratio. This falls far short of normal segmentation requirements, and the method cannot be considered to segment compressed-domain voice/music data effectively. With the method of the present invention, the cut-point detection accuracy at a signal-to-noise ratio of 5 dB reaches 82.25% under white noise, and good segmentation is also achieved under natural noise: 81.09% under train noise and 78.21% under automobile noise.
It can thus be seen that the method of the present invention detects voice/music cut points effectively in compressed-domain voice/music data under low signal-to-noise ratio conditions with different kinds of natural environmental noise, and thereby solves the problem of detecting voice/music change points in the compressed domain under these conditions. The invention can be further used in applications such as compressed-domain voice/music recognition, voice/music classification and retrieval, and audio scene analysis.
Claims (7)
1. A highly robust compressed-domain voice/music segmentation method based on the probability density ratio, characterized in that: first, new characteristic parameters based on the probability density ratio, which reflect the different characteristics of voice and music, are extracted from low signal-to-noise ratio compressed-domain voice/music mixed data, namely the compressed-domain probability density ratio and the compressed-domain probability density ratio zero-crossing rate; then change-point detection between voice and music in the compressed domain is performed based on these new characteristic parameters; finally, the data are segmented at the detected points to obtain the voice segments and the music segments.
2. The highly robust compressed-domain voice/music segmentation method based on the probability density ratio according to claim 1, characterized in that the specific steps are as follows:
1) Preprocessing of the compressed voice/music data: obtaining the compressed-domain voice/music mixed data, reading the frame header and side information, reading the main data, and Huffman decoding and inverse quantization;
2) Generating the modified discrete cosine transform (MDCT) matrix: finding the MDCT coefficients in each subband, arranging the coefficients within the subbands, and forming the matrix;
3) Extracting the characteristic parameters of the compressed-domain voice/music data: computing the compressed-domain probability density ratio and the compressed-domain probability density ratio zero-crossing rate characteristic parameters;
4) Detecting the change points between voice and music: detecting the voice/music cut points based on the characteristic parameters extracted in step 3);
5) Detecting the change points between voice and music under low signal-to-noise ratio conditions with different kinds of natural environmental noise: outputting the cut points of the compressed-domain voice/music data under these conditions and obtaining the segmented voice and music segments.
3. The highly robust compressed-domain voice/music segmentation method based on the probability density ratio according to claim 2, characterized in that the preprocessing of the compressed voice/music data in said step 1) comprises:
(1) Obtaining the compressed-domain voice/music mixed data
A) Obtain a segment of compressed-domain white noise from the audio noise library;
B) Obtain clean compressed-domain voice and music samples from the voice/music library;
C) Obtain compressed-domain voice/music mixed data with a signal-to-noise ratio of 5 dB;
(2) Reading the frame header information
A) Read the synchronization information in the frame;
B) Synchronize the decoder with the data stream according to the synchronization information;
C) Determine the start position of the frame data and obtain its frame header information head;
(3) Reading the side information
A) Determine the start position of the side information of the frame, i.e. the end of its frame header;
B) Obtain the side information data Side of the frame;
(4) Reading the main data
A) Calculate the length Maindata of the main data from the side information;
B) Read the main data of the frame, whose length is Maindata;
C) Obtain the scale factors Scale from the main data;
(5) Huffman decoding and inverse quantization
A) Determine the start position of the Huffman data in the main data from the side information Side;
B) Decode the Huffman data to obtain the Huffman-decoded array is of dimension 32*18;
C) Inverse-quantize the data in the array is.
4. The highly robust compressed-domain voice/music segmentation method based on the probability density ratio according to claim 2, characterized in that the generation of the modified discrete cosine transform (MDCT) matrix in said step 2) comprises:
(1) Finding the coefficients of each subband
A) Obtain the MDCT coefficients of the 32 subbands from the Huffman-decoded array is;
B) Obtain the 18 coefficients of each subband from its MDCT coefficients;
C) Rearrange the coefficients within each subband by frequency to obtain a new subband coefficient array S;
(2) Forming the matrix
A) Combine the row vectors of the subband coefficient array S in subband order to obtain the 32 × 18 array M;
B) In the same way, obtain the MDCT coefficient matrices M_1 and M_2 of the two granules in the frame.
5. The highly robust compressed-domain voice/music segmentation method based on the probability density ratio according to claim 2, characterized in that the extraction of the characteristic parameters of the compressed-domain voice/music data in said step 3) comprises:
(1) Computing the compressed-domain probability density ratio (CPR) characteristic parameter
A) Based on the Bayesian criterion in statistics, set up two hypotheses H_0 and H_1:
H_0: Z = N (pure noise source)
H_1: Z = N + S (voice/music plus noise audio)
where H_1 is the MP3 voice, music and noise input, and H_0 is the pure noise model;
B) Constructing the noise model: H_0 is assumed to be a compressed-domain white noise model, and the MDCT matrix of the white noise is formed by the method of sub-step (2) of step 2); the white noise constructed here must be a high signal-to-noise ratio environment relative to the compressed-domain voice/music data;
C) Computing the probability density ratio Bayesian criterion model:
In this model, L is the number of MDCT coefficients in each frame of compressed audio and K is the coefficient index; Z_K is the K-th MDCT coefficient of each frame of the mixed compressed voice/music data; λ_Z(K) and λ_N(K) are the variances of the audio and the noise, respectively. λ_N(K) can be estimated from the noise model, and λ_Z(K) is obtained from the input signal model using the weighting coefficient α, where α = 0.98.
D) Computing the compressed-domain probability density ratio CPR from the Bayesian criterion model
(2) Computing the compressed-domain probability density ratio zero-crossing rate (CPRCR) parameter
A) Computing the thresholds
The compressed-domain probability density ratio thresholds are computed for every half second. To bring out the fine-detail differences between voice and music, two thresholds T_1 and T_2 are chosen, where T_1 is the mean CPR and T_2 is three times the mean CPR, that is:
T_1 = (1/N) Σ_i CPR[i], T_2 = 3 × T_1
where CPR[i] is the probability density ratio of frame i, the sum runs over the frames of the half second, and N is the number of frames in half a second;
B) Computing the zero-crossing rates CPRcr1 and CPRcr2
The crossing rates CPRcr1 and CPRcr2 are computed with respect to T_1 and T_2, and CPRcr = CPRcr1 + CPRcr2 is obtained for the segment; here sgn is the sign function and CPR_n(m) denotes the m-th CPR parameter of the n-th half second;
C) Computing the final probability density ratio zero-crossing rate CPRCR for every half second
CPRCR = CPRcr / CPRcr_max
where CPRcr_max is the maximum value of CPRcr. This step normalizes CPRcr.
6. The highly robust compressed-domain voice/music segmentation method based on the probability density ratio according to claim 2, characterized in that the detection of the voice/music change points in said step 4) comprises:
(1) Compute the compressed-domain probability density ratio parameter of each frame of data by said step 3), then obtain the compressed-domain probability density ratio zero-crossing rate CPRCR characteristic parameter for every half second from this characteristic parameter;
(2) Correcting CPRCR
A parameter point that crosses the threshold r (r = 0.5) without M consecutive parameter values being greater than, or less than, r is called a singular point. All singular points are found and processed, that is, the current point is replaced according to the data before and after the singular point.
Finding all singular points before segmentation ensures the validity of the segmentation and reduces the probability of misjudgment;
(3) Threshold comparison
CPRCR is compared with the threshold ξ = 0.5.
(4) Cut-point detection
Given the probability density ratio characteristics of voice and music, voice contains far more sequences with a small probability density ratio than music, so the CPRCR of voice is much smaller than that of music. Sections with CPRCR smaller than ξ are therefore detected as voice change points, and sections with CPRCR larger than ξ are detected as music change points;
(5) Output the cut points of the compressed voice/music data.
7. The highly robust compressed-domain voice/music segmentation method based on the probability density ratio according to claim 2, characterized in that the detection of voice/music change points under low signal-to-noise ratio conditions with different kinds of natural environmental noise in said step 5) comprises:
(1) Obtaining compressed-domain voice/music mixed data under different kinds of natural environmental noise
A) Obtain train and automobile sounds from the audio library as the natural environmental noise;
B) Obtain clean compressed-domain voice and music samples from the voice/music library;
C) Using the natural environmental noise, obtain compressed-domain voice/music mixed data with a signal-to-noise ratio of 5 dB;
(2) Process this compressed-domain voice/music mixed data by the steps of claims 3 to 6, and output the cut points of the compressed-domain voice/music data under low signal-to-noise ratio conditions with natural environmental noise, thereby obtaining the segmented voice and music segments.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN200910196513A CN101667423A (en) | 2009-09-25 | 2009-09-25 | Compressed domain high robust voice/music dividing method based on probability density ratio |
Publications (1)
Publication Number | Publication Date |
---|---|
CN101667423A (en) | 2010-03-10 |
Family
ID=41804014
Family Applications (1)
Application Number | Status | Priority Date | Filing Date | Title |
---|---|---|---|---|
CN200910196513A CN101667423A (en) | Pending | 2009-09-25 | 2009-09-25 | Compressed domain high robust voice/music dividing method based on probability density ratio |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN101667423A (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103325403A (en) * | 2013-06-20 | 2013-09-25 | 富泰华工业(深圳)有限公司 | Electronic device and video playing method thereof |
TWI511536B (en) * | 2013-06-20 | 2015-12-01 | Hon Hai Prec Ind Co Ltd | A method for playing video and electronic device using the same |
CN103325403B (en) * | 2013-06-20 | 2016-04-13 | 富泰华工业(深圳)有限公司 | Electronic installation and video broadcasting method thereof |
CN108989882A (en) * | 2018-08-03 | 2018-12-11 | 百度在线网络技术(北京)有限公司 | Method and apparatus for exporting the snatch of music in video |
CN112991476A (en) * | 2021-02-18 | 2021-06-18 | 中国科学院自动化研究所 | Scene classification method, system and equipment based on depth compression domain features |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
US7460994B2 (en) | Method and apparatus for producing a fingerprint, and method and apparatus for identifying an audio signal | |
Chou et al. | Robust singing detection in speech/music discriminator design | |
US7081581B2 (en) | Method and device for characterizing a signal and method and device for producing an indexed signal | |
JP5440051B2 (en) | Content identification method, content identification system, content search device, and content use device | |
US9208790B2 (en) | Extraction and matching of characteristic fingerprints from audio signals | |
CN102799605B (en) | A kind of advertisement detecting method and system | |
US9093120B2 (en) | Audio fingerprint extraction by scaling in time and resampling | |
Seo et al. | Audio fingerprinting based on normalized spectral subband moments | |
CN101221762A (en) | MP3 compression field audio partitioning method | |
CN103959375A (en) | Enhanced chroma extraction from an audio codec | |
US20060173692A1 (en) | Audio compression using repetitive structures | |
CN101667423A (en) | Compressed domain high robust voice/music dividing method based on probability density ratio | |
CN102214219B (en) | Audio/video content retrieval system and method | |
Li et al. | Robust audio identification for MP3 popular music | |
CN103294696A (en) | Audio and video content retrieval method and system | |
Rizzi et al. | Genre classification of compressed audio data | |
Ribbrock et al. | A full-text retrieval approach to content-based audio identification | |
Huang et al. | AAC audio compression detection based on QMDCT coefficient | |
CN102214218A (en) | System and method for retrieving contents of audio/video | |
CN109785848B (en) | AAC dual-compression audio detection method based on scale factor coefficient difference | |
Deng et al. | An audio fingerprinting system based on spectral energy structure | |
CN102655000B (en) | Method and device for classifying unvoiced sound and voiced sound | |
CN108877816A (en) | AAC audio weight contracting detection method based on QMDCT coefficient | |
Yin et al. | Robust online music identification using spectral entropy in the compressed domain | |
Fenet et al. | A framework for fingerprint-based detection of repeating objects in multimedia streams | |
Legal Events
Code | Title | Description |
---|---|---|
C06 | Publication | |
PB01 | Publication | |
C10 | Entry into substantive examination | |
SE01 | Entry into force of request for substantive examination | |
C02 | Deemed withdrawal of patent application after publication (patent law 2001) | |
WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20100310 |