CN112214635B - Fast audio retrieval method based on cepstrum analysis - Google Patents
- Publication number
- CN112214635B CN112214635B CN202011145738.9A CN202011145738A CN112214635B CN 112214635 B CN112214635 B CN 112214635B CN 202011145738 A CN202011145738 A CN 202011145738A CN 112214635 B CN112214635 B CN 112214635B
- Authority
- CN
- China
- Prior art keywords
- audio
- retrieval
- features
- sample
- section
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/60—Information retrieval; Database structures therefor; File system structures therefor of audio data
- G06F16/61—Indexing; Data structures therefor; Storage structures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/60—Information retrieval; Database structures therefor; File system structures therefor of audio data
- G06F16/68—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
- G06F16/683—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/22—Matching criteria, e.g. proximity measures
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Multimedia (AREA)
- Library & Information Science (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Evolutionary Computation (AREA)
- Evolutionary Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Computational Biology (AREA)
- Artificial Intelligence (AREA)
- Life Sciences & Earth Sciences (AREA)
- Software Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a fast audio retrieval method based on cepstrum analysis, belonging to the technical field of audio retrieval. The method comprises the following steps: first, a retrieval audio feature library is constructed by cyclically extracting frequency-domain features, based on the signal energy ratio, from each audio segment in the retrieval audio library; second, the sample audio fingerprint is extracted by applying the same energy-ratio feature extraction to the sample audio input by the user; third, an optimal mixing point is determined according to the sample length, the point at which mixing the sample audio features with the retrieval audio features makes the cepstrum analysis of the mixed features most accurate; fourth, sample audio retrieval is performed by using cepstrum analysis to find, in the retrieval audio feature library, the retrieval audio features with the highest similarity to the sample audio features; the corresponding retrieval audio information is the retrieval result. The audio features extracted by the method are highly representative and occupy little space. During retrieval, cepstrum analysis is applied directly to the mixture of the two audio features, and it performs only Fourier-related transforms on the mixed features, so the amount of computation is small and the computation is fast. Aiming at the low retrieval efficiency of prior-art audio retrieval applications, the invention thus greatly improves retrieval efficiency while preserving retrieval accuracy.
Description
Technical Field
The invention relates to a fast audio retrieval method based on cepstrum analysis, and belongs to the technical field of audio retrieval.
Background
With the advent of the big-data era, the amount of multimedia information on the Internet has grown explosively. Traditional text-label-based audio retrieval requires building separate label libraries for different domains; it lacks universality and cannot meet users' multimedia retrieval needs. Methods were therefore proposed that construct an audio fingerprint database and retrieve audio through hash indexing. Most subsequent audio retrieval algorithms improve on this idea, but the key problem remains that retrieval accuracy and retrieval efficiency cannot be balanced.
Disclosure of Invention
The invention aims to provide a fast audio retrieval method based on cepstrum analysis, which can greatly improve the retrieval efficiency on the premise of ensuring the retrieval accuracy.
In order to achieve the purpose, the invention adopts the following technical scheme:
s1, establishing a retrieval audio feature library, and extracting frequency domain features from each section of audio in the retrieval audio library according to the signal energy ratio in a circulating manner to establish the retrieval audio feature library for retrieval;
s2, extracting sample audio fingerprints, and extracting frequency domain characteristics from the sample audio input by the user according to the signal energy ratio to form sample audio characteristics;
s3, determining an optimal mixing point according to the sample length, wherein the sample audio features and the retrieval audio features are mixed at the optimal mixing point, so that the cepstrum analysis result of the mixing features is more accurate;
s4, sample audio retrieval, namely searching for a retrieval audio feature with the highest similarity to the sample audio feature in a retrieval audio feature library by using a cepstrum analysis method, wherein the corresponding retrieval audio information is a sample audio retrieval result;
preferably, before steps S1 and S2 are performed, the method of extracting audio features from the signal energy ratio in the frequency domain is described:
in the frequency domain, each frame of the signal is traversed from its first data point downward in sequence: the energy value of the current data point is divided by the total energy of all data points in the frame, yielding a column of energy ratios, as shown in formula (1):

Er(k_t, k_f) = E(k_t, k_f) / Σ_{q=1}^{Q} E(k_t, q)  (1)

wherein Er denotes the energy ratio, E the energy, k_t the corresponding time (frame) point, k_f the corresponding frequency point, and Q the upper frequency limit;
the frequency position corresponding to the maximum value in this column of energy ratios is taken as the frame's feature; all frames of an audio segment are processed in this way, so that one audio segment is represented by a one-dimensional array whose dimension equals the number of audio frames;
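The per-frame extraction above can be sketched as follows. The frame length, hop size, and Hann window are illustrative assumptions, not values fixed by the invention:

```python
import numpy as np

def extract_features(signal, frame_len=256, hop=128):
    """One feature per frame: the index of the frequency bin whose
    energy ratio (bin energy / whole-frame energy) is maximal."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hanning(frame_len)
    features = np.empty(n_frames, dtype=np.int64)
    for t in range(n_frames):
        frame = signal[t * hop : t * hop + frame_len] * window
        energy = np.abs(np.fft.rfft(frame)) ** 2   # per-bin energy E(k_t, k_f)
        ratio = energy / energy.sum()              # column of energy ratios, formula (1)
        features[t] = np.argmax(ratio)             # frequency position of the maximum
    return features
```

For a pure 1 kHz tone sampled at 8 kHz with these parameters, every frame's feature lands at (or immediately beside) bin 1000 · 256 / 8000 = 32, illustrating why the feature is compact yet representative.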
preferably, the step S1 includes:
s1.1, performing Fourier transform on each frame of signals obtained after framing and windowing the retrieved audio signals;
s1.2, extracting the frequency position corresponding to the maximum energy-ratio point of each frame of signal in the frequency domain as a feature, the extraction result representing a segment of retrieval audio features as a one-dimensional array; the one-dimensional retrieval audio feature F_T of length N is represented as follows:

F_T = (f_t1, f_t2, f_t3, …, f_tN)  (2)
s1.3, traversing all audios in the search audio library in the modes of S1.1 and S1.2, and naming each section of search audio features by respective search audio information so as to construct a search audio feature library;
preferably, the step S2 includes:
s2.1, the user inputs an audio clip of any retrieval audio as the sample audio signal, wherein the duration of the clip is R seconds and the clip may contain white noise at a certain signal-to-noise ratio;
s2.2, performing Fourier transform on each frame of signals obtained after framing and windowing the sample audio signals;
s2.3, extracting the frequency position corresponding to the maximum energy-ratio point of each frame of signal in the frequency domain as a feature, the extraction result representing a segment of sample audio features as a one-dimensional array; the one-dimensional sample audio feature F_S of length M is represented as follows:

F_S = (f_s1, f_s2, f_s3, …, f_sM)  (3)
preferably, the step S3 includes:
s3.1, retrieving a first section of retrieval audio in an audio library as an original audio, and intercepting an audio fragment of R seconds in the original audio as an audio to be detected;
s3.2, extracting the audio features to be detected, wherein the length is L1;
s3.3, mixing the audio feature to be detected with the original audio feature of length L2, starting from the first point and continuing point by point until point L2 − L1;
s3.4, performing Fourier transform, modulus, logarithm, and inverse Fourier transform on each mixing result of S3.3 to obtain cepstral-domain data; after eliminating the autocorrelation peak of the cepstral-domain data, finding the peak value in the first half of the data; calculating the similarity between the audio feature to be detected and the original audio feature from that peak; recording each similarity result with its corresponding mixing-point information; and returning the mixing point with the highest recorded similarity, i.e., the optimal mixing point τ for sample audio features of this length;
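A minimal sketch of S3.3–S3.4, under two stated assumptions: "mixing at point j" is read as adding the short feature into the long one at offset j, and the similarity is proxied by the largest first-half cepstral peak after zeroing a low-quefrency guard region (the guard width here is an assumption, not the patent's value):

```python
import numpy as np

def power_cepstrum(x):
    """Fourier transform -> modulus squared -> log -> inverse transform."""
    return np.real(np.fft.ifft(np.log(np.abs(np.fft.fft(x)) ** 2 + 1e-12)))

def best_mixing_point(short, long_feat, guard=50):
    """Try every mixing point j in [0, L2 - L1) and keep the one whose
    mixed feature gives the strongest cepstral echo peak."""
    best_j, best_peak = 0, -np.inf
    for j in range(len(long_feat) - len(short)):
        mixed = np.asarray(long_feat, dtype=float).copy()
        mixed[j : j + len(short)] += short           # mix at point j
        cep = power_cepstrum(mixed)
        cep[:guard] = 0                              # eliminate autocorrelation peak
        peak = cep[: len(mixed) // 2].max()          # peak in the first half
        if peak > best_peak:
            best_peak, best_j = peak, j
    return best_j
```

The exhaustive loop mirrors the patent's "from the first point until point L2 − L1"; it only needs to run once per sample length, after which the returned j is reused for every library comparison in S4.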
preferably, the step S4 includes:
s4.1, mixing the sample audio features with the optimal mixing point calculated in the step S3 and the retrieved audio features to obtain mixed features;
s4.2, Fourier transform, modulus value taking, logarithm solving and inverse Fourier transform are carried out on the mixed features obtained in the S4.1, cepstral domain data are obtained, after an autocorrelation peak of the cepstral domain data is eliminated, a peak value in the first half of data is found out, and the similarity between the sample audio features and the retrieval audio features is calculated according to the peak value;
both the retrieval audio features and the sample audio features are one-dimensional arrays, so they can be regarded as two waveform signals; the principle of calculating the similarity between two waveform signals by cepstrum analysis of their mixture is as follows:
suppose that the retrieved audio feature signal is x 1 (t) the sample audio feature signal is x 2 (t):
Where τ (τ > 0) is the optimum mixing point, i.e. the audio feature signal x is retrieved 1 (t) and sample audio feature signal x 2 Time delay between (t), a 1 And a 2 Is an attenuation factor of the signal, and a 1 ∈(0,1),a 2 ∈(0,1);
The mixed signal is configured as:
y(t) = x(t) * (a_1·δ(t) + a_2·δ(t − τ))  (5)
according to the definition of the power cepstrum, the cepstrum analysis result of the mixed signal is:

C_y(q) = C_x(q) + F⁻¹{ log|a_1 + a_2·e^{−j2πfτ}|² }  (6)

where C_y and C_x denote the power cepstra of y(t) and x(t) and q denotes quefrency; the second term expands into a series of impulses at quefrencies τ, 2τ, 3τ, …. As can be seen from equation (6), the power cepstrum of the mixed signal contains impulse peaks at the optimal mixing-point position and at integer multiples thereof. After eliminating the autocorrelation-peak interference in the cepstrum, the impulse peak is found in the first half of the power cepstrum, and the similarity is calculated from it;
s4.3, performing S4.1 and S4.2 on sample feature circulation and each section of retrieval audio features in the retrieval audio feature library, recording similarity results and corresponding retrieval audio information, and returning the retrieval audio information corresponding to the highest similarity in the record, namely the sample audio retrieval result;
compared with traditional hash retrieval, in which the index values of every pair of data points in the two audio fingerprints must be matched, the retrieval method of the invention obtains the similarity by cepstrum analysis performed directly on the mixture of the two audio features; since the cepstrum analysis performs only Fourier-related transforms on the mixed result, the amount of computation is small and the computation is fast.
Drawings
FIG. 1 is a flowchart illustrating an exemplary audio retrieval method according to the present invention
FIG. 2 is a schematic diagram of feature extraction according to the present invention
FIG. 3 is a flow chart of the present invention for constructing a search audio feature library
FIG. 4 is a flow chart of extracting sample audio features according to the present invention
FIG. 5 is a waveform of the audio characteristics of the present invention
FIG. 6 is a flow chart of the present invention for determining the optimal mixing point according to the sample length
FIG. 7 is a flow chart of mixed feature cepstrum analysis in accordance with the present invention
FIG. 8 shows the result of cepstrum analysis of the mixture characteristic of the present invention
Detailed Description
The invention will be further described by means of embodiments in conjunction with the accompanying drawings.
To solve the prior art's inability to balance retrieval accuracy and retrieval efficiency, an embodiment of the invention provides a fast audio retrieval method based on cepstrum analysis, which, as shown in fig. 1, comprises the following operations:
s1, establishing a retrieval audio feature library, and extracting frequency domain features from each section of audio in the retrieval audio library according to the signal energy ratio in a circulating manner to establish the retrieval audio feature library for retrieval;
s2, extracting sample audio fingerprints, and extracting frequency domain characteristics from the sample audio input by the user according to the signal energy ratio to form sample audio characteristics;
s3, determining an optimal mixing point according to the sample length, wherein the sample audio features and the retrieval audio features are mixed at the optimal mixing point, so that the cepstrum analysis result of the mixing features is more accurate;
s4, sample audio retrieval, namely searching for a retrieval audio feature with the highest similarity to the sample audio feature in a retrieval audio feature library by using a cepstrum analysis method, wherein the corresponding retrieval audio information is a sample audio retrieval result;
before proceeding to steps S1 and S2, it is necessary to explain a method of extracting audio features according to the signal energy ratio in the frequency domain in the embodiment:
as shown in fig. 2, each frame of the signal in the frequency domain is traversed from its first data point downward in sequence; the energy value of the current data point is divided by the total energy of the data points of the whole frame, yielding a column of energy ratios; the frequency position corresponding to the maximum value in this column is found; all frames of an audio segment are processed in this way, and the result represents one audio segment as a one-dimensional array whose dimension equals the number of audio frames;
on the basis of the above embodiments, step S1 is described with reference to fig. 3:
s1.1, performing Fourier transform on each frame of signals obtained after framing and windowing the retrieved audio signals;
the retrieval audio library contains 19 audio segments (I = 19), each 1 min in duration; the 1st segment (i = 1) is divided into 29999 frames (N = 29999); the Fourier transform is applied to the 1st frame (n = 1) signal, and when n = N every frame of the segment has been transformed;
s1.2, calculating the signal energy ratio of each frame in the retrieval audio frequency domain according to formula (1), 29999 frames in total;
extracting the frequency position corresponding to the maximum energy-ratio point of each frame of signal as a feature; the one-dimensional retrieval audio feature F_T of length 29999 frames is represented as follows:
F_T = (5, 1, 3, …, 8)
s1.3, traversing all audios of the retrieval audio library in the manner of S1.1 and S1.2, naming each segment of retrieval audio features by its retrieval audio information; when i = I, every audio of the retrieval audio library has been traversed and the construction of the retrieval audio feature library is complete;
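The traversal of S1.1–S1.3 can be sketched as a mapping from retrieval-audio names to feature arrays. The dict interface, the audio names, and the extractor parameters are assumptions for illustration, not the patent's concrete interfaces:

```python
import numpy as np

def frame_energy_ratio_feature(signal, frame_len=256, hop=128):
    """S1.1-S1.2: per-frame index of the maximum energy-ratio bin."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hanning(frame_len)
    feats = np.empty(n_frames, dtype=np.int64)
    for t in range(n_frames):
        frame = signal[t * hop : t * hop + frame_len] * window
        energy = np.abs(np.fft.rfft(frame)) ** 2
        feats[t] = np.argmax(energy / energy.sum())
    return feats

def build_feature_library(named_audios):
    """S1.3: name each feature array by its retrieval-audio information."""
    return {name: frame_energy_ratio_feature(sig) for name, sig in named_audios.items()}
```

Because each audio reduces to one integer per frame, the whole library is far smaller than the raw audio, which is the "small occupied space" property claimed for the features.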
on the basis of the above embodiments, step S2 is described with reference to fig. 4:
s2.1, the user inputs an audio clip of any retrieval audio as the sample audio signal; the clip is 20 seconds in duration (R = 20) and contains white noise at a signal-to-noise ratio of 10 dB;
s2.2, performing Fourier transform on each frame of signal obtained after the frame windowing is performed on the sample audio signal;
the sample audio input by the user is divided into 9999 frames (M = 9999); the Fourier transform is applied to the 1st frame (m = 1) signal, and when m = M every frame has been transformed;
the signal energy ratio of each frame in the sample audio frequency domain is calculated according to formula (1), 9999 frames in total;
extracting the frequency position corresponding to the maximum energy-ratio point of each frame of signal as a feature; the one-dimensional sample audio feature F_S of length 9999 frames is represented as follows:
F_S = (1, 1, 3, …, 6)
FIG. 5 illustrates retrieving an audio signature and a sample audio signature;
on the basis of the above embodiments, step S3 is described with reference to fig. 6:
s3.1, taking the first retrieval audio of the retrieval audio library as the original audio, and intercepting from it an audio segment whose duration matches that of the sample audio, i.e., a 20-second segment (R = 20), as the audio to be detected;
s3.2, extracting the features of the audio to be detected; the feature length is 4999 frames (L1 = 4999);
s3.3, mixing the audio feature to be detected with the original audio feature of length 29999 frames (L2 = 29999), starting from point j = 1 and continuing until j = L2 − L1 = 25000, which covers all mixing cases of the two features;
s3.4, performing Fourier transform, modulus, logarithm, and inverse Fourier transform on each mixing result of S3.3 to obtain cepstral-domain data; setting the first 400 data points of the cepstral domain to 0 to eliminate the autocorrelation-peak interference; finding the peak value in the first half of the cepstral-domain data; calculating the similarity between the audio feature to be detected and the original audio feature from that peak; recording each similarity result with its corresponding mixing-point information; and returning the mixing point j with the highest recorded similarity as the optimal mixing point for sample audio features of this length;
best mixing point reference values in the examples of Table 1
on the basis of the above embodiments, step S4 is described with reference to fig. 7:
s4.1, mixing the sample audio features with the retrieval audio features of the ith (i = 1) segment of the retrieval audio library at the optimal mixing point, the 5400th frame (j = 5400), to obtain the mixed features;
s4.2, performing Fourier transform, modulus, logarithm, and inverse Fourier transform on the mixed features obtained in S4.1 to obtain cepstral-domain data; setting the first 400 data points of the cepstral domain to 0 to eliminate the autocorrelation-peak interference; finding the peak value in the first half of the cepstral-domain data; and calculating the similarity between the sample audio features and the retrieval audio features from that peak;
FIG. 8 shows the cepstrum analysis result of the mixture of the retrieval audio features and the sample audio features of FIG. 5, with the first 400 data points of the cepstral domain set to 0;
s4.3, recording the similarity result and the corresponding retrieval audio information, incrementing i and returning to S4.1; when i = I (I = 20), returning the retrieval audio information with the highest recorded similarity as the sample audio retrieval result;
table 2 sample audio retrieval results in the examples
It can be seen from the above table that the highest similarity is 89%, corresponding to the retrieved audio information being "min.
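The S4.1–S4.3 loop can be sketched end to end. The feature arrays, names, and guard width below are illustrative assumptions, and the raw cepstral peak value stands in for the patent's similarity score:

```python
import numpy as np

def power_cepstrum(x):
    """FFT -> modulus squared -> log -> inverse FFT."""
    return np.real(np.fft.ifft(np.log(np.abs(np.fft.fft(x)) ** 2 + 1e-12)))

def retrieve(sample_feat, library, j, guard=400):
    """S4.1-S4.3: mix the sample feature into every retrieval feature at
    the optimal mixing point j and rank by the cepstral-peak similarity."""
    scores = {}
    for name, feat in library.items():
        mixed = np.asarray(feat, dtype=float).copy()
        end = min(j + len(sample_feat), len(mixed))
        mixed[j:end] += sample_feat[: end - j]        # S4.1: mix at point j
        cep = power_cepstrum(mixed)
        cep[:guard] = 0                               # eliminate the autocorrelation peak
        scores[name] = cep[: len(mixed) // 2].max()   # S4.2: first-half peak
    best = max(scores, key=scores.get)                # S4.3: highest similarity wins
    return best, scores
```

When the sample is an excerpt of one library entry, mixing recreates the delayed-copy structure of equation (5) only for that entry, so its cepstral peak dominates and the matching name is returned.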
Claims (2)
1. A fast audio retrieval method based on cepstrum analysis is characterized in that:
s1, establishing a retrieval audio feature library, and extracting frequency domain features from each section of audio in the retrieval audio library according to the signal energy ratio in a circulating manner to establish the retrieval audio feature library for retrieval;
constructing an audio feature library; traversing each segment of audio in the retrieval audio library, extracting the frequency position corresponding to the maximum energy-ratio point of each frame of signal in the segment's frequency domain as a feature, representing each segment of retrieval audio features as a one-dimensional array, and naming each segment of retrieval audio features by its retrieval audio information, thereby constructing the retrieval audio feature library;
s2, extracting sample audio fingerprints, and extracting frequency domain features from the sample audio input by the user according to the signal energy ratio to form sample audio features;
extracting sample audio fingerprints, wherein the sample audio features are used for matching with the features of the retrieval audio feature library; extracting a frequency position corresponding to a maximum point of the energy ratio of each frame of signal in a sample audio frequency domain as a feature, wherein the extraction result represents a section of sample audio feature by a group of one-dimensional arrays;
extracting audio features according to a signal energy ratio in a frequency domain, wherein each frame of signal in the frequency domain starts from a first data point and sequentially goes down, the energy value of the current data point is divided by the sum of the energy of the whole frame of data points, the obtained result is a row of energy ratios, the frequency position corresponding to the maximum value in the row of energy ratios is found out, all frames of a section of audio are calculated according to the method, and the calculation result represents a section of audio features by a group of one-dimensional arrays, wherein the dimension is the number of audio frames;
s3, determining an optimal mixing point according to the sample length, wherein the sample audio features and the retrieval audio features are mixed at the optimal mixing point, so that the cepstrum analysis result of the mixing features is more accurate;
determining an optimal mixing point according to the sample length; taking a first section of retrieval audio in a retrieval audio library as original audio, intercepting an audio segment with the duration consistent with that of sample audio in the original audio as to-be-detected audio, performing sliding mixing on the characteristics of the to-be-detected audio and the characteristics of the original audio in a window mode, and performing cepstrum analysis on the mixed characteristics to obtain a mixing point corresponding to the highest similarity between the two characteristics as an optimal mixing point;
and S4, sample audio retrieval, namely searching for the retrieval audio features with the highest similarity to the sample audio features in the retrieval audio feature library by using a cepstrum analysis method, wherein the corresponding retrieval audio information is the sample audio retrieval result.
2. The fast audio retrieval method based on cepstrum analysis according to claim 1, wherein step S4, sample audio retrieval, comprises: cyclically mixing the sample audio features with each segment of retrieval audio features in the retrieval audio feature library at the optimal mixing point to obtain mixed features, performing cepstrum analysis on the mixed features to calculate the similarity between the two audio features, recording the similarity results and the corresponding retrieval audio information, and returning the retrieval audio information with the highest recorded similarity, i.e., the audio retrieval result.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011145738.9A CN112214635B (en) | 2020-10-23 | 2020-10-23 | Fast audio retrieval method based on cepstrum analysis |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112214635A CN112214635A (en) | 2021-01-12 |
CN112214635B true CN112214635B (en) | 2022-09-13 |
Family
ID=74054994
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011145738.9A Active CN112214635B (en) | 2020-10-23 | 2020-10-23 | Fast audio retrieval method based on cepstrum analysis |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112214635B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112784097B (en) * | 2021-01-21 | 2024-03-26 | 百果园技术(新加坡)有限公司 | Audio feature generation method and device, computer equipment and storage medium |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7177795B1 (en) * | 1999-11-10 | 2007-02-13 | International Business Machines Corporation | Methods and apparatus for semantic unit based automatic indexing and searching in data archive systems |
CN101465122A (en) * | 2007-12-20 | 2009-06-24 | 株式会社东芝 | Method and system for detecting phonetic frequency spectrum wave crest and phonetic identification |
CN105788603A (en) * | 2016-02-25 | 2016-07-20 | 深圳创维数字技术有限公司 | Audio identification method and system based on empirical mode decomposition |
Family Cites Families (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP4599420B2 (en) * | 2008-02-29 | 2010-12-15 | 株式会社東芝 | Feature extraction device |
CN103065627B (en) * | 2012-12-17 | 2015-07-29 | 中南大学 | Special purpose vehicle based on DTW and HMM evidence fusion is blown a whistle sound recognition methods |
CN107357875B (en) * | 2017-07-04 | 2021-09-10 | 北京奇艺世纪科技有限公司 | Voice search method and device and electronic equipment |
CN108369813B (en) * | 2017-07-31 | 2022-10-25 | 深圳和而泰智能家居科技有限公司 | Specific voice recognition method, apparatus and storage medium |
CN107731220B (en) * | 2017-10-18 | 2019-01-22 | 北京达佳互联信息技术有限公司 | Audio identification methods, device and server |
EP3701528B1 (en) * | 2017-11-02 | 2023-03-15 | Huawei Technologies Co., Ltd. | Segmentation-based feature extraction for acoustic scene classification |
CN108735230B (en) * | 2018-05-10 | 2020-12-04 | 上海麦克风文化传媒有限公司 | Background music identification method, device and equipment based on mixed audio |
CN110880329B (en) * | 2018-09-06 | 2022-11-04 | 腾讯科技(深圳)有限公司 | Audio identification method and equipment and storage medium |
CN109117622B (en) * | 2018-09-19 | 2020-09-01 | 北京容联易通信息技术有限公司 | Identity authentication method based on audio fingerprints |
CN110363120B (en) * | 2019-07-01 | 2020-07-10 | 上海交通大学 | Intelligent terminal touch authentication method and system based on vibration signal |
CN110310661B (en) * | 2019-07-03 | 2021-06-11 | 云南康木信科技有限责任公司 | Method for calculating two-path real-time broadcast audio time delay and similarity |
CN110767248B (en) * | 2019-09-04 | 2022-03-22 | 太原理工大学 | Anti-modulation interference audio fingerprint extraction method |
CN111710348A (en) * | 2020-05-28 | 2020-09-25 | 厦门快商通科技股份有限公司 | Pronunciation evaluation method and terminal based on audio fingerprints |
Non-Patent Citations (1)
Title |
---|
Features for Content-Based Audio Retrieval; Christian Breiteneder et al.; Advances in Computers, vol. 78, pp. 71–150; 2010-12-31 *
Similar Documents
Publication | Publication Date | Title |
---|---|---|
KR100820385B1 (en) | Robust and Invariant Audio Pattern Matching | |
CN102959624B (en) | System and method for audio media recognition | |
US20130275421A1 (en) | Repetition Detection in Media Data | |
CN110335625A (en) | The prompt and recognition methods of background music, device, equipment and medium | |
JP2004534274A (en) | Method and system for displaying music information on a digital display for use in content-based multimedia information retrieval | |
CN103971689A (en) | Audio identification method and device | |
CN110599987A (en) | Piano note recognition algorithm based on convolutional neural network | |
US8718803B2 (en) | Method for calculating measures of similarity between time signals | |
US9122753B2 (en) | Method and apparatus for retrieving a song by hummed query | |
CN112035696B (en) | Voice retrieval method and system based on audio fingerprint | |
CN112214635B (en) | Fast audio retrieval method based on cepstrum analysis | |
Zhang et al. | System and method for automatic singer identification | |
CN112732972B (en) | Audio fingerprint generation system and method | |
Patil et al. | Content-based audio classification and retrieval: A novel approach | |
Wang et al. | Audio fingerprint based on spectral flux for audio retrieval | |
KR101661666B1 (en) | Hybrid audio fingerprinting apparatus and method | |
Serrano et al. | A new fingerprint definition for effective song recognition | |
Ferreira et al. | Time complexity evaluation of cover song identification algorithms | |
Cai et al. | Cross-similarity measurement of music sections: A framework for large-scale cover song identification | |
Qian et al. | A novel algorithm for audio information retrieval based on audio fingerprint | |
CN114125368B (en) | Conference audio participant association method and device and electronic equipment | |
Li et al. | Query by humming based on music phrase segmentation and matching | |
CN117877525A (en) | Audio retrieval method and device based on variable granularity characteristics | |
Kamesh et al. | Audio fingerprinting with higher matching depth at reduced computational complexity | |
Yunjing | Similarity matching method for music melody retrieval |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||