CN112214635B - Fast audio retrieval method based on cepstrum analysis - Google Patents

Fast audio retrieval method based on cepstrum analysis

Info

Publication number
CN112214635B
CN112214635B (application CN202011145738.9A)
Authority
CN
China
Prior art keywords
audio
retrieval
features
sample
section
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011145738.9A
Other languages
Chinese (zh)
Other versions
CN112214635A (en)
Inventor
邵玉斌
杨贵安
龙华
杜庆治
刘晶
唐维康
陈亮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Kunming University of Science and Technology
Original Assignee
Kunming University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Kunming University of Science and Technology filed Critical Kunming University of Science and Technology
Priority to CN202011145738.9A priority Critical patent/CN112214635B/en
Publication of CN112214635A publication Critical patent/CN112214635A/en
Application granted granted Critical
Publication of CN112214635B publication Critical patent/CN112214635B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/60Information retrieval; Database structures therefor; File system structures therefor of audio data
    • G06F16/61Indexing; Data structures therefor; Storage structures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/60Information retrieval; Database structures therefor; File system structures therefor of audio data
    • G06F16/68Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/683Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/22Matching criteria, e.g. proximity measures

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Multimedia (AREA)
  • Library & Information Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a fast audio retrieval method based on cepstrum analysis, belonging to the technical field of audio retrieval. The method comprises the following steps: first, a retrieval audio feature library is constructed by cyclically extracting frequency-domain features, according to the signal energy ratio, from each piece of audio in the retrieval audio library; second, sample audio fingerprints are extracted, i.e. frequency-domain features are extracted, according to the signal energy ratio, from the sample audio input by the user to form the sample audio features; third, the optimal mixing point is determined according to the sample length, and the sample audio features and the retrieval audio features are mixed at the optimal mixing point so that the cepstrum analysis result of the mixed features is more accurate; fourth, for sample audio retrieval, the retrieval audio feature with the highest similarity to the sample audio features is found in the retrieval audio feature library by cepstrum analysis, and the corresponding retrieval audio information is the sample audio retrieval result. The audio features extracted by the method are highly representative and occupy little space; during retrieval, cepstrum analysis is performed directly on the mixture of the two audio features, and since it applies only Fourier-related transforms to the mixed features, the computation is small and fast. Therefore, addressing the low retrieval efficiency of prior-art audio retrieval applications, the invention greatly improves retrieval efficiency while guaranteeing retrieval accuracy.

Description

Fast audio retrieval method based on cepstrum analysis
Technical Field
The invention relates to a fast audio retrieval method based on cepstrum analysis, and belongs to the technical field of audio retrieval.
Background
With the advent of the big data age, the amount of multimedia information on the Internet has grown explosively. Traditional text-label-based audio retrieval builds different label libraries for different fields; it lacks generality and cannot meet people's multimedia retrieval needs. A method was therefore proposed that constructs an audio fingerprint database and performs audio retrieval through hash indexing. Most subsequent audio retrieval algorithms are improvements on this idea, but the key problem remains that retrieval accuracy and retrieval efficiency cannot be balanced.
Disclosure of Invention
The invention aims to provide a fast audio retrieval method based on cepstrum analysis, which can greatly improve the retrieval efficiency on the premise of ensuring the retrieval accuracy.
In order to achieve the purpose, the invention adopts the following technical scheme:
s1, constructing a retrieval audio feature library: frequency-domain features are cyclically extracted, according to the signal energy ratio, from each piece of audio in the retrieval audio library to build the retrieval audio feature library used for retrieval;
s2, extracting sample audio fingerprints: frequency-domain features are extracted, according to the signal energy ratio, from the sample audio input by the user to form the sample audio features;
s3, determining the optimal mixing point according to the sample length: the sample audio features and the retrieval audio features are mixed at the optimal mixing point, so that the cepstrum analysis result of the mixed features is more accurate;
s4, sample audio retrieval: the retrieval audio feature with the highest similarity to the sample audio features is found in the retrieval audio feature library by cepstrum analysis, and the corresponding retrieval audio information is the sample audio retrieval result;
preferably, before steps S1 and S2 are performed, the method of extracting audio features according to the signal energy ratio in the frequency domain is described as follows:
in the frequency domain, each frame of the signal is processed from the first data point downward in order: the energy value of the current data point is divided by the total energy of all data points in the frame, and the results form a column of energy ratios, computed as in formula (1):

Er(k_t, k_f) = E(k_t, k_f) / Σ_{q=1}^{Q} E(k_t, q)    (1)

wherein Er denotes the energy ratio, E denotes the energy, k_t denotes the corresponding time point, k_f denotes the corresponding frequency point, and Q denotes the upper frequency limit;
finding the frequency position corresponding to the maximum of that column of energy ratios; all frames of a piece of audio are processed in this way, and the results represent the audio's features as a one-dimensional array whose dimension equals the number of audio frames;
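A minimal sketch of this extraction step (the helper name, frame length, hop size, and window choice are assumptions not fixed by the text):

```python
import numpy as np

def extract_features(signal, frame_len=1024, hop=512):
    """Per-frame feature: the frequency bin where the energy ratio peaks.

    Each bin's energy is divided by the frame's total energy, as in
    formula (1); the argmax over bins is the extracted feature value.
    """
    window = np.hamming(frame_len)
    features = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len] * window
        energy = np.abs(np.fft.rfft(frame)) ** 2   # E(k_t, k_f)
        er = energy / energy.sum()                 # energy ratios of the frame
        features.append(int(np.argmax(er)))        # peak frequency position
    return np.array(features)                      # one value per frame
```

Since the denominator is constant within a frame, the argmax of the energy ratio coincides with that of the raw energy; the ratio merely normalizes each frame's values to sum to 1.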
preferably, the step S1 includes:
s1.1, performing Fourier transform on each frame of signals obtained after framing and windowing the retrieved audio signals;
s1.2, extracting the frequency position corresponding to the maximum point of each frame's energy ratio in the frequency domain as the feature; the extraction result represents one piece of retrieval audio features by a one-dimensional array, and the one-dimensional retrieval audio feature F_T of length N is represented as follows:
F_T = (f_t1  f_t2  f_t3  …  f_tN)    (2)
s1.3, traversing all audios in the search audio library in the modes of S1.1 and S1.2, and naming each section of search audio features by respective search audio information so as to construct a search audio feature library;
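The traversal in s1.3 amounts to keying each feature array by its audio's information; a minimal sketch (the function name is hypothetical, and the toy extractor below merely stands in for the energy-ratio feature described above):

```python
import numpy as np

def build_feature_library(audio_library, extract):
    """s1.3 sketch: traverse every retrieval audio and store its 1-D
    feature array under the audio's own name/information."""
    return {name: extract(signal) for name, signal in audio_library.items()}

# Toy usage with two fake "audio" arrays and a trivial stand-in extractor.
library = {"song_a": np.ones(100), "song_b": np.zeros(100)}
features = build_feature_library(library, lambda s: np.array([int(s.sum())]))
```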
preferably, the step S2 includes:
s2.1, the user inputs an audio clip of any retrieval audio as the sample audio signal; the clip lasts R seconds and may contain white noise at a certain signal-to-noise ratio;
s2.2, performing Fourier transform on each frame of signals obtained after framing and windowing the sample audio signals;
s2.3, extracting the frequency position corresponding to the maximum point of each frame's energy ratio in the frequency domain as the feature; the extraction result represents one piece of sample audio features by a one-dimensional array, and the one-dimensional sample audio feature F_S of length M is represented as follows:
F_S = (f_s1  f_s2  f_s3  …  f_sM)    (3)
preferably, the step S3 includes:
s3.1, taking the first piece of retrieval audio in the retrieval audio library as the original audio, and intercepting an R-second audio clip from the original audio as the audio to be detected;
s3.2, extracting the features of the audio to be detected, of length L1;
s3.3, sliding-mixing the features of the audio to be detected with the original audio features of length L2, starting from the first point and ending at point L2 − L1;
s3.4, performing Fourier transform, taking the modulus, taking the logarithm and performing the inverse Fourier transform on each mixing result of s3.3 to obtain cepstral-domain data; after eliminating the autocorrelation peak of the cepstral data, finding the peak in the first half of the data and calculating from it the similarity between the features of the audio to be detected and the original audio features; recording each similarity result with its mixing point information, and returning the mixing point with the highest recorded similarity, which is the optimal mixing point τ for sample audio features of this length;
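The sliding search of s3.3–s3.4 can be sketched as follows (function and parameter names are hypothetical; the cepstral peak height stands in for the similarity value, whose exact formula the text does not give):

```python
import numpy as np

def cepstral_peak(mixed, skip=400):
    """FFT -> modulus -> log -> inverse FFT; zero the first `skip`
    points to suppress the autocorrelation peak, then return the
    largest value in the first half of the cepstrum."""
    cep = np.real(np.fft.ifft(np.log(np.abs(np.fft.fft(mixed)) + 1e-12)))
    half = cep[:len(cep) // 2].copy()
    half[:skip] = 0.0
    return float(half.max())

def best_mixing_point(sample_feat, original_feat, skip=400):
    """Slide the short feature over the long one (additive mixing at each
    offset j) and return the offset with the highest cepstral peak."""
    L1, L2 = len(sample_feat), len(original_feat)
    best_j, best_sim = 0, -np.inf
    for j in range(L2 - L1 + 1):
        mixed = original_feat.astype(float).copy()
        mixed[j:j + L1] += sample_feat
        sim = cepstral_peak(mixed, skip=skip)
        if sim > best_sim:
            best_j, best_sim = j, sim
    return best_j, best_sim
```

The search is run once per sample length; the returned offset is then reused for every library entry in step S4.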
preferably, the step S4 includes:
s4.1, mixing the sample audio features with the retrieval audio features at the optimal mixing point calculated in step S3 to obtain the mixed features;
s4.2, performing Fourier transform, taking the modulus, taking the logarithm and performing the inverse Fourier transform on the mixed features obtained in s4.1 to obtain cepstral-domain data; after eliminating the autocorrelation peak of the cepstral data, finding the peak in the first half of the data and calculating from it the similarity between the sample audio features and the retrieval audio features;
Both the retrieval audio features and the sample audio features are one-dimensional arrays, so they can be regarded as two waveform signals. The principle of calculating the similarity between two waveform signals by cepstrum analysis of their mixture is as follows:
suppose that the retrieved audio feature signal is x 1 (t) the sample audio feature signal is x 2 (t):
Figure BDA0002739673050000031
Where τ (τ > 0) is the optimum mixing point, i.e. the audio feature signal x is retrieved 1 (t) and sample audio feature signal x 2 Time delay between (t), a 1 And a 2 Is an attenuation factor of the signal, and a 1 ∈(0,1),a 2 ∈(0,1);
The mixed signal is constructed as:
y(t) = x_1(t) + x_2(t) = x(t) * (a_1 · δ(t) + a_2 · δ(t − τ))    (5)
wherein * denotes convolution;
according to the definition of the power cepstrum, the cepstrum analysis result of the mixed signal is:

ŷ(t) = x̂(t) + log(a_1²) · δ(t) + Σ_{n=1}^{∞} ((−1)^{n+1}/n) · (a_2/a_1)^n · [δ(t − nτ) + δ(t + nτ)]    (6)

wherein x̂(t) is the power cepstrum of x(t);
As can be seen from formula (6), the power cepstrum of the mixed signal contains impulse peaks at the optimal mixing point position and at its integer multiples. After eliminating the autocorrelation-peak interference in the cepstrum, the impulse peak is found in the first half of the power cepstrum, and the similarity is calculated from it;
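The impulse predicted by formula (6) can be checked numerically; the sketch below (delay and attenuation factors are arbitrary example values) mixes a signal with its attenuated, delayed copy and locates the power-cepstrum peak:

```python
import numpy as np

# y(t) = a1*x(t) + a2*x(t - tau): a signal mixed with a delayed echo.
rng = np.random.default_rng(1)
x = rng.standard_normal(2048)
a1, a2, tau = 1.0, 0.6, 100
y = a1 * x.copy()
y[tau:] += a2 * x[:-tau]

# Power cepstrum: inverse FFT of the log power spectrum.
cep = np.real(np.fft.ifft(np.log(np.abs(np.fft.fft(y)) ** 2 + 1e-12)))

half = cep[:len(cep) // 2].copy()
half[:20] = 0.0              # suppress the low-quefrency / autocorrelation region
peak = int(np.argmax(half))  # formula (6) predicts an impulse at tau
```

With these values the dominant peak sits at (or within a point or two of) quefrency 100, with smaller peaks near integer multiples of the delay.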
s4.3, cyclically performing s4.1 and s4.2 on the sample features and each piece of retrieval audio features in the retrieval audio feature library, recording the similarity results and the corresponding retrieval audio information, and returning the retrieval audio information with the highest recorded similarity, which is the sample audio retrieval result;
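Steps s4.1–s4.3 can then be sketched as one scoring loop (names are hypothetical; again the cepstral peak height stands in for the similarity value):

```python
import numpy as np

def cepstral_peak(mixed, skip=400):
    """s4.2 pipeline: FFT, modulus, log, inverse FFT; zero the first
    `skip` points, then take the peak of the first half."""
    cep = np.real(np.fft.ifft(np.log(np.abs(np.fft.fft(mixed)) + 1e-12)))
    half = cep[:len(cep) // 2].copy()
    half[:skip] = 0.0
    return float(half.max())

def retrieve(sample_feat, feature_library, mix_point, skip=400):
    """s4.3 sketch: mix the sample into every library feature at the
    precomputed optimal mixing point and keep the best-scoring name."""
    best_name, best_sim = None, -np.inf
    for name, feat in feature_library.items():
        end = mix_point + len(sample_feat)
        if end > len(feat):
            continue                      # sample would overrun this feature
        mixed = feat.astype(float).copy()
        mixed[mix_point:end] += sample_feat
        sim = cepstral_peak(mixed, skip=skip)
        if sim > best_sim:
            best_name, best_sim = name, sim
    return best_name, best_sim
```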
compared with the traditional method that the index value of each data point pair in the two audio fingerprints needs to be matched in the Hash search, the search method provided by the invention directly performs cepstrum analysis on the mixed result of the two audio features to obtain the similarity, and the cepstrum analysis only performs Fourier correlation transformation on the mixed result, so that the calculation amount is small and the calculation speed is high.
Drawings
FIG. 1 is a flowchart illustrating an exemplary audio retrieval method according to the present invention
FIG. 2 is a schematic diagram of feature extraction according to the present invention
FIG. 3 is a flow chart of the present invention for constructing a search audio feature library
FIG. 4 is a flow chart of extracting sample audio features according to the present invention
FIG. 5 is a waveform of the audio characteristics of the present invention
FIG. 6 is a flow chart of the present invention for determining the optimal mixing point according to the sample length
FIG. 7 is a flow chart of mixed feature cepstrum analysis in accordance with the present invention
FIG. 8 shows the cepstrum analysis result of the mixed features of the present invention
Detailed Description
The invention will be further described by means of embodiments in conjunction with the accompanying drawings.
To solve the prior art's inability to balance retrieval accuracy and retrieval efficiency, an embodiment of the invention provides a fast audio retrieval method based on cepstrum analysis, as shown in fig. 1, comprising the following operations:
s1, constructing a retrieval audio feature library: frequency-domain features are cyclically extracted, according to the signal energy ratio, from each piece of audio in the retrieval audio library to build the retrieval audio feature library used for retrieval;
s2, extracting sample audio fingerprints: frequency-domain features are extracted, according to the signal energy ratio, from the sample audio input by the user to form the sample audio features;
s3, determining the optimal mixing point according to the sample length: the sample audio features and the retrieval audio features are mixed at the optimal mixing point, so that the cepstrum analysis result of the mixed features is more accurate;
s4, sample audio retrieval: the retrieval audio feature with the highest similarity to the sample audio features is found in the retrieval audio feature library by cepstrum analysis, and the corresponding retrieval audio information is the sample audio retrieval result;
before proceeding to steps S1 and S2, it is necessary to explain a method of extracting audio features according to the signal energy ratio in the frequency domain in the embodiment:
as shown in fig. 2, in the frequency domain each frame of the signal is processed from the first data point downward in order: the energy value of the current data point is divided by the total energy of the frame's data points, yielding a column of energy ratios; the frequency position corresponding to the maximum of this column is found; all frames of a piece of audio are processed in this way, and the results represent the audio's features as a one-dimensional array whose dimension equals the number of audio frames;
on the basis of the above embodiment, step S1 is described with reference to fig. 3:
s1.1, performing Fourier transform on each frame of signals obtained after framing and windowing the retrieved audio signals;
the retrieval audio library contains 19 pieces of audio (I = 19), each 1 min long; the 1st piece (i = 1) is divided into 29999 frames (N = 29999), and a Fourier transform is applied to the 1st frame (n = 1); when n = N, every frame has been transformed;
s1.2, calculating the signal energy ratio of each frame of the retrieval audio in the frequency domain according to formula (1), for all 29999 frames;
extracting the frequency position corresponding to the maximum point of each frame's energy ratio as the feature; the one-dimensional retrieval audio feature F_T, of length 29999 frames, is represented as follows:
F_T = (5  1  3  …  8)
s1.3, traversing all audio in the retrieval audio library in the manner of s1.1 and s1.2, naming each piece of retrieval audio features by its retrieval audio information; when i = I, the whole retrieval audio library has been traversed, and the construction of the retrieval audio feature library is complete;
on the basis of the above embodiment, step S2 is described with reference to fig. 4:
s2.1, the user inputs an audio clip of any retrieval audio as the sample audio signal; the clip lasts 20 seconds (R = 20) and contains white noise with a signal-to-noise ratio of 10 dB;
s2.2, performing a Fourier transform on each frame obtained after framing and windowing the sample audio signal;
the sample audio input by the user is divided into 9999 frames (M = 9999), and a Fourier transform is applied to the 1st frame (m = 1); when m = M, every frame has been transformed;
s2.3, calculating the signal energy ratio of each frame of the sample audio in the frequency domain according to formula (1), for all 9999 frames;
extracting the frequency position corresponding to the maximum point of each frame's energy ratio as the feature; the one-dimensional sample audio feature F_S, of length 9999 frames, is represented as follows:
F_S = (1  1  3  …  6)
FIG. 5 shows the retrieval audio features and the sample audio features;
on the basis of the above embodiment, step S3 is described with reference to fig. 6:
s3.1, taking the first piece of retrieval audio in the retrieval audio library as the original audio, and intercepting from the original audio an audio clip whose duration matches the sample audio, i.e. 20 seconds (R = 20), as the audio to be detected;
s3.2, extracting the features of the audio to be detected; their length is 4999 frames (L1 = 4999);
s3.3, sliding-mixing the features of the audio to be detected with the original audio features of length 29999 frames (L2 = 29999), starting from point j = 1 up to j = L2 − L1 = 25000, which covers all mixing positions of the two features;
s3.4, performing Fourier transform, taking the modulus, taking the logarithm and performing the inverse Fourier transform on each mixing result of s3.3 to obtain cepstral-domain data; setting the first 400 cepstral points to 0 to eliminate autocorrelation-peak interference; finding the peak in the first half of the cepstral data and calculating from it the similarity between the features of the audio to be detected and the original audio features; recording each similarity with its mixing point, and returning the mixing point j with the highest recorded similarity, which is the optimal mixing point for sample audio features of this length;
best mixing point reference values in the examples of Table 1
On the basis of the above embodiment, step S4 is described with reference to fig. 7:
s4.1, mixing the sample audio features with the retrieval audio features of the i-th piece (i = 1) of the retrieval audio library at the optimal mixing point, frame 5400 (j = 5400), to obtain the mixed features;
s4.2, performing Fourier transform, taking the modulus, taking the logarithm and performing the inverse Fourier transform on the mixed features obtained in s4.1 to obtain cepstral-domain data; setting the first 400 cepstral points to 0 to eliminate autocorrelation-peak interference; finding the peak in the first half of the cepstral data and calculating from it the similarity between the sample audio features and the retrieval audio features;
FIG. 8 shows the cepstrum analysis result of mixing the retrieval audio features and the sample audio features of FIG. 5, with the first 400 cepstral-domain points set to 0;
s4.3, recording the similarity result and the corresponding retrieval audio information, incrementing i and returning to s4.1; when i = I, the retrieval audio information with the highest recorded similarity is returned as the sample audio retrieval result;
table 2 sample audio retrieval results in the examples
It can be seen from the above table that the highest similarity is 89%, corresponding to the retrieved audio information being "min.

Claims (2)

1. A fast audio retrieval method based on cepstrum analysis is characterized in that:
s1, establishing a retrieval audio feature library, and extracting frequency domain features from each section of audio in the retrieval audio library according to the signal energy ratio in a circulating manner to establish the retrieval audio feature library for retrieval;
constructing the audio feature library: traversing each piece of audio in the retrieval audio library, extracting the frequency position corresponding to the maximum point of each frame's energy ratio in the audio's frequency domain as the feature, representing each piece of retrieval audio features by a one-dimensional array, and naming each piece of retrieval audio features by its retrieval audio information, thereby constructing the retrieval audio feature library;
s2, extracting sample audio fingerprints, and extracting frequency domain features from the sample audio input by the user according to the signal energy ratio to form sample audio features;
extracting sample audio fingerprints, wherein the sample audio features are used for matching with the features of the retrieval audio feature library; extracting a frequency position corresponding to a maximum point of the energy ratio of each frame of signal in a sample audio frequency domain as a feature, wherein the extraction result represents a section of sample audio feature by a group of one-dimensional arrays;
extracting audio features according to a signal energy ratio in a frequency domain, wherein each frame of signal in the frequency domain starts from a first data point and sequentially goes down, the energy value of the current data point is divided by the sum of the energy of the whole frame of data points, the obtained result is a row of energy ratios, the frequency position corresponding to the maximum value in the row of energy ratios is found out, all frames of a section of audio are calculated according to the method, and the calculation result represents a section of audio features by a group of one-dimensional arrays, wherein the dimension is the number of audio frames;
s3, determining an optimal mixing point according to the sample length, wherein the sample audio features and the retrieval audio features are mixed at the optimal mixing point, so that the cepstrum analysis result of the mixing features is more accurate;
determining an optimal mixing point according to the sample length; taking a first section of retrieval audio in a retrieval audio library as original audio, intercepting an audio segment with the duration consistent with that of sample audio in the original audio as to-be-detected audio, performing sliding mixing on the characteristics of the to-be-detected audio and the characteristics of the original audio in a window mode, and performing cepstrum analysis on the mixed characteristics to obtain a mixing point corresponding to the highest similarity between the two characteristics as an optimal mixing point;
and S4, sample audio retrieval, namely searching for the retrieval audio features with the highest similarity to the sample audio features in the retrieval audio feature library by using a cepstrum analysis method, wherein the corresponding retrieval audio information is the sample audio retrieval result.
2. The fast audio retrieval method based on cepstral analysis according to claim 1, wherein: step S4, sample audio retrieval; mixing the sample audio feature cycle with each section of retrieval audio features in the audio feature library by using the optimal mixing point to obtain mixed features, performing cepstrum analysis on the mixed features to calculate the similarity between the two audio features, recording the similarity result and corresponding retrieval audio information, and returning the retrieval audio information corresponding to the highest similarity in the record, namely the audio retrieval result.
CN202011145738.9A 2020-10-23 2020-10-23 Fast audio retrieval method based on cepstrum analysis Active CN112214635B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011145738.9A CN112214635B (en) 2020-10-23 2020-10-23 Fast audio retrieval method based on cepstrum analysis

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011145738.9A CN112214635B (en) 2020-10-23 2020-10-23 Fast audio retrieval method based on cepstrum analysis

Publications (2)

Publication Number Publication Date
CN112214635A CN112214635A (en) 2021-01-12
CN112214635B true CN112214635B (en) 2022-09-13

Family

ID=74054994

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011145738.9A Active CN112214635B (en) 2020-10-23 2020-10-23 Fast audio retrieval method based on cepstrum analysis

Country Status (1)

Country Link
CN (1) CN112214635B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112784097B (en) * 2021-01-21 2024-03-26 百果园技术(新加坡)有限公司 Audio feature generation method and device, computer equipment and storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7177795B1 (en) * 1999-11-10 2007-02-13 International Business Machines Corporation Methods and apparatus for semantic unit based automatic indexing and searching in data archive systems
CN101465122A (en) * 2007-12-20 2009-06-24 株式会社东芝 Method and system for detecting phonetic frequency spectrum wave crest and phonetic identification
CN105788603A (en) * 2016-02-25 2016-07-20 深圳创维数字技术有限公司 Audio identification method and system based on empirical mode decomposition

Family Cites Families (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP4599420B2 (en) * 2008-02-29 2010-12-15 株式会社東芝 Feature extraction device
CN103065627B (en) * 2012-12-17 2015-07-29 中南大学 Special purpose vehicle based on DTW and HMM evidence fusion is blown a whistle sound recognition methods
CN107357875B (en) * 2017-07-04 2021-09-10 北京奇艺世纪科技有限公司 Voice search method and device and electronic equipment
CN108369813B (en) * 2017-07-31 2022-10-25 深圳和而泰智能家居科技有限公司 Specific voice recognition method, apparatus and storage medium
CN107731220B (en) * 2017-10-18 2019-01-22 北京达佳互联信息技术有限公司 Audio identification methods, device and server
EP3701528B1 (en) * 2017-11-02 2023-03-15 Huawei Technologies Co., Ltd. Segmentation-based feature extraction for acoustic scene classification
CN108735230B (en) * 2018-05-10 2020-12-04 上海麦克风文化传媒有限公司 Background music identification method, device and equipment based on mixed audio
CN110880329B (en) * 2018-09-06 2022-11-04 腾讯科技(深圳)有限公司 Audio identification method and equipment and storage medium
CN109117622B (en) * 2018-09-19 2020-09-01 北京容联易通信息技术有限公司 Identity authentication method based on audio fingerprints
CN110363120B (en) * 2019-07-01 2020-07-10 上海交通大学 Intelligent terminal touch authentication method and system based on vibration signal
CN110310661B (en) * 2019-07-03 2021-06-11 云南康木信科技有限责任公司 Method for calculating two-path real-time broadcast audio time delay and similarity
CN110767248B (en) * 2019-09-04 2022-03-22 太原理工大学 Anti-modulation interference audio fingerprint extraction method
CN111710348A (en) * 2020-05-28 2020-09-25 厦门快商通科技股份有限公司 Pronunciation evaluation method and terminal based on audio fingerprints

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7177795B1 (en) * 1999-11-10 2007-02-13 International Business Machines Corporation Methods and apparatus for semantic unit based automatic indexing and searching in data archive systems
CN101465122A (en) * 2007-12-20 2009-06-24 株式会社东芝 Method and system for detecting phonetic frequency spectrum wave crest and phonetic identification
CN105788603A (en) * 2016-02-25 2016-07-20 深圳创维数字技术有限公司 Audio identification method and system based on empirical mode decomposition

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Features for Content-Based Audio Retrieval; Christian Breiteneder et al.; Advances in Computers; 2010-12-31; Vol. 78; pp. 71–150 *

Also Published As

Publication number Publication date
CN112214635A (en) 2021-01-12

Similar Documents

Publication Publication Date Title
KR100820385B1 (en) Robust and Invariant Audio Pattern Matching
CN102959624B (en) System and method for audio media recognition
US20130275421A1 (en) Repetition Detection in Media Data
CN110335625A (en) The prompt and recognition methods of background music, device, equipment and medium
JP2004534274A (en) Method and system for displaying music information on a digital display for use in content-based multimedia information retrieval
CN103971689A (en) Audio identification method and device
CN110599987A (en) Piano note recognition algorithm based on convolutional neural network
US8718803B2 (en) Method for calculating measures of similarity between time signals
US9122753B2 (en) Method and apparatus for retrieving a song by hummed query
CN112035696B (en) Voice retrieval method and system based on audio fingerprint
CN112214635B (en) Fast audio retrieval method based on cepstrum analysis
Zhang et al. System and method for automatic singer identification
CN112732972B (en) Audio fingerprint generation system and method
Patil et al. Content-based audio classification and retrieval: A novel approach
Wang et al. Audio fingerprint based on spectral flux for audio retrieval
KR101661666B1 (en) Hybrid audio fingerprinting apparatus and method
Serrano et al. A new fingerprint definition for effective song recognition
Ferreira et al. Time complexity evaluation of cover song identification algorithms
Cai et al. Cross-similarity measurement of music sections: A framework for large-scale cover song identification
Qian et al. A novel algorithm for audio information retrieval based on audio fingerprint
CN114125368B (en) Conference audio participant association method and device and electronic equipment
Li et al. Query by humming based on music phrase segmentation and matching
CN117877525A (en) Audio retrieval method and device based on variable granularity characteristics
Kamesh et al. Audio fingerprinting with higher matching depth at reduced computational complexity
Yunjing Similarity matching method for music melody retrieval

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant