CN110600038A - Audio fingerprint dimension reduction method based on discrete Gini coefficient - Google Patents
Audio fingerprint dimension reduction method based on discrete Gini coefficient
- Publication number
- CN110600038A CN110600038A CN201910784077.5A CN201910784077A CN110600038A CN 110600038 A CN110600038 A CN 110600038A CN 201910784077 A CN201910784077 A CN 201910784077A CN 110600038 A CN110600038 A CN 110600038A
- Authority
- CN
- China
- Prior art keywords
- audio
- fingerprint
- discrete
- dimension
- Gini
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification techniques
- G10L17/02—Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction
- G10L17/04—Training, enrolment or model building
- G10L17/06—Decision making techniques; Pattern matching strategies
- G10L17/08—Use of distortion metrics or a particular distance between probe pattern and reference templates
- G10L17/14—Use of phonemic categorisation or speech recognition prior to speaker recognition or verification
- G10L21/0208—Noise filtering
- G10L25/51—Speech or voice analysis techniques specially adapted for comparison or discrimination
- G10L25/54—Speech or voice analysis techniques specially adapted for comparison or discrimination, for retrieval
Abstract
The invention relates to an audio fingerprint dimensionality reduction method based on discrete Gini coefficient calculation, aimed at the problem of high-dimensional audio fingerprint features. It comprises building a target sound bank by class, extracting fingerprint features of sample audio, and introducing discrete Gini coefficients to reduce the dimensionality of the audio fingerprint features. According to the invention, a discrete Gini coefficient is computed for every dimension of the audio fingerprint; the coefficient of each dimension reflects how well different audio signals can be discriminated in that dimension, and the dimensionality is reduced by keeping the dimensions with large discrete Gini coefficients and deleting those with small ones. The sample audio fingerprint database constructed from the reduced fingerprint features has a smaller data volume and a higher utilization rate.
Description
Technical Field
The invention belongs to the field of intelligent audio applications, and particularly relates to an audio fingerprint dimension reduction method based on discrete Gini coefficient calculation.
Background
In recent years, intelligent systems have become popular and widely researched. Intelligent identification of audio is an important cornerstone of artificial-intelligence development, and it cannot be separated from audio feature extraction. Among the many audio features, the audio fingerprint has been the most popular in recent years: it is a compact, content-based digital signature that represents the important acoustic features of a piece of audio, its main purpose being to represent a large amount of audio data with a small amount of digital information. Compared with traditional audio features, it has the advantages of small data volume, strong noise robustness, and a relatively simple feature-extraction process, and is widely applied in music identification, advertisement supervision, copyright protection and other fields. However, audio fingerprints suffer from high dimensionality, which slows recognition speed in audio identification and occupies a large amount of computer memory. If the dimensionality of the audio fingerprint can be reduced, its data volume can be reduced to a great extent, while the speed of audio retrieval is improved and audio-identification performance is enhanced.
Disclosure of Invention
Aiming at the problem of high-dimensional audio fingerprint features, the invention introduces a fingerprint discrete Gini coefficient for each dimension of the audio fingerprint; the discrete Gini coefficient of a dimension reflects how well different audio signals can be distinguished in that dimension. The larger the discrete Gini coefficient of a given fingerprint bit, the larger the difference between different audio signals at that bit and the better its distinguishability; conversely, a small coefficient indicates poor distinguishability. Therefore, by retaining the bits with good distinguishability and removing those with poor distinguishability, a high-dimensional audio fingerprint can be converted to a lower dimension, effectively reducing the fingerprint's data volume.
The technical scheme of the invention solves the problem of an oversized audio fingerprint feature library: the sample audio fingerprint library constructed from the reduced fingerprint features has a smaller data volume and a higher utilization rate. The invention mainly comprises the following steps:
Step 1: Constructing the target sound library by class
The design classifies the audio and builds the library according to audio feature types or the existing data conditions. Because different audio types have different characteristics, classifying the audio makes it easier to find commonalities among audio features; skipping classification would degrade the quality of the fingerprint dimension reduction. The existing audio data are stored by class, and fingerprint features are then extracted for each class. The audio fingerprint extraction algorithm flow chart is shown in fig. 1.
Step 2: Extracting fingerprint features of the sample audio
Various types of audio data are selected from the constructed audio library as the original sample audio. Fingerprint features are extracted from the original samples, and discrete Gini coefficients are introduced to reduce the dimensionality of the fingerprint features. The specific process is as follows:
Step 2.1: Preprocessing the target audio
Before extracting the data features, preprocessing is performed first. Preprocessing comprises band-pass filtering, pre-emphasis, and framing.
(1) An audio signal sampled at 8 kHz is taken as the processing object and band-pass filtered to extract the frequency components most important to human hearing; a band-pass filter with a pass band of 20 Hz-4000 Hz is selected to process the signal. In this design a finite impulse response (FIR) filter performs the band-pass filtering:
y(p) = Σ_{l=0}^{T−1} h(l)·x(p−l)
where T is the number of sampling points of the processed signal, p is the time-domain index, h(l) are the FIR filter coefficients, x(p) is the input signal, and y(p) is the band-pass-filtered signal.
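As a sketch, the FIR band-pass step above is a direct convolution of the input with the filter coefficients. The kernel `h` below is a hypothetical stand-in, since the patent does not give the actual coefficients or filter order:

```python
import numpy as np

def bandpass_fir(x, h):
    """Apply an FIR filter: y(p) = sum_l h(l) * x(p - l)."""
    # Truncate the full convolution so the output has the input's length.
    return np.convolve(x, h, mode="full")[: len(x)]

# Hypothetical 3-tap kernel standing in for a 20 Hz-4000 Hz band-pass design.
h = np.array([0.1, 0.8, 0.1])
x = np.array([1.0, 0.0, 0.0, 0.0])  # unit impulse input
y = bandpass_fir(x, h)              # impulse response reproduces the kernel
```

With an impulse input, the output simply replays the filter coefficients, which is a quick sanity check on the convolution indexing.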
(2) The design uses a 6 dB/octave digital filter to pre-emphasize the band-pass-filtered signal y(p), boosting the high-frequency content of the preprocessed signal so that the spectrum becomes relatively flat and the speech signal can be handled with the same signal-to-noise ratio over the whole band from low to high frequency. The pre-emphasis is:
s(p) = y(p) − μ·y(p−1)
where μ is the pre-emphasis coefficient, taken as 0.96, and s(p) is the pre-emphasized signal.
(3) The pre-emphasized signal is windowed and framed. Frames could be cut contiguously, but overlapped framing is generally used so that the transition between frames is smooth and frame continuity is maintained. The audio is framed with a frame length of 0.064 s and 75% overlap between frames, and each frame is weighted with a Hanning window of the same length:
w(p) = 0.5·(1 − cos(2πp/(T − 1))), 0 ≤ p ≤ T − 1
where T is the length of the Hanning window, which is also the frame length of one frame of audio.
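The pre-emphasis and overlapped framing steps can be sketched as below. At 8 kHz, a 0.064 s frame is 512 samples and 75% overlap gives a 128-sample hop; the function names are illustrative, not from the patent:

```python
import numpy as np

def preemphasize(y, mu=0.96):
    """s(p) = y(p) - mu * y(p-1), with s(0) = y(0)."""
    s = np.copy(y)
    s[1:] -= mu * y[:-1]
    return s

def frame_and_window(s, fs=8000, frame_sec=0.064, overlap=0.75):
    """Cut the signal into overlapped frames, each weighted by a Hanning window."""
    T = int(fs * frame_sec)        # 512 samples per frame
    hop = int(T * (1 - overlap))   # 128-sample hop for 75% overlap
    window = np.hanning(T)
    n_frames = 1 + (len(s) - T) // hop
    return np.stack([s[i * hop: i * hop + T] * window for i in range(n_frames)])

frames = frame_and_window(preemphasize(np.random.randn(8000)))
```

One second of 8 kHz audio yields 59 windowed frames of 512 samples each under these settings.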
Step2.2: fingerprint feature extraction for audio data
A digital audio fingerprint can be regarded as the condensed essence of a piece of audio, containing the part of the audio data most important to hearing. Since the fingerprint's dimensionality is a key factor in its data volume and retrieval speed, this technique introduces discrete Gini coefficients for feature dimension reduction of the audio fingerprint; before the reduction, fingerprint features are first extracted from the audio, as follows:
(1) Apply the discrete Fourier transform to each frame of the framed audio signal; the transform formula is:
X(k) = Σ_{p=0}^{T−1} x(p)·e^{−j2πkp/T}, k = 0, 1, …, T−1
where X(k) is the frequency-domain signal, x(p) is the time-domain signal, k is the frequency index, and T is the sample length of the discrete Fourier transform.
(2) Divide the DFT spectrum into sub-bands. 33 non-overlapping frequency bands are selected, distributed over 20-4000 Hz (where the human ear's recognition of audio is mainly concentrated) and logarithmically equally spaced (because the ear's response to different frequencies is not linear). The starting frequency of the m-th sub-band, which is also the ending frequency of the (m−1)-th sub-band, can be expressed as
f(m) = Fmin·(Fmax/Fmin)^{m/M}, m = 0, 1, …, M
where Fmin is the lower mapping limit (here 20 Hz), Fmax is the upper mapping limit (here 4000 Hz), and M is the number of sub-bands (here 33).
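Under the stated parameters (Fmin = 20 Hz, Fmax = 4000 Hz, M = 33), the logarithmically equally spaced band edges can be computed as a short sketch:

```python
import numpy as np

def subband_edges(fmin=20.0, fmax=4000.0, M=33):
    """f(m) = fmin * (fmax/fmin)**(m/M): M+1 edges bounding M log-spaced bands."""
    m = np.arange(M + 1)
    return fmin * (fmax / fmin) ** (m / M)

edges = subband_edges()  # 34 edges: edges[0] = 20 Hz, edges[-1] = 4000 Hz
```

Because the spacing is logarithmic, the ratio between consecutive edges is constant, which is what "equally logarithmically spaced" means here.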
(3) Calculate the energy of each sub-band of each frame, i.e. the energy of each of the selected 33 non-overlapping bands. With the m-th sub-band starting at frequency f(m) and ending at f(m+1), and X(k) the discrete-Fourier-transformed frequency-domain signal, the energy of the m-th sub-band of the n-th frame is
E(n, m) = Σ_{k = f(m)}^{f(m+1)−1} |X(k)|²
(4) Generate each frame's sub-fingerprint. A bit-difference decision over the 33 sub-band energies of a frame produces the frame's 32-bit binary code (sub-fingerprint). With E(n, m) the m-th sub-band energy of the n-th frame and F(n, m) the corresponding binary bit, each frame's binary fingerprint bits are decided as
F(n, m) = 1 if E(n, m) − E(n, m+1) > 0, and F(n, m) = 0 otherwise, m = 1, 2, …, 32
Each frame of audio thus yields one 32-dimensional binary sub-fingerprint. A single sub-fingerprint carries little information, and an audio fingerprint feature typically consists of many sub-fingerprints.
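Steps (1)-(4) can be sketched as one function that turns a 512-sample frame into a 32-bit sub-fingerprint. The energy-difference bit rule follows the description above; the helper is illustrative, not the patent's exact implementation:

```python
import numpy as np

def subfingerprint(frame, fs=8000, fmin=20.0, fmax=4000.0, M=33):
    """One windowed frame -> 32-bit sub-fingerprint via band-energy differences."""
    T = len(frame)
    spectrum = np.abs(np.fft.rfft(frame)) ** 2          # per-bin energy |X(k)|^2
    freqs = np.fft.rfftfreq(T, d=1.0 / fs)              # bin center frequencies
    edges = fmin * (fmax / fmin) ** (np.arange(M + 1) / M)
    energy = np.array([
        spectrum[(freqs >= edges[m]) & (freqs < edges[m + 1])].sum()
        for m in range(M)
    ])
    # Bit m is 1 when band m carries more energy than band m+1 (32 bits from 33 bands).
    return (energy[:-1] > energy[1:]).astype(int)

bits = subfingerprint(np.random.randn(512))
```

Each call yields a length-32 vector of 0/1 bits, one sub-fingerprint per frame.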
Step 3: Reducing the dimensionality of the audio fingerprint features
After the fingerprint extraction process, each frame of audio data yields one 32-dimensional binary sub-fingerprint. For a segment of audio, the fingerprint therefore consists of many binary sub-fingerprints, and its data volume is still large; in practical applications it is desirable to reduce the fingerprint dimensionality further so as to effectively reduce the data volume. The design provides an audio fingerprint dimension reduction method based on discrete Gini coefficient calculation: a discrete Gini coefficient is computed for each dimension of the audio fingerprint, and the fingerprint dimension is reduced according to the size of each dimension's coefficient.
The per-dimension discrete Gini coefficients are obtained by taking every 50 frames of the audio fingerprint as a group. The discrete Gini coefficient of a dimension reflects the dispersion of that dimension's fingerprint data, i.e. how much it differs across audio: the larger the discrete Gini coefficient at a given fingerprint position, the larger the difference between different audio signals there and the better its distinguishability, and conversely the poorer it is. The design keeps the dimensions with good distinguishability and removes those with poor distinguishability, converting the 32-dimensional audio fingerprint to a lower dimension and reducing the fingerprint's data volume.
Step3.1: calculating discrete kini coefficient of each dimension of audio fingerprint
The derivation process and the specific steps are as follows:
(1) Obtain the discrete Lorenz curve of the audio fingerprint. The discrete Lorenz curve is the key curve for deriving the discrete Gini coefficient; it is formed from the cumulative-proportion vector w_j of the fingerprint data, where j is the dimension index of the audio fingerprint, j = 1, 2, …, 32. The cumulative-proportion vector w_j is computed as follows:
The audio fingerprints of each class in the audio fingerprint database are processed by frames, every 50 frames of fingerprint data forming one group, N groups in total. The j-th-dimension cumulative fingerprint-data vector is constructed as
Q_j = (q_{j,1}, q_{j,2}, …, q_{j,N})
where q_{j,i} is the sum of the j-th-dimension fingerprint data over the 50 frames of group i, and the groups are numbered so that q_{j,1} ≤ q_{j,2} ≤ … ≤ q_{j,N}. The j-th-dimension cumulative-proportion vector w_j is then constructed with elements
w_{j,i} = (Σ_{t=1}^{i} q_{j,t}) / (Σ_{t=1}^{N} q_{j,t}), i = 1, 2, …, N
The curve formed by the elements of the proportion vector is the discrete Lorenz curve. Specifically, the discrete Lorenz curve of the j-th dimension of the audio fingerprint is drawn by taking the normalized cumulative group number i/N as the abscissa and the j-th-dimension cumulative proportion w_{j,i} as the ordinate; joining the discrete points (i/N, w_{j,i}) yields the discrete Lorenz curve of the j-th dimension.
(2) Obtain the discrete Gini coefficients of every dimension of each class of audio fingerprint. With the obtained discrete Lorenz curve as the boundary line, the Gini coefficient of the j-th dimension of the audio fingerprint is
G_j = S_a / (S_a + S_b)
where S_a is the closed area enclosed by the diagonal OA and the discrete Lorenz curve, with point O at (0, 0) and point A at (1, 1); S_b is the closed area enclosed by the coordinate segments OB and BA and the discrete Lorenz curve, with point B at (1, 0); and G_j is the Gini coefficient of the j-th dimension. The fingerprint discrete Gini coefficient computation auxiliary graph is shown in fig. 4.
Since S_a + S_b is the closed area enclosed by the diagonal OA and the segments OB and BA, S_a + S_b = 1/2. Because the audio fingerprint is discrete, the design discretizes the formula into
G_j = 1 − (1/N)·Σ_{i=1}^{N} (w_{j,i−1} + w_{j,i}), with w_{j,0} = 0
which gives the j-th-dimension discrete Gini coefficient of the audio fingerprint, where i is the group number and w_{j,i} is the j-th-dimension cumulative proportion of the first i groups.
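A minimal sketch of the per-dimension discrete Gini computation, assuming each group's value is the sum of that dimension's fingerprint bits over its 50 frames and that groups are sorted ascending to form a valid Lorenz curve (names are illustrative):

```python
import numpy as np

def discrete_gini(groups):
    """Discrete Gini coefficient of one fingerprint dimension.

    groups: per-group sums of the dimension's fingerprint data (N groups).
    Builds cumulative shares w_0..w_N of the Lorenz curve and returns
    G = 1 - (1/N) * sum_i (w_{i-1} + w_i), the trapezoidal form of 1 - 2*S_b.
    """
    groups = np.sort(np.asarray(groups, dtype=float))   # ascending group sums
    N = len(groups)
    w = np.concatenate([[0.0], np.cumsum(groups) / groups.sum()])
    return 1.0 - (w[:-1] + w[1:]).sum() / N

# Perfectly uniform groups give G = 0; concentration pushes G toward 1.
g_equal = discrete_gini([5, 5, 5, 5])    # -> 0.0
g_skewed = discrete_gini([0, 0, 0, 20])  # -> 0.75
```

The uniform case collapses the Lorenz curve onto the diagonal OA (S_a = 0), while concentrating all data in one group maximizes the gap between the curve and the diagonal.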
Step3.2: carrying out fingerprint dimensionality reduction by counting discrete kini coefficients of all dimensions of different types of audio fingerprints
The discrete Gini coefficients of each dimension of the audio fingerprint are trained in combination with the application scenario. A training set is constructed from audio data by data type, or by combining existing data sets; the discrete Gini coefficient of each fingerprint dimension is then calculated for the audio data of each training set and statistically analyzed.
The audio fingerprint discrete Gini coefficient of the invention is suitable for dimension reduction of all kinds of audio fingerprints. The following audio types are selected as analysis objects, and the discrete Gini coefficient of each fingerprint dimension is statistically analyzed:
as shown in fig. 5, discrete kini coefficients of each dimension of an audio fingerprint of an abnormal sound are obtained by counting discrete kini coefficients of 32 dimensions of the audio fingerprint of the abnormal sound from an audio fingerprint library, audio fingerprint dimension information lower than a threshold value can be omitted by setting a threshold value of the discrete kini coefficients of the audio fingerprint, audio fingerprint dimension information higher than the threshold value is retained, the threshold value range of the discrete kini coefficients of the abnormal sound audio fingerprint is 0.36-0.38, it can be seen from the figure that the discrete kini coefficients of each dimension of the audio fingerprint are different in size, the discrete kini coefficients of some dimensions are smaller, such as dimensions 2, 25 and 26, and the discrete kini coefficients of some dimensions are larger, such as dimensions 11, 20 and 28, and the size of the discrete kini coefficients of the fingerprint represents the difference of the audio fingerprint data of the dimension. At this time, if the threshold of the discrete kini coefficient of the fingerprint is set to 0.37, it can be seen from the figure that there are 2 nd, 5 th, 24 th, 25 th and 26 th dimensions of the discrete kini coefficient of the fingerprint smaller than 0.37, which shows that the differences of the fingerprint data of each frame of the 5 dimensions are small and can be eliminated, and the dimension of each frame of the audio fingerprint is reduced by eliminating the dimensions with the small differences, thereby reducing the data volume of the audio fingerprint database.
FIG. 6 shows the discrete Gini coefficients of each dimension of the normal-speech audio fingerprint, obtained by computing statistics over its 32 fingerprint dimensions from the audio fingerprint library. By setting a threshold on the fingerprint discrete Gini coefficient, dimensions below the threshold can be discarded while dimensions above it are retained; for normal-speech fingerprints the threshold range is 0.36-0.38. The figure shows that the dimensions with smaller discrete Gini coefficients include the 2nd, 22nd and 25th, and those with larger coefficients include the 18th, 26th and 28th. If the threshold is set to 0.37, the 2nd, 22nd and 25th dimensions fall below it, indicating that the per-frame fingerprint data of these 3 dimensions differ little and can be discarded.
FIG. 7 shows the discrete Gini coefficients of each dimension of the song-class audio fingerprint, obtained by computing statistics over its 32 fingerprint dimensions from the audio fingerprint library. Dimensions below a set threshold can be discarded while dimensions above it are retained; for song-class fingerprints the threshold range is 0.36-0.38. The figure shows relatively large differences among the coefficients, which are relatively dispersed: the 2nd, 14th, 15th and 25th dimensions have small coefficients, while the 4th, 5th, 11th, 18th and 29th have large ones. If the threshold is set to 0.37, six dimensions, including the 2nd, 14th, 15th, 25th and 26th, fall below it, indicating that the per-frame fingerprint data of these 6 dimensions differ little and can be discarded.
These figures show that although the fingerprint discrete Gini coefficients and their trends across dimensions differ among audio types, several dimensions always have small coefficients, meaning the audio fingerprints are poorly distinguishable in those dimensions. The data in those dimensions can therefore be removed, converting the 32-dimensional audio fingerprint to a lower dimension and effectively reducing the fingerprint's data volume.
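The retention step can be sketched as simple threshold-based column selection over the per-dimension Gini coefficients (the 0.37 threshold and the values below are illustrative):

```python
import numpy as np

def select_dimensions(gini, threshold=0.37):
    """Return 0-based indices of fingerprint dimensions whose discrete Gini
    coefficient is at or above the threshold (the well-distinguishing ones)."""
    return np.flatnonzero(np.asarray(gini) >= threshold)

def reduce_fingerprints(fps, keep):
    """fps: (n_frames, 32) binary fingerprint matrix; keep only chosen columns."""
    return fps[:, keep]

gini = np.array([0.30, 0.40, 0.38, 0.20, 0.45])   # toy 5-dimension example
keep = select_dimensions(gini)                     # dimensions 1, 2, 4 survive
reduced = reduce_fingerprints(np.zeros((10, 5), dtype=int), keep)
```

Dropping the below-threshold columns shrinks every stored sub-fingerprint, which is exactly where the database-size saving comes from.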
The invention has the advantages that:
1. Low algorithm complexity and greater flexibility
2. Smaller audio-feature data volume than traditional audio features
3. The discrete Gini coefficient, introduced as the dimension-reduction algorithm, has low complexity and can process large batches of data
4. Higher robustness of the audio fingerprint after dimension reduction
Drawings
FIG. 1 is a flow chart of an audio fingerprint extraction algorithm
FIG. 2 is a 32-D audio fingerprint block diagram of a segment of audio
FIG. 3 Lorenz curve of the 1st dimension of an abnormal-sound audio fingerprint
FIG. 4 is an auxiliary graph for the discrete Gini coefficient calculation of the audio fingerprint
FIG. 5 is a graph of discrete Gini coefficients for each dimension of an abnormal-sound audio fingerprint
FIG. 6 is a graph of discrete Gini coefficients for each dimension of a normal-speech audio fingerprint
FIG. 7 is a graph of discrete Gini coefficients for each dimension of a song-class audio fingerprint
Detailed Description
The invention provides an audio fingerprint dimension reduction method based on discrete Gini coefficient calculation.
The technical scheme of the invention solves the problem of an oversized audio fingerprint feature library: the sample audio fingerprint library constructed from the reduced fingerprint features has a smaller data volume and a higher utilization rate. The invention mainly comprises the following steps:
Step 1: Constructing the target sound library by class
The design classifies the audio and builds the library according to audio feature types or the existing data conditions. Because different audio types have different characteristics, classifying the audio makes it easier to find commonalities among audio features; skipping classification would degrade the quality of the fingerprint dimension reduction.
The audio fingerprint discrete Gini coefficient of the invention is suitable for dimension reduction of all kinds of audio fingerprints, so the existing audio data must be stored by class and the fingerprint features of each class extracted separately. The audio fingerprint extraction algorithm flow chart is shown in fig. 1.
Step 2: Extracting fingerprint features of the sample audio
Various types of audio data are selected from the constructed audio library as the original sample audio. Fingerprint features are extracted from the original samples, and discrete Gini coefficients are introduced to reduce the dimensionality of the fingerprint features. The specific process is as follows:
Step 2.1: Preprocessing the target audio
Before extracting the data features, preprocessing is performed first. Preprocessing comprises band-pass filtering, pre-emphasis, and framing.
(1) An audio signal sampled at 8 kHz is taken as the processing object and band-pass filtered; to extract the frequency components most important to human hearing, a band-pass filter with a pass band of 20 Hz-4000 Hz is selected to process the signal. In this design a finite impulse response (FIR) filter performs the band-pass filtering:
y(p) = Σ_{l=0}^{T−1} h(l)·x(p−l)
where T is the number of sampling points of the processed signal, p is the time-domain index, h(l) are the FIR filter coefficients, x(p) is the input signal, and y(p) is the band-pass-filtered signal.
(2) The design uses a 6 dB/octave digital filter to pre-emphasize the band-pass-filtered signal y(p), boosting the high-frequency content of the preprocessed signal so that the spectrum becomes relatively flat and the speech signal can be handled with the same signal-to-noise ratio over the whole band from low to high frequency. The pre-emphasis is:
s(p) = y(p) − μ·y(p−1)
where μ is the pre-emphasis coefficient, taken as 0.96, and s(p) is the pre-emphasized signal.
(3) The pre-emphasized signal is windowed and framed. Although frames could be segmented contiguously, overlapped segmentation is generally used to smooth the transition between frames and maintain their continuity. The audio is framed with a frame length of 0.064 seconds and 75% overlap between adjacent frames, and each frame is weighted with a Hanning window of the same length. The windowing formula (the original formula image is not reproduced here; reconstructed as the standard Hanning window) is:

w(p) = 0.5 - 0.5·cos(2πp/(T - 1)), p = 0, 1, …, T - 1

wherein T is the length of the Hanning window, which is also the frame length of one frame of audio.
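The 0.064 s frames with 75% overlap and Hanning weighting described above can be sketched as:

```python
import numpy as np

def frame_and_window(s, fs=8000, frame_sec=0.064, overlap=0.75):
    """Split a signal into overlapping frames, each weighted by a Hanning window."""
    frame_len = int(fs * frame_sec)           # 512 samples at 8 kHz
    hop = int(frame_len * (1 - overlap))      # 128-sample hop -> 75% overlap
    n_frames = 1 + (len(s) - frame_len) // hop
    w = np.hanning(frame_len)                 # 0.5 - 0.5*cos(2*pi*p/(T-1))
    return np.stack([s[i * hop : i * hop + frame_len] * w
                     for i in range(n_frames)])

frames = frame_and_window(np.random.default_rng(0).standard_normal(8000))
```

One second of 8 kHz audio yields 59 frames of 512 windowed samples each.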
Step2.2: fingerprint feature extraction for audio data
A digital audio fingerprint can be regarded as the concentrated essence of a piece of audio: it captures the perceptually most important part of the audio data. The dimension of the audio fingerprint is a key factor in both the fingerprint data volume and the retrieval speed, so this technique introduces the discrete Gini coefficient to reduce the fingerprint's dimensionality. Before dimension reduction, fingerprint features are first extracted from the audio, as follows:
(1) A discrete Fourier transform is applied to each frame of the framed audio signal; the transform formula (the original formula image is not reproduced here; reconstructed as the standard DFT) is:

X(k) = Σ x(p)·e^(-j2πkp/T), p = 0, 1, …, T - 1

where X(k) is the frequency-domain signal, x(p) is the time-domain signal, k is the frequency index, and T is the sample length of the discrete Fourier transform.
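The per-frame DFT can be sketched with NumPy's FFT, which evaluates the same sum for all k at once:

```python
import numpy as np

T = 512
p = np.arange(T)
x = np.sin(2 * np.pi * 64 * p / T)     # tone landing exactly on bin k = 64

# X(k) = sum over p of x(p) * exp(-1j * 2*pi*k*p / T)
X = np.fft.fft(x)
```

For a unit-amplitude sinusoid on bin k, |X(k)| = T/2, which makes the transform easy to verify.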
(2) The frequency-domain signal after the discrete Fourier transform is divided into spectrum sub-bands: 33 non-overlapping frequency bands are selected from the spectrum, distributed over the range 20-4000 Hz (where human-ear recognition of audio is mainly concentrated) with logarithmically equal spacing (because the ear's response to different frequencies is not linear). The starting frequency of the mth sub-band, i.e. the ending frequency f(m) of the (m-1)th sub-band, can be expressed (the original formula image is not reproduced here; reconstructed as log-spaced boundaries) as:

f(m) = Fmin·(Fmax/Fmin)^(m/M), m = 0, 1, …, M

where Fmin is the lower mapping limit, here 20 Hz, Fmax is the upper mapping limit, here 4000 Hz, and M is the number of sub-bands, here 33.
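The 34 logarithmically spaced boundaries implied by the reconstructed formula can be generated as follows (the closed form is an assumption consistent with "equally logarithmically spaced"):

```python
import numpy as np

def band_edges(f_min=20.0, f_max=4000.0, n_bands=33):
    """M+1 logarithmically equally spaced boundaries for M sub-bands:
    f(m) = f_min * (f_max / f_min) ** (m / M)."""
    m = np.arange(n_bands + 1)
    return f_min * (f_max / f_min) ** (m / n_bands)

edges = band_edges()   # edges[m] = start of band m = end of band m-1
```

Logarithmic spacing means the ratio between consecutive edges is constant, which is the property the test below checks.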
(3) The energy of each sub-band of each frame of audio is calculated for the 33 selected non-overlapping bands. With f(m) the starting frequency of the mth sub-band, f(m+1) its ending frequency, and X(k) the frequency-domain signal after the discrete Fourier transform, the energy of the mth sub-band of the nth frame (the original formula image is not reproduced here; reconstructed as a band energy sum) is:

E(n, m) = Σ |X(k)|², summed over f(m) ≤ k < f(m+1)
(4) A sub-fingerprint is generated for each frame of audio: bit-difference discrimination is performed on the 33 sub-band energies obtained for the frame, yielding a 32-bit binary code, i.e. the frame's sub-fingerprint. With E(n, m) the mth sub-band energy of the nth frame and F(n, m) the corresponding binary bit information, a discrimination rule consistent with this description (the original formula image is not reproduced here; reconstructed as an adjacent-band energy comparison) is:

F(n, m) = 1 if E(n, m) - E(n, m+1) > 0, otherwise F(n, m) = 0
as can be seen from the above formula, each frame of audio finally generates a 32-dimensional binary sub-fingerprint information, the sub-fingerprint contains less information, and an audio fingerprint feature often consists of a plurality of sub-fingerprints.
In practical applications, the 32-dimensional binary sub-fingerprint generated from one frame of audio carries too little information to accurately retrieve or identify target audio. Audio retrieval and identification therefore commonly use an audio fingerprint block, formed by combining at least 256 audio sub-fingerprints, such as the 32-dimensional audio fingerprint block extracted from a section of audio shown in Fig. 2.
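Steps (3) and (4) above, band energies followed by bit derivation, can be sketched for one frame as follows. The bit rule used here (bit m = 1 iff E(m) > E(m+1)) is one plausible reading of the patent's "bit difference discrimination"; the exact rule is not reproduced in this text:

```python
import numpy as np

def sub_fingerprint(X, edges, fs=8000):
    """33 band energies E(m) and a 32-bit sub-fingerprint for one frame.
    The adjacent-band comparison rule is an assumption (see lead-in)."""
    T = len(X)
    Xh = X[: T // 2]                              # positive-frequency half
    freqs = np.arange(T // 2) * fs / T            # bin -> frequency in Hz
    E = np.array([np.sum(np.abs(Xh[(freqs >= lo) & (freqs < hi)]) ** 2)
                  for lo, hi in zip(edges[:-1], edges[1:])])
    bits = (E[:-1] > E[1:]).astype(np.uint8)      # 33 energies -> 32 bits
    return E, bits

edges = 20.0 * (4000.0 / 20.0) ** (np.arange(34) / 33)          # log-spaced bands
X = np.fft.fft(np.sin(2 * np.pi * 64 * np.arange(512) / 512))   # 1 kHz at 8 kHz
E, bits = sub_fingerprint(X, edges)
```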
Step3, reducing the dimension of the audio fingerprint characteristics
After the audio fingerprint extraction process, each frame of audio data yields one 32-dimensional binary sub-fingerprint. For a segment of audio, the fingerprint therefore consists of many binary sub-fingerprints, and its data volume is still large; in practical applications it is desirable to reduce the fingerprint dimension further so as to cut the fingerprint data volume effectively. This design proposes an audio fingerprint dimension reduction method based on the discrete Gini coefficient: the discrete Gini coefficient is computed for each dimension of the audio fingerprint, and the fingerprint dimension is reduced according to the magnitude of each dimension's coefficient.
The data of each fingerprint dimension are grouped every 50 frames (50 is illustrative, not limiting), and the discrete Gini coefficient of that dimension is computed. The coefficient reflects the dispersion of the dimension's data, i.e. how much the audio fingerprints differ there: the larger the discrete Gini coefficient at a given position of the fingerprint, the more different audio clips differ at that position and the better its discriminability; conversely, a small coefficient means poor discriminability. The design keeps the dimensions with good discriminability, removes those with poor discriminability, and thus converts the 32-dimensional audio fingerprint to a lower dimension, reducing the fingerprint data volume.
Step3.1: calculating discrete kini coefficient of each dimension of audio fingerprint
The method comprises the following specific steps:
(1) Obtain the discrete Lorenz curve of the audio fingerprint. The discrete Lorenz curve is the key curve from which the discrete Gini coefficient is computed; it is formed from the proportion vector W_j of the accumulated fingerprint data, where j is the dimension index of the audio fingerprint, j = 1, 2, …, 32. The proportion vector W_j is computed as follows (the original formula images are not reproduced here; reconstructed from the claims):

The audio fingerprints in the fingerprint database are processed frame by frame, every 50 frames of fingerprint data forming one group, N groups in total. The jth-dimension accumulated fingerprint data vector Q_j is constructed as:

Q_j(i) = Σ F(n, j), summed over all frames n in groups 1 through i, i = 1, 2, …, N

where i is the group number and F(n, j) is the binary bit information of frame n in dimension j.

The jth-dimension accumulated fingerprint data proportion vector W_j is then defined element-wise as:

W_j(i) = Q_j(i) / Q_j(N), i = 1, 2, …, N

The curve formed by the elements of the proportion vector is the discrete Lorenz curve. Concretely, the discrete Lorenz curve of dimension j is drawn by taking the accumulated group number (normalized, i/N) as the abscissa and the accumulated fingerprint data proportion W_j(i) of dimension j as the ordinate; the discrete points (i/N, W_j(i)) are plotted, and the curve linking them is the discrete Lorenz curve of dimension j. As an example, Fig. 3 shows the Lorenz curve of the 1st dimension of an abnormal-sound audio fingerprint.
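The grouping and accumulation that produce the discrete Lorenz curve can be sketched as follows (the accumulation formula is reconstructed from the claims, since the original formula images are not reproduced):

```python
import numpy as np

def lorenz_proportions(bits_j, group=50):
    """Proportion vector W_j for one fingerprint dimension j.
    bits_j holds that dimension's 0/1 value for every frame; frames are
    grouped 50 at a time, the group sums are accumulated (Q_j), and each
    accumulated sum is divided by the grand total."""
    bits_j = np.asarray(bits_j)
    n_groups = len(bits_j) // group
    sums = bits_j[: n_groups * group].reshape(n_groups, group).sum(axis=1)
    q = np.cumsum(sums).astype(float)     # Q_j(i): accumulated fingerprint data
    return q / q[-1]                      # W_j(i): Lorenz-curve ordinates

# If every frame has bit 1, all groups are identical and the
# Lorenz curve is exactly the diagonal.
w = lorenz_proportions(np.ones(500, dtype=int))
```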
(2) Compute the discrete Gini coefficients of all dimensions of the various audio fingerprints. Taking the obtained discrete Lorenz curve as the boundary line, the Gini coefficient of dimension j of the audio fingerprint is:

G_j = S_a / (S_a + S_b)

where S_a is the closed area enclosed by the diagonal OA and the discrete Lorenz curve, with O = (0, 0) and A = (1, 1); S_b is the closed area enclosed by the coordinate segments OB and BA and the discrete Lorenz curve, with B = (1, 0); and G_j is the Gini coefficient of dimension j of the audio fingerprint. Fig. 4 shows the auxiliary diagram for computing the fingerprint's discrete Gini coefficient.

From the above, S_a + S_b is the closed area enclosed by the diagonal OA and the segments OB and BA, i.e. S_a + S_b = 1/2. Because the audio fingerprint is discrete, the design discretizes the formula (the original formula image is not reproduced here; reconstructed as a trapezoidal-rule discretization) as:

G_j = 1 - (1/N)·Σ (W_j(i-1) + W_j(i)), summed over i = 1, …, N, with W_j(0) = 0

This yields the jth-dimension discrete Gini coefficient G_j of the audio fingerprint, where i is the group number and W_j(i) is the ith accumulated fingerprint data proportion of dimension j.
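The discretized Gini computation can be sketched as follows. The trapezoidal form G_j = 1 − (1/N)·Σ(W_j(i−1) + W_j(i)) is a reconstruction consistent with S_a + S_b = 1/2, since the patent's exact discretization image is not reproduced:

```python
import numpy as np

def discrete_gini(w):
    """Discrete Gini coefficient from Lorenz ordinates w = [W(1), ..., W(N)].
    Equals 1 minus twice the trapezoidal area under the Lorenz curve,
    with W(0) = 0 prepended."""
    w0 = np.concatenate(([0.0], np.asarray(w, dtype=float)))
    n = len(w)
    area = np.sum(w0[:-1] + w0[1:]) / (2 * n)   # trapezoid rule, step 1/N
    return 1.0 - 2.0 * area

g_equal = discrete_gini(np.arange(1, 11) / 10)           # diagonal curve
g_concentrated = discrete_gini(np.r_[np.zeros(9), 1.0])  # all mass in last group
```

With this form the diagonal Lorenz curve gives exactly 0, and a maximally concentrated curve over N = 10 groups gives 1 − 1/N = 0.9.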
Step3.2: carrying out fingerprint dimensionality reduction by counting discrete kini coefficients of all dimensions of different types of audio fingerprints
The discrete Gini coefficients of the audio fingerprint dimensions are trained for the application scenario: a training set is constructed from the audio data by data type, or by combining existing data sets, and the discrete Gini coefficient of each fingerprint dimension is computed and statistically analyzed over the audio of each training set.
The discrete Gini coefficient of this design is suitable for dimension reduction of various kinds of audio fingerprints. The following audio types are selected as analysis objects, and the discrete Gini coefficient of each fingerprint dimension is statistically analyzed:
as shown in fig. 5, discrete kini coefficients of each dimension of an audio fingerprint of an abnormal sound are obtained by counting discrete kini coefficients of 32 dimensions of the audio fingerprint of the abnormal sound from an audio fingerprint library, audio fingerprint dimension information lower than a threshold value can be omitted by setting a threshold value of the discrete kini coefficients of the audio fingerprint, audio fingerprint dimension information higher than the threshold value is retained, the threshold value range of the discrete kini coefficients of the abnormal sound audio fingerprint is 0.36-0.38, it can be seen from the figure that the discrete kini coefficients of each dimension of the audio fingerprint are different in size, the discrete kini coefficients of some dimensions are smaller, such as dimensions 2, 25 and 26, and the discrete kini coefficients of some dimensions are larger, such as dimensions 11, 20 and 28, and the size of the discrete kini coefficients of the fingerprint represents the difference of the audio fingerprint data of the dimension. At this time, if the threshold of the discrete kini coefficient of the fingerprint is set to 0.37, it can be seen from the figure that there are 2 nd, 5 th, 24 th, 25 th and 26 th dimensions of the discrete kini coefficient of the fingerprint smaller than 0.37, which shows that the differences of the fingerprint data of each frame of the 5 dimensions are small and can be eliminated, and the dimension of each frame of the audio fingerprint is reduced by eliminating the dimensions with the small differences, thereby reducing the data volume of the audio fingerprint database.
Fig. 6 shows the discrete Gini coefficients of each dimension of normal-speech audio fingerprints, computed statistically over the 32 dimensions from the audio fingerprint library. Again, dimensions below the threshold can be discarded and dimensions above it retained, with a threshold range of 0.36-0.38 for this class. The figure shows that dimensions 2, 22 and 25 have small coefficients, while dimensions such as 18, 26 and 28 have large ones. If the threshold is set to 0.37, dimensions 2, 22 and 25 fall below it, meaning the per-frame fingerprint data in these 3 dimensions differ little and can be discarded.
Fig. 7 shows the discrete Gini coefficients of each dimension of song-class audio fingerprints, computed statistically over the 32 dimensions from the audio fingerprint library. As before, dimensions below the threshold can be discarded and dimensions above it retained, with a threshold range of 0.36-0.38 for this class. The figure shows that the coefficients differ considerably and are relatively dispersed across dimensions: dimensions 2, 14, 15 and 25 have small coefficients, while dimensions 4, 5, 11, 18 and 29 have large ones. If the threshold is set to 0.37, dimensions 1, 2, 14, 15, 25 and 26 fall below it, meaning the per-frame fingerprint data in these 6 dimensions differ little and can be omitted.
These figures show that, although the discrete Gini coefficients and their trends across dimensions differ between audio types, in every case several dimensions have small coefficients, meaning the audio fingerprints are poorly distinguishable there. The data in those dimensions can therefore be removed, converting the 32-dimensional audio fingerprint to a lower dimension and effectively reducing the fingerprint data volume.
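Putting the pieces together, the threshold step that completes the dimension reduction can be sketched as follows. The per-dimension Gini values and the 0.37 threshold below are illustrative stand-ins for the statistically trained values:

```python
import numpy as np

def reduce_dimensions(fingerprints, gini_per_dim, threshold=0.37):
    """Keep only the fingerprint dimensions whose discrete Gini coefficient
    exceeds the threshold; drop the poorly discriminating dimensions."""
    keep = np.flatnonzero(np.asarray(gini_per_dim) > threshold)
    return fingerprints[:, keep], keep

rng = np.random.default_rng(1)
fp = rng.integers(0, 2, size=(100, 32))      # 100 frames of 32-bit sub-fingerprints
gini = np.full(32, 0.40)
gini[[1, 24, 25]] = 0.30                     # dims 2, 25, 26 (1-based) below threshold
reduced, kept = reduce_dimensions(fp, gini)  # 32 -> 29 dimensions per frame
```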
Claims (2)
1. An audio fingerprint dimension reduction method based on the discrete Gini coefficient, characterized by comprising the following steps:
step 1, constructing a target sound library in a classified manner
Classifying the audios to build a library according to the audio characteristic types or the existing data conditions;
step2, extracting fingerprint characteristics of sample audios in classification mode
Selecting various types of audio data from a constructed audio library as original sample audio, extracting fingerprint features from the original samples by type, and introducing the discrete Gini coefficient to reduce the dimensionality of the fingerprint features, wherein the specific process comprises the following steps:
step2.1: pre-processing the original sample audio, the pre-processing comprising: band-pass filtering, pre-emphasis, windowing and framing;
step2.2: fingerprint feature extraction is carried out on preprocessed audio data
(1) performing a discrete Fourier transform on each frame of the framed audio signal, wherein the transform formula (the original formula image is not reproduced here; reconstructed as the standard DFT) is:

X(k) = Σ x(p)·e^(-j2πkp/T), p = 0, 1, …, T - 1

wherein X(k) is the frequency domain signal, x(p) is the time domain signal, k is the frequency index, and T is the sample length of the discrete Fourier transform;
(2) dividing the discrete-Fourier-transformed frequency domain signal into spectrum sub-bands: selecting 33 non-overlapping frequency bands from the spectrum, distributed over the range 20-4000 Hz with logarithmically equal spacing, wherein the starting frequency of the mth sub-band, i.e. the ending frequency f(m) of the (m-1)th sub-band, can be expressed (reconstructed as log-spaced boundaries) as:

f(m) = Fmin·(Fmax/Fmin)^(m/M), m = 0, 1, …, M

wherein Fmin is the lower mapping limit, Fmax is the upper mapping limit, and M is the number of sub-bands, here 33;
(3) calculating the energy of each sub-band of each frame of audio for the 33 selected non-overlapping bands, wherein the energy of the mth sub-band of the nth frame (reconstructed as a band energy sum) is:

E(n, m) = Σ |X(k)|², summed over f(m) ≤ k < f(m+1)

wherein f(m) is the starting frequency of the mth sub-band, f(m+1) is its ending frequency, and X(k) is the frequency domain signal of the nth frame after the discrete Fourier transform;
(4) generating the sub-fingerprint of each frame of audio, specifically: performing bit-difference discrimination on the 33 sub-band energies obtained for each frame to generate the 32-bit binary code of that frame, i.e. its sub-fingerprint, wherein F(n, m) is the binary bit information of the 32-bit code; a discrimination rule consistent with the description (reconstructed as an adjacent-band energy comparison) is:

F(n, m) = 1 if E(n, m) - E(n, m+1) > 0, otherwise F(n, m) = 0

wherein E(n, m) is the mth sub-band energy of the nth frame, and m = 1, 2, …, 32;
step3, reducing the dimension of the audio fingerprint characteristics of various samples
Step3.1: the discrete kini coefficient of each dimensionality of the audio fingerprints of various samples is obtained, and the method specifically comprises the following steps:
(1) processing the audio fingerprints of the various sample types frame by frame and dividing them into N groups;
(2) constructing the jth-dimension accumulated fingerprint data vector Q_j of the audio fingerprints of each sample type, wherein a specific calculation (reconstructed, since the original formula image is not reproduced here) is:

Q_j(i) = Σ F(n, j), summed over all frames n in groups 1 through i, i = 1, 2, …, N

wherein i is the group number and F(n, j) is the binary bit information from step 2;
(3) constructing the jth-dimension accumulated fingerprint data proportion vector W_j of the audio fingerprints of each sample type, wherein the elements of the vector are defined as:

W_j(i) = Q_j(i) / Q_j(N), i = 1, 2, …, N;
(4) obtaining the discrete Gini coefficient of every dimension of the audio fingerprints of each sample type, wherein a specific calculation (a trapezoidal discretization reconstructed to be consistent with the Lorenz-curve construction) is:

G_j = 1 - (1/N)·Σ (W_j(i-1) + W_j(i)), summed over i = 1, …, N, with W_j(0) = 0

wherein i is the group number and W_j(i) is the ith accumulated fingerprint data proportion of the jth dimension of the audio fingerprint, obtained in the preceding step (3);
step3.2: performing fingerprint dimensionality reduction on discrete kini coefficients of all dimensionalities of audio fingerprints of various types of samples
And (3) respectively forming training sets corresponding to the categories by various sample data, performing statistical analysis on the discrete kini coefficient of each dimensionality of the audio fingerprint of each training set to obtain a threshold value of the discrete kini coefficient of the audio fingerprint of each category, then performing dimensionality reduction operation on the audio to be subjected to dimensionality reduction according to the obtained threshold value, namely, retaining the audio fingerprint dimensionality information higher than the threshold value, deleting the audio fingerprint dimensionality information lower than the threshold value, and finishing dimensionality reduction.
2. The audio fingerprint dimension reduction method based on the discrete Gini coefficient according to claim 1, wherein:
selecting 8kHz sampled audio signals as processing objects in the step2, and selecting a band-pass filter with a passband ranging from 20Hz to 4000Hz to process the sampled audio signals, wherein the band-pass filter is a finite impulse response FIR filter;
the pre-emphasis process is shown as follows:
s(p)=y(p)-μ*y(p-1)
wherein y (p) is a band-pass filtered signal, μ is a pre-emphasis coefficient, and s (p) is a pre-emphasis processed signal, preferably implemented with a digital filter having 6 dB/octave;
the windowed framing specifically means framing the pre-emphasized audio with overlapping segmentation, keeping a 75% overlap between frames and weighting each frame with a Hanning window of the same length, wherein the windowing formula (reconstructed as the standard Hanning window) is:

w(p) = 0.5 - 0.5·cos(2πp/(T - 1)), p = 0, 1, …, T - 1

wherein T is the length of the Hanning window, which is also the frame length of one frame of audio.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910784077.5A CN110600038B (en) | 2019-08-23 | 2019-08-23 | Audio fingerprint dimension reduction method based on discrete kini coefficient |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110600038A true CN110600038A (en) | 2019-12-20 |
CN110600038B CN110600038B (en) | 2022-04-05 |
Family
ID=68855402
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910784077.5A Active CN110600038B (en) | 2019-08-23 | 2019-08-23 | Audio fingerprint dimension reduction method based on discrete kini coefficient |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110600038B (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6363342B2 (en) * | 1998-12-18 | 2002-03-26 | Matsushita Electric Industrial Co., Ltd. | System for developing word-pronunciation pairs |
CN108122562A (en) * | 2018-01-16 | 2018-06-05 | 四川大学 | A kind of audio frequency classification method based on convolutional neural networks and random forest |
CN109447180A (en) * | 2018-11-14 | 2019-03-08 | 山东省通信管理局 | A kind of fooled people's discovery method of the telecommunication fraud based on big data and machine learning |
CN109493886A (en) * | 2018-12-13 | 2019-03-19 | 西安电子科技大学 | Speech-emotion recognition method based on feature selecting and optimization |
Non-Patent Citations (2)
Title |
---|
VIPIN KUMAR: "Mood Classifiaction of Lyrics using SentiWordNet", 《2013 INTERNATIONAL CONFERENCE ON COMPUTER COMMUNICATION AND INFORMATICS》 * |
HOU ZHITING: "Feature … (title truncated in the record) in vegetation classification of time-series remote sensing data", 《China Master's Theses Full-Text Database, Basic Sciences》 *
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111063360A (en) * | 2020-01-21 | 2020-04-24 | 北京爱数智慧科技有限公司 | Voiceprint library generation method and device |
CN111063360B (en) * | 2020-01-21 | 2022-08-19 | 北京爱数智慧科技有限公司 | Voiceprint library generation method and device |
CN111612038A (en) * | 2020-04-24 | 2020-09-01 | 平安直通咨询有限公司上海分公司 | Abnormal user detection method and device, storage medium and electronic equipment |
CN111612038B (en) * | 2020-04-24 | 2024-04-26 | 平安直通咨询有限公司上海分公司 | Abnormal user detection method and device, storage medium and electronic equipment |
CN113421585A (en) * | 2021-05-10 | 2021-09-21 | 云境商务智能研究院南京有限公司 | Audio fingerprint database generation method and device |
CN115277322A (en) * | 2022-07-13 | 2022-11-01 | 金陵科技学院 | CR signal modulation identification method and system based on graph and continuous entropy characteristics |
CN115277322B (en) * | 2022-07-13 | 2023-07-28 | 金陵科技学院 | CR signal modulation identification method and system based on graph and continuous entropy characteristics |
Also Published As
Publication number | Publication date |
---|---|
CN110600038B (en) | 2022-04-05 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110600038B (en) | Audio fingerprint dimension reduction method based on discrete kini coefficient | |
CN111279414B (en) | Segmentation-based feature extraction for sound scene classification | |
CN110647656B (en) | Audio retrieval method utilizing transform domain sparsification and compression dimension reduction | |
CN114596879B (en) | False voice detection method and device, electronic equipment and storage medium | |
AU744678B2 (en) | Pattern recognition using multiple reference models | |
Abdalla et al. | DWT and MFCCs based feature extraction methods for isolated word recognition | |
CN112183582A (en) | Multi-feature fusion underwater target identification method | |
CN110767248B (en) | Anti-modulation interference audio fingerprint extraction method | |
CN101577116A (en) | Extracting method of MFCC coefficients of voice signal, device and Mel filtering method | |
CN108172214A (en) | A kind of small echo speech recognition features parameter extracting method based on Mel domains | |
Imran et al. | An analysis of audio classification techniques using deep learning architectures | |
CN110610722A (en) | Short-time energy and Mel cepstrum coefficient combined novel low-complexity dangerous sound scene discrimination method based on vector quantization | |
Panagiotou et al. | PCA summarization for audio song identification using Gaussian mixture models | |
CN113555038A (en) | Speaker independent speech emotion recognition method and system based on unsupervised field counterwork learning | |
Seo et al. | Linear speed-change resilient audio fingerprinting | |
CN116884431A (en) | CFCC (computational fluid dynamics) feature-based robust audio copy-paste tamper detection method and device | |
CN111785262A (en) | Speaker age and gender classification method based on residual error network and fusion characteristics | |
CN112309404B (en) | Machine voice authentication method, device, equipment and storage medium | |
Wang et al. | Revealing the processing history of pitch-shifted voice using CNNs | |
CN113808604B (en) | Sound scene classification method based on gamma through spectrum separation | |
CN114613391B (en) | Snore identification method and device based on half-band filter | |
CN113948088A (en) | Voice recognition method and device based on waveform simulation | |
Htun | Analytical approach to MFCC based space-saving audio fingerprinting system | |
Thiruvengatanadhan | Music genre classification using mfcc and aann | |
Therese et al. | A linear visual assessment tendency based clustering with power normalized cepstral coefficients for audio signal recognition system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||