CN110600038A - Audio fingerprint dimension reduction method based on discrete Gini coefficient

Audio fingerprint dimension reduction method based on discrete Gini coefficient

Info

Publication number
CN110600038A
CN110600038A
Authority
CN
China
Prior art keywords
audio
fingerprint
discrete
dimension
Gini
Prior art date
Legal status
Granted
Application number
CN201910784077.5A
Other languages
Chinese (zh)
Other versions
CN110600038B (en)
Inventor
贾懋珅
赵文兵
Current Assignee
Beijing University of Technology
Original Assignee
Beijing University of Technology
Priority date
Filing date
Publication date
Application filed by Beijing University of Technology filed Critical Beijing University of Technology
Priority to CN201910784077.5A priority Critical patent/CN110600038B/en
Publication of CN110600038A publication Critical patent/CN110600038A/en
Application granted granted Critical
Publication of CN110600038B publication Critical patent/CN110600038B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical
Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00: Speaker identification or verification techniques
    • G10L17/02: Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction
    • G10L17/04: Training, enrolment or model building
    • G10L17/06: Decision making techniques; Pattern matching strategies
    • G10L17/08: Use of distortion metrics or a particular distance between probe pattern and reference templates
    • G10L17/14: Use of phonemic categorisation or speech recognition prior to speaker recognition or verification
    • G10L21/00: Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02: Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208: Noise filtering
    • G10L25/00: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G10L25/54: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for retrieval

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Signal Processing (AREA)
  • Computational Linguistics (AREA)
  • Business, Economics & Management (AREA)
  • Game Theory and Decision Science (AREA)
  • Quality & Reliability (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to an audio fingerprint dimensionality reduction method based on discrete Gini coefficient calculation, which aims to solve the problem of the high dimensionality of audio fingerprint features. The method comprises: constructing a target sound library by class, extracting the fingerprint features of sample audio, and introducing discrete Gini coefficients to reduce the dimensionality of the audio fingerprint features. According to the invention, a fingerprint discrete Gini coefficient is introduced in each dimension of the audio fingerprint; the discrete Gini coefficient of a dimension reflects how well different audio signals can be distinguished in that dimension. Dimensions with large discrete Gini coefficients are kept and dimensions with small ones are deleted, achieving the dimension reduction. The sample audio fingerprint database constructed from the dimension-reduced fingerprint features has a smaller data volume and a higher utilization rate.

Description

Audio fingerprint dimension reduction method based on discrete Gini coefficient
Technical Field
The invention belongs to the field of intelligent audio applications, and particularly relates to an audio fingerprint dimension reduction method based on discrete Gini coefficient calculation.
Background
In recent years, artificial intelligence has been widely studied and discussed, and the intelligent identification of audio is an important cornerstone of its development. Intelligent audio identification is inseparable from audio feature extraction, and among the many audio features, the audio fingerprint has become the most popular in recent years. An audio fingerprint is a compact, content-based digital signature that represents the important acoustic features of a segment of audio; its main purpose is to represent a large amount of audio data with a small amount of digital information. Compared with traditional audio features, it has the advantages of a small data volume, strong noise robustness and a relatively simple feature extraction process, and it is widely applied in fields such as music identification, advertisement supervision and copyright protection. However, audio fingerprints suffer from high dimensionality, which slows down recognition and occupies a large amount of computer memory. If the dimensionality of the audio fingerprint can be reduced, the data volume of the audio fingerprint can be reduced to a great extent, while the speed of audio retrieval is improved and the audio identification performance is enhanced.
Disclosure of Invention
Aiming at the problem of the high dimensionality of audio fingerprint features, the invention introduces a fingerprint discrete Gini coefficient in each dimension of the audio fingerprint; the discrete Gini coefficient of each dimension reflects how well different audio signals can be distinguished in that dimension. The larger the discrete Gini coefficient of a given bit of the audio fingerprint, the larger the differences between different audio signals at that bit and the better its distinguishability; conversely, a small coefficient indicates poor distinguishability. Therefore, by retaining the bits with good distinguishability and removing the bits with poor distinguishability, the high-dimensional audio fingerprint can be converted to a lower dimension, effectively reducing the data volume of the fingerprint.
The technical scheme of the invention solves the problem of the excessive data volume of the audio fingerprint feature library: the sample audio fingerprint library constructed from the dimension-reduced fingerprint features has a smaller data volume and a higher utilization rate. The invention mainly comprises the following steps:
Step 1, constructing a target sound library by class
The design classifies the audio and builds the library according to the audio feature types or the existing data conditions. Because different audio types have different characteristics, classifying the audio makes it easier to find the commonalities of the audio features within each class; without classification, the quality of the audio fingerprint dimension reduction would suffer. The existing audio data are stored by class, and the fingerprint features of each audio class are then extracted separately. The flow chart of the audio fingerprint extraction algorithm is shown in fig. 1.
Step 2, extracting the fingerprint features of the sample audio
Audio data of each class are selected from the constructed audio library as the original sample audio. Fingerprint features are extracted from the original samples, and discrete Gini coefficients are then introduced to reduce the dimensionality of the fingerprint features. The specific process is as follows:
Step 2.1: preprocessing of the target audio
Before the data features are extracted, a preprocessing operation is first carried out. The preprocessing comprises: band-pass filtering, pre-emphasis and framing.
(1) An audio signal sampled at 8 kHz is selected as the processing object for band-pass filtering. To extract the frequency components most important to human hearing, a band-pass filter with a pass band of 20 Hz-4000 Hz is selected to process the signal. In this design a finite impulse response (FIR) filter is used for the band-pass filtering; the filtering process is as follows:

y(p) = Σ_{l=0}^{L-1} h(l)·x(p-l), p = 0, 1, ..., T-1

where T is the number of sampling points of the processed signal, p is the time-domain index, h(l) are the FIR filter coefficients (with L the filter length), x(p) is the input signal and y(p) is the band-pass filtered signal.
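As a minimal sketch of the FIR band-pass filtering step, the convolution can be written directly in Python; the coefficient vector h below is a toy example for illustration, not a real 20 Hz-4000 Hz design (real coefficients would come from a filter-design tool):

```python
def fir_filter(x, h):
    """FIR filtering: y(p) = sum over l of h(l) * x(p - l)."""
    y = []
    for p in range(len(x)):
        acc = 0.0
        for l, hl in enumerate(h):
            if p - l >= 0:  # samples before the signal start are taken as zero
                acc += hl * x[p - l]
        y.append(acc)
    return y

# Filtering a unit impulse reproduces the coefficients themselves.
impulse = [1.0, 0.0, 0.0, 0.0]
print(fir_filter(impulse, [0.25, 0.5, 0.25]))  # [0.25, 0.5, 0.25, 0.0]
```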
(2) The design selects a digital filter with a 6 dB/octave characteristic to pre-emphasize the band-pass filtered signal y(p). Pre-emphasis boosts the high-frequency part of the preprocessed signal so that the spectrum becomes relatively flat, allowing the spectrum to be obtained with the same signal-to-noise ratio over the whole band from low to high frequencies. The pre-emphasis process is as follows:

s(p) = y(p) - μ·y(p-1)

where μ is the pre-emphasis coefficient, taken as 0.96, and s(p) is the pre-emphasized signal.
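The pre-emphasis formula above is simple enough to sketch directly; note one assumption flagged in the comment, since the patent does not state how the first sample s(0) is handled:

```python
def pre_emphasis(y, mu=0.96):
    """s(p) = y(p) - mu * y(p-1); the first sample is kept as-is (assumption)."""
    return [y[0]] + [y[p] - mu * y[p - 1] for p in range(1, len(y))]

# A steady (low-frequency) input shrinks toward 1 - 0.96 = 0.04,
# showing how pre-emphasis attenuates low frequencies.
print(pre_emphasis([1.0, 1.0, 1.0]))
```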
(3) The pre-emphasized signal is windowed and framed. Although frames could be cut back to back, overlapped framing is generally used so that the transition between frames is smooth and continuity is maintained. The audio is framed with a frame length of 0.064 seconds and a 75% overlap between adjacent frames, and each frame is weighted with a Hanning window of the same length:

w(p) = 0.5 - 0.5·cos(2πp/(T-1)), 0 ≤ p ≤ T-1

where T is the length of the Hanning window, which is also the frame length of one frame of audio.
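Under the stated parameters (8 kHz sampling, 0.064 s frames, 75% overlap), the framing and Hanning weighting can be sketched as follows:

```python
import math

FS = 8000
FRAME_LEN = int(0.064 * FS)   # 512 samples per frame
HOP = FRAME_LEN // 4          # 75% overlap -> hop of 128 samples

def hanning(T):
    """w(p) = 0.5 - 0.5*cos(2*pi*p/(T-1)), p = 0..T-1."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * p / (T - 1)) for p in range(T)]

def frame_and_window(s):
    """Split signal s into overlapping frames, each weighted by a Hanning window."""
    win = hanning(FRAME_LEN)
    return [[s[start + p] * win[p] for p in range(FRAME_LEN)]
            for start in range(0, len(s) - FRAME_LEN + 1, HOP)]
```

With a 75% overlap, a 1024-sample signal yields 5 frames (starts 0, 128, 256, 384, 512), so each sample contributes to up to four frames.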
Step2.2: fingerprint feature extraction for audio data
A digital audio fingerprint can be regarded as a concentrated essence of a section of audio, and comprises the most important part of audio data audition, and the dimension of the audio fingerprint is a key factor influencing the data volume of the fingerprint and the retrieval rate, so the technology introduces discrete kini coefficients to perform feature dimension reduction on the audio fingerprint, and firstly performs fingerprint feature extraction on the audio before dimension reduction, and the steps are as follows:
(1) A discrete Fourier transform is applied to the framed audio signal; each frame of data is transformed as follows:

X(k) = Σ_{p=0}^{T-1} x(p)·e^{-j2πkp/T}, k = 0, 1, ..., T-1

where X(k) is the frequency-domain signal, x(p) is the time-domain signal, k is the frequency index and T is the sample length of the discrete Fourier transform.
(2) The frequency-domain signal after the discrete Fourier transform is divided into spectral sub-bands. 33 non-overlapping frequency bands are selected from the spectrum, distributed in the range 20-4000 Hz (where human recognition of audio is mainly concentrated) and spaced equally on a logarithmic scale (because the response of the human ear to different frequencies is not linear). The starting frequency of the m-th sub-band, which is also the ending frequency of the (m-1)-th sub-band, can be expressed as:

f(m) = Fmin·(Fmax/Fmin)^{m/M}, m = 0, 1, ..., M

where Fmin is the lower mapping limit, here 20 Hz; Fmax is the upper mapping limit, here 4000 Hz; and M is the number of sub-bands, here 33.
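The 33 logarithmically spaced bands between 20 Hz and 4000 Hz can be sketched as follows; the edge formula f(m) = Fmin·(Fmax/Fmin)^(m/M) is the standard equal-log spacing implied by the text:

```python
def band_edges(fmin=20.0, fmax=4000.0, M=33):
    """f(m) = fmin * (fmax/fmin)**(m/M) for m = 0..M: M+1 edges bounding M bands."""
    return [fmin * (fmax / fmin) ** (m / M) for m in range(M + 1)]

edges = band_edges()
print(edges[0], edges[-1])  # first edge 20.0 Hz, last edge 4000.0 Hz
```

Because the spacing is logarithmic, the ratio between consecutive edges is constant (about 1.174 here), so each band is the same width in octaves rather than in hertz.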
(3) The energy of each sub-band of each frame of audio is calculated for the 33 selected non-overlapping frequency bands. Assuming that the starting frequency of the m-th sub-band is f(m), its ending frequency is f(m+1) and the frequency-domain signal after the discrete Fourier transform is X(k), the energy of the m-th sub-band of the n-th frame is:

E(n, m) = Σ_{k=f(m)}^{f(m+1)-1} |X(k)|²

(4) A sub-fingerprint of each frame of audio is generated by a bit-difference decision on the 33 sub-band energies of the frame, yielding a 32-bit binary code (the sub-fingerprint) per frame. With the m-th sub-band energy of the n-th frame denoted E(n, m) and the corresponding binary bit denoted F(n, m), the per-frame binary fingerprint decision is:

F(n, m) = 1 if (E(n, m) - E(n, m+1)) - (E(n-1, m) - E(n-1, m+1)) > 0, otherwise F(n, m) = 0

As can be seen from the above formula, each frame of audio finally generates one 32-dimensional binary sub-fingerprint. A single sub-fingerprint contains little information, and an audio fingerprint feature usually consists of many sub-fingerprints.
Step 3, reducing the dimension of the audio fingerprint features
After the fingerprint extraction process, each frame of audio data yields one 32-dimensional binary sub-fingerprint. For a segment of audio, the fingerprint information consists of many binary sub-fingerprints, so its data volume is still large; in practical applications it is desirable to further reduce the fingerprint dimension so as to effectively reduce the data volume. The design provides an audio fingerprint dimension reduction method based on discrete Gini coefficient calculation: a discrete Gini coefficient is computed for each dimension of the audio fingerprint, and the fingerprint dimension is reduced according to the sizes of these per-dimension coefficients.
The discrete Gini coefficients of the fingerprint dimensions are computed by taking every 50 frames of the audio fingerprint as one group. The discrete Gini coefficient of a dimension reflects the dispersion of the fingerprint data in that dimension, i.e. how much that dimension differs between audio signals: the larger the discrete Gini coefficient of a given bit of the audio fingerprint, the larger the differences between different audio signals at that bit and the better its distinguishability, and vice versa. The design keeps the dimensions with good distinguishability, removes those with poor distinguishability, and converts the 32-dimensional audio fingerprint to a lower dimension, reducing the fingerprint data volume.
Step 3.1: calculating the discrete Gini coefficient of each dimension of the audio fingerprint
The derivation process and the specific steps are as follows:
(1) Obtain the discrete Lorenz curve of the audio fingerprint. The discrete Lorenz curve is the key curve from which the discrete Gini coefficient is computed; it is formed from the cumulative fingerprint-data proportion vector Q_j, where j denotes the fingerprint dimension and j = 1, 2, ..., 32. The proportion vector Q_j is computed as follows:
Each class of audio fingerprints in the fingerprint database is processed frame by frame, the fingerprint data are divided into N groups of 50 frames each, and the j-th-dimension cumulative fingerprint data vector W_j = (w_j(1), w_j(2), ..., w_j(N)) is constructed, where the elements are indexed by group number and w_j(i) is the cumulative total of the j-th-dimension fingerprint data over the first i groups. The j-th-dimension cumulative fingerprint-data proportion vector Q_j is then defined element-wise as:

Q_j(i) = w_j(i) / w_j(N), i = 1, 2, ..., N, with Q_j(0) = 0

The curve formed by the elements of the proportion vector is the discrete Lorenz curve. Specifically, the discrete Lorenz curve of the j-th dimension of the audio fingerprint is drawn as follows: taking the cumulative group proportion i/N as the abscissa and the j-th-dimension cumulative fingerprint-data proportion Q_j(i) as the ordinate, the discrete points (i/N, Q_j(i)), i = 0, 1, ..., N, are plotted; the curve linking these discrete points is the discrete Lorenz curve of the j-th dimension of the audio fingerprint.
(2) Obtain the discrete Gini coefficients of all dimensions of each class of audio fingerprint. Taking the obtained discrete Lorenz curve as the boundary, the Gini coefficient of the j-th dimension of the audio fingerprint is:

G_j = S_a / (S_a + S_b)

where S_a is the closed area enclosed by the diagonal OA and the discrete Lorenz curve, with O = (0, 0) and A = (1, 1); S_b is the closed area enclosed by the coordinate segments OB and BA and the discrete Lorenz curve, with B = (1, 0); and G_j is the Gini coefficient of the j-th dimension of the audio fingerprint. The auxiliary diagram for the fingerprint discrete Gini coefficient calculation is shown in fig. 4.
From the above, S_a + S_b is the closed area enclosed by the diagonal OA and the segments OB and BA, i.e. S_a + S_b = 1/2. Because the audio fingerprint is discrete, the design discretizes the above equation (approximating S_b by the trapezoid rule) into:

G_j = 1 - (1/N)·Σ_{i=1}^{N} (Q_j(i-1) + Q_j(i))

Thereby the j-th-dimension discrete Gini coefficient G_j of the audio fingerprint is obtained, where i is the group number and Q_j(i) is the cumulative proportion of the j-th-dimension fingerprint data over the first i groups.
Step3.2: carrying out fingerprint dimensionality reduction by counting discrete kini coefficients of all dimensions of different types of audio fingerprints
And (4) training discrete kini coefficients of all dimensions of the audio fingerprint by combining application scenes. And (3) constructing a training set from the audio data according to the data type or combining the existing data set to construct the training set, and calculating the discrete kini coefficient of each dimension of the audio fingerprint of the audio data of each training set and carrying out statistical analysis.
The discrete kini coefficient of the audio fingerprint designed by the invention is suitable for dimension reduction of various audio fingerprints, the following audio fingerprints are selected as analysis objects, and the discrete kini coefficient of each dimension of the audio fingerprint is subjected to statistical analysis:
as shown in fig. 5, discrete kini coefficients of each dimension of an audio fingerprint of an abnormal sound are obtained by counting discrete kini coefficients of 32 dimensions of the audio fingerprint of the abnormal sound from an audio fingerprint library, audio fingerprint dimension information lower than a threshold value can be omitted by setting a threshold value of the discrete kini coefficients of the audio fingerprint, audio fingerprint dimension information higher than the threshold value is retained, the threshold value range of the discrete kini coefficients of the abnormal sound audio fingerprint is 0.36-0.38, it can be seen from the figure that the discrete kini coefficients of each dimension of the audio fingerprint are different in size, the discrete kini coefficients of some dimensions are smaller, such as dimensions 2, 25 and 26, and the discrete kini coefficients of some dimensions are larger, such as dimensions 11, 20 and 28, and the size of the discrete kini coefficients of the fingerprint represents the difference of the audio fingerprint data of the dimension. At this time, if the threshold of the discrete kini coefficient of the fingerprint is set to 0.37, it can be seen from the figure that there are 2 nd, 5 th, 24 th, 25 th and 26 th dimensions of the discrete kini coefficient of the fingerprint smaller than 0.37, which shows that the differences of the fingerprint data of each frame of the 5 dimensions are small and can be eliminated, and the dimension of each frame of the audio fingerprint is reduced by eliminating the dimensions with the small differences, thereby reducing the data volume of the audio fingerprint database.
FIG. 6 shows the discrete Gini coefficients of each dimension of the normal-speech audio fingerprint, obtained by computing, from the audio fingerprint library, the discrete Gini coefficients of all 32 dimensions of the normal-speech fingerprints. By setting a threshold on the fingerprint discrete Gini coefficient, dimensions below the threshold can be discarded and dimensions above it retained; for normal-speech audio fingerprints the threshold range is 0.36-0.38. The figure shows that the dimensions with smaller discrete Gini coefficients include the 2nd, 22nd and 25th, while the dimensions with larger coefficients include the 18th, 26th and 28th. If the threshold is set to 0.37, the figure shows that the 2nd, 22nd and 25th dimensions fall below it, indicating that the per-frame fingerprint data in these 3 dimensions differ little and can be discarded.
FIG. 7 shows the discrete Gini coefficients of each dimension of the song audio fingerprint, obtained by computing, from the audio fingerprint library, the discrete Gini coefficients of all 32 dimensions of the song fingerprints. By setting a threshold on the fingerprint discrete Gini coefficient, dimensions below the threshold can be discarded and dimensions above it retained; for song audio fingerprints the threshold range is 0.36-0.38. The figure shows that the per-dimension discrete Gini coefficients differ relatively strongly and are relatively dispersed: the coefficients of the 2nd, 14th, 15th and 25th dimensions are small, while those of the 4th, 5th, 11th, 18th and 29th dimensions are large. If the threshold is set to 0.37, the figure shows that the 1st, 2nd, 14th, 15th, 25th and 26th dimensions fall below it, indicating that the per-frame fingerprint data in these 6 dimensions differ little and can be discarded.
As these figures show, although the fingerprint discrete Gini coefficients of the various audio classes differ and their trends across the dimensions differ, each class has several dimensions with small discrete Gini coefficients, meaning the audio fingerprints are poorly distinguishable in those dimensions. The data of those dimensions can therefore be removed, converting the 32-dimensional audio fingerprint to a lower dimension and effectively reducing the fingerprint data volume.
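The threshold step described above (keep dimensions whose discrete Gini coefficient reaches 0.37, drop the rest) can be sketched as follows; the per-dimension coefficient values below are made up for illustration:

```python
def select_dimensions(gini_per_dim, threshold=0.37):
    """Return the 1-based indices of the fingerprint dimensions to keep."""
    return [j for j, g in enumerate(gini_per_dim, start=1) if g >= threshold]

def reduce_sub_fingerprint(bits, kept_dims):
    """Project a 32-bit sub-fingerprint onto the retained dimensions."""
    return [bits[j - 1] for j in kept_dims]

# Hypothetical coefficients: dimensions 2 and 25 fall below the 0.37 threshold.
gini = [0.38] * 32
gini[1] = 0.30   # dimension 2
gini[24] = 0.35  # dimension 25
kept = select_dimensions(gini)
print(len(kept))  # 30 dimensions survive
```

Applying `reduce_sub_fingerprint` to every frame of every library entry shrinks the fingerprint database by the same fraction of dropped dimensions.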
The invention has the following advantages:
1. The algorithm complexity is low and the flexibility is strong.
2. The audio feature data volume is smaller than that of traditional audio features.
3. The discrete Gini coefficient is introduced as the dimension reduction criterion; its complexity is low and it can process data in large batches.
4. The audio fingerprint after dimension reduction is more robust.
Drawings
FIG. 1 is a flow chart of the audio fingerprint extraction algorithm
FIG. 2 is a 32-dimensional audio fingerprint block diagram of a segment of audio
FIG. 3 is the Lorenz curve of the 1st dimension of an abnormal-sound audio fingerprint
FIG. 4 is an auxiliary diagram for the discrete Gini coefficient calculation of an audio fingerprint
FIG. 5 is a graph of the discrete Gini coefficients of each dimension of abnormal-sound audio fingerprints
FIG. 6 is a graph of the discrete Gini coefficients of each dimension of normal-speech audio fingerprints
FIG. 7 is a graph of the discrete Gini coefficients of each dimension of song audio fingerprints
Detailed Description
The invention provides an audio fingerprint dimension reduction method based on discrete Gini coefficient calculation.
The technical scheme of the invention is used for solving the problem of overlarge data volume of the audio fingerprint feature library, and the data volume of the sample audio fingerprint library constructed by the audio fingerprint features after dimensionality reduction is smaller and the utilization rate is higher, and the invention mainly comprises the following steps:
step 1, constructing a target sound library in a classified manner
The design classifies the audios and establishes a library according to the audio characteristic types or existing data conditions. Because the audio types are different and the characteristics are different, the audio classification processing is convenient for finding the commonalities of the audio characteristics, and the dimensionality reduction quality of the audio fingerprints can be influenced if the audio classification is not carried out.
The audio fingerprint discrete kini coefficient designed by the invention is suitable for dimension reduction of various audio fingerprints, so that the existing audio data needs to be classified and stored, and then the fingerprint characteristics of various audios are respectively extracted. The audio fingerprint extraction algorithm flow chart is shown in fig. 1.
Step2, extracting fingerprint characteristics of sample audio
And selecting various types of audio data from the constructed audio library as original sample audio. Extracting fingerprint features of the original sample and introducing discrete kini coefficients to perform dimensionality reduction on the fingerprint features. The specific process is as follows:
step2.1: preprocessing of target audio
Before extracting the data features, preprocessing operation is firstly carried out. The pretreatment comprises the following steps: band-pass filtering, pre-emphasis, framing.
(1) And selecting 8kHz sampling audio signals as processing objects to carry out band-pass filtering processing, and selecting a band-pass filter with a band-pass range of 20Hz-4000Hz to process the signals in order to extract the most important frequency components for human ear perception. In the design, a Finite Impulse Response (FIR) Filter is selected for bandpass filtering, and the filtering process is as follows:
wherein, T is the number of sampling points of the processed signal, p is the time domain index, h (l) is the FIR filter coefficient, x (p) is the input signal, and y (p) is the band-pass filtered signal.
(2) The design selects a digital filter with 6 dB/octave to realize the pre-emphasis processing on the signals y (p) after band-pass filtering, so as to improve the high-frequency characteristic of the pre-processed signals, enable the signal frequency spectrum to become relatively flat, and enable the voice signals to use the same signal-to-noise ratio to obtain the frequency spectrum in the whole frequency band from low frequency to high frequency. The pre-emphasis process is shown as follows:
s(p)=y(p)-μ*y(p-1)
wherein mu is a pre-emphasis coefficient which takes a value of 0.96, and s (p) is a signal after pre-emphasis processing.
(3) And carrying out windowing and framing processing on the pre-emphasis signal. Although the frame segmentation method can be a continuous segmentation method, an overlapped segmentation method is generally used, which is to make the transition between frames smooth and maintain the continuity of the frames. The audio is framed with a frame length of 0.064 seconds, with a 75% overlap between frames, each frame being weighted with a hanning window of the same length. The windowing formula is as follows:
wherein, T is the length of the hanning window and is also the frame length of one frame of audio.
Step2.2: fingerprint feature extraction for audio data
A digital audio fingerprint can be regarded as a concentrated essence of a section of audio, and comprises the most important part of audio data audition, and the dimension of the audio fingerprint is a key factor influencing the data volume of the fingerprint and the retrieval rate, so the technology introduces discrete kini coefficients to perform feature dimension reduction on the audio fingerprint, and firstly performs fingerprint feature extraction on the audio before dimension reduction, and the steps are as follows:
(1) Apply the discrete Fourier transform to each frame of the framed audio signal. The transform formula is as follows:
X(k) = Σp x(p)*e^(-j2πkp/T), p = 0, 1, …, T-1
where X(k) is the frequency-domain signal, x(p) is the time-domain signal, k is the frequency index, and T is the sample length of the discrete Fourier transform.
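The per-frame DFT can be written directly from the formula above. The O(T²) direct form below is for illustration; in practice `np.fft.fft` computes the same transform efficiently, as the test confirms:

```python
import numpy as np

def dft(x):
    # Direct evaluation of X(k) = sum_{p=0}^{T-1} x(p) * exp(-j*2*pi*k*p/T).
    # O(T^2); equivalent to np.fft.fft(x) up to floating-point error.
    T = len(x)
    p = np.arange(T)
    W = np.exp(-2j * np.pi * np.outer(p, p) / T)  # DFT matrix W[k, p]
    return W @ x
```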
(2) Divide the spectrum of the discrete-Fourier-transformed frequency-domain signal into sub-bands. 33 non-overlapping frequency bands are selected from the spectrum, distributed over the range 20-4000 Hz (where human recognition of audio is mainly concentrated) and equally spaced on a logarithmic scale (because the response of the human ear to different frequencies is not linear). The starting frequency of the mth sub-band, i.e., the ending frequency f(m) of the (m-1)th sub-band, can be expressed as follows:
f(m) = Fmin*(Fmax/Fmin)^(m/M), m = 0, 1, …, M
where Fmin is the lower mapping limit, here 20Hz, Fmax is the upper mapping limit, here 4000Hz, and M is the number of subbands, here 33.
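Assuming band edges of the form f(m) = Fmin*(Fmax/Fmin)^(m/M) (a standard log-equispaced reconstruction; the patent's own formula image is not reproduced here), the 34 edges of the 33 bands can be generated as:

```python
import numpy as np

def band_edges(fmin=20.0, fmax=4000.0, n_bands=33):
    # f(m) = fmin * (fmax/fmin)**(m/M): M+1 logarithmically equispaced edges
    # spanning [fmin, fmax]; adjacent edges share a constant ratio.
    m = np.arange(n_bands + 1)
    return fmin * (fmax / fmin) ** (m / n_bands)
```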
(3) Calculate the energy of each sub-band of each frame of audio, i.e., the energy of each of the 33 selected non-overlapping frequency bands. Assuming that the starting frequency of the mth sub-band is f(m), its ending frequency is f(m+1), and the frequency-domain signal after the discrete Fourier transform is X(k), the energy of the mth sub-band of the nth frame is:
E(n,m) = Σk |X(k)|², summed over the bins k whose frequency fk satisfies f(m) ≤ fk < f(m+1)
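The sub-band energy computation can be sketched as follows. Mapping DFT bins to frequencies as fk = k*fs/T and using half-open intervals [f(m), f(m+1)) are assumptions made for the sketch:

```python
import numpy as np

def subband_energies(X, fs, edges):
    # E(n, m) = sum of |X(n, k)|^2 over DFT bins k whose frequency
    # f_k = k * fs / T falls in [edges[m], edges[m+1]). X holds one DFT per row.
    T = X.shape[-1]
    freqs = np.arange(T) * fs / T
    E = np.empty(X.shape[:-1] + (len(edges) - 1,))
    for m in range(len(edges) - 1):
        band = (freqs >= edges[m]) & (freqs < edges[m + 1])
        E[..., m] = (np.abs(X[..., band]) ** 2).sum(axis=-1)
    return E
```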
(4) Generate the sub-fingerprint of each frame of audio. Bit-difference discrimination is performed on the 33 sub-band energies of each frame to generate a 32-bit binary code, the sub-fingerprint of that frame. Let E(n,m) be the mth sub-band energy of the nth frame and F(n,m) the corresponding binary bit; the per-frame binary audio fingerprint discrimination formula is:
F(n,m) = 1 if E(n,m) - E(n,m+1) > 0, otherwise F(n,m) = 0, m = 1, 2, …, 32
as can be seen from the above formula, each frame of audio finally generates a 32-dimensional binary sub-fingerprint information, the sub-fingerprint contains less information, and an audio fingerprint feature often consists of a plurality of sub-fingerprints.
Because the 32-dimensional binary sub-fingerprint generated from a single frame of audio contains too little information to retrieve or identify target audio accurately, an audio fingerprint block is commonly used in practice for audio retrieval or audio identification. A fingerprint block is formed by combining at least 256 audio sub-fingerprints, such as the block of 32-dimensional sub-fingerprints extracted from a section of audio shown in FIG. 2.
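The 33-energies-to-32-bits step, under the adjacent-band-difference reading of the discrimination formula (the patent's formula image is missing, so this comparison is an assumption), can be sketched as:

```python
import numpy as np

def sub_fingerprint(E):
    # F(n, m) = 1 if E(n, m) - E(n, m+1) > 0 else 0: comparing each of the
    # 33 band energies with its neighbor yields 32 bits per frame.
    # (Adjacent-band comparison assumed from the text's "bit difference
    # discrimination"; the exact formula image is not reproduced in the source.)
    return (E[..., :-1] > E[..., 1:]).astype(np.uint8)
```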
Step3, reducing the dimension of the audio fingerprint characteristics
After the audio fingerprint extraction process, each frame of audio data finally yields 32-dimensional binary sub-fingerprint information. For a segment of audio, the fingerprint consists of many such binary sub-fingerprints, so the data volume of the fingerprint information is still large; in practical applications it is desirable to reduce the fingerprint dimension further in order to cut that data volume effectively. This design proposes an audio fingerprint dimension reduction method based on computing discrete Gini coefficients: a discrete Gini coefficient is computed for each dimension of the audio fingerprint, and the fingerprint dimension is reduced according to the size of the coefficient of each dimension.
The audio fingerprint dimension data are grouped, e.g., every 50 frames (the group size is not limited to 50), and the discrete Gini coefficient of each fingerprint dimension is computed. This coefficient reflects the dispersion of that dimension of the fingerprint data, i.e., how much the data of that dimension differ. The larger the discrete Gini coefficient at a position of the audio fingerprint, the larger the differences between different audio at that position and the better its discriminability; conversely, a small coefficient means poor discriminability. The design keeps the dimensions of the audio fingerprint with good discriminability, removes those with poor discriminability, and thereby converts the 32-dimensional audio fingerprint into a lower dimension to reduce the fingerprint data volume.
Step3.1: calculating discrete kini coefficient of each dimension of audio fingerprint
The method comprises the following specific steps:
(1) Obtain the discrete Lorenz curve of the audio fingerprint. The discrete Lorenz curve is the key curve for computing the discrete Gini coefficient; it is formed from the cumulative fingerprint-data proportion vector Wj = [Wj(1), Wj(2), …, Wj(N)], where j is the dimension index of the audio fingerprint with value range j = 1, 2, …, 32. The proportion vector Wj is calculated as follows:
Process the various audio fingerprints in the audio fingerprint database frame by frame, divide the fingerprint data into groups of 50 frames (N groups in total), and construct the jth-dimension cumulative fingerprint data vector Sj = [Sj(1), Sj(2), …, Sj(N)]:
Sj(i) = Σn F(n,j), summed over the frames n of the first i groups, i = 1, 2, …, N
wherein i is the group number and F(n,j) is the jth binary bit of the nth frame sub-fingerprint.
Construct the jth-dimension cumulative fingerprint data proportion vector Wj = [Wj(1), Wj(2), …, Wj(N)], whose elements are defined as:
Wj(i) = Sj(i)/Sj(N), i = 1, 2, …, N
the curve formed by the elements of the proportion vector is the discrete Lorenz curve. Specifically, the discrete Lorenz curve of the jth dimension of the audio fingerprint is drawn as follows:
take the normalized cumulative group number i/N of the audio fingerprint as the abscissa and the jth-dimension cumulative fingerprint data proportion Wj(i) as the ordinate, and plot the discrete points (i/N, Wj(i)). The piecewise curve linking these discrete points is the discrete Lorenz curve of the jth dimension of the audio fingerprint. For example, a Lorenz curve of the 1st dimension of an abnormal-sound-class audio fingerprint is shown in FIG. 3.
(2) Compute the discrete Gini coefficients of every dimension of the various audio fingerprints. Taking the obtained discrete Lorenz curve as the boundary line, the Gini coefficient of the jth dimension of the audio fingerprint is:
Gj = Sa/(Sa + Sb)
wherein Sa is the closed area enclosed by the diagonal OA and the discrete Lorenz curve, with point O at (0,0) and point A at (1,1); Sb is the closed area enclosed by the coordinate line segments OB and BA and the discrete Lorenz curve, with point B at (1,0); and Gj is the Gini coefficient of the jth dimension of the audio fingerprint. The auxiliary figure for computing the fingerprint's discrete Gini coefficient is shown in FIG. 4.
As can be seen from the above, Sa + Sb is the area of the triangle enclosed by the diagonal OA and the segments OB and BA, i.e. Sa + Sb = 1/2, so that Gj = 1 - 2*Sb. Because the audio fingerprint is discrete, the area Sb is evaluated by the trapezoidal rule, and the design discretizes the above equation into:
Gj = 1 - (1/N)*Σi [Wj(i-1) + Wj(i)], i = 1, 2, …, N, with Wj(0) = 0
thereby obtaining the jth-dimension discrete Gini coefficient Gj of the audio fingerprint, wherein i is the group number and Wj(i) is the jth-dimension cumulative fingerprint data proportion of the ith group.
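The whole Step3.1 pipeline (50-frame grouping, cumulative proportion vector, trapezoidal Gini) can be sketched as follows. The trapezoidal discretization and the guard for all-zero dimensions are reconstructions under the stated assumptions, since the patent's formula images are missing:

```python
import numpy as np

def discrete_gini(F, group=50):
    # F: (n_frames, dims) 0/1 fingerprint matrix. Frames are split into N
    # groups of `group` frames; the bits of each dimension are summed per
    # group, the cumulative proportions W_j(i) form the discrete Lorenz
    # curve, and G_j = 1 - (1/N) * sum_i (W_j(i-1) + W_j(i)) (trapezoidal
    # rule for S_b, reconstructed from the S_a / S_b description).
    n_groups = F.shape[0] // group
    g = F[:n_groups * group].reshape(n_groups, group, -1).sum(axis=1)
    cum = np.cumsum(g, axis=0).astype(float)
    total = cum[-1].copy()
    total[total == 0] = 1.0  # guard: all-zero dimensions would divide by zero
    W = cum / total                              # Lorenz ordinates W_j(1..N)
    W0 = np.vstack([np.zeros(W.shape[1]), W])    # prepend W_j(0) = 0
    return 1.0 - (W0[:-1] + W0[1:]).sum(axis=0) / n_groups
```

A dimension whose bits are spread evenly over the groups gets G = 0; concentrating all bits in the last of N groups gives the maximum 1 - 1/N.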
Step3.2: carrying out fingerprint dimensionality reduction by counting discrete kini coefficients of all dimensions of different types of audio fingerprints
Train the discrete Gini coefficients of each dimension of the audio fingerprint for the intended application scenario. Construct training sets from the audio data by data type, or build them from existing data sets, then compute the discrete Gini coefficient of each audio fingerprint dimension for the audio in each training set and analyze it statistically.
The discrete Gini coefficient of this design is suitable for dimension reduction of many kinds of audio fingerprints. The following audio types are selected as analysis objects, and the discrete Gini coefficients of each fingerprint dimension are statistically analyzed:
as shown in fig. 5, discrete kini coefficients of each dimension of an audio fingerprint of an abnormal sound are obtained by counting discrete kini coefficients of 32 dimensions of the audio fingerprint of the abnormal sound from an audio fingerprint library, audio fingerprint dimension information lower than a threshold value can be omitted by setting a threshold value of the discrete kini coefficients of the audio fingerprint, audio fingerprint dimension information higher than the threshold value is retained, the threshold value range of the discrete kini coefficients of the abnormal sound audio fingerprint is 0.36-0.38, it can be seen from the figure that the discrete kini coefficients of each dimension of the audio fingerprint are different in size, the discrete kini coefficients of some dimensions are smaller, such as dimensions 2, 25 and 26, and the discrete kini coefficients of some dimensions are larger, such as dimensions 11, 20 and 28, and the size of the discrete kini coefficients of the fingerprint represents the difference of the audio fingerprint data of the dimension. At this time, if the threshold of the discrete kini coefficient of the fingerprint is set to 0.37, it can be seen from the figure that there are 2 nd, 5 th, 24 th, 25 th and 26 th dimensions of the discrete kini coefficient of the fingerprint smaller than 0.37, which shows that the differences of the fingerprint data of each frame of the 5 dimensions are small and can be eliminated, and the dimension of each frame of the audio fingerprint is reduced by eliminating the dimensions with the small differences, thereby reducing the data volume of the audio fingerprint database.
FIG. 6 shows the discrete Gini coefficients of each dimension of the normal-speech-class audio fingerprint, obtained by statistics over the 32 dimensions of normal-speech fingerprints from the audio fingerprint library. By setting a threshold on the discrete Gini coefficient, fingerprint dimensions below the threshold can be discarded and dimensions above it retained; the threshold range is again 0.36-0.38. As the figure shows, the dimensions with smaller coefficients include the 2nd, 22nd and 25th, and the dimensions with larger coefficients include the 18th, 26th and 28th. If the threshold is set to 0.37, the 2nd, 22nd and 25th dimensions fall below it, meaning the per-frame fingerprint data of these 3 dimensions differ little and can be discarded.
FIG. 7 shows the discrete Gini coefficients of each dimension of the song-class audio fingerprint, obtained by statistics over the 32 dimensions of song-class fingerprints from the audio fingerprint library. With a threshold set on the discrete Gini coefficient (threshold range again 0.36-0.38), dimensions below the threshold can be discarded and dimensions above it retained. As the figure shows, the coefficients of the song-class fingerprint differ considerably across dimensions and are relatively dispersed: the 2nd, 14th, 15th and 25th dimensions have small coefficients, while the 4th, 5th, 11th, 18th and 29th have large ones. If the threshold is set to 0.37, six dimensions, including the 2nd, 14th, 15th, 25th and 26th, fall below it, meaning the per-frame fingerprint data of these 6 dimensions differ little and can be discarded.
These figures show that although the discrete Gini coefficients of the fingerprints of the various audio classes differ, and the coefficients trend differently across dimensions, in every class several dimensions have small discrete Gini coefficients, meaning the audio fingerprint is poorly discriminative there. The data in those dimensions can therefore be removed, converting the 32-dimensional audio fingerprint into a lower dimension and effectively reducing the fingerprint data volume.
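The threshold-based pruning described above can be sketched in a few lines; 0.37 is the example threshold used in the text:

```python
import numpy as np

def reduce_fingerprint(F, gini, threshold=0.37):
    # Keep only the fingerprint dimensions whose discrete Gini coefficient
    # exceeds the trained threshold; the rest are deleted, reducing the
    # per-frame fingerprint from 32 dimensions to len(keep).
    keep = np.flatnonzero(gini > threshold)
    return F[:, keep], keep
```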

Claims (2)

1. An audio fingerprint dimension reduction method based on discrete Gini coefficients, characterized by comprising the following steps:
step 1, constructing a target sound library in a classified manner
Classifying the audios to build a library according to the audio characteristic types or the existing data conditions;
step2, extracting fingerprint characteristics of sample audios in classification mode
Selecting various types of audio data from the constructed audio library as original sample audio, extracting fingerprint features of the original samples by type, and introducing the discrete Gini coefficient to reduce the dimension of the fingerprint features, the specific process comprising the following steps:
step2.1: pre-processing the original sample audio, the pre-processing comprising: band-pass filtering, pre-emphasis, windowing and framing;
step2.2: fingerprint feature extraction is carried out on preprocessed audio data
(1) Performing a discrete Fourier transform on each frame of the framed audio signal, the transform formula being as follows:
X(k) = Σp x(p)*e^(-j2πkp/T), p = 0, 1, …, T-1
wherein X(k) is the frequency-domain signal, x(p) is the time-domain signal, k is the frequency index, and T is the sample length of the discrete Fourier transform;
(2) dividing the discrete-Fourier-transformed frequency-domain signal into spectrum sub-bands: 33 non-overlapping frequency bands are selected from the spectrum, distributed over the range 20-4000 Hz and equally spaced on a logarithmic scale, and the starting frequency of the mth sub-band, namely the ending frequency f(m) of the (m-1)th sub-band, can be expressed as follows:
f(m) = Fmin*(Fmax/Fmin)^(m/M), m = 0, 1, …, M
wherein Fmin is a mapping lower limit, Fmax is a mapping upper limit, and M is the number of subbands, here 33;
(3) calculating the energy of each sub-band of each frame of audio, i.e., the energy of each of the 33 selected non-overlapping frequency bands, the energy of the mth sub-band of the nth frame being:
E(n,m) = Σk |X(k)|², summed over the bins k with f(m) ≤ fk < f(m+1)
wherein f(m) is the starting frequency of the mth sub-band, f(m+1) is the ending frequency of the mth sub-band, and X(k) is the frequency-domain signal of the nth frame after the discrete Fourier transform;
(4) generating a sub-fingerprint of each frame of audio, specifically: performing bit-difference discrimination on the 33 sub-band energies of each frame to generate the 32-bit binary code of each frame of audio, namely its sub-fingerprint, F(n,m) being the corresponding binary bit information, with the specific discrimination formula:
F(n,m) = 1 if E(n,m) - E(n,m+1) > 0, otherwise F(n,m) = 0
wherein E(n,m) is the mth sub-band energy of the nth frame, and m = 1, 2, …, 32;
step3, reducing the dimension of the audio fingerprint characteristics of various samples
Step3.1: the discrete kini coefficient of each dimensionality of the audio fingerprints of various samples is obtained, and the method specifically comprises the following steps:
(1) processing the audio fingerprints of the various sample types frame by frame and dividing them into N groups;
(2) constructing the jth-dimension cumulative fingerprint data vector Sj = [Sj(1), Sj(2), …, Sj(N)] of the audio fingerprints of each sample type, with the specific calculation formula:
Sj(i) = Σn F(n,j), summed over the frames n of the first i groups, i = 1, 2, …, N
wherein i is the group number and F(n,j) is the binary bit information in step 2;
(3) constructing the jth-dimension cumulative fingerprint data proportion vector Wj = [Wj(1), Wj(2), …, Wj(N)] of the audio fingerprints of each sample type, the elements in the vector being defined as:
Wj(i) = Sj(i)/Sj(N), i = 1, 2, …, N;
(4) computing the discrete Gini coefficient of each dimension of the audio fingerprints of each sample type, with the specific calculation formula:
Gj = 1 - (1/N)*Σi [Wj(i-1) + Wj(i)], i = 1, 2, …, N, with Wj(0) = 0
wherein i is the group number and Wj(i) is the jth-dimension cumulative fingerprint data proportion of the ith group, obtained in the preceding step (3);
step3.2: performing fingerprint dimensionality reduction on discrete kini coefficients of all dimensionalities of audio fingerprints of various types of samples
Forming training sets of the corresponding categories from the various sample data, statistically analyzing the discrete Gini coefficient of each audio fingerprint dimension of each training set to obtain the threshold of the discrete Gini coefficient of each category of audio fingerprint, and then performing the dimension reduction operation on the audio to be reduced according to the obtained threshold: the audio fingerprint dimension information above the threshold is retained, the audio fingerprint dimension information below the threshold is deleted, and the dimension reduction is finished.
2. The discrete-Gini-coefficient-based audio fingerprint dimension reduction method according to claim 1, characterized in that:
selecting 8kHz sampled audio signals as processing objects in the step2, and selecting a band-pass filter with a passband ranging from 20Hz to 4000Hz to process the sampled audio signals, wherein the band-pass filter is a finite impulse response FIR filter;
the pre-emphasis process is shown as follows:
s(p)=y(p)-μ*y(p-1)
wherein y(p) is the band-pass-filtered signal, μ is the pre-emphasis coefficient, and s(p) is the pre-emphasized signal, the pre-emphasis preferably being implemented with a 6 dB/octave digital filter;
the windowing and framing processing specifically refers to framing the pre-emphasized audio by overlapped segmentation, with a 75% overlap rate kept between frames and each frame weighted by a Hanning window of the same length, the windowing formula being:
w(p) = 0.5*(1 - cos(2πp/(T-1))), p = 0, 1, …, T-1
wherein, T is the length of the hanning window and is also the frame length of one frame of audio.
CN201910784077.5A 2019-08-23 2019-08-23 Audio fingerprint dimension reduction method based on discrete Gini coefficient Active CN110600038B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910784077.5A CN110600038B (en) 2019-08-23 2019-08-23 Audio fingerprint dimension reduction method based on discrete kini coefficient

Publications (2)

Publication Number Publication Date
CN110600038A true CN110600038A (en) 2019-12-20
CN110600038B CN110600038B (en) 2022-04-05

Family

ID=68855402



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant