CN110600038A - Audio fingerprint dimension reduction method based on discrete Gini coefficient - Google Patents
Audio fingerprint dimension reduction method based on discrete Gini coefficient
- Publication number
- CN110600038A CN110600038A CN201910784077.5A CN201910784077A CN110600038A CN 110600038 A CN110600038 A CN 110600038A CN 201910784077 A CN201910784077 A CN 201910784077A CN 110600038 A CN110600038 A CN 110600038A
- Authority
- CN
- China
- Prior art keywords
- audio
- fingerprint
- discrete
- dimension
- Gini
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification techniques
- G10L17/02—Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction
- G10L17/04—Training, enrolment or model building
- G10L17/06—Decision making techniques; Pattern matching strategies
- G10L17/08—Use of distortion metrics or a particular distance between probe pattern and reference templates
- G10L17/14—Use of phonemic categorisation or speech recognition prior to speaker recognition or verification
- G10L21/0208—Noise filtering
- G10L25/51—Speech or voice analysis techniques specially adapted for comparison or discrimination
- G10L25/54—Speech or voice analysis techniques specially adapted for comparison or discrimination, for retrieval
Abstract
The invention relates to an audio fingerprint dimensionality reduction method based on discrete Gini coefficient calculation, aimed at the problem of high-dimensional audio fingerprint features. It comprises building a target sound bank by class, extracting fingerprint features of sample audio, and introducing discrete Gini coefficients to reduce the dimensionality of the audio fingerprint features. According to the invention, a discrete Gini coefficient is computed for every dimension of the audio fingerprint; the coefficient of each dimension reflects how well different audio signals can be discriminated in that dimension, and the dimensionality is reduced by keeping the dimensions with large discrete Gini coefficients and deleting those with small ones. The sample audio fingerprint database constructed from the reduced fingerprint features has a smaller data volume and a higher utilization rate.
Description
Technical Field
The invention belongs to the field of intelligent audio applications, and particularly relates to an audio fingerprint dimension reduction method based on discrete Gini coefficient calculation.
Background
In recent years, intelligent systems have become popular and widely researched. Intelligent identification of audio is an important cornerstone of artificial-intelligence development, and it cannot be separated from audio feature extraction. Among the many audio features, the audio fingerprint has been the most popular in recent years: it is a compact, content-based digital signature that represents the important acoustic features of a piece of audio, its main purpose being to represent a large amount of audio data with a small amount of digital information. Compared with traditional audio features, it has the advantages of small data volume, strong noise robustness, and a relatively simple feature-extraction process, and is widely applied in music identification, advertisement supervision, copyright protection and other fields. However, audio fingerprints suffer from high dimensionality, which slows recognition speed in audio identification and occupies a large amount of computer memory. If the dimensionality of the audio fingerprint can be reduced, its data volume can be reduced to a great extent, while the speed of audio retrieval is improved and audio-identification performance is enhanced.
Disclosure of Invention
Aiming at the problem of high-dimensional audio fingerprint features, the invention introduces a fingerprint discrete Gini coefficient for each dimension of the audio fingerprint; the discrete Gini coefficient of a dimension reflects how well different audio signals can be distinguished in that dimension. The larger the discrete Gini coefficient of a given fingerprint bit, the larger the difference between different audio signals at that bit and the better its distinguishability; conversely, a small coefficient indicates poor distinguishability. Therefore, by retaining the bits with good distinguishability and removing those with poor distinguishability, a high-dimensional audio fingerprint can be converted to a lower dimension, effectively reducing the fingerprint's data volume.
The technical scheme of the invention solves the problem of an oversized audio fingerprint feature library: the sample audio fingerprint library constructed from the reduced fingerprint features has a smaller data volume and a higher utilization rate. The invention mainly comprises the following steps:
Step 1: Constructing the target sound library by class
The design classifies the audio and builds the library according to audio feature types or the existing data conditions. Because different audio types have different characteristics, classifying the audio makes it easier to find commonalities among audio features; skipping classification would degrade the quality of the fingerprint dimension reduction. The existing audio data are stored by class, and fingerprint features are then extracted for each class. The audio fingerprint extraction algorithm flow chart is shown in fig. 1.
Step 2: Extracting fingerprint features of the sample audio
Various types of audio data are selected from the constructed audio library as the original sample audio. Fingerprint features are extracted from the original samples, and discrete Gini coefficients are introduced to reduce the dimensionality of the fingerprint features. The specific process is as follows:
Step 2.1: Preprocessing the target audio
Before extracting the data features, preprocessing is performed first. Preprocessing comprises band-pass filtering, pre-emphasis, and framing.
(1) An audio signal sampled at 8 kHz is taken as the processing object and band-pass filtered to extract the frequency components most important to human hearing; a band-pass filter with a pass band of 20 Hz-4000 Hz is selected to process the signal. In this design a finite impulse response (FIR) filter performs the band-pass filtering:
y(p) = Σ_{l=0}^{T−1} h(l)·x(p−l)
where T is the number of sampling points of the processed signal, p is the time-domain index, h(l) are the FIR filter coefficients, x(p) is the input signal, and y(p) is the band-pass-filtered signal.
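As a sketch, the FIR band-pass step above is a direct convolution of the input with the filter coefficients. The kernel `h` below is a hypothetical stand-in, since the patent does not give the actual coefficients or filter order:

```python
import numpy as np

def bandpass_fir(x, h):
    """Apply an FIR filter: y(p) = sum_l h(l) * x(p - l)."""
    # Truncate the full convolution so the output has the input's length.
    return np.convolve(x, h, mode="full")[: len(x)]

# Hypothetical 3-tap kernel standing in for a 20 Hz-4000 Hz band-pass design.
h = np.array([0.1, 0.8, 0.1])
x = np.array([1.0, 0.0, 0.0, 0.0])  # unit impulse input
y = bandpass_fir(x, h)              # impulse response reproduces the kernel
```

With an impulse input, the output simply replays the filter coefficients, which is a quick sanity check on the convolution indexing.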
(2) The design uses a 6 dB/octave digital filter to pre-emphasize the band-pass-filtered signal y(p), boosting the high-frequency content of the preprocessed signal so that the spectrum becomes relatively flat and the speech signal can be handled with the same signal-to-noise ratio over the whole band from low to high frequency. The pre-emphasis is:
s(p) = y(p) − μ·y(p−1)
where μ is the pre-emphasis coefficient, taken as 0.96, and s(p) is the pre-emphasized signal.
(3) The pre-emphasized signal is windowed and framed. Frames could be cut contiguously, but overlapped framing is generally used so that the transition between frames is smooth and frame continuity is maintained. The audio is framed with a frame length of 0.064 s and 75% overlap between frames, and each frame is weighted with a Hanning window of the same length:
w(p) = 0.5·(1 − cos(2πp/(T − 1))), 0 ≤ p ≤ T − 1
where T is the length of the Hanning window, which is also the frame length of one frame of audio.
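The pre-emphasis and overlapped framing steps can be sketched as below. At 8 kHz, a 0.064 s frame is 512 samples and 75% overlap gives a 128-sample hop; the function names are illustrative, not from the patent:

```python
import numpy as np

def preemphasize(y, mu=0.96):
    """s(p) = y(p) - mu * y(p-1), with s(0) = y(0)."""
    s = np.copy(y)
    s[1:] -= mu * y[:-1]
    return s

def frame_and_window(s, fs=8000, frame_sec=0.064, overlap=0.75):
    """Cut the signal into overlapped frames, each weighted by a Hanning window."""
    T = int(fs * frame_sec)        # 512 samples per frame
    hop = int(T * (1 - overlap))   # 128-sample hop for 75% overlap
    window = np.hanning(T)
    n_frames = 1 + (len(s) - T) // hop
    return np.stack([s[i * hop: i * hop + T] * window for i in range(n_frames)])

frames = frame_and_window(preemphasize(np.random.randn(8000)))
```

One second of 8 kHz audio yields 59 windowed frames of 512 samples each under these settings.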
Step2.2: fingerprint feature extraction for audio data
A digital audio fingerprint can be regarded as the condensed essence of a piece of audio, containing the part of the audio data most important to hearing. Since the fingerprint's dimensionality is a key factor in its data volume and retrieval speed, this technique introduces discrete Gini coefficients for feature dimension reduction of the audio fingerprint; before the reduction, fingerprint features are first extracted from the audio, as follows:
(1) Apply the discrete Fourier transform to each frame of the framed audio signal; the transform formula is:
X(k) = Σ_{p=0}^{T−1} x(p)·e^{−j2πkp/T}, k = 0, 1, …, T−1
where X(k) is the frequency-domain signal, x(p) is the time-domain signal, k is the frequency index, and T is the sample length of the discrete Fourier transform.
(2) Divide the DFT spectrum into sub-bands. 33 non-overlapping frequency bands are selected, distributed over 20-4000 Hz (where the human ear's recognition of audio is mainly concentrated) and logarithmically equally spaced (because the ear's response to different frequencies is not linear). The starting frequency of the m-th sub-band, which is also the ending frequency of the (m−1)-th sub-band, can be expressed as
f(m) = Fmin·(Fmax/Fmin)^{m/M}, m = 0, 1, …, M
where Fmin is the lower mapping limit (here 20 Hz), Fmax is the upper mapping limit (here 4000 Hz), and M is the number of sub-bands (here 33).
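Under the stated parameters (Fmin = 20 Hz, Fmax = 4000 Hz, M = 33), the logarithmically equally spaced band edges can be computed as a short sketch:

```python
import numpy as np

def subband_edges(fmin=20.0, fmax=4000.0, M=33):
    """f(m) = fmin * (fmax/fmin)**(m/M): M+1 edges bounding M log-spaced bands."""
    m = np.arange(M + 1)
    return fmin * (fmax / fmin) ** (m / M)

edges = subband_edges()  # 34 edges: edges[0] = 20 Hz, edges[-1] = 4000 Hz
```

Because the spacing is logarithmic, the ratio between consecutive edges is constant, which is what "equally logarithmically spaced" means here.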
(3) Calculate the energy of each sub-band of each frame, i.e. the energy of each of the selected 33 non-overlapping bands. With the m-th sub-band starting at frequency f(m) and ending at f(m+1), and X(k) the discrete-Fourier-transformed frequency-domain signal, the energy of the m-th sub-band of the n-th frame is
E(n, m) = Σ_{k = f(m)}^{f(m+1)−1} |X(k)|²
(4) Generate each frame's sub-fingerprint. A bit-difference decision over the 33 sub-band energies of a frame produces the frame's 32-bit binary code (sub-fingerprint). With E(n, m) the m-th sub-band energy of the n-th frame and F(n, m) the corresponding binary bit, each frame's binary fingerprint bits are decided as
F(n, m) = 1 if E(n, m) − E(n, m+1) > 0, and F(n, m) = 0 otherwise, m = 1, 2, …, 32
Each frame of audio thus yields one 32-dimensional binary sub-fingerprint. A single sub-fingerprint carries little information, and an audio fingerprint feature typically consists of many sub-fingerprints.
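Steps (1)-(4) can be sketched as one function that turns a 512-sample frame into a 32-bit sub-fingerprint. The energy-difference bit rule follows the description above; the helper is illustrative, not the patent's exact implementation:

```python
import numpy as np

def subfingerprint(frame, fs=8000, fmin=20.0, fmax=4000.0, M=33):
    """One windowed frame -> 32-bit sub-fingerprint via band-energy differences."""
    T = len(frame)
    spectrum = np.abs(np.fft.rfft(frame)) ** 2          # per-bin energy |X(k)|^2
    freqs = np.fft.rfftfreq(T, d=1.0 / fs)              # bin center frequencies
    edges = fmin * (fmax / fmin) ** (np.arange(M + 1) / M)
    energy = np.array([
        spectrum[(freqs >= edges[m]) & (freqs < edges[m + 1])].sum()
        for m in range(M)
    ])
    # Bit m is 1 when band m carries more energy than band m+1 (32 bits from 33 bands).
    return (energy[:-1] > energy[1:]).astype(int)

bits = subfingerprint(np.random.randn(512))
```

Each call yields a length-32 vector of 0/1 bits, one sub-fingerprint per frame.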
Step 3: Reducing the dimensionality of the audio fingerprint features
After the fingerprint extraction process, each frame of audio data yields one 32-dimensional binary sub-fingerprint. For a segment of audio, the fingerprint therefore consists of many binary sub-fingerprints, and its data volume is still large; in practical applications it is desirable to reduce the fingerprint dimensionality further so as to effectively reduce the data volume. The design provides an audio fingerprint dimension reduction method based on discrete Gini coefficient calculation: a discrete Gini coefficient is computed for each dimension of the audio fingerprint, and the fingerprint dimension is reduced according to the size of each dimension's coefficient.
The per-dimension discrete Gini coefficients are obtained by taking every 50 frames of the audio fingerprint as a group. The discrete Gini coefficient of a dimension reflects the dispersion of that dimension's fingerprint data, i.e. how much it differs across audio: the larger the discrete Gini coefficient at a given fingerprint position, the larger the difference between different audio signals there and the better its distinguishability, and conversely the poorer it is. The design keeps the dimensions with good distinguishability and removes those with poor distinguishability, converting the 32-dimensional audio fingerprint to a lower dimension and reducing the fingerprint's data volume.
Step3.1: calculating discrete kini coefficient of each dimension of audio fingerprint
The derivation process and the specific steps are as follows:
(1) Obtain the discrete Lorenz curve of the audio fingerprint. The discrete Lorenz curve is the key curve for deriving the discrete Gini coefficient; it is formed from the cumulative-proportion vector w_j of the fingerprint data, where j is the dimension index of the audio fingerprint, j = 1, 2, …, 32. The cumulative-proportion vector w_j is computed as follows:
The audio fingerprints of each class in the audio fingerprint database are processed by frames, every 50 frames of fingerprint data forming one group, N groups in total. The j-th-dimension cumulative fingerprint-data vector is constructed as
Q_j = (q_{j,1}, q_{j,2}, …, q_{j,N})
where q_{j,i} is the sum of the j-th-dimension fingerprint data over the 50 frames of group i, and the groups are numbered so that q_{j,1} ≤ q_{j,2} ≤ … ≤ q_{j,N}. The j-th-dimension cumulative-proportion vector w_j is then constructed with elements
w_{j,i} = (Σ_{t=1}^{i} q_{j,t}) / (Σ_{t=1}^{N} q_{j,t}), i = 1, 2, …, N
The curve formed by the elements of the proportion vector is the discrete Lorenz curve. Specifically, the discrete Lorenz curve of the j-th dimension of the audio fingerprint is drawn by taking the normalized cumulative group number i/N as the abscissa and the j-th-dimension cumulative proportion w_{j,i} as the ordinate; joining the discrete points (i/N, w_{j,i}) yields the discrete Lorenz curve of the j-th dimension.
(2) Obtain the discrete Gini coefficients of every dimension of each class of audio fingerprint. With the obtained discrete Lorenz curve as the boundary line, the Gini coefficient of the j-th dimension of the audio fingerprint is
G_j = S_a / (S_a + S_b)
where S_a is the closed area enclosed by the diagonal OA and the discrete Lorenz curve, with point O at (0, 0) and point A at (1, 1); S_b is the closed area enclosed by the coordinate segments OB and BA and the discrete Lorenz curve, with point B at (1, 0); and G_j is the Gini coefficient of the j-th dimension. The fingerprint discrete Gini coefficient computation auxiliary graph is shown in fig. 4.
Since S_a + S_b is the closed area enclosed by the diagonal OA and the segments OB and BA, S_a + S_b = 1/2. Because the audio fingerprint is discrete, the design discretizes the formula into
G_j = 1 − (1/N)·Σ_{i=1}^{N} (w_{j,i−1} + w_{j,i}), with w_{j,0} = 0
which gives the j-th-dimension discrete Gini coefficient of the audio fingerprint, where i is the group number and w_{j,i} is the j-th-dimension cumulative proportion of the first i groups.
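A minimal sketch of the per-dimension discrete Gini computation, assuming each group's value is the sum of that dimension's fingerprint bits over its 50 frames and that groups are sorted ascending to form a valid Lorenz curve (names are illustrative):

```python
import numpy as np

def discrete_gini(groups):
    """Discrete Gini coefficient of one fingerprint dimension.

    groups: per-group sums of the dimension's fingerprint data (N groups).
    Builds cumulative shares w_0..w_N of the Lorenz curve and returns
    G = 1 - (1/N) * sum_i (w_{i-1} + w_i), the trapezoidal form of 1 - 2*S_b.
    """
    groups = np.sort(np.asarray(groups, dtype=float))   # ascending group sums
    N = len(groups)
    w = np.concatenate([[0.0], np.cumsum(groups) / groups.sum()])
    return 1.0 - (w[:-1] + w[1:]).sum() / N

# Perfectly uniform groups give G = 0; concentration pushes G toward 1.
g_equal = discrete_gini([5, 5, 5, 5])    # -> 0.0
g_skewed = discrete_gini([0, 0, 0, 20])  # -> 0.75
```

The uniform case collapses the Lorenz curve onto the diagonal OA (S_a = 0), while concentrating all data in one group maximizes the gap between the curve and the diagonal.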
Step3.2: carrying out fingerprint dimensionality reduction by counting discrete kini coefficients of all dimensions of different types of audio fingerprints
The discrete Gini coefficients of each dimension of the audio fingerprint are trained in combination with the application scenario. A training set is constructed from audio data by data type, or by combining existing data sets; the discrete Gini coefficient of each fingerprint dimension is then calculated for the audio data of each training set and statistically analyzed.
The audio fingerprint discrete Gini coefficient of the invention is suitable for dimension reduction of all kinds of audio fingerprints. The following audio types are selected as analysis objects, and the discrete Gini coefficient of each fingerprint dimension is statistically analyzed:
as shown in fig. 5, discrete kini coefficients of each dimension of an audio fingerprint of an abnormal sound are obtained by counting discrete kini coefficients of 32 dimensions of the audio fingerprint of the abnormal sound from an audio fingerprint library, audio fingerprint dimension information lower than a threshold value can be omitted by setting a threshold value of the discrete kini coefficients of the audio fingerprint, audio fingerprint dimension information higher than the threshold value is retained, the threshold value range of the discrete kini coefficients of the abnormal sound audio fingerprint is 0.36-0.38, it can be seen from the figure that the discrete kini coefficients of each dimension of the audio fingerprint are different in size, the discrete kini coefficients of some dimensions are smaller, such as dimensions 2, 25 and 26, and the discrete kini coefficients of some dimensions are larger, such as dimensions 11, 20 and 28, and the size of the discrete kini coefficients of the fingerprint represents the difference of the audio fingerprint data of the dimension. At this time, if the threshold of the discrete kini coefficient of the fingerprint is set to 0.37, it can be seen from the figure that there are 2 nd, 5 th, 24 th, 25 th and 26 th dimensions of the discrete kini coefficient of the fingerprint smaller than 0.37, which shows that the differences of the fingerprint data of each frame of the 5 dimensions are small and can be eliminated, and the dimension of each frame of the audio fingerprint is reduced by eliminating the dimensions with the small differences, thereby reducing the data volume of the audio fingerprint database.
FIG. 6 shows the discrete Gini coefficients of each dimension of the normal-speech audio fingerprint, obtained by computing statistics over its 32 fingerprint dimensions from the audio fingerprint library. By setting a threshold on the fingerprint discrete Gini coefficient, dimensions below the threshold can be discarded while dimensions above it are retained; for normal-speech fingerprints the threshold range is 0.36-0.38. The figure shows that the dimensions with smaller discrete Gini coefficients include the 2nd, 22nd and 25th, and those with larger coefficients include the 18th, 26th and 28th. If the threshold is set to 0.37, the 2nd, 22nd and 25th dimensions fall below it, indicating that the per-frame fingerprint data of these 3 dimensions differ little and can be discarded.
FIG. 7 shows the discrete Gini coefficients of each dimension of the song-class audio fingerprint, obtained by computing statistics over its 32 fingerprint dimensions from the audio fingerprint library. Dimensions below a set threshold can be discarded while dimensions above it are retained; for song-class fingerprints the threshold range is 0.36-0.38. The figure shows relatively large differences among the coefficients, which are relatively dispersed: the 2nd, 14th, 15th and 25th dimensions have small coefficients, while the 4th, 5th, 11th, 18th and 29th have large ones. If the threshold is set to 0.37, six dimensions, including the 2nd, 14th, 15th, 25th and 26th, fall below it, indicating that the per-frame fingerprint data of these 6 dimensions differ little and can be discarded.
These figures show that although the fingerprint discrete Gini coefficients and their trends across dimensions differ among audio types, several dimensions always have small coefficients, meaning the audio fingerprints are poorly distinguishable in those dimensions. The data in those dimensions can therefore be removed, converting the 32-dimensional audio fingerprint to a lower dimension and effectively reducing the fingerprint's data volume.
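The retention step can be sketched as simple threshold-based column selection over the per-dimension Gini coefficients (the 0.37 threshold and the values below are illustrative):

```python
import numpy as np

def select_dimensions(gini, threshold=0.37):
    """Return 0-based indices of fingerprint dimensions whose discrete Gini
    coefficient is at or above the threshold (the well-distinguishing ones)."""
    return np.flatnonzero(np.asarray(gini) >= threshold)

def reduce_fingerprints(fps, keep):
    """fps: (n_frames, 32) binary fingerprint matrix; keep only chosen columns."""
    return fps[:, keep]

gini = np.array([0.30, 0.40, 0.38, 0.20, 0.45])   # toy 5-dimension example
keep = select_dimensions(gini)                     # dimensions 1, 2, 4 survive
reduced = reduce_fingerprints(np.zeros((10, 5), dtype=int), keep)
```

Dropping the below-threshold columns shrinks every stored sub-fingerprint, which is exactly where the database-size saving comes from.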
The invention has the advantages that:
1. Low algorithm complexity and greater flexibility
2. Smaller audio-feature data volume than traditional audio features
3. The discrete Gini coefficient, introduced as the dimension-reduction algorithm, has low complexity and can process large batches of data
4. Higher robustness of the audio fingerprint after dimension reduction
Drawings
FIG. 1 is a flow chart of an audio fingerprint extraction algorithm
FIG. 2 is a 32-D audio fingerprint block diagram of a segment of audio
FIG. 3 Lorenz curve of the 1st dimension of an abnormal-sound audio fingerprint
FIG. 4 is an auxiliary graph for the discrete Gini coefficient calculation of the audio fingerprint
FIG. 5 is a graph of discrete Gini coefficients for each dimension of an abnormal-sound audio fingerprint
FIG. 6 is a graph of discrete Gini coefficients for each dimension of a normal-speech audio fingerprint
FIG. 7 is a graph of discrete Gini coefficients for each dimension of a song-class audio fingerprint
Detailed Description
The invention provides an audio fingerprint dimension reduction method based on discrete Gini coefficient calculation.
The technical scheme of the invention solves the problem of an oversized audio fingerprint feature library: the sample audio fingerprint library constructed from the reduced fingerprint features has a smaller data volume and a higher utilization rate. The invention mainly comprises the following steps:
Step 1: Constructing the target sound library by class
The design classifies the audio and builds the library according to audio feature types or the existing data conditions. Because different audio types have different characteristics, classifying the audio makes it easier to find commonalities among audio features; skipping classification would degrade the quality of the fingerprint dimension reduction.
The audio fingerprint discrete Gini coefficient of the invention is suitable for dimension reduction of all kinds of audio fingerprints, so the existing audio data must be stored by class and the fingerprint features of each class extracted separately. The audio fingerprint extraction algorithm flow chart is shown in fig. 1.
Step 2: Extracting fingerprint features of the sample audio
Various types of audio data are selected from the constructed audio library as the original sample audio. Fingerprint features are extracted from the original samples, and discrete Gini coefficients are introduced to reduce the dimensionality of the fingerprint features. The specific process is as follows:
Step 2.1: Preprocessing the target audio
Before extracting the data features, preprocessing is performed first. Preprocessing comprises band-pass filtering, pre-emphasis, and framing.
(1) An audio signal sampled at 8 kHz is taken as the processing object and band-pass filtered; to extract the frequency components most important to human hearing, a band-pass filter with a pass band of 20 Hz-4000 Hz is selected to process the signal. In this design a finite impulse response (FIR) filter performs the band-pass filtering:
y(p) = Σ_{l=0}^{T−1} h(l)·x(p−l)
where T is the number of sampling points of the processed signal, p is the time-domain index, h(l) are the FIR filter coefficients, x(p) is the input signal, and y(p) is the band-pass-filtered signal.
(2) The design uses a 6 dB/octave digital filter to pre-emphasize the band-pass-filtered signal y(p), boosting the high-frequency content of the preprocessed signal so that the spectrum becomes relatively flat and the speech signal can be handled with the same signal-to-noise ratio over the whole band from low to high frequency. The pre-emphasis is:
s(p) = y(p) − μ·y(p−1)
where μ is the pre-emphasis coefficient, taken as 0.96, and s(p) is the pre-emphasized signal.
(3) The pre-emphasized signal is windowed and framed. Although frames could be segmented contiguously, overlapped segmentation is generally used to smooth the transition between frames and maintain their continuity. The audio is framed with a frame length of 0.064 seconds and 75% overlap between adjacent frames, and each frame is weighted with a Hanning window of the same length. The windowing formula (the original formula image is not reproduced here; reconstructed as the standard Hanning window) is:

w(p) = 0.5 - 0.5·cos(2πp/(T - 1)), p = 0, 1, …, T - 1

wherein T is the length of the Hanning window, which is also the frame length of one frame of audio.
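The 0.064 s frames with 75% overlap and Hanning weighting described above can be sketched as:

```python
import numpy as np

def frame_and_window(s, fs=8000, frame_sec=0.064, overlap=0.75):
    """Split a signal into overlapping frames, each weighted by a Hanning window."""
    frame_len = int(fs * frame_sec)           # 512 samples at 8 kHz
    hop = int(frame_len * (1 - overlap))      # 128-sample hop -> 75% overlap
    n_frames = 1 + (len(s) - frame_len) // hop
    w = np.hanning(frame_len)                 # 0.5 - 0.5*cos(2*pi*p/(T-1))
    return np.stack([s[i * hop : i * hop + frame_len] * w
                     for i in range(n_frames)])

frames = frame_and_window(np.random.default_rng(0).standard_normal(8000))
```

One second of 8 kHz audio yields 59 frames of 512 windowed samples each.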
Step2.2: fingerprint feature extraction for audio data
A digital audio fingerprint can be regarded as the concentrated essence of a piece of audio: it captures the perceptually most important part of the audio data. The dimension of the audio fingerprint is a key factor in both the fingerprint data volume and the retrieval speed, so this technique introduces the discrete Gini coefficient to reduce the fingerprint's dimensionality. Before dimension reduction, fingerprint features are first extracted from the audio, as follows:
(1) A discrete Fourier transform is applied to each frame of the framed audio signal; the transform formula (the original formula image is not reproduced here; reconstructed as the standard DFT) is:

X(k) = Σ x(p)·e^(-j2πkp/T), p = 0, 1, …, T - 1

where X(k) is the frequency-domain signal, x(p) is the time-domain signal, k is the frequency index, and T is the sample length of the discrete Fourier transform.
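The per-frame DFT can be sketched with NumPy's FFT, which evaluates the same sum for all k at once:

```python
import numpy as np

T = 512
p = np.arange(T)
x = np.sin(2 * np.pi * 64 * p / T)     # tone landing exactly on bin k = 64

# X(k) = sum over p of x(p) * exp(-1j * 2*pi*k*p / T)
X = np.fft.fft(x)
```

For a unit-amplitude sinusoid on bin k, |X(k)| = T/2, which makes the transform easy to verify.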
(2) The frequency-domain signal after the discrete Fourier transform is divided into spectrum sub-bands: 33 non-overlapping frequency bands are selected from the spectrum, distributed over the range 20-4000 Hz (where human-ear recognition of audio is mainly concentrated) with logarithmically equal spacing (because the ear's response to different frequencies is not linear). The starting frequency of the mth sub-band, i.e. the ending frequency f(m) of the (m-1)th sub-band, can be expressed (the original formula image is not reproduced here; reconstructed as log-spaced boundaries) as:

f(m) = Fmin·(Fmax/Fmin)^(m/M), m = 0, 1, …, M

where Fmin is the lower mapping limit, here 20 Hz, Fmax is the upper mapping limit, here 4000 Hz, and M is the number of sub-bands, here 33.
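The 34 logarithmically spaced boundaries implied by the reconstructed formula can be generated as follows (the closed form is an assumption consistent with "equally logarithmically spaced"):

```python
import numpy as np

def band_edges(f_min=20.0, f_max=4000.0, n_bands=33):
    """M+1 logarithmically equally spaced boundaries for M sub-bands:
    f(m) = f_min * (f_max / f_min) ** (m / M)."""
    m = np.arange(n_bands + 1)
    return f_min * (f_max / f_min) ** (m / n_bands)

edges = band_edges()   # edges[m] = start of band m = end of band m-1
```

Logarithmic spacing means the ratio between consecutive edges is constant, which is the property the test below checks.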
(3) The energy of each sub-band of each frame of audio is calculated for the 33 selected non-overlapping bands. With f(m) the starting frequency of the mth sub-band, f(m+1) its ending frequency, and X(k) the frequency-domain signal after the discrete Fourier transform, the energy of the mth sub-band of the nth frame (the original formula image is not reproduced here; reconstructed as a band energy sum) is:

E(n, m) = Σ |X(k)|², summed over f(m) ≤ k < f(m+1)
(4) A sub-fingerprint is generated for each frame of audio: bit-difference discrimination is performed on the 33 sub-band energies obtained for the frame, yielding a 32-bit binary code, i.e. the frame's sub-fingerprint. With E(n, m) the mth sub-band energy of the nth frame and F(n, m) the corresponding binary bit information, a discrimination rule consistent with this description (the original formula image is not reproduced here; reconstructed as an adjacent-band energy comparison) is:

F(n, m) = 1 if E(n, m) - E(n, m+1) > 0, otherwise F(n, m) = 0
as can be seen from the above formula, each frame of audio finally generates a 32-dimensional binary sub-fingerprint information, the sub-fingerprint contains less information, and an audio fingerprint feature often consists of a plurality of sub-fingerprints.
In practical applications, the 32-dimensional binary sub-fingerprint generated from one frame of audio carries too little information to accurately retrieve or identify target audio. Audio retrieval and identification therefore commonly use an audio fingerprint block, formed by combining at least 256 audio sub-fingerprints, such as the 32-dimensional audio fingerprint block extracted from a section of audio shown in Fig. 2.
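Steps (3) and (4) above, band energies followed by bit derivation, can be sketched for one frame as follows. The bit rule used here (bit m = 1 iff E(m) > E(m+1)) is one plausible reading of the patent's "bit difference discrimination"; the exact rule is not reproduced in this text:

```python
import numpy as np

def sub_fingerprint(X, edges, fs=8000):
    """33 band energies E(m) and a 32-bit sub-fingerprint for one frame.
    The adjacent-band comparison rule is an assumption (see lead-in)."""
    T = len(X)
    Xh = X[: T // 2]                              # positive-frequency half
    freqs = np.arange(T // 2) * fs / T            # bin -> frequency in Hz
    E = np.array([np.sum(np.abs(Xh[(freqs >= lo) & (freqs < hi)]) ** 2)
                  for lo, hi in zip(edges[:-1], edges[1:])])
    bits = (E[:-1] > E[1:]).astype(np.uint8)      # 33 energies -> 32 bits
    return E, bits

edges = 20.0 * (4000.0 / 20.0) ** (np.arange(34) / 33)          # log-spaced bands
X = np.fft.fft(np.sin(2 * np.pi * 64 * np.arange(512) / 512))   # 1 kHz at 8 kHz
E, bits = sub_fingerprint(X, edges)
```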
Step3, reducing the dimension of the audio fingerprint characteristics
After the audio fingerprint extraction process, each frame of audio data yields one 32-dimensional binary sub-fingerprint. For a segment of audio, the fingerprint therefore consists of many binary sub-fingerprints, and its data volume is still large; in practical applications it is desirable to reduce the fingerprint dimension further so as to cut the fingerprint data volume effectively. This design proposes an audio fingerprint dimension reduction method based on the discrete Gini coefficient: the discrete Gini coefficient is computed for each dimension of the audio fingerprint, and the fingerprint dimension is reduced according to the magnitude of each dimension's coefficient.
The data of each fingerprint dimension are grouped every 50 frames (50 is illustrative, not limiting), and the discrete Gini coefficient of that dimension is computed. The coefficient reflects the dispersion of the dimension's data, i.e. how much the audio fingerprints differ there: the larger the discrete Gini coefficient at a given position of the fingerprint, the more different audio clips differ at that position and the better its discriminability; conversely, a small coefficient means poor discriminability. The design keeps the dimensions with good discriminability, removes those with poor discriminability, and thus converts the 32-dimensional audio fingerprint to a lower dimension, reducing the fingerprint data volume.
Step3.1: calculating discrete kini coefficient of each dimension of audio fingerprint
The method comprises the following specific steps:
(1) Obtain the discrete Lorenz curve of the audio fingerprint. The discrete Lorenz curve is the key curve from which the discrete Gini coefficient is computed; it is formed from the proportion vector W_j of the accumulated fingerprint data, where j is the dimension index of the audio fingerprint, j = 1, 2, …, 32. The proportion vector W_j is computed as follows (the original formula images are not reproduced here; reconstructed from the claims):

The audio fingerprints in the fingerprint database are processed frame by frame, every 50 frames of fingerprint data forming one group, N groups in total. The jth-dimension accumulated fingerprint data vector Q_j is constructed as:

Q_j(i) = Σ F(n, j), summed over all frames n in groups 1 through i, i = 1, 2, …, N

where i is the group number and F(n, j) is the binary bit information of frame n in dimension j.

The jth-dimension accumulated fingerprint data proportion vector W_j is then defined element-wise as:

W_j(i) = Q_j(i) / Q_j(N), i = 1, 2, …, N

The curve formed by the elements of the proportion vector is the discrete Lorenz curve. Concretely, the discrete Lorenz curve of dimension j is drawn by taking the accumulated group number (normalized, i/N) as the abscissa and the accumulated fingerprint data proportion W_j(i) of dimension j as the ordinate; the discrete points (i/N, W_j(i)) are plotted, and the curve linking them is the discrete Lorenz curve of dimension j. As an example, Fig. 3 shows the Lorenz curve of the 1st dimension of an abnormal-sound audio fingerprint.
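The grouping and accumulation that produce the discrete Lorenz curve can be sketched as follows (the accumulation formula is reconstructed from the claims, since the original formula images are not reproduced):

```python
import numpy as np

def lorenz_proportions(bits_j, group=50):
    """Proportion vector W_j for one fingerprint dimension j.
    bits_j holds that dimension's 0/1 value for every frame; frames are
    grouped 50 at a time, the group sums are accumulated (Q_j), and each
    accumulated sum is divided by the grand total."""
    bits_j = np.asarray(bits_j)
    n_groups = len(bits_j) // group
    sums = bits_j[: n_groups * group].reshape(n_groups, group).sum(axis=1)
    q = np.cumsum(sums).astype(float)     # Q_j(i): accumulated fingerprint data
    return q / q[-1]                      # W_j(i): Lorenz-curve ordinates

# If every frame has bit 1, all groups are identical and the
# Lorenz curve is exactly the diagonal.
w = lorenz_proportions(np.ones(500, dtype=int))
```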
(2) Compute the discrete Gini coefficients of all dimensions of the various audio fingerprints. Taking the obtained discrete Lorenz curve as the boundary line, the Gini coefficient of dimension j of the audio fingerprint is:

G_j = S_a / (S_a + S_b)

where S_a is the closed area enclosed by the diagonal OA and the discrete Lorenz curve, with O = (0, 0) and A = (1, 1); S_b is the closed area enclosed by the coordinate segments OB and BA and the discrete Lorenz curve, with B = (1, 0); and G_j is the Gini coefficient of dimension j of the audio fingerprint. Fig. 4 shows the auxiliary diagram for computing the fingerprint's discrete Gini coefficient.

From the above, S_a + S_b is the closed area enclosed by the diagonal OA and the segments OB and BA, i.e. S_a + S_b = 1/2. Because the audio fingerprint is discrete, the design discretizes the formula (the original formula image is not reproduced here; reconstructed as a trapezoidal-rule discretization) as:

G_j = 1 - (1/N)·Σ (W_j(i-1) + W_j(i)), summed over i = 1, …, N, with W_j(0) = 0

This yields the jth-dimension discrete Gini coefficient G_j of the audio fingerprint, where i is the group number and W_j(i) is the ith accumulated fingerprint data proportion of dimension j.
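The discretized Gini computation can be sketched as follows. The trapezoidal form G_j = 1 − (1/N)·Σ(W_j(i−1) + W_j(i)) is a reconstruction consistent with S_a + S_b = 1/2, since the patent's exact discretization image is not reproduced:

```python
import numpy as np

def discrete_gini(w):
    """Discrete Gini coefficient from Lorenz ordinates w = [W(1), ..., W(N)].
    Equals 1 minus twice the trapezoidal area under the Lorenz curve,
    with W(0) = 0 prepended."""
    w0 = np.concatenate(([0.0], np.asarray(w, dtype=float)))
    n = len(w)
    area = np.sum(w0[:-1] + w0[1:]) / (2 * n)   # trapezoid rule, step 1/N
    return 1.0 - 2.0 * area

g_equal = discrete_gini(np.arange(1, 11) / 10)           # diagonal curve
g_concentrated = discrete_gini(np.r_[np.zeros(9), 1.0])  # all mass in last group
```

With this form the diagonal Lorenz curve gives exactly 0, and a maximally concentrated curve over N = 10 groups gives 1 − 1/N = 0.9.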
Step3.2: carrying out fingerprint dimensionality reduction by counting discrete kini coefficients of all dimensions of different types of audio fingerprints
The discrete Gini coefficients of the audio fingerprint dimensions are trained for the application scenario: a training set is constructed from the audio data by data type, or by combining existing data sets, and the discrete Gini coefficient of each fingerprint dimension is computed and statistically analyzed over the audio of each training set.
The discrete Gini coefficient of this design is suitable for dimension reduction of various kinds of audio fingerprints. The following audio types are selected as analysis objects, and the discrete Gini coefficient of each fingerprint dimension is statistically analyzed:
as shown in fig. 5, discrete kini coefficients of each dimension of an audio fingerprint of an abnormal sound are obtained by counting discrete kini coefficients of 32 dimensions of the audio fingerprint of the abnormal sound from an audio fingerprint library, audio fingerprint dimension information lower than a threshold value can be omitted by setting a threshold value of the discrete kini coefficients of the audio fingerprint, audio fingerprint dimension information higher than the threshold value is retained, the threshold value range of the discrete kini coefficients of the abnormal sound audio fingerprint is 0.36-0.38, it can be seen from the figure that the discrete kini coefficients of each dimension of the audio fingerprint are different in size, the discrete kini coefficients of some dimensions are smaller, such as dimensions 2, 25 and 26, and the discrete kini coefficients of some dimensions are larger, such as dimensions 11, 20 and 28, and the size of the discrete kini coefficients of the fingerprint represents the difference of the audio fingerprint data of the dimension. At this time, if the threshold of the discrete kini coefficient of the fingerprint is set to 0.37, it can be seen from the figure that there are 2 nd, 5 th, 24 th, 25 th and 26 th dimensions of the discrete kini coefficient of the fingerprint smaller than 0.37, which shows that the differences of the fingerprint data of each frame of the 5 dimensions are small and can be eliminated, and the dimension of each frame of the audio fingerprint is reduced by eliminating the dimensions with the small differences, thereby reducing the data volume of the audio fingerprint database.
Fig. 6 shows the discrete Gini coefficients of each dimension of normal-speech audio fingerprints, computed statistically over the 32 dimensions from the audio fingerprint library. Again, dimensions below the threshold can be discarded and dimensions above it retained, with a threshold range of 0.36-0.38 for this class. The figure shows that dimensions 2, 22 and 25 have small coefficients, while dimensions such as 18, 26 and 28 have large ones. If the threshold is set to 0.37, dimensions 2, 22 and 25 fall below it, meaning the per-frame fingerprint data in these 3 dimensions differ little and can be discarded.
Fig. 7 shows the discrete Gini coefficients of each dimension of song-class audio fingerprints, computed statistically over the 32 dimensions from the audio fingerprint library. As before, dimensions below the threshold can be discarded and dimensions above it retained, with a threshold range of 0.36-0.38 for this class. The figure shows that the coefficients differ considerably and are relatively dispersed across dimensions: dimensions 2, 14, 15 and 25 have small coefficients, while dimensions 4, 5, 11, 18 and 29 have large ones. If the threshold is set to 0.37, dimensions 1, 2, 14, 15, 25 and 26 fall below it, meaning the per-frame fingerprint data in these 6 dimensions differ little and can be omitted.
These figures show that, although the discrete Gini coefficients and their trends across dimensions differ between audio types, in every case several dimensions have small coefficients, meaning the audio fingerprints are poorly distinguishable there. The data in those dimensions can therefore be removed, converting the 32-dimensional audio fingerprint to a lower dimension and effectively reducing the fingerprint data volume.
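Putting the pieces together, the threshold step that completes the dimension reduction can be sketched as follows. The per-dimension Gini values and the 0.37 threshold below are illustrative stand-ins for the statistically trained values:

```python
import numpy as np

def reduce_dimensions(fingerprints, gini_per_dim, threshold=0.37):
    """Keep only the fingerprint dimensions whose discrete Gini coefficient
    exceeds the threshold; drop the poorly discriminating dimensions."""
    keep = np.flatnonzero(np.asarray(gini_per_dim) > threshold)
    return fingerprints[:, keep], keep

rng = np.random.default_rng(1)
fp = rng.integers(0, 2, size=(100, 32))      # 100 frames of 32-bit sub-fingerprints
gini = np.full(32, 0.40)
gini[[1, 24, 25]] = 0.30                     # dims 2, 25, 26 (1-based) below threshold
reduced, kept = reduce_dimensions(fp, gini)  # 32 -> 29 dimensions per frame
```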
Claims (2)
1. An audio fingerprint dimension reduction method based on the discrete Gini coefficient, characterized by comprising the following steps:
step 1, constructing a target sound library in a classified manner
Classifying the audios to build a library according to the audio characteristic types or the existing data conditions;
step2, extracting fingerprint characteristics of sample audios in classification mode
Selecting various types of audio data from a constructed audio library as original sample audio, extracting fingerprint features from the original samples by type, and introducing the discrete Gini coefficient to reduce the dimensionality of the fingerprint features, wherein the specific process comprises the following steps:
step2.1: pre-processing the original sample audio, the pre-processing comprising: band-pass filtering, pre-emphasis, windowing and framing;
step2.2: fingerprint feature extraction is carried out on preprocessed audio data
(1) performing a discrete Fourier transform on each frame of the framed audio signal, wherein the transform formula (the original formula image is not reproduced here; reconstructed as the standard DFT) is:

X(k) = Σ x(p)·e^(-j2πkp/T), p = 0, 1, …, T - 1

wherein X(k) is the frequency domain signal, x(p) is the time domain signal, k is the frequency index, and T is the sample length of the discrete Fourier transform;
(2) dividing the discrete-Fourier-transformed frequency domain signal into spectrum sub-bands: selecting 33 non-overlapping frequency bands from the spectrum, distributed over the range 20-4000 Hz with logarithmically equal spacing, wherein the starting frequency of the mth sub-band, i.e. the ending frequency f(m) of the (m-1)th sub-band, can be expressed (reconstructed as log-spaced boundaries) as:

f(m) = Fmin·(Fmax/Fmin)^(m/M), m = 0, 1, …, M

wherein Fmin is the lower mapping limit, Fmax is the upper mapping limit, and M is the number of sub-bands, here 33;
(3) calculating the energy of each sub-band of each frame of audio for the 33 selected non-overlapping bands, wherein the energy of the mth sub-band of the nth frame (reconstructed as a band energy sum) is:

E(n, m) = Σ |X(k)|², summed over f(m) ≤ k < f(m+1)

wherein f(m) is the starting frequency of the mth sub-band, f(m+1) is its ending frequency, and X(k) is the frequency domain signal of the nth frame after the discrete Fourier transform;
(4) generating the sub-fingerprint of each frame of audio, specifically: performing bit-difference discrimination on the 33 sub-band energies obtained for each frame to generate the 32-bit binary code of that frame, i.e. its sub-fingerprint, wherein F(n, m) is the binary bit information of the 32-bit code; a discrimination rule consistent with the description (reconstructed as an adjacent-band energy comparison) is:

F(n, m) = 1 if E(n, m) - E(n, m+1) > 0, otherwise F(n, m) = 0

wherein E(n, m) is the mth sub-band energy of the nth frame, and m = 1, 2, …, 32;
step3, reducing the dimension of the audio fingerprint characteristics of various samples
Step3.1: the discrete kini coefficient of each dimensionality of the audio fingerprints of various samples is obtained, and the method specifically comprises the following steps:
(1) processing the audio fingerprints of the various sample types frame by frame and dividing them into N groups;
(2) constructing the jth-dimension accumulated fingerprint data vector Q_j of the audio fingerprints of each sample type, wherein a specific calculation (reconstructed, since the original formula image is not reproduced here) is:

Q_j(i) = Σ F(n, j), summed over all frames n in groups 1 through i, i = 1, 2, …, N

wherein i is the group number and F(n, j) is the binary bit information from step 2;
(3) constructing the jth-dimension accumulated fingerprint data proportion vector W_j of the audio fingerprints of each sample type, wherein the elements of the vector are defined as:

W_j(i) = Q_j(i) / Q_j(N), i = 1, 2, …, N;
(4) obtaining the discrete Gini coefficient of every dimension of the audio fingerprints of each sample type, wherein a specific calculation (a trapezoidal discretization reconstructed to be consistent with the Lorenz-curve construction) is:

G_j = 1 - (1/N)·Σ (W_j(i-1) + W_j(i)), summed over i = 1, …, N, with W_j(0) = 0

wherein i is the group number and W_j(i) is the ith accumulated fingerprint data proportion of the jth dimension of the audio fingerprint, obtained in the preceding step (3);
step3.2: performing fingerprint dimensionality reduction on discrete kini coefficients of all dimensionalities of audio fingerprints of various types of samples
And (3) respectively forming training sets corresponding to the categories by various sample data, performing statistical analysis on the discrete kini coefficient of each dimensionality of the audio fingerprint of each training set to obtain a threshold value of the discrete kini coefficient of the audio fingerprint of each category, then performing dimensionality reduction operation on the audio to be subjected to dimensionality reduction according to the obtained threshold value, namely, retaining the audio fingerprint dimensionality information higher than the threshold value, deleting the audio fingerprint dimensionality information lower than the threshold value, and finishing dimensionality reduction.
2. The audio fingerprint dimension reduction method based on the discrete Gini coefficient according to claim 1, wherein:
selecting 8kHz sampled audio signals as processing objects in the step2, and selecting a band-pass filter with a passband ranging from 20Hz to 4000Hz to process the sampled audio signals, wherein the band-pass filter is a finite impulse response FIR filter;
the pre-emphasis process is shown as follows:
s(p)=y(p)-μ*y(p-1)
wherein y (p) is a band-pass filtered signal, μ is a pre-emphasis coefficient, and s (p) is a pre-emphasis processed signal, preferably implemented with a digital filter having 6 dB/octave;
the windowed framing specifically means framing the pre-emphasized audio with overlapping segmentation, keeping a 75% overlap between frames and weighting each frame with a Hanning window of the same length, wherein the windowing formula (reconstructed as the standard Hanning window) is:

w(p) = 0.5 - 0.5·cos(2πp/(T - 1)), p = 0, 1, …, T - 1

wherein T is the length of the Hanning window, which is also the frame length of one frame of audio.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910784077.5A CN110600038B (en) | 2019-08-23 | 2019-08-23 | Audio fingerprint dimension reduction method based on discrete kini coefficient |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110600038A true CN110600038A (en) | 2019-12-20 |
CN110600038B CN110600038B (en) | 2022-04-05 |
Family
ID=68855402
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910784077.5A Active CN110600038B (en) | 2019-08-23 | 2019-08-23 | Audio fingerprint dimension reduction method based on discrete kini coefficient |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110600038B (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6363342B2 (en) * | 1998-12-18 | 2002-03-26 | Matsushita Electric Industrial Co., Ltd. | System for developing word-pronunciation pairs |
CN108122562A (en) * | 2018-01-16 | 2018-06-05 | 四川大学 | A kind of audio frequency classification method based on convolutional neural networks and random forest |
CN109447180A (en) * | 2018-11-14 | 2019-03-08 | 山东省通信管理局 | A kind of fooled people's discovery method of the telecommunication fraud based on big data and machine learning |
CN109493886A (en) * | 2018-12-13 | 2019-03-19 | 西安电子科技大学 | Speech-emotion recognition method based on feature selecting and optimization |
Non-Patent Citations (2)
Title |
---|
VIPIN KUMAR: "Mood Classifiaction of Lyrics using SentiWordNet", 《2013 INTERNATIONAL CONFERENCE ON COMPUTER COMMUNICATION AND INFORMATICS》 * |
HOU ZHITING: "Feature … (title truncated in the record) in vegetation classification of time-series remote sensing data", 《China Master's Theses Full-Text Database, Basic Sciences》 *
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111063360A (en) * | 2020-01-21 | 2020-04-24 | 北京爱数智慧科技有限公司 | Voiceprint library generation method and device |
CN111063360B (en) * | 2020-01-21 | 2022-08-19 | 北京爱数智慧科技有限公司 | Voiceprint library generation method and device |
CN111612038A (en) * | 2020-04-24 | 2020-09-01 | 平安直通咨询有限公司上海分公司 | Abnormal user detection method and device, storage medium and electronic equipment |
CN111612038B (en) * | 2020-04-24 | 2024-04-26 | 平安直通咨询有限公司上海分公司 | Abnormal user detection method and device, storage medium and electronic equipment |
CN113421585A (en) * | 2021-05-10 | 2021-09-21 | 云境商务智能研究院南京有限公司 | Audio fingerprint database generation method and device |
CN115277322A (en) * | 2022-07-13 | 2022-11-01 | 金陵科技学院 | CR signal modulation identification method and system based on graph and continuous entropy characteristics |
CN115277322B (en) * | 2022-07-13 | 2023-07-28 | 金陵科技学院 | CR signal modulation identification method and system based on graph and continuous entropy characteristics |
Also Published As
Publication number | Publication date |
---|---|
CN110600038B (en) | 2022-04-05 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110600038B (en) | Audio fingerprint dimension reduction method based on discrete kini coefficient | |
CN111279414B (en) | Segmentation-based feature extraction for sound scene classification | |
CN110647656B (en) | Audio retrieval method utilizing transform domain sparsification and compression dimension reduction | |
CN114596879B (en) | False voice detection method and device, electronic equipment and storage medium | |
AU744678B2 (en) | Pattern recognition using multiple reference models | |
Abdalla et al. | DWT and MFCCs based feature extraction methods for isolated word recognition | |
CN112183582A (en) | Multi-feature fusion underwater target identification method | |
CN110767248B (en) | Anti-modulation interference audio fingerprint extraction method | |
CN101577116A (en) | Extracting method of MFCC coefficients of voice signal, device and Mel filtering method | |
CN108172214A (en) | A kind of small echo speech recognition features parameter extracting method based on Mel domains | |
Imran et al. | An analysis of audio classification techniques using deep learning architectures | |
CN110610722A (en) | Short-time energy and Mel cepstrum coefficient combined novel low-complexity dangerous sound scene discrimination method based on vector quantization | |
Panagiotou et al. | PCA summarization for audio song identification using Gaussian mixture models | |
CN113555038A (en) | Speaker independent speech emotion recognition method and system based on unsupervised field counterwork learning | |
Seo et al. | Linear speed-change resilient audio fingerprinting | |
CN116884431A (en) | CFCC (computational fluid dynamics) feature-based robust audio copy-paste tamper detection method and device | |
CN111785262A (en) | Speaker age and gender classification method based on residual error network and fusion characteristics | |
CN112309404B (en) | Machine voice authentication method, device, equipment and storage medium | |
Wang et al. | Revealing the processing history of pitch-shifted voice using CNNs | |
CN113808604B (en) | Sound scene classification method based on gamma through spectrum separation | |
CN114613391B (en) | Snore identification method and device based on half-band filter | |
CN113948088A (en) | Voice recognition method and device based on waveform simulation | |
Htun | Analytical approach to MFCC based space-saving audio fingerprinting system | |
Thiruvengatanadhan | Music genre classification using mfcc and aann | |
Therese et al. | A linear visual assessment tendency based clustering with power normalized cepstral coefficients for audio signal recognition system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||