CN109065070B

CN109065070B - Kernel function-based audio characteristic signal dimension reduction method

Info

Publication number: CN109065070B
Application number: CN201810995309.7A
Authority: CN
Inventors: 龙华; 杨明亮; 邵玉斌; 杜庆治
Original assignee: Kunming University of Science and Technology
Current assignee: Kunming University of Science and Technology
Priority date: 2018-08-29
Filing date: 2018-08-29
Publication date: 2022-07-19
Anticipated expiration: 2038-08-29
Also published as: CN109065070A

Abstract

The invention relates to a kernel function-based dimension reduction method for audio feature signals and belongs to the technical field of audio signal processing. The method reduces the dimension of audio feature parameters, achieves the required dimension reduction without discarding audio feature information, displays the reduced data visually, and compares the results with those obtained by other audio feature dimension reduction methods. Dimension reduction is applied mainly to the linear prediction coefficients, linear prediction cepstrum coefficients and Mel frequency cepstrum coefficients of the audio signal. The method can be used for monitoring broadcast signals and for rapidly identifying and processing audio signals. The algorithm is simple: it uses a nonlinear kernel function to represent the mapping between the Gaussian observation space and the hidden space, which avoids the limited range of application and poor dimension reduction performance of linear mapping methods.

Description

Kernel function-based audio characteristic signal dimension reduction method

Technical Field

The invention relates to a kernel function-based audio characteristic signal dimension reduction method, and belongs to the technical field of audio characteristic signal processing.

Background

To manage and control wireless audio broadcasting and to monitor and discriminate broadcast audio safely and efficiently in real time, audio information must be processed quickly, because its processing speed determines the speed of the whole workflow. Dimension reduction of audio feature signals is the core of audio information processing, so its efficiency and reliability are problems that currently need to be solved. Existing audio feature signal dimension reduction methods mainly include locality preserving projection, multidimensional scaling, locally linear embedding, principal component analysis and the like. Most of these algorithms have high complexity, and because they reduce the dimension by discarding part of the feature signal, they can cause unpredictable errors in practical engineering applications.

Disclosure of Invention

The technical problem to be solved by the invention is to provide a kernel function-based audio characteristic signal dimension reduction method, which performs dimension reduction analysis on the extracted audio Linear Prediction Coefficients (LPCs), Linear Prediction Cepstrum Coefficients (LPCCs) and Mel Frequency Cepstrum Coefficients (MFCCs) to achieve the purposes of reducing data dimensions and improving information processing rate.

The technical scheme of the invention is as follows: a kernel function-based audio feature signal dimension reduction method. The method comprises the following specific steps:

(1) Audio signal acquisition: acquire an audio signal to obtain an audio sample.

(2) Audio signal preprocessing: convert the analog signals in the collected audio samples into digital signals and write them into a WAV file; then filter, pre-emphasize and frame the digital signals written into the WAV file.

(3) Feature parameter extraction: extract the high-dimensional feature parameters of the processed digital signals, namely the Linear Prediction Coefficients (LPCs), Linear Prediction Cepstrum Coefficients (LPCCs) and Mel Frequency Cepstrum Coefficients (MFCCs).

(4) Dimension reduction model building: feed the extracted feature parameters into a dimension reduction model built with the kernel trick to directly obtain low-dimensional hidden variables, which are the dimension-reduced data. The core of the method is to use a Gaussian process regression (GPR) model to model the nonlinear relation between the hidden variables and the observed variables.

(5) Dimension reduction analysis: display the dimension-reduced data visually (2D/3D) and compare them with the results obtained by other dimension reduction methods.

In the above kernel function-based audio feature signal dimension reduction method, the audio acquisition in step (1) obtains an audio sample through an audio acquisition device, for which the sampling frequency (which must satisfy the Nyquist sampling theorem), the number of sampling channels and the quantization precision are set when the audio signal is acquired.

In the above method for reducing the dimension of the audio feature signal based on the kernel function, the audio signal preprocessing in step (2) includes the following steps:

(1) Filter the collected audio signal $x(n)$ with a rectangular window function $w(n)$ (the upper limit frequency is generally $f_H = 3400$ Hz and the lower limit frequency $f_L = 60$ to $100$ Hz) to obtain a signal $y_a(n)$, wherein

(2) Apply pre-emphasis to the filtered signal $y_a(n)$ by the difference method to obtain a signal $y_b(n)$, where $y_b(n) = y_a(n) - \alpha\, y_a(n-1)$ and $\alpha$ is the pre-emphasis coefficient, generally close to 1. This boosts the high-frequency part, suppresses the low-frequency part and flattens the spectrum of the signal.

(3) Framing: for short-time analysis, the speech signal is divided into a number of segments, each called a frame, with a duration between 10 ms and 30 ms. To ensure a smooth transition between frames, adjacent frames partially overlap; the overlapped part is called the frame shift and is taken as 1/2 or 1/3 of the frame length.
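The pre-emphasis and framing steps above can be sketched as follows. This is a minimal illustration rather than the patented implementation: the function names, the default α = 0.97, the 20 ms frame length and the treatment of the frame shift as the hop between consecutive frame starts are all assumptions made for the example.

```python
import numpy as np

def pre_emphasis(y_a, alpha=0.97):
    """First-order difference: y_b(n) = y_a(n) - alpha * y_a(n-1)."""
    y_a = np.asarray(y_a, dtype=float)
    y_b = np.empty_like(y_a)
    y_b[0] = y_a[0]                       # no previous sample exists for n = 0
    y_b[1:] = y_a[1:] - alpha * y_a[:-1]
    return y_b

def split_frames(signal, sample_rate, frame_ms=20.0, shift_ratio=0.5):
    """Cut a 1-D signal into overlapping frames, one frame per row.

    frame_ms    : frame duration (the text allows 10 to 30 ms)
    shift_ratio : frame shift as a fraction of the frame length (1/2 or 1/3),
                  assumed here to be the hop between consecutive frame starts
    """
    signal = np.asarray(signal, dtype=float)
    frame_len = int(round(sample_rate * frame_ms / 1000.0))
    shift = max(1, int(round(frame_len * shift_ratio)))
    if len(signal) < frame_len:
        return np.empty((0, frame_len))
    n_frames = 1 + (len(signal) - frame_len) // shift
    starts = shift * np.arange(n_frames)
    index = starts[:, None] + np.arange(frame_len)[None, :]
    return signal[index]

# Usage: one second of a 440 Hz tone sampled at 8 kHz
t = np.arange(8000) / 8000.0
frames = split_frames(pre_emphasis(np.sin(2 * np.pi * 440 * t)), sample_rate=8000)
```

Trailing samples that do not fill a complete frame are dropped in this sketch; padding the last frame with zeros would be an equally reasonable choice.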

In the above dimension reduction method for audio feature signals based on kernel functions, the step (3) of extracting feature parameters includes the following steps:

(1) Linear Prediction Coefficients (LPC): call an LPC function package programmatically, set the frame length, frame shift, window function and LPC order, extract the feature values of the audio signal preprocessed in step (2), and store them in the designated Table 1.

(2) Linear Prediction Cepstrum Coefficients (LPCC): call an LPCC function package programmatically, set the frame length, frame shift, window function and LPCC order, extract the feature values of the audio signal preprocessed in step (2), and store them in the designated Table 2.

(3) Mel Frequency Cepstrum Coefficients (MFCC): call an MFCC function package programmatically, set the frame length, frame shift, window function and MFCC order, extract the feature values of the audio signal preprocessed in step (2), and store them in the designated Table 3.
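The text does not name the function packages used for extraction. As an illustration of what an LPC routine computes for one frame, the sketch below uses the autocorrelation method with the Levinson-Durbin recursion, which is a standard technique but only an assumption about the package actually used. The function name `lpc_coefficients` is hypothetical, and the frame is assumed to have been windowed beforehand.

```python
import numpy as np

def lpc_coefficients(frame, order):
    """Return a[1..order] so that x(n) is predicted by sum_k a[k] * x(n-k).

    Autocorrelation method solved with the Levinson-Durbin recursion.
    """
    frame = np.asarray(frame, dtype=float)
    n = len(frame)
    # Autocorrelation r(0), r(1), ..., r(order)
    r = np.array([np.dot(frame[: n - k], frame[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)               # a[0] is unused
    err = r[0]                            # prediction error energy
    for i in range(1, order + 1):
        if err <= 0.0:                    # silent frame: keep remaining zeros
            break
        # Reflection coefficient for order i
        k_i = (r[i] - np.dot(a[1:i], r[i - 1:0:-1])) / err
        prev = a.copy()
        a[i] = k_i
        a[1:i] = prev[1:i] - k_i * prev[i - 1:0:-1]
        err *= 1.0 - k_i ** 2
    return a[1:]

# Usage: a decaying exponential x(n) = 0.5**n is predicted by x(n) = 0.5 * x(n-1)
coeffs = lpc_coefficients(0.5 ** np.arange(200), order=2)
```

Applying the function to every row of a frame matrix yields one LPC vector per frame, which corresponds to the kind of table the extraction step stores.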

In the above method for reducing the dimension of the audio characteristic signal based on the kernel function, the building of the dimension reduction model in the step (4) includes the following steps:

(1) To build the feature dimension reduction model, first denote the hidden space as $Z$, of dimension $q$, and the observation space as $Y$, of dimension $d$ ($q < d$). Assume that the observed values and the hidden space parameters satisfy the relation $y = f(z) + \varepsilon$, where the noise $\varepsilon$ obeys a Gaussian distribution with mean 0 and variance $\beta$, and assume that the hidden function $f$ follows a Gaussian process with a squared exponential kernel function

where $\sigma$ is the coefficient parameter of the squared exponential kernel, $l$ is the distance influence factor between two points $z$ and $z'$, $\beta$ is a hyperparameter of the model, $\delta(z, z')$ is the Kronecker delta function, and the parameters to be solved in the kernel function are $\theta = (\sigma, l, \beta)$. The kernel function takes its maximum when $z$ and $z'$ are very close and its minimum when they are far apart. For convenience in the subsequent derivation, the covariance matrix is calculated by the formula

(2) Assuming that the $d$ dimensions of the observation space are sampled independently, the observation probability of $Y$ is as follows, where $Y_{:,i}$ denotes the $n$ elements of the $i$-th dimension of the observation space $Y$

To obtain a better dimension reduction effect, the kernel function hyperparameters that maximize this probability must be found, and a particle swarm optimization algorithm is used to solve for them. Denote $\theta = (\sigma, l, \beta)$ as $A = (a_1, a_2, a_3)$, the velocity of particle $i$ as $v_i = (v_{i1}, v_{i2}, v_{i3})$, and the best position the particles have passed through as $p_g = (p_{g1}, p_{g2}, p_{g3})$. The particle swarm algorithm continuously updates the positions of the particles with the following equation

where $w$ is a non-negative inertia factor, the acceleration constants $c_1$ and $c_2$ are non-negative, and $r_1$ and $r_2$ are random numbers varying within the range $[0, 1]$. The particle swarm optimization algorithm adjusts the state of each particle using its current position, its experienced positions and the positions of its neighbors. Applying this information exchange mechanism to the optimization of the kernel parameters means that each particle is influenced both by its own experience and by the experience of the swarm, which gives the algorithm good global optimization capability and convergence speed.

The model uses a nonlinear kernel function. The solved kernel parameters $\theta = (\sigma, l, \beta)$ are substituted back into the model, and the extracted feature parameters are fed into the dimension reduction model to obtain the hidden parameters, which are the dimension-reduced data.
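To make the roles of $\sigma$, $l$ and $\beta$ concrete, the following sketch evaluates a covariance matrix and the log of the observation probability for given hidden points. It assumes the commonly used form $k(z, z') = \sigma^2 \exp(-\lVert z - z' \rVert^2 / (2 l^2)) + \beta\,\delta(z, z')$ and a zero-mean Gaussian over each of the $d$ observation dimensions; the exact placement of the parameters in the patent's own formulas may differ, and the function names are hypothetical.

```python
import numpy as np

def se_covariance(Z, sigma, length, beta):
    """K[i, j] = sigma^2 * exp(-||z_i - z_j||^2 / (2 * length^2)) + beta * delta_ij.

    Assumed kernel form; Z holds one hidden point per row (n rows, q columns).
    """
    Z = np.asarray(Z, dtype=float)
    sq = np.sum(Z ** 2, axis=1)
    sq_dist = np.maximum(sq[:, None] + sq[None, :] - 2.0 * Z @ Z.T, 0.0)
    return sigma ** 2 * np.exp(-sq_dist / (2.0 * length ** 2)) + beta * np.eye(len(Z))

def log_likelihood(Y, Z, sigma, length, beta):
    """log p(Y | Z, theta) with the d columns of Y treated as independent draws
    from a zero-mean Gaussian with covariance K (n rows, d columns in Y)."""
    Y = np.asarray(Y, dtype=float)
    n, d = Y.shape
    K = se_covariance(Z, sigma, length, beta)
    chol = np.linalg.cholesky(K)                   # K must be positive definite
    log_det = 2.0 * np.sum(np.log(np.diag(chol)))
    k_inv_y = np.linalg.solve(K, Y)
    # sum over columns of Y[:, i]^T K^{-1} Y[:, i] equals sum(Y * K^{-1} Y)
    return -0.5 * (d * n * np.log(2.0 * np.pi) + d * log_det + np.sum(Y * k_inv_y))

# Usage: 5 observations of dimension d = 4 with hidden points of dimension q = 2
rng = np.random.default_rng(0)
Z_demo = rng.normal(size=(5, 2))
Y_demo = rng.normal(size=(5, 4))
ll_demo = log_likelihood(Y_demo, Z_demo, sigma=1.0, length=1.0, beta=0.1)
```

With this form the diagonal of the covariance matrix equals $\sigma^2 + \beta$, and off-diagonal entries decay toward zero as two hidden points move apart, which matches the maximum and minimum behaviour described for the kernel.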

In the above kernel function-based audio feature signal dimension reduction method, the dimension reduction analysis in step (5) displays the dimension-reduced data in a two-dimensional or three-dimensional visualization and then analyzes and compares them with the results of other dimension reduction algorithms.
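For the comparison in step (5), a linear baseline such as principal component analysis, one of the existing methods named in the Background, can be computed in a few lines. The sketch below is only an illustrative baseline; the function name `pca_reduce` and the synthetic 12-dimensional feature matrix are assumptions made for the example.

```python
import numpy as np

def pca_reduce(features, q=2):
    """Project an (n, d) feature matrix onto its first q principal components."""
    X = np.asarray(features, dtype=float)
    X_centered = X - X.mean(axis=0)
    # Rows of vt are the principal directions, ordered by decreasing variance
    _, _, vt = np.linalg.svd(X_centered, full_matrices=False)
    return X_centered @ vt[:q].T

# Usage: reduce 100 hypothetical 12-dimensional feature vectors to 2-D points
rng = np.random.default_rng(0)
features_demo = rng.normal(size=(100, 12))
points_2d = pca_reduce(features_demo, q=2)
```

The resulting two-column (or, with `q=3`, three-column) array can be drawn as a scatter plot next to the hidden parameters of the kernel-based model for a side-by-side comparison.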

Compared with existing audio feature signal dimension reduction methods, the kernel function-based method of the invention has the following advantages:

(1) The invention uses a nonlinear kernel function to represent the direct relation between the observation space data and the hidden space parameters, which avoids the poor dimension reduction performance that linear mapping causes for certain audio feature data.

(2) The invention uses a particle swarm algorithm to solve for the hyperparameters of the kernel function. The good global optimization capability of the particle swarm and the directionality of its particles allow the optimal hyperparameters to be found quickly, and the approach makes it very convenient to substitute other kernel functions later.

(3) The audio feature dimension reduction theory provided by the invention is simple and easy to program, is well suited to real engineering projects, and substantially improves the audio information processing speed.

Drawings

FIG. 1 is a flow chart of a dimension reduction analysis of the present invention;

FIG. 2 is a flow chart of signal preprocessing according to the present invention;

FIG. 3 is a flow chart of feature parameter extraction and dimension reduction processing according to the present invention;

Detailed Description

The invention is further illustrated with reference to the following figures and examples.

As shown in FIGS. 1-3, a kernel function-based audio feature signal dimension reduction method specifically includes the following steps:

(1) Audio signal acquisition: collect audio signals to obtain audio samples.

(2) Audio signal preprocessing: convert the analog signals in the collected audio samples into digital signals and write them into a WAV file; then filter, pre-emphasize and frame the digital signals written into the WAV file.

(4) Dimension reduction model building: feed the extracted feature parameters into a dimension reduction model built with the kernel trick to directly obtain low-dimensional hidden variables, which are the dimension-reduced data.

The audio acquisition obtains an audio sample through an audio acquisition device. When the audio signal is acquired, the sampling frequency of the device is set to 44.1 kHz (which satisfies the Nyquist sampling theorem), the number of sampling channels is set to one (single channel), and the quantization precision is set to 16 bits.
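These acquisition settings map directly onto the header fields of a WAV file. The sketch below writes samples with Python's standard `wave` module at 44.1 kHz, one channel and 16 bits; the function name `write_wav`, the scaling of floating-point samples in [-1, 1] to 16-bit integers and the use of an in-memory buffer are assumptions made for the example.

```python
import io
import wave
import numpy as np

def write_wav(target, samples, sample_rate=44100):
    """Write mono 16-bit PCM audio to a file path or file-like object.

    samples: floating-point values expected in [-1, 1] (values outside are clipped).
    """
    pcm = np.clip(np.asarray(samples, dtype=float), -1.0, 1.0)
    pcm = (pcm * 32767.0).astype('<i2')        # little-endian 16-bit integers
    with wave.open(target, 'wb') as wav_file:
        wav_file.setnchannels(1)               # single channel
        wav_file.setsampwidth(2)               # 2 bytes = 16-bit quantization
        wav_file.setframerate(sample_rate)     # 44.1 kHz by default
        wav_file.writeframes(pcm.tobytes())

# Usage: one second of a 440 Hz tone written to an in-memory buffer
t = np.arange(44100) / 44100.0
buffer = io.BytesIO()
write_wav(buffer, 0.5 * np.sin(2 * np.pi * 440 * t))
```

Passing a file name instead of the buffer writes the same data to disk.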

The signal preprocessing comprises the following steps:

(1) Filter the collected audio signal $x(n)$ with a rectangular window function $w(n)$ (the upper limit frequency is generally $f_H = 3400$ Hz and the lower limit frequency $f_L = 60$ to $100$ Hz) to obtain a signal $y_a(n)$, wherein

(2) Apply pre-emphasis to the filtered signal $y_a(n)$ by the difference method to obtain a signal $y_b(n)$, where $y_b(n) = y_a(n) - \alpha\, y_a(n-1)$ and $\alpha$ is the pre-emphasis coefficient, generally close to 1.

(3) Divide the pre-emphasized signal $y_b(n)$ into a number of speech segments, each called a frame, with a duration of 10 to 30 ms. Adjacent frames partially overlap; the overlapped part is called the frame shift and is taken as 1/2 or 1/3 of the frame length.
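One possible reading of the filtering in step (1) is an ideal band-pass filter, that is, a rectangular window applied in the frequency domain between $f_L$ and $f_H$. This interpretation is an assumption, since the filter formula is not reproduced here, and the function name `rectangular_bandpass` is hypothetical. A minimal sketch under that assumption:

```python
import numpy as np

def rectangular_bandpass(x, sample_rate, f_low=60.0, f_high=3400.0):
    """Keep only spectral components with f_low <= f <= f_high.

    The pass band is a rectangular (0/1) window over frequency; this is an
    assumed interpretation of the rectangular window filtering in the text.
    """
    x = np.asarray(x, dtype=float)
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / sample_rate)
    window = (freqs >= f_low) & (freqs <= f_high)
    return np.fft.irfft(spectrum * window, n=len(x))

# Usage: a 1 kHz tone passes, a 5 kHz tone is removed
fs = 16000
t = np.arange(1600) / fs
mixture = np.sin(2 * np.pi * 1000 * t) + np.sin(2 * np.pi * 5000 * t)
filtered = rectangular_bandpass(mixture, fs)
```

A brick-wall filter of this kind is simple but can introduce ringing at frame edges; a practical system might prefer a conventional FIR or IIR band-pass design.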

The characteristic parameter extraction comprises the following steps:

(2) Linear Prediction Cepstrum Coefficients (LPCC): call an LPCC function package programmatically, set the frame length, frame shift, window function and LPCC order, extract the feature values of the audio signal preprocessed in step (2), and store them in the designated Table 2.

(3) Mel Frequency Cepstrum Coefficients (MFCC): call an MFCC function package programmatically, set the frame length, frame shift, window function and MFCC order, extract the feature values of the audio signal preprocessed in step (2), and store them in the designated Table 3.
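As an illustration of what an MFCC function package computes for a single frame, the sketch below follows the conventional pipeline: window, power spectrum, triangular Mel filterbank, logarithm and DCT-II. The Hamming window, the 26 filters, the 13 coefficients and all function names are assumptions chosen for the example, not values taken from the patent.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters whose edges are equally spaced on the Mel scale."""
    mel_pts = np.linspace(0.0, hz_to_mel(sample_rate / 2.0), n_filters + 2)
    hz_pts = mel_to_hz(mel_pts)
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sample_rate)
    bank = np.zeros((n_filters, len(freqs)))
    for m in range(n_filters):
        left, center, right = hz_pts[m], hz_pts[m + 1], hz_pts[m + 2]
        rising = (freqs - left) / (center - left)
        falling = (right - freqs) / (right - center)
        bank[m] = np.maximum(0.0, np.minimum(rising, falling))
    return bank

def mfcc_frame(frame, sample_rate, n_filters=26, n_coeffs=13):
    """MFCC vector of one frame (assumed Hamming window, unnormalized DCT-II)."""
    frame = np.asarray(frame, dtype=float)
    n_fft = len(frame)
    power = np.abs(np.fft.rfft(frame * np.hamming(n_fft))) ** 2 / n_fft
    energies = mel_filterbank(n_filters, n_fft, sample_rate) @ power
    log_energies = np.log(energies + 1e-10)        # small offset avoids log(0)
    k = np.arange(n_coeffs)[:, None]
    m = np.arange(n_filters)[None, :]
    dct_matrix = np.cos(np.pi * k * (m + 0.5) / n_filters)
    return dct_matrix @ log_energies

# Usage: 25 ms frame of a 440 Hz tone at 16 kHz
frame_demo = np.sin(2 * np.pi * 440 * np.arange(400) / 16000)
mfcc_demo = mfcc_frame(frame_demo, 16000)
```

Stacking the vectors of all frames row by row gives a high-dimensional feature table of the kind that is later passed to the dimension reduction model.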

The construction of the dimension reduction model comprises the following steps:

(1) Denote the hidden space parameters as $Z$ and the observation space as $Y$; that is, the hidden space has dimension $q$ and the observation space has dimension $d$ ($q < d$). Assume that the observed values and the hidden space parameters directly satisfy the relation $y = f(z) + \varepsilon$, where the noise $\varepsilon$ follows a Gaussian distribution with mean 0 and variance $\beta$, and assume that the hidden function $f$ follows a Gaussian process with a squared exponential kernel function:

where $\sigma$ is the coefficient parameter of the squared exponential kernel, $l$ is the distance influence factor between two points $z$ and $z'$, $\beta$ is a hyperparameter of the model, $\delta(z, z')$ is the Kronecker delta function, and the parameters to be solved in the kernel function are $\theta = (\sigma, l, \beta)$. The kernel function takes its maximum when $z$ and $z'$ are very close and its minimum when they are far apart. The covariance matrix is calculated as:

(2) Assuming that the $d$ dimensions of the observation space are sampled independently, the observation probability of $Y$ is as follows, where $Y_{:,i}$ denotes the $n$ elements of the $i$-th dimension of the observation space $Y$

The invention uses a particle swarm optimization algorithm to solve for the parameters. Denote $\theta = (\sigma, l, \beta)$ as $A = (a_1, a_2, a_3)$, the velocity of particle $i$ as $v_i = (v_{i1}, v_{i2}, v_{i3})$, and the best position the particles have passed through as $p_g = (p_{g1}, p_{g2}, p_{g3})$. The iterative position update formula of the particle swarm algorithm is:

where $w$ is a non-negative inertia factor, the acceleration constants $c_1$ and $c_2$ are non-negative, and $r_1$ and $r_2$ are random numbers varying within the range $[0, 1]$. The solved kernel parameters $\theta = (\sigma, l, \beta)$ are substituted back into the model to obtain the kernel function-based dimension reduction model, and the extracted feature parameters are fed into this model to obtain the hidden parameters, which are the dimension-reduced data.
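The particle swarm search can be sketched with the standard velocity and position update, $v_i \leftarrow w\,v_i + c_1 r_1 (p_i - a_i) + c_2 r_2 (p_g - a_i)$ followed by $a_i \leftarrow a_i + v_i$. The text names only the best position $p_g$; the per-particle best $p_i$, the parameter defaults, the box bounds and the function name below are assumptions taken from the textbook form of the algorithm, not from the patent.

```python
import numpy as np

def particle_swarm_minimize(objective, lower, upper, n_particles=30, n_iter=200,
                            w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize objective(a) over the box [lower, upper] with a basic PSO."""
    rng = np.random.default_rng(seed)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    dim = lower.size
    pos = rng.uniform(lower, upper, size=(n_particles, dim))   # positions a_i
    vel = np.zeros((n_particles, dim))                         # velocities v_i
    p_best = pos.copy()                                        # personal bests p_i
    p_best_val = np.array([objective(p) for p in pos])
    g = int(np.argmin(p_best_val))
    g_best, g_best_val = p_best[g].copy(), p_best_val[g]       # swarm best p_g
    for _ in range(n_iter):
        r1 = rng.random((n_particles, dim))                    # r1, r2 in [0, 1)
        r2 = rng.random((n_particles, dim))
        vel = w * vel + c1 * r1 * (p_best - pos) + c2 * r2 * (g_best - pos)
        pos = np.clip(pos + vel, lower, upper)
        vals = np.array([objective(p) for p in pos])
        improved = vals < p_best_val
        p_best[improved] = pos[improved]
        p_best_val[improved] = vals[improved]
        g = int(np.argmin(p_best_val))
        if p_best_val[g] < g_best_val:
            g_best, g_best_val = p_best[g].copy(), p_best_val[g]
    return g_best, g_best_val

# Usage: recover a known optimum of a simple quadratic in three parameters
target = np.array([1.0, 2.0, 0.5])
best_a, best_value = particle_swarm_minimize(
    lambda a: float(np.sum((a - target) ** 2)), [0.0, 0.0, 0.0], [5.0, 5.0, 5.0])
```

To search for the kernel parameters, the objective would be the negative logarithm of the observation probability as a function of $A = (a_1, a_2, a_3)$, so that minimizing it maximizes the probability.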

In the dimension reduction analysis, because people live in three-dimensional space and cannot visualize spaces of more than three dimensions, dimension reduction results for large data sets are difficult to analyze directly. The preprocessed audio signal is therefore fed into the built dimension reduction model, and the resulting hidden parameter data are stored and displayed visually, which makes it easier to compare the model with other dimension reduction models and to analyze their respective advantages and disadvantages. The present invention is not limited to the above embodiments; within the knowledge of those skilled in the art, the dimension reduction algorithm can also be applied to other related fields.

Claims

1. A kernel function-based audio feature signal dimension reduction method is characterized in that: the method comprises the following specific steps:

(1) audio signal acquisition: collecting an audio signal to obtain an audio sample;

(2) audio signal preprocessing: converting analog signals in the collected audio samples into digital signals, writing the digital signals into a WAV file, and performing filtering, pre-emphasis and framing processing on the digital signals written into the WAV file;

(3) characteristic parameter extraction: extracting characteristic parameters of a linear prediction coefficient, a linear prediction cepstrum coefficient and a Mel frequency cepstrum coefficient in the processed digital signal;

(4) building a dimension reduction model: feeding the extracted feature parameters into a dimension reduction model built with the kernel trick to directly obtain low-dimensional hidden variables, the low-dimensional hidden variables being the dimension-reduced data;

the dimension reduction model is specifically built as follows:

(1) the hidden space is first denoted as $Z$, of dimension $q$, and the observation space is denoted as $Y$, of dimension $d$, with $q < d$; it is assumed that the observed values and the hidden space parameters satisfy the relation $y = f(z) + \varepsilon$, that the noise $\varepsilon$ follows a Gaussian distribution with mean 0 and variance $\beta$, and that the hidden function $f$ follows a Gaussian process with a squared exponential kernel function:

wherein $\sigma$ is the coefficient parameter of the squared exponential kernel, $l$ is the distance influence factor between $z$ and $z'$, $\beta$ is a hyperparameter of the model, $\delta(z, z')$ is the Kronecker delta function, and the parameters to be solved in the kernel function are $\theta = (\sigma, l, \beta)$; it follows from the above formula that the kernel function takes its maximum when $z$ and $z'$ are very close and its minimum when they are very far apart; the covariance matrix of the kernel function is calculated by the formula:

the parameters are solved by a particle swarm optimization algorithm, $\theta = (\sigma, l, \beta)$ being denoted as $A = (a_1, a_2, a_3)$, the velocity of particle $i$ as $v_i = (v_{i1}, v_{i2}, v_{i3})$ and the best position the particles have passed through as $p_g = (p_{g1}, p_{g2}, p_{g3})$, with the position iteration formula of the particle swarm algorithm:

wherein $w$ is a non-negative inertia factor, the acceleration constants $c_1$ and $c_2$ are non-negative, and $r_1$ and $r_2$ are random numbers varying within the range $[0, 1]$; the information exchange mechanism of the particle swarm optimization algorithm is applied to the optimization of the kernel parameters, the solved kernel parameters $\theta = (\sigma, l, \beta)$ are substituted back into the model to obtain the dimension reduction model, and the extracted feature parameters are fed into the dimension reduction model to obtain the hidden parameters, the hidden parameters being the dimension-reduced data;

(5) dimension reduction result analysis: visually displaying the dimension-reduced data.

2. The kernel function-based audio feature signal dimension reduction method according to claim 1, wherein: the audio acquisition is performed by an audio acquisition device, and the audio acquisition device sets the sampling frequency, the number of sampling channels and the quantization precision when acquiring the audio signals.

3. The kernel function-based audio feature signal dimension reduction method according to claim 1, wherein: the audio signal pre-processing comprises the steps of:

(1) filtering the collected audio signal $x(n)$ with a rectangular window function $w(n)$ to obtain a signal $y_a(n)$, wherein

(2) performing pre-emphasis on the filtered signal $y_a(n)$ by a difference method to obtain a signal $y_b(n)$, wherein $y_b(n) = y_a(n) - \alpha\, y_a(n-1)$, $\alpha$ being a pre-emphasis coefficient generally having a value close to 1;

(3) dividing the pre-emphasized signal $y_b(n)$ into a plurality of speech frames, adjacent frames partially overlapping, the overlapped part being called the frame shift.