CN105551503A

CN105551503A - Audio matching tracking method based on atom pre-selection and system thereof

Info

Publication number: CN105551503A
Application number: CN201510982266.5A
Authority: CN
Inventors: 胡瑞敏; 姜林; 胡霞; 王晓晨; 涂卫平; 张茂胜; 李登实
Original assignee: Wuhan University WHU
Current assignee: Booslink Suzhou Information Technology Co ltd
Priority date: 2015-12-24
Filing date: 2015-12-24
Publication date: 2016-05-04
Anticipated expiration: 2035-12-24
Also published as: CN105551503B

Abstract

The invention discloses an audio matching tracking method based on atom preselection and a system thereof. The method exploits the correlation between signal energy and auditory perception: it preprocesses the original signal on the basis of energy and extracts the parts of the signal with high energy; performs matching pursuit on those parts to obtain sparse coefficients; and reconstructs the signal from the sparse coefficients and the original dictionary. The invention preserves sound quality while greatly reducing computational complexity and greatly increasing calculation speed.

Description

Audio matching tracking method and system based on atom preselection

Technical Field

The invention belongs to the technical field of audio coding, and particularly relates to an audio matching tracking method and system based on atomic preselection.

Background

Sparse representation generally means representing an original signal accurately with the smallest possible number of basis functions, so that the main characteristics of the signal are captured and the signal processing cost is substantially reduced. Matching Pursuit (MP) is one of the more widely used sparse representation algorithms. Its basic idea is to select the optimal atoms from an overcomplete dictionary one at a time in an iterative process, so that the approximation of the signal improves at each step. The overcomplete dictionary used by the MP algorithm can be selected flexibly and adaptively according to the characteristics of the signal, and atom selection uses a greedy algorithm of repeated iterative approximation, so the number of atom coefficients finally obtained is small. For these reasons the MP algorithm is widely applied in various fields of signal analysis, such as image processing, biomedical signal processing, and audio processing.

As people's requirements for streaming media quality rise and the number of mobile terminal users grows, the requirements for audio and video coding efficiency are also increasing. The traditional matching pursuit algorithm is not suitable for real-time processing because of its high computational complexity. A number of fast matching pursuit algorithms have been proposed, such as the joint dictionary method of document 1 and the improved algorithm of document 2. However, these algorithms either involve time-consuming optimization or sacrifice sparse representation efficiency in exchange, and their calculation speed still struggles to meet the requirements of large-scale problems. In document 3, Krstulović et al. propose a traversal algorithm based on short-time Gabor atoms, which traverses the signal from start to end using fixed-length atoms from a non-complete dictionary and iteratively selects the best-matching atoms many times to obtain the final sparse coefficients. The dictionary of this algorithm has a very small data size, so the storage and calculation burden is effectively reduced along with the computational complexity.

Although this method has a somewhat lower computational complexity than other sparse representation algorithms, it is still difficult to use in real-time applications. One of the main ways to reduce the computational complexity of matching pursuit is to reduce the number of iterations, and when the sparse dictionary used is a short-time dictionary, performing the MP algorithm locally on a long signal takes far less time than the traversal MP algorithm.

The following references are referred to herein:

[1] Ravelli E, Richard G, Daudet L. Union of MDCT bases for audio coding [J]. Audio, Speech, and Language Processing, IEEE Transactions on, 2008, 16(8): 1361-1372.

[2] Gharavi-Alkhansari M, Huang T S. A fast orthogonal matching pursuit algorithm [C]// Acoustics, Speech and Signal Processing, 1998. Proceedings of the 1998 IEEE International Conference on. IEEE, 1998, 3: 1389-1392.

[3] Krstulović S, Gribonval R. MPTK: Matching pursuit made tractable [C]// Acoustics, Speech and Signal Processing, 2006. ICASSP 2006 Proceedings. 2006 IEEE International Conference on. IEEE, 2006, 3: III-III.

Disclosure of Invention

Aiming at the defects in the prior art, the invention provides an audio matching tracking method and system based on atomic preselection according to the influence of energy on auditory perception.

The technical scheme adopted by the invention is as follows:

an audio matching tracking method based on atomic preselection comprises the following steps:

signal decomposition and signal reconstruction, wherein the signal decomposition comprises the steps of:

S1, selecting a short-time dictionary according to the type of the original signal, and taking the short-time dictionary as a sparse dictionary;

S2, calculating, one by one, the energy of the consecutive samples {S_i, S_{i+1}, ..., S_{i+N-1}} in the original signal, where i takes 1, 2, ..., length(S)-N+1 in sequence, and extracting the consecutive samples with the highest energy, denoted S_{maxenergy}; N is the atom length of the short-time dictionary; length(S) is the original signal length;

S3, obtaining the weights of the sparse dictionary atoms on S_{maxenergy}, the maximum absolute value of the atom weights being denoted c_{i_{optmax}};

S4, calculating the signal residual S'_{later} = S_{maxenergy} - c_{i_{optmax}} \cdot g_{i_{optmax}}, where g_{i_{optmax}} is the atom corresponding to c_{i_{optmax}}; at the same time, recording c_{i_{optmax}} in row i_{optmax}, column j_{optmax} of the current sparse coefficient matrix, where i_{optmax} is the atom number of g_{i_{optmax}} and j_{optmax} is the center position of S_{maxenergy}; the initial value of the current sparse coefficient matrix is a zero matrix;

S5, when the current signal residual S'_{later} reaches the target SNR or the number of iterations reaches a preset value, ending signal decomposition and outputting the current sparse coefficient matrix; otherwise, taking the current signal residual S'_{later} as the original signal and repeating steps S2 to S5;

the signal reconstruction includes:

S7, extracting the atom weights in the current sparse coefficient matrix and the row number and column number corresponding to each atom weight;

S8, multiplying each atom weight by its corresponding atom to obtain a recovery signal, and assigning each recovery signal to a zero vector M_i with the same length as the original signal in step S1, such that the j_{optmax}-th point of the zero vector M_i is the center of the recovery signal, where j_{optmax} is the column number of the atom weight corresponding to the current recovery signal; and sequentially accumulating the assigned vectors to obtain the reconstructed signal.

In step S2, the energy of the consecutive samples {S_i, S_{i+1}, ..., S_{i+N-1}} in the original signal is the sum of the squares of the amplitudes of all samples in the consecutive samples.

Alternatively, in step S2, the energy of the consecutive samples {S_i, S_{i+1}, ..., S_{i+N-1}} in the original signal is the sum of the absolute values of the amplitudes of all samples in the consecutive samples.

Alternatively, in step S2, the energy of the consecutive samples {S_i, S_{i+1}, ..., S_{i+N-1}} in the original signal is the maximum of the amplitudes of all samples in the consecutive samples.

The system corresponding to the audio matching and tracking method based on atomic preselection comprises:

a signal decomposition unit and a signal reconstruction unit, wherein the signal decomposition unit further comprises:

the dictionary establishing module 101 is used for selecting a short-time dictionary according to the type of the original signal and taking the short-time dictionary as a sparse dictionary;

a preprocessing module 102, for calculating, one by one, the energy of the consecutive samples {S_i, S_{i+1}, ..., S_{i+N-1}} in the original signal, where i takes 1, 2, ..., length(S)-N+1 in sequence, and extracting the consecutive samples with the highest energy, denoted S_{maxenergy}; N is the atom length of the short-time dictionary; length(S) is the original signal length;

a weight comparison module 103, for obtaining the weights of the sparse dictionary atoms on S_{maxenergy}, the maximum absolute value of the atom weights being denoted c_{i_{optmax}};

a residual calculation module 104, for calculating the signal residual S'_{later} = S_{maxenergy} - c_{i_{optmax}} \cdot g_{i_{optmax}}, where g_{i_{optmax}} is the atom corresponding to c_{i_{optmax}}; at the same time, c_{i_{optmax}} is recorded in row i_{optmax}, column j_{optmax} of the current sparse coefficient matrix, where i_{optmax} is the atom number of g_{i_{optmax}} and j_{optmax} is the center position of S_{maxenergy}; the initial value of the current sparse coefficient matrix is a zero matrix;

a threshold control module 105, for ending signal decomposition and outputting the current sparse coefficient matrix when the current signal residual S'_{later} reaches the target SNR or the number of iterations reaches a preset value, and otherwise inputting the current signal residual S'_{later} into the preprocessing module 102 as the original signal;

the signal reconstruction unit further includes:

a reconstruction coefficient extraction module 201, configured to extract atom weights in a current sparse coefficient matrix and row numbers and column numbers corresponding to the atom weights;

a signal synthesis module 202, for multiplying each atom weight by its corresponding atom to obtain a recovery signal, and assigning each recovery signal to a zero vector M_i with the same length as the original signal, such that the j_{optmax}-th point of the zero vector M_i is the center of the recovery signal, where j_{optmax} is the column number of the atom weight corresponding to the current recovery signal; and sequentially accumulating the assigned vectors to obtain the reconstructed signal.

Compared with the prior art, the invention has the following characteristics:

By running the MP algorithm with a non-complete dictionary only on the parts of the signal with higher short-time energy, the invention reduces the number of traversal calculations and thus the computational complexity. In the dictionary construction, the frequency span of the atoms is increased, which loosens the constraint the dictionary places on frequency components. The sparse representation calculation method is not limited by the length of the signal to be processed, and the data volume of the dictionary is small. Compared with other fast matching pursuit algorithms (such as the method of Krstulović et al. in document 3), the invention obtains a faster calculation speed without degrading the sound quality of the reconstructed signal.

Drawings

FIG. 1 is a detailed flow chart of a signal decomposition section according to an embodiment of the present invention;

FIG. 2 is a detailed flow chart of a signal reconstruction section according to an embodiment of the present invention;

FIG. 3 is a block diagram of a signal decomposition subsystem according to an embodiment of the present invention;

FIG. 4 is a block diagram of a signal reconstruction subsystem according to an embodiment of the present invention;

FIG. 5 is a schematic of the atomic center position.

Detailed Description

To facilitate understanding and implementation of the technical solution of the present invention, it is described in further detail below with reference to the accompanying drawings and embodiments. It is to be understood that the embodiments described herein are only for illustrating and explaining the present invention and are not intended to limit it.

FIGS. 1-2 show the detailed process of the method of the present invention, which includes two major parts, signal decomposition and signal reconstruction.

The specific implementation of signal decomposition comprises the following steps:

Step 1, selecting a short-time dictionary according to the type of the original signal.

This step is a conventional step in audio matching pursuit methods. For a speech processing system, a short-time dictionary with speech characteristics is selected; for a transient signal processing system, a short-time dictionary suited to transients is selected. For systems whose signal features are not obvious, or which must process multiple types of signals simultaneously, a short-time dictionary with strong universality is selected.

In this embodiment, the test samples include speech signals, music signals, and other types, so a Gabor dictionary with strong scalability is chosen as the short-time dictionary. The atoms in the Gabor dictionary are constructed as follows:

g_{w,\mu,\sigma}(n) = \frac{\lambda_{w,\mu,\sigma}}{\sigma \sqrt{2\pi}} \exp\left\{-\frac{(n-\mu)^{2}}{2\sigma^{2}}\right\} \cos[2\pi w (n-\mu)] \qquad (1)

In formula (1), w represents the frequency scale; μ represents the time offset; σ represents the time scale; λ_{w,μ,σ} represents the atomic energy under w, μ and σ; n represents a time-domain sample point of the Gabor atom; g_{w,μ,σ}(n) represents the atom amplitude at the time-domain sample point n.

When choosing the time offset μ, the traditional matching pursuit method based on the Gabor dictionary takes as many time offsets μ of various scales as the number of atoms in the dictionary allows. In this embodiment, μ is fixed at 0, so that every atom in the dictionary is aligned with the high-energy part of the signal selected in the preprocessing, with its energy located at the center of that part. Assuming that n ranges from 1 to N and that the frequency scale w, the time scale σ and the atomic energy λ have M combinations, the dictionary size is M × N. In this embodiment, M is 20 and N is 1001.
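The atom construction of formula (1) with μ = 0 can be sketched in Python roughly as follows. This is a minimal illustration and not the implementation of the embodiment: the helper names `gabor_atom` and `gabor_dictionary`, the sample axis centered on the middle of the atom (so that the envelope peak sits at the atom center), the use of λ to scale every atom to unit norm, and the example (w, σ) pairs are all assumptions made for the example.

```python
import numpy as np

def gabor_atom(w, sigma, N, mu=0.0):
    """Hypothetical helper: one Gabor atom of length N following formula (1)."""
    # Assumption: n is centered so that n = 0 falls on the middle sample.
    n = np.arange(N) - (N - 1) / 2.0
    envelope = np.exp(-((n - mu) ** 2) / (2.0 * sigma ** 2)) / (sigma * np.sqrt(2.0 * np.pi))
    atom = envelope * np.cos(2.0 * np.pi * w * (n - mu))
    # Assumption: lambda is chosen so that the atom has unit L2 norm.
    return atom / np.linalg.norm(atom)

def gabor_dictionary(params, N):
    """Stack one atom per (w, sigma) pair into an M x N matrix."""
    return np.stack([gabor_atom(w, sigma, N) for w, sigma in params])

# Made-up (w, sigma) pairs; w is expressed in cycles per sample.
D = gabor_dictionary([(0.01, 50.0), (0.05, 20.0)], N=1001)
```

With M parameter combinations the result is an M × N array, which for M = 20 and N = 1001 holds 20020 values.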

Step 2, preprocessing the original signal: the energy of the consecutive samples {S_i, S_{i+1}, ..., S_{i+N-1}} in the original signal is calculated one by one, and the consecutive samples with the highest energy value are denoted S_{maxenergy}. The length of the consecutive samples is the atom length N of the short-time dictionary selected in step 1.

Several energy calculation methods will be provided below.

(1) The energy value Energy of the consecutive samples is calculated from the definition of energy as follows:

Energy = \sum_{i=m+1}^{m+N} |S_{i}|^{2} \qquad (2)

In formula (2), S_i is the i-th sample of the original signal S and also denotes the amplitude of that sample; m is the translation amount, taking 0, 1, ..., length(S)-N in sequence, where length(S) is the length of the original signal S.

(2) The sum of the squared amplitudes of the samples is roughly proportional to the sum of their absolute amplitudes, and the sum of the absolute amplitudes is much cheaper to compute than the sum of the squared amplitudes. Therefore, the energy value Energy of the consecutive samples can be approximated using formula (3):

Energy = \sum_{i=m+1}^{m+N} |S_{i}| \qquad (3)

In formula (3), S_i is the i-th sample of the original signal S and also denotes the amplitude of that sample; m is the translation amount, taking 0, 1, ..., length(S)-N in sequence, where length(S) is the length of the original signal S.

(3) Different energy calculation modes can be selected according to the characteristics of the original signal. If the original signal is mostly a signal with relatively continuous amplitude, the maximum value of the amplitudes of all samples of the continuous samples is taken as the energy of the continuous samples. The method further reduces the computational complexity compared with (1) and (2).
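The three energy measures and the selection of the highest-energy segment can be sketched as follows. This is an illustrative sketch only: the function names, the `mode` strings, the 0-based offset m, and the use of the absolute value for the "maximum amplitude" variant are assumptions of the example, and the window length N is assumed not to exceed the signal length.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def window_energy(signal, N, mode="square"):
    """Energy of every length-N window, for offsets m = 0 .. len(signal) - N."""
    s = np.asarray(signal, dtype=float)
    if mode == "square":   # formula (2): sum of squared amplitudes
        return np.convolve(s ** 2, np.ones(N), mode="valid")
    if mode == "abs":      # formula (3): sum of absolute amplitudes
        return np.convolve(np.abs(s), np.ones(N), mode="valid")
    if mode == "max":      # method (3): largest absolute amplitude in the window
        return sliding_window_view(np.abs(s), N).max(axis=1)
    raise ValueError("unknown mode: " + mode)

def select_max_energy_segment(signal, N, mode="square"):
    """Return the offset m and the segment S_maxenergy with the highest energy."""
    s = np.asarray(signal, dtype=float)
    m = int(np.argmax(window_energy(s, N, mode)))
    return m, s[m:m + N]

example = [0, 1, 0, 0, 3, -4, 0, 0]
m_sq, seg_sq = select_max_energy_segment(example, 2, "square")
m_abs, _ = select_max_energy_segment(example, 2, "abs")
m_max, _ = select_max_energy_segment(example, 2, "max")
```

On this small example all three measures pick the same window, which illustrates why the cheaper measures can stand in for formula (2) when the signal is well behaved; they are not guaranteed to agree in general.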

Step 3, the short-time dictionary selected in step 1 is used as the sparse dictionary, and each atom g_{i_{opt}} in the sparse dictionary takes the inner product with S_{maxenergy} in turn to obtain the weight of atom g_{i_{opt}} on S_{maxenergy}; the maximum absolute value of the atom weights is denoted c_{i_{optmax}}.

The calculation formula of c_{i_{optmax}} is as follows:

c_{i_{optmax}} = \max\{\mathrm{abs}(\langle S', g_{i_{opt}} \rangle)\} \qquad (4)

In formula (4), i_{opt} represents the atom number in the sparse dictionary, i_{opt} = 1, 2, ..., M, where M is the number of atoms in the sparse dictionary; g_{i_{opt}} is the i_{opt}-th atom in the sparse dictionary; abs(\langle S', g_{i_{opt}} \rangle) represents the absolute value of the inner product of g_{i_{opt}} and S', where S' here is the segment S_{maxenergy} extracted in step 2.
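A minimal sketch of the atom selection in step 3 is given below, assuming the dictionary is stored as an M × N array with one atom per row. The function name `best_atom` is hypothetical. Note one deliberate choice: formula (4) defines the weight through the maximum absolute value, while the sketch returns the signed inner product of the winning atom, on the assumption that the sign is needed when the component is subtracted in step 4.

```python
import numpy as np

def best_atom(segment, dictionary):
    """Return (index, signed weight) of the atom with the largest |inner product|."""
    weights = dictionary @ np.asarray(segment, dtype=float)  # one inner product per atom
    i_optmax = int(np.argmax(np.abs(weights)))               # formula (4)
    return i_optmax, float(weights[i_optmax])

# Toy dictionary: three unit-norm atoms of length 3 (rows of the identity).
toy_dictionary = np.eye(3)
index, weight = best_atom([1.0, -5.0, 2.0], toy_dictionary)
```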

Step 4, calculating the component of S_{maxenergy} on the largest atom of the sparse dictionary, c_{i_{optmax}} \cdot g_{i_{optmax}}; the signal residual S'_{later} is the difference between S_{maxenergy} and this component, see equation (5). The current sparse coefficient matrix is updated at the same time.

S'_{later} = S_{maxenergy} - c_{i_{optmax}} \cdot g_{i_{optmax}} \qquad (5)

Here g_{i_{optmax}} is the atom corresponding to c_{i_{optmax}}, i.e., the largest atom.

The current sparse coefficient matrix is updated as follows:

\alpha'_{i_{opt}} = \alpha_{i_{opt}} + c \qquad (6)

The initial value of the sparse coefficient matrix is a zero matrix; the row number of the matrix represents the atom label, the column number represents the atom center position, and each element is an atom weight. The atom center position is the position of the central sample of the consecutive samples S_{maxenergy} relative to the initial point of the original signal; see fig. 5, where the initial point of the original signal is set to 0 and the atom center position is set to m.

In equation (6), \alpha_{i_{opt}} is the sparse coefficient matrix before updating, \alpha'_{i_{opt}} is the updated sparse coefficient matrix, and the weight matrix c has the same size as the sparse coefficient matrix. The weight matrix c is obtained as follows: the value c_{i_{optmax}} obtained in step 3 is assigned to row i_{optmax}, column j_{optmax} of the weight matrix c, where i_{optmax} is the label of the largest atom g_{i_{optmax}} and j_{optmax} is the atom center position of g_{i_{optmax}} (i.e., the central sample position of S_{maxenergy}).
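The residual of equation (5) and the coefficient update of equation (6) can be illustrated together. In this sketch, which is an assumption-laden illustration rather than the embodiment itself, the coefficient matrix is a dense NumPy array, indices are 0-based, and adding the weight at a single (row, column) entry stands in for adding a weight matrix c that is zero everywhere else. The name `update_step` is hypothetical.

```python
import numpy as np

def update_step(segment, atom, weight, coeffs, i_optmax, j_optmax):
    """Apply equations (5) and (6) for one selected atom; coeffs is modified in place."""
    residual = np.asarray(segment, dtype=float) - weight * atom  # equation (5)
    coeffs[i_optmax, j_optmax] += weight                         # equation (6): alpha' = alpha + c
    return residual

coeffs = np.zeros((2, 5))             # rows: atom labels, columns: center positions
segment = np.array([2.0, 0.0, 0.0])
atom = np.array([1.0, 0.0, 0.0])      # unit-norm toy atom
residual = update_step(segment, atom, 2.0, coeffs, i_optmax=0, j_optmax=3)
```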

Step 5, when the signal residual S'_{later} reaches the target SNR or the number of iterations reaches a preset value, signal decomposition ends and the current sparse coefficient matrix is output; otherwise, the signal residual S'_{later} is taken as the original signal in step 2 and steps 2 to 5 are repeated.

The matching pursuit method processes the signal by accumulating iterations, representing the original signal as the sum of the atom weights multiplied by their corresponding atoms plus the signal residual. Step 4 yields the signal residual S'_{later}; when S'_{later} reaches the target SNR or the number of iterations reaches the preset value, the iteration terminates and the current sparse coefficient matrix is output. The target SNR and the preset number of iterations are set manually according to experience and actual requirements.

The signal-to-noise ratio SNR is defined as follows:

SNR(S, S') = 20 \log_{10}\left(\frac{\|S\|_{2}^{2}}{\|S - S'\|_{2}^{2}}\right) \qquad (7)

In equation (7), S represents the original signal and S' is the signal recovered from the current sparse representation.
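Equation (7) translates directly into a few lines of Python. The sketch below keeps the factor 20 with squared norms exactly as printed in equation (7); a more common convention pairs squared norms with a factor of 10, so the scaling is worth confirming before comparing values against other work. The function name `snr_db` and the handling of a zero error (returning infinity) are assumptions of the example.

```python
import numpy as np

def snr_db(original, recovered):
    """SNR following equation (7) as printed: 20 * log10(||S||^2 / ||S - S'||^2)."""
    s = np.asarray(original, dtype=float)
    r = np.asarray(recovered, dtype=float)
    noise = np.sum((s - r) ** 2)
    if noise == 0.0:
        return float("inf")  # assumption: perfect recovery reported as infinite SNR
    return float(20.0 * np.log10(np.sum(s ** 2) / noise))

# ||S||^2 = 25 and ||S - S'||^2 = 1, so the value is 20 * log10(25).
value = snr_db([3.0, 4.0], [3.0, 3.0])
```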

In this embodiment, the signal segment has a sampling frequency of 48 kHz and a length of 500000 sample points (about 10 s); the preset number of iterations is 20000 and the target SNR is 20 dB.
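Steps 2 to 5 can be put together as a single loop. The sketch below is one possible reading, not a definitive implementation. Its assumptions are: the residual of each selected segment is written back into a working copy of the signal, which then serves as the "original signal" of the next iteration; the atom center is stored at the 0-based column m + N // 2; the signed inner product is used as the weight; energy is the sum of squares of formula (2); and the stopping test uses equation (7) as printed, with the working residual as the error. The name `decompose` and its parameters are hypothetical.

```python
import numpy as np

def decompose(signal, dictionary, target_snr_db=20.0, max_iter=1000):
    """Energy-preselected matching pursuit sketch; returns (coeffs, residual)."""
    s = np.asarray(signal, dtype=float)
    M, N = dictionary.shape
    residual = s.copy()                    # working signal, updated in place
    coeffs = np.zeros((M, s.size))         # rows: atom labels, columns: center positions
    signal_energy = float(np.sum(s ** 2))
    for _ in range(max_iter):
        # Step 2: locate the length-N window with the highest energy.
        energy = np.convolve(residual ** 2, np.ones(N), mode="valid")
        m = int(np.argmax(energy))
        segment = residual[m:m + N]
        # Step 3: pick the atom with the largest absolute inner product.
        weights = dictionary @ segment
        i_optmax = int(np.argmax(np.abs(weights)))
        c = weights[i_optmax]
        # Step 4: subtract the component and record the weight.
        residual[m:m + N] = segment - c * dictionary[i_optmax]
        coeffs[i_optmax, m + N // 2] += c
        # Step 5: stop once the target SNR of equation (7) is reached.
        noise_energy = float(np.sum(residual ** 2))
        if noise_energy == 0.0:
            break
        if 20.0 * np.log10(signal_energy / noise_energy) >= target_snr_db:
            break
    return coeffs, residual

toy_dictionary = np.eye(3)                 # three unit-norm toy atoms of length 3
coeffs, residual = decompose([0, 0, 1, 5, 1, 0, 0], toy_dictionary)
```

Because each iteration only searches one window per pass instead of correlating every atom at every offset, the cost per iteration is dominated by one energy scan and M inner products of length N.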

The signal reconstruction comprises the following steps:

Step 6, extracting from the current sparse coefficient matrix the atom weights to be used for the reconstructed signal, together with the atom label and atom center position corresponding to each atom weight.

Step 7, multiplying each atom weight c_{i_{optmax}} by its corresponding atom g_{i_{optmax}} to obtain a recovery signal c_{i_{optmax}} \cdot g_{i_{optmax}} of length N; each recovery signal is assigned to a zero vector M_i with the same length as the original signal in step 1. When assigning, the j_{optmax}-th point of the zero vector M_i is the center point of the recovery signal, where j_{optmax} is the column number of the atom weight c_{i_{optmax}} in the current sparse coefficient matrix. The assigned vectors M_i are accumulated in sequence to obtain the reconstructed signal S'.

The reconstructed signal synthesis formula is as follows:

S' = \sum_{i=1}^{k} M_{i} \qquad (8)

where k is the number of atom weights in the current sparse coefficient matrix.
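The reconstruction of steps 6 and 7 and equation (8) can be sketched as below. Instead of building each zero vector M_i separately, the sketch adds every weighted atom directly into one output vector, which gives the same sum. It assumes 0-based indexing, an atom center located N // 2 samples after the atom start, and clips any atom that would extend past the signal boundaries; the function name `reconstruct` is hypothetical.

```python
import numpy as np

def reconstruct(coeffs, dictionary):
    """Sum weighted atoms placed at their stored center positions (equation (8))."""
    M, N = dictionary.shape
    length = coeffs.shape[1]
    out = np.zeros(length)
    # Step 6: every nonzero entry gives (atom label i, center position j, weight).
    for i, j in zip(*np.nonzero(coeffs)):
        start = int(j) - N // 2                        # assumed centering convention
        lo, hi = max(start, 0), min(start + N, length)
        # Step 7: weight times atom, accumulated at the right position.
        out[lo:hi] += coeffs[i, j] * dictionary[i, lo - start:hi - start]
    return out

toy_dictionary = np.eye(3)
toy_coeffs = np.zeros((3, 7))
toy_coeffs[1, 3] = 5.0                                 # atom 1 centered at sample 3
rebuilt = reconstruct(toy_coeffs, toy_dictionary)
```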

Referring to fig. 3 to 4, the invention further provides an audio matching and tracking system based on atomic preselection, which includes a signal decomposition unit and a signal reconstruction unit. The signal decomposition unit further comprises a dictionary establishing module 101, a preprocessing module 102, a weight value comparison module 103, a residual error calculation module 104 and a threshold control module 105; the signal reconstruction unit further comprises a reconstruction coefficient extraction module 201 and a signal synthesis module 202. Wherein:

the dictionary establishing module 101 is used for selecting a short-time dictionary according to the original signal type and taking the short-time dictionary as a sparse dictionary.

The preprocessing module 102 is used for calculating, one by one, the energy of the consecutive samples {S_i, S_{i+1}, ..., S_{i+N-1}} in the original signal, where i takes 1, 2, ..., length(S)-N+1 in sequence, and extracting the consecutive samples with the highest energy, denoted S_{maxenergy}; N is the atom length of the short-time dictionary; length(S) is the original signal length.

In the preprocessing module 102, the energy of consecutive samples can be calculated as follows:

(1) The sum of the squares of the amplitudes of all samples in the consecutive samples {S_i, S_{i+1}, ..., S_{i+N-1}} is taken as the energy of the consecutive samples, see formula (2).

(2) The sum of the absolute values of the amplitudes of all samples in the consecutive samples {S_i, S_{i+1}, ..., S_{i+N-1}} is taken as the energy of the consecutive samples, see formula (3).

(3) The maximum of the amplitudes of all samples in the consecutive samples {S_i, S_{i+1}, ..., S_{i+N-1}} is taken as the energy of the consecutive samples.

The weight comparison module 103 is used for obtaining the weights of the sparse dictionary atoms on S_{maxenergy}, the maximum absolute value of the atom weights being denoted c_{i_{optmax}}.

The residual calculation module 104 is used for calculating the signal residual S'_{later} = S_{maxenergy} - c_{i_{optmax}} \cdot g_{i_{optmax}}, where g_{i_{optmax}} is the atom corresponding to c_{i_{optmax}}; at the same time, c_{i_{optmax}} is recorded in row i_{optmax}, column j_{optmax} of the current sparse coefficient matrix, where i_{optmax} is the atom number of g_{i_{optmax}} and j_{optmax} is the center position of S_{maxenergy}. The initial value of the current sparse coefficient matrix is a zero matrix.

The threshold control module 105 is used for ending signal decomposition and outputting the current sparse coefficient matrix when the current signal residual S'_{later} reaches the target SNR or the number of iterations reaches a preset value; otherwise, the current signal residual S'_{later} is input into the preprocessing module 102 as the original signal.

The reconstruction coefficient extraction module 201 is configured to extract the atom weights in the current sparse coefficient matrix and the row numbers and column numbers corresponding to the atom weights.

It should be understood that parts of the specification not set forth in detail are well within the prior art.

It should be understood that the above description of the preferred embodiments is given for clarity and not for any purpose of limitation, and that various changes, substitutions and alterations can be made herein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims

1. An audio matching tracking method based on atom preselection is characterized by comprising the following steps:

the signal reconstruction includes:

S8, multiplying each atom weight by its corresponding atom to obtain a recovery signal, and assigning each recovery signal to a zero vector M_i with the same length as the original signal in step S1, such that the j_{optmax}-th point of the zero vector M_i is the center of the recovery signal, where j_{optmax} is the column number of the atom weight corresponding to the current recovery signal; and sequentially accumulating the assigned vectors to obtain the reconstructed signal.

2. The method for audio matching pursuit based on atomic preselection of claim 1, characterized by:

3. The method for audio matching pursuit based on atomic preselection of claim 1, characterized by:

in step S2, the energy of the consecutive samples {S_i, S_{i+1}, ..., S_{i+N-1}} in the original signal is the sum of the absolute values of the amplitudes of all samples in the consecutive samples.

4. The method for audio matching pursuit based on atomic preselection of claim 1, characterized by:

5. An audio matching tracking system based on atomic preselection, comprising:

a preprocessing module 102, for calculating, one by one, the energy of the consecutive samples {S_i, S_{i+1}, ..., S_{i+N-1}} in the original signal, where i takes 1, 2, ..., length(S)-N+1 in sequence, and extracting the consecutive samples with the highest energy, denoted S_{maxenergy}; N is the atom length of the short-time dictionary; length(S) is the original signal length;

the signal reconstruction unit further includes: