CN108962265A

CN108962265A - Voice signal compression storage and reconstruction method based on superposition sequence

Info

Publication number: CN108962265A
Application number: CN201810497026.XA
Authority: CN
Inventors: 卿朝进; 万东琴; 阳庆瑶; 王维; 郭奕
Original assignee: Xihua University
Current assignee: Xihua University
Priority date: 2018-05-22
Filing date: 2018-05-22
Publication date: 2018-12-07
Anticipated expiration: 2038-05-22
Also published as: CN108962265B

Abstract

The invention discloses a method for compressed storage and reconstruction of a voice signal based on a superposition sequence, comprising: reading a sparse voice signal, constructing an original index sequence from the position indices of its non-zero and zero elements, and storing the sparsity of the sparse voice signal; compressing the sparse voice signal to generate a compressed signal sequence; intercepting part of the original index sequence as an index sequence, and generating a spread index sequence through coding, conversion and spreading; weighting the spread index sequence and the compressed signal sequence respectively, superposing them to generate a storage sequence, and storing it; despreading the storage sequence to obtain the conversion index sequence and the compressed signal sequence; restoring the index sequence from the conversion index sequence through data restoration and decoding; and constructing a support set from the index sequence and reconstructing the sparse voice signal. The advantage of the invention is that the reconstruction accuracy of the voice signal is effectively improved without increasing storage resources.

Description

Voice signal compression storage and reconstruction method based on superposition sequence

Technical Field

The invention relates to the technical field of compression storage and reconstruction of voice signals, in particular to a voice signal compression storage and reconstruction method based on a superposition sequence.

Background

As information interaction becomes increasingly frequent, the voice signal is one of the most common signals involved, and the requirements on its processing are becoming more demanding. Owing to the diversity of the voice signal itself and the characteristics of the human auditory system, the voice signal is sparse in various transform domains. Conventional voice signal sampling typically requires that the Nyquist sampling rate be satisfied. Compressed sensing (CS) theory indicates that signals that are sparse or compressible can be sampled in compressed form and reconstructed using compressed sensing techniques. Combining CS theory with voice signal processing therefore reduces the sampling frequency and lowers the requirements on the sampling device.

According to CS theory, the sparse voice signal is compressed through an observation matrix and reconstructed using a reconstruction algorithm. However, existing reconstruction algorithms, such as the matching pursuit, orthogonal matching pursuit, compressive sampling matching pursuit, basis pursuit and subspace pursuit algorithms, were not proposed specifically for reconstructing sparse voice signals; they neither consider nor exploit the element position indices of the sparse voice signal, so the reconstruction accuracy of the sparse voice signal is limited.

Disclosure of Invention

In view of the defects of the prior art, the invention provides a voice signal compression storage and reconstruction method based on a superposition sequence. Compared with conventional compressed-sensing voice compression, the invention uses part of the position indices of the elements of the sparse voice signal to assist reconstruction, improving the reconstruction accuracy of the voice signal without increasing the storage cost.

In order to realize the purpose, the technical scheme adopted by the invention is as follows:

A method for compressing, storing and reconstructing a voice signal based on a superposition sequence comprises the following steps: (a) compression and storage processing of the voice signal:

(a1) reading a sparsified voice signal x with sparsity K and length N, constructing an "original index sequence" of length N from 0 and 1 elements to record the position indices of the non-zero and zero elements in the voice signal, and storing the sparsity K of the sparse voice signal;

(a2) reading a pre-stored M × N measurement matrix Φ, and compressing the voice signal with the measurement matrix to generate a "compressed signal sequence" y of length M, the compression being expressed as y = Φx;

the measurement matrix is an existing measurement matrix such as a Gaussian random matrix, a Bernoulli random matrix, a partial Hadamard matrix and the like;

the M, N generally satisfies M ≦ N;

(a3) intercepting the "original index sequence" of length N to obtain an "index sequence" A of length βN, wherein the interception coefficient β is set according to engineering experience and satisfies 0 < β ≤ 1;

(a4) compressing and coding the "index sequence" A of length βN by Huffman coding to generate a "compressed index sequence" B of length L₁, and performing data conversion on B to obtain a "conversion index sequence" C of length L₂;

(a5) spreading the "conversion index sequence" C of length L₂ and constructing a "spread index sequence" of length M by zero padding;

(a6) assigning the weights α and 1 − α to the "spread index sequence" of length M and the "compressed signal sequence" y respectively, superposing them according to z = α × (spread index sequence) + (1 − α) × y to generate a "storage sequence" z of length M, and storing the "storage sequence" z;

the weight α is set according to engineering experience and satisfies 0 ≤ α ≤ 1.

(b) Reconstruction and reproduction processing of the voice signal:

(b1) despreading the "storage sequence" z of length M to restore the "conversion index sequence" C of length L₂;

(b2) spreading the "conversion index sequence" C of length L₂ and constructing the "spread index sequence" of length M by zero padding;

(b3) solving the "compressed signal sequence" y of length M according to y = (z − α × (spread index sequence)) / (1 − α);

(b4) performing data restoration on the "conversion index sequence" C of length L₂ to obtain the "compressed index sequence" B of length L₁, and then restoring the "index sequence" A of length βN by Huffman decoding;

(b5) recording the position numbers of the non-zero elements of the "index sequence" A of length βN in a set to form a "fixed support set";

(b6) reconstructing the sparse voice signal x of length N from the "compressed signal sequence" y of length M using a reconstruction algorithm assisted by the "fixed support set".

Further, the sparse voice signal in step a1) is obtained by transforming a discrete voice signal from the time domain to the frequency domain by a time-frequency transform method and setting the signal amplitudes below the silence threshold to zero according to a "psychoacoustic model", yielding a sparse voice signal x of length N.

The "psychoacoustic model" is, for example, the MPEG (Moving Picture Experts Group) psychoacoustic model or the OGG (Ogg Vorbis) psychoacoustic model.

The time-frequency transform method may be a discrete cosine transform, a short-time Fourier transform or a wavelet transform.

Further, the process in step a1) of constructing an "original index sequence" of length N from 0 and 1 elements to record the position indices of the non-zero and zero elements in the voice signal is as follows: each zero element of the sparse voice signal x of length N is recorded as element 0 at the corresponding position of the "original index sequence", and each non-zero element is recorded as element 1 at the corresponding position; the "original index sequence" thus constructed is a sequence of length N whose elements are 0 or 1.

Further, the data conversion process in step a4) is as follows: the data of the "compressed index sequence" B of length L₁ are divided into L₂ groups of γ data each; if the number of data in sequence B is not exactly divisible by γ, zeros are appended to construct a sequence whose length is divisible by γ; each group is converted from a binary number to a decimal real value, completing the conversion and yielding the "conversion index sequence" C of length L₂.

Further, the specific steps in step a5) of constructing a "spread index sequence" of length M from the "conversion index sequence" C of length L₂ by spreading and zero padding are as follows:

a5-1) let the "conversion index sequence" be C ∈ R^(L₂×1) and let Q ∈ R^(q×1) be a spreading sequence, where q is the spreading gain and satisfies q = ⌊M/L₂⌋;

wherein the spreading sequence Q may be an m-sequence, an M-sequence, a Gold sequence or a Zadoff-Chu sequence;

wherein the symbol ⌊·⌋ denotes the floor (round-down) operation.

a5-2) calculating the Kronecker product S = C ⊗ Q = (C₁Q^T, C₂Q^T, …, C_(L₂)Q^T)^T, which spreads the sequence C; S has length (L₂ × q);

where the superscript "T" denotes the transpose operation.

a5-3) appending zeros to the end of the vector S to extend its length from (L₂ × q) to M, thereby constructing the "spread index sequence" of length M.

Further, the specific steps in step b2) of constructing the "spread index sequence" of length M from the "conversion index sequence" C of length L₂ by spreading and zero padding are the same as steps a5-1) to a5-3).

Further, the data restoration process in step b4) is as follows: the real-valued elements of the "conversion index sequence" C of length L₂ are converted into binary numbers, and zero-valued elements are removed from the tail of the resulting binary sequence so that the remaining length is L₁; the sequence formed by the remaining elements is the "compressed index sequence" B.

Further, being assisted by the "fixed support set" in step b6) means that, during reconstruction with a reconstruction algorithm, the "fixed support set" is retained every time the support set is updated in an iteration, thereby assisting reconstruction.

The reconstruction algorithm is, for example, a matching pursuit algorithm, an orthogonal matching pursuit algorithm or a regularized orthogonal matching pursuit algorithm.

Further, taking the orthogonal matching pursuit algorithm as the reconstruction algorithm, step b6) includes:

b6-1) reading the "compressed signal sequence" y ∈ R^(M×1), the measurement matrix Φ ∈ R^(M×N) and the sparsity K, wherein: t denotes the iteration number; r_t denotes the residual at iteration t; Ω_t denotes the index set (column indices) at iteration t, i.e. the support set at iteration t; K_t denotes the number of elements in the index set Ω_t; the least-squares solution at iteration t is a K_t × 1 vector; λ_t denotes the index (column index) found at the t-th iteration; a_j denotes the j-th column of the matrix Φ (j = 1, 2, …, N); the columns of Φ selected according to the "fixed support set" form a column set; Φ_t denotes the set of columns of Φ selected by the index set Ω_t (a matrix of size M × K_t); the symbol ∪ denotes the union operation; |·| denotes the absolute value; <X, Y> denotes the inner product of the vectors X and Y; ||·|| denotes the 2-norm of a vector; and (·)^-1 denotes matrix inversion;

b6-2) initialization;

b6-3) if K_t < K, finding the index λ_t of the column a_j of Φ whose inner product with the current residual has the largest absolute value; otherwise, computing the least-squares solution of y over the columns Φ_t, namely (Φ_t^T Φ_t)^-1 Φ_t^T y, and performing step b6-8);

b6-4) letting Ω_t = Ω_(t-1) ∪ {λ_t};

b6-5) computing the least-squares solution of y over the columns Φ_t, namely (Φ_t^T Φ_t)^-1 Φ_t^T y;

b6-6) updating the residual r_t as y minus Φ_t times the least-squares solution;

b6-7) setting t = t + 1 and returning to step b6-3);

b6-8) the reconstructed sparse voice signal has non-zero entries at the indices in the support set Ω_t, whose values are the least-squares solution obtained; the elements outside the indices in Ω_t are set to 0, thereby reconstructing the sparse voice signal x.
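Steps b6-1) to b6-8) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function name omp_with_fixed_support, the use of NumPy, the 0-based column indices, and the initialization (the support set starts as the fixed support set, and the first residual is y minus its least-squares fit on those columns) are all assumptions made for the example.

```python
import numpy as np


def omp_with_fixed_support(y, Phi, K, fixed_support):
    """Orthogonal matching pursuit that always keeps `fixed_support`.

    y: compressed signal (length M); Phi: M x N measurement matrix;
    K: sparsity; fixed_support: 0-based column indices known in advance.
    Returns the reconstructed length-N signal and the sorted support set.
    """
    _, N = Phi.shape
    support = list(dict.fromkeys(fixed_support))  # de-duplicate, keep order

    def least_squares(cols):
        # Least-squares solution of y over the selected columns of Phi.
        theta, *_ = np.linalg.lstsq(Phi[:, cols], y, rcond=None)
        return theta

    # Assumed initialization: start from the fixed support set.
    if support:
        theta = least_squares(support)
        residual = y - Phi[:, support] @ theta
    else:
        theta = np.zeros(0)
        residual = y.astype(float)

    # Greedy iterations: add one column index per iteration until K are chosen.
    while len(support) < K:
        correlations = np.abs(Phi.T @ residual)
        if support:
            correlations[support] = 0.0  # never select a column twice
        lam = int(np.argmax(correlations))
        support.append(lam)
        theta = least_squares(support)
        residual = y - Phi[:, support] @ theta

    # Entries on the support take the least-squares values, the rest stay 0.
    x_hat = np.zeros(N)
    x_hat[support] = theta
    return x_hat, sorted(support)


# Toy usage with an identity measurement matrix (hypothetical data).
Phi_demo = np.eye(6)
x_demo = np.array([0.0, 3.0, 0.0, 0.0, -2.0, 1.0])
x_rec, support_demo = omp_with_fixed_support(Phi_demo @ x_demo, Phi_demo, 3, [5])
```

Because the fixed support set is never removed from the support, the greedy search only has to find the remaining K minus its size column indices.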

Compared with the prior art, the invention has the advantages that:

partial position indexes of the sparse speech signal are stored under the condition that storage cost is not increased, and compared with the traditional compressed sensing speech compression, reconstruction accuracy is effectively improved.

Drawings

FIG. 1 is a schematic flow chart of a method for storing and reconstructing voice signal samples based on a superposition sequence according to an embodiment of the present invention;

FIG. 2 is a schematic flow chart of the compression and storage processing of the voice signal in the method for storing and reconstructing voice signal samples based on a superposition sequence according to the embodiment of the present invention;

FIG. 3 is a schematic flow chart of the reconstruction and reproduction processing of the voice signal in the method for storing and reconstructing voice signal samples based on a superposition sequence according to the embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail below with reference to the accompanying drawings by way of examples.

A flow chart of a method for storing and reconstructing speech signal samples based on a superposition sequence is shown in fig. 1.

The following describes in detail the processing procedure of compressing and storing the voice signal according to the method for compressing, storing and reconstructing the voice signal based on the superposition sequence, as shown in fig. 2.

(a1) Reading a sparsified voice signal x with sparsity K and length N, constructing an "original index sequence" of length N from 0 and 1 elements to record the position indices of the non-zero and zero elements in the voice signal, and storing the sparsity K of the sparse voice signal;

the sparse voice signal is obtained by transforming a discrete voice signal from the time domain to the frequency domain by a time-frequency transform method and setting the signal amplitudes below the silence threshold to zero according to a psychoacoustic model, yielding a sparse voice signal x of length N.

The "psychoacoustic model" is, for example, the MPEG (Moving Picture Experts Group) psychoacoustic model, the OGG (Ogg Vorbis) psychoacoustic model, or the like.

The time-frequency transform method is, for example, a discrete cosine transform, a short-time Fourier transform, a wavelet transform, or the like.

The process of constructing the "original index sequence" of length N from 0 and 1 elements to record the position indices of the non-zero and zero elements in the voice signal is as follows: each zero element of the sparse voice signal x of length N is recorded as element 0 at the corresponding position of the "original index sequence", and each non-zero element is recorded as element 1 at the corresponding position; the "original index sequence" thus constructed is a sequence of length N whose elements are 0 or 1.

Example 1: an example of this construction is as follows:

assume a sparse voice signal with N = 18,

x = (5.4, 3.2, 6.7, 0, 0.9, 0, 7.8, 0, 0, 1.2, 0.8, 0, 4.2, 0, 0, 0, 0, 0)^T; then the "original index sequence" is (1, 1, 1, 0, 1, 0, 1, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0)^T,

Where the superscript "T" denotes the transpose operation.
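The construction in Example 1 can be sketched in a few lines of Python. This is only an illustration: the function name build_index_sequence and the use of plain Python lists are assumptions of the sketch.

```python
def build_index_sequence(x):
    """Record 1 where the sparse signal is non-zero and 0 where it is zero."""
    return [1 if value != 0 else 0 for value in x]


# Sparse voice signal from Example 1 (N = 18).
x = [5.4, 3.2, 6.7, 0, 0.9, 0, 7.8, 0, 0, 1.2, 0.8, 0, 4.2, 0, 0, 0, 0, 0]
index_sequence = build_index_sequence(x)

# The sparsity K stored in step (a1) is the number of non-zero elements.
sparsity = sum(index_sequence)
```

For the signal of Example 1 the sketch gives a sparsity of 8, the number of ones in the index sequence.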

(a2) Reading a pre-stored M × N measurement matrix Φ, and compressing the voice signal with the measurement matrix to generate a "compressed signal sequence" y of length M, wherein the compression can be expressed as y = Φx;

the M, N generally satisfies M ≦ N;
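Step (a2) amounts to one matrix-vector product. The sketch below assumes NumPy and a Gaussian random measurement matrix; the function names, the 1/sqrt(M) scaling, the seed and the choice M = 8 are illustrative assumptions, since the text only requires a pre-stored M × N matrix with M ≤ N.

```python
import numpy as np


def gaussian_measurement_matrix(M, N, seed=0):
    """Hypothetical helper: an M x N Gaussian random measurement matrix."""
    rng = np.random.default_rng(seed)
    return rng.standard_normal((M, N)) / np.sqrt(M)


def compress(Phi, x):
    """Compression step y = Phi x."""
    return Phi @ np.asarray(x, dtype=float)


# Sparse voice signal from Example 1 (N = 18), compressed to M = 8 values.
x = [5.4, 3.2, 6.7, 0, 0.9, 0, 7.8, 0, 0, 1.2, 0.8, 0, 4.2, 0, 0, 0, 0, 0]
Phi = gaussian_measurement_matrix(8, len(x))
y = compress(Phi, x)
```

The same Φ has to be available at reconstruction time, which is why the text speaks of a pre-stored matrix.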

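Example 2: an example of the interception is as follows:

on the basis of Example 1, assume β = 0.5, so βN = 9; intercepting the first βN elements of the "original index sequence" (1, 1, 1, 0, 1, 0, 1, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0)^T gives the "index sequence" A = (1, 1, 1, 0, 1, 0, 1, 0, 0)^T.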
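The interception of Example 2 can be sketched as a slice. Taking the leading βN elements matches Example 2; rounding βN down when it is not an integer, and the function name intercept, are assumptions of this sketch.

```python
def intercept(original_index_sequence, beta):
    """Keep the first beta*N elements of the original index sequence."""
    if not 0 < beta <= 1:
        raise ValueError("beta must satisfy 0 < beta <= 1")
    # Assumption: round beta*N down when it is not an integer.
    length = int(beta * len(original_index_sequence))
    return original_index_sequence[:length]


original = [1, 1, 1, 0, 1, 0, 1, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0]
A = intercept(original, 0.5)
```

A larger β keeps more position information for reconstruction but leaves more bits to be coded and spread into the storage sequence.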

wherein the data conversion process is as follows: the data of the "compressed index sequence" B of length L₁ are divided into L₂ groups of γ data each; if the number of data in sequence B is not exactly divisible by γ, zeros are appended to construct a sequence whose length is divisible by γ; each group is converted from a binary number to a decimal real value, completing the conversion and yielding the "conversion index sequence" C of length L₂.

Example 3: an example of this conversion is as follows:

assume the sequence B of length L₁ = 62 is B = (1,0,0,1,0,1,0,1,0,1,0,1,1,1,0,1, 0,0,0,1,0,1,1,0,0,0,1,0,1,0,1,0, 1,1,1,0,0,1,0,1,0,1,1,1,0,1,0,1, 0,0,1,0,1,1,1,0,0,0,1,1,0,0)^T. With γ = 16, the data are divided into 4 groups, i.e. L₂ = 4, and two 0 bits are appended at the end; the 4 groups are then 1001010101011101, 0001011000101010, 1110010101110101 and 0010111000110000 in sequence. Converted from binary to decimal real values they are 38237, 5674, 58741 and 11824 in sequence, so the "conversion index sequence" is C = (38237, 5674, 58741, 11824)^T, a 4 × 1 vector.
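The grouping and binary-to-decimal conversion of Example 3 can be sketched as follows. The function name bits_to_values is hypothetical, and the 62-bit input is written as the four groups of Example 3 with the two padding zeros left off.

```python
def bits_to_values(bits, gamma):
    """Zero-pad `bits` to a multiple of gamma and convert each group to decimal."""
    padding = (-len(bits)) % gamma
    padded = list(bits) + [0] * padding
    groups = [padded[i:i + gamma] for i in range(0, len(padded), gamma)]
    return [int("".join(str(b) for b in group), 2) for group in groups]


# Compressed index sequence B of Example 3 (L1 = 62 bits).
B = [int(b) for b in (
    "1001010101011101"
    "0001011000101010"
    "1110010101110101"
    "00101110001100"
)]
C = bits_to_values(B, gamma=16)
```

With γ = 16 every element of C fits in the range 0 to 65535.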

Example 4: an example of constructing the "spread index sequence" from the "conversion index sequence" C of length L₂ by spreading and zero padding is as follows:

a5-1) assume the "conversion index sequence" C = (3.8, 5.6, 5.8, 1.2)^T, L₂ = 4, M = 25, and the spreading sequence Q ∈ R^(q×1) is Q = (1, 1, 1, 1, 1, 1)^T, where q is the spreading gain and satisfies q = ⌊M/L₂⌋ = ⌊25/4⌋ = 6;

wherein Q = (1, 1, 1, 1, 1, 1)^T is chosen for simplicity; the spreading sequence may also be an M-sequence, a Gold sequence, a Zadoff-Chu sequence, or the like;

wherein the symbol ⌊·⌋ denotes the floor (round-down) operation.

a5-2) calculating the Kronecker product S = C ⊗ Q = (3.8, …, 3.8, 5.6, …, 5.6, 5.8, …, 5.8, 1.2, …, 1.2)^T, in which each element of C is repeated 6 times, realizing the spreading of the sequence C; the length of S is 24;

a5-3) appending zeros to the end of the vector S to extend its length from (L₂ × q) to M, i.e. from 24 to 25, thereby constructing the "spread index sequence" of length 25.
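Steps a5-1) to a5-3) of Example 4 can be sketched in plain Python. The function name spread_and_pad is hypothetical, and the default all-ones spreading sequence is only the simple choice used in Example 4; any length-q sequence can be passed instead.

```python
def spread_and_pad(C, M, Q=None):
    """Spread C with sequence Q (Kronecker product) and zero-pad to length M."""
    L2 = len(C)
    q = M // L2                      # spreading gain q = floor(M / L2)
    if Q is None:
        Q = [1] * q                  # all-ones sequence, as in Example 4
    if len(Q) != q:
        raise ValueError("spreading sequence must have length q")
    # Kronecker product C (x) Q: each element of C multiplies the whole of Q.
    S = [c * chip for c in C for chip in Q]
    return S + [0] * (M - len(S)), q


C = [3.8, 5.6, 5.8, 1.2]
spread_index_sequence, q = spread_and_pad(C, M=25)
```

For these values the spreading gain is 6, the spread part has 24 elements, and one zero is appended to reach M = 25.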

(a6) Assigning the weights α and 1 − α to the "spread index sequence" of length M and the "compressed signal sequence" y respectively, superposing them according to z = α × (spread index sequence) + (1 − α) × y to generate a "storage sequence" z of length M, and storing the "storage sequence" z;

the weight α is set according to engineering experience and satisfies 0 ≤ α ≤ 1;

Example 5: an example of constructing the "storage sequence" z is as follows:

on the basis of Example 4, assume M = 25, the "spread index sequence" is (3.8, …, 3.8, 5.6, …, 5.6, 5.8, …, 5.8, 1.2, …, 1.2, 0)^T with each element of C repeated 6 times, the "compressed signal sequence" is y = (y₁, y₂, …, y₂₄, y₂₅)^T, α = 0.2 and 1 − α = 0.8; then according to the formula, z = (0.2 × 3.8 + 0.8y₁, …, 0.2 × 1.2 + 0.8y₂₄, 0.8y₂₅)^T.
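The weighted superposition of Example 5, together with its inverse used later in step (b3), can be sketched as follows. The function names are hypothetical, the sample values of y are made up for illustration, and the inverse assumes α < 1 so that the division is defined.

```python
def superpose(spread, y, alpha):
    """Storage sequence z = alpha * spread + (1 - alpha) * y, element by element."""
    return [alpha * s + (1 - alpha) * v for s, v in zip(spread, y)]


def recover_compressed(z, spread, alpha):
    """Inverse used in step (b3): y = (z - alpha * spread) / (1 - alpha)."""
    return [(zi - alpha * s) / (1 - alpha) for zi, s in zip(z, spread)]


# Spread index sequence of Example 4 (length 25) and a made-up y.
spread = [c for c in (3.8, 5.6, 5.8, 1.2) for _ in range(6)] + [0]
y = [0.1 * k for k in range(25)]
z = superpose(spread, y, alpha=0.2)
y_back = recover_compressed(z, spread, alpha=0.2)
```

The storage sequence has the same length M as y, which is why storing the index information in this way does not increase the storage cost.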

the following describes in detail the reconstruction and reproduction process of the speech signal according to the method for compressed storage and reconstruction of a speech signal based on a superposition sequence, as shown in fig. 3.

(b1) Despreading the "storage sequence" z of length M to restore the "conversion index sequence" C of length L₂;

Example 6: an example of the despreading is as follows:

on the basis of Example 4 and Example 5, assume the "storage sequence" z ∈ R^(M×1), M = 25, and the spreading sequence Q ∈ R^(q×1) is Q = (1, 1, 1, 1, 1, 1)^T, where q = 6 is the spreading gain;

b1-1) from Examples 4 and 5 it is known that z = 0.2 × (spread index sequence) + 0.8y;

b1-2) partitioning the sequence z into L₂ sequences of length q and one pure voice signal sequence of size (M − L₂ × q) × 1, i.e. into 4 sequences z₁, z₂, z₃, z₄ of length 6 and one pure voice signal sequence of length 1; then z = (z₁^T, z₂^T, z₃^T, z₄^T, 0.8y₂₅)^T,

wherein z_i = (0.2C_iQ₁ + 0.8y_i1, 0.2C_iQ₂ + 0.8y_i2, …, 0.2C_iQ₆ + 0.8y_i6)^T, i = 1, 2, 3, 4, and y_ij denotes the j-th element of the i-th block of y;

b1-3) despreading z₁, z₂, z₃, z₄, assuming the despread data are h = (4.56, 6.72, 6.96, 1.44)^T, namely h_i = Q^T z_i;

taking i = 1 as an example, h₁ = (0.2C₁Q₁ + 0.8y₁₁)Q₁ + (0.2C₁Q₂ + 0.8y₁₂)Q₂ + … + (0.2C₁Q₆ + 0.8y₁₆)Q₆ = 4.56;

b1-4) the voice signal sequence y_i1, y_i2, …, y_i6 is linearly uncorrelated with Q₁, Q₂, …, Q₆, so:

0.8y_i1Q₁ + 0.8y_i2Q₂ + … + 0.8y_i6Q₆ = 0;

namely: 0.8y₁₁Q₁ + 0.8y₁₂Q₂ + … + 0.8y₁₆Q₆ = 0;

b1-5) therefore: h_i = 0.2C_iQ₁Q₁ + 0.2C_iQ₂Q₂ + … + 0.2C_iQ₆Q₆;

namely: h₁ = 0.2C₁Q₁Q₁ + 0.2C₁Q₂Q₂ + … + 0.2C₁Q₆Q₆ = 4.56;

b1-6) the spreading sequence Q = (Q₁, Q₂, …, Q₆)^T is known, i.e. Q = (1, 1, 1, 1, 1, 1)^T;

b1-7) despreading thus restores the "conversion index sequence" C = (C₁, C₂, C₃, C₄)^T;

namely: 4.56 = 0.2C₁ + 0.2C₁ + … + 0.2C₁ (six terms), which gives C₁ = 3.8;

in the same way, C₂, C₃ and C₄ are solved, i.e. despreading restores the "conversion index sequence" C = (3.8, 5.6, 5.8, 1.2)^T.
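The despreading of Example 6 can be sketched as a block-wise correlation with the spreading sequence. The function name despread and the toy y are assumptions of this sketch; the recovery is exact only when each block of y is uncorrelated with Q, as assumed in step b1-4), which the toy y below satisfies by construction.

```python
def despread(z, Q, L2, alpha):
    """Recover C from the storage sequence z by correlating each block with Q."""
    q = len(Q)
    energy = sum(chip * chip for chip in Q)      # equals q for a +/-1 sequence
    C = []
    for i in range(L2):
        block = z[i * q:(i + 1) * q]
        h_i = sum(zj * chip for zj, chip in zip(block, Q))
        C.append(h_i / (alpha * energy))         # from h_i = alpha * C_i * energy
    return C


# Rebuild the storage sequence of Example 5 with a toy y whose blocks sum to 0.
Q = [1] * 6
alpha = 0.2
spread = [c * chip for c in (3.8, 5.6, 5.8, 1.2) for chip in Q] + [0]
y = [1, -1, 1, -1, 1, -1] * 4 + [0.5]
z = [alpha * s + (1 - alpha) * v for s, v in zip(spread, y)]
C_restored = despread(z, Q, L2=4, alpha=alpha)
```

With a real compressed signal the cross term in step b1-4) is only approximately zero, so a spreading sequence with good correlation properties (m-sequence, Gold, Zadoff-Chu) is the more realistic choice than the all-ones sequence of the example.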

wherein the example of constructing the "spread index sequence" is the same as that described in Example 4.

(b3) Solving the "compressed signal sequence" y of length M according to y = (z − α × (spread index sequence)) / (1 − α);

wherein the data restoration process is as follows: the real-valued elements of the "conversion index sequence" C of length L₂ are converted into binary numbers, and zero-valued elements are removed from the tail of the resulting binary sequence so that the remaining length is L₁; the sequence formed by the remaining elements is the "compressed index sequence" B.

Example 7: an example of this data restoration is as follows:

on the basis of Example 3, assume the "conversion index sequence" C = (38237, 5674, 58741, 11824)^T and the sequence B has length L₁ = 62. Converting the real-valued elements to binary gives the sequence data 1001010101011101000101100010101011100101011101010010111000110000; removing the last two 0 bits from the end of this binary sequence restores the sequence B = (1,0,0,1,0,1,0,1,0,1,0,1,1,1,0,1, 0,0,0,1,0,1,1,0,0,0,1,0,1,0,1,0, 1,1,1,0,0,1,0,1,0,1,1,1,0,1,0,1, 0,0,1,0,1,1,1,0,0,0,1,1,0,0)^T.
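The data restoration of Example 7 can be sketched as the inverse of the conversion in Example 3. The function name values_to_bits is hypothetical, and the sketch assumes that the group width γ and the original length L₁ are known at reconstruction time, so that only the padding zeros are dropped rather than every trailing zero.

```python
def values_to_bits(values, gamma, L1):
    """Convert each value to a gamma-bit binary group and drop the tail padding."""
    bits = []
    for value in values:
        bits.extend(int(b) for b in format(int(value), f"0{gamma}b"))
    return bits[:L1]


C = [38237, 5674, 58741, 11824]
B = values_to_bits(C, gamma=16, L1=62)
```

Truncating to L₁ matters here: the last group ends in 110000, so stripping every trailing zero would remove four bits instead of the two that were added as padding.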

Example 8: an example of forming the "fixed support set" is as follows:

let the "index sequence" A = (1, 1, 1, 0, 1, 0, 1, 0, 0)^T; the position numbers of the non-zero elements of the index sequence A are recorded in a set to form the "fixed support set".
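Forming the fixed support set from an index sequence can be sketched as follows. The function name fixed_support_set and the 1-based position numbering (chosen to match the column numbering j = 1, 2, …, N used for the measurement matrix) are assumptions of this sketch.

```python
def fixed_support_set(A):
    """Return the 1-based positions of the non-zero elements of index sequence A."""
    return [position for position, a in enumerate(A, start=1) if a != 0]


A = [1, 1, 1, 0, 1, 0, 1, 0, 0]
support = fixed_support_set(A)
```

For this A the positions 1, 2, 3, 5 and 7 are non-zero, so those column indices would be kept in every support set during reconstruction.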

(b6) Reconstructing the sparse voice signal x of length N from the "compressed signal sequence" y of length M using a reconstruction algorithm assisted by the "fixed support set".

Being assisted by the "fixed support set" means that, during reconstruction with a reconstruction algorithm, the "fixed support set" is retained every time the support set is updated in an iteration, thereby assisting reconstruction.

The reconstruction algorithm is, for example, a matching pursuit algorithm, an orthogonal matching pursuit algorithm, a regularized orthogonal matching pursuit algorithm, or the like.

Taking the orthogonal matching pursuit algorithm as the reconstruction algorithm, step b6) includes:

b6-2) initialization;

b6-4) letting Ω_t = Ω_(t-1) ∪ {λ_t};

Example 9: an example of step b6-4) is as follows:

on the basis of Example 8, assume Ω_(t-1) = {1, 4, 7, 10, 14}, t = 1 and λ_t = 17; then Ω_t = Ω_(t-1) ∪ {λ_t} = {1, 4, 7, 10, 14, 17};

b6-5) computing the least-squares solution of y over the columns Φ_t, namely (Φ_t^T Φ_t)^-1 Φ_t^T y;

b6-6) updating the residual r_t as y minus Φ_t times the least-squares solution;

b6-7) setting t = t + 1 and returning to step b6-3);

Example 10: an example of reconstructing the sparse voice signal x is as follows:

assume the reconstructed sparse voice signal has length N = 25 and has non-zero entries at the indices in the support set Ω_t, whose values are the least-squares solution obtained, with Ω_t = {1, 4, 5, 7, 8, 10, 14, 17, 19, 23}; the elements outside the indices in Ω_t are set to 0,

i.e. the reconstructed sparse voice signal of length N = 25 is

x = (x₁, 0, 0, x₄, x₅, 0, x₇, x₈, 0, x₁₀, 0, 0, 0, x₁₄, 0, 0, x₁₇, 0, x₁₉, 0, 0, 0, x₂₃, 0, 0)^T.

It will be appreciated by those of ordinary skill in the art that the examples described herein are intended to assist the reader in understanding the manner in which the invention is practiced, and it is to be understood that the scope of the invention is not limited to such specifically recited statements and examples. Those skilled in the art can make various other specific changes and combinations based on the teachings of the present invention without departing from the spirit of the invention, and these changes and combinations are within the scope of the invention.

Claims

1. A method for compressing, storing and reconstructing a voice signal based on a superposition sequence, characterized by comprising the following steps: (a) compression and storage processing of the voice signal:

(a1) reading a sparsified voice signal x with sparsity K and length N, constructing an "original index sequence" of length N from 0 and 1 elements to record the position indices of the non-zero and zero elements in the voice signal, and storing the sparsity K of the sparse voice signal;

M and N generally satisfy M ≤ N;

(b) reconstruction and reproduction processing of the voice signal:

(b1) despreading the "storage sequence" z of length M to restore the "conversion index sequence" C of length L₂;

(b3) solving the "compressed signal sequence" y of length M according to y = (z − α × (spread index sequence)) / (1 − α);

2. The method of claim 1, wherein: the sparse voice signal in step a1) is obtained by transforming a discrete voice signal from the time domain to the frequency domain by a time-frequency transform method and setting the signal amplitudes below the silence threshold to zero according to a "psychoacoustic model", yielding a sparse voice signal x of length N.

3. The method of claim 1, wherein: the process in step a1) of constructing an "original index sequence" of length N from 0 and 1 elements to record the position indices of the non-zero and zero elements in the voice signal is as follows: each zero element of the sparse voice signal x of length N is recorded as element 0 at the corresponding position of the "original index sequence", and each non-zero element is recorded as element 1 at the corresponding position; the "original index sequence" thus constructed is a sequence of length N whose elements are 0 or 1.

4. The method of claim 1, wherein: the data conversion process of step a4) is as follows: the data of the "compressed index sequence" B of length L₁ are divided into L₂ groups of γ data each; if the number of data in sequence B is not exactly divisible by γ, zeros are appended to construct a sequence whose length is divisible by γ; each group is converted from a binary number to a decimal real value, completing the conversion and yielding the "conversion index sequence" C of length L₂.

5. The method according to claim 1, wherein the specific steps in step a5) of constructing a "spread index sequence" of length M from the "conversion index sequence" C of length L₂ by spreading and zero padding are as follows:

wherein the symbol ⌊·⌋ denotes the floor (round-down) operation;

a5-2) calculating the Kronecker product S = C ⊗ Q = (C₁Q^T, C₂Q^T, …, C_(L₂)Q^T)^T, which spreads the sequence C; S has length (L₂ × q);

wherein the superscript "T" denotes the transpose operation;

a5-3) appending zeros to the end of the vector S to extend its length from (L₂ × q) to M, thereby constructing the "spread index sequence" of length M.

6. The method of claim 1, wherein: the specific steps in step b2) of constructing the "spread index sequence" of length M from the "conversion index sequence" C of length L₂ by spreading and zero padding are the same as steps a5-1) to a5-3).

7. The method of claim 1, wherein: the data restoration process of step b4) is as follows: the real-valued elements of the "conversion index sequence" C of length L₂ are converted into binary numbers, and zero-valued elements are removed from the tail of the resulting binary sequence so that the remaining length is L₁; the sequence formed by the remaining elements is the "compressed index sequence" B.

8. The method of claim 1, wherein: being assisted by the "fixed support set" in step b6) means that, during reconstruction with a reconstruction algorithm, the "fixed support set" is retained every time the support set is updated in an iteration, thereby assisting reconstruction.