CN108962265B

CN108962265B - Voice signal compression storage and reconstruction method based on superposition sequence

Info

Publication number: CN108962265B
Application number: CN201810497026.XA
Authority: CN
Inventors: 卿朝进; 万东琴; 阳庆瑶; 王维; 郭奕
Original assignee: Xihua University
Current assignee: Xihua University
Priority date: 2018-05-22
Filing date: 2018-05-22
Publication date: 2020-08-25
Anticipated expiration: 2038-05-22
Also published as: CN108962265A

Abstract

The invention discloses a method for compressing, storing and reconstructing a voice signal based on a superposition sequence, which comprises the following steps: reading a sparse voice signal, constructing an original index sequence by using non-zero elements and zero element position indexes, and storing the sparsity of the sparse voice signal; compressing the sparse voice signal to generate a compressed signal sequence; intercepting partial sequence of the original index sequence as an index sequence, and generating a spread spectrum index sequence through processing such as coding, conversion, spread spectrum and the like; respectively weighting and superposing the spread spectrum index sequence and the compressed signal sequence to generate a storage sequence for storage; de-spreading the stored sequence to obtain a conversion index sequence and a compressed signal sequence; restoring and decoding the converted index sequence to obtain an index sequence; and constructing a support set according to the index sequence and reconstructing a sparse speech signal. The invention has the advantages that: under the condition of not increasing storage resources, the reconstruction precision of the voice signal is effectively improved.

Description

Voice signal compression storage and reconstruction method based on superposition sequence

Technical Field

The invention relates to the technical field of compression storage and reconstruction of voice signals, in particular to a voice signal compression storage and reconstruction method based on a superposition sequence.

Background

With increasingly frequent information exchange, voice signals have become one of the most common signal types, and the requirements on their processing have become steadily more refined. Owing to the diversity of speech itself and the particular characteristics of the human auditory system, speech signals are sparse in various transform domains. Conventional speech sampling must satisfy the Nyquist sampling rate. Compressed sensing (CS) theory shows that signals which are sparse or compressible can be sampled in compressed form and then reconstructed. Combining CS theory with speech signal processing therefore lowers the sampling frequency and relaxes the requirements on the sampling device.

According to CS theory, the sparse voice signal is compressed through an observation matrix and then reconstructed with a reconstruction algorithm. However, existing reconstruction algorithms such as the matching pursuit algorithm, the orthogonal matching pursuit algorithm, the compressive sampling matching pursuit algorithm, the basis pursuit algorithm and the subspace pursuit algorithm were not designed specifically for reconstructing sparse speech signals. They neither consider nor exploit the element position indexes of the sparse speech signal, which limits the reconstruction accuracy.

Disclosure of Invention

To address the defects of the prior art, the invention provides a voice signal compression storage and reconstruction method based on a superposition sequence. Compared with conventional compressed-sensing voice compression, the method uses part of the position indexes of the elements of the sparse voice signal to assist reconstruction, and thereby improves the reconstruction accuracy of the voice signal without increasing the storage cost.

In order to realize the purpose, the technical scheme adopted by the invention is as follows:

A method for compressing, storing and reconstructing a voice signal based on a superposition sequence comprises the following steps:

(a) Compression and storage processing of the voice signal:

(a1) reading a sparsified voice signal x with sparsity K and length N, constructing an "original index sequence" of length N from elements 0 and 1 to record the position indexes of the non-zero elements and zero elements in the voice signal, and storing the sparsity K of the sparse voice signal;

(a2) reading a pre-stored M × N measurement matrix Φ, and compressing the voice signal with the measurement matrix to generate a "compressed signal sequence" y of length M, the compression process being expressed as y = Φx;

the measurement matrix is an existing measurement matrix such as a Gaussian random matrix, a Bernoulli random matrix or a partial Hadamard matrix;

M and N generally satisfy M ≤ N;

(a3) intercepting the "original index sequence" of length N to obtain an "index sequence" A of length βN, wherein the interception coefficient β is set according to engineering experience and satisfies 0 < β ≤ 1;

(a4) compressing and coding the "index sequence" A of length βN by Huffman coding to generate a "compressed index sequence" B of length L₁, and performing data conversion on B to obtain a "conversion index sequence" C of length L₂;

(a5) spreading the "conversion index sequence" C of length L₂ and constructing, by zero padding, a "spread spectrum index sequence" of length M;

(a6) assigning the weights α and 1 − α to the "spread spectrum index sequence" of length M and the "compressed signal sequence" y respectively and adding them, i.e. forming α times the "spread spectrum index sequence" plus (1 − α) times y, to generate a "storage sequence" z of length M, and storing the "storage sequence" z;

the weight α is set according to engineering experience and satisfies 0 ≤ α ≤ 1.
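The weighted superposition of step (a6) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `superpose` and the toy values are hypothetical, and the formula is taken from the textual description of step (a6) (weights α and 1 − α, then addition).

```python
def superpose(spread_index, y, alpha):
    """Return z = alpha * spread_index + (1 - alpha) * y, element by element."""
    if len(spread_index) != len(y):
        raise ValueError("both sequences must have the same length M")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must satisfy 0 <= alpha <= 1")
    return [alpha * s + (1.0 - alpha) * v for s, v in zip(spread_index, y)]


# Toy data with M = 6 (illustrative values, not taken from the patent)
spread_index = [3.0, 3.0, 3.0, 5.0, 5.0, 0.0]
y = [1.0, -1.0, 0.5, -0.5, 2.0, 4.0]
z = superpose(spread_index, y, alpha=0.2)
```

Because z has the same length M as y, storing z in place of y does not enlarge the stored sequence, which is the property the method relies on.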

(b) Reconstruction and reproduction processing of the voice signal:

(b1) despreading the "storage sequence" z of length M to restore the "conversion index sequence" C of length L₂;

(b2) spreading the "conversion index sequence" C of length L₂ and constructing, by zero padding, the "spread spectrum index sequence" of length M;

(b3) solving the "compressed signal sequence" y of length M from z and the "spread spectrum index sequence" by inverting the weighted superposition of step (a6);

(b4) performing data reduction on the "conversion index sequence" C of length L₂ to obtain the "compressed index sequence" B of length L₁, and then restoring the "index sequence" A of length βN by Huffman decoding;

(b5) recording the column numbers of the non-zero elements of the "index sequence" A of length βN in a set to form a "fixed support set";

(b6) using the "fixed support set" as an aid, in combination with a reconstruction algorithm, to reconstruct the sparse voice signal x of length N from the "compressed signal sequence" y of length M.
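Step (b3) can be illustrated as the inverse of the weighted sum used when storing. The sketch below assumes that inverse has the form y = (z − α · spread index) / (1 − α), which follows from the description of step (a6) but is not reproduced as a formula in this text; the function name and toy values are hypothetical.

```python
def recover_compressed(z, spread_index, alpha):
    """Invert z = alpha * spread_index + (1 - alpha) * y to obtain y."""
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must lie in [0, 1) for y to be recoverable")
    return [(zi - alpha * si) / (1.0 - alpha) for zi, si in zip(z, spread_index)]


# Round trip on toy data (illustrative values only)
alpha = 0.2
spread_index = [3.0, 3.0, 5.0, 5.0, 0.0]
y_true = [1.0, -1.0, 0.5, -0.5, 4.0]
z = [alpha * s + (1.0 - alpha) * v for s, v in zip(spread_index, y_true)]
y_rec = recover_compressed(z, spread_index, alpha)
```

Note that α = 1 would discard y entirely, so the sketch rejects it even though the stated range for α includes 1.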

Further, the sparse speech signal described in step a1) is obtained as follows: a discrete speech signal is transformed from a time-domain signal into a frequency-domain signal by a time-frequency transformation method, and signal amplitudes below the silence threshold are set to zero according to a "psychoacoustic model", yielding a sparse speech signal x of length N.

The "psychoacoustic model" is, for example, the MPEG (Moving Picture Experts Group) psychoacoustic model or the Ogg Vorbis psychoacoustic model.

The time-frequency transformation method may be the discrete cosine transform, the short-time Fourier transform or the wavelet transform.
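As a rough illustration of the sparsification step, the sketch below applies an unnormalised DCT-II to one frame and zeroes every coefficient whose magnitude falls below a threshold. It is a simplification under loud assumptions: a single fixed threshold stands in for the frequency-dependent silence threshold of a real psychoacoustic model, and the function names, frame and threshold value are hypothetical.

```python
import math


def dct2(frame):
    """Unnormalised DCT-II of a real-valued frame."""
    n = len(frame)
    return [
        sum(frame[t] * math.cos(math.pi * (t + 0.5) * k / n) for t in range(n))
        for k in range(n)
    ]


def sparsify(coeffs, threshold):
    """Zero every coefficient whose magnitude is below the threshold.

    ASSUMPTION: one fixed threshold replaces the psychoacoustic model.
    """
    return [c if abs(c) >= threshold else 0.0 for c in coeffs]


# Toy frame: one strong and one weak cosine component (illustrative only)
N = 32
frame = [
    math.cos(math.pi * (t + 0.5) * 4 / N) + 0.01 * math.cos(math.pi * (t + 0.5) * 9 / N)
    for t in range(N)
]
x = sparsify(dct2(frame), threshold=1.0)
K = sum(1 for c in x if c != 0.0)  # sparsity of the resulting signal
```

Here only the strong component survives thresholding, so the resulting x has sparsity K = 1.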

Further, the process described in step a1) of constructing an "original index sequence" of length N from elements 0 and 1 to record the position indexes of non-zero and zero elements in the speech signal is as follows: each zero element of the sparse speech signal x of length N is recorded as element 0 at the corresponding position of the "original index sequence", and each non-zero element is recorded as element 1. The "original index sequence" constructed in this way is a sequence of length N whose elements are 0 or 1.

Further, the data conversion process in step a4) is as follows: the data of the "compressed index sequence" B of length L₁ are divided into L₂ groups of γ data each; if the number of data in sequence B is not divisible by γ, zeros are appended so that it becomes divisible by γ; each group is then converted from a binary number into a decimal real value, which completes the conversion and yields the "conversion index sequence" C of length L₂.

Further, the specific steps of step a5), in which the "conversion index sequence" C of length L₂ is used to construct the "spread spectrum index sequence" of length M by spreading and zero padding, are as follows:

a5-1) Let C be the "conversion index sequence" and let Q ∈ R^(q×1) be the spreading sequence, where the spreading gain q satisfies q = ⌊M/L₂⌋.

The spreading sequence Q may be an m-sequence, an M-sequence, a Gold sequence or a Zadoff-Chu sequence.

The symbol ⌊·⌋ denotes rounding down to an integer.

a5-2) Compute the Kronecker product S = C ⊗ Q to spread the sequence C; the length of S is (L₂ × q).

The superscript "T" denotes the transpose operation.

a5-3) Append zeros to the end of the vector S, extending its length from (L₂ × q) to M, thereby constructing the "spread spectrum index sequence" of length M.

Further, the specific steps of step b2), in which the "conversion index sequence" C of length L₂ is used to construct the "spread spectrum index sequence" of length M by spreading and zero padding, are the same as steps a5-1) to a5-3).

Further, the data reduction process in step b4) is as follows: each real-valued element of the "conversion index sequence" C of length L₂ is converted into a binary number, and the zero-valued padding elements are removed from the tail of the resulting binary sequence so that the remaining sequence has length L₁; the sequence formed by the remaining elements is the "compressed index sequence" B.

Further, using the "fixed support set" as an aid in step b6) means that, during reconstruction with a reconstruction algorithm, the "fixed support set" is retained every time the support set is updated in an iteration, so that it assists the reconstruction.

The reconstruction algorithm is, for example, a matching pursuit algorithm, an orthogonal matching pursuit algorithm or a regularized orthogonal matching pursuit algorithm.

Further, taking the orthogonal matching pursuit algorithm as the reconstruction algorithm, step b6) comprises:

b6-1) Read the "compressed signal sequence" y ∈ R^(M×1), the measurement matrix Φ ∈ R^(M×N) and the sparsity K. The notation is as follows: t is the iteration count; r_t is the residual after t iterations; Ω_t is the index set (column indices) after t iterations, i.e. the support set after t iterations; K_t is the number of elements in the index set Ω_t; the least-squares solution at iteration t is a K_t × 1 vector; λ_t is the index (column index) found at the t-th iteration; a_j is the j-th column of the matrix Φ (j = 1, 2, …, N); the columns of Φ selected according to the "fixed support set" form a further column set; Φ_t is the set of columns of Φ selected by the index set Ω_t (a matrix of size M × K_t); the symbol ∪ denotes set union; |·| denotes the absolute value; <X, Y> denotes the inner product of the vectors X and Y; ||·|| denotes the vector 2-norm; and (·)^(−1) denotes matrix inversion;

b6-2) Initialization;

b6-3) If K_t < K, find the index λ_t = arg max over j = 1, 2, …, N of |<r_(t−1), a_j>|; otherwise compute the least-squares solution (Φ_t^T Φ_t)^(−1) Φ_t^T y and perform step b6-8);

b6-4) Let Ω_t = Ω_(t−1) ∪ {λ_t} and update Φ_t accordingly;

b6-5) Compute the least-squares solution (Φ_t^T Φ_t)^(−1) Φ_t^T y;

b6-6) Update the residual r_t = y − Φ_t (Φ_t^T Φ_t)^(−1) Φ_t^T y;

b6-7) Set t = t + 1 and return to step b6-3);

b6-8) The reconstructed sparse speech signal has non-zero entries at the indices in the support set Ω_t, whose values are the least-squares solution obtained; the elements outside the support set Ω_t are set to 0, thereby reconstructing the sparse speech signal x.
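The iteration of steps b6-1) to b6-8) can be sketched in plain Python as an orthogonal matching pursuit whose support set starts from the fixed support set and never drops it. This is a sketch under assumptions, not the patented procedure verbatim: the initialization (support set = fixed support set, residual = least-squares residual on that set) is assumed because the initial values are not given in this text, indices are 0-based, and all names and the toy matrix are hypothetical.

```python
def solve_least_squares(columns, y):
    """Least-squares coefficients via the normal equations (Gauss-Jordan)."""
    k = len(columns)
    # Augmented system [A^T A | A^T y]
    G = [
        [sum(a * b for a, b in zip(columns[i], columns[j])) for j in range(k)]
        + [sum(a * b for a, b in zip(columns[i], y))]
        for i in range(k)
    ]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(G[r][col]))
        G[col], G[pivot] = G[pivot], G[col]
        for r in range(k):
            if r != col:
                f = G[r][col] / G[col][col]
                G[r] = [a - f * b for a, b in zip(G[r], G[col])]
    return [G[i][k] / G[i][i] for i in range(k)]


def omp_fixed_support(Phi, y, K, fixed):
    """OMP in which the indices in `fixed` stay in the support set throughout."""
    M, N = len(Phi), len(Phi[0])
    cols = [[Phi[m][j] for m in range(M)] for j in range(N)]
    support = list(fixed)  # ASSUMED initialization: start from the fixed set

    def fit(current):
        theta = solve_least_squares([cols[j] for j in current], y)
        res = [y[m] - sum(t * cols[j][m] for t, j in zip(theta, current))
               for m in range(M)]
        return theta, res

    theta, residual = fit(support) if support else ([], list(y))
    while len(support) < K:
        # Column most correlated with the current residual
        best = max((j for j in range(N) if j not in support),
                   key=lambda j: abs(sum(a * b for a, b in zip(cols[j], residual))))
        support.append(best)
        theta, residual = fit(support)
    x_hat = [0.0] * N
    for t, j in zip(theta, support):
        x_hat[j] = t  # non-zero entries only on the support set
    return x_hat, sorted(support)


# Toy problem: M = 4, N = 6, true x = (2, 0, 0, -1, 0, 0), index 0 known in advance
Phi = [
    [1.0, 0.0, 0.0, 0.0, 0.5, 0.5],
    [0.0, 1.0, 0.0, 0.0, 0.5, -0.5],
    [0.0, 0.0, 1.0, 0.0, 0.5, 0.5],
    [0.0, 0.0, 0.0, 1.0, 0.5, -0.5],
]
y = [2.0, 0.0, 0.0, -1.0]
x_hat, support = omp_fixed_support(Phi, y, K=2, fixed=[0])
```

Starting from the fixed support set means those indices never have to be found by the greedy search, which is where the accuracy gain described in the text would come from.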

Compared with the prior art, the invention has the advantages that:

Part of the position indexes of the sparse speech signal is stored without increasing the storage cost, and the reconstruction accuracy is effectively improved compared with conventional compressed-sensing speech compression.

Drawings

FIG. 1 is a schematic flow chart of a method for storing and reconstructing a speech signal sample based on a superposition sequence according to an embodiment of the present invention;

FIG. 2 is a schematic flow chart of the compression and storage process of the speech signal based on the superposition sequence sampling storage and reconstruction method according to the embodiment of the present invention;

FIG. 3 is a schematic flow chart of the reconstruction and reproduction process of the voice signal based on the method for storing and reconstructing the voice signal samples of the superposition sequence according to the embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail below with reference to the accompanying drawings by way of examples.

A flow chart of a method for storing and reconstructing speech signal samples based on a superposition sequence is shown in fig. 1.

The following describes in detail the processing procedure of compressing and storing the voice signal according to the method for compressing, storing and reconstructing the voice signal based on the superposition sequence, as shown in fig. 2.

The sparse speech signal is obtained as follows: a discrete speech signal is transformed from a time-domain signal into a frequency-domain signal by a time-frequency transformation method, and signal amplitudes below the silence threshold are set to zero according to a psychoacoustic model, yielding a sparse speech signal x of length N.

The "psychoacoustic model" is, for example, the MPEG (Moving Picture Experts Group) psychoacoustic model, the Ogg Vorbis psychoacoustic model, and so on.

The time-frequency transformation method is, for example, the discrete cosine transform, the short-time Fourier transform, the wavelet transform, and so on.

The "original index sequence" of length N is constructed from elements 0 and 1: each zero element of x is recorded as element 0 and each non-zero element as element 1, so the "original index sequence" constructed in this way is a sequence of length N whose elements are 0 or 1.

Example 1: an example of this construction is as follows:

Assume a sparse speech signal of length N = 18,

x = (5.4, 3.2, 6.7, 0, 0.9, 0, 7.8, 0, 0, 1.2, 0.8, 0, 4.2, 0, 0, 0, 0, 0)^T; then the "original index sequence" is (1, 1, 1, 0, 1, 0, 1, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0)^T,

where the superscript "T" denotes the transpose operation.
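The construction in Example 1 amounts to a one-line mapping, sketched here in Python; the function name is hypothetical and the data are the signal x given in Example 1.

```python
def build_index_sequence(x):
    """Record each non-zero element of x as 1 and each zero element as 0."""
    return [1 if value != 0 else 0 for value in x]


# Sparse signal from Example 1 (N = 18)
x = [5.4, 3.2, 6.7, 0, 0.9, 0, 7.8, 0, 0, 1.2, 0.8, 0, 4.2, 0, 0, 0, 0, 0]
original_index = build_index_sequence(x)
K = sum(original_index)  # sparsity: number of non-zero elements
```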

(a2) Reading a pre-stored M × N measurement matrix Φ, and compressing the voice signal with the measurement matrix to generate a "compressed signal sequence" y of length M, wherein the compression process can be expressed as y = Φx;

M and N generally satisfy M ≤ N;

(a3) The "original index sequence" of length N is intercepted to obtain an "index sequence" A of length βN;

Example 2: an example of this interception is as follows:

On the basis of Example 1, assume β = 0.5, so that βN = 9; with the "original index sequence" (1, 1, 1, 0, 1, 0, 1, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0)^T,

the "index sequence" is A = (1, 1, 1, 0, 1, 0, 1, 0, 0)^T.
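Example 2 keeps the leading βN elements of the original index sequence. A minimal sketch follows; taking the leading part matches the numbers in Example 2, while the rounding of βN to an integer is an assumption, and the function name is hypothetical.

```python
def intercept(original_index, beta):
    """Keep the first beta * N elements of the original index sequence."""
    if not 0.0 < beta <= 1.0:
        raise ValueError("beta must satisfy 0 < beta <= 1")
    length = int(round(beta * len(original_index)))  # ASSUMED rounding rule
    return original_index[:length]


original_index = [1, 1, 1, 0, 1, 0, 1, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0]
A = intercept(original_index, beta=0.5)
```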

The data conversion process is as follows: the data of the "compressed index sequence" B of length L₁ are divided into L₂ groups of γ data each; if the number of data in sequence B is not divisible by γ, zeros are appended so that it becomes divisible by γ; each group is then converted from a binary number into a decimal real value, which completes the conversion and yields the "conversion index sequence" C of length L₂.

Example 3: an example of this conversion is as follows:

Assume a sequence B of length L₁ = 62, B = (1,0,0,1,0,1,0,1,0,1,0,1,1,1,0,1, 0,0,0,1,0,1,1,0,0,0,1,0,1,0,1,0, 1,1,1,0,0,1,0,1,0,1,1,1,0,1,0,1, 0,0,1,0,1,1,1,0,0,0,1,1,0,0)^T. With γ = 16, two 0 bits are appended at the end and the data are divided into 4 groups, i.e. L₂ = 4. The 4 groups are, in order, 1001010101011101, 0001011000101010, 1110010101110101 and 0010111000110000; converted from binary to decimal real numbers they are 38237, 5674, 58741 and 11824, so the "conversion index sequence" is C = (38237, 5674, 58741, 11824)^T, a 4 × 1 vector.
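The conversion in Example 3 can be checked with a short Python sketch. The function name is hypothetical; the bit string is the 62-bit sequence B implied by the four 16-bit groups listed in the example (the last group without its two padding zeros).

```python
def bits_to_decimals(bits, gamma):
    """Zero-pad to a multiple of gamma, then read each group as a binary number."""
    padded = list(bits)
    remainder = len(padded) % gamma
    if remainder:
        padded += [0] * (gamma - remainder)
    groups = [padded[i:i + gamma] for i in range(0, len(padded), gamma)]
    return [int("".join(str(b) for b in group), 2) for group in groups]


# 62-bit compressed index sequence B from Example 3
B_text = (
    "1001010101011101"
    "0001011000101010"
    "1110010101110101"
    "00101110001100"
)
B = [int(ch) for ch in B_text]
C = bits_to_decimals(B, gamma=16)
```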

Example 4: an example of using the "conversion index sequence" C of length L₂ to construct the "spread spectrum index sequence" by spreading and zero padding is as follows:

a5-1) Assume "conversion index sequence" C = (3.8, 5.6, 5.8, 1.2)^T, L₂ = 4, M = 25, and let Q ∈ R^(q×1) be the spreading sequence, Q = (1,1,1,1,1,1)^T, where the spreading gain q satisfies q = ⌊M/L₂⌋ = ⌊25/4⌋ = 6.

Here Q = (1,1,1,1,1,1)^T is chosen for simplicity; the spreading sequence may also be an M-sequence, a Gold sequence, a Zadoff-Chu sequence, or the like.

The symbol ⌊·⌋ denotes rounding down to an integer.

a5-2) Compute the Kronecker product S = C ⊗ Q to spread the sequence C; S has length L₂ × q = 24;

a5-3) Append zeros to the end of the vector S, extending its length from (L₂ × q) to M, i.e. from 24 to 25, thereby constructing the "spread spectrum index sequence" of length 25.
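Example 4 can be reproduced with the sketch below. It assumes the Kronecker product takes the form S = C ⊗ Q (each element of C multiplied by the whole spreading sequence) and an all-ones Q of length q = 6; the function name is hypothetical.

```python
def spread_and_pad(C, Q, M):
    """Spread C with Q via the Kronecker product, then zero-pad to length M."""
    q = len(Q)
    if q != M // len(C):
        raise ValueError("spreading gain must equal floor(M / L2)")
    S = [c * chip for c in C for chip in Q]  # Kronecker product C (x) Q
    return S + [0.0] * (M - len(S))


C = [3.8, 5.6, 5.8, 1.2]
Q = [1.0] * 6  # all-ones spreading sequence, q = floor(25 / 4) = 6
spread_index = spread_and_pad(C, Q, M=25)
```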

(a6) The "spread spectrum index sequence" of length M and the "compressed signal sequence" y are assigned the weights α and 1 − α respectively and added to generate a "storage sequence" z of length M, which is stored;

the weight α is set according to engineering experience and satisfies 0 ≤ α ≤ 1;

Example 5: an example of constructing the "storage sequence" z is as follows:

On the basis of Example 4, assume M = 25, the "spread spectrum index sequence" of length 25 constructed there, the "compressed signal sequence" y = (y₁, y₂, …, y₂₄, y₂₅)^T, α = 0.2 and 1 − α = 0.8; the "storage sequence" z is then 0.2 times the "spread spectrum index sequence" plus 0.8 times y.

The reconstruction and reproduction processing of the speech signal according to the method for compressed storage and reconstruction of a speech signal based on a superposition sequence is described in detail below, as shown in FIG. 3.

(b1) Despreading the "storage sequence" z of length M to restore the "conversion index sequence" C of length L₂;

Example 6: an example of "despreading" is as follows:

On the basis of Examples 4 and 5, assume "storage sequence" z ∈ R^(M×1), M = 25, and spreading sequence Q ∈ R^(q×1), Q = (1,1,1,1,1,1)^T, where q = 6 is the spreading gain;

b1-1) From Examples 4 and 5, z is the sum of the "spread spectrum index sequence" weighted by 0.2 and the "compressed signal sequence" y weighted by 0.8;

b1-2) Partition the sequence z into L₂ blocks of size q × 1 and one (M − L₂ × q) × 1 pure speech signal sequence, i.e. into 4 sequences z₁, z₂, z₃, z₄ of length 6 and one pure speech signal sequence of length 1, where the k-th element of block z_i is 0.2C_iQ_k + 0.8y_ik (k = 1, 2, …, 6);

b1-3) Despread z₁, z₂, z₃, z₄ by correlating each block with Q, and assume the despread data are h = (4.56, 6.72, 6.96, 1.44)^T. Taking i = 1 as an example: h₁ = (0.2C₁Q₁ + 0.8y₁₁)Q₁ + (0.2C₁Q₂ + 0.8y₁₂)Q₂ + … + (0.2C₁Q₆ + 0.8y₁₆)Q₆;

b1-4) The speech signal sequence y_i1, y_i2, …, y_i6 and Q₁, Q₂, …, Q₆ are linearly uncorrelated, so:

0.8y_i1Q₁ + 0.8y_i2Q₂ + … + 0.8y_i6Q₆ = 0;

namely, for i = 1: 0.8y₁₁Q₁ + 0.8y₁₂Q₂ + … + 0.8y₁₆Q₆ = 0;

b1-5) Therefore h_i = 0.2C_i(Q₁² + Q₂² + … + Q₆²);

namely, for i = 1: h₁ = 0.2C₁(Q₁² + Q₂² + … + Q₆²);

b1-6) The spreading sequence Q = (Q₁, Q₂, …, Q₆)^T is known, i.e. Q = (1,1,1,1,1,1)^T;

b1-7) Despreading therefore restores the "conversion index sequence" C = (C₁, C₂, C₃, C₄)^T;

namely: 4.56 = 0.2C₁ + 0.2C₁ + … + 0.2C₁ (six terms), which gives C₁ = 3.8;

C₂, C₃ and C₄ are solved in the same way, so despreading recovers the "conversion index sequence" C = (3.8, 5.6, 5.8, 1.2)^T.
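The despreading of Example 6 can be sketched as a block-wise correlation with Q. The sketch assumes h_i is the inner product of block z_i with Q and that C_i = h_i / (α(Q₁² + … + Q_q²)); the toy y blocks are chosen to be exactly orthogonal to Q so that the cross term vanishes, as step b1-4) assumes. Names and the y values are hypothetical.

```python
def despread(z, Q, L2, alpha):
    """Recover C from z by correlating each length-q block of z with Q."""
    if alpha <= 0.0:
        raise ValueError("alpha must be positive for C to be recoverable")
    q = len(Q)
    energy = sum(chip * chip for chip in Q)  # Q1^2 + ... + Qq^2
    C = []
    for i in range(L2):
        block = z[i * q:(i + 1) * q]
        h_i = sum(v * chip for v, chip in zip(block, Q))  # despread value
        C.append(h_i / (alpha * energy))
    return C


C_true = [3.8, 5.6, 5.8, 1.2]
Q = [1.0] * 6
alpha = 0.2
spread = [c * chip for c in C_true for chip in Q] + [0.0]  # length M = 25
# Each y block sums to zero, i.e. it is orthogonal to the all-ones Q
y = [1.0, -1.0, 2.0, -2.0, 0.5, -0.5] * 4 + [0.7]
z = [alpha * s + (1.0 - alpha) * v for s, v in zip(spread, y)]
C_rec = despread(z, Q, L2=4, alpha=alpha)
```

For a real compressed signal the blocks of y are only approximately uncorrelated with Q, so the recovered C would carry a residual error proportional to (1 − α) times that correlation.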

(b2) Spreading the "conversion index sequence" C of length L₂ and constructing, by zero padding, the "spread spectrum index sequence" of length M;

the construction of the "spread spectrum index sequence" is the same as in Example 4.

(b3) Solving the "compressed signal sequence" y of length M from z and the "spread spectrum index sequence" by inverting the weighted superposition of step (a6);

The data reduction process is as follows: each real-valued element of the "conversion index sequence" C of length L₂ is converted into a binary number, and the zero-valued padding elements are removed from the tail of the resulting binary sequence so that the remaining sequence has length L₁; the sequence formed by the remaining elements is the "compressed index sequence" B.

Example 7: an example of this data reduction is as follows:

On the basis of Example 3, assume "conversion index sequence" C = (38237, 5674, 58741, 11824)^T and that sequence B has length L₁ = 62. Converting the real elements into binary gives the sequence data 1001010101011101000101100010101011100101011101010010111000110000; removing the last two 0 digits from the tail of this binary sequence restores the sequence B = (1,0,0,1,0,1,0,1,0,1,0,1,1,1,0,1, 0,0,0,1,0,1,1,0,0,0,1,0,1,0,1,0, 1,1,1,0,0,1,0,1,0,1,1,1,0,1,0,1, 0,0,1,0,1,1,1,0,0,0,1,1,0,0)^T.
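Example 7 can be checked with the sketch below, which converts each element of C back to a γ-bit group and truncates the result to L₁ bits. Truncating to a known L₁ is an assumption made here (stripping every trailing zero would also remove genuine trailing zeros of B, which ends in 0, 0 in this example); the function name is hypothetical.

```python
def decimals_to_bits(C, gamma, L1):
    """Expand each value of C to gamma bits, then keep the first L1 bits."""
    bits = []
    for value in C:
        bits.extend(int(ch) for ch in format(int(value), "0{}b".format(gamma)))
    return bits[:L1]  # ASSUMPTION: L1 is known when the data are reduced


C = [38237, 5674, 58741, 11824]
B = decimals_to_bits(C, gamma=16, L1=62)
B_text = "".join(str(b) for b in B)
```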

Example 8: an example of forming the "fixed support set" is as follows:

Let "index sequence" A = (1,1,1,0,1,0,1,0,0)^T; the sequence numbers of the non-zero elements of the index sequence A are recorded in a set to form the "fixed support set".
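Following the definition in step (b5), the fixed support set for this A can be computed as below. The sketch uses 1-based sequence numbers to match the notation of the text; the function name is hypothetical.

```python
def fixed_support(A):
    """Return the 1-based positions of the non-zero elements of A."""
    return [j for j, flag in enumerate(A, start=1) if flag != 0]


A = [1, 1, 1, 0, 1, 0, 1, 0, 0]
support = fixed_support(A)
```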

(b6) Using the "fixed support set" as an aid, in combination with a reconstruction algorithm, the sparse speech signal x of length N is reconstructed from the "compressed signal sequence" y of length M.

Using the "fixed support set" as an aid means that, during reconstruction with a reconstruction algorithm, the "fixed support set" is retained every time the support set is updated in an iteration, so that it assists the reconstruction.

The reconstruction algorithm is, for example, a matching pursuit algorithm, an orthogonal matching pursuit algorithm, a regularized orthogonal matching pursuit algorithm, and so on.

Taking the orthogonal matching pursuit algorithm as the reconstruction algorithm, step b6) comprises the following, with the notation as defined for step b6-1) above, including the set of columns of Φ selected according to the "fixed support set":

b6-2) Initialization;

b6-3) If K_t < K, find the index λ_t = arg max over j = 1, 2, …, N of |<r_(t−1), a_j>|; otherwise compute the least-squares solution (Φ_t^T Φ_t)^(−1) Φ_t^T y and perform step b6-8);

b6-4) Let Ω_t = Ω_(t−1) ∪ {λ_t} and update Φ_t accordingly;

Example 9: an example of step b6-4) is as follows:

On the basis of Example 8, assume t = 1 and λ_t = 17; then Ω_t = Ω_(t−1) ∪ {λ_t} = {1, 4, 7, 10, 14, 17}.

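b6-5) Compute the least-squares solution (Φ_t^T Φ_t)^(−1) Φ_t^T y;

b6-6) Update the residual r_t = y − Φ_t (Φ_t^T Φ_t)^(−1) Φ_t^T y;

b6-7) Set t = t + 1 and return to step b6-3);

b6-8) The reconstructed sparse speech signal has non-zero entries at the indices in the support set Ω_t, whose values are the least-squares solution obtained; the elements outside the support set Ω_t are set to 0, thereby reconstructing the sparse speech signal x.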

Example 10: an example of reconstructing the sparse speech signal x is as follows:

Assume the reconstructed sparse speech signal has length N = 25 and has non-zero entries at the indices in the support set Ω_t, whose values are the least-squares solution, with

Ω_t = {1, 4, 5, 7, 8, 10, 14, 17, 19, 23}.

Setting the elements outside the support set Ω_t to 0 gives the reconstructed sparse speech signal of length N = 25:

x = (x₁,0,0,x₄,x₅,0,x₇,x₈,0,x₁₀,0,0,0,x₁₄,0,0,x₁₇,0,x₁₉,0,0,0,x₂₃,0,0)^T

It will be appreciated by those of ordinary skill in the art that the examples described herein are intended to assist the reader in understanding the manner in which the invention is practiced, and it is to be understood that the scope of the invention is not limited to such specifically recited statements and examples. Those skilled in the art can make various other specific changes and combinations based on the teachings of the present invention without departing from the spirit of the invention, and these changes and combinations are within the scope of the invention.

Claims

1. A method for compressing, storing and reconstructing a voice signal based on a superposition sequence, characterized by comprising the following steps:

(a) Compression and storage processing of the voice signal:

M and N generally satisfy M ≤ N;

(a3) the "original index sequence" of length N is intercepted;

the specific steps of step a5), in which the "conversion index sequence" C of length L₂ is used to construct the "spread spectrum index sequence" of length M by spreading and zero padding, are as follows:

a5-1) let C be the "conversion index sequence";

wherein the symbol ⌊·⌋ denotes rounding down to an integer;

a5-2) compute the Kronecker product to spread the sequence C, the length of S being (L₂ × q);

wherein the superscript "T" denotes the transpose operation;

the "spread spectrum index sequence" has length M;

(a6) the "spread spectrum index sequence" of length M and the "compressed signal sequence" y are assigned the weights α and 1 − α respectively and added;

(b) reconstruction reproduction processing of a speech signal:

(b3) Using formulas

Solving a 'compressed signal sequence' y with the length M;

(b6) By using "fixed support assembly"

2. The method of claim 1, wherein: the sparse speech signal described in step a1) is a discrete speech signal which is transformed from a time domain signal to a frequency domain signal by a time-frequency transformation method, and the signal amplitude below a mute threshold is set to zero according to a "psychoacoustic model" to obtain a sparse speech signal x with the length of N.

3. The method of claim 1, wherein: the "original index sequence" of length N constructed from elements 0 and 1 in step a1) is a sequence of length N whose elements are 0 or 1.

4. The method of claim 1, wherein: the data conversion process of step a4) is as follows: the data of the "compressed index sequence" B of length L₁ are divided into L₂ groups of γ data each; if the number of data in sequence B is not divisible by γ, zeros are appended so that it becomes divisible by γ; each group is then converted from a binary number into a decimal real value, which completes the conversion and yields the "conversion index sequence" C of length L₂.

5. The method of claim 1, wherein: the specific steps of step b2), in which the "conversion index sequence" C of length L₂ is used to construct the "spread spectrum index sequence" of length M by spreading and zero padding, are the same as steps a5-1) to a5-3).

6. The method of claim 1, wherein: the data reduction process of the step b4) is as follows: will have a length L₂The real number element in the "conversion index sequence" C of (1) is converted into a binary number, and an element having an amplitude value of zero is removed from the tail of the binary number obtained by the conversion, so that the length of the remaining element is L₁And the sequence formed by the rest elements is the 'compressed index sequence' B.

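7. The method of claim 1, wherein: using the "fixed support set" as an aid in step b6) means that, during reconstruction with a reconstruction algorithm, the "fixed support set" is retained every time the support set is updated in an iteration, so that it assists the reconstruction.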