CN108962265B - Voice signal compression storage and reconstruction method based on superposition sequence - Google Patents
Voice signal compression storage and reconstruction method based on superposition sequence
- Publication number: CN108962265B
- Application number: CN201810497026.XA
- Authority: CN (China)
- Prior art keywords: sequence, length, index sequence, zero, signal
- Legal status: Active (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/02—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders
- G10L19/0212—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders using orthogonal transformation
Abstract
The invention discloses a method for compressing, storing and reconstructing a voice signal based on a superposition sequence, comprising the following steps: reading a sparse voice signal, constructing an original index sequence from the position indexes of its non-zero and zero elements, and storing the sparsity of the sparse voice signal; compressing the sparse voice signal to generate a compressed signal sequence; intercepting part of the original index sequence as an index sequence, and generating a spread-spectrum index sequence through coding, conversion and spreading; weighting the spread-spectrum index sequence and the compressed signal sequence and superposing them to generate a storage sequence, which is stored; de-spreading the storage sequence to obtain the conversion index sequence and the compressed signal sequence; restoring and decoding the conversion index sequence to obtain the index sequence; and constructing a support set from the index sequence and reconstructing the sparse voice signal. The advantage of the invention is that the reconstruction accuracy of the voice signal is effectively improved without increasing storage resources.
Description
Technical Field
The invention relates to the technical field of compression storage and reconstruction of voice signals, in particular to a voice signal compression storage and reconstruction method based on a superposition sequence.
Background
With increasingly frequent information interaction, voice signals have become a very common signal type, and the technical requirements for processing them are becoming more refined. Owing to the diversity of speech signals and the characteristics of the human auditory system, a speech signal is sparse in various transform domains. Conventional speech sampling typically must satisfy the Nyquist sampling rate. Compressed sensing (CS) theory indicates that signals that are sparse or compressible can be sampled in compressed form and reconstructed by compressed sensing techniques. Combining CS theory with speech signal processing therefore reduces the sampling frequency and lowers the requirements on the sampling device.
According to CS theory, a sparse voice signal is compressed through an observation matrix and reconstructed using a reconstruction algorithm. However, existing reconstruction algorithms, such as matching pursuit, orthogonal matching pursuit, compressive sampling matching pursuit, basis pursuit and subspace pursuit, were not designed specifically for sparse speech signals; they neither consider nor exploit the element position indexes of the sparse speech signal, which limits the reconstruction accuracy.
Disclosure of Invention
In view of the defects of the prior art, the invention provides a voice signal compression storage and reconstruction method based on a superposition sequence. Compared with conventional compressed sensing voice compression, the invention uses part of the element position indexes of the sparse voice signal to assist reconstruction, improving the reconstruction accuracy of the voice signal without increasing the storage cost.
To achieve this purpose, the invention adopts the following technical scheme:
A method for compressing, storing and reconstructing a voice signal based on a superposition sequence comprises the following steps:
(a) Compression and storage processing of the voice signal:
(a1) reading a sparsified voice signal x with sparsity K and length N, constructing an "original index sequence" of length N from the elements 0 and 1 to record the position indexes of the non-zero and zero elements in the voice signal, and storing the sparsity K of the sparse voice signal;
(a2) reading a pre-stored M × N measurement matrix Φ, and compressing the voice signal with the measurement matrix to generate a "compressed signal sequence" y of length M, the compression process being expressed as y = Φx;
the measurement matrix is an existing measurement matrix such as a Gaussian random matrix, a Bernoulli random matrix or a partial Hadamard matrix;
M and N generally satisfy M ≤ N;
(a3) intercepting the "original index sequence" of length N to obtain an "index sequence" A of length βN, wherein the interception coefficient β is set according to engineering experience and satisfies 0 < β ≤ 1;
(a4) compressing and coding the "index sequence" A of length βN by Huffman coding to generate a "compressed index sequence" B of length L1, and performing data conversion on B to obtain a "conversion index sequence" C of length L2;
(a5) spreading the "conversion index sequence" C of length L2 and constructing a "spread-spectrum index sequence" of length M by zero padding;
(a6) assigning the weights α and 1 − α to the "spread-spectrum index sequence" of length M and the "compressed signal sequence" y respectively and adding them, that is, using the formula z = α × (spread-spectrum index sequence) + (1 − α) × y, to generate a "storage sequence" z of length M, and storing the "storage sequence" z;
the weight α is set according to engineering experience and satisfies 0 ≤ α ≤ 1.
(b) Reconstruction reproduction processing of a speech signal:
(b1) de-spreading the "storage sequence" z of length M to restore the "conversion index sequence" C of length L2;
(b2) spreading the "conversion index sequence" C of length L2 and constructing the "spread-spectrum index sequence" of length M by zero padding;
(b4) performing data restoration on the "conversion index sequence" C of length L2 to obtain the "compressed index sequence" B of length L1, and then decoding B by Huffman decoding to restore the "index sequence" A of length βN;
(b5) recording the column numbers of the non-zero elements of the "index sequence" A of length βN in a set to form a "fixed support set";
(b6) using the "fixed support set" as an aid, reconstructing the sparse speech signal x of length N from the "compressed signal sequence" y of length M with a reconstruction algorithm.
Further, the sparse speech signal described in step a1) is a discrete speech signal that is transformed from a time domain signal to a frequency domain signal by a time-frequency transform method, and the signal amplitude below the silence threshold is set to zero according to a "psychoacoustic model" to obtain a sparse speech signal x with a length N.
The "psychoacoustic model" may be, for example, the MPEG (Moving Picture Experts Group) psychoacoustic model or the Ogg Vorbis psychoacoustic model.
The time-frequency transformation method may be the discrete cosine transform, the short-time Fourier transform or the wavelet transform.
Further, the process in step a1) of constructing an "original index sequence" of length N from the elements 0 and 1 to record the position indexes of the non-zero and zero elements in the voice signal is: each zero element of the sparse speech signal x of length N is recorded as element 0 at the corresponding position of the "original index sequence", and each non-zero element is recorded as element 1; the "original index sequence" thus constructed is a sequence of length N whose elements are 0 or 1.
Further, the data conversion process in step a4) is as follows: the data of the "compressed index sequence" B of length L1 are divided into L2 groups of γ data each; if the number of data in sequence B is not evenly divisible by γ, zeros are appended so that it becomes divisible by γ; each group of data is then converted from a binary number to a decimal value, which completes the conversion and yields the "conversion index sequence" C of length L2.
Further, the specific steps in step a5) of constructing the "spread-spectrum index sequence" of length M from the "conversion index sequence" C of length L2 by spreading and zero padding are:
a5-1) let the "conversion index sequence" be C ∈ R^(L2×1) and let Q ∈ R^(q×1) be a spreading sequence, where q is the spreading gain and satisfies L2 × q ≤ M;
the spreading sequence Q may be an m-sequence, an M-sequence, a Gold sequence or a Zadoff-Chu sequence;
a5-2) calculating the Kronecker product S = C ⊗ Q = (C1·Q^T, C2·Q^T, …, C_L2·Q^T)^T, which is the spread version of sequence C, so that S has length L2 × q;
where the superscript "T" denotes the transpose operation;
a5-3) appending zeros to the end of the vector S to extend its length from L2 × q to M, thereby constructing the "spread-spectrum index sequence" of length M.
Further, the specific steps in step b2) of constructing the "spread-spectrum index sequence" of length M from the "conversion index sequence" C of length L2 by spreading and zero padding are the same as steps a5-1) to a5-3).
Further, the data restoration process in step b4) is as follows: each element of the "conversion index sequence" C of length L2 is converted into a binary number, and the zero elements that were padded at the tail of the resulting binary sequence are removed so that L1 elements remain; the sequence formed by the remaining elements is the "compressed index sequence" B.
Further, using the "fixed support set" as an aid in step b6) means that, during reconstruction with the reconstruction algorithm, the "fixed support set" is retained every time the support set is updated in an iteration.
The reconstruction algorithm may be, for example, the matching pursuit algorithm, the orthogonal matching pursuit algorithm or the regularized orthogonal matching pursuit algorithm.
Further, taking the orthogonal matching pursuit algorithm as the reconstruction algorithm, step b6) includes:
b6-1) reading the "compressed signal sequence" y ∈ R^(M×1), the measurement matrix Φ ∈ R^(M×N) and the sparsity K, with the following notation: t is the iteration number; r_t is the residual after t iterations; Ω_t is the index set (column indexes) after t iterations, i.e. the support set of iteration t; K_t is the number of elements in the index set Ω_t; x̂_t is a K_t × 1 vector; λ_t is the index (column index) found in the t-th iteration; a_j is the j-th column of the matrix Φ (j = 1, 2, …, N); the set of columns of Φ selected according to the "fixed support set" is also used; Φ_t is the set of columns of Φ selected by the index set Ω_t (a matrix of size M × K_t); the symbol ∪ denotes the union operation; |·| denotes the absolute value; <X, Y> denotes the inner product of the vectors X and Y; ||·|| denotes the 2-norm of a vector; and (·)^(−1) denotes matrix inversion;
b6-3) if K_t < K, finding the index λ_t of the column a_j of Φ whose inner product with the current residual has the largest absolute value; otherwise, computing the least-squares solution of y = Φ_t x̂_t, namely x̂_t = (Φ_t^T Φ_t)^(−1) Φ_t^T y, and performing step b6-8);
b6-7) setting t = t + 1 and returning to step b6-3);
b6-8) the reconstructed sparse speech signal x̂ has non-zero terms at the indexes in the support set Ω_t, whose values are the least-squares solution x̂_t obtained; the elements of x̂ outside the indexes in Ω_t are set to 0, which reconstructs the sparse speech signal x.
Compared with the prior art, the invention has the advantages that:
Partial position indexes of the sparse speech signal are stored without increasing the storage cost, and the reconstruction accuracy is effectively improved compared with conventional compressed sensing speech compression.
Drawings
FIG. 1 is a schematic flow chart of the method for storing and reconstructing speech signal samples based on a superposition sequence according to an embodiment of the present invention;
FIG. 2 is a schematic flow chart of the compression and storage processing of the speech signal in the method according to the embodiment of the present invention;
FIG. 3 is a schematic flow chart of the reconstruction and reproduction processing of the speech signal in the method according to the embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail below with reference to the accompanying drawings by way of examples.
A flow chart of the method for storing and reconstructing speech signal samples based on a superposition sequence is shown in FIG. 1.
The compression and storage processing of the voice signal in the method is described in detail below, as shown in FIG. 2.
(a1) Reading a sparsified voice signal x with sparsity K and length N, constructing an "original index sequence" of length N from the elements 0 and 1 to record the position indexes of the non-zero and zero elements in the voice signal, and storing the sparsity K of the sparse voice signal;
the sparse speech signal is a discrete speech signal that has been transformed from a time domain signal to a frequency domain signal by a time-frequency transformation method and in which every signal amplitude below the silence threshold is set to zero according to a "psychoacoustic model", giving a sparse speech signal x of length N.
The "psychoacoustic model" may be, for example, the MPEG (Moving Picture Experts Group) psychoacoustic model or the Ogg Vorbis psychoacoustic model.
The time-frequency transformation method may be, for example, the discrete cosine transform, the short-time Fourier transform or the wavelet transform.
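The thresholding part of this step can be sketched as follows. This is a minimal Python sketch, not the patent's implementation: it assumes the frequency-domain coefficients are already available, and the function names `sparsify` and `sparsity` and the single scalar `threshold` are hypothetical stand-ins for the per-frequency silence threshold that a psychoacoustic model would supply.

```python
def sparsify(coeffs, threshold):
    """Zero every coefficient whose magnitude is below the threshold.

    `threshold` is a hypothetical scalar stand-in for the silence
    threshold that a psychoacoustic model would provide per frequency.
    """
    return [c if abs(c) >= threshold else 0.0 for c in coeffs]


def sparsity(x):
    """Number of non-zero elements, i.e. the sparsity K."""
    return sum(1 for v in x if v != 0)


# Illustrative coefficients only; not taken from the patent.
x = sparsify([0.9, 0.01, -0.4, 0.002, 0.0, 0.3], threshold=0.05)
K = sparsity(x)
```
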
The process of constructing an "original index sequence" of length N from the elements 0 and 1 to record the position indexes of the non-zero and zero elements in the voice signal is: each zero element of the sparse speech signal x of length N is recorded as element 0 at the corresponding position of the "original index sequence", and each non-zero element is recorded as element 1; the "original index sequence" thus constructed is a sequence of length N whose elements are 0 or 1.
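Example 1: an example of this construction is as follows:
Assume a sparse speech signal x of length N = 18; its "original index sequence" is the length-18 column vector that holds 1 at every position where x is non-zero and 0 elsewhere.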
Where the superscript "T" denotes the transpose operation.
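The construction of the "original index sequence" can be sketched in a few lines of Python. The function name `original_index_sequence` and the sample values of x are hypothetical and chosen only for illustration.

```python
def original_index_sequence(x):
    """Record 1 for every non-zero element of x and 0 for every zero element."""
    return [1 if v != 0 else 0 for v in x]


# Hypothetical sparse signal; the values are illustrative only.
x = [0.5, 0.0, -0.2, 0.0, 0.0, 0.7]
index_seq = original_index_sequence(x)
```
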
(a2) Reading a pre-stored M × N measurement matrix Φ, and compressing the voice signal with the measurement matrix to generate a "compressed signal sequence" y of length M, the compression process being expressed as y = Φx;
the measurement matrix is an existing measurement matrix such as a Gaussian random matrix, a Bernoulli random matrix or a partial Hadamard matrix;
M and N generally satisfy M ≤ N;
(a3) intercepting the "original index sequence" of length N to obtain an "index sequence" A of length βN, wherein the interception coefficient β is set according to engineering experience and satisfies 0 < β ≤ 1;
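The compression y = Φx is a plain matrix-vector product. The sketch below assumes a Bernoulli random measurement matrix with entries ±1/√M; that scaling, the fixed seed and the names `bernoulli_matrix` and `compress` are assumptions made for illustration and are not specified in the text.

```python
import random


def bernoulli_matrix(m, n, seed=0):
    """M x N matrix with entries +-1/sqrt(M) (the scaling is an assumption)."""
    rng = random.Random(seed)
    scale = 1.0 / m ** 0.5
    return [[scale * rng.choice((-1.0, 1.0)) for _ in range(n)] for _ in range(m)]


def compress(phi, x):
    """Compute y = Phi x, one inner product per row of Phi."""
    return [sum(p * v for p, v in zip(row, x)) for row in phi]


phi = bernoulli_matrix(4, 8)
y = compress(phi, [0.0, 1.5, 0.0, 0.0, -2.0, 0.0, 0.0, 0.0])
```

Because the matrix is pre-stored in the described method, a fixed seed is used here so that the same Φ can be regenerated at reconstruction time.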
Example 2: an example of the interception is as follows:
On the basis of Example 1, assume β = 0.5, so that βN = 9; intercepting 9 elements of the "original index sequence" gives the "index sequence" A = (1,1,1,0,1,0,1,0,0)^T.
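A minimal sketch of the interception follows. It assumes that the intercepted part is the leading βN elements of the original index sequence, which the text does not state explicitly; the function name `intercept` and the 18-element sequence (whose last 9 entries are invented) are hypothetical.

```python
def intercept(index_seq, beta):
    """Keep the leading beta*N elements (taking a prefix is an assumption)."""
    if not 0 < beta <= 1:
        raise ValueError("beta must satisfy 0 < beta <= 1")
    n_keep = round(beta * len(index_seq))
    return index_seq[:n_keep]


# First 9 entries follow Example 2; the last 9 are invented for illustration.
original = [1, 1, 1, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1]
A = intercept(original, 0.5)
```
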
(a4) Compressing and coding the "index sequence" A of length βN by Huffman coding to generate a "compressed index sequence" B of length L1, and performing data conversion on B to obtain a "conversion index sequence" C of length L2;
the data conversion process is as follows: the data of the "compressed index sequence" B of length L1 are divided into L2 groups of γ data each; if the number of data in sequence B is not evenly divisible by γ, zeros are appended so that it becomes divisible by γ; each group of data is then converted from a binary number to a decimal value, which completes the conversion and yields the "conversion index sequence" C of length L2.
Example 3: an example of the conversion is as follows:
Assume the sequence B of length L1 = 62 is B = (1,0,0,1,0,1,0,1,0,1,0,1,1,1,0,1,0,0,0,1,0,1,1,0,0,0,1,0,1,0,1,0,1,1,1,0,0,1,0,1,0,1,1,1,0,1,0,1,0,0,1,0,1,1,1,0,0,0,1,1,0,0)^T. With γ = 16, B is divided into 4 groups, i.e. L2 = 4, after two 0 bits are appended at the end. The 4 groups of data are, in order, 1001010101011101, 0001011000101010, 1110010101110101 and 0010111000110000; converted from binary to decimal they are 38237, 5674, 58741 and 11824. The "conversion index sequence" is therefore C = (38237, 5674, 58741, 11824)^T, a 4 × 1 vector.
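The grouping and binary-to-decimal conversion of Example 3 can be checked with a short Python sketch. The function name `convert` is hypothetical; the bit string is the 62-bit sequence B of the example.

```python
def convert(bits, gamma):
    """Pad bits with trailing zeros to a multiple of gamma, then turn each
    group of gamma bits into its decimal value."""
    padded = list(bits) + [0] * (-len(bits) % gamma)
    return [
        int("".join(str(b) for b in padded[i:i + gamma]), 2)
        for i in range(0, len(padded), gamma)
    ]


# Sequence B from Example 3 (L1 = 62 bits).
B = [int(ch) for ch in (
    "1001010101011101"
    "0001011000101010"
    "1110010101110101"
    "00101110001100"
)]
C = convert(B, 16)
```

Here `-len(bits) % gamma` gives the number of padding zeros (2 for this B), so the last group becomes 0010111000110000.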
(a5) Spreading the "conversion index sequence" C of length L2 and constructing a "spread-spectrum index sequence" of length M by zero padding.
Example 4: an example of constructing the "spread-spectrum index sequence" from the "conversion index sequence" C of length L2 by spreading and zero padding is as follows:
a5-1) assume the "conversion index sequence" C = (3.8, 5.6, 5.8, 1.2)^T, L2 = 4, M = 25, and the spreading sequence Q ∈ R^(q×1) with Q = (1,1,1,1,1,1)^T, where q = 6 is the spreading gain and satisfies L2 × q = 24 ≤ M.
Q = (1,1,1,1,1,1)^T is used here for simplicity; the spreading sequence may also be an m-sequence, an M-sequence, a Gold sequence, a Zadoff-Chu sequence, or the like.
a5-2) calculating the Kronecker product S = C ⊗ Q = (3.8, …, 3.8, 5.6, …, 5.6, 5.8, …, 5.8, 1.2, …, 1.2)^T, in which each element of C is repeated 6 times, so that S has length 24.
a5-3) appending zeros to the end of the vector S to extend its length from L2 × q to M, i.e. from 24 to 25, thereby constructing the "spread-spectrum index sequence" (3.8, …, 3.8, 5.6, …, 5.6, 5.8, …, 5.8, 1.2, …, 1.2, 0)^T of length 25.
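The spreading and zero padding of Example 4 can be sketched as follows, using an all-ones spreading sequence with q = 6 as in the example. The function name `spread` is hypothetical.

```python
def spread(c, q_seq, m):
    """Kronecker product of c with the spreading sequence q_seq,
    followed by zero padding up to length m."""
    s = [ci * qj for ci in c for qj in q_seq]  # S = C (x) Q, length L2 * q
    if len(s) > m:
        raise ValueError("L2 * q must not exceed M")
    return s + [0.0] * (m - len(s))


C = [3.8, 5.6, 5.8, 1.2]
Q = [1, 1, 1, 1, 1, 1]
spread_index = spread(C, Q, 25)
```
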
(a6) Assigning the weights α and 1 − α to the "spread-spectrum index sequence" of length M and the "compressed signal sequence" y respectively and adding them, that is, using the formula z = α × (spread-spectrum index sequence) + (1 − α) × y, to generate a "storage sequence" z of length M, and storing the "storage sequence" z;
the weight α is set according to engineering experience and satisfies 0 ≤ α ≤ 1;
Example 5: an example of constructing the "storage sequence" z is as follows:
On the basis of Example 4, assume M = 25, the "spread-spectrum index sequence" constructed in Example 4, the "compressed signal sequence" y = (y1, y2, …, y24, y25)^T, α = 0.2 and 1 − α = 0.8. Then, according to the formula, z = 0.2 × (spread-spectrum index sequence) + 0.8 × y = (0.76 + 0.8y1, …, 0.76 + 0.8y6, 1.12 + 0.8y7, …, 1.12 + 0.8y12, 1.16 + 0.8y13, …, 1.16 + 0.8y18, 0.24 + 0.8y19, …, 0.24 + 0.8y24, 0.8y25)^T.
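The weighted superposition is an element-wise operation. The sketch below assumes the storage sequence is formed as α times the spread-spectrum index sequence plus (1 − α) times y, as the weights in step (a6) indicate; the function name `superpose` and the sample values are hypothetical.

```python
def superpose(spread_index, y, alpha):
    """Element-wise z = alpha * spread_index + (1 - alpha) * y."""
    if len(spread_index) != len(y):
        raise ValueError("both sequences must have length M")
    if not 0 <= alpha <= 1:
        raise ValueError("alpha must satisfy 0 <= alpha <= 1")
    return [alpha * s + (1 - alpha) * v for s, v in zip(spread_index, y)]


# Two-element illustration with alpha = 0.2 as in Example 5.
z = superpose([3.8, 0.0], [1.0, 2.0], 0.2)
```

Both inputs have length M, so z occupies exactly as many stored values as y alone, which is the sense in which the index information adds no storage cost.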
The reconstruction and reproduction processing of the speech signal in the method for compressed storage and reconstruction of a speech signal based on a superposition sequence is described in detail below, as shown in FIG. 3.
(b1) De-spreading the "storage sequence" z of length M to restore the "conversion index sequence" C of length L2.
Example 6: an example of the de-spreading is as follows:
On the basis of Example 4 and Example 5, assume the "storage sequence" z ∈ R^(M×1), M = 25, and the spreading sequence Q ∈ R^(q×1) with Q = (1,1,1,1,1,1)^T, where q = 6 is the spreading gain;
Q = (1,1,1,1,1,1)^T is used here for simplicity; the spreading sequence may also be an m-sequence, an M-sequence, a Gold sequence, a Zadoff-Chu sequence, or the like.
b1-1) from Examples 4 and 5 it is known that z = 0.2 × (spread-spectrum index sequence) + 0.8 × y;
b1-2) partitioning the sequence z into L2 blocks of length q and one pure speech signal sequence of size (M − L2 × q) × 1, i.e. into 4 sequences z1, z2, z3, z4 of length 6 and the pure speech signal sequence 0.8y25, where z_i = (z_i1, z_i2, …, z_i6)^T and z_ij = 0.2·C_i·Q_j + 0.8·y_ij;
b1-3) de-spreading z1, z2, z3, z4, assuming the de-spread data are h = (4.56, 6.72, 6.96, 1.44)^T, namely h_i = z_i1·Q1 + z_i2·Q2 + … + z_i6·Q6;
taking i = 1 as an example, h_1 = (0.2·C1·Q1 + 0.8·y11)·Q1 + … + (0.2·C1·Q6 + 0.8·y16)·Q6;
b1-4) the speech signal sequence y_i1, y_i2, …, y_i6 and Q1, Q2, …, Q6 are linearly uncorrelated, so:
0.8·y_i1·Q1 + 0.8·y_i2·Q2 + … + 0.8·y_i6·Q6 = 0;
namely: 0.8·y11·Q1 + 0.8·y12·Q2 + … + 0.8·y16·Q6 = 0;
b1-6) the spreading sequence Q = (Q1, Q2, …, Q6)^T is known, i.e. Q = (1,1,1,1,1,1)^T;
b1-7) de-spreading then restores the "conversion index sequence" C = (C1, C2, C3, C4)^T;
namely: 4.56 = 0.2·C1 + 0.2·C1 + … + 0.2·C1 = 6 × 0.2·C1, which gives C1 = 3.8;
similarly, C2, C3 and C4 are solved, so that de-spreading restores the "conversion index sequence" C = (3.8, 5.6, 5.8, 1.2)^T.
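The de-spreading of Example 6 can be sketched as follows. The sketch relies on the example's assumption that each block of the speech signal is uncorrelated with Q (the inner product of each y block with Q is zero); the demo data are constructed so that this holds exactly. The function name `despread` and the y values are hypothetical.

```python
def despread(z, q_seq, l2, alpha):
    """Recover C from z, assuming each block of y has zero inner product with Q.

    alpha must be non-zero, otherwise z carries no index information.
    """
    q = len(q_seq)
    energy = sum(v * v for v in q_seq)  # sum of Q_j squared
    c = []
    for i in range(l2):
        block = z[i * q:(i + 1) * q]
        h = sum(zv * qv for zv, qv in zip(block, q_seq))  # de-spread value h_i
        c.append(h / (alpha * energy))
    return c


# Demo: build z for C and Q of Example 4 with a y whose blocks sum to zero.
C = [3.8, 5.6, 5.8, 1.2]
Q = [1, 1, 1, 1, 1, 1]
alpha = 0.2
y_block = [1.0, -1.0, 2.0, -2.0, 0.5, -0.5]  # inner product with Q is 0
y = y_block * 4 + [0.3]
spread_index = [ci * qj for ci in C for qj in Q] + [0.0]
z = [alpha * s + (1 - alpha) * v for s, v in zip(spread_index, y)]
C_restored = despread(z, Q, 4, alpha)
```

With real speech data the inner product of y with Q is only approximately zero, so the restored values would carry a small error; how that error is handled is not described in the text.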
(b2) Spreading the "conversion index sequence" C of length L2 and constructing the "spread-spectrum index sequence" of length M by zero padding.
The construction of the "spread-spectrum index sequence" is the same as in Example 4.
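The text as given here does not spell out how the "compressed signal sequence" y is separated from z once the "spread-spectrum index sequence" has been rebuilt. One plausible reading, stated purely as an assumption, is to invert the weighting of step (a6): y = (z − α × spread-spectrum index sequence) / (1 − α). The function name `recover_compressed` is hypothetical.

```python
def recover_compressed(z, spread_index, alpha):
    """Assumed inverse of the (a6) weighting: y = (z - alpha * s) / (1 - alpha).

    This step is inferred, not quoted from the text; it requires alpha < 1.
    """
    if not 0 <= alpha < 1:
        raise ValueError("alpha must satisfy 0 <= alpha < 1")
    return [(zv - alpha * s) / (1 - alpha) for zv, s in zip(z, spread_index)]


# Two-element illustration: z was formed from y = (1.0, 2.0) with alpha = 0.2.
y = recover_compressed([1.56, 1.6], [3.8, 0.0], 0.2)
```
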
(b4) Performing data restoration on the "conversion index sequence" C of length L2 to obtain the "compressed index sequence" B of length L1, and then decoding B by Huffman decoding to restore the "index sequence" A of length βN;
the data restoration process is as follows: each element of the "conversion index sequence" C of length L2 is converted into a binary number, and the zero elements that were padded at the tail of the resulting binary sequence are removed so that L1 elements remain; the sequence formed by the remaining elements is the "compressed index sequence" B.
Example 7: an example of the data restoration is as follows:
On the basis of Example 3, assume the "conversion index sequence" C = (38237, 5674, 58741, 11824)^T and that sequence B has length L1 = 62. Converting each element into a 16-bit binary number gives the sequence 1001010101011101000101100010101011100101011101010010111000110000; removing the last two 0 bits from the end restores the sequence B = (1,0,0,1,0,1,0,1,0,1,0,1,1,1,0,1,0,0,0,1,0,1,1,0,0,0,1,0,1,0,1,0,1,1,1,0,0,1,0,1,0,1,1,1,0,1,0,1,0,0,1,0,1,1,1,0,0,0,1,1,0,0)^T.
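The data restoration of Example 7 is the inverse of the conversion in Example 3 and can be sketched as follows. The function name `restore` is hypothetical, and rounding each element to the nearest integer before conversion is an added assumption to tolerate small de-spreading errors.

```python
def restore(c, gamma, l1):
    """Convert each element of c to a gamma-bit binary group and keep the
    first l1 bits, which drops the zero padding at the tail."""
    bits = []
    for value in c:
        # Rounding is an assumption, not part of the described method.
        bits.extend(int(ch) for ch in format(int(round(value)), f"0{gamma}b"))
    return bits[:l1]


B = restore([38237, 5674, 58741, 11824], 16, 62)
```
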
(b5) Recording the column numbers of the non-zero elements of the "index sequence" A of length βN in a set to form a "fixed support set".
Let the "index sequence" A = (1,1,1,0,1,0,1,0,0)^T; recording the sequence numbers of the non-zero elements of A in a set forms the "fixed support set" {1, 2, 3, 5, 7}.
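Building the "fixed support set" amounts to collecting the 1-based positions of the non-zero entries of A. The function name `fixed_support` is hypothetical.

```python
def fixed_support(index_seq):
    """1-based positions of the non-zero entries of the index sequence."""
    return [i for i, v in enumerate(index_seq, start=1) if v != 0]


support = fixed_support([1, 1, 1, 0, 1, 0, 1, 0, 0])
```
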
(b6) Using the "fixed support set" as an aid, reconstructing the sparse speech signal x of length N from the "compressed signal sequence" y of length M with a reconstruction algorithm.
Using the "fixed support set" as an aid means that, during reconstruction with the reconstruction algorithm, the "fixed support set" is retained every time the support set is updated in an iteration.
The reconstruction algorithm may be, for example, the matching pursuit algorithm, the orthogonal matching pursuit algorithm, the regularized orthogonal matching pursuit algorithm, and so on.
Taking the orthogonal matching pursuit algorithm as the reconstruction algorithm, step b6) includes:
b6-1) reading the "compressed signal sequence" y ∈ R^(M×1), the measurement matrix Φ ∈ R^(M×N) and the sparsity K, with the following notation: t is the iteration number; r_t is the residual after t iterations; Ω_t is the index set (column indexes) after t iterations, i.e. the support set of iteration t; K_t is the number of elements in the index set Ω_t; x̂_t is a K_t × 1 vector; λ_t is the index (column index) found in the t-th iteration; a_j is the j-th column of the matrix Φ (j = 1, 2, …, N); the set of columns of Φ selected according to the "fixed support set" is also used; Φ_t is the set of columns of Φ selected by the index set Ω_t (a matrix of size M × K_t); the symbol ∪ denotes the union operation; |·| denotes the absolute value; <X, Y> denotes the inner product of the vectors X and Y; ||·|| denotes the 2-norm of a vector; and (·)^(−1) denotes matrix inversion;
b6-3) if K_t < K, finding the index λ_t of the column a_j of Φ whose inner product with the current residual has the largest absolute value; otherwise, computing the least-squares solution of y = Φ_t x̂_t, namely x̂_t = (Φ_t^T Φ_t)^(−1) Φ_t^T y, and performing step b6-8);
Example 9: an example of step b6-4) is as follows:
On the basis of Example 8, assume t = 1 and λ_t = 17; then Ω_t = Ω_(t−1) ∪ {λ_t} = {1, 4, 7, 10, 14, 17}.
b6-7) setting t = t + 1 and returning to step b6-3);
b6-8) the reconstructed sparse speech signal x̂ has non-zero terms at the indexes in the support set Ω_t, whose values are the least-squares solution x̂_t obtained; the elements of x̂ outside the indexes in Ω_t are set to 0, which reconstructs the sparse speech signal x.
Example 10: an example of reconstructing the sparse speech signal x is as follows:
Assume the reconstructed sparse speech signal x̂ has length N = 25 and has non-zero terms at the indexes in the support set Ω_t, whose values are the least-squares solution, with Ω_t = {1, 4, 5, 7, 8, 10, 14, 17, 19, 23}. Setting the elements of x̂ outside the indexes in Ω_t to 0
reconstructs the sparse speech signal of length N = 25:
x = (x1, 0, 0, x4, x5, 0, x7, x8, 0, x10, 0, 0, 0, x14, 0, 0, x17, 0, x19, 0, 0, 0, x23, 0, 0)^T
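The idea of keeping a fixed support set inside orthogonal matching pursuit can be sketched in pure Python. This is an illustrative sketch under stated assumptions rather than the patent's exact procedure: the support is initialised with the fixed set, the least-squares step solves the normal equations by Gaussian elimination and assumes the selected columns are linearly independent, and indexes are 0-based (the text uses 1-based column numbers). The names `solve_least_squares` and `omp_fixed_support` are hypothetical.

```python
def solve_least_squares(cols, y):
    """Solve (Phi_t^T Phi_t) coef = Phi_t^T y by Gaussian elimination."""
    k = len(cols)
    aug = [
        [sum(a * b for a, b in zip(cols[i], cols[j])) for j in range(k)]
        + [sum(a * b for a, b in zip(cols[i], y))]
        for i in range(k)
    ]
    for p in range(k):
        pivot = max(range(p, k), key=lambda r: abs(aug[r][p]))
        aug[p], aug[pivot] = aug[pivot], aug[p]
        for r in range(p + 1, k):
            f = aug[r][p] / aug[p][p]
            for c in range(p, k + 1):
                aug[r][c] -= f * aug[p][c]
    coef = [0.0] * k
    for p in reversed(range(k)):
        tail = sum(aug[p][c] * coef[c] for c in range(p + 1, k))
        coef[p] = (aug[p][k] - tail) / aug[p][p]
    return coef


def omp_fixed_support(phi, y, sparsity, fixed):
    """OMP that keeps the 0-based column indexes in `fixed` in every support set.

    Assumes sparsity <= number of columns of phi.
    """
    m, n = len(phi), len(phi[0])
    columns = [[phi[r][j] for r in range(m)] for j in range(n)]
    support = sorted(set(fixed))
    while True:
        cols = [columns[j] for j in support]
        coef = solve_least_squares(cols, y) if support else []
        if len(support) >= sparsity:
            break
        residual = [
            y[r] - sum(cf * col[r] for cf, col in zip(coef, cols)) for r in range(m)
        ]
        candidates = [j for j in range(n) if j not in support]
        best = max(
            candidates,
            key=lambda j: abs(sum(a * b for a, b in zip(residual, columns[j]))),
        )
        support = sorted(support + [best])
    x_hat = [0.0] * n
    for j, cf in zip(support, coef):
        x_hat[j] = cf  # non-zero terms only at the support indexes
    return x_hat


# Small illustration: x = (1, 0, 3, 0), column 2 is known from the fixed support.
phi = [[1, 0, 0, 0.5], [0, 1, 0, 0.5], [0, 0, 1, 0.0]]
x_hat = omp_fixed_support(phi, [1, 0, 3], sparsity=2, fixed=[2])
```

Starting from the fixed support means fewer columns have to be found by correlation, which is where the stored index information helps the reconstruction.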
It will be appreciated by those of ordinary skill in the art that the examples described herein are intended to assist the reader in understanding the manner in which the invention is practiced, and it is to be understood that the scope of the invention is not limited to such specifically recited statements and examples. Those skilled in the art can make various other specific changes and combinations based on the teachings of the present invention without departing from the spirit of the invention, and these changes and combinations are within the scope of the invention.
Claims (7)
1. A method for compressing, storing and reconstructing a voice signal based on a superposition sequence is characterized by comprising the following steps: (a) and (3) compression and storage processing of the voice signal:
(a1) reading a voice signal x with the sparsity of K and the length of N after sparsification, and constructing an original index sequence with the length of N by using 0 and 1 elements "Recording non-zero elements and zero element position indexes in the voice signals, and simultaneously storing the sparsity K of the sparse voice signals;
(a2) reading a pre-stored M multiplied by N measurement matrix phi, and compressing a voice signal by using the measurement matrix to generate a 'compressed signal sequence' y with the length of M, wherein the compression process is represented as y being phi x;
the M, N generally satisfies M ≦ N;
(a3) for "original index sequence" of length N "Intercepting to obtain an index sequence A with the length of β N, wherein the interception coefficient β is set according to engineering experience and meets the condition that 0 is more than β and is less than or equal to 1;
(a4) according to the Huffman coding, the index sequence A with the length of β N is compressed and coded to generate the length L1The 'compressed index sequence' B is subjected to data conversion to obtain a length L2The "inverted index sequence" C of (1);
(a5) for the length L2The "conversion index sequence" C of (1) is used for spreading processing, and a "zero padding" mode is used for constructing a "spreading index sequence" with the length of M "
The utilization length of step a5) is L2C constructs a "spread index sequence" of length M by spreading and zero-padding "The method comprises the following specific steps:
a5-1) "inverted index sequence"Suppose Q ∈ Rq×1Is a spreading sequence, where q is the spreading gain, satisfies
a5-2) calculating the Kronecker product,
spread spectrum spreading of sequence C, i.e. S of length (L)2×q);
Wherein, the superscript "T" represents the transposition operation;
a5-3) adds zeros at the end of the vector S, starting from (L)2× q) is added to M, thereby constructing a "spreading index sequence"
(a6) for "spread spectrum index sequence" of length M "And a "compressed signal sequence" y are assigned respectivelyThe weighted values α and 1- α are added, and the formula is usedGenerating a 'storage sequence' z with the length of M, and storing the 'storage sequence' z;
the weight alpha is set according to engineering experience and meets the condition that alpha is more than or equal to 0 and less than or equal to 1;
(b) reconstruction and reproduction processing of the speech signal:
(b1) de-spreading the "storage sequence" z of length M to recover the "converted index sequence" C of length L2;
(b2) spreading the "converted index sequence" C of length L2 and zero-padding the result to construct the "spread index sequence" of length M;
(b4) applying data reduction to the "converted index sequence" C of length L2 to obtain the "compressed index sequence" B of length L1, and then recovering the index sequence A of length βN by Huffman decoding;
(b5) recording the column indices of the non-zero elements of the index sequence A of length βN in a set to form a "fixed support set";
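Steps (b1) and (b5) can be sketched as follows. The text does not specify how de-spreading is carried out, so the correlation receiver below is an assumption: each block of q samples of z is correlated with Q, scaled by α and the energy of Q, and rounded to the nearest integer, which presumes that the elements of C are integers and that the residual contribution of y is small. All names and values are hypothetical, and positions are 0-based.

```python
def despread(z, q_seq, l2, alpha):
    """Step (b1), assumed correlation receiver: estimate C from z."""
    if alpha <= 0:
        raise ValueError("alpha must be positive to recover C")
    q = len(q_seq)
    energy = sum(v * v for v in q_seq)       # Q^T Q
    c_hat = []
    for i in range(l2):
        block = z[i * q:(i + 1) * q]         # samples carrying the i-th element of C
        corr = sum(b * v for b, v in zip(block, q_seq))
        c_hat.append(round(corr / (alpha * energy)))
    return c_hat


def fixed_support(index_seq):
    """Step (b5): positions (0-based) of the non-zero entries of A."""
    return {i for i, bit in enumerate(index_seq) if bit != 0}


# Illustrative storage sequence (assumed): C = [3, 1], Q = [1, -1, 1, -1],
# alpha = 0.5, M = 9, superposed with a small compressed signal y.
z = [1.6, -1.45, 1.45, -1.5, 0.55, -0.45, 0.5, -0.6, 0.15]
c_hat = despread(z, [1, -1, 1, -1], 2, 0.5)
support = fixed_support([0, 1, 0, 0, 1, 0])
print(c_hat, support)
```

The rounding step is what absorbs the interference from y in this sketch; a larger spreading gain q or a larger α makes that interference smaller relative to C.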
2. The method of claim 1, wherein: the sparse speech signal described in step a1) is obtained by transforming a discrete speech signal from the time domain to the frequency domain with a time-frequency transformation method and setting the signal amplitudes below the mute threshold to zero according to a "psychoacoustic model", which yields a sparse speech signal x of length N.
3. The method of claim 1, wherein: the process, described in step a1), of constructing an "original index sequence" of length N with elements 0 and 1 to record the position indices of the non-zero and zero elements of the speech signal is: each zero element of the sparse speech signal x of length N is recorded as element 0 at the corresponding position of the "original index sequence", and each non-zero element is recorded as element 1 at the corresponding position, so that the "original index sequence" thus constructed is a sequence of length N whose elements are 0 or 1.
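The thresholding of claim 2 and the index sequence of claim 3 can be sketched together. This is a minimal sketch, assuming the frequency-domain coefficients and a per-coefficient mute threshold are already available; the psychoacoustic model itself is not implemented, and the function names and numbers are hypothetical.

```python
def sparsify(spectrum, thresholds):
    """Zero every coefficient whose magnitude is below its mute threshold."""
    return [0.0 if abs(v) < t else v for v, t in zip(spectrum, thresholds)]


def original_index_sequence(x):
    """Record 0 for each zero element of x and 1 for each non-zero element."""
    return [0 if v == 0 else 1 for v in x]


# Illustrative values only (assumed): a flat threshold of 0.1 for N = 6 coefficients.
spectrum = [0.05, 2.0, -0.01, 0.3, -1.5, 0.02]
thresholds = [0.1] * len(spectrum)
x = sparsify(spectrum, thresholds)
index_seq = original_index_sequence(x)
print(x, index_seq)
```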
4. The method of claim 1, wherein: the data conversion process of step a4) is: the data of the "compressed index sequence" B of length L1 are divided into L2 groups of γ data each; if the number of data in the sequence B is not divisible by γ, zeros are appended to construct a sequence whose length is divisible by γ; each group of data is converted from a binary number to a decimal real value, which completes the conversion and yields the "converted index sequence" C of length L2.
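The grouping and binary-to-decimal conversion described in this claim can be sketched as follows. The most-significant-bit-first ordering inside each group, the placement of the padding zeros at the tail, and the name `convert_index` are assumptions made for illustration.

```python
def convert_index(bits, gamma):
    """Split B into groups of gamma bits (zero-padded at the tail) and
    convert each group from binary to a decimal value."""
    padded = bits + [0] * (-len(bits) % gamma)   # make the length divisible by gamma
    groups = [padded[i:i + gamma] for i in range(0, len(padded), gamma)]
    # Assumed bit order: the first bit of a group is the most significant one.
    return [int("".join(str(b) for b in group), 2) for group in groups]


# Illustrative values only (assumed): L1 = 7, gamma = 3, hence L2 = 3.
b_seq = [1, 0, 1, 1, 0, 1, 1]
c_seq = convert_index(b_seq, 3)
print(c_seq)
```

Here two zeros are appended so that the 7 bits fill three groups: 101, 101 and 100.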
6. The method of claim 1, wherein: the data reduction process of step b4) is: the real-valued elements of the "converted index sequence" C of length L2 are converted to binary numbers, and the zero-valued elements are removed from the tail of the resulting binary sequence so that L1 elements remain; the sequence formed by the remaining elements is the "compressed index sequence" B.
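The data reduction of this claim can be sketched as the inverse of the grouping used in data conversion. The sketch assumes that L1 and γ are known at reconstruction time and that each group was written most-significant bit first; the name `restore_index` and the values are hypothetical.

```python
def restore_index(values, gamma, l1):
    """Expand each decimal value to gamma bits and drop the padding zeros
    at the tail so that exactly L1 bits remain."""
    bits = []
    for v in values:
        if not 0 <= v < 2 ** gamma:
            raise ValueError("value does not fit in gamma bits")
        bits.extend(int(ch) for ch in format(v, "0{}b".format(gamma)))
    if any(bits[l1:]):
        raise ValueError("non-zero bit found in the padding region")
    return bits[:l1]


# Illustrative values only (assumed): C = [5, 5, 4], gamma = 3, L1 = 7.
b_restored = restore_index([5, 5, 4], 3, 7)
print(b_restored)
```

Passing L1 explicitly avoids stripping zeros that genuinely belong to the end of B, which a plain "remove all trailing zeros" rule could not distinguish from padding.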
7. The method of claim 1, wherein: the "fixed support set" assistance described in step b6) means that, during reconstruction with a reconstruction algorithm, the "fixed support set" is retained every time the support set is updated in an iteration, thereby assisting the reconstruction.
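One way to read the retention rule of this claim is sketched below. The claim does not name a reconstruction algorithm, so the magnitude-based selection used here (in the style of greedy or hard-thresholding methods) is an assumption, as are the name `update_support`, the sparsity level `k` and the numbers.

```python
def update_support(proxy, k, fixed):
    """Return a support of size k that always contains the fixed support set.

    proxy : per-position scores computed in the current iteration
    k     : target support size (assumed parameter)
    fixed : indices that must be kept in every iteration
    """
    free = max(k - len(fixed), 0)            # slots left after keeping `fixed`
    candidates = [i for i in range(len(proxy)) if i not in fixed]
    candidates.sort(key=lambda i: abs(proxy[i]), reverse=True)
    return set(fixed) | set(candidates[:free])


# Illustrative values only (assumed): positions 1 and 4 form the fixed support set.
proxy = [0.9, 0.1, 0.05, 0.7, 0.2, 0.6]
support = update_support(proxy, 3, {1, 4})
print(support)
```

Positions 1 and 4 stay in the support even though their scores are small, and the single remaining slot goes to the position with the largest score.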
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810497026.XA CN108962265B (en) | 2018-05-22 | 2018-05-22 | Voice signal compression storage and reconstruction method based on superposition sequence |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108962265A (en) | 2018-12-07 |
CN108962265B (en) | 2020-08-25 |
Family
ID=64499535
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810497026.XA Active CN108962265B (en) | 2018-05-22 | 2018-05-22 | Voice signal compression storage and reconstruction method based on superposition sequence |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108962265B (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109818645B (en) * | 2019-02-20 | 2020-12-29 | 西华大学 | Superposition CSI feedback method based on signal detection and support set assistance |
CN109817229B (en) * | 2019-03-14 | 2020-09-22 | 西华大学 | Single-bit audio compression transmission and reconstruction method assisted by superposition characteristic information |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2014505415A (en) * | 2011-01-10 | 2014-02-27 | アルカテル−ルーセント | Method and apparatus for measuring and recovering a sparse signal |
CN105099462A (en) * | 2014-05-22 | 2015-11-25 | 北京邮电大学 | Signal processing method based on compressive sensing |
CN105206277A (en) * | 2015-08-17 | 2015-12-30 | 西华大学 | Voice compression method base on monobit compression perception |
CN105933008A (en) * | 2016-04-15 | 2016-09-07 | 哈尔滨工业大学 | Multiband signal reconstruction method based on clustering sparse regularization orthogonal matching tracking algorithm |
Non-Patent Citations (2)
Title |
---|
《1-Bit压缩感知盲重构算法》;张京超等;《电子与信息学报》;20150331;第37卷(第3期);第567-573页 * |
《One-bit Compressive Sensing with partial support》;Phillip North et al.;《2015 IEEE 6th International Workshop on Computational Advances in Multi-Sensor Adaptive Processing (CAMSAP)》;20160121;第349-352页 * |
Also Published As
Publication number | Publication date |
---|---|
CN108962265A (en) | 2018-12-07 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
EE01 | Entry into force of recordation of patent licensing contract | Application publication date: 20181207; Assignee: Suining Feidian Cultural Communication Co.,Ltd.; Assignor: XIHUA University; Contract record no.: X2023510000027; Denomination of invention: A method for compressing, storing, and reconstructing speech signals based on stacked sequences; Granted publication date: 20200825; License type: Common License; Record date: 20231129 |