CN108962265B - Voice signal compression storage and reconstruction method based on superposition sequence - Google Patents

Publication number
CN108962265B
Authority
CN
China
Prior art keywords
sequence
length
index sequence
zero
signal
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810497026.XA
Other languages
Chinese (zh)
Other versions
CN108962265A (en
Inventor
卿朝进
万东琴
阳庆瑶
王维
郭奕
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xihua University
Original Assignee
Xihua University
Application filed by Xihua University
Priority to CN201810497026.XA
Publication of CN108962265A
Application granted
Publication of CN108962265B
Legal status: Active

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00: Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/02: Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders
    • G10L19/0212: Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders using orthogonal transformation

Abstract

The invention discloses a method for compressing, storing and reconstructing a voice signal based on a superposition sequence. The method comprises the following steps: reading a sparse voice signal, constructing an original index sequence from the position indexes of its non-zero and zero elements, and storing the sparsity of the sparse voice signal; compressing the sparse voice signal to generate a compressed signal sequence; intercepting part of the original index sequence as an index sequence, and generating a spread spectrum index sequence from it through coding, conversion and spreading; weighting the spread spectrum index sequence and the compressed signal sequence and superposing them to generate a storage sequence, which is stored; de-spreading the stored sequence to obtain the conversion index sequence and the compressed signal sequence; restoring and decoding the conversion index sequence to obtain the index sequence; and constructing a support set from the index sequence and reconstructing the sparse voice signal. The advantage of the invention is that the reconstruction accuracy of the voice signal is effectively improved without increasing storage resources.

Description

Voice signal compression storage and reconstruction method based on superposition sequence
Technical Field
The invention relates to the technical field of compression storage and reconstruction of voice signals, in particular to a voice signal compression storage and reconstruction method based on a superposition sequence.
Background
As information interaction becomes increasingly frequent, the voice signal is one of the most common signals involved, and the technical requirements for processing it are steadily becoming more refined. Owing to the diversity of the speech signal itself and the particular characteristics of the human auditory system, the speech signal is sparse in various transform domains. Conventional speech signal sampling typically must satisfy the Nyquist sampling rate. Compressed sensing (CS) theory indicates that signals which are sparse or compressible can be sampled in compressed form and then reconstructed by compressed sensing techniques. Combining CS theory with speech signal processing therefore reduces the sampling frequency and relaxes the requirements on the sampling device.
According to CS theory, the sparse voice signal is compressed through an observation matrix and reconstructed by a reconstruction algorithm. However, existing reconstruction algorithms such as the matching pursuit algorithm, the orthogonal matching pursuit algorithm, the compressive sampling matching pursuit algorithm, the basis pursuit algorithm and the subspace pursuit algorithm were not designed specifically for reconstructing sparse speech signals; they neither consider nor exploit the position indexes of the elements of the sparse speech signal, so the reconstruction accuracy they achieve for sparse speech signals is limited.
Disclosure of Invention
Aiming at the defects of the prior art, the invention provides a voice signal compression storage and reconstruction method based on a superposition sequence. Compared with traditional compressed-sensing voice compression, the method uses part of the position indexes of the elements of the sparse voice signal to assist reconstruction, which improves the reconstruction accuracy of the voice signal without increasing the storage cost.
In order to realize the purpose, the technical scheme adopted by the invention is as follows:
A method for compressing, storing and reconstructing a voice signal based on a superposition sequence comprises the following steps: (a) compression and storage of the voice signal:
(a1) reading a voice signal x with the sparsity of K and the length of N after sparsification, and constructing an original index sequence with the length of N by using 0 and 1 elements "
Figure RE-GDA0001695870420000021
Recording non-zero elements and zero element position indexes in the voice signals, and simultaneously storing the sparsity K of the sparse voice signals;
(a2) reading a pre-stored M × N measurement matrix Φ, and compressing the voice signal with the measurement matrix to generate a "compressed signal sequence" y of length M, where the compression process is expressed as y = Φx;
the measurement matrix is an existing measurement matrix such as a Gaussian random matrix, a Bernoulli random matrix or a partial Hadamard matrix;
M and N generally satisfy M ≤ N;
(a3) for "original index sequence" of length N "
Figure RE-GDA0001695870420000022
intercepting to obtain an "index sequence" A of length βN, where the interception coefficient β is set according to engineering experience and satisfies 0 < β ≤ 1;
(a4) compressing and encoding the "index sequence" A of length βN by Huffman coding to generate a "compressed index sequence" B of length L_1, and applying data conversion to B to obtain a "conversion index sequence" C of length L_2;
(a5) spreading the "conversion index sequence" C of length L_2 and constructing, by zero padding, a "spread spectrum index sequence" of length M
Figure RE-GDA0001695870420000023
(a6) assigning the weights α and 1 − α to the "spread spectrum index sequence" of length M
Figure RE-GDA0001695870420000031
and to the "compressed signal sequence" y respectively, adding the two weighted sequences, and using the formula
Figure RE-GDA0001695870420000032
to generate a "storage sequence" z of length M, which is then stored;
the weight α is set according to engineering experience and satisfies 0 ≤ α ≤ 1.
(b) Reconstruction and reproduction of the voice signal:
(b1) de-spreading the "storage sequence" z of length M to restore the "conversion index sequence" C of length L_2;
(b2) spreading the "conversion index sequence" C of length L_2 and constructing, by zero padding, a "spread spectrum index sequence" of length M
Figure RE-GDA0001695870420000033
(b3) using the formula
Figure RE-GDA0001695870420000034
to solve for the "compressed signal sequence" y of length M;
(b4) applying data restoration to the "conversion index sequence" C of length L_2 to obtain the "compressed index sequence" B of length L_1, and then restoring the "index sequence" A of length βN by Huffman decoding;
(b5) recording the column sequence numbers of the non-zero elements of the "index sequence" A of length βN in a set to form a "fixed support set"
Figure RE-GDA0001695870420000035
(b6) using the "fixed support set"
Figure RE-GDA0001695870420000036
to assist a reconstruction algorithm in reconstructing the sparse speech signal x of length N from the "compressed signal sequence" y of length M.
Further, the sparse speech signal in step a1) is a discrete speech signal that has been transformed from the time domain to the frequency domain by a time-frequency transformation method, with signal amplitudes below the silence threshold set to zero according to a "psychoacoustic model", yielding a sparse speech signal x of length N.
Examples of the "psychoacoustic model" include the MPEG (Moving Picture Experts Group) psychoacoustic model and the OGG (Ogg Vorbis) psychoacoustic model.
The time-frequency transformation method may be the discrete cosine transform, the short-time Fourier transform or the wavelet transform.
Further, the process in step a1) of constructing the "original index sequence" of length N from the elements 0 and 1
Figure RE-GDA0001695870420000041
and recording the position indexes of the non-zero and zero elements of the speech signal is as follows: each zero element of the sparse speech signal x of length N is recorded in the "original index sequence"
Figure RE-GDA0001695870420000042
as element 0 at the corresponding position, and each non-zero element is recorded in the "original index sequence"
Figure RE-GDA0001695870420000043
as element 1 at the corresponding position. The "original index sequence" constructed in this way,
Figure RE-GDA0001695870420000044
is a sequence of length N whose elements are 0 or 1.
Further, the data conversion process in step a4) is as follows: the data of the "compressed index sequence" B of length L_1 is divided into L_2 groups of γ bits each; if the number of elements in B is not divisible by γ, zeros are appended to construct a sequence whose length is divisible by γ; each group is then converted from a binary number to a decimal real value, which yields the "conversion index sequence" C of length L_2.
Further, in step a5), the specific steps for constructing the "spread spectrum index sequence" of length M
Figure RE-GDA0001695870420000045
from the "conversion index sequence" C of length L_2 by spreading and zero padding are as follows:
a5-1) Given the "conversion index sequence"
Figure RE-GDA0001695870420000046
let Q ∈ R^(q×1) be a spreading sequence, where q is the spreading gain, which satisfies
Figure RE-GDA0001695870420000047
The spreading sequence Q may be an m-sequence, an M-sequence, a Gold sequence or a Zadoff-Chu sequence.
Here the symbol
Figure RE-GDA0001695870420000048
denotes rounding down to the nearest integer.
a5-2) Compute the Kronecker product
Figure RE-GDA0001695870420000051
which spreads the sequence C, so that S has length (L_2 × q);
the superscript "T" denotes the transpose operation.
a5-3) Append zeros to the end of the vector S, extending its length from (L_2 × q) to M, thereby constructing the "spread spectrum index sequence"
Figure RE-GDA0001695870420000052
Figure RE-GDA0001695870420000053
whose length is M.
Further, in step b2), the specific steps for constructing the "spread spectrum index sequence" of length M
Figure RE-GDA0001695870420000054
from the "conversion index sequence" C of length L_2 by spreading and zero padding are the same as steps a5-1) to a5-3).
Further, the data restoration process in step b4) is as follows: each real-valued element of the "conversion index sequence" C of length L_2 is converted into a binary number, and the zero-valued padding elements are removed from the tail of the resulting binary sequence so that L_1 elements remain; the sequence formed by the remaining elements is the "compressed index sequence" B.
Further, using the "fixed support set"
Figure RE-GDA0001695870420000055
to assist reconstruction in step b6) means that, during reconstruction with the reconstruction algorithm, the "fixed support set"
Figure RE-GDA0001695870420000056
is retained every time the support set is updated in an iteration.
Examples of the reconstruction algorithm include the matching pursuit algorithm, the orthogonal matching pursuit algorithm and the regularized orthogonal matching pursuit algorithm.
Further, taking the orthogonal matching pursuit algorithm as the reconstruction algorithm, step b6) includes:
b6-1) Read the "compressed signal sequence" y ∈ R^(M×1), the measurement matrix Φ ∈ R^(M×N) and the sparsity K. In what follows, t denotes the iteration index, r_t the residual after t iterations, Ω_t the index set (column indices) after t iterations, i.e. the support set after t iterations, K_t the number of elements in the index set Ω_t,
Figure RE-GDA0001695870420000061
a K_t × 1 vector, λ_t the index (column index) found in the t-th iteration, a_j the j-th column of the matrix Φ (j = 1, 2, …, N),
Figure RE-GDA0001695870420000062
the set of columns of the matrix Φ selected according to the "fixed support set"
Figure RE-GDA0001695870420000063
and Φ_t the set of columns of the matrix Φ selected by the index set Ω_t (a matrix of size M × K_t); the symbol ∪ denotes set union, |·| denotes the absolute value, <X, Y> denotes the inner product of the vectors X and Y, ||·|| denotes the vector 2-norm, and (·)^(-1) denotes matrix inversion;
b6-2) Initialization:
Figure RE-GDA0001695870420000064
b6-3) If K_t < K, solve
Figure RE-GDA0001695870420000065
to find the index λ_t; otherwise, solve
Figure RE-GDA0001695870420000066
for its least-squares solution:
Figure RE-GDA0001695870420000067
and go to step b6-8);
b6-4) Let Ω_t = Ω_(t-1) ∪ {λ_t},
Figure RE-GDA0001695870420000068
b6-5) Solve
Figure RE-GDA0001695870420000069
for its least-squares solution:
Figure RE-GDA00016958704200000610
b6-6) Update the residual
Figure RE-GDA00016958704200000611
b6-7) Set t = t + 1 and return to step b6-3);
b6-8) The reconstructed sparse speech signal
Figure RE-GDA00016958704200000612
has non-zero entries at the indices in the support set Ω_t, whose values are the least-squares solution obtained above
Figure RE-GDA00016958704200000613
Setting the elements of
Figure RE-GDA00016958704200000614
outside the indices in the support set Ω_t to 0 reconstructs the sparse speech signal x.
Compared with the prior art, the invention has the advantages that:
partial position indexes of the sparse speech signal are stored under the condition that storage cost is not increased, and compared with the traditional compressed sensing speech compression, reconstruction accuracy is effectively improved.
Drawings
FIG. 1 is a schematic flow chart of the method for storing and reconstructing a speech signal based on a superposition sequence according to an embodiment of the present invention;
FIG. 2 is a schematic flow chart of the compression and storage process of the speech signal in the method according to the embodiment of the present invention;
FIG. 3 is a schematic flow chart of the reconstruction and reproduction process of the speech signal in the method according to the embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail below with reference to the accompanying drawings by way of examples.
A flow chart of a method for storing and reconstructing speech signal samples based on a superposition sequence is shown in fig. 1.
The following describes in detail the processing procedure of compressing and storing the voice signal according to the method for compressing, storing and reconstructing the voice signal based on the superposition sequence, as shown in fig. 2.
(a1) Reading a voice signal x with the sparsity of K and the length of N after sparsification, and constructing an original index sequence with the length of N by using 0 and 1 elements "
Figure RE-GDA0001695870420000071
Recording non-zero elements and zero element position indexes in the voice signals, and simultaneously storing the sparsity K of the sparse voice signals;
The sparse speech signal is a discrete speech signal that has been transformed from the time domain to the frequency domain by a time-frequency transformation method, with signal amplitudes below the silence threshold set to zero according to a "psychoacoustic model", yielding a sparse speech signal x of length N.
Examples of the "psychoacoustic model" include the MPEG (Moving Picture Experts Group) psychoacoustic model and the OGG (Ogg Vorbis) psychoacoustic model.
Examples of the time-frequency transformation method include the discrete cosine transform, the short-time Fourier transform and the wavelet transform.
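The sparsification described above can be sketched as follows. This is only an illustration under stated assumptions: it uses an unnormalised DCT-II as the time-frequency transform and a single fixed `threshold` as a stand-in for the silence threshold of a psychoacoustic model, which the text does not specify. The names `dct2`, `sparsify`, `frame` and `threshold` are hypothetical.

```python
import math

def dct2(frame):
    """Unnormalised DCT-II of a list of time-domain samples."""
    n = len(frame)
    return [
        sum(frame[t] * math.cos(math.pi * (t + 0.5) * k / n) for t in range(n))
        for k in range(n)
    ]

def sparsify(frame, threshold):
    """Zero every transform coefficient whose magnitude is below `threshold`.

    `threshold` is an assumed constant; a real psychoacoustic model would
    supply a frequency-dependent threshold instead.
    """
    coeffs = dct2(frame)
    x = [c if abs(c) >= threshold else 0.0 for c in coeffs]
    sparsity = sum(1 for c in x if c != 0.0)
    return x, sparsity

# Hypothetical frame: a single cosine aligned with DCT bin 3.
N = 16
frame = [math.cos(math.pi * (t + 0.5) * 3 / N) for t in range(N)]
x, K = sparsify(frame, threshold=1e-6)
```

With this frame, only the coefficient at bin 3 survives the threshold, so the returned signal has sparsity 1.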
The process of constructing the "original index sequence" of length N from the elements 0 and 1
Figure RE-GDA0001695870420000081
and recording the position indexes of the non-zero and zero elements of the speech signal is as follows: each zero element of the sparse speech signal x of length N is recorded in the "original index sequence"
Figure RE-GDA0001695870420000082
as element 0 at the corresponding position, and each non-zero element is recorded in the "original index sequence"
Figure RE-GDA0001695870420000083
as element 1 at the corresponding position. The "original index sequence" constructed in this way,
Figure RE-GDA0001695870420000084
is a sequence of length N whose elements are 0 or 1.
Example 1: an example of this construction is as follows:
Assume a sparse speech signal of length N = 18,
x = (5.4, 3.2, 6.7, 0, 0.9, 0, 7.8, 0, 0, 1.2, 0.8, 0, 4.2, 0, 0, 0, 0, 0)^T. Then the "original index sequence" is
Figure RE-GDA0001695870420000085
Where the superscript "T" denotes the transpose operation.
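As a minimal sketch of the construction in Example 1, the index sequence can be derived by marking each non-zero element of x with 1 and each zero element with 0. The function name `build_index_sequence` is hypothetical and not taken from the patent.

```python
def build_index_sequence(x):
    """Return the 0/1 index sequence of x and its sparsity K."""
    index_seq = [1 if value != 0 else 0 for value in x]
    return index_seq, sum(index_seq)

# Sparse speech signal from Example 1 (N = 18).
x = [5.4, 3.2, 6.7, 0, 0.9, 0, 7.8, 0, 0, 1.2, 0.8, 0, 4.2, 0, 0, 0, 0, 0]
index_seq, K = build_index_sequence(x)
print(index_seq)
print(K)
```

The sparsity K is simply the number of ones in the index sequence, which is why it can be stored alongside it at no extra computational cost.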
(a2) Read a pre-stored M × N measurement matrix Φ, and compress the speech signal with the measurement matrix to generate a "compressed signal sequence" y of length M, where the compression process can be expressed as y = Φx;
the measurement matrix is an existing measurement matrix such as a Gaussian random matrix, a Bernoulli random matrix or a partial Hadamard matrix;
M and N generally satisfy M ≤ N;
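The compression y = Φx of step (a2) can be illustrated with a small sketch. It assumes a Gaussian random measurement matrix, one of the options named above. The 1/sqrt(M) scaling, the fixed seed and the choice M = 9 are assumptions made for the illustration, and the function names are hypothetical.

```python
import math
import random

def gaussian_measurement_matrix(m, n, seed=0):
    """M x N matrix with i.i.d. Gaussian entries (assumed 1/sqrt(M) scaling)."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(m)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(n)] for _ in range(m)]

def compress(phi, x):
    """Compute y = Phi x for a matrix stored as a list of rows."""
    if any(len(row) != len(x) for row in phi):
        raise ValueError("every row of phi must have len(x) entries")
    return [sum(p * v for p, v in zip(row, x)) for row in phi]

# Sparse signal from Example 1 (N = 18), compressed to M = 9 samples.
x = [5.4, 3.2, 6.7, 0, 0.9, 0, 7.8, 0, 0, 1.2, 0.8, 0, 4.2, 0, 0, 0, 0, 0]
M, N = 9, len(x)
phi = gaussian_measurement_matrix(M, N)
y = compress(phi, x)
```

Because the matrix must be "pre-stored" for reconstruction, a fixed seed is used here so that the same Φ can be regenerated later.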
(a3) for "original index sequence" of length N "
Figure RE-GDA0001695870420000086
intercepting to obtain an "index sequence" A of length βN, where the interception coefficient β is set according to engineering experience and satisfies 0 < β ≤ 1;
Example 2: an example of the interception is as follows:
On the basis of Example 1, assume β = 0.5, so that βN = 9. Given the "original index sequence"
Figure RE-GDA0001695870420000091
the "index sequence" is A = (1, 1, 1, 0, 1, 0, 1, 0, 0)^T.
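The interception in Example 2 amounts to keeping the first βN elements of the original index sequence. The sketch below assumes that a non-integer βN is rounded down, which the text does not state; the function name is hypothetical.

```python
def intercept_index_sequence(original_index_seq, beta):
    """Keep the first beta*N elements (assumption: round down if not an integer)."""
    if not 0 < beta <= 1:
        raise ValueError("beta must satisfy 0 < beta <= 1")
    length = int(beta * len(original_index_seq))
    return original_index_seq[:length]

# Original index sequence implied by the signal x of Example 1.
original_index_seq = [1, 1, 1, 0, 1, 0, 1, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0]
A = intercept_index_sequence(original_index_seq, beta=0.5)
print(A)
```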
(a4) Compress and encode the "index sequence" A of length βN by Huffman coding to generate a "compressed index sequence" B of length L_1, and apply data conversion to B to obtain a "conversion index sequence" C of length L_2;
The data conversion process is as follows: the data of the "compressed index sequence" B of length L_1 is divided into L_2 groups of γ bits each; if the number of elements in B is not divisible by γ, zeros are appended to construct a sequence whose length is divisible by γ; each group is then converted from a binary number to a decimal real value, which yields the "conversion index sequence" C of length L_2.
Example 3: an example of this conversion is as follows:
Assume the sequence B of length L_1 = 62 is B = (1,0,0,1,0,1,0,1,0,1,0,1,1,1,0,1,0,0,0,1,0,1,1,0,0,0,1,0,1,0,1,0,1,1,1,0,0,1,0,1,0,1,1,1,0,1,0,1,0,0,1,0,1,1,1,0,0,0,1,1,0,0)^T and γ = 16. B is divided into 4 groups, i.e. L_2 = 4, after two 0 bits are appended at the end. The 4 groups are then 1001010101011101, 0001011000101010, 1110010101110101 and 0010111000110000 in sequence; converted from binary to decimal real numbers they are 38237, 5674, 58741 and 11824, so the "conversion index sequence" is C = (38237, 5674, 58741, 11824)^T, a 4 × 1 vector.
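The grouping and binary-to-decimal conversion of Example 3 can be sketched as below. The function name `bits_to_values` is hypothetical; the bit string is the 62-bit sequence B from the example, written as four concatenated pieces for readability.

```python
def bits_to_values(bits, gamma):
    """Zero-pad `bits` to a multiple of gamma, then read each group as a binary number."""
    padded = list(bits) + [0] * (-len(bits) % gamma)
    return [
        int("".join(str(b) for b in padded[i:i + gamma]), 2)
        for i in range(0, len(padded), gamma)
    ]

# Sequence B from Example 3 (L_1 = 62 bits).
B = [int(ch) for ch in (
    "1001010101011101"
    "0001011000101010"
    "1110010101110101"
    "00101110001100"
)]
C = bits_to_values(B, gamma=16)
print(C)
```

Two zero bits are appended to the last group here, matching the padding described in the example.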
(a5) Spread the "conversion index sequence" C of length L_2 and construct, by zero padding, a "spread spectrum index sequence" of length M
Figure RE-GDA0001695870420000101
Example 4: an example of constructing the "spread spectrum index sequence"
Figure RE-GDA0001695870420000102
from the "conversion index sequence" C of length L_2 by spreading and zero padding is as follows:
a5-1) Assume the "conversion index sequence" C = (3.8, 5.6, 5.8, 1.2)^T, L_2 = 4 and M = 25. Let Q ∈ R^(q×1) be the spreading sequence, Q = (1, 1, 1, 1, 1, 1)^T, where q is the spreading gain, which satisfies
Figure RE-GDA0001695870420000103
Q = (1, 1, 1, 1, 1, 1)^T is chosen here for simplicity; the spreading sequence may also be an M-sequence, a Gold sequence, a Zadoff-Chu sequence, or the like.
Here the symbol
Figure RE-GDA0001695870420000104
denotes rounding down to the nearest integer.
a5-2) Compute the Kronecker product
Figure RE-GDA0001695870420000105
Figure RE-GDA0001695870420000106
which spreads the sequence C, so that S has length 24;
a5-3) Append zeros to the end of the vector S, extending its length from (L_2 × q) to M, i.e. from 24 to 25, thereby constructing the "spread spectrum index sequence"
Figure RE-GDA0001695870420000107
Figure RE-GDA0001695870420000108
Figure RE-GDA0001695870420000109
whose length is 25.
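Steps a5-1) to a5-3) can be sketched as follows, using the numbers of Example 4. The sketch assumes that the spreading gain is q = floor(M / L_2), which is consistent with S having length 24 for M = 25 and L_2 = 4; the function name `spread` is hypothetical.

```python
def spread(c, q_seq, m):
    """Kronecker-spread c with q_seq, then zero-pad the result to length m."""
    s = [ci * qj for ci in c for qj in q_seq]  # flattened Kronecker product of C and Q^T
    if len(s) > m:
        raise ValueError("spread sequence is longer than the target length")
    return s + [0.0] * (m - len(s))

C = [3.8, 5.6, 5.8, 1.2]
M = 25
q = M // len(C)          # assumed spreading gain: floor(M / L_2) = 6
Q = [1] * q              # all-ones spreading sequence, as in Example 4
spread_seq = spread(C, Q, M)
```

Each element of C is repeated q times because Q is all ones; a different spreading sequence would modulate each block instead of merely repeating it.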
(a6) Assign the weights α and 1 − α to the "spread spectrum index sequence" of length M
Figure RE-GDA00016958704200001010
and to the "compressed signal sequence" y respectively, add the two weighted sequences, and use the formula
Figure RE-GDA00016958704200001011
to generate a "storage sequence" z of length M, which is then stored;
the weight α is set according to engineering experience and satisfies 0 ≤ α ≤ 1;
Example 5: an example of constructing the "storage sequence" z is as follows:
On the basis of Example 4, assume M = 25, the "spread spectrum index sequence"
Figure RE-GDA0001695870420000111
the "compressed signal sequence" y = (y_1, y_2, …, y_24, y_25)^T, α = 0.2 and 1 − α = 0.8. Then, according to the formula,
Figure RE-GDA0001695870420000112
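The weighted superposition of step (a6) can be sketched as an element-wise sum, z = α · (spread spectrum index sequence) + (1 − α) · y. The values of y below are hypothetical placeholders, since Example 5 leaves y symbolic; the function name `superpose` is also hypothetical.

```python
def superpose(spread_seq, y, alpha):
    """Element-wise weighted sum: alpha * spread_seq + (1 - alpha) * y."""
    if len(spread_seq) != len(y):
        raise ValueError("both sequences must have length M")
    if not 0 <= alpha <= 1:
        raise ValueError("alpha must satisfy 0 <= alpha <= 1")
    return [alpha * s + (1 - alpha) * v for s, v in zip(spread_seq, y)]

# Spread sequence of Example 4 (length 25) and a placeholder compressed sequence.
spread_seq = [3.8] * 6 + [5.6] * 6 + [5.8] * 6 + [1.2] * 6 + [0.0]
y = [1.0] * 25           # hypothetical values, for illustration only
z = superpose(spread_seq, y, alpha=0.2)
```

The storage sequence z has the same length M as y, which is why storing the index information in this way does not increase the storage cost.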
The reconstruction and reproduction process of the speech signal in the method for compressed storage and reconstruction of a speech signal based on a superposition sequence is described in detail below, as shown in FIG. 3.
(b1) De-spread the "storage sequence" z of length M to restore the "conversion index sequence" C of length L_2;
Example 6: an example of the de-spreading is as follows:
On the basis of Example 4 and Example 5, assume the "storage sequence" z ∈ R^(M×1) with M = 25, and let Q ∈ R^(q×1) be the spreading sequence, Q = (1, 1, 1, 1, 1, 1)^T, where q = 6 is the spreading gain;
Q = (1, 1, 1, 1, 1, 1)^T is chosen here for simplicity; the spreading sequence may also be an M-sequence, a Gold sequence, a Zadoff-Chu sequence, or the like.
b1-1) On the basis of Examples 4 and 5, it is known that:
Figure RE-GDA0001695870420000113
b1-2) Partition the sequence z into
Figure RE-GDA0001695870420000114
of
Figure RE-GDA0001695870420000115
and one (M − L_2 × q) × 1 pure speech signal sequence
Figure RE-GDA0001695870420000116
that is, into 4 sequences z_1, z_2, z_3, z_4 of length 6 and one pure speech signal sequence
Figure RE-GDA0001695870420000117
Then
Figure RE-GDA0001695870420000118
where
Figure RE-GDA0001695870420000121
Figure RE-GDA0001695870420000122
b1-3) De-spread z_1, z_2, z_3 and z_4. Assume the de-spread data is h = (4.56, 6.72, 6.96, 1.44)^T, namely:
Figure RE-GDA0001695870420000123
Taking i = 1 as an example:
Figure RE-GDA0001695870420000124
b1-4) The speech signal sequence y_i1, y_i2, …, y_i6 and Q_1, Q_2, …, Q_6 are uncorrelated, so:
0.8 y_i1 Q_1 + 0.8 y_i2 Q_2 + … + 0.8 y_i6 Q_6 = 0;
namely: 0.8 y_11 Q_1 + 0.8 y_12 Q_2 + … + 0.8 y_16 Q_6 = 0;
b1-5) therefore:
Figure RE-GDA0001695870420000125
namely:
Figure RE-GDA0001695870420000126
b1-6) The spreading sequence Q = (Q_1, Q_2, …, Q_6)^T is known, i.e. Q = (1, 1, 1, 1, 1, 1)^T.
b1-7) De-spreading therefore restores the "conversion index sequence" C = (C_1, C_2, C_3, C_4)^T.
Namely, 4.56 = 0.2 C_1 + 0.2 C_1 + … + 0.2 C_1 (six terms), which gives C_1 = 3.8;
C_2, C_3 and C_4 are solved in the same way, so de-spreading restores the "conversion index sequence" C = (3.8, 5.6, 5.8, 1.2)^T.
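The de-spreading of Example 6 can be sketched as a per-block correlation with Q. The sketch relies on the assumption stated in step b1-4) that the compressed signal is uncorrelated with Q; to make that hold exactly, the placeholder values of y below are chosen so that each block of six sums to zero. These values and the function name `despread` are hypothetical.

```python
def despread(z, q_seq, l2, alpha):
    """Correlate each length-q block of z with q_seq and undo the weight alpha."""
    q = len(q_seq)
    energy = sum(v * v for v in q_seq)           # Q^T Q
    c = []
    for i in range(l2):
        block = z[i * q:(i + 1) * q]
        h = sum(b * v for b, v in zip(block, q_seq))
        c.append(h / (alpha * energy))
    return c

alpha = 0.2
C = [3.8, 5.6, 5.8, 1.2]
Q = [1.0] * 6
# Hypothetical compressed samples: every block of six is orthogonal to Q.
y = [1.0, -1.0, 2.0, -2.0, 0.5, -0.5] * 4 + [0.3]
spread_seq = [c * qv for c in C for qv in Q] + [0.0]
z = [alpha * s + (1 - alpha) * v for s, v in zip(spread_seq, y)]
C_hat = despread(z, Q, len(C), alpha)
```

For the first block the correlation is 6 × 0.2 × 3.8 = 4.56, and dividing by α · Q^T Q = 1.2 gives back C_1 = 3.8. With real data the correlation between y and Q is only approximately zero, so the recovered values would carry some error.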
(b2) Spread the "conversion index sequence" C of length L_2 and construct, by zero padding, a "spread spectrum index sequence" of length M
Figure RE-GDA0001695870420000131
The construction of the "spread spectrum index sequence"
Figure RE-GDA0001695870420000132
follows the example described in Example 4.
(b3) Use the formula
Figure RE-GDA0001695870420000133
to solve for the "compressed signal sequence" y of length M;
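Step (b3) can be read as inverting the weighted superposition of step (a6). The sketch below assumes the storage formula z = α · (spread spectrum index sequence) + (1 − α) · y and therefore recovers y as (z − α · spread) / (1 − α). It requires α < 1, since α = 1 would leave no trace of y in z. All names and sample values are hypothetical.

```python
def recover_compressed(z, spread_seq, alpha):
    """Invert z = alpha * spread_seq + (1 - alpha) * y for y (needs alpha < 1)."""
    if not 0 <= alpha < 1:
        raise ValueError("alpha must satisfy 0 <= alpha < 1 to recover y")
    return [(zi - alpha * si) / (1 - alpha) for zi, si in zip(z, spread_seq)]

alpha = 0.2
y = [0.5, -1.25, 2.0, 0.0, 3.0]                  # hypothetical compressed samples
spread_seq = [3.8, 3.8, 5.6, 5.6, 0.0]           # hypothetical spread index values
z = [alpha * s + (1 - alpha) * v for s, v in zip(spread_seq, y)]
y_hat = recover_compressed(z, spread_seq, alpha)
```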
(b4) Apply data restoration to the "conversion index sequence" C of length L_2 to obtain the "compressed index sequence" B of length L_1, and then restore the "index sequence" A of length βN by Huffman decoding;
The data restoration process is as follows: each real-valued element of the "conversion index sequence" C of length L_2 is converted into a binary number, and the zero-valued padding elements are removed from the tail of the resulting binary sequence so that L_1 elements remain; the sequence formed by the remaining elements is the "compressed index sequence" B.
Example 7: an example of this data restoration is as follows:
On the basis of Example 3, assume the "conversion index sequence" C = (38237, 5674, 58741, 11824)^T and that the sequence B has length L_1 = 62. Converting the real-valued elements into binary gives the sequence data 1001010101011101000101100010101011100101011101010010111000110000; removing the last two 0 bits from the end of this binary sequence restores the sequence B = (1,0,0,1,0,1,0,1,0,1,0,1,1,1,0,1,0,0,0,1,0,1,1,0,0,0,1,0,1,0,1,0,1,1,1,0,0,1,0,1,0,1,1,1,0,1,0,1,0,0,1,0,1,1,1,0,0,0,1,1,0,0)^T.
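The data restoration of Example 7 can be sketched as the inverse of the grouping step. The sketch assumes the group width γ and the original length L_1 are known at reconstruction time, so the padding is removed by truncating to L_1 bits rather than by stripping trailing zeros (B itself may legitimately end in zeros, as it does here). The function name `values_to_bits` is hypothetical.

```python
def values_to_bits(values, gamma, l1):
    """Write each value as a gamma-bit binary number and keep the first l1 bits."""
    bits = []
    for value in values:
        bits.extend(int(ch) for ch in format(value, "0{}b".format(gamma)))
    return bits[:l1]

C = [38237, 5674, 58741, 11824]
B = values_to_bits(C, gamma=16, l1=62)
print("".join(str(b) for b in B))
```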
(b5) Record the column sequence numbers of the non-zero elements of the "index sequence" A of length βN in a set to form a "fixed support set"
Figure RE-GDA0001695870420000141
Example 8: an example of forming the "fixed support set"
Figure RE-GDA0001695870420000142
is as follows:
Let the "index sequence" A = (1, 1, 1, 0, 1, 0, 1, 0, 0)^T. Recording the sequence numbers of the non-zero elements of the index sequence A in a set forms the "fixed support set"
Figure RE-GDA0001695870420000143
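Forming the fixed support set from an index sequence can be sketched as collecting the positions of its non-zero elements. The sketch uses 1-based sequence numbers, which is an assumption based on the 1-based column numbering (j = 1, 2, …, N) used elsewhere in the text; the function name is hypothetical.

```python
def fixed_support_set(index_seq):
    """Return the 1-based positions of the non-zero elements of index_seq."""
    return [position for position, flag in enumerate(index_seq, start=1) if flag != 0]

A = [1, 1, 1, 0, 1, 0, 1, 0, 0]
support = fixed_support_set(A)
print(support)
```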
(b6) Use the "fixed support set"
Figure RE-GDA0001695870420000144
to assist a reconstruction algorithm in reconstructing the sparse speech signal x of length N from the "compressed signal sequence" y of length M.
Using the "fixed support set"
Figure RE-GDA0001695870420000145
to assist reconstruction means that, during reconstruction with the reconstruction algorithm, the "fixed support set"
Figure RE-GDA0001695870420000146
is retained every time the support set is updated in an iteration.
Examples of the reconstruction algorithm include the matching pursuit algorithm, the orthogonal matching pursuit algorithm and the regularized orthogonal matching pursuit algorithm.
Taking the orthogonal matching pursuit algorithm as the reconstruction algorithm, step b6) includes:
b6-1) Read the "compressed signal sequence" y ∈ R^(M×1), the measurement matrix Φ ∈ R^(M×N) and the sparsity K. In what follows, t denotes the iteration index, r_t the residual after t iterations, Ω_t the index set (column indices) after t iterations, i.e. the support set after t iterations, K_t the number of elements in the index set Ω_t,
Figure RE-GDA0001695870420000147
a K_t × 1 vector, λ_t the index (column index) found in the t-th iteration, a_j the j-th column of the matrix Φ (j = 1, 2, …, N),
Figure RE-GDA0001695870420000148
the set of columns of the matrix Φ selected according to the "fixed support set"
Figure RE-GDA0001695870420000149
and Φ_t the set of columns of the matrix Φ selected by the index set Ω_t (a matrix of size M × K_t); the symbol ∪ denotes set union, |·| denotes the absolute value, <X, Y> denotes the inner product of the vectors X and Y, ||·|| denotes the vector 2-norm, and (·)^(-1) denotes matrix inversion;
b6-2) Initialization:
Figure RE-GDA00016958704200001410
b6-3) If K_t < K, solve
Figure RE-GDA0001695870420000151
to find the index λ_t; otherwise, solve
Figure RE-GDA0001695870420000152
for its least-squares solution:
Figure RE-GDA0001695870420000153
and go to step b6-8);
b6-4) Let Ω_t = Ω_(t-1) ∪ {λ_t},
Figure RE-GDA0001695870420000154
Example 9: an example of step b6-4) is as follows:
On the basis of Example 8, assume that
Figure RE-GDA0001695870420000155
t = 1 and λ_t = 17; then
Figure RE-GDA0001695870420000156
and Ω_t = Ω_(t-1) ∪ {λ_t} = {1, 4, 7, 10, 14, 17},
Figure RE-GDA0001695870420000157
b6-5) Solve
Figure RE-GDA0001695870420000158
for its least-squares solution:
Figure RE-GDA0001695870420000159
b6-6) Update the residual
Figure RE-GDA00016958704200001510
b6-7) Set t = t + 1 and return to step b6-3);
b6-8) The reconstructed sparse speech signal
Figure RE-GDA00016958704200001511
has non-zero entries at the indices in the support set Ω_t, whose values are the least-squares solution obtained above
Figure RE-GDA00016958704200001512
Setting the elements of
Figure RE-GDA00016958704200001513
outside the indices in the support set Ω_t to 0 reconstructs the sparse speech signal x.
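Steps b6-1) to b6-8) can be sketched as orthogonal matching pursuit that starts from, and always retains, the fixed support set. Because the initialization formula is not recoverable from the text, the sketch assumes the residual is initialised by a least-squares fit on the fixed support columns. It uses 0-based column indices, solves the least-squares problems through the normal equations, and all names (`omp_with_fixed_support`, `least_squares`, the test matrix) are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def least_squares(columns, y):
    """Solve min ||A theta - y||_2 via the normal equations (A given by its columns)."""
    k = len(columns)
    gram = [[dot(ci, cj) for cj in columns] for ci in columns]
    rhs = [dot(ci, y) for ci in columns]
    for col in range(k):                          # Gaussian elimination with pivoting
        pivot = max(range(col, k), key=lambda r: abs(gram[r][col]))
        gram[col], gram[pivot] = gram[pivot], gram[col]
        rhs[col], rhs[pivot] = rhs[pivot], rhs[col]
        for r in range(col + 1, k):
            factor = gram[r][col] / gram[col][col]
            for c in range(col, k):
                gram[r][c] -= factor * gram[col][c]
            rhs[r] -= factor * rhs[col]
    theta = [0.0] * k
    for r in range(k - 1, -1, -1):                # back substitution
        acc = rhs[r] - sum(gram[r][c] * theta[c] for c in range(r + 1, k))
        theta[r] = acc / gram[r][r]
    return theta

def omp_with_fixed_support(phi, y, sparsity, fixed_support):
    """OMP in which the columns in fixed_support are never dropped from the support."""
    m, n = len(phi), len(phi[0])
    cols = [[phi[r][j] for r in range(m)] for j in range(n)]
    support = sorted(set(fixed_support))

    def fit(current):
        theta = least_squares([cols[j] for j in current], y)
        residual = [y[r] - sum(t * cols[j][r] for t, j in zip(theta, current))
                    for r in range(m)]
        return theta, residual

    theta, residual = fit(support)                # assumed initialisation
    while len(support) < sparsity:
        candidates = [j for j in range(n) if j not in support]
        best = max(candidates, key=lambda j: abs(dot(residual, cols[j])))
        support.append(best)                      # support only ever grows
        theta, residual = fit(support)
    x_hat = [0.0] * n
    for j, t in zip(support, theta):
        x_hat[j] = t                              # entries off the support stay 0
    return x_hat, sorted(support)

# Hypothetical 4 x 6 measurement matrix and a 2-sparse signal.
phi = [[1, 0, 0, 0, 0.5, 0],
       [0, 1, 0, 0, 0.5, 0],
       [0, 0, 1, 0, 0, 0.5],
       [0, 0, 0, 1, 0, 0.5]]
x_true = [2.0, 0.0, 0.0, -3.0, 0.0, 0.0]
y = [sum(p * v for p, v in zip(row, x_true)) for row in phi]
x_hat, support = omp_with_fixed_support(phi, y, sparsity=2, fixed_support=[0])
```

In this toy case the fixed support supplies column 0, and a single greedy iteration then selects column 3. The normal-equation solver is kept for brevity; a QR-based solver would be the numerically safer choice for ill-conditioned column sets.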
Example 10: The reconstruction of the sparse speech signal x is illustrated as follows:
Assume that the reconstructed sparse speech signal
Figure RE-GDA00016958704200001514
has length N = 25 and has non-zero entries at the indices in the support set Ω_t, whose values are the least-squares solution
Figure RE-GDA00016958704200001515
Ω_t = {1, 4, 5, 7, 8, 10, 14, 17, 19, 23},
Figure RE-GDA00016958704200001516
Setting the elements of
Figure RE-GDA00016958704200001517
at indices outside the support set Ω_t to 0 gives
Figure RE-GDA00016958704200001518
that is, the reconstructed sparse speech signal of length N = 25 is
x = (x1, 0, 0, x4, x5, 0, x7, x8, 0, x10, 0, 0, 0, x14, 0, 0, x17, 0, x19, 0, 0, 0, x23, 0, 0)
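The placement described in Example 10 can be sketched in Python with NumPy as follows. The helper name scatter_to_support and the numeric values are hypothetical placeholders for the least-squares solution; only the support set and the length N = 25 come from the example. The support set is given with 1-based indices, as in the text, and is shifted to 0-based indices inside the helper.

```python
import numpy as np

def scatter_to_support(values, support_1based, N):
    """Place values at the (1-based) support indices of a length-N zero vector."""
    x = np.zeros(N)
    x[np.asarray(support_1based, dtype=int) - 1] = values
    return x

support = [1, 4, 5, 7, 8, 10, 14, 17, 19, 23]  # support set from Example 10
x_ls = np.arange(1.0, 11.0)  # placeholder values, not from the source
x = scatter_to_support(x_ls, support, 25)
```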
It will be appreciated by those of ordinary skill in the art that the examples described herein are intended to assist the reader in understanding the manner in which the invention is practiced, and it is to be understood that the scope of the invention is not limited to such specifically recited statements and examples. Those skilled in the art can make various other specific changes and combinations based on the teachings of the present invention without departing from the spirit of the invention, and these changes and combinations are within the scope of the invention.

Claims (7)

1. A method for compressing, storing and reconstructing a voice signal based on a superposition sequence, characterized by comprising the following steps: (a) compression and storage processing of the voice signal:
(a1) reading a sparsified voice signal x with sparsity K and length N, and constructing from the elements 0 and 1 an "original index sequence" of length N
Figure FDA0002572456360000011
that records the position indices of the non-zero elements and zero elements in the voice signal, while also storing the sparsity K of the sparse voice signal;
(a2) reading a pre-stored M × N measurement matrix Φ, and compressing the voice signal with the measurement matrix to generate a "compressed signal sequence" y of length M, the compression process being expressed as y = Φx;
where M and N generally satisfy M ≤ N;
(a3) truncating the "original index sequence" of length N
Figure FDA0002572456360000012
to obtain an index sequence A of length βN, wherein the truncation coefficient β is set according to engineering experience and satisfies 0 < β ≤ 1;
(a4) compressing and encoding the index sequence A of length βN by Huffman coding to generate a "compressed index sequence" B of length L_1, and performing data conversion on B to obtain a "conversion index sequence" C of length L_2;
(a5) spreading the "conversion index sequence" C of length L_2 and applying zero padding to construct a "spread spectrum index sequence" of length M
Figure FDA0002572456360000013
wherein the specific steps by which step a5) constructs, from the "conversion index sequence" C of length L_2 by spreading and zero padding, the "spread spectrum index sequence" of length M
Figure FDA0002572456360000014
are as follows:
a5-1) for the "conversion index sequence"
Figure FDA0002572456360000015
let Q ∈ R^{q×1} be a spreading sequence, where q is the spreading gain and satisfies
Figure DEST_PATH_FDA0001838154780000033
wherein the symbol
Figure FDA0002572456360000017
denotes the floor (round-down) operation;
a5-2) computing the Kronecker product
Figure FDA0002572456360000021
to spread the sequence C, yielding S of length (L_2 × q);
wherein the superscript "T" denotes the transpose operation;
a5-3) appending zeros to the end of the vector S to extend its length from (L_2 × q) to M, thereby constructing the "spread spectrum index sequence"
Figure FDA0002572456360000022
Figure FDA0002572456360000023
Figure FDA0002572456360000024
of length M;
(a6) assigning the weights α and 1 − α to the "spread spectrum index sequence" of length M
Figure FDA0002572456360000025
and the "compressed signal sequence" y, respectively, and superimposing them according to the formula
Figure FDA0002572456360000026
to generate a "storage sequence" z of length M, and storing the "storage sequence" z;
the weight α is set according to engineering experience and satisfies 0 ≤ α ≤ 1;
(b) reconstruction and reproduction processing of the speech signal:
(b1) de-spreading the "storage sequence" z of length M to recover the "conversion index sequence" C of length L_2;
(b2) spreading the "conversion index sequence" C of length L_2 and applying zero padding to construct the "spread spectrum index sequence" of length M
Figure FDA0002572456360000027
(b3) using the formula
Figure FDA0002572456360000028
to solve for the "compressed signal sequence" y of length M;
(b4) performing data reduction on the "conversion index sequence" C of length L_2 to obtain the "compressed index sequence" B of length L_1, and then recovering the index sequence A of length βN by Huffman decoding;
(b5) recording the column indices of the non-zero elements of the index sequence A of length βN in a set to form the "fixed support set"
Figure FDA0002572456360000031
(b6) using the "fixed support set"
Figure FDA0002572456360000032
as an aid, in combination with a reconstruction algorithm, to reconstruct the sparse speech signal x of length N from the "compressed signal sequence" y of length M.
2. The method of claim 1, wherein: the sparse speech signal described in step a1) is obtained by transforming a discrete speech signal from the time domain to the frequency domain by a time-frequency transform and setting the signal amplitudes below the silence threshold to zero according to a "psychoacoustic model", yielding a sparse speech signal x of length N.
3. The method of claim 1, wherein: the process described in step a1) of constructing from the elements 0 and 1 an "original index sequence" of length N
Figure FDA0002572456360000033
that records the position indices of the non-zero elements and zero elements in the speech signal is: each zero element of the sparse speech signal x of length N is recorded as element 0 at the corresponding position of the "original index sequence"
Figure FDA0002572456360000034
and each non-zero element is recorded as element 1 at the corresponding position of the "original index sequence"
Figure FDA0002572456360000035
so that the "original index sequence" thus constructed
Figure FDA0002572456360000036
is a sequence of length N whose elements are 0 or 1.
4. The method of claim 1, wherein: the data conversion process of step a4) is as follows: the data of the "compressed index sequence" B of length L_1 are divided into L_2 groups of γ data each; if the number of data in sequence B is not divisible by γ, zero padding is used to construct a sequence whose length is divisible by γ; each group of data is converted from a binary number to a decimal real value, thereby completing the conversion and obtaining the "conversion index sequence" C of length L_2.
5. The method of claim 1, wherein: the specific steps by which step b2) constructs, from the "conversion index sequence" C of length L_2 by spreading and zero padding, the "spread spectrum index sequence" of length M
Figure FDA0002572456360000041
are the same as steps a5-1) to a5-3).
6. The method of claim 1, wherein: the data reduction process of step b4) is as follows: the real-valued elements of the "conversion index sequence" C of length L_2 are converted to binary numbers, and the zero-valued elements are removed from the tail of the resulting binary sequence so that the remaining elements have length L_1; the sequence formed by the remaining elements is the "compressed index sequence" B.
7. The method of claim 1, wherein: the assistance by the "fixed support set" described in step b6)
Figure FDA0002572456360000042
means that, during reconstruction with the reconstruction algorithm, the "fixed support set"
Figure FDA0002572456360000043
is retained every time the support set is updated in an iteration, so as to assist reconstruction.
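The data conversion, spreading, zero padding and superposition of steps (a4) to (a6), together with the de-spreading and recovery of steps (b1) to (b4), can be sketched in Python with NumPy as follows. This is a hedged illustration rather than the claimed implementation, and all function names are hypothetical. Since the formulas themselves appear only as figures, the sketch assumes the superposition z = α·D + (1 − α)·y suggested by the weights α and 1 − α, a spreading sequence Q with ±1 chips, most-significant-bit-first grouping, de-spreading by correlation followed by rounding, and a known length L_1 passed explicitly instead of trimming trailing zeros.

```python
import numpy as np

def bits_to_values(bits, gamma):
    """Claim 4 (sketch): zero-pad B to a multiple of gamma, read each group as an integer."""
    bits = np.asarray(bits, dtype=int)
    pad = (-len(bits)) % gamma
    padded = np.concatenate([bits, np.zeros(pad, dtype=int)])
    weights = 2 ** np.arange(gamma - 1, -1, -1)  # assumption: MSB first
    return padded.reshape(-1, gamma) @ weights

def values_to_bits(C, gamma, L1):
    """Claim 6 (sketch): expand each value to gamma bits and keep the first L1 bits."""
    C = np.asarray(C, dtype=int)
    bits = (C[:, None] >> np.arange(gamma - 1, -1, -1)) & 1
    return bits.reshape(-1)[:L1]

def spread_and_pad(C, Q, M):
    """a5-2), a5-3): Kronecker-product spreading, then zeros appended up to length M."""
    S = np.kron(C, Q)  # length L2 * q
    return np.concatenate([S, np.zeros(M - len(S))])

def superimpose(D, y, alpha):
    """(a6) under the assumed formula z = alpha * D + (1 - alpha) * y."""
    return alpha * D + (1 - alpha) * y

def despread(z, Q, L2, alpha):
    """(b1) sketch: correlate each chip block with Q, rescale, round to the nearest integer."""
    q = len(Q)
    blocks = z[:L2 * q].reshape(L2, q)
    return np.rint(blocks @ Q / (alpha * (Q @ Q))).astype(int)

def recover_y(z, D, alpha):
    """(b3) under the same assumed formula; requires alpha < 1."""
    return (z - alpha * D) / (1 - alpha)
```

Under these assumptions, de-spreading returns C exactly only when the contribution of y to each correlation stays below half a unit after rescaling, so the choice of α and of the spreading gain q trades index reliability against the accuracy left for y.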
CN201810497026.XA 2018-05-22 2018-05-22 Voice signal compression storage and reconstruction method based on superposition sequence Active CN108962265B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810497026.XA CN108962265B (en) 2018-05-22 2018-05-22 Voice signal compression storage and reconstruction method based on superposition sequence

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810497026.XA CN108962265B (en) 2018-05-22 2018-05-22 Voice signal compression storage and reconstruction method based on superposition sequence

Publications (2)

Publication Number Publication Date
CN108962265A CN108962265A (en) 2018-12-07
CN108962265B true CN108962265B (en) 2020-08-25

Family

ID=64499535

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810497026.XA Active CN108962265B (en) 2018-05-22 2018-05-22 Voice signal compression storage and reconstruction method based on superposition sequence

Country Status (1)

Country Link
CN (1) CN108962265B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109818645B (en) * 2019-02-20 2020-12-29 西华大学 Superposition CSI feedback method based on signal detection and support set assistance
CN109817229B (en) * 2019-03-14 2020-09-22 西华大学 Single-bit audio compression transmission and reconstruction method assisted by superposition characteristic information

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2014505415A (en) * 2011-01-10 2014-02-27 アルカテル−ルーセント Method and apparatus for measuring and recovering a sparse signal
CN105099462A (en) * 2014-05-22 2015-11-25 北京邮电大学 Signal processing method based on compressive sensing
CN105206277A (en) * 2015-08-17 2015-12-30 西华大学 Voice compression method base on monobit compression perception
CN105933008A (en) * 2016-04-15 2016-09-07 哈尔滨工业大学 Multiband signal reconstruction method based on clustering sparse regularization orthogonal matching tracking algorithm

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
《1-Bit压缩感知盲重构算法》;张京超等;《电子与信息学报》;20150331;第37卷(第3期);第567-573页 *
《One-bit Compressive Sensing with partial support》;Phillip North et al.;《2015 IEEE 6th International Workshop on Computational Advances in Multi-Sensor Adaptive Processing (CAMSAP)》;20160121;第349-352页 *

Also Published As

Publication number Publication date
CN108962265A (en) 2018-12-07

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
EE01 Entry into force of recordation of patent licensing contract

Application publication date: 20181207

Assignee: Suining Feidian Cultural Communication Co.,Ltd.

Assignor: XIHUA University

Contract record no.: X2023510000027

Denomination of invention: A method for compressing, storing, and reconstructing speech signals based on stacked sequences

Granted publication date: 20200825

License type: Common License

Record date: 20231129

EE01 Entry into force of recordation of patent licensing contract