CN108962265A - Speech signal compression storage and reconstruction method based on a superposition sequence


Publication number
CN108962265A
Authority
CN
China
Prior art keywords
sequence
length
index sequence
index
signal
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810497026.XA
Other languages
Chinese (zh)
Other versions
CN108962265B (en)
Inventor
卿朝进
万东琴
阳庆瑶
王维
郭奕
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xihua University
Original Assignee
Xihua University
Application filed by Xihua University
Priority to CN201810497026.XA
Publication of CN108962265A
Application granted
Publication of CN108962265B
Legal status: Active


Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/02 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders
    • G10L19/0212 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders using orthogonal transformation

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The invention discloses a method for compressed storage and reconstruction of speech signals based on a superposition sequence, comprising: reading a sparse speech signal, constructing an original index sequence from the position indices of its nonzero and zero elements, and storing the sparsity of the sparse speech signal; compressing the sparse speech signal to generate a compressed signal sequence; truncating part of the original index sequence as the index sequence and generating a spread index sequence through coding, conversion, and spreading; weighting the spread index sequence and the compressed signal sequence, superimposing them to generate a storage sequence, and storing it; despreading the storage sequence to obtain the conversion index sequence and the compressed signal sequence; restoring the index sequence from the conversion index sequence by data restoration and decoding; and constructing a support set from the index sequence and reconstructing the sparse speech signal. The advantage of the invention is that the reconstruction accuracy of the speech signal is effectively improved without increasing the storage resources.

Description

Voice signal compression storage and reconstruction method based on superposition sequence
Technical Field
The invention relates to the technical field of compression storage and reconstruction of voice signals, in particular to a voice signal compression storage and reconstruction method based on a superposition sequence.
Background
As information exchange becomes increasingly frequent, the speech signal is one of the most common signals involved, and the requirements on its processing are becoming more refined. Owing to the diversity of speech signals and the characteristics of the human auditory system, speech signals are sparse in various transform domains. Conventional speech sampling typically has to satisfy the Nyquist sampling rate. Compressed sensing (CS) theory shows that signals that are sparse or compressible can be compressively sampled and reconstructed. Combining CS theory with speech signal processing therefore reduces the sampling frequency and relaxes the requirements on the sampling device.
According to CS theory, a sparse speech signal is compressed with a measurement matrix and reconstructed with a reconstruction algorithm. However, existing reconstruction algorithms such as matching pursuit, orthogonal matching pursuit, compressive sampling matching pursuit, basis pursuit, and subspace pursuit were not designed specifically for sparse speech signals; they neither consider nor exploit the position indices of the elements of the sparse speech signal, which limits the reconstruction accuracy.
Disclosure of Invention
To address the shortcomings of the prior art, the invention provides a speech signal compression storage and reconstruction method based on a superposition sequence. Compared with conventional compressed-sensing speech compression, the method uses part of the position indices of the elements of the sparse speech signal to assist reconstruction, improving the reconstruction accuracy of the speech signal without increasing the storage cost.
To achieve this purpose, the invention adopts the following technical scheme:
A method for compressing, storing and reconstructing a speech signal based on a superposition sequence comprises the following steps:
(a) Compression and storage of the speech signal:
(a1) Read a sparsified speech signal x with sparsity K and length N, construct from the elements 0 and 1 an "original index sequence" of length N that records the position indices of the nonzero and zero elements of the speech signal, and store the sparsity K of the sparse speech signal;
(a2) Read a pre-stored $M \times N$ measurement matrix $\Phi$ and compress the speech signal with it to generate a "compressed signal sequence" y of length M; the compression is expressed as $y = \Phi x$;
The measurement matrix is an existing measurement matrix such as a Gaussian random matrix, a Bernoulli random matrix, or a partial Hadamard matrix;
M and N generally satisfy $M \le N$;
(a3) Truncate the "original index sequence" of length N to obtain an "index sequence" A of length $\beta N$, where the truncation coefficient $\beta$ is set according to engineering experience and satisfies $0 < \beta \le 1$;
(a4) Compress the "index sequence" A of length $\beta N$ by Huffman coding to generate a "compressed index sequence" B of length $L_1$, then apply data conversion to B to obtain a "conversion index sequence" C of length $L_2$;
(a5) Spread the "conversion index sequence" C of length $L_2$ and zero-pad the result to construct a "spread index sequence" of length M;
(a6) Assign the weights $\alpha$ and $1-\alpha$ to the "spread index sequence" of length M and the "compressed signal sequence" y respectively and superimpose them, i.e. $z = \alpha \cdot (\text{spread index sequence}) + (1-\alpha)\, y$, to generate a "storage sequence" z of length M, and store the "storage sequence" z;
The weight $\alpha$ is set according to engineering experience and satisfies $0 \le \alpha \le 1$.
(b) Reconstruction and reproduction of the speech signal:
(b1) Despread the "storage sequence" z of length M to restore the "conversion index sequence" C of length $L_2$;
(b2) Spread the restored "conversion index sequence" C of length $L_2$ and zero-pad the result to construct the "spread index sequence" of length M;
(b3) Solve for the "compressed signal sequence" y of length M using $y = \left(z - \alpha \cdot (\text{spread index sequence})\right) / (1-\alpha)$;
(b4) Apply data restoration to the "conversion index sequence" C of length $L_2$ to obtain the "compressed index sequence" B of length $L_1$, then apply Huffman decoding to restore the "index sequence" A of length $\beta N$;
(b5) Record the positions (column indices) of the nonzero elements of the "index sequence" A of length $\beta N$ in a set to form a "fixed support set";
(b6) With the assistance of the "fixed support set" and in combination with a reconstruction algorithm, reconstruct the sparse speech signal x of length N from the "compressed signal sequence" y of length M.
Further, the sparse speech signal in step a1) is obtained by transforming a discrete speech signal from the time domain to the frequency domain with a time-frequency transform and setting the signal amplitudes below the silence threshold to zero according to a "psychoacoustic model", which yields a sparse speech signal x of length N.
The "psychoacoustic model" is, for example, the MPEG (Moving Picture Experts Group) psychoacoustic model or the OGG (Ogg Vorbis) psychoacoustic model.
The time-frequency transform may be the discrete cosine transform, the short-time Fourier transform, or the wavelet transform.
Further, in step a1) the "original index sequence" of length N is constructed from the elements 0 and 1 to record the positions of the nonzero and zero elements of the speech signal as follows: each zero element of the sparse speech signal x of length N is recorded as the element 0 at the corresponding position of the "original index sequence", and each nonzero element is recorded as the element 1; the "original index sequence" constructed in this way is a sequence of length N whose elements are 0 or 1.
Further, the data conversion in step a4) is as follows: the data of the "compressed index sequence" B of length $L_1$ are divided into $L_2$ groups of $\gamma$ data each; if the number of data in B is not divisible by $\gamma$, B is zero-padded to a length divisible by $\gamma$; each group is then converted from a binary number into a decimal real value, which yields the "conversion index sequence" C of length $L_2$.
Further, the specific steps of step a5), which constructs a "spread index sequence" of length M from the "conversion index sequence" C of length $L_2$ by spreading and zero-padding, are:
a5-1) Given the "conversion index sequence" $C \in \mathbb{R}^{L_2 \times 1}$, let $Q \in \mathbb{R}^{q \times 1}$ be the spreading sequence, where the spreading gain q satisfies $q = \lfloor M / L_2 \rfloor$.
The spreading sequence Q may be an m-sequence, an M-sequence, a Gold sequence, or a Zadoff-Chu sequence.
The symbol $\lfloor \cdot \rfloor$ denotes rounding down to an integer.
a5-2) Compute the Kronecker product $S = C \otimes Q = (C_1 Q^T, C_2 Q^T, \ldots, C_{L_2} Q^T)^T$, which spreads the sequence C; the length of S is $L_2 \times q$;
The superscript "T" denotes the transpose operation.
a5-3) Append zeros to the end of the vector S to extend its length from $L_2 \times q$ to M, which constructs the "spread index sequence" of length M.
Further, the specific steps of step b2), which constructs the "spread index sequence" of length M from the "conversion index sequence" C of length $L_2$ by spreading and zero-padding, are the same as steps a5-1) to a5-3).
Further, the data restoration in step b4) is as follows: the real-valued elements of the "conversion index sequence" C of length $L_2$ are converted into binary numbers, and zero-valued elements are removed from the tail of the resulting binary sequence so that $L_1$ elements remain; the sequence formed by the remaining elements is the "compressed index sequence" B.
Further, in step b6), assistance by the "fixed support set" means that, during reconstruction with the reconstruction algorithm, the "fixed support set" is retained every time the support set is updated in an iteration, thereby assisting reconstruction.
The reconstruction algorithm is, for example, the matching pursuit algorithm, the orthogonal matching pursuit algorithm, or the regularized orthogonal matching pursuit algorithm.
Further, taking the orthogonal matching pursuit algorithm as the reconstruction algorithm, step b6) comprises:
b6-1) Read the "compressed signal sequence" $y \in \mathbb{R}^{M \times 1}$, the measurement matrix $\Phi \in \mathbb{R}^{M \times N}$, and the sparsity K. Here t denotes the iteration count; $r_t$ the residual at iteration t; $\Omega_t$ the index set (of column indices) at iteration t, i.e. the support set at iteration t; $K_t$ the number of elements in $\Omega_t$; $\hat{\theta}_t$ a $K_t \times 1$ vector; $\lambda_t$ the index (column index) found at iteration t; $a_j$ the j-th column of $\Phi$ (j = 1, 2, …, N); $\Phi_t$ the $M \times K_t$ matrix formed by the columns of $\Phi$ selected by the index set $\Omega_t$; in addition, the set of columns of $\Phi$ selected according to the "fixed support set" is defined; $\cup$ denotes the union operation, $|\cdot|$ the absolute value, $\langle X, Y \rangle$ the inner product of the vectors X and Y, $\|\cdot\|$ the 2-norm of a vector, and $(\cdot)^{-1}$ matrix inversion;
b6-2) Initialization: the support set is initialized with the "fixed support set", and the residual and the iteration counter t are initialized;
b6-3) If $K_t < K$, solve $\lambda_t = \arg\max_{j=1,\ldots,N} |\langle r_{t-1}, a_j \rangle|$ to find the index $\lambda_t$; otherwise, find the least-squares solution of $y = \Phi_t \theta_t$, namely $\hat{\theta}_t = \arg\min_{\theta_t} \| y - \Phi_t \theta_t \| = (\Phi_t^T \Phi_t)^{-1} \Phi_t^T y$, and go to step b6-8);
b6-4) Set $\Omega_t = \Omega_{t-1} \cup \{\lambda_t\}$ and $\Phi_t = [\Phi_{t-1}, a_{\lambda_t}]$;
b6-5) Find the least-squares solution of $y = \Phi_t \theta_t$: $\hat{\theta}_t = \arg\min_{\theta_t} \| y - \Phi_t \theta_t \| = (\Phi_t^T \Phi_t)^{-1} \Phi_t^T y$;
b6-6) Update the residual: $r_t = y - \Phi_t \hat{\theta}_t$;
b6-7) Set t = t + 1 and return to step b6-3);
b6-8) The reconstructed sparse speech signal $\hat{x}$ has nonzero entries at the indices in the support set $\Omega_t$, whose values are the least-squares solution $\hat{\theta}_t$ obtained above; the elements of $\hat{x}$ outside the support set $\Omega_t$ are set to 0, which reconstructs the sparse speech signal x.
Compared with the prior art, the invention has the advantages that:
Part of the position indices of the sparse speech signal is stored without increasing the storage cost, so the reconstruction accuracy is effectively improved compared with conventional compressed-sensing speech compression.
Drawings
FIG. 1 is a schematic flow chart of the superposition-sequence-based speech signal compression storage and reconstruction method according to an embodiment of the invention;
FIG. 2 is a schematic flow chart of the speech signal compression and storage process of the method according to an embodiment of the invention;
FIG. 3 is a schematic flow chart of the speech signal reconstruction and reproduction process of the method according to an embodiment of the invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail below with reference to the accompanying drawings by way of examples.
A flow chart of the speech signal compression storage and reconstruction method based on a superposition sequence is shown in FIG. 1.
The compression and storage process of the method is described in detail below, as shown in FIG. 2.
(a1) Read a sparsified speech signal x with sparsity K and length N, construct from the elements 0 and 1 an "original index sequence" of length N that records the position indices of the nonzero and zero elements of the speech signal, and store the sparsity K of the sparse speech signal;
The sparse speech signal is obtained by transforming a discrete speech signal from the time domain to the frequency domain with a time-frequency transform and setting the signal amplitudes below the silence threshold to zero according to a "psychoacoustic model", which yields a sparse speech signal x of length N.
The "psychoacoustic model" is, for example, the MPEG (Moving Picture Experts Group) psychoacoustic model or the OGG (Ogg Vorbis) psychoacoustic model.
The time-frequency transform is, for example, the discrete cosine transform, the short-time Fourier transform, or the wavelet transform.
The "original index sequence" of length N is constructed from the elements 0 and 1 as follows: each zero element of the sparse speech signal x of length N is recorded as the element 0 at the corresponding position of the "original index sequence", and each nonzero element is recorded as the element 1; the "original index sequence" constructed in this way is a sequence of length N whose elements are 0 or 1.
Example 1: An example of this construction is as follows:
Assume a sparse speech signal of length N = 18,
$x = (5.4, 3.2, 6.7, 0, 0.9, 0, 7.8, 0, 0, 1.2, 0.8, 0, 4.2, 0, 0, 0, 0, 0)^T$; then the "original index sequence" is $(1, 1, 1, 0, 1, 0, 1, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0)^T$,
where the superscript "T" denotes the transpose operation.
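The construction in Example 1 can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `build_index_sequence` is hypothetical and does not appear in the patent.

```python
def build_index_sequence(x):
    """Return a 0/1 list: 0 where x is zero, 1 where x is nonzero."""
    return [0 if value == 0 else 1 for value in x]


# Sparse speech signal from Example 1 (N = 18).
x = [5.4, 3.2, 6.7, 0, 0.9, 0, 7.8, 0, 0, 1.2, 0.8, 0, 4.2, 0, 0, 0, 0, 0]
index_seq = build_index_sequence(x)

# The sparsity K is the number of nonzero elements, stored alongside the signal.
sparsity = sum(index_seq)
```

The sparsity K falls out of the same pass, since it equals the number of ones in the index sequence.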
(a2) Read a pre-stored $M \times N$ measurement matrix $\Phi$ and compress the speech signal with it to generate a "compressed signal sequence" y of length M; the compression can be expressed as $y = \Phi x$;
The measurement matrix is an existing measurement matrix such as a Gaussian random matrix, a Bernoulli random matrix, or a partial Hadamard matrix;
M and N generally satisfy $M \le N$;
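As a rough illustration of step (a2), the sketch below builds a Gaussian random measurement matrix and computes $y = \Phi x$ with plain Python lists. The function names, the fixed seed, and the $1/\sqrt{M}$ scaling are assumptions made for this sketch; the patent only states that a pre-stored measurement matrix is used.

```python
import random


def gaussian_measurement_matrix(m, n, seed=0):
    """Assumed construction: i.i.d. Gaussian entries scaled by 1/sqrt(m)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) / m ** 0.5 for _ in range(n)] for _ in range(m)]


def compress(phi, x):
    """Compute y = Phi x, where phi is a list of M rows with N entries each."""
    return [sum(p * v for p, v in zip(row, x)) for row in phi]


x = [5.4, 3.2, 6.7, 0, 0.9, 0, 7.8, 0, 0, 1.2, 0.8, 0, 4.2, 0, 0, 0, 0, 0]
phi = gaussian_measurement_matrix(6, len(x))  # M = 6 <= N = 18
y = compress(phi, x)                          # compressed sequence of length M
```

Because the same matrix is needed again at reconstruction time, a fixed seed (or a stored matrix) stands in here for the "pre-stored" matrix of the text.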
(a3) Truncate the "original index sequence" of length N to obtain an "index sequence" A of length $\beta N$, where the truncation coefficient $\beta$ is set according to engineering experience and satisfies $0 < \beta \le 1$;
Example 2: An example of this truncation is as follows:
Continuing Example 1, assume $\beta = 0.5$, so that $\beta N = 9$; the "index sequence" then consists of the first nine elements of the "original index sequence", $A = (1, 1, 1, 0, 1, 0, 1, 0, 0)^T$.
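The truncation of Example 2 amounts to keeping a prefix of the original index sequence. In the sketch below the function name is hypothetical, and rounding $\beta N$ down when it is not an integer is an assumption, since the patent does not address that case.

```python
def truncate_index_sequence(index_seq, beta):
    """Keep the first beta * N elements (rounded down, by assumption)."""
    if not 0 < beta <= 1:
        raise ValueError("beta must satisfy 0 < beta <= 1")
    keep = int(beta * len(index_seq))
    return index_seq[:keep]


original_index_seq = [1, 1, 1, 0, 1, 0, 1, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0]
A = truncate_index_sequence(original_index_seq, 0.5)
```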
(a4) Compress the "index sequence" A of length $\beta N$ by Huffman coding to generate a "compressed index sequence" B of length $L_1$, then apply data conversion to B to obtain a "conversion index sequence" C of length $L_2$;
The data conversion is as follows: the data of the "compressed index sequence" B of length $L_1$ are divided into $L_2$ groups of $\gamma$ data each; if the number of data in B is not divisible by $\gamma$, B is zero-padded to a length divisible by $\gamma$; each group is then converted from a binary number into a decimal real value, which yields the "conversion index sequence" C of length $L_2$.
Example 3: An example of this conversion is as follows:
Assume the sequence B of length $L_1 = 62$ is $B = (1,0,0,1,0,1,0,1,0,1,0,1,1,1,0,1,\ 0,0,0,1,0,1,1,0,0,0,1,0,1,0,1,0,\ 1,1,1,0,0,1,0,1,0,1,1,1,0,1,0,1,\ 0,0,1,0,1,1,1,0,0,0,1,1,0,0)^T$. With $\gamma = 16$, two 0 bits are appended at the end and B is divided into 4 groups, i.e. $L_2 = 4$. The 4 groups are, in order, 1001010101011101, 0001011000101010, 1110010101110101, and 0010111000110000, which convert from binary to the decimal values 38237, 5674, 58741, and 11824. The "conversion index sequence" is therefore $C = (38237, 5674, 58741, 11824)^T$, a $4 \times 1$ vector.
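The data conversion of Example 3 can be checked with a short sketch. The function name `convert_bits_to_values` is hypothetical; the bit string is the 62-bit sequence B of the example.

```python
def convert_bits_to_values(bits, gamma):
    """Zero-pad bits to a multiple of gamma, then read each group as a binary number."""
    pad = (-len(bits)) % gamma
    padded = list(bits) + [0] * pad
    return [
        int("".join(str(b) for b in padded[i:i + gamma]), 2)
        for i in range(0, len(padded), gamma)
    ]


# Sequence B from Example 3 (L1 = 62 bits).
B = [int(ch) for ch in (
    "1001010101011101"
    "0001011000101010"
    "1110010101110101"
    "00101110001100"
)]
C = convert_bits_to_values(B, 16)
```

Each converted value occupies one storage slot instead of $\gamma$ slots, which is what lets the index information fit inside the length-M storage sequence after spreading.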
(a5) Spread the "conversion index sequence" C of length $L_2$ and zero-pad the result to construct a "spread index sequence" of length M;
Example 4: An example of constructing the "spread index sequence" from the "conversion index sequence" C of length $L_2$ by spreading and zero-padding is as follows:
a5-1) Assume the "conversion index sequence" $C = (3.8, 5.6, 5.8, 1.2)^T$, $L_2 = 4$, M = 25, and the spreading sequence $Q \in \mathbb{R}^{q \times 1}$, $Q = (1,1,1,1,1,1)^T$, where the spreading gain satisfies $q = \lfloor M / L_2 \rfloor = \lfloor 25/4 \rfloor = 6$.
$Q = (1,1,1,1,1,1)^T$ is chosen for simplicity; the spreading sequence may also be an m-sequence, an M-sequence, a Gold sequence, a Zadoff-Chu sequence, or the like.
The symbol $\lfloor \cdot \rfloor$ denotes rounding down to an integer.
a5-2) Compute the Kronecker product $S = C \otimes Q = (3.8\,Q^T, 5.6\,Q^T, 5.8\,Q^T, 1.2\,Q^T)^T$, which spreads the sequence C; the length of S is 24;
a5-3) Append zeros to the end of the vector S to extend its length from $L_2 \times q$ to M, i.e. from 24 to 25, which constructs the "spread index sequence" of length 25.
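Steps a5-1) to a5-3) can be sketched as follows, using the numbers of Example 4. The function name `spread_and_pad` is hypothetical, and the Kronecker product of the two column vectors is written out as a nested loop rather than taken from a library.

```python
def spread_and_pad(c, q_seq, m):
    """Spread c with q_seq (Kronecker product of column vectors), then zero-pad to length m."""
    s = [ci * qj for ci in c for qj in q_seq]
    if len(s) > m:
        raise ValueError("spread sequence is longer than M")
    return s + [0.0] * (m - len(s))


C = [3.8, 5.6, 5.8, 1.2]
M = 25
q = M // len(C)   # spreading gain q = floor(M / L2) = 6
Q = [1] * q       # all-ones spreading sequence, as in Example 4
spread_index_seq = spread_and_pad(C, Q, M)
```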
(a6) Assign the weights $\alpha$ and $1-\alpha$ to the "spread index sequence" of length M and the "compressed signal sequence" y respectively and superimpose them, i.e. $z = \alpha \cdot (\text{spread index sequence}) + (1-\alpha)\, y$, to generate a "storage sequence" z of length M, and store the "storage sequence" z;
The weight $\alpha$ is set according to engineering experience and satisfies $0 \le \alpha \le 1$;
Example 5: An example of constructing the "storage sequence" z is as follows:
Continuing Example 4, assume M = 25, the "spread index sequence" $(3.8\,Q^T, 5.6\,Q^T, 5.8\,Q^T, 1.2\,Q^T, 0)^T$, the "compressed signal sequence" $y = (y_1, y_2, \ldots, y_{24}, y_{25})^T$, $\alpha = 0.2$, and $1-\alpha = 0.8$; then, according to the formula, $z_{6(i-1)+j} = 0.2\,C_i + 0.8\,y_{6(i-1)+j}$ for i = 1, …, 4 and j = 1, …, 6, and $z_{25} = 0.8\,y_{25}$.
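The weighted superposition of step (a6) is an element-wise sum. The sketch below uses a hypothetical function name and short made-up sequences purely to show the arithmetic; they are not data from the patent.

```python
def superimpose(spread_index_seq, y, alpha):
    """Return z = alpha * spread_index_seq + (1 - alpha) * y, element by element."""
    if len(spread_index_seq) != len(y):
        raise ValueError("both sequences must have length M")
    if not 0 <= alpha <= 1:
        raise ValueError("alpha must satisfy 0 <= alpha <= 1")
    return [alpha * s + (1 - alpha) * v for s, v in zip(spread_index_seq, y)]


# Illustrative values only: two entries of a spread index sequence and of y.
z = superimpose([3.8, 0.0], [1.0, 2.0], alpha=0.2)
```

Since z has the same length M as y, the index information rides along without enlarging the stored sequence, which is the storage argument made in the text.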
The reconstruction and reproduction process of the method is described in detail below, as shown in FIG. 3.
(b1) Despread the "storage sequence" z of length M to restore the "conversion index sequence" C of length $L_2$;
Example 6: An example of this despreading is as follows:
Continuing Examples 4 and 5, assume the "storage sequence" $z \in \mathbb{R}^{M \times 1}$, M = 25, and the spreading sequence $Q \in \mathbb{R}^{q \times 1}$, $Q = (1,1,1,1,1,1)^T$, where q = 6 is the spreading gain;
$Q = (1,1,1,1,1,1)^T$ is chosen for simplicity; the spreading sequence may also be an m-sequence, an M-sequence, a Gold sequence, a Zadoff-Chu sequence, or the like.
b1-1) From Examples 4 and 5 it is known that $z = 0.2 \cdot (\text{spread index sequence}) + 0.8\,y$;
b1-2) Partition the sequence z into $L_2$ blocks of size $q \times 1$ and one $(M - L_2 \times q) \times 1$ pure speech signal sequence, i.e. into 4 sequences $z_1, z_2, z_3, z_4$ of length 6 and one pure speech signal sequence of length 1; then $z_i = 0.2\,C_i Q + 0.8\,y_i$,
where $y_i = (y_{i1}, y_{i2}, \ldots, y_{i6})^T$ is the i-th block of y (i = 1, …, 4);
b1-3) Despread $z_1, z_2, z_3, z_4$, and assume the despread data are $h = (4.56, 6.72, 6.96, 1.44)^T$, that is, $h_i = Q^T z_i = 0.2\,C_i\,Q^T Q + 0.8\,Q^T y_i$.
Taking i = 1 as an example, $h_1 = 0.2\,C_1 (Q_1^2 + Q_2^2 + \cdots + Q_6^2) + 0.8\,(y_{11} Q_1 + y_{12} Q_2 + \cdots + y_{16} Q_6)$;
b1-4) The speech signal sequence $y_{i1}, y_{i2}, \ldots, y_{i6}$ is linearly uncorrelated with $Q_1, Q_2, \ldots, Q_6$, so:
$0.8\,y_{i1} Q_1 + 0.8\,y_{i2} Q_2 + \cdots + 0.8\,y_{i6} Q_6 = 0$;
that is, $0.8\,y_{11} Q_1 + 0.8\,y_{12} Q_2 + \cdots + 0.8\,y_{16} Q_6 = 0$;
b1-5) Therefore $h_i = 0.2\,C_i (Q_1^2 + Q_2^2 + \cdots + Q_6^2)$,
that is, $h_1 = 0.2\,C_1 (Q_1^2 + Q_2^2 + \cdots + Q_6^2)$;
b1-6) The spreading sequence is known: $Q = (Q_1, Q_2, \ldots, Q_6)^T$, i.e. $Q = (1,1,1,1,1,1)^T$;
b1-7) Despreading therefore restores the "conversion index sequence" $C = (C_1, C_2, C_3, C_4)^T$.
That is, $4.56 = 0.2\,C_1 + 0.2\,C_1 + \cdots + 0.2\,C_1$ (six terms), which gives $C_1 = 3.8$;
Similarly, $C_2$, $C_3$, and $C_4$ are solved, so that despreading restores the "conversion index sequence" $C = (3.8, 5.6, 5.8, 1.2)^T$.
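The despreading argument of Example 6 can be sketched as a block-wise correlation with Q followed by division by $\alpha\, Q^T Q$. The function name is hypothetical. As in step b1-4), the sketch relies on each block of y being uncorrelated with Q; the y block below is made up so that this holds exactly, whereas for real data the cross term is only approximately zero.

```python
def despread(z, q_seq, l2, alpha):
    """Recover C from z block by block: C_i = (Q^T z_i) / (alpha * Q^T Q)."""
    q = len(q_seq)
    energy = sum(v * v for v in q_seq)  # Q^T Q
    restored = []
    for i in range(l2):
        block = z[i * q:(i + 1) * q]
        h_i = sum(zj * qj for zj, qj in zip(block, q_seq))  # despread datum h_i
        restored.append(h_i / (alpha * energy))
    return restored


C = [3.8, 5.6, 5.8, 1.2]
Q = [1.0] * 6
alpha = 0.2
y_block = [1.0, -1.0, 2.0, -2.0, 0.5, -0.5]  # made-up block with Q^T y_i = 0
z = [alpha * ci * qj + (1 - alpha) * yj
     for ci in C for qj, yj in zip(Q, y_block)]
z.append((1 - alpha) * 0.3)                  # trailing pure speech sample
restored_C = despread(z, Q, l2=4, alpha=alpha)
```

With $\alpha = 0$ nothing of C is stored and the division is undefined, so the sketch implicitly assumes $\alpha > 0$.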
(b2) Spread the restored "conversion index sequence" C of length $L_2$ and zero-pad the result to construct the "spread index sequence" of length M;
The construction of the "spread index sequence" is the same as in Example 4.
(b3) Solve for the "compressed signal sequence" y of length M using $y = \left(z - \alpha \cdot (\text{spread index sequence})\right) / (1-\alpha)$;
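Step (b3) inverts the superposition of step (a6). A minimal sketch with a hypothetical function name and made-up numbers; it assumes $\alpha < 1$, since otherwise y is not stored at all and the division is undefined.

```python
def recover_compressed_sequence(z, spread_index_seq, alpha):
    """Invert z = alpha * s + (1 - alpha) * y for y (assumes alpha < 1)."""
    if not 0 <= alpha < 1:
        raise ValueError("alpha must satisfy 0 <= alpha < 1 to recover y")
    return [(zi - alpha * si) / (1 - alpha)
            for zi, si in zip(z, spread_index_seq)]


# Illustrative round trip: z was formed from s = (3.8, 0) and y = (1, 2) with alpha = 0.2.
s = [3.8, 0.0]
z = [0.2 * 3.8 + 0.8 * 1.0, 0.8 * 2.0]
y = recover_compressed_sequence(z, s, alpha=0.2)
```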
(b4) Apply data restoration to the "conversion index sequence" C of length $L_2$ to obtain the "compressed index sequence" B of length $L_1$, then apply Huffman decoding to restore the "index sequence" A of length $\beta N$;
The data restoration is as follows: the real-valued elements of the "conversion index sequence" C of length $L_2$ are converted into binary numbers, and zero-valued elements are removed from the tail of the resulting binary sequence so that $L_1$ elements remain; the sequence formed by the remaining elements is the "compressed index sequence" B.
Example 7: An example of this data restoration is as follows:
Continuing Example 3, assume the "conversion index sequence" $C = (38237, 5674, 58741, 11824)^T$ and that the sequence B has length $L_1 = 62$. Converting the real-valued elements into 16-bit binary numbers gives the sequence 1001010101011101000101100010101011100101011101010010111000110000; removing the last two 0 bits from the tail of this binary sequence restores $B = (1,0,0,1,0,1,0,1,0,1,0,1,1,1,0,1,\ 0,0,0,1,0,1,1,0,0,0,1,0,1,0,1,0,\ 1,1,1,0,0,1,0,1,0,1,1,1,0,1,0,1,\ 0,0,1,0,1,1,1,0,0,0,1,1,0,0)^T$.
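The data restoration of Example 7 can be sketched as the inverse of the conversion in Example 3. The function name is hypothetical. Note that B itself may end in zeros (it does in this example), so the sketch trims to the known length $L_1$ instead of stripping every trailing zero; treating $L_1$ as available at restoration time is an assumption of the sketch.

```python
def restore_bits(values, gamma, l1):
    """Expand each value to gamma bits, concatenate, and keep the first l1 bits."""
    bits = []
    for value in values:
        bits.extend(int(ch) for ch in format(int(round(value)), "0{}b".format(gamma)))
    return bits[:l1]


C = [38237, 5674, 58741, 11824]
B = restore_bits(C, gamma=16, l1=62)
```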
(b5) Record the positions (column indices) of the nonzero elements of the "index sequence" A of length $\beta N$ in a set to form a "fixed support set";
Example 8: An example of forming the "fixed support set" is as follows:
Let the "index sequence" be $A = (1, 1, 1, 0, 1, 0, 1, 0, 0)^T$; recording the positions of the nonzero elements of the "index sequence" A in a set forms the "fixed support set" $\{1, 2, 3, 5, 7\}$.
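Example 8 reduces to collecting the 1-based positions of the ones in A. The function name below is hypothetical.

```python
def fixed_support_set(index_seq):
    """Return the 1-based positions of the nonzero elements of the index sequence."""
    return [position for position, bit in enumerate(index_seq, start=1) if bit != 0]


A = [1, 1, 1, 0, 1, 0, 1, 0, 0]
support = fixed_support_set(A)
```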
(b6) With the assistance of the "fixed support set" and in combination with a reconstruction algorithm, reconstruct the sparse speech signal x of length N from the "compressed signal sequence" y of length M.
Assistance by the "fixed support set" means that, during reconstruction with the reconstruction algorithm, the "fixed support set" is retained every time the support set is updated in an iteration, thereby assisting reconstruction.
The reconstruction algorithm is, for example, the matching pursuit algorithm, the orthogonal matching pursuit algorithm, or the regularized orthogonal matching pursuit algorithm, among others.
Taking the orthogonal matching pursuit algorithm as the reconstruction algorithm, step b6) comprises:
b6-1) Read the "compressed signal sequence" $y \in \mathbb{R}^{M \times 1}$, the measurement matrix $\Phi \in \mathbb{R}^{M \times N}$, and the sparsity K. Here t denotes the iteration count; $r_t$ the residual at iteration t; $\Omega_t$ the index set (of column indices) at iteration t, i.e. the support set at iteration t; $K_t$ the number of elements in $\Omega_t$; $\hat{\theta}_t$ a $K_t \times 1$ vector; $\lambda_t$ the index (column index) found at iteration t; $a_j$ the j-th column of $\Phi$ (j = 1, 2, …, N); $\Phi_t$ the $M \times K_t$ matrix formed by the columns of $\Phi$ selected by the index set $\Omega_t$; in addition, the set of columns of $\Phi$ selected according to the "fixed support set" is defined; $\cup$ denotes the union operation, $|\cdot|$ the absolute value, $\langle X, Y \rangle$ the inner product of the vectors X and Y, $\|\cdot\|$ the 2-norm of a vector, and $(\cdot)^{-1}$ matrix inversion;
b6-2) Initialization: the support set is initialized with the "fixed support set", and the residual and the iteration counter t are initialized;
b6-3) If $K_t < K$, solve $\lambda_t = \arg\max_{j=1,\ldots,N} |\langle r_{t-1}, a_j \rangle|$ to find the index $\lambda_t$; otherwise, find the least-squares solution of $y = \Phi_t \theta_t$, namely $\hat{\theta}_t = \arg\min_{\theta_t} \| y - \Phi_t \theta_t \| = (\Phi_t^T \Phi_t)^{-1} \Phi_t^T y$, and go to step b6-8);
b6-4) Set $\Omega_t = \Omega_{t-1} \cup \{\lambda_t\}$ and $\Phi_t = [\Phi_{t-1}, a_{\lambda_t}]$;
Example 9: An example of step b6-4) is as follows:
Following the procedure of Example 8, assume t = 1 and $\lambda_t = 17$, and that the support set before the update is $\Omega_{t-1} = \{1, 4, 7, 10, 14\}$; then $\Omega_t = \Omega_{t-1} \cup \{\lambda_t\} = \{1, 4, 7, 10, 14, 17\}$.
b6-5) Find the least-squares solution of $y = \Phi_t \theta_t$: $\hat{\theta}_t = \arg\min_{\theta_t} \| y - \Phi_t \theta_t \| = (\Phi_t^T \Phi_t)^{-1} \Phi_t^T y$;
b6-6) Update the residual: $r_t = y - \Phi_t \hat{\theta}_t$;
b6-7) Set t = t + 1 and return to step b6-3);
b6-8) The reconstructed sparse speech signal $\hat{x}$ has nonzero entries at the indices in the support set $\Omega_t$, whose values are the least-squares solution $\hat{\theta}_t$ obtained above; the elements of $\hat{x}$ outside the support set $\Omega_t$ are set to 0, which reconstructs the sparse speech signal x.
Example 10: an example of the sparse speech signal x reconstruction is as follows:
sparse speech signal obtained by hypothesis reconstructionLength N25, in the support set ΩtWith non-zero terms whose values are the least-squares solution ofΩt={1,4,5,7,8,10,14,17,19,23},Will be provided withIn the support set omegatThe element outside the index is set to 0, then
I.e. reconstructing a sparse speech signal with a length N-25
x=(x1,0,0,x4,x5,0,0,x7,x8,0,0,x10,0,0,0,x14,0,0,x17,0,x19,0,0,0,x23,0,0)
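Steps b6-1) to b6-8) can be sketched as an orthogonal matching pursuit loop whose support set starts from the fixed support set and never drops it. This is a sketch under stated assumptions, not the patent's exact procedure: the function names are hypothetical, indices are 0-based, the initial residual is taken after a least-squares fit on the fixed support (the patent's initialization formula is not legible in this text), candidate columns already in the support are skipped, and the selected columns are assumed linearly independent so that the normal equations are solvable.

```python
def solve_least_squares(cols, y):
    """Solve (Phi_t^T Phi_t) theta = Phi_t^T y by Gauss-Jordan elimination."""
    k = len(cols)
    aug = [[sum(a * b for a, b in zip(cols[i], cols[j])) for j in range(k)]
           + [sum(a * v for a, v in zip(cols[i], y))] for i in range(k)]
    for p in range(k):
        best = max(range(p, k), key=lambda r: abs(aug[r][p]))  # partial pivoting
        aug[p], aug[best] = aug[best], aug[p]
        pivot = aug[p][p]
        aug[p] = [v / pivot for v in aug[p]]
        for r in range(k):
            if r != p:
                factor = aug[r][p]
                aug[r] = [vr - factor * vp for vr, vp in zip(aug[r], aug[p])]
    return [aug[i][k] for i in range(k)]


def omp_with_fixed_support(phi, y, sparsity, fixed_support):
    """phi: M rows of N entries; fixed_support: 0-based column indices kept throughout."""
    m, n = len(phi), len(phi[0])
    columns = [[phi[r][j] for r in range(m)] for j in range(n)]
    support = list(fixed_support)

    def fit(sup):
        # Least-squares solution on the current support and the resulting residual.
        if not sup:
            return [], list(y)
        theta = solve_least_squares([columns[j] for j in sup], y)
        residual = [y[r] - sum(t * columns[j][r] for t, j in zip(theta, sup))
                    for r in range(m)]
        return theta, residual

    theta, residual = fit(support)
    while len(support) < sparsity:
        candidates = [j for j in range(n) if j not in support]
        lam = max(candidates,
                  key=lambda j: abs(sum(a * b for a, b in zip(residual, columns[j]))))
        support.append(lam)  # the fixed support is never removed
        theta, residual = fit(support)

    x_hat = [0.0] * n
    for t, j in zip(theta, support):
        x_hat[j] = t
    return x_hat


# Tiny made-up check: x = (2, 0, 3, 0), K = 2, position 0 known in advance.
phi = [[1, 0, 0, 1],
       [0, 1, 0, 1],
       [0, 0, 1, 0]]
y = [2, 0, 3]
x_hat = omp_with_fixed_support(phi, y, sparsity=2, fixed_support=[0])
```

Seeding the support with known positions means fewer greedy selections remain, which is the mechanism by which the stored index information is meant to help reconstruction.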
It will be appreciated by those of ordinary skill in the art that the examples described herein are intended to assist the reader in understanding the manner in which the invention is practiced, and it is to be understood that the scope of the invention is not limited to such specifically recited statements and examples. Those skilled in the art can make various other specific changes and combinations based on the teachings of the present invention without departing from the spirit of the invention, and these changes and combinations are within the scope of the invention.

Claims (8)

1. A method for compressing, storing and reconstructing a speech signal based on a superposition sequence, characterized by comprising the following steps: (a) compression and storage processing of the speech signal:
(a1) reading a sparsified speech signal x with sparsity K and length N, constructing an "original index sequence" of length N from the elements 0 and 1 to record the position indices of the non-zero and zero elements of the speech signal, and storing the sparsity K of the sparse speech signal x;
(a2) reading a pre-stored M × N measurement matrix Φ and compressing the speech signal with the measurement matrix to generate a "compressed signal sequence" y of length M, the compression being expressed as y = Φx;
M and N generally satisfy M ≤ N;
(a3) truncating the "original index sequence" of length N to obtain an "index sequence" A of length βN, wherein the truncation coefficient β is set according to engineering experience and satisfies 0 < β ≤ 1;
(a4) compressing and encoding the "index sequence" A of length βN by Huffman coding to generate a "compressed index sequence" B of length L1, and performing data conversion on B to obtain a "conversion index sequence" C of length L2;
(a5) spreading the "conversion index sequence" C of length L2 and zero-padding the result to construct a "spreading index sequence" of length M;
(a6) assigning the weights α and 1 − α to the "spreading index sequence" of length M and the "compressed signal sequence" y, respectively, superposing them to generate a "storage sequence" z of length M, and storing the "storage sequence" z;
the weight α is set according to engineering experience and satisfies 0 ≤ α ≤ 1;
(b) reconstruction and reproduction processing of the speech signal:
(b1) despreading the "storage sequence" z of length M to recover the "conversion index sequence" C of length L2;
(b2) spreading the "conversion index sequence" C of length L2 and zero-padding the result to construct the "spreading index sequence" of length M;
(b3) solving for the "compressed signal sequence" y of length M from the "storage sequence" z and the "spreading index sequence";
(b4) performing data restoration on the "conversion index sequence" C of length L2 to obtain the "compressed index sequence" B of length L1, and then recovering the "index sequence" A of length βN by Huffman decoding;
(b5) recording the column indices of the non-zero elements of the "index sequence" A of length βN in a set to form a "fixed support set";
(b6) reconstructing the sparse speech signal x of length N from the "compressed signal sequence" y of length M by means of a reconstruction algorithm assisted by the "fixed support set".
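The superposition of step (a6) and its inversion in step (b3) can be sketched as follows. The formula itself is not reproduced in this text, so the linear form z = α·s + (1 − α)·y is an assumption derived from the stated weights α and 1 − α; the function names and sample values are hypothetical.

```python
def superpose(spread_index, compressed, alpha):
    """Weighted superposition z = alpha * s + (1 - alpha) * y (assumed form)."""
    if not 0 <= alpha <= 1:
        raise ValueError("alpha must lie in [0, 1]")
    if len(spread_index) != len(compressed):
        raise ValueError("both sequences must have length M")
    return [alpha * s + (1 - alpha) * y for s, y in zip(spread_index, compressed)]


def recover_compressed(stored, spread_index, alpha):
    """Invert the superposition for y once s has been rebuilt; needs alpha < 1."""
    if not 0 <= alpha < 1:
        raise ValueError("alpha must lie in [0, 1) to recover y")
    return [(z - alpha * s) / (1 - alpha) for z, s in zip(stored, spread_index)]


s = [1.0, -1.0, 0.0, 0.0]      # hypothetical spreading index sequence, M = 4
y = [0.5, -1.0, 2.0, 0.25]     # hypothetical compressed signal sequence
z = superpose(s, y, alpha=0.25)
y_hat = recover_compressed(z, s, alpha=0.25)
```

Under this assumed form, y can only be recovered when α < 1, because α = 1 would discard the compressed signal entirely.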
2. The method of claim 1, wherein: the sparse speech signal of step (a1) is obtained by transforming a discrete speech signal from the time domain to the frequency domain by a time-frequency transform and setting the signal amplitudes below the mute threshold to zero according to a "psychoacoustic model", which yields the sparse speech signal x of length N.
3. The method of claim 1, wherein: constructing the "original index sequence" of length N from the elements 0 and 1 in step (a1) to record the position indices of the non-zero and zero elements of the speech signal comprises: recording each zero element of the sparse speech signal x of length N as the element 0 at the corresponding position of the "original index sequence", and recording each non-zero element as the element 1 at the corresponding position, so that the "original index sequence" thus constructed is a sequence of length N whose elements are 0 or 1.
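A minimal sketch of the index-sequence construction described in claim 3, assuming the sparse signal is held in a plain Python list. The function name and the sample signal are hypothetical.

```python
def build_index_sequence(x):
    """Return a 0/1 list marking which entries of the sparse signal x are non-zero."""
    return [1 if value != 0 else 0 for value in x]


x = [0.0, 1.5, 0.0, 0.0, -0.7, 0.0]    # hypothetical sparse signal, N = 6
index_seq = build_index_sequence(x)     # original index sequence of length N
sparsity = sum(index_seq)               # K, the number of non-zero entries
```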
4. The method of claim 1, wherein: the data conversion of step (a4) comprises: dividing the data of the "compressed index sequence" B of length L1 into L2 groups of γ data each, and, if the number of data in the sequence B is not divisible by γ, zero-padding the sequence so that its length is divisible by γ; and converting each group from a binary number to a decimal real value, thereby obtaining the "conversion index sequence" C of length L2.
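The grouping and binary-to-decimal conversion of claim 4 can be illustrated with a short sketch. It assumes the compressed index sequence B is a list of 0/1 integers and that the padding zeros are appended at the tail; the function name and the sample bits are hypothetical.

```python
def bits_to_values(bits, gamma):
    """Group bits into blocks of gamma (zero-padding the tail) and read each block as an integer."""
    padded = list(bits)
    remainder = len(padded) % gamma
    if remainder:
        padded += [0] * (gamma - remainder)   # make the length divisible by gamma
    values = []
    for start in range(0, len(padded), gamma):
        group = padded[start:start + gamma]
        values.append(int("".join(str(bit) for bit in group), 2))
    return values


b = [1, 0, 1, 1, 0, 1, 1]            # hypothetical compressed index sequence, L1 = 7
c = bits_to_values(b, gamma=3)       # conversion index sequence, L2 = 3
```

Here the seven bits are padded to nine, split into `101`, `101` and `100`, and converted to 5, 5 and 4.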
5. The method of claim 1, wherein: constructing the "spreading index sequence" of length M from the "conversion index sequence" C of length L2 by spreading and zero-padding in step (a5) comprises the following steps:
a5-1) given the "conversion index sequence" C of length L2, letting Q ∈ R^(q×1) be a spreading sequence, where the spreading gain q satisfies q = ⌊M / L2⌋,
wherein the symbol ⌊·⌋ denotes rounding down to an integer;
a5-2) computing the Kronecker product of C and Q,
which spreads the sequence C into a vector S of length L2 × q,
wherein the superscript "T" denotes the transpose operation;
a5-3) appending zeros to the end of the vector S to extend its length from L2 × q to M, thereby constructing the "spreading index sequence",
whose length is M.
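Steps a5-1) to a5-3) can be sketched as below. The spreading gain is taken as q = M // L2, and the ±1 spreading sequence is a hypothetical choice; the claim does not specify the values of Q. The function name is likewise an assumption.

```python
def spread_and_pad(c, q_seq, m):
    """Spread c by the Kronecker product with q_seq, then zero-pad the result to length m."""
    spread = [ci * qj for ci in c for qj in q_seq]   # Kronecker product of two vectors
    if len(spread) > m:
        raise ValueError("spread sequence is longer than M")
    return spread + [0] * (m - len(spread))


c = [5, 5, 4]                      # hypothetical conversion index sequence, L2 = 3
m = 14                             # hypothetical storage length M
q = m // len(c)                    # spreading gain, here 4
q_seq = [1, -1, 1, -1][:q]         # hypothetical +/-1 spreading sequence of length q
s = spread_and_pad(c, q_seq, m)
```

With these values the Kronecker product has length 12, so two zeros are appended to reach M = 14.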
6. The method of claim 1, wherein: the specific steps of constructing the "spreading index sequence" of length M from the "conversion index sequence" C of length L2 by spreading and zero-padding in step (b2) are the same as steps a5-1) to a5-3).
7. The method of claim 1, wherein: the data restoration of step (b4) comprises: converting the real-valued elements of the "conversion index sequence" C of length L2 into binary numbers, and removing zero-valued elements from the tail of the resulting binary sequence so that the length of the remaining elements is L1; the sequence formed by the remaining elements is the "compressed index sequence" B.
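The data restoration of claim 7 can be sketched as the inverse of the grouping step. The sketch assumes that the group width γ and the original length L1 are known at reconstruction time, and that each recovered value is rounded to the nearest integer before conversion; both are assumptions, as is the function name.

```python
def values_to_bits(values, gamma, l1):
    """Expand each value to gamma bits, then drop the padded tail so that l1 bits remain."""
    bits = []
    for value in values:
        # Zero-padded binary representation of the rounded value, gamma digits wide.
        bits.extend(int(ch) for ch in format(int(round(value)), f"0{gamma}b"))
    return bits[:l1]


c = [5.0, 5.0, 4.0]                          # hypothetical recovered conversion index sequence
b = values_to_bits(c, gamma=3, l1=7)         # restored compressed index sequence
```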
8. The method of claim 1, wherein: the assistance by the "fixed support set" in step (b6) means that, during reconstruction with the reconstruction algorithm, the "fixed support set" is retained in the support set every time the support set is updated in an iteration, thereby assisting the reconstruction.
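One way to picture the support-set assistance of claim 8 is a greedy pursuit that starts from the fixed support set and never removes its indices. The sketch below uses orthogonal matching pursuit as the reconstruction algorithm, which is an assumption: the claim does not name a specific algorithm. The function names, the small measurement matrix and the test signal are hypothetical, and indices are 0-based.

```python
def least_squares(columns, y):
    """Solve min ||A c - y|| through the normal equations; A is given as a list of columns."""
    n = len(columns)
    gram = [[sum(a * b for a, b in zip(columns[i], columns[j])) for j in range(n)]
            for i in range(n)]
    rhs = [sum(a * b for a, b in zip(columns[i], y)) for i in range(n)]
    for k in range(n):                               # Gaussian elimination, partial pivoting
        p = max(range(k, n), key=lambda i: abs(gram[i][k]))
        gram[k], gram[p] = gram[p], gram[k]
        rhs[k], rhs[p] = rhs[p], rhs[k]
        for i in range(k + 1, n):
            factor = gram[i][k] / gram[k][k]
            for j in range(k, n):
                gram[i][j] -= factor * gram[k][j]
            rhs[i] -= factor * rhs[k]
    coef = [0.0] * n
    for k in reversed(range(n)):                     # back substitution
        tail = sum(gram[k][j] * coef[j] for j in range(k + 1, n))
        coef[k] = (rhs[k] - tail) / gram[k][k]
    return coef


def omp_fixed_support(phi, y, k, fixed_support):
    """Greedy recovery of a k-sparse x from y = phi x; fixed_support stays in the support."""
    m, n = len(phi), len(phi[0])
    cols = [[phi[i][j] for i in range(m)] for j in range(n)]
    support = sorted(set(fixed_support))             # start from the known indices
    while True:
        coef = least_squares([cols[j] for j in support], y) if support else []
        residual = list(y)
        for cj, j in zip(coef, support):
            for i in range(m):
                residual[i] -= cj * cols[j][i]
        if len(support) >= k:
            break
        candidates = [j for j in range(n) if j not in support]
        best = max(candidates,
                   key=lambda j: abs(sum(a * b for a, b in zip(cols[j], residual))))
        support.append(best)                         # indices are only ever added
    x = [0.0] * n
    for cj, j in zip(coef, support):
        x[j] = cj
    return x


phi = [[1, 0, 0, 1], [0, 1, 0, 1], [0, 0, 1, 1]]     # hypothetical 3 x 4 measurement matrix
y = [-1, 1, -1]                                      # phi applied to x = [0, 2, 0, -1]
x_hat = omp_fixed_support(phi, y, k=2, fixed_support=[1])
```

Seeding the support with the known indices means the greedy search only has to find the positions that were lost when the index sequence was truncated.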
CN201810497026.XA 2018-05-22 2018-05-22 Voice signal compression storage and reconstruction method based on superposition sequence Active CN108962265B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810497026.XA CN108962265B (en) 2018-05-22 2018-05-22 Voice signal compression storage and reconstruction method based on superposition sequence

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810497026.XA CN108962265B (en) 2018-05-22 2018-05-22 Voice signal compression storage and reconstruction method based on superposition sequence

Publications (2)

Publication Number Publication Date
CN108962265A (en) 2018-12-07
CN108962265B (en) 2020-08-25

Family

ID=64499535

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810497026.XA Active CN108962265B (en) 2018-05-22 2018-05-22 Voice signal compression storage and reconstruction method based on superposition sequence

Country Status (1)

Country Link
CN (1) CN108962265B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109818645A (en) * 2019-02-20 2019-05-28 Xihua University Superimposed CSI feedback method based on signal detection and support-set assistance
CN109817229A (en) * 2019-03-14 2019-05-28 Xihua University Single-bit audio compression transmission and reconstruction method assisted by superposition characteristic information

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2014505415A (en) * 2011-01-10 2014-02-27 Alcatel-Lucent Method and apparatus for measuring and recovering a sparse signal
CN105099462A (en) * 2014-05-22 2015-11-25 Beijing University of Posts and Telecommunications Signal processing method based on compressive sensing
CN105206277A (en) * 2015-08-17 2015-12-30 Xihua University Speech compression method based on one-bit compressed sensing
CN105933008A (en) * 2016-04-15 2016-09-07 Harbin Institute of Technology Multiband signal reconstruction method based on clustering sparse regularized orthogonal matching pursuit algorithm

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
PHILLIP NORTH ET AL.: "One-bit Compressive Sensing with partial support", 2015 IEEE 6TH INTERNATIONAL WORKSHOP ON COMPUTATIONAL ADVANCES IN MULTI-SENSOR ADAPTIVE PROCESSING (CAMSAP) *
ZHANG JINGCHAO ET AL.: "1-Bit compressed sensing blind reconstruction algorithm", Journal of Electronics & Information Technology *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109818645A (en) * 2019-02-20 2019-05-28 Xihua University Superimposed CSI feedback method based on signal detection and support-set assistance
CN109817229A (en) * 2019-03-14 2019-05-28 Xihua University Single-bit audio compression transmission and reconstruction method assisted by superposition characteristic information
CN109817229B (en) * 2019-03-14 2020-09-22 Xihua University Single-bit audio compression transmission and reconstruction method assisted by superposition characteristic information

Also Published As

Publication number Publication date
CN108962265B (en) 2020-08-25

Similar Documents

Publication Publication Date Title
Needell et al. Stable image reconstruction using total variation minimization
US7508325B2 (en) Matching pursuits subband coding of data
Herrholz et al. Compressive sensing principles and iterative sparse recovery for inverse and ill-posed problems
JP6177239B2 (en) Adapt analysis weighting window or synthesis weighting window for transform coding or transform decoding
CN107919938B (en) Signal sampling recovery method and device suitable for OvXDM system and OvXDM system
Li et al. Phase retrieval from multiple-window short-time Fourier measurements
CN108962265B (en) Voice signal compression storage and reconstruction method based on superposition sequence
JP2011137817A (en) Method for reconstructing streaming signal from streaming measurement value
Pope et al. Probabilistic recovery guarantees for sparsely corrupted signals
Zou et al. Robust compressive sensing of multichannel EEG signals in the presence of impulsive noise
Tawfic et al. Improving recovery of ECG signal with deterministic guarantees using split signal for multiple supports of matching pursuit (SS-MSMP) algorithm
CN103456148B (en) The method and apparatus of signal reconstruction
Yu et al. Medical image compression with thresholding denoising using discrete cosine-based discrete orthogonal stockwell transform
Desai et al. Compressive sensing in speech processing: A survey based on sparsity and sensing matrix
Joshi et al. Analysis of compressive sensing for non stationary music signal
Shawky et al. Efficient compression and reconstruction of speech signals using compressed sensing
JP2018513996A (en) Method and device for encoding multiple audio signals and method and device for decoding a mixture of multiple audio signals with improved separation
CN115861472A (en) Image reconstruction method, device, equipment and medium
Kromka et al. Multiwavelet toolbox for MATLAB
Ambat et al. On selection of search space dimension in compressive sampling matching pursuit
Bhadoria et al. Comparative analysis of basis & measurement matrices for non-speech audio signal using compressive sensing
Barzideh et al. Imposing shift-invariance using flexible structure dictionary learning (FSDL)
Kasem et al. Perceptual compressed sensing and perceptual sparse fast fourier transform for audio signal compression
Bala et al. Effect of sparsity on speech compressed sensing
CN105551503A (en) Audio matching tracking method based on atom pre-selection and system thereof

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
EE01 Entry into force of recordation of patent licensing contract

Application publication date: 20181207

Assignee: Suining Feidian Cultural Communication Co.,Ltd.

Assignor: XIHUA University

Contract record no.: X2023510000027

Denomination of invention: A method for compressing, storing, and reconstructing speech signals based on stacked sequences

Granted publication date: 20200825

License type: Common License

Record date: 20231129
