CN104392728A - Color complex spectrogram construction method for speech reconstruction - Google Patents

Color complex spectrogram construction method for speech reconstruction

Info

Publication number
CN104392728A
CN104392728A (application CN201410688088.0A; granted as CN104392728B)
Authority
CN
China
Prior art keywords
matrix
real part
channel
symbol
imaginary
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201410688088.0A
Other languages
Chinese (zh)
Other versions
CN104392728B (en)
Inventor
王双维
李广岩
梁士利
王春蕾
曹晓林
郑彩侠
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Northeast Normal University
Original Assignee
Northeast Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Northeast Normal University
Priority to CN201410688088.0A
Publication of CN104392728A
Application granted
Publication of CN104392728B
Legal status: Expired - Fee Related
Anticipated expiration

Landscapes

  • Auxiliary Devices For Music (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The invention provides a color complex spectrogram construction method for speech reconstruction, belonging to the technical field of speech signal processing. Two color channels serve respectively as the real part and the imaginary part of the Fourier transform: in the R-G-B color space, the position coordinates of the R-B composite color correspond to the real and imaginary parts of the Fourier transform, while the G channel encodes the sign combination of the real and imaginary parts. From the R-G-B color proportions, the real part, the imaginary part, and their signs for the corresponding complex value can be recovered. The spectrogram can therefore be processed as an image and the speech then reconstructed, so that techniques such as image-based enhancement can be applied before the inverse Fourier transform, realizing speech reconstruction.

Description

Color complex spectrogram construction method capable of realizing speech reconstruction
Technical Field
The invention belongs to the field of voice signal processing, and relates to a visual color spectrogram construction method capable of realizing voice reconstruction.
Background
The spectrogram is an effective tool for speech analysis and phonetics, and a readable symbol system for studying speech information. It simultaneously shows the closely related time-domain and frequency-domain characteristics and their interrelation, which neither a pure time-domain signal nor a pure frequency-domain signal, nor a simple juxtaposition of the two, can do. The amount of information carried by the spectrogram is therefore far greater than the sum of that carried by the time-domain and frequency-domain signals alone. Recent research includes extracting texture features with image processing techniques and, combined with a subsequent classifier, realizing voice identity authentication and confirmation of specific words of a specific speaker; recognizing singing voice under background music using spectrogram textures; and performing speech recognition based on local gradient computation on the spectrogram. Zhao Shenghui et al. of Beijing Institute of Technology proposed "a speech spectrogram color enhancement method for voice visualization" and patented it (200910235643.3).
However, in past research the spectrogram has mostly served as a visual display of spectral features, and the data actually analyzed were still the original speech signal rather than the spectrogram itself. In particular, since the spectrogram is a visual representation of the amplitude-frequency characteristics of speech and lacks phase information, speech reconstruction from the spectrogram has not been possible. Although the color spectrogram is based on three color channels, it is merely a pseudo-color rendering of the gray spectrogram and gains no additional information dimension from color.
Disclosure of Invention
Technical problem to be solved
The invention aims to provide a visual color spectrogram construction method capable of realizing speech reconstruction, which uses the R channel and the B channel of an RGB color model to represent the real part and the imaginary part of the speech time-frequency analysis respectively, while the G channel marks the sign combination of the real and imaginary parts, forming a complex spectrogram with a three-dimensional information structure. From this spectrogram, the magnitudes of the real and imaginary parts of the time-frequency analysis can be obtained by extracting the R-channel and B-channel data, their signs can be obtained by decoding the G channel, the complex time-frequency analysis matrix can then be generated, and speech reconstruction can finally be realized through the inverse Fourier transform.
The invention is not limited to the decomposition and reconstruction of human speech, nor to sound signals in the audio range (20 Hz-20 kHz).
(II) technical scheme
In order to achieve the purpose, the invention adopts the following scheme:
1. Window and frame the original speech signal to form an N×M speech framing matrix S, where the number of rows N is the number of signal points per frame and the number of columns M is the number of frames of the original speech signal;
2. Perform an N-point DFT on each column of the framing matrix S; the result for the ith column is

X(n,i) = Σ_{k=1}^{N} S(k,i) e^{-j2π(n-1)(k-1)/N},  n = 1, …, N   (1)

and

X(n,i) = X_R(n,i) + j X_I(n,i)   (2)

where X(n,i) is the element in the nth row and ith column of the matrix X, X_R(n,i) is the real part of X(n,i), and X_I(n,i) is its imaginary part; X is an N×M complex matrix whose elements satisfy

X_R(n,i) = Re{X(n,i)}   (3)

and

X_I(n,i) = Im{X(n,i)}   (4)

so that

X = X_R + j X_I   (5)
3. Decompose the complex matrix X into two submatrices, the real part X_R and the imaginary part X_I, take their absolute values, and normalize the data so that the dynamic range is 0-1;
4. Construct the sign coding matrix: the real part and the imaginary part of the complex matrix each take the sign +, - or 0, giving 9 combinations in total. The invention marks these 9 combinations with 9 numerical values so as to preserve the sign information of the real and imaginary parts of the original complex matrix;
5. Construct a 3-dimensional matrix D: the normalized real-part submatrix is layer 1 in the layer dimension, the normalized imaginary-part submatrix is layer 3, and the sign coding matrix is layer 2;
6. Use the 3-dimensional matrix D as the driving matrix of an RGB color model to form a complex spectrogram composed of the red and blue primaries, in which the real-part submatrix corresponds to the red channel R, the imaginary-part submatrix corresponds to the blue channel B, and the sign coding matrix corresponds to the green channel G;
7. Speech reconstruction process: extract the R-channel, B-channel and G-channel data separately; decode the G channel to obtain the signs of the real and imaginary parts; assign these signs to the extracted R-channel and B-channel data; and construct a complex matrix from the two resulting matrices to obtain the normalized speech time-frequency analysis data. An inverse Fourier transform then yields the speech-signal framing matrix, and removing the framing forms the speech sequence, realizing speech reconstruction.
(III) Advantages (beneficial effects)
1. The invention uses two color channels to express the real part and the imaginary part of the Fourier transform respectively: in the R-G-B color space, the position coordinates of the R-B composite color correspond to the real and imaginary parts of the Fourier transform, and the G value encodes their sign combination. For example, the position of point A in the R-B color space of FIG. 1 marks real-part and imaginary-part magnitudes of 0.8 and 0.2 respectively, and the real part and imaginary part of the corresponding complex value, together with their signs, can be recovered from the R-G-B color matching;
2. The significance of this spectrogram is that the spectrogram itself can be processed as an image and the speech then reconstructed, so that speech enhancement can be achieved with image processing techniques. Although the power spectrum and the amplitude spectrum can also be processed with image techniques, they lack phase (sign) information and cannot be inverse-Fourier-transformed, so they cannot be reconstructed into speech.
Drawings
1. FIG. 1 shows that in the R-B color space, two color channels express the real part and the imaginary part of the Fourier transform respectively; the position coordinates of the combined color correspond to the magnitudes of the real and imaginary parts, with the abscissa (red channel) representing the real part and the ordinate (blue channel) the imaginary part. For example, point A is located at (0.8, 0.2) in the R-B color space, where the R-B color matching represents a real-part magnitude of 0.8 and an imaginary-part magnitude of 0.2. With G-channel sign decoding, the real part and the imaginary part of the corresponding complex value can be recovered from the color matching;
2. FIG. 2 is a flow chart for constructing and using the color complex spectrogram capable of realizing speech reconstruction.
Detailed Description
The examples in the schemes are used to illustrate the invention, but not to limit the scope of the invention.
The specific implementation of the invention is divided into 9 modules in two major parts; the flow is shown in FIG. 2. The following description takes a speech signal with a sampling rate of 16 kHz as an example:
1. A speech framing module: first window and frame the speech signal; for example, divide it into frames of 1024 points each, M frames in total, forming a 1024×M framing signal matrix. The frequency-domain resolution is then 16000/1024 = 15.625 Hz;
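As a concrete sketch of this framing module (assuming a Hann window and non-overlapping frames, neither of which the text fixes), one second of 16 kHz samples yields the 1024×M framing matrix:

```python
import numpy as np

fs = 16000                      # sampling rate used in the example (16 kHz)
N = 1024                        # points per frame
speech = np.random.randn(fs)    # stand-in for 1 s of speech; real input would be a recording

# Non-overlapping framing with a Hann window (overlap and window type are
# assumptions -- the patent only specifies windowing and framing).
M = len(speech) // N
frames = speech[:M * N].reshape(M, N).T   # N x M framing matrix (one frame per column)
window = np.hanning(N)
S = frames * window[:, None]              # windowed N x M matrix

freq_resolution = fs / N                  # 16000 / 1024 = 15.625 Hz
```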
2. A Fourier analysis module: according to formula (1), apply the FFT to each column of the 1024×M framing signal matrix to compute the DFT, obtaining the 1024-point DFT of the corresponding column and forming the 1024×M time-frequency analysis matrix of formulas (2)-(5). This matrix is complex; each element corresponds to the real and imaginary parts of the frequency characteristic of a certain frequency band at a certain moment;
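The column-wise DFT of this module can be sketched as follows; `np.fft.fft` along axis 0 transforms every column of the framing matrix at once (stand-in data, illustrative names):

```python
import numpy as np

N, M = 1024, 15
S = np.random.randn(N, M)          # windowed framing matrix (stand-in data)

# Column-wise N-point DFT via the FFT, as in step 2 of the scheme.
X = np.fft.fft(S, n=N, axis=0)     # N x M complex time-frequency matrix
XR, XI = X.real, X.imag            # real-part and imaginary-part submatrices
```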
3. A submatrix forming module: let d be the maximum absolute value of the real or imaginary part over all elements of the matrix X. Construct 2 matrices

R = |X_R| / d   (6)

B = |X_I| / d   (7)

R and B are the normalized submatrices corresponding to the absolute values of the real part X_R and the imaginary part X_I of X. Using d as the single normalization constant makes the dynamic ranges of R and B consistent;
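A minimal sketch of this normalization on a toy 2×2 example; the single constant d keeps both submatrices on the same 0-1 scale:

```python
import numpy as np

XR = np.array([[3.0, -1.5], [0.0, 2.0]])    # toy real-part submatrix
XI = np.array([[-4.0, 0.5], [1.0, -2.0]])   # toy imaginary-part submatrix

# d is the largest absolute value over BOTH submatrices, so the same
# constant normalizes the real and imaginary parts consistently.
d = max(np.abs(XR).max(), np.abs(XI).max())
R = np.abs(XR) / d
B = np.abs(XI) / d
```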
4. A sign coding matrix forming module: use the sign function to extract the signs of the real part X_R and the imaginary part X_I of X in formula (5):

S_R = sgn(X_R)   (8)

S_I = sgn(X_I)   (9)

The function sgn(x) outputs -1 when x < 0, +1 when x > 0, and 0 when x = 0. A weighted sum of formulas (8) and (9) gives the sign-combination code of the real part and the imaginary part:

C = 3 S_R + S_I   (10)
The sign-combination coding results of formula (10) are shown in Table 1; the 9 values in Table 1 mark the 9 states of the real/imaginary sign combinations. In order to visualize the sign-combination code with the G channel, the zero point of the results in Table 1 must be shifted and the values normalized, expressed by

G = (C + 4) / 800   (11)

As formula (11) shows, the value of G lies between 0 and 0.01; the results are shown in Table 2. Taking 800 as the normalization constant makes the maximum value of the G channel much smaller than the values of the R and B channels, so that the green of the G channel does not visually interfere with the R-B two-primary image when the spectrogram is visualized;
TABLE 1 Sign-combination coding C of the real part X_R and the imaginary part X_I

            S_I = -1   S_I = 0   S_I = +1
 S_R = -1     -4         -3        -2
 S_R = 0      -1          0        +1
 S_R = +1     +2         +3        +4

TABLE 2 Normalized sign-combination coding G of the real part X_R and the imaginary part X_I

            S_I = -1   S_I = 0    S_I = +1
 S_R = -1    0          0.00125    0.0025
 S_R = 0     0.00375    0.005      0.00625
 S_R = +1    0.0075     0.00875    0.01
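The sign coding can be sketched as follows. The concrete weighted sum C = 3·sgn(X_R) + sgn(X_I) and the shift by +4 are assumptions inferred from the stated properties (9 distinct codes, normalization constant 800, and G between 0 and 0.01), not taken verbatim from the patent:

```python
import numpy as np

XR = np.array([[3.0, -1.5, 0.0]])    # toy real-part submatrix
XI = np.array([[-4.0, 0.0, 2.0]])    # toy imaginary-part submatrix

# Assumed weighting yielding 9 distinct codes: C = 3*sgn(Re) + sgn(Im),
# then shift by +4 and divide by 800 so that G spans [0, 0.01].
C = 3 * np.sign(XR) + np.sign(XI)    # values in {-4, ..., +4}
G = (C + 4) / 800.0                  # normalized sign-combination code

codes = sorted({3 * a + b for a in (-1, 0, 1) for b in (-1, 0, 1)})
```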
5. An RGB color-model driving-matrix forming and visualization module: construct a 3-dimensional matrix D, with the real-part normalized absolute-value submatrix R as layer 1 in the layer dimension, the imaginary-part normalized absolute-value submatrix B as layer 3, and the sign-combination coding matrix G as layer 2. Use the 3-dimensional matrix D as the driving matrix of the RGB color model to form the color complex spectrogram: R corresponds to the red channel, B to the blue channel, and G to the green channel. Because the values of the G channel are far smaller than those of the R and B channels, the color complex spectrogram visually appears as an R-B two-primary image.
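Forming the driving matrix amounts to stacking the three layers along a depth axis; a sketch in NumPy (array names are illustrative):

```python
import numpy as np

N, M = 4, 3
R = np.random.rand(N, M)            # normalized |real part| (layer 1, red)
G = np.random.rand(N, M) * 0.01     # sign-combination code (layer 2, green), << R and B
B = np.random.rand(N, M)            # normalized |imaginary part| (layer 3, blue)

# Stack the three layers along a third axis to obtain the N x M x 3
# driving matrix of the RGB color model.
D = np.stack([R, G, B], axis=2)
```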
6. A frequency-domain submatrix extraction module: from the 3-dimensional matrix corresponding to the image-processed two-primary color complex spectrogram, extract layer 1 and layer 3 as the matrices R and B for later use;
7. A sign decoding module: take out the G-channel sign-combination code to form the normalized sign-combination coding matrix G;
(1) Real-part sign decoding. First recover the sign-combination coding matrix by

C = 800 G - 4   (11)

Then the real-part sign matrix is

S_R = ε(C - 2) - ε(-C - 2)   (12)

In formula (12), ε(x) is a step function: ε(x) = 1 when x > 0, ε(x) = 1 when x = 0, and ε(x) = 0 when x < 0. The result of formula (12) is: when C ≥ 2, the corresponding real-part sign is positive and S_R = +1; when C ≤ -2, the corresponding real-part sign is negative and S_R = -1; when -2 < C < 2, the corresponding real-part sign is zero and S_R = 0.
(2) Imaginary-part sign decoding, using the real-part sign decoding result:

S_I = C - 3 S_R   (13)

Analyzing formula (13): when C = 4, the corresponding imaginary-part sign is positive; since S_R = +1, formula (13) gives S_I = 4 - 3 = +1, so S_R and S_I are both +1. The remaining cases follow in the same way.
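Assuming the coding C = 3·S_R + S_I with G = (C + 4)/800 (an inference from the stated value ranges, not a quoted formula), the decoding rules above can be checked exhaustively over all 9 sign combinations:

```python
def step(x):
    """Step function assumed as: 1 for x >= 0, 0 for x < 0."""
    return 1 if x >= 0 else 0

# Exhaustive encode/decode round trip over the 9 sign combinations.
for sr in (-1, 0, 1):
    for si in (-1, 0, 1):
        G = (3 * sr + si + 4) / 800.0        # encoder side (assumed coding)
        C = round(800 * G) - 4               # formula (11); rounded for float safety
        sr_dec = step(C - 2) - step(-C - 2)  # formula (12): real-part sign
        si_dec = C - 3 * sr_dec              # formula (13): imaginary-part sign
        assert (sr_dec, si_dec) == (sr, si)
decoded_ok = True
```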
8. A time-frequency characteristic matrix forming module: the real-part and imaginary-part submatrices are generated as the element-wise products S_R·R and S_I·B respectively, and the frequency-domain characteristic matrix is

F = S_R·R + j S_I·B   (14)
9. A speech signal reconstruction module: apply the inverse FFT to each column of F to perform the column-by-column inverse Fourier transform, forming the processed speech-signal framing matrix; connecting all its columns end to end forms a one-dimensional speech sequence, realizing speech reconstruction.
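The whole analysis-reconstruction loop of modules 1-9 can be sketched end to end. The sign coding used here (C = 3·sgn + sgn, shifted by 4 and divided by 800) is an assumption consistent with the stated ranges; under it, the round trip recovers the framing matrix up to floating-point error:

```python
import numpy as np

rng = np.random.default_rng(0)
N, M = 64, 5
S = rng.standard_normal((N, M))        # framing matrix (stand-in for speech)

# Encode: column-wise DFT, split into normalized magnitudes plus a sign code.
X = np.fft.fft(S, axis=0)
d = max(np.abs(X.real).max(), np.abs(X.imag).max())
R = np.abs(X.real) / d
B = np.abs(X.imag) / d
G = (3 * np.sign(X.real) + np.sign(X.imag) + 4) / 800.0

# Decode: recover the signs from G, rebuild the complex matrix, invert the DFT.
C = np.round(800 * G) - 4
SR = (C >= 2).astype(int) - (C <= -2).astype(int)
SI = C - 3 * SR
F = SR * R + 1j * SI * B               # normalized time-frequency matrix
S_rec = np.fft.ifft(d * F, axis=0).real
speech = S_rec.T.reshape(-1)           # connect the columns end to end
```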

Claims (1)

1. A color complex spectrogram construction method capable of realizing speech reconstruction, in which the speech signal is first windowed and framed into frames of N points each, M frames in total, forming an N×M framing signal matrix; the FFT is applied to each column of the N×M framing signal matrix to compute the DFT, obtaining the N-point DFT of the corresponding column and forming the N×M time-frequency analysis matrix X, each element of which corresponds to the real part and the imaginary part of the frequency characteristic of a certain frequency band at a certain time, characterized in that:
1) A submatrix forming module: let d be the maximum absolute value of the real or imaginary part over all elements of the matrix X, and construct 2 matrices R = |X_R|/d and B = |X_I|/d, the normalized absolute-value submatrices corresponding to the real part X_R and the imaginary part X_I of X; d serves as the normalization constant so that the dynamic ranges of R and B are consistent;
2) A sign coding matrix forming module: extract the signs of the real part X_R and the imaginary part X_I of X with the sign function, S_R = sgn(X_R) and S_I = sgn(X_I); the function sgn(x) outputs -1 when x < 0, +1 when x > 0, and 0 when x = 0; a weighted sum of the two formulas gives the sign-combination code of the real part and the imaginary part, C = 3 S_R + S_I;
The sign-combination coding results of the above formula are shown in Table 1; the 9 values in Table 1 mark the 9 states of the real/imaginary sign combinations; in order to visualize the sign-combination code with the G channel, the zero point of the results in Table 1 is shifted and the values normalized as G = (C + 4)/800; according to this formula, the value of G lies between 0 and 0.01, with the results shown in Table 2; taking 800 as the normalization constant makes the maximum value of the G channel far smaller than the values of the R and B channels, so that the green of the G channel does not visually interfere with the R-B two-primary image when the spectrogram is visualized;
TABLE 1 Sign-combination coding of the real part X_R and the imaginary part X_I
TABLE 2 Normalized sign-combination coding of the real part X_R and the imaginary part X_I
3) An RGB color-model driving-matrix forming and visualization module: construct a 3-dimensional matrix D with the real-part normalized absolute-value submatrix R as layer 1 of the layer dimension, the imaginary-part normalized absolute-value submatrix B as layer 3, and the sign-combination coding matrix G as layer 2; use D as the driving matrix of the RGB color model to form the color complex spectrogram, in which R corresponds to the red channel, B to the blue channel, and G to the green channel; since the values of the G channel are far smaller than those of the R and B channels, the color complex spectrogram visually appears as an R-B two-primary image;
4) A frequency-domain submatrix extraction module: from the 3-dimensional matrix corresponding to the image-processed two-primary color complex spectrogram, extract layer 1 and layer 3 as the matrices R and B for later use;
5) A sign decoding module: take out the G-channel sign-combination code to form the normalized sign-combination coding matrix G;
(1) Real-part sign decoding: first recover the sign-combination coding matrix by C = 800 G - 4; then the real-part sign matrix is S_R = ε(C - 2) - ε(-C - 2), where ε(x) is a step function with ε(x) = 1 when x > 0, ε(x) = 1 when x = 0, and ε(x) = 0 when x < 0; the result is: when C ≥ 2, the corresponding real-part sign is positive and S_R = +1; when C ≤ -2, the corresponding real-part sign is negative and S_R = -1; when -2 < C < 2, the corresponding real-part sign is zero and S_R = 0;
(2) Imaginary-part sign decoding, using the real-part sign decoding result: S_I = C - 3 S_R; analyzing this formula, when C = 4 the corresponding imaginary-part sign is positive, and since S_R = +1 the results are S_R = +1 and S_I = +1; the remaining cases follow by analogy;
6) A time-frequency characteristic matrix forming module: the real-part and imaginary-part submatrices are generated as the element-wise products S_R·R and S_I·B respectively; the frequency-domain characteristic matrix is then F = S_R·R + j S_I·B;
Apply the inverse FFT to each column of F to perform the column-by-column inverse Fourier transform, forming the processed speech-signal framing matrix; connecting all its columns end to end forms a one-dimensional speech sequence, realizing speech reconstruction.
CN201410688088.0A 2014-11-26 2014-11-26 Color complex spectrogram construction method for speech reconstruction Expired - Fee Related CN104392728B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410688088.0A CN104392728B (en) 2014-11-26 2014-11-26 Color complex spectrogram construction method for speech reconstruction


Publications (2)

Publication Number Publication Date
CN104392728A true CN104392728A (en) 2015-03-04
CN104392728B CN104392728B (en) 2017-04-19

Family

ID=52610620

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410688088.0A Expired - Fee Related CN104392728B (en) 2014-11-26 2014-11-26 Color complex spectrogram construction method for speech reconstruction

Country Status (1)

Country Link
CN (1) CN104392728B (en)



Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101141545A (en) * 2007-10-11 2008-03-12 Fudan University High-speed algorithm for hypercomplex Fourier transform and hypercomplex cross-correlation of color images
CN102044254A (en) * 2009-10-10 2011-05-04 Beijing Institute of Technology Speech spectrum color enhancement method for speech visualization
CN201910239U (en) * 2010-12-21 2011-07-27 Northwest Normal University Speech spectrum analysis system based on a field-programmable gate array (FPGA)

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
SUN Hongying et al.: "FPGA Implementation of Spectrogram Analysis", Journal of Electronics & Information Technology *
YANG Chunfeng: "Audio Digital Watermarking Algorithm Based on the Spectrogram", China Masters' Theses Full-text Database, Information Science and Technology *
TAO Zhongxing: "Research on FPGA-based Time-domain Signal Analysis Methods", China Masters' Theses Full-text Database, Information Science and Technology *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105788608A (en) * 2016-03-03 2016-07-20 Bohai University Chinese initial and final visualization method based on neural network
CN105788608B (en) * 2016-03-03 2019-03-26 Bohai University Neural-network-based visualization method for Chinese initials and finals
CN110310624A (en) * 2019-07-03 2019-10-08 Xinhua College of Sun Yat-sen University Efficient secondary speech detection and recognition method and device

Also Published As

Publication number Publication date
CN104392728B (en) 2017-04-19


Legal Events

Date | Code | Title/Description
C06 / PB01: Publication
C10 / SE01: Entry into substantive examination
GR01: Patent grant (granted publication date: 20170419)
CF01: Termination of patent right due to non-payment of annual fee (termination date: 20191126)