WO2013176073A1 - Audio signal conversion device, method, program, and recording medium - Google Patents
Audio signal conversion device, method, program, and recording medium
- Publication number
- WO2013176073A1 (PCT/JP2013/063907)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- signal
- audio
- audio signal
- discrete fourier
- window function
- Prior art date
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/04—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
- G10L19/26—Pre-filtering or post-filtering
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04S—STEREOPHONIC SYSTEMS
- H04S1/00—Two-channel systems
- H04S1/002—Non-adaptive circuits, e.g. manually adjustable or static, for enhancing the sound image or the spatial distribution
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/008—Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
- G10L25/06—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being correlation coefficients
Definitions
- the present invention relates to an audio signal conversion apparatus, method, program, and recording medium for converting an audio signal for a multi-channel playback method.
- Conventionally proposed sound reproduction systems include a stereo (2ch) system and a 5.1ch surround system (ITU-R BS.775-1), which are widely used for consumer use.
- The 2ch system is a system that outputs different audio data from the left speaker 11L and the right speaker 11R, as schematically illustrated in FIG. 1.
- The 5.1ch surround system is, as schematically illustrated in FIG. 2, a method of inputting and outputting different audio data to a left front speaker 21L, a right front speaker 21R, a center speaker 22C disposed between them, a left rear speaker 23L, a right rear speaker 23R, and a subwoofer (not shown) dedicated to the low frequency range (generally 20 Hz to 100 Hz).
- Each speaker is arranged on a circumference or a spherical surface centered on the listener, and ideally it is preferable to listen at a listening position equidistant from each speaker, the so-called sweet spot. For example, it is preferable to listen at the sweet spot 12 in the 2ch system and at the sweet spot 24 in the 5.1ch surround system.
- When listening at the sweet spot, the synthesized sound image based on the balance of sound pressure is localized where the producer intended.
- When listening outside the sweet spot, the sound image and sound quality generally deteriorate.
- these methods are collectively referred to as a multi-channel reproduction method.
- Each sound source object includes its own position information and audio signal.
- each virtual sound source includes the sound of each musical instrument and position information where the musical instrument is arranged.
- the sound source object-oriented reproduction method is usually reproduced by a reproduction method (that is, a wavefront synthesis reproduction method) in which a sound wavefront is synthesized by a group of speakers arranged in a straight line or a plane.
- The Wave Field Synthesis (WFS) system described in Non-Patent Document 1 is one of the practical implementation methods using linearly arranged speaker groups (hereinafter referred to as speaker arrays), and has been actively studied in recent years.
- Such a wavefront synthesis reproduction method differs from the multi-channel reproduction methods described above in that, as schematically shown in FIG. 3, it can simultaneously present both a good sound image and good sound quality to a listener listening at any position in front of the arranged speaker group 31. That is, the sweet spot 32 of the wavefront synthesis reproduction method is wide, as shown in the figure.
- A listener facing the speaker array in an acoustic space provided by the WFS method perceives the sound radiated from the speaker array as if it were emitted from a sound source (a virtual sound source) that virtually exists behind the speaker array.
- This wavefront synthesis playback method requires an input signal representing a virtual sound source.
- one virtual sound source needs to include an audio signal for one channel and position information of the virtual sound source.
- For example, it consists of an audio signal recorded for each musical instrument together with the position information of that instrument.
- The sound signal of each virtual sound source does not necessarily need to be recorded for each musical instrument, but the arrival direction and magnitude of each sound intended by the content creator must be expressed using the concept of the virtual sound source.
- the music content of the stereo method will be considered.
- The audio signals of the L (left) channel and the R (right) channel in stereo music content are reproduced using two speakers: the L channel by the speaker 41L installed on the left and the R channel by the speaker 41R installed on the right.
- When such reproduction is performed, as shown in FIG. 4, the vocal voice and the bass sound are heard from the middle position 42b only when listening at a point equidistant from the speakers 41L and 41R, that is, at the sweet spot 43.
- There, the sound image is localized and heard as intended by the producer, such as the piano sound on the left side 42a and the drum sound on the right side 42c. Now consider reproducing such content by the wavefront synthesis reproduction method. To provide a listener at any position with the sound image localization intended by the content producer, which is the feature of the wavefront synthesis reproduction method, the sound image perceived when listening in the sweet spot 43 of FIG. 4 must be perceivable from any viewing position, for example anywhere in the sweet spot 53 shown in FIG. 5.
- That is, with the speaker group 51 arranged in a straight line or a plane, throughout the wide sweet spot 53 the vocal voice and the bass sound must be heard from the middle position 52b, and the sound image must be localized and heard as the producer intended, such as the piano sound from the left position 52a and the drum sound from the right position 52c.
- However, when the left and right channel audio signals are simply assigned to virtual sound sources at the positions of the left and right speakers, a narrow sweet spot 63 is still generated even if reproduction is performed by the wavefront synthesis reproduction method.
- The sound image is localized as shown in FIG. 4 only at the position of the sweet spot 63. That is, in order to realize such sound image localization, it is necessary to separate the 2ch stereo data into the sound for each sound image by some means and to generate virtual sound source data from each sound.
- Patent Document 1 separates 2ch stereo data into a correlated signal and an uncorrelated signal based on the correlation coefficient of the signal power for each frequency band, generates a virtual sound source from the results, and removes the discontinuous points of the waveform generated at that time.
- Specifically, by counting the number of zero crossings, it is determined whether a portion of a signal other than a human voice is a consonant part, and for the parts other than consonant parts a bias value is added so that the generated waveform becomes continuous.
- However, a musical sound signal or the like is usually a mixture of components that are close to white noise, such as consonant parts, and other components.
- If a bias value is added to such an audio signal based only on the number of zero crossings, erroneous determinations naturally occur, and discontinuous points are included at those portions of the generated audio signal waveform and are perceived as noise.
- The present invention has been made in view of the above situation, and an object thereof is to provide an audio signal conversion apparatus, method, program, and recording medium that can convert a multichannel audio signal such as 2ch or 5.1ch without generating noise due to discontinuities.
- A first technical means of the present invention is an audio signal conversion apparatus that converts a multi-channel input audio signal into an audio signal for reproduction by a speaker group, comprising: a conversion unit that reads each of the two-channel input audio signals while shifting by 1/4 of the length of a processing segment, multiplies the read audio signal of the processing segment by a Hann window function, and then performs a discrete Fourier transform; an extraction unit that extracts a correlation signal from the two-channel audio signals after the discrete Fourier transform by the conversion unit, ignoring the direct-current component; an inverse transform unit that performs an inverse discrete Fourier transform on the correlation signal extracted by the extraction unit, on the correlation signal and the uncorrelated signal, on an audio signal generated from the correlation signal, or on an audio signal generated from the correlation signal and the uncorrelated signal; and a window function multiplication unit that multiplies the audio signal of the processing segment among the audio signals after the inverse discrete Fourier transform by the Hann window function again, and adds it to the audio signal of the previous processing segment with a shift of 1/4 of the length of the processing segment.
- In a second technical means, the audio signal after the inverse discrete Fourier transform to be processed by the window function multiplication unit is the correlation signal, or the correlation signal and the uncorrelated signal, and the audio signal is subjected to scaling processing in the time domain or the frequency domain.
- A third technical means is an audio signal conversion method for converting a multi-channel input audio signal into an audio signal for reproduction by a speaker group, comprising: a conversion step in which a conversion unit reads each of the two-channel input audio signals while shifting by 1/4 of the length of a processing segment, multiplies the read audio signal of the processing segment by a Hann window function, and then performs a discrete Fourier transform; an extraction step of extracting a correlation signal from the two-channel audio signals after the discrete Fourier transform, ignoring the direct-current component; an inverse transform step of performing an inverse discrete Fourier transform on the audio signal generated from the extracted signal; and a window function multiplication step in which a window function multiplication unit multiplies the audio signal of the processing segment among the audio signals after the inverse discrete Fourier transform by the Hann window function again, and adds it to the audio signal of the previous processing segment with a shift of 1/4 of the length of the processing segment.
- A fourth technical means is a program that causes a computer to execute: a conversion step of reading each of the two-channel input audio signals while shifting by 1/4 of the length of a processing segment, multiplying the read audio signal of the processing segment by the Hann window function, and then performing a discrete Fourier transform; an extraction step of extracting a correlation signal from the two-channel audio signals after the discrete Fourier transform in the conversion step, ignoring the direct-current component; an inverse transform step of performing an inverse discrete Fourier transform on the extracted correlation signal, on the correlation signal and the uncorrelated signal, on the audio signal generated from the correlation signal, or on the audio signal generated from the correlation signal and the uncorrelated signal; and a window function multiplication step of multiplying the audio signal of the processing segment among the audio signals after the inverse discrete Fourier transform by the Hann window function again and adding it to the audio signal of the previous processing segment with a shift of 1/4 of the length of the processing segment.
- the fifth technical means is a computer-readable recording medium in which the program in the fourth technical means is recorded.
- FIG. 5 is a schematic diagram showing an ideal sweet spot when the music content of FIG. 4 is reproduced by the wavefront synthesis reproduction method.
- FIG. 6 is a schematic diagram showing the actual sweet spot when the left/right channel audio signals of the music content of FIG. 4 are reproduced by the wavefront synthesis reproduction method with virtual sound sources set at the positions of the left/right speakers, respectively.
- FIG. 7 is a block diagram showing one configuration example of an audio data reproducing apparatus including an audio signal conversion device according to the present invention.
- FIG. 8 is a block diagram illustrating a configuration example of the audio signal processing unit (the audio signal conversion device according to the present invention) in the audio data reproducing apparatus of FIG. 7.
- FIG. 9 is a flowchart for explaining an example of the audio signal processing in the audio signal processing unit of FIG. 8.
- FIG. 10 is a diagram showing a state in which audio data is stored in the buffer in the audio signal processing unit of FIG. 8.
- FIG. 11 is a diagram showing the Hann window function.
- FIG. 12 is a diagram showing the window function effectively multiplied once per 1/4 segment in the first window function multiplication process in the audio signal processing of FIG. 9.
- Also included are: a schematic diagram for explaining waveform discontinuities occurring at segment boundaries after the inverse discrete Fourier transform when the left and right channel audio signals are discrete-Fourier-transformed and their DC components are ignored; a diagram showing the segment after the inverse discrete Fourier transform on which the discontinuity removal processing according to the present invention is performed; and a diagram showing an example of a graph of the waveform of the input audio signal.
- An audio signal conversion apparatus is an apparatus for converting an audio signal for a multi-channel playback system into an audio signal for playback on a speaker group having the same or different number of channels, an audio signal for a wavefront synthesis playback system, or the like. Therefore, it can also be called an audio signal processing device, an audio data conversion device, etc., and can be incorporated into an audio data reproducing device.
- the audio signal is not limited to a signal in which a so-called audio is recorded, and can also be called an acoustic signal.
- the wavefront synthesis reproduction method is a reproduction method in which a wavefront of sound is synthesized by a group of speakers arranged in a straight line or a plane as described above.
- FIG. 7 is a block diagram showing an example of the configuration of an audio data reproducing apparatus including the audio signal converting apparatus according to the present invention.
- FIG. 8 is a block diagram showing one configuration example of the audio signal processing unit (the audio signal conversion device according to the present invention) in the audio data reproducing apparatus of FIG. 7.
- the decoder 71 decodes the content of only audio or video with audio, converts it into a signal processable format, and outputs it to the audio signal extraction unit 72.
- The content is acquired from a digital broadcast transmitted from a broadcasting station, by downloading over a network from a server that distributes digital content, or by reading from a recording medium such as an external storage device.
- the audio data reproducing device 70 includes a digital content input unit that inputs digital content including a multi-channel input audio signal.
- the decoder 71 decodes the digital content input here.
- the audio signal extraction unit 72 separates and extracts an audio signal from the obtained signal. Here, it is a 2ch stereo signal.
- the signals for the two channels are output to the audio signal processing unit 73.
- The audio signal processing unit 73 generates, from the obtained two-channel signals, multi-channel audio signals of three or more channels that differ from the input audio signal (described below as signals corresponding to the number of virtual sound sources). That is, the input audio signal is converted into another multi-channel audio signal.
- the audio signal processing unit 73 outputs the audio signal to the D / A converter 74.
- The number of virtual sound sources can be determined in advance as long as it is at least a certain number, but the amount of calculation increases as the number of virtual sound sources increases. Therefore, it is desirable to determine the number in consideration of the performance of the device on which it is mounted. In this example, the number is assumed to be 5.
- the D / A converter 74 converts the obtained signals into analog signals and outputs each signal to the amplifier 75.
- Each amplifier 75 amplifies the input analog signal, transmits it to each speaker 76, and outputs it from each speaker 76 as sound.
- FIG. 8 shows the detailed configuration of the audio signal processing unit in this figure.
- the audio signal processing unit 73 includes a window function multiplication unit 81, an audio signal separation / extraction unit 82, a window function multiplication unit 83, and an audio output signal generation unit 84.
- the window function multiplication unit 81 reads out the 2-channel audio signal, multiplies it by the Hann window function, and outputs it to the audio signal separation / extraction unit 82.
- the audio signal separation / extraction unit 82 generates an audio signal corresponding to each virtual sound source from the two-channel signals, and outputs it to the window function multiplication unit 83.
- the window function multiplication unit 83 removes a perceptual noise part from the obtained audio signal waveform, and outputs the audio signal after noise removal to the audio output signal generation unit 84.
- the window function multiplication unit 83 functions as a noise removal unit.
- the audio output signal generation unit 84 generates each output audio signal waveform corresponding to each speaker from the obtained audio signal.
- the audio output signal generation unit 84 performs processing such as wavefront synthesis reproduction processing, for example, assigns the obtained audio signal for each virtual sound source to each speaker, and generates an audio signal for each speaker.
- a part of the wavefront synthesis reproduction processing may be performed by the audio signal separation / extraction unit 82.
- FIG. 9 is a flowchart for explaining an example of the audio signal processing in the audio signal processing unit in FIG. 8, and FIG. 10 is a diagram showing a state in which audio data is stored in the buffer in the audio signal processing unit in FIG. 8.
- FIG. 11 is a diagram showing the Hann window function, and FIG. 12 is a diagram showing the window function that is effectively multiplied once per 1/4 segment in the first window function multiplication process (the window function multiplication process in the window function multiplication unit 81) in the audio signal processing of FIG. 9.
- the window function multiplying unit 81 reads out audio data of 1/4 length of one segment from the extraction result in the audio signal extracting unit 72 in FIG. 7 (step S1).
- the audio data refers to a discrete audio signal waveform sampled at a sampling frequency such as 48 kHz.
- a segment is an audio data section composed of a group of sample points having a certain length.
- the segment refers to a section length to be subjected to discrete Fourier transform later, and is also called a processing segment.
- the value is 1024.
- 256 points of audio data that is 1 ⁇ 4 of one segment are to be read.
- the read 256-point audio data is stored in the buffer 100 as illustrated in FIG.
- This buffer can hold the sound signal waveform for the immediately preceding segment, and the past segments are discarded.
- The immediately previous 3/4 segment data (768 points) and the latest 1/4 segment data (256 points) are concatenated to create audio data for one segment, and the process proceeds to the window function calculation (step S2). That is, every sample is included in the window function calculation four times.
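The quarter-segment read with a rolling one-segment buffer can be sketched as follows (a minimal illustration, not from the patent text; the chunked `stream` input is a stand-in for the output of the audio signal extraction unit):

```python
import numpy as np

SEG = 1024          # one processing segment (M points)
HOP = SEG // 4      # 1/4 segment read per iteration (256 points)

def segments(stream):
    """Yield one SEG-point segment per HOP-point read.

    `stream` is an iterable of HOP-point arrays. The buffer keeps the
    previous 3/4 segment (768 points), appends the newest 1/4 segment
    (256 points), and discards older data, so every sample appears in
    exactly four successive segments.
    """
    buf = np.zeros(SEG)
    for chunk in stream:
        assert len(chunk) == HOP
        buf = np.concatenate([buf[HOP:], chunk])  # shift left, append newest
        yield buf.copy()

# Feed 8 quarter-segment chunks of a ramp signal:
x = np.arange(8 * HOP, dtype=float)
segs = list(segments(x.reshape(8, HOP)))
print(len(segs))   # 8 segments produced
```

The sample with value 1.0 (the second input sample) shows up in the first four segments and then drops out of the buffer, matching the "read four times" behavior described above.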
- The window function multiplication unit 81 executes a window function calculation process for multiplying the audio data for one segment by the conventionally proposed Hann window w(m) = sin²((m/M)π) (step S2).
- This Hann window is illustrated as the window function 110 in FIG. 11.
- m is a natural number
- M is the length of one segment (an even number).
- In total, the above input signal x L (m 0 ) is multiplied by sin⁴((m 0 /M)π). Illustrated as a window function, this yields the window function 120 shown in FIG. 12. Since this window function 120 is added a total of four times while being shifted every 1/4 segment, the sum is a constant, and the original signal is restored up to a constant factor.
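The constant-overlap-add property claimed here can be checked numerically: with the Hann window applied once at analysis and once at synthesis, each output sample is weighted by sin⁴, and the four quarter-shifted copies sum to exactly 3/2 (a small numpy verification, not part of the patent text):

```python
import numpy as np

M = 1024                            # segment length (even)
hop = M // 4                        # quarter-segment shift
m = np.arange(M)
hann = np.sin(np.pi * m / M) ** 2   # Hann window w(m) = sin^2((m/M)*pi)

# Analysis and synthesis each apply the Hann window, so every output
# sample is weighted by hann**2 = sin^4 from four overlapping segments.
acc = np.zeros(5 * M)
for start in range(0, 4 * M, hop):
    acc[start:start + M] += hann ** 2

# Away from the edges the four shifted sin^4 windows sum to exactly 3/2,
# so overlap-add reconstructs the signal up to a constant gain.
steady = acc[M:3 * M]
print(np.allclose(steady, 1.5))   # True
```

This follows from sin⁴θ + sin⁴(θ+π/4) + cos⁴θ + cos⁴(θ+π/4) = 3/2 for any θ.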
- the audio data thus obtained is subjected to discrete Fourier transform as in the following formula (3) to obtain frequency domain audio data (step S3).
- the processing in steps S3 to S10 may be performed by the audio signal separation / extraction unit 82.
- DFT represents discrete Fourier transform
- k is a natural number
- X L (k) and X R (k) are complex numbers.
- X L (k) = DFT(x ′ L (n))
- X R (k) = DFT(x ′ R (n))   (3)
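Equation (3) can be sketched with numpy's FFT (the test signal here — a shared sinusoid plus noise — is purely illustrative). The check at the end confirms the conjugate symmetry used in the following paragraphs:

```python
import numpy as np

M = 1024
m = np.arange(M)
hann = np.sin(np.pi * m / M) ** 2   # Hann window

# Illustrative left/right segments: a shared component plus noise.
rng = np.random.default_rng(0)
fs = 48000.0
s = np.sin(2 * np.pi * 440.0 * m / fs)
xL = s + 0.1 * rng.standard_normal(M)
xR = 0.8 * s + 0.1 * rng.standard_normal(M)

# Equation (3): window each channel, then take the discrete Fourier transform.
XL = np.fft.fft(hann * xL)
XR = np.fft.fft(hann * xR)

# DFT of a real signal: X(k) and X(M-k) are complex conjugates,
# so only the range k <= M/2 needs to be analyzed.
k = np.arange(1, M // 2)
sym = np.allclose(XL[k], np.conj(XL[M - k])) and np.allclose(XR[k], np.conj(XR[M - k]))
print(sym)   # True
```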
- The processing of steps S5 to S8 is performed for each line spectrum of the obtained frequency-domain audio data (steps S4a and S4b).
- Specific processing will be described.
- an example of performing processing such as obtaining a correlation coefficient for each line spectrum will be described.
- Alternatively, processing such as obtaining a correlation coefficient may be executed for each small band obtained by dividing the spectrum using the Equivalent Rectangular Bandwidth (ERB) scale.
- The line spectrum after the discrete Fourier transform is symmetric about M/2 (M is even), except for the DC component (for example, X L (0)). That is, X L (k) and X L (M−k) are complex conjugates in the range 0 < k < M/2. Therefore, in the following, the range k ≤ M/2 is taken as the object of analysis, and the range k > M/2 is treated via the complex-conjugate symmetry of the line spectrum.
- In step S5, the correlation coefficient is acquired by the calculation of equation (4).
- This normalized correlation coefficient d (i) represents how much the audio signals of the left and right channels are correlated, and takes a real value between 0 and 1. 1 if the signals are exactly the same, and 0 if the signals are completely uncorrelated.
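Since equation (4) is not reproduced in this text, the following is only an illustrative stand-in: a standard normalized cross-spectrum over a band of DFT bins, which likewise yields 1 for identical-up-to-gain signals and 0 for fully uncorrelated ones:

```python
import numpy as np

def band_correlation(XL, XR, band):
    """Normalized correlation of two channels over one band of bins.

    XL, XR: DFT line spectra of the left/right windowed segments.
    band:   iterable of bin indices (a single line spectrum or an
            ERB-scale band). Returns a real value in [0, 1].
    NOTE: this is a textbook normalized cross-spectrum shown only as
    an illustration; the patent's equation (4) is not reproduced here.
    """
    b = np.asarray(list(band))
    cross = np.abs(np.sum(XL[b] * np.conj(XR[b])))
    power = np.sqrt(np.sum(np.abs(XL[b]) ** 2) * np.sum(np.abs(XR[b]) ** 2))
    return float(cross / power) if power > 0 else 0.0

X = np.fft.fft(np.random.default_rng(1).standard_normal(1024))
d = band_correlation(X, 0.5 * X, range(10, 20))
print(d)   # ~1.0: same signal with different gain is fully correlated
```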
- In step S6, conversion coefficients for separating and extracting the correlation signal and the uncorrelated signal from the audio signals of the left and right channels are obtained.
- In step S7, using the conversion coefficients obtained in step S6, the correlation signal and the uncorrelated signal are separated and extracted as estimated audio signals from the audio signals of the left and right channels.
- each signal of the left and right channels is composed of an uncorrelated signal and a correlated signal, and for the correlated signal, a signal waveform that differs only in gain from the left and right (that is, a signal waveform composed of the same frequency component) is output.
- the gain corresponds to the amplitude of the signal waveform and is a value related to the sound pressure.
- In this model, it is assumed that the direction of the sound image synthesized by the correlation signals output from the left and right is determined by the balance of the left and right sound pressures of the correlation signal.
- s(m) is the correlation signal common to the left and right channels.
- n L (m) is the left-channel audio signal minus the correlation signal s(m), and can be defined as the uncorrelated signal of the left channel.
- n R (m) is the right-channel audio signal minus the correlation signal s(m) multiplied by α, and can be defined as the uncorrelated signal of the right channel.
- α is a positive real number representing the degree of left/right sound-pressure balance of the correlation signal. That is, equation (8) is x L (m) = s(m) + n L (m), x R (m) = α s(m) + n R (m).
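The signal model can be sketched directly (the additive form x_L = s + n_L, x_R = α·s + n_R is an assumption consistent with equation (10) below; the concrete test signals are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
M = 1024
m = np.arange(M)

s = np.sin(2 * np.pi * 0.01 * m)    # correlation signal, common to L/R
nL = 0.2 * rng.standard_normal(M)   # uncorrelated signal, left channel
nR = 0.2 * rng.standard_normal(M)   # uncorrelated signal, right channel
alpha = 0.5                         # left/right sound-pressure balance

# Assumed form of equation (8):
xL = s + nL
xR = alpha * s + nR

# By construction the uncorrelated parts are exactly the residues
# after subtracting the (scaled) correlation signal:
ok = np.allclose(xL - s, nL) and np.allclose(xR - alpha * s, nR)
print(ok)   # True
```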
- From equation (8), the audio signals x ′ L (m) and x ′ R (m) after the window function multiplication described in equation (2) are expressed by the following equation (9), where s ′ (m), n ′ L (m), and n ′ R (m) are obtained by multiplying s(m), n L (m), and n R (m) by the window function, respectively.
- Equation (10) is obtained by performing a discrete Fourier transform on the equation (9).
- S (k), N L (k), and N R (k) are discrete Fourier transforms of s ′ (m), n ′ L (m), and n ′ R (m), respectively.
- X L (k) = S(k) + N L (k)
- X R (k) = α S(k) + N R (k)   (10)
- α(i) represents α in the i-th line spectrum.
- P S (i) and P N (i) are the powers of the correlated signal and the uncorrelated signal in the i-th line spectrum, respectively.
- Using these, equation (4) can be expressed in terms of P S (i), P N (i), and α(i). In this calculation, it is assumed that S(k), N L (k), and N R (k) are mutually orthogonal, and that the power of their products is 0.
- By solving equations (13) and (15), the following equations are obtained.
- est(S (i) (k)) = μ1 X L (i) (k) + μ2 X R (i) (k)   (18)
- est (A) represents an estimated value of A.
- each parameter is obtained as follows.
- est ′ (A) represents a scaled estimate of A.
- The parameters μ3 to μ6 are obtained from the following equations.
- est (N L (i) (k)) and est (N R (i) (k)) obtained in this way are also scaled by the following equations, as described above.
- The transformation variables μ1 to μ6 given by equations (22), (27), and (28) and the scaling coefficients given by equations (24), (29), and (30) correspond to the conversion coefficients obtained in step S6.
- In step S7, the correlation signal and the uncorrelated signals (the uncorrelated signal of the left channel and the uncorrelated signal of the right channel) are estimated by calculation using these conversion coefficients (equations (18), (25), and (26)).
- step S8 the assignment process to the virtual sound source is performed (step S8).
- As pre-processing for this allocation, the direction of the synthesized sound image generated by the correlation signal estimated for each line spectrum is estimated.
- FIG. 13 is a schematic diagram for explaining an example of the positional relationship between the listener, the left and right speakers, and the synthesized sound image.
- FIG. 14 shows an example of the positional relationship between the speaker group used in the wavefront synthesis reproduction method and the virtual sound source.
- FIG. 15 is a schematic diagram for explaining an example of the positional relationship between the virtual sound source of FIG. 14, the listener, and the synthesized sound image.
- In FIG. 13, the spread angle formed by the line drawn from the listener 133 to the midpoint of the left and right speakers and the line drawn from the listener 133 to one speaker is θ0, and the spread angle formed by the line drawn from the listener 133 to the position of the estimated synthesized sound image 132 is θ.
- It is generally known that the direction of the synthesized sound image 132 generated by the output sound can be approximated by the following equation using the parameter α representing the sound-pressure balance (hereinafter referred to as the sine law in stereophony): sin θ / sin θ0 = (1 − α) / (1 + α).
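A minimal sketch of the sine law (the sign convention — positive angles toward the left, since α scales the right channel — is an assumption, as the text does not reproduce the equation itself):

```python
import numpy as np

def image_angle(alpha, theta0_deg):
    """Direction of the synthesized sound image by the sine law:
    sin(theta)/sin(theta0) = (1 - alpha)/(1 + alpha),
    where alpha is the sound-pressure balance of the correlation signal
    (right-channel gain relative to left) and theta0 is the half spread
    angle of the two speakers as seen from the listener.
    """
    ratio = (1.0 - alpha) / (1.0 + alpha)
    return np.degrees(np.arcsin(ratio * np.sin(np.radians(theta0_deg))))

a = image_angle(1.0, 30.0)
b = image_angle(0.0, 30.0)
print(a)   # ~0.0:  equal sound pressure -> image at the center
print(b)   # ~30.0: left channel only    -> image at the left speaker
```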
- the audio signal separation and extraction unit 82 shown in FIG. 8 converts the 2ch signal into a signal of a plurality of channels.
- Since the number of channels after conversion is five, the converted signals are regarded as virtual sound sources 142a to 142e in the wavefront synthesis reproduction system and are arranged behind the speaker group (speaker array) 141, as in the positional relationship 140 shown in FIG. 14. Note that the virtual sound sources 142a to 142e are equally spaced from their neighbors. The conversion here therefore converts the 2ch audio signal into audio signals for the number of virtual sound sources.
- The audio signal separation / extraction unit 82 first separates the 2ch audio signal into one correlated signal and two uncorrelated signals for each line spectrum. It must be determined in advance how these signals are allocated to the virtual sound sources (here, five virtual sound sources).
- the assignment method may be user-configurable from a plurality of methods, or may be presented to the user by changing the selectable method according to the number of virtual sound sources.
- the left and right uncorrelated signals are assigned to both ends (virtual sound sources 142a and 142e) of the five virtual sound sources, respectively.
- the synthesized sound image generated by the correlation signal is assigned to two adjacent virtual sound sources out of the five.
- Here, the synthesized sound image generated by the correlation signal is assumed to lie inside both ends (virtual sound sources 142a and 142e) of the five virtual sound sources; that is, the five virtual sound sources 142a to 142e are assumed to be arranged so as to fall within the spread angle formed by the two speakers during 2ch stereo reproduction.
- Specifically, an allocation method is adopted in which the two adjacent virtual sound sources that sandwich the synthesized sound image are determined from the estimated direction of the synthesized sound image, the sound pressure balance allocated to these two virtual sound sources is adjusted, and reproduction is performed so that the two virtual sound sources generate the synthesized sound image.
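The selection of the two adjacent virtual sound sources sandwiching the estimated image direction can be sketched as follows; the source directions are illustrative assumptions, and `np.searchsorted` simply finds the adjacent pair in a sorted list of directions.

```python
import numpy as np

def pick_sandwiching_pair(theta_image, source_angles):
    """Given the estimated direction of a synthesized sound image and the
    sorted directions of the virtual sound sources as seen from the
    listener, return the indices of the two adjacent virtual sound
    sources that sandwich the image."""
    angles = np.asarray(source_angles)
    # index of the first source whose direction exceeds the image direction
    j = int(np.searchsorted(angles, theta_image))
    j = min(max(j, 1), len(angles) - 1)  # clamp to a valid adjacent pair
    return j - 1, j

# Five sources spread from -30 deg to +30 deg; an image at +5 deg falls
# between the 3rd (index 2) and 4th (index 3) sources, as in the text.
angles = np.radians([-30, -15, 0, 15, 30])
print(pick_sandwiching_pair(np.radians(5.0), angles))  # (2, 3)
```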
- In FIG. 15, the spread angle formed between the line drawn from the listener 153 to the midpoint of the virtual sound sources 142a and 142e at both ends and the line drawn to the virtual sound source 142e at the end is θ0, and the spread angle formed with the line drawn from the listener 153 to the synthesized sound image 151 is θ.
- The same relation is applied locally: the spread angle formed between the line drawn from the listener 153 to the midpoint of the two virtual sound sources 142c and 142d sandwiching the synthesized sound image 151 and the line drawn to the virtual sound source 142c is θ0, and the spread angle formed with the line drawn from the listener 153 to the synthesized sound image 151 is θ.
- Here, θ0 is a positive real number.
- For example, assume that the synthesized sound image 151 is positioned between the third virtual sound source 142c and the fourth virtual sound source 142d, counted from the left, as shown in FIG. 15.
- In that case, θ0 ≈ 0.11 [rad] is obtained for the third virtual sound source 142c and the fourth virtual sound source 142d by a simple geometric calculation using trigonometric functions.
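The kind of geometric calculation mentioned above can be sketched as follows; the listener distance and array width are illustrative assumptions, so the resulting angle is only of the same order as the 0.11 rad quoted in the text, not that exact value.

```python
import math

def half_pair_angle(x_left, x_right, listener_dist):
    """Spread angle between the line from the listener to the midpoint of
    two adjacent virtual sound sources and the line to one of them.
    The listener is assumed to sit on the perpendicular bisector of the
    source line, at distance listener_dist (illustrative geometry)."""
    mid = 0.5 * (x_left + x_right)
    angle_mid = math.atan2(mid, listener_dist)
    angle_src = math.atan2(x_right, listener_dist)
    return abs(angle_src - angle_mid)

# Third and fourth of five sources equally spaced over a 2 m wide span,
# listener 1.5 m away: theta0 comes out on the order of 0.1 rad.
xs = [-1.0, -0.5, 0.0, 0.5, 1.0]
theta0 = half_pair_angle(xs[2], xs[3], 1.5)
print(round(theta0, 3))
```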
- Then, the direction of the synthesized sound image generated by the correlation signal in each line spectrum is represented by a relative angle from the directions of the two virtual sound sources sandwiching the synthesized sound image.
- the synthesized sound image is generated by the two virtual sound sources 142c and 142d.
- For this purpose, the sound pressure balance of the output audio signals from the two virtual sound sources 142c and 142d may be adjusted; as the adjustment method, the sine law in stereophonic sound, given as Equation (31), is used again.
- When the scaling coefficient for the third virtual sound source 142c is g1 and the scaling coefficient for the fourth virtual sound source 142d is g2, the audio signal g1 ⋅ est′(S(i)(k)) is output from the third virtual sound source 142c, and the audio signal g2 ⋅ est′(S(i)(k)) is output from the fourth virtual sound source 142d.
- Here, g1 and g2 should satisfy the relation given by the sine law in stereophonic sound.
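A sketch of computing g1 and g2 for the sandwiching pair follows; the law-of-sines form and the constant-power normalization g1² + g2² = 1 are assumptions, since Equation (31) and the exact constraint are not reproduced in this excerpt.

```python
import math

def pair_gains(theta, theta0):
    """Compute scaling coefficients g1, g2 for the two virtual sound
    sources sandwiching a synthesized sound image so that
    sin(theta) / sin(theta0) = (g1 - g2) / (g1 + g2) holds, normalized so
    that g1**2 + g2**2 = 1 (assumed constant-power normalization)."""
    r = math.sin(theta) / math.sin(theta0)  # in [-1, 1]
    # (g1 - g2) / (g1 + g2) = r  is satisfied by  g1 : g2 = (1 + r) : (1 - r)
    g1, g2 = 1.0 + r, 1.0 - r
    norm = math.hypot(g1, g2)
    return g1 / norm, g2 / norm

# An image exactly between the two sources (theta = 0) gives equal gains.
g1, g2 = pair_gains(0.0, 0.11)
print(g1, g2)
```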
- In this way, the audio signal g1 ⋅ est′(S(i)(k)) is assigned to the third virtual sound source 142c, and the audio signal g2 ⋅ est′(S(i)(k)) is assigned to the fourth virtual sound source 142d.
- On the other hand, the uncorrelated signals are assigned to the virtual sound sources 142a and 142e at both ends. That is, est′(NL(i)(k)) is assigned to the first virtual sound source 142a, and est′(NR(i)(k)) is assigned to the fifth virtual sound source 142e.
- Note that, depending on the position of the synthesized sound image, a virtual sound source at an end may be assigned both a scaled correlation signal (for example, g2 ⋅ est′(S(i)(k))) and an uncorrelated signal (for example, est′(NR(i)(k))).
- In this way, the left and right channel correlation signals and uncorrelated signals are assigned to the virtual sound sources for the i-th line spectrum in step S8. This is performed for all line spectra by the loop of steps S4a and S4b. For example, when a 256-point discrete Fourier transform is performed, the 1st to 127th line spectra are processed; for a 512-point transform, the 1st to 255th; and for a 1024-point transform, the 1st to 511th. As a result, if the number of virtual sound sources is J, output audio signals Y1(k), ..., YJ(k) in the frequency domain are obtained for each virtual sound source (output channel).
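The line-spectrum counts quoted above follow from ignoring the DC bin (and the Nyquist bin, which the text likewise does not list) of an M-point DFT:

```python
def num_line_spectra(dft_points):
    """Number of independent line spectra processed per segment when the
    DC component (k = 0) is ignored: bins 1 .. M/2 - 1 of an M-point
    DFT (the Nyquist bin at k = M/2 is likewise not counted)."""
    return dft_points // 2 - 1

for m in (256, 512, 1024):
    print(m, num_line_spectra(m))  # 127, 255, 511 respectively
```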
- Next, the processing of steps S10 to S12 is executed for each obtained output channel (steps S9a and S9b).
- Hereinafter, steps S10 to S12 will be described.
- First, the output audio signal y′j(m) in the time domain is obtained by performing an inverse discrete Fourier transform for each output channel (step S10).
- y′j(m) = DFT⁻¹(Yj(k))  (1 ≤ j ≤ J)  (35)
- Here, DFT⁻¹ represents the inverse discrete Fourier transform.
- Note that the signal y′j(m) obtained by the inverse transform is still in a state in which the window function has been multiplied.
- Since the window function is the function shown in Formula (1) and reading is performed while shifting by a 1/4 segment length, the converted data is obtained, as described above, by adding the result to the output buffer while shifting by a 1/4 segment length from the head of the previously processed segment.
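The 1/4-segment-shift addition into the output buffer can be sketched as a plain overlap-add; the segment contents and lengths are illustrative.

```python
import numpy as np

def overlap_add(segments, seg_len):
    """Overlap-add windowed segments into an output buffer while shifting
    by 1/4 of the segment length, as described in the text (sketch)."""
    hop = seg_len // 4
    out = np.zeros(hop * (len(segments) - 1) + seg_len)
    for i, seg in enumerate(segments):
        out[i * hop:i * hop + seg_len] += seg
    return out

# Three length-8 segments, each shifted by 2 samples from the previous one.
segs = [np.ones(8), np.ones(8), np.ones(8)]
out = overlap_add(segs, 8)
print(out.shape)  # (12,)
```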
- FIG. 16 is a waveform graph schematically showing this. More specifically, FIG. 16 is a diagram for explaining the discontinuity points of the waveform generated at the segment boundary after the inverse discrete Fourier transform when the left and right channel audio signals are discrete Fourier transformed and the left and right channel DC components are ignored. It is a schematic diagram. In the graph 160 shown in FIG. 16, the horizontal axis represents time.
- The symbol (0)(l) indicates the first sample point of the l-th segment, and the symbol (M−1)(l) indicates the M-th sample point of the l-th segment.
- the vertical axis of the graph 160 is the value of the output signal for those sample points. As can be seen from the graph 160, a discontinuous point occurs in the portion from the end of the (l-1) th segment to the beginning of the lth segment.
- the audio signal converter according to the present invention is configured as follows. That is, the audio signal conversion apparatus according to the present invention includes a conversion unit, a correlation signal extraction unit, an inverse conversion unit, and a window function multiplication unit.
- The conversion unit reads each of the input audio signals of the two channels while shifting by 1/4 of the length of the processing segment, multiplies the read audio signal of the processing segment by a Hann window function, and then applies a discrete Fourier transform.
- the correlation signal extraction unit extracts the correlation signal from the two-channel audio signals after the discrete Fourier transform by the conversion unit while ignoring the DC component. That is, the correlation signal extraction unit extracts the correlation signal of the input audio signals of the two channels.
- The inverse transform unit performs an inverse discrete Fourier transform on (a1) the correlation signal extracted by the correlation signal extraction unit, or (a2) the correlation signal and the uncorrelated signal (the signal excluding the correlation signal), or (b1) an audio signal generated from the correlation signal, or (b2) an audio signal generated from the correlation signal and the uncorrelated signal.
- In the configuration example described above, the discontinuous points are removed from the audio signal after allocation to the virtual sound sources for the wavefront synthesis reproduction method, which is an example of (b2) above. Alternatively, the discontinuous points may be removed from the audio signal before allocation to the virtual sound sources, that is, from the extracted correlation signal, or from the extracted correlation signal and the uncorrelated signal, as in (a1) or (a2) above, and the allocation may then be performed.
- The window function multiplication unit again multiplies, by the Hann window function, the audio signal of the processing segment (that is, the correlation signal or the audio signal generated therefrom) among the audio signals after the inverse discrete Fourier transform by the inverse transform unit, then shifts the waveform by 1/4 of the length of the processing segment and adds it to the audio signal of the previous processing segment, whereby the discontinuous points of the waveform are removed from the audio signal after the inverse discrete Fourier transform by the inverse transform unit.
- Here, the previous processing segment refers to the processing segments preceding the current one; since the shift is 1/4 of a segment, the current segment actually overlaps the first, second, and third previous processing segments.
- the conversion unit described above is included in the window function multiplication unit 81 and the audio signal separation extraction unit 82, and the correlation signal extraction unit and the inverse conversion unit described above are included in the audio signal separation extraction unit 82.
- the window function multiplication unit described above can be exemplified by the window function multiplication unit 83.
- FIG. 17 is a diagram showing segments after discrete Fourier inverse transform to which discontinuous point removal processing according to the present invention is performed.
- FIG. 18 is a diagram illustrating an example of a waveform graph of the input audio signal
- FIG. 19 is a waveform graph after the first multiplication process by the Hann window function is performed on the audio signal of FIG.
- FIG. 20 is a diagram illustrating an example of a waveform graph obtained by performing inverse discrete Fourier transform on the audio signal of FIG. 19, and
- FIG. 21 is a Hann window function for the audio signal of FIG. It is a figure which shows an example of the graph of the waveform after performing the 2nd multiplication process.
- One conceivable approach is to subtract the average value of the first values of the waveform of the segment (processing segment) 170 after the inverse discrete Fourier transform, as shown in FIG. 17, from the values of the waveform.
- Before performing the discrete Fourier transform, the Hann window is applied. That is, since the values at both end points of the Hann window are 0, if no spectral component were changed after the discrete Fourier transform and the inverse discrete Fourier transform were then performed, the end points of the segment would become 0 and no discontinuity would occur between segments.
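This can be checked numerically: a DFT followed by an inverse DFT returns the Hann-windowed segment unchanged, so the zero end points of the window are preserved. NumPy's symmetric `np.hanning` is used here as a stand-in for Formula (1), which is not reproduced in this excerpt.

```python
import numpy as np

M = 512
window = np.hanning(M)            # symmetric Hann: zero at both end points
rng = np.random.default_rng(0)
seg = rng.standard_normal(M) * window

# With no spectral component changed, DFT followed by inverse DFT
# reproduces the windowed segment up to rounding error ...
back = np.fft.ifft(np.fft.fft(seg)).real
print(np.allclose(back, seg))     # True
# ... so the segment boundary values stay at the Hann window's zeros.
print(abs(back[0]) < 1e-9, abs(back[-1]) < 1e-9)
```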
- In the present invention, however, each spectral component is changed as described above. Therefore, both end points of the segment after the inverse discrete Fourier transform are not 0, and discontinuous points are generated between segments.
- For this reason, the Hann window is applied again as described above.
- Hereinafter, the process from the input audio signal up to the second multiplication by the Hann window function will be described using a simplified input waveform.
- an audio signal waveform such as a graph 180 shown in FIG. 18 is inputted
- Next, an audio signal waveform like the graph 190 shown in FIG. 19 is generated by the first Hann window function multiplication, and the process proceeds to the discrete Fourier transform of step S3 in FIG. 9.
- On the other hand, the waveform of the audio signal resulting from the inverse discrete Fourier transform processing in step S10 of FIG. 9 deviates from 0 at both end points, as in the graph 200 shown in FIG. 20.
- If left as they are, both end points become discontinuous points and are perceived as noise. Therefore, when the Hann window function is multiplied again as in step S11 of FIG. 9, an audio signal waveform whose end points are guaranteed to be 0 is obtained, as shown in the graph 210 of FIG. 21. Thus, the second Hann window function multiplication ensures that no discontinuity occurs.
- As a result, the waveform of the processing segment 170 in FIG. 17 has no discontinuous points like those in the graph 160 of FIG. 16; at the portions that were discontinuous in the graph 160 (the segment boundaries), the value is 0, the waveform is continuous, and the slope (differential value) also coincides.
- If the processing segment after the second Hann window function multiplication is multiplied by 2/3, the inverse of 3/2, and added to the audio signals of the previous processing segments (actually the first, second, and third previous processing segments), the original waveform can be completely restored. At that point, the waveform of the audio signal of the processing segment three before can be completely restored. In this way, if the processing segments after the second Hann window function multiplication, each multiplied by 2/3, are added while being shifted by 1/4 segment, the original waveform can be completely restored.
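The factor 3/2 arises because four Hann-squared windows shifted by 1/4 segment sum to the constant 3/2 at every sample, which can be verified numerically. A periodic Hann window is assumed here; with that definition the sum is exact.

```python
import numpy as np

M = 8192               # segment length (illustrative)
hop = M // 4           # shift of 1/4 segment
n = np.arange(M)
w = 0.5 * (1.0 - np.cos(2.0 * np.pi * n / M))  # periodic Hann window

# The input is windowed once before the DFT and once after the inverse
# DFT, so each sample of the overlap-added output carries a factor of
# w**2 from each of the four overlapping segments; their sum is 3/2.
cola = w**2 + np.roll(w, hop)**2 + np.roll(w, 2 * hop)**2 + np.roll(w, 3 * hop)**2
print(np.allclose(cola, 1.5))  # True

# Hence multiplying the fully overlapped output by 2/3 restores the
# original amplitude exactly (perfect reconstruction in steady state).
print(np.allclose((2.0 / 3.0) * cola, 1.0))  # True
```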
- Alternatively, the processing segments after the second Hann window function multiplication may be added while shifting by 1/4 segment, and each processing segment for which the addition is complete (the processing segment three before) may then be multiplied by 2/3; in this case as well, the original signal is completely restored.
- Note that the multiplication by 2/3 need not necessarily be executed; omitting it merely increases the amplitude.
- FIG. 22 is a schematic diagram for explaining the processing in the case where the shift width is 1/2 segment and the window function calculation is performed only once
- FIG. 23 is a schematic diagram for explaining the processing of the present invention (the case where the shift width is 1/4 segment and the window function multiplication is performed twice).
- the first window function calculation is performed on the input waveform 231 that is the same as the input waveform 221 with a shift width of 1/4 segment, and the discrete Fourier transform is performed.
- a second window function calculation is performed.
- Each of the segment waveforms 232, 233, 234, and 235, obtained by performing the second window function multiplication on each segment waveform after the inverse discrete Fourier transform, always has both ends equal to 0, as shown in the graph 210 of FIG. 21.
- the segment waveforms 234, 233, and 232 correspond to the waveforms of the previous 1, 2, and 3 processing segments, respectively. Since the output waveform 236 is obtained by adding the segment waveforms 232, 233, 234, and 235, no discontinuity occurs in the output waveform 236 even by the addition.
- the discontinuous points of the waveform are removed from the speech signal after the inverse discrete Fourier transform by performing the second Hann window function multiplication processing. Therefore, according to the present invention, an audio signal for a multi-channel system such as 2ch or 5.1ch is converted into an audio signal for reproduction by a wavefront synthesis reproduction system without generating noise due to discontinuous points. It becomes possible to do.
- In addition, since the audio signal can be converted into an audio signal to be reproduced by the wavefront synthesis reproduction method without generating noise, a listener at any position, which is a feature of the wavefront synthesis reproduction method, can also enjoy the effect of sound image localization as intended by the content creator.
- Note that the audio signal after the inverse discrete Fourier transform to be processed by the window function multiplication unit 83 may also be an audio signal obtained by scaling the correlation signal, or the correlation signal and the uncorrelated signal, in the time domain or the frequency domain as illustrated in the equations.
- The audio signal conversion processing according to the present invention has been described with an example in which the input audio signal is a 2ch audio signal, but it can also be applied to other multi-channel audio signals.
- a 5.1ch input audio signal will be described as an example with reference to FIG. 24, but the present invention can be similarly applied to other multi-channel input audio signals.
- FIG. 24 is a schematic diagram for explaining an example of the positional relationship between a speaker group to be used and a virtual sound source when a 5.1ch audio signal is reproduced by the wavefront synthesis reproduction method.
- 5.1ch speakers are arranged as shown in FIG. 2, and three speakers 21L, 22C, and 21R are arranged in front of the listener.
- Among these, the so-called center channel at the front center is often used for applications such as human speech. That is, it is rare that sound pressure control is performed so as to generate a synthesized sound image between the center channel and the left channel, or between the center channel and the right channel.
- Therefore, as in the positional relationship 240 shown in FIG. 24, the input audio signals to the 5.1ch front left and right speakers 242a and 242c are converted by this method (the audio signal conversion processing according to the present invention), the audio signal of the center channel (center speaker channel) is treated as a virtual sound source as it is, and the output audio signals are reproduced as sound images for the virtual sound sources by the speaker array 241 using the wavefront synthesis reproduction method.
- As for the 5.1ch rear channels, speakers 242d and 242e may be installed behind the listener, and the signals may be output from them without any change.
- In this way, the present invention performs the audio signal conversion processing described above on any two input audio signals among the multi-channel input audio signals to generate an audio signal to be reproduced by the wavefront synthesis reproduction method, and the input audio signals of the remaining channels are added to the generated audio signal and output.
- an adder may be provided in the audio output signal generator 84.
- FIGS. 25 to 27 are diagrams showing configuration examples of television apparatuses provided with the audio data reproduction device of FIG. 7; FIGS. 28 and 29 are diagrams showing configuration examples of video projection systems provided with the audio data reproduction device of FIG. 7; FIG. 30 is a diagram showing a configuration example of a system including a television board (TV stand) and a television device, provided with the audio data reproduction device of FIG. 7; and FIG. 31 is a diagram showing an example of an automobile provided with the audio data reproduction device of FIG. 7.
- In FIGS. 25 to 31, an example is shown in which eight speakers, indicated by LSP1 to LSP8, are arranged as the speaker array, but any plural number of speakers may be used.
- the audio signal conversion apparatus and the audio data reproduction apparatus including the same can be used for a television apparatus.
- the arrangement of these devices in the television device may be determined freely.
- a speaker group 252 in which the speakers LSP1 to LSP8 in the audio data reproducing device are arranged in a straight line may be provided below the television screen 251.
- a speaker group 262 in which the speakers LSP1 to LSP8 in the audio data reproducing device are arranged in a straight line may be provided above the television screen 261.
- Alternatively, a speaker group 272, in which the transparent film type speakers LSP1 to LSP8 of the audio data reproducing device are arranged in a straight line, may be embedded in the television screen 271.
- the audio signal conversion device according to the present invention and the audio data reproduction device provided with the same can be used in a video projection system.
- the speaker group 282 of the speakers LSP1 to LSP8 may be embedded in the projection screen 281b for projecting the video by the video projection device 281a.
- Alternatively, a speaker group 292 in which the speakers LSP1 to LSP8 are arranged may be placed behind the sound-transmitting screen 291b onto which the video projection device 291a projects an image.
- Further, the audio signal conversion apparatus according to the present invention and the audio data reproduction apparatus including the same can be embedded in a TV stand (TV board). As in the system (home theater system) 300 shown in FIG. 30, a speaker group 302b in which speakers LSP1 to LSP8 are arranged may be embedded in a TV stand 302a on which the TV device 301 is mounted. Furthermore, the audio signal conversion device according to the present invention and the audio data reproduction device including the same can also be applied to car audio. As in the automobile 310 shown in FIG. 31, a speaker group 312 in which speakers LSP1 to LSP8 are arranged in a curved line may be embedded in the dashboard inside the vehicle.
- It is also possible to provide a switching unit that switches whether or not to perform the conversion processing (the processing in the audio signal processing unit 73 in FIGS. 7 and 8) by a user operation, such as operation of a button provided on the apparatus main body or of a remote controller.
- When the conversion processing is not performed, the 2ch audio data may be reproduced by arranging the virtual sound sources as shown in FIG. 14, or may be reproduced in another manner.
- Similarly, 5.1ch audio data may be assigned to three virtual sound sources, or may be reproduced using only one or two speakers at both ends and in the middle.
- any method may be used as long as it includes a speaker array (a plurality of speakers) and outputs a sound image for a virtual sound source from those speakers.
- The precedence effect refers to the effect that, when the same sound is played from multiple sound sources and the sounds reaching the listener from the sound sources have small time differences, the sound image is localized in the direction of the sound source whose sound arrives first. If this effect is used, a sound image can be perceived at the virtual sound source position.
- the audio signal conversion apparatus has been described on the assumption that the audio signal for the multi-channel method is converted into an audio signal for reproduction by the wavefront synthesis reproduction method.
- the present invention can be similarly applied to the case of converting to a multi-channel audio signal (the number of channels may be the same or different).
- the converted audio signal may be an audio signal to be reproduced by a speaker group including at least a plurality of speakers, although the arrangement is not limited. This is because even in the case of such conversion, the DC component may be ignored in order to perform the discrete Fourier transform / inverse transform as described above and obtain a correlation signal.
- For example, a method can be considered in which each of the signals extracted for each virtual sound source is associated with one speaker and is output and reproduced normally, instead of by the wavefront synthesis reproduction method. Further, various reproduction methods can be considered, such as assigning the uncorrelated signals on both sides to different speakers installed at the sides and the rear.
- Each component of the audio signal conversion apparatus, such as each component in the audio signal processing unit 73 illustrated in FIG. 8, can be realized by hardware such as a microprocessor (or DSP: Digital Signal Processor), a memory, a bus, an interface, and peripheral devices, and by software executable on such hardware.
- Part or all of the hardware can be implemented as an integrated circuit (IC) chip set, in which case the software may be stored in the memory. Alternatively, all the components of the present invention may be configured by hardware, and in that case as well, part or all of the hardware can be implemented as an IC chip set.
- The object of the present invention is also achieved by supplying a recording medium, on which the program code of software realizing the functions of the various configuration examples described above is recorded, to a device such as a general-purpose computer serving as the audio signal conversion device, and by having the microprocessor or DSP in that device execute the program code. In this case, the software program code itself realizes the functions of the various configuration examples described above, and the present invention can be configured by the program code itself, or by the recording medium (external recording medium or internal storage device) on which it is recorded, with the control side reading and executing the code.
- Examples of the external recording medium include various media such as an optical disk such as a CD-ROM or a DVD-ROM and a nonvolatile semiconductor memory such as a memory card.
- Examples of the internal storage device include various devices such as a hard disk and a semiconductor memory.
- the program code can be downloaded from the Internet and executed, or received from a broadcast wave and executed.
- As described above, the present invention converts a multi-channel input audio signal into an audio signal to be reproduced by a speaker group, and may also take the form of an audio signal conversion method.
- This audio signal conversion method has the following conversion step, extraction step, inverse conversion step, and window function multiplication step.
- In the conversion step, the conversion unit reads each of the input audio signals of the two channels while shifting by 1/4 of the length of the processing segment, multiplies the read audio signal of the processing segment by the Hann window function, and then performs a discrete Fourier transform.
- the extraction step is a step in which the correlation signal extraction unit extracts a correlation signal by ignoring a direct current component of the audio signals of the two channels after the discrete Fourier transform in the conversion step.
- In the inverse transform step, the inverse transform unit performs an inverse discrete Fourier transform on the correlation signal extracted in the extraction step, or on the correlation signal and the uncorrelated signal, or on an audio signal generated from the correlation signal, or on an audio signal generated from the correlation signal and the uncorrelated signal.
- In the window function multiplication step, the window function multiplication unit again multiplies the audio signal of the processing segment among the audio signals after the inverse discrete Fourier transform in the inverse transform step by the Hann window function, shifts it by 1/4 of the length of the processing segment, and adds it to the audio signal of the previous processing segment.
- Other application examples are the same as those described for the audio signal converter, and the description thereof is omitted.
- the program code itself is a program for causing a computer to execute this audio signal conversion method.
- That is, this program causes a computer to execute: a conversion step of reading each of the input audio signals of the two channels while shifting by 1/4 of the length of the processing segment, multiplying the read audio signal of the processing segment by the Hann window function, and then performing a discrete Fourier transform; an extraction step of extracting a correlation signal while ignoring the DC component of the two-channel audio signals after the discrete Fourier transform in the conversion step; an inverse transform step of performing an inverse discrete Fourier transform on the correlation signal extracted in the extraction step, or on the correlation signal and the uncorrelated signal, or on an audio signal generated from the correlation signal, or on an audio signal generated from the correlation signal and the uncorrelated signal; and a window function multiplication step of again multiplying the audio signal of the processing segment among the audio signals after the inverse discrete Fourier transform in the inverse transform step by the Hann window function, shifting it by 1/4 of the length of the processing segment, and adding it to the audio signal of the previous processing segment.
Abstract
An audio signal conversion device (exemplified by an audio signal processing section (73)) includes: a conversion section that reads out each input audio signal in two channels while shifting by one fourth the length of a processing segment for each read-out, multiplies the read audio signals in the processing segment by the Hann window function, and then applies a discrete Fourier transform; an extraction section that ignores the DC components of the audio signals in the two channels, to which the discrete Fourier transform has been applied, and extracts correlation signals; an inverse conversion section that applies an inverse discrete Fourier transform to the correlation signals extracted by the extraction section or audio signals generated from the correlation signals; and a window function multiplication section (83) that multiplies again the audio signals in the processing segment among the audio signals to which the inverse discrete Fourier transform has been applied, by the Hann window function, shifts the resultant signals by one fourth the length of the processing segment, and adds the signals to the audio signals in the prior processing segment. As a result, this audio signal conversion device can convert audio signals in multiple channels without generating noise caused by discontinuous points.
Description
The present invention relates to an audio signal conversion apparatus, method, program, and recording medium for converting an audio signal for a multi-channel playback method.
Conventionally proposed sound reproduction systems include the stereo (2ch) system and the 5.1ch surround system (ITU-R BS.775-1), which are widely used in consumer applications. The 2ch system, as schematically illustrated in FIG. 1, generates different audio data from the left speaker 11L and the right speaker 11R. The 5.1ch surround system, as schematically illustrated in FIG. 2, inputs and outputs different audio data to each of the left front speaker 21L, the right front speaker 21R, the center speaker 22C arranged between them, the left rear speaker 23L, the right rear speaker 23R, and a subwoofer (not shown) dedicated to the low frequency range (generally 20 Hz to 100 Hz).
In addition to the 2ch and 5.1ch surround systems, various sound reproduction systems such as 7.1ch, 9.1ch, and 22.2ch have been proposed. In all of the systems described above, the speakers are arranged on a circumference or a spherical surface centered on the listener, and it is considered preferable to listen at the listening position ideally equidistant from all speakers, the so-called sweet spot. For example, it is preferable to listen at the sweet spot 12 in the 2ch system and at the sweet spot 24 in the 5.1ch surround system. When listening at the sweet spot, the synthesized sound image based on the sound pressure balance is localized where the producer intended. Conversely, when listening at a position other than the sweet spot, the sound image and sound quality generally deteriorate. Hereinafter, these systems are collectively referred to as multi-channel reproduction systems.
一方、マルチチャネル再生方式とは別に、音源オブジェクト指向再生方式もある。この方式は、全ての音が、いずれかの音源オブジェクトが発する音であるとする方式であり、各音源オブジェクト(以下、「仮想音源」と呼ぶ。)が自身の位置情報と音声信号とを含んでいる。音楽コンテンツを例にとると、各仮想音源は、それぞれの楽器の音と楽器が配置されている位置情報とを含む。
On the other hand, apart from the multi-channel reproduction systems, there is also a sound source object-oriented reproduction system. In this system, every sound is regarded as emitted by some sound source object, and each sound source object (hereinafter referred to as a "virtual sound source") contains its own position information and an audio signal. Taking music content as an example, each virtual sound source contains the sound of a musical instrument and the position information of where that instrument is placed.
そして、音源オブジェクト指向再生方式は、通常、直線状あるいは面状に並べたスピーカ群によって音の波面を合成する再生方式(すなわち波面合成再生方式)により再生される。このような波面合成再生方式のうち、非特許文献1に記載のWave Field Synthesis(WFS)方式は、直線状に並べたスピーカ群(以下、スピーカアレイという)を用いる現実的な実装方法の1つとして近年盛んに研究されている。
The sound source object-oriented reproduction system is usually reproduced by a method that synthesizes sound wavefronts with a group of speakers arranged in a line or a plane (that is, a wavefront synthesis reproduction system). Among such wavefront synthesis reproduction systems, the Wave Field Synthesis (WFS) system described in Non-Patent Document 1 has been actively studied in recent years as one practical implementation that uses a linearly arranged group of speakers (hereinafter referred to as a speaker array).
このような波面合成再生方式は、上述のマルチチャネル再生方式とは異なり、図3で模式的に図示したように、並べられたスピーカ群31の前のどの位置で聴いている受聴者に対しても、良好な音像と音質を両方同時に提示することができるという特長を持つ。つまり、波面合成再生方式でのスイートスポット32は図示するように幅広くなっている。
Unlike the multi-channel reproduction systems described above, such a wavefront synthesis reproduction system, as schematically illustrated in FIG. 3, can simultaneously present both a good sound image and good sound quality to a listener at any position in front of the arranged speaker group 31. In other words, the sweet spot 32 in the wavefront synthesis reproduction system is wide, as shown in the figure.
また、WFS方式によって提供される音響空間内においてスピーカアレイと対面して音を聴いている受聴者は、実際にはスピーカアレイから放射される音が、スピーカアレイの後方に仮想的に存在する音源(仮想音源)から放射されているかのような感覚を受ける。
In addition, a listener facing the speaker array in the acoustic space provided by the WFS system perceives the sound actually radiated from the speaker array as if it were radiated from a sound source that virtually exists behind the speaker array (a virtual sound source).
この波面合成再生方式では、仮想音源を表す入力信号を必要とする。そして、一般的に、1つの仮想音源には1チャネル分の音声信号とその仮想音源の位置情報が含まれることを必要とする。上述の音楽コンテンツを例にとると、例えば楽器毎に録音された音声信号とその楽器の位置情報ということになる。ただし、仮想音源それぞれの音声信号は必ずしも楽器毎である必要はないが、コンテンツ製作者が意図するそれぞれの音の到来方向と大きさが、仮想音源という概念を用いて表現されている必要がある。
This wavefront synthesis reproduction system requires an input signal representing virtual sound sources. In general, one virtual sound source needs to include one channel of audio signal and the position information of that virtual sound source. Taking the music content described above as an example, this would be an audio signal recorded for each musical instrument and the position information of that instrument. The audio signal of each virtual sound source does not necessarily have to correspond to one instrument, but the arrival direction and magnitude of each sound intended by the content creator must be expressed using the concept of virtual sound sources.
ここで、前述のマルチチャネル方式の中でも最も広く普及している方式はステレオ(2ch)方式であるため、ステレオ方式の音楽コンテンツについて考察する。図4に示すように2つのスピーカ41L,41Rを用いて、ステレオ方式の音楽コンテンツにおけるL(左)チャネルとR(右)チャネルの音声信号を、それぞれ左に設置したスピーカ41L、右に設置したスピーカ41Rで再生する。このような再生を行うと、図4に示すように、各スピーカ41L,41Rから等距離の地点、すなわちスイートスポット43で聴く場合にのみ、ボーカルの声とベースの音が真ん中の位置42bから聞こえ、ピアノの音が左側の位置42a、ドラムの音が右側の位置42cなど、製作者が意図したように音像が定位して聞こえる。
Here, since the most widespread of the aforementioned multi-channel systems is the stereo (2ch) system, stereo music content is considered. As shown in FIG. 4, the L (left) channel and R (right) channel audio signals of stereo music content are reproduced with two speakers 41L and 41R, installed on the left and on the right, respectively. With such reproduction, as shown in FIG. 4, only when listening at a point equidistant from the speakers 41L and 41R, that is, at the sweet spot 43, are the sound images localized as the producer intended: the vocals and the bass are heard from the middle position 42b, the piano from the left position 42a, and the drums from the right position 42c.
このようなコンテンツを波面合成再生方式で再生し、波面合成再生方式の特長である、どの位置の受聴者に対してもコンテンツ製作者の意図通りの音像定位を提供することを考える。そのためには、図5で示すスイートスポット53のように、どの視聴位置からでも図4のスイートスポット43内で聴いたときの音像が知覚できなければならない。つまり、直線状あるいは面状に並べられたスピーカ群51によって、広いスイートスポット53で、ボーカルの声とベースの音が真ん中の位置52bから聞こえ、ピアノの音が左側の位置52a、ドラムの音が右側の位置52cなど、製作者が意図したように音像が定位して聞こえなければならない。
Consider reproducing such content with the wavefront synthesis reproduction system and providing, to a listener at any position, the sound image localization intended by the content producer, which is the distinctive feature of that system. For this purpose, the sound image heard within the sweet spot 43 of FIG. 4 must be perceivable from any listening position, as in the sweet spot 53 shown in FIG. 5. That is, with the speaker group 51 arranged in a line or a plane, the sound images must be localized as the producer intended throughout the wide sweet spot 53: the vocals and the bass heard from the middle position 52b, the piano from the left position 52a, and the drums from the right position 52c.
その課題に対し、例えば、図6のようにLチャネルの音、Rチャネルの音をそれぞれ仮想音源62a,62bとして配置した場合を考える。この場合、L/Rチャネルそれぞれが単体で1つの音源を表すのではなく2つのチャネルによって合成音像を生成するものであるから、それを波面合成再生方式で再生したとしても、やはりスイートスポット63が生成されてしまい、スイートスポット63の位置でしか、図4のような音像定位はしない。つまり、そのような音像定位を実現するには、2chのステレオデータから、何らかの手段によって音像毎の音声に分離し、各音声から仮想音源データを生成することが必要となる。
To address this problem, consider, for example, arranging the L channel sound and the R channel sound as virtual sound sources 62a and 62b, as shown in FIG. 6. In this case, the L and R channels do not each represent a single sound source by themselves; rather, the two channels together generate a synthesized sound image. Even if they are reproduced by the wavefront synthesis reproduction system, a sweet spot 63 is therefore still produced, and the sound image localization of FIG. 4 is obtained only at the position of the sweet spot 63. In other words, to realize such sound image localization, the 2ch stereo data must be separated by some means into the audio of each sound image, and virtual sound source data must be generated from each of these audio signals.
この課題に対し、特許文献1に記載の方法では、2chステレオデータを周波数帯域毎に信号のパワーの相関係数を基に相関信号と無相関信号とに分離し、相関信号については合成音像方向を推定し、それらの結果から仮想音源を生成し、その際に生じる波形の不連続点を除去する処理を行っている。
To address this problem, the method described in Patent Document 1 separates 2ch stereo data, for each frequency band, into a correlated signal and an uncorrelated signal based on the correlation coefficient of the signal powers, estimates the direction of the synthesized sound image for the correlated signal, generates virtual sound sources from these results, and performs processing to remove the waveform discontinuities that arise in doing so.
しかしながら、特許文献1に記載の方法では、人間の音声以外の信号に対しても、ゼロ交差回数を数えることによって子音部分であるか否かの判定処理を行い、子音部分以外に対して波形が連続するようにバイアス値を加えている。ここで、楽音信号などは、子音部分のようにホワイトノイズに近いような成分と、それ以外の成分が混ざり合っているのが普通である。また、喩え人間の音声であっても、濁音など、子音と母音の間のような特性を持つ部分も沢山ある。この方法では、そのような音声信号を、ゼロ交差回数だけで、バイアス値を加算するか否かの判定を行うため、当然、誤判定が生じ、その部分については生成した音声信号波形に不連続点が含まれ、ノイズとして知覚されてしまう。
However, the method described in Patent Document 1 determines, even for signals other than human speech, whether a portion is a consonant by counting the number of zero crossings, and adds a bias value to the portions other than consonants so that the waveform becomes continuous. A musical signal is usually a mixture of components close to white noise, like consonants, and other components. Even in human speech there are many portions, such as voiced consonants, whose characteristics lie between those of consonants and vowels. Since this method decides whether or not to add the bias value based only on the number of zero crossings, misjudgments naturally occur, and for those portions the generated audio signal waveform contains discontinuities that are perceived as noise.
本発明は、上述のような実状に鑑みてなされたものであり、その目的は、2chや5.1ch等のマルチチャネル方式用の音声信号を、不連続点に起因するノイズを発生させることなく変換することが可能な音声信号変換装置、方法、プログラム、及び記録媒体を提供することにある。
The present invention has been made in view of the above circumstances, and its object is to provide an audio signal conversion device, method, program, and recording medium capable of converting an audio signal for a multi-channel system such as 2ch or 5.1ch without generating noise caused by discontinuities.
上述したような課題を解決するために、本発明の第1の技術手段は、マルチチャネルの入力音声信号を、スピーカ群によって再生させるための音声信号に変換する音声信号変換装置であって、2つのチャネルの入力音声信号のそれぞれについて、処理セグメントの長さの1/4ずつずらしながら読み出し、読み出した処理セグメントの音声信号に対し、Hann窓関数を乗算した後、離散フーリエ変換を施す変換部と、該変換部で離散フーリエ変換後の2つのチャネルの音声信号について、直流成分を無視して相関信号を抽出する相関信号抽出部と、該相関信号抽出部で抽出された相関信号または該相関信号及び無相関信号に対して、もしくは前記相関信号から生成された音声信号に対して、もしくは前記相関信号及び前記無相関信号から生成された音声信号に対して、離散フーリエ逆変換を施す逆変換部と、該逆変換部で離散フーリエ逆変換後の音声信号のうち処理セグメントの音声信号に対し、再びHann窓関数を乗算し、処理セグメントの長さの1/4だけずらして、前の処理セグメントの音声信号に加算する窓関数乗算部と、を備えたことを特徴としたものである。
To solve the problems described above, a first technical means of the present invention is an audio signal conversion device that converts a multi-channel input audio signal into audio signals to be reproduced by a group of speakers, comprising: a transform unit that reads each of the input audio signals of two channels while shifting by 1/4 of the length of a processing segment, multiplies the audio signal of the read processing segment by a Hann window function, and then applies a discrete Fourier transform; a correlation signal extraction unit that extracts a correlation signal from the two channels of audio signals after the discrete Fourier transform by the transform unit, ignoring the DC component; an inverse transform unit that applies an inverse discrete Fourier transform to the correlation signal extracted by the correlation signal extraction unit, or to the correlation signal and the uncorrelated signal, or to an audio signal generated from the correlation signal, or to an audio signal generated from the correlation signal and the uncorrelated signal; and a window function multiplication unit that multiplies the audio signal of the processing segment, among the audio signals after the inverse discrete Fourier transform by the inverse transform unit, by the Hann window function again, shifts it by 1/4 of the length of the processing segment, and adds it to the audio signal of the preceding processing segment.
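The processing chain recited in this first technical means (1/4-segment shift, Hann window, discrete Fourier transform, inverse transform, second Hann window, overlap-add) can be sketched as follows. This is a minimal numpy illustration, not the claimed implementation: the correlation-signal separation between the forward and inverse transforms is omitted, so the chain simply passes each segment through unchanged.

```python
import numpy as np

def hann(M):
    # w(m) = sin^2((m/M)*pi): the Hann window used for both analysis and synthesis
    m = np.arange(M)
    return np.sin(np.pi * m / M) ** 2

def analysis_synthesis(x, M=1024):
    """Window -> DFT -> (separation omitted) -> inverse DFT -> window ->
    overlap-add, with a hop of M/4 as recited in the claims."""
    hop = M // 4
    w = hann(M)
    y = np.zeros(len(x))
    for start in range(0, len(x) - M + 1, hop):
        seg = x[start:start + M] * w      # first Hann window (analysis)
        X = np.fft.fft(seg)               # discrete Fourier transform
        # ... correlation extraction / virtual-source generation would go here ...
        seg2 = np.fft.ifft(X).real        # inverse discrete Fourier transform
        y[start:start + M] += seg2 * w    # second Hann window + overlap-add
    return y

rng = np.random.default_rng(0)
x = rng.standard_normal(8192)
y = analysis_synthesis(x)
# Away from the edges, every sample receives four sin^4 contributions that
# sum to the constant 3/2, so y equals 1.5 * x there: the chain introduces
# no discontinuities, only a uniform scale factor.
interior = slice(1024, 8192 - 1024)
print(bool(np.allclose(y[interior], 1.5 * x[interior])))  # True
```

With a separation step inserted between the forward and inverse transforms, the same constant-overlap property keeps segment boundaries free of waveform discontinuities.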
第2の技術手段は、第1の技術手段において、前記窓関数乗算部で処理対象となる前記離散フーリエ逆変換後の音声信号は、前記相関信号または前記相関信号及び前記無相関信号に対して、時間領域あるいは周波数領域においてスケーリング処理が施された後の音声信号とすることを特徴としたものである。
According to a second technical means, in the first technical means, the audio signal after the inverse discrete Fourier transform to be processed by the window function multiplication unit is an audio signal obtained after scaling processing has been applied, in the time domain or the frequency domain, to the correlation signal or to the correlation signal and the uncorrelated signal.
第3の技術手段は、マルチチャネルの入力音声信号を、スピーカ群によって再生させるための音声信号に変換する音声信号変換方法であって、変換部が、2つのチャネルの入力音声信号のそれぞれについて、処理セグメントの長さの1/4ずつずらしながら読み出し、読み出した処理セグメントの音声信号に対し、Hann窓関数を乗算した後、離散フーリエ変換を施す変換ステップと、相関信号抽出部が、前記変換ステップで離散フーリエ変換後の2つのチャネルの音声信号について、直流成分を無視して相関信号を抽出する抽出ステップと、逆変換部が、前記抽出ステップで抽出された相関信号または該相関信号及び無相関信号に対して、もしくは前記相関信号から生成された音声信号に対して、もしくは前記相関信号及び前記無相関信号から生成された音声信号に対して、離散フーリエ逆変換を施す逆変換ステップと、窓関数乗算部が、前記逆変換ステップで離散フーリエ逆変換後の音声信号のうち処理セグメントの音声信号に対し、再びHann窓関数を乗算し、処理セグメントの長さの1/4だけずらして、前の処理セグメントの音声信号に加算する窓関数乗算ステップと、を有することを特徴としたものである。
A third technical means is an audio signal conversion method for converting a multi-channel input audio signal into audio signals to be reproduced by a group of speakers, comprising: a transform step in which a transform unit reads each of the input audio signals of two channels while shifting by 1/4 of the length of a processing segment, multiplies the audio signal of the read processing segment by a Hann window function, and then applies a discrete Fourier transform; an extraction step in which a correlation signal extraction unit extracts a correlation signal from the two channels of audio signals after the discrete Fourier transform in the transform step, ignoring the DC component; an inverse transform step in which an inverse transform unit applies an inverse discrete Fourier transform to the correlation signal extracted in the extraction step, or to the correlation signal and the uncorrelated signal, or to an audio signal generated from the correlation signal, or to an audio signal generated from the correlation signal and the uncorrelated signal; and a window function multiplication step in which a window function multiplication unit multiplies the audio signal of the processing segment, among the audio signals after the inverse discrete Fourier transform in the inverse transform step, by the Hann window function again, shifts it by 1/4 of the length of the processing segment, and adds it to the audio signal of the preceding processing segment.
第4の技術手段は、コンピュータに、2つのチャネルの入力音声信号のそれぞれについて、処理セグメントの長さの1/4ずつずらしながら読み出し、読み出した処理セグメントの音声信号に対し、Hann窓関数を乗算した後、離散フーリエ変換を施す変換ステップと、該変換ステップで離散フーリエ変換後の2つのチャネルの音声信号について、直流成分を無視して相関信号を抽出する抽出ステップと、該抽出ステップで抽出された相関信号または該相関信号及び無相関信号に対して、もしくは前記相関信号から生成された音声信号に対して、もしくは前記相関信号及び前記無相関信号から生成された音声信号に対して、離散フーリエ逆変換を施す逆変換ステップと、該逆変換ステップで離散フーリエ逆変換後の音声信号のうち処理セグメントの音声信号に対し、再びHann窓関数を乗算し、処理セグメントの長さの1/4だけずらして、前の処理セグメントの音声信号に加算する窓関数乗算ステップと、を実行させるためのプログラムである。
A fourth technical means is a program for causing a computer to execute: a transform step of reading each of the input audio signals of two channels while shifting by 1/4 of the length of a processing segment, multiplying the audio signal of the read processing segment by a Hann window function, and then applying a discrete Fourier transform; an extraction step of extracting a correlation signal from the two channels of audio signals after the discrete Fourier transform in the transform step, ignoring the DC component; an inverse transform step of applying an inverse discrete Fourier transform to the correlation signal extracted in the extraction step, or to the correlation signal and the uncorrelated signal, or to an audio signal generated from the correlation signal, or to an audio signal generated from the correlation signal and the uncorrelated signal; and a window function multiplication step of multiplying the audio signal of the processing segment, among the audio signals after the inverse discrete Fourier transform in the inverse transform step, by the Hann window function again, shifting it by 1/4 of the length of the processing segment, and adding it to the audio signal of the preceding processing segment.
第5の技術手段は、第4の技術手段におけるプログラムを記録したコンピュータ読み取り可能な記録媒体である。
A fifth technical means is a computer-readable recording medium on which the program of the fourth technical means is recorded.
本発明によれば、2chや5.1ch等のマルチチャネル方式用の音声信号を、不連続点に起因するノイズを発生させることなく変換することが可能になる。
According to the present invention, an audio signal for a multi-channel system such as 2ch or 5.1ch can be converted without generating noise caused by discontinuities.
本発明に係る音声信号変換装置は、マルチチャネル再生方式用の音声信号を、チャネル数の同じまたは異なるスピーカ群で再生するための音声信号や波面合成再生方式用の音声信号などに変換する装置であって、音声信号処理装置、音声データ変換装置などとも呼べ、音声データ再生装置に組み込むことができる。なお、音声信号とは、当然、いわゆる音声を記録した信号に限ったものではなく、音響信号とも呼べる。また、波面合成再生方式とは、上述したように直線状または面状に並べたスピーカ群によって音の波面を合成する再生方式である。
An audio signal conversion device according to the present invention is a device that converts an audio signal for a multi-channel reproduction system into, for example, an audio signal for reproduction by a speaker group with the same or a different number of channels, or an audio signal for a wavefront synthesis reproduction system. It can also be called an audio signal processing device or an audio data conversion device, and can be incorporated into an audio data reproduction device. Naturally, the audio signal is not limited to a signal in which so-called speech is recorded, and can also be called an acoustic signal. The wavefront synthesis reproduction system is, as described above, a reproduction system that synthesizes sound wavefronts with a group of speakers arranged in a line or a plane.
以下、図面を参照しながら、本発明に係る音声信号変換装置の構成例及び処理例について説明する。また、以下の説明では、まず、本発明に係る音声信号変換装置が、変換により波面合成再生方式用の音声信号を生成する例を挙げる。
Hereinafter, configuration examples and processing examples of the audio signal conversion device according to the present invention will be described with reference to the drawings. In the following description, an example is first given in which the audio signal conversion device according to the present invention generates an audio signal for the wavefront synthesis reproduction system by conversion.
図7は、本発明に係る音声信号変換装置を備えた音声データ再生装置の一構成例を示すブロック図で、図8は、図7の音声データ再生装置における音声信号処理部(本発明に係る音声信号変換装置)の一構成例を示すブロック図である。
FIG. 7 is a block diagram showing a configuration example of an audio data reproduction device including the audio signal conversion device according to the present invention, and FIG. 8 is a block diagram showing a configuration example of the audio signal processing unit (the audio signal conversion device according to the present invention) in the audio data reproduction device of FIG. 7.
図7で例示する音声データ再生装置70は、デコーダ71、音声信号抽出部72、音声信号処理部73、D/Aコンバータ74、増幅器群75、そしてスピーカ群76から構成される。デコーダ71は、音声のみあるいは音声付き映像のコンテンツを復号化し、信号処理可能な形式に変換し音声信号抽出部72に出力する。そのコンテンツは、放送局から送信されたデジタル放送のコンテンツや、ネットワークを介してディジタルコンテンツを配信するサーバからインターネットからダウンロードしたり、あるいは外部記憶装置等の記録媒体から読み込んだりすることによって取得する。このように、図7では図示しないが、音声データ再生装置70は、マルチチャネルの入力音声信号を含むディジタルコンテンツを入力するディジタルコンテンツ入力部を備える。デコーダ71は、ここで入力されたディジタルコンテンツを復号化することになる。音声信号抽出部72では、得られた信号から音声信号を分離、抽出する。ここではそれは2chステレオ信号とする。その2チャネル分の信号を音声信号処理部73に出力する。
The audio data reproduction device 70 illustrated in FIG. 7 comprises a decoder 71, an audio signal extraction unit 72, an audio signal processing unit 73, a D/A converter 74, an amplifier group 75, and a speaker group 76. The decoder 71 decodes audio-only content or video content with audio, converts it into a format that can be signal-processed, and outputs it to the audio signal extraction unit 72. The content is acquired as digital broadcast content transmitted from a broadcasting station, by downloading over the Internet from a server that distributes digital content via a network, or by reading it from a recording medium such as an external storage device. Thus, although not shown in FIG. 7, the audio data reproduction device 70 includes a digital content input unit that inputs digital content containing a multi-channel input audio signal, and the decoder 71 decodes the digital content input there. The audio signal extraction unit 72 separates and extracts the audio signal from the obtained signal; here it is assumed to be a 2ch stereo signal. The signals of the two channels are output to the audio signal processing unit 73.
音声信号処理部73では、得られた2チャネル信号から、3チャネル以上で且つ入力音声信号とは異なるマルチチャネルの音声信号(以下の例では、仮想音源数分の信号として説明する)を生成する。つまり入力音声信号を別のマルチチャネルの音声信号に変換する。音声信号処理部73は、その音声信号をD/Aコンバータ74に出力する。仮想音源の数は、ある一定以上の数があれば予め決めておいても性能上差し支えはないが、仮想音源数が多くなるほど演算量も多くなる。そのため実装する装置の性能を考慮してその数を決定することが望ましい。ここの例では、その数を5として説明する。
From the obtained two-channel signals, the audio signal processing unit 73 generates multi-channel audio signals with three or more channels, different from the input audio signal (described in the following example as signals for the number of virtual sound sources). That is, it converts the input audio signal into another multi-channel audio signal, and outputs that audio signal to the D/A converter 74. The number of virtual sound sources may be fixed in advance without harming performance as long as it is at least a certain number, but the larger the number of virtual sound sources, the greater the amount of computation. It is therefore desirable to determine the number in consideration of the performance of the device on which it is implemented. In this example, the number is assumed to be 5.
D/Aコンバータ74では得られた信号をアナログ信号に変換し、それぞれの信号を増幅器75に出力する。各増幅器75では入力されたアナログ信号を拡声し各スピーカ76に伝送し、各スピーカ76から空間中に音として出力される。
The D/A converter 74 converts the obtained signals into analog signals and outputs each of them to the amplifier group 75. Each amplifier 75 amplifies its input analog signal and transmits it to the corresponding speaker 76, which outputs it into the space as sound.
この図における音声信号処理部の詳細な構成を図8に示す。音声信号処理部73は、窓関数乗算部81、音声信号分離抽出部82、窓関数乗算部83、及び音声出力信号生成部84から構成される。
FIG. 8 shows the detailed configuration of the audio signal processing unit of this figure. The audio signal processing unit 73 comprises a window function multiplication unit 81, an audio signal separation/extraction unit 82, a window function multiplication unit 83, and an audio output signal generation unit 84.
窓関数乗算部81は、2チャンネルの音声信号を読み出してHann窓関数を乗算し、音声信号分離抽出部82に出力する。音声信号分離抽出部82は、2チャネルの信号から各仮想音源に対応する音声信号を生成し、それを窓関数乗算部83に出力する。窓関数乗算部83では、得られた音声信号波形から知覚上ノイズとなる部分を除去し、ノイズ除去後の音声信号を音声出力信号生成部84に出力する。このように、窓関数乗算部83は雑音除去部として機能する。音声出力信号生成部84では、得られた音声信号から各スピーカに対応するそれぞれの出力音声信号波形を生成する。音声出力信号生成部84では、波面合成再生処理などの処理が施され、例えば、得られた各仮想音源用の音声信号を各スピーカに割り当て、スピーカ毎の音声信号を生成する。波面合成再生処理の一部は音声信号分離抽出部82で担ってもよい。
The window function multiplication unit 81 reads the two-channel audio signal, multiplies it by the Hann window function, and outputs the result to the audio signal separation/extraction unit 82. The audio signal separation/extraction unit 82 generates an audio signal corresponding to each virtual sound source from the two-channel signals and outputs it to the window function multiplication unit 83. The window function multiplication unit 83 removes the portions that would be perceived as noise from the obtained audio signal waveforms and outputs the denoised audio signals to the audio output signal generation unit 84; in this way, the window function multiplication unit 83 functions as a noise removal unit. The audio output signal generation unit 84 generates, from the obtained audio signals, the output audio signal waveform for each speaker. Processing such as wavefront synthesis reproduction is applied there; for example, the obtained audio signals for the virtual sound sources are assigned to the speakers, and an audio signal is generated for each speaker. Part of the wavefront synthesis reproduction processing may also be performed by the audio signal separation/extraction unit 82.
次に、図9に従って、窓関数乗算部81、音声信号分離抽出部82、及び窓関数乗算部83での音声信号処理例を説明する。図9は、図8の音声信号処理部における音声信号処理の一例を説明するためのフロー図で、図10は、図8の音声信号処理部において音声データをバッファに蓄える様子を示す図である。図11は、Hann窓関数を示す図で、図12は、図9の音声信号処理における最初の窓関数乗算処理(窓関数乗算部81での窓関数乗算処理)において、1/4セグメントにつき1回乗算される窓関数を示す図である。
Next, an example of the audio signal processing in the window function multiplication unit 81, the audio signal separation/extraction unit 82, and the window function multiplication unit 83 will be described with reference to FIG. 9. FIG. 9 is a flowchart for explaining an example of the audio signal processing in the audio signal processing unit of FIG. 8, and FIG. 10 is a diagram showing how audio data is stored in the buffer in the audio signal processing unit of FIG. 8. FIG. 11 is a diagram showing the Hann window function, and FIG. 12 is a diagram showing the window function that is multiplied once per 1/4 segment in the first window function multiplication process (the process in the window function multiplication unit 81) of the audio signal processing of FIG. 9.
まず、窓関数乗算部81は、1セグメントの1/4の長さの音声データを、図7における音声信号抽出部72での抽出結果から読み出す(ステップS1)。ここで、音声データとは、例えば48kHzなどの標本化周波数で標本化された離散音声信号波形を指すものとする。そして、セグメントとは、ある一定の長さの標本点群からなる音声データ区間であり、ここでは後ほど離散フーリエ変換の対象となる区間長を指すものとし、処理セグメントとも呼ぶ。その値は例えば1024とする。この例では、1セグメントの1/4の長さである256点の音声データが読み出し対象となる。
First, the window function multiplication unit 81 reads audio data of 1/4 the length of one segment from the extraction result of the audio signal extraction unit 72 in FIG. 7 (step S1). Here, audio data refers to a discrete audio signal waveform sampled at a sampling frequency such as 48 kHz. A segment is an audio data section consisting of a fixed number of sample points; here it refers to the section length that will later be subjected to the discrete Fourier transform, and is also called a processing segment. Its value is, for example, 1024. In this example, 256 points of audio data, 1/4 of one segment, are read.
読み出した256点の音声データは図10で例示するようなバッファ100に蓄えられる。このバッファは、直前の1セグメント分の音声信号波形を保持しておけるようになっており、それより過去のセグメントは捨てていく。直前の3/4セグメント分のデータ(768点)と最新の1/4セグメント分のデータ(256点)を繋げて1セグメント分の音声データを作成し、窓関数演算(ステップS2)に進む。すなわち、全ての標本データは窓関数演算に4回読み込まれることになる。
The 256 points of audio data that have been read are stored in a buffer 100 as illustrated in FIG. 10. This buffer holds the audio signal waveform of the most recent full segment, and older samples are discarded. The data of the immediately preceding 3/4 segment (768 points) and the latest 1/4 segment of data (256 points) are concatenated to form one segment of audio data, and the process proceeds to the window function operation (step S2). That is, every sample is read into the window function operation four times.
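This quarter-segment buffering can be sketched as follows (a hypothetical illustration assuming numpy; the name `push_quarter` is invented here, not taken from the patent):

```python
import numpy as np

SEGMENT = 1024           # processing-segment length M
QUARTER = SEGMENT // 4   # 256 samples are read per step S1

def push_quarter(buf, new_samples):
    # Discard the oldest 1/4 segment and append the newest 256 samples,
    # so that the buffer always holds the most recent full segment (FIG. 10).
    assert len(new_samples) == QUARTER
    return np.concatenate([buf[QUARTER:], new_samples])

buf = np.zeros(SEGMENT)
stream = np.arange(2 * SEGMENT, dtype=float)  # two segments of input samples
for i in range(0, len(stream), QUARTER):
    buf = push_quarter(buf, stream[i:i + QUARTER])
    # here `buf` (one full segment) would be handed to the window stage

# After eight pushes the buffer holds samples 1024..2047; each input sample
# stays in the buffer for four iterations, so it reaches the window
# operation four times, as stated above.
print(int(buf[0]), int(buf[-1]))  # 1024 2047
```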
次に、窓関数乗算部81は、従来提案されている次のHann窓を1セグメント分の音声データに乗算する窓関数演算処理を実行する(ステップS2)。このHann窓は図11に窓関数110として図示したものである。
ここで、mは自然数、Mは1セグメント長で偶数とする。ステレオの入力信号をそれぞれxL(m)、xR(m)とすると、窓関数乗算後の音声信号x′L(m)、x′R(m)は、
Next, the window function multiplication unit 81 performs a window function operation that multiplies one segment of audio data by the following, conventionally proposed Hann window (step S2). This Hann window is shown as the window function 110 in FIG. 11:
w(m) = sin²((m/M)π) (1)
Here, m is a natural number and M, the length of one segment, is even. If the stereo input signals are xL(m) and xR(m), the audio signals x′L(m) and x′R(m) after the window function multiplication are computed as
x′L(m) = w(m)xL(m),
x′R(m) = w(m)xR(m) (2)
と計算される。このHann窓を用いると、例えば標本点m0(ただし、0≦m0<M/4)の入力信号xL(m0)にはsin2((m0/M)π)が乗算される。そして、その次の回の読み込みではその同じ標本点がm0+M/4として、その次にはm0+M/2として、その次にはm0+(3M)/4として読み込まれる。さらに、詳細は後述するが、この窓関数を、最後に再度演算する。したがって、上述の入力信号xL(m0)にはsin4((m0/M)π)が乗算されることになる。これを窓関数として図示すると図12に示す窓関数120のようになる。この窓関数120が、1/4セグメント毎にシフトされながら合計4回加算されるので、
Using this Hann window, the input signal xL(m0) at a sample point m0 (where 0 ≦ m0 < M/4), for example, is multiplied by sin²((m0/M)π). In the next read, the same sample point is read as m0 + M/4, next as m0 + M/2, and next as m0 + (3M)/4. Furthermore, as will be described in detail later, this window function is applied once more at the end, so the above input signal xL(m0) is ultimately multiplied by sin⁴((m0/M)π). Illustrated as a window function, this becomes the window function 120 shown in FIG. 12. Since this window function 120 is added a total of four times while being shifted by 1/4 segment,
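The constant-overlap property that this shifted summation relies on can be checked numerically. The following is a small sketch (assuming numpy; not part of the patent text): the four quarter-segment shifts of sin⁴((m/M)π) sum to the constant 3/2.

```python
import numpy as np

M = 1024
m = np.arange(M)
w4 = np.sin(np.pi * m / M) ** 4   # sin^4: analysis window times synthesis window

# sin^4((m/M)*pi) is periodic in m with period M, so np.roll realizes the
# quarter-segment shifts. The four shifted copies sum to the constant 3/2,
# which is why the overlap-add rescales the signal uniformly instead of
# modulating it.
total = sum(np.roll(w4, k * M // 4) for k in range(4))
print(float(total.min()), float(total.max()))  # both ≈ 1.5
```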
そうして得られた音声データを、次の数式(3)のように離散フーリエ変換し、周波数領域の音声データを得る(ステップS3)。なお、ステップS3~S10の処理は、音声信号分離抽出部82が行えばよい。ここで、DFTは離散フーリエ変換を表し、kは自然数で、0≦k<Mである。XL(k)、XR(k)は複素数となる。
XL(k) = DFT(x′L(n)),
XR(k) = DFT(x′R(n)) (3)
The audio data thus obtained is subjected to the discrete Fourier transform as in equation (3) above to obtain frequency-domain audio data (step S3). The processing of steps S3 to S10 may be performed by the audio signal separation/extraction unit 82. Here, DFT denotes the discrete Fourier transform, k is a natural number with 0 ≦ k < M, and XL(k) and XR(k) are complex numbers.
Next, the processing of steps S5 to S8 is executed for each line spectrum of the obtained frequency-domain audio data (steps S4a, S4b). The individual processes are described below. Here, an example is described in which processing such as obtaining a correlation coefficient is performed for each line spectrum; however, as described in Patent Document 1, such processing may instead be executed for each band (sub-band) obtained by division using the Equivalent Rectangular Band (ERB) scale.
Here, the line spectrum after the discrete Fourier transform is symmetric about M/2 (where M is even), except for the DC component, e.g., XL(0). That is, XL(k) and XL(M−k) are complex conjugates of each other in the range 0 < k < M/2. Therefore, in the following, only the range k ≤ M/2 is taken as the object of analysis, and the range k > M/2 is treated in the same way as the symmetric line spectra with which it is in the complex-conjugate relationship.
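The symmetry invoked here is the standard conjugate symmetry of the DFT of a real signal. A small self-contained check (not from the patent) illustrates why only the bins k ≤ M/2 need to be analysed:

```python
import cmath

M = 8  # small even DFT length for illustration
x = [0.5, -1.0, 2.0, 0.25, -0.75, 1.5, 0.0, -2.25]  # arbitrary real segment

def dft(x):
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * cmath.pi * k * m / n) for m in range(n))
            for k in range(n)]

X = dft(x)

# X(k) and X(M-k) are complex conjugates for 0 < k < M/2, so the upper half
# of the spectrum carries no independent information; X(0) is the DC bin.
for k in range(1, M // 2):
    assert abs(X[k] - X[M - k].conjugate()) < 1e-9
```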
Next, for each line spectrum, the correlation coefficient is acquired by calculating the normalized correlation coefficient of the left and right channels with the following formula (step S5).
This normalized correlation coefficient d(i) represents how strongly the audio signals of the left and right channels are correlated, and takes a real value between 0 and 1: it is 1 for identical signals and 0 for completely uncorrelated signals. Here, when the powers PL(i) and PR(i) of the left- and right-channel audio signals are both 0, extraction of the correlated and uncorrelated signals is impossible for that line spectrum, so no processing is performed and processing moves on to the next line spectrum. When exactly one of PL(i) and PR(i) is 0, formula (4) cannot be evaluated, but the normalized correlation coefficient is set to d(i) = 0 and processing of that line spectrum continues.
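Formula (4) itself is rendered only as an image in this source, so the definition used below is an assumption: a common form of normalized cross-correlation for complex spectra, |Σ XL·conj(XR)| / sqrt(PL·PR). The sketch focuses on the two edge cases the text specifies (both powers zero, and exactly one power zero):

```python
import math

def normalized_correlation(XL, XR):
    """Hedged sketch of the normalized correlation d(i) of formula (4).

    The exact formula (4) is not reproduced in this source, so a common
    form is assumed here.  Returns None when both powers are zero (the
    text skips such line spectra), and 0.0 when exactly one power is
    zero, as the text specifies.
    """
    PL = sum(abs(a) ** 2 for a in XL)
    PR = sum(abs(b) ** 2 for b in XR)
    if PL == 0 and PR == 0:
        return None           # extraction impossible: skip this line spectrum
    if PL == 0 or PR == 0:
        return 0.0            # formula not evaluable: treated as uncorrelated
    c = abs(sum(a * b.conjugate() for a, b in zip(XL, XR)))
    return c / math.sqrt(PL * PR)

# Identical signals give 1; a silent channel gives 0; two silent channels skip.
a = [1 + 2j, -0.5j, 3.0 + 0j]
assert abs(normalized_correlation(a, a) - 1.0) < 1e-9
assert normalized_correlation(a, [0j, 0j, 0j]) == 0.0
assert normalized_correlation([0j] * 3, [0j] * 3) is None
```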
Next, using this normalized correlation coefficient d(i), conversion coefficients for separating and extracting the correlated signal and the uncorrelated signals from the left- and right-channel audio signals are obtained (step S6), and using the conversion coefficients obtained in step S6, the correlated signal and the uncorrelated signals are separated and extracted from the left- and right-channel audio signals (step S7). Both the correlated signal and the uncorrelated signals may be extracted as estimated audio signals.
A processing example of steps S6 and S7 will now be described. Here, as in Patent Document 1, a model is adopted in which each of the left- and right-channel signals consists of an uncorrelated signal and a correlated signal, and in which the correlated signal is output from left and right as signal waveforms differing only in gain (that is, waveforms composed of the same frequency components). Here, the gain corresponds to the amplitude of the signal waveform and is a value related to sound pressure. In this model, the direction of the sound image synthesized by the correlated signals output from left and right is determined by the balance of their left and right sound pressures. According to this model, the input signals xL(m) and xR(m) are expressed as

xL(m) = s(m) + nL(m),
xR(m) = αs(m) + nR(m)   (8)

where s(m) is the correlated signal common to left and right; nL(m) is the left-channel audio signal minus the correlated signal s(m), definable as the (left-channel) uncorrelated signal; nR(m) is the right-channel audio signal minus α times the correlated signal s(m), definable as the (right-channel) uncorrelated signal; and α is a positive real number representing the degree of left/right sound-pressure balance of the correlated signal.
From formula (8), the audio signals x′L(m) and x′R(m) after the window-function multiplication of formula (2) are expressed by the following formula (9), where s′(m), n′L(m), and n′R(m) are s(m), nL(m), and nR(m) multiplied by the window function, respectively.

x′L(m) = w(m){s(m) + nL(m)} = s′(m) + n′L(m),
x′R(m) = w(m){αs(m) + nR(m)} = αs′(m) + n′R(m)   (9)
Applying the discrete Fourier transform to formula (9) yields the following formula (10), where S(k), NL(k), and NR(k) are the discrete Fourier transforms of s′(m), n′L(m), and n′R(m), respectively.

XL(k) = S(k) + NL(k),
XR(k) = αS(k) + NR(k)   (10)
Therefore, the audio signals XL(i)(k) and XR(i)(k) in the i-th line spectrum are expressed as

XL(i)(k) = S(i)(k) + NL(i)(k),
XR(i)(k) = α(i)S(i)(k) + NR(i)(k)   (11)

where α(i) denotes α in the i-th line spectrum. Hereinafter, the correlated signal S(i)(k) and the uncorrelated signals NL(i)(k) and NR(i)(k) in the i-th line spectrum are written as

S(i)(k) = S(k),
NL(i)(k) = NL(k),
NR(i)(k) = NR(k)   (12)
From formula (11), the sound pressures PL(i) and PR(i) of formula (7) are expressed as

PL(i) = PS(i) + PN(i),
PR(i) = [α(i)]^2 PS(i) + PN(i)   (13)

where PS(i) and PN(i) are the powers of the correlated signal and of the uncorrelated signal in the i-th line spectrum, respectively, and are expressed as in formula (14). Here, the sound pressures of the left and right uncorrelated signals are assumed to be equal.
Further, from formulas (5) to (7), formula (4) can be rewritten as formula (15). In this calculation, S(k), NL(k), and NR(k) are assumed to be mutually orthogonal, so that the power of any product between them is 0.
By solving formulas (13) and (15), the expressions of formulas (16) and (17) are obtained.
Using these values, the correlated signal and the uncorrelated signals in each line spectrum are estimated. Writing the estimate est(S(i)(k)) of the correlated signal S(i)(k) in the i-th line spectrum in terms of parameters μ1 and μ2 as

est(S(i)(k)) = μ1XL(i)(k) + μ2XR(i)(k)   (18)

the estimation error ε is expressed as

ε = est(S(i)(k)) − S(i)(k)   (19)

where est(A) denotes the estimate of A. Using the property that ε is orthogonal to each of XL(i)(k) and XR(i)(k) when the squared error ε^2 is minimized, the relations

E[ε·XL(i)(k)] = 0,  E[ε·XR(i)(k)] = 0   (20)

hold. Using formulas (11), (14), and (16) to (19), the following simultaneous equations are derived from formula (20):

(1 − μ1 − μ2α(i))PS(i) − μ1PN(i) = 0,
α(i)(1 − μ1 − μ2α(i))PS(i) − μ2PN(i) = 0   (21)
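Formula (22), the solution of the simultaneous equations (21), appears only as an image in this source, so the closed form below is a reconstruction derived directly from (21): dividing the second equation by the first gives μ2 = α(i)μ1, and substituting back gives μ1 = PS / ((1 + α^2)PS + PN). The sketch verifies numerically that these values satisfy both equations of (21):

```python
# Sketch: solve the simultaneous equations (21) for mu1 and mu2.
# (Formula (22) itself is an image in this source; this is a reconstruction.)

def solve_mu(PS, PN, alpha):
    mu1 = PS / ((1 + alpha ** 2) * PS + PN)
    mu2 = alpha * mu1  # follows from dividing the second equation by the first
    return mu1, mu2

PS, PN, alpha = 2.0, 0.5, 0.8
mu1, mu2 = solve_mu(PS, PN, alpha)

# Both equations of (21) should evaluate to zero at the solution:
eq1 = (1 - mu1 - mu2 * alpha) * PS - mu1 * PN
eq2 = alpha * (1 - mu1 - mu2 * alpha) * PS - mu2 * PN
assert abs(eq1) < 1e-12 and abs(eq2) < 1e-12
```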
By solving formula (21), each parameter is obtained as in formula (22). Here, the power Pest(S)(i) of the estimate est(S(i)(k)) obtained in this way must satisfy the following expression, obtained by squaring both sides of formula (18):

Pest(S)(i) = (μ1 + α(i)μ2)^2 PS(i) + (μ1^2 + μ2^2)PN(i)   (23)

so the estimate is scaled from this expression as in the next formula (24), where est′(A) denotes the scaled estimate of A.
The estimates est(NL(i)(k)) and est(NR(i)(k)) of the left- and right-channel uncorrelated signals NL(i)(k) and NR(i)(k) in the i-th line spectrum are written as

est(NL(i)(k)) = μ3XL(i)(k) + μ4XR(i)(k)   (25)
est(NR(i)(k)) = μ5XL(i)(k) + μ6XR(i)(k)   (26)

whereby, in the same manner as the derivation above, the parameters μ3 to μ6 are obtained.
The parameters μ1 to μ6 given by formulas (22), (27), and (28) and the scaling coefficients given by formulas (24), (29), and (30) correspond to the conversion coefficients obtained in step S6. In step S7, the correlated signal and the uncorrelated signals (the right-channel uncorrelated signal and the left-channel uncorrelated signal) are separated and extracted by estimation using these conversion coefficients (formulas (18), (25), and (26)).
Next, assignment to the virtual sound sources is performed (step S8). In this assignment process, as preprocessing, the direction of the synthesized sound image generated by the correlated signal estimated for each line spectrum is first estimated. This estimation process will be described with reference to FIGS. 13 to 15. FIG. 13 is a schematic diagram for explaining an example of the positional relationship among the listener, the left and right speakers, and the synthesized sound image; FIG. 14 is a schematic diagram for explaining an example of the positional relationship between the speaker group used in the wavefront synthesis reproduction method and the virtual sound sources; and FIG. 15 is a schematic diagram for explaining an example of the positional relationship among the virtual sound sources of FIG. 14, the listener, and the synthesized sound image.
Now, as in the positional relationship 130 shown in FIG. 13, let θ0 be the opening angle between the line drawn from the listener 133 to the midpoint of the left and right speakers 131L, 131R and the line drawn from the listener 133 to the center of either speaker 131L/131R, and let θ be the opening angle formed with the line drawn from the listener 133 to the position of the estimated synthesized sound image 132. Here, when the same audio signal is output from the left and right speakers 131L, 131R with the sound-pressure balance varied, it is generally known that the direction of the synthesized sound image 132 produced by the output sound can be approximated by the following formula using the aforementioned parameter α representing the sound-pressure balance (hereinafter referred to as the sine law of stereophony).
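Formula (31) appears only as an image in this source. The widely used form of the stereophonic sine law is sin θ / sin θ0 = (1 − α)/(1 + α) for channel gains 1 and α; the sketch below assumes that form (and its sign convention, with positive θ toward the gain-1 channel), so it is an illustration rather than the patent's exact formula:

```python
import math

def sound_image_direction(alpha, theta0):
    """Direction of the synthesized image via the assumed stereophonic sine law.

    sin(theta)/sin(theta0) = (1 - alpha)/(1 + alpha); positive theta points
    toward the channel with gain 1.  This form is an assumption, since
    formula (31) is an image in this source.
    """
    return math.asin((1 - alpha) / (1 + alpha) * math.sin(theta0))

theta0 = math.pi / 6  # 30-degree half opening angle, a typical stereo setup
assert sound_image_direction(1.0, theta0) == 0.0               # equal gains: center
assert abs(sound_image_direction(0.0, theta0) - theta0) < 1e-12  # one channel silent: at the speaker
```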
Here, in order to make a 2-channel stereo audio signal reproducible by the wavefront synthesis reproduction method, the audio signal separation/extraction unit 82 shown in FIG. 8 converts the 2-channel signal into a multi-channel signal. For example, when the number of channels after conversion is five, these channels are regarded as the virtual sound sources 142a to 142e of the wavefront synthesis reproduction method and placed behind the speaker group (speaker array) 141, as in the positional relationship 140 shown in FIG. 14. The spacing between adjacent virtual sound sources 142a to 142e is made uniform. The conversion here therefore turns the 2-channel audio signal into as many audio signals as there are virtual sound sources. As already described, the audio signal separation/extraction unit 82 first separates the 2-channel audio signal into one correlated signal and two uncorrelated signals for each line spectrum. The audio signal separation/extraction unit 82 must further decide in advance how those signals are to be assigned to the virtual sound sources (here, five virtual sound sources). The assignment method may be made user-selectable from among several methods, and the selectable methods presented to the user may be varied according to the number of virtual sound sources.
As one example of the assignment method, the following approach is taken. First, the left and right uncorrelated signals are assigned to the two ends of the five virtual sound sources (virtual sound sources 142a and 142e), respectively. Next, the synthesized sound image generated by the correlated signal is assigned to two adjacent virtual sound sources among the five. As to which two adjacent virtual sound sources are used, it is first assumed that the synthesized sound image generated by the correlated signal falls inside the two end virtual sound sources 142a and 142e; that is, the five virtual sound sources 142a to 142e are arranged so as to fall within the opening angle formed by the two speakers of 2-channel stereo reproduction. Then, from the estimated direction of the synthesized sound image, the two adjacent virtual sound sources flanking that sound image are determined, and the sound-pressure balance assigned to those two virtual sound sources is adjusted so that the synthesized sound image is reproduced between them.
Then, as in the positional relationship 150 shown in FIG. 15, let θ0 be the opening angle between the line drawn from the listener 153 to the midpoint of the end virtual sound sources 142a and 142e and the line drawn to the end virtual sound source 142e, and let θ be the opening angle formed with the line drawn from the listener 153 to the synthesized sound image 151. Further, let φ0 be the opening angle between the line drawn from the listener 153 to the midpoint of the two virtual sound sources 142c and 142d flanking the synthesized sound image 151 and the line drawn from the listener 153 to the midpoint of the end virtual sound sources 142a and 142e (the line drawn from the listener 153 to the virtual sound source 142c), and let φ be the opening angle formed with the line drawn from the listener 153 to the synthesized sound image 151. Here, φ0 is a positive real number. A method of assigning the synthesized sound image 132 of FIG. 13 (corresponding to the synthesized sound image 151 in FIG. 15), whose direction was estimated as described for formula (31), to the virtual sound sources using these variables is described below.
First, suppose the direction θ(i) of the i-th synthesized sound image is estimated by formula (31) to be, for example, θ(i) = π/15 [rad]. With five virtual sound sources, the synthesized sound image 151 is then located between the third virtual sound source 142c and the fourth virtual sound source 142d, counting from the left, as shown in FIG. 15. Also, with five virtual sound sources, a simple geometric calculation using trigonometric functions for the interval between the third virtual sound source 142c and the fourth virtual sound source 142d gives φ0 ≈ 0.121 [rad]; writing φ in the i-th line spectrum as φ(i), we obtain φ(i) = θ(i) − φ0 ≈ 0.088 [rad]. In this way, the direction of the synthesized sound image generated by the correlated signal in each line spectrum is expressed as an angle relative to the directions of the two virtual sound sources flanking it. Then, as described above, that synthesized sound image is to be produced by the two virtual sound sources 142c and 142d. For that purpose, it suffices to adjust the sound-pressure balance of the audio signals output from the two virtual sound sources 142c and 142d, and as the adjustment method, the sine law of stereophony used as formula (31) is applied again.
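The arithmetic of this worked example can be checked directly; the value of φ0 is taken from the text as given (its geometric derivation depends on the listening setup, which this source does not fully specify), and only the subtraction is verified here:

```python
import math

theta_i = math.pi / 15  # estimated synthesized-image direction, as in the text
phi0 = 0.121            # geometric offset for sources 142c/142d, quoted in the text
phi_i = theta_i - phi0

# pi/15 is about 0.2094 rad, so phi(i) comes out near the 0.088 rad in the text.
assert abs(phi_i - 0.088) < 5e-4
```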
Here, of the two virtual sound sources 142c and 142d flanking the synthesized sound image generated by the correlated signal in the i-th line spectrum, let g1 be the scaling coefficient for the third virtual sound source 142c and g2 the scaling coefficient for the fourth virtual sound source 142d; the third virtual sound source 142c then outputs the audio signal g1·est′(S(i)(k)) and the fourth virtual sound source 142d outputs g2·est′(S(i)(k)). By the sine law of stereophony, g1 and g2 need only satisfy formula (32).
On the other hand, normalizing g1 and g2 so that the total power from the third virtual sound source 142c and the fourth virtual sound source 142d equals the power of the correlated signal of the original 2-channel stereo signal gives

g1^2 + g2^2 = 1 + [α(i)]^2   (33)
Solving these as simultaneous equations yields formula (34). Substituting the above φ(i) and φ0 into formula (34) gives g1 and g2. Based on the scaling coefficients thus calculated, the audio signal g1·est′(S(i)(k)) is assigned to the third virtual sound source 142c and g2·est′(S(i)(k)) to the fourth virtual sound source 142d, as described above. And, also as described above, the uncorrelated signals are assigned to the end virtual sound sources: est′(NL(i)(k)) is assigned to the first virtual sound source 142a, and est′(NR(i)(k)) to the fifth virtual sound source 142e.
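Formula (34) is an image in this source, so the sketch below reconstructs the solution from the two stated constraints: the sine law (32), taken here in the common form sin φ / sin φ0 = (g1 − g2)/(g1 + g2) (the sign convention for φ is an assumption), and the power normalization (33), g1^2 + g2^2 = 1 + α^2:

```python
import math

def panning_gains(phi, phi0, alpha):
    """Reconstruct g1, g2 from the sine law (32) and normalization (33).

    Assumes sin(phi)/sin(phi0) = (g1 - g2)/(g1 + g2); which flanking
    source positive phi leans toward is a convention not fixed by this
    source.  Formula (34) itself is an image, so this is a sketch.
    """
    r = math.sin(phi) / math.sin(phi0)
    # (32) fixes only the ratio: g1/g2 = (1 + r)/(1 - r).  Take provisional
    # values with that ratio, then rescale so that (33) holds.
    g1, g2 = 1 + r, 1 - r
    scale = math.sqrt((1 + alpha ** 2) / (g1 ** 2 + g2 ** 2))
    return g1 * scale, g2 * scale

phi, phi0, alpha = 0.088, 0.121, 0.9  # values from the worked example; alpha assumed
g1, g2 = panning_gains(phi, phi0, alpha)
assert abs((g1 - g2) / (g1 + g2) - math.sin(phi) / math.sin(phi0)) < 1e-12  # (32)
assert abs(g1 ** 2 + g2 ** 2 - (1 + alpha ** 2)) < 1e-12                    # (33)
```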
Unlike this example, if the estimated direction of the synthesized sound image were between the first and second virtual sound sources, both g1·est′(S(i)(k)) and est′(NL(i)(k)) would be assigned to the first virtual sound source. Likewise, if the estimated direction of the synthesized sound image were between the fourth and fifth virtual sound sources, both g2·est′(S(i)(k)) and est′(NR(i)(k)) would be assigned to the fifth virtual sound source.
In this way, the assignment of the left- and right-channel correlated and uncorrelated signals for the i-th line spectrum in step S8 is performed. This is done for all line spectra through the loop of steps S4a and S4b: for example, up to the 127th line spectrum for a 256-point discrete Fourier transform, up to the 255th for a 512-point transform, and up to the 511th when the discrete Fourier transform is applied to all points of the segment (1024 points). As a result, with J denoting the number of virtual sound sources, the frequency-domain output audio signals Y1(k), ..., YJ(k) for the virtual sound sources (output channels) are obtained.
Then, the processing of steps S10 to S12 is executed for each obtained output channel (steps S9a, S9b). The processing of steps S10 to S12 is described below.
First, the time-domain output audio signal y′j(m) is obtained by applying the inverse discrete Fourier transform to each output channel (step S10), where DFT−1 denotes the inverse discrete Fourier transform.

y′j(m) = DFT−1(Yj(k))  (1 ≤ j ≤ J)   (35)

Here, as explained for formula (3), the signal that underwent the discrete Fourier transform had already been multiplied by the window function, so the signal y′j(m) obtained by the inverse transform is likewise in a window-multiplied state. The window function is the function shown in formula (1), and reading was performed while shifting by a quarter-segment length at a time; therefore, as described above, the converted data are obtained by adding each segment into the output buffer shifted by a quarter-segment length from the head of the previously processed segment.
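The segment bookkeeping described here (quarter-segment hops, Hann window before the forward DFT and again after the inverse DFT, overlap-add into the output buffer) can be sketched as a pass-through pipeline; the per-line-spectrum processing of steps S4 to S8 is omitted, and all function names are illustrative rather than the patent's:

```python
import cmath
import math

M = 64  # segment length (kept small for illustration; the text mentions e.g. 1024)

def hann(m):
    # Hann window of formula (1): sin^2((m/M)*pi)
    return math.sin(math.pi * m / M) ** 2

def dft(x, sign):
    # Naive DFT: sign=-1 forward, sign=+1 inverse (with 1/n normalization).
    n = len(x)
    out = [sum(x[m] * cmath.exp(sign * 2j * cmath.pi * k * m / n) for m in range(n))
           for k in range(n)]
    return out if sign == -1 else [v / n for v in out]

x = [math.sin(0.05 * m) for m in range(4 * M)]  # input covering several hops
out = [0.0] * len(x)

for start in range(0, len(x) - M + 1, M // 4):   # quarter-segment hops
    seg = [hann(m) * x[start + m] for m in range(M)]  # window before the DFT
    spec = dft(seg, -1)                               # analysis (step S3)
    # ... per-line-spectrum processing (steps S4-S8) would happen here ...
    y = dft(spec, +1)                                 # inverse DFT (step S10)
    for m in range(M):
        out[start + m] += hann(m) * y[m].real         # window again, overlap-add

# In the steady-state region each sample is covered by four shifted sin^4
# windows summing to 3/2, so this pass-through reproduces the input up to
# that constant factor.
assert all(abs(out[m] - 1.5 * x[m]) < 1e-6 for m in range(M, 2 * M))
```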
However, if left as is, the converted data will contain many discontinuous points, as described above for the prior art, and these are perceived as noise during reproduction. As noted earlier, such discontinuities arise from ignoring the line spectrum of the DC component. FIG. 16 is a waveform graph that illustrates this schematically. More specifically, FIG. 16 is a schematic diagram explaining the waveform discontinuities that occur at segment boundaries after the inverse discrete Fourier transform when the left- and right-channel audio signals are discrete Fourier transformed and their DC components are ignored. In the graph 160 shown in FIG. 16, the horizontal axis represents time; for example, the symbol (0)(l) denotes the first sample point of the l-th segment, and (M−1)(l) denotes the M-th sample point of the l-th segment. The vertical axis of the graph 160 is the value of the output signal at those sample points. As the graph shows, a discontinuity occurs between the end of the (l−1)-th segment and the beginning of the l-th segment.
To solve the problem described with reference to FIG. 16, the audio signal conversion device according to the present invention is configured as follows. That is, the audio signal conversion device according to the present invention includes a conversion unit, a correlation signal extraction unit, an inverse conversion unit, and a window function multiplication unit. The conversion unit reads each of the two channels of input audio signals while shifting by 1/4 of the processing-segment length, multiplies the read processing segment of the audio signal by a Hann window function, and then applies the discrete Fourier transform. The correlation signal extraction unit extracts the correlation signal from the two channels of audio signals after the discrete Fourier transform, ignoring the DC component. In other words, the correlation signal extraction unit extracts the correlation signal of the two channels of input audio signals.
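A minimal sketch of the conversion unit's analysis step, assuming a periodic Hann window as in Equation (1); the function name and parameters are illustrative only.

```python
import numpy as np

def analyze_segments(x, M):
    """Read the input while shifting by 1/4 of the processing-segment
    length M, multiply each segment by a Hann window, then apply the DFT."""
    hop = M // 4
    # Periodic Hann window: w(m) = 0.5 * (1 - cos(2*pi*m / M))
    w = 0.5 * (1.0 - np.cos(2.0 * np.pi * np.arange(M) / M))
    return [np.fft.fft(x[s:s + M] * w) for s in range(0, len(x) - M + 1, hop)]
```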
The inverse conversion unit applies the inverse discrete Fourier transform to (a1) the correlation signal extracted by the correlation signal extraction unit, (a2) that correlation signal and the uncorrelated signal (the signal excluding the correlation signal), (b1) an audio signal generated from the correlation signal, or (b2) an audio signal generated from the correlation signal and the uncorrelated signal. In the example given here, discontinuities are removed from the audio signal after assignment to the virtual sound sources for the wavefront synthesis reproduction method, which is an example of the audio signal of (b2) above; however, the invention is not limited to this. For example, discontinuities may instead be removed from the audio signal before assignment to the virtual sound sources, that is, from the extracted correlation signal, or from the extracted correlation signal and uncorrelated signal, as in (a1) or (a2) above, with the assignment performed afterward.
The window function multiplication unit then multiplies the processing segment of the audio signal after the inverse discrete Fourier transform (that is, the correlation signal or an audio signal generated from it) by the Hann window function again, shifts it by 1/4 of the processing-segment length, and adds it to the audio signal of the previous processing segments, thereby removing waveform discontinuities from the audio signal after the inverse discrete Fourier transform. Here, the previous processing segments are earlier processing segments; since the shift is actually 1/4 of a segment at a time, they are the first, second, and third preceding processing segments.
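The second windowing and overlap-add described above can be sketched as follows; this is a minimal Python rendering with invented names, not the patent's implementation.

```python
import numpy as np

def second_window_overlap_add(segments, M):
    """Multiply each inverse-transformed segment by the Hann window again,
    then add it into the output buffer shifted by M//4 relative to the
    previous segment.  The second window forces both end points of every
    segment to zero, so the overlap-add introduces no discontinuities."""
    hop = M // 4
    w = 0.5 * (1.0 - np.cos(2.0 * np.pi * np.arange(M) / M))
    out = np.zeros(hop * (len(segments) - 1) + M)
    for i, seg in enumerate(segments):
        out[i * hop:i * hop + M] += w * seg  # second window, then add
    return out
```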
In the example of the audio signal processing unit 73 in FIG. 8, the conversion unit described above is included in the window function multiplication unit 81 and the audio signal separation/extraction unit 82, the correlation signal extraction unit and the inverse conversion unit described above are included in the audio signal separation/extraction unit 82, and the window function multiplication unit described above is exemplified by the window function multiplication unit 83.
With reference also to FIGS. 17 to 21, this discontinuity removal processing for solving the problem described with FIG. 16 will now be explained in detail. FIG. 17 shows a segment after the inverse discrete Fourier transform to which the discontinuity removal processing according to the present invention is applied. FIG. 18 shows an example of the waveform of an input audio signal; FIG. 19 shows the waveform after the first multiplication by the Hann window function is applied to the audio signal of FIG. 18; FIG. 20 shows the waveform after the inverse discrete Fourier transform is applied to the audio signal of FIG. 19; and FIG. 21 shows the waveform after the second multiplication by the Hann window function is applied to the audio signal of FIG. 20.
In the discontinuity removal processing of the present invention, the average of the first and last values of the waveform of the segment (processing segment) 170 after the inverse discrete Fourier transform, shown in FIG. 17, is subtracted from each value of the waveform. As described above, this is related to the fact that the Hann window is applied before the discrete Fourier transform. That is, since the values at both end points of the Hann window are 0, if no spectral component were changed after the discrete Fourier transform and the inverse discrete Fourier transform were then applied, both end points of the segment would be 0 and no discontinuities would occur between segments. In practice, however, each spectral component is modified in the frequency domain after the discrete Fourier transform, as described above, so the end points of the segment after the inverse discrete Fourier transform are not 0, and discontinuities between segments arise.
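The subtraction step named here can be sketched as below; the function name is illustrative, and the sketch reflects only this one step of the removal processing.

```python
import numpy as np

def subtract_endpoint_average(segment):
    """Subtract the average of the segment's first and last waveform
    values from every value, as described for processing segment 170."""
    return segment - 0.5 * (segment[0] + segment[-1])
```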
Therefore, to make both end points 0, the Hann window is applied again, as described above. To show how this second multiplication by the Hann window function works, the process from the initial input audio signal up to the second Hann window multiplication is explained using a simplified input waveform. Suppose the audio signal waveform of graph 180 in FIG. 18 is input. The first Hann window operation produces the audio signal waveform of graph 190 in FIG. 19, and processing proceeds to the discrete Fourier transform of step S3 in FIG. 9. Suppose that, as a result of the processing, the waveform of the audio signal after the inverse discrete Fourier transform of step S10 in FIG. 9 has both end points shifted away from 0, as in graph 200 of FIG. 20.
If left as is, the end points would become discontinuities and be perceived as noise. When the Hann window function is applied again, as in step S11 of FIG. 9, the result is an audio signal waveform whose end points are guaranteed to be 0, as in graph 210 of FIG. 21. The second Hann window multiplication therefore guarantees that no discontinuities occur. As a result, the waveform of the processing segment 170 in FIG. 17 has no discontinuities like those in graph 160 of FIG. 16; at the points that were discontinuous in graph 160 (the segment boundaries), the value becomes 0 and the waveform is continuous, with matching slopes (derivatives) as well.
Thereafter, as described above, the processing segment after the second Hann window multiplication is multiplied by 2/3, the reciprocal of 3/2, and added to the audio signals of the previous processing segments (in practice, to each of the first, second, and third preceding processing segments), whereby the original waveform can be completely restored. At that point, the waveform of the audio signal up to the third preceding processing segment is in fact completely restored. Thus, if the processing segments after the second Hann window multiplication, each multiplied by 2/3, are added while being shifted by 1/4 segment at a time, the original waveform is completely restored. Alternatively, the processing segments after the second Hann window multiplication may be added while shifting by 1/4 segment, and each processing segment for which all additions have been completed (the third preceding segment above) may then be multiplied by 2/3; the original signal is likewise completely restored. Of course, the multiplication by 2/3 may be omitted; the amplitude merely becomes larger.
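The factor 3/2 (and hence its reciprocal 2/3) follows from the fact that the squared periodic Hann window, overlapped at a 1/4-segment hop, sums to the constant 3/2 at every sample. A quick numerical check (illustrative, not from the patent):

```python
import numpy as np

M = 1024
hop = M // 4
# Periodic Hann window, as multiplied before the DFT and again after the
# inverse DFT -- so each output sample carries a factor of w^2.
w = 0.5 * (1.0 - np.cos(2.0 * np.pi * np.arange(M) / M))

# Circular sum of the four overlapping squared windows that cover any
# one steady-state output sample.
total = sum(np.roll(w**2, k * hop) for k in range(4))
# Every sample of the sum equals 3/2, which is why multiplying the
# overlap-added result by 2/3 restores the original amplitude.
print(np.allclose(total, 1.5))  # True
```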
Next, with reference to FIGS. 22 and 23, the effect of the discontinuity removal according to the present invention is illustrated schematically using a simple sine wave. FIG. 22 is a schematic diagram of processing with a shift width of 1/2 segment and only one window function operation, and FIG. 23 is a schematic diagram of the processing of the present invention (a shift width of 1/4 segment with two window function operations).
As shown in FIG. 22, when the input waveform 221 is processed with a shift width of 1/2 segment and a single window function operation, followed by the discrete Fourier transform, audio signal separation and extraction, and the inverse discrete Fourier transform, the resulting segment waveforms 222 and 223, like graph 200 in FIG. 20, do not reach 0 at their ends because of the DC component. Since the output waveform 224 is the sum of these segment waveforms 222 and 223, discontinuities 224a and 224b occur.
In contrast, in the processing of the present invention, as shown in FIG. 23, the input waveform 231 (identical to the input waveform 221) is processed with a shift width of 1/4 segment: the first window function operation is applied, followed by the discrete Fourier transform, audio signal separation and extraction, and the inverse discrete Fourier transform, after which the second window function operation is applied. The segment waveforms 232, 233, 234, and 235, obtained by applying the second window function operation to each segment waveform after the inverse discrete Fourier transform, are always 0 at both ends, as also shown in graph 210 of FIG. 21. If the segment waveform 235 is taken as the current processing segment, the segment waveforms 234, 233, and 232 correspond to the first, second, and third preceding processing segments, respectively. Since the output waveform 236 is the sum of these segment waveforms 232, 233, 234, and 235, no discontinuities arise in the output waveform 236 even after the addition.
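The sine-wave example of FIG. 23 can be checked numerically. The round trip below (window, DFT, inverse DFT with no spectral modification, second window, 1/4-segment overlap-add, scale by 2/3) recovers the input exactly in the region covered by four segments; the parameters are illustrative.

```python
import numpy as np

M, hop = 256, 64                      # segment length, 1/4-segment shift
w = 0.5 * (1.0 - np.cos(2.0 * np.pi * np.arange(M) / M))   # periodic Hann
x = np.sin(2.0 * np.pi * 5.0 * np.arange(2048) / M)        # simple sine input

out = np.zeros_like(x)
for s in range(0, len(x) - M + 1, hop):
    seg = np.real(np.fft.ifft(np.fft.fft(x[s:s + M] * w)))  # DFT round trip
    out[s:s + M] += w * seg                                  # second window
out *= 2.0 / 3.0                                             # undo the 3/2 gain

# Exact recovery where every sample is covered by four segments
# (i.e. excluding the leading and trailing partial hops).
steady = slice(3 * hop, len(x) - M + hop)
print(np.allclose(out[steady], x[steady]))  # True
```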
As described above, the discontinuity removal processing according to the present invention removes waveform discontinuities from the audio signal after the inverse discrete Fourier transform by applying the second Hann window multiplication. According to the present invention, therefore, an audio signal for a multichannel system such as 2ch or 5.1ch can be converted into an audio signal for reproduction by the wavefront synthesis reproduction method without generating noise caused by discontinuities. In particular, unlike the method described in Patent Document 1, the present invention does not generate noise caused by discontinuities even for musical signals in which components close to white noise, such as consonant portions, are mixed with other components, or for speech signals such as voiced sounds, which have characteristics between consonants and vowels.
Furthermore, since the present invention can convert the signal into an audio signal for reproduction by the wavefront synthesis reproduction method without generating noise, it provides the benefit characteristic of that method: sound image localization as intended by the content creator is delivered to listeners at any position.
The audio signal after the inverse discrete Fourier transform that is processed by the window function multiplication unit 83 may also be an audio signal obtained by applying scaling, in the time domain or the frequency domain, to the correlation signal, or to the correlation signal and the uncorrelated signal, as illustrated in the equations. That is, scaling may be applied to the correlation signal and the uncorrelated signal, and the discontinuities may then be removed from the scaled signals by multiplication by the Hann window function.
The audio signal conversion processing according to the present invention has been described above using the example of a 2ch input audio signal; next, it is explained that the processing is also applicable to other multichannel audio signals. Here, a 5.1ch input audio signal is taken as an example with reference to FIG. 24, but the processing applies equally to other multichannel input audio signals.
FIG. 24 is a schematic diagram illustrating an example of the positional relationship between the speaker group used and the virtual sound sources when a 5.1ch audio signal is reproduced by the wavefront synthesis reproduction method. Consider applying the audio signal conversion processing according to the present invention to 5.1ch input audio. The 5.1ch speakers are generally arranged as shown in FIG. 2, with three speakers 21L, 22C, and 21R in front of the listener. In content such as movies in particular, the so-called center channel at the front center is often used for purposes such as dialogue. In other words, there are relatively few passages in which the sound pressure is controlled so as to produce a synthesized sound image between the center channel and the left channel, or between the center channel and the right channel.
Exploiting this property, as in the positional relationship 240 shown in FIG. 24, the input audio signals for the 5.1ch front left and right speakers 242a and 242c are converted by the present method (the audio signal conversion processing according to the present invention) and assigned to, for example, five virtual sound sources 243a to 243e, after which the audio signal of the center channel (the channel for the center speaker) is added to the middle virtual sound source 243c. The output audio signal is then reproduced by the speaker array 241 in the wavefront synthesis reproduction method as sound images for the virtual sound sources. As for the input audio signals for the rear left and right channels, speakers 242d and 242e may be installed at the rear, as in 5.1ch, and the signals output from them unmodified.
In this way, on the premise that the multichannel input audio signal has three or more channels, the audio signal conversion processing described above according to the present invention may be applied to any two of the multichannel input audio signals to generate an audio signal for reproduction by the wavefront synthesis reproduction method, and the input audio signals of the remaining channels may be added to the generated audio signal and output. This addition can be accomplished, for example, by providing an addition unit in the audio output signal generation unit 84.
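A sketch of the addition described above (the addition performed in the audio output signal generation unit 84), assuming signals of equal length; the names are illustrative.

```python
import numpy as np

def add_center_channel(virtual_source_signals, center_signal):
    """Add the remaining (center) channel's input audio signal to the
    middle virtual sound source, e.g. source 243c of five."""
    mid = len(virtual_source_signals) // 2
    virtual_source_signals[mid] = virtual_source_signals[mid] + center_signal
    return virtual_source_signals
```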
Next, implementations of the present invention are briefly described. The present invention can be used in devices accompanied by video, such as television sets. Various examples of devices to which the present invention can be applied are described with reference to FIGS. 25 to 31. FIGS. 25 to 27 show configuration examples of television devices equipped with the audio data reproduction device of FIG. 7; FIGS. 28 and 29 show configuration examples of video projection systems equipped with the audio data reproduction device of FIG. 7; FIG. 30 shows a configuration example of a system consisting of a television board and a television device equipped with the audio data reproduction device of FIG. 7; and FIG. 31 shows an example of an automobile equipped with the audio data reproduction device of FIG. 7. In each of FIGS. 25 to 31, eight speakers denoted LSP1 to LSP8 are arrayed as the speaker array, but any plural number of speakers may be used.
The audio signal conversion device according to the present invention, and an audio data reproduction device equipped with it, can be used in a television device. The placement of these devices in the television device may be decided freely. As in the television device 250 shown in FIG. 25, a speaker group 252 in which the speakers LSP1 to LSP8 of the audio data reproduction device are arranged in a straight line may be provided below the television screen 251. As in the television device 260 shown in FIG. 26, a speaker group 262 in which the speakers LSP1 to LSP8 are arranged in a straight line may be provided above the television screen 261. As in the television device 270 shown in FIG. 27, a speaker group 272 in which transparent film-type speakers LSP1 to LSP8 are arranged in a straight line may be embedded in the television screen 271.
The audio signal conversion device according to the present invention, and an audio data reproduction device equipped with it, can also be used in a video projection system. As in the video projection system 280 shown in FIG. 28, the speaker group 282 of speakers LSP1 to LSP8 may be embedded in the projection screen 281b onto which the video projection device 281a projects video. As in the video projection system 290 shown in FIG. 29, a speaker group 292 of speakers LSP1 to LSP8 may be arranged behind a sound-transmitting screen 291b onto which the video projection device 291a projects video. In addition, the audio signal conversion device according to the present invention, and an audio data reproduction device equipped with it, can be embedded in a TV stand (TV board). As in the system (home theater system) 300 shown in FIG. 30, a speaker group 302b of speakers LSP1 to LSP8 may be embedded in a TV stand 302a on which the television device 301 is mounted. Furthermore, the audio signal conversion device according to the present invention, and an audio data reproduction device equipped with it, can also be applied to car audio. As in the automobile 310 shown in FIG. 31, a speaker group 312 of speakers LSP1 to LSP8 arranged in a curve may be embedded in the dashboard of the vehicle.
When the audio signal conversion processing according to the present invention is applied to devices such as those described with reference to FIGS. 25 to 31, a switching unit may also be provided that switches whether this conversion processing (the processing in the audio signal processing unit 73 of FIGS. 7 and 8) is performed, according to a user operation such as a button operation on the device body or a remote controller operation. When the conversion processing is not performed, 2ch audio data may be reproduced by the wavefront synthesis reproduction method with virtual sound sources arranged as shown in FIG. 6, or reproduced using only the speakers 321L and 321R at both ends of the array speaker 321, as in the positional relationship 320 shown in FIG. 32. Similarly, 5.1ch audio data may be assigned to three virtual sound sources, or reproduced using only one or two speakers at both ends and the middle.
As the wavefront synthesis reproduction method applicable in the present invention, any method that includes a speaker array (a plurality of speakers) and outputs sound from those speakers as sound images for virtual sound sources, as described above, may be used; besides the WFS method described in Non-Patent Document 1, various methods are possible, such as methods exploiting the precedence effect (Haas effect), a phenomenon of human sound image perception. The precedence effect refers to the effect whereby, when the same sound is reproduced from a plurality of sound sources and the sounds reaching the listener from the respective sources arrive with small time differences, the sound image is localized in the direction of the source whose sound arrived first. Using this effect, a sound image can be made to be perceived at the virtual sound source position. However, it is difficult to make the sound image clearly perceived by that effect alone. Humans also have the property of perceiving a sound image in the direction from which the sound pressure is felt to be highest. Therefore, by combining the precedence effect described above with this perception of the maximum-sound-pressure direction in the audio data reproduction device, a sound image can be perceived in the direction of the virtual sound source even with a small number of speakers.
The audio signal conversion device according to the present invention has been described above on the premise that it converts audio signals for a multichannel system into audio signals for reproduction by the wavefront synthesis reproduction method, but the present invention is equally applicable, for example, to conversion into audio signals likewise for a multichannel system (with the same or a different number of channels). The converted audio signal may be any audio signal to be reproduced by a speaker group consisting of at least a plurality of speakers, regardless of their arrangement. This is because, even in such a conversion, the discrete Fourier transform and inverse transform described above are applied and the DC component may be ignored in order to obtain the correlation signal. As a method of reproducing the audio signal converted in this way, for example, the signal extracted for each virtual sound source may be assigned to one speaker each and output normally, rather than by the wavefront synthesis reproduction method. Various other reproduction methods are also conceivable, such as assigning the uncorrelated signals of the two sides to separate speakers installed at the sides or rear.
Each component of the audio signal conversion device according to the present invention, such as the components of the audio signal processing unit 73 illustrated in FIG. 8, and each component of an audio data reproduction device equipped with that device, can be realized by hardware such as a microprocessor (or DSP: Digital Signal Processor), memory, buses, interfaces, and peripheral devices, together with software executable on that hardware. Part or all of the hardware can be mounted as an integrated circuit/IC (Integrated Circuit) chip set, in which case the software need only be stored in the memory. All components of the present invention may also be configured entirely in hardware, and in that case as well, part or all of the hardware can be mounted as an integrated circuit/IC chip set.
The object of the present invention is also achieved by supplying a recording medium on which software program code realizing the functions of the various configuration examples described above is recorded to an apparatus such as a general-purpose computer serving as the audio signal conversion apparatus, and having the program code executed by a microprocessor or DSP in that apparatus. In this case, the software program code itself realizes the functions of the various configuration examples described above, so the present invention can be constituted by the program code itself, or by the recording medium on which it is recorded (an external recording medium or an internal storage device), provided the control side reads out and executes the code. Examples of the external recording medium include optical disks such as CD-ROMs and DVD-ROMs and nonvolatile semiconductor memories such as memory cards; examples of the internal storage device include hard disks and semiconductor memories. The program code can also be downloaded from the Internet, or received from a broadcast wave, and then executed.
Although the audio signal conversion apparatus according to the present invention has been described above, the present invention, as illustrated by the flowcharts of the processing flow, may also take the form of an audio signal conversion method for converting a multi-channel input audio signal into an audio signal for reproduction by a speaker group.
This audio signal conversion method has the following conversion step, extraction step, inverse transform step, and window function multiplication step. In the conversion step, a conversion unit reads out each of the input audio signals of two channels while shifting by 1/4 of the length of a processing segment, multiplies the audio signal of each read processing segment by a Hann window function, and then applies a discrete Fourier transform. In the extraction step, a correlation signal extraction unit extracts a correlation signal from the two channels of audio signals after the discrete Fourier transform in the conversion step, ignoring the DC component. In the inverse transform step, an inverse transform unit applies an inverse discrete Fourier transform to the correlation signal (or the correlation signal and an uncorrelated signal) extracted in the extraction step, or to an audio signal generated from the correlation signal (or from the correlation signal and the uncorrelated signal). In the window function multiplication step, a window function multiplication unit again multiplies the audio signal of each processing segment among the audio signals after the inverse discrete Fourier transform by the Hann window function, shifts it by 1/4 of the length of the processing segment, and adds it to the audio signal of the previous processing segment. Other application examples are as described for the audio signal conversion apparatus, and their description is omitted.
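The four steps above can be sketched as a single analysis/synthesis loop. The segment length and the identity `transform` placeholder (standing in for the correlation signal extraction) are assumptions for this sketch. Because the Hann window is applied once before the discrete Fourier transform and again after the inverse transform, the squared-Hann segments shifted by 1/4 of the segment length overlap-add to the constant 3/2, so dividing the accumulated output by 1.5 reproduces the input exactly wherever four segments overlap.

```python
import numpy as np

SEG = 1024          # processing-segment length (assumed value)
HOP = SEG // 4      # segments are shifted by 1/4 of the segment length

# Periodic Hann window, applied once before the DFT and once after
# the inverse DFT, as in the described method.
WIN = 0.5 * (1.0 - np.cos(2.0 * np.pi * np.arange(SEG) / SEG))

def process(x_l, x_r, transform=lambda XL, XR: (XL, XR)):
    """1/4-segment-hop analysis/synthesis with double Hann windowing.
    `transform` stands in for the correlation signal extraction and is
    the identity by default."""
    n = len(x_l)
    y_l = np.zeros(n)
    y_r = np.zeros(n)
    for start in range(0, n - SEG + 1, HOP):
        # conversion step: window the segment, then apply the DFT
        XL = np.fft.rfft(WIN * x_l[start:start + SEG])
        XR = np.fft.rfft(WIN * x_r[start:start + SEG])
        YL, YR = transform(XL, XR)
        # inverse transform step, then the second Hann window and
        # overlap-add onto the previous segments
        y_l[start:start + SEG] += WIN * np.fft.irfft(YL, SEG)
        y_r[start:start + SEG] += WIN * np.fft.irfft(YR, SEG)
    # Hann^2 segments at a 1/4-segment shift sum to the constant 1.5
    return y_l / 1.5, y_r / 1.5
```

With the default identity transform, samples covered by all four overlapping segments are reconstructed exactly; only the first and last partial segments at the signal edges deviate.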
In other words, the program code described above is a program for causing a computer to execute this audio signal conversion method. That is, the program causes the computer to execute: a conversion step of reading out each of the input audio signals of two channels while shifting by 1/4 of the length of a processing segment, multiplying the audio signal of each read processing segment by a Hann window function, and then applying a discrete Fourier transform; an extraction step of extracting a correlation signal from the two channels of audio signals after the discrete Fourier transform in the conversion step, ignoring the DC component; an inverse transform step of applying an inverse discrete Fourier transform to the correlation signal (or the correlation signal and an uncorrelated signal) extracted in the extraction step, or to an audio signal generated from the correlation signal (or from the correlation signal and the uncorrelated signal); and a window function multiplication step of again multiplying the audio signal of each processing segment among the audio signals after the inverse discrete Fourier transform by the Hann window function, shifting it by 1/4 of the length of the processing segment, and adding it to the audio signal of the previous processing segment. Other application examples are as described for the audio signal conversion apparatus, and their description is omitted.
DESCRIPTION OF REFERENCE NUMERALS: 70 ... audio data reproduction apparatus, 71 ... decoder, 72 ... audio signal extraction unit, 73 ... audio signal processing unit, 74 ... D/A converter, 75 ... amplifier, 76 ... speaker, 81 ... window function multiplication unit, 82 ... audio signal separation and extraction unit, 83 ... window function multiplication unit, 84 ... audio output signal generation unit.
Claims (5)
- An audio signal conversion apparatus for converting a multi-channel input audio signal into an audio signal for reproduction by a speaker group, the apparatus comprising:
a conversion unit that reads out each of the input audio signals of two channels while shifting by 1/4 of the length of a processing segment, multiplies the audio signal of each read processing segment by a Hann window function, and then applies a discrete Fourier transform;
a correlation signal extraction unit that extracts a correlation signal from the two channels of audio signals after the discrete Fourier transform in the conversion unit, ignoring the DC component;
an inverse transform unit that applies an inverse discrete Fourier transform to the correlation signal extracted by the correlation signal extraction unit, or to the correlation signal and an uncorrelated signal, or to an audio signal generated from the correlation signal, or to an audio signal generated from the correlation signal and the uncorrelated signal; and
a window function multiplication unit that again multiplies the audio signal of each processing segment among the audio signals after the inverse discrete Fourier transform in the inverse transform unit by the Hann window function, shifts it by 1/4 of the length of the processing segment, and adds it to the audio signal of the previous processing segment.
- The audio signal conversion apparatus according to claim 1, wherein the audio signal after the inverse discrete Fourier transform to be processed by the window function multiplication unit is an audio signal obtained after a scaling process has been applied, in the time domain or the frequency domain, to the correlation signal, or to the correlation signal and the uncorrelated signal.
- An audio signal conversion method for converting a multi-channel input audio signal into an audio signal for reproduction by a speaker group, the method comprising:
a conversion step in which a conversion unit reads out each of the input audio signals of two channels while shifting by 1/4 of the length of a processing segment, multiplies the audio signal of each read processing segment by a Hann window function, and then applies a discrete Fourier transform;
an extraction step in which a correlation signal extraction unit extracts a correlation signal from the two channels of audio signals after the discrete Fourier transform in the conversion step, ignoring the DC component;
an inverse transform step in which an inverse transform unit applies an inverse discrete Fourier transform to the correlation signal extracted in the extraction step, or to the correlation signal and an uncorrelated signal, or to an audio signal generated from the correlation signal, or to an audio signal generated from the correlation signal and the uncorrelated signal; and
a window function multiplication step in which a window function multiplication unit again multiplies the audio signal of each processing segment among the audio signals after the inverse discrete Fourier transform in the inverse transform step by the Hann window function, shifts it by 1/4 of the length of the processing segment, and adds it to the audio signal of the previous processing segment.
- A program for causing a computer to execute:
a conversion step of reading out each of the input audio signals of two channels while shifting by 1/4 of the length of a processing segment, multiplying the audio signal of each read processing segment by a Hann window function, and then applying a discrete Fourier transform;
an extraction step of extracting a correlation signal from the two channels of audio signals after the discrete Fourier transform in the conversion step, ignoring the DC component;
an inverse transform step of applying an inverse discrete Fourier transform to the correlation signal extracted in the extraction step, or to the correlation signal and an uncorrelated signal, or to an audio signal generated from the correlation signal, or to an audio signal generated from the correlation signal and the uncorrelated signal; and
a window function multiplication step of again multiplying the audio signal of each processing segment among the audio signals after the inverse discrete Fourier transform in the inverse transform step by the Hann window function, shifting it by 1/4 of the length of the processing segment, and adding it to the audio signal of the previous processing segment.
- A computer-readable recording medium on which the program according to claim 4 is recorded.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2012-117093 | 2012-05-23 | | |
JP2012117093A JP2013242498A (en) | 2012-05-23 | 2012-05-23 | Device, method, program, and recording medium for converting audio signals |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2013176073A1 (en) | 2013-11-28 |
Family
ID=49623763
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2013/063907 WO2013176073A1 (en) | 2012-05-23 | 2013-05-20 | Audio signal conversion device, method, program, and recording medium |
Country Status (2)
Country | Link |
---|---|
JP (1) | JP2013242498A (en) |
WO (1) | WO2013176073A1 (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106057211A (en) * | 2016-05-27 | 2016-10-26 | 广州多益网络股份有限公司 | Signal matching method and device |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2011205170A (en) * | 2010-03-24 | 2011-10-13 | Advantest Corp | Filter generating device, program, and filter generating method |
JP2012019454A (en) * | 2010-07-09 | 2012-01-26 | Sharp Corp | Audio signal processor, method, program, and recording medium |
- 2012-05-23 JP JP2012117093A patent/JP2013242498A/en active Pending
- 2013-05-20 WO PCT/JP2013/063907 patent/WO2013176073A1/en active Application Filing
Also Published As
Publication number | Publication date |
---|---|
JP2013242498A (en) | 2013-12-05 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
JP6284480B2 (en) | Audio signal reproducing apparatus, method, program, and recording medium | |
US8295493B2 (en) | Method to generate multi-channel audio signal from stereo signals | |
TWI489887B (en) | Virtual audio processing for loudspeaker or headphone playback | |
JP4580210B2 (en) | Audio signal processing apparatus and audio signal processing method | |
TW200837718A (en) | Apparatus and method for generating an ambient signal from an audio signal, apparatus and method for deriving a multi-channel audio signal from an audio signal and computer program | |
JP2011501486A (en) | Apparatus and method for generating a multi-channel signal including speech signal processing | |
JP2014513502A (en) | Apparatus and method for generating an output signal using a decomposer | |
JP6660982B2 (en) | Audio signal rendering method and apparatus | |
US9913036B2 (en) | Apparatus and method and computer program for generating a stereo output signal for providing additional output channels | |
JP4810621B1 (en) | Audio signal conversion apparatus, method, program, and recording medium | |
JP5338053B2 (en) | Wavefront synthesis signal conversion apparatus and wavefront synthesis signal conversion method | |
JP2011199707A (en) | Audio data reproduction device, and audio data reproduction method | |
JP2013055439A (en) | Sound signal conversion device, method and program and recording medium | |
WO2013176073A1 (en) | Audio signal conversion device, method, program, and recording medium | |
JP6161962B2 (en) | Audio signal reproduction apparatus and method | |
JP2011239036A (en) | Audio signal converter, method, program, and recording medium | |
JP6017352B2 (en) | Audio signal conversion apparatus and method | |
JP7332745B2 (en) | Speech processing method and speech processing device | |
JP2015065551A (en) | Voice reproduction system | |
US11470438B2 (en) | Audio signal processor, system and methods distributing an ambient signal to a plurality of ambient signal channels | |
JP6630599B2 (en) | Upmix device and program | |
WO2017188141A1 (en) | Audio signal processing device, audio signal processing method, and audio signal processing program | |
KR20110102719A (en) | Audio up-mixing apparatus and method |
Legal Events
Date | Code | Title | Description
---|---|---|---
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 13793503; Country of ref document: EP; Kind code of ref document: A1
| NENP | Non-entry into the national phase | Ref country code: DE
| 122 | Ep: pct application non-entry in european phase | Ref document number: 13793503; Country of ref document: EP; Kind code of ref document: A1