WO2017188141A1 - Audio signal processing device, method, and program


Info

Publication number
WO2017188141A1
Authority
WO
WIPO (PCT)
Prior art keywords
audio signal
channel
component
target channel
coherent
Application number
PCT/JP2017/016019
Other languages
English (en)
Japanese (ja)
Inventor
安藤 彰男
Original Assignee
国立大学法人富山大学
Application filed by 国立大学法人富山大学
Priority to JP2018514561A (patent JP6846822B2)
Publication of WO2017188141A1


Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04S STEREOPHONIC SYSTEMS
    • H04S7/00 Indicating arrangements; Control arrangements, e.g. balance control

Definitions

  • One aspect of the present invention relates to an audio signal processing device, an audio signal processing method, and an audio signal processing program.
  • methods for changing the number of channels of an audio signal are known. Specifically, there exist a method called upmixing, which converts an M-channel audio signal into an N-channel audio signal (where N > M), and a method called downmixing, which converts an N-channel audio signal into an M-channel audio signal. For example, conversion from a 2-channel (left-channel and right-channel) audio signal to a 5.1-channel audio signal is an example of upmixing; conversion from a 5.1-channel audio signal to a 2-channel audio signal is an example of downmixing.
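As a minimal illustration of the downmix direction (a sketch only; the 1/√2 center and surround gain follows common ITU-R BS.775-style practice and is an assumption, not something specified in this document):

```python
import numpy as np

def downmix_51_to_stereo(L, R, C, LFE, Ls, Rs, k=1 / np.sqrt(2)):
    """Fold 5.1-channel signals down to stereo. The LFE channel is
    commonly omitted from the downmix, as it is here."""
    left = L + k * C + k * Ls
    right = R + k * C + k * Rs
    return left, right
```

Upmixing is the harder direction, since the extra channels must be synthesized rather than summed; that is the problem the rest of this document addresses.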
  • Patent Document 1 describes a surround playback device that gives a stereo broadcast of a live sports television or radio program a powerful sense of presence while keeping the announcer's voice easy to listen to.
  • the apparatus has front left/right channel signal creation means, front center channel signal creation means, and rear left/right surround channel signal creation means.
  • the front left/right channel signal creation means selectively adds reverberant sound to the front left/right channel audio signals obtained by matrix processing of the 2-channel audio signal input, adjusts the front volume, and outputs the result as the front left/right channel audio signals.
  • the front center channel signal creation means extracts the in-phase component from the 2-channel audio signal input, adjusts the center volume without adding reverberant sound, and outputs the result as the front center channel audio signal.
  • the rear left/right surround channel signal creation means adds reverberant sound to the audio signals obtained by matrix processing, adjusts the rear volume, and outputs the result as the rear left/right surround channel audio signals.
  • Non-Patent Document 1 describes a method of dividing a stereo signal into bands, dividing the stereo signal into a main signal and an ambience signal for each band, and reproducing the ambience signal from the rear channel of 5.1 channels.
  • Non-Patent Document 2 describes a method of dividing a stereo signal into bands and then dividing the stereo signal into a direct sound component and a reverberation sound component and reproducing the reverberation sound component from the side.
  • Non-Patent Documents 3 and 4 each disclose a method of generating an audio signal of three or more channels by dividing a multi-channel audio signal into a pair of two-channel audio signals.
  • the methods of Non-Patent Documents 1 and 2 do not add reverberant sound, but in principle they can be applied only to two-channel audio signals (that is, stereo signals).
  • in Non-Patent Documents 3 and 4, since a component having a high correlation between two-channel audio signals is extracted as a coherent component, only information on sound located near the midpoint of two speakers is acquired. Therefore, in an audio system with three or more channels, only sound information near the midpoint of some pair of speakers can be extracted as a coherent component; information on sound located in the central part of the area surrounded by all the speakers cannot be extracted.
  • An audio signal processing device according to one aspect of the present invention includes a receiving unit that receives audio signals of a plurality of channels; a dividing unit that executes, for each channel, a dividing process for dividing the audio signal into a coherent component and a field component; and an output unit that outputs the coherent component and the field component of each channel extracted by the dividing unit.
  • with one channel that is the target of the dividing process taken as the target channel, the dividing process includes extracting, from among estimated signals calculated using at least the audio signals of channels other than the target channel, the estimated signal having the highest correlation with the audio signal of the target channel as the coherent component of the target channel, and extracting the difference between the audio signal of the target channel and the coherent component of the target channel as the field component of the target channel.
  • An audio signal processing method according to one aspect of the present invention includes an accepting step in which an audio signal processing device receives audio signals of a plurality of channels; a dividing step in which the audio signal processing device executes, for each channel, a dividing process for dividing the audio signal into a coherent component and a field component; and an output step in which the audio signal processing device outputs the coherent component and the field component of each channel extracted in the dividing step.
  • with one channel that is the target of the dividing process taken as the target channel, the dividing process includes extracting, from among estimated signals calculated using at least the audio signals of channels other than the target channel, the estimated signal having the highest correlation with the audio signal of the target channel as the coherent component of the target channel, and extracting the difference between the audio signal of the target channel and the coherent component of the target channel as the field component of the target channel.
  • An audio signal processing program according to one aspect of the present invention causes a computer to execute a reception step of receiving audio signals of a plurality of channels; a dividing step of executing, for each channel, a dividing process for dividing the audio signal into a coherent component and a field component; and an output step of outputting the coherent component and the field component of each channel extracted in the dividing step.
  • with one channel that is the target of the dividing process taken as the target channel, the dividing process includes extracting, from among estimated signals calculated using at least the audio signals of channels other than the target channel, the estimated signal having the highest correlation with the audio signal of the target channel as the coherent component of the target channel, and extracting the difference between the audio signal of the target channel and the coherent component of the target channel as the field component of the target channel.
  • in these aspects, a signal that is estimated using the audio signals of channels other than the target channel and that has the highest correlation with the actual audio signal of the target channel is extracted as the coherent component of the target channel. Further, the difference between the actual audio signal of the target channel and its coherent component is extracted as the field component of the target channel.
  • these coherent and field components are obtained for each channel. By obtaining them using only the original audio signals, without adding any sound, the atmosphere of the original sound can be maintained as much as possible.
  • moreover, because the components are obtained channel by channel, this method can be applied regardless of the number of channels of the original sound.
  • consequently, when the number of channels of an audio signal is changed, the atmosphere of the original sound can be maintained as much as possible, regardless of the number of channels of the original sound.
  • FIG. 1 is a diagram showing an example of audio signal processing according to the embodiment. FIG. 2 is a diagram showing the hardware configuration of a computer functioning as the audio signal processing device according to the embodiment. FIG. 3 is a diagram showing the functional configuration of the audio signal processing device according to the embodiment. FIG. 4 is a diagram showing a block, which is a unit for processing an audio signal. FIG. 5 is a diagram showing processing in a certain channel. FIG. 6 is a flowchart showing the operation of the audio signal processing device.
  • the audio signal processing apparatus 10 is a computer that divides each audio signal of a plurality of channels into a coherent component and a field component.
  • the audio signal is a digital signal including sound in a frequency band (generally about 20 Hz to 20000 Hz) that can be heard by humans, and is converted into an analog signal as necessary. Examples of sound represented by the audio signal include, but are not limited to, voice, music, video sound, natural sound, or any combination thereof.
  • FIG. 1 shows an example of audio signal processing by the audio signal processing apparatus 10, and more specifically shows processing of two channels (L channel and R channel), that is, stereo audio signals.
  • the audio signal processing apparatus 10 divides each channel signal into a coherent component and a field component.
  • a coherent component of one channel is a component having a high correlation with an audio signal of another channel.
  • the field component of one channel is the difference between the audio signal of the channel (ie, the original signal) and the coherent component of the channel. More specifically, the field component is a component obtained by subtracting the coherent component from the audio signal.
  • the coherent component is a sound with a clear sense of direction, whereas the field component is an ambient sound of a diffuse nature.
  • the sound corresponding to the field component is also referred to as “field sound”.
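The defining relationship above (field component = original signal minus coherent component) can be sketched in a few lines of numpy. The stand-in coherent component here is arbitrary, since the actual extraction method is described later; the point is only that the two components always reconstruct the original signal exactly:

```python
import numpy as np

rng = np.random.default_rng(0)
x_l = rng.standard_normal(1024)   # original audio signal of one channel
c_hat = 0.5 * x_l                 # arbitrary stand-in coherent component
v_hat = x_l - c_hat               # field component: original minus coherent
# By construction, coherent + field reconstructs the original exactly.
assert np.allclose(c_hat + v_hat, x_l)
```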
  • FIG. 1 shows the audio signal processing device 10 dividing the L-channel audio signal into an L-channel coherent component ĉ_L and field component v̂_L, and the R-channel audio signal into an R-channel coherent component ĉ_R and field component v̂_R.
  • the coherent component ĉ_L is a component having a high correlation with the R-channel audio signal, and the coherent component ĉ_R is a component having a high correlation with the L-channel audio signal.
  • FIG. 1 shows the processing of a two-channel audio signal, but the audio signal processing device 10 may process audio signals of an arbitrary number of channels.
  • the audio signal processing apparatus 10 may process audio signals of three or more channels.
  • the audio signal processing apparatus 10 may process 22.2 channel audio signals for 8K Super Hi-Vision.
  • multi-channel audio signals are recorded by a plurality of microphones arranged in a three-dimensional space.
  • the audio signals of a plurality of channels are recorded in such a manner that a plurality of target sounds (object sound) are mixed with each other or the target sound is mixed with a field sound.
  • since the distance from a sound source differs among the individual microphones, the time at which a specific sound arrives differs from microphone to microphone; as a result, the coherence of the recorded audio signals becomes low.
  • if the coherent component can be extracted from the audio signal of each channel, the clarity of the sound and the apparent source width (ASW) can be improved. Further, by extracting the field component and using it for upmixing, a good ambience effect (the feeling that the sound surrounds the listener) can be produced.
  • the coherent component corresponds to a target sound emitted from a main sound source (for example, a singing voice, an instrument sound, or sound emitted from a loudspeaker), whereas the field component corresponds to sound without a clear sense of direction (for example, echoes and the like).
  • denoting the target sound of the l-th channel by c_l(n) and the field sound by v_l(n), the audio signal x_l(n) is expressed by Equation (1): x_l(n) = c_l(n) + v_l(n).
  • the coherent component ĉ_l(n) of the audio signal x_l(n) is expressed by Equation (2).
  • the field component v̂_l(n) of the audio signal x_l(n) is expressed by Equation (3): v̂_l(n) = x_l(n) − ĉ_l(n).
  • the specific method for realizing the audio signal processing apparatus 10 is not limited.
  • the audio signal processing apparatus 10 may be realized by installing a predetermined program (for example, an audio signal processing program P1 described later) in a computer such as a personal computer, a server, or a portable terminal.
  • an audio device such as an amplifier may function as the audio signal processing device 10.
  • FIG. 2 shows a general hardware configuration of the computer 100 functioning as the audio signal processing apparatus 10.
  • the computer 100 includes a processor (for example, a CPU) 101 that executes an operating system, application programs, and the like; a main storage unit 102 including a ROM and a RAM; an auxiliary storage unit 103 including a hard disk, flash memory, and the like; a communication control unit 104 including a network card or a wireless communication module; an input device 105 such as a keyboard and mouse; and an output device 106 such as a monitor.
  • each functional element of the audio signal processing device 10 is realized by loading predetermined software (for example, the audio signal processing program P1 described later) onto the processor 101 or the main storage unit 102 and executing it.
  • the processor 101 operates the communication control unit 104, the input device 105, or the output device 106 in accordance with the software, and reads and writes data in the main storage unit 102 or the auxiliary storage unit 103. Data or a database necessary for processing is stored in the main storage unit 102 or the auxiliary storage unit 103.
  • the audio signal processing apparatus 10 may be composed of one computer or a plurality of computers. When a plurality of computers are used, one audio signal processing apparatus 10 is logically constructed by connecting these computers via a communication network such as the Internet or an intranet.
  • FIG. 3 shows a functional configuration of the audio signal processing apparatus 10.
  • the audio signal processing apparatus 10 includes a receiving unit 11, a dividing unit 12, and an output unit 13 as functional components.
  • the reception unit 11 is a functional element that receives audio signals of a plurality of channels. “Accepting an audio signal” means that the audio signal processing apparatus 10 acquires an audio signal by an arbitrary method. In other words, “accepting an audio signal” means that the audio signal is input to the audio signal processing apparatus 10.
  • a specific method for receiving the audio signal of each channel is not limited.
  • the reception unit 11 may receive an audio signal by accessing a database or another device and reading out a data file of the audio signal. Alternatively, the reception unit 11 may receive an audio signal transmitted from another device via a communication network, or may acquire an audio signal input to the audio signal processing device 10. In any case, the reception unit 11 outputs the received audio signal of each channel to the dividing unit 12.
  • the dividing unit 12 is a functional element that divides the audio signal of each channel into a coherent component and a field component. The following description is based on the premise that the dividing unit 12 processes the N-channel audio signals {x_l(n) | l = 1, …, N} expressed by Equation (4).
  • first, the dividing unit 12 divides the audio signal of each channel into signals of a plurality of time sections. Specifically, the dividing unit 12 divides the audio signal into short time sections (referred to as "frames") using a window function (for example, a Kaiser-Bessel window). For example, if 1024 frequency points are used in the modified discrete cosine transform (MDCT) described later, the dividing unit 12 divides the audio signal into a plurality of frames using a Kaiser-Bessel window corresponding to a length of 2048 points. Usually, the number of samples in one frame is determined so as to obtain an appropriate frequency resolution, but that number of samples is not sufficient for estimating the coherent component.
  • therefore, the dividing unit 12 treats a plurality of consecutive frames (for example, 24 frames) as the signal of one time section (referred to as a "block").
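The framing and blocking described above can be sketched as follows. The 50% frame overlap and the Kaiser shape parameter `beta` are assumptions (the document fixes neither), and numpy's `np.kaiser` is used as a stand-in for the Kaiser-Bessel window:

```python
import numpy as np

def make_blocks(x, frame_len=2048, hop=1024, frames_per_block=24, beta=6.0):
    """Split a channel signal into overlapping windowed frames and
    group consecutive frames into blocks of shape
    (n_blocks, frames_per_block, frame_len)."""
    win = np.kaiser(frame_len, beta)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([win * x[i * hop : i * hop + frame_len]
                       for i in range(n_frames)])
    n_blocks = n_frames // frames_per_block
    return frames[:n_blocks * frames_per_block].reshape(
        n_blocks, frames_per_block, frame_len)
```

With a 2048-point window and 1024-point hop, a signal of 2048 + 23·1024 = 25600 samples yields exactly one 24-frame block.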
  • FIG. 4 shows the concept of such block generation. More specifically, FIG. 4 shows a process of dividing each of two-channel (L channel and R channel) audio signals into a plurality of blocks.
  • the dividing unit 12 executes the following processing for each block of each channel.
  • a channel that is a target for dividing an audio signal into a coherent component and a field component (that is, a target of division processing) is referred to as a “target channel”.
  • processing in a certain target channel will be described.
  • the dividing unit 12 extracts a coherent component of the target channel, and then extracts a field component of the target channel.
  • FIG. 5 shows the concept of extraction of coherent components corresponding to the first half of the series of processes.
  • first, the dividing unit 12 divides the audio signal x_l(n) of the l-th channel, which is the target channel, into signals of K frequency bands (subbands) (referred to as "subband signals").
  • then, for each subband, the dividing unit 12 extracts a coherent component; it uses the least-squares method for this extraction.
  • finally, the dividing unit 12 obtains the coherent component ĉ_l(n) of the target channel by adding the coherent components of all the subbands. Thereafter, the dividing unit 12 extracts the field component v̂_l(n) by subtracting the coherent component ĉ_l(n) from the original audio signal x_l(n).
  • the dividing unit 12 executes the following processing for each block of the audio signal of the target channel.
  • first, the dividing unit 12 divides the audio signal x_l(n) of each channel into K subband signals x_l^(k)(n) using a filter bank. This division is expressed by Equation (5).
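A toy stand-in for the filter bank of Equation (5) can be built by partitioning FFT bins. The document does not specify a particular filter-bank design, so this is only an illustration, chosen for the convenient property that the K subband signals sum back exactly to the original signal:

```python
import numpy as np

def to_subbands(x, K):
    """Split x into K time-domain subband signals by zeroing all but
    one contiguous range of FFT bins per band."""
    X = np.fft.rfft(x)
    edges = np.linspace(0, len(X), K + 1, dtype=int)
    subbands = []
    for k in range(K):
        Xk = np.zeros_like(X)
        Xk[edges[k]:edges[k + 1]] = X[edges[k]:edges[k + 1]]
        subbands.append(np.fft.irfft(Xk, n=len(x)))
    return np.stack(subbands)
```

Because the bin ranges partition the spectrum, `to_subbands(x, K).sum(axis=0)` reproduces `x`, mirroring the additivity the later equations rely on.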
  • since the audio signal processing device 10 uses time-domain subband signals, it can process a signal consisting of an arbitrary number of consecutive frames as one block signal. Doing so extends the estimation interval, so the audio signal of each channel can be processed without impairing the sound quality of the obtained coherent component.
  • next, the dividing unit 12 estimates the subband signal x_l^(k)(n) from a linear combination of the subband signals {x_m^(k)(n) | m = 1, …, l−1, l+1, …, N} in the same band (same subband) of the N−1 channels other than the target channel. This linear combination, for a certain block, is expressed by Equation (6).
  • the estimated signal can therefore be regarded as a component having a high correlation with the signals in the same band of the other channels (the N−1 channels other than the target channel).
  • the estimation error e_l^(k)(n) between the subband signal of the target channel and the estimated signal is expressed by Equation (7).
  • the dividing unit 12 obtains the coefficients {a_m^(k) | m = 1, …, l−1, l+1, …, N} that minimize the estimation error by the least-squares method. The error function to be minimized is given by Equation (8).
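The least-squares step of Equations (6)-(8) amounts to an ordinary linear regression of the target channel's subband signal on the other channels' same-band signals. A minimal numpy sketch (variable names are mine, not the patent's):

```python
import numpy as np

def split_subband(x_target, x_others):
    """Fit x_target by a linear combination of the other channels'
    subband signals (least squares); return the fitted signal
    (coherent component) and the residual (field component)."""
    A = np.stack(x_others, axis=1)           # (n_samples, N-1) design matrix
    a, *_ = np.linalg.lstsq(A, x_target, rcond=None)
    coherent = A @ a                         # best linear estimate
    field = x_target - coherent              # residual
    return coherent, field
```

If the target subband really is a linear combination of the others, the fit is exact and the field component vanishes; in general, the residual captures whatever the other channels cannot explain.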
  • the coherent component ĉ_l^(k)(n) of the target channel in the k-th subband is obtained by Equation (12).
  • this coherent component ĉ_l^(k)(n) corresponds to the estimated signal having the highest correlation with the audio signal of the target channel among the estimated signals calculated using the audio signals of channels other than the target channel.
  • the dividing unit 12 obtains coherent components for all subbands. Then, the dividing unit 12 obtains the coherent component of the target channel by adding the coherent components of all the subbands. This process is expressed by equation (13).
  • the dividing unit 12 obtains the field component of the target channel by subtracting the coherent component from the original audio signal of the target channel. This processing is expressed by the above formula (3).
  • alternatively, the dividing unit 12 may obtain a field component by subtracting the coherent component from the audio signal in each subband, and obtain the field component of the target channel by adding the field components of all the subbands. Specifically, the field component v̂_l^(k)(n) of the target channel in the k-th subband is obtained by Equation (14), and the field component v̂_l(n) of the target channel is obtained by Equation (15).
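The equivalence of the two routes to the field component (subtract per subband and then sum, versus sum the coherent components and subtract once) follows from linearity. A small numerical check with random stand-in subband data:

```python
import numpy as np

rng = np.random.default_rng(1)
K, n = 8, 256
x_sub = rng.standard_normal((K, n))   # stand-in subband signals x_l^(k)
c_sub = rng.standard_normal((K, n))   # stand-in subband coherent components
# Route 1: subtract within each subband, then sum (Eqs. (14)-(15)).
v_per_subband = (x_sub - c_sub).sum(axis=0)
# Route 2: sum the subbands first, then subtract (Eqs. (13) and (3)).
v_fullband = x_sub.sum(axis=0) - c_sub.sum(axis=0)
assert np.allclose(v_per_subband, v_fullband)
```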
  • the dividing unit 12 performs the above processing on each block of the audio signal of the target channel. Then, the dividing unit 12 extracts the coherent component of the target channel by connecting the coherent components of all blocks. Further, the dividing unit 12 generates the field component of the target channel by concatenating the field components of all blocks.
  • the dividing unit 12 generates a coherent component and a field component for all channels by setting each of a plurality of channels as a target channel and executing the above processing. Then, the division unit 12 outputs the coherent components and field components of all channels to the output unit 13.
  • in this way, the dividing unit 12 divides the audio signal of each channel into a coherent component and a field component without adding any other signal to it (that is, without adding another sound to the original sound).
  • the output unit 13 is a functional element that outputs the coherent component and field component of each channel generated by the dividing unit 12 as a processing result.
  • this processing result can be regarded as an upmix from N channels to 2N channels.
  • the method of outputting the processing result is not limited.
  • the output unit 13 may store the processing result in a storage device such as a memory or a database, or may transmit the processing result to another device via a communication network.
  • the output unit 13 may output the coherent component and field component of each channel to a corresponding speaker.
  • this makes it possible to use existing audio material to produce content having a larger number of channels, or to reproduce it on an audio system having a larger number of channels.
  • the audio signal processing device 10 may also upmix an N-channel audio signal to more than 2N channels. Specifically, by decorrelating the plurality of extracted field components using a technique described in the following reference, the audio signal processing device 10 generates additional signals with differing inter-channel correlations, so that more than N field components are obtained. For example, stereo audio material can be converted into 5.1-channel audio material and reproduced with higher presence on a 5.1-channel audio system. Likewise, 5.1-channel audio material can be converted into 22.2-channel audio material and reproduced with higher presence on a 22.2-channel audio system. (Reference) J. Breebaart and C. Faller, "Spatial Audio Processing: MPEG Surround and Other Applications," Wiley, 2007.
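One common decorrelation approach (an assumption here; the cited reference describes concrete decorrelator designs, which this sketch does not reproduce) is a random-phase all-pass filter, which preserves the magnitude spectrum of a field component while scrambling its phase:

```python
import numpy as np

def decorrelate(v, seed=0):
    """Random-phase all-pass decorrelator: same magnitude spectrum,
    randomized phase, so the output sounds similar but has low
    correlation with the input."""
    rng = np.random.default_rng(seed)
    V = np.fft.rfft(v)
    phase = np.exp(1j * rng.uniform(-np.pi, np.pi, len(V)))
    phase[0] = 1.0                  # keep the DC bin real
    if len(v) % 2 == 0:
        phase[-1] = 1.0             # keep the Nyquist bin real
    return np.fft.irfft(V * phase, n=len(v))
```

Different seeds yield mutually decorrelated copies, which is how more than N field components can be produced from N extracted ones.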
  • the audio signal processing device 10 may also upmix the N-channel audio signal into a J-channel audio signal, where N < J < 2N. Specifically, the audio signal processing device 10 realizes an upmix from N channels to J channels by mixing the N field components.
  • the processing result by the audio signal processing apparatus 10 can be used not only for upmixing but also for downmixing.
  • the reception unit 11 receives audio signals of a plurality of channels (reception step).
  • the dividing unit 12 executes a dividing process for dividing each audio signal into a coherent component and a field component for each channel (dividing step).
  • the output unit 13 outputs the coherent component and field component of each channel (output step).
  • next, the dividing step, which is a particularly important process of the dividing unit 12, will be described in detail.
  • FIG. 6 shows a process of generating a coherent component and a field component of one target channel.
  • the dividing unit 12 divides the audio signal of each channel into a plurality of blocks (step S11). Note that by storing the audio signal of each channel and each block divided in step S11, step S11 can be omitted when processing the second and subsequent target channels.
  • the dividing unit 12 sets one of a plurality of blocks of the target channel as a processing target (step S12). Subsequently, the dividing unit 12 extracts an estimated signal having the highest correlation with the audio signal of the target channel from among the estimated signals calculated using the audio signals of channels other than the target channel as a coherent component of the target channel. (Step S13). Subsequently, the dividing unit 12 extracts a difference between the audio signal of the target channel and the coherent component thereof as a field component of the target channel (step S14). By such processing, the dividing unit 12 obtains a coherent component and a field component of one block of the target channel.
  • the process proceeds to the next block (see step S15). That is, the dividing unit 12 sets the next block as a processing target (step S12), and generates a coherent component and a field component of the block (steps S13 and S14).
  • the dividing unit 12 executes the processing of steps S12 to S14 for all blocks, and generates coherent components and field components of all blocks (YES in step S15). Then, the dividing unit 12 obtains the final coherent component of the target channel by concatenating the coherent components of all blocks, and obtains the final field component of the target channel by concatenating the field components of all blocks.
  • FIG. 7 shows details of the processing in step S13 in FIG. 6, that is, details of processing for generating a coherent component of the target channel.
  • the process shown in FIG. 7 is executed for each block of the audio signal of the target channel.
  • first, the dividing unit 12 generates a plurality of subband signals by dividing the block signal into a plurality of subbands for each channel (the target channel and all other channels) (step S131). Subsequently, the dividing unit 12 sets one of the subbands as the processing target (step S132). The dividing unit 12 then extracts, as the coherent component of the target channel in that subband, the estimated signal having the highest correlation with the subband signal of the target channel from among the estimated signals calculated using the subband signals of channels other than the target channel (step S133). The dividing unit 12 executes the processes of steps S132 and S133 for all subbands (see step S134).
  • when coherent components have been obtained for all subbands, the dividing unit 12 adds them to generate the coherent component of the target channel (more specifically, the coherent component for one block) (step S135).
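Putting steps S131-S135 together for one block, a compact numpy sketch (the FFT-bin filter bank and the least-squares fit are illustrative choices, not the patent's prescribed implementations):

```python
import numpy as np

def split_block(block, K):
    """block: (N, n) array holding one block of N channel signals.
    Returns (coherent, field) arrays of the same shape."""
    N, n = block.shape
    X = np.fft.rfft(block, axis=1)
    edges = np.linspace(0, X.shape[1], K + 1, dtype=int)
    coherent = np.zeros_like(block, dtype=float)
    for k in range(K):                            # step S132: each subband
        Xk = np.zeros_like(X)
        Xk[:, edges[k]:edges[k + 1]] = X[:, edges[k]:edges[k + 1]]
        sub = np.fft.irfft(Xk, n=n, axis=1)       # subband signals (S131)
        for l in range(N):                        # each target channel
            others = np.delete(sub, l, axis=0).T  # (n, N-1) design matrix
            a, *_ = np.linalg.lstsq(others, sub[l], rcond=None)
            coherent[l] += others @ a             # steps S133 + S135
    return coherent, block - coherent             # field = residual (S14)
```

When two channels are identical, each is perfectly predicted by the other, so its coherent component equals the original signal and its field component is (numerically) zero.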
  • the audio signal processing program P1 includes a main module P10, a reception module P11, a division module P12, and an output module P13.
  • the main module P10 is a part that performs overall processing of audio signals.
  • the functions realized by executing the reception module P11, the division module P12, and the output module P13 are the same as the functions of the reception unit 11, the division unit 12, and the output unit 13, respectively.
  • the audio signal processing program P1 may be provided after being fixedly recorded on a tangible recording medium such as a CD-ROM, DVD-ROM, or semiconductor memory. Alternatively, the audio signal processing program P1 may be provided via a communication network as a data signal superimposed on a carrier wave.
  • An audio signal processing device according to one aspect of the present invention includes a receiving unit that receives audio signals of a plurality of channels; a dividing unit that executes, for each channel, a dividing process for dividing the audio signal into a coherent component and a field component; and an output unit that outputs the coherent component and the field component of each channel extracted by the dividing unit.
  • with one channel that is the target of the dividing process taken as the target channel, the dividing process includes extracting, from among estimated signals calculated using at least the audio signals of channels other than the target channel, the estimated signal having the highest correlation with the audio signal of the target channel as the coherent component of the target channel, and extracting the difference between the audio signal of the target channel and the coherent component of the target channel as the field component of the target channel.
  • An audio signal processing method according to one aspect of the present invention includes an accepting step in which an audio signal processing device receives audio signals of a plurality of channels; a dividing step in which the audio signal processing device executes, for each channel, a dividing process for dividing the audio signal into a coherent component and a field component; and an output step in which the audio signal processing device outputs the coherent component and the field component of each channel extracted in the dividing step.
  • with one channel that is the target of the dividing process taken as the target channel, the dividing process includes extracting, from among estimated signals calculated using at least the audio signals of channels other than the target channel, the estimated signal having the highest correlation with the audio signal of the target channel as the coherent component of the target channel, and extracting the difference between the audio signal of the target channel and the coherent component of the target channel as the field component of the target channel.
  • An audio signal processing program according to one aspect of the present invention causes a computer to execute a reception step of receiving audio signals of a plurality of channels; a dividing step of executing, for each channel, a dividing process for dividing the audio signal into a coherent component and a field component; and an output step of outputting the coherent component and the field component of each channel extracted in the dividing step.
  • with one channel that is the target of the dividing process taken as the target channel, the dividing process includes extracting, from among estimated signals calculated using at least the audio signals of channels other than the target channel, the estimated signal having the highest correlation with the audio signal of the target channel as the coherent component of the target channel, and extracting the difference between the audio signal of the target channel and the coherent component of the target channel as the field component of the target channel.
  • In these aspects, a signal that is estimated using the audio signals of channels other than the target channel, and that has the highest correlation with the actual audio signal of the target channel, is extracted as the coherent component of the target channel. The difference between the actual audio signal of the target channel and its coherent component is then extracted as the field component of the target channel.
  • These coherent and field components are obtained for each channel. Because the coherent and field components of each channel are determined using only the original audio signals, without adding any sound, the atmosphere of the original sound (for example, its original tone) can be maintained completely, or as completely as possible.
  • Since a coherent component and a field component are obtained for each of the original channels, this method can be applied regardless of the number of channels of the original sound. For example, one aspect of the present invention can be applied to audio signals having an arbitrary number of channels, such as 2 channels, 3 channels, 5.1 channels, and 22.2 channels.
  • FIG. 9 is a diagram illustrating an example of extraction of a coherent component by a conventional method.
  • FIG. 10 is a diagram illustrating an example of extraction of a coherent component in the above-described aspect.
  • FIGS. 9 and 10 both show an example in which audio signals are output from three speakers 90 arranged in a triangle; the example therefore represents a three-channel audio system.
  • In the conventional method, a component having a high correlation between the audio signals of two channels is extracted as a coherent component 91 (a broken line 92 indicates a field component). Such a conventional method can therefore acquire only information on sound located in the middle portion 93 between two speakers (channels) 90; it cannot extract information on sound located in the central portion 94 of the region surrounded by the three speakers (channels) 90.
  • In the above-described aspect, by contrast, the coherent component of one speaker (channel) 90 is estimated from the signals of the other speakers (channels) 90. Therefore, as shown in FIG. 10, it is possible to extract information on sound located in the central portion 95 of the region surrounded by the three speakers (channels) 90.
  • This central portion 95 may correspond to the sum of the portions 93 and 94 in FIG. 9.
  • In one aspect, the division process may include: a step, executed for each channel, of dividing the audio signal into a plurality of frames using a window function; a step, executed for each channel, of generating a plurality of blocks by grouping at least two consecutive frames into one block over the whole sequence of frames; and a step of extracting the coherent component of the target channel in each of the blocks.
  • In this case, the number of samples available for estimating the coherent component increases, so the coherent component can be extracted with higher accuracy.
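The framing-and-blocking step above can be sketched as follows, assuming 50%-overlapping frames and an illustrative Kaiser window shape (the hop size and `beta` value are assumptions; the patent only requires a window function and at least two consecutive frames per block):

```python
import numpy as np

def frames_and_blocks(x, frame_len=2048, frames_per_block=24, beta=5.0):
    """Window a signal into 50%-overlapping frames, then group
    consecutive frames into blocks."""
    hop = frame_len // 2
    win = np.kaiser(frame_len, beta)
    n_frames = (len(x) - frame_len) // hop + 1
    frames = np.stack([win * x[i * hop:i * hop + frame_len]
                       for i in range(n_frames)])
    blocks = [frames[i:i + frames_per_block]
              for i in range(0, n_frames - frames_per_block + 1,
                             frames_per_block)]
    return frames, blocks
```

Estimating the coherent component once per block, rather than once per frame, is what increases the number of samples behind each estimate.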
  • In one aspect, the division process may include: a step in which the dividing unit generates a plurality of subband signals for each channel by dividing the audio signal of each channel into a plurality of subbands; a step of extracting the coherent component of the target channel in each subband; and a step of obtaining the coherent component of the target channel by adding together the coherent components of the plurality of subbands.
  • In this case, the coherent component can be extracted according to the accuracy required in each frequency band, and thus the coherent component and the field component can be extracted with high accuracy.
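A sketch of this subband variant, operating on one frame of spectral coefficients. The band edges and the per-band least-squares estimator are illustrative assumptions; the point is that estimation runs independently in each band and the per-band results are added back together:

```python
import numpy as np

def split_by_subbands(spec_target, spec_others, band_edges):
    """Estimate the coherent component subband by subband, then
    combine the per-band results; the field component is the
    difference from the target spectrum."""
    coherent = np.zeros_like(spec_target)
    for lo, hi in zip(band_edges[:-1], band_edges[1:]):
        A = np.stack([s[lo:hi] for s in spec_others], axis=1)
        w, *_ = np.linalg.lstsq(A, spec_target[lo:hi], rcond=None)
        coherent[lo:hi] = A @ w
    return coherent, spec_target - coherent
```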
  • Seven stereo sound materials (that is, 2-channel audio signals), listed in Table 1, were prepared. All of the materials were obtained from commercially available CDs, and the sampling frequency was 44.1 kHz.
  • The name column in Table 1 shows the song name or the type of piece, and the description column shows the form of the performance. "Artificial" in the mixing column indicates that the material underwent mixing processing, and "Natural" indicates that it did not.
  • The length column shows the playback time.
  • An overlap-add scheme based on the modified discrete cosine transform (MDCT) was employed.
  • The Kaiser-Bessel window was used as the window function for dividing the audio signal into a plurality of frames.
  • The frame length was 2048 points, which means that 1024 frequency points are obtained by the MDCT.
  • The frequency points were grouped into 23 subbands, as shown in Table 2. With reference to the MPEG-2 AAC standard, these were formed by merging every three consecutive subbands of the 69 subbands defined for a long FFT (Fast Fourier Transform) frame at 48 kHz. Twenty-four frames were taken as one block; at a sampling frequency of 44.1 kHz, this block length corresponds to 0.58 seconds.
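The quoted block length follows directly from the stated parameters (2048-point frames overlapping by half, 24 frames per block):

```python
frame_len = 2048           # MDCT frame length in samples
hop = frame_len // 2       # 50% overlap -> 1024 new samples per frame
frames_per_block = 24
fs = 44100.0               # sampling frequency in Hz

# samples spanned by 24 half-overlapping frames
block_samples = (frames_per_block - 1) * hop + frame_len   # 25600
block_seconds = block_samples / fs
print(round(block_seconds, 2))   # 0.58
```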
  • Table 3 shows the cross-correlation coefficients of the original sound, the coherent component, and the field component.
  • The coherent component showed a higher cross-correlation than the original sound.
  • Such a coherent component yields a sound-field atmosphere narrower than that of the original sound.
  • The field component showed a negative cross-correlation for every material except one ("Quiet Night"). When a field component with a negative cross-correlation is reproduced by speakers installed at the sides or rear, a good ambience effect is obtained, making it possible to reproduce sound with a strong sense of presence.
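A cross-correlation coefficient of the kind reported in Table 3 can be computed as the normalized zero-lag correlation between the two channels (a standard definition; the patent's exact normalization is not restated here, so treat this as an assumption):

```python
import numpy as np

def cross_correlation(a, b):
    """Zero-lag cross-correlation coefficient of two channel
    signals, in [-1, 1]."""
    a = a - np.mean(a)
    b = b - np.mean(b)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
```

A value near +1 indicates strongly coherent channels, while a negative value, as most field components show here, indicates out-of-phase content that widens the perceived sound field.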
  • The dividing unit 12 estimates the coherent component of a given target channel using the audio signals of channels other than the target channel.
  • The dividing unit may also estimate the coherent component of the target channel using the audio signals of the other channels together with at least one of the past audio signal of the target channel and the past audio signals of the other channels.
  • Here, a "past audio signal" means the audio signal of a block temporally preceding the block being processed.
  • The procedure of the audio signal processing method executed by the at least one processor is not limited to the examples in the above embodiment.
  • The audio signal processing apparatus may omit some of the steps (processes) described above, or may execute the steps in a different order. Any two or more of the steps described above may be combined, or part of a step may be modified or deleted. Alternatively, the audio signal processing apparatus may execute other steps in addition to the above steps.
  • When comparing the magnitudes of two values, the audio signal processing apparatus may use either of the two criteria "greater than" and "greater than or equal to", and either of the two criteria "less than" and "less than or equal to".
  • The choice between such criteria does not change the technical significance of the process of comparing the magnitudes of two numerical values.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Signal Processing (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Stereophonic System (AREA)

Abstract

The present invention relates to an audio signal processing device comprising: an accepting unit for accepting audio signals of a plurality of channels; a dividing unit for dividing the audio signal of each channel into a coherent component and a field component; and an output unit for outputting the coherent component and the field component of each channel. In the division process, among signals estimated using the audio signals of channels other than the channel being processed, the estimated signal having the highest correlation with the audio signal of the channel being processed is extracted as the coherent component of that channel. The difference between the audio signal of the channel being processed and its coherent component is then extracted as the field component.
PCT/JP2017/016019 2016-04-27 2017-04-21 Dispositif, procédé et programme de traitement de signal audio WO2017188141A1 (fr)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2018514561A JP6846822B2 (ja) 2016-04-27 2017-04-21 Audio signal processing device, audio signal processing method, and audio signal processing program

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2016089417 2016-04-27
JP2016-089417 2016-04-27

Publications (1)

Publication Number Publication Date
WO2017188141A1 true WO2017188141A1 (fr) 2017-11-02

Family

ID=60161634

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2017/016019 WO2017188141A1 (fr) 2016-04-27 2017-04-21 Dispositif, procédé et programme de traitement de signal audio

Country Status (2)

Country Link
JP (1) JP6846822B2 (fr)
WO (1) WO2017188141A1 (fr)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2008536183A (ja) * 2005-04-15 2008-09-04 コーディング テクノロジーズ アクチボラゲット 無相関信号の包絡線整形
JP2013517518A (ja) * 2010-01-15 2013-05-16 フラウンホッファー−ゲゼルシャフト ツァ フェルダールング デァ アンゲヴァンテン フォアシュンク エー.ファオ ダウンミックス信号と空間パラメータ情報からダイレクト/アンビエンス信号を抽出する装置および方法
JP2016501472A (ja) * 2012-11-15 2016-01-18 フラウンホーファー−ゲゼルシャフト・ツール・フェルデルング・デル・アンゲヴァンテン・フォルシュング・アインゲトラーゲネル・フェライン 空間オーディオ信号の異なる再生スピーカ設定に対するセグメント毎の調整

Also Published As

Publication number Publication date
JPWO2017188141A1 (ja) 2019-03-07
JP6846822B2 (ja) 2021-03-24

Similar Documents

Publication Publication Date Title
JP6637014B2 (ja) Apparatus and method for multi-channel direct-ambient decomposition for audio signal processing
US8346565B2 (en) Apparatus and method for generating an ambient signal from an audio signal, apparatus and method for deriving a multi-channel audio signal from an audio signal and computer program
CN101842834B (zh) 包括语音信号处理在内的生成多声道信号的设备和方法
CA2820351C (fr) Apparatus and method for decomposing an input signal using a pre-calculated reference curve
JP5379838B2 (ja) Apparatus for determining a spatial output multi-channel audio signal
US8817991B2 (en) Advanced encoding of multi-channel digital audio signals
CN102907120B (zh) 用于声音处理的系统和方法
JP6198800B2 (ja) Apparatus and method for generating an output signal having at least two output channels
GB2540175A (en) Spatial audio processing apparatus
JPWO2005112002A1 (ja) Audio signal encoding device and audio signal decoding device
US9913036B2 (en) Apparatus and method and computer program for generating a stereo output signal for providing additional output channels
WO2022014326A1 (fr) Signal processing device, method, and program
US20080212784A1 (en) Parametric Multi-Channel Decoding
WO2017188141A1 (fr) Dispositif, procédé et programme de traitement de signal audio
EP4252432A1 (fr) Systèmes et procédés de mixage élévateur audio
Kraft et al. Low-complexity stereo signal decomposition and source separation for application in stereo to 3D upmixing
JP6694755B2 (ja) Channel number conversion device and program therefor
AU2015238777B2 (en) Apparatus and Method for Generating an Output Signal having at least two Output Channels
WO2013176073A1 (fr) Audio signal conversion device, method, program, and recording medium
CN116643712A (zh) Electronic device, system and method for audio processing, and computer-readable storage medium
AU2012252490A1 (en) Apparatus and method for generating an output signal employing a decomposer

Legal Events

Date Code Title Description
ENP Entry into the national phase

Ref document number: 2018514561

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17789424

Country of ref document: EP

Kind code of ref document: A1

122 Ep: pct application non-entry in european phase

Ref document number: 17789424

Country of ref document: EP

Kind code of ref document: A1