CN109121067B - Multichannel loudness equalization method and apparatus - Google Patents


Info

Publication number
CN109121067B
CN109121067B (application CN201811223436.1A)
Authority
CN
China
Prior art keywords
sound
images
level
output array
channels
Prior art date
Legal status
Active
Application number
CN201811223436.1A
Other languages
Chinese (zh)
Other versions
CN109121067A (en)
Inventor
邱锋海
匡敬辉
Current Assignee
Beijing Sound+ Technology Co ltd
Original Assignee
Beijing Sound+ Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Beijing Sound+ Technology Co ltd filed Critical Beijing Sound+ Technology Co ltd
Priority to CN201811223436.1A
Publication of CN109121067A
Application granted
Publication of CN109121067B

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04S STEREOPHONIC SYSTEMS
    • H04S3/00 Systems employing more than two channels, e.g. quadraphonic
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04S STEREOPHONIC SYSTEMS
    • H04S7/00 Indicating arrangements; Control arrangements, e.g. balance control

Abstract

The invention provides a method and apparatus for multichannel loudness equalization. In an embodiment, the method comprises: extracting the level and azimuth of each sound image from a multichannel signal; determining a loudness gain function, independent of any single sound image, from the frequency response of a preset output array, the levels of the sound images, and a target loudness; adjusting the level of each sound image according to the loudness gain function and the frequency response of the actual output array; and distributing the multichannel signals of the output channels to the actual output array according to each sound image's azimuth and adjusted level. By performing joint multichannel dynamic equalization based on sound-image loudness, embodiments of the invention preserve the perceived spectral balance of the sound field and keep the direction of each sound image stable across different volumes and across playback devices with different frequency responses.

Description

Multichannel loudness equalization method and apparatus
Technical Field
The present application relates to the field of audio signal processing, and in particular, to a method and apparatus for multi-channel loudness equalization.
Background
Since their invention, loudspeaker systems have progressed through monophonic, stereophonic, surround-sound, simulated-3D, and holographic-3D stages; the number of channels has steadily grown, and sound effects have evolved from stereo through surround to today's 3D audio. Listeners continually pursue richer audio experiences and hope that loudspeakers can faithfully reproduce a physical spatial sound field, yielding a listening experience close to that of a natural environment. On mobile devices, earphones are used to play back 3D audio.
Because the human ear's relative perception of low, middle, and high frequencies varies nonlinearly with sound pressure level, linear equalization of a loudspeaker or earphone (raising or lowering every frequency-domain subband by the same sound pressure level) changes the spectral balance perceived by the ear and degrades the timbre and artistic expression of the audio. Audio therefore requires nonlinear equalization, such as automatic gain control, dynamic range control, and dynamic equalization.
Traditional dynamic equalization for loudspeaker arrays or earphones equalizes each channel independently, which easily causes sound-image azimuth drift or azimuth distortion. For 3D audio playback, this conventional approach tends to destroy the original sound-image positions.
Disclosure of Invention
In a first aspect, an embodiment of the invention provides a method for multichannel loudness equalization, comprising: extracting, from a first multichannel signal of N channels, the levels and azimuths of K sound images consistent with a first output array having N playback units, where N ≥ 2 and K ≥ 1, the first output array being the array assumed at encoding time for reproducing the first multichannel signal; determining a loudness gain function, independent of any single sound image, from the frequency response of the first output array, the levels of the K sound images, and a target loudness; adjusting the level of each of the K sound images according to the loudness gain function and the frequency response of a second output array having M playback units, where M ≥ 2; and distributing second multichannel signals of M channels to the second output array according to the azimuths and adjusted levels of the K sound images.
In a possible implementation of the first aspect, extracting the levels and azimuths of the K sound images comprises: separating, from the first multichannel signal, the sound-image component signals of the K sound images on the N channels according to the azimuths of the N playback units of the first output array; and extracting the levels and azimuths of the K sound images from those component signals by vector synthesis.
In a possible implementation of the first aspect, determining the loudness gain function comprises: calculating the sound pressure level of each of the K sound images from its level and the frequency responses of the N playback units of the first output array; dividing each sound image's sound pressure level into sound-pressure-level signals in P subbands; determining the current sound pressure level of each of the P subbands from those signals; and determining the gain function of each subband from that subband's desired and current sound pressure levels, the P subband gain functions together forming the loudness gain function, where each subband's desired sound pressure level is determined from its current sound pressure level and a preset loudness-variation formula. Adjusting the level of each of the K sound images then comprises: determining the desired sound pressure levels of the K sound images in the P subbands from the subband gain functions; determining each sound image's desired sound pressure level from its desired subband levels; and adjusting each sound image's level according to the desired sound pressure levels of the K sound images and the frequency response of the second output array.
In a further possible implementation of the first aspect, the extraction also yields, from the first multichannel signal, discrete component signals of the N channels consistent with the first output array. The method further comprises: adjusting the N channels' discrete component signals according to the sum of the desired sound pressure levels of the K sound images and the sum of their current sound pressure levels; distributing discrete component signals of M channels to the M playback units of the second output array according to the adjusted N-channel discrete components; and superimposing, channel by channel, the second multichannel signals of the M channels and the M channels' discrete component signals.
In a possible implementation of the first aspect, distributing the second multichannel signals comprises: selecting, from the M playback units of the second output array, at least one first playback unit adjacent to the azimuth of each of the K sound images, and distributing that sound image's level to the at least one first playback unit by vector panning, thereby determining the channel signal assigned to it.
In a possible implementation of the first aspect, the second output array comprises left and right headphone units, and distributing the second multichannel signals comprises determining the channel signals of the left and right headphone units from the head-related transfer functions of the left and right ears and the adjusted levels of the K sound images.
In a second aspect, an apparatus for multichannel loudness equalization is provided, comprising: an extraction module for extracting, from a first multichannel signal of N channels, the levels and azimuths of K sound images consistent with a first output array having N playback units, where N ≥ 2 and K ≥ 1, the first output array being the array assumed at encoding time for reproducing the first multichannel signal; a determination module for determining a loudness gain function, independent of any single sound image, from the frequency response of the first output array, the levels of the K sound images, and a target loudness; an adjustment module for adjusting the level of each of the K sound images according to the loudness gain function and the frequency response of a second output array having M playback units, where M ≥ 2; and a distribution module for distributing second multichannel signals of M channels to the second output array according to the azimuths and adjusted levels of the K sound images.
In a possible implementation of the second aspect, the extraction module separates, from the first multichannel signal, the sound-image component signals of the K sound images on the N channels according to the azimuths of the N playback units of the first output array, and extracts the levels and azimuths of the K sound images from those component signals by vector synthesis.
In a possible implementation of the second aspect, the determination module calculates the sound pressure level of each of the K sound images from its level and the frequency responses of the N playback units of the first output array; divides each sound image's sound pressure level into sound-pressure-level signals in P subbands; determines the current sound pressure level of each of the P subbands from those signals; and determines the gain function of each subband from that subband's desired and current sound pressure levels, the P subband gain functions together forming the loudness gain function, where each subband's desired sound pressure level is determined from its current sound pressure level and a preset loudness-variation formula. The adjustment module determines the desired sound pressure levels of the K sound images in the P subbands from the subband gain functions, determines each sound image's desired sound pressure level from its desired subband levels, and adjusts each sound image's level according to the desired sound pressure levels of the K sound images and the frequency response of the second output array.
In a further possible implementation of the second aspect, the extraction module extracts, from the first multichannel signal, discrete component signals of the N channels consistent with the first output array. The apparatus further comprises: a discrete-component adjustment module that adjusts the N channels' discrete component signals according to the sum of the desired sound pressure levels of the K sound images and the sum of their current sound pressure levels; a discrete-component distribution module that distributes discrete component signals of M channels to the M playback units of the second output array according to the adjusted N-channel discrete components; and a superposition module that superimposes, channel by channel, the second multichannel signals of the M channels and the M channels' discrete component signals.
In a possible implementation of the second aspect, the distribution module selects, from the M playback units of the second output array, at least one first playback unit adjacent to the azimuth of each of the K sound images, distributes that sound image's level to the at least one first playback unit by vector panning, and thereby determines the channel signal assigned to it.
In a possible implementation of the second aspect, the second output array comprises left and right headphone units, and the distribution module determines the channel signals of the left and right headphone units from the head-related transfer functions of the left and right ears and the adjusted levels of the K sound images.
According to embodiments of the invention, joint multichannel dynamic equalization is performed based on sound-image loudness, preserving the perceived spectral balance of the sound field and keeping the direction of each sound image stable across different volumes and across playback devices with different frequency responses. The method and apparatus are broadly applicable to multiple playback forms, including multichannel playback and upmixing/downmixing.
Drawings
Fig. 1 is a block diagram of a multichannel playback device according to an embodiment of the invention;
Fig. 2 is an example of a two-channel speaker array;
Fig. 3 is an example of a five-channel speaker array;
Fig. 4 illustrates loudness equalization with speaker-array input and speaker-array output, according to an embodiment of the invention;
Fig. 5 illustrates loudness equalization with speaker-array input and headphone output, according to an embodiment of the invention;
Fig. 6 is a loudness-calculation flow chart;
Fig. 7 is a detailed block diagram of a device that may be used to implement the techniques described herein;
Fig. 8 shows a device for multichannel loudness equalization.
Detailed Description
A growing number of multichannel applications achieve spectral balance and preserve timbre through nonlinear equalization (also described as dynamic range compression, automatic gain control, or dynamic equalization). This matters greatly for loudspeaker-array systems and headphone playback; in 3D audio playback, nonlinear equalization keeps timbre and spectral balance intact while keeping sound-image azimuths stable.
In embodiments of the invention, the level and azimuth of each sound image are first derived from the multichannel signal; a loudness gain function is then computed from the sound images' target loudness and the frequency response of the actual playback units, and the sound-image levels are modified accordingly. The modified sound-image signals are distributed to the actual playback units according to the sound-image azimuths and the array's unit azimuths.
Fig. 1 is a block diagram of a multichannel playback device according to an embodiment of the invention. As shown in fig. 1, on the left of the device is a signal input unit that receives a multichannel signal, which may be stereo or a 5.1 or 7.1 multichannel signal. Those skilled in the art will appreciate that such signals typically correspond to an array of playback units, referred to herein as the reference or preset output array, from which the signal is best reproduced. The preset output array consists of several playback units such as speakers or headphones, one per channel. For a preset output array with L playback units, the input multichannel signals are x_1(t), x_2(t), …, x_L(t), and the playback units corresponding to the channels are at azimuths θ_1^in, θ_2^in, …, θ_L^in (speaker azimuths are measured from the sweet spot, the central listening position). Fig. 2 is an example of a two-channel speaker array producing stereo sound, and fig. 3 is an example of a five-channel speaker array producing 5.0 surround sound.
On the right of the device is a signal output unit that outputs a multichannel signal to be supplied to the actual output array for playback. The actual output array may be a speaker array or headphones. For an actual output array with M playback units, the channel output signals sent to the units are q_1(t), q_2(t), …, q_M(t), and the units' azimuths are θ_1^out, θ_2^out, …, θ_M^out. The actual output array may have the same number and arrangement of playback units as the preset output array, or a different one. For example, the input signal may be a 5.1 multichannel signal, with the preset output array a 5+1 speaker array for 5.1 playback, while the actual output array is a 7+1 speaker array reproducing a 7.1 multichannel signal. In upmix and downmix playback, the preset and actual output arrays differ in the number and azimuths of their playback units. In some scenarios it is desirable to compare the sound quality of different speaker systems (say, systems A and B) with well-matched driving signals; system A can then be configured as the preset output array and system B as the actual output array.
In the middle of the device is a signal processor implementing multichannel joint nonlinear equalization based on sound-image loudness and azimuth. From the input multichannel signal, the coherent components associated with each sound image can be decomposed by sound-source coherence analysis, blind source separation, or similar methods, and each sound image's level and azimuth derived. The sound image's level signal is converted to a sound pressure level via the preset output array's frequency response. A sound-image gain function, independent of any single sound image, is then computed from the target loudness of the sound images, and the sound pressure level of each sound image is adjusted by it. The adjusted (desired) sound pressure levels are converted back to level signals via the actual output array's frequency response and distributed to the actual output array's playback units according to the sound-image azimuths and the array azimuths. Note that the preset and actual output arrays may have different frequency responses.
Embodiments of the invention encode and decode the signal for each loudspeaker based on the characteristics and desired loudness of each sound image, breaking with the traditional practice of equalizing each channel independently during multichannel playback. Whereas the conventional approach applies dynamic equalization after the speaker-array signals are decoded, embodiments of the invention apply it before decoding. The approach is broadly applicable to multiple playback forms, including multichannel playback and upmixing/downmixing.
In one embodiment, the sound-pressure-level signal and corresponding loudness of each sound image can be determined in different subbands; the subband gain function of each subband is then determined from that subband's current and desired loudness, yielding the sound-image gain function. The desired level signal of each sound image is obtained by adjusting its output sound pressure level in each subband according to that subband's gain function.
A specific embodiment of the invention is described in detail below with reference to fig. 4, which illustrates loudness equalization with speaker-array input and speaker-array output. As shown in fig. 4, loudness equalization according to this embodiment mainly comprises five parts: coherent-component separation; sound-image level and azimuth extraction; loudness calculation; desired sound-image and discrete-component calculation; and assignment of the desired coherent and discrete components to channels.
1. Coherent component separation
On the left of the figure is a multichannel signal with N channels, x_1(t), x_2(t), …, x_N(t). The preset output array corresponding to the N channels has playback units (e.g., loudspeakers) at azimuths θ_1^in, θ_2^in, …, θ_N^in (speaker azimuths are measured from the sweet spot, the central listening position). The preset output array is the array assumed at encoding time for reproducing the first multichannel signal.
The signal x_i(t) of the i-th channel generally consists of the following coherent and discrete components:

x_i(t) = Σ_{k=1}^{K} s_ik(t) + e_i(t)    (1)

where s_ik(t) is the coherent component of the k-th sound image distributed on the i-th channel at time t (it is assumed that K sound images exist in the multichannel signal), and e_i(t) is the discrete component of the i-th channel, such as noise or late reverberation.
The coherent components s_ik(t) of different sound images within a channel are uncorrelated with each other, and coherent components are uncorrelated with discrete components; by contrast, the coherent components of one sound image distributed across the channels are highly correlated. Coherent-signal separation can therefore be performed on the multichannel signal based on component correlation, for example by principal component analysis (PCA), independent component analysis (ICA), or blind source separation (BSS).
If the signal is analyzed in the short-time frequency domain, the frequency-domain form of formula (1) is:

X_i(jω) = Σ_{k=1}^{K} S_ik(jω) + E_i(jω)    (2)

where S_ik(jω) is the coherent (frequency-domain) component of the k-th sound image on the i-th channel in a short-time frame, and E_i(jω) is the discrete (frequency-domain) component of the i-th channel in that frame, such as noise or late reverberation. The short-time frequency-domain analysis can be performed by fast Fourier transform (FFT).
Of course, those skilled in the art will recognize that coherent-signal separation may also be implemented directly in the frequency domain; this is not elaborated here.
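The separation step above can be sketched with a per-frequency-bin PCA, one of the methods the text names. This is a minimal illustration under simplifying assumptions, not the patent's exact algorithm: the function name and toy signal are invented, rectangular non-overlapping frames are used, and a real system would use windowed overlapping frames and track more than one sound image.

```python
import numpy as np

def separate_coherent(X):
    """X: (channels, frames, bins) short-time spectra. Per-bin PCA over time
    frames splits each bin into a rank-1 coherent part S and a diffuse
    residual E = X - S."""
    N, T, B = X.shape
    S = np.zeros_like(X)
    for b in range(B):
        Xb = X[:, :, b]                        # N x T spectra of this bin
        C = (Xb @ Xb.conj().T) / T             # time-averaged covariance
        _, V = np.linalg.eigh(C)               # eigenvectors, ascending order
        u = V[:, -1:]                          # dominant (principal) direction
        S[:, :, b] = u @ (u.conj().T @ Xb)     # projection = coherent part
    return S, X - S

# An amplitude-panned source is fully coherent: the residual vanishes.
rng = np.random.default_rng(0)
mono = rng.standard_normal(8 * 256)
frames = mono.reshape(8, 256)                  # 8 frames of 256 samples
spec = np.fft.rfft(frames, axis=1)             # (8, 129) per-frame spectra
X = np.stack([0.8 * spec, 0.6 * spec])         # (2, 8, 129) stereo STFT
S, E = separate_coherent(X)
print(np.max(np.abs(E)) < 1e-9)                # True
```

Because the two channels here carry the same source at different gains, the per-bin covariance is rank one and the PCA projection recovers the signal exactly; real mixtures with noise and reverberation leave a nonzero diffuse residual.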
2. Sound-image level and azimuth extraction
Based on the speaker azimuths θ_1^in, …, θ_N^in of the preset output array and the component levels S_ik(jω) of each sound image in each channel, the level Y_k(jω) and perceived azimuth θ_k of the k-th sound image can be extracted by vector synthesis.
Assuming that the level and azimuth of a sound image remain essentially unchanged before and after extraction, the level Y_k(jω) and perceived azimuth θ_k are derived under the following conditions:

[Equation (3): vector synthesis giving the sound-image level Y_k(jω) from the per-channel components S_ik(jω)]
[Equation (4): vector synthesis giving the perceived azimuth θ_k from the components S_ik(jω) and the speaker azimuths θ_i^in]

This yields the sound-image level and perceived azimuth under the input (reference) loudspeaker layout; the derived sound image conforms to the directional-perception characteristics of the original array's reproduction.
Likewise, if the level and azimuth of a sound image are to remain essentially unchanged when it is redistributed to the actual output array's speakers, the azimuth perceived on the actual output array will be similar to that perceived on the preset output array.
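The vector-synthesis idea can be sketched as below, assuming a simple energy-weighted vector-sum model; the patent's exact formulas (3) and (4) are not reproduced, and all names are invented for the example.

```python
import numpy as np

def image_level_and_azimuth(levels, azimuths_deg):
    """levels: per-channel component magnitudes |S_ik| of one sound image;
    azimuths_deg: loudspeaker azimuths of the preset array. Returns the
    synthesized image level and perceived azimuth via a vector sum."""
    th = np.deg2rad(np.asarray(azimuths_deg, float))
    g = np.asarray(levels, float)
    vx, vy = np.sum(g * np.cos(th)), np.sum(g * np.sin(th))
    level = np.hypot(vx, vy)                    # vector-sum magnitude
    azimuth = np.rad2deg(np.arctan2(vy, vx))    # perceived direction
    return level, azimuth

# Equal levels on a +/-30 degree stereo pair yield a centre image at 0 degrees.
lvl, az = image_level_and_azimuth([0.5, 0.5], [30.0, -30.0])
print(round(az, 6))   # 0.0
```

With unequal channel levels the resulting azimuth shifts toward the louder speaker, which matches the amplitude-panning intuition behind the extraction step.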
3. Loudness calculation or equalization of sound images
Owing to the reproduction characteristics (e.g., frequency response) of the speaker or earphone units themselves, the demands of the program source (e.g., the large dynamic range of symphonic music), or the perceptual characteristics of the listener (e.g., hearing loss), audio requires nonlinear equalization such as automatic gain control, dynamic range control, and dynamic equalization. If, during speaker-array playback, such loudness equalization computes loudness independently channel by channel, the nonlinear equalization is very likely to destroy the azimuth of the original sound image.
Multichannel joint nonlinear equalization based on sound-image loudness and azimuth is therefore needed. Each sound image's level can be converted from the level domain to the sound-pressure-level and loudness domains; a loudness gain function is determined from the target loudness; the sound image's sound pressure level is adjusted by that function; and the adjusted level is converted from the sound-pressure-level domain back to the level domain. In one embodiment, the joint equalization is implemented in subbands. Fig. 6 shows the decomposed structure of the subband loudness calculation: sound-pressure-level calculation, subband loudness calculation, and subband desired-sound-pressure-level calculation. The specific steps follow.
1) From the level Y_k(jω) of each sound image and the frequency response H_in(jω) of the preset output array, compute each sound image's reference (input) sound pressure level SPL_k(jω). In one example, the frequency responses of the preset output array's playback units are identical.

SPL_k(jω) = 20 log10( Y_k(jω) H_in(jω) / (2×10⁻⁵) )    (5)
2) Using a subband analysis filter bank with auditory-perception characteristics, or a coarse frequency-band division, split each sound image's reference sound pressure level SPL_k(jω) into P subbands; the reference sound-pressure-level signal of the k-th sound image in the p-th subband is SPL_kp(jω).
3) From the sound images' reference sound pressure levels SPL_kp(jω) in the p-th subband, determine that subband's reference (current) total sound pressure level SPL_pSUM(jω) by energy summation:

SPL_pSUM(jω) = 10 log10( Σ_{k=1}^{K} 10^{SPL_kp(jω)/10} )    (6)
4) Using the relation between subband total sound pressure level and loudness in ISO 226, compute from SPL_kp(jω) the subband reference (current) specific loudness L_kp(jω) of the k-th sound image in the p-th subband, and obtain the desired (output) loudness L_kp^out(jω) of the k-th sound image in the p-th subband from a preset loudness-variation formula.
5) Using the same ISO 226 relation between subband total sound pressure level and loudness, derive from the desired loudness L_kp^out(jω) the desired total sound pressure level SPL_pSUM^out(jω) of each subband, and then compute the desired sound pressure level SPL_kp^out(jω) of the k-th sound image in the p-th subband.
The ratio of the p-th subband's desired sound pressure level SPL_pSUM^out(jω) to its reference (current) sound pressure level SPL_pSUM(jω) can be regarded as the sound-image gain function in that subband. Since this gain function is determined via loudness, it may also be called a loudness gain function. Of course, those skilled in the art will recognize that the sound-image gain function may be determined in other ways.
6) Concatenating the desired sound pressure levels SPL_kp^out(jω) of the k-th sound image over the P subbands yields SPL_k^out(jω), the desired sound pressure level of the k-th sound image. This gives the desired sound pressure level of every sound image.
7) Next, the desired sound pressure level of the discrete components of each channel is obtained. The equalization of the discrete components may be linear or nonlinear; its main purpose is to keep the energy ratio of the output sound-image components to the discrete components close to that of the input. In one example, the desired sound pressure level of the discrete components of each channel may be adjusted according to the sum of the desired sound pressure levels of the respective sound images and the sum of their reference or current sound pressure levels. Of course, those skilled in the art will recognize that other methods of equalizing the discrete components may be employed.
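The adjustment equation itself is an image in the source and is not reproduced. One plausible reading of step 7 is that each channel's discrete-component level is shifted by the same dB offset as the overall sound-image level, which preserves their energy ratio; the sketch below encodes that assumption and is not the patent's exact formula:

```python
def adjust_discrete_spl(discrete_spl_db, sum_desired_db, sum_current_db):
    """Shift a channel's discrete-component level (dB) by the same amount
    the total sound-image level changed, preserving their energy ratio."""
    return discrete_spl_db + (sum_desired_db - sum_current_db)

# If the sound images rose from 65 dB to 70 dB overall, the discrete
# components of a channel sitting at 55 dB follow by the same 5 dB:
adjusted = adjust_discrete_spl(55.0, 70.0, 65.0)  # 60.0 dB
```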
The foregoing performs loudness equalization of the sound images by sub-band analysis. However, those skilled in the art will recognize that the sub-band analysis filter may be replaced with an FFT, with the energy of each band estimated in the frequency domain by coarse banding. The FFT approach requires less computation, and its processing module can be shared with the loudness calculation.
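The FFT-based coarse banding mentioned above can be sketched as follows. A naive DFT stands in for an FFT library call, and the band edges are illustrative, since the patent does not specify them:

```python
import cmath
import math

def dft(x):
    """Naive DFT, standing in for an FFT library call."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def band_energies(signal, band_edges):
    """Estimate the energy of each coarse band from DFT bin magnitudes.
    band_edges: list of (lo, hi) half-open bin ranges."""
    spec = dft(signal)
    return [sum(abs(spec[b]) ** 2 for b in range(lo, hi))
            for lo, hi in band_edges]

# An 8-sample frame of a cosine at DFT bin 2; coarse bands over bins 0-1 and 2-3:
sig = [math.cos(2.0 * math.pi * 2.0 * t / 8.0) for t in range(8)]
low_band, high_band = band_energies(sig, [(0, 2), (2, 4)])
# The tone's energy falls entirely in the second band (the one containing bin 2).
```

Grouping bins into a handful of coarse bands is what makes this cheaper than running a full auditory filter bank.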
An equalization method based on loudness calculation can preserve the perceived spectral balance of the audio while changing its dynamic range or raising the playback volume, thereby effectively protecting the sound quality.
4. Desired sound image and discrete component calculation
According to the frequency response characteristic H_out(jω) of the actual output array and the output sound pressure level SPL_k^out(jω) of the k-th sound image, the desired or output level signal S_k^out(jω) of the k-th sound image is derived. Similarly, the desired or output level signal of the discrete components of each channel is derived.
In the case where the actual output array is a speaker array, the output sound images are distributed to the individual speakers for reproduction according to the layout of the playback speakers and the estimated orientation of each sound image.
(1) Assume that there are M speakers in the actual output array, whose orientations are denoted θ_1, θ_2 … θ_M (speaker orientations are measured with the listening sweet spot as the center). For the k-th sound image S_k^out(jω) with orientation θ_k^s, select from the actual output array the speaker orientations nearest to θ_k^s, and distribute the k-th sound image to the several adjacent loudspeakers by vector panning; the sound image level distributed to each loudspeaker is T_1k(jω), T_2k(jω) … T_Mk(jω).
Assuming that the magnitude and orientation of a sound image remain essentially unchanged before and after distribution, the levels T_1k(jω), T_2k(jω) … T_Mk(jω) distributed to the speakers can be determined from the condition that vector synthesis of the levels distributed along the speaker orientations reproduces the original sound image.
T_ik(jω) here has multiple solutions, from which an optimal solution can be selected: vector synthesis using the minimum number of speakers, preferring those closest in angle to the sound image.
The coherent component of each loudspeaker of the actual output array is the sum of the levels of the respective sound images distributed to that loudspeaker, i.e. T_i(jω) = Σ_{k=1..K} T_ik(jω).
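The vector panning described in (1) can be sketched in 2D as follows. This is a simplified sketch in the spirit of amplitude panning over the two nearest speakers, solving a 2×2 vector-synthesis system; the patent's exact condition and any 3D handling are not reproduced, and the angles are illustrative:

```python
import math

def pan_to_pair(image_deg, image_level, spk1_deg, spk2_deg):
    """Split a sound-image level over two adjacent speakers so that the
    gain-weighted sum of the speaker unit vectors equals the image vector."""
    p = (math.cos(math.radians(image_deg)), math.sin(math.radians(image_deg)))
    l1 = (math.cos(math.radians(spk1_deg)), math.sin(math.radians(spk1_deg)))
    l2 = (math.cos(math.radians(spk2_deg)), math.sin(math.radians(spk2_deg)))
    det = l1[0] * l2[1] - l1[1] * l2[0]  # assumes the two speakers are not collinear
    g1 = (p[0] * l2[1] - p[1] * l2[0]) / det  # Cramer's rule for [l1 l2] g = p
    g2 = (l1[0] * p[1] - l1[1] * p[0]) / det
    return g1 * image_level, g2 * image_level

# A sound image straight ahead (0 degrees) between speakers at +30 and -30
# degrees splits symmetrically over the pair:
t1, t2 = pan_to_pair(0.0, 1.0, 30.0, -30.0)
```

Restricting the synthesis to the two speakers bracketing the image is one way to satisfy the "minimum number of speakers, closest in angle" criterion above.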
(2) Denote the discrete components of the speakers of the actual output array as G_1(jω), G_2(jω) … G_M(jω). The discrete component outputs of the preset output array (whose corresponding loudspeaker orientations are φ_1, φ_2 … φ_N) are distributed to the adjacent loudspeakers by vector panning; the discrete component of the i-th channel of the preset output array distributed to each loudspeaker of the actual output array is G_1i(jω), G_2i(jω) … G_Mi(jω).
Assuming that in the transformation from the preset output array to the actual output array the magnitude and orientation of the discrete components remain essentially unchanged, G_1i(jω), G_2i(jω) … G_Mi(jω) can be determined from the condition that vector synthesis of the distributed components reproduces the original discrete component.
G_ji(jω) here has multiple solutions, from which an optimal solution can be selected: vector synthesis using the minimum number of loudspeakers, preferring those closest to the original loudspeaker orientation.
The discrete component of each loudspeaker in the actual output array is the sum of the discrete components assigned to that loudspeaker by the channels of the preset output array, i.e. G_j(jω) = Σ_{i=1..N} G_ji(jω).
Those skilled in the art will appreciate that the discrete components need not be solved by vector synthesis. A simpler method that ignores direction information may be employed, for example a simple arithmetic mean.
(3) Suppose the output signals of the channels of the playback loudspeakers are q_1(t), q_2(t), … q_M(t), with frequency-domain representations Q_1(jω), Q_2(jω), … Q_M(jω); each channel signal is the superposition of the coherent and discrete components of that loudspeaker, Q_i(jω) = T_i(jω) + G_i(jω).
Even if the volume and the loudspeaker positions are adjusted during 3D sound reproduction, the spectral balance of the timbre and the sound image orientations remain stable.
The above description has addressed reproduction over speaker arrays, where the preset output array and the actual output array may differ. In this specification, sound reproduction includes not only upmix and downmix reproduction but also reproduction in which the preset output array matches the actual output array, as well as headphone reproduction. Fig. 5 illustrates the loudness equalization method with speaker-array input and headphone output. Fig. 5 differs from fig. 4 only in the last stage, the headphone reproduction stage, i.e. the assignment of the desired channel coherent components and discrete components.
In headphone sound reproduction, the transmission of sound waves from a sound source to the two ears is expressed using head-related transfer functions (HRTFs). Let h_L(θ, t) and h_R(θ, t) denote the time-domain transfer functions from a sound source in direction θ to the left and right ear canals of the listener, respectively, and H_L(θ, jω) and H_R(θ, jω) their frequency-domain representations. Either measured real-person data or artificial-head data may be used, and the transfer functions may be taken in the free field or in a reverberant field.
(1) Calculate the coherent components T_L(jω) and T_R(jω) in the output signals of the left and right headphone units: each output sound image is filtered with the HRTF for its orientation and the results are summed, i.e. T_L(jω) = Σ_k H_L(θ_k^s, jω)·S_k^out(jω) and T_R(jω) = Σ_k H_R(θ_k^s, jω)·S_k^out(jω), where S_k^out(jω) is the output level signal of the k-th sound image and θ_k^s its orientation.
(2) Calculate the discrete components G_L(jω) and G_R(jω) in the output signals of the left and right headphone units. Various methods exist for computing the discrete components, and no HRTF factor needs to be included. For example, the original discrete components of the left half-plane may be assigned to G_L(jω) in a certain proportion, and those of the right half-plane to G_R(jω) in a certain proportion.
(3) Calculate the signals Q_L(jω) and Q_R(jω) of the left and right headphone units as the superposition of the corresponding coherent and discrete components: Q_L(jω) = T_L(jω) + G_L(jω) and Q_R(jω) = T_R(jω) + G_R(jω).
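The headphone stage (1)-(3) can be sketched as follows for a single frequency bin. The HRTF values, directions, and levels are made-up placeholders rather than measured data, and the half-plane proportioning is simplified to a direct assignment:

```python
def render_headphones(images, hrtf_l, hrtf_r, discrete_by_side):
    """images: list of (direction, level) pairs, one per sound image.
    hrtf_l / hrtf_r: dicts mapping a direction to that ear's HRTF value.
    discrete_by_side: {'left': level, 'right': level} discrete components."""
    t_l = sum(hrtf_l[d] * lvl for d, lvl in images)  # coherent part, left ear
    t_r = sum(hrtf_r[d] * lvl for d, lvl in images)  # coherent part, right ear
    g_l = discrete_by_side['left']                   # discrete part, no HRTF applied
    g_r = discrete_by_side['right']
    return t_l + g_l, t_r + g_r                      # Q_L and Q_R by superposition

# Illustrative single-bin example with placeholder HRTF magnitudes:
imgs = [(-30, 1.0), (30, 0.5)]          # two images at -30 and +30 degrees
hl = {-30: 0.9, 30: 0.4}                # left ear favors the left-side source
hr = {-30: 0.4, 30: 0.9}                # right ear favors the right-side source
q_l, q_r = render_headphones(imgs, hl, hr, {'left': 0.1, 'right': 0.1})
```

In a full implementation this would be evaluated per frequency bin ω with complex-valued HRTFs; the structure — HRTF-weighted sum plus per-side discrete component — is the same.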
By considering both the loudness and the azimuth of each sound image in the 3D sound field, embodiments of the present invention ensure that dynamic equalization causes neither timbre distortion nor azimuth confusion. Practical experiments show that, even if the volume and the loudspeaker positions are adjusted during 3D sound reproduction, embodiments of the present invention keep the spectral balance of the timbre and the sound image orientations stable.
FIG. 7 illustrates a detailed block diagram of a device that may be used to implement the various techniques described above according to embodiments of the present specification; it shows the hardware basis on which the method flows of figs. 4-6 can be implemented. As shown in fig. 7, the device may include a processor 702, e.g. a microprocessor or controller, for controlling the overall operation of the device. A data bus 715 may be used for data transfer between the storage device 740, the processor 702, and a controller 711. The controller 711 may be used to interact with and control various devices via a device control bus 717. The device may also include a network/bus interface 714 that couples to a data link 712; in the case of a wireless connection, the network/bus interface 714 may include a wireless transceiver.
The device further comprises a storage device 740. In one example, the storage device 740 stores software; in operation, the software is loaded from the storage device 740 into RAM 720 and thereby controls the processor 702 to perform operations including: extracting, from a first multi-channel signal of N channels, the level sizes and orientations of K sound images in conformity with a preset output array having N playback units, wherein N is more than or equal to 2, and K is more than or equal to 1; determining a loudness gain function independent of each sound image according to the frequency response characteristic of the preset output array, the level sizes of the K sound images, and the target loudness; adjusting the level of each of the K sound images according to the loudness gain function and the frequency response characteristic of an actual output array having M playback units, wherein M is more than or equal to 2; and distributing the second multi-channel signals of M channels to the actual output array having M playback units according to the orientations of the K sound images and the adjusted sound image levels of the K sound images.
It will be appreciated that the apparatus described herein may in many respects utilize or be combined with the method embodiments described above.
Those skilled in the art will recognize that, in one or more of the examples described above, the functions described in the various embodiments of the specification may be implemented in hardware, software, firmware, or any combination thereof. When implemented in software, the functions may be stored on or transmitted over as one or more instructions or code on a computer-readable medium. According to an embodiment of another aspect, there is also provided a computer-readable storage medium having stored thereon a computer program which, when executed in a computer, causes the computer to perform the method described in connection with fig. 4-6.
Fig. 8 illustrates some possible scenarios in which the functions described in the embodiments of this specification are implemented in hardware, firmware, a combination thereof, or in combination with software. In particular, fig. 8 shows a device for multi-channel loudness equalization, the device comprising: an extracting module 802 for extracting, from a first multi-channel signal of N channels, the level sizes and orientations of K sound images in conformity with a preset output array having N playback units, wherein N is more than or equal to 2, and K is more than or equal to 1; a determining module 804 configured to determine a loudness gain function independent of each sound image according to the frequency response characteristic of the preset output array, the level sizes of the K sound images, and the target loudness; an adjusting module 806 for adjusting the level of each of the K sound images according to the loudness gain function and the frequency response characteristic of an actual output array having M playback units, wherein M is more than or equal to 2; and a distributing module 808 configured to distribute the second multi-channel signals of M channels to the actual output array according to the orientations of the K sound images and the adjusted sound image levels of the K sound images.
It will be appreciated that the multi-channel loudness equalization apparatus described herein may in many ways make use of or be combined with the previously described method embodiments.
The above description is only for the specific embodiments of the present invention, but the scope of the present invention is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present invention, and all the changes or substitutions should be covered within the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the appended claims.

Claims (12)

1. A method of multichannel loudness equalization, comprising:
extracting level sizes and orientations of K sound images in conformity with a first output array having N playback units from a first multi-channel signal of N channels; wherein N is more than or equal to 2, and K is more than or equal to 1; the first output array is an array preset at the time of encoding for reproducing a first multi-channel signal;
calculating the sound pressure level of each sound image in the K sound images according to the frequency response characteristic of the first output array, the level size and the target loudness of the K sound images; dividing the sound pressure level of each sound image in the K sound images into sound pressure level signals of the K sound images in P sub-bands, and determining a loudness gain function independent of each sound image according to the expected sound pressure level signals and the current sound pressure level signals of the P sub-bands;
adjusting the level of each sound image in the K sound images according to the loudness gain function and the frequency response characteristic of a second output array with M playback units; wherein M is more than or equal to 2;
and distributing the second multi-channel signals of the M channels to the second output array according to the orientation of the K sound images and the adjusted sound image level of the K sound images.
2. The method of claim 1, wherein the extracting, from the first multi-channel signal of N channels, the level size and orientation of the K sound images in conformity with the first output array having N playback units comprises separating, from the first multi-channel signal, sound image component signals of the K sound images on the N channels in accordance with the orientation of the N playback units of the first output array; the level sizes and orientations of the K sound images are extracted from the sound image component signals by vector synthesis.
3. The method of claim 1 wherein determining a loudness gain function independent of the respective sound image based on the desired and current sound pressure level signals for the P subbands comprises determining a current sound pressure level for each of the P subbands based on the sound pressure level signals for the K sound images in the P subbands; determining gain functions of the P sub-bands according to the expected sound pressure levels of the P sub-bands and the current sound pressure levels of the P sub-bands, and forming the loudness gain function; wherein the expected sound pressure level of the P sub-bands is determined by the current sound pressure level of the P sub-bands and a preset loudness variation formula;
adjusting the level of each of the K sound images according to the loudness gain function and the frequency response characteristics of the second output array having M playback units includes determining desired sound pressure levels of the K sound images at the P subbands according to the gain functions of the P subbands; determining the expected sound pressure level of each sound image in the K sound images according to the expected sound pressure levels of the K sound images in the P sub-bands; the level of each sound image is adjusted according to the desired sound pressure levels of the K sound images and the frequency response characteristics of the second output array.
4. A method according to claim 3, wherein said extracting from the N-channel first multi-channel signal the level size and orientation of the K sound images in conformity with the first output array having the N playback units comprises extracting from the N-channel first multi-channel signal the discrete component signals of the N channels in conformity with the first output array;
the method further includes adjusting the discrete component signals of the N channels according to the sum of the desired sound pressure levels of the K sound images and the sum of the current sound pressure levels of the K sound images; and distributing the discrete component signals of M channels to the M playback units of the second output array according to the adjusted discrete component signals of the N channels; the method further comprises superimposing, channel by channel, the second multi-channel signals of the M channels and the discrete component signals of the M channels.
5. The method of claim 1, wherein assigning the second multi-channel signals of the M channels to the second output array based on the orientations of the K sound images and the adjusted sound image level sizes of the K sound images comprises selecting at least one first playback unit adjacent to the orientations of the respective sound images in the K sound images from among the M playback units of the second output array, and assigning the level sizes of the sound images to the at least one first playback unit by vector panning, thereby determining the channel signals assigned to the at least one first playback unit.
6. The method of claim 1, wherein the second output array includes a left headphone unit and a right headphone unit, and the assigning the second multi-channel signals of the M channels to the second output array based on the orientations of the K sound images and the adjusted sound image level sizes of the K sound images comprises determining the channel signals of the left headphone unit and the right headphone unit based on the head-related transfer function of the left ear and the head-related transfer function of the right ear and the adjusted sound image level sizes of the K sound images.
7. An apparatus for multichannel loudness equalization, comprising:
an extracting module for extracting a level size and an orientation of K sound images in conformity with a first output array having N playback units from a first multi-channel signal of N channels; wherein N is more than or equal to 2, and K is more than or equal to 1; the first output array is an array preset at the time of encoding for reproducing a first multi-channel signal;
the determining module is used for calculating the sound pressure level of each sound image in the K sound images according to the frequency response characteristic of the first output array, the level size and the target loudness of the K sound images; dividing the sound pressure level of each sound image in the K sound images into sound pressure level signals of the K sound images in P sub-bands, and determining a loudness gain function independent of each sound image according to the expected sound pressure level signals and the current sound pressure level signals of the P sub-bands;
the sound image adjusting module is used for adjusting the level of each sound image in the K sound images according to the loudness gain function and the frequency response characteristic of the second output array with the M playback units; wherein M is more than or equal to 2;
and the distribution module is used for distributing the second multi-channel signals of the M sound channels to the second output array according to the orientation of the K sound images and the adjusted sound image level of the K sound images.
8. The apparatus of claim 7, wherein the extracting module separates the sound image component signals of K sound images on N channels from the first multi-channel signal according to orientations of the N playback units of the first output array; the level sizes and orientations of the K sound images are extracted from the sound image component signals by vector synthesis.
9. The apparatus of claim 7, wherein the determining module determines the current sound pressure level of each of the P subbands based on the sound pressure level signals of the K sound images in the P subbands; determining gain functions of the P sub-bands according to the expected sound pressure levels of the P sub-bands and the current sound pressure levels of the P sub-bands, and forming the loudness gain function; wherein the expected sound pressure level of the P sub-bands is determined by the current sound pressure level of the P sub-bands and a preset loudness variation formula;
the adjusting module determines the expected sound pressure levels of the K sound images in the P sub-bands according to the gain functions of the P sub-bands; determining the expected sound pressure level of each sound image in the K sound images according to the expected sound pressure levels of the K sound images in the P sub-bands; the level of each sound image is adjusted according to the desired sound pressure levels of the K sound images and the frequency response characteristics of the second output array.
10. The apparatus of claim 9, wherein the extraction module extracts discrete component signals of N channels coinciding with the first output array from a first multi-channel signal of N channels;
the equipment also comprises a discrete component adjusting module, which adjusts the discrete component signals of the N sound channels according to the sum of the expected sound pressure levels of the K sound images and the sum of the current sound pressure levels of the K sound images;
the discrete component distribution module is used for distributing the discrete component signals of the M sound channels to the M reproduction units of the second output array according to the discrete component signals adjusted by the N sound channels;
and the superposition module is used for superimposing, channel by channel, the second multi-channel signals of the M channels and the discrete component signals of the M channels.
11. The apparatus of claim 7, wherein the distribution module selects at least one first playback unit adjacent to the azimuth of each sound image in the K sound images from the M playback units of the second output array, distributes the level magnitude of the sound image to the at least one first playback unit by vector panning, thereby determining the channel signal distributed to the at least one first playback unit.
12. The apparatus of claim 7, wherein the second output array includes left and right headphone units, and the assignment module determines the channel signals of the left and right headphone units according to the head-related transfer functions of the left and right ears and the adjusted sound image level sizes of the K sound images.
CN201811223436.1A 2018-10-19 2018-10-19 Multichannel loudness equalization method and apparatus Active CN109121067B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811223436.1A CN109121067B (en) 2018-10-19 2018-10-19 Multichannel loudness equalization method and apparatus


Publications (2)

Publication Number Publication Date
CN109121067A CN109121067A (en) 2019-01-01
CN109121067B true CN109121067B (en) 2020-06-09

Family

ID=64855100

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811223436.1A Active CN109121067B (en) 2018-10-19 2018-10-19 Multichannel loudness equalization method and apparatus

Country Status (1)

Country Link
CN (1) CN109121067B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1277532A (en) * 1999-06-10 2000-12-20 三星电子株式会社 Multiple-channel audio frequency replaying apparatus and method
CN101483416A (en) * 2009-01-20 2009-07-15 杭州火莲科技有限公司 Response balance processing method for voice
CN107093991A (en) * 2013-03-26 2017-08-25 杜比实验室特许公司 Loudness method for normalizing and equipment based on target loudness

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7039204B2 (en) * 2002-06-24 2006-05-02 Agere Systems Inc. Equalization for audio mixing
US9253586B2 (en) * 2013-04-26 2016-02-02 Sony Corporation Devices, methods and computer program products for controlling loudness



Similar Documents

Publication Publication Date Title
US10757529B2 (en) Binaural audio reproduction
KR102529122B1 (en) Method, apparatus and computer-readable recording medium for rendering audio signal
JP4921470B2 (en) Method and apparatus for generating and processing parameters representing head related transfer functions
US5594800A (en) Sound reproduction system having a matrix converter
EP3061268B1 (en) Method and mobile device for processing an audio signal
US20150131824A1 (en) Method for high quality efficient 3d sound reproduction
EP0571455B1 (en) Sound reproduction system
KR20080060640A (en) Method and apparatus for reproducing a virtual sound of two channels based on individual auditory characteristic
JP6552132B2 (en) Audio signal processing apparatus and method for crosstalk reduction of audio signal
CA2983359C (en) An audio signal processing apparatus and method
Garí et al. Flexible binaural resynthesis of room impulse responses for augmented reality research
WO2019239011A1 (en) Spatial audio capture, transmission and reproduction
TWI745795B (en) APPARATUS, METHOD AND COMPUTER PROGRAM FOR ENCODING, DECODING, SCENE PROCESSING AND OTHER PROCEDURES RELATED TO DirAC BASED SPATIAL AUDIO CODING USING LOW-ORDER, MID-ORDER AND HIGH-ORDER COMPONENTS GENERATORS
WO2018193163A1 (en) Enhancing loudspeaker playback using a spatial extent processed audio signal
CN109121067B (en) Multichannel loudness equalization method and apparatus
CN109923877B (en) Apparatus and method for weighting stereo audio signal
WO2017211448A1 (en) Method for generating a two-channel signal from a single-channel signal of a sound source
US11832079B2 (en) System and method for providing stereo image enhancement of a multi-channel loudspeaker setup
US20240056735A1 (en) Stereo headphone psychoacoustic sound localization system and method for reconstructing stereo psychoacoustic sound signals using same
WO2024081957A1 (en) Binaural externalization processing
JP2024502732A (en) Post-processing of binaural signals
JP2018101824A (en) Voice signal conversion device of multichannel sound and program thereof

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant