EP4635203A1 - Apparatus and method for estimating the perceptual acoustics of a target room

Apparatus and method for estimating the perceptual acoustics of a target room

Info

Publication number
EP4635203A1
EP4635203A1 (application EP22838850.0A)
Authority
EP
European Patent Office
Prior art keywords
room
perceptual
data processing
acoustic parameters
processing apparatus
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
EP22838850.0A
Other languages
German (de)
French (fr)
Inventor
Shivam Saini
Stephan Werner
Lukas TREYBIG
Ulrike SLOMA
Liyun PANG
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd filed Critical Huawei Technologies Co Ltd
Publication of EP4635203A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04S STEREOPHONIC SYSTEMS
    • H04S7/00 Indicating arrangements; Control arrangements, e.g. balance control
    • H04S7/30 Control circuits for electronic adaptation of the sound field
    • H04S7/302 Electronic adaptation of stereophonic sound system to listener position or orientation
    • H04S7/303 Tracking of listener position or orientation
    • H04S7/304 For headphones
    • H04S7/305 Electronic adaptation of stereophonic audio signals to reverberation of the listening space
    • H04S7/306 For headphones

Definitions

  • the present disclosure relates to audio processing in general. More specifically, the disclosure relates to an apparatus and method for estimating the perceptual acoustics of a target room.
  • 3D sound can be defined as sound arriving at a location, for example the ears of a listener, from varying directions and varying distances, which can contribute, for example, to the three-dimensional aural image humans hear.
  • 3D audio rendering can comprise creating a sound world by attaching a characteristic sound to virtual objects in a virtual room (also called environment or scenery) to synthesize as a 3D sound.
  • In 3D audio rendering, for instance via headphones, large acoustic and also perceptual differences between a virtual (e.g. synthesized or rendered) room and the real listening environment, defined as a target room, may degrade the spatial plausibility and immersion and, thus, lead to the so-called “Room Divergence Effect”.
  • a data processing apparatus for estimating the perceptual acoustics of a target room with respect to a perceptual attribute of the target room.
  • the data processing apparatus is configured to obtain a plurality of physical acoustic parameters for the target room and obtain the plurality of physical acoustic parameters, i.e. the same parameters as for the target room, for a virtual room, e.g. from a database.
  • the data processing apparatus is further configured to estimate a perceptual acoustic distance measure value for the selected perceptual attribute between the target room and the virtual room based on the plurality of physical acoustic parameters of the target room and on the plurality of physical acoustic parameters of the virtual room and based on a perceptual acoustic quality model.
  • the perceptual acoustic quality model defines a mapping, in particular a correlation, a) between the plurality of physical acoustic parameters and a first value or measure of the perceptual attribute of the target room and b) between the plurality of physical acoustic parameters and a second value or measure of the perceptual attribute of the virtual room.
  • the perceptual attribute of the target room and the virtual room comprises or is an envelopment attribute, a coloration attribute, a plausibility attribute, or an externalization attribute.
  • the data processing apparatus is configured to measure the plurality of physical acoustic parameters for the target room for obtaining the plurality of physical acoustic parameters for the target room.
  • the data processing apparatus is configured to obtain the plurality of physical acoustic parameters, i.e. the same parameters as for the target room, for the virtual room from a database of physical acoustic parameters for a plurality of virtual rooms.
  • the plurality of physical acoustic parameters of the target room and the virtual room comprises: an energy decay curve, EDC, parameter; a reverberation time parameter; a definition parameter; a speech transmission index, STI, parameter; a clarity index parameter; a direct-to-reverberant ratio, DRR, parameter; a centre time parameter; an inter-aural cross-correlation, IACC, parameter and/or a late lateral energy parameter.
  • the perceptual acoustic quality model defines, i.e. comprises, for each perceptual attribute, a correlation matrix between the pluralities of physical acoustic parameters of the target room and the virtual room and the first and second values or measures of the perceptual attribute of the target room and the virtual room for defining the mapping, in particular the correlation, a) between the plurality of physical acoustic parameters and the first value or measure of the perceptual attribute of the target room and b) between the plurality of physical acoustic parameters and the second value or measure of the perceptual attribute of the virtual room.
  • the perceptual acoustic quality model defines, i.e. comprises, for each perceptual attribute, a correlation matrix between weighted linear combinations (i.e. the LDA discriminant functions, LDs, determined by means of a linear discriminant analysis) of the plurality of physical acoustic parameters of the target room and the first value or measure of the perceptual attribute of the target room and between weighted linear combinations of the plurality of physical acoustic parameters of the virtual room and the second value or measure of the perceptual attribute of the virtual room.
  • the perceptual attribute of the target room and the virtual room is a coloration attribute and wherein the weighted linear combination of the plurality of physical acoustic parameters of the target room and the virtual room comprises a reverberation time parameter for an energy decay by 30 dB having the largest weight.
  • the weighted linear combination of the plurality of physical acoustic parameters of the target room and the virtual room comprises a reverberation time parameter for an energy decay by 20 dB with a weight having an absolute value in the range from about 0.33 to about 0.53, in particular 0.4382, the reverberation time parameter for an energy decay by 30 dB with a weight having an absolute value in the range from about 0.9 to about 1.0, in particular 1.0, and a clarity index parameter for 80 ms with a weight having an absolute value in the range from about 0.29 to about 0.49, in particular 0.3958.
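As an illustrative sketch (not the patented implementation), the coloration discriminant can be understood as a weighted linear combination of room acoustic parameters. The weight magnitudes below are the "in particular" values stated above; the signs of the weights are not given in the text, so absolute values are used, and the room parameter values are invented.

```python
# Illustrative sketch: coloration attribute as a weighted linear combination
# (an LDA-style discriminant) of room acoustic parameters. Weight magnitudes
# are taken from the text; their signs are not stated there, so absolute
# values are used here.

COLORATION_WEIGHTS = {
    "T20": 0.4382,  # reverberation time for a 20 dB energy decay
    "T30": 1.0,     # reverberation time for a 30 dB energy decay (largest weight)
    "C80": 0.3958,  # clarity index for 80 ms
}

def discriminant_score(params, weights):
    """Weighted linear combination of the given acoustic parameters."""
    return sum(w * params[name] for name, w in weights.items())

# Hypothetical parameter values for two rooms (T20/T30 in seconds, C80 in dB).
room_a = {"T20": 0.42, "T30": 0.48, "C80": 4.1}
room_b = {"T20": 0.95, "T30": 1.10, "C80": 1.3}
coloration_distance = abs(
    discriminant_score(room_a, COLORATION_WEIGHTS)
    - discriminant_score(room_b, COLORATION_WEIGHTS)
)
```

The distance between the two rooms' discriminant scores can then serve as a simple scalar comparison for this attribute.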
  • the perceptual attribute of the target room and the virtual room is an envelopment attribute.
  • the weighted linear combination of the plurality of physical acoustic parameters of the target room and the virtual room may comprise a clarity index parameter for 50 ms having the largest weight.
  • the weighted linear combination of the plurality of physical acoustic parameters of the target room and the virtual room comprises a reverberation time parameter for an energy decay by 20 dB with a weight having an absolute value in the range from about 0.9 to about 1.00, in particular 0.9533, a reverberation time parameter for an energy decay by 30 dB with a weight having an absolute value in the range from about 0.85 to about 0.95, in particular 0.8834, the clarity index parameter for 50 ms with a weight having an absolute value in the range from about 0.9 to about 1.0, in particular 1.0, and a clarity index parameter for 80 ms with a weight having an absolute value in the range from about 0.9 to about 1.00, in particular 0.9228.
  • the perceptual acoustic quality model defines, i.e. comprises, for each perceptual attribute, a correlation matrix between a cosine similarity of a first and a second weighted linear combination of the plurality of physical acoustic parameters of the target room and the first value or measure of the perceptual attribute of the target room and between a cosine similarity of the first and the second weighted linear combination of the plurality of physical acoustic parameters of the virtual room and the second value or measure of the perceptual attribute of the virtual room.
  • the perceptual attribute of the target room and the virtual room is a plausibility attribute.
  • the first weighted linear combination of the plurality of physical acoustic parameters of the target room and the virtual room may comprise a reverberation time parameter for an energy decay by 30 dB having the largest weight of the first weighted linear combination and the second weighted linear combination of the plurality of physical acoustic parameters of the target room and the virtual room comprises a clarity index parameter for 50 ms having the largest weight of the second weighted linear combination.
  • the first weighted linear combination of the plurality of physical acoustic parameters of the target room and the virtual room comprises a reverberation time parameter for an energy decay by 20 dB with a weight having an absolute value in the range from about 0.33 to about 0.53, in particular 0.4382, the reverberation time parameter for an energy decay by 30 dB with a weight having an absolute value in the range from about 0.9 to about 1.0, in particular 1.0, and a clarity index parameter for 80 ms with a weight having an absolute value in the range from about 0.29 to 0.49, in particular 0.3958.
  • the second weighted linear combination of the plurality of physical acoustic parameters of the target room and the virtual room comprises a reverberation time parameter for an energy decay by 20 dB with a weight having an absolute value in the range from about 0.9 to about 1.00, in particular 0.9533, a reverberation time parameter for an energy decay by 30 dB with a weight having an absolute value in the range from about 0.85 to about 0.95, in particular 0.8834, the clarity index parameter for 50 ms with a weight having an absolute value in the range from about 0.9 to about 1.0, in particular 1.0, and a clarity index parameter for 80 ms with a weight having an absolute value in the range from about 0.9 to about 1.00, in particular 0.9228.
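A minimal sketch of the cosine-similarity formulation for the plausibility attribute: a first and a second weighted combination (LD1, LD2) of the same parameters are compared via cosine similarity. Weight magnitudes follow the text (signs are not stated there); since the first combination has no 50 ms clarity term, it is given a zero weight here, and the room parameter values are invented.

```python
import math

# Sketch: cosine similarity between two weighted parameter vectors (LD1, LD2)
# for one room, as used for the plausibility attribute. Weight magnitudes are
# from the text; the zero C50 weight in LD1 and the parameter values are
# assumptions for illustration.

LD1_WEIGHTS = {"T20": 0.4382, "T30": 1.0, "C50": 0.0, "C80": 0.3958}
LD2_WEIGHTS = {"T20": 0.9533, "T30": 0.8834, "C50": 1.0, "C80": 0.9228}
PARAM_ORDER = ["T20", "T30", "C50", "C80"]

def weighted_vector(weights, params):
    """Element-wise weighted parameter vector in a fixed parameter order."""
    return [weights[k] * params[k] for k in PARAM_ORDER]

def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

room = {"T20": 0.55, "T30": 0.60, "C50": 2.3, "C80": 4.0}
similarity = cosine_similarity(
    weighted_vector(LD1_WEIGHTS, room),
    weighted_vector(LD2_WEIGHTS, room),
)
```

The resulting scalar per room is what the model would then correlate with the subjective plausibility rating.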
  • the data processing apparatus is further configured to determine a physical acoustic distance measure value between the target room and the virtual room based on the plurality of physical acoustic parameters of the target room and the plurality of physical acoustic parameters of the virtual room.
  • the data processing apparatus further comprises a display configured to display a graphical user interface configured to illustrate the perceptual acoustic distance measure value for the selected perceptual attribute between the target room and the virtual room.
  • the data processing apparatus is configured to estimate a respective perceptual acoustic distance measure value for the selected perceptual attribute between the target room and a plurality of virtual rooms and to determine a best-matching virtual room of the plurality of virtual rooms having the smallest perceptual acoustic distance measure value relative to the target room.
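The best-matching-room selection can be sketched as an argmin over per-room perceptual distances. The distance function below is a simple stand-in for the model-based estimate, and all names and values are hypothetical.

```python
# Hypothetical sketch: select the best-matching virtual room from a database
# by estimating a perceptual distance to each candidate and keeping the
# smallest one. The distance here is a stand-in for the model-based estimate.

COLORATION_WEIGHTS = {"T20": 0.4382, "T30": 1.0, "C80": 0.3958}

def perceptual_distance(target, candidate, weights=COLORATION_WEIGHTS):
    score = lambda p: sum(w * p[k] for k, w in weights.items())
    return abs(score(target) - score(candidate))

def best_matching_room(target, database):
    """database: mapping of room id -> acoustic parameter dict."""
    return min(database, key=lambda rid: perceptual_distance(target, database[rid]))

target = {"T20": 0.40, "T30": 0.45, "C80": 5.0}
database = {
    "small_office": {"T20": 0.38, "T30": 0.44, "C80": 5.2},
    "concert_hall": {"T20": 1.60, "T30": 1.90, "C80": 0.5},
}
best = best_matching_room(target, database)
```

Once the best-matching room is found, its impulse response or transfer function can be looked up for rendering, as described above.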
  • the data processing apparatus is further configured to obtain an impulse response function and/or a transfer function associated with the best-matching virtual room having the smallest perceptual acoustic distance measure value relative to the target room.
  • a computer-implemented data processing method for estimating the perceptual acoustics of a target room with respect to a perceptual attribute of the target room comprises the steps of: obtaining a plurality of physical acoustic parameters for the target room; obtaining the plurality of physical acoustic parameters, i.e. the same parameters as for the target room, for a virtual room, e.g. from a database; and estimating a perceptual acoustic distance measure value for the selected perceptual attribute between the target room and the virtual room based on the pluralities of physical acoustic parameters and on a perceptual acoustic quality model.
  • the perceptual acoustic quality model defines a mapping, in particular a correlation, a) between the plurality of physical acoustic parameters and a first value or measure of the perceptual attribute of the target room and b) between the plurality of physical acoustic parameters and a second value or measure of the perceptual attribute of the virtual room.
  • the data processing method according to the second aspect allows efficiently estimating the perceptual acoustics of a target room based on the physical acoustic parameters of the target room.
  • the method according to the second aspect can be performed by the data processing apparatus according to the first aspect.
  • further features of the method according to the second aspect result directly from the functionality of the data processing apparatus according to the first aspect as well as its different implementation forms and embodiments described above and below.
  • a computer program product comprising a computer- readable storage medium for storing program code which causes a computer or a processor to perform the method according to the second aspect, when the program code is executed by the computer or the processor.
  • Fig. 1 is a schematic diagram illustrating a data processing apparatus according to an embodiment for estimating the perceptual acoustics of a target room;
  • Figs. 2a and 2b are schematic diagrams illustrating data flows implemented by a data processing apparatus according to an embodiment;
  • Fig. 3a is a schematic diagram illustrating an acoustic similarity estimation module implemented by a data processing apparatus according to an embodiment;
  • Fig. 3b is a schematic diagram illustrating an acoustic parameter calculation module implemented by the data processing apparatus of figure 3a;
  • Fig. 4 is a graphical diagram illustrating weights of different linear combinations of a plurality of physical acoustic parameters implemented by a data processing apparatus according to an embodiment;
  • Fig. 5 shows a matrix illustrating numerical values of the weights of figure 4;
  • Figs. 6a and 6b are schematic diagrams illustrating a classification of rooms from a database of rooms based on different LD discriminant functions;
  • Fig. 7 shows processing blocks for generating a perceptual quality model implemented by a data processing apparatus according to an embodiment;
  • Fig. 8 shows an exemplary correlation analysis for determining a perceptual quality model implemented by a data processing apparatus according to an embodiment;
  • Fig. 9 shows a schematic diagram illustrating an implementation of a perceptual quality model with a visualization module of a data processing apparatus according to an embodiment;
  • Fig. 10 shows a graphical user interface of a display of a data processing apparatus according to an embodiment;
  • Fig. 11 shows a computer-implemented data processing method according to an embodiment for estimating the perceptual acoustics of a target room with respect to a perceptual attribute of the target room.
  • a disclosure in connection with a described method may also hold true for a corresponding device or system configured to perform the method and vice versa.
  • a corresponding device may include one or a plurality of units, e.g. functional units, to perform the described one or plurality of method steps (e.g. one unit performing the one or plurality of steps, or a plurality of units each performing one or more of the plurality of steps), even if such one or more units are not explicitly described or illustrated in the figures.
  • a specific apparatus is described based on one or a plurality of units, e.g.
  • FIG. 1 is a schematic diagram illustrating a data processing apparatus 100 according to an embodiment.
  • the data processing apparatus 100 may comprise a processor 101.
  • the processor 101 may be implemented in hardware and/or software and may comprise analog circuitry, digital circuitry, or both.
  • Digital circuitry may comprise components such as application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), digital signal processors (DSPs), or general-purpose processors.
  • the data processing apparatus 100 may further comprise a memory 103, e.g. a non-transitory memory or nonvolatile memory, configured to store executable program code which, when executed by the processor 101, causes the data processing apparatus 100 to perform the functions and methods described herein.
  • the data processing apparatus 100 may further comprise a display 105 for displaying results of the processing performed by the processor 101.
  • the display 105 may be a touchscreen.
  • a virtual room can be defined as an acoustic environment or scenery.
  • the virtual objects are provided with room acoustics, which can be assumed to be the acoustics of a room of the acoustic scenery and can be used in a Binaural Reproduction System, as further described below.
  • the acoustics of the scenery may need to be sufficiently similar to the room acoustics of the reproduction room to create a high spatial audio quality, represented, e.g. by perceptual attributes such as for example plausibility and externalization.
  • a target room can be defined as an acoustic environment or scenery.
  • the reproduction room acts as a target room for the virtual acoustics.
  • if the room acoustics of the target room are known, e.g. from room acoustics measurements, virtual audio objects can be provided directly with the acoustics of the target room.
  • otherwise, the acoustics of other rooms, e.g. virtual rooms from a database, can be used.
  • the room acoustic similarity can be realized by evaluating the similarity of single or multiple room acoustic parameters and can be represented as distance.
  • the distance may also be used for interlinking acoustic distance with quality ratings at divergent and congruent room scenarios from exemplary perceptual evaluations to estimate the effect on spatial audio quality.
  • the representation of the distance can also be useful for cases where an audio-AR scene is created and it is considered to check how well the virtual audio objects fit into the acoustics of the target room, for example if the creator is not in the target room itself when creating the audio-AR scene.
  • Physical acoustic parameters, also referred to as room acoustical parameters or, more generally, environment acoustical parameters, can be used for acoustic distance calculation and can be extracted from measured room impulse responses or binaural room impulse responses.
  • Such physical acoustic parameters can comprise the following:
  • EDC Energy Decay Curve
  • Reverberation Time, which refers to the time taken for the sound energy to decay by a given level (generally 60 dB).
  • Definition (D50), which refers to the ratio of the energy up to 50 ms after the direct sound to the total energy of the RIR.
  • Speech Transmission Index (STI), which refers to an objective measure for predicting the intelligibility of speech. It has a value range between 0 and 1.
  • Clarity Index (C50), which refers to the logarithmic ratio between the early sound energy (up to 50 ms) and the later sound energy.
  • Clarity Index (C80), which refers to the logarithmic ratio between the early sound energy (up to 80 ms) and the later sound energy.
  • DRR Direct-to-Reverberant Ratio
  • Ts Centre Time
  • IACC Inter-aural Cross-Correlation
  • Late lateral energy (LJ), which refers to the logarithmic ratio of the lateral sound energy arriving from 80 ms after the direct sound to the total sound energy measured at a distance of 10 m (free field).
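Several of the parameters above can be extracted from a measured room impulse response. The sketch below (not the patented method itself) uses Schroeder backward integration for the energy decay curve and derives a reverberation time, clarity index, and definition from it; the RIR is synthetic.

```python
import numpy as np

# Sketch: extracting standard room acoustic parameters from a room impulse
# response (RIR) via Schroeder backward integration. The exponential RIR
# below is synthetic, chosen to decay by roughly 60 dB over one second.

def schroeder_edc_db(rir):
    """Energy decay curve in dB from backward integration of the squared RIR."""
    edc = np.cumsum(rir[::-1] ** 2)[::-1]
    return 10.0 * np.log10(edc / edc[0])

def reverberation_time(rir, fs, decay_db=30):
    """T20/T30-style estimate: fit the -5 dB .. -(5+decay_db) dB portion of
    the EDC and extrapolate to 60 dB of decay."""
    edc = schroeder_edc_db(rir)
    t = np.arange(len(rir)) / fs
    mask = (edc <= -5) & (edc >= -(5 + decay_db))
    slope, _ = np.polyfit(t[mask], edc[mask], 1)  # dB per second (negative)
    return -60.0 / slope

def clarity(rir, fs, early_ms=80):
    """Clarity index (C50/C80): early-to-late energy ratio in dB."""
    n = int(fs * early_ms / 1000)
    early = np.sum(rir[:n] ** 2)
    late = np.sum(rir[n:] ** 2)
    return 10.0 * np.log10(early / late)

def definition_d50(rir, fs):
    """Definition D50: early (50 ms) energy as a fraction of total energy."""
    n = int(fs * 0.05)
    return np.sum(rir[:n] ** 2) / np.sum(rir ** 2)

fs = 16000
t = np.arange(fs) / fs
rir = np.exp(-6.9 * t)  # synthetic exponential decay, RT60 ~ 1 s
```

With this synthetic RIR, `reverberation_time(rir, fs)` comes out close to one second, as intended by construction.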
  • Perceptual attributes, also referred to as perceptual quality features, generally depend on the context and the task. Such perceptual attributes can comprise the following:
  • Envelopment, which refers to the listener's impression of being enveloped or surrounded by the reproduced sound field or audio signal. For example, “not surrounded at all”, “less surrounded”, “slightly less surrounded”, “medium surrounded”, “slightly more surrounded”, “more surrounded”, “completely surrounded” can be used for ratings in a subjective listening test.
  • Coloration, which refers to a timbral impression determined by the ratio of high- to low-frequency components. For example, “extremely muffled”, “muffled”, “slightly muffled”, “well balanced”, “slightly bright”, “bright”, “extremely bright” can be used for ratings in a subjective listening test.
  • Plausibility, which refers to a plausible auditory illusion or an impression of acoustic room congruence, i.e. how well the heard audio signal fits into the current listening environment, considering room acoustic characteristics. For example, “extremely bad fit”, “bad fit”, “poor fit”, “fair fit”, “good fit”, “excellent fit”, “ideal fit” can be used for ratings in a subjective listening test.
  • Externalization, which refers to the perception of the audio signal being placed outside the head of a receiver or listener within the surrounding environment, or being placed inside the head of the receiver or listener, including the ability to localize the direction of the incoming sound. For example, “inside the head, but diffuse”, “inside the head, localizable”, “very near the head, localizable”, “outside the head, localizable”, “outside the head, but diffuse” can be used for ratings in a subjective listening test.
  • the general idea of embodiments disclosed herein is to provide an efficient and intuitive data processing apparatus 100 for calculation and representation of acoustic and perceptual distances between two or more spatial audio signals using room acoustic parameters.
  • the spatial audio signals may comprise at least one virtual sound source embedded in a room acoustics, i.e. the target room, and at least one further room acoustics, i.e. a virtual room, to be compared against. Further spatial audio signals can be further room acoustics, i.e. further virtual rooms, against which a further comparison can be performed.
  • the perceptual distances (or similarities) can be related to the perceptual and intuitive attributes such as envelopment, externalization, plausibility, coloration and others.
  • the data processing apparatus 100 is configured for estimating the perceptual acoustics of a target room with respect to a perceptual attribute of the target room.
  • the data processing apparatus 100 is configured to obtain a plurality of physical acoustic parameters 111, 113 for the target room and to obtain the plurality of physical acoustic parameters 111, 113, i.e. the same parameters as for the target room, for a virtual room, e.g. from a database 201a (illustrated in figure 2a).
  • the data processing apparatus 100 is further configured to estimate a perceptual acoustic distance measure value 115 for the selected perceptual attribute between the target room and the virtual room based on the plurality of physical acoustic parameters 111 of the target room and on the plurality of physical acoustic parameters 113 of the virtual room and based on a perceptual acoustic quality model 150.
  • the perceptual acoustic quality model 150 may be stored in the memory 103 of the data processing apparatus 100.
  • the perceptual acoustic quality model 150 defines a mapping, in particular a correlation, a) between the plurality of physical acoustic parameters 111, 113 and a first value or measure of the perceptual attribute of the target room and b) between the plurality of physical acoustic parameters 111, 113 and a second value or measure of the perceptual attribute of the virtual room.
  • the perceptual attribute of the target room and the virtual room may comprise or be an envelopment attribute, a coloration attribute, a plausibility attribute, or an externalization attribute, in particular as described above.
  • the plurality of physical acoustic parameters 111, 113 of the target room and the virtual room may comprise: an energy decay curve, EDC, parameter; a reverberation time parameter; a definition parameter; a speech transmission index, STI, parameter; a clarity index parameter; a direct-to-reverberant ratio, DRR, parameter; a centre time parameter; an inter-aural cross-correlation, IACC, parameter and/or a late lateral energy parameter, in particular as described above.
  • Figures 2a and 2b are schematic diagrams illustrating data flows in the data processing apparatus 100 according to an embodiment. More specifically, figure 2b shows a first spatial audio signal 220 or binaural audio signal 220 characterized by room acoustics 221 of the virtual room and a second spatial audio signal 230 or binaural audio signal 230 characterized by room acoustics 231 of the target room.
  • the room acoustics 221, 231 may comprise a direct sound, early acoustic room reflections, and late acoustic room reflections.
  • the data processing apparatus 100 may be configured to assess perceptual/auditory distances (or similarities) between two or more spatial audio signals 220, 230 using the physical acoustic parameters 111, 113 of the rooms.
  • the data processing apparatus 100 may perform a) a comparison of room acoustic parameters using statistical data analysis and calculation of acoustic distance between different room acoustics, and b) an estimation of perceptual distance using a perceptual quality model which describes the effect of acoustic room divergence on perceived spatial audio quality.
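The two steps a) and b) above can be sketched as a minimal pipeline: first a purely acoustic distance from the parameter vectors, then a perceptual distance obtained by passing that value through a quality model. The linear model coefficients here are invented; in practice they would come from the correlation analysis with subjective listening-test ratings.

```python
# Illustrative two-step flow: (a) acoustic distance between parameter
# vectors, (b) perceptual distance via a stand-in (hypothetical, linear)
# quality model. All names, coefficients, and values are assumptions.

def acoustic_distance(params_a, params_b):
    """Euclidean distance between two rooms' acoustic parameter vectors."""
    keys = sorted(params_a)
    return sum((params_a[k] - params_b[k]) ** 2 for k in keys) ** 0.5

def perceptual_distance(acoustic_d, slope=1.8, intercept=0.2):
    """Stand-in perceptual quality model (hypothetical linear mapping)."""
    return slope * acoustic_d + intercept

target = {"T30": 0.6, "C50": 2.0, "C80": 4.0}
virtual = {"T30": 1.4, "C50": -1.0, "C80": 1.0}
pd = perceptual_distance(acoustic_distance(target, virtual))
```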
  • the room acoustics 221, 231 may be directionally weighted representations (binaural and/or monaural) and/or omnidirectional representations.
  • the acoustic representations may be transfer functions 203a-c in the form of BRIRs, SRIRs, and/or RIRs, recordings 201b or spatial audio signals which have been created by simulations 201c.
  • the recordings 201b may comprise spatial audio signals recorded by microphones and/or a piece of audio signal recorded in a room (or environment/scenery), for instance, with a mono microphone or a pair of binaural microphones.
  • the room acoustics 221, 231 may be calculated by the data processing apparatus 100 from the acoustic representations and/or they may originate from other sources, for example from the database 201a.
  • the room acoustic parameters, i.e. the plurality of physical acoustic parameters 111, 113, may be calculated by the data processing apparatus 100 from the transfer functions 203a-c or from the microphone recordings 201b.
  • the room acoustic parameters, i.e. the plurality of physical acoustic parameters 111, 113, may be previously documented parameters for the acoustic description of rooms (e.g. T60, DRR, C50, C80, IACC) and/or new or adapted parameters for the description of audio-AR scenes.
  • the room acoustic parameters, i.e. the plurality of physical acoustic parameters 111, 113, may come directly from measurements or recordings 201b, come from the database 201a where room acoustic parameters for different rooms are stored, come from simulations 201c of room acoustics, be set as values elsewhere (e.g. by manual specification of T60, DRR, etc.) or come from other sources such as a remote server.
  • the data processing apparatus 100 may comprise a module 240 for determining the similarity of, and comparing, the physical acoustic parameters 111, 113 of the rooms, which may be realized by statistical data analysis, and for calculating an acoustic distance between different room acoustics.
  • Data analysis may be performed by statistical ratios and considering single parameters or combinations of the physical acoustic parameters 111, 113 of the rooms. Additionally or alternatively, data analysis may be performed by applying methods of multivariate statistics, e.g. principal component analysis, cluster analysis, multidimensional scaling and others.
  • the similarity between the groups/clusters may be calculated, e.g. cosine similarity, geometric distance, etc.
  • the similarity may be a measure of multivariate acoustic distance.
  • the groups/clusters may be called spatial classes.
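A multivariate acoustic distance between two such spatial classes can be sketched as a comparison of their centroid parameter vectors, using either a geometric (Euclidean) distance or a cosine similarity as mentioned above. The parameter vectors below are illustrative only.

```python
import math

# Sketch: distance between two "spatial classes" (clusters of rooms),
# represented by centroid parameter vectors. Both the Euclidean distance and
# the cosine similarity mentioned in the text are shown; the values are
# invented for illustration.

def centroid(vectors):
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Each row: e.g. [T30 in s, C80 in dB, DRR in dB] for one room of the class.
class_a = [[0.4, 3.2, 1.1], [0.5, 3.0, 1.2]]
class_b = [[1.1, 0.8, -2.0], [1.3, 0.6, -2.4]]
class_distance = euclidean(centroid(class_a), centroid(class_b))
```

In practice the parameters would typically be standardized before computing such distances, so that parameters with large numeric ranges do not dominate.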
  • the perceptual acoustic quality model 150 may be designed based on the correlation matrices between the plurality of physical acoustic parameters 111, 113 and the perceptual attributes. To determine the correlation, subjective evaluation/listening tests may be designed and organized for assessing the overall subjective audio quality as well as some describing attributes.
  • binaural audio signals auralized with spatial room acoustics of different rooms, i.e. the virtual room illustrated in figure 2b, may be used as stimuli for the listening tests.
  • the listening tests may be conducted for different Quality Features, i.e. the perceptual attributes such as plausibility, coloration, envelopment, externalization, etc., separately.
  • An evaluation room, i.e. the room where the listening tests are conducted, may be used and may be the target room.
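The correlation step for building such a quality model can be sketched as a Pearson correlation between an acoustic parameter (or discriminant score) across rooms and the mean subjective rating from a listening test. The parameter values and ratings below are invented purely for illustration.

```python
# Sketch: Pearson correlation between one acoustic parameter across rooms
# and mean listening-test ratings, as one entry of a correlation matrix.
# All data below is invented for illustration.

def pearson(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    std_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (std_x * std_y)

t30_per_room = [0.3, 0.5, 0.8, 1.2, 1.6]        # seconds, one value per room
mean_plausibility = [6.1, 5.4, 4.2, 3.0, 2.2]   # mean ratings, 7-point scale
r = pearson(t30_per_room, mean_plausibility)    # strongly negative here
```

Repeating this for every parameter (or discriminant) and every perceptual attribute yields the correlation matrices the model is built from.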
  • the data processing apparatus 100 may further comprise a module 250 configured for estimation of the perceptual distance 115, which may be realized by the perceptual acoustic quality model 150 describing the effect of acoustic room divergence on perceived spatial audio quality.
  • a room database, which may be the same database as the database 201a or a different database, may be used from which combinations of congruent and divergent audio scenes may be selected for perceptual evaluation.
  • the evaluation may measure the perceived quality for the features, i.e. perceptual attributes: Plausibility, Externalization, Coloration, and others.
  • the data processing apparatus 100 may interlink the rated quality and the acoustic similarity to estimate the correlation between acoustics and auditory perception.
  • the data processing apparatus 100 can be suited for auditory AR use cases as well as for position-dynamic binaural synthesis.
  • In figure 2a, an exemplary use case as a binaural auralization system used in spatial audio rendering for a headphone 209a and loudspeakers 209b is illustrated. This may involve using the transfer functions 203a-c for processing spatial room acoustics, in particular RIR, BRIR and SRIR as described above, and a subsequent spatial audio rendering 205. Resulting spatial audio signals 220, 230 may then be processed by a binaural playback 207 in order to achieve binaural audio signals for the headphone 209a or the loudspeakers 209b.
  • a virtual room based on certain spatial room acoustics may be used to enhance overall spatial experience.
  • the playback device may be the headphone 209a or the loudspeakers 209b.
  • For the loudspeakers 209b, additional processing such as crosstalk cancellation processing may be required.
  • For AR applications, it may be important to use a virtual room that is perceptually “similar” to the real listening room (target room) to result in a plausible illusion.
  • the data processing apparatus 100 may be used for AR applications where it is important to have perceptually congruent virtual-target acoustics.
  • Fixed rooms are standard and currently being implemented in the spatial audio/3D audio features of more and more products.
  • perceptually divergent, i.e. the opposite of congruent, virtual-target acoustics will destroy the plausibility and also the overall user experience.
  • a best matching virtual room can be selected from a pre-collected database based on the estimated perceptual virtual-target distance in order to give the best coherent virtual-target acoustics. If the physical acoustic parameters 111, 113 of the target room are unknown, a piece of audio signal (speech, music or noise etc.) can be recorded with a mono or a pair of binaural microphones and the physical acoustic parameters 111, 113 are calculated from the recorded signals.
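The database selection described above can be sketched as follows; all room names, parameter values and the toy distance function are illustrative assumptions, not part of the disclosure:

```python
# Hypothetical sketch: pick the virtual room from a pre-collected database
# whose estimated perceptual distance to the target room is smallest.
def select_best_virtual_room(target_params, room_database, perceptual_distance):
    """Return the (room_id, distance) pair with the minimal perceptual distance."""
    best_id, best_dist = None, float("inf")
    for room_id, virtual_params in room_database.items():
        d = perceptual_distance(target_params, virtual_params)
        if d < best_dist:
            best_id, best_dist = room_id, d
    return best_id, best_dist

# Toy example: distance reduced to the absolute difference of T30 only.
db = {"studio": {"T30": 0.3}, "hall": {"T30": 1.8}, "office": {"T30": 0.5}}
target = {"T30": 0.45}
room, dist = select_best_virtual_room(
    target, db, lambda a, b: abs(a["T30"] - b["T30"]))
```

In a full implementation, the lambda would be replaced by the perceptual acoustic quality model 150 evaluated over the full parameter set.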
  • the data processing apparatus 100 can also be used for content creation and spatial audio mixing by visualizing acoustic distance and perceptual distance of different virtual rooms.
  • Figure 3a is a schematic diagram illustrating an acoustic similarity estimation module implemented by the data processing apparatus 100 according to an embodiment.
  • a first step may be to compare the room acoustic parameters using statistical data analysis and calculate acoustic distance between different room acoustics.
  • a second step may involve an estimation of perceptual distance using the perceptual acoustic quality model 150 which describes the effect of acoustic room divergence on perceived spatial audio quality.
  • the acoustic similarity estimation module shown in figure 3a may be configured to estimate acoustic similarity using single physical acoustic parameters 111, 113 and/or multi-variate statistics.
  • the room acoustic database 201a, which may comprise the measurements of multiple rooms and the analysis of relevant physical acoustic parameters 111, 113 of the rooms, may be coupled to the acoustic similarity estimation module.
  • if the transfer functions 203a-c, in particular RIR, BRIR or SRIR, are measured or simulated for one or multiple DoA and DoV in certain rooms, one or more of the room acoustic parameters, i.e. the plurality of physical acoustic parameters 111, 113, as described above may be extracted by an acoustic parameter calculation 301.
  • Based on the transfer functions 203a-c and/or an audio recording 201b as an input 301a, an acoustic parameter calculation 301, which is illustrated in figure 3b in more detail, is performed.
  • in the parameter calculation 301, the physical acoustic parameters 111, 113 for broadband signals and/or for several frequency bands based on a broadband or frequency-band decomposition 301b may be calculated.
  • the corresponding output 301c may be the room acoustic parameters, i.e. the plurality of physical acoustic parameters 111, 113.
  • by multivariate statistics 307 such as PCA and/or by applying LDA 303, the physical acoustic parameters 111, 113 of the rooms can be described by a few linear combinations (principal components in PCA or discriminant functions/components in LDA), i.e. by providing a parameter difference 305 or a similarity index 309.
  • different rooms can be classified on the basis of the physical acoustic parameters 111, 113 of the rooms by describing them by a few meaningful combinations of them.
  • this may involve supplying at least some of the plurality of physical acoustic parameters 111, 113 from the room acoustic database 201a.
  • observations by means of LDA are considered by example, where the room separation is illustrated in a more comprehensible manner in the LDA than in the PCA.
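For illustration only, the multivariate reduction described above can be sketched with a plain-NumPy PCA; the room data and parameter values below are invented and are not taken from the measurements in this disclosure:

```python
import numpy as np

# Illustrative PCA sketch: describe each room's acoustic parameters
# (e.g. EDT, T20, T30, C80) by a few principal components.
def pca(X, n_components):
    """Project rows of X onto the top principal components."""
    Xc = X - X.mean(axis=0)                    # centre the data
    cov = np.cov(Xc, rowvar=False)             # parameter covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)     # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]          # re-sort descending
    components = eigvecs[:, order[:n_components]]
    explained = eigvals[order] / eigvals.sum() # explained-variance ratio
    return Xc @ components, explained

# Four rooms described by four (made-up) acoustic parameters.
rooms = np.array([[0.4, 0.35, 0.38, 5.0],
                  [1.9, 1.80, 1.85, -2.0],
                  [0.6, 0.55, 0.58, 3.5],
                  [1.2, 1.10, 1.15, 0.5]])
scores, explained = pca(rooms, n_components=2)
```

An LDA would proceed similarly but uses class labels (room identities) to maximize between-room separation rather than total variance.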
  • Figure 4 is a graphical diagram illustrating the weights of specific parameters of the plurality of physical acoustic parameters 111, 113 in regard to linear combinations of the LDA.
  • a first linear combination to an eighth linear combination is illustrated (also referred to as LD1 to LD8 in the following).
  • the specific parameters of the plurality of physical acoustic parameters 111, 113 are illustrated, namely from top to bottom, EDT, T20, T30, D50, C50, C80, DRR and Ts.
  • the matrix W may comprise the following weights:
  • the perceptual acoustic quality model 150 may define, i.e. comprise, for each perceptual attribute a correlation matrix, such as the matrix W for example, between the pluralities of physical acoustic parameters 111, 113 of the target room and the virtual room and the first and second values or measures of the perceptual attribute of the target room and the virtual room for defining the mapping, in particular correlation, a) between the plurality of physical acoustic parameters 111, 113 and the first value or measure of the perceptual attribute of the target room and b) between the plurality of physical acoustic parameters 111, 113 and the second value or measure of the perceptual attribute of the virtual room.
  • Figures 6a and 6b are schematic diagrams illustrating a classification of the rooms from the room database 201a using different LD discriminant functions/components.
  • Figure 6a shows the first and second linear combination, i.e. LD1 and LD2 and figure 6b shows the first to the third linear combination, i.e. LD1 to LD3.
  • the data processing apparatus 100 may determine how the measured rooms differentiate and could be classified, and which combinations of the plurality of physical acoustic parameters 111, 113 can explain these differentiations.
  • distance measures may be further used. Possible distance measures comprise (i) calculating a mean value of each point cloud, (ii) calculating a distance between the reference room and all rooms, and (iii) calculating a cosine similarity between the reference room and all other rooms.
  • the data processing apparatus 100 may determine which of the LD1 to LD8 are sufficient to describe most of the data.
  • the observations of the first three LD functions/components, i.e. LD1 to LD3, are considered to describe most of the data set. This is illustrated in figures 6a and 6c, where LD1 to LD3 describe up to 99% of the data combined.
  • the distance and the cosine similarity may be calculated as illustrated in equation 1 and equation 2 below.
  • Equation 1 may be as follows:
  • Equation 2 may be as follows:
  • LDn describes the mean value of the observations of the n-th discriminant function for the target/reference room Rref and the virtual room Ri used in binaural rendering.
  • Rref may be the room in which the evaluation/listening test was performed, and Ri may be the virtual room used to generate the spatial audio signals with room acoustic parameters for listening.
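The bodies of equations 1 and 2 are not reproduced in this text. As an illustration only, a plausible reading consistent with the surrounding description (mean LD values of Rref and Ri) is a Euclidean distance and a cosine similarity over the first LD components:

```python
import math

# Hypothetical readings of equations 1 and 2 over mean LD vectors.
def ld_distance(ld_ref, ld_i):
    """Euclidean distance between mean LD vectors (assumed form of equation 1)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(ld_ref, ld_i)))

def ld_cosine_similarity(ld_ref, ld_i):
    """Cosine similarity between mean LD vectors (assumed form of equation 2)."""
    dot = sum(a * b for a, b in zip(ld_ref, ld_i))
    norm = math.sqrt(sum(a * a for a in ld_ref)) * math.sqrt(sum(b * b for b in ld_i))
    return dot / norm

ref = [2.0, -1.0, 0.5]    # mean LD1-LD3 of the reference room (made up)
room = [1.8, -0.9, 0.7]   # mean LD1-LD3 of a virtual room (made up)
d = ld_distance(ref, room)
s = ld_cosine_similarity(ref, room)
```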
  • the target of the perceptual acoustic quality model 150 may be to make statements about the perceptual attributes such as envelopment, plausibility, coloration and externalization by only looking at the plurality of physical acoustic parameters 111, 113.
  • the perceptual acoustic quality model 150 may be designed based on the correlation matrices between the plurality of physical acoustic parameters 111, 113 and the perceptual attributes.
  • Figure 7 shows components of the perceptual acoustic quality model 150 implemented by the data processing apparatus 100 according to an embodiment.
  • the task is to interlink the acoustic similarity index with several quality features (regarding spatial audio quality), i.e. the perceptual attributes.
  • subjective evaluation/listening tests may be designed, as already described above, and organized to create a perceptual evaluation database 701, which may be stored in the memory 103 of the data processing apparatus 100.
  • the data processing apparatus 100 may be further configured to implement a module 703 for supplying the quality features, i.e. the perceptual attributes, and a perceived quality analysis module 705 which may process the quality features, i.e. the perceptual attributes, based on the data from the perceptual evaluation database 701.
  • the data processing apparatus may perform single parameter and multivariate analysis 303, 307, as described above.
  • the corresponding results of the perceived quality analysis module 705 and the single parameter and multivariate analysis 303, 307, which results may comprise the plurality of physical acoustic parameters 111, 113, the LD components and the determined distances, may be forwarded to an interlinking and correlation module 707.
  • the perceptual acoustic quality model 150 can be aimed at assessing the overall subjective audio quality as well as some describing attributes. This can be achieved by a combined analysis being performed by the interlinking and correlation module 707 to find the correlations between the perceptual evaluation results and the physical/acoustic quantities, comprising the physical room acoustic parameters 111, 113, the LD components and the similarities/distances of the single parameter and multivariate analysis 303, 307.
  • a parameter or distance may have a significant effect with regard to the asked perception in the perceptual evaluation test, if it correlates in all evaluation conditions (evaluated rooms) in a similar, significant way. Coloration and envelopment may be perceived similarly for the different BRIRs in all evaluation rooms. Plausibility perception of the different BRIRs may differ strongly between the evaluation rooms, so the evaluation depends on whether the render room/target room is congruent or divergent, i.e. similar or dissimilar. Therefore, it is difficult to design the perceptual acoustic quality model 150 based on fixed acoustic parameters to estimate different quality features.
  • the perceptual acoustic quality model 150 may be designed separately for each relevant quality feature, i.e. perceptual attribute.
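The interlinking of an acoustic measure with subjective ratings can be sketched as a simple correlation check; the similarity values and listener ratings below are invented for illustration:

```python
import numpy as np

# Sketch: how strongly does one acoustic measure (e.g. cosine similarity
# over LD components) correlate with subjective ratings of one quality
# feature? A high Pearson correlation suggests the measure is a useful
# predictor for that perceptual attribute.
similarity = np.array([0.99, 0.80, 0.55, 0.30, 0.10])   # acoustic measure
plausibility = np.array([4.8, 4.1, 3.0, 2.1, 1.2])      # mean listener rating

r = np.corrcoef(similarity, plausibility)[0, 1]         # Pearson correlation
```

In the apparatus, such a check would be repeated per quality feature and per candidate measure to select the entries of the correlation matrices.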
  • Figure 8 shows an exemplary correlation analysis for determining the perceptual acoustic quality model 150 implemented by the data processing apparatus 100 according to an embodiment.
  • the data processing apparatus may determine the perceptual acoustic quality model 150 by comparing a plurality of combinations for each of the perceptual attributes.
  • each first row of each of the diagrams 801-803, 811-813, 821-823 may indicate a same first room, each second row a same second room and each third row a same third room.
  • Each column of the diagrams 801, 811, 821 may indicate specific parameters of the plurality of physical acoustic parameters 111, 113, namely from left to right the parameters EDT, T20, T30, T60, D50, C50, C80, DRR and Ts.
  • Each column of the diagrams 802, 812, 822 may indicate from left to right the linear combinations LD1 to LD8 based on the LDA.
  • Each column of the diagrams 803, 813, 823 may indicate from left to right (i) a distance of LD1, LD2 and LD3, (ii) a cosine similarity over LD1 and LD2, for example according to the equation 2, and (iii) a cosine similarity over LD1 and LD3, for example according to the equation 2.
  • the data processing apparatus 100 may determine the parameters used in the perceptual acoustic quality model 150, in particular those parameters of the plurality of physical acoustic parameters 111, 113 showing a high correlation between acoustic evaluation and perceptual evaluation.
  • the data processing apparatus 100 may select, as the parameters used in the perceptual acoustic quality model 150, the cosine similarity over LD1 and LD3 for the perceptual attribute “plausibility”, LD3 for the perceptual attribute “envelopment” and LD1 for the perceptual attribute “coloration”.
  • the room perceptual similarity may then be estimated based on the pre-designed perceptual acoustic quality model 150 comprising the parameters as chosen above.
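The per-attribute choices above (cosine similarity over LD1 and LD3 for plausibility, LD3 for envelopment, LD1 for coloration) can be sketched as follows; how each measure is turned into a distance-like value here is a hypothetical illustration, not the actual model 150:

```python
import math

def cosine(u, v):
    """Cosine similarity of two 2-D vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

def perceptual_distances(ld_target, ld_virtual):
    """Return one distance-like value per perceptual attribute (toy sketch)."""
    return {
        # plausibility: dissimilarity of the (LD1, LD3) pair
        "plausibility": 1.0 - cosine((ld_target[0], ld_target[2]),
                                     (ld_virtual[0], ld_virtual[2])),
        # envelopment: absolute LD3 difference
        "envelopment": abs(ld_target[2] - ld_virtual[2]),
        # coloration: absolute LD1 difference
        "coloration": abs(ld_target[0] - ld_virtual[0]),
    }

# Identical LD vectors should give (near) zero distance for every attribute.
d = perceptual_distances([2.0, -1.0, 0.5], [2.0, -1.0, 0.5])
```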
  • the data processing apparatus 100 can be configured according to one or more of the following modes based on the matrix W and the comparison performed as described above:
  • the perceptual acoustic quality model 150 may define, i.e. comprise, for each perceptual attribute a correlation matrix between weighted linear combinations, i.e. the LDA discriminant functions, i.e. LDs determined by means of the linear discriminant analysis, of the plurality of physical acoustic parameters 111 of the target room and the first value or measure of the perceptual attribute of the target room and between weighted linear combinations of the plurality of physical acoustic parameters 113 of the virtual room and the second value or measure of the perceptual attribute of the virtual room.
  • the perceptual attribute of the target room and the virtual room may be a coloration attribute.
  • the weighted linear combination of the plurality of physical acoustic parameters 111, 113 of the target room and the virtual room may comprise a reverberation time parameter for an energy decay by 30 dB, i.e. T30, having the largest weight.
  • the weighted linear combination of the plurality of physical acoustic parameters 111, 113 of the target room and the virtual room may comprise a reverberation time parameter for an energy decay by 20 dB, i.e. T20, with a weight having an absolute value in the range from about 0.33 to about 0.53, in particular 0.4382, the reverberation time parameter for an energy decay by 30 dB, i.e. T30, with a weight having an absolute value in the range from about 0.9 to about 1.0, in particular 1.0, and a clarity index parameter for 80 ms, i.e. C80, with a weight having an absolute value in the range from about 0.29 to about 0.49, in particular 0.3958.
  • the perceptual attribute of the target room and the virtual room may be an envelopment attribute.
  • the weighted linear combination of the plurality of physical acoustic parameters 111, 113 of the target room and the virtual room may comprise a clarity index parameter for 50 ms, i.e. C50, having the largest weight.
  • the weighted linear combination of the plurality of physical acoustic parameters 111, 113 of the target room and the virtual room may comprise a reverberation time parameter for an energy decay by 20 dB, i.e. T20, with a weight having an absolute value in the range from about 0.9 to about 1.00, in particular 0.9533, a reverberation time parameter for an energy decay by 30 dB, i.e. T30, with a weight having an absolute value in the range from about 0.85 to about 0.95, in particular 0.8834, the clarity index parameter for 50 ms, i.e. C50, having an absolute value in the range from about 0.9 to about 1.0, in particular 1.0, and a clarity index parameter for 80 ms, i.e. C80, with a weight having an absolute value in the range from about 0.9 to about 1.00, in particular 0.9228.
  • the perceptual acoustic quality model 150 may define, i.e. comprise, for each perceptual attribute a correlation matrix between a cosine similarity of a first and a second weighted linear combination of the plurality of physical acoustic parameters 111 of the target room and the first value or measure of the perceptual attribute of the target room and between a cosine similarity of the first and the second weighted linear combination of the plurality of physical acoustic parameters 113 of the virtual room and the second value or measure of the perceptual attribute of the virtual room.
  • the perceptual attribute of the target room and the virtual room may be a plausibility attribute.
  • the first weighted linear combination, i.e. LD1, of the plurality of physical acoustic parameters 111, 113 of the target room and the virtual room may comprise a reverberation time parameter for an energy decay by 30 dB, i.e. T30, having the largest weight of the first weighted linear combination.
  • the second weighted linear combination, i.e. LD3, of the plurality of physical acoustic parameters 111, 113 of the target room and the virtual room may comprise a clarity index parameter for 50 ms, i.e. C50, having the largest weight of the second weighted linear combination.
  • the first weighted linear combination, i.e. LD1, of the plurality of physical acoustic parameters 111, 113 of the target room and the virtual room may comprise a reverberation time parameter for an energy decay by 20 dB, i.e. T20, with a weight having an absolute value in the range from about 0.33 to about 0.53, in particular 0.4382, the reverberation time parameter for an energy decay by 30 dB, i.e. T30, with a weight having an absolute value in the range from about 0.9 to about 1.0, in particular 1.0, and a clarity index parameter for 80 ms, i.e. C80, with a weight having an absolute value in the range from about 0.29 to about 0.49, in particular 0.3958.
  • the second weighted linear combination, i.e. LD3, of the plurality of physical acoustic parameters 111, 113 of the target room and the virtual room may comprise a reverberation time parameter for an energy decay by 20 dB, i.e. T20, with a weight having an absolute value in the range from about 0.9 to about 1.00, in particular 0.9533, a reverberation time parameter for an energy decay by 30 dB, i.e. T30, with a weight having an absolute value in the range from about 0.85 to about 0.95, in particular 0.8834, the clarity index parameter for 50 ms, i.e. C50, having an absolute value in the range from about 0.9 to about 1.0, in particular 1.0, and a clarity index parameter for 80 ms, i.e. C80, with a weight having an absolute value in the range from about 0.9 to about 1.00, in particular 0.9228.
  • Figure 9 shows a schematic diagram illustrating an implementation of the perceptual acoustic quality model 150 with a visualization module 901 of the data processing apparatus 100 according to an embodiment.
  • the data processing apparatus 100 may be configured to implement the module 240 for determining similarity and comparison of the physical acoustic parameters 111 , 113 of the rooms.
  • the visualization module 901 may receive the acoustic distance values as described above from the module 240 and the perceptual distance 115 from the perceptual acoustic quality model 150.
  • the visualization module 901 may allow the data processing apparatus 100 to present or visualize acoustic and perceptual distances (or similarities) 115 between two or more spatial audio signals using the physical acoustic parameters 111, 113 of the rooms by the display 105. This can be useful for content creation or spatial audio mixing.
  • Figure 10 shows a graphical user interface 1000 of the display 105 of the data processing apparatus 100 according to an embodiment.
  • the graphical user interface 1000 may be configured to illustrate the perceptual acoustic distance measure value 115 for the selected perceptual attribute between the target room and the virtual room.
  • the perceptual acoustic distance measure value 115 may be illustrated in the form of an arrow in relation to a plurality of further distances 1001a-c to predefined distances for predefined room classes, which may be stored in the memory 103 of the data processing apparatus 100.
  • the plurality of further distances 1001a-c may be graphically illustrated differently from the perceptual acoustic distance measure value 115, for example by a circle.
  • the perceptual acoustic distance measure value 115 and the plurality of further distances 1001a-c may be illustrated on different classification quadrants of a visualization diagram 1003 of the graphical user interface 1000.
  • the graphical user interface 1000 may further comprise a display section 1005 for illustrating one or more of the perceptual attributes, i.e. for the estimated perceived spatial audio quality.
  • the one or more of the perceptual attributes may be illustrated in forms of bars, which represent the ratings of the perceptual attributes described above.
  • the graphical user interface 1000 may further comprise a menu 1007 for configuring visualization settings of the rooms, in particular the target room and the virtual room, the room classes, selection of the physical acoustic parameters 111, 113 and/or selection of the perceptual attributes.
  • in the menu 1007, different rooms may be selectable for the target room and different rooms for the virtual room which can be used in the rendering system.
  • a user of the display 105 may first select the target room which could be a good mixing studio or the real listening room, then click on different virtual rooms. When doing that, the acoustic distance between two rooms is shown based on two components of the single parameter and multivariate analysis, for example two LDA components. Consequently, the estimated ratings for different quality features may be shown on the right side of the graphical user interface 1000 in the display section 1005.
  • Figure 11 is a flow diagram illustrating a computer-implemented data processing method 1100 according to an embodiment for estimating the perceptual acoustics of a target room with respect to a perceptual attribute of the target room.
  • the data processing method 1100 comprises a step 1101 of obtaining a plurality of physical acoustic parameters 111, 113 for the target room.
  • the data processing method 1100 further comprises a step 1103 of obtaining the plurality of physical acoustic parameters 111, 113, i.e. the same parameters as for the target room, for a virtual room, e.g. from database 201a.
  • the data processing method 1100 further comprises a step 1105 of estimating a perceptual acoustic distance measure value 115 for the selected perceptual attribute between the target room and the virtual room based on the plurality of physical acoustic parameters 111 of the target room and on the plurality of physical acoustic parameters 113 of the virtual room and based on the perceptual acoustic quality model 150, wherein the perceptual acoustic quality model 150 defines a mapping, in particular correlation, a) between the plurality of physical acoustic parameters 111, 113 and a first value or measure of the perceptual attribute of the target room and b) between the plurality of physical acoustic parameters 111, 113 and a second value or measure of the perceptual attribute of the virtual room.
  • the data processing method 1100 can be performed by the data processing apparatus 100 according to an embodiment.
  • further features of the data processing method 1100 result directly from the functionality of the data processing apparatus 100 as well as its different embodiments described above and below.
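For illustration, the steps 1101, 1103 and 1105 of the method can be sketched end-to-end with a toy stand-in for the quality model; all parameter names, values and weights below are invented:

```python
import math

# Toy stand-in for the perceptual acoustic quality model 150:
# a weighted Euclidean distance over a small parameter set.
def estimate_perceptual_distance(target_params, virtual_params, weights):
    """Steps 1101-1105 in miniature: compare parameter sets, return a distance."""
    return math.sqrt(sum(
        w * (target_params[k] - virtual_params[k]) ** 2
        for k, w in weights.items()))

target = {"T30": 0.45, "C80": 3.0}     # step 1101: parameters of the target room
virtual = {"T30": 0.50, "C80": 2.0}    # step 1103: parameters from a database
weights = {"T30": 1.0, "C80": 0.25}    # hypothetical model weights
dist = estimate_perceptual_distance(target, virtual, weights)
```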
  • Visualizing acoustic distance and perceptual distances (or similarities) between two or more spatial audio signals using the plurality of physical acoustic parameters 111, 113 can generate knowledge by which it may be achieved to adapt the plurality of physical acoustic parameters 111, 113 and parts of the BRIRs to achieve an externalized and very plausible perceptual impression.
  • the data processing apparatus 100 and the data processing method 1100 can use perceptually meaningful parameters, i.e. the perceptual acoustic distance measure value 115, instead of physical and/or acoustic parameters which can improve the user experience by providing perceptually more similar results.
  • the disclosed system, apparatus, and method may be implemented in other manners.
  • the described embodiment of an apparatus is merely exemplary.
  • the unit division is merely logical function division and may be another division in an actual implementation.
  • a plurality of units or components may be combined or integrated into another system, or some features may be ignored or not performed.
  • the displayed or discussed mutual couplings or direct couplings or communication connections may be implemented by using some interfaces.
  • the indirect couplings or communication connections between the apparatuses or units may be implemented in electronic, mechanical, or other forms.
  • the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one position, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
  • functional units in the embodiments of the invention may be integrated into one processing unit, or each of the units may exist alone physically, or two or more units are integrated into one unit.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Signal Processing (AREA)
  • Stereophonic System (AREA)

Abstract

A data processing apparatus (100) for estimating the perceptual acoustics of a target room with respect to a perceptual attribute of the target room. The data processing apparatus (100) is configured to obtain a plurality of physical acoustic parameters (111) for the target room and obtain the plurality of physical acoustic parameters (113) for a virtual room. The data processing apparatus (100) is further configured to estimate a perceptual acoustic distance (115) between the target room and the virtual room based on the plurality of physical acoustic parameters (111) of the target room and on the plurality of physical acoustic parameters (113) of the virtual room and based on a perceptual acoustic quality model (150). The perceptual acoustic quality model (150) defines a mapping between the plurality of physical acoustic parameters (111, 113) and a value of the perceptual attribute of the target room and between the plurality of physical acoustic parameters (111, 113) and a value of the perceptual attribute of the virtual room.

Description

Apparatus and method for estimating the perceptual acoustics of a target room
TECHNICAL FIELD
The present disclosure relates to audio processing in general. More specifically, the disclosure relates to an apparatus and method for estimating the perceptual acoustics of a target room.
BACKGROUND
3D sound can be defined as a sound arriving at a location, for example the ears of a listener, from varying directions and varying distances, which can contribute for example to a three-dimensional aural image humans hear. 3D audio rendering can comprise creating a sound world by attaching a characteristic sound to virtual objects in a virtual room (also called environment or scenery) to synthesize as a 3D sound. For 3D audio rendering, for instance, via headphones, large acoustic and also perceptual differences between a virtual, e.g. synthesized or rendered room and the real listening environment defined as a target room may degrade the spatial plausibility and immersion and, thus, lead to the so-called “Room Divergence Effect”.
SUMMARY
It is an objective to provide an improved apparatus and method for estimating the perceptual acoustics of a target room.
The foregoing and other objectives are achieved by the subject matter of the independent claims. Further implementation forms are apparent from the dependent claims, the description and the figures.
According to a first aspect a data processing apparatus for estimating the perceptual acoustics of a target room with respect to a perceptual attribute of the target room is provided. The data processing apparatus is configured to obtain a plurality of physical acoustic parameters for the target room and obtain the plurality of physical acoustic parameters, i.e. the same parameters as for the target room, for a virtual room, e.g. from a database. The data processing apparatus is further configured to estimate a perceptual acoustic distance measure value for the selected perceptual attribute between the target room and the virtual room based on the plurality of physical acoustic parameters of the target room and on the plurality of physical acoustic parameters of the virtual room and based on a perceptual acoustic quality model. The perceptual acoustic quality model defines a mapping, in particular a correlation, a) between the plurality of physical acoustic parameters and a first value or measure of the perceptual attribute of the target room and b) between the plurality of physical acoustic parameters and a second value or measure of the perceptual attribute of the virtual room. Thus, the data processing apparatus according to the first aspect allows efficiently estimating the perceptual acoustics of a target room based on the physical acoustic parameters of the target room.
In a further possible implementation form of the first aspect, the perceptual attribute of the target room and the virtual room comprises or is an envelopment attribute, a coloration attribute, a plausibility attribute, or an externalization attribute.
In a further possible implementation form of the first aspect, the data processing apparatus is configured to measure the plurality of physical acoustic parameters for the target room for obtaining the plurality of physical acoustic parameters for the target room.
In a further possible implementation form of the first aspect, the data processing apparatus is configured to obtain the plurality of physical acoustic parameters, i.e. the same parameters as for the target room, for the virtual room from a database of physical acoustic parameters for a plurality of virtual rooms.
In a further possible implementation form of the first aspect, the plurality of physical acoustic parameters of the target room and the virtual room comprises: an energy decay curve, EDC, parameter; a reverberation time parameter; a definition parameter; a speech transmission index, STI, parameter; a clarity index parameter; a direct-to-reverberant ratio, DRR, parameter; a centre time parameter; an inter-aural cross-correlation, IACC, parameter and/or a late lateral energy parameter.
In a further possible implementation form of the first aspect, the perceptual acoustic quality model defines, i.e. comprises for each perceptual attribute a correlation matrix between the pluralities of physical acoustic parameters of the target room and the virtual room and the first and second values or measures of the perceptual attribute of the target room and the virtual room for defining the mapping, in particular correlation, a) between the plurality of physical acoustic parameters and the first value or measure of the perceptual attribute of the target room and b) between the plurality of physical acoustic parameters and the second value or measure of the perceptual attribute of the virtual room.
In a further possible implementation form of the first aspect, the perceptual acoustic quality model defines, i.e. comprises for each perceptual attribute a correlation matrix between weighted linear combinations, i.e. the LDA discriminant functions, i.e. LDs determined by means of a linear discriminant analysis, of the plurality of physical acoustic parameters of the target room and the first value or measure of the perceptual attribute of the target room and between weighted linear combinations of the plurality of physical acoustic parameters of the virtual room and the second value or measure of the perceptual attribute of the virtual room.
In a further possible implementation form of the first aspect, the perceptual attribute of the target room and the virtual room is a coloration attribute and wherein the weighted linear combination of the plurality of physical acoustic parameters of the target room and the virtual room comprises a reverberation time parameter for an energy decay by 30 dB having the largest weight.
In a further possible implementation form of the first aspect, the weighted linear combination of the plurality of physical acoustic parameters of the target room and the virtual room comprises a reverberation time parameter for an energy decay by 20 dB with a weight having an absolute value in the range from about 0.33 to about 0.53, in particular 0.4382, the reverberation time parameter for an energy decay by 30 dB with a weight having an absolute value in the range from about 0.9 to about 1.0, in particular 1.0, and a clarity index parameter for 80 ms with a weight having an absolute value in the range from about 0.29 to about 0.49, in particular 0.3958.
In a further possible implementation form of the first aspect, the perceptual attribute of the target room and the virtual room is an envelopment attribute. The weighted linear combination of the plurality of physical acoustic parameters of the target room and the virtual room may comprise a clarity index parameter for 50 ms having the largest weight.
In a further possible implementation form of the first aspect, the weighted linear combination of the plurality of physical acoustic parameters of the target room and the virtual room comprises a reverberation time parameter for an energy decay by 20 dB with a weight having an absolute value in the range from about 0.9 to about 1.00, in particular 0.9533, a reverberation time parameter for an energy decay by 30 dB with a weight having an absolute value in the range from about 0.85 to about 0.95, in particular 0.8834, the clarity index parameter for 50 ms with a weight having an absolute value in the range from about 0.9 to about 1.0, in particular 1.0, and a clarity index parameter for 80 ms with a weight having an absolute value in the range from about 0.9 to about 1.00, in particular 0.9228.
In a further possible implementation form of the first aspect, the perceptual acoustic quality model defines, i.e. comprises, for each perceptual attribute, a correlation matrix between a cosine similarity of a first and a second weighted linear combination of the plurality of physical acoustic parameters of the target room and the first value or measure of the perceptual attribute of the target room and between a cosine similarity of the first and the second weighted linear combination of the plurality of physical acoustic parameters of the virtual room and the second value or measure of the perceptual attribute of the virtual room.
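The cosine similarity between the first and second weighted linear combinations can be sketched as follows; the two LD score vectors are hypothetical examples, assuming numpy:

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two weighted-combination (LD) score vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical first and second weighted linear combinations of one room.
ld_first = np.array([0.44, -1.00, 0.40])
ld_second = np.array([0.95, -0.88, 0.92])

similarity = cosine_similarity(ld_first, ld_second)  # value in [-1, 1]
```

A cosine similarity of 1 indicates parallel (maximally similar) combinations, 0 indicates orthogonal ones.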
In a further possible implementation form of the first aspect, the perceptual attribute of the target room and the virtual room is a plausibility attribute. The first weighted linear combination of the plurality of physical acoustic parameters of the target room and the virtual room may comprise a reverberation time parameter for an energy decay by 30 dB having the largest weight of the first weighted linear combination, and the second weighted linear combination of the plurality of physical acoustic parameters of the target room and the virtual room may comprise a clarity index parameter for 50 ms having the largest weight of the second weighted linear combination.
In a further possible implementation form of the first aspect, the first weighted linear combination of the plurality of physical acoustic parameters of the target room and the virtual room comprises a reverberation time parameter for an energy decay by 20 dB with a weight having an absolute value in the range from about 0.33 to about 0.53, in particular 0.4382, the reverberation time parameter for an energy decay by 30 dB with a weight having an absolute value in the range from about 0.9 to about 1.0, in particular 1.0, and a clarity index parameter for 80 ms with a weight having an absolute value in the range from about 0.29 to about 0.49, in particular 0.3958.
In a further possible implementation form of the first aspect, the second weighted linear combination of the plurality of physical acoustic parameters of the target room and the virtual room comprises a reverberation time parameter for an energy decay by 20 dB with a weight having an absolute value in the range from about 0.9 to about 1.00, in particular 0.9533, a reverberation time parameter for an energy decay by 30 dB with a weight having an absolute value in the range from about 0.85 to about 0.95, in particular 0.8834, the clarity index parameter for 50 ms with a weight having an absolute value in the range from about 0.9 to about 1.0, in particular 1.0, and a clarity index parameter for 80 ms with a weight having an absolute value in the range from about 0.9 to about 1.00, in particular 0.9228.
In a further possible implementation form of the first aspect, the data processing apparatus is further configured to determine a physical acoustic distance measure value between the target room and the virtual room based on the plurality of physical acoustic parameters of the target room and the plurality of physical acoustic parameters of the virtual room.
In a further possible implementation form of the first aspect, the data processing apparatus further comprises a display configured to display a graphical user interface configured to illustrate the perceptual acoustic distance measure value for the selected perceptual attribute between the target room and the virtual room.
In a further possible implementation form of the first aspect, the data processing apparatus is configured to estimate a respective perceptual acoustic distance measure value for the selected perceptual attribute between the target room and a plurality of virtual rooms and to determine a best-matching virtual room of the plurality of virtual rooms having the smallest perceptual acoustic distance measure value relative to the target room.
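Determining the best-matching virtual room reduces to an argmin over the estimated perceptual acoustic distance measure values; a minimal sketch with hypothetical room identifiers and distance values:

```python
# Hypothetical perceptual acoustic distance of the target room to each
# candidate virtual room (smaller value = perceptually closer).
distances = {"virtual_room_A": 0.82, "virtual_room_B": 0.17, "virtual_room_C": 0.45}

# Best-matching virtual room: the one with the smallest distance value.
best_match = min(distances, key=distances.get)
print(best_match)  # virtual_room_B
```

In a real system the dictionary values would come from the perceptual acoustic quality model rather than being fixed constants.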
In a further possible implementation form of the first aspect, the data processing apparatus is further configured to obtain an impulse response function and/or a transfer function associated with the best-matching virtual room having the smallest perceptual acoustic distance measure value relative to the target room.
According to a second aspect a computer-implemented data processing method for estimating the perceptual acoustics of a target room with respect to a perceptual attribute of the target room is provided. The data processing method comprises the steps of: obtaining a plurality of physical acoustic parameters for the target room; obtaining the plurality of physical acoustic parameters, i.e. the same parameters as for the target room, for a virtual room, e.g. from a database; and estimating a perceptual acoustic distance measure value for the selected perceptual attribute between the target room and the virtual room based on the plurality of physical acoustic parameters of the target room and on the plurality of physical acoustic parameters of the virtual room and based on a perceptual acoustic quality model, wherein the perceptual acoustic quality model defines a mapping, in particular a correlation, a) between the plurality of physical acoustic parameters and a first value or measure of the perceptual attribute of the target room and b) between the plurality of physical acoustic parameters and a second value or measure of the perceptual attribute of the virtual room. Thus, the data processing method according to the second aspect allows efficiently estimating the perceptual acoustics of a target room based on the physical acoustic parameters of the target room. The method according to the second aspect can be performed by the data processing apparatus according to the first aspect. Thus, further features of the method according to the second aspect result directly from the functionality of the data processing apparatus according to the first aspect as well as its different implementation forms and embodiments described above and below.
According to a third aspect a computer program product is provided, comprising a computer-readable storage medium for storing program code which causes a computer or a processor to perform the method according to the second aspect, when the program code is executed by the computer or the processor.
Details of one or more embodiments are set forth in the accompanying drawings and the description below. Other features, objects, and advantages will be apparent from the description, drawings, and claims.
BRIEF DESCRIPTION OF THE DRAWINGS
In the following, embodiments of the present disclosure are described in more detail with reference to the attached figures and drawings, in which:
Fig. 1 is a schematic diagram illustrating a data processing apparatus according to an embodiment for estimating the perceptual acoustics of a target room;
Fig. 2a and 2b are schematic diagrams illustrating data flows implemented by a data processing apparatus according to an embodiment;
Fig. 3a is a schematic diagram illustrating an acoustic similarity estimation module implemented by a data processing apparatus according to an embodiment;
Fig. 3b is a schematic diagram illustrating an acoustic parameter calculation module implemented by the data processing apparatus of figure 3a;
Fig. 4 is a graphical diagram illustrating weights of different linear combinations of a plurality of physical acoustic parameters implemented by a data processing apparatus according to an embodiment;
Fig. 5 shows a matrix illustrating numerical values of the weights of figure 4;
Fig. 6a and 6b are schematic diagrams illustrating a classification of rooms from a database of rooms based on different LD discriminant functions;
Fig. 7 shows processing blocks for generating a perceptual quality model implemented by a data processing apparatus according to an embodiment;
Fig. 8 shows an exemplary correlation analysis for determining a perceptual quality model implemented by a data processing apparatus according to an embodiment;
Fig. 9 shows a schematic diagram illustrating an implementation of a perceptual quality model with a visualization module of a data processing apparatus according to an embodiment;
Fig. 10 shows a graphical user interface of a display of a data processing apparatus according to an embodiment; and
Fig. 11 is a computer-implemented data processing method according to an embodiment for estimating the perceptual acoustics of a target room with respect to a perceptual attribute of the target room.
In the following, identical reference signs refer to identical or at least functionally equivalent features.
DETAILED DESCRIPTION OF THE EMBODIMENTS
In the following description, reference is made to the accompanying figures, which form part of the disclosure, and which show, by way of illustration, specific aspects of embodiments of the present disclosure or specific aspects in which embodiments of the present disclosure may be used. It is understood that embodiments of the present disclosure may be used in other aspects and comprise structural or logical changes not depicted in the figures. The following detailed description, therefore, is not to be taken in a limiting sense, and the scope of the present disclosure is defined by the appended claims.
For instance, it is to be understood that a disclosure in connection with a described method may also hold true for a corresponding device or system configured to perform the method and vice versa. For example, if one or a plurality of specific method steps are described, a corresponding device may include one or a plurality of units, e.g. functional units, to perform the described one or plurality of method steps (e.g. one unit performing the one or plurality of steps, or a plurality of units each performing one or more of the plurality of steps), even if such one or more units are not explicitly described or illustrated in the figures. On the other hand, for example, if a specific apparatus is described based on one or a plurality of units, e.g. functional units, a corresponding method may include one step to perform the functionality of the one or plurality of units (e.g. one step performing the functionality of the one or plurality of units, or a plurality of steps each performing the functionality of one or more of the plurality of units), even if such one or plurality of steps are not explicitly described or illustrated in the figures. Further, it is understood that the features of the various exemplary embodiments and/or aspects described herein may be combined with each other, unless specifically noted otherwise.

Figure 1 is a schematic diagram illustrating a data processing apparatus 100 according to an embodiment. The data processing apparatus 100 may comprise a processor 101. The processor 101 may be implemented in hardware and/or software and may comprise digital circuitry, or both analog and digital circuitry. Digital circuitry may comprise components such as application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), digital signal processors (DSPs), or general-purpose processors. The data processing apparatus 100 may further comprise a memory 103, e.g.
a non-transitory memory or nonvolatile memory, configured to store executable program code which, when executed by the processor 101 , causes the data processing apparatus 100 to perform the functions and methods described herein. The data processing apparatus 100 may further comprise a display 105 for displaying results of the processing performed by the processor 101. The display 105 may be a touchscreen.
Before describing different embodiments of the data processing apparatus 100 in more detail, in the following some technical background as well as terminology concerning audio processing will be introduced making use of one or more of the following abbreviations:
AR Augmented Reality
RIR Room Impulse Response
SRIR Spatial Room Impulse Response
BRIR Binaural Room Impulse Response
DOA Direction of Arrival
DOV Direction of View
DRR Direct-to-Reverberant Ratio
LDA Linear Discriminant Analysis
PCA Principal Component Analysis
RT Reverberation Time
In the context of audio processing, a virtual room can be defined as an acoustic environment or scenery. The virtual objects are provided with room acoustics which can be assumed as a room of the acoustic scenery and can be used in a Binaural Reproduction System, as further described below. The acoustics of the scenery may need to be sufficiently similar to the room acoustics of the reproduction room to create a high spatial audio quality, represented, e.g. by perceptual attributes such as for example plausibility and externalization.
A target room can be defined as an acoustic environment or scenery. Here, the reproduction room is something like a target room for the virtual acoustics. If the room acoustics of the target room are known, e.g. by room acoustics measurements, virtual audio objects can be provided directly with the acoustics of the target room. If no measurements from the target room are available, the acoustics of other rooms, e.g. virtual rooms from a database, can be used. Furthermore, it is also possible to create the virtual room acoustics by simulation.
In any case, it is of interest to know the similarity of the acoustics of the target room and the virtual room in order to draw conclusions about the quality of the binaural playback or audio-AR scene. As will be described below in more detail, the room acoustic similarity can be assessed by evaluating the similarity of single or multiple room acoustic parameters and can be represented as a distance. The distance may also be used for interlinking the acoustic distance with quality ratings at divergent and congruent room scenarios from exemplary perceptual evaluations to estimate the effect on spatial audio quality.
The representation of the distance can also be useful for cases where an audio-AR scene is created and it is considered to check how well the virtual audio objects fit into the acoustics of the target room, for example if the creator is not in the target room itself when creating the audio-AR scene.
Physical acoustic parameters, also referred to as room acoustical parameters or more general environment acoustical parameters, can be used for acoustic distance calculation and could be extracted from measured room impulse responses or binaural room impulse responses. Such physical acoustic parameters can comprise the following:
Energy Decay Curve (EDC), which refers to the energy decay behavior of an impulse in a room.
Reverberation Time (T60), which refers to the time taken for the sound energy to decay by 60 dB.
Definition (D50), which refers to the ratio of the energy components up to 50 ms after the direct sound to the total energy of the RIR.
Speech Transmission Index (STI), which refers to an objective measure for predicting the intelligibility of speech. It has a value range between 0 and 1.
Clarity Index (C50), which refers to a logarithmic ratio between the early sound energy (up to 50 ms) and the later sound energy. Clarity Index (C80), which refers to a logarithmic ratio between the early sound energy (up to 80 ms) and the later sound energy.
Direct-to-Reverberant Ratio (DRR), which refers to a ratio of a direct part to the rest of the reverberation, similar to C50.
Centre Time (Ts), which refers to a time at which half of the signal energy has reached a receiver.
Inter-aural Cross Correlation (IACC), which refers to a cross-correlation of a signal at each ear of a listener or receiver and which defines a spatial difference.
Late Lateral Energy (LJ), which refers to the logarithmic ratio of the lateral sound energy arriving from 80 ms after the direct sound to the total sound energy measured at a distance of 10 m (free field).
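Several of the standard definitions above can be sketched with numpy; the Schroeder backward integration, the synthetic exponentially decaying impulse response, the sampling rate and all function names below are assumptions made for illustration, not part of the disclosure:

```python
import numpy as np

def schroeder_edc_db(rir):
    """Energy decay curve (EDC) via backward (Schroeder) integration, in dB."""
    energy = np.cumsum(rir[::-1] ** 2)[::-1]
    return 10.0 * np.log10(energy / energy[0])

def rt60_from_edc(edc_db, fs, lo=-5.0, hi=-35.0):
    """T30-based reverberation time: fit a line to the EDC between -5 dB and
    -35 dB and extrapolate the decay to -60 dB."""
    t = np.arange(len(edc_db)) / fs
    mask = (edc_db <= lo) & (edc_db >= hi)
    slope, _ = np.polyfit(t[mask], edc_db[mask], 1)
    return -60.0 / slope

def clarity_db(rir, fs, split_ms):
    """Clarity index (C50/C80): early-to-late energy ratio in dB."""
    n = int(fs * split_ms / 1000.0)
    return 10.0 * np.log10(np.sum(rir[:n] ** 2) / np.sum(rir[n:] ** 2))

def definition_d50(rir, fs):
    """Definition (D50): early (<= 50 ms) energy over total energy, in [0, 1]."""
    n = int(fs * 0.05)
    return np.sum(rir[:n] ** 2) / np.sum(rir ** 2)

def centre_time_s(rir, fs):
    """Centre time (Ts): first moment of the squared impulse response."""
    t = np.arange(len(rir)) / fs
    e = rir ** 2
    return np.sum(t * e) / np.sum(e)

# Synthetic RIR: exponentially decaying noise with a nominal T60 of 0.5 s.
fs = 16000
t60_nominal = 0.5
rng = np.random.default_rng(0)
t = np.arange(int(fs * 1.5 * t60_nominal)) / fs
rir = rng.standard_normal(len(t)) * 10.0 ** (-3.0 * t / t60_nominal)

rt60 = rt60_from_edc(schroeder_edc_db(rir), fs)  # should lie close to 0.5 s
```

On a measured RIR or BRIR the same functions would be applied per frequency band after a filter-bank decomposition.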
Perceptual attributes, also referred to as perceptual quality features or attributes, generally depend on the context and the task. Such perceptual attributes can comprise the following:
Envelopment, which refers to a listener envelopment impression of being surrounded by the reproduced sound field or audio signal. For example, "not surrounded at all", "less surrounded", "slightly less surrounded", "medium surrounded", "slightly more surrounded", "more surrounded", "completely surrounded" can be used for ratings in a subjective listening test.
Coloration, which refers to a timbral impression which is determined by the ratio of high to low frequency components. For example, "extremely muffled", "muffled", "slightly muffled", "well balanced", "slightly bright", "bright", "extremely bright" can be used for ratings in a subjective listening test.
Plausibility, which refers to a plausible auditory illusion or acoustic room congruence impression of how well the heard audio signal fits into the current listening environment, considering room acoustic characteristics. For example, "extremely bad fit", "bad fit", "poor fit", "fair fit", "good fit", "excellent fit", "ideal fit" can be used for ratings in a subjective listening test.

Externalization, which refers to a perception of the audio signal being placed outside the head of a receiver or listener within the surrounding environment or being placed inside the head of a receiver or listener, including the ability to localize the direction of the incoming sound. For example, "inside the head, but diffuse", "inside the head, localizable", "very near the head, localizable", "outside the head, localizable", "outside the head, but diffuse" can be used for ratings in a subjective listening test.
The general idea of embodiments disclosed herein is to provide an efficient and intuitive data processing apparatus 100 for the calculation and representation of acoustic and perceptual distances between two or more spatial audio signals using room acoustic parameters. The spatial audio signals may comprise at least one virtual sound source embedded in a room acoustics, i.e. the target room, and at least one further room acoustics, i.e. the virtual room, to be compared against. Further spatial audio signals can be further room acoustics, i.e. further virtual rooms, against which a further comparison can be performed. The perceptual distances (or similarities) can be related to perceptual and intuitive attributes such as envelopment, externalization, plausibility, coloration and others.
As will be described in more detail below, the data processing apparatus 100 is configured for estimating the perceptual acoustics of a target room with respect to a perceptual attribute of the target room. As illustrated in figure 1 , the data processing apparatus 100 is configured to obtain a plurality of physical acoustic parameters 111 , 113 for the target room and obtain the plurality of physical acoustic parameters 111 , 113, i.e. the same parameters as for the target room, for a virtual room, e.g. from a database 201a (illustrated in figure 2).
The data processing apparatus 100 is further configured to estimate a perceptual acoustic distance measure value 115 for the selected perceptual attribute between the target room and the virtual room based on the plurality of physical acoustic parameters 111 of the target room and on the plurality of physical acoustic parameters 113 of the virtual room and based on a perceptual acoustic quality model 150. The perceptual acoustic quality model 150 may be stored in the memory 103 of the data processing apparatus 100.
The perceptual acoustic quality model 150 defines a mapping, in particular a correlation a) between the plurality of physical acoustic parameters 111 , 113 and a first value or measure of the perceptual attribute of the target room and b) between the plurality of physical acoustic parameters 111 , 113 and a second value or measure of the perceptual attribute of the virtual room. The perceptual attribute of the target room and the virtual room may comprise or is an envelopment attribute, a coloration attribute, a plausibility attribute, or an externalization attribute, in particular as described above.
The plurality of physical acoustic parameters 111 , 113 of the target room and the virtual room may comprise: an energy decay curve, EDC, parameter; a reverberation time parameter; a definition parameter; a speech transmission index, STI, parameter; a clarity index parameter; a direct-to-reverberant ratio, DRR, parameter; a centre time parameter; an inter-aural crosscorrelation, IACC, parameter and/or a late lateral energy parameter, in particular as described above.
Figures 2a and 2b are schematic diagrams illustrating data flows in the data processing apparatus 100 according to an embodiment. More specifically, figure 2b shows a first spatial audio signal 220 or binaural audio signal 220 characterized by room acoustics 221 of the virtual room and a second spatial audio signal 230 or binaural audio signal 230 characterized by room acoustics 231 of the target room. The room acoustics 221 , 231 may comprise a direct sound, early acoustic room reflections, and late acoustic room reflections.
Generally, the data processing apparatus 100 may be configured to assess perceptual/auditory distances (or similarities) between two or more spatial audio signals 220, 230 using the physical acoustic parameters 111 , 113 of the rooms. The data processing apparatus 100 may perform a) a comparison of room acoustic parameters using statistical data analysis and calculation of acoustic distance between different room acoustics, and b) an estimation of perceptual distance using a perceptual quality model which describes the effect of acoustic room divergence on perceived spatial audio quality.
The room acoustics 221 , 231 may be directionally weighted representations (binaural and/or monaural) and/or omnidirectional representations. The acoustic representations may be transfer functions 203a-c in the form of BRIRs, SRIRs, and/or RIRs, recordings 201b or spatial audio signals which have been created by simulations 201c. The recordings 201b may comprise spatial audio signals recorded by microphones and/or a piece of audio signal recorded in a room (or environment/scenery), for instance, with a mono microphone or a pair of binaural microphones.
The room acoustics 221 , 231 may be calculated by the data processing apparatus 100 from the acoustic representations and/or may originate from other sources, for example from the database 201a. The room acoustic parameters, i.e. the plurality of physical acoustic parameters 111 , 113, may be calculated by the data processing apparatus 100 from the transfer functions 203a-c or from microphone recordings 201b.
The room acoustic parameters, i.e. the plurality of physical acoustic parameters 111 , 113, may be previously documented parameters for the acoustic description of rooms (e.g. T60, DRR, C50, C80, IACC) and/or new or adapted parameters for the description of audio-AR scenes.
The room acoustics parameters, i.e. the plurality of physical acoustic parameters 111 , 113, may come directly from measurements or recordings 201b, come from the database 201a where room acoustics parameters for different rooms are stored, come from simulations 201c of room acoustics, be set as values elsewhere (e.g. by manual specification of T60, DRR, etc.) or come from other sources such as a remote server.
The data processing apparatus 100 may comprise a module 240 for determining the similarity and comparison of the physical acoustic parameters 111 , 113 of the rooms, which may be realized by statistical data analysis, and for calculating an acoustic distance between different room acoustics.
Data analysis may be performed by statistical ratios and considering single or combinations of the physical acoustic parameters 111 , 113 of the rooms. Additionally or alternatively, data analysis may be performed by applying methods of multivariate statistics, e.g. principal component analysis, cluster analysis, multidimensional scaling and others.
For the groups/clusters created in the data analysis, the similarity between the groups/clusters may be calculated, e.g. cosine similarity, geometric distance, etc. The similarity may be a measure of multivariate acoustic distance. The groups/clusters may be called spatial classes.
The perceptual acoustic quality model 150 may be designed based on the correlation matrices between the plurality of physical acoustic parameters 111 , 113 and the perceptual attributes. To determine the correlation, subjective evaluation/listening tests may be designed and organized for assessing the overall subjective audio quality as well as some describing attributes.
With regard to the stimuli of the subjective listening test, binaural audio signals auralized with spatial room acoustics of different rooms, i.e. the virtual room illustrated in figure 2b, may be used as stimuli for the listening tests. Regarding evaluation metrics, the listening tests may be conducted separately for different quality features, i.e. the perceptual attributes such as plausibility, coloration, envelopment, externalization, etc. The listening tests may be conducted in an evaluation room, which may be the target room.
The data processing apparatus 100 may further comprise a module 250 configured for estimation of the perceptual distance 115, which may be realized by the perceptual acoustic quality model 150 describing the effect of acoustic room divergence on perceived spatial audio quality. A room database, which may be the same database as the database 201a or a different database, may be used from which combinations of congruent and divergent audio scenes may be selected for perceptual evaluation. The evaluation may measure the perceived quality for the features, i.e. perceptual attributes: Plausibility, Externalization, Coloration, and others. The data processing apparatus 100 may interlink the rated quality and the acoustic similarity to estimate the correlation between acoustics and auditory perception.
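Interlinking rated quality and acoustic similarity amounts to a correlation analysis between the two quantities; a sketch with entirely hypothetical listening-test data, assuming numpy:

```python
import numpy as np

# Hypothetical data: acoustic distance of several virtual/target room pairs
# and the mean quality rating for one perceptual attribute (e.g. plausibility)
# from a listening test, on a 7-point scale.
acoustic_distance = np.array([0.1, 0.3, 0.5, 0.8, 1.2])
plausibility_rating = np.array([6.5, 5.8, 5.1, 3.9, 2.7])

# Pearson correlation interlinking acoustic distance and perceived quality;
# a strongly negative value indicates that quality drops with increasing distance.
r = np.corrcoef(acoustic_distance, plausibility_rating)[0, 1]
```

The resulting coefficient is one entry of the correlation matrix the perceptual acoustic quality model 150 is built from.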
The data processing apparatus 100 can be suited for auditory AR use cases as well as for position-dynamic binaural synthesis. In figure 2a, an exemplary use case as a binaural auralization system used in spatial audio rendering for a headphone 209a and loudspeaker 209b is illustrated. This may involve using the transfer functions 203a-c for processing spatial room acoustics, in particular RIR, BRIR and SRIR as described above and a following spatial audio rendering 205. Resulting spatial audio signals 220, 230 may then be processed by a binaural playback 207 in order to achieve binaural audio signals for the headphone 209a or the loudspeakers 209b.
A virtual room based on certain spatial room acoustics may be used to enhance overall spatial experience. The playback device may be the headphone 209a or the loudspeakers 209b. For loudspeakers 209b, additional processing such as crosstalk cancellation processing may be required. For AR applications, it may be important to use the perceptual “similar” virtual room as the real listening room (target room) to result in a plausible illusion.
Besides spatial audio/3D audio rendering for headphone/loudspeaker 209a-b, the data processing apparatus 100 may be used for AR applications where it is important to have a perceptually congruent virtual-target acoustics. Fixed rooms are standard and currently being implemented in the spatial audio/3D audio features of more and more products. However, a perceptually divergent, i.e. the opposite of congruent, virtual-target acoustics will destroy the plausibility and also the overall user experience. If the physical acoustic parameters 111 , 113 of the target room are known, a best-matching virtual room can be selected from a pre-collected database based on the estimated perceptual virtual-target distance in order to give the best coherent virtual-target acoustics. If the physical acoustic parameters 111 , 113 of the target room are unknown, a piece of audio signal (speech, music or noise etc.) can be recorded with a mono or a pair of binaural microphones and the physical acoustic parameters 111 , 113 are calculated from the recorded signals. The data processing apparatus 100 can also be used for content creation and spatial audio mixing by visualizing acoustic distance and perceptual distance of different virtual rooms.
Figure 3a is a schematic diagram illustrating an acoustic similarity estimation module implemented by the data processing apparatus 100 according to an embodiment.
In order to assess perceptual distances (or similarities) between two or more spatial audio signals 220, 230 using room acoustic parameters, i.e. the plurality of physical acoustic parameters 111 , 113, a first step may be to compare the room acoustic parameters using statistical data analysis and to calculate an acoustic distance between different room acoustics. A second step may involve an estimation of the perceptual distance using the perceptual acoustic quality model 150 which describes the effect of acoustic room divergence on perceived spatial audio quality.
The acoustic similarity estimation module shown in figure 3a may be configured to estimate acoustic similarity using single physical acoustic parameters 111 , 113 and/or multivariate statistics. The room acoustic database 201a, which may comprise the measurement of multiple rooms and the analysis of relevant physical acoustic parameters 111 , 113 of the rooms, may be coupled to the acoustic similarity estimation module. When the transfer functions 203a-c, in particular RIR, BRIR or SRIR, are measured or simulated for one or multiple DoA and DoV in certain rooms, one or more of the room acoustic parameters, i.e. the plurality of physical acoustic parameters 111 , 113, as described above may be extracted by an acoustic parameter calculation 301.
Based on the transfer functions 203a-c and/or an audio recording 201b as an input 301a, an acoustic parameter calculation 301, which is illustrated in figure 3b in more detail, is performed. By the parameter calculation 301 the physical acoustic parameters 111 , 113 for broadband signals and/or for several frequency bands based on a broadband or frequency-band decomposition 301b may be calculated. The corresponding output 301c may be the room acoustic parameters, i.e. the plurality of physical acoustic parameters 111 , 113. As further illustrated in figure 3a, by applying multivariate statistics 307 such as PCA and/or by applying LDA 303, the physical acoustic parameters 111 , 113 of the rooms can be described by a few linear combinations (principal components in PCA or discriminant functions/components in LDA), i.e. by providing a parameter difference 305 or a similarity index 309. In doing so, different rooms can be classified on the basis of the physical acoustic parameters 111 , 113 of the rooms by describing them by a few meaningful combinations thereof. In the case of applying multivariate statistics 307, this may involve supplying at least some of the plurality of physical acoustic parameters 111 , 113 from the room acoustic database 201a. In the following, observations by means of LDA are considered by way of example, since the room separation is illustrated in a more comprehensible manner by the LDA than by the PCA.
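A two-class Fisher LDA on room parameter vectors can be sketched as follows; the class means, spreads, parameter selection and function name are assumptions for the sketch, not values from the disclosure:

```python
import numpy as np

def fisher_lda_direction(X1, X2):
    """Two-class Fisher LDA: w = Sw^-1 (m1 - m2), the direction in parameter
    space that best separates the two room classes."""
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    # Within-class scatter matrix Sw as the sum of the class scatter matrices.
    Sw = np.cov(X1.T, bias=True) * len(X1) + np.cov(X2.T, bias=True) * len(X2)
    return np.linalg.solve(Sw, m1 - m2)

rng = np.random.default_rng(1)
# Hypothetical parameter vectors [T30 in s, C50 in dB] for two spatial classes.
dry_rooms = rng.normal([0.3, 8.0], [0.05, 1.0], size=(20, 2))
reverberant_rooms = rng.normal([1.2, 1.0], [0.1, 1.0], size=(20, 2))

w = fisher_lda_direction(dry_rooms, reverberant_rooms)
proj_dry = dry_rooms @ w          # discriminant scores of the first class
proj_rev = reverberant_rooms @ w  # discriminant scores of the second class
```

Projecting onto w collapses the parameter space to one discriminant score per room, along which the two spatial classes separate.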
Figure 4 is a graphical diagram illustrating the weights of specific parameters of the plurality of physical acoustic parameters 111, 113 with regard to linear combinations of the LDA. In the columns of figure 4, from left to right, a first linear combination to an eighth linear combination is illustrated (also referred to as LD1 to LD8 in the following). In the rows of figure 4, the specific parameters of the plurality of physical acoustic parameters 111, 113 are illustrated, namely, from top to bottom, EDT, T20, T30, D50, C50, C80, DRR and Ts.
These specific parameters of the plurality of physical acoustic parameters 111, 113 are illustrated in more detail in figure 5, which shows a matrix W illustrating the specific parameters of the plurality of physical acoustic parameters 111, 113 with regard to the linear combinations of the LDA.
Complementary to figure 4, in the columns of the matrix W, from left to right, the first linear combination to the eighth linear combination is illustrated. In the rows of the matrix W, the specific parameters of the plurality of physical acoustic parameters 111, 113 are illustrated, namely, from top to bottom, EDT, T20, T30, D50, C50, C80, DRR and Ts.
As illustrated in figure 5, the matrix W may comprise the following weights:
0.0862 0.0035 -0.1099 0.0341 -0.1000 0.0734 -0.4770 0.0825
0.4382 -0.3210 0.9533 1.0000 0.1209 -1.0000 1.0000 0.6332
-1.0000 0.6050 -0.8834 -0.9983 -0.3970 0.8477 -0.6814 -0.3129
-0.0274 0.2459 -0.0340 0.0078 0.9137 -0.3548 -0.4390 -0.6738
-0.2387 -0.6784 -1.0000 0.4360 -0.6017 -0.0072 -0.2041 -0.2228
0.3958 1.0000 0.9228 -0.2039 0.0956 -0.0081 0.2940 0.3611
-0.0273 0.0417 0.0521 0.0600 0.3840 0.0454 -0.0373 -0.0715
-0.0760 0.2343 0.0476 0.1885 1.0000 -0.2180 -0.2473 -1.0000

Generally, the perceptual acoustic quality model 150 may define, i.e. comprise, for each perceptual attribute a correlation matrix, such as the matrix W for example, between the pluralities of physical acoustic parameters 111, 113 of the target room and the virtual room and the first and second values or measures of the perceptual attribute of the target room and the virtual room for defining the mapping, i.e. correlation, a) between the plurality of physical acoustic parameters 111, 113 and the first value or measure of the perceptual attribute of the target room and b) between the plurality of physical acoustic parameters 111, 113 and the second value or measure of the perceptual attribute of the virtual room.
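Applying the matrix W may be sketched as a simple matrix product: a room's (standardized) parameter vector, ordered as the rows of W, is projected onto the discriminant functions LD1 to LD8. The parameter vector below is hypothetical and serves only to show the mechanics:

```python
import numpy as np

# Matrix W from figure 5: rows = EDT, T20, T30, D50, C50, C80, DRR, Ts;
# columns = discriminant functions LD1..LD8.
W = np.array([
    [ 0.0862,  0.0035, -0.1099,  0.0341, -0.1000,  0.0734, -0.4770,  0.0825],
    [ 0.4382, -0.3210,  0.9533,  1.0000,  0.1209, -1.0000,  1.0000,  0.6332],
    [-1.0000,  0.6050, -0.8834, -0.9983, -0.3970,  0.8477, -0.6814, -0.3129],
    [-0.0274,  0.2459, -0.0340,  0.0078,  0.9137, -0.3548, -0.4390, -0.6738],
    [-0.2387, -0.6784, -1.0000,  0.4360, -0.6017, -0.0072, -0.2041, -0.2228],
    [ 0.3958,  1.0000,  0.9228, -0.2039,  0.0956, -0.0081,  0.2940,  0.3611],
    [-0.0273,  0.0417,  0.0521,  0.0600,  0.3840,  0.0454, -0.0373, -0.0715],
    [-0.0760,  0.2343,  0.0476,  0.1885,  1.0000, -0.2180, -0.2473, -1.0000],
])

# Hypothetical standardized parameter vector for one room,
# in the same order as the rows of W.
params = np.array([0.1, 0.5, -0.3, 0.0, 0.2, -0.1, 0.4, 0.3])

ld_scores = params @ W      # projections onto LD1..LD8
print(ld_scores[:3])        # the first three components often suffice
```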
Figures 6a and 6b are schematic diagrams illustrating a classification of the rooms from the room database 201a using different LD discriminant functions/components.
Figure 6a shows the first and second linear combination, i.e. LD1 and LD2 and figure 6b shows the first to the third linear combination, i.e. LD1 to LD3. Based on the LDA, it can be shown, in particular the data processing apparatus 100 may determine, how the measured rooms differentiate and could be classified, and which combinations of the plurality of physical acoustic parameters 111 , 113 can explain these differentiations.
Since the observations of the LDA for the individual rooms can be arranged in an n-dimensional space, distance measures may further be used. Possible distance measures comprise (i) calculating a mean value of each point cloud, (ii) calculating a distance between the reference room and all other rooms, and (iii) calculating a cosine similarity between the reference room and all other rooms.
The data processing apparatus 100 may determine which of LD1 to LD8 are sufficient to describe most of the data. In the following, by way of example, the observations of the first three LD functions/components, i.e. LD1 to LD3, are considered to describe most of the data set. This is illustrated in figures 6a and 6b, where LD1 to LD3 describe up to 99% of the data combined. As a distance measure, the distance and the cosine similarity may be calculated as illustrated in equation 1 and equation 2 below.
The Equation 1 may be as follows:

$d(R_{ref}, R_i) = \sqrt{\sum_{n=1}^{3} \left( \overline{LD_n}(R_{ref}) - \overline{LD_n}(R_i) \right)^2}$

The Equation 2 may be as follows:

$\cos(R_{ref}, R_i) = \frac{\sum_{n} \overline{LD_n}(R_{ref}) \cdot \overline{LD_n}(R_i)}{\sqrt{\sum_{n} \overline{LD_n}(R_{ref})^2} \cdot \sqrt{\sum_{n} \overline{LD_n}(R_i)^2}}$
Herein, LDn describes the mean value of the observations of the n-th discriminant function for the target/reference room Rref and the virtual room Ri used in binaural rendering.
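The two distance measures between the mean LD observations may be sketched as follows; the mean values of LD1 to LD3 for the reference room and the virtual room are hypothetical example numbers:

```python
import numpy as np

def ld_distance(ld_ref, ld_i):
    """Euclidean distance between mean LD observations (equation 1)."""
    return float(np.linalg.norm(np.asarray(ld_ref) - np.asarray(ld_i)))

def ld_cosine_similarity(ld_ref, ld_i):
    """Cosine similarity between mean LD observations (equation 2)."""
    a, b = np.asarray(ld_ref, float), np.asarray(ld_i, float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical mean values of LD1..LD3 for Rref and Ri.
ref = [1.2, -0.4, 0.3]
virt = [1.0, -0.5, 0.4]
print(ld_distance(ref, virt))
print(ld_cosine_similarity(ref, virt))   # close to 1 for similar rooms
```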
When comparing with the perceptual evaluation result to find the link between the plurality of physical acoustic parameters 111, 113 and the perceptual attributes, Rref may be the room in which the evaluation/listening test was performed, and Ri may be the virtual room used to generate the spatial audio signals with room acoustic parameters for listening.
The target of the perceptual acoustic quality model 150 may be to make statements about perceptual attributes such as envelopment, plausibility, coloration and externalization by looking only at the plurality of physical acoustic parameters 111, 113. In other words, the perceptual acoustic quality model 150 may be designed based on the correlation matrices between the plurality of physical acoustic parameters 111, 113 and the perceptual attributes.
Figure 7 shows components of the perceptual acoustic quality model 150 implemented by the data processing apparatus 100 according to an embodiment. Generally, the task is to interlink the acoustic similarity index with several quality features (regarding spatial audio quality), i.e. the perceptual attributes.
To find the correlation between physical acoustic parameters 111 , 113 and the perceptual attributes, subjective evaluation/listening tests may be designed, as already described above, and organized to create a perceptual evaluation database 701 , which may be stored in the memory 103 of the data processing apparatus 100.
The data processing apparatus 100 may be further configured to implement a module 703 for supplying the quality features, i.e. the perceptual attributes, and a perceived quality analysis module 705 which may process the quality features, i.e. the perceptual attributes, based on the data from the perceptual evaluation database 701 .
Based on the room acoustic database 201a, the data processing apparatus may perform single parameter and multivariate analysis 303, 307, as described above.
The corresponding results of the perceived quality analysis module 705 and the single parameter and multivariate analysis 303, 307, which may comprise the plurality of physical acoustic parameters 111, 113, the LD components and the determined distances, may be forwarded to an interlinking and correlation module 707.
For estimating spatial audio quality, the perceptual acoustic quality model 150 may be aimed at assessing the overall subjective audio quality as well as some describing attributes. This can be achieved by a combined analysis performed by the interlinking and correlation module 707 to find the correlations between the perceptual evaluation results and the physical/acoustic measures, comprising the physical room acoustic parameters 111, 113, the LD components and the similarities/distances of the single parameter and multivariate analysis 303, 307.
A parameter or distance may have a significant effect with regard to the perception asked about in the perceptual evaluation test if it correlates in all evaluation conditions (evaluated rooms) in a similar, significant way. Coloration and envelopment may be perceived similarly for the different BRIRs in all evaluation rooms. Plausibility perception of the different BRIRs may differ strongly between the evaluation rooms, so the evaluation depends on whether the render room/target room is congruent or divergent, i.e. similar or dissimilar. Therefore, it is difficult to design the perceptual acoustic quality model 150 based on fixed acoustic parameters to estimate different quality features. The perceptual acoustic quality model 150 may be designed separately for each relevant quality feature, i.e. perceptual attribute.
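The interlinking step may be sketched as a correlation test between one candidate acoustic measure and the perceptual ratings of the same conditions; both data series below are hypothetical and serve only to illustrate the mechanics:

```python
import numpy as np

# Hypothetical data: one candidate acoustic measure (e.g. an LD
# component or a distance value) and perceptual ratings collected
# for the same set of BRIRs/rooms in the listening test.
acoustic = np.array([0.1, 0.4, 0.9, 1.3, 1.8])
ratings  = np.array([4.8, 4.1, 3.2, 2.5, 1.9])  # e.g. plausibility scores

# Pearson correlation coefficient between measure and ratings.
r = np.corrcoef(acoustic, ratings)[0, 1]
print(round(r, 3))   # a strong correlation marks a candidate predictor
```

A measure would then be retained for the model of a given attribute only if such a correlation holds in a similar, significant way across all evaluation rooms, as described above.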
Figure 8 shows an exemplary correlation analysis for determining the perceptual acoustic quality model 150 implemented by the data processing apparatus 100 according to an embodiment.
As illustrated by the diagrams 801-803, 811-813, 821-823 shown in figure 8, the data processing apparatus may determine the perceptual acoustic quality model 150 by comparing a plurality of combinations for each of the perceptual attributes.
In figure 8, by way of example, for the perceptual attribute “plausibility” the diagrams 801-803 represent a first group of combinations to be compared. For the perceptual attribute “envelopment” the diagrams 811-813 represent a second group of combinations to be compared. For the perceptual attribute “coloration” the diagrams 821-823 represent a third group of combinations to be compared. Each first row of each of the diagrams 801-803, 811-813, 821-823 may indicate a same first room, each second row a same second room and each third row a same third room.
Each column of the diagrams 801, 811, 821 may indicate specific parameters of the plurality of physical acoustic parameters 111, 113, namely, from left to right, the parameters EDT, T20, T30, T60, D50, C50, C80, DRR and Ts.
Each column of the diagrams 802, 812, 822 may indicate from left to right the linear combinations LD1 to LD8 based on the LDA.
Each column of the diagrams 803, 813, 823 may indicate, from left to right, (i) a distance over LD1, LD2 and LD3, (ii) a cosine similarity over LD1 and LD2, for example according to the equation 2, and (iii) a cosine similarity over LD1 and LD3, for example according to the equation 2.
Based on the comparisons illustrated by the diagrams 801-803, 811-813, 821-823, the data processing apparatus 100 may determine the parameters used in the perceptual acoustic quality model 150, in particular those parameters of the plurality of physical acoustic parameters 111 , 113 showing a high correlation between acoustic evaluation and perceptual evaluation.
In the example illustrated in figure 8, the data processing apparatus 100 may determine, as the parameters used in the perceptual acoustic quality model 150, the cosine similarity over LD1 and LD3 for the perceptual attribute “plausibility”, LD3 for the perceptual attribute “envelopment” and LD1 for the perceptual attribute “coloration”. The room perceptual similarity may then be estimated based on the pre-designed perceptual acoustic quality model 150 comprising the parameters chosen above.
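The per-attribute predictor selection may be sketched as follows; the mapping from each predictor value to an actual rating would come from the fitted correlation model, so the functions below merely expose which measure is consulted per attribute, and the LD mean vectors are hypothetical:

```python
import numpy as np

def cosine(a, b):
    a, b = np.asarray(a, float), np.asarray(b, float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Predictors chosen from the correlation analysis of figure 8.
# Inputs: mean LD observations [LD1, LD2, LD3] of reference/virtual room.
def plausibility_predictor(ld_ref, ld_i):
    # cosine similarity over LD1 and LD3
    return cosine([ld_ref[0], ld_ref[2]], [ld_i[0], ld_i[2]])

def envelopment_predictor(ld_ref, ld_i):
    # difference in LD3
    return ld_i[2] - ld_ref[2]

def coloration_predictor(ld_ref, ld_i):
    # difference in LD1
    return ld_i[0] - ld_ref[0]

ref, virt = [1.2, -0.4, 0.3], [1.0, -0.5, 0.4]  # hypothetical values
print(plausibility_predictor(ref, virt))
```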
In other words, the data processing apparatus 100 can be configured according to one or more of the following modes based on the matrix W and the comparison performed as described above:
In a first, second and third mode, the perceptual acoustic quality model 150 may define, i.e. comprise, for each perceptual attribute a correlation matrix between weighted linear combinations, i.e. the LDA discriminant functions, i.e. LDs determined by means of the linear discriminant analysis, of the plurality of physical acoustic parameters 111 of the target room and the first value or measure of the perceptual attribute of the target room and between weighted linear combinations of the plurality of physical acoustic parameters 113 of the virtual room and the second value or measure of the perceptual attribute of the virtual room.
In the first mode, the perceptual attribute of the target room and the virtual room may be a coloration attribute. The weighted linear combination of the plurality of physical acoustic parameters 111, 113 of the target room and the virtual room may comprise a reverberation time parameter for an energy decay by 30 dB, i.e. T30, having the largest weight. The weighted linear combination of the plurality of physical acoustic parameters 111, 113 of the target room and the virtual room may comprise a reverberation time parameter for an energy decay by 20 dB, i.e. T20, with a weight having an absolute value in the range from about 0.33 to about 0.53, in particular 0.4382, the reverberation time parameter for an energy decay by 30 dB, i.e. T30, with a weight having an absolute value in the range from about 0.9 to about 1.0, in particular 1.0, and a clarity index parameter for 80 ms, i.e. C80, with a weight having an absolute value in the range from about 0.29 to about 0.49, in particular 0.3958.
In the second mode, the perceptual attribute of the target room and the virtual room may be an envelopment attribute. The weighted linear combination of the plurality of physical acoustic parameters 111, 113 of the target room and the virtual room may comprise a clarity index parameter for 50 ms, i.e. C50, having the largest weight. The weighted linear combination of the plurality of physical acoustic parameters 111, 113 of the target room and the virtual room may comprise a reverberation time parameter for an energy decay by 20 dB, i.e. T20, with a weight having an absolute value in the range from about 0.9 to about 1.00, in particular 0.9533, a reverberation time parameter for an energy decay by 30 dB, i.e. T30, with a weight having an absolute value in the range from about 0.85 to about 0.95, in particular 0.8834, the clarity index parameter for 50 ms, i.e. C50, having an absolute value in the range from about 0.9 to about 1.0, in particular 1.0, and a clarity index parameter for 80 ms, i.e. C80, with a weight having an absolute value in the range from about 0.9 to about 1.00, in particular 0.9228.
In the third mode, the perceptual acoustic quality model 150 may define, i.e. comprise, for each perceptual attribute a correlation matrix between a cosine similarity of a first and a second weighted linear combination of the plurality of physical acoustic parameters 111 of the target room and the first value or measure of the perceptual attribute of the target room and between a cosine similarity of the first and the second weighted linear combination of the plurality of physical acoustic parameters 113 of the virtual room and the second value or measure of the perceptual attribute of the virtual room. In the third mode, the perceptual attribute of the target room and the virtual room may be a plausibility attribute. The first weighted linear combination, i.e. LD1, of the plurality of physical acoustic parameters 111, 113 of the target room and the virtual room may comprise a reverberation time parameter for an energy decay by 30 dB, i.e. T30, having the largest weight of the first weighted linear combination. The second weighted linear combination, i.e. LD3, of the plurality of physical acoustic parameters 111, 113 of the target room and the virtual room may comprise a clarity index parameter for 50 ms, i.e. C50, having the largest weight of the second weighted linear combination.
The first weighted linear combination, i.e. LD1, of the plurality of physical acoustic parameters 111, 113 of the target room and the virtual room may comprise a reverberation time parameter for an energy decay by 20 dB, i.e. T20, with a weight having an absolute value in the range from about 0.33 to about 0.53, in particular 0.4382, the reverberation time parameter for an energy decay by 30 dB, i.e. T30, with a weight having an absolute value in the range from about 0.9 to about 1.0, in particular 1.0, and a clarity index parameter for 80 ms, i.e. C80, with a weight having an absolute value in the range from about 0.29 to about 0.49, in particular 0.3958.
The second weighted linear combination, i.e. LD3, of the plurality of physical acoustic parameters 111, 113 of the target room and the virtual room may comprise a reverberation time parameter for an energy decay by 20 dB, i.e. T20, with a weight having an absolute value in the range from about 0.9 to about 1.00, in particular 0.9533, a reverberation time parameter for an energy decay by 30 dB, i.e. T30, with a weight having an absolute value in the range from about 0.85 to about 0.95, in particular 0.8834, the clarity index parameter for 50 ms, i.e. C50, having an absolute value in the range from about 0.9 to about 1.0, in particular 1.0, and a clarity index parameter for 80 ms, i.e. C80, with a weight having an absolute value in the range from about 0.9 to about 1.00, in particular 0.9228.
Figure 9 shows a schematic diagram illustrating an implementation of the perceptual acoustic quality model 150 with a visualization module 901 of the data processing apparatus 100 according to an embodiment.
Similar to the embodiments described above, the data processing apparatus 100 may be configured to implement the module 240 for determining similarity and comparison of the physical acoustic parameters 111 , 113 of the rooms. For visualization, the visualization module 901 may receive the acoustic distance values as described above from the module 240 and the perceptual distance 115 from the perceptual acoustic quality model 150.
The visualization module 901 may allow the data processing apparatus 100 to present or visualize acoustic and perceptual distances (or similarities) 115 between two or more spatial audio signals using the physical acoustic parameters 111 , 113 of the rooms by the display 105. This can be useful for content creation or spatial audio mixing.
Figure 10 shows a graphical user interface 1000 of the display 105 of the data processing apparatus 100 according to an embodiment. The graphical user interface 1000 may be configured to illustrate the perceptual acoustic distance measure value 115 for the selected perceptual attribute between the target room and the virtual room.
As illustrated in figure 10, the perceptual acoustic distance measure value 115 may be illustrated in the form of an arrow in relation to a plurality of further distances 1001a-c to predefined distances for predefined room classes, which may be stored in the memory 103 of the data processing apparatus 100. The plurality of further distances 1001a-c may be graphically illustrated differently from the perceptual acoustic distance measure value 115, for example by a circle.
As further illustrated in figure 10, the perceptual acoustic distance measure value 115 and the plurality of further distances 1001a-c may be illustrated on different classification quadrants of a visualization diagram 1003 of the graphical user interface 1000.
The graphical user interface 1000 may further comprise a display section 1005 for illustrating one or more of the perceptual attributes, i.e. for the estimated perceived spatial audio quality. The one or more of the perceptual attributes may be illustrated in forms of bars, which represent the ratings of the perceptual attributes described above.
For configuring the visualization, the graphical user interface 1000 may further comprise a menu 1007 for configuring visualization settings of the rooms, in particular the target room and the virtual room, the room classes, selection of the physical acoustic parameters 111 , 113 and/or selection of the perceptual attributes.
In the menu 1007, different rooms may be selectable for the target room and for the virtual room which can be used in the rendering system. A user of the display 105 may first select the target room, which could be a good mixing studio or the real listening room, and then click on different virtual rooms. When doing so, the acoustic distance between the two rooms is shown based on two components of the single parameter and multivariate analysis, for example two LDA components. Consequently, the estimated ratings for different quality features may be shown on the right side of the graphical user interface 1000 in the display section 1005.
Figure 11 is a flow diagram illustrating a computer-implemented data processing method 1100 according to an embodiment for estimating the perceptual acoustics of a target room with respect to a perceptual attribute of the target room.
The data processing method 1100 comprises a step 1101 of obtaining a plurality of physical acoustic parameters 111 , 113 for the target room.
The data processing method 1100 further comprises a step 1103 of obtaining the plurality of physical acoustic parameters 111 , 113, i.e. the same parameters as for the target room, for a virtual room, e.g. from database 201a.
The data processing method 1100 further comprises a step 1105 of estimating a perceptual acoustic distance measure value 115 for the selected perceptual attribute between the target room and the virtual room based on the plurality of physical acoustic parameters 111 of the target room and on the plurality of physical acoustic parameters 113 of the virtual room and based on the perceptual acoustic quality model 150, wherein the perceptual acoustic quality model 150 defines a mapping, in particular correlation, a) between the plurality of physical acoustic parameters 111 , 113 and a first value or measure of the perceptual attribute of the target room and b) between the plurality of physical acoustic parameters 111 , 113 and a second value or measure of the perceptual attribute of the virtual room.
The data processing method 1100 can be performed by the data processing apparatus 100 according to an embodiment. Thus, further features of the data processing method 1100 result directly from the functionality of the data processing apparatus 100 as well as its different embodiments described above and below.
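The steps 1101 to 1105, together with the selection of a best-matching virtual room from a database, may be sketched as below; the weight matrix and all parameter vectors are randomly generated, i.e. hypothetical placeholders for measured data and a fitted model:

```python
import numpy as np

def estimate_perceptual_distance(params_target, params_virtual, W, ld_idx):
    """Sketch of steps 1101-1105: project both rooms' parameter vectors
    onto the chosen discriminant components and take their distance."""
    ld_t = np.asarray(params_target) @ W
    ld_v = np.asarray(params_virtual) @ W
    idx = list(ld_idx)
    return float(np.linalg.norm(ld_t[idx] - ld_v[idx]))

def best_matching_room(params_target, database, W, ld_idx=(0, 1, 2)):
    """Pick the virtual room with the smallest perceptual distance."""
    return min(database,
               key=lambda name: estimate_perceptual_distance(
                   params_target, database[name], W, ld_idx))

rng = np.random.default_rng(1)
W = rng.normal(size=(8, 8))                 # placeholder weight matrix
target = rng.normal(size=8)                 # step 1101: target room params
database = {f"room_{k}": rng.normal(size=8) for k in range(5)}
database["room_match"] = target.copy()      # an acoustically identical room
print(best_matching_room(target, database, W))   # -> "room_match"
```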
Advantageously, by assessing perceptual distances (or similarities) between two or more spatial audio signals using the plurality of physical acoustic parameters 111 , 113 of the rooms, perceptually meaningful parameters, i.e. the perceptual acoustic distance measure value 115, can be used for improving the user experience by providing perceptually more similar results when used in binaural auralization for headphone and loudspeakers.
Visualizing acoustic distances and perceptual distances (or similarities) between two or more spatial audio signals using the plurality of physical acoustic parameters 111, 113 can generate knowledge by which the plurality of physical acoustic parameters 111, 113 and parts of the BRIRs may be adapted to achieve an externalized and very plausible perceptual impression.
Estimating perceptual ratings for different quality features, i.e. perceptual attributes using the plurality of physical acoustic parameters 111 , 113 of the rooms makes it possible to evaluate the quality of the 3D audio or audio-AR scene without running subjective listening tests.
Generally, the data processing apparatus 100 and the data processing method 1100 can use perceptually meaningful parameters, i.e. the perceptual acoustic distance measure value 115, instead of physical and/or acoustic parameters which can improve the user experience by providing perceptually more similar results.
The person skilled in the art will understand that the "blocks" ("units") of the various figures (method and apparatus) represent or describe functionalities of embodiments of the present disclosure (rather than necessarily individual "units" in hardware or software) and thus describe equally functions or features of apparatus embodiments as well as method embodiments (unit = step).
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus, and method may be implemented in other manners. For example, the described embodiment of an apparatus is merely exemplary. For example, the unit division is merely logical function division and may be another division in an actual implementation. For example, a plurality of units or components may be combined or integrated into another system, or some features may be ignored or not performed. In addition, the displayed or discussed mutual couplings or direct couplings or communication connections may be implemented by using some interfaces. The indirect couplings or communication connections between the apparatuses or units may be implemented in electronic, mechanical, or other forms.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one position, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
In addition, functional units in the embodiments of the invention may be integrated into one processing unit, or each of the units may exist alone physically, or two or more units are integrated into one unit.

Claims

1. A data processing apparatus (100) for estimating the perceptual acoustics of a target room with respect to a perceptual attribute of the target room, wherein the data processing apparatus (100) is configured to: obtain a plurality of physical acoustic parameters (111) for the target room; obtain the plurality of physical acoustic parameters (113) for a virtual room; and estimate a perceptual acoustic distance (115) between the target room and the virtual room based on the plurality of physical acoustic parameters (111) of the target room and on the plurality of physical acoustic parameters (113) of the virtual room and based on a perceptual acoustic quality model (150), wherein the perceptual acoustic quality model (150) defines a mapping between the plurality of physical acoustic parameters (111 , 113) and a value of the perceptual attribute of the target room and between the plurality of physical acoustic parameters (111 , 113) and a value of the perceptual attribute of the virtual room.
2. The data processing apparatus (100) of claim 1 , wherein the perceptual attribute of the target room and the virtual room comprises or is an envelopment attribute, a coloration attribute, a plausibility attribute, or an externalization attribute.
3. The data processing apparatus (100) of any one of the preceding claims, wherein the data processing apparatus (100) is configured to measure the plurality of physical acoustic parameters (111) for the target room for obtaining the plurality of physical acoustic parameters (111) for the target room.
4. The data processing apparatus (100) of any one of the preceding claims, wherein the data processing apparatus (100) is configured to obtain the plurality of physical acoustic parameters (113) for the virtual room from a database (201a) of physical acoustic parameters (113) for a plurality of virtual rooms.
5. The data processing apparatus (100) of any one of the preceding claims, wherein the plurality of physical acoustic parameters (111, 113) of the target room and the virtual room comprise: an energy decay curve, EDC, parameter; a reverberation time parameter; a definition parameter; a speech transmission index, STI, parameter; a clarity index parameter; a direct-to-reverberant ratio, DRR, parameter; a centre time parameter; an inter-aural cross-correlation, IACC, parameter and/or a late lateral energy parameter.
6. The data processing apparatus (100) of any one of the preceding claims, wherein the perceptual acoustic quality model (150) defines a correlation matrix between the pluralities of physical acoustic parameters (111 , 113) of the target room and the virtual room and the values of the perceptual attribute of the target room and the virtual room for defining the mapping between the plurality of physical acoustic parameters (111 , 113) and the value of the perceptual attribute of the target room and between the plurality of physical acoustic parameters (111 , 113) and the value of the perceptual attribute of the virtual room.
7. The data processing apparatus (100) of claim 6, wherein the perceptual acoustic quality model (150) defines a correlation matrix between weighted linear combinations of the plurality of physical acoustic parameters (111) of the target room and the value of the perceptual attribute of the target room and between weighted linear combinations of the plurality of physical acoustic parameters (113) of the virtual room and the value of the perceptual attribute of the virtual room.
8. The data processing apparatus (100) of claim 7, wherein the perceptual attribute of the target room and the virtual room is a coloration attribute and wherein the weighted linear combination of the plurality of physical acoustic parameters (111 , 113) of the target room and the virtual room comprises a reverberation time parameter for an energy decay by 30 dB having the largest weight.
9. The data processing apparatus (100) of claim 8, wherein the weighted linear combination of the plurality of physical acoustic parameters (111 , 113) of the target room and the virtual room comprises a reverberation time parameter for an energy decay by 20 dB with a weight having an absolute value in the range from about 0.33 to about 0.53, the reverberation time parameter for an energy decay by 30 dB with a weight having an absolute value in the range from about 0.9 to about 1.0 and a clarity index parameter for 80 ms with a weight having an absolute value in the range from about 0.29 to 0.49.
10. The data processing apparatus (100) of claim 7, wherein the perceptual attribute of the target room and the virtual room is an envelopment attribute and wherein the weighted linear combination of the plurality of physical acoustic parameters (111 , 113) of the target room and the virtual room comprises a clarity index parameter for 50 ms having the largest weight.
11. The data processing apparatus (100) of claim 10, wherein the weighted linear combination of the plurality of physical acoustic parameters (111, 113) of the target room and the virtual room comprises a reverberation time parameter for an energy decay by 20 dB with a weight having an absolute value in the range from about 0.9 to about 1.00, a reverberation time parameter for an energy decay by 30 dB with a weight having an absolute value in the range from about 0.85 to about 0.95, the clarity index parameter for 50 ms having an absolute value in the range from about 0.9 to about 1.0, and a clarity index parameter for 80 ms with a weight having an absolute value in the range from about 0.9 to 1.00.
12. The data processing apparatus (100) of claim 7, wherein the perceptual acoustic quality model (150) defines a correlation matrix between a cosine similarity of a first and a second weighted linear combination of the plurality of physical acoustic parameters (111) of the target room and the value of the perceptual attribute of the target room and between a cosine similarity of the first and the second weighted linear combination of the plurality of physical acoustic parameters (113) of the virtual room and the value of the perceptual attribute of the virtual room.
13. The data processing apparatus (100) of claim 12, wherein the perceptual attribute of the target room and the virtual room is a plausibility attribute, wherein the first weighted linear combination of the plurality of physical acoustic parameters (111 , 113) of the target room and the virtual room comprises a reverberation time parameter for an energy decay by 30 dB having the largest weight and wherein the second weighted linear combination of the plurality of physical acoustic parameters (111 , 113) of the target room and the virtual room comprises a clarity index parameter for 50 ms having the largest weight.
14. The data processing apparatus (100) of claim 13, wherein the first weighted linear combination of the plurality of physical acoustic parameters (111 , 113) of the target room and the virtual room comprises a reverberation time parameter for an energy decay by 20 dB with a weight having an absolute value in the range from about 0.33 to about 0.53, the reverberation time parameter for an energy decay by 30 dB with a weight having an absolute value in the range from about 0.9 to about 1.0 and a clarity index parameter for 80 ms with a weight having an absolute value in the range from about 0.29 to 0.49.
15. The data processing apparatus (100) of claim 13 or 14, wherein the second weighted linear combination of the plurality of physical acoustic parameters (111, 113) of the target room and the virtual room comprises a reverberation time parameter for an energy decay by 20 dB with a weight having an absolute value in the range from about 0.9 to about 1.00, a reverberation time parameter for an energy decay by 30 dB with a weight having an absolute value in the range from about 0.85 to about 0.95, the clarity index parameter for 50 ms having an absolute value in the range from about 0.9 to about 1.0, and a clarity index parameter for 80 ms with a weight having an absolute value in the range from about 0.9 to 1.00.
16. The data processing apparatus (100) of any one of the preceding claims, wherein the data processing apparatus (100) is further configured to determine a physical acoustic distance between the target room and the virtual room based on the plurality of physical acoustic parameters (111) of the target room and the plurality of physical acoustic parameters (113) of the virtual room.
17. The data processing apparatus (100) of any one of the preceding claims, wherein the data processing apparatus (100) further comprises a display (105) configured to display a graphical user interface configured to illustrate the perceptual acoustic distance (115) between the target room and the virtual room.
18. The data processing apparatus (100) of any one of the preceding claims, wherein the data processing apparatus (100) is configured to estimate a respective perceptual acoustic distance (115) between the target room and a plurality of virtual rooms and to determine a best-matching virtual room of the plurality of virtual rooms having the smallest perceptual acoustic distance (115) relative to the target room.
19. The data processing apparatus (100) of claim 18, wherein the data processing apparatus (100) is further configured to obtain an impulse response function and/or a transfer function associated with the best-matching virtual room having the smallest perceptual acoustic distance (115) relative to the target room.
20. A data processing method (1100) for estimating the perceptual acoustics of a target room with respect to a perceptual attribute of the target room, wherein the data processing method (1100) comprises: obtaining (1101) a plurality of physical acoustic parameters (111, 113) for the target room; obtaining (1103) the plurality of physical acoustic parameters (111, 113) for a virtual room; and estimating (1105) a perceptual acoustic distance (115) between the target room and the virtual room based on the plurality of physical acoustic parameters (111) of the target room and on the plurality of physical acoustic parameters (113) of the virtual room and based on a perceptual acoustic quality model (150), wherein the perceptual acoustic quality model (150) defines a mapping between the plurality of physical acoustic parameters (111, 113) and a value of the perceptual attribute of the target room and between the plurality of physical acoustic parameters (111, 113) and a value of the perceptual attribute of the virtual room.
21. A computer program product comprising a computer-readable storage medium for storing program code which causes a computer or a processor to perform the method (1100) of claim 20, when the program code is executed by the computer or the processor.
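The mapping recited in claims 12 to 15 and the distance estimation of claim 20 can be sketched numerically. Everything below is an illustrative assumption rather than the claimed implementation: the parameter values are invented, the weight magnitudes are merely picked from inside the ranges recited in claims 14 and 15 (their signs are not specified in the claims), and reducing the perceptual acoustic distance to the gap between the two rooms' cosine-similarity values is just one plausible reading of the correlation defined by the perceptual acoustic quality model.

```python
import math

# Hypothetical per-room physical acoustic parameters: reverberation times
# T20/T30 in seconds and clarity indices C50/C80 in dB (values invented).
target_room  = {"T20": 0.45, "T30": 0.50, "C50": 2.1, "C80": 4.0}
virtual_room = {"T20": 0.48, "T30": 0.55, "C50": 1.8, "C80": 3.6}

# Weight magnitudes chosen from inside the ranges of claims 14 and 15;
# signs and the handling of parameters absent from a combination are assumptions.
W1 = {"T20": 0.43, "T30": 0.95, "C80": 0.39}               # first combination
W2 = {"T20": 0.95, "T30": 0.90, "C50": 0.95, "C80": 0.95}  # second combination

KEYS = ["T20", "T30", "C50", "C80"]

def weighted(params, weights):
    """Weighted parameter vector; a parameter missing from a combination gets weight 0."""
    return [weights.get(k, 0.0) * params[k] for k in KEYS]

def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Per claim 12: cosine similarity of the first and second weighted linear
# combination of the physical acoustic parameters, evaluated for each room.
sim_target = cosine_similarity(weighted(target_room, W1), weighted(target_room, W2))
sim_virtual = cosine_similarity(weighted(virtual_room, W1), weighted(virtual_room, W2))

# One possible perceptual acoustic distance (claim 20): the gap between the
# perceptual-attribute values the quality model derives for the two rooms,
# here crudely approximated by the similarity values themselves.
distance = abs(sim_target - sim_virtual)
print(f"perceptual acoustic distance = {distance:.4f}")
```

In this sketch the quality model's "correlation matrix" is collapsed to an identity mapping from cosine similarity to attribute value; a faithful implementation would instead fit that mapping to listening-test data, as the claims leave the model's internal form open.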
EP22838850.0A 2022-12-21 2022-12-21 Apparatus and method for estimating the perceptual acoustics of a target room Pending EP4635203A1 (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/EP2022/087262 WO2024132129A1 (en) 2022-12-21 2022-12-21 Apparatus and method for estimating the perceptual acoustics of a target room

Publications (1)

Publication Number Publication Date
EP4635203A1

Family

ID=84829623

Family Applications (1)

Application Number Title Priority Date Filing Date
EP22838850.0A Pending EP4635203A1 (en) 2022-12-21 2022-12-21 Apparatus and method for estimating the perceptual acoustics of a target room

Country Status (2)

Country Link
EP (1) EP4635203A1 (en)
WO (1) WO2024132129A1 (en)

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP7208365B2 (en) * 2018-09-18 2023-01-18 ホアウェイ・テクノロジーズ・カンパニー・リミテッド Apparatus and method for adapting virtual 3D audio into a real room

Also Published As

Publication number Publication date
WO2024132129A1 (en) 2024-06-27

Similar Documents

Publication Publication Date Title
Brinkmann et al. A round robin on room acoustical simulation and auralization
Amengual Garí et al. Optimizations of the spatial decomposition method for binaural reproduction
CN111107482B (en) Systems and methods for modifying room characteristics for spatial audio presentation over headphones
US10129681B2 (en) Calibrating listening devices
US10187740B2 (en) Producing headphone driver signals in a digital audio signal processing binaural rendering environment
CN111294724B (en) Spatial repositioning of multiple audio streams
CN112740324B (en) Device and method for adapting virtual 3D audio to a real room
US20230104111A1 (en) Determining a virtual listening environment
Postma et al. The influence of visual distance on the room-acoustic experience of auralizations
Stärz et al. Comparison of binaural auralisations to a real loudspeaker in an audiovisual virtual classroom scenario: Effect of room acoustic simulation, HRTF dataset, and head-mounted display on room acoustic perception
Lladó et al. The impact of head-worn devices in an auditory-aided visual search task
EP4635203A1 (en) Apparatus and method for estimating the perceptual acoustics of a target room
Grimm et al. Virtual acoustic environments for comprehensive evaluation of model-based hearing devices
de Taillez et al. Acoustic and perceptual effects of magnifying interaural difference cues in a simulated “binaural” hearing aid
WO2023208333A1 (en) Devices and methods for binaural audio rendering
CN115705839B (en) Voice playback method, device, computer equipment and storage medium
Alonso-Martínez Improving Binaural Audio Techniques for Augmented Reality
US20250142277A1 (en) Incremental head-related transfer function updates
Surdu et al. A. LI. EN: An audiovisual dataset of different acoustical impulse responses measured in a living room environment
EP4498704A1 (en) Generating an audio data signal
CN119497030A (en) Audio processing method and electronic device
JP2019184933A (en) Multi-channel objective evaluation apparatus and program
Kobayashi et al. Temporal convolutional neural networks to generate a head-related impulse response from one direction to another
McKenzie Exploring perceptual similarities in binaural reverberation
Lübeck et al. Evaluating the Plausibility of Binaural Ambisonics and Parametric Renderings

Legal Events

Date Code Title Description
STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: UNKNOWN

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE INTERNATIONAL PUBLICATION HAS BEEN MADE

PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: REQUEST FOR EXAMINATION WAS MADE

17P Request for examination filed

Effective date: 20250715

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC ME MK MT NL NO PL PT RO RS SE SI SK SM TR