WO2024067543A1 - Reverberation processing method and apparatus, and non-volatile computer-readable storage medium - Google Patents

Reverberation processing method and apparatus, and non-volatile computer-readable storage medium

Info

Publication number
WO2024067543A1
WO2024067543A1 · PCT/CN2023/121368 · CN2023121368W
Authority
WO
WIPO (PCT)
Prior art keywords
scene
reverberation
processing
average
acoustic parameter
Prior art date
Application number
PCT/CN2023/121368
Other languages
English (en)
French (fr)
Inventor
叶煦舟
史俊杰
张正普
柳德荣
刘石磊
黄传增
Original Assignee
抖音视界有限公司 (Douyin Vision Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 抖音视界有限公司
Publication of WO2024067543A1


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00Computer-aided design [CAD]
    • G06F30/10Geometric CAD
    • G06F30/13Architectural design, e.g. computer-aided architectural design [CAAD] related to design of buildings, bridges, landscapes, production plants or roads
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04SSTEREOPHONIC SYSTEMS 
    • H04S7/00Indicating arrangements; Control arrangements, e.g. balance control

Definitions

  • the present disclosure relates to the technical field of audio processing, and in particular to a reverberation processing method, a reverberation processing device, and a non-volatile computer-readable storage medium.
  • a method for processing reverberation, including: estimating shape information of a scene based on multiple intersection points between the scene and a plurality of sound rays centered on a listener; calculating a first average acoustic parameter value of a scene material of the scene based on first acoustic parameter values of the scene material at the locations of the plurality of intersection points; and calculating a reverberation duration based on the shape information of the scene and the first average acoustic parameter value.
  • estimating the shape information of the scene based on the multiple intersection points between the scene and the multiple sound rays centered on the listener includes: calculating the coordinates of the average intersection point as the average of the coordinates of the multiple intersection points; and estimating the shape information of the scene based on the average of the distances between each of the multiple intersection points and the average intersection point.
  • calculating a first average acoustic parameter value of a scene material of a scene based on first acoustic parameter values of the scene material at multiple intersection locations includes calculating an average absorptivity of the scene material of the scene based on an average of the absorptivity of the scene material at multiple intersection locations.
  • the shape of the scene is a cube, and the shape information includes the side length of the cube.
  • the reverberation duration is calculated based on the shape information of the scene and the first average acoustic parameter value, for example, based on the side length of the scene and the average absorption rate of the scene material of the scene.
  • the processing method further includes: calculating a second average acoustic parameter value of the scene material of the scene based on the second acoustic parameter value of the scene material at the locations of multiple intersections; and performing reverberation processing on the sound source signal based on the second average acoustic parameter value and the reverberation duration.
  • calculating the second average acoustic parameter value of the scene material of the scene based on the second acoustic parameter value of the scene material at the locations of multiple intersections includes: calculating the average scattering rate of the scene material of the scene based on the average of the scattering rates of the scene material at the locations of multiple intersections.
  • performing reverberation processing on the sound source signal includes: performing filtering processing on the sound source signal using an all-pass filter, wherein the all-pass filter is controlled according to the second average acoustic parameter value.
  • performing reverberation processing on the sound source signal includes: performing reverberation processing using one or more feedback gains based on a result of filtering processing, wherein the one or more feedback gains are controlled according to the reverberation duration.
  • the one or more feedback gains are multiple feedback gains, and each of the multiple feedback gains is determined according to a corresponding delay duration.
  • performing the reverberation processing using one or more feedback gains includes: delaying the result of filtering processing; processing the result of delay processing using a reflection matrix; and processing the processing result of the reflection matrix using one or more feedback gains.
  • delaying the result of the filtering process includes: using multiple delay durations to delay the result of the filtering process respectively.
  • delaying the result of the filtering process includes: delaying the sum of the result of the filtering process and the result of processing using one or more feedback gains.
  • a reverberation processing device, including: an estimation unit, used to estimate shape information of a scene based on multiple intersection points between the scene and multiple sound rays centered on a listener; and a calculation unit, used to calculate a first average acoustic parameter value of the scene material of the scene based on first acoustic parameter values of the scene material at the locations of the multiple intersection points, and to calculate the reverberation duration based on the shape information of the scene and the first average acoustic parameter value.
  • the estimation unit calculates the coordinates of the average intersection point according to the average of the coordinates of the plurality of intersection points, and estimates the shape information of the scene according to the average of the distances between each of the plurality of intersection points and the average intersection point.
  • the calculation unit calculates the average absorption rate of the scene material of the scene according to the average of the absorption rates of the scene materials at the locations of the plurality of intersection points.
  • the shape of the scene is a cube, and the shape information includes the side length of the cube.
  • the calculation unit calculates the reverberation duration according to the side length of the scene and the average absorption rate of the scene material of the scene.
  • the calculation unit calculates a second average acoustic parameter value of the scene material of the scene based on the second acoustic parameter value of the scene material at the locations of multiple intersections;
  • the processing device also includes: a processing unit, which is used to perform reverberation processing on the sound source signal based on the second average acoustic parameter value and the reverberation duration.
  • the calculation unit calculates an average scattering rate of the scene material of the scene according to an average of the scattering rates of the scene material at the locations of the plurality of intersection points.
  • the processing unit performs filtering processing on the sound source signal using an all-pass filter, and the all-pass filter is controlled according to the second average acoustic parameter value.
  • the processing unit performs reverberation processing using one or more feedback gains based on the result of the filtering process, and the one or more feedback gains are controlled according to the reverberation duration.
  • the one or more feedback gains are multiple feedback gains, and each of the multiple feedback gains is determined according to a corresponding delay duration.
  • the processing unit performs delay processing on the result of the filtering processing; processes the result of the delay processing using a reflection matrix; and processes the processing result of the reflection matrix using one or more feedback gains.
  • the processing unit uses multiple delay durations to perform delay processing on the results of the filtering processing respectively.
  • the processing unit performs delay processing on the sum of the result of the filtering process and the result of the processing using one or more feedback gains.
  • a reverberation processing device comprising: a memory; and a processor coupled to the memory, wherein the processor is configured to execute the reverberation processing method in any one of the above embodiments based on instructions stored in the memory device.
  • a computer-readable storage medium on which a computer program is stored, and when the program is executed by a processor, the method for processing reverberation described in any of the above embodiments is implemented.
  • a computer program comprising: instructions, which, when executed by a processor, enable the processor to execute the method for processing reverberation according to any one of the above embodiments.
  • a computer program product comprising instructions, which, when executed by a processor, enable the processor to execute the method for processing reverberation according to any one of the above embodiments.
  • FIG1 is a flowchart showing a method for processing reverberation according to some embodiments of the present disclosure
  • FIG2 is a schematic diagram showing a spatial audio rendering system framework according to some embodiments of the present disclosure
  • 3a to 3c are schematic diagrams showing methods for processing reverberation according to some embodiments of the present disclosure
  • FIG4 shows a block diagram of a reverberation processing apparatus according to some embodiments of the present disclosure
  • FIG5 shows a block diagram of a reverberation processing device according to some other embodiments of the present disclosure
  • FIG. 6 shows a block diagram of a device for processing reverberation according to yet other embodiments of the present disclosure.
  • All sounds in the real world are spatial audio. Sound comes from the vibration of objects and is heard after being transmitted through a medium.
  • vibrating objects can appear anywhere, and they form a three-dimensional direction vector with the human head.
  • the horizontal angle of the direction vector affects the loudness difference, time difference, and phase difference of the sound reaching the two ears;
  • the vertical angle of the direction vector also affects the frequency response of the sound reaching the ears. It is precisely based on this physical information that humans, through a lot of unconscious training, have acquired the ability to judge the location of the sound source based on binaural sound signals.
  • HRTF: Head-Related Transfer Function
  • FIR: Finite Impulse Response
  • an HRTF can only represent the relative position relationship between one fixed sound source and one listener.
  • for N sound sources, N HRTFs are theoretically required, resulting in 2N convolutions on the N original signals; and when the listener rotates, all N HRTFs need to be updated to correctly render the virtual spatial audio scene, resulting in a large amount of calculation.
  • spherical harmonics are applied to spatial audio rendering.
  • the basic idea of spherical harmonics is to imagine that the sound is distributed on a sphere, with N signal channels pointing in different directions, each responsible for the sound in its corresponding direction.
  • the spatial audio rendering algorithm based on ambisonics is as follows:
  • step 1 the sampling points in each ambisonics channel are set to 0;
  • step 2 the weight value of each ambisonics channel is calculated using the horizontal angle and elevation angle of the sound source relative to the listener;
  • step 3 the original signal is multiplied by the weight value of each ambisonics channel, and the weighted signal is superimposed on each channel;
  • step 4 repeat step 3 for all sound sources in the scene
  • step 5 all sampling points of the binaural output signal are set to 0;
  • step 6 each ambisonics channel signal is convolved with the HRTF in the corresponding direction of the channel, and the convolved signal is superimposed on the binaural output signal;
  • step 7 repeat step 6 for all ambisonics channels.
  • the number of convolutions is only related to the number of ambisonics channels, not the number of sound sources, and encoding sound sources into ambisonics is much faster than convolution.
  • the listener rotates, all ambisonics channels can be rotated, and the amount of calculation is also independent of the number of sound sources.
  • it can also be simply rendered to the speaker array.
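As an illustrative sketch only (the patent gives no code), steps 1 to 4 of the ambisonics encoding loop above can be written for first-order ambisonics as follows; the four-channel B-format layout and the 1/√2 weight on the W channel follow one common convention and are an assumption here:

```python
import math

def foa_encode(sample, azimuth, elevation):
    """Encode one mono sample into first-order ambisonics (B-format) channels."""
    w = sample * (1 / math.sqrt(2))                       # omnidirectional channel
    x = sample * math.cos(azimuth) * math.cos(elevation)  # front-back
    y = sample * math.sin(azimuth) * math.cos(elevation)  # left-right
    z = sample * math.sin(elevation)                      # up-down
    return (w, x, y, z)

def mix_sources(sources):
    """Steps 1-4: clear the channels, weight each source by its direction,
    and superimpose the weighted signals onto the ambisonics channels."""
    channels = [0.0, 0.0, 0.0, 0.0]  # step 1: sampling points set to 0
    for sample, azimuth, elevation in sources:
        weighted = foa_encode(sample, azimuth, elevation)  # step 2: channel weights
        for i, value in enumerate(weighted):
            channels[i] += value                            # step 3: superimpose
    return channels  # step 4 is the loop over all sources above
```

The HRTF convolutions of steps 5 to 7 then run once per ambisonics channel, which is why the convolution count is independent of the number of sound sources.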
  • Humans may not have the keen hearing of bats, but they can still obtain a lot of information by listening to how the environment shapes a sound source. For example, when listening to a singer, the different reverberation durations make it clear whether you are hearing the song in a large church or in a parking lot; the different ratios of reverberation to direct sound make it clear, even within the same church, whether you are standing 1 meter or 20 meters in front of the singer. Likewise, in the church scene, the different loudness of the early reflections makes it clear whether the singer is in the center of the church or only 10 centimeters from a wall.
  • the wave solver based on finite element analysis (wave physics simulation) divides the space to be calculated into densely packed cubes, called "voxels" (similar to the concept of pixels, but pixels are extremely small area units on a two-dimensional plane, and voxels are extremely small volume units in three-dimensional space).
  • the basic process of the algorithm is as follows:
  • step 1 a pulse is excited from a voxel at the location of the sound source in the virtual scene
  • step 2 at the next time segment, the impulses of all neighboring voxels of the voxel are calculated according to the voxel size and whether the neighboring voxels contain the scene shape;
  • step 3 step 2 is repeated multiple times to calculate the sound wave field in the scene; the more repetitions, the more accurate the calculation;
  • step 4 the array of all historical amplitudes at the voxel of the listener's position is taken as the impulse response from the sound source to the listener in the current scene;
  • step 5 repeat steps 1 to 4 for all sound sources in the scene.
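A toy one-dimensional version of this update loop illustrates steps 1 to 4. The real solver is three-dimensional and consults the scene geometry at each voxel, so everything below (grid size, rigid boundaries, Courant number of 1) is a simplifying assumption:

```python
def simulate_1d(n_voxels, n_steps, src, listener, courant=1.0):
    """Toy 1-D voxel wave solver: each step updates every voxel's pressure
    from its neighbours with a leapfrog finite-difference scheme."""
    prev = [0.0] * n_voxels
    curr = [0.0] * n_voxels
    curr[src] = 1.0                        # step 1: excite an impulse at the source voxel
    impulse_response = [curr[listener]]
    for _ in range(n_steps):               # steps 2-3: propagate the field repeatedly
        nxt = [0.0] * n_voxels
        for i in range(1, n_voxels - 1):   # fixed ends stand in for scene boundaries
            laplacian = curr[i - 1] - 2 * curr[i] + curr[i + 1]
            nxt[i] = 2 * curr[i] - prev[i] + courant ** 2 * laplacian
        prev, curr = curr, nxt
        impulse_response.append(curr[listener])  # step 4: record listener amplitude
    return impulse_response
```

With a Courant number of 1 the pulse front travels exactly one voxel per time step, so a listener 5 voxels from the source first hears the impulse at step 5.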
  • the temporal and spatial accuracy is very high. As long as the voxels are small enough and the time slice is short enough, it can be adapted to scenes of any shape and material.
  • this method cannot correctly reflect changes in the acoustic characteristics of the scene when the scene undergoes unconsidered changes during pre-rendering, because the corresponding rendering parameters are not saved.
  • the core idea of the ray tracing algorithm is to find as many sound propagation paths from the sound source to the listener as possible, so as to obtain the energy direction, delay, and filtering characteristics brought by the path.
  • This type of algorithm is the core of the room acoustic simulation systems of Oculus and Wwise.
  • step 1 a number of rays evenly distributed on the sphere are radiated into space with the listener position as the origin;
  • step 2 for each ray:
  • the current path is recorded as the effective path of the sound source and saved;
  • the ray direction will be changed according to the preset material information of the triangle where the intersection point is located, and it will continue to be emitted in the scene;
  • Repeat steps a and b until the number of reflections of the ray reaches the preset maximum reflection depth, then return to step 2 and perform steps a and b for the next initial ray direction.
  • each sound source has recorded some path information. Then use this information to calculate the energy direction, delay, and filtering characteristics of each path of each sound source. This information is collectively called the spatial impulse response between the sound source and the listener.
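The ray loop of steps 1 and 2 can be sketched as follows; `intersect` and `hits_source` stand in for real scene queries (triangle intersection and source detection) and are hypothetical placeholders, not part of the patent:

```python
def trace_paths(listener, rays, max_depth, intersect, hits_source):
    """Listener-centred ray tracing sketch.
    intersect(origin, direction) -> (hit_point, new_direction) or None;
    hits_source(origin, direction) -> True if the segment reaches a sound source.
    Both are placeholders for real scene geometry queries."""
    paths = []
    for direction in rays:                       # step 1: rays radiated from the listener
        origin, path = listener, [listener]
        for _ in range(max_depth):               # bounce until the maximum reflection depth
            if hits_source(origin, direction):
                paths.append(path + ["source"])  # step a: save the effective path
            hit = intersect(origin, direction)
            if hit is None:
                break
            origin, direction = hit              # step b: reflect per the material info
            path = path + [origin]
    return paths
```

Each saved path then yields the energy direction, delay, and filtering characteristics that make up the spatial impulse response.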
  • the accuracy of the algorithm depends heavily on the number of sampled initial ray directions, that is, on using more rays; however, since the complexity of the ray tracing algorithm is O(n log n), more rays inevitably lead to an explosive increase in the amount of calculation;
  • the idea of the algorithm to simplify the geometry of the environment is to try to find an approximate but much simpler geometry and surface material given the geometry and surface material of the current scene, thereby greatly reducing the amount of calculation for the environmental acoustic simulation.
  • the algorithm to simplify the geometry of the environment includes:
  • step 1 during the pre-rendering phase, a cubic room shape is estimated
  • step 2 the geometric characteristics of the cube are used, and it is assumed that the sound source and the listener are in the same area.
  • the direct sound and early reflections from the sound source to the listener in the scene are quickly calculated using a table lookup method.
  • step 3 in the pre-rendering stage, the duration of the late reverberation in the current scene is calculated using the empirical formula of the reverberation duration of the cubic room, thereby controlling an artificial reverberation to simulate the late reverberation effect of the scene.
  • the approximate shape of the scene is calculated in the pre-rendering stage and cannot adapt to dynamically changing scenes (opening doors, material changes, roofs being blown away, etc.);
  • the present disclosure renders the impact of dynamically changing scenes on ambient sound without significantly affecting the rendering speed, so that devices with relatively weak computing power can also simulate dynamic ambient sound of a large number of sound sources in real time.
  • the efficiency and accuracy of sound rendering can be improved.
  • FIG. 1 shows a flow chart of a method for processing reverberation according to some embodiments of the present disclosure.
  • step 110 estimate the shape information of the scene according to multiple intersection points between the scene and multiple sound rays centered on the listener.
  • the coordinates of the average intersection point are calculated as the average of the coordinates of the plurality of intersection points; and the shape information of the scene is estimated according to the average of the distances between each of the plurality of intersection points and the average intersection point.
  • the shape of the scene is a cube, and the shape information includes the side length of the cube.
  • the shape of the scene may also be other shapes such as a cuboid.
  • N sound rays are randomly and evenly scattered in all directions, and the N intersection points P_n, n ∈ [1, N], between these rays and the scene are obtained.
  • the coordinates of the average intersection point are calculated as P̄ = (1/N) Σ_{n=1}^{N} P_n.
  • to estimate the shape information of the approximate cubic room, the average distance from all intersection points P_n to the average intersection point P̄ is calculated as d̄ = (1/N) Σ_{n=1}^{N} ||P_n - P̄||.
  • the side length of the cubic room is then estimated from this average distance d̄.
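A minimal sketch of this estimate; taking the side length as twice the average distance d̄ is an assumption for illustration, since the exact proportionality is not preserved in this text:

```python
import math

def estimate_cube(points):
    """Estimate the approximate cubic room from ray/scene intersection points.
    Returns (average intersection point, mean distance to it, estimated side
    length). Side length = 2 * mean distance is an assumed convention."""
    n = len(points)
    centroid = tuple(sum(p[i] for p in points) / n for i in range(3))
    mean_dist = sum(math.dist(p, centroid) for p in points) / n
    return centroid, mean_dist, 2 * mean_dist
```

For intersection points lying on the faces of a cube of side 2 centred on the listener, this recovers a side length of 2.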
  • a first average acoustic parameter value of the scene material of the scene is calculated based on the first acoustic parameter values of the scene material at the locations of the plurality of intersection points.
  • the average absorption rate of the scene material of the scene is calculated according to the average of the absorption rates of the scene materials at the locations of the plurality of intersection points.
  • the second average acoustic parameter value of the scene material of the scene is calculated based on the second acoustic parameter values of the scene material at the locations of the plurality of intersections.
  • the average material scattering rate of the scene is calculated based on the average of the material scattering rates at the locations of the plurality of intersections.
  • the average acoustic parameters of the scene material are calculated; assume that at each intersection point P_n of the N sound rays with the scene, the absorption rate of the scene material is A_n and the scattering rate is S_n.
  • the average absorption rate is: Ā = (1/N) Σ_{n=1}^{N} A_n.
  • the average scattering rate is: S̄ = (1/N) Σ_{n=1}^{N} S_n.
  • the reverberation duration is calculated according to the shape information of the scene and the first average acoustic parameter value. For example, the reverberation duration is calculated according to the side length of the scene and the average absorption rate of the scene material of the scene.
  • reverberation processing is performed on the sound source signal according to the second average acoustic parameter value and the reverberation duration.
  • S is the indoor surface area of the cubic room and V is its net volume.
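The empirical reverberation-duration formula itself is not reproduced above. Sabine's formula, T60 = 0.161·V/(S·Ā), is one classic choice that uses exactly these quantities S and V, so the following is a sketch under that assumption rather than the patent's stated formula:

```python
def reverb_time_sabine(side_length, avg_absorption):
    """Late-reverb T60 for the estimated cubic room via Sabine's empirical
    formula (assumed choice): T60 = 0.161 * V / (S * avg_absorption),
    with S = 6 a^2 the indoor surface area and V = a^3 the net volume."""
    surface = 6 * side_length ** 2
    volume = side_length ** 3
    return 0.161 * volume / (surface * avg_absorption)
```

For a cube this reduces to T60 = 0.161·a / (6·Ā), so the reverberation duration grows linearly with the estimated side length.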
  • reverberation processing is performed on the sound source signal according to the reverberation duration.
  • calculating the second average acoustic parameter value of the scene material of the scene based on the second acoustic parameter value of the scene material at the locations of multiple intersections includes: calculating the average scattering rate of the scene material of the scene based on the average of the scattering rates of the scene material at the locations of multiple intersections.
  • performing reverberation processing on the sound source signal includes: performing filtering processing on the sound source signal using an all-pass filter, wherein the all-pass filter is controlled according to the second average acoustic parameter value.
  • reverberation processing is performed using one or more feedback gains, and the one or more feedback gains are controlled according to the reverberation duration.
  • the one or more feedback gains are multiple feedback gains, and each of the multiple feedback gains is determined according to a corresponding delay duration.
  • the 16 feedback gains are controlled by the reverberation time T60 as follows:
  • delay(n) is the delay duration corresponding to the n-th feedback gain.
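The mapping from T60 to each feedback gain is not spelled out in this text. The conventional feedback-delay-network rule, assumed here, chooses each gain so its delay loop decays by 60 dB over T60 seconds, i.e. g(n) = 10^(-3 · delay(n) / T60):

```python
def feedback_gain(delay_seconds, t60):
    """Conventional FDN gain rule (an assumption, not quoted from the patent):
    after T60 seconds of circulation the loop has decayed by 60 dB."""
    return 10.0 ** (-3.0 * delay_seconds / t60)
```

Longer delay lines thus receive smaller gains, so all 16 loops decay at the same rate set by the reverberation duration.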
  • the result of the filtering process is delayed; the result of the delay process is processed using a reflection matrix; and the result of the reflection matrix processing is processed using one or more feedback gains.
  • the second average acoustic parameter of the scene material includes a scattering rate of the scene material; and the Allpass filter is set according to the scattering rate of the scene material.
  • multiple delay durations are used to delay the results of the filtering process respectively.
  • a delay is performed on the sum of the result of the filtering process and the result of the processing using one or more feedback gains.
  • FIG. 2 shows a schematic diagram of a spatial audio rendering system framework according to some embodiments of the present disclosure.
  • the scene shape (3D mesh), listener position, wall absorption rate (e.g., of the scene surface at each intersection point), wall scattering rate, and distance attenuation information are obtained from the scene element in Figure 2.
  • the binaural playback signal in FIG2 may be generated by the following embodiments.
  • 3a to 3c are schematic diagrams showing methods for processing reverberation according to some embodiments of the present disclosure.
  • the rendering processing framework includes: treating the sound emitted by the sound source as a number of sound rays radiated from it; simulating physical phenomena such as air propagation, reflection, and refraction in the scene, and collecting the intensities of the sound rays that finally converge at the listener's ears; accounting for the wall absorption rate and distance attenuation during ray propagation, and estimating a computational model of the scene from the ray intensities; using the generated reverberation model to perform reverberation processing on the audio data of the sound source; and mixing the reverberation data with the spatialized binaural data to output the final binaural audio.
  • input information for calculating the reverberation model is the shape of the scene, the absorptivity of the scene, the scattering rate of the scene, and the position of the listener.
  • N rays are randomly and evenly scattered in all directions, and the N intersection points P_n, n ∈ [1, N], between these rays and the scene are obtained.
  • the coordinates of the average intersection point are calculated as P̄ = (1/N) Σ_{n=1}^{N} P_n.
  • to estimate the shape information of the approximate cubic room, the average distance from all intersection points P_n to the average intersection point P̄ is calculated as d̄ = (1/N) Σ_{n=1}^{N} ||P_n - P̄||.
  • the average absorption rate is: Ā = (1/N) Σ_{n=1}^{N} A_n.
  • the average scattering rate is: S̄ = (1/N) Σ_{n=1}^{N} S_n.
  • a reverberation processing chain is proposed that dynamically adjusts, at run time, the reverberation duration and the time-domain density of the reverberant sound, allowing the continuously recalculated sound field parameters to dynamically affect the reverberation that is heard, thereby achieving scene-dependent dynamic reverberation.
  • Dynamic reverberation can produce different reverberation effects following the listener's 6DoF (six degrees of freedom) movement.
  • the input signal of the dynamic reverberator is the original signal of the sound source or the original sound source signal after being processed by one or more of the following effects: loudness attenuation, air absorption filtering, delay effect processing or spatialization algorithm.
  • the "Allpass × 3" filter uses three cascaded all-pass filters (such as Schroeder allpass filters).
  • the structure of the Schroeder allpass filter is shown in Figure 3c.
  • the delay times are between 5 and 10 ms and are prime numbers.
  • the parameter g of the Schroeder allpass filter is controlled by the average scattering rate:
  • the coefficient 0.3 can also be replaced with other values as needed.
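A direct-form Schroeder allpass with feedback/feedforward coefficient g can be sketched as follows; driving g from the average scattering rate (e.g., via the 0.3 coefficient mentioned above) is left to the caller, since the exact control law is not reproduced in this text:

```python
def schroeder_allpass(signal, delay, g):
    """Direct-form Schroeder allpass: y[n] = -g*x[n] + x[n-d] + g*y[n-d].
    `delay` is in samples; `g` would be derived from the average scattering
    rate as described above."""
    x_hist = [0.0] * delay  # delayed input samples x[n-d]
    y_hist = [0.0] * delay  # delayed output samples y[n-d]
    out = []
    for x in signal:
        x_d = x_hist.pop(0)
        y_d = y_hist.pop(0)
        y = -g * x + x_d + g * y_d
        x_hist.append(x)
        y_hist.append(y)
        out.append(y)
    return out
```

The filter passes all frequencies at unit magnitude while smearing the phase, which increases echo density without colouring the spectrum.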
  • the delay times of delay 0 to delay 15 are between 30 and 50 ms and are mutually prime.
  • the reflection matrix is a 16 × 16 Householder matrix.
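One standard lossless choice for such a feedback matrix, assumed here as the concrete form, is the Householder reflection about the all-ones vector, A = I - (2/N)·J where J is the all-ones matrix:

```python
def householder(n):
    """n x n Householder reflection about the all-ones vector:
    A = I - (2/n) * ones * ones^T. Orthogonal, hence energy-preserving,
    which makes it a common lossless FDN feedback matrix."""
    return [
        [(1.0 if i == j else 0.0) - 2.0 / n for j in range(n)]
        for i in range(n)
    ]
```

Because the matrix is orthogonal, the reflection stage mixes the 16 delay-line outputs without adding or removing energy; only the feedback gains control the decay.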
  • g0 to g15 are 16 feedback gains, which are controlled by the reverberation time T60.
  • delay(n) is the delay time corresponding to the nth feedback gain.
  • the reverberation model in the dynamically changing scene is calculated by estimating the simplified shape of the room in real time; the dynamically adjustable artificial reverberation is controlled by the reverberation model.
  • the impact of the dynamically changing scene on the ambient sound can be rendered without significantly affecting the rendering speed, so that devices with relatively weak computing power can also simulate the dynamic ambient sound of a large number of sound sources in real time. Therefore, the efficiency and accuracy of sound rendering can be improved.
  • FIG. 4 shows a block diagram of a reverberation processing apparatus according to some embodiments of the present disclosure.
  • the reverberation processing device 4 includes: an estimation unit 41, which is used to estimate the shape information of the scene based on multiple intersection points between the scene and multiple sound rays centered on the listener; and a calculation unit 42, which is used to calculate a first average acoustic parameter value of the scene material of the scene based on first acoustic parameter values of the scene material at the locations of the multiple intersection points, and to calculate the reverberation duration based on the shape information of the scene and the first average acoustic parameter value.
  • the estimation unit 41 calculates the coordinates of the average intersection point according to the average of the coordinates of the plurality of intersection points, and estimates the shape information of the scene according to the average of the distances between each of the plurality of intersection points and the average intersection point.
  • the calculation unit 42 calculates the average absorption rate of the scene material of the scene according to the average of the absorption rates of the scene material at the locations of the plurality of intersection points.
  • the shape of the scene is a cube, and the shape information includes the side length of the cube.
  • the calculation unit 42 calculates the reverberation duration according to the side length of the scene and the average absorption rate of the scene material of the scene.
  • the calculation unit 42 calculates the second average acoustic parameter value of the scene material of the scene based on the second acoustic parameter value of the scene material at the locations of multiple intersections; the processing device 4 also includes: a processing unit 43, which is used to perform reverberation processing on the sound source signal based on the second average acoustic parameter value and the reverberation duration.
  • the calculation unit 42 calculates the average scattering rate of the scene material of the scene according to the average of the scattering rates of the scene materials at the locations of the plurality of intersection points.
  • the processing unit 43 performs filtering processing on the sound source signal using an all-pass filter, and the all-pass filter is controlled according to the second average acoustic parameter value.
  • the processing unit 43 performs reverberation processing using one or more feedback gains based on the result of the filtering processing, and the one or more feedback gains are controlled according to the reverberation duration.
  • the one or more feedback gains are multiple feedback gains, and each of the multiple feedback gains is determined according to a corresponding delay duration.
  • the processing unit 43 performs delay processing on the result of the filtering processing, processes the result of the delay processing using a reflection matrix, and processes the processing result of the reflection matrix using one or more feedback gains.
  • the processing unit 43 uses multiple delay durations to perform delay processing on the results of the filtering processing respectively.
  • the processing unit 43 performs delay processing on the sum of the result of the filtering process and the result of the processing using one or more feedback gains.
  • FIG. 5 shows a block diagram of a reverberation processing apparatus according to some other embodiments of the present disclosure.
  • the reverberation processing device 5 of this embodiment includes: a memory 51 and a processor 52 coupled to the memory 51 , and the processor 52 is configured to execute the reverberation processing method in any one of the embodiments of the present disclosure based on instructions stored in the memory 51 .
  • the memory 51 may include, for example, a system memory, a fixed non-volatile storage medium, etc.
  • the system memory may store, for example, an operating system, an application program, a boot loader, a database, and other programs.
  • FIG. 6 shows a block diagram of a device for processing reverberation according to yet other embodiments of the present disclosure.
  • the reverberation processing device 6 of this embodiment includes: a memory 610 and a processor 620 coupled to the memory 610, wherein the processor 620 is configured to execute, based on instructions stored in the memory 610, the method for processing reverberation in any one of the foregoing embodiments.
  • the memory 610 may include, for example, a system memory, a fixed non-volatile storage medium, etc.
  • the system memory may store, for example, an operating system, an application program, a boot loader, and other programs.
  • the reverberation processing device 6 may further include an input/output interface 630, a network interface 640, a storage interface 650, etc. These interfaces 630, 640, 650, the memory 610, and the processor 620 may be connected, for example, via a bus 860.
  • the input/output interface 630 provides a connection interface for input/output devices such as a display, a mouse, a keyboard, a touch screen, a microphone, and a speaker.
  • the network interface 640 provides a connection interface for various networked devices.
  • the storage interface 650 provides a connection interface for external storage devices such as an SD card and a USB flash drive.
  • the embodiments of the present disclosure may be provided as methods, systems, or computer program products. Therefore, the present disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. Moreover, the present disclosure may take the form of a computer program product implemented on one or more computer-usable non-transitory storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.
  • the method and system of the present disclosure may be implemented in many ways.
  • the method and system of the present disclosure may be implemented by software, hardware, firmware, or any combination of software, hardware, and firmware.
  • the above order of steps for the method is for illustration only, and the steps of the method of the present disclosure are not limited to the order specifically described above, unless otherwise specifically stated.
  • the present disclosure may also be implemented as a program recorded in a recording medium, which includes machine-readable instructions for implementing the method according to the present disclosure. Therefore, the present disclosure also covers a recording medium storing a program for executing the method according to the present disclosure.


Abstract

A reverberation processing method and device, and a non-volatile computer-readable storage medium. The processing method comprises: estimating shape information of a scene according to multiple intersection points between multiple sound rays centred on a listener and the scene (110); calculating a first average acoustic parameter value of scene materials of the scene according to first acoustic parameter values of the scene materials at the positions of the multiple intersection points (120); and calculating a reverberation duration according to the shape information of the scene and the first average acoustic parameter value (130).

Description

Reverberation processing method and device, and non-volatile computer-readable storage medium
Cross-reference to related applications
This application is based on, and claims priority to, the application with PCT application number PCT/CN2022/123290 filed on September 30, 2022, the disclosure of which is hereby incorporated into this application in its entirety.
Technical field
The present disclosure relates to the technical field of audio processing, and in particular to a reverberation processing method, a reverberation processing device, and a non-volatile computer-readable storage medium.
Background
Environmental acoustic phenomena are ubiquitous in reality. Therefore, in an immersive virtual environment, in order to simulate as much as possible the various information the real world gives a person, the influence of the virtual scene on the sounds in the scene needs to be simulated with high quality, so as not to break the user's sense of immersion.
In the related art, there are three main categories of methods for simulating environmental acoustic phenomena: wave solvers based on finite element analysis, ray tracing, and simplification of the environment's geometry.
Summary
According to some embodiments of the present disclosure, a reverberation processing method is provided, comprising: estimating shape information of a scene according to multiple intersection points between multiple sound rays centred on a listener and the scene; calculating a first average acoustic parameter value of scene materials of the scene according to first acoustic parameter values of the scene materials at the positions of the multiple intersection points; and calculating a reverberation duration according to the shape information of the scene and the first average acoustic parameter value.
In some embodiments, estimating the shape information of the scene according to the multiple intersection points between the multiple sound rays centred on the listener and the scene comprises: calculating the coordinates of an average intersection point according to the mean of the coordinates of the multiple intersection points; and estimating the shape information of the scene according to the mean of the distances between each of the multiple intersection points and the average intersection point.
In some embodiments, calculating the first average acoustic parameter value of the scene materials of the scene according to the first acoustic parameter values of the scene materials at the positions of the multiple intersection points comprises: calculating the average absorption rate of the scene materials of the scene according to the mean of the absorption rates of the scene materials at the positions of the multiple intersection points.
In some embodiments, the shape of the scene is a cube, and the shape information includes the edge length of the cube.
In some embodiments, calculating the reverberation duration according to the shape information of the scene and the first average acoustic parameter value comprises: calculating the reverberation duration according to the edge length of the scene and the average absorption rate of the scene materials of the scene.
In some embodiments, the processing method further comprises: calculating a second average acoustic parameter value of the scene materials of the scene according to second acoustic parameter values of the scene materials at the positions of the multiple intersection points; and performing reverberation processing on a sound source signal according to the second average acoustic parameter value and the reverberation duration.
In some embodiments, calculating the second average acoustic parameter value of the scene materials of the scene according to the second acoustic parameter values of the scene materials at the positions of the multiple intersection points comprises: calculating the average scattering rate of the scene materials of the scene according to the mean of the scattering rates of the scene materials at the positions of the multiple intersection points.
In some embodiments, performing reverberation processing on the sound source signal comprises: filtering the sound source signal with an all-pass filter, the all-pass filter being controlled according to the second average acoustic parameter value.
In some embodiments, performing reverberation processing on the sound source signal comprises: performing the reverberation processing using one or more feedback gains based on the result of the filtering, the one or more feedback gains being controlled according to the reverberation duration.
In some embodiments, the one or more feedback gains are multiple feedback gains, each of which is determined according to a corresponding delay duration.
In some embodiments, performing the reverberation processing using the one or more feedback gains based on the result of the filtering comprises: performing delay processing on the result of the filtering; processing the result of the delay processing with a reflection matrix; and processing the processing result of the reflection matrix with the one or more feedback gains.
In some embodiments, performing delay processing on the result of the filtering comprises: performing delay processing on the results of the filtering separately using multiple delay durations.
In some embodiments, performing delay processing on the result of the filtering comprises: performing delay processing on the sum of the result of the filtering and the result of the processing using the one or more feedback gains.
According to other embodiments of the present disclosure, a reverberation processing device is provided, comprising: an estimation unit for estimating shape information of a scene according to multiple intersection points between multiple sound rays centred on a listener and the scene; and a calculation unit for calculating a first average acoustic parameter value of the scene materials of the scene according to first acoustic parameter values of the scene materials at the positions of the multiple intersection points, and calculating a reverberation duration according to the shape information of the scene and the first average acoustic parameter value.
In some embodiments, the estimation unit calculates the coordinates of an average intersection point according to the mean of the coordinates of the multiple intersection points, and estimates the shape information of the scene according to the mean of the distances between each of the multiple intersection points and the average intersection point.
In some embodiments, the calculation unit calculates the average absorption rate of the scene materials of the scene according to the mean of the absorption rates of the scene materials at the positions of the multiple intersection points.
In some embodiments, the shape of the scene is a cube, and the shape information includes the edge length of the cube.
In some embodiments, the calculation unit calculates the reverberation duration according to the edge length of the scene and the average absorption rate of the scene materials of the scene.
In some embodiments, the calculation unit calculates a second average acoustic parameter value of the scene materials of the scene according to second acoustic parameter values of the scene materials at the positions of the multiple intersection points; and the processing device further comprises: a processing unit for performing reverberation processing on a sound source signal according to the second average acoustic parameter value and the reverberation duration.
In some embodiments, the calculation unit calculates the average scattering rate of the scene materials of the scene according to the mean of the scattering rates of the scene materials at the positions of the multiple intersection points.
In some embodiments, the processing unit filters the sound source signal with an all-pass filter, the all-pass filter being controlled according to the second average acoustic parameter value.
In some embodiments, the processing unit performs the reverberation processing using one or more feedback gains based on the result of the filtering, the one or more feedback gains being controlled according to the reverberation duration.
In some embodiments, the one or more feedback gains are multiple feedback gains, each of which is determined according to a corresponding delay duration.
In some embodiments, the processing unit performs delay processing on the result of the filtering, processes the result of the delay processing with a reflection matrix, and processes the processing result of the reflection matrix with the one or more feedback gains.
In some embodiments, the processing unit performs delay processing on the results of the filtering separately using multiple delay durations.
In some embodiments, the processing unit performs delay processing on the sum of the result of the filtering and the result of the processing using the one or more feedback gains.
According to still other embodiments of the present disclosure, a reverberation processing device is provided, comprising: a memory; and a processor coupled to the memory, the processor being configured to execute, based on instructions stored in the memory, the reverberation processing method in any one of the above embodiments.
According to yet other embodiments of the present disclosure, a computer-readable storage medium is provided, on which a computer program is stored, and the program, when executed by a processor, implements the reverberation processing method described in any one of the above embodiments.
According to yet other embodiments of the present disclosure, a computer program is further provided, comprising instructions which, when executed by a processor, cause the processor to execute the reverberation processing method according to any one of the above embodiments.
According to yet other embodiments of the present disclosure, a computer program product is further provided, comprising instructions which, when executed by a processor, cause the processor to execute the reverberation processing method according to any one of the above embodiments.
Other features and advantages of the present disclosure will become clear from the following detailed description of exemplary embodiments of the present disclosure with reference to the accompanying drawings.
Brief description of the drawings
The drawings described here are used to provide a further understanding of the present disclosure and constitute a part of this application. The illustrative embodiments of the present disclosure and their description are used to explain the present disclosure and do not constitute an improper limitation of the present disclosure. In the drawings:
FIG. 1 shows a flowchart of a reverberation processing method according to some embodiments of the present disclosure;
FIG. 2 shows a schematic diagram of a spatial audio rendering system framework according to some embodiments of the present disclosure;
FIGS. 3a to 3c show schematic diagrams of reverberation processing methods according to some embodiments of the present disclosure;
FIG. 4 shows a block diagram of a reverberation processing device according to some embodiments of the present disclosure;
FIG. 5 shows a block diagram of a reverberation processing device according to other embodiments of the present disclosure;
FIG. 6 shows a block diagram of a reverberation processing device according to yet other embodiments of the present disclosure.
Detailed description
The technical solutions in the embodiments of the present disclosure will be described clearly and completely below with reference to the drawings in the embodiments of the present disclosure. Obviously, the described embodiments are only some, not all, of the embodiments of the present disclosure. The following description of at least one exemplary embodiment is in fact merely illustrative and in no way serves as any limitation of the present disclosure or of its application or use. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in the present disclosure without creative effort fall within the scope of protection of the present disclosure.
Unless specifically stated otherwise, the relative arrangement of components and steps, the numerical expressions and the numerical values set forth in these embodiments do not limit the scope of the present disclosure. At the same time, it should be understood that, for ease of description, the dimensions of the various parts shown in the drawings are not drawn according to actual proportional relationships. Techniques, methods and devices known to a person of ordinary skill in the relevant art may not be discussed in detail but, where appropriate, such techniques, methods and devices should be regarded as part of the granted specification. In all examples shown and discussed here, any specific value should be interpreted as merely exemplary and not as a limitation; other examples of the exemplary embodiments may therefore have different values. It should be noted that similar reference numerals and letters denote similar items in the following drawings, so once an item is defined in one drawing, it does not need to be further discussed in subsequent drawings.
All sounds in the real world are spatial audio. Sound originates from the vibration of objects and is heard after propagating through a medium. In the real world, a vibrating object can appear anywhere, forming a three-dimensional direction vector with the human head. The horizontal angle of the direction vector affects the loudness difference, time difference and phase difference of the sound arriving at the two ears; the vertical angle of the direction vector also affects the frequency response of the sound arriving at the two ears. It is by relying on this physical information, after a large amount of unconscious training, that humans acquire the ability to judge the position of a sound source from the binaural sound signals.
In an immersive virtual environment, in order to simulate as much as possible the various information the real world gives a person, the influence of the sound position on the binaural signals heard must also be simulated with high quality, so as not to break the user's sense of immersion. In a static environment with a determined source position and listener position, this influence can be represented by a Head Related Transfer Function (HRTF). An HRTF is a two-channel FIR (Finite Impulse Response) filter. Convolving the original signal with the HRTF for a specified position yields the signal heard when the sound source is at that position.
However, one HRTF can only represent the relative positional relationship between one fixed sound source and one determined listener. When N sound sources need to be rendered, N HRTFs are theoretically required and 2N convolutions must be performed on the N original signals; and when the listener rotates, all N HRTFs need to be updated to render the virtual spatial audio scene correctly, resulting in a large amount of computation.
To handle multi-source rendering and 3DoF (3 Degrees of Freedom) listener rotation, spherical harmonics have been applied to spatial audio rendering. The basic idea of ambisonics is to imagine the sound distributed over a sphere, with N signal channels pointing in different directions, each responsible for the sound in its corresponding direction. The ambisonics-based spatial audio rendering algorithm is as follows:
In step 1, the sample points in each ambisonics channel are set to 0;
In step 2, the weight values of each ambisonics channel are calculated from the horizontal angle and pitch angle of the sound source relative to the listener;
In step 3, the original signal is multiplied by the weight values of each ambisonics channel, and the weighted signals are added to the respective channels;
In step 4, step 3 is repeated for all sound sources in the scene;
In step 5, all sample points of the binaural output signal are set to 0;
In step 6, each ambisonics channel signal is convolved with the HRTF for the channel's corresponding direction, and the convolved signal is added to the binaural output signal;
In step 7, step 6 is repeated for all ambisonics channels.
In this way, the number of convolutions is related only to the number of ambisonics channels and is independent of the number of sound sources, and encoding a source into ambisonics is much faster than convolution. Moreover, if the listener rotates, all ambisonics channels can be rotated, and the computation is again independent of the number of sources. Besides rendering the ambisonics signal to the two ears, it can also simply be rendered to a loudspeaker array.
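The encoding part of the steps above (steps 1 to 4) can be sketched as follows. This is a minimal first-order (4-channel) encoder written for illustration only; the channel ordering and normalization used here are assumptions of this sketch, not a specification from the text:

```python
import math

def encode_first_order(signal, azimuth, elevation):
    """Encode a mono signal into 4 first-order ambisonic channels.

    signal: list of samples; azimuth/elevation in radians, relative to
    the listener (step 2 derives the per-channel weights from them).
    Returns [W, Y, Z, X] channel buffers (assumed ordering).
    """
    w = 1.0
    x = math.cos(elevation) * math.cos(azimuth)
    y = math.cos(elevation) * math.sin(azimuth)
    z = math.sin(elevation)
    weights = [w, y, z, x]
    # Step 3: scale the source signal by each channel weight.
    return [[g * s for s in signal] for g in weights]

def mix_sources(sources, num_samples):
    """Steps 1-4: zero the channels, then accumulate every source."""
    channels = [[0.0] * num_samples for _ in range(4)]
    for signal, az, el in sources:
        encoded = encode_first_order(signal, az, el)
        for c in range(4):
            for i in range(num_samples):
                channels[c][i] += encoded[c][i]
    return channels
```

As the text notes, each source contributes only a weighted copy of its signal per channel; the expensive HRTF convolutions (steps 5 to 7) run once per channel, independent of the source count.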
In the real world, the sound perceived by humans, and by other animals, is not only the direct sound that reaches the ear from the source; the source's vibration waves also arrive via reflection, scattering and diffraction by the environment. Reflected and scattered sound directly affects the auditory perception of the source and of the environment the listener is in. This perceptual ability is the basic principle by which nocturnal animals such as bats can locate themselves in the dark and understand their surroundings.
Humans may not have the hearing acuity of bats, but they can still obtain a great deal of information by listening to the environment's influence on a sound source. For example, when listening to the same singer, different reverberation durations make it clearly audible whether one is listening in a large church or in a parking lot; from the ratio of reverberation to direct sound, one can clearly distinguish, even inside the church, whether one is listening 1 metre or 20 metres in front of the singer. Again in the church scene, from the loudness of the early reflections one can clearly distinguish whether one is listening at the centre of the church or only 10 centimetres from a wall.
A wave solver based on finite element analysis (wave physics simulation) divides the space to be computed into densely packed cubes called "voxels" (similar to the concept of a pixel, except that a pixel is a minimal area unit on a two-dimensional plane whereas a voxel is a minimal volume unit in three-dimensional space). The basic procedure of the algorithm is as follows:
In step 1, in the virtual scene, a pulse is excited in the voxel at the position of the sound source;
In step 2, in the next time slice, the pulses of all voxels adjacent to that voxel are calculated according to the voxel size and whether the adjacent voxels contain scene geometry;
In step 3, step 2 is repeated many times to compute the acoustic wave field in the scene; the more repetitions, the more accurately the wave field is computed;
In step 4, the array of all historical amplitudes at the voxel at the listener's position is taken as the impulse response from the source to that listener position in the current scene;
In step 5, steps 1 to 4 are repeated for all sound sources in the scene.
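A toy one-dimensional version of the voxel update above can be sketched as follows. Real solvers work on a 3-D voxel grid with material-aware boundaries; this sketch only illustrates the excite/update/record loop (steps 1 to 4) under simple rigid-boundary assumptions:

```python
def fdtd_1d(num_cells, num_steps, source_cell, listener_cell, courant=0.5):
    """Toy 1-D wave propagation over a row of 'voxels' (cells).

    Step 1: excite a pulse at source_cell.
    Step 2: each time slice, update every cell from its neighbours.
    Step 3: repeating the update propagates the wave field.
    Step 4: the amplitude history at listener_cell approximates the
    impulse response from source to listener.
    """
    prev = [0.0] * num_cells
    curr = [0.0] * num_cells
    curr[source_cell] = 1.0  # pulse at the source voxel
    history = [curr[listener_cell]]
    c2 = courant * courant
    for _ in range(num_steps):
        nxt = [0.0] * num_cells
        # boundary cells (scene geometry) are left untouched, i.e. rigid
        for i in range(1, num_cells - 1):
            nxt[i] = 2 * curr[i] - prev[i] + c2 * (curr[i - 1] - 2 * curr[i] + curr[i + 1])
        prev, curr = curr, nxt
        history.append(curr[listener_cell])
    return history
```

The cost argument in the text is visible even here: halving the cell size in 3-D multiplies the cell count by eight and also forces shorter time slices, which is why real-time wave simulation is impractical.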
A room acoustics simulation algorithm based on a wave solver has the following advantage:
its temporal and spatial accuracy is very high; given sufficiently small voxels and sufficiently short time slices, it can adapt to scenes of any shape and material.
The algorithm also has the following disadvantages.
1. The amount of computation is enormous. It is inversely proportional to the cube of the voxel size and proportional to the time-slice length. In practical application scenarios, it is almost impossible to compute wave physics in real time while guaranteeing reasonable temporal and spatial accuracy.
2. Because of the above drawback, when room acoustic phenomena need to be rendered in real time, software developers choose to pre-render a large number of impulse responses between sources and listeners for different position combinations and parameterize them, switching rendering parameters in real time according to the varying positions of listener and sources. However, this requires a powerful computing device for the pre-rendering computation, and extra storage space for the large number of parameters.
3. As noted above, when the scene changes at run time in a way not considered during pre-rendering, this method cannot correctly reflect the change in the scene's acoustic characteristics, because no corresponding rendering parameters were stored.
The core idea of ray tracing algorithms is to find as many sound propagation paths from source to listener as possible, and thereby obtain the energy direction, delay and filtering characteristics each path contributes. Algorithms of this kind are at the core of the room acoustics simulation systems of Oculus and Wwise.
The algorithm for finding propagation paths from source to listener can be summarized in the following steps:
In step 1, with the listener position as the origin, a number of rays uniformly distributed on the sphere are cast into space;
In step 2, for each ray:
a. if the perpendicular distance between the ray and a sound source is smaller than a preset value, the current path is recorded and saved as a valid path for that source;
b. when the ray intersects the scene, its direction is changed according to the material information preset for the triangle containing the intersection point, and it continues to be cast through the scene;
c. steps a and b are repeated until the number of reflections of the ray reaches the preset maximum reflection depth, after which the procedure returns to step 2 and performs steps a to c for the next initial ray direction.
At this point, some path information has been recorded for each source. This information is then used to calculate the energy direction, delay and filtering characteristics of each path of each source. Collectively, this information is called the spatial impulse response between source and listener.
Finally, auralizing the spatial impulse response of each source simulates very realistically the source's direction and distance as well as the characteristics of the environment in which source and listener are located. There are two methods for auralizing a spatial impulse response:
1. encode the spatial impulse response into the ambisonics domain, generate a binaural room impulse response (BRIR) from the spherical harmonics, and finally convolve the source's original signal with that BRIR to obtain spatial audio with room reflections and reverberation;
2. use the information of the spatial impulse response to encode the source's original signal into the ambisonics domain, and then render the spherical harmonics to binaural output.
Environmental acoustics simulation algorithms based on ray tracing have the following advantages:
much lower computation than wave physics simulation, with no pre-rendering required;
adaptability to dynamically changing scenes (doors opening, material changes, the roof being blown off, etc.);
adaptability to scenes of any shape.
Such algorithms also have the following disadvantages:
the accuracy of the algorithm depends critically on the number of sampled initial ray directions, i.e. more rays; however, since the complexity of ray tracing is O(nlog(n)), more rays inevitably bring an explosive growth in computation;
whether for BRIR convolution or for encoding the original signal into spherical harmonics, the computation is considerable and grows linearly with the number of sources in the scene, which is unfriendly to mobile devices with limited computing power.
The idea of algorithms that simplify the environment's geometry is, given the geometry and surface materials of the current scene, to search for an approximate but much simpler geometric shape and surface material, thereby greatly reducing the computation of the environmental acoustics simulation. Such an algorithm comprises:
In step 1, in the pre-rendering stage, a cubic room shape is estimated;
In step 2, using the geometric properties of the cube, and assuming that source and listener are in the same region, the direct sound and early reflections from each source to the listener in the scene are computed quickly by table lookup;
In step 3, in the pre-rendering stage, the duration of the late reverberation in the current scene is deduced using the empirical formula for the reverberation duration of a cubic room, and an artificial reverberator is controlled accordingly to simulate the scene's late reverberation.
Such algorithms have the following advantages:
1. very little computation;
2. in theory, reverberation of unlimited duration can be simulated, with no additional CPU or memory overhead.
However, such algorithms have the following disadvantages:
1. the approximate scene shape is computed in the pre-rendering stage, so they cannot adapt to dynamically changing scenes (doors opening, material changes, the roof being blown off, etc.);
2. they assume that source and listener are always at the same position, which does not match reality;
3. they assume that every scene shape can be approximated by a cube whose edges are parallel to the world axes, and thus cannot correctly render many real scenes (long narrow corridors, sloping stairwells, old tilted shipping containers, etc.);
4. such algorithms sacrifice rendering quality in exchange for fast rendering speed.
In other words, environmental acoustics simulation algorithms that simplify the environment's geometry provide fast rendering but sacrifice rendering quality, and their rendering framework cannot support dynamically changing scenes, such as doors opening and closing.
In view of the above technical problems, the present disclosure renders the influence of dynamically changing scenes on environmental sound without substantially affecting rendering speed, so that even devices with relatively weak computing power can simulate the dynamic environmental sound of a large number of sources in real time. The efficiency and accuracy of sound rendering can thereby be improved.
FIG. 1 shows a flowchart of a reverberation processing method according to some embodiments of the present disclosure.
As shown in FIG. 1, in step 110, the shape information of the scene is estimated according to multiple intersection points between multiple sound rays centred on the listener and the scene.
In some embodiments, the coordinates of an average intersection point are calculated according to the mean of the coordinates of the multiple intersection points; the shape information of the scene is estimated according to the mean of the distances between each of the multiple intersection points and the average intersection point.
In some embodiments, the shape of the scene is a cube, and the shape information includes the edge length of the cube. For example, the shape of the scene may also be another shape such as a cuboid.
For example, N sound rays are scattered randomly and uniformly in all directions from the listener as centre, and the N intersection points P_n, n∈(1,N), of these rays with the scene are obtained. The coordinates of the average intersection point are calculated as P̄ = (1/N)·Σ_{n=1..N} P_n.
The shape information of the approximating cubic room is calculated: the average distance from all intersection points P_n to the coordinates of the average intersection point is computed as d̄ = (1/N)·Σ_{n=1..N} ‖P_n − P̄‖.
The edge length of the cubic room is then estimated from this average distance d̄.
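The shape estimation above can be sketched as follows. The mapping from average distance to edge length is not fixed by the text, so `edge_factor` below is an assumed illustrative constant, not the patent's value:

```python
import math

def estimate_room(points, edge_factor=2.0):
    """Estimate a cubic-room approximation from ray/scene intersections.

    points: list of (x, y, z) intersections of listener-centred rays
    with the scene.  Returns (average point, average distance, edge),
    where edge = edge_factor * average distance is an assumption of
    this sketch.
    """
    n = len(points)
    # Average intersection point: mean of the coordinates.
    centroid = tuple(sum(p[i] for p in points) / n for i in range(3))
    # Mean distance from each intersection to the average point.
    avg_dist = sum(math.dist(p, centroid) for p in points) / n
    return centroid, avg_dist, edge_factor * avg_dist
```

Because the rays are re-cast whenever the listener moves, re-running this estimate per frame is what lets the room approximation track a changing scene.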
In step 120, the first average acoustic parameter value of the scene materials of the scene is calculated according to the first acoustic parameter values of the scene materials at the positions of the multiple intersection points.
In some embodiments, the average absorption rate of the scene materials of the scene is calculated according to the mean of the absorption rates of the scene materials at the positions of the multiple intersection points.
In some embodiments, the second average acoustic parameter value of the scene materials of the scene is calculated according to the second acoustic parameter values of the scene materials at the positions of the multiple intersection points. For example, the average material scattering rate of the scene is calculated according to the mean of the material scattering rates at the positions of the multiple intersection points.
For example, the average acoustic parameters of the scene materials are calculated. Suppose that at each of the N intersection points P_n between the rays and the scene mentioned above, the absorption rate of the scene material is A_n and the scattering rate is S_n.
For example, the average absorption rate is Ā = (1/N)·Σ_{n=1..N} A_n.
For example, the average scattering rate is S̄ = (1/N)·Σ_{n=1..N} S_n.
In step 130, the reverberation duration is calculated according to the shape information of the scene and the first average acoustic parameter value. For example, the reverberation duration is calculated according to the edge length of the scene and the average absorption rate of the scene materials of the scene.
In some embodiments, reverberation processing is performed on the sound source signal according to the second average acoustic parameter value and the reverberation duration.
For example, the reverberation duration is calculated using the estimated cubic room, the average absorption rate of the materials, and the Eyring formula: T60 = 0.161·V / (−S·ln(1 − Ā)),
where S is the interior surface area of the cubic room and V is the net volume of the cubic room.
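The Eyring computation above can be sketched as follows for the estimated cubic room (0.161 s/m is the standard constant in the Eyring/Sabine family of formulas):

```python
import math

def eyring_t60(edge_length, avg_absorption):
    """Reverberation duration of the approximated cubic room via Eyring:

        T60 = 0.161 * V / (-S * ln(1 - A_bar))

    where S = 6 * L^2 is the interior surface area of the cube and
    V = L^3 is its net volume; avg_absorption is the mean absorption
    rate A_bar of the materials at the ray intersections.
    """
    surface = 6.0 * edge_length ** 2
    volume = edge_length ** 3
    return 0.161 * volume / (-surface * math.log(1.0 - avg_absorption))
```

For a 10 m cube with an average absorption rate of 0.2 this gives roughly 1.2 s; raising the absorption shortens T60, matching the intuition that more absorptive walls kill the reverberant tail faster.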
When the listener's position changes, the sound rays emitted from the listener intersect different scene surfaces, so that the reverberation duration T60 and the average scattering rate computed by the physical simulation change.
In some embodiments, reverberation processing is performed on the sound source signal according to the reverberation duration.
In some embodiments, calculating the second average acoustic parameter value of the scene materials of the scene according to the second acoustic parameter values of the scene materials at the positions of the multiple intersection points comprises: calculating the average scattering rate of the scene materials of the scene according to the mean of the scattering rates of the scene materials at the positions of the multiple intersection points.
In some embodiments, performing reverberation processing on the sound source signal comprises: filtering the sound source signal with an all-pass filter, the all-pass filter being controlled according to the second average acoustic parameter value.
In some embodiments, based on the result of the filtering, the reverberation processing is performed using one or more feedback gains, the one or more feedback gains being controlled according to the reverberation duration.
In some embodiments, the one or more feedback gains are multiple feedback gains, each of which is determined according to a corresponding delay duration.
For example, the 16 feedback gains are controlled by the reverberation duration T60 as g_n = 10^(−3·delay(n)/T60),
where delay(n) is the delay duration corresponding to feedback gain n.
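The gain rule above is the standard feedback-delay-network relation (each recirculation of a delay line must lose exactly enough level that the loop decays by 60 dB over T60); a sketch:

```python
def feedback_gain(delay_seconds, t60_seconds):
    """Gain for one feedback delay line: g = 10 ** (-3 * delay / T60).

    The exponent -3 corresponds to -60 dB (a factor of 10**-3 in
    amplitude) accumulated over T60 seconds of recirculation.
    """
    return 10.0 ** (-3.0 * delay_seconds / t60_seconds)

def feedback_gains(delays, t60):
    """One gain per delay line, e.g. the 16 gains g0..g15."""
    return [feedback_gain(d, t60) for d in delays]
```

Note that a line whose delay equals T60 gets gain 10^-3 (-60 dB), and longer delay lines get smaller gains, so all lines decay at the same rate regardless of their individual delays.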
In some embodiments, delay processing is performed on the result of the filtering; the result of the delay processing is processed with a reflection matrix; and the processing result of the reflection matrix is processed with the one or more feedback gains.
In some embodiments, the second average acoustic parameter of the scene materials includes the scattering rate of the scene materials; the Allpass filter is set according to the scattering rate of the scene materials.
In some embodiments, the results of the filtering are delay-processed separately using multiple delay durations.
In some embodiments, delay processing is performed on the sum of the result of the filtering and the result of the processing using the one or more feedback gains.
FIG. 2 shows a schematic diagram of a spatial audio rendering system framework according to some embodiments of the present disclosure.
As shown in FIG. 2, the scene shape (3D mesh), the listener position, the absorption rates of the walls (the walls corresponding to the scene, the intersection points, etc.), the scattering rates of the walls, and the distance attenuation information come from the "metadata" part of FIG. 2. For example, the binaural playback signal in FIG. 2 can be generated by the following embodiments.
FIGS. 3a to 3c show schematic diagrams of reverberation processing methods according to some embodiments of the present disclosure.
As shown in FIG. 3a, the rendering framework comprises: treating the sound emitted by a source as a number of sound rays emitted from the source; simulating physical phenomena such as air propagation, reflection and refraction in the scene, and collecting the intensities of the rays that finally converge at the listener's ears; taking the absorption rates of the walls and distance factors into account during ray propagation, and estimating the computational model of the scene from the ray intensities; performing reverberation processing on the source's audio data using the generated reverberation model; and mixing the reverberation data with the spatialized binaural data to output the final binaural audio.
In some embodiments, the input information for computing the reverberation model is the scene shape, the absorption rate of the scene, the scattering rate of the scene, and the position of the listener.
With the listener as centre, N rays are scattered randomly and uniformly in all directions, and the N intersection points P_n, n∈(1,N), of these rays with the scene are obtained. The coordinates of the average intersection point are calculated as P̄ = (1/N)·Σ_{n=1..N} P_n.
The shape information of the approximating cubic room is calculated: the average distance from all intersection points P_n to the coordinates of the average intersection point is computed as d̄ = (1/N)·Σ_{n=1..N} ‖P_n − P̄‖.
The edge length of the cubic room is taken to be a value estimated from this average distance d̄.
The average acoustic parameters of the scene materials are calculated. Suppose that at each of the N intersection points P_n between the rays and the scene mentioned above, the absorption rate of the scene material is A_n and the scattering rate is S_n.
For example, the average absorption rate is Ā = (1/N)·Σ_{n=1..N} A_n.
For example, the average scattering rate is S̄ = (1/N)·Σ_{n=1..N} S_n.
For example, the reverberation duration is calculated using the estimated cubic room, the average absorption rate of the materials, and the Eyring formula: T60 = 0.161·V / (−S·ln(1 − Ā)).
When the listener's position changes, the rays emitted from the listener intersect different scene surfaces, so that the reverberation duration T60 and the average scattering rate computed by the physical simulation change.
Corresponding to these changing sound-field parameters, a reverberation processing chain is proposed that can dynamically adjust the reverberation duration and the temporal density of reflected sound at run time, so that the computed, changing sound-field parameters can dynamically influence the reverberation heard, thereby achieving scene-dependent dynamic reverberation. The dynamic reverberation can produce different reverberation effects following the listener's 6DoF (6 Degrees of Freedom) movement.
The input signal of the dynamic reverberator is the source's original signal, or the original source signal after processing by one or more of the following effects: loudness attenuation, air-absorption filtering, delay processing, or a spatialization algorithm.
This dynamically adjustable artificial reverberator can be implemented in many ways, one embodiment of which is shown in FIG. 3b. The "Allpass×3" filter uses three cascaded all-pass filters (such as Schroeder allpass filters).
The structure of the Schroeder allpass filter is shown in FIG. 3c; the delay durations are 5 to 10 ms and mutually prime. The parameter g of the Schroeder allpass filter is controlled by the average scattering coefficient, using a coefficient of 0.3:
The coefficient 0.3 may also be replaced by other values as needed. The larger the value of g, the more dispersed the input energy is along the time axis.
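A Schroeder allpass section of the kind cascaded above can be sketched as follows. The difference equation y[n] = −g·x[n] + x[n−D] + g·y[n−D] is the standard Schroeder form; how g is derived from the average scattering coefficient (the 0.3 factor) is left open here, since the exact mapping is not reproduced in the text:

```python
class SchroederAllpass:
    """Schroeder allpass: y[n] = -g*x[n] + x[n-D] + g*y[n-D].

    The buffer stores w[n] = x[n] + g*y[n]; reading it D samples
    later yields x[n-D] + g*y[n-D] in one lookup.
    """

    def __init__(self, delay_samples, g):
        self.g = g
        self.buf = [0.0] * delay_samples  # circular delay line
        self.idx = 0

    def process(self, x):
        delayed = self.buf[self.idx]
        y = -self.g * x + delayed
        self.buf[self.idx] = x + self.g * y  # feed the delay line
        self.idx = (self.idx + 1) % len(self.buf)
        return y
```

Its impulse response is −g, then 1 − g² at the delay, then g·(1 − g²) at twice the delay, and so on: larger g pushes more energy later in time, which matches the statement that a larger g disperses the input energy along the time axis.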
As shown in FIG. 3b, the delay durations of delays 0 to 15 are 30 to 50 ms and mutually prime. The reflection matrix is a 16×16 Householder matrix. g0 to g15 are 16 feedback gains controlled by the reverberation duration T60 as g_n = 10^(−3·delay(n)/T60),
where delay(n) is the delay duration corresponding to the n-th feedback gain.
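The Householder reflection matrix used in the feedback loop above can be sketched as follows. A 16×16 Householder matrix about the all-ones vector is orthogonal, so the loop mixes energy between the 16 delay lines without adding or removing any; the decay is then governed entirely by the gains g0 to g15:

```python
def householder_matrix(n):
    """n x n Householder reflection about the all-ones vector:

        H = I - (2/n) * ones * ones^T

    H is orthogonal (H @ H = I), so applying it inside the feedback
    loop redistributes energy across the delay lines losslessly.
    """
    return [[(1.0 if i == j else 0.0) - 2.0 / n for j in range(n)]
            for i in range(n)]

def apply_matrix(m, v):
    """Multiply matrix m by vector v (one feedback-loop mixing step)."""
    return [sum(m[i][j] * v[j] for j in range(len(v))) for i in range(len(m))]
```

A practical advantage of this particular matrix is that m @ v can be computed as v minus (2/n)·sum(v), i.e. in O(n) rather than O(n²) operations per sample block.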
In the above embodiments, the reverberation model in a dynamically changing scene is computed by estimating a simplified shape of the room in real time, and the dynamically adjustable artificial reverberation is controlled by the reverberation model. In this way, the influence of dynamically changing scenes on environmental sound can be rendered without substantially affecting rendering speed, allowing even devices with relatively weak computing power to simulate the dynamic environmental sound of a large number of sources in real time. The efficiency and accuracy of sound rendering can thereby be improved.
FIG. 4 shows a block diagram of a reverberation processing device according to some embodiments of the present disclosure.
As shown in FIG. 4, the reverberation processing device 4 comprises: an estimation unit 41 for estimating the shape information of a scene according to multiple intersection points between multiple sound rays centred on a listener and the scene; and a calculation unit 42 for calculating the first average acoustic parameter value of the scene materials of the scene according to the first acoustic parameter values of the scene materials at the positions of the multiple intersection points, and calculating the reverberation duration according to the shape information of the scene and the first average acoustic parameter value.
In some embodiments, the estimation unit 41 calculates the coordinates of an average intersection point according to the mean of the coordinates of the multiple intersection points, and estimates the shape information of the scene according to the mean of the distances between each of the multiple intersection points and the average intersection point.
In some embodiments, the calculation unit 42 calculates the average absorption rate of the scene materials of the scene according to the mean of the absorption rates of the scene materials at the positions of the multiple intersection points.
In some embodiments, the shape of the scene is a cube, and the shape information includes the edge length of the cube.
In some embodiments, the calculation unit 42 calculates the reverberation duration according to the edge length of the scene and the average absorption rate of the scene materials of the scene.
In some embodiments, the calculation unit 42 calculates a second average acoustic parameter value of the scene materials of the scene according to second acoustic parameter values of the scene materials at the positions of the multiple intersection points; and the processing device 4 further comprises: a processing unit 43 for performing reverberation processing on a sound source signal according to the second average acoustic parameter value and the reverberation duration.
In some embodiments, the calculation unit 42 calculates the average scattering rate of the scene materials of the scene according to the mean of the scattering rates of the scene materials at the positions of the multiple intersection points.
In some embodiments, the processing unit 43 filters the sound source signal with an all-pass filter, the all-pass filter being controlled according to the second average acoustic parameter value.
In some embodiments, the processing unit 43 performs the reverberation processing using one or more feedback gains based on the result of the filtering, the one or more feedback gains being controlled according to the reverberation duration.
In some embodiments, the one or more feedback gains are multiple feedback gains, each of which is determined according to a corresponding delay duration.
In some embodiments, the processing unit 43 performs delay processing on the result of the filtering, processes the result of the delay processing with a reflection matrix, and processes the processing result of the reflection matrix with the one or more feedback gains.
In some embodiments, the processing unit 43 performs delay processing on the results of the filtering separately using multiple delay durations.
In some embodiments, the processing unit 43 performs delay processing on the sum of the result of the filtering and the result of the processing using the one or more feedback gains.
FIG. 5 shows a block diagram of a reverberation processing device according to other embodiments of the present disclosure.
As shown in FIG. 5, the reverberation processing device 5 of this embodiment comprises: a memory 51 and a processor 52 coupled to the memory 51, the processor 52 being configured to execute, based on instructions stored in the memory 51, the reverberation processing method in any one of the embodiments of the present disclosure.
The memory 51 may include, for example, a system memory, a fixed non-volatile storage medium, and the like. The system memory stores, for example, an operating system, application programs, a boot loader, a database, and other programs.
FIG. 6 shows a block diagram of a reverberation processing device according to yet other embodiments of the present disclosure.
As shown in FIG. 6, the reverberation processing device 6 of this embodiment comprises: a memory 610 and a processor 620 coupled to the memory 610, the processor 620 being configured to execute, based on instructions stored in the memory 610, the reverberation processing method in any one of the foregoing embodiments.
The memory 610 may include, for example, a system memory, a fixed non-volatile storage medium, and the like. The system memory stores, for example, an operating system, application programs, a boot loader, and other programs.
The reverberation processing device 6 may further include an input/output interface 630, a network interface 640, a storage interface 650, and the like. These interfaces 630, 640, 650, the memory 610 and the processor 620 may be connected, for example, via a bus 860. The input/output interface 630 provides a connection interface for input/output devices such as a display, a mouse, a keyboard, a touch screen, a microphone and a loudspeaker. The network interface 640 provides a connection interface for various networked devices. The storage interface 650 provides a connection interface for external storage devices such as an SD card and a USB flash drive.
Those skilled in the art should understand that the embodiments of the present disclosure may be provided as a method, a system, or a computer program product. Therefore, the present disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present disclosure may take the form of a computer program product implemented on one or more computer-usable non-transitory storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.
Up to this point, the reverberation processing method and device according to the present disclosure have been described in detail. Some details well known in the art have not been described in order to avoid obscuring the concept of the present disclosure. From the above description, those skilled in the art can fully understand how to implement the technical solutions disclosed herein.
The method and system of the present disclosure may be implemented in many ways. For example, the method and system of the present disclosure may be implemented by software, hardware, firmware, or any combination of software, hardware and firmware. The above order of the steps of the method is for illustration only, and the steps of the method of the present disclosure are not limited to the order specifically described above unless otherwise specifically stated. Furthermore, in some embodiments, the present disclosure may also be implemented as programs recorded in a recording medium, the programs including machine-readable instructions for implementing the method according to the present disclosure. Thus, the present disclosure also covers a recording medium storing programs for executing the method according to the present disclosure.
Although some specific embodiments of the present disclosure have been described in detail through examples, those skilled in the art should understand that the above examples are for illustration only and are not intended to limit the scope of the present disclosure. Those skilled in the art should understand that the above embodiments may be modified without departing from the scope and spirit of the present disclosure. The scope of the present disclosure is defined by the appended claims.

Claims (19)

  1. A reverberation processing method, comprising:
    estimating shape information of a scene according to multiple intersection points between multiple sound rays centred on a listener and the scene;
    calculating a first average acoustic parameter value of scene materials of the scene according to first acoustic parameter values of the scene materials at positions of the multiple intersection points; and
    calculating a reverberation duration according to the shape information of the scene and the first average acoustic parameter value.
  2. The processing method according to claim 1, wherein estimating the shape information of the scene according to the multiple intersection points between the multiple sound rays centred on the listener and the scene comprises:
    calculating coordinates of an average intersection point according to a mean of the coordinates of the multiple intersection points; and
    estimating the shape information of the scene according to a mean of the distances between each of the multiple intersection points and the average intersection point.
  3. The processing method according to claim 1 or 2, wherein calculating the first average acoustic parameter value of the scene materials of the scene according to the first acoustic parameter values of the scene materials at the positions of the multiple intersection points comprises:
    calculating an average absorption rate of the scene materials of the scene according to a mean of the absorption rates of the scene materials at the positions of the multiple intersection points.
  4. The processing method according to any one of claims 1 to 3, wherein the shape of the scene is a cube, and the shape information includes an edge length of the cube.
  5. The processing method according to claim 4, wherein calculating the reverberation duration according to the shape information of the scene and the first average acoustic parameter value comprises:
    calculating the reverberation duration according to the edge length of the scene and the average absorption rate of the scene materials of the scene.
  6. The processing method according to any one of claims 1 to 5, further comprising:
    calculating a second average acoustic parameter value of the scene materials of the scene according to second acoustic parameter values of the scene materials at the positions of the multiple intersection points; and
    performing reverberation processing on a sound source signal according to the second average acoustic parameter value and the reverberation duration.
  7. The processing method according to claim 6, wherein calculating the second average acoustic parameter value of the scene materials of the scene according to the second acoustic parameter values of the scene materials at the positions of the multiple intersection points comprises:
    calculating an average scattering rate of the scene materials of the scene according to a mean of the scattering rates of the scene materials at the positions of the multiple intersection points.
  8. The processing method according to claim 6 or 7, wherein performing reverberation processing on the sound source signal comprises:
    filtering the sound source signal with an all-pass filter, the all-pass filter being controlled according to the second average acoustic parameter value.
  9. The processing method according to claim 7 or 8, wherein performing reverberation processing on the sound source signal comprises:
    performing the reverberation processing using one or more feedback gains based on the result of the filtering, the one or more feedback gains being controlled according to the reverberation duration.
  10. The processing method according to claim 9, wherein the one or more feedback gains are multiple feedback gains, and each of the multiple feedback gains is determined according to a corresponding delay duration.
  11. The processing method according to claim 9 or 10, wherein performing the reverberation processing using the one or more feedback gains based on the result of the filtering comprises:
    performing delay processing on the result of the filtering;
    processing the result of the delay processing with a reflection matrix; and
    processing the processing result of the reflection matrix with the one or more feedback gains.
  12. The processing method according to claim 11, wherein performing delay processing on the result of the filtering comprises:
    performing delay processing on the results of the filtering separately using multiple delay durations.
  13. The processing method according to claim 11 or 12, wherein performing delay processing on the result of the filtering comprises:
    performing the delay processing on the sum of the result of the filtering and the result of the processing using the one or more feedback gains.
  14. A reverberation processing device, comprising:
    an estimation unit for estimating shape information of a scene according to multiple intersection points between multiple sound rays centred on a listener and the scene; and
    a calculation unit for calculating a first average acoustic parameter value of scene materials of the scene according to first acoustic parameter values of the scene materials at positions of the multiple intersection points, and calculating a reverberation duration according to the shape information of the scene and the first average acoustic parameter value.
  15. The processing device according to claim 14, wherein
    the calculation unit calculates a second average acoustic parameter value of the scene materials of the scene according to second acoustic parameter values of the scene materials at the positions of the multiple intersection points;
    the processing device further comprises:
    a processing unit for performing reverberation processing on a sound source signal according to the second average acoustic parameter value and the reverberation duration.
  16. A reverberation processing device, comprising:
    a memory; and
    a processor coupled to the memory, the processor being configured to execute, based on instructions stored in the memory, the reverberation processing method according to any one of claims 1 to 13.
  17. A non-volatile computer-readable storage medium on which a computer program is stored, wherein the program, when executed by a processor, implements the reverberation processing method according to any one of claims 1 to 13.
  18. A computer program, comprising:
    instructions which, when executed by a processor, cause the processor to execute the reverberation processing method according to any one of claims 1 to 13.
  19. A computer program product, comprising instructions which, when executed by a processor, cause the processor to execute the reverberation processing method according to any one of claims 1 to 13.
PCT/CN2023/121368 2022-09-30 2023-09-26 混响的处理方法、装置和非易失性计算机可读存储介质 WO2024067543A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CNPCT/CN2022/123290 2022-09-30
CN2022123290 2022-09-30

Publications (1)

Publication Number Publication Date
WO2024067543A1 true WO2024067543A1 (zh) 2024-04-04

Family

ID=90476343

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/121368 WO2024067543A1 (zh) 2022-09-30 2023-09-26 混响的处理方法、装置和非易失性计算机可读存储介质

Country Status (1)

Country Link
WO (1) WO2024067543A1 (zh)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1282444A (zh) * 1997-10-20 2001-01-31 诺基亚有限公司 一种处理虚拟声音环境的方法和系统
CN102456353A (zh) * 2010-10-22 2012-05-16 雅马哈株式会社 声光转换器及声场可视化系统
CN111213082A (zh) * 2017-10-17 2020-05-29 奇跃公司 混合现实空间音频
CN114662185A (zh) * 2022-03-09 2022-06-24 杭州群核信息技术有限公司 室内声学仿真设计方法、装置、设备及存储介质
WO2022144493A1 (en) * 2020-12-29 2022-07-07 Nokia Technologies Oy A method and apparatus for fusion of virtual scene description and listener space description



Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23870772

Country of ref document: EP

Kind code of ref document: A1