WO2023051627A1 - Audio rendering method, audio rendering device and electronic device - Google Patents

Audio rendering method, audio rendering device and electronic device

Info

Publication number
WO2023051627A1
Authority
WO
WIPO (PCT)
Prior art keywords
propagation path
sound propagation
audio
frame
energy
Prior art date
Application number
PCT/CN2022/122204
Other languages
English (en)
French (fr)
Inventor
叶煦舟
黄传增
史俊杰
张正普
柳德荣
Original Assignee
北京字跳网络技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京字跳网络技术有限公司
Publication of WO2023051627A1

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00
    • G10L25/03: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00, characterised by the type of extracted parameters
    • G10L25/18: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00, characterised by the type of extracted parameters, the extracted parameters being spectral information of each sub-band
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04S: STEREOPHONIC SYSTEMS
    • H04S7/00: Indicating arrangements; Control arrangements, e.g. balance control

Definitions

  • the present disclosure relates to the technical field of audio signal processing, and in particular to an audio rendering method, an audio rendering device, an electronic device, a non-transitory computer-readable storage medium, and a computer program product.
  • in ray-tracing-based spatial audio rendering, each sound propagation path between the listener and the sound source carries an energy attenuation coefficient or a set of energy attenuation coefficients.
  • factors that affect the energy attenuation coefficient include the directivity of the sound source, the reflective surfaces along the sound propagation path, and the air absorption coefficient. After the original signal of the sound source is attenuated by the energy attenuation coefficient, it can be expressed as the signal presented when the sound propagates along this path and finally reaches the listener.
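As a hedged illustration of this idea (not an API from the patent itself), applying a path's attenuation to a dry source frame can be sketched as follows; the function name and the per-band convention are assumptions for illustration only:

```python
def attenuate(source_frame, g):
    """Apply a path's energy attenuation coefficient to a dry source frame.

    g may be a single scalar coefficient, or one coefficient per frequency
    band (in which case source_frame is assumed to hold per-band values).
    """
    if isinstance(g, (int, float)):
        return [g * s for s in source_frame]
    return [gi * si for gi, si in zip(g, source_frame)]
```

The attenuated frame is what the listener receives via that path.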
  • in some path caching mechanisms based on the principle of temporal coherence, a path that is occluded, and whose energy is therefore cleared, is deleted immediately. But if the path is only temporarily occluded, for example when a car passing by the side of the listener briefly blocks path A arriving from the side, path A should continue to exist once the car has passed. In fact, however, path A is completely deleted.
  • an audio rendering method including: acquiring scene-related audio metadata, where the scene-related audio metadata includes information about the sound propagation paths between a sound source and a listener; determining parameters for audio rendering based on the scene-related audio metadata, where the parameters for audio rendering include an energy attenuation coefficient for each sound propagation path; performing spatial audio encoding on the audio signal of the sound source based on the parameters for audio rendering to obtain an encoded audio signal; and performing spatial audio decoding on the encoded audio signal to obtain a decoded audio signal for audio rendering.
  • an audio rendering device including: a metadata acquisition unit configured to acquire scene-related audio metadata, the scene-related audio metadata including information about the sound propagation paths between a sound source and a listener; a parameter determination unit configured to determine parameters for audio rendering based on the scene-related audio metadata, the parameters for audio rendering including an energy attenuation coefficient for each sound propagation path;
  • a spatial audio encoding unit configured to perform spatial audio encoding on the audio signal of the sound source based on the parameters for audio rendering to obtain an encoded audio signal;
  • a spatial audio decoding unit configured to perform spatial audio decoding on the encoded audio signal to obtain a decoded audio signal for audio rendering.
  • an electronic device including: a memory; and a processor coupled to the memory, the processor being configured to execute the audio rendering method of any embodiment described in the present disclosure based on instructions stored in the memory.
  • a computer-readable storage medium on which a computer program is stored, and when the program is executed by a processor, the audio rendering method of any embodiment described in the present disclosure is implemented.
  • a computer program product comprising instructions which, when executed by a processor, implement the audio rendering method of any one of the embodiments described in the present disclosure.
  • Figure 1 shows a schematic diagram of some embodiments of an audio system architecture
  • Figure 2 shows a flowchart of an exemplary implementation of an audio rendering process according to an embodiment of the present disclosure
  • Fig. 3 shows a schematic diagram of some embodiments of the state transition of the sound propagation path at each rendering
  • Figure 4 shows a flowchart of some embodiments of the audio rendering method of the present disclosure
  • FIG. 5 shows a structural block diagram of some embodiments of the audio rendering device of the present disclosure
  • Figure 6 shows a block diagram of some embodiments of an electronic device of the present disclosure
  • Fig. 7 shows a block diagram of other embodiments of the electronic device of the present disclosure.
  • Figure 8 shows a block diagram of some embodiments of a chip of the present disclosure.
  • Figure 1 shows a schematic diagram of some embodiments of an audio system architecture.
  • An exemplary implementation of various stages of an audio rendering process/system is shown therein, mainly showing production and consumption stages in an audio system, and optionally also including intermediate processing stages such as compression.
  • on the production side, the audio track interface and common audio metadata are used for authorization and metadata marking; normalization processing, for example, may also be performed.
  • the processing result of the production side is subjected to spatial audio encoding and decoding processing to obtain a compression result.
  • on the consumption side, the processing results (or compression results) from the production side are subjected to metadata recovery and rendering using the audio track interface and general audio metadata (such as ADM extensions); audio rendering is performed on the processing results, which are then input to the audio device.
  • the audio processing input may include scene-related information and metadata, object-based audio signals, FOA (First-Order Ambisonics), HOA (Higher-Order Ambisonics), stereo, surround sound, etc.; the output of audio processing includes stereo audio output, etc.
  • the audio rendering system mainly includes a rendering metadata system and a core rendering system.
  • in the metadata system there is control information describing the audio content and the rendering technique, such as whether the content is object-based or a sound-field HOA, dynamic sound source and listener position information, and acoustic environment information for rendering such as room shape, size, wall material, etc.
  • the core rendering system renders to the corresponding playback device and environment based on the different audio signal representations and the corresponding metadata parsed from the metadata system.
  • the input audio signal is received and, depending on its format, either parsed or passed through directly.
  • when the input audio signal is in a spatial audio exchange format, it can be parsed to obtain an audio signal with a specific spatial audio representation, such as an object-based, scene-based, or channel-based spatial audio representation signal, together with associated metadata, which are then passed on to the subsequent processing stages.
  • when the input audio signal is already an audio signal with a specific spatial audio representation, it is passed directly to the subsequent processing stage without parsing.
  • audio signals that need to be encoded, such as object-based, scene-based, and channel-based audio representation signals, may be passed directly to the audio encoding stage.
  • if the audio signal of a particular spatial representation is of a type/format that does not require encoding, it can be passed directly to the audio decoding stage; for example, a non-narrative channel track, or a narrative soundtrack, in a parsed channel-based audio representation that needs no encoding.
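The routing described above can be sketched roughly as follows; the format tags and the pass-through set are hypothetical names for illustration, since the disclosure does not fix a concrete API:

```python
# hypothetical tags for the representations named above
NEEDS_ENCODING = {"object", "scene", "channel"}

def next_stage(signal_format, passthrough_formats=("channel_no_encoding",)):
    """Decide which stage an audio representation enters next:
    representations that need encoding go to the audio encoding stage,
    while tracks that need no encoding go straight to audio decoding."""
    if signal_format in passthrough_formats:
        return "decode"
    if signal_format in NEEDS_ENCODING:
        return "encode"
    raise ValueError(f"unknown spatial audio representation: {signal_format}")
```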
  • information processing may be performed based on the acquired metadata, so as to extract and obtain audio parameters related to each audio signal, and such audio parameters may be used as metadata information.
  • the information processing here can be performed on either the audio signal obtained through parsing or the directly transmitted audio signal. Of course, as mentioned above, such information processing is optional and does not have to be performed.
  • signal encoding is performed on the audio signal of the specific spatial audio representation.
  • signal encoding can be performed on an audio signal of a specific spatial audio representation based on metadata information, and the resulting encoded audio signal is either passed directly to a subsequent audio decoding stage, or an intermediate signal is obtained and then passed to a subsequent audio decoding stage.
  • the audio signal of a particular spatial audio representation does not need to be encoded, such an audio signal can be passed directly to the audio decoding stage.
  • the received audio signal can be decoded to obtain an audio signal suitable for playback in the user application scene as an output signal.
  • such an output signal can pass through the user application scene, such as an audio playback environment, and be presented to the user via the audio playback device.
  • the present disclosure proposes a scheme for smoothing sound effects by adding an "occluded" state to each sound propagation path and setting an energy attenuation coefficient for each path.
  • the present disclosure defines two states for each sound propagation path: valid and invalid. When a sound propagation path is valid, it has two sub-states: occluded and not occluded.
  • when the energy of a sound propagation path is less than the threshold, the state of the sound propagation path is determined to be invalid, and the sound propagation path determined to be invalid is deleted.
  • otherwise, the state of the sound propagation path is determined to be valid.
  • for a sound propagation path determined to be valid, it is further detected whether the sound propagation path is occluded.
  • for a sound propagation path determined to be in the "occluded" state, its energy attenuation coefficient is reduced frame by frame.
  • for a sound propagation path determined to be in the "not occluded" state, its energy attenuation coefficient is increased frame by frame until the coefficient reaches 1.
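The two states and two sub-states above can be sketched as a small state machine; the names and the function signature are illustrative assumptions, not the patent's own API:

```python
from enum import Enum

class PathState(Enum):
    VALID = "valid"        # path exists; coefficient fades toward 1
    OCCLUDED = "occluded"  # sub-state of valid; coefficient fades toward 0
    INVALID = "invalid"    # energy below threshold; path is deleted

def classify_path(energy, ray_blocked, epsilon):
    """Per-frame classification of one sound propagation path."""
    if energy < epsilon:
        return PathState.INVALID
    return PathState.OCCLUDED if ray_blocked else PathState.VALID
```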
  • Fig. 3 shows a schematic diagram of the state transition of the sound propagation path at each rendering according to an embodiment of the present disclosure. As shown in Fig. 3, when a new path is created, its state is "valid". If the ray of the sound propagation path is detected to intersect the scene, the path is determined to be occluded, and its state changes to "occluded".
  • for example, when a car passes by the side of the listener, path A arriving from the side is temporarily blocked, and the state of the rendered sound propagation path becomes "occluded" at this time; when the car has quickly passed, path A continues to exist, and at this point the state of the rendered sound propagation path changes back to "valid".
  • when the energy of a path falls below the threshold, the sound propagation path is judged to be "invalid", and the invalid path is deleted.
  • the energy attenuation coefficient g of the sound propagation path is smoothly updated according to the path state at each rendering; commonly used update methods include, but are not limited to, exponential change and linear change.
  • a commonly used exponential update is: g_new = exp * g_old, where exp is a preset decay speed, which can be set to 0.9 according to a preferred embodiment of the present invention. It should be understood that this preferred value is exemplary only and not intended to be limiting; in fact, the preset decay speed can be set according to actual needs.
  • a commonly used linear update is: g_new = g_old - delta, where delta is a preset decay speed, which can be set to 0.05 according to a preferred embodiment of the present invention. It should be understood that this preferred value is also only exemplary and not intended to be limiting; in fact, the preset decay speed can be set according to actual needs.
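Using the stated example values (exp = 0.9, delta = 0.05), the two fade-out updates for an occluded path can be sketched as follows; the zero clamp in the linear case is an added assumption, since the coefficient should not go negative:

```python
def fade_out_exponential(g_old, exp_speed=0.9):
    # exponential update: multiply the existing coefficient by the decay speed
    return exp_speed * g_old

def fade_out_linear(g_old, delta=0.05):
    # linear update: subtract a fixed step per frame, clamped at zero
    return max(0.0, g_old - delta)
```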
  • suppose the existing energy of each frequency band on a certain attenuated sound path is p_i, the number of frequency bands is N_bands, the subscript of the frequency band is i, and the energy attenuation coefficient of each frequency band of the path (that is, the fade-in/fade-out energy coefficient) is g. The energy b on the sound propagation path can then be calculated by various methods. As an example, two commonly used methods are given below; with an energy threshold epsilon, whether the energy b is less than the threshold can be tested by either of the following: b = g * max_i(p_i) < epsilon, or b = g * (sum_i(p_i) / N_bands) < epsilon.
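The two band-aggregation checks follow directly from the definitions above (p_i the per-band energies, g the coefficient, epsilon the threshold); the helper names are illustrative:

```python
def energy_max(band_energies, g):
    # b = g * max_i(p_i)
    return g * max(band_energies)

def energy_mean(band_energies, g):
    # b = g * (sum_i p_i / N_bands)
    return g * sum(band_energies) / len(band_energies)

def path_is_invalid(band_energies, g, epsilon, energy=energy_max):
    # a path whose energy b falls below epsilon is judged "invalid"
    return energy(band_energies, g) < epsilon
```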
  • Fig. 4 shows an example flowchart of an audio rendering method according to an embodiment of the present disclosure.
  • the scene-related audio metadata may include acoustic environment information, such as information about the sound propagation path between the sound source and the listener, including but not limited to the state and energy of the sound propagation path.
  • the state of the sound propagation path between the sound source and the listener includes valid and invalid, and sub-states of blocked and not blocked in the valid state.
  • if the energy of the sound propagation path is less than the threshold, the state of the sound propagation path is determined to be invalid; otherwise it is determined to be valid. If the ray belonging to the sound propagation path is detected to intersect the scene, the sound propagation path is determined to be occluded; otherwise it is determined to be not occluded.
  • parameters for audio rendering are determined based on scene-related audio metadata.
  • the parameters for audio rendering include an energy attenuation coefficient for each sound propagation path.
  • step S430 based on the parameters for audio rendering, spatial audio coding is performed on the audio signal of the sound source to obtain a coded audio signal.
  • step S440 spatial audio decoding is performed on the encoded audio signal to obtain a decoded audio signal for audio rendering.
  • determining parameters for audio rendering based on scene-related audio metadata may include adjusting the energy attenuation coefficient of each sound propagation path based on relevant information (e.g., state and energy) of the sound propagation path.
  • adjusting the energy attenuation coefficient of each sound propagation path includes first judging whether the path is valid by comparing the energy of the sound propagation path with a threshold, and then, if the path is valid, judging whether it is occluded. Specifically, when the energy of the sound propagation path is less than the threshold, the state of the path is determined to be invalid and the path is deleted; when the energy is not less than the threshold, the state of the path is determined to be valid.
  • the state of the sound propagation path is detected or determined every time the spatial audio rendering is performed. That is to say, each frame of spatial audio rendering detects or determines the state of the sound propagation path.
  • reducing the energy attenuation coefficient frame by frame includes: in response to judging that the sound propagation path is occluded, multiplying the existing energy attenuation coefficient by a preset exponential decay speed, or subtracting a preset linear decay speed from the existing energy attenuation coefficient.
  • the energy of the sound propagation path can be calculated using various methods such as those described in formula 3 above, for example taking the product of the energy attenuation coefficient and either the maximum or the average of the existing energies of the frequency bands as the energy of the sound propagation path.
  • increasing the energy attenuation coefficient frame by frame until it reaches 1 includes: determining the energy attenuation coefficient of each frame, frame by frame, as the minimum of 1 and 1 - exp * (1 - g_old), where exp is the exponential decay speed and g_old is the energy attenuation coefficient of the previous frame; or determining it as the minimum of 1 and g_old + delta, where delta is the linear decay speed and g_old is the energy attenuation coefficient of the previous frame.
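The two frame-by-frame increase rules stated above translate directly into code (sketch only; the function names are assumptions):

```python
def fade_in_exponential(g_old, exp_speed=0.9):
    # g_new = min(1, 1 - exp * (1 - g_old)): the gap to 1 shrinks by exp per frame
    return min(1.0, 1.0 - exp_speed * (1.0 - g_old))

def fade_in_linear(g_old, delta=0.05):
    # g_new = min(1, g_old + delta)
    return min(1.0, g_old + delta)
```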
  • the present disclosure determines whether the state of each sound propagation path is "valid" or "invalid", and, when a path is "valid", further determines whether it is "occluded".
  • a sound propagation path judged to be "occluded" is not deleted immediately; instead its energy coefficient is reduced frame by frame, while the energy coefficient of a path judged to be "not occluded" is increased frame by frame. This avoids the surges and dips of sound energy in certain directions caused by suddenly appearing or vanishing sound paths, resulting in smooth, click-free sound.
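Putting the pieces together, one render pass over all paths might look like the following sketch; the path representation (a dict), the hard-coded speeds, the max-band energy test, and the mixing of a single mono frame are all simplifying assumptions:

```python
def render_frame(paths, dry_frame, epsilon=1e-4, exp_speed=0.9):
    """Update every path's state and coefficient for one frame, delete
    invalid paths in place, and mix the surviving attenuated contributions."""
    out = [0.0] * len(dry_frame)
    survivors = []
    for p in paths:
        # state check: energy below threshold -> "invalid", path is deleted
        if p["g"] * max(p["band_energy"]) < epsilon:
            continue
        if p["blocked"]:
            p["g"] *= exp_speed                                  # fade out
        else:
            p["g"] = min(1.0, 1.0 - exp_speed * (1.0 - p["g"]))  # fade in
        survivors.append(p)
        for i, s in enumerate(dry_frame):
            out[i] += p["g"] * s
    paths[:] = survivors  # "invalid" paths are dropped
    return out
```

A temporarily occluded path thus fades out smoothly instead of being deleted, and fades back in once the occluder has passed.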
  • FIG. 5 shows a schematic structural block diagram of the audio rendering device.
  • the audio rendering device 500 includes a metadata acquisition unit 510 , a parameter determination unit 520 , a spatial audio encoding unit 530 and a spatial audio decoding unit 540 .
  • the metadata acquiring unit 510 is configured to acquire scene-related audio metadata
  • the scene-related audio metadata may include, for example, information about the sound propagation path between the sound source and the listener, including but Not limited to the state and energy of the sound propagation path.
  • the parameter determination unit 520 is configured to determine parameters for audio rendering based on scene-related audio metadata, the parameters for audio rendering including energy attenuation coefficients for each sound propagation path.
  • the spatial audio encoding unit 530 is configured to perform spatial audio encoding on the audio signal of the sound source based on the parameters for audio rendering to obtain an encoded audio signal.
  • the spatial audio decoding unit 540 is configured to perform spatial audio decoding on the encoded audio signal to obtain a decoded audio signal for audio rendering.
  • the state of the sound propagation path between the sound source and the listener includes valid and invalid, and the blocked and unblocked sub-states in the valid state, which will not be repeated here.
  • the parameter determination unit 520 may be further configured to adjust the energy attenuation coefficient of each sound propagation path based on relevant information (eg, state and energy) of the sound propagation path.
  • adjusting the energy attenuation coefficient of each sound propagation path includes first judging whether the path is valid by comparing the energy of the sound propagation path with a threshold, and then, if the path is valid, judging whether it is occluded. Specifically, when the energy of the sound propagation path is less than the threshold, the state of the path is determined to be invalid and the path is deleted; when the energy is not less than the threshold, the state of the path is determined to be valid.
  • the state of the sound propagation path is detected or determined every time the spatial audio rendering is performed.
  • reducing the energy attenuation coefficient frame by frame includes: in response to judging that the sound propagation path is occluded, multiplying the existing energy attenuation coefficient by a preset exponential decay speed, or subtracting a preset linear decay speed from the existing energy attenuation coefficient.
  • the energy of the sound propagation path can be calculated using various methods such as those described in formula 3 above, for example taking the product of the energy attenuation coefficient and either the maximum or the average of the existing energies of the frequency bands as the energy of the sound propagation path.
  • increasing the energy attenuation coefficient frame by frame until it reaches 1 includes: determining the energy attenuation coefficient of each frame, frame by frame, as the minimum of 1 and 1 - exp * (1 - g_old), where exp is the exponential decay speed and g_old is the energy attenuation coefficient of the previous frame; or determining it as the minimum of 1 and g_old + delta, where delta is the linear decay speed and g_old is the energy attenuation coefficient of the previous frame.
  • Figure 6 shows a block diagram of some embodiments of an electronic device of the present disclosure.
  • the electronic device 5 of this embodiment includes: a memory 51 and a processor 52 coupled to the memory 51, the processor 52 being configured to execute the audio rendering method of any embodiment of the present disclosure based on instructions stored in the memory 51.
  • the memory 51 may include, for example, a system memory, a fixed non-volatile storage medium, and the like.
  • the system memory stores, for example, an operating system, an application program, a boot loader (Boot Loader), a database, and other programs.
  • FIG. 7 shows a schematic structural diagram of an electronic device suitable for implementing an embodiment of the present disclosure.
  • the electronic equipment in the embodiments of the present disclosure may include, but is not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players), and vehicle-mounted terminals (such as car navigation terminals), and fixed terminals such as digital TVs and desktop computers.
  • the electronic device shown in FIG. 7 is only an example, and should not limit the functions and scope of use of the embodiments of the present disclosure.
  • Fig. 7 shows a block diagram of other embodiments of the electronic device of the present disclosure.
  • an electronic device may include a processing device (such as a central processing unit, a graphics processing unit, etc.) 601, which may execute various appropriate actions and processes according to a program stored in a read-only memory (ROM) 602 or loaded from a storage device 608 into a random access memory (RAM) 603. The RAM 603 also stores various programs and data necessary for the operation of the electronic device.
  • the processing device 601, ROM 602, and RAM 603 are connected to each other through a bus 604.
  • An input/output (I/O) interface 605 is also connected to the bus 604 .
  • the following devices can be connected to the I/O interface 605: input devices 606 including, for example, a touch screen, touchpad, keyboard, mouse, image sensor, microphone, accelerometer, and gyroscope; output devices 607 including, for example, a liquid crystal display (LCD), speakers, and vibrators; storage devices 608 including, for example, magnetic tape and hard disks; and a communication device 609.
  • the communication means 609 may allow the electronic device to communicate with other devices wirelessly or by wire to exchange data. While FIG. 7 shows an electronic device having various means, it is to be understood that implementing or having all of the means shown is not a requirement. More or fewer means may alternatively be implemented or provided.
  • embodiments of the present disclosure include a computer program product, which includes a computer program carried on a computer-readable medium, where the computer program includes program codes for executing the methods shown in the flowcharts.
  • the computer program may be downloaded and installed from a network via communication means 609, or from storage means 608, or from ROM 602.
  • the processing device 601 When the computer program is executed by the processing device 601, the above-mentioned functions defined in the methods of the embodiments of the present disclosure are performed.
  • a chip including: at least one processor and an interface, where the interface is used to provide the at least one processor with computer-executable instructions, and the at least one processor is used to execute the computer-executable instructions to implement the reverberation duration estimation method or the audio signal rendering method of any of the above embodiments.
  • Figure 8 shows a block diagram of some embodiments of a chip of the present disclosure.
  • the processor 70 of the chip is mounted on the main CPU (Host CPU) as a coprocessor, and the tasks are assigned by the Host CPU.
  • the core part of the processor 70 is the operation circuit 703; the controller 704 controls the operation circuit 703 to fetch data from memory (weight memory or input memory) and perform operations.
  • the operation circuit 703 includes multiple processing units (Process Engine, PE).
  • arithmetic circuit 703 is a two-dimensional systolic array.
  • the arithmetic circuit 703 may also be a one-dimensional systolic array or other electronic circuits capable of performing mathematical operations such as multiplication and addition.
  • the arithmetic circuit 703 is a general-purpose matrix processor.
  • the operation circuit fetches the data corresponding to matrix B from the weight memory 702 and caches it in each PE of the operation circuit.
  • the operation circuit takes the data of matrix A from the input memory 701, performs a matrix operation with matrix B, and stores the partial or final results of the matrix in the accumulator 708.
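As an illustrative stand-in (plain Python, not the hardware), the matrix operation described above, with B held in the PEs, A streamed in, and results accumulated, is simply a matrix multiply with accumulation:

```python
def matmul_accumulate(A, B, acc=None):
    """acc += A @ B, mimicking the operation circuit: B acts as the cached
    weights, A as the streamed input, acc as the accumulator contents."""
    n, k, m = len(A), len(B), len(B[0])
    if acc is None:
        acc = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for t in range(k):
            a = A[i][t]
            for j in range(m):
                acc[i][j] += a * B[t][j]
    return acc
```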
  • the vector computing unit 707 can further process the output of the computing circuit, such as vector multiplication, vector addition, exponent operation, logarithmic operation, size comparison and so on.
  • the vector computation unit 707 can store the processed output vectors to the unified buffer 706.
  • the vector calculation unit 707 may apply a non-linear function to the output of the operation circuit 703, such as a vector of accumulated values, to generate activation values.
  • vector computation unit 707 generates normalized values, merged values, or both.
  • the vector of processed outputs can be used as an activation input to the arithmetic circuit 703, for example for use in a subsequent layer in a neural network.
  • the unified memory 706 is used to store input data and output data.
  • the direct memory access controller (DMAC) 705 transfers input data in the external memory to the input memory 701 and/or the unified memory 706, stores weight data from the external memory into the weight memory 702, and stores data in the unified memory 706 into the external memory.
  • a bus interface unit (Bus Interface Unit, BIU) 510 is used to realize the interaction between the main CPU, DMAC and instruction fetch memory 709 through the bus.
  • An instruction fetch buffer (instruction fetch buffer) 709 connected to the controller 704 is used to store instructions used by the controller 704;
  • the controller 704 is configured to invoke instructions cached in the memory 709 to control the operation process of the computing accelerator.
  • the unified memory 706, the input memory 701, the weight memory 702, and the instruction fetch memory 709 are all on-chip (On-Chip) memories
  • the external memory is a memory outside the NPU
  • the external memory can be a Double Data Rate Synchronous Dynamic Random Access Memory (DDR SDRAM), a High Bandwidth Memory (HBM), or other readable and writable memory.
  • a computer program including: instructions which, when executed by a processor, cause the processor to execute the reverberation duration estimation method or the audio signal rendering method of any one of the above embodiments.
  • a computer program product includes one or more computer instructions or computer programs.
  • the computer can be a general purpose computer, a special purpose computer, a computer network, or other programmable devices.
  • the present disclosure may take the form of a computer program product embodied on one or more computer-usable non-transitory storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Signal Processing (AREA)
  • Acoustics & Sound (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Stereophonic System (AREA)

Abstract

The present disclosure relates to an audio rendering method, an audio rendering device, and an electronic device. The audio rendering method includes: acquiring scene-related audio metadata, the scene-related audio metadata including information about the sound propagation paths between a sound source and a listener; determining parameters for audio rendering based on the scene-related audio metadata, the parameters for audio rendering including an energy attenuation coefficient for each sound propagation path; performing spatial encoding on the audio signal of the sound source based on the parameters for audio rendering to obtain an encoded audio signal; and performing spatial decoding on the encoded audio signal to obtain a decoded audio signal for audio rendering.

Description

Audio rendering method, audio rendering device and electronic device
CROSS-REFERENCE TO RELATED APPLICATIONS
This application claims the benefit of International Patent Application No. PCT/CN2021/121135, filed on September 28, 2021, which is incorporated herein by reference.
TECHNICAL FIELD
The present disclosure relates to the technical field of audio signal processing, and in particular to an audio rendering method, an audio rendering device, an electronic device, a non-transitory computer-readable storage medium, and a computer program product.
BACKGROUND
In ray-tracing-based spatial audio rendering, each sound propagation path between the listener and the sound source carries an energy attenuation coefficient or a set of energy attenuation coefficients. Factors affecting this coefficient include the directivity of the sound source, the reflective surfaces along the propagation path, and the air absorption coefficient. After the original source signal is attenuated by this coefficient, it can be expressed as the signal presented when the sound propagates along this path and finally reaches the listener.
However, when a sound propagation path with these characteristics becomes occluded, its energy vanishes instantly if no further processing is applied. The instantly vanishing energy produces an extremely steep volume step in that path's direction, causing perceptible artifacts such as clicking.
Conversely, when a sound propagation path has just been found, energy suddenly appears in that direction. The instantly appearing energy causes perceptible artifacts in that direction for the same reason.
Moreover, instantly vanishing path energy can also break the temporal continuity of the direction of reflected sound energy. In some path caching mechanisms based on the principle of temporal coherence, a path that is occluded, and whose energy is therefore cleared, is deleted immediately. But if the path is only temporarily occluded, for example when a car passing by the side of the listener briefly blocks path A arriving from the side, path A should continue to exist once the car has passed. In fact, however, path A is completely deleted.
发明内容
根据本公开的一些实施例，提供了一种音频渲染方法，包括：获取场景相关的音频元数据，所述场景相关的音频元数据包括声源到听者之间的声音传播路径的相关信息；基于场景相关的音频元数据确定用于音频渲染的参数，所述用于音频渲染的参数包括针对每条声音传播路径的能量衰减系数；基于所述用于音频渲染的参数，对声源的音频信号进行空间音频编码以获得编码的音频信号；以及对所述编码的音频信号进行空间音频解码，以获得用于音频渲染的解码的音频信号。
根据本公开的另一些实施例,提供了一种音频渲染设备,包括:元数据获取单元,被配置为获取场景相关的音频元数据,所述场景相关的音频元数据包括声源到听者之间的声音传播路径的相关信息;参数确定单元,被配置为基于场景相关的音频元数据确定用于音频渲染的参数,所述用于音频渲染的参数包括针对每条声音传播路径的能量衰减系数;空间音频编码单元,被配置为基于所述用于音频渲染的参数,对声源的音频信号进行空间音频编码以获得编码的音频信号;以及空间解码单元,被配置为对所述编码的音频信号进行空间音频解码,以获得用于音频渲染的解码的音频信号。
根据本公开的又一些实施例,提供了一种电子设备,包括:存储器;和耦接至存储器的处理器,所述处理器被配置为基于存储在所述存储器装置中的指令,执行本公开中所述的任一实施例的音频渲染方法。
根据本公开的再一些实施例,提供了一种计算机可读存储介质,其上存储有计算机程序,该程序被处理器执行时实现本公开中所述的任一实施例的音频渲染方法。
根据本公开的还一些实施例,提供了一种计算机程序产品,包括指令,所述指令当由处理器执行时实现本公开中所述的任一实施例的音频渲染方法。
通过以下参照附图对本公开的示例性实施例的详细描述,本公开的其它特征及其优点将会变得清楚。
附图说明
此处所说明的附图用来提供对本公开的进一步理解,构成本申请的一部分,本公开的示意性实施例及其说明用于解释本公开,并不构成对本公开的不当限定。在附图中:
图1示出了音频系统架构的一些实施例的示意图;
图2示出了根据本公开的实施例的音频渲染过程的示例性实现的流程图；
图3示出了声音传播路径在每次渲染时的状态转换的一些实施例的示意图;
图4示出了本公开的音频渲染方法的一些实施例的流程图;
图5示出了本公开的音频渲染设备的一些实施例的结构框图;
图6示出了本公开的电子设备的一些实施例的框图;
图7示出了本公开的电子设备的另一些实施例的框图;
图8示出了本公开的芯片的一些实施例的框图。
具体实施方式
下面将结合本公开实施例中的附图,对本公开实施例中的技术方案进行清楚、完整地描述,显然,所描述的实施例仅是本公开一部分实施例,而不是全部的实施例。以下对至少一个示例性实施例的描述实际上仅是说明性的,决不作为对本公开及其应用或使用的任何限制。基于本公开中的实施例,本领域普通技术人员在没有作出创造性劳动前提下所获得的所有其他实施例,都属于本公开保护的范围。
除非另外具体说明,否则在这些实施例中阐述的部件和步骤的相对布置、数字表达式和数值不限制本公开的范围。同时,应当明白,为了便于描述,附图中所示出的各个部分的尺寸并不是按照实际的比例关系绘制的。对于相关领域普通技术人员已知的技术、方法和设备可能不作详细讨论,但在适当情况下,所述技术、方法和设备应当被视为授权说明书的一部分。在这里示出和讨论的所有示例中,任何具体值应被解释为仅是示例性的,而不是作为限制。因此,示例性实施例的其它示例可以具有不同的值。应注意到:相似的标号和字母在下面的附图中表示类似项,因此,一旦某一项在一个附图中被定义,则在随后的附图中不需要对其进行进一步讨论。
图1示出了音频系统架构的一些实施例的示意图。其中示出了音频渲染过程/系统的各阶段的示例性实现,主要示出了音频系统中的制作和消费阶段,并且可选地还包括中间处理阶段,例如压缩。
如图1所示,在生产侧,根据音频数据和音频源数据,利用音轨接口和通用音频元数据(如ADM扩展等)进行授权和元数据标记。例如,还可以进行标准化处理。
在一些实施例中,将生产侧的处理结果进行空间音频的编码和解码处理,得到压缩结果。
在消费侧,根据生产侧的处理结果(或压缩结果),利用音轨接口和通用音频元数据(如ADM扩展等)进行元数据恢复和渲染处理;对处理结果进行音频渲染处理后输入到音频设备。
在一些实施例中，音频处理的输入可以包括场景相关的信息和元数据、基于目标的音频信号、FOA（First-Order Ambisonics，一阶全景声麦克风）、HOA（Higher-Order Ambisonics，高阶全景声麦克风）、立体声、环绕声等；音频处理的输出包括立体声音频输出等。
以下将结合附图来描述根据本公开的实施例的音频渲染的示例性实现，其中图2示出了根据本公开的实施例的音频渲染过程的示例性实现的流程图。作为示例，音频渲染系统主要包括渲染元数据系统和核心渲染系统。元数据系统中存在描述音频内容和渲染技术的控制信息，比如音频载荷的输入形式是单通道、双声道、多声道，还是对象（object）或声场HOA，以及动态的声源和听者的位置信息、渲染的声学环境信息（如房屋形状、大小、墙体材质等）。核心渲染系统依据不同的音频信号表示形式和从元数据系统解析出来的相应元数据，来做相应播放设备和环境的渲染。
首先,接收输入音频信号,并且根据输入音频信号的格式进行解析或者直传。一方面,在输入音频信号为具有任意空间音频交换格式的输入信号时,可对输入音频信号进行信号解析以获得具有特定空间音频表示的音频信号,例如基于对象的空间音频表示信号、基于场景的空间音频表示信号、基于声道的空间音频表示信号,以及相关联的元数据,然后将解析结果传递至后续处理阶段。另一方面,在输入音频信号直接为具有特定空间音频表示的音频信号时,无需进行解析而直接传递至后续处理阶段。例如,这样的音频信号可直传到音频编码阶段,例如可以是基于对象的音频表示信号、基于场景的音频表示信号、基于声道的音频表示信号中的需要编码的叙事声道音轨。甚至在该特定空间表示的音频信号为无需编码的类型/格式的情况下,可以直传到音频解码阶段,例如可以是解析出的基于声道的音频表示中的非叙事声道音轨,或者无需编码的叙事声道音轨。
然后,可以基于所获取的元数据来进行信息处理,从而提取并得到各音频信号相关的音频参数,这样的音频参数可以作为元数据信息。这里的信息处理可分别针对解析得到的音频信号以及直传的音频信号中的任一者来执行。当然,如前文所述,这样的信息处理是可选的,并不必需执行。
接下来,对于特定空间音频表示的音频信号来进行信号编码。一方面,可以基于元数据信息对特定空间音频表示的音频信号执行信号编码,所得到的编码音频信号或者直传到后续的音频解码阶段,或者得到中间信号并继而传输到后续的音频解码阶段。另一方面,在特定空间音频表示的音频信号不需要进行编码的情况下,这样的音频信号可以直传到音频解码阶段。
然后，在音频解码阶段，可以对所接收到的音频信号进行解码，以获得适合于在用户应用场景中进行回放的音频信号作为输出信号，这样的输出信号可通过用户应用场景（例如音频回放环境）中的音频回放设备被呈现给用户。
针对声音传播路径被遮挡导致路径方向上的能量骤升、骤降的问题,以及由于声音传播路径被短暂遮挡,导致路径被删除,使其在短暂遮挡事件结束后不可用的问题,本公开提出了通过为每条声音传播路径添加“被遮挡”状态,并为每条路径设置能量衰减系数来平滑声音效果的方案。
本公开为声音的每条声音传播路径定义了两种状态:有效和无效。在声音传播路径有效时又存在两种状态:被遮挡和不被遮挡。
在一条声音传播路径的能量小于阈值时,将该声音传播路径的状态判定为无效,并且将被判定为无效的声音传播路径删除。
在一条声音传播路径的能量不小于阈值时,将该声音传播路径的状态判定为有效。在声音传播路径被判定为有效时,继续检测声音传播路径是否被遮挡。在检测到属于声音传播路径的射线与场景相交的情况下,判定该声音传播路径被遮挡。对于被判定为处于“被遮挡”状态下的声音传播路径,逐帧降低其能量衰减系数。对于被判定为处于“不被遮挡”状态下的声音传播路径,逐帧增加其能量衰减系数,直至该系数变为1。
应理解,当一条声音路径被创建时,其能量衰减系数被设置为0。每次进行空间音频渲染时都对该声音传播路径的状态进行检测或判定。图3示出了根据本公开的实施例的声音传播路径在每次渲染时的状态转换的示意图。如图3所示,在新路径被创建时其状态应为“有效”,如果检测到声音传播路径的射线与场景相交,则判定为其被遮挡,将其状态改变为“被遮挡”。例如,一辆车从听者侧面经过,从侧面传来的路径A被短暂遮挡,此时渲染的声音传播路径的状态变为“被遮挡”;当车很快经过之后,路径A继续存在,此时渲染的声音传播路径的状态重新变为“有效”。在该声音传播路径的能量小于阈值时,该声音传播路径被判定为“无效”,并且无效的路径将被删除。
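上述状态转换逻辑可以用如下Python草图示意（其中状态名称、能量阈值的取值，以及用布尔参数表示射线遮挡检测结果，均为本示例的假设，并非对本公开实现方式的限定）：

```python
# 声音传播路径状态转换的简化示意（仅为说明性草图）
VALID, OCCLUDED, INVALID = "valid", "occluded", "invalid"

EPSILON = 0.0001  # 能量阈值，约对应 -40 dBfs 能量（示例取值）

def update_state(energy, is_blocked):
    """根据路径能量与遮挡检测结果返回路径的新状态。

    energy: 该路径当前能量；
    is_blocked: 属于该路径的射线是否与场景相交（即是否被遮挡）。
    """
    if energy < EPSILON:
        return INVALID            # 无效路径将被删除
    return OCCLUDED if is_blocked else VALID

# 能量充足且射线未被遮挡 -> 有效
assert update_state(0.5, False) == VALID
# 射线与场景相交（例如车辆短暂经过）-> 被遮挡，但不删除
assert update_state(0.5, True) == OCCLUDED
# 能量低于阈值 -> 无效，将被删除
assert update_state(5e-5, False) == INVALID
```

每一帧空间音频渲染时对每条路径调用一次这样的判定即可实现图3所示的状态机。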
根据本公开的一些实施例,声音传播路径的能量衰减系数g在每次渲染时根据路径状态进行平滑更新,常用的更新方法包括但不限于指数变化或者线性变化。
根据本公开的一些实施例，例如假设g_old是上一次更新获得的衰减系数，即上一帧的衰减系数，常用的指数变化更新方法为：

g = g_old*exp　（声音传播路径被遮挡时）
g = min(1, 1-exp*(1-g_old))　（声音传播路径不被遮挡时）　（公式1）

其中exp为预设的衰减速度，根据本发明的优选实施例，可将其设置为0.9。应理解，此优选值仅为示例性的，并不意在对其进行限制。实际上，该预设的衰减速度可根据实际需要进行设定。
根据本公开的另一些实施例，例如假设g_old是上一次更新获得的衰减系数，即上一帧的衰减系数，常用的线性变化更新方法为：

g = max(0, g_old-delta)　（声音传播路径被遮挡时）
g = min(1, g_old+delta)　（声音传播路径不被遮挡时）　（公式2）

其中delta为预设的衰减速度，根据本发明的优选实施例，可将其设置为0.05。应理解，此优选值同样仅为示例性的，并不意在对其进行限制。实际上，该预设的衰减速度可根据实际需要进行设定。
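结合上文被遮挡时的衰减规则与下文不被遮挡时"逐帧增加直至为1"的规则，两种平滑更新方法可以概括为如下草图（exp=0.9与delta=0.05仅采用本文给出的示例取值；被遮挡时线性递减结果不低于0为本示例的假设）：

```python
def update_gain_exponential(g_old, occluded, exp=0.9):
    """指数变化：被遮挡时按 g_old*exp 衰减；
    不被遮挡时取 1 与 1-exp*(1-g_old) 中的最小值，逐帧向1逼近。"""
    if occluded:
        return g_old * exp
    return min(1.0, 1.0 - exp * (1.0 - g_old))

def update_gain_linear(g_old, occluded, delta=0.05):
    """线性变化：被遮挡时递减 delta（本示例假设不低于0）；
    不被遮挡时取 1 与 g_old+delta 中的最小值。"""
    if occluded:
        return max(0.0, g_old - delta)
    return min(1.0, g_old + delta)

# 淡出：系数逐帧乘以0.9
assert abs(update_gain_exponential(1.0, occluded=True) - 0.9) < 1e-9
# 淡入：从0.5出发，一帧后为 1-0.9*(1-0.5) = 0.55
assert abs(update_gain_exponential(0.5, occluded=False) - 0.55) < 1e-9
# 线性淡入：0.5+0.05 = 0.55，且不超过1
assert abs(update_gain_linear(0.5, occluded=False) - 0.55) < 1e-9
```

由于每帧只做一次乘法或加法，平滑更新对渲染循环的额外开销可以忽略，同时消除了路径能量骤升、骤降带来的clicking杂音。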
假设某声音的一条传播路径在能量衰减中各频段的现有能量是p，频段数是N_bands，频段下标是ω，该路径的各频段能量衰减系数（即淡入淡出能量系数）是g，则该声音传播路径上的能量b可用多种计算方法进行计算。作为示例，下面给出了常用的两种计算方法。假设能量阈值是epsilon，能量b是否小于阈值可用如下两种常用计算方法中的任何一种进行计算：

b = (∑_{ω=0}^{N_bands-1} p(ω)*g(ω)) / N_bands < epsilon

或者

b = max(p(ω)) < epsilon，ω∈[0, N_bands-1]　（公式3）

其中epsilon可以取任意很小的浮点数，这决定了被渲染的路径能量的最小能量强度。在本公开的一些实施例中，可以采用epsilon=0.0001（-40dBfs能量）。
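上述两种能量判定方法可以如下实现（仅为说明性草图；p为各频段现有能量的列表，g为各频段的能量衰减系数列表，均为本示例的假设性数据表示）：

```python
EPSILON = 0.0001  # 约 -40 dBfs 能量的阈值（本文给出的示例取值）

def is_invalid_mean(p, g, epsilon=EPSILON):
    """各频段平均能量与能量衰减系数的乘积是否低于阈值。"""
    n = len(p)
    return sum(p[w] * g[w] for w in range(n)) / n < epsilon

def is_invalid_max(p, epsilon=EPSILON):
    """各频段现有能量中的最大值是否低于阈值（公式3）。"""
    return max(p) < epsilon

p = [0.2, 0.05, 0.0]          # 三个频段的现有能量
g = [1.0, 1.0, 1.0]           # 各频段能量衰减系数
assert not is_invalid_mean(p, g)   # 平均能量约0.083，高于阈值
assert not is_invalid_max(p)       # 最大值0.2，高于阈值
assert is_invalid_max([5e-5, 1e-5, 0.0])  # 最大值低于0.0001 -> 路径无效
```

当某条路径被判定为无效时（例如其衰减系数在被遮挡状态下已淡出至接近0），再将其从路径缓存中删除，即可避免短暂遮挡事件导致路径被过早删除。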
图4示出了根据本公开的实施例的音频渲染方法的示例流程图。
如图4所示,在步骤S410处,获取场景相关的音频元数据。根据本发明的实施例,场景相关的音频元数据可包括声学环境信息,例如声源到听者之间的声音传播路径的相关信息,包括但不限于声音传播路径的状态和能量。
如上面所记载的,声源到听者之间的声音传播路径的状态包括有效和无效,以及在有效状态下的被遮挡和不被遮挡子状态。在声音传播路径的能量小于阈值时,判定声音传播路径的状态为无效,否则判定声音传播路径的状态为有效。在检测到属于声音传播路径的射线与场景相交的情况下,判定声音传播路径被遮挡,否则判定声音传播路径不被遮挡。
在步骤S420处,基于场景相关的音频元数据确定用于音频渲染的参数。根据本发明的实施例,用于音频渲染的参数包括针对每条声音传播路径的能量衰减系数。
在步骤S430处，基于所述用于音频渲染的参数，对声源的音频信号进行空间音频编码以获得编码的音频信号。
在步骤S440处,对所述编码的音频信号进行空间音频解码,以获得用于音频渲染的解码的音频信号。
根据本发明的一些实施例,基于场景相关的音频元数据确定用于音频渲染的参数可包括基于声音传播路径的相关信息(例如,状态和能量),对每条声音传播路径的能量衰减系数进行调整。
根据本发明的一些实施例,对每条声音传播路径的能量衰减系数进行调整包括先通过比较声音传播路径的能量与阈值判定路径是否有效,然后在路径有效的情况下对声音传播路径是否被遮挡进行判定。具体地,在声音传播路径的能量小于阈值时,判定所述声音传播路径的状态为无效,并删除所述声音传播路径。在所述声音传播路径的能量不小于阈值时,判定所述声音传播路径的状态为有效。
如前面所记载的,每次进行空间音频渲染时都对该声音传播路径的状态进行检测或判定。也就是说,每一帧空间音频渲染都对该声音传播路径的状态进行检测或判定。
在判定所述声音传播路径的状态为有效的情况下进一步判定所述声音传播路径是否被遮挡,在所述声音传播路径被遮挡的情况下,逐帧降低所述能量衰减系数,并且在所述声音传播路径不被遮挡的情况下,逐帧增加所述能量衰减系数,直至所述能量衰减系数为1。
根据本发明的一些实施例,在声音传播路径被遮挡的情况下,逐帧降低所述能量衰减系数包括:响应于判断声音传播路径被遮挡,将现有能量衰减系数乘以预设的指数衰减速度,或者在现有能量衰减系数基础上递减预设的线性衰减速度。
这里声音传播路径的能量可以采用诸如上述公式3所记载的多种方法来计算,例如取能量衰减中各个频段的现有能量中的最大值作为该声音传播路径的能量或者取能量衰减中各个频段的平均能量与能量衰减系数的乘积作为该声音传播路径的能量。
根据本发明的一些实施例，在声音传播路径不被遮挡的情况下，逐帧增加所述能量衰减系数，直至所述能量衰减系数为1包括：逐帧确定每一帧的所述能量衰减系数为1与下列值中的最小值：1-exp*(1-g_old)，其中exp为指数衰减速度，g_old为上一帧的能量衰减系数；或者逐帧确定每一帧的所述能量衰减系数为1与下列值中的最小值：g_old+delta，其中delta为线性衰减速度，g_old为上一帧的能量衰减系数。
本公开通过判定声音传播路径的状态是"有效"还是"无效"，并在声音传播路径为"有效"的情况下进一步判定每条声音传播路径是否"被遮挡"：对判定为"被遮挡"的声音传播路径不立即删除，而是逐帧降低其能量衰减系数；同时对判定为"不被遮挡"的声音传播路径逐帧提升其能量衰减系数。由此解决了几何声学模拟（Geometric Acoustics）中突然出现/消失的声音路径引起的某些方向上声音能量的骤升/骤降，使得呈现的声音效果平滑、无杂音。
本公开实施例提供了一种音频渲染设备,图5示出了该音频渲染设备的示意性结构框图。如图5所示,音频渲染设备500包括元数据获取单元510、参数确定单元520、空间音频编码单元530和空间音频解码单元540。
根据本发明的实施例,元数据获取单元510被配置为获取场景相关的音频元数据,该场景相关的音频元数据例如可包括声源到听者之间的声音传播路径的相关信息,包括但不限于声音传播路径的状态和能量。
根据本发明的实施例,参数确定单元520被配置为基于场景相关的音频元数据确定用于音频渲染的参数,所述用于音频渲染的参数包括针对每条声音传播路径的能量衰减系数。
根据本发明的实施例,空间音频编码单元530被配置为基于所述用于音频渲染的参数,对声源的音频信号进行空间音频编码以获得编码的音频信号。
根据本发明的实施例,空间音频解码单元540被配置为对所述编码的音频信号进行空间音频解码,以获得用于音频渲染的解码的音频信号。
如上面所记载的,声源到听者之间的声音传播路径的状态包括有效和无效,以及在有效状态下的被遮挡和不被遮挡子状态,在此不赘述。
同样应理解,上述记载的用于音频渲染的参数可用于在图2中的空间编码模块中对音频信号进行空间编码。
根据本发明的一些实施例,参数确定单元520可进一步被配置为基于声音传播路径的相关信息(例如,状态和能量),对每条声音传播路径的能量衰减系数进行调整。
根据本发明的一些实施例,对每条声音传播路径的能量衰减系数进行调整包括先通过比较声音传播路径的能量与阈值判定路径是否有效,然后在路径有效的情况下对声音传播路径是否被遮挡进行判定。具体地,在声音传播路径的能量小于阈值时,判定所述声音传播路径的状态为无效,并删除所述声音传播路径。在所述声音传播路径的能量不小于阈值时,判定所述声音传播路径的状态为有效。
如前面所记载的,每次进行空间音频渲染时都对该声音传播路径的状态进行检测或判定。
在判定所述声音传播路径的状态为有效的情况下进一步判定所述声音传播路径是否被遮挡,在所述声音传播路径被遮挡的情况下,逐帧降低所述能量衰减系数,并且在所述声音传播路径不被遮挡的情况下,逐帧增加所述能量衰减系数,直至所述能量衰减系数为1。
根据本发明的一些实施例,在声音传播路径被遮挡的情况下,逐帧降低所述能量衰减系数包括:响应于判断声音传播路径被遮挡,将现有能量衰减系数乘以预设的指数衰减速度,或者在现有能量衰减系数基础上递减预设的线性衰减速度。
这里声音传播路径的能量可以采用诸如上述公式3所记载的多种方法来计算,例如取能量衰减中各个频段的现有能量中的最大值作为该声音传播路径的能量或者取能量衰减中各个频段的平均能量与能量衰减系数的乘积作为该声音传播路径的能量。
根据本发明的一些实施例，在声音传播路径不被遮挡的情况下，逐帧增加所述能量衰减系数，直至所述能量衰减系数为1包括：逐帧确定每一帧的所述能量衰减系数为1与下列值中的最小值：1-exp*(1-g_old)，其中exp为指数衰减速度，g_old为上一帧的能量衰减系数；或者逐帧确定每一帧的所述能量衰减系数为1与下列值中的最小值：g_old+delta，其中delta为线性衰减速度，g_old为上一帧的能量衰减系数。
图6示出了本公开的电子设备的一些实施例的框图。
如图6所示,该实施例的电子设备5包括:存储器51以及耦接至该存储器51的处理器52,处理器52被配置为基于存储在存储器51中的指令,执行本公开中任意一个实施例中的混响时长的估计方法,或者音频信号的渲染方法。
其中,存储器51例如可以包括系统存储器、固定非易失性存储介质等。系统存储器例如存储有操作系统、应用程序、引导装载程序(Boot Loader)、数据库以及其他程序等。
下面参考图7,其示出了适于用来实现本公开实施例的电子设备的结构示意图。本公开实施例中的电子设备可以包括但不限于诸如移动电话、笔记本电脑、数字广播接收器、PDA(个人数字助理)、PAD(平板电脑)、PMP(便携式多媒体播放器)、车载终端(例如车载导航终端)等等的移动终端以及诸如数字TV、台式计算机等等的固定终端。图7示出的电子设备仅是一个示例,不应对本公开实施例的功能和使用范围带来任何限制。
图7示出了本公开的电子设备的另一些实施例的框图。
如图7所示，电子设备可以包括处理装置（例如中央处理器、图形处理器等）601，其可以根据存储在只读存储器（ROM）602中的程序或者从存储装置608加载到随机访问存储器（RAM）603中的程序而执行各种适当的动作和处理。在RAM 603中，还存储有电子设备操作所需的各种程序和数据。处理装置601、ROM 602以及RAM 603通过总线604彼此相连。输入/输出（I/O）接口605也连接至总线604。
通常,以下装置可以连接至I/O接口605:包括例如触摸屏、触摸板、键盘、鼠标、图像传感器、麦克风、加速度计、陀螺仪等的输入装置606;包括例如液晶显示器(LCD)、扬声器、振动器等的输出装置607;包括例如磁带、硬盘等的存储装置608;以及通信装置609。通信装置609可以允许电子设备与其他设备进行无线或有线通信以交换数据。虽然图7示出了具有各种装置的电子设备,但是应理解的是,并不要求实施或具备所有示出的装置。可以替代地实施或具备更多或更少的装置。
根据本公开的实施例,上文参考流程图描述的过程可以被实现为计算机软件程序。例如,本公开的实施例包括一种计算机程序产品,其包括承载在计算机可读介质上的计算机程序,该计算机程序包含用于执行流程图所示的方法的程序代码。在这样的实施例中,该计算机程序可以通过通信装置609从网络上被下载和安装,或者从存储装置608被安装,或者从ROM 602被安装。在该计算机程序被处理装置601执行时,执行本公开实施例的方法中限定的上述功能。
在一些实施例中,还提供了芯片,包括:至少一个处理器和接口,接口,用于为至少一个处理器提供计算机执行指令,至少一个处理器用于执行计算机执行指令,实现上述任一个实施例的混响时长的估计方法,或者音频信号的渲染方法。
图8示出了本公开的芯片的一些实施例的框图。
如图8所示,芯片的处理器70作为协处理器挂载到主CPU(Host CPU)上,由Host CPU分配任务。处理器70的核心部分为运算电路,控制器704控制运算电路703提取存储器(权重存储器或输入存储器)中的数据并进行运算。
在一些实施例中,运算电路703内部包括多个处理单元(Process Engine,PE)。在一些实施例中,运算电路703是二维脉动阵列。运算电路703还可以是一维脉动阵列或者能够执行例如乘法和加法这样的数学运算的其它电子线路。在一些实施例中,运算电路703是通用的矩阵处理器。
例如，假设有输入矩阵A、权重矩阵B和输出矩阵C。运算电路从权重存储器702中取矩阵B相应的数据，并缓存在运算电路中每一个PE上。运算电路从输入存储器701中取矩阵A数据与矩阵B进行矩阵运算，得到的矩阵的部分结果或最终结果，保存在累加器（accumulator）708中。
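上述按分块取数、并将部分结果累加到累加器的计算过程，可以用如下纯Python草图示意（仅说明"部分和逐块累加"的思路，并非NPU运算电路的真实实现；分块大小k_tile为本示例的假设参数）：

```python
def matmul_accumulate(A, B, k_tile=2):
    """按K维分块计算 C = A @ B，将每个分块的部分和累加到accumulator。"""
    m, k = len(A), len(A[0])
    n = len(B[0])
    acc = [[0.0] * n for _ in range(m)]   # 扮演累加器708的角色
    for k0 in range(0, k, k_tile):        # 每次只"搬运"并计算一个K分块
        for i in range(m):
            for j in range(n):
                acc[i][j] += sum(A[i][kk] * B[kk][j]
                                 for kk in range(k0, min(k0 + k_tile, k)))
    return acc

A = [[1, 2, 3, 4],
     [5, 6, 7, 8]]
B = [[1, 0],
     [0, 1],
     [1, 1],
     [2, 0]]
C = matmul_accumulate(A, B)
# 与直接矩阵乘的结果逐项一致
expected = [[sum(A[i][kk] * B[kk][j] for kk in range(4)) for j in range(2)]
            for i in range(2)]
assert C == expected
```

无论K维分成多少块，累加器中最终都得到完整的矩阵乘结果，这正是"部分结果或最终结果保存在累加器中"的含义。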
向量计算单元707可以对运算电路的输出做进一步处理,如向量乘,向量加,指数运算,对数运算,大小比较等等。
在一些实施例中，向量计算单元707能将经处理的输出的向量存储到统一缓存器706。例如，向量计算单元707可以将非线性函数应用到运算电路703的输出，例如累加值的向量，用以生成激活值。在一些实施例中，向量计算单元707生成归一化的值、合并值，或二者均有。在一些实施例中，处理过的输出的向量能够用作运算电路703的激活输入，例如用于神经网络中后续层的计算。
统一存储器706用于存放输入数据以及输出数据。
存储单元访问控制器705(Direct Memory Access Controller,DMAC)将外部存储器中的输入数据搬运到输入存储器701和/或统一存储器706、将外部存储器中的权重数据存入权重存储器702,以及将统一存储器706中的数据存入外部存储器。
总线接口单元（Bus Interface Unit，BIU）510，用于通过总线实现主CPU、DMAC和取指存储器709之间的交互。
与控制器704连接的取指存储器(instruction fetch buffer)709,用于存储控制器704使用的指令;
控制器704，用于调用取指存储器709中缓存的指令，控制该运算加速器的工作过程。
一般地，统一存储器706、输入存储器701、权重存储器702以及取指存储器709均为片上（On-Chip）存储器，外部存储器为该NPU外部的存储器，该外部存储器可以为双倍数据率同步动态随机存储器（Double Data Rate Synchronous Dynamic Random Access Memory，DDR SDRAM）、高带宽存储器（High Bandwidth Memory，HBM）或其他可读可写的存储器。
在一些实施例中,还提供了一种计算机程序,包括:指令,指令当由处理器执行时使处理器执行上述任一个实施例的混响时长的估计方法,或者音频信号的渲染方法。
本领域内的技术人员应当明白，本公开可采用完全硬件实施例、完全软件实施例、或结合软件和硬件方面的实施例的形式。在使用软件实现时，上述实施例可以全部或部分地以计算机程序产品的形式实现。计算机程序产品包括一个或多个计算机指令或计算机程序。在计算机上加载或执行计算机指令或计算机程序时，全部或部分地产生按照本申请实施例的流程或功能。计算机可以为通用计算机、专用计算机、计算机网络、或者其他可编程装置。而且，本公开可采用在一个或多个其中包含有计算机可用程序代码的计算机可用非瞬时性存储介质（包括但不限于磁盘存储器、CD-ROM、光学存储器等）上实施的计算机程序产品的形式。
虽然已经通过示例对本公开的一些特定实施例进行了详细说明,但是本领域的技术人员应该理解,以上示例仅是为了进行说明,而不是为了限制本公开的范围。本领域的技术人员应该理解,可在不脱离本公开的范围和精神的情况下,对以上实施例进行修改。本公开的范围由所附权利要求来限定。

Claims (21)

  1. 一种音频渲染方法,包括:
    获取场景相关的音频元数据,所述场景相关的音频元数据包括声源到听者之间的声音传播路径的相关信息;
    基于音频元数据确定用于音频渲染的参数,所述用于音频渲染的参数包括针对每条声音传播路径的能量衰减系数;
    基于所述用于音频渲染的参数,对声源的音频信号进行空间音频编码以获得编码的音频信号;以及
    对所述编码的音频信号进行空间音频解码,以获得用于音频渲染的解码的音频信号。
  2. 根据权利要求1所述的音频渲染方法,其中,基于场景相关的音频元数据确定用于音频渲染的参数包括:基于所述声音传播路径的相关信息,对每条声音传播路径的能量衰减系数进行调整。
  3. 根据权利要求2所述的音频渲染方法,其中,声音传播路径的相关信息包括声音传播路径的状态和能量,并且其中,基于声音传播路径的相关信息,对每条声音传播路径的能量衰减系数进行调整包括:
    在所述声音传播路径的能量小于阈值时,判定所述声音传播路径的状态为无效并删除所述声音传播路径;
    在所述声音传播路径的能量不小于阈值时,判定所述声音传播路径的状态为有效;以及
    在判定所述声音传播路径的状态为有效的情况下判定所述声音传播路径是否被遮挡:
    在所述声音传播路径被遮挡的情况下,逐帧降低所述能量衰减系数;以及
    在所述声音传播路径不被遮挡的情况下,逐帧增加所述能量衰减系数,直至所述能量衰减系数为1。
  4. 根据权利要求3所述的音频渲染方法,其中在所述声音传播路径被遮挡的情况下,逐帧降低所述能量衰减系数包括:
    将现有能量衰减系数乘以预设的指数衰减速度。
  5. 根据权利要求3所述的音频渲染方法,其中在所述声音传播路径被遮挡的情况下,逐帧降低所述能量衰减系数包括:
    在现有能量衰减系数基础上递减预设的线性衰减速度。
  6. 根据权利要求3-5中任一项所述的音频渲染方法,其中所述声音传播路径的能量为:
    能量衰减中各个频段的现有能量中的最大值。
  7. 根据权利要求3-5中任一项所述的音频渲染方法,其中所述声音传播路径的能量为:
    能量衰减中各个频段的平均能量与能量衰减系数的乘积。
  8. 根据权利要求3-7中任一项所述的音频渲染方法,其中在所述声音传播路径不被遮挡的情况下,逐帧增加所述能量衰减系数,直至所述能量衰减系数为1包括:
    逐帧确定每一帧的所述能量衰减系数为1与下列值中的最小值：1-exp*(1-g_old)，其中exp为指数衰减速度，g_old为上一帧的能量衰减系数。
  9. 根据权利要求3-7中任一项所述的音频渲染方法,其中在所述声音传播路径不被遮挡的情况下,逐帧增加所述能量衰减系数,直至所述能量衰减系数为1包括:
    逐帧确定每一帧的所述能量衰减系数为1与下列值中的最小值：g_old+delta，其中delta为线性衰减速度，g_old为上一帧的能量衰减系数。
  10. 一种音频渲染设备,包括:
    元数据获取单元,被配置为获取场景相关的音频元数据,所述场景相关的音频元数据包括声源到听者之间的声音传播路径的相关信息;
    参数确定单元,被配置为基于音频元数据确定用于音频渲染的参数,所述用于音频渲染的参数包括针对每条声音传播路径的能量衰减系数;
    空间音频编码单元,被配置为基于所述用于音频渲染的参数,对声源的音频信号进行空间音频编码以获得编码的音频信号;以及
    空间音频解码单元,被配置为对所述编码的音频信号进行空间音频解码,以获得用于音频渲染的解码的音频信号。
  11. 根据权利要求10所述的音频渲染设备,其中所述参数确定单元进一步被配置为:基于所述声音传播路径的相关信息,对每条声音传播路径的能量衰减系数进行调整。
  12. 根据权利要求11所述的音频渲染设备,其中,声音传播路径的相关信息包括声音传播路径的状态和能量,并且其中,基于声音传播路径的相关信息,对每条声音传播路径的能量衰减系数进行调整包括:
    在所述声音传播路径的能量小于阈值时,判定所述声音传播路径的状态为无效并删除所述声音传播路径;
    在所述声音传播路径的能量不小于阈值时,判定所述声音传播路径的状态为有效;以及
    在判定所述声音传播路径的状态为有效的情况下判定所述声音传播路径是否被遮挡:
    在所述声音传播路径被遮挡的情况下,逐帧降低所述能量衰减系数;
    在所述声音传播路径不被遮挡的情况下,逐帧增加所述能量衰减系数,直至所述能量衰减系数为1。
  13. 根据权利要求12所述的音频渲染设备,其中在所述声音传播路径被遮挡的情况下,逐帧降低所述能量衰减系数包括:
    将现有能量衰减系数乘以预设的指数衰减速度。
  14. 根据权利要求12所述的音频渲染设备,其中在所述声音传播路径被遮挡的情况下,逐帧降低所述能量衰减系数包括:
    在现有能量衰减系数基础上递减预设的线性衰减速度。
  15. 根据权利要求12-14中任一项所述的音频渲染设备,其中所述声音传播路径的能量为:
    能量衰减中各个频段的现有能量中的最大值。
  16. 根据权利要求12-14中任一项所述的音频渲染设备,其中所述声音传播路径的能量为:
    能量衰减中各个频段的平均能量与能量衰减系数的乘积。
  17. 根据权利要求12-16中任一项所述的音频渲染设备,其中在所述声音传播路径不被遮挡的情况下,逐帧增加所述能量衰减系数,直至所述能量衰减系数为1包括:
    逐帧确定每一帧的所述能量衰减系数为1与下列值中的最小值：1-exp*(1-g_old)，其中exp为指数衰减速度，g_old为上一帧的能量衰减系数。
  18. 根据权利要求12-16中任一项所述的音频渲染设备,其中在所述声音传播路径不被遮挡的情况下,逐帧增加所述能量衰减系数,直至所述能量衰减系数为1包括:
    逐帧确定每一帧的所述能量衰减系数为1与下列值中的最小值：g_old+delta，其中delta为线性衰减速度，g_old为上一帧的能量衰减系数。
  19. 一种电子设备,包括:
    存储器;和
    耦接至所述存储器的处理器,所述处理器被配置为基于存储在所述存储器装置中的指令,执行根据权利要求1-9中任一项所述的音频渲染方法。
  20. 一种非瞬时性计算机可读存储介质,其上存储有计算机程序,该程序被处理器执行时实现根据权利要求1-9中任一项所述的音频渲染方法。
  21. 一种计算机程序产品,包括指令,所述指令当由处理器执行时使所述处理器执行根据权利要求1-9中任一项所述的音频渲染方法。
PCT/CN2022/122204 2021-09-28 2022-09-28 音频渲染方法、音频渲染设备和电子设备 WO2023051627A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CNPCT/CN2021/121135 2021-09-28
CN2021121135 2021-09-28

Publications (1)

Publication Number Publication Date
WO2023051627A1 true WO2023051627A1 (zh) 2023-04-06

Family

ID=85781336

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/122204 WO2023051627A1 (zh) 2021-09-28 2022-09-28 音频渲染方法、音频渲染设备和电子设备

Country Status (1)

Country Link
WO (1) WO2023051627A1 (zh)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB202319428D0 (en) 2023-12-18 2024-01-31 Prec Planting Llc Ultrasonic cleaning of stir chamber for agricultural sample slurry
GB202319421D0 (en) 2023-12-18 2024-01-31 Prec Planting Llc Ultrasonic cleaning of stir chamber for agricultural sample slurry
WO2024023729A1 (en) 2022-07-28 2024-02-01 Precision Planting Llc Agricultural sample packaging system and related methods

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050182608A1 (en) * 2004-02-13 2005-08-18 Jahnke Steven R. Audio effect rendering based on graphic polygons
CN102622518A (zh) * 2012-03-08 2012-08-01 中山大学 一种基于建筑物群密度的室外声预测方法
WO2020187807A1 (en) * 2019-03-19 2020-09-24 Koninklijke Philips N.V. Audio apparatus and method therefor
CN112365900A (zh) * 2020-10-30 2021-02-12 北京声智科技有限公司 一种语音信号增强方法、装置、介质和设备
CN112771894A (zh) * 2018-10-02 2021-05-07 高通股份有限公司 针对计算机介导现实系统进行渲染时表示遮挡



Similar Documents

Publication Publication Date Title
WO2023051627A1 (zh) 音频渲染方法、音频渲染设备和电子设备
JP7405989B2 (ja) マシン向け映像符号化における方法及び装置
US11062714B2 (en) Ambisonic encoder for a sound source having a plurality of reflections
US20180047400A1 (en) Method, terminal, system for audio encoding/decoding/codec
MX2012002182A (es) Determinacion de factor de escala de banda de frecuencia en la codificacion de audio con base en la energia de señal de banda de frecuencia.
CN114586055A (zh) 具有微结构掩码的多尺度因子图像超分辨率
CN113035223B (zh) 音频处理方法、装置、设备及存储介质
WO2023025294A1 (zh) 用于音频渲染的信号处理方法、装置和电子设备
US20240153481A1 (en) Audio signal rendering method and apparatus, and electronic device
CN114845212A (zh) 音量优化方法、装置、电子设备及可读存储介质
RU2728812C1 (ru) Системы и способы для отложенных процессов постобработки при кодировании видеоинформации
KR20210071972A (ko) 신호 처리 장치 및 방법, 그리고 프로그램
CN111698512B (zh) 视频处理方法、装置、设备及存储介质
US20220279300A1 (en) Steering of binauralization of audio
WO2019141258A1 (zh) 一种视频编码方法、视频解码方法、装置及系统
WO2023025143A1 (zh) 音频信号的处理方法和装置
KR101696997B1 (ko) Dsp 내장 코덱을 이용한 소음에 따른 출력 음향 크기 자동 조정 장치
US11863755B2 (en) Methods and apparatus to encode video with region of motion detection
KR102670181B1 (ko) 사운드 소스들의 다수의 배열들을 갖는 방향성 오디오 생성
CN114422782B (zh) 视频编码方法、装置、存储介质及电子设备
CN115396672B (zh) 比特流存储方法、装置、电子设备和计算机可读介质
WO2023051703A1 (zh) 一种音频渲染系统和方法
WO2022242534A1 (zh) 编解码方法、装置、设备、存储介质及计算机程序
CN117597732A (zh) 基于深度学习的语音增强的过度抑制减轻
CN115668369A (zh) 音频处理方法和装置

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22875010

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE