WO2018072214A1 - Mixed reality audio system - Google Patents

Mixed reality audio system

Info

Publication number
WO2018072214A1
Authority
WO
WIPO (PCT)
Prior art keywords: field, rendering, far, sound source, field rendering
Prior art date
Application number
PCT/CN2016/102958
Other languages
French (fr)
Chinese (zh)
Inventor
向裴
Original Assignee
向裴
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 向裴 filed Critical 向裴
Priority to PCT/CN2016/102958 priority Critical patent/WO2018072214A1/en
Publication of WO2018072214A1 publication Critical patent/WO2018072214A1/en

Classifications

    • G — PHYSICS
    • G10 — MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L — SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 — Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 — Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0272 — Voice signal separating
    • G10L21/028 — Voice signal separating using properties of sound source

Definitions

  • An audio processing system comprising: a sound source element parsing unit that parses a sound source signal into sound source elements requiring near-field rendering and sound source elements requiring far-field rendering; a near-field rendering unit that performs near-field rendering on the elements requiring near-field rendering, for near-field output; and a far-field rendering unit that performs far-field rendering on the elements requiring far-field rendering, for far-field output.
  • The audio processing system may further include a near-field/far-field transition unit for balancing, between the outputs of the near-field rendering unit and the far-field rendering unit, the near-field and far-field elements parsed from the same sound source.
  • The output unit is a headset.
  • The near-field rendering is a three-dimensional rendering for the headset.
  • FIG. 1 is an example illustrating a game scene.
  • FIG. 2 is a schematic diagram of a process flow of a near field terminal processing system in accordance with one embodiment of the present invention.
  • The near-field/far-field transition unit 404 can further cooperate with the sound source element parsing unit 401 to adjust over time which elements of the same sound source are parsed as requiring near-field rendering and which as requiring far-field rendering. That is, not only must the energy of the near-field and far-field rendering be balanced, but the components produced by the element parsing must also be adjusted over time, so as to reflect in real time the changing distance of the source (for example, when the player is moving relative to the source, or the source itself is moving).
  • FIG. 5 is a flow chart illustrating an audio processing method in accordance with the present invention.
  • an audio processing method 500 in accordance with the present invention may begin at step S501, in which the sound source signal is parsed into sound source elements requiring near field rendering and sound source elements requiring far field rendering. Then, in step S503, near-field rendering is performed on the sound source elements that require near-field rendering for near-field output. In step S505, far-field rendering is performed on the sound source elements that require far-field rendering for far-field output.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Quality & Reliability (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Stereophonic System (AREA)

Abstract

Provided is a mixed reality audio system. An audio processing system (400) comprises: a sound source element parsing unit (401) for parsing a sound source signal into sound source elements requiring near-field rendering and sound source elements requiring far-field rendering; a near-field rendering unit (402) for performing near-field rendering on the elements requiring near-field rendering, for near-field output; and a far-field rendering unit (403) for performing far-field rendering on the elements requiring far-field rendering, for far-field output. By jointly processing loudspeaker signals at different distances within the system, the audio processing system (400) gives users of virtual reality, augmented reality, mixed reality, and similar applications highly realistic three-dimensional audio localization and experience, overcoming the drawbacks, in three-dimensional sound rendering, of conventional methods based only on far-field loudspeaker playback or near-field headphone playback.

Description

Mixed reality audio system
Technical field
The present invention relates to audio signal processing, and more particularly to a mixed reality audio system and its signal processing.
Background
Terms such as three-dimensional (3D) sound, immersive audio, and virtual reality (VR) / augmented reality (AR) / mixed reality (MR) audio, though named differently, mostly refer to the same field: using software algorithms and supporting hardware to give the listener an immersive auditory experience, with good localization of the sound sources virtualized by the audio system and a good sense of presence, real or virtual, in the sound field.
Traditionally, this goal has mostly been pursued with the following methods:
I. A loudspeaker array surrounding the user: standard layouts include 5.1, 7.1, 11.1, 22.2, ATMOS, Auro 3D, DTS-X, and so on, as well as various non-standard, customized multi-speaker placements. This belongs to the far field or free field. It easily gives the user a good sense of immersion, because the sound really does come from surrounding sources; however, when a sound source very close to the user must be rendered, the limited number of loudspeakers makes it impossible to do so with anything resembling an optical-holography approach.
II. Headphone-based rendering: the signal is convolved with a head-related transfer function (HRTF) or binaural room impulse response (BRIR) to localize virtual sounds. Head tracking can be added to increase realism. The drawback is that the in-head effect is hard to overcome: because of the sound-field properties of headphones, individual differences between listeners' ears, and so on, most users still perceive the virtualized sound as being inside the head rather than outside it, reducing the realism of the rendering.
III. A front speaker array: an array in front of the user can apply signal-processing methods such as cross-talk cancellation to create a virtual 3D impression with loudspeakers placed only in front. The drawbacks are that this is very sensitive to the position of the user's head and is unsuitable for several users listening at the same time.
Summary of the invention
In view of the above problems, one object of the present invention is to provide a new audio processing system that, by jointly processing loudspeaker signals at different distances within the system, gives users of virtual reality (VR), augmented reality (AR), mixed reality (MR), and similar applications highly realistic three-dimensional audio localization and experience. The invention is intended to overcome the drawbacks, in three-dimensional sound rendering, of conventional methods based only on far-field loudspeaker playback or near-field headphone playback.
According to a first aspect of the present invention, there is provided an audio processing system comprising: a sound source element parsing unit that parses a sound source signal into sound source elements requiring near-field rendering and sound source elements requiring far-field rendering; a near-field rendering unit that performs near-field rendering on the elements requiring near-field rendering, for near-field output; and a far-field rendering unit that performs far-field rendering on the elements requiring far-field rendering, for far-field output.
The audio processing system may further comprise a near-field/far-field transition unit for balancing, between the outputs of the near-field rendering unit and the far-field rendering unit, the near-field and far-field elements parsed from the same sound source.
Preferably, the near-field/far-field transition unit further cooperates with the sound source element parsing unit to adjust over time which elements of the same sound source are parsed as requiring near-field rendering and which as requiring far-field rendering.
In the audio processing system, the near-field rendering unit performs near-field rendering using near-field headphone-type algorithms. Preferably, these include rendering algorithms based on head-related transfer function (HRTF) or binaural room impulse response (BRIR) convolution.
In the audio processing system, the far-field rendering unit performs far-field rendering using far-field loudspeaker algorithms. Preferably, these include object-based three-dimensional sound rendering algorithms.
The audio processing system may further comprise: a near-field rendering output unit for outputting the near-field rendered signal; and a far-field rendering output unit for outputting the far-field rendered signal. Preferably, the near-field rendering output unit is a pair of open headphones, and the far-field rendering output unit is a loudspeaker array.
According to a second aspect of the present invention, there is provided an audio processing method comprising: parsing a sound source signal into sound source elements requiring near-field rendering and sound source elements requiring far-field rendering; performing near-field rendering on the elements requiring near-field rendering, for near-field output; and performing far-field rendering on the elements requiring far-field rendering, for far-field output.
The audio processing method may further comprise: balancing the outputs, after near-field rendering and far-field rendering respectively, of the near-field and far-field elements parsed from the same sound source.
Preferably, the balancing further comprises: adjusting over time which elements of the same sound source are parsed as requiring near-field rendering and which as requiring far-field rendering.
In the audio processing method, near-field rendering is performed using near-field headphone-type algorithms. Preferably, these include rendering algorithms based on HRTF or BRIR convolution.
In the audio processing method, far-field rendering is performed using far-field loudspeaker algorithms. Preferably, these include object-based three-dimensional sound rendering algorithms.
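The method of the second aspect can be sketched end to end. In the sketch below, a simple distance threshold stands in for the parsing unit, and the renderers are placeholders; all names, the threshold, and the hard per-source assignment are illustrative assumptions, not part of the claims.

```python
# Hypothetical sketch of the claimed method: parse (S501), near-field
# render (S503), far-field render (S505).

def parse_elements(sources, near_limit=2.0):
    """S501: split sources into near-field and far-field element lists.

    A real parsing unit would weight each source's components; a hard
    distance threshold is used here only for illustration.
    """
    near = [s for s in sources if s["distance"] < near_limit]
    far = [s for s in sources if s["distance"] >= near_limit]
    return near, far

def render_near(src):
    # Stand-in for headphone rendering (e.g. HRTF/BRIR convolution).
    return ("headphones", src["name"])

def render_far(src):
    # Stand-in for loudspeaker-array rendering (e.g. VBAP/HOA).
    return ("speakers", src["name"])

def process(sources):
    near, far = parse_elements(sources)              # S501
    headphone_out = [render_near(s) for s in near]   # S503
    speaker_out = [render_far(s) for s in far]       # S505
    return headphone_out, speaker_out
```

In the full system, the parsing would instead split each source's components between the two paths, as the transition unit described below requires.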
According to a third aspect of the present invention, there is provided a near-field terminal processing system for mixed reality audio, comprising: a point source localization unit that determines the position of each point sound source for virtual localization; a scene-source near-field element extraction unit that extracts the elements of a scene sound source signal requiring near-field rendering; a near-field rendering unit that performs near-field rendering on the virtually localized point source signals together with the extracted near-field elements of the scene source signal; and an output unit that outputs the near-field rendered signal.
Preferably, the output unit is a pair of headphones, and the near-field rendering is three-dimensional rendering for headphones.
According to a fourth aspect of the present invention, there is provided a far-field terminal processing system for mixed reality audio, comprising: a point-source far-field element extraction unit that extracts the elements of a point source signal requiring far-field rendering; a scene-source processing unit that removes the near-field portion from the scene sound source signal; a far-field rendering unit that performs far-field rendering on the extracted far-field elements of the point source signals together with the scene source signal with its near-field portion removed; and an output unit that outputs the far-field rendered signal.
Preferably, the output unit is a loudspeaker array, and the far-field rendering is three-dimensional rendering for far-field loudspeakers.
The present invention parses out of the sound source signal the elements that require near-field rendering and the elements that require far-field rendering, and applies the corresponding near-field and far-field rendering to each. This provides highly realistic 3D audio localization and experience, in particular for 3D sound signals and for VR/AR/MR and similar applications, and thereby overcomes the drawbacks, in three-dimensional sound rendering, of conventional methods based only on far-field loudspeaker playback or near-field headphone playback.
Brief description of the drawings
The invention is described below with reference to the accompanying drawings in connection with embodiments. In the drawings:
FIG. 1 illustrates an example of a game scene.
FIG. 2 is a schematic diagram of the processing flow of a near-field terminal processing system in accordance with one embodiment of the present invention.
FIG. 3 is a schematic diagram of the processing flow of a far-field terminal processing system in accordance with one embodiment of the present invention.
FIG. 4 is a schematic block diagram illustrating an audio processing system in accordance with the present invention.
FIG. 5 is a flow chart illustrating an audio processing method in accordance with the present invention.
Detailed description
Specific embodiments of the present invention are explained in detail below with reference to the accompanying drawings.
The present invention proposes a new software/hardware audio system design aimed at VR/AR/MR application scenarios, although the invention itself is not limited to such scenarios. The mixed reality audio system according to the present invention is hereinafter referred to simply as the MRA system (Mixed Reality Audio system).
The MRA system processes near-field sound content with headphone-type algorithms such as HRTF/BRIR convolution, processes far-field content with a loudspeaker array, and tunes and controls the transition between the two fields within the MRA system itself.
FIG. 1 illustrates an example of a game scene.
As shown in FIG. 1, a bird's-eye view of a multiplayer game is depicted. The head orientations of players A and B are indicated by arrows, and the virtual objects in the game scene are represented by square icons; the rounded squares denote objects relatively far from the players (Beach, Boat, Wind, Airplane). In the example of FIG. 1, Monster and Bird are very close to the players, so headphone-based algorithms on A's and B's headsets should each render sound positions appropriate to their relative locations. For example, Bird should be to B's right, and to A's front-right but farther away; Monster should be behind B to the left, and in front of A to the right. The near-field rendering for A and for B must therefore be computed separately, and must take each player's head-rotation information into account. Distant objects such as Wind, Airplane, Beach, and Boat can instead be rendered by a ring of loudspeakers around the game space, so that A and B hear the same set of far-field sounds; since the loudspeaker positions relative to A and B do not change, head-rotation information need not be processed per player. For objects transitioning between the near field and the far field, the two processing modules should be combined to achieve a realistic effect.
From a signal-processing perspective, each player should have a terminal (or software virtual terminal) handling the processing related to that player's information; in addition, another terminal processes the shared scene and drives the far-field loudspeakers experienced by all players. The terminals are linked through a server and driven by a game engine or similar software system. From the game engine's point of view, the handling of sound sources can be summarized by the flows shown in FIG. 2 and FIG. 3.
FIG. 2 is a schematic diagram of the processing flow of a near-field terminal processing system in accordance with one embodiment of the present invention. Specifically, FIG. 2 depicts the near-field terminal processing flow for each player's headphones. For each point source, including objects in the game as well as other players' voices in the virtual world, an analysis module uses peripherals such as gyroscopes and position tracking to determine the virtual position at which each point source should be rendered, and applies headphone rendering algorithms such as HRTF or BRIR convolution, modeling, and so on, to render it to that player's headphone output. Each player's terminal processing is independent and different. Scene sources, such as the overall environment, are in most cases handled by the terminal that processes the far-field sound of the scene; for completeness, however, extracting from such sources the parts and elements that need near-field processing, for headphone rendering, is not excluded.
In summary, as shown in FIG. 2, the near-field terminal processing system for mixed reality audio according to an embodiment of the present invention comprises: a point source localization unit that determines the position of each point source for virtual localization; a scene-source near-field element extraction unit that extracts the elements of the scene source signal requiring near-field rendering; a near-field rendering unit that performs near-field rendering on the virtually localized point source signals together with the extracted near-field elements of the scene source signal; and an output unit that outputs the near-field rendered signal. The output unit is a pair of headphones, and the near-field rendering is three-dimensional rendering for headphones.
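As a concrete illustration of the headphone rendering step, a mono point source can be rendered binaurally by convolving it with a left/right head-related impulse response (HRIR) pair chosen for the source's virtual direction. The HRIRs below are crude placeholders (a pure interaural delay and attenuation for a source on the listener's left), not measured data:

```python
import numpy as np

def binaural_render(mono, hrir_left, hrir_right):
    """Convolve a mono source with an HRIR pair to get a two-ear signal."""
    left = np.convolve(mono, hrir_left)
    right = np.convolve(mono, hrir_right)
    return left, right

# Placeholder HRIRs: the right ear receives the sound slightly later and
# quieter than the left ear, a toy stand-in for a measured HRTF set.
hrir_l = np.zeros(8); hrir_l[0] = 1.0
hrir_r = np.zeros(8); hrir_r[3] = 0.7

mono = np.array([1.0, 0.5, 0.25])
left, right = binaural_render(mono, hrir_l, hrir_r)
```

In a real near-field terminal, the HRIR pair would be re-selected per block from the analysis module's virtual source position and the player's head-rotation data.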
FIG. 3 is a schematic diagram of the processing flow of a far-field terminal processing system in accordance with one embodiment of the present invention. Specifically, FIG. 3 depicts the terminal for far-field sound processing. A near-field point source generally has some components that need far-field processing: for example, the reverb an object produces through reflections from the surrounding environment, or the low-frequency portion of the headphone signal, which may be better served by the stronger bass capability of the far-field loudspeakers. After these direction-bearing parts are extracted, each point source is processed separately, and the results are summed and handed to the far-field terminal for output to the loudspeaker array. Point sources within the far field can be rendered with a variety of methods, namely any object-based three-dimensional sound rendering algorithm, such as vector base amplitude panning (VBAP), High Order Ambisonics (HOA) for rendering point-like objects, or the various object-based 3D sound rendering methods from companies such as Dolby and DTS. Scene sources should mostly be rendered and completed by the far-field loudspeaker array.
In summary, as shown in FIG. 3, the far-field terminal processing system for mixed reality audio according to an embodiment of the present invention comprises: a point-source far-field element extraction unit that extracts the elements of the point source signals requiring far-field rendering; a scene-source processing unit that removes the near-field portion from the scene source signal; a far-field rendering unit that performs far-field rendering on the extracted far-field elements of the point source signals together with the scene source signal with its near-field portion removed; and an output unit that outputs the far-field rendered signal. The output unit is a loudspeaker array, and the far-field rendering is three-dimensional rendering for far-field loudspeakers.
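Of the object-based methods named above, VBAP is the simplest to illustrate: per-speaker amplitude gains are found by inverting the base formed by the speaker direction vectors. A minimal two-dimensional sketch (the speaker angles are illustrative, not from the patent):

```python
import numpy as np

def vbap_2d(source_deg, spk_a_deg, spk_b_deg):
    """2-D pairwise VBAP: amplitude gains for a source between two speakers.

    Solves p = L @ g, where the columns of L are the loudspeaker unit
    vectors and p is the source direction, then power-normalizes g.
    """
    def unit(deg):
        rad = np.radians(deg)
        return np.array([np.cos(rad), np.sin(rad)])

    L = np.column_stack([unit(spk_a_deg), unit(spk_b_deg)])
    g = np.linalg.solve(L, unit(source_deg))
    return g / np.linalg.norm(g)  # keep total rendered power constant

# A source straight ahead, midway between speakers at +45 and -45 degrees,
# gets equal gains on both speakers.
gains = vbap_2d(0.0, 45.0, -45.0)
```

A full far-field terminal would pick the active speaker pair (or triplet, in 3-D) surrounding each source direction and apply these gains per source before summing into the array feed.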
The hardware and software details of a system implementing the present invention are discussed in more detail below.
Near-field headphone hardware: at minimum, open headphones, so that the user can still hear the external loudspeakers and the other players' conversation. The best option is a headphone that does not block the ear canal at all, for example:
1. Bone-conduction headphones: no occlusion at all, but because the sound does not pass through the ear canal, localization is somewhat weakened.
2. Sports headphones that only partially occlude the ear canal: currently uncommon on the market, but such products have existed.
3. Headphones that radiate toward the ear canal without occluding it at all: such headphones are rare, but designs such as Microsoft's HoloLens have appeared. The inventor has run similar experiments, placing miniature loudspeakers near the ear canal to play binaural signals, and obtained a very good externalized (out-of-head) virtual 3D sound effect. The drawback of this approach is poor coupling with the ear canal, so the low-frequency response is weaker than with conventionally worn headphones. Because the system also has far-field loudspeakers, the low-frequency portion can be fully compensated by the external loudspeakers, in a manner similar to the bass management systems already in wide traditional use.
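The bass compensation described in item 3 can be sketched as a complementary crossover: a low band routed to the far-field loudspeakers and the residual high band kept on the open earpieces. A one-pole filter stands in here for a real bass-management crossover; the cutoff and sample rate are illustrative assumptions:

```python
import math

def split_bass(signal, fs=48000.0, fc=120.0):
    """Split a signal into a low band (for the far-field speakers) and the
    complementary high band (for the open near-field earpieces)."""
    # One-pole low-pass: y[n] = a * x[n] + (1 - a) * y[n-1]
    a = 1.0 - math.exp(-2.0 * math.pi * fc / fs)
    low = []
    y = 0.0
    for x in signal:
        y = a * x + (1.0 - a) * y
        low.append(y)
    # Complementary high band, so that low + high reconstructs the input.
    high = [x - l for x, l in zip(signal, low)]
    return low, high
```

Because the split is complementary, what the speakers and the earpieces reproduce sums back to the original signal, which is what lets the external loudspeakers make up for the earpieces' weak bass.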
Near-field headphone algorithms: HRTF or BRIR convolution and similar 3D rendering techniques, plus a bass management system.
Far-field loudspeaker system: 5.1, 7.1, 11.1, 22.2, DTS-X, Dolby ATMOS, or a speaker system customized to the venue that covers the whole space, together with the corresponding software and hardware control.
Far-field loudspeaker algorithms: any object-based three-dimensional sound rendering method can be used, such as VBAP for point sources, High Order Ambisonics (HOA) for rendering point-like objects, or the various object-based 3D sound rendering methods from companies such as Dolby and DTS. Scene sources are adapted to the loudspeaker layout with traditional multichannel upmixing/downmixing, or HOA is optimized directly for each loudspeaker position.
Processing engine: for existing systems, the various game engines such as Unity, Unreal, and MaxPlay are good platforms for practice. The inventor's small-scale experiments are currently carried out in Unity.
Near-field/far-field transition module: each sound source is handled partly in the near-field terminal and partly in the far-field terminal. When a source's position in the scene crosses between the near field and the far field, a transitional software module is needed. The simplest implementation is roughly an energy-conserving cross fade: the source's components are split into a part belonging to the player's near-field space and a part belonging to the far-field loudspeakers that all players can hear, with the two parts together carrying energy equivalent to that of the source. Because the volume and position of the near-field part differ according to each player's specific location, while the far-field part must be shared in a single version, more specific designs are possible; this patent is intended to cover the variants of such designs based on this concept. The transition module can be understood as operating on the systems of FIG. 2 and FIG. 3, changing their modules over time.
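The energy-conserving cross fade described above can be sketched with equal-power gains driven by source distance. The near and far radii below are illustrative assumptions; only the constraint g_near² + g_far² = 1 comes from the energy-conservation idea in the text:

```python
import math

def transition_gains(distance, near_radius=0.5, far_radius=3.0):
    """Equal-power near/far split for one source at a given distance.

    Inside near_radius the source is fully near-field (headphones);
    beyond far_radius it is fully far-field (speaker array); in between,
    cos/sin gains keep g_near**2 + g_far**2 == 1, so the two rendered
    parts together carry the same energy as the source.
    """
    t = (distance - near_radius) / (far_radius - near_radius)
    t = min(max(t, 0.0), 1.0)
    theta = t * math.pi / 2.0
    return math.cos(theta), math.sin(theta)
```

Recomputing these gains every audio block, as the player or the source moves, yields the time-varying behavior the transition module requires.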
根据以上的描述,特别是结合图3和图4的图示说明,本发明实际上还提供了一种通用的音频处理系统和音频处理方法。In accordance with the above description, and particularly in connection with the illustrations of Figures 3 and 4, the present invention actually provides a general purpose audio processing system and audio processing method.
图4是图示说明根据本发明的音频处理系统的示意框图。如图4所示,根据本发明的音频处理系统400包括:声源元素解析单元401,其将各个声源信号(例如点声源、场景声源以及其他可能的声源分类信号)解析为需要近场渲染的声源元素和需要远场渲染的声源元素;近场渲染单元402,其对需要近场渲染的声源元素进行近场渲染,以进行近场输出;以及远场渲染单元403,其对需要远场渲染的声源元素进行远场渲染,以进行远场输出。4 is a schematic block diagram illustrating an audio processing system in accordance with the present invention. As shown in FIG. 4, the audio processing system 400 according to the present invention includes: a sound source element parsing unit 401 that parses each sound source signal (eg, a point sound source, a scene sound source, and other possible sound source classification signals) into a required Near-field rendered sound source elements and sound source elements that require far-field rendering; near-field rendering unit 402 that performs near-field rendering of sound source elements that require near-field rendering for near-field output; and far-field rendering unit 403 It performs far-field rendering of sound source elements that require far-field rendering for far-field output.
可选地,如前所述,音频处理系统400还可以包括近场远场过渡单元404。近场远场过渡单元404的作用在于平衡同一声源所解析出的需要近场渲染的声源元素和需要远场渲染的声源元素分别在近场渲染单元402和远场渲染单元403的输出。其平衡的结果在于,如前所述,实现大致为能量守恒的混音,让声源的分量分为属于玩家近场空间的部分以及属于玩家共同可以听到的远场音箱的部分,但两者加起来的能量和声源能量相当。Alternatively, as previously described, the audio processing system 400 can also include a near field far field transition unit 404. The role of the near-field far-field transition unit 404 is to balance the output of the sound source elements that need to be near-field rendering and the sound source elements that require far-field rendering that are resolved by the same sound source in the near-field rendering unit 402 and the far-field rendering unit 403, respectively. . The result of the balance is that, as mentioned above, a substantially energy-conserved mix is achieved, so that the components of the sound source are divided into parts belonging to the player's near-field space and parts of the far-field speakers that are common to the player, but two The added energy is equivalent to the energy of the sound source.
Furthermore, as noted above, the volume and position of the near-field component differ according to each player's specific location, whereas the far-field component can be shared as a single version. Based on the above teachings, those skilled in the art should be able to design more specific methods and firmware/hardware to implement this transition function.
On the other hand, the near-field/far-field transition unit 404 may further cooperate with the sound source element parsing unit 401 to adjust over time, for a given source, which elements are parsed out for near-field rendering and which for far-field rendering. That is, not only must the energies of the near-field and far-field renderings be balanced; the parsed components must also be adjusted over time so that changes in the source's distance are reflected in real time (for example, when the player moves relative to the source, or the source itself is moving).
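The time-varying adjustment can be sketched as recomputing the split on every audio frame and smoothing the gains so that a moving player or source does not cause audible jumps. All constants here (radii, smoothing coefficient) are illustrative assumptions:

```python
import math

def update_split(prev_gains, distance, alpha=0.2,
                 near_radius=1.0, far_radius=5.0):
    """Recompute the near/far gain pair for the current source distance
    and smooth it with a one-pole filter to suppress zipper noise."""
    t = (distance - near_radius) / (far_radius - near_radius)
    t = min(max(t, 0.0), 1.0)
    target_near = math.cos(t * math.pi / 2)
    target_far = math.sin(t * math.pi / 2)
    g_near = prev_gains[0] + alpha * (target_near - prev_gains[0])
    g_far = prev_gains[1] + alpha * (target_far - prev_gains[1])
    return g_near, g_far
```

Called once per frame, the gains track the target split and converge once the distance stops changing.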
In FIG. 4, the control exerted by the near-field/far-field transition unit 404 over the sound source element parsing unit 401, the near-field rendering unit 402, and the far-field rendering unit 403 is drawn with dashed lines. The transition unit 404 itself, being optional, is likewise drawn as a dashed box in FIG. 4.
In the audio processing system 400, the near-field rendering unit 402 performs near-field rendering using a headphone-class near-field algorithm. As described above, such algorithms include rendering by convolution with a head-related transfer function (HRTF) or a binaural room impulse response (BRIR).
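Rendering by impulse-response convolution can be illustrated with a direct time-domain convolution of a mono source against a left/right impulse-response pair. The short impulse responses used in practice as inputs would come from measured HRTF or BRIR data sets; nothing about this sketch is specific to any one data set:

```python
def convolve(x, h):
    """Direct-form convolution; output length is len(x) + len(h) - 1."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y

def binaural_render(mono, hrir_left, hrir_right):
    """Headphone-class near-field rendering sketch: filter the mono
    source with each ear's impulse response to get a binaural pair."""
    return convolve(mono, hrir_left), convolve(mono, hrir_right)
```

Production systems would use FFT-based (partitioned) convolution for long BRIRs, but the signal path is the same.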
In the audio processing system 400, the far-field rendering unit 403 performs far-field rendering using a far-field loudspeaker algorithm. As described above, far-field loudspeaker algorithms include object-based three-dimensional sound rendering algorithms, for example at least one of: vector base amplitude panning (VBAP) of point sources, rendering point-like objects with higher-order Ambisonics (HOA), or the various object-based three-dimensional sound rendering methods of companies such as Dolby and DTS.
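The pairwise two-dimensional case of VBAP can be sketched as solving g1*l1 + g2*l2 = p for the gains of a loudspeaker pair and normalising them to unit energy; the speaker azimuths in the usage below are illustrative, not part of this disclosure:

```python
import math

def vbap_2d(source_deg, spk1_deg, spk2_deg):
    """2-D VBAP sketch: express the source direction p as a weighted sum
    of the two loudspeaker unit vectors l1, l2, then normalise the
    weights so that g1**2 + g2**2 == 1."""
    def unit(deg):
        a = math.radians(deg)
        return (math.cos(a), math.sin(a))
    l1, l2, p = unit(spk1_deg), unit(spk2_deg), unit(source_deg)
    det = l1[0] * l2[1] - l2[0] * l1[1]       # determinant of [l1 l2]
    g1 = (p[0] * l2[1] - l2[0] * p[1]) / det  # Cramer's rule
    g2 = (l1[0] * p[1] - p[0] * l1[1]) / det
    norm = math.sqrt(g1 * g1 + g2 * g2)
    return g1 / norm, g2 / norm
```

A source midway between a pair at -30 and +30 degrees gets equal gains of 1/sqrt(2); a source at one speaker's own azimuth gets that speaker only.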
Optionally, the audio processing system 400 according to the present invention may further include: a near-field rendering output unit 405, which outputs the near-field-rendered signal; and a far-field rendering output unit 406, which outputs the far-field-rendered signal. The near-field rendering output unit may be a pair of open-back headphones, and the far-field rendering output unit may be a loudspeaker array. The hardware units for near-field and far-field output have already been discussed in detail above.
FIG. 5 is a flowchart illustrating an audio processing method according to the present invention. As shown in FIG. 5, an audio processing method 500 according to the present invention may begin at step S501, in which a sound source signal is parsed into sound source elements requiring near-field rendering and sound source elements requiring far-field rendering. Then, in step S503, near-field rendering is performed on the elements requiring near-field rendering, for near-field output. In step S505, far-field rendering is performed on the elements requiring far-field rendering, for far-field output.
Steps S503 and S505 may be performed simultaneously or sequentially. Note that the rendered near-field and far-field signals output by these two steps must be delivered to the player synchronously.
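The steps above can be sketched as one pipeline in which the two render stages are stubbed out as pass-throughs (standing in for the headphone and loudspeaker-array algorithms) and the split gains are illustrative; the point is that S503 and S505 each consume the components parsed in S501 and return sample-aligned buffers for synchronous playback:

```python
def render_near(elements):
    """Placeholder for the headphone-class near-field renderer (S503)."""
    return list(elements)

def render_far(elements):
    """Placeholder for the loudspeaker-array far-field renderer (S505)."""
    return list(elements)

def method_500(samples, g_near=0.6, g_far=0.8):
    # S501: parse the source into near and far components; the plain
    # gain split (0.6**2 + 0.8**2 == 1) keeps the mix energy-conserving.
    near_elems = [s * g_near for s in samples]
    far_elems = [s * g_far for s in samples]
    near_out = render_near(near_elems)  # S503
    far_out = render_far(far_elems)     # S505
    # Both rendered signals must stay aligned for synchronous output.
    assert len(near_out) == len(far_out)
    return near_out, far_out
```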
In addition, corresponding to the near-field/far-field transition unit of FIG. 4, the method 500 may optionally further include an operation of balancing, for a given source, the outputs after near-field rendering and far-field rendering of the elements parsed from that source.
Moreover, as described above, the balancing operation may further include adjusting over time which elements of a given source are parsed out for near-field rendering and which for far-field rendering.
In the audio processing method 500, near-field rendering is performed using a headphone-class near-field algorithm, which includes rendering by convolution with a head-related transfer function (HRTF) or a binaural room impulse response (BRIR).
In the audio processing method 500, far-field rendering is performed using a far-field loudspeaker algorithm, which includes object-based three-dimensional sound rendering algorithms, for example at least one of: vector base amplitude panning (VBAP) of point sources, rendering point-like objects with higher-order Ambisonics (HOA), or the various object-based three-dimensional sound rendering methods of companies such as Dolby and DTS.
Application scenarios
The present invention is applicable to single-player and multiplayer games, experience spaces, exhibition spaces, museums, educational spaces, and so on. Thanks to this invention, users enjoy a qualitative leap in their experience of three-dimensional sound.
Note: the terms "player" and "user" above are interchangeable; both refer to the end users of such systems.
From the above description, those skilled in the art will appreciate that, in addition to the technical solutions listed in the claims, the present invention may also include the following supplementary technical solutions.
Supplementary notes:
1. A computer-readable recording medium storing instructions which, when executed by one or more processors for audio processing, cause the one or more processors to perform the following operations:
parsing a sound source signal into sound source elements requiring near-field rendering and sound source elements requiring far-field rendering;
performing near-field rendering on the sound source elements requiring near-field rendering, for near-field output; and
performing far-field rendering on the sound source elements requiring far-field rendering, for far-field output.
2. The computer-readable recording medium of supplementary note 1, further storing instructions that cause the one or more processors to perform the following operation:
balancing the outputs, after near-field rendering and far-field rendering respectively, of the sound source elements requiring near-field rendering and the sound source elements requiring far-field rendering that are parsed from the same sound source.
3. The computer-readable recording medium of supplementary note 2, wherein the balancing operation further comprises: adjusting over time the sound source elements requiring near-field rendering and the sound source elements requiring far-field rendering that are parsed from the same sound source.
4. The computer-readable recording medium of supplementary note 1, wherein near-field rendering is performed using a headphone-class near-field algorithm.
5. The computer-readable recording medium of supplementary note 4, wherein the headphone-class near-field algorithm comprises a rendering algorithm using head-related transfer function or binaural room impulse response convolution.
6. The computer-readable recording medium of supplementary note 1, wherein far-field rendering is performed using a far-field loudspeaker algorithm.
7. The computer-readable recording medium of supplementary note 6, wherein the far-field loudspeaker algorithm comprises an object-based three-dimensional sound rendering algorithm.
8. A near-field terminal processing method for mixed reality audio, comprising:
determining the position of a point sound source for virtual positioning;
extracting the elements of a scene sound source signal that require near-field rendering;
performing near-field rendering on the virtually positioned point source signal together with the extracted elements of the scene source signal that require near-field rendering; and
outputting the near-field-rendered signal.
9. The near-field terminal processing method of supplementary note 8, wherein the near-field-rendered signal is output using headphones, and the near-field rendering is three-dimensional rendering for headphones.
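The near-field terminal chain of notes 8 and 9 can be sketched as follows. The data layout (a per-source sample list plus a distance), the near radius, and the 1/distance positioning gain are all illustrative assumptions, with plain gains standing in for real HRTF rendering:

```python
def near_field_terminal(point_sources, scene_elements, near_radius=1.5):
    """Mix the virtually positioned point sources with the scene
    elements that fall inside the near-field radius (headphone feed)."""
    mix = []

    def add(samples, gain):
        for i, s in enumerate(samples):
            if i == len(mix):
                mix.append(0.0)
            mix[i] += s * gain

    # Virtual positioning stub: simple 1/distance attenuation.
    for samples, distance in point_sources:
        add(samples, 1.0 / max(distance, 1.0))
    # Keep only scene elements that require near-field rendering.
    for samples, distance in scene_elements:
        if distance < near_radius:
            add(samples, 1.0)
    return mix
```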
10. A far-field terminal processing method for mixed reality audio, comprising:
extracting the elements of a point sound source signal that require far-field rendering;
removing the near-field-processed portion from a scene sound source signal;
performing far-field rendering on the extracted elements of the point source signal that require far-field rendering together with the scene source signal from which the near-field-processed portion has been removed; and
outputting the far-field-rendered signal.
11. The far-field terminal processing method of supplementary note 10, wherein the far-field-rendered signal is output using a loudspeaker array, and the far-field rendering is three-dimensional rendering for far-field loudspeakers.
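Complementarily, the far-field terminal chain of notes 10 and 11 can be sketched by subtracting the near-field-processed portion from the scene signal and mixing in the point-source elements tagged for far-field rendering; the per-sample representation is an illustrative assumption:

```python
def far_field_terminal(point_far_elements, scene_signal, scene_near_part):
    """Remove the near-field-processed portion from the scene signal,
    add the far-field point-source elements, and return the mix that
    would feed a VBAP/HOA loudspeaker-array renderer."""
    residual = [s - n for s, n in zip(scene_signal, scene_near_part)]
    return [r + p for r, p in zip(residual, point_far_elements)]
```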
12. A computer-readable recording medium storing instructions which, when executed by one or more processors performing near-field terminal processing for mixed reality audio, cause the one or more processors to perform the following operations:
determining the position of a point sound source for virtual positioning;
extracting the elements of a scene sound source signal that require near-field rendering;
performing near-field rendering on the virtually positioned point source signal together with the extracted elements of the scene source signal that require near-field rendering.
13. The computer-readable recording medium of supplementary note 12, wherein the near-field rendering is three-dimensional rendering for headphones.
14. A computer-readable recording medium storing instructions which, when executed by one or more processors performing far-field terminal processing for mixed reality audio, cause the one or more processors to perform the following operations:
extracting the elements of a point sound source signal that require far-field rendering;
removing the near-field-processed portion from a scene sound source signal;
performing far-field rendering on the extracted elements of the point source signal that require far-field rendering together with the scene source signal from which the near-field-processed portion has been removed.
15. The computer-readable recording medium of supplementary note 14, wherein the far-field rendering is three-dimensional rendering for far-field loudspeakers.
Various embodiments and implementations of the present invention have been described above. However, the spirit and scope of the present invention are not limited thereto. Those skilled in the art will be able to make further applications based on the teachings of the present invention, and such applications fall within the scope of the present invention.

Claims (20)

  1. An audio processing system, comprising:
    a sound source element parsing unit, which parses a sound source signal into sound source elements requiring near-field rendering and sound source elements requiring far-field rendering;
    a near-field rendering unit, which performs near-field rendering on the sound source elements requiring near-field rendering, for near-field output; and
    a far-field rendering unit, which performs far-field rendering on the sound source elements requiring far-field rendering, for far-field output.
  2. The audio processing system of claim 1, further comprising:
    a near-field/far-field transition unit, configured to balance the outputs of the near-field rendering unit and the far-field rendering unit for the sound source elements, parsed from the same sound source, that require near-field rendering and far-field rendering, respectively.
  3. The audio processing system of claim 2, wherein the near-field/far-field transition unit is further configured to cooperate with the sound source element parsing unit to adjust over time the sound source elements requiring near-field rendering and the sound source elements requiring far-field rendering that are parsed from the same sound source.
  4. The audio processing system of claim 1, wherein the near-field rendering unit performs near-field rendering using a headphone-class near-field algorithm.
  5. The audio processing system of claim 4, wherein the headphone-class near-field algorithm comprises a rendering algorithm using head-related transfer function (HRTF) or binaural room impulse response (BRIR) convolution.
  6. The audio processing system of claim 1, wherein the far-field rendering unit performs far-field rendering using a far-field loudspeaker algorithm.
  7. The audio processing system of claim 6, wherein the far-field loudspeaker algorithm comprises an object-based three-dimensional sound rendering algorithm.
  8. The audio processing system of claim 1, further comprising:
    a near-field rendering output unit for outputting the near-field-rendered signal; and
    a far-field rendering output unit for outputting the far-field-rendered signal.
  9. The audio processing system of claim 8, wherein the near-field rendering output unit is a pair of open-back headphones and the far-field rendering output unit is a loudspeaker array.
  10. An audio processing method, comprising:
    parsing a sound source signal into sound source elements requiring near-field rendering and sound source elements requiring far-field rendering;
    performing near-field rendering on the sound source elements requiring near-field rendering, for near-field output; and
    performing far-field rendering on the sound source elements requiring far-field rendering, for far-field output.
  11. The audio processing method of claim 10, further comprising:
    balancing the outputs, after near-field rendering and far-field rendering respectively, of the sound source elements requiring near-field rendering and the sound source elements requiring far-field rendering that are parsed from the same sound source.
  12. The audio processing method of claim 11, wherein the balancing operation further comprises: adjusting over time the sound source elements requiring near-field rendering and the sound source elements requiring far-field rendering that are parsed from the same sound source.
  13. The audio processing method of claim 10, wherein near-field rendering is performed using a headphone-class near-field algorithm.
  14. The audio processing method of claim 13, wherein the headphone-class near-field algorithm comprises a rendering algorithm using head-related transfer function (HRTF) or binaural room impulse response (BRIR) convolution.
  15. The audio processing method of claim 10, wherein far-field rendering is performed using a far-field loudspeaker algorithm.
  16. The audio processing method of claim 15, wherein the far-field loudspeaker algorithm comprises an object-based three-dimensional sound rendering algorithm.
  17. A near-field terminal processing system for mixed reality audio, comprising:
    a point source positioning unit, which determines the position of a point sound source for virtual positioning;
    a scene-source near-field element extraction unit, which extracts the elements of a scene sound source signal that require near-field rendering;
    a near-field rendering unit, which performs near-field rendering on the virtually positioned point source signal together with the extracted elements of the scene source signal that require near-field rendering; and
    an output unit, which outputs the near-field-rendered signal.
  18. The near-field terminal processing system of claim 17, wherein the output unit is a pair of headphones and the near-field rendering is three-dimensional rendering for headphones.
  19. A far-field terminal processing system for mixed reality audio, comprising:
    a point-source far-field element extraction unit, which extracts the elements of a point sound source signal that require far-field rendering;
    a scene source processing unit, which removes the near-field-processed portion from a scene sound source signal;
    a far-field rendering unit, which performs far-field rendering on the extracted elements of the point source signal that require far-field rendering together with the scene source signal from which the near-field-processed portion has been removed; and
    an output unit, which outputs the far-field-rendered signal.
  20. The far-field terminal processing system of claim 19, wherein the output unit is a loudspeaker array and the far-field rendering is three-dimensional rendering for far-field loudspeakers.
PCT/CN2016/102958 2016-10-21 2016-10-21 Mixed reality audio system WO2018072214A1 (en)

Priority Applications (1)

Application Number: PCT/CN2016/102958 (WO2018072214A1)
Priority Date: 2016-10-21
Filing Date: 2016-10-21
Title: Mixed reality audio system


Publications (1)

Publication Number: WO2018072214A1
Publication Date: 2018-04-26

Family ID: 62018120


Country Status (1)

WO: WO2018072214A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109597481A (en) * 2018-11-16 2019-04-09 Oppo广东移动通信有限公司 AR virtual portrait method for drafting, device, mobile terminal and storage medium
CN110782865A (en) * 2019-11-06 2020-02-11 上海音乐学院 Three-dimensional sound creation interactive system

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1212092A (en) * 1996-12-19 1999-03-24 北方电讯有限公司 Method and apparatus for computing measures of echo
CN1361972A (en) * 1999-05-20 2002-07-31 艾利森电话股份有限公司 Enhancement of near-end voice signals in an echo suppression system
CN1933517A (en) * 2005-09-13 2007-03-21 株式会社日立制作所 Voice call system and method of providing contents during a voice call
WO2012164444A1 (en) * 2011-06-01 2012-12-06 Koninklijke Philips Electronics N.V. An audio system and method of operating therefor
US20130094653A1 (en) * 2011-02-16 2013-04-18 Clearone Communications, Inc. Voip device, voip conferencing system, and related method
CN103106390A (en) * 2011-11-11 2013-05-15 索尼公司 Information processing apparatus, information processing method, and program
CN104010265A (en) * 2013-02-22 2014-08-27 杜比实验室特许公司 Audio space rendering device and method
CN104123950A (en) * 2014-07-17 2014-10-29 深圳市中兴移动通信有限公司 Sound recording method and device
US20150092947A1 (en) * 2013-09-30 2015-04-02 Sonos, Inc. Coordinator Device for Paired or Consolidated Players
CN105191354A (en) * 2013-05-16 2015-12-23 皇家飞利浦有限公司 An audio processing apparatus and method therefor




Legal Events

121: EP: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 16919238; Country of ref document: EP; Kind code of ref document: A1)
NENP: Non-entry into the national phase (Ref country code: DE)
32PN: EP: public notification in the EP bulletin as the address of the addressee cannot be established (Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 28/08/2019))
122: EP: PCT application non-entry in European phase (Ref document number: 16919238; Country of ref document: EP; Kind code of ref document: A1)