CN105959905B - Mixed-mode spatial sound generation system and method - Google Patents
- Publication number: CN105959905B
- Application number: CN201610268371.7A
- Authority: CN (China)
- Prior art keywords: audio object, branch, ambisonic, audio, independent
- Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04S—STEREOPHONIC SYSTEMS
- H04S7/00—Indicating arrangements; Control arrangements, e.g. balance control
- H04S7/30—Control circuits for electronic adaptation of the sound field
- H04S7/305—Electronic adaptation of stereophonic audio signals to reverberation of the listening space
- H04S7/306—For headphones
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/008—Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
Abstract
The invention discloses a mixed-mode spatial sound generation system and method. The mixed-mode spatial sound generation method comprises: inputting one or more audio objects; detecting the number of audio objects; when the number of audio objects exceeds a first threshold A, activating the ambisonic-domain branch and processing the audio objects with an ambisonic method to obtain virtual surround spatial sound; otherwise activating the independent-object rendering branch and processing the audio objects with an independent-object rendering method to obtain virtual surround spatial sound. The mixed-mode spatial sound generation system and method add a rendering control module that governs how audio objects are rendered, so that virtual surround sound can be generated effectively and at high quality while computational complexity is kept low for high-quality 3D audio.
Description
Technical field
The present invention relates to the field of signal processing technology, and in particular to a mixed-mode spatial sound generation system and method.
Background art
When presenting content to a user with a virtual-reality head-mounted display (HMD), virtual 3D audio technology plays the audio content to the user over stereo headphones, and the problem of improving the virtual surround effect must be faced. In virtual reality applications, when audio content is played over stereo headphones, the aim of virtual 3D audio is to achieve an effect as if the user were listening to a loudspeaker array (such as 5.1 or 7.1).
When producing virtual-reality audio content, several sound elements are usually needed. One way to improve the sense of presence is to track the user's head movements (head tracking) and process the sound accordingly. For example, if the original sound is perceived by the user as coming from the front, then after the user turns the head 90 degrees to the left, the sound should be processed so that the user perceives it as coming from 90 degrees to the right.
Virtual reality devices come in many types here, for example a display device with head tracking, or simply a pair of stereo headphones fitted with a head-tracking sensor.
There are also several ways to implement head tracking. The most common is to use multiple sensors: a motion-sensor suite typically includes an accelerometer, a gyroscope and a magnetometer. Each kind of sensor has its own inherent strengths and weaknesses for motion tracking and absolute orientation, so common practice is to apply sensor fusion, combining the signals from the individual sensors to produce a more accurate motion-detection result.
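As a minimal sketch of such sensor fusion (not taken from the patent; the function name, signal values and the filter constant are all illustrative), a complementary filter blends the gyroscope's short-term accuracy against the magnetometer's drift-free but noisy absolute heading:

```python
def fuse_yaw(yaw_prev, gyro_rate, mag_yaw, dt, alpha=0.98):
    """One fusion step for the yaw angle: integrate the gyroscope for
    short-term accuracy, and pull toward the magnetometer heading to
    cancel long-term drift. `alpha` weights the gyroscope path."""
    gyro_yaw = yaw_prev + gyro_rate * dt
    return alpha * gyro_yaw + (1.0 - alpha) * mag_yaw

# A stationary head with a biased gyroscope (0.1 rad/s of drift) while
# the magnetometer correctly reads 0: the fused yaw stays bounded near
# zero instead of drifting off by a full radian over 10 seconds.
yaw = 0.0
for _ in range(1000):          # 10 s at 100 Hz
    yaw = fuse_yaw(yaw, gyro_rate=0.1, mag_yaw=0.0, dt=0.01)
```

With `alpha = 0.98` the drift settles at a fixed point of about 0.049 rad rather than accumulating without bound, which is why blending the two sensors beats either one alone.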
Once the head-rotation angle has been obtained, the sound must be transformed accordingly.
For audio objects, common practice is to filter with an HRTF (Head-Related Transfer Function) filter to obtain virtual surround sound. The time-domain counterpart of the HRTF is the HRIR (Head-Related Impulse Response). Alternatively, the source is convolved with a binaural room impulse response (BRIR). A binaural room impulse response consists of three parts: the direct sound, some discrete reflections, and the late reverberation (reverberation tail).
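The per-object convolution just described can be sketched as follows (a toy example: the five-sample BRIR pair is invented for illustration, while real BRIRs run to thousands of samples):

```python
import numpy as np

def binauralize(mono, brir_left, brir_right):
    """Convolve one mono audio object with a binaural room impulse
    response pair; the direct sound, discrete reflections and the
    reverberation tail are all contained in the BRIR itself."""
    return np.convolve(mono, brir_left), np.convolve(mono, brir_right)

# Toy BRIR: direct sound at t=0, one reflection, short decaying tail.
brir_l = np.array([1.0, 0.0, 0.3, 0.1, 0.05])
brir_r = np.array([0.8, 0.0, 0.4, 0.1, 0.05])
# A unit impulse as the source simply reproduces the BRIR at each ear.
left, right = binauralize(np.array([1.0, 0.0, 0.0]), brir_l, brir_r)
```

The cost of this approach is one convolution pair per audio object, which is exactly the complexity problem the next paragraph raises.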
The drawback of convolving audio objects directly with BRIRs is that if the scene is complex and contains a large number of audio objects, complexity becomes very high; for many audio playback terminals this causes excessive power consumption or even makes playback impossible. On a virtual-reality device, the positions of the audio objects must additionally be adjusted in real time according to head movement, which increases the computational load even further, making this traditional approach impractical on mobile virtual-reality devices.
Another approach is to move the sound into the ambisonic domain and then transform the signal with a rotation matrix. The specific practice is to convert the audio to a B-format signal, convert the B-format signal into virtual loudspeaker array signals, and filter the virtual loudspeaker array signals with HRTF filters to obtain virtual surround sound. However, this method lacks flexibility in sound rendering and cannot control individual sources precisely.
It can be seen that the two methods above each have strengths and weaknesses in efficiency and effect. In view of this, the art needs a solution that generates virtual surround sound both effectively and at high quality.
Summary of the invention
The object of the present invention is to provide a mixed-mode spatial sound generation system and method, to solve the prior-art problem of being unable to keep computational complexity low while producing high-quality 3D audio.
To achieve the above object, the mixed-mode spatial sound generation system of the present invention comprises a rendering control module, an ambisonic encoder, a binaural transcoder, and a headphone-and-head-tracking device. The rendering control module is connected to the ambisonic encoder and to the binaural transcoder respectively; the ambisonic encoder is connected to the binaural transcoder; and the headphone-and-head-tracking device is connected to the ambisonic encoder and to the binaural transcoder respectively. The rendering control module receives one or more audio objects and detects the number of audio objects. When the number of audio objects exceeds a first threshold A, it activates the ambisonic-domain branch formed by the ambisonic encoder, processes the audio objects with an ambisonic method, obtains virtual surround spatial sound and transfers it to the ambisonic encoder, and the ambisonic encoder outputs the binaural virtual surround signal of the virtual surround spatial sound. Otherwise it activates the independent-object rendering branch formed by the binaural transcoder, processes the audio objects with an independent-object rendering method, obtains virtual surround spatial sound, and outputs the binaural virtual surround signal of the virtual surround spatial sound.
The rendering control module is further used to detect the metadata of the audio objects; the metadata includes the time, the position of the corresponding audio object in three-dimensional space, and the divergence of the audio object. The rendering control module determines the processing mode of an audio object according to its divergence: if the divergence of an audio object is greater than a second threshold B, the audio object is provisionally assigned to the ambisonic-domain branch. After the provisional assignment, the computational complexity is calculated according to the current state of the audio-object processing device, and whether to reassign audio objects is decided according to the computational complexity; the computational complexity is obtained by counting the execution cycles of the audio-object processing device. When the computational complexity permits N audio objects and there are currently M audio objects, the independent-object rendering branch can handle 0 to N-T audio objects and the ambisonic-domain branch can handle M-N+T audio objects. If the number H of audio objects assigned to the independent-object rendering branch is less than N-T, then any number from 1 to N-T-H of the audio objects in the ambisonic-domain branch are reassigned to the independent-object rendering branch. Here N is greater than T, M is greater than 0, and H is greater than or equal to 0. If N is less than T, the independent-object rendering branch is used for everything; if N equals T, either the ambisonic-domain branch or the independent-object rendering branch is used for everything.
The rendering control module determines the assignment of an audio object according to the divergence of its source: if the divergence of the source is higher than X, then, provided the complexity constraint is met, the audio object is assigned to the ambisonic-domain branch; conversely, the audio object is assigned to the independent-object rendering branch. X is specified by the user.
The present invention also provides a mixed-mode spatial sound generation method, comprising the following steps:
Input one or more audio objects.
Detect the number of audio objects. When the number of audio objects exceeds a first threshold A, activate the ambisonic-domain branch and process the audio objects with an ambisonic method to obtain virtual surround spatial sound; otherwise activate the independent-object rendering branch and process the audio objects with an independent-object rendering method to obtain virtual surround spatial sound.
The mixed-mode spatial sound generation method further comprises detecting the metadata of the audio objects; the metadata includes the time, the position of the corresponding audio object in three-dimensional space, and the divergence of the audio object.
The mixed-mode spatial sound generation method further comprises determining the processing mode of an audio object according to its divergence: if the divergence of an audio object is greater than a second threshold B, the audio object is provisionally assigned to the ambisonic-domain branch.
After the provisional assignment, the computational complexity is calculated according to the current state of the audio-object processing device, and whether to reassign audio objects is decided according to the computational complexity.
The computational complexity is obtained by counting the execution cycles of the audio-object processing device; one ambisonic-domain branch is equivalent in complexity to T independent audio branches. When the computational complexity permits N audio objects and there are currently M audio objects, the independent-object rendering branch can handle 0 to N-T audio objects and the ambisonic-domain branch can handle M-N+T audio objects. If the number H of audio objects assigned to the independent-object rendering branch is less than N-T, then any number from 1 to N-T-H of the audio objects in the ambisonic-domain branch are reassigned to the independent-object rendering branch. Here N is greater than T, M is greater than 0, and H is greater than or equal to 0. If N is less than T, the independent-object rendering branch is used for everything; if N equals T, either the ambisonic-domain branch or the independent-object rendering branch is used for everything.
In a further preferred embodiment, the assignment of an audio object is determined according to the divergence of its source: if the divergence of the source is higher than X, then, provided the complexity constraint is met, the audio object is assigned to the ambisonic-domain branch; conversely, the audio object is assigned to the independent-object rendering branch. X is specified by the user.
The mixed-mode spatial sound generation method detects the number of audio objects and their metadata in either a static mode or a dynamic mode. The static mode detects the number of audio objects and their metadata only at the very beginning. The dynamic mode dynamically adjusts, over time, how the audio objects are assigned between the two branches, the independent-object rendering branch and the ambisonic-domain branch.
The specific practice of the dynamic mode is to use fixed-interval sampling or non-fixed-interval sampling. Fixed-interval sampling detects the number of audio objects and their metadata at fixed time intervals. Non-fixed-interval sampling is based on the start times of the audio objects: the number of audio objects and their metadata are detected at the start and end of each new audio object.
The present invention has the following advantage: the mixed-mode spatial sound generation system and method add a rendering control module that governs how audio objects are rendered, and can keep complexity low while producing high-quality 3D audio.
Brief description of the drawings
Fig. 1 is a schematic structural diagram of the mixed-mode spatial sound generation system of the present invention.
Embodiments
The following embodiments illustrate the present invention but do not limit its scope.
As shown in Fig. 1, the present invention provides a mixed-mode spatial sound generation system comprising a rendering control module, an ambisonic encoder, a binaural transcoder, and a headphone-and-head-tracking device. The rendering control module is connected to the ambisonic encoder and to the binaural transcoder respectively; the ambisonic encoder is connected to the binaural transcoder; and the headphone-and-head-tracking device is connected to the ambisonic encoder and to the binaural transcoder respectively. The rendering control module receives one or more audio objects and detects the number of audio objects. When the number of audio objects exceeds a first threshold A, it activates the ambisonic-domain branch formed by the ambisonic encoder, processes the audio objects with an ambisonic method, obtains virtual surround spatial sound and transfers it to the ambisonic encoder, and the ambisonic encoder outputs the binaural virtual surround signal of the virtual surround spatial sound. Otherwise it activates the independent-object rendering branch formed by the binaural transcoder, processes the audio objects with an independent-object rendering method, obtains virtual surround spatial sound, and outputs the binaural virtual surround signal of the virtual surround spatial sound.
The headphone-and-head-tracking device obtains the user's head-rotation angle and transmits the user's head-rotation angle to the ambisonic encoder and to the binaural transcoder respectively; the ambisonic encoder and the binaural transcoder each process the audio objects according to the user's head-rotation angle to obtain virtual surround spatial sound.
Processing the audio objects according to the user's head-rotation angle means rotating the B-format signal of the audio objects according to that angle to obtain the rotated B-format signal. Specifically, a rotation matrix is generated from the rotation angle, and then, according to this rotation matrix, the B-format signal of the audio objects (i.e. the signal to be adjusted) is rotated. "Rotation" here means multiplying the rotation matrix by the signal matrix to be adjusted; rotation does not change the magnitudes of the components of the audio-signal matrix, only their directions. The order of the rotation matrix is adapted to the audio-signal matrix: when the signal matrix to be adjusted is [W X Y]^T, the rotation matrix is 3×3; when the signal matrix to be adjusted is [W X Y Z]^T, the rotation matrix is 4×4.
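The concrete rotation matrices are not reproduced in the text above; as a sketch under the assumption of the standard first-order ambisonic B-format convention (W omnidirectional, X front, Y left, Z up), a yaw rotation by an angle θ leaves W and Z unchanged and rotates the X and Y components:

```python
import numpy as np

def yaw_rotation_matrix(theta, with_z=True):
    """Rotation matrix for a B-format signal matrix: 4x4 for [W X Y Z]^T,
    3x3 for [W X Y]^T — the matrix order matches the signal matrix."""
    c, s = np.cos(theta), np.sin(theta)
    if with_z:
        return np.array([[1.0, 0.0, 0.0, 0.0],
                         [0.0,   c,  -s, 0.0],
                         [0.0,   s,   c, 0.0],
                         [0.0, 0.0, 0.0, 1.0]])
    return np.array([[1.0, 0.0, 0.0],
                     [0.0,   c,  -s],
                     [0.0,   s,   c]])

# Rotation changes the direction of the components, not their magnitude:
b = np.array([1.0, 0.5, 0.0, 0.2])             # [W, X, Y, Z]
rotated = yaw_rotation_matrix(np.pi / 2) @ b   # head turned 90 degrees
```

A frontal source (energy in X) ends up entirely in Y after a 90-degree turn, while the overall signal norm is preserved, matching the statement that rotation only changes component direction.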
The rendering control module is further used to detect the metadata of the audio objects; the metadata includes the time, the position of the corresponding audio object in three-dimensional space, and the divergence of the audio object. The rendering control module determines the processing mode of an audio object according to its divergence: if the divergence of an audio object is greater than a second threshold B, the audio object is provisionally assigned to the ambisonic-domain branch. After the provisional assignment, the computational complexity is calculated according to the current state of the audio-object processing device, and whether to reassign audio objects is decided according to the computational complexity; the computational complexity is obtained by counting the execution cycles of the audio-object processing device.
Divergence (diffuseness) here indicates whether a sound has a definite spatial direction (such as a point source) or is relatively diffuse, as ambient sound tends to be. Divergence ranges over [0, 1]: a value of 0 means the audio object has low divergence and approximates a point source; a value of 1 means non-directional ambient sound.
One ambisonic-domain branch is equivalent in complexity to T independent audio branches, and no matter how many audio objects are assigned to one ambisonic-domain branch, it always has the complexity of T independent audio branches. Under normal circumstances T = 8, i.e. one ambisonic-domain branch is equivalent in complexity to 8 independent audio branches. The specific value of T must, however, be determined for the actual audio-object processing device, and different audio-object processing devices may have different values of T.
When the computational complexity permits N audio objects and there are currently M audio objects, the independent-object rendering branch can handle 0 to N-T audio objects and the ambisonic-domain branch can handle M-N+T audio objects. If the number H of audio objects assigned to the independent-object rendering branch is less than N-T, then any number from 1 to N-T-H of the audio objects in the ambisonic-domain branch are reassigned to the independent-object rendering branch. Here N is greater than T, M is greater than 0, and H is greater than or equal to 0. If N is less than T, the independent-object rendering branch is used for everything; if N equals T, either the ambisonic-domain branch or the independent-object rendering branch is used for everything.
For example, suppose the computational complexity permits 8 audio objects, there are currently 8 audio objects, 3 are provisionally assigned to the independent-object rendering branch, and 5 are provisionally assigned to the ambisonic-domain branch. Since one ambisonic-domain branch has the complexity of T (T = 8) independent audio branches regardless of how many audio objects are assigned to it, the provisional assignment "3 on the independent-object rendering branch, 5 on the ambisonic-domain branch" requires the complexity budget to permit 3 + 8 = 11 audio objects, while in this example it permits only 8. Therefore either the 5 audio objects in the ambisonic-domain branch must be reassigned to the independent-object rendering branch (all 8 audio objects then go to the independent-object rendering branch, satisfying the budget of 8 audio objects), or the 3 audio objects in the independent-object rendering branch must be reassigned to the ambisonic-domain branch (all 8 audio objects then go to the ambisonic-domain branch; since one ambisonic-domain branch costs the equivalent of T = 8 independent audio branches, this likewise satisfies the budget of 8 audio objects).
Now suppose the computational complexity permits 8 audio objects, there are currently 14, 3 are provisionally assigned to the independent-object rendering branch, and 11 are provisionally assigned to the ambisonic-domain branch. This provisional assignment requires the budget to permit 3 + T audio objects (with T = 8 under normal circumstances, that is 3 + T = 11, while the actual budget permits only 8), so reassignment is needed. The independent-object rendering branch may receive 0 to N-T audio objects (N, the number the budget permits, is 8 in this example, and T = 8 under normal circumstances); since N-T = 8-8 = 0 here, 0 audio objects can go to the independent-object rendering branch, and the 3 audio objects provisionally assigned to it must be reassigned to the ambisonic-domain branch. The number of audio objects actually assigned to the ambisonic-domain branch is M-N+T (M, the current number of audio objects, is 14 in this example, so M-N+T = 14-8+8 = 14), i.e. 14 audio objects are actually assigned to the ambisonic-domain branch. In other words, the result of the reassignment is that the 3 audio objects provisionally assigned to the independent-object rendering branch are moved to the ambisonic-domain branch, so that all 14 current audio objects are assigned to the ambisonic-domain branch.
Finally, suppose the computational complexity permits 12 audio objects, there are currently 20 (i.e. M = 20), 3 are provisionally assigned to the independent-object rendering branch, and 17 are provisionally assigned to the ambisonic-domain branch. This provisional assignment requires the budget to permit 3 + T audio objects (with T = 8 under normal circumstances, that is 3 + T = 11, while the actual budget permits 12 audio objects), so reassignment is possible. Since the number of audio objects assigned to the independent-object rendering branch is 3 (i.e. H = 3), which is less than N-T = 12-8 = 4, any number from 1 to N-T-H = 12-8-3 = 1 of the audio objects in the ambisonic-domain branch, i.e. 1 audio object, may be reassigned to the independent-object rendering branch.
In another embodiment, the assignment of audio objects is determined according to divergence. If the divergence of an audio object is higher than X (0 ≤ X ≤ 1), then, provided the complexity constraint is met, the source is assigned to the ambisonic-domain branch; conversely, the audio object is assigned to the independent-object rendering branch.
In a preferred embodiment, X = 0.5; that is, if the divergence of a source is higher than 0.5 (this value is of course not fixed — X may take any value between 0 and 1, or be specified by the user), then, provided the complexity constraint is met, the source is assigned to the ambisonic-domain branch; conversely, the source is assigned to the independent-object rendering branch.
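Assuming metadata fields as described earlier (time, three-dimensional position, and a divergence value in [0, 1]; the dictionary keys and example sources below are illustrative), this divergence-based assignment can be sketched as:

```python
def assign_by_divergence(objects, X=0.5):
    """Split audio objects by divergence: diffuse sources (divergence > X)
    go to the ambisonic-domain branch, point-like sources to the
    independent-object rendering branch."""
    ambisonic, independent = [], []
    for obj in objects:
        (ambisonic if obj["divergence"] > X else independent).append(obj["name"])
    return ambisonic, independent

sources = [
    {"name": "footsteps", "divergence": 0.1},  # point-like: localize precisely
    {"name": "rain",      "divergence": 0.9},  # diffuse ambient sound
    {"name": "voice",     "divergence": 0.0},
]
amb, ind = assign_by_divergence(sources)
```

Diffuse ambience tolerates the ambisonic branch's looser per-source control, while point-like sources keep the independent branch's precise localization, which is the trade-off the embodiment encodes.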
The present invention also provides a mixed-mode spatial sound generation method, comprising the following steps:
Input one or more audio objects.
Detect the number of audio objects. When the number of audio objects exceeds a first threshold A, activate the ambisonic-domain branch and process the audio objects with an ambisonic method to obtain virtual surround spatial sound; otherwise activate the independent-object rendering branch and process the audio objects with an independent-object rendering method to obtain virtual surround spatial sound.
In a preferred embodiment, the first threshold A equals 8. In other embodiments, the first threshold A may be set arbitrarily by the technician according to actual requirements.
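With A = 8 as in this preferred embodiment, the top-level branch selection amounts to a single comparison (function and branch names are illustrative):

```python
def select_branch(num_objects, A=8):
    """Activate the ambisonic-domain branch when the object count exceeds
    the first threshold A; otherwise the independent-object branch."""
    return "ambisonic_domain" if num_objects > A else "independent_object"

assert select_branch(16) == "ambisonic_domain"
assert select_branch(3) == "independent_object"
assert select_branch(8) == "independent_object"  # exactly A does not exceed A
```

The strict inequality matters at the boundary: a scene with exactly A objects still fits the independent-object branch's complexity budget.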
The mixed-mode spatial sound generation method further comprises detecting the metadata of the audio objects; the metadata includes the time, the position of the corresponding audio object in three-dimensional space, and the divergence of the audio object.
The mixed-mode spatial sound generation method further comprises determining the processing mode of an audio object according to its divergence: if the divergence of an audio object is greater than a second threshold B, the audio object is provisionally assigned to the ambisonic-domain branch.
In a preferred embodiment, the second threshold B equals 0.5. In other embodiments, the second threshold B may be set arbitrarily by the technician according to actual requirements.
After the provisional assignment, the computational complexity is calculated according to the current state of the audio-object processing device, and whether to reassign audio objects is decided according to the computational complexity.
The computational complexity can be obtained by counting the execution cycles of the audio-object processing device. When the computational complexity permits N audio objects and there are currently M audio objects, the independent-object rendering branch can handle 0 to N-T audio objects and the ambisonic-domain branch can handle M-N+T audio objects. If the number H of audio objects assigned to the independent-object rendering branch is less than N-T, then any number from 1 to N-T-H of the audio objects in the ambisonic-domain branch are reassigned to the independent-object rendering branch. Here N is greater than or equal to T, M is greater than 0, and H is greater than or equal to 0. If N is less than T, the independent-object rendering branch is used for everything; if N equals T, then, according to the divergence of the audio objects, either the ambisonic-domain branch or the independent-object rendering branch is used for everything.
In a further preferred embodiment, the assignment of an audio object is determined according to the divergence of its source: if the divergence of the source is higher than X, then, provided the complexity constraint is met, the source is assigned to the ambisonic-domain branch; conversely, the source is assigned to the independent-object rendering branch. X is specified by the user.
As described above, the independent-object rendering method and the ambisonic method each have strengths and weaknesses in efficiency and effect when processing audio objects. The strength of the independent-object rendering method is precise localization; its weakness is that if the scene is complex and contains a large number of audio objects, complexity becomes very high, and for many audio playback terminals this causes excessive power consumption or even makes playback impossible. The strength of the ambisonic method is that its computational complexity stays essentially constant; its weakness is that it lacks flexibility in sound rendering and cannot control individual sources precisely.
The mixed-mode spatial sound generation method of the present invention therefore chooses between the independent-object rendering method and the ambisonic method, determining how many audio objects to assign to the independent-object rendering branch and how many to assign to the ambisonic-domain branch. For example, when precise localization is needed, as many audio objects as possible are assigned to the independent-object rendering branch, subject to the computational-complexity requirement; when the computational load is very large, more audio objects are assigned to the ambisonic-domain branch.
The mixed-mode spatial sound generation method of the present invention detects the number of audio objects and their metadata in either a static mode or a dynamic mode. The static mode detects the number of audio objects and their metadata only at the very beginning. Because the number of audio objects differs from moment to moment while the spatial sound is being generated, and environmental factors also change, the static mode is not the optimal solution, but its advantage is simplicity.
The dynamic mode adjusts over time how the audio objects are assigned between the two branches, the independent-object rendering branch and the ambisonic-domain branch. This can be done with fixed-interval sampling or non-fixed-interval sampling. Fixed-interval sampling detects the number of audio objects and their metadata once per fixed time period (for example, once per second). Non-fixed-interval sampling is based on the start times of the audio objects: the number of audio objects and their metadata are detected at the moment each new audio object begins and ends.
Although the present invention has been described in detail above with general explanations and specific embodiments, modifications and improvements based on the invention will be apparent to those skilled in the art. Accordingly, such modifications and improvements made without departing from the spirit of the invention belong to the scope of protection claimed by the invention.
Claims (6)
1. A mixed-mode spatial sound generation system, characterized in that the mixed-mode spatial sound generation system comprises a rendering control module, an ambisonic encoder, a binaural transcoder, and a headphone with head tracker; the rendering control module is connected to the ambisonic encoder and to the binaural transcoder respectively, the ambisonic encoder is connected to the binaural transcoder, and the headphone with head tracker is connected to the ambisonic encoder and to the binaural transcoder respectively; the rendering control module is configured to receive one or more audio objects and detect the number of audio objects; when the number of audio objects is greater than a first threshold A, the ambisonic-domain branch formed by the ambisonic encoder is activated, the audio objects are processed with the ambisonic method to obtain virtual surround spatial sound, which is passed to the ambisonic encoder, and the ambisonic encoder outputs the binaural virtual surround signal of the virtual surround spatial sound; otherwise the independent-object rendering branch formed by the binaural transcoder is activated, the audio objects are processed with the independent-object rendering method to obtain virtual surround spatial sound, and the binaural virtual surround signal of the virtual surround spatial sound is output; the rendering control module is further configured to detect the metadata of the audio objects, the metadata comprising the time, the position of the corresponding audio object in three-dimensional space, and additionally the divergence; the rendering control module determines the processing mode of an audio object according to its divergence: if the divergence of an audio object is greater than a second threshold B, the audio object is temporarily assigned to the ambisonic-domain branch; after the temporary assignment is finished, the computational complexity is calculated according to the current state of the audio object processing device, and whether to reassign audio objects is determined according to the computational complexity; the computational complexity is obtained by counting the execution cycles of the audio object processing device.
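The claim-1 routing logic can be sketched in a few lines. The thresholds A and B and the object fields (`id`, `divergence`) are assumptions made for the sketch, not names used by the patent.

```python
def route(objects, A, B):
    """Route audio objects per the claim-1 logic: if the object count
    exceeds threshold A, the ambisonic-domain branch is activated;
    otherwise the independent-object rendering branch. An object whose
    divergence exceeds threshold B is temporarily assigned to the
    ambisonic branch regardless of the active branch."""
    branch = "ambisonic" if len(objects) > A else "independent"
    routing = {}
    for obj in objects:
        if obj["divergence"] > B:
            routing[obj["id"]] = "ambisonic"  # temporary assignment
        else:
            routing[obj["id"]] = branch
    return branch, routing
```

A later complexity check (execution-cycle count, per the claim) would then decide whether any of the temporary assignments are revised.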
2. The mixed-mode spatial sound generation system of claim 1, characterized in that one ambisonic-domain branch is equivalent in complexity to T independent audio branches; when the computational complexity allows N audio objects and there are currently M audio objects, the independent-object rendering branch can handle 0 to N-T audio objects and the ambisonic-domain branch can handle M-N+T audio objects; if the number H of audio objects assigned to the independent-object rendering branch is less than N-T, then any number from 1 to N-T-H of the audio objects in the ambisonic-domain branch are reassigned to the independent-object rendering branch; N is greater than T, M is greater than 0, and H is greater than or equal to 0; if N is less than T, the independent-object rendering branch is used for all objects; if N is equal to T, then according to the divergence of the audio objects, either the ambisonic-domain branch is used for all objects or the independent-object rendering branch is used for all objects.
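The reassignment arithmetic of claim 2 can be written out as a sketch, using the claim's own variables N, M, T, and H. How many of the movable objects to actually move, and the divergence decision in the N = T case, are left open by the claim, so they appear here as assumptions.

```python
def reassign(N, M, T, H, move=None):
    """N: objects the complexity budget allows; M: current objects;
    T: independent branches equivalent to one ambisonic branch;
    H: objects currently on the independent-object rendering branch.

    Returns (independent_count, ambisonic_count) after reassignment.
    """
    if N < T:
        return M, 0          # all objects use the independent branch
    if N == T:
        # The claim decides by divergence here; this sketch simply
        # keeps everything in the ambisonic domain as one outcome.
        return 0, M
    capacity = N - T         # independent branch handles 0..N-T objects
    if H < capacity:
        movable = capacity - H            # up to N-T-H objects may move
        k = movable if move is None else min(move, movable)
        return H + k, M - (H + k)
    return H, M - H
```

For instance, with N = 10, T = 2, M = 8 and H = 3, the independent branch has capacity 8, so up to 5 objects can be moved back out of the ambisonic domain.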
3. The mixed-mode spatial sound generation system of claim 1, characterized in that the rendering control module determines the assignment of an audio object according to its divergence; if the divergence of an audio object is higher than X, the audio object is assigned to the ambisonic-domain branch provided the complexity requirement is met; otherwise the audio object is assigned to the independent-object rendering branch; X is specified by the user.
4. A mixed-mode spatial sound generation method, characterized in that the mixed-mode spatial sound generation method comprises the following steps:
inputting one or more audio objects;
detecting the number of audio objects; when the number of audio objects is greater than a first threshold A, activating the ambisonic-domain branch and processing the audio objects with the ambisonic method to obtain virtual surround spatial sound; otherwise activating the independent-object rendering branch and processing the audio objects with the independent-object rendering method to obtain virtual surround spatial sound;
after the temporary assignment is finished, calculating the computational complexity according to the current state of the audio object processing device, and determining whether to reassign audio objects according to the computational complexity; the computational complexity is obtained by counting the execution cycles of the audio object processing device;
when the computational complexity allows N audio objects and there are currently M audio objects, the independent-object rendering branch can handle 0 to N-T audio objects and the ambisonic-domain branch can handle M-N+T audio objects; if the number H of audio objects assigned to the independent-object rendering branch is less than N-T, then any number from 1 to N-T-H of the audio objects in the ambisonic-domain branch are reassigned to the independent-object rendering branch; N is greater than T, M is greater than 0, and H is greater than or equal to 0; if N is less than T, the independent-object rendering branch is used for all objects; if N is equal to T, then according to the divergence of the audio objects, either the ambisonic-domain branch is used for all objects or the independent-object rendering branch is used for all objects;
the mixed-mode spatial sound generation method further comprises detecting the metadata of the audio objects, the metadata comprising the time, the position of the corresponding audio object in three-dimensional space, and additionally the divergence of the audio object;
the mixed-mode spatial sound generation method further comprises determining the processing mode of an audio object according to its divergence: if the divergence of an audio object is greater than a second threshold B, the audio object is temporarily assigned to the ambisonic-domain branch.
5. The mixed-mode spatial sound generation method of claim 4, characterized in that the assignment of an audio object is determined according to the divergence of the sound source; if the divergence of the sound source is higher than X, the audio object is assigned to the ambisonic branch provided the complexity requirement is met; otherwise the audio object is assigned to the independent-source rendering branch; X is specified by the user.
6. The mixed-mode spatial sound generation method of claim 4 or 5, characterized in that the mixed-mode spatial sound generation method detects the number of audio objects and their metadata in either a static mode or a dynamic mode; the static mode detects the number of audio objects and their metadata only once, at the start; the dynamic mode adjusts over time how the audio objects are assigned between the two branches, the independent-object rendering branch and the ambisonic-domain branch; the dynamic mode uses either fixed-interval sampling or non-fixed-interval sampling; fixed-interval sampling detects the number of audio objects and their metadata once per fixed time period; non-fixed-interval sampling is based on the start times of the audio objects, detecting the number of audio objects and their metadata at the moment each new audio object begins and ends.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610268371.7A CN105959905B (en) | 2016-04-27 | 2016-04-27 | Mixed mode spatial sound generates System and method for |
Publications (2)
Publication Number | Publication Date |
---|---|
CN105959905A CN105959905A (en) | 2016-09-21 |
CN105959905B true CN105959905B (en) | 2017-10-24 |
Family
ID=56915643
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610268371.7A Active CN105959905B (en) | 2016-04-27 | 2016-04-27 | Mixed mode spatial sound generates System and method for |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105959905B (en) |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109286889A (en) * | 2017-07-21 | 2019-01-29 | 华为技术有限公司 | A kind of audio-frequency processing method and device, terminal device |
US10469968B2 (en) * | 2017-10-12 | 2019-11-05 | Qualcomm Incorporated | Rendering for computer-mediated reality systems |
CN111508507B (en) * | 2019-01-31 | 2023-03-03 | 华为技术有限公司 | Audio signal processing method and device |
CN113873420B (en) * | 2021-09-28 | 2023-06-23 | 联想(北京)有限公司 | Audio data processing method and device |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9516446B2 (en) * | 2012-07-20 | 2016-12-06 | Qualcomm Incorporated | Scalable downmix design for object-based surround codec with cluster analysis by synthesis |
CN105191354B (en) * | 2013-05-16 | 2018-07-24 | 皇家飞利浦有限公司 | Apparatus for processing audio and its method |
US9747910B2 (en) * | 2014-09-26 | 2017-08-29 | Qualcomm Incorporated | Switching between predictive and non-predictive quantization techniques in a higher order ambisonics (HOA) framework |
CN105120421B (en) * | 2015-08-21 | 2017-06-30 | 北京时代拓灵科技有限公司 | A kind of method and apparatus for generating virtual surround sound |
CN105376690A (en) * | 2015-11-04 | 2016-03-02 | 北京时代拓灵科技有限公司 | Method and device of generating virtual surround sound |
- 2016-04-27: CN CN201610268371.7A patent/CN105959905B/en active Active
Also Published As
Publication number | Publication date |
---|---|
CN105959905A (en) | 2016-09-21 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN105959905B (en) | Mixed mode spatial sound generates System and method for | |
CN105872940B (en) | A kind of virtual reality sound field generation method and system | |
CN102572676B (en) | A kind of real-time rendering method for virtual auditory environment | |
US6766028B1 (en) | Headtracked processing for headtracked playback of audio signals | |
WO2018196469A1 (en) | Method and apparatus for processing audio data of sound field | |
CN104284291B (en) | The earphone dynamic virtual playback method of 5.1 path surround sounds and realize device | |
CN104041081B (en) | Sound Field Control Device, Sound Field Control Method, Program, Sound Field Control System, And Server | |
JP7038725B2 (en) | Audio signal processing method and equipment | |
CN106210990B (en) | A kind of panorama sound audio processing method | |
CN107231600B (en) | The compensation method of frequency response and its electronic device | |
CN105376690A (en) | Method and device of generating virtual surround sound | |
CN105353868B (en) | A kind of information processing method and electronic equipment | |
WO2022021898A1 (en) | Audio processing method, apparatus, and system, and storage medium | |
EP3622730B1 (en) | Spatializing audio data based on analysis of incoming audio data | |
CN106454686A (en) | Multi-channel surround sound dynamic binaural replaying method based on body-sensing camera | |
CN106331977B (en) | A kind of virtual reality panorama acoustic processing method of network K songs | |
US11696087B2 (en) | Emphasis for audio spatialization | |
JP2021535632A (en) | Methods and equipment for processing audio signals | |
TW201735667A (en) | Method, equipment and apparatus for acquiring spatial audio direction vector | |
CN105682000B (en) | A kind of audio-frequency processing method and system | |
CN101184349A (en) | Three-dimensional ring sound effect technique aimed at dual-track earphone equipment | |
TW202105164A (en) | Audio rendering for low frequency effects | |
CN101155440A (en) | Three-dimensional around sound effect technology aiming at double-track audio signal | |
CN114049871A (en) | Audio processing method and device based on virtual space and computer equipment | |
WO2018072214A1 (en) | Mixed reality audio system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |