CN115334366A - Modeling method for interactive immersive sound field roaming

Info

Publication number: CN115334366A
Authority: CN (China)
Prior art keywords: virtual, sound, sound field, output result, model
Legal status: Pending (assumed; not a legal conclusion)
Application number: CN202210978930.9A
Other languages: Chinese (zh)
Inventors: 刘京宇, 蒋鉴, 任鹏昊
Current Assignee: Communication University of China
Original Assignee: Communication University of China
Application filed by Communication University of China; priority to CN202210978930.9A

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00: Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/80: Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N 21/81: Monomedia components thereof
    • H04N 21/8106: Monomedia components thereof involving special audio data, e.g. different tracks for different languages
    • H04N 21/816: Monomedia components thereof involving special video data, e.g. 3D video
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 19/00: Manipulating 3D models or images for computer graphics
    • G06T 19/20: Editing of 3D images, e.g. changing shapes or colours, aligning objects or positioning parts

Abstract

The invention provides a modeling method for interactive immersive sound field roaming, a sound field roaming method, and a sound field roaming system. The sound field roaming method comprises the following steps: determining N first positions of N kinds of virtual musical instruments in a virtual sound field space and a second position of a virtual character in the virtual sound field space, where the virtual character is operated by a user to stop or move in the virtual sound field space, and the N virtual musical instruments, the virtual sound field space, and the virtual character are realized through virtual reality technology; determining relative position information between the N first positions and the second position, where the N first positions are N virtual sound source positions and the second position is a virtual listening position; processing N kinds of first audio signals with a sound field space model according to the relative position information to obtain a second audio signal; and in response to a playback operation by the user, playing the second audio signal to the user.

Description

Modeling method for interactive immersive sound field roaming
Technical Field
The invention relates to the technical field of musical performance, and in particular to a modeling method for interactive immersive sound field roaming, a sound field roaming method, and a sound field roaming system.
Background
Reproducing the auditory effects of a real sound field space (e.g., a concert hall) in a virtual system is critical to the experience of audiences and music appreciators. For musicians in the performance industry, rehearsal on a virtual platform and the development of performance simulation systems allow artists to move the stage from offline to online, easing the current difficulties of touring, loss of talent, and the struggle to establish cultural and artistic brands.
In implementing embodiments of the present invention, the inventors, starting from the field of music performance and considering the needs of audiences, musicians, conductors, and audio engineers, found that the following problems remain:
(1) For band conductors and musicians, the prior art cannot change the positions of the band's sections or switch the acoustic effect of each section in real time, which lowers the efficiency of evaluating a band's performance.
(2) For recording engineers, the prior art cannot compare and switch the sound effects of different recording systems in real time, cannot regulate the volume balance of different sections in real time, and makes mixing very inefficient.
(3) For the auditory effect of a concert, the prior art cannot simulate in real time the sound effect at an arbitrary position in a concert hall, nor can it simulate the auditory effects of different sound field spaces (such as different concert halls, natural scenes, and living environments).
Therefore, how to reproduce the acoustic effects of different sound field spaces is an urgent problem to be solved.
Disclosure of Invention
In view of the above problems, the present invention provides a modeling method of interactive immersive sound field roaming, a sound field roaming method, and a sound field roaming system.
One aspect of an embodiment of the present invention provides an audibility-based interactive immersive sound field roaming method, including: determining N first positions of N kinds of virtual musical instruments in a virtual sound field space and a second position of a virtual character in the virtual sound field space, where the virtual character is configured to be operated by a user to stop or move in the virtual sound field space; determining relative position information between the N first positions and the second position, where the N first positions are N virtual sound source positions, the second position is a virtual listening position, and N is an integer greater than or equal to 1; processing N kinds of first audio signals with a sound field space model according to the relative position information to obtain a second audio signal, where the sound field space model is used to simulate the propagation of the N kinds of first audio signals in a physical space and the N kinds of first audio signals correspond one-to-one with the N kinds of virtual musical instruments; and in response to a playback operation by the user, playing the second audio signal to the user.
According to an embodiment of the present invention, the sound field space model includes a direct sound processing model, an early reflected sound model, and a late reverberant sound model, and processing the N kinds of first audio signals with the sound field space model to obtain the second audio signal includes: performing attenuation processing on the N kinds of first audio signals with the direct sound processing model to obtain a first output result; inputting the first output result into the early reflected sound model for reflection processing to obtain a second output result; inputting the first output result into the late reverberant sound model for reverberation processing to obtain a third output result; and obtaining the second audio signal from the second output result and the third output result.
According to an embodiment of the present invention, the relative position information includes distance information, and attenuating the N kinds of first audio signals with the direct sound processing model includes: processing the N kinds of first audio signals according to the distance information with N distance attenuation curves, where the N distance attenuation curves correspond one-to-one with the N kinds of first audio signals, and any two of the N distance attenuation curves may be the same or different.
According to an embodiment of the present invention, processing the N kinds of first audio signals according to the distance information with the N distance attenuation curves includes performing cone attenuation processing on at least one of the N kinds of first audio signals, specifically including: for any one of the at least one audio signal, obtaining a propagation distance based on the interior space information of the virtual sound field space; taking the position of the virtual sound source corresponding to the audio signal as the sphere center and the propagation distance as the radius to obtain a spherical propagation region of the audio signal; dividing the spherical propagation region into an inner angle region, an outer angle region, and a transition region between them; and performing the corresponding attenuation processing on the audio signal according to the actual region to which the second position belongs to obtain the first output result, where the actual region is any one of the inner angle region, the outer angle region, and the transition region.
According to an embodiment of the present invention, M virtual sound sources are calculated from the N virtual sound source positions and the geometry of the virtual sound field space, and S sound reflection paths are calculated from the second position and the geometry, where M and S are each integers greater than or equal to 1. Inputting the first output result into the early reflected sound model for reflection processing to obtain the second output result then includes: performing reflection processing on the first output result according to the M virtual sound sources and the S sound reflection paths to obtain the second output result.
According to an embodiment of the present invention, before the calculating S sound reflection paths, the method further comprises: taking the virtual character as a ray source, and emitting virtual rays from the second position; and detecting auditory interaction information through the virtual ray, wherein the auditory interaction information comprises the distance between the virtual character and the wall in the virtual sound field space and the material information of the wall in the virtual sound field space.
According to an embodiment of the present invention, the late reverberant sound model includes K impulse response signals recorded from K physical environments, and inputting the first output result into the late reverberant sound model for reverberation processing to obtain the third output result includes: in response to the user selecting a first virtual sound field space from K virtual sound field spaces, invoking a first impulse response signal, where the first virtual sound field space is constructed from a first physical environment among the K physical environments and K is an integer greater than or equal to 1; and performing a convolution calculation on the first output result and the first impulse response signal to obtain the third output result.
According to an embodiment of the invention, the method further comprises: in response to a first instruction from the user to move the virtual character, causing the virtual character to move to a third location; updating the virtual listening position to the third position; and re-executing the operations of determining the relative position information, obtaining the second audio signal and playing the second audio signal to the user.
According to an embodiment of the invention, the method further comprises: in response to a second instruction from the user to move at least one virtual instrument, causing the at least one virtual instrument to move to a fourth position; updating the corresponding position of the at least one virtual musical instrument in the N virtual sound source positions to the fourth position; and re-executing the operations of determining the relative position information, obtaining the second audio signal and playing the second audio signal to the user.
Another aspect of the present invention provides an interactive immersive sound field roaming system based on audibility, including: a position determination unit for determining N first positions of N kinds of virtual musical instruments in a virtual sound field space, and a second position of a virtual character in the virtual sound field space, wherein the virtual character is used for being operated by a user to stop or move in the virtual sound field space; a relative position unit, configured to determine relative position information between the N first positions and the second position, where the N first positions are N virtual sound source positions, the second position is a virtual listening position, and N is an integer greater than or equal to 1; a signal processing unit, configured to process N types of first audio signals by using a sound field space model according to the relative position information, to obtain a second audio signal, where the sound field space model is used to simulate propagation of the N types of first audio signals in a physical space, and the N types of first audio signals are in one-to-one correspondence with the N types of virtual musical instruments; and the audio playing unit is used for responding to the playing operation of the user and playing the second audio signal to the user.
Another aspect of the embodiments of the present invention provides a modeling method for interactive immersive sound field roaming, including: obtaining a direct sound processing model used to perform attenuation processing on N kinds of first audio signals to obtain a first output result, where the N kinds of first audio signals propagate from N virtual sound source positions to a virtual listening position and N is an integer greater than or equal to 1; obtaining an early reflected sound model used to perform reflection processing on the first output result to obtain a second output result; obtaining a late reverberant sound model used to perform reverberation processing on the first output result to obtain a third output result; and setting a main output bus used to obtain a second audio signal from the second output result and the third output result, where the second audio signal is obtained by simulating the propagation of the N kinds of first audio signals in a physical space.
Another aspect of an embodiment of the present invention provides an electronic device, including: one or more processors; a storage device to store one or more programs, wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to perform the method as described above.
Yet another aspect of embodiments of the present invention provides a computer-readable storage medium having stored thereon executable instructions, which when executed by a processor, cause the processor to perform the method as described above.
One or more embodiments of the invention can provide an audible virtual sound field space containing a virtual character that the user can operate; as the virtual character moves, the system simulates the sound propagation phenomena of a real environment and plays the resulting audio to the user. The positions of the N kinds of virtual musical instruments and of the virtual character are determined and taken as the N virtual sound source positions and the virtual listening position, respectively. Relative position information between the N virtual sound source positions and the virtual listening position is then determined. The N kinds of first audio signals are then processed with the sound field space model, which simulates their propagation in a physical environment based on the relative positions, to obtain a second audio signal. Finally, the second audio signal is played to the user. A real-time interactive, immersive sound field roaming function for performances is thereby realized.
The above description is only an overview of the technical solutions of the present invention. The invention can be implemented as described herein, so that its technical means are clearly understood and the above and other objects, features, and advantages of the present invention become more apparent.
Drawings
The foregoing and other objects, features and advantages of the invention will be apparent from the following description of embodiments of the invention, which proceeds with reference to the accompanying drawings, in which:
FIG. 1 schematically illustrates a flow diagram of an audibility-based interactive immersive sound field roaming method according to an embodiment of the present invention;
FIG. 2 schematically shows a flow chart of obtaining a second audio signal according to an embodiment of the invention;
FIG. 3 schematically shows a flow chart of processing a first audio signal according to an embodiment of the invention;
FIG. 4 schematically illustrates a flow diagram of a cone attenuation process in accordance with an embodiment of the present invention;
FIG. 5 schematically shows a flow diagram of a reflection process according to an embodiment of the invention;
FIG. 6 schematically shows a flow diagram for detecting auditory interaction information according to an embodiment of the invention;
FIG. 7 schematically illustrates a flow chart for obtaining a third output result according to an embodiment of the invention;
FIG. 8 schematically illustrates a flow chart for updating a virtual listening position according to an embodiment of the present invention;
FIG. 9 schematically illustrates a flow diagram for updating a virtual sound source position according to an embodiment of the invention;
FIG. 10 schematically illustrates a technical architecture diagram of a modeling approach suitable for implementing interactive immersive sound field roaming in accordance with an embodiment of the present invention;
FIG. 11 schematically illustrates a system development architecture diagram suitable for implementing a modeling method for interactive immersive sound field roaming in accordance with an embodiment of the present invention;
FIG. 12 is a block diagram that schematically illustrates the structure of an audibility-based interactive immersive sound field roaming system, in accordance with an embodiment of the present invention; and
FIG. 13 shows a schematic structural diagram of a computing device according to an embodiment of the invention.
Detailed Description
First, terms related to embodiments of the present invention will be described so that the present invention can be better understood.
Audible: techniques to create audible sound files from digital (analog, measured, or synthesized) data.
And (3) interactive mode: the user can interact with the virtual object provided by the embodiment of the invention through some operations, so as to provide the user with functions of sound field roaming, sound field switching, sound part position switching, audio processing and the like in real time.
Immersion feeling: the real-time acoustic environment in the real-time music performance is simulated, so that the user has the immersive auditory experience in the sound field space, and the immersive effect is generated.
Sound field roaming: and (3) a process of moving the position of the virtual character in at least partial area of the virtual sound field space.
Reflected sound: the sound waves transmitted from the ceiling and walls in the room help to form higher sound pressure levels.
Direct sound: refers to sound that travels directly from a sound source to a recipient in a straight line without any reflection.
Early reflected sound: also known as initial reflected sound. The portion of the reflected sound that follows the arrival of the direct sound that is acoustically beneficial.
Reverberant sound: the superposition of all the once and many reflected sounds at the same time when the sound in the room reaches a steady state or the sound source is continuously sounding.
Exemplary embodiments of the present invention will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the invention are shown in the drawings, it should be understood that the invention may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the invention to those skilled in the art.
The embodiments of the invention provide a modeling method for interactive immersive sound field roaming, a sound field roaming method, and a sound field roaming system. Taking a virtual concert hall as an example, and based on the concept of a multifunctional, customizable, interactive immersive concert hall, the technical problem of simulating the acoustic effects of a concert sound field in real time can be solved by combining the needs of audiences, musicians, conductors, and audio engineers with a geometric acoustics simulation algorithm and multi-engine, cross-platform cooperative operation. For band conductors and musicians, it solves the problem that the sound effects of the band's sections cannot be switched in real time; the system can serve as a virtual rehearsal hall for simulating musical performances of different arrangements and styles, improving the efficiency of evaluating a band's performance. For audio engineers (such as a recording engineer), it solves the problem of comparing and switching the sound effects of different recording positions in real time, enables real-time regulation of the volume balance of different sections, and improves the engineer's working efficiency. For the auditory effect of a concert, it solves the problem of simulating the sound effect at any position in a concert hall in real time, and the auditory effects of different sound field spaces can be simulated in real time.
Fig. 1 schematically shows a flow diagram of an audibility-based interactive immersive sound field roaming method according to an embodiment of the present invention.
As shown in fig. 1, the audibility-based interactive immersive sound field roaming method of this embodiment includes operations S110 to S140.
In operation S110, N first positions of N kinds of virtual musical instruments in a virtual sound field space and a second position of a virtual character in the virtual sound field space are determined, where the virtual character is operated by a user to stop or move in the virtual sound field space.
Illustratively, things in the real world are simulated in a virtual digital space to establish a realistic, virtual and interactive three-dimensional space environment, such as N virtual instrument models, virtual sound field space models and virtual character models. The virtual character can roam (move arbitrarily in at least part of the area) in the virtual sound field space. In other words, the virtual sound field space is a virtual three-dimensional space, which may include three-dimensional spatial environment information in reality.
In some embodiments, the system scene model is constructed from Chinese national orchestra objects (by way of example only) and a concert hall scene, comprising 1 concert hall model, 1 scene roaming character model, and 11 Chinese national orchestra instrument models: the lute, chana, zhongruan, sanxian, and dulcimer of the plucked string group; the flute, nanxiao, and sheng of the wind group; the urheen and zhonghu of the bowed string group; and the chimes of the percussion group. Scripts for the character roaming and instrument moving functions are written in C#, the programming language supported by Unity. The source models are converted to the FBX file format for import into the Unity system scene, and material maps are assigned to the physical models after import.
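Illustratively, a roaming script of the kind described above may be sketched in Unity C# as follows. This is a minimal sketch only; the class name, field names, and speed value are assumptions for illustration and are not taken from the actual system.

    using UnityEngine;

    // Hypothetical sketch of a character roaming script as described above.
    // Moves the scene roaming character with Unity's default input axes.
    public class CharacterRoam : MonoBehaviour
    {
        public float moveSpeed = 2.0f; // metres per second (assumed value)

        void Update()
        {
            // "Horizontal"/"Vertical" map to WASD and arrow keys by default.
            float dx = Input.GetAxis("Horizontal");
            float dz = Input.GetAxis("Vertical");
            Vector3 step = new Vector3(dx, 0f, dz) * moveSpeed * Time.deltaTime;
            transform.Translate(step, Space.Self); // roam within the hall
        }
    }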
Illustratively, since the concert hall scene has no prominent lamps but instead hides light sources within the building to achieve a warm, blended lighting effect, lightmap baking is adopted as the main method of scene light rendering. The system scene uses directional light as the main light source, supplemented by dozens of simulated lights, including point lights and area lights, to illuminate all shaded areas of the concert hall stage and auditorium. In addition, a lighting network composed of multiple light probes is applied to the system scene to dynamically illuminate moving objects, including the character and instrument models, in the concert hall.
In operation S120, relative position information between N first positions, which are N virtual sound source positions, and a second position, which is a virtual listening position, is determined, where N is an integer greater than or equal to 1.
Illustratively, the N first positions and the second position are mapped to position information in real (physical) space, so that the propagation of the audio signals is simulated after the N virtual sound source positions and the virtual listening position are mapped to real positions; the relative position information can thus reflect the relative positions of the user and the instruments in real space.
In operation S130, the N first audio signals are processed by using a sound field space model according to the relative position information, so as to obtain a second audio signal, where the sound field space model is used to simulate propagation of the N first audio signals in a physical space, and the N first audio signals are in one-to-one correspondence with the N virtual musical instruments.
Illustratively, the sound field space model may provide an immersive spatial sound field simulation framework based on the physical propagation principles of sound, performing signal processing from three perspectives: sound emission, propagation path, and receiver. For example, the 11 first audio signals are audio files corresponding one-to-one with the 11 Chinese national orchestra instrument models, and may all be pre-recorded audio signals of real instruments playing a piece of music.
In operation S140, a second audio signal is played to the user in response to a play operation of the user.
The timing of the user's playback operation is not limited by the present invention; for example, the play button may be clicked before, during, or after any of the operations other than S140, such as during operation S110 or before operation S110.
According to the embodiment of the invention, an audible virtual sound field space can be provided, with a virtual character the user can operate interactively; the system simulates the sound propagation phenomena of a real environment (such as a concert hall) to play audio for the user. The positions of the N kinds of virtual musical instruments and of the virtual character are determined and taken as the N virtual sound source positions and the virtual listening position, respectively. Relative position information between the N virtual sound source positions and the virtual listening position is then determined. The N kinds of first audio signals are then processed with the sound field space model, which simulates their propagation in the physical environment based on the relative positions, to obtain a second audio signal. Finally, the second audio signal is played to the user. The real-time interactive immersive sound field roaming function for performances is thereby realized.
Fig. 2 schematically shows a flow diagram for obtaining a second audio signal according to an embodiment of the invention. Fig. 3 schematically shows a flow chart for processing a first audio signal according to an embodiment of the invention.
As shown in fig. 2, processing N types of first audio signals using the sound field spatial model in operation S130, and obtaining a second audio signal includes operations S210 to S240. The sound field space model comprises a direct sound processing model, an early reflected sound model and a later reverberant sound model.
In operation S210, the N first audio signals are attenuated using the direct sound processing model to obtain a first output result.
Exemplarily, in the virtual sound field space, each sound (first audio signal) is set as a point sound source whose position coordinates are given by the N kinds of virtual instrument models in the scene. Direct sound refers to the part of a sound source's energy that is transmitted directly to the receiver (the virtual character) without any reflection under free field conditions; the sound energy is attenuated by the surrounding environment during propagation. In physical propagation theory, the energy of a point sound source follows the inverse square law: the amplitude is proportional to the inverse of the propagation distance, i.e., the level decreases by 6 dB for every doubling of the distance, because the radiated energy is spread over an ever larger area as it travels outward.
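As a minimal sketch of this law (the method name and reference distance are illustrative assumptions, not taken from the patent), the direct sound gain of a point source may be computed as follows:

    using UnityEngine;

    static class DirectSound
    {
        // Inverse-square law: amplitude is proportional to 1/distance,
        // i.e. the level falls about 6 dB per doubling of distance.
        // refDistance is the distance at which the gain is defined as 1.0.
        public static float Gain(float distance, float refDistance = 1.0f)
        {
            float d = Mathf.Max(distance, refDistance); // avoid blow-up near the source
            return refDistance / d;
        }
    }

For example, at twice the reference distance the gain is 0.5, i.e. 20·log10(0.5) ≈ -6 dB, matching the physical propagation theory above.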
Referring to fig. 3, playing the first audio signal of a virtual instrument may be referred to as a play event. In the structural hierarchy of the direct sound processing model, each first audio signal corresponds to two tracks: a so-called dry track, which receives direct sound and early reflection processing, and a so-called wet track, which receives direct sound and late reverberation processing.
The two play events shown in fig. 3 are the same. In some embodiments, the output of the direct sonication model may be used as input to the early reflected sound model and the late reverberant sound model, respectively. Two direct sound processing models can also be set to respectively correspond to the early reflected sound model and the later reverberant sound model.
In operation S220, the first output result is input to the early reflection acoustic model for reflection processing, and a second output result is obtained.
Illustratively, sound waves (audio signals) collide with the surrounding medium as they propagate onward from the sound source; part of the energy is absorbed by the medium's material, part continues to propagate forward, and another part is reflected. The first few reflections of a sound wave are defined as early reflections; they arrive with a certain delay relative to the direct sound, and their propagation directions are diverse. The time differences, directivity, and sound energy information presented by the early reflections are recognized by the listener (i.e., the virtual character), forming a preliminary judgment of direction and localization within the space, further revealing the size and shape of the room, and producing different acoustic effects as the wall surface materials change.
In operation S230, the first output result is input to the late reverberant model for reverberation processing, and a third output result is obtained.
For example, sound waves propagate continuously in a real concert hall, and after a large number of reflections and absorptions the sum of the sound energy remaining in the room is called reverberant sound. After the sound source stops sounding, the reverberant sound continues and slowly dissipates through further reflection and absorption. By recognizing the late reverberant sound of different spaces by ear, the volume of a space and its unique architectural acoustic information can be perceived intuitively. Reverberant sound always contains a higher proportion of low-frequency sound than high-frequency sound, and the low frequencies also decay more slowly, because their longer wavelengths let them bend around obstacles without being reflected and make them less likely to be absorbed. Sound is also attenuated as it propagates through air, but under the same propagation conditions low-frequency sound is attenuated less than high-frequency sound.
In operation S240, a second audio signal is obtained according to the second output result and the third output result.
Referring to fig. 3, the second output result and the third output result are summed into the main output bus, and the main output bus outputs the second audio signal. A bus is a route for audio signals; it can feed either another bus or the output directly.
In some embodiments, the final output of the main output bus includes the direct sound of the 11 dry instrument tracks, the early reflections, and the wet reverberation. That is, the first output result is fed respectively into the main output bus, the early reflected sound model, and the late reverberant sound model.
According to an embodiment of the present invention, the audio dry sound is processed separately from the wet sound. After early reflection processing, the dry sound carries rich distance and direction information and is sent directly to the main output bus. The reverberant sound contains no dry sound. Following the actual law of sound propagation, the energy of the pure reverberant sound increases with propagation distance, and the distance between listener and source helps form the spatial position information; as the source's sound propagates through the virtual space to the listener, the dry and wet sounds are smoothly rendered into a reverberant effect carrying this combined information.
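A minimal sketch of this dry/wet balance follows; the linear crossfade law and all names are assumptions for illustration, not the system's actual rendering rule.

    using UnityEngine;

    static class DryWetMix
    {
        // Close to the source the direct (dry) sound dominates; with growing
        // distance the share of pure reverberant (wet) sound increases.
        public static float MixSample(float dry, float wet,
                                      float distance, float maxDistance)
        {
            float t = Mathf.Clamp01(distance / maxDistance);
            return dry * (1f - t) + wet * t;
        }
    }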
According to an embodiment of the present invention, the above-mentioned relative position information includes distance information, and the performing attenuation processing on the N first audio signals by using the direct sound processing model in operation S210 to obtain the first output result includes: and processing the N first audio signals by using N distance attenuation curves according to the distance information to obtain a first output result, wherein the N distance attenuation curves correspond to the N first audio signals one to one, and any two curves in the N distance attenuation curves are the same or different.
This step completes the distance modeling that simulates the natural attenuation of sound: the N kinds of first audio signals can be classified, and a corresponding distance attenuation curve is created for each class to build the distance attenuation model. The maximum attenuation point is determined by the maximum attenuation distance value, and a spherical attenuation range is formed around each sound source with that maximum distance as the radius. The attenuation curves can be customized, and control points can be added for fine adjustment. Available distance attenuation curves include linear, constant, logarithmic, power, and S-curves. For example, a logarithmic curve may be selected and interpolated to approximate a more realistic auditory impression.
In some embodiments, to simulate the air absorption effect, a recursive filter is applied to parts of the high and low frequency bands.
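A minimal sketch of such a recursive (IIR) filter is given below: a one-pole low-pass whose coefficient would be lowered with distance to remove high-frequency energy, imitating air absorption. The class name and the coefficient law are assumptions, not the system's actual filter design.

    // One-pole recursive low-pass: y[n] = a*x[n] + (1 - a)*y[n-1].
    // A smaller coefficient removes more high-frequency energy.
    class AirAbsorptionFilter
    {
        private float y;               // previous output sample
        public float Coefficient = 1f; // 1 = no filtering; towards 0 = heavy damping

        public float Process(float x)
        {
            y = Coefficient * x + (1f - Coefficient) * y;
            return y;
        }
    }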
FIG. 4 schematically shows a flow diagram of a cone attenuation process according to an embodiment of the invention.
As shown in fig. 4, this embodiment includes performing cone attenuation processing on at least one of the N types of first audio signals, and specifically includes: operations S410 to S440 are performed on any one of the at least one audio signal.
In operation S410, a propagation distance is obtained based on the inner space information of the virtual sound field space.
The interior space information may be the three-dimensional space parameters of the virtual sound field space, such as the size of the interior space, the layout of the building, or the layout of the concert's areas (e.g., stage, auditorium, band). The propagation distance may be the distance from the sound source to a given wall inside the virtual sound field space.
In operation S420, a spherical propagation region of the audio signal is obtained by using the virtual sound source position corresponding to the audio signal as a sphere center position and the propagation distance as a radius.
In operation S430, the spherical propagation region is divided into an inner angle region, an outer angle region, and a transition region between the inner angle region and the outer angle region.
In operation S440, corresponding attenuation processing is performed on the audio signal according to the actual region to which the second position belongs, obtaining the first output result, where the actual region is any one of the inner angle region, the outer angle region, and the transition region.
According to the embodiment of the invention, the direct sound of a point source reveals the initial virtual sound source position information in the virtual sound field space, and instruments with pronounced directivity are given a sound cone attenuation mode so as to carry more of the stage acoustic information that varies with the listener's bearing. In cone attenuation, a sphere centered on the geometric center of the instrument model, with the propagation distance as its radius, is divided into an inner angle region, an outer angle region, and a transition region. In the inner angle region the output bus volume is not attenuated; in the outer angle region the volume is attenuated and the filtering effect reaches the highest level set by the system. In the transition region between the inner and outer angles, linear interpolation is used to reduce the bus output volume. Cone attenuation thus gives the instruments directivity in spatial audio; in operation, as the listener's orientation changes (facing the source, sideways, or turned away), different degrees of volume attenuation and filtering are presented, and the listener directly perceives the sound changing with direction.
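A minimal sketch of the cone attenuation described above (the angle thresholds and outer gain are illustrative parameters, not the system's configured values):

    using UnityEngine;

    static class ConeAttenuation
    {
        // Full level inside the inner angle, maximum attenuation beyond the
        // outer angle, linear interpolation across the transition region.
        public static float Gain(Vector3 sourceForward, Vector3 toListener,
                                 float innerAngleDeg, float outerAngleDeg,
                                 float outerGain)
        {
            float angle = Vector3.Angle(sourceForward, toListener);
            if (angle <= innerAngleDeg) return 1f;        // inner angle region
            if (angle >= outerAngleDeg) return outerGain; // outer angle region
            float t = (angle - innerAngleDeg) / (outerAngleDeg - innerAngleDeg);
            return Mathf.Lerp(1f, outerGain, t);          // transition region
        }
    }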
In some embodiments, plug-in algorithms, geometric modeling, filter design and other methods can be comprehensively used to calculate and simulate spatialized early reflected sound of the concert hall model, and the spatial sound field information of the concert hall model is preliminarily restored.
FIG. 5 schematically shows a flow diagram of a reflection process according to an embodiment of the invention.
As shown in fig. 5, the reflection process of this embodiment includes operations S510 to S530. Operation S530 is one embodiment of operation S220.
In the virtual sound field space, the position information of the roaming listener (the virtual listening position) is a factor in processing the audio signal. The listener roams around the concert hall while receiving early reflected sound from all directions. How this early reflection information is received is directly related to the distance, direction, and material information of the various reflecting walls (i.e., the obstacles along the sound waves' propagation paths); together these form the listener's auditory perception and influence the judgment of spatiality and immersion, so this part should be detected and its feedback calculated in real time.
In operation S510, M virtual sound sources are calculated according to the N virtual sound source locations and the geometric shape of the virtual sound field space.
Illustratively, acoustic reflection geometric modeling may be performed. The spatialized early reflected sound is computed, for example, with a virtual source technique based on multi-tap time-varying delay lines, which assumes that each reflecting surface is infinite and ideally rigid; under this assumption the reflection model is physically accurate. In the virtual source method, a mirror image sound source is formed behind each reflecting surface at a distance equal to the source's distance from that surface, and the line connecting the image source with the sounding body is orthogonal to the reflecting surface. The early reflection order increases with the complexity of the room geometry's reflecting surfaces: the original source is reflected by each surface of the room geometry to generate the first-order reflections, after which all higher-order reflections are obtained recursively from the previous order. The highest simulated early reflection order is the fourth, and each reflection of each order corresponds to a virtual source.
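A minimal sketch of constructing a first-order image source by mirroring the real source across a reflecting plane follows; representing the plane by a point on the surface and its unit normal is an assumption for illustration, and higher orders would mirror the result recursively.

    using UnityEngine;

    static class VirtualSource
    {
        // Mirror the source position across an (assumed infinite, rigid)
        // reflecting plane to obtain the image source behind the wall.
        public static Vector3 Mirror(Vector3 source, Vector3 planePoint,
                                     Vector3 planeNormal)
        {
            float d = Vector3.Dot(source - planePoint, planeNormal); // signed distance
            return source - 2f * d * planeNormal;
        }
    }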
Illustratively, the geometry includes the three-dimensional geometric spatial information of the virtual sound field space, such as its size and shape.
In operation S520, S sound reflection paths are calculated according to the second position and the geometric form, where M and S are integers greater than or equal to 1, respectively.
The virtual sound sources depend on the real sound sources and the geometry of the virtual sound field space; if both remain stationary, the information of all virtual sources is unchanged, so all virtual sources can be calculated in advance. The real-time calculation of sound reflection paths involving the moving listener, however, must be repeated separately whenever the listener's position changes.
Illustratively, the spatial surface materials in the virtual sound field space may be set in advance. The wall material determines how much energy is filtered out when sound waves strike an obstacle. In the concert hall system, the simulation of the sound absorption of wall materials is accomplished by the design of band filters. The spatial surface material model is based on four-band filtering attenuation; the absorption bands are divided into low, mid-low, mid-high, and high frequencies, with the default mapping intervals shown in Table 1.
TABLE 1 Absorption band mapping intervals

Type name           Frequency interval
Low frequency       <250 Hz
Mid-low frequency   >250 Hz and <1,000 Hz
Mid-high frequency  >1,000 Hz and <4,000 Hz
High frequency      >4,000 Hz
Each reflecting surface in the model produced by the acoustic reflection geometric modeling is assigned a spatial surface material; during actual operation of the virtual sound field space, as successive sound waves arrive, the absorption effects of all the materials are superimposed while the simulated signals are continuously filtered. Different filtering parameters are set for the walls, reflecting panels, and seats in the virtual sound field space, with values conforming to published absorption coefficients of the actual materials, so that the acoustic properties of the real physical materials are accurately reproduced in the virtual scene.
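A minimal sketch of the four-band absorption of Table 1 follows; the struct layout and field names are illustrative assumptions, and real absorption coefficients would be looked up per material.

    // Per-reflection four-band absorption: each band's energy is scaled by
    // (1 - absorption coefficient) of the surface material, per Table 1.
    struct SurfaceMaterial
    {
        public float Low, MidLow, MidHigh, High; // absorption coefficients in 0..1
    }

    static class Absorption
    {
        public static void Apply(float[] bandGains, SurfaceMaterial m)
        {
            bandGains[0] *= 1f - m.Low;     // < 250 Hz
            bandGains[1] *= 1f - m.MidLow;  // 250 Hz to 1,000 Hz
            bandGains[2] *= 1f - m.MidHigh; // 1,000 Hz to 4,000 Hz
            bandGains[3] *= 1f - m.High;    // > 4,000 Hz
        }
    }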
In operation S530, the first output result is subjected to reflection processing according to the M virtual sound sources and the S sound reflection paths to obtain a second output result.
According to an embodiment of the invention, the second output result reveals the size and shape of the room; it is directly related to the distance, direction, and material information of the various reflecting obstacles along the propagation paths, which together form the listener's judgment of spatial position and should be detected and calculated in real time.
Fig. 6 schematically shows a flow chart for detecting auditory interaction information according to an embodiment of the present invention.
As shown in fig. 6, detecting auditory interaction information in this embodiment, which takes place before operation S520, includes operations S610 to S620.
In operation S610, virtual rays are emitted from the second position, with the virtual character as the ray source.
For example, a virtual ray can probe the surrounding environment much as a light ray would, based on its propagation and the feedback it returns.
In operation S620, auditory interaction information is detected through the virtual rays, where the auditory interaction information includes the distance between the virtual character and the walls of the virtual sound field space and the material information of those walls.
Illustratively, ray detection is used by the system to detect the bearing information between sound source and receiver and the environmental information around the propagation path. On the basis of the acoustic reflection geometric model, while the virtual sound field space is running, the roaming virtual character emits rays in all directions, and information relevant to auditory interaction, such as the distance from the character to the walls and the acoustic reflection materials of the surrounding walls, is detected in real time and fed into the algorithm. The rays are re-emitted whenever the system changes, such as when a source is activated or deactivated, the roaming character's position changes, or the building geometry around the source and receiver changes.
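A minimal Unity C# sketch of this ray probing is given below. Physics.Raycast is Unity's standard API, but the WallMaterial component and the eight-direction probing pattern are hypothetical illustrations, not the system's actual scheme.

    using UnityEngine;

    public class WallMaterial : MonoBehaviour { } // hypothetical marker for surface material data

    public class AuditoryProbe : MonoBehaviour
    {
        public float maxRange = 50f; // assumed probing range

        void Update()
        {
            // Cast rays in eight horizontal directions around the character.
            for (int i = 0; i < 8; i++)
            {
                Vector3 dir = Quaternion.Euler(0f, i * 45f, 0f) * transform.forward;
                if (Physics.Raycast(transform.position, dir, out RaycastHit hit, maxRange))
                {
                    float wallDistance = hit.distance;                   // character-to-wall distance
                    var material = hit.collider.GetComponent<WallMaterial>(); // hypothetical component
                    // wallDistance and material feed the reflection-path calculation
                }
            }
        }
    }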
According to the embodiment of the invention, the sound reflection paths can be accurately calculated from the distances between the character and the walls, the acoustic reflection materials of the surrounding walls, and so on.
In some embodiments, considering that actual room geometry surfaces are not the infinite ideal rigid bodies assumed above but have boundaries, further physical propagation phenomena occurring at boundaries, such as diffraction and transmission, must be taken into account in addition to the early surface reflections. Diffraction is the physical phenomenon of sound waves departing from their original straight-line path when they meet an obstacle, characterized in particular by the bending of sound waves around an obstacle's edge. The degree of diffraction depends on the wavelength of the sound relative to the size of the obstacle; the larger the wavelength relative to the obstacle, the stronger the diffraction. A ray-based diffraction model combined with the uniform theory of diffraction defines a visible region, a reflection region, and a shadow region: sound waves travel from the reflection region, are reflected by the reflecting surface, then bend and propagate through the visible region into the shadow region, where the sound can be heard although the source cannot be seen. Transmission describes the obstruction of the source's propagation by obstacles between the emitting end and the listener; filters are applied to simulate the degree of transmission, i.e., sets of associated acoustic reflection materials and transmission loss values.
Fig. 7 schematically shows a flow chart for obtaining a third output result according to an embodiment of the invention.
As shown in fig. 7, inputting the first output result into the late reverberant sound model for reverberation processing (operation S230) to obtain the third output result includes operations S710 to S720.
In operation S710, a first impulse response signal is invoked in response to the user selecting a first virtual sound field space from K virtual sound field spaces, where the first virtual sound field space is constructed from a first physical environment among the K physical environments, and K is an integer greater than or equal to 1.
Illustratively, a user may be provided with K virtual sound field spatial models constructed in one-to-one correspondence with K physical environments. The physical environment includes three-dimensional spatial information.
The room can here be treated as a "system" in the signal processing sense, more specifically a linear time-invariant system. The dry audio signal can be regarded as the system's input, and the reverberant sound produced by the room's effect as its output; when several different dry signals are input, the output is the superposition of the outputs each would produce alone, and the time of input does not affect the result. Further, if the input signal covers the full frequency range, the output naturally includes the system's response at all frequencies. In digital signal processing, such an input is called an impulse and the system's output is called the impulse response; an impulse response captured in a specific room therefore contains all of that room's spatial information.
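In standard discrete-time notation (a textbook identity, not reproduced from the patent), this linear time-invariant relationship is the convolution sum

    y[n] = (x * h)[n] = Σ_k x[k] · h[n − k]

where x is the dry input signal, h is the room impulse response, and y is the reverberant output.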
Illustratively, when constructing the K virtual sound field space models, the three-dimensional spatial information and stereoscopic scenes of well-known concert halls, natural scenes, and living environments around the world can be replicated, and the user may select among the K virtual sound field space models and switch between them in real time.
The virtual sound field spaces may simulate spaces such as the Amsterdam concert hall, the Berlin concert hall, the Boston concert hall, the Chicago concert hall, glaciers, caves, karst caves, and indoor courts. In the signal preprocessing stage, for example, the selected impulse response signals include ones collected in the Amsterdam, Berlin, Boston, and Chicago concert halls; for each hall they comprise a left-channel and a right-channel signal in mono format, which together constitute a stereo pair. The impulse response signals also cover natural and living environment scenes, such as glaciers, bridge openings, karst caves, and indoor courts, intended to give the user the opportunity to study suitable environments for experimental musical works; the audio system is stereo.
In operation S720, a convolution calculation is performed on the first output result and the first impulse response signal to obtain a third output result.
For various reasons the proportion of low-frequency sound in reverberant sound is higher, and since the radiation directivity of low-frequency waves is less pronounced than that of high-frequency waves, the directional character of reverberant sound is weaker than that of early reflected sound. Therefore, when simulating the late reverberation of the concert hall system, sampled information of the concert hall space is obtained and a convolution algorithm with acoustic parameters is used to control the reverberation effect unit, which can reduce the load on the computer's central processor.
Illustratively, different impulse response signals carrying spatial information are set up and convolved with the input dry audio signal, faithfully reproducing the sense of space of different sound field environments.
Before the convolution algorithm is used, a necessary step is to upmix the two mono audio signals into a stereo-format impulse signal. Each processed stereo impulse response signal is convolved with the likewise-stereo dry music signal, outputting a music signal with a sense of space.
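A minimal sketch of this convolution in the time domain follows; real-time systems typically use FFT-based partitioned convolution instead, and the class and method names are illustrative. It would be run once per stereo channel:

    static class ConvolutionReverb
    {
        // Direct convolution of a dry signal with one impulse-response channel.
        // Output length is dry.Length + ir.Length - 1.
        public static float[] Convolve(float[] dry, float[] ir)
        {
            var y = new float[dry.Length + ir.Length - 1];
            for (int n = 0; n < dry.Length; n++)
                for (int k = 0; k < ir.Length; k++)
                    y[n + k] += dry[n] * ir[k];
            return y;
        }
    }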
Referring to fig. 3, and similarly to the early reflected sound simulation, several auxiliary buses carrying convolution reverberation effect units are added to the project, and each impulse response signal is applied as convolution data to form a concert hall reverberation effect. Transcoding of the impulse response signals is completed offline; when the concert hall system runs, the preprocessed impulse response signal is convolved directly with the input dry audio while the remaining real-time digital signal processing is carried out. The spatial reverberation effect of each actual concert hall is predefined as a global state that is triggered by operator instructions at run time and then assigned to the corresponding audio object, to which a preset audio parameter change is applied.
When the user selects a particular concert hall scene, the system responds with the corresponding audio state, and the audio auxiliary bus carrying that concert hall's convolution reverberation is activated and fed into the main output bus of the audio chain. Inside each auxiliary bus on which a convolution reverberation effect unit is mounted, technical parameters such as the impulse response input level, channel configuration, balance control, interference level during the convolution operation, reverberation level, frequency equalization, filtering, and delay time are adjusted appropriately to ensure that the reverberation is undistorted and balanced and switches smoothly and naturally.
Referring to fig. 3, the overall hierarchy includes 11 instrument dry tracks and 11 instrument pure-reverberant tracks. The dry tracks are sent directly to the main output bus as direct sound, and at the same time to the early reflection auxiliary bus, on which the early reflection simulation plug-in is mounted, to form the early reflected sound of the concert hall system. The 11 pure-reverberant instrument tracks are sent directly to the convolution reverberation auxiliary buses, on which the convolution reverberation effect plug-ins are mounted; through algorithmic processing and parameter adjustment, these form the pure wet reverberation of several famous concert halls.
Convolution reverberation has a continuity problem when reconstructing a building's sound field. Specifically, each application of convolution reverberation can only convolve with the dry sound, and the impulse response signal is recorded, or formed by simulation, at one particular point in the concert hall. Strictly speaking, what is heard is therefore the spatial audio perceived while standing at that one point. When the listening environment changes from a flat 2D scene to a 3D scene, i.e., when the functional experience of a character roaming the scene is added, the listening position changes in real time and the convolution reverberation heard by the listener no longer accurately reflects the true spatial impression: the offline-rendered convolution reverberation effect is static, carries no real-time spatial position information, and does not support rendering the dynamic sound changes associated with head rotation. More sound interaction information about the roaming character's motion state should therefore be added to the reverberant sound field simulation.
Referring to fig. 3 and operations S210 to S240, the dry audio is processed separately from the wet sound produced by the convolution reverb. The dry audio corresponds to the original instrument track, carries a distance attenuation model and a cone attenuation model whose physical propagation behavior encodes rich distance and direction information, and is sent directly to the main output bus. The pure reverberant sound output by the convolution reverb auxiliary bus contains no interfering components. Following the actual rules of sound propagation, the energy of the pure reverberant sound rises relative to the direct sound as the propagation distance increases, the distance between listener and sound source helps form the spatial position impression, and as the source moves through the virtual space the dry and wet sounds are smoothly rendered into a reverberation effect that carries this combined information on its way to the listener.
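Illustratively, the distance handling of the dry and wet components can be sketched as follows. The curves are assumptions for demonstration (an inverse-distance law for the direct sound and a roughly distance-independent diffuse reverberant level), not the exact attenuation curves authored in the system:

```csharp
using System;

// Illustrative dry/wet distance weighting (a sketch, not the exact authored
// curves): the direct sound follows an inverse-distance law while the diffuse
// reverberant level stays roughly constant, so the wet-to-dry ratio, and with
// it the sense of distance, grows as the listener moves away.
public static class DryWetMix
{
    public static float DryGain(float distance, float refDistance = 1f)
        => refDistance / Math.Max(distance, refDistance);

    // Diffuse-field assumption: pure reverberant sound keeps its level.
    public static float WetGain(float distance) => 1f;

    public static float MixSample(float drySample, float wetSample, float distance)
        => drySample * DryGain(distance) + wetSample * WetGain(distance);
}
```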
Fig. 8 schematically shows a flow chart of updating a virtual listening position according to an embodiment of the present invention.
As shown in fig. 8, updating the virtual listening position of this embodiment includes operations S810 to S830.
In operation S810, the virtual character is moved to a third position in response to a first instruction of the user to move the virtual character.
Illustratively, a user may steer a virtual character to roam through the virtual sound field space. Multiple users can each operate their own virtual character, and the characters roam independently of one another.
Illustratively, the camera is set to a first-person perspective. The virtual character can be steered to roam, stand, and listen at different locations in the concert hall. To better reproduce the real-world experience of judging a listening position, the system adds a head rotation function: the audience controls the head rotation of the roaming character with the keyboard arrow keys and thereby perceives the change of sound source orientation during the motion. The combination of sight and hearing gives the user the on-the-spot experience of a concert hall audience.
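As an illustration, the keyboard-driven head rotation can be sketched with Unity's legacy input API. Attaching the script to the first-person camera, together with the audio listener component that forwards orientation to the audio engine, is an assumption of this sketch:

```csharp
using UnityEngine;

// Sketch of keyboard-driven head rotation for the first-person listener,
// using Unity's legacy input API. With the Wwise integration, an
// AkAudioListener on the same camera object would propagate the new
// orientation to the audio engine (an assumption of this sketch).
public class HeadRotation : MonoBehaviour
{
    public float degreesPerSecond = 90f;

    void Update()
    {
        float yaw = 0f;
        if (Input.GetKey(KeyCode.LeftArrow))  yaw -= 1f;
        if (Input.GetKey(KeyCode.RightArrow)) yaw += 1f;
        // Rotating the listener changes the direction of every virtual
        // sound source relative to the head, which the binaural renderer
        // then reflects in the output.
        transform.Rotate(0f, yaw * degreesPerSecond * Time.deltaTime, 0f);
    }
}
```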
In operation S820, the virtual listening position is updated to a third position.
In operation S830, the operations of determining the relative position information, obtaining the second audio signal, and playing the second audio signal to the user are re-performed. I.e., re-performing operations S120 to S140.
Since comparing seats at different positions is the audience's most pressing need, the concert hall system realizes interaction between audience and sound field through real-time auditory simulation. During roaming, as the character's position changes, the computer calculates in real time the interaction between the character and the nearby building and sound sources, returned as ray feedback carrying sound information. Listeners may roam to different locations to experience the concert; they can also walk onto the stage to the working perspective of the conductor, or stand close to each instrument to learn its acoustic characteristics.
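Illustratively, the ray feedback can be sketched with Unity's physics raycasts. The AcousticMaterial component and its absorption field are hypothetical, and a real system would cast many rays per frame rather than the single forward ray shown:

```csharp
using UnityEngine;

// Sketch of the ray feedback described above: as the character roams, a ray
// probes nearby geometry so wall distance and surface material can drive the
// audio simulation. AcousticMaterial is a hypothetical per-surface component.
public class AcousticProbe : MonoBehaviour
{
    public float maxDistance = 50f;
    public float lastWallDistance;
    public float lastAbsorption;

    void Update()
    {
        // A single forward ray is shown; a real system would cast several
        // directions per frame from the character's position.
        if (Physics.Raycast(transform.position, transform.forward,
                            out RaycastHit hit, maxDistance))
        {
            var surface = hit.collider.GetComponent<AcousticMaterial>();
            lastWallDistance = hit.distance;
            lastAbsorption = surface != null ? surface.absorption : 0f;
            // These values can then feed early reflection and obstruction
            // parameters in the audio engine.
        }
    }
}

// Hypothetical marker component carrying an illustrative absorption value.
public class AcousticMaterial : MonoBehaviour
{
    public float absorption = 0.3f;
}
```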
It should be noted that the position information, geometry, and interior space of the virtual (sound field) space have a mapping relationship with the real space, so that the processing, propagation, and playback of audio signals in the virtual space reproduce, by simulation, the effect they would have in the real space.
Fig. 9 schematically shows a flow chart for updating a virtual sound source position according to an embodiment of the invention.
As shown in fig. 9, updating the virtual sound source position of this embodiment includes operations S910 to S930.
In operation S910, the at least one virtual musical instrument is moved to a fourth position in response to a second instruction of the user to move the at least one virtual musical instrument.
In operation S920, a corresponding position of the at least one virtual musical instrument among the N virtual sound source positions is updated to a fourth position.
Illustratively, each type of virtual instrument may include several virtual instruments of that type. One or more virtual instruments can be moved within the virtual sound field space to adjust the placement of a sound part. Virtual sound source position coordinates are assigned to the N virtual instrument models in the scene and change as the instrument models move at runtime.
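As an illustration, because the engine keeps each virtual sound source position in sync with its instrument model's transform (with the Wwise integration, via an AkGameObj component on the model, an assumption of this sketch), repositioning a sound part reduces to moving the model:

```csharp
using UnityEngine;

// Sketch of runtime sound part repositioning. The assumption here is that an
// AkGameObj component (Wwise Unity integration) on the instrument model keeps
// the corresponding virtual sound source position in sync with the transform,
// so moving the model is sufficient to update the sound source position.
public class InstrumentMover : MonoBehaviour
{
    // Called when the operator drags an instrument or applies a preset
    // band layout; 'target' is the new position in the virtual hall.
    public void MoveTo(Vector3 target)
    {
        transform.position = target;
    }
}
```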
In operation S930, the operations of determining the relative position information, obtaining the second audio signal, and playing the second audio signal to the user are re-performed. I.e., re-performing operations S120 to S140.
Illustratively, the band, conductor, and musicians may rehearse online (the first audio signals may be pre-recorded, or captured and processed during a live performance) without having to meet in person. Several classical band seating layouts can be preset for one-key switching by the operator, and instrument positions can be dragged and adjusted during the performance. This function can be used to study the stage acoustics of classical, modified, and innovative instrument placements.
According to an embodiment of the invention, the orchestra and the conductor can hold an online simulated rehearsal by moving at least one virtual instrument and then replaying the audio. The simulated rehearsal provides instrument acoustic simulation, sound part position adjustment, real-time switching of the concert hall sound field, and similar functions.
Fig. 10 schematically shows a technical architecture diagram suitable for implementing the modeling method for interactive immersive sound field roaming according to an embodiment of the present invention. Fig. 11 schematically shows a system development architecture diagram suitable for implementing the modeling method for interactive immersive sound field roaming according to an embodiment of the present invention.
Referring to fig. 10 and fig. 11, this embodiment builds a roamable, customizable, interactive immersive virtual sound field space on technical means including digital twinning, virtual reality, sound field simulation, and interactive immersion; for example, the N kinds of virtual instruments, the virtual sound field space, and the virtual character are realized through digital twin and virtual reality technology.
To reproduce the architectural acoustics of a real concert hall, the acoustic environment (i.e., the interactive immersive sound field) is simulated from sound propagation principles, geometric models, and acoustic materials, using virtual reality techniques and binaural room impulse responses.
Referring to fig. 10 and fig. 11, and in conjunction with one or more of the embodiments described in fig. 1 to fig. 9, the embodiments provide a modeling method for audibility-based interactive immersive sound field roaming, through which sound field roaming by the user can be achieved. The method comprises: obtaining a direct sound processing model for attenuating the N kinds of first audio signals to obtain a first output result, wherein the N kinds of first audio signals propagate from N virtual sound source positions to the virtual listening position and N is an integer greater than or equal to 1; obtaining an early reflected sound model for applying reflection processing to the first output result to obtain a second output result; obtaining a late reverberant sound model for applying reverberation processing to the first output result to obtain a third output result; and setting a main output bus for obtaining the second audio signal from the second output result and the third output result, the second audio signal being obtained by simulating the propagation of the N kinds of first audio signals in physical space.
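Illustratively, the resulting signal flow can be sketched at buffer level, with three delegates standing in for the direct sound processing model, the early reflected sound model, and the late reverberant sound model; the direct path is also summed into the main output bus, following the routing described with reference to fig. 3:

```csharp
using System;

// Buffer-level sketch of the modeled signal flow. The three delegates stand
// in for the direct sound processing model, the early reflected sound model,
// and the late reverberant sound model; the main output bus sums the direct
// path with the second and third output results to form the second audio
// signal, following the bus routing described with reference to fig. 3.
public class SoundFieldSpaceModel
{
    public Func<float[], float[]> DirectSound;     // attenuation   -> first output
    public Func<float[], float[]> EarlyReflection; // reflection    -> second output
    public Func<float[], float[]> LateReverb;      // reverberation -> third output

    public float[] Render(float[][] firstAudioSignals)
    {
        int length = firstAudioSignals[0].Length;
        var mainBus = new float[length];
        foreach (var dry in firstAudioSignals)     // N kinds of first audio signals
        {
            float[] first = DirectSound(dry);
            float[] early = EarlyReflection(first);
            float[] late  = LateReverb(first);
            for (int i = 0; i < length; i++)
                mainBus[i] += first[i] + early[i] + late[i];
        }
        return mainBus;                            // the second audio signal
    }
}
```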
It should be noted that, referring to fig. 10 and fig. 11 and in conjunction with one or more of the embodiments described in fig. 1 to fig. 9, one or more steps of the sound field roaming method of the present disclosure are implemented by the corresponding steps of the modeling method and are not repeated here.
The three-dimensional engine Unity serves as the scene rendering platform and is integrated with professional modeling software and the interactive audio engine Wwise; the developed UI system is also loaded in Unity, and these components are finally integrated into an application, yielding the concert hall system that realizes the interactive immersive sound field roaming method. As shown in fig. 11, communication between Wwise and Unity is based on the logic of audio event packaging: all audio material, events, and state attributes are packaged into a sound library. Through the API the library is exposed to Unity, where a series of event commands can be invoked from C# scripts.
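Illustratively, invoking a packaged audio event from a Unity C# script can be sketched as follows, assuming the Wwise Unity integration; "Play_Orchestra" is a hypothetical event name defined in the sound library:

```csharp
using UnityEngine;

// Sketch of the Wwise-to-Unity bridge: audio material packaged in the sound
// library is driven from a C# script through an event command, assuming the
// Wwise Unity integration. "Play_Orchestra" is a hypothetical event name.
public class PerformanceStarter : MonoBehaviour
{
    void Start()
    {
        // Posts the packaged event on this game object; the Wwise runtime
        // resolves it against the loaded sound library.
        AkSoundEngine.PostEvent("Play_Orchestra", gameObject);
    }
}
```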
Illustratively, the currently defined synchronizer logic may include two state sets, controlling respectively the grouped playback of instrument tracks and the dynamic invocation and bypass of the convolution reverb auxiliary buses. The rule defining the instrument track playback states is: in the wind-set mute state, the dry/wet sound of the bangdi, nanxiao (south flute), and sheng is set to minus infinity while the dry/wet sound of the other instruments is set to 0; in the plucked-set mute state, the dry/wet sound of the pipa, zhongruan, daruan, sanxian, and yangqin (dulcimer) is set to minus infinity while the dry/wet sound of the other instruments is set to 0; and so on. The rule defining the dynamic control states of the convolution reverb is: the currently selected auxiliary bus among the 5 concert hall reverbs and the 4 natural scene and living environment reverbs is set to 0 while the other convolution reverb auxiliary buses are set to minus infinity, the early reflection auxiliary bus is set to 0, and the bypassed reverb of each auxiliary bus is set to minus infinity.
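As an illustration, the mute rules of the instrument-track state set can be represented as data. The state and track identifiers below are hypothetical renderings of the names above, with float.NegativeInfinity standing in for a level of minus infinity dB:

```csharp
using System;
using System.Collections.Generic;

// Data-oriented sketch of the instrument-track state set: each mute state
// lists the tracks set to minus infinity dB; all other tracks stay at 0 dB.
// The state and track identifiers are hypothetical renderings of the names
// described above.
public static class TrackStates
{
    static readonly Dictionary<string, string[]> MutedTracks =
        new Dictionary<string, string[]>
        {
            { "WindsMuted",   new[] { "Bangdi", "Nanxiao", "Sheng" } },
            { "PluckedMuted", new[] { "Pipa", "Zhongruan", "Daruan",
                                      "Sanxian", "Yangqin" } },
        };

    // Returns the dry/wet level in dB that the given state assigns a track.
    public static float VolumeDb(string state, string track)
        => Array.IndexOf(MutedTracks[state], track) >= 0
            ? float.NegativeInfinity
            : 0f;
}
```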
A series of audio functions is designed for professional audio engineers, so that professionals can practice their skills in a virtual workplace and music enthusiasts can experience and explore them. Given that the digital audio workstation is the work environment most familiar to audio professionals, an interactive control system in digital audio workstation form is a central requirement. The concert hall system provides several comprehensive control panels supporting custom adjustment; every parameter change takes effect during the virtual orchestra's performance without causing stutters or interruptions.
For mixing engineers and music sound researchers, the system allows the volume of each instrument track in the band to be adjusted, and supports selective playback and muting when mixing a particular instrument group or adjusting the overall level of a work. The system also supports switching and bypassing reverberation effects.
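Illustratively, a per-track fader of the workstation-style panel can be sketched as a game parameter (RTPC) update, assuming the Wwise Unity integration; the "TrackVolume_" naming scheme is a hypothetical convention for this sketch:

```csharp
using UnityEngine;

// Sketch of a per-track fader in the workstation-style control panel,
// assuming the Wwise Unity integration. The "TrackVolume_" prefix is a
// hypothetical game parameter (RTPC) naming convention for this sketch.
public class TrackFader : MonoBehaviour
{
    // Bound to a UI slider; volumeDb in decibels, e.g. -96 (mute) to +12.
    public void OnFaderChanged(string trackName, float volumeDb)
    {
        AkSoundEngine.SetRTPCValue("TrackVolume_" + trackName, volumeDb);
    }
}
```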
The main task of recording engineers and stage technicians is handling the various microphones; selecting and matching multiple microphones calls for dedicated analysis and design. A recording engineer's job is to capture the music first-hand, but without personal on-site practice it is hard to grow quickly; one solution is to simulate the recording online and prepare thoroughly. The system therefore supports switching between, and critically listening to, audio files recorded with microphones of different pickup patterns and frequency responses. This functionality may also contribute to building a digital microphone library.
In some embodiments, a monitoring setup can be provided at the experiencing end: a head tracker is added to the system that tracks the rotation direction and angle of the listener's head in real time, so as to simulate the change of sound source direction produced by head rotation. When the user monitors over headphones, the head tracker is placed at the middle of the headband directly above the head and is paired with the computer via Bluetooth.
Head tracking is based on the binaural effect: accurate head position information is acquired by the head tracker, and processing of filtering, delay, sound reflection, sound field displacement, and related information completes the binauralization of the actual spatial sound field without adding coloration. Head tracking also offers a practical personalized head modeling function whose core is modeling head-related transfer functions: the HRTF describes the physical process by which sound waves propagate from the spatial source position to the two ears, including the scattering and diffraction caused by the listener's anatomy (head, torso, pinnae, and so on). Equivalently, the HRTF captures the changes in amplitude and phase as sound travels from the source to the ears. Data such as the listener's head circumference and inter-ear distance are obtained through the tracker, and the interaural delay and the filtering and gain required for each ear are computed to compensate for the body's filtering influence in a real sound field.
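Illustratively, the interaural time difference derived from the tracked head angle can be approximated with Woodworth's spherical-head formula, used here as a stand-in for the full HRTF processing rather than the system's exact computation:

```csharp
using System;

// Interaural time difference (ITD) from head tracking data, approximated
// with Woodworth's spherical-head formula ITD = (a / c) * (sin(theta) + theta),
// where a is the head radius, c the speed of sound, and theta the source
// azimuth relative to straight ahead (radians, |theta| <= pi/2). This is a
// stand-in for full HRTF filtering, not the system's exact computation.
public static class Itd
{
    const float SpeedOfSound = 343f; // m/s at room temperature

    // headCircumference in meters, e.g. measured when fitting the tracker.
    public static float Seconds(float headCircumference, float azimuthRadians)
    {
        float headRadius = headCircumference / (2f * (float)Math.PI);
        return headRadius / SpeedOfSound
            * ((float)Math.Sin(azimuthRadians) + azimuthRadians);
    }
}
```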
According to the embodiment of the invention, the concert hall scene is, on the one hand, built and made interactive through virtual reality; on the other hand, the audible flow from sound source through space to receiver is realized by simulating sound transmission and propagation and is finally presented as binaural audio. The functional design of the concert hall system centers on the needs of different groups around a musical performance. For concert listeners, functions such as sound field exploration, sound image localization, and virtual space simulation are designed around the needs of immersion, realism, and binaural localization that simulates real space. For orchestras and conductors, instrument acoustic simulation, sound part placement adjustment, and real-time concert hall sound field switching address the predicament of globally touring orchestras, help the shift from offline to online performance, and enable online simulated rehearsal. For audio engineers, the system targets an online analogue of the digital audio workstation, mixer, and recorder for convenient work and skills practice, and provides a user interaction control system that can be tuned in real time.
Based on the audibility-based interactive immersive sound field roaming method, the invention further provides an audibility-based interactive immersive sound field roaming system, described in detail below with reference to fig. 12.
Fig. 12 schematically illustrates a block diagram of an interactive immersive sound field roaming system 1200 based on audibility, in accordance with an embodiment of the present invention.
As shown in fig. 12, the audibility-based interactive immersive sound field roaming system 1200 may include a position determination unit 1210, a relative position unit 1220, a signal processing unit 1230, and an audio playback unit 1240.
The position determining unit 1210 may perform operation S110 for determining N first positions of N kinds of virtual musical instruments in the virtual sound field space, and a second position of a virtual character in the virtual sound field space, wherein the virtual character is to be operated by a user to stop or move in the virtual sound field space.
The position determining unit 1210 may also perform operations S810 to S820 and operations S910 to S920, which are not repeated here.
The relative position unit 1220 may perform operation S120 for determining relative position information between N first positions, which are N virtual sound source positions, and a second position, which is a virtual listening position, where N is an integer greater than or equal to 1.
The signal processing unit 1230 may perform operation S130: according to the relative position information, process the N kinds of first audio signals using the sound field space model to obtain the second audio signal, where the sound field space model simulates the propagation of the N kinds of first audio signals in physical space and the N kinds of first audio signals correspond one-to-one to the N kinds of virtual instruments.
The signal processing unit 1230 may further perform operations S210 to S240, S410 to S440, S510 to S530, S610 to S620, and S710 to S720, which are not repeated here.
The audio playing unit 1240 may perform operation S140 for playing the second audio signal to the user in response to the playing operation of the user.
Fig. 13 is a schematic structural diagram of a computing device according to an embodiment of the present invention, and the specific embodiment of the present invention does not limit the specific implementation of the computing device.
As shown in fig. 13, the computing device may include: a processor (processor) 1302, a communication Interface (Communications Interface) 1304, a memory (memory) 1306, and a communication bus 1308.
Wherein:
the processor 1302, communication interface 1304, and memory 1306 communicate with each other via a communication bus 1308.
A communication interface 1304 for communicating with network elements of other devices, such as clients or other servers.
The processor 1302 is configured to execute the program 1310, and may specifically execute the relevant steps of the sound field roaming method embodiments described above.
In particular, the program 1310 may include program code that includes computer operating instructions.
The processor 1302 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement an embodiment of the present invention. The one or more processors included in the computing device may be processors of the same type, such as one or more CPUs, or of different types, such as one or more CPUs together with one or more ASICs.
A memory 1306 for storing a program 1310. Memory 1306 may include high-speed RAM memory, and may also include non-volatile memory (nonvolatile memory), such as at least one disk memory.
The program 1310 may specifically be configured to cause the processor 1302 to perform the sound field roaming method in any of the method embodiments described above. For the specific implementation of each step in the program 1310, reference may be made to the corresponding steps and unit descriptions in the embodiments above, which are not repeated here. Those skilled in the art will clearly understand that, for convenience and brevity of description, the specific working processes of the devices and modules described above may refer to the corresponding process descriptions in the foregoing method embodiments and are not repeated here.
The present invention also provides a computer-readable storage medium, which may be embodied in the apparatus/device/system described in the above embodiments, or may exist alone without being assembled into that apparatus/device/system. The computer-readable storage medium carries one or more programs which, when executed, implement the method according to an embodiment of the present invention.
The algorithms or displays presented herein are not inherently related to any particular computer, virtual machine, or other apparatus.
Various general purpose systems may also be used with the teachings herein. The required structure for constructing such a system is apparent from the description above. In addition, embodiments of the present invention are not directed to any particular programming language. It is appreciated that a variety of programming languages may be used to implement the teachings of the present invention as described herein, and any descriptions of specific languages are provided above to disclose the best mode of the invention.
In the description provided herein, numerous specific details are set forth. It is understood, however, that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.
Similarly, it should be appreciated that in the foregoing description of exemplary embodiments of the invention, various features of the embodiments of the invention are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the invention and aiding in the understanding of one or more of the various inventive aspects. However, the method of the invention should not be construed to reflect the intent: that the invention as claimed requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment.
Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this invention.
Those skilled in the art will appreciate that the modules in the devices in an embodiment may be adaptively changed and arranged in one or more devices different from the embodiment. The modules or units or components of the embodiments may be combined into one module or unit or component, and furthermore they may be divided into a plurality of sub-modules or sub-units or sub-components. All of the features disclosed in this specification (including any accompanying claims, abstract and drawings), and all of the processes or elements of any method or apparatus so disclosed, may be combined in any combination, except combinations where at least some of such features and/or processes or elements are mutually exclusive. Each feature disclosed in this specification (including any accompanying claims, abstract and drawings) may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise.
Moreover, those of skill in the art will appreciate that while some embodiments herein include some features included in other embodiments, not others, combinations of features of different embodiments are meant to be within the scope of the invention and form different embodiments. For example, in the following claims, any of the claimed embodiments may be used in any combination.
Various component embodiments of the invention may be implemented in hardware, or in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will appreciate that a microprocessor or Digital Signal Processor (DSP) may be used in practice to implement some or all of the functionality of some or all of the components according to embodiments of the present invention. The present invention may also be embodied as apparatus or device programs (e.g., computer programs and computer program products) for performing a portion or all of the methods described herein. Such programs implementing the present invention may be stored on computer-readable media or may be in the form of one or more signals. Such a signal may be downloaded from an internet website, or provided on a carrier signal, or provided in any other form.
It should be noted that the above-mentioned embodiments illustrate rather than limit the invention, and that those skilled in the art will be able to design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention can be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In the unit claims enumerating several means, several of these means can be embodied by one and the same item of hardware. The usage of the words first, second and third, etcetera do not indicate any ordering. These words may be interpreted as names. The steps in the above embodiments should not be construed as limited to the order of execution unless otherwise specified.

Claims (11)

1. An audibility-based interactive immersive sound field roaming method, comprising:
determining N first positions of N virtual musical instruments in a virtual sound field space and a second position of a virtual character in the virtual sound field space, wherein the virtual character is used for being operated by a user to stop or move in the virtual sound field space;
determining relative position information between the N first positions and the second positions, wherein the N first positions are N virtual sound source positions, the second positions are virtual listening positions, and N is an integer greater than or equal to 1;
processing N first audio signals by using a sound field space model according to the relative position information to obtain second audio signals, wherein the sound field space model is used for simulating the propagation of the N first audio signals in a physical space, and the N first audio signals are in one-to-one correspondence with the N virtual musical instruments;
and responding to the playing operation of the user, and playing the second audio signal to the user.
2. The method according to claim 1, wherein the sound field spatial model comprises a direct sound processing model, an early reflected sound model and a late reverberant sound model, and the processing the N types of first audio signals with the sound field spatial model to obtain the second audio signals comprises:
performing attenuation processing on the N first audio signals by using the direct sound processing model to obtain a first output result;
inputting the first output result into the early reflected sound model for reflection processing to obtain a second output result;
inputting the first output result into the later reverberation model for reverberation processing to obtain a third output result;
and obtaining the second audio signal according to the second output result and the third output result.
3. The method as recited in claim 2, wherein the relative position information comprises distance information, and wherein performing attenuation processing on the N first audio signals using the direct sound processing model comprises:
processing the N first audio signals according to the distance information using N distance attenuation curves, wherein the N distance attenuation curves correspond one-to-one to the N first audio signals, and any two of the N distance attenuation curves may be the same or different.
4. The method according to claim 3, wherein processing the N first audio signals according to the distance information using N distance attenuation curves comprises performing cone attenuation processing on at least one of the N first audio signals, specifically comprising: for any one of the at least one audio signal,
obtaining a propagation distance based on the internal space information of the virtual sound field space;
taking the position of a virtual sound source corresponding to the audio signal as the position of a sphere center, and taking the propagation distance as a radius to obtain a spherical propagation area of the audio signal;
dividing the spherical propagation region into an inner angle region, an outer angle region, and a transition region between the inner angle region and the outer angle region;
and performing corresponding attenuation processing on the audio signal according to the actual region to which the second position belongs to obtain the first output result, wherein the actual region is any one of the inner angle region, the outer angle region, and the transition region.
5. The method of claim 2, further comprising:
calculating M virtual sound sources according to the N virtual sound source positions and the geometric form of the virtual sound field space;
and calculating S sound reflection paths according to the second position and the geometric form, wherein M and S are each integers greater than or equal to 1;
wherein, the inputting the first output result into the early stage reflected sound model for reflection processing to obtain a second output result includes:
and performing reflection processing on the first output result according to the M virtual sound sources and the S sound reflection paths to obtain a second output result.
6. The method of claim 5, wherein prior to said calculating S sound reflection paths, the method further comprises:
taking the virtual character as a ray source, and emitting virtual rays from the second position;
and detecting auditory interaction information through the virtual ray, wherein the auditory interaction information comprises the distance between the virtual character and the wall in the virtual sound field space and the material information of the wall in the virtual sound field space.
7. The method of claim 2, wherein the late reverberation sound model includes K impulse response signals obtained by recording K physical environments, and wherein inputting the first output result into the late reverberation sound model for reverberation processing to obtain a third output result comprises:
responding to a first virtual sound field space selected by the user from K virtual sound field spaces, and calling a first impulse response signal, wherein the first virtual sound field space is constructed from a first physical environment among the K physical environments, and K is an integer greater than or equal to 1;
and performing convolution calculation on the first output result and the first impulse response signal to obtain a third output result.
8. The method of claim 1, wherein the method further comprises:
in response to a first instruction from the user to move the virtual character, causing the virtual character to move to a third location;
updating the virtual listening position to the third position;
and re-executing the operations of determining the relative position information, obtaining the second audio signal and playing the second audio signal to the user.
9. The method of claim 1, wherein the method further comprises:
in response to a second instruction from the user to move at least one virtual instrument, causing the at least one virtual instrument to move to a fourth position;
updating the corresponding position of the at least one virtual musical instrument in the N virtual sound source positions to the fourth position;
and re-executing the operations of determining the relative position information, obtaining the second audio signal and playing the second audio signal to the user.
10. An audibility-based interactive immersive sound field roaming system, comprising:
a position determination unit for determining N first positions of N kinds of virtual musical instruments in a virtual sound field space, and a second position of a virtual character in the virtual sound field space, wherein the virtual character is used for being operated by a user to stop or move in the virtual sound field space;
a relative position unit, configured to determine relative position information between the N first positions and the second position, where the N first positions are N virtual sound source positions, the second position is a virtual listening position, and N is an integer greater than or equal to 1;
a signal processing unit, configured to process N types of first audio signals by using a sound field space model according to the relative position information, to obtain a second audio signal, where the sound field space model is used to simulate propagation of the N types of first audio signals in a physical space, and the N types of first audio signals are in one-to-one correspondence with the N types of virtual musical instruments;
and the audio playing unit is used for responding to the playing operation of the user and playing the second audio signal to the user.
11. A method of modeling interactive immersive sound field roaming, comprising:
obtaining a direct sound processing model, wherein the direct sound processing model is used for carrying out attenuation processing on N types of first audio signals to obtain a first output result, the N types of first audio signals are respectively transmitted to virtual listening positions from N virtual sound source positions, and N is an integer greater than or equal to 1;
obtaining an early phase reflected sound model, wherein the early phase reflected sound model is used for performing reflection processing on the first output result to obtain a second output result;
obtaining a late reverberation sound model for performing reverberation processing on the first output result to obtain a third output result;
and setting a main output bus, wherein the main output bus is used for obtaining a second audio signal according to the second output result and the third output result, and the second audio signal is obtained by simulating the propagation of the N types of first audio signals in a physical space.