CN115103292A - Audio processing method and device in virtual scene, electronic equipment and storage medium - Google Patents


Info

Publication number
CN115103292A
Authority
CN
China
Prior art keywords
scene
target
audio
candidate
user
Prior art date
Legal status
Pending
Application number
CN202210551897.1A
Other languages
Chinese (zh)
Inventor
黄润乾
陈东鹏
Current Assignee
Voiceai Technologies Co ltd
Original Assignee
Voiceai Technologies Co ltd
Priority date
Filing date
Publication date
Application filed by Voiceai Technologies Co ltd
Priority to CN202210551897.1A
Publication of CN115103292A

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04S STEREOPHONIC SYSTEMS
    • H04S 7/00 Indicating arrangements; Control arrangements, e.g. balance control
    • H04S 7/30 Control circuits for electronic adaptation of the sound field
    • H04S 7/302 Electronic adaptation of stereophonic sound system to listener position or orientation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/16 Sound input; Sound output
    • G06F 3/165 Management of the audio stream, e.g. setting of volume, audio stream path
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04S STEREOPHONIC SYSTEMS
    • H04S 2400/00 Details of stereophonic systems covered by H04S but not provided for in its groups
    • H04S 2400/11 Positioning of individual sound objects, e.g. moving airplane, within a sound field
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04S STEREOPHONIC SYSTEMS
    • H04S 2400/00 Details of stereophonic systems covered by H04S but not provided for in its groups
    • H04S 2400/13 Aspects of volume control, not necessarily automatic, in stereophonic sound systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Signal Processing (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Stereophonic System (AREA)

Abstract

The application discloses an audio processing method and apparatus in a virtual scene, an electronic device, and a storage medium, wherein the method comprises the following steps: determining candidate scene elements in the virtual scene which are associated with scene audio; determining relative distances between a target user and each candidate scene element in the virtual scene; determining the target volume of an audio signal corresponding to the scene audio transmitted from the candidate scene element to the target user according to the relative distance between the candidate scene element and the target user and the volume information corresponding to the scene audio; determining a target scene audio of which the corresponding target volume exceeds a volume threshold in the scene audio associated with the candidate scene element; adjusting the target scene audio according to the relative orientation information between the candidate scene elements corresponding to the target scene audio and the target user; and sending the adjusted target scene audio to the interactive device of the target user. The method and apparatus can screen scene audio, saving computing resources.

Description

Audio processing method and device in virtual scene, electronic equipment and storage medium
Technical Field
The present application relates to the field of audio processing, and in particular, to an audio processing method and apparatus in a virtual scene, an electronic device, and a storage medium.
Background
With the rapid development of computer technology, virtual reality and augmented reality technology are becoming more and more common in people's daily lives. In a virtual scene of Virtual Reality (VR), Augmented Reality (AR), or Mixed Reality (MR), a user may view the content of the virtual scene through an interactive device, such as a head-mounted stereoscopic display, and listen to the audio in the virtual scene through headphones or loudspeakers. In the related art, in an AR or VR scene, the sounds heard by different users in the virtual scene are the same, so users lack a realistic sense of hearing in the virtual scene.
Disclosure of Invention
In view of the foregoing problems, embodiments of the present application provide an audio processing method and apparatus in a virtual scene, an electronic device, and a storage medium, so as to improve the foregoing problems.
In a first aspect, an embodiment of the present application provides an audio processing method in a virtual scene, where the method includes: determining candidate scene elements in the virtual scene which are associated with scene audio; determining a relative distance between a target user and each of the candidate scene elements in the virtual scene; according to the relative distance between the candidate scene element and the target user and the volume information corresponding to the scene audio associated with the candidate scene element, determining the target volume of the audio signal corresponding to the corresponding scene audio which is transmitted from the candidate scene element to the position of the target user in the virtual scene; determining a target scene audio of which the corresponding target volume exceeds a volume threshold in the scene audio associated with the candidate scene element; adjusting the target scene audio according to the relative orientation information between the candidate scene element corresponding to the target scene audio and the target user; and sending the adjusted target scene audio to the interaction equipment of the target user.
In a second aspect, an embodiment of the present application provides an audio processing apparatus in a virtual scene, including: a candidate scene element determination module, configured to determine candidate scene elements associated with scene audio in the virtual scene; a relative distance determination module for determining a relative distance between a target user and each of the candidate scene elements in the virtual scene; a target volume determining module, configured to determine, according to a relative distance between the candidate scene element and the target user and volume information corresponding to a scene audio associated with the candidate scene element, a target volume at which an audio signal corresponding to the corresponding scene audio propagates from the candidate scene element to a position of the target user in the virtual scene; the target scene audio determining module is used for determining the target scene audio of which the corresponding target volume exceeds a volume threshold value in the scene audio associated with the candidate scene element; the target scene audio adjusting module is used for adjusting the target scene audio according to the relative orientation information between the candidate scene element corresponding to the target scene audio and the target user; and the transmission module is used for transmitting the adjusted target scene audio to the interactive equipment of the target user.
In some embodiments, the relative bearing information comprises a relative distance between the target user and the candidate scene element in the virtual scene; in this embodiment, the target scene audio adjusting module includes: an adjusted volume determining unit, configured to determine, according to a relative distance between the target user and the candidate scene element in the virtual scene, an adjusted volume corresponding to the target scene audio; and the volume adjusting unit is used for adjusting the volume of the target scene audio according to the adjusted volume.
In some embodiments, the target scene audio adjustment module further comprises: a delay duration determination unit, configured to determine a delay duration of the target scene audio relative to the target user according to a relative distance between the target user and the candidate scene element in the virtual scene; and the delay duration sending unit is used for sending the delay duration to the interactive equipment of the target user so as to enable the interactive equipment of the target user to delay playing of the target scene audio according to the delay duration after receiving the target scene audio.
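One natural way to derive such a delay duration is to model it as sound's travel time over the in-scene distance. The sketch below makes that assumption explicit; the function and constant names are illustrative, not from the patent:

```python
SPEED_OF_SOUND_M_S = 343.0  # approximate speed of sound in air at 20 °C

def delay_duration_s(relative_distance_m: float) -> float:
    """Delay of the target scene audio relative to the target user,
    modeled as the propagation time over the in-scene distance."""
    return relative_distance_m / SPEED_OF_SOUND_M_S

# An element 686 m away in the virtual scene is delayed by 2 seconds.
print(delay_duration_s(686.0))  # → 2.0
```

The interaction device would then buffer the received target scene audio for this duration before starting playback.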
In further embodiments, the relative orientation information comprises first relative orientation information of the candidate scene element relative to the target user in the virtual scene; in this embodiment, the target scene audio adjusting module includes a sound channel adjusting unit, configured to perform sound channel adjustment on the target scene audio according to the first relative direction information.
In some embodiments, the audio processing device in the virtual scene further comprises: a target scene element determination module to determine a target scene element located between the candidate scene element and the target user in the virtual scene; and the tone color adjusting module is used for adjusting the tone color of the target scene audio according to the attribute information of the target scene element.
In some implementations, the audio processing device in the virtual scene further includes: the climate type determining module is used for determining the climate type of the virtual scene; and the audio superposition module is used for superposing the environmental audio corresponding to the climate type on the target scene audio.
In some embodiments, the audio processing device in the virtual scene further comprises: the local virtual scene determining module is used for determining a local virtual scene in the set range of the target user in the virtual scene according to the position information of the target user in the virtual scene; the target scene type determining module is used for determining a target scene type corresponding to the local virtual scene; and the reverberation superposition module is used for superposing a reverberation effect on the target scene audio according to the reverberation parameter information corresponding to the target scene type.
In some embodiments, the audio processing device in the virtual scene further comprises: a reference user determining module, configured to determine, in the virtual scene, a reference user whose distance to the target user is smaller than a distance threshold; and the audio sending module is used for sending the adjusted target scene audio to the interactive equipment of each reference user.
In some embodiments, the audio processing device in the virtual scene further comprises: a second relative orientation information determining module, configured to determine second relative orientation information of the candidate scene element with respect to each of the reference users in the virtual scene; a sound channel adjustment information determining module, configured to determine, according to the second relative orientation information, sound channel adjustment information of the target scene audio relative to each reference user; and a sound channel adjustment information sending module, configured to send the sound channel adjustment information of the target scene audio relative to each reference user to the interaction device of the corresponding reference user, so that after receiving the adjusted target scene audio, the interaction device of the reference user performs sound channel adjustment on it according to the sound channel adjustment information.
In a third aspect, an embodiment of the present application provides an electronic device, including: a processor; a memory having computer readable instructions stored thereon which, when executed by the processor, implement a method of audio processing in a virtual scene as described above.
In a fourth aspect, the present application provides a computer-readable storage medium, on which computer-readable instructions are stored, which, when executed by a processor, implement the audio processing method in a virtual scene as described above.
In the scheme of the application, candidate scene elements related to scene audio in a virtual scene can be determined; determining relative distances between a target user and each candidate scene element in the virtual scene; according to the relative distance between the candidate scene element and the target user and the volume information corresponding to the scene audio associated with the candidate scene element, determining the target volume of the audio signal corresponding to the corresponding scene audio, which is transmitted from the candidate scene element to the position of the target user in the virtual scene; determining a target scene audio of which the corresponding target volume exceeds a volume threshold in the scene audio associated with the candidate scene element; adjusting the target scene audio according to the relative orientation information between the candidate scene elements corresponding to the target scene audio and the target user; and sending the adjusted target scene audio to the interactive equipment of the target user.
According to the above scheme, the target volume at which the audio signal of the scene audio associated with each candidate scene element arrives at the target user is determined from the relative distance between the target user and the candidate scene element in the virtual scene, and the scene audio is then screened according to the target volume. In this way, the target scene audio that the target user can clearly hear at the target user's position is determined, which is equivalent to simulating, within the virtual scene, how the target scene audio would propagate in a real environment, thereby improving the auditory experience of the target user in the virtual scene. Moreover, in the scheme of the application, only the determined target scene audio is adjusted for the target user, and other scene audio need not be adjusted, which saves processing resources.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention, as claimed.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present application and together with the description, serve to explain the principles of the application. It is obvious that the drawings in the following description are only some embodiments of the application, and that for a person skilled in the art, other drawings can be derived from these drawings without inventive effort.
Fig. 1 is a schematic diagram of a virtual scene shown according to an embodiment of the present application.
Fig. 2 is a flowchart illustrating an audio processing method in a virtual scene according to an embodiment of the present application.
Fig. 3 is a flowchart illustrating specific steps before step 250 according to an embodiment of the present application.
Fig. 4 is a flowchart illustrating specific steps before step 260 according to an embodiment of the present application.
Fig. 5 is a flowchart illustrating specific steps after step 250 according to an embodiment of the present application.
Fig. 6 is a flowchart illustrating specific steps after step 510 according to an embodiment of the present application.
Fig. 7 is a block diagram illustrating an audio processing apparatus in a virtual scene according to an embodiment of the present application.
Fig. 8 is a hardware block diagram of an electronic device according to an embodiment of the present application.
While specific embodiments of the invention have been shown by way of example in the drawings and will be described in detail hereinafter, such drawings and description are not intended to limit the scope of the inventive concepts in any manner, but rather to illustrate the inventive concepts to those skilled in the art by way of specific embodiments.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art.
Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of embodiments of the application. One skilled in the relevant art will recognize, however, that the subject matter of the present application can be practiced without one or more of the specific details, or with other methods, components, devices, steps, and so forth. In other instances, well-known methods, devices, implementations, or operations have not been shown or described in detail to avoid obscuring aspects of the application.
Fig. 1 is a schematic diagram of a virtual scene shown according to an embodiment of the present application, as shown in fig. 1, in a virtual scene 100, there are a target user 110, candidate scene elements (a first candidate scene element 120, a second candidate scene element 130, and a third candidate scene element 140 are exemplarily shown in fig. 1), and a reference user (a first reference user 150 and a second reference user 160 are exemplarily shown in fig. 1).
The virtual scene may be a three-dimensional scene or a 2.5-dimensional scene constructed by the electronic device according to an environment image acquired by the terminal, the environment image may be acquired by the terminal, and the virtual scene may be presented on a display interface of the terminal. The electronic device may be an AR server, a VR server, or the like, and is not particularly limited herein.
Based on the virtual scene shown in fig. 1, in the present solution, the electronic device may preset the scene audio associated with each candidate scene element, where the scene audio may be audio simulating the sound production of that candidate scene element; suppose the first candidate scene element 120, the second candidate scene element 130, and the third candidate scene element 140 are each associated with scene audio. On this basis, for the target user, the target volume at which the scene audio corresponding to each candidate scene element propagates from that element to the target user 110 is determined according to the relative distances between the first candidate scene element 120, the second candidate scene element 130, and the third candidate scene element 140 and the target user 110, and the scene audio whose target volume exceeds a volume threshold is determined as the target scene audio. For example, if the calculated target volume at which the scene audio corresponding to the first candidate scene element 120 propagates to the target user 110 is greater than the volume threshold, the scene audio corresponding to the first candidate scene element 120 is determined as the target scene audio; that target scene audio is then adjusted according to the relative orientation information between the first candidate scene element 120 and the target user 110, and the adjusted target scene audio is finally sent to the interaction device of the target user 110. In some embodiments, the adjusted target scene audio may also be sent to the interaction devices of the first reference user 150 and the second reference user 160 according to their respective distances from the target user 110.
Fig. 2 is a flowchart illustrating an audio processing method in a virtual scene according to an embodiment of the present application, where the method may be performed by an electronic device with computing processing capability, and the electronic device may be an AR server, a VR server, or another device with processing capability. As shown in fig. 2, the method comprises the steps of:
in step 210, candidate scene elements in the virtual scene associated with the scene audio are determined.
A virtual scene refers to a scene that an application displays (or provides) when running on an interactive device (e.g., VR glasses or other interactive devices used to display virtual scenes). The virtual scene can be a simulated environment of the real world, a semi-simulated, semi-fictional virtual scene, or a purely fictional virtual scene. The virtual scene may be any one of a two-dimensional virtual scene, a 2.5-dimensional virtual scene, or a three-dimensional virtual scene, and the dimension of the virtual scene is not limited in the embodiments of the present application.
In some embodiments, there are various scene elements in the virtual scene, for example, the virtual scene may be sky, land, sea, etc., the land may include scene elements of desert, city, etc., and may also include other scene elements, such as airplanes, automobiles, etc.
Scene audio refers to audio that simulates each candidate scene element in a virtual scene to sound in real life. In a specific embodiment, a corresponding relationship between the candidate scene elements and the scene audio may be preset, and it may be understood that, in the virtual scene, some candidate scene elements may not be associated with the scene audio. Thus, based on the correspondence between the candidate scene element and the scene audio, the candidate scene element associated with the scene audio can be determined.
In step 220, the relative distance between the target user and each candidate scene element in the virtual scene is determined.
Relative distance refers to the distance of each candidate scene element in the virtual scene relative to the target user.
The position information of the target user in the virtual scene may be position information of a model constructed for the target user in the virtual scene, or position information of an object manipulated by the target user in the virtual scene. For the target user, the target user is not actually located in the virtual scene, but the target user participates in the interaction with the virtual scene through a controllable object in the virtual scene or a model constructed for the target user.
In the virtual scene, after the corresponding simulated scene is constructed by combining the environment image of the environment where the target user is located, the position information in the virtual scene of the model corresponding to the target user is determined accordingly. Similarly, the object manipulated by the user in the virtual scene is determined once the virtual scene is constructed, so after the object manipulated by the target user is determined, the position information of that object in the virtual scene can be used as the position information of the target user in the virtual scene.
The relative distance between the target user and each candidate scene element can be obtained from their coordinates in the coordinate system corresponding to the virtual scene. Specifically, if the coordinates of the target user in that coordinate system are A(a, b, c), the coordinates of a candidate scene element are B(x, y, z), and the relative distance between them is L, then:

L = √((x − a)² + (y − b)² + (z − c)²)
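The distance calculation above is the ordinary Euclidean distance between the two coordinate triples; a minimal Python sketch (identifier names are illustrative, not from the patent):

```python
import math

def relative_distance(user_pos, element_pos):
    """Euclidean distance between the target user and a candidate scene
    element in the virtual scene's coordinate system."""
    (a, b, c) = user_pos
    (x, y, z) = element_pos
    return math.sqrt((x - a) ** 2 + (y - b) ** 2 + (z - c) ** 2)

# Example: user at A(0, 0, 0), candidate scene element at B(3, 4, 0)
print(relative_distance((0, 0, 0), (3, 4, 0)))  # → 5.0
```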
Step 230, determining, according to the relative distance between the candidate scene element and the target user and the volume information corresponding to the scene audio associated with the candidate scene element, the target volume at which the audio signal of the corresponding scene audio, propagating from the candidate scene element, arrives at the position of the target user in the virtual scene.
For the scene audio associated with each candidate scene element, an initial volume of the scene audio is set correspondingly, and the initial volume set for a scene audio is the volume information corresponding to that scene audio. It can be understood that if the scene audio is not adjusted, it is played at the initial volume, so that the volume of the scene audio heard by every user interacting with the virtual scene is the same. The volume of the scene audio may also be understood as its sound intensity.
The target volume is the simulated volume at which the scene audio associated with a candidate scene element in the virtual scene, propagating as it would in reality, reaches the target user.
In some embodiments, the target volume may be determined from the volume of the scene audio at the location of the corresponding candidate scene element (i.e., the initial volume described above) and the relative distance between the candidate scene element and the target user. Optionally, if the volume information corresponding to the scene audio associated with a candidate scene element indicates that the volume of the scene audio at the candidate scene element is B₀, the relative distance between the candidate scene element and the target user is R, and the sound attenuation coefficient is K, then the target volume B at which the audio signal of the corresponding scene audio propagates from the candidate scene element to the position of the target user in the virtual scene may be determined as, for example:

B = B₀ − K·R
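As a sketch of this step, assuming a linear attenuation model B = B₀ − K·R (the patent's exact formula appears only as a figure, so this model and all identifier names are assumptions):

```python
def target_volume(initial_volume_db: float, distance: float,
                  attenuation_coeff: float) -> float:
    """Target volume B at the user's position: the initial volume B0 at
    the candidate scene element minus distance-proportional attenuation
    K*R, floored at 0 dB so the result stays physically meaningful."""
    return max(initial_volume_db - attenuation_coeff * distance, 0.0)

# 80 dB at the element, 30 units away, K = 0.5 → 80 − 15 = 65 dB heard
print(target_volume(80.0, 30.0, 0.5))  # → 65.0
```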
Step 240, in the scene audio associated with the candidate scene element, determining a target scene audio whose corresponding target volume exceeds the volume threshold.
The target scene audio is the scene audio whose audio signal, propagated from the corresponding candidate scene element to the position of the target user in the virtual scene, arrives with a target volume exceeding the volume threshold.
Because an audio signal is affected by the propagation medium as it propagates in a real environment, its volume at the sound source differs from the volume received at the listening position; if the volume arriving at the receiver's position is too small, the user cannot perceive it aurally, and the audio signal therefore need not be attended to. Accordingly, to make the user's experience more realistic, the propagation of the scene audio associated with each candidate scene element in the virtual scene is simulated, and the target scene audio whose target volume exceeds the volume threshold is then determined.
In some embodiments, the volume received at the position of the target user after the audio propagates through the medium is smaller than the volume at the candidate scene element, and if it is too small the audio cannot be heard at all. To avoid running subsequent audio processing on inaudible scene audio, the scene audio is screened, and only scene audio whose target volume at the position of the target user exceeds the volume threshold enters the subsequent steps. For example, with a volume threshold of 60 dB, if the target volume of one scene audio propagated to the position of the target user in the virtual scene is 20 dB, the scene audio is too quiet by the time it reaches the target user, and the target user cannot hear it; if the target volume of another scene audio propagated to the position of the target user is 61 dB, it is determined that the target user can clearly hear that scene audio, and it is determined as target scene audio. The volume threshold may be set according to actual needs and is not specifically limited herein.
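The screening described above reduces to a simple filter over the per-element target volumes. A hedged sketch (the dictionary layout and element names are illustrative):

```python
def filter_audible(scene_audios, volume_threshold_db=60.0):
    """Keep only scene audios whose target volume at the target user's
    position exceeds the threshold; only these proceed to the later
    volume, channel and delay adjustments."""
    return [a for a in scene_audios
            if a["target_volume"] > volume_threshold_db]

audios = [
    {"element": "waterfall", "target_volume": 61.0},     # audible
    {"element": "distant_bird", "target_volume": 20.0},  # inaudible
]
print(filter_audible(audios))  # only the waterfall entry remains
```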
Step 250, adjusting the target scene audio according to the relative orientation information between the candidate scene element corresponding to the target scene audio and the target user.
The relative orientation information refers to orientation information of candidate scene elements corresponding to the target scene audio relative to the target user, and the relative orientation information may include a relative distance between the candidate scene elements corresponding to the target scene audio and the target user, an orientation between the candidate scene elements corresponding to the target scene audio and the target user, and the like.
In some embodiments, a coordinate system may be established with the target user as the origin, and the relative orientation information between the candidate scene element corresponding to the target scene audio and the target user is determined by determining the coordinates of that candidate scene element in this coordinate system.
After the relative orientation information between the candidate scene element corresponding to the target scene audio and the target user is determined, the target scene audio can be adjusted according to the relative orientation information. Optionally, the volume level, the sound channel, etc. of the target scene audio may be adjusted.
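For the sound-channel part of the adjustment, one common realization, offered here only as an illustration since the patent does not fix a panning law, is to map the element's azimuth in the user-centered coordinate system to left/right channel gains:

```python
import math

def relative_azimuth(user_pos, user_facing_rad, element_pos):
    """Horizontal angle of the element relative to the user's facing
    direction (radians, positive = to the user's right); the user-centered
    coordinate frame here is an assumption for illustration."""
    dx = element_pos[0] - user_pos[0]
    dz = element_pos[2] - user_pos[2]
    return math.atan2(dx, dz) - user_facing_rad

def stereo_gains(azimuth_rad):
    """Constant-power pan from azimuth (an illustrative choice; the
    patent only states that the sound channel is adjusted)."""
    pan = max(-1.0, min(1.0, azimuth_rad / (math.pi / 2)))  # clamp ±90°
    theta = (pan + 1.0) * math.pi / 4                       # 0..π/2
    return math.cos(theta), math.sin(theta)                 # (left, right)

left, right = stereo_gains(math.pi / 2)
```

An element directly to the user's right (azimuth +90°) then plays almost entirely in the right channel.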
In some embodiments, the relative bearing information includes a relative distance between the target user and the candidate scene element in the virtual scene, step 250 includes:
determining the adjusted volume corresponding to the target scene audio according to the relative distance between the target user and the candidate scene element in the virtual scene; and adjusting the volume of the target scene audio according to the adjusted volume.
In real life, the propagation of audio is attenuated to different degrees by the environment, so the volume at the sound source differs from that at the receiver. Generally, the farther the distance between the sound source and the receiving position, the greater the attenuation and the smaller the received volume. To make the user's experience in the virtual scene more realistic, the volume of the target scene audio is likewise adjusted according to the relative distance between the candidate scene element corresponding to the target scene audio and the target user.
After the adjusted volume is determined, it is subtracted from the initial volume indicated by the volume information corresponding to the target scene audio to obtain the final volume of the target scene audio.
In some embodiments, a plurality of distance ranges may be preset, each distance range corresponding to an adjusted volume, and the adjusted volume may be determined according to the distance range corresponding to the distance value of the relative distance between the target user and the candidate scene element.
In other embodiments, a contrast distance may also be preset; the distance difference is obtained by subtracting the contrast distance from the relative distance between the target user and the candidate scene element, the adjusted volume of the target scene audio is determined according to this distance difference, and the volume of the target scene audio is then adjusted accordingly. For example, let the contrast distance be H, the relative distance between the target user and the candidate scene element be L, the volume of the target scene audio be G, the adjusted volume be F, and K be a positive proportional coefficient between the distance difference and the adjusted volume; the volume of the target scene audio after adjustment is then G' = G - F = G - K(L - H). The specific value of the proportional coefficient may be set according to actual needs and is not specifically limited herein.
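The distance-difference formula G' = G - K(L - H) can be sketched directly; the value of K here is an arbitrary illustration:

```python
def adjusted_target_volume_db(g_db, relative_distance, contrast_distance, k=0.5):
    """G' = G - F, where F = K * (L - H)."""
    f = k * (relative_distance - contrast_distance)
    return g_db - f

# source volume 70 dB, element 30 m away, contrast distance 10 m:
g_prime = adjusted_target_volume_db(70.0, 30.0, 10.0)
```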
In further embodiments, the relative orientation information comprises first relative orientation information of the candidate scene element relative to the target user in the virtual scene; step 250 comprises: and carrying out sound channel adjustment on the target scene audio according to the first relative direction information.
The first relative orientation information is used to indicate the orientation of the candidate scene element relative to the target user in the virtual scene. In some embodiments, so that the user perceives, as in the real world, the difference between the left and right channels of audio arriving from different positions, the target scene audio needs to be adjusted according to the first relative orientation information of the candidate scene element relative to the target user.
In some embodiments, adjusting the target scene audio according to the first relative orientation information may specifically be adjusting the volumes of the left and right channels of the target scene audio according to the orientation of the candidate scene element relative to the target user, thereby achieving channel adjustment of the target scene audio. For example, if the first relative orientation information indicates that the candidate scene element corresponding to the target scene audio is located on the left side of the target user, the volume of the left channel track of the target scene audio may be increased and the volume of the right channel track decreased, so as to implement the channel adjustment.
In other embodiments, adjusting the target scene audio according to the first relative orientation information may instead be determining, from that information, whether the candidate scene element is on the left side or the right side of the target user. Optionally, if the candidate scene element is located on the left side of the target user, the weight parameter of the left channel track of the target scene audio is correspondingly increased and the weight parameter of the right channel track is decreased, where a weight parameter is a parameter determining the volume of a channel track; if the candidate scene element is located on the right side of the target user, the weight parameter of the right channel track is increased and the weight parameter of the left channel track is decreased, so as to adjust the channels of the target scene audio. Increasing the weight parameter of a channel track increases the volume of that track. The specific weight parameters may be set according to actual needs; the above is merely illustrative and not specifically limiting.
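A hedged sketch of the weight-parameter variant: the weight of the channel on the element's side is raised and the other lowered (the step size and starting weights are assumptions):

```python
def adjust_channel_weights(left_w, right_w, element_side, step=0.2):
    """Raise the weight of the channel on the element's side, lower the other."""
    if element_side == "left":
        return left_w + step, right_w - step
    return left_w - step, right_w + step

lw, rw = adjust_channel_weights(0.5, 0.5, "left")
```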
Continuing with fig. 2, step 260, the adjusted target scene audio is sent to the target user's interactive device.
In some embodiments, since the frequency band that human ears can distinguish is 20 to 20000 Hz, the adjusted target scene audio may first be screened, and the screened target scene audio sent to the interaction device of the target user. Specifically, only the audio components whose frequencies lie within a preset frequency range may be retained, so as to reduce the storage space occupied by the target scene audio.
In other embodiments, the adjusted target scene audio may further be compression-encoded to reduce the storage space it occupies. In this embodiment, on the premise that audibility for the user is not affected, the target scene audio may optionally be compression-encoded using methods such as MP3 encoding, WMA encoding, or AAC encoding. Optionally, the storage space occupied by the target scene audio may also be reduced by methods such as lowering the sampling precision or the sampling rate.
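One of the size-reduction options mentioned above, lowering the sampling rate, can be sketched as naive decimation; a real codec would apply a low-pass filter first, and all names here are assumptions:

```python
def downsample(samples, factor=2):
    """Naive decimation: keep every `factor`-th sample."""
    return samples[::factor]

pcm = [0, 10, 20, 30, 40, 50, 60, 70]
reduced = downsample(pcm)  # half the samples, half the storage
```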
Further, if a plurality of determined target scene audios exist, after the target scene audios are respectively adjusted according to the above process, the audio synthesis is performed on the plurality of adjusted target scene audios to obtain a mixed scene audio. The mixed scene audio is then transmitted to the target user's interactive device.
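The final mixing step might look like the following sketch, summing aligned 16-bit samples across the adjusted target scene audios and clipping to range (all names assumed):

```python
def mix(tracks):
    """Sum aligned samples across tracks, clipping to the 16-bit range."""
    length = max(len(t) for t in tracks)
    mixed = []
    for i in range(length):
        s = sum(t[i] for t in tracks if i < len(t))
        mixed.append(max(-32768, min(32767, s)))
    return mixed

mixed_audio = mix([[1000, 2000, 3000], [500, -500]])
```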
In the above scheme, the target volume at which the audio signal of the scene audio associated with each candidate scene element reaches the target user is determined according to the relative distance between the target user and the candidate scene element in the virtual scene, and the scene audios are then screened according to the target volume, so that the target scene audios the target user can clearly hear at the target user's position can be determined. This is equivalent to simulating, within the virtual scene, the propagation of the target scene audio in a real environment, so the auditory experience of the target user in the virtual scene can be improved. Moreover, in the scheme of the present application, for the target user, only the determined target scene audios are adjusted, and other scene audios need not be adjusted, thereby saving processing resources.
In other embodiments, as shown in fig. 3, prior to step 250, the method further comprises:
step 310, determining a delay time length of the target scene audio relative to the target user according to the relative distance between the target user and the candidate scene element in the virtual scene.
In this embodiment, when the virtual scene simulates real-life propagation of the target scene audio, the delay duration refers to the difference between the time at which the target scene audio of the candidate scene element starts to propagate and the time at which the target user receives it.
In some embodiments, the delay duration may be calculated based on the propagation speed of the target scene audio in the virtual scene and the relative distance between the target user and the candidate scene element. The propagation speed of the target scene audio in the virtual scene is related to the propagation medium through which it travels; for example, if the propagation medium is air, the propagation speed is 343.2 m/s. Specifically, if the propagation speed of a target scene audio in the virtual scene is V, the relative distance between the target user and the candidate scene element is X, and the delay duration of the target scene audio relative to the target user is T, then T = X/V.
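The delay computation T = X/V as a one-line sketch; the 343.2 m/s default assumes propagation through air:

```python
def delay_seconds(relative_distance_m, speed_mps=343.2):
    """T = X / V; the default speed assumes propagation through air."""
    return relative_distance_m / speed_mps

t_delay = delay_seconds(686.4)  # element 686.4 m away through air
```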
In some embodiments, the longer the relative distance between the target user and the candidate scene element, the longer the delay duration.
And step 320, sending the delay duration to the interaction equipment of the target user, so that the interaction equipment of the target user delays to play the target scene audio according to the delay duration after receiving the target scene audio.
In real life, the time at which a target scene audio begins to propagate differs from the time at which human ears hear it. To improve the auditory realism of the user in the virtual scene, the delay duration of the target scene audio relative to the target user is determined from the relative distance between the target user and the candidate scene element in the virtual scene, and the delay duration is then sent to the interaction device of the target user. After receiving the target scene audio, the interaction device can delay playing it according to the delay duration, thereby simulating the propagation of an audio signal in a real environment and improving the user's experience in the virtual scene.
In some embodiments, the target user's interactive device may be any one or more of a 3D presentation system, a large projection system (e.g., VR-Platform CAVE), a head mounted stereoscopic display, an eye tracker.
In some embodiments, prior to step 260, the method further comprises: determining a target scene element located between the candidate scene element and a target user in the virtual scene; and adjusting the tone of the target scene audio according to the attribute information of the target scene element.
The timbre of the sound heard by a receiver is affected by both the sounding body and the transmission medium; sound emitted by the same sounding body has a different timbre to the human ear after traveling through different media. In this embodiment, since the target scene element is a scene element located between the position of the candidate scene element and the position of the target user in the virtual scene, it can be understood that, in a real environment, the target scene element would be the propagation medium of the sound signal from the candidate scene element to the position of the target user. In this embodiment, the timbre of the target scene audio is adjusted according to the target scene element determined to be located between the candidate scene element and the target user in the virtual scene, so as to simulate the propagation of the sound signal in a real environment.
The timbre of the target scene audio changes differently after propagating through different target scene elements; for example, if the target scene element is a wall, the timbre of the target scene audio is adjusted to sound muffled. The attribute information of the target scene element can indicate its type, material, and the like, and the specific timbre adjustment is related to these. Optionally, changing the timbre of the target scene audio may be accomplished by changing its harmonics.
In some embodiments, the tone parameters corresponding to different target scene elements are set in advance according to the attribute information of the target scene elements, after the target scene elements are determined, the tone parameters corresponding to the target scene elements are determined according to the attribute information of the target scene elements, and finally, the tone of the target scene audio is adjusted according to the tone parameters. Optionally, the timbre parameters may include harmonic parameters that the target audio needs to be adjusted, and the timbre of the target scene audio is changed by adjusting harmonics of the target scene audio according to the harmonic parameters.
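A hypothetical sketch of the harmonic-parameter idea: each material maps to per-harmonic gain factors, and attenuating the higher harmonics makes audio that passed through a wall sound muffled. The table values are illustrative assumptions, not real presets:

```python
# Per-harmonic gain factors keyed by material (illustrative values only).
TIMBRE_PARAMS = {
    "wall": [1.0, 0.6, 0.3, 0.1],   # strong high-harmonic attenuation: muffled
    "glass": [1.0, 0.9, 0.8, 0.7],  # mild attenuation
}

def adjust_timbre(harmonic_amps, material):
    gains = TIMBRE_PARAMS[material]
    return [a * g for a, g in zip(harmonic_amps, gains)]

muffled = adjust_timbre([1.0, 1.0, 1.0, 1.0], "wall")
```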
In some embodiments, after step 250, the method further comprises: acquiring an environmental audio associated with a virtual scene; determining whether an audio superposition condition is met according to the volume of the environmental audio and the volume of the adjusted target scene audio; and if the audio frequency superposition condition is met, superposing the environment audio frequency to the target scene audio frequency.
In some embodiments, different virtual scenes correspond to different environmental audio. To make the user's auditory experience in the virtual scene more real, each virtual scene is associated with its own environmental audio; for example, if the virtual scene is a station, the associated environmental audio may include car horns and the like. Optionally, different identifiers may be set for the environmental audio associated with each virtual scene, and the environmental audio associated with the current virtual scene may be acquired accordingly.
In some embodiments, the audio superposition condition may be that the ratio between the volume of the environmental audio and the volume of the adjusted target scene audio is less than a ratio threshold, so that the environmental audio does not mask the target scene audio. For example, if the volume of the environmental audio is 15 dB and the volume of the adjusted target scene audio is 50 dB, the ratio between the two volumes is 3/10; with a ratio threshold of 1/2, it is determined that the audio superposition condition is satisfied.
In other embodiments, the audio superimposition condition may also be that the difference between the volume of the adjusted target scene audio and the volume of the environmental audio is greater than a set volume difference threshold.
In some embodiments, if the audio superposition condition is not satisfied, the adjusted target scene audio is directly sent to the interactive device of the target user.
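The volume figures in the example above (15 dB vs. 50 dB with threshold 1/2) imply that the check passes when the ambient-to-target ratio falls below the threshold; a minimal sketch under that reading, with the threshold value assumed:

```python
def should_superimpose(ambient_db, target_db, ratio_threshold=0.5):
    """True when the ambient audio is quiet enough relative to the target."""
    return (ambient_db / target_db) < ratio_threshold

ok = should_superimpose(15.0, 50.0)    # 3/10 < 1/2 -> superimpose
loud = should_superimpose(40.0, 50.0)  # 8/10 >= 1/2 -> skip
```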
In other embodiments, prior to step 260, the method further comprises: determining the climate type of the virtual scene; and overlaying the environmental audio corresponding to the climate type on the target scene audio.
In real life, environmental sounds differ under different weather conditions. For example, light rain produces a soft patter, while heavy rain produces louder rain sounds that may be accompanied by thunder. To make the user's experience in the virtual scene feel more real, the environmental audio corresponding to the climate type of the current virtual scene can be superposed.
In some embodiments, when the virtual scene is constructed, corresponding climate parameters are set for each climate type, and the climate type of the current virtual scene can be determined by determining the climate parameters corresponding to the virtual scene.
In some embodiments, the environmental audio corresponding to each climate type is preset, after the climate type of the virtual scene is determined, the environmental audio corresponding to the climate type is determined according to the climate type, and then the environmental audio is superimposed on the target scene audio.
In some embodiments, as shown in fig. 4, prior to step 260, the method further comprises:
step 410, according to the position information of the target user in the virtual scene, determining a local virtual scene in the set range of the target user in the virtual scene.
The position information of the target user in the virtual scene is used for indicating the specific position of the target user in the virtual scene currently.
The virtual scene may cover a wide area, while the target user occupies only a small region within it, so the local virtual scene within a set range centered on the target user needs to be determined.
In some embodiments, the set range may be the spatial range occupied by a sphere of radius x centered on the target user; in other embodiments, the set range may be the spatial range occupied by a cuboid of dimensions a × b × c centered on the target user. The set range may be set according to actual needs and is not specifically limited herein.
Step 420, determining a target scene type corresponding to the local virtual scene.
In some embodiments, the target scene type corresponding to the local virtual scene is used to indicate the scene type of the position where the target user is located in the virtual scene. For example, if the target user is located at a corner of a closed room, the local virtual scene is the area around the target user, and the target scene type corresponding to the local virtual scene is a closed room.
In some embodiments, one or more scene models may be pre-constructed, each scene model corresponding to a scene type, and further, a virtual scene may be constructed by the scene models, so that after a local virtual scene is determined, a target scene type of the local virtual scene may be determined according to the scene model corresponding to the local virtual scene. For example, a plurality of scene models such as a classroom, an auditorium, a playground, and a conference room may be constructed in advance, and a virtual scene of a school may be constructed from these scene models. If the scene model corresponding to the local virtual scene is determined to be the scene model of the classroom, the target scene type corresponding to the local virtual scene can be determined to be a room.
In other embodiments, key element information corresponding to each scene type may be preset, where the key element information corresponding to a scene type is used to indicate a scene element included in a local virtual scene belonging to the scene type. On this basis, according to the scene elements included in the local virtual scene, if a scene element in the local virtual scene (assumed to be the local virtual scene P) is one of the scene elements indicated by the key element information corresponding to a scene type (assumed to be the scene type S), it is determined that the target scene type corresponding to the local virtual scene is the scene type S.
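The key-element matching described above might be sketched as follows; the element sets and type names are invented for illustration:

```python
# Key elements per scene type (invented for illustration).
KEY_ELEMENTS = {
    "closed_room": {"wall", "door", "ceiling"},
    "open_space": {"grass", "sky"},
}

def scene_type_of(local_elements):
    """Return the first scene type whose key elements intersect the local ones."""
    for scene_type, keys in KEY_ELEMENTS.items():
        if keys & set(local_elements):
            return scene_type
    return None

stype = scene_type_of(["desk", "wall", "chair"])
```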
And 430, superposing a reverberation effect on the audio frequency of the target scene according to the reverberation parameter information corresponding to the type of the target scene.
The reverberation parameter information may include a reverberation type, a reverberation duration, a decay time, reverberation reflections, and the like. In some embodiments, corresponding reverberation parameters may be set in advance for various scene types, so that after the target scene type is determined, the reverberation parameter information corresponding to it can be obtained directly. The reverberation effect differs between scene types; for example, the reverberation when the target user is in a closed room differs from that when the target user is in an open space. Superposing, on the target scene audio, the reverberation effect corresponding to the reverberation parameter information of the target scene type can improve the user's experience in the virtual scene.
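As an illustration only, a single feedback delay can stand in for the per-scene-type reverberation effect; real reverberation processing is far more elaborate, and the parameter values here are invented:

```python
# Reverberation parameters per scene type (invented values).
REVERB_PARAMS = {
    "closed_room": {"delay": 2, "decay": 0.5},
    "open_space": {"delay": 4, "decay": 0.1},
}

def add_reverb(samples, scene_type):
    """Single feedback delay: each sample feeds back a decayed copy of an earlier one."""
    p = REVERB_PARAMS[scene_type]
    out = list(samples)
    for i in range(p["delay"], len(out)):
        out[i] += p["decay"] * out[i - p["delay"]]
    return out

wet = add_reverb([1.0, 0.0, 0.0, 0.0], "closed_room")
```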
In some embodiments, as shown in fig. 5, after step 250, the method further comprises:
in step 510, in the virtual scene, a reference user whose distance from the target user is less than a distance threshold is determined.
In this embodiment, there may be a plurality of users interacting with the virtual scene, and a user whose distance from the target user is smaller than the distance threshold is referred to as a reference user. For example, a user that is not more than 1 meter away from the target user is used as the reference user. The specific distance threshold may be set according to actual needs, and is not particularly limited herein.
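Step 510 can be sketched as a simple distance filter over the users in the scene (the 2-D positions and names are assumptions):

```python
import math

def reference_users(target_pos, users, threshold=1.0):
    """Users within `threshold` of the target user, by Euclidean distance."""
    return [name for name, pos in users.items()
            if math.dist(target_pos, pos) < threshold]

refs = reference_users((0.0, 0.0), {"u1": (0.5, 0.5), "u2": (3.0, 0.0)})
```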
And step 520, sending the adjusted target scene audio to the interaction equipment of each reference user.
In some embodiments, since the distance between each reference user and the target user is smaller than the distance threshold, the target scene audio heard by each reference user is substantially the same as that heard by the target user; that is, the adjusted target scene audio can be shared with each reference user, and the processing of steps 210 to 250 need not be repeated for each reference user, which reduces the computational load.
In some embodiments, each reference user may select whether to receive the adjusted target scene audio, specifically, before sending the adjusted target scene audio to the interaction device of the reference user, sending a prompt message to the interaction device of the reference user, where the prompt message is used to prompt whether to receive the scene audio, and if the reference user selects to receive, sending the adjusted target scene audio to the interaction device of each reference user.
In some embodiments, as shown in fig. 6, after step 510, the method further comprises:
second relative orientation information of the candidate scene elements with respect to the respective reference users is determined in the virtual scene, step 610.
The second relative direction information indicates a direction of the target scene audio relative to the reference user.
And step 620, determining the sound channel adjustment information of the target scene audio relative to each reference user according to the second relative direction information.
The channel adjustment information is used for indicating a channel of the target scene audio to be adjusted and a volume adjustment amount corresponding to the channel to be adjusted.
Further, the second relative direction information may be combined with the distance between the candidate scene element and each reference user to determine the sound channel adjustment information of the target scene audio relative to each reference user.
Step 630, the channel adjustment information of the target scene audio relative to each reference user is sent to the interaction device of the corresponding reference user, so that the interaction device of the reference user performs channel adjustment on the received target scene audio according to the channel adjustment information after receiving the adjusted target scene audio.
In this case, the reference user and the target user may share the target scene audio after performing the volume adjustment, the tone color adjustment, the reverberation effect increase, and the environment audio superimposition, and for the interaction device of the reference user, only the channel adjustment needs to be performed according to the channel adjustment information.
In some embodiments, a specific implementation manner of the interaction device of each reference user performing the channel adjustment on the received target scene audio according to the channel adjustment information is similar to the specific implementation manner of performing the channel adjustment on the target scene audio according to the first relative direction information, and is not described herein again.
Fig. 7 illustrates an audio processing apparatus in a virtual scene according to an embodiment of the present application. As shown in fig. 7, the audio processing apparatus 700 in the virtual scene includes: a candidate scene element determination module 710, a relative distance determination module 720, a target volume determination module 730, a target scene audio determination module 740, a target scene audio adjustment module 750, and a transmission module 760.
A candidate scene element determining module 710, configured to determine a candidate scene element associated with a scene audio in a virtual scene; a relative distance determination module 720, configured to determine a relative distance between the target user and each candidate scene element in the virtual scene; the target volume determining module 730 is configured to determine, according to the relative distance between the candidate scene element and the target user and the volume information corresponding to the scene audio associated with the candidate scene element, a target volume at which the audio signal corresponding to the corresponding scene audio propagates from the candidate scene element to the position of the target user in the virtual scene; a target scene audio determining module 740, configured to determine, among scene audios associated with the candidate scene elements, a target scene audio whose corresponding target volume exceeds a volume threshold; a target scene audio adjusting module 750, configured to adjust the target scene audio according to the relative orientation information between the candidate scene element corresponding to the target scene audio and the target user; and a transmission module 760, configured to send the adjusted target scene audio to the interaction device of the target user.
In some embodiments, the relative orientation information includes a relative distance between the target user and the candidate scene element in the virtual scene; in this embodiment, the target scene audio adjusting module 750 includes an adjusting volume determining unit, configured to determine an adjusting volume corresponding to a target scene audio according to a relative distance between a target user and a candidate scene element in a virtual scene; and the volume adjusting unit is used for adjusting the volume of the target scene audio according to the adjusted volume.
In some embodiments, the target scene audio adjustment module 750 further comprises: the delay duration determining unit is used for determining the delay duration of the target scene audio relative to the target user according to the relative distance between the target user and the candidate scene element in the virtual scene; and the delay duration sending unit is used for sending the delay duration to the interactive equipment of the target user so as to delay the playing of the target scene audio according to the delay duration after the interactive equipment of the target user receives the target scene audio.
In further embodiments, the relative orientation information comprises first relative orientation information of the candidate scene element relative to the target user in the virtual scene; in this embodiment, the target scene audio adjusting module 750 includes a sound channel adjusting unit, configured to perform sound channel adjustment on the target scene audio according to the first relative direction information.
In some embodiments, the audio processing apparatus 700 in the virtual scene further comprises: a target scene element determination module for determining a target scene element located between the candidate scene element and the target user in the virtual scene; and the tone color adjusting module is used for adjusting the tone color of the target scene audio according to the attribute information of the target scene element.
In some implementations, the audio processing device 700 in the virtual scene further includes: the climate type determining module is used for determining the climate type of the virtual scene; and the audio superposition module is used for superposing the environmental audio corresponding to the climate type on the audio of the target scene.
In some embodiments, the audio processing apparatus 700 in the virtual scene further comprises: the local virtual scene determining module is used for determining a local virtual scene in a set range where a target user is located in the virtual scene according to the position information of the target user in the virtual scene; the target scene type determining module is used for determining a target scene type corresponding to the local virtual scene; and the reverberation superposition module is used for superposing a reverberation effect on the target scene audio according to the reverberation parameter information corresponding to the target scene type.
In some embodiments, the audio processing apparatus 700 in the virtual scene further comprises: the reference user determining module is used for determining a reference user of which the distance to the target user is less than a distance threshold value in the virtual scene; and the audio sending module is used for sending the adjusted target scene audio to the interactive equipment of each reference user.
In some embodiments, the audio processing apparatus 700 in the virtual scene further comprises: the second relative direction information determining module is used for determining second relative direction information of the candidate scene elements relative to each reference user in the virtual scene; the sound channel adjustment information determining module is used for determining the sound channel adjustment information of the target scene audio relative to each reference user according to the second relative direction information; and the sound channel adjustment information sending module is used for sending the sound channel adjustment information of the target scene audio relative to each reference user to the interaction device of the corresponding reference user, so that the interaction device of the reference user performs sound channel adjustment on the received target scene audio according to the sound channel adjustment information after receiving the adjusted target scene audio.
According to an aspect of the present application, there is also provided an electronic device, including: a processor; a memory having computer readable instructions stored thereon which, when executed by the processor, implement the method of any of the above embodiments.
According to an aspect of an embodiment of the present application, there is provided a computer program product or a computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and executes the computer instructions, so that the computer device executes the method in any of the above embodiments.
According to an aspect of the embodiments of the present application, there is also provided an electronic device. As shown in fig. 8, the electronic device 800 includes a processor 810 and one or more memories 820, the one or more memories 820 being used to store program instructions executed by the processor 810, and the processor 810 implements the audio processing method in a virtual scene described above when executing the program instructions.
Further, the processor 810 may include one or more processing cores. The processor 810 runs the instructions, programs, code sets, or instruction sets stored in the memory 820 and invokes the data stored in the memory 820. Alternatively, the processor 810 may be implemented in hardware using at least one of a Digital Signal Processor (DSP), a Field-Programmable Gate Array (FPGA), and a Programmable Logic Array (PLA). The processor 810 may integrate one or more of a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), a modem, and the like. The CPU mainly handles the operating system, the user interface, application programs, and the like; the GPU is responsible for rendering and drawing display content; and the modem handles wireless communication. It is to be understood that the modem may also be implemented by a separate communication chip without being integrated into the processor.
According to an aspect of the present application, there is also provided a computer-readable storage medium, which may be contained in the electronic device described in the above embodiments; or may exist separately without being assembled into the electronic device. The computer-readable storage medium carries computer-readable instructions that, when executed by a processor, implement the method of any of the embodiments described above.
It should be noted that the computer readable media shown in the embodiments of the present application may be computer readable signal media or computer readable storage media or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a Read-Only Memory (ROM), an Erasable Programmable Read-Only Memory (EPROM), a flash Memory, an optical fiber, a portable Compact Disc Read-Only Memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this application, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In this application, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wired, etc., or any suitable combination of the foregoing.
The units described in the embodiments of the present application may be implemented by software or hardware, and the described units may also be disposed in a processor. The names of these units do not limit the units themselves in any way.
It should be noted that although several modules or units of the device for action execution are mentioned in the above detailed description, such a division is not mandatory. Indeed, according to embodiments of the application, the features and functions of two or more modules or units described above may be embodied in one module or unit. Conversely, the features and functions of one module or unit described above may be further divided so as to be embodied by a plurality of modules or units.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. Each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
Other embodiments of the present application will be apparent to those skilled in the art from consideration of the specification and practice of the embodiments disclosed herein. This application is intended to cover any variations, uses, or adaptations that follow the general principles of the application and include such departures from the present disclosure as come within known or customary practice in the art.
It will be understood that the present application is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the application is limited only by the appended claims.

Claims (13)

1. A method for audio processing in a virtual scene, comprising:
determining candidate scene elements in the virtual scene which are associated with scene audio;
determining a relative distance between a target user and each of the candidate scene elements in the virtual scene;
determining, according to the relative distance between the candidate scene element and the target user and volume information corresponding to the scene audio associated with the candidate scene element, a target volume at which an audio signal corresponding to the respective scene audio propagates from the candidate scene element to the position of the target user in the virtual scene;
determining, from the scene audio associated with the candidate scene elements, target scene audio whose corresponding target volume exceeds a volume threshold;
adjusting the target scene audio according to the relative orientation information between the candidate scene element corresponding to the target scene audio and the target user;
and sending the adjusted target scene audio to the interaction equipment of the target user.
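Read as an algorithm, the selection steps of claim 1 can be sketched as follows (Python; the inverse-distance attenuation law and the element/volume data layout are illustrative assumptions — the claim only requires the target volume to be derived from the relative distance and the source volume information):

```python
import math

def target_volume(source_volume, distance, min_distance=1.0):
    # Inverse-distance attenuation: an assumed propagation model; the
    # claim does not fix a particular attenuation law.
    return source_volume / max(distance, min_distance)

def select_target_audio(elements, user_pos, volume_threshold):
    # Keep only scene audio whose propagated volume at the target
    # user's position exceeds the volume threshold.
    selected = []
    for elem in elements:
        d = math.dist(elem["pos"], user_pos)
        v = target_volume(elem["volume"], d)
        if v > volume_threshold:
            selected.append((elem["audio"], v))
    return selected
```

A source five units away with source volume 10 yields a target volume of 2 under this model, so it survives a threshold of 1, while a distant quiet source is culled before any further per-user adjustment.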
2. The method of claim 1, wherein the relative orientation information comprises a relative distance between the target user and the candidate scene element in the virtual scene; the adjusting the target scene audio according to the relative orientation information between the candidate scene element corresponding to the target scene audio and the target user includes:
determining the corresponding adjusted volume of the target scene audio according to the relative distance between the target user and the candidate scene element in the virtual scene;
and adjusting the volume of the target scene audio according to the adjusted volume.
3. The method of claim 2, wherein after adjusting the volume of the target scene audio according to the adjusted volume, the method further comprises:
acquiring an environmental audio associated with a virtual scene;
determining whether an audio superposition condition is met according to the volume of the environment audio and the adjusted volume of the target scene audio;
and if the audio superposition condition is met, superposing the environmental audio into the target scene audio.
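Assuming volumes are scalar levels and audio frames are equal-length sample lists, the superposition step of claim 3 might look like this (the ratio-based condition and the 0.5 ambient gain are illustrative choices, not specified by the claim):

```python
def should_superpose(ambient_volume, adjusted_volume, ratio_threshold=0.2):
    # Assumed superposition condition: the ambient audio is mixed in
    # only if it would still be audible next to the scene audio.
    return ambient_volume >= ratio_threshold * adjusted_volume

def superpose(scene_samples, ambient_samples, ambient_gain=0.5):
    # Mix ambient samples into scene samples (equal-length mono
    # frames assumed; real audio would use arrays and clipping).
    return [s + ambient_gain * a for s, a in zip(scene_samples, ambient_samples)]
```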
4. The method according to claim 1 or 2, characterized in that the method further comprises:
determining a delay duration of the target scene audio relative to the target user according to a relative distance between the target user and the candidate scene element in the virtual scene;
and sending the delay time length to the interaction equipment of the target user, so that the interaction equipment of the target user can delay playing the target scene audio according to the delay time length after receiving the target scene audio.
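The delay duration of claim 4 can be derived directly from the in-scene distance if sound is assumed to travel at the speed of sound in air (an assumption; the claim does not fix the propagation model):

```python
SPEED_OF_SOUND = 343.0  # metres per second in air; assumed medium

def delay_duration(distance_m):
    # Time for the scene audio to "travel" from the candidate scene
    # element to the target user's position in the virtual scene.
    return distance_m / SPEED_OF_SOUND
```

An element 343 metres away would thus be played back with a one-second delay by the target user's interaction device.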
5. The method of claim 1, wherein the relative orientation information comprises first relative orientation information of the candidate scene element with respect to the target user in the virtual scene; the adjusting the target scene audio according to the relative orientation information between the candidate scene element corresponding to the target scene audio and the target user includes:
and performing channel adjustment on the target scene audio according to the first relative orientation information.
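One common way to realize the channel adjustment of claim 5 is constant-power stereo panning driven by the element's azimuth relative to the user; the panning law below is an assumed example, not mandated by the claim:

```python
import math

def channel_gains(azimuth_deg):
    # Constant-power panning: 0 deg = directly ahead, +90 deg = fully
    # right. Left/right gains satisfy l**2 + r**2 == 1.
    pan = max(-1.0, min(1.0, azimuth_deg / 90.0))
    angle = (pan + 1.0) * math.pi / 4.0   # maps [-1, 1] to [0, pi/2]
    return math.cos(angle), math.sin(angle)  # (left, right)
```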
6. The method of claim 1, wherein before sending the adjusted target scene audio to the interaction device of the target user, the method further comprises:
determining a target scene element in the virtual scene that is located between the candidate scene element and the target user;
and adjusting the tone of the target scene audio according to the attribute information of the target scene element.
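A simple stand-in for the timbre adjustment of claim 6 is a one-pole low-pass filter whose coefficient is derived from the occluding element's attribute information; the `absorption` attribute in [0, 1) is hypothetical, chosen only to illustrate the idea:

```python
def occlusion_lowpass(samples, absorption):
    # One-pole low-pass: higher absorption of the occluding target
    # scene element damps high frequencies more strongly. The mapping
    # from attribute info to a filter coefficient is an assumption.
    out, prev = [], 0.0
    for s in samples:
        prev = absorption * prev + (1.0 - absorption) * s
        out.append(prev)
    return out
```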
7. The method of claim 1, wherein before sending the adjusted target scene audio to the interaction device of the target user, the method further comprises:
determining a climate type of the virtual scene;
and superposing the environmental audio corresponding to the climate type on the target scene audio.
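The climate-dependent environmental audio of claim 7 reduces to a lookup from climate type to an ambient clip; the climate names and file identifiers below are assumptions for the sketch:

```python
# Hypothetical mapping from climate type to an ambient audio clip id;
# neither the names nor the files come from the patent.
CLIMATE_AMBIENT = {
    "rain": "ambient_rain.ogg",
    "snow": "ambient_wind.ogg",
    "clear": None,  # no ambient overlay
}

def ambient_for_climate(climate_type):
    # Returns the clip to superpose, or None when nothing applies.
    return CLIMATE_AMBIENT.get(climate_type)
```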
8. The method of claim 1, wherein before sending the adjusted target scene audio to the interaction device of the target user, the method further comprises:
determining a local virtual scene within a set range of the target user in the virtual scene according to the position information of the target user in the virtual scene;
determining a target scene type corresponding to the local virtual scene;
and superposing a reverberation effect on the target scene audio according to the reverberation parameter information corresponding to the target scene type.
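The reverberation step of claim 8 likewise reduces to a parameter lookup keyed by the target scene type; the scene-type names and the RT60/wet values below are illustrative assumptions:

```python
# Hypothetical reverberation parameter info per scene type; the values
# are plausible placeholders, not figures from the patent.
REVERB_PARAMS = {
    "cave":   {"rt60": 2.5, "wet": 0.6},
    "forest": {"rt60": 0.4, "wet": 0.1},
    "indoor": {"rt60": 0.8, "wet": 0.3},
}

def reverb_for_scene(scene_type, default_type="indoor"):
    # Parameters then drive whatever reverb engine renders the effect.
    return REVERB_PARAMS.get(scene_type, REVERB_PARAMS[default_type])
```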
9. The method of claim 1, wherein after adjusting the target scene audio according to the relative orientation information between the candidate scene element corresponding to the target scene audio and the target user, the method further comprises:
in the virtual scene, determining a reference user with a distance to the target user smaller than a distance threshold;
and sending the adjusted target scene audio to the interaction equipment of each reference user.
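The reference-user selection of claim 9 is a distance-threshold filter; a minimal sketch, assuming 2-D positions stored per user:

```python
import math

def nearby_reference_users(users, target_pos, distance_threshold):
    # Reference users whose in-scene distance to the target user is
    # below the threshold; they also receive the adjusted audio.
    return [u for u in users
            if math.dist(u["pos"], target_pos) < distance_threshold]
```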
10. The method of claim 9, wherein after determining, in the virtual scene, a reference user whose distance to the target user is smaller than the distance threshold, the method further comprises:
determining second relative orientation information of the candidate scene elements with respect to the reference users in the virtual scene;
determining channel adjustment information of the target scene audio relative to each reference user according to the second relative orientation information;
and transmitting the channel adjustment information of the target scene audio relative to each reference user to the interaction device of the corresponding reference user, so that, after receiving the adjusted target scene audio, the interaction device of the reference user performs channel adjustment on the received target scene audio according to the channel adjustment information.
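The per-user channel adjustment information of claim 10 could be encoded, for example, as the element's azimuth relative to each reference user's facing direction; the coordinate and angle conventions below (0 degrees = the user's facing direction, positive to the right) are assumptions:

```python
import math

def channel_adjustment_info(element_pos, user_pos, user_facing_deg):
    # Second relative orientation info for one reference user: the
    # candidate scene element's azimuth, normalized to (-180, 180].
    dx = element_pos[0] - user_pos[0]
    dy = element_pos[1] - user_pos[1]
    bearing = math.degrees(math.atan2(dx, dy))  # 0 deg = +y axis
    azimuth = (bearing - user_facing_deg + 180.0) % 360.0 - 180.0
    return {"azimuth_deg": azimuth}
```

Each reference user's interaction device would then feed this azimuth into its local panning stage, so the same target scene audio is spatialized differently per user.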
11. An apparatus for audio processing in a virtual scene, the apparatus comprising:
a candidate scene element determination module, configured to determine candidate scene elements associated with scene audio in the virtual scene;
a relative distance determination module for determining a relative distance between a target user and each of the candidate scene elements in the virtual scene;
a target volume determining module, configured to determine, according to a relative distance between the candidate scene element and the target user and volume information corresponding to a scene audio associated with the candidate scene element, a target volume at which an audio signal corresponding to the corresponding scene audio propagates from the candidate scene element to a position of the target user in the virtual scene;
the target scene audio determining module is used for determining the target scene audio of which the corresponding target volume exceeds a volume threshold value in the scene audio associated with the candidate scene element;
the target scene audio adjusting module is used for adjusting the target scene audio according to the relative orientation information between the candidate scene element corresponding to the target scene audio and the target user;
and the transmission module is used for transmitting the adjusted target scene audio to the interactive equipment of the target user.
12. An electronic device, characterized in that the electronic device comprises:
a processor;
a memory having stored thereon computer readable instructions which, when executed by the processor, implement the method of any one of claims 1 to 10.
13. A computer-readable storage medium, having stored thereon program code that can be invoked by a processor to perform the method according to any one of claims 1 to 10.
CN202210551897.1A 2022-05-18 2022-05-18 Audio processing method and device in virtual scene, electronic equipment and storage medium Pending CN115103292A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210551897.1A CN115103292A (en) 2022-05-18 2022-05-18 Audio processing method and device in virtual scene, electronic equipment and storage medium


Publications (1)

Publication Number Publication Date
CN115103292A true CN115103292A (en) 2022-09-23

Family

ID=83288624

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210551897.1A Pending CN115103292A (en) 2022-05-18 2022-05-18 Audio processing method and device in virtual scene, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN115103292A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115798295A (en) * 2022-11-30 2023-03-14 深圳市声扬科技有限公司 Driving test simulation method and device, electronic equipment and storage medium


Similar Documents

Publication Publication Date Title
Serafin et al. Sonic interactions in virtual reality: State of the art, current challenges, and future directions
CN112567767B (en) Spatial audio for interactive audio environments
US10979842B2 (en) Methods and systems for providing a composite audio stream for an extended reality world
EP3687190B1 (en) Mapping virtual sound sources to physical speakers in extended reality applications
CN111095952B (en) 3D audio rendering using volumetric audio rendering and scripted audio detail levels
US10757528B1 (en) Methods and systems for simulating spatially-varying acoustics of an extended reality world
CN111385728B (en) Audio signal processing method and device
US11082796B2 (en) Methods and systems for generating audio for an extended reality world
Murphy et al. Spatial sound for computer games and virtual reality
CN114035764A (en) Three-dimensional sound effect simulation method, device, equipment and storage medium
Beig et al. An introduction to spatial sound rendering in virtual environments and games
CN116390016A (en) Sound effect control method and device for virtual scene, computer equipment and storage medium
CN115103292A (en) Audio processing method and device in virtual scene, electronic equipment and storage medium
Schissler et al. Interactive sound rendering on mobile devices using ray-parameterized reverberation filters
Chandak Efficient geometric sound propagation using visibility culling
CN112162638B (en) Information processing method and server in Virtual Reality (VR) viewing
Schissler Efficient Interactive Sound Propagation in Dynamic Environments
US20240267690A1 (en) Audio rendering system and method
WO2024014390A1 (en) Acoustic signal processing method, information generation method, computer program and acoustic signal processing device
CN116421971A (en) Method and device for generating spatial audio signal, storage medium and electronic equipment
Gutiérrez A et al. Audition
CN115954010A (en) Sound effect processing method and device and electronic equipment
Kamizono et al. A Platform for Evaluation of Synthetic Reflected Sounds on 3D Sound Localization
Rungta Perceptually Driven Interactive Sound Propagation for Virtual Environments
Johnson et al. Taking advantage of geometric acoustics modeling using metadata

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination