CN114049871A - Audio processing method and device based on virtual space and computer equipment


Info

Publication number
CN114049871A
CN114049871A
Authority
CN
China
Prior art keywords: audio, attention, sound source, target, interest
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210036645.5A
Other languages
Chinese (zh)
Inventor
Liang Junbin (梁俊斌)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN202210036645.5A
Publication of CN114049871A
Legal status: Pending


Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10K: SOUND-PRODUCING DEVICES; METHODS OR DEVICES FOR PROTECTING AGAINST, OR FOR DAMPING, NOISE OR OTHER ACOUSTIC WAVES IN GENERAL; ACOUSTICS NOT OTHERWISE PROVIDED FOR
    • G10K 15/00: Acoustics not otherwise provided for
    • G10K 15/08: Arrangements for producing a reverberation or echo sound

Landscapes

  • Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Stereophonic System (AREA)

Abstract

The application relates to a virtual space-based audio processing method and apparatus and to computer equipment. The method comprises the following steps: in response to an interactive operation on the virtual space, determining an attention sound source (i.e., the sound source the user is interested in) in the virtual space, and treating the sound sources other than the attention sound source among all sound sources in the virtual space as non-attention sound sources; acquiring a target attention audio corresponding to the attention sound source and a target non-attention audio corresponding to the non-attention sound sources; performing audio adjustment processing on at least one of the target attention audio and the target non-attention audio to obtain a first intermediate audio of the attention sound source and a second intermediate audio of the non-attention sound sources; and mixing the first intermediate audio and the second intermediate audio to obtain a mixing processing result. Because the audio adjustment processing can be applied to at least one of the two audios, the sound-effect discrimination of the attention sound source is made greater than that of the non-attention sound sources, and since the attention sound source is the sound the user is interested in, the user can hear the sound of interest more easily.

Description

Audio processing method and device based on virtual space and computer equipment
Technical Field
The present application relates to the field of audio processing technologies, and in particular, to an audio processing method and apparatus based on a virtual space, and a computer device.
Background
A spatial sound effect is produced through audio processing techniques so that the user hears sound with a stronger sense of stereo imaging and spatial layering: the auditory scene of a real environment is reproduced through headphones or a combination of two or more loudspeakers, letting the listener clearly recognize the direction, distance, and movement trajectory of different acoustic objects, feel enveloped by the sound from all directions, and experience the immersive auditory impression of being in the actual environment. To give users a better sound-effect experience, how the audio is processed is the key.
In the related art, audio processing is mainly realized by virtual stereo reconstruction. Specifically, the volume of each sound source is set based on the distance between that sound source and the user, i.e., distance-sensing processing: distant sound sources are given a lower volume, and nearby sound sources a higher volume. This emphasizes the sense of distance the user perceives for different sound sources, but makes it hard for the user to pick out the sounds they are interested in, and in a noisy environment it is even harder to hear them clearly.
Disclosure of Invention
In view of the foregoing, there is a need to provide a virtual space-based audio processing method, apparatus and computer device that enable a user to hear sounds of interest more easily.
A method of virtual space-based audio processing, the method comprising:
in response to an interactive operation on the virtual space, determining an attention sound source in the virtual space, and treating the sound sources other than the attention sound source among all sound sources in the virtual space as non-attention sound sources;
acquiring a target attention audio corresponding to the attention sound source and a target non-attention audio corresponding to the non-attention sound source;
performing audio adjustment processing on at least one of the target attention audio and the target non-attention audio to obtain a first intermediate audio of the attention sound source and a second intermediate audio of the non-attention sound source, wherein the sound-effect discrimination of the first intermediate audio is greater than that of the second intermediate audio;
and mixing the first intermediate audio and the second intermediate audio to obtain a mixing processing result.
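For orientation, the four steps admit a compact sketch (a minimal illustration in Python; the function name, the data layout, and the plain gain values are assumptions rather than the claimed processing):

```python
import numpy as np

def process_audio(sources: dict[str, np.ndarray],
                  attention_ids: set[str],
                  gain_att: float = 2.0,
                  gain_non: float = 0.5) -> np.ndarray:
    """Hypothetical end-to-end sketch of the four claimed steps.

    sources       maps a sound-source id to its mono sample buffer
                  (all buffers share one length and sample rate);
    attention_ids holds the ids the interactive operation selected.
    """
    n = len(next(iter(sources.values())))
    zero = np.zeros(n)

    # Step 1: split all sources into attention / non-attention sets.
    att = [s for sid, s in sources.items() if sid in attention_ids]
    non = [s for sid, s in sources.items() if sid not in attention_ids]

    # Step 2: acquire the target attention / non-attention audio
    # (premixed here by direct addition).
    target_att = np.sum(att, axis=0) if att else zero
    target_non = np.sum(non, axis=0) if non else zero

    # Step 3: adjust at least one of the two so the attention audio
    # ends up with the greater sound-effect discrimination
    # (a plain gain stands in for the adjustment processing).
    first_intermediate = gain_att * target_att
    second_intermediate = gain_non * target_non

    # Step 4: mix the two intermediate audios into one result.
    return first_intermediate + second_intermediate
```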
An apparatus for audio processing based on a virtual space, the apparatus comprising:
a determination module configured to determine, in response to an interactive operation on the virtual space, an attention sound source in the virtual space, and to treat the sound sources other than the attention sound source among all sound sources in the virtual space as non-attention sound sources;
an acquisition module configured to acquire a target attention audio corresponding to the attention sound source and a target non-attention audio corresponding to the non-attention sound source;
an audio adjustment processing module configured to perform audio adjustment processing on at least one of the target attention audio and the target non-attention audio to obtain a first intermediate audio of the attention sound source and a second intermediate audio of the non-attention sound source, wherein the sound-effect discrimination of the first intermediate audio is greater than that of the second intermediate audio;
and a mixing processing module configured to mix the first intermediate audio and the second intermediate audio to obtain a mixing processing result.
In one embodiment, the determination module is configured to determine, in response to the interactive operation on the virtual space, a region of interest in the virtual space to which the interactive operation is directed, and to take a sound source in the region of interest as the attention sound source.
In one embodiment, the acquisition module is configured to mix the attention audios respectively emitted by a plurality of attention sound sources to obtain the target attention audio, and to mix the non-attention audios respectively emitted by a plurality of non-attention sound sources to obtain the target non-attention audio.
In one embodiment, the audio adjustment processing module is configured to perform signal enhancement processing on the target attention audio to obtain the first intermediate audio corresponding to the attention sound source, and to perform signal attenuation processing on the target non-attention audio to obtain the second intermediate audio corresponding to the non-attention sound source.
In one embodiment, the audio adjustment processing module includes:
a first determining unit configured to determine the audio to be adjusted from the target attention audio and the target non-attention audio;
a second determining unit configured to determine the adjustment parameters corresponding to the respective sampling instants of the audio to be adjusted;
and a third determining unit configured to perform audio adjustment processing on the audio to be adjusted based on the adjustment parameters corresponding to the respective sampling instants, and to determine the first intermediate audio of the attention sound source and the second intermediate audio of the non-attention sound source based on the audio adjustment result.
In one embodiment, the second determining unit is configured to determine the target time period within which each sampling instant of the audio to be adjusted falls, where the target time period is determined by a sound-source switching process, i.e., switching between attention sound source and non-attention sound source; and to obtain the calculation result of the adjustment function corresponding to the target time period in which each sampling instant falls as the adjustment parameter corresponding to that sampling instant, wherein the value of an argument in the adjustment function is determined based on the sampling instant.
In one embodiment, the adjustment function includes an adjustment threshold, and the adjustment threshold is determined by the distance between the corresponding sound source and a virtual operation object in the virtual space, where the virtual operation object is the mapping, in the virtual space, of the target object that triggers the interactive operation.
In one embodiment, if the audio to be adjusted is the target attention audio, the target time period is one of a cut-in attention time period, a continuous attention time period, and an exit attention time period; the start time of the cut-in attention time period is determined based on the time at which the sound source is determined to be an attention sound source, the start time of the continuous attention time period is determined based on the end time of the cut-in attention time period, and the start time of the exit attention time period is determined based on the time at which the attention sound source is switched to a non-attention sound source, or based on a cancel-attention instruction.
In one embodiment, the adjustment function corresponding to the cut-in attention time period is a monotonically increasing function, the adjustment function corresponding to the continuous attention time period is a constant function, and the adjustment function corresponding to the exit attention time period is a monotonically decreasing function.
In one embodiment, the magnitude of the gradient of each of the monotonically increasing function and the monotonically decreasing function gradually decreases as the argument increases.
In one embodiment, if the audio to be adjusted is the target non-attention audio, the target time period is one of a cut-in non-attention time period, a continuous non-attention time period, and an exit non-attention time period; the start time of the cut-in non-attention time period is determined based on the time at which the sound source is determined to be a non-attention sound source, the start time of the continuous non-attention time period is determined based on the end time of the cut-in non-attention time period, and the start time of the exit non-attention time period is determined based on the time at which the non-attention sound source is switched to an attention sound source, or based on an attention instruction.
In one embodiment, the adjustment function corresponding to the cut-in non-attention time period is a monotonically decreasing function, the adjustment function corresponding to the continuous non-attention time period is a constant function, and the adjustment function corresponding to the exit non-attention time period is a monotonically increasing function.
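As one concrete shape consistent with the above embodiments (the exponential form is an illustrative assumption; any functions with the stated monotonicity and decaying gradients would serve), the adjustment function for the target attention audio over its three time periods could be written as:

$$
g(t)=\begin{cases}
1+(G-1)\left(1-e^{-(t-t_0)/\tau}\right), & t_0\le t<t_1 \quad\text{(cut-in attention time period)}\\
g(t_1), & t_1\le t<t_2 \quad\text{(continuous attention time period)}\\
1+\left(g(t_1)-1\right)e^{-(t-t_2)/\tau}, & t\ge t_2 \quad\text{(exit attention time period)}
\end{cases}
$$

where $t_0$, $t_1$, $t_2$ are the start times of the three periods, $G>1$ plays the role of the adjustment threshold (which, per the embodiment above, may itself depend on the distance between the sound source and the virtual operation object), and $\tau$ controls the transition speed. The cut-in branch increases monotonically with a gradient that decays, the exit branch decreases monotonically with a gradient whose magnitude decays, and the mirrored form with $G<1$ would serve the target non-attention audio.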
In one embodiment, the third determining unit is configured to, if the target attention audio is an audio to be adjusted, take the audio adjustment result corresponding to the target attention audio as the first intermediate audio of the attention sound source, and otherwise take the target attention audio directly as the first intermediate audio; and, if the target non-attention audio is an audio to be adjusted, take the audio adjustment result corresponding to the target non-attention audio as the second intermediate audio of the non-attention sound source, and otherwise take the target non-attention audio directly as the second intermediate audio.
In one embodiment, each sound source in the virtual space corresponds to a plurality of sound channels; the apparatus also includes a stereo reconstruction module; the stereo reconstruction module is used for acquiring the sound mixing processing results corresponding to the multiple sound channels, performing stereo reconstruction based on the sound mixing processing results of the multiple sound channels, and outputting the reconstructed stereo in a virtual space.
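A minimal sketch of the multi-channel assembly for the stereo case follows (names assumed; any HRTF rendering is presumed to have happened upstream in the per-channel processing):

```python
import numpy as np

def reconstruct_stereo(mix_left: np.ndarray, mix_right: np.ndarray) -> np.ndarray:
    # Stack the per-channel mixing processing results into one
    # (2, n) buffer, ready to be output as stereo in the virtual space.
    return np.stack([mix_left, mix_right])
```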
In one embodiment, the interactive operation is captured by an interactive device, and the operation type of the interactive operation includes at least one of a sensory pointing type, an awareness pointing type, and a limb pointing type.
A computer device comprising a memory and a processor, the memory storing a computer program, the processor implementing the following steps when executing the computer program:
in response to an interactive operation on the virtual space, determining an attention sound source in the virtual space, and treating the sound sources other than the attention sound source among all sound sources in the virtual space as non-attention sound sources;
acquiring a target attention audio corresponding to the attention sound source and a target non-attention audio corresponding to the non-attention sound source;
performing audio adjustment processing on at least one of the target attention audio and the target non-attention audio to obtain a first intermediate audio of the attention sound source and a second intermediate audio of the non-attention sound source, wherein the sound-effect discrimination of the first intermediate audio is greater than that of the second intermediate audio;
and mixing the first intermediate audio and the second intermediate audio to obtain a mixing processing result.
A computer-readable storage medium, on which a computer program is stored which, when executed by a processor, carries out the steps of:
in response to an interactive operation on the virtual space, determining an attention sound source in the virtual space, and treating the sound sources other than the attention sound source among all sound sources in the virtual space as non-attention sound sources;
acquiring a target attention audio corresponding to the attention sound source and a target non-attention audio corresponding to the non-attention sound source;
performing audio adjustment processing on at least one of the target attention audio and the target non-attention audio to obtain a first intermediate audio of the attention sound source and a second intermediate audio of the non-attention sound source, wherein the sound-effect discrimination of the first intermediate audio is greater than that of the second intermediate audio;
and mixing the first intermediate audio and the second intermediate audio to obtain a mixing processing result.
A computer program product or computer program, the computer program product or computer program comprising computer instructions stored in a computer readable storage medium, the computer instructions being read by a processor of a computer device from the computer readable storage medium, the processor executing the computer instructions to cause the computer device to perform the steps of:
in response to an interactive operation on the virtual space, determining an attention sound source in the virtual space, and treating the sound sources other than the attention sound source among all sound sources in the virtual space as non-attention sound sources;
acquiring a target attention audio corresponding to the attention sound source and a target non-attention audio corresponding to the non-attention sound source;
performing audio adjustment processing on at least one of the target attention audio and the target non-attention audio to obtain a first intermediate audio of the attention sound source and a second intermediate audio of the non-attention sound source, wherein the sound-effect discrimination of the first intermediate audio is greater than that of the second intermediate audio;
and mixing the first intermediate audio and the second intermediate audio to obtain a mixing processing result.
According to the above virtual space-based audio processing method and apparatus, computer device, storage medium, and computer program, because audio adjustment processing can be performed on at least one of the audio corresponding to the attention sound source and the audio corresponding to the non-attention sound sources in the virtual space, the sound-effect discrimination of the audio corresponding to the attention sound source is made greater than that of the audio corresponding to the non-attention sound sources; since the attention sound source is the sound the user is interested in, the user can hear the sound of interest more easily. In addition, in a noisy environment, the user can still effectively distinguish the sounds of interest.
Drawings
Fig. 1 is a flow chart of a stereo reconstruction method in the related art;
FIG. 2 is a diagram of an exemplary application environment for a virtual space-based audio processing method;
FIG. 3 is a flow diagram of a method for virtual space-based audio processing according to one embodiment;
FIG. 4 is a diagram illustrating a simulated field of view region covered in virtual space in one embodiment;
FIG. 5 is a schematic diagram of sound sources located within an area of interest in one embodiment;
FIG. 6 is a diagram illustrating pointing of a pointing motion to a virtual operation object in virtual space, according to one embodiment;
FIG. 7 is a diagram illustrating a virtual object being passively used as a sound source of interest in one embodiment;
FIG. 8 is a schematic diagram illustrating the determination of a sound source of interest and a sound source of non-interest in a virtual space according to an embodiment;
FIG. 9 is a graphical illustration of an adjustment function corresponding to a target audio of interest in one embodiment;
FIG. 10 is a graphical illustration of an adjustment function for a target non-attention audio in one embodiment;
FIG. 11 is a flowchart illustrating a virtual space-based audio processing method according to another embodiment;
FIG. 12 is a block diagram of an embodiment of a virtual space-based audio processing apparatus;
FIG. 13 is a diagram illustrating an internal structure of a computer device according to an embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
First, terms related to embodiments of the present application will be briefly explained:
the meta universe: the digital living space is a virtual world which is linked and created by utilizing scientific and technological means, is mapped and interacted with the real world, and is provided with a novel social system. The system integrates the novel virtual-real integrated Internet application and social form generated by multiple new technologies, provides immersive experience based on an augmented reality technology, generates a mirror image of a real world based on a digital twin technology, builds an economic system based on a block chain technology, integrates the virtual world and the real world closely on the economic system, a social system and an identity system, and allows each user to perform content production and world editing.
Spatial sound effect: sound processed by audio techniques so that the user hears it with a stronger sense of stereo imaging and spatial layering; the auditory scene of a real environment is reproduced through headphones or a combination of two or more loudspeakers, so that the listener can clearly recognize the direction, distance, and movement trajectory of different acoustic objects, feel enveloped by the sound from all directions, and experience the immersive auditory impression of being in the actual environment.
The metaverse needs to integrate artificial-intelligence virtual perception technologies in audio, video, and other sensory channels to construct a computer virtual space that approaches real-world perception, so that an experiencer, with the help of hardware devices (headphones, glasses, and somatosensory equipment), can have sensory experiences indistinguishable from the real world. The virtual-space sound effect is an important part of this: it restores the binaural sound signals of a real environment, so that an experiencer wearing headphones perceives the stereo sound-effect experience of that environment, for example, the voices, laughter, and footsteps of different people in different directions, the engine sound of a vehicle approaching from afar, pedestrian warning tones, and the sounds of wind and rain.
However, generating virtual stereo is computationally very expensive. To restore a real-world experience, the virtual space needs to use HRTF (Head Related Transfer Function) virtual stereo reconstruction to render sound sources at different orientations and mix them into the binaural signals delivered to the experiencer's ears. Since a large number of sound sources require simultaneous HRTF stereo reconstruction, the computational overhead is huge, which poses a great challenge to a real-time audio experience.
In the related art, as shown in fig. 1, different sound sources in the virtual space mainly undergo distance-sensing processing, virtual stereo reconstruction, and stereo mixing to generate the sound signals that finally enter the user's ears. The volume of each sound source is adjusted mainly according to its physical distance from the current user, i.e., the "distance-sensing process" shown in fig. 1. But when the environment is noisy, the user finds it hard to hear the sounds of the objects of interest.
In view of the problems in the related art, the present application provides a virtual space-based audio processing method, which can be applied to the application environment shown in fig. 2, in which the interactive device 202 can perform data transmission with the computer device 204. Specifically, the computer device 204 is configured to construct a virtual space; the virtual space may be presented through the interactive device 202, and the user may interact with the virtual space through that device. In response to an interactive operation, the computer device 204 may determine, based on the acquired interaction data, the attention sound source in the virtual space, and treat the remaining sound sources as non-attention sound sources. The computer device 204 then provides different sensory experiences to the user by adjusting the audio of the sound sources. The above process is a spatial sound-effect adjustment process, and the virtual space mentioned here may be the metaverse introduced in the term explanations.
It can be understood that the method provided in the embodiments of the present application can be applied, without limitation, to virtual reality applications, three-dimensional map programs, event simulation programs, game applications, and the like. The interactive device 202 may be a desktop computer, a laptop computer, a mobile phone, a tablet computer, an e-book reader, an MP3 (Moving Picture Experts Group Audio Layer III) player, an MP4 (Moving Picture Experts Group Audio Layer IV) player, or a VR device such as VR glasses. Applications supporting a virtual space, such as applications supporting a three-dimensional virtual environment, may be installed and run on the interactive device 202.
It should be noted that the interactive devices in the foregoing process are only schematic illustrations, and in an actual implementation process, a user is not limited to being able to operate only one interactive device, and the embodiment of the present application does not specifically limit the type and number of the interactive devices. For example, a user can realize visual interaction through VR glasses, and realize limb interaction through a body sensing device, that is, realize interactive operation through two interactive devices.
It should be further noted that the computer device 204 may be a terminal or a server; the terminal may be a mobile terminal or another intelligent terminal such as a vehicle-mounted terminal, and the server may be implemented as a physical server or as a cloud server. Cloud technology is a hosting technology that unifies series of resources such as hardware, software, and networks in a wide area network or a local area network to realize the computation, storage, processing, and sharing of data. It is the general term for the network, information, integration, management-platform, application, and other technologies applied in the cloud-computing business model; it can form a resource pool used on demand, flexibly and conveniently. Cloud-computing technology will become an important support: the background services of a technical network system, such as video websites, picture websites, and other portal websites, require a large amount of computing and storage resources. With the rapid development and application of the Internet industry, each article may come to have its own identification mark that needs to be transmitted to a background system for logical processing; data of different levels will be processed separately, and all kinds of industry data need strong system background support, which can be realized through cloud computing.
In some embodiments, the computer device 204 described above may also be implemented as a node in a blockchain system. The Blockchain (Blockchain) is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, a consensus mechanism and an encryption algorithm. The block chain, which is essentially a decentralized database, is a string of data blocks associated by using a cryptographic method, and each data block contains information of a batch of network transactions, which is used for verifying the validity (anti-counterfeiting) of the information and generating a next block. The blockchain may include a blockchain underlying platform, a platform product services layer, and an application services layer.
It is noted that the user exists as the interaction trigger and can be projected into the virtual space through a virtual operation object. The virtual operation object refers to a movable object in the virtual space; the movable object can be a virtual character, a virtual animal, or another movable virtual object. For example, the virtual operation object may be a character, an animal, a plant, an oil drum, a wall, a stone, or the like displayed in a three-dimensional virtual environment. Optionally, the virtual operation object is a three-dimensional model created based on a skeletal animation technique. Each virtual operation object has its own shape and volume in the three-dimensional virtual environment and occupies part of the space in it.
In some embodiments, combining the term explanations with the description of the implementation environment, as shown in fig. 3, a virtual space-based audio processing method is provided. It is described here as applied to the computer device 204 in fig. 2, taking a virtual reality application as an example, and includes the following steps:
Step 302: in response to an interactive operation on the virtual space, determine an attention sound source in the virtual space, and treat the sound sources other than the attention sound source among all sound sources in the virtual space as non-attention sound sources.
A virtual space is a two-dimensional, three-dimensional, or higher-dimensional space constructed by a computer device. Visual elements such as scenery, light and shadow, and water can be presented in the virtual space; auditory elements such as human voices and scene sounds can be presented; and even tactile elements can be presented. In this step, the interactive operation is mainly used to adjust the auditory elements in the virtual space. Background sound may exist in the virtual space, and a plurality of virtual objects that can emit sound, such as multiple characters and scenery, may also exist as different sound sources.
It will be appreciated that, among the sound sources generated in the virtual space by a virtual reality application, some are of interest to the user, such as the speech of certain characters when they appear, and some are not, such as ambient noise. Thus, in this step, the computer device may first determine the attention sound source in the virtual space, i.e., the sound source the user is interested in, as determined by the interactive operation referred to in this step.
The interaction operation may be triggered by a user through the interaction device, and may specifically be triggered by a limb operation. It is to be understood that the triggering manner and type of the interactive operation may be associated with the application scenario and the type of the interactive device, and may also be associated with other factors, which is not specifically limited in the embodiment of the present application. For example, if the application scenario is that the user plays a motion sensing game through a motion sensing device, the type of the interactive operation may be the limb movement of the user. For another example, if the application scene is that the user plays an immersive game through VR glasses, the type of the interactive operation may be a head action of the user, such as a turning action. Or the VR glasses do not capture the head motion of the user, but capture the eyeball motion of the user by means of eyeball position acquisition, and the type of the interactive operation may be the eyeball motion of the user.
It is further understood that, after the interaction device detects the interaction operation of the user with respect to the virtual space, the interaction operation may be converted into interaction action data that can be recognized by the computer device, such as acoustic object data of a virtual operation object in a virtual reality application program, so that the computer device may respond based on the interaction action data, that is, perform the process in this step.
It should be noted that, because the sound sources in the virtual space are usually captured in real time or preset by the virtual reality application, the computer device can obtain data for all sound sources in the virtual space, i.e., it knows every sound source that exists there. Thus, upon determining the attention sound source, the computer device can determine which sound sources remain and treat those other than the attention sound source as non-attention sound sources, i.e., the sound sources the user is not interested in. Of course, in an actual implementation the non-attention sound sources need not be determined by this elimination method; for example, the background sound may be designated as a non-attention sound source in advance, which is not specifically limited in the embodiments of the present application. It should further be noted that, after determining the attention and non-attention sound sources in the virtual space, the computer device may set an identifier on each, to facilitate the judgments of subsequent processing.
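A minimal sketch of such source tagging follows (the class and field names are illustrative assumptions):

```python
from dataclasses import dataclass

@dataclass
class SoundSource:
    source_id: str
    position: tuple[float, float, float]  # location inside the virtual space
    is_attention: bool = False            # identifier set after the determination

def tag_sources(sources: list[SoundSource], attention_ids: set[str]) -> None:
    # Elimination method: the selected sources become attention sound
    # sources, every other source becomes a non-attention sound source.
    for s in sources:
        s.is_attention = s.source_id in attention_ids
```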
Step 304: acquire the target attention audio corresponding to the attention sound source and the target non-attention audio corresponding to the non-attention sound source.
It will be appreciated that the sound sources referred to in the above steps are mainly descriptions from the human auditory perspective; a sound source actually corresponds to a virtual object in the virtual space, and virtual objects in a virtual reality application are typically represented by acoustic object data. The above steps distinguish which virtual objects are objects of interest and which are not, while this step mainly acquires the audio emitted by the virtual objects acting as sound sources. As explained above, the sound sources in the virtual space are usually captured in real time or preset by the virtual reality application, so the audio obtained in this step may be audio data captured in real time or pre-recorded audio data.
Further, the audio obtained in this step may be associated with time. For the current time, if the sound source in the virtual space is usually captured in real time, the audio acquired in this step is the audio data captured in real time at the current time. If the sound source of the virtual space is preset by the virtual reality application program, the audio acquired in this step is the audio data output when the preset audio is played to the current time according to the time sequence.
It should be noted that the "target attention audio" and the "target non-attention audio" mentioned in this step do not limit the audio itself, but mainly serve as a distinction between two different sources of the sound source. In an actual implementation process, each attention sound source may correspond to one target attention audio, and each non-attention sound source corresponds to one target non-attention audio.
Step 306: perform audio adjustment processing on at least one of the target attention audio and the target non-attention audio to obtain a first intermediate audio of the attention sound source and a second intermediate audio of the non-attention sound source, wherein the sound-effect discrimination of the first intermediate audio is greater than that of the second intermediate audio.
In this step, the computer device may perform audio adjustment processing only on the target attention audio, only on the target non-attention audio, or on both. The audio adjustment processing may be volume adjustment, coding-rate adjustment, signal filtering, or the like, which is not specifically limited in the embodiments of the present application. It can be appreciated that in the embodiments of the present application there are multiple sound sources, i.e., multiple audios, in the virtual space. The sound-effect discrimination of a given audio mainly refers to how easily that audio can be picked out from among all the audios by human hearing.
The greater the sound-effect discrimination, the more easily the audio is picked out by the human ear from among all the audios; conversely, the audio is blended into all the audios and harder to distinguish. Therefore, "the sound-effect discrimination of the first intermediate audio is greater than that of the second intermediate audio" can be expressed more concretely as: the discrimination of the attention sound source among all sound sources produced by the first and second intermediate audios is greater than that produced by the unadjusted target attention audio and target non-attention audio.
Step 308: mix the first intermediate audio and the second intermediate audio to obtain a mixing processing result.
It will be appreciated that for multiple sources of sound in virtual space, integration into one stereo track or monophonic track is often required to output sound to the user. Therefore, in this step, the computer device can individually adjust the frequency, dynamics, tone quality, positioning, reverberation and sound field of the first intermediate audio and the second intermediate audio, and output the mixing processing result to optimize the audio of each source.
In the above embodiment, since audio adjustment processing can be performed on at least one of the audio corresponding to the attention sound source and the audio corresponding to the non-attention sound sources in the virtual space, the sound-effect discrimination of the audio corresponding to the attention sound source is made greater than that of the audio corresponding to the non-attention sound sources; since the attention sound source is the sound the user is interested in, the user can hear the sound of interest more easily. In addition, in a noisy environment, the user can still effectively distinguish the sounds of interest.
Taking the virtual space as a three-dimensional space as an example, it can be understood that, apart from background sound, sound in the virtual space usually originates from a certain direction, and that direction usually corresponds to a region; that is, a sound source in the virtual space usually corresponds to a region. For example, if a person is speaking in the virtual space, the person has a shape and volume of their own and occupies part of the space in the virtual space, and that part of the space can serve as the region corresponding to the person as a sound source. Since a sound source generally corresponds to a partial region of the virtual space, the user can perform an interactive operation based on the region corresponding to a sound source in order to select that sound source. Based on the above, in some embodiments, determining a sound source of interest in the virtual space in response to an interactive operation on the virtual space comprises: in response to the interactive operation, determining a region of interest in the virtual space to which the interactive operation is directed; and taking a sound source in the region of interest as the sound source of interest.
The triggering manner and type of the interactive operation may refer to the above embodiments and are not repeated here. As can be seen from the above, the triggering manner and type of the interactive operation can be associated with the application scenario and the type of interactive device; it will be understood that how the region of interest is determined may, in turn, be associated with the triggering manner and type of the interaction. For ease of understanding, the process of determining the region of interest and the sound source of interest is now described with reference to the following examples. In an actual implementation, as application scenarios become richer, interactive device types expand, and triggering manners are upgraded, many extended embodiments are possible; all of them should be regarded as optional embodiments under the concept proposed by the embodiments of the present application.
Take, as an example, an application scenario in which the user plays an immersive game through VR glasses: the interactive device is VR glasses, the interactive operation is triggered by a limb action, and its specific type is a head-turning motion. In this example, the region of interest may be determined as the simulated field-of-view region covered by the user in the virtual space through the VR glasses, as shown in fig. 4, and a sound source corresponding to a virtual object within the simulated field-of-view region may be taken as an attention sound source. Further, since the position range of the simulated field-of-view region is known, and the position of each virtual object in the virtual space is also known, which virtual objects fall within the simulated field-of-view region can be determined by position comparison. The position information may be described by longitude and latitude ranges in a longitude-latitude coordinate system, for example east longitude 30-45 degrees and north latitude 0-60 degrees, which is not specifically limited in the embodiments of the present application. As shown in fig. 5, the sound sources located in the region of interest, such as a cat, a dog, and a person walking the dog, are the attention sound sources. It should be noted that the interactive device used in this example may also be a VR headset instead of VR glasses, which is not specifically limited in the embodiments of the present application.
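A sketch of this position comparison follows (the function name and the inclusive ranges are assumptions; longitude wrap-around is ignored for simplicity):

```python
def in_region_of_interest(src_lon: float, src_lat: float,
                          lon_range: tuple[float, float],
                          lat_range: tuple[float, float]) -> bool:
    # Position comparison: the source lies inside the simulated
    # field-of-view region if both coordinates fall within its ranges.
    return (lon_range[0] <= src_lon <= lon_range[1]
            and lat_range[0] <= src_lat <= lat_range[1])

# Region from the example: east longitude 30-45 deg, north latitude 0-60 deg.
assert in_region_of_interest(38.0, 12.5, (30.0, 45.0), (0.0, 60.0))
assert not in_region_of_interest(50.0, 12.5, (30.0, 45.0), (0.0, 60.0))
```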
Take, as another example, an application scenario in which the user plays an immersive game through a manipulation device: the interactive device is the manipulation device, the interactive operation is triggered by a manipulation action, and its specific type is a finger-pointing motion. In this example, the region of interest may be determined by the computer device identifying which virtual object the user has selected in the virtual space through the pointing motion, and taking the part of the space that this virtual object occupies in the virtual space as the region of interest. Specifically, since the position of the partial space occupied by each virtual object is known, and the position the user points at is also known, the selected virtual object can be determined by position comparison, and the space it occupies is taken as the region of interest, as shown in fig. 6. The user can point at the male virtual object in the virtual space of fig. 6, so the space occupied by that object becomes the region of interest; since the male virtual object inside the region of interest is itself a sound source, the sound source corresponding to it can be taken as the attention sound source. It should be noted that the pointing action in this example may be a real finger click, a virtual click similar to a mouse pointer, or a limb-level pointing gesture, which is not specifically limited in the embodiments of the present application. Accordingly, besides the manipulation device, the interactive device used in this example may be a touch device, a gamepad, or a visual motion-capture device, which is not specifically limited either.
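A sketch of this pointing-based selection follows, modeling each object's occupied space as an axis-aligned bounding box (the box model and all names are assumptions):

```python
from typing import Optional
import numpy as np

def pick_object(pointed_at: np.ndarray,
                objects: dict[str, tuple[np.ndarray, np.ndarray]]) -> Optional[str]:
    """Return the id of the virtual object whose occupied space,
    modeled as an axis-aligned bounding box (min corner, max corner),
    contains the pointed-at position; that space becomes the region of
    interest and the object's sound source the attention sound source."""
    for obj_id, (lo, hi) in objects.items():
        if np.all(pointed_at >= lo) and np.all(pointed_at <= hi):
            return obj_id
    return None
```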
It should be noted that, in the above examples, the process of determining the attention sound source is actively triggered by the user. In an actual implementation, the attention sound source may also be determined by passive triggering. Continuing the example where the interactive operation is a head-turning motion, a user determines a simulated field-of-view region through one turn of the head; if the user then performs no further action for some time, the simulated field-of-view region, i.e., the region of interest, stays fixed. Virtual objects in the virtual space, however, are generally not fixed but moving. If a virtual object actively moves into the region of interest, the sound source corresponding to that virtual object may also become an attention sound source. As shown in fig. 7, the male virtual object on the left was not in the region of interest but actively moves into it, so the sound source corresponding to him may become an attention sound source.
In the embodiment, the user can determine the attention area which the user may be interested in only by performing interactive operation on the virtual space, and automatically takes the sound source in the attention area as the attention sound source, so that the user can conveniently determine the sound which the user is interested in.
As shown in fig. 8, when all sound sources in the region of interest are taken as attention sound sources, there are 3 attention sound sources; among all the sound sources, those other than the attention sound sources are taken as non-attention sound sources, of which there are 2. That is, in an actual implementation there may be a plurality of attention sound sources and a plurality of non-attention sound sources. In the embodiment corresponding to fig. 3, each attention sound source may correspond to one target attention audio, and each non-attention sound source to one target non-attention audio. It can be understood that, under this premise, there may be a plurality of attention sound sources requiring audio adjustment processing, a plurality of non-attention sound sources requiring it, or a plurality of both.
Accordingly, an embodiment is also provided in which mixing is performed before the audio adjustment processing. In some embodiments, acquiring the target attention audio corresponding to the attention sound sources and the target non-attention audio corresponding to the non-attention sound sources includes: mixing the attention audios respectively emitted by a plurality of attention sound sources to obtain the target attention audio; and mixing the non-attention audios respectively emitted by a plurality of non-attention sound sources to obtain the target non-attention audio.
It can be appreciated that there are typically multiple sound sources of interest and multiple sound sources of non-interest in the virtual space. The "plurality" of the "plurality of sound sources of interest" mentioned in the embodiment of the present application may refer to all sound sources of interest, or may refer to a part of sound sources of interest in all sound sources of interest, which is not specifically limited in the embodiment of the present application. The "plurality" of the "plurality of non-attention sound sources" mentioned in the embodiments of the present application may also refer to the above explanation. In addition, the embodiments of the present application refer to "mixing process", which may refer to integrating audio of a plurality of sound sources into one stereo track or mono track.
The target attention audio obtained after mixing a plurality of attention sound sources can be denoted $x_{\mathrm{att}}$, and the target non-attention audio obtained after mixing a plurality of non-attention sound sources can be denoted $x_{\mathrm{non}}$. For either audio, the mixing process may adopt one of direct addition, averaging, clamping, normalization, adaptive mixing weighting, or automatic alignment algorithms. Taking direct addition over a plurality of attention sound sources as an example, the mixing process can refer to the following formula (1):

$$x_{\mathrm{att}}(i)=\sum_{j=1}^{K} x_{j}(i) \qquad (1)$$

where $x_{\mathrm{att}}(i)$ denotes the mixing result of the $K$ attention sound sources at the $i$th sampling instant, $j$ denotes the $j$th attention sound source, and $x_{j}(i)$ denotes the audio sample value of the $j$th attention sound source at the $i$th sampling instant.
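A direct-addition implementation of formula (1) could look as follows (a sketch; the function name is an assumption):

```python
import numpy as np

def mix_direct_addition(sources: list[np.ndarray]) -> np.ndarray:
    # Formula (1): the mix at sampling instant i is the sum, over the
    # K sound sources, of their sample values at instant i.
    return np.sum(np.stack(sources), axis=0)
```

Direct addition can exceed full scale when many sources peak together; the clamping or normalization variants listed above would bound or rescale the same sum.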
In the above embodiment, since a target attention audio integrating a plurality of attention sound sources and a target non-attention audio integrating a plurality of non-attention sound sources can be formed, the subsequent audio adjustment processing can operate on at least one of these two integrated audios instead of on the audio of every individual sound source, which reduces the workload of the audio adjustment processing.
As can be seen from the foregoing, the objective of the audio adjustment processing in the embodiment corresponding to fig. 3 is mainly that the discrimination of the attention sound source among all sound sources produced by the first and second intermediate audios be greater than that produced by the unadjusted target attention audio and target non-attention audio. To achieve this, the audio adjustment processing may be performed only on the target attention audio, only on the target non-attention audio, or on both.
In some embodiments, the audio adjustment processing is performed on at least one of the target attention audio and the target non-attention audio to obtain a first intermediate audio of the attention sound source and a second intermediate audio of the non-attention sound source, including: performing signal enhancement processing on a target attention audio to obtain a first intermediate audio corresponding to an attention sound source; and performing signal attenuation processing on the target non-attention audio to obtain a second intermediate audio corresponding to the non-attention sound source.
The signal enhancement processing and the signal attenuation processing may each include at least one of Wiener filtering, a subspace enhancement algorithm, spectral subtraction, adaptive subtraction, a hidden-Markov-model method, short-time amplitude spectrum estimation, or a wavelet transform, which is not specifically limited in the embodiments of the present application. When the computer device performs signal enhancement processing on the target attention audio, the volume of the attention sound source can be increased or its clarity improved; when it performs signal attenuation processing on the target non-attention audio, the volume of the non-attention sound sources can be reduced or their clarity lowered.
In the above embodiment, since the signal enhancement processing can be performed on the target attention audio and the signal attenuation processing can be performed on the target non-attention audio at the same time, the adjustment can be performed from two directions, so that the sound emitted by the attention sound source can obtain higher sound effect discrimination in all sound sources, and the user can hear the sound of interest.
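The methods listed above are complete enhancement and attenuation algorithms; purely for illustration, the sketch below substitutes simple broadband gains to show the two directions of adjustment (the names and values are assumptions):

```python
import numpy as np

def adjust_both(target_att: np.ndarray, target_non: np.ndarray,
                boost: float = 1.8, cut: float = 0.4):
    # Enhancement of the attention audio and attenuation of the
    # non-attention audio, approximated by broadband gains; the boosted
    # signal is clipped to full scale ([-1, 1]) to avoid overflow.
    first_intermediate = np.clip(boost * target_att, -1.0, 1.0)
    second_intermediate = cut * target_non
    return first_intermediate, second_intermediate
```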
It will be appreciated that, in actual practice, audio is typically adjusted based on adjustment parameters. Based on this, in some embodiments, performing audio adjustment processing on at least one of the target attention audio and the target non-attention audio to obtain the first intermediate audio of the attention sound source and the second intermediate audio of the non-attention sound source includes: determining the audio to be adjusted from the target attention audio and the target non-attention audio; determining the adjustment parameters corresponding to the respective sampling instants of the audio to be adjusted; and performing audio adjustment processing on the audio to be adjusted based on those adjustment parameters, and determining the first intermediate audio of the attention sound source and the second intermediate audio of the non-attention sound source based on the audio adjustment result.
Specifically, only the target attention audio may be taken as the audio to be adjusted, only the target non-attention audio, or both at the same time, which is not specifically limited in the embodiments of the present application. Sound is in fact an energy wave whose waveform is continuous; the waveform curve can be seen as composed of innumerable points, and since storage space is limited, the curve is sampled at discrete points during digital encoding. It can thus be understood that the audio to be adjusted corresponds to a time axis, and each sampling instant corresponds to an adjustment parameter of the sound signal.
Therefore, in the embodiments of the present application, the computer device mainly performs the audio adjustment processing through the adjustment parameter corresponding to each sampling instant of the audio to be adjusted. The adjustment parameter may include at least one of the amplitude of the audio waveform, the fundamental frequency of the audio, or the amount of harmonics in the audio waveform, which is not specifically limited in the embodiments of the present application. The amplitude of the waveform corresponds to the volume, the fundamental frequency corresponds to the pitch, and the amount of harmonics corresponds to the timbre. The computer device can adjust the adjustment parameters corresponding to the respective sampling instants of the audio to be adjusted to obtain an audio adjustment result, and then determine the first intermediate audio of the attention sound source and the second intermediate audio of the non-attention sound sources from that result.
In the above embodiment, since the audio adjustment processing may be performed on at least one of the target attention audio and the target non-attention audio based on the adjustment parameter corresponding to each sampling time, accurate adjustment may be achieved.
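A minimal sketch of applying such per-sampling-instant adjustment parameters (amplitude-only, which is an assumption; fundamental-frequency or harmonic adjustments would need a spectral method instead):

```python
import numpy as np

def apply_adjustment(audio: np.ndarray, params: np.ndarray) -> np.ndarray:
    # audio[i] is the sample at the ith sampling instant and params[i]
    # the adjustment parameter determined for that instant; applying
    # them element-wise yields the audio adjustment result.
    assert audio.shape == params.shape
    return params * audio
```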
In some embodiments, determining the adjustment parameters corresponding to the respective sampling instants of the audio to be adjusted includes: determining the target time period within which each sampling instant of the audio to be adjusted falls, where the target time period is determined by a sound-source switching process, i.e., switching between attention sound source and non-attention sound source; and obtaining the calculation result of the adjustment function corresponding to the target time period in which each sampling instant falls as the adjustment parameter corresponding to that sampling instant, wherein the value of an argument in the adjustment function is determined based on the sampling instant.
As can be seen from the above embodiments, the sound sources of the virtual space are usually either captured in real time or preset by the virtual reality application. The start time of the target time period may be predetermined, and so may its duration. For example, if the sound sources of the virtual space are preset by the virtual reality application, both the start time and the duration of the target time period may be preset. Once the start time and the duration are determined, the end time of the target time period is also determined.
For convenience of explanation, assume that the sound sources in the virtual space are captured in real time. If the audio to be adjusted is the target attention audio, then for the current sampling moment the computer device may determine the target time period within which that moment falls. It is understood that a sound source of the virtual space is not permanently an attention sound source or a non-attention sound source; its state can switch. For example, a sound source may be an attention sound source at one moment and be switched to a non-attention sound source at the next. Thus, since the target attention audio is the audio corresponding to the attention sound source and the target time period is determined by the sound source switching process, the start time of the target time period may be related to the most recent time, before the current sampling moment, at which the sound source was switched into an attention sound source. The end time of the target time period may be related to the first time, after the current sampling moment, at which the attention sound source is switched into a non-attention sound source.
For example, take the attention sound source as A, and let the current sampling moment be 11:30:20 am on 21 December 2021. The most recent time before the current sampling moment at which A was switched from a non-attention sound source into an attention sound source was 11:25 am on 21 December 2021. It can thus be determined that the start time of the target time period may be related to, or directly be, 11:25 am on 21 December 2021. If A is switched out into a non-attention sound source at 11:35 am on 21 December 2021, after the current sampling moment, then the end time of the target time period may be related to, or directly be, 11:35 am on 21 December 2021.
In addition, if the audio to be adjusted is the target non-attention audio, the computer device may likewise determine the target time period within which the current sampling moment falls. In this case the target non-attention audio is the audio corresponding to the non-attention sound source, and the start and end times of the target time period are determined as in the above embodiments. For example, the start time of the target time period may be related to the most recent time at which the non-attention sound source was switched into a non-attention sound source, and the end time may be related to the first time, after the current sampling moment, at which the non-attention sound source is switched into an attention sound source.
As can be seen from the contents of the above embodiments, if the audio to be adjusted is the target attention audio, the target time period can be understood as a complete life cycle of the sound source as the attention sound source. If the audio to be adjusted is the target non-attention audio, the target time period may be understood as a complete life cycle of the sound source as a non-attention sound source. It should be noted that the audio to be adjusted may include both the target attention audio and the target non-attention audio, and for the current sampling time, the audio adjustment processing needs to be performed on the two audios at the same time. At this time, although there may be overlap in time periods, the target attention audio and the target non-attention audio may each correspond to one target time period, and both perform their own audio adjustment processing based on the respective corresponding adjustment functions in the respective corresponding target time periods.
It should be noted that, in the above, the target time period corresponding to the target attention audio is defined based on one complete life cycle of the sound source as an attention sound source; that is, one complete life cycle as an attention sound source corresponds to only one target time period, and likewise one complete life cycle as a non-attention sound source corresponds to only one target time period. It can be understood that, in an actual implementation, a complete life cycle of a sound source as an attention sound source may instead be divided into a plurality of time segments, and whichever of these segments the current sampling moment falls into is used as the target time period. Similarly, a complete life cycle as a non-attention sound source may also be divided into a plurality of time segments, and the segment into which the current sampling moment falls is used as the target time period.
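As an illustration of locating the target time period, the sketch below assumes (the names and data layout are hypothetical) that the sound source's switching history is kept as a sorted list of switch moments, and returns the life-cycle period containing the current sampling moment:

```python
from bisect import bisect_right

def target_period(switch_times: list[float], t: float) -> tuple[float, float]:
    """switch_times holds the moments at which the source switched between
    the attention and non-attention states, in ascending order. The target
    time period containing sampling moment t starts at the most recent
    switch at or before t and ends at the next switch after t (open-ended
    if the source has not switched again yet)."""
    i = bisect_right(switch_times, t)
    start = switch_times[i - 1] if i > 0 else float("-inf")
    end = switch_times[i] if i < len(switch_times) else float("inf")
    return start, end
```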
In the embodiment of the present application, each target time period corresponds to an adjustment function. It will be appreciated that different sampling moments may yield different calculation results of the adjustment function; therefore, at least one independent variable of the adjustment function takes its value from the sampling moment. The computer device substitutes the current sampling moment into the adjustment function corresponding to the target time period within which that moment falls, and the calculation result obtained serves as the adjustment parameter corresponding to the current sampling moment.
In the above embodiment, the adjustment parameter corresponding to the sampling time may be determined based on the adjustment function corresponding to the target time period in which the sampling time of the audio to be adjusted falls, so that different adjustment parameters may correspond to different times, and thus, accurate adjustment may be achieved.
It can be understood that the distance between the listener and the sound source affects the listening effect of the listener, such as whether the sound is clear or not. Therefore, in the practical implementation process, how the adjustment parameters are determined can also refer to the distance between the listener and the sound source. In the embodiment of the present application, in the real space, the "listener" may be a target object that triggers the interactive operation, that is, may be a user using the interactive device. In the virtual space, a mapping object, that is, a virtual operation object, usually exists in a mapping manner for a user using the interactive device. Based on this description, in some embodiments, an adjustment threshold is included in the adjustment function, the adjustment threshold being determined by a distance between the corresponding sound source and a virtual operation object in the virtual space, the virtual operation object being a mapping object of a target object in the virtual space that triggers the interactive operation.
As can be seen from the above embodiments, the adjustment function has an argument whose value is taken from the sampling moment. In the embodiment of the present application, the adjustment function may include, in addition to that argument, another quantity determined based on the distance between the virtual operation object and the sound source, namely the adjustment threshold. As noted above, the target attention audio and the target non-attention audio may each correspond to a target time period, and each executes its own audio adjustment processing based on its corresponding adjustment function within that time period; furthermore, the target attention audio and the target non-attention audio each correspond to a sound source. It follows that each sound source may correspond to an adjustment function. Based on this, for any sound source, the adjustment threshold included in the adjustment function corresponding to that sound source may be determined by the distance between that sound source and the virtual operation object in the virtual space.
The adjustment threshold may be determined by a table look-up mapping manner, which is not specifically limited in this embodiment of the present application. For example, different distance value intervals may be preset. The attention sound source corresponds to an adjustment threshold value in each distance value interval, and the non-attention sound source also corresponds to an adjustment threshold value in each distance value interval. Subsequently, based on the distance value section in which the actual distance falls and the sound source type (whether the sound source is the attention sound source or the non-attention sound source) corresponding to the actual distance, the adjustment threshold corresponding to the actual distance can be determined.
In the above embodiment, the distance between the listener and the sound source may affect the value of the adjustment parameter, so that the distance may affect the sound effect generated by the sound source. Therefore, the audio emitted by the sound source can be adjusted based on the distance, so that the user can hear the interested sound more easily.
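A minimal sketch of such a table look-up mapping follows; the distance intervals and threshold values are illustrative assumptions, not values from this application:

```python
# (upper distance bound, threshold for an attention source, threshold for a non-attention source)
DISTANCE_TABLE = [
    (1.0, 4.0, 0.50),
    (5.0, 3.0, 0.60),
    (20.0, 2.0, 0.75),
    (float("inf"), 1.5, 0.90),
]

def adjustment_threshold(distance: float, is_attention: bool) -> float:
    """Look up the adjustment threshold from the distance between the sound
    source and the virtual operation object. Attention sources map to
    thresholds above 1 (enhancement), non-attention sources to thresholds
    below 1 (attenuation); both move toward 1 as the distance grows."""
    for upper, g_attention, g_non_attention in DISTANCE_TABLE:
        if distance <= upper:
            return g_attention if is_attention else g_non_attention
    raise ValueError("the distance table must cover all distances")
```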
As can be seen from the above description of the embodiments, a complete life cycle of a sound source as an attention sound source may correspond to a plurality of time periods. A complete life cycle as an attention sound source generally refers to the period from being switched into an attention sound source to being switched out into a non-attention sound source, and in an actual implementation this period can be divided into three segments: a cut-in attention time period, a continuous attention time period, and an exit attention time period. The cut-in period refers to the initial stage just after the sound source is determined as an attention sound source, the continuous period refers to the stable stage that follows, and the exit period refers to the initial stage just after the sound source is determined as a non-attention sound source, that is, the ending stage during which the sound source ceases to be an attention sound source.
Based on the above description, in some embodiments, if the audio to be adjusted is the target attention audio, the target time period is one of a cut-in attention time period, a continuous attention time period, and a quit attention time period; wherein the start time of the cut-in attention period is determined based on a time at which the sound source is determined as the attention sound source, the start time of the continuous attention period is determined based on an end time of the cut-in attention period, and the start time of the exit attention period is determined based on a time at which the attention sound source is switched to the non-attention sound source, or is determined based on the cancel attention instruction.
The starting time of the cut-in attention time period may be a time when the sound source is determined as the attention sound source, or may be a time before or after the time, such as a time after several seconds. Similarly, the starting time of the continuous attention time period may be the ending time of the cut-in attention time period, or may be a time before or after the ending time. The start time of the quitting attention time period may be a time when the attention sound source is switched to the non-attention sound source, or may be a time before or after the time. The end time of the sustained attention time period may be a time when the attention sound source is switched to the non-attention sound source, or may be a time before or after the time, which is not limited in the embodiment of the present application.
In the above description, the end times of the cut-in attention time period and of the exit attention time period were not specified. In an actual implementation, the respective durations of the cut-in attention time period and the exit attention time period may be set so as to determine the respective end times, which is not specifically limited in the embodiment of the present application. It is to be understood that "the attention sound source is switched to a non-attention sound source" emphasizes a passive switching process: since an attention sound source is usually movable, it may leave the region of interest, at which point it is switched into a non-attention sound source. In an actual implementation, an attention sound source may also be actively switched into a non-attention sound source by way of an instruction trigger. Thus, in the embodiment of the present application, the start time of the exit attention time period may also be determined based on a cancel-attention instruction.
In the above embodiment, since a complete life cycle of the sound source as the attention sound source may correspond to a plurality of time periods, based on different time periods in which the sampling time falls, the corresponding adjustment parameter may be determined, and subsequently, the audio adjustment processing may be performed based on the adjustment parameter, so that the audio adjustment may be performed in stages for a complete life cycle of the attention sound source, and further, the adjustment process is more detailed, and the generated sound effect is better.
In some embodiments, the adjustment function corresponding to the cut-in attention time period is a monotonically increasing function, the adjustment function corresponding to the continuous attention time period is a constant function, and the adjustment function corresponding to the exit attention time period is a monotonically decreasing function.
A monotonically increasing function is one whose calculation result, that is, the adjustment parameter, gradually increases as time advances. A constant function is one whose calculation result, that is, the adjustment parameter, remains unchanged as time advances. A monotonically decreasing function is one whose calculation result, that is, the adjustment parameter, gradually decreases as time advances.
It can be understood that the adjustment function corresponding to the cut-in attention time period is set as a monotonically increasing function because that period is the initial stage just after the sound source is determined as an attention sound source: as time advances the adjustment parameter grows, matching the expectation of a gradual transition in which the attention sound source fades into the listener's auditory scene. The adjustment function corresponding to the continuous attention time period is set as a constant function because that period is the stable stage after the sound source has been determined as an attention sound source: the adjustment parameter stays unchanged, matching the expectation of a steady state in which the attention sound source has fully entered the auditory scene. The adjustment function corresponding to the exit attention time period is set as a monotonically decreasing function because that period is the ending stage during which the sound source ceases to be an attention sound source: the adjustment parameter shrinks over time, matching the expectation of a gradual transition in which the attention sound source fades out of the auditory scene.
In the above embodiment, since the adjustment parameter may first gradually increase, then remain unchanged, and then gradually decrease as time advances, the gradual-change process matches the switching process from attention sound source to non-attention sound source, making the adjustment finer and the resulting sound effect better.
In some embodiments, the gradient of each of the monotonically increasing function and the monotonically decreasing function gradually decreases as the argument increases.
The independent variable mentioned here is the one determined based on the sampling moment. The cut-in attention time period is the initial stage just after the sound source is determined as an attention sound source; as the adjustment parameter grows over time, the attention sound source gradually fades into the auditory scene. Because the cut-in attention time period is followed by the continuous attention time period, which holds steady, a parameter that kept increasing at its initial rate would not match the expectation of settling into a steady state, whereas a value that keeps increasing while its rate of increase slows does; the monotonically increasing function may therefore be set so that its gradient gradually decreases as time advances.

Similarly, the exit attention time period is the initial stage just after the sound source is determined as a non-attention sound source, that is, the ending stage during which it ceases to be an attention sound source; as the adjustment parameter shrinks over time, the attention sound source gradually fades out of the auditory scene. Because the exit attention time period may be followed by a continuous non-attention time period, which holds steady, a parameter that kept decreasing at its initial rate would not match the expectation of settling into a steady state, whereas a value that keeps decreasing while its rate of decrease slows does; the monotonically decreasing function may therefore likewise be set so that its gradient gradually decreases as time advances.
In the above embodiment, since the rate at which the adjustment parameter increases or decreases can itself change over time to suit the expected gradual transition, the adjustment process is finer and the resulting sound effect is better.
The above embodiments mainly explain, from a principled perspective, how the adjustment parameters are determined when the audio to be adjusted is the target attention audio. The determination process is now illustrated with a specific example; see fig. 9. Taking the time unit in fig. 9 as seconds, 0 to 0.5 s corresponds to the cut-in attention time period, 0.5 to T to the continuous attention time period, and T to T+0.5 to the exit attention time period. Here, the 0th and 0.5th seconds do not refer to real time but are used only for convenience of measurement. Combined with the above embodiments, the 0th second may be understood as the moment the sound source is determined as an attention sound source, and the Tth second as the moment the attention sound source is switched into a non-attention sound source.
Specifically, the adjustment function corresponding to the cut-in attention time period may, for example, take the following form, given as formula (2):

f_in(t) = b + (g_1 − a) · (t / T_1)^c ;(2)

where f_in(t) represents the adjustment function corresponding to the cut-in attention time period, and t represents the independent variable determined based on the current sampling moment; the value of T_1 is determined based on the duration of the cut-in attention time period; g_1 corresponds to the adjustment threshold mentioned in the above embodiments, and its value is determined by the distance between the corresponding sound source and the virtual operation object in the virtual space; a, b and c all represent constants. Further, the value of g_1 may be greater than 1, the value of a may be 1, the value of b may be 1, and the value of c may be 0.5, so that f_in rises from 1 at t = 0 to g_1 at t = T_1. In an actual implementation process, the values of the constants may be set according to requirements, which is not specifically limited in this embodiment of the application. As shown in fig. 9, the adjustment function corresponding to the cut-in attention time period is an increasing function whose gradient gradually decreases as time advances.
The adjustment function corresponding to the continuous attention time period may refer to the following formula (3):

f_sus(t) = g_2 ;(3)

where f_sus(t) represents the adjustment function corresponding to the continuous attention time period, and g_2 is the adjustment threshold, whose value may be set as required; in the embodiment of the present application, g_2 may take the same value as g_1 in formula (2), which is not specifically limited here. As shown in fig. 9, the adjustment function corresponding to the continuous attention time period is a constant function.
The adjustment function corresponding to the exit attention time period may, for example, take the following form, given as formula (4):

f_out(t) = b + (g_3 − a) · (1 − ((t − T) / T_3)^c) ;(4)

where f_out(t) represents the adjustment function corresponding to the exit attention time period, t represents the independent variable determined based on the current sampling moment, and T is the start of the exit attention time period; the value of T_3 is determined based on the duration of the exit attention time period; g_3 corresponds to the adjustment threshold mentioned in the above embodiments, and its value is determined by the distance between the corresponding sound source and the virtual operation object in the virtual space; a, b and c all represent constants. Further, the value of g_3 may be greater than 1, the value of a may be 1, the value of b may be 1, and the value of c may be 0.5, so that f_out falls from g_3 at t = T back to 1 at t = T + T_3. In an actual implementation process, the values of the constants may be set according to requirements, which is not specifically limited in this embodiment of the application. As shown in fig. 9, the adjustment function corresponding to the exit attention time period is a decreasing function whose gradient gradually decreases as time advances.
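Putting formulas (2), (3) and (4) together, the following sketch implements the attention-side piecewise adjustment function as reconstructed above; the half-second ramps and the constant values a = b = 1, c = 0.5 are the example values stated in this embodiment, not the only possible choices:

```python
def attention_gain(t: float, T: float, g: float,
                   t_in: float = 0.5, t_out: float = 0.5) -> float:
    """Adjustment parameter for an attention sound source. t is measured
    from the moment the source was determined as an attention source; T is
    the moment it is switched out again; g > 1 is the distance-derived
    adjustment threshold."""
    if t < t_in:                                   # formula (2): cut-in, 1 -> g
        return 1.0 + (g - 1.0) * (t / t_in) ** 0.5
    if t < T:                                      # formula (3): hold at g
        return g
    s = min((t - T) / t_out, 1.0)                  # formula (4): exit, g -> 1
    return g - (g - 1.0) * s ** 0.5
```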
The above contents mainly correspond to the case where the audio to be adjusted is the target attention audio; as the earlier embodiments make clear, the audio to be adjusted may also be the target non-attention audio. Based on this, in some embodiments, if the audio to be adjusted is the target non-attention audio, the target time period is one of a cut-in non-attention time period, a continuous non-attention time period, and an exit non-attention time period; wherein the start time of the cut-in non-attention time period is determined based on the time at which the sound source is determined to be a non-attention sound source, the start time of the continuous non-attention time period is determined based on the end time of the cut-in non-attention time period, and the start time of the exit non-attention time period is determined based on the time at which the non-attention sound source is switched to an attention sound source, or is determined based on an attention instruction.
The starting time of the switching-in non-attention time period may be a time when the sound source is determined to be a non-attention sound source, or may be a time before or after the time, for example, a time after several seconds. Similarly, the starting time of the continuous non-attention time period may be the ending time of the cut-in non-attention time period, or may be a time before or after the ending time. The start time of exiting the non-attention period may be a time at which the non-attention sound source is switched to the attention sound source, or may be a time before or after the time. The end time of the sustained non-attention period may be a time when the non-attention sound source is switched to the attention sound source, or may be a time before or after the time, which is not limited in the embodiment of the present application.
In the above description, the end times of the cut-in non-attention time period and of the exit non-attention time period were not specified. In an actual implementation, the respective durations of the cut-in non-attention time period and the exit non-attention time period may be set so as to determine the respective end times, which is not specifically limited in the embodiment of the present application. As in the above embodiment, "the non-attention sound source is switched to an attention sound source" emphasizes a passive switching process: since a non-attention sound source is usually movable, it may enter the region of interest, at which point it is switched into an attention sound source. In an actual implementation, a non-attention sound source may also be actively switched into an attention sound source by way of an instruction trigger. Thus, in the embodiment of the present application, the start time of the exit non-attention time period may also be determined based on an attention instruction.
In the above embodiment, since a complete life cycle of the sound source as the non-attention sound source may correspond to a plurality of time periods, based on different time periods in which the sampling time falls, the corresponding adjustment parameter may be determined, and subsequently, the audio adjustment processing is performed based on the adjustment parameter, so that the audio adjustment may be performed in stages for a complete life cycle of the non-attention sound source, and further, the adjustment process is more detailed, and the generated sound effect is better.
In some embodiments, the adjustment function corresponding to the cut-in non-attention time period is a monotonically decreasing function, the adjustment function corresponding to the continuous non-attention time period is a constant function, and the adjustment function corresponding to the exit non-attention time period is a monotonically increasing function.
The definitions of the monotonically increasing function, the constant function and the monotonically decreasing function may refer to the explanations given in the embodiment in which the audio to be adjusted is the target attention audio, and are not repeated here.
It can likewise be understood that the adjustment function corresponding to the cut-in non-attention time period is set as a monotonically decreasing function because that period is the ending stage during which the sound source ceases to be an attention sound source: as time advances the adjustment parameter shrinks, matching the expectation of a gradual transition in which the attention sound source fades out of the auditory scene. The adjustment function corresponding to the continuous non-attention time period is set as a constant function because that period is the stable stage after the sound source has been determined as a non-attention sound source: the adjustment parameter stays unchanged, matching the expectation of a steady state in which the non-attention sound source has settled into the background of the auditory scene. The adjustment function corresponding to the exit non-attention time period is set as a monotonically increasing function because that period is the initial stage just after the sound source is determined as an attention sound source: the adjustment parameter grows over time, matching the expectation of a gradual transition in which the attention sound source fades into the auditory scene.
In addition, the gradients of the monotonically increasing and monotonically decreasing functions mentioned in the embodiment of the present application may gradually decrease as the independent variable increases; the independent variable referred to here is again the one determined based on the sampling moment. The cut-in non-attention time period is the initial stage just after the sound source is determined as a non-attention sound source; as the adjustment parameter shrinks over time, the non-attention sound source gradually fades out of the auditory scene. Because the cut-in non-attention time period is followed by the continuous non-attention time period, which holds steady, a parameter that kept decreasing at its initial rate would not match the expectation of settling into a steady state, whereas a value that keeps decreasing while its rate of decrease slows does; the monotonically decreasing function may therefore be set so that its gradient gradually decreases as time advances.

Similarly, the exit non-attention time period is the initial stage just after the sound source is determined as an attention sound source, that is, the ending stage during which it ceases to be a non-attention sound source. As the adjustment parameter grows over time, the attention sound source gradually fades into the auditory scene. Because the exit non-attention time period may be followed by a continuous attention time period, which holds steady, a parameter that kept increasing at its initial rate would not match the expectation of settling into a steady state, whereas a value that keeps increasing while its rate of increase slows does; the monotonically increasing function may therefore also be set so that its gradient gradually decreases as time advances.
In the above embodiment, since the adjustment parameter may first gradually decrease, then remain unchanged, and then gradually increase as time advances, the gradual-change process matches the switching process from non-attention sound source to attention sound source, making the adjustment finer and the resulting sound effect better. In addition, the rate at which the adjustment parameter increases or decreases can itself change over time to suit the expected gradual transition, which likewise makes the adjustment process finer and the resulting sound effect better.
The above embodiments mainly explain, from a principled perspective, how the adjustment parameters are determined when the audio to be adjusted is the target non-attention audio. The determination process is now illustrated with a specific example; see fig. 10. Taking the time unit in fig. 10 as seconds, 0 to 0.5 s corresponds to the cut-in non-attention time period, 0.5 to T to the continuous non-attention time period, and T to T+0.5 to the exit non-attention time period. Here, the 0th and 0.5th seconds do not refer to real time but are used only for convenience of measurement. Combined with the above embodiments, the 0th second may be understood as the moment the sound source is determined as a non-attention sound source, and the Tth second as the moment the non-attention sound source is switched into an attention sound source.
Specifically, the adjustment function corresponding to the cut-in non-attention time period may, for example, take the following form, given as formula (5):

f'_in(t) = b + (g_4 − a) · (t / T_4)^c ;(5)

where f'_in(t) represents the adjustment function corresponding to the cut-in non-attention time period, and t represents the independent variable determined based on the current sampling moment; the value of T_4 is determined based on the duration of the cut-in non-attention time period; g_4 corresponds to the adjustment threshold mentioned in the above embodiments, and its value is determined by the distance between the corresponding sound source and the virtual operation object in the virtual space; a, b and c all represent constants. Further, the value of g_4 may be less than 1, the values of a and b may be 1, and the value of c may be 0.5, so that f'_in falls from 1 at t = 0 to g_4 at t = T_4 (the same form as formula (2), but with a threshold below 1). In an actual implementation process, the values of the constants may be set according to requirements, which is not specifically limited in this embodiment of the application. As shown in fig. 10, the adjustment function corresponding to the cut-in non-attention time period is a decreasing function whose gradient gradually decreases as time advances.
The adjustment function corresponding to the continuous non-attention time period may refer to the following formula (6):

f'_sus(t) = g_5 ;(6)

where f'_sus(t) represents the adjustment function corresponding to the continuous non-attention time period, and g_5 is the adjustment threshold, whose value may be set as required; in the embodiment of the present application, g_5 may take the same value as g_4 in formula (5), which is not specifically limited here. As shown in fig. 10, the adjustment function corresponding to the continuous non-attention time period is a constant function.
The adjustment function corresponding to the exit non-attention time period may, for example, take the following form, given as formula (7):

f'_out(t) = b + (g_6 − a) · (1 − ((t − T) / T_6)^c) ;(7)

where f'_out(t) represents the adjustment function corresponding to the exit non-attention time period, t represents the independent variable determined based on the current sampling moment, and T is the start of the exit non-attention time period; the value of T_6 is determined based on the duration of the exit non-attention time period; g_6 corresponds to the adjustment threshold mentioned in the above embodiments, and its value is determined by the distance between the corresponding sound source and the virtual operation object in the virtual space; a, b and c all represent constants. Further, the value of g_6 may be less than 1, the values of a and b may be 1, and the value of c may be 0.5, so that f'_out rises from g_6 at t = T back to 1 at t = T + T_6 (the same form as formula (4), but with a threshold below 1). In an actual implementation process, the values of the constants may be set according to requirements, which is not specifically limited in this embodiment of the application. As shown in fig. 10, the adjustment function corresponding to the exit non-attention time period is an increasing function whose gradient gradually decreases as time advances.
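The non-attention side mirrors this; under the same assumptions, formulas (5), (6) and (7) can be sketched as:

```python
def non_attention_gain(t: float, T: float, g: float,
                       t_in: float = 0.5, t_out: float = 0.5) -> float:
    """Adjustment parameter for a non-attention sound source. t is measured
    from the moment the source was determined as a non-attention source;
    T is the moment it is switched back into an attention source; g < 1 is
    the distance-derived adjustment threshold."""
    if t < t_in:                                   # formula (5): cut-in, 1 -> g
        return 1.0 - (1.0 - g) * (t / t_in) ** 0.5
    if t < T:                                      # formula (6): hold at g
        return g
    s = min((t - T) / t_out, 1.0)                  # formula (7): exit, g -> 1
    return g + (1.0 - g) * s ** 0.5
```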
It should also be noted that the audio to be adjusted may include both target attention audio and target non-attention audio. At this time, the audio adjustment processing may be performed on the target attention audio and the target non-attention audio, respectively, in the processing manner mentioned in the embodiment. As can be seen from fig. 9 and 10, the end time of the exit non-attention time period and the start time of the cut-in attention time period may be linked. For example, the start time of exiting the non-attention period may be a time at which the non-attention sound source is switched to the attention sound source, and the start time of switching into the attention period may be an end time of exiting the non-attention period.
In some embodiments, determining a first intermediate audio of a sound source of interest and a second intermediate audio of a sound source not of interest based on the audio adjustment results comprises: if the target attention audio is the audio to be adjusted, taking an audio adjusting result corresponding to the target attention audio as a first intermediate audio of the attention sound source, otherwise, directly taking the target attention audio as the first intermediate audio; and if the target non-attention audio is the audio to be adjusted, taking an audio adjusting result corresponding to the target non-attention audio as a second intermediate audio of the non-attention sound source, otherwise, directly taking the target non-attention audio as the second intermediate audio.
Specifically, if the audio to be adjusted includes the target attention audio, this indicates that the target attention audio needs audio adjustment processing, and the computer device may use the audio adjustment result corresponding to the target attention audio as the first intermediate audio of the attention sound source. Otherwise, the target attention audio needs no audio adjustment processing, and the computer device may directly use the target attention audio as the first intermediate audio. The same reasoning applies to the target non-attention audio. In addition, if both the target attention audio and the target non-attention audio are audio to be adjusted, the process of obtaining the first intermediate audio and the second intermediate audio through audio adjustment may refer to the following formulas (8) and (9):
s'_A(i) = f_A(t) · x_A(i) ;(8)

s'_B(i) = f_B(t) · x_B(i) ;(9)

where x_A(i) represents the target attention audio at the ith sampling moment, t represents the independent variable determined based on the ith sampling moment, and s'_A(i) represents the first intermediate audio at the ith sampling moment; f_A(t) represents the adjustment parameter corresponding to the ith sampling moment on the premise that the audio to be adjusted is the target attention audio, and its determination may refer to the above formulas (2), (3) and (4). x_B(i) represents the target non-attention audio at the ith sampling moment, and s'_B(i) represents the second intermediate audio at the ith sampling moment; f_B(t) represents the adjustment parameter corresponding to the ith sampling moment on the premise that the audio to be adjusted is the target non-attention audio, and its determination may refer to the above formulas (5), (6) and (7). In other words, each intermediate audio sample is obtained by scaling the corresponding input sample by the adjustment parameter for its sampling moment.
In the above embodiment, since the audio adjustment processing may be performed on at least one of the target attention audio and the target non-attention audio based on the adjustment parameter corresponding to each sampling time, accurate adjustment may be achieved.
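As a usage sketch, formulas (8) and (9) can be applied sample by sample with the attention_gain and non_attention_gain sketches above; the simple additive mix at the end, and the assumption that both life cycles start at moment 0 and switch at moment T as in figs. 9 and 10, are illustrative:

```python
import numpy as np

def adjust_and_mix(x_a: np.ndarray, x_b: np.ndarray,
                   times: np.ndarray, T: float) -> np.ndarray:
    """x_a is the target attention audio and x_b the target non-attention
    audio, both sampled at the moments in `times`. Formula (8) scales x_a,
    formula (9) scales x_b, and the two intermediate audios are mixed."""
    f_a = np.array([attention_gain(t, T, g=2.0) for t in times])
    f_b = np.array([non_attention_gain(t, T, g=0.5) for t in times])
    first_intermediate = f_a * x_a       # formula (8)
    second_intermediate = f_b * x_b      # formula (9)
    return first_intermediate + second_intermediate
```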
It will be appreciated that the audio signal played in the virtual space will typically have a plurality of channels, and that in practice multi-channel sound reconstruction is involved. Thus, in some embodiments, as shown in fig. 11, a virtual space-based audio processing method is provided, which is applied to the computer device 204 in fig. 2 and is illustrated as being applied to a virtual reality application program, and includes the following steps:
step 1102, in response to the interactive operation for the virtual space, determining a sound source of interest in the virtual space, and regarding sound sources other than the sound source of interest among all sound sources in the virtual space as non-sound sources of interest.
And step 1104, acquiring a target attention audio corresponding to the attention sound source and a target non-attention audio corresponding to the non-attention sound source.
Step 1106, performing audio adjustment processing on at least one of the target attention audio and the target non-attention audio to obtain a first intermediate audio of the attention sound source and a second intermediate audio of the non-attention sound source, wherein the sound effect discrimination of the first intermediate audio is greater than that of the second intermediate audio.
Step 1108, mixing the first intermediate audio and the second intermediate audio to obtain a mixing processing result.
For the detailed explanation of the above steps, reference may be made to the content of the embodiment corresponding to fig. 3.
Step 1110, obtaining a sound mixing processing result corresponding to each of the plurality of channels, performing stereo reconstruction based on the sound mixing processing result of the plurality of channels, and outputting a reconstructed stereo in a virtual space.
Here, each channel may be subjected to audio adjustment processing according to the procedures mentioned in the above embodiments, yielding a mixing processing result for each channel. The stereo reconstruction may be implemented using an HRTF (head-related transfer function) virtual stereo reconstruction technique, which is not specifically limited in this embodiment of the application.
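A per-channel sketch of steps 1102 to 1110 follows; the hrtf_render call stands in for whatever HRTF virtual stereo reconstruction library an implementation actually uses and is a hypothetical placeholder, not a real API:

```python
import numpy as np

def process_all_channels(attention_chs: list[np.ndarray],
                         non_attention_chs: list[np.ndarray],
                         f_a: np.ndarray, f_b: np.ndarray) -> list[np.ndarray]:
    """Adjust and mix every channel independently (steps 1106 and 1108),
    yielding one mixing processing result per channel, ready for stereo
    reconstruction (step 1110)."""
    return [f_a * a + f_b * b
            for a, b in zip(attention_chs, non_attention_chs)]

# mix_results = process_all_channels(attention_chs, non_attention_chs, f_a, f_b)
# stereo = hrtf_render(mix_results)  # hypothetical HRTF reconstruction step
```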
In the above embodiment, since audio adjustment processing can be performed, in each channel, on at least one of the audio corresponding to the attention sound source and the audio corresponding to the non-attention sound source, the sound effect discrimination of the audio corresponding to the attention sound source is greater than that of the audio corresponding to the non-attention sound source; and since the attention sound source is the sound the user is interested in, the user can hear the sound of interest more easily. In addition, even in a noisy environment, the user can effectively distinguish the sounds of interest.
In addition to the trigger modes and types of the interaction operations mentioned in the above embodiments, other trigger modes and types of the interaction operations may exist. Based on this, in some embodiments, the interaction is captured by the interaction device, and the operation type of the interaction includes at least one of a sensory pointing type, an awareness pointing type, and a limb pointing type.
A sensory pointing type of interactive operation may refer to an interactive operation triggered through a sense, such as the gaze-triggered interaction in the above embodiments. An awareness pointing type may refer to an interactive operation triggered through awareness, such as one triggered by brain waves. A limb pointing type may refer to an interactive operation triggered through a limb action, such as one triggered by a hand manipulation action.
In the above embodiment, since there are a plurality of operation types of interactive operations, it is possible to enrich the trigger modes of the attention sound source.
The embodiment of the application also provides an application scene, and the application scene applies the audio processing method based on the virtual space. Specifically, the application of the audio processing method based on the virtual space in the application scenario is as follows:
the user takes the VR glasses to connect the host device to play an immersive game, and the turning motion of the user is mapped in a virtual reality game program of the host device to be a virtual operation object to simulate the change of the visual field range area in the virtual space. The host device regards a sound source located in the changed simulated visual field range region (i.e., the region of interest) in the virtual space as a sound source of interest, and regards the remaining sound sources in the virtual space as non-sound sources of interest.
The host device performs mixing processing on all the attention sound sources to obtain the target attention audio corresponding to them, and performs mixing processing on all the non-attention sound sources to obtain the target non-attention audio corresponding to them. The host device determines the time period within which each audio sampling moment falls, selects the corresponding adjustment function for that time period, determines the adjustment parameter from that function, and performs signal enhancement processing on the target attention audio based on the adjustment parameter to obtain the first intermediate audio. In a similar manner, the host device performs signal attenuation processing on the target non-attention audio to obtain the second intermediate audio. The host device then mixes the first intermediate audio and the second intermediate audio to obtain a mixing processing result. Finally, the host device performs stereo reconstruction on the mixing processing result of each channel and outputs the stereo to the VR glasses, and the user hears it through the earphones attached to the VR glasses.
The application further provides an application scenario applying the audio processing method based on the virtual space. Specifically, the application of the audio processing method based on the virtual space in the application scenario is as follows:
the user uses the motion capture device to connect the host device to play the motion sensing game, the hand control motion of the user is captured by the motion capture device and is mapped into a virtual operation object pointed by the virtual operation object in a virtual space in a motion sensing game program, for example, the user points to a non-player character in the motion sensing game through the motion capture device. The host device may use a space occupied by the virtual operation object in the virtual space as a region of interest, and use a sound source within the region of interest, that is, corresponding to the virtual operation object, as a sound source of interest. In this way, the user can specify a plurality of sound sources of interest. The remaining sound sources in the virtual space are regarded as non-attention sound sources.
The host device performs mixing processing on all the attention sound sources to obtain the target attention audio corresponding to them, and performs mixing processing on all the non-attention sound sources to obtain the target non-attention audio corresponding to them. The host device determines the time period within which each audio sampling moment falls, selects the corresponding adjustment function for that time period, determines the adjustment parameter from that function, and performs signal enhancement processing on the target attention audio based on the adjustment parameter to obtain the first intermediate audio. In a similar manner, the host device performs signal attenuation processing on the target non-attention audio to obtain the second intermediate audio. The host device then mixes the first intermediate audio and the second intermediate audio to obtain a mixing processing result. Finally, the host device performs stereo reconstruction on the mixing processing result of each channel and outputs the stereo to a loudspeaker or to earphones worn by the user, so that the user hears the stereo.
It should be understood that, although the steps in the flowcharts of the above embodiments are displayed sequentially as indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated otherwise, the execution order of the steps is not strictly limited, and the steps may be performed in other orders. Moreover, at least some of the steps in these flowcharts may comprise multiple sub-steps or stages, which need not be completed at the same moment but may be performed at different moments, and whose execution order need not be sequential: they may be performed in turn or alternately with other steps, or with at least part of the sub-steps or stages of other steps.
Based on the same inventive concept, the embodiment of the present application further provides a virtual space-based audio processing apparatus for implementing the virtual space-based audio processing method mentioned above. The implementation scheme for solving the problem provided by the device is similar to the implementation scheme described in the above method, so specific limitations in one or more embodiments of the virtual space-based audio processing device provided below can refer to the limitations on the virtual space-based audio processing method in the foregoing, and details are not repeated herein.
In some embodiments, as shown in fig. 12, there is provided a virtual space-based audio processing apparatus 1200, which may be a part of a computer device using a software module or a hardware module, or a combination of the two, and specifically includes: a determination module 1202, an acquisition module 1204, an audio adjustment processing module 1206, and a mixing processing module 1208, wherein:
a determining module 1202, configured to determine a sound source of interest in a virtual space in response to an interactive operation for the virtual space, and to take sound sources other than the sound source of interest among all sound sources in the virtual space as non-sound sources of interest;
an obtaining module 1204, configured to obtain a target attention audio corresponding to the attention sound source and a target non-attention audio corresponding to the non-attention sound source;
the audio adjusting and processing module 1206 is configured to perform audio adjusting and processing on at least one of a target attention audio and a target non-attention audio to obtain a first intermediate audio of an attention sound source and a second intermediate audio of a non-attention sound source, where a sound effect discrimination of the first intermediate audio is greater than a sound effect discrimination of the second intermediate audio;
and the audio mixing processing module 1208 is configured to perform audio mixing on the first intermediate audio and the second intermediate audio to obtain an audio mixing processing result.
In some embodiments, the determining module 1202 is configured to, in response to an interaction with respect to a virtual space, determine a region of interest in the virtual space to which the interaction is directed; a sound source in the region of interest is taken as a sound source of interest.
In some embodiments, the obtaining module 1204 is configured to perform audio mixing processing on a focused audio source emitted by each of a plurality of focused sound sources to obtain a target focused audio; and carrying out sound mixing processing on the non-attention audio frequencies emitted by the plurality of non-attention sound sources respectively to obtain the target non-attention audio frequency.
In some embodiments, the audio conditioning processing module 1206 is configured to perform signal enhancement processing on the target attention audio to obtain a first intermediate audio corresponding to the attention sound source; and performing signal attenuation processing on the target non-attention audio to obtain a second intermediate audio corresponding to the non-attention sound source.
In some embodiments, the audio adjustment processing module 1206 comprises:
a first determination unit configured to determine an audio to be adjusted from a target attention audio and a target non-attention audio;
the second determining unit is used for determining adjusting parameters respectively corresponding to each sampling moment of the audio to be adjusted;
and the third determining unit is used for performing audio adjustment processing on the audio to be adjusted based on the adjustment parameters respectively corresponding to the sampling moments, and determining a first intermediate audio of the concerned sound source and a second intermediate audio of the non-concerned sound source based on the audio adjustment result.
In some embodiments, the second determining unit is configured to determine a target time period within which each sampling time of the audio to be adjusted falls, where the target time period is determined by a sound source switching process, and the sound source switching process refers to switching between a sound source of interest and a sound source of non-interest; and obtaining the calculation results of the adjustment functions corresponding to the target time periods in which the sampling moments respectively fall as the adjustment parameters corresponding to the sampling moments respectively, wherein the value of an independent variable existing in the adjustment functions is determined based on the sampling moments.
In some embodiments, an adjustment threshold is included in the adjustment function, the adjustment threshold is determined by a distance between the corresponding sound source and a virtual operation object in the virtual space, and the virtual operation object is a mapping object of a target object in the virtual space, which triggers the interactive operation.
In some embodiments, if the audio to be adjusted is the target attention audio, the target time period is one of a cut-in attention period, a continuous attention period, and an exit attention period. The start time of the cut-in attention period is determined based on the moment at which a sound source is determined to be an attention sound source; the start time of the continuous attention period is determined based on the end time of the cut-in attention period; and the start time of the exit attention period is determined based on the moment at which the attention sound source is switched to a non-attention sound source, or based on a cancel-attention instruction.
In some embodiments, the adjustment function corresponding to the cut-in attention period is a monotonically increasing function, the adjustment function corresponding to the continuous attention period is a constant function, and the adjustment function corresponding to the exit attention period is a monotonically decreasing function.
In some embodiments, the gradients of the monotonically increasing function and the monotonically decreasing function each gradually decrease in magnitude as the argument increases, so the curves flatten over time.
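An exponential approach curve is one standard function family with exactly these properties: monotone, with a slope that flattens as the argument grows. The time constant tau below is an illustrative assumption.

```python
import math

def cut_in_gain(t, threshold, tau=0.3):
    """Monotonically increasing toward threshold; the gradient shrinks as
    t grows, so the fade-in is fast at first and then settles smoothly."""
    return threshold * (1.0 - math.exp(-t / tau))

def exit_gain(t, threshold, tau=0.3):
    """Monotonically decreasing from threshold, with a gradient whose
    magnitude likewise shrinks over time."""
    return threshold * math.exp(-t / tau)
```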
In some embodiments, if the audio to be adjusted is the target non-attention audio, the target time period is one of a cut-in non-attention period, a continuous non-attention period, and an exit non-attention period. The start time of the cut-in non-attention period is determined based on the moment at which a sound source is determined to be a non-attention sound source; the start time of the continuous non-attention period is determined based on the end time of the cut-in non-attention period; and the start time of the exit non-attention period is determined based on the moment at which the non-attention sound source is switched to an attention sound source, or based on an attention instruction.
In some embodiments, the adjustment function corresponding to the cut-in non-attention period is a monotonically decreasing function, the adjustment function corresponding to the continuous non-attention period is a constant function, and the adjustment function corresponding to the exit non-attention period is a monotonically increasing function.
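Putting the periods together, the adjustment parameter at any sampling moment can be read off a piecewise trajectory around the switching instant. For brevity the sketch below ramps linearly (the preferred curves flatten over time, as sketched above); the levels and ramp length are illustrative assumptions.

```python
def gain_at(t, t_switch, ramp, hi, lo, entering_attention=True):
    """Piecewise adjustment parameter around a switch at time t_switch:
    the cut-in period ramps between lo and hi, after which the continuous
    period holds a constant value."""
    if t < t_switch:                     # before the switch: previous level
        return lo if entering_attention else hi
    u = min((t - t_switch) / ramp, 1.0)  # progress through the cut-in period
    return lo + (hi - lo) * u if entering_attention else hi - (hi - lo) * u
```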
In some embodiments, the third determining unit is configured to, if the target attention audio is the audio to be adjusted, take the audio adjustment result corresponding to the target attention audio as the first intermediate audio of the attention sound source, and otherwise take the target attention audio directly as the first intermediate audio; and, if the target non-attention audio is the audio to be adjusted, take the audio adjustment result corresponding to the target non-attention audio as the second intermediate audio of the non-attention sound source, and otherwise take the target non-attention audio directly as the second intermediate audio.
In some embodiments, each sound source in the virtual space corresponds to a plurality of channels, and the apparatus further includes a stereo reconstruction module. The stereo reconstruction module is configured to obtain the audio mixing processing result corresponding to each of the plurality of channels, perform stereo reconstruction based on the audio mixing processing results of the plurality of channels, and output the reconstructed stereo in the virtual space.
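For the two-channel case, the reconstruction can be as simple as packing the per-channel mixing results side by side; a real N-channel system would add panning or HRTF rendering, which this assumed sketch omits.

```python
import numpy as np

def reconstruct_stereo(mix_left, mix_right):
    """Pack the left and right mixing results into an (n, 2) stereo buffer,
    truncating to the shorter channel."""
    n = min(len(mix_left), len(mix_right))
    return np.stack([mix_left[:n], mix_right[:n]], axis=1)
```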
In some embodiments, the interactive operation is captured by an interactive device, and the operation type of the interactive operation includes at least one of a sensory-pointing type, an awareness-pointing type, and a limb-pointing type.
For the specific definition of the virtual space-based audio processing apparatus, reference may be made to the definition of the virtual space-based audio processing method above, which is not repeated here. Each module in the virtual space-based audio processing apparatus may be implemented wholly or partially in software, in hardware, or in a combination of the two. The modules may be embedded, in hardware form, in or independently of a processor of the computer device, or stored, in software form, in a memory of the computer device, so that the processor can invoke them and execute the operations corresponding to each module.
In one embodiment, a computer device is provided, which may be a terminal or a server. Its internal structure may be as shown in fig. 13. The computer device includes a processor, a memory, and a network interface connected by a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory; the non-volatile storage medium stores an operating system and a computer program, and the internal memory provides an environment for running them. The network interface of the computer device is used to communicate with an external terminal through a network connection. When executed by the processor, the computer program implements a virtual space-based audio processing method.
Those skilled in the art will appreciate that the architecture shown in fig. 13 is merely a block diagram of part of the structure related to the disclosed solution and does not limit the computer devices to which the solution applies; a particular computer device may include more or fewer components than those shown, combine certain components, or arrange the components differently.
In one embodiment, a computer device is further provided, including a memory and a processor, the memory storing a computer program; when the processor executes the computer program, the steps of the above method embodiments are implemented.
In one embodiment, a computer-readable storage medium is provided, storing a computer program which, when executed by a processor, implements the steps of the above method embodiments.
In one embodiment, a computer program product or computer program is provided, including computer instructions stored in a computer-readable storage medium; a processor of a computer device reads the computer instructions from the computer-readable storage medium and executes them, causing the computer device to perform the steps of the above method embodiments.
It should be noted that the user information (including but not limited to user device information and user operation information) and data (including but not limited to data for analysis, stored data, and presented data) referred to in the present application are information and data authorized by the user or sufficiently authorized by each party.
It will be understood by those skilled in the art that all or part of the processes of the methods of the above embodiments can be implemented by a computer program instructing the relevant hardware; the computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the above method embodiments. Any reference to memory, database, or other medium used in the embodiments provided herein may include at least one of non-volatile and volatile memory. Non-volatile memory may include read-only memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, high-density embedded non-volatile memory, resistive random access memory (ReRAM), magnetoresistive random access memory (MRAM), ferroelectric random access memory (FRAM), phase change memory (PCM), graphene memory, and the like. Volatile memory may include random access memory (RAM), external cache memory, and the like. By way of illustration and not limitation, RAM can take many forms, such as static random access memory (SRAM) or dynamic random access memory (DRAM). The databases referred to in the embodiments provided herein may include at least one of relational and non-relational databases; non-relational databases may include, but are not limited to, blockchain-based distributed databases and the like. The processors referred to in the embodiments provided herein may be general-purpose processors, central processing units, graphics processors, digital signal processors, programmable logic devices, data processing logic devices based on quantum computing, and the like, without limitation.

The technical features of the above embodiments can be combined arbitrarily. For brevity, not all possible combinations of these technical features are described, but any such combination should be considered within the scope of this specification as long as it contains no contradiction.
The above embodiments express only several implementations of the present application, and their description is relatively specific and detailed, but they should not be construed as limiting the scope of the patent. It should be noted that those skilled in the art can make several variations and modifications without departing from the concept of the present application, all of which fall within the protection scope of the present application. Therefore, the protection scope of this patent shall be subject to the appended claims.

Claims (18)

1. A method for audio processing based on a virtual space, the method comprising:
in response to an interactive operation for a virtual space, determining an attention sound source in the virtual space, and taking each sound source other than the attention sound source among all sound sources in the virtual space as a non-attention sound source;
acquiring a target attention audio corresponding to the attention sound source and a target non-attention audio corresponding to the non-attention sound source;
performing audio adjustment processing on at least one of the target attention audio and the target non-attention audio to obtain a first intermediate audio of the attention sound source and a second intermediate audio of the non-attention sound source, wherein the sound effect discrimination of the first intermediate audio is greater than that of the second intermediate audio;
and mixing the first intermediate audio and the second intermediate audio to obtain a mixed audio processing result.
2. The method according to claim 1, wherein the determining an attention sound source in the virtual space in response to an interactive operation for the virtual space comprises:
in response to an interactive operation for a virtual space, determining a region of interest in the virtual space to which the interactive operation is directed;
and taking the sound source in the region of interest as the attention sound source.
3. The method according to claim 1, wherein the acquiring a target attention audio corresponding to the attention sound source and a target non-attention audio corresponding to the non-attention sound source comprises:
performing audio mixing processing on the attention audio emitted by each of a plurality of attention sound sources to obtain the target attention audio;
and performing audio mixing processing on the non-attention audio emitted by each of a plurality of non-attention sound sources to obtain the target non-attention audio.
4. The method according to claim 1, wherein the performing audio adjustment processing on at least one of the target attention audio and the target non-attention audio to obtain a first intermediate audio of the attention sound source and a second intermediate audio of the non-attention sound source comprises:
performing signal enhancement processing on the target attention audio to obtain a first intermediate audio corresponding to the attention sound source;
and performing signal attenuation processing on the target non-attention audio to obtain a second intermediate audio corresponding to the non-attention sound source.
5. The method according to claim 1, wherein the performing audio adjustment processing on at least one of the target attention audio and the target non-attention audio to obtain a first intermediate audio of the attention sound source and a second intermediate audio of the non-attention sound source comprises:
determining an audio to be adjusted from the target attention audio and the target non-attention audio;
determining an adjustment parameter corresponding to each sampling moment of the audio to be adjusted;
and performing audio adjustment processing on the audio to be adjusted based on the adjustment parameters corresponding to the respective sampling moments, and determining the first intermediate audio of the attention sound source and the second intermediate audio of the non-attention sound source based on an audio adjustment result.
6. The method according to claim 5, wherein the determining an adjustment parameter corresponding to each sampling moment of the audio to be adjusted comprises:
determining the target time period within which each sampling moment of the audio to be adjusted falls, wherein the target time period is determined by a sound source switching process, the sound source switching process referring to switching between an attention sound source and a non-attention sound source;
and taking the calculation result of the adjustment function corresponding to the target time period within which each sampling moment falls as the adjustment parameter corresponding to that sampling moment, wherein the value of the argument of the adjustment function is determined based on the sampling moment.
7. The method according to claim 6, wherein the adjustment function includes an adjustment threshold, the adjustment threshold being determined by the distance between the corresponding sound source and a virtual operation object in the virtual space, the virtual operation object being the mapping object, in the virtual space, of the target object that triggers the interactive operation.
8. The method according to claim 6, wherein, if the audio to be adjusted is the target attention audio, the target time period is one of a cut-in attention period, a continuous attention period, and an exit attention period, wherein the start time of the cut-in attention period is determined based on the moment at which a sound source is determined to be an attention sound source, the start time of the continuous attention period is determined based on the end time of the cut-in attention period, and the start time of the exit attention period is determined based on the moment at which the attention sound source is switched to a non-attention sound source, or based on a cancel-attention instruction.
9. The method according to claim 8, wherein the adjustment function corresponding to the cut-in attention period is a monotonically increasing function, the adjustment function corresponding to the continuous attention period is a constant function, and the adjustment function corresponding to the exit attention period is a monotonically decreasing function.
10. The method according to claim 9, wherein the gradients of the monotonically increasing function and the monotonically decreasing function each gradually decrease in magnitude as the argument increases.
11. The method according to claim 6, wherein, if the audio to be adjusted is the target non-attention audio, the target time period is one of a cut-in non-attention period, a continuous non-attention period, and an exit non-attention period, wherein the start time of the cut-in non-attention period is determined based on the moment at which a sound source is determined to be a non-attention sound source, the start time of the continuous non-attention period is determined based on the end time of the cut-in non-attention period, and the start time of the exit non-attention period is determined based on the moment at which the non-attention sound source is switched to an attention sound source, or based on an attention instruction.
12. The method according to claim 11, wherein the adjustment function corresponding to the cut-in non-attention period is a monotonically decreasing function, the adjustment function corresponding to the continuous non-attention period is a constant function, and the adjustment function corresponding to the exit non-attention period is a monotonically increasing function.
13. The method according to claim 5, wherein the determining the first intermediate audio of the attention sound source and the second intermediate audio of the non-attention sound source based on the audio adjustment result comprises:
if the target attention audio is the audio to be adjusted, taking the audio adjustment result corresponding to the target attention audio as the first intermediate audio of the attention sound source, and otherwise taking the target attention audio directly as the first intermediate audio;
and if the target non-attention audio is the audio to be adjusted, taking the audio adjustment result corresponding to the target non-attention audio as the second intermediate audio of the non-attention sound source, and otherwise taking the target non-attention audio directly as the second intermediate audio.
14. The method according to any one of claims 1 to 13, wherein each sound source in the virtual space corresponds to a plurality of channels, the method further comprising:
acquiring the audio mixing processing result corresponding to each of the plurality of channels, performing stereo reconstruction based on the audio mixing processing results of the plurality of channels, and outputting the reconstructed stereo in the virtual space.
15. The method according to any one of claims 1 to 13, wherein the interactive operation is captured by an interactive device, and the operation type of the interactive operation comprises at least one of a sensory-pointing type, an awareness-pointing type, and a limb-pointing type.
16. An apparatus for audio processing based on virtual space, the apparatus comprising:
a determining module, configured to determine an attention sound source in a virtual space in response to an interactive operation for the virtual space, and to take each sound source other than the attention sound source among all sound sources in the virtual space as a non-attention sound source;
an obtaining module, configured to obtain a target attention audio corresponding to the attention sound source and a target non-attention audio corresponding to the non-attention sound source;
an audio adjustment processing module, configured to perform audio adjustment processing on at least one of the target attention audio and the target non-attention audio to obtain a first intermediate audio of the attention sound source and a second intermediate audio of the non-attention sound source, wherein the sound effect discrimination of the first intermediate audio is greater than that of the second intermediate audio;
and an audio mixing processing module, configured to mix the first intermediate audio and the second intermediate audio to obtain an audio mixing processing result.
17. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor, when executing the computer program, implements the steps of the method of any of claims 1 to 15.
18. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 15.
CN202210036645.5A 2022-01-13 2022-01-13 Audio processing method and device based on virtual space and computer equipment Pending CN114049871A (en)

Priority Applications (1)

Application Number: CN202210036645.5A
Priority Date: 2022-01-13
Filing Date: 2022-01-13
Title: Audio processing method and device based on virtual space and computer equipment

Publications (1)

Publication Number: CN114049871A
Publication Date: 2022-02-15

Family

Family ID: 80196530

Family Applications (1)

Application Number: CN202210036645.5A
Priority Date: 2022-01-13
Filing Date: 2022-01-13
Title: Audio processing method and device based on virtual space and computer equipment

Country Status (1)

Country: CN
Document: CN114049871A (en)

Cited By (2)

- CN114627890A (宏景科技股份有限公司): Method, system, equipment and storage medium for identifying multiple sounds in virtual environment (priority 2022-05-16; published 2022-06-14)
- WO2024103953A1 (Oppo广东移动通信有限公司): Audio processing method, audio processing apparatus, and medium and electronic device (priority 2022-11-17; published 2024-05-23)

Citations (7)

- CN101387908A (佳能株式会社): Information-processing apparatus and information-processing method (priority 2007-09-10; published 2009-03-18)
- CN101410157A (科乐美数码娱乐株式会社): Sound processing apparatus, sound processing method, information recording medium, and program (priority 2006-03-27; published 2009-04-15)
- CN102098606A (腾讯科技(深圳)有限公司): Method and device for dynamically adjusting volume (priority 2009-12-10; published 2011-06-15)
- CN106774830A (网易(杭州)网络有限公司): Virtual reality system, voice interactive method and device (priority 2016-11-16; published 2017-05-31)
- CN108597530A (腾讯科技(深圳)有限公司): Sound reproducing method and device, storage medium and electronic device (priority 2018-02-09; published 2018-09-28)
- WO2020204934A1 (Hewlett-Packard Development Company, L.P.): Modify audio based on physiological observations (priority 2019-04-05; published 2020-10-08)
- CN113796097A (脸谱科技有限责任公司): Audio spatialization and enhancement between multiple head-mounted devices (priority 2019-05-07; published 2021-12-14)

Legal Events

- PB01: Publication
- SE01: Entry into force of request for substantive examination
- REG: Reference to a national code (country code: HK; legal event code: DE; document number: 40064632)