CN115237250A - Audio playing method and device, storage medium, client and live broadcasting system


Info

Publication number: CN115237250A
Authority: CN (China)
Prior art keywords: audio signal, virtual, scene, position information, client
Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Application number: CN202210709557.7A
Other languages: Chinese (zh)
Inventors: 岳豪 (Yue Hao), 史俊杰 (Shi Junjie)
Current and original assignee: Beijing Zitiao Network Technology Co., Ltd. (the listed assignees may be inaccurate; Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list)
Application filed by Beijing Zitiao Network Technology Co., Ltd.
Priority to CN202210709557.7A
Publication of CN115237250A

Classifications

    • G06F3/011 — Arrangements for interaction with the human body, e.g. for user immersion in virtual reality (G06F3/01: input arrangements for interaction between user and computer)
    • G06F3/16 — Sound input; Sound output
    • H04S7/303 — Tracking of listener position or orientation (H04S7/30: control circuits for electronic adaptation of the sound field to listener position or orientation)

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • General Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Stereophonic System (AREA)

Abstract

The method spatializes an original audio signal according to second position information of a second virtual object in a virtual scene and first position information of a first virtual object in the virtual scene to obtain a first audio signal, then processes the first audio signal according to pose information of the second virtual object to obtain a second audio signal, so that the output second audio signal has a six-degrees-of-freedom effect. Because the second audio signal changes as the relative position between the first virtual object and the second virtual object changes, the user has a realistic sound experience in the virtual scene, which greatly improves the user's sense of immersion in virtual reality.

Description

Audio playing method and device, storage medium, client and live broadcasting system
Technical Field
The present disclosure relates to the field of audio technologies, and in particular, to an audio playing method, an audio playing device, a storage medium, a client, and a live broadcast system.
Background
In a virtual reality scene, the sound heard by the user is often fixed. For example, the sound that the user hears from a given sound source is the same no matter where the user is located in the virtual reality scene. As a result, the user's sound experience does not match reality, which greatly reduces the user's sense of immersion in virtual reality.
Disclosure of Invention
This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the detailed description. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.
In a first aspect, the present disclosure provides an audio playing method applied to a client, including:
determining first position information of a first virtual object in a virtual scene and an original audio signal generated by the first virtual object;
performing spatialization processing on the original audio signal according to second position information of a second virtual object in the virtual scene and the first position information to obtain a first audio signal, wherein the second virtual object is a virtual character controlled by the client;
processing the first audio signal according to pose information of the second virtual object to obtain a second audio signal;
outputting the second audio signal.
In a second aspect, the present disclosure provides an audio playing apparatus applied to a client, including:
a determining module configured to determine first position information of a first virtual object in a virtual scene and an original audio signal generated by the first virtual object;
a first audio module configured to perform spatialization processing on the original audio signal according to second position information of a second virtual object in the virtual scene and the first position information to obtain a first audio signal, wherein the second virtual object is a virtual character controlled by the client;
a second audio module configured to process the first audio signal according to pose information of the second virtual object to obtain a second audio signal;
an output module configured to output the second audio signal.
In a third aspect, the present disclosure provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processing apparatus, implements the steps of the audio playing method of the first aspect.
In a fourth aspect, the present disclosure provides a client, comprising:
a storage device having a computer program stored thereon;
processing means for executing the computer program in the storage means to implement the steps of the audio playing method according to the first aspect.
In a fifth aspect, the present disclosure provides a virtual reality live broadcasting system, including:
the system comprises a main broadcasting terminal, a server and a broadcasting server, wherein the main broadcasting terminal is configured to acquire position information of a main broadcasting in a real scene and a first original audio signal generated by the main broadcasting terminal and send the position information of the main broadcasting and the first original audio signal to the server;
the server is configured to receive the position information of the anchor and the first original audio signal sent by the anchor terminal, and distribute the received position information of the anchor and the first original audio signal to a client;
the client is configured to:
determining first position information of a first virtual object in the virtual scene according to the position information of the anchor, wherein the first virtual object is a virtual character controlled by the anchor;
performing spatialization processing on the first original audio signal according to second position information of a second virtual object in the virtual scene and the first position information to obtain a first audio signal, wherein the second virtual object is a virtual character controlled by the client;
processing the first audio signal according to pose information of the second virtual object to obtain a second audio signal;
outputting the second audio signal.
Based on the above technical scheme, the original audio signal is spatialized according to the second position information of the second virtual object in the virtual scene and the first position information of the first virtual object in the virtual scene to obtain the first audio signal, and the first audio signal is processed according to the pose information of the second virtual object to obtain the second audio signal, so that the output second audio signal has a six-degrees-of-freedom effect. Because the second audio signal changes as the relative position between the first virtual object and the second virtual object changes, the user has a realistic sound experience in the virtual scene, which greatly improves the user's sense of immersion in virtual reality.
Additional features and advantages of the disclosure will be set forth in the detailed description which follows.
Drawings
The above and other features, advantages, and aspects of embodiments of the present disclosure will become more apparent by referring to the following detailed description when taken in conjunction with the accompanying drawings. Throughout the drawings, the same or similar reference numbers refer to the same or similar elements. It should be understood that the drawings are schematic and that elements and features are not necessarily drawn to scale. In the drawings:
fig. 1 is a schematic diagram illustrating an application scenario of an audio playing method according to some embodiments.
Fig. 2 is a schematic diagram illustrating an application scenario of an audio playing method according to other embodiments.
Fig. 3 is a flow diagram illustrating an audio playback method according to some embodiments.
Fig. 4 is a schematic diagram of a first audio signal shown in accordance with some embodiments.
Fig. 5 is a schematic diagram illustrating a second audio signal according to some embodiments.
Fig. 6 is a logic diagram illustrating sound playback according to some embodiments.
Fig. 7 is a flow diagram illustrating obtaining target audio setting parameters according to some embodiments.
Fig. 8 is a flow diagram illustrating an audio playback method according to further embodiments.
Fig. 9 is a schematic structural diagram illustrating a virtual reality live system according to some embodiments.
Fig. 10 is a schematic diagram illustrating a first original audio signal collected by an anchor terminal according to some embodiments.
Fig. 11 is a block diagram illustrating the connection of modules of an audio playback device according to some embodiments.
Fig. 12 is a schematic diagram illustrating a structure of a client according to some embodiments.
Detailed Description
Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure are shown in the drawings, it is to be understood that the present disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided so that the present disclosure will be thorough and complete. It should be understood that the drawings and embodiments of the disclosure are for illustration purposes only and are not intended to limit the scope of the disclosure.
It should be understood that the various steps recited in method embodiments of the present disclosure may be performed in a different order, and/or performed in parallel. Moreover, method embodiments may include additional steps and/or omit performing the illustrated steps. The scope of the present disclosure is not limited in this respect.
The term "including" and variations thereof as used herein is intended to be open-ended, i.e., "including but not limited to". The term "based on" is "based, at least in part, on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; the term "some embodiments" means "at least some embodiments". Relevant definitions for other terms will be given in the following description.
It should be noted that the terms "first", "second", and the like in the present disclosure are only used for distinguishing different devices, modules or units, and are not used for limiting the order or interdependence of the functions performed by the devices, modules or units.
It is noted that references to "a", "an", and "the" in this disclosure are illustrative rather than limiting, and those skilled in the art will understand that they mean "one or more" unless the context clearly indicates otherwise.
The names of messages or information exchanged between devices in the embodiments of the present disclosure are for illustrative purposes only, and are not intended to limit the scope of the messages or information.
It is understood that, before the technical solutions disclosed in the embodiments of the present disclosure are used, the user should be informed, in an appropriate manner and in accordance with relevant laws and regulations, of the type, scope of use, and usage scenarios of the personal information involved, and the user's authorization should be obtained.
For example, in response to receiving an active request from a user, prompt information is sent to the user to explicitly remind the user that the requested operation will require acquiring and using the user's personal information. The user can then autonomously decide, according to the prompt information, whether to provide personal information to the software or hardware (such as an electronic device, application, server, or storage medium) that performs the operations of the technical solution of the present disclosure.
As an optional but non-limiting implementation, in response to receiving an active request from the user, the prompt information may be sent to the user by way of a pop-up window, in which the prompt information may be presented as text. In addition, the pop-up window may carry a selection control allowing the user to choose "agree" or "disagree" to providing personal information to the electronic device.
It is understood that the above notification and user authorization process is only illustrative and not limiting, and other ways of satisfying relevant laws and regulations may be applied to the implementation of the present disclosure.
At the same time, it is understood that the data involved in the present disclosure (including but not limited to the data itself, the acquisition or use of the data) should comply with the requirements of the relevant laws and regulations and related regulations.
Fig. 1 is a schematic diagram illustrating an application scenario of an audio playing method according to some embodiments. As shown in fig. 1, the audio playing method provided in the embodiments of the present disclosure may be applied to a virtual reality game scenario, where a virtual reality game runs on a server, and a first client, a second client, …, and an Nth client are communicatively connected to the server to join the same virtual reality game through the server. In the virtual reality game, each client can spatialize the original audio signals sent by the other clients according to the position and pose information of its own user in the real environment, combined with the position and pose information of the other clients' users in their real environments, so that the audio heard at a client from the other clients changes as the users' position and pose information changes. For example, when the virtual object corresponding to the second client approaches the virtual object corresponding to the first client, the sound emitted by the virtual object corresponding to the second client becomes louder.
It should be understood that the client may be a Virtual Reality (VR) device.
Fig. 2 is a schematic diagram illustrating an application scenario of an audio playing method according to other embodiments. As shown in fig. 2, the audio playing method provided by the embodiments of the present disclosure may be applied to a virtual reality live scene that includes an anchor terminal, a server, and a client. The anchor terminal is communicatively connected to the server and is configured to collect the position information of the anchor in a real scene, a first original audio signal generated by the anchor, and picture data, and to send them to the server. The server is configured to receive the position information of the anchor, the first original audio signal, and the picture data sent by the anchor terminal and form a live push stream, so as to distribute them to the client as a push stream. The client is communicatively connected to the server and pulls the stream from the server; it is configured to determine first position information of a first virtual object corresponding to the anchor in a virtual scene according to the position information of the anchor, spatialize the first original audio signal in combination with second position information of a second virtual object corresponding to the client in the virtual scene to obtain a first audio signal, process the first audio signal according to the pose information of the second virtual object to obtain a second audio signal, and then output the second audio signal at the client. It should be appreciated that, while outputting the second audio signal, the client synchronously presents the corresponding virtual reality picture from the pulled picture data. In this way, the second audio signal heard at the client changes with the position and pose changes of the user and/or the anchor, so that the virtual reality live scene delivers a realistic sound experience.
Fig. 3 is a flow diagram illustrating an audio playback method according to some embodiments. As shown in fig. 3, an embodiment of the present disclosure provides an audio playing method applied to a client, where the method may include the following steps.
In step 310, first position information of a first virtual object in a virtual scene and an original audio signal generated by the first virtual object are determined.
Here, the virtual scene may be a virtual reality scene, such as one presented using a VR device; of course, the virtual scene may also be an electronic game scene other than a virtual reality scene. The embodiments of the present disclosure are described mainly in terms of virtual reality scenes. The first virtual object is a three-dimensional model in the virtual reality scene whose actions are controlled by a corresponding real object through another client or an anchor terminal. It should be understood that the first virtual object may be any object that makes a sound in the virtual reality scene; it is not limited to a virtual character and may also be a virtual article. For example, in a live virtual reality concert scene, the first virtual object may be the anchor of the virtual reality concert or a virtual instrument in the virtual reality concert.
The original audio signal refers to an unprocessed audio signal emitted by a real object corresponding to the first virtual object in a real scene. For example, an original audio signal corresponding to a first virtual object may be captured by a VR device of the first virtual object and sent to a client.
For example, a first three-dimensional coordinate system may be constructed by using a reference point in the virtual reality scene as a coordinate origin. The first position information is coordinate information of a first virtual object in the first three-dimensional coordinate system.
In some embodiments, the client receives real position information of a real object corresponding to the first virtual object in a real environment, which is acquired by the VR device corresponding to the first virtual object, and the client obtains the first position information of the first virtual object in the virtual reality scene according to the real position information.
It is worth mentioning that a reference point in the virtual reality scene serves as the coordinate origin of the first three-dimensional coordinate system, and when the VR device collects the real position information of the real object, a reference point in the real environment likewise serves as the coordinate origin of a second three-dimensional coordinate system. The real position information is the coordinate information of the real object in the second three-dimensional coordinate system. When the real position information is converted into the first position information, the two reference points are made to coincide, and the real position information is mapped into the first three-dimensional coordinate system of the virtual reality scene to obtain the first position information.
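To make the mapping concrete, here is a minimal sketch of the real-to-virtual coordinate conversion described above. The function name, the Vec3 type, and the optional scale factor are illustrative assumptions; the patent only specifies that the two reference points are made to coincide.

```python
from dataclasses import dataclass

@dataclass
class Vec3:
    x: float
    y: float
    z: float

def real_to_virtual(real_pos: Vec3, real_ref: Vec3, virtual_ref: Vec3, scale: float = 1.0) -> Vec3:
    """Map a position in the real-world (second) coordinate system into the
    virtual-scene (first) coordinate system by aligning the two reference points."""
    return Vec3(
        virtual_ref.x + (real_pos.x - real_ref.x) * scale,
        virtual_ref.y + (real_pos.y - real_ref.y) * scale,
        virtual_ref.z + (real_pos.z - real_ref.z) * scale,
    )

# Example: a tracked user standing 2 m in front of the real reference point.
first_position = real_to_virtual(Vec3(0.0, 0.0, 2.0), Vec3(0, 0, 0), Vec3(0, 0, 0))
```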
In some embodiments, after step 310, the method further comprises:
adjusting the loudness of the original audio signal to obtain an adjusted original audio signal, wherein the loudness of the adjusted original audio signal is maintained within a preset loudness range.
Here, after receiving the original audio signal, the client adjusts the loudness of the original audio signal so that the loudness of the adjusted original audio signal is maintained within the preset loudness range.
It should be understood that, by adjusting the loudness of the original audio signal, the loudness of the original audio signal generated by the user corresponding to the first virtual object can be stabilized within the preset loudness range, avoiding the problem of the sound heard at the client fluctuating between loud and quiet, so as to ensure that the user of the client has a good sound experience.
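As an illustration, the following sketch keeps an audio frame's loudness inside a preset range, using RMS level as a loudness proxy; the RMS measure, the threshold values, and the function name are assumptions, since the patent does not specify how loudness is measured or what the preset range is.

```python
import numpy as np

def clamp_loudness(frame: np.ndarray, min_rms: float = 0.05, max_rms: float = 0.5) -> np.ndarray:
    """Scale one audio frame so its RMS level stays inside [min_rms, max_rms]."""
    rms = float(np.sqrt(np.mean(frame ** 2)))
    if rms < 1e-9:                      # silence: nothing to normalize
        return frame
    if rms < min_rms:
        return frame * (min_rms / rms)  # boost overly quiet frames
    if rms > max_rms:
        return frame * (max_rms / rms)  # attenuate overly loud frames
    return frame
```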
In step 320, according to second position information of a second virtual object in the virtual scene and the first position information, spatialization processing is performed on the original audio signal to obtain a first audio signal, where the second virtual object is a virtual character controlled by the client.
Here, the second virtual object is a virtual character controlled in the virtual scene by a user corresponding to the client. The second position information of the second virtual object in the virtual scene is coordinate information of the second virtual object in the first three-dimensional coordinate system. It should be understood that the method for the client to obtain the second location information is consistent with the method for obtaining the first location information in the foregoing embodiment, and is not described herein again.
From the second position information of the second virtual object and the first position information of the first virtual object, the distance and orientation of the first virtual object relative to the second virtual object can be determined in the virtual scene.
Spatializing the original audio signal according to the second position information and the first position information effectively renders an original audio signal that lacks position information into the virtual scene. Illustratively, spatializing the original audio signal includes distance processing and azimuth processing, where the distance processing varies one or more of the loudness, frequency, diffuseness, and degree of focus of the original audio signal according to the distance of the first virtual object relative to the second virtual object, and the azimuth processing changes the timbre of the original audio signal according to the orientation of the first virtual object relative to the second virtual object.
It is worth mentioning that the first audio signal obtained by spatializing the original audio signal is actually an Ambisonic format audio. The Ambisonic format is an audio format that represents sound based on spatial location. The Ambisonic format audio is isotropic and can treat sound from any direction equally.
Fig. 4 is a schematic diagram of a first audio signal shown in accordance with some embodiments. As shown in fig. 4, spatializing the original audio signal effectively converts it into sub sound signals (shown as gray circles in fig. 4) distributed at different spatial positions in the virtual panoramic space 40. This is equivalent to distributing a large number of virtual speakers in the virtual panoramic space 40 and assigning the original audio signal to speakers at different spatial positions. All the sub sound signals shown in fig. 4 together constitute the first audio signal, where the size of each gray circle in fig. 4 represents the volume of the sound signal at that spatial position.
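A minimal sketch of the spatialization step, assuming first-order Ambisonics with a FuMa-style W channel and a simple inverse-distance gain; the patent does not fix the Ambisonic order, the normalization convention, or the distance law, so all of those are illustrative choices here.

```python
import numpy as np

def encode_first_order(mono: np.ndarray, src: tuple, listener: tuple) -> np.ndarray:
    """Spatialize a mono source into first-order B-format (W, X, Y, Z):
    distance processing via an inverse-distance gain, azimuth/elevation
    processing via the Ambisonic encoding weights."""
    dx, dy, dz = (s - l for s, l in zip(src, listener))
    dist = max(float(np.sqrt(dx * dx + dy * dy + dz * dz)), 0.1)  # clamp to avoid blow-up
    azimuth = np.arctan2(dy, dx)
    elevation = np.arctan2(dz, np.hypot(dx, dy))
    s = mono * (1.0 / dist)                 # simple distance attenuation
    w = s / np.sqrt(2.0)                    # FuMa-style omnidirectional channel
    x = s * np.cos(azimuth) * np.cos(elevation)
    y = s * np.sin(azimuth) * np.cos(elevation)
    z = s * np.sin(elevation)
    return np.stack([w, x, y, z])           # shape: (4, n_samples)
```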
In step 330, the first audio signal is processed according to the pose information of the second virtual object to obtain a second audio signal.
Here, the pose information of the second virtual object includes at least one of the yaw angle, roll angle, and pitch angle of the second virtual object. The pose information reflects the orientation of the second virtual object in the virtual reality scene.
It should be understood that, since the second virtual object is a virtual character controlled by the user corresponding to the client, the pose information of the second virtual object is effectively equivalent to the pose information of that user in the real environment. For example, the client may obtain the user's pose information in the real environment through sensors such as a gyroscope or an inertial measurement unit (IMU), and convert it into the pose information of the second virtual object.
Processing the first audio signal according to the pose information of the second virtual object effectively amounts to orienting the first audio signal according to that pose information. Fig. 5 is a schematic diagram illustrating a second audio signal according to some embodiments. As shown in fig. 5, the second audio signal is obtained by orienting the sub sound signals (shown as gray circles in fig. 5) distributed at different spatial positions of the virtual panoramic space 50 according to the pose information of the second virtual object 51.
It will be appreciated that the second audio signal is effectively a six-degrees-of-freedom (6DoF) audio signal, covering translation along three axes (forward/backward, left/right, up/down) and rotation about three axes (pitch, roll, and yaw).
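As a sketch of the pose step, the rotation below counter-rotates a first-order B-format signal by the listener's yaw so that sources keep their world positions when the head turns; handling pitch and roll needs the full first-order rotation matrix, which is omitted here, and the sign convention is an assumption.

```python
import numpy as np

def rotate_bformat_yaw(bformat: np.ndarray, yaw_rad: float) -> np.ndarray:
    """Counter-rotate a first-order B-format signal (W, X, Y, Z) by the
    listener's yaw so the sound field stays world-stable as the head turns."""
    w, x, y, z = bformat
    c, s = np.cos(yaw_rad), np.sin(yaw_rad)
    x_rot = c * x + s * y    # horizontal-plane rotation; W and Z are yaw-invariant
    y_rot = -s * x + c * y
    return np.stack([w, x_rot, y_rot, z])
```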
In step 340, the second audio signal is output.
Here, the client outputs the second audio signal according to parameters corresponding to the audio output device configured at the client. For example, when the audio output device configured at the client is a headphone, the second audio signal is binauralized; when the audio output device configured at the client is a standalone loudspeaker, loudspeaker processing is performed on the second audio signal.
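A toy sketch of device-dependent output: a crude cardioid-style stereo decode stands in for a real HRTF-based binauralization, and a square four-speaker decode stands in for loudspeaker processing. Both decoders are simplified assumptions, not the patent's renderers.

```python
import numpy as np

def binauralize(bformat: np.ndarray) -> np.ndarray:
    """Crude stereo decode standing in for a real HRTF convolution."""
    w, x, y, z = bformat
    left = w + 0.5 * y           # cardioid-like virtual microphones facing left/right
    right = w - 0.5 * y
    return np.stack([left, right])

def decode_to_speakers(bformat: np.ndarray) -> np.ndarray:
    """Decode to a square loudspeaker layout at azimuths 45/135/225/315 degrees."""
    w, x, y, z = bformat
    angles = np.deg2rad([45.0, 135.0, 225.0, 315.0])
    return np.stack([w / np.sqrt(2.0) + x * np.cos(a) + y * np.sin(a) for a in angles])

def render_output(bformat: np.ndarray, device: str) -> np.ndarray:
    """Dispatch the oriented sound field to the configured output device."""
    return binauralize(bformat) if device == "headphones" else decode_to_speakers(bformat)
```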
In this way, the original audio signal is spatialized according to the second position information of the second virtual object in the virtual reality scene and the first position information of the first virtual object in the virtual reality scene to obtain the first audio signal, and the first audio signal is processed according to the pose information of the second virtual object to obtain the second audio signal, so that the output second audio signal has a six-degrees-of-freedom effect. The second audio signal changes as the relative position between the first virtual object and the second virtual object changes, so the user has a realistic sound experience in the virtual reality scene, greatly improving the user's sense of immersion in virtual reality. For example, when the second virtual object approaches the first virtual object, the sound emitted by the first virtual object becomes louder; when the second virtual object turns around, the perceived direction of the sound emitted by the first virtual object shifts accordingly.
It is worth mentioning that, in the embodiments of the present disclosure, the original audio signal generated by the first virtual object is spatialized at the client. When the virtual reality scene contains multiple virtual characters each controlled by its own client, the same original audio signal is transmitted to every client, each client spatializes it according to its own position and pose information, and the sound heard by each client's user is therefore different.
In addition, in the embodiment of the present disclosure, the client outputs the picture of the virtual reality scene in real time, and the specific logic of the picture output is not described in detail in the present disclosure.
In some implementations, in step 320, the original audio signal may be spatialized according to the second position information and the first position information, in combination with target audio setting parameters matched with the virtual scene, to obtain the first audio signal.
Here, the target audio setting parameters are used so that, after rendering, the sound propagating in the virtual reality scene matches the effect of sound propagating in the real environment corresponding to that scene.
It will be appreciated that different types of virtual reality scenes may correspond to different target audio setting parameters. Each virtual reality scene can be given audio setting parameters matched to its scene content, so that sound propagating in the virtual reality scene approximates the effect of propagation in the corresponding real scene.
Different virtual reality scenes have different sound propagation effects, and hence different target audio setting parameters. For example, comparing a virtual reality scene of the "Bird's Nest" stadium with one of the "Workers' Gymnasium", the reverberation of sound propagating in the "Bird's Nest" is heavier than that in the "Workers' Gymnasium". The target audio setting parameters for each virtual reality scene allow the sound effect presented in that scene to match the sound effect of propagation in the real environment.
The target audio setting parameters comprise at least one of sound attenuation parameters, acoustic material parameters of each virtual article model in the virtual reality scene, early reflected sound parameters, reverberation parameters and spatialization parameters of the environmental sound.
The sound attenuation parameters describe how the volume, frequency, effect-send level, diffuseness, and audio focus of a sound vary with distance. The acoustic material parameters refer to the absorption and reflection rates of different materials for sound of different frequencies, and reflect how the different virtual article models in the virtual reality scene absorb and reflect sound. Early reflected sound is all reflected sound that arrives after the direct sound and benefits the room's sound quality; the early reflected sound parameters may include the proportion of early reflections, their time range, the early reflections arriving from the sides, and so on.
Reverberation is the combined result of indoor sound reflection and diffusion; by setting reverberation parameters, sound propagating in the virtual reality scene produces the corresponding sense of reverberation. For example, when the virtual reality scene is a large concert hall, its reverberation parameters may be as follows: reverberation time: 2.8 s (adjustable from 0.3 to 30.0 s); high-frequency attenuation ratio: 0.8 (0.1–1.0); reverberation diffusion: 60; delay between the direct sound and early reflections: 40.0 ms (0.1–200.0 ms); low-pass filter cutoff frequency: 7.0 kHz (up to 16 kHz); high-pass filter cutoff frequency: 32 Hz (up to 8 kHz).
The spatialization parameter of the environmental sound refers to a parameter for spatializing the environmental sound generated in the virtual reality scene, for example, when the virtual reality scene is a large concert hall, the environmental sound such as applause, cheering, or room noise generated in the virtual scene may be processed based on the spatialization parameter of the environmental sound.
By spatializing the original audio signal according to the first position information, the second position information, and the configured sound attenuation, acoustic material, early reflection, reverberation, and ambient-sound spatialization parameters, the original audio signal can be processed into a first audio signal that meets the acoustic requirements of the virtual reality scene.
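For illustration, the parameter groups above could be carried in a structure like the following; every field name and default value is an assumption sketched from the concert-hall example, not a data format defined by the patent.

```python
from dataclasses import dataclass, field

@dataclass
class AudioSettingParameters:
    """Illustrative container for the five parameter groups listed above."""
    attenuation: dict = field(default_factory=lambda: {
        "rolloff": "inverse", "max_distance_m": 50.0})
    acoustic_materials: dict = field(default_factory=lambda: {
        "concrete": {"absorption": 0.02, "reflection": 0.98}})
    early_reflections: dict = field(default_factory=lambda: {
        "ratio": 0.3, "window_ms": (5.0, 80.0), "lateral": True})
    reverb: dict = field(default_factory=lambda: {
        "time_s": 2.8, "hf_damping": 0.8, "pre_delay_ms": 40.0})
    ambient_spatialization: dict = field(default_factory=lambda: {
        "spread_deg": 360.0})

concert_hall = AudioSettingParameters()  # defaults loosely follow the concert-hall example
```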
Fig. 6 is a logic diagram illustrating sound playback according to some embodiments. As shown in fig. 6, the overall logic of sound playback is: the sound produced by the playback logic first undergoes spatialization-parameter processing, then sound propagation processing, then sound carrier processing, and the result is finally played back as the output sound.
When spatialization-parameter processing is performed on the sound, the spatialization parameters used may include the sound attenuation parameters, which describe how the volume, frequency, effect-send level, diffuseness, and audio focus of the sound vary with distance. Therefore, the first position information and the second position information are fed into the spatialization-parameter processing, so that the distance between the first virtual object and the second virtual object can be determined from them and the corresponding sound attenuation parameters selected accordingly. It should be appreciated that the pose information of the second virtual object may also be fed into the spatialization-parameter processing.
In the sound propagation processing, the acoustic material parameters, reverberation parameters, and early reflected sound parameters are applied to simulate the propagation of the sound in the virtual reality scene, so that the output second audio signal has a realistic effect.
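Putting the Fig. 6 chain together, here is a sketch under the same assumptions as the earlier snippets; it reuses encode_first_order, rotate_bformat_yaw, and render_output from above, and stubs out the propagation stage.

```python
import numpy as np

def apply_propagation(bformat: np.ndarray, params) -> np.ndarray:
    """Stub for the sound propagation stage: acoustic materials, early
    reflections, and reverberation would be applied here."""
    return bformat

def play_sound(mono, src_pos, listener_pos, listener_yaw, params, device="headphones"):
    """Fig. 6 order: spatialization parameters -> sound propagation ->
    sound carrier -> playback output."""
    bformat = encode_first_order(mono, src_pos, listener_pos)  # distance + azimuth
    bformat = apply_propagation(bformat, params)               # scene acoustics
    bformat = rotate_bformat_yaw(bformat, listener_yaw)        # listener pose
    return render_output(bformat, device)                      # device-specific playback
```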
In some implementations, the target audio setting parameter is obtained by:
synchronously loading the target audio setting parameters corresponding to the virtual scene when the virtual scene is loaded.
Here, the target audio setting parameter corresponding to the virtual scene is an audio setting parameter configured for the virtual scene according to a scene type corresponding to the virtual scene when the virtual scene is constructed. When a user of the client selects a certain virtual reality scene, the selected virtual reality scene is loaded in the client, and corresponding target audio setting parameters are synchronously loaded.
It should be understood that, when constructing the virtual reality scene, the scene may be built from the art scene design, and the audio setting parameters of the virtual reality scene are then configured according to the scene type of the constructed scene, so that sound propagating in the virtual reality scene matches that scene type.
As one example, configuring the audio setting parameters of the virtual scene according to the scene type may mean configuring them according to the environment of the virtual scene. As another example, it may mean configuring them according to the content in the virtual scene. For instance, for a concert held in a virtual scene, the corresponding audio setting parameters can be configured according to the planned program; when a piano performance is scheduled, the parameters governing how piano sound propagates in the virtual reality space can be configured in advance, so that the piano output sounds more realistic.
Therefore, by configuring corresponding target audio setting parameters for each virtual scene, the audio output in the virtual scene becomes more realistic, and sound propagation can be customized to the scene's content so that the audio output meets the content requirements.
Fig. 7 is a flow diagram illustrating obtaining target audio setting parameters according to some embodiments. As shown in fig. 7, in some implementations, the target audio setting parameter is obtained by:
in step 701, a plurality of recommended audio setting parameters are generated according to the initial audio setting parameters corresponding to the virtual scene.
Here, each virtual reality scene may be configured with corresponding initial audio setting parameters, which may be configured in advance according to the scene type of the virtual reality scene. The initial audio setting parameters include sound attenuation parameters, acoustic material parameters for each virtual article model in the virtual reality scene, early reflected sound parameters, reverberation parameters, and ambient-sound spatialization parameters.
After the virtual reality scene is loaded in the client, the client can adjust the initial audio setting parameters according to the scene type corresponding to the client and/or the needs of the user corresponding to the client, so as to generate a plurality of recommended audio setting parameters.
In step 702, the plurality of recommended audio setting parameters are presented in the virtual scene.
Here, after obtaining the plurality of recommended audio setting parameters, the client may present the plurality of recommended audio setting parameters for selection by a user of the client.
For example, the presenting of the plurality of recommended audio setting parameters in the virtual reality scene may be playing the same audio by using the plurality of recommended audio setting parameters, respectively, so that the user selects the corresponding recommended audio setting parameter according to the audio effect.
In step 703, in response to a selection instruction, the selected recommended audio setting parameter is determined as the target audio setting parameter.
Here, after presenting the plurality of recommended audio setting parameters, the client determines a selected recommended audio setting parameter as the target audio setting parameter in response to a selection instruction for the user to select a certain recommended audio setting parameter.
It should be noted that the initial audio setting parameters can be used directly; if the user corresponding to the client does not select any recommended audio setting parameters, the initial audio setting parameters are used by default. Recommended audio setting parameters selected by the user better match the user's needs, and an original audio signal processed with them yields a listening experience that satisfies those needs.
Therefore, by adjusting the initial audio setting parameters and determining the user-selected recommended audio setting parameters as the target audio setting parameters, the user of the client can customize the audio to personal preference.
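A compact sketch of the step 701–703 flow, assuming the parameters travel as plain dictionaries and that "generating recommendations" means perturbing the initial set; both assumptions go beyond what the patent specifies.

```python
import copy

def generate_recommendations(initial: dict) -> list:
    """Step 701: derive candidate parameter sets from the initial parameters,
    here by scaling the reverberation time (an illustrative perturbation)."""
    candidates = []
    for scale in (0.8, 1.0, 1.2):
        cand = copy.deepcopy(initial)
        cand["reverb_time_s"] = initial.get("reverb_time_s", 2.8) * scale
        candidates.append(cand)
    return candidates

def choose_target(initial: dict, selected_index=None) -> dict:
    """Steps 702-703: present candidates and honor the selection; fall back
    to the initial parameters when the user selects nothing."""
    candidates = generate_recommendations(initial)
    return candidates[selected_index] if selected_index is not None else initial

target = choose_target({"reverb_time_s": 2.8}, selected_index=2)  # user picked the third option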
In some embodiments, the first virtual object comprises an anchor-controlled virtual character that joins the virtual scene through an anchor end, and the original audio signal comprises sound generated by the anchor in a real scene.
Here, the first virtual object may be a virtual character controlled through the anchor terminal, and the original audio signal may be the sound generated by the anchor in the real scene, as collected by the anchor terminal.
When the first virtual object is a virtual character controlled through the anchor terminal, this is in effect a virtual reality live scene. In this live scene, the anchor terminal collects the original audio signal generated by the anchor in the real scene, the first position information, and picture data, and transmits them to the server to form a push-stream signal; the client pulls the stream from the server to obtain the corresponding push-stream signal. After obtaining the push-stream signal, the client may output the picture in the virtual scene according to the picture data in the push-stream signal, and output the second audio signal according to the original audio signal and the first position information.
In some embodiments, the channels and format of the captured original audio signal may be configured at the anchor terminal, along with the microphone used to capture the original audio signal and the audio signal route used to store it.
The anchor need not be a person; it may also be an object. For example, in a live virtual reality concert scene, the anchor may be the singer of the virtual reality concert or an instrument in it, such as a drum kit, an electric guitar, or a bass. The corresponding original audio signal may be the original audio produced by the person and/or object.
In further embodiments, the first virtual object further comprises a virtual character controlled by a target object that joins the virtual scene through another client, and the original audio signal comprises the sound generated by the target object in the real scene.
Here, a virtual reality live scene may include not only the anchor terminal and this client but also other clients; for example, viewers may join the virtual reality live scene through other clients.
In this scenario, step 310 may be: in response to a voice request, establishing a voice connection between the client and the other client, receiving the position information of the target object in the real scene and the sound generated by the target object in the real scene sent by the other client, determining the first position information based on the position information of the target object, and determining the sound generated by the target object in the real scene as the original audio signal.
Here, the voice request may be initiated by the client to another client, or by another client to the client. In the same virtual reality scene, either side can establish a voice connection by selecting a virtual object in the virtual reality scene and initiating a voice request to apply for a voice connection with it; the voice connection between the client and the other client is established when the other client approves the request. For example, in a virtual reality scene, virtual object A may request to speak with virtual object B, and when virtual object B approves the speech request, a voice connection between virtual object A and virtual object B is established.
After the voice connection is established, the client receives the position information of the target object in the real scene and the sound generated by the target object in the real scene, which are sent by other clients. It should be noted that, in the above embodiment, detailed descriptions have been given to how other clients acquire the position information of the target object in the real scene and the sound generated by the target object in the real scene, and are not repeated herein.
After receiving the position information of the target object, the client converts it into the first position information of the first virtual object, and determines the sound generated by the target object in the real scene as the original audio signal.
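A minimal sketch of handling one mic-linking frame, under the assumption that each frame carries the peer's real-scene position plus raw samples and that the coordinate mapping is the reference-point alignment sketched earlier; the frame layout is invented for illustration.

```python
from dataclasses import dataclass

@dataclass
class VoiceFrame:
    position: tuple  # peer's position in its real scene (second coordinate system)
    pcm: list        # raw audio samples carried by this frame

def handle_voice_frame(frame: VoiceFrame, real_ref=(0.0, 0.0, 0.0), scene_ref=(0.0, 0.0, 0.0)):
    """Convert the peer's real-world position into first position information
    by aligning the reference points, and pass the raw samples on as the
    original audio signal for the spatialization of step 320."""
    first_position = tuple(
        s + (p - r) for p, r, s in zip(frame.position, real_ref, scene_ref)
    )
    return first_position, frame.pcm

pos, audio = handle_voice_frame(VoiceFrame(position=(1.0, 0.0, 2.0), pcm=[0.0, 0.1]))
```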
Therefore, in the virtual reality live scene, not only can the original audio signal generated by the anchor at the anchor terminal be spatialized, but the original audio signals generated by the target objects of other clients can be spatialized as well, so that speech from mic-linked voice communication in the virtual reality scene also has a six-degrees-of-freedom effect. In other words, in a virtual reality scene, the voice communication between different viewers also varies spatially with the viewers' position and pose changes. For example, when the distance between viewer A and viewer B becomes shorter, the volume of their voices becomes larger.
It should be noted that the anchor terminal and other clients may generate their original audio signals at the same time; in that case, the time axes of the anchor's original audio signal and the other clients' original audio signals need to be synchronized, to ensure that they can be output in sync.
In addition, when an original audio signal generated by the anchor terminal and an original audio signal generated by another client exist in the virtual reality scene at the same time, the processing procedures for the two are the same; only the parameters used differ. For example, when spatializing the original audio signal generated by the anchor, the first position information corresponding to the anchor and the second position information of the client are used; when spatializing an original audio signal generated by another client, the first position information corresponding to that client and the second position information of this client are used.
Fig. 8 is a flow diagram illustrating an audio playback method according to further embodiments. As shown in fig. 8, in some implementations, the audio playing method may further include the following steps:
in step 801, in response to a trigger instruction for indicating that an ambient sound is triggered in the virtual scene, a corresponding ambient audio signal is acquired.
Here, the trigger instruction may be sent by the anchor terminal; that is, the anchor terminal may issue a trigger instruction indicating that an ambient sound is to be triggered in the virtual scene. Illustratively, the trigger instruction may be bound to particular picture data and fire when that picture data is presented at the client. Of course, the trigger instruction may also be triggered from the background.
The client acquires the corresponding ambient audio signal in response to the trigger instruction. It is worth noting that each virtual reality scene may be configured with different ambient audio signals; for example, a virtual concert scene may be configured with ambient audio signals such as cheering, applause, and fireworks.
In addition, the ambient audio signal is stored on the client when the virtual reality scene is loaded. When the anchor terminal needs to trigger an ambient audio signal, it sends a trigger instruction to the client, and the client responds by acquiring the corresponding ambient audio signal from its local database.
It should be understood that, by storing the ambient audio signal locally at the client, the ambient audio signal does not need to be transmitted over the network when triggered, which saves the bandwidth for transmitting audio signals and also ensures the quality of the ambient audio signal.
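For illustration, the trigger path could look like the sketch below; the effect names, asset paths, and the shape of the trigger instruction are assumptions, since the patent only says the instruction resolves to a locally stored ambient audio signal.

```python
AMBIENT_LIBRARY = {                 # preloaded with the scene; names are illustrative
    "cheer": "assets/cheer.wav",
    "applause": "assets/applause.wav",
    "fireworks": "assets/fireworks.wav",
}

def on_ambient_trigger(instruction: dict):
    """Resolve a trigger instruction from the anchor terminal to a locally
    stored ambient clip, so no audio crosses the network for the effect."""
    path = AMBIENT_LIBRARY.get(instruction.get("effect"))
    if path is None:
        return None                 # unknown effect: ignore the trigger
    position = instruction.get("position", (0.0, 0.0, 0.0))  # trigger position in scene
    return path, position

result = on_ambient_trigger({"effect": "applause", "position": (0.0, 5.0, 0.0)})
```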
In step 802, according to the trigger position of the ambient audio signal in the virtual scene and the second position information, spatialization processing is performed on the ambient audio signal to obtain a third audio signal.
Here, the principle of spatializing the ambient audio signal according to its trigger position in the virtual reality scene and the second position information to obtain the third audio signal is the same as that for obtaining the first audio signal, and is not repeated here. In essence, the ambient audio signal is spatialized so that it, too, has a six-degrees-of-freedom effect in the virtual reality scene.
In step 803, the third audio signal is processed according to the pose information of the second virtual object, and a fourth audio signal is obtained.
Here, the principle of processing the third audio signal according to the pose information of the second virtual object to obtain the fourth audio signal is consistent with the principle of obtaining the second audio signal, and is not described herein again.
In step 804, the fourth audio signal is output.
Here, the principle of outputting the fourth audio signal is identical to the principle of outputting the second audio signal, and is not described in detail herein.
Therefore, storing the ambient audio signals locally at the client means the corresponding ambient audio signal does not need to be transmitted in real time, which saves the bandwidth for transmitting audio signals and also ensures the quality of the ambient audio signal. In addition, processing the ambient audio signal gives the output ambient audio a six-degrees-of-freedom effect.
It should be noted that, when one or more of the original audio signal corresponding to the anchor, the original audio signals of other clients, and the ambient audio signal exist simultaneously in the virtual reality scene, the spatialization processing for each is the same.
Fig. 9 is a schematic structural diagram illustrating a virtual reality live system according to some embodiments. As shown in fig. 9, an embodiment of the present disclosure provides a virtual reality live broadcasting system, which includes an anchor terminal 901, a server 902, and a client 903.
The anchor terminal 901 is configured to obtain location information of an anchor in a real scene and a first original audio signal generated by the anchor, and send the location information of the anchor and the first original audio signal to the server 902.
Here, the anchor terminal 901 may collect the anchor's position information through a positioning module disposed on the anchor terminal 901, where the positioning module may compute the anchor's position using UWB (Ultra-Wideband) spatial positioning or Wi-Fi 6 spatial positioning technology.
It should be noted that the collected position information of the anchor is coordinate information of the anchor in the second three-dimensional coordinate system.
In the anchor terminal 901, the channels and format of the captured audio signal, the microphone used to capture it, and the audio signal route used to store it may be configured. The anchor terminal 901 collects the first original audio signal generated by the anchor through the microphone and stores it according to the configured audio signal route, channels, and format. The anchor terminal 901 then sends the collected position information of the anchor and the first original audio signal to the server 902.
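As a sketch of the configurable capture settings just mentioned; the channel count, format, microphone name, and storage route are all illustrative values, not defaults from the patent.

```python
CAPTURE_CONFIG = {
    "channels": 2,                            # channel layout of the captured signal
    "sample_rate": 48000,                     # in Hz
    "format": "pcm_s16le",                    # storage format
    "microphone": "stage_mic_01",             # which microphone collects the anchor's audio
    "storage_route": "/data/anchor/audio/",   # hypothetical audio signal route
}
```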
It should be understood that the anchor terminal 901 also synchronously collects the anchor's picture data and sends it to the server 902. The embodiments of the present disclosure do not concern the collection and output of picture data, so these are not described in detail. As one example, the anchor terminal 901 may collect the anchor's picture data through a camera device, and the client 903 presents that picture data directly in the virtual reality scene. As another example, the anchor terminal 901 may collect the anchor's body posture data as picture data; the client 903 then controls the actions of the first virtual object corresponding to the anchor according to the body posture data and presents the first virtual object in the virtual reality scene.
Fig. 10 is a schematic diagram illustrating a first original audio signal collected by an anchor terminal according to some embodiments. As shown in fig. 10, a virtual reality concert is taken as an example to illustrate how the first original audio signal, which includes vocals and instrument sounds, is acquired. The sources, namely the drum kit (bass drum, snare drum, and cymbals), electric guitar, bass, synthesizer, lead vocal, and backing vocals, may be mixed by the on-site mixing console and output through the on-site loudspeakers, with the first original audio signal picked up by a microphone located on site.
Alternatively, these sources can be mixed through an on-site pre-mixing console, and the mixed audio and the picture data collected by the camera device can then be synchronized through an audio-video synchronization device and sent to the server to form the push-stream signal.
The server 902 is configured to receive the position information of the anchor and the first original audio signal sent by the anchor terminal 901, and to distribute the received position information of the anchor and the first original audio signal to the client 903.
Here, after receiving the position information of the anchor and the first original audio signal sent by the anchor terminal 901, the server 902 packages them to form a push-stream signal, and distributes the position information of the anchor and the first original audio signal to the client 903 in the form of the push-stream signal.
The first original audio signal is not processed in the server 902; it is only packaged into the push-stream signal.
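A sketch of the server's pass-through packaging, assuming a JSON envelope and a shared wall-clock timestamp for audio/picture/position sync; the real push-stream container format is not specified by the patent.

```python
import json
import time

def package_push_stream(anchor_position, pcm_chunk, picture_chunk):
    """Bundle position, unprocessed audio, and picture data into one
    timestamped push-stream packet; the server forwards it unchanged."""
    return json.dumps({
        "ts": time.time(),                   # shared timeline for sync at the client
        "anchor_position": anchor_position,  # coordinates in the second (real) system
        "audio": pcm_chunk,                  # first original audio signal, untouched
        "picture": picture_chunk,
    })

packet = package_push_stream((1.0, 0.0, 2.0), [0.0, 0.1, -0.1], "frame-0001")
```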
The client 903 is configured to:
determining first position information of a first virtual object in the virtual scene according to the position information of the anchor, wherein the first virtual object is a virtual character controlled by the anchor 901;
performing spatialization processing on the first original audio signal according to second position information of a second virtual object in the virtual scene and the first position information to obtain a first audio signal, wherein the second virtual object is a virtual character controlled by the client 903;
processing the first audio signal according to the attitude information of the second virtual object to obtain a second audio signal;
outputting the second audio signal.
Here, the client 903 acquires the position information of the anchor and the first original audio signal by pulling the stream from the server 902. The first position information, in the virtual reality scene, of the first virtual object to which the anchor is mapped can then be determined from the position information of the anchor.
It should be understood that how the position information of the anchor in the second three-dimensional coordinate system is converted into the first position information in the first three-dimensional coordinate system in the virtual reality scene has been described in detail in the foregoing embodiments, and is not described herein again.
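Although the exact conversion is given in the foregoing embodiments, a hedged sketch of one common form, an affine map (rotation, scale, translation) between the two coordinate systems, is shown below; the matrix and offset values are placeholders, not figures from the patent.

```python
# A minimal sketch, assuming the real-to-virtual mapping is affine; the
# rotation, scale, and offset here are placeholders.
import numpy as np

def real_to_virtual(pos_real, rotation, scale, offset):
    """Map a point from the second (real-scene) coordinate system into
    the first (virtual-scene) coordinate system."""
    return scale * (rotation @ pos_real) + offset

anchor_pos = np.array([1.2, 0.0, 3.4])               # metres on the real stage
first_position = real_to_virtual(anchor_pos,
                                 rotation=np.eye(3),  # placeholder rotation
                                 scale=1.0,
                                 offset=np.zeros(3))
```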
In some embodiments, the client 903 is further configured to: after receiving a first original audio signal, adjusting the loudness of the first original audio signal to obtain an adjusted first original audio signal, wherein the loudness of the adjusted first original audio signal is maintained within a preset loudness range.
It should be understood that by adjusting the loudness of the first original audio signal, first original audio signals of differing loudness generated by the anchor can all be stabilized within the preset loudness range, avoiding the problem of the sound heard at the client 903 fluctuating between loud and quiet, and ensuring that the user of the client 903 has a good listening experience.
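A minimal sketch of such loudness adjustment, assuming a per-block RMS measurement and an illustrative target window in dBFS (the patent specifies neither), might look like this:

```python
# Hedged sketch: measure each block's RMS level and apply a gain that
# pulls it into a preset window. The -26..-20 dBFS window is an assumption.
import numpy as np

TARGET_DB_MIN, TARGET_DB_MAX = -26.0, -20.0

def adjust_loudness(block: np.ndarray) -> np.ndarray:
    rms = np.sqrt(np.mean(block ** 2)) + 1e-12      # avoid log(0)
    level_db = 20.0 * np.log10(rms)
    if level_db < TARGET_DB_MIN:
        gain_db = TARGET_DB_MIN - level_db          # boost quiet blocks
    elif level_db > TARGET_DB_MAX:
        gain_db = TARGET_DB_MAX - level_db          # attenuate loud blocks
    else:
        gain_db = 0.0                               # already within the window
    return np.clip(block * 10.0 ** (gain_db / 20.0), -1.0, 1.0)
```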
The second virtual object is a virtual character controlled by the client 903 in the virtual reality scene. The client 903 performs spatialization processing on the first original audio signal according to the second position information of the second virtual object in the virtual reality scene and the first position information to obtain the first audio signal.
In the above embodiments, how to obtain the second location information of the second virtual object has been described in detail, and details are not repeated herein.
Moreover, in the above embodiments, the process of spatializing the original audio signal based on the first position information and the second position information has been explicitly described; the process of spatializing the first original audio signal is consistent with it and is therefore not detailed here.
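As a hedged, geometry-only sketch of spatialization from the two positions (a real system would additionally apply HRTFs and the scene's target audio setting parameters), inverse-distance gain plus a propagation delay could be computed as follows:

```python
# Illustrative spatialization from the first/second position information:
# inverse-distance attenuation plus a distance-derived delay. HRTF
# rendering and room acoustics are deliberately omitted.
import numpy as np

SPEED_OF_SOUND = 343.0     # m/s
SAMPLE_RATE = 48_000       # Hz, assumed

def spatialize(mono: np.ndarray, source_pos: np.ndarray,
               listener_pos: np.ndarray) -> np.ndarray:
    d = float(np.linalg.norm(source_pos - listener_pos))
    gain = 1.0 / max(d, 1.0)                        # simple distance rolloff
    delay = int(d / SPEED_OF_SOUND * SAMPLE_RATE)   # propagation delay, samples
    return gain * np.concatenate([np.zeros(delay), mono])
```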
After obtaining the first audio signal, the client 903 processes the first audio signal based on the posture information of the second virtual object to obtain a second audio signal. Here, since the second virtual object is a virtual character controlled by the user corresponding to the client 903, the posture information of the second virtual object is in effect equivalent to the posture information of that user in the real environment. Processing the first audio signal according to the posture information of the second virtual object actually means defining the orientation of the first audio signal according to that posture information. The details of obtaining the second audio signal have already been given in the foregoing embodiments and are not repeated here.
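A hedged sketch of this posture step, reduced to head yaw and a constant-power stereo pan (an HRTF renderer would normally replace the pan), is:

```python
# Illustrative posture processing: rotate the world-space source direction
# into the listener's head frame (yaw only), then pan. The yaw-only model
# and the constant-power pan are simplifying assumptions.
import numpy as np

def to_head_frame(direction_world: np.ndarray, yaw_rad: float) -> np.ndarray:
    c, s = np.cos(-yaw_rad), np.sin(-yaw_rad)       # undo the listener's yaw
    rot_y = np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])
    return rot_y @ direction_world

def pan_stereo(mono: np.ndarray, direction_head: np.ndarray) -> np.ndarray:
    x = float(np.clip(direction_head[0], -1.0, 1.0))  # -1 = left, +1 = right
    theta = (x + 1.0) * np.pi / 4.0                   # constant-power pan law
    return np.stack([mono * np.cos(theta), mono * np.sin(theta)])
```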
The client 903 outputting the second audio signal means outputting the second audio signal according to the parameters of the audio output device configured at the client 903. For example, when the audio output device configured at the client 903 is a headphone, the second audio signal is rendered binaurally; when the audio output device is a standalone speaker, speaker-oriented processing is performed on the second audio signal.
Therefore, in the virtual reality live scene, the first original audio signal is spatialized according to the second position information of the second virtual object in the virtual reality scene and the first position information of the first virtual object in the virtual reality scene to obtain the first audio signal, and the first audio signal is processed according to the posture information of the second virtual object to obtain the second audio signal. The sound output in the virtual reality live scene thus has a six-degree-of-freedom effect and changes with the relative position between the first virtual object and the second virtual object, so that the user has a realistic sound experience during the virtual reality live broadcast, greatly improving the user's sense of immersion.
In some embodiments, the client 903 is specifically configured to:
according to the second position information and the first position information, in combination with a target audio setting parameter matched with the virtual scene, performing spatialization processing on the first original audio signal to obtain a first audio signal;
the target audio setting parameters are used for enabling the sound effect of the sound rendered by the target audio setting parameters and propagated in the virtual scene to be consistent with the sound effect of the sound propagated in the real environment corresponding to the virtual scene.
It is worth mentioning that, in the above embodiments, the process of how to spatially process the original audio signal in combination with the target audio setting parameter has been described in detail, and is not described in detail herein.
In some embodiments, the client 903 is specifically configured to:
when the virtual scene is loaded, synchronously loading target audio setting parameters corresponding to the virtual scene, wherein the target audio setting parameters are audio setting parameters configured for the virtual scene according to the scene type corresponding to the virtual scene, and the target audio setting parameters comprise at least one of sound attenuation parameters, acoustic material parameters of each virtual article model in the virtual scene, early reflected sound parameters, reverberation parameters and spatialization parameters of environment sound.
It is worth mentioning that how to configure the target audio setting parameter has been described in detail in the above embodiments, and will not be described in detail herein.
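As a hedged sketch only, a per-scene parameter bundle covering the categories listed above might be loaded with the scene like this; the field names, preset values, and scene types are invented for illustration:

```python
# Illustrative per-scene audio configuration; all values are placeholders.
from dataclasses import dataclass, field

@dataclass
class SceneAudioParams:
    attenuation_rolloff: float = 1.0      # sound attenuation parameter
    acoustic_materials: dict = field(default_factory=dict)  # model -> absorption
    early_reflection_gain: float = 0.3    # early reflected sound parameter
    reverb_time_s: float = 1.8            # reverberation parameter
    spatialize_ambient: bool = True       # spatialization of environment sound

def load_scene_audio_params(scene_type: str) -> SceneAudioParams:
    """Return parameters pre-configured per scene type (loaded with the scene)."""
    presets = {
        "concert": SceneAudioParams(reverb_time_s=2.4, early_reflection_gain=0.4),
        "room":    SceneAudioParams(reverb_time_s=0.6, early_reflection_gain=0.2),
    }
    return presets.get(scene_type, SceneAudioParams())
```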
In some embodiments, the client 903 is specifically configured to:
generating a plurality of recommended audio setting parameters according to the initial audio setting parameters corresponding to the virtual scene;
presenting the plurality of recommended audio setting parameters in the virtual scene;
in response to a selection instruction, determining the selected recommended audio setting parameter as the target audio setting parameter.
It is to be noted that, in the above-described embodiment, the process of determining the target audio setting parameter according to the recommended audio setting parameter has been described in detail, and will not be described in detail herein.
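Purely as an assumed illustration of deriving recommendations from the initial parameters (the patent does not state how the candidates are generated), one could perturb the initial values:

```python
# Hedged sketch: derive candidate parameter sets by scaling two of the
# initial values; the perturbation factors are invented for illustration.
def recommend_audio_params(initial: dict, n: int = 3) -> list:
    factors = [0.8, 1.0, 1.25][:n]
    return [{**initial,
             "reverb_time_s": initial["reverb_time_s"] * f,
             "early_reflection_gain": initial["early_reflection_gain"] * f}
            for f in factors]

candidates = recommend_audio_params(
    {"reverb_time_s": 1.8, "early_reflection_gain": 0.3})
# The client would present `candidates` in the scene and adopt the one
# selected by the user as the target audio setting parameters.
```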
In some embodiments, the anchor terminal 901 is further configured to:
sending, to the client 903, a trigger instruction indicating that an environmental sound is triggered in the virtual scene;
the client 903 is further configured to:
in response to the trigger instruction, acquiring a corresponding environment audio signal, wherein the environment audio signal is stored on the client 903 when the client 903 loads the virtual scene;
according to the triggering position of the environment audio signal in the virtual scene and the second position information, carrying out spatialization processing on the environment audio signal to obtain a third audio signal;
processing the third audio signal according to the attitude information of the second virtual object to obtain a fourth audio signal;
outputting the fourth audio signal.
It should be noted that, in the above embodiments, how to process the ambient audio signal has been described in detail, and a detailed description thereof is omitted.
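A minimal sketch of the trigger path, assuming the clips were cached at scene-load time (names such as AMBIENT_CACHE are illustrative), is:

```python
# Illustrative ambient-sound trigger handling: look up the cached clip and
# spatialize it at its trigger position; inverse-distance gain stands in
# for the fuller spatialization described above.
import numpy as np

AMBIENT_CACHE: dict = {}    # sound_id -> mono samples, filled at scene load

def on_ambient_trigger(sound_id: str, trigger_pos: np.ndarray,
                       listener_pos: np.ndarray) -> np.ndarray:
    clip = AMBIENT_CACHE[sound_id]                  # stored when scene loaded
    distance = float(np.linalg.norm(trigger_pos - listener_pos))
    return clip * (1.0 / max(distance, 1.0))        # third audio signal
```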
In some embodiments, the client 903 is further configured to:
responding to a voice request, and establishing a voice connection between the client 903 and another client, wherein the other client is the client corresponding to a third virtual object that has joined the virtual scene;
receiving position information of a target object corresponding to the third virtual object in a real scene and a second original audio signal generated by the target object in the real scene;
determining third position information of the third virtual object in the virtual scene based on the position information of the target object;
according to the third position information and the second position information, performing spatialization processing on the second original audio signal to obtain a fifth audio signal;
processing the fifth audio signal according to the attitude information of the second virtual object to obtain a sixth audio signal;
outputting the sixth audio signal.
Here, in the virtual reality live scene, another client may join the virtual reality scene through a third virtual object, after which that client and the local client 903 may establish voice communication.
The voice request may be initiated by the client 903 to the other client, or by the other client to the client 903. In the same virtual reality scene, whether at the client 903 or at another client, a voice request for establishing a voice connection is initiated by selecting a virtual object in the virtual reality scene, and when the other side agrees to the voice request, the voice connection between the client 903 and the other client is established. For example, in a virtual reality scene, virtual object A may request to speak with virtual object B, and when virtual object B approves the voice request, a voice connection between virtual object A and virtual object B is established.
After the voice connection is established, the client 903 receives the position information of the target object in the real scene and the sound generated by the target object in the real scene, which are sent by other clients. It should be noted that, in the above embodiment, detailed descriptions have been given to how other clients acquire the position information of the target object in the real scene and the sound generated by the target object in the real scene, and are not repeated herein.
The local client 903, after receiving the position information of the target object, converts it into third position information of the third virtual object. Then, according to the third position information and the second position information, the second original audio signal is spatialized to obtain a fifth audio signal; the fifth audio signal is processed according to the posture information of the second virtual object to obtain a sixth audio signal; and the sixth audio signal is output.
The principle of spatializing the second original audio signal to obtain the fifth audio signal is the same as that of obtaining the first audio signal, and is not repeated here. The principle of obtaining the sixth audio signal is the same as that of obtaining the second audio signal, and the principle of outputting the sixth audio signal is the same as that of outputting the second audio signal; neither is described here again.
It should be noted that the client 903 may obtain the position information of the target object and the second original audio signal sent by the other client as follows: the other client sends the collected position information of the target object and the second original audio signal to the server 902; the server 902 packages them into a push-stream signal; and the client 903 pulls the stream from the server 902 to obtain the position information of the target object and the second original audio signal.
The client 903 may obtain the first original audio signal and the second original audio signal at the same time. In this case, the time axes of the first original audio signal generated at the anchor terminal 901 and the second original audio signal generated at the other client need to be synchronized, so as to ensure that the two signals can be output synchronously.
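A hedged sketch of such time-axis alignment, assuming each frame carries a capture timestamp and an illustrative 20 ms matching window, is shown below:

```python
# Illustrative timestamp-based alignment of the anchor stream and the
# voice stream before mixed output; the 20 ms window is an assumption.
SYNC_WINDOW_MS = 20

def pair_synchronized(anchor_frames, voice_frames):
    """Yield (anchor_frame, voice_frame) pairs whose capture timestamps
    (a .timestamp_ms attribute) agree within SYNC_WINDOW_MS."""
    a = sorted(anchor_frames, key=lambda f: f.timestamp_ms)
    v = sorted(voice_frames, key=lambda f: f.timestamp_ms)
    i = j = 0
    while i < len(a) and j < len(v):
        dt = a[i].timestamp_ms - v[j].timestamp_ms
        if abs(dt) <= SYNC_WINDOW_MS:
            yield a[i], v[j]
            i += 1
            j += 1
        elif dt < 0:
            i += 1          # anchor frame has no partner yet; advance it
        else:
            j += 1          # voice frame has no partner yet; advance it
```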
Therefore, in the virtual reality live scene, not only can the first original audio signal generated by the anchor at the anchor terminal 901 be spatialized, but the second original audio signal generated by the target object at another client can also be spatialized, so that voice from connected-microphone communication in the virtual reality scene likewise has a six-degree-of-freedom effect. In other words, in a virtual live scene, voice communication between different viewers also changes spatially with the positions and postures of the viewers. For example, when the distance between viewer A and viewer B becomes shorter, the voice each hears becomes louder.
Fig. 11 is a block diagram illustrating the connection of modules of an audio playback device according to some embodiments. As shown in fig. 11, an embodiment of the present disclosure provides an audio playing apparatus, where the apparatus 1100 is applied to a client, and includes:
a determining module 1101 configured to determine first position information of a first virtual object in a virtual scene and an original audio signal generated by the first virtual object;
a first audio module 1102, configured to perform spatialization processing on the original audio signal according to second position information of a second virtual object in the virtual scene and the first position information to obtain a first audio signal, where the second virtual object is a virtual character controlled by the client;
a second audio module 1103 configured to process the first audio signal according to the pose information of the second virtual object, so as to obtain a second audio signal;
an output module 1104 configured to output the second audio signal.
Optionally, the first audio module 1102 is specifically configured to:
according to the second position information and the first position information, in combination with a target audio setting parameter matched with the virtual scene, performing spatialization processing on the original audio signal to obtain a first audio signal;
the target audio setting parameters are used for enabling the sound effect of sound which is rendered by the target audio setting parameters and propagates in the virtual scene to be consistent with the sound effect of the sound which propagates in the real environment corresponding to the virtual scene.
Optionally, the first audio module 1102 is specifically configured to:
when the virtual scene is loaded, synchronously loading target audio setting parameters corresponding to the virtual scene, wherein the target audio setting parameters are audio setting parameters configured for the virtual scene according to the scene type corresponding to the virtual scene, and the target audio setting parameters comprise at least one of sound attenuation parameters, acoustic material parameters of each virtual article model in the virtual scene, early reflected sound parameters, reverberation parameters and spatialization parameters of environment sound.
Optionally, the first audio module 1102 is specifically configured to:
generating a plurality of recommended audio setting parameters according to the initial audio setting parameters corresponding to the virtual scene;
presenting the plurality of recommended audio setting parameters in the virtual scene;
in response to a selection instruction, determining the selected recommended audio setting parameter as the target audio setting parameter.
Optionally, the first virtual object includes an anchor-controlled virtual character that joins the virtual scene through an anchor end, and the original audio signal includes a sound generated in a real scene by the anchor.
Optionally, the first virtual object further includes a virtual character controlled by a target object that joins the virtual scene through another client, and the original audio signal includes a sound generated in the real scene by the target object;
the determining module 1101 is specifically configured to:
responding to a voice request, and establishing a voice connection between the client and the other client;
receiving the position information of the target object in the real scene and the sound generated by the target object in the real scene, which are sent by the other client;
determining the first position information based on the position information of the target object, and determining a sound generated by the target object in a real scene as the original audio signal.
Optionally, the apparatus 1100 further comprises:
an audio acquisition module configured to acquire a corresponding ambient audio signal in response to a trigger instruction for indicating that ambient sound is triggered in the virtual scene;
the first audio processing module is configured to perform spatialization processing on the environment audio signal according to the trigger position of the environment audio signal in the virtual scene and the second position information to obtain a third audio signal;
the second audio processing module is configured to process the third audio signal according to the posture information of the second virtual object to obtain a fourth audio signal;
an audio module configured to output the fourth audio signal.
The specific execution logic of each functional module in the apparatus 1100 is described in detail in the section of the method, and is not described in detail here.
Referring now to fig. 12, a schematic diagram of a client 600 suitable for use in implementing embodiments of the present disclosure is shown. Clients in embodiments of the present disclosure may include, but are not limited to, VR devices. The client illustrated in fig. 12 is only an example, and should not bring any limitation to the function and the scope of use of the embodiments of the present disclosure.
As shown in Fig. 12, the client 600 may include a VR device (not shown in Fig. 12) and a processing device (e.g., a central processing unit, a graphics processor, etc.) 601 that may perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 602 or a program loaded from a storage device 608 into a Random Access Memory (RAM) 603. The RAM 603 also stores various programs and data necessary for the operation of the client 600. The processing device 601, the ROM 602, and the RAM 603 are connected to one another via a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.
Generally, the following devices may be connected to the I/O interface 605: input devices 606 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; output devices 607 including, for example, a Liquid Crystal Display (LCD), a speaker, a vibrator, and the like; storage 608 including, for example, tape, hard disk, etc.; and a communication device 609. The communication means 609 may allow the client 600 to perform wireless or wired communication with other devices to exchange data. While fig. 12 illustrates a client 600 having various devices, it is to be understood that not all illustrated devices are required to be implemented or provided. More or fewer devices may alternatively be implemented or provided.
In particular, the processes described above with reference to the flow diagrams may be implemented as computer software programs, according to embodiments of the present disclosure. For example, embodiments of the present disclosure include a computer program product comprising a computer program carried on a non-transitory computer readable medium, the computer program containing program code for performing the method illustrated by the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication means 609, or may be installed from the storage means 608, or may be installed from the ROM 602. The computer program, when executed by the processing device 601, performs the above-described functions defined in the methods of the embodiments of the present disclosure.
It should be noted that the computer readable medium of the present disclosure may be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In contrast, in the present disclosure, a computer readable signal medium may comprise a propagated data signal with computer readable program code embodied therein, either in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.
In some embodiments, the client, server, and host may communicate using any currently known or future-developed network protocol, such as HTTP (HyperText Transfer Protocol), and may be interconnected with digital data communication in any form or medium (e.g., a communication network). Examples of communication networks include a local area network ("LAN"), a wide area network ("WAN"), an internetwork (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future-developed network.
The computer readable medium may be embodied in the client; or may exist separately and not be assembled into the client.
The computer readable medium carries one or more programs which, when executed by the client, cause the client to: determining first position information of a first virtual object in a virtual scene and an original audio signal generated by the first virtual object; according to second position information of a second virtual object in the virtual scene and the first position information, carrying out spatialization processing on the original audio signal to obtain a first audio signal, wherein the second virtual object is a virtual role controlled by the client; processing the first audio signal according to the attitude information of the second virtual object to obtain a second audio signal; outputting the second audio signal.
Computer program code for carrying out operations of the present disclosure may be written in one or more programming languages or any combination thereof, including object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The modules described in the embodiments of the present disclosure may be implemented by software or hardware. Wherein the name of a module in some cases does not constitute a limitation on the module itself.
The functions described herein above may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include Field-Programmable Gate Arrays (FPGAs), Application-Specific Integrated Circuits (ASICs), Application-Specific Standard Products (ASSPs), Systems on a Chip (SOCs), Complex Programmable Logic Devices (CPLDs), and the like.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The foregoing description is merely illustrative of the preferred embodiments of the disclosure and of the principles of the technology employed. Those skilled in the art will appreciate that the scope of the disclosure is not limited to technical solutions formed by the particular combination of features described above, but also covers other technical solutions formed by any combination of the above features or their equivalents without departing from the spirit of the disclosure, for example, technical solutions formed by replacing the above features with features having similar functions disclosed in (but not limited to) this disclosure.
Further, while operations are depicted in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order. Under certain circumstances, multitasking and parallel processing may be advantageous. Likewise, while several specific implementation details are included in the above discussion, these should not be construed as limitations on the scope of the disclosure. Certain features that are described in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination.
Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as example forms of implementing the claims. With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.

Claims (16)

1. An audio playing method applied to a client includes:
determining first position information of a first virtual object in a virtual scene and an original audio signal generated by the first virtual object;
according to second position information of a second virtual object in the virtual scene and the first position information, carrying out spatialization processing on the original audio signal to obtain a first audio signal, wherein the second virtual object is a virtual character controlled by the client;
processing the first audio signal according to the attitude information of the second virtual object to obtain a second audio signal;
outputting the second audio signal.
2. The method according to claim 1, wherein the spatializing the original audio signal according to the second position information of the second virtual object in the virtual scene and the first position information to obtain a first audio signal comprises:
according to the second position information and the first position information, in combination with a target audio setting parameter matched with the virtual scene, carrying out spatialization processing on the original audio signal to obtain a first audio signal;
the target audio setting parameters are used for enabling the sound effect of the sound rendered by the target audio setting parameters and propagated in the virtual scene to be consistent with the sound effect of the sound propagated in the real environment corresponding to the virtual scene.
3. The method of claim 2, wherein the target audio setting parameter is obtained by:
when the virtual scene is loaded, synchronously loading target audio setting parameters corresponding to the virtual scene, wherein the target audio setting parameters are audio setting parameters configured for the virtual scene according to the scene type corresponding to the virtual scene, and the target audio setting parameters comprise at least one of sound attenuation parameters, acoustic material parameters of each virtual article model in the virtual scene, early reflected sound parameters, reverberation parameters and spatialization parameters of environment sound.
4. The method of claim 2, wherein the target audio setting parameter is obtained by:
generating a plurality of recommended audio setting parameters according to the initial audio setting parameters corresponding to the virtual scene;
presenting the plurality of recommended audio setting parameters in the virtual scene;
in response to a selection instruction, determining the selected recommended audio setting parameter as the target audio setting parameter.
5. The method of claim 1, wherein the first virtual object comprises an anchor-controlled virtual character that joins the virtual scene through an anchor end, and wherein the original audio signal comprises sound generated by the anchor in a real scene.
6. The method of claim 5, wherein the first virtual object further comprises a virtual character controlled by a target object that joins the virtual scene through another client, and wherein the original audio signal comprises a sound generated in the real scene by the target object;
the determining first position information of a first virtual object in a virtual scene and an original audio signal generated by the first virtual object comprises:
responding to a voice request, and establishing a voice connection between the client and the other client;
receiving position information of the target object in a real scene and sound generated by the target object in the real scene, which are sent by the other client;
determining the first position information based on the position information of the target object, and determining a sound generated by the target object in a real scene as the original audio signal.
7. The method of claim 5, further comprising:
responding to a trigger instruction for indicating that the environmental sound is triggered in the virtual scene, and acquiring a corresponding environmental audio signal;
according to the triggering position of the environment audio signal in the virtual scene and the second position information, carrying out spatialization processing on the environment audio signal to obtain a third audio signal;
processing the third audio signal according to the attitude information of the second virtual object to obtain a fourth audio signal;
outputting the fourth audio signal.
8. An audio playing apparatus, applied to a client, includes:
a determining module configured to determine first position information of a first virtual object in a virtual scene and an original audio signal generated by the first virtual object;
a first audio module configured to perform spatialization processing on the original audio signal according to second position information of a second virtual object in the virtual scene and the first position information to obtain a first audio signal, wherein the second virtual object is a virtual character controlled by the client;
a second audio module configured to process the first audio signal according to the attitude information of the second virtual object to obtain a second audio signal;
an output module configured to output the second audio signal.
9. A computer-readable storage medium, on which a computer program is stored, characterized in that the program, when being executed by processing means, carries out the steps of the audio playback method according to any one of claims 1 to 7.
10. A client, comprising:
a storage device having a computer program stored thereon;
processing means for executing the computer program in the storage means to implement the steps of the audio playback method as claimed in any one of claims 1 to 7.
11. A virtual reality live system, comprising:
the system comprises a main broadcasting terminal, a server and a broadcasting server, wherein the main broadcasting terminal is configured to acquire main broadcasting position information in a real scene and a first original audio signal generated by the main broadcasting terminal and send the main broadcasting position information and the first original audio signal to the server;
the server is configured to receive the position information of the anchor and the first original audio signal sent by the anchor terminal, and distribute the received position information of the anchor and the first original audio signal to a client;
the client is configured to:
determining first position information of a first virtual object in a virtual scene according to the position information of the anchor, wherein the first virtual object is a virtual character controlled by the anchor;
according to second position information of a second virtual object in the virtual scene and the first position information, performing spatialization processing on the first original audio signal to obtain a first audio signal, wherein the second virtual object is a virtual character controlled by the client;
processing the first audio signal according to the attitude information of the second virtual object to obtain a second audio signal;
outputting the second audio signal.
12. The system of claim 11, wherein the client is specifically configured to:
according to the second position information and the first position information, in combination with a target audio setting parameter matched with the virtual scene, carrying out spatialization processing on the first original audio signal to obtain a first audio signal;
the target audio setting parameters are used for enabling the sound effect of sound which is rendered by the target audio setting parameters and propagates in the virtual scene to be consistent with the sound effect of the sound which propagates in the real environment corresponding to the virtual scene.
13. The system of claim 12, wherein the client is specifically configured to:
when the virtual scene is loaded, synchronously loading target audio setting parameters corresponding to the virtual scene, wherein the target audio setting parameters are audio setting parameters configured for the virtual scene according to the scene type corresponding to the virtual scene, and the target audio setting parameters comprise at least one of sound attenuation parameters, acoustic material parameters of each virtual article model in the virtual scene, early reflected sound parameters, reverberation parameters and spatialization parameters of environment sound.
14. The system of claim 12, wherein the client is specifically configured to:
generating a plurality of recommended audio setting parameters according to the initial audio setting parameters corresponding to the virtual scene;
presenting the plurality of recommended audio setting parameters in the virtual scene;
in response to a selection instruction, determining the selected recommended audio setting parameter as the target audio setting parameter.
15. The system of claim 11, wherein the anchor terminal is further configured to:
sending a trigger instruction for indicating that the environmental sound is triggered in the virtual scene to the client;
the client is further configured to:
responding to the trigger instruction, and acquiring a corresponding environment audio signal, wherein the environment audio signal is stored on the client when the client loads the virtual scene;
according to the triggering position of the environment audio signal in the virtual scene and the second position information, carrying out spatialization processing on the environment audio signal to obtain a third audio signal;
processing the third audio signal according to the attitude information of the second virtual object to obtain a fourth audio signal;
outputting the fourth audio signal.
16. The system of claim 11, wherein the client is further configured to:
responding to a voice request, and establishing voice connection between the client and other clients, wherein the other clients are clients corresponding to a third virtual object added into the virtual scene;
receiving position information of a target object corresponding to the third virtual object in a real scene and a second original audio signal generated by the target object in the real scene;
determining third position information of the third virtual object in the virtual scene based on the position information of the target object;
according to the third position information and the second position information, performing spatialization processing on the second original audio signal to obtain a fifth audio signal;
processing the fifth audio signal according to the attitude information of the second virtual object to obtain a sixth audio signal;
outputting the sixth audio signal.
CN202210709557.7A 2022-06-21 2022-06-21 Audio playing method and device, storage medium, client and live broadcasting system Pending CN115237250A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210709557.7A CN115237250A (en) 2022-06-21 2022-06-21 Audio playing method and device, storage medium, client and live broadcasting system

Publications (1)

Publication Number Publication Date
CN115237250A (en) 2022-10-25


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination