CN116996701A - Audio processing method, device, electronic equipment and storage medium

Info

Publication number: CN116996701A
Application number: CN202310953346.2A
Authority: CN
Inventors: 支学超; 张�浩; 王俊铮; 唐湘军
Original assignee: Beijing Zitiao Network Technology Co Ltd
Current assignee: Beijing Zitiao Network Technology Co Ltd
Priority date: 2023-07-31
Filing date: 2023-07-31
Publication date: 2023-11-03
Also published as: WO2025026354A1

Abstract

The embodiments of the present disclosure provide an audio processing method, an audio processing apparatus, an electronic device and a storage medium. The method comprises: determining a first mic position and a second mic position in a live room; determining a first spatial audio coordinate corresponding to the first mic position and a second spatial audio coordinate corresponding to the second mic position, wherein the spatial audio coordinates are used for simulating three-dimensional spatial sound effects; and adjusting the audio data of the first mic position and the audio data of the second mic position based on the first spatial audio coordinate and the second spatial audio coordinate. The scheme improves the audio effect in mic-linking (co-streaming) scenes.

Description

Audio processing method, device, electronic equipment and storage medium

Technical Field

The embodiment of the disclosure relates to the technical field of computers, in particular to an audio processing method, an audio processing device, electronic equipment and a storage medium.

Background

With the continuous development of internet technology, internet-based live broadcasting has become an indispensable part of daily life, for example for interacting with others through live broadcasts. Live scenes can generally be divided into single-host broadcasting, multi-person online interaction and the like. In existing live mic-linking scenes, the listening experience and sense of immersion provided by the audio are poor.

Disclosure of Invention

The present disclosure provides an audio processing method, an audio processing apparatus, an electronic device and a storage medium, so as to solve the problem that the listening experience and sense of immersion of the audio are poor during live broadcasting in multi-person scenes.

In a first aspect, an embodiment of the present disclosure provides an audio processing method, including:

determining a first mic position and a second mic position in a live room;

determining a first spatial audio coordinate corresponding to the first mic position and a second spatial audio coordinate corresponding to the second mic position;

and adjusting the audio data of the first mic position and the audio data of the second mic position based on the first spatial audio coordinate and the second spatial audio coordinate.

In a second aspect, embodiments of the present disclosure further provide an audio processing apparatus, the apparatus including:

a first determining module, configured to determine a first mic position and a second mic position in a live room;

a second determining module, configured to determine a first spatial audio coordinate corresponding to the first mic position and a second spatial audio coordinate corresponding to the second mic position;

and an audio processing module, configured to adjust the audio data of the first mic position and the audio data of the second mic position based on the first spatial audio coordinate and the second spatial audio coordinate.

In a third aspect, embodiments of the present disclosure further provide an electronic device, including:

one or more processors;

storage means for storing one or more programs,

the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the audio processing method described in any of the embodiments of the present disclosure.

In a fourth aspect, the presently disclosed embodiments also provide a storage medium containing computer-executable instructions that, when executed by a computer processor, are configured to perform the audio processing method of any of the embodiments of the present disclosure.

According to the technical scheme of the embodiments of the present disclosure, for a first mic position and a second mic position in the same live room, a first spatial audio coordinate corresponding to the first mic position and a second spatial audio coordinate corresponding to the second mic position are determined. Because the first and second spatial audio coordinates can be used to simulate three-dimensional spatial sound effects, the audio data of the first mic position and of the second mic position can be adjusted in real time, which improves the audio effect in mic-linking scenes and improves the user experience.

Drawings

The above and other features, advantages, and aspects of embodiments of the present disclosure will become more apparent by reference to the following detailed description when taken in conjunction with the accompanying drawings. The same or similar reference numbers will be used throughout the drawings to refer to the same or like elements. It should be understood that the figures are schematic and that elements and components are not necessarily drawn to scale.

Fig. 1 is a flow chart of an audio processing method according to an embodiment of the disclosure;

FIG. 2 is a flow chart of another audio processing method provided by an embodiment of the present disclosure;

FIG. 3 is a link flow diagram for implementing spatial audio effects as applicable to embodiments of the present disclosure;

FIG. 4 is a flow chart of yet another audio processing method provided by an embodiment of the present disclosure;

FIG. 5a is a schematic diagram of a listening angle based on a mic position layout according to an embodiment of the present disclosure;

FIG. 5b is a schematic diagram of coordinate axes of a viewing object vector orientation as applicable to embodiments of the present disclosure;

FIG. 5c is a partial schematic diagram of a mic position layout pattern in a live room, to which embodiments of the present disclosure are applicable;

FIG. 5d is a schematic diagram of the distribution of mic positions in a live room, to which embodiments of the present disclosure are applicable;

FIG. 6 is a flow chart of yet another audio processing method provided by an embodiment of the present disclosure;

FIG. 7 is a link flow diagram of another implementation of spatial audio effects to which embodiments of the present disclosure are applicable;

FIG. 8 is a flow chart of yet another audio processing method provided by an embodiment of the present disclosure;

FIG. 9 is a schematic diagram of the distribution of mic positions in a live room in a special scenario to which the embodiments of the present disclosure are applicable;

FIG. 10 is a schematic structural diagram of an audio processing apparatus according to an embodiment of the present disclosure;

FIG. 11 is a schematic structural diagram of an electronic device implementing an audio processing method according to an embodiment of the present disclosure.

Detailed Description

Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure are shown in the accompanying drawings, it is to be understood that the present disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided for a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments of the present disclosure are for illustration purposes only and are not intended to limit the scope of the present disclosure.

It should be understood that the various steps recited in the method embodiments of the present disclosure may be performed in a different order and/or performed in parallel. Furthermore, method embodiments may include additional steps and/or omit performing the illustrated steps. The scope of the present disclosure is not limited in this respect.

The term "including" and variations thereof as used herein are intended to be open-ended, i.e., "including, but not limited to". The term "based on" means "based at least in part on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; the term "some embodiments" means "at least some embodiments". Related definitions of other terms will be given in the description below.

It should be noted that the terms "first," "second," and the like in this disclosure are merely used to distinguish between different devices, modules, or units and are not used to define an order or interdependence of functions performed by the devices, modules, or units.

It should be noted that references to "one" and "a plurality" in this disclosure are intended to be illustrative rather than limiting, and those of ordinary skill in the art will appreciate that they should be understood as "one or more" unless the context clearly indicates otherwise.

The names of messages or information interacted between the various devices in the embodiments of the present disclosure are for illustrative purposes only and are not intended to limit the scope of such messages or information.

It will be appreciated that, prior to using the technical solutions disclosed in the embodiments of the present disclosure, the user should be informed, in an appropriate manner and in accordance with the relevant laws and regulations, of the type, scope of use, usage scenarios, etc. of the personal information involved in the present disclosure, and the user's authorization should be obtained.

For example, in response to receiving an active request from a user, prompt information is sent to the user to explicitly indicate that the operation the user is requesting will require obtaining and using the user's personal information. The user can thus decide, according to the prompt information, whether to provide personal information to the software or hardware, such as an electronic device, an application program, a server or a storage medium, that performs the operations of the technical scheme of the present disclosure.

As an alternative but non-limiting implementation, in response to receiving an active request from a user, the manner in which the prompt information is sent to the user may be, for example, a popup, in which the prompt information may be presented in a text manner. In addition, a selection control for the user to select to provide personal information to the electronic device in a 'consent' or 'disagreement' manner can be carried in the popup window.

It will be appreciated that the above-described notification and user authorization process is merely illustrative and not limiting of the implementations of the present disclosure, and that other ways of satisfying relevant legal regulations may be applied to the implementations of the present disclosure.

FIG. 1 is a schematic flow chart of an audio processing method provided by an embodiment of the present disclosure. The embodiment is applicable to adjusting the audio data of each mic position during mic-linking in a live room. The method may be performed by an audio processing apparatus, which may be implemented in software and/or hardware and, optionally, configured in any electronic device having a network communication function; the electronic device may be a mobile terminal, a PC, a server, or the like.

As shown in fig. 1, the audio processing method in the present embodiment may include, but is not limited to, the following processes:

S110, determining a first mic position and a second mic position in the live room.

Live scenes can generally be divided into single-host broadcasting, multi-person online interaction and the like. In multi-person live scenes in particular, several mic-linked participants (each of whom may be a host or a guest) are connected through mic positions to interact live. The mic position area of the live room may include at least two mic positions. Through the several mic positions of the same live room, the device ends corresponding to different mic positions can interact remotely by mic-linking: audio is input at the device end corresponding to each mic position in use, and the input audio is shared during the mic-linking interaction through the mic positions of the live room.

When the mic-linked participants in the live room include one host and at least one guest, the first mic position may be the mic position on the host side and the second mic position may be a mic position on the guest side; alternatively, the first and second mic positions may both be mic positions on the guest side.

When the mic-linked participants in the live room include several hosts and at least one guest, the first and second mic positions may both be mic positions on the host side; alternatively, the first mic position may be a mic position on the host side and the second mic position may be a mic position on the guest side.

S120, determining a first spatial audio coordinate corresponding to the first mic position and a second spatial audio coordinate corresponding to the second mic position, wherein the spatial audio coordinates are used for simulating three-dimensional spatial sound effects.

The picture of each mic-linked participant (such as a connected host or guest) in the live room is placed at a fixed mic position, and the mic position of each participant varies with the mic position layout. Visually, the positions of the participants can be told apart thanks to the layout. From the audience's perspective, however, the sound of the participant at each mic position carries no clear sense of direction, and the left and right channels are balanced, so when participants at several mic positions speak, the listener cannot immediately tell where the speech comes from. Here a mic-linked participant may be a host or a guest.

To further improve the immersion of the audio, the audio corresponding to the mic positions in the live room can break away from a fixed setting: the spatial audio coordinates bound to each mic position in the live room can be determined, for example the first spatial audio coordinate corresponding to the first mic position and the second spatial audio coordinate corresponding to the second mic position. The audio data of different mic positions can thus be assigned different spatial audio coordinates, so that the audio data of each mic position simulates a different three-dimensional spatial sound effect according to the spatial audio coordinates associated with that mic position, instead of every mic position using the same fixed spatial audio effect.

Optionally, when a preset mic position event is triggered in the live room, the first mic position and the second mic position that triggered the event can be determined, so that suitable spatial audio coordinates are matched to them and the spatial audio effect of their audio data is adjusted. The preset mic position event may include a mic position going online or offline in the live room, a mic position in the live room being muted or unmuted, or a mic position in the live room triggering audio output.
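As a non-limiting illustration of the preset mic position events listed above, the following Python sketch decides which mic positions should have their spatial audio coordinates re-determined when an event arrives. The names `MicEvent` and `positions_to_refresh`, and the policy of refreshing every active position after an online or offline event, are assumptions made for this sketch and are not specified by the disclosure.

```python
from enum import Enum, auto


class MicEvent(Enum):
    """Preset mic position events (hypothetical names)."""
    ONLINE = auto()        # a mic position comes online
    OFFLINE = auto()       # a mic position goes offline
    MUTE = auto()
    UNMUTE = auto()
    AUDIO_OUTPUT = auto()  # a mic position starts outputting audio


def positions_to_refresh(event, mic_id, active):
    """Update the set of active mic positions in place and return the
    positions whose spatial audio coordinates should be re-determined."""
    if event is MicEvent.ONLINE:
        active.add(mic_id)
        return sorted(active)   # assumption: layout may change, refresh all
    if event is MicEvent.OFFLINE:
        active.discard(mic_id)
        return sorted(active)   # assumption: layout may change, refresh all
    # Mute, unmute, audio output: only the triggering position is refreshed.
    return [mic_id] if mic_id in active else []


active = {1}
print(positions_to_refresh(MicEvent.ONLINE, 2, active))   # [1, 2]
print(positions_to_refresh(MicEvent.UNMUTE, 2, active))   # [2]
print(positions_to_refresh(MicEvent.OFFLINE, 1, active))  # [2]
```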

S130, adjusting the audio data of the first mic position and the audio data of the second mic position based on the first spatial audio coordinate and the second spatial audio coordinate.

Because spatial audio coordinates can be used to simulate three-dimensional spatial sound effects, the audio effect of the audio data of each mic position in the live room can be adjusted in real time. The audio data of the first mic position and of the second mic position is thus adjusted dynamically as the spatial audio coordinates associated with each mic position change, and the adjusted audio data is then output. When output, the audio conveys a stronger sense of where each sound source is located and a deeper sense of immersion, which addresses the poor listening experience and immersion in live mic-linking scenes. As far as possible, the audience hears a clear sense of direction for the audio of each mic position, and a good dynamic spatial audio effect is achieved, so that the audio data of different mic positions can be distinguished by ear and not only by sight.

Optionally, different spatial audio coordinates are associated with different spatial audio parameters. Adjusting the same audio data with different spatial audio parameters gives it different spatial audio effects, that is, it simulates different three-dimensional spatial sound effects. The first spatial audio coordinate of the first mic position can be used to adjust the spatial audio effect of the audio data of the first mic position, and the second spatial audio coordinate of the second mic position can be used to adjust the spatial audio effect of the audio data of the second mic position, so that even if the audio data of the two mic positions has the same content, different spatial sound effects are produced when it is output.

Optionally, the audio data of the first mic position may be processed into a left channel audio segment and a right channel audio segment according to the first spatial audio coordinate, and the difference between the left channel audio segment and the right channel audio segment may be adjusted according to the first spatial audio coordinate, so as to generate audio data of the first mic position with a three-dimensional sense of direction. Likewise, the audio data of the second mic position may be processed into a left channel audio segment and a right channel audio segment according to the second spatial audio coordinate, and the difference between the two segments may be adjusted according to the second spatial audio coordinate, so as to generate audio data of the second mic position with a three-dimensional sense of direction.
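One simple way to realize a left/right difference that depends on a spatial audio coordinate is constant-power panning. The Python sketch below is only an illustration under stated assumptions: the disclosure does not name a panning law, and the coordinate convention (x for left/right offset, z for forward distance from the listener) and the function names `pan_gains` and `spatialize` are hypothetical. A production spatializer would typically also model distance, elevation and inter-channel delay.

```python
import math


def pan_gains(x, z):
    """Constant-power (left, right) gains for a source at offset x, depth z.

    Assumed convention: x < 0 is to the listener's left, x > 0 to the right,
    z > 0 is in front of the listener.
    """
    azimuth = math.atan2(x, z)                        # 0 = straight ahead
    azimuth = max(-math.pi / 2, min(math.pi / 2, azimuth))
    theta = (azimuth + math.pi / 2) / 2               # map to [0, pi/2]
    return math.cos(theta), math.sin(theta)


def spatialize(mono, coord):
    """Split mono samples into left and right channel segments.

    coord is an (x, y, z) spatial audio coordinate; y is ignored here.
    """
    left_gain, right_gain = pan_gains(coord[0], coord[2])
    left = [s * left_gain for s in mono]
    right = [s * right_gain for s in mono]
    return left, right


# The same samples placed at two different coordinates sound different.
samples = [0.5, -0.25, 0.125]
first_left, first_right = spatialize(samples, (-1.0, 0.0, 1.0))    # left side
second_left, second_right = spatialize(samples, (1.0, 0.0, 1.0))   # right side
```

With these assumed coordinates, the first mic position is louder in the left channel and the second in the right, while the total power of each source is preserved.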

According to the technical scheme of this embodiment, for a first mic position and a second mic position in the mic position area of the same live room, a first spatial audio coordinate corresponding to the first mic position and a second spatial audio coordinate corresponding to the second mic position are determined. Because these coordinates can be used to simulate three-dimensional spatial sound effects, the audio data of the first and second mic positions can be adjusted in real time, so that the audio data of each mic position in the live room has a spatial audio effect. This addresses the poor listening experience and immersion in multi-person live mic-linking scenes. In particular, because each mic position has an associated spatial audio coordinate used to adjust its audio data, different mic positions are distinguishable not only visually: as far as possible, the audience also hears a clear sense of direction for the audio of each mic position, which mitigates the problem that the source of the speech cannot be immediately identified when several mic positions in the same live room speak at the same time.

FIG. 2 is a schematic flow chart of another audio processing method provided by an embodiment of the present disclosure. On the basis of the foregoing embodiments, this embodiment further optimizes the process of determining the first spatial audio coordinate corresponding to the first mic position and the second spatial audio coordinate corresponding to the second mic position, and it may be combined with the alternatives in one or more of the foregoing embodiments. As shown in FIG. 2, the audio processing method of the present embodiment may include the following processes:

S210, determining a first mic position and a second mic position in the live room.

S220, determining, at a first device end corresponding to the first mic position, mic position layout information corresponding to the first mic position, wherein the mic position layout information describes the positional distribution of the mic positions as displayed in the live room.

S230, determining a first spatial audio coordinate corresponding to the first mic position based on the mic position layout information corresponding to the first mic position, wherein different mic position layouts use different spatial audio configuration information, and the spatial audio configuration information records, for each mic position layout, the association between each mic position and its spatial audio coordinate.

The spatial audio coordinates are used for simulating three-dimensional spatial sound effects.

In a multi-person live scene, the mic position area of the live room may include at least two mic positions, and, particularly when the number of mic positions is relatively large, the layout pattern adopted by the mic positions in the mic position area may vary. Therefore, different spatial audio configuration information, each recording the association between every mic position and its spatial audio parameters, can be established in advance for the different mic position layouts in the live room, so that the spatial audio effect of the audio data of each mic position can subsequently be adjusted with a matching strategy, thereby configuring the spatial audio capability.
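The spatial audio configuration information described above can be pictured as a lookup table from mic position layout to per-position coordinates. The following Python sketch is illustrative only: the layout names, mic position identifiers and (x, y, z) values are hypothetical and do not come from the disclosure.

```python
# Hypothetical spatial audio configuration: one entry per mic position layout,
# each mapping a mic position id to an (x, y, z) spatial audio coordinate.
SPATIAL_AUDIO_CONFIG = {
    "two_side_by_side": {
        1: (-1.0, 0.0, 1.0),
        2: (1.0, 0.0, 1.0),
    },
    "one_large_three_small": {
        1: (0.0, 0.0, 1.0),
        2: (-1.0, 0.0, 2.0),
        3: (0.0, 0.0, 2.0),
        4: (1.0, 0.0, 2.0),
    },
}


def coordinate_for(layout, mic_id):
    """Look up the spatial audio coordinate bound to a mic position."""
    return SPATIAL_AUDIO_CONFIG[layout][mic_id]


print(coordinate_for("two_side_by_side", 1))       # (-1.0, 0.0, 1.0)
print(coordinate_for("one_large_three_small", 4))  # (1.0, 0.0, 2.0)
```

The same mic position id maps to different coordinates under different layouts, which is the property the configuration information is meant to capture.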

For example, referring to FIG. 3, take the case where the mic-linked participant at the first mic position in the live room is a host and the mic-linked participant at the second mic position is a host or a guest. The first device end corresponding to the first mic position can obtain the first spatial audio coordinate corresponding to the first mic position according to the mic position layout information corresponding to the first mic position. The spatial audio service at the first device end can then update the spatial audio coordinate to the real-time communication (RTC) engine at the first device end, and the RTC engine corrects the spatial audio coordinate to be used in association with the first mic position. In addition, the determined first spatial audio coordinate can be synchronized to the server for storage, and it can be synchronized to the second device end corresponding to the second mic position through RTC imp signaling, so that the second device end locally corrects the spatial audio coordinates to be used in association with the second mic position.

As an alternative but non-limiting implementation, determining the first spatial audio coordinate corresponding to the first mic position based on the mic position layout information corresponding to the first mic position may include steps A1-A2:

Step A1, acquiring, based on the mic position layout information corresponding to the first mic position, first spatial configuration information issued by a server, wherein the first spatial configuration information is the spatial audio configuration information configured for the mic position layout information corresponding to the first mic position.

Step A2, determining the first spatial audio coordinate corresponding to the first mic position according to the first spatial configuration information.

For the same mic position layout, the spatial audio configuration information to be used may differ between service scenes; that is, even for the same mic position layout, the adapted spatial audio configuration information may differ depending on the service scene.

If the device end corresponding to a mic position directly matched suitable spatial audio configuration information according to the mic position layout in the live room, a considerable matching cost might be incurred. To change the spatial audio coordinates adapted to different mic positions at low cost and thereby provide a better listening experience, the ability to configure spatial audio coordinates for different service scenes is required, while increasing the performance overhead of the device end corresponding to the mic position as little as possible.

Therefore, the process of matching spatial audio configuration information to each mic position layout in different service scenes can be placed at the server. Relying on the performance advantage of the server, the spatial audio configuration information matching the various mic position layouts in different service scenes is constructed, and the spatial audio configuration information associated with each mic position layout in each service scene is stored at the server. When the first device end corresponding to the first mic position needs the spatial audio configuration information adapted to the mic position layout corresponding to the first mic position, it can obtain the first spatial configuration information from the server according to the service scene, and then look up the first spatial audio coordinate corresponding to the first mic position in the first spatial configuration information.

For example, referring to FIG. 3, take the case where the mic-linked participant at the first mic position in the live room is a host and the mic-linked participant at the second mic position is a host or a guest. After the mic-linked participant at the first mic position (who may be a host or a guest) switches to a multi-person scene, the spatial audio service may be started. The first device end corresponding to the first mic position may acquire the mic position layout information corresponding to the first mic position, look up the first spatial audio configuration information matching that layout information, and determine the first spatial audio coordinate corresponding to the first mic position from the first spatial audio configuration information. In addition, the determined first spatial audio coordinate can be synchronized to the server for storage, so that the second device end corresponding to the second mic position can acquire the first spatial audio coordinate from the server.

With this scheme, the capability of setting spatial audio coordinates is provided as a service. Assigning this capability to the server reduces the performance pressure of configuration at the device end, and the spatial audio effects of different service scenes can subsequently be achieved by acquiring the spatial audio coordinates directly from the server.

S240, receiving a second spatial audio coordinate corresponding to the second mic position, wherein the second spatial audio coordinate is a spatial audio coordinate matched to the second mic position and determined by the second device end corresponding to the second mic position based on the mic position layout information corresponding to the second mic position.

When the first device end corresponding to the first mic position needs the second spatial audio coordinate corresponding to the second mic position, it can directly receive the spatial audio coordinate that the second device end determined for the second mic position based on the corresponding mic position layout information. The first device end therefore only needs to receive the first spatial configuration information issued by the server and use it to look up the first spatial audio coordinate; the second spatial audio coordinate is determined by the second device end, and the first device end only needs to receive it. Because the spatial audio coordinates of different mic positions in the same live room are determined separately by the device ends corresponding to the respective mic positions, the surge in overhead that would result from concentrating this work on one device is reduced.

Optionally, each of the first and second mic positions in the live room may also be configured with a default spatial audio coordinate. When the spatial audio configuration information associated with the mic position layout information cannot be acquired from the server to look up the spatial audio coordinate, the default spatial audio coordinate of the mic position can be used directly.
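The lookup with a default fallback can be sketched as follows. This is a minimal Python illustration under stated assumptions: the in-memory dictionary `SERVER_CONFIG` stands in for the server, and the scene names, layout name, default coordinates and the functions `fetch_config` and `resolve_coordinate` are all hypothetical.

```python
# Hypothetical default coordinate for each mic position.
DEFAULT_COORDINATES = {1: (0.0, 0.0, 1.0), 2: (0.0, 0.0, 1.0)}

# Stand-in for the server-side store, keyed by (service scene, layout).
SERVER_CONFIG = {
    ("chat", "two_side_by_side"): {1: (-1.0, 0.0, 1.0), 2: (1.0, 0.0, 1.0)},
    ("karaoke", "two_side_by_side"): {1: (-0.5, 0.0, 1.0), 2: (0.5, 0.0, 1.0)},
}


def fetch_config(scene, layout):
    """Return the configuration for a scene and layout, or None if absent."""
    return SERVER_CONFIG.get((scene, layout))


def resolve_coordinate(mic_id, scene, layout, fetch=fetch_config):
    """Use the server-issued coordinate when available, else the default."""
    config = fetch(scene, layout)
    if config is not None and mic_id in config:
        return config[mic_id]
    return DEFAULT_COORDINATES[mic_id]


print(resolve_coordinate(1, "chat", "two_side_by_side"))     # (-1.0, 0.0, 1.0)
print(resolve_coordinate(1, "karaoke", "two_side_by_side"))  # (-0.5, 0.0, 1.0)
print(resolve_coordinate(1, "chat", "unknown_layout"))       # (0.0, 0.0, 1.0)
```

Keying the store by both service scene and layout reflects the point that the same layout may use different configuration information in different service scenes.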

As an alternative but non-limiting implementation, receiving the second spatial audio coordinate corresponding to the second mic position may include the following steps:

receiving, through real-time communication with the second device end corresponding to the second mic position, the second spatial audio coordinate sent by that second device end; or,

receiving the second spatial audio coordinate corresponding to the second mic position from the server, wherein the second spatial audio coordinate is pre-synchronized to the server by the second device end corresponding to the second mic position.

For example, referring to fig. 3, still taking the case where the mic-linking object at the first mic position in the live broadcasting room is a host and the mic-linking object at the second mic position is a host or a guest: on the side of the mic-linking object corresponding to the second mic position, a spatial audio service may be started. The second device end corresponding to the second mic position may acquire from the server the mic position layout information adopted by the second mic position, look up the second spatial audio configuration information matching that layout information, and determine the second spatial audio coordinate corresponding to the second mic position from the second spatial audio configuration information.

In addition, the determined second spatial audio coordinate can be synchronized to the server for storage, so that the first device end corresponding to the first mic position can later acquire it from the server; or, it can be synchronized to the first device end through real-time communication RTC imp signaling, so that the first device end can adjust the audio data of the second mic position based on the second spatial audio coordinate and thereby achieve the spatial audio effect.

S250, adjusting the audio data of the first mic position and the audio data of the second mic position based on the first spatial audio coordinate and the second spatial audio coordinate.

Fig. 4 is a schematic flow chart of another audio processing method provided in an embodiment of the present disclosure. On the basis of the foregoing embodiments, this embodiment further optimizes the process of adjusting the audio data of the first mic position and the audio data of the second mic position based on the first spatial audio coordinate and the second spatial audio coordinate, and it may be combined with each of the alternatives in one or more of the foregoing embodiments. As shown in fig. 4, the audio processing method of the present embodiment may include the following processes:

S410, determining a first mic position and a second mic position in the live broadcasting room.

S420, determining a first spatial audio coordinate corresponding to the first mic position and a second spatial audio coordinate corresponding to the second mic position, wherein the spatial audio coordinates are used for simulating three-dimensional spatial sound effects.

S430, at the first device end corresponding to the first mic position, adjusting the audio data of the second mic position based on the second spatial audio coordinate, and outputting the audio at the first device end.

Referring to fig. 3, taking the case where the mic-linking object at the first mic position in the live broadcasting room is a host and the mic-linking object at the second mic position is a host or a guest: the first device end corresponding to the first mic position starts a spatial audio service, and no spatial audio effect adjustment is performed locally on the audio data of the first mic position. For the audio data of the second mic position, the first device end can, according to the acquired second spatial audio coordinate, use the ability of spatial audio coordinates to simulate three-dimensional spatial sound effects to adjust the audio effect of that audio data in real time. The audio data of the second mic position can thus be adjusted dynamically at the first device end as the spatial audio coordinate associated with the mic position changes, and the adjusted audio data is then output. The output audio gives a stronger sense of the sound source and of immersion, which addresses the low listening quality and immersion of the audio experience in live mic-linking scenes, and ensures as far as possible that the audio of the second mic position has a clear sense of direction when it is heard at the first device end from the listener's perspective.

S440, sending the first spatial audio coordinate corresponding to the first mic position to the second device end corresponding to the second mic position, so that the second device end adjusts the audio data of the first mic position based on the first spatial audio coordinate and outputs the audio at the second device end.

The first device end and the second device end are different device ends.

Referring to fig. 3, taking the case where the mic-linking object at the first mic position in the live broadcasting room is a host and the mic-linking object at the second mic position is a guest: for the audio data of the first mic position, the first device end corresponding to the first mic position may send the first spatial audio coordinate corresponding to the first mic position to the second device end corresponding to the second mic position. Locally at the second device end, the audio effect of the audio data of the first mic position can be adjusted in real time according to the acquired first spatial audio coordinate, using the ability of spatial audio coordinates to simulate three-dimensional spatial sound effects. The audio data of the first mic position can thus be adjusted dynamically at the second device end as the spatial audio coordinate associated with the mic position changes, and the adjusted audio data is then output. The output audio gives a stronger sense of the sound source and of immersion, which addresses the low listening quality and immersion of the audio experience in live mic-linking scenes, and ensures as far as possible that the audio of the first mic position has a clear sense of direction when it is heard at the second device end from the listener's perspective.

As an alternative but non-limiting implementation, adjusting the audio data of the second mic position based on the second spatial audio coordinate may comprise steps B1-B2:

Step B1, determining a first listening angle of the mic-linking object corresponding to the first device end relative to the second mic position, wherein the first listening angle is a preset angle formed by a first reference direction and a second reference direction; the first reference direction is determined based on the direction of the line from the mic-linking object to a first reference mic position in the mic position area of the live broadcasting room at the first device end; the first reference mic position is the mic position where the region of interest of the mic-linking object is located in that mic position area; the second reference direction is determined based on the direction of the line from the mic-linking object to the second mic position in that mic position area; and the mic-linking object can be a host or a guest.

The region of interest of the mic-linking object can be any preset mic position region in the mic position area of the live broadcasting room, or a mic position region that is highlighted after a triggering operation is performed on the mic position area of the live broadcasting room; the triggering operation can be a click or another triggering operation.

Step B2, adjusting the audio data of the second mic position based on the second spatial audio coordinate and the first listening angle.

Referring to fig. 5a and fig. 5b, the first listening angle is a preset angle formed by the first reference direction and the second reference direction. The first reference direction is the direction of the line from the position of the mic-linking object toward the first reference mic position in the mic position area of the live broadcasting room at the first device end, and the first reference mic position can be the mic position in the region of interest of the mic-linking object. For example, in the mic position layout of a nine-grid scene, when the mic-linking object is interested in the mic position in the middle of the mic position area, the middle mic position is taken as the mic position in the region of interest, and the first reference mic position can be the middle mic position of the nine-grid mic position area in the live broadcasting room at the first device end, specifically mic position No. 5 in fig. 5a. When the mic-linking object is interested in a mic position at the edge of the mic position area, that edge mic position is taken as the mic position in the region of interest, and the first reference mic position can be an edge mic position of the nine-grid mic position area, specifically a mic position such as No. 1 or No. 9 in fig. 5a.
The direction of the line from the mic-linking object corresponding to the first device end to the second mic position in the live broadcasting room at the first device end can be determined as the second reference direction, and the mic-linking object can be a host or a guest.

The first reference direction can be determined from the orientation vector of the mic-linking object corresponding to the first device end, that is, the direction from the position of the mic-linking object toward the first reference mic position in the mic position area of the live broadcasting room. The second reference direction can be determined from the direction of the line from the mic-linking object to the second mic position in the same mic position area. The angle formed by the first reference direction and the second reference direction can be used as the first listening angle of the mic-linking object relative to the second mic position, so that the audio data of the second mic position can be adjusted from the perspective of the mic-linking object at the first device end by combining this angle with the second spatial audio coordinate.
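The angle between the two reference directions can be computed with an ordinary dot product. The sketch below assumes three-component coordinates and a hypothetical helper name `listening_angle`; the text does not specify how the angle is actually computed, so this is only one straightforward way to do it.

```python
import math
from typing import Sequence


def listening_angle(
    listener: Sequence[float],
    reference_position: Sequence[float],
    target_position: Sequence[float],
) -> float:
    """Angle in degrees between listener->reference and listener->target."""
    # First reference direction: listener toward the reference mic position.
    d1 = [r - p for r, p in zip(reference_position, listener)]
    # Second reference direction: listener toward the target mic position.
    d2 = [t - p for t, p in zip(target_position, listener)]
    n1 = math.sqrt(sum(c * c for c in d1))
    n2 = math.sqrt(sum(c * c for c in d2))
    if n1 == 0.0 or n2 == 0.0:
        raise ValueError("listener must not coincide with a mic position")
    cos_angle = sum(a * b for a, b in zip(d1, d2)) / (n1 * n2)
    # Clamp to guard against rounding slightly outside [-1, 1].
    cos_angle = max(-1.0, min(1.0, cos_angle))
    return math.degrees(math.acos(cos_angle))
```

For a listener placed at distance 100 in front of the reference mic position and a target mic position offset sideways by 100, the two directions form an angle of about 45 degrees.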

Referring to fig. 5a, the coordinate axes of the spatial audio are shown in the figure as the forward, up and right coordinate axes. The first listening angle of the mic-linking object corresponding to the first device end relative to the second mic position is calculated from the spatial audio coordinates and the orientation vector of the mic-linking object, and the audio data of the second mic position is processed accordingly to form the spatial audio effect. The orientation vector of the mic-linking object can be represented by a rotation angle around the up axis, a rotation angle around the forward axis and a rotation angle around the right axis, taken with the mic-linking object initially facing along the straight line toward the first reference mic position in the mic position area of the live broadcasting room at the first device end.

Optionally, referring to fig. 5c and fig. 5d, take as an example a nine-grid mic position layout for the first mic position in the mic position area of the live broadcasting room, and assume that mic position No. 5 in the nine-grid area is the mic position in the region of interest of the mic-linking object. To achieve the effect of playing audio from the perspective of the mic-linking object at the first device end, the coordinates and orientation of the mic-linking object may be set directly in front of mic position No. 5, so that there is a corresponding sound angle between the listening position of the mic-linking object and each mic-linking object on a mic position. The angle is calculated as follows: the coordinates of the mic-linking object are (0, L), where L represents the height of the mic-linking object above mic position No. 5 (or the center of the mic position layout in the live broadcasting room) and must be greater than 0; the coordinates of each mic position are shown in fig. 5d, where A represents the relative distance between one mic position and another in the layout.

L and A together determine the angle at which the mic-linking object hears the sound of the different mic positions. The relation is $\theta = \arctan(A/L)$, and matching values of A and L can be derived backward from the desired angle. For example, if the angle is 20 degrees, then $A/L = \tan 20^\circ \approx 0.364$. If the A and L configured on the server side are both 100, the relation gives an angle of $\arctan(100/100) = 45^\circ$ between the mic-linking object and the sound of a mic position one spacing away.
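As a worked illustration of the relation between A, L and the angle, the sketch below evaluates it in both directions. The function names are hypothetical, and degrees are used only for readability.

```python
import math


def angle_from_spacing(a: float, l: float) -> float:
    """Return theta in degrees for spacing A and listener distance L."""
    if l <= 0:
        raise ValueError("L must be greater than 0")
    return math.degrees(math.atan(a / l))


def spacing_for_angle(theta_degrees: float, l: float) -> float:
    """Back-derive the spacing A that yields theta for a given L."""
    if l <= 0:
        raise ValueError("L must be greater than 0")
    return l * math.tan(math.radians(theta_degrees))
```

With A equal to L the first function returns about 45 degrees; for a target of 20 degrees and L = 100, the second function returns a spacing A of roughly 36.4.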

The calculation above assumes that the mic-linking object faces from its own position toward the mic positions in the mic position area of the live broadcasting room. Changes in a user's state, such as turning or nodding the head, change the angle calculation accordingly.

Fig. 6 is a schematic flow chart of another audio processing method provided in an embodiment of the present disclosure. On the basis of the foregoing embodiments, this embodiment further optimizes the process of adjusting the audio data of the first mic position and the audio data of the second mic position based on the first spatial audio coordinate and the second spatial audio coordinate, and it may be combined with each of the alternatives in one or more of the foregoing embodiments. As shown in fig. 6, the audio processing method of the present embodiment may include the following processes:

S610, determining a first mic position and a second mic position in the live broadcasting room.

S620, determining a first spatial audio coordinate corresponding to the first mic position and a second spatial audio coordinate corresponding to the second mic position, wherein the spatial audio coordinates are used for simulating three-dimensional spatial sound effects.

S630, determining the audio data of the first mic position collected at the first device end corresponding to the first mic position.

S640, at the first device end corresponding to the first mic position, receiving the audio data of the second mic position sent from the second device end corresponding to the second mic position.

S650, adjusting the audio data of the first mic position based on the first spatial audio coordinate and adjusting the audio data of the second mic position based on the second spatial audio coordinate.

The audio data of the first mic position corresponding to the first device end and the audio data of the second mic position corresponding to the second device end are ultimately output neither at the first device end nor at the second device end, but at a third device end. The first device end can therefore adjust the audio data of both mic positions at the same time: using the ability of spatial audio coordinates to simulate three-dimensional spatial sound effects, it adjusts the audio effect of the audio data of the first mic position and of the second mic position in real time. This addresses the low listening quality and immersion of the audio experience in live mic-linking scenes, and ensures as far as possible that the audio of the first and second mic positions has a clear sense of direction when it is heard at the third device end.

As an alternative but non-limiting implementation, adjusting the audio data of the first mic position based on the first spatial audio coordinate may comprise steps C1-C2:

Step C1, at the first device end corresponding to the first mic position, determining a second listening angle of a viewing object of the third device end relative to the first mic position, wherein the second listening angle is a preset angle formed by a third reference direction and a fourth reference direction; the third reference direction is determined based on a reference line; the reference line is a straight line perpendicular to the third device end and directed at a second reference mic position in the mic position area of the live broadcasting room at the third device end; the second reference mic position is the mic position in the middle of that mic position area; and the fourth reference direction is determined based on the line connecting a preconfigured reference point on the reference line with the first mic position in that mic position area.

Step C2, adjusting the audio data of the first mic position based on the first spatial audio coordinate and the second listening angle.

The second listening angle is a preset angle formed by the third reference direction and the fourth reference direction. A reference line perpendicular to the third device end and directed at the second reference mic position in the mic position area of the live broadcasting room at the third device end is determined, and the direction along the reference line toward the second reference mic position is taken as the third reference direction. The second reference mic position is the mic position in the middle of the mic position area of the live broadcasting room at the third device end; for example, in the mic position layout of a nine-grid scene, it can be the middle mic position of the nine-grid mic position area, specifically mic position No. 5 in fig. 5a. A reference point can be selected in advance on the reference line, set according to the distance between the viewing object of the third device end and the third device end, and the direction of the line connecting the reference point with the first mic position in the mic position area of the live broadcasting room at the third device end is determined as the fourth reference direction. The viewing object is an audience member.

The third reference direction can be determined from the orientation vector of the viewing object of the third device end, that is, the direction in which the viewing object faces, perpendicular to the third device end, the middle position of the mic position area of the live broadcasting room at the third device end; the viewing object can be an audience member. The fourth reference direction can be determined from the line connecting the viewing object of the third device end to the first mic position in the mic position area of the live broadcasting room. The angle formed by the third reference direction and the fourth reference direction can be used as the second listening angle of the viewing object of the third device end relative to the first mic position, so that the audio data of the first mic position can be adjusted from the perspective of the viewing object of the third device end by combining this angle with the first spatial audio coordinate. For details, see the examples for the mic-linking object shown in fig. 5a to fig. 5d.

As an alternative but non-limiting implementation, adjusting the audio data of the second mic position based on the second spatial audio coordinate comprises steps D1-D2:

Step D1, at the first device end corresponding to the first mic position, determining a third listening angle of the viewing object of the third device end relative to the second mic position, wherein the third listening angle is a preset angle formed by the third reference direction and a fifth reference direction, and the fifth reference direction is determined based on the line connecting the preconfigured reference point on the reference line with the second mic position in the live broadcasting room at the third device end.

Step D2, adjusting the audio data of the second mic position based on the second spatial audio coordinate and the third listening angle.

The third reference direction can be determined from the orientation vector of the viewing object of the third device end, that is, the direction in which the viewing object faces perpendicularly the middle position of the mic position area of the live broadcasting room. The fifth reference direction can be determined from the line connecting the preconfigured reference point on the reference line to the second mic position in the mic position area of the live broadcasting room at the third device end. The angle formed by the third reference direction and the fifth reference direction can be used as the third listening angle of the viewing object of the third device end relative to the second mic position, so that the audio data of the second mic position can be adjusted from the perspective of the viewing object of the third device end by combining this angle with the second spatial audio coordinate.

Through the above process, for the viewing object at the third device end, the audio of the first mic position and of the second mic position can each be adjusted according to the listening angle between the viewing object and the respective mic position, combined with the spatial audio coordinates, so that the viewing object hears the audio of the first mic position and the audio of the second mic position as adjusted by the spatial audio coordinates and the listening angles.
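The viewer-side angles can be illustrated with the reference point placed on the reference line at some distance in front of the middle mic position. In the sketch below, `reference_distance` stands for the distance from the preconfigured reference point to the middle mic position, and the offsets are measured from the middle mic position within the layout plane; these names and the planar-layout assumption are illustrative and not taken from the figures.

```python
import math


def viewer_listening_angle(
    reference_distance: float,
    offset_right: float,
    offset_up: float = 0.0,
) -> float:
    """Angle in degrees between the reference line and the line from the
    reference point to a mic position offset from the middle mic position."""
    if reference_distance <= 0:
        raise ValueError("reference point must lie in front of the layout")
    # Distance of the mic position from the middle mic position in the plane.
    lateral = math.hypot(offset_right, offset_up)
    return math.degrees(math.atan2(lateral, reference_distance))
```

The same function yields the second listening angle when given the offsets of the first mic position and the third listening angle when given those of the second mic position; the middle mic position itself gives an angle of zero.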

S660, sending the adjusted audio data of the first mic position and the adjusted audio data of the second mic position to the third device end for audio output, wherein the first device end, the second device end and the third device end are different device ends.

As an alternative but non-limiting implementation, sending the adjusted audio data of the first mic position and the adjusted audio data of the second mic position to the third device end for audio output may include steps E1-E2:

Step E1, at the first device end corresponding to the first mic position, merging the adjusted audio data of the first mic position and the adjusted audio data of the second mic position into a target audio stream.

Step E2, transmitting the target audio stream to a content distribution network, so that the content distribution network forwards the target audio stream to at least two third device ends for audio output.

Referring to fig. 7, when the audio data of the first mic position and the audio data of the second mic position need to be output at the third device end, both can first be collected at the first device end, where the spatial audio effect adjustment is performed. The adjusted audio data of the first mic position and the adjusted audio data of the second mic position are then merged into a target audio stream. When the real-time communication (RTC) engine merges the streams, the audio stream with the adjusted spatial audio effect is pushed to the content delivery network (CDN), so the audio stream delivered by the CDN carries the spatial audio effect.
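The merge step can be pictured with a deliberately simple stand-in for the spatial adjustment. The sketch below uses constant-power stereo panning by horizontal angle, which is a substituted technique chosen for brevity and not the spatial audio algorithm of the disclosure; `pan_gains` and `mix_to_stereo` are hypothetical names, and samples are assumed to be mono floats in the range [-1, 1].

```python
import math
from typing import List, Sequence, Tuple


def pan_gains(azimuth_degrees: float) -> Tuple[float, float]:
    """Constant-power (left, right) gains for an azimuth in [-90, 90]."""
    clamped = max(-90.0, min(90.0, azimuth_degrees))
    # Map -90..90 degrees onto a quarter circle of 0..90 degrees.
    angle = math.radians((clamped + 90.0) / 2.0)
    return math.cos(angle), math.sin(angle)


def mix_to_stereo(
    sources: Sequence[Tuple[Sequence[float], float]],
) -> List[Tuple[float, float]]:
    """Pan each (samples, azimuth) source and sum them into stereo frames."""
    length = max((len(samples) for samples, _ in sources), default=0)
    mixed = [[0.0, 0.0] for _ in range(length)]
    for samples, azimuth in sources:
        left, right = pan_gains(azimuth)
        for i, sample in enumerate(samples):
            mixed[i][0] += left * sample
            mixed[i][1] += right * sample
    # Clip the summed signal back into the valid sample range.
    return [
        (max(-1.0, min(1.0, l)), max(-1.0, min(1.0, r))) for l, r in mixed
    ]
```

The resulting list of stereo frames plays the role of the target audio stream; encoding it and pushing it to a CDN is outside the scope of this sketch.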

Fig. 8 is a schematic flow chart of another audio processing method provided in an embodiment of the present disclosure. On the basis of the foregoing embodiments, this embodiment further optimizes the process of adjusting the audio data of the first mic position and the audio data of the second mic position based on the first spatial audio coordinate and the second spatial audio coordinate, and it may be combined with each of the alternatives in one or more of the foregoing embodiments. As shown in fig. 8, the audio processing method of the present embodiment may include the following processes:

S810, determining a first mic position and a second mic position in the live broadcasting room.

S820, determining a first spatial audio coordinate corresponding to the first mic position and a second spatial audio coordinate corresponding to the second mic position, wherein the spatial audio coordinates are used for simulating three-dimensional spatial sound effects.

S830, at the first device end corresponding to the first mic position, sending the audio data of the first mic position collected at the first device end, together with the first spatial audio coordinate, to the server, so that the server adjusts the audio data of the first mic position based on the first spatial audio coordinate, adjusts the audio data of the second mic position based on the second spatial audio coordinate, and sends the adjusted audio data of the first mic position and the adjusted audio data of the second mic position to the third device end for audio output, wherein the second spatial audio coordinate and the audio data of the second mic position are received by the server from the second device end corresponding to the second mic position.

The second spatial audio coordinate is received from the second device end corresponding to the second mic position and stored by the server, and the first device end, the second device end and the third device end are different device ends.

As an alternative but non-limiting implementation, adjusting the audio data of the first mic position based on the first spatial audio coordinate may comprise the following steps: at the server, determining a second listening angle of the viewing object of the third device end relative to the first mic position, wherein the second listening angle is a preset angle formed by a third reference direction and a fourth reference direction, the third reference direction is determined based on a reference line, the reference line is a straight line perpendicular to the third device end and directed at a second reference mic position in the mic position area of the live broadcasting room at the third device end, the second reference mic position is the mic position in the middle of that mic position area, and the fourth reference direction is determined based on the line connecting a preconfigured reference point on the reference line with the first mic position in that mic position area; and adjusting the audio data of the first mic position based on the first spatial audio coordinate and the second listening angle.

As an alternative but non-limiting implementation, adjusting the audio data of the second mic position based on the second spatial audio coordinate comprises the following steps: at the server, determining a third listening angle of the viewing object of the third device end relative to the second mic position, wherein the third listening angle is a preset angle formed by the third reference direction and a fifth reference direction, the third reference direction is determined based on the reference line, the reference line is a straight line perpendicular to the third device end and directed at the second reference mic position in the mic position area of the live broadcasting room at the third device end, the second reference mic position is the mic position in the middle of that mic position area, and the fifth reference direction is determined based on the line connecting the preconfigured reference point on the reference line with the second mic position in that mic position area; and adjusting the audio data of the second mic position based on the second spatial audio coordinate and the third listening angle.

As an alternative but non-limiting implementation, sending the adjusted audio data of the first mic position and the adjusted audio data of the second mic position to the third device end for audio output may include: at the server, merging the adjusted audio data of the first mic position and the adjusted audio data of the second mic position into a target audio stream; and transmitting the target audio stream to a content distribution network, so that the content distribution network forwards the target audio stream to at least two third device ends for audio output.

According to this technical scheme, for the first mic position and the second mic position of the same live broadcasting room, the first spatial audio coordinate corresponding to the first mic position and the second spatial audio coordinate corresponding to the second mic position are determined, and their ability to simulate three-dimensional spatial sound effects is used to adjust the audio data of the first and second mic positions in real time. The audio data of each mic position in the live broadcasting room thus has a spatial audio effect, which addresses the low listening quality and immersion of the audio experience in multi-person live broadcasting scenes. In particular, because each mic position has an associated spatial audio coordinate with which its audio data is adjusted, the audio of different mic positions can have a better sense of direction consistent with the visual layout, and the audio of each mic position has, as far as possible, a clearer sense of direction when heard from the listener's perspective. When several mic positions in the same room produce sound at the same time, the problem that the sound source cannot be immediately identified by ear is thereby addressed as far as possible.

On the basis of the foregoing embodiment, optionally, adjusting the audio data of the second microphone position based on the second spatial audio coordinate includes:

when the microphone position layout information corresponding to the first microphone position indicates that the microphone positions in the live broadcast room are arranged on multiple levels, adjusting the audio data of the second microphone position based on the second spatial audio coordinate and the first listening angle;

when the microphone position layout information corresponding to the first microphone position indicates that the microphone positions in the live broadcast room are not arranged on multiple levels, adjusting the audio data of the second microphone position based only on the second spatial audio coordinate.

Because different service scenarios display the microphone position layout differently, and so that the perceived spatial audio matches the displayed layout more closely, the multi-person scenes in which the spatial audio effect is applied can be divided into general multi-person scenes and special scenes (such as voice KTV). In a general multi-person scene, the microphone positions are arranged on levels, with up, down, left and right positions. To give the viewing object a better listening experience, a listening angle needs to be set for the viewing object at the third device when the spatial audio effect is processed, and the sound of each microphone position is computed from the listening coordinates of the viewing object at the third device, so that the difference between left, right, up and down is more pronounced and a three-dimensional spatial audio effect is obtained.

In a voice KTV scene, the microphone positions of the co-streaming participants are all at one height, so they differ only left and right and not up and down. Voice KTV therefore needs only a 2D spatial audio effect: the computation along the up coordinate axis is omitted from the spatial audio calculation, which improves the running efficiency of the algorithm. Optionally, to protect the singing experience of the singer, spatial audio processing can be turned off for the singer, which prevents the spatial audio processing from conflicting with the karaoke sound effects and prevents the speech of other co-streaming participants from affecting the singing experience. Fig. 9 shows the preferred spatial audio coordinates in the voice KTV scene.
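The distinction between layouts with and without levels can be illustrated with a minimal sketch. It assumes a three-axis coordinate (right, up, front); the names SpatialCoord, effective_coord and should_spatialize are hypothetical and do not come from the disclosure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpatialCoord:
    """Hypothetical spatial audio coordinate of one microphone position."""
    right: float  # left/right axis
    up: float     # up/down axis
    front: float  # front/back axis


def effective_coord(coord: SpatialCoord, layout_has_levels: bool) -> SpatialCoord:
    """Return the coordinate used for rendering.

    Assumption: a layout without levels (e.g. voice KTV) only needs a 2D
    effect, so the up axis is flattened to zero and can be skipped later.
    """
    if layout_has_levels:
        return coord
    return SpatialCoord(coord.right, 0.0, coord.front)


def should_spatialize(is_singer: bool) -> bool:
    """Spatial processing is optionally switched off for the singer."""
    return not is_singer
```

Flattening the up axis is only one possible way to express the 2D case; an implementation could equally use a separate two-axis type.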

Fig. 10 is a schematic structural diagram of an audio processing apparatus according to an embodiment of the present disclosure. This embodiment is applicable to adjusting the audio data of each microphone position during co-streaming in a live broadcast room. The apparatus can perform the audio processing method, may be implemented in software and/or hardware, and may optionally be configured in any electronic device having a network communication function, such as a mobile terminal, a PC or a server. As shown in fig. 10, the audio processing apparatus in this embodiment may include:

a first determining module 1010, configured to determine a first microphone position and a second microphone position in the live broadcast room;

a second determining module 1020, configured to determine a first spatial audio coordinate corresponding to the first microphone position and a second spatial audio coordinate corresponding to the second microphone position, where the spatial audio coordinates are used to simulate a three-dimensional spatial sound effect;

an audio processing module 1030, configured to adjust the audio data of the first microphone position and the audio data of the second microphone position based on the first spatial audio coordinate and the second spatial audio coordinate.

On the basis of the foregoing embodiment, optionally, determining the first spatial audio coordinate corresponding to the first microphone position and the second spatial audio coordinate corresponding to the second microphone position includes:

determining, at a first device corresponding to the first microphone position, microphone position layout information corresponding to the first microphone position, where the microphone position layout information describes how the microphone positions are distributed when displayed in the live broadcast room;

determining the first spatial audio coordinate corresponding to the first microphone position based on the microphone position layout information corresponding to the first microphone position, where different microphone position layouts use different spatial audio configuration information, and the spatial audio configuration information records, for each layout, the association between each microphone position and its spatial audio coordinate;

receiving the second spatial audio coordinate corresponding to the second microphone position, where the second spatial audio coordinate is the spatial audio coordinate matching the second microphone position, determined by a second device corresponding to the second microphone position based on the microphone position layout information corresponding to the second microphone position.

On the basis of the foregoing embodiment, optionally, determining the first spatial audio coordinate corresponding to the first microphone position based on the microphone position layout information corresponding to the first microphone position includes:

acquiring, based on the microphone position layout information corresponding to the first microphone position, first spatial audio configuration information issued by the server, where the first spatial audio configuration information is the spatial audio configuration information configured for the microphone position layout information corresponding to the first microphone position;

and determining the first spatial audio coordinate corresponding to the first microphone position according to the issued first spatial audio configuration information.
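As an illustration of the lookup just described, the sketch below models the spatial audio configuration information as a table keyed by layout and microphone position index. The layout names, indices and coordinate values are invented for the example; the disclosure does not specify the configuration format.

```python
from typing import Dict, Tuple

Coord = Tuple[float, float, float]  # (right, up, front), an assumed convention

# Hypothetical configuration as it might be issued by a server:
# layout name -> microphone position index -> spatial audio coordinate.
SPATIAL_AUDIO_CONFIG: Dict[str, Dict[int, Coord]] = {
    "grid_2x2": {
        0: (-1.0, 1.0, 1.0),
        1: (1.0, 1.0, 1.0),
        2: (-1.0, -1.0, 1.0),
        3: (1.0, -1.0, 1.0),
    },
    "row_3": {
        0: (-1.0, 0.0, 1.0),
        1: (0.0, 0.0, 1.0),
        2: (1.0, 0.0, 1.0),
    },
}


def coord_for(layout: str, mic_position: int,
              config: Dict[str, Dict[int, Coord]] = SPATIAL_AUDIO_CONFIG) -> Coord:
    """Look up the coordinate associated with a microphone position in a layout."""
    try:
        return config[layout][mic_position]
    except KeyError as exc:
        raise LookupError(
            f"no spatial audio coordinate for layout={layout!r}, "
            f"mic_position={mic_position!r}"
        ) from exc
```

Keeping one table per layout mirrors the statement that different layouts use different configuration information, so a layout change only requires fetching a different table.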

On the basis of the foregoing embodiment, optionally, receiving the second spatial audio coordinate corresponding to the second microphone position includes:

receiving, through real-time communication with the second device corresponding to the second microphone position, the second spatial audio coordinate sent by that second device; or,

receiving the second spatial audio coordinate corresponding to the second microphone position from the server, where the second spatial audio coordinate was synchronized to the server in advance by the second device corresponding to the second microphone position.

On the basis of the foregoing embodiment, optionally, adjusting the audio data of the first microphone position and the audio data of the second microphone position based on the first spatial audio coordinate and the second spatial audio coordinate includes:

adjusting, at the first device corresponding to the first microphone position, the audio data of the second microphone position based on the second spatial audio coordinate, and outputting the audio at the first device;

sending the first spatial audio coordinate corresponding to the first microphone position to the second device corresponding to the second microphone position, so that the second device adjusts the audio data of the first microphone position based on the first spatial audio coordinate and outputs the audio at the second device.

On the basis of the foregoing embodiment, optionally, adjusting the audio data of the second microphone position based on the second spatial audio coordinate includes:

determining a first listening angle of the co-streaming participant at the first device relative to the second microphone position, where the first listening angle is a preset angle formed by a first reference direction and a second reference direction; the first reference direction is determined based on the direction of a connecting line to a first reference microphone position in the microphone position area of the live broadcast room at the first device, the first reference microphone position being the microphone position in which the co-streaming participant's region of interest in that area is located; and the second reference direction is determined based on the direction of the line connecting the co-streaming participant and the second microphone position in the microphone position area of the live broadcast room at the first device;

and adjusting the audio data of the second microphone position based on the second spatial audio coordinate and the first listening angle.
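The listening angle is described as the angle formed by two reference directions. The disclosure treats it as a preset value; the sketch below shows one way such an angle could be derived from 2D layout points, under the assumption that both directions start at the listener. The function name and the point representation are hypothetical.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def listening_angle(listener: Point, reference_target: Point, source: Point) -> float:
    """Angle in degrees between listener->reference_target and listener->source."""
    ax, ay = reference_target[0] - listener[0], reference_target[1] - listener[1]
    bx, by = source[0] - listener[0], source[1] - listener[1]
    norm_a = math.hypot(ax, ay)
    norm_b = math.hypot(bx, by)
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("reference target and source must differ from the listener")
    cosine = (ax * bx + ay * by) / (norm_a * norm_b)
    # Clamp to guard against rounding slightly outside [-1, 1].
    cosine = max(-1.0, min(1.0, cosine))
    return math.degrees(math.acos(cosine))
```

For example, a listener at the origin who looks toward (0, 1) hears a source at (1, 0) at 90 degrees to the reference direction.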

On the basis of the foregoing embodiment, optionally, adjusting the audio data of the first microphone position and the audio data of the second microphone position based on the first spatial audio coordinate and the second spatial audio coordinate includes:

determining the audio data of the first microphone position collected at the first device corresponding to the first microphone position;

receiving, at the first device corresponding to the first microphone position, the audio data of the second microphone position sent from the second device corresponding to the second microphone position;

adjusting the audio data of the first microphone position based on the first spatial audio coordinate, and adjusting the audio data of the second microphone position based on the second spatial audio coordinate;

and sending the adjusted audio data of the first microphone position and the adjusted audio data of the second microphone position to a third device for audio output, where the first device, the second device and the third device are different devices.

On the basis of the foregoing embodiment, optionally, adjusting the audio data of the first microphone position and the audio data of the second microphone position based on the first spatial audio coordinate and the second spatial audio coordinate includes:

transmitting the audio data of the first microphone position collected at the first device corresponding to the first microphone position, together with the first spatial audio coordinate, to a server, so that the server adjusts the audio data of the first microphone position based on the first spatial audio coordinate, adjusts the audio data of the second microphone position based on the second spatial audio coordinate, and sends the adjusted audio data of the first microphone position and the adjusted audio data of the second microphone position to a third device for audio output, where the second spatial audio coordinate and the audio data of the second microphone position are received by the server from the second device corresponding to the second microphone position.

On the basis of the foregoing embodiment, optionally, sending the adjusted audio data of the first microphone position and the adjusted audio data of the second microphone position to the third device for audio output includes:

merging the adjusted audio data of the first microphone position and the adjusted audio data of the second microphone position into a target audio stream;

and transmitting the target audio stream to a content distribution network, so that the content distribution network forwards the target audio stream to at least two third devices for audio output.
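The merging step can be pictured as mixing already-spatialized stereo streams sample by sample. The sketch below is a simplified illustration, assuming each stream is a list of (left, right) float samples in the range -1.0 to 1.0 at a common sample rate; merge_streams is a hypothetical name, and hard clipping stands in for whatever limiting a real mixer would apply.

```python
from typing import List, Sequence, Tuple

Frame = Tuple[float, float]  # (left, right) sample pair


def _clip(value: float) -> float:
    """Keep a mixed sample inside the valid range."""
    return max(-1.0, min(1.0, value))


def merge_streams(streams: Sequence[Sequence[Frame]]) -> List[Frame]:
    """Mix adjusted stereo streams into one target stream.

    Shorter streams are treated as silent after their last frame.
    """
    length = max((len(stream) for stream in streams), default=0)
    mixed: List[Frame] = []
    for i in range(length):
        left = sum(stream[i][0] for stream in streams if i < len(stream))
        right = sum(stream[i][1] for stream in streams if i < len(stream))
        mixed.append((_clip(left), _clip(right)))
    return mixed
```

Because every input stream already carries its own left/right balance, the merged stream preserves the direction of each microphone position for all third devices that receive it.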

On the basis of the foregoing embodiment, optionally, adjusting the audio data of the first microphone position based on the first spatial audio coordinate includes:

determining a second listening angle of the viewing object at the third device relative to the first microphone position, where the second listening angle is a preset angle formed by a third reference direction and a fourth reference direction; the third reference direction is determined based on the direction of a reference line, the reference line being a straight line perpendicular to and facing a second reference microphone position in the microphone position area of the live broadcast room at the third device, and the second reference microphone position being the microphone position located in the middle of that area; and the fourth reference direction is determined based on the line connecting a reference point preconfigured on the reference line with the first microphone position in the microphone position area of the live broadcast room at the third device;

adjusting the audio data of the first microphone position based on the first spatial audio coordinate and the second listening angle;

correspondingly, adjusting the audio data of the second microphone position based on the second spatial audio coordinate includes:

determining a third listening angle of the viewing object at the third device relative to the second microphone position, where the third listening angle is a preset angle formed by the third reference direction and a fifth reference direction, and the fifth reference direction is determined based on the line connecting the reference point preconfigured on the reference line with the second microphone position in the microphone position area of the live broadcast room at the third device;

and adjusting the audio data of the second microphone position based on the second spatial audio coordinate and the third listening angle.
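To make the viewer-side geometry concrete, the sketch below computes a signed angle between the reference line (from a reference point toward the middle microphone position) and the line to a given microphone position, then turns it into left/right gains. Constant-power stereo panning is used here only as a simple stand-in for a real spatial audio renderer, which the disclosure does not specify; all names are hypothetical.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def signed_azimuth(reference_point: Point, middle_position: Point,
                   mic_position: Point) -> float:
    """Signed angle in degrees from the reference line to the microphone position.

    Positive values mean the microphone position lies to the right of the
    reference line when looking from the reference point toward the middle.
    """
    rx = middle_position[0] - reference_point[0]
    ry = middle_position[1] - reference_point[1]
    mx = mic_position[0] - reference_point[0]
    my = mic_position[1] - reference_point[1]
    cross = rx * my - ry * mx
    dot = rx * mx + ry * my
    return math.degrees(math.atan2(-cross, dot))


def pan_gains(azimuth_deg: float) -> Tuple[float, float]:
    """Constant-power (left, right) gains for an azimuth clamped to [-90, 90]."""
    clamped = max(-90.0, min(90.0, azimuth_deg))
    theta = (clamped + 90.0) / 180.0 * (math.pi / 2.0)
    return math.cos(theta), math.sin(theta)
```

With the reference point at (0, 0) and the middle position at (0, 1), a microphone position at (1, 1) yields an azimuth of 45 degrees and therefore a louder right channel, while a position at (-1, 1) yields the mirrored result.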

According to the technical solution of this embodiment, for a first microphone position and a second microphone position in the microphone position area of the same live broadcast room, the first spatial audio coordinate corresponding to the first microphone position and the second spatial audio coordinate corresponding to the second microphone position are determined. Because these coordinates can be used to simulate three-dimensional spatial sound effects, the audio data of the first and second microphone positions can be adjusted in real time so that the audio of each microphone position in the live broadcast room has a spatial audio effect. This addresses the poor listening experience and low immersion of audio in multi-person live co-streaming scenes, and the problem that a listener cannot immediately tell which microphone position a sound comes from when several microphone positions in the same room produce sound simultaneously.

The audio processing apparatus provided by the embodiments of the present disclosure can perform the audio processing method provided by any embodiment of the present disclosure, and has functional modules and beneficial effects corresponding to the method performed.

It should be noted that each unit and module included in the above apparatus are only divided according to the functional logic, but not limited to the above division, so long as the corresponding functions can be implemented; in addition, the specific names of the functional units are also only for convenience of distinguishing from each other, and are not used to limit the protection scope of the embodiments of the present disclosure.

Fig. 11 is a schematic structural diagram of an electronic device according to an embodiment of the disclosure. It shows the structure of an electronic device 1100 (e.g., a terminal device or a server) suitable for implementing embodiments of the present disclosure. The terminal devices in the embodiments of the present disclosure may include, but are not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players) and in-vehicle terminals (e.g., in-vehicle navigation terminals), and stationary terminals such as digital TVs and desktop computers. The electronic device shown in fig. 11 is merely an example and should not impose any limitation on the functionality and scope of use of embodiments of the present disclosure.

As shown in fig. 11, the electronic device 1100 may include a processing means (e.g., a central processor, a graphics processor, etc.) 1101 that may perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 1102 or a program loaded from a storage means 1108 into a Random Access Memory (RAM) 1103. In the RAM 1103, various programs and data necessary for the operation of the electronic device 1100 are also stored. The processing device 1101, ROM 1102, and RAM 1103 are connected to each other by a bus 1104. An input/output (I/O) interface 1105 is also connected to the bus 1104.

In general, the following devices may be connected to the I/O interface 1105: input devices 1106 including, for example, a touch screen, touchpad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, and the like; an output device 1107 including, for example, a Liquid Crystal Display (LCD), a speaker, a vibrator, and the like; storage 1108, including for example, magnetic tape, hard disk, etc.; and a communication device 1109. The communication means 1109 may allow the electronic device 1100 to communicate wirelessly or by wire with other devices to exchange data. While fig. 11 illustrates an electronic device 1100 having various means, it is to be understood that not all illustrated means are required to be implemented or provided. More or fewer devices may be implemented or provided instead.

In particular, according to embodiments of the present disclosure, the processes described above with reference to flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a non-transitory computer readable medium, the computer program comprising program code for performing the method shown in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network via communications device 1109, or from storage device 1108, or from ROM 1102. The above-described functions defined in the methods of the embodiments of the present disclosure are performed when the computer program is executed by the processing device 1101.

The electronic device provided by the embodiment of the present disclosure and the audio processing method provided by the foregoing embodiment belong to the same inventive concept, and technical details not described in detail in the present embodiment may be referred to the foregoing embodiment, and the present embodiment has the same beneficial effects as the foregoing embodiment.

The present disclosure provides a computer storage medium having stored thereon a computer program which, when executed by a processor, implements the audio processing method provided by the above embodiments.

It should be noted that the computer readable medium described in the present disclosure may be a computer readable signal medium or a computer readable storage medium, or any combination of the two. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this disclosure, a computer-readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present disclosure, however, the computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, with the computer-readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. 
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, fiber optic cables, RF (radio frequency), and the like, or any suitable combination of the foregoing.

In some implementations, clients and servers may communicate using any currently known or future developed network protocol, such as HTTP (HyperText Transfer Protocol), and may be interconnected with any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include a local area network ("LAN"), a wide area network ("WAN"), an internetwork (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future developed networks.

The computer readable medium may be contained in the electronic device; or may exist alone without being incorporated into the electronic device.

The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: determine a first microphone position and a second microphone position in a live broadcast room; determine a first spatial audio coordinate corresponding to the first microphone position and a second spatial audio coordinate corresponding to the second microphone position, where the spatial audio coordinates are used to simulate three-dimensional spatial sound effects; and adjust the audio data of the first microphone position and the audio data of the second microphone position based on the first spatial audio coordinate and the second spatial audio coordinate.

Computer program code for carrying out operations of the present disclosure may be written in one or more programming languages, including, but not limited to, object oriented programming languages such as Java, Smalltalk and C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).

The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

The units involved in the embodiments of the present disclosure may be implemented by means of software, or may be implemented by means of hardware. The name of the unit does not in any way constitute a limitation of the unit itself, for example the first acquisition unit may also be described as "unit acquiring at least two internet protocol addresses".

The functions described above herein may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: a Field Programmable Gate Array (FPGA), an Application Specific Integrated Circuit (ASIC), an Application Specific Standard Product (ASSP), a system on a chip (SOC), a Complex Programmable Logic Device (CPLD), and the like.

In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

According to one or more embodiments of the present disclosure, example 1 provides an audio processing method, comprising:

determining a first spatial audio coordinate corresponding to the first microphone position and a second spatial audio coordinate corresponding to the second microphone position, where the spatial audio coordinates are used to simulate three-dimensional spatial sound effects;

According to one or more embodiments of the present disclosure, example 2 is the method of example 1, where determining the first spatial audio coordinate corresponding to the first microphone position and the second spatial audio coordinate corresponding to the second microphone position includes:

According to one or more embodiments of the present disclosure, example 3 is the method of example 2, where determining the first spatial audio coordinate corresponding to the first microphone position based on the microphone position layout information corresponding to the first microphone position includes:

acquiring, based on the microphone position layout information corresponding to the first microphone position, first spatial audio configuration information issued by a server, where the first spatial audio configuration information is the spatial audio configuration information configured for the microphone position layout information corresponding to the first microphone position;

According to one or more embodiments of the present disclosure, example 4 is the method of example 2, where receiving the second spatial audio coordinate corresponding to the second microphone position includes:

According to one or more embodiments of the present disclosure, example 5 is the method of example 1, where adjusting the audio data of the first microphone position and the audio data of the second microphone position based on the first spatial audio coordinate and the second spatial audio coordinate includes:

According to one or more embodiments of the present disclosure, example 6 is the method of example 5, where adjusting the audio data of the second microphone position based on the second spatial audio coordinate includes:

According to one or more embodiments of the present disclosure, example 7 is the method of example 1, where adjusting the audio data of the first microphone position and the audio data of the second microphone position based on the first spatial audio coordinate and the second spatial audio coordinate includes:

According to one or more embodiments of the present disclosure, example 8 is the method of example 1, where adjusting the audio data of the first microphone position and the audio data of the second microphone position based on the first spatial audio coordinate and the second spatial audio coordinate includes:

According to one or more embodiments of the present disclosure, example 9 is the method of example 7 or 8, where sending the adjusted audio data of the first microphone position and the adjusted audio data of the second microphone position to the third device for audio output includes:

According to one or more embodiments of the present disclosure, example 10 is the method of example 7 or 8, where adjusting the audio data of the first microphone position based on the first spatial audio coordinate includes:

According to one or more embodiments of the present disclosure, example 11 provides an audio processing apparatus, comprising:

a second determining module, configured to determine a first spatial audio coordinate corresponding to the first microphone position and a second spatial audio coordinate corresponding to the second microphone position, where the spatial audio coordinates are used to simulate three-dimensional spatial sound effects;

Example 12 provides an electronic device according to one or more embodiments of the present disclosure, the electronic device comprising:

one or more processors;

storage means for storing one or more programs,

the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the audio processing method as described in any of examples 1-10.

Example 13 provides a storage medium containing computer-executable instructions for performing the audio processing method of any one of examples 1-10 when executed by a computer processor, according to one or more embodiments of the present disclosure.

The foregoing description is only of the preferred embodiments of the present disclosure and an explanation of the technical principles employed. It will be appreciated by persons skilled in the art that the scope of the disclosure is not limited to technical solutions formed by the specific combinations of features described above, but also covers other technical solutions formed by any combination of the features described above or their equivalents without departing from the concept of the disclosure, for example, technical solutions formed by substituting the features described above with (but not limited to) technical features having similar functions disclosed in the present disclosure.

Moreover, although operations are depicted in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order. In certain circumstances, multitasking and parallel processing may be advantageous. Likewise, while several specific implementation details are included in the above discussion, these should not be construed as limiting the scope of the present disclosure. Certain features that are described in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination.

Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are example forms of implementing the claims.

Claims

1. An audio processing method, comprising:

2. The method of claim 1, wherein determining the first spatial audio coordinate corresponding to the first microphone position and the second spatial audio coordinate corresponding to the second microphone position comprises:

determining, at a first device end corresponding to the first microphone position, microphone position layout information corresponding to the first microphone position;

determining the first spatial audio coordinate corresponding to the first microphone position based on the microphone position layout information corresponding to the first microphone position;

and receiving the second spatial audio coordinate corresponding to the second microphone position, wherein the second spatial audio coordinate is determined by a second device end corresponding to the second microphone position based on microphone position layout information corresponding to the second microphone position.

3. The method of claim 2, wherein determining the first spatial audio coordinate corresponding to the first microphone position based on the microphone position layout information corresponding to the first microphone position comprises:

acquiring first spatial audio configuration information issued by a server based on the microphone position layout information corresponding to the first microphone position;

and determining the first spatial audio coordinate corresponding to the first microphone position according to the first spatial audio configuration information.

4. The method of claim 2, wherein receiving the second spatial audio coordinate corresponding to the second microphone position comprises:

5. The method of claim 1, wherein adjusting the audio data of the first microphone position and the audio data of the second microphone position based on the first spatial audio coordinate and the second spatial audio coordinate comprises:

sending the first spatial audio coordinate corresponding to the first microphone position to the second device end corresponding to the second microphone position, so that the second device end adjusts the audio data of the first microphone position based on the first spatial audio coordinate and outputs the adjusted audio at the second device end.

6. The method of claim 5, wherein adjusting the audio data of the second microphone position based on the second spatial audio coordinate comprises:

determining a first listening angle, relative to the second microphone position, of the mic-linking participant at the first microphone position corresponding to the first device end;

7. The method of claim 1, wherein adjusting the audio data of the first microphone position and the audio data of the second microphone position based on the first spatial audio coordinate and the second spatial audio coordinate comprises:

receiving, at a first device end corresponding to the first microphone position, audio data of the second microphone position sent from a second device end corresponding to the second microphone position;

and sending the adjusted audio data of the first microphone position and the adjusted audio data of the second microphone position to a third device end for audio output.

8. The method of claim 1, wherein adjusting the audio data of the first microphone position and the audio data of the second microphone position based on the first spatial audio coordinate and the second spatial audio coordinate comprises:

9. The method of claim 7 or 8, wherein sending the adjusted audio data of the first microphone position and the adjusted audio data of the second microphone position to the third device end for audio output comprises:

10. The method of claim 7 or 8, wherein adjusting the audio data of the first microphone position based on the first spatial audio coordinate comprises:

determining a second listening angle of the viewing object of the third device end relative to the first microphone position;

determining a third listening angle of the viewing object of the third device end relative to the second microphone position;

11. An audio processing apparatus, comprising:

12. An electronic device, the electronic device comprising:

one or more processors;

a storage device for storing one or more programs,

wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the audio processing method of any of claims 1-10.

13. A storage medium containing computer-executable instructions which, when executed by a computer processor, perform the audio processing method of any of claims 1-10.
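
By way of non-limiting illustration only, the following Python sketch shows one way the steps recited above could be realized: deriving a spatial audio coordinate for each microphone position from layout information, computing a listening angle between two positions, and adjusting audio according to that angle. The circular layout rule, the assumption that every listener faces the +y axis, the constant-power stereo pan used in place of a full three-dimensional renderer, and all names (SpatialCoord, coords_from_layout, listening_angle, pan_gains) are assumptions made for this sketch and are not taken from the disclosure.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class SpatialCoord:
    """Hypothetical spatial audio coordinate (x: right, y: front, z: up)."""
    x: float
    y: float
    z: float = 0.0


def coords_from_layout(num_positions: int, radius: float = 1.0) -> List[SpatialCoord]:
    """Assumed layout rule: spread the microphone positions evenly on a circle."""
    step = 2 * math.pi / num_positions
    return [
        SpatialCoord(x=radius * math.sin(i * step), y=radius * math.cos(i * step))
        for i in range(num_positions)
    ]


def listening_angle(listener: SpatialCoord, source: SpatialCoord) -> float:
    """Azimuth of `source` seen from `listener`, in degrees.

    0 means straight ahead (+y); positive values are to the listener's right.
    The listener is assumed to face +y.
    """
    return math.degrees(math.atan2(source.x - listener.x, source.y - listener.y))


def pan_gains(azimuth_deg: float) -> Tuple[float, float]:
    """Constant-power stereo pan: a simple stand-in for a real spatial renderer.

    Returns (left_gain, right_gain); angles beyond +/-90 degrees are clamped.
    """
    clamped = max(-90.0, min(90.0, azimuth_deg))
    theta = (clamped + 90.0) / 180.0 * (math.pi / 2)
    return math.cos(theta), math.sin(theta)
```

For example, with `coords = coords_from_layout(4)`, the call `listening_angle(coords[0], coords[1])` gives the azimuth at which the participant at the first microphone position would hear the second, and `pan_gains` converts that angle into left and right channel gains to apply to the audio data of the second microphone position.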