CN115866326A - Audio processing method and device for panoramic video - Google Patents

Audio processing method and device for panoramic video

Info

Publication number
CN115866326A
CN115866326A
Authority
CN
China
Prior art keywords
data
audio
panoramic
panoramic video
channel
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211535904.5A
Other languages
Chinese (zh)
Inventor
朱俊炜
聂大森
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Bilibili Technology Co Ltd
Original Assignee
Shanghai Bilibili Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Bilibili Technology Co Ltd filed Critical Shanghai Bilibili Technology Co Ltd
Priority to CN202211535904.5A priority Critical patent/CN115866326A/en
Publication of CN115866326A publication Critical patent/CN115866326A/en
Pending legal-status Critical Current

Landscapes

  • Stereophonic System (AREA)

Abstract

The embodiments of the present application disclose an audio processing method and apparatus for panoramic video. The method comprises: acquiring a panoramic audio/video file containing panoramic video data and multi-channel audio data; performing panoramic mapping processing on the panoramic video data and the multi-channel audio data, and then playing both; and, when the position data of the center viewing object corresponding to the panoramic video data changes during playback, applying the changed position data of the center viewing object to the center listening object corresponding to the multi-channel audio data so as to change the audio playback effect. With this scheme, if the position data of the center viewing object changes while the panoramic video data and multi-channel audio data are playing, the audio playback effect of the multi-channel audio data changes along with the center viewing object, so that the audio data and video data change synchronously and the user's immersive experience is improved.

Description

Audio processing method and device for panoramic video
Technical Field
The embodiment of the application relates to the field of data processing, in particular to an audio processing method and device for panoramic video.
Background
Panoramic video, also called surround video, immersive video, or spherical video, is a video recording shot with an omnidirectional camera or a group of cameras so that views in all directions are recorded simultaneously. It supports multi-angle ("360-degree") playback: on a display, the viewer can control the viewing direction as with a panoramic image, and the video can also be played on a display or projector arranged as a sphere or part of a sphere.
Many conventional HRTF technologies for headphones have been developed. They use the original surround-sound audio and a series of algorithms to simulate, in stereo headphones, a number of surround channels around the listener, so that the listener perceives sound sources all around as if they were physically present.
However, when a panoramic video in the prior art is played and the viewer looks in different directions by dragging the picture or moving the mobile phone, the sound heard does not change. For example, if a sound source is on the right side of the picture and the picture is then turned to face the sound source directly, the sound is still heard on the right instead of in front. The user therefore cannot perceive the change in sound direction while watching the panoramic video, which degrades the user experience.
Disclosure of Invention
In view of the foregoing problems, the present application provides an audio processing method, apparatus, computing device, and computer storage medium for panoramic video, so as to solve the following problem: existing audio processing methods prevent the user from perceiving the change in sound direction when watching a panoramic video, which degrades the user experience.
According to an aspect of an embodiment of the present application, there is provided an audio processing method for a panoramic video, including:
acquiring a panoramic audio and video file containing panoramic video data and multi-channel audio data;
carrying out panoramic mapping processing on the panoramic video data and the multi-channel audio data, and then playing the panoramic video data and the multi-channel audio data;
when the position data of the center viewing object corresponding to the panoramic video data changes during playback, applying the changed position data of the center viewing object to the center listening object corresponding to the multi-channel audio data so as to synchronously change the video playback effect and the audio playback effect.
Further, the acquiring a panoramic audio/video file containing panoramic video data and multi-channel audio data further comprises:
and responding to a panoramic audio and video data acquisition request sent by a user, and pulling the panoramic audio and video file uploaded by the production terminal from the server.
Further, the panorama mapping processing of the panorama video data and the multi-channel audio data further includes:
mapping the panoramic video data to a panoramic mapping spherical surface;
multi-channel audio data is mapped onto respective audio processing nodes in an audio space.
Further, mapping the panoramic video data onto the panoramic mapping sphere further comprises:
and generating a panoramic mapping spherical surface, and mapping frame pictures in the panoramic video data onto the panoramic mapping spherical surface.
Further, generating the panorama mapping sphere further comprises:
and taking the side edge of the frame picture in the panoramic video data as a spherical semicircular arc to generate a corresponding panoramic mapping spherical surface.
Further, mapping the multi-channel audio data onto respective audio processing nodes in an audio space further comprises:
calling an audio context object, and adding audio processing nodes corresponding to each channel of the multi-channel audio data in the audio context object;
and, for each of the channels, setting the position data of the audio processing node corresponding to that channel according to the preset azimuth data of the channel.
Further, the method further comprises:
and setting the attribute of each audio processing node as an algorithm identifier of a preset sound effect positioning algorithm.
Further, applying the changed position data of the center viewing object to the center listening object corresponding to the multi-channel audio data further comprises:
calculating the relative position change data of each audio processing node relative to the center listening object according to the position data of the center viewing object after the change;
and updating the position data of each audio processing node according to the relative position change data.
Further, applying the modified position data of the center viewing object to the center listening object corresponding to the multi-channel audio data further comprises:
and updating the position data of the central listening object according to the changed position data of the central viewing object.
According to another aspect of the embodiments of the present application, there is provided an audio processing method for panoramic video, including:
responding to panoramic video playing operation executed by a user, and acquiring and playing a panoramic audio and video file; the panoramic audio and video file comprises panoramic video data and multi-channel audio data;
monitoring a central viewing position change operation of a user;
and responding to the center watching position changing operation, determining changed position data of a center watching object corresponding to the panoramic video data, and applying the changed position data of the center watching object to a center listening object corresponding to the multi-channel audio data so as to synchronously change the video playing effect and the audio playing effect.
Further, responding to a panoramic video playing operation executed by a user, acquiring and playing a panoramic audio and video file further comprises:
in response to the panoramic video playing operation executed by the user, pulling the panoramic audio and video file uploaded by the production terminal from the server terminal;
and carrying out panoramic mapping processing on the panoramic video data and the multi-channel audio data, and then playing the panoramic video data and the multi-channel audio data.
Further, the panorama mapping processing of the panorama video data and the multi-channel audio data further includes:
mapping the panoramic video data to a panoramic mapping spherical surface;
multi-channel audio data is mapped onto respective audio processing nodes in an audio space.
Further, mapping the multi-channel audio data onto respective audio processing nodes in the audio space further comprises:
calling an audio context object, and adding audio processing nodes corresponding to each channel of the multi-channel audio data in the audio context object;
and, for each of the channels, setting the position data of the audio processing node corresponding to that channel according to the preset azimuth data of the channel.
Further, applying the modified position data of the center viewing object to the center listening object corresponding to the multi-channel audio data further comprises:
calculating the relative position change data of each audio processing node relative to the center listening object according to the position data of the center viewing object after the change;
and updating the position data of each audio processing node according to the relative position change data.
Further, applying the modified position data of the center viewing object to the center listening object corresponding to the multi-channel audio data further comprises:
and updating the position data of the central listening object according to the changed position data of the central viewing object.
According to another aspect of the embodiments of the present application, there is provided an audio processing apparatus for panoramic video, including:
the file acquisition module is used for acquiring a panoramic audio and video file containing panoramic video data and multi-channel audio data;
the playing module is used for carrying out panoramic mapping processing on the panoramic video data and the multi-channel audio data and then playing the panoramic video data and the multi-channel audio data;
and the processing module is used for applying the changed position data of the central viewing object to the central listening object corresponding to the multi-channel audio data when the position data of the central viewing object corresponding to the panoramic video data is changed in the playing process so as to synchronously change the video playing effect and the audio playing effect.
According to another aspect of the embodiments of the present application, there is provided an audio processing apparatus for panoramic video, including:
the response module is used for responding to panoramic video playing operation executed by a user and acquiring and playing a panoramic audio and video file; the panoramic audio and video file comprises panoramic video data and multi-channel audio data;
the monitoring module is used for monitoring the central viewing position changing operation of the user;
and the synchronization module is used for responding to the center watching position changing operation, determining the changed position data of the center watching object corresponding to the panoramic video data, and applying the changed position data of the center watching object to the center listening object corresponding to the multi-channel audio data so as to synchronously change the video playing effect and the audio playing effect.
According to yet another aspect of embodiments herein, there is provided a computing device comprising a processor, a memory, and a communication interface, which communicate with one another through a communication bus;
the memory is used for storing at least one executable instruction, and the executable instruction enables the processor to execute the operation corresponding to the audio processing method of the panoramic video.
According to another aspect of the embodiments of the present application, there is provided a computer storage medium having at least one executable instruction stored therein, where the executable instruction causes a processor to perform operations corresponding to the audio processing method of the panoramic video.
According to the audio processing method and apparatus for panoramic video provided by the embodiments of the present application, a panoramic audio/video file containing panoramic video data and multi-channel audio data is acquired; panoramic mapping processing is performed on the panoramic video data and the multi-channel audio data, and both are then played; and when the position data of the center viewing object corresponding to the panoramic video data changes during playback, the changed position data of the center viewing object is applied to the center listening object corresponding to the multi-channel audio data so as to change the audio playback effect. With this scheme, when the orientation of the picture changes while the panoramic video is played at a web page end or the like, the changed position data of the center viewing object is applied to the center listening object, so that the sound image follows the change, and no panoramic processing program needs to be installed on the computer or mobile phone. The synchronization of audio data and video data can be realized with the Javascript language together with the Web API, so the user does not need to download an application. By calling the audio context object and adding, within it, an audio processing node for each channel of the multi-channel audio data, a stereophonic effect is produced and the user obtains an immersive surround-sound experience.
The foregoing description is only an overview of the technical solutions of the embodiments of the present application, and the embodiments of the present application can be implemented according to the content of the description in order to make the technical means of the embodiments of the present application more clearly understood, and the detailed description of the embodiments of the present application will be given below in order to make the foregoing and other objects, features, and advantages of the embodiments of the present application more clearly understandable.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the embodiments of the present application. Also, like reference numerals are used to refer to like parts throughout the drawings. In the drawings:
FIG. 1 shows a schematic flow diagram of an audio processing method of panoramic video according to an embodiment of the present application;
FIG. 2a shows a schematic flow diagram of an audio processing method of panoramic video according to another embodiment of the present application;
FIG. 2b shows a first mapping diagram of panoramic video data and multi-channel audio data according to an embodiment of the present application;
FIG. 2c shows a second mapping diagram of panoramic video data and multi-channel audio data according to an embodiment of the present application;
FIG. 3 shows a schematic flow diagram of an audio processing method of panoramic video according to another embodiment of the present application;
fig. 4 is a block diagram illustrating an audio processing apparatus for panoramic video according to an embodiment of the present application;
fig. 5 is a block diagram showing a configuration of an audio processing apparatus for panoramic video according to another embodiment of the present application;
FIG. 6 illustrates a schematic structural diagram of a computing device according to one embodiment of the present application.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
First, the noun terms to which one or more embodiments of the present application relate are explained.
Panoramic video (360-degree video): also known as surround video, immersive video, or spherical video; a video recording shot with an omnidirectional camera or a group of cameras so that views in all directions are recorded simultaneously. During playback on a display, the viewer can control the viewing direction as with a panoramic image, and the video can also be played on a display or projector shaped as a sphere or part of a sphere.
Head Related Transfer Function (HRTF): a set of filters. Stereo sound is generated by exploiting the Interaural Time Delay (ITD), the Interaural Amplitude Difference (IAD), and the frequency response of the pinna, so that as sound travels to the pinna, ear canal, and eardrum, the user perceives surround sound.
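As a rough illustration of the interaural time delay mentioned above, the classical Woodworth approximation (not part of this patent; the head radius and speed of sound below are assumed typical values) estimates the ITD for a source at a given azimuth:

```javascript
// Woodworth approximation of the Interaural Time Delay (ITD):
// ITD ≈ (a / c) * (θ + sin θ), with θ the source azimuth in radians,
// a the head radius, and c the speed of sound. The constants are
// illustrative assumptions, not values taken from the patent.
const HEAD_RADIUS_M = 0.0875;   // typical adult head radius in meters
const SPEED_OF_SOUND = 343;     // m/s in air at ~20 °C

function interauralTimeDelay(azimuthRad) {
  return (HEAD_RADIUS_M / SPEED_OF_SOUND) * (azimuthRad + Math.sin(azimuthRad));
}

// A source straight ahead produces no delay; one at 90° gives the maximum.
console.log(interauralTimeDelay(0));            // 0
console.log(interauralTimeDelay(Math.PI / 2));  // ≈ 0.000656 (seconds)
```

The sub-millisecond scale of this delay is precisely what HRTF filtering reproduces in headphones to create the sensation of direction.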
Audio Context (AudioContext) object: an audio processing graph built from audio modules linked together, each represented by an audio graph node (AudioNode). The audio context object controls the creation of the nodes it contains and the execution of audio processing and decoding. An AudioContext object must be created before any other operation, since everything happens inside it. Rather than initializing a new AudioContext each time, one AudioContext object can be created and reused, and a single AudioContext can serve multiple different audio sources and pipelines simultaneously.
Audio processing node: for example a PannerNode, which represents the position, orientation, and behavior of an audio source signal in a simulated physical space. Right-handed Cartesian coordinates describe the position of the sound source as a vector, and its direction as a three-dimensional cone.
Surround sound: human hearing naturally perceives the position of a spatial sound source within the whole space, so surround sound conveys a real sense of acoustic space. Conventional stereo uses modern electroacoustic technology to adjust the volume and phase of each frequency component of the left and right channels, without changing the loudspeaker positions, so that each group of sounds produces a psychological "sound image" at a different position in front of the listener. Surround sound adds two loudspeakers behind the listener, so that sound images appear at different positions both in front and behind, forming an omnidirectional spatial perception of the sound.
Fig. 1 is a flowchart illustrating an audio processing method for panoramic video according to an embodiment of the present application, which may be performed by a playback end, as shown in fig. 1, and the method includes the following steps:
step S101: and acquiring a panoramic audio and video file containing panoramic video data and multi-channel audio data.
Specifically, the playback end can pull from the server a panoramic audio/video file uploaded by the production end, where the file comprises panoramic video data and multi-channel audio data. At the production end, an omnidirectional camera or a group of cameras records views in all directions simultaneously, and the footage is processed by stitching, fusion, a panoramic video coding algorithm, and the like to obtain the panoramic video data; audio in a plurality of directions is captured through multiple channels to obtain the multi-channel audio data. The production end then synthesizes the panoramic video data and multi-channel audio data into a panoramic audio/video file and uploads it to the server for distribution.
Step S102: and carrying out panoramic mapping processing on the panoramic video data and the multi-channel audio data, and then playing the panoramic video data and the multi-channel audio data.
Specifically, a panoramic video is in essence a spherical video: the playback end plays the picture on a stitched spherical projection surface centered on the shooting point. There is currently no dedicated storage format for panoramic video, and storing the projection of a panoramic image on a sphere directly is inconvenient, so the 360-degree footage around the shooting point is usually stitched, in post-processing, into a flat rectangular picture for storage. The first step in playing a panoramic video at the playback end is therefore to remap the transformed rectangular picture back onto a sphere. In this step, the panoramic video data needs to be mapped onto a panoramic mapping sphere, and, based on the channel azimuths, the multi-channel audio data is mapped onto the respective audio processing nodes in the audio space.
Step S103: when the position data of the center viewing object corresponding to the panoramic video data is changed in the playing process, the position data after the change of the center viewing object is applied to the center listening object corresponding to the multi-channel audio data.
If, while the panoramic video is playing, the user at the playback end (e.g., a web page end) looks in different directions by dragging the picture with a mouse or gesture or by moving the mobile phone, the playback end monitors the position data of the center viewing object corresponding to the panoramic video data in real time to keep the audio in sync. If that position data changes, the changed position data of the center viewing object is applied to the center listening object corresponding to the multi-channel audio data so as to synchronously change the video and audio playback effects, thereby keeping the audio data and video data synchronized and improving the user experience.
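The monitoring step described here can be sketched as a small view-state tracker that accumulates drag deltas into a yaw/pitch pair. The event name and the degrees-per-pixel sensitivity below are illustrative assumptions, not values specified by the patent:

```javascript
// Minimal sketch of tracking the center viewing object's orientation from
// drag gestures. The 0.25°-per-pixel sensitivity is an assumed value.
class ViewState {
  constructor() {
    this.yawDeg = 0;    // rotation around the vertical axis
    this.pitchDeg = 0;  // up/down tilt, clamped so the view cannot flip over
  }
  applyDrag(dxPx, dyPx, degPerPx = 0.25) {
    this.yawDeg = (this.yawDeg + dxPx * degPerPx) % 360;
    this.pitchDeg = Math.max(-90, Math.min(90, this.pitchDeg - dyPx * degPerPx));
    return this;
  }
}

// In a browser the tracker would be fed from pointer events, e.g.:
// canvas.addEventListener('pointermove', e => view.applyDrag(e.movementX, e.movementY));
const view = new ViewState();
view.applyDrag(180, -40);  // drag 180 px to the right and 40 px up
console.log(view.yawDeg, view.pitchDeg);  // 45 10
```

The resulting yaw/pitch pair is exactly the "changed position data" that the playback end then applies to both the camera and the center listening object.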
According to the audio processing method for panoramic video provided by this embodiment of the application, a panoramic audio/video file containing panoramic video data and multi-channel audio data is acquired; panoramic mapping processing is performed on both, and they are then played; when the position data of the center viewing object corresponding to the panoramic video data changes during playback, the changed position data is applied to the center listening object corresponding to the multi-channel audio data so as to synchronously change the video and audio playback effects. With this method, if the position data of the center viewing object changes during playback, the audio playback effect of the multi-channel audio data changes along with it, so that the audio data and video data change synchronously and the user's immersive experience is improved.
Fig. 2a is a flow chart illustrating an audio processing method of a panoramic video according to another embodiment of the present application, which may be executed by a playback end, as shown in fig. 2a, the method includes the following steps:
step S201: and in response to a panoramic audio and video data acquisition request sent by a user, pulling the panoramic audio and video file uploaded by the production terminal from the server terminal.
The panoramic video production end collects video recordings shot by an omnidirectional camera or a group of cameras that record views in all directions simultaneously, generates the panoramic video data with software, and generates the multi-channel audio data through channel parameter setting and audio capture. Specifically, audio sources in multiple directions are generated, and the directions of these audio sources are defined as fixed parameters. For example, standard 5.1 comprises six channels: a bass (LFE) channel, a front-center channel, a front-left channel, a front-right channel, a rear-left channel, and a rear-right channel, whose directions are defined as fixed. The production end then synthesizes the panoramic video data and multi-channel audio data into a panoramic audio/video file and uploads it to the server for distribution.
In this step, the playback end responds to a panoramic audio/video data acquisition request sent by a user and pulls from the server the panoramic audio/video file uploaded by the production end. Specifically, the playback end may pull the file through a fetch() request or a HyperText Transfer Protocol (HTTP) request; the HTTP request may be implemented through the corresponding Application Programming Interface (API), namely the XMLHttpRequest API.
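A minimal sketch of the pull step via fetch(); the endpoint path, file extension, and helper names are illustrative assumptions, not details from the patent:

```javascript
// Build the media URL for a given video identifier. The "/panorama/…"
// path scheme is a hypothetical example.
function buildMediaUrl(base, videoId) {
  return `${base}/panorama/${encodeURIComponent(videoId)}.mp4`;
}

// Browser-side: pull the panoramic A/V file as an ArrayBuffer, ready to be
// handed to the demuxer/decoder. Not invoked here, since it needs a server.
async function pullPanoramaFile(base, videoId) {
  const resp = await fetch(buildMediaUrl(base, videoId));
  if (!resp.ok) throw new Error(`HTTP ${resp.status}`);
  return resp.arrayBuffer();
}

console.log(buildMediaUrl('https://example.com', 'av123'));
// https://example.com/panorama/av123.mp4
```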
Step S202: mapping the panoramic video data to a panoramic mapping spherical surface; multi-channel audio data is mapped onto respective audio processing nodes in an audio space.
Specifically, the panoramic video is a spherical video, and in this step, after the panoramic audio/video file uploaded by the production end is pulled from the service end through the playing end, the panoramic video data needs to be mapped onto a panoramic mapping sphere, and the multi-channel audio data needs to be mapped onto each audio processing node in the audio space.
In an alternative embodiment, step S202 further includes: and generating a panoramic mapping spherical surface, and mapping frame pictures in the panoramic video data onto the panoramic mapping spherical surface. And taking the side edge of the frame picture in the panoramic video data as a spherical semicircular arc to generate a corresponding panoramic mapping spherical surface.
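The "side edge as a spherical semicircular arc" construction above is, in effect, the standard equirectangular projection: each pixel of the flat frame maps to a latitude/longitude on the sphere. A sketch of the inverse mapping, under the assumption of a plain equirectangular frame layout (not guaranteed by the patent):

```javascript
// Map normalized equirectangular frame coordinates (u, v in [0, 1]) onto a
// unit sphere. u spans longitude (-180°..180°), v spans latitude (90°..-90°);
// -z is taken as "forward" in right-handed coordinates.
function frameToSphere(u, v) {
  const lon = (u - 0.5) * 2 * Math.PI;   // longitude in radians
  const lat = (0.5 - v) * Math.PI;       // latitude in radians
  return {
    x: Math.cos(lat) * Math.sin(lon),
    y: Math.sin(lat),
    z: -Math.cos(lat) * Math.cos(lon),
  };
}

// The center of the frame lands directly in front of the viewer.
console.log(frameToSphere(0.5, 0.5));  // { x: 0, y: 0, z: -1 }
```

A renderer would evaluate this mapping per vertex of the panoramic mapping sphere to assign texture coordinates from the rectangular frame.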
In an alternative embodiment, step S202 further includes: calling an audio context object, and adding audio processing nodes corresponding to each channel of the multi-channel audio data in the audio context object; and setting the position data of the audio processing node corresponding to each sound channel according to the preset azimuth data of the sound channel aiming at each sound channel in each sound channel.
In an alternative embodiment, the method further comprises: and setting the attribute of each audio processing node as an algorithm identifier of a preset sound effect positioning algorithm.
The playback end plays the picture on the stitched spherical projection surface centered on the shooting point. Since there is currently no dedicated storage format for panoramic video and storing the spherical projection directly is inconvenient, the 360-degree footage around the shooting point is usually stitched into a flat rectangular picture in post-processing and stored in that form. The first step in playing the panoramic video at the playback end is to remap the transformed rectangular picture back onto a sphere: specifically, the side edge of a frame picture in the panoramic video data is taken as a spherical semicircular arc to generate the corresponding panoramic mapping sphere, and the frame pictures are then mapped onto it. For the multi-channel audio data, whose channel directions were defined at the production end, each channel is mapped onto an audio processing node in the HRTF implementation space (that is, the audio space) at the playback end. If the playback end uses the web application programming interface (Web API), an AudioContext object is created directly for the multi-channel audio data, audio processing nodes (such as PannerNodes) corresponding to each channel are added within it, and the panningModel attribute of each PannerNode is set to the algorithm identifier of a preset sound-effect positioning algorithm.
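Using the Web Audio API as described, the 5.1 source can be split and each channel fed into its own PannerNode with panningModel set to 'HRTF'. The azimuth angles below follow common 5.1 placement conventions and are assumptions, not values fixed by the patent:

```javascript
// Assumed 5.1 loudspeaker azimuths in degrees (0° = straight ahead,
// positive = toward the listener's right). The LFE channel has no direction.
const CHANNEL_AZIMUTHS_DEG = {
  frontLeft: -30, frontRight: 30, center: 0,
  rearLeft: -110, rearRight: 110, lfe: 0,
};

// Convert an azimuth to a position on a unit circle around the listener
// (Web Audio uses right-handed coordinates with -z pointing forward).
function azimuthToPosition(azDeg) {
  const az = (azDeg * Math.PI) / 180;
  return { x: Math.sin(az), y: 0, z: -Math.cos(az) };
}

// Browser-only wiring sketch: split the decoded 5.1 source and give each
// channel an HRTF PannerNode at its preset azimuth. Not invoked here.
function setupPanners(audioCtx, sourceNode) {
  const names = Object.keys(CHANNEL_AZIMUTHS_DEG);
  const splitter = audioCtx.createChannelSplitter(names.length);
  sourceNode.connect(splitter);
  return names.map((name, i) => {
    const panner = audioCtx.createPanner();
    panner.panningModel = 'HRTF';
    const p = azimuthToPosition(CHANNEL_AZIMUTHS_DEG[name]);
    panner.positionX.value = p.x;
    panner.positionY.value = p.y;
    panner.positionZ.value = p.z;
    splitter.connect(panner, i);       // route channel i into its panner
    panner.connect(audioCtx.destination);
    return panner;
  });
}

console.log(azimuthToPosition(0));  // { x: 0, y: 0, z: -1 }
```

The channel-to-index order of the splitter must match the channel order of the decoded file; the order assumed above is only an example.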
Step S203: and playing the panoramic video data and the multi-channel audio data.
Panoramic video is generally implemented by using the video as a sky box or a spherical map and moving the coordinates and Euler angles of a center viewing object (such as a camera) at the center of the space; for web page playback this can be implemented in the Javascript language. Fig. 2b is a schematic mapping diagram of panoramic video data and multi-channel audio data according to an embodiment of the present application. As shown in fig. 2b, while the panoramic video data and multi-channel audio data are playing, the mapping state of the panoramic video data is shown in the left diagram and that of the multi-channel audio data in the right diagram. The panoramic mapping sphere is represented by the outer circle in the left diagram, which shows the positional relationship between the video content shown at the center of the current screen and the center viewing object. The audio space is represented by the outer circle in the right diagram, with each channel drawn as a loudspeaker; the six channels on the sphere each correspond to one PannerNode, and the right diagram shows the correspondence between the center listening object and the spherical coordinates and Euler angles of each channel's PannerNode in the AudioContext object.
Step S204: when the position data of the center viewing object corresponding to the panoramic video data is changed in the playing process, the relative position change data of each audio processing node relative to the center listening object is calculated according to the changed position data of the center viewing object.
During playing, the playing end monitors the position data of the central viewing object. When the positional relationship between the video content shown at the center of the playing end's picture and the central viewing object changes, that is, when the coordinates or Euler angles of the central viewing object change, the changed position data are obtained, and the relative position change data of each PannerNode with respect to the central listening object are calculated from the changed position data of the central viewing object.
Step S205: and updating the position data of each audio processing node according to the relative position change data.
Fig. 2c shows a second schematic diagram of the mapping of panoramic video data and multi-channel audio data according to an embodiment of the present application. As shown in Fig. 2c, the position data of each PannerNode are updated according to the relative position change data. For example, if the central viewing object turns 45° clockwise to the right, this corresponds to the PannerNodes around the central listening object turning 45° counterclockwise to the left; the coordinates and Euler angles of each PannerNode in the audio space are updated and set according to the relative position change data, producing the same effect as the central listening object itself turning 45° clockwise to the right.
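The relative-position update of step S205 can be sketched for the yaw component as follows; a pitch (Euler-angle) change would use an analogous rotation about the x axis. Assumptions made for the example: the central listening object sits at the origin of the audio space, the Web Audio coordinate convention (x right, y up, forward along -z) is used, and the function name is illustrative.

```javascript
// Illustrative sketch of step S205 for the yaw component: the central
// listening object stays at the origin, and when the central viewing object
// turns yawDeg clockwise (to the right, seen from above), every PannerNode
// position is rotated yawDeg counterclockwise about the vertical y axis.
function counterRotatePanner(pos, yawDeg) {
  const rad = (yawDeg * Math.PI) / 180;
  const c = Math.cos(rad);
  const s = Math.sin(rad);
  return {
    x: pos.x * c + pos.z * s, // counterclockwise rotation about +y
    y: pos.y,
    z: -pos.x * s + pos.z * c,
  };
}
```

With this convention, a viewer turning 90° to the right moves a source that was directly in front (0, 0, -1) to the viewer's left (-1, 0, 0), matching the 45° example in the description.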
In an alternative mode, the changed position data of the central viewing object can be applied directly to the central listening object: the position data of the central listening object are updated according to the changed position data of the central viewing object, so that the audio playing effect of the multi-channel audio data follows the central viewing object and the audio data and video data change synchronously. Specifically, the changed coordinates and Euler angles of the central viewing object are applied to the central listening object (listener) of the HRTF; for example, with the Web API, the AudioContext object has a listener attribute, and the coordinates and Euler angles of the central listening object in the audio space can be set on this listener attribute to update its position data.
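The alternative of moving the listener itself can be sketched as below, assuming the standard AudioListener properties of the browser Web Audio API (forwardX/Y/Z, upX/Y/Z) and considering only the yaw angle for brevity; the helper names are illustrative, and the browser-dependent function is not executed here.

```javascript
// Pure helper: yaw of the central viewing object (degrees, positive =
// clockwise / to the right) -> listener forward vector in Web Audio
// coordinates (default forward along -z).
function yawToForward(yawDeg) {
  const rad = (yawDeg * Math.PI) / 180;
  return { x: Math.sin(rad), y: 0, z: -Math.cos(rad) };
}

// Browser-only (not executed here): apply the changed yaw to the
// AudioContext's listener attribute via its forward/up properties.
function applyYawToListener(audioCtx, yawDeg) {
  const f = yawToForward(yawDeg);
  const l = audioCtx.listener;
  l.forwardX.value = f.x;
  l.forwardY.value = f.y;
  l.forwardZ.value = f.z;
  l.upX.value = 0; // up stays vertical for a pure yaw change
  l.upY.value = 1;
  l.upZ.value = 0;
}
```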
In the prior art, the user is mostly required to download a panorama processing program (an APP or the like) on a computer or mobile terminal to obtain the spatial audio effect of a panoramic video. To further improve the user experience and simplify access to panoramic video, this embodiment does not require the user to download such a program; it can be implemented directly at the playing end with the JavaScript language in combination with the Web API.
According to the audio processing method of a panoramic video provided by this embodiment, a panoramic audio/video file containing panoramic video data and multi-channel audio data is obtained; the panoramic video data and the multi-channel audio data are played after panoramic mapping processing; and when the position data of the central viewing object corresponding to the panoramic video data change during playing, the changed position data of the central viewing object are applied to the central listening object corresponding to the multi-channel audio data to change the audio playing effect. When the orientation of the picture changes while the panoramic video is played at a web page end or the like, the method applies the changed position data of the central viewing object to the central listening object corresponding to the multi-channel audio data, so that the sound image follows the change of the position data, without installing a panorama processing program on a computer or mobile phone. Audio data and video data can be kept synchronized with the JavaScript language in combination with the Web API, without requiring the user to download an application; and by calling the audio context object and adding, in the audio context object, the audio processing nodes corresponding to the channels of the multi-channel audio data, a stereophonic effect is produced and the user obtains an immersive surround-sound experience.
Fig. 3 is a flowchart illustrating an audio processing method of a panoramic video according to another embodiment of the present application, and as shown in fig. 3, the method includes the following steps:
step S301: and responding to the panoramic video playing operation executed by the user, and acquiring and playing the panoramic audio and video file.
Specifically, the panoramic audio/video file comprises panoramic video data and multi-channel audio data. The production terminal synthesizes the panoramic video data and the multi-channel audio data into the panoramic audio/video file and uploads it to the server terminal for publishing; in response to a panoramic video playing operation performed by the user, the playing end can then pull the panoramic audio/video file uploaded by the production terminal from the server terminal and play it.
In an optional implementation, step S301 further includes: and carrying out panoramic mapping processing on the panoramic video data and the multi-channel audio data, and then playing the panoramic video data and the multi-channel audio data.
In an optional implementation, step S301 further includes: mapping the panoramic video data to a panoramic mapping spherical surface; multi-channel audio data is mapped onto respective audio processing nodes in an audio space.
In an optional implementation, step S301 further includes: calling an audio context object, and adding audio processing nodes corresponding to all channels of the multi-channel audio data in the audio context object; and setting the position data of the audio processing node corresponding to each sound channel according to the preset azimuth data of the sound channel aiming at each sound channel in each sound channel.
Step S302: the center viewing position changing operation of the user is monitored.
Specifically, while the panoramic video is played at the playing end, the user can change the center viewing position, for example by dragging the picture with a mouse or a gesture or by moving the mobile phone, so as to view different directions.
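The monitoring of step S302 can be sketched with browser pointer events as follows. The degreesPerPixel sensitivity and all function names are illustrative assumptions; the event wiring is browser-only and not executed here, while the pixel-to-angle conversion is pure arithmetic.

```javascript
// Pure helper: pixel drag deltas -> yaw/pitch angle deltas; the
// sensitivity value is an illustrative assumption.
function dragToAngles(dxPx, dyPx, degreesPerPixel = 0.25) {
  return { yawDeg: dxPx * degreesPerPixel, pitchDeg: dyPx * degreesPerPixel };
}

// Browser-only (not executed here): watch pointer drags on the player
// element and hand the resulting angle deltas to a callback, which would
// update the central viewing object and the central listening object
// together (steps S302/S303).
function watchViewChanges(element, onViewChange) {
  let last = null;
  element.addEventListener('pointerdown', (e) => {
    last = { x: e.clientX, y: e.clientY };
  });
  element.addEventListener('pointermove', (e) => {
    if (!last) return;
    const d = dragToAngles(e.clientX - last.x, e.clientY - last.y);
    last = { x: e.clientX, y: e.clientY };
    onViewChange(d);
  });
  element.addEventListener('pointerup', () => {
    last = null;
  });
}
```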
Step S303: and in response to the center viewing position changing operation, determining changed position data of a center viewing object corresponding to the panoramic video data, and applying the changed position data of the center viewing object to a center listening object corresponding to the multi-channel audio data.
In an optional implementation, step S303 further includes: calculating the relative position change data of each audio processing node relative to the center listening object according to the position data of the center viewing object after the change; and updating the position data of each audio processing node according to the relative position change data.
In an optional implementation, step S303 further includes: and updating the position data of the central listening object according to the changed position data of the central viewing object.
To ensure audio/video synchronization, the playing end can monitor the user's center viewing position changing operation in real time. Specifically, if such an operation is detected, the position data of the central viewing object corresponding to the panoramic video data are changed, and the changed position data of the central viewing object are applied to the central listening object corresponding to the multi-channel audio data, so that the video playing effect and the audio playing effect change synchronously, the audio data and the video data remain synchronized, and the user experience is improved.
According to the audio processing method of the panoramic video, the panoramic audio and video file is obtained and played by responding to the panoramic video playing operation executed by the user; monitoring a central viewing position change operation of a user; and responding to the center watching position changing operation, determining changed position data of a center watching object corresponding to the panoramic video data, and applying the changed position data of the center watching object to a center listening object corresponding to the multi-channel audio data so as to synchronously change the video playing effect and the audio playing effect. The method plays the panoramic video data and the multi-channel audio data, and in the playing process, if the position data of the central watching object corresponding to the panoramic video data is changed, the audio playing effect of the multi-channel audio data can be changed along with the central watching object, so that the audio data and the video data are synchronously changed, and the immersive experience of a user is improved.
Fig. 4 is a block diagram illustrating a structure of an audio processing apparatus for panoramic video according to an embodiment of the present application, and as shown in fig. 4, the apparatus includes: a file acquisition module 401, a playing module 402 and a processing module 403.
The file obtaining module 401 is configured to obtain a panoramic audio/video file including panoramic video data and multi-channel audio data.
In an optional implementation manner, the file obtaining module 401 is further configured to: and in response to a panoramic audio and video data acquisition request sent by a user, pulling the panoramic audio and video file uploaded by the production terminal from the server terminal.
The playing module 402 is configured to perform panorama mapping processing on the panoramic video data and the multi-channel audio data, and then play the panoramic video data and the multi-channel audio data.
In an alternative embodiment, the playing module 402 is further configured to: mapping the panoramic video data to a panoramic mapping spherical surface; multi-channel audio data is mapped onto respective audio processing nodes in an audio space.
In an alternative embodiment, the playing module 402 is further configured to: and generating a panoramic mapping spherical surface, and mapping the frame pictures in the panoramic video data onto the panoramic mapping spherical surface.
In an alternative embodiment, the playing module 402 is further configured to: and taking the side edge of the frame picture in the panoramic video data as a spherical semicircular arc to generate a corresponding panoramic mapping spherical surface.
In an alternative embodiment, the playing module 402 is further configured to: calling an audio context object, and adding, in the audio context object, audio processing nodes corresponding to the channels of the multi-channel audio data; and setting, for each of the channels, the position data of the audio processing node corresponding to the channel according to the preset azimuth data of the channel.
In an alternative embodiment, the playing module 402 is further configured to: and setting the attribute of each audio processing node as an algorithm identifier of a preset sound effect positioning algorithm.
And a processing module 403, configured to, when the position data of the center viewing object corresponding to the panoramic video data is changed during playing, apply the position data after the change of the center viewing object to a center listening object corresponding to the multi-channel audio data to change an audio playing effect.
In an alternative embodiment, the processing module 403 is further configured to: calculating the relative position change data of each audio processing node relative to the center listening object according to the position data of the center viewing object after the change; and updating the position data of each audio processing node according to the relative position change data.
In an alternative embodiment, the processing module 403 is further configured to: and updating the position data of the central listening object according to the changed position data of the central viewing object.
The descriptions of the modules refer to the corresponding descriptions in the method embodiments, and are not repeated herein.
According to the audio processing device of the panoramic video, the panoramic audio and video file containing the panoramic video data and the multi-channel audio data is obtained; carrying out panoramic mapping processing on the panoramic video data and the multi-channel audio data, and then playing the panoramic video data and the multi-channel audio data; when the position data of the center viewing object corresponding to the panoramic video data is changed in the playing process, the position data after the change of the center viewing object is applied to the center listening object corresponding to the multi-channel audio data so as to change the audio playing effect. The device plays panoramic video data and multichannel audio data, and in the playing process, if the position data of the center watching object corresponding to the panoramic video data is changed, the audio playing effect of the multichannel audio data can be changed along with the center watching object, so that the audio data and the video data are synchronously changed, and the immersive experience of a user is improved.
Fig. 5 is a block diagram illustrating a structure of an audio processing apparatus for panoramic video according to another embodiment of the present application, which includes, as shown in fig. 5: a response module 501, a monitoring module 502, and a synchronization module 503.
A response module 501, configured to respond to a panoramic video playing operation performed by a user, and acquire and play a panoramic audio/video file; the panoramic audio and video file comprises panoramic video data and multi-channel audio data.
In an alternative embodiment, the response module 501 is further configured to: in response to a panoramic video playing operation executed by a user, pulling a panoramic audio and video file uploaded by a production terminal from a server terminal; and carrying out panoramic mapping processing on the panoramic video data and the multi-channel audio data, and then playing the panoramic video data and the multi-channel audio data.
In an alternative embodiment, the response module 501 is further configured to: mapping the panoramic video data to a panoramic mapping spherical surface; multi-channel audio data is mapped onto respective audio processing nodes in an audio space.
In an alternative embodiment, the response module 501 is further configured to: calling an audio context object, and adding, in the audio context object, audio processing nodes corresponding to the channels of the multi-channel audio data; and setting, for each of the channels, the position data of the audio processing node corresponding to the channel according to the preset azimuth data of the channel.
A monitoring module 502, configured to monitor a central viewing position changing operation of the user.
A synchronization module 503, configured to determine, in response to the center viewing position changing operation, changed position data of a center viewing object corresponding to the panoramic video data, and apply the changed position data of the center viewing object to a center listening object corresponding to the multi-channel audio data to synchronously change a video playing effect and an audio playing effect.
In an alternative embodiment, the synchronization module 503 is further configured to: calculating the relative position change data of each audio processing node relative to the center listening object according to the position data of the center viewing object after the change; and updating the position data of each audio processing node according to the relative position change data.
In an alternative embodiment, the synchronization module 503 is further configured to: and updating the position data of the central listening object according to the changed position data of the central viewing object.
According to the audio processing device of the panoramic video, the panoramic audio and video file is obtained and played through responding to the panoramic video playing operation executed by a user; monitoring a central viewing position change operation of a user; and responding to the center watching position changing operation, determining changed position data of a center watching object corresponding to the panoramic video data, and applying the changed position data of the center watching object to a center listening object corresponding to the multi-channel audio data so as to synchronously change the video playing effect and the audio playing effect. The device plays panoramic video data and multichannel audio data, and in the playing process, if the position data of the center watching object corresponding to the panoramic video data is changed, the audio playing effect of the multichannel audio data can be changed along with the center watching object, so that the audio data and the video data are synchronously changed, and the immersive experience of a user is improved.
The embodiment of the application also provides a nonvolatile computer storage medium, and the computer storage medium stores at least one executable instruction which can execute the audio processing method of the panoramic video in any method embodiment.
Fig. 6 is a schematic structural diagram of a computing device according to an embodiment of the present application, and a specific embodiment of the present application does not limit a specific implementation of the computing device.
As shown in fig. 6, the computing device may include: a processor (processor) 602, a communication Interface 604, a memory 606, and a communication bus 608.
Wherein:
the processor 602, communication interface 604, and memory 606 communicate with one another via a communication bus 608.
A communication interface 604 for communicating with network elements of other devices, such as clients or other servers.
The processor 602 is configured to execute the program 610, and may specifically perform relevant steps in the above-described audio processing method for panoramic video.
In particular, program 610 may include program code comprising computer operating instructions.
The processor 602 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement embodiments of the present application. The computing device includes one or more processors, which may be processors of the same type, such as one or more CPUs, or processors of different types, such as one or more CPUs and one or more ASICs.
And a memory 606 for storing a program 610. The memory 606 may comprise high-speed RAM memory and may also include non-volatile memory, such as at least one disk memory.
The program 610 may specifically be configured to cause the processor 602 to execute an audio processing method of a panoramic video in any of the above-described method embodiments. For specific implementation of each step in the program 610, reference may be made to corresponding steps and corresponding descriptions in units in the foregoing audio processing method embodiment of the panoramic video, which are not described herein again. It can be clearly understood by those skilled in the art that, for convenience and simplicity of description, the specific working processes of the above-described devices and modules may refer to the corresponding process descriptions in the foregoing method embodiments, and are not described herein again.
The algorithms and displays presented herein are not inherently related to any particular computer, virtual machine, or other apparatus. Various general purpose systems may also be used with the teachings herein. The required structure for constructing such a system will be apparent from the description above. In addition, embodiments of the present application are not directed to any particular programming language. It is appreciated that a variety of programming languages may be used to implement the teachings of the embodiments of the present application as described herein, and any descriptions of specific languages are provided above to disclose the best modes of the embodiments of the present application.
In the description provided herein, numerous specific details are set forth. It can be appreciated, however, that the embodiments of the application may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.
Similarly, it should be appreciated that in the foregoing description of exemplary embodiments of the present application, various features of the embodiments of the present application are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the various inventive aspects. However, the disclosed method should not be interpreted as reflecting an intention that: that is, the claimed embodiments of the application require more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of an embodiment of this application.
Those skilled in the art will appreciate that the modules in the device in an embodiment may be adaptively changed and disposed in one or more devices different from the embodiment. The modules or units or components of the embodiments may be combined into one module or unit or component, and furthermore they may be divided into a plurality of sub-modules or sub-units or sub-components. All of the features disclosed in this specification (including any accompanying claims, abstract and drawings), and all of the processes or elements of any method or apparatus so disclosed, may be combined in any combination, except combinations where at least some of such features and/or processes or elements are mutually exclusive. Each feature disclosed in this specification (including any accompanying claims, abstract and drawings) may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise.
Furthermore, those skilled in the art will appreciate that while some embodiments described herein include some features included in other embodiments, rather than other features, combinations of features of different embodiments are meant to be within the scope of the embodiments of the application and form different embodiments. For example, in the following claims, any of the claimed embodiments may be used in any combination.
The various component embodiments of the present application may be implemented in hardware, or in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will appreciate that a microprocessor or Digital Signal Processor (DSP) may be used in practice to implement some or all of the functionality of some or all of the components in accordance with the embodiments of the present application. Embodiments of the present application may also be embodied as apparatus or device programs (e.g., computer programs and computer program products) for performing a portion or all of the methods described herein. Such programs implementing embodiments of the present application may be stored on a computer readable medium or may be in the form of one or more signals. Such a signal may be downloaded from an internet website, or provided on a carrier signal, or provided in any other form.
It should be noted that the above-mentioned embodiments illustrate rather than limit the embodiments of the application, and that those skilled in the art will be able to design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The embodiments of the application can be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In the unit claims enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the words first, second, third, etc. does not indicate any ordering; these words may be interpreted as names.

Claims (21)

1. An audio processing method of panoramic video, comprising:
acquiring a panoramic audio and video file containing panoramic video data and multi-channel audio data;
carrying out panoramic mapping processing on the panoramic video data and the multi-channel audio data, and then playing the panoramic video data and the multi-channel audio data;
when the position data of the central viewing object corresponding to the panoramic video data is changed in the playing process, the position data after the change of the central viewing object is applied to the central listening object corresponding to the multi-channel audio data so as to synchronously change the video playing effect and the audio playing effect.
2. The method of claim 1, wherein the obtaining a panoramic audio video file containing panoramic video data and multi-channel audio data further comprises:
and in response to a panoramic audio and video data acquisition request sent by a user, pulling the panoramic audio and video file uploaded by the production terminal from the server terminal.
3. The method of claim 1, wherein the panorama mapping the panoramic video data and the multi-channel audio data further comprises:
mapping the panoramic video data onto a panoramic mapping sphere;
mapping the multi-channel audio data onto respective audio processing nodes in an audio space.
4. The method of claim 3, wherein said mapping said panoramic video data onto a panoramic mapping sphere further comprises:
and generating the panoramic mapping spherical surface, and mapping the frame picture in the panoramic video data onto the panoramic mapping spherical surface.
5. The method of claim 4, wherein the generating a panoramic mapping sphere further comprises:
and taking the side edge of the frame picture in the panoramic video data as a spherical semicircular arc to generate a corresponding panoramic mapping spherical surface.
6. The method of any of claims 3-5, wherein the mapping the multi-channel audio data onto respective audio processing nodes in an audio space further comprises:
calling an audio context object, and adding audio processing nodes corresponding to all channels of the multi-channel audio data in the audio context object;
and setting, for each channel of the channels, the position data of the audio processing node corresponding to the channel according to the preset azimuth data of the channel.
7. The method of claim 6, wherein the method further comprises:
and setting the attribute of each audio processing node as an algorithm identifier of a preset sound effect positioning algorithm.
8. The method of any of claims 3-7, wherein the applying the altered position data of the center viewing object to the center listening object corresponding to the multi-channel audio data further comprises:
calculating the relative position change data of each audio processing node relative to the center listening object according to the position data of the center viewing object after the change;
and updating the position data of each audio processing node according to the relative position change data.
9. The method of any of claims 1-8, wherein said applying the altered position data of the center viewing object to the center listening object corresponding to the multi-channel audio data further comprises:
and updating the position data of the central listening object according to the changed position data of the central viewing object.
10. An audio processing method of panoramic video, comprising:
responding to panoramic video playing operation executed by a user, and acquiring and playing a panoramic audio and video file; the panoramic audio and video file comprises panoramic video data and multi-channel audio data;
monitoring a central viewing position change operation of the user;
and responding to the central watching position changing operation, determining changed position data of a central watching object corresponding to the panoramic video data, and applying the changed position data of the central watching object to a central listening object corresponding to the multi-channel audio data so as to synchronously change a video playing effect and an audio playing effect.
11. The method of claim 10, wherein said retrieving and playing a panoramic audio video file in response to a panoramic video play operation performed by a user further comprises:
in response to panoramic video playing operation executed by a user, pulling the panoramic audio and video file uploaded by a production terminal from a server terminal;
and carrying out panoramic mapping processing on the panoramic video data and the multi-channel audio data, and then playing the panoramic video data and the multi-channel audio data.
12. The method of claim 11, wherein the panorama mapping the panoramic video data and the multi-channel audio data further comprises:
mapping the panoramic video data onto a panoramic mapping sphere;
mapping the multi-channel audio data onto respective audio processing nodes in an audio space.
13. The method of claim 12, wherein the mapping the multi-channel audio data onto respective audio processing nodes in an audio space further comprises:
calling an audio context object, and adding audio processing nodes corresponding to all channels of the multi-channel audio data in the audio context object;
and setting, for each channel of the channels, the position data of the audio processing node corresponding to the channel according to the preset azimuth data of the channel.
14. The method of claim 12 or 13, wherein the applying the altered position data of the center viewing object to the center listening object corresponding to the multi-channel audio data further comprises:
calculating the relative position change data of each audio processing node relative to the center listening object according to the position data of the center viewing object after the change;
and updating the position data of each audio processing node according to the relative position change data.
15. The method of any of claims 10-14, wherein the applying the altered position data of the center viewing object to the center listening object corresponding to the multi-channel audio data further comprises:
and updating the position data of the central listening object according to the changed position data of the central viewing object.
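Claim 15's alternative, updating the center listening object itself rather than the nodes, maps naturally onto updating a listener's orientation. The sketch below turns a yaw/pitch pair (an assumed parameterization of the changed viewing position) into the forward and up vectors that a listener-orientation API such as the Web Audio AudioListener accepts:

```javascript
// Hypothetical sketch of claim 15: derive the listening object's forward
// and up vectors from the viewing direction (yaw and pitch in radians).
// Convention assumed: yaw = 0, pitch = 0 faces -z with +y up.
function listenerOrientation(yaw, pitch) {
  const forward = {
    x: Math.cos(pitch) * Math.sin(yaw),
    y: Math.sin(pitch),
    z: -Math.cos(pitch) * Math.cos(yaw),
  };
  const up = {
    x: -Math.sin(pitch) * Math.sin(yaw),
    y: Math.cos(pitch),
    z: Math.sin(pitch) * Math.cos(yaw),
  };
  return { forward, up };
}
```

The two vectors stay orthonormal for any yaw/pitch, which is what listener-orientation APIs require; a browser player would then assign them to properties such as `listener.forwardX.value`.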
16. An audio processing apparatus of a panoramic video, comprising:
a file acquisition module, configured to acquire a panoramic audio and video file containing panoramic video data and multi-channel audio data;
a playing module, configured to perform panoramic mapping processing on the panoramic video data and the multi-channel audio data and then play the panoramic video data and the multi-channel audio data;
and a processing module, configured to, when position data of a center viewing object corresponding to the panoramic video data changes during playing, apply the changed position data of the center viewing object to a center listening object corresponding to the multi-channel audio data, so as to synchronously change a video playing effect and an audio playing effect.
17. An audio processing apparatus of a panoramic video, comprising:
a response module, configured to acquire and play a panoramic audio and video file in response to a panoramic video playing operation performed by a user, the panoramic audio and video file containing panoramic video data and multi-channel audio data;
a monitoring module, configured to monitor a center viewing position change operation of the user;
and a synchronization module, configured to, in response to the center viewing position change operation, determine changed position data of a center viewing object corresponding to the panoramic video data, and apply the changed position data of the center viewing object to a center listening object corresponding to the multi-channel audio data, so as to synchronously change a video playing effect and an audio playing effect.
18. A computing device, comprising: a processor, a memory, a communication interface and a communication bus, wherein the processor, the memory and the communication interface communicate with one another through the communication bus;
the memory is configured to store at least one executable instruction, and the executable instruction causes the processor to perform operations corresponding to the audio processing method of a panoramic video according to any one of claims 1-9.
19. A computer storage medium having stored therein at least one executable instruction for causing a processor to perform operations corresponding to a method of audio processing of panoramic video as recited in any one of claims 1-9.
20. A computing device, comprising: a processor, a memory, a communication interface and a communication bus, wherein the processor, the memory and the communication interface communicate with one another through the communication bus;
the memory is configured to store at least one executable instruction, and the executable instruction causes the processor to perform operations corresponding to the audio processing method of a panoramic video according to any one of claims 10-15.
21. A computer storage medium having stored therein at least one executable instruction for causing a processor to perform operations corresponding to a method of audio processing of panoramic video as recited in any one of claims 10-15.
CN202211535904.5A 2022-12-02 2022-12-02 Audio processing method and device for panoramic video Pending CN115866326A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211535904.5A CN115866326A (en) 2022-12-02 2022-12-02 Audio processing method and device for panoramic video


Publications (1)

Publication Number Publication Date
CN115866326A true CN115866326A (en) 2023-03-28

Family

ID=85669179

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211535904.5A Pending CN115866326A (en) 2022-12-02 2022-12-02 Audio processing method and device for panoramic video

Country Status (1)

Country Link
CN (1) CN115866326A (en)

Similar Documents

Publication Publication Date Title
US11051081B2 (en) Virtual reality resource scheduling of processes in a cloud-based virtual reality processing system
US11348202B2 (en) Generating virtual reality content based on corrections to stitching errors
US11055057B2 (en) Apparatus and associated methods in the field of virtual reality
US10334220B2 (en) Aggregating images and audio data to generate virtual reality content
US10652314B2 (en) Cloud-based virtual reality project system
JP7210602B2 (en) Method and apparatus for processing audio signals
US20240098446A1 (en) Head tracked spatial audio and/or video rendering
JP2023519422A (en) AUDIO PROCESSING METHOD, DEVICE, READABLE MEDIUM AND ELECTRONIC DEVICE
US11917391B2 (en) Audio signal processing method and apparatus
US11431901B2 (en) Aggregating images to generate content
EP3745745A1 (en) Apparatus, method, computer program or system for use in rendering audio
CN115866326A (en) Audio processing method and device for panoramic video
JP2023075859A (en) Information processing apparatus, information processing method, and program
JP2016163181A (en) Signal processor and signal processing method
US20230254660A1 (en) Head tracking and hrtf prediction
US20230283976A1 (en) Device and rendering environment tracking
JP2019102940A (en) Virtual viewpoint content generation system, voice processing device, control method for virtual viewpoint content generation system, and program
CN117768831A (en) Audio processing method and system
WO2023150486A1 (en) Gesture controlled audio and/or visual rendering
CN117826982A (en) Real-time sound effect interaction system based on user pose calculation
KR20230013629A (en) Method and apparatus for switching an audio scene in a multi-view environment
CN118042345A (en) Method, device and storage medium for realizing space sound effect based on free view angle
KR20190082055A (en) Method for providing advertisement using stereoscopic content authoring tool and application thereof
KR20190081160A (en) Method for providing advertisement using stereoscopic content authoring tool and application thereof

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination