CN108810567B - Audio and video visual angle matching method, client and server - Google Patents


Info

Publication number: CN108810567B (granted publication of application CN201710289042.5A)
Authority: CN (China)
Prior art keywords: audio, client, fragment, MPD file, video
Legal status: Active (granted)
Other languages: Chinese (zh)
Other versions: CN108810567A
Inventors: 高莹, 顾迎节, 张尧烨
Assignee (original and current): Huawei Technologies Co Ltd
Events: application filed by Huawei Technologies Co Ltd; priority to CN201710289042.5A; publication of application CN108810567A; application granted; publication of CN108810567B

Classifications

    • H04N21/233: Processing of audio elementary streams
    • H04N21/23418: Analysing video streams, e.g. detecting features or characteristics
    • H04N21/2343: Reformatting of video signals for distribution or compliance with end-user requests or end-user device requirements
    • H04N21/4394: Analysing the audio stream, e.g. detecting features or characteristics in audio streams
    • H04N21/44008: Analysing video streams, e.g. detecting features or characteristics in the video stream
    • H04N21/4402: Reformatting of video signals for household redistribution, storage or real-time display
    • H04N23/80: Camera processing pipelines; components thereof

Abstract

The application discloses an audio and video visual angle matching method, a client, and a server, which solve the problem that, in existing schemes for playing panoramic video, the client cannot select a matching audio file to play when the current visual angle changes, resulting in poor user experience. In the method, the client sends the server a first request message for acquiring the MPD file of the panoramic video, the message carrying an identifier of the MPD file; the client receives the MPD file fed back by the server according to the identifier of the MPD file, where the MPD file includes an identifier of at least one audio fragment and audio space description information corresponding to the identifier, the audio space description information being used to describe the associated region of the at least one audio fragment in the MPD file; and the client determines, according to the current view angle range of the user and the at least one piece of audio space description information, a first audio fragment matching the current view angle range.

Description

Audio and video visual angle matching method, client and server
Technical Field
The present application relates to the field of multimedia technologies, and in particular, to a method, a client, and a server for matching audio and video views.
Background
Panoramic video, also called 360-degree panoramic video, is shot by a camera located at the center of a scene capturing its surroundings through 360 degrees; the images shot at multiple angles are combined into a panoramic image through synchronization, stitching, projection, and similar techniques, and the panoramic video is composed of multiple frames of such panoramic images.
When watching a panoramic video, the user can change the viewing angle up, down, left, or right to obtain a better experience. One major difference between panoramic video and conventional video is that the user never views the complete video picture at any one time, but only a partial region of it. The region of the panoramic video coordinate system in which the content the user is actually watching is located is generally referred to as the current visual angle, and the video picture the user watches at the current visual angle is referred to in this application as the video view. While watching, the user can switch the current visual angle by sliding the screen or turning the head (for example, in a head-mounted display) to watch different video views.
Current panoramic video applications only account for the video view changing with the user's current visual angle; other media components such as audio and subtitles are not considered. In some application scenarios, synchronously matching the audio to the video view when the current visual angle changes would bring the user a better viewing experience. For example, when watching an entertainment program such as "Dad, Where Are We Going?" in which several families appear together, if the user's current visual angle is on family 1, this indicates interest in family 1, and the matching audio should be the audio associated with the members of family 1. When the current visual angle switches to family 2, the matching audio is the audio associated with family 2. When the user is not focused on any particular family, or the video picture contains several families, default audio is matched. In current panoramic video applications, however, when the user's current video visual angle changes, an audio file matching it cannot be selected for playback, so the user experience is poor.
Disclosure of Invention
The embodiments of the present application provide an audio and video visual angle matching method, a client, and a server, to solve the problem that, in existing schemes for playing panoramic video, the client cannot select a matching audio file to play when the current visual angle changes, resulting in poor user experience.
The embodiment of the application provides the following specific technical scheme:
in a first aspect, an embodiment of the present application provides a method for matching audio and video perspectives, including:
a server receives a first request message sent by a client and used for acquiring a Media Presentation Description (MPD) file of a panoramic video, wherein the first request message carries an identifier of the MPD file;
and the server returns the MPD file to the client according to the identifier of the MPD file, wherein the MPD file comprises the identifier of at least one audio fragment and audio space description information corresponding to the identifier, and the audio space description information is used for describing the associated region of the at least one audio fragment.
With this method, the client requests from the server an MPD file containing the identifiers of audio fragments and the audio space description information corresponding to those identifiers, so that once the current view angle range is determined, the client can compute the associated region of each audio fragment in the panoramic video image from the audio space description information. When the associated region corresponding to an audio fragment matches the current view angle range of the user, the client obtains an audio file accurately matched to the video image and plays it, so that the audio and the video image are matched synchronously and the viewing experience of the user is improved. This solves the problem that the client cannot select a matching audio file to play when the current visual angle changes, which results in poor user experience.
With reference to the first aspect, in a possible design, the MPD file further includes a region matching condition for at least one audio fragment in the MPD file and/or a matching policy for multiple audio fragments.
In this design, when the MPD file includes the region matching condition, an audio fragment is considered to match when its associated region and the current view angle range of the user satisfy that condition. When the MPD file includes a multi-audio matching policy, and the associated regions of at least two audio fragments satisfy the region matching condition with the current view angle range of the user, the audio fragment matching the current view angle range is determined according to the multi-audio matching policy, which provides the user with a more flexible audio-video matching effect.
With reference to the first aspect, in one possible design, the method further includes:
the server receives a second request message which is sent by the client and used for acquiring the video fragment, wherein the second request message carries the identifier of the video fragment;
and the server sends the video fragments to the client according to the identifiers of the video fragments.
With reference to the first aspect, in one possible design, the method further includes:
the server receives a third request message which is sent by the client and used for acquiring a first audio fragment matched with the video fragment, wherein the third request message carries an identifier of the first audio fragment;
and the server sends the first audio fragment to the client according to the identifier of the first audio fragment.
In a second aspect, an embodiment of the present application provides a method for matching audio and video perspectives, including:
a client sends a first request message for acquiring a Media Presentation Description (MPD) file of a panoramic video to a server, wherein the first request message carries an identifier of the MPD file;
the client receives the MPD file fed back by the server according to the identifier of the MPD file, wherein the MPD file comprises the identifier of at least one audio fragment and audio space description information corresponding to the identifier, and the audio space description information is used for describing the associated region of the at least one audio fragment in the MPD file;
and the client determines a first audio fragment matched with the current visual angle range according to the current visual angle range of the user and the at least one piece of audio space description information.
In this method, the client requests from the server an MPD file containing the identifiers of audio fragments and the audio space description information corresponding to those identifiers, so that once the current view angle range is determined, the client can compute the associated region of each audio fragment in the panoramic video image from the audio space description information. When the associated region corresponding to an audio fragment matches the current view angle range of the user, the client obtains an audio file accurately matched to the video image and plays it, so that the audio and the video image are matched synchronously and the viewing experience of the user is improved. This solves the problem that, in existing schemes for playing panoramic video, the client cannot select a matching audio file to play when the current visual angle changes, which results in poor user experience.
With reference to the second aspect, in a possible design, the MPD file further includes a region matching condition for at least one audio fragment in the MPD file and/or a matching policy for multiple audio fragments.
In this design, when the MPD file includes the region matching condition, an audio fragment is considered to match when its associated region and the current view angle range of the user satisfy that condition. When the MPD file includes a multi-audio matching policy, and the associated regions of at least two audio fragments satisfy the region matching condition with the current view angle range of the user, the audio fragment matching the current view angle range is determined according to the multi-audio matching policy, which provides the user with a more flexible audio-video matching effect.
With reference to the second aspect, in one possible design, the determining, by the client, a first audio slice matching a current view angle range of a user according to the current view angle range and the at least one piece of audio space description information includes:
the client obtains at least one associated area of at least one audio fragment in the MPD file in the panoramic video according to the at least one audio space description information;
the client determines, as alternative audio fragments, the audio fragments corresponding to those associated regions among the at least one associated region that match the current view angle range;
if only one alternative audio fragment exists, the client determines it as the first audio fragment;
if at least two alternative audio fragments exist, the client determines the first audio fragment according to the multi-audio matching policy;
and if no alternative audio fragment exists, the client determines a preconfigured default audio fragment as the first audio fragment.
In the design, a multi-audio matching strategy is set in the MPD file, and when a plurality of associated areas are matched with the current view angle range of a user, the client can select the optimal audio to perform matching playing according to the multi-audio matching strategy.
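The selection procedure above can be sketched in a few lines. This is an illustrative sketch only: the rectangular region representation, the `AudioSegment` record, and the `policy` callback are assumptions of this sketch, not identifiers defined in the application.

```python
# Hypothetical sketch of the client-side audio selection procedure:
# collect candidate fragments whose associated region matches the current
# view angle range, then apply the one/many/none rules described above.
from dataclasses import dataclass
from typing import Callable, List, Tuple

Region = Tuple[float, float, float, float]  # (x, y, width, height) in panorama coordinates

@dataclass
class AudioSegment:
    segment_id: str
    region: Region  # associated region taken from the audio space description information

def overlaps(a: Region, b: Region) -> bool:
    """True when two rectangular regions intersect at all."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah

def select_audio(view: Region,
                 segments: List[AudioSegment],
                 default: AudioSegment,
                 policy: Callable[[List[AudioSegment]], AudioSegment]) -> AudioSegment:
    """One candidate -> use it; several -> apply the multi-audio matching
    policy; none -> fall back to the preconfigured default fragment."""
    candidates = [s for s in segments if overlaps(view, s.region)]
    if len(candidates) == 1:
        return candidates[0]
    if len(candidates) >= 2:
        return policy(candidates)
    return default
```

A policy could, for instance, pick the candidate with the largest overlap area; here any callable taking the candidate list works.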
With reference to the second aspect, in one possible design, the associated region among the at least one associated region that matches the current view angle range is an associated region identical to the current view angle range; or
an associated region that satisfies the region matching condition with the current view angle range.
In this design, different conditions are set for determining which associated region among the at least one associated region matches the current view angle range, so that whether an associated region matches can be decided according to actual needs; this approach is flexible and improves the user experience.
With reference to the second aspect, in one possible design, an associated region that satisfies the region matching condition with the current view angle range is:
an associated region falling entirely within the current view angle range; or
an associated region whose matching degree with the current view angle range is greater than a preset threshold.
In this design, by setting the region matching condition of the audio fragment in the MPD file, different matching conditions between the associated region of the audio and the current visual angle of the user can be applied, providing a more flexible matching effect between the audio and the video image.
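The two region matching conditions named above can be made concrete as follows. This is a minimal sketch under assumed rectangular regions; the function names and the definition of "matching degree" (overlap area over the associated region's area) are assumptions of the sketch, not definitions from the application.

```python
# Illustrative sketch of the two region matching conditions:
# (a) the associated region falls within the current view angle range, or
# (b) the matching degree exceeds a preset threshold.
Region = tuple  # (x, y, width, height)

def contained(region: Region, view: Region) -> bool:
    """Condition (a): the associated region falls within the view range."""
    rx, ry, rw, rh = region
    vx, vy, vw, vh = view
    return rx >= vx and ry >= vy and rx + rw <= vx + vw and ry + rh <= vy + vh

def matching_degree(region: Region, view: Region) -> float:
    """Overlap area divided by the associated region's area (assumed metric)."""
    rx, ry, rw, rh = region
    vx, vy, vw, vh = view
    ox = max(0.0, min(rx + rw, vx + vw) - max(rx, vx))
    oy = max(0.0, min(ry + rh, vy + vh) - max(ry, vy))
    return (ox * oy) / (rw * rh)

def region_matches(region: Region, view: Region, threshold: float = 0.5) -> bool:
    """Condition (b): matching degree greater than a preset threshold."""
    return contained(region, view) or matching_degree(region, view) > threshold
```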
with reference to the second aspect, in one possible design, the method further includes:
the client downloads at least one audio fragment included in the MPD file to the local client; after determining, according to the current view angle range of the user and the at least one piece of audio space description information, a first audio fragment matching the current view angle range, the client obtains the first audio fragment from the at least one audio fragment downloaded locally for decoding and playback.
In this design, because the data volume of audio fragments is small, the client downloads multiple audio fragments to the local device in advance; after the audio fragment whose associated region matches the current view angle range of the user has been determined, it is obtained directly from local storage for decoding and playback, which improves audio acquisition efficiency and therefore matching efficiency and the user experience.
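The prefetch idea can be sketched as a small cache in front of the segment transport. The `fetch` callback (standing in for an HTTP request by segment identifier) and the cache shape are assumptions of this sketch, not structures from the application.

```python
# Minimal sketch of the prefetch design: download every audio fragment
# listed in the MPD up front, then serve later matches from the local
# cache without issuing a new request.
from typing import Callable, Dict, Iterable

class AudioPrefetcher:
    def __init__(self, fetch: Callable[[str], bytes]):
        self._fetch = fetch                # e.g. an HTTP GET keyed by fragment identifier
        self._cache: Dict[str, bytes] = {}

    def prefetch(self, segment_ids: Iterable[str]) -> None:
        """Download all listed audio fragments to the local client."""
        for sid in segment_ids:
            if sid not in self._cache:
                self._cache[sid] = self._fetch(sid)

    def get(self, segment_id: str) -> bytes:
        """Return a matched fragment from the local cache for decoding."""
        return self._cache[segment_id]
```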
In a third aspect, an embodiment of the present application provides a server, including:
the system comprises a receiving unit, a processing unit and a processing unit, wherein the receiving unit is used for receiving a first request message sent by a client and used for acquiring a Media Presentation Description (MPD) file of a panoramic video, and the first request message carries an identifier of the MPD file;
and the processing unit is configured to return the MPD file to the client according to the identifier of the MPD file, where the MPD file includes an identifier of at least one audio segment and audio space description information corresponding to the identifier, and the audio space description information is used to describe an associated region of the at least one audio segment.
With reference to the third aspect, in a possible design, the MPD file further includes a region matching condition for at least one audio fragment in the MPD file and/or a matching policy for multiple audio fragments.
With reference to the third aspect, in one possible design, the server further includes a sending unit,
the receiving unit is further configured to receive a second request message sent by the client and used for acquiring a video fragment, where the second request message carries an identifier of the video fragment;
and the sending unit is used for sending the video fragments to the client according to the identifiers of the video fragments.
With reference to the third aspect, in a possible design, the receiving unit is further configured to receive a third request message sent by the client and used to obtain a first audio fragment matched with the video fragment, where the third request message carries an identifier of the first audio fragment;
the sending unit is further configured to send the first audio fragment to the client according to the identifier of the first audio fragment.
In a fourth aspect, an embodiment of the present application provides a client, including:
a sending unit, configured to send a first request message for obtaining a Media Presentation Description (MPD) file of a panoramic video to a server, where the first request message carries an identifier of the MPD file;
a receiving unit, configured to receive the MPD file fed back by the server according to an identifier of the MPD file, where the MPD file includes an identifier of at least one audio segment and spatial description information corresponding to the identifier, and the audio spatial description information is used to describe an associated region of at least one audio segment in the MPD file;
and the processing unit is used for determining a first audio fragment matched with the current visual angle range according to the current visual angle range of the user and the at least one piece of audio space description information.
With reference to the fourth aspect, in a possible design, the MPD file further includes a region matching condition for at least one audio fragment in the MPD file and/or a matching policy for multiple audio fragments.
With reference to the fourth aspect, in a possible design, when determining, according to the current view angle range of the user and the at least one piece of audio space description information, the first audio slice that matches the current view angle range, the processing unit is specifically configured to:
obtaining at least one associated region of at least one audio fragment in the MPD file in the panoramic video according to the at least one piece of audio space description information;
determining, as alternative audio fragments, the audio fragments corresponding to those associated regions among the at least one associated region that match the current view angle range;
if only one alternative audio fragment exists, determining it as the first audio fragment;
if at least two alternative audio fragments exist, determining the first audio fragment according to the multi-audio matching policy;
and if no alternative audio fragment exists, determining a preconfigured default audio fragment as the first audio fragment.
With reference to the fourth aspect, in one possible design, the associated region among the at least one associated region that matches the current view angle range is an associated region identical to the current view angle range; or
an associated region that satisfies the region matching condition with the current view angle range.
With reference to the fourth aspect, in one possible design, an associated region that satisfies the region matching condition with the current view angle range is:
an associated region falling entirely within the current view angle range; or
an associated region whose matching degree with the current view angle range is greater than a preset threshold.
With reference to the fourth aspect, in one possible design, the processing unit is further configured to:
downloading at least one audio fragment included in the MPD file to the local client, where, after determining, according to the current view angle range of the user and the at least one piece of audio space description information, a first audio fragment matching the current view angle range, the client obtains the first audio fragment from the at least one audio fragment downloaded locally for decoding and playback.
In a fifth aspect, a server provided in an embodiment of the present application includes a memory, a processor, and a communication interface, wherein:
the memory is used for storing a computer readable program;
the processor executes the program in the memory to complete the method provided by any one of the first aspect and the possible implementation manner of the first aspect;
the communication interface is used for receiving and transmitting data under the control of the processor.
In a sixth aspect, an embodiment of the present application provides a client, including a memory, a processor, and a communication interface, wherein:
the memory is used for storing a computer readable program;
the processor executes the program in the memory to complete the method provided by any one of the second aspect and the possible implementation manner of the second aspect;
the communication interface is used for receiving and transmitting data under the control of the processor.
In a seventh aspect, an embodiment of the present application provides a computer storage medium, which is a computer-readable storage medium storing a program, where the program includes instructions that, when executed by a network device having a processor, cause the network device to perform the method provided by the first aspect and each possible implementation manner of the first aspect.
In an eighth aspect, an embodiment of the present application provides a computer storage medium, which is a computer-readable storage medium storing a program, where the program includes instructions, which, when executed by an electronic device with a processor, cause the electronic device to perform the method provided by each possible implementation manner of the second aspect and the second aspect.
Drawings
Fig. 1 is a schematic diagram of a network architecture according to an embodiment of the present application;
FIG. 2 is a diagram illustrating a content structure of an MPD file in the prior art;
FIG. 3A is a schematic view of a full transmission mode video;
FIG. 3B is a video diagram of a block transmission scheme;
FIG. 4 is a diagram illustrating a video frame switching in the prior art;
fig. 5 is a schematic structural diagram of a server according to an embodiment of the present application;
fig. 6 is a schematic structural diagram of a client according to an embodiment of the present application;
FIG. 7 is a flowchart illustrating a method for matching audio and video views according to an embodiment of the present disclosure;
FIG. 8 is a flowchart illustrating another method for matching audio and video views according to an embodiment of the present disclosure;
fig. 9A, 9B, and 9C are schematic diagrams illustrating the matching of an audio fragment's associated region with the current visual angle;
figs. 10A, 10B, and 10C are schematic diagrams of cases in which the number of associated regions is greater than one;
fig. 11 is a schematic structural diagram of another server provided in the embodiment of the present application;
fig. 12 is a schematic structural diagram of another client according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application.
The embodiments of the present application provide an audio and video visual angle matching method, a client, and a server, to solve the problem that, in existing schemes for playing panoramic video, the client cannot select a matching audio file to play when the current visual angle changes, resulting in poor user experience.
The method and the device are based on the same inventive concept, and because the principles of solving the problems of the method and the device are similar, the implementation of the device and the method can be mutually referred, and repeated parts are not repeated.
The network architecture to which the technical solutions provided by the embodiments of the present application relate is shown in fig. 1 and includes a server 101 and a client 102. The client corresponds to the server and runs the programs that provide local services to the user; the client in the embodiments of the present application has the function of playing panoramic video for the user, with a panoramic video player running on it, where the player may be an application installed on the client or a page in a browser. The client may be a wireless terminal device or a wired terminal device. A wireless terminal device may be a handheld device with wireless connection capability or another processing device connected to a wireless modem; such devices, for example mobile telephones (or "cellular" telephones) and computers with mobile terminals, including portable, pocket, handheld, computer-built-in, or vehicle-mounted mobile devices, may communicate with one or more core networks via a Radio Access Network (RAN). A wired terminal device may be, for example, a cable television or a wired computer. The server is a device providing computing services; it can respond to service requests from the client and has the functions of undertaking and guaranteeing those services, and the server in the embodiments of the present application has the function of providing panoramic video to the client. A server is configured similarly to a general-purpose computer architecture, generally including a processor, a hard disk, memory, a system bus, and the like, and is subject to high requirements in processing capability, reliability, stability, security, expandability, manageability, and the like.
The communication between the client and the server supports general media transmission protocols for panoramic video, such as the Real-time Transport Protocol (RTP), the Real-Time Streaming Protocol (RTSP), the HyperText Transfer Protocol (HTTP), the Dynamic Adaptive Streaming over HTTP (DASH) media protocol, the HTTP Live Streaming (HLS) media protocol, and the like.
The server and the client in the embodiments of the present application may be based on DASH technology, or may be based on other technologies. Taking DASH as an example, DASH is mainly used to solve the cumbersome deployment and reception mechanisms caused by different video distributors using different HTTP streaming technologies. The main characteristic of DASH is that the client selects media segments with a suitable bitrate according to network conditions such as download speed and buffer level, and the media distributor sends the media segments to the client over HTTP according to the client's selection, so as to guarantee the user's viewing experience.
The existing DASH standard mainly specifies the format of Media Presentation Description (MPD) files and media segments (Segments). The content structure of a conventional MPD file is shown in fig. 2; the MPD file is divided into 4 levels: Period, Adaptation Set, Representation, and Segment. An MPD file consists of one or more consecutive Periods; a Period represents a media period and has a start time and an end time. A Period contains one or more Adaptation Sets, each typically corresponding to one media component, such as audio, video, or subtitles. Taking the MPD file of a video as an example, the Adaptation Set of a video usually includes multiple Representations, where different Representations correspond to different characteristics such as bitrate and resolution, and the multiple Representations included in the same Adaptation Set can be switched dynamically and adaptively. Each Representation is composed of one or more media segments; a media segment is the basic unit of the MPD, and the client may obtain a media segment from the server through the Uniform Resource Locator (URL) of the media segment in the MPD file and process it to implement a streaming service.
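As an illustration of this four-level hierarchy, a minimal MPD might look as follows. This is a sketch only: the element names follow the DASH schema, but every id, duration, bitrate, and URL here is hypothetical.

```xml
<MPD xmlns="urn:mpeg:dash:schema:mpd:2011" type="static">
  <Period start="PT0S">                      <!-- level 1: Period -->
    <AdaptationSet mimeType="video/mp4">     <!-- level 2: one media component -->
      <Representation id="v1" bandwidth="1024000" width="2560" height="720">
        <SegmentList duration="10">          <!-- level 4: media segments -->
          <SegmentURL media="seg-v1-0001.mp4"/>
          <SegmentURL media="seg-v1-0002.mp4"/>
        </SegmentList>
      </Representation>                      <!-- level 3: one bitrate variant -->
      <Representation id="v2" bandwidth="512000" width="1280" height="360"/>
    </AdaptationSet>
  </Period>
</MPD>
```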
The embodiments of the present application relate to a panoramic video transmission scene, in particular a scene in which, before requesting the server to transmit the video segments of a panoramic video, the client requests the server for the MPD file.
Panoramic video is also called 360-degree panoramic video: a camera at the center shoots a full 360-degree view of its surroundings. When watching, the user can change the viewing angle by sliding the screen or by rotating the head to move a headset, and the picture of the panoramic video switches automatically with the change, so that the user seems to be in a real environment.
In a panoramic video transmission scene, a client first obtains an MPD file of a panoramic video, which is a metadata file, from a server, and provides information on how the client accesses media segments of the panoramic video.
Because the data volume of panoramic video is much larger than that of ordinary video, current methods for transmitting panoramic video mainly fall into two types:
1) Full transmission: consistent with the common video transmission method, the whole panoramic image is encoded and transmitted using video coding formats such as H.264 or H.265, and the client receives the complete panoramic video content, as shown in fig. 3A.
2) Block transmission: the method comprises the steps of cutting a panoramic image into a plurality of blocks (tiles), coding each block of image, wherein each block of image corresponds to one video fragment, and preferentially transmitting or transmitting the content of the blocks corresponding to the current view angle of a user at high resolution during transmission. As shown in fig. 3B, the entire panoramic image is divided into 16 blocks, each block corresponding to a video slice.
The client may request a corresponding video slice according to the current video view of the user, where the current video view of the user may fall on one or more blocks, and thus the client receives the video slice corresponding to the one or more blocks. Assume that the client requests video slices corresponding to four partitions in the diagram shown on the left side of fig. 4, respectively, according to the current view requirements of the user. The client decodes, splices, renders and plays the acquired 4 video fragments, and the video picture viewed by the end user is as shown on the right side of fig. 4.
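The block-selection step above can be sketched as follows. This is a hypothetical helper, not part of the patent: it computes which tiles of a grid a view rectangle touches; the client would then request the video segments for exactly those blocks.

```python
def tiles_for_view(view, pano_w, pano_h, cols=4, rows=4):
    """Return row-major indices of the blocks (tiles) that a view rectangle
    touches.  view = (x, y, width, height) in panoramic-image pixels."""
    vx, vy, vw, vh = view
    tile_w, tile_h = pano_w / cols, pano_h / rows
    hit = []
    for r in range(rows):
        for c in range(cols):
            x, y = c * tile_w, r * tile_h
            # Standard axis-aligned rectangle intersection test.
            if vx < x + tile_w and x < vx + vw and vy < y + tile_h and y < vy + vh:
                hit.append(r * cols + c)
    return hit
```

For a 3840×1080 panorama split into a 4×4 grid, a view rectangle of width 810 and height 300 at (480,390) touches four tiles, matching the four-segment scenario of fig. 4.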
The current Moving Picture Experts Group (MPEG) DASH standard defines a viewpoint descriptor in the MPD file, and video and audio content having the same viewpoint value can be played simultaneously. The client can find the video and audio segment lists with the same viewpoint value in the MPD file, and obtain the video and audio segments with suitable bitrates according to the current bandwidth. For example, MPD example 1 given schematically below contains 4 AdaptationSets; it can be determined from mimeType that the first two AdaptationSets correspond to video and the last two to audio. A video segment corresponding to the Representation with id 11 or 12 and an audio segment corresponding to the Representation with id 31 or 32 can be played together because their viewpoint values both equal vp1; likewise, a video segment corresponding to the Representation with id 21 or 22 and an audio segment corresponding to the Representation with id 41 or 42 can be played together because their viewpoint values both equal vp2.
MPD example one
(The XML listing of MPD example one is not reproduced in the source text.)
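Since the listing itself did not survive, the following is a hedged reconstruction of what MPD example one plausibly contained, based solely on the Representation ids and viewpoint values described above; all bitrates, and the viewpoint scheme URI, are invented for illustration.

```xml
<!-- video, viewpoint vp1: pairs with audio Representations 31/32 -->
<AdaptationSet mimeType="video/mp4">
  <Viewpoint schemeIdUri="urn:mpeg:dash:viewpoint:2011" value="vp1"/>
  <Representation id="11" bandwidth="1024000"/>
  <Representation id="12" bandwidth="512000"/>
</AdaptationSet>
<!-- video, viewpoint vp2: pairs with audio Representations 41/42 -->
<AdaptationSet mimeType="video/mp4">
  <Viewpoint schemeIdUri="urn:mpeg:dash:viewpoint:2011" value="vp2"/>
  <Representation id="21" bandwidth="1024000"/>
  <Representation id="22" bandwidth="512000"/>
</AdaptationSet>
<AdaptationSet mimeType="audio/mp4">
  <Viewpoint schemeIdUri="urn:mpeg:dash:viewpoint:2011" value="vp1"/>
  <Representation id="31" bandwidth="64000"/>
  <Representation id="32" bandwidth="32000"/>
</AdaptationSet>
<AdaptationSet mimeType="audio/mp4">
  <Viewpoint schemeIdUri="urn:mpeg:dash:viewpoint:2011" value="vp2"/>
  <Representation id="41" bandwidth="64000"/>
  <Representation id="42" bandwidth="32000"/>
</AdaptationSet>
```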
Therefore, the prior art can only express a view matching relationship between video segments and audio segments. However, during panoramic video transmission, video segments and video views are not in one-to-one correspondence, so the matching relationship between audio views and video views cannot be represented well. For example, in block transmission a video view may consist of multiple video segments, and according to the prior art those video segments and the audio matching that view should be given the same viewpoint value. But the same video segment may belong to different video views, and especially when the audios of two video views differ, the prior art cannot express the matching relationship between the video segments forming different video views and the multiple audios.
When the panoramic video is transmitted in full frame, the whole panoramic image corresponds to one video segment, which may contain multiple video views. If the audios corresponding to those video views differ, the prior art cannot express the matching relationship between the video views within the same video segment and the different audios.
In the embodiments of the present application, audio spatial description information is added to the MPD file. Using this information, the client can calculate the associated region of the corresponding audio segment, and after the user's current view is determined, the client can obtain and play the audio segment whose associated region matches the user's current view range, thereby achieving synchronous matching of the audio and video views.
Based on the above problems in the prior art, embodiments of the present application provide a method, a client, and a server for matching an audio view and a video view. The technical solutions provided in the embodiments of the present application are described in detail below by using specific embodiments, and it should be noted that the display order of the embodiments only represents the sequence of the embodiments, and does not represent the merits of the technical solutions provided in the embodiments.
Example one
In the embodiment of the present application, referring to fig. 5, a host 500 of a server includes: at least one processor 501, memory 502, and communication interface 503; the at least one processor 501, the memory 502, and the communication interface 503 are all connected by a bus 504;
the memory 502 is used for storing computer execution instructions.
The at least one processor 501 is configured to execute the computer-executable instructions stored in the memory 502, so that the host 500 performs data interaction, through the communication interface 503, with the host where the client is located, to perform a method for matching audio and video views according to an embodiment of the present application.
the at least one processor 501 reads the program in the memory 502 and performs the following processes:
the at least one processor 501 is configured to receive, through the communication interface 503, a first request message sent by a client and used for acquiring an MPD file of a panoramic video, where the first request message carries an identifier of the MPD file; and returning the MPD file to the client according to the identifier of the MPD file, wherein the MPD file comprises the identifier of at least one audio fragment and audio space description information corresponding to the identifier, and the audio space description information is used for describing the associated region of the at least one audio fragment.
In a possible implementation, the MPD file further includes a region matching condition of the at least one audio segment in the MPD file and/or a matching policy for multiple audio segments.
The at least one processor 501 is further configured to: receiving, through the communication interface 503, a second request message for acquiring a video fragment sent by the client, where the second request message carries an identifier of the video fragment; and sending the video fragment to the client through the communication interface 503 according to the identifier of the video fragment.
The at least one processor 501 is further configured to: receiving, through the communication interface 503, a third request message sent by the client to obtain a first audio fragment matched with the video fragment, where the third request message carries an identifier of the first audio fragment; and sending the first audio fragment to the client through the communication interface 503 according to the identifier of the first audio fragment.
In this embodiment, the at least one processor 501 may include processors 501 of different types, or include processors 501 of the same type; the processor 501 may be any of the following: a Central Processing Unit (CPU), a microprocessor, a Field Programmable Gate Array (FPGA), a special processor, and other devices with computing and processing capabilities. In an alternative embodiment, the at least one processor 501 may also be integrated as a many-core processor.
The memory 502 may be any one or any combination of the following: random Access Memory (RAM), Read Only Memory (ROM), non-volatile Memory (NVM), Solid State Drive (SSD), mechanical hard disk, magnetic disk, and magnetic disk array.
The communication interface 503 is used for the host 500 to perform data interaction with other devices (e.g., a host where a client is located). The communication interface 503 may be any one or any combination of the following: a network interface (e.g., an ethernet interface), a wireless network card, etc. having a network access function.
The bus 504 may include an address bus, a data bus, a control bus, etc., which is represented by a thick line in fig. 5 for ease of illustration. The bus 504 may be any one or any combination of the following: an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended ISA (EISA) bus, and other wired data transmission devices.
An embodiment of the present invention provides a client, and as shown in fig. 6, a host 600 where the client is located includes: at least one processor 601, memory 602, and communication interface 603; the at least one processor 601, the memory 602, and the communication interface 603 are all connected by a bus 604;
the memory 602 is used to store computer-executable instructions.
The at least one processor 601 is configured to execute the computer-executable instructions stored in the memory 602, so that the host 600 performs data interaction, through the communication interface 603, with the host where the server is located, to perform a method for matching audio and video views according to an embodiment of the present application.
the at least one processor 601 reads the program in the memory 602, and performs the following processes:
the at least one processor 601 is configured to send a first request message for acquiring an MPD file of a panoramic video to a server through the communication interface 603, where the first request message carries an identifier of the MPD file; receiving, by the communications interface 603, the MPD file fed back by the server according to the identifier of the MPD file, where the MPD file includes an identifier of at least one audio segment and spatial description information corresponding to the identifier, and the audio spatial description information is used to describe an associated region of at least one audio segment in the MPD file; and determining a first audio fragment matched with the current visual angle range according to the current visual angle range of the user and the at least one piece of audio space description information.
In a possible implementation, the MPD file further includes a region matching condition of the at least one audio segment in the MPD file and/or a matching policy for multiple audio segments.
When determining, according to the current view range of the user and the at least one piece of audio space description information, the first audio slice matching the current view range, the at least one processor 601 is specifically configured to:
obtaining at least one associated area of at least one audio slice in the MPD file in the panoramic video according to the at least one audio space description information; determining the audio fragment corresponding to the associated region matched with the current view angle range in the at least one associated region as an alternative audio fragment; if only one alternative audio fragment exists, determining the alternative audio fragment as a first audio fragment; if at least two alternative audio fragments exist, determining a first audio fragment according to the matching strategy of the multi-audio fragment; and if the alternative audio fragment does not exist, determining the default audio fragment which is pre-configured as the first audio fragment.
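The selection procedure above can be sketched in a few lines. This is a minimal sketch with hypothetical names, not the patent's implementation: candidate segments are those whose associated region matches the view; with exactly one candidate it is chosen, with several the multi-audio matching policy decides, and with none the preconfigured default audio is used.

```python
def region_matches(region, view):
    # Simplest matching condition: the associated region equals the view range.
    # (The MPD may instead specify a looser region matching condition.)
    return region == view

def select_audio_segment(assoc_regions, view, default_id, multi_audio_policy=min):
    """assoc_regions: {segment_id: (x, y, w, h)} derived from the MPD's
    audio spatial description information; view: current view rectangle."""
    candidates = [sid for sid, r in assoc_regions.items()
                  if region_matches(r, view)]
    if len(candidates) == 1:
        return candidates[0]                    # exactly one match
    if candidates:
        return multi_audio_policy(candidates)   # several matches: apply policy
    return default_id                           # no match: default (main) audio
```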
In a possible implementation, the associated region, among the at least one associated region, that matches the current view range is an associated region identical to the current view range; or an associated region that satisfies a region matching condition with the current view range.
In a possible implementation, the associated region satisfying the region matching condition with the current view range includes: an associated region falling within the current view range; or an associated region whose degree of match with the current view range is greater than a preset threshold.
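The two matching conditions just named can be sketched as follows; the helper names and the overlap-ratio definition of "degree of match" are assumptions for illustration, not definitions from the patent.

```python
def falls_within(region, view):
    """Condition 1: the associated region lies entirely inside the view."""
    x, y, w, h = region
    vx, vy, vw, vh = view
    return x >= vx and y >= vy and x + w <= vx + vw and y + h <= vy + vh

def match_degree(region, view):
    """Condition 2 (assumed metric): fraction of the associated region
    that overlaps the current view range."""
    x, y, w, h = region
    vx, vy, vw, vh = view
    ox = max(0, min(x + w, vx + vw) - max(x, vx))   # horizontal overlap
    oy = max(0, min(y + h, vy + vh) - max(y, vy))   # vertical overlap
    return (ox * oy) / (w * h) if w * h else 0.0
```

A client could then treat a region as matching when `falls_within(...)` holds, or when `match_degree(...)` exceeds the preset threshold carried in the MPD.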
The at least one processor 601 is further configured to download the at least one audio segment included in the MPD file to the local client, so that after determining, according to the user's current view range and the at least one piece of audio spatial description information, the first audio segment matching the current view range, the client obtains the first audio segment from the locally downloaded audio segments for decoding and playing.
In this embodiment, the at least one processor 601 may include processors 601 of different types, or include processors 601 of the same type; the processor 601 may be any of the following: the system comprises a CPU, an ARM processor, an FPGA, a special processor and other devices with computing processing capacity. In an alternative embodiment, the at least one processor 601 may also be integrated as a many-core processor.
The memory 602 may be any one or any combination of the following: RAM, ROM, NVM, SSD, mechanical hard disk, magnetic disk array, etc.
The communication interface 603 is used for the host 600 to perform data interaction with other devices (e.g., a host where a server is located). The communication interface 603 may be any one or any combination of the following: a network interface (e.g., an ethernet interface), a wireless network card, etc. having a network access function.
The bus 604 may include an address bus, a data bus, a control bus, etc., which is represented by a thick line in fig. 6 for ease of illustration. The bus 604 may be any one or any combination of the following: ISA bus, PCI bus, EISA bus and other wired data transmission devices.
With the server and the client provided in the embodiments of the present application, the client can request from the server an MPD file containing the identifiers of audio segments and the audio spatial description information corresponding to those identifiers, so that after the current view range is determined, the client can calculate the associated region of each audio in the panoramic video image from the audio spatial description information. When the associated region corresponding to an audio segment matches the user's current view range, the client obtains the audio file accurately matching the video image and plays it, so that audio and video image are matched synchronously and the user's viewing experience is improved. This solves the problem in existing panoramic video playing methods that, when the client's current view changes, the client cannot select a matching audio file to play, resulting in poor user experience. Further, in the embodiments of the present application, by setting the region matching condition of an audio segment in the MPD file, different matching conditions between the associated region of an audio and the user's current view can be applied, providing a more flexible matching effect between audio and video image.
Example two
An embodiment of the present application provides a method for matching an audio view and a video view, as shown in fig. 7, in the method, an interaction flow between a server and a client is as follows:
s701: the method comprises the steps that a client sends a first request message for acquiring an MPD file of a panoramic video to a server, wherein the first request message carries an identifier of the MPD file.
In S701, the identifier of the MPD file is used by the server to obtain the MPD file indicated by that identifier. The identifier of the MPD file may be a Uniform Resource Identifier (URI); for example, when the URI is http://example.com/mpd, the first request message may be as follows:
GET http://example.com/mpd HTTP/1.1
Connection:keep-alive
It should be noted that the first request message is only an exemplary description, and the first request message in this embodiment may include other parameters besides the identifier of the MPD file, which is not described herein again.
S702: and the server acquires the MPD file according to the identification of the MPD file.
In S702, the MPD file includes an identifier of at least one audio segment and audio space description information corresponding to the identifier, where the audio space description information is used to describe an associated area of the at least one audio segment.
Illustratively, the content of the MPD file including the audio spatial description information is as follows:
(The XML listing of the MPD file containing the audio spatial description information is not reproduced in the source text.)
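Since the listing did not survive, the audio part of this MPD can be sketched as follows, using only the asrd scheme URI and value strings quoted elsewhere in this embodiment; the Representation ids, bitrates, and BaseURL of the main audio are invented for illustration.

```xml
<!-- main (default) audio: no spatial description information -->
<AdaptationSet mimeType="audio/mp4">
  <Representation id="a_main" bandwidth="64000">
    <BaseURL>main_audio.mp4</BaseURL>
  </Representation>
</AdaptationSet>
<!-- audio segment 1: region (480,390) 810x300 in a 3840x1080 panorama -->
<AdaptationSet mimeType="audio/mp4">
  <SupplementalProperty schemeIdUri="urn:mpeg:dash:asrd:2016"
                        value="480,390,810,300,3840,1080"/>
  <Representation id="a1" bandwidth="64000"/>
</AdaptationSet>
<!-- audio segment 2: total width/height omitted, inherited from segment 1 -->
<AdaptationSet mimeType="audio/mp4">
  <SupplementalProperty schemeIdUri="urn:mpeg:dash:asrd:2016"
                        value="3072,285,480,510"/>
  <Representation id="a2" bandwidth="64000"/>
</AdaptationSet>
```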
the description of some attributes in the MPD file including the spatial description information is shown in the following table one:
Table one
(The table listing is not reproduced in the source text.)
In table one, AdaptationSet@mimeType indicates the media type. From AdaptationSet (mimeType="video/mp4") it can be seen that the MPD file includes an mp4 video file, and the AdaptationSet includes 3 video segments with different bitrates, corresponding to different video heights and widths; for example, when the bitrate is bandwidth="1024000", the width of the video image is width="2560" and the height is height="720". Since the video in this embodiment is transmitted in full frame, the width and height of the panoramic image in the panoramic video are 2560 and 720. In addition, the MPD file further includes 3 audio segments, AdaptationSet (mimeType="audio/mp4"), comprising a main audio segment and 2 audio segments corresponding to specific regions. schemeIdUri="urn:mpeg:dash:asrd:2016" indicates audio spatial description information, and its value definitions are shown in table two, where M indicates mandatory and O indicates optional.
Table two

| @value        | M/O | Description |
| ------------- | --- | ----------- |
| object_x      | M   | Abscissa of the upper-left corner of the audio segment's associated region in the panoramic video image |
| object_y      | M   | Ordinate of the upper-left corner of the audio segment's associated region in the panoramic video image |
| object_width  | M   | Width (horizontal dimension) of the audio segment's associated region |
| object_height | M   | Height (vertical dimension) of the audio segment's associated region |
| total_width   | O   | Width of the panoramic video image |
| total_height  | O   | Height of the panoramic video image |
Therefore, the audio spatial description information <SupplementalProperty schemeIdUri="urn:mpeg:dash:asrd:2016" value="480,390,810,300,3840,1080"/> corresponding to audio segment 1 indicates that the associated region of that audio segment is, within a panoramic video image of width 3840 and height 1080, the region with its upper-left corner at (480,390), width 810, and height 300. Since the width and height of the panoramic video image are already provided in the spatial description information corresponding to audio segment 1, they may be omitted in the <SupplementalProperty schemeIdUri="urn:mpeg:dash:asrd:2016" value="3072,285,480,510"/> corresponding to audio segment 2, which indicates that the associated region of audio segment 2 is, within the same 3840-by-1080 panoramic video image, the region with its upper-left corner at (3072,285), width 480, and height 510.
In this embodiment, an audio segment for which no audio spatial description information is provided is regarded as the main audio, which may also be called the default audio. Besides treating such a segment as the default audio, if the audio segments include priority information, the audio segment with the highest priority may also be regarded as the default audio.
It should be noted that, besides the description method given in table two, the audio spatial description information may describe the associated region by the coordinate positions of its vertices; this application does not limit the description method of the spatial region. Likewise, besides the absolute-value description above, the region may be described by ratios relative to the panoramic video image.
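As an illustration, a small parser for the comma-separated @value string of table two might look like this; the function name and the dictionary layout are assumptions, but the field order and the optional, inheritable total_width/total_height follow the table.

```python
def parse_asrd(value, default_total=None):
    """Parse an asrd @value string into its region and panorama size.
    total_width/total_height are optional and may be inherited from
    another segment's description via default_total."""
    parts = [int(p) for p in value.split(",")]
    x, y, w, h = parts[:4]                      # mandatory fields
    total = tuple(parts[4:6]) if len(parts) >= 6 else default_total
    return {"region": (x, y, w, h), "total": total}
```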
S703: and the server returns the MPD file to the client, wherein the MPD file comprises an identifier of at least one audio fragment and audio space description information corresponding to the identifier, and the audio space description information is used for describing an associated area of the at least one audio fragment.
In this embodiment, the server may send the MPD file to the client in the manner described above, so that the client realizes one-to-one matching between video segments and audio segments based on the MPD file. The method may further include the following steps, in which the server transmits the audio segments of the panoramic video to the client:
s704: and the client sends a second request message for acquiring the video fragment to the server, wherein the second request message comprises the identifier of the video fragment.
The client requests from the server a video segment with a suitable bitrate selected according to the current bandwidth condition. Assuming the bitrate selected by the client is bandwidth="1024000", the corresponding Representation is as follows:
<Representation id="v2" bandwidth="1024000" width="2560" height="720">
<BaseURL>562465736.mp4</BaseURL>
</Representation>
Therefore, the URL of the video segment is http://cdn1.example.com/562465736.mp4, and the format of the second request message is as follows:
GET http://cdn1.example.com/562465736.mp4 HTTP/1.1
Connection:keep-alive
s705: and the server sends the video fragments to the client according to the identifiers of the video fragments.
S706: the client side according to the current view angle range of the user and at least one audio spatial description information in the MPD file,
determining a first audio slice matching the current view angle range.
Since the panoramic image corresponding to the video segment acquired by the client in S705 has a width of 2560 and a height of 720, assume that the user's current view range is the region in that 2560-by-720 panoramic video image with its upper-left corner at (320,260), width 540, and height 200. Because the panoramic video image referenced by the audio spatial description information of the MPD file in table one has width 3840 and height 1080, the client needs to scale the values in the audio spatial description information:
object_x' = object_x * width' / total_width
object_y' = object_y * height' / total_height
object_width' = object_width * width' / total_width
object_height' = object_height * height' / total_height
the object _ x, object _ y, object _ width, object _ height, total _ width, and total _ height are original value values in audio space description information corresponding to audio slices in the MPD file, the width and height are width and height of a panoramic video image corresponding to a video slice acquired by a client, and the object _ x ', object _ y', object _ width ', object _ height', width, and height are space description information of the audio slices in the panoramic video image corresponding to the video slice acquired by the client. After calculation, the audio slice 1 is an area with width 2560, height 720 and associated area in the panoramic video image (320,260) as the upper left corner, width 540 as the area with height 200, the audio slice 2 is an area with width 2560, height 720 and associated area in the panoramic video image (2030,190) as the upper left corner, width 320 as the area with height 340, so that the client determines that the audio slice matching with the current view angle range area of the user is audio slice 1, that is, the first audio slice is audio slice 1.
S707: and the client sends a third request message for acquiring the first audio fragment matched with the video fragment to a server, wherein the third request message carries the identifier of the first audio fragment.
The AdaptationSet corresponding to audio segment 1 includes two audio segments with different bitrates. Assume that the client selects the audio segment with bitrate "64000" according to the current bandwidth:
(The XML listing of the audio Representations is not reproduced in the source text.)
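A hedged reconstruction of what this listing plausibly showed, using only facts stated in the surrounding text (the second Representation's id, bitrate, and BaseURL are invented):

```xml
<AdaptationSet mimeType="audio/mp4">
  <SupplementalProperty schemeIdUri="urn:mpeg:dash:asrd:2016"
                        value="480,390,810,300,3840,1080"/>
  <Representation id="a1-hi" bandwidth="64000">
    <BaseURL>3463275477.mp4</BaseURL>
  </Representation>
  <Representation id="a1-lo" bandwidth="32000">
    <BaseURL>audio1_low.mp4</BaseURL>  <!-- hypothetical -->
  </Representation>
</AdaptationSet>
```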
Therefore, the URL of the selected audio slice is http:// cdn1.example. com/3463275477.mp4, and the third request message format is as follows:
GET http://cdn1.example.com/3463275477.mp4 HTTP/1.1
Connection:keep-alive
s708: and the server sends the first audio fragment to the client according to the identifier of the first audio fragment.
The server sends the corresponding audio segment to the client according to the client's third request message, and the client decodes and plays the audio segment.
It should be noted that, because the data size of audio segments is small, the client may also download multiple audios locally in advance, and after determining in S706 the audio segment matching the user's current view range, directly obtain that audio segment locally for decoding and playing.
Further, after the user changes the current view, the client acquires the audio segment matching the new current view for decoding and playing.
Assume that after the user changes view, the region viewed is the region in the 2560-by-720 panoramic video image with upper-left corner (2048,190), width 320, and height 340. The client then determines, according to step S706, that the audio segment matching the user's current view range is audio segment 2, and performs S707 and S708 to obtain the audio segment with bitrate "64000" in the AdaptationSet corresponding to audio segment 2 for decoding and playing.
It should be noted that the execution sequence between S704-S705 and S706-S708 is not limited in this application.
In a possible embodiment, the MPD file further includes a region matching condition of at least one audio slice in the MPD file and/or a matching policy of multiple audio slices. This embodiment is described in detail in example three below.
Fig. 8 illustrates a method for matching an audio view and a video view, with the client as the execution subject; the execution process of the server is the same as in fig. 7 and is not described again here.
As shown in fig. 8, the method for determining an audio slice matching a current video view by a client includes the following steps:
800: the method comprises the steps that a client sends a first request message for acquiring an MPD file of a panoramic video to a server, wherein the first request message carries an identifier of the MPD file. The specific implementation process may refer to S701 in fig. 7, which is not described herein again.
801: the method comprises the steps that a client receives an MPD file sent by a server, wherein the MPD file comprises an identifier of at least one audio fragment and audio space description information corresponding to the identifier, and the audio space description information is used for describing an associated area of the at least one audio fragment.
Currently, methods for transmitting panoramic video can be mainly divided into full-frame transmission and block transmission. When the panoramic video is transmitted in full frame, the content of the MPD file may be as shown in embodiment two. Embodiment three focuses on block transmission as an example; in this case, the content of the MPD file including the audio spatial description information is as follows.
(The XML listing of this MPD file is not reproduced in the source text.)
The MPD file includes a main audio segment and 2 audio segments corresponding to specific regions, where schemeIdUri="urn:mpeg:dash:asrd:2016" represents audio spatial description information. The audio spatial description information may adopt the representation method defined in table two of embodiment two; embodiment three adopts a relative-value representation, whose value definitions are shown in table three:
Table Three (reproduced as images in the original publication) defines the value attribute of the audio space description information as four relative proportions: the horizontal and vertical offsets of the associated region's top-left corner, and the region's width and height, each given as a fraction of the panoramic video image's width or height.
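The MPD listing itself appears only as images in the publication. The sketch below is a simplified, hypothetical reconstruction (namespaces, segment URLs, and most DASH attributes are omitted; element layout and AdaptationSet ids are assumptions): it contains a main audio AdaptationSet without audio space description information and two region-specific ones carrying the asrd SupplementalProperty with the relative values used later in 803, and shows how a client might read them out:

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified MPD fragment. Only the asrd SupplementalProperty
# follows the text's description; everything else is illustrative.
MPD_XML = """
<MPD>
  <Period>
    <AdaptationSet id="main-audio" mimeType="audio/mp4"/>
    <AdaptationSet id="audio-1" mimeType="audio/mp4">
      <SupplementalProperty schemeIdUri="urn:mpeg:dash:asrd:2016"
                            value="0.125,0.361,0.211,0.278"/>
    </AdaptationSet>
    <AdaptationSet id="audio-2" mimeType="audio/mp4">
      <SupplementalProperty schemeIdUri="urn:mpeg:dash:asrd:2016"
                            value="0.8,0.264,0.125,0.472"/>
    </AdaptationSet>
  </Period>
</MPD>
"""

def parse_asrd(mpd_xml):
    """Map each audio AdaptationSet id to its asrd value tuple (or None)."""
    root = ET.fromstring(mpd_xml)
    result = {}
    for aset in root.iter("AdaptationSet"):
        prop = aset.find("SupplementalProperty")
        if prop is not None and prop.get("schemeIdUri") == "urn:mpeg:dash:asrd:2016":
            result[aset.get("id")] = tuple(float(v) for v in prop.get("value").split(","))
        else:
            result[aset.get("id")] = None  # e.g. the main (default) audio
    return result

print(parse_asrd(MPD_XML))
```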
802: the client selects the video fragments and determines the width and height of the panoramic video image corresponding to the video fragments.
The client selects a video slice with a suitable code rate according to the current bandwidth. When the panoramic video is transmitted in full frames as in embodiment two, the width and height corresponding to the selected video slice are the width and height of the panoramic video image. When the panoramic video is transmitted in tiles as in embodiment three, suppose the client selects, according to the current bandwidth, the video slice with bandwidth="128000"; its width="960" height="270" attributes indicate that the slice corresponds to a video picture 960 wide and 270 high. As described in the MPD file of the above example, a SupplementalProperty element in the video AdaptationSet (mimeType="video/mp4") indicates that the panoramic video picture is divided into 4 × 4 = 16 tiles (Tile), i.e., the width and height of each video slice are each one quarter of those of the panoramic video picture. Therefore the panoramic video image corresponding to the selected video slice has width 960 × 4 = 3840 and height 270 × 4 = 1080.
It should be noted that existing techniques may be used to determine the width and height of the panoramic video image corresponding to a video slice; embodiment three is only an example and imposes no limitation.
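The tile arithmetic of 802 can be sketched as follows (the 4 × 4 grid and the 960 × 270 tile size come from the example above):

```python
def panorama_size(tile_w, tile_h, grid_cols, grid_rows):
    # Each video slice covers one tile; the panoramic picture spans the grid.
    return tile_w * grid_cols, tile_h * grid_rows

# Selected slice: width="960" height="270"; panorama split into 4 x 4 tiles.
print(panorama_size(960, 270, 4, 4))  # -> (3840, 1080)
```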
803: and the client calculates the associated area of each audio fragment in the panoramic video image corresponding to the video fragment according to the audio space description information in the MPD file.
When the audio space description information is expressed in absolute values as shown in table two, the associated region of each audio slice in the panoramic video image corresponding to the video slice may be calculated as described in S706 of embodiment two. Embodiment three details the calculation when the audio space description information uses the relative-proportion representation shown in table three.
The overall width and height of the panoramic video image determined in 802 are 3840 and 1080 respectively. With the relative-proportion value attribute defined in table three: the audio space description information <SupplementalProperty schemeIdUri="urn:mpeg:dash:asrd:2016" value="0.125,0.361,0.211,0.278"/> indicates that the associated region of that audio slice is the region with top-left corner (0.125 × 3840 = 480, 0.361 × 1080 ≈ 390), width 0.211 × 3840 ≈ 810, and height 0.278 × 1080 ≈ 300 in the panoramic video image of width 3840 and height 1080. The audio space description information <SupplementalProperty schemeIdUri="urn:mpeg:dash:asrd:2016" value="0.8,0.264,0.125,0.472"/> indicates that the associated region of that audio slice is the region with top-left corner (0.8 × 3840 = 3072, 0.264 × 1080 ≈ 285), width 0.125 × 3840 = 480, and height 0.472 × 1080 ≈ 510 in the same panoramic video image.
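The conversion in 803 from relative asrd values to pixel regions can be written directly (rounding to the nearest pixel is an assumption; the text simply quotes the rounded figures):

```python
def associated_region(value, pano_w, pano_h):
    # value = (x, y, w, h) as fractions of the panoramic picture's
    # width (x, w) and height (y, h); returns (left, top, width, height).
    x, y, w, h = value
    return (round(x * pano_w), round(y * pano_h),
            round(w * pano_w), round(h * pano_h))

print(associated_region((0.125, 0.361, 0.211, 0.278), 3840, 1080))  # audio slice 1
print(associated_region((0.8, 0.264, 0.125, 0.472), 3840, 1080))    # audio slice 2
```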
804: when there is a match between the associated region of the alternative audio slice and the current view angle range, 805 is performed; otherwise, 807 is performed.
The client determines, as an alternative audio slice, the audio slice corresponding to an associated region in the at least one associated region that matches the current view angle range.
Specifically, whether the associated region of an audio slice matches the current view angle range can be determined in either of the following ways:
Mode one: if the associated region of an audio slice is the same as the current view angle range, the audio slice is determined to match the current view angle range.
After the associated region of an audio slice is calculated as above, if it is identical to the user's current view angle range region, the audio slice is considered to match the current view angle range. For example, suppose the user's current view angle range is the region with top-left corner (480,390), width 810, and height 300 in the panoramic video image of width 3840 and height 1080. From the calculation in 803, the associated region of audio slice 1 is the same as this region, i.e., audio slice 1 matches the current view angle range, as shown in fig. 9A.
Mode two: if the associated region of an audio slice satisfies the region matching condition with the current view angle range, the audio slice is determined to match the current view angle range.
Specifically, an associated region that satisfies the region matching condition with the current view angle range is: an associated region falling within the current view angle range; or an associated region whose matching degree with the current view angle range is greater than a preset threshold.
Specifically, a region matching condition may be set in the MPD file; when the associated region of an audio slice and the user's current view angle range region satisfy it, the audio slice is determined to match the current view angle range.
For example: 1) the region matching condition is an inclusion relationship: when the user's current view angle range region includes the associated region of an audio slice, the audio slice is considered to match the current view angle range, as shown in figs. 9A and 9B; 2) the region matching condition is a minimum matching proportion, a preset ratio: when the overlap between the user's current view angle range region and the audio slice's associated region, taken as a fraction of the associated region, exceeds the minimum matching proportion, the audio slice is considered to match the current view angle range, as shown in fig. 9C.
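The two region matching conditions above (containment, and minimum matching proportion) can be sketched with axis-aligned rectangles given as (left, top, width, height):

```python
def contains(view, region):
    # Condition 1: the current view range fully includes the associated region.
    vx, vy, vw, vh = view
    rx, ry, rw, rh = region
    return vx <= rx and vy <= ry and rx + rw <= vx + vw and ry + rh <= vy + vh

def overlap_ratio(view, region):
    # Condition 2: overlap area as a fraction of the associated region's own
    # area, to be compared against the preset minimum matching proportion.
    vx, vy, vw, vh = view
    rx, ry, rw, rh = region
    ow = max(0, min(vx + vw, rx + rw) - max(vx, rx))
    oh = max(0, min(vy + vh, ry + rh) - max(vy, ry))
    return (ow * oh) / (rw * rh)

view = (480, 390, 810, 300)     # user's current view range (example from 803)
region1 = (480, 390, 810, 300)  # audio slice 1's associated region
print(contains(view, region1), overlap_ratio(view, region1))
```

With the example regions of 803, the view contains audio slice 1's region exactly, so both conditions hold for any threshold up to 1.0.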
It should be noted that the present application is not limited to the matching method of the associated area of the audio slice and the current viewing angle range.
805: when there is a match between the associated regions of the at least two alternative audio slices and the current view angle range, performing 806; otherwise, 808 is performed.
More than one audio slice's associated region may match the current view angle range as determined by the above method; see figs. 10A, 10B and 10C.
806: when the MPD file contains the multi-audio matching strategy, executing 809; otherwise, 807 is performed.
807: and the client selects the default audio fragment as the first audio fragment for decoding and playing.
The default audio slice may be an audio slice with no associated region (i.e., no audio space description information set), or the audio slice with the highest preset priority.
808: and selecting the first audio slice matched with the current view angle range for decoding and playing.
809: and determining a first audio slice which is matched with the current visual angle range and is to be acquired according to a multi-audio matching strategy for decoding and playing.
The multi-audio matching policy indicates how to select an audio slice matching the current view angle range when the associated regions of multiple audio slices all match it. For example, a priority matching policy may serve as one implementation of the multi-audio matching policy: the priority of each audio slice is preset in the MPD file, and the audio slice with the highest priority is selected as the first audio slice matching the current view angle range. As another example, a matching-degree policy may serve as an implementation of the multi-audio matching policy: the overlap between each associated region and the current view angle range region is calculated, and the associated region with the largest overlap area is taken as the one with the highest matching degree; alternatively, the ratio of the overlap area to the associated region's own area is calculated, and the associated region with the largest ratio is taken as the one with the highest matching degree. The first audio slice is then the one corresponding to the associated region so selected.
It should be noted that the multi-audio matching policy is not specifically limited in this application; any method for selecting an audio slice matching the current view angle range when the associated regions of multiple audio slices all match it may be used as the multi-audio matching policy.
If the multi-audio matching policy is a priority matching policy, the audio adaptation set (AdaptationSet) should include a priority attribute for indicating the priority of the audio slice. When the associated areas of the plurality of audio fragments can be matched with the current view angle range, the audio fragments with the priorities meeting the requirements are determined to be the audio fragments matched with the current view angle range by comparing the priorities of the audio fragments.
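A sketch of the two multi-audio matching policies discussed above; the candidate data structure, the field names, and the convention that a smaller number means a higher priority are all illustrative assumptions:

```python
def choose_audio(candidates, policy, view=None):
    # Pick one audio slice from several whose regions all match the view.
    # candidates: list of dicts {"id": ..., "priority": ..., "region": (l, t, w, h)}
    if policy == "priority":
        # Priority matching policy: highest-priority slice wins.
        return min(candidates, key=lambda c: c["priority"])["id"]
    if policy == "overlap":
        # Matching-degree policy: largest overlap area with the view wins.
        def overlap_area(c):
            vx, vy, vw, vh = view
            rx, ry, rw, rh = c["region"]
            ow = max(0, min(vx + vw, rx + rw) - max(vx, rx))
            oh = max(0, min(vy + vh, ry + rh) - max(vy, ry))
            return ow * oh
        return max(candidates, key=overlap_area)["id"]
    raise ValueError("unknown multi-audio matching policy")

candidates = [
    {"id": "audio-1", "priority": 2, "region": (480, 390, 810, 300)},
    {"id": "audio-2", "priority": 1, "region": (3072, 285, 480, 510)},
]
print(choose_audio(candidates, "priority"))
print(choose_audio(candidates, "overlap", view=(480, 390, 810, 300)))
```

Here the priority policy picks audio-2 (priority 1), while the overlap policy picks audio-1, whose region coincides with the view.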
The MPD file in embodiment three adds the region matching condition on the basis of embodiment two, so the matching relationship between an audio slice's associated region and the current view angle range region can be expressed more flexibly. Further, the multi-audio matching policy resolves how to select the best audio slice when multiple audio slices match the current view angle range, giving the user a more accurate viewing experience with synchronized audio and video view angles.
EXAMPLE III
Based on the above embodiments, an embodiment of the present invention further provides a server, which may be the same device as the server shown in fig. 5 and may perform the method executed by the server side in embodiment two. Referring to fig. 11, a server 1100 according to an embodiment of the present invention includes a receiving unit 1101 and a processing unit 1102, wherein:
a receiving unit 1101, configured to receive a first request message sent by a client and used to acquire a Media Presentation Description (MPD) file of a panoramic video, where the first request message carries an identifier of the MPD file;
a processing unit 1102, configured to return the MPD file to the client according to the identifier of the MPD file, where the MPD file includes an identifier of at least one audio segment and audio space description information corresponding to the identifier, and the audio space description information is used to describe an associated area of the at least one audio segment.
In a possible implementation manner, the MPD file further includes a region matching condition for at least one audio segment in the MPD file and/or a matching policy for multiple audio segments.
In one possible implementation, the server further includes a sending unit 1103,
the receiving unit 1101 is further configured to receive a second request message sent by the client and used for acquiring a video fragment, where the second request message carries an identifier of the video fragment;
the sending unit 1103 is configured to send the video fragment to the client according to the identifier of the video fragment.
In a possible implementation manner, the receiving unit 1101 is further configured to receive a third request message sent by the client and used to obtain a first audio fragment matched with the video fragment, where the third request message carries an identifier of the first audio fragment;
the sending unit 1103 is further configured to send the first audio fragment to the client according to the identifier of the first audio fragment.
For the functions of the above units, refer to the method executed by the server side in embodiment two; details are not repeated here.
It should be noted that the division of the unit in the embodiment of the present invention is schematic, and is only a logic function division, and there may be another division manner in actual implementation. In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
Based on the above embodiments, an embodiment of the present invention further provides a client, which may be the same device as the client shown in fig. 6 and may perform the method executed by the client side in embodiment two. Referring to fig. 12, a client 1200 according to an embodiment of the present invention includes a receiving unit 1201, a processing unit 1202 and a sending unit 1203, wherein:
a sending unit 1203, configured to send a first request message for obtaining a media presentation description MPD file of a panoramic video to a server, where the first request message carries an identifier of the MPD file;
a receiving unit 1201, configured to receive the MPD file fed back by the server according to the identifier of the MPD file, where the MPD file includes an identifier of at least one audio segment and audio space description information corresponding to the identifier, the audio space description information being used to describe an associated region of at least one audio segment in the MPD file;
a processing unit 1202, configured to determine, according to a current view range of a user and the at least one piece of audio space description information, a first audio slice that matches the current view range.
In a possible implementation manner, the MPD file further includes a region matching condition for at least one audio segment in the MPD file and/or a matching policy for multiple audio segments.
In a possible implementation manner, when determining, according to the current view angle range of the user and the at least one piece of audio space description information, a first audio slice matching the current view angle range, the processing unit 1202 is specifically configured to:
obtaining at least one associated area of at least one audio slice in the MPD file in the panoramic video according to the at least one audio space description information;
determining the audio fragment corresponding to the associated region matched with the current view angle range in the at least one associated region as an alternative audio fragment;
if only one alternative audio fragment exists, determining the alternative audio fragment as a first audio fragment;
if at least two alternative audio fragments exist, determining a first audio fragment according to the matching strategy of the multi-audio fragment;
and if the alternative audio fragment does not exist, determining the default audio fragment which is pre-configured as the first audio fragment.
In a possible implementation manner, an associated region in the at least one associated region that matches the current view angle range is an associated region identical to the current view angle range; or an associated region that satisfies the region matching condition with the current view angle range.
In a possible implementation manner, the associated region satisfying the region matching condition with the current view angle range includes: an associated region falling within the current view angle range; or an associated region whose matching degree with the current view angle range is greater than a preset threshold.
In one possible implementation, the processing unit 1202 is further configured to:
download the at least one audio fragment included in the MPD file to the local client; after the first audio fragment matching the current view angle range is determined according to the current view angle range of the user and the at least one piece of audio space description information, the client acquires the first audio fragment from the at least one locally downloaded audio fragment for decoding and playing.
For the functions of the above units, refer to the method executed by the client side in embodiment two; details are not repeated here.
It should be noted that the division of the unit in the embodiment of the present invention is schematic, and is only a logic function division, and there may be another division manner in actual implementation. In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, embodiments of the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, embodiments of the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
Embodiments of the present application are described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It will be apparent to those skilled in the art that various changes and modifications may be made in the embodiments of the present application without departing from the spirit and scope of the application. Thus, if such modifications and variations of the embodiments of the present application fall within the scope of the claims of the present application and their equivalents, the present application is also intended to encompass such modifications and variations.

Claims (22)

1. A method for audio and video perspective matching, comprising:
a server receives a first request message sent by a client and used for acquiring a Media Presentation Description (MPD) file of a panoramic video, wherein the first request message carries an identifier of the MPD file;
and the server returns the MPD file to the client according to the identifier of the MPD file, wherein the MPD file comprises the identifier of at least one audio fragment and audio space description information corresponding to the identifier, and the audio space description information is used for describing the associated region of the at least one audio fragment.
2. The method of claim 1, wherein the MPD file further comprises a region matching condition for at least one audio slice and/or a matching policy for multiple audio slices in the MPD file.
3. The method of claim 1 or 2, wherein the method further comprises:
the server receives a second request message which is sent by the client and used for acquiring the video fragment, wherein the second request message carries the identifier of the video fragment;
and the server sends the video fragments to the client according to the identifiers of the video fragments.
4. The method of claim 3, wherein the method further comprises:
the server receives a third request message which is sent by the client and used for acquiring a first audio fragment matched with the video fragment, wherein the third request message carries an identifier of the first audio fragment;
and the server sends the first audio fragment to the client according to the identifier of the first audio fragment.
5. A method for audio and video perspective matching, comprising:
a client sends a first request message for acquiring a Media Presentation Description (MPD) file of a panoramic video to a server, wherein the first request message carries an identifier of the MPD file;
the client receives the MPD file fed back by the server according to the identifier of the MPD file, wherein the MPD file comprises an identifier of at least one audio fragment and audio space description information corresponding to the identifier, and the audio space description information is used for describing an associated region of the at least one audio fragment in the MPD file;
and the client determines a first audio fragment matched with the current visual angle range according to the current visual angle range of the user and the at least one piece of audio space description information.
6. The method of claim 5, wherein the MPD file further comprises a region matching condition for at least one audio slice and/or a matching policy for multiple audio slices in the MPD file.
7. The method of claim 6, wherein the determining, by the client, the first audio slice matching the current view range according to the current view range of the user and the at least one audio space description information comprises:
the client obtains at least one associated area of at least one audio fragment in the MPD file in the panoramic video according to the at least one audio space description information;
the client determines the audio fragment corresponding to the associated region matched with the current view angle range in the at least one associated region as an alternative audio fragment;
if only one alternative audio fragment exists, determining the alternative audio fragment as a first audio fragment;
if at least two alternative audio fragments exist, determining a first audio fragment according to the matching strategy of the multi-audio fragment;
and if the alternative audio fragment does not exist, determining the default audio fragment which is pre-configured as the first audio fragment.
8. The method of claim 7, wherein an associated region in the at least one associated region that matches the current view angle range is an associated region identical to the current view angle range; or an associated region that satisfies the region matching condition with the current view angle range.
9. The method of claim 8, wherein the associated region satisfying the region matching condition with the current view angle range comprises: an associated region falling within the current view angle range; or an associated region whose matching degree with the current view angle range is greater than a preset threshold.
10. The method of claim 5, wherein the method further comprises:
the client downloads the at least one audio fragment included in the MPD file to the local client; and after determining the first audio fragment matching the current view angle range according to the current view angle range of the user and the at least one piece of audio space description information, the client acquires the first audio fragment from the at least one locally downloaded audio fragment for decoding and playing.
11. A server, comprising:
the system comprises a receiving unit, a processing unit and a processing unit, wherein the receiving unit is used for receiving a first request message sent by a client and used for acquiring a Media Presentation Description (MPD) file of a panoramic video, and the first request message carries an identifier of the MPD file;
and the processing unit is configured to return the MPD file to the client according to the identifier of the MPD file, where the MPD file includes an identifier of at least one audio segment and audio space description information corresponding to the identifier, and the audio space description information is used to describe an associated region of the at least one audio segment.
12. The server of claim 11, wherein the MPD file further comprises a region matching condition for at least one audio slice and/or a matching policy for multiple audio slices in the MPD file.
13. The server according to claim 11 or 12, wherein the server further comprises a transmitting unit,
the receiving unit is further configured to receive a second request message sent by the client and used for acquiring a video fragment, where the second request message carries an identifier of the video fragment;
and the sending unit is used for sending the video fragments to the client according to the identifiers of the video fragments.
14. The server according to claim 13, wherein the receiving unit is further configured to receive a third request message sent by the client to obtain a first audio fragment matching the video fragment, where the third request message carries an identifier of the first audio fragment;
the sending unit is further configured to send the first audio fragment to the client according to the identifier of the first audio fragment.
15. A client, comprising:
a sending unit, configured to send a first request message for obtaining a Media Presentation Description (MPD) file of a panoramic video to a server, where the first request message carries an identifier of the MPD file;
a receiving unit, configured to receive the MPD file fed back by the server according to the identifier of the MPD file, wherein the MPD file comprises an identifier of at least one audio fragment and audio space description information corresponding to the identifier, and the audio space description information is used for describing an associated region of at least one audio fragment in the MPD file;
and the processing unit is used for determining a first audio fragment matched with the current visual angle range according to the current visual angle range of the user and the at least one piece of audio space description information.
16. The client of claim 15, wherein the MPD file further comprises a region matching condition for at least one audio slice and/or a matching policy for multiple audio slices in the MPD file.
17. The client according to claim 16, wherein the processing unit, when determining, according to a current viewing angle range of a user and the at least one audio spatial description information, a first audio slice that matches the current viewing angle range, is specifically configured to:
obtaining at least one associated area of at least one audio slice in the MPD file in the panoramic video according to the at least one audio space description information;
determining the audio fragment corresponding to the associated region matched with the current view angle range in the at least one associated region as an alternative audio fragment;
if only one alternative audio fragment exists, determining the alternative audio fragment as a first audio fragment;
if at least two alternative audio fragments exist, determining a first audio fragment according to the matching strategy of the multi-audio fragment;
and if the alternative audio fragment does not exist, determining the default audio fragment which is pre-configured as the first audio fragment.
18. The client of claim 17, wherein an associated region in the at least one associated region that matches the current view angle range is an associated region identical to the current view angle range; or an associated region that satisfies the region matching condition with the current view angle range.
19. The client of claim 18, wherein the associated region satisfying the region matching condition with the current view angle range comprises: an associated region falling within the current view angle range; or an associated region whose matching degree with the current view angle range is greater than a preset threshold.
20. The client of claim 15, wherein the processing unit is further to:
download the at least one audio fragment included in the MPD file to the local client; and after determining the first audio fragment matching the current view angle range according to the current view angle range of the user and the at least one piece of audio space description information, acquire the first audio fragment from the at least one locally downloaded audio fragment for decoding and playing.
21. A server comprising a memory, a processor, and a communication interface, wherein:
the memory is used for storing a computer readable program;
the processor executes the program in the memory to perform the method according to any one of claims 1 to 4;
the communication interface is used for receiving and transmitting data under the control of the processor.
22. A client comprising a memory, a processor, and a communication interface, wherein:
the memory is used for storing a computer readable program;
the processor performs the method of any one of claims 5 to 10 by executing a program in the memory;
the communication interface is used for receiving and transmitting data under the control of the processor.
CN201710289042.5A 2017-04-27 2017-04-27 Audio and video visual angle matching method, client and server Active CN108810567B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710289042.5A CN108810567B (en) 2017-04-27 2017-04-27 Audio and video visual angle matching method, client and server


Publications (2)

Publication Number Publication Date
CN108810567A CN108810567A (en) 2018-11-13
CN108810567B true CN108810567B (en) 2020-10-16

Family

ID=64070220

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710289042.5A Active CN108810567B (en) 2017-04-27 2017-04-27 Audio and video visual angle matching method, client and server

Country Status (1)

Country Link
CN (1) CN108810567B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110139065A (en) * 2019-01-30 2019-08-16 北京车和家信息技术有限公司 Method for processing video frequency, video broadcasting method and relevant device
CN109840052B (en) * 2019-01-31 2022-03-18 成都超有爱科技有限公司 Audio processing method and device, electronic equipment and storage medium
CN111107398A (en) * 2019-12-27 2020-05-05 深圳市小溪流科技有限公司 Streaming media data transmission method and receiving method, and electronic device
CN113411684B (en) * 2021-06-24 2023-05-30 广州酷狗计算机科技有限公司 Video playing method and device, storage medium and electronic equipment

Citations (4)

Publication number Priority date Publication date Assignee Title
CN102148851A (en) * 2010-09-30 2011-08-10 华为技术有限公司 Method and device for applying parental controls in adaptive hyper text transport protocol (HTTP) streaming transmission
CN105979470A (en) * 2016-05-30 2016-09-28 北京奇艺世纪科技有限公司 Panoramic video audio frequency processing method, panoramic video audio frequency processing device, and playing system
WO2017022467A1 (en) * 2015-08-06 2017-02-09 ソニー株式会社 Information processing device, information processing method, and program
CN106572359A (en) * 2016-10-27 2017-04-19 乐视控股(北京)有限公司 Method and device for synchronously playing panoramic video on multiple terminals

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
US9385998B2 (en) * 2013-06-06 2016-07-05 Futurewei Technologies, Inc. Signaling and carriage of protection and usage information for dynamic adaptive streaming



Similar Documents

Publication Publication Date Title
JP6735415B2 (en) Method and apparatus for controlled selection of viewing point and viewing orientation of audiovisual content
EP3459252B1 (en) Method and apparatus for spatial enhanced adaptive bitrate live streaming for 360 degree video playback
CN109155873B (en) Method, apparatus and computer program for improving streaming of virtual reality media content
CN109155874B (en) Method, apparatus and computer program for adaptive streaming of virtual reality media content
JP7460722B2 (en) Spatially Non-Uniform Streaming
EP3557845A1 (en) Method and device for transmitting panoramic videos, terminal, server and system
CN108810567B (en) Audio and video visual angle matching method, client and server
EP3734980A1 (en) Video playback method and terminal, server and storage medium
CN104012106A (en) Aligning videos representing different viewpoints
CN108810600B (en) Video scene switching method, client and server
US11438645B2 (en) Media information processing method, related device, and computer storage medium
EP3490263A1 (en) Channel switching method and device
US20170223077A1 (en) Apparatus and method for providing content
CN113330751A (en) Method and apparatus for storage and signaling of media segment size and priority ranking
CN107566854B (en) Method and device for acquiring and sending media content
TWI786572B (en) Immersive media providing method and acquiring method, device, equipment and storage medium
EP3777137A1 (en) Method and apparatus for signaling of viewing extents and viewing space for omnidirectional content
CN111557096A (en) Transmission device, transmission method, reception device, and reception method
JP2017123503A (en) Video distribution apparatus, video distribution method and computer program
CN109218274A (en) A kind of processing method and processing device of media information
KR101944601B1 (en) Method for identifying objects across time periods and corresponding device
TW201942821A (en) Information processing device, information processing method, and program
CN108574881B (en) Projection type recommendation method, server and client
CN109756727B (en) Information display method and related equipment
CN111885417B (en) VR video playing method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant