WO2023236666A1 - Media information processing method and apparatus, and storage medium - Google Patents


Info

Publication number
WO2023236666A1
WO2023236666A1 · PCT/CN2023/089286
Authority
WO
WIPO (PCT)
Prior art keywords
media
information
fragmentation
media information
display timestamp
Prior art date
Application number
PCT/CN2023/089286
Other languages
English (en)
French (fr)
Inventor
陈奇
王魏强
张晓渠
Original Assignee
中兴通讯股份有限公司 (ZTE Corporation)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中兴通讯股份有限公司 (ZTE Corporation)
Publication of WO2023236666A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23 Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/234 Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80 Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/83 Generation or processing of protective or descriptive data associated with content; Content structuring
    • H04N21/845 Structuring of content, e.g. decomposing content into time segments
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80 Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/85 Assembly of content; Generation of multimedia applications
    • H04N21/854 Content authoring
    • H04N21/8547 Content authoring involving timestamps for synchronizing content

Definitions

  • the present application relates to the field of video technology, and in particular, to a media information processing method, device, and computer storage medium.
  • Embodiments of the present application provide a media information processing method, device, and computer storage medium, which can improve the user's video experience.
  • Embodiments of the present application provide a media information processing method, including: receiving multiple media information streams, wherein each media information stream includes multiple media information packets; obtaining the first display timestamp of a received target media information packet, wherein the target media information packet is the first received packet among all the media information packets; using the first display timestamp as the starting display timestamp of each media information stream; performing information fragmentation on each media information stream according to the starting display timestamp to obtain multiple pieces of media fragmentation information for each media information stream, wherein each piece of media fragmentation information corresponds to a fragmentation sequence number and all media fragmentation information with the same fragmentation sequence number has the same media duration; and aggregating the target media fragmentation information in all the media information streams to obtain free-viewpoint media fragmentation information, where the target media fragmentation information is the media fragmentation information with the same fragmentation sequence number.
  • Embodiments of the present application also provide a media information processing device, including a memory, a processor, and a computer program stored in the memory and executable on the processor; when the processor executes the computer program, the media information processing method described above is implemented.
  • Embodiments of the present application also provide a computer-readable storage medium storing computer-executable instructions, where the computer-executable instructions are used to execute the media information processing method described above.
  • The first display timestamp of the target media information packet is uniformly set as the starting display timestamp of each media information stream, which solves the problem that the pictures of the media information streams arriving at the media server at the same moment are inconsistent. On this basis, information fragmentation is performed on each media information stream according to the starting display timestamp to obtain multiple pieces of media fragmentation information, and the media fragmentation information with the same fragmentation sequence number across all media information streams is aggregated to obtain complete free-viewpoint media fragmentation information. This preserves image quality while preventing large spatial jumps in the video picture when the user switches viewpoints; therefore, the embodiments of the present application enable seamless free-viewpoint switching and improve the user's video experience, filling a technical gap in related methods.
  • Figure 1 is a flow chart of a media information processing method provided by an embodiment of the present application.
  • Figure 2a is a schematic diagram of multiple media information streams before alignment provided by an embodiment of the present application;
  • Figure 2b is a schematic diagram of multiple media information streams after alignment provided by an embodiment of the present application;
  • Figure 3 is a flow chart of obtaining multiple pieces of media fragmentation information for each media information stream in the media information processing method provided by another embodiment of the present application;
  • Figure 4 is a flow chart of steps before obtaining multiple pieces of media fragmentation information for each media information stream in the media information processing method provided by an embodiment of the present application;
  • Figure 5 is a flow chart of obtaining free-viewpoint media fragmentation information in the media information processing method provided by an embodiment of the present application;
  • Figure 6 is a schematic diagram of a media server for executing a media information processing method provided by an embodiment of the present application;
  • Figure 7 is a flow chart of a media information processing method performed by an alignment module provided by an embodiment of the present application;
  • Figure 8 is a schematic diagram of multiple media information streams provided by another embodiment of the present application;
  • Figure 9 is a flow chart of a media information processing method performed by a splicing module provided by an embodiment of the present application;
  • Figure 10 is a schematic diagram of a media information processing device provided by an embodiment of the present application.
  • The media information processing method of one embodiment includes: receiving multiple media information streams, wherein each media information stream includes multiple media information packets; obtaining the first display timestamp of a received target media information packet, wherein the target media information packet is the first received packet among all media information packets; using the first display timestamp as the starting display timestamp of each media information stream; performing information fragmentation on each media information stream according to the starting display timestamp to obtain multiple pieces of media fragmentation information for each media information stream, wherein each piece of media fragmentation information corresponds to a fragmentation sequence number and all media fragmentation information with the same fragmentation sequence number has the same media duration; and aggregating the target media fragmentation information in all media information streams to obtain free-viewpoint media fragmentation information, where the target media fragmentation information is the media fragmentation information with the same fragmentation sequence number.
  • Using the first display timestamp as the starting display timestamp of each media information stream solves the problem that the pictures of the media information streams arriving at the media server at the same moment are inconsistent. Each media information stream is then fragmented according to the starting display timestamp to obtain multiple pieces of media fragmentation information, and the media fragmentation information with the same fragmentation sequence number across all media information streams is aggregated to obtain complete free-viewpoint media fragmentation information. This preserves image quality while preventing large spatial jumps in the video picture when the user switches viewpoints; therefore, the embodiments of the present application enable seamless free-viewpoint switching and improve the user's video experience, filling a technical gap in related methods.
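As an illustrative, non-limiting sketch, the overall flow just summarized (receive, align, fragment, aggregate) might look as follows. All names and data shapes are assumptions for illustration only; the 6 s fragment duration and 90 kHz time base are consistent with the 540000-tick fragments shown in Figures 2a and 2b, but are not mandated by the method.

```python
from collections import defaultdict

TIME_BASE = 90_000        # assumed 90 kHz video clock (6 s x 90000 = 540000 ticks)
SEGMENT_SECONDS = 6       # assumed preset fragment duration

def process(arrived_packets):
    """arrived_packets: iterable of (camera_id, pts) in arrival order.
    Returns {fragment_sequence_number: {camera_id: [pts, ...]}}."""
    packets = list(arrived_packets)
    if not packets:
        return {}
    # S120/S130: the PTS of the very first packet received becomes the
    # starting display timestamp of every media information stream.
    start_pts = packets[0][1]
    segments = defaultdict(lambda: defaultdict(list))
    for cam, pts in packets:
        # S140: assign each packet to a fragment by sequence number; packets
        # earlier than the forced start fall into the first fragment (seq 0).
        seq = max(0, pts - start_pts) // (SEGMENT_SECONDS * TIME_BASE)
        segments[seq][cam].append(pts)
    # S150: fragments sharing a sequence number are aggregated together.
    return {seq: dict(cams) for seq, cams in segments.items()}
```

With the Figure 2 numbers, a packet stream beginning with camera 2's PTS 7200 puts all three cameras' early packets into fragment 0, and PTS 547200 into fragment 1.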
  • Figure 1 is a flow chart of a media information processing method provided by an embodiment of the present application.
  • The media information processing method may include, but is not limited to, steps S110 to S150.
  • Step S110 Receive multiple media information streams, where the media information streams include multiple media information packets.
  • Multiple media information streams are received so that, in subsequent steps, the received streams can be accurately distinguished from one another, and it can be determined which media information packet among all streams is the first received media information packet.
  • the execution subjects of steps S110 to S150 and related steps can be selected and set by those skilled in the art according to specific circumstances, and are not limited here.
  • a media server for overall management of each media information stream is used as the execution subject, that is, multiple media information streams are received through the media server, and the following steps S120 to S150 and related steps are executed based on the multiple media information streams.
  • Corresponding functional modules can be set up in the media server to perform corresponding steps to achieve better overall planning effects. Therefore, a streaming module can be set up in the media server to pull media information streams from each camera on the free viewpoint front-end.
  • the corresponding server, node, module or device can also serve as the execution subject of steps S110 to S150 and related steps.
  • the "media server” is mainly used as the execution subject of steps S110 to S150 and related steps, but this is not the only limitation.
  • The media server is an important device in next-generation networks. Under the control of a control device (such as a softswitch device or an application server), the media server provides the media resource functions required to implement various services on an IP network, including service tone provision, conferencing, interactive response, notification, unified messaging, and advanced voice services.
  • The application server can, but is not limited to, use MSML (Media Server Markup Language) to send playback and other commands to the media server.
  • the media server has good tailorability and can flexibly implement one or more functions, including but not limited to:
  • Dual-Tone Multi-Frequency (DTMF) signal collection and decoding: according to the relevant operating parameters sent by the control device, DTMF signals are received from a DTMF phone, encapsulated in signaling, and transmitted to the control device;
  • Recording notification sending: according to the requirements of the control device, a specified voice is used to play a specified recording notification to the user;
  • Conference function: supports audio mixing of multiple RTP streams, including mixing of different encoding formats;
  • Conversion between codec algorithms: supports G.711, G.723, G.729, and other speech codec algorithms, and can convert between them;
  • Automatic speech synthesis: concatenates several speech elements or fields to form a complete voice prompt notification, which may be fixed or variable;
  • Dynamic voice playback/recording: including music on hold, follow-me voice service, etc.;
  • Tone signal generation and sending: provides basic signal tones such as dial tone, busy tone, ringback tone, waiting tone, and vacant number tone;
  • Resource maintenance and management: provides local and/or remote maintenance and management of media resources and the device itself, such as data configuration and fault management.
  • a media server has at least one of the following features:
  • The device provides dual power supplies and supports hot swapping; it is positioned as carrier-grade equipment and has system congestion protection;
  • Easy maintenance: supports communication with SNMP network management and is capable of online system maintenance, resource management, post-event analysis, etc.;
  • Independent application layer: various value-added services can be customized for users, and the system can be updated online to meet user needs to the greatest extent.
  • the reception method of the media information stream of different cameras can be the same, or the corresponding method can be selected according to the specific settings.
  • For example, the media information streams may be pulled using RTMP (Real-Time Messaging Protocol). The embodiments of this application only require that multiple media information streams can be received; the specific receiving method is not limited here. Since the transmission method of the media information streams does not need to be limited, the embodiments are also suitable for application scenarios in which media information streams are pulled in other ways.
  • The reception timing and number of media information packets in each media information stream are not limited and may be set for specific scenarios. For example, more than 50 camera positions are usually set up in a venue, corresponding to more than 50 media information streams to be received. Since users need to enter the venue to watch at a specific time, the sending time or playback time of the selected media information streams can be set near that specific time so that users can watch the video at that time.
  • Step S120 Obtain the first display timestamp of the received target media information packet, where the target media information packet is the first received media information packet among all media information packets.
  • The pictures of the media information streams arriving at the media server at the same moment are inconsistent. That is, all media information streams need to be synchronized regardless of the order in which they arrive at the media server. To avoid omission or mismatch of media information streams, at least the first received media information packet must be found as the starting point. Therefore, the first received media information packet is found among all media information packets and used as the target media information packet, and its first display timestamp is obtained, so that in subsequent steps the display timestamps of all media information packets can be aligned with the first display timestamp of the target media information packet, solving the problem of inconsistent arrival of the media information streams at the media server.
  • The display timestamps of all media information packets are collected and compared to obtain the first display timestamp of the target media information packet.
  • Step S130 Use the first display timestamp as the starting display timestamp of each media information stream.
  • The display timestamps of the media information streams can be synchronized using the starting display timestamp, so that the display timestamps of all media information streams are consistent. This solves the problem of inconsistent arrival of the media information streams at the media server, and allows the information of each media information stream to be fragmented and aggregated based on the starting display timestamp in subsequent steps.
  • Figure 2a is a schematic diagram of multiple media information streams provided by an embodiment of the present application before alignment
  • Figure 2b is a schematic diagram of multiple media information streams provided by an embodiment of the present application after alignment.
  • A schematic diagram of the media information streams corresponding to three camera positions is given; the media information stream of each camera position includes multiple consecutive fragments.
  • Each media information packet in the streaming cache queue is traversed to determine whether the current media information packet is the first media information packet received. If it is, the startpts of the first fragment of all camera positions is forcibly set, where startpts is the first presentation timestamp (Presentation Time Stamp, PTS) of the current fragment, that is, the PTS of the first received media information packet (i.e., the current media information packet) of the current fragment. Otherwise, the media information packet is stored in the linked list of the corresponding camera position, and the above judgment process is repeated for the next media information packet until the required first media information packet (i.e., the current media information packet) is found.
  • Figure 2a shows a schematic diagram of the media information stream of each camera position without modifying the starting display timestamp. The numbers in the boxes represent the PTS of the current media information packet; the fragment duration is 6 s. The PTS range of the first fragment of camera position 1 is [0, 540000) with startpts 0; the PTS range of the first fragment of camera position 2 is [7200, 547200) with startpts 7200; and the PTS range of the first fragment of camera position 3 is [3600, 543600) with startpts 3600. Since the PTS ranges of the starting fragments of the camera positions are inconsistent, a large jump in the picture occurs when the terminal switches between camera positions.
  • Assume the first media information packet received (i.e., the first packet in the stream buffer queue) is a media information packet of camera position 2, and the fragment duration is 6 s. After alignment, the first-fragment PTS range of camera position 1 is [0, 547200) with startpts 7200; the first-fragment PTS range of camera position 2 is [7200, 547200) with startpts 7200; and the first-fragment PTS range of camera position 3 is [3600, 547200) with startpts 7200. The startpts of the second fragment of every camera position is therefore 547200. Because the startpts of the second fragment of each camera position is the same and the fragment duration is also the same, the fragments of every camera position are aligned from the second fragment onward, and fragments arriving at the media server at the same moment are consistent, which resolves the defect of inconsistent arrival of the media information streams at the media server.
  • Step S140 Perform information fragmentation on each media information stream according to the start display timestamp to obtain multiple media fragmentation information for each media information stream.
  • Each piece of media fragmentation information corresponds to a fragmentation sequence number, and all media fragmentation information with the same fragmentation sequence number has the same media duration.
  • On this basis, each media information stream can be further fragmented according to the starting display timestamp to obtain multiple pieces of media fragmentation information, each distinguished by its fragmentation sequence number. For different media information streams, the media fragmentation information of the same time period can therefore be identified by comparing fragmentation sequence numbers, so that in subsequent steps the pieces of media fragmentation information of the same time period can be aggregated into a complete free-viewpoint fragment.
  • Step S140 includes but is not limited to steps S141 and S142.
  • Step S141 For each media information stream, obtain the second display timestamp of the currently received media information packet.
  • Step S142 When it is determined, based on the second display timestamp and the starting display timestamp, that the information fragmentation condition is met, initial information fragmentation is performed based on the currently received media information packet, the second display timestamp is used as the new starting display timestamp, and subsequent information fragmentation is performed based on the new starting display timestamp.
  • The second display timestamp of the currently received media information packet is obtained and compared with the aligned starting display timestamp to determine whether the information fragmentation condition is met. If so, initial information fragmentation is performed based on the currently received media information packet, and the qualifying second display timestamp is used as the new starting display timestamp for subsequent information fragmentation. In this way, complete media fragmentation information is obtained for each stream, and in subsequent steps the pieces of media fragmentation information of the same time period can be aggregated into a complete free-viewpoint fragment.
  • the information fragmentation conditions can be set accordingly according to specific scenarios, which are not limited here.
  • The information fragmentation condition may include, but is not limited to: the ratio of the difference between the second display timestamp and the starting display timestamp to a preset time base is greater than or equal to the preset fragment duration, where the preset time base may be, but is not limited to, the time base of the corresponding media information stream.
  • The media duration of the fragments may be, but is not limited to, set to the preset fragment duration.
  • The difference between the two display timestamps measures how far the second display timestamp has advanced past the starting display timestamp, that is, whether the second display timestamp is large enough to trigger subsequent information fragmentation. When the ratio of the difference between the second display timestamp and the starting display timestamp to the preset time base is less than the preset fragment duration, it can be determined that the currently received media information packet does not need to be fragmented.
  • The display timestamps of all media information packets are collected and compared to obtain the second display timestamp of the currently received media information packet.
  • The next information fragmentation can then be continued according to step S142. That is, once the duration of subsequent fragments is determined, the next starting display timestamp can be computed from the fragment duration, the previous starting display timestamp, and the preset time base, so that subsequent information fragmentation can be performed based on the next starting display timestamp.
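As a non-limiting sketch of the fragmentation condition and the rolling starting display timestamp just described, with illustrative names and an assumed 90 kHz time base:

```python
def should_cut(pts, start_pts, time_base=90_000, frag_seconds=6):
    """Fragmentation condition: the ratio of (second display timestamp -
    starting display timestamp) to the time base reaches the preset
    fragment duration."""
    return (pts - start_pts) / time_base >= frag_seconds

def next_start_pts(prev_start_pts, frag_seconds=6, time_base=90_000):
    """Next starting display timestamp, computed from the fragment duration,
    the previous starting display timestamp, and the preset time base."""
    return prev_start_pts + frag_seconds * time_base
```

Note that in step S142 the qualifying second display timestamp itself becomes the new starting display timestamp for the initial fragment; `next_start_pts` corresponds to the subsequent fragments, whose boundaries advance by one fragment duration each time.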
  • one embodiment of the present application further explains the steps before steps S141 to S142, including but not limited to steps S160 to S180.
  • Step S160 Detect whether there is a first target media information flow, where the first target media information flow is a media information flow that satisfies the interruption recovery condition.
  • Step S170 When the first target media information stream is detected, obtain the differences between the second display timestamp corresponding to the first target media information stream and the starting display timestamps corresponding to multiple second target media information streams, wherein a second target media information stream is a media information stream that does not meet the interruption recovery condition.
  • Step S180 Update the starting display timestamp and fragmentation sequence number of the first target media information stream to the starting display timestamp and fragmentation sequence number of the second target media information stream corresponding to the target difference, where the target difference is the smallest of all the differences.
  • Since interruption recovery affects the subsequent information fragmentation of media information packets, step S160 determines the interruption-recovery status by detecting whether there is a first target media information stream that satisfies the interruption recovery condition. When such a stream is detected, the differences between its second display timestamp and the starting display timestamps of all second target media information streams (those that do not meet the interruption recovery condition) are obtained; that is, the display timestamps of the recovered stream are compared against those of all streams that were not interrupted.
  • The starting display timestamp and fragmentation sequence number of the second target media information stream corresponding to the target difference are then selected from all second target media information streams as the updated starting display timestamp and fragmentation sequence number of the first target media information stream. Since the target difference is the smallest of all the differences, the first target media information stream is updated to the starting display timestamp and fragmentation sequence number of its nearest-neighbor media information stream. This reduces the difficulty of subsequent information fragmentation, that is, information is fragmented as few times as possible, which reduces network bandwidth requirements.
  • The interruption recovery condition can be set according to specific scenarios and is not limited here.
  • The interruption recovery condition may include, but is not limited to: the ratio of the difference between the second display timestamp and the display timestamp of the last received media information packet to the preset time base is greater than a preset timeout period, where the preset time base may be, but is not limited to, the time base of the corresponding media information stream.
  • The difference between the second display timestamp and the display timestamp of the most recently received media information packet is used to better determine the actual degree of timeout of the second display timestamp. It can be understood that when the ratio of this difference to the preset time base is less than or equal to the preset timeout period, it can be determined that interruption recovery does not need to be performed for the currently received media information packet.
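The interruption-recovery logic of steps S160 to S180 can be sketched as follows. All names are illustrative; the 10 s timeout and 90 kHz time base are assumed values, since the patent leaves both configurable:

```python
def interrupted(pts, last_pts, time_base=90_000, timeout_seconds=10):
    """Interruption recovery condition: the gap between the current packet's
    display timestamp and the last received packet's display timestamp,
    divided by the time base, exceeds the preset timeout."""
    return (pts - last_pts) / time_base > timeout_seconds

def resync(recovered_pts, healthy_streams):
    """healthy_streams: dict cam_id -> (start_pts, frag_seq) for streams that
    did NOT meet the interruption recovery condition. Returns the
    (start_pts, frag_seq) pair of the stream whose starting display
    timestamp is nearest to the recovered stream's current PTS (the
    smallest of all differences), per step S180."""
    return min(healthy_streams.values(),
               key=lambda sf: abs(recovered_pts - sf[0]))
```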
  • Step S150 Aggregate the target media fragmentation information in all media information streams to obtain free-view media fragmentation information, where the target media fragmentation information is media fragmentation information with the same fragmentation sequence number.
  • The first display timestamp of the target media information packet is uniformly set as the starting display timestamp of each media information stream to solve the problem of inconsistent pictures arriving at the media server at the same moment. Information fragmentation is then performed on each media information stream according to the starting display timestamp to obtain multiple pieces of media fragmentation information, and the media fragmentation information with the same fragmentation sequence number across all media information streams is aggregated to obtain complete free-viewpoint media fragmentation information, thereby ensuring image quality while avoiding large spatial jumps in the video picture when the user switches viewpoints. Therefore, the embodiments of the present application enable seamless free-viewpoint switching, improve the user's video experience, and fill a technical gap in related methods.
  • the target media fragmentation information may be, but is not limited to, media fragmentation information with a fragmentation sequence number other than 1.
  • The display timestamp of the first piece of media fragmentation information in the media information stream of each camera position is modified to the first display timestamp of the first received media information packet. Since the first piece of media fragmentation information of each camera position (that is, the piece with fragmentation sequence number 1) has been modified, aggregation can start from the media fragmentation information with fragmentation sequence number 2, in order to obtain reliable and stable free-viewpoint media fragmentation information.
  • The media information streams are fragmented according to the corresponding display timestamps and aggregated based on the target media fragmentation information to obtain the final free-viewpoint media fragmentation information, which greatly reduces network bandwidth requirements and is therefore more suitable for users. Moreover, the media fragmentation splicing method of this example does not need to consider the actual resolution of each camera, that is, it does not need to reduce each camera's resolution to adapt to the user's playback resolution, so it can further improve the user experience.
  • step S150 which includes but is not limited to steps S151 to S153.
  • Step S151: Traverse the target media fragment information in each media information stream in sequence.
  • Step S152: Determine whether the current target media fragment information is the first media fragment information after interruption recovery.
  • Step S153: If the current target media fragment information is not the first media fragment information after interruption recovery, aggregate the current target media fragment information.
  • In these steps, the target media fragment information in each media information stream is traversed to determine whether the current fragment is the first one after interruption recovery. Like the first fragment of each camera position whose display timestamp has been modified, the first fragment after interruption recovery is not well suited to aggregation; therefore the current target media fragment information is aggregated only when it is not the first fragment after interruption recovery, so as to obtain reliable and stable free-viewpoint media fragment information. In other words, in the interruption-recovery case, aggregation resumes no earlier than the second media fragment after recovery, which yields better free-viewpoint media fragment information.
  • Step S150 is further described below.
  • Step S150 also includes, but is not limited to, step S154.
  • Step S154: If the current target media fragment information is the first media fragment information after interruption recovery, do not aggregate it.
  • Because the first media fragment after interruption recovery is, like the first fragment of each position with a modified display timestamp, unsuitable for aggregation, it is skipped when detected, so that it does not disturb the overall aggregation of the free-viewpoint media fragment information. That is, in the interruption-recovery case, aggregation resumes no earlier than the second fragment after recovery, which yields a better result.
  • Figure 6 is a schematic diagram of a media server for executing a media information processing method provided by an embodiment of the present application.
  • The media server may include, but is not limited to, a stream collection module, an alignment module, and a splicing module, where:
  • the stream collection module pulls the media streams from each camera position of the free-viewpoint front end (i.e., the camera 1 media stream, camera 2 media stream, camera 3 media stream ... camera n media stream shown in Figure 6) and adds them to the receive buffer queue;
  • the alignment module takes the media streams out of the receive buffer queue, aligns them, and then fragments them;
  • the splicing module aggregates the fragments of all camera positions that share the same fragment sequence number into one complete free-viewpoint fragment.
  • Through the cooperation of these modules, the user can switch seamlessly between free viewpoints, improving the user's video experience and filling a technical gap in related methods.
  • Figure 7 is a flow chart of a media information processing method performed by an alignment module provided by an embodiment of the present application.
  • the alignment module may, but is not limited to, perform the following steps.
  • Step a: Traverse each media information packet in the receive buffer queue and determine whether the current packet is the first media information packet received. If so, force the startpts of the first fragment of all camera positions to the PTS of the first received packet (i.e., the current packet) and then go to step b; otherwise go to step b directly without further processing.
  • Step b: Store the media information packets in the linked lists of their corresponding camera positions.
  • Step c: Use the condition (curpts - lastpts) / timebase > overtime to determine whether the stream of this position has been interrupted, where curpts is the PTS of the position's current media information packet, lastpts is the PTS of the position's previous media packet, timebase is the time base of the media stream, and overtime is the preset timeout.
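The timeout check in step c can be sketched as a single comparison. This is an illustrative reading of the formula, not code from the patent; the 90 kHz time base and 2 s timeout are assumed values.

```python
def stream_interrupted(curpts: int, lastpts: int, timebase: int, overtime: float) -> bool:
    """True when the PTS gap between consecutive packets, in seconds, exceeds the timeout."""
    return (curpts - lastpts) / timebase > overtime

# A 540000-tick gap at a 90 kHz time base is 6 s, well past a 2 s timeout:
print(stream_interrupted(curpts=1083600, lastpts=543600, timebase=90000, overtime=2.0))  # True
```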
  • Step d: Compute the difference diffpts between curpts and the startpts of each other normally-flowing position, find the startpts and segno of the position corresponding to the smallest diffpts (segno is the fragment sequence number, increasing from 1), and set them as the corresponding values of the position recovering from interruption. Specifically, refer to Figure 8, a schematic diagram of multiple media information streams provided by another embodiment of the present application, where the numbers in the boxes are the PTS values of the current media information packets. As shown, when the startpts of position 1 is 0 its segno is 1, when the startpts of position 2 is 540000 its segno is 2, and when the startpts of position 3 is 108000 its segno is 3, and position 1 experiences a stream interruption.
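Step d can be read as a nearest-neighbor lookup over the normally-flowing positions. The sketch below uses the figure's numbers; the dictionary layout is an assumption made for illustration.

```python
def recover_position(curpts: int, normal_positions: dict) -> tuple:
    """normal_positions maps position id -> (startpts, segno).
    Returns the (startpts, segno) of the position whose startpts is nearest curpts."""
    _, nearest = min(
        normal_positions.items(),
        key=lambda item: abs(curpts - item[1][0]),  # diffpts = |curpts - startpts|
    )
    return nearest

# Position 1 resumes at PTS 1083600; positions 2 and 3 flow normally.
# diffpts is 543600 against position 2 and 3600 against position 3,
# so position 1 inherits position 3's (startpts, segno):
print(recover_position(1083600, {2: (540000, 2), 3: (1080000, 3)}))  # (1080000, 3)
```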
  • Example 4: The following describes in detail the working principle and process of the splicing module in Example 2.
  • Figure 9 is a flow chart of a media information processing method performed by a splicing module provided by an embodiment of the present application.
  • the splicing module may, but is not limited to, perform the following steps.
  • Step a: Scan the fragment information and determine whether the fragment sequence number n to be aggregated is 1. If not, go to step b; if so, increment the sequence number by 1 and repeat step a. Because the durations of the first fragments of the camera positions are inconsistent after forced alignment, the first fragment of each position is not aggregated.
  • Step b: Traverse the fragments with the same sequence number of each camera position in turn, i.e., the fragments with sequence number n, and determine whether the fragment is the first fragment after interruption recovery. If so, repeat step b; otherwise go to step c.
  • Step c: Aggregate the position's fragment into the free-viewpoint media fragment information and determine whether the fragments with sequence number n of all positions have been scanned. If so, increment the sequence number by 1 and go to step a; otherwise go to step b.
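Steps a to c amount to a grouping loop over fragment sequence numbers that skips sequence number 1 and any fragment flagged as the first after interruption recovery. The sketch below is one possible reading; the in-memory fragment layout and the `first_after_recovery` flag are assumptions for illustration, not structures defined by the patent.

```python
def splice(streams: dict, max_segno: int) -> dict:
    """streams maps position id -> {segno: fragment dict}; a fragment dict may
    carry 'first_after_recovery': True. Returns segno -> list of (position, fragment)."""
    free_view = {}
    for segno in range(2, max_segno + 1):      # step a: never aggregate segno 1
        group = []
        for pos, frags in streams.items():     # step b: visit each position in turn
            frag = frags.get(segno)
            if frag is None or frag.get("first_after_recovery"):
                continue                       # skip the first fragment after recovery
            group.append((pos, frag))          # step c: aggregate into the free-view slice
        free_view[segno] = group
    return free_view

streams = {
    1: {1: {}, 2: {"first_after_recovery": True}, 3: {}},
    2: {1: {}, 2: {}, 3: {}},
}
result = splice(streams, 3)
print(sorted(result))                          # [2, 3]: segno 1 is never aggregated
print([pos for pos, _ in result[2]])           # [2]: position 1's recovery fragment skipped
```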
  • In summary, the embodiments of the present application force the first PTS of the starting fragment of all camera positions during the initialization phase; during operation, slicing is performed according to the fragment duration and the fragment sequence number is incremented. When a position is detected to have recovered from a stream interruption, the first PTS and fragment sequence number of its current fragment are recalculated, and then all position information for the same time period is aggregated by fragment sequence number into one complete free-viewpoint fragment, from which the user selects a viewing angle for playback. This solves the problem that the streams of the camera positions for the same moment arrive at the media server at different times, avoids large spatial jumps in the video picture when the user switches viewing angles, guarantees image quality, and reduces the bandwidth and performance requirements of the terminal device, enabling seamless switching between free viewpoints and improving the user's video experience.
  • The methods of the embodiments of the present application can be widely applied to panoramic video generation in VR, virtual-viewpoint scenarios, and the like.
  • One embodiment of the present application also discloses a media information processing apparatus 100, including: at least one processor 110; and at least one memory 120 for storing at least one program; when the at least one program is executed by the at least one processor 110, the media information processing method of any of the preceding embodiments is implemented.
  • an embodiment of the present application also discloses a computer-readable storage medium in which computer-executable instructions are stored, and the computer-executable instructions are used to execute the media information processing method as in any of the previous embodiments.
  • An embodiment of the present application also discloses a computer program product, which includes a computer program or computer instructions stored in a computer-readable storage medium; the processor of a computer device reads the computer program or computer instructions from the computer-readable storage medium and executes them, causing the computer device to perform the media information processing method of any of the preceding embodiments.
  • Computer storage media include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disk (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and that can be accessed by a computer.
  • Communication media typically embody computer-readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transport mechanism, and may include any information delivery media.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Computer Security & Cryptography (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

This application discloses a media information processing method, an apparatus therefor, and a storage medium. A media information processing method includes: receiving multiple media information streams; obtaining the first display timestamp of a received target media information packet; using the first display timestamp as the starting display timestamp of each media information stream; fragmenting each media information stream according to the starting display timestamp to obtain multiple pieces of media fragment information, where each piece of media fragment information has a fragment sequence number and all media fragment information with the same fragment sequence number has the same media duration; and aggregating all target media fragment information to obtain free-viewpoint media fragment information, where the target media fragment information is media fragment information with the same fragment sequence number.

Description

Media information processing method, apparatus therefor, and storage medium
Cross-reference to related applications
This application is based on, and claims priority to, Chinese patent application No. 202210642307.6 filed on June 8, 2022, the entire contents of which are incorporated herein by reference.
Technical field
This application relates to the field of video technology, and in particular to a media information processing method, an apparatus therefor, and a computer storage medium.
Background
With the rapid development of 5G and high-speed Internet, the metaverse and the fully immersive Internet are arriving quickly, and immersive media applications are developing rapidly. Current free-viewpoint technology lets viewers freely choose any 360-degree viewing angle at any moment, enhancing the sense of immersion, and users can switch viewing angles freely while watching a video. However, because the video streams of the different viewing angles of the same moment, captured by multiple camera positions, arrive at the media server with large time differences, good overall image quality cannot be guaranteed, which greatly degrades the user experience.
Summary
Embodiments of this application provide a media information processing method, an apparatus therefor, and a computer storage medium, which can improve the user's video experience.
In a first aspect, an embodiment of this application provides a media information processing method, including: receiving multiple media information streams, where each media information stream includes multiple media information packets; obtaining the first display timestamp of a received target media information packet, where the target media information packet is the first received packet among all the media information packets; using the first display timestamp as the starting display timestamp of each media information stream; fragmenting each media information stream according to the starting display timestamp to obtain multiple pieces of media fragment information for each stream, where each piece of media fragment information has a fragment sequence number and all media fragment information with the same fragment sequence number has the same media duration; and aggregating the target media fragment information of all the media information streams to obtain free-viewpoint media fragment information, where the target media fragment information is the media fragment information with the same fragment sequence number.
In a second aspect, an embodiment of this application further provides a media information processing apparatus, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor, when executing the computer program, implements the media information processing method described above.
In a third aspect, an embodiment of this application further provides a computer-readable storage medium storing computer-executable instructions, where the computer-executable instructions are used to perform the media information processing method described above.
In the embodiments of this application, the first display timestamp of the obtained target media information packet is uniformly set as the starting display timestamp of each media information stream, which resolves the defect that the pictures of the same moment in different media information streams arrive at the media server at different times. On this basis, each media information stream is fragmented according to the starting display timestamp to obtain multiple pieces of media fragment information, and the media fragment information with the same fragment sequence number in all the streams is aggregated to obtain complete free-viewpoint media fragment information. This guarantees image quality while avoiding large spatial jumps in the video picture when the user switches viewing angles; the embodiments of this application therefore enable seamless switching between free viewpoints, improve the user's video experience, and fill a technical gap in related methods.
Brief description of the drawings
Figure 1 is a flowchart of a media information processing method provided by an embodiment of this application;
Figure 2a is a schematic diagram of multiple media information streams before alignment, provided by an embodiment of this application;
Figure 2b is a schematic diagram of multiple media information streams after alignment, provided by an embodiment of this application;
Figure 3 is a flowchart of obtaining multiple pieces of media fragment information for each media information stream in a media information processing method provided by another embodiment of this application;
Figure 4 is a flowchart of the steps preceding obtaining multiple pieces of media fragment information for each media information stream in a media information processing method provided by an embodiment of this application;
Figure 5 is a flowchart of obtaining free-viewpoint media fragment information in a media information processing method provided by an embodiment of this application;
Figure 6 is a schematic diagram of a media server for performing a media information processing method provided by an embodiment of this application;
Figure 7 is a flowchart of an alignment module performing a media information processing method provided by an embodiment of this application;
Figure 8 is a schematic diagram of multiple media information streams provided by another embodiment of this application;
Figure 9 is a flowchart of a splicing module performing a media information processing method provided by an embodiment of this application;
Figure 10 is a schematic diagram of a media information processing apparatus provided by an embodiment of this application.
Detailed description
To make the purpose, technical solutions, and advantages of this application clearer, this application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only intended to explain this application and are not intended to limit it.
It should be noted that although a logical order is shown in the flowcharts, in some cases the steps shown or described may be performed in an order different from that in the flowcharts. The terms "first", "second", etc. in the description, claims, and drawings are used to distinguish similar objects and do not necessarily describe a particular order or sequence.
At present, to mitigate the large time differences with which the video streams of the viewing angles of the same moment, captured by multiple camera positions, arrive at the media server, the related art compresses the video streams of the camera positions, stitches them into one ultra-high-resolution picture, and then performs image correction. This places high demands on the user's network bandwidth, and as the number of camera positions grows, the resolution of each position must be lowered to fit the resolution of the user's player, which greatly degrades the user experience.
On this basis, this application provides a media information processing method, an apparatus therefor, a computer storage medium, and a computer program product. The media information processing method of one embodiment includes: receiving multiple media information streams, where each media information stream includes multiple media information packets; obtaining the first display timestamp of a received target media information packet, where the target media information packet is the first received packet among all the media information packets; using the first display timestamp as the starting display timestamp of each media information stream; fragmenting each media information stream according to the starting display timestamp to obtain multiple pieces of media fragment information for each stream, where each piece of media fragment information has a fragment sequence number and all media fragment information with the same fragment sequence number has the same media duration; and aggregating the target media fragment information of all the streams to obtain free-viewpoint media fragment information, where the target media fragment information is the media fragment information with the same fragment sequence number. In this embodiment, uniformly setting the first display timestamp of the obtained target media information packet as the starting display timestamp of each stream resolves the inconsistent arrival of same-moment pictures at the media server; on this basis, each stream is fragmented according to the starting display timestamp into multiple pieces of media fragment information, and the fragments with the same sequence number across all streams are aggregated into complete free-viewpoint media fragment information, guaranteeing image quality while avoiding large spatial jumps in the video picture during viewpoint switching. The embodiments of this application therefore enable seamless switching between free viewpoints, improve the user's video experience, and fill a technical gap in related methods.
Embodiments of this application are further described below with reference to the accompanying drawings.
As shown in Figure 1, Figure 1 is a flowchart of a media information processing method provided by an embodiment of this application; the method may include, but is not limited to, steps S110 to S150.
Step S110: receive multiple media information streams, where each media information stream includes multiple media information packets.
In this step, multiple media information streams are received so that in subsequent steps the received streams can be accurately distinguished from one another, and thus it can be determined which stream's media information packet was the first to be received.
In one embodiment, the executing entity of steps S110 to S150 and related steps may be chosen by those skilled in the art according to the specific situation and is not limited here. For example, a media server that centrally manages the media information streams may serve as the executing entity: the media server receives the multiple media information streams and performs the following steps S120 to S150 and related steps on them, and corresponding functional modules may be provided in the media server to perform the corresponding steps for better coordination; a stream collection module may therefore be provided in the media server to pull media information streams from the camera positions of the free-viewpoint front end and add them to a receive buffer queue in the stream collection module. As another example, another server, node, module, or device may be set up to centrally manage the media server, i.e., to process the multiple media information streams indirectly through it; that server, node, module, or device may then also serve as the executing entity of steps S110 to S150 and related steps. It should be noted that the following embodiments mainly describe the "media server" as the executing entity of steps S110 to S150 and related steps, but this is not the only possibility.
In one embodiment, the media server is an important device of next-generation networks. Under the control of a control device (for example, a softswitch or an application server), it provides the media resource functions needed to implement various services on an IP network, including service tones, conferencing, interactive response, announcements, unified messaging, and advanced voice services. In the application server, MSML (Media Server Markup Language) may be used, but is not limited to being used, to send commands such as announcement playback to the media server. The media server is highly tailorable and can flexibly implement one or more functions, including but not limited to:
Dual-Tone Multi-Frequency (DTMF) signal collection and decoding: receiving DTMF signals from a DTMF telephone according to the relevant operation parameters sent by the control device, and encapsulating them in signaling to the control device;
Recorded-announcement sending: playing the specified recorded announcements to users with the specified voice, as required by the control device;
Conferencing: audio mixing of multiple RTP streams, including mixing of different encoding formats;
Conversion between codec algorithms: supporting G.711, G.723, G.729, and other speech codecs, and converting between them;
Automatic speech synthesis: concatenating several speech elements or fields into one complete voice prompt, which may be fixed or variable;
Dynamic voice playback/recording: including music on hold, Follow-me voice services, etc.;
Tone generation and sending: providing basic tones such as dial tone, busy tone, ringback tone, waiting tone, and vacant-number tone;
Resource maintenance and management: maintaining and managing media resources and the device itself locally and/or remotely, such as data configuration and fault management.
The media server has at least one of the following characteristics:
Advanced: it can adopt ITU-T H.248 and SIP standard protocols;
Compatible: it can conveniently interoperate with softswitch systems from different vendors;
Highly reliable: the gateway provides dual power supplies and supports hot swapping; it is positioned as carrier-grade equipment with system congestion protection;
Easy to maintain: it communicates with SNMP network management and supports online system maintenance, resource management, post-hoc analysis, etc.;
Highly scalable and upgradable: an independent application layer can customize value-added services for users, and the system can be updated online to meet user needs to the greatest extent;
Flexible: flexible networking and strong integrated access capability provide users with a variety of solutions.
In one embodiment, the way each camera position's media information stream is received is not limited; that is, the streams of different positions may be received in the same way, or in ways chosen for the specific setup, for example by pulling the stream of a selected position in the scene via the Real Time Messaging Protocol (RTMP). In other words, the embodiments of this application only need to ensure that multiple media information streams can be received; the specific receiving method is not limited here, and because the transport method of the streams is not restricted, the method is equally applicable to scenarios where streams are pulled in other ways.
In one embodiment, the reception timing and number of the media information streams, and of the media information packets in each stream, may be left unrestricted and set according to the specific scenario. For example, a venue may typically have more than 50 camera positions and therefore more than 50 media information streams to receive; since users enter the venue to watch only at a specific time, the sending or playback time of the selected streams can be set near that time so that users can watch the video at the specific time.
Step S120: obtain the first display timestamp of a received target media information packet, where the target media information packet is the first received packet among all the media information packets.
In this step, the defect that the pictures of the same moment of the different streams arrive at the media server at different times must be resolved; that is, all the streams must be synchronized regardless of the order in which they reach the media server. To avoid missing or mismatching streams, at least the first received media information packet must be found as the starting point for improvement. Therefore, the first received packet among all packets is found and taken as the target media information packet, and its first display timestamp is obtained, so that in subsequent steps the display timestamps of all packets can be aligned with the first display timestamp of the target packet, resolving the inconsistent arrival of the streams at the media server.
In one embodiment, the first display timestamp of the received target media information packet can be obtained in multiple ways, which are not limited here. For example, the display timestamps of all media information packets may be collected and compared, and the first display timestamp of the target packet obtained from them.
Step S130: use the first display timestamp as the starting display timestamp of each media information stream.
In this step, using the first display timestamp as the starting display timestamp of each stream synchronizes the display timestamps of all the streams to the starting display timestamp, so they remain consistent. This resolves the inconsistent arrival of the streams at the media server and allows the streams to be fragmented and aggregated according to the starting display timestamp in subsequent steps.
A specific example is given below to illustrate the working principle and process of the above embodiments.
Example 1:
As shown in Figures 2a and 2b, Figure 2a is a schematic diagram of multiple media information streams before alignment provided by an embodiment of this application, and Figure 2b is a schematic diagram of the streams after alignment. As an example, the streams of three camera positions are shown, and each position's stream includes multiple repeated fragments.
Taking the media server as an example, with all media information packets placed in the receive buffer queue, each packet in the queue is traversed to determine whether the current packet is the first packet received. If so, the startpts of the first fragment of all camera positions is forcibly set, where startpts is the first Presentation Time Stamp (PTS) of the current fragment, i.e., the PTS of the first received packet of the current fragment (the current packet); otherwise the packet is stored in the linked list of its camera position, and the judgment is repeated for the next packet until the required first packet is found.
As shown in Figure 2a, which shows the streams of the camera positions with the starting display timestamp unmodified, the numbers in the boxes are the PTS of the current media information packet. The fragment duration is 6 s; the PTS range of the first fragment of position 1 is [0, 540000) with startpts 0, that of position 2 is [7200, 547200) with startpts 7200, and that of position 3 is [3600, 543600) with startpts 3600. Because the PTS ranges of the starting fragments of the positions are inconsistent, the terminal exhibits large spatial jumps in the picture when switching between positions.
As shown in Figure 2b, which shows the streams with the starting display timestamp modified, take the case where position 2's packet is received first (i.e., the first packet in the receive buffer queue belongs to position 2) as an example, with a fragment duration of 6 s. Compared with the positions' original streams, the PTS range of the first fragment of position 1 becomes [0, 547200) with startpts 7200, that of position 2 is [7200, 547200) with startpts 7200, and that of position 3 becomes [3600, 547200) with startpts 7200. The startpts of the second fragment of every position is then 547200; that is, because the second fragments of all positions have the same startpts and the fragment durations are the same, from the second fragment onward the subsequent fragments of all positions are guaranteed to be aligned with one another, so they arrive at the media server consistently at the same moment, resolving the inconsistent arrival of the streams at the media server.
Step S140: fragment each media information stream according to the starting display timestamp to obtain multiple pieces of media fragment information for each stream, where each piece of media fragment information has a fragment sequence number, and all media fragment information with the same fragment sequence number has the same media duration.
In this step, since the starting display timestamp of each stream was determined in step S130, each stream can be further fragmented according to it to obtain multiple pieces of media fragment information, distinguished by fragment sequence numbers, where all fragments with the same sequence number have the same media duration. For different streams, comparing their fragment sequence numbers therefore identifies the media fragment information of the same time period, so that in subsequent steps the fragments of the same time period can be aggregated into one complete free-viewpoint fragment.
As shown in Figure 3, one embodiment of this application further describes step S140; step S140 includes, but is not limited to, steps S141 and S142.
Step S141: for each media information stream, obtain the second display timestamp of the currently received media information packet.
Step S142: when the information fragmentation condition is determined to be satisfied based on the second display timestamp and the starting display timestamp, perform initial information fragmentation based on the currently received packet, use the second display timestamp as the new starting display timestamp, and perform subsequent information fragmentation based on the new starting display timestamp.
In these steps, the second display timestamp of the currently received packet is obtained so it can be compared with the aligned starting display timestamp to determine whether the fragmentation condition is satisfied. If it is, initial fragmentation can be performed based on the current packet, and the qualifying second display timestamp is used as the new starting display timestamp for subsequent fragmentation, so that the complete media fragment information of the current packet is obtained, allowing the fragments of the same time period to be aggregated into one complete free-viewpoint fragment in subsequent steps.
In one embodiment, the fragmentation condition may be set according to the specific scenario and is not limited here. For example, the fragmentation condition may include, but is not limited to: the ratio of the difference between the second display timestamp and the starting display timestamp to a preset time base is greater than or equal to a preset fragment duration, where the preset time base may be, but is not limited to, the time base of the corresponding media information stream. When all media information packets have the same duration, the packet duration may be, but is not limited to being, set as the preset fragment duration. The difference between the two display timestamps measures how far the second display timestamp is from the starting display timestamp, i.e., whether the second display timestamp is large enough to warrant further fragmentation; when the ratio of the difference to the preset time base is smaller than the preset fragment duration, it can be determined that the currently received packet does not need to be fragmented.
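The fragmentation condition described above reduces to a single comparison. The sketch below is an illustrative reading, with a 90 kHz time base and a 6 s fragment duration assumed to match Example 1's numbers.

```python
def should_cut(curpts: int, startpts: int, timebase: int, min_seg_duration: float) -> bool:
    """True when the elapsed time since the fragment's startpts reaches the preset duration."""
    return (curpts - startpts) / timebase >= min_seg_duration

# With startpts 7200 and a 6 s fragment, the cut happens once the PTS reaches 547200:
print(should_cut(curpts=547200, startpts=7200, timebase=90000, min_seg_duration=6.0))  # True
print(should_cut(curpts=543600, startpts=7200, timebase=90000, min_seg_duration=6.0))  # False
```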
In one embodiment, the second display timestamp of the currently received packet can be obtained in multiple ways, which are not limited here; for example, the display timestamps of all media information packets may be collected and compared, and the second display timestamp of the current packet obtained from them.
In one embodiment, after subsequent fragmentation is performed based on the new starting display timestamp, further fragmentation may continue in the manner of step S142; that is, when the duration of the subsequent fragment is known, the next starting display timestamp can be determined from that duration, the previous starting display timestamp, and the preset time base, so that later fragmentation can proceed based on the next starting display timestamp.
As shown in Figure 4, one embodiment of this application further describes the steps preceding steps S141 and S142, which also include, but are not limited to, steps S160 to S180.
Step S160: detect whether a first target media information stream exists, where the first target media information stream is a media information stream satisfying the interruption-recovery condition.
Step S170: when a first target media information stream is detected, obtain the differences between the second display timestamp of the first target media information stream and the starting display timestamps of multiple second target media information streams, where a second target media information stream is a media information stream that does not satisfy the interruption-recovery condition.
Step S180: update the starting display timestamp and fragment sequence number of the first target media information stream to the starting display timestamp and fragment sequence number of the second target media information stream corresponding to the target difference, where the target difference is the smallest of all the differences.
In these steps, because interruption recovery affects the subsequent fragmentation of the media information packets, step S160 detects whether a first target media information stream satisfying the interruption-recovery condition exists, so as to further judge the recovery situation. When one is detected, the differences between its second display timestamp and the starting display timestamps of the second target media information streams are obtained, i.e., the display-timestamp differences between the stream satisfying the interruption-recovery condition and all streams that do not are considered. The starting display timestamp and fragment sequence number of the second target stream corresponding to the target difference are selected from all second target streams as the basis for updating those of the first target stream; since the target difference is the smallest of all the differences, the first target stream's starting display timestamp and fragment sequence number are updated to those of the nearest-neighbor stream. This reduces the difficulty of subsequent fragmentation, i.e., fragments as few times as possible, which lowers the network bandwidth requirement.
In one embodiment, the interruption-recovery condition may be set according to the specific scenario and is not limited here. For example, it may include, but is not limited to: the ratio of the difference between the second display timestamp and the display timestamp of the previously received packet to the preset time base is greater than a preset timeout, where the preset time base may be, but is not limited to, the time base of the corresponding stream. Comparing the second display timestamp with the display timestamp of the previously received packet measures the gap between the current packet and the most recently received packet, which characterizes the actual timeout of the second display timestamp. Understandably, when that ratio is less than or equal to the preset timeout, it can be determined that the currently received packet does not require interruption recovery.
Step S150: aggregate the target media fragment information of all the media information streams to obtain free-viewpoint media fragment information, where the target media fragment information is the media fragment information with the same fragment sequence number.
In this step, setting the first display timestamp of the obtained target media information packet uniformly as the starting display timestamp of each stream resolves the inconsistent arrival of same-moment pictures at the media server; on this basis, each stream is fragmented according to the starting display timestamp into multiple pieces of media fragment information, and the fragments with the same sequence number across all streams are aggregated into complete free-viewpoint media fragment information. This guarantees image quality while avoiding large spatial jumps in the video picture during viewpoint switching; the embodiments of this application therefore enable seamless switching between free viewpoints, improve the user's video experience, and fill a technical gap in related methods.
In one embodiment, the target media fragment information may be, but is not limited to, the media fragment information whose fragment sequence number is not 1. As the examples of Figures 2a and 2b show, the display timestamp of the first fragment of each position's stream is modified to the first display timestamp of the first received packet; in that case the durations of the first fragments of the positions (i.e., the fragments with sequence number 1) do not correspond, so aggregating the fragments with sequence number 1 directly would be mismatched. Aggregation can therefore start from the fragments with sequence number 2, so as to obtain reliable and stable free-viewpoint media fragment information.
In one embodiment, unlike the related art, there is no need to compress the video streams of the positions, stitch them into one ultra-high-resolution picture, and then perform image correction; instead, the streams are fragmented according to the corresponding display timestamps and aggregated based on the target media fragment information in the streams to obtain the final free-viewpoint media fragment information. This greatly lowers the network bandwidth requirement and is more practical for users. Moreover, the fragment-splicing approach of the embodiments of this application need not consider the actual effect of each position's resolution, i.e., the positions' resolutions need not be lowered to fit the resolution of the user's player, which further improves the user experience.
As shown in Figure 5, one embodiment of this application further describes step S150; step S150 includes, but is not limited to, steps S151 to S153.
Step S151: traverse the target media fragment information in each media information stream in sequence.
Step S152: determine whether the current target media fragment information is the first media fragment information after interruption recovery.
Step S153: if the current target media fragment information is not the first media fragment information after interruption recovery, aggregate it.
In these steps, the target media fragment information of each stream is traversed to determine whether the current fragment is the first after interruption recovery. As discussed in the above embodiments, the first fragment after interruption recovery, like the first fragment of each position whose display timestamp was modified, is not well suited to aggregation; therefore the current fragment is aggregated only when it is not the first after interruption recovery, so as to obtain reliable and stable free-viewpoint media fragment information. In other words, in the interruption-recovery case, aggregation resumes no earlier than the second fragment after recovery, which yields a better result.
In one embodiment of this application, building on steps S151 to S153, step S150 is further described; step S150 also includes, but is not limited to, step S154.
Step S154: if the current target media fragment information is the first media fragment information after interruption recovery, do not aggregate it.
In this step, because the first fragment after interruption recovery, like the first fragment of each position with a modified display timestamp, is not well suited to aggregation, it is not aggregated when detected, lest it affect the overall aggregation of the free-viewpoint media fragment information. That is, in the interruption-recovery case, aggregation resumes no earlier than the second fragment after recovery, which yields a better result.
Several specific examples are given below to illustrate the working principles and processes of the above embodiments.
Example 2:
As shown in Figure 6, Figure 6 is a schematic diagram of a media server for performing a media information processing method provided by an embodiment of this application.
Referring to Figure 6, the media server may include, but is not limited to, a stream collection module, an alignment module, and a splicing module, where:
the stream collection module pulls the media streams from the camera positions of the free-viewpoint front end (i.e., the camera 1 media stream, camera 2 media stream, camera 3 media stream ... camera n media stream shown in Figure 6) and adds them to the receive buffer queue;
the alignment module takes the media streams out of the receive buffer queue, aligns them, and then fragments them;
the splicing module aggregates the fragments of all positions with the same fragment sequence number into one complete free-viewpoint fragment.
From the above example, the cooperation of the stream collection, alignment, and splicing modules enables the user to switch seamlessly between free viewpoints, improving the video experience and filling a technical gap in related methods.
Example 3:
The working principle and process of the alignment module of Example 2 are described in detail below.
As shown in Figure 7, Figure 7 is a flowchart of the alignment module performing a media information processing method provided by an embodiment of this application.
Referring to Figure 7, the alignment module may perform, but is not limited to performing, the following steps.
Step a: traverse each media information packet in the receive buffer queue and determine whether the current packet is the first packet received. If so, force the startpts of the first fragment of all positions to the PTS of the first received packet (i.e., the current packet) and then go to step b; otherwise go to step b directly without any processing.
Step b: store the media information packets in the linked lists of their corresponding camera positions.
Step c: use the formula (curpts - lastpts) / timebase > overtime to determine whether this position has an interruption-recovery scenario; if so, go to step d, otherwise go to step e, where curpts is the PTS of the position's current media information packet, lastpts is the PTS of the position's previous media packet, timebase is the time base of the media stream, and overtime is the preset timeout.
Step d: compute the difference diffpts between curpts and the startpts of each other normal position, find the startpts and segno of the position corresponding to the smallest diffpts (segno is the fragment sequence number, increasing from 1), and set them as the corresponding values of the position recovering from interruption. Specifically, referring to Figure 8, a schematic diagram of multiple media information streams provided by another embodiment of this application, the numbers in the boxes are the PTS values of the current media information packets: when position 1's startpts is 0 its segno is 1, when position 2's startpts is 540000 its segno is 2, and when position 3's startpts is 108000 its segno is 3. When position 1 experiences an interruption, the PTS differences with the corresponding packets of positions 2 and 3 are computed: 1083600 - 540000 = 543600 with position 2 and 1083600 - 1080000 = 3600 with position 3. The startpts and segno of position 3 are therefore set as the corresponding values of position 1, so that position 1 stays aligned with position 3 after recovery; and when position 2's next packet with PTS = 1080000 arrives, position 2 switches to the next fragment and its startpts and segno also align with position 3, guaranteeing that position 1 is aligned with the other positions after interruption recovery.
Step e: use the formula (curpts - startpts) / timebase >= min_seg_duration to determine whether this position already satisfies the fragmentation condition. If so, fragment directly, name the fragment with segno, increment segno by 1, and go to step a; otherwise go to step a without any processing, where min_seg_duration is the preset fragment duration.
Example 4:
The working principle and process of the splicing module of Example 2 are described in detail below.
As shown in Figure 9, Figure 9 is a flowchart of the splicing module performing a media information processing method provided by an embodiment of this application.
Referring to Figure 9, the splicing module may perform, but is not limited to performing, the following steps.
Step a: scan the fragment information and determine whether the fragment sequence number n to be aggregated is 1. If not, go to step b; if so, increment the sequence number by 1 and repeat step a. Because the durations of the first fragments of the positions are inconsistent after forced alignment, the first fragment of each position is not aggregated.
Step b: traverse the fragments with the same sequence number of each position in turn, i.e., the fragments with sequence number n, and determine whether the fragment is the first fragment after interruption recovery. If so, repeat step b; otherwise go to step c.
Step c: aggregate this position's fragment into the free-viewpoint media fragment information and determine whether the fragments with sequence number n of all positions have been scanned. If so, increment the sequence number by 1 and go to step a; otherwise go to step b.
Combining the above examples, the embodiments of this application force the first PTS of the starting fragments of all positions during the initialization phase; during operation, slicing is performed according to the fragment duration and the fragment sequence number is incremented. When an interruption-recovery scenario is detected for a position, the first PTS and fragment sequence number of its current fragment are recalculated, and then all position information for the same time period is aggregated by fragment sequence number into one complete free-viewpoint fragment for the user to play from a chosen viewing angle. This solves the problem that the streams of the positions for the same moment arrive at the media server at different times, avoids large spatial jumps in the video picture during viewpoint switching, guarantees image quality, and lowers the bandwidth and performance requirements of the terminal device, enabling seamless switching between free viewpoints and improving the user's video experience.
The methods of the embodiments of this application can be widely applied to panoramic video generation in VR, virtual-viewpoint scenarios, and the like.
In addition, as shown in Figure 10, one embodiment of this application also discloses a media information processing apparatus 100, including: at least one processor 110; and at least one memory 120 for storing at least one program; when the at least one program is executed by the at least one processor 110, the media information processing method of any of the preceding embodiments is implemented.
One embodiment of this application also discloses a computer-readable storage medium storing computer-executable instructions, where the computer-executable instructions are used to perform the media information processing method of any of the preceding embodiments.
Furthermore, one embodiment of this application also discloses a computer program product, including a computer program or computer instructions stored in a computer-readable storage medium; the processor of a computer device reads the computer program or computer instructions from the computer-readable storage medium and executes them, causing the computer device to perform the media information processing method of any of the preceding embodiments.
Those of ordinary skill in the art can understand that all or some of the steps and systems in the methods disclosed above may be implemented as software, firmware, hardware, or appropriate combinations thereof. Some or all physical components may be implemented as software executed by a processor such as a central processing unit, digital signal processor, or microprocessor, or as hardware, or as an integrated circuit such as an application-specific integrated circuit. Such software may be distributed on computer-readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media). As is well known to those of ordinary skill in the art, the term computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storing information (such as computer-readable instructions, data structures, program modules, or other data). Computer storage media include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disk (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and that can be accessed by a computer. Furthermore, as is well known to those of ordinary skill in the art, communication media typically embody computer-readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transport mechanism, and may include any information delivery media.

Claims (10)

  1. A media information processing method, comprising:
    receiving multiple media information streams, wherein each of the media information streams comprises multiple media information packets;
    obtaining a first display timestamp of a received target media information packet, wherein the target media information packet is the first received packet among all the media information packets;
    using the first display timestamp as a starting display timestamp of each of the media information streams;
    fragmenting each of the media information streams according to the starting display timestamp to obtain multiple pieces of media fragment information for each of the media information streams, wherein each piece of the media fragment information has a fragment sequence number, and all the media fragment information with the same fragment sequence number has the same media duration;
    aggregating target media fragment information of all the media information streams to obtain free-viewpoint media fragment information, wherein the target media fragment information is the media fragment information with the same fragment sequence number.
  2. The media information processing method according to claim 1, wherein fragmenting each of the media information streams according to the starting display timestamp comprises:
    for each of the media information streams, obtaining a second display timestamp of the currently received media information packet; when an information fragmentation condition is determined to be satisfied based on the second display timestamp and the starting display timestamp, performing initial information fragmentation based on the currently received media information packet, using the second display timestamp as a new starting display timestamp, and performing subsequent information fragmentation based on the new starting display timestamp.
  3. The media information processing method according to claim 2, wherein the information fragmentation condition comprises:
    a ratio of a difference between the second display timestamp and the starting display timestamp to a preset time base is greater than or equal to a preset fragment duration.
  4. The media information processing method according to claim 2, wherein before fragmenting each of the media information streams according to the starting display timestamp, the media information processing method further comprises:
    detecting whether a first target media information stream exists, wherein the first target media information stream is a media information stream satisfying an interruption-recovery condition;
    when the first target media information stream is detected, obtaining differences between the second display timestamp corresponding to the first target media information stream and the starting display timestamps corresponding to multiple second target media information streams, wherein the second target media information streams are media information streams not satisfying the interruption-recovery condition;
    updating the starting display timestamp and the fragment sequence number of the first target media information stream to the starting display timestamp and the fragment sequence number of the second target media information stream corresponding to a target difference, wherein the target difference is the smallest of all the differences.
  5. The media information processing method according to claim 4, wherein the interruption-recovery condition comprises:
    a ratio of a difference between the second display timestamp and the display timestamp of the previously received media information packet to a preset time base is greater than a preset timeout duration.
  6. The media information processing method according to claim 1, wherein the target media fragment information is the media fragment information whose fragment sequence number is not 1.
  7. The media information processing method according to claim 6, wherein aggregating the target media fragment information of all the media information streams comprises:
    traversing the target media fragment information in each of the media information streams in sequence;
    determining whether the current target media fragment information is the first media fragment information after interruption recovery;
    if the current target media fragment information is not the first media fragment information after interruption recovery, aggregating the current target media fragment information.
  8. The media information processing method according to claim 7, wherein aggregating the target media fragment information of all the media information streams further comprises:
    if the current target media fragment information is the first media fragment information after interruption recovery, not aggregating the current target media fragment information.
  9. A media information processing apparatus, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the media information processing method according to any one of claims 1 to 8.
  10. A computer-readable storage medium storing computer-executable instructions, wherein the computer-executable instructions are used to perform the media information processing method according to any one of claims 1 to 8.
PCT/CN2023/089286 2022-06-08 2023-04-19 媒体信息处理方法及其装置、存储介质 WO2023236666A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210642307.6 2022-06-08
CN202210642307.6A CN117241105A (zh) 2022-06-08 2022-06-08 媒体信息处理方法及其装置、存储介质

Publications (1)

Publication Number Publication Date
WO2023236666A1 true WO2023236666A1 (zh) 2023-12-14

Family

ID=89083156

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/089286 WO2023236666A1 (zh) 2022-06-08 2023-04-19 媒体信息处理方法及其装置、存储介质

Country Status (2)

Country Link
CN (1) CN117241105A (zh)
WO (1) WO2023236666A1 (zh)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108900857A (zh) * 2018-08-03 2018-11-27 东方明珠新媒体股份有限公司 一种多视角视频流处理方法和装置
CN112188307A (zh) * 2019-07-03 2021-01-05 腾讯科技(深圳)有限公司 视频资源的合成方法、装置、存储介质及电子装置
CN112954391A (zh) * 2021-02-05 2021-06-11 北京百度网讯科技有限公司 视频编辑方法、装置和电子设备
CN113259715A (zh) * 2021-05-07 2021-08-13 广州小鹏汽车科技有限公司 多路视频数据的处理方法、装置、电子设备及介质
CN114079813A (zh) * 2020-08-18 2022-02-22 中兴通讯股份有限公司 画面同步方法、编码方法、视频播放设备及视频编码设备


Also Published As

Publication number Publication date
CN117241105A (zh) 2023-12-15

Similar Documents

Publication Publication Date Title
US11758209B2 (en) Video distribution synchronization
US11184627B2 (en) Video transcoding system, method, apparatus, and storage medium
CN108810636B (zh) 视频播放方法、虚拟现实设备、服务器、系统及存储介质
WO2020192152A1 (zh) 视频传输的方法、根节点、子节点、p2p服务器和系统
CN110933449B (zh) 一种外部数据与视频画面的同步方法、系统及装置
CN108347622B (zh) 多媒体数据推送方法、装置、存储介质及设备
US11109092B2 (en) Synchronizing processing between streams
US11284135B2 (en) Communication apparatus, communication data generation method, and communication data processing method
JP7273144B2 (ja) ビデオストリーム切換え方法、装置及びシステム
KR20120107882A (ko) 이종망 기반 연동형 방송콘텐츠 송수신 장치 및 방법
WO2020173165A1 (zh) 一种音频流和视频流同步切换方法及装置
US10666697B2 (en) Multicast to unicast conversion
CN112954433B (zh) 视频处理方法、装置、电子设备及存储介质
WO2012116558A1 (zh) 一种视频质量评估方法及装置
CN114245153B (zh) 切片方法、装置、设备及可读存储介质
Tang et al. Audio and video mixing method to enhance WebRTC
JP2021535658A (ja) ビデオ・ストリーム切り換えを実装するための方法、装置およびシステム
EP3316593A1 (en) Method and device for implementing synchronous playing
JP2014506030A (ja) 協調メディアシステム内の複数の端末装置を介したコンテンツの配信を管理する方法及び装置
WO2023236666A1 (zh) 媒体信息处理方法及其装置、存储介质
US7769035B1 (en) Facilitating a channel change between multiple multimedia data streams
TW202127897A (zh) 用於串流傳輸媒體資料的多解碼器介面
WO2018171567A1 (zh) 播放媒体流的方法、服务器及终端
CN115278288B (zh) 一种显示处理方法、装置、计算机设备及可读存储介质
WO2024087938A1 (zh) 一种媒体直播方法、装置及电子设备

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23818833

Country of ref document: EP

Kind code of ref document: A1