WO2020057249A1 - Image processing method, apparatus, system, network device, terminal and storage medium - Google Patents

Image processing method, apparatus, system, network device, terminal and storage medium

Info

Publication number
WO2020057249A1
Authority
WO
WIPO (PCT)
Prior art keywords
interest
video
image
information
image processing
Prior art date
Application number
PCT/CN2019/097355
Other languages
English (en)
French (fr)
Inventor
吴钊
李明
吴平
Original Assignee
ZTE Corporation (中兴通讯股份有限公司)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by ZTE Corporation
Priority to EP19862435.5A (patent EP3855750A4)
Priority to JP2021515166A (patent JP7425788B2)
Priority to KR1020217011376A (patent KR102649812B1)
Priority to US17/276,572 (patent US20220053127A1)
Publication of WO2020057249A1

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/60Control of cameras or camera modules
    • H04N23/698Control of cameras or camera modules for achieving an enlarged field of view, e.g. panoramic image capture
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47End-user applications
    • H04N21/472End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content
    • H04N21/4728End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content for selecting a Region Of Interest [ROI], e.g. for requesting a higher resolution version of a selected region
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/11Region-based segmentation
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/119Adaptive subdivision aspects, e.g. subdivision of a picture into rectangular or non-rectangular coding blocks
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/134Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/167Position within a video image, e.g. region of interest [ROI]
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/169Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/17Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
    • H04N19/172Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a picture, frame or field
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/169Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/184Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being bits, e.g. of the compressed video stream
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/70Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by syntax aspects related to video coding, e.g. related to compression standards
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/234Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
    • H04N21/2343Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/234Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
    • H04N21/2343Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements
    • H04N21/234345Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements the reformatting operation being performed only on part of the stream, e.g. a region of the image or a time segment
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/4402Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/4402Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display
    • H04N21/440245Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display the reformatting operation being performed only on part of the stream, e.g. a region of the image or a time segment
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/81Monomedia components thereof
    • H04N21/816Monomedia components thereof involving special video data, e.g 3D video

Definitions

  • Embodiments of the present invention relate to, but are not limited to, the field of image coding and decoding technologies, and in particular to an image processing method, apparatus, system, network device, terminal, and storage medium.
  • Video applications are gradually developing from single-view, low-resolution, low-bit-rate toward multi-view, high-resolution, high-bit-rate forms, so as to provide users with new video content types and video presentation characteristics, as well as a better sense of presence and viewing experience.
  • 360-degree panoramic video (hereinafter referred to as panoramic video) is a new type of video content: users can choose an arbitrary viewing angle according to their subjective needs, thereby achieving all-round 360-degree viewing. Although current network performance and hardware processing performance are high, given the dramatic increase in the number of users and the huge amount of panoramic video data, it is still necessary to reduce network and hardware resource occupation while guaranteeing the user's viewing experience.
  • Region of Interest (ROI) technology can crop and display panoramic video according to user preferences, so the entire panoramic video need not be processed. However, in the related art there is usually only one ROI, which can present only a limited part of the images in the panoramic video and cannot satisfy users' need to watch multiple ROIs. Therefore, how to implement encoding when multiple ROIs exist, so as to indicate the composite display of each ROI, urgently needs to be solved.
  • ROI: Region of Interest
  • The image processing method, apparatus, system, network device, terminal, and storage medium provided by the embodiments of the present invention mainly solve the technical problem of how to implement encoding when multiple ROIs exist.
  • An embodiment of the present invention provides an image processing method, including: acquiring composition indication information used to indicate a composite display manner between regions of interest in a video image; and generating a media stream of the video image based on the composition indication information.
  • An embodiment of the present invention further provides an image processing method, including: receiving a video stream of a video image and description data; parsing the description data to obtain composition indication information of regions of interest; and controlling, according to the composition indication information, the composite playback display of the region-of-interest images in the video stream.
  • An embodiment of the present invention further provides an image processing method, including:
  • the network side acquires composition indication information for indicating the composite display manner of each region of interest in a video image, generates a media stream of the video image based on the composition indication information, and sends the media stream to a target node;
  • the target node receives the media stream, parses the media stream to obtain the composition indication information of the regions of interest, and controls the playback display of the video stream in the media stream according to the composition indication information.
  • An embodiment of the present invention further provides an image processing apparatus, including:
  • an acquisition module, configured to acquire composition indication information used to indicate the composite display manner of each region of interest in a video image;
  • a processing module, configured to generate a media stream of the video image based on the composition indication information.
  • An embodiment of the present invention further provides an image processing apparatus, including:
  • a receiving module, configured to receive a video stream of a video image and description data;
  • a parsing module, configured to parse the description data to obtain composition indication information of regions of interest;
  • a control module, configured to control, according to the composition indication information, the composite playback display of the region-of-interest images in the video stream.
  • An embodiment of the present invention further provides an image processing system, which includes the two image processing apparatuses described above.
  • An embodiment of the present invention further provides a network device, including a first processor, a first memory, and a first communication bus;
  • the first communication bus is configured to implement connection and communication between the first processor and the first memory;
  • the first processor is configured to execute one or more computer programs stored in the first memory, to implement the steps of the image processing method according to any one of the above.
  • An embodiment of the present invention further provides a terminal, including a second processor, a second memory, and a second communication bus;
  • the second communication bus is configured to implement connection and communication between the second processor and the second memory;
  • the second processor is configured to execute one or more computer programs stored in the second memory, to implement the steps of the image processing method described above.
  • An embodiment of the present invention further provides a storage medium storing one or more programs, where the one or more programs can be executed by one or more processors to implement the steps of the image processing method described above.
  • FIG. 1 is a schematic flowchart of an image processing method according to Embodiment 1 of the present invention;
  • FIG. 2 is a first schematic diagram of ROI image stitching indication according to Embodiment 1 of the present invention;
  • FIG. 3 is a second schematic diagram of ROI image stitching indication according to Embodiment 1 of the present invention;
  • FIG. 4 is a third schematic diagram of ROI image stitching indication according to Embodiment 1 of the present invention;
  • FIG. 5 is a fourth schematic diagram of ROI image stitching indication according to Embodiment 1 of the present invention;
  • FIG. 6 is a fifth schematic diagram of ROI image stitching indication according to Embodiment 1 of the present invention;
  • FIG. 7 is a first schematic diagram of ROI image fusion indication according to Embodiment 1 of the present invention;
  • FIG. 8 is a second schematic diagram of ROI image fusion indication according to Embodiment 1 of the present invention;
  • FIG. 9 is a schematic diagram of ROI image overlapping regions according to Embodiment 1 of the present invention;
  • FIG. 10 is a schematic diagram of ROI image nesting indication according to Embodiment 1 of the present invention;
  • FIG. 11 is a schematic diagram of ROI image transparent-channel processing according to Embodiment 1 of the present invention;
  • FIG. 12 is a schematic diagram of ROI image coordinate positions according to Embodiment 1 of the present invention;
  • FIG. 13 is a first schematic diagram of ROI image video stream generation according to Embodiment 1 of the present invention;
  • FIG. 14 is a second schematic diagram of ROI image video stream generation according to Embodiment 1 of the present invention;
  • FIG. 15 is a schematic flowchart of an image processing method according to Embodiment 2 of the present invention;
  • FIG. 16 is a schematic flowchart of an image processing method according to Embodiment 3 of the present invention;
  • FIG. 17 is a schematic structural diagram of an image processing apparatus according to Embodiment 4 of the present invention;
  • FIG. 18 is a schematic structural diagram of an image processing apparatus according to Embodiment 5 of the present invention;
  • FIG. 19 is a schematic structural diagram of an image processing system according to Embodiment 6 of the present invention;
  • FIG. 20 is a schematic structural diagram of a network device according to Embodiment 7 of the present invention;
  • FIG. 21 is a schematic structural diagram of a terminal according to Embodiment 8 of the present invention.
  • An embodiment of the present invention provides an image processing method, which is mainly applied to network-side devices, encoders, and the like, including but not limited to servers, base stations, and similar devices. Referring to FIG. 1, the method includes the following steps:
  • S101: Acquire composition indication information used to indicate the composite display manner between regions of interest in a video image.
  • When a video image is encoded, composition indication information is acquired, which is used to indicate the composite display manner between ROIs in the video image. It should be understood that when the video image contains no ROI, or no ROI has been divided, there is no process of acquiring the composition indication information. Corresponding composition indication information is acquired only when multiple ROIs exist. When only one ROI exists, this solution can also be adopted to control the display of that ROI.
  • Optionally, during encoding, it may first be determined that an ROI exists in the video image, and the corresponding composition indication information is then acquired to indicate the composite display manner of that ROI.
  • The ROI may be set in manners including but not limited to the following:
  • 1. The video image can be analyzed in advance through image processing, ROI recognition, and other technologies, and specific content or specific spatial positions in the panoramic video are then divided according to the analysis results to form different ROIs. For example, during a football match, a camera may be used to separately track and shoot the ball's motion trajectory, which is taken as an ROI; or ROI recognition technology may be used to identify and track a specific target (such as a certain player) in the captured video image to form an ROI.
  • 2. Specific content or specific spatial positions of the video image are divided manually according to user requirements or preset information, thereby forming different regions of interest.
  • 3. During playback of the video image, information about users' regions of interest is collected, and specific content or specific spatial positions in the panoramic video are automatically divided according to this information, thereby forming different regions of interest.
  • 4. The user selects regions of interest while watching the video image.
  • S102: Generate a media stream of the video image based on the composition indication information. That is, the composition indication information is encoded and written into the code stream of the video image, thereby generating the media stream of the video image.
  • For a playback device, the media stream can be decoded, and at least the ROIs in the video image can be synthesized, displayed, and played back.
  • The composition indication information includes at least one of the following: first indication information for indicating stitched (mosaic) display of the regions of interest; second indication information for indicating fused display of the regions of interest; third indication information for indicating nested display of the regions of interest; fourth indication information for indicating zoomed display of the regions of interest; fifth indication information for indicating rotated display of the regions of interest; and sixth indication information for indicating clipped display of the regions of interest (modeled as an enumeration in the sketch below).
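  • Purely as an illustration, the six kinds of indication information can be modeled as an enumeration. The sketch below is in Python; the names and numeric values are invented for readability and are not the patent's bitstream syntax (Table 1 defines its own relation_type codes).

```python
from enum import IntEnum

class RoiCompositionType(IntEnum):
    """Illustrative labels for the six kinds of composition indication
    information; the numeric values here are arbitrary, not normative."""
    STITCH = 0  # first indication: adjacent, non-overlapping mosaic
    FUSE = 1    # second indication: partially overlapping fusion
    NEST = 2    # third indication: one ROI fully overlaid on another
    SCALE = 3   # fourth indication: resize by a zoom ratio value
    ROTATE = 4  # fifth indication: rotate by a type and an angle
    CLIP = 5    # sixth indication: crop, e.g. via an alpha channel
```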
  • The first indication information is used to indicate stitching of the ROIs; stitching means that two ROIs are adjacent and do not overlap. Referring to FIG. 2, areas A, B, C, and D are four regions of interest in the video image; they have the same size and can be stitched together according to the positions where they appear in the panorama.
  • Optionally, as shown in FIG. 3, areas A, B, C, and D may be stitched together at arbitrary positions or at specified positions.
  • Optionally, as shown in FIG. 4, the sizes of areas A, B, C, and D may differ.
  • Optionally, as shown in FIG. 5, the positions of areas A, B, C, and D may be arranged arbitrarily while their sizes also differ.
  • Optionally, as shown in FIG. 6, areas A, B, C, and D may form an arbitrary non-rectangular shape after stitching.
  • The second indication information is used to indicate fusion of the ROIs. Fusion produces a partially overlapping area between two ROIs, but does not completely superimpose one ROI on the other.
  • Referring to FIG. 7, areas A, B, C, and D are four regions of interest in the video image; they are overlapped and fused together over areas of a specific extent.
  • Optionally, as shown in FIG. 8, the composite display manner of the four regions of interest may be direct coverage of pixels in a fixed covering order. The superimposition order is A→B→C→D; therefore D, which is overlaid last, is not covered by the other three ROIs.
  • Optionally, the pixel values of the overlapping areas produced by fusion may be processed in the following manner: as shown in FIG. 9, the overlapping pixels of the four different ROI regions are used to compute new pixel values, for example the mean of all pixels, or a combination with different weights assigned to pixels from different regions, or new pixel values calculated according to a feature-matching method, so as to obtain a natural image fusion effect.
  • Computing new pixel values with the feature-matching method is usually done on network-side devices with strong video processing capability, to obtain the best possible fusion effect; it is theoretically applicable on the terminal side as well, but places higher requirements on terminal performance.
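  • As a concrete sketch of the averaging and weighting options above (this is an illustration, not the patent's normative procedure; the array shapes and the weighting scheme are assumptions):

```python
import numpy as np

def blend_overlap(patches, weights=None):
    """Blend co-located overlap patches from several ROIs into new pixels.

    patches: list of HxWx3 uint8 arrays covering the same overlap area.
    weights: optional per-ROI weights; equal weights give the plain mean.
    """
    stack = np.stack([p.astype(np.float32) for p in patches])
    if weights is None:
        w = np.ones(len(patches), np.float32)
    else:
        w = np.asarray(weights, np.float32)
    w /= w.sum()
    blended = np.tensordot(w, stack, axes=1)  # weighted per-pixel average
    return blended.clip(0, 255).astype(np.uint8)
```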
  • The third indication information is used to indicate nested display of the ROIs; nested display completely overlays one ROI on top of another.
  • Referring to FIG. 10, areas A and B are two regions of interest in the video image; B is completely overlaid on A so that the two are nested together. The nesting position can be set according to actual needs: for example, based on picture size, the ROI with the relatively smaller picture is overlaid on the relatively larger one, or the position is customized by the user.
  • The fourth indication information is used to indicate scaling of the ROIs; scaling changes the size of the image. It includes a zoom ratio value; for example, a zoom ratio value of 2 may indicate that the diagonal length of the ROI is enlarged to twice the original.
  • The fifth indication information is used to indicate rotation of the ROIs, including a rotation type and a rotation angle; the rotation type includes, but is not limited to, horizontal rotation and vertical rotation.
  • The sixth indication information is used to indicate that a region of interest is clipped for display.
  • Referring to FIG. 11, areas A and B are two regions of interest in the video image; the circular area within area B is clipped out, which can be implemented using an Alpha transparent channel. Optionally, the clipped-out B may then be nested with A to synthesize the image (see the sketch below). In practical applications, several of the above six kinds of indication information are usually combined to synthesize the corresponding ROIs, so as to better satisfy users' viewing needs for multiple ROIs.
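  • The alpha-channel clipping and nesting just described can be pictured in a few lines of NumPy; a minimal sketch in which the mask contents (for example a filled circle, as in FIG. 11) and the nesting offsets are assumptions:

```python
import numpy as np

def clip_and_nest(base, overlay, alpha, top, left):
    """Clip `overlay` by the alpha mask, then nest the result onto `base`.

    base, overlay: HxWx3 uint8 images (ROI A and ROI B).
    alpha: float mask in [0, 1], same height/width as `overlay`.
    top, left: nesting position of the clipped region inside `base`.
    """
    h, w = overlay.shape[:2]
    out = base.astype(np.float32).copy()
    region = out[top:top + h, left:left + w]
    a = alpha[..., None]  # broadcast the mask over the color channels
    out[top:top + h, left:left + w] = a * overlay + (1 - a) * region
    return out.clip(0, 255).astype(np.uint8)
```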
  • In this embodiment, the H.264/AVC standard or the H.265/HEVC (High Efficiency Video Coding) standard can be used to encode the video image. During encoding, the acquired composition indication information is written into the code stream of the video image.
  • In other examples of the present invention, feature information of the corresponding ROIs in the video image may also be acquired, and the media stream of the video image is generated based on the acquired composition indication information together with the feature information; that is, the composition indication information and the feature information are both written into the code stream of the video image.
  • The generated media stream includes at least the following two parts: description data and a video stream. In this embodiment, the acquired composition indication information and feature information are written into the description data.
  • The description data is mainly used to guide decoding of the video stream so as to play the video image. It may contain at least one of the following kinds of information: for example, time synchronization information, text information, and other associated information.
  • It should also be noted that the description data, as part of the video image, can optionally exist in two forms: it can be encoded together with the video stream in the form of a code stream, that is, as part of the data in the video stream; or it can be encoded separately from the video stream and kept separate from it.
  • The ROI feature information includes position information and/or encoding quality indication information, where the position information includes coordinate information of a specific position of the ROI, together with a length value and a width value of the ROI.
  • The specific position may be any one of the four corners of the ROI region, such as the pixel in the upper-left corner or the pixel in the lower-right corner, or the position of the center point of the ROI region.
  • The encoding quality indication information may be the encoding quality level used in the encoding process; different encoding quality indication information characterizes different encoding quality levels, and encoding at different quality levels produces different image quality. For example, the encoding quality indication information may be "1", "2", "3", "4", "5", or "6", with different values representing different encoding quality levels: a value of "1" indicates that low-quality encoding is used, while a value of "2" indicates medium-quality encoding that is better than "1"; the larger the value, the higher the encoding quality, in increasing order.
  • In other examples of the present invention, the ROI position information may also be characterized as follows. Referring to FIG. 12, the upper side of ROI region 121 is located at row 300 of the video image, the lower side at row 600, the left side at column 500, and the right side at column 800; that is, the position information of the ROI region is identified by its row and column positions. For a 1920*1080 image region, the pixel position of the upper-left corner is (0,0) and that of the lower-right corner is (1919,1079).
  • For two-dimensional or three-dimensional image regions, a Cartesian coordinate system may be used, or other non-Cartesian curvilinear coordinate systems, such as cylindrical, spherical, or polar coordinate systems.
  • It should be understood that, taking FIG. 12 as an example, the length of the upper and lower sides, that is, the distance between the left and right sides (800-500=300 pixels), can be taken as the length value of the ROI, and 600-300=300 pixels can be taken as the width value of the ROI; the opposite convention is equally possible.
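  • The row/column description of FIG. 12 translates directly into the length and width values; a one-line sketch of the arithmetic:

```python
def roi_size(top_row, bottom_row, left_col, right_col):
    """Length and width of an ROI from its row/column bounds (FIG. 12)."""
    length = right_col - left_col  # 800 - 500 = 300 pixels
    width = bottom_row - top_row   # 600 - 300 = 300 pixels
    return length, width
```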
  • table_id: identifier of the table;
  • version: version information;
  • length: length information;
  • roi_num: the number of regions of interest contained;
  • (roi_position_x, roi_position_y, roi_position_z): coordinate information of the region of interest in the video image;
  • roi_width: width of the region of interest;
  • roi_height: height of the region of interest;
  • roi_quality: quality information of the region of interest;
  • relation_type: composition indication information of the region of interest, 0 for stitching, 1 for embedding, and 2 for fusion;
  • (roi_new_position_x, roi_new_position_y, roi_new_position_z): coordinate information of the region of interest in the new image;
  • scale: scaling ratio of the region of interest;
  • rotation: rotation angle of the region of interest;
  • flip: flip of the region of interest, 0 for horizontal flip, 1 for vertical flip;
  • alpha_flag: transparent channel identifier, 0 for the absence of transparent channel information, 1 for the presence of transparent channel information;
  • alpha_info(): transparent channel information, combined with the region of interest (clipping) to produce a new image;
  • filter_info(): when relation_type indicates fusion, this can specify the filtering method of the fused area, such as mean or median;
  • user_data(): user information (see the data-structure sketch below).
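  • Read as a data structure, the fields above map onto something like the following sketch; the field names follow the table, while the Python types and defaults are assumptions made for illustration:

```python
from dataclasses import dataclass, field

@dataclass
class RoiInfoEntry:
    """One region-of-interest record of roi_info_table (sketch only)."""
    roi_width: int
    roi_height: int
    roi_quality: int         # encoding quality indication, e.g. 1..6
    relation_type: int       # 0 stitching, 1 embedding, 2 fusion
    rotation: int            # rotation angle of the region
    alpha_flag: int = 0      # 1 -> alpha_info is present
    alpha_info: bytes = b""  # transparent-channel data used for clipping
    filter_info: str = ""    # fusion filtering, e.g. "mean" or "median"
    user_data: bytes = b""

@dataclass
class RoiInfoTable:
    table_id: int
    rois: list = field(default_factory=list)  # list of RoiInfoEntry

    @property
    def roi_num(self) -> int:
        return len(self.rois)
```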
  • The above roi_info_table containing the ROI composition indication information and feature information is written into the description data of the video image. The description data optionally includes at least one of the following: supplemental enhancement information (SEI), video usability information (VUI), and a system-layer media attribute description unit.
  • SEI: Supplemental Enhancement Information
  • VUI: Video Usability Information
  • The roi_info_table may be written into the supplemental enhancement information of the video bitstream; a specific example can take the structure shown in Table 2 below.
  • roi_info_table contains the relevant information of the corresponding ROIs (composition indication information, feature information, and so on). After it is written into the supplemental enhancement information, the information whose identification information is ROI_INFO can be obtained from the SEI; this is equivalent to using ROI_INFO as the identification information of the SEI message.
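  • One way to picture the SEI carriage: the table is serialized as a payload whose identification information marks it as ROI information. The byte layout below is invented purely for illustration; it is not the H.265 SEI syntax, and the ROI_INFO value is hypothetical:

```python
import struct

ROI_INFO = 0xB0  # hypothetical identifier, not a registered SEI payload type

def pack_roi_payload(entries):
    """Serialize ROI records into an SEI-like payload (illustrative layout)."""
    body = struct.pack("B", len(entries))
    for e in entries:
        body += struct.pack(">HHBBH", e.roi_width, e.roi_height,
                            e.roi_quality, e.relation_type, e.rotation)
    return struct.pack("B", ROI_INFO) + body

def find_roi_payload(payloads):
    """Return the payload whose identification information is ROI_INFO."""
    for p in payloads:
        if p and p[0] == ROI_INFO:
            return p[1:]
    return None
```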
  • The roi_info_table may also be written into the video usability information; a specific example can take the structure shown in Table 3 below. When roi_info_flag in Table 3 is equal to 1, it indicates that ROI information follows; roi_info_table() is the roi_info_table data structure of Table 1 above and contains the ROI-related information. The region-of-interest information identified by roi_info_flag equal to 1 can be obtained from the VUI.
  • The system-layer media attribute description unit includes, but is not limited to, a descriptor of a transport stream, a data unit of a file format (for example, in a Box), and media description information of a transport stream (for example, an information unit such as a Media Presentation Description (MPD)).
  • When the ROI composition indication information and feature information are written into the SEI, they can further be combined with temporal motion-constrained tile sets (MCTS); optionally, the ROI-related information is combined with the temporal motion-constrained tile sets of the H.265/HEVC standard.
  • Tightly coupling the ROI composition indication information with the tiles makes it possible to flexibly extract the required tile data without separately coding and decoding additional ROI data, which can satisfy different user needs and is more conducive to interacting with users in applications, as shown in Table 4 below (and in the sketch that follows).
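  • To make the tile coupling concrete: with a motion-constrained tile grid, the tiles that must be extracted for an ROI are exactly those whose rectangles intersect it. A minimal sketch, assuming a uniform tile grid:

```python
def tiles_for_roi(roi, tile_w, tile_h, cols, rows):
    """Indices of grid tiles intersecting an ROI rectangle.

    roi: (x, y, width, height) in pixels; tiles are indexed row-major.
    With MCTS, this subset can be decoded without the rest of the picture.
    """
    x, y, w, h = roi
    c0, c1 = x // tile_w, min((x + w - 1) // tile_w, cols - 1)
    r0, r1 = y // tile_h, min((y + h - 1) // tile_h, rows - 1)
    return [r * cols + c
            for r in range(r0, r1 + 1) for c in range(c0, c1 + 1)]

# Example: a 300x300 ROI at (500, 300) on a 1920x1080 picture with
# 640x360 tiles (a 3x3 grid) touches tiles 0, 1, 3 and 4.
```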
  • roi_info_flag: 0 indicates that no region-of-interest related information exists; 1 indicates that region-of-interest related information exists.
  • An example of roi_info is shown in Table 5 below.
  • length: length information;
  • roi_num: the number of regions of interest contained;
  • (roi_pos_x, roi_pos_y): coordinate information of the region of interest within the slice group or tile;
  • roi_width: width of the region of interest;
  • roi_height: height of the region of interest;
  • roi_quality: quality information of the region of interest;
  • relation_type: association relation of the region of interest, 0 for stitching, 1 for embedding, 2 for fusion;
  • (roi_new_pos_x, roi_new_pos_y): coordinate information of the region of interest in the new image;
  • scale: scaling ratio of the region of interest;
  • rotation: rotation angle of the region of interest;
  • flip: flip of the region of interest, 0 for horizontal flip, 1 for vertical flip;
  • alpha_flag: transparent channel identifier, 0 for the absence of transparent channel information, 1 for the presence of transparent channel information;
  • alpha_info(): transparent channel information, which can be combined with the region of interest to produce a new image;
  • filter_info(): when relation_type indicates fusion, this can specify the filtering method of the fused area, such as mean or median.
  • The video stream contained in the media stream carries the video image data. The process of generating the video stream includes: obtaining the regions of interest of the video image, and dividing the associated image of each region of interest within the same image frame into at least one slice unit for independent encoding, so as to generate the first video stream of the video image.
  • Referring to FIG. 13, a first frame of the video image is acquired, and the associated image of each ROI in that frame is determined. Suppose there are two ROIs in the video image, ROI 131 and ROI 132, with associated images A1 and B1 respectively. The associated image A1 of ROI 131 is divided into one slice unit a11 for independent encoding, and the associated image B1 of ROI 132 is divided into two slice units b11 and b12, which are encoded independently of each other.
  • The remaining image content can be encoded with any existing encoding method, whether independent or non-independent encoding. The finally generated first video stream includes at least all the independently coded slice units of each ROI.
  • In this way, when a user only needs to watch the ROI images, only the slice units corresponding to the ROIs in the first video stream need to be extracted (rather than all slice units) and decoded independently, without relying on other slices to complete decoding, which reduces the requirements on the decoding performance of the receiving end.
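  • The extraction path can be pictured as a simple filter over the stream's slice units; a sketch under the assumption that each unit carries the identifier of the ROI it encodes (this in-memory representation is hypothetical):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class SliceUnit:
    roi_id: Optional[int]  # None -> content outside any ROI
    payload: bytes         # independently decodable coded data

def extract_roi_units(stream, wanted_rois):
    """Keep only the independently coded slice units of the requested ROIs.

    Because each ROI's associated image was coded in its own slice units,
    a receiver can decode this subset without the remaining fragments.
    """
    return [u for u in stream if u.roi_id in wanted_rois]
```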
  • Optionally, only the ROI-associated images in the video image may be encoded while the other regions outside the associated images are left unencoded; alternatively, the associated images and the other regions may be encoded separately.
  • The slice unit includes a slice of the H.264/AVC standard, a tile of the H.265/HEVC standard, and the like.
  • The video stream in the media stream may alternatively be a second video stream, and the process of generating the second video stream is as follows: the associated images are synthesized according to the composition indication information, the synthesized image is treated as a frame to be processed, and the frame is divided into at least one slice unit for encoding, so as to generate the second video stream of the regions of interest.
  • Referring to FIG. 14, for the second video stream, the associated images (C1 and D1) of the ROIs (ROI 141 and ROI 142 in FIG. 14) within the same image frame of the video image are synthesized according to the composition indication information; stitching is assumed here. The synthesized image is treated as a to-be-processed image frame E1, and E1 is then divided into at least one slice unit (such as e11) for encoding; the encoding may be independent encoding, non-independent encoding, or another encoding method.
  • The other image frames of the video image are processed in the same manner, and the image frames may be processed in parallel or serially, thereby generating the second video stream.
  • The second video stream can be decoded with commonly used decoding methods, and the synthesized ROI image is obtained directly without further combining the ROI-associated images. This encoding approach helps reduce the processing load on the decoding side and improve decoding efficiency, but the synthesis must be performed first at encoding time.
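  • The compose-first path can be outlined as follows. compose_rois and encode_frame stand in for the composition rules and for any standard (H.264/AVC- or H.265/HEVC-style) encoder; both are placeholders, and the stitching rule shown is a toy example:

```python
import numpy as np

def compose_rois(rois, composition_info):
    """Toy composition: horizontal stitching of equal-height ROI images."""
    if composition_info == "stitch":
        return np.concatenate(rois, axis=1)
    raise NotImplementedError(composition_info)

def make_second_stream(frames, roi_boxes, composition_info, encode_frame):
    """Per frame: crop the ROI images, synthesize them according to the
    composition indication information, then encode the synthesized frame
    as one or more slice units (encode_frame is a placeholder encoder)."""
    stream = []
    for frame in frames:
        rois = [frame[y:y + h, x:x + w] for (x, y, w, h) in roi_boxes]
        stream.append(encode_frame(compose_rois(rois, composition_info)))
    return stream
```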
  • It should be noted that the network side or the encoding end may generate both of the above video streams for the same video image.
  • After the media stream of the video image is generated, it can be stored or sent to the corresponding target node; for example, receipt of a video image acquisition request from a target node triggers sending of the media stream to that node. Optionally, the identification information of the acquisition content indicated by the acquisition request is parsed, and the media stream is sent to the target node according to the identification information.
  • For example, the server receives a request for a video image from a terminal and, according to the request, sends the media stream of the video image (including the first video stream and the description data) to the terminal. The terminal can decode the media stream to play the complete video image; the terminal can also decode the media stream, extract the independently decodable slice unit data of the regions of interest, and play and display the region-of-interest images in combination with the description data.
  • Alternatively, the slice units of the regions of interest in the first video stream are extracted (without any decoding operation) together with the description data and sent to the target node. For example, the server side may receive the terminal's request for the region-of-interest images; according to the request information, the server locates the independently coded slice unit data corresponding to the requested regions of interest, extracts it, attaches the relevant region-of-interest information (composition indication information and feature information) or modified region-of-interest information, generates a new code stream, and sends it to the terminal. This avoids sending the entire code stream to the terminal, reducing network bandwidth occupation and transmission delay.
  • Alternatively, the second video stream and the description data are sent to the target node. That is, the server can also choose, according to the request sent by the terminal, to send the second video stream of the video image and the description data to the terminal; after decoding, the terminal directly obtains the ROI image synthesized according to the composition indication information, which helps reduce terminal resource occupation and improve terminal processing efficiency.
  • It should be noted that the video image may be a 360-degree panoramic video, a stereoscopic video, or the like. For stereoscopic video, the relevant ROI information can be applied to the left and right fields of view simultaneously.
  • With the image processing method provided by this embodiment of the present invention, the composition indication information is written into the code stream of the video image to indicate the composite display of the ROI images in the video image, thereby realizing the encoding process when multiple ROIs exist in the video image and meeting users' need to watch multiple ROI images at the same time.
  • Moreover, because the ROI-associated images are divided into slice units and encoded independently, the decoding end can decode them independently, without relying on other slices to complete decoding.
  • Embodiment 2: an embodiment of the present invention provides an image processing method, which is mainly applied to terminals, decoders, and the like, including but not limited to mobile phones and personal computers.
  • Referring to FIG. 15, the image processing method includes the following steps:
  • S151: Receive a video stream of a video image and description data; that is, the ROI image data, namely the video stream data, is obtained.
  • S152: Parse the description data to obtain the composition indication information of the regions of interest. Optionally, feature information of the ROIs, including position information and encoding quality indication information, may also be obtained from the description data.
  • S153: Control the composite playback display of the region-of-interest images in the video stream according to the composition indication information; that is, the ROI images are synthesized and displayed according to the composition indication information.
  • Optionally, before the video stream of the video image and the description data are received, the method further includes sending an acquisition request to the network side (or the encoding end), and the acquisition request may further include identification information for indicating the acquisition content, so as to obtain different video streams.
  • Optionally, when the identification information is set to a first identification, it can be used to indicate acquisition of the first video stream and description data of the corresponding video image; when the identification information is set to a second identification, it can be used to indicate acquisition of the slice units of the regions of interest in the first video stream of the corresponding video image, together with the description data; and when the identification information is set to a third identification, it can be used to indicate acquisition of the second video stream and description data of the corresponding video image.
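  • On the server side, the three identifications map naturally onto a small dispatch. A sketch with invented names; the enum members and the store attributes are illustrative, not values from the patent text:

```python
from enum import Enum

class AcquireId(Enum):
    FIRST = 1   # first identification: first video stream + description data
    SECOND = 2  # second identification: ROI slice units + description data
    THIRD = 3   # third identification: second video stream + description data

def answer_request(ident, store):
    """Choose the response for an acquisition request (illustrative only)."""
    if ident is AcquireId.FIRST:
        return store.first_stream, store.description_data
    if ident is AcquireId.SECOND:
        return store.roi_slice_units, store.description_data
    return store.second_stream, store.description_data
```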
  • Depending on the identification information, the media stream received from the network side differs, and the subsequent processing differs correspondingly.
  • When the identification information in the acquisition request is the first identification, the first video stream and description data of the corresponding video image are obtained; the first video stream and description data can be decoded to obtain the complete image of the video image.
  • Alternatively, the independently coded slice unit data of the ROI images in the first video stream can be extracted, and the ROI images are synthesized and displayed according to the ROI composition indication information in the description data.
  • When the identification information is the second identification, the terminal receives the slice units of the regions of interest and can decode them directly, since they are independently decodable, and then display the ROI images after synthesizing them according to the composition indication information in the description data.
  • When the identification information is the third identification, the terminal receives the second video stream, can directly decode it with conventional decoding methods to obtain the synthesized ROI image, and then plays and displays it.
  • It should be understood that the acquisition request is not limited to the identification information indicating the acquisition content; it may also include other necessary information, such as address information of the local end and the peer end, identification information of the requested video image, and verification information.
  • Embodiment 3: an embodiment of the present invention provides an image processing method, which is mainly applied to a system including a network side and a terminal side. Referring to FIG. 16, the image processing method mainly includes the following steps:
  • S161: The network side acquires composition indication information for indicating the composite display manner of each region of interest in the video image.
  • S162: The network side generates a media stream of the video image based on the composition indication information.
  • S163: The network side sends the media stream to the target node.
  • S164: The target node receives the media stream.
  • S165: The target node parses the media stream to obtain the composition indication information of the regions of interest.
  • S166: The target node controls the playback display of the video stream in the media stream according to the composition indication information.
  • For details, refer to the related descriptions in Embodiment 1 and/or Embodiment 2, which are not repeated here.
  • It should be understood that the media stream generated by the network side and the media stream sent by the network side to the target node may be the same or different: the network side may flexibly select the video stream to be sent to the target node according to the acquisition request of the target node, rather than sending one fixed video stream. Therefore, the media stream generated by the network side may be called the first media stream, and the media stream sent to the target node the second media stream, for ease of distinction.
  • Embodiment 4:
  • An embodiment of the present invention provides an image processing apparatus, which is configured to implement the steps of the image processing method according to Embodiment 1. Referring to FIG. 17, the image processing apparatus includes:
  • an acquisition module 171, configured to acquire composition indication information for indicating the composite display manner of each region of interest in the video image; and
  • a processing module 172, configured to generate a media stream of the video image based on the composition indication information.
  • Embodiment 5:
  • An embodiment of the present invention provides an image processing apparatus, which is configured to implement the steps of the image processing method according to Embodiment 2. Referring to FIG. 18, the image processing apparatus includes:
  • a receiving module 181, configured to receive a video stream of a video image and description data;
  • a parsing module 182, configured to parse the description data to obtain composition indication information of regions of interest; and
  • a control module 183, configured to control the composite playback display of the region-of-interest images in the video stream according to the composition indication information.
  • Embodiment 6:
  • An embodiment of the present invention provides an image processing system. As shown in FIG. 19, the system includes an image processing apparatus 191 as described in Embodiment 4 and an image processing apparatus 192 as described in Embodiment 5, and is used to implement the image processing method described in Embodiment 3.
  • Embodiment 7:
  • An embodiment of the present invention provides a network device. Referring to FIG. 20, it includes a first processor 201, a first memory 202, and a first communication bus 203.
  • The first communication bus 203 is used to implement connection and communication between the first processor 201 and the first memory 202.
  • The first processor 201 is configured to execute one or more computer programs stored in the first memory 202 to implement the steps of the image processing method described in Embodiment 1; for details, refer to the description in Embodiment 1, which is not repeated here.
  • Embodiment 8:
  • An embodiment of the present invention provides a terminal. Referring to FIG. 21, the terminal includes a second processor 211, a second memory 212, and a second communication bus 213.
  • The second communication bus 213 is used to implement connection and communication between the second processor 211 and the second memory 212.
  • The second processor 211 is configured to execute one or more computer programs stored in the second memory 212 to implement the steps of the image processing method described in Embodiment 2; for details, refer to the description in Embodiment 2, which is not repeated here.
  • An embodiment of the present invention further provides a storage medium, which may be a computer-readable storage medium. The storage medium stores one or more computer programs, and the one or more computer programs may be executed by one or more processors to implement the steps of the image processing method described in Embodiment 1 or Embodiment 2.
  • The storage medium includes volatile or non-volatile, removable or non-removable media implemented in any method or technology for storing information such as computer-readable instructions, data structures, computer program modules, or other data.
  • Storage media include, but are not limited to, RAM (Random Access Memory), ROM (Read-Only Memory), EEPROM (Electrically Erasable Programmable Read-Only Memory) or other memory technologies, CD-ROM (Compact Disc Read-Only Memory), digital versatile discs (DVD) or other optical disc storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and that can be accessed by a computer.
  • This embodiment also provides a computer program (also referred to as computer software), which can be distributed on a computer-readable medium and executed by a computing device to implement at least one step of the image processing method of Embodiment 1 and/or Embodiment 2; in some cases, at least one of the steps shown or described may be performed in an order different from that described in the above embodiments.
  • This embodiment also provides a computer program product including a computer-readable apparatus on which the computer program described above is stored. The computer-readable apparatus in this embodiment may include the computer-readable storage medium described above.
  • In summary, composition indication information for indicating the composite display manner between regions of interest in a video image is acquired, and the media stream of the video image is generated based on the composition indication information; that is, the composition indication information is written into the code stream of the video image, realizing the encoding process of a video image in which multiple (at least two) ROIs exist. During video playback, the composite display and playback of the ROIs can be controlled on the basis of the composition indication information, which can satisfy users' need to watch multiple ROIs at the same time. Technical effects including, but not limited to, the above can be achieved.
  • It should be understood that a communication medium typically contains computer-readable instructions, data structures, computer program modules, or other data in a modulated data signal such as a carrier wave or other transmission mechanism, and may include any information delivery medium. Therefore, the present invention is not limited to any specific combination of hardware and software.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Databases & Information Systems (AREA)
  • Human Computer Interaction (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

Embodiments of the present invention provide an image processing method, apparatus, system, network device, terminal, and storage medium. Composition indication information used to indicate the composite display manner between regions of interest in a video image is acquired, and a media stream of the video image is generated based on the composition indication information; that is, the composition indication information is written into the code stream of the video image, realizing the encoding process of a video image in which multiple (at least two) ROIs exist.

Description

Image processing method, apparatus, system, network device, terminal and storage medium
CROSS-REFERENCE
This application claims priority to the Chinese patent application filed with the China Patent Office on September 19, 2018, with application number 201811095593.9 and invention title "Image processing method, apparatus, system, network device, terminal and storage medium", the entire contents of which are incorporated herein by reference.
TECHNICAL FIELD
Embodiments of the present invention relate to, but are not limited to, the field of image coding and decoding technologies, and in particular to an image processing method, apparatus, system, network device, terminal, and storage medium.
BACKGROUND
Digital media technology is developing rapidly: hardware performance has multiplied, network bandwidth has greatly increased, network speeds have risen, and the number of mobile devices has grown geometrically, providing an opportunity for video applications to develop. Video applications are evolving rapidly from single-view, low-resolution, low-bit-rate toward multi-view, high-resolution, high-bit-rate forms, so as to provide users with new video content types and video presentation characteristics, as well as a better sense of presence and viewing experience.
As a brand-new type of video content, 360-degree panoramic video (hereinafter referred to as panoramic video) allows users to freely choose a viewing angle according to their subjective needs, thereby achieving all-round 360-degree viewing. Although current network performance and hardware processing performance are high, given the dramatic increase in the number of users and the huge amount of panoramic video data, it is still necessary to reduce network and hardware resource occupation while guaranteeing the user's viewing experience.
At present, Region of Interest (hereinafter ROI) technology can crop and display panoramic video according to user preferences, so the entire panoramic video need not be processed. However, in the related art there is generally only one ROI, which can present only a limited part of the images in the panoramic video and cannot satisfy users' need to watch multiple ROIs. Therefore, how to implement encoding when multiple ROIs exist, so as to indicate the composite display of each ROI, urgently needs to be solved.
SUMMARY
The image processing method, apparatus, system, network device, terminal, and storage medium provided by the embodiments of the present invention mainly solve the technical problem of how to implement encoding when multiple ROIs exist.
To solve the above technical problem, an embodiment of the present invention provides an image processing method, including:
acquiring composition indication information used to indicate a composite display manner between regions of interest in a video image;
generating a media stream of the video image based on the composition indication information.
An embodiment of the present invention further provides an image processing method, including:
receiving a video stream of a video image and description data;
parsing the description data to obtain composition indication information of regions of interest;
controlling, according to the composition indication information, the composite playback display of the region-of-interest images in the video stream.
An embodiment of the present invention further provides an image processing method, including:
a network side acquiring composition indication information for indicating a composite display manner of each region of interest in a video image, generating a media stream of the video image based on the composition indication information, and sending the media stream to a target node;
the target node receiving the media stream, parsing the media stream to obtain the composition indication information of the regions of interest, and controlling the playback display of the video stream in the media stream according to the composition indication information.
An embodiment of the present invention further provides an image processing apparatus, including:
an acquisition module, configured to acquire composition indication information for indicating a composite display manner of each region of interest in a video image;
a processing module, configured to generate a media stream of the video image based on the composition indication information.
An embodiment of the present invention further provides an image processing apparatus, including:
a receiving module, configured to receive a video stream of a video image and description data;
a parsing module, configured to parse the description data to obtain composition indication information of regions of interest;
a control module, configured to control, according to the composition indication information, the composite playback display of the region-of-interest images in the video stream.
An embodiment of the present invention further provides an image processing system, including the two image processing apparatuses described above.
An embodiment of the present invention further provides a network device, including a first processor, a first memory, and a first communication bus;
the first communication bus is configured to implement connection and communication between the first processor and the first memory;
the first processor is configured to execute one or more computer programs stored in the first memory to implement the steps of the image processing method according to any one of the above.
An embodiment of the present invention further provides a terminal, including a second processor, a second memory, and a second communication bus;
the second communication bus is configured to implement connection and communication between the second processor and the second memory;
the second processor is configured to execute one or more computer programs stored in the second memory to implement the steps of the image processing method described above.
An embodiment of the present invention further provides a storage medium storing one or more programs, where the one or more programs can be executed by one or more processors to implement the steps of the image processing method described above.
Other features of the present invention and the corresponding beneficial effects are set forth in later parts of the specification, and it should be understood that at least some of the beneficial effects become apparent from the description in this specification.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a schematic flowchart of an image processing method according to Embodiment 1 of the present invention;
FIG. 2 is a first schematic diagram of ROI image stitching indication according to Embodiment 1 of the present invention;
FIG. 3 is a second schematic diagram of ROI image stitching indication according to Embodiment 1 of the present invention;
FIG. 4 is a third schematic diagram of ROI image stitching indication according to Embodiment 1 of the present invention;
FIG. 5 is a fourth schematic diagram of ROI image stitching indication according to Embodiment 1 of the present invention;
FIG. 6 is a fifth schematic diagram of ROI image stitching indication according to Embodiment 1 of the present invention;
FIG. 7 is a first schematic diagram of ROI image fusion indication according to Embodiment 1 of the present invention;
FIG. 8 is a second schematic diagram of ROI image fusion indication according to Embodiment 1 of the present invention;
FIG. 9 is a schematic diagram of ROI image overlapping regions according to Embodiment 1 of the present invention;
FIG. 10 is a schematic diagram of ROI image nesting indication according to Embodiment 1 of the present invention;
FIG. 11 is a schematic diagram of ROI image transparent-channel processing according to Embodiment 1 of the present invention;
FIG. 12 is a schematic diagram of ROI image coordinate positions according to Embodiment 1 of the present invention;
FIG. 13 is a first schematic diagram of ROI image video stream generation according to Embodiment 1 of the present invention;
FIG. 14 is a second schematic diagram of ROI image video stream generation according to Embodiment 1 of the present invention;
FIG. 15 is a schematic flowchart of an image processing method according to Embodiment 2 of the present invention;
FIG. 16 is a schematic flowchart of an image processing method according to Embodiment 3 of the present invention;
FIG. 17 is a schematic structural diagram of an image processing apparatus according to Embodiment 4 of the present invention;
FIG. 18 is a schematic structural diagram of an image processing apparatus according to Embodiment 5 of the present invention;
FIG. 19 is a schematic structural diagram of an image processing system according to Embodiment 6 of the present invention;
FIG. 20 is a schematic structural diagram of a network device according to Embodiment 7 of the present invention;
FIG. 21 is a schematic structural diagram of a terminal according to Embodiment 8 of the present invention.
具体实施方式
为了使本发明的目的、技术方案及优点更加清楚明白,下面通过具体实施方式结合附图对本发明实施例作进一步详细说明。应当理解,此处所描述的具体实施例仅仅用以解释本发明,并不用于限定本发明。
实施例一:
为了实现视频图像中存在多个ROI时如何进行编码,以满足用户同时对多个ROI进行观看的需求,本发明实施例提供一种图像处理方法,主要应用 于网络侧设备、编码器等,包括但不限于服务器、基站等设备,参见图1,包括如下步骤:
S101、获取用于指示视频图像中感兴趣区域之间的合成显示方式的合成指示信息。
在对视频图像进行编码时,获取合成指示信息,用于指示视频图像中ROI之间的合成显示方式。应当理解,当该视频图像不存在或者未划分ROI时,则不存在获取该合成指示信息的过程。只有当存在多个ROI时,才获取相应的合成指示信息。在只存在一个ROI的情形下,也可以采用本方案,控制该一个ROI的显示。
可选的,在编码过程中,可以先确定视频图像中存在ROI时,再获取相应的合成指示信息,用于指示该ROI的合成显示方式。
对于ROI,包括但不限于通过如下方式设定:
1、可以预先通过图像处理、ROI识别等技术对视频图像进行分析,再通过分析结果,划分出全景视频中的特定内容或特定空间位置,从而形成不同的ROI。例如,足球比赛过程中,使用一个相机单独对球的运动轨迹进行追踪拍摄,将其作为ROI;或者通过ROI识别技术对拍摄的视频图像中特定目标(例如某个球员)进行识别追踪,形成ROI。
2、根据用户需求或者预先设定信息,对视频图像进行手动划分特定内容或者特定空间位置,从而形成不同的感兴趣区域。
3、在视频图像播放过程中,收集用户感兴趣区域信息,根据这些信息自动划分出全景视频中的特定内容或特定空间位置,从而形成不同的感兴趣区域。
4、用户观看视频图像的过程中自行选定感兴趣区域。
S102、基于所述合成指示信息生成所述视频图像的媒体流。
基于获取到的合成指示信息生成该视频图像的媒体流。也即,将该合成指示信息进行了编码,写入到了该视频图像的码流中,从而生成该视频图像的媒体流。对于播放设备而言,可以解码该媒体流,至少可以对该视频图像中各ROI进行合成并显示播放。
对于合成指示信息,包括如下至少一种指示信息:
用于指示将感兴趣区域进行拼接显示的第一指示信息、用于指示将感兴趣区域进行融合显示的第二指示信息、用于指示将感兴趣区域进行嵌套显示的第三指示信息、用于指示将感兴趣区域进行缩放显示的第四指示信息、用于指示将感兴趣区域进行旋转显示的第五指示信息、用于指示将感兴趣区域进行截取显示的第六指示信息。
其中,第一指示信息用于指示将各ROI进行拼接,所谓拼接,也即两个ROI之间相邻且不重叠。可以参见图2,A、B、C、D区域是视频图像中的四个感兴趣区域,它们的尺寸一致,可按照它们在全景图出现的位置拼接起来。
可选的,如图3所示,A、B、C、D区域可以按照随意位置,或者指定位置拼接起来。
可选的,如图4所示,A、B、C、D区域尺寸可以不一致。
可选的,如图5所示,A、B、C、D区域位置可以随意安排,同时尺寸也不一致。
可选的,如图6所示,A、B、C、D区域拼接后可以形成非矩形的任意形状。
第二指示信息用于指示将各ROI进行融合,融合使两个ROI之间存在部分重叠区域,但并非将其中一个ROI完全叠加到另一个ROI上。可以参见图7,A、B、C、D区域是视频图像中的四个感兴趣区域,它们以特定范围的区域重叠融合在一起。
可选的,如图8所示,四个感兴趣区域合成显示方式可以是按照固定覆盖顺序进行像素的直接覆盖,叠加的顺序是A→B→C→D,因此,最后覆盖的D是没有被其他三个ROI覆盖的。
可选的,对于融合所产生的重叠部分区域,其像素值可以采用如下方式处理:如图9所示,四个ROI不同区域的重叠部分像素计算生成新的像素值。比如,所有像素均值,或者不同区域像素设置不同的权值,或者根据特征匹配方法计算得到新的像素值,获得自然的图像融合效果。其中,特征匹配方法计算新的像素值,通常应用于视频处理能力较强的网络侧设备,以尽量获取最好的融合效果,对于终端侧理论上也适用,只是对于终端性能要求较高。
The third indication information indicates that the ROIs are to be displayed nested; nesting means superimposing one ROI entirely on top of another. Referring to FIG. 10, regions A and B are two regions of interest in the video image, and B is superimposed entirely on top of A so that the two are nested. The nesting position can be set as actually needed; for example, the ROI with the smaller picture may be superimposed on the ROI with the larger picture according to picture size, or the position may be customized by the user.
The fourth indication information indicates that the ROIs are to be scaled, scaling being a change of image size. It includes a scaling factor; for example, a scaling factor of 2 may indicate that the diagonal length of the ROI is enlarged to twice its original value.
The fifth indication information indicates that the ROIs are to be rotated, and includes the rotation type and the rotation angle, where the rotation type includes, but is not limited to, horizontal rotation and vertical rotation.
The sixth indication information indicates that the ROIs are to be displayed cropped. Referring to FIG. 11, regions A and B are two regions of interest in the video image; a circular area is cropped out of region B, which can be realized with an alpha transparency channel. Optionally, the cropped B can also be nested with A to compose the image, as sketched below.
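Purely as an illustration of this alpha-channel crop-and-nest step, here is a minimal sketch assuming NumPy arrays; the circular mask standing in for the alpha channel is a hypothetical example, not part of the described bitstream syntax:

```python
import numpy as np

def circular_mask(height: int, width: int) -> np.ndarray:
    """Build a circular alpha mask in [0, 1]; 1 keeps a pixel, 0 crops it."""
    yy, xx = np.ogrid[:height, :width]
    cy, cx = height / 2.0, width / 2.0
    radius = min(cy, cx)
    return ((yy - cy) ** 2 + (xx - cx) ** 2 <= radius ** 2).astype(np.float32)

def crop_and_nest(base: np.ndarray, overlay: np.ndarray,
                  mask: np.ndarray, top: int, left: int) -> np.ndarray:
    """Crop `overlay` with the alpha `mask` and nest it onto `base`."""
    h, w = overlay.shape[:2]
    region = base[top:top + h, left:left + w].astype(np.float32)
    alpha = mask[..., None]  # broadcast the mask over the colour channels
    composed = alpha * overlay.astype(np.float32) + (1.0 - alpha) * region
    base[top:top + h, left:left + w] = composed.astype(base.dtype)
    return base
```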
In practical applications, several of the above six kinds of indication information can usually be combined to compose the corresponding ROIs, so as to better satisfy the user's viewing needs for multiple ROIs.
In this embodiment, the video image may be encoded with the H.264/AVC standard or the H.265/HEVC (High Efficiency Video Coding) standard. During encoding, the acquired composition indication information is written into the bitstream of the video image.
In other examples of the present invention, feature information of the corresponding ROIs in the video image may also be acquired, and the media stream of the video image is generated based on both the acquired composition indication information and this feature information; that is, the composition indication information and the feature information are written into the bitstream of the video image together.
The generated media stream includes at least the following two parts: description data and a video stream. In this embodiment, the acquired composition indication information and feature information are written into the description data. It should be noted that the description data is mainly used to direct the decoding of the video stream so that the video image can be played. The description data may contain at least one of the following kinds of information: for example, time synchronization information, text information, and other associated information.
It should also be noted that the description data, as part of the video image, optionally exists in two forms: it may be encoded together with the video stream in bitstream form, that is, as part of the data within the video stream; or it may be encoded separately from the video stream and kept apart from it.
The ROI feature information includes position information and/or encoding quality indication information, where the position information includes coordinate information of a specific position of the ROI together with the length value and width value of the ROI. The specific position may be any one of the four corners of the ROI region, for example the top-left pixel or the bottom-right pixel, or it may be the position of the center point of the ROI region. The encoding quality indication information may be the encoding quality level used in the encoding process: different encoding quality indications represent different encoding quality levels, and encoding at different quality levels produces different picture quality. For example, the encoding quality indication may be "1", "2", "3", "4", "5" or "6", with different values representing different encoding quality levels; when the indication is "1", low-quality encoding is used, while "2" indicates medium-quality encoding that is better than "1", and the encoding quality increases as the value grows.
In other examples of the present invention, the ROI position information may also be represented as follows: referring to FIG. 12, the top edge of ROI region 121 lies in row 300 of the video image, the bottom edge in row 600, the left edge in column 500 and the right edge in column 800; that is, the position information is identified by the rows and columns occupied by the ROI region. For a 1920*1080 image region, the top-left pixel is at position (0,0) and the bottom-right pixel at (1919,1079). For two- or three-dimensional image regions, a Cartesian coordinate system may be used, as may other non-Cartesian curvilinear coordinate systems, for example cylindrical, spherical or polar systems.
It should be understood that, taking FIG. 12 above as an example, the length of the top and bottom edges, that is, the distance between the left and right edges, can serve as the length value of the ROI, namely 800-500=300 pixels, while 600-300=300 pixels can serve as its width value; the converse assignment is equally possible.
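As a small worked illustration of this representation (the function and variable names here are ours, not part of the bitstream syntax), the edge rows and columns of FIG. 12 convert to a corner position plus length and width values as follows:

```python
def edges_to_rect(top_row: int, bottom_row: int,
                  left_col: int, right_col: int):
    """Convert edge rows/columns to (x, y, length, width)."""
    x, y = left_col, top_row          # top-left corner of the ROI
    length = right_col - left_col     # 800 - 500 = 300 pixels
    width = bottom_row - top_row      # 600 - 300 = 300 pixels
    return x, y, length, width

print(edges_to_rect(300, 600, 500, 800))  # (500, 300, 300, 300)
```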
The composition indication information and feature information of the ROIs can be organized as shown in Table 1 below (a schematic sketch of these fields follows the field descriptions):
Table 1
[Table 1 is reproduced in the original publication as image PCTCN2019097355-appb-000001; its fields are described below.]
table_id: identifier of the table;
version: version information;
length: length information;
roi_num: number of regions of interest contained;
(roi_position_x, roi_position_y, roi_position_z): coordinate information of the region of interest in the video image;
roi_width: width of the region of interest;
roi_height: height of the region of interest;
roi_quality: quality information of the region of interest;
relation_type: composition indication information of the region of interest, where 0 denotes stitching, 1 denotes nesting, and 2 denotes fusion;
(roi_new_position_x, roi_new_position_y, roi_new_position_z): coordinate information of the region of interest in the new image;
scale: scaling factor of the region of interest;
rotation: rotation angle of the region of interest;
flip: flipping of the region of interest, where 0 denotes horizontal flipping and 1 denotes vertical flipping;
alpha_flag: alpha channel flag, where 0 denotes that no alpha channel information exists and 1 denotes that alpha channel information exists;
alpha_info(): alpha channel information, combined with the region of interest (cropping) to produce a new image;
filter_info(): when relation_type is the fusion mode, this may specify the filtering manner of the fused region, for example mean or median filtering;
user_data(): user information.
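Since Table 1 itself survives only as an image, the following is a hedged sketch of how the fields just listed might be grouped in practice; the grouping into Python dataclasses is our assumption, and only the field names are taken from the text:

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

@dataclass
class RoiEntry:
    roi_position: Tuple[int, int, int]      # (roi_position_x, _y, _z) in the video image
    roi_width: int
    roi_height: int
    roi_quality: int                        # encoding quality indication
    relation_type: int                      # 0 = stitching, 1 = nesting, 2 = fusion
    roi_new_position: Tuple[int, int, int]  # coordinates in the composed image
    scale: float = 1.0
    rotation: int = 0                       # rotation angle
    flip: Optional[int] = None              # 0 = horizontal, 1 = vertical
    alpha_info: Optional[bytes] = None      # present only when alpha_flag == 1
    filter_info: Optional[str] = None       # e.g. "mean" or "median" when fusing
    user_data: Optional[bytes] = None

@dataclass
class RoiInfoTable:
    table_id: int
    version: int
    rois: List[RoiEntry] = field(default_factory=list)  # roi_num == len(rois)
```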
The roi_info_table containing the above ROI composition indication information and feature information is written into the description data of the video image. The description data optionally includes at least one of the following: Supplemental Enhancement Information (SEI), Video Usability Information (VUI), and a system-layer media attribute description unit.
A specific example of writing roi_info_table into the supplemental enhancement information of the video bitstream is the structure shown in Table 2 below.
Table 2
[Table 2 is reproduced in the original publication as image PCTCN2019097355-appb-000002.]
roi_info_table contains the relevant information of the corresponding ROIs (composition indication information, feature information, and so on). After it is written into the supplemental enhancement information, the information identified as ROI_INFO can be obtained from the SEI message; this amounts to using ROI_INFO as the identification information of the SEI message.
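To make the idea concrete, here is a schematic sketch of tagging a serialized roi_info_table with such an identifier; the ROI_INFO value, the length field and the byte layout are illustrative assumptions and deliberately do not reproduce the actual SEI syntax of H.264/AVC or H.265/HEVC:

```python
ROI_INFO = 0xB0  # hypothetical identifier value for the ROI_INFO message

def wrap_as_sei_like_payload(roi_table_bytes: bytes) -> bytes:
    """Prefix a serialized roi_info_table with an identifier and a length,
    mimicking (not reproducing) an SEI-style message layout."""
    header = bytes([ROI_INFO]) + len(roi_table_bytes).to_bytes(2, "big")
    return header + roi_table_bytes

def find_roi_info(payload: bytes) -> bytes:
    """Recover the roi_info_table bytes from such a payload."""
    assert payload[0] == ROI_INFO, "not an ROI_INFO message"
    size = int.from_bytes(payload[1:3], "big")
    return payload[3:3 + size]
```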
A specific example of writing roi_info_table into the video usability information can be seen in the structure shown in Table 3 below.
Table 3
[Table 3 is reproduced in the original publication as image PCTCN2019097355-appb-000003.]
In Table 3, a roi_info_flag value equal to 1 indicates that ROI information follows. roi_info_table() is the roi_info_table data structure of Table 1 above, containing the ROI-related information. The region-of-interest information identified by roi_info_flag equal to 1 can be obtained from the VUI.
roi_info_table can also be written into a system-layer media attribute description unit, where the system-layer media attribute description unit includes, but is not limited to, a descriptor of a transport stream, a data unit of a file format (for example, in a Box), and media description information of a transport stream (for example, an information unit such as a Media Presentation Description (MPD)).
Writing the ROI composition indication information and feature information into the SEI can further be combined with temporal Motion-Constrained Tile Sets (MCTS); optionally, the ROI-related information is combined with the temporal motion-constrained tile sets of the H.265/HEVC standard. Tightly coupling the ROI composition indication information with tiles makes it possible to extract the required tile data flexibly without separately encoding and decoding ROI data, which satisfies different user needs and facilitates interaction with users in applications, as shown in Table 4 below.
Table 4
[Table 4 is reproduced in the original publication as image PCTCN2019097355-appb-000004.]
Here, roi_info_flag: 0 indicates that no region-of-interest information exists, and 1 indicates that region-of-interest information exists.
An instance of roi_info is shown in Table 5 below.
Table 5
[Table 5 is reproduced in the original publication as image PCTCN2019097355-appb-000005; its fields are described below.]
length: length information;
roi_num: number of regions of interest contained;
(roi_pos_x, roi_pos_y): coordinate information of the region of interest within the slice group or within the tile;
roi_width: width of the region of interest;
roi_height: height of the region of interest;
roi_quality: quality information of the region of interest;
relation_type: association relationship of the regions of interest, where 0 denotes stitching, 1 denotes nesting, and 2 denotes fusion;
(roi_new_pos_x, roi_new_pos_y): coordinate information of the region of interest in the new image;
scale: scaling factor of the region of interest;
rotation: rotation angle of the region of interest;
flip: flipping of the region of interest, where 0 denotes horizontal flipping and 1 denotes vertical flipping;
alpha_flag: alpha channel flag, where 0 denotes that no alpha channel information exists and 1 denotes that alpha channel information exists;
alpha_info(): alpha channel information, which can be combined with the region of interest to generate a new image;
filter_info(): when relation_type is the fusion mode, this may specify the filtering manner of the fused region, for example mean or median filtering.
The video stream contained in the media stream carries the video image data. The process of generating this video stream includes: acquiring the regions of interest of the video image, and dividing the associated images of the regions of interest within a same image frame into at least one slice unit for independent encoding, so as to generate a first video stream of the video image.
Referring to FIG. 13, the first frame of the video image is acquired and the associated images of the ROIs in that frame are determined. Suppose the video image has two ROIs, ROI131 and ROI132, and the first frame contains associated image A1 of ROI131 and associated image B1 of ROI132. Associated image A1 is then divided into at least one slice unit for independent encoding, and associated image B1 is likewise divided into at least one slice unit for independent encoding; alternatively, A1 and B1 together are divided into at least one slice unit for independent encoding. Steps similar to those for the first frame are performed, serially or in parallel, on all other frames of the video image until every image frame of the video image has been encoded, generating the first video stream.
For example, associated image A1 is divided into one slice unit a11 for independent encoding, and associated image B1 is divided into two slice units, b11 and b12, for independent encoding.
The other region 133 of the video image outside the ROI-associated images can be encoded with any existing encoding method, either independently or non-independently. The first video stream finally generated includes at least all independently encoded slice units of each ROI. On the receiving side, when the user only needs to watch the ROI images, only the slice units corresponding to the ROIs in the first video stream need to be extracted (rather than all slice units), and these slice units can be decoded independently, without depending on other slices, which lowers the decoding performance requirements on the receiver. A sketch of this extraction follows.
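The following is a minimal sketch of that extraction, under the simplifying assumption that the first video stream is modeled as a list of slice-unit records tagged with the ROI they belong to (the record layout is ours, not the standard's):

```python
from typing import Any, Dict, Iterable, List

def extract_roi_slices(slice_units: Iterable[Dict[str, Any]],
                       wanted_roi_ids: set) -> List[Dict[str, Any]]:
    """Keep only the independently coded slice units of the requested ROIs.

    Because each ROI was encoded in self-contained slice units, the
    selected units can be decoded without any of the discarded ones.
    """
    return [unit for unit in slice_units if unit["roi_id"] in wanted_roi_ids]

# Usage sketch: keep only the slice unit of ROI131 (a11) from FIG. 13.
stream = [{"roi_id": 131, "name": "a11", "payload": b"..."},
          {"roi_id": 132, "name": "b11", "payload": b"..."},
          {"roi_id": 132, "name": "b12", "payload": b"..."}]
print([u["name"] for u in extract_roi_slices(stream, {131})])  # ['a11']
```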
As needed, it is also possible to encode only the ROI-associated images in the video image and leave the other regions outside the associated images unencoded, or to encode the associated images separately from the other regions.
Slice units include the Slice of the H.264/AVC standard, the Tile of the H.265/HEVC standard, and the like.
The video stream in the media stream may also be a second video stream, generated as follows: the associated images are composed according to the composition indication information into a single to-be-processed image frame, and the to-be-processed image frame is divided into at least one slice unit for encoding, so as to generate the second video stream of the regions of interest.
Referring to FIG. 14, the second video stream differs from the first in that the associated images (C1 and D1) of the ROIs in a same image frame of the video image (ROI141 and ROI142 in FIG. 14) are first composed according to the composition indication information, here assumed to be stitching; the composed image is then treated as one to-be-processed image frame E1, and frame E1 is divided into at least one slice unit (for example e11) for encoding. The encoding here may be independent encoding, non-independent encoding, or another encoding method. The other image frames of the video image are processed in the same way, in parallel or serially, thereby generating the second video stream.
For the decoding side, the second video stream can be processed with common decoding methods, and decoding directly yields the composed ROI image without any further merging of the ROI-associated images. This encoding approach helps reduce the processing load on the decoder and improves decoding efficiency, but it requires the composition to be performed at encoding time.
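As an illustration of this compose-then-encode pipeline, here is a minimal sketch that stitches the ROI crops side by side into frame E1 and hands it to an encoder; side-by-side stitching is assumed for simplicity, and encode_frame stands in for a real H.264/H.265 encoder entry point:

```python
import numpy as np

def compose_then_encode(roi_images, encode_frame):
    """Stitch ROI crops into one to-be-processed frame and encode it whole."""
    height = max(img.shape[0] for img in roi_images)
    # Pad shorter crops at the bottom so all crops share one height.
    padded = [np.pad(img, ((0, height - img.shape[0]), (0, 0), (0, 0)))
              for img in roi_images]
    frame = np.concatenate(padded, axis=1)  # simple horizontal stitch (E1)
    return encode_frame(frame)

# Usage sketch with a stand-in encoder:
c1 = np.zeros((240, 320, 3), dtype=np.uint8)
d1 = np.zeros((200, 160, 3), dtype=np.uint8)
encoded = compose_then_encode([c1, d1], lambda f: f.tobytes())
```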
In other examples of the present invention, the network side or the encoding side may generate both of the above video streams for the same video image.
The generated media stream can be stored, or it can be sent to a corresponding target node. For example, on receiving an acquisition request for the video image from the target node, the sending of the media stream to that node is triggered. Optionally, the identification information of the content indicated by the acquisition request is parsed, and the media stream is sent to the target node according to this identification information.
Optionally, when the identification information is a first identifier, the first video stream and the description data are sent to the target node. For example, the server receives a terminal's request for the video image and, according to that request, sends the media stream of the video image (including the first video stream and the description data) to the terminal. The terminal can decode the media stream to play the video image in full; alternatively, the terminal can decode the media stream, extract the independently encodable slice unit data in which the regions of interest lie and, in combination with the description data, play and display the region-of-interest images.
When the identification information is a second identifier, the slice units of the regions of interest in the first video stream are extracted (without any decoding operation) together with the description data and sent to the target node. For example, the server receives a terminal's request for regions of interest, locates the independently encodable slice unit data of the corresponding regions of interest according to the request information, extracts it, adds the related information of the regions of interest (composition indication information, feature information and so on) or modified region-of-interest information, and generates a new bitstream to send to the terminal. This avoids sending the entire bitstream to the terminal and reduces network bandwidth usage and transmission delay.
When the identification information is a third identifier, the second video stream and the description data are sent to the target node. For example, the server may also, according to the request sent by the terminal, choose to send the second video stream and the description data of the video image to the terminal; after decoding, the terminal directly obtains the already composed ROI image and no longer needs to compose the ROIs according to the composition indication information in the description data, which helps reduce terminal resource usage and improve terminal processing efficiency.
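Condensing the three cases, a server-side dispatch might look like the following sketch; the identifier values, record shapes and helper names are illustrative assumptions rather than anything mandated by the text:

```python
FIRST_ID, SECOND_ID, THIRD_ID = 1, 2, 3  # hypothetical identifier values

def serve_request(identifier, first_stream, second_stream, description_data):
    """Choose what to send to the target node based on the identifier
    carried in the acquisition request."""
    if identifier == FIRST_ID:
        # Full first video stream plus description data.
        return first_stream, description_data
    if identifier == SECOND_ID:
        # Only the independently decodable ROI slice units, no decoding done.
        roi_slices = [unit for unit in first_stream if unit.get("is_roi")]
        return roi_slices, description_data
    if identifier == THIRD_ID:
        # Pre-composed second video stream; the terminal decodes it directly.
        return second_stream, description_data
    raise ValueError("unknown content identifier")
```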
In other examples of the present invention, the video image may be a 360-degree panoramic video, a stereoscopic video, or the like. When the video image is a stereoscopic video, the ROI-related information (including the composition indication information, the feature information and so on) can apply to the left and right fields of view simultaneously.
The image processing method provided by this embodiment of the present invention writes the composition indication information into the bitstream of the video image to indicate the composite display of the ROI images in the video image, thereby realizing the encoding process when multiple ROIs exist in a video image and satisfying the user's need to watch multiple ROI images at the same time.
By encoding the ROI images independently, the decoding side can decode them independently without relying on other slices. When the media stream is sent, the independently decodable slice unit data in which the ROIs lie can be extracted and sent to the terminal instead of all slice data, which helps reduce network bandwidth usage and improve transmission efficiency as well as decoding efficiency.
Embodiment 2:
Building on Embodiment 1, this embodiment of the present invention provides an image processing method applied mainly to terminals and decoders, including but not limited to mobile phones and personal computers. Referring to FIG. 15, the image processing method includes the following steps:
S151: Receive a video stream of a video image together with description data.
S152: Parse the description data to obtain the composition indication information of the regions of interest.
Depending on the type of the description data, that is, on where the ROI-related information has been placed (SEI, VUI, MPD, and so on), the composition indication information of the regions of interest is extracted. The composition indication information is described in Embodiment 1 and is not repeated here. Optionally, the feature information of the ROIs, including position information, encoding quality indication information and the like, can also be obtained from the description data.
According to the above ROI-related information, the ROI image data, that is, the video stream data, is obtained.
S153: Control, according to the composition indication information, the composite playback display of the region-of-interest images in the video stream.
According to the composition indication information, the ROI images are composed and then played and displayed.
In other examples of the present invention, before the video stream and the description data of the video image are received, an acquisition request is also sent to the network side (or the encoding side); identification information indicating the content to be acquired can be set in the acquisition request, so that different video streams are obtained.
For example, setting the identification information to a first identifier can indicate acquisition of the first video stream and the description data of the corresponding video image; setting it to a second identifier can indicate acquisition of the slice units of the regions of interest in the first video stream of the corresponding video image together with the description data; and setting it to a third identifier can indicate acquisition of the second video stream and the description data of the corresponding video image. A sketch of building such a request follows.
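For illustration only, a terminal might assemble such an acquisition request as in the following sketch; every field name here is our assumption, chosen to mirror the identifiers above and the additional request fields mentioned at the end of this embodiment:

```python
def build_acquisition_request(video_id: str, identifier: int,
                              source_addr: str, destination_addr: str) -> dict:
    """Assemble an acquisition request; identifier 1 selects the first
    video stream, 2 only the ROI slice units, 3 the second video stream."""
    if identifier not in (1, 2, 3):
        raise ValueError("identifier must be 1, 2 or 3")
    return {
        "video_id": video_id,              # identifies the requested video image
        "content_identifier": identifier,  # first/second/third identifier
        "source": source_addr,             # local address information
        "destination": destination_addr,   # peer address information
    }
```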
When the acquisition requests differ, the media streams received from the network side differ, and the subsequent processing differs accordingly. For example, when the identification information in the acquisition request is the first identifier, the first video stream and the description data of the corresponding video image are obtained; in this case the first video stream and the description data can be decoded to obtain the complete image of the video image, which can then be played. Alternatively, the independently encodable slice unit data of the ROI images in the first video stream is extracted, and the ROI images are composed according to the ROI composition indication information in the description data before being played and displayed.
When the identification information in the acquisition request is the second identifier, the independently decodable slice units in which the ROIs of the corresponding video image lie are obtained together with the description data; in this case the terminal can directly decode these independently decodable ROI slice units, compose the ROI images according to the composition indication information in the description data, and then play and display them.
When the identification information in the acquisition request is the third identifier, the second video stream and the description data of the corresponding video image are obtained; in this case the terminal can directly decode it in a conventional manner, obtain the composed ROI image, and then play and display it.
It should be understood that the acquisition request is not limited to containing the identification information indicating the content to be acquired; it should also include other necessary information, for example the address information of the local end and the peer end, identification information of the requested video image, and verification information.
Embodiment 3:
Building on Embodiment 1 and/or Embodiment 2, this embodiment of the present invention provides an image processing method applied mainly to a system comprising a network side and a terminal side. Referring to FIG. 16, the image processing method mainly includes the following steps:
S161: The network side acquires composition indication information used to indicate the manner of composite display of the regions of interest in a video image.
S162: The network side generates a media stream of the video image based on the composition indication information.
S163: The network side sends the media stream to a target node.
S164: The target node receives the media stream.
S165: The target node parses the media stream to obtain the composition indication information of the regions of interest.
S166: The target node controls the playback display of the video stream in the media stream according to the composition indication information.
For details, please refer to the related descriptions in Embodiment 1 and/or Embodiment 2, which are not repeated here.
It should be understood that the media stream generated by the network side and the media stream sent by the network side to the target node may be identical or may differ. As described in Embodiment 1 and/or Embodiment 2, the network side can flexibly choose the video stream to send to the target node according to the target node's acquisition request, rather than sending one fixed video stream. Therefore, for ease of distinction, the media stream generated by the network side may be regarded as a first media stream, and the media stream sent to the target node as a second media stream.
Embodiment 4:
Building on Embodiment 1, this embodiment of the present invention provides an image processing apparatus for implementing the steps of the image processing method described in Embodiment 1. Referring to FIG. 17, the image processing apparatus includes:
an acquisition module 171, configured to acquire composition indication information used to indicate the manner of composite display of the regions of interest in a video image; and a processing module 172, configured to generate a media stream of the video image based on the composition indication information. For the specific steps of the image processing method, please refer to the description in Embodiment 1, which is not repeated here.
Embodiment 5:
Building on Embodiment 2, this embodiment of the present invention provides an image processing apparatus for implementing the steps of the image processing method described in Embodiment 2. Referring to FIG. 18, the image processing apparatus includes:
a receiving module 181, configured to receive a video stream of a video image together with description data;
a parsing module 182, configured to parse the description data to obtain the composition indication information of the regions of interest;
a control module 183, configured to control, according to the composition indication information, the playback display of the region-of-interest images in the video stream.
For the specific steps of the image processing method, please refer to the description in Embodiment 2, which is not repeated here.
Embodiment 6:
Building on Embodiment 3, this embodiment of the present invention provides an image processing system including the image processing apparatus 191 of Embodiment 4 and the image processing apparatus 192 of Embodiment 5; see FIG. 19. The image processing system is used to implement the image processing method described in Embodiment 3.
For the specific steps of the image processing method, please refer to the description in Embodiment 3, which is not repeated here.
Embodiment 7:
Building on Embodiment 1, this embodiment of the present invention provides a network device. Referring to FIG. 20, it includes a first processor 201, a first memory 202 and a first communication bus 203;
the first communication bus 203 is configured to implement connection communication between the first processor 201 and the first memory 202;
the first processor 201 is configured to execute one or more computer programs stored in the first memory 202 to implement the steps of the image processing method described in Embodiment 1. For details, please refer to the description in Embodiment 1, which is not repeated here.
Embodiment 8:
Building on Embodiment 2, this embodiment of the present invention provides a terminal. Referring to FIG. 21, it includes a second processor 211, a second memory 212 and a second communication bus 213;
the second communication bus 213 is configured to implement connection communication between the second processor 211 and the second memory 212;
the second processor 211 is configured to execute one or more computer programs stored in the second memory 212 to implement the steps of the image processing method described in Embodiment 2. For details, please refer to the description in Embodiment 2, which is not repeated here.
Embodiment 9:
Building on Embodiments 1 and 2, this embodiment of the present invention provides a storage medium, which may be a computer-readable storage medium. The storage medium stores one or more computer programs, and the one or more computer programs are executable by one or more processors to implement the steps of the image processing method described in Embodiment 1 or Embodiment 2.
For details, please refer to the descriptions in Embodiments 1 and 2, which are not repeated here.
The storage medium includes volatile or non-volatile, removable or non-removable media implemented in any method or technology for the storage of information (such as computer-readable instructions, data structures, computer program modules or other data). Storage media include, but are not limited to, RAM (Random Access Memory), ROM (Read-Only Memory), EEPROM (Electrically Erasable Programmable Read-Only Memory), flash memory or other memory technology, CD-ROM (Compact Disc Read-Only Memory), digital versatile discs (DVD) or other optical disc storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and that can be accessed by a computer.
This embodiment further provides a computer program (also called computer software), which can be distributed on a computer-readable medium and executed by a computing device to implement at least one step of the image processing method of Embodiment 1 and/or Embodiment 2 above; in some cases, at least one of the steps shown or described may be performed in an order different from that described in the above embodiments.
This embodiment further provides a computer program product including a computer-readable device on which the computer program described above is stored. In this embodiment, the computer-readable device may include the computer-readable storage medium described above.
The beneficial effects of the present invention are as follows:
According to the image processing method, apparatus, system, network device, terminal and storage medium provided by the embodiments of the present invention, composition indication information used to indicate the manner of composite display between regions of interest in a video image is acquired, and a media stream of the video image is generated based on the composition indication information; that is, the composition indication information is written into the bitstream of the video image, realizing the encoding process of a video image when multiple (at least two) ROIs exist. During video playback, the composite display and playback of the ROIs can be controlled based on the composition indication information, which satisfies the user's need to watch multiple ROIs at the same time. In certain implementations, technical effects including, but not limited to, the above can be achieved.
Those skilled in the art will appreciate that all or some of the steps of the methods disclosed above, and the functional modules/units in the systems and apparatuses disclosed above, can be implemented as software (which can be realized with computer program code executable by a computing device), firmware, hardware, or appropriate combinations thereof. In a hardware implementation, the division between the functional modules/units mentioned in the above description does not necessarily correspond to the division of physical components; for example, one physical component may have several functions, or one function or step may be performed by several physical components in cooperation. Some or all of the physical components may be implemented as software executed by a processor, such as a central processing unit, a digital signal processor or a microprocessor, or as hardware, or as an integrated circuit such as an application-specific integrated circuit.
Furthermore, as is well known to those of ordinary skill in the art, communication media typically contain computer-readable instructions, data structures, computer program modules, or other data in a modulated data signal such as a carrier wave or other transport mechanism, and may include any information delivery media. Therefore, the present invention is not limited to any specific combination of hardware and software.
The above content is a further detailed description of the embodiments of the present invention in conjunction with specific implementations, and the specific implementation of the present invention shall not be considered to be limited to these descriptions. For those of ordinary skill in the art to which the present invention belongs, several simple deductions or substitutions may be made without departing from the concept of the present invention, and all of them shall be deemed to fall within the protection scope of the present invention.

Claims (21)

  1. An image processing method, comprising:
    acquiring composition indication information used to indicate a manner of composite display between regions of interest in a video image;
    generating a media stream of the video image based on the composition indication information.
  2. The image processing method according to claim 1, wherein the image processing method further comprises: acquiring feature information of each of the regions of interest;
    and generating the media stream of the video image based on the composition indication information comprises: generating the media stream of the video image based on the composition indication information and the feature information.
  3. The image processing method according to claim 1, wherein the composition indication information comprises at least one of the following: first indication information for indicating that the regions of interest are to be displayed stitched together, second indication information for indicating that the regions of interest are to be displayed fused, third indication information for indicating that the regions of interest are to be displayed nested, fourth indication information for indicating that the regions of interest are to be displayed scaled, fifth indication information for indicating that the regions of interest are to be displayed rotated, and sixth indication information for indicating that the regions of interest are to be displayed cropped.
  4. The image processing method according to claim 2, wherein the feature information comprises position information and/or encoding quality indication information; and the position information comprises coordinate information of a specific position of the region of interest, and a length value and a width value of the region of interest.
  5. The image processing method according to claim 2, wherein the media stream comprises description data, and generating the media stream of the video image based on the composition indication information and the feature information comprises:
    writing the composition indication information and the feature information into the description data to generate the media stream.
  6. The image processing method according to claim 5, wherein the description data comprises at least one of the following: supplemental enhancement information, video usability information, and a system-layer media attribute description unit.
  7. The image processing method according to claim 5, wherein the media stream further comprises a video stream, and the image processing method further comprises:
    acquiring the regions of interest of the video image, and dividing associated images of the regions of interest within a same image frame into at least one slice unit for independent encoding, to generate a first video stream of the video image.
  8. The image processing method according to claim 7, wherein the image processing method further comprises: composing the associated images according to the composition indication information into one to-be-processed image frame, and dividing the to-be-processed image frame into at least one slice unit for encoding, to generate a second video stream of the regions of interest.
  9. The image processing method according to claim 8, wherein the image processing method further comprises: storing the media stream or sending the media stream to a target node.
  10. The image processing method according to claim 9, wherein before sending the media stream to the target node, the method further comprises: receiving an acquisition request for the video image from the target node.
  11. The image processing method according to claim 10, wherein sending the media stream to the target node comprises: parsing identification information of the content indicated by the acquisition request, and sending the media stream to the target node according to the identification information.
  12. The image processing method according to claim 11, wherein sending the media stream to the target node according to the identification information comprises:
    when the identification information is a first identifier, sending the first video stream and the description data to the target node;
    when the identification information is a second identifier, extracting the slice units of the regions of interest in the first video stream together with the description data, and sending them to the target node;
    when the identification information is a third identifier, sending the second video stream and the description data to the target node.
  13. The image processing method according to any one of claims 1-12, wherein the video image is a panoramic video image.
  14. An image processing method, comprising:
    receiving a video stream of a video image together with description data;
    parsing the description data to obtain composition indication information of regions of interest;
    controlling, according to the composition indication information, composite playback display of region-of-interest images in the video stream.
  15. An image processing method, comprising:
    acquiring, by a network side, composition indication information used to indicate a manner of composite display of regions of interest in a video image, generating a media stream of the video image based on the composition indication information, and sending the media stream to a target node;
    receiving, by the target node, the media stream, parsing the media stream to obtain the composition indication information of the regions of interest, and controlling playback display of a video stream in the media stream according to the composition indication information.
  16. An image processing apparatus, comprising:
    an acquisition module, configured to acquire composition indication information used to indicate a manner of composite display of regions of interest in a video image;
    a processing module, configured to generate a media stream of the video image based on the composition indication information.
  17. An image processing apparatus, comprising:
    a receiving module, configured to receive a video stream of a video image together with description data;
    a parsing module, configured to parse the description data to obtain composition indication information of regions of interest;
    a control module, configured to control, according to the composition indication information, composite playback display of region-of-interest images in the video stream.
  18. An image processing system, comprising the image processing apparatus according to claim 16 and the image processing apparatus according to claim 17.
  19. A network device, comprising a first processor, a first memory and a first communication bus;
    the first communication bus being configured to implement connection communication between the first processor and the first memory;
    the first processor being configured to execute one or more computer programs stored in the first memory to implement the steps of the image processing method according to any one of claims 1 to 13.
  20. A terminal, comprising a second processor, a second memory and a second communication bus;
    the second communication bus being configured to implement connection communication between the second processor and the second memory;
    the second processor being configured to execute one or more computer programs stored in the second memory to implement the steps of the image processing method according to claim 14.
  21. A storage medium, storing one or more computer programs, the one or more computer programs being executable by one or more processors to implement the steps of the image processing method according to any one of claims 1 to 13, or according to claim 14.
PCT/CN2019/097355 2018-09-19 2019-07-23 Image processing method, apparatus and system, network device, terminal and storage medium WO2020057249A1 (zh)

Priority Applications (4)

Application Number Priority Date Filing Date Title
EP19862435.5A EP3855750A4 (en) 2018-09-19 2019-07-23 IMAGE PROCESSING PROCESS, APPARATUS AND SYSTEM, NETWORK DEVICE, TERMINAL AND RECORDING MEDIA
JP2021515166A JP7425788B2 (ja) 2018-09-19 2019-07-23 Image processing method, apparatus, system, network device, terminal and computer program
KR1020217011376A KR102649812B1 (ko) 2018-09-19 2019-07-23 Image processing method, apparatus, system, network device, terminal and storage medium
US17/276,572 US20220053127A1 (en) 2018-09-19 2019-07-23 Image Processing Method, Apparatus and System, Network Device, Terminal and Storage Medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201811095593.9A CN110933461B (zh) 2018-09-19 2018-09-19 Image processing method, apparatus and system, network device, terminal and storage medium
CN201811095593.9 2018-09-19

Publications (1)

Publication Number | Publication Date
WO2020057249A1 | 2020-03-26

Family ID: 69856069

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/097355 WO2020057249A1 (zh) 2018-09-19 2019-07-23 Image processing method, apparatus and system, network device, terminal and storage medium

Country Status (6)

Country | Publication
US (1) US20220053127A1 (zh)
EP (1) EP3855750A4 (zh)
JP (1) JP7425788B2 (zh)
KR (1) KR102649812B1 (zh)
CN (2) CN110933461B (zh)
WO (1) WO2020057249A1 (zh)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113965749A (zh) * 2020-12-14 2022-01-21 深圳市云数链科技有限公司 Static camera video transmission method and system
CN113206853B (zh) * 2021-05-08 2022-07-29 杭州当虹科技股份有限公司 Improved method for saving video correction results
CN113573059B (zh) * 2021-09-23 2022-03-01 中兴通讯股份有限公司 Image display method and apparatus, storage medium and electronic apparatus

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2006352539A (ja) * 2005-06-16 2006-12-28 Sharp Corp Wide-field video system
CN1889686A (zh) * 2006-07-14 2007-01-03 北京时越网络技术有限公司 Method for simultaneously displaying multiple channels of video information
CN101521745A (zh) * 2009-04-14 2009-09-02 王广生 Multi-lens omnidirectional camera device with coincident optical centers, and panoramic imaging and rebroadcasting method
CN102265626A (zh) * 2008-12-22 2011-11-30 韩国电子通信研究院 Method for transmitting data on a stereoscopic image, method for playing back a stereoscopic image, and method for creating a stereoscopic image file
CN105578204A (zh) * 2014-10-14 2016-05-11 青岛海信电器股份有限公司 Method and apparatus for displaying multiple video data
CN106331732A (zh) * 2016-09-26 2017-01-11 北京疯景科技有限公司 Method and apparatus for generating and presenting panoramic content
CN108322727A (zh) * 2018-02-28 2018-07-24 北京搜狐新媒体信息技术有限公司 Panoramic video transmission method and apparatus

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2000165641A (ja) * 1998-11-24 2000-06-16 Matsushita Electric Ind Co Ltd Image processing method, image processing apparatus, and data storage medium
KR101255226B1 (ko) 2005-09-26 2013-04-16 한국과학기술원 Apparatus and method for configuring and reconstructing multiple ROIs in scalable video coding
US9691098B2 (en) * 2006-07-07 2017-06-27 Joseph R. Dollens Method and system for managing and displaying product images with cloud computing
JP5194679B2 (ja) * 2007-09-26 2013-05-08 日産自動車株式会社 Vehicle periphery monitoring apparatus and video display method
US20120140067A1 (en) * 2010-12-07 2012-06-07 Scott Crossen High Definition Imaging Over Legacy Surveillance and Lower Bandwidth Systems
JP5870835B2 (ja) 2012-04-27 2016-03-01 富士通株式会社 Moving image processing apparatus, moving image processing method, and moving image processing program
US9827487B2 (en) * 2012-05-14 2017-11-28 Sphero, Inc. Interactive augmented reality using a self-propelled device
US9497405B2 (en) * 2012-07-17 2016-11-15 Nec Display Solutions, Ltd. Display device for displaying videos side by side without overlapping each other and method for the same
JP6141084B2 (ja) * 2013-04-19 2017-06-07 キヤノン株式会社 Imaging apparatus
CN106664443B (zh) * 2014-06-27 2020-03-24 皇家Kpn公司 Determining a region of interest on the basis of a HEVC-tiled video stream
JP2016048839A (ja) 2014-08-27 2016-04-07 株式会社小糸製作所 Electronic control unit and vehicle video system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP3855750A4 *

Also Published As

Publication number Publication date
JP2022501902A (ja) 2022-01-06
EP3855750A4 (en) 2021-10-20
CN110933461B (zh) 2022-12-30
EP3855750A1 (en) 2021-07-28
JP7425788B2 (ja) 2024-01-31
KR20210059759A (ko) 2021-05-25
KR102649812B1 (ko) 2024-03-21
US20220053127A1 (en) 2022-02-17
CN115883882A (zh) 2023-03-31
CN110933461A (zh) 2020-03-27

Similar Documents

Publication Publication Date Title
US11109013B2 (en) Method of transmitting 360-degree video, method of receiving 360-degree video, device for transmitting 360-degree video, and device for receiving 360-degree video
US11284124B2 (en) Spatially tiled omnidirectional video streaming
CN108476324B (zh) 增强视频流的视频帧中的感兴趣区域的方法、计算机和介质
KR102357137B1 (ko) 이미지 처리 방법, 단말기, 및 서버
CN112204993B (zh) 使用重叠的被分区的分段的自适应全景视频流式传输
JP7399224B2 (ja) メディアコンテンツを送信するための方法、装置及びコンピュータプログラム
US20200112710A1 (en) Method and device for transmitting and receiving 360-degree video on basis of quality
US10757463B2 (en) Information processing apparatus and information processing method
US11694303B2 (en) Method and apparatus for providing 360 stitching workflow and parameter
US20180176650A1 (en) Information processing apparatus and information processing method
WO2020057249A1 (zh) 2020-03-26 Image processing method, apparatus and system, network device, terminal and storage medium
WO2019137313A1 (zh) 一种媒体信息的处理方法及装置
CA3018600C (en) Method, apparatus and stream of formatting an immersive video for legacy and immersive rendering devices
TW201841499A (zh) 用於軌道合成的方法以及裝置
CN110637463B (zh) 360度视频处理方法
KR102499900B1 (ko) 고해상도 영상의 스트리밍을 위한 영상 전송 장치와 영상 재생 장치 및 그 동작 방법
WO2023194648A1 (en) A method, an apparatus and a computer program product for media streaming of immersive media

Legal Events

121 (EP): The EPO has been informed by WIPO that EP was designated in this application.
    Ref document number: 19862435; Country of ref document: EP; Kind code of ref document: A1
ENP: Entry into the national phase.
    Ref document number: 2021515166; Country of ref document: JP; Kind code of ref document: A
NENP: Non-entry into the national phase.
    Ref country code: DE
ENP: Entry into the national phase.
    Ref document number: 20217011376; Country of ref document: KR; Kind code of ref document: A
ENP: Entry into the national phase.
    Ref document number: 2019862435; Country of ref document: EP; Effective date: 20210419