WO2022199594A1 - A remote video method and related apparatus

Info

Publication number
WO2022199594A1
Authority
WO
WIPO (PCT)
Prior art keywords
image frame
terminal
image
frame
region
Application number
PCT/CN2022/082387
Other languages
English (en)
French (fr)
Inventor
刘尚
胡翔宇
徐卫国
许旺灿
杨小海
Original Assignee
Huawei Technologies Co., Ltd.
Application filed by Huawei Technologies Co., Ltd.
Publication of WO2022199594A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/234Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
    • H04N21/2343Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/4402Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display

Definitions

  • the present application relates to the field of communication technologies, and in particular, to a remote video method and related devices.
  • remote video has been widely used in scenarios such as video calls, telemedicine, and distance education. Limited by power consumption and delay, the picture resolution of remote video usually does not exceed 1080P. However, on many occasions a capture resolution of 1080P may not meet actual requirements. For example, in a remote homework tutoring scenario, small fonts in the video picture may not be clearly recognizable even after the picture is enlarged.
  • the sending end can therefore capture video at a higher resolution and reduce the captured video to 1080P before sending it to the receiving end.
  • when the receiving end needs to enlarge a certain area (this area is usually called a region of interest), the receiving end feeds back the area to be enlarged to the sending end.
  • the sending end crops the area to be enlarged out of the captured video picture, converts it to 1080P, and sends it to the receiving end. Since the video picture is captured at high resolution, the definition of the enlarged picture can still meet the requirements.
  • because the receiving end remotely controls the sending end in this scheme, the response time is high, which easily causes obvious freezes in the video picture. For example, after the receiving end performs a picture zoom operation, it must wait a long time (usually more than 300 milliseconds) before it receives the data sent by the sending end and can display the enlarged video picture.
  • An embodiment of the present application provides a remote video method.
  • in this method, the sending end additionally sends image frames with a lower zoom magnification, that is, image frames that include more of the scene.
  • when the region of interest changes, the receiving end can promptly crop the image corresponding to the changed region of interest out of an image frame with the lower zoom magnification and display it; that is, the receiving end does not need to wait for a long time.
  • the adjusted video picture can thus be displayed in time, which shortens the response time when the video region of interest is adjusted and avoids freezing of the video picture.
  • a first aspect of the present application provides a remote video method, which is applied to a first terminal serving as a video capture end and a video transmission end in a remote video process.
  • the method includes: the first terminal acquires a plurality of image frames, the plurality of image frames including a first image frame and a second image frame.
  • the zoom magnification of the first image frame is greater than the zoom magnification of the second image frame, wherein the zoom magnification refers to the magnification of the image output by the image sensor of the camera.
  • the first image frame is determined according to region of interest information, and the region of interest information is used to indicate the location of the region of interest.
  • the first terminal may store region of interest information, which is fed back by the second terminal to the first terminal.
  • the region of interest information is used to indicate where the region of interest is located.
  • the region of interest refers to the area obtained when a user zooms in, zooms out, or pans the video picture displayed on the terminal screen through an interaction such as touching the screen with a finger. Simply put, the region of interest is the region to be displayed on the terminal screen during the remote video process.
  • the first terminal may determine the location of the region of interest, thereby acquiring the first image frame related to the region of interest.
  • the picture content in the first image frame is the content of the region of interest.
  • the first terminal sends the plurality of image frames and indication information to the second terminal, so that the second terminal selects image frames to be displayed according to the indication information; the image frames to be displayed are used to generate a video.
  • the indication information includes the region of interest information of the first image frame.
  • the indication information may include coordinate information used to indicate the location of the region of interest. For example, when the region of interest is a rectangular region, the indication information may include the coordinate information of the four vertices of the rectangular region. For another example, when the region of interest is a rectangular region, the indication information may include the coordinate information of one vertex of the rectangular region (e.g., the upper-left vertex) together with the width and height of the rectangular region. In this way, based on the coordinates of one vertex plus the width and height, the coordinates of all four vertices of the rectangular region can be calculated.
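The vertex calculation mentioned above is straightforward; the following is a minimal Python sketch (the helper name `roi_vertices` and the top-left anchoring are illustrative assumptions, not taken from the patent):

```python
def roi_vertices(x, y, w, h):
    """Given the top-left vertex (x, y) of a rectangular region of
    interest plus its width w and height h, return the coordinates
    of all four vertices (top-left, top-right, bottom-left,
    bottom-right)."""
    return [(x, y), (x + w, y), (x, y + h), (x + w, y + h)]

# Example: a 1920x1080 region of interest anchored at (100, 50).
print(roi_vertices(100, 50, 1920, 1080))
# [(100, 50), (2020, 50), (100, 1130), (2020, 1130)]
```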
  • during remote video, the first terminal serving as the sending end additionally sends image frames with a lower zoom magnification, that is, image frames that include more of the scene.
  • when the region of interest changes, the second terminal can promptly crop the image corresponding to the changed region of interest out of an image frame with the lower zoom magnification and display it; that is, the second terminal can display the adjusted video picture without a long wait, which shortens the response time when the region of interest in the video is adjusted and prevents the video picture from freezing.
  • the first image frame includes only the region of interest indicated by the region of interest information, while the second image frame also includes regions other than the region of interest. That is to say, the picture in the first image frame includes only the picture content at the location of the region of interest, while the picture in the second image frame also includes picture content outside the location of the region of interest.
  • the second image frame may be an image frame obtained by the first terminal at a preset zoom magnification, and the first image frame may be an image frame obtained by the first terminal after the zoom magnification is adjusted according to the region of interest information.
  • the second image frame may be considered as a global image obtained by the first terminal, that is, the second image frame includes all areas within the field of view of the camera of the first terminal;
  • the first image frame may be considered as a partial image obtained by the first terminal, that is, the first image frame only includes a partial area within the field of view of the camera of the first terminal.
  • acquiring the plurality of image frames by the first terminal includes: the first terminal sequentially acquires a third image frame and the second image frame, where the zoom magnification of the third image frame is the same as the zoom magnification of the second image frame. That is, the first terminal continuously collects multiple image frames at a specific zoom magnification, and these image frames include the above-mentioned third image frame and second image frame. Then, the first terminal crops the third image frame according to the region of interest information to obtain the first image frame.
  • the first terminal crops the third image frame according to the region of interest information in the first terminal; that is, based on the position indicated by the region of interest information, it crops the region of interest out of the third image frame to obtain the first image frame.
  • the manner in which the first terminal acquires the first image frame is digital zooming.
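Digital zoom of this kind amounts to a pure array crop. A minimal Python sketch, with frames represented as 2-D lists and the ROI given as an (x, y, width, height) tuple (all names are illustrative assumptions):

```python
def crop_to_roi(frame, roi):
    """Digital zoom: cut the region of interest out of a
    full-field frame. `frame` is a 2-D list of pixel values;
    `roi` is (x, y, width, height) in frame coordinates."""
    x, y, w, h = roi
    return [row[x:x + w] for row in frame[y:y + h]]

# A 4x4 "third image frame"; cropping the 2x2 region at (1, 1)
# yields the "first image frame".
frame = [[r * 4 + c for c in range(4)] for r in range(4)]
first_image_frame = crop_to_roi(frame, (1, 1, 2, 2))
print(first_image_frame)  # [[5, 6], [9, 10]]
```

In practice the crop would be followed by upscaling to the output resolution, which is why digital zoom alone does not add detail.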
  • acquiring the multiple image frames by the first terminal includes: the first terminal acquires one or more third image frames, and the zoom magnification of the one or more third image frames is the same as the zoom magnification of the second image frame;
  • the first terminal crops the one or more third image frames according to the region of interest information, and obtains one or more first image frames;
  • the first terminal determines one of the one or more third image frames as the second image frame. That is to say, based on the region of interest information, the first terminal can continuously collect first image frames (that is, partial image frames) by means of digital zoom, and insert the second image frame (that is, a global image frame) into the collected first image frames.
  • by inserting the global image frame into a plurality of continuous partial image frames, the frequency at which the second terminal displays the partial image frames can be ensured, improving the smoothness of the video picture.
  • acquiring the multiple image frames by the first terminal includes: the first terminal acquires the first image frame and the second image frame by means of optical zoom.
  • the optical zoom means that the first terminal relies on the optical lens structure to achieve zoom, that is, the first terminal zooms in and out of the scene to be photographed by moving the lens.
  • while both digital zoom and optical zoom help to magnify distant objects in telephoto shooting, only optical zoom adds pixels to the image subject, making the subject not only larger but also relatively clearer. That is to say, when an object in the image is enlarged by means of optical zoom, the object can be made relatively clearer.
  • since optical zoom takes the center of the lens field of view as the center point and enlarges or reduces the picture captured by the lens by changing the focal length, the process of collecting the first image frame by optical zoom is actually to calculate the longest focal length whose field of view still covers the region of interest, and then to acquire the first image frame at that longest focal length.
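Because optical zoom narrows the field of view symmetrically about its center, the "longest focal length covering the region of interest" corresponds to the largest magnification that still keeps the whole ROI inside the narrowed view. A hedged Python sketch of that geometry (the function name and the (x, y, w, h) ROI convention are assumptions; a real camera would also quantize to its supported zoom steps):

```python
def max_optical_zoom(frame_w, frame_h, roi):
    """Largest centered zoom magnification that still keeps the
    whole region of interest inside the field of view. `roi` is
    (x, y, w, h) in full-frame coordinates."""
    x, y, w, h = roi
    cx, cy = frame_w / 2, frame_h / 2
    # Farthest horizontal / vertical extent of the ROI from the center.
    ext_x = max(cx - x, (x + w) - cx)
    ext_y = max(cy - y, (y + h) - cy)
    # At zoom z the view half-size is (frame_w / 2z, frame_h / 2z),
    # which must still reach the farthest ROI extent.
    return min(frame_w / (2 * ext_x), frame_h / (2 * ext_y))

# A centered 960x540 ROI in a 1920x1080 frame allows 2x zoom.
print(max_optical_zoom(1920, 1080, (480, 270, 960, 540)))  # 2.0
```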
  • the plurality of image frames collected by the first terminal may further include a fourth image frame, and the zoom magnification of the fourth image frame is the same as the zoom magnification of the second image frame.
  • the process in which the first terminal acquires the multiple image frames specifically includes: the first terminal sequentially collects the fourth image frame, one or more first image frames, and the second image frame according to a preset rule.
  • the preset rule is that after the first terminal collects a preset number of image frames based on the region of interest information, it collects one image frame at a target zoom magnification; the preset number is the same as the number of the one or more first image frames, and the target zoom magnification is the zoom magnification of the second image frame.
  • each time the first terminal collects a specific number of image frames based on the region of interest information, it collects one global image frame at the lower zoom magnification, thereby ensuring the frequency at which the second terminal receives global image frames and ensuring that, when the second terminal subsequently crops the region of interest from a global image frame, the picture does not change too abruptly.
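The preset rule above can be simulated with a simple counter. A minimal sketch, assuming illustrative labels `'roi'` for ROI-zoom (first) frames and `'global'` for target-zoom (second) frames:

```python
def capture_schedule(num_frames, preset_number):
    """After every `preset_number` frames captured at the
    region-of-interest zoom, one frame is captured at the target
    (global) zoom magnification."""
    schedule = []
    since_global = 0
    for _ in range(num_frames):
        if since_global == preset_number:
            schedule.append('global')   # second image frame
            since_global = 0
        else:
            schedule.append('roi')      # first image frame
            since_global += 1
    return schedule

# With a preset number of 3, every fourth captured frame is global.
print(capture_schedule(8, 3))
# ['roi', 'roi', 'roi', 'global', 'roi', 'roi', 'roi', 'global']
```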
  • the first terminal may select a corresponding reference frame according to the frame type of the image frame.
  • in general, the sender selects the image frame preceding the current image frame as the reference frame, calculates the difference information between the current image frame and the reference frame, and sends that difference information to the receiver instead of sending the current image frame directly, thereby reducing the amount of transmitted data.
  • the first image frame and the second image frame are collected at different zoom magnifications; therefore, compared with the difference between two adjacent first image frames, the difference between a first image frame and a second image frame is relatively large.
  • the first terminal may select a corresponding reference frame according to the frame type of the image frame, so as to ensure that the frame type between the image frame currently to be encoded and the reference frame of the image frame is the same.
  • sending the multiple image frames to the second terminal by the first terminal includes: the first terminal acquires reference frames of the multiple image frames; the first terminal encodes the multiple image frames according to their reference frames to obtain an encoding result; and the first terminal sends the encoding result to the second terminal. The reference frame of the first image frame is obtained based on the region of interest information, and the zoom magnification of the reference frame of the second image frame is the same as the zoom magnification of the second image frame.
  • the frame type of the partial image frame (that is, the above-mentioned first image frame) acquired by the first terminal based on the region of interest information is the first type, and the frame type of the global image frame (that is, the above-mentioned second image frame) acquired at a specific zoom magnification is the second type.
  • for an image frame of the first type, the first terminal may determine that its reference frame is the previous image frame of the first type, that is, the image frame of the first type closest to it in the time domain.
  • for an image frame of the second type, the first terminal may determine that its reference frame is the previous image frame of the second type, that is, the image frame of the second type closest to it in the time domain. In short, for any image frame, the reference frame is the image frame of the same type that is closest to it in the time domain.
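The same-type reference selection rule above amounts to a backward scan over the frame sequence. A minimal sketch, using illustrative frame-type labels rather than real encoder structures:

```python
def pick_reference(frame_types, index):
    """For the frame at position `index`, walk backwards to the
    nearest earlier frame of the same type; that frame serves as
    the reference frame. Returns its position, or None if there is
    no earlier frame of this type (encode as a key frame then)."""
    for j in range(index - 1, -1, -1):
        if frame_types[j] == frame_types[index]:
            return j
    return None

stream = ['roi', 'roi', 'global', 'roi', 'global']
print(pick_reference(stream, 3))  # 1: nearest earlier 'roi' frame
print(pick_reference(stream, 4))  # 2: nearest earlier 'global' frame
```

Skipping the immediately preceding frame when its type differs is exactly what keeps each prediction chain free of the large ROI-vs-global differences noted above.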
  • the first terminal encoding the multiple image frames according to their reference frames to obtain an encoding result includes: the first terminal encodes the first image frame through a first encoder according to the reference frame of the first image frame to obtain a first encoding result; the first terminal encodes the second image frame through a second encoder according to the reference frame of the second image frame to obtain a second encoding result; the encoding result includes the first encoding result and the second encoding result.
  • in the first encoder, each first image frame is encoded using the image frame preceding it in that encoder as the reference frame.
  • the reference frame of the second image frame is the image frame preceding the second image frame in the second encoder, and that preceding image frame is of the same type as the second image frame.
  • the indication information further includes the frame types of the multiple image frames; the frame type of the first image frame is different from the frame type of the second image frame.
  • the indication information indicates that the frame type of the first image frame is the first type, and the frame type of the second image frame is the second type.
  • when a certain bit used to indicate the frame type in the indication information is set to 1, it indicates that the frame type of the current image frame is the second type, that is, the above-mentioned second image frame; when that bit is not set (that is, its value is 0), it indicates that the frame type of the current image frame is the first type, that is, the above-mentioned first image frame.
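The single-bit frame-type flag can be sketched as follows. The patent only says "a certain bit", so the choice of bit position 0 in a one-byte indication field is an assumption for illustration:

```python
FRAME_TYPE_BIT = 0x01  # assumed position of the frame-type bit

def set_frame_type(indication, is_global):
    """Set the frame-type bit for a global (second) image frame,
    or clear it for a region-of-interest (first) image frame."""
    if is_global:
        return indication | FRAME_TYPE_BIT
    return indication & ~FRAME_TYPE_BIT

def is_global_frame(indication):
    """1 means second type (global frame), 0 means first type."""
    return bool(indication & FRAME_TYPE_BIT)

print(is_global_frame(set_frame_type(0x00, True)))   # True
print(is_global_frame(set_frame_type(0xFF, False)))  # False
```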
  • a second aspect of the present application provides a remote video method, and the method is applied to a first terminal serving as a video receiving end.
  • the method includes: the first terminal receives multiple image frames and indication information sent by the second terminal, where the multiple image frames include a first image frame and a second image frame, the zoom magnification of the first image frame is greater than the zoom magnification of the second image frame, the first image frame is determined according to region of interest information, the region of interest information is used to indicate the location of the region of interest, and the indication information includes the region of interest information of the first image frame. If the region of interest information of the first image frame is different from the region of interest information in the first terminal, a third image frame is obtained by cropping the second image frame according to the region of interest information in the first terminal, and the third image frame is displayed; if the region of interest information of the first image frame is the same as the region of interest information in the first terminal, the first image frame is displayed.
  • the second terminal can receive the interaction instruction from the user, and update the area of interest information according to the interaction instruction.
  • the second terminal needs to feed back the updated region of interest information to the first terminal; the first terminal acquires new image frames based on the updated information and then sends them to the second terminal. Therefore, image frames matching the updated region of interest information arrive with a certain lag; that is, for a period of time after the second terminal updates its local region of interest information, the region of interest information corresponding to the image frames it receives differs from the updated information.
  • the second terminal determines whether the region of interest information of the first image frame is the same as the region of interest information stored in the second terminal. If they are different, the second terminal crops a third image frame out of the second image frame according to the region of interest information stored in the second terminal, and displays the third image frame. To put it simply, since the zoom magnification of the second image frame is lower than that of the first image frame, the second image frame is actually a global image frame; the location of the new region of interest is determined within it, and that location is cropped out to obtain the third image frame. The content of the third image frame is the content at the position indicated by the second terminal's region of interest information.
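The receiver-side selection described above can be sketched in a few lines. Frames are 2-D lists and ROIs are (x, y, w, h) tuples; all names are illustrative assumptions:

```python
def select_display(first_frame_roi, local_roi, first_frame, global_frame):
    """If the ROI attached to the incoming first image frame matches
    the locally stored ROI, show the first image frame; otherwise
    crop the locally stored ROI out of the global (second) image
    frame and show that instead."""
    if first_frame_roi == local_roi:
        return first_frame
    x, y, w, h = local_roi
    return [row[x:x + w] for row in global_frame[y:y + h]]

# The local ROI was just updated to (0, 0, 2, 2), but the received
# first frame still carries the stale ROI (1, 1, 2, 2), so the
# display is cropped from the global frame.
global_frame = [[r * 4 + c for c in range(4)] for r in range(4)]
shown = select_display((1, 1, 2, 2), (0, 0, 2, 2), [['x']], global_frame)
print(shown)  # [[0, 1], [4, 5]]
```

This is what removes the round-trip wait: the stale high-zoom frame is bypassed in favor of a local crop until matching frames arrive.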
  • the second image frame includes the region of interest indicated in the region of interest information.
  • the first image frame is obtained by the second terminal, after collecting one or more third image frames, by cropping the one or more third image frames according to the region of interest information in the second terminal; the zoom magnification of the one or more third image frames is the same as the zoom magnification of the second image frame.
  • the second image frame is one image frame among the one or more third image frames.
  • the plurality of image frames are acquired by the second terminal by means of optical zooming.
  • the plurality of image frames further include a fourth image frame, and the zoom magnification of the fourth image frame is the same as the zoom magnification of the second image frame; the fourth image frame, the one or more first image frames, and the second image frame are sequentially collected by the second terminal according to a preset rule. The preset rule is that after the second terminal collects a preset number of image frames based on the region of interest information, it collects one image frame at a target zoom magnification; the preset number is the same as the number of the one or more first image frames, and the target zoom magnification is the zoom magnification of the second image frame.
  • the indication information further includes frame types of the multiple image frames, and the frame type of the first image frame is different from the frame type of the second image frame;
  • the method further includes: the first terminal sequentially sends the first image frame to a first buffer and sends the second image frame to a second buffer according to the frame types of the plurality of image frames .
  • the first buffer is used for storing first image frames, whose frame type is the first type, and the second buffer is used for storing second image frames, whose frame type is the second type.
  • every time the second terminal receives a new image frame, it determines the frame type of the image frame; if the frame type is the first type, the image frame is sent to the first buffer, and if the frame type is the second type, the image frame is sent to the second buffer. In both buffers, old image frames are overwritten by new image frames.
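The two-buffer routing above can be sketched as a pair of single-slot, latest-wins buffers. A minimal sketch with illustrative names; real buffers would also carry timestamps and the per-frame ROI information:

```python
class FrameBuffers:
    """Route incoming frames by type: first-type frames overwrite
    the first buffer, second-type frames overwrite the second."""
    def __init__(self):
        self.first = None   # latest region-of-interest frame
        self.second = None  # latest global frame

    def receive(self, frame, frame_type):
        if frame_type == 'roi':
            self.first = frame    # new frame overwrites the old one
        else:
            self.second = frame

bufs = FrameBuffers()
bufs.receive('f1', 'roi')
bufs.receive('g1', 'global')
bufs.receive('f2', 'roi')     # overwrites 'f1'
print(bufs.first, bufs.second)  # f2 g1
```

Keeping only the newest frame of each type is what lets the display logic always compare the latest first image frame's ROI against the local ROI and fall back to the latest global frame.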
  • obtaining and displaying the third image frame includes: if the region of interest information of the first image frame in the first buffer is different from the region of interest information in the first terminal, a third image frame is obtained by cropping the second image frame in the second buffer according to the region of interest information in the first terminal, and the third image frame is displayed.
  • displaying the first image frame includes: if the region of interest information of the first image frame in the first buffer is the same as the region of interest information in the first terminal, the first image frame in the first buffer is displayed.
  • the method further includes: when a user initiates an interaction on the second terminal, the second terminal acquires an interaction instruction, and the interaction instruction is used to indicate the user's change of the region of interest.
  • the interaction instruction may be a zoom-out operation instruction, a zoom-in operation instruction, or a pan operation instruction initiated by the user by touching the screen of the second terminal.
  • the zoom-out operation instruction is used to instruct to zoom out the screen displayed on the screen of the second terminal with the target area as the starting point.
  • the zoom-in operation instruction is used to instruct to zoom in on the target area displayed on the screen of the second terminal.
  • the panning operation instruction is used to instruct to pan the picture displayed on the screen of the second terminal in a specific direction.
  • after the second terminal executes the interaction instruction, the resulting area to be displayed is the new region of interest; the second terminal can update the region of interest information according to the interaction instruction to obtain updated region of interest information.
  • the second terminal sends the updated ROI information to the first terminal.
  • a third aspect of the present application provides a terminal, including: an acquisition unit, a processing unit, and a transceiver unit; the acquisition unit is configured to acquire multiple image frames, the multiple image frames include a first image frame and a second image frame, The zoom magnification of the first image frame is greater than the zoom magnification of the second image frame, and the first image frame is determined according to the region of interest information, and the region of interest information is used to indicate the position of the region of interest;
  • the transceiver unit is configured to send the plurality of image frames and indication information to a second terminal, so that the second terminal selects image frames to be displayed according to the indication information; the image frames to be displayed are used to generate a video, and the indication information includes the region of interest information of the first image frame.
  • the second image frame includes the region of interest indicated in the region of interest information.
  • the acquiring unit is further configured to acquire a third image frame and the second image frame, and the zoom magnification of the third image frame is the same as the zoom magnification of the second image frame;
  • the processing unit is further configured to crop the third image frame according to the region of interest information to obtain the first image frame.
  • the acquisition unit is further configured to acquire one or more third image frames, and the zoom magnification of the one or more third image frames is the same as the zoom magnification of the second image frame;
  • the processing unit is further configured to crop the one or more third image frames according to the region of interest information to obtain one or more first image frames;
  • the processing unit is further configured to determine one of the one or more third image frames as the second image frame.
  • the acquisition unit acquires the first image frame and the second image frame by means of optical zooming.
  • the plurality of image frames further include a fourth image frame, and the zoom magnification of the fourth image frame is the same as the zoom magnification of the second image frame;
  • the acquiring unit is further configured to sequentially collect the fourth image frame, the one or more first image frames, and the second image frame according to a preset rule; the preset rule is that after the first terminal collects a preset number of image frames based on the region of interest information, it collects one image frame at a target zoom magnification; the preset number is the same as the number of the one or more first image frames, and the target zoom magnification is the zoom magnification of the second image frame.
  • the obtaining unit is further configured to obtain reference frames of the multiple image frames;
  • the processing unit is further configured to encode the multiple image frames according to the reference frames of the multiple image frames to obtain an encoding result;
  • the transceiver unit is further configured to send the encoding result to the second terminal; the reference frame of the first image frame is obtained according to the region of interest information, and the zoom magnification of the reference frame of the second image frame is the same as the zoom magnification of the second image frame.
  • the processing unit is further configured to encode the first image frame by using a first encoder according to the reference frame of the first image frame to obtain a first encoding result; the processing unit is further configured to encode the second image frame by using a second encoder according to the reference frame of the second image frame to obtain a second encoding result; the encoding result includes the first encoding result and the second encoding result.
  • the indication information further includes frame types of the multiple image frames, and the frame type of the first image frame is different from the frame type of the second image frame.
  • a fourth aspect of the present application provides a terminal, including: an acquisition unit, a transceiver unit, and a processing unit; the transceiver unit is configured to receive multiple image frames and indication information sent by a second terminal, where the multiple image frames include a first image frame and a second image frame, the zoom magnification of the first image frame is greater than that of the second image frame, the first image frame is determined according to region of interest information, the region of interest information is used to indicate the location of the region of interest, and the indication information includes the region of interest information of the first image frame; the processing unit is configured to: if the region of interest information of the first image frame is different from the region of interest information in the first terminal, obtain a third image frame by cropping the second image frame according to the region of interest information in the first terminal, and display the third image frame; if the region of interest information of the first image frame is the same as the region of interest information in the first terminal, display the first image frame.
  • the second image frame includes the region of interest indicated in the region of interest information.
  • the first image frame is obtained by the second terminal by cropping one or more collected third image frames according to the region of interest information in the second terminal, and the zoom magnification of the one or more third image frames is the same as the zoom magnification of the second image frame.
  • the second image frame is one image frame among the one or more third image frames.
  • the plurality of image frames are acquired by the second terminal by means of optical zooming.
  • the plurality of image frames further include a fourth image frame, and the zoom magnification of the fourth image frame is the same as the zoom magnification of the second image frame;
  • the fourth image frame, the one or more first image frames, and the second image frame are sequentially acquired by the second terminal according to a preset rule; the preset rule is that after the second terminal collects a preset number of image frames based on the region of interest information, it collects one image frame at a target zoom magnification, where the preset number is the same as the number of the one or more first image frames, and the target zoom magnification is the zoom magnification of the second image frame.
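The preset rule described above (collect a preset number of region-of-interest frames, then one frame at the target zoom magnification) can be sketched as a simple capture schedule. This is an illustration only; the function name and string labels are assumptions, not part of the application:

```python
def frame_schedule(num_frames, roi_frames_per_global):
    """Label each capture slot following the preset rule: after a preset
    number of region-of-interest frames, one frame is captured at the
    target (global) zoom magnification."""
    cycle = roi_frames_per_global + 1
    return [
        "global" if i % cycle == roi_frames_per_global else "roi"
        for i in range(num_frames)
    ]
```

For example, with a preset number of three region-of-interest frames, every fourth capture slot is a global frame.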
  • the indication information further includes frame types of the multiple image frames, and the frame type of the first image frame is different from the frame type of the second image frame;
  • the processing unit is further configured to send the first image frame to the first buffer and send the second image frame to the second buffer in sequence according to the frame types of the plurality of image frames;
  • the processing The unit is further configured to: if the region of interest information of the first image frame in the first buffer is different from the region of interest information in the first terminal, according to the region of interest information in the first terminal A third image frame is obtained by cropping the second image frame in the second buffer, and the third image frame is displayed; if the region of interest information of the first image frame in the first buffer is the same as the If the region of interest information in the first terminal is the same, the first image frame in the first buffer is displayed.
  • the acquisition unit is configured to acquire an interaction instruction, where the interaction instruction is used to instruct to change the region of interest; the processing unit is further configured to update the region of interest information according to the interaction instruction to obtain updated region of interest information; the transceiver unit is further configured to send the updated region of interest information to the second terminal.
  • a fifth aspect of the present application provides a terminal, the terminal including: a processor, a non-volatile memory, and a volatile memory; computer-readable instructions are stored in the non-volatile memory or the volatile memory, and the processor reads the computer-readable instructions to cause the terminal to implement the method in any one of the implementations of the first aspect or the second aspect.
  • a sixth aspect of the present application provides a computer-readable storage medium, where a computer program is stored in the computer-readable storage medium, and when it runs on a computer, the computer is caused to execute the method in any one of the implementations of the first aspect or the second aspect.
  • a seventh aspect of the present application provides a computer program product, which, when run on a computer, causes the computer to perform the method of any one of the implementations of the first aspect or the second aspect.
  • An eighth aspect of the present application provides a chip, including one or more processors. Some or all of the processors are used to read and execute the computer program stored in the memory, so as to execute the method in any possible implementation of any of the above aspects.
  • the chip includes a memory, and the processor is connected to the memory through a circuit or a wire.
  • the chip further includes a communication interface, and the processor is connected to the communication interface.
  • the communication interface is used for receiving data and/or information to be processed, the processor obtains the data and/or information from the communication interface, processes the data and/or information, and outputs the processing result through the communication interface.
  • the communication interface may be an input-output interface.
  • the method provided by the present application may be implemented by one chip, or may be implemented by multiple chips cooperatively.
  • FIG. 1 is a schematic diagram of a video picture provided by an embodiment of the present application.
  • FIG. 2 is a schematic flowchart of the remote video process of the first related art.
  • FIG. 3 is a schematic flowchart of the remote video process of the second related art.
  • FIG. 4 is a schematic structural diagram of a terminal 101 according to an embodiment of the present application.
  • FIG. 5 is a schematic flowchart of a remote video method provided by an embodiment of the present application.
  • FIG. 6 is a schematic diagram of image comparison of different zoom magnifications provided by an embodiment of the present application.
  • FIG. 7 is a schematic diagram of a comparison of region of interest information in different terminals at different times according to an embodiment of the present application.
  • FIG. 8 is a schematic diagram of obtaining a plurality of image frames by means of digital zooming according to an embodiment of the present application.
  • FIG. 9 is another schematic diagram of obtaining multiple image frames according to an embodiment of the present application.
  • FIG. 10 is a schematic diagram of a comparison of obtaining image frames based on different methods according to an embodiment of the present application.
  • FIG. 11 is a schematic diagram of determining a reference frame according to an embodiment of the present application.
  • FIG. 12 is a schematic diagram of encoding based on two encoders according to an embodiment of the present application.
  • FIG. 13 is a schematic flowchart of a remote video method according to an embodiment of the application.
  • FIG. 14 is a schematic structural diagram of a terminal 1400 according to an embodiment of the present application.
  • FIG. 15 is a schematic structural diagram of a computer program product 1500 provided by an embodiment of the present application.
  • the picture resolution of the remote video usually does not exceed 1080P (ie, the pixels are 1920 ⁇ 1080).
  • FIG. 1 is a schematic diagram of a video screen provided by an embodiment of the present application.
  • the video capture resolution is 1080P
  • the small fonts in the video picture may not be seen clearly.
  • the resolution of the video collected by the sender is increased, for example, to 3840×2160 (referred to as 4k resolution), so as to send a higher-resolution video to the receiver and thereby solve the problem of an unclear video picture.
  • FIG. 2 is a schematic diagram of a remote video flow in the first related art.
  • the sending end collects 4k resolution video through the camera, and after encoding the video at 4k resolution, sends the encoded video to the receiving end. After receiving the encoded video, the receiving end decodes the video to obtain a 4k resolution video, and displays the 4k resolution video.
  • the resolution of the remote video can be effectively improved.
  • the power consumption and time delay of encoding 4k resolution video are about four times those of 1080P video, which affects the real-time performance of the remote video.
  • the encoding bit rate of 4k resolution video is also about four times that of 1080P video, which brings a large bandwidth cost and makes this solution difficult to apply in practical business scenarios.
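The four-fold factor follows directly from the pixel counts of the two resolutions; a quick illustrative check:

```python
def pixel_ratio(res_a, res_b):
    """Ratio of total pixels between two resolutions given as (width, height)."""
    return (res_a[0] * res_a[1]) / (res_b[0] * res_b[1])

# 3840x2160 (4k) carries four times the pixels of 1920x1080 (1080P),
# which is why encoding cost and bit rate scale up roughly fourfold.
```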
  • the resolution of the video collected by the sending end is increased to 3840×2160, and the collected video is reduced to 1080P before being sent to the receiving end.
  • when the receiving end needs to enlarge a certain area (this area is usually called a region of interest), the receiving end feeds back the area that needs to be enlarged to the transmitting end.
  • after the sender obtains the area that needs to be enlarged, it crops the area to be enlarged from the captured video picture, converts the cropped picture to 1080P, and sends it to the receiver. Since the video picture is captured at high resolution, the definition of the video picture can be ensured to meet the requirements.
  • FIG. 3 is a schematic diagram of a remote video process of the second related art.
  • the sender collects 4k resolution video through the camera, and crops the original video picture according to the region of interest information to obtain a video in which part of the picture has been cropped out. Then, the transmitting end encodes the cropped video at 1080P resolution and sends the encoded video to the receiving end. After receiving the encoded video, the receiving end decodes it to obtain a 1080P resolution video and displays the 1080P resolution video.
  • the sending end collects video at a resolution of 1920×1080, encodes the collected video at 1920×1080, and sends it to the receiving end, so the receiving end receives a video with a resolution of 1920×1080. If the receiving end needs to zoom in and display a region of interest with a size of 960×540, it needs to crop the video picture corresponding to the region of interest from the received original video, upsample the cropped picture to a 1920×1080 picture, and finally display the upsampled picture. Since the original video is actually collected at a resolution of 1920×1080, the picture of the region of interest actually displayed by the receiving end is one that the receiving end has upsampled, so the definition of the video picture is not high.
  • the sender collects video at a resolution of 3840×2160, encodes the collected video at a resolution of 1920×1080, and sends it to the receiver; that is, the original video is downsampled to 1920×1080 and the downsampled video is then encoded. The receiver therefore receives a video with a resolution of 1920×1080. If the receiving end needs to enlarge and display a region of interest with a size of 960×540, it needs to feed back the position of the region of interest to the transmitting end.
  • after receiving the position of the region of interest, the sender can determine that in the collected original video, the region of interest actually corresponds to a size of 1920×1080. Therefore, the transmitting end can crop a 1920×1080 region of interest from the captured original video, encode the cropped video at a resolution of 1920×1080, and send it to the receiving end. In this way, the receiving end receives video cropped from the 4k resolution original video and does not need to upsample the received picture, so the video picture displayed by the receiving end has high definition.
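The correspondence between the receiver's region of interest and the sender's capture coordinates is a simple scaling by the ratio of the two resolutions. A minimal sketch, in which the function and parameter names are assumptions for illustration:

```python
def map_roi_to_capture(roi, display_res, capture_res):
    """Scale a region of interest (x, y, w, h) expressed in display
    coordinates to the sender's capture coordinates."""
    sx = capture_res[0] / display_res[0]
    sy = capture_res[1] / display_res[1]
    x, y, w, h = roi
    return (round(x * sx), round(y * sy), round(w * sx), round(h * sy))
```

For example, a 960×540 region in a 1920×1080 display picture maps to a 1920×1080 region in the 3840×2160 original video, matching the sizes described above.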
  • the receiving end needs to feed back the position of the region of interest to the transmitting end, and the transmitting end performs corresponding processing, and then transmits the processed video to the receiving end.
  • since the receiving end remotely controls the sending end, there is a relatively long response time, which easily leads to obvious freezes in the video picture.
  • after the receiving end performs an image zoom operation, it needs to wait for a long time (usually more than 300 milliseconds) before it can receive the data sent by the sending end and display the enlarged video picture.
  • the remote video method provided by the embodiments of the present application can be applied to a terminal with a video capture function.
  • the terminal is also called user equipment (UE), mobile station (MS), mobile terminal (MT), etc. It is a device equipped with an image capture device capable of shooting video, and it can communicate with other devices remotely to transmit the captured video to them, for example, a handheld device with shooting capability or a surveillance camera.
  • examples of terminals are: mobile phones, tablet computers, notebook computers, PDAs, surveillance cameras, mobile internet devices (mobile internet device, MID), wearable devices, virtual reality (virtual reality, VR) devices, augmented reality (augmented reality, AR) devices, wireless terminals in industrial control, wireless terminals in self-driving, wireless terminals in remote medical surgery, wireless terminals in smart grid, wireless terminals in transportation safety, wireless terminals in smart city, wireless terminals in smart home, etc.
  • the image acquisition device in the terminal is used to convert optical signals into electrical signals to generate image signals.
  • the image acquisition device may be, for example, an image sensor, and the image sensor may be, for example, a charge coupled device (Charge Coupled Device, CCD) or a complementary metal oxide semiconductor (Complementary Metal Oxide Semiconductor, CMOS).
  • FIG. 4 is a schematic structural diagram of a terminal 101 according to an embodiment of the present application.
  • the terminal 101 includes a processor 103 , and the processor 103 is coupled to a system bus 105 .
  • the processor 103 may be one or more processors, each of which may include one or more processor cores.
  • a video adapter 107 which can drive a display 109, is coupled to the system bus 105.
  • the system bus 105 is coupled to an input-output (I/O) bus through a bus bridge 111 .
  • I/O interface 115 is coupled to the I/O bus.
  • the I/O interface 115 communicates with various I/O devices, such as an input device 117 (eg, a touch screen) and a media tray 121 (eg, a compact disc read-only memory (CD-ROM), a multimedia interface, etc.).
  • a transceiver 123, which can transmit and/or receive radio communication signals;
  • a camera 155, which can capture still and moving digital video images; and
  • an external USB port 125.
  • the interface connected to the I/O interface 115 may be a USB interface.
  • the processor 103 may be any conventional processor, including a reduced instruction set computing (reduced instruction set computing, RISC) processor, a complex instruction set computing (complex instruction set computing, CISC) processor or a combination of the above.
  • the processor may be a special purpose device such as an ASIC.
  • Terminal 101 may communicate with software deployment server 149 through network interface 129 .
  • network interface 129 is a hardware network interface, such as a network card.
  • the network 127 may be an external network, such as the Internet, or an internal network, such as an Ethernet network or a virtual private network (VPN).
  • the network 127 may also be a wireless network, such as a WiFi network, a cellular network, and the like.
  • the hard drive interface 131 is coupled to the system bus 105 .
  • the hard drive interface is connected to the hard disk drive 133.
  • System memory 135 is coupled to system bus 105 .
  • the data running in the system memory 135 may include the operating system (OS) 137 of the terminal 101 , the application programs 143 and the schedule.
  • the operating system includes a Shell 139 and a kernel 141 .
  • Shell 139 is an interface between the user and the operating system's kernel.
  • the shell is the outermost layer of the operating system. The shell manages the interaction between the user and the operating system: waiting for user input, interpreting user input to the operating system, and processing various operating system output.
  • Kernel 141 consists of those parts of the operating system that manage memory, files, peripherals, and system resources.
  • the kernel 141 directly interacts with the hardware, and the operating system kernel usually runs processes, provides inter-process communication, provides CPU time slice management, interrupts, memory management, IO management, and the like.
  • the application program 143 includes a remote video related program.
  • the terminal 101 can realize remote video with another terminal by executing the application program 143. That is, the terminal 101 can collect video through the camera 155, and send the collected video to another terminal after it is processed and encoded by the processor 103.
  • the terminal 101 may download the application 143 from the software deployment server 149 when the application 143 needs to be executed.
  • FIG. 5 is a schematic flowchart of a remote video method provided by an embodiment of the present application. As shown in Figure 5, the remote video method includes the following steps.
  • Step 501: the first terminal acquires multiple image frames, where the multiple image frames include a first image frame and a second image frame, the zoom magnification of the first image frame is greater than the zoom magnification of the second image frame, the first image frame is determined according to region of interest information, and the region of interest information is used to indicate the location of the region of interest.
  • the first terminal is a sending end that collects video and sends the video to the receiving end
  • the second terminal is a receiving end that receives the video sent by the first terminal and displays the video.
  • the first terminal may continuously collect image frames at a fixed frame rate, so as to obtain a plurality of continuous image frames.
  • a first image frame and a second image frame are included.
  • the zoom magnifications of the first image frames are all greater than the zoom magnifications of the second image frames.
  • the zoom ratio refers to the magnification of the image output by the image sensor of the camera.
  • the larger the zoom ratio of the camera, the larger the subject in the image output by the camera, and the smaller the range captured in the image; conversely, the smaller the zoom ratio, the smaller the subject in the output image, and the larger the range captured in the image.
  • FIG. 6 is a schematic diagram of image comparison with different zoom magnifications provided by an embodiment of the present application.
  • the camera captures image 1 with a smaller zoom ratio, and captures image 2 with a larger zoom ratio.
  • the shooting range of image one is the whole body of the skier.
  • the shooting range of the second image is the skier's head, that is, the shooting range of the second image is smaller than that of the first image.
  • the head of the skier photographed in the second image is larger than the head of the skier photographed in the first image.
  • the second image can be considered to be obtained by enlarging the area where the skier's head in the first image is located.
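Enlarging a sub-area in this way amounts to cropping a window whose size is 1/zoom of the full frame in each dimension around the subject, then scaling it up. A rough sketch under that assumption (the names are illustrative, not from the application):

```python
def crop_for_zoom(full_w, full_h, center_x, center_y, zoom):
    """Crop rectangle that simulates a given zoom magnification around a
    center point: the window covers 1/zoom of the frame per dimension,
    clamped so it stays inside the frame."""
    w, h = full_w / zoom, full_h / zoom
    x = min(max(center_x - w / 2, 0), full_w - w)
    y = min(max(center_y - h / 2, 0), full_h - h)
    return (round(x), round(y), round(w), round(h))
```

For instance, a 2x zoom centered on the middle of a 1920×1080 frame selects the central 960×540 window, analogous to image two covering only the skier's head.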
  • area of interest information may be stored, and the area of interest information may be fed back by the second terminal to the first terminal.
  • the region of interest information is used to indicate where the region of interest is located.
  • the region of interest refers to an area obtained by a user zooming in, zooming out or panning a video image displayed on the screen of the terminal through an interactive manner such as touching the screen of the terminal with a finger.
  • the region of interest is the region to be displayed on the screen of the terminal during the remote video process.
  • image 1 is displayed on the terminal screen.
  • the user touches the terminal screen with a finger to zoom in on the skier's head area, and the terminal can obtain the corresponding region of interest (ie, the skier's head area shown in image 2) based on the user's interactive instruction.
  • the first terminal may determine the location of the region of interest, thereby acquiring the first image frame related to the region of interest.
  • the picture content in the first image frame is the content of the region of interest.
  • the first image frame includes only the region of interest indicated in the region of interest information;
  • the second image frame includes the region of interest indicated in the region of interest information, as well as other areas. That is to say, the picture in the first image frame includes only the picture content at the location of the region of interest, while the picture in the second image frame also includes picture content other than that at the location of the region of interest.
  • the second image frame may be an image frame collected by the first terminal under a preset zoom magnification, and the first image frame may be collected by the first terminal when the zoom magnification is adjusted according to the region of interest information. image frame.
  • the second image frame may be considered as a global image collected by the first terminal, that is, the second image frame includes all areas within the field of view of the camera of the first terminal;
  • the first image frame may be considered as a partial image collected by the first terminal, that is, the first image frame only includes a partial area within the field of view of the camera of the first terminal.
  • Step 502: the first terminal sends the plurality of image frames and the indication information to the second terminal, so that the second terminal selects, according to the indication information, the image frame to be displayed.
  • the indication information includes region of interest information of the first image frame.
  • the first terminal sequentially collects the above-mentioned multiple image frames, and sends the multiple image frames to the second terminal one by one.
  • the first terminal may collect the plurality of image frames at a higher resolution, reduce the plurality of image frames to a specific resolution, encode the reduced image frames, and send the encoded image frames.
  • the first terminal collects the multiple image frames at 4k resolution, and then reduces the multiple image frames to 1080P resolution, and encodes and sends the reduced multiple image frames.
  • in the process of sending the image frames to the second terminal, the first terminal also sends indication information to the second terminal, where the indication information is used to indicate the region of interest information of each image frame.
  • the first terminal may carry the indication information corresponding to each image frame in the process of sending that image frame to the second terminal, and the indication information indicates the region of interest information of the image frame transmitted by the first terminal.
  • the indication information may include coordinate information used to indicate the location of the region of interest.
  • the indication information may include coordinate information of four vertices of the rectangular region.
  • the indication information may include coordinate information of a vertex of the rectangular region (eg, the upper left corner vertex of the rectangular region) and the width and height of the rectangular region. In this way, based on the coordinate information of one vertex of the rectangular area and the width and height of the rectangular area, the coordinate information of the four vertexes of the rectangular area can also be calculated.
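Both encodings carry the same information: the four vertices can be recovered from one vertex plus the width and height. A minimal illustration (the tuple layout is an assumption):

```python
def vertices_from_anchor(x, y, w, h):
    """Recover the four vertices of a rectangular region of interest from
    its top-left vertex (x, y) and its width and height."""
    return [(x, y), (x + w, y), (x, y + h), (x + w, y + h)]
```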
  • when the first terminal performs video encoding through the H.264/H.265 video compression standard, the first terminal may write the indication information into the supplemental enhancement information (Supplemental Enhancement Information, SEI), so that the indication information corresponding to each image frame is carried in the transmitted data.
  • in one possible embodiment, the first terminal may not carry indication information when sending the second image frame, that is, it does not indicate the location of the region of interest in the second image frame. In another possible embodiment, when sending the second image frame, the first terminal still carries indication information to indicate the location of the region of interest in the second image frame.
  • when the first terminal sends the multiple image frames to the second terminal, the first terminal may encode the image frames one by one to obtain encoded image information, and then send the encoded image information and the indication information corresponding to the encoded image frame to the second terminal.
  • the indication information may further include frame types of the multiple image frames.
  • the frame type of the first image frame is different from the frame type of the second image frame.
  • the first terminal may indicate the frame type of an image frame by using a certain bit in the indication information. For example, when the bit used to indicate the frame type is set to 1, it indicates that the frame type of the current image frame is the second type, that is, the above-mentioned second image frame; when that bit is not set (that is, its value is 0), it indicates that the frame type of the current image frame is the first type, that is, the above-mentioned first image frame.
  • the frame type of the image frame may also be indicated in other manners, which is not specifically limited in this embodiment.
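The single-bit frame-type flag described above could be manipulated as in the following sketch. The bit position and field layout are assumptions for illustration, not the application's actual bitstream format:

```python
FRAME_TYPE_BIT = 0  # hypothetical position of the frame-type flag

def set_frame_type(flags, second_type):
    """Set the frame-type bit for a second-type (global) frame, or clear
    it for a first-type (region-of-interest) frame."""
    if second_type:
        return flags | (1 << FRAME_TYPE_BIT)
    return flags & ~(1 << FRAME_TYPE_BIT)

def is_second_type(flags):
    """True when the frame-type bit is set, i.e. a second-type frame."""
    return bool(flags & (1 << FRAME_TYPE_BIT))
```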
  • Step 503 the second terminal receives multiple image frames and indication information sent by the first terminal.
  • after the second terminal receives the data sent by the first terminal, it obtains the multiple image frames by decoding that data. Exemplarily, as the first terminal continues to send encoded data to the second terminal, the second terminal continues to decode the received encoded data, thereby sequentially obtaining the above-mentioned multiple image frames and the indication information corresponding to each image frame.
  • each time the first terminal collects an image frame, it encodes the image frame and sends the encoded image frame and the corresponding indication information to the second terminal. Therefore, the second terminal receives the image frames sent by the first terminal one by one instead of receiving multiple image frames at a time.
  • Step 504: if the region of interest information of the first image frame is different from the region of interest information in the second terminal, a fifth image frame is obtained by cropping the second image frame according to the region of interest information in the second terminal, and the fifth image frame is displayed.
  • the second terminal may select the image frame to be displayed according to the indication information corresponding to each image frame.
  • the second terminal can receive the interaction instruction from the user, and update the area of interest information according to the interaction instruction.
  • the second terminal needs to feed back the updated region of interest information to the first terminal, and the first terminal collects a new image frame based on the updated information and then sends the new image frame to the second terminal. Therefore, the image frames matching the updated region of interest information lag behind: within a period of time after the second terminal has updated its local region of interest information, the region of interest information corresponding to the image frames received by the second terminal is not the same as the updated region of interest information.
  • the second terminal determines whether the region of interest information of the first image frame is the same as the region of interest information stored in the second terminal. If it is different, the second terminal crops a fifth image frame from the second image frame according to the region of interest information in the second terminal, and displays the fifth image frame.
  • the second image frame is actually a global image frame.
  • the location of the new region of interest is determined, and the second image frame is cropped at that location to obtain a fifth image frame.
  • the content in the fifth image frame is the content corresponding to the position indicated by the region of interest information of the second terminal.
  • Step 505: if the region of interest information of the first image frame is the same as the region of interest information in the second terminal, the first image frame is displayed.
  • the second terminal may display the first image frame.
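The display selection of steps 504 and 505 reduces to a comparison followed by an optional crop. A minimal sketch, in which the function names and the crop callable are hypothetical:

```python
def select_display(frame_roi, local_roi, first_frame, second_frame, crop):
    """Step 505: if the ROI carried with the first (zoomed-in) frame matches
    the terminal's local ROI, show it. Step 504: otherwise crop the local
    ROI out of the second (global) frame and show that instead."""
    if frame_roi == local_roi:
        return first_frame
    return crop(second_frame, local_roi)
```

In practice the crop callable would slice the decoded global picture and upscale it for display.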
  • FIG. 7 is a schematic diagram for comparison of region of interest information in different terminals at different times according to an embodiment of the present application.
  • the solid line box in the image represents the position indicated by the region of interest information in the current terminal.
  • the region of interest information in the first terminal is the same as the region of interest information in the second terminal, that is, the regions of interest indicated by the region of interest information in the first terminal and the second terminal are both located at the position of the skier's head.
  • the region of interest information corresponding to the first image frame sent by the first terminal is the same as the region of interest information in the second terminal, so the second terminal chooses to display the first image frame, and the content of the first image frame is the skier's head marked in the solid-line box.
  • the second terminal receives an interaction instruction from the user, the interaction instruction is specifically a translation operation instruction, and the second terminal updates the area of interest information according to the interaction instruction.
  • the region of interest indicated by the region of interest information before the update is located on the skier's head, and the region of interest indicated by the updated region of interest information is located on the ski. Since the region of interest information in the second terminal has changed, the region of interest information of the first image frame sent by the first terminal is different from the region of interest information in the second terminal, so the second terminal crops the updated region of interest from the second image frame to obtain a third image frame, and displays the third image frame.
  • the content of the third image frame is the ski marked in the solid-line box.
  • the first terminal receives the updated region of interest information fed back by the second terminal, so the first terminal acquires a new first image frame according to the updated information and sends the new first image frame to the second terminal. Since there is a transmission delay between the first terminal and the second terminal, at time t3 the first image frame received by the second terminal is actually one obtained by the first terminal based on the region of interest information before the update. That is, the region of interest information of the first image frame received by the second terminal is different from the region of interest information of the second terminal, so the second terminal still chooses to display the third image frame.
  • the region of interest information in the first terminal is the same as the region of interest information in the second terminal, and the second terminal receives the first image frame obtained by the first terminal based on the updated region of interest information.
  • the area of interest information corresponding to the first image frame sent by the first terminal is the same as the area of interest information in the second terminal, so the second terminal chooses to display the first image frame, the content of which is the ski marked by the solid box.
  • the first terminal, serving as the sending end, additionally sends image frames with a lower zoom ratio, that is, image frames whose pictures include more content.
  • the second terminal can promptly crop and display the image corresponding to the changed area of interest from the image frame with the lower zoom magnification; that is, the second terminal does not need to wait a long time before the adjusted video picture can be displayed, which shortens the response time when adjusting the region of interest in the video and prevents the video picture from freezing.
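The receiver-side rule described above can be sketched as follows. This is an illustrative model only, not the patent's implementation; the names `Frame` and `select_display`, and the `(x, y, w, h)` rectangle format, are assumptions.

```python
from dataclasses import dataclass

@dataclass
class Frame:
    kind: str          # "local" (high zoom, ROI only) or "global" (low zoom)
    roi: tuple         # (x, y, w, h) rectangle the frame was produced for
    pixels: object = None

def select_display(received: Frame, global_frame: Frame, local_roi: tuple):
    """Show the received local frame only if its ROI still matches the
    receiver's current ROI; otherwise crop the ROI out of the latest
    global frame so the user sees the new view without waiting."""
    if received.roi == local_roi:
        return ("show", received)
    return ("crop", global_frame, local_roi)

# ROI unchanged: display the received frame as-is.
assert select_display(Frame("local", (10, 10, 64, 64)),
                      Frame("global", (0, 0, 256, 256)),
                      (10, 10, 64, 64))[0] == "show"
# ROI changed (user panned): fall back to cropping the global frame.
assert select_display(Frame("local", (10, 10, 64, 64)),
                      Frame("global", (0, 0, 256, 256)),
                      (80, 10, 64, 64))[0] == "crop"
```

This captures why the low-zoom frames matter: they carry enough surrounding content to serve any ROI the user switches to while the sender catches up.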
  • different buffers may be created in the second terminal.
  • after receiving image frames of different frame types, the second terminal sends them to the corresponding buffers respectively. Then, the second terminal selects one of the buffers according to the region of interest information and displays the image frame in that buffer.
  • according to the frame types of the multiple image frames, the second terminal sends the first image frame to the first buffer and the second image frame to the second buffer in sequence.
  • the first buffer is used for storing the first image frame whose frame type is the first type
  • the second buffer is used for storing the second image frame whose frame type is the second type.
  • every time the second terminal receives a new image frame, it determines the frame type of the image frame; if the frame type is the first type, the image frame is sent to the first buffer, and if the frame type is the second type, the image frame is sent to the second buffer. In both buffers, old image frames are overwritten by new ones.
  • after sending the image frame to the corresponding buffer based on its frame type, the second terminal determines which image frame in the buffers to display according to the region of interest information of the image frames in the buffers.
  • if the region of interest information of the first image frame in the first buffer is different from the region of interest information in the second terminal, a fifth image frame is obtained by cropping the second image frame in the second buffer according to the region of interest information in the second terminal, and the fifth image frame is displayed. If the region of interest information of the first image frame in the first buffer is the same as the region of interest information in the second terminal, the first image frame in the first buffer is displayed.
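The single-slot overwrite behaviour of the two buffers can be sketched as below. The dict-based representation and the names `buffers` and `on_frame` are illustrative assumptions, not the patent's data structures.

```python
# One slot per frame type; a new frame of a type overwrites the old one,
# so each buffer always holds exactly the newest frame of its type.
buffers = {"first": None, "second": None}   # local slot / global slot

def on_frame(frame_type, frame):
    key = "first" if frame_type == "local" else "second"
    buffers[key] = frame                    # overwrite, never queue

on_frame("local", {"roi": (0, 0, 4, 4), "id": 1})
on_frame("local", {"roi": (0, 0, 4, 4), "id": 2})   # overwrites id 1
on_frame("global", {"id": 3})
assert buffers["first"]["id"] == 2
assert buffers["second"]["id"] == 3
```

Keeping only the latest frame per type is what lets the receiver always crop from the freshest global picture when the ROI changes.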
  • the second terminal may acquire the user's interaction instruction in real time, and update the region of interest in real time according to the interaction instruction.
  • the second terminal acquires the interaction instruction, where the interaction instruction is used to instruct to change the area of interest.
  • the interaction instruction may be a zoom-out operation instruction, a zoom-in operation instruction, or a pan operation instruction initiated by the user by touching the screen of the second terminal.
  • the zoom-out operation instruction is used to instruct to zoom out the screen displayed on the screen of the second terminal with the target area as the starting point.
  • the zoom-in operation instruction is used to instruct to zoom in on the target area displayed on the screen of the second terminal.
  • the panning operation instruction is used to instruct to pan the picture displayed on the screen of the second terminal in a specific direction.
  • after the second terminal executes the interaction instruction, the area to be displayed is the new area of interest; therefore, the second terminal can update the area of interest information according to the interaction instruction to obtain updated area of interest information.
  • the second terminal sends the updated ROI information to the first terminal.
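The three instruction types above (pan, zoom in, zoom out) amount to simple rectangle arithmetic on the ROI. The sketch below is a minimal assumption-laden model: the `(x, y, w, h)` format, the fixed factor of 2, and the function name are all invented for illustration.

```python
def apply_instruction(roi, instruction, dx=0, dy=0):
    """Update an ROI rectangle from a user interaction instruction."""
    x, y, w, h = roi
    if instruction == "pan":         # translate by (dx, dy)
        return (x + dx, y + dy, w, h)
    if instruction == "zoom_in":     # half-size area around the same centre
        return (x + w // 4, y + h // 4, w // 2, h // 2)
    if instruction == "zoom_out":    # double-size area around the same centre
        return (x - w // 2, y - h // 2, w * 2, h * 2)
    raise ValueError(instruction)

assert apply_instruction((100, 100, 200, 200), "pan", dx=50) == (150, 100, 200, 200)
assert apply_instruction((100, 100, 200, 200), "zoom_in") == (150, 150, 100, 100)
```

The updated rectangle is what the second terminal would feed back to the first terminal as the new region of interest information.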
  • the above describes a process in which the first terminal sends image frames with different zoom ratios to the second terminal, and the second terminal selects the image frames to be displayed according to the region of interest information.
  • the following will describe in detail the process of collecting multiple image frames by the first terminal and sending multiple image frames to the second terminal.
  • the first terminal may acquire the above-mentioned multiple image frames in multiple ways.
  • the first terminal may acquire the above-mentioned multiple image frames by means of digital zoom.
  • digital zoom refers to using the processor in the first terminal to enlarge each pixel of a partial area in the captured image, thereby magnifying that partial area. In fact, digital zoom does not change the focal length of the lens.
  • the first terminal sequentially collects one or more third image frames and the second image frame, and the zoom magnification of the one or more third image frames is the same as that of the second image frame. That is, the first terminal continuously collects multiple image frames at a specific zoom magnification, and these image frames include the above-mentioned one or more third image frames and the second image frame. Then, the first terminal crops the one or more third image frames according to the region of interest information to obtain the one or more first image frames.
  • the first terminal crops the third image frame according to the region of interest information in the first terminal, that is, crops out the region of interest at the position indicated by the region of interest information to obtain the first image frame.
  • the manner in which the first terminal acquires the first image frame is digital zooming.
  • FIG. 8 is a schematic diagram of obtaining multiple image frames by means of digital zooming according to an embodiment of the present application.
  • the first terminal sequentially captures image 1, image 2, image 3, image 4, image 5 and image 6 at a fixed zoom ratio, wherein image 1 to image 5 correspond to the third image frame, and image 6 corresponds to the second image frame described above.
  • the first terminal crops image 1 based on the region of interest information to obtain image A1; similarly, after collecting image 2, the first terminal crops image 2 based on the region of interest information to obtain image A2.
  • the first terminal obtains image A1-image A5 corresponding to image 1-image 5 by means of digital zoom, and image A1-image A5 corresponds to the above-mentioned first image frame.
  • for image 6, the first terminal does not crop it based on the region of interest information; that is, image 6 in FIG. 8 is the same as image B.
  • the first terminal may set a fixed interval quantity, and the interval quantity is used to indicate the quantity of image frames spaced between two adjacent global image frames.
  • the above-mentioned first image frame may be referred to as a local image frame
  • the above-mentioned second image frame may be referred to as a global image frame. That is to say, each time the first terminal collects a certain number of partial image frames, it collects one global image frame. For example, when the number of intervals is 4, the first terminal collects one global image frame for every 4 local image frames collected. In this way, when the first terminal collects images at a frame rate of 30, the first terminal collects 30 image frames per second, and the 30 image frames include 24 local image frames and 6 global image frames.
  • the above-mentioned number of intervals may be fixed, for example, the number of intervals is 4 or 5.
  • the number of intervals may also be non-fixed.
  • the first terminal collects a global image frame after an interval of 4 partial image frames, and then the first terminal collects the next global image frame after an interval of 5 partial image frames. This embodiment does not limit the number of intervals set in the first terminal.
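The fixed-interval schedule above (one global frame after every `interval` local frames, e.g. 24 local and 6 global frames per second at 30 fps with an interval of 4) can be sketched as a simple sequence generator; the function name is an illustrative assumption.

```python
def frame_schedule(total, interval):
    """Frame types for `total` captures: every (interval+1)-th capture
    is a global frame, the rest are ROI-cropped local frames."""
    return ["global" if i % (interval + 1) == 0 else "local"
            for i in range(1, total + 1)]

schedule = frame_schedule(30, 4)        # one second at 30 fps, interval 4
assert schedule.count("local") == 24
assert schedule.count("global") == 6
```

A non-fixed interval, as the text allows, would simply vary `interval` between global frames.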
  • the plurality of image frames collected by the first terminal may further include a fourth image frame, and the zoom magnification of the fourth image frame is the same as the zoom magnification of the second image frame.
  • the process in which the first terminal acquires multiple image frames specifically includes: the first terminal sequentially acquires the fourth image frame, the one or more first image frames and the second image frame according to a preset rule. The preset rule is that, after the first terminal collects a preset number of image frames based on the region of interest information, it collects one image frame at the target zoom magnification; the preset number equals the number of the one or more first image frames, and the target zoom magnification is the zoom magnification of the second image frame.
  • that is, each time the first terminal collects a specific number of image frames based on the region of interest information, it uses a lower zoom magnification to collect one global image frame, thereby ensuring the frequency at which the second terminal receives global image frames and ensuring that, when the second terminal subsequently crops the region of interest from a global image frame, the picture does not change too abruptly.
  • the first terminal inserts a global image frame after acquiring a plurality of local image frames based on the region of interest information.
  • the first terminal collects one or more third image frames, and the zoom magnification of the one or more third image frames is the same as the zoom magnification of the second image frame;
  • the region of interest information crops the one or more third image frames to obtain one or more first image frames;
  • the first terminal determines one of the one or more third image frames as the second image frame. That is to say, the first terminal continuously collects first image frames (that is, local image frames) by means of digital zooming based on the region of interest information, and inserts second image frames (that is, global image frames) among the collected first image frames.
  • FIG. 9 is another schematic diagram of obtaining multiple image frames according to an embodiment of the present application.
  • the first terminal sequentially acquires image 1, image 2, image 3, image 4 and image 5 at a fixed zoom ratio, wherein image 1-image 5 correspond to the above-mentioned third image frame.
  • the first terminal crops image 1 based on the region of interest information to obtain image A1; similarly, after acquiring image 2, the first terminal crops image 2 based on the region of interest information to obtain image A2.
  • the first terminal obtains image A1-image A5 corresponding to image 1-image 5 by means of digital zoom, and image A1-image A5 corresponds to the above-mentioned first image frame.
  • the first terminal determines image 5 as the second image frame; that is, image B is inserted after image A5 based on image 5, and image B is the same as image 5.
  • in the first manner, if the first terminal collects one global image frame every 5 partial image frames and collects 30 image frames per second, the 30 image frames include 25 partial image frames and 5 global image frames; for the second terminal, this means that it displays 25 partial image frames per second.
  • in the second manner, the first terminal collects 30 partial image frames per second and inserts 6 global image frames, that is, 36 image frames in total.
  • the second terminal displays 30 partial image frames per second. Therefore, by obtaining the image frame in the second manner, the frequency of displaying the image frame by the second terminal can be guaranteed, and the fluency of the video picture can be improved.
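The two acquisition schemes compared above differ only in whether the global frame replaces a capture or is inserted as an extra copy. A minimal sketch (function names are invented for illustration):

```python
def scheme1(captured=30, interval=5):
    """Replace every (interval+1)-th capture with a global frame."""
    return ["global" if i % (interval + 1) == 0 else "local"
            for i in range(1, captured + 1)]

def scheme2(captured=30, interval=5):
    """Keep all captures as local frames; after every `interval`-th one,
    insert a global copy of the frame just captured."""
    out = []
    for i in range(1, captured + 1):
        out.append("local")
        if i % interval == 0:
            out.append("global")    # duplicate of the i-th capture
    return out

assert scheme1().count("local") == 25 and scheme1().count("global") == 5
assert scheme2().count("local") == 30 and scheme2().count("global") == 6
assert len(scheme2()) == 36
```

The counts reproduce the text's arithmetic: scheme 2 delivers 30 displayable local frames per second instead of 25, at the cost of 6 extra transmitted frames.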
  • FIG. 10 is a schematic diagram for comparison of image frames obtained based on different methods according to an embodiment of the present application.
  • the first terminal acquires image 1-image 10 based on a specific zoom magnification.
  • the first terminal crops the image 1-image 4 and the image 6-image 9 based on the region of interest information, and obtains the image A1-image A8 respectively;
  • the first terminal obtains image B1 and image B2 based on image 5 and image 10. That is to say, the number of image frames sent by the first terminal to the second terminal is 10 in total.
  • when the first terminal obtains the image frames to be sent to the second terminal in the second way, it crops image 1 to image 10 based on the region of interest information to obtain image A1 to image A10 respectively, obtains image B1 and image B2 based on image 5 and image 10, and inserts image B1 after image A5 and image B2 after image A10. That is to say, the number of image frames sent by the first terminal to the second terminal is 12 in total.
  • in the second manner, the second terminal can receive and display more image frames, which ensures the frequency at which the second terminal displays image frames and improves the fluency of the video picture.
  • the first terminal may acquire the above-mentioned first image frame and second image frame by means of optical zooming.
  • the optical zoom means that the first terminal relies on the optical lens structure to achieve zoom, that is, the first terminal zooms in and out of the scene to be photographed by moving the lens.
  • while both digital zoom and optical zoom help to magnify distant objects in telephoto shooting, only optical zoom adds more pixels to the image subject, making the subject not only larger but also relatively clearer. That is to say, when an object in the image is enlarged by means of optical zoom, the object can be made relatively clearer.
  • since optical zoom takes the center of the lens's field of view as the center point and enlarges or reduces the captured picture by changing the focal length, the process of collecting the first image frame by means of optical zoom is actually to calculate the longest focal length whose field of view still covers the region of interest, and then acquire an image frame including the region of interest based on that longest focal length. If that image frame contains exactly the region of interest, it may be determined as the first image frame; if it also contains a non-interest region in addition to the region of interest, the region of interest may be cropped from it to obtain the first image frame.
  • the first terminal may adjust the zoom magnification by means of optical zooming based on the region of interest information, and acquire the above-mentioned first image frame. Then, the first terminal adjusts the zoom ratio to acquire the second image frame.
  • the first terminal may select a corresponding reference frame according to the frame type of the image frame.
  • in general, the sending end selects the previous image frame of the current image frame as the reference frame, calculates the difference information between the current image frame and the reference frame, and sends the difference information to the receiving end instead of sending the current image frame directly, thereby reducing the amount of transmitted data.
  • the first image frame and the second image frame are acquired based on different zoom ratios. Therefore, compared with two adjacent first image frames, the difference between the first image frame and the second image frame is relatively large.
  • the first terminal may select a corresponding reference frame according to the frame type of the image frame, so as to ensure that the frame type between the image frame currently to be encoded and the reference frame of the image frame is the same.
  • the first terminal obtains the reference frames of the multiple image frames, wherein the reference frame of the first image frame is obtained based on the region of interest information, and the zoom magnification of the reference frame of the second image frame is the same as that of the second image frame.
  • the frame type of the local image frame (that is, the above-mentioned first image frame) acquired by the first terminal based on the region of interest information is the first type, and the frame type of the global image frame (that is, the above-mentioned second image frame) acquired based on a specific zoom magnification is the second type.
  • for an image frame of the first type, the first terminal may determine that its reference frame is the previous image frame of the first type, that is, the image frame of the first type closest to it in the time domain.
  • for an image frame of the second type, the first terminal may determine that its reference frame is the previous image frame of the second type, that is, the image frame of the second type closest to it in the time domain.
  • that is, for any image frame, the reference frame of the image frame is the image frame of the same type closest to it in the time domain.
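The same-type reference rule can be sketched as a backwards search over the capture sequence. This is an illustrative model; the sequence below mirrors FIG. 11's labels (A1..A10 local, B1/B2 global), and `pick_reference` is an invented name.

```python
def pick_reference(frames, index):
    """Return the id of the nearest earlier frame with the same type,
    or None if this is the first frame of its type (intra-coded)."""
    kind = frames[index][1]
    for j in range(index - 1, -1, -1):
        if frames[j][1] == kind:
            return frames[j][0]
    return None

seq = [("A1", "local"), ("A2", "local"), ("A3", "local"),
       ("A4", "local"), ("A5", "local"), ("B1", "global"),
       ("A6", "local"), ("B2", "global")]
assert pick_reference(seq, 1) == "A1"   # A2 references A1
assert pick_reference(seq, 6) == "A5"   # A6 skips B1 and references A5
assert pick_reference(seq, 7) == "B1"   # B2 references B1, not A6
```

Skipping over frames of the other type keeps the reference visually similar to the frame being encoded, which is the point of the rule: a local frame differs greatly from an adjacent global frame.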
  • after obtaining the reference frames, the first terminal encodes the multiple image frames according to their reference frames to obtain an encoding result, and sends the encoding result to the second terminal.
  • FIG. 11 is a schematic diagram of determining a reference frame according to an embodiment of the present application.
  • the images A1 to A10 are images obtained by the first terminal based on the region of interest information, and the frame types of the images A1 to A10 are the first type.
  • the images B1 and B2 are images obtained by the first terminal based on a specific zoom ratio, and the frame types of the images B1 and B2 are the second type.
  • the reference frame of the image A2 is the image A1
  • the reference frame of the image A3 is the image A2
  • the reference frame of the image A6 is the image A5.
  • that is, for any image of the first type (except the first one), the reference frame of the image is the previous image belonging to the first type.
  • the reference frame for picture B2 is picture B1, not picture A10.
  • that is, for any image of the second type (except the first one), the reference frame of the image is the previous image belonging to the second type.
  • in the above encoding manner, the reference frame of an image frame to be encoded is not necessarily its immediately preceding image frame; therefore, encoders that strictly use the preceding image frame as the reference frame may not be able to implement this encoding well.
  • the first terminal encodes the multiple image frames according to their reference frames to obtain an encoding result, which may specifically include: the first terminal encodes the first image frame through a first encoder according to the reference frame of the first image frame to obtain a first encoding result; the first terminal encodes the second image frame through a second encoder according to the reference frame of the second image frame to obtain a second encoding result; the encoding result includes the first encoding result and the second encoding result.
  • in the first encoder, encoding is performed using the image frame preceding each first image frame as the reference frame.
  • similarly, the reference frame of the second image frame is the image frame preceding it in the second encoder, and that preceding image frame is of the same type as the second image frame.
  • for image frames of the first type, the first terminal inputs them into the first encoder, and the first encoder encodes them.
  • for image frames of the second type, the first terminal inputs them into the second encoder, and the second encoder encodes them.
  • FIG. 12 is a schematic diagram of encoding based on two encoders according to an embodiment of the present application.
  • the images A1 to A10 are images obtained by the first terminal based on the region of interest information, and the frame types of the images A1 to A10 are the first type.
  • the images B1 and B2 are images obtained by the first terminal based on a specific zoom magnification, and the frame types of the images B1 and B2 are the second type.
  • the input of the first encoder is image A1-image A10
  • the reference frame of image A2 is image A1
  • the reference frame of image A3 is image A2
  • the reference frame of image A6 is Image A5.
  • that is, for any image input to the first encoder (except the first one), the reference frame of the image is the previous image.
  • the inputs to the second encoder are image B1 and image B2.
  • the reference frame of picture B2 is picture B1. That is, for any image input to the second encoder (except the first image input to the second encoder), the reference frame of the image is the previous image.
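The two-encoder arrangement shown in FIG. 12 can be sketched as simple routing: each type gets its own encoder input queue, so a stock "previous frame is the reference" encoder produces the same references as the same-type rule. The queue representation and names here are illustrative assumptions.

```python
from collections import defaultdict

encoder_input = defaultdict(list)

def route(frame_id, frame_type):
    """Send local frames to the first encoder, global frames to the second."""
    key = "first" if frame_type == "local" else "second"
    encoder_input[key].append(frame_id)

for fid, ftype in [("A1", "local"), ("A2", "local"), ("B1", "global"),
                   ("A3", "local"), ("B2", "global")]:
    route(fid, ftype)

# Inside each encoder, the previous input is the reference frame,
# so A3 references A2 and B2 references B1 automatically.
assert encoder_input["first"] == ["A1", "A2", "A3"]
assert encoder_input["second"] == ["B1", "B2"]
```

This is why the dual-encoder variant sidesteps the limitation noted above: neither encoder ever needs to reference a frame that is not its immediately preceding input.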
  • FIG. 13 is a schematic flowchart of a remote video method according to an embodiment of the present application.
  • the remote video method includes steps 1301 to 1315, wherein steps 1301 to 1306 are performed by the first terminal, and steps 1307 to 1315 are performed by the second terminal.
  • Step 1301 During the process of collecting image frames, the first terminal determines whether the sequence number of the image frame currently to be collected is a multiple of N.
  • the first terminal continues to collect image frames through the camera.
  • the first terminal may collect one global image frame every N-1 local image frames. Therefore, by determining whether the sequence number of the image frame currently to be collected is a multiple of N, the first terminal can determine whether to collect a global image frame or a local image frame.
  • the first terminal collects one global image frame every 4 local image frames.
  • the first to fourth image frames collected by the first terminal are local image frames, and the fifth image frame is a global image frame;
  • the sixth to ninth image frames collected by the first terminal are local image frames, and the 10th image frame is a global image frame.
  • the sequence numbers of the global image frames collected by the first terminal are all multiples of N.
  • Step 1302 If the sequence number of the image frame currently to be collected is not a multiple of N, the first terminal collects a local image frame based on the region of interest information.
  • the partial image frame is acquired by the first terminal by means of digital zooming or optical zooming based on the region of interest information.
  • the partial image frame may be, for example, the above-mentioned first image frame.
  • the area of interest information may be area of interest information stored locally by the first terminal.
  • the first terminal may receive the area of interest information sent by the second terminal, and save the received area of interest information locally in the first terminal.
  • the first terminal may collect local image frames based on 4k resolution or 2k resolution.
  • Step 1303 If the sequence number of the image frame currently to be collected is a multiple of N, the first terminal collects a global image frame.
  • the global image frame is, for example, the above-mentioned second image frame, and the global image frame is obtained by the first terminal through a preset zoom magnification.
  • the zoom magnification of the global image frame is smaller than that of the local image frame, and the picture content of the global image frame includes the picture content of the region of interest and the picture content of the non-interest region.
  • optionally, when the sequence number of the image frame currently to be collected is a multiple of N, the first terminal may also continue to collect a local image frame and insert a global image frame after collecting it.
  • the first terminal may collect global image frames based on 4k resolution or 2k resolution.
  • Step 1304 the first terminal converts the captured image frame to 1080P resolution.
  • the first terminal converts the acquired local image frame or global image frame to 1080P resolution, that is, converts it to an image frame composed of 1920×1080 pixels.
  • for a local image frame, if its size is smaller than 1920×1080, it is converted to 1920×1080 by upsampling; if its size is larger than 1920×1080, it is converted to 1920×1080 by downsampling.
  • for a global image frame, its size is converted to 1920×1080 by downsampling.
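Step 1304's normalisation decision can be sketched as a comparison against the target pixel count. This is an illustrative decision function only; `normalise` and the area-based comparison are assumptions, not the patent's resampling algorithm.

```python
TARGET = (1920, 1080)   # 1080P output size for every encoded frame

def normalise(width, height):
    """Decide how to bring a captured frame to 1920x1080: upsample small
    ROI crops, downsample 4k/2k captures, pass through exact matches."""
    if (width, height) == TARGET:
        return "keep"
    small = width * height < TARGET[0] * TARGET[1]
    return "upsample" if small else "downsample"

assert normalise(1280, 720) == "upsample"      # small ROI crop
assert normalise(3840, 2160) == "downsample"   # 4k global capture
assert normalise(1920, 1080) == "keep"
```

Normalising both frame types to one resolution lets a single encoder configuration handle the whole stream.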
  • Step 1305 the first terminal selects a reference frame of the image frame according to the frame type, and encodes the image frame based on the reference frame; or, the first terminal encodes image frames of different frame types by using two encoders.
  • after acquiring an image frame, the first terminal encodes it. During the encoding process, the first terminal may select the reference frame of the image frame currently to be encoded according to its frame type.
  • the frame type of the local image frame may be defined as the first type
  • the frame type of the global image frame may be defined as the second type.
  • for an image frame of the first type, the first terminal may determine that its reference frame is the previous image frame of the first type, that is, the image frame of the first type closest to it in the time domain.
  • for an image frame of the second type, the first terminal may determine that its reference frame is the previous image frame of the second type, that is, the image frame of the second type closest to it in the time domain. In other words, for any image frame, the reference frame is the image frame of the same type closest to it in the time domain.
  • the first terminal may input image frames of different frame types into different encoders, so as to encode the image frames using two encoders. Specifically, the first terminal inputs image frames whose frame type is the first type into the first encoder, and the first encoder encodes this part of the image frames; the first terminal inputs image frames whose frame type is the second type into the second encoder, and the second encoder encodes this part of the image frames.
  • when the first encoder encodes each image frame of the first type, it uses the image frame preceding that image frame as the reference frame.
  • when the second encoder encodes an image frame of the second type, the reference frame of that image frame is the image frame preceding it in the second encoder.
  • Step 1306 the first terminal carries the indication information of the current image frame in the code stream, and sends the code stream to the second terminal.
  • the first terminal continues to encode the acquired image frame to obtain a code stream.
  • the first terminal may carry the indication information of the current image frame in the SEI (supplemental enhancement information) of the code stream, and send the code stream to the second terminal.
  • the indication information is used to indicate the frame type of the current image frame and the region of interest information corresponding to the current image frame.
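The per-frame indication information above (frame type plus ROI) can be modelled as a small metadata payload travelling alongside each frame. The JSON encoding and field names below are assumptions for illustration; the patent only says the information rides in the bitstream's SEI.

```python
import json

def make_indication(frame_type, roi):
    """Pack the frame type and the ROI it was produced for."""
    return json.dumps({"frame_type": frame_type, "roi": roi})

def parse_indication(payload):
    """Receiver side: recover the fields needed for steps 1310-1315."""
    return json.loads(payload)

sei = make_indication("global", [0, 0, 1920, 1080])
info = parse_indication(sei)
assert info["frame_type"] == "global"
assert info["roi"] == [0, 0, 1920, 1080]
```

The `frame_type` field drives the buffer routing (steps 1311-1312) and the `roi` field drives the display decision (steps 1313-1315).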
  • Step 1307 The second terminal receives the code stream sent by the first terminal, and decodes it to obtain the image frame and corresponding indication information.
  • after receiving the code stream sent by the first terminal, the second terminal decodes the code stream to obtain an image frame and the indication information corresponding to the image frame.
  • Step 1308 the second terminal acquires the interaction instruction, and obtains the region of interest indicated by the interaction instruction by parsing the interaction instruction.
  • the second terminal may acquire the interaction instruction triggered by the user, and obtain the region of interest indicated by the interaction instruction by parsing the interaction instruction.
  • the interaction instruction may be a zoom-out operation instruction, a zoom-in operation instruction, or a pan operation instruction initiated by the user by touching the screen of the second terminal.
  • Step 1309 the second terminal updates the local area of interest information according to the area of interest indicated by the interaction instruction.
  • after parsing the interaction instruction to obtain the region of interest it indicates, the second terminal updates the local region of interest information according to that region of interest.
  • Step 1310 The second terminal determines whether the current image frame is a global image frame according to the indication information carried by the image frame.
  • the second terminal can judge whether the current image frame is a global image frame according to the indication information corresponding to the image frame in the code stream, that is, judge whether the frame type of the current image frame is the second type.
  • Step 1311 if the current image frame is not the global image frame, send the current image frame to the first buffer.
  • when another local image frame is already stored in the first buffer, the second terminal overwrites it with the new local image frame, so that there is always only one image frame in the first buffer.
  • Step 1312 if the current image frame is a global image frame, send the current image frame to the second buffer.
  • when another global image frame is already stored in the second buffer, the second terminal overwrites it with the new global image frame, so that there is always only one image frame in the second buffer.
  • Step 1313: The second terminal determines whether the region-of-interest information corresponding to the image frame in the first buffer is consistent with the second terminal's local region-of-interest information.
  • That is, the second terminal compares the region-of-interest information corresponding to the image frame in the first buffer with its own local region-of-interest information.
  • Step 1314: If the region-of-interest information corresponding to the image frame in the first buffer is consistent with the second terminal's local region-of-interest information, the second terminal sends the partial image frame in the first buffer to the display screen for display.
  • Step 1315: If the region-of-interest information corresponding to the image frame in the first buffer is inconsistent with the second terminal's local region-of-interest information, the second terminal crops a partial region from the global image frame in the second buffer and sends the cropped image frame to the display screen for display.
  • The position of the new region of interest can be determined within the global image frame according to the second terminal's local region-of-interest information, and that position can be cropped out to obtain a new image frame.
  • The content of the new image frame is the content at the position indicated by the second terminal's region-of-interest information.
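As a concrete illustration of steps 1310-1315, the receiver-side selection logic can be sketched as follows. This is a minimal sketch under assumed data structures (a decoded frame as a dict with `is_global`, `roi`, and `pixels` fields, and an image as a nested list); the names are illustrative and not taken from the patent.

```python
class Receiver:
    def __init__(self, local_roi):
        self.local_roi = local_roi   # (x, y, w, h) the user currently wants to see
        self.partial_buf = None      # first buffer: at most one partial (ROI) frame
        self.global_buf = None       # second buffer: at most one global frame

    def on_frame(self, frame):
        """Route a decoded frame to its single-slot buffer (new overwrites old)."""
        if frame["is_global"]:
            self.global_buf = frame
        else:
            self.partial_buf = frame

    def pick_display(self):
        """Return the pixels to display for the current local ROI."""
        if self.partial_buf and self.partial_buf["roi"] == self.local_roi:
            # Sender already cropped exactly what we want: show it directly.
            return self.partial_buf["pixels"]
        # ROI changed locally: crop the wanted region out of the global frame.
        x, y, w, h = self.local_roi
        g = self.global_buf["pixels"]    # 2-D list standing in for an image
        return [row[x:x + w] for row in g[y:y + h]]
```

Because each buffer holds only the newest frame of its kind, the crop in the mismatch branch always uses the most recent global picture, which is what keeps the response immediate when the ROI changes.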
  • FIG. 14 is a schematic structural diagram of a terminal 1400 provided by an embodiment of the present application.
  • The terminal 1400 includes an acquisition unit 1401, a processing unit 1402, and a transceiver unit 1403. The acquisition unit 1401 is configured to acquire multiple image frames, including a first image frame and a second image frame, where the zoom magnification of the first image frame is greater than that of the second image frame, the first image frame is determined according to region-of-interest information, and the region-of-interest information is used to indicate the position of the region of interest. The first terminal sends the multiple image frames and indication information to the second terminal, so that the second terminal selects, according to the indication information, image frames to be displayed, where the image frames to be displayed are used to generate a video and the indication information includes the region-of-interest information of the first image frame.
  • the second image frame includes the region of interest indicated in the region of interest information.
  • The acquiring unit 1401 is further configured to acquire a third image frame and the second image frame, where the zoom magnification of the third image frame is the same as that of the second image frame;
  • the processing unit 1402 is further configured to crop the third image frame according to the region of interest information to obtain the first image frame.
  • The acquiring unit 1401 is further configured to acquire one or more third image frames, where the zoom magnification of the one or more third image frames is the same as that of the second image frame;
  • the processing unit 1402 is further configured to crop the one or more third image frames according to the region of interest information to obtain one or more of the first image frames;
  • the processing unit 1402 is further configured to determine a third image frame among the one or more third image frames as the second image frame.
  • the acquiring unit 1401 acquires the first image frame and the second image frame by means of optical zooming.
  • The plurality of image frames further include a fourth image frame, and the zoom magnification of the fourth image frame is the same as that of the second image frame;
  • The acquiring unit 1401 is further configured to sequentially acquire the fourth image frame, one or more first image frames, and the second image frame according to a preset rule, where the preset rule is that after the first terminal has captured a preset number of image frames based on the region-of-interest information, it captures one image frame at a target zoom magnification; the preset number is the same as the number of the one or more first image frames, and the target zoom magnification is the zoom magnification of the second image frame.
  • the obtaining unit 1401 is further configured to obtain the reference frames of the multiple image frames;
  • The processing unit 1402 is further configured to encode the multiple image frames according to the reference frames of the multiple image frames to obtain an encoding result;
  • The transceiver unit 1403 is further configured to send the encoding result to the second terminal; wherein the reference frame of the first image frame is obtained according to the region-of-interest information, and the zoom magnification of the reference frame of the second image frame is the same as that of the second image frame.
  • The processing unit 1402 is further configured to encode the first image frame with a first encoder according to the reference frame of the first image frame to obtain a first encoding result;
  • The processing unit 1402 is further configured to encode the second image frame with a second encoder according to the reference frame of the second image frame to obtain a second encoding result; wherein the encoding result includes the first encoding result and the second encoding result.
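The two-encoder arrangement described above can be illustrated with a toy difference codec. The sketch below is assumption-laden (a "frame" is just a list of numbers, and "encoding" is a plain element-wise difference), not the real H.264/H.265 pipeline; it only shows the routing idea: each encoder keeps its own previous frame as reference, so a partial frame is never predicted from a global frame or vice versa.

```python
class DiffEncoder:
    def __init__(self):
        self.ref = None  # last frame this encoder saw, used as reference

    def encode(self, frame):
        if self.ref is None:
            out = ("intra", frame)                      # no reference yet: code as-is
        else:
            diff = [a - b for a, b in zip(frame, self.ref)]
            out = ("inter", diff)                       # code difference to reference
        self.ref = frame
        return out

def encode_stream(frames):
    """frames: list of (is_global, data); route each to its own encoder."""
    enc_partial, enc_global = DiffEncoder(), DiffEncoder()
    results = []
    for is_global, data in frames:
        enc = enc_global if is_global else enc_partial
        results.append((is_global, enc.encode(data)))
    return results
```

Note that when a global frame interrupts a run of partial frames, the next partial frame still produces a small "inter" difference, because its encoder's reference was not disturbed by the global frame.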
  • the indication information further includes frame types of the multiple image frames, and the frame type of the first image frame is different from the frame type of the second image frame.
  • The transceiver unit 1403 is configured to receive multiple image frames and indication information sent by the second terminal, where the multiple image frames include a first image frame and a second image frame, the zoom magnification of the first image frame is greater than that of the second image frame, the first image frame is determined according to region-of-interest information, the region-of-interest information is used to indicate the position of the region of interest, and the indication information includes the region-of-interest information of the first image frame. The processing unit 1402 is configured to: if the region-of-interest information of the first image frame is different from the region-of-interest information in the first terminal, crop the second image frame according to the region-of-interest information in the first terminal to obtain a third image frame, and display the third image frame; if the region-of-interest information of the first image frame is the same as the region-of-interest information in the first terminal, display the first image frame.
  • the second image frame includes the region of interest indicated in the region of interest information.
  • The first image frame is obtained by the second terminal by cropping one or more third image frames, after capturing them, according to the region-of-interest information in the second terminal, and the zoom magnification of the one or more third image frames is the same as that of the second image frame.
  • the second image frame is one image frame among the one or more third image frames.
  • the plurality of image frames are acquired by the second terminal by means of optical zooming.
  • The plurality of image frames further include a fourth image frame, and the zoom magnification of the fourth image frame is the same as that of the second image frame;
  • The fourth image frame, the one or more first image frames, and the second image frame are sequentially acquired by the second terminal according to a preset rule, where the preset rule is that after the second terminal has captured a preset number of image frames based on the region-of-interest information, it captures one image frame at a target zoom magnification; the preset number is the same as the number of the one or more first image frames, and the target zoom magnification is the zoom magnification of the second image frame.
  • the indication information further includes frame types of the multiple image frames, and the frame type of the first image frame is different from the frame type of the second image frame;
  • the processing unit 1402 is further configured to send the first image frame to the first buffer and send the second image frame to the second buffer in sequence according to the frame types of the multiple image frames;
  • The processing unit 1402 is further configured to: if the region-of-interest information of the first image frame in the first buffer is different from the region-of-interest information in the first terminal, crop the second image frame in the second buffer according to the region-of-interest information in the first terminal to obtain a third image frame, and display the third image frame; if the region-of-interest information of the first image frame in the first buffer is the same as the region-of-interest information in the first terminal, display the first image frame in the first buffer.
  • The acquiring unit 1401 is configured to acquire an interaction instruction, where the interaction instruction is used to instruct to change the region of interest; the processing unit 1402 is further configured to update the region-of-interest information according to the interaction instruction to obtain updated region-of-interest information; the transceiver unit 1403 is further configured to send the updated region-of-interest information to the second terminal.
  • the remote video method provided in this embodiment of the present application may be specifically executed by a chip in the terminal.
  • the chip includes: a processing unit 1402 and a communication unit.
  • The processing unit 1402 may be, for example, a processor, and the communication unit may be, for example, an input/output interface, pins, or circuits.
  • The processing unit 1402 can execute the computer-executable instructions stored in the storage unit, so that the chip in the terminal executes the remote video method described in the embodiments shown in FIG. 1 to FIG. 13.
  • the storage unit is a storage unit in the chip, such as a register, a cache, etc.
  • The storage unit may also be a storage unit located outside the chip, such as a read-only memory (ROM) or another type of static storage device that can store static information and instructions, or a random access memory (RAM).
  • the present application also provides a computer program product.
  • The method disclosed in FIG. 5 may be implemented as computer program instructions encoded in a machine-readable format on a computer-readable storage medium.
  • FIG. 15 schematically illustrates a conceptual partial view of an example computer program product including a computer program for executing a computer process on a computing device, arranged in accordance with at least some embodiments presented herein.
  • computer program product 1500 is provided using signal bearing medium 1501 .
  • Signal bearing medium 1501 may include one or more program instructions 1502 that, when executed by one or more processors, may provide the functions, or portions thereof, described above with respect to FIG. 2 .
  • program instructions 1502 in Figure 15 also describe example instructions.
  • the signal bearing medium 1501 may include a computer-readable medium 1503, such as, but not limited to, a hard drive, a compact disc (CD), a digital video disc (DVD), a digital tape, memory, ROM or RAM, and the like.
  • the signal bearing medium 1501 may include a computer recordable medium 1504 such as, but not limited to, memory, read/write (R/W) CDs, R/W DVDs, and the like.
  • the signal bearing medium 1501 may include a communication medium 1505, such as, but not limited to, digital and/or analog communication media (eg, fiber optic cables, waveguides, wired communication links, wireless communication links, etc.).
  • signal bearing medium 1501 may be conveyed by a wireless form of communication medium 1505 (eg, a wireless communication medium that conforms to the IEEE 802.15 standard or other transmission protocol).
  • the one or more program instructions 1502 may be, for example, computer-executable instructions or logic-implemented instructions.
  • A computing device may be configured to provide various operations, functions, or actions in response to the program instructions 1502 conveyed to the computing device by one or more of the computer-readable medium 1503, the computer-recordable medium 1504, and/or the communication medium 1505.
  • the disclosed system, apparatus and method may be implemented in other manners.
  • the apparatus embodiments described above are only illustrative.
  • The division of the units is only a logical function division; in actual implementation, there may be other division manners.
  • Multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • The shown or discussed mutual coupling, direct coupling, or communication connection may be implemented through some interfaces; the indirect coupling or communication connection between apparatuses or units may be in electrical, mechanical, or other forms.
  • The units described as separate components may or may not be physically separate, and components displayed as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
  • each functional unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit.
  • the above-mentioned integrated units may be implemented in the form of hardware, or may be implemented in the form of software functional units.
  • the integrated unit if implemented in the form of a software functional unit and sold or used as an independent product, may be stored in a computer-readable storage medium.
  • The technical solutions of this application, in essence, or the part contributing to the prior art, or all or part of the technical solutions, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods described in the embodiments of this application.
  • The aforementioned storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory, a random access memory, a magnetic disk, or an optical disc.


Abstract

Embodiments of this application disclose a remote video method applied to a terminal. The method includes: a first terminal acquires a first image frame and a second image frame, where the zoom magnification of the first image frame is greater than that of the second image frame and the first image frame is determined according to region-of-interest information; the first terminal sends multiple image frames and indication information to a second terminal, so that the second terminal selects image frames to display according to the indication information, where the indication information includes the region-of-interest information of the first image frame. By inserting image frames of lower zoom magnification into the stream of frames related to the region of interest, the receiving end can promptly crop and display the picture corresponding to a changed region of interest from a lower-magnification frame, which shortens the response time when the video's region of interest is adjusted and avoids visible stuttering of the video picture.

Description

A remote video method and related apparatus
This application claims priority to Chinese Patent Application No. 202110327092.4, filed with the China National Intellectual Property Administration on March 26, 2021 and entitled "A remote video method and related apparatus", which is incorporated herein by reference in its entirety.
Technical Field
This application relates to the field of communication technologies, and in particular, to a remote video method and a related apparatus.
Background
With the development of communication technologies, remote video is widely used in scenarios such as video calls, telemedicine, and remote education. Constrained by power consumption and latency, the picture resolution of remote video usually does not exceed 1080p. In many situations, however, if the video is captured at 1080p, its sharpness may not meet practical needs. For example, in a remote homework tutoring scenario, small text in the video picture may remain illegible even after the picture is enlarged.
For this reason, in the related art, the resolution at which the sending end captures video is increased, and the captured video is downscaled to 1080p before being sent to the receiving end. In this way, when the receiving end needs to enlarge a certain region (usually called the region of interest), it feeds that region back to the sending end. After obtaining the region to be enlarged, the sending end crops it out of the captured video picture, converts the cropped picture to 1080p, and sends it to the receiving end. Because the cropped picture was captured at high resolution, its sharpness can be guaranteed to meet requirements.
However, having the receiving end remotely control the sending end incurs a long response time and easily causes visible stuttering of the video picture. For example, after the receiving end performs a zoom-in operation, it must wait a relatively long time (usually more than 300 milliseconds) before it receives the sender's data and can display the enlarged picture.
Summary
Embodiments of this application provide a remote video method in which the sending end, while sending image frames related to the region of interest to the receiving end, inserts image frames of lower zoom magnification, i.e., frames whose picture covers more content. In this way, when the region of interest at the receiving end changes, the receiving end can promptly crop the picture corresponding to the changed region of interest out of a lower-magnification frame and display it; that is, the receiving end can display the adjusted picture without a long wait, which shortens the response time when the video's region of interest is adjusted and avoids stuttering of the video picture.
A first aspect of this application provides a remote video method applied to a first terminal that acts as the video capture end and video sending end in a remote video session. The method includes: the first terminal acquires multiple image frames, including a first image frame and a second image frame. The zoom magnification of the first image frame is greater than that of the second image frame, where zoom magnification refers to the magnification of the image output by the camera's image sensor. With the camera stationary, a larger zoom magnification makes the subject in the output image larger and the captured field of view smaller; a smaller zoom magnification makes the subject smaller and the field of view larger.
The first image frame is determined according to region-of-interest information, which indicates the position of the region of interest. The first terminal may store region-of-interest information fed back by the second terminal. The region of interest is the region obtained when the user enlarges, shrinks, or pans the video picture shown on the terminal screen through interactions such as touching the screen with a finger; in short, it is the region to be displayed on the terminal screen during the remote video session. Based on the region-of-interest information in the first terminal, the first terminal can determine where the region of interest is located and thus acquire the first image frame related to it. The picture content of the first image frame is the content of the region of interest.
The first terminal sends the multiple image frames and indication information to the second terminal, so that the second terminal selects image frames to display according to the indication information, where the frames to display are used to generate the video and the indication information includes the region-of-interest information of the first image frame. The indication information may include coordinate information indicating the position of the region of interest. For example, when the region of interest is a rectangular region, the indication information may include the coordinates of the rectangle's four vertices. As another example, the indication information may include the coordinates of one vertex of the rectangle (such as its top-left vertex) together with the rectangle's width and height; from these, the coordinates of the four vertices can likewise be computed.
In this solution, the first terminal acting as the sending end inserts lower-magnification frames, i.e., frames covering more content, into the stream of ROI-related frames sent to the second terminal acting as the receiving end. When the region of interest at the second terminal changes, the second terminal can promptly crop and display the picture corresponding to the changed region of interest from a lower-magnification frame without waiting a long time, which shortens the response time when the region of interest is adjusted and avoids stuttering of the video picture.
Optionally, in a possible implementation, the first image frame includes only the region of interest indicated by the region-of-interest information, while the second image frame includes other regions in addition to the indicated region of interest. In other words, the picture of the first image frame contains only the content at the position of the region of interest, whereas the picture of the second image frame also contains other content.
For example, the second image frame may be an image frame captured by the first terminal at a preset zoom magnification, while the first image frame may be captured with the zoom magnification adjusted according to the region-of-interest information. With the first terminal stationary, the second image frame can be regarded as a global image acquired by the first terminal, covering the entire field of view of its camera; the first image frame can be regarded as a local image, covering only part of that field of view.
Optionally, in a possible implementation, acquiring the multiple image frames includes: the first terminal sequentially captures a third image frame and the second image frame, the third image frame having the same zoom magnification as the second image frame; that is, the first terminal continuously captures multiple frames at a specific zoom magnification, among which are the third and second image frames. The first terminal then crops the third image frame according to the region-of-interest information to obtain the first image frame. Specifically, each time the first terminal captures a third image frame, it crops out the region of interest at the position indicated by its region-of-interest information, thereby obtaining the first image frame. Acquiring the first image frame in this way is digital zoom.
Optionally, in a possible implementation, acquiring the multiple image frames includes: the first terminal captures one or more third image frames whose zoom magnification is the same as that of the second image frame; the first terminal crops the one or more third image frames according to the region-of-interest information to obtain one or more first image frames; and the first terminal determines one of the one or more third image frames as the second image frame. That is, based on the region-of-interest information, the first terminal can continuously acquire first image frames (local frames) by digital zoom and insert a second image frame (a global frame) among them.
In other words, each time the first terminal has captured a specific number of frames based on the region-of-interest information, it inserts one global frame of lower zoom magnification. The number of frames the first terminal sends to the second terminal is therefore larger than the number it actually captures. Inserting global frames among consecutive local frames guarantees the frequency at which the second terminal displays local frames and improves the smoothness of the video picture.
Optionally, in a possible implementation, acquiring the multiple image frames includes: the first terminal captures the first image frame and the second image frame by optical zoom. Optical zoom means the first terminal zooms through its optical lens structure, i.e., it enlarges or shrinks the scene to be shot by moving lens elements. Although both digital zoom and optical zoom help magnify distant objects in telephoto shooting, only optical zoom adds more pixels to the imaged subject, making it not only larger but also relatively sharper. That is, enlarging an object in the image by optical zoom yields a relatively sharper result.
Because optical zoom magnifies or shrinks the captured frame around the center of the lens's field of view by changing the focal length, the first terminal's process of capturing the first image frame by optical zoom is in fact to compute the longest focal length at which the field of view still covers the region of interest, and then acquire the first image frame at that focal length.
Optionally, in a possible implementation, the multiple image frames captured by the first terminal may further include a fourth image frame whose zoom magnification is the same as that of the second image frame. Acquiring the multiple image frames then specifically includes: the first terminal sequentially captures the fourth image frame, one or more first image frames, and the second image frame according to a preset rule, where the preset rule is that after the first terminal has captured a preset number of frames based on the region-of-interest information, it captures one frame at a target zoom magnification; the preset number equals the number of the one or more first image frames, and the target zoom magnification is that of the second image frame.
In this way, each time the first terminal has captured a specific number of frames based on the region-of-interest information, it captures one global frame at the lower zoom magnification, which guarantees the frequency at which the second terminal receives global frames and ensures that, when the second terminal later crops the region of interest from a global frame, the picture does not change too abruptly.
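The preset rule above — a fixed number of ROI frames, then one global frame — amounts to a simple periodic capture schedule. A minimal sketch, with the actual camera capture calls abstracted away:

```python
def frame_schedule(total, n):
    """Return 'partial'/'global' capture decisions for `total` frames:
    after every n partial (ROI-magnification) frames, capture one frame
    at the target (lower, global) zoom magnification."""
    kinds = []
    count = 0
    for _ in range(total):
        if count < n:
            kinds.append("partial")   # captured at the ROI zoom magnification
            count += 1
        else:
            kinds.append("global")    # captured at the target zoom magnification
            count = 0
    return kinds
```

For example, with a preset number of 3, every fourth captured frame is a global frame, bounding how stale the receiver's global buffer can become.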
Optionally, in a possible implementation, when encoding the multiple image frames, the first terminal may select the corresponding reference frame according to each frame's frame type.
Generally, during video encoding the sending end selects the frame preceding the current frame as the reference frame, computes the difference information between the current frame and the reference frame, and sends that difference to the receiving end instead of the current frame itself, thereby reducing the amount of transmitted data. In this solution, however, the first and second image frames are captured at different zoom magnifications, so the difference between a first image frame and a second image frame is much larger than the difference between two adjacent first image frames. If the second image frame were encoded with its preceding frame (a first image frame) as reference, the resulting difference information would be large, increasing the amount of transmitted data. Therefore, the first terminal may select the reference frame according to frame type, ensuring that the frame being encoded and its reference frame are of the same frame type.
Specifically, sending the multiple image frames to the second terminal includes: the first terminal obtains the reference frames of the multiple image frames; the first terminal encodes the multiple image frames according to their reference frames to obtain an encoding result; and the first terminal sends the encoding result to the second terminal. The reference frame of the first image frame is obtained according to the region-of-interest information, and the reference frame of the second image frame has the same zoom magnification as the second image frame.
Simply put, during capture, local frames acquired based on the region-of-interest information (the first image frames above) are of a first type, and global frames acquired at the specific zoom magnification (the second image frames above) are of a second type. For any frame of the first type, the first terminal may take as its reference the previous first-type frame, i.e., the temporally nearest earlier frame of the first type; likewise, for any frame of the second type, the reference is the temporally nearest earlier frame of the second type. That is, for any image frame, its reference frame is the temporally nearest earlier frame of the same type.
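The reference-frame rule — nearest earlier frame of the same type — can be expressed compactly. A sketch (frame types are arbitrary labels; the function returns, for each frame, the index of its reference frame, or `None` for the first frame of its type):

```python
def reference_indices(types):
    """For each frame, pick the nearest earlier frame of the same type."""
    last = {}          # frame type -> index of most recent frame of that type
    refs = []
    for i, t in enumerate(types):
        refs.append(last.get(t))   # None if no earlier frame of this type exists
        last[t] = i
    return refs
```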
Optionally, in a possible implementation, encoding the multiple image frames according to their reference frames includes: the first terminal encodes the first image frame with a first encoder according to the first image frame's reference frame to obtain a first encoding result, and encodes the second image frame with a second encoder according to the second image frame's reference frame to obtain a second encoding result; the encoding result includes the first encoding result and the second encoding result.
When the first encoder encodes each first image frame, it uses the frame preceding that frame as the reference frame. When the second encoder encodes the second image frame, the reference frame is the frame preceding the second image frame within the second encoder, which is of the same type as the second image frame.
By using two encoders for the two frame types, this solution ensures that the image frames are encoded smoothly.
Optionally, in a possible implementation, to help the second terminal distinguish the first and second image frames among the multiple image frames, the indication information further includes the frame types of the multiple image frames, the frame type of the first image frame being different from that of the second image frame. For example, the indication information indicates that the first image frame is of the first type and the second image frame is of the second type.
For example, when a bit used to indicate the frame type in the indication information is set to 1, the current frame is of the second type, i.e., the second image frame above; when that bit is not set (its value is 0), the current frame is of the first type, i.e., the first image frame above.
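One way the indication information could be laid out is sketched below: a flags byte whose lowest bit marks the frame type (1 = global/second type, 0 = partial/first type), followed by the ROI as four 16-bit values (x, y, width, height). This field layout is an illustrative assumption, not the patent's actual syntax (which, as noted elsewhere in the description, may be carried in H.264/H.265 SEI).

```python
import struct

def pack_indication(is_global, roi):
    """Serialize (frame-type flag, ROI rectangle) into 9 bytes, big-endian."""
    flags = 1 if is_global else 0
    return struct.pack(">B4H", flags, *roi)

def parse_indication(buf):
    """Inverse of pack_indication: recover the type flag and ROI."""
    flags, x, y, w, h = struct.unpack(">B4H", buf)
    return bool(flags & 1), (x, y, w, h)
```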
A second aspect of this application provides a remote video method applied to a first terminal acting as the video receiving end. The method includes: the first terminal receives multiple image frames and indication information sent by a second terminal, the multiple image frames including a first image frame and a second image frame, the zoom magnification of the first image frame being greater than that of the second image frame, the first image frame being determined according to region-of-interest information that indicates the position of the region of interest, and the indication information including the region-of-interest information of the first image frame. If the region-of-interest information of the first image frame differs from the region-of-interest information in the first terminal, the first terminal crops the second image frame according to its own region-of-interest information to obtain a third image frame and displays the third image frame; if they are the same, it displays the first image frame.
Specifically, the second terminal acting as the receiving end can receive interaction instructions from the user and update its region-of-interest information according to them. However, the second terminal must feed the updated region-of-interest information back to the first terminal, and the first terminal acquires new image frames based on the updated information before sending them to the second terminal. Frames matching the updated region-of-interest information therefore lag behind: for a period after the second terminal updates its local region-of-interest information, the region-of-interest information corresponding to the frames it receives differs from the updated information.
Therefore, before displaying the first image frame, the second terminal determines whether the region-of-interest information of the first image frame is the same as the region-of-interest information it has saved. If not, the second terminal crops the second image frame according to its own region-of-interest information to obtain the third image frame and displays it. Simply put, because the zoom magnification of the second image frame is smaller than that of the first image frame, the second image frame is in fact a global frame; when the region of interest changes, the position of the new region of interest can be determined within the second image frame and cropped out to obtain the third image frame, whose content is the content at the position indicated by the second terminal's region-of-interest information.
Optionally, in a possible implementation, the second image frame includes the region of interest indicated by the region-of-interest information.
Optionally, in a possible implementation, the first image frame is obtained by the second terminal by cropping one or more captured third image frames according to the region-of-interest information in the second terminal, the zoom magnification of the one or more third image frames being the same as that of the second image frame.
Optionally, in a possible implementation, the second image frame is one of the one or more third image frames.
Optionally, in a possible implementation, the multiple image frames are acquired by the second terminal by optical zoom.
Optionally, in a possible implementation, the multiple image frames further include a fourth image frame whose zoom magnification is the same as that of the second image frame; the fourth image frame, the one or more first image frames, and the second image frame are acquired by the second terminal in sequence according to a preset rule, namely that after capturing a preset number of frames based on the region-of-interest information, the second terminal captures one frame at a target zoom magnification, the preset number equaling the number of the one or more first image frames and the target zoom magnification being that of the second image frame.
Optionally, in a possible implementation, the indication information further includes the frame types of the multiple image frames, the frame type of the first image frame differing from that of the second image frame. The method further includes: according to the frame types of the multiple image frames, the first terminal sends first image frames to a first buffer and second image frames to a second buffer, where the first buffer stores frames of the first type and the second buffer stores frames of the second type. Each time the second terminal receives a new frame, it checks the frame type: a first-type frame goes to the first buffer and a second-type frame to the second buffer. In both buffers, an old frame is overwritten by a new one.
Accordingly, if the region-of-interest information of the first image frame in the first buffer differs from that in the first terminal, the first terminal crops the second image frame in the second buffer according to its own region-of-interest information to obtain the third image frame and displays it; if the region-of-interest information of the first image frame in the first buffer is the same as that in the first terminal, the first terminal displays the first image frame in the first buffer.
Optionally, in a possible implementation, the method further includes: when the user initiates an interaction instruction on the second terminal, the second terminal obtains the instruction, which indicates a change of the region of interest. The interaction instruction may be, for example, a zoom-out, zoom-in, or pan instruction initiated by touching the second terminal's screen: a zoom-out instruction shrinks the displayed picture starting from a target region, a zoom-in instruction enlarges a target region shown on the screen, and a pan instruction shifts the displayed picture in a particular direction. The region to be displayed after the instruction is executed is the new region of interest, so the second terminal can update its region-of-interest information according to the instruction to obtain updated region-of-interest information and, after updating the locally saved information, send the updated region-of-interest information to the first terminal.
A third aspect of this application provides a terminal, including an acquisition unit, a processing unit, and a transceiver unit. The acquisition unit is configured to acquire multiple image frames, including a first image frame and a second image frame, the zoom magnification of the first image frame being greater than that of the second image frame, the first image frame being determined according to region-of-interest information that indicates the position of the region of interest. The first terminal sends the multiple image frames and indication information to a second terminal, so that the second terminal selects frames to display according to the indication information, the frames to display being used to generate a video and the indication information including the region-of-interest information of the first image frame.
Optionally, in a possible implementation, the second image frame includes the region of interest indicated by the region-of-interest information.
Optionally, in a possible implementation, the acquisition unit is further configured to capture a third image frame and the second image frame, the third image frame having the same zoom magnification as the second image frame; the processing unit is further configured to crop the third image frame according to the region-of-interest information to obtain the first image frame.
Optionally, in a possible implementation, the acquisition unit is further configured to capture one or more third image frames having the same zoom magnification as the second image frame; the processing unit is further configured to crop the one or more third image frames according to the region-of-interest information to obtain one or more first image frames, and to determine one of the one or more third image frames as the second image frame.
Optionally, in a possible implementation, the acquisition unit captures the first image frame and the second image frame by optical zoom.
Optionally, in a possible implementation, the multiple image frames further include a fourth image frame having the same zoom magnification as the second image frame; the acquisition unit is further configured to sequentially capture the fourth image frame, the one or more first image frames, and the second image frame according to a preset rule, namely that after the first terminal has captured a preset number of frames based on the region-of-interest information, it captures one frame at a target zoom magnification, the preset number equaling the number of the one or more first image frames and the target zoom magnification being that of the second image frame.
Optionally, in a possible implementation, the acquisition unit is further configured to obtain the reference frames of the multiple image frames; the processing unit is further configured to encode the multiple image frames according to their reference frames to obtain an encoding result; the transceiver unit is further configured to send the encoding result to the second terminal; the reference frame of the first image frame is obtained according to the region-of-interest information, and the reference frame of the second image frame has the same zoom magnification as the second image frame.
Optionally, in a possible implementation, the processing unit is further configured to encode the first image frame with a first encoder according to its reference frame to obtain a first encoding result, and to encode the second image frame with a second encoder according to its reference frame to obtain a second encoding result; the encoding result includes the first encoding result and the second encoding result.
Optionally, in a possible implementation, the indication information further includes the frame types of the multiple image frames, the frame type of the first image frame differing from that of the second image frame.
A fourth aspect of this application provides a terminal, including an acquisition unit, a transceiver unit, and a processing unit. The transceiver unit is configured to receive multiple image frames and indication information sent by a second terminal, the multiple image frames including a first image frame and a second image frame, the zoom magnification of the first image frame being greater than that of the second image frame, the first image frame being determined according to region-of-interest information that indicates the position of the region of interest, and the indication information including the region-of-interest information of the first image frame. The processing unit is configured to: if the region-of-interest information of the first image frame differs from the region-of-interest information in the first terminal, crop the second image frame according to the region-of-interest information in the first terminal to obtain a third image frame and display the third image frame; if the region-of-interest information of the first image frame is the same as that in the first terminal, display the first image frame.
Optionally, in a possible implementation, the second image frame includes the region of interest indicated by the region-of-interest information.
Optionally, in a possible implementation, the first image frame is obtained by the second terminal by cropping one or more captured third image frames according to the region-of-interest information in the second terminal, the zoom magnification of the one or more third image frames being the same as that of the second image frame.
Optionally, in a possible implementation, the second image frame is one of the one or more third image frames.
Optionally, in a possible implementation, the multiple image frames are acquired by the second terminal by optical zoom.
Optionally, in a possible implementation, the multiple image frames further include a fourth image frame having the same zoom magnification as the second image frame; the fourth image frame, the one or more first image frames, and the second image frame are acquired by the second terminal in sequence according to a preset rule, namely that after capturing a preset number of frames based on the region-of-interest information, the second terminal captures one frame at a target zoom magnification, the preset number equaling the number of the one or more first image frames and the target zoom magnification being that of the second image frame.
Optionally, in a possible implementation, the indication information further includes the frame types of the multiple image frames, the frame type of the first image frame differing from that of the second image frame; the processing unit is further configured to send first image frames to a first buffer and second image frames to a second buffer according to the frame types of the multiple image frames; and the processing unit is further configured to: if the region-of-interest information of the first image frame in the first buffer differs from that in the first terminal, crop the second image frame in the second buffer according to the region-of-interest information in the first terminal to obtain a third image frame and display it; if the region-of-interest information of the first image frame in the first buffer is the same as that in the first terminal, display the first image frame in the first buffer.
Optionally, in a possible implementation, the acquisition unit is configured to obtain an interaction instruction indicating a change of the region of interest; the processing unit is further configured to update the region-of-interest information according to the interaction instruction to obtain updated region-of-interest information; and the transceiver unit is further configured to send the updated region-of-interest information to the second terminal.
A fifth aspect of this application provides a terminal, including a processor, a nonvolatile memory, and a volatile memory, where the nonvolatile memory or the volatile memory stores computer-readable instructions; the processor reads the computer-readable instructions to cause the terminal to implement the method of any implementation of the first or second aspect.
A sixth aspect of this application provides a computer-readable storage medium storing a computer program that, when run on a computer, causes the computer to perform the method of any implementation of the first or second aspect.
A seventh aspect of this application provides a computer program product that, when run on a computer, causes the computer to perform the method of any implementation of the first or second aspect.
An eighth aspect of this application provides a chip including one or more processors, some or all of which are configured to read and execute a computer program stored in a memory to perform the method of any possible implementation of any of the foregoing aspects.
Optionally, the chip includes the memory, which is connected to the processor through a circuit or wires. Optionally, the chip further includes a communication interface connected to the processor. The communication interface is configured to receive data and/or information to be processed; the processor obtains the data and/or information from the communication interface, processes it, and outputs the processing result through the communication interface. The communication interface may be an input/output interface. The method provided in this application may be implemented by one chip or implemented cooperatively by multiple chips.
Brief Description of Drawings
FIG. 1 is a schematic diagram of a video picture according to an embodiment of this application;
FIG. 2 is a schematic flowchart of remote video in related art 1;
FIG. 3 is a schematic flowchart of remote video in related art 2;
FIG. 4 is a schematic structural diagram of a terminal 101 according to an embodiment of this application;
FIG. 5 is a schematic flowchart of a remote video method according to an embodiment of this application;
FIG. 6 is a schematic comparison of images at different zoom magnifications according to an embodiment of this application;
FIG. 7 is a schematic comparison of region-of-interest information in different terminals at different moments according to an embodiment of this application;
FIG. 8 is a schematic diagram of obtaining multiple image frames by digital zoom according to an embodiment of this application;
FIG. 9 is another schematic diagram of obtaining multiple image frames according to an embodiment of this application;
FIG. 10 is a schematic comparison of image frames obtained in different ways according to an embodiment of this application;
FIG. 11 is a schematic diagram of determining a reference frame according to an embodiment of this application;
FIG. 12 is a schematic diagram of encoding with two encoders according to an embodiment of this application;
FIG. 13 is a schematic flowchart of a remote video method according to an embodiment of this application;
FIG. 14 is a schematic structural diagram of a terminal 1400 according to an embodiment of this application;
FIG. 15 is a schematic structural diagram of a computer program product 1500 according to an embodiment of this application.
Detailed Description
The following describes the embodiments of this application with reference to the accompanying drawings. Evidently, the described embodiments are only some rather than all of the embodiments of this application. A person of ordinary skill in the art will appreciate that, as technology evolves and new scenarios emerge, the technical solutions provided in the embodiments of this application remain applicable to similar technical problems.
The terms "first", "second", and the like in the specification, claims, and accompanying drawings of this application are used to distinguish similar objects and do not necessarily describe a particular order or sequence. It should be understood that data so used are interchangeable where appropriate, so that the embodiments described here can be implemented in orders other than those illustrated or described. Moreover, the terms "include" and "have" and any variants thereof are intended to cover non-exclusive inclusion: a process, method, system, product, or device comprising a series of steps or modules is not necessarily limited to the steps or modules expressly listed, but may include other steps or modules not expressly listed or inherent to such a process, method, product, or device. The naming or numbering of steps in this application does not mean that the steps must be performed in the temporal or logical order indicated; named or numbered steps may be reordered according to the technical purpose to be achieved, as long as the same or a similar technical effect is attained.
With the development of communication technologies, remote video is widely used in scenarios such as video calls, telemedicine, and remote education. Constrained by power consumption and latency, the picture resolution of remote video usually does not exceed 1080p (i.e., 1920x1080 pixels).
In many situations, however, if the video is captured at 1080p, its sharpness may not meet practical needs. For example, referring to FIG. 1, a schematic diagram of a video picture according to an embodiment of this application: in a remote homework tutoring scenario, because the video is captured at 1080p, small text in the video picture may remain illegible even after the picture is enlarged.
For this reason, two solutions have been proposed in the related art. In the first, the resolution at which the sending end captures video is increased, for example to 3840x2160 (4K for short), so that higher-resolution video is sent to the receiving end, solving the problem of unclear video.
For example, referring to FIG. 2, a schematic flowchart of remote video in related art 1: the sending end captures 4K video with its camera, encodes the video at 4K, and sends the encoded video to the receiving end. The receiving end decodes the received video to obtain 4K video and displays it.
Related art 1 effectively improves the clarity of remote video by raising the transmitted resolution. However, compared with encoding 1080p video, encoding 4K video quadruples both power consumption and latency, harming the real-time performance of remote video. Because of the high encoding power consumption, some terminals cannot sustain long remote video sessions. Furthermore, the bitrate for encoding 4K video is also quadrupled, bringing substantial bandwidth cost, so this solution is hard to apply in real business scenarios.
In related art 2, on top of raising the capture resolution at the sending end, for example to 3840x2160, the captured video is downscaled to 1080p before being sent to the receiving end.
Then, when the receiving end needs to enlarge a certain region (usually called the region of interest), it feeds that region back to the sending end. After obtaining the region to be enlarged, the sending end crops it out of the captured video picture, converts the cropped picture to 1080p, and sends it to the receiving end. Because the cropped picture was captured at high resolution, its sharpness can be guaranteed to meet requirements.
Referring to FIG. 3, a schematic flowchart of remote video in related art 2: the sending end captures 4K video with its camera and crops the original picture according to the region-of-interest information, obtaining a video of the cropped partial picture. The sending end then encodes the cropped video at 1080p and sends the encoded video to the receiving end, which decodes it to obtain 1080p video and displays it.
For example, suppose the sending end captures video at 1920x1080, encodes it at 1920x1080, and sends it, so the receiving end receives 1920x1080 video. If the receiving end needs to enlarge a 960x540 region of interest, it must crop the corresponding picture from the received original video, upsample the cropped picture to 1920x1080, and display the upsampled picture. Because the original video was captured at only 1920x1080 and the ROI picture actually displayed is an upsampled one, its clarity is low.
With related art 2, suppose the sending end captures at 3840x2160 and encodes at 1920x1080 before sending, i.e., the original video is downsampled to 1920x1080 and then encoded, so the receiving end receives 1920x1080 video. If the receiving end needs to enlarge a 960x540 region of interest, it feeds the position of that region back to the sending end.
After receiving the ROI position, the sending end can determine that, in the captured original video, the region actually measures 1920x1080. The sending end can therefore crop a 1920x1080 region of interest from the captured original video, encode the cropped video at 1920x1080, and send it to the receiving end. The receiving end thus receives video cropped from the 4K original and does not need to upsample the received picture, so the displayed video picture is sharp.
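The arithmetic in this example is a fixed scale mapping: because the 3840x2160 source is downscaled by 2 in each dimension before transmission, a 960x540 ROI selected on the 1080p stream corresponds to a 1920x1080 region of the source. A small helper (illustrative names, integer truncation assumed):

```python
def roi_to_source(roi, stream_res=(1920, 1080), source_res=(3840, 2160)):
    """Map an ROI given in stream coordinates to source-capture coordinates."""
    sx = source_res[0] / stream_res[0]   # horizontal scale factor (here 2.0)
    sy = source_res[1] / stream_res[1]   # vertical scale factor (here 2.0)
    x, y, w, h = roi
    return (int(x * sx), int(y * sy), int(w * sx), int(h * sy))
```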
However, in related art 2 the receiving end must feed the ROI position back to the sending end, which performs the corresponding processing and then transmits the resulting video to the receiving end. Having the receiving end remotely control the sending end in this way incurs a long response time and easily causes visible stuttering of the video picture. For example, after the receiving end performs a zoom-in operation, it must wait a relatively long time (usually more than 300 milliseconds) before it receives the sender's data and can display the enlarged picture.
The remote video method provided in the embodiments of this application can be applied to a terminal with a video capture capability. Such a terminal, also called user equipment (UE), a mobile station (MS), a mobile terminal (MT), and so on, is a device equipped with an image capture apparatus capable of shooting video and able to communicate remotely with other devices to transmit the captured video to them, for example a handheld device with a shooting function or a surveillance camera.
Examples of current terminals include: mobile phones, tablet computers, laptop computers, palmtop computers, surveillance cameras, mobile internet devices (MID), wearable devices, virtual reality (VR) devices, augmented reality (AR) devices, and wireless terminals in industrial control, self-driving, remote medical surgery, smart grids, transportation safety, smart cities, and smart homes.
The image capture apparatus in the terminal converts optical signals into electrical signals to generate image signals. The image capture apparatus may be, for example, an image sensor, such as a charge-coupled device (CCD) or a complementary metal-oxide-semiconductor (CMOS) sensor.
Referring to FIG. 4, a schematic structural diagram of a terminal 101 according to an embodiment of this application: the terminal 101 includes a processor 103 coupled to a system bus 105. The processor 103 may be one or more processors, each of which may include one or more processor cores. A video adapter 107 may drive a display 109, which is coupled to the system bus 105. The system bus 105 is coupled to an input/output (I/O) bus through a bus bridge 111. An I/O interface 115 is coupled to the I/O bus and communicates with a variety of I/O devices, such as an input device 117 (e.g., a touchscreen), a media tray 121 (e.g., a compact disc read-only memory (CD-ROM) drive and multimedia interfaces), a transceiver 123 (which can send and/or receive radio communication signals), a camera 155 (which can capture still and moving digital video images), and an external USB port 125. Optionally, the interface connected to the I/O interface 115 may be a USB interface.
The processor 103 may be any conventional processor, including a reduced instruction set computing (RISC) processor, a complex instruction set computing (CISC) processor, or a combination thereof. Optionally, the processor may be a dedicated device such as an ASIC.
The terminal 101 may communicate with a software deployment server 149 through a network interface 129. For example, the network interface 129 is a hardware network interface, such as a network card. The network 127 may be an external network, such as the Internet, or an internal network, such as Ethernet or a virtual private network (VPN). Optionally, the network 127 may also be a wireless network, such as a WiFi network or a cellular network.
A hard disk drive interface 131 is coupled to the system bus 105 and connected to a hard disk drive 133. A system memory 135 is coupled to the system bus 105. Data running in the system memory 135 may include the operating system (OS) 137 of the terminal 101, application programs 143, and a scheduler.
The operating system includes a shell 139 and a kernel 141. The shell 139 is an interface between the user and the kernel of the operating system. The shell is the outermost layer of the operating system and manages the interaction between the user and the operating system: it waits for user input, interprets that input for the operating system, and handles the operating system's various outputs.
The kernel 141 consists of the parts of the operating system that manage memory, files, peripherals, and system resources. The kernel 141 interacts directly with the hardware; it typically runs processes, provides inter-process communication, and provides CPU time-slice management, interrupt handling, memory management, I/O management, and so on.
For example, when the terminal 101 is a smartphone, the application programs 143 include programs related to remote video. By executing an application program 143, the terminal 101 can conduct remote video with another terminal: the terminal 101 captures video with the camera 155, processes and encodes the captured video with the processor 103, and sends it to the other terminal. In one embodiment, when the application program 143 needs to be executed, the terminal 101 may download it from the software deployment server 149.
以上介绍了本申请实施例所提供的远程视频方法的应用场景,以下将详细介绍该远程视频方法的执行过程。
可以参阅图5,图5为本申请实施例提供的一种远程视频方法的流程示意图。如图5 所示,该远程视频方法包括以下的步骤。
步骤501,第一终端获取多个图像帧,所述多个图像帧包括第一图像帧和第二图像帧,所述第一图像帧的变焦倍率大于所述第二图像帧的变焦倍率,所述第一图像帧是根据感兴趣区域信息确定的,所述感兴趣区域信息用于指示感兴趣区域的位置。
本实施例中,第一终端为采集视频并向接收端发送视频的发送端,第二终端为接收第一终端发送的视频并显示视频的接收端。
在视频的采集过程中,第一终端可以是以固定的帧率连续采集图像帧,从而得到连续的多个图像帧。在所述多个图像帧中,包括有第一图像帧和第二图像帧。所述第一图像帧的变焦倍率均大于所述第二图像帧的变焦倍率。
其中,变焦倍率是指摄像头的图像传感器所输出的图像的放大倍数。在摄像头不移动的情况下,摄像头的变焦倍率越大,则摄像头所输出的图像中的被摄物体也越大,且图像中所拍摄的范围越小;摄像头的变焦倍率越小,则摄像头所输出的图像中的被摄物体也越小,且图像中所拍摄的范围越大。
示例性地,可以参阅图6,图6为本申请实施例提供的一种不同变焦倍率的图像对比示意图。如图6所示,对于摄像头在同一场景下所输出的两张尺寸相同的图像,摄像头通过较小的变焦倍率拍摄得到图像一,以及通过较大的变焦倍率拍摄得到图像二。对于变焦倍率较小的图像一,图像一的拍摄范围为滑雪者的全身。对于变焦倍率较大的图像二,图像二的拍摄范围为滑雪者的头部,即图像二的拍摄范围要小于图像一的拍摄范围。此外,图像二中所拍摄的滑雪者的头部要大于图像一中所拍摄的滑雪者的头部。简单来说,图像二可以认为是对图像一中的滑雪者的头部所在的区域放大后得到的。
在所述第一终端中,可以保存有感兴趣区域信息,该感兴趣区域信息可以是第二终端向第一终端反馈的。该感兴趣区域信息用于指示感兴趣区域所在的位置。其中,感兴趣区域是指用户通过手指触控终端的屏幕等交互方式,对终端的屏幕上所显示的视频画面进行放大、缩小或平移,所得到的区域。
简单来说,感兴趣区域是远程视频过程中,终端的屏幕上待显示的区域。以图6为例,终端屏幕上显示图像一,用户通过手指触控终端屏幕,对滑雪者的头部区域进行放大操作,终端则可以基于用户的交互指令得到相应的感兴趣区域(即图像二所示的滑雪者的头部区域)。
基于第一终端中的感兴趣区域信息,第一终端可以确定感兴趣区域所在的位置,从而获取与感兴趣区域相关的第一图像帧。所述第一图像帧中的画面内容即为感兴趣区域的内容。
可选的,所述第一图像帧中仅包括所述感兴趣区域信息中所指示的感兴趣区域,所述第二图像帧中除了包括所述感兴趣区域信息中所指示的感兴趣区域之外,还包括其他的区域。也就是说,所述第一图像帧中的画面中只有感兴趣区域所在位置的画面内容,而第二图像帧中的画面中除了包括感兴趣区域所在位置的画面内容之外,还包括有其他的画面内容。
示例性地,所述第二图像帧可以是第一终端在预设变焦倍率下采集的图像帧,所述第一图像帧则可以是第一终端在根据感兴趣区域信息调整变焦倍率下采集的图像帧。在第一终端不移动的情况下,所述第二图像帧可以认为是第一终端采集的全局图像,即所述第二图像帧中包括了第一终端的摄像头的视野范围内的所有区域;所述第一图像帧则可以认为是第一终端采集的局部图像,即第一图像帧中仅包括第一终端的摄像头的视野范围内的部分区域。
步骤502,所述第一终端向第二终端发送所述多个图像帧和指示信息,以使得所述第二终端根据所述指示信息选择待显示的图像帧,所述待显示的图像帧用于生成视频,所述指示信息包括所述第一图像帧的感兴趣区域信息。
在第一终端与第二终端进行远程视频的过程中,第一终端依次采集上述的多个图像帧,并且将所述多个图像帧逐个发送给所述第二终端。
可选的,所述第一终端可以是以较高的分辨率采集所述多个图像帧,然后再将所述多个图像帧缩小至特定的分辨率后,再对缩小后的多个图像帧进行编码,并发送编码后的多个图像帧。例如,所述第一终端以4k分辨率采集所述多个图像帧,再将所述多个图像帧缩小至1080P分辨率后,对缩小后的多个图像帧进行编码并发送。
此外,在第一终端向第二终端发送图像帧的过程中,第一终端还向第二终端发送指示信息,该指示信息用于指示每个图像帧的感兴趣区域信息。
简单来说,第一终端可以是在向第二终端发送每个图像帧的过程中,都携带有每个图像帧对应的指示信息,该指示信息指示了第一终端所传输的图像帧的感兴趣区域信息。其中,该指示信息可以包括用于指示感兴趣区域位置的坐标信息。
例如,在感兴趣区域为一个矩形区域的情况下,该指示信息中可以包括该矩形区域的四个顶点的坐标信息。
又例如,在感兴趣区域为一个矩形区域的情况下,该指示信息中可以包括该矩形区域的一个顶点的坐标信息(如矩形区域的左上角顶点)以及该矩形区域的宽和高。这样,基于矩形区域的一个顶点的坐标信息以及该矩形区域的宽和高,同样可以计算得到该矩形区域的四个顶点的坐标信息。
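作为帮助理解的示意,下面用一个简化的 Python 片段说明由矩形感兴趣区域的一个顶点坐标及宽、高计算四个顶点坐标的过程(函数名与坐标约定均为示例性假设,并非本申请限定的实现):

```python
def rect_corners(x, y, w, h):
    """由矩形感兴趣区域的左上角顶点坐标 (x, y) 及宽 w、高 h,
    计算四个顶点的坐标(图像坐标系:x 向右、y 向下)。"""
    return [(x, y), (x + w, y), (x, y + h), (x + w, y + h)]
```

例如,左上角为 (10, 20)、宽 100、高 50 的矩形区域,其四个顶点依次为 (10, 20)、(110, 20)、(10, 70)、(110, 70)。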
可选的,在第一终端通过H.264/H.265视频压缩标准来进行视频编码时,第一终端可以是将指示信息写入到补充增强信息(Supplemental Enhancement Information,SEI)中,从而实现在传输数据中携带每个图像帧对应的指示信息。
在一种可能的实施例中,对于第二图像帧来说,由于第二图像帧除了包括感兴趣区域,还包括有其他的区域,因此第一终端在发送第二图像帧时,可以不携带指示信息,即不用指示第二图像帧中的感兴趣区域的位置。在另一种可能的实施例中,第一终端在发送第二图像帧时,仍然携带指示信息,以指示第二图像帧中的感兴趣区域的位置。
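指示信息在传输前需要序列化为字节串,下面给出一个简化的 Python 打包/解包示意(字段顺序与字节序均为示例性假设;实际写入 H.264/H.265 的 SEI 时,还需遵循相应的 SEI 语法以及防竞争字节等处理,此处不展开):

```python
import struct

def pack_indication(frame_type, x, y, w, h):
    """把帧类型与感兴趣区域坐标打包为字节串:
    1 字节帧类型 + 4 个 32 位无符号整数(大端序)。"""
    return struct.pack(">BIIII", frame_type, x, y, w, h)

def unpack_indication(data):
    """pack_indication 的逆操作,返回 (帧类型, x, y, w, h)。"""
    return struct.unpack(">BIIII", data)
```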
可选的,所述第一终端在向所述第二终端发送所述多个图像帧的过程中,所述第一终端可以是逐个对图像帧进行编码,得到编码后的图像信息,然后将编码后的图像信息以及所编码的图像帧对应的指示信息发送给所述第二终端。
可选的,为了便于第二终端确定所述多个图像帧中的第一图像帧和第二图像帧,所述指示信息中还可以包括所述多个图像帧的帧类型。其中,所述第一图像帧的帧类型与所述第二图像帧的帧类型不同。
示例性地,第一终端可以通过指示信息中的某一个位来指示图像帧的帧类型。例如,当指示信息中用于指示帧类型的某一个位被置位为1时,则指示当前图像帧的帧类型为第二类型,即上述的第二图像帧;当指示信息中用于指示帧类型的某一个位没有被置位时(即该位的值为0),则指示当前图像帧的帧类型为第一类型,即上述的第一图像帧。除了通过上述的方式来指示帧类型,还可以是通过其他的方式来指示图像帧的帧类型,本实施例对此不做具体限定。
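上述用单个位指示帧类型的做法,可以用如下 Python 片段示意(位的序号为示例性假设):

```python
FRAME_TYPE_BIT = 0  # 假设用指示信息标志字节的第 0 位表示帧类型

def set_frame_type(flags, is_global):
    """置位该位表示第二类型(全局图像帧),清除该位表示第一类型(局部图像帧)。"""
    if is_global:
        return flags | (1 << FRAME_TYPE_BIT)
    return flags & ~(1 << FRAME_TYPE_BIT)

def is_global_frame(flags):
    """判断标志字节指示的帧类型是否为第二类型(全局图像帧)。"""
    return bool(flags & (1 << FRAME_TYPE_BIT))
```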
步骤503,第二终端接收第一终端发送的多个图像帧和指示信息。
在第二终端接收到第一终端所发送的数据后,第二终端通过对第一终端所发送的数据进行解码,得到所述第一终端发送的多个图像帧。示例性地,在第一终端持续向第二终端发送编码后的数据的过程中,第二终端持续对接收到的编码后的数据进行解码,从而依次得到上述的多个图像帧以及每个图像帧对应的指示信息。
值得注意的是,第一终端是每采集一个图像帧,即对该图像帧进行编码,并向第二终端发送编码后的图像帧以及对应的指示信息。因此,第二终端是逐个逐个地接收到第一终端所发送的图像帧,而并非是一次性接收到多个图像帧。
步骤504,若所述第一图像帧的感兴趣区域信息与所述第二终端中的感兴趣区域信息不同,则根据所述第二终端中的感兴趣区域信息在所述第二图像帧中裁剪得到第五图像帧,并显示所述第五图像帧。
本实施例中,在第二终端接收到第一终端发送的图像帧之后,第二终端可以根据每个图像帧对应的指示信息来选择待显示的图像帧。
具体地,对于作为接收端的第二终端来说,第二终端能够接收来自于用户的交互指令,并根据交互指令更新感兴趣区域信息。但是,由于第二终端需要将更新后的感兴趣区域信息反馈给第一终端,且第一终端基于更新后的感兴趣区域信息采集新的图像帧后,再将新的图像帧发送给第二终端。因此,与更新后的感兴趣区域信息匹配的图像帧存在一定的滞后性,即第二终端在更新完本地的感兴趣区域信息之后的一段时间内,第二终端接收到的图像帧所对应的感兴趣区域信息与更新后的感兴趣区域信息并不相同。
因此,第二终端在显示第一图像帧之前,第二终端判断第一图像帧的感兴趣区域信息与第二终端中所保存的感兴趣区域信息是否相同。如果第一图像帧的感兴趣区域信息与第二终端中所保存的感兴趣区域信息不同,第二终端则根据所述第二终端中的感兴趣区域信息在所述第二图像帧中裁剪得到第五图像帧,并显示所述第五图像帧。
简单来说,由于第二图像帧的变焦倍率小于第一图像帧的变焦倍率,第二图像帧实际上为全局图像帧,因此,当感兴趣区域发生变化的时候,可以从第二图像帧中确定新的感兴趣区域所在的位置,并将新的感兴趣区域所在位置裁剪下来,得到第五图像帧。其中,第五图像帧中的内容即为第二终端的感兴趣区域信息所指示的位置对应的内容。
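从全局图像帧中裁剪新的感兴趣区域的过程,可以用如下简化的 Python 片段示意(此处把图像表示为按行存放像素的二维列表,仅为说明裁剪逻辑的假设性表示):

```python
def crop_roi(global_frame, roi):
    """从全局图像帧中裁剪出感兴趣区域。
    global_frame 为二维列表(按行存放像素),roi 为 (x, y, w, h)。"""
    x, y, w, h = roi
    return [row[x:x + w] for row in global_frame[y:y + h]]
```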
步骤505,若所述第一图像帧的感兴趣区域信息与所述第二终端中的感兴趣区域信息相同,则显示所述第一图像帧。
如果第一图像帧的感兴趣区域信息与第二终端中所保存的感兴趣区域信息相同,第二终端则可以显示第一图像帧。
为便于理解,以下将结合具体例子描述第二终端选择所显示的图像帧的过程。
可以参阅图7,图7为本申请实施例提供的一种不同时刻下不同终端中的感兴趣区域信息的对比示意图。在图7中,图像中的实线框表示当前终端中的感兴趣区域信息所指示的位置。
在t1时刻,第一终端中的感兴趣区域信息与第二终端中的感兴趣区域信息相同,即第一终端以及第二终端中的感兴趣区域信息所指示的感兴趣区域均位于滑雪者的头部所在的位置。此时,第一终端所发送的第一图像帧对应的感兴趣区域信息与第二终端中的感兴趣区域信息相同,第二终端选择显示第一图像帧,该第一图像帧的内容即为实线框中所标记的滑雪者的头部。
在t2时刻,第二终端接收用户的交互指令,该交互指令具体为平移操作指令,第二终端根据该交互指令更新感兴趣区域信息。在第二终端中,更新前的感兴趣区域信息所指示的感兴趣区域位于滑雪者的头部,更新后的感兴趣区域信息所指示的感兴趣区域位于滑雪板。由于第二终端中感兴趣区域信息发生了变化,第一终端所发送的第一图像帧的感兴趣区域信息与第二终端中的感兴趣区域信息不相同,因此第二终端选择在第二图像帧中裁剪更新后的感兴趣区域,得到第五图像帧,并且显示第五图像帧。该第五图像帧的内容即为实线框中所标记的滑雪板。
在t3时刻,第一终端接收到了第二终端所反馈的更新后的感兴趣区域信息,因此第一终端根据更新后的感兴趣区域信息获取新的第一图像帧,并向第二终端发送新的第一图像帧。由于第一终端与第二终端之间存在传输时延,因此在t3时刻,第二终端所接收到的第一图像帧实际上还是第一终端基于更新前的感兴趣区域信息获取的。也就是说,第二终端中所接收到的第一图像帧的感兴趣区域信息与第二终端中的感兴趣区域信息不相同,第二终端仍然选择显示第五图像帧。
在t4时刻,第一终端中的感兴趣区域信息与第二终端中的感兴趣区域信息相同,并且第二终端接收到了第一终端基于更新后的感兴趣区域信息获取的第一图像帧。此时,第一终端所发送的第一图像帧对应的感兴趣区域信息与第二终端中的感兴趣区域信息相同,第二终端选择显示第一图像帧,该第一图像帧的内容即为实线框中所标记的滑雪板。
本实施例中,作为发送端的第一终端在向作为接收端的第二终端发送与感兴趣区域相关的图像帧的过程中,加入变焦倍率更低的图像帧,即画面中包括更多内容的图像帧。这样,当第二终端中的感兴趣区域发生变化时,第二终端能够及时从变焦倍率更低的图像帧中截取出变化后的感兴趣区域对应的画面并进行显示,即第二终端不需要等待较长时间即可显示调整后的视频画面,缩短了调整视频感兴趣区域时的响应时间,避免视频画面出现卡顿现象。
在一个可能的实施例中,为便于第二终端选择所显示的图像帧,第二终端中可以创建有不同的缓冲区(buffer)。第二终端在接收到不同帧类型的图像帧后,将图像帧分别送至对应的缓冲区中。然后,第二终端根据感兴趣区域信息选择其中一个缓冲区,并显示该缓冲区中的图像帧。
示例性地,在所述指示信息还包括所述多个图像帧的帧类型的情况下,所述第二终端根据所述多个图像帧的帧类型,将所述第一图像帧依次送往第一缓冲区以及将所述第二图像帧送往第二缓冲区。其中,第一缓冲区用于存放帧类型为第一类型的第一图像帧,第二缓冲区用于存放帧类型为第二类型的第二图像帧。
第二终端每接收到一个新的图像帧,第二终端则判断该图像帧的帧类型,如果该图像帧的帧类型为第一类型,则将该图像送往第一缓冲区;如果该图像帧的帧类型为第二类型,则将该图像帧送往第二缓冲区。在第一缓冲区和第二缓冲区上,旧的图像帧会被新的图像帧覆盖。
在第二终端基于图像帧的帧类型,将图像帧送往对应的缓冲区之后,第二终端根据缓冲区中的图像帧的感兴趣区域信息,来确定选择显示哪个缓冲区中的图像帧。
具体地,如果所述第一缓冲区中的第一图像帧的感兴趣区域信息与所述第二终端中的感兴趣区域信息不同,则根据所述第二终端中的感兴趣区域信息在所述第二缓冲区的第二图像帧中裁剪得到第五图像帧,并显示所述第五图像帧。如果所述第一缓冲区中的第一图像帧的感兴趣区域信息与所述第二终端中的感兴趣区域信息相同,则显示所述第一缓冲区中的第一图像帧。
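上述"按帧类型入缓冲区、按感兴趣区域信息选帧"的流程,可以用如下简化的 Python 片段示意(类名、数据结构与裁剪回调均为示例性假设):

```python
class FrameSelector:
    """按帧类型把图像帧放入对应缓冲区(新帧覆盖旧帧),
    再依据本地感兴趣区域信息选择显示哪一帧的简化示意。"""

    def __init__(self, local_roi):
        self.local_roi = local_roi
        self.buf1 = None  # 第一缓冲区:局部图像帧及其感兴趣区域信息
        self.buf2 = None  # 第二缓冲区:全局图像帧

    def on_frame(self, frame, roi, is_global):
        if is_global:
            self.buf2 = frame          # 覆盖旧的全局图像帧
        else:
            self.buf1 = (frame, roi)   # 覆盖旧的局部图像帧

    def select(self, crop):
        """感兴趣区域信息一致则显示局部图像帧,否则从全局图像帧裁剪。"""
        frame, roi = self.buf1
        if roi == self.local_roi:
            return frame
        return crop(self.buf2, self.local_roi)
```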
在一个可能的实施例中,在第二终端显示图像帧的过程中,第二终端可以实时获取用户的交互指令,并且根据该交互指令实时更新感兴趣区域。
示例性地,当用户在第二终端上发起交互指令时,所述第二终端获取交互指令,所述交互指令用于指示变更感兴趣区域。该交互指令例如可以为用户通过触控第二终端的屏幕所发起的缩小操作指令、放大操作指令或者平移操作指令。其中,缩小操作指令用于指示以目标区域为起点,缩小第二终端的屏幕所显示的画面。放大操作指令用于指示放大第二终端的屏幕上所显示的目标区域。平移操作指令用于指示将第二终端的屏幕上所显示的画面往特定方向进行平移。
第二终端执行所述交互指令后,所得到的需要显示的区域即为新的感兴趣区域,因此所述第二终端可以根据所述交互指令更新感兴趣区域信息,得到更新后的感兴趣区域信息。这样,所述第二终端将本地所保存的感兴趣区域信息进行更新后,所述第二终端向所述第一终端发送更新后的感兴趣区域信息。
以上介绍了第一终端向第二终端发送变焦倍率不同的图像帧,且第二终端根据感兴趣区域信息选择待显示的图像帧的过程。为便于理解,以下将详细介绍第一终端采集多个图像帧以及向第二终端发送多个图像帧的过程。
可选的,第一终端可以是通过多种方式来获取上述的多个图像帧。
方式一,第一终端可以是通过数码变焦的方式获取上述的多个图像帧。
其中,数码变焦是指通过第一终端中的处理器,把采集到的图像中的部分区域的每个像素面积增大,从而达到局部区域放大的目的。实际上,数码变焦并没有改变镜头的焦距。
示例性地,所述第一终端依次采集一个或多个第三图像帧和所述第二图像帧,所述一个或多个第三图像帧的变焦倍率与所述第二图像帧的变焦倍率相同。即,所述第一终端以特定的变焦倍率连续采集多个图像帧,该多个图像帧中包括上述的一个或多个第三图像帧和第二图像帧。然后,所述第一终端根据所述感兴趣区域信息对所述一个或多个第三图像帧进行裁剪,得到所述一个或多个第一图像帧。具体地,所述第一终端每采集一个第三图像帧之后,所述第一终端都根据第一终端中的感兴趣区域信息对第三图像帧进行裁剪,即基于感兴趣区域信息所指示的位置将第三图像帧中的感兴趣区域裁剪出来,从而得到第一图像帧。其中,第一终端获取第一图像帧的方式即为数码变焦。
可以参阅图8,图8为本申请实施例提供的一种通过数码变焦的方式得到多个图像帧的示意图。如图8所示,第一终端以固定的变焦倍率依次采集图像1、图像2、图像3、图像4、图像5和图像6,其中图像1-图像5对应于上述的第三图像帧,图像6对应于上述的第二图像帧。在第一终端采集图像1-图像5的过程中,第一终端采集图像1之后,基于感兴趣区域信息对图像1进行裁剪,得到图像A1;类似地,第一终端采集图像2之后,基于感兴趣区域信息对图像2进行裁剪,得到图像A2。以此类推,第一终端通过数码变焦的方式得到与图像1-图像5对应的图像A1-图像A5,图像A1-图像A5对应于上述的第一图像帧。对于图像6,第一终端不再基于感兴趣区域信息对图像6进行裁剪,即图8中的图像6与图像B相同。
在方式一中,第一终端可以设定一个固定的间隔数量,该间隔数量用于指示相邻的两张全局图像帧之间所间隔的图像帧的数量。其中,上述的第一图像帧可以称为局部图像帧,上述的第二图像帧可以称为全局图像帧。也就是说,第一终端每采集特定数量的局部图像帧之后,则采集一个全局图像帧。例如,在间隔数量为4的情况下,第一终端每采集4个局部图像帧,则采集一个全局图像帧。这样一来,第一终端以30帧/秒的帧率进行图像采集时,第一终端每秒采集到30个图像帧,这30个图像帧中包括24个局部图像帧和6个全局图像帧。在实际应用中,上述的间隔数量可以是固定的,例如间隔数量为4或5。该间隔数量也可以是非固定的,例如第一终端间隔4个局部图像帧后采集一个全局图像帧,然后第一终端再间隔5个局部图像帧后再采集下一个全局图像帧。本实施例并不对第一终端中所设定的间隔数量进行限定。
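上述方式一的间隔采集节奏,可以用如下 Python 片段示意(间隔数量取上文示例中的4,即序号为5的倍数时采集全局图像帧;序号从1开始为示例性假设):

```python
def capture_sequence(num_frames, n):
    """方式一的采集节奏示意:序号为 n 的倍数的图像帧采集为全局图像帧,
    其余采集为局部图像帧(序号从 1 开始)。"""
    return ["global" if i % n == 0 else "local"
            for i in range(1, num_frames + 1)]
```

以 30 帧/秒、n=5 为例,每秒的 30 个图像帧中包括 24 个局部图像帧和 6 个全局图像帧,与上文的计算一致。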
示例性地,在第一终端所采集的所述多个图像帧中还可以包括第四图像帧,所述第四图像帧的变焦倍率与所述第二图像帧的变焦倍率相同。所述第一终端获取多个图像帧的过程,具体包括:所述第一终端根据预置规则依次采集所述第四图像帧、一个或多个所述第一图像帧和所述第二图像帧;其中,所述预置规则为所述第一终端基于所述感兴趣区域信息采集预设数量的图像帧后,则采用目标变焦倍率采集一个图像帧,所述预设数量与一个或多个所述第一图像帧的数量相同,所述目标变焦倍率为所述第二图像帧的变焦倍率。
这样一来,第一终端每次基于所述感兴趣区域信息采集特定数量的图像帧后,都采用较低的变焦倍率采集一个全局图像帧,从而保证第二终端接收到全局图像帧的频率,保证第二终端后续基于全局图像帧来裁剪感兴趣区域时,不会出现画面变动过大的现象。
方式二,第一终端基于感兴趣区域信息获取多个局部图像帧后,插入全局图像帧。
示例性地,所述第一终端采集一个或多个第三图像帧,所述一个或多个第三图像帧的变焦倍率与所述第二图像帧的变焦倍率相同;所述第一终端根据所述感兴趣区域信息对所述一个或多个第三图像帧进行裁剪,得到一个或多个所述第一图像帧;所述第一终端将所述一个或多个第三图像帧中的一个第三图像帧确定为所述第二图像帧。也就是说,第一终端基于感兴趣区域信息,通过数码变焦的方式连续采集第一图像帧(即局部图像帧),并在获取的多个第一图像帧中插入第二图像帧(即全局图像帧)。
也就是说,第一终端每次基于所述感兴趣区域信息采集特定数量的图像帧后,都插入一个变焦倍率较低的全局图像帧。这样一来,第一终端向第二终端发送的图像帧的数量要大于第一终端实际所采集的图像帧的数量。
可以参阅图9,图9为本申请实施例提供的另一种得到多个图像帧的示意图。如图9所示,第一终端以固定的变焦倍率依次获取图像1、图像2、图像3、图像4和图像5,其中图像1-图像5对应于上述的第三图像帧。在第一终端采集图像1-图像5的过程中,第一终端获取图像1之后,基于感兴趣区域信息对图像1进行裁剪,得到图像A1;类似地,第一终端获取图像2之后,基于感兴趣区域信息对图像2进行裁剪,得到图像A2。以此类推,第一终端通过数码变焦的方式得到与图像1-图像5对应的图像A1-图像A5,图像A1-图像A5对应于上述的第一图像帧。在得到图像A1-图像A5后,第一终端将图像5确定为第二图像帧,即基于图像5在图像A5后插入图像B,图像B与图像5相同。
可以理解的是,在第二终端的感兴趣区域信息没有发生变化的情况下,采用方式一来得到图像帧时,由于第二终端并不会选择显示第一终端所采集的全局图像帧,因此第二终端上显示图像帧的频率要低于第一终端实际采集图像帧的频率。
例如,在第一终端每间隔5个局部图像帧后采集一个全局图像帧的情况下,第一终端每秒采集到30个图像帧,这30个图像帧中包括25个局部图像帧和5个全局图像帧。对于第二终端来说,第二终端则是每秒显示25个局部图像帧。
在采用方式二来得到图像帧时,在第一终端每间隔5个局部图像帧后插入一个全局图像帧的情况下,第一终端每秒采集到30个局部图像帧,并插入6个全局图像帧,即一共36个图像帧。对于第二终端来说,第二终端则是每秒显示30个局部图像帧。因此,通过方式二来得到图像帧,能够保证第二终端显示图像帧的频率,提高视频画面的流畅性。
示例性地,可以参阅图10,图10为本申请实施例提供的一种基于不同方式得到图像帧的对比示意图。在图10中,第一终端基于特定的变焦倍率获取图像1-图像10。在第一终端采用方式一得到需要发送给第二终端的图像帧时,第一终端基于感兴趣区域信息对图像1-图像4以及图像6-图像9进行裁剪,分别得到图像A1-图像A8;第一终端基于图像5和图像10得到图像B1和图像B2。也就是说,第一终端发送给第二终端的图像帧一共为10个。
在第一终端采用方式二得到需要发送给第二终端的图像帧时,第一终端基于感兴趣区域信息对图像1-图像10进行裁剪,分别得到图像A1-图像A10;此外,第一终端基于图像5和图像10得到图像B1和图像B2,并将图像B1插入到图像A5后,以及将图像B2插入到图像A10后。也就是说,第一终端发送给第二终端的图像帧一共为12个。显然,在第一终端采用方式二得到需要发送给第二终端的图像帧时,第二终端能够接收到更多的图像帧,且能够用于显示的图像帧也更多,从而能够保证第二终端显示图像帧的频率,提高视频画面的流畅性。
方式三,第一终端可以是通过光学变焦的方式获取上述的第一图像帧和第二图像帧。
其中,光学变焦是指第一终端依靠光学镜头结构来实现变焦,即第一终端通过镜片移动来放大与缩小需要拍摄的景物。数码变焦和光学变焦虽然都有助于望远拍摄时放大远方物体,但是只有光学变焦可以支持图像主体成像后,增加更多的像素,让主体不但变大,同时也相对更清晰。也就是说,通过光学变焦的方式来放大图像中的物体时,能够使得图像中的物体相对更加清晰。
由于光学变焦是以镜头的视野中心为中心点,通过改变焦距来放大或缩小镜头所捕捉的图像帧,因此第一终端通过光学变焦的方式采集第一图像帧的过程实际上是计算视野覆盖感兴趣区域的最长焦距,然后基于该最长焦距获取包括感兴趣区域的图像帧。如果该包括感兴趣区域的图像帧恰好仅包括了感兴趣区域,则可以将该包括感兴趣区域的图像帧确定为第一图像帧;如果该包括感兴趣区域的图像帧中除了感兴趣区域之外,还包括了非感兴趣区域,则可以从该包括感兴趣区域的图像帧中截取出感兴趣区域,以得到第一图像帧。
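"视野仍能完整覆盖感兴趣区域的最大变焦倍率"的计算,可以用如下简化的 Python 片段示意(假设感兴趣区域以镜头视野中心为中心、变焦 z 倍后视野尺寸缩小为 1/z,这些几何假设均为示例,并非本申请限定的实现):

```python
def max_zoom_covering_roi(fov_w, fov_h, roi_w, roi_h):
    """计算视野仍能完整覆盖感兴趣区域的最大变焦倍率:
    以 1 倍变焦时的全视野尺寸 (fov_w, fov_h) 为基准,
    取宽、高两个方向缩放比例中的较小值。"""
    return min(fov_w / roi_w, fov_h / roi_h)
```

例如,全视野为 1920×1080、感兴趣区域为 480×270 时,最大变焦倍率为 4 倍;若感兴趣区域较宽(如 960×270),则受宽度方向限制,最大变焦倍率为 2 倍。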
本实施例中,第一终端可以是基于感兴趣区域信息,通过光学变焦的方式调整变焦倍率,并获取上述的第一图像帧。然后,第一终端再调整变焦倍率,获取第二图像帧。
在一个可能的实施例中,在所述第一终端对上述的多个图像帧进行编码的过程中,所述第一终端可以根据图像帧的帧类型选择对应的参考帧。
一般来说,在视频的编码过程中,发送端会选择当前图像帧的前一个图像帧作为参考帧,然后计算当前图像帧与参考帧之间的差异信息,并向接收端发送当前图像帧与参考帧之间的差异信息,以代替直接向接收端发送当前图像帧,从而起到减少所传输的数据的作用。
在本实施例中,由于第一图像帧与第二图像帧是基于不同的变焦倍率获取的。因此,相比于相邻的两个第一图像帧来说,第一图像帧与第二图像帧之间的差异比较大。在对第二图像帧进行编码时,如果以第二图像帧的前一个图像帧(即第一图像帧)为参考帧,则所得到的差异信息的数据量较大,增加了数据传输量。基于此,所述第一终端可以根据图像帧的帧类型选择对应的参考帧,以保证当前需要编码的图像帧与该图像帧的参考帧之间的帧类型是相同的。
示例性地,在对所述多个图像帧进行编码的过程中,所述第一终端获取所述多个图像帧的参考帧,其中所述第一图像帧的参考帧是根据所述感兴趣区域信息得到的,所述第二图像帧的参考帧的变焦倍率与所述第二图像帧的变焦倍率相同。
简单来说,在第一终端采集图像帧的过程中,第一终端基于感兴趣区域信息获取的局部图像帧(即上述的第一图像帧)的帧类型为第一类型,基于特定的变焦倍率获取的全局图像帧(即上述的第二图像帧)的帧类型为第二类型。对于帧类型为第一类型的任意一个图像帧来说,第一终端可以确定该图像帧的参考帧为该图像帧的前一个第一类型的图像帧,即时域上离该图像帧最近的一个第一类型的图像帧。类似地,对于帧类型为第二类型的任意一个图像帧来说,第一终端可以确定该图像帧的参考帧为该图像帧的前一个第二类型的图像帧,即时域上离该图像帧最近的一个第二类型的图像帧。
也就是说,对于任意一个图像帧,该图像帧的参考帧为时域上离该图像帧最近的一个相同类型的图像帧。
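"参考帧为时域上最近的同类型图像帧"这一规则,可以用如下 Python 片段示意(帧标识与类型标记均为示例性假设):

```python
def pick_reference(history, frame_type):
    """为当前待编码图像帧选择参考帧:时域上离它最近的同类型图像帧。
    history 为按采集顺序排列的 (frame_id, frame_type) 列表。"""
    for frame_id, t in reversed(history):
        if t == frame_type:
            return frame_id
    return None  # 同类型的首帧没有参考帧(可作关键帧编码)
```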
在获取得到参考帧后,所述第一终端根据所述多个图像帧的参考帧,对所述多个图像帧进行编码,得到编码结果;所述第一终端向所述第二终端发送编码结果。
示例性地,可以参阅图11,图11为本申请实施例提供的一种确定参考帧的示意图。如图11所示,图像A1-图像A10为第一终端基于感兴趣区域信息获取的图像,图像A1-图像A10的帧类型为第一类型。图像B1和图像B2则是第一终端基于特定的变焦倍率获取的图像,图像B1和图像B2的帧类型为第二类型。
其中,图像A2的参考帧为图像A1,图像A3的参考帧为图像A2…图像A6的参考帧为图像A5。对于图像A1-图像A10中的任意一个图像来说,该图像的参考帧为同属于第一类型的前一个图像。类似地,图像B2的参考帧为图像B1,而并非是图像A10。对于图像B1-图像B2中的任意一个图像来说,该图像的参考帧为同属于第二类型的前一个图像。
本实施例中,由于在编码过程中,待编码图像帧的参考帧并不一定是待编码图像帧的前一个图像帧,因此对于部分严格以待编码图像帧的前一个图像帧作为参考帧的编码器来说,这部分编码器可能无法很好地实现图像帧的编码。
有鉴于此,本实施例中提出了采用两路编码器来对不同帧类型的图像帧进行编码,以保证图像帧的顺利编码。
示例性地,所述第一终端根据所述多个图像帧的参考帧,对所述多个图像帧进行编码,得到编码结果,具体可以包括:所述第一终端根据所述第一图像帧的参考帧,通过第一编码器对所述第一图像帧进行编码,得到第一编码结果;所述第一终端根据所述第二图像帧的参考帧,通过第二编码器对所述第二图像帧进行编码,得到第二编码结果;其中,所述编码结果包括所述第一编码结果和所述第二编码结果。
其中,第一编码器对每个第一图像帧进行编码时,是以每个第一图像帧的前一个图像帧作为参考帧来进行编码。第二编码器对第二图像帧进行编码时,第二图像帧的参考帧为第二编码器中在第二图像帧之前的一个图像帧,该在第二图像帧之前的一个图像帧与第二图像帧类型相同。
简单来说,在实际应用中,对于基于感兴趣区域信息获取的图像帧,第一终端将这些图像帧输入至第一编码器中,由第一编码器对这部分图像帧进行编码。对于是在特定变焦倍率下获取的图像帧,第一终端则将这些图像帧输入至第二编码器中,由第二编码器对这部分图像帧进行编码。
示例性地,可以参阅图12,图12为本申请实施例提供的一种基于两个编码器进行编码的示意图。如图12所示,图像A1-图像A10为第一终端基于感兴趣区域信息获取的图像,图像A1-图像A10的帧类型为第一类型。图像B1和图像B2则是第一终端基于特定的变焦倍率获取的图像,图像B1和图像B2的帧类型为第二类型。
其中,第一编码器的输入为图像A1-图像A10,在第一编码器进行编码的过程中,图像A2的参考帧为图像A1,图像A3的参考帧为图像A2…图像A6的参考帧为图像A5。对于图像A1-图像A10中的任意一个图像(除了输入第一编码器的首个图像)来说,该图像的参考帧即为前一个图像。
第二编码器的输入为图像B1和图像B2。图像B2的参考帧为图像B1。即对于输入第二编码器的任意一个图像(除了输入第二编码器的首个图像)来说,该图像的参考帧即为前一个图像。
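两路编码器的分流逻辑可以用如下简化的 Python 片段示意(此处仅示意"按帧类型送入不同编码器"的路由,列表代表各编码器的输入队列,编码本身不展开):

```python
def route_to_encoders(frames):
    """两路编码器的分流示意:第一类型("L")的帧送第一编码器,
    第二类型("G")的帧送第二编码器;各编码器内部以前一帧为参考帧。
    frames 为按采集顺序排列的 (frame_id, frame_type) 列表。"""
    enc1, enc2 = [], []
    for frame_id, frame_type in frames:
        (enc1 if frame_type == "L" else enc2).append(frame_id)
    return enc1, enc2
```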
为便于理解,以下将结合具体例子详细介绍本申请实施例提供的远程视频方法的流程。可以参阅图13,图13为本申请实施例的一种远程视频方法的流程示意图。如图13所示,该远程视频方法包括步骤1301-步骤1315,其中步骤1301-步骤1306是由第一终端执行的,步骤1307-步骤1315是由第二终端执行的。
步骤1301,在采集图像帧的过程中,第一终端判断当前需要采集的图像帧的序号是否为N的倍数。
在远程视频的过程中,第一终端持续通过摄像头采集图像帧。在采集图像帧的过程中,第一终端可以是每间隔N-1个局部图像帧,则采集一个全局图像帧,因此第一终端可以通过判断当前需要采集的图像帧的序号是否为N的倍数,来判断当前是采集全局图像帧还是局部图像帧。
例如,假设N为5,则第一终端每间隔4个局部图像帧,采集一个全局图像帧。这样一来,第一终端所采集的第1个至第4个图像帧为局部图像帧,第5个图像帧为全局图像帧;第一终端所采集的第6个至第9个图像帧为局部图像帧,第10个图像帧为全局图像帧。以此类推,第一终端所采集的全局图像帧的序号均为N的倍数。
步骤1302,如果当前需要采集的图像帧的序号不是N的倍数,第一终端基于感兴趣区域信息,采集局部图像帧。
其中,局部图像帧是第一终端基于感兴趣区域信息通过数码变焦或者光学变焦的方式获取的。所述局部图像帧例如可以为上述的第一图像帧。所述感兴趣区域信息可以为所述第一终端本地所保存的感兴趣区域信息。在远程视频的过程中,第一终端可以接收第二终端所发送的感兴趣区域信息,并将接收到的感兴趣区域信息保存在第一终端的本地。
第一终端可以是基于4k分辨率或者是2k分辨率来采集局部图像帧。
步骤1303,如果当前需要采集的图像帧的序号为N的倍数,第一终端采集全局图像帧。
所述全局图像帧例如为上述的第二图像帧,所述全局图像帧是第一终端通过预设变焦倍率获取的。所述全局图像帧的变焦倍率要小于所述局部图像帧,且全局图像帧中的画面内容包括感兴趣区域的画面内容以及非感兴趣区域的画面内容。
在另一种可能的实现方式中,当前需要采集的图像帧的序号为N的倍数时,第一终端还可以是继续采集局部图像帧,并且在采集该局部图像帧之后,插入一个全局图像帧。
第一终端可以是基于4k分辨率或者是2k分辨率来采集全局图像帧。
步骤1304,第一终端将采集到的图像帧转换为1080P分辨率。
在获取局部图像帧或全局图像帧后,第一终端将获取的局部图像帧或全局图像帧转换为1080P分辨率,即将获取的局部图像帧或全局图像帧转换为由1920×1080个像素构成的图像帧。对于局部图像帧,如果局部图像帧本身的尺寸小于1920×1080,则通过对局部图像帧进行上采样,以将局部图像帧的大小转换为1920×1080;如果局部图像帧本身的尺寸大于1920×1080,则通过对局部图像帧进行下采样,以将局部图像帧的大小转换为1920×1080。对于全局图像帧,则是通过对全局图像帧进行下采样,以将全局图像帧的大小转换为1920×1080。
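"统一转换为1080P分辨率时选择上采样还是下采样"的判断,可以用如下 Python 片段示意(以像素总数比较为判据仅为示例性假设,实际的重采样可由双线性插值等算法完成):

```python
def resample_mode(src_w, src_h, dst_w=1920, dst_h=1080):
    """判断把图像帧统一到 1920x1080 时应上采样还是下采样:
    源尺寸小于目标尺寸则上采样,大于则下采样,相等则保持不变。"""
    if src_w * src_h < dst_w * dst_h:
        return "upsample"
    if src_w * src_h > dst_w * dst_h:
        return "downsample"
    return "keep"
```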
步骤1305,第一终端根据帧类型选择图像帧的参考帧,并基于参考帧对图像帧进行编码;或者,第一终端通过两路编码器对不同帧类型的图像帧进行编码。
在获取图像帧后,第一终端则对采集到的图像帧进行编码。在编码过程中,第一终端可以根据当前待编码图像帧的帧类型选择当前待编码图像帧的参考帧。其中,局部图像帧的帧类型可以定义为第一类型,全局图像帧的帧类型可以定义为第二类型。
在一种可能的实现方式中,对于帧类型为第一类型的任意一个图像帧来说,第一终端可以确定该图像帧的参考帧为该图像帧的前一个第一类型的图像帧,即时域上离该图像帧最近的一个第一类型的图像帧。类似地,对于帧类型为第二类型的任意一个图像帧来说,第一终端可以确定该图像帧的参考帧为该图像帧的前一个第二类型的图像帧,即时域上离该图像帧最近的一个第二类型的图像帧。也就是说,对于任意一个图像帧,该图像帧的参考帧为时域上离该图像帧最近的一个相同类型的图像帧。
在另一种可能的实现方式中,第一终端可以将不同帧类型的图像帧输入至不同的编码器中,从而实现通过两路编码器来对图像帧进行编码。具体地,第一终端将帧类型为第一类型的图像帧输入至第一编码器中,由第一编码器对这部分图像帧进行编码。此外,第一终端将帧类型为第二类型的图像帧输入至第二编码器中,由第二编码器对这部分图像帧进行编码。
其中,第一编码器对每个第一类型的图像帧进行编码时,是以每个第一类型的图像帧的前一个图像帧作为参考帧来进行编码。第二编码器对第二类型的图像帧进行编码时,第二类型的图像帧的参考帧为第二编码器中在该图像帧之前的一个图像帧。
步骤1306,第一终端在码流中携带当前图像帧的指示信息,并向第二终端发送码流。
在远程视频过程中,第一终端持续对获取的图像帧进行编码,得到码流。第一终端可以在码流的SEI中携带当前图像帧的指示信息,并向第二终端发送码流。其中,所述指示信息用于指示当前图像帧的帧类型以及当前图像帧对应的感兴趣区域信息。
步骤1307,第二终端接收第一终端发送的码流,并进行解码,得到图像帧以及对应的指示信息。
第二终端接收到第一终端发送的码流后,对码流进行解码,得到图像帧以及该图像帧对应的指示信息。
步骤1308,第二终端获取交互指令,并通过解析交互指令,得到交互指令所指示的感兴趣区域。
可选的,在远程视频的任意时刻,第二终端可以获取用户触发的交互指令,并通过解析交互指令,得到交互指令所指示的感兴趣区域。该交互指令例如可以为用户通过触控第二终端的屏幕所发起的缩小操作指令、放大操作指令或者平移操作指令。
步骤1309,第二终端根据交互指令所指示的感兴趣区域更新本地的感兴趣区域信息。
在解析得到交互指令所指示的感兴趣区域后,第二终端根据交互指令所指示的感兴趣区域更新本地的感兴趣区域信息。
步骤1310,第二终端根据图像帧所携带的指示信息判断当前图像帧是否为全局图像帧。
由于指示信息用于指示图像帧的帧类型,因此第二终端可以根据码流中与图像帧对应的指示信息,判断当前图像帧是否为全局图像帧,即判断当前图像帧的帧类型是否为第二类型。
步骤1311,如果当前图像帧不是全局图像帧,则将当前图像帧送往第一缓冲区。
当第一缓冲区存放有其他的局部图像帧时,第二终端将新的局部图像帧覆盖在其他的局部图像帧之上,以使得第一缓冲区中始终保持只有一个图像帧。
步骤1312,如果当前图像帧是全局图像帧,则将当前图像帧送往第二缓冲区。
当第二缓冲区存放有其他的全局图像帧时,第二终端将新的全局图像帧覆盖在其他的全局图像帧之上,以使得第二缓冲区中始终保持只有一个图像帧。
步骤1313,第二终端判断第一缓冲区中的图像帧对应的感兴趣区域信息是否与第二终端本地的感兴趣区域信息一致。
基于第一缓冲区中的图像帧对应的指示信息,第二终端可以判断第一缓冲区中的图像帧对应的感兴趣区域信息是否与第二终端本地的感兴趣区域信息一致。
步骤1314,如果第一缓冲区中的图像帧对应的感兴趣区域信息与第二终端本地的感兴趣区域信息一致,第二终端将第一缓冲区的局部图像帧送往显示屏进行显示。
步骤1315,如果第一缓冲区中的图像帧对应的感兴趣区域信息与第二终端本地的感兴趣区域信息不一致,第二终端截取第二缓冲区中的全局图像帧中的部分区域,并将截取后得到的图像帧送往显示屏进行显示。
当感兴趣区域发生变化的时候,可以根据第二终端本地中的感兴趣区域信息,从全局图像帧中确定新的感兴趣区域所在的位置,并将新的感兴趣区域所在位置裁剪下来,得到新的图像帧。其中,新的图像帧中的内容即为第二终端的感兴趣区域信息所指示的位置对应的内容。
在图1至图13所对应的实施例的基础上,为了更好的实施本申请实施例的上述方案,下面还提供用于实施上述方案的相关设备。
具体可以参阅图14,图14为本申请实施例提供的一种终端1400的结构示意图,该终端1400包括:获取单元1401、处理单元1402和收发单元1403;所述获取单元1401用于获取多个图像帧,所述多个图像帧包括第一图像帧和第二图像帧,所述第一图像帧的变焦倍率大于所述第二图像帧的变焦倍率,所述第一图像帧是根据感兴趣区域信息确定的,所述感兴趣区域信息用于指示感兴趣区域的位置;所述收发单元1403用于向第二终端发送所述多个图像帧和指示信息,以使得所述第二终端根据所述指示信息选择待显示的图像帧,所述待显示的图像帧用于生成视频,所述指示信息包括所述第一图像帧的感兴趣区域信息。
可选的,在一种可能的实现方式中,所述第二图像帧中包括所述感兴趣区域信息中所指示的感兴趣区域。
可选的,在一种可能的实现方式中,所述获取单元1401,还用于采集第三图像帧和所述第二图像帧,所述第三图像帧的变焦倍率与所述第二图像帧的变焦倍率相同;所述处理单元1402,还用于根据所述感兴趣区域信息对所述第三图像帧进行裁剪,得到所述第一图像帧。
可选的,在一种可能的实现方式中,所述获取单元1401,还用于采集一个或多个第三图像帧,所述一个或多个第三图像帧的变焦倍率与所述第二图像帧的变焦倍率相同;所述处理单元1402,还用于根据所述感兴趣区域信息对所述一个或多个第三图像帧进行裁剪,得到一个或多个所述第一图像帧;所述处理单元1402还用于将所述一个或多个第三图像帧中的一个第三图像帧确定为所述第二图像帧。
可选的,在一种可能的实现方式中,所述获取单元1401通过光学变焦的方式获取所述第一图像帧和所述第二图像帧。
可选的,在一种可能的实现方式中,所述多个图像帧还包括第四图像帧,所述第四图像帧的变焦倍率与所述第二图像帧的变焦倍率相同;所述获取单元1401还用于根据预置规则依次获取所述第四图像帧、一个或多个所述第一图像帧和所述第二图像帧;其中,所述预置规则为所述第一终端基于所述感兴趣区域信息采集预设数量的图像帧后,则采用目标变焦倍率采集一个图像帧,所述预设数量与一个或多个所述第一图像帧的数量相同,所述目标变焦倍率为所述第二图像帧的变焦倍率。
可选的,在一种可能的实现方式中,所述获取单元1401还用于获取所述多个图像帧的参考帧;所述处理单元1402还用于根据所述多个图像帧的参考帧,对所述多个图像帧进行编码,得到编码结果;所述收发单元1403还用于向所述第二终端发送编码结果;其中,所述第一图像帧的参考帧是根据所述感兴趣区域信息得到的,所述第二图像帧的参考帧的变焦倍率与所述第二图像帧的变焦倍率相同。
可选的,在一种可能的实现方式中,所述处理单元1402还用于根据所述第一图像帧的参考帧,通过第一编码器对所述第一图像帧进行编码,得到第一编码结果;所述处理单元1402还用于根据所述第二图像帧的参考帧,通过第二编码器对所述第二图像帧进行编码,得到第二编码结果;其中,所述编码结果包括所述第一编码结果和所述第二编码结果。
可选的,在一种可能的实现方式中,所述指示信息还包括所述多个图像帧的帧类型,所述第一图像帧的帧类型与所述第二图像帧的帧类型不同。
在另一个可能的实施例中,所述收发单元1403用于接收第二终端发送的多个图像帧和指示信息,所述多个图像帧包括第一图像帧和第二图像帧,所述第一图像帧的变焦倍率大于所述第二图像帧的变焦倍率,所述第一图像帧是根据感兴趣区域信息确定的,所述感兴趣区域信息用于指示感兴趣区域的位置,所述指示信息包括所述第一图像帧的感兴趣区域信息;所述处理单元1402用于:若所述第一图像帧的感兴趣区域信息与所述第一终端中的感兴趣区域信息不同,则根据所述第一终端中的感兴趣区域信息在所述第二图像帧中裁剪得到第三图像帧,并显示所述第三图像帧;若所述第一图像帧的感兴趣区域信息与所述第一终端中的感兴趣区域信息相同,则显示所述第一图像帧。
可选的,在一种可能的实现方式中,所述第二图像帧中包括所述感兴趣区域信息中所指示的感兴趣区域。
可选的,在一种可能的实现方式中,所述第一图像帧是所述第二终端在采集一个或多个第三图像帧后,根据所述第二终端中的感兴趣区域信息对所述一个或多个第三图像帧进行裁剪得到的,所述一个或多个第三图像帧的变焦倍率与所述第二图像帧的变焦倍率相同。
可选的,在一种可能的实现方式中,所述第二图像帧为所述一个或多个第三图像帧中的一个图像帧。
可选的,在一种可能的实现方式中,所述多个图像帧是所述第二终端通过光学变焦的方式获取的。
可选的,在一种可能的实现方式中,所述多个图像帧还包括第四图像帧,所述第四图像帧的变焦倍率与所述第二图像帧的变焦倍率相同;所述第四图像帧、一个或多个所述第一图像帧和所述第二图像帧是所述第二终端根据预置规则依次获取的;其中,所述预置规则为所述第二终端基于所述感兴趣区域信息采集预设数量的图像帧后,则采用目标变焦倍率采集一个图像帧,所述预设数量与一个或多个所述第一图像帧的数量相同,所述目标变焦倍率为所述第二图像帧的变焦倍率。
可选的,在一种可能的实现方式中,所述指示信息还包括所述多个图像帧的帧类型,所述第一图像帧的帧类型与所述第二图像帧的帧类型不同;所述处理单元1402还用于根据所述多个图像帧的帧类型,将所述第一图像帧依次送往第一缓冲区以及将所述第二图像帧送往第二缓冲区;所述处理单元1402还用于:若所述第一缓冲区中的第一图像帧的感兴趣区域信息与所述第一终端中的感兴趣区域信息不同,则根据所述第一终端中的感兴趣区域信息在所述第二缓冲区的第二图像帧中裁剪得到第三图像帧,并显示所述第三图像帧;若所述第一缓冲区中的第一图像帧的感兴趣区域信息与所述第一终端中的感兴趣区域信息相同,则显示所述第一缓冲区中的第一图像帧。
可选的,在一种可能的实现方式中,所述获取单元1401用于获取交互指令,所述交互指令用于指示变更感兴趣区域;所述处理单元1402还用于根据所述交互指令更新感兴趣区域信息,得到更新后的感兴趣区域信息;所述收发单元1403还用于向所述第二终端发送更新后的感兴趣区域信息。
本申请实施例提供的远程视频方法具体可以由终端中的芯片来执行,该芯片包括:处理单元1402和通信单元,处理单元1402例如可以是处理器,通信单元例如可以是输入/输出接口、管脚或电路等。该处理单元1402可执行存储单元存储的计算机执行指令,以使终端内的芯片执行上述图1至图13所示实施例描述的远程视频方法。可选的,存储单元为芯片内的存储单元,如寄存器、缓存等,存储单元还可以是终端内的位于芯片外部的存储单元,如只读存储器(read-only memory,ROM)或可存储静态信息和指令的其他类型的静态存储设备,随机存取存储器(random access memory,RAM)等。
参照图15,本申请还提供了一种计算机程序产品,在一些实施例中,上述图5所公开的方法可以实施为以机器可读格式被编码在计算机可读存储介质上的或者被编码在其它非瞬时性介质或者制品上的计算机程序指令。
图15示意性地示出根据这里展示的至少一些实施例而布置的示例计算机程序产品的概念性局部视图,示例计算机程序产品包括用于在计算设备上执行计算机进程的计算机程序。
在一个实施例中,计算机程序产品1500是使用信号承载介质1501来提供的。信号承载介质1501可以包括一个或多个程序指令1502,其当被一个或多个处理器运行时可以提供以上针对图5描述的功能或者部分功能。因此,例如,参考图5中所示的实施例,步骤501-505的一个或多个特征可以由与信号承载介质1501相关联的一个或多个指令来承担。此外,图15中的程序指令1502也描述示例指令。
在一些示例中,信号承载介质1501可以包含计算机可读介质1503,诸如但不限于,硬盘驱动器、紧密盘(CD)、数字视频光盘(DVD)、数字磁带、存储器、ROM或RAM等等。
在一些实施方式中,信号承载介质1501可以包含计算机可记录介质1504,诸如但不限于,存储器、读/写(R/W)CD、R/W DVD、等等。在一些实施方式中,信号承载介质1501可以包含通信介质1505,诸如但不限于,数字和/或模拟通信介质(例如,光纤电缆、波导、有线通信链路、无线通信链路、等等)。因此,例如,信号承载介质1501可以由无线形式的通信介质1505(例如,遵守IEEE 802.15标准或者其它传输协议的无线通信介质)来传达。
一个或多个程序指令1502可以是,例如,计算机可执行指令或者逻辑实施指令。在一些示例中,计算设备可以被配置为,响应于通过计算机可读介质1503、计算机可记录介质1504、和/或通信介质1505中的一个或多个传达到计算设备的程序指令1502,提供各种操作、功能、或者动作。
应该理解,这里描述的布置仅仅是用于示例的目的。因而,本领域技术人员将理解,其它布置和其它元素(例如,机器、接口、功能、顺序、和功能组等等)能够被取而代之地使用,并且一些元素可以根据所期望的结果而一并省略。另外,所描述的元素中的许多是可以被实现为离散的或者分布式的组件的、或者以任何适当的组合和位置来结合其它组件实施的功能实体。
所属领域的技术人员可以清楚地了解到,为描述的方便和简洁,上述描述的系统,装置和单元的具体工作过程,可以参考前述方法实施例中的对应过程,在此不再赘述。
在本申请所提供的几个实施例中,应该理解到,所揭露的系统,装置和方法,可以通过其它的方式实现。例如,以上所描述的装置实施例仅仅是示意性的,例如,所述单元的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式,例如多个单元或组件可以结合或者可以集成到另一个系统,或一些特征可以忽略,或不执行。另一点,所显示或讨论的相互之间的耦合或直接耦合或通信连接可以是通过一些接口,装置或单元的间接耦合或通信连接,可以是电性,机械或其它的形式。
所述作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是或者也可以不是物理单元,即可以位于一个地方,或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部单元来实现本实施例方案的目的。
另外,在本申请各个实施例中的各功能单元可以集成在一个处理单元中,也可以是各个单元单独物理存在,也可以两个或两个以上单元集成在一个单元中。上述集成的单元既可以采用硬件的形式实现,也可以采用软件功能单元的形式实现。
所述集成的单元如果以软件功能单元的形式实现并作为独立的产品销售或使用时,可以存储在一个计算机可读取存储介质中。基于这样的理解,本申请的技术方案本质上或者说对现有技术做出贡献的部分或者该技术方案的全部或部分可以以软件产品的形式体现出来,该计算机软件产品存储在一个存储介质中,包括若干指令用以使得一台计算机设备(可以是个人计算机,服务器,或者网络设备等)执行本申请各个实施例所述方法的全部或部分步骤。而前述的存储介质包括:U盘、移动硬盘、只读存储器、随机存取存储器、磁碟或者光盘等各种可以存储程序代码的介质。

Claims (20)

  1. 一种远程视频方法,其特征在于,包括:
    第一终端获取多个图像帧,所述多个图像帧包括第一图像帧和第二图像帧,所述第一图像帧的变焦倍率大于所述第二图像帧的变焦倍率,所述第一图像帧是根据感兴趣区域信息确定的,所述感兴趣区域信息用于指示所述感兴趣区域的位置;
    所述第一终端向第二终端发送所述多个图像帧和指示信息,以使得所述第二终端根据所述指示信息选择待显示的图像帧,所述待显示的图像帧用于生成视频,所述指示信息包括所述第一图像帧的感兴趣区域信息。
  2. 根据权利要求1所述的方法,其特征在于,所述第二图像帧中包括所述感兴趣区域信息中所指示的感兴趣区域。
  3. 根据权利要求1或2所述的方法,其特征在于,所述第一终端获取所述多个图像帧,包括:
    所述第一终端采集第三图像帧和所述第二图像帧,所述第三图像帧的变焦倍率与所述第二图像帧的变焦倍率相同;
    所述第一终端根据所述感兴趣区域信息对所述第三图像帧进行裁剪,得到所述第一图像帧。
  4. 根据权利要求1或2所述的方法,其特征在于,所述第一终端获取所述多个图像帧,包括:
    所述第一终端采集一个或多个第三图像帧,所述一个或多个第三图像帧的变焦倍率与所述第二图像帧的变焦倍率相同;
    所述第一终端根据所述感兴趣区域信息对所述一个或多个第三图像帧进行裁剪,得到一个或多个所述第一图像帧;
    所述第一终端将所述一个或多个第三图像帧中的一个第三图像帧确定为所述第二图像帧。
  5. 根据权利要求1或2所述的方法,其特征在于,所述第一终端获取所述多个图像帧,包括:
    所述第一终端通过光学变焦的方式采集所述第一图像帧和所述第二图像帧。
  6. 根据权利要求1-5任意一项所述的方法,其特征在于,所述多个图像帧还包括第四图像帧,所述第四图像帧的变焦倍率与所述第二图像帧的变焦倍率相同;
    所述第一终端获取多个图像帧,包括:
    所述第一终端根据预置规则依次采集所述第四图像帧、一个或多个所述第一图像帧和所述第二图像帧;
    其中,所述预置规则为所述第一终端基于所述感兴趣区域信息采集预设数量的图像帧后,则采用目标变焦倍率采集一个图像帧,所述预设数量与一个或多个所述第一图像帧的数量相同,所述目标变焦倍率为所述第二图像帧的变焦倍率。
  7. 根据权利要求1-6任意一项所述的方法,其特征在于,所述第一终端向第二终端发送所述多个图像帧,包括:
    所述第一终端获取所述多个图像帧的参考帧;
    所述第一终端根据所述多个图像帧的参考帧,对所述多个图像帧进行编码,得到编码结果;
    所述第一终端向所述第二终端发送编码结果;
    其中,所述第一图像帧的参考帧是根据所述感兴趣区域信息得到的,所述第二图像帧的参考帧的变焦倍率与所述第二图像帧的变焦倍率相同。
  8. 根据权利要求7所述的方法,其特征在于,所述第一终端根据所述多个图像帧的参考帧,对所述多个图像帧进行编码,得到编码结果,包括:
    所述第一终端根据所述第一图像帧的参考帧,通过第一编码器对所述第一图像帧进行编码,得到第一编码结果;
    所述第一终端根据所述第二图像帧的参考帧,通过第二编码器对所述第二图像帧进行编码,得到第二编码结果;
    其中,所述编码结果包括所述第一编码结果和所述第二编码结果。
  9. 根据权利要求1-8任意一项所述的方法,其特征在于,所述指示信息还包括所述多个图像帧的帧类型,所述第一图像帧的帧类型与所述第二图像帧的帧类型不同。
  10. 一种远程视频方法,其特征在于,包括:
    第一终端接收第二终端发送的多个图像帧和指示信息,所述多个图像帧包括第一图像帧和第二图像帧,所述第一图像帧的变焦倍率大于所述第二图像帧的变焦倍率,所述第一图像帧是根据感兴趣区域信息确定的,所述感兴趣区域信息用于指示感兴趣区域的位置,所述指示信息包括所述第一图像帧的感兴趣区域信息;
    若所述第一图像帧的感兴趣区域信息与所述第一终端中的感兴趣区域信息不同,则根据所述第一终端中的感兴趣区域信息在所述第二图像帧中裁剪得到第三图像帧,并显示所述第三图像帧;
    若所述第一图像帧的感兴趣区域信息与所述第一终端中的感兴趣区域信息相同,则显示所述第一图像帧。
  11. 根据权利要求10所述的方法,其特征在于,所述第二图像帧中包括所述感兴趣区域信息中所指示的感兴趣区域。
  12. 根据权利要求10或11所述的方法,其特征在于,所述第一图像帧是所述第二终端在采集一个或多个第三图像帧后,根据所述第二终端中的感兴趣区域信息对所述一个或多个第三图像帧进行裁剪得到的,所述一个或多个第三图像帧的变焦倍率与所述第二图像帧的变焦倍率相同。
  13. 根据权利要求12所述的方法,其特征在于,所述第二图像帧为所述一个或多个第三图像帧中的一个图像帧。
  14. 根据权利要求10或11所述的方法,其特征在于,所述多个图像帧是所述第二终端通过光学变焦的方式获取的。
  15. 根据权利要求10-14任意一项所述的方法,其特征在于,所述多个图像帧还包括第四图像帧,所述第四图像帧的变焦倍率与所述第二图像帧的变焦倍率相同;
    所述第四图像帧、一个或多个所述第一图像帧和所述第二图像帧是所述第二终端根据预置规则依次获取的;
    其中,所述预置规则为所述第二终端基于所述感兴趣区域信息采集预设数量的图像帧后,则采用目标变焦倍率采集一个图像帧,所述预设数量与一个或多个所述第一图像帧的数量相同,所述目标变焦倍率为所述第二图像帧的变焦倍率。
  16. 根据权利要求10-15任意一项所述的方法,其特征在于,所述指示信息还包括所述多个图像帧的帧类型,所述第一图像帧的帧类型与所述第二图像帧的帧类型不同;
    所述方法还包括:
    所述第一终端根据所述多个图像帧的帧类型,将所述第一图像帧依次送往第一缓冲区以及将所述第二图像帧送往第二缓冲区;
    所述若所述第一图像帧的感兴趣区域信息与所述第一终端中的感兴趣区域信息不同,则根据所述第一终端中的感兴趣区域信息在所述第二图像帧中裁剪得到第三图像帧,并显示所述第三图像帧,包括:
    若所述第一缓冲区中的第一图像帧的感兴趣区域信息与所述第一终端中的感兴趣区域信息不同,则根据所述第一终端中的感兴趣区域信息在所述第二缓冲区的第二图像帧中裁剪得到第三图像帧,并显示所述第三图像帧;
    所述若所述第一图像帧的感兴趣区域信息与所述第一终端中的感兴趣区域信息相同,则显示所述第一图像帧,包括:
    若所述第一缓冲区中的第一图像帧的感兴趣区域信息与所述第一终端中的感兴趣区域信息相同,则显示所述第一缓冲区中的第一图像帧。
  17. 根据权利要求10-16任意一项所述的方法,其特征在于,所述方法还包括:
    所述第一终端获取交互指令,所述交互指令用于指示变更感兴趣区域;
    所述第一终端根据所述交互指令更新感兴趣区域信息,得到更新后的感兴趣区域信息;
    所述第一终端向所述第二终端发送更新后的感兴趣区域信息。
  18. 一种终端,其特征在于,包括存储器和处理器;所述存储器存储有代码,所述处理器被配置为执行所述代码,当所述代码被执行时,所述终端执行如权利要求1至17任一所述的方法。
  19. 一种计算机可读存储介质,其特征在于,包括计算机可读指令,当所述计算机可读指令在计算机上运行时,使得所述计算机执行如权利要求1至17中任一项所述的方法。
  20. 一种计算机程序产品,其特征在于,包括计算机可读指令,当所述计算机可读指令在计算机上运行时,使得所述计算机执行如权利要求1至17任一项所述的方法。
PCT/CN2022/082387 2021-03-26 2022-03-23 一种远程视频方法及相关装置 WO2022199594A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110327092.4A CN115134633B (zh) 2021-03-26 2021-03-26 一种远程视频方法及相关装置
CN202110327092.4 2021-03-26

Publications (1)

Publication Number Publication Date
WO2022199594A1 true WO2022199594A1 (zh) 2022-09-29

Family

ID=83374140

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/082387 WO2022199594A1 (zh) 2021-03-26 2022-03-23 一种远程视频方法及相关装置

Country Status (2)

Country Link
CN (1) CN115134633B (zh)
WO (1) WO2022199594A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117692762A (zh) * 2023-06-21 2024-03-12 荣耀终端有限公司 拍摄方法及电子设备

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050237380A1 (en) * 2004-04-23 2005-10-27 Toshiaki Kakii Coding method for notion-image data, decoding method, terminal equipment executing these, and two-way interactive system
WO2014013619A1 (ja) * 2012-07-20 2014-01-23 Necカシオモバイルコミュニケーションズ株式会社 撮像装置及び電子ズーム方法
US20150371365A1 (en) * 2014-06-24 2015-12-24 Nokia Technologies Oy Method and technical equipment for image capturing and viewing
CN107896303A (zh) * 2017-10-23 2018-04-10 努比亚技术有限公司 一种图像采集方法、系统和设备及计算机可读存储介质
CN111447359A (zh) * 2020-03-19 2020-07-24 展讯通信(上海)有限公司 数字变焦方法、系统、电子设备、介质及数字成像设备

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP4373682B2 (ja) * 2003-01-31 2009-11-25 独立行政法人理化学研究所 関心組織領域抽出方法、関心組織領域抽出プログラム及び画像処理装置
CN102625147B (zh) * 2012-02-29 2015-01-14 中山大学 一种移动可视设备自适应显示方法
CN105553618B (zh) * 2015-12-22 2019-05-10 西安交通大学 基于喷泉码和自适应资源分配的图像安全传输方法
CN107018386A (zh) * 2017-06-08 2017-08-04 柳州智视科技有限公司 一种视频流多分辨率观测系统
KR20190083234A (ko) * 2018-01-03 2019-07-11 삼성메디슨 주식회사 초음파 진단 장치의 제어 방법 및 초음파 진단 장치
CN111741274B (zh) * 2020-08-25 2020-12-29 北京中联合超高清协同技术中心有限公司 一种支持画面局部放大和漫游的超高清视频监看方法
CN112003875A (zh) * 2020-09-03 2020-11-27 北京云石海慧软件有限公司 一种视讯焦点内容传输系统及方法


Also Published As

Publication number Publication date
CN115134633A (zh) 2022-09-30
CN115134633B (zh) 2024-04-26

Similar Documents

Publication Publication Date Title
US11303881B2 (en) Method and client for playing back panoramic video
EP3335416B1 (en) Digital photographing apparatus and method of operating the same
CN112204993B (zh) 使用重叠的被分区的分段的自适应全景视频流式传输
US10244167B2 (en) Apparatus and methods for image encoding using spatially weighted encoding quality parameters
TWI569629B (zh) 用於在壓縮視訊資料中包括感興趣區域指示之技術
EP3596931B1 (en) Method and apparatus for packaging and streaming of virtual reality media content
US11483475B2 (en) Adaptive panoramic video streaming using composite pictures
US20140092439A1 (en) Encoding images using a 3d mesh of polygons and corresponding textures
JP7565104B2 (ja) 映像配信装置、映像配信システム、映像配信方法並びに映像配信プログラム
WO2018103384A1 (zh) 一种360度全景视频的播放方法、装置及系统
CN111669561B (zh) 多角度自由视角图像数据处理方法及装置、介质、设备
US9451197B1 (en) Cloud-based system using video compression for interactive applications
WO2022199594A1 (zh) 一种远程视频方法及相关装置
CN114666477A (zh) 一种视频数据处理方法、装置、设备及存储介质
JP2019149785A (ja) 映像変換装置及びプログラム
JP6952456B2 (ja) 情報処理装置、制御方法、及びプログラム
WO2018196530A1 (zh) 一种视频信息处理方法及终端、计算机存储介质
WO2023103875A1 (zh) 自由视角视频的视角切换方法、装置及系统
CN117082295B (zh) 图像流处理方法、设备及存储介质
WO2024012295A1 (zh) 用于视频传输的方法、装置、系统、设备和介质
WO2024207955A1 (zh) 视频处理方法、装置、电子设备和存储介质
CN117440176A (zh) 用于视频传输的方法、装置、设备和介质
JP2016065958A (ja) 表示制御システム、表示制御装置、及びプログラム
WO2023031890A1 (en) Context based adaptable video cropping
JP2023507586A (ja) 3dof構成要素からの6dofコンテンツを符号化、復号化、及びレンダリングするための方法及び装置

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22774250

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 22774250

Country of ref document: EP

Kind code of ref document: A1