WO2021052500A1 - Video image transmission method, sending device, video call method and device - Google Patents

Video image transmission method, sending device, video call method and device

Info

Publication number
WO2021052500A1
WO2021052500A1 · PCT/CN2020/116541 · CN2020116541W
Authority
WO
WIPO (PCT)
Prior art keywords
frame, frames, ltr, current frame, inter
Prior art date
Application number
PCT/CN2020/116541
Other languages
English (en)
French (fr)
Inventor
周旭升
卢宇峰
Original Assignee
华为技术有限公司 (Huawei Technologies Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司 (Huawei Technologies Co., Ltd.)
Priority to EP20866583.6A priority Critical patent/EP4024867A4/en
Publication of WO2021052500A1 publication Critical patent/WO2021052500A1/zh
Priority to US17/698,405 priority patent/US20220210469A1/en

Links

Images

Classifications

    • H — ELECTRICITY
    • H04 — ELECTRIC COMMUNICATION TECHNIQUE
    • H04N — PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 7/141 — Systems for two-way working between two video terminals, e.g. videophone
    • H04N 7/147 — Communication arrangements, e.g. identifying the communication as a video-communication, intermediate storage of the signals
    • H04N 19/105 — Selection of the reference unit for prediction within a chosen coding or prediction mode, e.g. adaptive choice of position and number of pixels used for prediction
    • H04N 19/107 — Selection of coding mode or of prediction mode between spatial and temporal predictive coding, e.g. picture refresh
    • H04N 19/159 — Prediction type, e.g. intra-frame, inter-frame or bidirectional frame prediction
    • H04N 19/164 — Feedback from the receiver or from the transmission channel
    • H04N 19/172 — Adaptive coding in which the coding unit is an image region, the region being a picture, frame or field
    • H04N 19/184 — Adaptive coding in which the coding unit is bits, e.g. of the compressed video stream
    • H04N 19/577 — Motion compensation with bidirectional frame interpolation, i.e. using B-pictures
    • H04N 19/58 — Motion compensation with long-term prediction, i.e. the reference frame for a current frame not being the temporally closest one
    • H04N 21/4788 — Supplemental services communicating with other users, e.g. chatting

Definitions

  • This application relates to the field of communication technology, in particular to video image transmission methods, sending equipment, video call methods and equipment.
  • In wireless networks such as Wireless Fidelity (WiFi) networks, video codec and transmission control play a key role in the quality and smoothness of video calls.
  • In existing systems, the codec and the transmission control of a video call belong to two separate subsystems, and the reference relationship between video frames is relatively fixed; as a result, the smoothness and the clarity of the video cannot both be achieved, and the user experience is poor.
  • This application provides a video image transmission method, a sending device, a video call method and device, and also a video image display method and a video image receiving device, so as to achieve a good balance between image quality and video smoothness.
  • In a first aspect, the present application provides a method for transmitting a video image, where the video image includes multiple video frames. The method includes:
  • encoding the multiple video frames to obtain an encoded bitstream, where the bitstream includes at least information indicating the reference relationship between frames; in addition, the bitstream may also include encoded data, such as residual data between the current frame and its reference frame; the information indicating the inter-frame reference relationship may be placed in the slice header;
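  • As a rough illustration (not the patent's actual bitstream syntax), the per-frame reference information carried in such a header might be modeled as follows; all field names here are hypothetical:

    ```python
    from dataclasses import dataclass

    @dataclass
    class SliceHeader:
        """Hypothetical per-frame header carrying inter-frame reference info."""
        frame_id: int        # index of this frame in display order
        ref_frame_id: int    # frame this one predicts from
        is_ltr: bool         # whether this frame is marked as a long-term reference

    def serialize(h: SliceHeader) -> dict:
        # In a real codec this would be entropy-coded into the slice header;
        # here we just expose the reference relationship explicitly.
        return {"frame": h.frame_id, "ref": h.ref_frame_id, "ltr": h.is_ltr}

    hdr = serialize(SliceHeader(frame_id=7, ref_frame_id=4, is_ltr=False))
    ```

  • The point of carrying the reference explicitly is that the decoder can reconstruct each frame from whichever reference the encoder chose, rather than assuming the temporally previous frame.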
  • the information indicating the inter-frame reference relationship includes information about the inter-frame reference relationship of the current frame and information about the inter-frame reference relationship of the N frames preceding the current frame, where:
  • the information about the inter-frame reference relationship of the current frame indicates that the current frame references the target forward long-term reference (LTR) frame closest to it in the time domain, where the target forward LTR frame is a forward LTR frame for which the sending end device has received a confirmation message from the receiving end device;
  • that is, the target forward LTR frame is an encoded video frame that was marked as an LTR frame by the sending end device and for which the sending end device has received the confirmation message sent by the receiving end device, the confirmation message corresponding to the target forward LTR frame;
  • in this application, the sending end device is the local end and may also be called the encoding end device, and the receiving end device is the opposite end or remote end and may also be called the decoding end device;
  • the information about the inter-frame reference relationship of the N frames preceding the current frame indicates that each of those N frames references the forward LTR frame closest to that frame in the time domain;
  • the forward LTR frame is an encoded video frame marked as an LTR frame by the sending end device, and is stored in the decoded picture buffer (DPB);
  • among the multiple frames between the current frame and the closest forward LTR frame, all frames may reference the same LTR frame (for example, frame A), or only some of the multiple frames may reference the same LTR frame (for example, frame A).
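  • As a minimal sketch (the frame numbers and the helper below are illustrative, not taken from the patent), the reference pattern described above can be written out as a mapping from each frame to the frame it predicts from:

    ```python
    def build_references(cur, n, m, target_ltr, nearest_ltr):
        """Sketch of the reference pattern around a current frame `cur`:
        - the N frames before `cur` reference the nearest forward LTR frame;
        - `cur` references the acknowledged target forward LTR frame;
        - the M frames after `cur` each reference their previous frame."""
        refs = {}
        for f in range(cur - n, cur):          # previous N frames
            refs[f] = nearest_ltr
        refs[cur] = target_ltr                 # current frame
        for f in range(cur + 1, cur + 1 + m):  # following M frames, frame-by-frame
            refs[f] = f - 1
        return refs

    # E.g. current frame 10, N=3, M=2, nearest forward LTR frame = 6,
    # acknowledged target forward LTR frame = 2:
    r = build_references(10, 3, 2, target_ltr=2, nearest_ltr=6)
    ```

  • Note how frames 7-9 share one short-distance reference while the current frame falls back to the acknowledged frame, which is exactly the mix of patterns the text describes.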
  • The multiple video frames are thus encoded to obtain an encoded bitstream that includes at least information indicating the reference relationship between frames.
  • The information indicating the inter-frame reference relationship includes information about the inter-frame reference relationship of the N frames preceding the current frame, which indicates that each of those N frames references the forward LTR frame closest to that frame in the time domain.
  • The forward LTR frame is an encoded video frame marked as an LTR frame by the sending end device; that is, in this embodiment, the sending end device does not need to wait for feedback from the receiving end device when marking LTR frames, so multiple LTR frames can be marked within one RTT, which greatly shortens the reference distance between frames and improves the coding quality of the video image.
  • The information indicating the inter-frame reference relationship further includes information about the inter-frame reference relationship of the M frames following the current frame, which indicates that each of those M frames references the previous frame of that frame, where N and M are positive integers.
  • The specific values of N and M may depend on the network.
  • The sending end device determines that the current frame references the target forward LTR frame closest to it in the time domain and that each of the M frames following the current frame references its previous frame; this avoids, to a large extent, referencing a frame whose time domain distance from the current frame is too long, greatly alleviates the video stuttering and blurred image quality caused by packet loss, and achieves a better balance between image quality and video smoothness.
  • Alternatively, the information indicating the inter-frame reference relationship further includes information about the inter-frame reference relationship of the M frames following the current frame, which indicates that each of those M frames references the forward LTR frame closest to the current frame in the time domain, where N and M are positive integers.
  • The specific values of N and M may depend on the network.
  • The sending end device determines the value of N according to the coding quality of the n frames preceding the current frame, where n ≤ N.
  • The sending end device may determine the value of N according to the coding quality of the n frames preceding the current frame, the motion scene of the video image, and the network status information fed back by the receiving end device; the network status information may include one or more of the network packet loss rate, the available network bandwidth, and the network round-trip time (RTT).
  • The sending end device determines the value of N according to a comparison between the coding quality of the n frames preceding the current frame and a coding quality threshold.
  • The sending end device determines the value of M according to the number of video frames included in a unit of time, and may further take into account the motion scene of the video image.
  • The unit of time may be set according to system performance and/or implementation requirements; for example, it may be 1 second.
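  • A toy illustration of the kind of rule this describes (the scaling factors below are invented for the example, not taken from the patent): M could grow with the number of frames per unit of time, so that the frame-by-frame reference run covers a fixed fraction of a second, shortened for high-motion scenes:

    ```python
    def determine_m(frames_per_second: int, high_motion: bool) -> int:
        """Hypothetical rule: let the frame-by-frame reference run cover
        about a quarter of a second, halved in high-motion scenes."""
        m = frames_per_second // 4
        if high_motion:
            m = max(1, m // 2)  # shorter runs when content moves fast
        return max(1, m)        # M must be a positive integer

    m_static = determine_m(30, high_motion=False)  # 30 fps, static scene
    m_motion = determine_m(30, high_motion=True)   # 30 fps, high motion
    ```

  • The design intuition is that frame-by-frame references are cheap and sharp but fragile under loss, so their run length is tied to the frame rate rather than fixed.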
  • The marking interval D of LTR frames has a functional relationship with N and M.
  • The marking interval of LTR frames is the number of frames between marked LTR frames, that is, how many frames must elapse after one LTR frame is marked before the next one is marked. For example, if the marking interval is 4, then after the current frame is marked as an LTR frame there must be an interval of 4 frames, and the fifth frame after the current frame is marked as the next LTR frame.
  • Here L is a positive integer and n is a positive integer greater than or equal to 1. The values of M1, M2, ..., Mn may be the same or different; the specific values may be determined according to the actual application scenario.
  • The sending end device may determine that the first frame of the (Mn+1) frames references the target forward LTR frame closest to that first frame in the time domain, and that each subsequent frame of the (Mn+1) frames references its previous frame. This shortens the reference distance between frames, improves coding quality in impaired-network environments, and realizes adaptive selection of the reference relationship, for example a flexible combination of the full-reference pattern and the frame-by-frame reference pattern. To a certain extent it avoids referencing frames whose time domain distance from the current frame is too long, greatly alleviating the video stuttering and blurred image quality caused by packet loss, and achieving a good balance between image quality and video smoothness.
  • The marking interval D of LTR frames has a functional relationship with N and L.
  • In a possible implementation, the sending end device determines the marking interval D of LTR frames according to the network status information fed back by the receiving end device, the network status information including one or more of: the network packet loss rate, the available network bandwidth, and the network round-trip time (RTT).
  • The marking interval D is used by the sending end device to mark LTR frames.
  • Because the sending end device marks LTR frames according to the marking interval, multiple LTR frames can be marked within one RTT. In this application the marking interval is not fixed but changes dynamically; successive intervals may be the same or different, as determined by the actual application scenario. This greatly shortens the reference distance between frames and improves the coding quality of the video image.
  • The sending end device can dynamically determine the LTR marking interval according to network conditions and other information, respond in time to impaired-network scenarios on the live network such as sudden packet loss, heavy packet loss and congestion, and take both smoothness and clarity into account, achieving the best video call experience.
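  • A hedged sketch of such a policy (the thresholds and values are invented for illustration; the patent only says D is derived from the feedback): the marking interval could shrink as packet loss worsens, so that marked LTR frames stay temporally close even before any acknowledgement arrives:

    ```python
    def marking_interval(loss_rate: float, rtt_ms: int, fps: int = 30) -> int:
        """Hypothetical policy: mark LTR frames more often as packet loss
        rises, but never so rarely that fewer than one LTR frame would be
        marked within an RTT's worth of frames."""
        frames_per_rtt = max(1, rtt_ms * fps // 1000)
        if loss_rate > 0.10:       # heavy loss: mark as often as possible
            d = 1
        elif loss_rate > 0.02:     # moderate loss
            d = 2
        else:                      # clean network: mark sparsely
            d = 8
        return min(d, frames_per_rtt)

    d_clean = marking_interval(0.00, rtt_ms=200)  # low loss
    d_lossy = marking_interval(0.15, rtt_ms=200)  # heavy loss
    ```

  • Capping D at one RTT's worth of frames is what allows multiple LTR frames per RTT, which is the property the surrounding text emphasizes.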
  • In a second aspect, the present application provides a method for transmitting a video image, where the video image includes multiple video frames. The method includes: judging whether the current frame is to be marked as a long-term reference (LTR) frame; if the current frame is not marked as an LTR frame, encoding the unmarked current frame, where the encoding process includes at least encoding information representing the inter-frame reference relationship of the current frame into the bitstream, the inter-frame reference relationship indicating that the current frame references the forward LTR frame closest to it in the time domain, the forward LTR frame being an encoded video frame marked as an LTR frame by the sending end device; or,
  • if the current frame is marked as an LTR frame, encoding the marked current frame, where the encoding process includes encoding information indicating the inter-frame reference relationship of the current frame into the bitstream, the inter-frame reference relationship indicating that the current frame references the target forward LTR frame closest to it in the time domain, where the target forward LTR frame is a forward LTR frame for which the sending end device has received a confirmation message from the receiving end device;
  • that is, the target forward LTR frame is an encoded video frame that was marked as an LTR frame by the sending end device and for which the sending end device has received the confirmation message sent by the receiving end device, the confirmation message corresponding to the target forward LTR frame. In this application, the sending end device is the local end and may also be called the encoding end device, and the receiving end device is the opposite end or remote end and may also be called the decoding end device.
  • The forward LTR frame is an encoded video frame marked as an LTR frame by the sending end device; that is, in this embodiment, the sending end device does not need to wait for feedback from the receiving end device when marking an LTR frame. It is therefore possible to mark multiple LTR frames within one RTT, which greatly shortens the reference distance between frames and improves the coding quality of the video image.
  • Judging whether the current frame is to be marked as a long-term reference (LTR) frame includes: judging whether the current frame is marked as an LTR frame according to the marking interval of LTR frames.
  • Specifically, this includes obtaining the number of frames between the current frame and the forward LTR frame closest to the current frame in the time domain; if that number of interval frames is equal to the marking interval of LTR frames, the current frame is marked as an LTR frame; if it is not equal to the marking interval, the current frame is not marked as an LTR frame.
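  • The judgment described above can be sketched directly (the frame indices below are illustrative):

    ```python
    def should_mark_ltr(current_idx: int, last_ltr_idx: int, interval_d: int) -> bool:
        """Mark the current frame as an LTR frame exactly when the number
        of frames between it and the closest forward LTR frame equals the
        marking interval D."""
        interval_frames = current_idx - last_ltr_idx - 1
        return interval_frames == interval_d

    # With D = 4 and the last LTR frame at index 10, frames 11..14 are not
    # marked and frame 15 (the fifth frame after it) is marked:
    marks = [should_mark_ltr(i, last_ltr_idx=10, interval_d=4) for i in range(11, 16)]
    ```

  • This matches the worked example in the text: a marking interval of 4 means the fifth frame after the last LTR frame becomes the next LTR frame.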
  • The method further includes: determining the marking interval of LTR frames according to the network status information fed back by the receiving end device, the network status information including one or more of: the network packet loss rate, the available network bandwidth, and the network round-trip time (RTT).
  • The inter-frame reference relationship of the current frame indicates that the current frame references the forward LTR frame closest to it in the time domain, the forward LTR frame being an encoded video frame marked as an LTR frame by the sending end device, where the current frame is not marked as an LTR frame and the coding quality of the current frame is greater than or equal to the coding quality threshold; or, the inter-frame reference relationship of the current frame indicates that the current frame references the forward LTR frame closest to it in the time domain, the forward LTR frame being an encoded video frame marked as an LTR frame by the sending end device, where the current frame is not marked as an LTR frame and the coding quality of the current frame is less than the coding quality threshold.
  • That is, when encoding the current frame, the sending device references the forward LTR frame closest to the current frame in the time domain. After encoding the current frame, the sending device obtains the coding quality of the current frame and compares it with the coding quality threshold. If the coding quality of the current frame is less than the threshold, then when encoding the next frame the sending device references the target forward LTR frame closest to that next frame in the time domain, so as to improve the coding quality of the frames after the current frame.
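  • This quality check can be sketched as follows (the threshold value and the quality metric are placeholders; the patent does not fix either):

    ```python
    def pick_next_reference(current_quality: float, threshold: float,
                            nearest_fwd_ltr: int, target_fwd_ltr: int) -> int:
        """After encoding the current frame, choose the reference for the
        next frame: fall back to the acknowledged target forward LTR frame
        only when the current frame's coding quality dipped below the
        threshold; otherwise keep using the nearest forward LTR frame."""
        if current_quality < threshold:
            return target_fwd_ltr   # acknowledged frame: known-good reference
        return nearest_fwd_ltr      # unacknowledged but temporally closer

    ref_ok = pick_next_reference(38.0, 32.0, nearest_fwd_ltr=8, target_fwd_ltr=4)
    ref_bad = pick_next_reference(28.0, 32.0, nearest_fwd_ltr=8, target_fwd_ltr=4)
    ```

  • The trade-off encoded here is the one the text describes: the nearest forward LTR frame gives a shorter reference distance, while the acknowledged target frame is guaranteed to be decodable at the receiver.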
  • In a possible implementation, the method further includes encoding the M+1 frames following the current frame, where the encoding process includes encoding information about the inter-frame reference relationship of those M+1 frames into the bitstream, the inter-frame reference relationship indicating that the first of the M+1 frames references the target forward LTR frame closest to that first frame in the time domain, where M is a positive integer, the current frame is not marked as an LTR frame, and the coding quality of the current frame is less than the coding quality threshold.
  • Alternatively, the method further includes encoding the next frame of the current frame, where the encoding process includes encoding information indicating the inter-frame reference relationship of that next frame into the bitstream, the inter-frame reference relationship indicating that the next frame references the target forward LTR frame closest to the current frame in the time domain, where the current frame is not marked as an LTR frame and the coding quality of the current frame is less than the coding quality threshold.
  • In a possible implementation, the method further includes:
  • encoding the M+1 frames following the current frame, which includes encoding information indicating the inter-frame reference relationship of those M+1 frames into the bitstream, the inter-frame reference relationship indicating that the first of the M+1 frames references the target forward LTR frame closest to that first frame in the time domain and that each frame after the first references its previous frame, where M is a positive integer, the current frame is not marked as an LTR frame, and the coding quality of the current frame is less than the coding quality threshold.
  • The sending end device can thus determine that the first of the M+1 frames references the target forward LTR frame closest to that first frame in the time domain, and that each subsequent frame references its previous frame. This shortens the reference distance between frames, improves coding quality in impaired-network environments, and realizes adaptive selection of the reference relationship, for example a flexible combination of the full-reference pattern and the frame-by-frame reference pattern. To a certain extent it avoids referencing frames whose time domain distance from the current frame is too long, greatly alleviating the video stuttering and blurred image quality caused by packet loss, and achieving a better balance between image quality and video smoothness.
  • The sending end device determines the value of M according to the number of video frames included in a unit of time, and may further take into account the motion scene of the video image.
  • The unit of time may be set according to system performance and/or implementation requirements; for example, it may be 1 second.
  • In a third aspect, the present application provides a video call method applied to an electronic device with a display screen and an image collector.
  • The display screen may include the display screen of a vehicle-mounted computer (Mobile Data Center); the image collector may be a camera or a vehicle-mounted sensor; the electronic device may be a mobile terminal (mobile phone), a smart screen, a drone, an Intelligent Connected Vehicle (hereinafter referred to as ICV), a smart/intelligent car, a vehicle-mounted device, or similar equipment.
  • The method may include: in response to a first operation in which a first user requests a video call with a second user, establishing a video call connection between the first user and the second user, where the video call connection refers to a connection between the electronic device used by the first user and the electronic device used by the second user; and collecting, through the image collector, a video image that includes the environment of the first user, the video image including multiple video frames.
  • The environment here may be the internal and/or external environment where the first user is located, such as the environment inside a car, and/or the surroundings perceived while intelligently detecting obstacles during driving. The multiple video frames are encoded to obtain an encoded bitstream that includes at least information indicating the inter-frame reference relationship, and the encoded bitstream is sent, where the information indicating the inter-frame reference relationship includes information about the inter-frame reference relationship of the current frame and information about the inter-frame reference relationship of the N frames preceding the current frame, where:
  • the information about the inter-frame reference relationship of the current frame indicates that the current frame references the target forward long-term reference (LTR) frame closest to it in the time domain;
  • in this aspect, the sending end device is the electronic device used by the first user, and the receiving end device is the electronic device used by the second user;
  • the information about the inter-frame reference relationship of the N frames preceding the current frame indicates that each of those N frames references the forward LTR frame closest to that frame in the time domain;
  • among the multiple frames between the current frame and the closest forward LTR frame, all frames may reference the same LTR frame (for example, frame A), or only some of the multiple frames may reference the same LTR frame (for example, frame A).
  • The image collector collects a video image that includes the environment of the first user, and the multiple video frames included in the video image are then encoded to obtain an encoded bitstream that includes at least information indicating the reference relationship between frames.
  • The information indicating the inter-frame reference relationship includes information about the inter-frame reference relationship of the N frames preceding the current frame, which indicates that each of those N frames references the forward LTR frame closest to that frame in the time domain. The forward LTR frame is an encoded video frame marked as an LTR frame by the sending end device; that is, in this embodiment, the sending end device does not need to wait for feedback from the receiving end device when marking LTR frames, so it is possible to mark multiple LTR frames within one RTT, which greatly shortens the reference distance between frames and improves the coding quality of the video image.
  • the information indicating the inter-frame reference relationship further includes information about the inter-frame reference relationship of the last M frames of the current frame, and the information about the inter-frame reference relationship of the last M frames of the current frame indicates
  • N and M are positive integers.
  • the specific values of N and M may depend on the network.
• the sending end device determines that the current frame refers to the target forward LTR frame with the closest time-domain distance to the current frame, and that each of the last M frames of the current frame refers to the previous frame of that frame. This greatly alleviates the video stalling and blurred image quality caused by packet loss, and achieves a better balance between image quality and image fluency.
• the information indicating the inter-frame reference relationship further includes information about the inter-frame reference relationship of the last M frames of the current frame, which indicates that each of the last M frames refers to the forward LTR frame with the closest time-domain distance to the current frame, where N and M are positive integers.
  • the specific values of N and M may depend on the network.
• the sender device determines the value of N according to the coding quality of the first n frames of the current frame, where n ≤ N.
• the sender device can determine the value of N according to the coding quality of the first n frames of the current frame, the motion scene of the video image, and the network status information fed back by the receiver device. The network status information can include one or more of the network packet loss rate, the available network bandwidth, and the network round-trip time (RTT).
  • the sending end device determines the value of N according to a comparison result of the coding quality of the first n frames of the current frame and the coding quality threshold.
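A minimal sketch of that comparison follows. The adjustment rule and candidate values are invented for illustration; the patent states only that N follows the comparison result between the coding quality of the first n frames and a quality threshold.

```python
# Hypothetical rule: if any of the first n frames encoded below the
# quality threshold, choose a smaller N so the encoder switches sooner
# to the other reference pattern. Both candidate values are assumptions.

def choose_n(first_n_qualities, quality_threshold, n_default, n_reduced):
    """Pick N from the coding quality of the first n frames (n <= N)."""
    if all(q >= quality_threshold for q in first_n_qualities):
        return n_default   # quality held up: keep the longer run
    return n_reduced       # quality dipped: shorten the run
```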
  • the sending end device determines the value of M according to the number of video frames included in the unit time.
  • the sending end device may determine the value of M according to the number of video frames included in the unit time and the motion scene of the above-mentioned video image.
  • the above unit time may be set according to system performance and/or implementation requirements during specific implementation. For example, the above unit time may be 1 second.
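A hedged sketch of deriving M from the number of video frames per unit time (for example, the frame rate over 1 second). The divisor and the motion-scene scaling below are assumptions, not values from the patent:

```python
def choose_m(frames_per_unit_time, high_motion=False):
    """More frames per unit time tolerate a longer frame-by-frame run;
    a high-motion scene halves it. Both rules are illustrative only."""
    base = max(1, frames_per_unit_time // 10)  # e.g. 30 fps over 1 s -> M = 3
    return max(1, base // 2) if high_motion else base
```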
  • the mark interval D of the LTR frame has a functional relationship with N and M.
• L is a positive integer, and n is a positive integer greater than or equal to 1.
  • the values of M1, M2, ..., Mn may be the same or different, and the specific values may be determined according to actual application scenarios.
• the sender device can determine that the first frame in the (Mn+1) frames refers to the target forward LTR frame with the closest time-domain distance to that first frame, and that each frame after the first frame in the (Mn+1) frames refers to the previous frame of that frame. This can shorten the reference distance between frames, improve the coding quality in an impaired-network environment, and enable adaptive selection of reference relationships, such as a flexible combination of the full-reference and frame-by-frame reference patterns. To a certain extent this avoids referencing frames at a long time-domain distance from the current frame, and to a large extent alleviates the video stalling and blurred image quality caused by packet loss, achieving a good balance between image quality and image smoothness.
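The mixed pattern just described, in which the first frame of an (Mn+1)-frame group takes the full-reference step and the remaining frames chain frame by frame, can be sketched as follows (the indices and the grouping helper are illustrative assumptions):

```python
def group_references(first_idx, m, target_ltr_idx):
    """Reference index for each frame in an (m + 1)-frame group: the first
    frame references the nearest target forward LTR frame, and every later
    frame references its immediate predecessor."""
    refs = {first_idx: target_ltr_idx}       # full-reference step
    for i in range(first_idx + 1, first_idx + m + 1):
        refs[i] = i - 1                      # frame-by-frame step
    return refs
```

For example, a group of 4 frames starting at index 10 with target LTR frame 4 yields references 10→4, 11→10, 12→11, 13→12.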
  • the mark interval D of the LTR frame has a functional relationship with N and L.
  • the marking interval D of the LTR frame is used for the sending end device to mark the LTR frame.
• the transmitting end device marks LTR frames according to the LTR marking interval, so multiple LTR frames can be marked within one RTT. In this application, the LTR marking interval is not fixed but changes dynamically; consecutive intervals may be the same or different, as determined by the actual application scenario. This can greatly shorten the reference distance between frames and improve the coding quality of the video image.
• the sender device can dynamically determine the LTR marking interval based on network conditions and other information, respond in time to impaired-network scenarios on the live network such as sudden packet loss, heavy packet loss, and congestion, and balance fluency and clarity to achieve the best video call experience.
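One plausible, entirely illustrative mapping from the fed-back network state to the marking interval D is shown below; the patent states only that D may follow the packet loss rate, available bandwidth, and RTT, so all thresholds and interval values here are invented:

```python
def choose_marking_interval(loss_rate, rtt_ms):
    """Mark LTR frames more often (smaller D) as the network degrades.
    Thresholds and returned intervals are assumptions for illustration."""
    if loss_rate > 0.10 or rtt_ms > 300:  # sudden/heavy loss or congestion
        return 2
    if loss_rate > 0.02:                  # mild impairment
        return 4
    return 8                              # healthy network
```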
  • the present application provides a method for displaying a video image, where the video image includes a plurality of video frames, including:
• Parsing the code stream to obtain information indicating the inter-frame reference relationship, wherein the information indicating the inter-frame reference relationship includes the information of the inter-frame reference relationship of the current frame and the information of the inter-frame reference relationship of the previous N frames of the current frame, wherein:
• the information of the inter-frame reference relationship of the current frame indicates that the current frame refers to the target forward long-term reference (LTR) frame with the closest time-domain distance to the current frame; "the current frame" here refers to the current frame itself;
• the information of the inter-frame reference relationship of the first N frames of the current frame indicates that each of the first N frames refers to the forward LTR frame with the closest time-domain distance to the current frame; "the current frame" here refers to each frame in the previous N frames;
• Reconstructing the multiple video frames, wherein reconstructing the multiple video frames includes: reconstructing the current video frame according to the reference frame of the current frame;
• all of the multiple frames between the current frame and the forward LTR frame with the closest time-domain distance may refer to the same LTR frame (for example, A), or only some of those multiple frames may refer to the same LTR frame (for example, A).
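A toy sketch of the receiving side described above, assuming each coded frame carries its parsed reference index and a stubbed "residual"; real decoding is of course far richer than integer addition, so this only shows the reference-driven reconstruction order:

```python
def rebuild_frames(coded_frames):
    """coded_frames: list of (ref_idx_or_None, residual) in decode order.
    An intra frame (ref None) starts from 0; an inter frame starts from
    its reference's reconstruction. Purely illustrative arithmetic."""
    decoded = []
    for ref_idx, residual in coded_frames:
        base = 0 if ref_idx is None else decoded[ref_idx]
        decoded.append(base + residual)
    return decoded
```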
• the information indicating the inter-frame reference relationship includes information about the inter-frame reference relationship of the previous N frames of the current frame, which indicates that each of the first N frames refers to the forward LTR frame with the closest time-domain distance to the current frame. The forward LTR frame is an encoded video frame marked as an LTR frame by the sending end device. That is to say, in this embodiment, the sending end device does not need to wait for feedback from the receiving end device when marking an LTR frame; therefore, multiple LTR frames can be marked within one RTT, which can greatly shorten the reference distance between frames and improve the coding quality of the video image.
• the information indicating the inter-frame reference relationship further includes information about the inter-frame reference relationship of the last M frames of the current frame, which indicates that each of the last M frames refers to the previous frame of that frame, where N and M are positive integers.
  • the specific values of N and M may depend on the network.
• the sending end device determines that the current frame refers to the target forward LTR frame with the closest time-domain distance to the current frame, and that each of the last M frames of the current frame refers to the previous frame of that frame. This greatly alleviates the video stalling and blurred image quality caused by packet loss, and achieves a better balance between image quality and image fluency.
• the information indicating the inter-frame reference relationship further includes information about the inter-frame reference relationship of the last M frames of the current frame, which indicates that each of the last M frames refers to the forward LTR frame with the closest time-domain distance to the current frame, where N and M are positive integers.
  • the specific values of N and M may depend on the network.
  • the present application provides a device for sending a video image.
• the video image includes a plurality of video frames, and the device includes: an encoding module configured to encode the plurality of video frames to obtain an encoded bitstream, where the bitstream includes at least information indicating the inter-frame reference relationship; in addition, the bitstream may also include encoded data, such as residual data between the current frame and the reference frame; the information indicating the inter-frame reference relationship can be placed in the slice header;
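A hypothetical sketch of that container, with the reference information in a slice-header-like field next to the residual payload. The field names are invented for illustration; a real codec entropy-codes these fields rather than storing them in a dictionary:

```python
def pack_coded_frame(frame_idx, ref_idx, residual_bytes):
    """Bundle the inter-frame reference info with the encoded data."""
    return {
        "slice_header": {"frame_idx": frame_idx, "ref_idx": ref_idx},
        "payload": residual_bytes,  # e.g. residual vs. the reference frame
    }
```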
• the transmission module is configured to send the encoded bitstream, wherein the information indicating the inter-frame reference relationship includes the information of the inter-frame reference relationship of the current frame and the information of the inter-frame reference relationship of the previous N frames of the current frame, wherein:
• the information about the inter-frame reference relationship of the current frame indicates that the current frame refers to the target forward long-term reference (LTR) frame with the closest temporal distance to the current frame, where "the current frame" refers to the current frame itself. The target forward LTR frame is a forward LTR frame for which the sending end device has received a confirmation message from the receiving end device; specifically, the target forward LTR frame is an encoded video frame that is marked as an LTR frame by the encoding module and for which the confirmation message sent by the receiving end device has been received, the confirmation message corresponding to the target forward LTR frame. In this application, the sending end device is the local end (it may also be called the encoding end device), and the receiving end device is the opposite end or remote end (it may also be called the decoding end device);
• the information of the inter-frame reference relationship of the first N frames of the current frame indicates that each of the first N frames refers to the forward LTR frame with the closest time-domain distance to the current frame, where "the current frame" refers to each frame in the previous N frames;
  • the forward LTR frame is an encoded video frame marked as an LTR frame by the encoding module.
• the forward LTR frame is stored in the decoded picture buffer (DPB).
• all of the multiple frames between the current frame and the forward LTR frame with the closest time-domain distance may refer to the same LTR frame (for example, A), or only some of those multiple frames may refer to the same LTR frame (for example, A).
  • the encoding module encodes the multiple video frames to obtain an encoded bitstream, where the bitstream at least includes information indicating a reference relationship between frames.
• the information indicating the inter-frame reference relationship includes information about the inter-frame reference relationship of the first N frames of the current frame, which indicates that each of the first N frames of the current frame refers to the forward LTR frame with the closest time-domain distance to this frame. The forward LTR frame is an encoded video frame marked as an LTR frame by the sending end device. That is, in this embodiment, the encoding module does not need to wait for feedback from the receiving end device when marking an LTR frame, so multiple LTR frames can be marked within one RTT, which can greatly shorten the reference distance between frames and improve the coding quality of the video image.
• the information indicating the inter-frame reference relationship further includes information about the inter-frame reference relationship of the last M frames of the current frame, which indicates that each of the last M frames refers to the previous frame of that frame, where N and M are positive integers.
  • the specific values of N and M may depend on the network.
• the encoding module determines that the current frame refers to the target forward LTR frame with the closest time-domain distance to the current frame, and that each of the last M frames of the current frame refers to the previous frame of that frame. This shortens the reference distance between frames, improves the coding quality in an impaired-network environment, and enables adaptive selection of reference relationships, such as a flexible combination of the full-reference and frame-by-frame reference patterns, avoiding references to frames at a long time-domain distance from the current frame. It greatly alleviates the video stalling and blurred image quality caused by packet loss, and achieves a better balance between image quality and image smoothness.
• the information indicating the inter-frame reference relationship further includes information about the inter-frame reference relationship of the last M frames of the current frame, which indicates that each of the last M frames refers to the forward LTR frame with the closest time-domain distance to the current frame, where N and M are positive integers.
  • the specific values of N and M may depend on the network.
• the encoding module determines the value of N according to the encoding quality of the first n frames of the current frame, where n ≤ N.
• the encoding module can determine the value of N according to the encoding quality of the first n frames of the current frame, the motion scene of the video image, and the network status information fed back by the receiving end device. The network status information can include one or more of the network packet loss rate, the available network bandwidth, and the network round-trip time (RTT).
  • the encoding module determines the value of N according to a comparison result of the encoding quality of the first n frames of the current frame and an encoding quality threshold.
  • the encoding module determines the value of M according to the number of video frames included in a unit time.
  • the encoding module may determine the value of M according to the number of video frames included in the unit time and the motion scene of the above-mentioned video image.
  • the above unit time may be set according to system performance and/or implementation requirements during specific implementation. For example, the above unit time may be 1 second.
  • the mark interval D of the LTR frame has a functional relationship with N and M.
• L is a positive integer, and n is a positive integer greater than or equal to 1.
  • the values of M1, M2, ..., Mn may be the same or different, and the specific values may be determined according to actual application scenarios.
• the encoding module can determine that the first frame in the (Mn+1) frames refers to the target forward LTR frame with the closest time-domain distance to that first frame, and that each frame after the first frame in the (Mn+1) frames refers to the previous frame of that frame. This can shorten the reference distance between frames, improve the coding quality in an impaired-network environment, and enable adaptive selection of reference relationships, such as a flexible combination of the full-reference and frame-by-frame reference patterns. To a certain extent this avoids referencing frames at a long time-domain distance from the current frame, and to a large extent alleviates the video stalling and blurred image quality caused by packet loss, achieving a good balance between image quality and image fluency.
  • the mark interval D of the LTR frame has a functional relationship with N and L.
• the encoding module determines the marking interval D of the LTR frame according to the network state information fed back by the receiving end device, where the network state information includes one or more of the network packet loss rate, the available network bandwidth, and the network round-trip time (RTT).
  • the marking interval D of the LTR frame is used for the encoding module to mark the LTR frame.
• the encoding module marks LTR frames according to the LTR marking interval, so multiple LTR frames can be marked within one RTT. In this application, the LTR marking interval is not fixed but changes dynamically; consecutive intervals may be the same or different, as determined by the actual application scenario. This can greatly shorten the reference distance between frames and improve the coding quality of the video image.
• the sender device can dynamically determine the LTR marking interval based on network conditions and other information, respond in time to impaired-network scenarios on the live network such as sudden packet loss, heavy packet loss, and congestion, and balance fluency and clarity to achieve the best video call experience.
  • the present application provides a device for sending a video image, where the video image includes a plurality of video frames, including:
  • the judgment module is used to judge whether the current frame is marked as a long-term reference LTR frame
  • the encoding module is used to encode the unmarked current frame when the current frame is not marked as an LTR frame, wherein the encoding process includes: at least encoding information representing the inter-frame reference relationship of the current frame into the code Stream, the inter-frame reference relationship of the current frame indicates that the current frame refers to the forward LTR frame with the closest temporal distance to the current frame; or,
• encoding the marked current frame, where the encoding process includes: at least encoding information representing the inter-frame reference relationship of the current frame into the code stream, the inter-frame reference relationship of the current frame indicating that the current frame refers to the target forward LTR frame with the closest time-domain distance to the current frame, where the target forward LTR frame is the forward LTR frame for which the encoding module has received the confirmation message from the receiving end device;
• the target forward LTR frame is an encoded video frame that is marked as an LTR frame by the encoding module and for which a confirmation message sent by the receiving end device has been received, the confirmation message corresponding to the target forward LTR frame. The sending end device is the local end (it may also be called the encoding end device), and the receiving end device is the opposite end or remote end (it may also be called the decoding end device);
  • the transmission module is used to send the coded stream.
• when the encoding module encodes the unmarked current frame, it refers to the forward LTR frame with the closest time-domain distance to the unmarked current frame, and the forward LTR frame is an encoded video frame marked as an LTR frame by the encoding module. That is to say, in this embodiment, the encoding module does not need to wait for feedback from the receiving end device when marking LTR frames; therefore, multiple LTR frames can be marked within one RTT, which can greatly shorten the reference distance between frames and improve the coding quality of the video image.
  • the judgment module is specifically configured to judge whether the current frame is marked as an LTR frame according to the marking interval of the LTR frame.
  • the judgment module includes:
• the marking sub-module is used to mark the current frame as an LTR frame when the number of interval frames equals the marking interval of the LTR frame, and not to mark the current frame as an LTR frame when the number of interval frames does not equal the marking interval of the LTR frame.
• the judgment module is further configured to determine the marking interval of the LTR frame according to the network status information fed back by the receiving end device, where the network status information includes one or more of the network packet loss rate, the available network bandwidth, and the network round-trip time (RTT).
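The judgment and marking behaviour above can be sketched as a frame counter compared against the marking interval. The class shape is an assumption made for illustration, not the patent's module structure:

```python
class LtrMarker:
    """Mark the current frame as LTR exactly when the number of frames
    since the last mark equals the marking interval."""

    def __init__(self, marking_interval):
        self.marking_interval = marking_interval
        self.frames_since_mark = 0

    def should_mark(self):
        if self.frames_since_mark == self.marking_interval:
            self.frames_since_mark = 0   # reset the counter at each LTR mark
            return True
        self.frames_since_mark += 1
        return False
```

With a marking interval of 3, every fourth call returns True, i.e. one LTR mark per interval of three unmarked frames.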
• the inter-frame reference relationship of the current frame indicates that the current frame refers to the forward LTR frame with the closest time-domain distance to the current frame, and the forward LTR frame is an encoded video frame marked as an LTR frame by the sending end device; here the current frame is not marked as an LTR frame and the encoding quality of the current frame is greater than or equal to the encoding quality threshold; or,
  • the inter-frame reference relationship of the current frame indicates that the current frame refers to the forward LTR frame with the closest temporal distance to the current frame, and the forward LTR frame is an encoded video frame marked as an LTR frame by the transmitting end device; where The current frame is not marked as an LTR frame and the coding quality of the current frame is less than the coding quality threshold.
• when the sending device encodes the current frame, it refers to the forward LTR frame with the closest time-domain distance to the current frame. The encoding module obtains the encoding quality of the current frame and compares it with the encoding quality threshold. If the encoding quality of the current frame is less than the encoding quality threshold, then when the encoding module encodes the next frame of the current frame, it refers to the target forward LTR frame with the closest temporal distance to that next frame, so as to improve the coding quality of the frames after the current frame.
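That fallback reduces to a simple choice between the two LTR candidates. This is a sketch; the quality metric and parameter names are placeholders, not the patent's API:

```python
def pick_reference(coding_quality, quality_threshold,
                   forward_ltr, target_forward_ltr):
    """While quality stays at or above the threshold, keep the nearest
    forward LTR frame; once it drops below, switch the next frame to the
    target forward LTR frame acknowledged by the receiver."""
    if coding_quality >= quality_threshold:
        return forward_ltr
    return target_forward_ltr
```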
• the encoding module is further configured to encode the last M+1 frames of the current frame, and the encoding process includes: encoding information representing the inter-frame reference relationship of the last M+1 frames of the current frame into the code stream, the inter-frame reference relationship of the last M+1 frames indicating that the first frame in the last M+1 frames refers to the target forward LTR frame with the closest time-domain distance to that first frame, and that each frame after the first frame in the last M+1 frames refers to the previous frame of that frame, where M is a positive integer; here the current frame is not marked as an LTR frame and the coding quality of the current frame is less than the coding quality threshold.
• the encoding module can determine that the first frame in the last M+1 frames refers to the target forward LTR frame with the closest time-domain distance to that first frame, and that each frame after the first frame in the last M+1 frames refers to the previous frame of that frame. This can shorten the reference distance between frames, improve the coding quality in an impaired-network environment, and enable adaptive selection of reference relationships, such as a flexible combination of the full-reference and frame-by-frame reference patterns. To a certain extent this avoids referencing frames at a long time-domain distance from the current frame, and greatly alleviates the video stalling and blurred image quality caused by packet loss, achieving a better balance between image quality and image fluency.
• the encoding module is further configured to encode the next frame of the current frame, and the encoding process includes: encoding information indicating the inter-frame reference relationship of the next frame of the current frame into the code stream, the inter-frame reference relationship of the next frame indicating that the next frame refers to the target forward LTR frame with the closest time-domain distance to the current frame; here the current frame is not marked as an LTR frame and the coding quality of the current frame is less than the coding quality threshold.
  • the encoding module is configured to determine the value of M according to the number of video frames included in a unit time.
  • the encoding module may determine the value of M according to the number of video frames included in the unit time and the motion scene of the above-mentioned video image.
  • the above unit time may be set according to system performance and/or implementation requirements during specific implementation. For example, the above unit time may be 1 second.
  • the present application provides a video call device.
  • the video call device may be a video call device used by a first user.
• the video call device may include: a display screen; an image collector; one or more processors; a memory; multiple applications; and one or more computer programs.
  • the above-mentioned display screen may include the display screen of a vehicle-mounted computer (Mobile Data Center); the above-mentioned image collector may be a camera, or a vehicle-mounted sensor, etc.; the above-mentioned video call device may be a mobile terminal (mobile phone), a smart screen, UAV, Intelligent Connected Vehicle (hereinafter referred to as ICV), smart/intelligent car (smart/intelligent car) or in-vehicle equipment and other equipment.
  • the one or more computer programs are stored in the memory, and the one or more computer programs include instructions.
• when the instructions are executed by the device, the device executes the following steps: in response to a first operation in which the first user requests a video call with a second user, establishing a video call connection between the first user and the second user; the video call connection here refers to a video call connection between the electronic device used by the first user and the electronic device used by the second user;
• the video image includes a plurality of video frames, where the environment may be the internal environment and/or the external environment where the first user is located, such as the environment inside the car, and/or the surrounding environment perceived while intelligently detecting obstacles during driving;
  • the information indicating the inter-frame reference relationship includes the information of the inter-frame reference relationship of the current frame and the information of the inter-frame reference relationship of the previous N frames of the current frame, wherein:
  • the information of the inter-frame reference relationship of the current frame indicates that the current frame refers to the target forward long-term reference LTR frame with the closest temporal distance to the current frame, where the current frame refers to the current frame;
• the information of the inter-frame reference relationship of the first N frames of the current frame indicates that each of the first N frames refers to the forward LTR frame with the closest time-domain distance to the current frame, where "the current frame" refers to each frame in the previous N frames.
• all of the multiple frames between the current frame and the forward LTR frame with the closest time-domain distance may refer to the same LTR frame (for example, A), or only some of those multiple frames may refer to the same LTR frame (for example, A).
• the image collector collects a video image including the environment in which the first user is located, and the multiple video frames included in the video image are then encoded to obtain an encoded code stream, the code stream including at least information indicating the inter-frame reference relationship.
• the information indicating the inter-frame reference relationship includes information about the inter-frame reference relationship of the first N frames of the current frame, which indicates that each of the first N frames of the current frame refers to the forward LTR frame with the closest time-domain distance to this frame. The forward LTR frame is an encoded video frame marked as an LTR frame by the sending end device; that is, in this embodiment, the sending end device does not need to wait for feedback from the receiving end device when marking an LTR frame, so multiple LTR frames can be marked within one RTT, which can greatly shorten the reference distance between frames and improve the coding quality of the video image.
• the information indicating the inter-frame reference relationship further includes information about the inter-frame reference relationship of the last M frames of the current frame, which indicates that each of the last M frames refers to the previous frame of that frame, where N and M are positive integers.
  • the specific values of N and M may depend on the network.
• the sending end device determines that the current frame refers to the target forward LTR frame with the closest time-domain distance to the current frame, and that each of the last M frames of the current frame refers to the previous frame of that frame. This greatly alleviates the video stalling and blurred image quality caused by packet loss, and achieves a better balance between image quality and image fluency.
• the information indicating the inter-frame reference relationship further includes information about the inter-frame reference relationship of the last M frames of the current frame, which indicates that each of the last M frames refers to the forward LTR frame with the closest time-domain distance to the current frame, where N and M are positive integers.
  • the specific values of N and M may depend on the network.
• when the instruction is executed by the device, the device specifically executes the following step:
• the value of N is determined according to the coding quality of the first n frames of the current frame, where n ≤ N.
• the sender device can determine the value of N according to the coding quality of the first n frames of the current frame, the motion scene of the video image, and the network status information fed back by the receiver device. The network status information can include one or more of the network packet loss rate, the available network bandwidth, and the network round-trip time (RTT).
• when the instruction is executed by the device, the device specifically executes the following step:
  • the value of M is determined according to the number of video frames included in the unit time.
  • the sending end device may determine the value of M according to the number of video frames included in the unit time and the motion scene of the above-mentioned video image.
  • the above unit time may be set according to system performance and/or implementation requirements during specific implementation. For example, the above unit time may be 1 second.
  • the mark interval D of the LTR frame has a functional relationship with N and M.
• L is a positive integer, and n is a positive integer greater than or equal to 1.
  • the values of M1, M2, ..., Mn may be the same or different, and the specific values may be determined according to actual application scenarios.
  • the sending end device can determine that the first frame in the (Mn+1) frames refers to the target forward LTR frame with the closest time-domain distance to that first frame, and that each frame after the first frame in the (Mn+1) frames refers to the frame preceding it. This shortens the reference distance between frames, improves coding quality under poor network conditions, and enables adaptive selection of reference relationships, such as a flexible combination of full reference relationships and frame-by-frame reference relationships. To a certain extent this avoids reference frames with a long time-domain distance from the current frame, greatly alleviates video stutter and blurred images caused by packet loss, and achieves a good balance between image quality and image smoothness.
  • the marking interval D of the LTR frame has a functional relationship with N and L.
  • the marking interval D of the LTR frame is used for the sending end device to mark the LTR frame.
  • the sending end device marks LTR frames according to the LTR marking interval, which makes it possible to mark multiple LTR frames within one RTT, thereby greatly shortening the reference distance between frames and improving the coding quality of the video image.
  • the LTR marking interval is not fixed but changes dynamically; it may stay the same or vary, as determined by the actual application scenario, which can greatly shorten the reference distance between frames and improve the encoding quality of the video image.
  • the sending end device can dynamically determine the LTR marking interval based on network conditions and other information, respond in time to poor network scenarios such as sudden packet loss, heavy packet loss, and congestion on the live network, and take into account both fluency and clarity to achieve the best video call experience.
  • the present application provides a video image receiving device, where the video image includes multiple video frames, including:
  • the decoding module is used to parse the code stream to obtain information indicating the inter-frame reference relationship, wherein the information indicating the inter-frame reference relationship includes the information of the inter-frame reference relationship of the current frame and the information of the inter-frame reference relationship of the previous N frames of the current frame, wherein:
  • the information of the inter-frame reference relationship of the current frame indicates that this frame refers to the target long-term reference (LTR) frame with the closest time-domain distance to this frame; "this frame" here means the current frame;
  • the information of the inter-frame reference relationship of the first N frames of the current frame indicates that each of the first N frames refers to the forward LTR frame with the closest time-domain distance to this frame; "this frame" here means each of the previous N frames;
  • the decoding module is further configured to reconstruct the multiple video frames, wherein reconstructing the multiple video frames includes: reconstructing the current video frame according to the reference frame of the current frame;
  • the display module is used to display the video image.
  • all of the multiple frames between the current frame and the forward LTR frame with the closest time-domain distance may refer to the same LTR frame (for example, A), or only some of those frames may refer to the same LTR frame (for example, A).
  • when the decoding module parses the code stream, information indicating the inter-frame reference relationship can be obtained, and the above-mentioned information includes information about the inter-frame reference relationship of the previous N frames of the current frame.
  • the information about the inter-frame reference relationship of the first N frames of the current frame indicates that each of the first N frames refers to the forward LTR frame with the closest time-domain distance to that frame, and the forward LTR frame is an encoded video frame marked as an LTR frame by the sending end device. That is to say, in this embodiment, the sending end device does not need to wait for feedback from the receiving end device when marking LTR frames; it is therefore possible to mark multiple LTR frames within one RTT, which can greatly shorten the reference distance between frames and improve the encoding quality of the video image.
  • the information indicating the inter-frame reference relationship further includes information about the inter-frame reference relationship of the last M frames of the current frame, and that information indicates that each of the last M frames refers to the frame preceding it, where N and M are positive integers.
  • the specific values of N and M may depend on the network.
  • the sending end device determines that the current frame refers to the target forward LTR frame with the closest time-domain distance to the current frame, and that each of the last M frames of the current frame refers to the frame preceding it. This adaptive selection of reference frames greatly alleviates video stutter and blurred images caused by packet loss and achieves a better balance between image quality and image fluency.
  • the information indicating the inter-frame reference relationship further includes information about the inter-frame reference relationship of the last M frames of the current frame, and that information indicates that each of the last M frames refers to the forward LTR frame with the closest time-domain distance to that frame, where N and M are positive integers.
  • the specific values of N and M may depend on the network.
  • the present application provides a video image encoding device.
  • the device includes a storage medium and a central processing unit.
  • the storage medium may be a non-volatile storage medium, and a computer executable program is stored in the storage medium.
  • the central processing unit is connected to the non-volatile storage medium, and executes the computer executable program to implement the first aspect or the method in any possible implementation manner of the first aspect.
  • the present application provides a video image encoding device.
  • the device includes a storage medium and a central processing unit.
  • the storage medium may be a non-volatile storage medium, and a computer executable program is stored in the storage medium.
  • the central processing unit is connected to the non-volatile storage medium, and executes the computer executable program to implement the second aspect or the method in any possible implementation manner of the second aspect.
  • the present application provides a video image decoding device.
  • the device includes a storage medium and a central processing unit.
  • the storage medium may be a non-volatile storage medium, and a computer executable program is stored in the storage medium.
  • the central processing unit is connected to the non-volatile storage medium and executes the computer executable program to implement the method of the fourth aspect.
  • an embodiment of the present application provides a device for decoding video data, and the device includes:
  • a memory used to store video data in the form of a code stream;
  • the video decoder is used to decode the information representing the inter-frame reference relationship from the code stream, wherein the information representing the inter-frame reference relationship includes the information of the inter-frame reference relationship of the current frame and the information of the inter-frame reference relationship of the first N frames of the current frame, wherein:
  • the information of the inter-frame reference relationship of the current frame indicates that this frame refers to the target long-term reference (LTR) frame with the closest time-domain distance to this frame; "this frame" here means the current frame;
  • the information of the inter-frame reference relationship of the first N frames of the current frame indicates that each of the first N frames refers to the forward LTR frame with the closest time-domain distance to this frame; "this frame" here means each of the previous N frames;
  • rebuilding the multiple video frames, wherein rebuilding the multiple video frames includes: reconstructing the current video frame according to the reference frame of the current frame.
  • an embodiment of the present application provides a device for encoding video data, and the device includes:
  • a memory for storing video data, the video data including one or more video frames
  • the video encoder is configured to encode the multiple video frames to obtain an encoded code stream, where the code stream includes at least information indicating the reference relationship between frames; in addition, the code stream may also include encoded data, for example, residual data between the current frame and the reference frame; the above information indicating the reference relationship between frames can be placed in the slice header;
  • the information indicating the inter-frame reference relationship includes the information of the inter-frame reference relationship of the current frame and the information of the inter-frame reference relationship of the previous N frames of the current frame, wherein:
  • the information of the inter-frame reference relationship of the current frame indicates that this frame refers to the target forward long-term reference (LTR) frame with the closest time-domain distance to this frame, where "this frame" means the current frame, and the target forward LTR frame is a forward LTR frame for which the device for encoding video data has received a confirmation message from the device for decoding video data. Specifically, the target forward LTR frame may be an encoded video frame that is marked as an LTR frame by the device for encoding video data and for which a confirmation message sent by the device for decoding video data has been received, where the confirmation message corresponds to the target forward LTR frame;
  • the information of the inter-frame reference relationship of the first N frames of the current frame indicates that each of the first N frames refers to the forward LTR frame with the closest time-domain distance to that frame, where "this frame" means each of the previous N frames, and the forward LTR frame is an encoded video frame marked as an LTR frame by the device for encoding video data.
  • the forward LTR frame is stored in the decoded picture buffer (DPB).
  • this application provides a computer-readable storage medium in which a computer program is stored; when the program runs on a computer, the computer executes the method of the first aspect, the second aspect, the third aspect, or the fourth aspect.
  • this application provides a computer program, when the computer program is executed by a computer, it is used to execute the method described in the first aspect, the second aspect, the third aspect, or the fourth aspect.
  • the program in the fifteenth aspect may be stored in whole or in part on a storage medium packaged with the processor, or in whole or in part in a memory not packaged with the processor.
  • Figure 1 is a schematic diagram of two users making a video call through the electronic devices they use;
  • Fig. 2 is a reference structure diagram of a coded video frame in the related art
  • FIG. 3 is a flowchart of an embodiment of a video image transmission method of this application.
  • 4(a) to 4(c) are schematic diagrams of an embodiment of the reference relationship between video frames in the video image transmission method of this application;
  • FIG. 5 is a schematic diagram of an embodiment of determining the mark interval of the LTR frame in the video image transmission method of this application;
  • Fig. 6 is a flowchart of another embodiment of a video image transmission method according to this application.
  • FIG. 7(a) to 7(b) are schematic diagrams of another embodiment of the reference relationship between video frames in the video image transmission method of this application.
  • FIG. 8 is a flowchart of an embodiment of a video call method according to this application.
  • Figures 9(a) to 9(c) are schematic diagrams of requesting a video call in the video call method of this application.
  • Figure 9(d) is the interface of establishing a video call connection stage in the video call method of this application.
  • Figure 9(e) is the interface after the video call connection is established in the video call method of this application.
  • Figures 10(a) to 10(b) are schematic diagrams of an embodiment of an application scenario of the video call method of this application.
  • FIG. 11(a) to 11(b) are schematic diagrams of another embodiment of the application scenario of the video call method of this application.
  • FIG. 12 is a flowchart of an embodiment of a method for displaying video images of this application.
  • FIG. 13 is a schematic structural diagram of an embodiment of a video image sending device of this application.
  • FIG. 14 is a schematic structural diagram of another embodiment of a video image sending device of this application.
  • FIG. 15 is a schematic structural diagram of another embodiment of a video image sending device of this application.
  • Figure 16 (a) is a schematic structural diagram of an embodiment of a video call device according to this application.
  • FIG. 16(b) is an explanatory diagram of an example of a video decoding device 40 including an encoder 20 and/or a decoder 30 according to an exemplary embodiment
  • Figure 16(c) is a schematic structural diagram of a video decoding device 400 (for example, a video encoding device 400 or a video decoding device 400) provided by an embodiment of the present application;
  • FIG. 17 is a schematic structural diagram of an embodiment of a video image receiving device of this application.
  • the video image transmission method provided by the embodiments of the present application can be applied to various real-time audio and video interaction scenarios, for example: two users make a video call through their own electronic devices, or multiple users hold a video teleconference through their own electronic devices; all of these can use the video image transmission method proposed in this application.
  • Figure 1 is a schematic diagram of two users making a video call through their own electronic devices.
  • the above two users may be user A and user B. When user A sends a video stream to user B, the electronic device used by user A can be the sending end A, and the electronic device used by user B can be the receiving end B.
  • the sending end A sends the encoded video stream to the receiving end B
  • the receiving end B feeds back the reception of the video frame and the network status information to the sending end A in real time
  • the sending end A evaluates the network status based on the information fed back by the receiving end B
  • the sending end A adjusts the video frame encoding according to the receiving end B's reception of video frames and the network conditions, and sends the encoded video stream to the receiving end B.
  • similarly, the electronic device used by user B can serve as the sending end B, and the electronic device used by user A as the receiving end A; the processing mechanism in the direction from the sending end B to the receiving end A is similar and will not be repeated here.
  • Figure 2 is a reference structure diagram of an encoded video frame in the related art. Taking the sending end A sending a video stream to the receiving end B as an example, the sending end A selects an appropriate I-frame interval, encoding rate, video resolution, frame rate, and other information according to the network conditions of the receiving end B, such as the available network bandwidth and/or network delay; during the call, the sending end A can also set the inter-frame reference relationship for the current frame according to the reception status of each frame fed back by the receiving end B, and mark the video frames in the decoded picture buffer (Decoded Picture Buffer; hereinafter referred to as DPB) at the encoding end as long-term reference (Long Term Reference; hereinafter referred to as LTR) frames, non-reference frames, or short-term reference frames;
  • when the sending end A encodes the current frame, the LTR frame confirmed by the receiving end B is used as the reference for encoding, which ensures better video fluency. An LTR frame confirmed by the receiving end B means an LTR frame for which the sending end A has received a confirmation message sent by the receiving end B, the confirmation message indicating that the LTR frame can be decoded normally by the receiving end B.
  • the receiving end B feeds back decodable-frame information in real time, the sending end A selects among the video frames buffered in the DPB and marks the selected video frame as an LTR frame, and the current frame is encoded using the newly marked LTR frame as the reference frame.
  • the advantage of this reference relationship is that the video frames received by the receiving end B are encoded using confirmed LTR frames as reference frames, so any completely received video frame can be decoded and displayed. As shown in Figure 2, packet loss in frames 6, 11, 12, 14, and 18 makes the video data incomplete, but the sending end A does not need to re-encode an I frame to restore the picture at the receiving end B; as long as subsequent video frames are received normally and completely, the receiving end B can decode them and send them to its display module for rendering and display.
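The ACK-based reference scheme described above can be sketched as follows. This is an illustrative model only; the function name, frame numbers, and the simulated feedback schedule are hypothetical rather than taken from the patent.

```python
# Sketch of the related-art ACK-based reference scheme (illustrative only):
# the sender encodes each frame against the newest LTR frame that the
# receiver has confirmed as decodable, so a lost frame never breaks the chain.

def choose_reference(confirmed_ltr_frames):
    """Return the reference frame index: the newest confirmed LTR frame."""
    if not confirmed_ltr_frames:
        return None  # no confirmed LTR yet -> frame must be an I frame
    return max(confirmed_ltr_frames)

# Receiver feedback arrives asynchronously; here we simulate that frames
# 1 and 4 are confirmed as decodable LTR frames after they are encoded.
confirmed = set()
refs = {}
for idx in range(1, 8):
    refs[idx] = choose_reference(confirmed)
    if idx in (1, 4):  # hypothetical ACKs from the receiver
        confirmed.add(idx)
```

The point of the scheme is that the reference chain only runs through frames the receiver has acknowledged, so packet loss never forces a fresh I frame; the cost, as the application notes later, is that waiting for acknowledgements lengthens the reference distance.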
  • this application redesigns the reference structure of encoded frames to combat sudden packet loss, heavy packet loss, and congestion scenarios on the live network, while taking into account both fluency and clarity to achieve the best video call experience.
  • FIG. 3 is a flowchart of an embodiment of a video image transmission method of this application.
  • the above-mentioned video image may include multiple video frames.
  • the above-mentioned video image transmission method may include:
  • Step 301 Encode a plurality of video frames to obtain an encoded code stream, the code stream at least including information indicating a reference relationship between frames.
  • the code stream may also include coded data, such as residual data between the current frame and the reference frame; the information indicating the reference relationship between frames may be placed in a slice header.
  • Step 302 Send the encoded bitstream.
  • the above-mentioned information indicating the inter-frame reference relationship includes the information of the inter-frame reference relationship of the current frame and the information of the inter-frame reference relationship of the previous N frames of the current frame.
  • the above-mentioned information of the inter-frame reference relationship of the current frame indicates that this frame refers to the target forward LTR frame with the closest time-domain distance to this frame, where "this frame" is the current frame, and the target forward LTR frame is a forward LTR frame for which the sending end device has received a confirmation message from the receiving end device. Specifically, the target forward LTR frame may be an encoded video frame that is marked as an LTR frame by the sending end device and for which the confirmation message sent by the receiving end device has been received, where the confirmation message corresponds to the target forward LTR frame;
  • the sending end device is the local end, for example, it can also be called the encoding end device, and the receiving end device is the opposite end or the remote end, for example, it can also be called the decoding end device;
  • it should be noted that, regarding the "target forward LTR frame with the closest time-domain distance to this frame" in the statement "the information of the inter-frame reference relationship of the current frame indicates that the current frame refers to the target forward long-term reference LTR frame with the closest time-domain distance to this frame", the "nearest target forward LTR frame" means, in one example, that the difference A between the POC of the current frame and the POC of the nearest target forward LTR frame is smaller than the difference B between the POC of the current frame and the POC of any other target forward LTR frame.
  • the above-mentioned information about the inter-frame reference relationship of the first N frames of the current frame indicates that each of the first N frames refers to the forward LTR frame with the closest time-domain distance to that frame, where "this frame" is each of the previous N frames, the forward LTR frame is an encoded video frame marked as an LTR frame by the sending end device, and the forward LTR frame is stored in the DPB. It should be noted that, regarding the "forward LTR frame with the closest time-domain distance to this frame", the "nearest forward LTR frame" means, in one example, that the difference C between the POC of this frame and the POC of the nearest forward LTR frame is smaller than the difference D between the POC of this frame and the POC of any other forward LTR frame.
  • all of the multiple frames between the current frame and the forward LTR frame with the closest time-domain distance may refer to the same LTR frame (for example, A), or only some of those frames may refer to the same LTR frame (for example, A).
  • the above-mentioned multiple video frames are encoded to obtain an encoded bitstream, and the above-mentioned bitstream includes at least information indicating a reference relationship between frames.
  • the information indicating the inter-frame reference relationship includes information about the inter-frame reference relationship of the first N frames of the current frame, and that information indicates that each of the first N frames of the current frame refers to the forward LTR frame with the closest time-domain distance to that frame. The foregoing forward LTR frame is an encoded video frame marked as an LTR frame by the sending end device; that is, in this embodiment, the sending end device does not need to wait for feedback from the receiving end device when marking LTR frames, so multiple LTR frames can be marked within one RTT, greatly shortening the reference distance between frames and improving the encoding quality of the video image.
  • the above-mentioned information indicating the inter-frame reference relationship further includes information about the inter-frame reference relationship of the last M frames of the current frame, and that information indicates that each of the last M frames refers to the frame preceding it, where N and M are positive integers. For example, the specific values of N and M may depend on the network.
  • the sending end device determines that the current frame refers to the target forward LTR frame with the closest time-domain distance to the current frame, and that each of the last M frames of the current frame refers to the frame preceding it. This adaptive selection of reference frames greatly alleviates video stutter and blurred images caused by packet loss and achieves a better balance between image quality and image fluency.
  • the above-mentioned information indicating the inter-frame reference relationship further includes information about the inter-frame reference relationship of the last M frames of the current frame, and that information indicates that each of the last M frames refers to the forward LTR frame with the closest time-domain distance to that frame, where N and M are positive integers. For example, the specific values of N and M may depend on the network.
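The reference relationships described above (each of the previous N frames → the nearest forward LTR frame; the current frame → the nearest target forward LTR frame; each of the last M frames, i.e. the frames after the current frame → the frame immediately preceding it) can be sketched with frame indices in POC order. The helper below is a hypothetical illustration, not the patent's data structures.

```python
# Illustrative sketch of the reference relationships described above, using
# frame indices in display (POC) order. Names and index bookkeeping are
# hypothetical, invented for this sketch.

def build_references(cur, N, M, forward_ltr, target_ltr):
    """Map each frame index to its reference frame index.
    forward_ltr: indices marked as LTR by the sender (no ACK required).
    target_ltr: indices of LTR frames confirmed (ACKed) by the receiver.
    """
    def nearest(ltrs, frame):
        cands = [i for i in ltrs if i < frame]
        return max(cands) if cands else None

    refs = {}
    for f in range(cur - N, cur):          # previous N frames
        refs[f] = nearest(forward_ltr, f)  # -> closest forward LTR frame
    refs[cur] = nearest(target_ltr, cur)   # current frame -> closest target LTR
    for f in range(cur + 1, cur + M + 1):  # last M frames (after the current)
        refs[f] = f - 1                    # -> immediately preceding frame
    return refs

refs = build_references(cur=10, N=3, M=2, forward_ltr={4, 6}, target_ltr={4})
```

Note how the current frame falls back to the older, confirmed LTR frame (index 4) while the preceding N frames use the newer, unconfirmed one (index 6): this is exactly the trade the application describes between error resilience and short reference distance.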
  • the sending end device may determine the value of N according to the coding quality of the first n frames of the current frame, where n ≤ N.
  • the sending end device can determine the value of N according to the coding quality of the first n frames of the current frame, the motion scene of the video image, and the network status information fed back by the receiving end device, where the network status information may include one or more of the network packet loss rate, the available network bandwidth, and the network round-trip time (RTT).
  • the sending end device may determine the value of N according to the result of comparing the coding quality of the first n frames of the current frame with a coding quality threshold. Specifically, after encoding each frame, the sending end device can output the peak signal-to-noise ratio (Peak Signal to Noise Ratio; hereinafter referred to as PSNR) that represents the encoding quality of that frame.
  • if the sending end device finds that the PSNR of each of the first n frames of the current frame is smaller than the PSNR of its previous frame, that is, the PSNR of the first n frames shows a downward trend, and the PSNR of the frame immediately before the current frame is less than the coding quality threshold (i.e., the PSNR threshold), then the sending end device determines that the current frame needs to refer to the target forward LTR frame closest to the current frame in the time domain, and that each of the last M frames of the current frame needs to refer to the frame preceding it. At this time, the number of frames between the current frame and the forward LTR frame closest to the current frame in the time domain is the value of N.
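The PSNR-based trigger above can be sketched as follows. The strict-decrease test and the shape of the function are illustrative assumptions; the patent only describes a downward PSNR trend over the first n frames plus a threshold comparison on the most recent frame.

```python
# Hedged sketch of the PSNR-based decision described above: if the PSNR of
# the last n encoded frames is strictly decreasing and the most recent PSNR
# falls below the coding quality threshold, the encoder switches the current
# frame's reference to the nearest confirmed (target forward) LTR frame.

def should_reference_target_ltr(psnr_history, n, psnr_threshold):
    """psnr_history: PSNR values of already-encoded frames, oldest first."""
    if len(psnr_history) < n + 1:
        return False  # not enough history to observe an n-frame trend
    recent = psnr_history[-(n + 1):]
    decreasing = all(recent[i + 1] < recent[i] for i in range(n))
    return decreasing and recent[-1] < psnr_threshold
```

For example, a history of `[40, 38, 36, 33]` dB with `n=3` and a 35 dB threshold would trigger the switch, while a history with any uptick would not.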
  • the sending end device may determine the value of M according to the number of video frames included in the unit time. In a specific implementation, the sending end device may determine the value of M according to the number of video frames included in the unit time and the motion scene of the above-mentioned video image.
  • the above unit time may be set according to system performance and/or implementation requirements during specific implementation. For example, the above unit time may be 1 second.
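As a loudly hypothetical illustration of choosing M from the frame rate and the motion scene: the patent states only that M depends on the number of frames per unit time and the motion scene, not any formula, so the heuristic below is invented for this sketch.

```python
# Purely illustrative heuristic for choosing M (the number of frames after
# the current frame that reference frame-by-frame). The divisor values are
# assumptions, not disclosed in the patent.

def choose_m(frames_per_second, high_motion):
    # More frames per unit time -> a longer frame-by-frame chain can fit in
    # the same wall-clock span; high motion shortens the chain, since each
    # frame then differs more from its predecessor.
    base = max(1, frames_per_second // 10)
    return max(1, base // 2) if high_motion else base
```

At 30 fps this yields M = 3 for a static scene and M = 1 for a high-motion one.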
  • the marking interval D of the LTR frame has a functional relationship with N and M.
  • the marking interval of the aforementioned LTR frame refers to the number of frames between marked LTR frames, that is, how many frames elapse after the previous LTR frame is marked before the next LTR frame is marked. For example, if the marking interval of the LTR frame is 4, then after the current frame is marked as an LTR frame, there is an interval of 4 frames, and the fifth frame after the current frame is marked as an LTR frame.
  • the information of the inter-frame reference relationship of the L frames indicates that the first frame in each (Mn+1) frames refers to the target forward LTR frame with the closest time-domain distance to that first frame, and each frame after the first frame in the (Mn+1) frames refers to the frame preceding it, where L is a positive integer and n is a positive integer greater than or equal to 1.
  • the values of M1, M2, ..., Mn may be the same or different, and the specific values may be determined according to actual application scenarios.
  • the sending end device can determine that the first frame in the (Mn+1) frames refers to the target forward LTR frame with the closest time-domain distance to that first frame, and that each frame after the first frame in the (Mn+1) frames refers to the frame preceding it. This shortens the reference distance between frames, improves coding quality under poor network conditions, and enables adaptive selection of reference relationships, such as a flexible combination of full reference relationships and frame-by-frame reference relationships. To a certain extent this avoids reference frames with a long time-domain distance from the current frame, greatly alleviates video stutter and blurred images caused by packet loss, and achieves a good balance between image quality and image smoothness.
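The grouped pattern above can be sketched by assuming the L frames are partitioned into consecutive groups of (M1+1), (M2+1), ..., (Mn+1) frames: the first frame of each group references the nearest target forward LTR frame, and every other frame references its immediate predecessor. The partitioning assumption, names, and indices are illustrative, not taken from the patent.

```python
# Sketch of the grouped reference pattern described above. `start` is the
# index of the first of the L frames; `groups` holds [M1, M2, ..., Mn];
# `target_ltr_idx` is the nearest confirmed target forward LTR frame.

def group_references(start, groups, target_ltr_idx):
    refs = {}
    idx = start
    for m in groups:
        refs[idx] = target_ltr_idx       # first frame of the (Mi+1) group
        for f in range(idx + 1, idx + m + 1):
            refs[f] = f - 1              # frame-by-frame within the group
        idx += m + 1
    return refs

# Two groups of sizes (2+1) and (3+1), i.e. L = 7 frames starting at index 11,
# all anchored back to a target forward LTR frame at index 10.
refs = group_references(start=11, groups=[2, 3], target_ltr_idx=10)
```

Each group restart caps how far an error can propagate (at most Mi frames) while the frame-by-frame chains inside a group keep the reference distance at 1.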
  • the marking interval D of the LTR frame has a functional relationship with N and L.
  • FIGS. 4(a) to 4(c) are schematic diagrams of an embodiment of the reference relationship between video frames in the video image transmission method of this application.
  • the current frame refers to the target forward LTR frame with the closest time-domain distance to this frame, and each of the first 4 frames of the current frame refers to the forward LTR frame with the closest time-domain distance to that frame ("the current frame" in that sentence means each of the previous 4 frames); here, the forward LTR frame with the closest time-domain distance to each of the first 4 frames of the current frame happens to be the target forward LTR frame.
  • the forward LTR frame that each of the first 4 frames of the current frame refers to may not be the target forward LTR frame. The forward LTR frame is an encoded video frame marked as an LTR frame by the sending end device; it differs from the target forward LTR frame in that the forward LTR frame is marked by the sending end device according to the marking interval of the LTR frame, and the sending end device has not yet received a confirmation message sent by the receiving end device for that forward LTR frame.
  • each of the last three frames of the current frame refers to the frame preceding it.
  • each of the first 4 frames of the current frame refers to the forward LTR frame with the closest time-domain distance to that frame; "the current frame" in this sentence means each of the previous 4 frames of the current frame.
  • the forward LTR frame with the closest time domain to each of the previous 4 frames is not the target forward LTR frame;
  • each of the last three frames of the current frame refers to the frame preceding it; "the current frame" in this sentence means each of the last three frames.
  • if the sending end device marks the above current frame as an LTR frame, then each of the last M frames of the current frame refers to the forward LTR frame with the closest time-domain distance to that frame; the forward LTR frame here is the current frame in Figure 4(c), and "the current frame" in that sentence means each of the last M frames.
  • the sending end device can determine the marking interval D of the LTR frame according to the network status information fed back by the receiving end device, where the network status information may include one or more of the network packet loss rate, the available network bandwidth, and the network round-trip time (RTT).
  • Figure 5 is a schematic diagram of an embodiment of determining the mark interval of the LTR frame in the video image transmission method of this application.
  • the sending end device can determine the network characteristics according to the network packet loss information and the network RTT, and then take the network characteristics, the packet-loss-resistance algorithm, the confirmed LTR frames fed back by the receiving end, the LTR loss rate (an increase in the reference distance causes a loss of encoded picture quality at the same bit rate), the motion scene of the video image (that is, the picture motion status in Figure 5), the code table, the target stutter duration, the humanly perceivable stutter duration, and the number of LTR frames that can be buffered in the DPB as judgment inputs to obtain the marking interval of the LTR frame.
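A possible shape of such a decision function, shown only to illustrate the qualitative behaviour (a worse network leads to denser LTR marking, bounded by the DPB's LTR capacity). The thresholds and the formula are invented for this sketch and are not disclosed in the patent, which lists many more inputs than are modeled here.

```python
import math

# Loudly hypothetical sketch of deriving the LTR marking interval D from a
# subset of the inputs listed above: packet loss rate, RTT, frame rate, and
# the number of LTR slots available in the DPB.

def marking_interval(loss_rate, rtt_ms, fps, dpb_ltr_slots):
    rtt_frames = max(1, round(rtt_ms / 1000 * fps))  # frames per RTT
    if loss_rate > 0.2:        # heavy loss: mark LTR frames very densely
        d = max(1, rtt_frames // 4)
    elif loss_rate > 0.05:     # moderate loss
        d = max(1, rtt_frames // 2)
    else:                      # good network: a longer interval is acceptable
        d = rtt_frames
    # keep the number of LTR frames marked per RTT within the DPB budget
    return max(d, math.ceil(rtt_frames / dpb_ltr_slots))
```

With a 200 ms RTT at 30 fps (6 frames per RTT) and 4 DPB LTR slots, this yields D = 2 under heavy loss and D = 6 on a clean network: several LTR frames per RTT when resilience matters, fewer when it does not.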
  • The marking interval D of the aforementioned LTR frame is used by the sending end device to mark LTR frames.
  • The sending end device marks LTR frames according to the LTR marking interval, so multiple LTR frames can be marked within one RTT. In this application, the LTR marking interval is not fixed but changes dynamically; successive intervals may be the same or different, as determined by the actual application scenario. This greatly shortens the reference distance between frames and improves the coding quality of the video image.
  • In this way, the sender device can dynamically determine the LTR marking interval based on network conditions and other information, respond in time to impaired-network scenarios on the live network such as sudden packet loss, heavy packet loss, and congestion, and balance fluency and clarity to achieve the best video call experience.
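  • The interval decision above can be sketched as a toy function. This is an illustrative heuristic only; the actual decision in Figure 5 weighs many more inputs (resilience algorithm, scene motion, freeze targets, DPB capacity), and the specific thresholds and formula here are assumptions, not from the patent:

```python
def choose_ltr_marking_interval(loss_rate_pct, rtt_ms, fps):
    # Frames transmitted during one round-trip time.
    frames_per_rtt = max(1, round(rtt_ms / 1000.0 * fps))
    # Keep the interval well below frames_per_rtt so that several
    # LTR frames can be marked within a single RTT.
    interval = max(1, frames_per_rtt // 2)
    if loss_rate_pct > 10:
        # Under heavy loss, mark LTR frames more often to shorten
        # reference distances (heuristic threshold, not from the patent).
        interval = max(1, interval // 2)
    return interval
```

Because the network feedback changes per report, successive calls can yield different intervals, matching the dynamically changing marking interval described above.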
  • FIG. 6 is a flowchart of another embodiment of a video image transmission method according to this application.
  • the above-mentioned video image includes multiple video frames.
  • the above-mentioned video image transmission method may include:
  • Step 601 Determine whether the current frame is marked as an LTR frame.
  • If the current frame is not marked as an LTR frame, step 602 is executed; if the current frame is marked as an LTR frame, step 603 is executed.
  • In one implementation, judging whether the current frame is marked as an LTR frame may be: judging according to the marking interval of the LTR frame.
  • In another implementation, judging whether the current frame is marked as an LTR frame may be: obtaining the number of interval frames between the current frame and the forward LTR frame closest to the current frame in the temporal domain; if the number of interval frames is equal to the marking interval of the LTR frame, the current frame is marked as an LTR frame; if the number of interval frames is not equal to the marking interval of the LTR frame, the current frame is not marked as an LTR frame.
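  • A minimal sketch of this check, assuming "the number of interval frames" counts the frames strictly between the two frames (consistent with the marking-interval example later in this document); the function name is hypothetical:

```python
def should_mark_ltr(current_poc, nearest_forward_ltr_poc, marking_interval):
    # Frames strictly between the current frame and its nearest forward LTR frame.
    interval_frames = current_poc - nearest_forward_ltr_poc - 1
    return interval_frames == marking_interval
```

With a marking interval of 2, a frame at POC 4 whose nearest forward LTR frame is at POC 1 is marked, since frames 2 and 3 lie between them.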
  • the sending end device can determine the marking interval of the LTR frame according to the network status information fed back by the receiving end device.
  • The aforementioned network status information can include one or more of the network packet loss rate, the available network bandwidth, and the network round-trip time (RTT).
  • As shown in Figure 5, the sender device can determine the network characteristics according to the network packet loss information and the network RTT, and then use one or more of the following as decision inputs to obtain the marking interval of the LTR frame: the network characteristics, the packet-loss-resilience algorithm, the LTR frames confirmed by receiver feedback, the LTR loss rate (an increase in the reference distance causes a loss of encoded picture quality at the same bit rate), the motion scene of the video image (i.e., the picture motion status in Figure 5), the code table, the target freeze count, the freeze duration subjectively perceivable by a human, and the number of LTR frames that can be buffered in the DPB. One or a combination of the following can also be obtained: whether to refer to the forward LTR frame, the redundancy strategy, and the resolution/bit rate/frame rate.
  • Marking LTR frames according to the LTR marking interval makes it possible to mark multiple LTR frames within one RTT, thereby greatly shortening the reference distance between frames and improving the coding quality of the video image.
  • Step 602: Encode the unmarked current frame, where the encoding process includes at least encoding information representing the inter-frame reference relationship of the current frame into the code stream. The inter-frame reference relationship of the current frame indicates that the current frame refers to the forward LTR frame with the closest temporal distance to the current frame, where the forward LTR frame is an encoded video frame marked as an LTR frame by the sending end device.
  • It should be noted that the forward LTR frame with the closest temporal distance to the current frame means that the difference between the POC of the current frame and the POC of that forward LTR frame is smaller than the difference between the POC of the current frame and the POCs of the other forward LTR frames.
  • step 604 is executed.
  • Step 603: Encode the marked current frame, where the encoding process includes at least encoding information representing the inter-frame reference relationship of the current frame into the code stream. The inter-frame reference relationship of the current frame indicates that the current frame refers to the target forward LTR frame with the closest temporal distance to the current frame.
  • The target forward LTR frame is an encoded video frame that the sending end device marked as an LTR frame and for which the sending end device has received a confirmation message sent by the receiving end device, the confirmation message corresponding to the target forward LTR frame.
  • step 604 is executed.
  • The sending end device is the local end and may, for example, also be called the encoding end device; the receiving end device is the opposite end or remote end and may, for example, also be called the decoding end device.
  • The target forward LTR frame with the closest temporal distance to the current frame means that the difference between the POC of the current frame and the POC of that target forward LTR frame is smaller than the difference between the POC of the current frame and the POCs of the other target forward LTR frames.
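  • The POC-based selection described above can be sketched as follows (a hypothetical helper; `acked_ltr_pocs` is assumed to hold the POCs of marked LTR frames for which a confirmation message has arrived):

```python
def nearest_target_forward_ltr(current_poc, acked_ltr_pocs):
    # Restrict to forward (already encoded) target LTR frames.
    forward = [p for p in acked_ltr_pocs if p < current_poc]
    # Smallest POC difference = closest temporal distance.
    return min(forward, key=lambda p: current_poc - p)
```

For example, a frame at POC 16 whose confirmed LTR frames are at POCs 1 and 8 selects the frame at POC 8 as its target forward LTR reference.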
  • Step 604 Send the above code stream.
  • In this embodiment, the forward LTR frame is an encoded video frame marked as an LTR frame by the sending end device; that is, the sending end device does not need to wait for feedback from the receiving end device when marking an LTR frame. It is therefore possible to mark multiple LTR frames within one RTT, which greatly shortens the reference distance between frames and improves the coding quality of the video image.
  • Figures 7(a) to 7(b) are schematic diagrams of another embodiment of the reference relationship between video frames in the video image transmission method of this application.
  • As shown in Figure 7(a), the encoding end marks the first encoded I frame (i.e., the first frame in Figure 7) as an LTR frame, then performs packetization and redundancy processing on the encoded I frame and sends it to the decoding end through the network. At the same time, the I frame, as a key frame, is given asymmetric redundancy protection different from that of ordinary frames, to ensure that the decoding end receives such key frames in a timely manner.
  • The decoding end receives the I frame and, after confirming that the I frame can be decoded normally, feeds back a confirmation message to the encoding end in time. If the sending end device does not receive the confirmation message fed back by the receiving end device within a predetermined period of time, the sending end device re-encodes the I frame, to guard against abnormal transmission at the initial stage.
  • the sending end device refers to the first frame when encoding the second and third frames.
  • The sending end device receives the network status information fed back by the receiving end device; as described above, the sending end device can determine the marking interval of the LTR frame according to this network status information. At this time, the marking interval of the LTR frame determined by the sending end device is 2.
  • When encoding the 4th frame, the sending end device finds that the number of frames between the 4th frame and the 1st frame is 2, which equals the marking interval of the LTR frame, so the sending end device marks the 4th frame as an LTR frame.
  • At this time, the sending end device has received the confirmation message sent by the receiving end device for the 1st frame; that is, the 1st frame can be decoded normally by the decoding end and is a target forward LTR frame. The 1st frame is the target forward LTR frame with the closest temporal distance to the 4th frame, so the sending end device refers to the 1st frame when encoding the 4th frame.
  • When the sending end device encodes the 5th frame, it can again determine the marking interval of the LTR frame according to the network status information fed back by the receiving end device; at this time, the marking interval determined by the sending end device is 3. Since the 4th frame is the forward LTR frame of the 5th frame and the number of frames between the 5th frame and the 4th frame is 1, the 5th frame is not marked as an LTR frame.
  • The 5th frame is therefore encoded with reference to the forward LTR frame with the closest temporal distance (i.e., the 4th frame).
  • the encoding process of the subsequent frame is similar to the foregoing encoding process, and will not be repeated here.
  • When encoding the 16th frame, the sending end device can also determine the marking interval of the LTR frame according to the network status information fed back by the receiving end device; at this time, the marking interval determined by the sending end device is 2. Because the 13th frame is the forward LTR frame of the 16th frame and the number of frames between the 16th frame and the 13th frame is 2, the 16th frame is marked as an LTR frame. However, at this time the sending end device has not received the receiving end device's confirmation message for the 13th frame, so the target forward LTR frame closest to the 16th frame in the temporal domain is the 8th frame, and the sending end device encodes the 16th frame with reference to the 8th frame.
  • With the reference relationship described above, even if the data of the 5th, 6th, 12th, 13th, 14th, and 18th frames are lost and those video frames are incomplete, this does not affect the normal decoding of the other, completely received video frames.
  • An exception is the 15th frame: it is encoded with reference to the 13th frame, and the 13th frame is incomplete, so the 15th frame cannot be decoded.
  • The inter-frame reference relationship of the current frame indicates that the current frame refers to the forward LTR frame with the closest temporal distance to the current frame, where the forward LTR frame is an encoded video frame marked as an LTR frame by the sending end device, the current frame is not marked as an LTR frame, and the coding quality of the current frame is greater than or equal to the coding quality threshold; or,
  • the inter-frame reference relationship of the current frame indicates that the current frame refers to the forward LTR frame with the closest temporal distance to the current frame, where the forward LTR frame is an encoded video frame marked as an LTR frame by the sending end device, the current frame is not marked as an LTR frame, and the coding quality of the current frame is less than the coding quality threshold.
  • That is, when encoding the current frame, the sending end device refers to the forward LTR frame with the closest temporal distance to the current frame. After encoding the current frame, the sending end device obtains the coding quality of the current frame and compares it with the coding quality threshold. If the coding quality of the current frame is less than the coding quality threshold, then when encoding the next frame after the current frame, the sending end device refers to the target forward LTR frame with the closest temporal distance to that next frame, so as to improve the coding quality of the frames after the current frame.
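  • A sketch of this fallback rule (function and parameter names are hypothetical; PSNR is used as the coding-quality measure, as this document does elsewhere):

```python
def next_frame_reference(current_psnr, psnr_threshold,
                         nearest_forward_ltr_poc, nearest_target_ltr_poc):
    if current_psnr < psnr_threshold:
        # Quality dropped: anchor the next frame on a confirmed
        # (target) forward LTR frame instead.
        return nearest_target_ltr_poc
    # Quality acceptable: keep referring to the closest forward LTR frame.
    return nearest_forward_ltr_poc
```

The fallback trades a longer reference distance (and thus some compression efficiency) for a reference the receiver is known to hold intact.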
  • The sending end device may also encode the M+1 frames following the current frame, where the encoding process includes encoding information representing the inter-frame reference relationship of those M+1 frames into the code stream.
  • The inter-frame reference relationship of the M+1 frames means that the first of the M+1 frames refers to the target forward LTR frame with the closest temporal distance to that first frame, and that every frame after the first one refers to the frame immediately preceding it, where M is a positive integer; here, the current frame is not marked as an LTR frame and the coding quality of the current frame is less than the coding quality threshold.
  • The sending end device can also encode the next frame after the current frame, where the encoding process includes encoding information indicating the inter-frame reference relationship of that next frame into the code stream.
  • The inter-frame reference relationship of the next frame means that the next frame refers to the target forward LTR frame with the closest temporal distance to it, where the current frame is not marked as an LTR frame and the coding quality of the current frame is less than the coding quality threshold.
  • That is, when the sending end device encodes the current frame, if the current frame is not marked as an LTR frame, the sending end device refers to the forward LTR frame with the closest temporal distance to the current frame. After encoding the current frame, if the sending end device finds that the coding quality of the current frame is less than the coding quality threshold (that is, the PSNR of the current frame is less than the PSNR threshold), then when encoding the next frame after the current frame, the sending end device refers to the target forward LTR frame with the closest temporal distance to that next frame. As shown in Figure 7(b), the next frame after the current frame is the first of the M+1 frames following the current frame, and when the sending end device encodes each frame after that first frame, it refers to the frame immediately preceding it.
  • In addition, the next frame after the current frame can be regarded as a virtual LTR frame: the virtual LTR frame is encoded with the target forward LTR frame as its reference frame, but the virtual LTR frame is not buffered in the DPB, and the subsequent frames of the virtual LTR frame are coded using the virtual LTR frame as a short-term reference.
  • The sending end device determines that the first of the M+1 frames following the current frame refers to the target forward LTR frame with the closest temporal distance to that first frame, and that each frame after the first one refers to the frame immediately preceding it, as shown in Figure 7(b), where M is a positive integer.
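  • The resulting reference structure for the M+1 frames can be sketched as a POC-to-reference-POC map (a hypothetical helper, not part of the patent):

```python
def references_for_m_plus_1_frames(first_poc, target_ltr_poc, m):
    # The first of the M+1 frames refers to the closest target forward LTR frame.
    refs = {first_poc: target_ltr_poc}
    # Each later frame refers to the frame immediately preceding it.
    for poc in range(first_poc + 1, first_poc + m + 1):
        refs[poc] = poc - 1
    return refs
```

For instance, with the virtual LTR frame at POC 10, a target forward LTR frame at POC 8, and M = 2, the map is {10: 8, 11: 10, 12: 11}.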
  • the sending end device determines the value of M according to the number of video frames included in the unit time.
  • the sending end device may determine the value of M according to the number of video frames included in the unit time and the motion scene of the above-mentioned video image.
  • the above unit time may be set according to system performance and/or implementation requirements during specific implementation. For example, the above unit time may be 1 second.
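  • The patent gives no formula for M; the sketch below is purely an illustrative heuristic under the stated inputs (the number of video frames per unit time, i.e., the frame rate, plus the motion scene):

```python
def choose_m(frames_per_second, high_motion=False):
    # Shorter frame-by-frame chains for high-motion scenes, since each
    # frame then differs more from its predecessor (assumed heuristic;
    # the divisors are illustrative, not from the patent).
    fraction = 8 if high_motion else 4
    return max(1, frames_per_second // fraction)
```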
  • FIG. 8 is a flowchart of an embodiment of a video call method according to this application.
  • the video call method provided in this embodiment can be applied to an electronic device having a display screen and an image collector.
  • The above-mentioned display screen may include the display screen of a vehicle-mounted computer (Mobile Data Center); the above-mentioned image collector may be a camera, a vehicle-mounted sensor, or the like; and the above-mentioned electronic device may be, for example, a mobile terminal (mobile phone), a smart screen, a drone, an Intelligent Connected Vehicle (hereinafter referred to as ICV), a smart/intelligent car, or a vehicle-mounted device.
  • the above video call method may include:
  • Step 801 In response to the first operation of the first user requesting a video call with the second user, establish a video call connection between the first user and the second user, where the video call connection refers to an electronic device used by the first user Video call connection with the electronic device used by the second user.
  • Figures 9(a) to 9(c) are schematic diagrams of requesting a video call in the video call method of this application. As shown in Figure 9(a), the first user can click the call icon 9a displayed on the electronic device used by the first user to enter the interface shown in Figure 9(b), then click the second user's identifier in the interface shown in Figure 9(b) to enter the interface shown in Figure 9(c), and then, in the interface shown in Figure 9(c), click the video call icon 9b under "Unblocked Call" to complete the first operation of requesting a video call with the second user.
  • the electronic device used by the first user establishes a video call connection between the first user and the second user in response to the first operation of the first user requesting a video call with the second user.
  • the electronic device used by the first user displays the interface shown in FIG. 9(d), and after the video call connection is established, the electronic device used by the first user displays the interface shown in FIG. 9(e).
  • Fig. 9(d) is the interface of establishing a video call connection stage in the video call method of this application.
  • Step 802 Collect a video image including the environment of the first user through an image collector, where the above-mentioned video image includes a plurality of video frames.
  • The environment here may be the internal environment and/or the external environment where the first user is located, such as the in-vehicle environment and/or the surrounding environment perceived while intelligently detecting obstacles during driving.
  • the aforementioned image collector may be a camera or a vehicle-mounted sensor in an electronic device used by the first user.
  • Step 803 Encode the above-mentioned multiple video frames to obtain an encoded bitstream, where the above-mentioned bitstream at least includes information indicating a reference relationship between frames.
  • Step 804 Send the above-mentioned encoded code stream.
  • the above code stream may be sent to the electronic device used by the second user.
  • In this embodiment, the information indicating the inter-frame reference relationship includes the information of the inter-frame reference relationship of the current frame and the information of the inter-frame reference relationship of the N frames preceding the current frame.
  • The information of the inter-frame reference relationship of the current frame indicates that the current frame refers to the target forward long-term reference (LTR) frame with the closest temporal distance to the current frame, where the target forward LTR frame is an encoded video frame that the sending end device marked as an LTR frame and for which a confirmation message sent by the receiving end device has been received; the confirmation message corresponds to the target forward LTR frame.
  • Here the sending end device is the electronic device used by the first user, and the receiving end device is the electronic device used by the second user.
  • The information of the inter-frame reference relationship of the N frames preceding the current frame indicates that each of those N frames refers to the forward LTR frame with the closest temporal distance to that frame, where the forward LTR frame is a frame marked as an LTR frame by the sending end device, and "the current frame" here means each of those N preceding frames.
  • In other words, all of the frames between the current frame and its temporally closest forward LTR frame may refer to the same LTR frame (for example, A), or only some of those frames may refer to the same LTR frame (for example, A).
  • In this embodiment, the image collector collects a video image including the environment of the first user, and the multiple video frames included in the video image are then encoded to obtain an encoded bitstream, the bitstream at least including information indicating the inter-frame reference relationship.
  • The information indicating the inter-frame reference relationship includes information about the inter-frame reference relationship of the N frames preceding the current frame, which indicates that each of those N frames refers to the forward LTR frame with the closest temporal distance to that frame, the forward LTR frame being an encoded video frame marked as an LTR frame by the sending end device. That is, in this embodiment the sending end device does not need to wait for feedback from the receiving end device when marking LTR frames, so multiple LTR frames can be marked within one RTT, which greatly shortens the reference distance between frames and improves the coding quality of the video image.
  • In a possible implementation, the information indicating the inter-frame reference relationship also includes the information of the inter-frame reference relationship of the M frames following the current frame, which indicates that each of those M frames refers to the frame immediately preceding it, where N and M are positive integers; for example, the specific values of N and M may depend on the network.
  • The sending end device determines that the current frame refers to the target forward LTR frame with the closest temporal distance to the current frame, and that each of the M frames following the current frame refers to its immediately preceding frame as the reference frame. This greatly alleviates the video freezes and blurred image quality caused by packet loss, and achieves a better balance between image quality and image fluency.
  • In another possible implementation, the information indicating the inter-frame reference relationship further includes the information of the inter-frame reference relationship of the M frames following the current frame, which indicates that each of those M frames refers to the forward LTR frame with the closest temporal distance to that frame, where N and M are positive integers; the specific values of N and M may depend on the network.
  • In a specific implementation, the sending end device determines the value of N according to the coding quality of the n frames preceding the current frame, where n < N.
  • Specifically, the sender device can determine the value of N according to the coding quality of the n frames preceding the current frame, the motion scene of the video image, and the network status information fed back by the receiver device, where the network status information can include one or more of the network packet loss rate, the available network bandwidth, and the network round-trip time (RTT).
  • In a specific implementation, the sending end device determines the value of N according to a comparison between the coding quality of the n frames preceding the current frame and the coding quality threshold. Specifically, after each frame is encoded, the sending end device can output the PSNR representing the coding quality of that frame.
  • When the sending end device finds that the PSNR of each of the n frames preceding the current frame is smaller than the PSNR of its preceding frame (that is, the PSNR of those n frames is in a downward trend) and that the PSNR of the frame immediately preceding the current frame is less than the coding quality threshold (i.e., the PSNR threshold), the sending end device determines that the current frame needs to refer to the target forward LTR frame with the closest temporal distance to the current frame, and that each of the M frames following the current frame needs to refer to its immediately preceding frame.
  • At this time, the number of frames between the current frame and the forward LTR frame closest to the current frame in the temporal domain is the value of N.
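  • The downward-trend check described above can be sketched as follows (a hypothetical helper; `psnr_history` is assumed to hold the PSNR of the n frames preceding the current frame, oldest first):

```python
def needs_target_ltr_reference(psnr_history, psnr_threshold):
    # PSNR strictly decreasing over the preceding n frames...
    downward = all(b < a for a, b in zip(psnr_history, psnr_history[1:]))
    # ...and the frame immediately before the current frame below threshold.
    return downward and psnr_history[-1] < psnr_threshold
```

When the check fires, the current frame switches to the closest target forward LTR frame and the following M frames use frame-by-frame references, as described above.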
  • the sending end device determines the value of M according to the number of video frames included in the unit time.
  • the sending end device may determine the value of M according to the number of video frames included in the unit time and the motion scene of the above-mentioned video image.
  • the above unit time can be set according to system performance and/or implementation requirements during specific implementation. For example, the above unit time can be 1 second.
  • the mark interval D of the LTR frame has a functional relationship with N and M.
  • The marking interval of the aforementioned LTR frame refers to the number of interval frames for marking LTR frames, that is, how many frames must elapse after the previous LTR frame is marked before the next LTR frame is marked. For example, if the marking interval of the LTR frame is 4, then after the current frame is marked as an LTR frame there must be an interval of 4 frames, and the 5th frame after the current frame is marked as the next LTR frame.
  • In this embodiment, the L frames come after the M frames in the temporal domain. The information of the inter-frame reference relationship of the L frames indicates that the first frame in each group of (Mn+1) frames refers to the target forward LTR frame closest to that first frame in the temporal domain, and that each frame after the first one in the (Mn+1) frames refers to the frame immediately preceding it, where L is a positive integer and n is a positive integer greater than or equal to 1.
  • the values of M1, M2, ..., Mn may be the same or different, and the specific values may be determined according to actual application scenarios.
  • That is, for the L frames, the sender device can determine that the first frame in each group of (Mn+1) frames refers to the target forward LTR frame closest to that first frame in the temporal domain, and that each frame after the first one refers to the frame immediately preceding it. This shortens the reference distance between frames, improves the coding quality in impaired-network environments, and realizes adaptive selection of reference relationships, such as a flexible combination of full reference relationships and frame-by-frame reference relationships. To a certain extent this avoids reference frames with a long temporal distance from the current frame, greatly alleviates the video freezes and image blur caused by packet loss, and achieves a good balance between image quality and image smoothness.
  • the mark interval D of the LTR frame has a functional relationship with N and L.
  • The current frame refers to the target forward LTR frame with the closest temporal distance to it. Each of the 4 frames preceding the current frame refers to the forward LTR frame with the closest temporal distance to that frame; "the current frame" in this sentence means each of those 4 preceding frames. Here the forward LTR frame closest to each of the 4 preceding frames happens to be the target forward LTR frame, but the forward LTR frame that each of the 4 preceding frames refers to may also not be the target forward LTR frame.
  • The forward LTR frame is an encoded video frame marked as an LTR frame by the sending end device. It differs from the target forward LTR frame in that the forward LTR frame is marked by the sending end device according to the marking interval of the LTR frame, and the sending end device has not received a confirmation message sent by the receiving end device for this forward LTR frame.
  • Each of the 3 frames following the current frame refers to the frame immediately preceding it.
  • Each of the 4 frames preceding the current frame refers to the forward LTR frame with the closest temporal distance to that frame; "the current frame" in this sentence means each of those 4 preceding frames. Here, the forward LTR frame closest in temporal distance to each of the 4 preceding frames is not the target forward LTR frame.
  • Each of the 3 frames following the current frame refers to the frame immediately preceding it; "the current frame" in this sentence means each of those 3 following frames.
  • If the sending end device marks the current frame as an LTR frame, then each of the M frames following the current frame refers to the forward LTR frame with the closest temporal distance to that frame; the forward LTR frame here is the current frame in Figure 4(c), and "the current frame" in this sentence means each of those M following frames.
  • the sender device can determine the mark interval D of the LTR frame according to the network status information fed back by the receiver device.
  • The network status information can include one or more of the network packet loss rate, the available network bandwidth, and the network round-trip time (RTT).
  • As shown in Figure 5, the sender device can determine the network characteristics according to the network packet loss information and the network RTT, and then use one or more of the following as decision inputs to obtain the marking interval of the LTR frame: the network characteristics, the packet-loss-resilience algorithm, the LTR frames confirmed by receiver feedback, the LTR loss rate (an increase in the reference distance causes a loss of encoded picture quality at the same bit rate), the motion scene of the video image (i.e., the picture motion status in Figure 5), the code table, the target freeze count, the freeze duration subjectively perceivable by a human, and the number of LTR frames that can be buffered in the DPB. One or a combination of the following can also be obtained: whether to refer to the forward LTR frame, the redundancy strategy, and the resolution/bit rate/frame rate.
  • The marking interval D of the aforementioned LTR frame is used by the sending end device to mark LTR frames.
  • The sending end device marks LTR frames according to the LTR marking interval, so multiple LTR frames can be marked within one RTT, which greatly shortens the reference distance between frames and improves the coding quality of the video image.
  • In this way, the sender device can dynamically determine the LTR marking interval based on network conditions and other information, respond in time to impaired-network scenarios on the live network such as sudden packet loss, heavy packet loss, and congestion, and balance fluency and clarity to achieve the best video call experience.
  • Figure 9(e) is the interface after the video call connection is established in the video call method of this application. As shown in Figure 9(e), the small window shown at 9c displays a video image including the environment of the first user, and the large window shown at 9d displays a video image including the environment of the second user.
  • The video image displayed in the large window at 9d is obtained after the electronic device used by the first user decodes the code stream sent by the electronic device used by the second user, that code stream being obtained after the electronic device used by the second user encodes the video image including the environment of the second user.
  • the method shown in FIG. 8 of the present application can be applied to various real-time audio and video interaction scenarios such as video calls or video conferences.
  • Figures 10(a) to 10(b) are schematic diagrams of an embodiment of an application scenario of the video call method of this application.
  • Figures 10(a) to 10(b) show a video call between two users.
  • Image collector: obtains real-time YUV data from the camera.
  • Video pre-processor: converts the YUV data obtained from the camera into the format and resolution required by the encoder; on mobile devices it also completes the horizontal/vertical screen rotation processing of the image.
  • Network analysis and processing system: controls the resolution, frame rate, redundancy rate, data frame reference relationship, and other information based on feedback information.
  • For the specific analysis method, refer to the relevant description of FIG. 5, which will not be repeated here.
  • Video encoder: completes the encoding process according to the reference frame determined by the network analysis and processing system, and implements the LTR marking and buffering in the DPB.
  • Network transmitter: completes the network sending and receiving process for the video stream and the control information stream.
  • Video frame processing module: completes data frame framing, redundant data recovery, and data frame integrity checking.
  • Video decoder: decodes the data frames assembled by the preceding modules according to the reference relationship.
  • Video display: submits the decoded data frame to the display module to complete data frame rendering and display.
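The sender-side module chain above (collector, pre-processor, network analysis system, encoder, transmitter) can be sketched as a simple pipeline of callables. All function names and signatures below are assumptions for illustration; `decide_refs` stands in for the network analysis and processing system.

```python
def run_sender(yuv_frames, preprocess, decide_refs, encode, transmit):
    """Sender chain: camera YUV -> pre-processor -> encoder -> network.

    `decide_refs` models the network analysis and processing system:
    for each frame index it returns (reference_index, mark_as_ltr).
    Returns the number of packets handed to the network transmitter.
    """
    sent = 0
    for idx, yuv in enumerate(yuv_frames):
        frame = preprocess(yuv)                      # format/resolution/rotation
        ref, mark_ltr = decide_refs(idx)             # reference + LTR decision
        packet = encode(idx, frame, ref, mark_ltr)   # LTR mark cached in DPB
        transmit(packet)
        sent += 1
    return sent
```

With stub callables, the pipeline can be exercised end to end: a `preprocess` that passes data through, a `decide_refs` that references the previous frame and marks only frame 0 as LTR, an `encode` that records its inputs, and a `transmit` that appends to a list.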
  • FIGS. 11(a) to 11(b) are schematic diagrams of another embodiment of the application scenario of the video call method of this application.
  • Figures 11(a) to 11(b) show the scene of multiple users in a video conference.
  • the function of each module in Figure 11(b) is the same as that of the corresponding module in Figure 10(b), and will not be repeated here.
  • FIG. 12 is a flowchart of an embodiment of a video image display method of this application.
  • the above-mentioned video image includes multiple video frames.
  • the above-mentioned video image display method may include:
  • Step 1201: Parse the code stream to obtain information representing the inter-frame reference relationship.
  • the above-mentioned information indicating the inter-frame reference relationship includes the information of the inter-frame reference relationship of the current frame and the information of the inter-frame reference relationship of the previous N frames of the current frame, where:
  • the information about the inter-frame reference relationship of the current frame indicates that the current frame references the target forward long-term reference (LTR) frame with the closest temporal distance to the current frame; the target forward LTR frame is an encoded video frame that is marked as an LTR frame by the sending end device and for which a confirmation message sent by the receiving end device has been received, where the confirmation message corresponds to the target forward LTR frame;
  • the information of the inter-frame reference relationship of the first N frames of the current frame indicates that each of the first N frames references the forward LTR frame with the closest temporal distance to that frame, and the forward LTR frame is an encoded video frame marked as an LTR frame by the sending end device; "the current frame" here refers to each of the previous N frames.
  • among the multiple frames between the current frame and the forward LTR frame with the closest temporal distance, all of the frames may reference the same LTR frame (for example, A), or only some of the multiple frames may reference the same LTR frame (for example, A).
  • the above-mentioned information indicating the inter-frame reference relationship further includes information about the inter-frame reference relationship of the last M frames of the current frame, which indicates that each of the last M frames references the previous frame of that frame, where N and M are positive integers.
  • N and M may depend on the network.
  • the sending end device determines that the current frame references the target forward LTR frame with the closest temporal distance to the current frame, and that each of the last M frames of the current frame references the previous frame of that frame. This shortens the inter-frame reference distance, improves the coding quality in an impaired network environment, and realizes adaptive selection of reference relationships, such as a flexible combination of the full reference relationship and the frame-by-frame reference relationship, avoiding reference frames that are far from the current frame in the temporal domain. This greatly alleviates video stalls and blurred image quality caused by packet loss, and achieves a better balance between image quality and image fluency.
  • the above-mentioned information indicating the inter-frame reference relationship further includes information about the inter-frame reference relationship of the last M frames of the current frame, which indicates that each of the last M frames references the forward LTR frame with the closest temporal distance to that frame, where N and M are positive integers.
  • the specific values of N and M may depend on the network.
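The reference pattern described above (the previous N frames, the current frame, and the following M frames) can be modeled with a small sketch. The function name, index conventions, and single-reference simplification below are illustrative assumptions; this models the reference relationship only, not the actual bitstream syntax.

```python
def reference_for(i, cur, N, M, ltr_frames, confirmed_ltr_frames):
    """Reference index for frame i under the scheme described above.

    Frames cur-N .. cur-1 reference the nearest earlier (forward) LTR
    frame; frame cur references the nearest earlier *confirmed* (target)
    forward LTR frame; frames cur+1 .. cur+M each reference their
    immediately previous frame.  Indices are in display order.
    """
    def nearest_before(pool):
        earlier = [f for f in pool if f < i]
        return max(earlier) if earlier else None

    if cur - N <= i < cur:
        return nearest_before(ltr_frames)
    if i == cur:
        return nearest_before(confirmed_ltr_frames)
    if cur < i <= cur + M:
        return i - 1
    return None
```

For example, with LTR frames marked at indices 0, 4, 8, of which 0 and 4 are confirmed by the receiver, and N = 2, M = 3 around a current frame at index 10: frames 8 and 9 reference the nearest forward LTR frame (4 and 8 respectively), frame 10 references confirmed LTR frame 4, and frames 11 to 13 each reference their previous frame.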
  • Step 1202: Reconstruct the multiple video frames, where reconstructing the multiple video frames includes: reconstructing the current video frame according to the reference frame of the current frame.
  • Step 1203: Display the above-mentioned video image.
  • the information indicating the inter-frame reference relationship includes information about the inter-frame reference relationship of the previous N frames of the current frame.
  • the information of the inter-frame reference relationship of the first N frames of the current frame indicates that each of the first N frames references the forward LTR frame with the closest temporal distance to that frame.
  • the above forward LTR frame is an encoded video frame marked as an LTR frame by the sending end device. That is to say, in this embodiment, the sending end device does not need to wait for feedback from the receiving end device when marking an LTR frame, so it is possible to mark multiple LTR frames within one RTT, which can greatly shorten the inter-frame reference distance and improve the coding quality of the video image.
  • FIG. 13 is a schematic structural diagram of an embodiment of a video image sending device of this application.
  • the video image includes multiple video frames.
  • the video image sending device 130 may include: an encoding module 1301 and a transmission module 1302;
  • the sending device 130 of the video image may correspond to the sending end A in FIG. 1, or to the sending device in FIG. 10(b) or FIG. 11(b), or to the apparatus 900 in FIG. 16(a), or to the apparatus 40 in FIG. 16(b), or to the apparatus 400 in FIG. 16(c).
  • the encoding module 1301 may specifically correspond to the video encoder in the sending device shown in FIG. 10(b) or FIG. 11(b), or to the encoder 20 in the apparatus 40 shown in FIG. 16(b).
  • the encoding module 1301 is used to encode the above-mentioned multiple video frames to obtain an encoded code stream, where the code stream includes at least information indicating the inter-frame reference relationship; in addition, the code stream may also include encoded data, for example, the residual data between the current frame and the reference frame; the above information indicating the inter-frame reference relationship can be placed in the slice header;
  • the transmission module 1302 is configured to send the above-mentioned encoded code stream, wherein the information indicating the inter-frame reference relationship includes the information of the inter-frame reference relationship of the current frame and the information of the inter-frame reference relationship of the previous N frames of the current frame, where:
  • the information about the inter-frame reference relationship of the current frame indicates that the current frame references the target forward long-term reference (LTR) frame with the closest temporal distance to the current frame; the target forward LTR frame is the forward LTR frame for which the sending end device has received the confirmation message from the receiving end device, that is, an encoded video frame that is marked as an LTR frame by the encoding module 1301 and for which a confirmation message sent by the receiving end device has been received, where the confirmation message corresponds to the target forward LTR frame;
  • the sending device of the video image is the local end (which may, for example, also be called the sending end device), and the receiving end device is the opposite end or remote end (which may, for example, also be called the decoding end device);
  • the information of the inter-frame reference relationship of the first N frames of the current frame indicates that each of the first N frames references the forward LTR frame with the closest temporal distance to that frame; the forward LTR frame is an encoded video frame marked as an LTR frame by the encoding module 1301, and the above-mentioned forward LTR frame is stored in the DPB.
  • among the multiple frames between the current frame and the forward LTR frame with the closest temporal distance, all of the frames may reference the same LTR frame (for example, A), or only some of the multiple frames may reference the same LTR frame (for example, A).
  • the encoding module 1301 encodes the foregoing multiple video frames to obtain an encoded bitstream.
  • the information indicating the inter-frame reference relationship includes information about the inter-frame reference relationship of the first N frames of the current frame, which indicates that each of the first N frames references the forward LTR frame with the closest temporal distance to that frame; the foregoing forward LTR frame is an encoded video frame marked as an LTR frame by the encoding module 1301. That is, in this embodiment, the encoding module 1301 does not need to wait for feedback from the receiving end device when marking an LTR frame, so multiple LTR frames can be marked within one RTT, which can greatly shorten the inter-frame reference distance and improve the coding quality of the video image.
  • the above information indicating the inter-frame reference relationship further includes information about the inter-frame reference relationship of the last M frames of the current frame, which indicates that each of the last M frames references the previous frame of that frame, where N and M are positive integers.
  • the specific values of N and M may depend on the network.
  • the encoding module determines that the current frame references the target forward LTR frame with the closest temporal distance to the current frame, and that each of the last M frames of the current frame references the previous frame of that frame. This shortens the inter-frame reference distance, improves the coding quality in an impaired network environment, and realizes adaptive selection of reference relationships, such as a flexible combination of the full reference relationship and the frame-by-frame reference relationship, avoiding reference frames that are far from the current frame in the temporal domain. This greatly alleviates video stalls and blurred image quality caused by packet loss, and achieves a better balance between image quality and image smoothness.
  • the information indicating the inter-frame reference relationship further includes information about the inter-frame reference relationship of the last M frames of the current frame, which indicates that each of the last M frames references the forward LTR frame with the closest temporal distance to that frame, where N and M are positive integers.
  • the specific values of N and M may depend on the network.
  • the encoding module 1301 determines the value of N according to the encoding quality of the first n frames of the current frame, where n ≤ N. In a specific implementation, the encoding module 1301 may determine the value of N according to the encoding quality of the first n frames of the current frame, the motion scene of the above-mentioned video image, and the network state information fed back by the receiving end device.
  • the above-mentioned network state information may include one or more of the network packet loss rate, the available network bandwidth, and the network round-trip time (RTT).
  • the encoding module 1301 determines the value of N according to the comparison result between the encoding quality of the first n frames of the current frame and an encoding quality threshold.
  • the encoding module 1301 determines the value of M according to the number of video frames included in a unit time. In a specific implementation, the encoding module 1301 may determine the value of M according to the number of video frames included in the unit time and the motion scene of the above-mentioned video image.
  • the above unit time may be set according to system performance and/or implementation requirements; for example, the unit time may be 1 second.
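The two determinations above (N from recent coding quality, M from the frame count per unit time and the motion scene) can be sketched as follows. Both heuristics, including the function names, the counting rule for N, and the divide-by-ten rule for M, are assumptions for illustration only.

```python
def choose_N(recent_quality, quality_threshold, max_N):
    """N = how many earlier frames keep referencing the nearest forward LTR.

    Count how many of the last n frames fell below the quality threshold;
    poor recent quality argues for a smaller N (switch sooner to a
    confirmed-LTR reference).
    """
    poor = sum(1 for q in recent_quality if q < quality_threshold)
    return max(1, max_N - poor)


def choose_M(frames_per_second, high_motion):
    """M = how many following frames reference their previous frame.

    More frames per unit time allow a longer frame-by-frame chain; a
    high-motion scene shortens it to limit error propagation.
    """
    m = max(1, frames_per_second // 10)
    return max(1, m - 1) if high_motion else m
```

The exact mapping would be tuned in practice; the sketch only shows that N shrinks as recent quality degrades, and that M scales with the frame rate while being reined in for high motion.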
  • the mark interval D of the LTR frame has a functional relationship with N and M.
  • the L frames are after the M frames in the temporal domain, and the information of the inter-frame reference relationship of the L frames indicates that the first frame in each group of (Mn+1) frames references the target forward LTR frame with the closest temporal distance to that first frame, and each frame after the first frame in the (Mn+1) frames references the previous frame of that frame, where L is a positive integer and n is a positive integer greater than or equal to 1.
  • the values of M1, M2, ..., Mn may be the same or different, and the specific values may be determined according to actual application scenarios.
  • when encoding the L frames after the M frames, the encoding module 1301 can determine that the first frame in each group of (Mn+1) frames references the target forward LTR frame with the closest temporal distance to that first frame, and that each frame after the first frame in the (Mn+1) frames references the previous frame of that frame. This can shorten the inter-frame reference distance, improve the coding quality in an impaired network environment, and realize adaptive selection of reference relationships, such as a flexible combination of the full reference relationship and the frame-by-frame reference relationship. To a certain extent, this avoids reference frames that are far from the current frame in the temporal domain, greatly alleviates video stalls and blurred image quality caused by packet loss, and achieves a good balance between image quality and image smoothness.
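The grouping just described, where the L frames split into groups of (M1+1), (M2+1), ..., (Mn+1) frames, each group head referencing the nearest target forward LTR frame and every other frame chaining to its predecessor, can be sketched as below. The function name and the single known target-LTR index are simplifying assumptions.

```python
def group_references(start, group_sizes, target_ltr):
    """References for L frames split into groups of (Mi + 1) frames.

    The first frame of each group references the target forward LTR
    frame nearest to it (modeled here as one known index, `target_ltr`);
    every other frame in the group references its immediately previous
    frame.  Returns {frame_index: reference_index}.
    """
    refs, i = {}, start
    for m in group_sizes:            # each group has m + 1 frames
        refs[i] = target_ltr         # group head -> nearest target LTR
        for j in range(i + 1, i + m + 1):
            refs[j] = j - 1          # frame-by-frame chain inside the group
        i += m + 1
    return refs
```

For instance, with groups of sizes M1 = 2 and M2 = 3 starting at frame 20 and a target forward LTR frame at index 16, frames 20 and 23 anchor to frame 16 while the remaining frames chain one by one, which is exactly the mixed full-reference / frame-by-frame pattern the text describes.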
  • the mark interval D of the LTR frame has a functional relationship with N and L.
  • the encoding module 1301 determines the marking interval D of the LTR frame according to the network status information fed back by the receiving end device, and the network status information includes one or more of: the network packet loss rate, the available network bandwidth, and the network round-trip time (RTT).
  • the marking interval D of the LTR frame is used for the encoding module 1301 to mark the LTR frame.
  • the encoding module 1301 marks LTR frames according to the LTR marking interval, which makes it possible to mark multiple LTR frames within one RTT. In this application, the LTR marking interval is not fixed but changes dynamically; consecutive intervals may be the same or different, determined according to actual application scenarios. This can greatly shorten the inter-frame reference distance and improve the coding quality of video images.
  • the encoding module 1301 can dynamically determine the LTR marking interval according to information such as network conditions, can respond in time to impaired-network scenarios on the live network such as sudden packet loss, heavy packet loss, and congestion, and can balance fluency and clarity to achieve the best video call experience.
  • the video image sending device provided in the embodiment shown in FIG. 13 can be used to implement the technical solution of the method embodiment shown in FIG. 3 of the present application. For its implementation principles and technical effects, further reference may be made to related descriptions in the method embodiment.
  • FIG. 14 is a schematic structural diagram of another embodiment of a video image sending device of this application.
  • the sending device 140 of the video image shown in FIG. 14 may correspond to the sending end A in FIG. 1, or to the sending device in FIG. 10(b) or FIG. 11(b), or to the apparatus 900 in FIG. 16(a), or to the apparatus 40 in FIG. 16(b), or to the apparatus 400 in FIG. 16(c).
  • the encoding module 1402 may specifically correspond to the video encoder in the sending device shown in FIG. 10(b) or FIG. 11(b), or to the encoder 20 in the apparatus 40 shown in FIG. 16(b).
  • the foregoing video image includes multiple video frames.
  • the foregoing video image sending device 140 may include: a judgment module 1401, an encoding module 1402, and a transmission module 1403;
  • the judging module 1401 is used to judge whether the current frame is marked as an LTR frame; specifically, the judging module 1401 may correspond to the network analysis processing system in FIG. 10(b);
  • the encoding module 1402 is configured to encode the unmarked current frame when the current frame is not marked as an LTR frame, where the encoding process includes: encoding at least information representing the inter-frame reference relationship of the current frame into the code stream; the inter-frame reference relationship of the current frame indicates that the current frame references the forward LTR frame with the closest temporal distance to the current frame, and the forward LTR frame is an encoded video frame marked as an LTR frame by the encoding module 1402; or,
  • when the current frame is marked as an LTR frame, the marked current frame is encoded, where the encoding process includes: encoding at least information representing the inter-frame reference relationship of the current frame into the code stream; the inter-frame reference relationship of the current frame indicates that the current frame references the target forward LTR frame with the closest temporal distance to the current frame, where the target forward LTR frame is an encoded video frame that is marked as an LTR frame by the encoding module 1402 and for which a confirmation message sent by the receiving end device has been received, the confirmation message corresponding to the target forward LTR frame;
  • the encoding module 1402 may correspond to the video encoder in FIG. 10(b).
  • the transmission module 1403 is used to send the coded stream. Specifically, the transmission module 1403 may correspond to the network transmitter in FIG. 10(b).
  • when the encoding module 1402 encodes the unmarked current frame, it references the forward LTR frame with the closest temporal distance to the unmarked current frame, and the foregoing forward LTR frame is an encoded video frame marked as an LTR frame by the encoding module. That is, in this embodiment, the encoding module does not need to wait for feedback from the receiving end device when marking an LTR frame, so it is possible to mark multiple LTR frames within one RTT, which can greatly shorten the inter-frame reference distance and improve the coding quality of video images.
  • the determining module 1401 is specifically configured to determine whether the current frame is marked as an LTR frame according to the marking interval of the LTR frame.
  • FIG. 15 is a schematic structural diagram of another embodiment of a video image sending device of this application.
  • the video image sending device 150 shown in FIG. 15 may correspond to the sending end A in FIG. 1, or to the sending device in FIG. 10(b) or FIG. 11(b), or to the apparatus 900 in FIG. 16(a), or to the apparatus 40 in FIG. 16(b), or to the apparatus 400 in FIG. 16(c).
  • the encoding module 1402 may specifically correspond to the video encoder in the sending device shown in FIG. 10(b) or FIG. 11(b), or to the encoder 20 in the apparatus 40 shown in FIG. 16(b).
  • the judgment module 1401 may include: an acquisition sub-module 14011 and a marking sub-module 14012
  • the obtaining sub-module 14011 is used to obtain the number of interval frames between the current frame and the forward LTR frame with the closest temporal distance to the foregoing current frame;
  • the marking sub-module 14012 is used to mark the current frame as an LTR frame when the number of interval frames is equal to the marking interval of the LTR frame, and not to mark the current frame as an LTR frame when the number of interval frames is not equal to the marking interval of the LTR frame.
  • the judging module 1401 is further configured to determine the marking interval of the LTR frame according to the network status information fed back by the receiving end device, and the network status information includes one or more of: the network packet loss rate, the available network bandwidth, and the network round-trip time (RTT).
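The acquisition and marking sub-modules described above reduce to a small decision: count the frames between the current frame and the nearest forward LTR frame and compare the count against the marking interval. The function name and the choice to count the gap as an open interval (frames strictly in between) are assumptions here.

```python
def mark_decision(current_index, ltr_frames, marking_interval):
    """Sketch of the acquisition + marking sub-modules: mark the current
    frame as an LTR frame iff the number of frames between it and the
    nearest forward LTR frame equals the marking interval.
    """
    nearest_ltr = max(f for f in ltr_frames if f < current_index)
    gap = current_index - nearest_ltr - 1   # frames strictly in between
    return gap == marking_interval
```

With LTR frames at indices 0 and 3 and a marking interval of 4, frame 8 would be marked (four frames lie between it and frame 3) while frame 7 would not.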
  • the inter-frame reference relationship of the current frame indicates that the current frame references the forward LTR frame with the closest temporal distance to the current frame, and the forward LTR frame is an encoded video frame marked as an LTR frame by the sending end device; here, the current frame is not marked as an LTR frame and the encoding quality of the current frame is greater than or equal to the encoding quality threshold; or,
  • the inter-frame reference relationship of the current frame indicates that the current frame references the forward LTR frame with the closest temporal distance to the current frame, and the forward LTR frame is an encoded video frame marked as an LTR frame by the sending end device; here, the current frame is not marked as an LTR frame and the encoding quality of the current frame is less than the encoding quality threshold.
  • when the sending device encodes the current frame, it references the forward LTR frame with the closest temporal distance to the current frame. After encoding the current frame, the encoding module 1402 obtains the encoding quality of the current frame and compares it with the encoding quality threshold. If the encoding quality of the current frame is less than the encoding quality threshold, then when the encoding module 1402 encodes the next frame of the current frame, it references the target forward LTR frame with the closest temporal distance to the next frame, so as to improve the coding quality of the frames after the current frame.
  • the encoding module 1402 is further configured to encode the last M+1 frames of the current frame, and the encoding process includes: encoding information representing the inter-frame reference relationship of the last M+1 frames of the current frame into the code stream; the inter-frame reference relationship of the last M+1 frames indicates that the first frame in the last M+1 frames references the target forward LTR frame with the closest temporal distance to that first frame, where M is a positive integer; here, the current frame is not marked as an LTR frame and the encoding quality of the current frame is less than the encoding quality threshold.
  • the encoding module 1402 can determine that the first frame in the last M+1 frames references the target forward LTR frame with the closest temporal distance to that first frame, and that each frame after the first frame in the above-mentioned last M+1 frames references the previous frame of that frame. This can shorten the inter-frame reference distance, improve the coding quality in an impaired network environment, and realize adaptive selection of reference relationships, such as a flexible combination of the full reference relationship and the frame-by-frame reference relationship. To a certain extent, this avoids referencing frames that are far from the current frame in the temporal domain, greatly alleviates video stalls and blurred image quality caused by packet loss, and achieves a better balance between image quality and image fluency.
  • the encoding module 1402 is further configured to encode the next frame of the current frame, and the encoding process includes: encoding information indicating the inter-frame reference relationship of the next frame of the current frame into the code stream; the inter-frame reference relationship of the next frame indicates that the next frame references the target forward LTR frame with the closest temporal distance to the current frame; here, the current frame is not marked as an LTR frame and the encoding quality of the current frame is less than the encoding quality threshold.
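The quality-driven fallback described above can be sketched as one decision per frame. The scalar quality score and the function name are assumptions; real encoders would use a richer quality measure, but the branching logic is the point.

```python
def next_frame_reference(prev_index, prev_quality, quality_threshold,
                         nearest_target_ltr):
    """Pick the reference for the frame after `prev_index`.

    If the just-encoded frame's quality fell below the threshold, the
    next frame falls back to the nearest confirmed (target) forward LTR
    frame to stop quality drift; otherwise it references the previous
    frame as usual.
    """
    if prev_quality < quality_threshold:
        return nearest_target_ltr   # recover from a confirmed reference
    return prev_index               # normal frame-by-frame reference
```

So a frame encoded after a poor-quality frame at index 12 would anchor back to a confirmed LTR frame (say index 8) instead of chaining onto the degraded frame.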
  • the encoding module 1402 is configured to determine the value of M according to the number of video frames included in a unit time.
  • the encoding module may determine the value of M according to the number of video frames included in the unit time and the motion scene of the above-mentioned video image.
  • the above unit time may be set according to system performance and/or implementation requirements during specific implementation. For example, the above unit time may be 1 second.
  • the video image sending device provided in the embodiment shown in FIG. 14 and FIG. 15 can be used to implement the technical solution of the method embodiment shown in FIG. 6 of the present application. For its implementation principles and technical effects, further reference may be made to the related description in the method embodiment.
  • each step of the above method or each of the above modules can be completed by an integrated logic circuit of hardware in the processor element or instructions in the form of software.
  • the above modules may be one or more integrated circuits configured to implement the above methods, for example: one or more application-specific integrated circuits (Application Specific Integrated Circuit; hereinafter referred to as ASIC), or one or more digital signal processors (Digital Signal Processor; hereinafter referred to as DSP), or one or more field programmable gate arrays (Field Programmable Gate Array; hereinafter referred to as FPGA), etc.
  • these modules can be integrated together and implemented in the form of a System-On-a-Chip (hereinafter referred to as SOC).
  • FIG. 16(a) is a schematic structural diagram of an embodiment of a video call device of this application.
  • the video call device may be a video call device used by the first user.
  • the video call device may include: a display screen; an image collector; one or more processors; a memory; multiple application programs; and one or more computer programs.
  • the above-mentioned display screen may include the display screen of a vehicle-mounted computer (Mobile Data Center); the above-mentioned image collector may be a camera, a vehicle-mounted sensor, etc.; the above-mentioned video call device may be a mobile terminal (mobile phone), a smart screen, a UAV, an Intelligent Connected Vehicle (hereinafter referred to as ICV), a smart/intelligent car, an in-vehicle device, or other equipment.
  • the one or more computer programs are stored in the memory, and the one or more computer programs include instructions.
  • the device executes the following steps: in response to a first operation in which the first user requests a video call with a second user, establish a video call connection between the first user and the second user; the video call connection here refers to a video call connection between the electronic device used by the first user and the electronic device used by the second user;
  • the video image includes a plurality of video frames, where the environment may be the internal environment and/or the external environment where the first user is located, such as the environment inside a car, and/or the surrounding environment perceived while intelligently detecting obstacles during driving;
  • the information indicating the inter-frame reference relationship includes the information of the inter-frame reference relationship of the current frame and the information of the inter-frame reference relationship of the previous N frames of the current frame, wherein:
  • the information about the inter-frame reference relationship of the current frame indicates that the current frame references the target forward long-term reference (LTR) frame with the closest temporal distance to the current frame; the target forward LTR frame is an encoded video frame that is marked as an LTR frame by the sending end device and for which a confirmation message sent by the receiving end device has been received, the confirmation message corresponding to the target forward LTR frame;
  • the sending end device is the video call device used by the first user, and the receiving end device is the video call device used by the second user;
  • the information of the inter-frame reference relationship of the first N frames of the current frame indicates that each of the first N frames references the forward LTR frame with the closest temporal distance to that frame, and the forward LTR frame is an encoded video frame marked as an LTR frame by the sending end device; "this frame" here refers to each of the previous N frames.
  • among the multiple frames between the current frame and the forward LTR frame with the closest temporal distance, all of the frames may reference the same LTR frame (for example, A), or only some of the multiple frames may reference the same LTR frame (for example, A).
  • the image collector collects a video image including the environment of the first user, and then the multiple video frames included in the above video image are encoded to obtain an encoded code stream, the code stream including at least information indicating the inter-frame reference relationship.
  • the information indicating the inter-frame reference relationship includes information about the inter-frame reference relationship of the first N frames of the current frame, which indicates that each of the first N frames references the forward LTR frame with the closest temporal distance to that frame; the foregoing forward LTR frame is an encoded video frame marked as an LTR frame by the sending end device. That is, in this embodiment, the sending end device does not need to wait for feedback from the receiving end device when marking an LTR frame, so it is possible to mark multiple LTR frames within one RTT, which can greatly shorten the inter-frame reference distance and improve the coding quality of video images.
  • the information indicating the inter-frame reference relationship further includes information about the inter-frame reference relationship of the last M frames of the current frame, which indicates that each of the last M frames references the previous frame of this frame.
  • N and M are positive integers.
  • the specific values of N and M may depend on the network conditions.
  • the sending end device determines that the current frame references the target forward LTR frame closest to the current frame in the time domain, and that each of the last M frames of the current frame references the previous frame of this frame, which greatly alleviates the video stutter and blurred image quality caused by packet loss and achieves a better balance between image quality and image fluency.
  • the information indicating the inter-frame reference relationship further includes information about the inter-frame reference relationship of the last M frames of the current frame, which indicates that each of the last M frames references the forward LTR frame closest to this frame in the time domain, where N and M are positive integers.
  • N and M are positive integers.
  • the specific values of N and M may depend on the network conditions.
  • the foregoing device specifically executes the following steps:
  • the value of N is determined according to the coding quality of the first n frames of the current frame, where n ≤ N.
  • the sending end device may determine the value of N according to the coding quality of the first n frames of the current frame, the motion scene of the video image, and the network status information fed back by the receiving end device, where the network status information may include one or more of the network packet loss rate, the available network bandwidth, and the network round-trip time (RTT).
  • when the instruction is executed by the device, the device specifically executes the following steps:
  • the value of M is determined according to the number of video frames included in the unit time.
  • the sending end device may determine the value of M according to the number of video frames included in the unit time and the motion scene of the above-mentioned video image.
  • the above unit time may be set according to system performance and/or implementation requirements during specific implementation; for example, the unit time may be 1 second.
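The embodiment leaves the exact choice of N and M open (quality, motion, and network feedback for N; frame rate and motion for M). The heuristic below is therefore purely illustrative: every threshold and the `base_n` default are invented for the sketch.

```python
# Illustrative-only heuristic for choosing N and M; the patent does not
# specify these functions. All thresholds below are assumptions.

def choose_n(avg_quality, packet_loss_rate, rtt_ms, base_n=4):
    """Shrink N when coding quality drops or the network degrades."""
    n = base_n
    if avg_quality < 30.0:                      # e.g. a low PSNR-like score
        n = max(1, n - 1)
    if packet_loss_rate > 0.1 or rtt_ms > 300:  # handicapped network
        n = max(1, n - 1)
    return n

def choose_m(frames_per_second, high_motion):
    """Allow a longer frame-by-frame tail at higher frame rates,
    shorter in high-motion scenes."""
    m = max(1, frames_per_second // 10)
    if high_motion:
        m = max(1, m // 2)
    return m
```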
  • the mark interval D of the LTR frame has a functional relationship with N and M.
  • L is a positive integer, and n is a positive integer greater than or equal to 1.
  • the values of M1, M2, ..., Mn may be the same or different, and the specific values may be determined according to actual application scenarios.
  • the sending end device may determine that the first frame in each group of (Mn+1) frames references the target forward LTR frame closest to that first frame in the time domain, and that each frame after the first frame in the (Mn+1) frames references the previous frame of this frame. This can shorten the inter-frame reference distance, improve the coding quality in a handicapped-network environment, and realize adaptive selection of reference relationships, such as a flexible combination of the full reference relationship and the frame-by-frame reference relationship. It avoids, to a certain extent, reference frames that are far from the current frame in the time domain, greatly alleviates the video stutter and blurred image quality caused by packet loss, and achieves a good balance between image quality and image smoothness.
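The mixed pattern described above can be sketched as follows. Group sizes (the Mk values) and the target LTR position are illustrative inputs, not values prescribed by the embodiment.

```python
# Sketch of the mixed reference pattern: the stream after the first N frames
# is split into groups of (Mk + 1) frames. The first frame of each group
# references the target forward LTR frame (one acknowledged by the receiver);
# every later frame in the group references its immediately previous frame.

def group_references(start_idx, group_sizes, target_ltr):
    """Return {frame_index: reference_index} for consecutive (Mk+1) groups."""
    refs, idx = {}, start_idx
    for mk in group_sizes:
        refs[idx] = target_ltr          # first frame of the group -> target LTR
        for i in range(idx + 1, idx + mk + 1):
            refs[i] = i - 1             # frame-by-frame within the group
        idx += mk + 1
    return refs
```

Note how the group boundaries periodically "reset" the reference chain back to an acknowledged LTR frame, which bounds how far an error can propagate after packet loss.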
  • the mark interval D of the LTR frame has a functional relationship with N and L.
  • the marking interval D of the LTR frame is used for the sending end device to mark the LTR frame.
  • the sending end device marks LTR frames according to the LTR marking interval, which allows multiple LTR frames to be marked within one RTT, thereby greatly shortening the inter-frame reference distance and improving the coding quality of the video image.
  • the LTR marking interval is not fixed but changes dynamically; the intervals may be the same or different and are determined according to the actual application scenario, which can greatly shorten the inter-frame reference distance and improve the coding quality of the video image.
  • the sending end device can dynamically determine the LTR marking interval based on network conditions and other information, respond in time to handicapped-network scenarios on the live network such as sudden packet loss, heavy packet loss, and congestion, and take both fluency and clarity into account to achieve the best video call experience.
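The text states only that the marking interval D is a function of N and M (or N and L) and varies with network conditions; the concrete function below is an invented placeholder to make the idea concrete.

```python
# Hedged sketch of dynamic LTR marking. The real functional relationship
# D = f(N, M) is not specified by the embodiment; this one is illustrative.

def ltr_mark_interval(n, m, packet_loss_rate):
    """Mark LTR frames more densely (smaller D) when packet loss grows."""
    d = n + m                       # nominal interval derived from N and M
    if packet_loss_rate > 0.1:
        d = max(1, d // 2)          # react to a handicapped network
    return d

def should_mark_ltr(frame_idx, last_ltr_idx, d):
    """The sender marks an LTR frame whenever the interval D has elapsed,
    without waiting for receiver feedback, so several marks can fit in one RTT."""
    return frame_idx - last_ltr_idx >= d
```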
  • the electronic device shown in FIG. 16(a) may be a terminal device or a circuit device built in the above-mentioned terminal device.
  • the device can be used to execute the functions/steps in the method provided in the embodiment shown in FIG. 8 of the present application.
  • the electronic device 900 includes a processor 910 and a transceiver 920.
  • the electronic device 900 may further include a memory 930.
  • the processor 910, the transceiver 920, and the memory 930 can communicate with each other through an internal connection path to transfer control and/or data signals.
  • the memory 930 is used to store computer programs, and the processor 910 is used to call and run the computer programs from the memory 930.
  • the electronic device 900 may further include an antenna 940 for transmitting the wireless signal output by the transceiver 920.
  • the above-mentioned processor 910 and memory 930 may be integrated into one processing device, but more commonly they are components independent of each other.
  • the processor 910 is configured to execute the program code stored in the memory 930 to implement the above-mentioned functions.
  • the memory 930 may also be integrated in the processor 910, or independent of the processor 910.
  • the electronic device 900 may also include one or more of an input unit 960, a display unit 970, an audio circuit 980, a camera 990, and a sensor 901.
  • the audio circuit may also include a speaker 982, a microphone 984, and so on.
  • the display unit 970 may include a display screen
  • the camera 990 is a specific example of an image collector
  • the image collector may be a device with an image acquisition function. The specific form of the image collector is not limited in this embodiment.
  • the aforementioned electronic device 900 may further include a power supply 950 for providing power to various devices or circuits in the terminal device.
  • the electronic device 900 shown in FIG. 16(a) can implement each process of the method provided in the embodiment shown in FIG. 8.
  • the operations and/or functions of each module in the electronic device 900 are respectively for implementing the corresponding processes in the foregoing method embodiments.
  • the processor 910 in the electronic device 900 shown in FIG. 16(a) may be a system-on-chip (SoC); the processor 910 may include a central processing unit (CPU) and may further include other types of processors, such as a graphics processing unit (GPU).
  • each part of the processor or processing unit inside the processor 910 can cooperate to implement the previous method flow, and the corresponding software program of each part of the processor or processing unit can be stored in the memory 930.
  • An embodiment of the present application also provides a device for decoding video data, and the foregoing device includes:
  • a memory, used to store video data in the form of a bitstream;
  • a video decoder, used to decode, from the bitstream, the information indicating the inter-frame reference relationship, where the information indicating the inter-frame reference relationship includes information about the inter-frame reference relationship of the current frame and information about the inter-frame reference relationship of the first N frames of the current frame:
  • the information about the inter-frame reference relationship of the current frame indicates that this frame references the target long-term reference (LTR) frame closest to this frame in the time domain, where "this frame" refers to the current frame;
  • the information about the inter-frame reference relationship of the first N frames of the current frame indicates that each of the first N frames references the forward LTR frame closest to this frame in the time domain, where "this frame" refers to each of the first N frames;
  • rebuilding the multiple video frames, where rebuilding the multiple video frames includes: reconstructing the current video frame according to the reference frame of the current frame.
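A minimal decoder-side sketch of that rebuilding step follows. Real reconstruction involves motion compensation and residual decoding; here a simple addition stands in for it, and the dictionary-based "frames" are an assumption for the illustration only.

```python
# Toy reconstruction loop: each frame is rebuilt from its reference frame
# (as given by the parsed inter-frame reference information) plus a residual.
# `None` marks the key frame, which is rebuilt from the decoded key data.

def reconstruct(ref_map, residuals, key_frame):
    """ref_map: {idx: reference idx, or None for the key frame};
    residuals: {idx: residual value}; returns {idx: reconstructed value}."""
    rebuilt = {}
    for idx in sorted(ref_map):                  # decode order: ascending index
        ref = ref_map[idx]
        base = key_frame if ref is None else rebuilt[ref]
        rebuilt[idx] = base + residuals.get(idx, 0)
    return rebuilt
```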
  • An embodiment of the present application also provides a device for encoding video data, and the device includes:
  • a memory for storing video data, the video data including one or more video frames
  • a video encoder, configured to encode the multiple video frames to obtain an encoded bitstream that includes at least information indicating the inter-frame reference relationship; in addition, the bitstream may also include encoded data, for example, residual data between the current frame and its reference frame; the information indicating the inter-frame reference relationship may be placed in the slice header;
  • the information indicating the inter-frame reference relationship includes the information of the inter-frame reference relationship of the current frame and the information of the inter-frame reference relationship of the previous N frames of the current frame, wherein:
  • the information about the inter-frame reference relationship of the current frame indicates that this frame references the target forward long-term reference (LTR) frame closest to this frame in the time domain, where "this frame" refers to the current frame, and the target forward LTR frame is a forward LTR frame for which the device for encoding video data has received a confirmation message from the device for decoding video data; specifically, the target forward LTR frame may be an encoded video frame that is marked as an LTR frame by the device for encoding video data and for which a confirmation message sent by the device for decoding video data has been received, the confirmation message corresponding to the target forward LTR frame;
  • the information about the inter-frame reference relationship of the first N frames of the current frame indicates that each of the first N frames references the forward LTR frame closest to this frame in the time domain, where "this frame" refers to each of the first N frames, and the forward LTR frame is an encoded video frame marked as an LTR frame by the device for encoding video data.
  • the forward LTR frame is stored in the DPB.
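As a hedged illustration of the kind of per-frame reference information such a bitstream might carry (the embodiment says only that it may be placed in the slice header; the field names and the toy serialization below are invented, not part of any codec specification):

```python
from dataclasses import dataclass

@dataclass
class SliceHeaderRefInfo:
    frame_idx: int    # index of this frame (illustrative field)
    ref_idx: int      # index of the frame it references (illustrative field)
    is_ltr: bool      # whether this frame is marked as an LTR frame

def pack(info):
    """Toy serialization: three integers, space-separated."""
    return f"{info.frame_idx} {info.ref_idx} {int(info.is_ltr)}"

def unpack(payload):
    f, r, l = payload.split()
    return SliceHeaderRefInfo(int(f), int(r), bool(int(l)))
```

A real codec would entropy-code such syntax elements rather than emit text, but the round trip shows the decoder recovering exactly the reference relationship the encoder wrote.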
  • FIG. 16(b) is an explanatory diagram of an example of a video decoding device 40 including an encoder 20 and/or a decoder 30 according to an exemplary embodiment.
  • the video decoding device 40 can implement a combination of various technologies in the embodiments of the present application.
  • the video decoding device 40 may include an imaging device 41, an encoder 20, a decoder 30 (and/or a video encoder/decoder implemented by the logic circuit 47 of the processing circuit 46), an antenna 42, one or more processors 43, one or more memories 44, and/or a display device 45.
  • the imaging device 41, the antenna 42, the processing circuit 46, the logic circuit 47, the encoder 20, the decoder 30, the processor 43, the memory 44, and/or the display device 45 can communicate with each other.
  • the encoder 20 and the decoder 30 are used to illustrate the video coding device 40, in different examples, the video coding device 40 may include only the encoder 20 or only the decoder 30.
  • antenna 42 may be used to transmit or receive an encoded bitstream of video data.
  • the display device 45 may be used to present video data.
  • the logic circuit 47 may be implemented by the processing circuit 46.
  • the processing circuit 46 may include application-specific integrated circuit (ASIC) logic, a graphics processor, a general-purpose processor, and so on.
  • the video decoding device 40 may also include an optional processor 43, and the optional processor 43 may similarly include application-specific integrated circuit (ASIC) logic, a graphics processor, a general-purpose processor, and the like.
  • the logic circuit 47 may be implemented by hardware, such as dedicated video encoding hardware, and the processor 43 may be implemented by general software, an operating system, and the like.
  • the memory 44 may be any type of memory, such as volatile memory (for example, static random access memory (SRAM) or dynamic random access memory (DRAM)) or non-volatile memory (for example, flash memory).
  • the memory 44 may be implemented by cache memory.
  • the logic circuit 47 may access the memory 44 (e.g., to implement an image buffer).
  • the logic circuit 47 and/or the processing circuit 46 may include a memory (e.g., cache, etc.) for implementing image buffers and the like.
  • the encoder 20 implemented by logic circuits may include an image buffer (e.g., implemented by the processing circuit 46 or the memory 44) and a graphics processing unit (e.g., implemented by the processing circuit 46).
  • the graphics processing unit may be communicatively coupled to the image buffer.
  • the graphics processing unit may include an encoder 20 implemented by a logic circuit 47 to implement the various modules discussed with reference to FIG. 2 and/or any other encoder systems or subsystems described herein.
  • Logic circuits can be used to perform the various operations discussed herein.
  • decoder 30 may be implemented by logic circuit 47 in a similar manner to implement the various modules discussed with reference to decoder 30 of FIG. 3 and/or any other decoder system or subsystem described herein.
  • the decoder 30 implemented by logic circuits may include an image buffer (e.g., implemented by the processing circuit 46 or the memory 44) and a graphics processing unit (e.g., implemented by the processing circuit 46).
  • the graphics processing unit may be communicatively coupled to the image buffer.
  • the graphics processing unit may include a decoder 30 implemented by a logic circuit 47 to implement the various modules discussed with reference to FIG. 3 and/or any other decoder system or subsystem described herein.
  • antenna 42 may be used to receive an encoded bitstream of video data.
  • the encoded bitstream may include the reference relationship information related to the encoded video frame discussed herein, etc.
  • the video coding device 40 may also include a decoder 30 coupled to the antenna 42 and used to decode the encoded bitstream.
  • the display device 45 is used to present video frames.
  • the decoder 30 can be used to receive and parse such syntax elements, and decode related video data accordingly.
  • the encoder 20 may entropy encode the syntax elements into an encoded video bitstream.
  • the decoder 30 can parse such syntax elements and decode the related video data accordingly.
  • the video image encoding method described in the embodiment of the application occurs at the encoder 20, and the video image decoding method described in the embodiment of the application occurs at the decoder 30.
  • the encoder 20 and the decoder 30 in the embodiments of this application may be, for example, encoders/decoders corresponding to video standard protocols such as H.263, H.264, HEVC, MPEG-2, MPEG-4, VP8, or VP9, or to next-generation video standard protocols (such as H.266).
  • FIG. 16(c) is a schematic structural diagram of a video decoding device 400 (for example, a video encoding device 400 or a video decoding device 400) provided by an embodiment of the present application.
  • the video coding device 400 is suitable for implementing the embodiments described herein.
  • the video coding device 400 may be a video decoder (for example, the decoder 30 of FIG. 16(b)) or a video encoder (for example, the encoder 20 of FIG. 16(b)).
  • the video coding device 400 may be one or more components of the decoder 30 of FIG. 16(b) or the encoder 20 of FIG. 16(b) described above.
  • the video decoding device 400 includes: an ingress port 410 and a receiver unit (Rx) 420 for receiving data; a processor, logic unit, or central processing unit (CPU) 430 for processing data; a transmitter unit (Tx) 440 (or simply transmitter 440) and an egress port 450 for transmitting data; and a memory 460 for storing data.
  • the video decoding device 400 may also include optical-to-electrical (OE) components and electrical-to-optical (EO) components coupled to the ingress port 410, the receiver unit 420 (or simply receiver 420), the transmitter unit 440, and the egress port 450, serving as egress or ingress points for optical or electrical signals.
  • the processor 430 is implemented by hardware and software.
  • the processor 430 may be implemented as one or more CPU chips, cores (for example, multi-core processors), FPGAs, ASICs, and DSPs.
  • the processor 430 communicates with the ingress port 410, the receiver unit 420, the transmitter unit 440, the egress port 450, and the memory 460.
  • the processor 430 includes a decoding module 470 (for example, an encoding module 470 or a decoding module 470).
  • the encoding/decoding module 470 implements the embodiments disclosed herein to implement the chrominance block prediction method provided in the embodiments of the present application. For example, the encoding/decoding module 470 implements, processes, or provides various encoding operations.
  • the encoding/decoding module 470 provides a substantial improvement to the function of the video decoding device 400, and affects the conversion of the video decoding device 400 to different states.
  • the encoding/decoding module 470 is implemented by instructions stored in the memory 460 and executed by the processor 430.
  • the memory 460 includes one or more magnetic disks, tape drives, and solid-state drives, and may be used as an overflow data storage device to store programs when such programs are selected for execution, and to store instructions and data that are read during program execution.
  • the memory 460 may be volatile and/or non-volatile, and may be read-only memory (ROM), random access memory (RAM), ternary content-addressable memory (TCAM), and/or static random access memory (SRAM).
  • FIG. 17 is a schematic structural diagram of an embodiment of a video image receiving device of this application.
  • the above-mentioned video image includes multiple video frames.
  • the receiving device 170 of the above-mentioned video image may include: a decoding module 1701 and a display module 1702;
  • the receiving device 170 may correspond to the receiving terminal B in FIG. 1, or may correspond to the receiving device in FIG. 10(b) or FIG. 11(b), or may correspond to the apparatus 900 in FIG. 16(a), or may correspond to The device 40 in FIG. 16(b) or may correspond to the device 400 in FIG. 16(c).
  • the decoding module 1701 may correspond to the video decoder in the receiving device in FIG. 10(b) or FIG. 11(b), or specifically may correspond to the decoder 30 in the apparatus 40 shown in FIG. 16(b).
  • the decoding module 1701 is used to parse the bitstream to obtain the information indicating the inter-frame reference relationship, where the information indicating the inter-frame reference relationship includes information about the inter-frame reference relationship of the current frame and information about the inter-frame reference relationship of the first N frames of the current frame:
  • the information about the inter-frame reference relationship of the current frame indicates that this frame references the target forward long-term reference (LTR) frame closest to this frame in the time domain, where "this frame" refers to the current frame; the target forward LTR frame is an encoded video frame that is marked as an LTR frame by the sending end device and for which a confirmation message sent by the receiving end device has been received, the confirmation message corresponding to the target forward LTR frame;
  • the information about the inter-frame reference relationship of the first N frames of the current frame indicates that each of the first N frames references the forward LTR frame closest to this frame in the time domain, where the forward LTR frame is an encoded video frame marked as an LTR frame by the sending end device;
  • the decoding module 1701 is further configured to reconstruct the multiple video frames, where reconstructing the multiple video frames includes: reconstructing the current video frame according to the reference frame of the current frame;
  • the display module 1702 is used to display the video image.
  • all of the multiple frames between the current frame and the temporally nearest forward LTR frame may reference the same LTR frame (for example, A), or only some of the multiple frames may reference the same LTR frame (for example, A).
  • the above-mentioned information indicating the inter-frame reference relationship includes information about the inter-frame reference relationship of the previous N frames of the current frame.
  • the information about the inter-frame reference relationship of the first N frames of the current frame indicates that each of the first N frames references the forward LTR frame closest to this frame in the time domain, where the forward LTR frame is an encoded video frame marked as an LTR frame. That is, in this embodiment the sending end device does not need to wait for feedback from the receiving end device when marking LTR frames, so multiple LTR frames can be marked within one RTT, which can greatly shorten the inter-frame reference distance and improve the coding quality of the video image.
  • the information indicating the inter-frame reference relationship further includes information about the inter-frame reference relationship of the last M frames of the current frame, which indicates that each of the last M frames references the previous frame of this frame.
  • N and M are positive integers.
  • the specific values of N and M may depend on the network conditions.
  • the sending end device determines that the current frame references the target forward LTR frame closest to the current frame in the time domain, and that each of the last M frames of the current frame references the previous frame of this frame, which greatly alleviates the video stutter and blurred image quality caused by packet loss and achieves a better balance between image quality and image fluency.
  • the information indicating the inter-frame reference relationship further includes information about the inter-frame reference relationship of the last M frames of the current frame, which indicates that each of the last M frames references the forward LTR frame closest to this frame in the time domain, where N and M are positive integers.
  • N and M are positive integers.
  • the specific values of N and M may depend on the network conditions.
  • the video image receiving device provided by the embodiment shown in FIG. 17 can be used to implement the technical solution of the method embodiment shown in FIG. 12 of the present application. For its implementation principles and technical effects, further reference may be made to the relevant description in the method embodiment.
  • the division of the various modules of the video image receiving device shown in FIG. 17 is only a division of logical functions; in actual implementation, the modules may be fully or partially integrated into one physical entity, or may be physically separated.
  • these modules can all be implemented in the form of software called by processing elements; they can also be implemented in the form of hardware; part of the modules can be implemented in the form of software called by the processing elements, and some of the modules can be implemented in the form of hardware.
  • the encoding module may be a separately established processing element, or it may be integrated in a video image receiving device, such as a certain chip of an electronic device.
  • the implementation of other modules is similar.
  • all or part of these modules can be integrated together or implemented independently.
  • each step of the above method, or each of the above modules, can be completed by an integrated logic circuit of hardware in the processing element or by instructions in the form of software.
  • the above modules may be one or more integrated circuits configured to implement the above methods, such as one or more application-specific integrated circuits (ASICs), one or more DSPs, or one or more FPGAs.
  • these modules can be integrated together and implemented in the form of a system-on-chip SOC.
  • This application also provides a video image encoding device.
  • the device includes a storage medium and a central processing unit.
  • the storage medium may be a non-volatile storage medium.
  • the storage medium stores a computer executable program.
  • the processor is connected to the non-volatile storage medium and executes the computer executable program to implement the method provided in the embodiment shown in FIG. 3 of the present application.
  • This application also provides a video image encoding device.
  • the device includes a storage medium and a central processing unit.
  • the storage medium may be a non-volatile storage medium.
  • the storage medium stores a computer executable program.
  • the processor is connected to the non-volatile storage medium and executes the computer executable program to implement the method provided in the embodiment shown in FIG. 6 of the present application.
  • the present application also provides a video image decoding device.
  • the device includes a storage medium and a central processing unit.
  • the storage medium may be a non-volatile storage medium.
  • a computer executable program is stored in the storage medium.
  • the processor is connected to the non-volatile storage medium and executes the computer executable program to implement the method provided in the embodiment shown in FIG. 12 of the present application.
  • the above-mentioned memory may be a read-only memory (ROM) or another type of static storage device that can store static information and instructions, a random access memory (RAM) or another type of dynamic storage device that can store information and instructions, an electrically erasable programmable read-only memory (EEPROM), a compact disc read-only memory (CD-ROM) or other optical disk storage, optical disc storage (including compact discs, laser discs, optical discs, digital versatile discs, Blu-ray discs, and the like), magnetic disk storage media or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer.
  • the processors involved may include, for example, a CPU, a DSP, a microcontroller, or a digital signal processor, and may also include a GPU, an embedded neural-network processing unit (NPU), and an image signal processor (ISP); the processor may further include necessary hardware accelerators or logic processing hardware circuits, such as an ASIC, or one or more integrated circuits used to control the execution of the program of the technical solutions of this application.
  • the processor may have a function of operating one or more software programs, and the software programs may be stored in a storage medium.
  • the embodiments of the present application also provide a computer-readable storage medium storing a computer program that, when run on a computer, causes the computer to execute the method provided in the embodiments shown in FIG. 3, FIG. 6, FIG. 8, or FIG. 12 of the present application.
  • the embodiments of the present application also provide a computer program product including a computer program that, when run on a computer, causes the computer to execute the method provided in the embodiments shown in FIG. 3, FIG. 6, FIG. 8, or FIG. 12.
  • "At least one" refers to one or more; "multiple" refers to two or more.
  • "And/or" describes the association relationship between associated objects and indicates that three relationships may exist; for example, A and/or B may mean: A exists alone, both A and B exist, or B exists alone, where A and B may be singular or plural.
  • the character “/” generally indicates that the associated objects before and after are in an “or” relationship.
  • "At least one of the following items" and similar expressions refer to any combination of these items, including any combination of single items or plural items.
  • at least one of a, b, and c can represent: a, b, c, a and b, a and c, b and c, or a and b and c, where each of a, b, and c may be singular or plural.
  • if any function is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium.
  • the technical solution of the present application essentially or the part that contributes to the existing technology or the part of the technical solution can be embodied in the form of a software product, and the computer software product is stored in a storage medium, including Several instructions are used to make a computer device (which may be a personal computer, a server, or a network device, etc.) execute all or part of the steps of the methods described in the various embodiments of the present application.
  • the aforementioned storage media include: USB flash drives, removable hard disks, read-only memory (ROM), random access memory (RAM), magnetic disks, optical discs, and other media that can store program code.


Abstract

Embodiments of the present application provide a video image transmission method, a sending device, and a video call method and device. The video image includes multiple video frames. In the transmission method, the multiple video frames are encoded to obtain an encoded bitstream, and the bitstream includes at least information indicating inter-frame reference relationships. In the present application, each of the N frames preceding the current frame references the forward LTR frame temporally closest to that frame, where a forward LTR frame is an encoded video frame that the sending device has marked as an LTR frame. In other words, in this embodiment, the sending device does not need to wait for feedback from the receiving device when marking LTR frames, so multiple LTR frames can be marked within a single RTT, which greatly shortens the inter-frame reference distance and improves the encoding quality of the video image.

Description

Video image transmission method, sending device, and video call method and device
This application claims priority to Chinese patent application No. 201910888693.5, filed with the China National Intellectual Property Administration on September 19, 2019 and entitled "Video image transmission method, sending device, and video call method and device", the entire contents of which are incorporated herein by reference.
Technical Field
This application relates to the field of communication technologies, and in particular to a video image transmission method, a sending device, and a video call method and device.
Background
With the further development of 5G, the demand for video interaction has grown ever stronger, and video calling, as a basic interactive service, has also grown rapidly in recent years. Video calling is now embedded as a basic capability in all-scenario smart terminals such as in-vehicle terminals, smart screens, drones, children's watches, smart speakers, and telemedicine, laying a foundation for all-scenario interconnection in the future intelligent society.
Although network coverage keeps improving, real network conditions remain complex and changeable. In scenarios such as weak-signal coverage areas, home wireless networks (Wireless Fidelity; hereinafter: WiFi) penetrating walls, and multi-user contention on public WiFi, bursts of high packet loss and/or network congestion may occur, so that the received video data is incomplete and the picture stutters.
For incomplete data caused by bursts of high packet loss, the existing related art mainly relies on the sending end re-encoding an I-frame to restore smooth video playback, but this introduces noticeable stutter and seriously degrades the high-definition, smooth experience of a video call.
As core technologies of video calling, video encoding/decoding and transmission control play a key role in the quality and smoothness of a video call. In the existing related art, however, encoding/decoding and transmission control belong to two separate subsystems, and the inter-frame reference relationships are relatively static, so video smoothness and clarity cannot both be achieved, resulting in a poor experience.
Summary
This application provides a video image transmission method, a sending device, and a video call method and device; it further provides a video image display method and a video image receiving device, so as to achieve a good balance between image quality and image smoothness.
In a first aspect, this application provides a video image transmission method, where the video image includes multiple video frames, and the method includes:
encoding the multiple video frames to obtain an encoded bitstream, where the bitstream includes at least information indicating inter-frame reference relationships; in addition, the bitstream may also include encoded data, for example residual data between the current frame and its reference frame; the information indicating inter-frame reference relationships may be carried in the slice header;
sending the encoded bitstream, where the information indicating inter-frame reference relationships includes information on the inter-frame reference relationship of the current frame and information on the inter-frame reference relationships of the N frames preceding the current frame, where:
the information on the inter-frame reference relationship of the current frame indicates that this frame references the target forward long-term reference (LTR) frame temporally closest to this frame, where "this frame" means the current frame; the target forward LTR frame is a forward LTR frame for which the sending device has received an acknowledgment from the receiving device; specifically, the target forward LTR frame may be an encoded video frame that the sending device has marked as an LTR frame and for which it has received an acknowledgment sent by the receiving device, the acknowledgment corresponding to the target forward LTR frame; in this application, the sending device is the local end and may also be called the encoding device, while the receiving device is the peer or remote end and may also be called the decoding device;
the information on the inter-frame reference relationships of the N frames preceding the current frame indicates that each of those N frames references the forward LTR frame temporally closest to that frame, where "this frame" means each of the N preceding frames; a forward LTR frame is an encoded video frame marked as an LTR frame by the sending device; in this application, forward LTR frames are stored in the DPB.
It should be noted that other frames may exist between the N preceding frames and the current frame, or the N preceding frames may be temporally adjacent to the current frame. In the former case, the inter-frame reference relationships of those other frames may be the same as those of the N preceding frames, or other inter-frame reference relationships may be used.
In other words, among the multiple frames between the current frame and the temporally closest forward LTR frame (for example, A), all of the frames may reference the same LTR frame (for example, A), or only some of them may reference that same LTR frame (for example, A).
In the above transmission method, the multiple video frames are encoded to obtain an encoded bitstream that includes at least information indicating inter-frame reference relationships. That information includes the inter-frame reference information of the N frames preceding the current frame, which indicates that each of those N frames references the forward LTR frame temporally closest to that frame, where a forward LTR frame is an encoded video frame marked as an LTR frame by the sending device. In other words, in this embodiment, the sending device does not need to wait for feedback from the receiving device when marking LTR frames, so multiple LTR frames can be marked within one RTT, which greatly shortens the inter-frame reference distance and improves the encoding quality of the video image.
在一种可能的实现方式中,所述表示帧间参考关系的信息还包括当前帧的后M帧的帧间参考关系的信息,所述当前帧的后M帧的帧间参考关系的信息表示所述后M帧中的每一帧均参考本帧的前一帧,这里的本帧是指后M帧中的每一帧,其中N和M为正整数。例如,N和M的具体数值可以取决于网络。
需要说明的是,当前帧的后M帧与当前帧之间可以存在其他帧,也可以是当前帧的后M帧与当前帧之间是时域上紧邻的关系,针对前者的情况,其他帧的帧间参考关系可以与所述后M帧一样,也可以采用其他的帧间参考关系。
在网络差点高时延场景下,当LTR帧的间隔过长时,如果LTR帧的后续帧全部参考与本帧时域距离最近的前向LTR帧,势必造成帧间参考距离过长,从而导致编码质量明显下降,这时,发送端设备确定当前帧参考与当前帧时域距离最近的目标前向LTR帧,当前帧的后M帧中的每一帧均参考本帧的前一帧,从而可以缩短帧间参考距离,提高网络差点环境下的编码质量,实现了自适应选择参考关系,例如全参考关系和逐帧参考关系的灵活组合,一定程度上避免参考离当前帧时域距离很长的参考帧,较大程度缓解了丢包导致的视频卡顿的现象与图像质量模糊的问题,实现了图像质量与图像流畅度之间达到较好的平衡。
在一种可能的实现方式中,所述表示帧间参考关系的信息还包括当前帧的后M帧的帧间参考关系的信息,所述当前帧的后M帧的帧间参考关系的信息表示所述后M帧中的每一帧均参考与本帧时域距离最近的前向LTR帧,其中N和M为正整数。例如,N和M的具体数值可以取决于网络。
需要说明的是,当前帧的后M帧与当前帧之间可以存在其他帧,也可以是当前帧的后M帧与当前帧之间是时域上紧邻的关系,针对前者的情况,其他帧的帧间参考关系可以与所述后M帧一样,也可以采用其他的帧间参考关系。
在一种可能的实现方式中,所述发送端设备根据所述当前帧的前n帧的编码质量确定N的数值,n<N。在具体实现时,发送端设备可以根据当前帧的前n帧的编码质量、上述视频图像的运动场景和接收端设备反馈的网络状态信息确定N的数值,上述网络状态信息可以包括网络丢包率、网络可用带宽和网络回环时间RTT中的一个或多个。
在一种可能的实现方式中,所述发送端设备根据所述当前帧的前n帧的编码质量与编码质量阈值的比较结果,确定N的数值。
在一种可能的实现方式中,所述发送端设备根据单位时间内所包括的视频帧数确定M的数值。在具体实现时,发送端设备可以根据单位时间内所包括的视频帧数和上述视频图像的运动场景确定M的数值。其中,上述单位时间可以在具体实现时,根据系统性能和/或实现需求等自行设定,举例来说,上述单位时间可以为1秒。
In one possible implementation, the LTR frame marking interval D has a functional relationship with N and M. For example, the functional relationship may be D = N + (M + 1).
The LTR frame marking interval is the number of frames skipped between LTR markings, i.e., how many frames must pass after the previous LTR frame was marked before the next frame is marked as an LTR frame. For example, if the marking interval is 4, then after the current frame is marked as an LTR frame, 4 frames are skipped and the 5th frame after the current frame is marked as an LTR frame.
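The interval arithmetic described above can be sketched in a few lines of Python; the function names here are illustrative and not part of the patent text.

```python
def next_ltr_mark_poc(last_ltr_poc: int, interval_d: int) -> int:
    """POC of the next frame to mark as LTR: after marking a frame,
    interval_d frames are skipped, so the (interval_d + 1)-th frame
    after it is the next one marked (e.g., D = 4 marks the 5th frame)."""
    return last_ltr_poc + interval_d + 1


def marking_interval(n: int, m: int) -> int:
    """Functional relationship stated above: D = N + (M + 1)."""
    return n + (m + 1)
```

With D = 4 and the last LTR frame at POC 0, the next LTR frame is the frame at POC 5, matching the worked example in the text.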
在一种可能的实现方式中,所述表示帧间参考关系的信息还包括L帧的帧间参考关系的信息,L=(M1+1)+(M2+1)+…+(Mn+1),所 述L帧时域上在所述M帧之后,所述L帧的帧间参考关系的信息表示所述(Mn+1)帧中的第一帧参考与所述第一帧时域距离最近的目标前向LTR帧,所述(Mn+1)帧中在第一帧之后的每一帧均参考本帧的前一帧,其中,L为正整数,n为大于或等于1的正整数。
其中,M1,M2,…,Mn的数值可以相同也可以不同,具体的数值大小可以根据实际的应用场景来确定。
在网络差点高时延场景下,当LTR帧的间隔过长时,如果LTR帧的后续帧全部参考与本帧时域距离最近的前向LTR帧,势必造成帧间参考距离过长,从而导致编码质量明显下降,这时,发送端设备可以在对M之后的L帧进行编码时,确定(Mn+1)帧中的第一帧参考与所述第一帧时域距离最近的目标前向LTR帧,(Mn+1)帧中在第一帧之后的每一帧均参考本帧的前一帧,从而可以缩短帧间参考距离,提高网络差点环境下的编码质量,实现了自适应选择参考关系,例如全参考关系和逐帧参考关系的灵活组合,一定程度上避免参考离当前帧时域距离很长的参考帧,较大程度缓解了丢包导致的视频卡顿的现象与图像质量模糊的问题,实现了图像质量与图像流畅度之间达到较好的平衡。
在一种可能的实现方式中,LTR帧的标记间隔D与N和L具有函数关系。举例来说,上述函数关系可以为D=N+L,L=(M1+1)+(M2+1)+…+(Mn+1)。在一种可能的实现方式中,所述发送端设备根据所述接收端设备反馈的网络状态信息,确定所述LTR帧的标记间隔D,所述网络状态信息包括:网络丢包率、网络可用带宽和网络回环时间RTT中的一个或多个。
在一种可能的实现方式中,所述LTR帧的标记间隔D用于所述发送端设备标记LTR帧。
其中,发送端设备根据LTR的标记间隔进行LTR帧的标记,可以实现一个RTT内标记多个LTR帧,并且本申请中,LTR的标记间隔不是固定设置的,而是动态变化的,可能是相同间隔,也可能是不同间隔,具体根据实际应用场景来确定,从而可以大大缩短帧间参考距离,提升视频图像的编码质量。并且,本实施例中,发送端设备可以根据网络状况等信息, 动态确定LTR的标记间隔,可以及时应对现网突发丢包、大丢包以及拥塞等网络差点场景,并可以兼顾流畅度与清晰度,实现最佳的视频通话体验。
第二方面,本申请提供一种视频图像的传输方法,所述视频图像包括多个视频帧,包括:判断当前帧是否被标记为长期参考LTR帧;如果所述当前帧未被标记为LTR帧,则对未标记的当前帧进行编码,其中,所述编码过程包括:至少将表示当前帧的帧间参考关系的信息编入码流,所述当前帧的帧间参考关系表示所述当前帧参考与所述当前帧时域距离最近的前向LTR帧,所述前向LTR帧为发送端设备标记为LTR帧的已编码的视频帧;或者,
如果所述当前帧被标记为LTR帧,则对标记的当前帧进行编码,其中,所述编码过程包括:将表示当前帧的帧间参考关系的信息编入码流,所述当前帧的帧间参考关系表示所述当前帧参考与所述当前帧时域距离最近的目标前向LTR帧,其中,上述目标前向LTR帧为所述发送端设备接收到接收端设备确认消息的前向LTR帧,具体地,所述目标前向LTR帧为所述发送端设备标记为LTR帧并且接收到接收端设备发送的确认消息的已编码的视频帧,所述确认消息与所述目标前向LTR帧对应;本申请中,发送端设备即本端,例如也可以叫做编码端设备,接收端设备为对端或远端,例如也可以叫做解码端设备;
发送经编码的码流。
上述视频图像的传输方法中,在对未标记的当前帧进行编码时,参考与未标记的当前帧时域距离最近的前向LTR帧,上述前向LTR帧为发送端设备标记为LTR帧的已编码的视频帧,也就是说,本实施例中,发送端设备在标记LTR帧时,无需等待接收端设备的反馈,因此可以实现一个RTT内标记多个LTR帧,从而可以大大缩短帧间参考距离,提升视频图像的编码质量。
In one possible implementation, determining whether the current frame is marked as a long-term reference (LTR) frame includes: determining, according to the LTR frame marking interval, whether the current frame is marked as an LTR frame.
In one possible implementation, determining whether the current frame is marked as an LTR frame according to the LTR frame marking interval includes: obtaining the number of frames between the current frame and the forward LTR frame temporally closest to the current frame; if that number of frames equals the LTR frame marking interval, marking the current frame as an LTR frame; and if that number of frames does not equal the LTR frame marking interval, not marking the current frame as an LTR frame.
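The marking decision above can be sketched as follows. This reads "number of frames between" as the count of frames strictly between the two frames, which is one plausible interpretation consistent with the FIG. 7(a) walkthrough later in the text (frame 4 vs. frame 1 gives 2, frame 16 vs. frame 13 gives 2); the function name is illustrative.

```python
def should_mark_as_ltr(cur_poc: int, nearest_fwd_ltr_poc: int,
                       mark_interval: int) -> bool:
    """Mark the current frame as LTR iff the number of frames strictly
    between it and the temporally closest forward LTR frame equals the
    marking interval."""
    gap = cur_poc - nearest_fwd_ltr_poc - 1
    return gap == mark_interval
```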
在一种可能的实现方式中,所述方法还包括:根据所述接收端设备反馈的网络状态信息,确定所述LTR帧的标记间隔,所述网络状态信息包括:网络丢包率、网络可用带宽和网络回环时间RTT中的一个或多个。
在一种可能的实现方式中,所述当前帧的帧间参考关系表示所述当前帧参考与所述当前帧时域距离最近的前向LTR帧,所述前向LTR帧为发送端设备标记为LTR帧的已编码的视频帧;其中所述当前帧未被标记为LTR帧且所述当前帧的编码质量大于或等于编码质量阈值;或者,当前帧的帧间参考关系表示所述当前帧参考与所述当前帧时域距离最近的前向LTR帧,所述前向LTR帧为发送端设备标记为LTR帧的已编码的视频帧;其中所述当前帧未被标记为LTR帧且所述当前帧的编码质量小于编码质量阈值。
如果当前帧未被标记为LTR帧,那么发送端设备在对当前帧进行编码时,参考与当前帧时域距离最近的前向LTR帧,在对当前帧编码之后,发送端设备获取当前帧的编码质量,将当前帧的编码质量与编码质量阈值进行对比,如果当前帧的编码质量小于编码质量阈值,则在对当前帧的后一帧进行编码时,参考与后一帧时域距离最近的目标前向LTR帧,以提高当前帧的后一帧的编码质量。
在一种可能的实现方式中,所述方法还包括:对当前帧的后M+1帧进行编码,所述编码过程包括:将表示所述当前帧的后M+1帧的帧间参考关系的信息编入码流,所述后M+1帧的帧间参考关系表示所述后M+1帧中的第一帧参考与所述第一帧时域距离最近的目标前向LTR帧,所述后M+1帧中在第一帧之后的每一帧均参考本帧的前一帧,其中,M为正整数;其中所述当前帧未被标记为LTR帧且所述当前帧的编码质量小于编码质量阈值。
在一种可能的实现方式中,所述方法还包括:对当前帧的后一帧进行 编码,所述编码过程包括:将表示所述当前帧的后一帧的帧间参考关系的信息编入码流,所述后一帧的帧间参考关系表示所述后一帧参考与所述本帧时域距离最近的目标前向LTR帧,其中所述当前帧未被标记为LTR帧且所述当前帧的编码质量小于编码质量阈值。
在一种可能的实现方式中,所述方法还包括:
对当前帧的后M+1帧进行编码,所述编码过程包括:将表示所述当前帧的后M+1帧的帧间参考关系的信息编入码流,所述后M+1帧的帧间参考关系表示所述后M+1帧中的第一帧参考与所述第一帧时域距离最近的目标前向LTR帧,所述后M+1帧中在第一帧之后的每一帧均参考本帧的前一帧,其中,M为正整数;其中所述当前帧未被标记为LTR帧且所述当前帧的编码质量小于编码质量阈值。
在网络差点高时延场景下,当LTR帧的间隔过长时,如果LTR帧的后续帧全部参考与本帧时域距离最近的前向LTR帧,势必造成帧间参考距离过长,从而导致编码质量明显下降,这时,发送端设备可以在对当前帧的后M+1帧进行编码时,确定后M+1帧中的第一帧参考与所述第一帧时域距离最近的目标前向LTR帧,上述后M+1帧中在第一帧之后的每一帧均参考本帧的前一帧,从而可以缩短帧间参考距离,提高网络差点环境下的编码质量,实现了自适应选择参考关系,例如全参考关系和逐帧参考关系的灵活组合,一定程度上避免参考离当前帧时域距离很长的参考帧,较大程度缓解了丢包导致的视频卡顿的现象与图像质量模糊的问题,实现了图像质量与图像流畅度之间达到较好的平衡。
在一种可能的实现方式中,所述发送端设备根据单位时间内所包括的视频帧数确定M的数值。在具体实现时,发送端设备可以根据单位时间内所包括的视频帧数和上述视频图像的运动场景确定M的数值。其中,上述单位时间可以在具体实现时,根据系统性能和/或实现需求等自行设定,举例来说,上述单位时间可以为1秒。
第三方面,本申请提供一种视频通话方法,应用于具有显示屏和图像采集器的电子设备。其中,上述显示屏可以包括车载计算机(移动数据中心Mobile Data Center)的显示屏;上述图像采集器可以为摄像头Camera, 或者车载传感器等;上述电子设备可以为移动终端(手机),智慧屏,无人机,智能网联车(Intelligent Connected Vehicle;以下简称:ICV),智能(汽)车(smart/intelligent car)或车载设备等设备。
上述电子设备可以包括:响应于第一用户请求与第二用户进行视频通话的第一操作,建立所述第一用户与所述第二用户之间的视频通话连接,这里的视频通话连接是指第一用户使用的电子设备与第二用户使用的电子设备之间视频通话连接;通过所述图像采集器采集包括所述第一用户的环境的视频图像,所述视频图像包括多个视频帧,这里的环境可以是第一用户所处的内部环境和/或外部环境的视频图像,比如车内环境和/或在行驶过程中智能化探测障碍物、感知周围环境;对所述多个视频帧进行编码,得到经编码的码流,所述码流至少包括表示帧间参考关系的信息;发送所述经编码的码流,其中,所述表示帧间参考关系的信息包括当前帧的帧间参考关系的信息和当前帧的前N帧的帧间参考关系的信息,其中:
所述当前帧的帧间参考关系的信息表示本帧参考与本帧时域距离最近的目标前向长期参考LTR帧,这里的本帧是指当前帧;所述发送端设备为所述第一用户使用的电子设备,所述接收端设备为所述第二用户使用的电子设备;
所述当前帧的前N帧的帧间参考关系的信息表示所述前N帧中的每一帧均参考与本帧时域距离最近的前向LTR帧,这里的本帧是指前N帧中的每一帧。
需要说明的是,当前帧的前N帧与当前帧之间可以存在其他帧,也可以是当前帧的前N帧与当前帧之间是时域上紧邻的关系,针对前者的情况,其他帧的帧间参考关系可以与所述前N帧一样,也可以采用其他的帧间参考关系。
换句话说,当前帧与时域距离最近的前向LTR帧(例如A)之间的多个帧中的所有帧可以参考同一个LTR帧(例如A),也可以是多个帧中的部分帧参考同一个LTR帧(例如A)。
上述视频通话方法中,响应于第一用户请求与第二用户进行视频通话的第一操作,建立第一用户与第二用户之间的视频通话连接之后,通过图 像采集器采集包括第一用户的环境的视频图像,然后对上述视频图像包括的多个视频帧进行编码,得到经编码的码流,上述码流至少包括表示帧间参考关系的信息。上述表示帧间参考关系的信息包括当前帧的前N帧的帧间参考关系的信息,上述当前帧的前N帧的帧间参考关系的信息表示当前帧的前N帧中的每一帧均参考与本帧时域距离最近的前向LTR帧,上述前向LTR帧为发送端设备标记为LTR帧的已编码的视频帧,也就是说,本实施例中,发送端设备在标记LTR帧时,无需等待接收端设备的反馈,因此可以实现一个RTT内标记多个LTR帧,从而可以大大缩短帧间参考距离,提升视频图像的编码质量。
在一种可能的实现方式中,所述表示帧间参考关系的信息还包括当前帧的后M帧的帧间参考关系的信息,所述当前帧的后M帧的帧间参考关系的信息表示所述后M帧中的每一帧均参考本帧的前一帧,其中N和M为正整数。例如,N和M的具体数值可以取决于网络。
需要说明的是,当前帧的后M帧与当前帧之间可以存在其他帧,也可以是当前帧的后M帧与当前帧之间是时域上紧邻的关系,针对前者的情况,其他帧的帧间参考关系可以与所述后M帧一样,也可以采用其他的帧间参考关系。
在网络差点高时延场景下,当LTR帧的间隔过长时,如果LTR帧的后续帧全部参考与本帧时域距离最近的前向LTR帧,势必造成帧间参考距离过长,从而导致编码质量明显下降,这时,发送端设备确定当前帧参考与当前帧时域距离最近的目标前向LTR帧,当前帧的后M帧中的每一帧均参考本帧的前一帧,从而可以缩短帧间参考距离,提高网络差点环境下的编码质量,实现了自适应选择参考关系,例如全参考关系和逐帧参考关系的灵活组合,一定程度上避免参考离当前帧时域距离很长的参考帧,较大程度缓解了丢包导致的视频卡顿的现象与图像质量模糊的问题,实现了图像质量与图像流畅度之间达到较好的平衡。
在一种可能的实现方式中,所述表示帧间参考关系的信息还包括当前帧的后M帧的帧间参考关系的信息,所述当前帧的后M帧的帧间参考关系的信息表示所述后M帧中的每一帧均参考与本帧时域距离最近的前向 LTR帧,其中N和M为正整数。例如,N和M的具体数值可以取决于网络。
需要说明的是,当前帧的后M帧与当前帧之间可以存在其他帧,也可以是当前帧的后M帧与当前帧之间是时域上紧邻的关系,针对前者的情况,其他帧的帧间参考关系可以与所述后M帧一样,也可以采用其他的帧间参考关系。
在一种可能的实现方式中,所述发送端设备根据所述当前帧的前n帧的编码质量确定N的数值,n<N。在具体实现时,发送端设备可以根据当前帧的前n帧的编码质量、上述视频图像的运动场景和接收端设备反馈的网络状态信息确定N的数值,上述网络状态信息可以包括网络丢包率、网络可用带宽和网络回环时间RTT中的一个或多个。
在一种可能的实现方式中,所述发送端设备根据所述当前帧的前n帧的编码质量与编码质量阈值的比较结果,确定N的数值。
在一种可能的实现方式中,所述发送端设备根据单位时间内所包括的视频帧数确定M的数值。在具体实现时,发送端设备可以根据单位时间内所包括的视频帧数和上述视频图像的运动场景确定M的数值。其中,上述单位时间可以在具体实现时,根据系统性能和/或实现需求等自行设定,举例来说,上述单位时间可以为1秒。
在一种可能的实现方式中,LTR帧的标记间隔D与N和M具有函数关系。举例来说,上述函数关系可以为D=N+(M+1)。
在一种可能的实现方式中,所述表示帧间参考关系的信息还包括L帧的帧间参考关系的信息,L=(M1+1)+(M2+1)+…+(Mn+1),所述L帧时域上在所述M帧之后,所述L帧的帧间参考关系的信息表示所述(Mn+1)帧中的第一帧参考与所述第一帧时域距离最近的目标前向LTR帧,所述(Mn+1)帧中在第一帧之后的每一帧均参考本帧的前一帧,其中,L为正整数,n为大于或等于1的正整数。
其中,M1,M2,…,Mn的数值可以相同也可以不同,具体的数值大小可以根据实际的应用场景来确定。
在网络差点高时延场景下,当LTR帧的间隔过长时,如果LTR帧的 后续帧全部参考与本帧时域距离最近的前向LTR帧,势必造成帧间参考距离过长,从而导致编码质量明显下降,这时,发送端设备可以在对M之后的L帧进行编码时,确定(Mn+1)帧中的第一帧参考与所述第一帧时域距离最近的目标前向LTR帧,(Mn+1)帧中在第一帧之后的每一帧均参考本帧的前一帧,从而可以缩短帧间参考距离,提高网络差点环境下的编码质量,实现了自适应选择参考关系,例如全参考关系和逐帧参考关系的灵活组合,一定程度上避免参考离当前帧时域距离很长的参考帧,较大程度缓解了丢包导致的视频卡顿的现象与图像质量模糊的问题,实现了图像质量与图像流畅度之间达到较好的平衡。
在一种可能的实现方式中,LTR帧的标记间隔D与N和L具有函数关系。举例来说,上述函数关系可以为D=N+L,L=(M1+1)+(M2+1)+…+(Mn+1)。
在一种可能的实现方式中,所述LTR帧的标记间隔D用于所述发送端设备标记LTR帧。
其中,发送端设备根据LTR的标记间隔进行LTR帧的标记,可以实现一个RTT内标记多个LTR帧,并且本申请中,LTR的标记间隔不是固定设置的,而是动态变化的,可能是相同间隔,也可能是不同间隔,具体根据实际应用场景来确定,从而可以大大缩短帧间参考距离,提升视频图像的编码质量。并且,本实施例中,发送端设备可以根据网络状况等信息,动态确定LTR的标记间隔,可以及时应对现网突发丢包、大丢包以及拥塞等网络差点场景,并可以兼顾流畅度与清晰度,实现最佳的视频通话体验。
第四方面,本申请提供一种视频图像的显示方法,所述视频图像包括多个视频帧,包括:
解析码流,以得到表示帧间参考关系的信息,其中,所述表示帧间参考关系的信息包括当前帧的帧间参考关系的信息和当前帧的前N帧的帧间参考关系的信息,其中:
所述当前帧的帧间参考关系的信息表示本帧参考与本帧时域距离最近的目标前向长期参考LTR帧;这里的本帧是指当前帧;
所述当前帧的前N帧的帧间参考关系的信息表示所述前N帧中的每一帧均参考与本帧时域距离最近的前向LTR帧;这里的本帧是指前N帧中的每一帧;
重建所述多个视频帧,其中,所述重建多个数据帧包括:根据当前帧的参考帧,重建当前视频帧;
显示所述视频图像。
需要说明的是,当前帧的前N帧与当前帧之间可以存在其他帧,也可以是当前帧的前N帧与当前帧之间是时域上紧邻的关系,针对前者的情况,其他帧的帧间参考关系可以与所述前N帧一样,也可以采用其他的帧间参考关系。
换句话说,当前帧与时域距离最近的前向LTR帧(例如A)之间的多个帧中的所有帧可以参考同一个LTR帧(例如A),也可以是多个帧中的部分帧参考同一个LTR帧(例如A)。
上述视频图像的显示方法中,在解析码流之后,可以得到表示帧间参考关系的信息,上述表示帧间参考关系的信息中包括当前帧的前N帧的帧间参考关系的信息,上述当前帧的前N帧的帧间参考关系的信息表示前N帧中的每一帧均参考与本帧时域距离最近的前向LTR帧,上述前向LTR帧为发送端设备标记为LTR帧的已编码的视频帧。也就是说,本实施例中,发送端设备在标记LTR帧时,无需等待接收端设备的反馈,因此可以实现一个RTT内标记多个LTR帧,从而可以大大缩短帧间参考距离,提升视频图像的编码质量。
在一种可能的实现方式中,所述表示帧间参考关系的信息还包括当前帧的后M帧的帧间参考关系的信息,所述当前帧的后M帧的帧间参考关系的信息表示所述后M帧中的每一帧均参考本帧的前一帧,其中N和M为正整数。例如,N和M的具体数值可以取决于网络。
需要说明的是,当前帧的后M帧与当前帧之间可以存在其他帧,也可以是当前帧的后M帧与当前帧之间是时域上紧邻的关系,针对前者的情况,其他帧的帧间参考关系可以与所述后M帧一样,也可以采用其他的帧间参考关系。
在网络差点高时延场景下,当LTR帧的间隔过长时,如果LTR帧的后续帧全部参考与本帧时域距离最近的前向LTR帧,势必造成帧间参考距离过长,从而导致编码质量明显下降,这时,发送端设备确定当前帧参考与当前帧时域距离最近的目标前向LTR帧,当前帧的后M帧中的每一帧均参考本帧的前一帧,从而可以缩短帧间参考距离,提高网络差点环境下的编码质量,实现了自适应选择参考关系,例如全参考关系和逐帧参考关系的灵活组合,一定程度上避免参考离当前帧时域距离很长的参考帧,较大程度缓解了丢包导致的视频卡顿的现象与图像质量模糊的问题,实现了图像质量与图像流畅度之间达到较好的平衡。
在一种可能的实现方式中,所述表示帧间参考关系的信息还包括当前帧的后M帧的帧间参考关系的信息,所述当前帧的后M帧的帧间参考关系的信息表示所述后M帧中的每一帧均参考与本帧时域距离最近的前向LTR帧,其中N和M为正整数。例如,N和M的具体数值可以取决于网络。
需要说明的是,当前帧的后M帧与当前帧之间可以存在其他帧,也可以是当前帧的后M帧与当前帧之间是时域上紧邻的关系,针对前者的情况,其他帧的帧间参考关系可以与所述后M帧一样,也可以采用其他的帧间参考关系。
第五方面,本申请提供一种视频图像的发送设备,所述视频图像包括多个视频帧,包括:编码模块,用于对所述多个视频帧进行编码,得到经编码的码流,所述码流至少包括表示帧间参考关系的信息;另外,上述码流中还可以包括已编码数据,例如:当前帧与参考帧的残差数据等;上述表示帧间参考关系的信息可以放在条带头(slice header)中;
传输模块,用于发送所述经编码的码流,其中,所述表示帧间参考关系的信息包括当前帧的帧间参考关系的信息和当前帧的前N帧的帧间参考关系的信息,其中:
所述当前帧的帧间参考关系的信息表示本帧参考与本帧时域距离最近的目标前向长期参考LTR帧,这里的本帧是指当前帧,其中,所述目标前向LTR帧为发送端设备接收到接收端设备确认消息的前向LTR帧, 具体地,所述目标前向LTR帧为编码模块标记为LTR帧并且接收到接收端设备发送的确认消息的已编码的视频帧,所述确认消息与所述目标前向LTR帧对应;本申请中,发送端设备即本端,例如也可以叫做编码端设备,接收端设备为对端或远端,例如也可以叫做解码端设备;
所述当前帧的前N帧的帧间参考关系的信息表示所述前N帧中的每一帧均参考与本帧时域距离最近的前向LTR帧,这里的本帧是指前N帧中的每一帧,所述前向LTR帧为所述编码模块标记为LTR帧的已编码的视频帧,本申请中,前向LTR帧存储在DPB中。
需要说明的是,当前帧的前N帧与当前帧之间可以存在其他帧,也可以是当前帧的前N帧与当前帧之间是时域上紧邻的关系,针对前者的情况,其他帧的帧间参考关系可以与所述前N帧一样,也可以采用其他的帧间参考关系。
换句话说,当前帧与时域距离最近的前向LTR帧(例如A)之间的多个帧中的所有帧可以参考同一个LTR帧(例如A),也可以是多个帧中的部分帧参考同一个LTR帧(例如A)。
上述视频图像的发送设备中,编码模块对上述多个视频帧进行编码,得到经编码的码流,上述码流至少包括表示帧间参考关系的信息。上述表示帧间参考关系的信息包括当前帧的前N帧的帧间参考关系的信息,上述当前帧的前N帧的帧间参考关系的信息表示当前帧的前N帧中的每一帧均参考与本帧时域距离最近的前向LTR帧,上述前向LTR帧为发送端设备标记为LTR帧的已编码的视频帧,也就是说,本实施例中,编码模块在标记LTR帧时,无需等待接收端设备的反馈,因此可以实现一个RTT内标记多个LTR帧,从而可以大大缩短帧间参考距离,提升视频图像的编码质量。
在一种可能的实现方式中,所述表示帧间参考关系的信息还包括当前帧的后M帧的帧间参考关系的信息,所述当前帧的后M帧的帧间参考关系的信息表示所述后M帧中的每一帧均参考本帧的前一帧,其中N和M为正整数。例如,N和M的具体数值可以取决于网络。
需要说明的是,当前帧的后M帧与当前帧之间可以存在其他帧,也 可以是当前帧的后M帧与当前帧之间是时域上紧邻的关系,针对前者的情况,其他帧的帧间参考关系可以与所述后M帧一样,也可以采用其他的帧间参考关系。
在网络差点高时延场景下,当LTR帧的间隔过长时,如果LTR帧的后续帧全部参考与本帧时域距离最近的前向LTR帧,势必造成帧间参考距离过长,从而导致编码质量明显下降,这时,编码模块确定当前帧参考与当前帧时域距离最近的目标前向LTR帧,当前帧的后M帧中的每一帧均参考本帧的前一帧,从而可以缩短帧间参考距离,提高网络差点环境下的编码质量,实现了自适应选择参考关系,例如全参考关系和逐帧参考关系的灵活组合,一定程度上避免参考离当前帧时域距离很长的参考帧,较大程度缓解了丢包导致的视频卡顿的现象与图像质量模糊的问题,实现了图像质量与图像流畅度之间达到较好的平衡。
在一种可能的实现方式中,所述表示帧间参考关系的信息还包括当前帧的后M帧的帧间参考关系的信息,所述当前帧的后M帧的帧间参考关系的信息表示所述后M帧中的每一帧均参考与本帧时域距离最近的前向LTR帧,其中N和M为正整数。例如,N和M的具体数值可以取决于网络。
需要说明的是,当前帧的后M帧与当前帧之间可以存在其他帧,也可以是当前帧的后M帧与当前帧之间是时域上紧邻的关系,针对前者的情况,其他帧的帧间参考关系可以与所述后M帧一样,也可以采用其他的帧间参考关系。
在一种可能的实现方式中,所述编码模块根据所述当前帧的前n帧的编码质量确定N的数值,n<N。在具体实现时,编码模块可以根据当前帧的前n帧的编码质量、上述视频图像的运动场景和接收端设备反馈的网络状态信息确定N的数值,上述网络状态信息可以包括网络丢包率、网络可用带宽和网络回环时间RTT中的一个或多个。
在一种可能的实现方式中,所述编码模块根据所述当前帧的前n帧的编码质量与编码质量阈值的比较结果,确定N的数值。
在一种可能的实现方式中,所述编码模块根据单位时间内所包括的视 频帧数确定M的数值。在具体实现时,编码模块可以根据单位时间内所包括的视频帧数和上述视频图像的运动场景确定M的数值。其中,上述单位时间可以在具体实现时,根据系统性能和/或实现需求等自行设定,举例来说,上述单位时间可以为1秒。
在一种可能的实现方式中,LTR帧的标记间隔D与N和M具有函数关系。举例来说,上述函数关系可以为D=N+(M+1)。
在一种可能的实现方式中,所述表示帧间参考关系的信息还包括L帧的帧间参考关系的信息,L=(M1+1)+(M2+1)+…+(Mn+1),所述L帧时域上在所述M帧之后,所述L帧的帧间参考关系的信息表示所述(Mn+1)帧中的第一帧参考与所述第一帧时域距离最近的目标前向LTR帧,所述(Mn+1)帧中在第一帧之后的每一帧均参考本帧的前一帧,其中,L为正整数,n为大于或等于1的正整数。
其中,M1,M2,…,Mn的数值可以相同也可以不同,具体的数值大小可以根据实际的应用场景来确定。
在网络差点高时延场景下,当LTR帧的间隔过长时,如果LTR帧的后续帧全部参考与本帧时域距离最近的前向LTR帧,势必造成帧间参考距离过长,从而导致编码质量明显下降,这时,编码模块可以在对M之后的L帧进行编码时,确定(Mn+1)帧中的第一帧参考与所述第一帧时域距离最近的目标前向LTR帧,(Mn+1)帧中在第一帧之后的每一帧均参考本帧的前一帧,从而可以缩短帧间参考距离,提高网络差点环境下的编码质量,实现了自适应选择参考关系,例如全参考关系和逐帧参考关系的灵活组合,一定程度上避免参考离当前帧时域距离很长的参考帧,较大程度缓解了丢包导致的视频卡顿的现象与图像质量模糊的问题,实现了图像质量与图像流畅度之间达到较好的平衡。
在一种可能的实现方式中,LTR帧的标记间隔D与N和L具有函数关系。举例来说,上述函数关系可以为D=N+L,L=(M1+1)+(M2+1)+…+(Mn+1)。
在一种可能的实现方式中,所述编码模块根据所述接收端设备反馈的网络状态信息,确定所述LTR帧的标记间隔D,所述网络状态信息包括: 网络丢包率、网络可用带宽和网络回环时间RTT中的一个或多个。
在一种可能的实现方式中,所述LTR帧的标记间隔D用于所述编码模块标记LTR帧。
其中,编码模块根据LTR的标记间隔进行LTR帧的标记,可以实现一个RTT内标记多个LTR帧,并且本申请中,LTR的标记间隔不是固定设置的,而是动态变化的,可能是相同间隔,也可能是不同间隔,具体根据实际应用场景来确定,从而可以大大缩短帧间参考距离,提升视频图像的编码质量。并且,本实施例中,发送端设备可以根据网络状况等信息,动态确定LTR的标记间隔,可以及时应对现网突发丢包、大丢包以及拥塞等网络差点场景,并可以兼顾流畅度与清晰度,实现最佳的视频通话体验。
第六方面,本申请提供一种视频图像的发送设备,所述视频图像包括多个视频帧,包括:
判断模块,用于判断当前帧是否被标记为长期参考LTR帧;
编码模块,用于当所述当前帧未被标记为LTR帧时,对未标记的当前帧进行编码,其中,所述编码过程包括:至少将表示当前帧的帧间参考关系的信息编入码流,所述当前帧的帧间参考关系表示所述当前帧参考与所述当前帧时域距离最近的前向LTR帧;或者,
当所述当前帧被标记为LTR帧时,对标记的当前帧进行编码,其中,所述编码过程包括:至少将表示当前帧的帧间参考关系的信息编入码流,所述当前帧的帧间参考关系表示所述当前帧参考与上述当前帧时域距离最近的目标前向LTR帧,其中,所述目标前向LTR帧为所述编码模块接收到接收端设备确认消息的前向LTR帧,具体地,所述目标前向LTR帧为所述编码模块标记为LTR帧并且接收到接收端设备发送的确认消息的已编码的视频帧,所述确认消息与所述目标前向LTR帧对应;本申请中,发送端设备即本端,例如也可以叫做编码端设备,接收端设备为对端或远端,例如也可以叫做解码端设备;
传输模块,用于发送经编码的码流。
上述视频图像的发送设备中,在编码模块对未标记的当前帧进行编码 时,参考与未标记的当前帧时域距离最近的前向LTR帧,上述前向LTR帧为编码模块标记为LTR帧的已编码的视频帧,也就是说,本实施例中,编码模块在标记LTR帧时,无需等待接收端设备的反馈,因此可以实现一个RTT内标记多个LTR帧,从而可以大大缩短帧间参考距离,提升视频图像的编码质量。
在一种可能的实现方式中,所述判断模块,具体用于根据LTR帧的标记间隔,判断当前帧是否被标记为LTR帧。
在一种可能的实现方式中,所述判断模块包括:
获取子模块,用于获取所述当前帧和与所述当前帧时域距离最近的前向LTR帧之间的间隔帧数;
标记子模块,用于当所述间隔帧数等于所述LTR帧的标记间隔时,将所述当前帧标记为LTR帧;当所述间隔帧数不等于所述LTR帧的标记间隔,对所述当前帧不标记为LTR帧。
在一种可能的实现方式中,判断模块,还用于根据所述接收端设备反馈的网络状态信息,确定所述LTR帧的标记间隔,所述网络状态信息包括:网络丢包率、网络可用带宽和网络回环时间RTT中的一个或多个。
在一种可能的实现方式中,所述当前帧的帧间参考关系表示所述当前帧参考与所述当前帧时域距离最近的前向LTR帧,所述前向LTR帧为发送端设备标记为LTR帧的已编码的视频帧;其中所述当前帧未被标记为LTR帧且所述当前帧的编码质量大于或等于编码质量阈值;或者,
当前帧的帧间参考关系表示所述当前帧参考与所述当前帧时域距离最近的前向LTR帧,所述前向LTR帧为发送端设备标记为LTR帧的已编码的视频帧;其中所述当前帧未被标记为LTR帧且所述当前帧的编码质量小于编码质量阈值。
如果当前帧未被标记为LTR帧,那么发送端设备在对当前帧进行编码时,参考与当前帧时域距离最近的前向LTR帧,在对当前帧编码之后,编码模块获取当前帧的编码质量,将当前帧的编码质量与编码质量阈值进行对比,如果当前帧的编码质量小于编码质量阈值,则在编码模块对当前帧的后一帧进行编码时,参考与后一帧时域距离最近的目标前向LTR帧, 以提高当前帧的后一帧的编码质量。
在一种可能的实现方式中,所述编码模块,还用于对当前帧的后M+1帧进行编码,所述编码过程包括:将表示所述当前帧的后M+1帧的帧间参考关系的信息编入码流,所述后M+1帧的帧间参考关系表示所述后M+1帧中的第一帧参考与所述第一帧时域距离最近的目标前向LTR帧,所述后M+1帧中在第一帧之后的每一帧均参考本帧的前一帧,其中,M为正整数;其中所述当前帧未被标记为LTR帧且所述当前帧的编码质量小于编码质量阈值。
在网络差点高时延场景下,当LTR帧的间隔过长时,如果LTR帧的后续帧全部参考与本帧时域距离最近的前向LTR帧,势必造成帧间参考距离过长,从而导致编码质量明显下降,这时,编码模块可以在对当前帧的后M+1帧进行编码时,确定后M+1帧中的第一帧参考与所述第一帧时域距离最近的目标前向LTR帧,上述后M+1帧中在第一帧之后的每一帧均参考本帧的前一帧,从而可以缩短帧间参考距离,提高网络差点环境下的编码质量,实现了自适应选择参考关系,例如全参考关系和逐帧参考关系的灵活组合,一定程度上避免参考离当前帧时域距离很长的参考帧,较大程度缓解了丢包导致的视频卡顿的现象与图像质量模糊的问题,实现了图像质量与图像流畅度之间达到较好的平衡。
在一种可能的实现方式中,所述编码模块,还用于对当前帧的后一帧进行编码,所述编码过程包括:将表示所述当前帧的后一帧的帧间参考关系的信息编入码流,所述后一帧的帧间参考关系表示所述后一帧参考与所述本帧时域距离最近的目标前向LTR帧,其中所述当前帧未被标记为LTR帧且所述当前帧的编码质量小于编码质量阈值。
在一种可能的实现方式中,所述编码模块,用于根据单位时间内所包括的视频帧数确定M的数值。在具体实现时,编码模块可以根据单位时间内所包括的视频帧数和上述视频图像的运动场景确定M的数值。其中,上述单位时间可以在具体实现时,根据系统性能和/或实现需求等自行设定,举例来说,上述单位时间可以为1秒。
第七方面,本申请提供一种视频通话设备,上述视频通话设备可以为 第一用户使用的视频通话设备,上述视频通话设备可以包括:显示屏;图像采集器;一个或多个处理器;存储器;多个应用程序;以及一个或多个计算机程序。其中,上述显示屏可以包括车载计算机(移动数据中心Mobile Data Center)的显示屏;上述图像采集器可以为摄像头Camera,或者车载传感器等;上述视频通话设备可以为移动终端(手机),智慧屏,无人机,智能网联车(Intelligent Connected Vehicle;以下简称:ICV),智能(汽)车(smart/intelligent car)或车载设备等设备。
其中所述一个或多个计算机程序被存储在所述存储器中,所述一个或多个计算机程序包括指令,当所述指令被所述设备执行时,使得所述设备执行以下步骤:响应于第一用户请求与第二用户进行视频通话的第一操作,建立所述第一用户与所述第二用户之间的视频通话连接;这里的视频通话连接是指第一用户使用的电子设备与第二用户使用的电子设备之间视频通话连接;
通过所述图像采集器采集包括所述第一用户的环境的视频图像,所述视频图像包括多个视频帧,这里的环境可以是第一用户所处的内部环境和/或外部环境的视频图像,比如车内环境和/或在行驶过程中智能化探测障碍物、感知周围环境;
对所述多个视频帧进行编码,得到经编码的码流,所述码流至少包括表示帧间参考关系的信息;
发送所述经编码的码流,其中,所述表示帧间参考关系的信息包括当前帧的帧间参考关系的信息和当前帧的前N帧的帧间参考关系的信息,其中:
所述当前帧的帧间参考关系的信息表示本帧参考与本帧时域距离最近的目标前向长期参考LTR帧,这里的本帧是指当前帧;
所述当前帧的前N帧的帧间参考关系的信息表示所述前N帧中的每一帧均参考与本帧时域距离最近的前向LTR帧,这里的本帧是指前N帧中的每一帧。
需要说明的是,当前帧的前N帧与当前帧之间可以存在其他帧,也可以是当前帧的前N帧与当前帧之间是时域上紧邻的关系,针对前者的 情况,其他帧的帧间参考关系可以与所述前N帧一样,也可以采用其他的帧间参考关系。
换句话说,当前帧与时域距离最近的前向LTR帧(例如A)之间的多个帧中的所有帧可以参考同一个LTR帧(例如A),也可以是多个帧中的部分帧参考同一个LTR帧(例如A)。
上述视频通话设备中,响应于第一用户请求与第二用户进行视频通话的第一操作,建立第一用户与第二用户之间的视频通话连接之后,通过图像采集器采集包括第一用户的环境的视频图像,然后对上述视频图像包括的多个视频帧进行编码,得到经编码的码流,所述码流至少包括表示帧间参考关系的信息。上述表示帧间参考关系的信息包括当前帧的前N帧的帧间参考关系的信息,上述当前帧的前N帧的帧间参考关系的信息表示当前帧的前N帧中的每一帧均参考与本帧时域距离最近的前向LTR帧,上述前向LTR帧为发送端设备标记为LTR帧的已编码的视频帧,也就是说,本实施例中,发送端设备在标记LTR帧时,无需等待接收端设备的反馈,因此可以实现一个RTT内标记多个LTR帧,从而可以大大缩短帧间参考距离,提升视频图像的编码质量。
在一种可能的实现方式中,所述表示帧间参考关系的信息还包括当前帧的后M帧的帧间参考关系的信息,所述当前帧的后M帧的帧间参考关系的信息表示所述后M帧中的每一帧均参考本帧的前一帧,其中N和M为正整数。例如,N和M的具体数值可以取决于网络。
需要说明的是,当前帧的后M帧与当前帧之间可以存在其他帧,也可以是当前帧的后M帧与当前帧之间是时域上紧邻的关系,针对前者的情况,其他帧的帧间参考关系可以与所述后M帧一样,也可以采用其他的帧间参考关系。
在网络差点高时延场景下,当LTR帧的间隔过长时,如果LTR帧的后续帧全部参考与本帧时域距离最近的前向LTR帧,势必造成帧间参考距离过长,从而导致编码质量明显下降,这时,发送端设备确定当前帧参考与当前帧时域距离最近的目标前向LTR帧,当前帧的后M帧中的每一帧均参考本帧的前一帧,从而可以缩短帧间参考距离,提高网络差点环境 下的编码质量,实现了自适应选择参考关系,例如全参考关系和逐帧参考关系的灵活组合,一定程度上避免参考离当前帧时域距离很长的参考帧,较大程度缓解了丢包导致的视频卡顿的现象与图像质量模糊的问题,实现了图像质量与图像流畅度之间达到较好的平衡。
在一种可能的实现方式中,所述表示帧间参考关系的信息还包括当前帧的后M帧的帧间参考关系的信息,所述当前帧的后M帧的帧间参考关系的信息表示所述后M帧中的每一帧均参考与本帧时域距离最近的前向LTR帧,其中N和M为正整数。例如,N和M的具体数值可以取决于网络。
需要说明的是,当前帧的后M帧与当前帧之间可以存在其他帧,也可以是当前帧的后M帧与当前帧之间是时域上紧邻的关系,针对前者的情况,其他帧的帧间参考关系可以与所述后M帧一样,也可以采用其他的帧间参考关系。
在一种可能的实现方式中,当所述指令被所述设备执行时,使得所述设备具体执行以下步骤:
根据所述当前帧的前n帧的编码质量确定N的数值,n<N。
在具体实现时,发送端设备可以根据当前帧的前n帧的编码质量、上述视频图像的运动场景和接收端设备反馈的网络状态信息确定N的数值,上述网络状态信息可以包括网络丢包率、网络可用带宽和网络回环时间RTT中的一个或多个。
在一种可能的实现方式中,当所述指令被所述设备执行时,使得所述设备具体执行以下步骤:
根据单位时间内所包括的视频帧数确定M的数值。
在具体实现时,发送端设备可以根据单位时间内所包括的视频帧数和上述视频图像的运动场景确定M的数值。其中,上述单位时间可以在具体实现时,根据系统性能和/或实现需求等自行设定,举例来说,上述单位时间可以为1秒。
在一种可能的实现方式中,LTR帧的标记间隔D与N和M具有函数关系。举例来说,上述函数关系可以为D=N+(M+1)。
在一种可能的实现方式中,所述表示帧间参考关系的信息还包括L帧的帧间参考关系的信息,L=(M1+1)+(M2+1)+…+(Mn+1),所述L帧时域上在所述M帧之后,所述L帧的帧间参考关系的信息表示所述(Mn+1)帧中的第一帧参考与所述第一帧时域距离最近的目标前向LTR帧,所述(Mn+1)帧中在第一帧之后的每一帧均参考本帧的前一帧,其中,L为正整数,n为大于或等于1的正整数。
其中,M1,M2,…,Mn的数值可以相同也可以不同,具体的数值大小可以根据实际的应用场景来确定。
在网络差点高时延场景下,当LTR帧的间隔过长时,如果LTR帧的后续帧全部参考与本帧时域距离最近的前向LTR帧,势必造成帧间参考距离过长,从而导致编码质量明显下降,这时,发送端设备可以在对M之后的L帧进行编码时,确定(Mn+1)帧中的第一帧参考与所述第一帧时域距离最近的目标前向LTR帧,(Mn+1)帧中在第一帧之后的每一帧均参考本帧的前一帧,从而可以缩短帧间参考距离,提高网络差点环境下的编码质量,实现了自适应选择参考关系,例如全参考关系和逐帧参考关系的灵活组合,一定程度上避免参考离当前帧时域距离很长的参考帧,较大程度缓解了丢包导致的视频卡顿的现象与图像质量模糊的问题,实现了图像质量与图像流畅度之间达到较好的平衡。
在一种可能的实现方式中,LTR帧的标记间隔D与N和L具有函数关系。举例来说,上述函数关系可以为D=N+L,L=(M1+1)+(M2+1)+…+(Mn+1)。
在一种可能的实现方式中,所述LTR帧的标记间隔D用于所述发送端设备标记LTR帧。
其中,所述发送端设备根据LTR的标记间隔进行LTR帧的标记,可以实现一个RTT内标记多个LTR帧,从而可以大大缩短帧间参考距离,提升视频图像的编码质量。并且本申请中,LTR的标记间隔不是固定设置的,而是动态变化的,可能是相同间隔,也可能是不同间隔,具体根据实际应用场景来确定,从而可以大大缩短帧间参考距离,提升视频图像的编码质量。并且,本实施例中,发送端设备可以根据网络状况等信息,动态 确定LTR的标记间隔,可以及时应对现网突发丢包、大丢包以及拥塞等网络差点场景,并可以兼顾流畅度与清晰度,实现最佳的视频通话体验。
第八方面,本申请提供一种视频图像的接收设备,所述视频图像包括多个视频帧,包括:
解码模块,用于解析码流,以得到表示帧间参考关系的信息,其中,所述表示帧间参考关系的信息包括当前帧的帧间参考关系的信息和当前帧的前N帧的帧间参考关系的信息,其中:
所述当前帧的帧间参考关系的信息表示本帧参考与本帧时域距离最近的目标前向长期参考LTR帧;这里的本帧是指当前帧;
所述当前帧的前N帧的帧间参考关系的信息表示所述前N帧中的每一帧均参考与本帧时域距离最近的前向LTR帧;这里的本帧是指前N帧中的每一帧;
所述解码模块,还用于重建所述多个视频帧,其中,所述重建多个数据帧包括:根据当前帧的参考帧,重建当前视频帧;
显示模块,用于显示所述视频图像。
需要说明的是,当前帧的前N帧与当前帧之间可以存在其他帧,也可以是当前帧的前N帧与当前帧之间是时域上紧邻的关系,针对前者的情况,其他帧的帧间参考关系可以与所述前N帧一样,也可以采用其他的帧间参考关系。
换句话说,当前帧与时域距离最近的前向LTR帧(例如A)之间的多个帧中的所有帧可以参考同一个LTR帧(例如A),也可以是多个帧中的部分帧参考同一个LTR帧(例如A)。
上述视频图像的接收设备中,在解码模块解析码流之后,可以得到表示帧间参考关系的信息,上述表示帧间参考关系的信息中包括当前帧的前N帧的帧间参考关系的信息,上述当前帧的前N帧的帧间参考关系的信息表示前N帧中的每一帧均参考与本帧时域距离最近的前向LTR帧,上述前向LTR帧为发送端设备标记为LTR帧的已编码的视频帧。也就是说,本实施例中,发送端设备在标记LTR帧时,无需等待接收端设备的反馈,因此可以实现一个RTT内标记多个LTR帧,从而可以大大缩短帧间参考 距离,提升视频图像的编码质量。
在一种可能的实现方式中,所述表示帧间参考关系的信息还包括当前帧的后M帧的帧间参考关系的信息,所述当前帧的后M帧的帧间参考关系的信息表示所述后M帧中的每一帧均参考本帧的前一帧,其中N和M为正整数。例如,N和M的具体数值可以取决于网络。
需要说明的是,当前帧的后M帧与当前帧之间可以存在其他帧,也可以是当前帧的后M帧与当前帧之间是时域上紧邻的关系,针对前者的情况,其他帧的帧间参考关系可以与所述后M帧一样,也可以采用其他的帧间参考关系。
在网络差点高时延场景下,当LTR帧的间隔过长时,如果LTR帧的后续帧全部参考与本帧时域距离最近的前向LTR帧,势必造成帧间参考距离过长,从而导致编码质量明显下降,这时,发送端设备确定当前帧参考与当前帧时域距离最近的目标前向LTR帧,当前帧的后M帧中的每一帧均参考本帧的前一帧,从而可以缩短帧间参考距离,提高网络差点环境下的编码质量,实现了自适应选择参考关系,例如全参考关系和逐帧参考关系的灵活组合,一定程度上避免参考离当前帧时域距离很长的参考帧,较大程度缓解了丢包导致的视频卡顿的现象与图像质量模糊的问题,实现了图像质量与图像流畅度之间达到较好的平衡。
在一种可能的实现方式中,所述表示帧间参考关系的信息还包括当前帧的后M帧的帧间参考关系的信息,所述当前帧的后M帧的帧间参考关系的信息表示所述后M帧中的每一帧均参考与本帧时域距离最近的前向LTR帧,其中N和M为正整数。例如,N和M的具体数值可以取决于网络。
需要说明的是,当前帧的后M帧与当前帧之间可以存在其他帧,也可以是当前帧的后M帧与当前帧之间是时域上紧邻的关系,针对前者的情况,其他帧的帧间参考关系可以与所述后M帧一样,也可以采用其他的帧间参考关系。
第九方面,本申请提供一种视频图像的编码设备,所述设备包括存储介质和中央处理器,所述存储介质可以是非易失性存储介质,所述存储介 质中存储有计算机可执行程序,所述中央处理器与所述非易失性存储介质连接,并执行所述计算机可执行程序以实现所述第一方面或者第一方面的任一可能的实现方式中的方法。
第十方面,本申请提供一种视频图像的编码设备,所述设备包括存储介质和中央处理器,所述存储介质可以是非易失性存储介质,所述存储介质中存储有计算机可执行程序,所述中央处理器与所述非易失性存储介质连接,并执行所述计算机可执行程序以实现所述第二方面或者第二方面的任一可能的实现方式中的方法。
第十一方面,本申请提供一种视频图像的解码设备,所述设备包括存储介质和中央处理器,所述存储介质可以是非易失性存储介质,所述存储介质中存储有计算机可执行程序,所述中央处理器与所述非易失性存储介质连接,并执行所述计算机可执行程序以实现所述第四方面的方法。
第十二方面,本申请实施例提供一种用于解码视频数据的设备,所述设备包括:
存储器,用于存储码流形式的视频数据;
视频解码器,用于从码流中解码出表示帧间参考关系的信息,其中,所述表示帧间参考关系的信息包括当前帧的帧间参考关系的信息和当前帧的前N帧的帧间参考关系的信息,其中:
所述当前帧的帧间参考关系的信息表示本帧参考与本帧时域距离最近的目标前向长期参考LTR帧;这里的本帧是指当前帧;
所述当前帧的前N帧的帧间参考关系的信息表示所述前N帧中的每一帧均参考与本帧时域距离最近的前向LTR帧;这里的本帧是指前N帧中的每一帧;
重建所述多个视频帧,其中,所述重建多个数据帧包括:根据当前帧的参考帧,重建当前视频帧。
第十三方面,本申请实施例提供一种用于编码视频数据的设备,所述设备包括:
存储器,用于存储视频数据,所述视频数据包括一个或多个视频帧;
视频编码器,用于对所述多个视频帧进行编码,得到经编码的码流, 所述码流至少包括表示帧间参考关系的信息;另外,上述码流中还可以包括已编码数据,例如:当前帧与参考帧的残差数据等;上述表示帧间参考关系的信息可以放在条带头(slice header)中;
发送所述经编码的码流,其中,所述表示帧间参考关系的信息包括当前帧的帧间参考关系的信息和当前帧的前N帧的帧间参考关系的信息,其中:
所述当前帧的帧间参考关系的信息表示本帧参考与本帧时域距离最近的目标前向长期参考LTR帧,这里的本帧是指当前帧,其中,上述目标前向LTR帧为用于编码视频数据的设备接收到用于解码视频数据的设备的确认消息的前向LTR帧,具体地,上述目标前向LTR帧可以为用于编码视频数据的设备标记为LTR帧并且接收到用于解码视频数据的设备发送的确认消息的已编码的视频帧,上述确认消息与所述目标前向LTR帧对应;
所述当前帧的前N帧的帧间参考关系的信息表示所述前N帧中的每一帧均参考与本帧时域距离最近的前向LTR帧,这里的本帧是指前N帧中的每一帧,所述前向LTR帧为所述用于编码视频数据的设备标记为LTR帧的已编码的视频帧,本申请中,前向LTR帧存储在DPB中。
应当理解的是,本申请的第二至十方面与本申请的第一方面的技术方案一致,各方面及对应的可行实施方式所取得的有益效果相似,不再赘述。
第十四方面,本申请提供一种计算机可读存储介质,该计算机可读存储介质中存储有计算机程序,当其在计算机上运行时,使得计算机执行如第一方面、第二方面、第三方面或第四方面所述的方法。
第十五方面,本申请提供一种计算机程序,当所述计算机程序被计算机执行时,用于执行第一方面、第二方面、第三方面或第四方面所述的方法。
在一种可能的设计中,第十五方面中的程序可以全部或者部分存储在与处理器封装在一起的存储介质上,也可以部分或者全部存储在不与处理器封装在一起的存储器上。
Brief Description of the Drawings
FIG. 1 is a schematic diagram of two users making a video call through their respective electronic devices;
FIG. 2 is a reference structure diagram for encoded video frames in the existing related art;
FIG. 3 is a flowchart of an embodiment of the video image transmission method of this application;
FIG. 4(a) to FIG. 4(c) are schematic diagrams of an embodiment of inter-frame reference relationships of video frames in the video image transmission method of this application;
FIG. 5 is a schematic diagram of an embodiment of determining the LTR frame marking interval in the video image transmission method of this application;
FIG. 6 is a flowchart of another embodiment of the video image transmission method of this application;
FIG. 7(a) to FIG. 7(b) are schematic diagrams of another embodiment of inter-frame reference relationships of video frames in the video image transmission method of this application;
FIG. 8 is a flowchart of an embodiment of the video call method of this application;
FIG. 9(a) to FIG. 9(c) are schematic diagrams of requesting a video call in the video call method of this application;
FIG. 9(d) shows the interface during the video call connection establishment phase in the video call method of this application;
FIG. 9(e) shows the interface after the video call connection is established in the video call method of this application;
FIG. 10(a) to FIG. 10(b) are schematic diagrams of an embodiment of an application scenario of the video call method of this application;
FIG. 11(a) to FIG. 11(b) are schematic diagrams of another embodiment of an application scenario of the video call method of this application;
FIG. 12 is a flowchart of an embodiment of the video image display method of this application;
FIG. 13 is a schematic structural diagram of an embodiment of the video image sending device of this application;
FIG. 14 is a schematic structural diagram of another embodiment of the video image sending device of this application;
FIG. 15 is a schematic structural diagram of yet another embodiment of the video image sending device of this application;
FIG. 16(a) is a schematic structural diagram of an embodiment of the video call device of this application;
FIG. 16(b) is an illustrative diagram of an example of a video coding apparatus 40 including encoder 20 and/or decoder 30 according to an exemplary embodiment;
FIG. 16(c) is a schematic structural diagram of a video coding device 400 (for example, a video encoding device 400 or a video decoding device 400) provided by an embodiment of this application;
FIG. 17 is a schematic structural diagram of an embodiment of the video image receiving device of this application.
具体实施方式
本申请的实施方式部分使用的术语仅用于对本申请的具体实施例进行解释,而非旨在限定本申请。
本申请实施例提供的视频图像的传输方法可以应用于各类实时音视频互动场景中,例如:两个用户通过各自使用的电子设备进行视频通话,或者多个用户通过各自使用的电子设备进行视频电话会议,都可以使用本申请提出的视频图像的传输方法。
图1为两个用户通过各自使用的电子设备进行视频通话的示意图,如图1所示,上述两个用户可以为用户A和用户B,当用户A向用户B发送视频流时,用户A使用的电子设备可以为发送端A,用户B使用的电子设备可以为接收端B。发送端A发送编码后的视频流给接收端B,接收端B实时反馈视频帧的接收情况与网络状态信息给发送端A,发送端A根据接收端B反馈的信息对网络状况进行评估,并根据接收端B的视频帧的接收情况和网络状况对视频帧编码进行调节,并将编码后的视频流发送到接收端B。同理,当用户B向用户A发送视频流时,用户B使用的电子设备可以作为发送端B,用户A使用的电子设备可以作为接收端A,那么发送端B到接收端A的方向也是类似处理机制,在此不再赘述。
图2为现有相关技术中编码视频帧的参考结构图,以发送端A向接收端B发送视频流为例,发送端A根据接收端B反馈的网络状况如网络可用带宽和/或网络时延,选择合适的I帧间隔、编码码率以及视频分辨率、帧率等信息;在会话过程中,发送端A还可以根据接收端B反馈的每帧接收情况,为当前帧设置帧间参考关系,同时将编码端解码图像缓存区(Decoded Picture Buffer;以下简称:DPB)中的视频帧分别标记为长期参考(Long Term Reference;以下简称:LTR)帧、不做参考帧与短期参考帧;发送端A在对当前帧进行编码时,以接收端B已确认的LTR帧作为参考进行编码,可保障比较好的视频画面流畅性,这里,接收端B已确认的LTR帧是指发送端A接收到接收端B发送的确认消息,上述确认消息表示上述LTR帧可以被接收B正常解码。如图2所示,接收端B实时反馈可解码的帧信息,发送端A在DPB缓存的视频帧中进行选择, 并将选择的视频帧标记为LTR帧,当前帧以新标记的LTR帧作为参考帧进行编码。
该参考关系的优势在于接收端B接收到的视频帧在编码时,均以已确认的LTR帧作为参考帧,只要接收到的视频帧完整,就可以进行解码显示。如图2中,帧6、11、12、14、18五帧丢包造成视频据不完整,并不会需要发送端A重新编码I帧来使接收端B的画面恢复,接收端B只要可正常完整地接收到后续的视频帧,即可正常解码后送给接收端B的显示模块进行渲染显示。
但是,现有相关技术中,对视频帧进行编码时的参考结构存在明显的问题,在网络差点环境下会伴随时延与丢包,这是因为,发送端编码当前视频帧的参考帧为1个网络回环时间(Round Trip Time;以下简称:RTT)前被接收端确认收到的视频帧,当前视频帧与参考帧之间的距离与时延强相关(至少需要一个RTT以上),时延越大,当前视频帧与参考帧之间的距离越长,从而明显影响图像质量。
针对现有相关技术存在的图像质量明显下降的问题,本申请重新设计编码帧的参考结构来对抗现网突发丢包、大丢包以及拥塞场景,同时兼顾流畅度与清晰度,实现最佳的视频通话体验。
图3为本申请视频图像的传输方法一个实施例的流程图,本实施例中,上述视频图像可以包括多个视频帧,如图3所示,上述视频图像的传输方法可以包括:
步骤301,对多个视频帧进行编码,得到经编码的码流,所述码流至少包括表示帧间参考关系的信息。另外,上述码流中还可以包括已编码数据,例如:当前帧与参考帧的残差数据等;上述表示帧间参考关系的信息可以放在条带头(slice header)中。
步骤302,发送经编码的码流。
其中,上述表示帧间参考关系的信息包括当前帧的帧间参考关系的信息和当前帧的前N帧的帧间参考关系的信息。
其中:上述当前帧的帧间参考关系的信息表示本帧参考与本帧时域距离最近的目标前向LTR帧,这里的本帧即为当前帧,其中,上述目标前 向LTR帧为发送端设备接收到接收端设备确认消息的前向LTR帧,具体地,上述目标前向LTR帧可以为发送端设备标记为LTR帧并且接收到接收端设备发送的确认消息的已编码的视频帧,上述确认消息与目标前向LTR帧对应;本实施例中,发送端设备即本端,例如也可以叫做编码端设备,接收端设备为对端或远端,例如也可以叫做解码端设备;需要说明的是,“上述当前帧的帧间参考关系的信息表示本帧参考与本帧时域距离最近的目标前向长期参考LTR帧”中的“与本帧时域距离最近的目标前向LTR帧”,上述“最近的目标前向LTR帧”,在一种示例下,例如:当前帧的POC与最近的目标前向LTR帧的POC之间的差值A小于当前帧的POC与其它目标前向LTR帧的POC之间的差值B;本申请实施例中,POC表示的是视频帧的显示顺序;
上述当前帧的前N帧的帧间参考关系的信息表示前N帧中的每一帧均参考与本帧时域距离最近的前向LTR帧,这里的本帧为前N帧中的每一帧,上述前向LTR帧为发送端设备标记为LTR帧的已编码的视频帧,上述前向LTR帧存储在DPB中;需要说明的是,“上述当前帧的前N帧的帧间参考关系的信息表示前N帧中的每一帧均参考与本帧时域距离最近的前向LTR帧”中的“与本帧时域距离最近的前向LTR帧”,上述“最近的前向LTR帧”,在一种示例下,例如:本帧的POC与最近的前向LTR帧的POC之间的差值C小于本帧的POC与其它前向LTR帧的POC之间的差值D。
需要说明的是,当前帧的前N帧与当前帧之间可以存在其他帧,也可以是当前帧的前N帧与当前帧之间是时域上紧邻的关系,针对前者的情况,其他帧的帧间参考关系可以与所述前N帧一样,也可以采用其他的帧间参考关系。
换句话说,当前帧与时域距离最近的前向LTR帧(例如A)之间的多个帧中的所有帧可以参考同一个LTR帧(例如A),也可以是多个帧中的部分帧参考同一个LTR帧(例如A)。
上述视频图像的传输方法中,对上述多个视频帧进行编码,得到经编码的码流,上述码流至少包括表示帧间参考关系的信息。上述表示帧间参 考关系的信息包括当前帧的前N帧的帧间参考关系的信息,上述当前帧的前N帧的帧间参考关系的信息表示当前帧的前N帧中的每一帧均参考与本帧时域距离最近的前向LTR帧,上述前向LTR帧为发送端设备标记为LTR帧的已编码的视频帧,也就是说,本实施例中,发送端设备在标记LTR帧时,无需等待接收端设备的反馈,因此可以实现一个RTT内标记多个LTR帧,从而可以大大缩短帧间参考距离,提升视频图像的编码质量。
本实施例中,上述表示帧间参考关系的信息还包括当前帧的后M帧的帧间参考关系的信息,上述当前帧的后M帧的帧间参考关系的信息表示后M帧中的每一帧均参考本帧的前一帧,其中N和M为正整数。例如,N和M的具体数值可以取决于网络。
需要说明的是,当前帧的后M帧与当前帧之间可以存在其他帧,也可以是当前帧的后M帧与当前帧之间是时域上紧邻的关系,针对前者的情况,其他帧的帧间参考关系可以与所述后M帧一样,也可以采用其他的帧间参考关系。
在网络差点高时延场景下,当LTR帧的间隔过长时,如果LTR帧的后续帧全部参考与本帧时域距离最近的前向LTR帧,势必造成帧间参考距离过长,从而导致编码质量明显下降,这时,发送端设备确定当前帧参考与当前帧时域距离最近的目标前向LTR帧,当前帧的后M帧中的每一帧均参考本帧的前一帧,从而可以缩短帧间参考距离,提高网络差点环境下的编码质量,实现了自适应选择参考关系,例如全参考关系和逐帧参考关系的灵活组合,一定程度上避免参考离当前帧时域距离很长的参考帧,较大程度缓解了丢包导致的视频卡顿的现象与图像质量模糊的问题,实现了图像质量与图像流畅度之间达到较好的平衡。
本实施例中,上述表示帧间参考关系的信息还包括当前帧的后M帧的帧间参考关系的信息,上述当前帧的后M帧的帧间参考关系的信息表示后M帧中的每一帧均参考与本帧时域距离最近的前向LTR帧,其中N和M为正整数。例如,N和M的具体数值可以取决于网络。
需要说明的是,当前帧的后M帧与当前帧之间可以存在其他帧,也 可以是当前帧的后M帧与当前帧之间是时域上紧邻的关系,针对前者的情况,其他帧的帧间参考关系可以与所述后M帧一样,也可以采用其他的帧间参考关系。
本实施例中,发送端设备可以根据当前帧的前n帧的编码质量确定N的数值,n<N。在具体实现时,发送端设备可以根据当前帧的前n帧的编码质量、上述视频图像的运动场景和接收端设备反馈的网络状态信息确定N的数值,上述网络状态信息可以包括网络丢包率、网络可用带宽和网络回环时间RTT中的一个或多个。
In one embodiment, the sending device may determine the value of N based on a comparison between the encoding quality of the n frames preceding the current frame and an encoding quality threshold. Specifically, after encoding each frame, the sending device may output a peak signal-to-noise ratio (PSNR) representing that frame's encoding quality. If the sending device finds that, within the n frames preceding the current frame, each frame's PSNR is lower than that of its preceding frame (i.e., the PSNR of the preceding n frames is trending downward), and the PSNR of the frame immediately before the current frame is below the encoding quality threshold (i.e., the PSNR threshold), the sending device determines that the current frame needs to reference the target forward LTR frame temporally closest to the current frame, and that each of the M frames following the current frame needs to reference its preceding frame. In this case, the number of frames between the current frame and the forward LTR frame temporally closest to the current frame is the value of N.
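The PSNR trigger described above can be sketched as a small predicate; the function name is illustrative and the PSNR values in the examples are made up.

```python
def needs_target_ltr_reference(psnr_history, psnr_threshold):
    """True when the PSNRs of the preceding n frames are strictly
    decreasing and the most recent PSNR has fallen below the encoding
    quality threshold, i.e., the condition under which the current frame
    should reference the closest target forward LTR frame."""
    decreasing = all(a > b for a, b in zip(psnr_history, psnr_history[1:]))
    return decreasing and psnr_history[-1] < psnr_threshold
```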
本实施例中,发送端设备可以根据单位时间内所包括的视频帧数确定M的数值。在具体实现时,发送端设备可以根据单位时间内所包括的视频帧数和上述视频图像的运动场景确定M的数值。其中,上述单位时间可以在具体实现时,根据系统性能和/或实现需求等自行设定,举例来说,上述单位时间可以为1秒。
本实施例中,LTR帧的标记间隔D与N和M具有函数关系。举例来说,上述函数关系可以为D=N+(M+1)。其中,上述LTR帧的标记间隔是指标记LTR帧的间隔帧数,即距离标记上一个LTR帧后,需要间隔多少帧标记下一个LTR帧。举例来说,如果LTR帧的标记间隔为4,那么在将当前帧标记为LTR帧之后,需要间隔4帧,将当前帧之后的第5帧标记为LTR帧。
本实施例中,上述表示帧间参考关系的信息还可以包括L帧的帧间 参考关系的信息,L=(M1+1)+(M2+1)+…+(Mn+1),上述L帧时域上在所述M帧之后,上述L帧的帧间参考关系的信息表示(Mn+1)帧中的第一帧参考与上述第一帧时域距离最近的目标前向LTR帧,(Mn+1)帧中在第一帧之后的每一帧均参考本帧的前一帧,其中,L为正整数,n为大于或等于1的正整数。
其中,M1,M2,…,Mn的数值可以相同也可以不同,具体的数值大小可以根据实际的应用场景来确定。
在网络差点高时延场景下,当LTR帧的间隔过长时,如果LTR帧的后续帧全部参考与本帧时域距离最近的前向LTR帧,势必造成帧间参考距离过长,从而导致编码质量明显下降,这时,发送端设备可以在对M之后的L帧进行编码时,确定(Mn+1)帧中的第一帧参考与所述第一帧时域距离最近的目标前向LTR帧,(Mn+1)帧中在第一帧之后的每一帧均参考本帧的前一帧,从而可以缩短帧间参考距离,提高网络差点环境下的编码质量,实现了自适应选择参考关系,例如全参考关系和逐帧参考关系的灵活组合,一定程度上避免参考离当前帧时域距离很长的参考帧,较大程度缓解了丢包导致的视频卡顿的现象与图像质量模糊的问题,实现了图像质量与图像流畅度之间达到较好的平衡。
In this case, the LTR frame marking interval D has a functional relationship with N and L. For example, the functional relationship may be D = N + L, where L = (M1+1) + (M2+1) + ... + (Mn+1).
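The generalized relationship D = N + L, with L summed over the per-segment values M1, ..., Mn, can be computed directly; the function name is illustrative. Note that with a single segment (n = 1) this reduces to the earlier special case D = N + (M + 1).

```python
def marking_interval_d(n: int, m_values) -> int:
    """D = N + L, where L = (M1 + 1) + (M2 + 1) + ... + (Mn + 1)."""
    l = sum(m + 1 for m in m_values)
    return n + l
```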
下面参考图4(a)~图4(c),对本申请图3所示实施例提供的视频图像的传输方法进行说明。图4(a)~图4(c)为本申请视频图像的传输方法中视频帧的帧间参考关系一个实施例的示意图。
如图4(a)所示,当前帧参考与本帧时域距离最近的目标前向LTR帧,这句话中的本帧即为当前帧;对于图4(a)中的当前帧,N=4,M=3,当前帧的前4帧中的每一帧均参考与本帧时域距离最近的前向LTR帧,这句话中的本帧即为前4中的每一帧,本例中,与当前帧的前4帧中的每一帧时域距离最近的前向LTR帧恰好也为目标前向LTR帧,当然,当前帧的前4帧中的每一帧所参考的前向LTR帧也可以不是目标前向LTR帧。前向LTR帧为发送端设备标记为LTR帧的已编码的视频帧,与目标前向 LTR帧不同,前向LTR帧是发送端设备根据LTR帧的标记间隔标记的,并且发送端设备并未接收到接收端设备针对上述前向LTR帧发送的确认消息。
继续参见图4(a),当前帧的后3帧中的每一帧均参考本帧的前一帧。
图4(a)中,当前帧的后3帧之后,还包括L帧,图4(a)中,L=4,也就是说,L=M1+1,这里的M1=3;这4帧中的第一帧参考与第一帧时域距离最近的目标前向LTR帧,这4帧中第一帧之后的每一帧参考本帧的前一帧。
参见图4(b),对于图4(b)中的当前帧,N=4,M=3,当前帧的前4帧中的每一帧均参考与本帧时域距离最近的前向LTR帧,这句话中的本帧指当前帧的前4帧中的每一帧,这时,与前4帧中的每一帧时域距离最近的前向LTR帧,就不是目标前向LTR帧;
继续参见图4(b),当前帧的后3帧中的每一帧均参考本帧的前一帧,这句话中的本帧指后3帧中的每一帧。
参见图4(c),对于图4(c)中的当前帧,在进行编码时,发送端设备将上述当前帧标记为LTR帧,那么对于上述当前帧的后M帧,上述后M帧中的每一帧均参考与本帧时域距离最近的前向LTR帧,此处的前向LTR帧在图4(c)中即为当前帧,这句话中的本帧指后M帧中的每一帧。
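The reference pattern described for FIG. 4(a) above (the N frames before the current frame all reference the nearest forward LTR frame, the current frame references the target forward LTR frame, and each of the M following frames references its immediately preceding frame) can be sketched as a POC-to-reference-POC map. This is a minimal sketch of the FIG. 4(a) case, where the nearest forward LTR frame coincides with the target forward LTR frame; names are illustrative.

```python
def build_reference_map(target_ltr_poc: int, cur_poc: int,
                        n: int, m: int) -> dict:
    """Map each frame's POC to the POC of its reference frame for the
    FIG. 4(a) pattern."""
    refs = {}
    # The N frames before the current frame reference the nearest
    # forward LTR frame (here assumed equal to the target forward LTR).
    for poc in range(cur_poc - n, cur_poc):
        refs[poc] = target_ltr_poc
    # The current frame references the target forward LTR frame.
    refs[cur_poc] = target_ltr_poc
    # Each of the M following frames references its preceding frame.
    for poc in range(cur_poc + 1, cur_poc + m + 1):
        refs[poc] = poc - 1
    return refs
```

For example, with the target LTR frame at POC 0, the current frame at POC 5, N = 4, and M = 3, frames 1-5 all reference frame 0 and frames 6-8 reference frames 5-7 respectively.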
In this embodiment, the sending device may determine the LTR frame marking interval D based on the network status information fed back by the receiving device, where the network status information may include one or more of: network packet loss rate, available network bandwidth, and network round-trip time (RTT).
Specifically, referring to FIG. 5, which shows an embodiment of determining the LTR frame marking interval in the video image transmission method of this application: in a specific implementation, the sending device may determine network characteristics from the packet loss information and the network RTT, and then use one or more of the following as decision inputs: the network characteristics, the anti-packet-loss algorithm, the LTR frames acknowledged by the receiving end, the LTR loss rate (a longer reference distance degrades encoded picture quality at the same bitrate), the motion level of the video image (the picture motion status in FIG. 5), the rate table, the target stutter count, the humanly perceivable stutter duration, and the number of LTR frames that can be buffered in the DPB, so as to obtain the LTR frame marking interval. It may further obtain one or a combination of the following: whether all frames reference forward LTR frames, the redundancy strategy, and the resolution/bitrate/frame rate.
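The patent leaves the actual decision function unspecified; purely for illustration, one could imagine a policy like the sketch below, which widens the marking interval as loss and latency grow and caps it at the DPB's LTR capacity. The thresholds and increments are invented, not from the patent.

```python
def decide_marking_interval(loss_rate: float, rtt_ms: float,
                            base_d: int = 2, dpb_ltr_capacity: int = 8) -> int:
    """Illustrative policy only: lengthen the LTR marking interval D
    under worse network conditions, never exceeding the DPB capacity."""
    d = base_d
    if loss_rate > 0.10:   # heavy packet loss
        d += 2
    if rtt_ms > 200.0:     # high round-trip latency
        d += 2
    return min(d, dpb_ltr_capacity)
```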
本实施例,上述LTR帧的标记间隔D用于发送端设备标记LTR帧。发送端设备根据LTR的标记间隔进行LTR帧的标记,可以实现一个RTT内标记多个LTR帧,并且本申请中,LTR的标记间隔不是固定设置的,而是动态变化的,可能是相同间隔,也可能是不同间隔,具体根据实际应用场景来确定,从而可以大大缩短帧间参考距离,提升视频图像的编码质量。并且,本实施例中,发送端设备可以根据网络状况等信息,动态确定LTR的标记间隔,可以及时应对现网突发丢包、大丢包以及拥塞等网络差点场景,并可以兼顾流畅度与清晰度,实现最佳的视频通话体验。
图6为本申请视频图像的传输方法另一个实施例的流程图,本实施例中,上述视频图像包括多个视频帧,如图6所示,上述视频图像的传输方法可以包括:
步骤601,判断当前帧是否被标记为LTR帧。
如果当前帧未被标记为LTR帧,则执行步骤602;如果当前帧被标记为LTR帧,则执行步骤603。
具体地,判断当前帧是否被标记为LTR帧可以为:根据LTR帧的标记间隔,判断当前帧是否被标记为LTR帧。
其中,根据LTR帧的标记间隔,判断当前帧是否被标记为LTR帧可以为:获取当前帧和与上述当前帧时域距离最近的前向LTR帧之间的间隔帧数;如果上述间隔帧数等于LTR帧的标记间隔,则将当前帧标记为LTR帧;如果上述间隔帧数不等于LTR帧的标记间隔,则对当前帧不标记为LTR帧。
进一步地,本实施例中,发送端设备可以根据接收端设备反馈的网络状态信息,确定LTR帧的标记间隔,上述网络状态信息可以包括:网络丢包率、网络可用带宽和网络回环时间RTT中的一个或多个。
参见图5,在具体实现时,发送端设备可以根据网络丢包信息和网络RTT确定网络特征,然后将网络特征、抗丢包算法、接收端反馈已确认的LTR帧、LTR损失率(参考距离增加会造成同样码率时编码画面质量损失)、 视频图像的运动场景(即图5中的画面运动状况)、码表、目标卡顿次数、人主观可感知卡顿时长以及DPB中可缓存的LTR帧数等信息中的一个或多个作为判决输入,获得LTR帧的标记间隔,还可以获得以下信息之一或组合:是否全参考前向LTR帧、冗余策略以及分辨率/码率/帧率等。
本实施例,根据LTR的标记间隔进行LTR帧的标记,可以实现一个RTT内标记多个LTR帧,从而可以大大缩短帧间参考距离,提升视频图像的编码质量。
步骤602,对未标记的当前帧进行编码,其中,上述编码过程可以包括:至少将表示当前帧的帧间参考关系的信息编入码流,上述当前帧的帧间参考关系表示当前帧参考与上述当前帧时域距离最近的前向LTR帧,上述前向LTR帧为发送端设备标记为LTR帧的已编码的视频帧;需要说明的是,与上述当前帧时域距离最近的前向LTR帧是指:当前帧的POC与时域距离最近的前向LTR帧的POC之间的差值小于本帧的POC与其他前向LTR帧的POC之间的差值。然后执行步骤604。
步骤603,对标记的当前帧进行编码,其中,上述编码过程包括:至少将表示当前帧的帧间参考关系的信息编入码流,上述当前帧的帧间参考关系表示当前帧参考与上述当前帧时域距离最近的目标前向LTR帧,其中,上述目标前向LTR帧为所述发送端设备接收到接收端设备确认消息的前向LTR帧,具体地,上述目标前向LTR帧为发送端设备标记为LTR帧并且接收到接收端设备发送的确认消息的已编码的视频帧,上述确认消息与所述目标前向LTR帧对应。然后执行步骤604。
本申请中,发送端设备即本端,例如也可以叫做编码端设备,接收端设备为对端或远端,例如也可以叫做解码端设备;
需要说明的是,与上述当前帧时域距离最近的目标前向LTR帧是指:当前帧的POC与目标前向LTR帧的POC之间的差值小于本帧的POC与其他目标前向LTR帧的POC之间的差值。
步骤604,发送上述码流。
上述视频图像的传输方法中,在对未标记的当前帧进行编码时,参考与未标记的当前帧时域距离最近的前向LTR帧,上述前向LTR帧为发送 端设备标记为LTR帧的已编码的视频帧,也就是说,本实施例中,发送端设备在标记LTR帧时,无需等待接收端设备的反馈,因此可以实现一个RTT内标记多个LTR帧,从而可以大大缩短帧间参考距离,提升视频图像的编码质量。
下面参考图7(a)~图7(b),对本申请图6所示实施例提供的视频图像的传输方法进行说明。图7(a)~图7(b)为本申请视频图像的传输方法中视频帧的帧间参考关系另一个实施例的示意图。
参见图7(a)所示,初始阶段,编码端将编码的首个I帧(即图7中的第1帧)标记为LTR帧,然后将编码后的I帧进行分包与冗余处理,通过网络发送给解码端;同时将I帧作为关键帧进行有别于普通帧的不对等冗余保护,确保解码端能及时完整接收到此类关键帧,解码端在接收到I帧,并确认I帧可以正常解码之后,向编码端及时反馈确认消息,如果发送端设备在预定时长内未接收到接收端设备反馈的确认消息,发送端设备将重新对I帧进行编码,防止初始阶段接通异常。
然后发送端设备在对第2帧和第3帧编码时,均参考第1帧,在对第4帧进行编码时,发送端设备接收到了接收端设备反馈的网络状态信息,如上所述,发送端设备可以根据上述接收端设备反馈的网络状态信息确定LTR帧的标记间隔,这时,发送端设备确定的LTR帧的标记间隔为2,在对第4帧进行编码时,发送端设备发现第4帧与第1帧之间的间隔帧数为2,等于LTR帧的标记间隔,于是,发送端设备将第4帧标记为LTR帧,由于这时发送端设备已收到接收端设备发送的针对第1帧的确认消息,也就是说,第1帧可以被解码端正常解码,是目标前向LTR帧,这样,第1帧就是与第4帧时域距离最近的目标前向LTR帧,因此发送端设备在对第4帧进行编码时,参考第1帧。
在发送端设备对第5帧进行编码时,发送端设备同样可以根据上述接收端设备反馈的网络状态信息确定LTR帧的标记间隔,这时发送端设备确定的LTR帧的标记间隔为3。由于第4帧为第5帧的前向LTR帧,第5帧与第4帧之间的间隔帧数为1,因此第5帧未被标记为LTR帧,发送端设备参考与第5帧时域距离最近的前向LTR帧(即第4帧)对第5帧进行编码。
后续帧的编码过程与上述编码过程相似,在此不再赘述。
需要说明的是,在对第16帧进行编码时,发送端设备同样可以根据上述接收端设备反馈的网络状态信息确定LTR帧的标记间隔,这时发送端设备确定的LTR帧的标记间隔为2,由于第13帧为第16帧的前向LTR帧,第16帧与第13帧之间的间隔帧数为2,因此第16帧被标记为LTR帧,但这时,发送端设备未接收到接收端设备针对第13帧的确认消息,因此与第16帧时域距离最近的目标前向LTR帧为第8帧,所以发送端设备参考第8帧对第16帧进行编码。
按照本实施例提供的视频图像的传输方法,即使第5帧、第6帧、第12帧、第13帧、第14帧和第18帧这几帧数据丢包造成视频帧不完整,也不影响其它接收完整的视频帧的正常解码,图7(a)中,第15帧由于参考第13帧编码,第13帧不完整,因此第15帧也无法解码。
在本实施例步骤602中,上述当前帧的帧间参考关系表示当前帧参考与上述当前帧时域距离最近的前向LTR帧,上述前向LTR帧为发送端设备标记为LTR帧的已编码的视频帧;其中上述当前帧未被标记为LTR帧且当前帧的编码质量大于或等于编码质量阈值;或者,
当前帧的帧间参考关系表示当前帧参考与上述当前帧时域距离最近的前向LTR帧,上述前向LTR帧为发送端设备标记为LTR帧的已编码的视频帧;其中上述当前帧未被标记为LTR帧且上述当前帧的编码质量小于编码质量阈值。
如果当前帧未被标记为LTR帧,那么发送端设备在对当前帧进行编码时,参考与当前帧时域距离最近的前向LTR帧,在对当前帧编码之后,发送端设备获取当前帧的编码质量,将当前帧的编码质量与编码质量阈值进行对比,如果当前帧的编码质量小于编码质量阈值,则在对当前帧的后一帧进行编码时,参考与后一帧时域距离最近的目标前向LTR帧,以提高当前帧的后一帧的编码质量。
进一步地,发送端设备还可以对当前帧的后M+1帧进行编码,上述编码过程包括:将表示当前帧的后M+1帧的帧间参考关系的信息编入码流,上述后M+1帧的帧间参考关系表示后M+1帧中的第一帧参考与第一帧时域距离最近的目标前向LTR帧,后M+1帧中在第一帧之后的每一帧均参考本帧的前一帧,其中,M为正整数;其中,上述当前帧未被标记为LTR帧且当前帧的编码质量小于编码质量阈值。
进一步地,发送端设备还可以对当前帧的后一帧进行编码,上述编码过程包括:将表示当前帧的后一帧的帧间参考关系的信息编入码流,上述后一帧的帧间参考关系表示后一帧参考与本帧时域距离最近的目标前向LTR帧,其中上述当前帧未被标记为LTR帧且当前帧的编码质量小于编码质量阈值。
参见图7(b),发送端设备在对当前帧进行编码时,如果当前帧未被标记为LTR帧,则发送端设备参考与上述当前帧时域距离最近的前向LTR帧对上述当前帧进行编码,在对当前帧编码之后,如果发送端设备发现当前帧的编码质量小于编码质量阈值(即当前帧的PSNR小于PSNR阈值),则发送端设备在对当前帧的后一帧进行编码时,参考与后一帧的时域距离最近的目标前向LTR帧,如图7(b)中所示,当前帧的后一帧即为当前帧的后M+1帧中的第一帧,在发送端设备对第一帧之后的每一帧进行编码时,均参考本帧的前一帧。
图7(b)中,当前帧的后一帧可以看作虚拟LTR帧,虚拟LTR帧以目标前向LTR帧作为参考帧进行编码,虚拟LTR帧不缓存在DPB中,虚拟LTR帧的后续帧将虚拟LTR帧作为短期参考进行编码。
在网络差点高时延场景下,当LTR帧的间隔过长时,如果LTR帧的后续帧全部参考与本帧时域距离最近的前向LTR帧,势必造成帧间参考距离过长,从而导致编码质量明显下降,如图7(a)中,在对第16帧进行编码时,需要参考第8帧,参考距离达到了7帧,因此第16帧的编码质量势必会明显下降。这时,如果发送端设备在对当前帧进行编码之后,发现当前帧的编码质量小于编码质量阈值,则发送端设备确定当前帧的后M+1帧中的第一帧参考与第一帧时域距离最近的目标前向LTR帧,后M+1帧中在第一帧之后的每一帧均参考本帧的前一帧,如图7(b)所示,其中,M为正整数,从而可以缩短帧间参考距离,提高网络差点环境下 的编码质量,实现了自适应选择参考关系,例如全参考关系和逐帧参考关系的灵活组合,一定程度上避免参考离当前帧时域距离很长的参考帧,较大程度缓解了丢包导致的视频卡顿的现象与图像质量模糊的问题,实现了图像质量与图像流畅度之间达到较好的平衡。
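上述"第一帧(虚拟LTR帧)参考最近的目标前向LTR帧、其后各帧逐帧参考前一帧"的参考关系,可用如下示意性Python片段表达(帧以POC表示,函数名与返回的映射结构为便于说明而假设):

```python
def plan_fallback_references(first_poc, m, target_ltr_poc):
    # 当前帧编码质量低于阈值时,其后M+1帧的参考关系:
    # 第一帧(虚拟LTR帧)参考与其时域距离最近的目标前向LTR帧,
    # 其后每一帧均参考本帧的前一帧
    refs = {first_poc: target_ltr_poc}
    for poc in range(first_poc + 1, first_poc + m + 1):
        refs[poc] = poc - 1
    return refs
```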
本实施例中,发送端设备根据单位时间内所包括的视频帧数确定M的数值。在具体实现时,发送端设备可以根据单位时间内所包括的视频帧数和上述视频图像的运动场景确定M的数值。其中,上述单位时间可以在具体实现时,根据系统性能和/或实现需求等自行设定,举例来说,上述单位时间可以为1秒。
图8为本申请视频通话方法一个实施例的流程图,本实施例提供的视频通话方法可以应用于具有显示屏和图像采集器的电子设备中。其中,上述显示屏可以包括车载计算机(移动数据中心Mobile Data Center)的显示屏;上述图像采集器可以为摄像头Camera,或者车载传感器等;上述电子设备可以为移动终端(手机),智慧屏,无人机,智能网联车(Intelligent Connected Vehicle;以下简称:ICV),智能(汽)车(smart/intelligent car)或车载设备等设备。
如图8所示,上述视频通话方法可以包括:
步骤801,响应于第一用户请求与第二用户进行视频通话的第一操作,建立第一用户与第二用户之间的视频通话连接,这里的视频通话连接是指第一用户使用的电子设备与第二用户使用的电子设备之间的视频通话连接。
具体地,参见图9(a)~图9(c),图9(a)~图9(c)为本申请视频通话方法中请求视频通话的示意图,如图9(a)所示,第一用户可以点击第一用户所使用的电子设备中显示的通话图标9a,进入图9(b)所示的界面,然后在图9(b)所示的界面中,点击第二用户的标识,进入图9(c)所示的界面,然后在图9(c)所示的界面中,点击“畅连通话”中的视频通话图标9b,从而完成请求与第二用户进行视频通话的第一操作。
然后,第一用户使用的电子设备响应于第一用户请求与第二用户进行 视频通话的第一操作,建立第一用户与第二用户之间的视频通话连接。
在建立视频通话连接阶段,第一用户使用的电子设备显示图9(d)所示的界面,建立视频通话连接之后,第一用户使用的电子设备显示图9(e)所示的界面。
其中,图9(d)为本申请视频通话方法中建立视频通话连接阶段的界面。
步骤802,通过图像采集器采集包括第一用户的环境的视频图像,上述视频图像包括多个视频帧。这里的环境可以是第一用户所处的内部环境和/或外部环境的视频图像,比如车内环境和/或在行驶过程中智能化探测障碍物、感知周围环境。
其中,上述图像采集器可以为第一用户使用的电子设备中的摄像头或者车载传感器。
步骤803,对上述多个视频帧进行编码,得到经编码的码流,上述码流至少包括表示帧间参考关系的信息。
步骤804,发送上述经编码的码流。
具体地,可以将上述码流发送给第二用户使用的电子设备。
其中,上述表示帧间参考关系的信息包括当前帧的帧间参考关系的信息和当前帧的前N帧的帧间参考关系的信息。
其中:当前帧的帧间参考关系的信息表示本帧参考与本帧时域距离最近的目标前向长期参考LTR帧,这里的本帧是指当前帧,其中,上述目标前向LTR帧为发送端设备标记为LTR帧,并且接收到接收端设备发送的确认消息的已编码的视频帧,上述确认消息与目标前向LTR帧对应;上述发送端设备为第一用户使用的电子设备,上述接收端设备为第二用户使用的电子设备。
当前帧的前N帧的帧间参考关系的信息表示前N帧中的每一帧均参考与本帧时域距离最近的前向LTR帧,上述前向LTR帧为发送端设备标记为LTR帧的已编码的视频帧。这里的本帧是指前N帧中的每一帧。
需要说明的是,当前帧的前N帧与当前帧之间可以存在其他帧,也可以是当前帧的前N帧与当前帧之间是时域上紧邻的关系,针对前者的情况,其他帧的帧间参考关系可以与所述前N帧一样,也可以采用其他的帧间参考关系。
换句话说,当前帧与时域距离最近的前向LTR帧(例如A)之间的多个帧中的所有帧可以参考同一个LTR帧(例如A),也可以是多个帧中的部分帧参考同一个LTR帧(例如A)。
上述视频通话方法中,响应于第一用户请求与第二用户进行视频通话的第一操作,建立第一用户与第二用户之间的视频通话连接之后,通过图像采集器采集包括第一用户的环境的视频图像,然后对上述视频图像包括的多个视频帧进行编码,得到经编码的码流,上述码流至少包括表示帧间参考关系的信息。上述表示帧间参考关系的信息包括当前帧的前N帧的帧间参考关系的信息,上述当前帧的前N帧的帧间参考关系的信息表示当前帧的前N帧中的每一帧均参考与本帧时域距离最近的前向LTR帧,上述前向LTR帧为发送端设备标记为LTR帧的已编码的视频帧,也就是说,本实施例中,发送端设备在标记LTR帧时,无需等待接收端设备的反馈,因此可以实现一个RTT内标记多个LTR帧,从而可以大大缩短帧间参考距离,提升视频图像的编码质量。
本实施例中,上述表示帧间参考关系的信息还包括当前帧的后M帧的帧间参考关系的信息,当前帧的后M帧的帧间参考关系的信息表示后M帧中的每一帧均参考本帧的前一帧,其中N和M为正整数。例如,N和M的具体数值可以取决于网络。
需要说明的是,当前帧的后M帧与当前帧之间可以存在其他帧,也可以是当前帧的后M帧与当前帧之间是时域上紧邻的关系,针对前者的情况,其他帧的帧间参考关系可以与所述后M帧一样,也可以采用其他的帧间参考关系。
在网络差点高时延场景下,当LTR帧的间隔过长时,如果LTR帧的后续帧全部参考与本帧时域距离最近的前向LTR帧,势必造成帧间参考距离过长,从而导致编码质量明显下降,这时,发送端设备确定当前帧参考与当前帧时域距离最近的目标前向LTR帧,当前帧的后M帧中的每一帧均参考本帧的前一帧,从而可以缩短帧间参考距离,提高网络差点环境 下的编码质量,实现了自适应选择参考关系,例如全参考关系和逐帧参考关系的灵活组合,一定程度上避免参考离当前帧时域距离很长的参考帧,较大程度缓解了丢包导致的视频卡顿的现象与图像质量模糊的问题,实现了图像质量与图像流畅度之间达到较好的平衡。
本实施例中,上述表示帧间参考关系的信息还包括当前帧的后M帧的帧间参考关系的信息,上述当前帧的后M帧的帧间参考关系的信息表示所述后M帧中的每一帧均参考与本帧时域距离最近的前向LTR帧,其中N和M为正整数。例如,N和M的具体数值可以取决于网络。
需要说明的是,当前帧的后M帧与当前帧之间可以存在其他帧,也可以是当前帧的后M帧与当前帧之间是时域上紧邻的关系,针对前者的情况,其他帧的帧间参考关系可以与所述后M帧一样,也可以采用其他的帧间参考关系。
本实施例中,发送端设备根据当前帧的前n帧的编码质量确定N的数值,n<N。在具体实现时,发送端设备可以根据当前帧的前n帧的编码质量、上述视频图像的运动场景和接收端设备反馈的网络状态信息确定N的数值,上述网络状态信息可以包括网络丢包率、网络可用带宽和网络回环时间RTT中的一个或多个。
在一种可能的实现方式中,所述发送端设备根据所述当前帧的前n帧的编码质量与编码质量阈值的比较结果,确定N的数值。具体地,在对每一帧进行编码之后,发送端设备可以输出表示这一帧的编码质量的PSNR,如果发送端设备发现当前帧的前n帧中每一帧的PSNR均比前一帧的PSNR小,即前n帧的PSNR呈下降趋势,并且当前帧的前一帧的PSNR小于编码质量阈值(即PSNR阈值),则发送端设备确定当前帧需要参考与当前帧时域距离最近的目标前向LTR帧,当前帧的后M帧中的每一帧需要参考本帧的前一帧。这时,当前帧和与当前帧时域距离最近的前向LTR帧之间的帧数即为N的数值。
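上述"前n帧PSNR呈下降趋势且前一帧PSNR低于阈值"的判决,可用如下示意性Python片段表达(函数名与输入形式为便于说明而假设):

```python
def needs_target_ltr_reference(prev_psnrs, psnr_threshold):
    # prev_psnrs: 当前帧的前n帧按时间顺序排列的PSNR
    # 若前n帧中每一帧的PSNR均比前一帧小(呈下降趋势),
    # 且当前帧的前一帧PSNR低于编码质量阈值,
    # 则判定当前帧需参考与其时域距离最近的目标前向LTR帧
    if len(prev_psnrs) < 2:
        return False
    declining = all(later < earlier
                    for earlier, later in zip(prev_psnrs, prev_psnrs[1:]))
    return declining and prev_psnrs[-1] < psnr_threshold
```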
在一种可能的实现方式中,发送端设备根据单位时间内所包括的视频帧数确定M的数值。在具体实现时,发送端设备可以根据单位时间内所包括的视频帧数和上述视频图像的运动场景确定M的数值。其中,上述 单位时间可以在具体实现时,根据系统性能和/或实现需求等自行设定,举例来说,上述单位时间可以为1秒。
在一种可能的实现方式中,LTR帧的标记间隔D与N和M具有函数关系。举例来说,上述函数关系可以为D=N+(M+1)。其中,上述LTR帧的标记间隔是指标记LTR帧的间隔帧数,即距离标记上一个LTR帧后,需要间隔多少帧标记下一个LTR帧。举例来说,如果LTR帧的标记间隔为4,那么在将当前帧标记为LTR帧之后,需要间隔4帧,将当前帧之后的第5帧标记为LTR帧。
在一种可能的实现方式中,上述表示帧间参考关系的信息还包括L帧的帧间参考关系的信息,L=(M1+1)+(M2+1)+…+(Mn+1),上述L帧时域上在上述M帧之后,上述L帧的帧间参考关系的信息表示上述(Mn+1)帧中的第一帧参考与第一帧时域距离最近的目标前向LTR帧,上述(Mn+1)帧中在第一帧之后的每一帧均参考本帧的前一帧,其中,L为正整数,n为大于或等于1的正整数。
其中,M1,M2,…,Mn的数值可以相同也可以不同,具体的数值大小可以根据实际的应用场景来确定。
在网络差点高时延场景下,当LTR帧的间隔过长时,如果LTR帧的后续帧全部参考与本帧时域距离最近的前向LTR帧,势必造成帧间参考距离过长,从而导致编码质量明显下降,这时,发送端设备可以在对M之后的L帧进行编码时,确定(Mn+1)帧中的第一帧参考与所述第一帧时域距离最近的目标前向LTR帧,(Mn+1)帧中在第一帧之后的每一帧均参考本帧的前一帧,从而可以缩短帧间参考距离,提高网络差点环境下的编码质量,实现了自适应选择参考关系,例如全参考关系和逐帧参考关系的灵活组合,一定程度上避免参考离当前帧时域距离很长的参考帧,较大程度缓解了丢包导致的视频卡顿的现象与图像质量模糊的问题,实现了图像质量与图像流畅度之间达到较好的平衡。
在一种可能的实现方式中,LTR帧的标记间隔D与N和L具有函数关系。举例来说,上述函数关系可以为D=N+L,L=(M1+1)+(M2+1)+…+(Mn+1)。
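上述两种函数关系可用如下示意性Python片段表达(函数名为便于说明而假设,函数关系本身来自原文):

```python
def marking_interval_nm(n, m):
    # 标记间隔与N、M的一种函数关系:D = N + (M + 1)
    return n + (m + 1)

def marking_interval_nl(n, m_values):
    # 标记间隔与N、L的一种函数关系:D = N + L,
    # 其中 L = (M1+1) + (M2+1) + ... + (Mn+1)
    return n + sum(m_i + 1 for m_i in m_values)
```

以图4(a)为例,N=4、M=3时,D = 4 + (3+1) = 8。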
如图4(a)所示,当前帧参考与本帧时域距离最近的目标前向LTR帧,这句话中的本帧即为当前帧;对于图4(a)中的当前帧,N=4,M=3,当前帧的前4帧中的每一帧均参考与本帧时域距离最近的前向LTR帧,这句话中的本帧即为前4帧中的每一帧,本例中,与当前帧的前4帧中的每一帧时域距离最近的前向LTR帧恰好也为目标前向LTR帧,当然,当前帧的前4帧中的每一帧所参考的前向LTR帧也可以不是目标前向LTR帧。前向LTR帧为发送端设备标记为LTR帧的已编码的视频帧,与目标前向LTR帧不同,前向LTR帧是发送端设备根据LTR帧的标记间隔标记的,并且发送端设备并未接收到接收端设备针对上述前向LTR帧发送的确认消息。
继续参见图4(a),当前帧的后3帧中的每一帧均参考本帧的前一帧。
图4(a)中,当前帧的后3帧之后,还包括L帧,图4(a)中,L=4,也就是说,L=M1+1,这里的M1=3;这4帧中的第一帧参考与第一帧时域距离最近的目标前向LTR帧,这4帧中第一帧之后的每一帧参考本帧的前一帧。
参见图4(b),对于图4(b)中的当前帧,N=4,M=3,当前帧的前4帧中的每一帧均参考与本帧时域距离最近的前向LTR帧,这句话中的本帧指当前帧的前4帧中的每一帧,这时,与前4帧中的每一帧时域距离最近的前向LTR帧,就不是目标前向LTR帧;
继续参见图4(b),当前帧的后3帧中的每一帧均参考本帧的前一帧,这句话中的本帧指后3帧中的每一帧。
参见图4(c),对于图4(c)中的当前帧,在进行编码时,发送端设备将上述当前帧标记为LTR帧,那么对于上述当前帧的后M帧,上述后M帧中的每一帧均参考与本帧时域距离最近的前向LTR帧,此处的前向LTR帧在图4(c)中即为当前帧,这句话中的本帧指后M帧中的每一帧。
本实施例中,发送端设备可以根据接收端设备反馈的网络状态信息,确定上述LTR帧的标记间隔D,上述网络状态信息可以包括:网络丢包率、网络可用带宽和网络回环时间RTT中的一个或多个。
具体地,参见图5,发送端设备可以根据网络丢包信息和网络RTT 确定网络特征,然后将网络特征、抗丢包算法、接收端反馈已确认的LTR帧、LTR损失率(参考距离增加会造成同样码率时编码画面质量损失)、视频图像的运动场景(即图5中的画面运动状况)、码表、目标卡顿次数、人主观可感知卡顿时长以及DPB中可缓存的LTR帧数等信息中的一个或多个作为判决输入,获得LTR帧的标记间隔,还可以获得以下信息之一或组合:是否全参考前向LTR帧、冗余策略以及分辨率/码率/帧率等。
在一种可能的实现方式中,上述LTR帧的标记间隔D用于发送端设备标记LTR帧。
其中,发送端设备根据LTR的标记间隔进行LTR帧的标记,可以实现一个RTT内标记多个LTR帧,从而可以大大缩短帧间参考距离,提升视频图像的编码质量。并且,本实施例中,发送端设备可以根据网络状况等信息,动态确定LTR的标记间隔,可以及时应对现网突发丢包、大丢包以及拥塞等网络差点场景,并可以兼顾流畅度与清晰度,实现最佳的视频通话体验。
图9(e)为本申请视频通话方法中建立视频通话连接之后的界面。图9(e)中,9c所示的小窗口中显示包括第一用户的环境的视频图像,9d所示的大窗口中显示包括第二用户的环境的视频图像。其中,在9d所示的大窗口中显示的视频图像是第一用户使用的电子设备对第二用户使用的电子设备发送的码流解码后获得的,上述码流是第二用户使用的电子设备按照本申请图8所示实施例提供的方法,对包括第二用户的环境的视频图像进行编码后获得的。
本申请图8所示实施例提供的视频通话方法可以应用于视频通话或视频会议等各类实时音视频互动场景中。图10(a)~图10(b)为本申请视频通话方法的应用场景一个实施例的示意图。图10(a)~图10(b)示出的是两个用户进行视频通话的场景。
图10(b)中,图像采集器:用于获取实时YUV数据;
视频前处理器:将从Camera获取的YUV数据转化成编码器所需要的格式与分辨率,并在手机等设备上完成图像的横竖屏旋转处理。
网络分析处理系统:依据反馈信息控制分辨率、帧率、冗余率与数据帧参考关系等信息,具体的分析方式可以参见图5的相关描述,在此不再赘述。
视频编码器:根据网络分析处理系统确定的参考帧完成编码处理,实现DPB中LTR标记与缓存。
网络传输器:完成视频流/控制信息流网络发送与接收处理过程。
视频帧处理模块:完成数据帧组帧、冗余数据恢复与数据帧完整性校验业务。
视频解码器:将前序模块组好的数据帧按照参考关系完成数据帧解码。
视频显示器:将解码完成的数据帧提交给显示模块,完成数据帧的渲染显示业务。
图11(a)~图11(b)为本申请视频通话方法的应用场景另一个实施例的示意图。图11(a)~图11(b)示出了多方用户进行视频会议的场景,图11(b)中各模块的功能与图10(b)中相应模块的功能相同,在此不再赘述。
图12为本申请视频图像的显示方法一个实施例的流程图,本实施例中,上述视频图像包括多个视频帧,如图12所示,上述视频图像的显示方法可以包括:
步骤1201,解析码流,以得到表示帧间参考关系的信息。
其中,上述表示帧间参考关系的信息包括当前帧的帧间参考关系的信息和当前帧的前N帧的帧间参考关系的信息,其中:
所述当前帧的帧间参考关系的信息表示本帧参考与本帧时域距离最近的目标前向长期参考LTR帧,这里的本帧是指当前帧;其中,所述目标前向LTR帧为发送端设备标记为LTR帧,并且接收到接收端设备发送的确认消息的已编码的视频帧,所述确认消息与所述目标前向LTR帧对应;
所述当前帧的前N帧的帧间参考关系的信息表示所述前N帧中的每一帧均参考与本帧时域距离最近的前向LTR帧,所述前向LTR帧为所述 发送端设备标记为LTR帧的已编码的视频帧。这里的本帧是指前N帧中的每一帧。
需要说明的是,当前帧的前N帧与当前帧之间可以存在其他帧,也可以是当前帧的前N帧与当前帧之间是时域上紧邻的关系,针对前者的情况,其他帧的帧间参考关系可以与所述前N帧一样,也可以采用其他的帧间参考关系。
换句话说,当前帧与时域距离最近的前向LTR帧(例如A)之间的多个帧中的所有帧可以参考同一个LTR帧(例如A),也可以是多个帧中的部分帧参考同一个LTR帧(例如A)。
在一种可能的实现方式中,上述表示帧间参考关系的信息还包括当前帧的后M帧的帧间参考关系的信息,上述当前帧的后M帧的帧间参考关系的信息表示所述后M帧中的每一帧均参考本帧的前一帧,其中N和M为正整数。
例如,N和M的具体数值可以取决于网络。
需要说明的是,当前帧的后M帧与当前帧之间可以存在其他帧,也可以是当前帧的后M帧与当前帧之间是时域上紧邻的关系,针对前者的情况,其他帧的帧间参考关系可以与所述后M帧一样,也可以采用其他的帧间参考关系。
在网络差点高时延场景下,当LTR帧的间隔过长时,如果LTR帧的后续帧全部参考与本帧时域距离最近的前向LTR帧,势必造成帧间参考距离过长,从而导致编码质量明显下降,这时,发送端设备确定当前帧参考与当前帧时域距离最近的目标前向LTR帧,当前帧的后M帧中的每一帧均参考本帧的前一帧,从而可以缩短帧间参考距离,提高网络差点环境下的编码质量,实现了自适应选择参考关系,例如全参考关系和逐帧参考关系的灵活组合,一定程度上避免参考离当前帧时域距离很长的参考帧,较大程度缓解了丢包导致的视频卡顿的现象与图像质量模糊的问题,实现了图像质量与图像流畅度之间达到较好的平衡。
在一种可能的实现方式中,上述表示帧间参考关系的信息还包括当前帧的后M帧的帧间参考关系的信息,上述当前帧的后M帧的帧间参考关系的信息表示所述后M帧中的每一帧均参考与本帧时域距离最近的前向LTR帧,其中N和M为正整数。例如,N和M的具体数值可以取决于网络。
需要说明的是,当前帧的后M帧与当前帧之间可以存在其他帧,也可以是当前帧的后M帧与当前帧之间是时域上紧邻的关系,针对前者的情况,其他帧的帧间参考关系可以与所述后M帧一样,也可以采用其他的帧间参考关系。
步骤1202,重建上述多个视频帧,其中,重建上述多个视频帧包括:根据当前帧的参考帧,重建当前视频帧。
步骤1203,显示上述视频图像。
上述视频图像的显示方法中,在解析码流之后,可以得到表示帧间参考关系的信息,上述表示帧间参考关系的信息中包括当前帧的前N帧的帧间参考关系的信息,上述当前帧的前N帧的帧间参考关系的信息表示前N帧中的每一帧均参考与本帧时域距离最近的前向LTR帧,上述前向LTR帧为发送端设备标记为LTR帧的已编码的视频帧。也就是说,本实施例中,发送端设备在标记LTR帧时,无需等待接收端设备的反馈,因此可以实现一个RTT内标记多个LTR帧,从而可以大大缩短帧间参考距离,提升视频图像的编码质量。
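接收端"按帧间参考关系重建视频帧"的过程,可用如下示意性Python片段表达(帧以POC表示,ref_map、decode等名称与结构均为便于说明而假设),其行为与图7(a)一致:接收不完整的帧及参考了它的后续帧无法解码,其余帧正常解码:

```python
def rebuild_frames(ref_map, received, decode):
    # ref_map: {帧POC: 参考帧POC},帧内编码帧的参考记为None
    # received: 接收完整的帧POC集合
    # decode(poc, ref_frame): 根据参考帧重建当前视频帧
    rebuilt = {}
    for poc in sorted(ref_map):
        if poc not in received:
            continue  # 本帧数据不完整,无法解码
        ref = ref_map[poc]
        if ref is not None and ref not in rebuilt:
            continue  # 参考帧未能重建,本帧也无法解码
        rebuilt[poc] = decode(poc, rebuilt.get(ref))
    return rebuilt
```

例如第13帧不完整时,参考第13帧的第15帧也无法解码,而参考第8帧的其他帧不受影响。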
可以理解的是,上述实施例中的部分或全部步骤或操作仅是示例,本申请实施例还可以执行其它操作或者各种操作的变形。此外,各个步骤可以按照上述实施例呈现的不同的顺序来执行,并且有可能并非要执行上述实施例中的全部操作。
图13为本申请视频图像的发送设备一个实施例的结构示意图,上述视频图像包括多个视频帧,如图13所示,上述视频图像的发送设备130可以包括:编码模块1301和传输模块1302;应当理解的是,视频图像的发送设备130可以对应于图1中的发送端A,或者可以对应于图10(b)或图11(b)中的发送设备,或者可以对应于图16(a)的装置900,或者可以对应于图16(b)的装置40,或者可以对应于图16(c)的装置400。 其中,编码模块1301具体可以对应于图10(b)或图11(b)中的发送设备中的视频编码器,或者,具体可以对应于图16(b)所示的装置40中的编码器20。
其中,编码模块1301,用于对上述多个视频帧进行编码,得到经编码的码流,所述码流至少包括表示帧间参考关系的信息;另外,上述码流中还可以包括已编码数据,例如:当前帧与参考帧的残差数据等;上述表示帧间参考关系的信息可以放在条带头(slice header)中;
传输模块1302,用于发送上述经编码的码流,其中,表示帧间参考关系的信息包括当前帧的帧间参考关系的信息和当前帧的前N帧的帧间参考关系的信息,其中:
所述当前帧的帧间参考关系的信息表示本帧参考与本帧时域距离最近的目标前向长期参考LTR帧,这里的本帧是指当前帧,其中,所述目标前向LTR帧为发送端设备接收到接收端设备确认消息的前向LTR帧,具体地,所述目标前向LTR帧为编码模块1301标记为LTR帧并且接收到接收端设备发送的确认消息的已编码的视频帧,所述确认消息与所述目标前向LTR帧对应;本申请中,视频图像的发送设备即本端,例如也可以叫做发送端设备,接收端设备为对端或远端,例如也可以叫做解码端设备;
所述当前帧的前N帧的帧间参考关系的信息表示所述前N帧中的每一帧均参考与本帧时域距离最近的前向LTR帧,所述前向LTR帧为编码模块1301标记为LTR帧的已编码的视频帧,上述前向LTR帧存储在DPB中。
需要说明的是,当前帧的前N帧与当前帧之间可以存在其他帧,也可以是当前帧的前N帧与当前帧之间是时域上紧邻的关系,针对前者的情况,其他帧的帧间参考关系可以与所述前N帧一样,也可以采用其他的帧间参考关系。
换句话说,当前帧与时域距离最近的前向LTR帧(例如A)之间的多个帧中的所有帧可以参考同一个LTR帧(例如A),也可以是多个帧中的部分帧参考同一个LTR帧(例如A)。
上述视频图像的发送设备中,编码模块1301对上述多个视频帧进行编码,得到经编码的码流。上述表示帧间参考关系的信息包括当前帧的前N帧的帧间参考关系的信息,上述当前帧的前N帧的帧间参考关系的信息表示当前帧的前N帧中的每一帧均参考与本帧时域距离最近的前向LTR帧,上述前向LTR帧为编码模块1301标记为LTR帧的已编码的视频帧,也就是说,本实施例中,编码模块1301在标记LTR帧时,无需等待接收端设备的反馈,因此可以实现一个RTT内标记多个LTR帧,从而可以大大缩短帧间参考距离,提升视频图像的编码质量。
在一种可能的实现方式中,上述表示帧间参考关系的信息还包括当前帧的后M帧的帧间参考关系的信息,所述当前帧的后M帧的帧间参考关系的信息表示所述后M帧中的每一帧均参考本帧的前一帧,其中N和M为正整数。例如,N和M的具体数值可以取决于网络。
需要说明的是,当前帧的后M帧与当前帧之间可以存在其他帧,也可以是当前帧的后M帧与当前帧之间是时域上紧邻的关系,针对前者的情况,其他帧的帧间参考关系可以与所述后M帧一样,也可以采用其他的帧间参考关系。
在网络差点高时延场景下,当LTR帧的间隔过长时,如果LTR帧的后续帧全部参考与本帧时域距离最近的前向LTR帧,势必造成帧间参考距离过长,从而导致编码质量明显下降,这时,编码模块确定当前帧参考与当前帧时域距离最近的目标前向LTR帧,当前帧的后M帧中的每一帧均参考本帧的前一帧,从而可以缩短帧间参考距离,提高网络差点环境下的编码质量,实现了自适应选择参考关系,例如全参考关系和逐帧参考关系的灵活组合,一定程度上避免参考离当前帧时域距离很长的参考帧,较大程度缓解了丢包导致的视频卡顿的现象与图像质量模糊的问题,实现了图像质量与图像流畅度之间达到较好的平衡。
在一种可能的实现方式中,所述表示帧间参考关系的信息还包括当前帧的后M帧的帧间参考关系的信息,所述当前帧的后M帧的帧间参考关系的信息表示所述后M帧中的每一帧均参考与本帧时域距离最近的前向LTR帧,其中N和M为正整数。例如,N和M的具体数值可以取决于网络。
需要说明的是,当前帧的后M帧与当前帧之间可以存在其他帧,也可以是当前帧的后M帧与当前帧之间是时域上紧邻的关系,针对前者的情况,其他帧的帧间参考关系可以与所述后M帧一样,也可以采用其他的帧间参考关系。
在一种可能的实现方式中,编码模块1301根据上述当前帧的前n帧的编码质量确定N的数值,n<N。在具体实现时,编码模块1301可以根据当前帧的前n帧的编码质量、上述视频图像的运动场景和接收端设备反馈的网络状态信息确定N的数值,上述网络状态信息可以包括网络丢包率、网络可用带宽和网络回环时间RTT中的一个或多个。
在一种可能的实现方式中,编码模块1301根据所述当前帧的前n帧的编码质量与编码质量阈值的比较结果,确定N的数值。
在一种可能的实现方式中,编码模块1301根据单位时间内所包括的视频帧数确定M的数值。在具体实现时,编码模块1301可以根据单位时间内所包括的视频帧数和上述视频图像的运动场景确定M的数值。其中,上述单位时间可以在具体实现时,根据系统性能和/或实现需求等自行设定,举例来说,上述单位时间可以为1秒。
在一种可能的实现方式中,LTR帧的标记间隔D与N和M具有函数关系。举例来说,上述函数关系可以为D=N+(M+1)。
在一种可能的实现方式中,上述表示帧间参考关系的信息还包括L帧的帧间参考关系的信息,L=(M1+1)+(M2+1)+…+(Mn+1),所述L帧时域上在所述M帧之后,所述L帧的帧间参考关系的信息表示所述(Mn+1)帧中的第一帧参考与所述第一帧时域距离最近的目标前向LTR帧,所述(Mn+1)帧中在第一帧之后的每一帧均参考本帧的前一帧,其中,L为正整数,n为大于或等于1的正整数。
其中,M1,M2,…,Mn的数值可以相同也可以不同,具体的数值大小可以根据实际的应用场景来确定。
在网络差点高时延场景下,当LTR帧的间隔过长时,如果LTR帧的后续帧全部参考与本帧时域距离最近的前向LTR帧,势必造成帧间参考距离过长,从而导致编码质量明显下降,这时,编码模块1301可以在对 M之后的L帧进行编码时,确定(Mn+1)帧中的第一帧参考与所述第一帧时域距离最近的目标前向LTR帧,(Mn+1)帧中在第一帧之后的每一帧均参考本帧的前一帧,从而可以缩短帧间参考距离,提高网络差点环境下的编码质量,实现了自适应选择参考关系,例如全参考关系和逐帧参考关系的灵活组合,一定程度上避免参考离当前帧时域距离很长的参考帧,较大程度缓解了丢包导致的视频卡顿的现象与图像质量模糊的问题,实现了图像质量与图像流畅度之间达到较好的平衡。
在一种可能的实现方式中,LTR帧的标记间隔D与N和L具有函数关系。举例来说,上述函数关系可以为D=N+L,L=(M1+1)+(M2+1)+…+(Mn+1)。
在一种可能的实现方式中,编码模块1301根据所述接收端设备反馈的网络状态信息,确定所述LTR帧的标记间隔D,所述网络状态信息包括:网络丢包率、网络可用带宽和网络回环时间RTT中的一个或多个。
在一种可能的实现方式中,所述LTR帧的标记间隔D用于编码模块1301标记LTR帧。
其中,编码模块1301根据LTR的标记间隔进行LTR帧的标记,可以实现一个RTT内标记多个LTR帧,并且本申请中,LTR的标记间隔不是固定设置的,而是动态变化的,可能是相同间隔,也可能是不同间隔,具体根据实际应用场景来确定,从而可以大大缩短帧间参考距离,提升视频图像的编码质量。并且,本实施例中,编码模块1301可以根据网络状况等信息,动态确定LTR的标记间隔,可以及时应对现网突发丢包、大丢包以及拥塞等网络差点场景,并可以兼顾流畅度与清晰度,实现最佳的视频通话体验。
图13所示实施例提供的视频图像的发送设备可用于执行本申请图3所示方法实施例的技术方案,其实现原理和技术效果可以进一步参考方法实施例中的相关描述。
图14为本申请视频图像的发送设备另一个实施例的结构示意图。应当理解的是,图14所示的视频图像的发送设备140可以对应于图1中的发送端A,或者可以对应于图10(b)或图11(b)中的发送设备,或者可以对应于图16(a)的装置900,或者可以对应于图16(b)的装置40,或者可以对应于图16(c)的装置400。其中,编码模块1402具体可以对应于图10(b)或图11(b)中的发送设备中的视频编码器,或者,具体可以对应于图16(b)所示的装置40中的编码器20。
本实施例中,上述视频图像包括多个视频帧,如图14所示,上述视频图像的发送设备140可以包括:判断模块1401、编码模块1402和传输模块1403;
判断模块1401,用于判断当前帧是否被标记为LTR帧;具体地,判断模块1401可以对应于图10(b)中的网络分析处理系统;
编码模块1402,用于当上述当前帧未被标记为LTR帧时,对未标记的当前帧进行编码,其中,所述编码过程包括:至少将表示当前帧的帧间参考关系的信息编入码流,所述当前帧的帧间参考关系表示所述当前帧参考与所述当前帧时域距离最近的前向LTR帧,所述前向LTR帧为编码模块1402标记为LTR帧的已编码的视频帧;或者,
当所述当前帧被标记为LTR帧时,对标记的当前帧进行编码,其中,所述编码过程包括:至少将表示当前帧的帧间参考关系的信息编入码流,所述当前帧的帧间参考关系表示所述当前帧参考与上述当前帧时域距离最近的目标前向LTR帧,其中,所述目标前向LTR帧为编码模块1402接收到接收端设备确认消息的前向LTR帧,具体地,所述目标前向LTR帧为编码模块1402标记为LTR帧并且接收到接收端设备发送的确认消息的已编码的视频帧,所述确认消息与所述目标前向LTR帧对应;具体地,编码模块1402可以对应于图10(b)中的视频编码器。
传输模块1403,用于发送经编码的码流。具体地,传输模块1403可以对应于图10(b)中的网络传输器。
上述视频图像的发送设备中,在编码模块1402对未标记的当前帧进行编码时,参考与未标记的当前帧时域距离最近的前向LTR帧,上述前向LTR帧为编码模块标记为LTR帧的已编码的视频帧,也就是说,本实施例中,编码模块在标记LTR帧时,无需等待接收端设备的反馈,因此可以实现一个RTT内标记多个LTR帧,从而可以大大缩短帧间参考距离,提升视频图像的编码质量。
在一种可能的实现方式中,判断模块1401,具体用于根据LTR帧的标记间隔,判断当前帧是否被标记为LTR帧。
如图15所示,图15为本申请视频图像的发送设备再一个实施例的结构示意图。应当理解的是,图15所示的视频图像的发送设备150可以对应于图1中的发送端A,或者可以对应于图10(b)或图11(b)中的发送设备,或者可以对应于图16(a)的装置900,或者可以对应于图16(b)的装置40,或者可以对应于图16(c)的装置400。其中,编码模块1402具体可以对应于图10(b)或图11(b)中的发送设备中的视频编码器,或者,具体可以对应于图16(b)所示的装置40中编码器20。
在一种可能的实现方式中,判断模块1401可以包括:获取子模块14011和标记子模块14012
获取子模块14011,用于获取当前帧和与上述当前帧时域距离最近的前向LTR帧之间的间隔帧数;
标记子模块14012,用于当上述间隔帧数等于所述LTR帧的标记间隔时,将所述当前帧标记为LTR帧;当所述间隔帧数不等于所述LTR帧的标记间隔,对所述当前帧不标记为LTR帧。
进一步地,判断模块1401,还用于根据所述接收端设备反馈的网络状态信息,确定所述LTR帧的标记间隔,所述网络状态信息包括:网络丢包率、网络可用带宽和网络回环时间RTT中的一个或多个。
在一种可能的实现方式中,所述当前帧的帧间参考关系表示所述当前帧参考与所述当前帧时域距离最近的前向LTR帧,所述前向LTR帧为发送端设备标记为LTR帧的已编码的视频帧;其中所述当前帧未被标记为LTR帧且所述当前帧的编码质量大于或等于编码质量阈值;或者,
当前帧的帧间参考关系表示所述当前帧参考与所述当前帧时域距离最近的前向LTR帧,所述前向LTR帧为发送端设备标记为LTR帧的已编码的视频帧;其中所述当前帧未被标记为LTR帧且所述当前帧的编码质量小于编码质量阈值。
如果当前帧未被标记为LTR帧,那么发送端设备在对当前帧进行编码时,参考与当前帧时域距离最近的前向LTR帧,在对当前帧编码之后,编码模块1402获取当前帧的编码质量,将当前帧的编码质量与编码质量阈值进行对比,如果当前帧的编码质量小于编码质量阈值,则在编码模块1402对当前帧的后一帧进行编码时,参考与后一帧时域距离最近的目标前向LTR帧,以提高当前帧的后一帧的编码质量。
在一种可能的实现方式中,编码模块1402,还用于对当前帧的后M+1帧进行编码,所述编码过程包括:将表示所述当前帧的后M+1帧的帧间参考关系的信息编入码流,所述后M+1帧的帧间参考关系表示所述后M+1帧中的第一帧参考与所述第一帧时域距离最近的目标前向LTR帧,所述后M+1帧中在第一帧之后的每一帧均参考本帧的前一帧,其中,M为正整数;其中所述当前帧未被标记为LTR帧且所述当前帧的编码质量小于编码质量阈值。
在网络差点高时延场景下,当LTR帧的间隔过长时,如果LTR帧的后续帧全部参考与本帧时域距离最近的前向LTR帧,势必造成帧间参考距离过长,从而导致编码质量明显下降,这时,编码模块1402可以在对当前帧的后M+1帧进行编码时,确定后M+1帧中的第一帧参考与所述第一帧时域距离最近的目标前向LTR帧,上述后M+1帧中在第一帧之后的每一帧均参考本帧的前一帧,从而可以缩短帧间参考距离,提高网络差点环境下的编码质量,实现了自适应选择参考关系,例如全参考关系和逐帧参考关系的灵活组合,一定程度上避免参考离当前帧时域距离很长的参考帧,较大程度缓解了丢包导致的视频卡顿的现象与图像质量模糊的问题,实现了图像质量与图像流畅度之间达到较好的平衡。
在一种可能的实现方式中,编码模块1402,还用于对当前帧的后一帧进行编码,所述编码过程包括:将表示所述当前帧的后一帧的帧间参考关系的信息编入码流,所述后一帧的帧间参考关系表示所述后一帧参考与所述本帧时域距离最近的目标前向LTR帧,其中所述当前帧未被标记为LTR帧且所述当前帧的编码质量小于编码质量阈值。
在一种可能的实现方式中,编码模块1402,用于根据单位时间内所包括的视频帧数确定M的数值。在具体实现时,编码模块可以根据单位时间内所包括的视频帧数和上述视频图像的运动场景确定M的数值。其中,上述单位时间可以在具体实现时,根据系统性能和/或实现需求等自行设定,举例来说,上述单位时间可以为1秒。
图14和图15所示实施例提供的视频图像的发送设备可用于执行本申请图6所示方法实施例的技术方案,其实现原理和技术效果可以进一步参考方法实施例中的相关描述。
应理解以上图13~图15所示的视频图像的发送设备的各个模块的划分仅仅是一种逻辑功能的划分,实际实现时可以全部或部分集成到一个物理实体上,也可以物理上分开。且这些模块可以全部以软件通过处理元件调用的形式实现;也可以全部以硬件的形式实现;还可以部分模块以软件通过处理元件调用的形式实现,部分模块通过硬件的形式实现。例如,编码模块可以为单独设立的处理元件,也可以集成在视频图像的发送设备,例如电子设备的某一个芯片中实现。其它模块的实现与之类似。此外这些模块全部或部分可以集成在一起,也可以独立实现。在实现过程中,上述方法的各步骤或以上各个模块可以通过处理器元件中的硬件的集成逻辑电路或者软件形式的指令完成。
例如,以上这些模块可以是被配置成实施以上方法的一个或多个集成电路,例如:一个或多个专用集成电路(Application Specific Integrated Circuit;以下简称:ASIC),或,一个或多个数字信号处理器(Digital Signal Processor;以下简称:DSP),或,一个或者多个现场可编程门阵列(Field Programmable Gate Array;以下简称:FPGA)等。再如,这些模块可以集成在一起,以片上系统(System-On-a-Chip;以下简称:SOC)的形式实现。
图16(a)为本申请视频通话设备一个实施例的结构示意图,上述视频通话设备可以为第一用户使用的视频通话设备,如图16(a)所示,上述视频通话设备可以包括:显示屏;图像采集器;一个或多个处理器;存储器;多个应用程序;以及一个或多个计算机程序。
其中,上述显示屏可以包括车载计算机(移动数据中心Mobile Data Center)的显示屏;上述图像采集器可以为摄像头Camera,或者车载传感器等;上述视频通话设备可以为移动终端(手机),智慧屏,无人机,智能网联车(Intelligent Connected Vehicle;以下简称:ICV),智能(汽)车(smart/intelligent car)或车载设备等设备。
其中所述一个或多个计算机程序被存储在所述存储器中,所述一个或多个计算机程序包括指令,当所述指令被所述设备执行时,使得所述设备执行以下步骤:响应于第一用户请求与第二用户进行视频通话的第一操作,建立所述第一用户与所述第二用户之间的视频通话连接;这里的视频通话连接是指第一用户使用的电子设备与第二用户使用的电子设备之间视频通话连接;
通过所述图像采集器采集包括所述第一用户的环境的视频图像,所述视频图像包括多个视频帧,这里的环境可以是第一用户所处的内部环境和/或外部环境的视频图像,比如车内环境和/或在行驶过程中智能化探测障碍物、感知周围环境;
对所述多个视频帧进行编码,得到经编码的码流,所述码流至少包括表示帧间参考关系的信息;
发送经编码的码流,其中,所述表示帧间参考关系的信息包括当前帧的帧间参考关系的信息和当前帧的前N帧的帧间参考关系的信息,其中:
所述当前帧的帧间参考关系的信息表示本帧参考与本帧时域距离最近的目标前向长期参考LTR帧,这里的本帧是指当前帧,其中,所述目标前向LTR帧为发送端设备标记为LTR帧,并且接收到接收端设备发送的确认消息的已编码的视频帧,所述确认消息与所述目标前向LTR帧对应;所述发送端设备为所述第一用户使用的视频通话设备,所述接收端设备为所述第二用户使用的视频通话设备;
所述当前帧的前N帧的帧间参考关系的信息表示所述前N帧中的每一帧均参考与本帧时域距离最近的前向LTR帧,所述前向LTR帧为所述发送端设备标记为LTR帧的已编码的视频帧,这里的本帧是指前N帧中的每一帧。
需要说明的是,当前帧的前N帧与当前帧之间可以存在其他帧,也可以是当前帧的前N帧与当前帧之间是时域上紧邻的关系,针对前者的情况,其他帧的帧间参考关系可以与所述前N帧一样,也可以采用其他的帧间参考关系。
换句话说,当前帧与时域距离最近的前向LTR帧(例如A)之间的多个帧中的所有帧可以参考同一个LTR帧(例如A),也可以是多个帧中的部分帧参考同一个LTR帧(例如A)。
上述视频通话设备中,响应于第一用户请求与第二用户进行视频通话的第一操作,建立第一用户与第二用户之间的视频通话连接之后,通过图像采集器采集包括第一用户的环境的视频图像,然后对上述视频图像包括的多个视频帧进行编码,得到经编码的码流,所述码流至少包括表示帧间参考关系的信息。上述表示帧间参考关系的信息包括当前帧的前N帧的帧间参考关系的信息,上述当前帧的前N帧的帧间参考关系的信息表示当前帧的前N帧中的每一帧均参考与本帧时域距离最近的前向LTR帧,上述前向LTR帧为发送端设备标记为LTR帧的已编码的视频帧,也就是说,本实施例中,发送端设备在标记LTR帧时,无需等待接收端设备的反馈,因此可以实现一个RTT内标记多个LTR帧,从而可以大大缩短帧间参考距离,提升视频图像的编码质量。
在一种可能的实现方式中,所述表示帧间参考关系的信息还包括当前帧的后M帧的帧间参考关系的信息,所述当前帧的后M帧的帧间参考关系的信息表示所述后M帧中的每一帧均参考本帧的前一帧,其中N和M为正整数。例如,N和M的具体数值可以取决于网络。
需要说明的是,当前帧的后M帧与当前帧之间可以存在其他帧,也可以是当前帧的后M帧与当前帧之间是时域上紧邻的关系,针对前者的情况,其他帧的帧间参考关系可以与所述后M帧一样,也可以采用其他的帧间参考关系。
在网络差点高时延场景下,当LTR帧的间隔过长时,如果LTR帧的后续帧全部参考与本帧时域距离最近的前向LTR帧,势必造成帧间参考距离过长,从而导致编码质量明显下降,这时,发送端设备确定当前帧参考与当前帧时域距离最近的目标前向LTR帧,当前帧的后M帧中的每一帧均参考本帧的前一帧,从而可以缩短帧间参考距离,提高网络差点环境下的编码质量,实现了自适应选择参考关系,例如全参考关系和逐帧参考关系的灵活组合,一定程度上避免参考离当前帧时域距离很长的参考帧,较大程度缓解了丢包导致的视频卡顿的现象与图像质量模糊的问题,实现了图像质量与图像流畅度之间达到较好的平衡。
在一种可能的实现方式中,所述表示帧间参考关系的信息还包括当前帧的后M帧的帧间参考关系的信息,所述当前帧的后M帧的帧间参考关系的信息表示所述后M帧中的每一帧均参考与本帧时域距离最近的前向LTR帧,其中N和M为正整数。例如,N和M的具体数值可以取决于网络。
需要说明的是,当前帧的后M帧与当前帧之间可以存在其他帧,也可以是当前帧的后M帧与当前帧之间是时域上紧邻的关系,针对前者的情况,其他帧的帧间参考关系可以与所述后M帧一样,也可以采用其他的帧间参考关系。
在一种可能的实现方式中,当上述指令被所述设备执行时,使得上述设备具体执行以下步骤:
根据所述当前帧的前n帧的编码质量确定N的数值,n<N。
在具体实现时,发送端设备可以根据当前帧的前n帧的编码质量、上述视频图像的运动场景和接收端设备反馈的网络状态信息确定N的数值,上述网络状态信息可以包括网络丢包率、网络可用带宽和网络回环时间RTT中的一个或多个。
在一种可能的实现方式中,当所述指令被所述设备执行时,使得所述设备具体执行以下步骤:
根据单位时间内所包括的视频帧数确定M的数值。
在具体实现时,发送端设备可以根据单位时间内所包括的视频帧数和上述视频图像的运动场景确定M的数值。其中,上述单位时间可以在具体实现时,根据系统性能和/或实现需求等自行设定,举例来说,上述单位时间可以为1秒。
在一种可能的实现方式中,LTR帧的标记间隔D与N和M具有函数关系。举例来说,上述函数关系可以为D=N+(M+1)。
在一种可能的实现方式中,所述表示帧间参考关系的信息还包括L帧的帧间参考关系的信息,L=(M1+1)+(M2+1)+…+(Mn+1),所述L帧时域上在所述M帧之后,所述L帧的帧间参考关系的信息表示所述(Mn+1)帧中的第一帧参考与所述第一帧时域距离最近的目标前向LTR帧,所述(Mn+1)帧中在第一帧之后的每一帧均参考本帧的前一帧,其中,L为正整数,n为大于或等于1的正整数。
其中,M1,M2,…,Mn的数值可以相同也可以不同,具体的数值大小可以根据实际的应用场景来确定。
在网络差点高时延场景下,当LTR帧的间隔过长时,如果LTR帧的后续帧全部参考与本帧时域距离最近的前向LTR帧,势必造成帧间参考距离过长,从而导致编码质量明显下降,这时,发送端设备可以在对M之后的L帧进行编码时,确定(Mn+1)帧中的第一帧参考与所述第一帧时域距离最近的目标前向LTR帧,(Mn+1)帧中在第一帧之后的每一帧均参考本帧的前一帧,从而可以缩短帧间参考距离,提高网络差点环境下的编码质量,实现了自适应选择参考关系,例如全参考关系和逐帧参考关系的灵活组合,一定程度上避免参考离当前帧时域距离很长的参考帧,较大程度缓解了丢包导致的视频卡顿的现象与图像质量模糊的问题,实现了图像质量与图像流畅度之间达到较好的平衡。
在一种可能的实现方式中,LTR帧的标记间隔D与N和L具有函数关系。举例来说,上述函数关系可以为D=N+L,L=(M1+1)+(M2+1)+…+(Mn+1)。
在一种可能的实现方式中,所述LTR帧的标记间隔D用于所述发送端设备标记LTR帧。
其中,所述发送端设备根据LTR的标记间隔进行LTR帧的标记,可以实现一个RTT内标记多个LTR帧。并且本申请中,LTR的标记间隔不是固定设置的,而是动态变化的,可能是相同间隔,也可能是不同间隔,具体根据实际应用场景来确定,从而可以大大缩短帧间参考距离,提升视频图像的编码质量。并且,本实施例中,发送端设备可以根据网络状况等信息,动态确定LTR的标记间隔,可以及时应对现网突发丢包、大丢包以及拥塞等网络差点场景,并可以兼顾流畅度与清晰度,实现最佳的视频通话体验。
图16(a)所示的电子设备可以是终端设备也可以是内置于上述终端设备的电路设备。该设备可以用于执行本申请图8所示实施例提供的方法中的功能/步骤。
如图16(a)所示,电子设备900包括处理器910和收发器920。可选地,该电子设备900还可以包括存储器930。其中,处理器910、收发器920和存储器930之间可以通过内部连接通路互相通信,传递控制和/或数据信号,该存储器930用于存储计算机程序,该处理器910用于从该存储器930中调用并运行该计算机程序。
可选地,电子设备900还可以包括天线940,用于将收发器920输出的无线信号发送出去。
上述处理器910和存储器930可以合成一个处理装置,更常见的是彼此独立的部件,处理器910用于执行存储器930中存储的程序代码来实现上述功能。具体实现时,该存储器930也可以集成在处理器910中,或者,独立于处理器910。
除此之外,为了使得电子设备900的功能更加完善,该电子设备900还可以包括输入单元960、显示单元970、音频电路980、摄像头990和传感器901等中的一个或多个,所述音频电路还可以包括扬声器982、麦克风984等。其中,显示单元970可以包括显示屏,摄像头990是图像采集器的一种具体示例,图像采集器可以为具有图像采集功能的设备,本实施例对图像采集器的具体形式不作限定。
可选地,上述电子设备900还可以包括电源950,用于给终端设备中的各种器件或电路提供电源。
应理解,图16(a)所示的电子设备900能够实现图8所示实施例提供的方法的各个过程。电子设备900中的各个模块的操作和/或功能,分别为了实现上述方法实施例中的相应流程。具体可参见图8所示方法实施例中的描述,为避免重复,此处适当省略详细描述。
应理解,图16(a)所示的电子设备900中的处理器910可以是片上系统SOC,该处理器910中可以包括中央处理器(Central Processing Unit;以下简称:CPU),还可以进一步包括其他类型的处理器,例如:图像处理器(Graphics Processing Unit;以下简称:GPU)等。
总之,处理器910内部的各部分处理器或处理单元可以共同配合实现之前的方法流程,且各部分处理器或处理单元相应的软件程序可存储在存储器930中。
本申请实施例还提供一种用于解码视频数据的设备,上述设备包括:
存储器,用于存储码流形式的视频数据;
视频解码器,用于从码流中解码出表示帧间参考关系的信息,其中,所述表示帧间参考关系的信息包括当前帧的帧间参考关系的信息和当前帧的前N帧的帧间参考关系的信息,其中:
所述当前帧的帧间参考关系的信息表示本帧参考与本帧时域距离最近的目标前向长期参考LTR帧;这里的本帧是指当前帧;
所述当前帧的前N帧的帧间参考关系的信息表示所述前N帧中的每一帧均参考与本帧时域距离最近的前向LTR帧;这里的本帧是指前N帧中的每一帧;
重建所述多个视频帧,其中,重建所述多个视频帧包括:根据当前帧的参考帧,重建当前视频帧。
本申请实施例还提供一种用于编码视频数据的设备,所述设备包括:
存储器,用于存储视频数据,所述视频数据包括一个或多个视频帧;
视频编码器,用于对所述多个视频帧进行编码,得到经编码的码流,所述码流至少包括表示帧间参考关系的信息;另外,上述码流中还可以包括已编码数据,例如:当前帧与参考帧的残差数据等;上述表示帧间参考关系的信息可以放在条带头(slice header)中;
发送所述经编码的码流,其中,所述表示帧间参考关系的信息包括当前帧的帧间参考关系的信息和当前帧的前N帧的帧间参考关系的信息, 其中:
所述当前帧的帧间参考关系的信息表示本帧参考与本帧时域距离最近的目标前向长期参考LTR帧,这里的本帧是指当前帧,其中,上述目标前向LTR帧为用于编码视频数据的设备接收到用于解码视频数据的设备的确认消息的前向LTR帧,具体地,上述目标前向LTR帧可以为用于编码视频数据的设备标记为LTR帧并且接收到用于解码视频数据的设备发送的确认消息的已编码的视频帧,上述确认消息与所述目标前向LTR帧对应;
所述当前帧的前N帧的帧间参考关系的信息表示所述前N帧中的每一帧均参考与本帧时域距离最近的前向LTR帧,这里的本帧是指前N帧中的每一帧,所述前向LTR帧为所述用于编码视频数据的设备标记为LTR帧的已编码的视频帧,本申请中,前向LTR帧存储在DPB中。
参见图16(b),图16(b)是根据一示例性实施例的包含编码器20和/或解码器30的视频译码装置40的实例的说明图。视频译码装置40可以实现本申请实施例的各种技术的组合。在所说明的实施方式中,视频译码装置40可以包含成像设备41、编码器20、解码器30(和/或藉由处理电路46的逻辑电路47实施的视频编/解码器)、天线42、一个或多个处理器43、一个或多个存储器44和/或显示设备45。
如图16(b)所示,成像设备41、天线42、处理电路46、逻辑电路47、编码器20、解码器30、处理器43、存储器44和/或显示设备45能够互相通信。如所论述,虽然用编码器20和解码器30绘示视频译码装置40,但在不同实例中,视频译码装置40可以只包含编码器20或只包含解码器30。
在一些实例中,天线42可以用于传输或接收视频数据的经编码比特流。另外,在一些实例中,显示设备45可以用于呈现视频数据。在一些实例中,逻辑电路47可以通过处理电路46实施。处理电路46可以包含专用集成电路(application-specific integrated circuit,ASIC)逻辑、图形处理器、通用处理器等。视频译码装置40也可以包含可选的处理器43,该可选处理器43类似地可以包含专用集成电路(application-specific  integrated circuit,ASIC)逻辑、图形处理器、通用处理器等。在一些实例中,逻辑电路47可以通过硬件实施,如视频编码专用硬件等,处理器43可以通过通用软件、操作系统等实施。另外,存储器44可以是任何类型的存储器,例如易失性存储器(例如,静态随机存取存储器(Static Random Access Memory,SRAM)、动态随机存储器(Dynamic Random Access Memory,DRAM)等)或非易失性存储器(例如,闪存等)等。在非限制性实例中,存储器44可以由超速缓存内存实施。在一些实例中,逻辑电路47可以访问存储器44(例如用于实施图像缓冲器)。在其它实例中,逻辑电路47和/或处理电路46可以包含存储器(例如,缓存等)用于实施图像缓冲器等。
在一些实例中,通过逻辑电路实施的编码器20可以包含(例如,通过处理电路46或存储器44实施的)图像缓冲器和(例如,通过处理电路46实施的)图形处理单元。图形处理单元可以通信耦合至图像缓冲器。图形处理单元可以包含通过逻辑电路47实施的编码器20,以实施参照图2和/或本文中所描述的任何其它编码器系统或子系统所论述的各种模块。逻辑电路可以用于执行本文所论述的各种操作。
在一些实例中,解码器30可以以类似方式通过逻辑电路47实施,以实施参照图3的解码器30和/或本文中所描述的任何其它解码器系统或子系统所论述的各种模块。在一些实例中,逻辑电路实施的解码器30可以包含(通过处理电路46或存储器44实施的)图像缓冲器和(例如,通过处理电路46实施的)图形处理单元。图形处理单元可以通信耦合至图像缓冲器。图形处理单元可以包含通过逻辑电路47实施的解码器30,以实施参照图3和/或本文中所描述的任何其它解码器系统或子系统所论述的各种模块。
在一些实例中,天线42可以用于接收视频数据的经编码比特流。如上论述,经编码比特流可以包含本文所论述的与编码视频帧相关的参考关系信息等。视频译码装置40还可包含耦合至天线42并用于解码经编码比特流的解码器30。显示设备45用于呈现视频帧。
应理解,关于信令语法元素,解码器30可以用于接收并解析这种语法元素,相应地解码相关视频数据。在一些例子中,编码器20可以将语法元素熵编码成经编码视频比特流。在此类实例中,解码器30可以解析这种语法元素,并相应地解码相关视频数据。
需要说明的是,本申请实施例描述的视频图像编码方法发生在编码器20处,本申请实施例描述的视频图像解码方法发生在解码器30处,本申请实施例中的编码器20和解码器30可以是例如H.263、H.264、HEVC、MPEG-2、MPEG-4、VP8、VP9等视频标准协议或者下一代视频标准协议(如H.266等)对应的编/解码器。
参见图16(c),图16(c)是本申请实施例提供的视频译码设备400(例如视频编码设备400或视频解码设备400)的结构示意图。视频译码设备400适于实施本文所描述的实施例。在一个实施例中,视频译码设备400可以是视频解码器(例如图16(b)的解码器30)或视频编码器(例如图16(b)的编码器20)。在另一个实施例中,视频译码设备400可以是上述图16(b)的解码器30或图16(b)的编码器20中的一个或多个组件。
视频译码设备400包括:用于接收数据的入口端口410和接收单元(Rx)420,用于处理数据的处理器、逻辑单元或中央处理器(CPU)430,用于传输数据的发射器单元(Tx)440(或者简称为发射器440)和出口端口450,以及,用于存储数据的存储器460(比如内存460)。视频译码设备400还可以包括与入口端口410、接收器单元420(或者简称为接收器420)、发射器单元440和出口端口450耦合的光电转换组件和电光(EO)组件,用于光信号或电信号的出口或入口。
处理器430通过硬件和软件实现。处理器430可以实现为一个或多个CPU芯片、核(例如,多核处理器)、FPGA、ASIC和DSP。处理器430与入口端口410、接收器单元420、发射器单元440、出口端口450和存储器460通信。处理器430包括译码模块470(例如编码模块470或解码模块470)。编码/解码模块470实现本文中所公开的实施例,以实现本申请实施例所提供的色度块预测方法。例如,编码/解码模块470实现、处理或提供各种编码操作。因此,通过编码/解码模块470为视频译码设备 400的功能提供了实质性的改进,并影响了视频译码设备400到不同状态的转换。或者,以存储在存储器460中并由处理器430执行的指令来实现编码/解码模块470。
存储器460包括一个或多个磁盘、磁带机和固态硬盘,可以用作溢出数据存储设备,用于在选择性地执行这些程序时存储程序,并存储在程序执行过程中读取的指令和数据。存储器460可以是易失性和/或非易失性的,可以是只读存储器(ROM)、随机存取存储器(RAM)、三态内容寻址存储器(Ternary Content-Addressable Memory;以下简称:TCAM)和/或静态随机存取存储器(SRAM)。
图17为本申请视频图像的接收设备一个实施例的结构示意图。上述视频图像包括多个视频帧,如图17所示,上述视频图像的接收设备170可以包括:解码模块1701和显示模块1702;应当理解的是,图17所示的视频图像的接收设备170可以对应于图1中的接收端B,或者可以对应于图10(b)或图11(b)中的接收设备,或者可以对应于图16(a)的装置900,或者可以对应于图16(b)的装置40,或者可以对应于图16(c)的装置400。
其中,解码模块1701可以对应于图10(b)或图11(b)中的接收设备中的视频解码器,或者,具体可以对应于图16(b)所示的装置40中解码器30。
解码模块1701,用于解析码流,以得到表示帧间参考关系的信息,其中,所述表示帧间参考关系的信息包括当前帧的帧间参考关系的信息和当前帧的前N帧的帧间参考关系的信息,其中:
所述当前帧的帧间参考关系的信息表示本帧参考与本帧时域距离最近的目标前向长期参考LTR帧,这里的本帧是指当前帧;其中,所述目标前向LTR帧为发送端设备标记为LTR帧,并且接收到接收端设备发送的确认消息的已编码的视频帧,所述确认消息与所述目标前向LTR帧对应;
所述当前帧的前N帧的帧间参考关系的信息表示所述前N帧中的每一帧均参考与本帧时域距离最近的前向LTR帧,所述前向LTR帧为所述发送端设备标记为LTR帧的已编码的视频帧;
解码模块1701,还用于重建所述多个视频帧,其中,重建所述多个视频帧包括:根据当前帧的参考帧,重建当前视频帧;
显示模块1702,用于显示所述视频图像。
需要说明的是,当前帧的前N帧与当前帧之间可以存在其他帧,也可以是当前帧的前N帧与当前帧之间是时域上紧邻的关系,针对前者的情况,其他帧的帧间参考关系可以与所述前N帧一样,也可以采用其他的帧间参考关系。
换句话说,当前帧与时域距离最近的前向LTR帧(例如A)之间的多个帧中的所有帧可以参考同一个LTR帧(例如A),也可以是多个帧中的部分帧参考同一个LTR帧(例如A)。
上述视频图像的接收设备中,在解码模块1701解析码流之后,可以得到表示帧间参考关系的信息,上述表示帧间参考关系的信息中包括当前帧的前N帧的帧间参考关系的信息,上述当前帧的前N帧的帧间参考关系的信息表示前N帧中的每一帧均参考与本帧时域距离最近的前向LTR帧,上述前向LTR帧为发送端设备标记为LTR帧的已编码的视频帧。也就是说,本实施例中,发送端设备在标记LTR帧时,无需等待接收端设备的反馈,因此可以实现一个RTT内标记多个LTR帧,从而可以大大缩短帧间参考距离,提升视频图像的编码质量。
在一种可能的实现方式中,所述表示帧间参考关系的信息还包括当前帧的后M帧的帧间参考关系的信息,所述当前帧的后M帧的帧间参考关系的信息表示所述后M帧中的每一帧均参考本帧的前一帧,其中N和M为正整数。例如,N和M的具体数值可以取决于网络。
需要说明的是,当前帧的后M帧与当前帧之间可以存在其他帧,也可以是当前帧的后M帧与当前帧之间是时域上紧邻的关系,针对前者的情况,其他帧的帧间参考关系可以与所述后M帧一样,也可以采用其他的帧间参考关系。
在网络差点高时延场景下,当LTR帧的间隔过长时,如果LTR帧的 后续帧全部参考与本帧时域距离最近的前向LTR帧,势必造成帧间参考距离过长,从而导致编码质量明显下降,这时,发送端设备确定当前帧参考与当前帧时域距离最近的目标前向LTR帧,当前帧的后M帧中的每一帧均参考本帧的前一帧,从而可以缩短帧间参考距离,提高网络差点环境下的编码质量,实现了自适应选择参考关系,例如全参考关系和逐帧参考关系的灵活组合,一定程度上避免参考离当前帧时域距离很长的参考帧,较大程度缓解了丢包导致的视频卡顿的现象与图像质量模糊的问题,实现了图像质量与图像流畅度之间达到较好的平衡。
在一种可能的实现方式中,所述表示帧间参考关系的信息还包括当前帧的后M帧的帧间参考关系的信息,所述当前帧的后M帧的帧间参考关系的信息表示所述后M帧中的每一帧均参考与本帧时域距离最近的前向LTR帧,其中N和M为正整数。例如,N和M的具体数值可以取决于网络。
需要说明的是,当前帧的后M帧与当前帧之间可以存在其他帧,也可以是当前帧的后M帧与当前帧之间是时域上紧邻的关系,针对前者的情况,其他帧的帧间参考关系可以与所述后M帧一样,也可以采用其他的帧间参考关系。
图17所示实施例提供的视频图像的接收设备可用于执行本申请图12所示方法实施例的技术方案,其实现原理和技术效果可以进一步参考方法实施例中的相关描述。
应理解以上图17所示的视频图像的接收设备的各个模块的划分仅仅是一种逻辑功能的划分,实际实现时可以全部或部分集成到一个物理实体上,也可以物理上分开。且这些模块可以全部以软件通过处理元件调用的形式实现;也可以全部以硬件的形式实现;还可以部分模块以软件通过处理元件调用的形式实现,部分模块通过硬件的形式实现。例如,编码模块可以为单独设立的处理元件,也可以集成在视频图像的接收设备,例如电子设备的某一个芯片中实现。其它模块的实现与之类似。此外这些模块全部或部分可以集成在一起,也可以独立实现。在实现过程中,上述方法的各步骤或以上各个模块可以通过处理器元件中的硬件的集成逻辑电路或 者软件形式的指令完成。
例如,以上这些模块可以是被配置成实施以上方法的一个或多个集成电路,例如:一个或多个特定ASIC,或,一个或多个DSP,或,一个或者多个FPGA等。再如,这些模块可以集成在一起,以片上系统SOC的形式实现。
本申请还提供一种视频图像的编码设备,所述设备包括存储介质和中央处理器,所述存储介质可以是非易失性存储介质,所述存储介质中存储有计算机可执行程序,所述中央处理器与所述非易失性存储介质连接,并执行所述计算机可执行程序以实现本申请图3所示实施例提供的方法。
本申请还提供一种视频图像的编码设备,所述设备包括存储介质和中央处理器,所述存储介质可以是非易失性存储介质,所述存储介质中存储有计算机可执行程序,所述中央处理器与所述非易失性存储介质连接,并执行所述计算机可执行程序以实现本申请图6所示实施例提供的方法。
本申请还提供一种视频图像的解码设备,所述设备包括存储介质和中央处理器,所述存储介质可以是非易失性存储介质,所述存储介质中存储有计算机可执行程序,所述中央处理器与所述非易失性存储介质连接,并执行所述计算机可执行程序以实现本申请图12所示实施例提供的方法。
上述存储器可以是只读存储器(read-only memory,ROM)、可存储静态信息和指令的其它类型的静态存储设备、随机存取存储器(random access memory,RAM)或可存储信息和指令的其它类型的动态存储设备,也可以是电可擦可编程只读存储器(electrically erasable programmable read-only memory,EEPROM)、只读光盘(compact disc read-only memory,CD-ROM)或其他光盘存储、光碟存储(包括压缩光碟、激光碟、光碟、数字通用光碟、蓝光光碟等)、磁盘存储介质或者其它磁存储设备,或者还可以是能够用于携带或存储具有指令或数据结构形式的期望的程序代码并能够由计算机存取的任何其它介质等。
以上各实施例中,涉及的处理器可以例如包括CPU、DSP、微控制器或数字信号处理器,还可包括GPU、嵌入式神经网络处理器(Neural-network Process Units;以下简称:NPU)和图像信号处理器 (Image Signal Processing;以下简称:ISP),该处理器还可包括必要的硬件加速器或逻辑处理硬件电路,如ASIC,或一个或多个用于控制本申请技术方案程序执行的集成电路等。此外,处理器可以具有操作一个或多个软件程序的功能,软件程序可以存储在存储介质中。
本申请实施例还提供一种计算机可读存储介质,该计算机可读存储介质中存储有计算机程序,当其在计算机上运行时,使得计算机执行本申请图3、图6、图8或图12所示实施例提供的方法。
本申请实施例还提供一种计算机程序产品,该计算机程序产品包括计算机程序,当其在计算机上运行时,使得计算机执行本申请图3、图6、图8或图12所示实施例提供的方法。
本申请实施例中,“至少一个”是指一个或者多个,“多个”是指两个或两个以上。“和/或”,描述关联对象的关联关系,表示可以存在三种关系,例如,A和/或B,可以表示单独存在A、同时存在A和B、单独存在B的情况。其中A,B可以是单数或者复数。字符“/”一般表示前后关联对象是一种“或”的关系。“以下至少一项”及其类似表达,是指的这些项中的任意组合,包括单项或复数项的任意组合。例如,a,b和c中的至少一项可以表示:a,b,c,a和b,a和c,b和c或a和b和c,其中a,b,c可以是单个,也可以是多个。
本领域普通技术人员可以意识到,本文中公开的实施例中描述的各单元及算法步骤,能够以电子硬件、计算机软件和电子硬件的结合来实现。这些功能究竟以硬件还是软件方式来执行,取决于技术方案的特定应用和设计约束条件。专业技术人员可以对每个特定的应用来使用不同方法来实现所描述的功能,但是这种实现不应认为超出本申请的范围。
所属领域的技术人员可以清楚地了解到,为描述的方便和简洁,上述描述的系统、装置和单元的具体工作过程,可以参考前述方法实施例中的对应过程,在此不再赘述。
在本申请所提供的几个实施例中,任一功能如果以软件功能单元的形式实现并作为独立的产品销售或使用时,可以存储在一个计算机可读取存储介质中。基于这样的理解,本申请的技术方案本质上或者说对现有技术做出贡献的部分或者该技术方案的部分可以以软件产品的形式体现出来,该计算机软件产品存储在一个存储介质中,包括若干指令用以使得一台计算机设备(可以是个人计算机,服务器,或者网络设备等)执行本申请各个实施例所述方法的全部或部分步骤。而前述的存储介质包括:U盘、移动硬盘、只读存储器(Read-Only Memory;以下简称:ROM)、随机存取存储器(Random Access Memory;以下简称:RAM)、磁碟或者光盘等各种可以存储程序代码的介质。
以上所述,仅为本申请的具体实施方式,任何熟悉本技术领域的技术人员在本申请揭露的技术范围内,可轻易想到变化或替换,都应涵盖在本申请的保护范围之内。本申请的保护范围应以所述权利要求的保护范围为准。

Claims (64)

  1. 一种视频图像的传输方法,所述视频图像包括多个视频帧,其特征在于,包括:
    对所述多个视频帧进行编码,得到经编码的码流,所述码流至少包括表示帧间参考关系的信息;
    发送所述经编码的码流,其中,所述表示帧间参考关系的信息包括当前帧的帧间参考关系的信息和当前帧的前N帧的帧间参考关系的信息,其中:
    所述当前帧的帧间参考关系的信息表示本帧参考与本帧时域距离最近的目标前向长期参考LTR帧,其中,所述目标前向LTR帧为发送端设备接收到接收端设备确认消息的前向LTR帧,所述确认消息与所述目标前向LTR帧对应;
    所述当前帧的前N帧的帧间参考关系的信息表示所述前N帧中的每一帧均参考与本帧时域距离最近的前向LTR帧。
  2. 根据权利要求1所述的方法,其特征在于,所述表示帧间参考关系的信息还包括当前帧的后M帧的帧间参考关系的信息,所述当前帧的后M帧的帧间参考关系的信息表示所述后M帧中的每一帧均参考本帧的前一帧,其中N和M为正整数。
  3. 根据权利要求1所述的方法,其特征在于,所述表示帧间参考关系的信息还包括当前帧的后M帧的帧间参考关系的信息,所述当前帧的后M帧的帧间参考关系的信息表示所述后M帧中的每一帧均参考与本帧时域距离最近的前向LTR帧,其中N和M为正整数。
  4. 根据权利要求1所述的方法,其特征在于,所述发送端设备根据所述当前帧的前n帧的编码质量确定N的数值,n<N。
  5. 根据权利要求2-4任意一项所述的方法,其特征在于,所述发送端设备根据单位时间内所包括的视频帧数确定M的数值。
  6. 根据权利要求5所述的方法,其特征在于,LTR帧的标记间隔D与N和M具有函数关系。
  7. 根据权利要求1所述的方法,其特征在于,所述表示帧间参考关系的信息还包括L帧的帧间参考关系的信息,L=(M1+1)+(M2+1)+…+(Mn+1),所述L帧时域上在所述M帧之后,所述L帧的帧间参考关系的信息表示所述(Mn+1)帧中的第一帧参考与所述第一帧时域距离最近的目标前向LTR帧,所述(Mn+1)帧中在第一帧之后的每一帧均参考本帧的前一帧,其中,L为正整数,n为大于或等于1的正整数。
  8. 根据权利要求7所述的方法,其特征在于,LTR帧的标记间隔D与N和L具有函数关系。
  9. 根据权利要求6或8所述的方法,其特征在于,所述发送端设备根据所述接收端设备反馈的网络状态信息,确定所述LTR帧的标记间隔D,所述网络状态信息包括:网络丢包率、网络可用带宽和网络回环时间RTT中的一个或多个。
  10. 根据权利要求6或8所述的方法,其特征在于,所述LTR帧的标记间隔D用于所述发送端设备标记LTR帧。
  11. 一种视频图像的传输方法,所述视频图像包括多个视频帧,其特征在于,包括:
    判断当前帧是否被标记为长期参考LTR帧;
    如果所述当前帧未被标记为LTR帧,则对未标记的当前帧进行编码,其中,所述编码过程包括:至少将表示当前帧的帧间参考关系的信息编入码流,所述当前帧的帧间参考关系表示所述当前帧参考与所述当前帧时域距离最近的前向LTR帧;或者,
    如果所述当前帧被标记为LTR帧,则对标记的当前帧进行编码,其中,所述编码过程包括:至少将表示当前帧的帧间参考关系的信息编入码流,所述当前帧的帧间参考关系表示所述当前帧参考与上述当前帧时域距离最近的目标前向LTR帧,其中,所述目标前向LTR帧为所述发送端设备接收到接收端设备确认消息的前向LTR帧,所述确认消息与所述目标前向LTR帧对应;
    发送经编码的码流。
  12. 根据权利要求11所述的方法,其特征在于,所述判断当前帧是否被标记为长期参考LTR帧包括:
    根据LTR帧的标记间隔,判断当前帧是否被标记为LTR帧。
  13. 根据权利要求12所述的方法,其特征在于,所述根据LTR帧的标记间隔,判断当前帧是否被标记为LTR帧包括:
    获取所述当前帧和与所述当前帧时域距离最近的前向LTR帧之间的间隔帧数;
    如果所述间隔帧数等于所述LTR帧的标记间隔,则将所述当前帧标记为LTR帧;
    如果所述间隔帧数不等于所述LTR帧的标记间隔,则对所述当前帧不标记为LTR帧。
  14. 根据权利要求12所述的方法,其特征在于,所述方法还包括:
    根据所述接收端设备反馈的网络状态信息,确定所述LTR帧的标记间隔,所述网络状态信息包括:网络丢包率、网络可用带宽和网络回环时间RTT中的一个或多个。
  15. 根据权利要求11-14任意一项所述的方法,其特征在于,
    所述当前帧的帧间参考关系表示所述当前帧参考与所述当前帧时域距离最近的前向LTR帧,所述前向LTR帧为发送端设备标记为LTR帧的已编码的视频帧;其中所述当前帧未被标记为LTR帧且所述当前帧的编码质量大于或等于编码质量阈值;或者,
    当前帧的帧间参考关系表示所述当前帧参考与所述当前帧时域距离最近的前向LTR帧,所述前向LTR帧为发送端设备标记为LTR帧的已编码的视频帧;其中所述当前帧未被标记为LTR帧且所述当前帧的编码质量小于编码质量阈值。
  16. 根据权利要求15所述的方法,其特征在于,还包括:
    对当前帧的后M+1帧进行编码,所述编码过程包括:将表示所述当前帧的后M+1帧的帧间参考关系的信息编入码流,所述后M+1帧的帧间参考关系表示所述后M+1帧中的第一帧参考与所述第一帧时域距离最近的目标前向LTR帧,所述后M+1帧中在第一帧之后的每一帧均参考本帧的前一帧,其中,M为正整数;其中所述当前帧未被标记为LTR帧且所述当前帧的编码质量小于编码质量阈值。
  17. 根据权利要求15所述的方法,其特征在于,还包括:
    对当前帧的后一帧进行编码,所述编码过程包括:将表示所述当前帧的后一帧的帧间参考关系的信息编入码流,所述后一帧的帧间参考关系表示所述后一帧参考与所述本帧时域距离最近的目标前向LTR帧,其中所述当前帧未被标记为LTR帧且所述当前帧的编码质量小于编码质量阈值。
  18. 根据权利要求16或17所述的方法,其特征在于,所述发送端设备根据单位时间内所包括的视频帧数确定M的数值。
  19. A video call method, applied to an electronic device having a display screen and an image collector, the method comprising:
    establishing, in response to a first operation in which a first user requests a video call with a second user, a video call connection between the first user and the second user;
    collecting, by the image collector, a video image of an environment including the first user, the video image comprising a plurality of video frames;
    encoding the plurality of video frames to obtain an encoded bitstream, the bitstream comprising at least information indicating inter-frame reference relationships; and
    sending the encoded bitstream, wherein the information indicating the inter-frame reference relationships comprises information about an inter-frame reference relationship of a current frame and information about inter-frame reference relationships of the N frames preceding the current frame, wherein:
    the information about the inter-frame reference relationship of the current frame indicates that this frame references the target forward long-term reference (LTR) frame temporally closest to this frame; the sending device is the electronic device used by the first user, and the receiving device is the electronic device used by the second user; and
    the information about the inter-frame reference relationships of the N frames preceding the current frame indicates that each of the N preceding frames references the forward LTR frame temporally closest to that frame.
  20. The method according to claim 19, wherein the information indicating the inter-frame reference relationships further comprises information about inter-frame reference relationships of the M frames following the current frame, the information about the inter-frame reference relationships of the following M frames indicating that each of the M following frames references its immediately preceding frame, where N and M are positive integers.
  21. The method according to claim 19, wherein the information indicating the inter-frame reference relationships further comprises information about inter-frame reference relationships of the M frames following the current frame, the information about the inter-frame reference relationships of the following M frames indicating that each of the M following frames references the forward LTR frame temporally closest to that frame, where N and M are positive integers.
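The reference layout of claims 19-20 around a current frame c — the N preceding frames each referencing the nearest forward LTR frame, the current frame referencing the acknowledged target forward LTR frame, and the M following frames chaining to their predecessors — can be sketched as follows. Frame indexing, labels, and the function name are editorial assumptions.

```python
# Map each frame index near the current frame c to what it references,
# following claims 19-20: preceding N frames -> nearest forward LTR frame,
# current frame -> acknowledged target forward LTR frame, following M frames
# -> their immediately preceding frame.
def reference_layout(c: int, n: int, m: int) -> dict:
    layout = {}
    for i in range(c - n, c):
        layout[i] = "nearest_forward_ltr"   # claim 19: the N preceding frames
    layout[c] = "target_forward_ltr"        # claim 19: the current frame
    for i in range(c + 1, c + m + 1):
        layout[i] = i - 1                   # claim 20: chain to the previous frame
    return layout
```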
  22. The method according to claim 19, wherein the sending device determines the value of N according to the encoding quality of the n frames preceding the current frame, n<N.
  23. The method according to any one of claims 20-22, wherein the sending device determines the value of M according to the number of video frames included per unit time.
  24. The method according to claim 23, wherein a marking interval D of LTR frames has a functional relationship with N and M.
  25. The method according to claim 19, wherein the information indicating inter-frame reference relationships further comprises information about inter-frame reference relationships of L frames, L=(M1+1)+(M2+1)+…+(Mn+1), the L frames temporally follow the M frames, the information about the inter-frame reference relationships of the L frames indicates that the first frame in the (Mn+1) frames references the target forward LTR frame temporally closest to the first frame, and each frame after the first frame in the (Mn+1) frames references its immediately preceding frame, where L is a positive integer and n is a positive integer greater than or equal to 1.
  26. The method according to claim 25, wherein the marking interval D of LTR frames has a functional relationship with N and L.
  27. The method according to claim 24 or 26, wherein the marking interval D of LTR frames is used by the sending device to mark LTR frames.
  28. A video image display method, the video image comprising a plurality of video frames, the method comprising:
    parsing a bitstream to obtain information indicating inter-frame reference relationships, wherein the information indicating the inter-frame reference relationships comprises information about an inter-frame reference relationship of a current frame and information about inter-frame reference relationships of the N frames preceding the current frame, wherein:
    the information about the inter-frame reference relationship of the current frame indicates that this frame references the target forward long-term reference (LTR) frame temporally closest to this frame; and
    the information about the inter-frame reference relationships of the N frames preceding the current frame indicates that each of the N preceding frames references the forward LTR frame temporally closest to that frame;
    reconstructing the plurality of video frames, wherein reconstructing the plurality of video frames comprises: reconstructing the current video frame according to the reference frame of the current frame; and
    displaying the video image.
  29. The method according to claim 28, wherein the information indicating the inter-frame reference relationships further comprises information about inter-frame reference relationships of the M frames following the current frame, the information about the inter-frame reference relationships of the following M frames indicating that each of the M following frames references its immediately preceding frame, where N and M are positive integers.
  30. The method according to claim 28, wherein the information indicating the inter-frame reference relationships further comprises information about inter-frame reference relationships of the M frames following the current frame, the information about the inter-frame reference relationships of the following M frames indicating that each of the M following frames references the forward LTR frame temporally closest to that frame, where N and M are positive integers.
  31. A device for sending video images, the video image comprising a plurality of video frames, the device comprising:
    an encoding module, configured to encode the plurality of video frames to obtain an encoded bitstream, the bitstream comprising at least information indicating inter-frame reference relationships; and
    a transmission module, configured to send the encoded bitstream, wherein the information indicating the inter-frame reference relationships comprises information about an inter-frame reference relationship of a current frame and information about inter-frame reference relationships of the N frames preceding the current frame, wherein:
    the information about the inter-frame reference relationship of the current frame indicates that this frame references the target forward long-term reference (LTR) frame temporally closest to this frame, wherein the target forward LTR frame is a forward LTR frame for which the sending device has received an acknowledgment message from the receiving device, the acknowledgment message corresponding to the target forward LTR frame; and
    the information about the inter-frame reference relationships of the N frames preceding the current frame indicates that each of the N preceding frames references the forward LTR frame temporally closest to that frame.
  32. The device according to claim 31, wherein the information indicating the inter-frame reference relationships further comprises information about inter-frame reference relationships of the M frames following the current frame, the information about the inter-frame reference relationships of the following M frames indicating that each of the M following frames references its immediately preceding frame, where N and M are positive integers.
  33. The device according to claim 31, wherein the information indicating the inter-frame reference relationships further comprises information about inter-frame reference relationships of the M frames following the current frame, the information about the inter-frame reference relationships of the following M frames indicating that each of the M following frames references the forward LTR frame temporally closest to that frame, where N and M are positive integers.
  34. The device according to claim 31, wherein the encoding module determines the value of N according to the encoding quality of the n frames preceding the current frame, n<N.
  35. The device according to any one of claims 32-34, wherein the encoding module determines the value of M according to the number of video frames included per unit time.
  36. The device according to claim 35, wherein a marking interval D of LTR frames has a functional relationship with N and M.
  37. The device according to claim 31, wherein the information indicating inter-frame reference relationships further comprises information about inter-frame reference relationships of L frames, L=(M1+1)+(M2+1)+…+(Mn+1), the L frames temporally follow the M frames, the information about the inter-frame reference relationships of the L frames indicates that the first frame in the (Mn+1) frames references the target forward LTR frame temporally closest to the first frame, and each frame after the first frame in the (Mn+1) frames references its immediately preceding frame, where L is a positive integer and n is a positive integer greater than or equal to 1.
  38. The device according to claim 37, wherein the marking interval D of LTR frames has a functional relationship with N and L.
  39. The device according to claim 36 or 38, wherein the encoding module determines the marking interval D of LTR frames according to network status information fed back by the receiving device, the network status information comprising one or more of: network packet loss rate, available network bandwidth, and network round-trip time (RTT).
  40. The device according to claim 36 or 38, wherein the marking interval D of LTR frames is used by the encoding module to mark LTR frames.
  41. A device for sending video images, the video image comprising a plurality of video frames, the device comprising:
    a judging module, configured to determine whether a current frame is marked as a long-term reference (LTR) frame;
    an encoding module, configured to: when the current frame is not marked as an LTR frame, encode the unmarked current frame, wherein the encoding process comprises: encoding at least information indicating an inter-frame reference relationship of the current frame into a bitstream, the inter-frame reference relationship of the current frame indicating that the current frame references the forward LTR frame temporally closest to the current frame; or,
    when the current frame is marked as an LTR frame, encode the marked current frame, wherein the encoding process comprises: encoding at least information indicating an inter-frame reference relationship of the current frame into a bitstream, the inter-frame reference relationship of the current frame indicating that the current frame references the target forward LTR frame temporally closest to the current frame, wherein the target forward LTR frame is a forward LTR frame for which the encoding module has received an acknowledgment message from the receiving device, the acknowledgment message corresponding to the target forward LTR frame; and
    a transmission module, configured to send the encoded bitstream.
  42. The device according to claim 41, wherein:
    the judging module is specifically configured to determine, according to a marking interval of LTR frames, whether the current frame is marked as an LTR frame.
  43. The device according to claim 42, wherein the judging module comprises:
    an obtaining submodule, configured to obtain the number of frames between the current frame and the forward LTR frame temporally closest to the current frame; and
    a marking submodule, configured to mark the current frame as an LTR frame when the number of interval frames equals the marking interval of LTR frames, and not to mark the current frame as an LTR frame when the number of interval frames does not equal the marking interval of LTR frames.
  44. The device according to claim 42, wherein:
    the judging module is further configured to determine the marking interval of LTR frames according to network status information fed back by the receiving device, the network status information comprising one or more of: network packet loss rate, available network bandwidth, and network round-trip time (RTT).
  45. The device according to any one of claims 41-44, wherein:
    the inter-frame reference relationship of the current frame indicates that the current frame references the forward LTR frame temporally closest to the current frame, the forward LTR frame being an encoded video frame that the sending device has marked as an LTR frame, where the current frame is not marked as an LTR frame and the encoding quality of the current frame is greater than or equal to an encoding quality threshold; or,
    the inter-frame reference relationship of the current frame indicates that the current frame references the forward LTR frame temporally closest to the current frame, the forward LTR frame being an encoded video frame that the sending device has marked as an LTR frame, where the current frame is not marked as an LTR frame and the encoding quality of the current frame is less than the encoding quality threshold.
  46. The device according to claim 45, wherein:
    the encoding module is further configured to encode the M+1 frames following the current frame, the encoding process comprising: encoding information indicating inter-frame reference relationships of the M+1 frames following the current frame into the bitstream, the inter-frame reference relationships of the following M+1 frames indicating that the first frame in the following M+1 frames references the target forward LTR frame temporally closest to the first frame, and each frame after the first frame in the following M+1 frames references its immediately preceding frame, where M is a positive integer; where the current frame is not marked as an LTR frame and the encoding quality of the current frame is less than the encoding quality threshold.
  47. The device according to claim 45, wherein:
    the encoding module is further configured to encode the frame following the current frame, the encoding process comprising: encoding information indicating the inter-frame reference relationship of the frame following the current frame into the bitstream, the inter-frame reference relationship of the following frame indicating that the following frame references the target forward LTR frame temporally closest to it, where the current frame is not marked as an LTR frame and the encoding quality of the current frame is less than the encoding quality threshold.
  48. The device according to claim 46 or 47, wherein:
    the encoding module is configured to determine the value of M according to the number of video frames included per unit time.
  49. A video call device, comprising:
    a display screen; an image collector; one or more processors; a memory; a plurality of application programs; and one or more computer programs, wherein the one or more computer programs are stored in the memory, the one or more computer programs comprise instructions, and when the instructions are executed by the device, the device is caused to perform the following steps:
    establishing, in response to a first operation in which a first user requests a video call with a second user, a video call connection between the first user and the second user;
    collecting, by the image collector, a video image of an environment including the first user, the video image comprising a plurality of video frames;
    encoding the plurality of video frames to obtain an encoded bitstream, the bitstream comprising at least information indicating inter-frame reference relationships; and
    sending the encoded bitstream, wherein the information indicating the inter-frame reference relationships comprises information about an inter-frame reference relationship of a current frame and information about inter-frame reference relationships of the N frames preceding the current frame, wherein:
    the information about the inter-frame reference relationship of the current frame indicates that this frame references the target forward long-term reference (LTR) frame temporally closest to this frame; the sending device is the video call device used by the first user, and the receiving device is the video call device used by the second user; and
    the information about the inter-frame reference relationships of the N frames preceding the current frame indicates that each of the N preceding frames references the forward LTR frame temporally closest to that frame.
  50. The device according to claim 49, wherein the information indicating the inter-frame reference relationships further comprises information about inter-frame reference relationships of the M frames following the current frame, the information about the inter-frame reference relationships of the following M frames indicating that each of the M following frames references its immediately preceding frame, where N and M are positive integers.
  51. The device according to claim 49, wherein the information indicating the inter-frame reference relationships further comprises information about inter-frame reference relationships of the M frames following the current frame, the information about the inter-frame reference relationships of the following M frames indicating that each of the M following frames references the forward LTR frame temporally closest to that frame, where N and M are positive integers.
  52. The device according to claim 49, wherein when the instructions are executed by the device, the device is caused to specifically perform the following step:
    determining the value of N according to the encoding quality of the n frames preceding the current frame, n<N.
  53. The device according to any one of claims 50-52, wherein when the instructions are executed by the device, the device is caused to specifically perform the following step:
    determining the value of M according to the number of video frames included per unit time.
  54. The device according to claim 53, wherein a marking interval D of LTR frames has a functional relationship with N and M.
  55. The device according to claim 49, wherein the information indicating inter-frame reference relationships further comprises information about inter-frame reference relationships of L frames, L=(M1+1)+(M2+1)+…+(Mn+1), the L frames temporally follow the M frames, the information about the inter-frame reference relationships of the L frames indicates that the first frame in the (Mn+1) frames references the target forward LTR frame temporally closest to the first frame, and each frame after the first frame in the (Mn+1) frames references its immediately preceding frame, where L is a positive integer and n is a positive integer greater than or equal to 1.
  56. The device according to claim 55, wherein the marking interval D of LTR frames has a functional relationship with N and L.
  57. The device according to claim 54 or 56, wherein the marking interval D of LTR frames is used by the sending device to mark LTR frames.
  58. A device for receiving video images, the video image comprising a plurality of video frames, the device comprising:
    a decoding module, configured to parse a bitstream to obtain information indicating inter-frame reference relationships, wherein the information indicating the inter-frame reference relationships comprises information about an inter-frame reference relationship of a current frame and information about inter-frame reference relationships of the N frames preceding the current frame, wherein:
    the information about the inter-frame reference relationship of the current frame indicates that this frame references the target forward long-term reference (LTR) frame temporally closest to this frame; and
    the information about the inter-frame reference relationships of the N frames preceding the current frame indicates that each of the N preceding frames references the forward LTR frame temporally closest to that frame;
    the decoding module being further configured to reconstruct the plurality of video frames, wherein reconstructing the plurality of video frames comprises: reconstructing the current video frame according to the reference frame of the current frame; and
    a display module, configured to display the video image.
  59. The device according to claim 58, wherein the information indicating the inter-frame reference relationships further comprises information about inter-frame reference relationships of the M frames following the current frame, the information about the inter-frame reference relationships of the following M frames indicating that each of the M following frames references its immediately preceding frame, where N and M are positive integers.
  60. The device according to claim 58, wherein the information indicating the inter-frame reference relationships further comprises information about inter-frame reference relationships of the M frames following the current frame, the information about the inter-frame reference relationships of the following M frames indicating that each of the M following frames references the forward LTR frame temporally closest to that frame, where N and M are positive integers.
  61. A computer-readable storage medium, wherein the computer-readable storage medium stores a computer program that, when run on a computer, causes the computer to perform the method according to any one of claims 1-10.
  62. A computer-readable storage medium, wherein the computer-readable storage medium stores a computer program that, when run on a computer, causes the computer to perform the method according to any one of claims 11-18.
  63. A computer-readable storage medium, wherein the computer-readable storage medium stores a computer program that, when run on a computer, causes the computer to perform the method according to any one of claims 19-27.
  64. A computer-readable storage medium, wherein the computer-readable storage medium stores a computer program that, when run on a computer, causes the computer to perform the method according to any one of claims 28-30.
PCT/CN2020/116541 2019-09-19 2020-09-21 Video image transmission method, sending device, video call method and device WO2021052500A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP20866583.6A EP4024867A4 (en) 2019-09-19 2020-09-21 VIDEO IMAGE TRANSMISSION METHOD, SENDING DEVICE, AND VIDEO CALL METHOD AND DEVICE
US17/698,405 US20220210469A1 (en) 2019-09-19 2022-03-18 Method For Transmitting Video Picture, Device For Sending Video Picture, And Video Call Method And Device

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910888693.5A CN112532908B (zh) 2019-09-19 2019-09-19 Video image transmission method, sending device, video call method and device
CN201910888693.5 2019-09-19

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/698,405 Continuation US20220210469A1 (en) 2019-09-19 2022-03-18 Method For Transmitting Video Picture, Device For Sending Video Picture, And Video Call Method And Device

Publications (1)

Publication Number Publication Date
WO2021052500A1 true WO2021052500A1 (zh) 2021-03-25

Family

ID=74883927

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/116541 WO2021052500A1 (zh) 2019-09-19 2020-09-21 Video image transmission method, sending device, video call method and device

Country Status (4)

Country Link
US (1) US20220210469A1 (zh)
EP (1) EP4024867A4 (zh)
CN (1) CN112532908B (zh)
WO (1) WO2021052500A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113573063A (zh) * 2021-06-16 2021-10-29 Baiguoyuan Technology (Singapore) Co., Ltd. Video encoding and decoding method and apparatus

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114567799B (zh) * 2022-02-23 2024-04-05 Hangzhou NetEase Zhiqi Technology Co., Ltd. Video stream data transmission method, apparatus, storage medium, and electronic device
WO2024017135A1 (zh) * 2022-07-21 2024-01-25 Huawei Technologies Co., Ltd. Method and apparatus for processing images

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102045557A (zh) * 2009-10-20 2011-05-04 Hongfujin Precision Industry (Shenzhen) Co., Ltd. Video encoding/decoding method and video encoding and decoding apparatus using the same
CN103024374A (zh) * 2011-10-20 2013-04-03 Skype Transmission of video data
CN103167283A (zh) * 2011-12-19 2013-06-19 Huawei Technologies Co., Ltd. Video encoding method and device
CN105794207A (zh) * 2013-12-02 2016-07-20 Qualcomm Incorporated Reference picture selection
CN106937168A (zh) * 2015-12-30 2017-07-07 Zhangying Information Technology (Shanghai) Co., Ltd. Video encoding method using long-term reference frames, electronic device, and system
JP2018019195A (ja) * 2016-07-27 2018-02-01 Panasonic IP Management Co., Ltd. Moving image generation method and moving image generation apparatus

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060083298A1 (en) * 2004-10-14 2006-04-20 Nokia Corporation Reference picture management in video coding
US20110249729A1 (en) * 2010-04-07 2011-10-13 Apple Inc. Error resilient hierarchical long term reference frames
CN101924924A (zh) * 2010-07-28 2010-12-22 Xiamen Yaxon Network Co., Ltd. Adaptive transmission method and system for wireless remote video surveillance
US20130343459A1 (en) * 2012-06-22 2013-12-26 Nokia Corporation Method and apparatus for video coding
CN106817585B (zh) * 2015-12-02 2020-05-01 Zhangying Information Technology (Shanghai) Co., Ltd. Video encoding method using long-term reference frames, electronic device, and system
US10652532B2 (en) * 2016-07-06 2020-05-12 Agora Lab, Inc. Method and apparatus for reference frame management for video communication
CN107948654A (zh) * 2017-11-21 2018-04-20 Guangzhou Baiguoyuan Information Technology Co., Ltd. Video sending and receiving method, apparatus, and terminal
US11032567B2 (en) * 2018-07-20 2021-06-08 Intel Corporation Automatic adaptive long term reference frame selection for video process and video coding
CN112995685B (zh) * 2021-02-05 2023-02-17 Hangzhou NetEase Zhiqi Technology Co., Ltd. Data sending method and apparatus, data receiving method and apparatus, medium, and device
CN114465993B (zh) * 2022-01-24 2023-11-10 Hangzhou NetEase Zhiqi Technology Co., Ltd. Video encoding method, video decoding method and apparatus, medium, and computing device
CN114567799B (zh) * 2022-02-23 2024-04-05 Hangzhou NetEase Zhiqi Technology Co., Ltd. Video stream data transmission method, apparatus, storage medium, and electronic device



Also Published As

Publication number Publication date
CN112532908A (zh) 2021-03-19
US20220210469A1 (en) 2022-06-30
CN112532908B (zh) 2022-07-19
EP4024867A4 (en) 2022-12-21
EP4024867A1 (en) 2022-07-06

Similar Documents

Publication Publication Date Title
WO2021052500A1 (zh) Video image transmission method, sending device, video call method and device
JP7221957B2 (ja) Game engine application for video encoder rendering
WO2017219896A1 (zh) Video stream transmission method and apparatus
JP5882547B2 (ja) Optimization of coding and transmission parameters within pictures upon scene changes
WO2021057481A1 (zh) Video encoding/decoding method and related apparatus
CN102396225B (zh) Dual-mode compression of images and videos for reliable real-time transmission
WO2021057705A1 (zh) Video encoding/decoding method and related apparatus
CN113727185B (zh) Video frame playback method and system
WO2012119459A1 (zh) Data transmission method, apparatus and system
WO2023142716A1 (zh) Encoding method, real-time communication method, apparatus, device, and storage medium
US20220021872A1 (en) Video encoding method and apparatus, video decoding method and apparatus, storage medium, and electronic device
CN111263192A (zh) Video processing method and related device
WO2022156688A1 (zh) Layered encoding and decoding method and apparatus
US11943473B2 (en) Video decoding method and apparatus, video encoding method and apparatus, storage medium, and electronic device
US11979577B2 (en) Video encoding method, video decoding method, and related apparatuses
US20240214562A1 (en) Video coding with dynamic groups of pictures
WO2024187940A1 (zh) Video data transmission method, device, storage medium, and system
WO2021237474A1 (zh) Video transmission method, apparatus, and system
CN113747191A (zh) UAV-based live video streaming method, system, device, and storage medium
WO2023169424A1 (zh) Encoding and decoding method and electronic device
WO2021056575A1 (zh) Low-latency joint source-channel coding method and related device
WO2023050921A1 (zh) Method for sending video and audio data, display method, sending end, and receiving end
CN116962613A (zh) Data transmission method and apparatus, computer device, and storage medium
CN111225238A (zh) Information processing method and related device
US20220078454A1 (en) Video encoding method, video decoding method, and related apparatuses

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20866583

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

ENP Entry into the national phase

Ref document number: 2020866583

Country of ref document: EP

Effective date: 20220331