WO2018076370A1 - Method and device for processing video frames - Google Patents

Method and device for processing video frames

Info

Publication number
WO2018076370A1
WO2018076370A1 PCT/CN2016/104119 CN2016104119W WO2018076370A1 WO 2018076370 A1 WO2018076370 A1 WO 2018076370A1 CN 2016104119 W CN2016104119 W CN 2016104119W WO 2018076370 A1 WO2018076370 A1 WO 2018076370A1
Authority
WO
WIPO (PCT)
Prior art keywords
video frame
parameter
value
video
frame
Prior art date
Application number
PCT/CN2016/104119
Other languages
English (en)
French (fr)
Inventor
张金雷
王妙锋
石中博
王世通
薛东
刘海啸
罗巍
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司
Priority to CN201680080601.5A priority Critical patent/CN108713318A/zh
Priority to PCT/CN2016/104119 priority patent/WO2018076370A1/zh
Publication of WO2018076370A1 publication Critical patent/WO2018076370A1/zh

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00 Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/60 Control of cameras or camera modules
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N5/00 Details of television systems
    • H04N5/76 Television signal recording

Definitions

  • the present invention relates to the field of video processing technologies, and in particular, to a method and a device for processing a video frame.
  • one method is to reduce the frame rate directly: the sending end actively discards some of the video frames according to the network state, so as to avoid packet loss during transmission that would leave the decoder unable to decode correctly.
  • the video frame is generally discarded at equal intervals. For example, if the frame rate is reduced from 30 fps to 15 fps, one frame is discarded every two frames, and uniform discarding is performed.
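The equal-interval discarding described above can be sketched in a few lines of Python. This is only an illustration: the function name `drop_uniformly` and the restriction to integer frame-rate ratios are assumptions made for the example, not part of the source.

```python
def drop_uniformly(frames, source_fps, target_fps):
    """Keep every (source_fps // target_fps)-th frame and discard the rest.

    Hypothetical helper: only integer frame-rate ratios are handled here.
    """
    if target_fps <= 0 or source_fps % target_fps != 0:
        raise ValueError("this sketch only handles integer frame-rate ratios")
    step = source_fps // target_fps
    return frames[::step]


# One second of video at 30 fps, with frames represented by their indices.
one_second = list(range(30))
kept = drop_uniformly(one_second, 30, 15)
```

Note that the selection depends only on frame position, never on frame content, which is why a sharp frame is as likely to be dropped as a blurred one.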
  • however, this discarding method may cause high-quality video frames to be discarded.
  • for example, if the user holds the device by hand while capturing video instead of fixing it in place, shaking of the device during capture easily blurs some of the captured frames. Therefore, simply discarding video frames evenly over time is likely to discard high-quality video frames that were captured.
  • Embodiments of the present invention provide a method and a device for processing a video frame, so as to ensure that a high quality video frame is preserved.
  • a method for processing a video frame is provided.
  • the method is applicable to a first device that collects video, and the device transmits the collected video to a second device in real time.
  • the method includes: acquiring, by the first device, a value of a parameter of the first video frame, where the first video frame is any frame of the video captured by the first device, and the value of the parameter is used to indicate the degree of clarity of the first video frame.
  • the first device compares the value of the parameter with a preset value range and determines whether the value of the parameter is within the preset value range. If the value of the parameter is within the preset value range, the first device retains the first video frame.
  • the value of the parameter of the first video frame is compared with the preset value range. If the value of the parameter is within the preset value range, the first video frame is retained, that is, it is not discarded. In other words, even if some video frames need to be discarded, the frames to discard are selected according to video frame quality, and high-quality video frames are kept as far as possible, so that the resulting video is as clear as possible and the video quality is improved.
  • the first device may encode the first video frame according to the first coding manner.
  • when encoding is performed according to the first encoding manner, the number of bits used is greater than a preset bit number threshold.
  • after the first video frame is retained, it needs to be encoded. Because the value of the parameter of the first video frame is within the preset value range, which indicates that the quality of the first video frame is relatively good, the first video frame may be encoded by using the first coding mode. The larger number of bits used in coding allows the encoded first video frame to be restored more faithfully when decoded, thereby increasing the number of high-quality video frames in the video and improving the quality of the entire video.
  • after the first device determines whether the value of the parameter is within the preset value range, if the first device determines that the value of the parameter is not within the preset value range, the first device determines whether the current frame rate is greater than the target frame rate.
  • if the value of the parameter is not within the preset value range, the current frame rate can be further considered. If the current frame rate is less than or equal to the target frame rate, the current frame rate meets the requirement, and the first video frame is not discarded for the time being. If the first video frame is not discarded, it is encoded.
  • in this case the second coding mode can be used to encode the first video frame; fewer bits are used in encoding, which saves coding resources.
  • if the current frame rate is greater than the target frame rate, the first video frame is discarded.
  • if the current frame rate is greater than the target frame rate, the current frame rate does not meet the requirement and the quality of the first video frame is not good, so the first video frame can be discarded directly. The first video frame does not need to be encoded, which saves coding resources, and the encoded first video frame does not need to be transmitted, which saves bandwidth.
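The three outcomes described above (retain and encode in the first mode, keep and encode in the second mode, or discard) can be summarized as a small decision function. This is a minimal sketch: the names `Action` and `decide` are hypothetical, and the check of the parameter against the preset value range is assumed to have been done already.

```python
from enum import Enum


class Action(Enum):
    ENCODE_FIRST_MODE = "retain; encode with more bits than the preset threshold"
    ENCODE_SECOND_MODE = "keep for now; encode with no more bits than the threshold"
    DISCARD = "discard without encoding or transmitting"


def decide(value_in_preset_range, current_fps, target_fps):
    """Return the action for one captured frame (hypothetical helper)."""
    if value_in_preset_range:
        # Clear frame: always retained and encoded in the first coding mode.
        return Action.ENCODE_FIRST_MODE
    if current_fps <= target_fps:
        # Frame rate already meets the target, so a lower-quality frame is kept.
        return Action.ENCODE_SECOND_MODE
    # Lower-quality frame and the frame rate is still above the target.
    return Action.DISCARD
```

For example, `decide(False, 30, 15)` yields `Action.DISCARD`, while the same frame at a current frame rate of 15 fps would be encoded in the second mode.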
  • the parameter is a contrast parameter or a noise parameter.
  • Any parameter that can be used to indicate the degree of clarity of the video frame, or a parameter that can be used to indicate the quality of the video frame, can be the parameter extracted in the embodiment of the present invention.
  • the parameter is a contrast parameter.
  • obtaining the value of the parameter of the first video frame may be implemented by: the first device obtaining the focus information of the image signal processing module. If the focus object indicated by the focus information is an object to be photographed by the first video frame, the first device obtains the value of the contrast parameter according to the focus information.
  • if the focus object indicated by the focus information is the object to be photographed in the first video frame, the focus was correct when the first video frame was captured and the focus information recorded in the image signal processing module is valid, so the first device can obtain it directly, which is simpler.
  • if the focus object indicated by the focus information is not the object to be photographed in the first video frame, the first device obtains the value of the contrast parameter by using the Sobel operator, or by using the Hadamard transform algorithm.
  • in this case the first device can obtain the value of the contrast parameter of the first video frame by other means. The embodiments of the present invention provide two methods, the Sobel operator and the Hadamard transform. Other methods for obtaining the value of the contrast parameter of a video frame also fall within the scope of protection of the embodiments of the present invention.
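To make the Sobel option concrete, the sketch below computes one possible contrast score: the mean Sobel gradient magnitude over the interior pixels of a grayscale frame. This is an assumption made for illustration. The source does not give its exact computation here (FIG. 7 refers to a division of the frame that is not reproduced), and the function name `sobel_contrast` and the use of `|gx| + |gy|` as the magnitude are hypothetical choices.

```python
def sobel_contrast(gray):
    """Mean Sobel gradient magnitude of a 2-D list of luma values.

    Assumed scoring rule for illustration only; sharper frames score higher.
    """
    gx_kernel = ((-1, 0, 1), (-2, 0, 2), (-1, 0, 1))
    gy_kernel = ((-1, -2, -1), (0, 0, 0), (1, 2, 1))
    height, width = len(gray), len(gray[0])
    if height < 3 or width < 3:
        raise ValueError("frame must be at least 3x3")
    total = 0.0
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            gx = gy = 0
            for dy in range(3):
                for dx in range(3):
                    pixel = gray[y + dy - 1][x + dx - 1]
                    gx += gx_kernel[dy][dx] * pixel
                    gy += gy_kernel[dy][dx] * pixel
            # |gx| + |gy| is a common cheap approximation of the magnitude.
            total += abs(gx) + abs(gy)
    return total / ((height - 2) * (width - 2))


sharp_edge = [[0, 0, 255, 255] for _ in range(4)]
blurred_edge = [[0, 85, 170, 255] for _ in range(4)]
flat = [[128] * 4 for _ in range(4)]
```

A sharp vertical edge produces a larger score than the same edge smeared over several pixels, and a flat frame scores zero, which is the behaviour a clarity indicator needs.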
  • the first device performs inter prediction on the first video frame by using a second video frame.
  • the second video frame is the video frame that precedes the first video frame, was encoded according to the first coding mode, and has the smallest interval to the first video frame.
  • in the embodiments of the present invention, video frames may be encoded using different coding modes.
  • video frames encoded according to the second coding mode are themselves of lower quality. If the reconstructed frames of such video frames are used for inter prediction of subsequent video frames, the prediction results may be inaccurate and the quality of the video frames reduced. Therefore, in the embodiments of the present invention, because a video frame encoded according to the first coding mode has high definition and good quality, its reconstructed frame can be used as a reference frame for subsequent video frames; that is, subsequent video frames can be inter-predicted using the reconstructed frames of such video frames.
  • for a video frame encoded according to the second coding mode, its reconstructed frame is not used as a reference frame for subsequent video frames; that is, subsequent video frames do not select the reconstructed frame of such a video frame for inter prediction. In this way, the quality of prediction can be improved and the possibility of error propagation reduced.
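The reference selection rule can be sketched as follows, assuming each already-encoded frame records which coding mode was used. The `EncodedFrame` record and `pick_reference` helper are hypothetical names introduced for this example; discarded frames are simply absent from the history.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class EncodedFrame:
    index: int   # capture order of the frame
    mode: str    # "first" (more bits) or "second" (fewer bits)


def pick_reference(history: List[EncodedFrame],
                   current_index: int) -> Optional[EncodedFrame]:
    """Return the nearest earlier frame that was encoded in the first mode."""
    candidates = [f for f in history
                  if f.index < current_index and f.mode == "first"]
    return max(candidates, key=lambda f: f.index) if candidates else None


history = [EncodedFrame(0, "first"), EncodedFrame(1, "second"),
           EncodedFrame(2, "first"), EncodedFrame(3, "second")]
reference = pick_reference(history, 4)
```

Here frame 3 is the immediate predecessor of frame 4, but because it was encoded in the second mode, frame 2 is selected instead.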
  • a processing device for a video frame is provided, where the device transmits the collected video to a second device in real time.
  • the device includes an acquisition module, a comparison module, and a processing module.
  • the obtaining module is configured to obtain a value of a parameter of the first video frame in the process of collecting video by the device.
  • the comparison module is configured to compare the value of the parameter with a preset value range, and determine whether the value of the parameter is within a preset value range.
  • the processing module is configured to reserve the first video frame if the value of the parameter is within a preset value range.
  • the first video frame is any one of the videos collected by the device, and the value of the parameter is used to indicate the clarity of the first video frame.
  • the apparatus further includes an encoding module.
  • the encoding module is configured to: after the processing module retains the first video frame, encode the first video frame according to the first encoding manner.
  • the encoding is performed according to the first encoding manner, and the number of used bits is greater than a preset number of bits threshold.
  • the apparatus further includes an encoding module.
  • the comparison module is further configured to: after determining whether the value of the parameter is within a preset value range, if the value of the parameter is not within the preset value range, determine whether the current frame rate is greater than the target frame rate.
  • the encoding module is further configured to: if the comparison module determines that the current frame rate is less than or equal to the target frame rate, encode the first video frame according to the second encoding manner. When encoding is performed according to the second encoding manner, the number of bits used is less than or equal to the preset bit number threshold.
  • the comparison module is further configured to: after determining whether the current frame rate is greater than the target frame rate, if the current frame rate is greater than the target frame rate, discard the first video frame.
  • the parameter is a contrast parameter or a noise parameter.
  • the parameter is a contrast parameter.
  • the acquisition module is configured to: obtain focus information of the image signal processing module. If the focus object indicated by the focus information is the object to be photographed in the first video frame, the value of the contrast parameter is obtained according to the focus information.
  • the acquisition module is further configured to: if the focus object indicated by the focus information is not the object to be photographed in the first video frame, obtain the value of the contrast parameter by using the Sobel operator, or obtain the value of the contrast parameter by using the Hadamard transform algorithm.
  • the device further includes a prediction module, configured to: perform inter prediction on the first video frame by using the second video frame.
  • the second video frame is the video frame that precedes the first video frame, was encoded according to the first coding mode, and has the smallest interval to the first video frame.
  • when the first coding mode is used for coding, the number of bits used is greater than a preset bit number threshold.
  • a processing device for a video frame where the device transmits the collected video to the second device in real time.
  • the device includes a memory and a processor.
  • the memory is used to store instructions.
  • the processor is configured to execute an instruction stored in the memory, and obtain a value of a parameter of the first video frame during the process of acquiring the video. Compare the value of the parameter with the preset value range to determine whether the value of the parameter is within the preset value range. If the value of the parameter is within a preset value range, the first video frame is reserved.
  • the first video frame is any one of the videos collected by the device, and the value of the parameter is used to indicate the degree of clarity of the first video frame.
  • the processor is further configured to: after the first video frame is reserved, encode the first video frame according to the first coding manner.
  • the encoding is performed according to the first encoding manner, and the number of used bits is greater than a preset number of bits threshold.
  • the processor is further configured to: after determining whether the value of the parameter is within a preset value range, if the value of the parameter is determined not to be within the preset value range, determine whether the current frame rate is greater than the target frame rate. If it is determined that the current frame rate is less than or equal to the target frame rate, the first video frame is encoded according to the second coding mode. When encoding is performed according to the second encoding manner, the number of bits used is less than or equal to the preset bit number threshold.
  • the processor is further configured to: after determining whether the current frame rate is greater than the target frame rate, if it is determined that the current frame rate is greater than the target frame rate, discard the first video frame.
  • the parameter is a contrast parameter or a noise parameter.
  • the parameter is a contrast parameter.
  • the processor may obtain the value of the parameter of the first video frame by obtaining focus information of the image signal processing module. If the focus object indicated by the focus information is the object to be photographed in the first video frame, the value of the contrast parameter is obtained according to the focus information.
  • the processor is further configured to: if the focus object indicated by the focus information is not the object to be photographed in the first video frame, obtain the value of the contrast parameter by using the Sobel operator, or obtain the value of the contrast parameter by using the Hadamard transform algorithm.
  • the processor is further configured to: perform inter prediction on the first video frame by using the second video frame.
  • the second video frame is the video frame that precedes the first video frame, was encoded according to the first coding mode, and has the smallest interval to the first video frame.
  • the first coding mode is used for coding, and the number of bits used is greater than a preset number of bits threshold.
  • an embodiment of the present invention provides a computer storage medium for storing computer software instructions used by the foregoing first device, which include a program for performing the first aspect or any possible implementation of the first aspect.
  • the video frames are selected according to the quality of the video frames, and the high-quality video frames are reserved as much as possible, so that the obtained video is as clear as possible and the video quality is improved.
  • the names of the first device and the second device are not limited to the device itself. In actual implementation, the devices may appear under other names. As long as the functions of the respective devices are similar to the embodiments of the present invention, they are within the scope of the claims and the equivalents thereof.
  • FIG. 1 is a schematic diagram of a video encoding process.
  • FIG. 2A is a schematic diagram of 35 intra prediction modes.
  • FIG. 2B is a schematic diagram of prediction mode 29 among the intra prediction modes.
  • FIG. 3 is a schematic diagram of an inter prediction mode.
  • FIG. 4 is a schematic diagram of an application scenario according to an embodiment of the present invention.
  • FIG. 5 is a schematic diagram of error propagation caused by dropped frames.
  • FIG. 6 is a flowchart of a video processing method according to an embodiment of the present invention.
  • FIG. 7 is a schematic diagram of dividing a video frame when calculating a contrast parameter by using a Sobel operator according to an embodiment of the present invention.
  • FIG. 8 is a schematic diagram of a transformation matrix when calculating a value of a contrast parameter by using a Hadamard transform algorithm according to an embodiment of the present invention.
  • FIG. 9A is a schematic diagram of how the current video frame selects a previous frame for inter prediction in the prior art.
  • FIG. 9B is a schematic diagram of how the current video frame selects a previous frame for inter prediction according to an embodiment of the present invention.
  • FIG. 10 is a schematic diagram of two structures of a video frame processing apparatus according to an embodiment of the present invention.
  • H.264 is a highly compressed digital video codec standard proposed by the Joint Video Team (JVT), which was formed jointly by the Video Coding Experts Group (VCEG) of the ITU Telecommunication Standardization Sector (ITU-T) and the Moving Picture Experts Group (MPEG) of the International Organization for Standardization (ISO) / International Electrotechnical Commission (IEC).
  • VCEG Video Coding Experts Group
  • ITU-T ITU Telecommunication Standardization Sector
  • ISO International Organization for Standardization
  • IEC International Electrotechnical Commission
  • MPEG Moving Picture Experts Group
  • JVT Joint Video Team
  • Three types of frames are defined in the H.264 protocol. Frames predicted only with reference to information inside the currently encoded frame are referred to as I frames.
  • the I frame is generally predicted by the intra prediction method, that is, the partial image block that has been reconstructed in the current frame is used for prediction, and the adjacent frame is not used for prediction.
  • a frame generated by referring to the previous frame and containing only the difference partial coding is referred to as a P frame, and a frame encoded with reference to the preceding and succeeding frames is referred to as a B frame.
  • for I frames, intra prediction is performed. This prediction method uses only the reconstructed values inside the current video frame, so the loss of a previous video frame does not affect the current video frame.
  • P frames generally perform interframe prediction and intra prediction.
  • Interframe prediction is performed by using video frames that have been encoded and reconstructed before the currently encoded video frame. It can be seen that if the quality of the reference frame before the currently encoded video frame is not high, the quality of the prediction result obtained by the video frame according to the reference frame for inter prediction is not good.
  • Content related to coding such as prediction methods, will be described below.
  • Terminal equipment, also known as user equipment.
  • the user equipment is a device that provides voice and/or data connectivity to the user, and may include, for example, a handheld device having a wireless connection function, or a processing device connected to the wireless modem.
  • the user equipment can communicate with the core network via a Radio Access Network (RAN) to exchange voice and/or data with the RAN.
  • the user equipment may include a User Equipment (UE), a wireless terminal device, a mobile terminal device, a Subscriber Unit, a Subscriber Station, a Mobile Station, a Mobile, a Remote Station, an Access Point (AP), a Remote Terminal, an Access Terminal, a User Terminal, a User Agent, or a User Device.
  • for example, it may be a mobile phone (or "cellular" phone), a computer with a mobile terminal device, or a portable, pocket-sized, handheld, computer-built-in, or in-vehicle mobile device, such as a Personal Communication Service (PCS) phone, a cordless phone, a Session Initiation Protocol (SIP) phone, a Wireless Local Loop (WLL) station, or a Personal Digital Assistant (PDA).
  • PCS Personal Communication Service
  • SIP Session Initiation Protocol
  • WLL Wireless Local Loop
  • PDA Personal Digital Assistant
  • "Multiple" in the embodiments of the present invention means two or more. "And/or" describes the association relationship of the associated objects and indicates that three relationships are possible; for example, "A and/or B" covers three cases: A exists alone, A and B exist at the same time, and B exists alone.
  • the character "/", unless otherwise specified, generally indicates an "or" relationship between the objects before and after it.
  • the embodiment of the present invention relates to the process of video coding, in order to better understand the technical solution provided by the embodiment of the present invention, the process of video coding is briefly introduced below.
  • FIG. 1 is a schematic diagram of a video encoding process.
  • the video coding process includes several processes such as prediction, transformation, quantization, and entropy coding.
  • the transformation process is mainly to remove the correlation of spatial signals, such as Discrete Cosine Transform (DCT).
  • the quantization process is a process of representing a larger set with a smaller set, such as scalar (Scalar) quantization.
  • DCT Discrete Cosine Transform
  • the first device collects a video frame through the sensor, that is, the original video frame in FIG. 1.
  • the first device subtracts the optimal prediction value from the original video frame to obtain a prediction residual, transforms and quantizes the prediction residual to obtain quantized coefficients, and entropy-encodes the quantized coefficients to generate a bitstream.
  • if the original video frame is an I frame, the optimal prediction value is the prediction value corresponding to the optimal mode of intra prediction.
  • if the original video frame is a P frame, the previous video frame is used for inter prediction and the reconstructed image blocks in the original video frame are used for intra prediction. The results of the two prediction methods are compared and the optimal prediction value is selected.
  • the first device further performs inverse quantization and inverse transform on the quantized coefficients obtained by transform and quantization to obtain a reconstructed residual of the original video frame, obtains a reconstructed video frame according to the reconstructed residual and the optimal prediction value, and performs prediction according to the reconstructed video frame.
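The residual, quantization, and reconstruction steps can be illustrated on a single row of pixels. This sketch deliberately omits the transform and entropy coding stages and uses plain uniform scalar quantization with a hypothetical step size `qstep`; it shows why the encoder predicts from the reconstructed frame (the only version the decoder will have) rather than from the original.

```python
def encode_block(original, prediction, qstep):
    """Return (quantized levels, reconstructed pixels) for one block.

    Simplified sketch: no transform, no entropy coding, uniform quantizer.
    """
    # Prediction residual: original minus the optimal prediction value.
    residual = [o - p for o, p in zip(original, prediction)]
    # Scalar quantization maps many residual values onto a smaller set.
    levels = [round(r / qstep) for r in residual]
    # Inverse quantization gives the reconstructed residual.
    recon_residual = [level * qstep for level in levels]
    # Reconstructed pixels are what later frames may be predicted from.
    reconstructed = [p + r for p, r in zip(prediction, recon_residual)]
    return levels, reconstructed


levels, reconstructed = encode_block(
    original=[101, 105, 98, 91],
    prediction=[98, 98, 98, 98],
    qstep=4,
)
```

The reconstructed row differs slightly from the original; that difference is the distortion introduced by quantization, and a larger `qstep` trades more distortion for fewer bits.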
  • the prediction part is composed of two types of prediction modes: inter prediction and intra prediction.
  • Intra prediction uses the spatial correlation to explore the correlation inside the video frame.
  • the encoding process is performed block by block on the video frame, and the encoding end reconstructs each video block, block by block.
  • the intra prediction block directly utilizes the reconstructed block above the currently encoded video block in the video frame and the left reconstructed block for prediction.
  • intra prediction can have multiple prediction methods.
  • 35 intra prediction modes are defined in H.265: two non-directional modes, the planar mode and the mean (DC) mode, and 33 directional modes corresponding to 33 directions.
  • the DC mode represents the mean prediction, that is, the mean of the adjacent reconstructed pixels is used as the prediction.
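As a rough illustration of mean prediction, the sketch below fills a block with the rounded mean of the reconstructed pixels directly above it and to its left. The function name `dc_predict` is hypothetical, and the exact rounding and any boundary smoothing that a real codec specifies are ignored here.

```python
def dc_predict(above, left, size):
    """Predict a size x size block as the mean of its reconstructed neighbours.

    Simplified sketch of mean (DC) prediction, not a codec-exact routine.
    """
    neighbours = list(above) + list(left)
    if not neighbours:
        raise ValueError("at least one reconstructed neighbour is required")
    dc = round(sum(neighbours) / len(neighbours))
    return [[dc] * size for _ in range(size)]


predicted = dc_predict(above=[100, 102, 104, 106],
                       left=[98, 98, 100, 100],
                       size=4)
```

Every pixel of the predicted block takes the same value, so this mode suits smooth regions where the neighbours are already close to the block content.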
  • see FIG. 2B, which is a schematic diagram of prediction mode 29.
  • prediction mode 29 predicts in the corresponding direction from the reconstructed values of the row above the current block (the horizontal-line region in FIG. 2B) and the column to its left (the vertical-line region in FIG. 2B).
  • in FIG. 2B, the arrow indicates the direction; that is, the pixels along the direction indicated by the arrow are predicted using the reconstructed values of the horizontal-line region.
  • Inter-frame prediction uses temporal correlation to explore the correlation between adjacent video frames.
  • the current video frame is predicted from video frames that have already been reconstructed earlier in time; that is, other video frames that were reconstructed before the current video frame are used to predict the current video frame.
  • FIG. 3 takes the example of using four frames before the current video frame as a reference frame, that is, using the previous four video frames to predict the current video frame.
  • the five video frames in Figure 3 are arranged in chronological order.
  • the first video frame on the right is the current video frame, and the other four video frames are reference frames. The current video frame is divided into a plurality of image blocks. For each image block, a prediction block is searched for in each of the four reference frames, and if two or more prediction blocks are found, the closest one is selected as the prediction block of the image block.
  • for example, for image block 1 in the current video frame, prediction block 2 is found in the third reference frame from left to right and is used as its prediction block.
  • for image block 3 in the current video frame, prediction block 4 is found in one reference frame and prediction block 5 is found in the fourth reference frame from left to right, so one of prediction block 4 and prediction block 5 may be chosen as the prediction block of image block 3.
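The block search across several reference frames can be sketched as an exhaustive match. The source does not say how "closest" is measured; the sum of absolute differences (SAD) used below is an assumption, and the helper names `sad`, `crop`, and `best_prediction` are hypothetical.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(a - b)
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))


def crop(frame, top, left, size):
    """Cut a size x size block out of a frame stored as a 2-D list."""
    return [row[left:left + size] for row in frame[top:top + size]]


def best_prediction(block, reference_frames):
    """Return (cost, reference index, top, left) of the closest block."""
    size = len(block)
    best = None
    for ref_index, frame in enumerate(reference_frames):
        for top in range(len(frame) - size + 1):
            for left in range(len(frame[0]) - size + 1):
                cost = sad(block, crop(frame, top, left, size))
                if best is None or cost < best[0]:
                    best = (cost, ref_index, top, left)
    return best


empty_ref = [[0] * 4 for _ in range(4)]
textured_ref = [[0, 0, 0, 0],
                [0, 0, 9, 8],
                [0, 0, 7, 6],
                [0, 0, 0, 0]]
match = best_prediction([[9, 8], [7, 6]], [empty_ref, textured_ref])
```

Real encoders restrict the search to a window around the block position and use faster search patterns; the exhaustive loop is only meant to show the selection rule.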
  • the optimal prediction value of the current video frame may be selected.
  • rate-distortion optimization is generally used to select the optimal prediction value. That is, the rate-distortion optimization criterion weighs two factors, the bit rate and the error, to select the optimal one.
  • the calculation method of rate-distortion optimization is:
  • $\mathrm{RDcost} = D + \lambda R \qquad (1)$
  • where D represents the error between the reconstructed value and the original value after encoding in the current prediction mode (intra prediction or inter prediction), R represents the number of bits used in encoding in the current prediction mode, and λ is the Lagrangian factor.
  • the formula (1) can be used to calculate the corresponding RDcost, and finally the predicted value with the smallest RDcost is selected as the optimal prediction value.
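Assuming formula (1) has the usual Lagrangian form RDcost = D + λR, with D, R, and λ as defined above, the selection of the optimal prediction value can be sketched as follows. The candidate values and the helper names `rd_cost` and `choose_mode` are made up for illustration.

```python
def rd_cost(distortion, bits, lam):
    """Rate-distortion cost under the assumed form D + lambda * R."""
    return distortion + lam * bits


def choose_mode(candidates, lam):
    """Pick the candidate prediction with the smallest RD cost."""
    return min(candidates, key=lambda c: rd_cost(c["D"], c["R"], lam))


# Hypothetical candidates: intra is more accurate but costs more bits.
candidates = [
    {"name": "intra", "D": 120, "R": 40},
    {"name": "inter", "D": 150, "R": 10},
]
cheap_bits_choice = choose_mode(candidates, lam=0.5)
costly_bits_choice = choose_mode(candidates, lam=2.0)
```

With a small λ the lower-distortion candidate wins, and with a large λ the candidate that uses fewer bits wins, which is the trade-off the criterion is meant to express.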
  • the video encoding process is briefly introduced as above.
  • the embodiments of the present invention mainly relate to which video frames are selected for encoding, and which video frames are selected for subsequent inter-frame prediction in the video encoding process.
  • An application scenario of the embodiment of the present invention is described below. Please refer to FIG. 4 .
  • the first device collects video in real time through a sensor (for example, a camera); the image in FIG. 4 represents a video frame collected by the first device. After collecting a video frame, the first device encodes it and then transmits the encoded video frame to the second device through the base station in real time, for the second device to display in real time. Because the network status is unstable, the packet loss rate is relatively high.
  • the first device therefore needs to discard some of the collected video frames, and then encodes the remaining video frames and transmits them to the second device, thereby reducing network bandwidth usage and increasing the transmission success rate.
  • the first device and the second device may both be user devices.
  • FIG. 4 takes a mobile phone as an example.
  • a P frame needs to be predicted by using the inter prediction method. If the network bandwidth is unstable and the first device does not actively discard video frames, video frames are lost in transmission because of the unstable bandwidth.
  • see FIG. 5, which is a schematic diagram of a video frame being forcibly discarded due to unstable network bandwidth in the prior art. Because the network bandwidth is unstable, video frame 3 in FIG. 5 is discarded, although the quality of video frame 3 is relatively high. If subsequent video frames were predicted by using video frame 3, a relatively good prediction result could be obtained. Because video frame 3 is discarded, subsequent video frames cannot be predicted correctly.
  • the embodiment of the present invention proposes that even if some video frames need to be discarded, the frames to discard are selected according to video frame quality and high-quality video frames are retained as far as possible, so that prediction accuracy is improved, the resulting video is as clear as possible, and video quality is improved.
  • an embodiment of the present invention provides a video frame processing method, which can be performed by a device that collects a video frame and encodes a video frame. As shown in FIG. 1 , the device is a first device.
  • Step 601 The first device acquires a value of a parameter of the first video frame in the process of collecting the video.
  • the first video frame is any frame in the video collected by the first device, and the value of the parameter is used to indicate the clarity of the first video frame.
  • Step 602 The first device compares the value of the parameter with a preset value range, and determines whether the value of the parameter is within a preset value range.
  • Step 603 If the value of the parameter is within the preset value range, the first device reserves the first video frame.
  • the method provided by the embodiment of the present invention may be performed on each video frame included in the video captured by the first device, or performed on a specific video frame in the collected video.
  • the method provided by the embodiment of the present invention may be performed in any case, or may be performed in consideration of the current frame rate and/or the network state; for example, the solution provided by the embodiment of the present invention is executed when the current frame rate is high or the network condition is unstable (the network bandwidth is insufficient).
  • the parameter may be a contrast parameter, or a noise parameter, or other possible parameters, as long as the parameter capable of reflecting the clarity of the video frame can be used as the parameter extracted in the embodiment of the present invention.
  • if the parameter is a contrast parameter, the preset value range may be the range greater than a preset contrast threshold. If the value of the contrast parameter of the first video frame is greater than the preset contrast threshold, the value of the contrast parameter is determined to be within the preset value range, and the first video frame is considered clear and of high quality.
  • if the parameter is a noise parameter, the preset value range may be the range smaller than a preset noise threshold. If the value of the noise parameter of the first video frame is smaller than the preset noise threshold, the value of the noise parameter is determined to be within the preset value range, and the first video frame is considered clear and of high quality. If the value of the noise parameter of the first video frame is greater than or equal to the preset noise threshold, the value of the noise parameter is determined not to be within the preset value range, and the first video frame is considered blurred and of poor quality.
  • the noise parameter can be embodied by the sensitivity (ISO) parameter, and the severity of the noise can be estimated by the ISO parameter.
  • the ISO parameter is related to the resolution information, and the value of ISO will be different when the resolution is different. In general, the larger the value of the ISO parameter, the more severe the noise.
  • For different resolutions, the preset value range corresponding to the noise parameter may be set separately, that is, a corresponding preset noise threshold is set for each resolution. The preset value range corresponding to the resolution of the first video frame can then be determined, and the quality of the first video frame is determined according to whether the value of its noise parameter is within that preset value range.
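  • As an illustration of the per-resolution noise check described above, the following is a minimal sketch. The table `ISO_THRESHOLDS`, its numeric values, and the function name `noise_in_preset_range` are hypothetical and are not taken from the embodiment; only the rule "value smaller than the preset noise threshold means within the preset value range" comes from the text.

```python
# Hypothetical preset noise (ISO) thresholds, one per resolution (width, height).
# The numeric values are illustrative assumptions only.
ISO_THRESHOLDS = {
    (1280, 720): 800,
    (1920, 1080): 640,
}


def noise_in_preset_range(iso_value, resolution):
    """Return True if the ISO value is below the threshold preset for this resolution."""
    threshold = ISO_THRESHOLDS[resolution]
    # A smaller ISO value is taken to mean less noise, hence a clearer frame.
    return iso_value < threshold
```

  • A frame whose ISO value equals the threshold is treated as outside the range, matching the "greater than or equal to" case described for the noise parameter.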
  • the parameter of the extracted first video frame is a contrast parameter, it also involves how to obtain the value of the contrast parameter.
  • the value of the contrast parameter may be obtained in different manners, which is described in the following examples.
  • the focus information of the Image Signal Processing (ISP) module can be obtained.
  • the value of the contrast parameter can be directly obtained according to the focus information, which is simple.
  • The object to be photographed by the first video frame is the object that the user is interested in. For a video, which objects are of interest to the user may be set by the user in advance, or may be determined by analyzing the video frames located before the first video frame. For example, if most of the video frames before the first video frame include a certain object, or the focus object of multiple video frames before the first video frame is that object, it is determined that the object is the object of interest to the user.
  • If the focus object indicated by the focus information is the object of interest to the user, the focus of the first video frame is correct, and the value of the contrast parameter obtained from the focus information may be considered accurate. If the focus object indicated by the focus information is not the object of interest to the user, the focus was incorrect when the first video frame was captured, and the value of the contrast parameter obtained from the focus information may not be accurate enough, so the value of the contrast parameter can be obtained without using the focus information.
  • the value of the contrast parameter of the first video frame can be obtained by the second mode or the third mode as described below.
  • The second mode or the third mode can be used to obtain the value of the contrast parameter of the first video frame in that case, or any one of the modes may be selected from the beginning to obtain the value of the contrast parameter of the first video frame.
  • the value of the contrast parameter can be calculated using the value of the Y component representing the Luminance in the YUV color space model of the pixel included in the video frame.
  • For a 3*3 image block, the matrix composed of the Y values of the image block is cross-multiplied with the G_x matrix to obtain val1, the same matrix is cross-multiplied with the G_y matrix to obtain val2, and the sum of the squares of val1 and val2 is calculated to obtain the value of the contrast of the 3*3 image block. Here, cross multiplication means that the values at corresponding positions of the two matrices are multiplied and the products are added.
  • the value of the contrast for the 3*3 image block can be calculated by the formula (2):
  • the values of all the contrasts are averaged to obtain the value of the contrast of the video frame.
  • the averaging may be a simple arithmetic average or a weighted average, which is not limited in the embodiment of the present invention.
  • a video frame can be divided into 3*3 sizes to obtain a plurality of 3*3 image blocks, as shown in FIG. 7.
  • Fig. 7 shows a video frame in which each box represents a 3*3 image block, and Fig. 7 is exemplified by dividing an image into 72 3*3 image blocks, which is of course not limited to practical applications.
  • For each 3*3 image block, the value of the contrast is calculated as described above. The values of all the calculated contrasts are averaged, and the obtained value is the value of the contrast of the video frame. The averaging may be a simple arithmetic average or a weighted average, which is not limited in the embodiment of the present invention.
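  • The block-wise contrast calculation described above can be sketched as follows. This is a minimal sketch under assumptions: the kernels `SOBEL_GX` and `SOBEL_GY` are the standard 3*3 Sobel kernels (the G_x and G_y matrices of the embodiment are not reproduced in this text), the frame is given as a list of rows of Y values, blocks are non-overlapping, partial blocks at the border are ignored, and a simple arithmetic average is used. All function names are hypothetical.

```python
# Standard 3*3 Sobel kernels, assumed here to play the role of G_x and G_y.
SOBEL_GX = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_GY = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def cross_multiply(block, kernel):
    """Multiply values at corresponding positions and add the products."""
    return sum(block[r][c] * kernel[r][c] for r in range(3) for c in range(3))


def block_contrast(block):
    """Contrast of one 3*3 block: sum of the squares of val1 and val2."""
    val1 = cross_multiply(block, SOBEL_GX)
    val2 = cross_multiply(block, SOBEL_GY)
    return val1 * val1 + val2 * val2


def frame_contrast(luma):
    """Arithmetic average of the contrast of all non-overlapping 3*3 blocks."""
    values = []
    for r in range(0, len(luma) - 2, 3):
        for c in range(0, len(luma[0]) - 2, 3):
            block = [row[c:c + 3] for row in luma[r:r + 3]]
            values.append(block_contrast(block))
    return sum(values) / len(values) if values else 0.0
```

  • A uniform frame yields a contrast of 0, while a block containing a sharp vertical or horizontal edge yields a large value, which is why a higher result is read as a clearer frame.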
  • Alternatively, a 5*5 image block may be selected. If a 5*5 image block is selected, the corresponding G_x matrix and G_y matrix are as follows:
  • For the specific calculation method, reference may be made to the description of the calculation for 3*3 image blocks above.
  • image blocks of other sizes may be selected, such as 7*7, or 9*9, etc., which are not limited in the embodiment of the present invention.
  • The value of the contrast parameter can also be calculated using the value of the Y component in the YUV color space model of the pixels included in the video frame.
  • each pixel in the pixel block takes the value of its Y component to form an 8*8 matrix.
  • the Hadamard transform is performed on the 8*8 matrix, and a transformation matrix used is shown in FIG.
  • the transformation formula is as follows:
  • Y represents a transformed matrix
  • X represents a matrix before transformation
  • H_n represents the transformation matrix, such as the matrix shown in the figure, and H_n^T represents the transposed matrix of the transformation matrix.
  • the coefficients in the obtained matrix Y are calculated according to the following formula to obtain the value of the contrast parameter of the 8*8 pixel block:
  • Y(j, k) represents the value in the jth row and the kth column of the matrix Y, and abs represents the absolute value. That is, the absolute values of the 63 values in the matrix Y at positions other than (0, 0) are added, and the sum is used as the value of the contrast parameter of the 8*8 pixel block.
  • The first video frame is divided into a plurality of 8*8 pixel blocks, the value of the contrast parameter of each 8*8 pixel block is calculated, and the values of the contrast parameter of all pixel blocks are added to obtain the value of the contrast parameter of the first video frame.
  • the above is an 8*8 pixel block as an example, and is not limited thereto in practical applications.
  • a 4*4 pixel block, a 16*16 pixel block, or a 32*32 pixel block may be selected, and the like.
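  • The Hadamard-based calculation described above can be sketched as follows. This is a minimal sketch under assumptions: the transformation matrix is built with the Sylvester construction (the matrix in the figure of the embodiment may order its rows differently), the transform is taken as Y = H_n * X * H_n^T without normalization, and partial blocks at the frame border are ignored. All function names are hypothetical.

```python
def hadamard(n):
    """Sylvester construction of an n*n Hadamard matrix (n must be a power of two)."""
    h = [[1]]
    while len(h) < n:
        h = [row + row for row in h] + [row + [-x for x in row] for row in h]
    return h


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(m):
    return [list(col) for col in zip(*m)]


def block_contrast_hadamard(block):
    """Sum of abs(Y(j, k)) over all positions except (0, 0), with Y = H X H^T."""
    n = len(block)
    h = hadamard(n)
    y = matmul(matmul(h, block), transpose(h))
    return sum(abs(y[j][k]) for j in range(n) for k in range(n)) - abs(y[0][0])


def frame_contrast_hadamard(luma, size=8):
    """Add up the contrast of all non-overlapping size*size blocks of the frame."""
    total = 0
    for r in range(0, len(luma) - size + 1, size):
        for c in range(0, len(luma[0]) - size + 1, size):
            block = [row[c:c + size] for row in luma[r:r + size]]
            total += block_contrast_hadamard(block)
    return total
```

  • Excluding position (0, 0) removes the term that only reflects the average brightness of the block, so a uniform block contributes 0 regardless of how bright it is. The same functions work for 4*4, 16*16, or 32*32 blocks by changing `size`.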
  • The following description mainly takes the case where the extracted parameter is a contrast parameter as an example.
  • the preset value range is a range greater than or equal to the preset contrast threshold.
  • the preset contrast threshold can be obtained by performing a certain weighted average using the values of the contrast parameters of the adjacent first n frames.
  • the weight may be selected according to the user's request for the video. For example, the clearer the user requests the video, the greater the weight may be selected.
  • a way to weight the average is as follows:
  • contrast(i) represents the value of the contrast parameter of the ith frame
  • i represents the ith frame in the time dimension
  • i-1 represents the previous frame of the ith frame in time
  • thr represents the preset contrast threshold.
  • Formula (5) is only one example of a way to obtain the preset contrast threshold. The method of obtaining the preset contrast threshold is not limited to this; a similar preset contrast threshold can be obtained from some other combination of the contrast information of adjacent frames.
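  • One possible way to derive the preset contrast threshold from the previous n frames is sketched below. Formula (5) is not reproduced in this text, so the form used here, a plain weighted sum of contrast(i-1) to contrast(i-n) with caller-supplied weights, is an assumption, as are the function name and the example weights.

```python
def contrast_threshold(previous_contrasts, weights):
    """Weighted combination of contrast(i-1), ..., contrast(i-n).

    previous_contrasts[0] is the frame immediately before frame i.
    Larger weights give a higher threshold, i.e. a stricter clarity requirement.
    """
    if len(previous_contrasts) != len(weights):
        raise ValueError("one weight is required per previous frame")
    return sum(w * c for w, c in zip(weights, previous_contrasts))


# Example with hypothetical values: three previous frames, more weight on recent ones.
thr = contrast_threshold([100, 80, 60], [0.5, 0.3, 0.2])
```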
  • If the value of the contrast parameter is within the preset value range, the first video frame may be encoded according to the first coding mode, in which the number of bits used is greater than the preset bit number threshold. That is, more bits are allocated for the first video frame, a smaller quantization parameter (QP) is then selected for the first video frame by the rate control technique, and the frame is then encoded.
  • The preset bit number threshold can be set according to the capabilities of the device or the user's requirements for the video. The value of the contrast parameter can indicate the focus accuracy and sharpness of the first video frame. If the value of the contrast parameter is greater than or equal to the preset contrast threshold, the first video frame can be considered a more accurately focused or clearer video frame, that is, a video frame of relatively high quality. In the embodiment of the present invention, the encoder allocates a relatively large number of bits to perform key coding on such a video frame, so as to improve the quality of the entire video.
  • For different video frames encoded according to the first coding mode, the number of bits used may be the same or different.
  • For example, there are two video frames, a first video frame and a second video frame, both encoded according to the first coding mode, and the difference between the value of the contrast parameter of the first video frame and the preset contrast threshold is smaller than the difference between the value of the contrast parameter of the second video frame and the preset contrast threshold. That is, the quality of the second video frame may be considered higher than the quality of the first video frame. In this case, the first video frame and the second video frame may be encoded using the same number of bits, which is simpler for the encoder. Alternatively, the number of bits used for encoding the second video frame may be larger than the number of bits used for encoding the first video frame, which can further improve the quality of the video, so that high-quality video frames get better coding.
  • a plurality of difference intervals may be set in advance, and a correspondence between the difference interval and the number of bits used for encoding may be set.
  • If the value of the contrast parameter of a video frame is greater than the preset contrast threshold, the difference between the value of the contrast parameter of the video frame and the preset contrast threshold is calculated, the difference interval in which the difference is located is determined, and the video frame is encoded with the number of bits corresponding to that difference interval.
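  • The correspondence between difference intervals and numbers of bits can be sketched as a simple lookup. The interval bounds and bit counts below are hypothetical values chosen for illustration, and the function name is an assumption; only the idea of mapping the difference to an interval and then to a bit count comes from the text.

```python
import bisect

# Hypothetical difference intervals: [0, 10), [10, 50), [50, infinity).
INTERVAL_BOUNDS = [10.0, 50.0]
# Hypothetical number of bits used for encoding in each interval.
BITS_PER_INTERVAL = [40_000, 50_000, 60_000]


def bits_for_difference(contrast, threshold):
    """Pick the bit count for a frame whose contrast exceeds the threshold."""
    diff = contrast - threshold
    if diff <= 0:
        raise ValueError("frame is not above the preset contrast threshold")
    # bisect_right returns the index of the interval that contains diff.
    return BITS_PER_INTERVAL[bisect.bisect_right(INTERVAL_BOUNDS, diff)]
```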
  • If the value of the parameter of the first video frame is not within the preset value range, the current frame rate can be further considered.
  • The frame rate is related to the network bandwidth between the first device and the second device. If the current frame rate is greater than the target frame rate, the network bandwidth is insufficient. In this case, there is no need to transmit a poor-quality video frame, so the first video frame can be directly discarded to save network bandwidth. Because the discarded video frames are not of good quality, the impact on the quality of the entire video will not be great.
  • the current frame rate is less than or equal to the target frame rate, it indicates that the network bandwidth is sufficient. In this case, even if the quality of the video frame is not good, transmission can be performed to reduce the number of discarded video frames. If the first video frame is not discarded, then the first video frame is encoded. In a possible implementation, if the value of the contrast parameter of the first video frame is less than or equal to the preset contrast threshold, and the first video frame is not discarded because of the frame rate, the first video frame is encoded.
  • The second coding mode may be used for coding, in which the number of bits used is less than or equal to the preset bit number threshold. That is, fewer bits are allocated for the first video frame, a larger QP is then selected for the first video frame by the rate control technique, and the frame is then encoded.
  • For the specific coding process, reference may be made to coding methods in the prior art.
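  • The overall decision described above (retain and key-code, encode with fewer bits, or discard) can be summarized in the following minimal sketch. The names `Action` and `decide` are hypothetical. The text is not fully consistent about the case where the contrast equals the threshold; the sketch assumes the preset value range is "greater than or equal to the preset contrast threshold".

```python
from enum import Enum


class Action(Enum):
    ENCODE_FIRST_MODE = "first"    # more bits, smaller QP
    ENCODE_SECOND_MODE = "second"  # fewer bits, larger QP
    DISCARD = "discard"


def decide(contrast, threshold, current_fps, target_fps):
    """Choose what to do with one captured frame."""
    if contrast >= threshold:
        # Value within the preset range: the frame is retained and key-coded.
        return Action.ENCODE_FIRST_MODE
    if current_fps > target_fps:
        # Poor-quality frame and insufficient bandwidth: drop it.
        return Action.DISCARD
    # Poor-quality frame but bandwidth is sufficient: keep it, spend fewer bits.
    return Action.ENCODE_SECOND_MODE
```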
  • When a QP is allocated for a video frame that is encoded according to the second coding mode, the QP may be allocated according to experience, or the difference between the QP allocated for a video frame encoded according to the second coding mode and the QP allocated for a video frame encoded according to the first coding mode may be preset to M, where the QP allocated for a video frame encoded according to the second coding mode is greater than the QP allocated for a video frame encoded according to the first coding mode.
  • M may be set according to an empirical value, or according to a user's requirement for a video frame, for example, M is equal to 2 or 3.
  • The value of the contrast parameter can indicate the focus accuracy and sharpness of the first video frame. If the value of the contrast parameter is less than or equal to the preset contrast threshold, the first video frame may be regarded as a video frame whose focus is inaccurate or which is unclear. In the embodiment of the present invention, the encoder does not use more bits to encode such a video frame, that is, the video frame is not key-coded, so as to save coded bits and reduce the burden on the encoder.
  • For different video frames encoded according to the second coding mode, the number of bits used may be the same or different.
  • For example, there are two video frames, a first video frame and a second video frame, both encoded according to the second coding mode, and the difference between the value of the contrast parameter of the first video frame and the preset contrast threshold is smaller than the difference between the value of the contrast parameter of the second video frame and the preset contrast threshold. That is, the quality of the first video frame may be considered higher than the quality of the second video frame. In this case, the first video frame and the second video frame may be encoded using the same number of bits, which is simpler for the encoder. Alternatively, the number of bits used for encoding the first video frame may be larger than the number of bits used for encoding the second video frame, which can further improve the quality of the video, so that higher-quality video frames get better coding.
  • a plurality of difference intervals may be set in advance, and a correspondence relationship between the difference interval and the number of bits used for encoding may be set.
  • the value of the contrast parameter of the video frame is less than or equal to the preset contrast threshold, calculate a difference between the value of the contrast parameter of the video frame and the preset contrast threshold, and determine which difference interval the difference is located in. Then, the video frame is encoded by the number of bits corresponding to the difference interval.
  • the number of bits allocated for the first coding mode and the number of bits allocated for the second coding mode may be determined according to the network bandwidth and the current frame rate.
  • the number of bits allocated for the first coding method refers to the number of bits used when encoding one video frame according to the first coding method, and the number of bits allocated for the second coding method is also the same.
  • A proportional relationship may be set between the two numbers of bits. For example, it is preset that the number of bits allocated for the first coding mode is N times the number of bits allocated for the second coding mode, where N is an integer or fraction greater than one.
  • For example, N = 2, the network bandwidth is 1000 Kbit/s, the current frame rate is 30, and the ratio of video frames encoded according to the first coding mode to video frames encoded according to the second coding mode is approximately 1:1.
  • the value of N may be determined according to the quality requirements of the video frame by the user, and the ratio of the video frame encoded according to the first encoding manner and the video frame encoded according to the second encoding manner may be based on the user shaking condition and the focusing accuracy. Depending on factors.
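  • The example figures above (N = 2, 1000 Kbit/s, frame rate 30, ratio about 1:1) can be worked through as follows. This is a sketch under the assumption that the whole bandwidth is spent on the encoded frames of one second; the function name and the resulting per-frame numbers are derived here and are not stated in the embodiment.

```python
def bits_per_frame(bandwidth_kbit_per_s, frame_rate, n, first_mode_share):
    """Return (Kbit per first-mode frame, Kbit per second-mode frame).

    Solves first_frames * n * b + second_frames * b = bandwidth for b,
    where b is the number of Kbit allocated to one second-mode frame.
    """
    first_frames = frame_rate * first_mode_share
    second_frames = frame_rate - first_frames
    second_bits = bandwidth_kbit_per_s / (n * first_frames + second_frames)
    return n * second_bits, second_bits


# 15 first-mode and 15 second-mode frames per second share 1000 Kbit:
# 15 * 2b + 15 * b = 45b = 1000, so b is about 22.2 Kbit and 2b about 44.4 Kbit.
first_kbit, second_kbit = bits_per_frame(1000, 30, 2, 0.5)
```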
  • Because video frames may be encoded in different coding modes, and the video frames encoded according to the second coding mode are themselves of poor quality, inter prediction of subsequent video frames from the reconstructed frames of such video frames may produce inaccurate prediction results and reduce the quality of the video frames. Therefore, in the embodiment of the present invention, a key-coded video frame, that is, a video frame encoded according to the first coding mode, has high clarity and good quality, so its reconstructed frame can be used as a reference frame of subsequent video frames; that is, subsequent video frames can be inter-predicted using the reconstructed frames of such video frames.
  • A video frame that is not key-coded, that is, a video frame encoded according to the second coding mode, has low clarity and poor quality at the source and is not key-coded during encoding. Its reconstructed frame is therefore not used as a reference frame of subsequent video frames; that is, subsequent video frames do not select the reconstructed frames of such video frames for inter prediction.
  • Identification information may be added separately to video frames encoded according to the first coding mode and video frames encoded according to the second coding mode, for example by using one bit. For a video frame encoded according to the first coding mode, the value of the bit is "1"; for a video frame encoded according to the second coding mode, the value of the bit is "0". The identification information of a video frame thus indicates which coding mode was used to encode it, and hence whether the video frame is to be used as a reference frame of subsequent video frames.
  • the first video frame may be inter-predicted by using the second video frame, that is, the first video frame is inter-predicted by using the reconstructed frame of the second video frame.
  • The second video frame is, among the video frames before the first video frame that are encoded by the first coding mode, the video frame with the smallest interval from the first video frame. In this way, higher-quality video frames are selected for inter prediction as far as possible. Because the clarity of the two frames is closer, a more accurate predicted value can be found, which in turn helps to increase the encoding compression ratio.
  • In the reference frame structure shown in FIG. 9A, each rectangle represents a video frame, and each frame uses the temporally adjacent previous frame as a reference frame, as indicated by the arrows in FIG. 9A.
  • FIG. 9B shows a reference frame structure provided in the embodiment of the present invention. Each rectangle represents a video frame. A longer rectangle represents a video frame encoded according to the first coding mode, referred to as a large P frame in FIG. 9B, and a shorter rectangle represents a video frame encoded according to the second coding mode, referred to as a small P frame in FIG. 9B.
  • When performing inter prediction, a subsequent video frame selects a previous large P frame rather than a previous small P frame, as shown by the arrows in FIG. 9B. For example, the fourth video frame from left to right in FIG. 9B, which is a large P frame, does not select the adjacent previous small P frame when performing inter prediction. Instead, it skips the small P frame and selects the large P frame before the small P frame, that is, the second video frame from left to right, for inter prediction.
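  • The reference frame selection described above can be sketched with the one-bit identification information (1 for the first coding mode, 0 for the second coding mode). The function name and the list-based representation of the frame sequence are hypothetical.

```python
def select_reference(mode_bits, index):
    """Return the index of the nearest earlier frame whose identification bit is 1.

    mode_bits[k] is 1 if frame k was encoded in the first coding mode (large P frame)
    and 0 if it was encoded in the second coding mode (small P frame).
    Returns None if no earlier first-mode frame exists.
    """
    for j in range(index - 1, -1, -1):
        if mode_bits[j] == 1:
            return j
    return None


# Frames as in the FIG. 9B example: large, large, small, large (indices 0 to 3).
bits = [1, 1, 0, 1]
reference_for_fourth = select_reference(bits, 3)  # skips the small P frame at index 2
```

  • Searching backwards from the current frame keeps the interval to the reference frame as small as possible while still skipping frames encoded in the second coding mode.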
  • the video frame processing method provided by the embodiment of the present invention is applicable not only to a P frame but also to an I frame.
  • the first coding mode or the second coding mode may be adopted for the P frame, and the two coding modes may be used for the I frame.
  • the first coding mode of the I frame and the first coding mode of the P frame may be the same.
  • The second coding mode of the I frame may be the same as or different from the second coding mode of the P frame. If the value of the parameter of the I frame is within the preset value range (for the I frame, the preset value range may be the same as that selected for the P frame), the first coding mode is adopted for the I frame. If the value of the parameter of the I frame is not within the preset value range and the current frame rate is less than or equal to the target frame rate, the second coding mode is adopted for the I frame.
  • An I frame encoded according to the first coding mode is of high quality, so it may be designated as a long-term reference frame of subsequent video frames. That is, in addition to the temporally adjacent subsequent P frame, other subsequent video frames may also use the I frame as a reference frame.
  • an embodiment of the present invention provides a processing device for a video frame, where the device may include a processor 1001 and a memory 1002.
  • The processor 1001 may include a central processing unit (CPU) or an application-specific integrated circuit (ASIC), may include one or more integrated circuits for controlling program execution, may include a hardware circuit developed using a field programmable gate array (FPGA), and may include a baseband chip.
  • the number of memories 1002 may be one or more.
  • the memory 1002 may include a Read Only Memory (ROM), a Random Access Memory (RAM), and a disk storage, and the like.
  • the memory 1002 can be used to store program code required for the processor 1001 to perform tasks, and can also be used to store data and the like.
  • the memory 1002 can be connected to the processor 1001 via the bus 1000 (as shown in FIG. 10 as an example), or can be connected to the processor 1001 through a dedicated connection line.
  • By designing and programming the processor 1001, the code corresponding to the method provided by the embodiment shown in FIG. 6 is solidified into the chip, so that the chip can perform the method of the embodiment shown in FIG. 6 during operation.
  • How to design and program the processor 1001 is a technique well known to those skilled in the art, and details are not described herein again.
  • the device can be used to perform the method provided by the embodiment shown in FIG. 6, for the functions and the like implemented by the functional modules in the device, reference may be made to the description of the previous method, and details are not described herein.
  • an embodiment of the present invention provides a processing device for a video frame, where the device may include an obtaining module 1101, a comparing module 1102, and a processing module 1103.
  • the obtaining module 1101 is configured to obtain a value of a parameter of the first video frame, where the first video frame is any one of the collected video, and the value of the parameter is used to indicate the clarity of the first video frame.
  • the comparison module 1102 is configured to compare the value of the parameter with a preset threshold, and determine whether the value of the parameter is less than a preset threshold.
  • the processing module 1103 is configured to reserve the first video frame if the value of the parameter is greater than or equal to a preset threshold.
  • the device may further include an encoding module 1104, which is shown together in FIG.
  • the encoding module 1104 serves as an optional functional module, which is drawn in the form of a dashed line in FIG. 11 in order to distinguish it from the required functional modules.
  • the device may further include a prediction module 1105, which is shown together in FIG.
  • the prediction module 1105 serves as an optional functional module, which is drawn in the form of a dashed line in FIG. 11 in order to distinguish it from the required functional modules.
  • the entity modules corresponding to the obtaining module 1101, the comparing module 1102, the processing module 1103, the encoding module 1104, and the prediction module 1105 may all be the processor 1001 in the embodiment shown in FIG.
  • the device may be used to perform the method provided by the embodiment shown in FIG. 6. Therefore, for the functions and the like implemented by the function modules in the device, reference may be made to the description of the previous method, and details are not described herein.
  • In the embodiment of the present invention, the value of the parameter of the first video frame is compared with the preset value range. If the value of the parameter is within the preset value range, the first video frame is retained, that is, the first video frame is not discarded. In other words, even if some video frames need to be discarded, the selection is made according to the quality of the video frames, and high-quality video frames are kept as far as possible, so that the obtained video is as clear as possible and the video quality is improved.
  • the disclosed apparatus and method can be implemented in other ways.
  • the device embodiments described above are merely illustrative.
  • For example, the division of the modules or units is only a logical function division. In actual implementation, there may be another division manner; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed.
  • the mutual coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interface, device or unit, and may be electrical or otherwise.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the units may be selected according to actual needs to implement the embodiments of the present invention.
  • the functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may also be an independent physical module.
  • If the integrated unit is implemented in the form of a software functional unit and sold or used as a standalone product, it can be stored in a computer-readable storage medium.
  • Based on such an understanding, all or part of the technical solution of the present invention may be embodied in the form of a software product. The software product is stored in a storage medium and includes a plurality of instructions for causing a computer device (such as a personal computer, a server, or a network device) or a processor to perform all or part of the steps of the methods of the various embodiments of the present invention.
  • the foregoing storage medium includes: a universal serial bus flash drive, a removable hard disk, a ROM, a RAM, a magnetic disk, or an optical disk, and the like, which can store program codes.


Abstract

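A method and a device for processing a video frame, which retain high-quality video frames as far as possible. The method includes: in the process of capturing a video, a first device obtains a value of a parameter of a first video frame, where the first video frame is any frame in the video captured by the first device, and the value of the parameter is used to indicate the clarity of the first video frame; the first device compares the value of the parameter with a preset value range and determines whether the value of the parameter is within the preset value range; and if the value of the parameter is within the preset value range, the first device retains the first video frame.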

Description

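Method and Device for Processing a Video Frame
Technical Field
The present invention relates to the field of video processing technologies, and in particular, to a method and a device for processing a video frame.
Background
In a real-time transmission process, for example when device A transmits video captured in real time to device B, if the packet loss rate fed back by the network reaches a certain threshold, the current network bandwidth is considered unable to meet the requirement of the current frame rate. The current practice is to directly reduce the frame rate, which is equivalent to the encoding end actively discarding some video frames according to the network state, so as to prevent packet loss during transmission from leaving the decoder unable to decode correctly. When the frame rate is reduced, video frames are generally selected for discarding at equal time intervals. For example, when the frame rate is reduced from 30 fps to 15 fps, 1 of every 2 frames is discarded, that is, frames are discarded uniformly.
Although this method, by actively discarding video frames, can as far as possible ensure that the network bandwidth meets the requirement of the frame rate, this way of discarding may cause high-quality video frames to be discarded. In particular, for mobile devices such as mobile phones, if the user captures the video while holding the device by hand rather than fixing the device in place, the captured video easily becomes blurred because of shaking of the device during capture. Therefore, discarding video frames uniformly by time alone is very likely to cause captured high-quality video frames to be discarded.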
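Summary
Embodiments of the present invention provide a method and a device for processing a video frame, which retain high-quality video frames as far as possible.
According to a first aspect, a method for processing a video frame is provided. The method is applicable to a first device that captures a video, and the first device is to transmit the captured video to a second device in real time. The method includes: in the process of capturing the video, the first device obtains a value of a parameter of a first video frame, where the first video frame is any frame in the video captured by the first device, and the value of the parameter is used to indicate the clarity of the first video frame. The first device compares the value of the parameter with a preset value range and determines whether the value of the parameter is within the preset value range. If the value of the parameter is within the preset value range, the first device retains the first video frame.
In the embodiments of the present invention, the value of the parameter of the first video frame is compared with the preset value range. If the value of the parameter is within the preset value range, the first video frame is retained, that is, the first video frame is not discarded. In other words, even if some video frames need to be discarded, the selection is made according to the quality of the video frames, and high-quality video frames are retained as far as possible, so that the resulting video is as clear as possible and the video quality is improved.
With reference to the first aspect, in a first possible implementation of the first aspect, after the first device retains the first video frame, the first device may encode the first video frame according to a first coding mode, where the number of bits used for encoding according to the first coding mode is greater than a preset bit number threshold.
After the first video frame is retained, the first video frame needs to be encoded. Because the value of the parameter of the first video frame is within the preset value range, the quality of the first video frame is relatively good. The first video frame can therefore be encoded in the first coding mode, which uses more bits, so that the encoded first video frame is restored with higher fidelity when decoded. This increases the number of high-quality video frames in the video and improves the quality of the entire video.
With reference to the first aspect, in a second possible implementation of the first aspect, after the first device determines whether the value of the parameter is within the preset value range, if the first device determines that the value of the parameter is not within the preset value range, the first device determines whether the current frame rate is greater than a target frame rate. If the first device determines that the current frame rate is less than or equal to the target frame rate, the first device encodes the first video frame according to a second coding mode, where the number of bits used for encoding according to the second coding mode is less than or equal to the preset bit number threshold.
If the value of the parameter of the first video frame is not within the preset value range, the quality of the first video frame is not good enough, and the frame could in principle be discarded. However, because discarding fewer video frames is better for decoding performance, the current frame rate can be further considered. If the current frame rate is less than or equal to the target frame rate, the current frame rate meets the requirement, and the first video frame need not be discarded for the time being. If the first video frame is not discarded, it needs to be encoded. Because the quality of the first video frame is itself not very good, the frame may still be insufficiently clear even if it is restored with high fidelity after decoding. The first video frame can therefore be encoded in the second coding mode, which uses fewer bits and saves coding resources.
With reference to the second possible implementation of the first aspect, in a third possible implementation of the first aspect, after it is determined whether the current frame rate is greater than the target frame rate, if it is determined that the current frame rate is greater than the target frame rate, the first video frame is discarded.
If the current frame rate is greater than the target frame rate, the current frame rate does not meet the requirement, and the quality of the first video frame is not very good. The first video frame can therefore be discarded directly. The first video frame then does not need to be encoded, which saves coding resources, and the encoded first video frame does not need to be transmitted, which saves bandwidth.
With reference to the first aspect or any one of the first to the third possible implementations of the first aspect, in a fourth possible implementation of the first aspect, the parameter is a contrast parameter or a noise parameter.
Any parameter that can be used to indicate the clarity of a video frame, or that can be understood as indicating the quality of a video frame, can be the parameter extracted in the embodiments of the present invention.
With reference to the fourth possible implementation of the first aspect, in a fifth possible implementation of the first aspect, the parameter is a contrast parameter. In this case, obtaining the value of the parameter of the first video frame may be implemented as follows: the first device obtains focus information of an image signal processing module. If the focus object indicated by the focus information is the object to be photographed by the first video frame, the first device obtains the value of the contrast parameter according to the focus information.
If the focus object indicated by the focus information is the object to be photographed by the first video frame, the focus was correct when the first video frame was captured, and the focus information recorded in the image signal processing module is valid. The first device can obtain it directly, which is relatively simple.
With reference to the fifth possible implementation of the first aspect, in a sixth possible implementation of the first aspect, if the focus object indicated by the focus information is not the object to be photographed by the first video frame, the first device obtains the value of the contrast parameter by using a Sobel operator, or obtains the value of the contrast parameter by using a Hadamard transform algorithm.
If the focus object indicated by the focus information is not the object to be photographed by the first video frame, the focus was incorrect when the first video frame was captured, and the focus information recorded in the image signal processing module may be inaccurate. The first device can therefore obtain the value of the contrast parameter of the first video frame in other ways. The embodiments of the present invention provide two methods, the Sobel operator and the Hadamard transform. It is foreseeable that, in addition to these two methods, other methods for obtaining the value of the contrast parameter of a video frame also fall within the protection scope of the embodiments of the present invention.
With reference to the first aspect or the first to the sixth possible implementations of the first aspect, in a seventh possible implementation of the first aspect, the first device performs inter prediction on the first video frame by using a second video frame. The second video frame is, among the video frames before the first video frame that are encoded in the first coding mode, the video frame with the smallest interval from the first video frame.
Video frames may be encoded in different coding modes, and video frames encoded according to the second coding mode are themselves of relatively poor quality. If subsequent video frames use the reconstructed frames of such video frames for inter prediction, the prediction results may be inaccurate and the quality of the video frames may be reduced. Therefore, in the embodiments of the present invention, because a video frame encoded according to the first coding mode has high clarity and good quality, its reconstructed frame can be used as a reference frame of subsequent video frames, that is, subsequent video frames can use the reconstructed frames of such video frames for inter prediction. A video frame encoded according to the second coding mode has low clarity and poor quality at the source and is not key-coded during encoding, so its reconstructed frame is not used as a reference frame of subsequent video frames, that is, subsequent video frames do not select the reconstructed frames of such video frames for inter prediction. In this way, prediction quality can be improved and the possibility of error propagation can be reduced.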
第二方面,提供一种视频帧的处理设备,该设备要将采集的视频实时传输给第二设备。该设备包括获取模块、比较模块和处理模块。其中,获取模块用于在该设备采集视频的过程中,获取第一视频帧的参数的取值。比较模块用于将该参数的取值与预设取值范围进行比较,确定该参数的取值是否位于预设取值范围内。处理模块用于若该参数的取值位于预设取值范围内,则保留第一视频帧。其中,第一视频帧为该设备采集的视频中的任意一帧,该参数的取值用于指示第一视频帧的清晰程度;
结合第二方面,在第二方面的第一种可能的实现方式中,该设备还包括编码模块。编码模块用于:在处理模块保留所述第一视频帧之后,按照第一 编码方式对第一视频帧进行编码。其中,按照第一编码方式进行编码,所使用的比特数量大于预设比特数阈值。
结合第二方面,在第二方面的第二种可能的实现方式中,该设备还包括编码模块。比较模块还用于:在确定该参数的取值是否位于预设取值范围内之后,若该参数的取值没有位于所述预设取值范围内,确定当前的帧率是否大于目标帧率。所述编码模块还用于:若比较模块确定当前的帧率小于或等于目标帧率,按照第二编码方式对第一视频帧进行编码。其中,按照第二编码方式进行编码,所使用的比特数量小于或等于预设比特数阈值。
结合第二方面的第二种可能的实现方式,在第二方面的第三种可能的实现方式中,比较模块还用于:在确定当前的帧率是否大于目标帧率之后,若当前的帧率大于所述目标帧率,则丢弃第一视频帧。
结合第二方面或第二方面的第一种可能的实现方式至第三种可能的实现方式中的任一种可能的实现方式,在第二方面的第四种可能的实现方式中,该参数为对比度参数,或,噪声参数。
结合第二方面的第四种可能的实现方式,在第二方面的第五种可能的实现方式中,该参数为对比度参数。获取模块用于:获得图像信号处理模块的对焦信息。若对焦信息所指示的对焦对象为第一视频帧需拍摄的对象,则根据对焦信息获得对比度参数的取值。
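With reference to the fifth possible implementation of the second aspect, in a sixth possible implementation of the second aspect, the obtaining module is further configured to: if the focus object indicated by the focus information is not the object that the first video frame is meant to capture, obtain the value of the contrast parameter by using the Sobel operator, or by using the Hadamard transform algorithm.
With reference to the second aspect or any one of the first to the sixth possible implementations of the second aspect, in a seventh possible implementation of the second aspect, the device further includes a prediction module configured to perform inter prediction on the first video frame by using a second video frame. The second video frame is, among the video frames that precede the first video frame and are encoded in the first encoding mode, the one with the smallest interval to the first video frame. Encoding in the first encoding mode uses a number of bits greater than the preset bit-count threshold.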
第三方面,提供一种视频帧的处理设备,该设备要将采集的视频实时传输给第二设备。该设备包括存储器和处理器。其中,存储器用于存储指令。处理器用于执行存储器存储的指令,在采集视频的过程中,获取第一视频帧的参数的取值。将该参数的取值与预设取值范围进行比较,确定该参数的取值是否位于预设取值范围内。若该参数的取值位于预设取值范围内,则保留第一视频帧。第一视频帧为该设备采集的视频中的任意一帧,该参数的取值用于指示第一视频帧的清晰程度。
结合第三方面,在第三方面的第一种可能的实现方式中,处理器还用于:在保留第一视频帧之后,按照第一编码方式对第一视频帧进行编码。其中,按照第一编码方式进行编码,所使用的比特数量大于预设比特数阈值。
结合第三方面,在第三方面的第二种可能的实现方式中,处理器还用于:在确定该参数的取值是否位于预设取值范围内之后,若确定该参数的取值没有位于预设取值范围内,则确定当前的帧率是否大于目标帧率。若确定当前的帧率小于或等于目标帧率,则按照第二编码方式对第一视频帧进行编码。其中,按照第二编码方式进行编码,所使用的比特数量小于或等于预设比特数阈值。
结合第三方面的第二种可能的实现方式,在第三方面的第三种可能的实现方式中,处理器还用于:在确定当前的帧率是否大于目标帧率之后,若确定当前的帧率大于目标帧率,则丢弃第一视频帧。
结合第三方面或第三方面的第一种可能的实现方式至第三种可能的实现方式中的任一种可能的实现方式,在第三方面的第四种可能的实现方式中,该参数为对比度参数,或,噪声参数。
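With reference to the third aspect or the fourth possible implementation of the third aspect, in a fifth possible implementation of the third aspect, the parameter is a contrast parameter. The processor obtains the value of the parameter of the first video frame as follows: it obtains the focus information of the image signal processing module, and if the focus object indicated by the focus information is the object that the first video frame is meant to capture, it obtains the value of the contrast parameter from the focus information.
With reference to the third aspect or the fifth possible implementation of the third aspect, in a sixth possible implementation of the third aspect, the processor is further configured to: if the focus object indicated by the focus information is not the object that the first video frame is meant to capture, obtain the value of the contrast parameter by using the Sobel operator, or by using the Hadamard transform algorithm.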
结合第三方面或第三方面的第一种可能的实现方式至第六种可能的实现方式中的任一种可能的实现方式,在第三方面的第七种可能的实现方式中,处理器还用于:通过第二视频帧对第一视频帧进行帧间预测。第二视频帧为第一视频帧之前的采用第一编码方式进行编码的视频帧中与第一视频帧之间的间隔最小的视频帧。采用第一编码方式进行编码,所使用的比特数量大于预设比特数阈值。
第四方面,本发明实施例提供了一种计算机存储介质,用于储存为上述第一设备所用的计算机软件指令,其包含用于执行上述第一方面或第一方面的任一种可能的实现方式中为第一设备所设计的程序。
本发明实施例中,即使需要丢弃一些视频帧,也会根据视频帧的质量进行选择,对于高质量的视频帧尽量保留,这样可以使得得到的视频尽量清晰,提高视频质量。
本发明实施例中,第一设备及第二设备等名称对设备本身不构成限定,在实际实现中,这些设备可以以其他名称出现。只要各个设备的功能和本发明实施例类似,属于本发明权利要求及其等同技术的范围之内。
附图说明
为了更清楚地说明本发明实施例的技术方案,下面将对本发明实施例中所需要使用的附图作简单地介绍,显而易见地,下面所介绍的附图仅仅是本发明的一些实施例,对于本领域普通技术人员来讲,在不付出创造性劳动的前提下,还可以根据这些附图获得其他的附图。
图1为视频编码过程的示意图;
图2A为35种帧内预测模式的示意图;
图2B为帧内预测模式中的模式29的预测方式示意图;
图3为帧间预测方式的示意图;
图4为本发明实施例的一种应用场景示意图;
图5为因丢帧导致的错误传输示意图;
图6为本发明实施例提供的视频处理方法的流程图;
图7为本发明实施例提供的利用索贝尔算子计算对比度参数时对视频帧进行划分的示意图;
图8为本发明实施例提供的利用哈达玛变换算法计算对比度参数的取值时一种变换矩阵的示意图;
图9A为现有技术中当前的视频帧如何选取之前的帧进行帧间预测的示意图;
图9B为本发明实施例中当前的视频帧如何选取之前的帧进行帧间预测的示意图;
图10-图11为本发明实施例提供的视频帧的处理设备的两种结构示意图。
具体实施方式
为使本发明实施例的目的、技术方案和优点更加清楚,下面将结合本发明实施例中的附图,对本发明实施例中的技术方案进行清楚、完整地描述,显然,所描述的实施例是本发明一部分实施例,而不是全部的实施例。基于本发明中的实施例,本领域普通技术人员在没有作出创造性劳动前提下所获得的所有其他实施例,都属于本发明实施例保护的范围。
以下,对本发明实施例中的部分用语进行解释说明,以便于本领域技术人员理解。
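1) H.264 is a highly compressed digital video codec standard proposed by the Joint Video Team (JVT), which was formed jointly by the Video Coding Experts Group (VCEG) of the ITU Telecommunication Standardization Sector (ITU-T) and the Moving Picture Experts Group (MPEG) of the International Organization for Standardization (ISO) / International Electrotechnical Commission (IEC).
The H.264 protocol defines three frame types. A frame that is predicted with reference only to information inside the frame currently being encoded is called an I frame. An I frame is generally predicted by intra prediction, that is, from image blocks of the current frame that have already been reconstructed, and not from neighboring frames. A frame that is generated with reference to a preceding frame and encodes only the differences is called a P frame, and a frame that is encoded with reference to both preceding and following frames is called a B frame.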
其中,对于I帧一般进行的是帧内预测,该预测方式仅仅应用当前视频帧内部的重建值进行预测,若前面的视频帧丢失,并不会对当前视频帧产生影响。P帧一般会进行帧间预测和帧内预测,帧间预测就是利用当前编码的视频帧之前已经编码重建完成的视频帧进行预测。可见,如果当前编码的视频帧之前的参考帧若质量不高,则该视频帧根据参考帧进行帧间预测所得到的预测结果的质量也不会好。关于预测方式等与编码相关的内容将在下文中进行介绍。
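2) Terminal device, also called user equipment. User equipment is a device that provides voice and/or data connectivity to a user, and may include, for example, a handheld device with a wireless connection function or a processing device connected to a wireless modem. The user equipment can communicate with a core network through a radio access network (Radio Access Network, RAN) and exchange voice and/or data with the RAN. The user equipment may include user equipment (User Equipment, UE), a wireless terminal device, a mobile terminal device, a subscriber unit (Subscriber Unit), a subscriber station (Subscriber Station), a mobile station (Mobile Station), a mobile (Mobile), a remote station (Remote Station), an access point (Access Point, AP), a remote terminal device (Remote Terminal), an access terminal device (Access Terminal), a user terminal device (User Terminal), a user agent (User Agent), a user device (User Device), and so on. Examples include a mobile phone (also called a "cellular" phone), a computer with a mobile terminal device, and portable, pocket-sized, handheld, computer built-in, or vehicle-mounted mobile apparatuses, such as a personal communication service (Personal Communication Service, PCS) phone, a cordless phone, a Session Initiation Protocol (SIP) phone, a wireless local loop (Wireless Local Loop, WLL) station, and a personal digital assistant (Personal Digital Assistant, PDA).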
3)本发明实施例中的“多个”是指两个或两个以上。“和/或”,描述关联对象的关联关系,表示可以存在三种关系,例如,A和/或B,可以表示:单独存在A,同时存在A和B,单独存在B这三种情况。另外,字符“/”,如无特殊说明,一般表示前后关联对象是一种“或”的关系。
因为本发明实施例涉及到视频编码的过程,因此,为了更好地理解本发明实施例所提供的技术方案,下面先简单介绍视频编码的过程。
请参见图1,为视频编码过程的示意图。视频编码过程包括预测、变换、量化、及熵编码等几个过程。其中变换过程主要是为了去除空间信号的相关性,变换方式例如为离散余弦变换(Discrete Cosine Transform,DCT)。量化过程是用更小的集合表示更大的集合的过程,量化方法如标量(Scalar)量化等。对于变换、量化等过程,均可参考现有技术,本发明实施例不多赘述。
例如第一设备通过传感器采集了一个视频帧,即图1中的原视频帧。第一设备在对该原视频帧进行编码时,将该原视频帧与最优预测值相减,得到预测残差,对预测残差进行变换、量化后得到量化系数,并对量化系数进行熵编码,生成码流。其中,如果原视频帧是I帧,则最优预测值为帧内预测的最优模式对应的预测值,如果原视频帧是P帧,则是利用前面的视频帧进行帧间预测以及利用原视频帧内已重建的图像块进行帧内预测这两大类预测方法对比后选择的最优预测值。另外,第一设备还对变换量化后得到的量化系数进行反量化、反变换,得到原视频帧的重建残差,根据重建残差和最优预测值得到重建视频帧,再根据重建视频帧进行预测。
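The residual, quantization, and reconstruction loop described above can be sketched in a few lines. This is a minimal illustration, assuming a one-dimensional block of samples and a uniform scalar quantizer with step `qstep`; it omits the transform and entropy-coding stages, and the name `encode_block` is hypothetical rather than part of any codec API.

```python
def encode_block(original, prediction, qstep):
    """Toy encoder loop: residual -> quantize -> dequantize -> reconstruct.

    original, prediction: lists of sample values of equal length.
    qstep: quantization step size (assumed uniform scalar quantizer).
    Returns (levels, reconstruction).
    """
    # Prediction residual: original minus the chosen prediction.
    residual = [o - p for o, p in zip(original, prediction)]
    # Quantization maps each residual to a small integer level.
    levels = [round(r / qstep) for r in residual]
    # Inverse quantization plus prediction gives the reconstructed block,
    # which is what later prediction would be based on.
    reconstruction = [p + lv * qstep for p, lv in zip(prediction, levels)]
    return levels, reconstruction


levels, recon = encode_block([101, 105, 97, 90], [98, 98, 98, 98], qstep=4)
```

The reconstruction differs slightly from the original samples because quantization is lossy; a larger `qstep` spends fewer bits and loses more detail.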
其中,预测部分由两大类预测模式构成:帧间预测和帧内预测。
帧内预测是利用空间相关性,探索视频帧内部的相关性,一般来说,编码过程对视频帧逐块进行,编码端也会对各个视频块逐块进行重建。帧内预测块直接利用视频帧中当前编码的视频块的上方的重建块和左方的重建块进行预测。
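Intra prediction can be performed in several ways. Referring to Figure 2A, H.265 defines 35 intra prediction modes: the planar mode (planar), the mean mode (DC), and 33 directional modes, one for each of 33 directions. The DC mode, for example, is mean prediction, which uses the mean of the neighboring reconstructed pixels as the prediction. As another example, Figure 2B illustrates prediction mode 29. Prediction mode 29 predicts along the corresponding direction from the reconstructed values of the row above the current block (the horizontal boxed region in Figure 2B) and the column to its left (the vertical boxed region in Figure 2B). The arrows in Figure 2B indicate the direction; that is, every pixel along the direction of an arrow is predicted from the reconstructed values of the region marked with horizontal lines.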
帧间预测是利用时间相关性,探索相邻视频帧之间的相关性,通过在时间上已经重建的视频帧对当前的视频帧进行预测,即利用当前视频帧之前的已经重建完成的其他视频帧对当前视频帧进行预测。
请参见图3,为帧间预测的一种示意图,图3以采用当前视频帧之前的4个帧作为参考帧为例,即利用之前的4个视频帧来对当前视频帧进行预测。目前为了简化预测过程,一般会选择用当前视频帧之前的一个视频帧对当前视频帧进行预测。图3中的5个视频帧是按照时间先后顺序排列的,右边的第一个视频帧为当前视频帧,前面的4个视频帧为参考帧。将当前视频帧划分为多个图像块,对于每个图像块,分别在4个参考帧中查找预测块,若查找到的预测块的数量大于或等于2,则从中选择一个最接近的预测块作为该图像块的预测块。例如图3中,当前视频帧里面的图像块1是将从左往右数的第3个参考帧中的预测块2作为预测块,当前视频帧里面图像块3在从左往右数的第1个参考帧中查找到了的预测块4,以及在从左往右数的第4个参考帧中查找到了预测块5,则对于图像块3来说,可以从预测块4和预测块5中选择一个来作为预测块。
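After intra prediction and inter prediction have been performed on the current video frame under different block partitionings, the optimal prediction value of the current video frame can be selected. At present this is generally done by rate-distortion optimization, which weighs bit rate against error in order to select the best prediction value. The rate-distortion cost is computed as:
RDcost = D + λR       (1)
In formula (1), D is the error between the reconstructed value after encoding under the current prediction method (intra prediction or inter prediction) and the original value, R is the number of bits used for encoding under the current prediction method, and λ is the Lagrange factor. For the intra and inter prediction values under each block partitioning, formula (1) gives the corresponding RDcost, and the prediction value with the smallest RDcost is selected as the optimal prediction value.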
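A short sketch of how formula (1) drives the choice among candidate predictions. The candidate names, distortion values, and bit counts below are made-up illustrative numbers, and `pick_best_prediction` is a hypothetical helper, not an encoder API.

```python
def rd_cost(distortion, bits, lam):
    """Rate-distortion cost from formula (1): D + lambda * R."""
    return distortion + lam * bits


def pick_best_prediction(candidates, lam):
    """Return the (name, distortion, bits) tuple with the smallest RD cost."""
    return min(candidates, key=lambda c: rd_cost(c[1], c[2], lam))


# Hypothetical candidates: (name, distortion D, bits R).
candidates = [
    ("intra_dc", 120.0, 40),
    ("intra_mode_29", 90.0, 65),
    ("inter_prev_frame", 60.0, 80),
]
best = pick_best_prediction(candidates, lam=0.5)
```

With `lam=0.5` the costs are 140, 122.5, and 100, so the inter candidate wins. Raising λ makes bits more expensive: with `lam=2` the costs become 200, 220, and 220 and the cheap intra DC candidate is chosen instead.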
如上简单介绍了视频编码过程,本发明实施例主要涉及的是选择哪些视频帧进行编码,以及在视频编码过程中选择哪些视频帧进行后续的帧间预测。下面介绍本发明实施例的一种应用场景,请参见图4。在实时传输的过程中,例如第一设备在通过传感器(例如摄像头)实时采集视频(图4中的图像表示第一设备采集的视频帧),第一设备采集视频帧后,对视频帧进行编码,再实时将编码后的视频帧通过基站传输给第二设备,以供第二设备实时显示。因为网络状态不稳定,丢包率比较高,因此第一设备需要从采集的视频帧中丢弃一些视频帧,再将剩余的视频帧编码后传输给第二设备,这样可以减小对网络带宽的占用,提高传输成功率。其中,第一设备和第二设备都可以是用户设备,图4是以手机为例。
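As the description of video encoding above shows, P frames have to be predicted by inter prediction. When network bandwidth is unstable and the first device does not actively discard video frames, some video frames are lost because of the instability, which causes error propagation. Figure 5 shows video frames being forcibly dropped in the prior art because of unstable network bandwidth. In Figure 5, video frame 3 is dropped. Video frame 3 is of fairly high quality, and subsequent video frames predicted from it would have obtained good prediction results. Because video frame 3 is dropped, subsequent video frames cannot be predicted correctly: as Figure 5 shows, prediction errors appear from video frame 4 onward and become more pronounced with each later frame, so the error propagates. To avoid this, the prior art drops frames actively. With the prior-art approach, however, the first device discards video frames sampled at equal time intervals, so it is quite likely to discard high-quality video frames while keeping blurred frames captured during device shake and similar conditions.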
为解决该技术问题,本发明实施例提出,即使需要丢弃一些视频帧,也会根据视频帧的质量进行选择,对于高质量的视频帧尽量保留,这样可以提高预测准确性,使得得到的视频尽量清晰,提高视频质量。
下面结合说明书附图介绍本发明实施例提供的技术方案。下文结合图4所示的应用场景进行介绍。
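Referring to Figure 6, an embodiment of the present invention provides a video frame processing method. The method can be performed by a device that captures video frames and encodes them; taking Figure 4 as an example, this device is the first device.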
步骤601:第一设备在采集视频的过程中,获取第一视频帧的参数的取值;第一视频帧为第一设备采集的视频中的任意一帧,该参数的取值用于指示第一视频帧的清晰程度;
步骤602:第一设备将该参数的取值与预设取值范围进行比较,确定该参数的取值是否位于预设取值范围内;
步骤603:若该参数的取值位于该预设取值范围内,则第一设备保留第一视频帧。
本发明实施例所提供的方法可以对第一设备采集的视频所包括的每个视频帧都执行,或者对采集的视频中的特定视频帧执行。另外,本发明实施例所提供的方法可以在任意情况下执行,或者也可以在考虑了当前帧率和/或网络状态的情况下执行,例如在当前帧率较大或者网络情况不稳定(网络带宽不足)时执行本发明实施例所提供的方案。
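In a possible implementation, the parameter may be a contrast parameter (contrast), a noise parameter, or another parameter; any parameter that reflects the sharpness of a video frame can serve as the parameter extracted in the embodiments of the present invention. If the parameter is a contrast parameter, the preset value range can be the range above a preset contrast threshold. If the value of the contrast parameter of the first video frame is greater than the preset contrast threshold, the value is within the preset value range, and the first video frame is considered fairly sharp and of high quality. If the value is less than or equal to the preset contrast threshold, the value is not within the preset value range, and the first video frame is considered fairly blurred and of poor quality. If the parameter is a noise parameter, that is, a parameter indicating how severe the noise in the first video frame is, the preset value range can be the range below a preset noise threshold. If the value of the noise parameter of the first video frame is less than the preset noise threshold, the value is within the preset value range, and the first video frame is considered fairly sharp and of high quality. If the value is greater than or equal to the preset noise threshold, the value is not within the preset value range, and the first video frame is considered fairly blurred and of poor quality.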
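If the parameter extracted from the first video frame is the noise parameter, the question is how to obtain its value. In the embodiments of the present invention, the noise parameter can be expressed through the sensitivity (ISO) parameter, from which the severity of noise can be estimated. The ISO parameter is related to resolution, and its value differs at different resolutions. In general, the larger the ISO value, the more severe the noise. A preset value range for the noise parameter, that is, a preset noise threshold, can be set separately for each resolution. The preset value range for the resolution of the first video frame can then be looked up, and the quality of the first video frame is determined by whether the value of its noise parameter lies within that range.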
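The per-resolution noise check can be sketched as a table lookup. The threshold values and the resolutions in `NOISE_THRESHOLDS` are hypothetical placeholders; the text only states that a threshold is configured per resolution, not what the values are.

```python
# Hypothetical per-resolution ISO thresholds; real values would be tuned
# for a specific sensor and are not given in the source text.
NOISE_THRESHOLDS = {
    (1280, 720): 800,
    (1920, 1080): 1600,
}


def noise_in_preset_range(iso, resolution, thresholds=NOISE_THRESHOLDS):
    """Return True when the ISO-based noise value is below the threshold
    configured for this resolution, i.e. the frame counts as clean enough."""
    return iso < thresholds[resolution]


keep_720p = noise_in_preset_range(400, (1280, 720))
```

A value equal to the threshold is treated as outside the range, matching the "greater than or equal to" case in the text.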
或者,若提取的第一视频帧的参数是对比度参数,那么同样涉及到如何获取对比度参数的取值。
本发明实施例中,可以有不同的方式来获取对比度参数的取值,以下举例介绍。
1、根据对焦信息获取对比度参数。
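In this approach, the focus information of the image signal processing (Image Signal Processing, ISP) module is obtained. If the focus object indicated by the focus information is the object that the first video frame is meant to capture, the value of the contrast parameter can be obtained directly from the focus information, which is simple. The object that the first video frame is meant to capture is the object the user is interested in. Which objects in a video are of interest to the user can be set by the user in advance, or determined by analyzing the video frames that precede the first video frame. For example, if most of the preceding video frames contain a certain object, or the focus object of several preceding video frames is a certain object, that object is determined to be the object of interest to the user.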
如果根据第一视频帧的对焦信息确定该对焦信息所指示的对焦对象就是用户感兴趣的对象,那么就说明第一视频帧的对焦无误,则通过该对焦信息所获得的对比度参数的取值可以认为是比较准确的。而如果该对焦信息所指示的对焦对象不是用户感兴趣的对象,那么说明在拍摄第一视频帧时对焦有误,若通过该对焦信息获得对比度参数的取值可能会不够准确,在这种情况下就可以不通过对焦信息来获得对比度参数的取值。
如果该对焦信息所指示的对焦对象不是用户感兴趣的对象,那么可以通过如下介绍的第2种方式或第3种方式来获得第一视频帧的对比度参数的取值。当然,可以在无法使用第1种方式获得对比度参数的取值时再选择使用第2种方式或第3种方式来获得第一视频帧的对比度参数的取值,或者一开始就可以在这3种方式中任意选取一种方式来获得第一视频帧的对比度参数的取值。
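The choice between the focus-information path and a fallback method can be sketched as follows. This is only an illustration under stated assumptions: the dictionary keys `focus_object` and `contrast` are hypothetical field names for the ISP focus information, picking the most frequent focus object is one simple reading of the "several preceding frames" rule, and `fallback` stands for whichever of method 2 or method 3 is used.

```python
from collections import Counter


def object_of_interest(previous_focus_objects, user_choice=None):
    """Use the user's setting if present, otherwise the focus object that
    occurs most often in the preceding frames (None if there are none)."""
    if user_choice is not None:
        return user_choice
    if not previous_focus_objects:
        return None
    return Counter(previous_focus_objects).most_common(1)[0][0]


def contrast_for_frame(focus_info, target, fallback):
    """Take the contrast from the focus information when the frame focused
    on the object of interest; otherwise compute it with the fallback."""
    if target is not None and focus_info.get("focus_object") == target:
        return focus_info["contrast"]
    return fallback()


target = object_of_interest(["face", "face", "tree"])
value = contrast_for_frame({"focus_object": "face", "contrast": 42.0},
                           target, fallback=lambda: -1.0)
```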
2、利用索贝尔(sobel)算子计算对比度参数的取值。
在这种方式下,可以利用视频帧中包括的像素点的YUV色彩空间模型中的表示明亮度(Luminance)的Y分量的值来计算对比度参数的取值。
以3*3的像素块为例。Sobel算子提供Gx和Gy两个矩阵,如下:
Figure PCTCN2016104119-appb-000001
Figure PCTCN2016104119-appb-000002
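When computing with the Sobel operator, the matrix formed by the Y values of the 3*3 image block is multiplied position by position with the Gx matrix and the products are summed to give val1; the same is done with the Gy matrix to give val2. The sum of the squares of val1 and val2 is the contrast value of the 3*3 image block. The "cross multiplication" referred to here means multiplying the values at corresponding positions of the two matrices and then adding up the products.
If the contrast of the 3*3 image block is denoted contrast, its value can be computed by formula (2):
$\text{contrast} = \text{val1}^2 + \text{val2}^2$       (2)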
其中,根据公式(2)计算出来的,可以视为一个3*3的图像块的对比度的取值,更准确来讲,实际上应该是该3*3的图像块的中间位置的像素点的对比度的取值。那么,如果要使得计算结果更准确,可以分别将一个视频帧里的每个像素点作为一个3*3的图像块的中间位置的像素点,也就是说,在一个视频帧里分别构建3*3的图像块,使得该视频帧里的每个像素点都能够位于其中一个3*3的图像块的中间位置,通过如上的计算方式计算构建的每个3*3的图像块的对比度的取值,也就是计算该视频帧里的每个像素点的对比度的取值。在得到该视频帧里每个像素点的对比度的取值之后,将所有的对比度的取值求平均值,就得到了该视频帧的对比度的取值。这里求平均值,可以是简单的算术平均,也可以是加权平均,本发明实施例不作限制。
当然,如果想要简化计算过程,也可以直接将根据公式(2)计算得到的结果视为一个3*3的图像块的对比度的取值。在这种方式下,首先可以将一个视频帧按照3*3的尺寸进行划分,得到多个3*3的图像块,如图7所示。图7表示一个视频帧,其中的每个方框表示一个3*3的图像块,图7以将一个图像划分为72个3*3的图像块为例,在实际应用中当然不限于此。对划分得到的每个3*3图像块按照如上方式计算对比度的取值,在计算得到所有的图像块的对比度的取值后,将计算得到的所有对比度的取值求平均值,就得到了该视频帧的对比度的取值。这里求平均值,可以是简单的算术平均,也可以是加权平均,本发明实施例不作限制。
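The simplified block-based variant can be sketched as below. The Gx and Gy matrices of the original figures are not reproduced in this text, so the code assumes the commonly used 3*3 Sobel kernels; the sign convention does not affect the result because val1 and val2 are squared. The sketch uses a plain arithmetic mean and assumes the frame is at least 3*3.

```python
# Commonly used 3x3 Sobel kernels (assumed; the patent figures are not shown here).
GX = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
GY = [[1, 2, 1], [0, 0, 0], [-1, -2, -1]]


def block_contrast(block):
    """Formula (2): contrast = val1^2 + val2^2 for one 3x3 block of Y values."""
    val1 = sum(block[r][c] * GX[r][c] for r in range(3) for c in range(3))
    val2 = sum(block[r][c] * GY[r][c] for r in range(3) for c in range(3))
    return val1 * val1 + val2 * val2


def frame_contrast(luma):
    """Split the luma plane into non-overlapping 3x3 tiles and average
    the per-tile contrast values (incomplete border tiles are skipped)."""
    height, width = len(luma), len(luma[0])
    values = []
    for top in range(0, height - 2, 3):
        for left in range(0, width - 2, 3):
            block = [row[left:left + 3] for row in luma[top:top + 3]]
            values.append(block_contrast(block))
    return sum(values) / len(values)


flat = [[10] * 6 for _ in range(6)]
edges = [[0, 0, 10, 0, 0, 10] for _ in range(6)]
```

A uniform frame such as `flat` yields 0, while `edges`, which has a vertical step inside every tile, yields a large value; sharper content gives higher contrast.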
以上是以3*3的图像块为例,在实际应用中不限于此,例如还可以选取5*5的图像块,若选取5*5的图像块,那么对应的Gx矩阵和Gy矩阵如下:
Figure PCTCN2016104119-appb-000003
Figure PCTCN2016104119-appb-000004
具体计算方式可参考如上对于3*3的图像块的计算方式的描述。当然,除了3*3和5*5之外,还可以选取其他尺寸的图像块,例如7*7、或9*9等,本发明实施例不作限制。
3、利用哈达玛变换算法计算对比度参数的取值。
在这种方式下,也可以利用视频帧中包括的像素点的YUV色彩空间模型中Y分量的值来计算对比度参数的取值。
以8*8的像素块为例,其中像素块中的每个像素点取其Y分量的值,构成一个8*8的矩阵。对该8*8的矩阵进行哈达玛变换,使用的一种变换矩阵如图8所示。变换公式如下:
Figure PCTCN2016104119-appb-000005
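In formula (3), Y denotes the matrix after the transform, X the matrix before the transform, Hn the transform matrix (for example, the matrix shown in Figure 8), and $H_n^{T}$ the transpose of the transform matrix.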
在按照公式(3)进行哈达玛变换后,将得到的矩阵Y中的各个系数按如下公式进行计算,得到该8*8的像素块的对比度参数的值:
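$\text{contrast} = \sum_{(j,k) \neq (0,0)} \operatorname{abs}\big(Y(j,k)\big)$       (4)
In formula (4), Y(j,k) denotes the coefficient in row j, column k of the matrix Y (the transformed luma values), and abs denotes the absolute value. That is, the absolute values of the 63 coefficients of Y other than the one at position (0,0) are added up to give the value of the contrast parameter of the 8*8 pixel block. The first video frame is divided into multiple 8*8 pixel blocks, the value of the contrast parameter of each 8*8 pixel block is computed, and the values of all pixel blocks are added up to give the value of the contrast parameter of the first video frame.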
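A minimal sketch of the Hadamard-based contrast, under two assumptions that the text here does not confirm: the transform is taken as Y = H X Hᵀ, and H is built with the Sylvester construction, whose row ordering may differ from the matrix in Figure 8. The result of formula (4) does not depend on row ordering as long as the first row and column of H are all ones, because only the position of the (0,0) coefficient matters.

```python
def hadamard(n):
    """Sylvester-construction Hadamard matrix; n must be a power of two."""
    h = [[1]]
    while len(h) < n:
        h = [row + row for row in h] + [row + [-v for v in row] for row in h]
    return h


def matmul(a, b):
    """Plain-Python matrix product of two lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def block_contrast(block):
    """Sum of absolute transform coefficients, excluding the (0,0) term."""
    h = hadamard(len(block))
    h_t = [list(col) for col in zip(*h)]
    y = matmul(matmul(h, block), h_t)
    return sum(abs(v) for row in y for v in row) - abs(y[0][0])


def frame_contrast(luma, size=8):
    """Add up the contrast of all non-overlapping size x size blocks."""
    height, width = len(luma), len(luma[0])
    total = 0
    for top in range(0, height - size + 1, size):
        for left in range(0, width - size + 1, size):
            block = [row[left:left + size] for row in luma[top:top + size]]
            total += block_contrast(block)
    return total


flat_block = [[5] * 8 for _ in range(8)]
frame = [[5] * 8 + [0] * 8 for _ in range(8)]
frame[0][8] = 1  # single bright pixel in the right-hand block
```

A uniform block puts all its energy in the (0,0) coefficient and therefore has contrast 0, while any detail spreads energy into the other 63 coefficients.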
以上是以8*8的像素块为例,在实际应用中不限于此,例如可以选取4*4的像素块,或16*16的像素块,或32*32的像素块,等等。
以上三种获得对比度参数的取值的方式只是举例,本领域技术人员自然知晓还可以采用何种方式来获得对比度参数的取值,本发明实施例对此不作限制。
下文主要以提取的参数是对比度参数为例。
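After the value of the contrast parameter of the first video frame is obtained, it is compared with the preset value range to determine whether it is within the preset value range. For the contrast parameter, the preset value range is the range above the preset contrast threshold.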
由以上几种计算对比度参数的取值的方式可知,对比度参数的取值可以充分反映当前视频内容的对比度信息,在内容接近的情况下,可以充分表明当前对焦的准确性。对比度参数的取值越大,说明对焦越好。因此,在为对比度参数的预设取值范围设置预设对比度阈值时,该预设对比度阈值可以利用相邻的前n帧的对比度参数的取值进行一定的加权平均后获得。在进行加权平均时,权值可以根据用户对于视频的要求选取,例如用户要求视频越清晰,则选取的权值可以越大。一种加权平均的方式如下:
Figure PCTCN2016104119-appb-000008
公式(5)中,contrast(i)表示第i帧的对比度参数的取值,i表示时间维度上的第i帧,i-1表示第i帧在时间上的前一帧,thr表示预设对比度阈值。公式(5)中以通过前3帧来计算预设对比度阈值为例,即n=3,以及以权值=0.8为例,在实际应用中当然不限于此。
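A sketch of deriving the threshold from the preceding frames. The exact form of formula (5) is not reproduced in this text, so the code assumes one plausible reading: the threshold is the weight (0.8 in the example) multiplied by the arithmetic mean of the contrast values of the previous n frames (n = 3 in the example). The function name is hypothetical.

```python
def contrast_threshold(previous_contrasts, n=3, weight=0.8):
    """Assumed form: thr = weight * mean(contrast(i-1), ..., contrast(i-n)).

    previous_contrasts: contrast values of earlier frames, oldest first.
    """
    recent = previous_contrasts[-n:]
    if not recent:
        raise ValueError("at least one previous frame is required")
    return weight * sum(recent) / len(recent)


thr = contrast_threshold([100, 200, 300, 400])
```

Only the last three values (200, 300, 400) contribute here, giving a threshold of about 240. Because the threshold follows recent content, a frame is judged against its own scene rather than against a fixed global value.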
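In addition, formula (5) is only one example of how to obtain the preset contrast threshold. The method is not limited to this; any similar combination and adaptation of the contrast information of neighboring frames can yield a suitable preset contrast threshold.
If the value of the contrast parameter is greater than the preset contrast threshold, the first video frame is of high quality, and it can be decided not to discard it, that is, to retain it. Since the first video frame still has to be encoded, in a possible implementation it is encoded in the first encoding mode, in which the number of bits (bit) used is greater than a preset bit-count threshold. More bits are allocated to the first video frame, a smaller quantization parameter (quantization parameter, QP) is then selected for it by rate control, and the frame is encoded; for the encoding itself, refer to encoding methods in the prior art. The preset bit-count threshold can be set according to the capability of the device or the user's requirements for the video. In other words, the value of the contrast parameter indicates the focus accuracy and sharpness of the first video frame. If the value is greater than the preset contrast threshold, the first video frame can be considered accurately focused or fairly sharp, that is, a high-quality frame, and the embodiments of the present invention allocate more bits to the encoder so that this frame is encoded with emphasis, which improves the quality of the whole video.
The number of bits used when encoding in the first encoding mode may be the same for all frames or may differ. Suppose two video frames, a first video frame and a second video frame, are both encoded in the first encoding mode, and the difference between the contrast parameter value of the first video frame and the preset contrast threshold is smaller than the corresponding difference for the second video frame, so that the second video frame can be considered of higher quality than the first. The two frames can then be encoded with the same number of bits, which is simpler for the encoder, or the second video frame can be encoded with more bits than the first, which further improves video quality because higher-quality frames are encoded better.
If the number of bits used in the first encoding mode differs between frames, multiple difference intervals can be preset together with a mapping from each difference interval to a number of encoding bits. For a video frame whose contrast parameter value is greater than the preset contrast threshold, the difference between the value and the threshold is computed, the difference interval containing it is determined, and the frame is encoded with the number of bits mapped to that interval.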
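The interval-to-bit-count mapping can be sketched with a sorted list of interval boundaries. The boundaries and bit counts below are hypothetical; the text only says that such intervals and their mapping are preset.

```python
import bisect

# Hypothetical difference intervals (0, 10), [10, 50), [50, inf) and the
# bit budget mapped to each; none of these numbers come from the source.
INTERVAL_EDGES = [10, 50]
BITS_PER_INTERVAL = [40_000, 45_000, 50_000]


def bits_for_frame(contrast, threshold):
    """Look up the bit budget of a first-mode frame from how far its
    contrast value lies above the preset contrast threshold."""
    diff = contrast - threshold
    if diff <= 0:
        raise ValueError("frame is not above the threshold")
    return BITS_PER_INTERVAL[bisect.bisect_right(INTERVAL_EDGES, diff)]


budget = bits_for_frame(contrast=260, threshold=240)
```

Here the difference is 20, which falls into the middle interval, so the frame gets the middle budget. The same lookup shape works for second-mode frames with the difference measured below the threshold.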
而在将获得的对比度参数的取值与预设取值范围进行比较后,如果确定对比度参数的取值没有位于预设取值范围内,即对比度参数的取值小于或等于预设对比度阈值,那么就表明第一视频帧的质量不是很好。如果是这种情况,那么可以选择直接丢弃第一视频帧,或者,若考虑到丢弃的帧越少则视频的质量就会越好,那么还可以进一步考虑当前的帧率。即,如果确定对比度参数的取值小于或等于预设对比度阈值,可以确定当前的帧率是否大于目标帧率,帧率是与第一设备和第二设备之间的网络带宽相关的,若当前的帧率大于目标帧率,表明网络带宽不足,在这种情况下就没有必要再传输质量不好的视频帧,因此可以直接丢弃第一视频帧,以节省网络带宽。且因为丢弃的是质量不太好的视频帧,对于整个视频的质量的影响也不会很大。
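If the current frame rate is less than or equal to the target frame rate, network bandwidth is sufficient. In that case even a video frame of poor quality can be transmitted, which reduces the number of discarded video frames. If the first video frame is not discarded, it has to be encoded. In a possible implementation, if the contrast parameter value of the first video frame is less than or equal to the preset contrast threshold but the frame is not discarded because of the frame rate, the first video frame can be encoded in the second encoding mode, in which the number of bits used is less than or equal to the preset bit-count threshold. Fewer bits are allocated to the first video frame, a larger QP is then selected for it by rate control, and the frame is encoded; for the encoding itself, refer to encoding methods in the prior art. In the embodiments of the present invention, the QP for a video frame encoded in the second encoding mode can be assigned from experience, or it can be preset that the QP assigned to frames encoded in the second encoding mode exceeds the QP assigned to frames encoded in the first encoding mode by a difference M. M can be set from empirical values or from the user's requirements for the video frames, for example M equal to 2 or 3. In other words, the value of the contrast parameter indicates the focus accuracy and sharpness of the first video frame. If the value is less than or equal to the preset contrast threshold, the first video frame can be considered inaccurately focused or unclear. To optimize the encoder's results, the encoder in the embodiments of the present invention does not spend many bits on such a frame, that is, it does not encode the frame with emphasis, which saves encoding bits and lightens the encoder's load.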
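The keep, downgrade, or drop decision described above can be summarized in one function. This is a sketch for the contrast parameter only; the enum and function names are hypothetical, and `qp_for` assumes the fixed-offset QP option with M = 2.

```python
from enum import Enum


class Action(Enum):
    ENCODE_FIRST_MODE = "first"    # high-quality frame, more bits, smaller QP
    ENCODE_SECOND_MODE = "second"  # low-quality frame kept, fewer bits, larger QP
    DROP = "drop"                  # low-quality frame while the frame rate is too high


def decide(contrast, threshold, current_fps, target_fps):
    """Decision flow for one frame based on contrast and frame rate."""
    if contrast > threshold:
        return Action.ENCODE_FIRST_MODE
    if current_fps > target_fps:
        return Action.DROP
    return Action.ENCODE_SECOND_MODE


def qp_for(action, first_mode_qp, m=2):
    """QP of a kept frame: second-mode frames use a QP larger by M."""
    if action is Action.DROP:
        raise ValueError("dropped frames are not encoded")
    return first_mode_qp + (m if action is Action.ENCODE_SECOND_MODE else 0)


action = decide(contrast=200, threshold=240, current_fps=24, target_fps=30)
```

In this call the frame is below the threshold but the frame rate is within the target, so it is kept and encoded in the second mode.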
其中,按照第二编码方式进行编码,所使用的比特数量可以相同,也可以不同。比如有两个视频帧,分别为第一视频帧和第二视频帧,都采用第二编码方式进行编码,第一视频帧与预设对比度阈值之间的差值小于第二视频帧与预设对比度阈值之间的差值,即可以认为第一视频帧的质量要高于第二视频帧的质量,在这种情况下,对第一视频帧和第二视频帧进行编码可以采用相同的比特数量,这样对于编码器来说较为简单,或者,对第一视频帧编码所使用的比特数量可以大于对第二视频帧编码所使用的比特数量,这样可以进一步提高视频的质量,使得质量高的视频帧得到更好的编码。
若按照第二编码方式进行编码所使用的比特数量不同,那么可以预先设置多个差值区间,并设置差值区间与编码所使用的比特数量之间的对应关系,对于一个视频帧来说,如果该视频帧的对比度参数的取值小于或等于预设对比度阈值,则计算该视频帧的对比度参数的取值与预设对比度阈值之间的差值,确定该差值位于哪个差值区间内,则采用该差值区间所对应的比特数量来对该视频帧进行编码。
本发明实施例中,可以根据网络带宽以及当前的帧率来确定为第一编码方式分配的比特数和为第二编码方式分配的比特数。这里的为第一编码方式分配的比特数,是指在对一个视频帧按照第一编码方式进行编码时所使用的比特数,为第二编码方式分配的比特数也是同样。其中,为了更好地确定为第一编码方式分配的比特数和为第二编码方式分配的比特数,可以在这两种比特数之间设置比例关系,例如预先设定为第一编码方式分配的比特数是为第二编码方式分配的比特数的N倍,其中N为大于1的整数或小数。
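For example, let N=2, the network bandwidth be 1000 Kbit/s, the current frame rate be 30, and the ratio of video frames encoded in the first encoding mode to video frames encoded in the second encoding mode be about 1:1, that is, about 15 frames of each kind per second. The number of bits allocated to the first encoding mode is then 1000*2/(15*2+15) = 44.4 Kbit, and the number of bits allocated to the second encoding mode is 1000/(15*2+15) = 22.2 Kbit. The value of N can be set according to the user's requirements for video frame quality, and the ratio of frames encoded in the first encoding mode to frames encoded in the second encoding mode can depend on factors such as how much the user shakes the device and how accurate the focus is.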
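The arithmetic of this example generalizes directly: if a second-mode frame costs one unit and a first-mode frame costs N units, the per-second budget is split over `first_frames * N + second_frames` units. The function name is hypothetical.

```python
def allocate_bits(bandwidth_kbit, first_frames, second_frames, n):
    """Return (bits per first-mode frame, bits per second-mode frame) in Kbit,
    given the per-second bandwidth and the number of frames of each kind."""
    unit = bandwidth_kbit / (first_frames * n + second_frames)
    return n * unit, unit


first_bits, second_bits = allocate_bits(1000, first_frames=15, second_frames=15, n=2)
```

For the numbers in the text this gives roughly 44.4 Kbit and 22.2 Kbit per frame, and 15 frames of each kind together use the full 1000 Kbit.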
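In addition, video frames may be encoded in different encoding modes, and a video frame encoded in the second encoding mode is itself of poor quality. If subsequent video frames used the reconstructed frame of such a frame for inter prediction, the prediction could be inaccurate and the quality of the video frames would drop. The embodiments of the present invention therefore specify the following. A frame encoded with emphasis, that is, a frame encoded in the first encoding mode, is sharp and of good quality, so its reconstructed frame can serve as a reference frame for subsequent video frames, which can use it for inter prediction. A frame not encoded with emphasis, that is, a frame encoded in the second encoding mode, has a low-sharpness, poor-quality source and is not encoded with emphasis, so its reconstructed frame is not used as a reference frame for subsequent video frames, which do not select it for inter prediction. For example, identification information can be added to frames encoded in the first encoding mode and to frames encoded in the second encoding mode. The identification information can be a single bit that takes the value "1" for a frame encoded in the first encoding mode and "0" for a frame encoded in the second encoding mode. The identification information of a video frame then shows which encoding mode was used for it, and therefore whether the frame is to be used as a reference frame for subsequent video frames.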
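For the first video frame, for example, inter prediction can be performed by using the second video frame, that is, by using the reconstructed frame of the second video frame. The second video frame is, among the video frames that precede the first video frame and are encoded in the first encoding mode, the one with the smallest interval to the first video frame. In this way, higher-quality video frames are selected for inter prediction wherever possible. Because their sharpness is closer, it is easier to find a more accurate prediction value, which helps improve the compression ratio.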
如图9A所示,为现有的实时视频编码业务中常用的一种参考帧结构,其中每个矩形表示一个视频帧。在现有方式下,每一帧使用时间上相邻的前一帧作为参考帧,如图9A中的箭头所示。如图9B所示,为本发明实施例所提供的参考帧结构,其中每个矩形表示一个视频帧,长度较长的矩形表示按照第一编码方式进行编码的视频帧,在图9B中将其称为大P帧,长度较短的矩形表示按照第二编码方式进行编码的视频帧,在图9B中将其称为小P帧。其中,后续的视频帧进行帧间预测时可以选择之前的大P帧,而不选择之前的小P帧,如图9B中的箭头所示。例如对于图9B中从左往右数的第4个视频帧,即大P帧来说,其在进行帧间预测时不选择与其相邻的前一个小P帧,而是可以跳过该小P帧,选择该小P帧之前的大P帧,即从左往右数的第2个视频帧进行帧间预测。
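The reference structure of Figure 9B amounts to searching backwards for the nearest frame encoded in the first mode. The sketch below assumes each frame carries the one-bit identification information described earlier (1 for a large P frame, 0 for a small P frame); the function name is hypothetical.

```python
def pick_reference(mode_flags, current):
    """Index of the nearest earlier frame encoded in the first mode,
    or None if no such frame exists.

    mode_flags[i] is 1 for a first-mode (large P) frame and 0 for a
    second-mode (small P) frame; current is the index of the frame
    about to be predicted.
    """
    for idx in range(current - 1, -1, -1):
        if mode_flags[idx] == 1:
            return idx
    return None


# Frames 1..4 of the Figure 9B example as zero-based indices 0..3:
# the third frame is a small P frame and is skipped.
reference = pick_reference([1, 1, 0, 1], current=3)
```

For the fourth frame (index 3) the search skips the small P frame at index 2 and returns index 1, the second frame, as in the example above.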
需要说明的是,本发明实施例所提供的视频帧处理方法,不仅适用于P帧,同样也适用于I帧。如前介绍了对于P帧可以采用第一编码方式或第二编码方式,对于I帧同样可以采用两种编码方式,当然,I帧的第一编码方式与P帧的第一编码方式可以相同也可以不同,I帧的第二编码方式与P帧的第二编码方式可以相同也可以不同。若I帧的参数的取值位于预设取值范围内(对于I帧来说,预设取值范围与P帧所选取的预设取值范围可以相同也可以不同),则对I帧采用第一编码方式,若I帧的参数的取值没有位于预设取值范围内,而当前的帧率小于或等于目标帧率,则对I帧采用第二编码方式。其中,对于按照第一编码方式进行编码的I帧来说,因为其质量较高,规定其可以用作后续视频帧的长期参考帧,即:除了时间相邻的一个后续P帧可以采用该I帧作参考帧之外,后续其它的P帧也可以采用该I帧作为参考帧。而对于按照第二编码方式进行编码的I帧来说,因为其质量不够好,比较模糊,因此规定其只用作后续与其在时间上的距离最小的P帧使用该I帧作为参考帧,而后续的其他P帧不使用该I帧作为参考帧。
下面结合说明书附图介绍本发明实施例提供的设备。
请参见图10,本发明一实施例提供一种视频帧的处理设备,该设备可以包括处理器1001和存储器1002。
其中,处理器1001可以包括中央处理器(CPU)或特定应用集成电路(Application Specific Integrated Circuit,ASIC),可以包括一个或多个用于控制程序执行的集成电路,可以包括使用现场可编程门阵列(Field Programmable Gate Array,FPGA)开发的硬件电路,可以包括基带芯片。
存储器1002的数量可以是一个或多个。存储器1002可以包括只读存储器(Read Only Memory,ROM)、随机存取存储器(Random Access Memory,RAM)和磁盘存储器,等等。存储器1002可以用于存储处理器1001执行任务所需的程序代码,还可以用于存储数据等。
存储器1002可以通过总线1000与处理器1001相连接(图10以此为例),或者也可以通过专门的连接线与处理器1001连接。
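By designing and programming the processor 1001, the code corresponding to the method provided by the embodiment shown in Figure 6 is burned into the chip, so that the chip can execute the method of the embodiment shown in Figure 6 at run time. How to design and program the processor 1001 is well known to a person skilled in the art and is not described here. Because the device can be used to perform the method provided by the embodiment shown in Figure 6, for the functions implemented by the functional modules of the device, refer to the description of the method above.
Referring to Figure 11, an embodiment of the present invention provides a video frame processing device, which may include an obtaining module 1101, a comparison module 1102, and a processing module 1103. The obtaining module 1101 is configured to obtain the value of a parameter of a first video frame, where the first video frame is any frame of the captured video and the value of the parameter indicates the sharpness of the first video frame. The comparison module 1102 is configured to compare the value of the parameter with a preset value range and determine whether the value of the parameter is within the preset value range. The processing module 1103 is configured to retain the first video frame if the value of the parameter is within the preset value range.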
可选的,该设备还可以包括编码模块1104,在图11中一并示出。编码模块1104作为可选的功能模块,为了与必选的功能模块相区分,在图11中将其画为虚线形式。
可选的,该设备还可以包括预测模块1105,在图11中一并示出。预测模块1105作为可选的功能模块,为了与必选的功能模块相区分,在图11中将其画为虚线形式。
在实际应用中,获取模块1101、比较模块1102和处理模块1103、编码模块1104以及预测模块1105对应的实体设备均可以是图10所示的实施例中的处理器1001。
该设备可以用于执行上述图6所示的实施例提供的方法,因此对于该设备中的各功能模块所实现的功能等,可参考如前方法部分的描述,不多赘述。
本发明实施例中将第一视频帧的参数的取值与预设取值范围进行比较,如果该参数的取值位于预设取值范围内,那么就保留第一视频帧,即不会丢弃第一视频帧,也就是说,即使需要丢弃一些视频帧,也会根据视频帧的质量进行选择,对于高质量的视频帧尽量保留,这样可以使得得到的视频尽量清晰,提高视频质量。
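In the present invention, it should be understood that the disclosed device and method can be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative. The division into modules or units is only a division by logical function, and other divisions are possible in an actual implementation: multiple units or components may be combined or integrated into another system, and some features may be ignored or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be indirect coupling or communication connection through some interfaces, apparatuses, or units, and may be electrical or of another form.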
所述作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是或者也可以不是物理单元,即可以位于一个地方,或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部单元来实现本发明实施例。
在本发明实施例中的各功能单元可以集成在一个处理单元中,或者各个单元也可以均是独立的物理模块。
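If the integrated unit is implemented as a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium. On this understanding, all or part of the technical solution of the present invention can be embodied as a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device, for example a personal computer, a server, or a network device, or a processor (processor), to perform all or some of the steps of the methods described in the embodiments of the present invention. The storage medium includes any medium that can store program code, such as a Universal Serial Bus flash drive (Universal Serial Bus flash drive), a removable hard disk, a ROM, a RAM, a magnetic disk, or an optical disc.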
以上所述,以上实施例仅用以对本发明的技术方案进行了详细介绍,但以上实施例的说明只是用于帮助理解本发明实施例的方法,不应理解为对本发明实施例的限制。本技术领域的技术人员可轻易想到的变化或替换,都应涵盖在本发明实施例的保护范围之内。

Claims (24)

  1. 一种视频帧的处理方法,应用于采集视频的第一设备,所述第一设备要将采集的所述视频实时传输给第二设备,其特征在于,所述方法包括:
    所述第一设备在采集视频的过程中,获取第一视频帧的参数的取值;所述第一视频帧为所述第一设备采集的所述视频中的任意一帧,所述参数的取值用于指示所述第一视频帧的清晰程度;
    所述第一设备将所述参数的取值与预设取值范围进行比较,确定所述参数的取值是否位于预设取值范围内;
    若所述参数的取值位于所述预设取值范围内,则所述第一设备保留所述第一视频帧。
  2. 如权利要求1所述的方法,其特征在于,在所述第一设备保留所述第一视频帧之后,还包括:
    所述第一设备按照第一编码方式对所述第一视频帧进行编码;其中,按照所述第一编码方式进行编码,所使用的比特数量大于预设比特数阈值。
  3. 如权利要求1所述的方法,其特征在于,在所述第一设备确定所述参数的取值是否位于预设取值范围内之后,还包括:
    若所述第一设备确定所述参数的取值没有位于所述预设取值范围内,则确定当前的帧率是否大于目标帧率;
    若所述第一设备确定当前的帧率小于或等于所述目标帧率,则按照第二编码方式对所述第一视频帧进行编码;其中,按照所述第二编码方式进行编码,所使用的比特数量小于或等于预设比特数阈值。
  4. 如权利要求3所述的方法,其特征在于,在确定当前的帧率是否大于目标帧率之后,还包括:
    若所述第一设备确定当前的帧率大于所述目标帧率,则丢弃所述第一视频帧。
  5. The method according to any one of claims 1 to 4, wherein the parameter is a contrast parameter or a noise parameter.
  6. 如权利要求5所述的方法,其特征在于,所述参数为对比度参数;获取第一视频帧的参数的取值,包括:
    所述第一设备获得图像信号处理模块的对焦信息;
    若所述对焦信息所指示的对焦对象为所述第一视频帧需拍摄的对象,则所述第一设备根据所述对焦信息获得所述对比度参数的取值。
  7. 如权利要求6所述的方法,其特征在于,所述方法还包括:
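    if the focus object indicated by the focus information is not the object that the first video frame is meant to capture, the first device obtains the value of the contrast parameter by using the Sobel operator, or by using the Hadamard transform algorithm.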
  8. 如权利要求1-7任一所述的方法,其特征在于,所述方法还包括:
    所述第一设备通过第二视频帧对所述第一视频帧进行帧间预测;所述第二视频帧为所述第一视频帧之前的采用第一编码方式进行编码的视频帧中与所述第一视频帧之间的间隔最小的视频帧;其中,采用所述第一编码方式进行编码,所使用的比特数量大于预设比特数阈值。
  9. 一种视频帧的处理设备,所述设备要将采集的视频实时传输给第二设备,其特征在于,所述设备包括:
    获取模块,用于在所述设备采集视频的过程中,获取第一视频帧的参数的取值;所述第一视频帧为所述设备采集的视频中的任意一帧,所述参数的取值用于指示所述第一视频帧的清晰程度;
    比较模块,用于将所述参数的取值与预设取值范围进行比较,确定所述参数的取值是否位于预设取值范围内;
    处理模块,用于若所述参数的取值位于所述预设取值范围内,则保留所述第一视频帧。
  10. 如权利要求9所述的设备,其特征在于,所述设备还包括编码模块;
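    the encoding module is configured to: after the processing module retains the first video frame, encode the first video frame in a first encoding mode, wherein encoding in the first encoding mode uses a number of bits greater than a preset bit-count threshold.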
  11. 如权利要求9所述的设备,其特征在于,所述设备还包括编码模块;
    所述比较模块还用于:在确定所述参数的取值是否位于预设取值范围内之后,若所述参数的取值没有位于所述预设取值范围内,确定当前的帧率是否大于目标帧率;
    所述编码模块还用于:若所述比较模块确定当前的帧率小于或等于所述目标帧率,按照第二编码方式对所述第一视频帧进行编码;其中,按照所述第二编码方式进行编码,所使用的比特数量小于或等于预设比特数阈值。
  12. 如权利要求11所述的设备,其特征在于,所述比较模块还用于:
    在确定当前的帧率是否大于目标帧率之后,若当前的帧率大于所述目标帧率,则丢弃所述第一视频帧。
  13. 如权利要求9-12任一所述的设备,其特征在于,所述参数为对比度参数,或,噪声参数。
  14. 如权利要求12所述的设备,其特征在于,所述参数为对比度参数;所述获取模块用于:
    获得图像信号处理模块的对焦信息;
    若所述对焦信息所指示的对焦对象为所述第一视频帧需拍摄的对象,则根据所述对焦信息获得所述对比度参数的取值。
  15. 如权利要求14所述的设备,其特征在于,所述获取模块还用于:
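    if the focus object indicated by the focus information is not the object that the first video frame is meant to capture, obtain the value of the contrast parameter by using the Sobel operator, or by using the Hadamard transform algorithm.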
  16. 如权利要求9-15任一所述的设备,其特征在于,所述设备还包括预测模块,用于:
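    perform inter prediction on the first video frame by using a second video frame, wherein the second video frame is, among the video frames that precede the first video frame and are encoded in a first encoding mode, the one with the smallest interval to the first video frame, and encoding in the first encoding mode uses a number of bits greater than a preset bit-count threshold.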
  17. 一种视频帧的处理设备,所述设备要将采集的视频实时传输给第二设备,其特征在于,所述设备包括:
    存储器,用于存储指令;
    处理器,用于执行所述指令,在采集视频的过程中,获取第一视频帧的参数的取值;将所述参数的取值与预设取值范围进行比较,确定所述参数的取值是否位于预设取值范围内;若所述参数的取值位于所述预设取值范围内,则保留所述第一视频帧;所述第一视频帧为所述设备采集的所述视频中的任意一帧,所述参数的取值用于指示所述第一视频帧的清晰程度。
  18. 如权利要求17所述的设备,其特征在于,所述处理器还用于:
    在保留所述第一视频帧之后,按照第一编码方式对所述第一视频帧进行编码;其中,按照所述第一编码方式进行编码,所使用的比特数量大于预设比特数阈值。
  19. 如权利要求17所述的设备,其特征在于,所述处理器还用于:
    在确定所述参数的取值是否位于预设取值范围内之后,若确定所述参数的取值没有位于所述预设取值范围内,则确定当前的帧率是否大于目标帧率;
    若确定当前的帧率小于或等于所述目标帧率,则按照第二编码方式对所述第一视频帧进行编码;其中,按照所述第二编码方式进行编码,所使用的比特数量小于或等于预设比特数阈值。
  20. 如权利要求19所述的设备,其特征在于,所述处理器还用于:
    在确定当前的帧率是否大于目标帧率之后,若确定当前的帧率大于所述目标帧率,则丢弃所述第一视频帧。
  21. 如权利要求17-20任一所述的设备,其特征在于,所述参数为对比度参数,或,噪声参数。
  22. 如权利要求21所述的设备,其特征在于,所述参数为对比度参数;所述处理器用于获取第一视频帧的参数的取值,包括:
    获得图像信号处理模块的对焦信息;
    若所述对焦信息所指示的对焦对象为所述第一视频帧需拍摄的对象,则根据所述对焦信息获得所述对比度参数的取值。
  23. 如权利要求22所述的设备,其特征在于,所述处理器还用于:
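    if the focus object indicated by the focus information is not the object that the first video frame is meant to capture, obtain the value of the contrast parameter by using the Sobel operator, or by using the Hadamard transform algorithm.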
  24. 如权利要求17-23任一所述的设备,其特征在于,所述处理器还用于:
    通过第二视频帧对所述第一视频帧进行帧间预测;所述第二视频帧为所述第一视频帧之前的采用第一编码方式进行编码的视频帧中与所述第一视频帧之间的间隔最小的视频帧;其中,采用所述第一编码方式进行编码,所使用的比特数量大于预设比特数阈值。
PCT/CN2016/104119 2016-10-31 2016-10-31 一种视频帧的处理方法及设备 WO2018076370A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201680080601.5A CN108713318A (zh) 2016-10-31 2016-10-31 一种视频帧的处理方法及设备
PCT/CN2016/104119 WO2018076370A1 (zh) 2016-10-31 2016-10-31 一种视频帧的处理方法及设备

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2016/104119 WO2018076370A1 (zh) 2016-10-31 2016-10-31 一种视频帧的处理方法及设备

Publications (1)

Publication Number Publication Date
WO2018076370A1 true WO2018076370A1 (zh) 2018-05-03

Family

ID=62024243

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2016/104119 WO2018076370A1 (zh) 2016-10-31 2016-10-31 一种视频帧的处理方法及设备

Country Status (2)

Country Link
CN (1) CN108713318A (zh)
WO (1) WO2018076370A1 (zh)

Also Published As

Publication number Publication date
CN108713318A (zh) 2018-10-26

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16919800

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 16919800

Country of ref document: EP

Kind code of ref document: A1