CN116761036A - Video encoding method and device, electronic equipment and computer readable storage medium - Google Patents

Video encoding method and device, electronic equipment and computer readable storage medium

Info

Publication number
CN116761036A
Authority
CN
China
Prior art keywords
frame
ith
ith frame
similarity
filling degree
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202311048965.3A
Other languages
Chinese (zh)
Other versions
CN116761036B (en)
Inventor
黄震坤
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Zhongguancun Kejin Technology Co Ltd
Original Assignee
Beijing Zhongguancun Kejin Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Zhongguancun Kejin Technology Co Ltd filed Critical Beijing Zhongguancun Kejin Technology Co Ltd
Priority to CN202311048965.3A priority Critical patent/CN116761036B/en
Publication of CN116761036A publication Critical patent/CN116761036A/en
Application granted granted Critical
Publication of CN116761036B publication Critical patent/CN116761036B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs
    • H04N21/44008Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L65/00Network arrangements, protocols or services for supporting real-time applications in data packet communication
    • H04L65/1066Session management
    • H04L65/1101Session protocols
    • H04L65/1108Web based protocols, e.g. webRTC
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L65/00Network arrangements, protocols or services for supporting real-time applications in data packet communication
    • H04L65/60Network streaming of media packets
    • H04L65/70Media network packetisation
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L65/00Network arrangements, protocols or services for supporting real-time applications in data packet communication
    • H04L65/80Responding to QoS
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/103Selection of coding mode or of prediction mode
    • H04N19/105Selection of the reference unit for prediction within a chosen coding or prediction mode, e.g. adaptive choice of position and number of pixels used for prediction
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/124Quantisation
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/134Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/146Data rate or code amount at the encoder output
    • H04N19/152Data rate or code amount at the encoder output by measuring the fullness of the transmission buffer
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/169Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/17Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
    • H04N19/172Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a picture, frame or field
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/169Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/17Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
    • H04N19/176Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a block, e.g. a macroblock
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs
    • H04N21/4402Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display
    • H04N21/440281Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display by altering the temporal resolution, e.g. by frame skipping

Abstract

The present disclosure provides a video encoding method and apparatus, an electronic device, and a computer-readable storage medium. The method comprises: acquiring the image complexity of the ith frame and the image complexity of the (i-1)th frame; determining the similarity between the ith frame and the (i-1)th frame according to the two image complexities; acquiring the filling degree of a buffer after the (i-1)th frame is placed in the buffer and before the ith frame is placed in the buffer, the filling degree being the ratio of the number of video frames buffered in the buffer to the rated capacity; judging whether the ith frame is to be skipped according to the similarity between the ith frame and the (i-1)th frame, a preset similarity threshold, the filling degree of the buffer and a preset filling degree threshold, and obtaining a first judgment result corresponding to the ith frame; and determining a target coding mode of the ith frame based on the first judgment result, and coding the ith frame according to the target coding mode. Embodiments of the disclosure can reduce frame-skip misjudgment and improve coding efficiency.

Description

Video encoding method and device, electronic equipment and computer readable storage medium
Technical Field
The disclosure relates to the field of information technology, and in particular, to a video encoding method and device, electronic equipment and a computer readable storage medium.
Background
WebRTC (Web Real-Time Communications) is a Real-time communication technology that allows Web applications or sites to establish Peer-to-Peer (Peer-to-Peer) connections between browsers without the aid of intermediaries, enabling the transmission of video and/or audio streams.
WebRTC comprises functions such as video frame acquisition, video encoding and decoding, network transmission, and display, and the efficiency of video encoding and decoding directly affects the quality of a video call. x264 is a commonly used video encoder. Since x264 has no frame-skip mode, when the video content changes drastically the quantization parameter (quantization parameter, QP for short) drops rapidly, causing a sharp drop in video quality, and the coding efficiency is low.
Disclosure of Invention
The disclosure provides a video coding method and device, electronic equipment and a computer readable storage medium, which can ensure video quality and improve coding efficiency.
In a first aspect, the present disclosure provides a video encoding method, the video encoding method comprising:
acquiring the image complexity of an ith frame and the image complexity of an (i-1)th frame, wherein i is an integer greater than 2, the ith frame is the current frame, and the (i-1)th frame is the frame preceding the ith frame;
determining the similarity between the ith frame and the (i-1)th frame according to the image complexity of the ith frame and the image complexity of the (i-1)th frame;
acquiring the filling degree of a buffer after the (i-1)th frame is placed in the buffer and before the ith frame is placed in the buffer, wherein the filling degree is the ratio of the number of video frames buffered in the buffer to the rated capacity;
judging whether the ith frame is to be skipped according to the similarity between the ith frame and the (i-1)th frame, a preset similarity threshold, the filling degree of the buffer and a preset filling degree threshold, and obtaining a first judgment result corresponding to the ith frame;
and determining a target coding mode of the ith frame based on the first judgment result corresponding to the ith frame, and coding the ith frame according to the target coding mode.
In a second aspect, the present disclosure provides a video encoding apparatus comprising:
an acquisition module, configured to acquire the image complexity of an ith frame and the image complexity of an (i-1)th frame, wherein i is an integer greater than 2, the ith frame is the current frame, and the (i-1)th frame is the frame preceding the ith frame;
a determining module, configured to determine the similarity between the ith frame and the (i-1)th frame according to the image complexity of the ith frame and the image complexity of the (i-1)th frame;
the acquisition module being further configured to acquire the filling degree of a buffer after the (i-1)th frame is placed in the buffer and before the ith frame is placed in the buffer, wherein the filling degree is the ratio of the number of video frames buffered in the buffer to the rated capacity;
a judging module, configured to judge whether the ith frame is to be skipped according to the similarity between the ith frame and the (i-1)th frame, a preset similarity threshold, the filling degree of the buffer and a preset filling degree threshold, and to obtain a first judgment result corresponding to the ith frame;
and a coding module, configured to determine a target coding mode of the ith frame based on the first judgment result corresponding to the ith frame, and to code the ith frame according to the target coding mode.
In a third aspect, the present disclosure provides an electronic device comprising: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores one or more computer programs executable by the at least one processor, one or more of the computer programs being executable by the at least one processor to enable the at least one processor to perform the video encoding method described above.
In a fourth aspect, the present disclosure provides a computer readable storage medium having stored thereon a computer program, wherein the computer program, when executed by a processor/processing core, implements the video encoding method described above.
According to the video coding method provided by the embodiments of the disclosure, after the image complexity of the ith frame and the image complexity of the (i-1)th frame are obtained, the similarity between the ith frame and the (i-1)th frame is determined from the two image complexities. The higher the similarity, the closer the ith frame is to the (i-1)th frame and the smaller the influence of skipping the frame on video quality; conversely, the lower the similarity, the larger that influence. Misjudgment of whether to skip the ith frame can therefore be reduced by comparing the similarity between the ith frame and the (i-1)th frame with the similarity threshold, which reduces the impact of frame skipping on video quality. The filling degree of the buffer is obtained after the (i-1)th frame is placed in the buffer and before the ith frame is placed in the buffer; the larger the filling degree, the longer the encoding of buffered frames will take, and the smaller the filling degree, the shorter that time, so the influence of the ith frame on coding efficiency can be judged from the filling degree and the filling degree threshold. The first judgment result, obtained from the similarity between the ith frame and the (i-1)th frame together with the preset similarity threshold, and from the filling degree of the buffer together with the filling degree threshold, avoids misjudging frame skips and the resulting degradation of video quality while taking coding efficiency into account. The target coding mode of the ith frame is then determined based on the first judgment result, so the coding efficiency of the video can be improved without affecting video quality.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to be used to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following specification.
Drawings
The accompanying drawings are included to provide a further understanding of the disclosure, and are incorporated in and constitute a part of this specification, illustrate embodiments of the disclosure and together with the description serve to explain the disclosure, without limitation to the disclosure. The above and other features and advantages will become more readily apparent to those skilled in the art by describing in detail exemplary embodiments with reference to the attached drawings, in which:
fig. 1 schematically illustrates an application scenario diagram of a video encoding method and apparatus provided by an embodiment of the present disclosure;
fig. 2 is a flowchart of a video encoding method according to an embodiment of the present disclosure;
fig. 3 is a block diagram of a video encoding apparatus according to an embodiment of the present disclosure;
fig. 4 is a block diagram of an electronic device according to an embodiment of the present disclosure.
Detailed Description
For a better understanding of the technical solutions of the present disclosure, exemplary embodiments of the present disclosure will be described below with reference to the accompanying drawings, in which various details of the embodiments of the present disclosure are included to facilitate understanding, and they should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
Embodiments of the disclosure and features of embodiments may be combined with each other without conflict.
As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. As used herein, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. Terms such as "connected" are not limited to physical or mechanical connections, but may include electrical connections, whether direct or indirect.
Unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the relevant art and the present disclosure, and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
Currently used video encoders include openh264, vp8, vp9, and av1. openh264 is the default H.264 encoder of WebRTC: when an H.264 code stream is required, WebRTC calls openh264 for compression encoding. openh264 supports dynamic frame-rate and dynamic code-rate changes and therefore has high application value for real-time video. However, openh264 only supports the baseline profile of H.264 and not the other H.264 profiles, and its rate-control algorithm is too simple, so when dynamic frame rate and dynamic code rate are enabled at the same time the rate control becomes inaccurate. For example, the standard test video 720p5994_stockholm_ter.y4m was compressed with openh264; the test parameters and results are as follows:
the encoder parameters include: the target code rate is 2500K, the frame rate is 24, and when the video is compressed to 100 frames, the frame rate is changed to 20, and the code rate is changed to 2000K. The result of the adoption of FFMPEG for push stream is: frame=604, fps=25, q= -1.0, lsize=7456 kb, time=00:00:24.12, bitrate= 2532.2kbits/s speed=0.999 x.
It can be seen that although the target code rate was changed to 2000K while the frame rate was also changed, the openh264 output remained near the original 2500K target, which indicates that the rate control is not effective. Moreover, x264 is currently not optimized for real-time video: because no frame-skip mode is provided for the quantization parameter, when x264 is applied directly to real-time video and the video content changes drastically, the quantization parameter drops rapidly and the image quality decreases.
For this reason, it has been proposed to decide frame skipping according to the filling degree of the code-stream buffer; however, judging frame skips from the buffer filling degree alone is prone to misjudgment and easily causes visible frame problems during scene changes. A fixed frame-skipping technique has also been proposed, in which a frame is skipped after every fixed number of video frames, but its code rate is relatively high, it is only suitable for video with small fluctuations, and it cannot adapt to fluctuations in the video frames. Another proposal decides frame skipping based on the structural similarity (Structural Similarity, SSIM for short) between the current frame and the reconstructed frame; however, because the SSIM calculation is complex it slows down video-frame encoding, and because SSIM is an image-quality metric it does not always correspond positively to image complexity. Therefore, although judging frame skips by SSIM saves code rate compared with a fixed frame rate, the accuracy of the frame-skip judgment is still low and the quality of the video frames is degraded.
Fig. 1 schematically illustrates an application scenario diagram of a video encoding method and apparatus provided by an embodiment of the present disclosure.
As shown in fig. 1, an application scenario of an embodiment of the present disclosure may include a terminal device 101, a server 102, and a network 103. The network 103 is a medium used to provide a communication link between the terminal device 101 and the server 102. The network 103 may include various connection types, such as wired, wireless communication links, or fiber optic cables, among others.
A user may interact with the server 102 via the network 103 using the terminal device 101 to receive or send messages or the like. Various communication client applications, such as shopping class applications, web browser applications, search class applications, instant messaging tools, mailbox clients, social platform software, etc., may be installed on the terminal device 101 (by way of example only).
The terminal device 101 may be a variety of electronic devices having a display screen and supporting web browsing, including but not limited to smartphones, tablets, laptop and desktop computers, and the like.
The server 102 may be a server providing various services, such as a background management server (for example only) providing support for websites browsed by the user using the terminal device 101. The background management server may analyze and process the received data such as the user request, and feed back the processing result (e.g., the web page, information, or data obtained or generated according to the user request) to the terminal device.
It should be noted that the video encoding method and apparatus provided in the embodiments of the present disclosure may be performed by the server 102. Accordingly, the video encoding apparatus provided by the embodiments of the present disclosure may be disposed in the server 102. The video encoding method provided by the embodiments of the present disclosure may also be performed by a server or a server cluster that is different from the server 102 and is capable of communicating with the terminal device 101 and/or the server 102. Accordingly, the video encoding apparatus provided by the embodiments of the present disclosure may also be provided in a server or a server cluster that is different from the server 102 and is capable of communicating with the terminal device 101 and/or the server 102.
It should be understood that the number of terminal devices, networks and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
Fig. 2 is a flowchart of a video encoding method according to an embodiment of the present disclosure. Referring to fig. 2, a video encoding method provided in an embodiment of the present disclosure includes:
step S201, the image complexity of the ith frame and the image complexity of the i-1 th frame are acquired.
Wherein i is an integer greater than 2, the i-th frame is the current frame, and the i-1-th frame is the frame preceding the i-th frame.
In the embodiment of the present disclosure, the i-th frame refers to the current video frame, and the i-1-th frame refers to the previous video frame of the i-th frame. The image complexity of the current video frame is compared with that of the previous video frame, and the similarity degree of the image complexity of the ith video frame and the i-1 video frame can be determined.
In some embodiments, the image complexity of each of the i-th frame and the i-1 th frame is derived based on a power function determined by a compression control parameter, a number of macroblocks, and a first constant and a second constant, wherein the compression control parameter is a parameter of the compressed video frame, the number of macroblocks is a number of macroblocks when the video frame is encoded, and the first constant and the second constant are preset constants.
The embodiment of the disclosure does not limit the calculation mode of the image complexity. For x264 source codes, the image complexity of the current video frame can be calculated by equation (1).
cplxr_sum = 0.01 × pow(700000, rc->qcompress) × pow(h->mb.i_mb_count, 0.5)    (1)
In formula (1), cplxr_sum represents the image complexity, pow() is a power function, 700000 and 0.5 are the first constant and the second constant respectively, rc->qcompress represents the compression control parameter, and h->mb.i_mb_count represents the number of macroblocks. pow(700000, rc->qcompress) is a power function with 700000 as the base and rc->qcompress as the exponent, and pow(h->mb.i_mb_count, 0.5) is a power function with h->mb.i_mb_count as the base and 0.5 as the exponent.
The update can be performed using equation (2) after obtaining the image complexity of the current video frame.
cplxr_sum += bits × qp2qscale(rc->qpa_rc) / rc->last_rceq    (2)
In formula (2), cplxr_sum represents the image complexity, += is the compound addition-assignment operator, bits represents the number of encoded bits (the code stream), qp2qscale() is the function converting qp to qscale, where qp is the quantization parameter and qscale is the quantization scale, rc->qpa_rc represents the quantization step of qp, and rc->last_rceq represents the value carried over from the previous video frame.
In some embodiments, the image complexity of the video frame may also be calculated using equation (3).
cplxr_sum *= rc->cbr_decay    (3)
In formula (3), cplxr_sum represents the image complexity, rc->cbr_decay represents a decay parameter, and *= is the compound multiplication-assignment operator.
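As a reference only, the following C sketch shows roughly how equations (1) to (3) might fit together per frame; the struct and function names are illustrative assumptions rather than the actual x264 source, and only the quoted fields (qcompress, mb_count, bits, qscale, last_rceq, cbr_decay) come from the formulas above.

#include <math.h>

/* Illustrative container for the quantities used in equations (1)-(3). */
typedef struct {
    double qcompress;   /* compression control parameter (rc->qcompress) */
    int    mb_count;    /* number of macroblocks (h->mb.i_mb_count) */
    double last_rceq;   /* value carried over from the previous frame (rc->last_rceq) */
    double cbr_decay;   /* decay parameter (rc->cbr_decay) */
    double cplxr_sum;   /* running image-complexity estimate */
} complexity_tracker;

/* Equation (1): initial complexity from the compression parameter and macroblock count. */
static void complexity_init(complexity_tracker *t)
{
    t->cplxr_sum = 0.01 * pow(700000.0, t->qcompress) * pow((double)t->mb_count, 0.5);
}

/* Equations (2) and (3): add the contribution of the coded frame, then decay. */
static void complexity_update(complexity_tracker *t, double bits, double qscale)
{
    t->cplxr_sum += bits * qscale / t->last_rceq;   /* eq. (2); qscale = qp2qscale(rc->qpa_rc) */
    t->cplxr_sum *= t->cbr_decay;                   /* eq. (3) */
}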
Step S202, determining the similarity of the ith frame and the ith-1 frame according to the image complexity of the ith frame and the image complexity of the ith-1 frame.
The similarity between the ith frame and the (i-1)th frame is determined by calculating the difference between the image complexity of the ith frame and that of the (i-1)th frame. The higher the similarity, the smaller the change between the ith frame and the (i-1)th frame and the smaller the influence of skipping the frame on video quality; the lower the similarity, the larger the change between the two frames and the larger the influence of skipping the frame on video quality.
In the embodiments of the disclosure, the similarity threshold can be compared with the similarity between the ith frame and the (i-1)th frame to judge whether skipping the ith frame would affect video quality. When the similarity between the ith frame and the (i-1)th frame is greater than or equal to the similarity threshold, the two frames are highly similar and skipping the ith frame will not affect video quality. When the similarity is smaller than the similarity threshold, the two frames are less similar and skipping the ith frame would affect video quality.
The embodiments of the disclosure do not limit the specific value of the similarity threshold. For example, the similarity threshold may be 0.8. When the similarity between the ith frame and the (i-1)th frame is greater than 0.8, the two frames are highly similar and skipping the ith frame does not affect video quality; when the similarity is smaller than 0.8, the two frames are less similar and skipping the ith frame would affect video quality.
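The patent does not give the exact formula for turning the two complexity values into a similarity, so the normalised-difference form in the C sketch below is only an assumption used for illustration; with a threshold of 0.8, a returned value of 0.8 or more would mean the two frames are close enough for a frame skip to be acceptable.

#include <math.h>

/* Illustrative similarity in [0, 1]: the closer the two complexities, the closer to 1. */
static double frame_similarity(double cplx_cur, double cplx_prev)
{
    double diff = fabs(cplx_cur - cplx_prev);
    double base = fmax(cplx_cur, cplx_prev);
    if (base <= 0.0)
        return 1.0;               /* degenerate case: treat equal (or empty) complexities as identical */
    return 1.0 - diff / base;     /* smaller difference -> similarity closer to 1 */
}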
Step S203, the filling degree of the buffer after the (i-1)th frame is placed in the buffer and before the ith frame is placed in the buffer is acquired.
The filling degree is the ratio of the number of video frames buffered in the buffer to the rated capacity. For example, if the rated capacity of the buffer is 1M and the buffered video frames amount to 900K, the filling degree of the buffer is 90%. The higher the filling degree, the slower the encoding; the lower the filling degree, the faster the encoding.
When the ith frame is received it has not yet been placed in the buffer, so the video frames buffered at that moment are the (i-1)th frame and the frames before it, and the filling degree of the buffer at that moment best reflects the available encoding capacity for the ith frame.
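A minimal C sketch of the filling degree calculation is given below; the field names are assumptions.

/* Fullness measured after the (i-1)th frame has been buffered and before the ith frame is placed in the buffer. */
typedef struct {
    double buffered;    /* amount of buffered video data, e.g. 900K */
    double capacity;    /* rated capacity of the buffer, e.g. 1M    */
} stream_buffer;

static double buffer_fullness(const stream_buffer *b)
{
    return b->buffered / b->capacity;   /* 900K / 1M = 0.9, i.e. 90% full */
}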
Step S204, whether the ith frame is to be skipped is judged according to the similarity between the ith frame and the (i-1)th frame, a preset similarity threshold, the filling degree of the buffer and a preset filling degree threshold, and a first judgment result corresponding to the ith frame is obtained.
The similarity between the ith frame and the (i-1)th frame and the similarity threshold are used to judge, from the perspective of frame similarity, whether the ith frame needs to be skipped, which reduces the influence of frame skipping on video quality; the filling degree of the buffer and the filling degree threshold are used to judge the encoding capacity, which reduces the influence of not skipping on coding efficiency. The first judgment result obtained from both the similarity and the filling degree therefore not only reduces misjudgment but also improves coding efficiency.
In the embodiments of the disclosure, the similarity threshold and the filling degree threshold may be set by the user. For example, the similarity threshold may be set to 0.8 or 0.9, and the filling degree threshold to 0.9, 0.8 or 0.7.
In some embodiments, the fullness threshold comprises a first fullness threshold and a second fullness threshold, the first fullness threshold being greater than the second fullness threshold.
In some embodiments, step S204 of judging whether the ith frame is to be skipped according to the similarity between the ith frame and the (i-1)th frame, the preset similarity threshold, the filling degree of the buffer and the preset filling degree threshold, and obtaining the first judgment result corresponding to the ith frame, includes: when the similarity between the ith frame and the (i-1)th frame is smaller than or equal to the similarity threshold, the first judgment result corresponding to the ith frame is non-skip; when the similarity between the ith frame and the (i-1)th frame is greater than the similarity threshold and the filling degree is greater than or equal to the first filling degree threshold, the first judgment result corresponding to the ith frame is skip; and when the similarity between the ith frame and the (i-1)th frame is greater than the similarity threshold and the filling degree is smaller than the first filling degree threshold and greater than the second filling degree threshold, the first judgment result corresponding to the ith frame is non-skip.
The following takes a similarity threshold of 0.8, a first filling degree threshold of 0.9, and a second filling degree threshold of 0.7 as an example. When the similarity between the ith frame and the (i-1)th frame is smaller than or equal to 0.8, the two frames differ considerably and skipping would affect video quality, so the first judgment result corresponding to the ith frame is non-skip. When the similarity is greater than 0.8 and the filling degree is greater than or equal to 0.9, the two frames are highly similar, skipping has little effect on video quality, and the high filling degree means the encoding capacity is limited, so the first judgment result corresponding to the ith frame is skip. When the similarity is greater than 0.8 and the filling degree is between 0.7 and 0.9, the two frames are highly similar and skipping would have little effect on video quality, but because the filling degree is relatively low there is still some encoding capacity, so the first judgment result corresponding to the ith frame is non-skip.
In some embodiments, to avoid skipping several consecutive video frames, which could affect video quality, the embodiments of the disclosure may also take into account, when making the frame-skip judgment, the actual frame-skip situation of the preset number of video frames immediately preceding the ith frame.
Accordingly, before judging whether the ith frame is to be skipped according to the similarity between the ith frame and the (i-1)th frame, the preset similarity threshold, the filling degree of the buffer and the preset filling degree threshold, and obtaining the first judgment result corresponding to the ith frame, the method further includes: acquiring the actual frame-skip situation of the preset number of video frames immediately preceding the ith frame.
The preset number may be set by the user, for example, 3 or other values. The actual frame skip condition of the previous preset number of video frames adjacent to the i-th frame may be all frame skip or partial frame skip.
When the similarity of the ith frame and the (i-1) th frame is larger than a similarity threshold, the filling degree is smaller than a first filling degree threshold and larger than a second filling degree threshold, and the previous preset number of video frames adjacent to the ith frame are all frame skipping, the first judgment result corresponding to the ith frame is non-frame skipping; when the similarity between the ith frame and the i-1 th frame is larger than a similarity threshold, the filling degree is smaller than a first filling degree threshold and larger than a second filling degree threshold, and the previous preset number of video frames adjacent to the ith frame comprise non-skip frames, the first judgment result corresponding to the ith frame is skip frames.
For example, when the preset number is 3, it is determined whether the ith frame is jumped or not according to the situation of the previous 3 frames adjacent to the ith frame. If the similarity between the ith frame and the i-1 th frame is greater than 0.8, the filling degree is less than 0.9 and greater than 0.7, if the first 3 video frames adjacent to the ith frame are all skipped frames, the first judgment result corresponding to the ith frame is non-skipped frames, i.e. 4 video frames cannot be skipped frames continuously. When the similarity between the ith frame and the i-1 th frame is greater than 0.8, and the filling degree is less than 0.9 and greater than 0.7, if the first 3 video frames adjacent to the ith frame include non-skip frames, the first judgment result corresponding to the ith frame is skip frames. In other words, if two frames or one frame of the first 3 video frames adjacent to the ith frame are skipped frames, the determination result of the ith frame is not affected. If the first 3 adjacent video frames of the ith frame are all skip frames, the judging result of the ith frame is affected.
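Putting the threshold rules and the preceding-frames check together, the first judgment might look like the C sketch below; the threshold values and the window of 3 frames follow the examples above, all names are illustrative, and the handling of a filling degree at or below the second threshold (treated here as non-skip, since encoding capacity is sufficient) is an assumption.

#include <stdbool.h>

#define SIM_THRESHOLD   0.8   /* similarity threshold            */
#define FULLNESS_HIGH   0.9   /* first filling degree threshold  */
#define FULLNESS_LOW    0.7   /* second filling degree threshold */
#define PRESET_NUMBER   3     /* preceding frames considered     */

/* Returns true when the first judgment result for the ith frame is "skip". */
static bool first_skip_judgment(double similarity, double fullness,
                                const bool prev_skipped[PRESET_NUMBER])
{
    if (similarity <= SIM_THRESHOLD)
        return false;                    /* frames differ too much: do not skip */

    if (fullness >= FULLNESS_HIGH)
        return true;                     /* similar frames and a nearly full buffer: skip */

    if (fullness > FULLNESS_LOW) {       /* similar frames, moderate fullness */
        for (int k = 0; k < PRESET_NUMBER; k++)
            if (!prev_skipped[k])
                return true;             /* at least one of the last 3 frames was coded: skip */
        return false;                    /* the last 3 frames were all skipped: code this one */
    }

    return false;                        /* low fullness (assumed case): capacity suffices, do not skip */
}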
Step S205, determining the target coding mode of the ith frame based on the first judging result corresponding to the ith frame, and coding the ith frame according to the target coding mode.
When the first judgment result indicates that the ith frame is not to be skipped, the ith frame is encoded; when the first judgment result indicates that the ith frame is to be skipped, the ith frame is not encoded.
When the i frame is encoded, the i frame may be directly encoded, and the specific encoding mode in the embodiment of the disclosure is not limited, and any encoding mode may be used.
In an embodiment of the present disclosure, an encoding process is performed on an i-th frame based on a quantization parameter, and the encoding process includes: acquiring quantization parameters of an ith frame and quantization parameters of an i-1 th frame; calculating the quantization parameter of the ith frame and the variation of the quantization parameter of the (i-1) th frame; under the condition that the variable quantity of the quantization parameter of the ith frame and the quantization parameter of the i-1 th frame is smaller than or equal to a preset change threshold value, encoding the ith frame according to the quantization parameter of the ith frame; under the condition that the variation of the quantization parameter of the ith frame and the quantization parameter of the ith-1 frame is larger than a preset variation threshold, calculating the quantization parameter of the ith frame according to the quantization parameter of the ith-1 frame and the variation threshold, obtaining a planned quantization parameter of the ith frame, and carrying out coding processing on the ith frame according to the planned quantization parameter.
The quantization parameter is a parameter used to compress video frames during encoding. The larger the quantization parameter, the greater the coding loss and the more easily the encoded image is distorted; the smaller the quantization parameter, the more image detail is preserved. The quantization parameter may be calculated as FQ = round(y / Qstep), where y represents the sample value to be coded, Qstep is the quantization step size, and FQ represents the quantization parameter.
When the change between the quantization parameter of the ith frame and that of the (i-1)th frame is smaller than or equal to the change threshold, the quantization parameter has not varied much, and the ith frame is encoded with its own quantization parameter. When the change is greater than the change threshold, the quantization parameter has varied considerably, so the change is limited to within the change threshold: the change threshold is added to (or subtracted from) the quantization parameter of the (i-1)th frame to obtain the planned quantization parameter, and the ith frame is then encoded with the planned quantization parameter. This prevents a large quantization-parameter change between two consecutive frames from causing an obvious change in video quality and degrading the viewing experience.
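A C sketch of this limiting step is shown below; the change threshold value and the helper name are assumptions.

#include <stdlib.h>

/* Returns the quantization parameter actually used for the ith frame. */
static int plan_frame_qp(int qp_cur, int qp_prev, int change_threshold)
{
    int delta = qp_cur - qp_prev;

    if (abs(delta) <= change_threshold)
        return qp_cur;                           /* small change: encode with the frame's own QP */

    /* large change: keep the step within the change threshold of the previous frame's QP */
    return (delta > 0) ? qp_prev + change_threshold
                       : qp_prev - change_threshold;
}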
In some embodiments, to avoid erroneous determination of whether the i frame is skipped based on the similarity and the filling degree, the embodiments of the present disclosure may further determine whether the i frame is skipped by combining a random number, that is, comprehensively determining whether the i frame is skipped by the similarity, the filling degree, and the random number.
Before determining the target coding mode of the ith frame based on the first judgment result corresponding to the ith frame, the method further comprises: and obtaining a random number, wherein each video frame is configured with a random number, the random number comprises a frame hopping number and at least one non-frame hopping number, the frame hopping number is a frame hopping numerical value, and the non-frame hopping number is a frame non-frame hopping numerical value.
In the embodiments of the disclosure, the range of the random number may be set arbitrarily. For example, the random numbers may be 0, 1, 2, 3, and 4, where 0 is the skip number and 1, 2, 3, and 4 are non-skip numbers. That is, a video frame assigned the value 0 is skipped regardless of the judgment result based on the similarity and the filling degree, while for video frames assigned 1, 2, 3, or 4 the decision follows the judgment result based on the similarity and the filling degree.
Determining a target coding mode of the ith frame based on a first judging result corresponding to the ith frame, including: and determining a target coding mode of the ith frame based on the first judging result corresponding to the ith frame and the random number.
In the embodiment of the disclosure, a random number is configured for each video frame, and when the first determination result is non-frame skip, but the random number is 0, the ith frame skips. When the first judgment result is non-frame skip, and the random number is 1, the ith frame does not skip. When the first judgment result is frame skip and the random number is 0, the ith frame is frame skip. When the first judgment result is frame skip and the random number is 1, the ith frame is frame skip.
In some embodiments, determining the target coding mode of the ith frame based on the first determination result and the random number corresponding to the ith frame includes:
under the condition that the random number is the frame skip number, the target coding mode of the ith frame is to carry out frame skip processing on the ith frame; when the random number is a non-skip frame number and the first judgment result corresponding to the ith frame is a non-skip frame, the target coding mode of the ith frame is to code the ith frame; when the random number is a non-skip frame number and the first judgment result corresponding to the ith frame is a skip frame, the target coding mode of the ith frame is to perform frame skip processing on the ith frame.
For example, when the first determination result is non-frame skip, but the random number is 0, the ith frame is frame skip processed. When the first judgment result is non-frame skip, and the random number is 1, the ith frame is processed without frame skip. When the first judgment result is frame skip and the random number is 0, the ith frame is subjected to frame skip processing. When the first judgment result is frame skip and the random number is 1, the ith frame is subjected to frame skip processing.
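The combination of the first judgment result with the per-frame random number could be sketched in C as follows, using the example range 0 to 4; all names are illustrative, and rand() merely stands in for however the random number is actually assigned to each video frame.

#include <stdbool.h>
#include <stdlib.h>

typedef enum { ENCODE_FRAME, SKIP_FRAME } frame_action;

static frame_action decide_action(bool first_judgment_is_skip)
{
    int r = rand() % 5;                 /* random number in {0, 1, 2, 3, 4} */

    if (r == 0)
        return SKIP_FRAME;              /* 0 is the skip number: skip regardless of the first judgment */

    return first_judgment_is_skip ? SKIP_FRAME : ENCODE_FRAME;   /* 1-4: follow the first judgment */
}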
It should be noted that, after the similarity, the filling degree and the random number together determine that the ith frame does not need to be skipped, the ith frame is encoded in the same manner as when the decision is made based on the similarity and the filling degree alone, and this is not repeated here.
It will be appreciated that the above-mentioned method embodiments of the present disclosure may be combined with one another to form combined embodiments without departing from their principles and logic; for brevity, such combinations are not described in detail here. It will also be appreciated by those skilled in the art that, in the above methods, the specific order of execution of the steps should be determined by their function and possible inherent logic.
The video encoding method provided by the embodiments of the disclosure is compared with a general SSIM-based method. The first test video is 720p5994_stockholm_ter.y4m, with a target code rate of 2000K and a frame rate of 30 frames; from the 101st frame, the code rate is changed to 1500K and the frame rate to 20 frames.
SSIM method:
x264 [info]: PSNR Mean Y:29.953 U:37.467 V:37.358 Avg:31.337 Global:31.235 kb/s:1560.53
encoded 598 frames, 375.16 fps
the video coding method provided by the embodiment of the disclosure comprises the following steps:
x264 [info]: PSNR Mean Y:29.914 U:37.436 V:37.322 Avg:31.300 Global:31.185 kb/s:1554.36
encoded 597 frames, 598.80 fps
the second test video was stat2_1080p25.y4m, the target bitrate was 2000K, the frame rate was 30 frames, the bitrate was changed from frame 101 to 1500K, and the frame rate was 20 frames.
SSIM method:
x264 [info]: PSNR Mean Y:34.138 U:41.346 V:41.393 Avg:35.505 Global:35.452 kb/s:1631.76
encoded 311 frames, 157.39 fps
The video coding method provided by the embodiment of the disclosure comprises the following steps:
x264 [info]: PSNR Mean Y:34.160 U:41.385 V:41.435 Avg:35.528 Global:35.477 kb/s:1643.79
encoded 301 frames, 212.42 fps
The experimental results show that, even after the code rate is reduced to 1500K, the video encoding method provided by the embodiments of the disclosure can still skip frames in the video, encodes faster than the SSIM-based method, and yields an essentially identical peak signal-to-noise ratio (Peak Signal to Noise Ratio, PSNR for short).
According to the video coding method provided by the embodiments of the disclosure, after the image complexity of the ith frame and the image complexity of the (i-1)th frame are obtained, the similarity between the ith frame and the (i-1)th frame is determined from the two image complexities. The higher the similarity, the closer the ith frame is to the (i-1)th frame and the smaller the influence of skipping the frame on video quality; conversely, the lower the similarity, the larger that influence. Misjudgment of whether to skip the ith frame can therefore be reduced by comparing the similarity between the ith frame and the (i-1)th frame with the similarity threshold, which reduces the impact of frame skipping on video quality. The filling degree of the buffer is obtained after the (i-1)th frame is placed in the buffer and before the ith frame is placed in the buffer; the larger the filling degree, the longer the encoding of buffered frames will take, and the smaller the filling degree, the shorter that time, so the influence of the ith frame on coding efficiency can be judged from the filling degree and the filling degree threshold. The first judgment result, obtained from the similarity between the ith frame and the (i-1)th frame together with the preset similarity threshold, and from the filling degree of the buffer together with the filling degree threshold, avoids misjudging frame skips and the resulting degradation of video quality while taking coding efficiency into account. The target coding mode of the ith frame determined on the basis of this first judgment result therefore improves the coding efficiency of the video without affecting video quality.
Fig. 3 is a block diagram of a video encoding apparatus according to an embodiment of the present disclosure.
Referring to fig. 3, an embodiment of the present disclosure provides a video encoding apparatus 300 including:
an acquisition module 301, configured to acquire an image complexity of an i-th frame and an image complexity of an i-1 th frame, where i is an integer greater than 2, the i-th frame is a current frame, and the i-1 th frame is a frame preceding the i-th frame.
A determining module 302, configured to determine the similarity between the ith frame and the i-1 th frame according to the image complexity of the ith frame and the image complexity of the i-1 th frame.
The obtaining module 301 is further configured to acquire the filling degree of the buffer after the (i-1)th frame is placed in the buffer and before the ith frame is placed in the buffer, where the filling degree is the ratio of the number of video frames buffered in the buffer to the rated capacity.
The judging module 303 is configured to judge whether the ith frame is to be skipped according to the similarity between the ith frame and the (i-1)th frame, a preset similarity threshold, the filling degree of the buffer and a preset filling degree threshold, and to obtain a first judgment result corresponding to the ith frame.
The encoding module 304 is configured to determine a target encoding mode of the ith frame based on the first judgment result corresponding to the ith frame, and to encode the ith frame according to the target encoding mode.
In some embodiments, the fullness threshold comprises a first fullness threshold and a second fullness threshold, the first fullness threshold being greater than the second fullness threshold.
In some embodiments, the judging module 303 is further configured to, if the similarity between the ith frame and the (i-1)th frame is less than or equal to the similarity threshold, determine that the first judgment result corresponding to the ith frame is a non-skipped frame;
when the similarity of the ith frame and the i-1 th frame is larger than a similarity threshold value and the filling degree is larger than or equal to a first filling degree threshold value, the first judgment result corresponding to the ith frame is frame skipping;
when the similarity of the ith frame and the i-1 th frame is larger than a similarity threshold, and the filling degree is smaller than a first filling degree threshold and larger than a second filling degree threshold, the first judgment result corresponding to the ith frame is a non-skip frame;
in some embodiments, the obtaining module 301 is further configured to obtain an actual frame skip situation of a previous preset number of video frames adjacent to the ith frame.
The judging module 303 is further configured to, when the similarity between the ith frame and the (i-1)th frame is greater than the similarity threshold, the filling degree is less than the first filling degree threshold and greater than the second filling degree threshold, and the preset number of video frames immediately preceding the ith frame have all been skipped, determine that the first judgment result corresponding to the ith frame is a non-skipped frame;
When the similarity between the ith frame and the i-1 th frame is larger than a similarity threshold, the filling degree is smaller than a first filling degree threshold and larger than a second filling degree threshold, and the previous preset number of video frames adjacent to the ith frame comprise non-skip frames, the first judgment result corresponding to the ith frame is skip frames.
In some embodiments, the obtaining module 301 is further configured to obtain a random number, where each video frame is configured with a random number, and the random number includes a skip frame number and at least one non-skip frame number, where the skip frame number is a value for performing a skip frame, and the non-skip frame number is a value for not performing a skip frame.
The encoding module 304 is further configured to determine a target encoding mode of the ith frame based on the first determination result and the random number corresponding to the ith frame.
In some embodiments, the encoding module 304 is further configured to, if the random number is a frame skip number, perform frame skip processing on the i frame in a target encoding manner of the i frame;
when the random number is a non-skip frame number and the first judgment result corresponding to the ith frame is a non-skip frame, the target coding mode of the ith frame is to code the ith frame;
when the random number is a non-skip frame number and the first judgment result corresponding to the ith frame is a skip frame, the target coding mode of the ith frame is to perform frame skip processing on the ith frame.
The encoding module 304 is further configured to obtain a quantization parameter of the i-th frame and a quantization parameter of the i-1 th frame;
calculating the quantization parameter of the ith frame and the variation of the quantization parameter of the (i-1) th frame;
under the condition that the variable quantity of the quantization parameter of the ith frame and the quantization parameter of the i-1 th frame is smaller than or equal to a preset change threshold value, encoding the ith frame according to the quantization parameter of the ith frame;
under the condition that the variation of the quantization parameter of the ith frame and the quantization parameter of the ith-1 frame is larger than a preset variation threshold, calculating the quantization parameter of the ith frame according to the quantization parameter of the ith-1 frame and the variation threshold, obtaining a planned quantization parameter of the ith frame, and carrying out coding processing on the ith frame according to the planned quantization parameter.
In some embodiments, the image complexity of each of the ith frame and the (i-1)th frame is derived from a power function determined by a compression control parameter, a number of macroblocks, a first constant and a second constant, where the compression control parameter is a parameter of the compressed video frame, the number of macroblocks is the number of macroblocks used when encoding the video frame, and the first constant and the second constant are preset constants.
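Because the exact power function is not spelled out in this passage, the form below is only an assumed example built from the ingredients it names: the compression control parameter, the macroblock count, and two preset constants.

def image_complexity(num_macroblocks: int,
                     compression_control: float,
                     c1: float = 1.0,
                     c2: float = 0.0) -> float:
    # Hypothetical form: complexity grows as a power of the macroblock count,
    # with the compression control parameter as the exponent, scaled by the
    # first constant c1 and offset by the second constant c2. The real
    # patented function may differ.
    return c1 * (num_macroblocks ** compression_control) + c2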
All or some of the modules in the video encoding apparatus described above may be implemented in software, hardware, or a combination thereof. Each of the above modules may be embedded in, or independent of, a processor in the computer device in the form of hardware, or may be stored in a memory in the computer device in the form of software, so that the processor can invoke and execute the operations corresponding to the above modules.
According to the video encoding apparatus provided by the embodiments of the present disclosure, the acquisition module acquires the image complexity of the ith frame and the image complexity of the (i-1)th frame, and the determination module determines the similarity between the ith frame and the (i-1)th frame from these two image complexities. The higher the similarity, the closer the ith frame is to the (i-1)th frame and the smaller the impact of skipping the frame on video quality; conversely, the lower the similarity, the larger the impact of skipping the frame. Comparing the similarity between the ith frame and the (i-1)th frame with the similarity threshold therefore reduces erroneous frame-skip decisions for the ith frame and limits the impact of frame skipping on video quality. The acquisition module also acquires the filling degree of the buffer after the (i-1)th frame has been placed in the buffer and before the ith frame is placed in the buffer; the larger the filling degree, the longer the encoding processing takes, and the smaller the filling degree, the shorter it takes, so the impact of the ith frame on encoding efficiency can be judged from the filling degree and the filling degree threshold. Based on the similarity between the ith frame and the (i-1)th frame, the preset similarity threshold, and the filling degree threshold of the buffer, the acquisition module obtains the first judgment result, which avoids erroneous frame skipping and the accompanying degradation of video quality while still taking encoding efficiency into account. The encoding module then determines the target encoding mode of the ith frame based on the first judgment result corresponding to the ith frame and encodes the ith frame according to that mode, which improves video encoding efficiency without affecting video quality.
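The following rough sketch ties the earlier snippets together for a single frame; it reuses first_judgment, target_coding_mode and smoothed_qp from above. The similarity measure (the ratio of the smaller image complexity to the larger) and the default threshold values are assumptions for illustration, since the text only states that similarity is determined from the two image complexities.

def decide_and_encode(complexity_i: float,
                      complexity_prev: float,
                      frames_in_buffer: int,
                      rated_capacity: int,
                      recent_skips: list,
                      qp_curr: float,
                      qp_prev: float,
                      sim_threshold: float = 0.9,
                      fullness_hi: float = 0.8,
                      fullness_lo: float = 0.3,
                      change_threshold: float = 4.0) -> str:
    # Similarity from the two image complexities (assumed ratio form).
    similarity = min(complexity_i, complexity_prev) / max(complexity_i, complexity_prev)
    # Filling degree: ratio of buffered video frames to the rated capacity.
    fullness = frames_in_buffer / rated_capacity

    skip = first_judgment(similarity, fullness, sim_threshold,
                          fullness_hi, fullness_lo, recent_skips)
    mode = target_coding_mode(skip)
    if mode == "encode":
        qp = smoothed_qp(qp_curr, qp_prev, change_threshold)
        # ... hand the frame and qp to the actual encoder here ...
    return mode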
Fig. 4 is a block diagram of an electronic device according to an embodiment of the present disclosure.
Referring to fig. 4, an embodiment of the present disclosure provides an electronic device including: at least one processor 401; at least one memory 402, and one or more I/O interfaces 403, connected between the processor 401 and the memory 402; wherein the memory 402 stores one or more computer programs executable by the at least one processor 401, the one or more computer programs being executable by the at least one processor 401 to enable the at least one processor 401 to perform the video encoding method described above.
In some embodiments, the filling degree threshold comprises a first filling degree threshold and a second filling degree threshold, the first filling degree threshold being greater than the second filling degree threshold.
In some embodiments, the processor 401 is further configured such that, when the similarity between the ith frame and the (i-1)th frame is less than or equal to the similarity threshold, the first judgment result corresponding to the ith frame is a non-skipped frame;
when the similarity between the ith frame and the (i-1)th frame is greater than the similarity threshold and the filling degree is greater than or equal to the first filling degree threshold, the first judgment result corresponding to the ith frame is a skipped frame;
when the similarity between the ith frame and the (i-1)th frame is greater than the similarity threshold, and the filling degree is less than the first filling degree threshold and greater than the second filling degree threshold, the first judgment result corresponding to the ith frame is a non-skipped frame;
In some embodiments, the processor 401 is further configured to obtain the actual frame skipping status of a preset number of video frames immediately preceding the ith frame; when the similarity between the ith frame and the (i-1)th frame is greater than the similarity threshold, the filling degree is less than the first filling degree threshold and greater than the second filling degree threshold, and the preset number of video frames immediately preceding the ith frame were all skipped, the first judgment result corresponding to the ith frame is a non-skipped frame;
when the similarity between the ith frame and the (i-1)th frame is greater than the similarity threshold, the filling degree is less than the first filling degree threshold and greater than the second filling degree threshold, and the preset number of video frames immediately preceding the ith frame include at least one non-skipped frame, the first judgment result corresponding to the ith frame is a skipped frame.
In some embodiments, the processor 401 is further configured to obtain a random number, where each video frame is assigned one random number, the possible values of the random number include a frame-skip number and at least one non-frame-skip number, the frame-skip number being a value indicating that frame skipping is performed and a non-frame-skip number being a value indicating that frame skipping is not performed, and to determine the target encoding mode of the ith frame based on the first judgment result corresponding to the ith frame and the random number.
In some embodiments, the processor 401 is further configured to, when the random number is the frame-skip number, set the target encoding mode of the ith frame to performing frame skip processing on the ith frame; when the random number is a non-frame-skip number and the first judgment result corresponding to the ith frame is a non-skipped frame, the target encoding mode of the ith frame is to encode the ith frame; when the random number is a non-frame-skip number and the first judgment result corresponding to the ith frame is a skipped frame, the target encoding mode of the ith frame is to perform frame skip processing on the ith frame.
The processor 401 is further configured to obtain the quantization parameter of the ith frame and the quantization parameter of the (i-1)th frame; calculate the variation between the quantization parameter of the ith frame and the quantization parameter of the (i-1)th frame; encode the ith frame according to the quantization parameter of the ith frame when the variation is less than or equal to a preset change threshold; and, when the variation is greater than the preset change threshold, calculate a quantization parameter for the ith frame from the quantization parameter of the (i-1)th frame and the change threshold to obtain a planned quantization parameter of the ith frame, and encode the ith frame according to the planned quantization parameter.
In some embodiments, the image complexity of each of the ith frame and the (i-1)th frame is derived from a power function determined by a compression control parameter, a number of macroblocks, a first constant and a second constant, where the compression control parameter is a parameter of the compressed video frame, the number of macroblocks is the number of macroblocks used when encoding the video frame, and the first constant and the second constant are preset constants.
All or some of the modules in the electronic device described above may be implemented in software, hardware, or a combination thereof. Each of the above modules may be embedded in, or independent of, a processor in the computer device in the form of hardware, or may be stored in a memory in the computer device in the form of software, so that the processor can invoke and execute the operations corresponding to the above modules.
The disclosed embodiments also provide a computer readable storage medium having a computer program stored thereon, wherein the computer program when executed by a processor/processing core implements the video encoding method described above. The computer readable storage medium may be a volatile or nonvolatile computer readable storage medium.
The disclosed embodiments also provide a computer program product comprising computer readable code, or a non-transitory computer readable storage medium carrying computer readable code, which when executed in a processor of an electronic device, performs the video encoding method described above.
Those of ordinary skill in the art will appreciate that all or some of the steps, systems, functional modules/units in the apparatus, and methods disclosed above may be implemented as software, firmware, hardware, and suitable combinations thereof. In a hardware implementation, the division between the functional modules/units mentioned in the above description does not necessarily correspond to the division of physical components; for example, one physical component may have multiple functions, or one function or step may be performed cooperatively by several physical components. Some or all of the physical components may be implemented as software executed by a processor, such as a central processing unit, digital signal processor, or microprocessor, or as hardware, or as an integrated circuit, such as an application specific integrated circuit. Such software may be distributed on computer-readable storage media, which may include computer storage media (or non-transitory media) and communication media (or transitory media).
The term computer storage media includes both volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable program instructions, data structures, program modules or other data, as known to those skilled in the art. Computer storage media includes, but is not limited to, Random Access Memory (RAM), Read Only Memory (ROM), Erasable Programmable Read Only Memory (EPROM), Static Random Access Memory (SRAM), flash memory or other memory technology, portable Compact Disc Read Only Memory (CD-ROM), Digital Versatile Discs (DVD) or other optical disc storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by a computer. Furthermore, as is well known to those of ordinary skill in the art, communication media typically embodies computer readable program instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and may include any information delivery media.
The computer readable program instructions described herein may be downloaded from a computer readable storage medium to a respective computing/processing device or to an external computer or external storage device over a network, such as the internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, fiber optic transmissions, wireless transmissions, routers, firewalls, switches, gateway computers and/or edge servers. The network interface card or network interface in each computing/processing device receives computer readable program instructions from the network and forwards the computer readable program instructions for storage in a computer readable storage medium in the respective computing/processing device.
Computer program instructions for performing the operations of the present disclosure can be assembly instructions, Instruction Set Architecture (ISA) instructions, machine-related instructions, microcode, firmware instructions, state setting data, or source or object code written in any combination of one or more programming languages, including an object oriented programming language such as Smalltalk, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The computer readable program instructions may be executed entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, aspects of the present disclosure are implemented by personalizing electronic circuitry, such as programmable logic circuitry, Field Programmable Gate Arrays (FPGAs), or Programmable Logic Arrays (PLAs), with state information of computer readable program instructions, which can execute the computer readable program instructions.
The computer program product described herein may be embodied in hardware, software, or a combination thereof. In an alternative embodiment, the computer program product is embodied as a computer storage medium, and in another alternative embodiment, the computer program product is embodied as a software product, such as a software development kit (Software Development Kit, SDK), or the like.
Various aspects of the present disclosure are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the disclosure. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer-readable program instructions.
These computer readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer readable program instructions may also be stored in a computer readable storage medium that can direct a computer, programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer readable medium having the instructions stored therein includes an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer, other programmable apparatus or other devices implement the functions/acts specified in the flowchart and/or block diagram block or blocks.
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
Example embodiments have been disclosed herein, and although specific terms are employed, they are used and should be interpreted in a generic and descriptive sense only and not for purpose of limitation. In some instances, it will be apparent to one skilled in the art that features, characteristics, and/or elements described in connection with a particular embodiment may be used alone or in combination with other embodiments unless explicitly stated otherwise. It will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the scope of the disclosure as set forth in the appended claims.

Claims (10)

1. A video encoding method, comprising:
acquiring the image complexity of an ith frame and the image complexity of an (i-1)th frame, wherein i is an integer greater than 2, the ith frame is a current frame, and the (i-1)th frame is a frame previous to the ith frame;
determining the similarity between the ith frame and the (i-1)th frame according to the image complexity of the ith frame and the image complexity of the (i-1)th frame;
acquiring the filling degree of a buffer area after the (i-1)th frame is placed in the buffer area and before the ith frame is placed in the buffer area, wherein the filling degree is the ratio of the number of video frames buffered in the buffer area to its rated capacity;
judging whether to skip the ith frame according to the similarity between the ith frame and the (i-1)th frame, a preset similarity threshold, the filling degree of the buffer area and a preset filling degree threshold, and obtaining a first judgment result corresponding to the ith frame;
and determining a target coding mode of the ith frame based on the first judgment result corresponding to the ith frame, and coding the ith frame according to the target coding mode.
2. The method of claim 1, wherein the filling degree threshold comprises a first filling degree threshold and a second filling degree threshold, the first filling degree threshold being greater than the second filling degree threshold;
the step of judging whether to skip the ith frame according to the similarity between the ith frame and the (i-1)th frame, a preset similarity threshold, the filling degree of the buffer area and a preset filling degree threshold, and obtaining a first judgment result corresponding to the ith frame comprises:
when the similarity between the ith frame and the i-1 th frame is smaller than or equal to the similarity threshold, the first judgment result corresponding to the ith frame is a non-skip frame;
when the similarity between the ith frame and the ith-1 frame is greater than the similarity threshold and the filling degree is greater than or equal to the first filling degree threshold, a first judgment result corresponding to the ith frame is a frame skip;
And when the similarity between the ith frame and the ith-1 frame is greater than the similarity threshold, and the filling degree is smaller than the first filling degree threshold and greater than the second filling degree threshold, the first judgment result corresponding to the ith frame is a non-skip frame.
3. The method according to claim 2, wherein the method further comprises:
acquiring the actual frame skipping status of a previous preset number of video frames adjacent to the ith frame;
when the similarity between the ith frame and the (i-1)th frame is greater than the similarity threshold, the filling degree is smaller than the first filling degree threshold and greater than the second filling degree threshold, and the previous preset number of video frames adjacent to the ith frame are all skipped frames, the first judgment result corresponding to the ith frame is a non-skipped frame;
and when the similarity between the ith frame and the (i-1)th frame is greater than the similarity threshold, the filling degree is smaller than the first filling degree threshold and greater than the second filling degree threshold, and the previous preset number of video frames adjacent to the ith frame comprise non-skipped frames, the first judgment result corresponding to the ith frame is a skipped frame.
4. A method according to any one of claims 1 to 3, wherein before the determining the target coding mode of the i-th frame based on the first determination result corresponding to the i-th frame, the method further includes:
obtaining a random number, wherein each video frame is configured with one random number, the random number comprises a frame skip number and at least one non-frame skip number, the frame skip number is a value indicating that frame skipping is performed, and the non-frame skip number is a value indicating that frame skipping is not performed;
the determining the target coding mode of the ith frame based on the first judgment result corresponding to the ith frame includes:
and determining a target coding mode of the ith frame based on a first judging result corresponding to the ith frame and the random number.
5. The method of claim 4, wherein the determining the target coding mode of the ith frame based on the first determination result corresponding to the ith frame and the random number comprises:
under the condition that the random number is the frame skip number, the target coding mode of the ith frame is to carry out frame skip processing on the ith frame;
when the random number is a non-skip frame number and the first judgment result corresponding to the ith frame is a non-skip frame, the target coding mode of the ith frame is to code the ith frame;
and when the random number is a non-frame skip number and the first judgment result corresponding to the ith frame is a frame skip, the target coding mode of the ith frame is to perform frame skip processing on the ith frame.
6. The method of claim 5, wherein said encoding said i-th frame comprises:
acquiring the quantization parameter of the ith frame and the quantization parameter of the i-1 th frame;
calculating the variation between the quantization parameter of the ith frame and the quantization parameter of the (i-1)th frame;
when the variation between the quantization parameter of the ith frame and the quantization parameter of the (i-1)th frame is less than or equal to a preset change threshold, encoding the ith frame according to the quantization parameter of the ith frame;
and, when the variation between the quantization parameter of the ith frame and the quantization parameter of the (i-1)th frame is greater than the preset change threshold, calculating a quantization parameter for the ith frame according to the quantization parameter of the (i-1)th frame and the change threshold to obtain a planned quantization parameter of the ith frame, and encoding the ith frame according to the planned quantization parameter.
7. A method according to any one of claims 1-3, wherein the image complexity of the ith frame and the image complexity of the (i-1)th frame are each derived from a power function determined by a compression control parameter, a number of macroblocks, a first constant and a second constant, wherein the compression control parameter is a parameter of the compressed video frame, the number of macroblocks is the number of macroblocks used when encoding the video frame, and the first constant and the second constant are preset constants.
8. A video encoding apparatus, comprising:
an acquisition module, configured to acquire the image complexity of an ith frame and the image complexity of an (i-1)th frame, wherein i is an integer greater than 2, the ith frame is a current frame, and the (i-1)th frame is a frame previous to the ith frame;
a determining module, configured to determine a similarity between the ith frame and the i-1 th frame according to an image complexity of the ith frame and an image complexity of the i-1 th frame;
the acquisition module is further configured to acquire the filling degree of the buffer area after the (i-1)th frame is placed in the buffer area and before the ith frame is placed in the buffer area, wherein the filling degree is the ratio of the number of video frames buffered in the buffer area to its rated capacity;
the acquisition module is further configured to judge whether to skip the ith frame according to the similarity between the ith frame and the (i-1)th frame, a preset similarity threshold, the filling degree of the buffer area and a preset filling degree threshold, and to obtain a first judgment result corresponding to the ith frame;
and an encoding module, configured to determine a target coding mode of the ith frame based on the first judgment result corresponding to the ith frame, and to encode the ith frame according to the target coding mode.
9. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores one or more computer programs executable by the at least one processor to enable the at least one processor to perform the video encoding method of any one of claims 1-7.
10. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the video encoding method according to any of claims 1-7.
CN202311048965.3A 2023-08-21 2023-08-21 Video encoding method and device, electronic equipment and computer readable storage medium Active CN116761036B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311048965.3A CN116761036B (en) 2023-08-21 2023-08-21 Video encoding method and device, electronic equipment and computer readable storage medium

Publications (2)

Publication Number Publication Date
CN116761036A true CN116761036A (en) 2023-09-15
CN116761036B CN116761036B (en) 2023-11-14

Family

ID=87953763

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311048965.3A Active CN116761036B (en) 2023-08-21 2023-08-21 Video encoding method and device, electronic equipment and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN116761036B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040001550A1 (en) * 2002-06-28 2004-01-01 Lsi Logic Corporation Method of detecting internal frame skips by MPEG video decoders
US20060056508A1 (en) * 2004-09-03 2006-03-16 Phillippe Lafon Video coding rate control
US20100278268A1 (en) * 2007-12-18 2010-11-04 Chung-Ku Lee Method and device for video coding and decoding
CN102113329A (en) * 2008-07-29 2011-06-29 高通股份有限公司 Intelligent frame skipping in video coding based on similarity metric in compressed domain
CN104620595A (en) * 2012-10-11 2015-05-13 坦戈迈公司 Proactive video frame dropping
CN110113602A (en) * 2019-04-22 2019-08-09 西安电子科技大学 A kind of H.264 code rate control frame-skipping optimization method
WO2021244341A1 (en) * 2020-06-05 2021-12-09 中兴通讯股份有限公司 Picture coding method and apparatus, electronic device and computer readable storage medium

Also Published As

Publication number Publication date
CN116761036B (en) 2023-11-14

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant