CN112532975A - Video encoding method, video encoding device, computer equipment and storage medium


Info

Publication number
CN112532975A
CN112532975A (application number CN202011336008.7A)
Authority
CN
China
Prior art keywords
macro block
coding
modes
rate
target
Legal status
Granted
Application number
CN202011336008.7A
Other languages
Chinese (zh)
Other versions
CN112532975B
Inventor
肖文惠
刘海军
王诗涛
Current Assignee
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN202011336008.7A
Publication of CN112532975A
Application granted
Publication of CN112532975B
Current legal status: Active

Classifications

    • H: Electricity; H04: Electric communication technique; H04N: Pictorial communication, e.g. television; H04N19/00: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/149: Data rate or code amount at the encoder output by estimating the code amount by means of a model, e.g. mathematical model or statistical model
    • H04N19/107: Selection of coding mode or of prediction mode between spatial and temporal predictive coding, e.g. picture refresh
    • H04N19/122: Selection of transform size, e.g. 8x8 or 2x4x8 DCT; selection of sub-band transforms of varying structure or type
    • H04N19/124: Quantisation
    • H04N19/176: Adaptive coding characterised by the coding unit, the unit being an image region, the region being a block, e.g. a macroblock
    • H04N19/513: Motion estimation or motion compensation; processing of motion vectors
    • H04N19/625: Transform coding using discrete cosine transform [DCT]

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Discrete Mathematics (AREA)
  • Algebra (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The application discloses a video coding method and apparatus, a computer device and a storage medium, belonging to the technical field of audio and video. When the current macroblock to be coded is a static macroblock and the reference video frame is a forward-reference P frame, if the quality parameter of the static macroblock is greater than or equal to that of the reference macroblock and the motion vector meets a target condition, the coding mode of the static macroblock is directly determined to be the target coding mode. Prediction and evaluation of the other coding modes is skipped, so the rate-distortion cost does not need to be computed for the intra and inter modes one by one, and the encoding speed can be greatly increased without loss of coding efficiency.

Description

Video encoding method, video encoding device, computer equipment and storage medium
Technical Field
The present application relates to the field of audio and video technologies, and in particular, to a video encoding method and apparatus, a computer device, and a storage medium.
Background
With the development of audio and video technology and the diversification of terminal functions, a user can watch videos through a terminal anytime and anywhere. When the server transmits a video to the terminal, the video needs to be compressed (i.e., encoded) to save communication overhead. During encoding, a video image is divided into a plurality of blocks, and an encoding-mode decision is made for each block to select the optimal encoding mode. This decision process is computationally expensive, so a method is needed that increases encoding speed without losing coding efficiency.
Disclosure of Invention
The embodiments of the present application provide a video coding method and apparatus, a computer device and a storage medium, which can increase the encoding speed without losing coding efficiency. The technical scheme is as follows:
in one aspect, a video encoding method is provided, and the method includes:
obtaining a static macroblock in a current video frame, the static macroblock being identical to a reference macroblock at the same position in a reference video frame;
in response to the reference video frame being a forward reference frame, acquiring a quality parameter of the static macroblock;
in response to the quality parameter of the static macroblock being greater than or equal to the quality parameter of the reference macroblock, acquiring a motion vector of the static macroblock;
in response to the motion vector meeting a target condition, skipping cost prediction for other coding modes and coding the static macroblock based on a target coding mode, the other coding modes being coding modes other than the target coding mode.
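For illustration only, the claimed decision flow for a single macroblock can be sketched as follows (a minimal sketch; the function name decide_mode, the types, and the fallback behavior are assumptions made for readability, not part of the claims):

    enum class Mode { Skip, FullDecision };

    struct MotionVector { int x = 0; int y = 0; };

    // Illustrative fast decision following the claimed steps: a static
    // macroblock whose reference frame is a forward reference frame is
    // coded in the target (skip) mode when its quality parameter is at
    // least that of the reference macroblock and its MV is the zero vector.
    Mode decide_mode(bool is_static, bool ref_is_forward,
                     int qp_current, int qp_reference, MotionVector mv) {
        if (is_static && ref_is_forward &&
            qp_current >= qp_reference && mv.x == 0 && mv.y == 0) {
            return Mode::Skip;          // skip cost prediction for other modes
        }
        return Mode::FullDecision;      // fall back to the normal mode decision
    }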
In one aspect, a video encoding apparatus is provided, the apparatus including:
the first acquisition module is used for acquiring a static macro block in a current video frame, wherein the static macro block is the same as a reference macro block at the same position in a reference video frame;
a second obtaining module, configured to obtain a quality parameter of the static macroblock in response to the reference video frame being a forward reference frame;
a third obtaining module, configured to obtain a motion vector of the static macroblock in response to the quality parameter of the static macroblock being greater than or equal to the quality parameter of the reference macroblock;
and an encoding module, configured to skip cost prediction for other encoding modes in response to the motion vector meeting a target condition, and encode the static macroblock based on a target encoding mode, where the other encoding modes are encoding modes other than the target encoding mode.
In one possible implementation, the third obtaining module is configured to:
acquiring coding modes of a plurality of adjacent macroblocks of the static macroblock;
determining a motion vector of the static macroblock based on the coding modes of the plurality of neighboring macroblocks.
In one possible embodiment, the apparatus further comprises:
a first determining module, configured to determine discrete cosine transform coefficients of the static macroblock in response to the quality parameter of the static macroblock being less than the quality parameter of the reference macroblock;
the coding module is further configured to, in response to the discrete cosine transform coefficients being mappable to zero, skip cost prediction for other coding modes and code the static macroblock based on the target coding mode.
In one possible embodiment, the apparatus further comprises:
a prediction module, configured to, in response to the discrete cosine transform coefficients not being mappable to zero, predict a target rate-distortion cost of a target inter mode having a target block size;
the prediction module is further configured to predict a plurality of first rate-distortion costs for a plurality of intra modes based on the plurality of intra modes having different block sizes;
and a second determining module, configured to determine the encoding mode corresponding to the minimum value among the target rate-distortion cost and the plurality of first rate-distortion costs.
In one possible embodiment, the apparatus further comprises:
a prediction module, configured to, in response to the reference video frame not being the forward reference frame, predict a plurality of second rate-distortion costs for a plurality of inter modes having different block sizes;
the prediction module is further configured to predict a plurality of first rate-distortion costs for a plurality of intra modes based on the plurality of intra modes having different block sizes;
a third determining module, configured to determine the encoding mode corresponding to the minimum value among the plurality of first rate-distortion costs and the plurality of second rate-distortion costs.
In one possible embodiment, the apparatus further comprises:
a prediction module, configured to predict, for a non-static macroblock in the current video frame, a plurality of second rate-distortion costs for a plurality of inter modes having different block sizes;
the prediction module is further configured to predict a plurality of first rate-distortion costs for a plurality of intra modes based on the plurality of intra modes having different block sizes;
a third determining module, configured to determine the encoding mode corresponding to the minimum value among the plurality of first rate-distortion costs and the plurality of second rate-distortion costs.
In a possible implementation, the target condition is that the motion vector is a zero vector.
In one aspect, a computer device is provided, the computer device comprising one or more processors and one or more memories, the one or more memories having stored therein at least one computer program that is loaded and executed by the one or more processors to implement a video encoding method as any one of the above possible implementations.
In one aspect, a storage medium is provided, in which at least one computer program is stored, the at least one computer program being loaded and executed by a processor to implement the video encoding method according to any one of the possible implementations described above.
In one aspect, a computer program product or computer program is provided that includes one or more program codes stored in a computer readable storage medium. The one or more program codes can be read by one or more processors of the computer device from a computer-readable storage medium, and the one or more processors execute the one or more program codes, so that the computer device can execute the video encoding method of any of the above-mentioned possible embodiments.
The technical solutions provided by the embodiments of the present application bring at least the following beneficial effects:
the method has the advantages that the current macro block to be coded is a static macro block, the reference video frame is a forward reference P frame, the coding mode of the static macro block is directly determined to be the target coding mode based on the fact that the quality parameter of the static macro block is larger than or equal to the quality parameter of the reference macro block and the motion vector meets the target condition, prediction judgment of other coding modes is skipped, rate distortion cost does not need to be calculated one by one in-frame mode and inter-frame mode, and the coding rate can be greatly improved under the condition that coding efficiency is not lost.
Drawings
To illustrate the technical solutions in the embodiments of the present application more clearly, the drawings needed in the description of the embodiments are briefly introduced below. The drawings in the following description are only some embodiments of the present application; those skilled in the art can obtain other drawings based on them without creative effort.
Fig. 1 is a schematic diagram of an implementation environment of a video encoding method according to an embodiment of the present application;
fig. 2 is a flowchart of a video encoding method according to an embodiment of the present application;
fig. 3 is a flowchart of a video encoding method according to an embodiment of the present application;
fig. 4 is a flowchart of a video encoding method according to an embodiment of the present application;
fig. 5 is a schematic flowchart of a video encoding method according to an embodiment of the present application;
fig. 6 is a schematic structural diagram of a video encoding apparatus according to an embodiment of the present application;
FIG. 7 is a schematic structural diagram of a computer device according to an embodiment of the present application;
fig. 8 is a schematic structural diagram of a computer device according to an embodiment of the present application.
Detailed Description
To make the objects, technical solutions and advantages of the present application more clear, embodiments of the present application will be described in further detail below with reference to the accompanying drawings.
The terms "first," "second," and the like in this application are used for distinguishing between similar items and items that have substantially the same function or similar functionality, and it should be understood that "first," "second," and "nth" do not have any logical or temporal dependency or limitation on the number or order of execution.
The term "at least one" in this application means one or more, and the meaning of "a plurality" means two or more, for example, a plurality of first locations means two or more first locations.
Before introducing the embodiments of the present application, some basic concepts in the cloud technology field need to be introduced:
cloud Technology (Cloud Technology): the cloud computing business mode management system is a management technology for unifying series resources such as hardware, software, networks and the like in a wide area network or a local area network to realize data calculation, storage, processing and sharing, namely is a general name of a network technology, an information technology, an integration technology, a management platform technology, an application technology and the like applied based on a cloud computing business mode, can form a resource pool, is used as required, and is flexible and convenient. Cloud computing technology will become an important support in the field of cloud technology. Background services of the technical network system require a large amount of computing and storage resources, such as video websites, picture-like websites and more web portals. With the high development and application of the internet industry, each article may have its own identification mark and needs to be transmitted to a background system for logic processing, data in different levels are processed separately, and various industrial data need strong system background support and can be realized through cloud computing.
Cloud Computing (Cloud Computing): a mode of delivery and use of IT (Information Technology) infrastructure, in which required resources are obtained through a network in an on-demand, easily scalable manner. In other words, cloud computing is a computing model that distributes computing tasks over a resource pool formed by a large number of computers, so that various application systems can acquire computing power, storage space and information services as needed. The network that provides the resources is referred to as the "cloud". To the user, the resources in the "cloud" appear to be infinitely expandable, available at any time, used on demand and paid for by use. As a basic capability provider of cloud computing, a cloud computing resource pool (Infrastructure as a Service, IaaS, cloud platform for short) is established, and multiple types of virtual resources are deployed in the pool for external clients to use. The cloud computing resource pool mainly comprises computing devices (virtualized machines including operating systems), storage devices and network devices. Divided by logical function, a PaaS (Platform as a Service) layer can be deployed on the IaaS layer, and an SaaS (Software as a Service) layer on the PaaS layer; the SaaS layer can also be deployed directly on the IaaS layer. The PaaS layer is a platform on which software runs, such as a database or a Web container. The SaaS layer is the various service software, such as a web portal or an SMS group sender. Generally, the SaaS layer and the PaaS layer are upper layers relative to the IaaS layer.
The generalized cloud computing refers to a delivery and use mode of a service, and refers to obtaining a required service in an on-demand and easily-extensible manner through a network. Such services may be IT and software, internet related, or other services. Cloud Computing is a product of development and fusion of traditional computers and Network Technologies, such as Grid Computing (Grid Computing), Distributed Computing (Distributed Computing), Parallel Computing (Parallel Computing), Utility Computing (Utility Computing), Network Storage (Network Storage Technologies), Virtualization (Virtualization), Load balancing (Load Balance), and the like.
With the development of diversification of internet, real-time data stream and connecting equipment and the promotion of demands of search service, social network, mobile commerce, open collaboration and the like, cloud computing is rapidly developed. Different from the prior parallel distributed computing, the generation of cloud computing can promote the revolutionary change of the whole internet mode and the enterprise management mode in concept.
Cloud conference: an efficient, convenient and low-cost conference form based on cloud computing technology. A user only needs to perform simple, easy-to-use operations through an internet interface to quickly and efficiently share voice, data files and video with teams and clients all over the world, while the cloud conference service provider handles complex technologies such as the transmission and processing of conference data. At present, domestic cloud conferences mainly focus on service content in the SaaS mode, including telephone, network and video services; a video conference based on cloud computing is called a cloud conference.
In the cloud conference era, data transmission, processing and storage are all processed by computer resources of video conference manufacturers, users do not need to purchase expensive hardware and install complicated software, and efficient teleconferencing can be performed only by opening a browser and logging in a corresponding interface.
The cloud conference system supports dynamic multi-server cluster deployment and provides multiple high-performance servers, which greatly improves conference stability, security and usability. In recent years, video conferencing has been welcomed by many users because it greatly improves communication efficiency, continuously reduces communication cost and upgrades internal management, and it is widely used in government, military, transportation, finance, operators, education, enterprises and many other fields. After video conferencing adopts cloud computing, it undoubtedly becomes even more attractive in convenience, speed and usability, which will certainly stimulate a new wave of video conference applications.
In addition, the embodiments of the present application explain some basic terms in the field of video coding:
macroblock (Macro Block, MB): macroblock is a basic concept in video coding technology. Different compression strategies are implemented for different locations by dividing the picture into blocks of different sizes. In video coding, a video frame is typically divided into macroblocks, a macroblock consisting of a block of luminance pixels and two additional blocks of chrominance pixels. In general, a luminance pixel block is a 16 × 16 size pixel block, and the size of two chrominance pixel blocks depends on the sampling format of its video frame, such as: for YUV420 sampled video frames, the chroma pixel block is a block of 8x8 pixels. In each video frame, a plurality of macro blocks are arranged in a slice form, and a video coding algorithm codes the macro blocks one by taking the macro blocks as units to organize a continuous video code stream.
Inter prediction (Inter): inter prediction achieves image compression by using the correlation between video frames, i.e. temporal correlation, and is widely applied in the compression coding of broadcast television, conference television, video telephony and high-definition television. In video transmission, a moving image is a temporal image sequence of successive video frames spaced one frame period apart, and it has greater correlation in time than in space. In most television images the detail changes between adjacent frames are small, i.e. the video frames are strongly correlated; inter-frame coding exploits this correlation to obtain a higher compression ratio than intra-frame coding. For still images or slowly moving images, some frames can be transmitted less frequently, for example every other frame; for the frames that are not transmitted, the receiving end reuses the previous frame's data from its frame memory, with no visible effect. This is because the human eye requires higher spatial resolution for static or slowly moving parts of the image, while its temporal resolution requirement is lower. This method, called frame repetition, is widely applied in video telephony and video conference systems, where the video frame rate is generally 1 to 15 frames per second.
Intra prediction (Intra): refers to a technique adopted by H.264; in H.264, intra prediction may be used when encoding video frames. For each possible block size (apart from the special handling of edge blocks), each pixel can be predicted with a different weighted sum (some weights may be 0) of the 17 closest previously coded pixels, i.e. the 17 pixels above and to the left of the block in which the pixel is located. The possible block sizes comprise at least one of 4 × 4, 8 × 8 or 16 × 16. Clearly, intra prediction is performed not in the temporal but in the spatial domain; the predictive coding algorithm removes the spatial redundancy between adjacent blocks and thus achieves more effective compression. Depending on the reference points selected for prediction, luminance has 9 different modes in total, while chrominance intra prediction has only 1 mode.
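For reference, the nine luma intra prediction modes of H.264 for 4 × 4 blocks can be enumerated as follows (the mode names and numbers follow the H.264 standard; the enum itself is illustrative):

    // The nine H.264 intra prediction modes for 4x4 luma blocks.
    enum class Intra4x4Mode {
        Vertical          = 0,
        Horizontal        = 1,
        DC                = 2,
        DiagonalDownLeft  = 3,
        DiagonalDownRight = 4,
        VerticalRight     = 5,
        HorizontalDown    = 6,
        VerticalLeft      = 7,
        HorizontalUp      = 8,
    };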
Skip mode (Skip): the skip mode is a special coding mode of inter prediction. If a macroblock is decided to be in skip mode, it carries neither a pixel residual nor a motion vector residual. When decoding, the MV of the current block is predicted directly from the MV information of the neighboring blocks to obtain the pixel prediction value; that is, the pixel reconstruction value in skip mode equals the pixel prediction value.
Motion Vector (MV): in inter prediction, the relative displacement between the current macroblock and the best matching block (i.e., the reference macroblock) in its reference video frame is represented by a motion vector. Each divided block has corresponding motion information to be transmitted to the decoding end. If the MV of each block were coded and transmitted independently, a considerable number of bits would be consumed, especially when the frame is divided into small blocks.
Quantization Parameter (QP): the quantization parameter reflects the compression of spatial detail. The smaller the QP value, the finer the quantization, the higher the image quality and the longer the resulting bitstream. If the QP is small, most of the details are preserved; as the QP increases, some details are lost and the bitrate decreases, but image distortion increases and quality degrades. Existing rate-control algorithms mainly reach a target bitrate by adjusting the QP used to quantize the discrete cosine transform coefficients; the QP and the bitrate are inversely related, and this inverse relation becomes more pronounced as the complexity of the video source increases.
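To make the inverse relation concrete: in H.264 the quantization step size Qstep grows exponentially with QP, roughly doubling for every increase of 6 in QP, so a higher QP means coarser quantization and a lower bitrate. A common formulation (a standard textbook relation, not quoted from the patent) is

    Qstep(QP) ≈ 2^((QP - 4) / 6),  QP ∈ [0, 51].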
Rate-Distortion Cost (Rate-Distortion Cost, RD Cost): in the H.264 coding process, multiple modes can be selected; some modes give small image distortion but a high bitrate, while others give large image distortion but a low bitrate. The rate-distortion cost is the parameter of rate-distortion optimization, whose goals are: to reduce the distortion of the video under a given bitrate limit, and to compress the video to a minimum while allowing some distortion. Since the bitrate expresses the degree of data compression (the lower the bitrate, the higher the compression) and the distortion expresses the subjective quality of the video (the lower the distortion, the higher the subjective quality), the goal of rate-distortion optimization can also be stated as: select the optimal coding mode according to a certain strategy to achieve optimal coding performance. Rate-distortion optimization selects the mode with the minimum rate-distortion cost for video coding.
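In the standard Lagrangian formulation of this trade-off (a well-known textbook form, not quoted from the patent), the rate-distortion cost of a candidate mode m is

    J(m) = D(m) + λ · R(m)

where D(m) is the distortion, R(m) the number of bits and λ the Lagrange multiplier; rate-distortion optimization selects the mode m with the smallest J(m).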
Forward reference frame (P frame): when encoding consecutive motion video frames, the consecutive video frames are divided into three types: I, P and B frames. A P frame is predicted from the P frame or I frame preceding it, and is inter-frame compressed by comparing it with the identical information or data in that preceding P frame or I frame, i.e. by taking the nature of the motion into account. When deciding the coding mode of a P frame, both intra mode decisions and inter mode decisions must be made, and at most one reference video frame is allowed.
Discrete Cosine Transform (DCT): the DCT is a transform defined on real signals, and its result is a real signal in the frequency domain. It is a transform related to the Fourier transform, similar to the Discrete Fourier Transform (DFT), but uses only real numbers; compared with the DFT, the DCT can reduce the amount of computation by more than half. The DCT also has a very important property, the energy compaction property: for most natural signals (such as sound and images), the energy after the DCT is concentrated in the low-frequency part, so the DCT is widely used in the data compression of natural signals such as audio and images.
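For reference, the two-dimensional DCT of an N × N pixel block x(i, j), as used in block-based coding, is (standard textbook form, not quoted from the patent)

    X(u, v) = c(u) c(v) Σ_{i=0}^{N-1} Σ_{j=0}^{N-1} x(i, j) · cos[(2i + 1)uπ / 2N] · cos[(2j + 1)vπ / 2N]

with c(0) = sqrt(1/N) and c(k) = sqrt(2/N) for k > 0; for a natural image block, the energy concentrates in the coefficients with small (u, v).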
In a cloud conference scene, screen sharing between the speaker and the participants is common, mainly in two application scenes: PPT (PowerPoint presentation) sharing and document browsing. In both, a large part of the regions (such as the background region) is unchanged and only a few characters or pictures change. The traditional H.264 coding mode decision algorithm does not consider this region-change information and performs a complete mode decision on every macroblock of the input video frame, that is, it performs inter prediction and intra prediction over all partition modes on the current macroblock, which makes video coding time-consuming, seriously hurts coding efficiency, and affects the subjective visual experience of the user.
In view of this, embodiments of the present application provide a video encoding method that uses the detected region-change information between the current video frame and the reference video frame to make fast encoding-mode decisions, saving the time overhead of the mode decision for macroblocks located in static regions, so that the encoding speed is improved while the loss of coding efficiency is minimized. In other words, static-region detection is performed on the current video frame at the macroblock level to identify the static macroblocks; when the mode decision is made for each macroblock, the detected static-region information is used to analyze the current macroblock and determine which mode types the current macroblock actually needs to evaluate, and some unnecessary mode decisions are skipped in advance according to the different cases, thereby reducing the coding complexity.
Fig. 1 is a schematic diagram of an implementation environment of a video encoding method according to an embodiment of the present application. Referring to fig. 1, a terminal 101 and a server 102 may be included in the implementation environment, and both the terminal 101 and the server 102 are computer devices.
The terminal 101 is configured to run a video application, for example an application program capable of providing a video streaming service, such as a cloud conference application, a live-streaming application or a short-video application. The terminal 101 installs the video application, starts it in response to a user's trigger operation on the application, and pulls a video stream from the server 102.
The terminal 101 is directly or indirectly connected to the server 102 through wired or wireless communication, and the present application is not limited thereto.
The server 102 is used to provide background services for video applications. Alternatively, the server 102 encodes the original video stream and outputs the encoded target video stream to the terminal 101. The server 102 includes at least one of a server, a plurality of servers, a cloud computing platform, or a virtualization center. Optionally, the server 102 undertakes primary computational work and the terminal 101 undertakes secondary computational work; or, the server 102 undertakes the secondary computing work, and the terminal 101 undertakes the primary computing work; or, the terminal 101 and the server 102 perform cooperative computing by using a distributed computing architecture.
In some embodiments, the server 102 encodes the original video stream and pushes the encoded target video stream to the terminal 101, or after the original video stream is recorded by the terminal, the terminal encodes the original video stream, sends the encoded target video stream to the server 102, and forwards the encoded target video stream to another terminal requesting to view the target video stream by the server 102.
Optionally, the server 102 is an independent physical server, or a server cluster or distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as cloud service, cloud database, cloud computing, cloud function, cloud storage, web service, cloud communication, middleware service, domain name service, security service, CDN (Content Delivery Network), big data and artificial intelligence platform.
Optionally, the device types of the terminal 101 include: a smart phone, a tablet computer, a notebook computer, a desktop computer, a smart speaker, a smart watch, an e-book reader, an MP3(Moving Picture Experts Group Audio Layer III, mpeg compression standard Audio Layer 3) player, an MP4(Moving Picture Experts Group Audio Layer IV, mpeg compression standard Audio Layer 4) player, and the like, but are not limited thereto. For example, the terminal 101 may be a smartphone, or other handheld portable projection device. The following embodiments are illustrated with the terminal 101 comprising a smartphone.
Those skilled in the art will appreciate that the number of terminals 101 described above may be greater or fewer. For example, the number of the terminals 101 may be only one, or the number of the terminals 101 may be several tens or hundreds, or more. The number and the device type of the terminals 101 are not limited in the embodiment of the present application.
Fig. 2 is a flowchart of a video encoding method according to an embodiment of the present application. Referring to fig. 2, the embodiment is applied to a computer device, and is described below by taking the computer device as a server (or other terminal devices), and the embodiment includes:
201. the server acquires a static macro block in the current video frame, wherein the static macro block is the same as a reference macro block at the same position in the reference video frame.
Optionally, the server is configured to provide a video push service to the terminal, for example, the server receives an original video stream sent by the first terminal, encodes or transcodes the original video stream, outputs a target video stream with different code rates, and sends the target video stream with different code rates to different second terminals according to a pull requirement of the second terminals. In one example, taking a live scene as an example, the first terminal is a main broadcasting terminal, and the second terminal is a spectator terminal; in another example, taking a cloud conference scene as an example, the first terminal is a speaker terminal, and the second terminal is a participant terminal. The embodiment of the present application does not specifically limit the types of the first terminal and the second terminal.
In some embodiments, a video frame currently being processed by the server is referred to as a current video frame, the server divides the current video frame into a plurality of macro blocks, and performs video coding in units of macro blocks, different macro blocks may adopt different coding modes, for example, the coding modes are divided into intra-frame prediction and inter-frame prediction, and a special coding mode, namely, a skip mode, is involved in the inter-frame prediction.
Optionally, for any macroblock currently being processed in the current video frame (referred to as the current macroblock), the server determines, based on the position coordinates of the current macroblock, the reference macroblock at the corresponding position coordinates in the reference video frame. In response to the current macroblock completely coinciding with the reference macroblock, the server determines that the current macroblock is a static macroblock and performs step 202 below. Conversely, in response to the current macroblock not completely coinciding with the reference macroblock, it determines that the current macroblock is a non-static macroblock.
Optionally, for a non-static macroblock in the current video frame, the server predicts a plurality of second rate-distortion costs for a plurality of inter modes having different block sizes; predicts a plurality of first rate-distortion costs for a plurality of intra modes having different block sizes; and determines the coding mode corresponding to the minimum value among the first rate-distortion costs and the second rate-distortion costs.
In some embodiments, if the current macroblock is a non-static macroblock, the server first traverses all block sizes, calculates the rate-distortion cost of the inter mode for each block size, determines the inter mode with the minimum inter rate-distortion cost (the second rate-distortion cost) as the current optimal inter mode, and records this minimum inter rate-distortion cost as min_inter_cost. It then makes the intra mode decision for the current macroblock, traverses the rate-distortion costs of the intra modes with different block sizes, determines the intra mode with the minimum intra rate-distortion cost (the first rate-distortion cost) as the current optimal intra mode, and records this minimum intra rate-distortion cost as min_intra_cost. It further compares min_inter_cost with min_intra_cost and determines the coding mode corresponding to the smaller of the two as the coding mode of the current macroblock; that is, the mode with the minimum rate-distortion cost is selected as the optimal prediction mode.
In the above process, if the current macroblock is a non-static macroblock, the server still needs to predict the rate-distortion cost of every inter mode and every intra mode and select the coding mode with the minimum rate-distortion cost, which ensures that the coding efficiency of the current macroblock is not lost.
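A compact sketch of this full decision is given below (the names min_inter_cost and min_intra_cost follow the description; taking the candidate costs as precomputed inputs is a simplifying assumption, since a real encoder computes them during the search):

    #include <algorithm>
    #include <vector>

    enum class Decision { Inter, Intra };

    // Full mode decision for a non-static macroblock: given the rate-
    // distortion costs of the inter modes over all block sizes and of the
    // intra modes over all block sizes, select the family holding the
    // overall minimum. Both vectors are assumed non-empty.
    Decision full_mode_decision(const std::vector<double>& inter_costs,
                                const std::vector<double>& intra_costs) {
        double min_inter_cost =
            *std::min_element(inter_costs.begin(), inter_costs.end());
        double min_intra_cost =
            *std::min_element(intra_costs.begin(), intra_costs.end());
        return min_inter_cost <= min_intra_cost ? Decision::Inter
                                                : Decision::Intra;
    }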
202. The server obtains the quality parameter of the static macroblock in response to the reference video frame being a forward reference frame (P frame).
In some embodiments, when the current macroblock is a static macroblock, in response to the reference video frame of the current video frame being a P frame, the server calculates the quality parameter of the static macroblock and the quality parameter of the reference macroblock. In response to the quality parameter of the static macroblock being greater than or equal to the quality parameter of the reference macroblock, the server performs step 203 below. In response to the quality parameter of the static macroblock being less than the quality parameter of the reference macroblock, the server determines the DCT transform coefficients of the static macroblock; in response to the DCT transform coefficients being mappable to zero, it performs the part of step 204 below that codes the static macroblock based on the target coding mode.
In one example, the quality parameter is the quantization parameter QP. The server calculates the QP of the static macroblock and of the reference macroblock respectively; in response to the QP of the static macroblock being greater than or equal to the QP of the reference macroblock, the following step 203 is performed. Otherwise, the DCT transform coefficients of the static macroblock are determined, and if they can be mapped to zero, the part of step 204 below that skips cost prediction for the other coding modes and codes the static macroblock based on the target coding mode is performed.
In the above process, when the quality parameter of the static macroblock is less than that of the reference macroblock, the server in effect makes the target-coding-mode decision for the current macroblock (the static macroblock). Optionally, the server performs motion compensation, DCT transform and quantization on the current macroblock; if the DCT transform coefficients can all be quantized to zero, the current macroblock is suitable for the target coding mode, and the following step 204 is executed. For example, the target coding mode is the skip mode.
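A minimal sketch of the zero-coefficient test used here (the quantization itself is omitted, and the flat coefficient array is an assumption):

    #include <algorithm>
    #include <cstdint>
    #include <vector>

    // True if every quantized DCT coefficient of the residual is zero,
    // i.e. the macroblock carries no residual and is a skip-mode candidate.
    bool coefficients_map_to_zero(const std::vector<std::int16_t>& quantized) {
        return std::all_of(quantized.begin(), quantized.end(),
                           [](std::int16_t c) { return c == 0; });
    }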
In other embodiments, in response to the DCT transform coefficients not being mappable to zero, the server predicts the target rate-distortion cost of a target inter mode having a target block size; predicts a plurality of first rate-distortion costs for a plurality of intra modes having different block sizes; and determines the coding mode corresponding to the minimum value among the target rate-distortion cost and the plurality of first rate-distortion costs.
In the above process, since the DCT transform coefficients cannot be zeroed out, the current macroblock is not suitable for the target coding mode, and the other coding modes need to be evaluated further. In this case the target inter mode of the target block size is directly regarded as the inter mode with the minimum rate-distortion cost, the rate-distortion costs of the intra modes are determined one by one, and the coding mode corresponding to the minimum of all the computed rate-distortion costs is determined as the coding mode of the current macroblock. Since only the target rate-distortion cost of the target inter mode is calculated, the rate-distortion costs of all the inter modes need not be evaluated one by one, which greatly increases the encoding speed.
In one example, the target block size is 16 × 16. If the server cannot zero out the DCT transform coefficients, it performs inter mode prediction on the current macroblock with the 16 × 16 block size only and computes the target rate-distortion cost of this mode, recorded as min_inter_cost; the inter mode decisions for the other block sizes are skipped. That is, the rate-distortion cost of the 16 × 16 inter mode is directly taken as the minimum rate-distortion cost among the inter modes of all block sizes, which saves the time of computing the inter-mode rate-distortion costs one by one and increases the encoding speed.
In some embodiments, in response to the reference video frame not being a P frame, the server predicts a plurality of second rate-distortion costs for a plurality of inter modes having different block sizes; predicts a plurality of first rate-distortion costs for a plurality of intra modes having different block sizes; and determines the coding mode corresponding to the minimum value among the first rate-distortion costs and the second rate-distortion costs.
In some embodiments, if the reference video frame is not a P frame, the server first traverses all block sizes, calculates the rate-distortion cost of the inter mode for each block size, determines the inter mode with the minimum inter rate-distortion cost (the second rate-distortion cost) as the current optimal inter mode, and records it as min_inter_cost. It then makes the intra mode decision for the current macroblock, traverses the rate-distortion costs of the intra modes with different block sizes, determines the intra mode with the minimum intra rate-distortion cost (the first rate-distortion cost) as the current optimal intra mode, and records it as min_intra_cost. It further compares min_inter_cost with min_intra_cost and determines the coding mode corresponding to the smaller of the two as the coding mode of the current macroblock, i.e. the mode with the minimum rate-distortion cost is selected as the optimal prediction mode.
In the above process, if the reference video frame is not a P frame, the server still needs to predict the rate-distortion cost of every inter mode and every intra mode and select the coding mode with the minimum rate-distortion cost, which ensures that the coding efficiency of the current macroblock is not lost.
203. The server acquires the motion vector of the static macroblock in response to the quality parameter of the static macroblock being greater than or equal to the quality parameter of the reference macroblock.
In some embodiments, in response to the quality parameter of the static macroblock being greater than or equal to the quality parameter of the reference macroblock, the server acquires the coding modes of a plurality of neighboring macroblocks of the static macroblock, and determines the motion vector of the static macroblock based on the coding modes of these neighboring macroblocks. Optionally, the number of neighboring macroblocks is any integer greater than or equal to 2, for example 3.
In some embodiments, the server obtains the motion vector of the target coding mode predicted from the 3 neighboring macroblocks around the current macroblock, denoted pskip_mv. If pskip_mv meets the target condition, the following step 204 is executed. Otherwise, if pskip_mv does not meet the target condition, the DCT transform coefficients of the current macroblock are determined; in response to the DCT transform coefficients being mappable to zero, the part of step 204 below that skips cost prediction for the other coding modes and codes the static macroblock based on the target coding mode is performed; otherwise, full intra mode and inter mode prediction over every block size is performed on the current macroblock.
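In H.264 the skip-mode MV predicted from the neighboring blocks is typically the component-wise median of the left, top and top-right neighbors' MVs; a simplified sketch (the standard's availability and boundary special cases are omitted here):

    #include <algorithm>

    struct MV { int x = 0; int y = 0; };

    static int median3(int a, int b, int c) {
        return std::max(std::min(a, b), std::min(std::max(a, b), c));
    }

    // Simplified prediction of pskip_mv from the three neighboring
    // macroblocks (left, top, top-right): the component-wise median.
    MV predict_skip_mv(const MV& left, const MV& top, const MV& top_right) {
        return { median3(left.x, top.x, top_right.x),
                 median3(left.y, top.y, top_right.y) };
    }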
204. The server skips cost prediction for other coding modes in response to the motion vector meeting the target condition, and codes the static macroblock based on the target coding mode, the other coding modes being coding modes other than the target coding mode.
Optionally, the target condition is that the motion vector is the zero vector, i.e. pskip_mv = (0, 0); the target condition indicates that the reference macroblock and the current macroblock are located at the same position.
In some embodiments, in response to the motion vector meeting the target condition, the server directly determines the coding mode of the current macroblock as the target coding mode without performing cost prediction for the other coding modes, and codes the current macroblock based on the target coding mode. For example, the target coding mode is the SKIP mode. When the motion vector meets the target condition, the target coding mode is determined directly for coding and the other coding modes do not need to be evaluated one by one, which greatly increases the encoding speed.
All the above optional technical solutions can be combined arbitrarily to form the optional embodiments of the present disclosure, and are not described herein again.
According to the method provided by the embodiment of the application, when the current macroblock to be coded is a static macroblock and the reference video frame is a P frame, the coding mode of the static macroblock is directly determined to be the target coding mode once the quality parameter of the static macroblock is greater than or equal to that of the reference macroblock and the motion vector meets the target condition. The prediction and evaluation of the other coding modes is skipped, the rate-distortion cost does not need to be computed for the intra and inter modes one by one, and the encoding speed can be greatly increased without losing coding efficiency.
Fig. 3 is a flowchart of a video encoding method according to an embodiment of the present application. Referring to fig. 3, the embodiment is applied to a computer device, and the following description takes the computer device as a server as an example, and the embodiment includes:
301. the server acquires a static macro block in the current video frame, wherein the static macro block is the same as a reference macro block at the same position in the reference video frame.
Step 301 is similar to step 201 and will not be described herein.
302. The server acquires the quality parameter of the static macroblock in response to the reference video frame being a P frame.
Step 302 is similar to step 202, and is not described herein.
303. The server determines the DCT transform coefficients of the static macroblock in response to the quality parameter of the static macroblock being less than the quality parameter of the reference macroblock.
304. The server skips cost prediction for other coding modes in response to the DCT transform coefficients being mappable to zero, and codes the static macroblock based on the target coding mode, the other coding modes being coding modes other than the target coding mode.
In the above process, when the quality parameter of the static macroblock is less than that of the reference macroblock, the server in effect makes the target-coding-mode decision for the current macroblock (the static macroblock). Optionally, the server performs at least one of motion compensation, DCT transform or quantization on the current macroblock; if the DCT transform coefficients can be zeroed out, the current macroblock is suitable for the target coding mode, so cost prediction for the other coding modes is skipped and the static macroblock is coded based on the target coding mode. For example, the target coding mode is the skip mode.
According to the method provided by the embodiment of the application, when the current macroblock to be coded is a static macroblock and the reference video frame is a P frame, the coding mode of the static macroblock is directly determined to be the target coding mode once the quality parameter of the static macroblock is less than that of the reference macroblock and the DCT transform coefficients can be zeroed out. The prediction and evaluation of the other coding modes is skipped, the rate-distortion cost does not need to be computed for the intra and inter modes one by one, and the encoding speed can be greatly increased without losing coding efficiency.
Fig. 4 is a flowchart of a video encoding method according to an embodiment of the present application. Referring to fig. 4, the embodiment is applied to a computer device, and the following description takes the computer device as a server as an example, and the embodiment includes:
401. the server acquires a static macro block in the current video frame, wherein the static macro block is the same as a reference macro block at the same position in the reference video frame.
Step 401 is similar to step 201 and is not described herein.
402. The server acquires the quality parameter of the static macroblock in response to the reference video frame being a P frame.
For example, the quality parameter is the quantization parameter QP.
Step 402 is similar to step 202, and is not described herein.
403. The server determines the DCT transform coefficients of the static macroblock in response to the quality parameter of the static macroblock being less than the quality parameter of the reference macroblock.
Step 403 is similar to step 303 and will not be described herein.
404. The server predicts a target rate-distortion cost for a target inter mode having a target block size based on the target inter mode in response to the DCT transform coefficient not being mapped to zero.
In the above process, because the DCT transform coefficient cannot be cleared, it is indicated that the current macroblock is not suitable for selecting the target coding mode, and therefore further determination needs to be performed on other coding models, at this time, the target inter-frame mode of the target block size is directly regarded as the inter-frame mode with the minimum rate distortion cost, and only the target rate distortion cost of the target inter-frame mode is calculated. By only calculating the target rate distortion cost of the target inter-frame mode, the rate distortion cost of all inter-frame modes is avoided being judged one by one, and therefore the coding rate is greatly improved.
In one example, the target block size is 16×16. If the server cannot quantize the DCT transform coefficients to zero, it performs inter-mode prediction on the current macroblock with a 16×16 block size, calculates the target rate-distortion cost of this mode, and records it as min_inter_cost. Inter-mode decisions for the other block sizes are skipped; that is, the rate-distortion cost of the 16×16 inter mode is directly taken as the minimum rate-distortion cost among the inter modes of all block sizes. This saves the time of calculating inter-mode rate-distortion costs one by one and increases the encoding speed. A sketch of this fast path follows.
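The sketch below uses the standard rate-distortion cost J = D + λ·R, where D is distortion (for example SSD against the source) and R is the estimated bit count. The cost numbers and Lagrange multiplier are illustrative values, not encoder output.

```cpp
#include <iostream>

// One candidate mode's measured distortion and estimated rate.
struct ModeResult { double distortion; double bits; };

// Rate-distortion cost J = D + lambda * R.
double rd_cost(ModeResult m, double lambda) {
    return m.distortion + lambda * m.bits;
}

int main() {
    const double lambda = 0.85;           // illustrative Lagrange multiplier
    ModeResult inter_16x16{120.0, 48.0};  // the only inter mode evaluated
    double min_inter_cost = rd_cost(inter_16x16, lambda);
    // The skipped full decision would instead traverse the 16x8, 8x16, 8x8,
    // ... partitions and keep the minimum of their costs.
    std::cout << "min_inter_cost = " << min_inter_cost << '\n';  // 160.8
}
```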
405. The server predicts a plurality of first rate-distortion costs for a plurality of intra modes having different block sizes based on the plurality of intra modes.
In the above process, the server evaluates the rate-distortion cost of each intra mode one by one to obtain the plurality of first rate-distortion costs.
406. The server determines, based on the target rate-distortion cost and a minimum value of the plurality of first rate-distortion costs, an encoding mode corresponding to the minimum value.
In the above process, the server determines the coding mode corresponding to the minimum of all the rate-distortion costs calculated in steps 404 and 405 above as the coding mode of the current macroblock, as sketched below.
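A minimal sketch of this final comparison, with illustrative cost values: min_inter_cost from the 16×16 fast path competes against the per-block-size intra costs, and the cheapest candidate wins.

```cpp
#include <algorithm>
#include <iostream>
#include <string>
#include <vector>

struct Candidate { std::string mode; double rd_cost; };

int main() {
    // Illustrative costs: one fast-path inter candidate plus the intra modes.
    std::vector<Candidate> candidates = {
        {"inter_16x16", 160.8},   // min_inter_cost from step 404
        {"intra_16x16", 171.2},
        {"intra_8x8",   189.5},
        {"intra_4x4",   204.0},
    };
    auto best = std::min_element(
        candidates.begin(), candidates.end(),
        [](const Candidate& a, const Candidate& b) { return a.rd_cost < b.rd_cost; });
    std::cout << "selected: " << best->mode << '\n';  // inter_16x16
}
```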
According to the method provided by this embodiment of the application, when the current macroblock to be encoded is a still macroblock, the reference video frame is a P frame, the quality parameter of the still macroblock is smaller than that of the reference macroblock, and the DCT transform coefficients cannot be quantized to zero, the target inter mode with the target block size is directly taken as the inter mode with the minimum rate-distortion cost and only its target rate-distortion cost is calculated. This avoids evaluating the rate-distortion costs of all inter modes one by one and greatly increases the encoding speed.
Fig. 5 is a schematic flowchart 500 of a video encoding method according to an embodiment of the present application. Referring to fig. 5, the method is described taking the target coding mode being the SKIP mode as an example, and includes the following steps:
Step 1: The server checks whether the input current macroblock is identical to the co-located reference macroblock of the reference video frame. If so, the current macroblock is determined to be a still macroblock; otherwise, it is determined to be a non-still macroblock (also called a changed macroblock).
Step 2: The server examines the current macroblock: if the current macroblock is a still macroblock and the reference video frame of the current video frame is a P frame, step 3 is executed; if either condition is not satisfied, for example the current macroblock is a non-still macroblock or the reference video frame is not a P frame, step 8 is executed.
Step 3: The server examines the QP of the current macroblock: if the QP used to quantize the current macroblock is greater than or equal to the QP used by the co-located reference macroblock of the reference video frame, step 4 is executed; otherwise, step 5 is executed.
Step 4: The server records the SKIP-mode MV predicted from the three neighboring macroblocks of the current macroblock as pskip_mv and checks it (see the MV-prediction sketch after this flow): if pskip_mv is (0, 0), that is, the reference macroblock and the current macroblock are at the same position, step 6 is executed; otherwise, step 5 is executed.
Step 5: The server performs a SKIP-mode decision on the current macroblock, including motion compensation, DCT transform, quantization, and so on; if all the DCT coefficients can be quantized to 0, the current macroblock is suitable for the SKIP mode. If the SKIP mode can be selected for the current macroblock, step 6 is executed; otherwise, step 7 is executed.
Step 6: The server selects the SKIP mode for the current macroblock; no other mode decision is needed, the mode decision for the current macroblock ends, and the subsequent steps are not executed.
Step 7: The server performs inter-mode prediction on the current macroblock with a 16x16 block size, calculates the RD Cost of this mode, records it as min_inter_cost, skips the inter-mode decisions for the other block sizes, and proceeds directly to step 9.
Step 8: The server performs a complete inter-mode decision on the current macroblock, that is, traverses all block sizes, calculates the RD Cost of the inter mode for each block size, selects the block partition with the minimum inter RD Cost as the current optimal inter mode, records the current minimum inter RD Cost as min_inter_cost, and executes step 9.
Step 9: The server performs an intra-mode decision on the current macroblock, that is, traverses the intra modes of different block sizes to calculate their RD Costs, selects the block partition with the minimum intra RD Cost as the current optimal intra mode, records the current minimum intra RD Cost as min_intra_cost, and executes step 10.
Step 10: The server compares min_inter_cost with min_intra_cost, selects the mode with the smaller RD Cost as the optimal prediction mode of the current macroblock, and ends the mode selection for the current macroblock.
According to the method provided by this embodiment of the application, by introducing a macroblock-level still-region detection module (that is, detecting still macroblocks at the macroblock level) and designing a fast mode-decision algorithm based on the detected information, the complexity of mode decision for still macroblocks that do not change between adjacent frames is effectively reduced, thereby reducing encoding complexity, shortening encoding time, and increasing encoding speed. Two sketches follow: the first illustrates the SKIP-mode MV prediction used in step 4, and the second condenses the overall decision flow.
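First, a simplified sketch of the pskip_mv derivation: H.264 predicts the SKIP-mode MV as the component-wise median of the MVs of the left (A), top (B), and top-right (C) neighboring macroblocks. The standard adds special cases (unavailable neighbors, zero-MV shortcuts) that are omitted here, so this is a sketch of the core rule only.

```cpp
#include <algorithm>
#include <iostream>

struct MV { int x, y; };

// Median of three values: max of the two smaller pairwise minima.
int median3(int a, int b, int c) {
    return std::max(std::min(a, b), std::min(c, std::max(a, b)));
}

// Component-wise median of the three neighbours' MVs.
MV predict_pskip_mv(MV a, MV b, MV c) {
    return {median3(a.x, b.x, c.x), median3(a.y, b.y, c.y)};
}

int main() {
    MV pskip_mv = predict_pskip_mv({0, 0}, {0, 0}, {1, 0});
    bool at_same_position = (pskip_mv.x == 0 && pskip_mv.y == 0);  // step 4's test
    std::cout << std::boolalpha << at_same_position << '\n';        // true -> step 6
}
```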
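Second, the ten-step flow condensed into one decision function. Every flag and cost is assumed to be computed elsewhere by the encoder (in the real flow, for instance, the step-5 quantization check only runs when reached), and all names are illustrative.

```cpp
#include <iostream>

enum class Mode { Skip, Inter16x16, BestFullInter, BestIntra };

struct Ctx {
    bool still_mb;            // step 1: current MB identical to co-located ref MB
    bool ref_is_p_frame;      // step 2
    bool qp_ge_ref_qp;        // step 3: current QP >= reference QP
    bool pskip_mv_is_zero;    // step 4: pskip_mv == (0, 0)
    bool dct_zeroable;        // step 5: residual quantizes to all zeros
    double cost_inter_16x16;  // step 7: RD cost of the 16x16 inter mode
    double cost_full_inter;   // step 8: min RD cost over all inter partitions
    double cost_best_intra;   // step 9: min RD cost over all intra modes
};

Mode decide(const Ctx& c) {
    if (c.still_mb && c.ref_is_p_frame) {                 // steps 1-2
        if ((c.qp_ge_ref_qp && c.pskip_mv_is_zero) ||     // steps 3-4 -> step 6
            c.dct_zeroable)                               // step 5    -> step 6
            return Mode::Skip;
        // step 7: only the 16x16 inter cost competes with intra in step 10
        return c.cost_inter_16x16 <= c.cost_best_intra ? Mode::Inter16x16
                                                       : Mode::BestIntra;
    }
    // step 8: full inter decision, then step 10's comparison with intra
    return c.cost_full_inter <= c.cost_best_intra ? Mode::BestFullInter
                                                  : Mode::BestIntra;
}

int main() {
    Ctx c{true, true, true, true, false, 0, 0, 0};
    std::cout << (decide(c) == Mode::Skip) << '\n';  // 1: SKIP chosen
}
```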
The above fast coding-mode decision algorithm based on refined regions reduces coding complexity as much as possible while essentially preserving compression efficiency, thereby shortening encoding time and increasing encoding speed. Table 1 shows, for different types of video streams, the coding-efficiency and coding-speed indexes of conventional H.264 coding, where QP denotes the quantization parameter, Kbps denotes the bit rate in kilobits per second, Y PSNR denotes the PSNR (Peak Signal-to-Noise Ratio) of the luminance signal Y, U PSNR denotes the PSNR of the chrominance signal U, V PSNR denotes the PSNR of the chrominance signal V, and Enc fps denotes the number of frames encoded per second (that is, the encoding frame rate).
TABLE 1
(Table 1 data are provided as images in the original publication.)
As shown in table 2, the same coding-efficiency and coding-speed indexes are given for the different types of video streams when video encoding is performed with the video encoding method provided by this embodiment of the application.
TABLE 2
(Table 2 data are provided as images in the original publication.)
As shown in table 3, the comparison between the two video encoding methods of table 1 and table 2 is given as the BD-rate of the luminance signal Y, the chrominance signal U, and the chrominance signal V for the different types of video streams.
TABLE 3
(Table 3 data are provided as images in the original publication.)
Experiments show that the method saves about 30% of the encoding time at a BD-rate loss of only 0.9%.
Fig. 6 is a schematic structural diagram of a video encoding apparatus according to an embodiment of the present application, please refer to fig. 6, the apparatus includes:
a first obtaining module 601, configured to obtain a static macro block in a current video frame, where the static macro block is the same as a reference macro block at the same position in a reference video frame;
a second obtaining module 602, configured to obtain a quality parameter of the still macroblock in response to the reference video frame being a forward reference frame;
a third obtaining module 603, configured to obtain a motion vector of the still macroblock in response to the quality parameter of the still macroblock being greater than or equal to the quality parameter of the reference macroblock;
an encoding module 604, configured to skip cost prediction for other encoding modes and encode the still macroblock based on a target encoding mode in response to the motion vector meeting a target condition, the other encoding modes being encoding modes other than the target encoding mode.
According to the apparatus provided by this embodiment of the application, when the current macroblock to be encoded is a still macroblock, the reference video frame is a forward-reference P frame, the quality parameter of the still macroblock is greater than or equal to that of the reference macroblock, and the motion vector meets the target condition, the coding mode of the still macroblock is directly determined to be the target coding mode and prediction decisions for the other coding modes are skipped. Rate-distortion costs do not need to be calculated for intra modes and inter modes one by one, so the encoding speed can be greatly increased without losing coding efficiency.
In a possible implementation, the third obtaining module 603 is configured to:
acquiring coding modes of a plurality of adjacent macroblocks of the static macroblock;
the motion vector of the still macroblock is determined based on the coding modes of the plurality of neighboring macroblocks.
In a possible embodiment, based on the apparatus composition of fig. 6, the apparatus further comprises:
a first determining module, configured to determine a discrete cosine transform coefficient of the stationary macroblock in response to the quality parameter of the stationary macroblock being less than the quality parameter of the reference macroblock;
the coding module is further configured to, in response to the discrete cosine transform coefficients being quantizable to zero, perform the step of skipping cost prediction for the other coding modes and coding the still macroblock based on the target coding mode.
In a possible embodiment, based on the apparatus composition of fig. 6, the apparatus further comprises:
a prediction module, configured to, in response to the discrete cosine transform coefficients not being quantizable to zero, predict a target rate-distortion cost of a target inter mode having a target block size;
the prediction module is further configured to predict a plurality of first rate-distortion costs for a plurality of intra modes based on the plurality of intra modes having different block sizes;
and a second determining module, configured to determine, based on the target rate-distortion cost and a minimum value of the plurality of first rate-distortion costs, an encoding mode corresponding to the minimum value.
In a possible embodiment, based on the apparatus composition of fig. 6, the apparatus further comprises:
a prediction module to predict a plurality of second rate-distortion costs for a plurality of inter modes based on the plurality of inter modes having different block sizes in response to the reference video frame not being the forward reference frame;
the prediction module is further configured to predict a plurality of first rate-distortion costs for a plurality of intra modes based on the plurality of intra modes having different block sizes;
and a third determining module, configured to determine, based on a minimum value of the plurality of first rate-distortion costs and the plurality of second rate-distortion costs, an encoding mode corresponding to the minimum value.
In a possible embodiment, based on the apparatus composition of fig. 6, the apparatus further comprises:
a prediction module, configured to predict, for a non-stationary macroblock in the current video frame, a plurality of second rate-distortion costs for a plurality of inter modes based on the plurality of inter modes having different block sizes;
the prediction module is further configured to predict a plurality of first rate-distortion costs for a plurality of intra modes based on the plurality of intra modes having different block sizes;
and a third determining module, configured to determine, based on a minimum value of the plurality of first rate-distortion costs and the plurality of second rate-distortion costs, an encoding mode corresponding to the minimum value.
In one possible embodiment, the target condition is that the motion vector is a zero vector.
All the above optional technical solutions can be combined arbitrarily to form the optional embodiments of the present disclosure, and are not described herein again.
It should be noted that the video encoding apparatus provided in the above embodiment is illustrated only by the division of the above functional modules. In practical applications, the above functions can be allocated to different functional modules as needed; that is, the internal structure of the computer device is divided into different functional modules to complete all or part of the functions described above. In addition, the video encoding apparatus and the video encoding method provided in the above embodiments belong to the same concept; the specific implementation process is detailed in the method embodiments and is not repeated here.
Fig. 7 is a schematic structural diagram of a computer device 700 according to an embodiment of the present application. The computer device 700 may vary considerably in configuration or performance, and includes one or more processors (CPUs) 701 and one or more memories 702, where the memory 702 stores at least one computer program that is loaded and executed by the one or more processors 701 to implement the video encoding method provided by the foregoing embodiments. Optionally, the computer device 700 further has components such as a wired or wireless network interface, a keyboard, and an input/output interface to facilitate input and output, and further includes other components for implementing device functions, which are not described here.
Fig. 8 is a schematic structural diagram of a computer device according to an embodiment of the present application. As shown in fig. 8, taking a terminal 800 as an example of the computer device, the device types of the terminal 800 include: a smartphone, a tablet computer, an MP3 (Moving Picture Experts Group Audio Layer III) player, an MP4 (Moving Picture Experts Group Audio Layer IV) player, a notebook computer, or a desktop computer. The terminal 800 may also be called user equipment, a portable terminal, a laptop terminal, a desktop terminal, or other names.
In general, the terminal 800 includes: a processor 801 and a memory 802.
Optionally, processor 801 includes one or more processing cores, such as a 4-core processor, an 8-core processor, and so forth. Alternatively, the processor 801 is implemented in at least one hardware form of a DSP (Digital Signal Processing), an FPGA (Field-Programmable Gate Array), and a PLA (Programmable Logic Array). In some embodiments, processor 801 includes a main processor and a coprocessor, the main processor being a processor for Processing data in the wake state, also called a Central Processing Unit (CPU); a coprocessor is a low power processor for processing data in a standby state. In some embodiments, the processor 801 is integrated with a GPU (Graphics Processing Unit), which is responsible for rendering and drawing the content that the display screen needs to display. In some embodiments, processor 801 further includes an AI (Artificial Intelligence) processor for processing computational operations related to machine learning.
In some embodiments, memory 802 includes one or more computer-readable storage media, which are optionally non-transitory. Optionally, memory 802 also includes high speed random access memory, as well as non-volatile memory, such as one or more magnetic disk storage devices, flash memory storage devices. In some embodiments, a non-transitory computer readable storage medium in the memory 802 is used to store at least one program code for execution by the processor 801 to implement the video encoding methods provided by various embodiments herein.
In some embodiments, the terminal 800 may further include: a peripheral interface 803 and at least one peripheral. The processor 801, memory 802 and peripheral interface 803 can be connected by bus or signal lines. Various peripheral devices can be connected to the peripheral interface 803 via a bus, signal line, or circuit board. Specifically, the peripheral device includes: at least one of a radio frequency circuit 804, a display screen 805, a camera assembly 806, an audio circuit 807, a positioning assembly 808, and a power supply 809.
The peripheral interface 803 may be used to connect at least one peripheral related to I/O (Input/Output) to the processor 801 and the memory 802. In some embodiments, the processor 801, memory 802, and peripheral interface 803 are integrated on the same chip or circuit board; in some other embodiments, any one or two of the processor 801, the memory 802, and the peripheral interface 803 are implemented on separate chips or circuit boards, which are not limited by this embodiment.
The Radio Frequency circuit 804 is used for receiving and transmitting RF (Radio Frequency) signals, also called electromagnetic signals. The radio frequency circuitry 804 communicates with communication networks and other communication devices via electromagnetic signals. The rf circuit 804 converts an electrical signal into an electromagnetic signal to be transmitted, or converts a received electromagnetic signal into an electrical signal. Optionally, the radio frequency circuit 804 includes: an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a subscriber identity module card, and so forth. Optionally, the radio frequency circuit 804 communicates with other terminals via at least one wireless communication protocol. The wireless communication protocols include, but are not limited to: metropolitan area networks, various generation mobile communication networks (2G, 3G, 4G, and 5G), Wireless local area networks, and/or WiFi (Wireless Fidelity) networks. In some embodiments, the rf circuit 804 further includes NFC (Near Field Communication) related circuits, which are not limited in this application.
The display screen 805 is used to display a UI (User Interface). Optionally, the UI includes graphics, text, icons, video, and any combination thereof. When the display 805 is a touch display, the display 805 also has the ability to capture touch signals on or above its surface. The touch signal can be input to the processor 801 as a control signal for processing. Optionally, the display 805 is also used to provide virtual buttons and/or a virtual keyboard, also referred to as soft buttons and/or a soft keyboard. In some embodiments, there is one display 805, disposed on the front panel of the terminal 800; in other embodiments, there are at least two displays 805, each disposed on a different surface of the terminal 800 or in a folded design; in still other embodiments, the display 805 is a flexible display disposed on a curved or folded surface of the terminal 800. Optionally, the display 805 can even be arranged as a non-rectangular irregular figure, i.e., a shaped screen. Optionally, the display 805 is made of materials such as LCD (Liquid Crystal Display) or OLED (Organic Light-Emitting Diode).
The camera assembly 806 is used to capture images or video. Optionally, camera assembly 806 includes a front camera and a rear camera. Generally, a front camera is disposed at a front panel of the terminal, and a rear camera is disposed at a rear surface of the terminal. In some embodiments, the number of the rear cameras is at least two, and each rear camera is any one of a main camera, a depth-of-field camera, a wide-angle camera and a telephoto camera, so that the main camera and the depth-of-field camera are fused to realize a background blurring function, and the main camera and the wide-angle camera are fused to realize panoramic shooting and VR (Virtual Reality) shooting functions or other fusion shooting functions. In some embodiments, camera assembly 806 also includes a flash. Optionally, the flash is a monochrome temperature flash, or a bi-color temperature flash. The double-color-temperature flash lamp is a combination of a warm-light flash lamp and a cold-light flash lamp and is used for light compensation under different color temperatures.
In some embodiments, the audio circuitry 807 includes a microphone and a speaker. The microphone is used for collecting sound waves of a user and the environment, converting the sound waves into electric signals, and inputting the electric signals to the processor 801 for processing or inputting the electric signals to the radio frequency circuit 804 to realize voice communication. For the purpose of stereo sound collection or noise reduction, a plurality of microphones are respectively arranged at different positions of the terminal 800. Optionally, the microphone is an array microphone or an omni-directional pick-up microphone. The speaker is used to convert electrical signals from the processor 801 or the radio frequency circuit 804 into sound waves. Alternatively, the speaker is a conventional membrane speaker, or a piezoelectric ceramic speaker. When the speaker is a piezoelectric ceramic speaker, not only the electric signal can be converted into a sound wave audible to human, but also the electric signal can be converted into a sound wave inaudible to human for use in distance measurement or the like. In some embodiments, the audio circuitry 807 also includes a headphone jack.
The positioning component 808 is used to locate the current geographic position of the terminal 800 for navigation or LBS (Location Based Service). Optionally, the positioning component 808 is a positioning component based on the GPS (Global Positioning System) of the United States, the BeiDou System of China, the GLONASS System of Russia, or the Galileo System of the European Union.
Power supply 809 is used to provide power to various components in terminal 800. Optionally, the power source 809 is alternating current, direct current, disposable batteries, or rechargeable batteries. When the power supply 809 comprises a rechargeable battery, the rechargeable battery supports wired charging or wireless charging. The rechargeable battery is also used to support fast charge technology.
In some embodiments, terminal 800 also includes one or more sensors 810. The one or more sensors 810 include, but are not limited to: acceleration sensor 811, gyro sensor 812, pressure sensor 813, fingerprint sensor 814, optical sensor 815 and proximity sensor 816.
In some embodiments, the acceleration sensor 811 detects acceleration magnitudes on three coordinate axes of a coordinate system established with the terminal 800. For example, the acceleration sensor 811 is used to detect the components of the gravitational acceleration in three coordinate axes. Optionally, the processor 801 controls the display screen 805 to display the user interface in a landscape view or a portrait view according to the gravitational acceleration signal collected by the acceleration sensor 811. The acceleration sensor 811 is also used for acquisition of motion data of a game or a user.
In some embodiments, the gyro sensor 812 detects a body direction and a rotation angle of the terminal 800, and the gyro sensor 812 cooperates with the acceleration sensor 811 to acquire a 3D motion of the terminal 800 by the user. The processor 801 implements the following functions according to the data collected by the gyro sensor 812: motion sensing (such as changing the UI according to a user's tilting operation), image stabilization at the time of photographing, game control, and inertial navigation.
Optionally, pressure sensors 813 are disposed on the side frames of terminal 800 and/or underneath display 805. When the pressure sensor 813 is disposed on the side frame of the terminal 800, the holding signal of the user to the terminal 800 can be detected, and the processor 801 performs left-right hand recognition or shortcut operation according to the holding signal collected by the pressure sensor 813. When the pressure sensor 813 is disposed at a lower layer of the display screen 805, the processor 801 controls the operability control on the UI interface according to the pressure operation of the user on the display screen 805. The operability control comprises at least one of a button control, a scroll bar control, an icon control and a menu control.
The fingerprint sensor 814 is used to collect the user's fingerprint, and the processor 801 identifies the user according to the fingerprint collected by the fingerprint sensor 814, or the fingerprint sensor 814 itself identifies the user according to the collected fingerprint. Upon identifying the user as trusted, the processor 801 authorizes the user to perform relevant sensitive operations, including unlocking the screen, viewing encrypted information, downloading software, making payments, changing settings, and the like. Optionally, the fingerprint sensor 814 is disposed on the front, back, or side of the terminal 800. When a physical button or a vendor logo is provided on the terminal 800, the fingerprint sensor 814 can be integrated with the physical button or the vendor logo.
The optical sensor 815 is used to collect the ambient light intensity. In one embodiment, processor 801 controls the display brightness of display 805 based on the ambient light intensity collected by optical sensor 815. Specifically, when the ambient light intensity is high, the display brightness of the display screen 805 is increased; when the ambient light intensity is low, the display brightness of the display 805 is reduced. In another embodiment, the processor 801 also dynamically adjusts the shooting parameters of the camera assembly 806 based on the ambient light intensity collected by the optical sensor 815.
A proximity sensor 816, also known as a distance sensor, is typically provided on the front panel of the terminal 800. The proximity sensor 816 is used to collect the distance between the user and the front surface of the terminal 800. In one embodiment, when the proximity sensor 816 detects that the distance between the user and the front surface of the terminal 800 gradually decreases, the processor 801 controls the display 805 to switch from the screen-on state to the screen-off state; when the proximity sensor 816 detects that the distance between the user and the front surface of the terminal 800 gradually increases, the processor 801 controls the display 805 to switch from the screen-off state to the screen-on state.
Those skilled in the art will appreciate that the configuration shown in fig. 8 is not intended to be limiting of terminal 800, and can include more or fewer components than those shown, or some components may be combined, or a different arrangement of components may be used.
In an exemplary embodiment, a computer-readable storage medium, such as a memory including at least one computer program, which is executable by a processor in a terminal to perform the video encoding method in the above embodiments, is also provided. For example, the computer-readable storage medium includes a ROM (Read-Only Memory), a RAM (Random-Access Memory), a CD-ROM (Compact Disc Read-Only Memory), a magnetic tape, a floppy disk, an optical data storage device, and the like.
In an exemplary embodiment, a computer program product or computer program is also provided, including one or more program codes stored in a computer-readable storage medium. One or more processors of the computer device read the one or more program codes from the computer-readable storage medium and execute them, so that the computer device performs the video encoding method in the above embodiments.
Those skilled in the art will appreciate that all or part of the steps for implementing the above embodiments can be completed by hardware, or by a program instructing relevant hardware; optionally, the program is stored in a computer-readable storage medium, and optionally, the above-mentioned storage medium is a read-only memory, a magnetic disk, an optical disc, or the like.
The above description is only exemplary of the present application and should not be taken as limiting, as any modification, equivalent replacement, or improvement made within the spirit and principle of the present application should be included in the protection scope of the present application.

Claims (10)

1. A method of video encoding, the method comprising:
obtaining a static macro block in a current video frame, wherein the static macro block is the same as a reference macro block at the same position in a reference video frame;
responding to the reference video frame as a forward reference frame, and acquiring the quality parameter of the static macro block;
acquiring a motion vector of the still macro block in response to the quality parameter of the still macro block being greater than or equal to the quality parameter of the reference macro block;
in response to the motion vector meeting a target condition, skipping cost prediction for other coding modes and coding the still macroblock based on a target coding mode, the other coding modes being coding modes other than the target coding mode.
2. The method of claim 1, wherein the obtaining the motion vector of the still macroblock comprises:
acquiring coding modes of a plurality of adjacent macroblocks of the static macroblock;
determining a motion vector of the still macroblock based on the coding modes of the plurality of neighboring macroblocks.
3. The method of claim 1, further comprising:
determining discrete cosine transform coefficients of the still macroblock in response to the quality parameter of the still macroblock being less than the quality parameter of the reference macroblock;
and in response to the discrete cosine transform coefficients being quantizable to zero, performing the step of skipping cost prediction for other coding modes and coding the still macroblock based on a target coding mode.
4. The method of claim 3, further comprising:
in response to the discrete cosine transform coefficients not being quantizable to zero, predicting a target rate-distortion cost of a target inter mode having a target block size;
predicting a plurality of first rate-distortion costs for a plurality of intra modes based on the plurality of intra modes having different block sizes;
determining, based on the target rate-distortion cost and a minimum value of the plurality of first rate-distortion costs, an encoding mode corresponding to the minimum value.
5. The method of claim 1, further comprising:
predicting a plurality of second rate-distortion costs for a plurality of inter modes having different block sizes based on the plurality of inter modes in response to the reference video frame not being the forward reference frame;
predicting a plurality of first rate-distortion costs for a plurality of intra modes based on the plurality of intra modes having different block sizes;
determining, based on a minimum value of the first rate-distortion costs and the second rate-distortion costs, an encoding mode corresponding to the minimum value.
6. The method of claim 1, further comprising:
predicting, for a non-stationary macroblock in the current video frame, a plurality of second rate-distortion costs for a plurality of inter modes based on the plurality of inter modes having different block sizes;
predicting a plurality of first rate-distortion costs for a plurality of intra modes based on the plurality of intra modes having different block sizes;
determining, based on a minimum value of the first rate-distortion costs and the second rate-distortion costs, an encoding mode corresponding to the minimum value.
7. The method according to any of claims 1 to 6, wherein the target condition is that the motion vector is a zero vector.
8. A video encoding apparatus, characterized in that the apparatus comprises:
the first acquisition module is used for acquiring a static macro block in a current video frame, wherein the static macro block is the same as a reference macro block at the same position in a reference video frame;
a second obtaining module, configured to obtain a quality parameter of the static macroblock in response to the reference video frame being a forward reference frame;
a third obtaining module, configured to obtain a motion vector of the stationary macroblock in response to that the quality parameter of the stationary macroblock is greater than or equal to the quality parameter of the reference macroblock;
and an encoding module, configured to skip cost prediction for other encoding modes in response to the motion vector meeting a target condition, and encode the still macroblock based on a target encoding mode, where the other encoding modes are encoding modes other than the target encoding mode.
9. A computer device, characterized in that the computer device comprises one or more processors and one or more memories in which at least one computer program is stored, the at least one computer program being loaded and executed by the one or more processors to implement the video encoding method according to any one of claims 1 to 7.
10. A storage medium having stored therein at least one computer program which is loaded and executed by a processor to implement the video encoding method of any one of claims 1 to 7.
CN202011336008.7A 2020-11-25 2020-11-25 Video encoding method, video encoding device, computer equipment and storage medium Active CN112532975B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011336008.7A CN112532975B (en) 2020-11-25 2020-11-25 Video encoding method, video encoding device, computer equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011336008.7A CN112532975B (en) 2020-11-25 2020-11-25 Video encoding method, video encoding device, computer equipment and storage medium

Publications (2)

Publication Number Publication Date
CN112532975A true CN112532975A (en) 2021-03-19
CN112532975B CN112532975B (en) 2021-09-21

Family

ID=74993160

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011336008.7A Active CN112532975B (en) 2020-11-25 2020-11-25 Video encoding method, video encoding device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112532975B (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200177880A1 (en) * 2011-10-05 2020-06-04 Texas Instruments Incorporated Systems and methods for quantization of video content
US20190037233A1 (en) * 2016-07-27 2019-01-31 Cisco Technology, Inc. Motion compensation using a patchwork motion field
CN106791642A (en) * 2016-12-16 2017-05-31 浙江宇视科技有限公司 A kind of video data restoration methods and device
CN107087200A (en) * 2017-05-11 2017-08-22 郑州轻工业学院 Coding mode advance decision method is skipped for high efficiency video encoding standard
CN110087077A (en) * 2019-06-05 2019-08-02 广州酷狗计算机科技有限公司 Method for video coding and device, storage medium

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113382258A (en) * 2021-06-10 2021-09-10 北京百度网讯科技有限公司 Video encoding method, apparatus, device, and medium
CN113891074A (en) * 2021-11-18 2022-01-04 北京达佳互联信息技术有限公司 Video encoding method and apparatus, electronic apparatus, and computer-readable storage medium
CN113891074B (en) * 2021-11-18 2023-08-01 北京达佳互联信息技术有限公司 Video encoding method and apparatus, electronic apparatus, and computer-readable storage medium
CN115190295A (en) * 2022-06-30 2022-10-14 北京百度网讯科技有限公司 Video frame processing method, device, equipment and storage medium
CN117880507A (en) * 2024-03-12 2024-04-12 腾讯科技(深圳)有限公司 Video encoding method, apparatus, device, storage medium, and computer program product

Also Published As

Publication number Publication date
CN112532975B (en) 2021-09-21

Similar Documents

Publication Publication Date Title
CN112532975B (en) Video encoding method, video encoding device, computer equipment and storage medium
JP7085014B2 (en) Video coding methods and their devices, storage media, equipment, and computer programs
KR102330316B1 (en) Adaptive transfer function for video encoding and decoding
US10986332B2 (en) Prediction mode selection method, video encoding device, and storage medium
CN111698504B (en) Encoding method, decoding method and device
JP7026260B2 (en) Video coding methods, video coding equipment and computer programs
JP2022534318A (en) Prediction mode decoding method, encoding method, decoding device, encoding device and storage medium
CN111770340B (en) Video encoding method, device, equipment and storage medium
WO2023087637A1 (en) Video coding method and apparatus, and electronic device and computer-readable storage medium
CN110177275B (en) Video encoding method and apparatus, and storage medium
CN110572679B (en) Method, device and equipment for coding intra-frame prediction and readable storage medium
CN110049326B (en) Video coding method and device and storage medium
CN116074512A (en) Video encoding method, video encoding device, electronic equipment and storage medium
CN111770339B (en) Video encoding method, device, equipment and storage medium
CN116563771A (en) Image recognition method, device, electronic equipment and readable storage medium
CN114422782B (en) Video encoding method, video encoding device, storage medium and electronic equipment
CN113079372B (en) Method, device and equipment for coding inter-frame prediction and readable storage medium
CN112437304B (en) Video decoding method, encoding method, device, equipment and readable storage medium
CN115955565B (en) Processing method, processing apparatus, and storage medium
CN117768650A (en) Image block chroma prediction method and device, electronic equipment and storage medium
CN114079787A (en) Video decoding method, video encoding method, video decoding apparatus, video encoding apparatus, and storage medium
CN117676170A (en) Method, apparatus, device and storage medium for detecting blocking effect
CN115118979A (en) Image encoding method, image decoding method, device, equipment and storage medium
CN116980627A (en) Video filtering method and device for decoding, electronic equipment and storage medium
CN115811615A (en) Screen video coding method and device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 40040520

Country of ref document: HK

GR01 Patent grant