CN114745551A - Method for processing video frame image and electronic equipment - Google Patents

Method for processing video frame image and electronic equipment

Info

Publication number
CN114745551A
CN114745551A CN202110016821.4A
Authority
CN
China
Prior art keywords
image block
coded
encoded
image
division
Prior art date
Legal status
Pending
Application number
CN202110016821.4A
Other languages
Chinese (zh)
Inventor
郑羽珊
楼剑
Current Assignee
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN202110016821.4A
Publication of CN114745551A
Legal status: Pending

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/169Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/17Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
    • H04N19/176Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a block, e.g. a macroblock
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/115Selection of the code volume for a coding unit prior to coding
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/134Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/136Incoming video signal characteristics or properties
    • H04N19/14Coding unit complexity, e.g. amount of activity or edge presence estimation

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

A method, an electronic device, and a computer-readable storage medium for processing a video frame image are provided. The method comprises the following steps: acquiring an image block to be encoded from the video frame image, wherein at least one surrounding image block adjacent to the image block to be encoded has already been encoded; acquiring coding features of the encoded surrounding image blocks and complexity information of the image block to be encoded; determining a dividing mode of the image block to be encoded based on the coding features of the encoded surrounding image blocks and the complexity information of the image block to be encoded, wherein the dividing mode comprises a coarse-grained dividing mode and a fine-grained dividing mode; dividing and encoding the image block to be encoded based on the dividing mode to obtain encoded data of the image block to be encoded; and determining whether the dividing mode meets a preset condition based on the encoded data of the image block to be encoded and the coding features of the encoded surrounding image blocks. The encoding speed is thereby improved and distortion loss is reduced.

Description

Method for processing video frame image and electronic equipment
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to a method for processing a video frame image, an electronic device, and a computer-readable storage medium.
Background
As the digital video application industry chain develops rapidly and video applications trend toward higher definition, higher frame rates, and higher compression ratios, the limitations of the previous-generation video compression solution, VP9, have become increasingly apparent. The Alliance for Open Media (AOMedia) therefore developed AOMedia Video 1 (AV1) as a video coding format to improve on the VP9 scheme. AV1 aims to improve compression efficiency while keeping the loss of picture quality and the coding-rate overhead as small as possible.
AV1 is a block-based encoding scheme: a video frame is first divided into a plurality of non-overlapping image blocks, and each image block is encoded using the same or different encoding methods to obtain a code stream for each image block. Generally, an image area with more detail, or an image area of high complexity, should be divided into more, smaller image blocks to preserve image quality, while a less detailed or less complex image area should be divided into larger image blocks to achieve a higher compression ratio.
However, when the current AV1 encoder divides a video frame into image blocks, it still suffers from low partitioning speed and inaccurate partitioning results. The present disclosure therefore provides a video processing method that improves both the partitioning speed and the accuracy of the partitioning result.
It is to be noted that the information disclosed in the above background section is only for enhancement of understanding of the background of the present disclosure, and thus may include information that does not constitute prior art known to those of ordinary skill in the art.
Disclosure of Invention
An embodiment of the present disclosure provides a method for processing a video frame image, including: acquiring an image block to be encoded from the video frame image, wherein at least one surrounding image block adjacent to the image block to be encoded is encoded; acquiring coding characteristics of coded surrounding image blocks and complexity information of the image blocks to be coded; determining a dividing mode of an image block to be coded based on the coding characteristics of the coded surrounding image block and the complexity information of the image block to be coded, wherein the dividing mode comprises a coarse-grained dividing mode and a fine-grained dividing mode; dividing and encoding the image block to be encoded based on the dividing mode to obtain encoded data of the image block to be encoded; determining whether the division mode meets a preset condition or not based on the coded data of the image block to be coded and the coded features of the coded surrounding image blocks; stopping searching the division mode of the image block to be coded under the condition that the division mode meets the preset condition; and under the condition that the dividing mode does not meet the preset condition, continuing searching the dividing mode of the image block to be coded.
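The early-termination search described in the steps above can be sketched as follows. This is a minimal sketch: the scoring heuristic in `initial_mode`, the helper names, and the callback signatures are all illustrative assumptions, not the disclosure's actual model.

```python
COARSE, FINE = "coarse", "fine"

def initial_mode(neighbor_depth, complexity, depth_weight=0.4, threshold=0.5):
    # Hypothetical scoring: deeper encoded neighbors and higher block
    # complexity both favor predicting a fine-grained dividing mode.
    score = (1 - depth_weight) * complexity + depth_weight * min(neighbor_depth / 4, 1.0)
    return FINE if score >= threshold else COARSE

def search_partition(block, neighbor_depth, complexity, encode, satisfies):
    """Encode with the predicted dividing mode first; try the alternative
    granularity only when the preset condition is not met."""
    mode = initial_mode(neighbor_depth, complexity)
    data = encode(block, mode)
    if satisfies(mode, data, neighbor_depth):
        return mode, data                    # stop searching the dividing mode
    alt = FINE if mode == COARSE else COARSE
    return alt, encode(block, alt)           # continue searching with the other mode
```

When the predicted mode passes the preset-condition check, the second encoding pass is skipped entirely, which is where the speed-up comes from.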
For example, the dividing and encoding the image block to be encoded based on the dividing manner to obtain the encoded data of the image block to be encoded further includes: under the condition that the dividing mode is a coarse-granularity dividing mode, the image block to be coded is coded as a whole to obtain coded data of the image block to be coded; and under the condition that the division mode is a fine-grained division mode, dividing the image block to be encoded into a plurality of sub image blocks with fine granularity, and respectively encoding the plurality of sub image blocks to acquire encoded data of the image block to be encoded.
For example, if the dividing mode does not satisfy the preset condition, continuing to search the dividing mode of the image block to be encoded further includes: under the condition that the dividing mode is a coarse-grained dividing mode, dividing the image block to be encoded into a plurality of sub image blocks with fine granularity, and respectively encoding the plurality of sub image blocks to acquire encoded data of the image block to be encoded; under the condition that the dividing mode is a fine-grained dividing mode, the image block to be coded is coded as a whole to obtain coded data of the image block to be coded; and determining the coded data of the image block to be coded based on the coded data obtained by coding the image block to be coded as a whole and the coded data obtained by coding the plurality of sub image blocks respectively.
For example, when the dividing manner is a coarse-granularity dividing manner, the determining whether the dividing manner satisfies a preset condition based on the encoded data of the image block to be encoded and the encoded features of the encoded surrounding image blocks further includes: determining a division depth of the encoded surrounding image block based on the encoding characteristics of the encoded surrounding image block; determining the division depth of the image block to be coded based on the coded data of the image block to be coded; determining that the dividing mode does not meet a preset condition under the condition that the dividing depth of the image block to be coded is lower than the dividing depth of the coded peripheral image block; and under the condition that the division depth of the image block to be coded is not lower than the division depth of the coded peripheral image block, determining that the division mode meets a preset condition.
For example, when the partition manner is a fine-grained partition manner, the determining whether the partition manner satisfies a preset condition based on the encoded data of the image block to be encoded and the encoded features of the encoded surrounding image blocks further includes: determining a division depth of the encoded surrounding image block based on the encoding characteristics of the encoded surrounding image block; determining the division depth of the image block to be coded based on the coded data of the image block to be coded; under the condition that the division depth of the image block to be coded is not lower than the division depth of the coded peripheral image block, determining that the division mode does not meet a preset condition; and under the condition that the division depth of the image block to be coded is lower than the division depth of the coded peripheral image block, determining that the division mode meets a preset condition.
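The two preset-condition rules above (one for each granularity) reduce to a single depth comparison against the encoded neighbors. A direct transcription, with illustrative names:

```python
def partition_satisfies(mode, current_depth, neighbor_depth):
    """Preset-condition check as stated above: a coarse-grained dividing mode
    is satisfied when the current block's division depth is NOT lower than the
    encoded surrounding blocks'; a fine-grained dividing mode is satisfied
    when the current depth IS lower."""
    if mode == "coarse":
        return current_depth >= neighbor_depth
    return current_depth < neighbor_depth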
For example, the determining, based on the coding features of the coded surrounding image blocks and the complexity information of the image block to be coded, the dividing mode of the image block to be coded further includes: determining the dividing mode of the image block to be coded by utilizing a division method analysis model based on the coding features of the coded surrounding image blocks and the complexity information of the image block to be coded, wherein the input of the division method analysis model is: the coding features of the coded surrounding image blocks and the complexity information of the image block to be coded; and the output of the division method analysis model is: the probability that the dividing mode is the coarse-grained dividing mode or the probability that it is the fine-grained dividing mode.
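The disclosure does not fix the internals of the division method analysis model; as one toy stand-in, a logistic unit can map the two inputs to the required probability. The weights below are illustrative, not trained values.

```python
import math

def fine_split_probability(neighbor_depth, complexity, w=(0.8, 1.2), b=-1.0):
    """Toy division-method analysis model: maps (a neighbor coding feature,
    block complexity) to P(fine-grained dividing mode) via a logistic unit.
    A real model could be any trained classifier with the same interface."""
    z = w[0] * neighbor_depth + w[1] * complexity + b
    return 1.0 / (1.0 + math.exp(-z))
```

A deep-neighbor, high-complexity block then scores above 0.5 (favoring a fine split), while a flat block next to unsplit neighbors scores below it.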
For example, the complexity information includes one or more of: coarse intra prediction loss of a video frame, coarse inter prediction loss of a video frame, prediction rate distortion loss of a video frame, sum of absolute prediction errors between an image block to be encoded and an image block reconstructed based on the encoded data, sum of square prediction differences between an image block to be encoded and an image block reconstructed based on the encoded data, average absolute prediction difference between an image block to be encoded and an image block reconstructed based on the encoded data, average square prediction error between an image block to be encoded and an image block reconstructed based on the encoded data, spatial information, and/or temporal information.
For example, the coding features include one or more of: coding mode, coding division depth, rate distortion loss, sum of absolute errors between an image block to be coded and an image block reconstructed based on the coded data, sum of squares of differences between an image block to be coded and an image block reconstructed based on the coded data, average absolute difference between an image block to be coded and an image block reconstructed based on the coded data, average square error between an image block to be coded and an image block reconstructed based on the coded data.
For example, the encoded data is a code stream into which the image block to be encoded is encoded.
For example, the determining the encoded data of the image block to be encoded based on the encoded data obtained by encoding the image block to be encoded as a whole and the encoded data obtained by encoding the plurality of sub-image blocks respectively further includes: determining a first rate distortion loss of the image block to be coded according to the coded data obtained by coding the image block to be coded as a whole; determining a second rate distortion loss of the image block to be encoded according to the encoded data obtained by respectively encoding the plurality of sub-image blocks; when the first rate distortion loss is lower than the second rate distortion loss, taking the coded data obtained by coding the image block to be coded as a whole as the coded data of the image block to be coded; and when the first rate distortion loss is higher than the second rate distortion loss, taking the coded data obtained by respectively coding the plurality of sub image blocks as the coded data of the image block to be coded.
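The rate-distortion comparison above amounts to keeping whichever encoding has the lower loss. A minimal sketch, with illustrative names:

```python
def pick_encoded_data(whole, split):
    """Select between the whole-block encoding and the per-sub-block encoding
    by comparing their rate-distortion losses. Each argument is a
    (encoded_data, rd_loss) pair."""
    whole_data, whole_loss = whole
    split_data, split_loss = split
    return whole_data if whole_loss < split_loss else split_data
```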
For example, the obtaining the complexity information of the image block to be encoded further includes: acquiring complexity information of the image block to be coded by using a complexity analysis model, wherein the complexity information of the image block to be coded comprises the complexity of the image block to be coded as a whole and/or the complexity of each sub image block with fine granularity formed by dividing the image block to be coded; the input of the complexity analysis model is the pixel value of each pixel in the image block to be coded or the pixel value of each pixel in a plurality of sub image blocks with fine granularity into which the image block to be coded is divided; and the output of the complexity analysis model is the complexity of the image block to be coded, or one or more of the sum of the complexities, the mean of the complexities and the weighted sum of the complexities of all sub-image blocks with fine granularity formed by dividing the image block to be coded.
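The disclosure leaves the complexity analysis model's internals open. As a simple proxy (an assumption, not the disclosed model), pixel-value variance captures the idea that a detailed block scores higher than a flat one, and the per-sub-block outputs listed above can be aggregated into a mean:

```python
def block_variance(pixels):
    """Pixel-value variance as a stand-in complexity measure for one block."""
    n = len(pixels)
    mean = sum(pixels) / n
    return sum((p - mean) ** 2 for p in pixels) / n

def block_complexity(block, sub_blocks=None):
    """Complexity of the whole block and, optionally, the mean complexity of
    its fine-grained sub image blocks, mirroring the outputs listed above."""
    whole = block_variance(block)
    if not sub_blocks:
        return whole, None
    subs = [block_variance(sb) for sb in sub_blocks]
    return whole, sum(subs) / len(subs)
```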
For example, the obtaining the coding features of the coded surrounding image blocks further includes: acquiring the coding characteristics of the coded surrounding image blocks by using a coding data analysis model, wherein the input of the coding data analysis model is the coding data of the coded image blocks; the output of the encoded data analysis model is the encoding characteristics of the encoded image block.
For example, in the case of acquiring complexity information of the image block to be encoded by using a complexity analysis model, parameters in the complexity analysis model are adjusted by using a first rate-distortion loss and a second rate-distortion loss.
For example, in a case where the encoding characteristics of the encoded surrounding image block are obtained by using an encoded data analysis model, parameters in the encoded data analysis model are adjusted using a first rate-distortion loss and a second rate-distortion loss.
For example, in the case where the division method analysis model is used to determine the division manner of the image block to be encoded, the parameters in the division method analysis model are adjusted using the first rate-distortion loss and the second rate-distortion loss.
An embodiment of the present disclosure provides a method for processing a video frame image, including: acquiring accelerated coding enabling information input by a user, wherein the accelerated coding enabling information indicates that: in the process of determining the dividing mode of the image block to be coded in the video frame image, determining the dividing mode of the image block to be coded by using the coding characteristics of the image blocks around the image block to be coded and the complexity information of the image block to be coded, wherein the dividing mode comprises a coarse-granularity dividing mode and a fine-granularity dividing mode; wherein surrounding image blocks of the image block to be encoded are adjacent to the image block to be encoded, and the surrounding image blocks of the image block to be encoded have already been encoded; and encoding the video frame image based on the accelerated encoding enabling information.
An embodiment of the present disclosure provides a method for processing a video frame image, including: acquiring an image block to be encoded from the video frame image, wherein at least one surrounding image block adjacent to the image block to be encoded is encoded; acquiring coding characteristics of coded surrounding image blocks and complexity information of the image blocks to be coded; determining a dividing mode of an image block to be coded based on the coding characteristics of the coded surrounding image blocks and the complexity information of the image block to be coded; under the condition that the dividing mode is a coarse-granularity dividing mode, the image block to be coded is coded as a whole to obtain coded data of the image block to be coded; and under the condition that the division mode is a fine-grained division mode, dividing the image block to be encoded into a plurality of sub image blocks with fine granularity, and respectively encoding the plurality of sub image blocks to acquire encoded data of the image block to be encoded.
An embodiment of the present disclosure provides a method for processing a video frame image, including: acquiring an image block to be encoded from the video frame image, wherein at least one surrounding image block adjacent to the image block to be encoded is encoded; acquiring complexity information of the image block to be coded; determining a dividing mode of the image block to be encoded based on the complexity information of the image block to be encoded, wherein the dividing mode is a coarse-grained dividing mode or a fine-grained dividing mode; dividing and coding the image block to be coded based on the dividing mode to obtain coded data of the image block to be coded; acquiring coding features of the coded surrounding image blocks, and determining whether the dividing mode meets a preset condition based on the coded data of the image block to be coded and the coding features of the coded surrounding image blocks; stopping searching the dividing mode of the image block to be coded under the condition that the dividing mode meets the preset condition; and under the condition that the dividing mode does not meet the preset condition, continuing searching the dividing mode of the image block to be coded.
Embodiments of the present disclosure provide an electronic device. The electronic device includes: one or more processors; and one or more memories, wherein the memories have stored therein computer readable code, which when executed by the one or more processors, performs the method described above.
According to yet another embodiment of the present disclosure, there is also provided a computer-readable storage medium having stored thereon instructions, which, when executed by a processor, cause the processor to perform the above-mentioned method.
According to another aspect of the present disclosure, a computer program product or computer program is provided, the computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer readable medium, and the processor executes the computer instructions to cause the computer device to perform the method provided in the above aspects or various alternative implementations of the above aspects.
Therefore, the embodiments of the present disclosure can obtain a more accurate dividing mode by combining the coding features of the encoded surrounding image blocks with the complexity information of the image block to be encoded. Meanwhile, by comparing the coding features of the image block to be encoded with those of the encoded surrounding image blocks, the embodiments determine whether the search for the dividing mode can be skipped, thereby improving the encoding speed. By combining the complexity of the image block to be encoded with the encoded information of the surrounding blocks, the embodiments enable a fast dividing-mode decision and an early exit from the dividing process, improving the encoding speed and striking a balance between encoding speed and code rate while affecting the code rate and the user's subjective experience as little as possible.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present disclosure, the drawings used in the description of the embodiments will be briefly introduced below. The drawings in the following description are merely exemplary embodiments of the disclosure.
Fig. 1A is a schematic structural diagram of an AV1 video coding framework according to an embodiment of the present disclosure.
Fig. 1B is a schematic diagram of image block partitioning in AV1 video coding according to an embodiment of the present disclosure.
Fig. 2A is a first flowchart of a method of processing a video frame image according to an embodiment of the present disclosure.
Fig. 2B is a second flowchart of a method of processing a video frame image according to an embodiment of the present disclosure.
Fig. 2C is a third flowchart of a method of processing a video frame image according to an embodiment of the present disclosure.
Fig. 2D is a schematic diagram of partitioning a video frame image according to an embodiment of the present disclosure.
Fig. 3 is yet another schematic diagram of a method of processing a video frame image according to an embodiment of the present disclosure.
Fig. 4A is an interface schematic diagram of a method of processing a video frame image according to an embodiment of the present disclosure.
Fig. 4B is a schematic flow chart diagram of a method of processing a video frame image according to an embodiment of the present disclosure.
Fig. 5 shows a schematic diagram of an electronic device according to an embodiment of the disclosure.
Fig. 6 shows a schematic diagram of an architecture of an exemplary computing device, according to an embodiment of the present disclosure.
FIG. 7 shows a schematic diagram of a storage medium according to an embodiment of the disclosure.
Detailed Description
In order to make the objects, technical solutions and advantages of the present disclosure more apparent, example embodiments according to the present disclosure will be described in detail below with reference to the accompanying drawings. It is to be understood that the described embodiments are merely a subset, not all, of the embodiments of the present disclosure, and that the present disclosure is not limited to the example embodiments described herein.
In the present specification and the drawings, steps and elements having substantially the same or similar characteristics are denoted by the same or similar reference numerals, and repeated description of the steps and elements will be omitted. Meanwhile, in the description of the present disclosure, the terms "first", "second", and the like are used only for distinguishing the description, and are not to be construed as indicating or implying relative importance or order.
Various concepts related to the present disclosure are described below.
Quantization Parameter (QP): the index of the quantization step size Qstep. For luminance (Luma) coding, the quantization step Qstep takes 52 values, indexed by QP values from 0 to 51; for chrominance (Chroma) coding, QP values range from 0 to 39.
Acquisition frame rate: the number of video frames captured per second, in fps (frames per second).
Encoding frame rate: the number of video frames encoded per second, in fps (frames per second).
Original image: the unencoded image input to the encoder.
Reconstructed image: the image output by the decoder after encoding and decoding are completed.
Rate-Distortion loss (Rate-Distortion Cost, RDCost): a measure that trades off code rate against distortion in video coding. The rate-distortion loss indicates the minimum distortion achievable at a given code rate.
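The standard Lagrangian form of this cost is J = D + λ·R, where D is the distortion, R the rate in bits, and λ a multiplier that typically depends on QP. The candidate with the lower J wins, as in the whole-block vs. sub-block comparison described earlier:

```python
def rd_cost(distortion, rate_bits, lam):
    """Lagrangian rate-distortion cost J = D + lambda * R."""
    return distortion + lam * rate_bits
```

For example, with λ = 0.5, a candidate with D = 100 and R = 50 bits costs 125, and beats one with D = 80 and R = 120 bits (cost 140) despite the latter's lower distortion.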
Spatial Information (SI): the amount of spatial detail in a frame image; the more spatially complex the scene, the higher the SI value.
Temporal Information (TI): the amount of temporal variation in a video sequence; sequences with more motion typically have higher TI values.
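SI and TI can be sketched per frame as the standard deviation of a gradient-magnitude map and of the inter-frame difference, respectively. This is a simplified sketch: ITU-T P.910 specifies a Sobel filter for SI, whereas plain forward differences are used here for brevity.

```python
def _std(values):
    # Population standard deviation of a list of numbers.
    m = sum(values) / len(values)
    return (sum((v - m) ** 2 for v in values) / len(values)) ** 0.5

def spatial_information(frame):
    """Simplified per-frame SI: std-dev of a gradient-magnitude map
    (forward differences stand in for the Sobel filter of ITU-T P.910)."""
    grads = []
    for y in range(len(frame) - 1):
        for x in range(len(frame[0]) - 1):
            gx = frame[y][x + 1] - frame[y][x]
            gy = frame[y + 1][x] - frame[y][x]
            grads.append((gx * gx + gy * gy) ** 0.5)
    return _std(grads)

def temporal_information(prev_frame, frame):
    """Per-frame TI: std-dev of the pixel-wise difference between
    consecutive frames."""
    diffs = [frame[y][x] - prev_frame[y][x]
             for y in range(len(frame)) for x in range(len(frame[0]))]
    return _std(diffs)
```

A flat frame yields SI of zero, and identical consecutive frames yield TI of zero; detail and motion raise the respective values.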
Cloud Technology is a general term for network technology, information technology, integration technology, management platform technology, application technology and the like based on the cloud computing business model; it can form a resource pool that is used on demand, flexibly and conveniently. The background services of today's technical network systems, such as video websites, picture websites and other portal websites, require large amounts of computing and storage resources. As the internet industry develops, each object may carry its own identification mark that needs to be transmitted to a background system for logical processing; data of different levels are processed separately, and industry data of all kinds require strong back-end system support, which can only be realized through cloud computing.
At present, cloud technologies are mainly classified into a cloud-based technology class and a cloud application class. The cloud-based technology class can be further subdivided into cloud computing, cloud storage, databases, big data, and the like; the cloud application class can be further subdivided into medical cloud, cloud Internet of Things (IoT), cloud security, cloud calls, private cloud, public cloud, hybrid cloud, cloud gaming, cloud education, cloud conferencing, cloud social networking, artificial intelligence cloud services, and the like.
Methods of processing video frames according to the present disclosure may involve cloud computing and cloud storage under cloud technology.
Cloud computing is a computing model that distributes computing tasks over a large pool of computers, enabling various application systems to obtain computing power, storage space, and information services as needed. The network that provides the resources is referred to as the "cloud". Resources in the "cloud" appear to the user to be infinitely expandable, available on demand, expandable at any time, and paid for according to use.
In the present disclosure, determining the complexity of the current image block and the surrounding image blocks involves large-scale computation and requires substantial computing power and storage space. A terminal device may therefore obtain sufficient computing power and storage space through cloud computing technology to determine the complexity of the image blocks and to encode the original image block according to the complexity of the current image block and the surrounding image blocks, thereby generating the encoded data (code stream) of the video.
A distributed cloud storage system (hereinafter referred to as a storage system) is a storage system that uses functions such as cluster applications, grid technology, and distributed storage file systems to aggregate, through application software or application interfaces, a large number of storage devices of different types (also referred to as storage nodes) in a network so that they work cooperatively, providing data storage and service access functions to the outside.
In the present disclosure, the video frame images may be stored in a "cloud", and when it is necessary to determine the complexity of the current image block and the surrounding image block and determine the encoding strategy according to the complexity of the current image block and the surrounding image block, the current image block and the surrounding image block of the current video frame, or the image blocks at corresponding positions of multiple frames before and after the current video frame may be pulled from a cloud storage device to reduce the storage pressure of the terminal device.
From the service perspective, the video coding method of the present disclosure may be applied in service scenes related to video coding, such as video uploading, web conferences, online training (e.g., online meetings), and the like; from the technical perspective, the technical solution of the present disclosure can be directly applied to the AV1 video coding standard to improve the coding flexibility and coding speed of video coding based on the AV1 standard. The present disclosure is illustrated by taking the AV1 video coding standard as an example; the technical solution of the present disclosure may also be applied to other known video coding standards, such as MPEG (Moving Picture Experts Group) standards, HEVC (High Efficiency Video Coding), VVC (Versatile Video Coding), etc., or to other, more advanced video coding standards, which is not limited by the present disclosure.
First, the operation principle of the AV1 video encoding framework corresponding to the AV1 video encoding standard will be explained.
Fig. 1A is a schematic structural diagram of an AV1 video coding framework according to an embodiment of the present disclosure. Fig. 1B is a schematic diagram of image block division in AV1 video coding according to an embodiment of the present disclosure.
The encoding system of AV1 employs a hybrid coding framework composed of multiple modules. Each module compresses data redundancy in a different aspect of the image, from a different perspective and by different means, which allows AV1 to achieve relatively high performance.
As shown in fig. 1A, AV1 may take a certain video frame of a video as an input image and then divide this image into a plurality of image blocks. While fig. 1A schematically shows the block division with white horizontal and vertical lines, those skilled in the art will appreciate that AV1 may employ other block divisions for the video frame. The AV1 encoder then processes the frame in units of image blocks. For example, the processing of an image block may include inter-frame prediction, intra-frame prediction, transformation (e.g., the discrete cosine transform (DCT)), quantization, entropy coding, loop filtering, film grain synthesis, etc., so as to obtain compressed encoded data (e.g., a code stream).
The present disclosure relates generally to the block division technique in the AV1 video coding standard, in which an image is divided into a plurality of rectangular image blocks that are then encoded and decoded in units of image blocks. In the current AV1 video coding standard, the largest image block has a size of 128×128 pixels and the smallest has a size of 4×4 pixels. The largest image block may be further divided into four equal parts or two equal parts. As shown in fig. 1B, the four equal sub image blocks (the sub image blocks labeled R in fig. 1B) may be divided recursively, and each sub image block may be further divided into fine-grained sub image blocks according to up to nine division manners.
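As a rough illustration of the recursive square division described above (a sketch only — the AV1 standard also defines rectangular and T-shaped partition modes not shown here), the legal square block sizes can be enumerated by repeatedly quartering the 128×128 superblock, each quartering halving the side length:

```python
def square_partition_sizes(max_size=128, min_size=4):
    """Enumerate the legal square block side lengths obtained by
    recursively quartering the largest block (128x128) down to the
    smallest (4x4); each quartering halves the side length."""
    sizes = []
    size = max_size
    while size >= min_size:
        sizes.append(size)
        size //= 2
    return sizes

print(square_partition_sizes())  # [128, 64, 32, 16, 8, 4]
```

Six square sizes result, so a superblock can be at most five quartering levels deep.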
Thus, for complex and various image contents, different division modes can enable the AV1 encoder to perform most effective encoding on image blocks with different sizes and different complexities. Furthermore, different prediction modes (e.g., a directional prediction mode, a recursive filtering mode, a cross component prediction mode, a smooth prediction mode, etc.) and processing modes, etc. may be used for different image blocks, thereby further improving the efficiency and quality of encoding.
In general, for image blocks with complex scenes and more details (e.g., higher complexity), a smaller partition size should be used, while for image blocks with simple scenes and less details (e.g., lower complexity), a larger partition size should be used.
As to how to find the optimal division of image blocks faster to increase the coding efficiency of the AV1 encoder, the following two schemes are proposed.
The first scheme is as follows: when a video is encoded, all division manners are traversed in sequence, either from top to bottom (for example, from the coarsest-grained division to the finest-grained division) or from bottom to top (for example, from the finest-grained division to the coarsest-grained division), and the encoding effects of the division manners are compared to determine the optimal one. Under ordinary conditions this scheme finds the optimal division manner accurately. However, if recursive top-down block division is adopted, then for a block with a complex scene the optimal division size is usually small, so many layers of recursion are required to find the optimal division manner. If recursive bottom-up block division is adopted, then for a block with a simple scene the optimal division size is usually large, so many layers of recursion are again required to go from small-size image blocks up to large-size image blocks before the optimal division manner is found. Although both the top-down and the bottom-up search for the optimal block division yield good coding quality and compression efficiency, encoding scene-complex video with this scheme incurs a large speed loss.
In the second scheme, when the video is encoded, each time before judging whether the current image block needs to be divided into smaller (fine-grained) image blocks, the complexity of the current image block is analyzed, and the division granularity is judged according to that complexity. If the complexity of the current image block is high, the block is encoded in a fine-grained division manner (for example, divided into four square sub-blocks for encoding). If the complexity is low, the block is encoded in a coarse-grained division manner (for example, encoded directly without division). However, although the second scheme can determine the division manner quickly and thereby improve the encoding speed, in many cases a judgment based only on the complexity of the current image block is inaccurate, resulting in a large quality loss and a poor user experience.
Therefore, further improvements in the block division technique are required, thereby further improving the division speed and the accuracy of the division result.
Fig. 2A is a flow chart of a method 21 of processing a video frame image according to an embodiment of the disclosure. Fig. 2B is a flow chart of a method 22 of processing a video frame image according to an embodiment of the present disclosure. Fig. 2C is a flow chart of a method 23 of processing a video frame image according to an embodiment of the present disclosure. Fig. 2D is a schematic diagram of partitioning a video frame image according to an embodiment of the present disclosure.
The methods 21 to 23 may be performed by a user terminal. The user terminal may be, but is not limited to, a smart phone, a tablet computer, a laptop computer, a desktop computer, a smart speaker, a smart watch, and the like. Methods 21-23 may also be performed by a network server. The network server may be an independent physical server, a server cluster or a distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as cloud service, a cloud database, cloud computing, a cloud function, cloud storage, network service, cloud communication, middleware service, domain name service, security service, CDN, big data and artificial intelligence platform. The terminal and the server may be directly or indirectly connected through wired or wireless communication, and the disclosure is not limited thereto. The methods 21 to 23 may also be performed by a combination of a user terminal and a network server, and the disclosure is not limited thereto.
As shown in fig. 2A, a method 21 of processing a video frame image according to an embodiment of the present disclosure includes the following steps.
In step S211, an image block to be encoded is obtained from the video frame image, and at least one surrounding image block adjacent to the image block to be encoded is encoded.
As shown in fig. 2D, image blocks adjacent to the image block to be encoded include encoded image blocks A-G and a plurality of image blocks to be encoded (e.g., image block H and image block I). As shown in fig. 2D, the surrounding image blocks corresponding to the image block to be encoded may or may not yet be encoded, depending on whether the individual image blocks are encoded concurrently and on the order in which they are encoded. In the present disclosure, it suffices that at least one of the surrounding image blocks of the image block to be encoded has been encoded; the number of encoded surrounding image blocks is not limited.
In step S212, the encoding characteristics of the encoded surrounding image blocks and the complexity information of the image blocks to be encoded are obtained.
Optionally, the complexity information comprises one or more of: coarse intra prediction loss of a video frame, coarse inter prediction loss of a video frame, prediction rate distortion loss of a video frame, sum of absolute prediction errors between an image block to be encoded and an image block reconstructed based on the encoded data, sum of square prediction differences between an image block to be encoded and an image block reconstructed based on the encoded data, average absolute prediction difference between an image block to be encoded and an image block reconstructed based on the encoded data, average square prediction error between an image block to be encoded and an image block reconstructed based on the encoded data, spatial information, and/or temporal information. It is noted that the complexity information may also include less or more information, and the disclosure is not limited thereto.
The Spatial Information (SI) and Temporal Information (TI) are characteristics of the video frame image. SI characterizes the amount of spatial detail in a frame: the more spatially complex the scene, the higher the SI value. TI characterizes the amount of temporal variation in a sequence of video frames; sequences with a higher degree of motion typically have higher TI values.
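SI and TI are commonly computed as in ITU-T Rec. P.910: SI is the standard deviation of a Sobel-filtered frame, and TI is the standard deviation of the pixel-wise difference between consecutive frames. A minimal pure-Python sketch on a grayscale frame represented as a list of rows (illustrative only, not the encoder's own implementation):

```python
import math

def _std(values):
    # Population standard deviation of a list of numbers.
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))

def spatial_information(frame):
    """SI: std of the Sobel gradient magnitude over interior pixels."""
    h, w = len(frame), len(frame[0])
    mags = []
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = (frame[y-1][x+1] + 2*frame[y][x+1] + frame[y+1][x+1]
                  - frame[y-1][x-1] - 2*frame[y][x-1] - frame[y+1][x-1])
            gy = (frame[y+1][x-1] + 2*frame[y+1][x] + frame[y+1][x+1]
                  - frame[y-1][x-1] - 2*frame[y-1][x] - frame[y-1][x+1])
            mags.append(math.hypot(gx, gy))
    return _std(mags)

def temporal_information(frame, prev_frame):
    """TI: std of the pixel-wise difference between two frames."""
    diffs = [frame[y][x] - prev_frame[y][x]
             for y in range(len(frame)) for x in range(len(frame[0]))]
    return _std(diffs)
```

A perfectly flat frame yields SI = 0, and two identical frames yield TI = 0, matching the intuition that both measures quantify variation.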
The prediction rate-distortion loss of a video frame image is the relationship between the predicted bit rate and the predicted distortion in the video coding process; it indicates the minimum distortion loss achievable by prediction at a given bit rate.
Optionally, the complexity information of the image block to be encoded includes the complexity of the image block to be encoded as a whole, and may also include the sum, mean, or weighted sum of the complexities of the fine-grained sub image blocks into which the image block to be encoded is divided. The present disclosure is not limited thereto. How to obtain the complexity information of the image block to be encoded will be further described with reference to fig. 3 and is not repeated here.
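The sub-block complexity aggregates mentioned above can be sketched as follows (an illustrative helper; the function name and the default uniform weighting are assumptions, not part of the disclosure or of AV1):

```python
def aggregate_subblock_complexity(sub_complexities, weights=None):
    """Compute the sum, mean, and weighted sum of the complexities of
    the fine-grained sub image blocks of the block to be encoded.
    With no weights given, the weighted sum degenerates to the sum."""
    total = sum(sub_complexities)
    mean = total / len(sub_complexities)
    if weights is None:
        weights = [1.0] * len(sub_complexities)
    weighted = sum(w * c for w, c in zip(weights, sub_complexities))
    return total, mean, weighted
```

Any of the three aggregates (or several together) could then be fed into the partition decision alongside the whole-block complexity.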
Optionally, the coding features comprise one or more of: coding mode, coding division depth, rate distortion loss, sum of absolute differences between an image block to be coded and an image block reconstructed based on the coded data, sum of squares of differences between an image block to be coded and an image block reconstructed based on the coded data, average absolute difference between an image block to be coded and an image block reconstructed based on the coded data, average square error between an image block to be coded and an image block reconstructed based on the coded data. How to obtain the coding features of the image block to be coded and/or the coded surrounding image block will be further described later with reference to fig. 3, and this disclosure is not repeated herein.
In step S213, based on the encoding characteristics of the encoded surrounding image blocks and the complexity information of the image blocks to be encoded, determining a dividing manner of the image blocks to be encoded, where the dividing manner includes a coarse-grained dividing manner and a fine-grained dividing manner.
Optionally, the dividing manner of the image block to be encoded determined in step S213 is a tendency dividing manner, which indicates that, according to the encoding characteristics of the encoded peripheral image block and the complexity information of the image block to be encoded, a coarse-grained dividing manner may be better than a fine-grained dividing manner, or a fine-grained dividing manner may be better than a coarse-grained dividing manner.
The complexity information of the image block to be encoded optionally includes complexity information of each sub image block of fine granularity into which the image block to be encoded is divided, for example, any one or more of a sum of complexities, a mean of complexities, and a weighted sum of complexities of each sub image block of fine granularity into which the image block to be encoded is divided. The complexity information of each sub image block with fine granularity formed by dividing the image block to be coded has certain reference significance for dividing the image block to be coded. Through the complexity information of each sub image block with fine granularity formed by dividing the image block to be coded, a more accurate dividing mode (for example, a tendency dividing mode) can be obtained.
As shown in fig. 2D, since the video scene usually has continuity, the surrounding image blocks have a certain reference meaning for the division of the image block to be encoded. By combining the encoding characteristics of the encoded surrounding image blocks and the complexity information of the image blocks to be encoded, a more accurate partitioning manner (e.g., a trend partitioning manner) can be obtained.
Wherein determining the division manner of the image block to be encoded can be implemented, for example, using the functions/modules related to division-manner search in AV1. For example, the function/module may be "partition_search" or "av1_prune_partitions_before_search", etc., or a combination thereof. The present disclosure is not so limited, and more or fewer functional modules may be involved as the relevant video coding standards evolve.
Optionally, the coarse-grained division manner indicates: and coding the image block to be coded as a whole to acquire coded data of the image block to be coded. The coded data is a code stream formed by coding the image block to be coded. The coarse-grained division may be as shown by the blocks filled with diagonal lines in fig. 1B.
Optionally, the fine-grained division manner indicates: and dividing the image block to be encoded into a plurality of sub image blocks with fine granularity, and respectively encoding the plurality of sub image blocks to acquire the encoded data of the image block to be encoded. The fine-grained division may be as shown in a white square in fig. 1B, and each sub image block may be further divided into fine-grained sub image blocks according to at most nine division manners. As shown in fig. 1B, the four equal parts of sub image blocks (the sub image blocks labeled R in fig. 1B) may be further recursively divided into fine-grained sub image blocks.
Of course, as the standard of AV1 video coding and other video coding standards further evolve, more partitions may occur, and the fine-grained partitions and the coarse-grained partitions described in this disclosure should be understood as relative concepts and include any partitions that may occur.
The partitioning manner in the present disclosure may be determined based on the encoding characteristics of the encoded surrounding image block and the complexity information of the image block to be encoded by a trained partitioning method analysis model. Later, how to determine the partition mode of the image block to be encoded by the partition method analysis model will be further described in fig. 3, and this disclosure is not repeated herein.
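While the disclosure determines the tendency division manner with a trained analysis model, a simple hand-written heuristic illustrates the kind of decision involved. The function below is an illustrative stand-in only — the threshold value and the rule of comparing the neighbours' average division depth against the current depth are assumptions, not the trained model:

```python
def tendency_partition(complexity, neighbor_depths, depth,
                       complexity_threshold=1000.0):
    """Hypothetical stand-in for the trained partition-analysis model:
    lean toward fine-grained division when the block is complex, or
    when its encoded neighbours were divided deeper than the current
    depth (exploiting the spatial continuity of video scenes)."""
    neighbors_deeper = (sum(neighbor_depths) / len(neighbor_depths) > depth
                        if neighbor_depths else False)
    if complexity > complexity_threshold or neighbors_deeper:
        return "fine"    # tendency: divide into fine-grained sub-blocks
    return "coarse"      # tendency: encode the block as a whole
```

In the disclosure this tendency is only a starting point; steps S214-S217 still verify it against the actual encoded data.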
In step S214, the image block to be encoded is divided and encoded based on the dividing manner to obtain encoded data of the image block to be encoded.
Wherein, in the case that the dividing manner is a coarse-grained dividing manner, the dividing and encoding the image block to be encoded based on the dividing manner to obtain the encoded data of the image block to be encoded further includes: and coding the image block to be coded as a whole to obtain coded data of the image block to be coded.
Wherein, when the dividing manner is a fine-grained dividing manner, the dividing and encoding the image block to be encoded based on the dividing manner to obtain the encoded data of the image block to be encoded further includes: and dividing the image block to be encoded into a plurality of sub image blocks with fine granularity, and respectively encoding the plurality of sub image blocks to acquire encoded data of the image block to be encoded.
The division of the image block to be encoded based on the division manner can be implemented, for example, using the division-related function/module in AV1. For example, the function/module may be "rd_pick_partition". The present disclosure is not so limited, and more or fewer functional modules may be involved as the relevant video coding standards evolve.
In step S215, it is determined whether the dividing manner satisfies a preset condition based on the encoded data of the image block to be encoded and the encoding characteristics of the encoded surrounding image blocks.
For example, the encoding characteristics of the image block to be encoded in the case of the division mode may be obtained based on the encoded data of the image block to be encoded. And then comparing the coding characteristics of the image block to be coded with the coding characteristics of the coded surrounding image blocks. If the difference between the two is smaller, the current dividing mode meets the preset condition, otherwise, the dividing mode does not meet the preset condition.
For example, when the dividing manner is a coarse-granularity dividing manner, the determining whether the dividing manner satisfies a preset condition based on the encoded data of the image block to be encoded and the encoded features of the encoded surrounding image blocks further includes: determining a division depth of the encoded surrounding image block based on the encoding characteristics of the encoded surrounding image block; determining the division depth of the image block to be coded based on the coded data of the image block to be coded; determining that the dividing mode does not meet a preset condition under the condition that the dividing depth of the image block to be coded is lower than the dividing depth of the coded peripheral image block; and under the condition that the division depth of the image block to be coded is not lower than the division depth of the coded peripheral image block, determining that the division mode meets a preset condition.
For example, when the partition manner is a fine-grained partition manner, the determining whether the partition manner satisfies a preset condition based on the encoded data of the image block to be encoded and the encoded features of the encoded surrounding image blocks further includes: determining a division depth of the encoded surrounding image block based on the encoding characteristics of the encoded surrounding image block; determining the division depth of the image block to be coded based on the coded data of the image block to be coded; under the condition that the division depth of the image block to be coded is not lower than the division depth of the coded peripheral image block, determining that the division mode does not meet a preset condition; and under the condition that the division depth of the image block to be coded is lower than the division depth of the coded peripheral image block, determining that the division mode meets a preset condition.
Here, the division depth indicates the number of iterations performed when a certain image block is iteratively divided from the image block of the maximum size (the coarsest-grained image block) down to the size at which that block is encoded.
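The depth definition and the two early-termination tests of the preceding paragraphs can be sketched together as follows (the helper names are hypothetical; the predicates mirror the coarse-grained and fine-grained conditions above, with the encoded neighbours summarized by a single depth value):

```python
import math

def division_depth(block_size, max_size=128):
    """Number of halvings from the largest block (128x128 by default)
    down to the size at which this block is encoded."""
    return int(math.log2(max_size // block_size))

def coarse_partition_ok(block_depth, neighbor_depth):
    """Coarse-grained case: the preset condition is met only if the
    block is divided at least as deep as its encoded neighbours."""
    return block_depth >= neighbor_depth

def fine_partition_ok(block_depth, neighbor_depth):
    """Fine-grained case: the preset condition is met only if the
    block is still divided less deep than its encoded neighbours."""
    return block_depth < neighbor_depth
```

When the applicable predicate returns true, the search at the current size may stop (step S216); otherwise it continues (step S217).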
In step S216, if the dividing manner meets a preset condition, the search for the dividing manner of the image block to be encoded is stopped.
Therein, stopping the search for the division manner of the image blocks to be encoded may be implemented, for example, using pruning-related functions/modules in AV1. For example, the function/module may be "av1_prune_partitions_by_max_min_bsize", "partition_search_skip_partition", "partition_search_breakout", or the like, or a combination thereof. The present disclosure is not so limited, and more or fewer functional modules may be involved as the relevant video coding standards evolve.
For example, when the dividing manner is a coarse-granularity dividing manner and the dividing manner satisfies a preset condition, the encoded data obtained by encoding the image block to be encoded as a whole is determined as the encoded data of the image block to be encoded. At this time, further evaluation/search of the coarse-grained division manner and the fine-grained division manner of the image block to be encoded may be stopped, without comparing other performance indicators (e.g., rate-distortion loss) of the coarse-grained division and the fine-grained division. The determined encoded data of the image block to be encoded is the final encoded data of the image block to be encoded, and other encoding modes do not need to be further judged.
For example, when the dividing manner is a fine-grained dividing manner and the dividing manner meets a preset condition, it is determined that fine-grained dividing should be performed on the image block to be encoded, and further evaluation/search of coarse-grained dividing and fine-grained dividing of the image block to be encoded is stopped, without comparing other performance indexes (e.g., rate-distortion loss) of the coarse-grained dividing manner and the fine-grained dividing manner. And then, respectively using the plurality of divided sub image blocks as image blocks to be encoded so as to determine the dividing mode of each sub image block. The determination of the partition manner of each sub-image block is similar to the determination of the partition manner of the image block to be encoded, and is not described herein again.
In step S217, if the dividing manner does not satisfy the preset condition, the search for the dividing manner of the image block to be encoded is continued.
Optionally, when the dividing manner is a coarse-granularity dividing manner, continuing to search the dividing manner of the image block to be encoded further includes dividing the image block to be encoded into a plurality of sub image blocks with fine granularity, and encoding each of the plurality of sub image blocks to obtain encoded data of the image block to be encoded. Then, the encoded data of the image block to be encoded is determined based on the encoded data obtained by encoding the image block to be encoded as a whole and the encoded data obtained by encoding the plurality of sub image blocks respectively.
Optionally, when the dividing manner is a fine-grained dividing manner, continuing to search for the dividing manner of the image block to be encoded further includes encoding the image block to be encoded as a whole to obtain encoded data of the image block to be encoded (that is, dividing and encoding the image block to be encoded in a coarse-grained dividing manner).
Then, the encoded data of the image block to be encoded is determined based on the encoded data obtained by encoding the image block to be encoded as a whole (i.e., encoded data based on a coarse-grained division manner) and the encoded data obtained by encoding the plurality of sub-image blocks respectively (i.e., encoded data based on a fine-grained division manner). For example, the encoded data having the optimal encoding characteristics is selected from the encoded data based on the coarse-grained division method and the encoded data based on the fine-grained division method. The coding feature is optionally a rate distortion loss.
The continuation of the search for the division manner of the image block to be encoded may be implemented, for example, using the "partition_search" related function/module in AV1. The present disclosure is not so limited, and more or fewer functional modules may be involved as the relevant video coding standards evolve.
Optionally, the determining the encoded data of the image block to be encoded based on the encoded data obtained by encoding the image block to be encoded as a whole and the encoded data obtained by encoding the plurality of sub-image blocks respectively further includes: determining a first rate distortion loss of the image block to be coded according to the coded data obtained by coding the image block to be coded as a whole; determining a second rate distortion loss of the image block to be encoded according to the encoded data obtained by respectively encoding the plurality of sub-image blocks; when the first rate distortion loss is lower than the second rate distortion loss, taking the coded data obtained by coding the image block to be coded as a whole as the coded data of the image block to be coded; and when the first rate distortion loss is higher than the second rate distortion loss, taking the coded data obtained by respectively coding the plurality of sub image blocks as the coded data of the image block to be coded.
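The rate-distortion comparison above can be sketched as follows (the `(encoded_bytes, rd_loss)` pair representation and the tie-breaking rule are assumptions for illustration):

```python
def pick_encoded_data(coarse, fine):
    """Choose between the two candidate encodings by rate-distortion
    loss. Each argument is a hypothetical (encoded_bytes, rd_loss)
    pair: `coarse` encodes the block as a whole, `fine` encodes its
    sub-blocks separately."""
    coarse_data, rd_coarse = coarse
    fine_data, rd_fine = fine
    # Lower rate-distortion loss wins; on a tie we keep the coarse
    # encoding (an assumption), since it avoids recursing further
    # into the sub-blocks.
    return coarse_data if rd_coarse <= rd_fine else fine_data
```

When the fine-grained encoding wins, each sub-block is then treated in turn as a new block to be encoded, as described in the next paragraph.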
And under the condition that the encoded data obtained by respectively encoding the plurality of sub image blocks is used as the encoded data of the image block to be encoded (that is, under the condition that the encoded data based on the fine-grained division mode is used as the encoded data of the image block to be encoded), respectively taking the plurality of divided sub image blocks as the image blocks to be encoded again to determine the division mode of each sub image block. The determination of the partition manner of each sub-image block is similar to the determination of the partition manner of the image block to be encoded, and is not described herein again.
Scheme one continuously iterates to find the optimal block division manner, which causes a large speed loss. Compared with scheme one, the method 21 may determine whether the division manner meets the coding requirement (e.g., the division-depth requirement) through the coding characteristics (e.g., the division depth) of the encoded surrounding image blocks and of the image block to be encoded. When the preset condition is met, the division search at the current size can be skipped quickly, greatly improving the coding efficiency.
Scheme two judges the division manner only according to the complexity of the image block to be encoded and does not further evaluate the encoded data obtained under that division manner, which brings a large quality loss. Compared with scheme two, the method 21 may determine whether the division manner meets the coding requirement (e.g., the rate-distortion requirement) through the coding characteristics (e.g., the division depth) of the encoded surrounding image blocks and of the image block to be encoded. When the preset condition is not met, the division search continues, and the accuracy of coding is ensured by comparing the first rate-distortion loss with the second rate-distortion loss.
Therefore, the method 21 may obtain a more accurate division manner by combining the coding features of the encoded surrounding image blocks with the complexity information of the image block to be encoded. Meanwhile, the method 21 may determine whether to skip the division-manner search at the current size by comparing the coding features of the image block to be encoded with those of the encoded surrounding image blocks, thereby increasing the coding speed. By combining the complexity of the image block to be encoded with the encoded information of the surrounding blocks, the method 21 can judge the division manner quickly and jump out of the division process at the current size early, improving the coding speed while affecting the coding rate and the user's subjective experience as little as possible.
As shown in fig. 2B, a method 22 of processing a video frame image according to an embodiment of the present disclosure includes the following steps.
In step S221, an image block to be encoded is obtained from the video frame image, and at least one surrounding image block adjacent to the image block to be encoded is encoded. Step S221 is similar to step S211, and therefore is not described in detail.
In step S222, the encoding characteristics of the encoded surrounding image blocks and the complexity information of the image blocks to be encoded are obtained. Step S222 is similar to step S212, and therefore is not described in detail.
In step S223, a dividing manner of the image block to be encoded is determined based on the encoding characteristics of the encoded surrounding image block and the complexity information of the image block to be encoded. Step S223 is similar to step S213, and thus is not described in detail.
Since video scenes are usually continuous, surrounding image blocks have a certain reference meaning to the division of the image block to be encoded. And acquiring a more accurate division mode according to the coding characteristics of the coded surrounding image blocks and the complexity information of the image blocks to be coded.
In step S224, when the dividing manner is a coarse-grained dividing manner, the image block to be encoded is encoded as a whole to obtain encoded data of the image block to be encoded.
In step S225, when the dividing manner is a fine-grained dividing manner, the image block to be encoded is divided into a plurality of sub image blocks with fine granularity, and the plurality of sub image blocks are encoded respectively to obtain encoded data of the image block to be encoded.
Step S224 and step S225 are similar to step S214, and thus are not described again.
As shown in fig. 2D, since the video scene usually has continuity, the surrounding image blocks have a certain reference meaning to the division of the image block to be encoded. The method 22 may obtain a more accurate partitioning manner by combining the coding characteristics of the coded surrounding image blocks and the complexity information of the image blocks to be coded.
As shown in fig. 2C, a method 23 of processing a video frame image according to an embodiment of the present disclosure includes the following steps.
In step S231, an image block to be encoded is obtained from the video frame image, and at least one surrounding image block adjacent to the image block to be encoded is encoded. Step S231 is similar to step S211, and thus is not described in detail.
In step S232, complexity information of the image block to be encoded is acquired. Step S232 is similar to the step S212 of obtaining the complexity information of the image block to be encoded, and thus is not repeated.
In step S233, a dividing manner of the image block to be encoded is determined based on the complexity information of the image block to be encoded, where the dividing manner is a coarse-grained dividing manner or a fine-grained dividing manner.
Optionally, the complexity information comprises one or more of: coarse intra prediction loss of a video frame, coarse inter prediction loss of a video frame, prediction rate distortion loss of a video frame, sum of absolute prediction errors between an image block to be encoded and an image block reconstructed based on the encoded data, sum of square prediction differences between an image block to be encoded and an image block reconstructed based on the encoded data, average absolute prediction difference between an image block to be encoded and an image block reconstructed based on the encoded data, average square prediction error between an image block to be encoded and an image block reconstructed based on the encoded data, spatial information, and/or temporal information. It is noted that the complexity information may also include less or more information, and the disclosure is not limited thereto.
Optionally, the complexity information of the image block to be encoded includes the complexity of the image block to be encoded as a whole, and may further include the sum, mean, or weighted sum of the complexities of the fine-grained sub image blocks into which the image block to be encoded is divided. Therefore, compared with scheme one, which determines the division manner only from the complexity of the image block to be encoded as a whole, the present disclosure can obtain a more accurate division manner by also taking into account the sum, mean, and weighted sum of the complexities of the fine-grained sub image blocks.
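The aggregation of per-sub-block complexities into a sum, mean, and weighted sum can be sketched as follows. This is a minimal illustration: the complexity values and weights are placeholders, and the function name is our own, not part of the disclosed encoder.

```python
# Hypothetical sketch: aggregating the complexities of the fine-grained
# sub image blocks into the statistics mentioned above. The inputs are
# illustrative placeholders, not values from an actual encoder.

def aggregate_complexity(sub_block_complexities, weights=None):
    """Return (sum, mean, weighted sum) of fine-grained sub-block complexities."""
    total = sum(sub_block_complexities)
    mean = total / len(sub_block_complexities)
    if weights is None:
        weights = [1.0] * len(sub_block_complexities)
    weighted = sum(c * w for c, w in zip(sub_block_complexities, weights))
    return total, mean, weighted

# Example: a quadtree-style split of a coding block into four sub-blocks.
total, mean, weighted = aggregate_complexity([4.0, 2.0, 6.0, 8.0],
                                             weights=[0.4, 0.1, 0.3, 0.2])
```

A decision rule may then compare any of these statistics against the whole-block complexity, which is why all three are exposed.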
In step S234, the image block to be encoded is divided and encoded based on the division manner to obtain encoded data of the image block to be encoded. Step S234 is similar to step S214, and therefore is not described in detail.
In step S235, the encoding characteristics of the encoded surrounding image block are obtained, and based on the encoded data of the image block to be encoded and the encoding characteristics of the encoded surrounding image block, it is determined whether the partition manner satisfies a preset condition.
The acquisition of the encoding features of the encoded surrounding image blocks in step S235 is similar to that in step S212. Likewise, determining in step S235 whether the division manner satisfies the preset condition based on the encoded data of the image block to be encoded and the encoding features of the encoded surrounding image blocks is similar to the corresponding determination in step S215, and details are therefore omitted here.
In step S236, if the dividing manner satisfies a preset condition, the search for the dividing manner of the image block to be encoded is stopped. Step S236 is similar to step S216, and thus is not described in detail.
In step S237, if the division manner does not satisfy the preset condition, the search for the division manner of the image block to be encoded is continued. Step S237 is similar to step S217 and thus will not be described in detail.
Scheme one iterates continuously to obtain the optimal block division manner, which causes a large loss of speed. Compared with scheme one, the method 23 can determine whether the division manner meets the encoding requirement (e.g., the division-depth requirement) from the encoding features (e.g., the division depth) of the encoded surrounding image blocks and the encoding features (e.g., the division depth) of the image block to be encoded. When the preset condition is satisfied, the division-manner search can be skipped quickly, greatly improving the encoding efficiency.
Scheme two determines the division manner only from the complexity of the image block to be encoded and does not further examine the encoded data obtained under that division manner, which causes a large loss of quality. Compared with scheme two, the method 23 can determine whether the division manner meets the encoding requirement (e.g., the rate-distortion requirement) from the encoding features (e.g., the division depth) of the encoded surrounding image blocks and the encoding features (e.g., the division depth) of the image block to be encoded. When the preset condition is not satisfied, the division-manner search continues, thereby ensuring the accuracy of encoding.
Therefore, the method 23 can determine whether to skip the division-manner search at the current size by comparing the encoding features of the image block to be encoded with those of the encoded surrounding image blocks, thereby improving the encoding speed. By combining the complexity of the image block to be encoded with the encoded information of the surrounding blocks, the method 23 can make a fast division-manner decision and quickly exit the division process at the current size, improving the encoding speed while affecting the coding rate and the user's subjective experience as little as possible.
According to still another aspect of the present disclosure, there is also provided an apparatus for processing a video frame image. The apparatus comprises: a first acquisition module configured to acquire an image block to be encoded from the video frame image, wherein at least one surrounding image block adjacent to the image block to be encoded has been encoded; a second acquisition module configured to acquire the encoding features of the encoded surrounding image blocks and the complexity information of the image block to be encoded; a first determination module configured to determine a division manner of the image block to be encoded based on the encoding features of the encoded surrounding image blocks and the complexity information of the image block to be encoded, wherein the division manner comprises a coarse-grained division manner and a fine-grained division manner; a processing module configured to divide and encode the image block to be encoded based on the division manner to obtain encoded data of the image block to be encoded; a second determination module configured to determine whether the division manner satisfies a preset condition based on the encoded data of the image block to be encoded and the encoding features of the encoded surrounding image blocks; a division-manner search stopping module configured to stop searching for the division manner of the image block to be encoded when the division manner satisfies the preset condition; and a division-manner search continuing module configured to continue searching for the division manner of the image block to be encoded when the division manner does not satisfy the preset condition.
According to still another aspect of the present disclosure, there is also provided an apparatus for processing a video frame image. The apparatus comprises: a first acquisition module configured to acquire an image block to be encoded from the video frame image, wherein at least one surrounding image block adjacent to the image block to be encoded has been encoded; a second acquisition module configured to acquire the encoding features of the encoded surrounding image blocks and the complexity information of the image block to be encoded; a determination module configured to determine a division manner of the image block to be encoded based on the encoding features of the encoded surrounding image blocks and the complexity information of the image block to be encoded; a first encoding module configured to, when the division manner is a coarse-grained division manner, encode the image block to be encoded as a whole to obtain encoded data of the image block to be encoded; and a second encoding module configured to, when the division manner is a fine-grained division manner, divide the image block to be encoded into a plurality of fine-grained sub image blocks and encode the sub image blocks respectively to obtain the encoded data of the image block to be encoded.
According to still another aspect of the present disclosure, there is also provided an apparatus for processing a video frame image. The apparatus comprises: a first acquisition module configured to acquire an image block to be encoded from the video frame image, wherein at least one surrounding image block adjacent to the image block to be encoded has been encoded; a second acquisition module configured to acquire the encoding features of the encoded surrounding image blocks; a first determination module configured to determine a division manner of the image block to be encoded based on the complexity information of the image block to be encoded, wherein the division manner is a coarse-grained division manner or a fine-grained division manner; a processing module configured to divide and encode the image block to be encoded based on the division manner to obtain encoded data of the image block to be encoded; a second determination module configured to acquire the complexity information of the image block to be encoded, and determine whether the division manner satisfies a preset condition based on the encoded data of the image block to be encoded and the encoding features of the encoded surrounding image blocks; a division-manner search stopping module configured to stop searching for the division manner of the image block to be encoded when the division manner satisfies the preset condition; and a division-manner search continuing module configured to continue searching for the division manner of the image block to be encoded when the division manner does not satisfy the preset condition.
Fig. 3 is yet another schematic diagram of a method of processing a video frame image according to an embodiment of the present disclosure.
The complexity analysis model, the encoded data analysis model, and the partitioning method analysis model described below may all be artificial intelligence models, in particular artificial-intelligence-based neural network models. Typically, such a neural network model is implemented as an acyclic graph, with neurons arranged in different layers. The model comprises an input layer and an output layer separated by at least one hidden layer; the hidden layers transform the input received at the input layer into a representation useful for generating the output at the output layer. Nodes are connected to nodes in adjacent layers via edges, and no edges exist between nodes within the same layer. Data received at the nodes of the input layer is propagated to the nodes of the output layer via hidden layers, activation layers, pooling layers, convolutional layers, and the like. The input and output of the neural network model may take various forms, which the present disclosure does not limit.
The complexity analysis model, the encoded data analysis model, and the partitioning method analysis model described below may also be other types of computational models rather than artificial intelligence models, which the present disclosure does not limit.
As shown in fig. 3, a method 30 of processing a video frame image according to an embodiment of the present disclosure includes the following steps.
In step S301, complexity information of the image block to be encoded is obtained by using a complexity analysis model. For example, step S301 may correspond to the above-mentioned steps S212, S222 and S232, wherein the complexity information of the image block to be encoded is obtained by using a complexity analysis model.
The complexity information of the image block to be encoded comprises the complexity of the image block to be encoded as a whole and/or the complexity of each sub-image block with fine granularity formed by dividing the image block to be encoded.
The input of the complexity analysis model is the pixel value of each pixel in the image block to be encoded, or the pixel value of each pixel in a plurality of sub-image blocks with fine granularity into which the image block to be encoded is divided.
The output of the complexity analysis model is the complexity of the image block to be coded, or one or more of the sum of the complexities, the mean of the complexities and the weighted sum of the complexities of all sub image blocks with fine granularity formed by dividing the image block to be coded.
As described above, the complexity information may include one or more of: a coarse intra prediction loss of the video frame, a coarse inter prediction loss of the video frame, a predicted rate-distortion loss of the video frame, a sum of absolute differences between the image block to be encoded and the image block reconstructed based on the encoded data, a sum of squared differences between the two blocks, a mean absolute difference between the two blocks, a mean squared error between the two blocks, spatial information, and/or temporal information.
The image block to be encoded is located in the video frame image, while the image block reconstructed based on the encoded data is located in the reconstructed video frame image; the two blocks contain pixels at corresponding positions. The sum of absolute differences, sum of squared differences, mean absolute difference, and mean squared error described above may be calculated based on the differences between pixels at corresponding positions of the two image blocks.
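These pixel-difference metrics can be computed directly from the two blocks. The following is an illustrative implementation under our own naming; the patent does not fix an API, and the sample pixel values are invented.

```python
import numpy as np

# Illustrative computation of the pixel-difference metrics between a
# source block and the block reconstructed from its encoded data.

def block_metrics(src, rec):
    """Return (SAD, SSD, MAD, MSE) between two equally sized blocks."""
    diff = src.astype(np.int64) - rec.astype(np.int64)
    sad = int(np.abs(diff).sum())    # sum of absolute differences
    ssd = int((diff ** 2).sum())     # sum of squared differences
    mad = float(np.abs(diff).mean()) # mean absolute difference
    mse = float((diff ** 2).mean())  # mean squared error
    return sad, ssd, mad, mse

# Toy 2x2 blocks standing in for a source block and its reconstruction.
src = np.array([[10, 12], [14, 16]], dtype=np.uint8)
rec = np.array([[11, 12], [13, 18]], dtype=np.uint8)
sad, ssd, mad, mse = block_metrics(src, rec)
```

Note the cast to a signed integer type before subtracting: unsigned pixel values would otherwise wrap around on negative differences.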
Therefore, the complexity analysis model can be further designed according to the content included in the complexity information. When the complexity information includes several of the above items, the complexity analysis model may include a corresponding prediction model for each item, with corresponding pooling, activation, and output layers designed based on the outputs of the prediction models, for example to minimize a weighted sum over the items. The present disclosure is not limited in this respect.
The complexity analysis model may be trained using historical video, or in real-time during the encoding process for a particular video. Of course, the complexity analysis model may also be trained by a combination of the two.
As shown in fig. 3, in step S302, the encoding features of the encoded surrounding image blocks are obtained using an encoded data analysis model. For example, step S302 may correspond to the acquisition of the encoding features of the encoded surrounding image blocks in steps S212, S222, and S232 described above, where those features are obtained using the encoded data analysis model.
The input of the encoded data analysis model is the encoded data of an encoded image block, and its output is the encoding features of that image block.
As described above, the encoding features may include one or more of: the coding mode, the coding division depth, the rate-distortion loss, the sum of absolute differences between the image block to be encoded and the image block reconstructed based on the encoded data, the sum of squared differences between the two blocks, the mean absolute difference between the two blocks, and the mean squared error between the two blocks. The encoded data is the code stream formed by encoding the image block to be encoded.
Therefore, the encoded data analysis model can be further designed according to the content included in the encoding features. When the encoding features include several of the above items, the encoded data analysis model may include a corresponding computational model for each item, with corresponding pooling, activation, and output layers designed based on the outputs of the computational models, for example to minimize a weighted sum over the items. The present disclosure is not limited in this respect.
The encoded data analysis model may be trained using historical video, or in real time during the encoding process for a particular video. Of course, the encoded data analysis model may also be trained by a combination of the two.
In step S303, based on the coding features of the coded surrounding image blocks and the complexity information of the image blocks to be coded, a partitioning method analysis model is used to determine the probability that the partitioning manner of the image blocks to be coded is a coarse-grained partitioning manner or the probability that the partitioning manner is a fine-grained partitioning manner.
In step S304, a division manner is determined based on the output of the division method analysis model.
The input of the partitioning method analysis model is the encoding features of the encoded surrounding image blocks and the complexity information of the image block to be encoded. Its output is the probability that the division manner is a coarse-grained division manner or the probability that it is a fine-grained division manner.
When the probability that the division mode is the fine-grained division mode is greater than a certain threshold value, the division mode can be determined to be the fine-grained division mode.
When the probability that the division mode is the fine-grained division mode is less than or equal to a certain threshold, the division mode can be determined to be the coarse-grained division mode.
The threshold may be a trained value or a value preset by the encoder, which is not limited in this disclosure.
Optionally, when the probability that the partition manner is the coarse-grained partition manner is greater than a certain threshold, it may be determined that the partition manner is the coarse-grained partition manner. When the probability that the division mode is the coarse-grained division mode is less than or equal to a certain threshold, the division mode can be determined to be the fine-grained division mode. The threshold may be a trained value or a value preset by the encoder, which is not limited in this disclosure.
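The threshold rule of steps S303 and S304 can be sketched as follows. The threshold value 0.5 is an illustrative placeholder; as stated above, it may be a trained value or preset by the encoder, and the names below are our own.

```python
# Sketch of mapping the model's output probability to a division manner.
# Threshold 0.5 is an assumed placeholder, not a value from the patent.

FINE, COARSE = "fine", "coarse"

def choose_partition(p_fine, threshold=0.5):
    """Pick fine-grained division when its probability exceeds the threshold."""
    return FINE if p_fine > threshold else COARSE

assert choose_partition(0.8) == FINE
assert choose_partition(0.3) == COARSE
```

Note the boundary convention: a probability exactly equal to the threshold falls to the coarse-grained branch, matching the "less than or equal to" wording above.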
For example, steps S303 and S304 may correspond to the determination of the division manner of the image block to be encoded in steps S213, S223, and S233 described above, where the division manner of the image block to be encoded is determined using the partitioning method analysis model.
In step S305, when the division manner is a fine-grained division manner, the image block is divided and encoded in the fine-grained division manner.
Step S305 may correspond to steps S214, S224, and S234 described above for the case where the division manner is a fine-grained division manner: the image block to be encoded is divided and encoded based on the division manner to obtain the encoded data of the image block to be encoded.
In step S306, the division depth of the image block to be encoded is determined based on its encoded data. In step S307, when the division depth of the image block to be encoded is not lower than the division depth of the encoded surrounding image blocks, it is determined that the division manner does not satisfy the preset condition; when the division depth of the image block to be encoded is lower than the division depth of the encoded surrounding image blocks, it is determined that the division manner satisfies the preset condition.
Steps S306 and S307 may correspond to steps S215 and S235 above, determining whether the division manner satisfies the preset condition based on the encoded data of the image block to be encoded and the encoding features of the encoded surrounding image blocks. Here, the division manner is a fine-grained division manner.
As shown in fig. 3, in step S308, when the division manner is a coarse-grained division manner, the image block is divided and encoded in the coarse-grained division manner.
Step S308 may correspond to steps S214, S224, and S234 described above for the case where the division manner is a coarse-grained division manner: the image block to be encoded is divided and encoded based on the division manner to obtain the encoded data of the image block to be encoded.
In step S309, the division depth of the image block to be encoded is determined based on its encoded data. In step S310, when the division depth of the image block to be encoded is not lower than the division depth of the encoded surrounding image blocks, it is determined that the division manner does not satisfy the preset condition; when the division depth of the image block to be encoded is lower than the division depth of the encoded surrounding image blocks, it is determined that the division manner satisfies the preset condition.
Steps S309 and S310 may correspond to the above steps S215 and S235, and determine whether the dividing manner satisfies a preset condition based on the encoded data of the image block to be encoded and the encoding characteristics of the encoded surrounding image blocks. Wherein, the dividing mode is a coarse-grained dividing mode.
Optionally, the encoding features of the image block to be encoded may also be obtained using the encoded data analysis model. In this case, the input of the model is the encoded data of the image block to be encoded, and its output is the encoding features of that encoded data. The encoding feature here may be the division depth.
The division depth may indicate the number of iterations performed when an image block is iteratively divided from the largest-sized image block (the coarsest-grained image block) down to the size actually used for encoding.
Alternatively, when there are a plurality of encoded surrounding image blocks, the division depth of the encoded surrounding image blocks may be taken as the average or a weighted average of the division depths of those blocks.
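The depth-based preset condition of steps S306 to S310 can be sketched as follows, using the (weighted) average of the neighbours' depths described above. The function names, weights, and depth values are illustrative assumptions.

```python
# Sketch of the preset condition: stop the division-manner search only
# when the current block's division depth is lower than the (weighted)
# average depth of its encoded surrounding blocks. Names are ours.

def neighbour_depth(depths, weights=None):
    """Average or weighted average of the surrounding blocks' division depths."""
    if weights is None:
        return sum(depths) / len(depths)
    return sum(d * w for d, w in zip(depths, weights)) / sum(weights)

def condition_satisfied(block_depth, neighbour_depths, weights=None):
    """True when the search may stop (depth lower than the neighbours')."""
    return block_depth < neighbour_depth(neighbour_depths, weights)

# A block at depth 1 surrounded by neighbours at depths 2, 3, 2:
assert condition_satisfied(1, [2, 3, 2])      # 1 < 7/3, stop searching
assert not condition_satisfied(3, [2, 3, 2])  # 3 >= 7/3, keep searching
```

The "not lower than" branch (depth greater than or equal to the neighbours' average) is exactly the case where the condition fails and the search continues.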
As shown in fig. 3, in step S311, when it is determined that the division manner does not satisfy the preset condition (e.g., its division depth differs greatly from that of the encoded surrounding image blocks), the encoding process is reset. In step S312, the coarse-grained division manner and the fine-grained division manner are further compared to select the better one.
Step S311 and step S312 may correspond to step S217 and step S237 described above.
For example, when the dividing manner is a coarse-granularity dividing manner, the image block to be encoded is divided into a plurality of sub image blocks with fine granularity, and the plurality of sub image blocks are respectively encoded to obtain encoded data of the image block to be encoded.
For example, when the dividing manner is a fine-grained dividing manner, the image block to be encoded is encoded as a whole to obtain encoded data of the image block to be encoded.
Then, the encoded data of the image block to be encoded is determined based on the encoded data obtained by encoding the image block to be encoded as a whole and the encoded data obtained by encoding the plurality of sub image blocks respectively.
For example, step S312 may further include the following steps.
Firstly, determining a first rate-distortion loss of an image block to be coded according to coded data obtained by coding the image block to be coded as a whole; and determining a second rate-distortion loss of the image block to be encoded according to the encoded data obtained by respectively encoding the plurality of sub-image blocks.
And then, when the first rate distortion loss is lower than the second rate distortion loss, taking the coded data obtained by coding the image block to be coded as a whole as the coded data of the image block to be coded.
And when the first rate distortion loss is higher than the second rate distortion loss, taking the encoded data obtained by respectively encoding the plurality of sub image blocks as the encoded data of the image block to be encoded.
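The comparison in step S312 can be sketched with the conventional Lagrangian rate-distortion cost. The J = D + lambda * R form and all numeric values are assumptions for illustration; the patent only requires comparing the two rate-distortion losses.

```python
# Sketch of choosing between whole-block and per-sub-block encoding by
# rate-distortion loss. The lambda-weighted cost is a common convention,
# assumed here rather than specified by the text.

def rd_loss(distortion, bits, lam=0.1):
    return distortion + lam * bits  # classic J = D + lambda * R

def pick_encoding(whole, split, lam=0.1):
    """whole/split are (distortion, bits) pairs; return the cheaper option."""
    j_whole = rd_loss(*whole, lam)
    j_split = rd_loss(*split, lam)
    return "whole" if j_whole < j_split else "split"

# Whole-block coding: higher distortion, fewer bits; split: the reverse.
assert pick_encoding((40.0, 100), (10.0, 500)) == "whole"  # 50 vs 60
```

Increasing lambda penalizes bits more heavily and pushes the decision toward the cheaper (usually whole-block) encoding, which mirrors how encoders trade quality for rate.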
In order to enable the complexity analysis model, the encoded data analysis model and the partitioning method analysis model to more accurately determine the partitioning manner in the subsequent encoding process, so as to further improve the encoding efficiency and the encoding quality, parameters in the models may be adjusted based on the encoded data obtained by encoding the image block to be encoded as a whole and the encoded data obtained by encoding the plurality of sub-image blocks respectively in the above steps.
In step S313, the better division manner is analyzed.
For example, the encoded data analysis model described above may be used to extract the encoding features from the encoded data obtained by encoding the image block to be encoded as a whole and from the encoded data obtained by encoding the plurality of sub image blocks respectively, and to analyze these features. For instance, the difference in encoding features produced by the two division manners can be further analyzed and used as a reference when predicting the division manners of other video frames.
Then, in step S314, parameters in the above-described model are adjusted.
For example, in the case of acquiring complexity information of the image block to be encoded by using a complexity analysis model, a first rate distortion loss and a second rate distortion loss may be used to adjust parameters in the complexity analysis model. Of course, other encoding features may be used to adjust the parameters in the complexity analysis model, and the disclosure is not limited thereto.
For example, in a case where the encoding characteristics of the encoded surrounding image block are obtained using an encoded data analysis model, parameters in the encoded data analysis model are adjusted using a first rate distortion loss and a second rate distortion loss. Of course, other encoding features may be used to adjust parameters in the encoded data analysis model, and the disclosure is not limited thereto.
For example, in the case where the division manner of the image block to be encoded is determined by the division method analysis model, the parameters in the division method analysis model are adjusted using the first rate distortion loss and the second rate distortion loss. Of course, other coding features may be used to adjust the parameters in the partitioning analysis model, and the disclosure is not limited thereto.
Optionally, if it is finally determined that the encoded data obtained by respectively encoding the plurality of sub image blocks should be used as the encoded data of the image block to be encoded, the plurality of divided sub image blocks may be respectively used again as the image block to be encoded to determine the dividing manner of each sub image block. The determination of the partition manner of each sub-image block is similar to the determination of the partition manner of the image block to be encoded, and is not described herein again.
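The recursion described in this optional step, where each winning sub-block becomes a new block to be encoded, can be sketched as follows. The decision callback and the minimum block size are illustrative stand-ins for the model-driven division decision described in the text.

```python
# Hypothetical sketch of the recursive division-manner search: when split
# encoding wins, each sub image block is treated as a new block to be
# encoded and the decision is applied to it in turn. decide_split and
# MIN_SIZE are assumed placeholders for the model-based decision.

MIN_SIZE = 8  # assumed smallest coding block size

def partition(block_size, decide_split, depth=0):
    """Return the division depths chosen for each leaf block."""
    if block_size <= MIN_SIZE or not decide_split(block_size, depth):
        return [depth]  # encode this block as a whole
    leaves = []
    for _ in range(4):  # quadtree-style split into four sub-blocks
        leaves += partition(block_size // 2, decide_split, depth + 1)
    return leaves

# A toy policy that splits every block larger than 16x16:
leaves = partition(64, lambda size, depth: size > 16)
```

With this policy a 64x64 block is split twice, yielding sixteen 16x16 leaves at division depth 2, which is consistent with the depth definition given earlier (the number of iterations from the coarsest block down to the encoding size).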
Therefore, the embodiments of the present disclosure can obtain a more accurate division manner by combining the encoding features of the encoded surrounding image blocks with the complexity information of the image block to be encoded. At the same time, by comparing the encoding features of the image block to be encoded with those of the encoded surrounding image blocks, the embodiments determine whether to skip the division-manner search at the current size, further improving the encoding speed. Combining the complexity of the image block to be encoded with the encoded information of the surrounding blocks enables a fast division-manner decision and a quick exit from the division process at the current size, improving the encoding speed while affecting the coding rate and the user's subjective experience as little as possible.
Fig. 4A is an interface schematic diagram of a method 40 of processing video frame images according to an embodiment of the disclosure. Fig. 4B is a schematic flow chart diagram of a method 40 of processing a video frame image according to an embodiment of the present disclosure.
Referring to fig. 4B, a method 40 of processing a video frame image according to an embodiment of the present disclosure may include the following steps.
In step S401, acceleration coding enabling information input by a user is acquired, wherein the acceleration coding enabling information indicates: in the process of determining the dividing mode of the image block to be coded in the video frame image, the dividing mode of the image block to be coded is determined by using the coding characteristics of the image blocks around the image block to be coded and the complexity information of the image block to be coded, and the dividing mode comprises a coarse-granularity dividing mode and a fine-granularity dividing mode. Wherein surrounding image blocks of the image block to be encoded are adjacent to the image block to be encoded, and the surrounding image blocks of the image block to be encoded have been encoded.
For example, in fig. 4A, when the user clicks the accelerated-encoding control, the acceleration options are presented. If the user then selects the surrounding-block acceleration mode, in step S402 the video frame image is encoded based on the accelerated-encoding enabling information. For example, the encoder will employ the methods 21 to 23 and 30 described above to speed up the encoding process.
In addition, the methods 21 to 23 and 30 described above may also be enabled by the user entering the parameters corresponding to the relevant acceleration options in a command-line panel, so as to accelerate the encoding process.
According to still another aspect of the present disclosure, there is also provided an electronic device. Fig. 5 shows a schematic diagram of an electronic device 2000 according to an embodiment of the disclosure.
As shown in fig. 5, the electronic device 2000 may include one or more processors 2010, and one or more memories 2020. Wherein the memory 2020 has stored therein computer readable code, which when executed by the one or more processors 2010 may perform a method as described above.
The processor in the disclosed embodiments may be an integrated circuit chip having signal processing capabilities. The processor may be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, or discrete hardware components, and can implement or perform the various methods, steps, and logic blocks disclosed in the embodiments of the present disclosure. The general-purpose processor may be a microprocessor or any conventional processor, and may be of the X86 architecture or the ARM architecture.
In general, the various example embodiments of this disclosure may be implemented in hardware or special purpose circuits, software, firmware, logic or any combination thereof. Certain aspects may be implemented in hardware, while other aspects may be implemented in firmware or software which may be executed by a controller, microprocessor or other computing device. While aspects of embodiments of the disclosure have been illustrated or described as block diagrams, flow charts, or using some other pictorial representation, it is well understood that the blocks, apparatus, systems, techniques or methods described herein may be implemented in, as non-limiting examples, hardware, software, firmware, special purpose circuits or logic, general purpose hardware or controller or other computing devices, or some combination thereof.
For example, a method or apparatus in accordance with embodiments of the present disclosure may also be implemented by way of the architecture of computing device 3000 shown in fig. 6. As shown in fig. 6, computing device 3000 may include a bus 3010, one or more CPUs 3020, a read-only memory (ROM) 3030, a random access memory (RAM) 3040, a communication port 3050 connected to a network, input/output components 3060, a hard disk 3070, and the like. A storage device in the computing device 3000, such as the ROM 3030 or the hard disk 3070, may store various data or files used in the processing and/or communication of the method for processing a video frame image provided by the present disclosure, as well as program instructions executed by the CPU. Computing device 3000 can also include a user interface 3080. Of course, the architecture shown in fig. 6 is merely exemplary, and one or more components of the computing device shown in fig. 6 may be omitted as desired when implementing different devices.
According to yet another aspect of the present disclosure, there is also provided a computer-readable storage medium. Fig. 7 shows a schematic diagram 4000 of a storage medium according to the present disclosure.
As shown in fig. 7, the computer storage medium 4020 has computer-readable instructions 4010 stored thereon. The computer-readable instructions 4010, when executed by a processor, can perform methods according to embodiments of the present disclosure described with reference to the above figures. The computer-readable storage medium in embodiments of the present disclosure may be either volatile memory or nonvolatile memory, or may include both volatile and nonvolatile memory. The non-volatile memory may be read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), or flash memory. The volatile memory may be random access memory (RAM), which acts as an external cache. By way of example and not limitation, many forms of RAM are available, such as static random access memory (SRAM), dynamic random access memory (DRAM), synchronous dynamic random access memory (SDRAM), double data rate synchronous dynamic random access memory (DDR SDRAM), enhanced synchronous dynamic random access memory (ESDRAM), synchronous link dynamic random access memory (SLDRAM), and direct Rambus random access memory (DR RAM). It should be noted that the memories of the methods described herein are intended to comprise, without being limited to, these and any other suitable types of memory.
Embodiments of the present disclosure also provide a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The computer instructions are read by a processor of a computer device from a computer-readable storage medium, and the computer instructions are executed by the processor to cause the computer device to perform a method according to an embodiment of the disclosure.
It is to be noted that the flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The exemplary embodiments of the present disclosure, which are described in detail above, are merely illustrative, and not restrictive. It will be appreciated by those skilled in the art that various modifications and combinations of these embodiments or features thereof may be made without departing from the principles and spirit of the disclosure, and that such modifications are intended to be within the scope of the disclosure.

Claims (15)

1. A method of processing a video frame image, comprising:
acquiring an image block to be encoded from the video frame image, wherein at least one surrounding image block adjacent to the image block to be encoded has been encoded;
acquiring coding characteristics of coded surrounding image blocks and complexity information of the image blocks to be coded;
determining a dividing mode of the image block to be coded based on the coding characteristics of the coded surrounding image block and the complexity information of the image block to be coded, wherein the dividing mode comprises a coarse-grained dividing mode and a fine-grained dividing mode;
dividing and encoding the image block to be encoded based on the dividing mode to obtain encoded data of the image block to be encoded;
determining whether the division mode meets a preset condition or not based on the coded data of the image block to be coded and the coded features of the coded surrounding image blocks;
stopping searching the division mode of the image block to be coded under the condition that the division mode meets a preset condition;
and under the condition that the dividing mode does not meet the preset condition, continuing searching the dividing mode of the image block to be coded.
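The search flow recited in claim 1 can be sketched as follows. This is an illustrative reading of the claim only: the function names, the stand-in decision rule in `choose_partition`, and the `encode`/`satisfies` callbacks are all hypothetical, since the claim does not prescribe a concrete implementation.

```python
# Illustrative sketch of the partition-mode search of claim 1.
COARSE, FINE = "coarse", "fine"

def choose_partition(neighbor_depths, complexity, threshold=0.0):
    # Stand-in decision rule: pick an initial division mode from the encoded
    # neighbours' division depths and the current block's complexity.
    avg_depth = sum(neighbor_depths) / len(neighbor_depths)
    return FINE if complexity - avg_depth > threshold else COARSE

def search_partition(block, neighbor_depths, complexity, encode, satisfies):
    # Encode under the chosen mode; stop early if the preset condition holds,
    # otherwise continue the search with the other mode.
    mode = choose_partition(neighbor_depths, complexity)
    data = encode(block, mode)
    if satisfies(data, neighbor_depths, mode):
        return mode, data
    other = FINE if mode == COARSE else COARSE
    return other, encode(block, other)
```

With stub `encode`/`satisfies` callbacks, a low-complexity block next to deeply divided neighbours starts (and, if the condition holds, stops) in the coarse-granularity mode.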
2. The method according to claim 1, wherein the dividing and encoding the image block to be encoded based on the division manner to obtain the encoded data of the image block to be encoded further comprises:
under the condition that the dividing mode is a coarse-granularity dividing mode, the image block to be coded is coded as a whole to obtain coded data of the image block to be coded;
and under the condition that the division mode is a fine-grained division mode, dividing the image block to be encoded into a plurality of sub image blocks with fine granularity, and respectively encoding the plurality of sub image blocks to acquire encoded data of the image block to be encoded.
3. The method according to claim 1, wherein continuing to search for the division manner of the image block to be encoded in the case that the division manner does not satisfy a preset condition further comprises:
under the condition that the dividing mode is a coarse-granularity dividing mode, dividing the image block to be encoded into a plurality of sub image blocks with fine granularity, and respectively encoding the plurality of sub image blocks to acquire encoded data of the image block to be encoded;
under the condition that the dividing mode is a fine-grained dividing mode, the image block to be coded is coded as a whole to obtain coded data of the image block to be coded;
and determining the coded data of the image block to be coded based on the coded data obtained by coding the image block to be coded as a whole and the coded data obtained by coding the plurality of sub image blocks respectively.
4. The method according to claim 2, wherein, in case that the partition manner is a coarse-granularity partition manner, the determining whether the partition manner satisfies a preset condition based on the encoded data of the image block to be encoded and the encoded features of the encoded surrounding image blocks further comprises:
determining a division depth of the encoded surrounding image block based on the encoding characteristics of the encoded surrounding image block;
determining the division depth of the image block to be coded based on the coded data of the image block to be coded;
determining that the dividing mode does not meet a preset condition under the condition that the dividing depth of the image block to be coded is lower than the dividing depth of the coded peripheral image block;
and under the condition that the division depth of the image block to be coded is not lower than the division depth of the coded peripheral image block, determining that the division mode meets a preset condition.
5. The method according to claim 2, wherein in a case that the partition manner is a fine-grained partition manner, the determining whether the partition manner satisfies a preset condition based on the encoded data of the image block to be encoded and the encoded characteristics of the encoded surrounding image blocks further comprises:
determining a division depth of the encoded surrounding image block based on the encoding characteristics of the encoded surrounding image block;
determining the division depth of the image block to be coded based on the coded data of the image block to be coded;
under the condition that the division depth of the image block to be coded is not lower than the division depth of the coded peripheral image block, determining that the division mode does not meet a preset condition;
and under the condition that the division depth of the image block to be coded is lower than the division depth of the coded peripheral image block, determining that the division mode meets a preset condition.
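The stopping conditions of claims 4 and 5 are mirror images of a single depth comparison. A minimal sketch (the function name and the single aggregate neighbour depth are illustrative choices, not fixed by the claims):

```python
def partition_search_done(mode, block_depth, neighbor_depth):
    # Claim 4: a coarse-granularity encoding may stop once the block's
    # division depth is not lower than the encoded neighbour's depth.
    # Claim 5: a fine-grained encoding stops while it is still shallower.
    if mode == "coarse":
        return block_depth >= neighbor_depth
    return block_depth < neighbor_depth
```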
6. The method according to claim 1, wherein the determining the partition of the image block to be encoded based on the encoding characteristics of the encoded surrounding image blocks and the complexity information of the image block to be encoded further comprises: determining the division mode of the image block to be coded based on the coding characteristics of the coded surrounding image block and the complexity information of the image block to be coded by utilizing a division method analysis model, wherein,
the input of the analysis model of the division method is as follows: the encoding characteristics of the encoded surrounding image blocks and the complexity information of the image blocks to be encoded;
the output of the partitioning method analysis model is: the division mode is the probability of a coarse-grained division mode or the probability of a fine-grained division mode.
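Claim 6 fixes only the model's inputs and outputs, not its form. As one plausible stand-in, a logistic model over the neighbours' coding features and the block's complexity yields the required probability; everything below (names, weights, the logistic form itself) is an assumption for illustration:

```python
import math

def division_mode_probability(neighbor_features, complexity, weights, bias=0.0):
    # Hypothetical "division method analysis model": maps the encoded
    # neighbours' coding features plus the block's complexity to the
    # probability of choosing the fine-grained division mode.
    inputs = list(neighbor_features) + [complexity]
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + math.exp(-z))  # P(fine); P(coarse) = 1 - P(fine)
```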
7. The method of claim 1, wherein,
the complexity information comprises one or more of: coarse intra prediction loss of a video frame, coarse inter prediction loss of a video frame, prediction rate distortion loss of a video frame, sum of absolute prediction errors between an image block to be encoded and an image block reconstructed based on the encoded data, sum of square prediction differences between an image block to be encoded and an image block reconstructed based on the encoded data, average absolute prediction difference between an image block to be encoded and an image block reconstructed based on the encoded data, average square prediction error between an image block to be encoded and an image block reconstructed based on the encoded data, spatial information, and/or temporal information;
the coding features include one or more of: an encoding mode, an encoding division depth, a rate-distortion loss, a sum of absolute errors between an image block to be encoded and an image block reconstructed based on encoded data, a sum of squares of differences between an image block to be encoded and an image block reconstructed based on encoded data, an average absolute difference between an image block to be encoded and an image block reconstructed based on encoded data, an average square error between an image block to be encoded and an image block reconstructed based on encoded data;
the coded data is a code stream formed by coding the image block to be coded.
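The block-difference measures listed in claim 7 have standard definitions; a sketch using those conventional formulas (the dictionary keys and flat pixel lists are illustrative):

```python
def block_difference_metrics(original, reconstructed):
    # Standard definitions of the per-block measures in claim 7: sum of
    # absolute differences (SAD), sum of squared differences (SSD), mean
    # absolute difference, and mean squared error between the block to be
    # encoded and the block reconstructed from its encoded data.
    diffs = [o - r for o, r in zip(original, reconstructed)]
    n = len(diffs)
    sad = sum(abs(d) for d in diffs)
    ssd = sum(d * d for d in diffs)
    return {"sad": sad, "ssd": ssd, "mad": sad / n, "mse": ssd / n}
```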
8. The method of claim 2, wherein the determining the encoded data of the image block to be encoded based on the encoded data obtained by encoding the image block to be encoded as a whole and the encoded data obtained by encoding the plurality of sub-image blocks respectively further comprises:
determining a first rate-distortion loss of the image block to be encoded according to encoded data obtained by encoding the image block to be encoded as a whole;
determining a second rate distortion loss of the image block to be encoded according to the encoded data obtained by respectively encoding the plurality of sub-image blocks;
when the first rate distortion loss is lower than the second rate distortion loss, taking the coded data obtained by coding the image block to be coded as a whole as the coded data of the image block to be coded;
and when the first rate distortion loss is higher than the second rate distortion loss, taking the coded data obtained by respectively coding the plurality of sub image blocks as the coded data of the image block to be coded.
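The comparison in claim 8 amounts to keeping whichever encoding has the lower rate-distortion loss. The claims do not fix a loss formula; the conventional cost J = D + λ·R is assumed below, and the `(distortion, rate, data)` triple layout is illustrative:

```python
def rd_loss(distortion, rate, lam):
    # Conventional rate-distortion cost J = D + lambda * R (assumed; the
    # claims only speak of comparing rate-distortion losses).
    return distortion + lam * rate

def select_encoding(whole, split, lam):
    # Claim 8: keep the encoded data of the whole block if its RD loss is
    # lower, otherwise the encoded data of the fine-grained sub-blocks.
    j_whole = rd_loss(whole[0], whole[1], lam)
    j_split = rd_loss(split[0], split[1], lam)
    return whole[2] if j_whole < j_split else split[2]
```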
9. The method of claim 8, further comprising one or more of:
under the condition that a complexity analysis model is utilized to obtain complexity information of the image block to be coded, adjusting parameters in the complexity analysis model by using a first rate distortion loss and a second rate distortion loss;
under the condition that the coding characteristics of the coded surrounding image blocks are obtained by using a coding data analysis model, adjusting parameters in the coding data analysis model by using a first rate distortion loss and a second rate distortion loss; or
In the case that the division method analysis model is used to determine the division mode of the image block to be encoded, parameters in the division method analysis model are adjusted by using a first rate distortion loss and a second rate distortion loss.
10. The method according to any of claims 1 to 9, wherein said obtaining complexity information of said image block to be encoded further comprises: obtaining complexity information of the image block to be coded by using a complexity analysis model, wherein,
the complexity information of the image block to be coded comprises the complexity of the image block to be coded as a whole and/or the complexity of each sub image block with fine granularity formed by dividing the image block to be coded;
the input of the complexity analysis model is the pixel value of each pixel in the image block to be coded or the pixel value of each pixel in a plurality of sub-image blocks with fine granularity into which the image block to be coded is divided;
the output of the complexity analysis model is the complexity of the image block to be coded, or one or more of the sum of the complexities, the mean of the complexities and the weighted sum of the complexities of all sub image blocks with fine granularity formed by dividing the image block to be coded.
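The output alternatives of claim 10 — sum, mean, or weighted sum of the sub-blocks' complexities — reduce to simple aggregation. A sketch (the weights and return layout are illustrative; the patent does not specify how weights are chosen):

```python
def aggregate_complexity(sub_block_complexities, weights=None):
    # Claim 10: the complexity model may report the sum, the mean, or a
    # weighted sum of the fine-grained sub-blocks' complexities.
    total = sum(sub_block_complexities)
    mean = total / len(sub_block_complexities)
    if weights is None:
        weights = [1.0] * len(sub_block_complexities)
    weighted = sum(w * c for w, c in zip(weights, sub_block_complexities))
    return {"sum": total, "mean": mean, "weighted_sum": weighted}
```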
11. The method according to any of claims 1 to 9, wherein said obtaining the coding characteristics of the coded surrounding image blocks further comprises: obtaining the coding characteristics of the coded surrounding image blocks by using a coding data analysis model, wherein,
the input of the coding data analysis model is coding data of the coded image block;
the output of the encoded data analysis model is the encoding characteristics of the encoded image block.
12. A method of processing a video frame image, comprising:
acquiring accelerated encoding enabling information input by a user, wherein the accelerated encoding enabling information indicates: in the process of determining the dividing mode of the image blocks to be coded in the video frame image, determining the dividing mode of the image blocks to be coded by using the coding features of the surrounding image blocks of the image blocks to be coded and the complexity information of the image blocks to be coded, wherein the dividing mode comprises a coarse-granularity dividing mode and a fine-granularity dividing mode, the surrounding image blocks of the image blocks to be coded are adjacent to the image blocks to be coded, and the surrounding image blocks of the image blocks to be coded have already been coded;
and encoding the video frame image based on the accelerated encoding enabling information.
13. A method of processing a video frame image, comprising:
acquiring an image block to be encoded from the video frame image, wherein at least one surrounding image block adjacent to the image block to be encoded has been encoded;
acquiring coding characteristics of coded surrounding image blocks and complexity information of the image blocks to be coded;
determining a dividing mode of an image block to be coded based on the coding characteristics of the coded surrounding image blocks and the complexity information of the image block to be coded;
under the condition that the dividing mode is a coarse-grained dividing mode, the image block to be coded is coded as a whole to obtain coded data of the image block to be coded;
and under the condition that the division mode is a fine-grained division mode, dividing the image block to be encoded into a plurality of sub image blocks with fine granularity, and respectively encoding the plurality of sub image blocks to acquire encoded data of the image block to be encoded.
14. A method of processing a video frame image, comprising:
acquiring an image block to be encoded from the video frame image, wherein at least one surrounding image block adjacent to the image block to be encoded has been encoded;
acquiring complexity information of the image block to be coded;
determining a dividing mode of the image block to be encoded based on the complexity information of the image block to be encoded, wherein the dividing mode is a coarse-grained dividing mode or a fine-grained dividing mode;
dividing and coding an image block to be coded based on a dividing mode to obtain coded data of the image block to be coded;
acquiring coding features of coded surrounding image blocks, and determining whether a division mode meets a preset condition or not based on the coded data of the image block to be coded and the coding features of the coded surrounding image blocks;
stopping searching the division mode of the image block to be coded under the condition that the division mode meets the preset condition;
and under the condition that the dividing mode does not meet the preset condition, continuing searching the dividing mode of the image block to be coded.
15. An electronic device, comprising:
one or more processors; and
one or more memories, wherein the memories have computer-readable code stored therein which, when executed by the one or more processors, performs the method of any one of claims 1-14.
CN202110016821.4A 2021-01-07 2021-01-07 Method for processing video frame image and electronic equipment Pending CN114745551A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110016821.4A CN114745551A (en) 2021-01-07 2021-01-07 Method for processing video frame image and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110016821.4A CN114745551A (en) 2021-01-07 2021-01-07 Method for processing video frame image and electronic equipment

Publications (1)

Publication Number Publication Date
CN114745551A true CN114745551A (en) 2022-07-12

Family

ID=82273952

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110016821.4A Pending CN114745551A (en) 2021-01-07 2021-01-07 Method for processing video frame image and electronic equipment

Country Status (1)

Country Link
CN (1) CN114745551A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2024077767A1 (en) * 2022-10-14 2024-04-18 北京大学深圳研究生院 Learning model-oriented coding decision processing method and apparatus, and device
CN116567286A (en) * 2023-07-10 2023-08-08 武汉幻忆信息科技有限公司 Online live video processing method and system based on artificial intelligence
CN116567286B (en) * 2023-07-10 2023-09-22 武汉幻忆信息科技有限公司 Online live video processing method and system based on artificial intelligence

Similar Documents

Publication Publication Date Title
US20210258579A1 (en) Method and device for encoding or decoding image
KR100974177B1 (en) Method and apparatus for using random field models to improve picture and video compression and frame rate up conversion
CN111801945A (en) Hybrid motion compensated neural network with side information based video coding
US20100166073A1 (en) Multiple-Candidate Motion Estimation With Advanced Spatial Filtering of Differential Motion Vectors
US20180115787A1 (en) Method for encoding and decoding video signal, and apparatus therefor
CN112102212B (en) Video restoration method, device, equipment and storage medium
CN112465698A (en) Image processing method and device
CN109587491A (en) A kind of intra-frame prediction method, device and storage medium
KR102138650B1 (en) Systems and methods for processing a block of a digital image
CN114745551A (en) Method for processing video frame image and electronic equipment
CN115486068A (en) Method and apparatus for inter-frame prediction based on deep neural network in video coding
US20190124347A1 (en) Video encoding
US10271060B2 (en) Methods and devices for generating, encoding or decoding images with a first dynamic range, and corresponding computer program products and computer-readable medium
US11641470B2 (en) Planar prediction mode for visual media encoding and decoding
US10893274B2 (en) Method for processing video signal on basis of arbitrary partition transform
Afsana et al. Efficient scalable uhd/360-video coding by exploiting common information with cuboid-based partitioning
US11206394B2 (en) Apparatus and method for coding an image
CN114071161B (en) Image encoding method, image decoding method and related devices
CN114745541A (en) Method for processing video frame image and electronic equipment
CN115209147A (en) Camera video transmission bandwidth optimization method, device, equipment and storage medium
CN111901595B (en) Video coding method, device and medium based on deep neural network
US9838713B1 (en) Method for fast transform coding based on perceptual quality and apparatus for the same
CN114071162A (en) Image encoding method, image decoding method and related device
CN111885378B (en) Multimedia data encoding method, apparatus, device and medium
US11924437B2 (en) Variable framerate encoding using content-aware framerate prediction for high framerate videos

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination