WO2022095871A1 - Video processing method, video processing apparatus, smart device, and storage medium - Google Patents

Video processing method, video processing apparatus, smart device, and storage medium

Info

Publication number
WO2022095871A1
Authority
WO
WIPO (PCT)
Prior art keywords
data block
sub
target data
encoding
target
Prior art date
Application number
PCT/CN2021/128311
Other languages
English (en)
French (fr)
Inventor
郑羽珊
Original Assignee
腾讯科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 腾讯科技(深圳)有限公司
Publication of WO2022095871A1
Priority to US17/957,071 (published as US20230023369A1)

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10: using adaptive coding
    • H04N19/102: characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/103: Selection of coding mode or of prediction mode
    • H04N19/119: Adaptive subdivision aspects, e.g. subdivision of a picture into rectangular or non-rectangular coding blocks
    • H04N19/134: characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/136: Incoming video signal characteristics or properties
    • H04N19/14: Coding unit complexity, e.g. amount of activity or edge presence estimation
    • H04N19/154: Measured or subjectively estimated visual quality after decoding, e.g. measurement of distortion
    • H04N19/157: Assigned coding mode, i.e. the coding mode being predefined or preselected to be further used for selection of another element or parameter
    • H04N19/169: characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/17: the coding unit being an image region, e.g. an object
    • H04N19/172: the image region being a picture, frame or field
    • H04N19/176: the image region being a block, e.g. a macroblock
    • H04N19/46: Embedding additional information in the video signal during the compression process
    • H04N19/463: by compressing encoding parameters before transmission
    • H04N21/00: Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20: Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23: Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/234: Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
    • H04N21/2343: involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements
    • H04N7/00: Television systems
    • H04N7/14: Systems for two-way working

Definitions

  • the present application relates to the field of computer technology, and in particular, to a video processing method, a video processing apparatus, a smart device, and a computer-readable storage medium.
  • video coding technologies are widely used in scenarios such as video sessions and video-on-demand.
  • for example, video coding technology is used to process the session video involved in a video session scenario, and to process the on-demand video involved in a video-on-demand scenario.
  • Video coding technology refers to the technology of compressing and coding video according to the coding mode. Compressing and encoding the video can effectively save the storage space of the video and improve the transmission efficiency of the video.
  • Embodiments of the present application provide a video processing method, apparatus, device, and storage medium, which can more accurately select an appropriate encoding mode to encode video frames.
  • an embodiment of the present application provides a video processing method, which is executed by a smart device, and the video processing method includes:
  • acquiring a target video frame from a video to be encoded, and determining a target data block to be encoded from the target video frame;
  • performing scene complexity analysis on the target data block to obtain data block index information;
  • dividing the target data block into N sub-data blocks, and performing scene complexity analysis on each sub-data block separately to obtain sub-block index information, where N is an integer greater than or equal to 2;
  • determining an encoding mode for the target data block according to the data block index information and the sub-block index information; and
  • encoding the target data block according to the determined encoding mode.
  • an embodiment of the present application provides a video processing apparatus, and the video processing apparatus includes:
  • an acquisition unit, configured to acquire a target video frame from the video to be encoded, and to determine a target data block to be encoded from the target video frame;
  • a processing unit, configured to perform scene complexity analysis on the target data block to obtain data block index information; divide the target data block into N sub-data blocks and perform scene complexity analysis on each sub-data block separately to obtain sub-block index information, where N is an integer greater than or equal to 2; determine an encoding mode for the target data block according to the data block index information and the sub-block index information; and encode the target data block according to the determined encoding mode.
  • an embodiment of the present application provides a smart device, and the smart device includes:
  • a processor, adapted to run a computer program; and
  • a memory where a computer program is stored in the memory, and the computer program is loaded and executed by the processor to implement the above-mentioned video processing method.
  • an embodiment of the present application provides a computer-readable storage medium, where a computer program is stored in the computer-readable storage medium, and when the computer program is read and executed by a processor of a computer device, the above-mentioned video processing method is implemented.
  • an embodiment of the present application provides a computer program product or computer program, where the computer program product or computer program includes computer instructions, and the computer instructions are stored in a computer-readable storage medium.
  • the processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions to implement the above-mentioned video processing method.
  • FIG. 1a is a schematic diagram of a recursive division process of a video frame provided by an embodiment of the present application.
  • FIG. 1b is a schematic diagram of a video frame division result provided by an embodiment of the present application.
  • FIG. 2 is a schematic structural diagram of an encoder provided by an embodiment of the present application.
  • FIG. 3 is a schematic diagram of an associated data block provided by an embodiment of the present application.
  • FIG. 4 is a schematic structural diagram of a video processing system provided by an embodiment of the present application.
  • FIG. 5 is a schematic flowchart of a video processing method provided by an embodiment of the present application.
  • FIG. 6 is a schematic flowchart of another video processing method provided by an embodiment of the present application.
  • FIG. 7 is a schematic flowchart of another video processing method provided by an embodiment of the present application.
  • FIG. 8 is a schematic structural diagram of a video processing apparatus provided by an embodiment of the present application.
  • FIG. 9 is a schematic structural diagram of a smart device provided by an embodiment of the present application.
  • Cloud Technology refers to a hosting technology that unifies a series of resources such as hardware, software, and network in a wide area network or a local area network to realize the calculation, storage, processing and sharing of data.
  • Cloud technology is a general term for network technology, information technology, integration technology, management platform technology, and application technology based on the cloud computing business model; cloud computing technology will become an important support. Background services of technical network systems require large amounts of computing and storage resources, for example video websites, picture websites, and other portal websites.
  • each item may have its own identification mark, which needs to be transmitted to a back-end system for logical processing; data of different levels will be processed separately, and all kinds of industry data require strong back-end system support, which can only be achieved through cloud computing.
  • Cloud computing is a computing model that distributes computing tasks on a resource pool composed of a large number of computers, enabling various application systems to obtain computing power, storage space and information services as needed.
  • the network that provides the resources is called the “cloud”.
  • the resources in the "cloud” are infinitely scalable from the point of view of the user, and can be acquired at any time, used on demand, expanded at any time, and paid for use.
  • a cloud computing resource pool (referred to as a cloud platform), generally called an IaaS (Infrastructure as a Service) platform, will be established, and various types of virtual resources will be deployed in the resource pool for external customers to choose and use.
  • the cloud computing resource pool mainly includes: computing devices (which are virtualized machines, including operating systems), storage devices, and network devices.
  • the PaaS (Platform as a Service) layer can be deployed on the IaaS layer
  • the SaaS (Software as a Service) layer can be deployed on the PaaS layer, or SaaS can be deployed directly on IaaS.
  • PaaS is a platform on which software (eg, databases, web containers, etc.) runs.
  • SaaS is a variety of business software (such as web portals, SMS group senders, etc.).
  • SaaS and PaaS are upper layers relative to IaaS.
  • Cloud computing also refers to the delivery and use mode of IT (Information Technology) infrastructure, i.e., obtaining the required resources through the network in an on-demand and easily scalable manner.
  • Cloud computing in a broad sense refers to the delivery and use mode of services, and refers to obtaining required services in an on-demand and easily scalable manner through the network.
  • Such services can be IT and software, Internet-related, or other services.
  • Cloud computing is the product of the development and integration of traditional computer and network technologies such as grid computing (Grid Computing), distributed computing (Distributed Computing), parallel computing (Parallel Computing), utility computing (Utility Computing), network storage (Network Storage Technologies), virtualization (Virtualization), and load balancing (Load Balance).
  • Cloud computing can be used in the field of cloud conference.
  • Cloud conference is an efficient, convenient, and low-cost conference form based on cloud computing technology. Users only need to perform simple, easy-to-use operations through an Internet interface to quickly and efficiently share voice, data files, and video with teams and customers around the world, while complex technologies such as data transmission and processing in the conference are handled by the cloud conference service provider on the user's behalf.
  • domestic cloud conferences mainly focus on the service content of the SaaS model, including telephone, network, video and other service forms.
  • Video conferences based on cloud computing are called cloud conferences. In the era of cloud conferencing, data transmission, processing, and storage are all handled by the computer resources of video conferencing manufacturers. Users do not need to purchase expensive hardware and install cumbersome software.
  • the cloud conference system supports multi-server dynamic cluster deployment and provides multiple high-performance servers, which greatly improves the stability, security and availability of conferences.
  • video conferencing has been welcomed by many users because it can greatly improve communication efficiency, continuously reduce communication costs, and upgrade internal management; it has been widely used in government, military, transportation, finance, operators, education, enterprises, and other fields. There is no doubt that after video conferencing adopts cloud computing, it will be more attractive in terms of convenience, speed, and ease of use, which will surely stimulate a new upsurge in video conferencing applications.
  • the embodiments of the present application relate to a video (Video): a video is a sequence of consecutive video frames. Due to the visual persistence effect of the human eye, when the video frame sequence is played at a certain rate, what we see is a video with continuous action. Because of the extremely high similarity between consecutive video frames, there is a large amount of redundant information within each video frame and between consecutive video frames. Therefore, before a video is stored or transmitted, it is often necessary to use video encoding (Video Encoding) technology to encode the video and remove redundant information in dimensions such as space and time, to save storage space and improve video transmission efficiency.
  • Video coding technology may also be called video compression technology, which refers to a technology for compressing and coding each video frame in a video according to the coding mode. Specifically, in the process of encoding the video, it is necessary to recursively divide each video frame in the video into data blocks of various sizes, and then input each divided data block into the encoder for encoding. The recursive division process involved in the embodiment of the present application is described below with reference to FIG. 1a.
  • FIG. 1a is a schematic diagram of a recursive division process of a video frame provided by an embodiment of the present application. As shown in FIG. 1a, the video frame 10 is composed of several data blocks 101 of a first size. The recursive division of the video frame 10 may proceed as follows: (1) input a data block 101 of the first size directly into the encoder for encoding, that is, do not divide the data block 101 of the first size; (2) divide a data block 101 of the first size into four data blocks 102 of a second size with the same size, do not further divide the data blocks 102 of the second size, and input the data blocks 102 of the second size into the encoder for encoding; (3) further divide a data block 102 of the second size into four data blocks 103 of a third size with the same size, do not further divide the data blocks 103 of the third size, and input the data blocks 103 of the third size into the encoder for encoding; and so on, the data blocks 103 of the third size can also be further divided.
  • the video frame 10 is recursively divided into data blocks of various sizes, and the division result of the video frame 10 can be seen in FIG. 1b.
  • FIG. 1b is a schematic diagram of a video frame division result provided by an embodiment of the present application. As shown in FIG. 1b, the divided video frame 10 is composed of data blocks of three sizes, namely, a data block 101 of a first size, a data block 102 of a second size, and a data block 103 of a third size.
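  • As an illustration of the recursive division described above, the following sketch enumerates the leaf data blocks produced by quadtree division. It is a minimal sketch in Python: the block sizes and the caller-supplied should_split decision are illustrative assumptions, not the patent's actual encoder interface.

```python
# Minimal sketch of the recursive quadtree division of a video frame region.
# Block sizes and the should_split callback are illustrative assumptions.

def split_into_quadrants(x, y, size):
    """Divide a size x size block at (x, y) into four equal sub-blocks."""
    half = size // 2
    return [(x, y, half), (x + half, y, half),
            (x, y + half, half), (x + half, y + half, half)]

def divide(x, y, size, should_split, min_size=4):
    """Recursively divide a block; returns the leaf blocks to encode."""
    if size <= min_size or not should_split(x, y, size):
        return [(x, y, size)]  # this block goes to the encoder as-is
    leaves = []
    for qx, qy, qsize in split_into_quadrants(x, y, size):
        leaves.extend(divide(qx, qy, qsize, should_split, min_size))
    return leaves

# Example: divide a 64x64 block down to 16x16 leaves.
blocks = divide(0, 0, 64, should_split=lambda x, y, s: s > 16)
assert len(blocks) == 16
```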
  • when the division strategies for the multiple data blocks included in a video frame differ, the encoding mode used to encode the video frame also differs, as does the encoding speed for the video frame.
  • the embodiment of the present application provides a video processing solution, and the video processing solution provides two different data block division strategies, which are a top-down data block division strategy and a bottom-up data block division strategy. Both of these two data block division strategies can determine the optimal division method of each data block in the video frame.
  • the optimal division method can be understood as follows: after the data blocks are divided according to this division method, the coding quality of the divided data blocks is relatively high and the coding bit rate is relatively low when they are coded.
  • the encoding bit rate may refer to the amount of data transmitted per unit time (for example, 1 second) when the encoded video data stream is transmitted; the more data transmitted per unit time, the higher the encoding bit rate.
  • the above two data block division strategies are described in detail below.
  • a top-down, recursive division is performed on each data block in the video frame.
  • the top-down and recursive division may refer to the process of dividing from the maximum size of the data block in the video frame to the minimum size of the data block in the video frame until the optimal division method of the data block is found.
  • for example, the maximum size of a data block in a video frame is 64px (pixel) × 64px, and the minimum size of a data block in a video frame is 4px × 4px. The maximum size of 64px × 64px and the minimum size of 4px × 4px are examples only and do not constitute a limitation on the embodiments of this application; for example, the maximum size of a data block may also be 128px × 128px, and the minimum size may also be 8px × 8px, and so on.
  • specifically, the 64px × 64px data block can be divided into four 32px × 32px data blocks, the rate-distortion cost (RDCost, Rate-Distortion Cost) of encoding each 32px × 32px data block is calculated, and the sum of these four RDCosts (the first sum RDCost) is compared with the RDCost of encoding the 64px × 64px data block as a whole. If the first sum RDCost is greater than the RDCost of the 64px × 64px data block, the division size of the data block is determined to be 64px × 64px, that is, the 64px × 64px data block does not need to be divided; if the first sum RDCost is less than or equal to the RDCost of the 64px × 64px data block, the division size of the data block is determined to be 32px × 32px. Subsequently, each 32px × 32px data block can be further divided into four 16px × 16px data blocks for encoding, and so on, until the optimal division method of the data block is found.
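  • A hedged sketch of this top-down decision follows: the RDCost of encoding the whole block is compared against the first sum RDCost of its four children, recursing only when dividing is no worse. The rdcost function and the numpy-style 2-D block are assumptions standing in for the encoder's real rate-distortion measurement.

```python
# Hedged sketch of the top-down division decision (illustrative interface).

def top_down_partition(block, size, rdcost, min_size=4):
    """Return a nested partition tree: an int (undivided size) or a list of
    four sub-trees, one per quadrant."""
    whole_cost = rdcost(block, size)           # cost of encoding it whole
    if size <= min_size:
        return size
    half = size // 2
    children = [block[r:r + half, c:c + half]  # assumes a 2-D numpy array
                for r in (0, half) for c in (0, half)]
    first_sum = sum(rdcost(c, half) for c in children)  # "first sum RDCost"
    if first_sum > whole_cost:
        return size                            # dividing costs more: stop here
    # dividing pays off: recurse into each child (32x32, then 16x16, ...)
    return [top_down_partition(c, half, rdcost, min_size) for c in children]
```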
  • the top-down data block partitioning strategy can generally determine the optimal partitioning method more accurately.
  • for data blocks with low scene complexity, the optimal division size is relatively large; therefore, for such data blocks, the top-down data block division strategy can quickly determine the optimal division method. For data blocks with high scene complexity, the optimal division size is relatively small; therefore, using the top-down data block division strategy to determine the optimal division method for such data blocks requires a large time cost, which affects the encoding speed of the data block.
  • the optimal division size of a data block can be understood as the size at which the divided data blocks achieve better coding quality and a lower coding bit rate.
  • the optimal division size corresponding to the data block with lower scene complexity is relatively large, and the optimal division size corresponding to the data block with higher scene complexity is relatively small.
  • the scene complexity of a data block can be measured by spatial information (Spatial Information, SI) or temporal information (Temporal Information, TI). Spatial information can be used to represent the amount of spatial detail in a data block: the more elements a data block contains, the higher the value of its spatial information, and the higher the scene complexity of the data block.
  • for example, data block A contains 5 elements (a cat, a dog, a tree, a flower, and a sun) and data block B contains 2 elements (a cat and a dog); since data block A contains more elements than data block B, the spatial information value of data block A is higher than that of data block B, and the scene complexity of data block A is higher than that of data block B.
  • temporal information can be used to characterize the temporal variation of a data block: the higher the degree of motion of the target data block in the target video frame currently being processed relative to the reference data block in the reference video frame of the target video frame, the larger the value of the temporal information of the target data block, and the higher the scene complexity of the target data block.
  • the reference video frame is a video frame whose coding sequence is located before the target video frame in the video frame sequence, and the position of the target data block in the target video frame is the same as the position of the reference data block in the reference video frame.
  • for example, the target data block contains an element (such as a car element) and the reference data block also contains this element; the greater the displacement of the car element in the target data block relative to the car element in the reference data block, the higher the degree of motion of the target data block relative to the reference data block in the reference video frame, the larger the value of the temporal information of the target data block, and the higher the scene complexity of the target data block.
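  • The patent does not prescribe formulas for SI and TI; the sketch below assumes the common ITU-T P.910-style definitions (standard deviation of the Sobel gradient magnitude for SI, standard deviation of the co-located frame difference for TI), applied to 2-D grayscale numpy arrays.

```python
# Assumed SI/TI measurement in the style of ITU-T P.910; the patent itself
# does not fix a specific formula.

import numpy as np
from scipy.ndimage import sobel

def spatial_information(block):
    """Std. dev. of the Sobel gradient magnitude: more detail -> higher SI."""
    b = block.astype(float)
    return float(np.std(np.hypot(sobel(b, axis=0), sobel(b, axis=1))))

def temporal_information(block, reference_block):
    """Std. dev. of the difference against the co-located reference block:
    more motion -> higher TI."""
    return float(np.std(block.astype(float) - reference_block.astype(float)))
```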
  • bottom-up, recursive partitioning is performed on each data block in the video frame.
  • the bottom-up recursive division may refer to the process of dividing from the smallest size of the data block in the video frame to the largest size of the data block in the video frame until the optimal division method of the data block is found.
  • for example, the minimum size of a data block in a video frame is 4px × 4px, and the maximum size of a data block in a video frame is 64px × 64px. For the four 4px × 4px data blocks that make up an 8px × 8px data block, the RDCost of encoding each 4px × 4px data block is calculated, and the sum of these four RDCosts (the second sum RDCost) is compared with the RDCost of encoding the 8px × 8px data block as a whole. If the second sum RDCost is less than or equal to the RDCost of the 8px × 8px data block, the division size of the data block is determined to be 4px × 4px; if the second sum RDCost is greater than the RDCost of the 8px × 8px data block, the division size of the data block is determined to be 8px × 8px. Subsequently, the 16px × 16px data block composed of four 8px × 8px data blocks can also be encoded in the same manner, and so on, until the optimal division method of the data block is found.
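  • The bottom-up decision mirrors the top-down one; below is a sketch under the same assumptions (caller-supplied rdcost, numpy-style blocks), merging four siblings into their parent whenever the second sum RDCost of the children exceeds the RDCost of encoding the parent whole.

```python
# Mirror-image sketch of the bottom-up division decision: children are
# evaluated first, then merged upward when that is cheaper. Illustrative
# interface, not the patent's actual encoder API.

def bottom_up_partition(block, size, rdcost, min_size=4):
    """Return (partition tree, total RDCost of that partition)."""
    if size <= min_size:
        return size, rdcost(block, size)
    half = size // 2
    children = [bottom_up_partition(block[r:r + half, c:c + half],
                                    half, rdcost, min_size)
                for r in (0, half) for c in (0, half)]
    second_sum = sum(cost for _, cost in children)  # "second sum RDCost"
    whole_cost = rdcost(block, size)
    if second_sum <= whole_cost:
        return [tree for tree, _ in children], second_sum  # keep the split
    return size, whole_cost                                # merge into one block
```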
  • the bottom-up video frame division strategy can generally determine the optimal division method more accurately. For data blocks with high scene complexity, the optimal division size is relatively small; therefore, for such data blocks, the bottom-up video frame division strategy can quickly determine the optimal division method. For data blocks with low scene complexity, the optimal division size is relatively large; therefore, using the bottom-up video frame division strategy to determine the optimal division method for such data blocks requires a large time cost, which affects the encoding speed of the data block.
  • the top-down data block division strategy and the bottom-up video frame division strategy can generally determine the optimal division method of each data block in the video frame more accurately.
  • the top-down data block division strategy is more suitable for data blocks with lower scene complexity, while the bottom-up video frame division strategy is more suitable for data blocks with higher scene complexity.
  • the embodiments of the present application provide a further video processing solution: perform scene complexity analysis on the target data block to be encoded in the video frame to be encoded; divide the target data block into multiple sub-data blocks and perform scene complexity analysis on each sub-data block separately; determine the encoding mode for the target data block based on the scene complexity analysis result of the target data block and the scene complexity analysis results of the divided sub-data blocks; and then encode the target data block according to the determined encoding mode.
  • the scene complexity analysis in this embodiment of the present application may be performed by using any scene complexity analysis method in the related art, for example, a sum of squared error (SSE, Sum of Squared Error) method, a sum of absolute difference (SAD, Sum of Absolute Difference) method, etc.
  • the optimal division method mentioned in the embodiments of the present application refers to a data block division method that can improve the encoding speed to a certain extent on the premise of achieving a balance between the encoding quality and the encoding bit rate.
  • the video processing solution can formulate an encoding mode adapted to the scene complexity of the target data block according to that scene complexity, effectively improving the encoding speed of the target data block and thereby the video encoding speed. Moreover, the video processing solution is universal and suitable for any video coding scenario: it can determine encoding modes suitable for data blocks with low scene complexity as well as for data blocks with high scene complexity.
  • FIG. 2 is a schematic structural diagram of an encoder provided by an embodiment of the present application.
  • the encoder 20 includes a scene complexity analysis module 201, an adaptive partition decision module 202, and a partition early termination module 203.
  • the scene complexity analysis module 201 may be configured to perform scene complexity analysis on the target data block to be encoded in the to-be-encoded video frame to obtain data block index information.
  • the scene complexity analysis module 201 can also be used to divide the target data block into N sub-data blocks, and to perform scene complexity analysis on each of the N sub-data blocks to obtain sub-block index information, where N is an integer greater than or equal to 2.
  • the value of N is generally 4, which is not limited in this embodiment of the present application.
  • Dividing the target data block into N sub-data blocks may refer to dividing the target data block into 4 sub-data blocks of the same size.
  • the adaptive partition decision module 202 may be configured to determine an encoding mode for the target data block according to the data block index information and the sub-block index information.
  • the encoding mode may include either the first mode or the second mode; that is, the adaptive partition decision module 202 may determine, according to the data block index information and the sub-block index information, whether the encoding mode for the target data block is the first mode or the second mode.
  • the first mode refers to an encoding mode in which the target data block is divided into N sub-data blocks and each of the N sub-data blocks is encoded separately; the second mode refers to an encoding mode in which the target data block is not divided and is encoded directly.
  • FIG. 3 is a schematic diagram of associated data blocks provided by an embodiment of the present application. As shown in FIG. 3, the M associated data blocks related to the target data block 301 may include the associated data block 302 located to the left of the target data block 301, the associated data block 303 located above the target data block 301, and the associated data block 304 adjacent to the target data block 301.
  • the attempted coding order may include either the first attempted coding order or the second attempted coding order; that is, the adaptive partition decision module 202 may determine, according to the associated block index information, whether the attempted coding order for the target data block is the first attempted coding order or the second attempted coding order.
  • the first attempted coding order refers to the order in which encoding of the target data block is first attempted according to the first mode and then attempted according to the second mode; the second attempted coding order refers to the order in which encoding of the target data block is first attempted according to the second mode and then attempted according to the first mode.
  • the division early termination module 203 is configured to set a division termination condition when the adaptive division decision module 202 cannot determine an encoding mode for the target data block, and to determine the encoding mode for the target data block according to the set division termination condition. Specifically, if the adaptive division decision module 202 determines that the attempted coding order for the target data block is the first attempted coding order, the division early termination module 203 obtains the encoding information of the N sub-data blocks obtained by encoding the target data block according to the first mode; if the encoding information of the N sub-data blocks satisfies the first division termination condition (i.e., the fifth condition hereinafter), the division early termination module 203 determines that the encoding mode for the target data block is the first mode.
  • if the adaptive division decision module 202 determines that the attempted coding order for the target data block is the second attempted coding order, the division early termination module 203 obtains the encoding information of the target data block obtained by encoding the target data block according to the second mode; if the encoding information of the target data block satisfies the second division termination condition (i.e., the sixth condition hereinafter), the division early termination module 203 determines that the encoding mode for the target data block is the second mode.
  • the encoder 20 shown in FIG. 2 can formulate an encoding mode suitable for the scene complexity of the target data block according to the scene complexity of the target data block, which effectively improves the encoding speed of the target data block, thereby improving the video encoding speed.
  • the division early termination module 203 in the encoder 20 can determine the coding mode of the target data block when the coding information satisfies the division termination condition and terminate further division of the target data block, so that the coding speed and the coding bit rate of the target data block are balanced, the encoding speed of the target data block is further improved, and the video encoding efficiency is further improved.
  • the video processing solution provided by the embodiment of the present application and the specific application scenario of the encoder 20 provided by the embodiment shown in FIG. 2 will be introduced below with reference to the video processing system shown in FIG. 4 .
  • FIG. 4 is a schematic structural diagram of a video processing system provided by an embodiment of the present application.
  • the video processing system includes P terminals (for example, a first terminal 401, a second terminal 402, etc.) and a server 403, where P is an integer greater than 1.
  • each of the P terminals may be a device with a camera function, such as a smart phone, a tablet computer, a notebook computer, a desktop computer, a smart speaker, a smart watch, or a smart wearable device, but is not limited thereto.
  • any one of the P terminals can support the installation and operation of various applications, and the applications here can include, but are not limited to, social applications (such as instant messaging applications, audio conversation applications, video conversation applications, etc.).
  • the server may be an independent physical server, a server cluster or a distributed system composed of multiple physical servers, or a cloud server providing cloud computing services, which is not limited in this embodiment of the present application.
  • the P terminals and the server 403 may be directly or indirectly connected through wired communication or wireless communication, which is not limited in this application.
  • the video processing solution provided by the embodiment of the present application is described below by taking a video session scenario as an example.
  • the encoders 20 are respectively deployed in P terminals, and video processing is performed by the P terminals.
  • the server 403 is configured to transmit the target video generated by the P terminals during the video session.
  • the target video may include session videos generated by P terminals during a video session.
  • the first terminal 401 is any one of the P terminals.
  • the processing of the target video by the first terminal 401 is taken as an example for description; the process by which the other terminals among the P terminals process the target video is the same as the process by which the first terminal 401 processes it.
  • the target video is composed of a plurality of consecutive video frames, and each video frame included in the target video includes a plurality of data blocks to be encoded.
  • the encoder 20 is deployed in the first terminal 401, and the first terminal 401 calls the encoder 20 to analyze the scene complexity of each data block in each video frame included in the target video, determines the encoding mode for each data block, encodes each data block according to the determined encoding mode, and finally obtains the encoded target video.
  • after the encoding is completed, the first terminal 401 can send the encoded target video to the server 403, and the server 403 transmits it to the other terminals participating in the video session; the other terminals receive the encoded target video, parse it, and play it, so as to realize a video session in which the P terminals participate.
  • the encoder 20 is deployed in the server 403, and the server 403 performs video processing.
  • the server 403 is configured to process and transmit the target video generated by the P terminals during the video session.
  • the target video may include session videos generated by P terminals during a video session.
  • the target video may be a session video generated by the first terminal 401 in the process of participating in the video session.
  • the first terminal 401 transmits the target video to the server 403.
  • the target video is composed of multiple consecutive video frames, and each video frame contained in the target video includes multiple data blocks to be encoded.
  • the encoder 20 is deployed in the server 403, and the server 403 calls the encoder 20 to analyze the scene complexity of each data block in each video frame included in the target video, determines the encoding mode for each data block, encodes each data block according to the determined encoding mode, and finally obtains the encoded target video.
  • the server 403 transmits the encoded target video to other terminals participating in the video session; other terminals receive the encoded target video, parse and play the target video, so as to realize a video session in which P terminals participate.
  • each terminal or server participating in the video session can call the encoder to encode the target video generated during the video session, and the encoding mode used by each terminal or server for each data block included in the target video is determined based on the scene complexity of that data block, which effectively improves the encoding speed of the data blocks and thereby the encoding speed of the target video.
  • in this way, the encoding speed of the target video is greatly accelerated, thereby improving the smoothness of the target video, improving the session quality of the video session to a certain extent, optimizing the effect of the video session, and improving the user experience.
  • FIG. 5 is a schematic flowchart of a video processing method provided by an embodiment of the present application.
  • the video processing method may be executed by a smart device; the smart device may be a user terminal or a server, and the user terminal may be a device with a camera function such as a smart phone, a tablet computer, or a smart wearable device. The smart device can be, for example, any terminal or server in the video processing system shown in FIG. 4. The video processing method includes the following steps S501 to S505:
  • S501 Acquire a target video frame from a video to be encoded, and determine a target data block to be encoded from the target video frame.
  • the video to be encoded consists of a plurality of consecutive video frames, and a target video frame is obtained from the video to be encoded, and the target video frame is any video frame in the video to be encoded.
  • the target video frame contains multiple data blocks, among which there may be both encoded data blocks and data blocks to be encoded. The target data block to be encoded is determined from the target video frame, and the target data block is any data block to be encoded in the target video frame.
  • S502 Perform scene complexity analysis on the target data block to obtain data block index information.
  • the data block index information obtained by performing scene complexity analysis on the target data block may include, but is not limited to, at least one of the following: the estimated distortion parameter of the target data block, the spatial information parameter of the target data block, and the temporal information parameter of the target data block.
  • the estimated distortion (Distortion, Dist) parameter is obtained by estimating the degree of distortion in the process of intra-frame predictive coding or inter-frame predictive coding.
  • the estimated distortion parameter of the target data block can be used to measure the degree of distortion of the reconstructed target data block compared to the original target data block.
  • the original target data block refers to an unencoded target data block
  • the reconstructed target data block refers to an encoded target data block obtained by performing intra-frame prediction encoding or inter-frame prediction encoding on the target data block.
  • the spatial information parameter of the target data block may refer to the numerical value of the spatial information obtained by calculating the target data block; the time information parameter of the target data block may refer to the numerical value of the time information obtained by calculating the target data block.
  • S503 Divide the target data block into N sub-data blocks, and perform scene complexity analysis on each sub-data block to obtain sub-block index information, where N is an integer greater than or equal to 2.
  • the sub-block indicator information may include N sub-block indicator data, the i-th sub-data block is any sub-data block in the N sub-data blocks, and the i-th sub-block indicator data is any sub-block indicator data in the N sub-block indicator data.
  • the i-th sub-data block corresponds to the i-th sub-block indicator data, and the i-th sub-block indicator data is obtained by analyzing the scene complexity of the i-th sub-data block, i ⁇ [1, N].
  • the i-th sub-block indicator data may include, but is not limited to, at least one of the following: distortion estimation parameters of the i-th sub-block, spatial information parameters of the i-th sub-block, and time information parameters of the i-th sub-block.
  • the estimated distortion parameter of the i-th sub-data block can be used to measure the distortion degree of the i-th sub-data block after reconstruction compared to the original i-th sub-data block.
  • the original i-th sub-data block refers to the i-th sub-data block that has not been encoded
  • the reconstructed i-th sub-data block refers to the encoded i-th sub-data block obtained by performing intra-frame prediction encoding or inter-frame prediction encoding on the i-th sub-data block.
  • the spatial information parameter of the i-th sub-data block may refer to the value of the spatial information calculated for the i-th sub-data block; the temporal information parameter of the i-th sub-data block may refer to the value of the temporal information calculated for the i-th sub-data block.
  • S504 Determine an encoding mode for the target data block according to the data block index information and the sub-block index information.
  • the data block index information and the sub-block index information may be input into a joint statistical model; the joint statistical model calculates on the data block index information and the sub-block index information, the output value produced by this calculation is obtained, and the encoding mode for the target data block is determined according to the output value of the joint statistical model.
  • the joint statistical model can be obtained by training on the data block index information of data blocks whose encoding modes have already been determined and on the sub-block index information of the N sub-data blocks obtained by dividing those data blocks.
  • the joint statistical model may perform a weighted calculation on the data block index information and the sub-block index information to obtain the output value, wherein the weighting factors may be obtained by training on the relevant information of data blocks (and their sub-blocks) whose encoding modes have been determined.
  • if the output value of the joint statistical model satisfies the first condition, that is, the output value is greater than the first division threshold, it is determined that the encoding mode for the target data block is the first mode. The output value satisfying the first condition indicates that the correlation between the N sub-data blocks obtained by dividing the target data block is weak, so it is preferable to divide the target data block into N sub-data blocks and encode each sub-data block separately.
  • if the output value of the joint statistical model satisfies the second condition, that is, the output value is smaller than the second division threshold, it is determined that the encoding mode for the target data block is the second mode. The output value satisfying the second condition indicates that the correlation between the N sub-data blocks obtained by dividing the target data block is strong, so it is preferable not to divide the target data block but to encode it directly.
  • the first division threshold and the second division threshold may be obtained during training of the joint statistical model.
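  • A minimal sketch of this decision follows, assuming a weighted sum over the index values with trained weights and thresholds t1 > t2; the flat feature layout and names are illustrative, not the trained model itself.

```python
# Hedged sketch of the joint-statistical-model decision. Weights and the two
# division thresholds are assumed to come from offline training, as the text
# describes.

def decide_mode(block_indices, subblock_indices, weights, t1, t2):
    """block_indices: index values of the target data block (e.g. Dist, SI, TI).
    subblock_indices: one list of index values per sub-data block.
    Returns 'first', 'second', or None (third condition: fall through)."""
    features = list(block_indices) + [v for sub in subblock_indices for v in sub]
    output = sum(w * v for w, v in zip(weights, features))  # weighted sum
    if output > t1:        # first condition: sub-blocks weakly correlated
        return "first"
    if output < t2:        # second condition: sub-blocks strongly correlated
        return "second"
    return None            # third condition: consult the associated blocks
```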
  • S505 Encode the target data block according to the determined encoding mode.
  • if the determined encoding mode is the first mode, encoding the target data block according to the determined encoding mode refers to encoding the target data block according to the first mode; specifically, the target data block is divided into N sub-data blocks, and each sub-data block is input into the encoder for encoding.
  • if the determined encoding mode is the second mode, encoding the target data block according to the determined encoding mode refers to encoding the target data block according to the second mode; specifically, the target data block is directly input into the encoder for encoding.
  • in the embodiment of the present application, the scene complexity analysis results for the target data block (that is, the data block index information and the sub-block index information) are input into the joint statistical model for calculation, and the encoding mode for the target data block is determined based on the output value calculated by the joint statistical model.
  • the video processing solution provided by the embodiments of the present application is applicable to any video scenario, and can adaptively adjust the encoding mode of the target data block according to the scene complexity of the target data block to be encoded in any video, so as to determine an encoding mode adapted to the scene complexity of the target data block. This effectively improves the encoding speed of the target data block, thereby improving the video encoding speed, and achieves a good balance between the encoding speed of the video and the encoding bit rate of the video.
  • FIG. 6 is a schematic flowchart of another video processing method provided by an embodiment of the present application. The video processing method may be executed by a smart device; the smart device may be a user terminal or a server, and the user terminal may be a device with a camera function such as a smart phone, a tablet computer, or a smart wearable device. The smart device can be, for example, any terminal or server in the video processing system shown in FIG. 4. The video processing method includes the following steps S601 to S608:
  • S601 Acquire a target video frame from a video to be encoded, and determine a target data block to be encoded from the target video frame.
  • S602 Perform scene complexity analysis on the target data block to obtain data block index information.
  • S603 Divide the target data block into N sub-data blocks, and perform scene complexity analysis on each sub-data block respectively to obtain sub-block index information, where N is an integer greater than or equal to 2.
  • the execution process of step S601 in this embodiment of the present application is the same as that of step S501 in the embodiment shown in FIG. 5; the execution process of step S602 is the same as that of step S502 in the embodiment shown in FIG. 5; and the execution process of step S603 is the same as that of step S503 in the embodiment shown in FIG. 5.
  • S605 Obtain the output value calculated by the joint statistical model based on the data block index information and the sub-block index information.
  • S607 Determine an encoding mode for the target data block according to the associated block index information.
  • if the output value of the joint statistical model satisfies the third condition, that is, the output value is less than or equal to the first division threshold and greater than or equal to the second division threshold, it indicates that the correlation between the N sub-data blocks obtained by dividing the target data block is between strong and weak: there is no obvious tendency to divide the target data block into N sub-data blocks and encode each sub-data block separately, nor an obvious tendency to leave the target data block undivided and encode it directly.
  • in this case, M associated data blocks related to the target data block are determined, scene complexity analysis is performed on the M associated data blocks to obtain the associated block index information of the M associated data blocks, and the encoding mode for the target data block is determined according to the associated block index information of the M associated data blocks, where M is a positive integer.
  • the associated block index information of the M associated data blocks may include the first number, i.e., the number of associated data blocks among the M associated data blocks that were divided into multiple sub-data blocks for encoding.
  • if the first number satisfies the fourth condition, that is, the first number is greater than or equal to the first quantity threshold, it indicates that the scene complexity of the M associated data blocks related to the target data block is relatively high. In this case, one can first try to divide the target data block into N to-be-analyzed sub-data blocks (i.e., the N sub-data blocks), perform scene complexity analysis on the N to-be-analyzed sub-data blocks to obtain their to-be-analyzed sub-data block index information, and further determine the encoding mode for the target data block according to that index information.
  • the first quantity threshold may be set according to an empirical value; for example, the first quantity threshold may be set to 2.
  • for example, the three associated data blocks related to the target data block 301 are the associated data block 302, the associated data block 303, and the associated data block 304; the associated data block 302 and the associated data block 303 were each divided into multiple sub-data blocks for encoding, while the associated data block 304 was encoded directly without being divided. That is, among the three associated data blocks, the first number of associated data blocks divided into multiple sub-data blocks for encoding is 2.
  • the index information of the to-be-analyzed sub-data blocks may include the second number, i.e., the number of to-be-analyzed sub-data blocks among the N to-be-analyzed sub-data blocks that satisfy the further-division condition.
  • specifically, for each of the N to-be-analyzed sub-data blocks, the joint statistical model may be used to calculate an output value from the data block index information of that to-be-analyzed sub-data block and the sub-block index information of the N sub-data blocks obtained by further dividing it; among the N output values obtained in this way, the second number is the number of output values that satisfy the first condition.
  • the second quantity threshold may also be set according to an empirical value; for example, the second quantity threshold may be set to 3. If all four to-be-analyzed sub-data blocks satisfy the further-division condition, that is, the second number of to-be-analyzed sub-data blocks satisfying the further-division condition is 4, the second number satisfies the fifth condition, and it can be determined that the encoding mode for the target data block is the first mode.
If the first number does not satisfy the fourth condition, that is, the first number is less than the first number threshold, the scene complexity of the M associated data blocks related to the target data block is relatively low, and one may first try encoding the target data block directly, without dividing it, and obtain the encoding information of the target data block thus produced.
The encoding information of the target data block may include the encoding distortion parameter of the target data block. Note the distinction: the distortion estimation parameter of the target data block refers to an estimate of the degree of distortion incurred in the encoding process, whereas the encoding distortion parameter can be used to measure the actual distortion of the encoded target data block (that is, the reconstructed target data block) relative to the target data block before encoding (that is, the original target data block).
The encoding parameter is then calculated from the encoding distortion parameter and the quantization parameter; the calculation follows Formula 1:

    Code = Dist / QP²   (Formula 1)

where Code represents the encoding parameter, Dist represents the encoding distortion parameter of the target data block, and QP represents the quantization parameter used to quantize the target data block.
If the encoding parameter satisfies the sixth condition, that is, the encoding parameter is smaller than the third division threshold, it may be determined that the encoding mode for the target data block is the second mode. In another embodiment, if the encoding parameter does not satisfy the sixth condition, that is, the encoding parameter is greater than or equal to the third division threshold, one tries dividing the target data block into N sub-data blocks and encoding each sub-data block separately, and determines the encoding mode for the target data block according to the encoding information of the target data block obtained by encoding it directly together with the encoding information of the N sub-data blocks obtained by dividing and encoding separately. It should be noted that the third division threshold can be obtained during the training of the joint statistical model.
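A sketch of this early "do not divide" check under Formula 1 (the threshold value is a training-derived placeholder, not a number given in this application):

    def early_no_split_decision(dist: float, qp: float,
                                third_division_threshold: float) -> bool:
        """Formula 1: Code = Dist / QP**2. The sixth condition holds, and the
        second mode (encode without dividing) is chosen, when the encoding
        parameter falls below the third division threshold."""
        code = dist / (qp ** 2)
        return code < third_division_threshold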
The encoding information of the target data block may further include a first rate-distortion loss parameter of the target data block, and the encoding information of the N sub-data blocks may include a second rate-distortion loss parameter of the N sub-data blocks. The second rate-distortion loss parameter is calculated from the third rate-distortion loss parameter of each of the N sub-data blocks; for example, it may be the sum of the N third rate-distortion loss parameters.
The first rate-distortion loss parameter is calculated from the coding rate and the encoding distortion parameter under the mode in which the target data block is encoded directly; for example, it may be the ratio of the coding rate to the encoding distortion parameter. It can be used to measure the effect of encoding the target data block directly: the smaller the first rate-distortion loss parameter, the better the encoding effect. A good encoding effect for the target data block means that its coding rate is low while the distortion of the encoded target data block relative to the target data block before encoding remains low.
Similarly, the second rate-distortion loss parameter can be used to measure the effect of dividing the target data block into N sub-data blocks and encoding each sub-data block separately; the smaller it is, the better the effect. If the first rate-distortion loss parameter is greater than or equal to the second rate-distortion loss parameter, the encoding mode for the target data block is determined to be the first mode; if the first is smaller than the second, the encoding mode is determined to be the second mode.
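A sketch of this final rate-distortion comparison (parameter values hypothetical; the second loss is formed here as the sum of the sub-blocks' third losses, one option the text names):

    def choose_mode_by_rd_loss(first_rd_loss: float,
                               third_rd_losses: list) -> str:
        """Compare direct encoding against divided encoding. The second
        rate-distortion loss is the sum of the N sub-blocks' third losses."""
        second_rd_loss = sum(third_rd_losses)
        # A larger loss for direct encoding means dividing wins (first mode).
        return "first_mode" if first_rd_loss >= second_rd_loss else "second_mode"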
Step S608 in this embodiment of this application is performed in the same way as step S505 in the embodiment shown in FIG. 5; for the specific execution process, reference may be made to the description of that embodiment, which is not repeated here.
The scene complexity of the associated data blocks related to the target data block reflects, to a certain extent, the scene complexity of the target data block itself. This embodiment therefore determines the encoding mode for the target data block jointly from the scene complexity of the target data block (i.e., the output value of the joint statistical model) and the scene complexity of the related associated data blocks (i.e., the associated block indication information).
The video processing solution provided by the embodiments of this application is universal: it can adaptively adjust the encoding mode according to the scene complexity of the target data block to be encoded in any video, determining an encoding mode adapted to that scene complexity. This effectively improves the encoding speed of the target data block and thus the video encoding speed, and, without losing video coding efficiency, achieves the best balance between video encoding speed and video coding rate.
FIG. 7 is a schematic flowchart of another video processing method provided by an embodiment of this application. The method may be executed by a smart device, which may be a user terminal or a server; the user terminal may be a device with a camera function such as a smartphone, a tablet computer or a smart wearable device, and the smart device can be, for example, any terminal or server in the video processing system shown in FIG. 4. The method includes the following steps S701 to S720:
S701: Acquire a target video frame from the video to be encoded, and determine a target data block to be encoded from the target video frame.
S702: Perform scene complexity analysis on the target data block to obtain data block index information.
S703: Divide the target data block into N sub-data blocks, and perform scene complexity analysis on each sub-data block to obtain sub-block index information.
S704: Input the data block index information and the sub-block index information into the joint statistical model for calculation, obtaining an output value of the joint statistical model.
S705: Judge whether the output value satisfies the first condition. If it does, execute step S706; if not, execute step S707.
S706: If the output value satisfies the first condition, determine that the encoding mode for the target data block is the first mode. After step S706, execute step S720.
S707: If the output value does not satisfy the first condition, judge whether it satisfies the second condition. If it does, execute step S708; if not, execute step S709.
S708: If the output value satisfies the second condition, determine that the encoding mode for the target data block is the second mode. After step S708, execute step S720.
S709: If the output value does not satisfy the second condition, judge whether it satisfies the third condition. If it does, execute step S710.
S710: If the output value satisfies the third condition, determine the first number of associated data blocks, among the M associated data blocks related to the target data block, that are divided into multiple sub-data blocks for encoding.
S711: Judge whether the first number satisfies the fourth condition. If it does, execute step S712; if not, execute step S716.
S712: If the first number satisfies the fourth condition, preferentially try dividing the target data block into N sub-data blocks and encoding each sub-data block separately.
S713: Determine the second number of sub-data blocks, among the N sub-data blocks, that satisfy the further-division condition.
S714: Judge whether the second number satisfies the fifth condition. If it does, determine that the encoding mode for the target data block is the first mode and execute step S720. If not, execute step S715.
S715: Try encoding the target data block directly, without dividing it, and determine the encoding mode for the target data block according to the encoding information. Specifically, obtain the encoding information of the target data block produced by encoding it directly, obtain the encoding information of the N sub-data blocks produced by dividing the target data block and encoding each sub-data block separately, and determine the encoding mode from both. After step S715, execute step S720.
S716: If the first number does not satisfy the fourth condition, preferentially try encoding the target data block directly, without division, and determine the encoding distortion parameter of the target data block.
S717: Calculate the encoding parameter from the encoding distortion parameter and the quantization parameter.
S718: Judge whether the encoding parameter satisfies the sixth condition. If it does, determine that the encoding mode for the target data block is the second mode and execute step S720; if not, execute step S719.
S719: Try dividing the target data block into N sub-data blocks and encoding each sub-data block separately, and determine the encoding mode for the target data block according to the encoding information: as in step S715, obtain the encoding information of the directly encoded target data block and of the N separately encoded sub-data blocks, and determine the encoding mode from both. After step S719, execute step S720.
S720: Encode the target data block according to the determined encoding mode.
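Pulling steps S704 through S719 together, the following condensed sketch shows the overall decision flow; the ctx object bundles hypothetical helpers (the joint model, the partitioning, the neighbour lookup, the trial encodes) that stand in for operations described but not coded in this application:

    def decide_encoding_mode(block, ctx):
        """Condensed sketch of S704-S719; all names on ctx are hypothetical."""
        out = ctx.joint_model(ctx.analyze(block),
                              [ctx.analyze(s) for s in ctx.split(block)])
        if out > ctx.threshold1:                      # first condition  -> S706
            return "first_mode"
        if out < ctx.threshold2:                      # second condition -> S708
            return "second_mode"
        # third condition: threshold2 <= out <= threshold1             -> S710
        first_number = sum(a.was_split for a in ctx.associated(block))
        if first_number >= ctx.count_threshold1:      # fourth condition -> S712
            second_number = ctx.count_further_divisible(block)         # S713
            if second_number >= ctx.count_threshold2: # fifth condition  -> S714
                return "first_mode"
            return ctx.compare_rd_losses(block)       # S715
        # first number below threshold: try direct encoding first       -> S716
        dist, qp = ctx.encode_direct(block)           # S716/S717
        if dist / qp ** 2 < ctx.threshold3:           # sixth condition  -> S718
            return "second_mode"
        return ctx.compare_rd_losses(block)           # S719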
After the target data block has been encoded according to the determined encoding mode, another data block to be encoded, other than the target data block, can be determined from the video frame; its encoding mode is determined according to its scene complexity, and it is encoded according to the mode thus determined. The process of determining the encoding mode of this data block from its scene complexity is the same as the process of determining the encoding mode of the target data block from the target data block's scene complexity.
The scene complexity of the associated data blocks related to the target data block also reflects, to a certain extent, the scene complexity of the target data block. This embodiment therefore determines the encoding mode for the target data block jointly from the scene complexity of the target data block (i.e., the output value of the joint statistical model) and the scene complexity of the related associated data blocks (i.e., the associated block indication information).
FIG. 8 is a schematic structural diagram of a video processing apparatus provided by an embodiment of this application. The video processing apparatus 80 may be set in a smart device, which may be a smart terminal or a server; the apparatus 80 can be used to execute the corresponding steps of the video processing methods shown in FIG. 5, FIG. 6 or FIG. 7, and includes the following units:
an obtaining unit 801, configured to obtain the target video frame from the video to be encoded, and determine the target data block to be encoded from the target video frame; and
a processing unit 802, configured to perform scene complexity analysis on the target data block to obtain data block index information; divide the target data block into N sub-data blocks and perform scene complexity analysis on each sub-data block respectively to obtain sub-block index information, N being an integer greater than or equal to 2; determine the encoding mode for the target data block according to the data block index information and the sub-block index information; and encode the target data block according to the determined encoding mode.
In one embodiment, the data block index information includes any one or more of: the distortion estimation parameter of the target data block, the spatial information parameter of the target data block, and the temporal information parameter of the target data block; and the sub-block index information includes N items of sub-block index data, where the i-th item includes any one or more of: the distortion estimation parameter of the i-th sub-data block among the N sub-data blocks, the spatial information parameter of the i-th sub-data block, and the temporal information parameter of the i-th sub-data block, i ∈ [1, N].
In one embodiment, the processing unit 802 is specifically configured to: input the data block index information and the sub-block index information into the joint statistical model; obtain the output value the joint statistical model computes from them; and determine the encoding mode for the target data block according to the output value.
In one embodiment, the processing unit 802 is specifically configured to: if the output value satisfies the first condition, determine that the encoding mode is the first mode, divide the target data block into N sub-data blocks and input each sub-data block into the encoder for encoding; the output value satisfying the first condition means that the output value is greater than the first division threshold.
In one embodiment, the processing unit 802 is specifically configured to: if the output value satisfies the second condition, determine that the encoding mode is the second mode and input the target data block into the encoder for encoding; the output value satisfying the second condition means that the output value is smaller than the second division threshold.
In one embodiment, the processing unit 802 is specifically configured to: if the output value satisfies the third condition, perform scene complexity analysis on the M associated data blocks related to the target data block to obtain their associated block index information, M being a positive integer, and determine the encoding mode for the target data block from that information; the output value satisfying the third condition means that the output value is less than or equal to the first division threshold and greater than or equal to the second division threshold.
In one embodiment, the processing unit 802 is specifically configured to: acquire the associated block index information, which includes the first number of associated data blocks, among the M associated data blocks, that are divided into multiple sub-data blocks for encoding; if the first number satisfies the fourth condition, divide the target data block into N sub-data blocks to be analyzed; perform scene complexity analysis on them to determine their to-be-analyzed sub-data block index information; and determine the encoding mode for the target data block from that information; the first number satisfying the fourth condition means that the first number is greater than or equal to the first number threshold.
In one embodiment, the processing unit 802 is specifically configured to: acquire the to-be-analyzed sub-data block index information, which includes the second number of sub-data blocks to be analyzed, among the N, that satisfy the further-division condition; if the second number satisfies the fifth condition, determine that the encoding mode is the first mode; if it does not, input the target data block into the encoder for encoding and determine the encoding mode from the encoding information of the target data block thus obtained; the second number satisfying the fifth condition means that it is greater than or equal to the second number threshold, and not satisfying it means that it is smaller than that threshold.
In one embodiment, the processing unit 802 is further configured to: if the first number does not satisfy the fourth condition, input the target data block into the encoder for encoding, and determine the encoding mode from the encoding information of the target data block thus obtained; the first number not satisfying the fourth condition means that it is smaller than the first number threshold.
In one embodiment, the encoding information of the target data block includes the encoding distortion parameter of the target data block, and the processing unit 802 is specifically configured to: calculate the encoding parameter from the encoding distortion parameter and the quantization parameter used to quantize the target data block; if the encoding parameter satisfies the sixth condition, determine that the encoding mode is the second mode; if it does not, divide the target data block into N sub-data blocks, input each of them into the encoder for encoding, and determine the encoding mode from the encoding information of the target data block and of the N sub-data blocks; the encoding parameter satisfying the sixth condition means that it is smaller than the third division threshold, and not satisfying it means that it is greater than or equal to the third division threshold.
In one embodiment, the encoding information of the target data block further includes the first rate-distortion loss parameter of the target data block, and the encoding information of the N sub-data blocks includes their second rate-distortion loss parameter, calculated from the third rate-distortion loss parameter of each of the N sub-data blocks; the processing unit 802 is specifically configured to: determine that the encoding mode for the target data block is the first mode if the first rate-distortion loss parameter is greater than or equal to the second, and the second mode if the first is smaller than the second.
According to an embodiment of this application, the units of the video processing apparatus 80 shown in FIG. 8 may be separately or jointly merged into one or several other units, or one or more of them may be further subdivided into functionally smaller units; this achieves the same operations without affecting the technical effects of the embodiments of this application. The units are divided by logical function; in practical applications the function of one unit may be implemented by multiple units, or the functions of multiple units by one unit. In other embodiments of this application, the apparatus 80 may also include other units, and in practice these functions may be implemented with the assistance of, or by cooperation among, multiple units.
According to another embodiment of this application, the video processing apparatus 80 of FIG. 8 may be constructed, and the video processing method of the embodiments realized, by running a computer program (including program code) capable of executing the steps of the corresponding methods shown in FIG. 5, FIG. 6 or FIG. 7 on a general-purpose computing device, such as a computer, that includes processing elements such as a central processing unit (CPU) and storage elements such as a random access storage medium (RAM) and a read-only storage medium (ROM). The computer program may be recorded on, for example, a computer-readable storage medium, loaded through it into any terminal (for example, the first terminal 401 or the second terminal 402) or the server 403 of the video processing system shown in FIG. 4, and run there.
In the embodiments of this application, scene complexity analysis of the target data block to be encoded yields analysis results both for the whole target data block and for the multiple sub-data blocks into which it is divided. Selecting the encoding mode on the basis of these results determines an appropriate encoding mode fairly accurately for the target data block currently to be encoded, which effectively improves the encoding speed of the target data block and the video coding efficiency.
FIG. 9 is a schematic structural diagram of a smart device provided by an embodiment of this application. The smart device 90 includes at least a processor 901 and a memory 902, which may be connected through a bus or by other means. The processor 901 may be a central processing unit (CPU) and may further include a hardware chip, which can be an application-specific integrated circuit (ASIC), a programmable logic device (PLD) or the like; the PLD may be a field-programmable gate array (FPGA), generic array logic (GAL) or the like.
The memory 902 may include a volatile memory, such as a random-access memory (RAM); it may also include a non-volatile memory, such as a flash memory or a solid-state drive (SSD); and it may also include a combination of the above types of memory.
The memory 902 is used to store a computer program including computer instructions, and the processor 901 is used to execute those instructions. The processor 901 (also called a CPU, Central Processing Unit) is the computing core and control core of the smart device 90; it is adapted to implement one or more computer instructions, and specifically to load and execute one or more computer instructions so as to realize the corresponding method flow or function.
The smart device 90 may be any terminal (for example, the first terminal 401 or the second terminal 402) or the server 403 of the video processing system shown in FIG. 4. The memory 902 stores a computer program including one or more computer instructions, which are loaded and executed by the processor 901 to realize the corresponding steps of the method embodiments shown in FIG. 5, FIG. 6 or FIG. 7. In a specific implementation, the computer instructions in the memory 902 are loaded by the processor 901 and execute the following steps:
acquiring a target video frame from the video to be encoded, and determining the target data block to be encoded from the target video frame;
performing scene complexity analysis on the target data block to obtain data block index information;
dividing the target data block into N sub-data blocks, and performing scene complexity analysis on each sub-data block respectively to obtain sub-block index information, N being an integer greater than or equal to 2;
determining the encoding mode for the target data block according to the data block index information and the sub-block index information; and
encoding the target data block according to the determined encoding mode.
In one embodiment, the data block index information includes any one or more of: the distortion estimation parameter of the target data block, the spatial information parameter of the target data block, and the temporal information parameter of the target data block; and the sub-block index information includes N items of sub-block index data, the i-th item including any one or more of: the distortion estimation parameter of the i-th sub-data block among the N sub-data blocks, the spatial information parameter of the i-th sub-data block, and the temporal information parameter of the i-th sub-data block, i ∈ [1, N].
In one embodiment, when loaded by the processor 901, the computer instructions in the memory 902 specifically perform the following steps: inputting the data block index information and the sub-block index information into the joint statistical model; obtaining the output value the model computes from them; and determining the encoding mode for the target data block according to the output value.
In one embodiment, when loaded by the processor 901, the computer instructions specifically perform the following steps: if the output value satisfies the first condition, determining that the encoding mode is the first mode, dividing the target data block into N sub-data blocks and inputting each sub-data block into the encoder for encoding; the output value satisfying the first condition means that the output value is greater than the first division threshold. If the output value satisfies the second condition, the encoding mode is determined to be the second mode and the target data block is input into the encoder for encoding; the output value satisfying the second condition means that the output value is smaller than the second division threshold.
In one embodiment, when loaded by the processor 901, the computer instructions specifically perform the following steps: if the output value satisfies the third condition, performing scene complexity analysis on the M associated data blocks related to the target data block to obtain their associated block index information, M being a positive integer, and determining the encoding mode for the target data block from that information; the output value satisfying the third condition means that the output value is less than or equal to the first division threshold and greater than or equal to the second division threshold.
In one embodiment, when loaded by the processor 901, the computer instructions specifically perform the following steps: acquiring the associated block index information, which includes the first number of associated data blocks, among the M associated data blocks, that are divided into multiple sub-data blocks for encoding; if the first number satisfies the fourth condition, dividing the target data block into N sub-data blocks to be analyzed; performing scene complexity analysis on them to determine their to-be-analyzed sub-data block index information; and determining the encoding mode for the target data block from that information; the first number satisfying the fourth condition means that the first number is greater than or equal to the first number threshold.
In one embodiment, when loaded by the processor 901, the computer instructions specifically perform the following steps: acquiring the to-be-analyzed sub-data block index information, which includes the second number of sub-data blocks to be analyzed, among the N, that satisfy the further-division condition; if the second number satisfies the fifth condition, determining that the encoding mode is the first mode; if it does not, inputting the target data block into the encoder for encoding and determining the encoding mode from the encoding information of the target data block thus obtained; the second number satisfying the fifth condition means that it is greater than or equal to the second number threshold, and not satisfying it means that it is smaller than that threshold.
In one embodiment, when loaded by the processor 901, the computer instructions further perform the following steps: if the first number does not satisfy the fourth condition, inputting the target data block into the encoder for encoding, and determining the encoding mode from the encoding information of the target data block thus obtained; the first number not satisfying the fourth condition means that it is smaller than the first number threshold.
In one embodiment, the encoding information of the target data block includes the encoding distortion parameter of the target data block; when loaded by the processor 901, the computer instructions in the memory 902 specifically perform the following steps: calculating the encoding parameter from the encoding distortion parameter and the quantization parameter used to quantize the target data block; if the encoding parameter satisfies the sixth condition, determining that the encoding mode is the second mode; if it does not, dividing the target data block into N sub-data blocks, inputting each of them into the encoder for encoding, and determining the encoding mode from the encoding information of the target data block and of the N sub-data blocks; the encoding parameter satisfying the sixth condition means that it is smaller than the third division threshold, and not satisfying it means that it is greater than or equal to the third division threshold.
In one embodiment, the encoding information of the target data block further includes the first rate-distortion loss parameter of the target data block, and the encoding information of the N sub-data blocks includes their second rate-distortion loss parameter, calculated from the third rate-distortion loss parameter of each of the N sub-data blocks; when loaded by the processor 901, the computer instructions specifically perform the following steps: determining that the encoding mode for the target data block is the first mode if the first rate-distortion loss parameter is greater than or equal to the second, and the second mode if the first is smaller than the second.
In the embodiments of this application, scene complexity analysis of the target data block to be encoded yields analysis results both for the whole target data block and for the multiple sub-data blocks into which it is divided; selecting the encoding mode on the basis of these results determines an appropriate encoding mode fairly accurately for the target data block currently to be encoded, which effectively improves the encoding speed of the target data block and the video coding efficiency.
According to one aspect of this application, a computer program product or computer program is provided, comprising computer instructions stored in a computer-readable storage medium. The processor of a computer device reads the computer instructions from the computer-readable storage medium and executes them, causing the computer device to perform the video processing methods provided in the various optional manners above.

Abstract

Embodiments of this application disclose a video processing method, a video processing apparatus, a smart device and a storage medium. The video processing method includes: acquiring a target video frame from a video to be encoded, and determining a target data block to be encoded from the target video frame; performing scene complexity analysis on the target data block to obtain data block index information; dividing the target data block into N sub-data blocks, and performing scene complexity analysis on each sub-data block to obtain sub-block index information; determining an encoding mode for the target data block according to the data block index information and the sub-block index information; and encoding the target data block according to the determined encoding mode. With the embodiments of this application, a suitable encoding mode can be selected fairly accurately for video frame encoding.

Description

Video processing method, video processing apparatus, smart device and storage medium
This application claims priority to Chinese patent application No. 202011239333.1, filed with the China Patent Office on November 9, 2020 and entitled "Video processing method, video processing apparatus, smart device and storage medium", the entire contents of which are incorporated herein by reference.
Technical Field
This application relates to the field of computer technology, and in particular to a video processing method, a video processing apparatus, a smart device and a computer-readable storage medium.
Background
Video coding technology is currently widely used in scenarios such as video calls and video on demand, for example to process the call video involved in a video call scenario, or the on-demand video involved in a video-on-demand scenario. Video coding technology refers to compressing and encoding a video according to an encoding mode; compressing and encoding a video effectively saves storage space and improves transmission efficiency.
Accurately determining a suitable encoding mode during video encoding, and encoding the video according to that mode, can accelerate the whole encoding process and improve coding efficiency. How to determine the encoding mode fairly accurately for video frame encoding has therefore become a hot research topic.
Summary
Embodiments of this application provide a video processing method, apparatus, device and storage medium that can select a suitable encoding mode for video frame encoding fairly accurately.
In one aspect, an embodiment of this application provides a video processing method, performed by a smart device, including:
acquiring a target video frame from a video to be encoded, and determining a target data block to be encoded from the target video frame;
performing scene complexity analysis on the target data block to obtain data block index information;
dividing the target data block into N sub-data blocks, and performing scene complexity analysis on each sub-data block respectively to obtain sub-block index information, N being an integer greater than or equal to 2;
determining an encoding mode for the target data block according to the data block index information and the sub-block index information; and
encoding the target data block according to the determined encoding mode.
In another aspect, an embodiment of this application provides a video processing apparatus, including:
an acquiring unit, configured to acquire a target video frame from a video to be encoded and determine a target data block to be encoded from the target video frame; and
a processing unit, configured to perform scene complexity analysis on the target data block to obtain data block index information; divide the target data block into N sub-data blocks and perform scene complexity analysis on each sub-data block respectively to obtain sub-block index information, N being an integer greater than or equal to 2; determine an encoding mode for the target data block according to the data block index information and the sub-block index information; and encode the target data block according to the determined encoding mode.
In another aspect, an embodiment of this application provides a smart device, including:
a processor, adapted to implement a computer program; and
a memory storing a computer program which, when loaded and run by the processor, implements the above video processing method.
In another aspect, an embodiment of this application provides a computer-readable storage medium storing a computer program which, when read and executed by a processor of a computer device, implements the above video processing method.
In another aspect, an embodiment of this application provides a computer program product or computer program including computer instructions stored in a computer-readable storage medium. A processor of a computer device reads the computer instructions from the computer-readable storage medium and executes them, implementing the above video processing method.
Brief Description of the Drawings
To explain the technical solutions in the embodiments of this application or in the prior art more clearly, the drawings needed for describing the embodiments are briefly introduced below. Obviously, the drawings described below show some embodiments of this application; a person of ordinary skill in the art may derive other drawings from them without creative effort.
FIG. 1a is a schematic diagram of a recursive partitioning process of a video frame provided by an embodiment of this application;
FIG. 1b is a schematic diagram of a partitioning result of a video frame provided by an embodiment of this application;
FIG. 2 is a schematic structural diagram of an encoder provided by an embodiment of this application;
FIG. 3 is a schematic diagram of associated data blocks provided by an embodiment of this application;
FIG. 4 is a schematic architectural diagram of a video processing system provided by an embodiment of this application;
FIG. 5 is a schematic flowchart of a video processing method provided by an embodiment of this application;
FIG. 6 is a schematic flowchart of another video processing method provided by an embodiment of this application;
FIG. 7 is a schematic flowchart of another video processing method provided by an embodiment of this application;
FIG. 8 is a schematic structural diagram of a video processing apparatus provided by an embodiment of this application;
FIG. 9 is a schematic structural diagram of a smart device provided by an embodiment of this application.
Detailed Description
The technical solutions in the embodiments of this application are described below clearly and completely with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of this application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of this application without creative effort fall within the protection scope of this application.
The embodiments of this application relate to cloud technology and may use it to implement the video processing process. Cloud technology is a hosting technology that unifies hardware, software, network and other resources within a wide-area or local-area network to realize the computation, storage, processing and sharing of data. It is a general term for the network, information, integration, management-platform and application technologies applied on the basis of the cloud computing business model; these resources can form a pool to be used on demand, flexibly and conveniently. Cloud computing will become an important support: the back-end services of technical network systems, such as video websites, picture websites and other portals, require large amounts of computing and storage resources. With the development of the Internet industry, every item may come to carry its own identification mark that must be transmitted to a back-end system for logical processing; data of different levels will be processed separately, and all kinds of industry data require strong back-end system support, which can only be achieved through cloud computing.
Cloud computing is a computing model that distributes computing tasks over a resource pool formed by large numbers of computers, enabling various application systems to obtain computing power, storage space and information services as needed. The network providing the resources is called the "cloud"; to users, the resources in the "cloud" appear infinitely expandable and can be obtained at any time, used on demand, expanded at any time and paid for per use. A basic capability provider of cloud computing establishes a cloud computing resource pool (a cloud platform), generally called an IaaS (Infrastructure as a Service) platform, in which multiple types of virtual resources are deployed for external customers to select and use: computing devices (virtualized machines including operating systems), storage devices and network devices. By logical function, a PaaS (Platform as a Service) layer can be deployed on the IaaS layer, and a SaaS (Software as a Service) layer on top of the PaaS layer; SaaS can also be deployed directly on IaaS. PaaS is the platform on which software (for example, databases and web containers) runs; SaaS covers all kinds of business software (for example, web portals and SMS senders). Generally, SaaS and PaaS are upper layers relative to IaaS. Cloud computing also refers to the delivery and usage model of IT (Internet Technology) infrastructure, namely obtaining the required resources over the network in an on-demand, easily scalable way; in the broad sense it refers to the delivery and usage model of services, which may be IT, software and Internet related, or other services. Cloud computing is the product of the development and convergence of traditional computer and network technologies such as grid computing, distributed computing, parallel computing, utility computing, network storage technologies, virtualization and load balance.
Cloud computing can be used in the cloud conference field. A cloud conference is an efficient, convenient and low-cost form of conference based on cloud computing technology: through a simple, easy-to-use Internet interface, users can quickly and efficiently share voice, data files and video with teams and customers around the world, while the complex technologies of transmitting and processing the data in the conference are handled for the user by the cloud conference service provider. Domestic cloud conferencing currently focuses on service content with the SaaS model as its main body, including telephone, network and video services; a video conference based on cloud computing is called a cloud conference. In the cloud conference era, data transmission, processing and storage are all handled by the computing resources of the video conference vendor, so users no longer need to purchase expensive hardware or install cumbersome software; opening a browser and logging in to the corresponding interface suffices for an efficient remote conference. Cloud conference systems support dynamic multi-server cluster deployment and provide multiple high-performance servers, greatly improving conference stability, security and availability. In recent years, video conferencing has been welcomed by many users because it greatly improves communication efficiency, continuously reduces communication costs and upgrades internal management; it is widely used in government, military, transportation, finance, operators, education, enterprises and other fields. There is no doubt that, with cloud computing, video conferencing is even more attractive in its convenience, speed and ease of use, and will surely stimulate a new upsurge of video conferencing applications.
The embodiments of this application relate to video. A video is a sequence of consecutive video frames; owing to the persistence-of-vision effect of the human eye, when the frame sequence is played at a certain rate we see continuously moving video. Because consecutive video frames are highly similar, there is a great deal of redundant information both within each frame and between consecutive frames. Before a video is stored or transmitted, therefore, video encoding technology is usually applied to remove redundancy in the spatial, temporal and other dimensions, saving storage space and improving transmission efficiency.
Video encoding technology, which can also be called video compression technology, refers to compressing and encoding every video frame of a video according to an encoding mode. Specifically, during encoding each video frame must be recursively divided into data blocks of multiple sizes, and each resulting data block is input into the encoder for encoding. The recursive partitioning process involved in the embodiments is introduced below with reference to FIG. 1a, a schematic diagram of a recursive partitioning process of a video frame. As shown in FIG. 1a, a video frame 10 consists of data blocks 101 of a first size. The recursive partitioning of the video frame 10 may include: (1) inputting a first-size data block 101 directly into the encoder for encoding, that is, encoding it without dividing it; (2) dividing a first-size data block 101 into four equal-size second-size data blocks 102, not dividing the second-size data blocks 102 further, and inputting them into the encoder for encoding; (3) further dividing a second-size data block 102 into four equal-size third-size data blocks 103, not dividing them further, and inputting them into the encoder for encoding; and so on: a third-size data block 103 may also be divided further, so that the video frame 10 is recursively divided into data blocks of multiple sizes. The partitioning result is shown in FIG. 1b, a schematic diagram of a partitioning result of a video frame: the partitioned frame 10 consists of data blocks of three sizes, namely first-size data blocks 101, second-size data blocks 102 and third-size data blocks 103.
In actual video encoding, different partitioning strategies for the data blocks in a video frame lead to different encoding modes and different encoding speeds for the frame. The embodiments of this application provide a video processing solution offering two different data block partitioning strategies: a top-down strategy and a bottom-up strategy. Both can determine the optimal partitioning of each data block in a video frame, that is, the partitioning under which the resulting data blocks are encoded with good quality at a low coding rate. The coding rate may refer to the amount of data transmitted per unit time (for example, one second) when the encoded video data stream is transmitted; the more data per unit time, the higher the coding rate. The two partitioning strategies are introduced in detail below.
(1) Top-down data block partitioning strategy
During video encoding, each data block in a video frame is partitioned top-down and recursively, that is, from the maximum block size of the frame down toward the minimum block size, until the optimal partitioning of the data block is found. For example, suppose the maximum block size is 64px (pixel) × 64px and the minimum is 4px × 4px (these values are examples only and do not limit the embodiments; the maximum could also be 128px × 128px and the minimum 8px × 8px, and so on). A 64px × 64px data block may be divided into four 32px × 32px data blocks. The rate-distortion cost (RDCost) of encoding each of the four 32px × 32px blocks is calculated, as is the sum of the four RDCosts (the first total RDCost), together with the RDCost of encoding the 64px × 64px block directly. If the first total RDCost is greater than the 64px × 64px block's RDCost, the partition size is determined to be 64px × 64px, that is, the block need not be divided; if the first total RDCost is less than or equal to it, the partition size is determined to be 32px × 32px. The 32px × 32px blocks may then be further divided into four 16px × 16px blocks for encoding, and so on, until the optimal partitioning is found. The top-down strategy can usually determine the optimal partitioning fairly accurately. For data blocks with low scene complexity the optimal partition size is large, so the top-down strategy finds their optimal partitioning quickly; for data blocks with high scene complexity the optimal partition size is small, so finding their optimal partitioning top-down costs a great deal of time and thus slows their encoding.
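The following Python sketch is illustrative only; the application publishes no code, and the encode_cost and split helpers, the size attribute, and the "compare parent against sum of children" recursion are hypothetical stand-ins for the encoder's RDCost evaluation and quad-split described above:

    def top_down_partition(block, encode_cost, split, min_size):
        """Recursive top-down search: keep a block whole when its RDCost
        beats the sum of its four children's RDCosts, otherwise recurse
        into the children. Returns (list of chosen blocks, total RDCost)."""
        whole_cost = encode_cost(block)
        if block.size <= min_size:                 # cannot divide further
            return [block], whole_cost
        parts, split_cost = [], 0.0
        for child in split(block):                 # four equal-size sub-blocks
            child_parts, child_cost = top_down_partition(
                child, encode_cost, split, min_size)
            parts += child_parts
            split_cost += child_cost
        if split_cost <= whole_cost:               # splitting is no worse
            return parts, split_cost
        return [block], whole_cost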
The optimal partition size of a data block can be understood as the size at which the resulting blocks are encoded with good quality at a low coding rate. In general, data blocks with low scene complexity have a large optimal partition size, and data blocks with high scene complexity a small one. The scene complexity of a data block may be measured with methods such as spatial information (SI) or temporal information (TI). Spatial information can characterize the amount of spatial detail in a data block: the more elements the block contains, the higher its SI value and the higher its scene complexity. For example, if data block A contains five elements (a cat, a dog, a tree, a flower and a sun) and data block B contains two elements (a cat and a dog), block A contains more elements than block B, so block A's SI value and scene complexity are higher than block B's. Temporal information can characterize the amount of temporal change of a data block: the greater the motion of the target data block in the target video frame currently being processed relative to the reference data block in a reference video frame, the higher the TI value of the target data block and the higher its scene complexity. The reference video frame is a frame whose encoding order precedes the target frame in the frame sequence, and the reference data block occupies the same position in the reference frame as the target data block in the target frame. For example, if the target data block contains an element (say, a car) and the reference data block contains the same element, then the larger the displacement of the car in the target data block relative to the car in the reference data block, the greater the motion of the target block relative to the reference block, the higher the TI value and the higher the scene complexity.
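As an illustration of how SI and TI might be computed: the application does not fix exact formulas, so this sketch assumes the common ITU-T P.910-style definitions (Sobel-gradient standard deviation for SI, frame-difference standard deviation for TI) and uses NumPy/SciPy:

    import numpy as np
    from scipy.ndimage import sobel

    def spatial_information(block: np.ndarray) -> float:
        """SI: std-dev of the Sobel gradient magnitude over the block."""
        b = block.astype(float)
        gx, gy = sobel(b, axis=0), sobel(b, axis=1)
        return float(np.hypot(gx, gy).std())

    def temporal_information(block: np.ndarray, ref_block: np.ndarray) -> float:
        """TI: std-dev of the pixel difference against the co-located
        reference block from a previously encoded frame."""
        return float((block.astype(float) - ref_block.astype(float)).std())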
(2) Bottom-up video frame partitioning strategy
During video encoding, each data block in a video frame is partitioned bottom-up and recursively, that is, from the minimum block size of the frame up toward the maximum block size, until the optimal partitioning of the data block is found. For example, suppose the minimum block size is 4px × 4px and the maximum is 64px × 64px. Each of four 4px × 4px data blocks is encoded, and then the 8px × 8px data block composed of those four blocks is encoded; the RDCost of encoding each 4px × 4px block and the sum of the four RDCosts (the second total RDCost) are calculated, together with the RDCost of encoding the 8px × 8px block. If the second total RDCost is less than or equal to the 8px × 8px block's RDCost, the partition size is determined to be 4px × 4px; if it is greater, the partition size is determined to be 8px × 8px. Next, the 16px × 16px block composed of four 8px × 8px blocks can be encoded, and so on, until the optimal partitioning is found. The bottom-up strategy can also usually determine the optimal partitioning fairly accurately. For data blocks with high scene complexity the optimal partition size is small, so the bottom-up strategy finds their optimal partitioning quickly; for data blocks with low scene complexity the optimal partition size is large, so finding their optimal partitioning bottom-up costs a great deal of time and thus slows their encoding.
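A minimal sketch of one bottom-up merge step, under the same hypothetical encode_cost helper as in the top-down sketch above, with a hypothetical merge helper that assembles four siblings into their parent:

    def bottom_up_merge(children, encode_cost, merge):
        """Bottom-up check: merge four sibling blocks into their parent
        when the parent's RDCost beats the sum of the children's RDCosts;
        otherwise keep the finer partition."""
        child_cost = sum(encode_cost(c) for c in children)
        parent = merge(children)               # e.g. four 4x4 blocks -> one 8x8
        parent_cost = encode_cost(parent)
        if child_cost <= parent_cost:          # finer partition is no worse
            return children, child_cost
        return [parent], parent_cost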
It can be seen that both the top-down and the bottom-up strategy can usually determine the optimal partitioning of each data block in a video frame fairly accurately; the top-down strategy is better suited to data blocks with low scene complexity, and the bottom-up strategy to data blocks with high scene complexity.
On this basis, the embodiments of this application provide a further video processing solution: perform scene complexity analysis on the target data block to be encoded in the video frame to be encoded; divide the target data block into multiple sub-data blocks and perform scene complexity analysis on each of them; then use the scene complexity analysis result of the target data block, together with the result for each of the sub-data blocks, to determine a better partitioning of the target data block and hence the encoding mode for it, so that the target data block can be encoded according to the determined mode. The scene complexity analysis in the embodiments may use any scene complexity analysis method from the related art, such as the Sum of Squared Error (SSE) method or the Sum of Absolute Difference (SAD) method. The "better partitioning" mentioned here refers to a data block partitioning that, while balancing coding quality against coding rate, improves encoding speed to a certain extent. This solution formulates an encoding mode adapted to the scene complexity of the target data block, effectively improving its encoding speed and thus the video encoding speed; it is also more universally applicable, suiting any video encoding scenario and able to determine encoding modes both for data blocks of low scene complexity and for data blocks of high scene complexity.
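For reference, the SSE and SAD measures mentioned above can be sketched as follows; a block is compared against a prediction or reference block, though the exact operands the embodiments apply them to are not specified:

    import numpy as np

    def sse(block: np.ndarray, pred: np.ndarray) -> float:
        """Sum of Squared Error between a block and its prediction."""
        d = block.astype(float) - pred.astype(float)
        return float((d * d).sum())

    def sad(block: np.ndarray, pred: np.ndarray) -> float:
        """Sum of Absolute Difference between a block and its prediction."""
        return float(np.abs(block.astype(float) - pred.astype(float)).sum())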
Based on the above description, an embodiment of this application provides an encoder that can be used to implement the above video processing solution; the video encoder may be an AV1 (Alliance for Open Media Video 1) standard encoder. FIG. 2 is a schematic structural diagram of an encoder provided by an embodiment of this application. As shown in FIG. 2, the encoder 20 includes a scene complexity analysis module 201, an adaptive partition decision module 202 and an early partition termination module 203.
The scene complexity analysis module 201 may be used to perform scene complexity analysis on the target data block to be encoded in the video frame to be encoded, obtaining data block index information. It may also divide the target data block into N sub-data blocks and perform scene complexity analysis on each of them, obtaining sub-block index information, N being an integer greater than or equal to 2. In actual encoding scenarios N is generally 4, though the embodiments do not limit this; dividing the target data block into N sub-data blocks may mean dividing it into four equal-size sub-data blocks.
The adaptive partition decision module 202 may be used to determine the encoding mode for the target data block from the data block index information and the sub-block index information. The encoding mode may be either of two modes: the first mode means dividing the data block into N sub-data blocks and encoding each of them separately; the second mode means encoding the target data block directly, without dividing it. If module 202 cannot determine the encoding mode from the data block index information and the sub-block index information, it may further obtain the associated block index information of M associated data blocks related to the target data block and use it to determine a try-encoding order for the target data block, M being a positive integer (generally 3 in actual encoding scenarios, without limitation). FIG. 3 is a schematic diagram of associated data blocks: the M associated data blocks related to the target data block 301 may be the associated data block 302 to its left, the associated data block 303 above it, and the associated data block 304 to its upper left. The try-encoding order may be either of two orders: the first try order means first trying to encode the target data block in the first mode and then trying the second mode; the second try order means first trying the second mode and then the first mode.
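A minimal sketch of how module 202's try-order choice could look, assuming a hypothetical was_split flag recorded for each already-encoded associated block and a hypothetical count threshold:

    def choose_try_order(associated_blocks, count_threshold: int) -> str:
        """Pick the try order from the neighbours' decisions: if enough of
        the M associated blocks were themselves divided for encoding, try
        the first mode (divide) before the second (direct), else the reverse."""
        split_count = sum(1 for b in associated_blocks if b.was_split)
        return ("first_then_second" if split_count >= count_threshold
                else "second_then_first")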
The early partition termination module 203 is used to set partition termination conditions when the adaptive partition decision module 202 cannot determine the encoding mode for the target data block, and to determine the mode from those conditions. Specifically, if module 202 determines that the try-encoding order is the first try order, module 203 obtains the encoding information of the N sub-data blocks obtained by encoding the target data block in the first mode; if that information satisfies the first partition termination condition (the fifth condition below), module 203 determines that the encoding mode is the first mode. If module 202 determines the second try order, module 203 obtains the encoding information of the target data block encoded in the second mode; if that information satisfies the second partition termination condition (the sixth condition below), module 203 determines that the encoding mode is the second mode.
The encoder 20 shown in FIG. 2 can formulate an encoding mode adapted to the scene complexity of the target data block, effectively improving the encoding speed of the target data block and thus the video encoding speed. Moreover, the early partition termination module 203 can fix the encoding mode, and terminate further division of the target data block, as soon as the encoding information meets a termination condition, further improving the block's encoding speed, and video coding efficiency, while balancing encoding speed against coding rate. The specific application scenarios of the video processing solution of the embodiments and of the encoder 20 of the embodiment of FIG. 2 are introduced below with the video processing system shown in FIG. 4.
FIG. 4 is a schematic architectural diagram of a video processing system provided by an embodiment of this application. As shown in FIG. 4, the system includes P terminals (for example, a first terminal 401 and a second terminal 402) and a server 403, P being an integer greater than 1. Any of the P terminals may be a device with a camera function such as a smartphone, tablet computer, laptop, desktop computer, smart speaker, smart watch or smart wearable device, but is not limited to these, and may support the installation and running of various applications, including but not limited to social applications (for example, instant messaging, audio call and video call applications), audio/video applications (for example, on-demand applications and players) and game applications. The server may be an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server providing cloud computing services; the embodiments impose no limitation here. The P terminals and the server 403 may be connected directly or indirectly through wired or wireless communication, without limitation. The video processing solution of the embodiments is introduced below taking a video call scenario as an example.
(1) The encoder 20 is deployed in each of the P terminals, and the P terminals perform the video processing.
P users participate in a video call using the video call applications running on the P terminals; for example, user 1 participates using the application on the first terminal 401, user 2 using the application on the second terminal 402, and so on. The server 403 is used to transmit the target video generated by the P terminals during the call, which may include the call video they produce. The first terminal 401 is any one of the P terminals; its processing of the target video is described here as an example, and the other terminals process the target video in the same way.
The target video consists of multiple consecutive video frames, each containing multiple data blocks to be encoded. With the encoder 20 deployed in the first terminal 401, the terminal calls the encoder to analyze the scene complexity of each data block in every frame of the target video, determines the encoding mode of each data block, encodes each block according to the determined mode, and finally obtains the encoded target video. The first terminal 401 may send the encoded video to the server 403, which transmits it to the other terminals participating in the call, or the first terminal 401 may transmit it to them directly; the other terminals receive the encoded target video, parse it and play it, realizing the video call among the P terminals.
(2) The encoder 20 is deployed in the server 403, and the server 403 performs the video processing.
P users participate in a video call using the video call applications on the P terminals, as above. The server 403 is used to process and transmit the target video generated by the P terminals during the call, which may include the call video they produce.
The target video may be the call video generated by the first terminal 401 while participating in the call; the first terminal transmits it to the server 403. The video consists of multiple consecutive frames, each containing multiple data blocks to be encoded. With the encoder 20 deployed in the server 403, the server calls the encoder to analyze the scene complexity of each data block in every frame, determines each block's encoding mode, encodes each block accordingly, and finally obtains the encoded target video, which it transmits to the other terminals participating in the call; they receive, parse and play it, realizing the video call among the P terminals.
In the embodiments of this application, each terminal or server participating in the video call can call the encoder to encode the target video generated during the call, and the encoding mode of each data block in the target video is determined according to that block's scene complexity. This effectively improves the encoding speed of the data blocks and thus of the target video, greatly accelerating encoding while safeguarding video quality, which improves the smoothness of the target video, improves the call quality to a certain extent, optimizes the video call experience and improves the user experience.
FIG. 5 is a schematic flowchart of a video processing method provided by an embodiment of this application. The method may be executed by a smart device, which may be a user terminal or a server; the user terminal may be a device with a camera function such as a smartphone, tablet computer or smart wearable device, and the smart device can be, for example, any terminal or server in the video processing system shown in FIG. 4. The method includes the following steps S501 to S505:
S501: Acquire a target video frame from the video to be encoded, and determine a target data block to be encoded from it. The video to be encoded consists of multiple consecutive video frames; the target video frame is any one of them. The target frame contains multiple data blocks, among which there may be both encoded blocks and blocks still to be encoded; the target data block is any block of the target frame still to be encoded.
S502: Perform scene complexity analysis on the target data block to obtain data block index information, which may include, without limitation, at least one of: the distortion estimation parameter of the target data block, the spatial information parameter of the target data block, and the temporal information parameter of the target data block. The distortion estimation (Dist) parameter is obtained by estimating the degree of distortion in intra-prediction or inter-prediction encoding; it can be used to measure the distortion of the reconstructed target data block relative to the original target data block. The original target data block is the unencoded target data block; the reconstructed target data block is the encoded block obtained after intra- or inter-prediction encoding. The spatial information parameter of the target data block may be the SI value computed for it, and the temporal information parameter the TI value computed for it.
S503: Divide the target data block into N sub-data blocks, and perform scene complexity analysis on each sub-data block respectively to obtain sub-block index information, N being an integer greater than or equal to 2.
The sub-block index information may include N items of sub-block index data. The i-th sub-data block is any one of the N sub-data blocks, the i-th item of sub-block index data is any one of the N items, the i-th item corresponds to the i-th sub-data block and is obtained by performing scene complexity analysis on it, i ∈ [1, N].
The i-th item of sub-block index data may include, without limitation, at least one of: the distortion estimation parameter of the i-th sub-data block, its spatial information parameter, and its temporal information parameter. The distortion estimation parameter of the i-th sub-data block measures the distortion of the reconstructed i-th sub-data block relative to the original (unencoded) i-th sub-data block, the reconstructed sub-block being the encoded block obtained after intra- or inter-prediction encoding; the spatial and temporal information parameters are the SI and TI values computed for the i-th sub-data block.
S504: Determine the encoding mode for the target data block according to the data block index information and the sub-block index information.
In one embodiment, the data block index information and the sub-block index information may be input into a joint statistical model, which computes on them; the output value the model produces is obtained, and the encoding mode for the target data block is determined according to that output value. The joint statistical model may be trained on the data block index information of data blocks whose encoding modes have already been determined, together with the sub-block index information of the N sub-data blocks obtained by dividing those blocks. In an embodiment, the model may perform a weighted computation on the data block index information and the sub-block index information to obtain the output value, the weighting factors being obtained by training on the relevant information of data blocks with determined encoding modes and their sub-blocks.
If the output value of the joint statistical model satisfies the first condition, that is, the output value is greater than the first division threshold, the encoding mode for the target data block is determined to be the first mode: the correlation among the N sub-data blocks obtained by dividing the target data block is weak, favoring dividing the block into N sub-data blocks and encoding each separately. If the output value satisfies the second condition, that is, the output value is smaller than the second division threshold, the encoding mode is determined to be the second mode: the correlation among the N sub-data blocks is strong, favoring encoding the target data block directly without division. It should be noted that the first and second division thresholds may be obtained during the training of the joint statistical model.
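A sketch of this decision, assuming (as one plausible reading of the embodiment, not a definitive implementation) a linear weighted model; the weights and thresholds are hypothetical placeholders that would come from training:

    def joint_model_decision(block_info, sub_infos, weights,
                             threshold1, threshold2):
        """Weighted joint-statistical-model sketch: the output value is a
        weighted sum over block-level and sub-block-level index values."""
        features = list(block_info) + [v for info in sub_infos for v in info]
        output = sum(w * f for w, f in zip(weights, features))
        if output > threshold1:    # first condition: weak sub-block correlation
            return "first_mode"
        if output < threshold2:    # second condition: strong correlation
            return "second_mode"
        return "undecided"         # third condition: consult associated blocks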
S505: Encode the target data block according to the determined encoding mode. If the mode is the first mode, this means encoding in the first mode: dividing the target data block into N sub-data blocks and inputting each sub-data block into the encoder for encoding. If the mode is the second mode, this means encoding in the second mode: inputting the target data block directly into the encoder for encoding.
In this embodiment of this application, the results of the scene complexity analysis of the target data block (the data block index information and the sub-block index information) are input into the joint statistical model for computation, and the encoding mode for the target data block is determined from the model's output value. The video processing solution of the embodiments applies to any video scenario: it can adaptively adjust the encoding mode according to the scene complexity of the target data block to be encoded in any video, determining an encoding mode adapted to that complexity, effectively improving the encoding speed of the target data block and thus the video encoding speed, and achieving the best balance between the video's encoding speed and its coding rate.
FIG. 6 is a schematic flowchart of another video processing method provided by an embodiment of this application. The method may be executed by a smart device, which may be a user terminal (for example, a smartphone, tablet computer or smart wearable device with a camera function) or a server, for example any terminal or server in the video processing system shown in FIG. 4. The method includes the following steps S601 to S608:
S601: Acquire a target video frame from the video to be encoded, and determine a target data block to be encoded from the target video frame.
S602: Perform scene complexity analysis on the target data block to obtain data block index information.
S603: Divide the target data block into N sub-data blocks, and perform scene complexity analysis on each sub-data block respectively to obtain sub-block index information, N being an integer greater than or equal to 2.
Steps S601, S602 and S603 of this embodiment are performed in the same way as steps S501, S502 and S503, respectively, of the embodiment shown in FIG. 5; for the specific execution process, see the description of that embodiment, which is not repeated here.
S604: Input the data block index information and the sub-block index information into the joint statistical model.
S605: Obtain the output value the joint statistical model computes from the data block index information and the sub-block index information.
S606: If the output value satisfies the third condition, perform scene complexity analysis on the M associated data blocks related to the target data block to obtain their associated block index information, M being a positive integer.
S607: Determine the encoding mode for the target data block according to the associated block index information.
In S606 and S607, the output value of the joint statistical model satisfying the third condition, that is, being less than or equal to the first division threshold and greater than or equal to the second division threshold, indicates that the correlation among the N sub-data blocks obtained by dividing the target data block is between strong and weak: there is no clear tendency either toward dividing the target data block into N sub-data blocks and encoding each separately, or toward encoding it directly without division. In this case, the M associated data blocks related to the target data block are determined, scene complexity analysis is performed on them to obtain their associated block index information, and the encoding mode for the target data block is determined from that information, M being a positive integer. The associated block index information of the M associated data blocks may include a first number: the number of associated data blocks, among the M, that are divided into multiple sub-data blocks for encoding.
In one embodiment, if the first number satisfies the fourth condition, that is, the first number is greater than or equal to the first number threshold, the scene complexity of the M associated data blocks is relatively high, and one may first try dividing the target data block into N sub-data blocks to be analyzed (that is, N sub-data blocks), perform scene complexity analysis on them to obtain their to-be-analyzed sub-data block index information, and further determine the encoding mode from that information. The first number threshold may be set according to an empirical value; for example, with three associated data blocks it may be set to 2. Taking FIG. 3 as an example, the three associated data blocks related to the target data block 301 are the associated data blocks 302, 303 and 304; blocks 302 and 303 are divided into multiple sub-data blocks for encoding while block 304 is encoded directly without division, so the first number is 2, the fourth condition is satisfied, the scene complexity of the three associated blocks can be determined to be relatively high, and dividing the target data block 301 into N sub-data blocks to be analyzed may be tried first. The to-be-analyzed sub-data block index information may include a second number: the number of sub-data blocks to be analyzed, among the N, that satisfy the further-division condition. Specifically, it may be the number of output values, among the N output values obtained by applying the joint statistical model to the data block index information of each of the N sub-data blocks to be analyzed together with the sub-block indication information of the N sub-data blocks into which each is divided, that satisfy the first condition.
In one embodiment, if the second number satisfies the fifth condition, that is, the second number is greater than or equal to the second number threshold, the encoding mode for the target data block may be determined to be the first mode. If the second number does not satisfy the fifth condition, that is, the second number is smaller than the threshold, encoding the target data block directly is tried, and the encoding mode is determined from the encoding information of the directly encoded target data block together with the encoding information of the N sub-data blocks obtained by dividing the target data block and encoding each sub-data block separately. The second number threshold may also be set according to an empirical value; for example, when the target data block is divided into four sub-data blocks to be analyzed, it may be set to 3. If all four sub-data blocks to be analyzed satisfy the further-division condition, that is, the second number is 4, the fifth condition holds and the encoding mode for the target data block can be determined to be the first mode.
In one embodiment, if the first number does not satisfy the fourth condition, that is, the first number is smaller than the first number threshold, the scene complexity of the M associated data blocks related to the target data block is relatively low, and encoding the target data block directly, without dividing it, may be tried first, obtaining the encoding information of the target data block thus produced, which may include its encoding distortion parameter. It should be noted that the distortion estimation parameter of the target data block refers to an estimate of the degree of distortion during encoding, whereas the encoding distortion parameter can be used to measure the actual distortion of the encoded target data block (the reconstructed target data block) relative to the target data block before encoding (the original target data block). Further, an encoding parameter is calculated from the encoding distortion parameter and the quantization parameter (QP) used to quantize the target data block; the calculation follows Formula 1:

    Code = Dist / QP²   (Formula 1)

As in Formula 1, Code represents the encoding parameter, Dist the encoding distortion parameter of the target data block, and QP the quantization parameter.
In one embodiment, if the encoding parameter satisfies the sixth condition, that is, the encoding parameter is smaller than the third division threshold, the encoding mode for the target data block may be determined to be the second mode. In another embodiment, if the encoding parameter does not satisfy the sixth condition, that is, it is greater than or equal to the third division threshold, dividing the target data block into N sub-data blocks and encoding each sub-data block separately is tried, and the encoding mode is determined from the encoding information of the directly encoded target data block together with the encoding information of the N sub-data blocks. It should be noted that the third division threshold may be obtained during the training of the joint statistical model.
In the above two embodiments, the encoding information of the target data block may further include its first rate-distortion loss parameter, and the encoding information of the N sub-data blocks may include their second rate-distortion loss parameter, calculated from the third rate-distortion loss parameter of each of the N sub-data blocks, for example as the sum of the N third rate-distortion loss parameters. The first rate-distortion loss parameter is calculated from the coding rate and the encoding distortion parameter under the direct-encoding mode, for example as the ratio of the coding rate to the encoding distortion parameter. It can be used to measure the effect of encoding the target data block directly: the smaller the first rate-distortion loss parameter, the better the encoding effect, a good effect meaning a low coding rate while the distortion of the encoded block relative to the pre-encoding block remains low. Similarly, the second rate-distortion loss parameter measures the effect of dividing the target data block into N sub-data blocks and encoding each separately; the smaller it is, the better the effect. If the first rate-distortion loss parameter is greater than or equal to the second, the encoding mode for the target data block is determined to be the first mode; if the first is smaller than the second, the second mode.
S608: Encode the target data block according to the determined encoding mode.
Step S608 of this embodiment is performed in the same way as step S505 of the embodiment shown in FIG. 5; for the specific execution process, see that embodiment's description, which is not repeated here.
In this embodiment of this application, the scene complexity of the associated data blocks related to the target data block also reflects, to a certain extent, the scene complexity of the target data block; the encoding mode is therefore determined jointly from the target data block's scene complexity (the output value of the joint statistical model) and the associated data blocks' scene complexity (the associated block indication information). The video processing solution of the embodiments is universal: it adaptively adjusts the encoding mode to the scene complexity of the target data block to be encoded in any video, determining an encoding mode adapted to that complexity, which effectively improves the encoding speed of the target data block and thus the video encoding speed, and, without losing video coding efficiency, achieves the best balance between video encoding speed and video coding rate.
FIG. 7 is a schematic flowchart of another video processing method provided by an embodiment of this application. The method may be executed by a smart device, which may be a user terminal (for example, a smartphone, tablet computer or smart wearable device with a camera function) or a server, for example any terminal or server in the video processing system shown in FIG. 4, and includes the following steps S701 to S720:
S701: Acquire a target video frame from the video to be encoded, and determine a target data block to be encoded from the target video frame.
S702: Perform scene complexity analysis on the target data block to obtain data block index information.
S703: Divide the target data block into N sub-data blocks, and perform scene complexity analysis on each sub-data block respectively to obtain sub-block index information.
S704: Input the data block index information and the sub-block index information into the joint statistical model for computation, obtaining the model's output value.
S705: Judge whether the output value satisfies the first condition. If it does, execute step S706; if not, execute step S707.
S706: If the output value satisfies the first condition, determine that the encoding mode for the target data block is the first mode. After step S706, execute step S720.
S707: If the output value does not satisfy the first condition, judge whether it satisfies the second condition. If it does, execute step S708; if not, execute step S709.
S708: If the output value satisfies the second condition, determine that the encoding mode for the target data block is the second mode. After step S708, execute step S720.
S709: If the output value does not satisfy the second condition, judge whether it satisfies the third condition. If it does, execute step S710.
S710: If the output value satisfies the third condition, determine the first number of associated data blocks, among the M associated data blocks related to the target data block, that are divided into multiple sub-data blocks for encoding.
S711: Judge whether the first number satisfies the fourth condition. If it does, execute step S712; if not, execute step S716.
S712: If the first number satisfies the fourth condition, preferentially try dividing the target data block into N sub-data blocks and encoding each sub-data block separately.
S713: Determine the second number of sub-data blocks, among the N, that satisfy the further-division condition.
S714: Judge whether the second number satisfies the fifth condition. If it does, determine that the encoding mode for the target data block is the first mode and execute step S720. If not, execute step S715.
S715: Try encoding the target data block directly, without dividing it, and determine the encoding mode from the encoding information: obtain the encoding information of the directly encoded target data block and of the N sub-data blocks obtained by dividing and encoding separately, and determine the encoding mode from both; for the specific execution process, see the embodiment shown in FIG. 6. After step S715, execute step S720.
S716: If the first number does not satisfy the fourth condition, preferentially try encoding the target data block directly, without division, and determine the encoding distortion parameter of the target data block.
S717: Calculate the encoding parameter from the encoding distortion parameter and the quantization parameter.
S718: Judge whether the encoding parameter satisfies the sixth condition. If it does, determine that the encoding mode is the second mode and execute step S720; if not, execute step S719.
S719: Try dividing the target data block into N sub-data blocks and encoding each sub-data block separately, and determine the encoding mode from the encoding information of the directly encoded target data block and of the N sub-data blocks; for the specific execution process, see the embodiment shown in FIG. 6. After step S719, execute step S720.
S720: Encode the target data block according to the determined encoding mode. Afterwards, another data block to be encoded, other than the target data block, may be determined from the video frame, its encoding mode determined according to its scene complexity, and the block encoded according to the determined mode; the process of determining this block's encoding mode from its scene complexity is the same as the process of determining the target data block's encoding mode from the target data block's scene complexity.
In this embodiment of this application, the scene complexity of the associated data blocks related to the target data block also reflects, to a certain extent, the scene complexity of the target data block; the encoding mode is therefore determined jointly from the target data block's scene complexity (the output value of the joint statistical model) and the associated data blocks' scene complexity (the associated block indication information). This embodiment sets six judgment conditions and analyzes the scene complexity of the target data block comprehensively from multiple angles, so that the finally determined encoding mode fits the target data block's scene complexity well. This effectively improves the encoding speed of the target data block and thus the video encoding speed, and effectively improves video encoding speed while ensuring the clarity and smoothness of the encoded video and losing neither video coding quality nor video coding efficiency.
Referring to FIG. 8, a schematic structural diagram of a video processing apparatus provided by an embodiment of this application: the video processing apparatus 80 of the embodiments may be set in a smart device, which may be a smart terminal or a server. The apparatus 80 may be used to execute the corresponding steps of the video processing methods shown in FIG. 5, FIG. 6 or FIG. 7, and includes the following units:
an acquiring unit 801, configured to acquire a target video frame from the video to be encoded, and determine the target data block to be encoded from the target video frame; and
a processing unit 802, configured to perform scene complexity analysis on the target data block to obtain data block index information; divide the target data block into N sub-data blocks and perform scene complexity analysis on each sub-data block respectively to obtain sub-block index information, N being an integer greater than or equal to 2; determine the encoding mode for the target data block according to the data block index information and the sub-block index information; and encode the target data block according to the determined encoding mode.
In one embodiment, the data block index information includes any one or more of: the distortion estimation parameter of the target data block, the spatial information parameter of the target data block, and the temporal information parameter of the target data block.
The sub-block index information includes N items of sub-block index data, where the i-th item includes any one or more of: the distortion estimation parameter of the i-th sub-data block among the N sub-data blocks, the spatial information parameter of the i-th sub-data block, and the temporal information parameter of the i-th sub-data block, i ∈ [1, N].
In one embodiment, the processing unit 802 is specifically configured to:
input the data block index information and the sub-block index information into the joint statistical model;
obtain the output value the joint statistical model computes from the data block index information and the sub-block index information; and
determine the encoding mode for the target data block according to the output value.
In one embodiment, the processing unit 802 is specifically configured to:
if the output value satisfies the first condition, determine that the encoding mode is the first mode; and
divide the target data block into N sub-data blocks, inputting each sub-data block into the encoder for encoding;
where the output value satisfying the first condition means that the output value is greater than the first division threshold.
In one embodiment, the processing unit 802 is specifically configured to:
if the output value satisfies the second condition, determine that the encoding mode is the second mode; and
input the target data block into the encoder for encoding;
where the output value satisfying the second condition means that the output value is smaller than the second division threshold.
In one embodiment, the processing unit 802 is specifically configured to:
if the output value satisfies the third condition, perform scene complexity analysis on the M associated data blocks related to the target data block to obtain their associated block index information, M being a positive integer; and
determine the encoding mode for the target data block according to the associated block index information;
where the output value satisfying the third condition means that the output value is less than or equal to the first division threshold and greater than or equal to the second division threshold.
In one embodiment, the processing unit 802 is specifically configured to:
acquire the associated block index information, which includes the first number of associated data blocks, among the M associated data blocks, that are divided into multiple sub-data blocks for encoding;
if the first number satisfies the fourth condition, divide the target data block into N sub-data blocks to be analyzed;
perform scene complexity analysis on the N sub-data blocks to be analyzed to determine their to-be-analyzed sub-data block index information; and
determine the encoding mode for the target data block according to that information;
where the first number satisfying the fourth condition means that the first number is greater than or equal to the first number threshold.
In one embodiment, the processing unit 802 is specifically configured to:
acquire the to-be-analyzed sub-data block index information, which includes the second number of sub-data blocks to be analyzed, among the N, that satisfy the further-division condition;
if the second number satisfies the fifth condition, determine that the encoding mode is the first mode;
if the second number does not satisfy the fifth condition, input the target data block into the encoder for encoding, and determine the encoding mode from the encoding information of the target data block thus obtained;
where the second number satisfying the fifth condition means that the second number is greater than or equal to the second number threshold, and the second number not satisfying the fifth condition means that it is smaller than that threshold.
In one embodiment, the processing unit 802 is further configured to:
if the first number does not satisfy the fourth condition, input the target data block into the encoder for encoding; and
determine the encoding mode from the encoding information of the target data block thus obtained;
where the first number not satisfying the fourth condition means that the first number is smaller than the first number threshold.
In one embodiment, the encoding information of the target data block includes the encoding distortion parameter of the target data block, and the processing unit 802 is specifically configured to:
calculate the encoding parameter from the encoding distortion parameter of the target data block and the quantization parameter used to quantize it;
if the encoding parameter satisfies the sixth condition, determine that the encoding mode is the second mode;
if the encoding parameter does not satisfy the sixth condition, divide the target data block into N sub-data blocks and input each of them into the encoder for encoding; and
determine the encoding mode from the encoding information of the target data block and the encoding information of the N sub-data blocks thus obtained;
where the encoding parameter satisfying the sixth condition means that it is smaller than the third division threshold, and the encoding parameter not satisfying the sixth condition means that it is greater than or equal to the third division threshold.
In one embodiment, the encoding information of the target data block further includes the first rate-distortion loss parameter of the target data block, and the encoding information of the N sub-data blocks includes their second rate-distortion loss parameter, calculated from the third rate-distortion loss parameter of each of the N sub-data blocks; the processing unit 802 is specifically configured to:
determine that the encoding mode for the target data block is the first mode if the first rate-distortion loss parameter is greater than or equal to the second; and
determine that the encoding mode for the target data block is the second mode if the first rate-distortion loss parameter is smaller than the second.
According to an embodiment of this application, the units of the video processing apparatus 80 shown in FIG. 8 may be separately or jointly merged into one or several other units, or one or more of them may be further subdivided into functionally smaller units; this achieves the same operations without affecting the realization of the technical effects of the embodiments. The above units are divided by logical function; in practical applications, the function of one unit may be implemented by multiple units, or the functions of multiple units by one unit. In other embodiments of this application, the apparatus 80 may also include other units; in practice, these functions may be implemented with the assistance of other units and by cooperation among multiple units. According to another embodiment, the apparatus 80 of FIG. 8 may be constructed, and the video processing method of the embodiments realized, by running a computer program (including program code) capable of executing the steps of the corresponding methods of FIG. 5, FIG. 6 or FIG. 7 on a general-purpose computing device, such as a computer, including processing elements such as a central processing unit (CPU) and storage elements such as a random access storage medium (RAM) and a read-only storage medium (ROM). The computer program may be recorded on, for example, a computer-readable storage medium, loaded through it into any terminal (for example, the first terminal 401 or the second terminal 402) or the server 403 of the video processing system shown in FIG. 4, and run there.
In the embodiments of this application, scene complexity analysis of the target data block to be encoded in the target video frame yields analysis results both for the whole target data block and for the multiple sub-data blocks into which it is divided; selecting the encoding mode on the basis of these results determines an appropriate encoding mode fairly accurately for the target data block currently to be encoded, which effectively improves the encoding speed of the target data block and the video coding efficiency.
Referring to FIG. 9, a schematic structural diagram of a smart device provided by an embodiment of this application: the smart device 90 includes at least a processor 901 and a memory 902, which may be connected through a bus or by other means.
The processor 901 may be a central processing unit (CPU). The processor 901 may further include a hardware chip, which may be an application-specific integrated circuit (ASIC), a programmable logic device (PLD) or the like; the PLD may be a field-programmable gate array (FPGA), generic array logic (GAL) or the like.
The memory 902 may include a volatile memory, such as a random-access memory (RAM); it may also include a non-volatile memory, such as a flash memory or a solid-state drive (SSD); and it may also include a combination of the above types of memory.
The memory 902 is used to store a computer program including computer instructions, and the processor 901 is used to execute the computer instructions. The processor 901 (or CPU, Central Processing Unit) is the computing core and control core of the smart device 90; it is adapted to implement one or more computer instructions, and specifically to load and execute one or more computer instructions so as to realize the corresponding method flow or function.
The smart device 90 may be any terminal (for example, the first terminal 401 or the second terminal 402) or the server 403 of the video processing system shown in FIG. 4. The memory 902 stores a computer program including one or more computer instructions, which are loaded and executed by the processor 901 to realize the corresponding steps of the method embodiments shown in FIG. 5, FIG. 6 or FIG. 7. In a specific implementation, the computer instructions in the memory 902 are loaded by the processor 901 and execute the following steps:
acquiring a target video frame from the video to be encoded, and determining the target data block to be encoded from the target video frame;
performing scene complexity analysis on the target data block to obtain data block index information;
dividing the target data block into N sub-data blocks, and performing scene complexity analysis on each sub-data block respectively to obtain sub-block index information, N being an integer greater than or equal to 2;
determining the encoding mode for the target data block according to the data block index information and the sub-block index information; and
encoding the target data block according to the determined encoding mode.
In one embodiment, the data block index information includes any one or more of: the distortion estimation parameter of the target data block, the spatial information parameter of the target data block, and the temporal information parameter of the target data block; and
the sub-block index information includes N items of sub-block index data, where the i-th item includes any one or more of: the distortion estimation parameter of the i-th sub-data block among the N sub-data blocks, the spatial information parameter of the i-th sub-data block, and the temporal information parameter of the i-th sub-data block, i ∈ [1, N].
In one embodiment, when loaded by the processor 901, the computer instructions in the memory 902 specifically perform the following steps:
inputting the data block index information and the sub-block index information into the joint statistical model;
obtaining the output value the joint statistical model computes from them; and
determining the encoding mode for the target data block according to the output value.
In one embodiment, when loaded by the processor 901, the computer instructions specifically perform the following steps:
if the output value satisfies the first condition, determining that the encoding mode is the first mode; and
dividing the target data block into N sub-data blocks, inputting each sub-data block into the encoder for encoding;
where the output value satisfying the first condition means that the output value is greater than the first division threshold.
In one embodiment, when loaded by the processor 901, the computer instructions specifically perform the following steps:
if the output value satisfies the second condition, determining that the encoding mode is the second mode; and
inputting the target data block into the encoder for encoding;
where the output value satisfying the second condition means that the output value is smaller than the second division threshold.
In one embodiment, when loaded by the processor 901, the computer instructions specifically perform the following steps:
if the output value satisfies the third condition, performing scene complexity analysis on the M associated data blocks related to the target data block to obtain their associated block index information, M being a positive integer; and
determining the encoding mode for the target data block according to the associated block index information;
where the output value satisfying the third condition means that the output value is less than or equal to the first division threshold and greater than or equal to the second division threshold.
In one embodiment, when loaded by the processor 901, the computer instructions specifically perform the following steps:
acquiring the associated block index information, which includes the first number of associated data blocks, among the M associated data blocks, that are divided into multiple sub-data blocks for encoding;
if the first number satisfies the fourth condition, dividing the target data block into N sub-data blocks to be analyzed;
performing scene complexity analysis on the N sub-data blocks to be analyzed to determine their to-be-analyzed sub-data block index information; and
determining the encoding mode for the target data block according to that information;
where the first number satisfying the fourth condition means that the first number is greater than or equal to the first number threshold.
In one embodiment, when loaded by the processor 901, the computer instructions specifically perform the following steps:
acquiring the to-be-analyzed sub-data block index information, which includes the second number of sub-data blocks to be analyzed, among the N, that satisfy the further-division condition;
if the second number satisfies the fifth condition, determining that the encoding mode is the first mode;
if the second number does not satisfy the fifth condition, inputting the target data block into the encoder for encoding, and determining the encoding mode from the encoding information of the target data block thus obtained;
where the second number satisfying the fifth condition means that it is greater than or equal to the second number threshold, and not satisfying it means that it is smaller than that threshold.
In one embodiment, when loaded by the processor 901, the computer instructions further perform the following steps:
if the first number does not satisfy the fourth condition, inputting the target data block into the encoder for encoding; and
determining the encoding mode from the encoding information of the target data block thus obtained;
where the first number not satisfying the fourth condition means that the first number is smaller than the first number threshold.
In one embodiment, the encoding information of the target data block includes the encoding distortion parameter of the target data block; when loaded by the processor 901, the computer instructions in the memory 902 specifically perform the following steps:
calculating the encoding parameter from the encoding distortion parameter of the target data block and the quantization parameter used to quantize it;
if the encoding parameter satisfies the sixth condition, determining that the encoding mode is the second mode;
if the encoding parameter does not satisfy the sixth condition, dividing the target data block into N sub-data blocks and inputting each of them into the encoder for encoding; and
determining the encoding mode from the encoding information of the target data block and the encoding information of the N sub-data blocks thus obtained;
where the encoding parameter satisfying the sixth condition means that it is smaller than the third division threshold, and not satisfying it means that it is greater than or equal to the third division threshold.
In one embodiment, the encoding information of the target data block further includes the first rate-distortion loss parameter of the target data block, and the encoding information of the N sub-data blocks includes their second rate-distortion loss parameter, calculated from the third rate-distortion loss parameter of each of the N sub-data blocks; when loaded by the processor 901, the computer instructions specifically perform the following steps:
determining that the encoding mode for the target data block is the first mode if the first rate-distortion loss parameter is greater than or equal to the second; and
determining that the encoding mode for the target data block is the second mode if the first rate-distortion loss parameter is smaller than the second.
In the embodiments of this application, scene complexity analysis of the target data block to be encoded in the target video frame yields analysis results both for the whole target data block and for the multiple sub-data blocks into which it is divided; selecting the encoding mode on the basis of these results determines an appropriate encoding mode fairly accurately for the target data block currently to be encoded, which effectively improves the encoding speed of the target data block and the video coding efficiency.
According to one aspect of this application, a computer program product or computer program is provided, comprising computer instructions stored in a computer-readable storage medium; the processor of a computer device reads the computer instructions from the storage medium and executes them, causing the computer device to perform the video processing methods provided in the various optional manners above.
A person of ordinary skill in the art can understand that all or part of the steps of the methods in the above embodiments can be completed by instructing the relevant hardware through a program, which may be stored in a computer-readable storage medium; the computer-readable storage medium may include a flash disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, or the like.
What is disclosed above is merely a preferred embodiment of this application and certainly cannot limit the scope of its rights. A person of ordinary skill in the art can understand all or part of the processes for implementing the above embodiments, and equivalent changes made according to the claims of this application still fall within the scope covered by this application.

Claims (14)

  1. A video processing method, performed by a smart device, the method comprising:
    acquiring a target video frame from a video to be encoded, and determining a target data block to be encoded from the target video frame;
    performing scene complexity analysis on the target data block to obtain data block index information;
    dividing the target data block into N sub-data blocks, and performing scene complexity analysis on each sub-data block respectively to obtain sub-block index information, N being an integer greater than or equal to 2;
    determining an encoding mode for the target data block according to the data block index information and the sub-block index information; and
    encoding the target data block according to the determined encoding mode.
  2. The method according to claim 1, wherein the data block index information comprises any one or more of: a distortion estimation parameter of the target data block, a spatial information parameter of the target data block, and a temporal information parameter of the target data block; and
    the sub-block index information comprises N items of sub-block index data, wherein the i-th item of the N items comprises any one or more of: a distortion estimation parameter of the i-th sub-data block among the N sub-data blocks, a spatial information parameter of the i-th sub-data block, and a temporal information parameter of the i-th sub-data block, i ∈ [1, N].
  3. The method according to claim 1, wherein the determining an encoding mode for the target data block according to the data block index information and the sub-block index information comprises:
    inputting the data block index information and the sub-block index information into a joint statistical model;
    obtaining an output value computed by the joint statistical model from the data block index information and the sub-block index information; and
    determining the encoding mode for the target data block according to the output value.
  4. The method according to claim 3, wherein the determining the encoding mode for the target data block according to the output value comprises:
    if the output value satisfies a first condition, determining that the encoding mode is a first mode; and
    the encoding the target data block according to the determined encoding mode comprises:
    dividing the target data block into the N sub-data blocks, and inputting each sub-data block into an encoder for encoding;
    wherein the output value satisfying the first condition means that the output value is greater than a first division threshold.
  5. The method according to claim 3, wherein the determining the encoding mode for the target data block according to the output value comprises:
    if the output value satisfies a second condition, determining that the encoding mode is a second mode; and
    the encoding the target data block according to the determined encoding mode comprises:
    inputting the target data block into an encoder for encoding;
    wherein the output value satisfying the second condition means that the output value is smaller than a second division threshold.
  6. The method according to claim 3, wherein the determining the encoding mode for the target data block according to the output value comprises:
    if the output value satisfies a third condition, performing scene complexity analysis on M associated data blocks related to the target data block to obtain associated block index information of the M associated data blocks, M being a positive integer; and
    determining the encoding mode for the target data block according to the associated block index information;
    wherein the output value satisfying the third condition means that the output value is less than or equal to the first division threshold and greater than or equal to the second division threshold.
  7. The method according to claim 6, wherein the determining the encoding mode for the target data block according to the associated block index information comprises:
    acquiring the associated block index information, which comprises a first number of associated data blocks, among the M associated data blocks, that are divided into multiple sub-data blocks for encoding;
    if the first number satisfies a fourth condition, dividing the target data block into N sub-data blocks to be analyzed;
    performing scene complexity analysis on the N sub-data blocks to be analyzed to determine their to-be-analyzed sub-data block index information; and
    determining the encoding mode for the target data block according to the to-be-analyzed sub-data block index information;
    wherein the first number satisfying the fourth condition means that the first number is greater than or equal to a first number threshold.
  8. The method according to claim 7, wherein the determining the encoding mode for the target data block according to the to-be-analyzed sub-data block index information comprises:
    acquiring the to-be-analyzed sub-data block index information, which comprises a second number of sub-data blocks to be analyzed, among the N sub-data blocks to be analyzed, that satisfy a further-division condition;
    if the second number satisfies a fifth condition, determining that the encoding mode is the first mode;
    if the second number does not satisfy the fifth condition, inputting the target data block into an encoder for encoding, and determining the encoding mode for the target data block according to encoding information of the target data block obtained by encoding;
    wherein the second number satisfying the fifth condition means that the second number is greater than or equal to a second number threshold, and the second number not satisfying the fifth condition means that the second number is smaller than the second number threshold.
  9. The method according to claim 7, wherein the determining the encoding mode for the target data block according to the associated block index information further comprises:
    if the first number does not satisfy the fourth condition, inputting the target data block into an encoder for encoding; and
    determining the encoding mode for the target data block according to encoding information of the target data block obtained by encoding;
    wherein the first number not satisfying the fourth condition means that the first number is smaller than the first number threshold.
  10. The method according to claim 8 or 9, wherein the encoding information of the target data block comprises an encoding distortion parameter of the target data block, and the determining the encoding mode for the target data block according to the encoding information of the target data block obtained by encoding comprises:
    calculating an encoding parameter from the encoding distortion parameter of the target data block and a quantization parameter used to quantize the target data block;
    if the encoding parameter satisfies a sixth condition, determining that the encoding mode is the second mode;
    if the encoding parameter does not satisfy the sixth condition, dividing the target data block into the N sub-data blocks, and inputting each of the N sub-data blocks into the encoder for encoding; and
    determining the encoding mode for the target data block according to the encoding information of the target data block and encoding information of the N sub-data blocks obtained by encoding;
    wherein the encoding parameter satisfying the sixth condition means that the encoding parameter is smaller than a third division threshold, and the encoding parameter not satisfying the sixth condition means that the encoding parameter is greater than or equal to the third division threshold.
  11. The method according to claim 10, wherein the encoding information of the target data block further comprises a first rate-distortion loss parameter of the target data block; the encoding information of the N sub-data blocks comprises a second rate-distortion loss parameter of the N sub-data blocks, the second rate-distortion loss parameter being calculated from a third rate-distortion loss parameter of each of the N sub-data blocks; and the determining the encoding mode for the target data block according to the encoding information of the target data block and the encoding information of the N sub-data blocks comprises:
    if the first rate-distortion loss parameter is greater than or equal to the second rate-distortion loss parameter, determining that the encoding mode for the target data block is the first mode; and
    if the first rate-distortion loss parameter is smaller than the second rate-distortion loss parameter, determining that the encoding mode for the target data block is the second mode.
  12. A video processing apparatus, comprising:
    an acquiring unit, configured to acquire a target video frame from a video to be encoded, and determine a target data block to be encoded from the target video frame; and
    a processing unit, configured to perform scene complexity analysis on the target data block to obtain data block index information; divide the target data block into N sub-data blocks and perform scene complexity analysis on each sub-data block respectively to obtain sub-block index information, N being an integer greater than or equal to 2; determine an encoding mode for the target data block according to the data block index information and the sub-block index information; and encode the target data block according to the determined encoding mode.
  13. A smart device, comprising:
    a processor, adapted to implement a computer program; and
    a memory storing a computer program which, when run by the processor, implements the video processing method according to any one of claims 1 to 11.
  14. A computer-readable storage medium, storing a computer program which, when read and run by a processor, implements the video processing method according to any one of claims 1 to 11.
PCT/CN2021/128311 2020-11-09 2021-11-03 Video processing method, video processing apparatus, smart device and storage medium WO2022095871A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US17/957,071 US20230023369A1 (en) 2020-11-09 2022-09-30 Video processing method, video processing apparatus, smart device, and storage medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011239333.1A 2020-11-09 2020-11-09 Video processing method, video processing apparatus, smart device and storage medium CN112104867B (zh)
CN202011239333.1 2020-11-09

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/957,071 Continuation US20230023369A1 (en) 2020-11-09 2022-09-30 Video processing method, video processing apparatus, smart device, and storage medium

Publications (1)

Publication Number Publication Date
WO2022095871A1 true WO2022095871A1 (zh) 2022-05-12

Family

ID=73785176

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/128311 WO2022095871A1 (zh) 2020-11-09 2021-11-03 Video processing method, video processing apparatus, smart device, and storage medium

Country Status (3)

Country Link
US (1) US20230023369A1 (zh)
CN (1) CN112104867B (zh)
WO (1) WO2022095871A1 (zh)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112104867B (zh) 2020-11-09 2021-03-02 腾讯科技(深圳)有限公司 Video processing method, video processing apparatus, smart device, and storage medium
CN113286120A (zh) 2021-05-13 2021-08-20 深圳地理人和科技有限公司 Video analysis and processing method, storage medium, and apparatus

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TW200935916A (en) * 2007-12-05 2009-08-16 Onlive Inc System and method for compressing video by adjusting tile size based on detected intraframe motion or scene complexity
CN104883566B (zh) 2015-05-27 2018-06-12 复旦大学 Fast algorithm for intra prediction block size partitioning suitable for the HEVC standard
CN105681812B (zh) 2016-03-30 2019-11-19 腾讯科技(深圳)有限公司 HEVC intra-frame coding processing method and apparatus
CN106961606B (zh) 2017-01-26 2020-04-07 浙江工业大学 HEVC intra coding mode selection method based on texture partition features
CN107690069B (zh) 2017-08-28 2021-01-01 中国科学院深圳先进技术研究院 Data-driven cascaded video coding method

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170208334A1 (en) * 2016-01-19 2017-07-20 Samsung Electronics Co., Ltd Method and apparatus for processing image data
CN108322747A * 2018-01-05 2018-07-24 中国软件与技术服务股份有限公司 Coding unit partition optimization method for ultra-high-definition video
CN111107344A * 2018-10-26 2020-05-05 西安科锐盛创新科技有限公司 Video image encoding method and apparatus
CN111741313A * 2020-05-18 2020-10-02 杭州电子科技大学 Fast 3D-HEVC CU partitioning method based on image-entropy k-means clustering
CN111818332A * 2020-06-09 2020-10-23 复旦大学 Fast algorithm for intra prediction partition decision suitable for the VVC standard
CN112104867A * 2020-11-09 2020-12-18 腾讯科技(深圳)有限公司 Video processing method, video processing apparatus, smart device, and storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
KIM HYO-SONG; PARK RAE-HONG: "Fast CU Partitioning Algorithm for HEVC Using an Online-Learning-Based Bayesian Decision Rule", IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, IEEE, USA, vol. 26, no. 1, 1 January 2016 (2016-01-01), USA, pages 130 - 138, XP011592164, ISSN: 1051-8215, DOI: 10.1109/TCSVT.2015.2444672 *
NIU ZHIGUO; LIANG JIUZHEN; WU QIN: "Moving Object Segmentation Method Based on Block in HEVC Compressed Domain", COMPUTER ENGINEERING AND APPLICATIONS, HUABEI JISUAN JISHU YANJIUSUO, CN, vol. 52, no. 14, 29 September 2015 (2015-09-29), CN, pages 202 - 208, XP055927224, ISSN: 1002-8331, DOI: 10.3778/j.issn.1002-8331.1506-0050 *

Also Published As

Publication number Publication date
CN112104867A (zh) 2020-12-18
US20230023369A1 (en) 2023-01-26
CN112104867B (zh) 2021-03-02

Similar Documents

Publication Publication Date Title
WO2022095871A1 (zh) Video processing method, video processing apparatus, smart device, and storage medium
WO2018090774A1 (zh) Bitrate control and version selection method and system for dynamic adaptive video streaming
WO2021129007A1 (zh) Video bitrate determination method and apparatus, computer device, and storage medium
WO2023142716A1 (zh) Encoding method, real-time communication method, apparatus, device, and storage medium
WO2023134523A1 (zh) Content-adaptive video encoding method, apparatus, device, and storage medium
CN113473148A (zh) Computing system for video encoding and video encoding method
CN111294591B (zh) Video information processing method, multimedia information processing method, and apparatus
WO2022000298A1 (en) Reinforcement learning based rate control
CN112468816B (zh) Method for building a constant-rate-factor prediction model and for video encoding
Song et al. Remote display solution for video surveillance in multimedia cloud
US20220408097A1 (en) Adaptively encoding video frames using content and network analysis
CN113191945A (zh) Energy-efficient image super-resolution system and method for heterogeneous platforms
Bouaafia et al. VVC In-Loop Filtering Based on Deep Convolutional Neural Network
US20230018087A1 (en) Data coding method and apparatus, and computer-readable storage medium
WO2023078204A1 (zh) Data processing method, apparatus, device, readable storage medium, and program product
Zheng-Jie et al. Fast intra partition and mode prediction for equirectangular projection 360-degree video coding
CN115002452A (zh) Video processing method, video processing apparatus, electronic device, and storage medium
WO2017096947A1 (zh) Real-time control method and apparatus for real-time transcoding
US11445200B2 (en) Method and system for processing video content
US11546597B2 (en) Block-based spatial activity measures for pictures
CN115941971A (зh) Video processing method and apparatus, computer device, storage medium, and program product
CN115243042A (зh) Quantization parameter determination method and related apparatus
CN116527940A (зh) Video encoding method and apparatus, computer device, and medium
CN117176729A (зh) Client selection method, device, and storage medium for federated learning
CN115988207A (зh) Video encoding method and apparatus, electronic device, and medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21888568

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 25-09-2023)