CN114567778B - Video coding method and system - Google Patents

Video coding method and system

Info

Publication number
CN114567778B
Authority
CN
China
Prior art keywords
interest
pixel
coding
pixels
representing
Prior art date
Legal status
Active
Application number
CN202210450047.2A
Other languages
Chinese (zh)
Other versions
CN114567778A (en)
Inventor
黄震坤
岑裕
Current Assignee
Beijing Yunzhong Rongxin Network Technology Co ltd
Original Assignee
Beijing Yunzhong Rongxin Network Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Beijing Yunzhong Rongxin Network Technology Co ltd filed Critical Beijing Yunzhong Rongxin Network Technology Co ltd
Priority to CN202210450047.2A
Publication of CN114567778A
Application granted
Publication of CN114567778B
Legal status: Active
Anticipated expiration

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/134: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/146: Data rate or code amount at the encoder output
    • H04N19/149: Data rate or code amount at the encoder output by estimating the code amount by means of a model, e.g. mathematical model or statistical model
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/134: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/167: Position within a video image, e.g. region of interest [ROI]

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • Algebra (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The invention relates to the technical field of multimedia video image information processing and discloses a video coding method and system. The video coding method comprises the following steps: converting an input video stream into an RGB image; performing target detection and motion detection on the RGB image to identify target pixels and motion pixels of the RGB image; performing fusion processing on the target pixels and the motion pixels to determine pixels of interest and pixels of non-interest in the RGB image; determining a region of interest and a non-region-of-interest according to the distribution of the pixels of interest and the pixels of non-interest; and allocating coding rates to the region of interest and the non-region-of-interest according to a set target code rate. The invention keeps the QP of the region of interest smaller and thereby improves the definition of the region of interest.

Description

Video coding method and system
Technical Field
The present invention relates to the field of multimedia video image information processing technologies, and in particular, to a video encoding method and system.
Background
Currently, more and more conferences are moving from offline to online. Network conferences generally require high-definition image quality, while network bandwidth changes continuously because of network complexity, which brings more and more challenges to video compression and transmission. High-definition and ultra-high-definition video compression coding is therefore an indispensable technical means, and the performance and complexity of video compression coding directly influence the application range and potential of high-definition and ultra-high-definition video.
Therefore, increasing the compression ratio of video coding and reducing its complexity while maintaining a certain video quality is a generally pursued goal. In the prior art, coding standards such as H.264, HEVC and VVC have appeared in succession, but the coding speed still cannot meet the actual requirements of high-definition and ultra-high-definition video compression. Chinese patent publication No. CN113115043A discloses a video encoder, a video encoding system and a video encoding method, which mainly adopt a scheme in which each frame of image is encoded jointly by a plurality of video encoders, thereby reducing encoding time, reducing encoding delay and realizing real-time encoding of a high-definition video source.
Real-time audio and video technology is a terminal service that provides the industry with full-scene, fully interactive and fully real-time audio and video services featuring high concurrency, low delay, high-definition smoothness, safety and reliability. However, with real-time audio and video, decoding and playback must still proceed under high-definition video coding even when bandwidth is limited. If the encoder increases the QP, the picture becomes blurred; region-of-interest-based encoding is one of the methods for solving this kind of problem. For example, Chinese patent publication No. CN106162177A discloses a video encoding method and apparatus that determine the region of interest by identifying moving objects and perform high-fidelity encoding by smoothing filtering; as another example, Chinese patent publication No. CN103297754A discloses a surveillance-video adaptive region-of-interest coding system that uses ROI detection and H.264 coding to achieve a compromise between H.264-based data compression and high-quality storage of key information. Therefore, in the process of video compression and transmission, how to maintain the definition of the region of interest while the code rate remains unchanged, so as to reduce network bandwidth occupation and allow users to enjoy watching ultra-high-definition video at low bandwidth/network speed, has become a problem to be solved urgently.
Disclosure of Invention
In view of the above defects or shortcomings in the prior art, the present invention provides a video encoding method and system that extract the region of interest in a video by combining target detection with motion detection and then allocate the coding rate based on game theory.
In an aspect of the present invention, there is provided a video encoding method, including:
converting an input video stream into an RGB image;
performing target detection and motion detection on the RGB image to identify target pixels and motion pixels of the RGB image;
performing fusion processing on the target pixel and the motion pixel to determine an interested pixel and a non-interested pixel in the RGB image;
determining an interested area and a non-interested area according to the distribution of the interested pixels and the non-interested pixels;
and distributing coding rates for the interested regions and the non-interested regions according to the set target code rate.
Further, the step of allocating coding rate to the interested region and the non-interested region according to the set target coding rate comprises:
calculating the code rate of the region of interest as the value that minimizes the rate-allocation objective, wherein D1 is the R-D function of the region of interest and D2 is the R-D function of the non-region-of-interest; a weight represents the overall coding quality; M represents the number of coding tree units of the region of interest; N represents the number of coding tree units of the non-region-of-interest; the objective further depends, for the i-th coding tree unit, on its coding complexity (together with the coding complexity of the (i-1)-th coding tree unit), its number of bits per pixel and its number of pixels, as well as on the set target code rate, the code rate of the region of interest, a first constant, a second constant, model parameters with respective initial settings, the natural logarithms of the corresponding model terms, the number of bits (or the total number of pixels) occupied by the compressed RGB image, and the number of bits actually consumed.
Further, the motion detection of the RGB image includes:
taking a Gaussian mixture model (GMM) as a background model of a static scene without intruding objects, and taking the pixels in the current RGB image that do not match the background model as the motion pixels.
Further, the fusion process includes:
if the pixel in the RGB image belongs to the target pixel and the motion pixel at the same time, the pixel is judged as the interested pixel.
Further, the step of determining the regions of interest and the regions of non-interest based on the distribution of the pixels of interest and the pixels of non-interest comprises:
if the proportion of the interested pixels in all the pixels of the coding block exceeds or equals to a set proportion threshold value, the coding block is an interested area, otherwise, the coding block is a non-interested area.
In another aspect of the present invention, there is provided a video encoding system including:
a conversion module configured to convert an input video stream into an RGB image;
a detection module configured to perform target detection and motion detection on the RGB image to identify target pixels and motion pixels of the RGB image;
a fusion module configured to perform fusion processing on the target pixel and the motion pixel to determine a pixel of interest and a non-pixel of interest in the RGB image;
a determination module configured to determine regions of interest and regions of non-interest from the distribution of the pixels of interest and the pixels of non-interest;
and the code rate allocation module is configured to allocate coding code rates to the interested region and the non-interested region according to the set target code rate.
Further, the code rate allocation module is further configured to:
calculate the code rate of the region of interest as the value that minimizes the rate-allocation objective, wherein D1 is the R-D function of the region of interest and D2 is the R-D function of the non-region-of-interest; a weight represents the overall coding quality; M represents the number of coding tree units of the region of interest; N represents the number of coding tree units of the non-region-of-interest; the objective further depends, for the i-th coding tree unit, on its coding complexity (together with the coding complexity of the (i-1)-th coding tree unit), its number of bits per pixel and its number of pixels, as well as on the set target code rate, the code rate of the region of interest, a first constant, a second constant, model parameters with respective initial settings, the natural logarithms of the corresponding model terms, the number of bits (or the total number of pixels) occupied by the compressed RGB image, and the number of bits actually consumed.
Further, the detection module is further configured to:
taking a Gaussian mixture model (GMM) as a background model of a static scene without intruding objects, and taking the pixels in the current RGB image that do not match the background model as the motion pixels.
Further, the fusion module is further configured to:
if a pixel in the RGB image belongs to both the target pixel and the motion pixel, the pixel is determined to be the pixel of interest.
Further, the determination module is further configured to:
if the proportion of the interested pixel in all the pixels of the coding block exceeds or equals to a set proportion threshold value, the coding block is an interested area, otherwise, the coding block is a non-interested area.
According to the video coding method and system, the interesting region in the video is extracted by adopting a mode of combining target detection and motion detection, and the coding code rate is distributed by adopting a mode of combining game theory, so that the QP of the interesting region is smaller, and the definition of the interesting region is improved.
Drawings
Other features, objects and advantages of the present invention will become more apparent upon reading of the detailed description of non-limiting embodiments thereof, made with reference to the following drawings:
fig. 1 is a flowchart of a video encoding method according to an embodiment of the present invention;
fig. 2 is a schematic diagram of a video coding system according to an embodiment of the present invention;
fig. 3 is a schematic composition diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The terminology used in the embodiments of the invention is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in the description of the invention and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should be understood that although the terms first, second, third, etc. may be used to describe the acquisition modules in embodiments of the present invention, these acquisition modules should not be limited to these terms. These terms are only used to distinguish the acquisition modules from each other.
The word "if" as used herein may be interpreted as "at … …" or "when … …" or "in response to a determination" or "in response to a detection", depending on the context. Similarly, the phrase "if determined" or "if detected (a stated condition or event)" may be interpreted as "upon determining" or "in response to determining" or "upon detecting (a stated condition or event)" or "in response to detecting (a stated condition or event)" depending on the context.
It should be noted that the terms "upper," "lower," "left," "right," and the like used in the description of the embodiments of the present invention are illustrated in the drawings, and should not be construed as limiting the embodiments of the present invention. In addition, in this context, it is also to be understood that when an element is referred to as being "on" or "under" another element, it can be directly formed on "or" under "the other element or be indirectly formed on" or "under" the other element through an intermediate element.
One embodiment of the present invention provides a video coding method, which can greatly reduce network bandwidth occupation by combining a region of interest ROI with a video coding technology, so that a user can enjoy watching of an ultra high definition video at a low bandwidth/network speed.
Referring to fig. 1, the video encoding method of the present embodiment includes two parts, namely, target detection and rate allocation, and specifically includes the following steps:
step S101, converting an input video stream into an RGB image;
specifically, in this embodiment, an original yuv video stream is acquired from a camera, and a video file is defined as input. A 264 encoder used in WebRTC and open source OpenH264 provided by cisco will be described as an example. In the encoder, input.yuv video data input is converted into an RGB image.
Step S102, carrying out target detection and motion detection on the RGB image to identify target pixels and motion pixels of the RGB image;
object detection is an image segmentation based on object geometry and statistical features. The method combines the segmentation and the identification of the target into a whole, and the accuracy and the real-time performance of the method are important capabilities of the whole system. The present embodiment adopts the YOLOv4 model for target detection. The YOLOv4 model designs a powerful and efficient detection model, and the model can be trained by 1080 Ti and 2080 Ti, which is an ultrafast and accurate model. In the detection model training stage, the detection model verifies the effects of some most advanced Bag-of-freebes and Bag-of-Specials methods, and a plurality of SOTA methods are modified to make the single GPU training more efficient, such as CBN, PAN, SAM and the like. A complete YOLOv4 model includes: CSPDarknet53 (backbone) + SPP + PAN (Neck, i.e. feature enhancement module) + YoloV 3. The YOLOv4 model uses "comp" techniques of CutMix, Mosaic data enhancement, DropBlock regularization, label smoothing, CIoU-loss, CmBN, self-confrontation training, each object assigned to multiple anchors. The "special" skills used include: mish activation, cross-phase space Connectivity (CSP), multiple-input-weight residual connectivity, SPP-block, SAM-block, PAN, DIoU-NMS. The input to YOLOv4 is the original image and the output is the detected target pixel.
In this embodiment, motion detection extracts the motion region with a Gaussian mixture model. A Gaussian mixture model is a probability model that represents a distribution as a mixture of K sub-distributions; in other words, it represents the probability distribution of the observed data in the population as a mixture composed of K sub-distributions. The Gaussian mixture model does not require the observed data to indicate which sub-distribution they come from in order to compute their probability under the overall distribution. It can be regarded as a combination of K single Gaussian models, which act as the hidden variables of the mixture. In principle, any probability distribution can be used as the mixture component; Gaussians are used here because of their good mathematical properties and computational performance.
The invention adopts a Gaussian mixture model (GMM) to detect the motion region. In a monitoring system the shooting background is a fixed scene with little change, and a static scene without intruding objects has regular characteristics that can be described by a background model. The GMM describes the background features as a weighted sum of several mixed Gaussian models, i.e., it serves as the background model. Pixels in the current RGB image that do not match the background model are taken as motion pixels, i.e., they identify intruding objects; pixels in the current RGB image that match the background model are taken as background pixels.
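A minimal sketch of the mixture-of-Gaussians background subtraction described above, using OpenCV's built-in MOG2 background subtractor; the history length and variance threshold are assumptions.

```python
import cv2

# MOG2 maintains a per-pixel mixture of Gaussians as the background model.
bg_model = cv2.createBackgroundSubtractorMOG2(history=500, varThreshold=16, detectShadows=False)

def motion_pixel_mask(rgb):
    """Return a boolean mask of pixels that do not match the background model."""
    fg = bg_model.apply(rgb)   # 255 where the pixel deviates from the learned background
    return fg > 0
```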
Step S103, carrying out fusion processing on the target pixel and the motion pixel to determine an interested pixel and a non-interested pixel in the RGB image;
specifically, a binary fusion mode is adopted as a region fusion method, that is, if a pixel in an RGB image belongs to both a target pixel and a motion pixel, the pixel is determined as an interested pixel. In other words, for a pixel, if the pixel belongs to both the target pixel detected by the target detection model and the motion pixel detected by the gaussian mixture model, the pixel is determined as the pixel of interest, otherwise, the pixel is determined as the non-pixel of interest.
Step S104, determining an interested area and a non-interested area according to the distribution of the interested pixels and the non-interested pixels;
specifically, since video encoding employs block-based compression, the block-based compression unit is not a single pixel, but is a 4 × 4, 8 × 8, or 16 × 16 block. In h.264, the macroblock 16 × 16 scheme is used for encoding and compression. There are situations where the pixel of interest cannot fully fill the macroblock, in which case a scaling threshold needs to be determined. When the proportion of the interested pixels of a macro block reaches or exceeds a set proportion threshold value, the macro block is considered to be the interested macro block or the interested area. The judgment rule of this embodiment is: if the pixel of interest percentage of a coding block exceeds or equals to 80% of the whole macroblock pixels, the coding block is defined as the macroblock of interest/region of interest.
And step S105, distributing coding rate to the interested region and the non-interested region according to the set target code rate.
The present embodiment employs a game theory based model in the rate allocation scheme.
The coding quality of the region of interest acts as the leader and the coding quality of the non-region-of-interest acts as the follower: under the set target code rate, the leader determines the code rate allocated to the region of interest and the follower determines the code rate allocated to the non-region-of-interest. The utility of the region of interest depends not only on the region itself but also affects the coding quality of the whole RGB image, while the non-region-of-interest can only achieve its optimal utility with the remaining code rate.
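The exact objective and R-D functions are those given by the formulas of this embodiment; purely to illustrate the leader-follower idea, the sketch below assumes a simple hyperbolic R-D model D(R) = c * R**(-k) for each region and finds the leader's code rate R1 numerically by minimizing a weighted sum of the two distortions under the target code rate, leaving the remainder to the follower. The model form, the constants and the use of SciPy are assumptions, not the formulas of the patent.

```python
from scipy.optimize import minimize_scalar

def allocate_rates(R_target, w=0.7, c1=1.0, k1=1.2, c2=1.0, k2=1.2):
    """Split R_target into (R1 for the ROI, R2 for the non-ROI).

    Assumed hyperbolic R-D model: D(R) = c * R**(-k) for each region.
    The ROI acts as the leader: it picks R1 to minimize the weighted
    overall distortion, and the non-ROI (follower) gets the remainder.
    """
    def overall_distortion(r1):
        d_roi = c1 * r1 ** (-k1)                 # distortion of the region of interest
        d_bg = c2 * (R_target - r1) ** (-k2)     # distortion of the non-region-of-interest
        return w * d_roi + (1.0 - w) * d_bg      # weight w favours the ROI

    eps = 1e-3 * R_target
    res = minimize_scalar(overall_distortion, bounds=(eps, R_target - eps), method="bounded")
    r1 = res.x
    return r1, R_target - r1

if __name__ == "__main__":
    r1, r2 = allocate_rates(R_target=2000.0)     # e.g. 2000 kbit/s total
    print(f"ROI rate: {r1:.1f}, non-ROI rate: {r2:.1f}")
```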
Specifically, the code rate R1 of the region of interest is computed as the value that minimizes the rate-allocation objective, for example by taking the partial derivative of the objective with respect to R1 and solving for the value at which the partial derivative equals zero. In this objective, D1 is the R-D function of the region of interest and D2 is the R-D function of the non-region-of-interest; a weight represents the overall coding quality; M represents the number of coding tree units of the region of interest; N represents the number of coding tree units of the non-region-of-interest; the R-D terms are written separately for the region of interest and for the non-region-of-interest. For the i-th coding tree unit, the model uses its coding complexity (updated from the coding complexity of the (i-1)-th coding tree unit), its number of bits per pixel and its number of pixels; the coding complexity is a parameter that is continuously updated according to the video content. The set target code rate and the code rate R1 of the region of interest enter the model together with a first constant and a second constant, which in this embodiment are set to 0.1 and 0.05, respectively. One model parameter basically stabilizes around 1 and is set to 1 in practice; its initial value defaults to 1.367. The initial value of the coding complexity defaults to 3.2003. The parameter updates use the natural logarithms of the corresponding model terms, the number of bits (or the total number of pixels) occupied by the compressed RGB image, and the number of bits actually consumed.
The video coding method provided by the embodiment can enable the QP of the region of interest to be smaller, and improves the definition of the region of interest.
Referring to fig. 2, another embodiment of the present invention further provides a video encoding system 200, which includes a conversion module 201, a detection module 202, a fusion module 203, a determination module 204, and a rate allocation module 205. The video coding system 200 is configured to perform the method steps in the above-described method embodiments.
Specifically, the method comprises the following steps:
a conversion module 201 configured to convert an input video stream into an RGB image;
a detection module 202 configured to perform target detection and motion detection on the RGB image to identify target pixels and motion pixels of the RGB image;
a fusion module 203 configured to perform fusion processing on the target pixel and the motion pixel to determine a pixel of interest and a non-pixel of interest in the RGB image;
a determination module 204 configured to determine regions of interest and regions of non-interest based on the distribution of the pixels of interest and the pixels of non-interest;
and the code rate allocation module 205 is configured to allocate coding code rates to the interested region and the non-interested region according to the set target code rate.
Further, the code rate allocation module 205 is further configured to compute the code rate R1 of the region of interest as the value that minimizes the rate-allocation objective, for example by taking the partial derivative of the objective with respect to R1 and solving for the value at which the partial derivative equals zero, wherein D1 is the R-D function of the region of interest and D2 is the R-D function of the non-region-of-interest; a weight represents the overall coding quality; M represents the number of coding tree units of the region of interest; N represents the number of coding tree units of the non-region-of-interest; for the i-th coding tree unit the objective uses its coding complexity (together with the coding complexity of the (i-1)-th coding tree unit), its number of bits per pixel and its number of pixels, as well as the set target code rate, the code rate of the region of interest, a first constant, a second constant, model parameters with respective initial settings, the natural logarithms of the corresponding model terms, the number of bits (or the total number of pixels) occupied by the compressed RGB image, and the number of bits actually consumed.
Further, the detection module 202 is configured to: take a Gaussian mixture model (GMM) as a background model of a static scene without intruding objects, and take the pixels in the current RGB image that do not match the background model as the motion pixels.
Further, the fusion module 203 is configured to: if the pixel in the RGB image belongs to the target pixel and the motion pixel at the same time, the pixel is judged as the interested pixel.
Further, the determination module 204 is configured to: if the proportion of the interested pixels in all the pixels of the coding block exceeds or equals to a set proportion threshold value, the coding block is an interested area, otherwise, the coding block is a non-interested area.
It should be noted that, the video coding system 200 provided in this embodiment is corresponding to a technical solution that can be used to implement each method embodiment, and the implementation principle and technical effect are similar to those of the method, and are not described herein again.
The invention further provides electronic equipment for executing the method embodiment. Referring now specifically to fig. 3, a schematic diagram of a structure suitable for implementing the electronic device 300 in the present embodiment is shown. The electronic device 300 in the present embodiment may include, but is not limited to, a mobile terminal such as a mobile phone, a notebook computer, a digital broadcast receiver, a PDA (personal digital assistant), a PAD (tablet), a PMP (portable multimedia player), a vehicle terminal (e.g., a car navigation terminal), a wearable electronic device, and the like, and a stationary terminal such as a digital TV, a desktop computer, a smart home device, and the like. The electronic device shown in fig. 3 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present invention.
As shown in fig. 3, the electronic device 300 may include a processing means (e.g., a central processing unit, a graphics processor, etc.) 301 that may perform various appropriate actions and processes to implement the methods of the various embodiments as described herein, according to a program stored in a Read Only Memory (ROM) 302 or a program loaded from a storage device 308 into a Random Access Memory (RAM) 303. In the RAM 303, various programs and data necessary for the operation of the electronic apparatus 300 are also stored. The processing device 301, the ROM 302, and the RAM 303 are connected to each other via a bus 304. An input/output (I/O) interface 305 is also connected to bus 304.
Generally, the following devices may be connected to the I/O interface 305: input devices 306 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, or the like; an output device 307 including, for example, a Liquid Crystal Display (LCD), a speaker, a vibrator, and the like; storage devices 308 including, for example, magnetic tape, hard disk, etc.; and a communication device 309. The communication means 309 may allow the electronic device 300 to communicate with other devices, wireless or wired, to exchange data. While fig. 3 illustrates an electronic device 300 having various means, it is to be understood that not all illustrated means are required to be implemented or provided, and that more or fewer means may be alternatively implemented or provided.
The above description is that of the preferred embodiment of the invention only. It will be appreciated by those skilled in the art that the scope of the disclosure herein is not limited to the particular combination of features described above, but also encompasses other embodiments in which any combination of the features described above or their equivalents is encompassed without departing from the spirit of the disclosure. For example, the above features and (but not limited to) features having similar functions disclosed in the present invention are mutually replaced to form the technical solution.

Claims (8)

1. A video encoding method, comprising:
converting an input video stream into an RGB image;
carrying out target detection and motion detection on the RGB image so as to identify target pixels and motion pixels of the RGB image;
performing fusion processing on the target pixel and the motion pixel to determine an interested pixel and a non-interested pixel in the RGB image;
determining regions of interest and regions of non-interest according to the distribution of the pixels of interest and the pixels of non-interest;
allocating coding rates to the interested region and the non-interested region according to a set target code rate, which specifically comprises the following steps:
calculating the code rate of the region of interest as the value that minimizes the rate-allocation objective, wherein D1 is the R-D function of the region of interest, D2 is the R-D function of the non-region-of-interest, and a weight represents the overall coding quality; the objective uses, for the i-th coding tree unit, its coding complexity (together with the coding complexity of the (i-1)-th coding tree unit), its number of bits per pixel and its number of pixels, and, for the j-th coding tree unit, its coding complexity (together with the coding complexity of the (j-1)-th coding tree unit), its number of bits per pixel and its number of pixels; it further uses the number of coding tree units of the non-region-of-interest, the set target code rate, the code rate of the region of interest, a first constant, a second constant, model parameters with respective initial settings, the natural logarithms of the corresponding model terms, the number of bits (or the total number of pixels) occupied by the compressed RGB image, and the number of bits actually consumed.
2. A video coding method according to claim 1, wherein the motion detection of the RGB image comprises:
taking a Gaussian mixture model GMM as a background model of a static scene without intruding objects; and taking pixels in the current RGB image which do not match the background model as motion pixels.
3. The video coding method according to claim 1, wherein the fusing the target pixel and the motion pixel comprises:
and if the pixel in the RGB image belongs to the target pixel and the motion pixel at the same time, judging the pixel as the pixel of interest.
4. The video coding method according to claim 1, wherein the step of determining regions of interest and regions of non-interest based on the distribution of the pixels of interest and the pixels of non-interest comprises:
if the ratio of the interested pixel in all the pixels of the coding block exceeds or is equal to a set proportion threshold value, the coding block is an interested area, otherwise, the coding block is a non-interested area.
5. A video coding system, comprising:
a conversion module configured to convert an input video stream into an RGB image;
a detection module configured to perform target detection and motion detection on the RGB image to identify target pixels and motion pixels of the RGB image;
a fusion module configured to perform fusion processing on the target pixel and the motion pixel to determine a pixel of interest and a non-pixel of interest in the RGB image;
a determination module configured to determine regions of interest and regions of non-interest from the distribution of the pixels of interest and the pixels of non-interest;
a code rate allocation module configured to allocate coding code rates to the region of interest and the non-region of interest according to a set target code rate, including:
calculating the code rate of the region of interest as the value that minimizes the rate-allocation objective, wherein D1 is the R-D function of the region of interest, D2 is the R-D function of the non-region-of-interest, and a weight represents the overall coding quality; the objective uses, for the i-th coding tree unit, its coding complexity (together with the coding complexity of the (i-1)-th coding tree unit), its number of bits per pixel and its number of pixels, and, for the j-th coding tree unit, its coding complexity (together with the coding complexity of the (j-1)-th coding tree unit), its number of bits per pixel and its number of pixels; it further uses the number of coding tree units of the non-region-of-interest, the set target code rate, the code rate of the region of interest, a first constant, a second constant, model parameters with respective initial settings, the natural logarithms of the corresponding model terms, the number of bits (or the total number of pixels) occupied by the compressed RGB image, and the number of bits actually consumed.
6. The video coding system of claim 5, wherein the detection module is further configured to:
taking a Gaussian mixture model GMM as a background model of a static scene without intruding objects; and taking pixels in the RGB image which do not match the background model as motion pixels.
7. A video coding system according to claim 5, wherein the fusion module is further configured to:
and if the pixel in the RGB image belongs to the target pixel and the motion pixel at the same time, determining the pixel as the pixel of interest.
8. The video coding system of claim 5, wherein the determining module is further configured to:
if the proportion of the interested pixel in all the pixels of the coding block exceeds or is equal to a set proportion threshold value, the coding block is an interested area, otherwise, the coding block is a non-interested area.
CN202210450047.2A 2022-04-24 2022-04-24 Video coding method and system Active CN114567778B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210450047.2A CN114567778B (en) 2022-04-24 2022-04-24 Video coding method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210450047.2A CN114567778B (en) 2022-04-24 2022-04-24 Video coding method and system

Publications (2)

Publication Number Publication Date
CN114567778A CN114567778A (en) 2022-05-31
CN114567778B (en) 2022-07-05

Family

ID=81721068

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210450047.2A Active CN114567778B (en) 2022-04-24 2022-04-24 Video coding method and system

Country Status (1)

Country Link
CN (1) CN114567778B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6061100A (en) * 1997-09-30 2000-05-09 The University Of British Columbia Noise reduction for video signals
CN101742321A (en) * 2010-01-12 2010-06-16 浙江大学 Layer decomposition-based Method and device for encoding and decoding video
CN101916448A (en) * 2010-08-09 2010-12-15 云南清眸科技有限公司 Moving object detecting method based on Bayesian frame and LBP (Local Binary Pattern)
CN107396108A (en) * 2017-08-15 2017-11-24 西安万像电子科技有限公司 Code rate allocation method and device

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8467570B2 (en) * 2006-06-14 2013-06-18 Honeywell International Inc. Tracking system with fused motion and object detection

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Video coding acceleration and intelligent bit allocation based on deep learning; Shi Jun (石隽); Excellent Master's Dissertations; 2021-01-15; full text *

Also Published As

Publication number Publication date
CN114567778A (en) 2022-05-31

Similar Documents

Publication Publication Date Title
US10944996B2 (en) Visual quality optimized video compression
CN111918066B (en) Video encoding method, device, equipment and storage medium
US9013536B2 (en) Augmented video calls on mobile devices
CN106303157B (en) Video noise reduction processing method and video noise reduction processing device
JP6109956B2 (en) Utilize encoder hardware to pre-process video content
CN112102212B (en) Video restoration method, device, equipment and storage medium
WO2021164216A1 (en) Video coding method and apparatus, and device and medium
CN109698957B (en) Image coding method and device, computing equipment and storage medium
CN111182303A (en) Encoding method and device for shared screen, computer readable medium and electronic equipment
CN102572502B (en) Selecting method of keyframe for video quality evaluation
US20190045203A1 (en) Adaptive thresholding for computer vision on low bitrate compressed video streams
Yang et al. An objective assessment method based on multi-level factors for panoramic videos
CN112954398B (en) Encoding method, decoding method, device, storage medium and electronic equipment
US11290345B2 (en) Method for enhancing quality of media
US20150117515A1 (en) Layered Encoding Using Spatial and Temporal Analysis
CN112435244A (en) Live video quality evaluation method and device, computer equipment and storage medium
CN111524110B (en) Video quality evaluation model construction method, evaluation method and device
WO2023160617A9 (en) Video frame interpolation processing method, video frame interpolation processing device, and readable storage medium
CN113784118A (en) Video quality evaluation method and device, electronic equipment and storage medium
CN110766637A (en) Video processing method, processing device, electronic equipment and storage medium
CN113068034A (en) Video encoding method and device, encoder, equipment and storage medium
CA3182110A1 (en) Reinforcement learning based rate control
CN114554211A (en) Content adaptive video coding method, device, equipment and storage medium
CN103929640A (en) Techniques For Managing Video Streaming
CN106603885B (en) Method of video image processing and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant