CN114401405A - Video coding method, medium and electronic equipment - Google Patents

Video coding method, medium and electronic equipment

Info

Publication number
CN114401405A
CN114401405A
Authority
CN
China
Prior art keywords
video
video data
image
frame
processing unit
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210042341.XA
Other languages
Chinese (zh)
Inventor
郑晓明
谭贤波
罗昭
门硕
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
ARM Technology China Co Ltd
Original Assignee
ARM Technology China Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by ARM Technology China Co Ltd filed Critical ARM Technology China Co Ltd
Priority to CN202210042341.XA priority Critical patent/CN114401405A/en
Publication of CN114401405A publication Critical patent/CN114401405A/en
Pending legal-status Critical Current

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/42Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/169Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/182Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being a pixel
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/42Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation
    • H04N19/423Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation characterised by memory arrangements
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/503Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
    • H04N19/51Motion estimation or motion compensation
    • H04N19/513Processing of motion vectors

Abstract

The present application relates to the field of video coding technologies, and in particular, to a video coding method, a medium, and an electronic device. The method comprises the following steps: an image signal processor acquires first video data; the image signal processor performs image processing on each frame of image in the first video data to obtain second video data, and stores intermediate parameters obtained in the image processing process in a memory; a video processing unit acquires the second video data from the image signal processor, acquires the intermediate parameters from the memory, and encodes the second video data based on the acquired intermediate parameters. Because the video processing unit can acquire the intermediate parameters directly from the memory, it no longer needs to recalculate parameters, such as motion vectors, that the image signal processor has already calculated, which shortens the video coding time and improves the video coding efficiency.

Description

Video coding method, medium and electronic equipment
Technical Field
The present application relates to the field of video coding technologies, and in particular, to a video coding method, a video coding medium, and an electronic device.
Background
With the continuous development and popularization of internet technology, cross-device video playback is increasingly common in daily life; for example, a video shot by a local device is sent to another electronic device for playing.
Generally, if video data is transmitted directly across devices without being encoded and compressed, the data volume is large and easily causes a long transmission delay. Therefore, in order to reduce the data volume during video transmission and avoid long transmission delays, the electronic device needs to encode the video with a video encoding and compression technology before transmitting it.
Disclosure of Invention
The embodiment of the application provides a video coding method, a video coding medium and electronic equipment.
In a first aspect, an embodiment of the present application provides a video encoding method, which is applied to an electronic device including an image signal processor, a video processing unit, and a memory, and includes:
the image signal processor acquires first video data;
the image signal processor performs image processing on each frame of image in the first video data to obtain second video data, and stores intermediate parameters obtained in the image processing process in the memory;
the video processing unit acquires the second video data from the image signal processor and encodes the second video data to obtain encoded video data, wherein,
the video processing unit encoding the second video data comprises:
the intermediate parameters are retrieved from the memory and the second video data is encoded based on the retrieved intermediate parameters.
Therefore, the video processing unit can acquire the intermediate parameters directly from the memory, so that it no longer needs to recalculate the motion vectors that the image signal processor has already calculated, which shortens the video coding time and improves the video coding efficiency.
In a possible implementation of the first aspect, the intermediate parameter comprises a motion vector.
In a possible implementation of the first aspect, the obtaining the motion vector from the memory and encoding the second video data based on the obtained motion vector includes:
the video processing unit searches, based on the motion vectors, for the reference blocks matching the current blocks in each two adjacent frames of images in the second video data;
processing the second video data to obtain difference value information of each two adjacent frames of images, wherein the difference value information comprises pixel difference values and motion vectors of all current blocks and reference blocks in each two adjacent frames of images in the second video data;
and the video processing unit encodes the I-frame image in the second video data, the coordinates of each reference block in the I-frame image and the difference value information of each two adjacent frames.
In one possible implementation of the first aspect, the finding, by the video processing unit, each reference block from each current block in each two adjacent frame images based on the motion vector includes:
the video processing unit carries out integer pixel motion estimation on the second video data based on the motion vector, and finds each first reference block matched with each current block in each two adjacent frames of images;
and the video processing unit carries out decimal pixel motion estimation on each first reference block and finds each second reference block matched with each current block in each two adjacent frame images.
In one possible implementation of the first aspect described above, the fractional pixel motion estimation comprises 1/4 pixel motion estimation and/or 1/2 pixel motion estimation.
In a possible implementation of the first aspect, the electronic device includes any one of:
laptop computers, desktop computers, tablets, cell phones, servers, wearable devices, portable game consoles, and televisions.
In one possible implementation of the first aspect described above, the image processing comprises image noise reduction processing.
In one possible implementation of the first aspect, the image denoising process includes a 3D denoising process.
In a second aspect, the present application provides a readable medium, on which instructions are stored, and when executed on an electronic device, the instructions cause the electronic device to perform the video encoding method of any one of the first aspect.
In a third aspect, an embodiment of the present application provides an electronic device, including:
a memory for storing instructions for execution by one or more processors of the electronic device, an
A processor, which is one of the processors of the electronic device, configured to perform the video encoding method of any of the first aspects.
Drawings
FIG. 1 illustrates a process diagram of motion vector calculation, according to some embodiments of the present application;
fig. 2 illustrates an application scenario diagram of a video data encoding method according to some embodiments of the present application;
fig. 3 shows a schematic structural diagram of a handset 100 adapted to the video coding method provided in the present application, according to some embodiments of the present application;
fig. 4 is a schematic flowchart illustrating the mobile phone 100 encoding the acquired video data by using the video encoding method provided in the present application according to some embodiments of the present application;
FIG. 5 illustrates a schematic diagram of the structure of an ISP102 according to some embodiments of the present application;
FIG. 6 is a diagram illustrating a process of processing image data by a general function module;
fig. 7 is a schematic diagram of a mobile phone 100 adapted to the video encoding method according to some embodiments of the present application.
Detailed Description
The illustrative embodiments of the present application include, but are not limited to, a video encoding method, medium, and electronic device.
To better illustrate the scheme of the embodiments of the present application, the terms "motion vector", "motion estimation" and "video coding" referred to in the embodiments of the present application are described below.
(a) Motion Vector (MV): an important parameter in video coding compression. A motion vector is the relative displacement between a current block in the current frame image and a reference block in the reference frame image of two adjacent frames. The current block and the reference block correspond to the same region of the same target object in the two adjacent frames; however, since the target object is likely to be in motion in a video, there may be a pixel difference between the target object in the current block and the target object in the reference block.
For example, fig. 1 shows a schematic diagram of a motion vector calculation process according to some embodiments of the present application. As shown in fig. 1, the reference frame image of the K-1th frame and the current frame image of the K-th frame are two adjacent frames in a video. The reference frame image of the K-1th frame contains a pixel block 1' representing the head of user 1; the pixel block 1' is the reference block, and the coordinates of its upper left corner A' are (i, j). The current frame image of the K-th frame contains a pixel block 1 representing the head of user 1; the pixel block 1 is the current block, and the coordinates of its upper left corner A are (x, y). The MV is the difference (i-x, j-y) between the coordinates (i, j) and (x, y). The sizes of the pixel blocks 1' and 1 are both NxN, where N is a number of pixels.
The reference block may be obtained by a block matching algorithm (BMA). The main idea is to divide a frame of image into pixel blocks of size NxN; for each block, a reference block satisfying a matching condition is then found among the pixel blocks within a search range of the previous frame according to a certain matching criterion. It will be appreciated that in other embodiments, the reference block may be derived by other algorithms.
(b) Motion Estimation (ME): the process of obtaining motion vectors is called motion estimation. The basic idea is to divide each frame of a video (an image sequence) into many non-overlapping pixel blocks and treat the displacement of all pixels within a block as the same. Then, for each pixel block, a block satisfying a matching condition, i.e., the reference block, is found in the reference frame within a given search range according to a certain matching criterion. The relative displacement between the reference block and the current block is the motion vector.
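As a concrete illustration of block-matching motion estimation, the following is a minimal full-search sketch in Python with NumPy; it is an assumed example rather than the patent's implementation, and the block size N, search range R, and the sum-of-absolute-differences (SAD) cost are choices made for the illustration.

    import numpy as np

    def block_match(cur_frame, ref_frame, x, y, N=16, R=8):
        """Find the MV (dx, dy) whose NxN reference block at (x+dx, y+dy)
        best matches the current block at (x, y), by exhaustive search."""
        cur_block = cur_frame[y:y + N, x:x + N].astype(np.int32)
        H, W = ref_frame.shape
        best_cost, best_mv = None, (0, 0)
        for dy in range(-R, R + 1):
            for dx in range(-R, R + 1):
                rx, ry = x + dx, y + dy
                if rx < 0 or ry < 0 or rx + N > W or ry + N > H:
                    continue  # candidate block would fall outside the reference frame
                ref_block = ref_frame[ry:ry + N, rx:rx + N].astype(np.int32)
                cost = np.abs(cur_block - ref_block).sum()  # SAD matching cost
                if best_cost is None or cost < best_cost:
                    best_cost, best_mv = cost, (dx, dy)
        return best_mv

In the fig. 1 example, a call such as block_match(cur_frame, ref_frame, x, y) would return (i-x, j-y) when the best match is the block whose upper left corner is at (i, j).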
(c) I frame
An I frame is a self-contained frame that carries all the information of one frame of image. Its encoding and decoding do not depend on any other image: an I frame can be encoded and decoded independently, and a complete image can be reconstructed from the I frame's data alone during decoding. Because an I frame carries all the information of one frame of image, its data volume is generally large, close to the size of a compressed still image, so its transmission takes longer. In general, the first frame in a video is an I frame.
(d) P frame
The coding of a P frame depends on a P frame or an I frame that precedes it: a P frame compresses and encodes the difference information between the frame of image corresponding to the P frame and the previous frame of image. One frame of image includes a plurality of current blocks, and the difference information includes the pixel differences and motion vectors between all current blocks and reference blocks in each two adjacent frames of images in the video data.
It will be appreciated that P-frames cannot be decoded independently and need to rely on a P-frame or I-frame preceding the P-frame for decoding. When decoding the P frame, the previous frame image and the difference information must be summed to reconstruct the complete P frame image. Since the P frame is encoded with the difference information, the data size of the P frame is generally small, and the P frame can be transmitted within one frame time.
(e) In the embodiment of the application, in the encoding stage, a video codec in an electronic device encodes the I-frame image, the coordinates of each reference block in the I-frame image, and the difference information of each two adjacent frames. In the decoding stage, a video codec in the electronic device may sum the I-frame image, the coordinates of each reference block in the I-frame image, and the difference information between each two adjacent frames to reconstruct a complete image of each frame. It is understood that since the frame images are encoded sequentially in time order when the video is encoded, they are also decoded sequentially in time order when the video is decoded. Therefore, each frame image in the video can be decoded from the I-frame image, the coordinates of each reference block in the I-frame image, and the difference information between each two adjacent frame images in the video.
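To make this encode/decode symmetry concrete, the following is a minimal per-block sketch of the difference information; it is an assumed Python/NumPy illustration, with the block size N and the function names being hypothetical.

    import numpy as np

    def encode_p_block(cur_frame, ref_frame, mv, x, y, N=16):
        """Difference information for one current block: the pixel residual
        against its motion-compensated reference block (the MV is stored too)."""
        dx, dy = mv
        cur_block = cur_frame[y:y + N, x:x + N].astype(np.int16)
        ref_block = ref_frame[y + dy:y + dy + N, x + dx:x + dx + N].astype(np.int16)
        return cur_block - ref_block

    def decode_p_block(residual, ref_frame, mv, x, y, N=16):
        """Reconstruction: sum the motion-compensated reference block and the residual."""
        dx, dy = mv
        ref_block = ref_frame[y + dy:y + dy + N, x + dx:x + dx + N].astype(np.int16)
        return np.clip(ref_block + residual, 0, 255).astype(np.uint8)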
The video coding method adopted by the application can be applied to any application scene related to video data transmission, including video call, real-time video conference, fixed or mobile video telephone, video monitoring, streaming media and other application scenes, but not limited thereto.
It can be understood that the video encoding method provided in the present application is applicable to any electronic device with video capturing and video processing functions, and the electronic device may include but is not limited to: laptop computers, desktop computers, tablets, cell phones, servers, wearable devices, portable game consoles, televisions, etc. The embodiments of the present application will be described in further detail below by taking a mobile phone as an example.
To make the objects, technical solutions and advantages of the present application more clear, embodiments of the present application will be described in further detail below with reference to the accompanying drawings.
For convenience of explaining the technical solution of the present application, an application scenario of a video call is taken as an example in the following description, but it is understood that the technical solution of the present application is also applicable to other application scenarios, and is not limited thereto.
A video coding scheme in a video call application scenario is as follows:
fig. 2 is a schematic diagram illustrating an application scenario of a video data encoding method according to some embodiments of the present application. As shown in fig. 2, a user 1 and a user 2 respectively use a mobile phone 100 and a mobile phone 200 to perform a video call, and in the video call process, the mobile phone 100 needs to encode an I-frame image in the collected video data including the user 1, coordinates of each reference block in the I-frame image, and difference information of each two adjacent frames, and send the encoded information to the mobile phone 200; similarly, the mobile phone 200 needs to encode the I-frame image in the captured video data including the user 2, the coordinates of each reference block in the I-frame image, and the difference information between each two adjacent frames, and send the encoded information to the mobile phone 100. The following will describe the technical solution of the present application by taking the mobile phone 100 as an example from video capture to video encoding.
Fig. 3 shows a schematic structural diagram of a handset 100 adapted to the video coding method provided in the present application, according to some embodiments of the present application. The specific scheme of the handset 100 from capturing video to encoding video will be described in conjunction with the structure of the handset 100 described in fig. 3.
The mobile phone 100 includes a camera 101, an image signal processor (ISP) 102, a video processing unit (VPU) 103, a display screen 104, and a memory 105.
Referring to the block diagram of the mobile phone 100 shown in fig. 3, a process from capturing a video to encoding the video of the mobile phone 100 is generally described as follows:
as shown in fig. 3, a camera 101 collects video data and transmits the video data to an Image Signal Processing (ISP) 102 for optimization, which may include, but is not limited to, any one or more of the following: noise reduction (noise), dead pixel Correction (BPC), Black Level Compensation (BLC), Automatic White Balance (AWB), Gamma Correction (Gamma Correction), Color Correction (Color Correction), edge enhancement, brightness, contrast, and chromaticity adjustment. The ISP102 sends the optimized Video data to a Video codec (Video Processing Unit, VPU)103 for encoding, and the Video Processing Unit 103 can send the encoded Video data to the mobile phone 200, so that the mobile phone 200 decodes the encoded Video data and displays the Video including the user 1 on the interface of the mobile phone 200.
It should be noted that in the application scenario shown in fig. 2, an intermediate result of the ISP 102's optimization of the video data (e.g., during 3D denoising) is the motion vector. Since optimization and encoding are two relatively independent processes, the motion vector is generally not transmitted to the VPU 103 along with the optimized video data; instead, the ISP 102 directly transmits only the optimized video data to the VPU 103.
Then, in the process of encoding the optimized video data received from the ISP102, the VPU103 needs to process the optimized video data to obtain an I-frame image, coordinates of each reference block in the I-frame image, and difference information of each two adjacent frames, where the difference information includes pixel differences and motion vectors of all current blocks and reference blocks in each two adjacent frames of the optimized video data, and then encodes the coordinates of each reference block in the I-frame image, and the difference information of each two adjacent frames.
Based on the above video encoding scheme, in the video data processing stage of video encoding, the VPU 103 needs to re-process the optimized video data with a large number of algorithms to find each reference block for each current block in each two adjacent frames of images; that is, it needs to recalculate the motion vectors, parameters that the ISP 102 has already calculated. It then processes each two adjacent frames of images in the optimized video data to obtain the difference information. As a result, video coding consumes more time, and video coding efficiency is low.
In order to solve the foregoing technical problem, an embodiment of the present application provides a faster video encoding method. The image signal processor stores the motion vectors, an intermediate result of the video data optimization process, in a memory. When encoding the optimized video data sent by the image signal processor, the video processing unit directly obtains the motion vectors from the memory, quickly finds the reference blocks matching each current block in each two adjacent frames of images based on the motion vectors, and processes the optimized video data to obtain the pixel differences between each current block and each reference block in each two adjacent frames of images. It then encodes the I-frame image, the coordinates of each reference block in the I-frame image, the pixel differences between each current block and each reference block in each two adjacent frames of images, and the directly obtained motion vectors of each two adjacent frames.
Therefore, in the process of processing the optimized video data to obtain the pixel difference value between each current block and each reference block in each two adjacent frames of images, the VPU103 can directly use the motion vector acquired from the memory 105 to quickly search the reference block corresponding to the current block, thereby reducing the search calculation amount, saving the search time, reducing the video coding time length and improving the video coding efficiency.
For example, corresponding to the application scenario shown in fig. 2 and the structure shown in fig. 3, the camera 101 acquires video data and transmits it to the ISP 102 for optimization. The ISP 102 stores the motion vectors, an intermediate result of the video data optimization process, in the memory 105. When encoding the optimized video data transmitted by the ISP 102, the VPU 103 directly obtains the motion vectors from the memory 105, quickly finds the reference blocks matching the current blocks in each two adjacent frames of images based on the motion vectors, and then processes the optimized video data to obtain the pixel differences between each current block and each reference block in each two adjacent frames of images. It then encodes the I-frame image, the coordinates of each reference block in the I-frame image, the pixel differences between each current block and each reference block in each two adjacent frames of images, and the directly obtained motion vectors of each two adjacent frames.
It will be appreciated that in other embodiments, the memory 105 may also store intermediate parameters other than motion vectors that are required by both the ISP 102 and the VPU 103, so that the VPU 103 can call those intermediate parameters directly during encoding. This likewise reduces the video coding time and improves the video coding efficiency.
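A minimal sketch of this hand-off, reusing the block_match function sketched earlier, is shown below; the class and method names are illustrative rather than from the patent, and the denoising and encoding internals are omitted.

    class SharedMemory:
        """Frame-indexed store for intermediate parameters (here, motion vectors).
        In the scheme above, the ISP has read-write access and the VPU read-only."""
        def __init__(self):
            self._mvs = {}

        def write_mvs(self, frame_idx, mvs):
            self._mvs[frame_idx] = mvs

        def read_mvs(self, frame_idx):
            return self._mvs.get(frame_idx)

    class Isp:
        def __init__(self, memory):
            self.memory = memory

        def process(self, frame_idx, frame, prev_frame, N=16):
            # 3D denoising needs per-block MVs anyway; compute them once and
            # persist them instead of discarding them after denoising.
            mvs = {(x, y): block_match(frame, prev_frame, x, y, N)
                   for y in range(0, frame.shape[0] - N + 1, N)
                   for x in range(0, frame.shape[1] - N + 1, N)}
            self.memory.write_mvs(frame_idx, mvs)
            return frame  # the denoising step itself is omitted for brevity

    class Vpu:
        def __init__(self, memory):
            self.memory = memory

        def encode(self, frame_idx, frame):
            # Read the ISP's MVs instead of recomputing them; each MV seeds the
            # integer-pixel (RME) search for the corresponding block.
            candidate_mvs = self.memory.read_mvs(frame_idx)
            return frame, candidate_mvs  # the encoding step itself is omitted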
Corresponding to the application scenario of fig. 2 and the structure of fig. 3, fig. 4 is a schematic flowchart illustrating a process of the mobile phone 100 to encode the obtained video data by using the video encoding method provided in the present application according to some embodiments of the present application. Specifically, as shown in fig. 4, the process includes the following steps:
401: the ISP102 acquires video data.
It will be appreciated that video is dynamic, being a moving image sequence consisting of a plurality of successive images, and that video data includes moving image sequence data.
It can be understood that in the video call scenario shown in fig. 2, the mobile phone 100 captures a video of user 1 with the camera 101 and sends the corresponding video data to the ISP 102, so that the ISP 102 acquires the video data.
402: the ISP102 performs optimization processing on the video data to obtain optimized video data and motion vectors.
It will be appreciated that the image sensor in the camera 101 converts the light signals collected by the camera 101, reflected from the scene, into electrical signals to generate raw image (RAW) data, and that a plurality of consecutive frames of such image data constitute the video data to be optimized.
The optimization process may include, but is not limited to, any one or more of the following: noise reduction (denoise), dead pixel correction (BPC), black level compensation (BLC), automatic white balance (AWB), gamma correction, color correction, edge enhancement, and brightness, contrast, and chromaticity adjustment. The optimization process is performed when the image sensor transfers the image data in RAW format to the image signal processor 102.
It is understood that the image sensor may be a charge-coupled device (CCD) or a complementary metal-oxide-semiconductor (CMOS) phototransistor. Both CCD and CMOS photosensitive elements have a hot-pixel problem: image quality depends on temperature. If the temperature of the camera 101 rises and the noise signal becomes too strong, mottled spots appear in places on the picture where they should not be; these spots are noise points. Because noise is random, the noise appearing in each frame of image is different. 3D noise reduction (3D DNR) compares several adjacent frames of images and automatically filters out the information that does not overlap between them (i.e., the noise signal), reducing image noise more thoroughly and thus displaying a purer, finer picture.
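As a rough sketch of the idea, the following temporal blend illustrates 3D noise reduction; it is not the ISP's actual filter, prev_frame_mc is assumed to be the previous frame already motion-compensated with the estimated motion vectors, and alpha is an arbitrary blend weight.

    import numpy as np

    def dnr_3d(cur_frame, prev_frame_mc, alpha=0.7):
        """Blend the current frame with the motion-compensated previous frame.
        Noise is random and does not repeat across frames, so averaging the
        overlapping content suppresses it while preserving the moving scene."""
        out = (alpha * cur_frame.astype(np.float64)
               + (1.0 - alpha) * prev_frame_mc.astype(np.float64))
        return np.clip(out, 0, 255).astype(np.uint8)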
The ISP102 needs to use motion estimation in the process of performing video noise reduction on the video data received from the camera 101, and the ISP102 performs motion estimation processing on the video data to obtain a motion vector.
For example, as shown in fig. 1, the reference frame image of the K-1 th frame and the current frame image of the K-th frame are two adjacent frame images in the video, the reference frame image of the K-1 th frame has a pixel block 1 'representing the head of the user 1, the pixel block 1' is a reference block, and the coordinates of the upper left corner a 'of the pixel block 1' are (i, j). The current frame image of the K frame has a pixel block 1 representing the head of the user 1, the pixel block 1 is a current block, the coordinate of the upper left corner A of the pixel block 1 is (x, y), and the MV is a difference value (i-x, j-y) between the coordinate (i, j) and the coordinate (x, y). The sizes of the pixel blocks 1' and 1 are NxN, and N is the number of pixels.
It will be appreciated that when applying the block matching algorithm, it is first necessary to select a pixel block of size NxN in the current frame (the current block). Then, a search window is selected in the previous frame of the current frame, ensuring that the search window coincides with the NxN pixel block of the current frame in spatial coordinates. The search window is then moved within the previous frame image by a certain step length according to a certain matching rule to search for the pixel block that meets the matching condition; that pixel block is the reference block.
For example, as shown in fig. 1, when applying the block matching algorithm, it is first necessary to select the pixel block 1 representing the head of user 1 in the K-th frame image (the current frame image), where the size of the pixel block 1 is NxN. Then, a search window is selected in the K-1th frame image (the reference frame image), ensuring that it coincides with the NxN pixel block 1 in spatial coordinates. The search window is then moved within the K-1th frame image by a certain step length according to a certain matching rule to search for the pixel block 1', which also represents the head of user 1, i.e., the reference block.
It can be understood that the matching criteria commonly used in block matching motion estimation algorithms can generally be divided into two categories: minimum mean square error (MSE) matching and minimum mean absolute difference (MAD) matching.
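Both criteria can be sketched in a few lines (assuming two equal-sized uint8 pixel blocks):

    import numpy as np

    def mse(cur_block, ref_block):
        """Mean square error criterion: the candidate with the smallest MSE wins."""
        d = cur_block.astype(np.int32) - ref_block.astype(np.int32)
        return (d * d).mean()

    def mad(cur_block, ref_block):
        """Mean absolute difference criterion: the candidate with the smallest MAD wins."""
        return np.abs(cur_block.astype(np.int32) - ref_block.astype(np.int32)).mean()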
403: the ISP102 sends the motion vectors to the memory 105.
I.e. ISP102 writes the motion vector data to memory 105.
404: the memory 105 stores motion vectors.
It can be understood that motion estimation is used both in the ISP 102's video denoising of the video data and in the VPU 103's video encoding. In the process from the camera 101 acquiring the video data to the VPU 103 encoding the video data received from the ISP 102, the ISP 102 and the VPU 103 each independently apply a motion estimation algorithm; however, motion estimation is computationally heavy and time-consuming, occupying about 40% of the total encoding time of the VPU 103.
In the video encoding scheme provided in the embodiment of the present application, in order to save the time the VPU 103 would spend obtaining motion vectors through motion estimation during video encoding, the ISP 102 calculates the motion vector data of each frame of image in the video using motion estimation and writes the motion vector data into the memory 105. When the VPU 103 encodes the acquired video frames, the motion vector data can be read directly from the memory 105 and used.
It is understood that in some embodiments, to protect data security, the memory 105 space here should be designed such that the ISP 102 has read-write access, the VPU 103 has read-only access, and other components cannot access the memory 105.
405: VPU103 obtains the processed video data from ISP 102.
I.e. VPU103 reads the processed video data from ISP 102.
406: the VPU103 retrieves motion vector data from the memory 105.
I.e. the VPU103 reads the motion vector data from the memory 105.
407: the VPU103 obtains a pixel difference value between the current block and the reference block in each of the two adjacent frames of images based on the motion vector data and the video data after the optimization processing.
During encoding, the VPU 103 divides motion estimation (ME) into two stages. The first stage is coarse motion estimation (RME), i.e., motion estimation at integer-pixel precision: the search window is moved with a whole pixel as the search step, the similarity between the pixel block of the current frame and the pixel blocks in the reference frame is calculated point by point, and the pixel block meeting the similarity condition is the reference block.
The second stage is Fine Motion Estimation (FME), which is the motion Estimation of fractional pixels. The RME performs motion Estimation of integer pixels on a current block in a current frame to find a reference block meeting a matching condition, and the reference block is output to a second-stage Fine Motion Estimation (FME) to perform matching search of fractional pixels.
For example, the search window is moved by taking the decimal pixel as a search step, the similarity between the pixel block of the current frame and the pixel block in the reference frame is calculated point by point, and the pixel block meeting the similarity is the reference block. The fractional pixels may be 1/2 pixels, 1/4 pixels, etc.
It can be understood that in the embodiment of the present application, the MV obtained by the ISP 102 in the 3D denoising (3D DNR) process can be used as an input to the VPU 103, providing the RME with an additional candidate MV of high reliability; based on this MV, the RME can determine which direction and range in the search area deserve more computing resources for motion estimation. Therefore, in the process of processing the optimized video data to obtain the pixel difference between each current block and each reference block in each two adjacent frames of images, the VPU 103 can directly use the motion vector acquired from the memory 105 to quickly find the reference block corresponding to each current block, thereby reducing the search calculation amount, saving search time, shortening the video coding time, and improving the video coding efficiency.
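The following sketch illustrates the two stages with the ISP's candidate MV as the seed; it is an assumed Python/NumPy illustration rather than the VPU's implementation, and for brevity only non-negative half-pel offsets are searched and the refinement radius is an arbitrary choice.

    import numpy as np

    def sad(a, b):
        """Sum of absolute differences between two equal-sized blocks."""
        return np.abs(a.astype(np.int32) - b.astype(np.int32)).sum()

    def rme_around_seed(cur, ref, x, y, N, seed_mv, radius=2):
        """Coarse (integer-pixel) search confined to a small window around
        the ISP-supplied candidate MV instead of a full-range search."""
        cur_block = cur[y:y + N, x:x + N]
        H, W = ref.shape
        best_cost, best_mv = None, seed_mv
        for dy in range(seed_mv[1] - radius, seed_mv[1] + radius + 1):
            for dx in range(seed_mv[0] - radius, seed_mv[0] + radius + 1):
                rx, ry = x + dx, y + dy
                if 0 <= rx and 0 <= ry and rx + N <= W and ry + N <= H:
                    cost = sad(cur_block, ref[ry:ry + N, rx:rx + N])
                    if best_cost is None or cost < best_cost:
                        best_cost, best_mv = cost, (dx, dy)
        return best_mv

    def half_pel_block(ref, rx, ry, N, hx, hy):
        """Bilinearly interpolate an NxN reference block at fractional offset
        (hx, hy), with hx, hy in {0.0, 0.5}; needs one spare row/column."""
        a = ref[ry:ry + N, rx:rx + N].astype(np.float64)
        b = ref[ry:ry + N, rx + 1:rx + 1 + N].astype(np.float64)
        c = ref[ry + 1:ry + 1 + N, rx:rx + N].astype(np.float64)
        d = ref[ry + 1:ry + 1 + N, rx + 1:rx + 1 + N].astype(np.float64)
        return (a * (1 - hx) * (1 - hy) + b * hx * (1 - hy)
                + c * (1 - hx) * hy + d * hx * hy)

    def fme_half_pel(cur, ref, x, y, N, int_mv):
        """Fine (1/2-pixel) refinement of the integer-pel MV found by the RME."""
        cur_block = cur[y:y + N, x:x + N].astype(np.float64)
        dx, dy = int_mv
        rx, ry = x + dx, y + dy
        best_cost, best_mv = None, (float(dx), float(dy))
        if rx < 0 or ry < 0 or rx + 1 + N > ref.shape[1] or ry + 1 + N > ref.shape[0]:
            return best_mv  # no room to interpolate at the frame border
        for hy in (0.0, 0.5):
            for hx in (0.0, 0.5):
                cost = np.abs(cur_block - half_pel_block(ref, rx, ry, N, hx, hy)).sum()
                if best_cost is None or cost < best_cost:
                    best_cost, best_mv = cost, (dx + hx, dy + hy)
        return best_mv

Seeding the integer search and then refining at half-pel (or quarter-pel) precision is what lets the candidate MV from the memory 105 cut the search range, and hence the computation, of both stages.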
It should be understood that the embodiment of the present application does not limit that step 407 must be executed after step 406. Step 407 may also be performed before step 406 or simultaneously with step 406, as long as step 407 is performed before step 408.
408: the VPU103 encodes the I-frame picture, the coordinates of each reference block in the I-frame picture, the pixel difference between the current block and the reference block in the two adjacent frame pictures, and the directly obtained motion vector.
It can be understood that, in the process of encoding the optimized video data sent by ISP102 by VPU103, the motion vector is directly obtained from memory 105, and only the optimized video data needs to be processed to obtain the pixel difference between the current block and the reference block in each two adjacent frame images, and then the coordinates of each reference block in the I frame image and the I frame image, the pixel difference between the current block and the reference block in each two adjacent frame images, and the motion vector obtained directly are encoded.
Therefore, the time for processing the video data by the VPU103 to obtain the motion vector is saved, and the video coding efficiency is improved to a certain extent.
Fig. 5 illustrates a schematic diagram of the structure of the ISP 102, according to some embodiments of the present application. As shown in fig. 5, the ISP 102 is an application-specific integrated circuit (ASIC) for image data processing, which further processes the image data formed by the image sensor to obtain better image quality.
ISP102 includes a processor 1021, an image transmission interface 1022, a general purpose peripheral device 1023, a fill module 1024, and a general function module 1025.
Wherein, the processor 1021 is used for logic control and scheduling in the ISP 102.
The image transmission interface 1022 is used for transmission of image data.
General peripheral devices 1023 include, but are not limited to: a bus for coupling various modules of ISP102 and their controllers, a bus for coupling other devices, such as an advanced high-performance bus (AHB), that enables the ISP to communicate with other devices (e.g., DSPs, CPUs, etc.) at high performance; and a WATCHDOG unit (WATCHDOG) for monitoring the working state of the ISP.
The filling module 1024 is configured to pad the image data according to the input requirements of the image processing model in the NPU, for example, a deep learning model.
The general function module 1025 is used to process the image input to the ISP 102, including but not limited to: dead pixel correction (BPC), black level compensation (BLC), automatic white balance (AWB), gamma correction, color correction, noise reduction (denoise), edge enhancement, and brightness, contrast, and chromaticity adjustment. When the image sensor transmits the image data in RAW format to the image signal processor 102, the image data is processed by the general function module. The processing of the image data by the general function module 1025 is described in detail below in conjunction with fig. 6 and is not elaborated here.
It is understood that the structure of the ISP 102 shown in fig. 5 is only an example; those skilled in the art will understand that the ISP can include more or fewer modules, and that some modules may be combined or split. The embodiment of the present application is not limited in this respect.
The general function modules may include a RAW domain processing module 1025a, a YUV domain processing module 1025b, and an RGB domain processing module 1025c, and fig. 6 shows a schematic diagram of a process of processing image data by the general function modules, the process is as follows:
the RAW domain processing module 1025a performs dead pixel correction, black level correction, and automatic white balance on the image data.
The image data processed in the RAW domain is subjected to RGB interpolation to obtain image data in the RGB domain, and then the RGB domain processing module 1025c performs gamma correction and color correction on the image data in the RGB domain.
The image data processed in the RGB domain is subjected to color gamut conversion to obtain image data in the YUV domain, and then the YUV domain processing module 1025b performs noise reduction, edge enhancement, and brightness/contrast/chromaticity adjustment on the image data in the YUV domain. In the embodiment of the present application, the image denoising method provided herein may be used to denoise the image data in the YUV domain.
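The stage ordering described above can be summarized in a short sketch; the stage names and the stages mapping of callables are illustrative, not the actual interfaces of the ISP 102.

    def isp_general_function_pipeline(raw_image, stages):
        """Apply the general function modules in the order described above:
        RAW domain -> RGB interpolation -> RGB domain -> gamut conversion -> YUV domain."""
        image = raw_image
        for name in ("dead_pixel_correction", "black_level_compensation",
                     "auto_white_balance"):
            image = stages[name](image)                  # RAW domain (module 1025a)
        image = stages["rgb_interpolation"](image)       # RAW -> RGB
        for name in ("gamma_correction", "color_correction"):
            image = stages[name](image)                  # RGB domain (module 1025c)
        image = stages["color_gamut_conversion"](image)  # RGB -> YUV
        for name in ("noise_reduction", "edge_enhancement",
                     "brightness_contrast_chroma"):
            image = stages[name](image)                  # YUV domain (module 1025b)
        return image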
It is understood that the processing flow shown in fig. 6 is only an example; those skilled in the art will understand that it may contain more or fewer modules, and that some modules may be combined or split. The embodiment of the present application is not limited in this respect.
Fig. 7 is a schematic diagram of a mobile phone 100 adapted to the video encoding method according to some embodiments of the present application. As shown in fig. 7, the mobile phone 100 may include a processor 110, a power module 140, a memory 180, a mobile communication module 130, a wireless communication module 120, a sensor module 190, an audio module 150, a camera 170, an interface module 160, keys 101, a display screen 102, and the like.
It is to be understood that the illustrated structure of the embodiment of the present invention does not specifically limit the mobile phone 100. In other embodiments of the present application, the handset 100 may include more or fewer components than shown, or some components may be combined, some components may be separated, or a different arrangement of components may be used. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
The processor 110 may include one or more processing units, for example, processing modules or processing circuits such as a central processing unit (CPU), an image signal processor (ISP), a video processing unit (VPU), a graphics processing unit (GPU), a digital signal processor (DSP), a micro-controller unit (MCU), an artificial intelligence (AI) processor, or a programmable logic device such as a field-programmable gate array (FPGA). The different processing units may be separate devices or may be integrated into one or more processors. A memory unit may be provided in the processor 110 for storing instructions and data. In some embodiments, the memory unit in the processor 110 is a cache memory. The ISP, the VPU, and the memory 180 may be coupled by a bus to form a system on chip (SoC); in other embodiments, the ISP, the VPU, and the memory 180 may be separate devices.
The memory 180 may be used for storing data, software programs, and modules, and may be a volatile memory, such as a random-access memory (RAM) or a double data rate synchronous dynamic random access memory (DDR SDRAM).
The power module 140 may include a power supply, power management components, and the like. The power source may be a battery. The power management component is used for managing the charging of the power supply and the power supply of the power supply to other modules. In some embodiments, the power management component includes a charge management module and a power management module. The charging management module is used for receiving charging input from the charger; the power management module is used for connecting a power supply, the charging management module and the processor 110. The power management module receives power and/or charge management module input and provides power to the processor 110, the display 102, the camera 170, and the wireless communication module 120.
The mobile communication module 130 may include, but is not limited to, an antenna, a power amplifier, a filter, a low-noise amplifier (LNA), and the like. The mobile communication module 130 can provide solutions for wireless communication, including 2G/3G/4G/5G, applied to the handset 100. The mobile communication module 130 may receive electromagnetic waves from the antenna, filter and amplify the received electromagnetic waves, and transmit them to the modem processor for demodulation. The mobile communication module 130 may also amplify the signal modulated by the modem processor and convert it into electromagnetic waves for radiation through the antenna. In some embodiments, at least some of the functional modules of the mobile communication module 130 may be disposed in the processor 110. In some embodiments, at least some of the functional modules of the mobile communication module 130 and at least some of the modules of the processor 110 may be disposed in the same device. The wireless communication technology may include global system for mobile communications (GSM), general packet radio service (GPRS), code division multiple access (CDMA), wideband code division multiple access (WCDMA), time-division synchronous code division multiple access (TD-SCDMA), long term evolution (LTE), Bluetooth (BT), global navigation satellite system (GNSS), wireless local area network (WLAN), near field communication (NFC), frequency modulation (FM), infrared (IR) technology, and the like. The GNSS may include a global positioning system (GPS), a global navigation satellite system (GLONASS), a BeiDou navigation satellite system (BDS), a quasi-zenith satellite system (QZSS), and/or a satellite-based augmentation system (SBAS).
The wireless communication module 120 may include an antenna, and implement transceiving of electromagnetic waves via the antenna. The wireless communication module 120 may provide a solution for wireless communication applied to the mobile phone 100, including Wireless Local Area Networks (WLANs) (e.g., wireless fidelity (Wi-Fi) networks), Bluetooth (BT), Global Navigation Satellite System (GNSS), Frequency Modulation (FM), Near Field Communication (NFC), Infrared (IR), and the like. The handset 100 may communicate with a network and other devices via wireless communication techniques.
In some embodiments, the mobile communication module 130 and the wireless communication module 120 of the handset 100 may also be located in the same module.
The display screen 102 is used for displaying human-computer interaction interfaces, images, videos, and the like. The display screen 102 includes a display panel. The display panel may be a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), a Mini-LED, a Micro-LED, a Micro-OLED, a quantum dot light-emitting diode (QLED), or the like.
The sensor module 190 may include a proximity light sensor, a pressure sensor, a gyroscope sensor, an air pressure sensor, a magnetic sensor, an acceleration sensor, a distance sensor, a fingerprint sensor, a temperature sensor, a touch sensor, an ambient light sensor, a bone conduction sensor, and the like.
The audio module 150 is used to convert digital audio information into an analog audio signal output or convert an analog audio input into a digital audio signal. The audio module 150 may also be used to encode and decode audio signals. In some embodiments, the audio module 150 may be disposed in the processor 110, or some functional modules of the audio module 150 may be disposed in the processor 110. In some embodiments, audio module 150 may include speakers, an earpiece, a microphone, and a headphone interface.
The camera 170 is used to capture still images or video. An object generates an optical image through the lens, which is projected onto the photosensitive element. The photosensitive element converts the optical signal into an electrical signal and then transmits the electrical signal to the ISP, which converts it into a digital image signal. The mobile phone 100 can implement shooting functions through the ISP, the camera 170, the VPU, the GPU (graphics processing unit), the display screen 102, the application processor, and the like. The camera 170 may be a fixed-focus lens, a zoom lens, a fisheye lens, a panoramic lens, or the like.
The interface module 160 includes an external memory interface, a Universal Serial Bus (USB) interface, a Subscriber Identity Module (SIM) card interface, and the like. The external memory interface may be used to connect an external memory card, such as a Micro SD card, to extend the storage capability of the mobile phone 100. The external memory card communicates with the processor 110 through an external memory interface to implement a data storage function. The usb interface is used for communication between the mobile phone 100 and other electronic devices. The SIM card interface is used to communicate with a SIM card attached to the handset 100, for example to read a telephone number stored in the SIM card or to write a telephone number into the SIM card.
In some embodiments, the handset 100 also includes keys 101, motors, indicators, and the like. The keys 101 may include a volume key, an on/off key, and the like. The motor is used to generate a vibration effect to the mobile phone 100, for example, when the mobile phone 100 is called, to prompt the user to answer the call of the mobile phone 100. The indicators may include laser indicators, radio frequency indicators, LED indicators, and the like.
Embodiments of the mechanisms disclosed herein may be implemented in hardware, software, firmware, or a combination of these implementations. Embodiments of the application may be implemented as computer programs or program code executing on programmable systems comprising at least one processor, a storage system (including volatile and non-volatile memory and/or storage elements), at least one input device, and at least one output device.
Program code may be applied to input instructions to perform the functions described herein and generate output information. The output information may be applied to one or more output devices in a known manner. For purposes of this Application, a processing system includes any system having a Processor such as, for example, a Digital Signal Processor (DSP), a microcontroller, an Application Specific Integrated Circuit (ASIC), or a microprocessor.
The program code may be implemented in a high level procedural or object oriented programming language to communicate with a processing system. The program code can also be implemented in assembly or machine language, if desired. Indeed, the mechanisms described in this application are not limited in scope to any particular programming language. In any case, the language may be a compiled or interpreted language.
In some cases, the disclosed embodiments may be implemented in hardware, firmware, software, or any combination thereof. The disclosed embodiments may also be implemented as instructions carried by or stored on one or more transitory or non-transitory machine-readable (e.g., computer-readable) storage media, which may be read and executed by one or more processors. For example, the instructions may be distributed via a network or via other computer-readable media. Thus, a machine-readable medium may include any mechanism for storing or transmitting information in a form readable by a machine (e.g., a computer), including, but not limited to, floppy diskettes, optical disks, compact disc read-only memories (CD-ROMs), magneto-optical disks, read-only memories (ROMs), random access memories (RAMs), erasable programmable read-only memories (EPROMs), electrically erasable programmable read-only memories (EEPROMs), magnetic or optical cards, flash memory, or tangible machine-readable memory used to transmit information over the Internet in electrical, optical, acoustical, or other forms of propagated signals (e.g., carrier waves, infrared signals, digital signals, etc.). Thus, a machine-readable medium includes any type of machine-readable medium suitable for storing or transmitting electronic instructions or information in a form readable by a machine (e.g., a computer).
In the drawings, some features of the structures or methods may be shown in a particular arrangement and/or order. However, it is to be understood that such specific arrangement and/or ordering may not be required. Rather, in some embodiments, the features may be arranged in a manner and/or order different from that shown in the illustrative figures. In addition, the inclusion of a structural or methodical feature in a particular figure is not meant to imply that such feature is required in all embodiments, and in some embodiments, may not be included or may be combined with other features.
It should be noted that in the apparatus embodiments of the present application, each unit/module is a logical unit/module. Physically, one logical unit/module may be one physical unit/module, a part of one physical unit/module, or a combination of multiple physical units/modules; the physical implementation of the logical unit/module itself is not what matters most, as the combination of functions implemented by the logical units/modules is the key to solving the technical problem addressed by the present application. Furthermore, in order to highlight the innovative part of the present application, the above apparatus embodiments do not introduce units/modules that are less closely related to solving the technical problem presented here, which does not indicate that no other units/modules exist in the above apparatus embodiments.
It is noted that in the examples and specification of this patent, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
While the present application has been shown and described with reference to certain preferred embodiments thereof, it will be understood by those of ordinary skill in the art that various changes in form and details may be made therein without departing from the spirit and scope of the present application.

Claims (10)

1. A video encoding method applied to an electronic device including an image signal processor, a video processing unit, and a memory, the method comprising:
the image signal processor acquires first video data;
the image signal processor performs image processing on each frame of image in the first video data to obtain second video data, and stores intermediate parameters obtained in the image processing process in the memory;
the video processing unit acquires the second video data from the image signal processor and encodes the second video data to obtain encoded video data, wherein,
the video processing unit encoding the second video data comprises:
the intermediate parameters are retrieved from the memory and the second video data is encoded based on the retrieved intermediate parameters.
2. The method of claim 1, wherein the intermediate parameters comprise motion vectors.
3. The method of claim 2, wherein the retrieving the motion vector from the memory and encoding the second video data based on the retrieved motion vector comprises:
the video processing unit searches, based on the motion vectors, for the reference blocks matching the current blocks in each two adjacent frames of images in the second video data;
processing the second video data to obtain difference value information of each two adjacent frames of images, wherein the difference value information comprises pixel difference values and motion vectors of all current blocks and reference blocks in each two adjacent frames of images in the second video data;
and the video processing unit encodes the I-frame image in the second video data, the coordinates of each reference block in the I-frame image and the difference value information of each two adjacent frames.
4. The method of claim 3, wherein the video processing unit finds each reference block from each current block in each two adjacent frame of pictures based on the motion vector, comprising:
the video processing unit carries out integer pixel motion estimation on the second video data based on the motion vector, and finds each first reference block matched with each current block in each two adjacent frames of images;
and the video processing unit carries out decimal pixel motion estimation on each first reference block and finds each second reference block matched with each current block in each two adjacent frame images.
5. The method of claim 4 wherein the fractional pixel motion estimation comprises 1/4 pixel motion estimation and/or 1/2 pixel motion estimation.
6. The method of claim 1, wherein the electronic device comprises any one of:
laptop computers, desktop computers, tablets, cell phones, servers, wearable devices, portable game consoles, and televisions.
7. The method of claim 1, wherein the image processing comprises image noise reduction processing.
8. The method of claim 1, wherein the image denoising process comprises a 3D denoising process.
9. A readable medium, characterized in that it has stored thereon instructions which, when executed on an electronic device, cause the electronic device to carry out the video coding method of any one of claims 1 to 8.
10. An electronic device, comprising:
a memory for storing instructions for execution by one or more processors of the electronic device, an
Processor, being one of the processors of an electronic device, for performing the video encoding method of any of claims 1 to 8.
CN202210042341.XA 2022-01-14 2022-01-14 Video coding method, medium and electronic equipment Pending CN114401405A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210042341.XA CN114401405A (en) 2022-01-14 2022-01-14 Video coding method, medium and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210042341.XA CN114401405A (en) 2022-01-14 2022-01-14 Video coding method, medium and electronic equipment

Publications (1)

Publication Number Publication Date
CN114401405A (en) 2022-04-26

Family

ID=81231493

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210042341.XA Pending CN114401405A (en) 2022-01-14 2022-01-14 Video coding method, medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN114401405A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101227614A (en) * 2008-01-22 2008-07-23 炬力集成电路设计有限公司 Motion estimation device and method of video coding system
CN101562742A (en) * 2008-04-14 2009-10-21 上海升岳电子科技有限公司 Video signal processing method
CN102496165A (en) * 2011-12-07 2012-06-13 四川九洲电器集团有限责任公司 Method for comprehensively processing video based on motion detection and feature extraction
CN111010495A (en) * 2019-12-09 2020-04-14 腾讯科技(深圳)有限公司 Video denoising processing method and device
WO2020097888A1 (en) * 2018-11-15 2020-05-22 深圳市欢太科技有限公司 Video processing method and apparatus, electronic device, and computer-readable storage medium
CN112351280A (en) * 2020-10-26 2021-02-09 杭州海康威视数字技术股份有限公司 Video coding method and device, electronic equipment and readable storage medium


Similar Documents

Publication Publication Date Title
US20180176573A1 (en) Apparatus and methods for the encoding of imaging data using imaging statistics
CN112529775A (en) Image processing method and device
US20220245765A1 (en) Image processing method and apparatus, and electronic device
CN112150399A (en) Image enhancement method based on wide dynamic range and electronic equipment
EP3890332A1 (en) Video splitting method and electronic device
CN112202986A (en) Image processing method, image processing apparatus, readable medium and electronic device thereof
WO2022148446A1 (en) Image processing method and apparatus, device, and storage medium
CN115526787B (en) Video processing method and device
CN113572948B (en) Video processing method and video processing device
CN111598919B (en) Motion estimation method, motion estimation device, storage medium and electronic equipment
CN112954251A (en) Video processing method, video processing device, storage medium and electronic equipment
CN113709464A (en) Video coding method and related device
CN113747060A (en) Method, apparatus, storage medium, and computer program product for image processing
US11393078B2 (en) Electronic device and method for correcting image on basis of image transmission state
CN113986162B (en) Layer composition method, device and computer readable storage medium
CN116468917A (en) Image processing method, electronic device and storage medium
WO2022170866A1 (en) Data transmission method and apparatus, and storage medium
CN115686182B (en) Processing method of augmented reality video and electronic equipment
CN114143471B (en) Image processing method, system, mobile terminal and computer readable storage medium
CN114401405A (en) Video coding method, medium and electronic equipment
CN115735226B (en) Image processing method and chip
CN114697731B (en) Screen projection method, electronic equipment and storage medium
CN114466238B (en) Frame demultiplexing method, electronic device and storage medium
CN114298889A (en) Image processing circuit and image processing method
CN114793283A (en) Image encoding method, image decoding method, terminal device, and readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination