WO2022206166A1 - Method and device for performing image processing in a video encoding device, and system

Info

Publication number: WO2022206166A1
Application number: PCT/CN2022/074533
Authority: WO
Inventors: 赵娟萍
Original assignee: Oppo广东移动通信有限公司
Priority date: 2021-04-01
Filing date: 2022-01-28
Publication date: 2022-10-06
Also published as: CN115190307A

Abstract

The present application discloses a method for performing image processing in a video encoding device, comprising: determining from the current image frame a block to be encoded; determining, from a reconstruction image frame of a historical image frame, a first area needing to be repeatedly read for multiple times, and storing image data in a preset memory; reading the image data of the first area from the preset memory; and determining, from the first area according to the image data, a matching block matched with the block to be encoded; and according to a relative relationship between the matching block and the block to be encoded, encoding the block to be encoded.

Description

Method, device and system for image processing in a video encoding device

This application claims priority to Chinese patent application No. 202110358171.1, entitled "Method, Apparatus and System for Image Processing in a Video Encoding Device" and filed with the China Patent Office on April 1, 2021, the entire contents of which are incorporated herein by reference.

Technical field

The present application belongs to the technical field of electronic devices, and in particular, relates to a method, device, storage medium, electronic device and system for image processing in a video encoding device.

Background

With the continuous development of technology, video encoding apparatuses are becoming more and more powerful. A video encoding apparatus may encode video images. When encoding one frame of video image, it is usually necessary to read the data of multiple frames of already encoded video images. However, in the related art, reading the data of the encoded video images causes relatively high power consumption in the video encoding device.

SUMMARY OF THE INVENTION

Embodiments of the present application provide a method, device, storage medium, electronic device, and system for performing image processing in a video encoding device, which can reduce power consumption of the video encoding device.

In a first aspect, an embodiment of the present application provides a method for image processing in a video encoding device, the method comprising:

determining a block to be encoded from the current frame image;

determining, from the reconstructed frame images of the historical frame images, a first area that needs to be read repeatedly, and storing the image data of the first area in a preset memory, where the power consumption of the preset memory is less than a preset power consumption threshold;

reading the image data of the first area from the preset memory;

determining, from the first area according to the read image data of the first area, a matching block that matches the block to be encoded;

encoding the block to be encoded according to the relative relationship between the matching block and the block to be encoded.

In a second aspect, an embodiment of the present application provides an apparatus for performing image processing in a video encoding apparatus, the apparatus comprising:

a first determining module, configured to determine the block to be encoded from the current frame image;

a second determining module, configured to determine, from the reconstructed frame images of the historical frame images, a first area that needs to be read repeatedly, and to store the image data of the first area in a preset memory, where the power consumption of the preset memory is less than a preset power consumption threshold;

a reading module, configured to read the image data of the first area from the preset memory;

a third determining module, configured to determine, from the first area according to the read image data of the first area, a matching block that matches the block to be encoded;

an encoding module, configured to encode the block to be encoded according to the relative relationship between the matching block and the block to be encoded.

In a third aspect, embodiments of the present application provide a storage medium on which a computer program is stored. When the computer program is executed on a computer, it causes the computer to execute the method for image processing in a video encoding device provided by the embodiments of the present application.

In a fourth aspect, the embodiments of the present application further provide an electronic device, including a memory, a processor, and a video encoding apparatus. By invoking the computer program stored in the memory, the processor is configured to execute the method for image processing in a video encoding device provided in the embodiments of the present application.

In a fifth aspect, an embodiment of the present application further provides an image processing system, including a video encoding device, a first memory, and a second memory, wherein the power consumption of the second memory is greater than a first preset multiple of the power consumption of the first memory. The video encoding device includes a third memory, and the read speed of the third memory is greater than a second preset multiple of the read speed of the first memory. The first memory and the second memory each store the image data that is repeatedly read from the reconstructed frame images of the historical frame images. During encoding, the video encoding device reads the repeatedly read image data from the first memory and the second memory according to a preset number of times, determines from it the image data in a search window (Search Window, SWin), and stores the image data in the search window in the third memory. The video encoding device then reads the image data in the search window from the third memory, determines a matching block that matches the block to be encoded, and performs encoding according to the motion vector and residual between the matching block and the block to be encoded.

Description of drawings

The technical solutions of the present application and the beneficial effects thereof will be apparent through the detailed description of the specific embodiments of the present application in conjunction with the accompanying drawings.

FIG. 1 is a first schematic flowchart of a method for performing image processing in a video encoding apparatus provided by an embodiment of the present application.

FIG. 2 is a schematic structural diagram of a video compression system in the related art.

FIG. 3 is a schematic diagram of data storage in a video encoding apparatus in the related art.

FIG. 4 is a schematic diagram of increasing the number of channels of a dynamic random access memory (DRAM) for data access in the related art.

FIG. 5 is a schematic diagram of a comparison of square blocks provided by an embodiment of the present application.

FIG. 6 is a schematic diagram of a hierarchical search provided by an embodiment of the present application.

FIG. 7 is a schematic diagram of a non-hierarchical search provided by an embodiment of the present application.

FIG. 8 is a second schematic flowchart of a method for performing image processing in a video encoding apparatus provided by an embodiment of the present application.

FIG. 9 is a schematic diagram of a scene of searching in a reconstructed frame image of a historical frame image provided by an embodiment of the present application.

FIG. 10 is a schematic diagram comparing the energy consumed when reading data from a static random-access memory (Static Random-Access Memory, SRAM) and from a dynamic random-access memory, provided by an embodiment of the present application.

FIG. 11 is a schematic structural diagram of a video compression system using a system cache (Sys$) provided by an embodiment of the present application.

FIG. 12 is another schematic structural diagram of a video compression system using a system cache provided by an embodiment of the present application.

FIG. 13 is a schematic structural diagram of a video compression system using a system buffer memory (System Buffer, SysBuf) provided by an embodiment of the present application.

FIG. 14 is a schematic diagram of a scene when the reconstructed frame image of the historical frame image provided by the embodiment of the present application is moved down by one block line.

FIG. 15 is a schematic diagram of a power consumption curve when reading and writing data from a multi-channel DRAM according to an embodiment of the present application.

FIG. 16 is a schematic diagram of a power consumption curve when reading and writing data from Sys$ or SysBuf and DRAM, respectively, according to an embodiment of the present application.

FIG. 17 is a schematic diagram of a scene of a search range of a search window in a reconstructed frame image of a historical frame image provided by an embodiment of the present application.

FIG. 18 is a schematic diagram of a scene encoded by a video encoding apparatus provided by an embodiment of the present application.

FIG. 19 is a third schematic flowchart of a method for performing image processing in a video encoding apparatus provided by an embodiment of the present application.

FIG. 20 is a schematic structural diagram of an apparatus for performing image processing in a video encoding apparatus provided by an embodiment of the present application.

FIG. 21 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.

FIG. 22 is another schematic structural diagram of an electronic device provided by an embodiment of the present application.

FIG. 23 is a schematic structural diagram of an image processing system provided by an embodiment of the present application.

FIG. 24 is another schematic structural diagram of an image processing system provided by an embodiment of the present application.

Detailed description

Please refer to the drawings, in which the same reference symbols denote the same components. The principles of the present application are illustrated as being implemented in a suitable computing environment. The following description is based on the illustrated specific embodiments of the present application and should not be construed as limiting other specific embodiments not detailed herein.

An embodiment of the present application provides a method for image processing in a video encoding device, the method comprising:

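determining a block to be encoded from the current frame image;

determining, from the reconstructed frame image of the historical frame image, a first area that needs to be read repeatedly, and storing the image data of the first area in a preset memory, where the power consumption of the preset memory is less than a preset power consumption threshold;

reading the image data of the first area from the preset memory;

determining, from the first area according to the read image data of the first area, a matching block that matches the block to be encoded;

encoding the block to be encoded according to the relative relationship between the matching block and the block to be encoded.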

In some embodiments, the preset memory includes a first memory and a second memory, the power consumption of the second memory is greater than a first preset multiple of the power consumption of the first memory, and the first area includes a plurality of block lines. The determining, from the reconstructed frame image of the historical frame image, of the first area that needs to be read repeatedly, and the storing of the image data of the first area in the preset memory, include:

Determine from the reconstructed frame image of the historical frame image stored in the second memory the first area that needs to be read repeatedly;

read the image data of the first area from the second memory and store it in the first memory;

The reading of the image data of the first region from the preset memory includes:

reading the image data of the first area from the first memory block line by block line;

if the number of reads from the first memory is greater than or equal to a preset count threshold, reading the image data of the unread block lines of the first area from the second memory block line by block line.
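The read policy described above can be sketched as follows. This is only a minimal illustration under stated assumptions: the class name `TieredReader`, the use of Python dictionaries as stand-ins for the first and second memories, and the parameter `max_first_reads` (standing for the preset count threshold) are all hypothetical and do not appear in the application.

```python
class TieredReader:
    """Hypothetical sketch: read block lines from a low-power first memory,
    then fall back to the second memory once a read-count threshold is hit."""

    def __init__(self, first_memory, second_memory, max_first_reads):
        # Both memories are modeled as dicts: block-line index -> image data.
        self.first_memory = first_memory      # e.g. a system cache (assumption)
        self.second_memory = second_memory    # e.g. DRAM (assumption)
        self.max_first_reads = max_first_reads
        self.first_reads = 0
        self.second_reads = 0

    def read_block_line(self, index):
        # Below the threshold, serve the block line from the first memory
        # (if it is present there, which is an extra assumption of this sketch).
        if self.first_reads < self.max_first_reads and index in self.first_memory:
            self.first_reads += 1
            return self.first_memory[index]
        # At or above the threshold, remaining block lines come from the second memory.
        self.second_reads += 1
        return self.second_memory[index]


dram = {i: f"line{i}" for i in range(4)}
cache = dict(dram)
reader = TieredReader(cache, dram, max_first_reads=2)
lines = [reader.read_block_line(i) for i in range(4)]
```

With a threshold of 2, the first two block lines are served by the first memory and the remaining two by the second memory.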

In some embodiments, the reading the image data of the first region from the second memory and storing the image data in the first memory includes:

if the first area is moved down by one block line in the reconstructed frame image of the historical frame image, reading the image data of the block line newly covered by the move from the second memory and storing it in the first memory;

removing from the first memory the block lines that are not used in encoding the next block line to be encoded.
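The update of the first memory when the first area moves down by one block line might look like the sketch below. It assumes, purely for illustration, that the first area is a contiguous window of `height` block lines starting at `top`, that the only block line no longer needed is the old top line, and that both memories are modeled as dictionaries; the function name `slide_down` is hypothetical.

```python
def slide_down(first_memory, second_memory, top, height):
    """Hypothetical sketch: move a window of `height` block lines down by one.

    first_memory / second_memory: dicts mapping block-line index -> image data.
    Returns the new index of the top block line of the window.
    """
    new_top = top + 1
    new_line = new_top + height - 1
    # Fetch only the newly covered block line from the second memory.
    if new_line in second_memory:
        first_memory[new_line] = second_memory[new_line]
    # Evict the block line that the next block line to be encoded no longer uses
    # (assumed here to be the old top line).
    first_memory.pop(top, None)
    return new_top


dram = {i: [i] for i in range(6)}
cache = {i: dram[i] for i in range(3)}   # window covers block lines 0..2
top = slide_down(cache, dram, top=0, height=3)
```

Only one block line is transferred from the second memory per move, which is the point of keeping the rest of the window in the first memory.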

In some embodiments, the determining a matching block matching the block to be encoded from the first region according to the read image data of the first region includes:

Determine the image data of the search window from the read image data of the first area, and the search window is located in the first area;

storing the image data of the search window in a third memory, where the read-write speed of the third memory is greater than a second preset multiple of the read-write speed of the first memory;

reading the image data of the search window from the third memory, and determining from the search window, according to the image data of the search window, the block with the smallest encoding cost relative to the block to be encoded;

using the block with the smallest encoding cost relative to the block to be encoded as the matching block.

In some embodiments, the reading of the image data of the search window from the third memory, and the determining from the search window, according to the image data of the search window, of the block with the smallest encoding cost relative to the block to be encoded, include:

Read the image data of the search window from the third memory;

reducing the search window according to the preset number of layers to obtain a reduced search window;

According to the image data of the reduced search window, the reduced block with the smallest encoding cost of the block to be encoded is determined from the reduced search window;

According to the position of the reduced block in the reduced search window, a block with the smallest encoding cost to the block to be encoded is determined from the search window.

In some embodiments, the relative relationship is a motion vector and a residual, and the encoding the block to be encoded according to the relative relationship between the matching block and the block to be encoded includes:

encoding the block to be encoded according to the motion vector and the residual between the matching block and the block to be encoded.

In some embodiments, the encoding the block to be encoded according to the motion vector and the residual of the matching block and the block to be encoded includes:

performing forward transform and quantization on the residual between the matching block and the block to be encoded to obtain first residual data;

performing entropy coding on the motion vector between the matching block and the block to be encoded and on the first residual data to obtain encoded video stream data; or

performing inverse quantization and inverse transform on the first residual data to obtain second residual data;

reconstructing the block to be encoded according to the second residual data.
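The forward transform and quantization of the residual, followed by inverse quantization and inverse transform, can be illustrated with the sketch below. The application does not name a specific transform or quantizer, so an orthonormal 2-D DCT-II and a uniform scalar quantizer with step size `step` are assumptions made here for illustration, and all function names are hypothetical. Entropy coding is not shown.

```python
import math


def dct_matrix(n):
    """Orthonormal DCT-II basis matrix of size n x n."""
    return [[(math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n))
             * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
             for i in range(n)]
            for k in range(n)]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m):
    return [list(row) for row in zip(*m)]


def forward_transform_quantize(residual, step):
    """Residual block -> quantized coefficients (the 'first residual data')."""
    c = dct_matrix(len(residual))
    coeffs = matmul(matmul(c, residual), transpose(c))
    return [[round(v / step) for v in row] for row in coeffs]


def dequantize_inverse_transform(levels, step):
    """Quantized coefficients -> approximate residual (the 'second residual data')."""
    c = dct_matrix(len(levels))
    coeffs = [[v * step for v in row] for row in levels]
    pixels = matmul(matmul(transpose(c), coeffs), c)
    return [[round(v) for v in row] for row in pixels]


flat = [[8] * 4 for _ in range(4)]
flat_back = dequantize_inverse_transform(forward_transform_quantize(flat, 1), 1)

residual = [[5, -3, 0, 2], [1, 4, -2, 0], [0, 0, 7, -1], [3, -6, 2, 1]]
approx = dequantize_inverse_transform(forward_transform_quantize(residual, 1), 1)
```

Because quantization discards information, the second residual data generally only approximates the original residual; a larger `step` gives a coarser approximation.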

Determine from the reconstructed frame images of the plurality of historical frame images stored in the second memory a plurality of first regions that need to be read repeatedly;

reading image data of the plurality of first regions from the second memory and storing it in the first memory;

read the image data of the plurality of first regions from the first memory block line by block line;

If the number of times of reading from the first memory is greater than or equal to a preset number of times threshold, the image data of unread block lines in the plurality of first regions is read block line by block line from the second memory.

Determine the image data of a plurality of search windows from the read image data of the plurality of first areas, and each of the search windows is located in the corresponding first area;

storing the image data of the plurality of search windows in a third memory, where the read-write speed of the third memory is greater than the second preset multiple of the read-write speed of the first memory;

reading the image data of the plurality of search windows from the third memory, and determining from the plurality of search windows respectively, according to the image data of the plurality of search windows, one or more blocks with the smallest encoding cost relative to the block to be encoded;

determining the matching block from the plurality of blocks with the smallest encoding cost relative to the block to be encoded.

In some embodiments, the reading of the image data of the plurality of search windows from the third memory, and the determining from the plurality of search windows respectively, according to the image data of the plurality of search windows, of one or more blocks with the smallest encoding cost relative to the block to be encoded, include:

Read the image data of the plurality of search windows from the third memory;

reducing the plurality of search windows according to the preset number of layers to obtain a plurality of reduced search windows;

According to the image data of the plurality of reduced search windows, from the plurality of reduced search windows, respectively determine one or more reduced blocks with the smallest coding cost of the to-be-coded block;

According to the positions of the one or more reduced blocks in the reduced search windows, one or more blocks with the smallest encoding cost relative to the block to be encoded are respectively determined from the plurality of search windows.

In some embodiments, the first memory includes a system cache or system buffer memory disposed external to the video encoding device, and the second memory includes dynamic random access memory disposed external to the video encoding device.

In some embodiments, the third memory includes a cache or a buffer provided inside the video encoding device.

Please refer to FIG. 1 . FIG. 1 is a first schematic flowchart of a method for performing image processing in a video encoding apparatus provided by an embodiment of the present application. The method for image processing in a video encoding device can be applied to a video encoding device. The flow of the method for image processing in a video encoding device may include:

101. Determine the block to be encoded from the current frame image.

Please refer to FIG. 2, which is a schematic structural diagram of a video compression system in the related art. In the video compression system, a central processing unit (Central Processing Unit, CPU), a video encoding device, an image processor (Image Signal Processor, ISP) and a neural network processor (Neural Network Processing Unit, NPU) read and write data from the DRAM through a bus and a dynamic random access memory controller (Dynamic Random Access Memory Controller, DRAMC). The central processing unit, the video encoding device, the image processor and the neural network processor share the bandwidth in a time-division manner, and the priority of the central processing unit, the image processor and the neural network processor is higher than that of the video encoding device. The video encoding device needs to perform search operations when encoding, which occupies a large amount of bandwidth.

Cost is a major concern for video encoding devices. For frame buffering, DRAM is usually used as the main storage space in order to achieve the lowest cost and the highest production yield. Please refer to FIG. 3, which is a schematic diagram of data storage in a video encoding apparatus in the related art. The current frame (Current Frame) image, the reference frame (Reference Frame) image, the reconstructed frame (Reconstructed Frame) image, the bitstreams (Bitstreams) and the temporary data (Temporary data) are all stored in the DRAM of the video encoding device. However, DRAM offers limited bandwidth.

It should be noted that after the current frame image is encoded, it becomes a reconstructed frame image, and the reconstructed frame image of the current frame image can be used as the reference frame image of the next frame image. Likewise, after the previous frame image is encoded, it becomes the reconstructed frame image of the previous frame image, which can be used as the reference frame image of the current frame image. The temporary data may be temporal motion vectors (Temporal Motion Vector, TMV), scaled frames, or other data.

New video standards have emerged, such as High Efficiency Video Coding (H.265/HEVC), Versatile Video Coding (H.266/VVC), the first-generation video coding standard of the Alliance for Open Media (AOMedia Video 1, AV1), and Essential Video Coding (MPEG-5/EVC). These standards target increasingly large screen sizes and increasingly high frame rates. For this reason, the bandwidth or the frequency of the DRAM is usually increased to speed up data throughput.

Even though motion estimation (Motion Estimation, ME) with hierarchical search alleviates the problem of reading multiple reference frames, higher DRAM throughput is still required for large image sizes and high frame rates. The increase in throughput is usually achieved by increasing the number of DRAM channels, which can cause excessive power consumption.

Please refer to FIG. 4, which is a schematic diagram of increasing the number of DRAM channels for data access in the related art. Increasing the number of DRAM channels raises the bandwidth and frequency and therefore the data throughput of the DRAM, but it also causes greater power consumption. For example, providing the system DRAM bandwidth needed to meet the read speed required by the video encoding device consumes a large amount of energy. Yet whether the video encoding device performs real-time or non-real-time operations, maintaining the highest efficiency is very important. With the related-art method, having the video encoding apparatus complete encoding within the expected time causes high DRAM power consumption.

Video encoding devices generally use a block (which can be regarded as a block of pixels) as the basic unit. A block may be a rectangle, a square, a trapezoid, or a shape pieced together from triangles. Block-based comparison algorithms arose on this basis. Please refer to FIG. 5, which is a schematic diagram of a comparison of square blocks provided by an embodiment of the present application. The block to be compressed in the current frame image is compared, in the form of a square block, with a block of the reference frame image. The reference frame image is the reconstructed image of a historical frame image, that is, the encoded image of the historical frame image. The block to be compressed and the block of the reference frame image are N×N blocks, where N is an integer greater than or equal to 4. Through block comparison, information redundancy in the time domain can be minimized, thereby compressing the video data. FIG. 5 is an example of a comparison based on square blocks, but the same comparison method can be used for blocks made up of rectangles, trapezoids or triangles.

In this embodiment of the present application, when motion estimation is performed, the image is divided into a plurality of non-overlapping blocks that form a rectangular array. Each block has a size of N×N pixels, for example 4×4, 32×32 or 128×128, where 4×4, 32×32 and 128×128 refer to numbers of pixels. For each block to be encoded, the area surrounding the same position in the reconstructed frame image of the historical frame image is searched for the block that best matches it, that is, the matching block. The displacement of the matching block relative to the block to be encoded is called the motion vector (Motion Vector, MV).
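The notion of searching "around the same position" and of a motion vector can be made concrete with the small sketch below. The search margins `p_x` and `p_y`, the clamping of the window to the frame boundary, and the function names are assumptions introduced for illustration only.

```python
def search_window(bx, by, n, p_x, p_y, width, height):
    """Hypothetical sketch: rectangle searched around an n x n block at (bx, by).

    Returns (left, top, right, bottom) with right/bottom exclusive,
    clamped to a frame of size width x height.
    """
    left = max(bx - p_x, 0)
    top = max(by - p_y, 0)
    right = min(bx + n + p_x, width)
    bottom = min(by + n + p_y, height)
    return left, top, right, bottom


def motion_vector(block_origin, match_origin):
    """Displacement of the matching block relative to the block to be encoded."""
    return (match_origin[0] - block_origin[0], match_origin[1] - block_origin[1])


window = search_window(0, 16, 16, 8, 8, 64, 64)
mv = motion_vector((16, 16), (19, 14))
```

Here the block sits on the left edge of the frame, so the window is cut off at x = 0 rather than extending to negative coordinates.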

In this embodiment of the present application, the block to be encoded is determined from the current frame image, where the block to be encoded is a block to be compressed in the current frame image. The block to be encoded may be a block of N×N size. When the block to be encoded is encoded, it usually needs to be compared with blocks in the reference frame image, so the blocks to be compared must be searched for in the reference frame image. The reference frame image is a reconstructed frame image of the historical frame image, that is, an encoded image of the historical frame image.

102. Determine a first area that needs to be read repeatedly from the reconstructed frame image of the historical frame image, and store the image data of the first area in a preset memory, where the power consumption of the preset memory is less than a preset power consumption threshold.

For example, a block can be compared with the block to be encoded only after it has been found in the reconstructed frame image of the historical frame image. The reconstructed frame image of the historical frame image may be the one, among the reconstructed frame images of a plurality of historical frame images, with the smallest encoding cost. Before a block is searched for, the area to be searched needs to be known in advance. Therefore, when searching for blocks in the reconstructed frame image of the historical frame image, it is necessary to know the search range (Search Range, SRng) in that reconstructed frame image, that is, to determine the first area that needs to be read repeatedly. When blocks are searched for in the first area, the data of the first area is read repeatedly many times. Therefore, in this embodiment of the present application, after the first area that needs to be read repeatedly is determined from the reconstructed frame images of the historical frame images, the image data of the first area is stored in the preset memory, so that it can subsequently be read from the preset memory. In addition, the power consumption of the preset memory is less than the preset power consumption threshold. Reading and writing data through a preset memory with low power consumption reduces the power consumption of the video encoding device.

103. Read the image data of the first region from the preset memory.

For example, after the image data of the first area is stored in the preset memory, searching for blocks in the first area requires reading the image data of the first area from the preset memory, so as to find the block that best matches the block to be encoded.

104. Determine, from the first area according to the read image data of the first area, a matching block that matches the block to be encoded.

For example, by reading the image data of the first area stored in the preset memory, the search for the first area can be realized. During the search process, each block in the first area is respectively compared with the block to be encoded in the current frame image. By comparison, the block that best matches the block to be coded is found from the first area, and the best matching block is the matching block.

Common block matching algorithms may use hierarchical search or non-hierarchical search. The search yields the motion vector and the pixel residuals, which are used for subsequent compression coding. A pixel residual is obtained by subtracting the predicted value from the actual value of the pixel.

Please refer to FIG. 6, which is a schematic diagram of a hierarchical search provided by an embodiment of the present application. Hierarchical search reduces the block to be searched and the area to be searched by the same ratio, such as 1/2, 1/4 or 1/8. The approximate range of the block to be searched is first determined on the reduced image (that is, the reduced area to be searched), and a finer block search is then performed on the unreduced image. In the hierarchical search, the reduction ratio of each layer may be the same or different; for example, the reduction ratios of the layers may be 1/2, 1/4, 1/8 and 1/16.

FIG. 6 is an example of motion search in 3 levels. First, the image reduced to 1/4 is searched. Then, the motion vector obtained from the 1/4 reduced image is used to search a smaller range more finely in the image reduced to 1/2. Finally, the motion vector obtained from the 1/2 reduced image is used to search the original-size image to obtain the final motion vector.
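A three-level search of this kind can be sketched as follows. This is an illustration under several assumptions that the application does not specify: images are reduced by averaging pixel groups, the matching cost is the sum of absolute differences (SAD), frame dimensions are divisible by the largest reduction factor, and the search radii `coarse_radius` and `refine_radius` are hypothetical parameters. Frames are plain lists of rows of integers.

```python
def downsample(img, factor):
    """Shrink an image by averaging factor x factor pixel groups."""
    h, w = len(img) // factor, len(img[0]) // factor
    area = factor * factor
    return [[sum(img[y * factor + j][x * factor + i]
                 for j in range(factor) for i in range(factor)) // area
             for x in range(w)]
            for y in range(h)]


def sad(cur, cx, cy, ref, rx, ry, n):
    """Sum of absolute differences between two n x n blocks."""
    return sum(abs(cur[cy + j][cx + i] - ref[ry + j][rx + i])
               for j in range(n) for i in range(n))


def local_search(cur, ref, bx, by, n, center, radius):
    """Best displacement within `radius` of `center`, skipping out-of-frame blocks."""
    height, width = len(ref), len(ref[0])
    best_cost, best_mv = None, center
    for dy in range(center[1] - radius, center[1] + radius + 1):
        for dx in range(center[0] - radius, center[0] + radius + 1):
            rx, ry = bx + dx, by + dy
            if rx < 0 or ry < 0 or rx + n > width or ry + n > height:
                continue
            cost = sad(cur, bx, by, ref, rx, ry, n)
            if best_cost is None or cost < best_cost:
                best_cost, best_mv = cost, (dx, dy)
    return best_mv


def hierarchical_search(cur, ref, bx, by, n, factors=(4, 2, 1),
                        coarse_radius=2, refine_radius=1):
    """Search the 1/4 image first, then refine on the 1/2 and original images."""
    mv, previous = (0, 0), None
    for factor in factors:
        if factor == 1:
            c, r = cur, ref
        else:
            c, r = downsample(cur, factor), downsample(ref, factor)
        if previous is None:
            radius = coarse_radius
        else:
            # Scale the coarser motion vector up to the current resolution.
            scale = previous // factor
            mv = (mv[0] * scale, mv[1] * scale)
            radius = refine_radius
        mv = local_search(c, r, bx // factor, by // factor,
                          max(n // factor, 1), mv, radius)
        previous = factor
    return mv


patch = [[10 + 5 * i + 20 * j for i in range(4)] for j in range(4)]
cur = [[0] * 16 for _ in range(16)]
ref = [[0] * 16 for _ in range(16)]
for j in range(4):
    for i in range(4):
        cur[4 + j][4 + i] = patch[j][i]   # block to be encoded at (4, 4)
        ref[8 + j][8 + i] = patch[j][i]   # same content at (8, 8) in the reference
mv = hierarchical_search(cur, ref, 4, 4, 4)
```

A hierarchical search examines far fewer candidate positions than a full search over the same range, at the price of possibly missing the best match when the reduced images are misleading.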

Please refer to FIG. 7, which is a schematic diagram of a non-hierarchical search provided by an embodiment of the present application. Non-hierarchical search performs block comparison directly on unreduced images. Common methods include full search and n-step search. FIG. 7 shows a motion search performed directly on the original-size image: within the search window in the reconstructed frame image of the previous frame image, the full search method is used to find which block in that reconstructed frame image has the smallest encoding cost relative to the current block. The minimum encoding cost may take various forms; for example, it may be the minimum sum of the absolute values of the per-pixel residuals between a candidate block and the current block. In FIG. 7, p is the horizontal search range.
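A full search using the sum of absolute residuals as the encoding cost can be sketched as below. For simplicity the sketch uses the same range `p` horizontally and vertically, which is an assumption; the function names and the representation of frames as lists of rows are likewise illustrative only.

```python
def sad(cur, cx, cy, ref, rx, ry, n):
    """Sum of absolute per-pixel residuals between two n x n blocks."""
    return sum(abs(cur[cy + j][cx + i] - ref[ry + j][rx + i])
               for j in range(n) for i in range(n))


def full_search(cur, ref, bx, by, n, p):
    """Try every displacement within +/-p pixels and keep the cheapest one.

    Returns (motion_vector, cost); candidates outside the frame are skipped.
    """
    height, width = len(ref), len(ref[0])
    best_mv, best_cost = None, None
    for dy in range(-p, p + 1):
        for dx in range(-p, p + 1):
            rx, ry = bx + dx, by + dy
            if rx < 0 or ry < 0 or rx + n > width or ry + n > height:
                continue
            cost = sad(cur, bx, by, ref, rx, ry, n)
            if best_cost is None or cost < best_cost:
                best_mv, best_cost = (dx, dy), cost
    return best_mv, best_cost


cur = [[0] * 8 for _ in range(8)]
ref = [[0] * 8 for _ in range(8)]
# Current block at (3, 3); identical content at (5, 4) in the reference frame.
cur[3][3], cur[3][4], cur[4][3], cur[4][4] = 10, 20, 30, 40
ref[4][5], ref[4][6], ref[5][5], ref[5][6] = 10, 20, 30, 40
result = full_search(cur, ref, 3, 3, 2, 2)
```

Every candidate block in the window is read once per block to be encoded, which is why the same reference data ends up being read repeatedly.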

It should be noted that motion estimation here refers to block-based motion estimation. The basic idea is to divide each frame of the image sequence into many non-overlapping blocks and to assume that all pixels in a block have the same displacement. Then, for each block, the block most similar to the current block is found within a given search range of the reference frame according to a certain block matching criterion. That block is the matching block, and the relative displacement between the matching block and the current block is the motion vector. Motion estimation searches reconstructed pixels that were encoded at earlier points in time, that is, reconstructed pixels in the reconstructed frame images of historical frame images.

In the inter-frame prediction mode, a preset number of historical frame images can be randomly selected from the historical frame images, and the reconstructed frame images of the selected historical frame images can be searched to calculate the displacement between blocks. The optimal motion vector is then selected as the final search result. It can be understood that the position of the matching block in the reconstructed frame image of the historical frame image can be determined according to the motion vector.

105. According to the relative relationship between the matching block and the block to be encoded, encode the block to be encoded.

For example, the relative relationship may include a relative displacement relationship and a relative error relationship between the matching block and the block to be encoded. For example, the two-dimensional pixels at the corresponding positions of the matching block are subtracted from the two-dimensional pixels of the block to be encoded to obtain the relative error between the matching block and the block to be encoded. The block to be encoded can then be encoded according to the relative displacement relationship and the relative error relationship between the matching block and the block to be encoded.
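The relative error described above, and its use together with the motion vector, can be sketched as follows. The function names are hypothetical, frames are plain lists of rows, and the transform, quantization and entropy coding stages are deliberately left out, so the round trip shown here is lossless.

```python
def residual_block(cur, ref, bx, by, mv, n):
    """Block to be encoded minus the matching block, pixel by pixel."""
    dx, dy = mv
    return [[cur[by + j][bx + i] - ref[by + dy + j][bx + dx + i]
             for i in range(n)]
            for j in range(n)]


def reconstruct_block(ref, bx, by, mv, residual):
    """Matching block plus residual gives back the block that was encoded."""
    dx, dy = mv
    n = len(residual)
    return [[ref[by + dy + j][bx + dx + i] + residual[j][i]
             for i in range(n)]
            for j in range(n)]


ref = [[x + 10 * y for x in range(6)] for y in range(6)]
cur = [[ref[y][x] + 1 for x in range(6)] for y in range(6)]
mv = (1, 0)                                   # matching block is one pixel to the right
res = residual_block(cur, ref, 2, 2, mv, 2)
rebuilt = reconstruct_block(ref, 2, 2, mv, res)
original = [row[2:4] for row in cur[2:4]]
```

Only the motion vector and the (usually small) residual need to be coded, rather than the raw pixels of the block.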

It can be understood that, in this embodiment of the present application, the video encoding apparatus may determine the block to be encoded from the current frame image, determine the first area that needs to be read repeatedly from the reconstructed frame image of the historical frame image, and store the image data of the first area in a preset memory, where the power consumption of the preset memory is less than a preset power consumption threshold. Then, the image data of the first area is read from the preset memory, and a matching block matching the block to be encoded is determined from the first area according to the read image data of the first area. Then, according to the relative relationship between the matching block and the block to be encoded, the block to be encoded is encoded. That is, in the embodiment of the present application, the image data of the first area is stored in a preset memory with low power consumption, thereby reducing the power consumption of the video encoding apparatus.

Please refer to FIG. 8 , which is a second schematic flowchart of a method for performing image processing in a video encoding apparatus provided by an embodiment of the present application. The method for image processing in a video encoding device can be applied to a video encoding device. The flow of the method for image processing in a video encoding device may include:

201. Determine the block to be encoded from the current frame image.

For example, each frame of image can be divided into multiple block lines, and each block line can be divided into multiple blocks. Before determining the block to be encoded in the current frame image, the block row to be encoded needs to be determined from the current frame image. The block row to be encoded refers to the block row where the block to be encoded is located. The block lines located before the block line to be encoded in the current frame image are all encoded block lines.

After the block row to be encoded is determined, the block to be encoded needs to be determined from the block row to be encoded. In the block row to be coded, the blocks located to the left of the block to be coded are all coded blocks. Please refer to FIG. 9. FIG. 9 is a schematic diagram of a scene of searching in a reconstructed frame image of a historical frame image provided by an embodiment of the present application. It can be seen from FIG. 9 that the block to be encoded is located within the range of the search window in the vertical projection direction.
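The encoding order implied above (block lines from top to bottom, and blocks from left to right within a block line) can be sketched in Python as follows. The sketch assumes square blocks and a frame whose dimensions are multiples of the block size; `raster_blocks` is a hypothetical helper name.

```python
def raster_blocks(frame, size):
    """Yield ((x, y), block) for each size x size block, block line by block line."""
    height, width = len(frame), len(frame[0])
    for y in range(0, height - size + 1, size):      # one iteration per block line
        for x in range(0, width - size + 1, size):   # left to right within the line
            yield (x, y), [row[x:x + size] for row in frame[y:y + size]]


frame = [[r * 4 + c for c in range(4)] for r in range(4)]
order = [pos for pos, _ in raster_blocks(frame, 2)]
```

When a given block is reached in this order, every block line above it and every block to its left in the same block line has already been visited, which matches the description of the already-encoded blocks.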

202. Determine, from the reconstructed frame images of the historical frame images stored in the second memory, a first area that needs to be read repeatedly for multiple times.

For example, the first area may include a plurality of block rows. After the block to be encoded is determined, it is necessary to determine a plurality of block lines that need to be read repeatedly from the reconstructed frame image of the historical frame image (which can be considered as a historical frame image with the strongest correlation with the current frame image), The multiple block lines that need to be read repeatedly are the block lines located in the first area. Wherein, each block row includes a plurality of blocks, and the plurality of blocks are arranged in a row.

The preset memory includes a first memory and a second memory. It should be noted that the reconstructed frame image of the historical frame image can be stored in the second memory in advance, and the first area that needs to be read repeatedly is then determined from the reconstructed frame image of the historical frame image stored in the second memory.

203. Read the image data of the first region from the second memory and store the image data in the first memory.

For example, after the first area that needs to be read repeatedly is determined from the reconstructed frame images of the historical frame images stored in the second memory, the image data of the first area can be read from the second memory and stored in the first memory, where it waits to be read when the video encoding apparatus performs encoding.

It should be noted that, in this embodiment of the present application, the power consumption of the second memory is greater than a first preset multiple of the power consumption of the first memory, and the sum of the power consumption of the first memory and the power consumption of the second memory is less than the preset power consumption threshold, which can reduce the power consumption when reading and writing data. The preset power consumption threshold may be considered as the power consumption when all the image data in the first area is read and written by the second memory.

For example, both the first memory and the second memory are memories external to the video encoding device. The first memory may include a system cache (Sys$) or a system buffer memory (SysBuf) provided outside the video encoding device, and the second memory may include a dynamic random access memory (DRAM) provided outside the video encoding device. Of course, the first memory may also be another low-power memory. The embodiments of this application are described by taking Sys$ or SysBuf as an example, where Sys$ or SysBuf is composed of multiple SRAMs and the second memory is DRAM. The power consumption of the DRAM is greater than the first preset multiple of the power consumption of the Sys$ or SysBuf outside the video encoding device, and the sum of the power consumption of the Sys$ or SysBuf and the power consumption of the DRAM is less than the preset power consumption threshold, which can reduce the power consumption when reading and writing data. The preset power consumption threshold can be considered as the power consumption when all the image data in the first area is read and written by the DRAM.

Please refer to FIG. 10 . FIG. 10 is a schematic diagram illustrating a comparison of the energy consumed when reading data in the static random access memory and the dynamic random access memory provided by the embodiment of the present application. The difference in energy consumption between reading data in SRAM and reading data in DRAM is about 100 times, that is, the power consumption of reading data in SRAM is far less than the power consumption of reading data in DRAM. By storing the image data of multiple block lines in Sys$ or SysBuf and DRAM respectively, when reading the image data in Sys$ or SysBuf, the power consumption when reading data can be reduced.

The motion estimation step of the video encoding device requires a large bandwidth from the DRAM, because during the search process certain associated regions (i.e., the first area) in the reconstructed frame images of the historical frame images are read for block search and comparison. Due to cost considerations, the block lines covered by the search range (that is, the first area) are usually not stored in full inside the video encoding device; usually only the portion required within the search range (for example, within the search window) is stored, so as to meet the high-speed data access requirements during motion estimation.

If the image data of the first area were stored inside the hardware of the video encoding device, that is, in its cache or buffer, which includes multiple SRAMs, the SRAM inside the video encoding device would need to be divided into more units, each unit being a bank, which increases the area of a single bank. As the area of a single bank becomes larger, the area of the SRAM also becomes larger while its storage capacity remains unchanged, resulting in higher costs. For example, taking a width of 8192 pixels and a vertical search range of ±64 as an example, the 8-bit luminance (luma) portion alone requires at least 1 megabyte (MB) of storage space. In addition, because the motion estimation algorithm is used, the SRAM needs to be divided into more units to meet the data input and output requirements, resulting in a larger SRAM area.
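The 1 MB figure can be checked with a short calculation. The Python sketch below assumes that the buffer holds every row within the vertical search range across the full picture width; the optional `block_height` parameter, which adds the rows of the block line itself, is an assumption made for illustration, and `luma_buffer_bytes` is a hypothetical name.

```python
def luma_buffer_bytes(width, vertical_range, bits_per_pixel=8, block_height=0):
    """Bytes needed to hold the luma rows covered by a +/-vertical_range search."""
    rows = 2 * vertical_range + block_height
    return width * rows * bits_per_pixel // 8


search_rows_only = luma_buffer_bytes(8192, 64)                 # 1048576 bytes = 1 MiB
with_block_line = luma_buffer_bytes(8192, 64, block_height=16)
```

Counting only the 128 rows above and below gives exactly 1 MiB, consistent with the "at least 1 MB" stated above; including a 16-row block line raises the requirement further.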

It should be noted that, in a motion estimation design that stores data in the form of a search window, compressing one block line requires the data of multiple block lines in the reconstructed frame image of the historical frame image, which means that processing one frame of data requires reading the equivalent of multiple frames of data.

For example, refer to FIG. 9 for the case where a video encoding device searches for motion vectors. Because the bandwidth demand within the search window is large, the image data of the search window is usually stored in a cache or buffer inside the video encoding device. The cache or buffer includes finely divided SRAM banks. The finer division means that the area per unit of storage becomes larger (for example, the average area per bit in a finely divided bank is larger than in an undivided SRAM), in exchange for providing sufficient data bandwidth to the motion estimation circuit. This not only results in a large SRAM area, but also makes layout and routing more difficult because the SRAM is divided into many banks. Therefore, the block lines of the entire first area are not stored using this method, which has a high area per storage unit (e.g., per bit).

That is to say, each time the block lines covered by the first area move down by one block line during the encoding process, the first area is fetched again. Usually, the vertical search range is several times the height of the block to be encoded, so the bandwidth for reading the image data of the first area is several times the bandwidth for writing it. This situation is even more serious when the picture to be encoded reaches 4K or 8K. The resolution of 4K pictures is 3840×2160 pixels, and the resolution of 8K pictures is 7680×4320 pixels. When encoding 4K and 8K pictures, the vertical search range must be larger than that used at 1080P resolution; otherwise the degree of picture compression will be greatly reduced.

In this embodiment of the present application, in 203, the image data of the first region is read from the second memory and stored in the first memory, which may include:

If the first area is moved down by one block line in the reconstructed frame image of the historical frame image, the image data of the block line newly covered after the move is read from the second memory and stored in the first memory;

For example, the block line area (i.e., the first area) in the reconstructed frame image of the historical frame image that needs to be read repeatedly may be stored in advance in the Sys$ or SysBuf outside the video encoding device. Please refer to FIG. 11 to FIG. 13. FIG. 11 is a schematic structural diagram of a video compression system using a system cache provided by an embodiment of the present application. FIG. 12 is another schematic structural diagram of a video compression system using a system cache provided by an embodiment of the present application. FIG. 13 is a schematic structural diagram of a video compression system using a system buffer memory provided by an embodiment of the present application. The image data of the first area is stored in Sys$ or SysBuf, and the image data of the search window is stored in the cache or buffer inside the video encoding device. In FIG. 11 to FIG. 13, n is a number indicating the size of the storage capacity. For example, in one embodiment, the DRAM reads and writes data at a speed of 0.5GB/s to 2GB/s, the Sys$ or SysBuf reads and writes data at a speed of 3GB/s to 8GB/s, and the cache or buffer reads and writes data at a speed of 10GB/s to 50GB/s.

It should be noted that, in other embodiments, the read/write speeds of the DRAM, the Sys$ or SysBuf, and the cache or buffer may take other values, provided that the read/write speed of the cache or buffer is greater than that of the Sys$ or SysBuf and that of the DRAM, and the read/write speed of the Sys$ or SysBuf is greater than that of the DRAM.

Taking FIG. 11 as an example, Sys$ can read data from DRAM through DramC, and the data read by Sys$ from DRAM through DramC can be read by central processing unit, video encoding device, image processor and neural network processor. When the first area moves down a block line in the reconstructed frame image of the historical frame image, both Sys$ and DRAM store the new block line, and Sys$ removes the unused block lines at the same time. When the device needs to perform encoding, it can directly read the data in the first area stored in Sys$. In addition, in Sys$, the image data in the first area is also read from DRAM through DramC, and then read by the video encoding device.

When storing, the Sys$ or SysBuf outside the video encoding device removes the block lines that will not be used for the next block line to be encoded. In this way, the number of times the image data of the first area is read from the DRAM on behalf of the cache or buffer inside the video encoding device changes from multiple times to once, and because the energy consumption of DRAM access is about 100 times higher than that of SRAM, this can greatly reduce power consumption.

For example, the position at which the reconstructed frame image of the historical frame image is read (i.e., the first area) and the read behavior (repeated reading) are predictable, and reading the reconstructed frame image of the historical frame image requires a multiple of the bandwidth required for reading the current frame image. If the image data of the first area, which is read multiple times, is stored in a low-power storage space such as Sys$ or SysBuf, the power consumption of the entire system can be greatly reduced while the operation of the video encoding device is effectively maintained, improving the user experience. According to the structure of the reconstructed frame image of the historical frame image compressed by the video encoding device, it can be determined how many relevant block lines of the reconstructed frame image are to be stored in such a low-power storage space. Whenever the block line to be encoded moves down by one line, the uppermost block line stored in Sys$ or SysBuf is removed, and the block line newly added to the first area by the move is read and its image data is stored.

Please refer to FIG. 14. FIG. 14 is a schematic diagram of a scene when the first area in the reconstructed frame image of the historical frame image provided by the embodiment of the present application is moved down by one block line. Each time the video encoding device encodes one block line downward, it evicts the unrelated block line originally stored in the upper part of Sys$ or SysBuf, and then sends the image data of the newly needed block line into Sys$ or SysBuf. That is, as the first area covered by the search range moves down with the block to be encoded, the area that is no longer used is evicted from Sys$ or SysBuf, and the block line that will be used is stored in Sys$ or SysBuf. In other words, every time the video encoding device encodes one block line downward, it removes the irrelevant block line above the first area stored in Sys$ or SysBuf, and then stores the block line newly required for encoding in the Sys$ or SysBuf outside the video encoding device.
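The eviction and refill behavior described above can be modeled as a sliding window of block lines. The following Python sketch is a simplified software model, not the hardware design of the application: the class name `BlockLineWindow`, the callable `fetch_from_dram`, and the `dram_reads` counter are all hypothetical, picture boundaries are ignored, and the window is assumed to move only downward.

```python
from collections import deque


class BlockLineWindow:
    """Model of the first area held in Sys$ or SysBuf as a window of block lines."""

    def __init__(self, fetch_from_dram, capacity):
        self.fetch = fetch_from_dram   # callable: block line index -> image data
        self.capacity = capacity       # number of block lines in the first area
        self.lines = deque()           # (index, data) pairs, uppermost first
        self.dram_reads = 0            # block lines fetched from DRAM so far

    def move_to(self, top):
        """Make the window cover block lines top .. top + capacity - 1."""
        # Evict block lines that now lie above the first area.
        while self.lines and self.lines[0][0] < top:
            self.lines.popleft()
        # Fetch only the block lines that are not held yet.
        start = self.lines[-1][0] + 1 if self.lines else top
        for index in range(start, top + self.capacity):
            self.lines.append((index, self.fetch(index)))
            self.dram_reads += 1


window = BlockLineWindow(lambda index: f"line {index}", capacity=9)
for top in range(5):
    window.move_to(top)
```

In this example the first placement fetches 9 block lines and each of the 4 subsequent moves fetches 1, for 13 DRAM reads in total, whereas re-fetching the whole first area at every position would take 45.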

The image data of the first region is read block line by block line from the first memory.

For example, when the video encoding apparatus needs to perform encoding, the image data of the first region may be read block line by block line from the first memory, for example, from Sys$ or SysBuf. Reading block line by block line means reading in order from top to bottom.

Taking Advanced Video Coding (H.264/AVC) as an example, assume that the size of the macroblock in FIG. 14 is 16×16 pixels, that is, 16 pixels in the horizontal direction by 16 pixels in the vertical direction. Of course, the size of the macroblock may also be 32×32 pixels, 64×64 pixels, or the like. The macroblock is the block to be encoded in the current frame image. The vertical search range is ±64. In this case, the amount of data read from the reconstructed frame image of the historical frame image is 9 (= (64+16+64)/16) times the amount of data of the current frame image.
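The factor of 9 follows directly from the ratio of rows covered by the vertical search to rows in the block being encoded. A small Python sketch of that arithmetic is shown below; the name `read_amplification` and its parameter names are hypothetical.

```python
def read_amplification(block_height, search_up, search_down):
    """Rows of reference data read per row of the block line being encoded."""
    return (search_up + block_height + search_down) / block_height


factor = read_amplification(block_height=16, search_up=64, search_down=64)
```

With a 16-row macroblock and a ±64 search range this gives 144/16 = 9; a larger macroblock with the same search range would give a smaller factor.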

205. If the number of times of reading from the first memory is greater than or equal to a preset number of times threshold, read image data of unread block lines in the first region from the second memory block line by block line.

For example, when reading the image data of the first area, the total number of reads that would otherwise all go to the second memory can be split so that the first several reads come from the first memory and the remaining reads come from the second memory. If the minimum requirement of Sys$ or SysBuf is met, the 9 copies of data read from DRAM are split into 1 read from DRAM and 8 reads from Sys$ or SysBuf. The power consumption of reading data with the assistance of Sys$ or SysBuf is then reduced to 11.81% of that without assistance, that is, (1×640+8×5)/(9×640)≈11.81%. Depending on specific needs, the 9 copies of data read from DRAM can instead be split into 2 reads from DRAM and 7 reads from Sys$ or SysBuf, and so on. When the power consumption requirements are strict, all 9 copies of data can be read from Sys$ or SysBuf; the power consumption is then the lowest, but the cost increases.
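The percentage above can be reproduced with the following Python sketch. It takes the relative energy values 640 (per DRAM read) and 5 (per Sys$ or SysBuf read) from the formula in the text, without assuming any particular unit; the function name `relative_read_energy` is hypothetical.

```python
def relative_read_energy(total_reads, dram_reads, dram_cost=640, sram_cost=5):
    """Energy of a DRAM/SRAM read split, relative to reading everything from DRAM."""
    sram_reads = total_reads - dram_reads
    assisted = dram_reads * dram_cost + sram_reads * sram_cost
    baseline = total_reads * dram_cost
    return assisted / baseline


one_dram_read = relative_read_energy(9, 1)    # 680 / 5760
two_dram_reads = relative_read_energy(9, 2)   # 1315 / 5760
all_from_sram = relative_read_energy(9, 0)    # 45 / 5760
```

The first value is about 0.1181, matching the 11.81% stated above, and the other two show how the ratio changes as reads are moved between DRAM and Sys$ or SysBuf.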

Since the cost of SRAM is relatively high and the cost of DRAM is relatively low, SRAM capacity is generally kept small for cost reasons, while DRAM can be relatively large. Therefore, in order to reduce the power consumption when reading data, in the embodiments of the present application the reads that would originally all come from DRAM can be split into several reads from SRAM and several reads from DRAM, which reduces the overall power consumption of reading data. The number of reads from the SRAM and the number of reads from the DRAM can be adjusted to meet different power consumption requirements.

For example, when reading the image data of the first area, reading can start from Sys$ or SysBuf, and when the number of reads is greater than or equal to the preset threshold, reading switches to DRAM for the image data of the block lines in the first area that have not yet been read. When reading the same image data, DRAM consumes about 100 times more energy than SRAM. Therefore, by reading a part of the image data in the first area from Sys$ or SysBuf and the other part of the data from DRAM, the power consumption of reading data can be reduced.
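The switch between the two memories can be sketched as a simple counter. In the Python sketch below, the two reader callables and the function name `read_first_area` are hypothetical stand-ins for the actual memory interfaces, and each block line is assumed to cost one read.

```python
def read_first_area(block_lines, threshold, read_first_memory, read_second_memory):
    """Read block lines in order, switching memories once the threshold is reached."""
    data = []
    first_memory_reads = 0
    for index in block_lines:
        if first_memory_reads < threshold:
            # Below the threshold: serve the block line from Sys$ or SysBuf.
            data.append(read_first_memory(index))
            first_memory_reads += 1
        else:
            # Threshold reached: remaining unread block lines come from DRAM.
            data.append(read_second_memory(index))
    return data


sources = read_first_area(
    range(9),
    threshold=8,
    read_first_memory=lambda i: ("Sys$", i),
    read_second_memory=lambda i: ("DRAM", i),
)
```

With 9 block lines and a threshold of 8, the first 8 block lines are served from Sys$ and the last one from DRAM.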

Please refer to FIG. 15. FIG. 15 is a schematic diagram of a power consumption curve when reading and writing data from a multi-channel DRAM according to an embodiment of the present application. In FIG. 15, the abscissa is the position in the reconstructed frame image of the historical frame image, for example, the top position of the image, the middle position of the image, and the bottom position of the image, and the ordinate is the power consumption of reading and writing data during video encoding. When a video encoding device relies too heavily on DRAM or other cheap but power-hungry storage while requiring high bandwidth, the limited power budget provided by the video compression system can make the video encoding device unable to meet the speed requirements, or cause the video compression system to overheat. If the upper limit of power consumption is taken into account, the speed of reading and writing data is limited and cannot reach the speed achievable when the upper limit is not considered.

Please refer to FIG. 16. FIG. 16 is a schematic diagram of a power consumption curve when reading and writing data from Sys$ or SysBuf and DRAM, respectively, according to an embodiment of the present application. The video encoding device replaces a large amount of DRAM power consumption with the power consumption of Sys$ or SysBuf, which greatly reduces power consumption.

206. Determine the image data of the search window from the read image data of the first area, where the search window is located in the first area.

For example, in order to further narrow the search range, a search window may be determined within the first area and matching blocks searched for within it, thereby further reducing power consumption. For motion estimation, whether a non-hierarchical search operates on the unreduced reconstructed frame image of the historical frame image, or a hierarchical search operates on the reduced or unreduced reconstructed frame image, the search window is applicable as long as its vertical position can be predicted.

Please refer to FIG. 17. FIG. 17 is a scene schematic diagram of a search range of a search window in a reconstructed frame image of a historical frame image provided by an embodiment of the present application. It can be seen from FIG. 17 that the search window is located in the first area, the area between adjacent dotted lines in the first area is a block row, and the searched motion vector can point anywhere in the search window. L, R, T and B are respectively the search ranges to the left of, to the right of, above and below the block to be coded within the search window. R and B are positive numbers and L and T are negative numbers. The magnitude of L is not necessarily equal to that of R, and the magnitude of T is not necessarily equal to that of B.
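Given these signed ranges, the pixel bounds of the search window can be sketched as follows. This Python sketch assumes that (bx, by) is the top-left corner of the block to be coded and that the window is clamped to the picture boundaries; the clamping and the name `search_window_bounds` are assumptions made for illustration.

```python
def search_window_bounds(bx, by, bw, bh, L, R, T, B, frame_w, frame_h):
    """Return (left, top, right, bottom) of the search window; right/bottom exclusive."""
    left = max(0, bx + L)                 # L is negative: extends to the left
    top = max(0, by + T)                  # T is negative: extends upward
    right = min(frame_w, bx + bw + R)     # R is positive: extends to the right
    bottom = min(frame_h, by + bh + B)    # B is positive: extends downward
    return left, top, right, bottom


bounds = search_window_bounds(
    bx=64, by=64, bw=16, bh=16, L=-32, R=48, T=-64, B=64, frame_w=1920, frame_h=1080
)
```

In this example the left and right ranges differ in magnitude, so the window is asymmetric around the block: it spans x from 32 to 128 and y from 0 to 144.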

207. Store the image data of the search window in a third memory, where the read/write speed of the third memory is greater than the second preset multiple of the read/write speed of the first memory.

For example, after the image data of the search window is determined, it is stored in a third memory. The third memory may be a memory inside the video encoding device, and may include a cache or buffer provided inside the video encoding device. Since the blocks within the search window are searched during motion estimation, and the demand for bandwidth is high during block search, the read/write speed of the third memory is higher than the read/write speeds of the first memory and the second memory, so as to meet the requirements of search speed and high bandwidth. Specifically, the read/write speed of the third memory is greater than the second preset multiple of the read/write speed of the first memory.

208. Read the image data of the search window from the third memory, and according to the image data of the search window, determine from the search window the block with the smallest encoding cost compared to the block to be encoded.

For example, when searching, the image data of the search window is read from the third memory, and the search can be carried out in a hierarchical or non-hierarchical manner. The searched blocks are compared with the block to be encoded to determine the block with the smallest coding cost relative to the block to be encoded. In one embodiment, the coding cost may include a residual; in another embodiment, the coding cost may include a block vector and a residual, and so on. Accordingly, the block with the smallest coding cost may be the block with the smallest residual relative to the block to be encoded, or the block with the smallest coding cost after comprehensively considering the block vector and the residual relative to the block to be encoded.
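One way to combine the two forms of coding cost is sketched below in Python. The weighting factor `lam` and the use of the vector's absolute components as its cost are assumptions made for illustration only; the application does not specify how the block vector and the residual are combined. With `lam=0` the cost reduces to the residual alone.

```python
def coding_cost(block, candidate, vector, lam=0):
    """Residual cost (SAD) plus an optional, assumed penalty on the block vector."""
    residual_cost = sum(
        abs(b - c)
        for block_row, cand_row in zip(block, candidate)
        for b, c in zip(block_row, cand_row)
    )
    vector_cost = abs(vector[0]) + abs(vector[1])
    return residual_cost + lam * vector_cost


block = [[10, 12], [14, 16]]
candidate = [[9, 12], [15, 18]]
residual_only = coding_cost(block, candidate, vector=(3, -1))
with_vector = coding_cost(block, candidate, vector=(3, -1), lam=2)
```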

For example, for motion estimation, the search window is scanned block line by block line, and each searched block is compared with the block to be coded, so that the block with the smallest residual relative to the block to be coded can be found in the search window. The motion vector may be the relative displacement between the searched block and the block to be encoded. The residual may be the difference obtained by subtracting the two-dimensional pixels at the corresponding positions of the searched block from the two-dimensional pixels of the block to be encoded.

For example, in one embodiment, in 208, reading the image data of the search window from the third memory and determining, according to the image data of the search window, the block with the smallest encoding cost relative to the block to be encoded from the search window may include:

Read the image data of the search window from the third memory;

For example, a hierarchical search method can be used when searching, and the search proceeds through as many levels as there are layers. For example, with two layers a two-level search is performed, and with three layers a three-level search is performed. Of course, the more layers there are, the more accurate the search results will be, but the consumption of system computing resources also increases. In practical applications, an appropriate number of layers can be set according to specific needs. It should be noted that the reduction ratio of each layer may be the same or different.

For example, after the image data of the search window is read from the third memory, the search window is reduced according to the preset number of layers. For example, with two layers, the search window is reduced to obtain a reduced search window that is 1/2 the size of the original search window. Then, according to the image data of the reduced search window, a reduced block with the smallest coding cost relative to the block to be encoded is determined from the reduced search window; the reduced block and the reduced search window have the same reduction ratio. That is, the approximate range of the block being searched for is first determined on the image of the reduced search window, and a finer block search is then performed on the image of the unreduced search window: according to the approximate range of the reduced block in the reduced search window, the original search window is searched more finely, so that the block with the smallest encoding cost relative to the block to be encoded can be determined from the unreduced search window.

For another example, after the image data of the search window is read from the third memory, the search window is reduced according to the preset number of layers. For example, with three layers, the search window is reduced to obtain a reduced search window that is 1/4 the size of the original search window. Then, according to the image data of the 1/4-size search window, the reduced block with the smallest coding cost relative to the block to be coded is determined from the 1/4-size search window, and the corresponding block vector in the 1/4-size search window is obtained. After that, a finer search over a smaller range is performed in the 1/2-size search window, and finally the original-size search window is searched according to the block vector obtained from the 1/2-size search window to obtain the final block vector, so that the block with the smallest coding cost relative to the block to be coded can be determined.

For another example, after the image data of the search window is read from the third memory, the search window is reduced according to the preset number of layers. For example, with three layers, the search window is reduced to obtain a reduced search window that is 1/6 the size of the original search window. Then, according to the image data of the 1/6-size search window, the reduced block with the smallest coding cost relative to the block to be coded is determined from the 1/6-size search window, and the corresponding block vector in the 1/6-size search window is obtained. After that, a finer search over a smaller range is performed in the 1/3-size search window, and finally the original-size search window is searched according to the block vector obtained from the 1/3-size search window to obtain the final block vector, so that the block with the smallest coding cost relative to the block to be coded can be determined.

It can be seen that the approximate range of the block being searched for is first determined on the image of the reduced search window, and a finer block search is then performed on the image of the unreduced search window. According to the approximate range of the reduced block in the reduced search window, the original search window is searched more precisely, and the block with the smallest encoding cost relative to the block to be encoded can be determined from the unreduced search window.
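A two-layer version of this coarse-to-fine procedure can be sketched in Python as follows. The sketch makes several assumptions that are not stated in the application: reduction is done by averaging each 2×2 neighborhood, the cost is the sum of absolute differences, the fine search covers ±1 pixel around the scaled-up coarse result, and image dimensions are even. All function names are hypothetical.

```python
def downsample(img):
    """Halve both dimensions by averaging each 2x2 neighborhood (integer division)."""
    return [
        [(img[y][x] + img[y][x + 1] + img[y + 1][x] + img[y + 1][x + 1]) // 4
         for x in range(0, len(img[0]) - 1, 2)]
        for y in range(0, len(img) - 1, 2)
    ]


def sad(ref, x, y, block):
    """Sum of absolute differences between `block` and the region of `ref` at (x, y)."""
    return sum(abs(ref[y + j][x + i] - block[j][i])
               for j in range(len(block)) for i in range(len(block[0])))


def search(ref, block, cx, cy, radius):
    """Best top-left position within +/-radius of (cx, cy), by minimum SAD."""
    bh, bw = len(block), len(block[0])
    candidates = [
        (x, y)
        for y in range(cy - radius, cy + radius + 1)
        for x in range(cx - radius, cx + radius + 1)
        if 0 <= x <= len(ref[0]) - bw and 0 <= y <= len(ref) - bh
    ]
    return min(candidates, key=lambda pos: sad(ref, pos[0], pos[1], block))


def two_layer_search(ref, block, bx, by, radius):
    """Coarse search at 1/2 size, then a fine search on the original image."""
    # Layer 1: approximate position on the reduced images.
    cx, cy = search(downsample(ref), downsample(block), bx // 2, by // 2, radius // 2)
    # Layer 2: refine around the scaled-up coarse position on the unreduced image.
    x, y = search(ref, block, 2 * cx, 2 * cy, 1)
    return (x - bx, y - by)


# Toy reference image; the current block is a copy of the 4x4 region at (8, 4).
ref = [[7 * r + 13 * c for c in range(16)] for r in range(16)]
block = [row[8:12] for row in ref[4:8]]
vector = two_layer_search(ref, block, bx=4, by=4, radius=4)
```

The coarse layer evaluates far fewer candidates than a full search over the same range, at the price of possibly missing the true minimum when the reduced image hides fine detail, which is the accuracy versus computation trade-off noted above.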

209. Use the block with the least coding cost of the block to be coded as a matching block.

For example, after the block with the smallest coding cost (e.g., the smallest residual) relative to the block to be coded is found, that block is used as the matching block.

210. Code the block to be coded according to the motion vector and the residual of the matching block and the block to be coded.

For example, the relative relationship between the matching block and the block to be coded may be a motion vector and a residual. After the matching block is found, the block to be coded can be coded according to the motion vector and the residual of the matching block and the block to be coded.

In an implementation manner, in 210, encoding the block to be coded according to the motion vector and the residual of the matching block and the block to be coded may include:

Carrying out forward transform and quantization (Forward Transform & Quantization, FTQ) on the residual of the matching block and the block to be encoded;

Entropy coding (Entropy Coding, EC) is performed on the motion vector of the matching block and the block to be coded and the forward transformed and quantized first residual data to obtain video stream coded data; or

The to-be-coded block is reconstructed according to the second residual data.

Please refer to FIG. 18. FIG. 18 is a schematic diagram of a scene of encoding by a video encoding apparatus provided by an embodiment of the present application. FIG. 18 shows the data flow relationship between motion estimation and the other modules of the video encoding apparatus. For example, motion estimation (hierarchical or non-hierarchical search can be used) searches the reconstructed frame images of multiple historical frame images and finds a matching block. The relative displacement between the matching block and the current block (i.e., the block to be encoded) is the motion vector, and the residual is obtained from the error between the current block and the matching block. Forward transformation and quantization are performed on the residual. The forward transformation adopts the Fast Fourier Transformation (FFT) to transform to the spectrum, where the abscissa of the spectrum curve is frequency and the ordinate is energy. After forward transformation, the pixels in the spatial domain are converted into spectral coefficients that are uncorrelated and energy-concentrated. The forward transformation only converts the data to the frequency domain and does not change the amount of data, which can reduce distortion. Quantization can be achieved by dividing each value of the forward-transformed matrix by the value at the corresponding position in the quantization matrix. The spectral coefficients are further compressed by quantization and entropy coding to obtain a compressed video stream. The quantization process removes some unimportant high-frequency information, which compresses the amount of image data, so quantization is the key to compression. The first residual data is obtained after forward transformation and quantization.
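The element-wise division used for quantization can be sketched in Python as follows. Rounding the quotient to the nearest integer is an assumption made for illustration, as are the example coefficient and quantization matrices; the names `quantize` and `dequantize` are hypothetical.

```python
def quantize(coeffs, qmatrix):
    """Divide each transformed coefficient by the matching quantization step."""
    return [
        [round(c / q) for c, q in zip(coeff_row, q_row)]
        for coeff_row, q_row in zip(coeffs, qmatrix)
    ]


def dequantize(levels, qmatrix):
    """Approximate inverse: multiply each level by its quantization step."""
    return [
        [level * q for level, q in zip(level_row, q_row)]
        for level_row, q_row in zip(levels, qmatrix)
    ]


# Low-frequency coefficients (top-left) are large; high-frequency ones are small.
coeffs = [[100, 22], [-9, 3]]
qmatrix = [[10, 20], [20, 40]]
levels = quantize(coeffs, qmatrix)
restored = dequantize(levels, qmatrix)
```

In this example the two small high-frequency coefficients quantize to zero and are not recovered by the inverse step, which illustrates how quantization discards information to reduce the amount of data.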

The first residual data obtained after forward transformation and quantization is subjected to inverse quantization and inverse transformation (De-Quantization & Inv. Transform, DQIT) back to the spatial domain, which yields the second residual data of the matching block and the block to be encoded. The to-be-coded block of the current frame image is then reconstructed (Block Reconstruction, BlkRec) in the picture block area, where it serves as a neighbor of the next to-be-coded block. An in-loop filter (InF) is used to deal with the continuity problem between blocks and make block boundaries smoother. A commonly used loop filter is a linear low-pass filter that filters out high-frequency components and noise. Forward transformation and quantization can eliminate the spatial redundancy in the video image, and entropy coding can eliminate the coding redundancy.

It can be understood that the embodiments of the present application rely on predictable data access behavior (i.e., repeated reading behavior) during video encoding to intelligently select a data storage mode, thereby reducing the power consumption of the video encoding apparatus. Whether the data to be read is stored in the low-power Sys$ or SysBuf can be changed according to the frame reference reading strategy used during encoding, so that the reconstructed frame images of some or all of the historical frame images stored in Sys$ or SysBuf are those that are repeatedly read the most times. This reduces power consumption to the greatest extent and keeps the video encoding device in its lowest power consumption state when moving data in and out. If the Sys$ or SysBuf also has high bandwidth, it can satisfy the bandwidth required for repeatedly reading data, so the bandwidth demand on the DRAM can be further reduced.
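One way to picture the storage-mode selection described above is a placement routine that puts the most frequently re-read regions into the low-power memory first. The following Python sketch is a simple greedy heuristic under assumed inputs; the function name, the region names, the sizes, the expected read counts and the capacity are all hypothetical and are not specified in the present application.

```python
def choose_low_power_regions(regions, capacity):
    """Pick regions for the low-power memory (e.g. Sys$ or SysBuf).

    regions:  list of (name, size, expected_reads) tuples (hypothetical format).
    capacity: available space in the low-power memory, in the same unit as size.
    Regions are considered from most to least expected reads (greedy assumption).
    """
    chosen, used = [], 0
    for name, size, _reads in sorted(regions, key=lambda r: r[2], reverse=True):
        if used + size <= capacity:
            chosen.append(name)
            used += size
    return chosen


# Hypothetical reference regions: (name, size, expected number of repeated reads).
regions = [("ref0", 4, 9), ("ref1", 4, 3), ("ref2", 2, 6), ("ref3", 4, 1)]
selected = choose_low_power_regions(regions, capacity=8)
print(selected)
```

With these sample values, the two most frequently read regions fit in the low-power memory and the remaining regions would stay in DRAM.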

The embodiments of the present application keep the power consumption of the video encoding device controllable while allowing the hardware or software of the video encoding device to complete the encoding work as soon as possible. They make full use of the predictable behavior that the video encoding device will repeatedly read the image data of the first area, and change the storage characteristics of the read data accordingly. Because accessing the data consumes less power, the video encoding device can maintain its operating speed while reducing power consumption. The speed of reading data is not limited by power consumption, so the video encoding device does not overheat. In addition, the SRAM in Sys$ or SysBuf has low read and write latency, which can improve the processing frame rate and reduce the response latency. Since the power consumption can be greatly reduced, the battery life of the video encoding device can be extended and the user experience improved.

Please refer to FIG. 19. FIG. 19 is a third schematic flowchart of a method for performing image processing in a video encoding apparatus provided by an embodiment of the present application. The method of performing image processing in a video encoding device can be applied to a video encoding device or the like. The flow of the method for image processing in a video encoding device may include:

301. Determine the block to be encoded from the current frame image.

For the specific implementation of step 301, reference may be made to the embodiment of step 201, and details are not described herein again.

302. Determine, from the reconstructed frame images of the plurality of historical frame images stored in the second memory, a plurality of first regions that need to be read repeatedly for multiple times.

For example, after the block to be encoded is determined, multiple first regions that need to be read repeatedly are determined from the reconstructed frame images of multiple historical frame images; that is, in the reconstructed frame image of each historical frame image, the first region that needs to be read repeatedly is identified. The first area may include a plurality of block rows, and each block row includes a plurality of blocks arranged in a row.

The preset memory includes a first memory and a second memory. It should be noted that the reconstructed frame images of multiple historical frame images can be stored in the second memory in advance, and the plurality of first regions that need to be read repeatedly are then determined from the reconstructed frame images of the multiple historical frame images stored in the second memory.

303. Read the image data of multiple first regions from the second memory and store them in the first memory.

For example, after the multiple first areas that need to be read repeatedly are determined from the reconstructed frame images of multiple historical frame images stored in the second memory, the image data of the multiple first areas can be read from the second memory and stored in the first memory, where it waits to be read when the video encoding device performs encoding.

It should be noted that, in this embodiment of the present application, for example, the power consumption of the second memory is greater than a first preset multiple of the power consumption of the first memory. After the multiple first regions that need to be read repeatedly are determined from the reconstructed frame images of multiple historical frame images stored in the second memory, the image data of the multiple first regions is read from the second memory and stored in the first memory. The total power consumption of reading and writing data from the first memory and the second memory is less than the preset power consumption threshold, which can reduce the power consumption when reading data.

For example, the first memory may be Sys$ or SysBuf, the second memory may be DRAM, and the power consumption of the DRAM is greater than the first preset multiple of the power consumption of Sys$ or SysBuf. Referring to FIG. 10, the energy of reading SRAM and the energy of reading DRAM differ by a factor of about 100; that is, reading SRAM takes much less energy than reading DRAM. By storing the image data of multiple first areas in both Sys$ or SysBuf (which is composed of multiple SRAMs) and DRAM, and reading the image data of the first areas from Sys$ or SysBuf and from DRAM respectively, the power consumption when reading data can be reduced as a whole.
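The effect of the roughly 100-fold energy difference can be illustrated with a small back-of-the-envelope model. The following Python sketch compares reading a region repeatedly from DRAM with staging it once in SRAM and reading it from there. The unit energies, the assumption that an SRAM write costs about as much as an SRAM read, and the function names are illustrative assumptions, not measured values from the present application.

```python
# Relative energy per access, in arbitrary units (assumed values).
DRAM_READ = 100.0  # about 100x an SRAM read, following the ratio cited for FIG. 10
SRAM_READ = 1.0
SRAM_WRITE = 1.0   # assumption: a write costs about the same as a read


def energy_dram_only(n_reads):
    """Energy when every repeated read of the region goes to DRAM."""
    return n_reads * DRAM_READ


def energy_with_sram(n_reads):
    """Energy when the region is fetched from DRAM once, staged in SRAM,
    and then read n_reads times from SRAM."""
    return DRAM_READ + SRAM_WRITE + n_reads * SRAM_READ


for n in (1, 8):
    print(n, energy_dram_only(n), energy_with_sram(n))
```

Under these assumed numbers, staging pays off only when the region is read more than once: for a single read the staged path costs slightly more, while for eight reads it costs about one seventh as much. This is why the method targets regions that are read repeatedly.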

304. Read image data of multiple first regions from the first memory block by row.

For example, the image data of the plurality of first regions may be read from the first memory. The number of times of reading from the first memory may be greater than, less than, or equal to the number of times of reading from the second memory. The specific number of times of reading from the first memory and from the second memory should be set according to the specific scenario, and is not specifically limited in this embodiment of the present application.

For example, in one embodiment, the image data of the multiple first regions can be read from Sys$ or SysBuf (Sys$ or SysBuf is composed of multiple SRAMs). The data is read block row by block row; that is, the block rows in the first area are read in order from top to bottom. When the number of times of reading from Sys$ or SysBuf is greater than or equal to the preset number of times threshold, reading switches to the DRAM for the remaining data. When accessing the same data, DRAM consumes about 100 times more energy than SRAM. Therefore, by reading a part of the image data of the multiple first regions from Sys$ or SysBuf and reading the other part from the DRAM, the power consumption of reading data can be reduced. As can be seen from FIG. 16, the video encoding device replaces a large amount of DRAM power consumption with the power consumption of Sys$ or SysBuf, which greatly reduces the power consumption.

305. If the number of times of reading from the first memory is greater than or equal to the preset number of times threshold, read the image data of a plurality of unread block lines in the first region from the second memory, block line by block line.

For the specific implementation of step 305, reference may be made to the embodiment of step 205, and details are not described herein again.
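Steps 304 and 305 can be sketched as a loop that reads block rows from top to bottom and switches memories once the read count reaches the threshold. In the following Python sketch, the two memories are modeled as plain dictionaries keyed by block row index, and the function name and sample data are hypothetical; a real device would issue hardware reads instead.

```python
def read_block_rows(num_rows, first_memory, second_memory, max_first_reads):
    """Read block rows 0..num_rows-1 in order (top to bottom).

    Rows are taken from first_memory until the number of reads from it reaches
    max_first_reads (the preset threshold); the remaining unread rows are then
    taken from second_memory.
    """
    rows, sources = [], []
    first_reads = 0
    for row in range(num_rows):
        if first_reads < max_first_reads:
            rows.append(first_memory[row])
            sources.append("first")
            first_reads += 1
        else:
            rows.append(second_memory[row])
            sources.append("second")
    return rows, sources


# Hypothetical contents: the second memory (e.g. DRAM) holds every block row,
# and the first memory (e.g. Sys$ or SysBuf) holds a staged copy.
second_memory = {i: f"row{i}" for i in range(5)}
first_memory = dict(second_memory)

rows, sources = read_block_rows(5, first_memory, second_memory, max_first_reads=3)
print(sources)
```

With a threshold of three, the first three block rows come from the first memory and the last two come from the second memory.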

306. Determine image data of multiple search windows from the read image data of multiple first areas, where each search window is located in a corresponding first area.

For example, after the video encoding apparatus reads the image data of a plurality of first regions from the first memory and the second memory, the image data of a search window can be determined from the image data of each first region; that is, one search window can be determined in the first area of the reconstructed frame image of each historical frame image. Multiple search windows are thus determined, and each search window is located in its corresponding first area. For the specific implementation of determining the search window from the first region of the reconstructed frame image of each historical frame image, reference may be made to the embodiment in step 206, which will not be repeated here.

307. Store the image data of the multiple search windows in a third memory, where the read/write speed of the third memory is greater than the second preset multiple of the read/write speed of the first memory.

For example, after the image data of multiple search windows are determined, they may be stored in a third memory, where the read/write speed of the third memory is greater than the second preset multiple of the read/write speed of the first memory. For example, the third memory may be a cache or a buffer, and the read/write speed of the cache or buffer is greater than the second preset multiple of the read/write speed of Sys$ or SysBuf.

308. Read the image data of a plurality of search windows from the third memory, and according to the image data of the plurality of search windows, respectively determine one or more blocks with the least coding cost of the block to be coded from the plurality of search windows.

For example, according to the read image data of the multiple search windows, the blocks in each block row of each search window are compared with the block to be encoded, so that the coding cost of each block relative to the block to be encoded can be obtained. One or more blocks can then be selected in ascending order of coding cost; that is, one or more blocks with the smallest coding cost of the block to be coded are determined from each search window. For example, motion estimation scans the current search window block row by block row, compares each searched block with the block to be coded, and finds, from the current search window, the one or more blocks with the least coding cost for the block to be coded.
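The search described above can be sketched as an exhaustive scan of one search window. The present application does not define how the coding cost is computed, so the following Python sketch assumes the sum of absolute differences (SAD) as a stand-in cost and scans candidate positions one pixel at a time; the function names and sample data are hypothetical.

```python
def sad(block_a, block_b):
    """Sum of absolute differences, used here as an assumed coding cost."""
    return sum(abs(x - y)
               for row_a, row_b in zip(block_a, block_b)
               for x, y in zip(row_a, row_b))


def crop(image, top, left, size):
    """Extract a size x size block whose top-left corner is (top, left)."""
    return [row[left:left + size] for row in image[top:top + size]]


def search_window(window, target, keep=1):
    """Scan the window row by row and return the `keep` lowest-cost
    candidates as (cost, top, left) tuples in ascending order of cost."""
    size = len(target)
    candidates = []
    for top in range(len(window) - size + 1):
        for left in range(len(window[0]) - size + 1):
            cost = sad(crop(window, top, left, size), target)
            candidates.append((cost, top, left))
    return sorted(candidates)[:keep]


# Hypothetical 4x4 search window containing the 2x2 block to be encoded.
window = [
    [0, 0, 0, 0],
    [0, 5, 6, 0],
    [0, 7, 8, 0],
    [0, 0, 0, 0],
]
target = [[5, 6], [7, 8]]

best = search_window(window, target)
print(best)
```

Here the only zero-cost candidate sits at row 1, column 1 of the window. Passing a larger `keep` value returns several candidates in ascending order of cost, corresponding to "one or more blocks" in the text.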

For example, in one embodiment, reading the image data of multiple search windows from the third memory in 308 and, according to the image data of the multiple search windows, respectively determining from the multiple search windows one or more blocks with the least coding cost of the block to be encoded can include:

Read the image data of the plurality of search windows from the third memory;

For example, after the image data of the multiple search windows is read from the third memory, the multiple search windows are reduced according to the preset number of layers to obtain multiple reduced search windows, each of which is 1/2 the size of the original search window. Then, according to the image data of the multiple reduced search windows, one or more reduced blocks with the smallest coding cost of the block to be coded are respectively determined from the multiple reduced search windows, where each reduced block has the same reduction ratio as the reduced search window. On the image of each reduced search window, the approximate range of the one or more reduced blocks is determined first; the search then returns to the image of the unreduced search window to perform a finer block search. That is, according to the approximate range of the reduced block in the reduced search window, a finer search is performed on the original search window, and one or more blocks with the least coding cost of the block to be encoded can be determined from the unreduced search window.

For another example, after the image data of the plurality of search windows is read from the third memory, the plurality of search windows are reduced according to the preset number of layers. For example, the search windows are reduced according to three layers to obtain a plurality of reduced search windows that are each 1/4 of the size of the original search windows. Then, according to the image data of the multiple 1/4-reduced search windows, one or more reduced blocks with the smallest coding cost of the block to be coded are respectively determined from the multiple 1/4-reduced search windows, yielding one or more block vectors for each of the 1/4-reduced search windows. After that, a finer search over a smaller range is performed in the multiple 1/2-reduced search windows. Finally, according to the one or more block vectors obtained from the multiple 1/2-reduced search windows, the corresponding ranges of the multiple original-size search windows are searched again to obtain the final one or more block vectors, so that one or more blocks with the smallest coding cost of the block to be coded can be determined.

It can be seen that, in the images of the multiple reduced search windows, the approximate range of the reduced blocks to be searched is determined first, and the search then returns to the images of the multiple unreduced search windows to perform a more refined block search. That is, according to the approximate range of the reduced block in the reduced search window, the original search window can be searched more precisely, and one or more blocks with the least coding cost of the block to be encoded can be determined from the unreduced search windows.
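The coarse-to-fine idea above can be sketched with two layers: a full search on a half-size copy of the search window, followed by a small refinement around the scaled-up position in the original window. In the following Python sketch, the 2x2 averaging used for reduction, the SAD cost, the refinement radius of one pixel and all names are assumptions made for illustration; the three-layer variant would repeat the same refinement once more.

```python
def sad(block_a, block_b):
    """Sum of absolute differences, used here as an assumed coding cost."""
    return sum(abs(x - y)
               for row_a, row_b in zip(block_a, block_b)
               for x, y in zip(row_a, row_b))


def crop(image, top, left, size):
    """Extract a size x size block whose top-left corner is (top, left)."""
    return [row[left:left + size] for row in image[top:top + size]]


def downsample(image):
    """Halve width and height by averaging each 2x2 group (assumed filter)."""
    return [[(image[r][c] + image[r][c + 1]
              + image[r + 1][c] + image[r + 1][c + 1]) / 4
             for c in range(0, len(image[0]) - 1, 2)]
            for r in range(0, len(image) - 1, 2)]


def best_position(window, target, positions):
    """Return the lowest-cost (cost, top, left) among the given positions."""
    size = len(target)
    return min((sad(crop(window, t, l, size), target), t, l)
               for t, l in positions)


def hierarchical_search(window, target, radius=1):
    """Two-layer search: coarse pass at 1/2 size, then local refinement."""
    small_window, small_target = downsample(window), downsample(target)
    n = len(small_target)
    coarse = [(t, l)
              for t in range(len(small_window) - n + 1)
              for l in range(len(small_window[0]) - n + 1)]
    _, ct, cl = best_position(small_window, small_target, coarse)

    # Map the coarse position back to full resolution and search around it.
    size = len(target)
    max_t, max_l = len(window) - size, len(window[0]) - size
    fine = [(t, l)
            for t in range(max(0, 2 * ct - radius), min(max_t, 2 * ct + radius) + 1)
            for l in range(max(0, 2 * cl - radius), min(max_l, 2 * cl + radius) + 1)]
    return best_position(window, target, fine)


# Hypothetical 8x8 search window with the 4x4 target placed at row 2, column 4.
target = [[10 + 4 * r + c for c in range(4)] for r in range(4)]
window = [[0] * 8 for _ in range(8)]
for r in range(4):
    for c in range(4):
        window[2 + r][4 + c] = target[r][c]

result = hierarchical_search(window, target)
print(result)
```

The refinement pass here evaluates at most nine positions instead of all twenty-five positions of the full-size window, which is the saving the layered search is meant to provide.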

309. Determine a matching block from a plurality of blocks with the smallest coding cost of the block to be coded.

After one or more blocks with the smallest coding cost of the block to be coded are determined from each search window, the coding costs of these blocks relative to the block to be coded are compared again, and one or more blocks are selected in ascending order of coding cost to further refine the search results. It should be noted that the relative displacement between the one or more blocks with the least coding cost and the block to be coded can be used as the motion vector, and the difference between the block to be coded and the one or more blocks with the least coding cost can be used as the residual.

For example, since at least one block with the smallest coding cost of the block to be coded can be found in each search window, a plurality of such blocks can be found across the multiple search windows. From these blocks, one or more blocks can be selected in ascending order of coding cost; usually, the one or two blocks with the smallest coding cost are selected as matching blocks. The number of matching blocks can therefore be one, two, or more, depending on the required number of reference blocks. If the number of reference blocks to be examined is two, two matching blocks need to be determined.
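The final selection in step 309 amounts to merging the per-window candidates and keeping the cheapest ones. The following Python sketch uses the standard-library `heapq.nsmallest` for that purpose; the candidate format, the window identifiers and the cost values are hypothetical.

```python
import heapq


def select_matching_blocks(per_window_candidates, num_refs):
    """Merge candidates from all search windows and keep the num_refs cheapest.

    per_window_candidates: dict mapping a window id to a list of
    (cost, motion_vector) tuples (hypothetical format).
    Returns (cost, window_id, motion_vector) tuples in ascending order of cost.
    """
    merged = [(cost, window_id, mv)
              for window_id, candidates in per_window_candidates.items()
              for cost, mv in candidates]
    return heapq.nsmallest(num_refs, merged)


# Hypothetical best candidates found in three search windows.
candidates = {
    0: [(12, (1, 0)), (30, (2, 2))],
    1: [(7, (-1, 3))],
    2: [(19, (0, 0))],
}

matches = select_matching_blocks(candidates, num_refs=2)
print(matches)
```

With two reference blocks requested, the cheapest candidates from windows 1 and 0 are kept as the matching blocks.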

310. According to the relative relationship between the matching block and the block to be encoded, encode the block to be encoded.

For the specific implementation of step 310, reference may be made to the embodiment of step 210, and details are not described herein again.

It can be understood that, in the embodiment of the present application, the target position or attribute of data reading can be selected according to the long-term shooting requirement of the photographing device, the requirement of low heat dissipation cost, and the relatively large power consumption caused by the predictable behavior. For example, the data that needs to be read repeatedly is read from Sys$ or SysBuf and from DRAM respectively, rather than entirely from DRAM, because for the same data the power consumption of reading SRAM is far less than that of reading DRAM. Therefore, the embodiments of the present application can greatly reduce the power consumption when reading data.

This embodiment of the present application uses motion estimation as an example to describe in detail how to reduce the power consumption of reading data. In other embodiments, the approach can also be applied to any module or application that requires high bandwidth but has predictable data access behavior, such as video decoders and frame rate up-conversion devices. The behavior of these modules and applications, such as the number of repeated reads, is usually predictable. Through these predictable behaviors, the corresponding storage characteristics can be allocated in advance; that is, the repeatedly read data is stored in low-power memory. For example, the energy consumption of different levels of memory is matched to the number of accesses to the image data of all or part of the frames; that is, the memory level, and thus the energy consumption, is selected according to the number of accesses to the image data of all or part of the frames. When the energy consumption differs, the numbers of times data is read from Sys$ or SysBuf and from DRAM can be reasonably allocated.

For example, the video decoder can also determine the behavior of accessing data by analyzing the code stream in advance, and the frame rate boosting device can know which areas will be used multiple times during processing through simple analysis, and so on. It can also be applied to fixed artificial intelligence (AI) network behavior. The repeated reading part of AI network behavior is the feature map part, and the AI network behavior is predictable.

Please refer to FIG. 20 , which is a schematic structural diagram of an apparatus for performing image processing in a video encoding apparatus according to an embodiment of the present application. The apparatus 400 for performing image processing in a video encoding apparatus may include: a first determination module 401 , a second determination module 402 , a reading module 403 , a third determination module 404 , and an encoding module 405 .

The first determination module 401 is used to determine the block to be encoded from the current frame image;

The second determination module 402 is configured to determine, from the reconstructed frame images of the historical frame images, a first area that needs to be read repeatedly, and store the image data of the first area in a preset memory, where the power consumption of the preset memory is less than the preset power consumption threshold;

a reading module 403, configured to read the image data of the first region from the preset memory;

A third determining module 404, configured to determine a matching block matching the block to be encoded from the first region according to the read image data of the first region;

The encoding module 405 is configured to encode the block to be encoded according to the relative relationship between the matching block and the block to be encoded.

In one embodiment, the preset memory includes a first memory and a second memory, the power consumption of the second memory is greater than a first preset multiple of the power consumption of the first memory, and the first area includes a plurality of block lines. The second determining module 402 can be used for:

The reading module 403 can be used for:

Reading the image data of the first region from the first memory block by line;

In one embodiment, the second determining module 402 may be used to:

In one embodiment, the third determining module 404 may be used to:

The image data of the search window is stored in a third memory, and the read-write speed of the third memory is greater than the second preset multiple of the read-write speed of the first memory;

In one embodiment, the third determining module 404 may be used to:

Read the image data of the search window from the third memory;

In one embodiment, the relative relationship is a motion vector and a residual, and the encoding module 405 can be used to:

In one embodiment, the third determining module 404 may be used to:

The to-be-coded block is reconstructed according to the second residual data.

Determine from the reconstructed frame images of the plurality of historical frame images stored in the second memory, a plurality of first regions that need to be read repeatedly;

The image data of the plurality of first regions is read from the second memory and stored in the first memory.

The reading module 403 can be used for:

In one embodiment, the third determining module 404 may be used to:

Read the image data of the plurality of search windows from the third memory;

In one embodiment, the first memory includes a system cache or a system buffer memory provided outside the video encoding device, and the second memory includes a dynamic random access memory provided outside the video encoding device.

In one embodiment, the third memory includes a cache or buffer provided inside the video encoding device.

An embodiment of the present application provides a computer-readable storage medium on which a computer program is stored. When the computer program is executed on a computer, the computer is caused to execute the flow of the method for image processing in a video encoding device provided in this embodiment.

An embodiment of the present application further provides an electronic device, including a memory, a processor, and a video encoding apparatus. The processor is configured to execute the flow of the method for image processing in a video encoding device provided in this embodiment by calling a computer program stored in the memory.

For example, the above-mentioned electronic device may be a mobile terminal such as a tablet computer or a smart phone. Please refer to FIG. 21 , which is a schematic structural diagram of an electronic device provided by an embodiment of the present application.

The electronic device 500 may include a video encoding apparatus 501, a memory 502, a processor 503 and other components. Those skilled in the art can understand that the structure of the electronic device shown in FIG. 21 does not constitute a limitation on the electronic device, which may include more or fewer components than shown, combine some components, or arrange the components differently.

The video encoding device 501 may be used for encoding video images to compress the content of the video images.

Memory 502 may be used to store applications and data. The application program stored in the memory 502 contains executable code. Applications can be composed of various functional modules. The processor 503 executes various functional applications and data processing by running the application programs stored in the memory 502 .

The processor 503 is the control center of the electronic device. It uses various interfaces and lines to connect the various parts of the entire electronic device, and performs the various functions of the electronic device and processes data by running or executing the application programs stored in the memory 502 and calling the data stored in the memory 502, thereby monitoring the electronic device as a whole.

In this embodiment, the processor 503 in the electronic device loads the executable code corresponding to the processes of one or more application programs into the memory 502 according to the following instructions, and the processor 503 runs the application programs stored in the memory 502, thereby executing:

Determine the block to be coded from the current frame image;

read the image data of the first area from the preset memory;

Referring to FIG. 22, the electronic device 500 may include components such as a video encoding apparatus 501, a memory 502, a processor 503, a battery 504, an input unit 505, and an output unit 506.

The video encoding apparatus 501 may be used for encoding video images to compress the content of the video images.

The battery 504 may be used to provide electrical support for various components of the electronic device, thereby ensuring the normal operation of the various components.

The input unit 505 can be used to receive an input video stream of video images, for example, can be used to receive a video stream that needs to be compressed.

The output unit 506 may be used to output the compressed video stream.

Determine the block to be coded from the current frame image;

read the image data of the first area from the preset memory;

An embodiment of the present application further provides an image processing system. Please refer to FIG. 23 and FIG. 24. FIG. 23 is a schematic structural diagram of the image processing system provided by the embodiment of the present application. FIG. 24 is another schematic structural diagram of an image processing system provided by an embodiment of the present application. The image processing system 600 includes a video encoding apparatus 601, a first memory 602 and a second memory 603, where the power consumption of the second memory 603 is greater than a first preset multiple of the power consumption of the first memory 602. The video encoding apparatus 601 may include a third memory, and the reading speed of the third memory is greater than the second preset multiple of the reading speed of the first memory. The first memory 602 and the second memory 603 respectively store the repeatedly read image data of the reconstructed frame images of the historical frame images. During encoding, the video encoding device 601 reads the repeatedly read image data from the first memory 602 and the second memory 603 according to a preset number of times, determines the image data in the search window, and stores the image data in the search window in the third memory.

For example, after the reconstructed frame images of the historical frame images are stored in the second memory 603, the first area that needs to be read repeatedly can be determined from the reconstructed frame images of the historical frame images stored in the second memory 603. The image data of the first area is then read from the second memory 603 and stored in the first memory 602. During encoding, the video encoding apparatus 601 can read the image data of the first region from the first memory 602 block line by block line. If the number of times of reading from the first memory 602 is greater than or equal to the preset number of times threshold, the image data of the unread block lines in the first region is read block line by block line from the second memory 603.

It should be noted that when the image data is read from the second memory 603, the video encoding apparatus 601 may read the image data directly from the second memory 603; alternatively, the first memory 602 may first read the image data from the second memory 603 and store it, after which this part of the image data is read directly from the first memory 602 by the video encoding device 601.

The video encoding device 601 can read the image data in the search window from the third memory, determine from the search window, according to the image data read from the third memory, the matching block that matches the block to be encoded, and perform encoding according to the motion vector and residual of the matching block and the block to be encoded.

In the above embodiments, the description of each embodiment has its own emphasis. For parts that are not described in detail in a certain embodiment, please refer to the above detailed description of the method for performing image processing in a video encoding device, which is not repeated here.

The apparatus for performing image processing in a video encoding apparatus provided by the embodiments of the present application and the method for performing image processing in a video encoding apparatus in the above embodiments belong to the same concept. Any of the methods provided in the method embodiments for performing image processing in a video encoding device can be run on the apparatus for performing image processing in a video encoding device. For details of the specific implementation process, please refer to the method embodiments for performing image processing in a video encoding device, which are not repeated here.

It should be noted that, for the method for performing image processing in a video encoding apparatus described in the embodiments of the present application, those of ordinary skill in the art can understand that all or part of the process of implementing the method can be completed by controlling the relevant hardware through a computer program. The computer program can be stored in a computer-readable storage medium, such as a memory, and executed by at least one processor, and the execution may include the flow of an embodiment of the method for image processing in a video encoding device as described. The storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM, Read Only Memory), a random access memory (RAM, Random Access Memory), and the like.

For the apparatus for performing image processing in a video encoding apparatus described in the embodiments of the present application, each functional module may be integrated in one processing chip, each module may exist physically alone, or two or more modules may be integrated in one module. The above-mentioned integrated modules can be implemented in the form of hardware, and can also be implemented in the form of software function modules. If the integrated module is implemented in the form of a software function module and sold or used as an independent product, it can also be stored in a computer-readable storage medium, such as a read-only memory, a magnetic disk or an optical disk.

A method, device, storage medium, electronic device, and system for image processing in a video encoding device provided by the embodiments of the present application have been described above in detail. Specific examples are used herein to explain the principles and implementations of the present application, and the description of the above embodiments is only intended to help understand the method and core idea of the present application. Meanwhile, for those skilled in the art, there will be changes in the specific implementation and application scope according to the idea of the present application. In conclusion, the content of this specification should not be construed as a limitation on the present application.

Claims

A method for image processing in a video encoding device, wherein the method comprises:

determining a block to be encoded from a current frame image;

determining, from a reconstructed frame image of a historical frame image, a first area that needs to be read repeatedly, and storing the image data of the first area in a preset memory, wherein the power consumption of the preset memory is less than a preset power consumption threshold;

reading the image data of the first area from the preset memory;

determining, from the first area according to the read image data of the first area, a matching block matching the block to be encoded; and

encoding the block to be encoded according to the relative relationship between the matching block and the block to be encoded.
The method for image processing in a video encoding device according to claim 1, wherein the preset memory comprises a first memory and a second memory, the power consumption of the second memory is greater than a first preset multiple of the power consumption of the first memory, and the first area includes a plurality of block lines; the determining, from the reconstructed frame image of the historical frame image, the first area that needs to be read repeatedly, and storing the image data of the first area in the preset memory, includes:

determining, from the reconstructed frame image of the historical frame image stored in the second memory, the first area that needs to be read repeatedly;

reading the image data of the first area from the second memory and storing it in the first memory;

and the reading the image data of the first area from the preset memory includes:

reading the image data of the first area from the first memory block line by block line;

if the number of reads from the first memory is greater than or equal to a preset count threshold, reading the image data of the unread block lines in the first area from the second memory block line by block line.
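
By way of non-limiting illustration only, the read path recited above can be sketched in Python. The class name `TieredReader`, the parameter `max_first_reads`, and the modelling of both memories as dictionaries keyed by block-line index are hypothetical and are not taken from the present application; the sketch only assumes that block lines are read from the first memory until the preset count threshold is reached, after which the remaining unread block lines are read from the second memory.

```python
class TieredReader:
    """Toy model: read block lines from the first memory, then fall back."""

    def __init__(self, first_memory, second_memory, max_first_reads):
        self.first_memory = first_memory        # assumed dict: line index -> data
        self.second_memory = second_memory      # assumed dict: line index -> data
        self.max_first_reads = max_first_reads  # preset count threshold (assumed)
        self.first_reads = 0                    # reads served by the first memory

    def read_block_lines(self, line_indices):
        """Return a list of (data, source) pairs, one per requested block line."""
        out = []
        for idx in line_indices:
            if self.first_reads >= self.max_first_reads:
                # Threshold reached: unread block lines come from the second memory.
                out.append((self.second_memory[idx], "second"))
            else:
                out.append((self.first_memory[idx], "first"))
                self.first_reads += 1
        return out
```

In this sketch the threshold is counted per block-line read; a real device could count reads at a different granularity.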
The method for image processing in a video encoding device according to claim 2, wherein the reading the image data of the first area from the second memory and storing it in the first memory includes:

if the first area is moved down by one block line in the reconstructed frame image of the historical frame image, reading the image data of the newly covered block line from the second memory and storing it in the first memory;

removing from the first memory the block lines that are not used in encoding the next block line to be encoded.
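
As a hedged sketch of the sliding update described above, the following Python example keeps a fixed number of block lines in a dictionary standing in for the first memory. The class `BlockLineWindow`, its `height` parameter, and the `fetches` counter are hypothetical names introduced for illustration. The sketch assumes that the block lines "not used" by the next block line to be encoded are exactly those that fall outside the moved first area, which the application does not state in these terms.

```python
class BlockLineWindow:
    """Toy first-memory model holding `height` consecutive block lines."""

    def __init__(self, second_memory, height):
        self.second_memory = second_memory  # assumed list: full reconstructed frame
        self.height = height                # block lines kept in the first memory
        self.first_memory = {}              # line index -> data currently cached
        self.fetches = 0                    # block lines fetched from second memory

    def move_to(self, top):
        """Position the first area over block lines top .. top + height - 1."""
        last = min(top + self.height, len(self.second_memory))
        wanted = range(top, last)
        # Remove block lines that the next block line to be encoded will not use.
        for idx in [i for i in self.first_memory if i not in wanted]:
            del self.first_memory[idx]
        # Fetch only the block lines that are not already cached.
        for idx in wanted:
            if idx not in self.first_memory:
                self.first_memory[idx] = self.second_memory[idx]
                self.fetches += 1
```

Under these assumptions, moving the first area down by one block line costs a single fetch from the second memory rather than a reload of the whole area.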
The method for performing image processing in a video encoding device according to claim 3, wherein the determining, from the first area according to the read image data of the first area, a matching block matching the block to be encoded includes:

determining the image data of a search window from the read image data of the first area, the search window being located in the first area;

storing the image data of the search window in a third memory, wherein the read-write speed of the third memory is greater than a second preset multiple of the read-write speed of the first memory;

reading the image data of the search window from the third memory, and determining, from the search window according to the image data of the search window, the block with the smallest encoding cost relative to the block to be encoded;

using the block with the smallest encoding cost relative to the block to be encoded as the matching block.
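
For illustration only, the selection of the block with the smallest encoding cost can be sketched as an exhaustive search over the search window. The application does not fix a cost measure at this point; the sum of absolute differences (SAD) used below is an assumption, and the function names `sad`, `crop`, and `best_match` are hypothetical.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized 2-D blocks."""
    return sum(abs(a - b)
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))


def crop(image, top, left, size):
    """Extract a size x size block whose top-left corner is (top, left)."""
    return [row[left:left + size] for row in image[top:top + size]]


def best_match(search_window, target, size):
    """Return (cost, top, left) of the candidate with the smallest SAD cost."""
    best = None
    for top in range(len(search_window) - size + 1):
        for left in range(len(search_window[0]) - size + 1):
            cost = sad(crop(search_window, top, left, size), target)
            if best is None or cost < best[0]:
                best = (cost, top, left)
    return best
```

Because every candidate position is read several times during such a search, holding the search window in a fast memory inside the encoder is consistent with the role given to the third memory.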
The method for performing image processing in a video encoding device according to claim 4, wherein the reading the image data of the search window from the third memory, and determining, from the search window according to the image data of the search window, the block with the smallest encoding cost relative to the block to be encoded includes:

reading the image data of the search window from the third memory;

reducing the search window according to a preset number of layers to obtain a reduced search window;

determining, from the reduced search window according to the image data of the reduced search window, the reduced block with the smallest encoding cost relative to the block to be encoded;

determining, from the search window according to the position of the reduced block in the reduced search window, the block with the smallest encoding cost relative to the block to be encoded.
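
The layered search above can be sketched, under stated assumptions, as a coarse search on a reduced search window followed by a local refinement at full resolution. The halving of each dimension per layer by 2x2 averaging, the SAD cost, the `radius` of the refinement, and all function names are assumptions made for this sketch and are not specified by the application. The sketch further assumes that the window and block dimensions are divisible by 2 raised to the number of layers.

```python
def sad(a, b):
    """Sum of absolute differences between two equally sized 2-D blocks."""
    return sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def crop(img, top, left, h, w):
    """Extract an h x w block whose top-left corner is (top, left)."""
    return [row[left:left + w] for row in img[top:top + h]]


def downscale(img):
    """Halve both dimensions by averaging each 2x2 group (assumed reduction)."""
    return [[(img[r][c] + img[r][c + 1] + img[r + 1][c] + img[r + 1][c + 1]) // 4
             for c in range(0, len(img[0]) - 1, 2)]
            for r in range(0, len(img) - 1, 2)]


def search(window, target, candidates):
    """Return (cost, top, left) of the lowest-cost candidate position."""
    h, w = len(target), len(target[0])
    return min((sad(crop(window, t, l, h, w), target), t, l) for t, l in candidates)


def hierarchical_match(window, target, layers=1, radius=1):
    """Coarse search on the reduced window, then refine in the full window."""
    small_win, small_tgt = window, target
    for _ in range(layers):
        small_win, small_tgt = downscale(small_win), downscale(small_tgt)
    sh, sw = len(small_tgt), len(small_tgt[0])
    coarse = [(t, l)
              for t in range(len(small_win) - sh + 1)
              for l in range(len(small_win[0]) - sw + 1)]
    _, ct, cl = search(small_win, small_tgt, coarse)
    scale = 2 ** layers
    h, w = len(target), len(target[0])
    # Map the reduced-block position back to full resolution and refine nearby.
    fine = [(t, l)
            for t in range(max(0, ct * scale - radius),
                           min(len(window) - h, ct * scale + radius) + 1)
            for l in range(max(0, cl * scale - radius),
                           min(len(window[0]) - w, cl * scale + radius) + 1)]
    return search(window, target, fine)
```

The coarse pass examines far fewer candidates than an exhaustive full-resolution search, at the price of possibly missing the global minimum when the reduced image loses detail.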
The method for performing image processing in a video encoding device according to claim 1, wherein the relative relationship is a motion vector and a residual, and the encoding the block to be encoded according to the relative relationship between the matching block and the block to be encoded includes:

encoding the block to be encoded according to the motion vector and the residual of the matching block and the block to be encoded.
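
As a minimal sketch of the relative relationship named above, the following Python example derives a motion vector and a residual from a block to be encoded and its matching block. The sign convention (displacement from the position of the block to be encoded to the position of the matching block), the `(row, column)` position tuples, and the function names are assumptions made for illustration.

```python
def motion_vector_and_residual(current_block, block_pos, matching_block, match_pos):
    """Return ((d_row, d_col), residual) for a block and its matching block."""
    # Assumed convention: vector points from the current block to the match.
    mv = (match_pos[0] - block_pos[0], match_pos[1] - block_pos[1])
    residual = [[c - m for c, m in zip(cur_row, match_row)]
                for cur_row, match_row in zip(current_block, matching_block)]
    return mv, residual


def reconstruct(matching_block, residual):
    """Rebuild the block by adding the residual back onto the matching block."""
    return [[m + r for m, r in zip(match_row, res_row)]
            for match_row, res_row in zip(matching_block, residual)]
```

When the residual is transmitted without loss, `reconstruct` returns the original block exactly; with quantization it returns an approximation.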
The method for performing image processing in a video encoding device according to claim 6, wherein the encoding the block to be encoded according to the motion vector and the residual of the matching block and the block to be encoded includes:

performing forward transform and quantization on the residual of the matching block and the block to be encoded to obtain first residual data;

performing entropy coding on the motion vector of the matching block and the block to be encoded and on the first residual data to obtain encoded video stream data; or

performing inverse quantization and inverse transform on the first residual data to obtain second residual data;

reconstructing the block to be encoded according to the second residual data.
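
The transform and quantization round trip above can be illustrated with a small Python sketch. The application does not specify the transform or the quantizer; the 4x4 Hadamard matrix `H`, the uniform quantization step `qstep`, and the function names below are stand-ins chosen for brevity, and entropy coding is not shown.

```python
# Symmetric 4x4 Hadamard matrix (assumed stand-in transform); H times H is 4 * I.
H = [[1, 1, 1, 1],
     [1, 1, -1, -1],
     [1, -1, -1, 1],
     [1, -1, 1, -1]]


def matmul(a, b):
    """Multiply two 4x4 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def forward_transform_quantize(residual, qstep):
    """Residual -> quantized coefficient levels (the 'first residual data')."""
    coeffs = matmul(matmul(H, residual), H)
    return [[round(c / qstep) for c in row] for row in coeffs]


def dequantize_inverse_transform(levels, qstep):
    """Quantized levels -> approximate residual (the 'second residual data')."""
    coeffs = [[lv * qstep for lv in row] for row in levels]
    scaled = matmul(matmul(H, coeffs), H)  # equals 16 times the residual estimate
    return [[round(v / 16) for v in row] for row in scaled]
```

With `qstep` equal to 1 the round trip is lossless in this sketch; larger steps discard coefficient precision, which is why the encoder reconstructs from the second residual data rather than from the original residual.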
The method for image processing in a video encoding device according to claim 1, wherein the preset memory comprises a first memory and a second memory, the power consumption of the second memory is greater than a first preset multiple of the power consumption of the first memory, and the first area includes a plurality of block lines; the determining, from the reconstructed frame image of the historical frame image, the first area that needs to be read repeatedly, and storing the image data of the first area in the preset memory, includes:

determining, from the reconstructed frame images of a plurality of historical frame images stored in the second memory, a plurality of first areas that need to be read repeatedly;

reading the image data of the plurality of first areas from the second memory and storing it in the first memory;

and the reading the image data of the first area from the preset memory includes:

reading the image data of the plurality of first areas from the first memory block line by block line;

if the number of reads from the first memory is greater than or equal to a preset count threshold, reading the image data of the unread block lines in the plurality of first areas from the second memory block line by block line.
The method for performing image processing in a video encoding device according to claim 8, wherein the determining, from the first area according to the read image data of the first area, a matching block matching the block to be encoded includes:

determining the image data of a plurality of search windows from the read image data of the plurality of first areas, each search window being located in the corresponding first area;

storing the image data of the plurality of search windows in a third memory, wherein the read-write speed of the third memory is greater than a second preset multiple of the read-write speed of the first memory;

reading the image data of the plurality of search windows from the third memory, and determining, from each of the plurality of search windows according to the image data of the plurality of search windows, one or more blocks with the smallest encoding cost relative to the block to be encoded;

determining the matching block from the plurality of blocks with the smallest encoding cost relative to the block to be encoded.
The method for performing image processing in a video encoding device according to claim 9, wherein the reading the image data of the plurality of search windows from the third memory, and determining, from each of the plurality of search windows according to the image data of the plurality of search windows, one or more blocks with the smallest encoding cost relative to the block to be encoded includes:

reading the image data of the plurality of search windows from the third memory;

reducing the plurality of search windows according to a preset number of layers to obtain a plurality of reduced search windows;

determining, from each of the plurality of reduced search windows according to the image data of the plurality of reduced search windows, one or more reduced blocks with the smallest encoding cost relative to the block to be encoded;

determining, from each of the plurality of search windows according to the positions of the one or more reduced blocks in the reduced search windows, one or more blocks with the smallest encoding cost relative to the block to be encoded.
The method for performing image processing in a video encoding device according to claim 2, wherein the first memory includes a system cache or a system buffer memory provided outside the video encoding device, and the second memory includes a dynamic random access memory provided outside the video encoding device.
The method for performing image processing in a video encoding device according to claim 4, wherein the third memory comprises a cache or a buffer provided inside the video encoding device.
A device for image processing in a video encoding device, wherein the device comprises:

a first determining module, configured to determine a block to be encoded from a current frame image;

a second determining module, configured to determine, from a reconstructed frame image of a historical frame image, a first area that needs to be read repeatedly, and to store the image data of the first area in a preset memory, wherein the power consumption of the preset memory is less than a preset power consumption threshold;

a reading module, configured to read the image data of the first area from the preset memory;

a third determining module, configured to determine, from the first area according to the read image data of the first area, a matching block matching the block to be encoded; and

an encoding module, configured to encode the block to be encoded according to the relative relationship between the matching block and the block to be encoded.
The device for performing image processing in a video encoding device according to claim 13, wherein the preset memory includes a first memory and a second memory, the power consumption of the second memory is greater than a first preset multiple of the power consumption of the first memory, and the first area includes a plurality of block lines; the second determining module is further configured to determine, from the reconstructed frame image of the historical frame image stored in the second memory, the first area that needs to be read repeatedly;

and to read the image data of the first area from the second memory and store it in the first memory;

the reading module is further configured to read the image data of the first area from the first memory block line by block line;

and, if the number of reads from the first memory is greater than or equal to a preset count threshold, to read the image data of the unread block lines in the first area from the second memory block line by block line.
The device for performing image processing in a video encoding device according to claim 14, wherein the reading module is further configured to: if the first area is moved down by one block line in the reconstructed frame image of the historical frame image, read the image data of the newly covered block line from the second memory and store it in the first memory;

and remove from the first memory the block lines that are not used in encoding the next block line to be encoded.
The device for performing image processing in a video encoding device according to claim 15, wherein the third determining module is further configured to determine the image data of a search window from the read image data of the first area, the search window being located in the first area;

store the image data of the search window in a third memory, wherein the read-write speed of the third memory is greater than a second preset multiple of the read-write speed of the first memory;

read the image data of the search window from the third memory, and determine, from the search window according to the image data of the search window, the block with the smallest encoding cost relative to the block to be encoded;

and use the block with the smallest encoding cost relative to the block to be encoded as the matching block.
A computer-readable storage medium having a computer program stored thereon, wherein when the computer program is executed on a computer, the computer is caused to perform the method according to any one of claims 1 to 12.
An electronic device, comprising a memory, a processor and a video encoding device, wherein the processor executes the method according to any one of claims 1 to 12 by invoking a computer program stored in the memory.
An image processing system, comprising a video encoding device, a first memory and a second memory, wherein the power consumption of the second memory is greater than a first preset multiple of the power consumption of the first memory; the video encoding device includes a third memory, and the read speed of the third memory is greater than a second preset multiple of the read speed of the first memory; the first memory and the second memory respectively store image data, from the reconstructed frame images of historical frame images, that is repeatedly read by the video encoding device; when encoding, the video encoding device reads the repeatedly read image data from the first memory and the second memory according to a preset number of times, determines the image data in a search window from the read image data, and stores the image data in the search window in the third memory; and the video encoding device reads the image data in the search window from the third memory, determines a matching block matching the block to be encoded, and performs encoding according to the motion vector and the residual of the matching block and the block to be encoded.
The image processing system according to claim 19, wherein the first area that needs to be read repeatedly is determined from the reconstructed frame image of the historical frame image stored in the second memory, and the image data of the first area is read from the second memory and stored in the first memory; during encoding, the video encoding device reads the image data of the first area from the first memory block line by block line, and if the number of reads from the first memory is greater than or equal to a preset count threshold, reads the image data of the unread block lines in the first area from the second memory block line by block line.