CN108024116B - Data caching method and device

Info

Publication number
CN108024116B
Authority
CN
China
Prior art keywords
data
reference frame
frame data
ram
rows
Prior art date
Legal status
Active
Application number
CN201610972964.1A
Other languages
Chinese (zh)
Other versions
CN108024116A (en)
Inventor
张博 (Zhang Bo)
Current Assignee
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN201610972964.1A
Publication of CN108024116A
Application granted
Publication of CN108024116B

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/503 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
    • H04N19/51 Motion estimation or motion compensation
    • H04N19/57 Motion estimation characterised by a search window with variable size or shape
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/169 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/17 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding, the unit being an image region, e.g. an object
    • H04N19/176 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding, the unit being an image region, e.g. an object, the region being a block, e.g. a macroblock

Abstract

The embodiment of the invention discloses a data caching method, which comprises the following steps: monitoring whether the search window position of the current block has changed compared with the search window position of the previous current block; when a change in the search window position of the current block is detected, determining, in the cache region, the first M rows of data in the at least one column of partial reference frame data shifted out of the search window of the current block relative to the search window of the previous current block, and marking the first M rows of data; and when the size of the marked first M rows of data is detected to meet a preset condition, reading new M rows of data from the reference frame data stored in an external memory and storing the new M rows of data in the RAM in place of the marked first M rows of data in the cache region. The embodiment of the invention also discloses a data caching apparatus. With the invention, the device can update the data in the buffer while the data in the buffer is being read for motion estimation, thereby shortening the waiting time for reading data.

Description

Data caching method and device
Technical Field
The present invention relates to the field of electronic technologies, and in particular, to a data caching method and apparatus.
Background
At present, a processor generally needs to encode stored video data. Video encoding methods include intra-frame compression and inter-frame compression, and one of the key technologies of inter-frame compression is motion estimation: inter-frame compression through motion estimation can remove the redundancy between adjacent frames and improve the compression ratio of a video image.
In inter-frame coding with motion estimation, in order to improve the reading speed, the processor can read part of the reference frame data from the memory and buffer it in a buffer, so that during encoding the processor can read data directly from the buffer for motion estimation, which improves the processing speed. However, to save cost, when the processor reads reference frame data from the memory into the buffer, it reads only the search window data corresponding to the current block from the reference frame data stored in the external memory. Therefore, when the processor processes the next current block, it must re-read new search window data from the reference frame data stored in the memory. Since the external memory is read much more slowly than the buffer, reading parts of the reference frame data from the memory many times costs the processor a great deal of time and leaves it idle and waiting, which seriously affects encoding efficiency; moreover, when the search window in the buffer is updated, the FPGA device may need to read data at the same memory address repeatedly, which consumes a large amount of read bandwidth.
Disclosure of Invention
The technical problem to be solved by the embodiments of the present invention is to provide a data caching method and apparatus, so that the device can update the data in the buffer while the data in the buffer is being read for motion estimation, thereby shortening the waiting time for reading data, speeding up motion estimation, and effectively saving a large amount of read bandwidth.
A first aspect of an embodiment of the present invention provides a data caching method, which may include:
when motion estimation is performed on a current block according to partial reference frame data cached in a cache region, monitoring whether the search window position of the current block has changed compared with the search window position of the previous current block, wherein the partial reference frame data comprises W rows of reference frame data, the size of the current block is M × N pixels, M being the number of rows and N the number of columns, W ≥ 3M, and M, N, and W are all natural numbers;
when a change in the search window position of the current block is detected, determining, in the cache region, the first M rows of data in the at least one column of partial reference frame data shifted out of the search window of the current block relative to the search window of the previous current block, and marking the first M rows of data;
when the size of the marked first M rows of data is detected to meet a preset condition, reading new M rows of data from the reference frame data stored in an external memory, and storing the new M rows of data in the RAM in place of the marked first M rows of data in the cache region, wherein the size of the new M rows of data is the same as that of the marked first M rows of data.
A second aspect of the present invention provides a data caching apparatus, which may include:
a first monitoring unit, configured to monitor, when motion estimation is performed on a current block according to partial reference frame data cached in a cache region, whether the search window position of the current block has changed compared with the search window position of the previous current block, wherein the partial reference frame data comprises W rows of reference frame data, the size of the current block is M × N pixels, W ≥ 3M, and W, M, and N are all natural numbers;
a first determining unit, configured to determine, in the cache region, when the first monitoring unit detects that the search window position of the current block has changed, the first M rows of data in the at least one column of partial reference frame data shifted out of the search window of the current block relative to the search window of the previous current block, and to mark the first M rows of data;
and an updating unit, configured to read new M rows of data from the reference frame data stored in an external memory when the size of the marked first M rows of data is detected to meet a preset condition, and to store the new M rows of data in the RAM in place of the marked first M rows of data in the cache region, wherein the size of the new M rows of data is the same as that of the marked first M rows of data.
In the embodiment of the present invention, when motion estimation is performed on a current block according to the partial reference frame data cached in the cache region, whether the search window position of the current block has changed compared with the search window position of the previous current block is monitored. When a change in the search window position of the current block is detected, the first M rows of data in the at least one column of partial reference frame data shifted out of the search window of the current block relative to the search window of the previous current block are determined in the cache region and marked. When the size of the marked first M rows of data is detected to meet a preset condition, new M rows of data are read from the reference frame data stored in an external memory and stored in the RAM in place of the marked first M rows of data in the cache region. In this way, the apparatus can update the data in the buffer while the data in the buffer is being read for motion estimation, which shortens the waiting time for reading data, speeds up motion estimation, and effectively saves a large amount of read bandwidth.
Drawings
In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly introduced below. It is obvious that the drawings in the following description are only some embodiments of the present invention, and that those skilled in the art can derive other drawings from them without creative effort.
Fig. 1 is a schematic flowchart of a data caching method according to an embodiment of the present invention;
fig. 2 is a schematic structural diagram of an FPGA device according to an embodiment of the present invention;
fig. 3 is a schematic diagram illustrating a position of a search window in a part of current frame data according to an embodiment of the present invention;
FIG. 4 is a schematic diagram illustrating the position of another search window in a portion of the current frame data according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of a marked first M rows of data according to an embodiment of the present invention;
FIG. 6 is a diagram illustrating reference frame data stored in a DDR according to an embodiment of the present invention;
fig. 7 is a schematic flowchart of another data caching method according to an embodiment of the present invention;
FIG. 8 is a schematic diagram of space division of a RAM according to an embodiment of the present invention;
FIG. 9 is a schematic diagram of a memory mapping between reference frame data and a RAM according to an embodiment of the present invention;
fig. 10 is a flowchart illustrating a further data caching method according to an embodiment of the present invention;
FIG. 11 is a diagram illustrating a location of a target reference block in reference frame data according to an embodiment of the present invention;
fig. 12 is a schematic structural diagram of a data caching apparatus according to an embodiment of the present invention;
fig. 13 is a schematic structural diagram of another data caching apparatus according to an embodiment of the present invention;
fig. 14 is a schematic structural diagram of a terminal according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The data caching method provided by the embodiment of the invention can be applied to scenarios in which a field programmable gate array (FPGA) caches data.
A data caching method according to an embodiment of the present invention will be described in detail with reference to fig. 1 to 11. The execution body of the embodiment may be a data caching device, specifically an FPGA (Field Programmable Gate Array) device; the following embodiments take the FPGA device as the execution body.
In order to better describe the embodiments of the present invention, the following describes the principle of the motion estimation algorithm related to the present invention in detail.
At present, a processor generally needs to encode video data stored in a memory. Video encoding methods include intra-frame compression and inter-frame compression, and one of the key technologies of inter-frame compression is motion estimation: inter-frame compression through motion estimation can remove the redundancy between adjacent frames and improve the compression ratio of a video image. Generally, the content of adjacent frames varies very little and is highly correlated; this correlation is known as temporal redundancy. The purpose of motion estimation is to find this temporal correlation and help the encoder remove it as far as possible.
In video encoding, a frame undergoing motion estimation is referred to as current frame data or a target frame, and a frame used as a reference during motion estimation of the current frame data is referred to as reference frame data. The current picture may be divided into several data blocks of the same size, and the data block of the current frame data undergoing motion estimation may be referred to as the current block. Motion estimation is typically performed in units of data blocks or sub-blocks, and each data block may consist of 16 × 16 pixels of data. The most important step of motion estimation is search matching. Taking one current block as an example, the purpose of search matching is to find, in the reference frame data, the reference block of the same size as the current block whose luma component is most similar to that of the current block. The most similar reference block is called the optimal matching block, and its position is the optimal matching position. The position difference between the current block and the matching block is then used as the motion vector of the current block, and the pixel difference between them as the residual block; the motion vector and the residual block together form the search matching result of the current block.
In order to improve the efficiency of motion estimation, the matching block is usually searched within a fixed range of the reference frame data. This search range is related to the position of the current block in the image and may be called the search window; the reference frame data within the search window is the search window data. When the current block is search-matched, both the current block data and the search window data participate in motion estimation.
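For illustration, the search-matching step can be written as a short C sketch. This is not part of the patent: the 16 × 16 block size, the sum-of-absolute-differences (SAD) cost, the exhaustive (-16, +15) search range, and all function names are assumptions chosen to match the examples used later in this description.

```c
#include <stdint.h>
#include <stdlib.h>

#define BLK 16  /* assumed 16x16 block, 1 byte per pixel */

/* Sum of absolute differences between the current block and one candidate
 * reference block: a common similarity cost for the luma component. */
static uint32_t sad_16x16(const uint8_t *cur, int cur_stride,
                          const uint8_t *ref, int ref_stride)
{
    uint32_t sad = 0;
    for (int y = 0; y < BLK; y++)
        for (int x = 0; x < BLK; x++)
            sad += (uint32_t)abs(cur[y * cur_stride + x] -
                                 ref[y * ref_stride + x]);
    return sad;
}

/* Exhaustive search over a (-16, +15) window around block position (bx, by);
 * the winning displacement (dx, dy) is the motion vector.  The caller must
 * ensure the whole window lies inside the reference frame. */
static void full_search(const uint8_t *cur, const uint8_t *ref, int stride,
                        int bx, int by, int *mv_x, int *mv_y)
{
    uint32_t best = UINT32_MAX;
    for (int dy = -16; dy <= 15; dy++)
        for (int dx = -16; dx <= 15; dx++) {
            uint32_t cost = sad_16x16(cur + by * stride + bx, stride,
                                      ref + (by + dy) * stride + (bx + dx),
                                      stride);
            if (cost < best) { best = cost; *mv_x = dx; *mv_y = dy; }
        }
}
```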
In the following embodiments, a row (or rows) of reference frame data refers to one or more rows of pixel data in the reference frame, and a row (or rows) of data refers to one or more rows of pixel data in the image to which the data corresponds. Similarly, one or more columns of reference frame data refer to one or more columns of pixel data in the reference frame, and one or more columns of data refer to one or more columns of pixel data in the corresponding image.
Fig. 1 is a schematic flow chart of a data caching method according to a first embodiment of the present invention. The data caching method of the embodiment of the invention comprises the following steps:
S100, when motion estimation is performed on the current block according to the partial reference frame data cached in the cache region, monitoring whether the search window position of the current block has changed compared with the search window position of the previous current block.
In the embodiment of the present invention, the structure of the FPGA device may be as shown in fig. 2. The FPGA device may be connected through a PCI (Peripheral Component Interconnect) bus to an external memory on a motherboard, and the motherboard is connected to a central processing unit (CPU) through the bus. When the FPGA device needs to perform motion estimation, it can fetch the current frame data and the reference frame data stored in the external memory of the motherboard and store them in the external memory of the FPGA device. Further, the FPGA device may buffer the current frame data and the reference frame data into buffer area 1 and buffer area 2, respectively, and control the motion estimation module to read data from buffer area 1 and buffer area 2 for motion estimation. The following embodiments mainly describe in detail how the FPGA device reads part of the reference frame data stored in the external memory and caches it in buffer area 1.
In the embodiment of the present invention, the cache region may specifically be a random access memory (RAM), and the following embodiments take the RAM as the cache region. Buffering part of the reference frame data in the RAM means that one portion of the reference frame data is buffered in the RAM. When the size of the current block is M × N pixels, M being the number of rows and N the number of columns, the partial reference frame data may include W rows of reference frame data, where W ≥ 3M and W, M, and N are all natural numbers. The current block may be 16 × 16 pixels, or another size such as 32 × 32, 8 × 8, or 4 × 4 pixels, which is not limited here. The FPGA device can configure W flexibly according to the resource limits of the RAM. Preferably, W may be 3M; that is, when the current block is 16 × 16 pixels, W may be 48, so the configured RAM of the FPGA can buffer 48 rows of partial reference frame data.
In the embodiment of the present invention, when the FPGA device performs motion estimation on the data block of the current frame data according to the reference frame data, the FPGA device may determine the search window of the current block in a part of the reference frame data cached in the RAM, and perform motion estimation operation on the current block in the determined search window of the current block according to a motion estimation algorithm.
In the embodiment of the present invention, when the current block undergoing motion estimation changes, the FPGA device may re-determine the search window of the current block in the partial reference frame data buffered in the RAM, and in doing so may monitor whether the search window position of the current block has changed compared with the search window position of the previous current block. As shown in fig. 3, the search window may be narrower than the current frame data (i.e., the number of columns of the search window is smaller than the number of columns of the current frame data). Current block 1 is the current block before current block 2. When the FPGA device performs motion estimation on current block 1, the determined search window may be the solid-line search window 1000; when it performs motion estimation on current block 2, the determined search window may be the dashed-line search window 2000. Since search window 1000 and search window 2000 do not coincide, it can be determined that the search window position of the current block has changed compared with that of the previous current block. Further, referring to the schematic diagram of the position of the search window shown in fig. 4, where the search window has the same width as the current frame data (i.e., the number of columns of the search window equals the number of columns of the current frame data), current block 3 is the current block before current block 4. When the FPGA device performs motion estimation on current block 3, the determined search window may be the solid-line search window 3000; when it performs motion estimation on current block 4, the determined search window may still be search window 3000. Since the search windows of current block 3 and current block 4 coincide, it can be determined that the search window position of the current block has not changed.
S101, when a change in the search window position of the current block is detected, determining, in the RAM, the first M rows of data in the at least one column of partial reference frame data shifted out of the search window of the current block relative to the search window of the previous current block, and marking the first M rows of data.
In the embodiment of the present invention, the search window of the current block shifts out of at least one column of partial reference frame data compared with the search window of the previous current block. Referring to fig. 3, when the search window moves from search window 1000 to search window 2000, at least one column of partial reference frame data on the left side of search window 2000 is shifted out. The FPGA device may take the M rows of data starting from the first row in that shifted-out column (or columns), i.e., the same number of rows as the current block. Of course, during motion estimation the FPGA device may also determine the search window from right to left in the partial reference frame data in the RAM, or from top to bottom, or from bottom to top, and so on, which is not repeated here.
In the embodiment of the present invention, when the FPGA device has determined the first M rows of data in the shifted-out column (or columns) of partial reference frame data, it may mark those rows. The marked first M rows of data are thereby identified as aged data, i.e., data that will not participate in subsequent motion estimation. Therefore, once motion estimation of the previous current block is complete and the search window position of the new current block has been determined, the FPGA device can identify which of the partial reference frame data stored in the RAM has aged after the motion estimation of the previous current block.
In a specific application, referring to the schematic diagram of the marked first M rows of data shown in fig. 5, when the current block undergoing motion estimation changes from current block 1 to current block 2, the first M rows of data in the shifted-out column of partial reference frame data determined by the FPGA device may be the shaded portion of fig. 5. The FPGA device can mark, e.g., tag, the shaded data, so that it can later recognize the shaded data as aged data from its tag. Further, as the FPGA device keeps fetching new current blocks for motion estimation, it keeps determining the first M rows of data in the shifted-out columns, so several groups of marked first M rows of data may accumulate.
S102, when the size of the marked first M rows of data is detected to meet the preset condition, reading new M rows of data from the reference frame data stored in the external memory, and storing the new M rows of data in the RAM in place of the marked first M rows of data.
In the embodiment of the invention, the FPGA device can monitor whether the size of the marked first M rows of data meets the preset condition. For example, the FPGA device may judge whether the number of groups of marked first M rows of data is greater than a preset threshold, and if so, determine that the size of the marked first M rows of data meets the preset condition. The preset threshold can be set freely, for example to a natural number such as 1, 2, or 3. Preferably, the FPGA device may judge whether the number of groups of marked first M rows of data is greater than 1, or judge whether the marked first M rows of data form the first M rows of the cached reference frame data.
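A minimal C sketch of this bookkeeping follows. The group count and threshold are illustrative placeholders; the patent leaves both configurable.

```c
#include <stdbool.h>

#define NUM_GROUPS     16  /* assumed number of M-row groups tracked in the cache */
#define AGED_THRESHOLD 1   /* preset condition: more than this many marked groups */

static bool aged[NUM_GROUPS];           /* one mark per M-row group */

static void mark_aged(int group)        /* step S101: mark shifted-out rows */
{
    aged[group] = true;
}

static bool preset_condition_met(void)  /* trigger for the refill in step S102 */
{
    int marked = 0;
    for (int g = 0; g < NUM_GROUPS; g++)
        if (aged[g])
            marked++;
    return marked > AGED_THRESHOLD;
}
```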
In the embodiment of the invention, when the FPGA device detects that the size of the marked first M rows of data meets the preset condition, it can decide to overwrite the aged data. The FPGA device therefore reads new M rows of data from the reference frame data stored in the external memory and stores them in the RAM in place of the marked first M rows of data.
In the embodiment of the present invention, the external memory may be a DDR (double data rate synchronous dynamic random access memory), and the following embodiments take the DDR as the external memory. The FPGA device can read part of the reference frame data from the DDR and store it in the RAM for buffering. Accordingly, the FPGA device can sequentially fetch new M rows of data from the reference frame data stored in the DDR, based on the partial reference frame data already in the RAM, and store the new M rows in place of the marked first M rows so as to update the partial reference frame data in the RAM.
In a specific application, referring to the schematic diagram of reference frame data stored in the DDR shown in fig. 6, the FPGA device may first load reference frame data 0 to reference frame data 15 into the RAM. While performing motion estimation on the partial reference frame data in the RAM, the FPGA device can simultaneously determine the aged data in the RAM. When the aged data in the RAM meets the preset condition, the FPGA device can read the next part of the reference frame data from the DDR. For example, if the FPGA device sets the preset condition to one group of aged data in the RAM, then when reference frame data 0 becomes aged, the FPGA device may read reference frame data 16 from the DDR and store it at the position of reference frame data 0, thereby updating the aged data. As another example, if the FPGA device sets the preset condition to 5 groups of aged data, then when reference frame data 0, 3, 6, 9, and 12 have all aged, the FPGA device may read reference frame data 16, 17, 18, 19, and 20 from the DDR and store them at the positions of reference frame data 0, 3, 6, 9, and 12, respectively, so as to update the aged data.
In the embodiment of the invention, the FPGA device can determine the aged data in the RAM while performing motion estimation, and store new M rows of data at the positions of the aged data. Therefore, as the FPGA device processes the current blocks of the current frame data from top to bottom, it can continuously read new M rows of data from the reference frame data stored in the DDR and write them into the RAM to replace the aged data, which effectively improves cache efficiency and minimizes the data loading bandwidth.
In a specific application, take a current block of 16 × 16 pixels as an example, and assume that the search window of each current block spans at most 48 rows of reference frame data in the vertical direction (it may also span fewer rows) and that the RAM can store the 48 rows of reference frame data. Preferably, in the embodiment of the present invention, the search window is 48 rows of reference frame data. When the FPGA device finishes motion estimation for one row of data blocks of the current frame data, the search window moves down and 16 rows of reference frame data are shifted out of it. Since those 16 rows have aged and will not be used in subsequent operations, the FPGA device reads the next 16 rows of reference frame data stored in the DDR to overwrite the 16 rows shifted out of the search window. The next 16 rows are thus stored in the vacated space in wrap-around fashion, and since the current block is 16 × 16 pixels, the FPGA device can control the RAM to perform wrap-around storage refresh in units of 16 rows. Of course, when the search window moves left or right and the aged data is updated, the FPGA device can control the RAM to perform wrap-around storage refresh in a preset unit in the same way, only with a different unit size.
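The wrap-around refresh can be sketched as follows, assuming a 48-row buffer, 16-row groups, and a 1920-pixel frame width; ddr_read_rows() is a hypothetical helper standing in for the DDR read logic.

```c
#include <stdint.h>

#define M_ROWS 16                 /* rows shifted out per row of blocks */
#define W_ROWS 48                 /* reference rows held in the buffer */
#define WIDTH  1920               /* assumed frame width in pixels, 1 byte each */

extern void ddr_read_rows(int first_row, int count, uint8_t *dst);

static uint8_t buffer[W_ROWS][WIDTH];
static int next_row = W_ROWS;     /* next reference row to fetch from DDR */

/* Overwrite an aged 16-row group in place: reference row i always lives in
 * buffer row i mod W_ROWS, so the new rows land exactly where the aged ones
 * were and no other cached data moves. */
static void refresh_group(int aged_group)
{
    ddr_read_rows(next_row, M_ROWS, &buffer[aged_group * M_ROWS][0]);
    next_row += M_ROWS;
}
```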
In the embodiment of the invention, the bandwidth consumed by loading reference frame data from the DDR into the RAM depends on the size of the reference frame data and on the number of times data at the same DDR address is loaded. Since the size of the reference frame data is the same across different cache designs, the number of times the same DDR address is loaded becomes the index for measuring bandwidth.
In the embodiment of the present invention, when motion estimation is performed on a current block according to the partial reference frame data cached in the cache region, whether the search window position of the current block has changed compared with the search window position of the previous current block is monitored. When a change in the search window position of the current block is detected, the first M rows of data in the at least one column of partial reference frame data shifted out of the search window of the current block relative to the search window of the previous current block are determined in the RAM and marked. When the size of the marked first M rows of data is detected to meet a preset condition, new M rows of data are read from the reference frame data stored in an external memory and stored in the RAM in place of the marked first M rows of data in the cache region. In this way, the apparatus can update the data in the buffer while the data in the buffer is being read for motion estimation, which shortens the waiting time for reading data, speeds up motion estimation, and effectively saves a large amount of read bandwidth.
Fig. 7 is a schematic flow chart of a data caching method according to a second embodiment of the present invention. The data caching method of the embodiment of the invention comprises the following steps:
S200, configuring N RAMs.
In the embodiment of the invention, the FPGA device can be configured with N RAMs, where N > 1, and can read from the N configured RAMs in parallel, which obviously increases the reading efficiency and processing speed of the FPGA device. The FPGA device can configure both the number of RAMs and their depth, which increases configuration flexibility: different search windows and image resolutions can be supported by scaling as needed, so different applications can be extended quickly. Each RAM has the same storage space size, configured according to the storage space required by the W rows of reference frame data. In a specific application, for example, when W is 48 and the current block is 16 × 16, the search window spans 48 rows of data vertically. With the RAM bit width set to 128 bits (16 bytes) and a reference frame width of 1920 pixels (assuming 1 byte per pixel), the RAM depth is (48/M) × (1920/M), where M is the number of pixel rows of the current block. When M is 16, if reference frame data of 1080P resolution with a search range of (-16, +15) needs to be buffered, each RAM depth may be (48/16) × (1920/16) = 360; that is, only sixteen single-port 360 × 128-bit RAMs are needed to buffer reference frame data of 1080P resolution with a search range of (-16, +15).
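The sizing arithmetic above can be written out as a small check (constants follow the 1080P example in the text; nothing here is FPGA-specific):

```c
#include <stdio.h>

int main(void)
{
    const int W = 48;               /* buffered reference rows */
    const int M = 16;               /* pixel rows per current block */
    const int width = 1920;         /* frame width, 1 byte per pixel */
    const int word_bytes = 16;      /* 128-bit RAM word */

    int rams  = M;                  /* one RAM per (row number mod 16) */
    int depth = (W / M) * (width / word_bytes);  /* rows per RAM x words per row */

    printf("%d single-port RAMs, each %d x 128 bit\n", rams, depth);
    /* prints: 16 single-port RAMs, each 360 x 128 bit */
    return 0;
}
```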
In the embodiment of the invention, to simplify RAM addressing during cache updates, the FPGA device can number the configured N RAMs sequentially. For example, when the FPGA device is configured with 16 RAMs, it can number them the first RAM, the second RAM, and so on up to the sixteenth RAM. In a specific application, the N RAMs may be numbered SP RAM0, SP RAM1, ..., SP RAM15.
In the embodiment of the invention, the FPGA device can also divide each RAM into F storage spaces in order of consecutive addresses, and number the F storage spaces sequentially. As shown in fig. 8, the addresses of a RAM run consecutively from top to bottom; the FPGA device may divide the RAM into 3 storage spaces from top to bottom and number them the first, second, and third storage space. Each storage space is the size of one row of reference frame data and is used to store one row of reference frame data.
S201, sequentially reading, row by row, M rows of the reference frame data stored in the external memory, and writing the M rows of reference frame data into the N RAMs for caching.
In the embodiment of the present invention, the FPGA device may read M rows of the reference frame data stored in the external memory row by row, from top to bottom, and write the M rows into the N RAMs. Specifically, the FPGA device stores the first through last rows of the M rows into the first through Nth RAMs in the order of the rows. For example, when the N RAMs are numbered SP RAM0, SP RAM1, ..., SP RAMN, the FPGA device may store the first of the M rows into SP RAM0 and the second into SP RAM1; when M is greater than N, the (N+1)th row wraps around and is stored into SP RAM0. Therefore, when the ith row of the reference frame data is stored into the jth RAM, the relationship is: j = i mod(N). According to this correspondence, the FPGA device can determine the jth RAM into which the ith row of reference frame data is to be stored, and store that row into the jth RAM in sequential address order.
In the embodiment of the present invention, when the FPGA device stores the ith row of reference frame data into the jth RAM, if reference frame data is already stored in the first storage space of that RAM, the FPGA device can acquire the next storage space, i.e., the second storage space, in sequential address order and store the ith row there. It can be seen that a RAM with F storage spaces can store F rows of reference frame data in wrap-around fashion. Therefore, the relationship for storing the ith row of reference frame data into the kth storage space of a RAM is: k = i mod(F).
In a specific application, for example, when the current block is 16 × 16 pixels and the search window of each current block spans 48 rows of data vertically, the FPGA device may design the RAMs so that at most 48 rows of reference frame data can be buffered in total. The FPGA device can be configured with 16 RAMs, each storing 3 rows of data. The memory mapping of reference frame data to RAM shown in fig. 9 takes a 1080P (1920 × 1088) image as an example. The left half of the figure shows the reference frame data: rows 0-1087 represent 1088 rows of reference frame data, divided into 1088/16 = 68 units of 16 rows each. Rows with the same index (row number mod 16) are stored in the same RAM, and each RAM is divided into three spaces at consecutive addresses, each of which stores one row of reference frame data contiguously. Thus rows 0, 16, 32, ... are stored in RAM0 (as shown by the dashed lines), rows 1, 17, 33, ... in RAM1, and so on, until rows 15, 31, 47, ... are stored in RAM15 (as shown by the solid lines).
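The fig. 9 layout reduces to two modulo operations, sketched here as a pure function (N = 16 RAMs and F = 3 spaces as in the example above; the struct and names are illustrative):

```c
#define N_RAMS   16
#define F_SPACES 3

typedef struct { int ram; int space; } row_slot;

/* Map reference-frame row i to its physical location: RAM index
 * j = i mod N, storage-space index k = i mod F. */
static row_slot map_row(int i)
{
    row_slot s = { i % N_RAMS, i % F_SPACES };
    return s;
}
/* map_row(0)  -> RAM0,  space 0;   map_row(16) -> RAM0,  space 1;
 * map_row(32) -> RAM0,  space 2;   map_row(15) -> RAM15, space 0. */
```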
Therefore, in the embodiment of the invention, by mapping the reference frame data into different RAMs, multiple rows of a reference block at any position in the search window can be read simultaneously, and the required reference block data can be streamed out in one cycle, which improves the data reading efficiency of the device.
S202, monitoring whether the N RAMs are full of data or not.
The FPGA device can monitor the remaining space of each configured RAM and judge, from the monitored remaining space, whether each RAM is full of data.
S203, when the N RAMs are determined to be full of data, the reading of the reference frame data in the memory is stopped.
In the embodiment of the invention, when the remaining space of every RAM is monitored to be 0, the FPGA device can determine that the N RAMs are full of data. The FPGA device then stops reading the reference frame data in the DDR, which prevents further writes from overwriting the data already in the full RAMs and thus improves the safety of the data in the RAMs.
S204, when motion estimation is performed on the current block according to the partial reference frame data cached in the cache region, monitoring whether the search window position of the current block has changed compared with the search window position of the previous current block.
S205, when a change in the search window position of the current block is detected, determining, in the cache region, the first M rows of data in the at least one column of partial reference frame data shifted out of the search window of the current block relative to the search window of the previous current block, and marking the first M rows of data.
S206, when the size of the marked first M rows of data is detected to meet the preset condition, reading new M rows of data from the reference frame data stored in the external memory, and storing the new M rows of data in the RAM in place of the marked first M rows of data in the cache region.
In the embodiment of the present invention, the specific implementation manners of step S204, step S205, and step S206 may refer to the above-mentioned embodiment, which is not described herein again.
In the embodiment of the present invention, when motion estimation is performed on a current block according to the partial reference frame data cached in the cache region, whether the search window position of the current block has changed compared with the search window position of the previous current block is monitored. When a change in the search window position of the current block is detected, the first M rows of data in the at least one column of partial reference frame data shifted out of the search window of the current block relative to the search window of the previous current block are determined in the cache region and marked. When the size of the marked first M rows of data is detected to meet a preset condition, new M rows of data are read from the reference frame data stored in an external memory and stored in the RAM in place of the marked first M rows of data in the cache region. In this way, the apparatus can update the data in the buffer while the data in the buffer is being read for motion estimation, which shortens the waiting time for reading data, speeds up motion estimation, and effectively saves a large amount of read bandwidth.
Fig. 10 is a schematic flow chart of a data caching method according to a third embodiment of the present invention. The data caching method of the embodiment of the invention comprises the following steps:
S300, when a target reference block needs to be read from the N RAMs for motion estimation of the current block, obtaining the coordinates of the target reference block in the reference frame data, the coordinates comprising an abscissa and an ordinate.
In the embodiment of the present invention, the coordinates of the target reference block in the reference frame data may be its abscissa and ordinate in the reference frame corresponding to the reference frame data, i.e., in the reference frame image. The abscissa may be the distance from the top-left pixel of the target reference block to the left boundary of the reference frame, and the ordinate the distance from a given row of data in the target reference block to the top boundary of the reference frame, where the distances are expressed in columns and rows respectively.
S301, according to the vertical coordinate of the target reference block, determining the addressing of the RAMs corresponding to each line of data in the target reference block.
From the above embodiments, the relationship for storing the ith pixel row of the reference frame into the jth RAM is: j = i mod(N), where N is the number of RAMs configured by the FPGA device. Combining this relationship with the ordinate of the target reference block, the FPGA device can determine the RAM address of each row of data in the target reference block, and hence the corresponding RAM.
Specifically, referring to the schematic diagram of the position of the target reference block in the reference frame data shown in fig. 11, the part of the reference frame stored in the RAM may be search window 1, and the vertical dotted lines are the 16-pixel-aligned positions inside the reference frame data. Because the target reference block may be at any position in the search window, any pixel row of the target reference block in fig. 11 may cross a 16-pixel boundary. Since a reference block may be 16 × 16 pixels and the RAM bit width is 16 bytes, any one row of data of the target reference block in fig. 11 may be stored at 2 consecutive addresses in the RAM. When the FPGA device is configured with 16 RAMs and the ordinate of a given row of data in the target reference block is 25, then from j = i mod(N) we obtain 25 mod 16 = 9, so the FPGA device stores the row of data with ordinate 25 in the 9th RAM.
S302, according to the ordinate of the target reference block, the addressing of the storage space corresponding to each row of data in the target reference block is determined.
From the above embodiments, the relationship for storing the ith row of reference frame data into the kth storage space of a RAM is: k = i mod(F), where F is the number of storage spaces into which the FPGA device divides each RAM. Combining this relationship with the ordinate of the target reference block, the FPGA device can determine the storage-space address of each row of data in the target reference block, and hence the corresponding storage space.
Specifically, referring to the schematic diagram of the position of the target reference block in the reference frame data shown in fig. 11, when each RAM is divided into 3 storage spaces and the ordinate of a given row of data in the target reference block is 25 (i.e., the row is at row 25 of the reference frame data), then from k = i mod(F) we obtain 25 mod 3 = 1, so the FPGA device stores the row of data with ordinate 25 in the 1st storage space of the 9th RAM.
S303, determining the offset address corresponding to each row of data in the target reference block according to the abscissa of the target reference block.
In the embodiment of the invention, the offset address within a storage space is obtained by dividing the abscissa by the RAM bit width and rounding down: if the abscissa is X and the bit width is Y, the offset address is Z = floor(X/Y). As shown in fig. 11, the specific location, in the first storage space, of the row of data with ordinate 25 and abscissa 20 in the target reference block is obtained as follows: the RAM bit width is 16 bytes and data is stored contiguously with 16-pixel alignment, so the abscissa of the reference block gives the offset address within the 1st storage space of the RAM, namely Z = floor(20/16) = 1. Hence the row of data with ordinate 25 and abscissa 20 in the target reference block starts at address 1 of the 1st storage space in RAM9.
And S304, combining the addressing of the RAMs corresponding to the data in each row in the target reference block, the addressing of the storage space corresponding to the data in each row in the target reference block and the offset addresses corresponding to the data in each row in the target reference block, to obtain the storage addresses of the data in each row in the target reference block.
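Steps S301 to S304 can be combined into one decode function. The constants follow the worked example (16 RAMs, 3 spaces per RAM, 128-bit words, 1920-pixel rows), and the struct and function names are illustrative:

```c
#define ROW_WORDS (1920 / 16)     /* 128-bit words occupied by one stored row */

typedef struct { int ram; int addr; } ram_addr;

/* Decode the storage address of one reference-block row from its frame
 * coordinates (x, y). */
static ram_addr decode_row(int x, int y)
{
    int j = y % 16;               /* S301: RAM index, j = y mod N */
    int k = y % 3;                /* S302: storage space, k = y mod F */
    int z = x / 16;               /* S303: word offset, z = floor(x / 16) */
    ram_addr a = { j, k * ROW_WORDS + z };   /* S304: combined address */
    return a;
}
/* decode_row(20, 25) gives RAM9, space 1, offset 1, matching the example
 * above.  Because a 16-pixel row may straddle a 16-byte boundary, words z
 * and z + 1 are both read to cover one row. */
```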
S305, reading the data of each line of the target reference block in parallel according to the storage address of the data of each line in the target reference block to obtain the target reference block.
In the embodiment of the invention, the FPGA device can read data in parallel from the N RAMs at the offset addresses of the corresponding storage spaces in the corresponding RAMs, so that multiple rows of data, or all the data, of the target reference block can be read in the same clock cycle; once the target reference block has been read completely, the FPGA device has obtained the target reference block.
In the embodiment of the invention, when the FPGA device needs to read from and write to a RAM at the same time, it can arbitrate. When the FPGA device receives a read operation instruction and a write operation instruction for the RAM simultaneously, it judges whether the RAM can perform both operations at once; if not, it controls the RAM to perform the read operation. This guarantees data correctness when reads and writes collide in the FPGA device, and the write is performed when the RAM is idle.
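This read-priority policy can be sketched as a small arbiter (a single-port RAM is assumed, and the names are illustrative):

```c
#include <stdbool.h>

typedef enum { GRANT_NONE, GRANT_READ, GRANT_WRITE } grant_t;

/* On a collision the read wins, so motion-estimation reads never stall;
 * the deferred cache-refresh write is retried on an idle cycle. */
static grant_t arbitrate(bool rd_req, bool wr_req)
{
    if (rd_req)
        return GRANT_READ;
    if (wr_req)
        return GRANT_WRITE;
    return GRANT_NONE;
}
```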
S306, when motion estimation is performed on the current block according to the partial reference frame data cached in the cache region, monitoring whether the search window position of the current block has changed compared with the search window position of the previous current block.
S307, when a change in the search window position of the current block is detected, determining, in the cache region, the first M rows of data in the at least one column of partial reference frame data shifted out of the search window of the current block relative to the search window of the previous current block, and marking the first M rows of data.
S308, when the size of the marked first M rows of data is detected to meet the preset condition, reading new M rows of data from the reference frame data stored in the external memory, and storing the new M rows of data in the RAM in place of the marked first M rows of data in the cache region.
In the embodiment of the present invention, the specific implementation manners of step S306, step S307, and step S308 may refer to the above-mentioned embodiment, which is not described herein again.
In the embodiment of the invention, the device maps the search window into different RAMs, so read/write address decoding is simple, multiple rows of any reference block in the search window can be read simultaneously, and the required reference block data can be pipelined out in as little as one cycle, which improves the data reading efficiency of the device.
In the embodiment of the present invention, when motion estimation is performed on a current block according to the partial reference frame data cached in the cache region, whether the search window position of the current block has changed compared with the search window position of the previous current block is monitored. When a change in the search window position of the current block is detected, the first M rows of data in the at least one column of partial reference frame data shifted out of the search window of the current block relative to the search window of the previous current block are determined in the cache region and marked. When the size of the marked first M rows of data is detected to meet a preset condition, new M rows of data are read from the reference frame data stored in an external memory and stored in the RAM in place of the marked first M rows of data in the cache region. In this way, the apparatus can update the data in the buffer while the data in the buffer is being read for motion estimation, which shortens the waiting time for reading data, speeds up motion estimation, and effectively saves a large amount of read bandwidth.
The data caching apparatus according to the present invention will be described in detail with reference to fig. 12 to 13. It should be noted that, the data caching apparatuses shown in fig. 12 to 13 are used for executing the method according to the embodiments of the present invention shown in fig. 1 to 11, for convenience of description, only the portions related to the embodiments of the present invention are shown, and details of the specific technology are not disclosed, please refer to the embodiments of the present invention shown in fig. 1 to 11.
In the embodiment of the present invention, the described data caching device may specifically be an FPGA device, and the following description will take the FPGA device as an example.
Fig. 12 is a structural diagram of a data caching apparatus according to an embodiment of the present invention. The device described in the embodiments of the present invention includes:
a first monitoring unit 100, configured to monitor whether a search window position of a current block is changed from a search window position of a previous current block when motion estimation is performed on the current block according to a portion of reference frame data buffered in a buffer.
A first determining unit 200, configured to determine, in the buffer, the previous M rows of data in at least one column of partial reference frame data from which the search window of the current block is shifted out compared with the search window of the previous current block when the first monitoring unit monitors that the position of the search window of the current block changes, and mark the previous M rows of data.
An updating unit 300, configured to, when it is detected that the size of the marked previous M rows of data meets a preset condition, read new M rows of data from reference frame data stored in an external memory, and store the new M rows of data in the RAM in place of the marked previous M rows of data in the buffer.
It can be understood that the functions of the functional units of the FPGA device in this embodiment may be implemented according to the methods in the foregoing method embodiments; for the specific implementation process, refer to the related description of the foregoing method embodiments, which is not repeated here.
In this embodiment of the present invention, when motion estimation is performed on a current block according to the partial reference frame data cached in the cache region, it is monitored whether the search window position of the current block has changed relative to the search window position of the previous current block. When a change in the search window position of the current block is monitored, the previous M rows of data in the at least one column of partial reference frame data shifted out of the search window of the current block relative to the search window of the previous current block are determined in the cache region and marked. When it is detected that the size of the marked previous M rows of data meets the preset condition, new M rows of data are read from the reference frame data stored in the external memory and stored in the RAM in place of the marked previous M rows of data in the cache region. The apparatus can thus update the data in the buffer while reading data from the cache region for motion estimation, which shortens the waiting time for reading data, accelerates motion estimation, and effectively saves a large amount of read bandwidth.
Fig. 13 is a structural diagram of another data caching apparatus according to an embodiment of the present invention. The device described in this embodiment of the present invention includes:
a first monitoring unit 100, a first determining unit 200 and an updating unit 300.
The apparatus further comprises:
a configuration unit 400 for configuring N RAMs, where N > 1;
a reading and writing unit 500, configured to sequentially read, line by line, M lines of reference frame data of the reference frame data stored in the external memory according to a row-column sequence, and write the M lines of reference frame data into the N RAMs for caching respectively;
a second monitoring unit 600, configured to monitor whether the N RAMs have been full of data;
a reading stop unit 700, configured to stop reading the reference frame data in the memory when the second monitoring unit determines that the N RAMs are full of data.
Wherein the configuration unit 400 comprises:
a configuration subunit 10, configured to configure a storage space of each RAM, where the storage space of each RAM is the same size, and the total storage space of the N RAMs is the same size as the W rows of reference frame data;
a first addressing subunit 20 for sequentially addressing the N RAMs;
and a second addressing subunit 30, configured to divide each RAM into F storage spaces according to the consecutive addresses of the RAM, and sequentially address the F storage spaces in the order of those consecutive addresses, where each storage space is used to store one row of reference frame data.
The read/write unit 500 is specifically configured to:
and sequentially store the first row of reference frame data through the last row of reference frame data in the M rows of reference frame data into the corresponding first RAM through Nth RAM, in the order in which the rows are arranged.
The read/write unit 500 is further specifically configured to:
when one row of reference frame data is stored into its corresponding RAM, the row of reference frame data is stored into the corresponding storage space of that RAM in the order of the RAM's consecutive addresses, as illustrated in the sketch below.
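As a concrete reading of this addressing scheme, the C sketch below distributes rows cyclically over the N RAMs: frame row r goes to RAM r % N, into storage space (r / N) % F of that RAM, and the bytes of the row occupy consecutive addresses inside that space. The constants and the flat arrays standing in for on-chip block RAMs are assumptions made for illustration only.

    #include <stdint.h>

    #define N_RAMS     8   /* number of RAMs, N > 1 (assumed)            */
    #define F_SPACES   3   /* row-sized storage spaces per RAM (assumed) */
    #define ROW_BYTES 64   /* bytes per row of reference frame data      */

    /* Each RAM holds F spaces of one row each, so the N RAMs together
     * hold W = N * F rows of the reference frame. */
    static uint8_t ram[N_RAMS][F_SPACES][ROW_BYTES];

    /* Addressing: frame row r -> RAM index and space index in that RAM. */
    static int ram_of_row(int r)   { return r % N_RAMS; }
    static int space_of_row(int r) { return (r / N_RAMS) % F_SPACES; }

    /* Write one fetched row into its RAM, filling the storage space at
     * consecutive byte addresses, as the read/write unit does. */
    void write_row(int r, const uint8_t *row_data) {
        uint8_t *space = ram[ram_of_row(r)][space_of_row(r)];
        for (int off = 0; off < ROW_BYTES; off++) /* consecutive addresses */
            space[off] = row_data[off];
    }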
Wherein the updating unit 300 comprises:
the reading subunit 40 is configured to sequentially read, line by line, M rows of data subsequent to the partial reference frame data in the reference frame data stored in the external memory according to a row-column sequence, where the M rows of data are the same as the marked previous M rows of data in size;
and a writing subunit 50, configured to write the M rows of data as the new M rows of data into positions where the marked previous M rows of data cached in the N RAMs are stored, respectively.
Wherein the apparatus further comprises:
a first obtaining unit 600, configured to obtain coordinates of a target reference block in reference frame data when the target reference block needs to be read from the N RAMs to perform motion estimation on a current block, where the coordinates include a horizontal coordinate and a vertical coordinate;
a second determining unit 601, configured to determine, according to the ordinate of the target reference block, addressing of RAMs corresponding to each line of data in the target reference block respectively;
a third determining unit 602, configured to determine, according to the ordinate of the target reference block, addressing of storage spaces corresponding to each line of data in the target reference block;
a fourth determining unit 603, configured to determine, according to the ordinate of the target reference block, offset addresses corresponding to each line of data in the target reference block respectively;
a second obtaining unit 604, configured to obtain storage addresses of data in different rows in the target reference block by combining addressing of RAMs corresponding to the data in different rows in the target reference block, addressing of storage spaces corresponding to the data in different rows in the target reference block, and offset addresses corresponding to the data in different rows in the target reference block;
the reading unit 605 is configured to read, in parallel, each row of data of the target reference block according to the storage address of each row of data in the target reference block, to obtain the target reference block; a sketch of this address computation follows.
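The read path can be pictured with the C sketch below, which reuses the cyclic row mapping above: from the vertical coordinate y of the target reference block, each of its rows y+i yields a RAM index and a storage space index, and because consecutive rows land in different RAMs, all rows of the block can be fetched in parallel in hardware. The byte offset within a row is taken here from the horizontal coordinate x, which is an assumption of this sketch, as are all names and sizes.

    #include <stdint.h>

    #define N_RAMS     8
    #define F_SPACES   3
    #define ROW_BYTES 64
    #define BLK_H      8   /* target block height, at most N_RAMS (assumed) */
    #define BLK_W     16   /* target block width (assumed)                  */

    static uint8_t ram[N_RAMS][F_SPACES][ROW_BYTES];

    /* Gather the target reference block at (x, y). Row y+i is located by
     * RAM index (y+i) % N_RAMS and space index ((y+i) / N_RAMS) % F; the
     * offset of the wanted bytes inside the row comes from x. The caller
     * must guarantee x + BLK_W <= ROW_BYTES and that rows y..y+BLK_H-1
     * are currently cached. Each row comes from a different RAM, so in
     * hardware all BLK_H reads can be issued in the same cycle. */
    void read_reference_block(int x, int y, uint8_t out[BLK_H][BLK_W]) {
        for (int i = 0; i < BLK_H; i++) {         /* parallel in hardware */
            int r     = y + i;
            int bank  = r % N_RAMS;               /* RAM addressing       */
            int space = (r / N_RAMS) % F_SPACES;  /* space addressing     */
            int off   = x;                        /* offset address       */
            for (int j = 0; j < BLK_W; j++)
                out[i][j] = ram[bank][space][off + j];
        }
    }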
Wherein the apparatus further comprises:
a determining unit 700, configured to determine whether the RAM can perform read operation and write operation simultaneously when a read operation instruction and a write operation instruction for the RAM are received simultaneously;
a control unit 800, configured to control the RAM to perform the read operation when it is determined that the RAM cannot perform a read operation and a write operation at the same time; a sketch of this arbitration follows.
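A minimal C sketch of this arbitration, under the assumption that each RAM has a single port: when a read request and a write request arrive in the same cycle, the read is served first and the write is held back, so motion estimation is never stalled by the buffer refill.

    #include <stdbool.h>

    /* One cycle of port arbitration for a single RAM. Returns true if a
     * deferred write must be retried in the next cycle. The dual_port
     * flag mirrors the "can the RAM read and write simultaneously"
     * judgment in the text. */
    bool arbitrate(bool read_req, bool write_req, bool dual_port,
                   void (*do_read)(void), void (*do_write)(void)) {
        if (read_req && write_req && !dual_port) {
            do_read();   /* the read has priority ...   */
            return true; /* ... and the write must wait */
        }
        if (read_req)  do_read();
        if (write_req) do_write();
        return false;
    }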
It can be understood that the functions of the functional units of the FPGA device in this embodiment may be implemented according to the methods in the foregoing method embodiments; for the specific implementation process, refer to the related description of the foregoing method embodiments, which is not repeated here.
In this embodiment of the present invention, when motion estimation is performed on a current block according to the partial reference frame data cached in the cache region, it is monitored whether the search window position of the current block has changed relative to the search window position of the previous current block. When a change in the search window position of the current block is monitored, the previous M rows of data in the at least one column of partial reference frame data shifted out of the search window of the current block relative to the search window of the previous current block are determined in the cache region and marked. When it is detected that the size of the marked previous M rows of data meets the preset condition, new M rows of data are read from the reference frame data stored in the external memory and stored in the RAM in place of the marked previous M rows of data in the cache region. The apparatus can thus update the data in the buffer while reading data from the cache region for motion estimation, which shortens the waiting time for reading data, accelerates motion estimation, and effectively saves a large amount of read bandwidth.
Referring to fig. 14, a schematic structural diagram of a terminal is provided in this embodiment of the present invention. As shown in fig. 14, the terminal 1000 may include: at least one processor 1001 such as a CPU, at least one network interface 1004, a user interface 1003, a memory 1005, and at least one communication bus 1002. The communication bus 1002 is used to implement connection and communication among these components. The user interface 1003 may include a display screen (Display) and a keyboard (Keyboard); optionally, the user interface 1003 may also include a standard wired interface and a standard wireless interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (e.g., a WI-FI interface). The memory 1005 may be a high-speed RAM or a non-volatile memory, such as at least one magnetic disk memory. Optionally, the memory 1005 may also be at least one storage device located remotely from the processor 1001. As shown in fig. 14, the memory 1005, as a computer storage medium, may include an operating system, a network communication module, a user interface module, and a communication connection application program.
In the terminal 1000 shown in fig. 14, the user interface 1003 is mainly used to provide an input interface for a user and to acquire data input by the user; the network interface 1004 is configured to connect with another terminal; and the processor 1001 may be configured to invoke the communication connection application program stored in the memory 1005 and specifically perform the following operations:
when motion estimation is performed on a current block according to partial reference frame data cached in a cache region, monitoring whether the position of the search window of the current block has changed relative to the position of the search window of the previous current block, where the partial reference frame data includes W rows of reference frame data, the size of the current block is M multiplied by N pixels, M is a row count, N is a column count, W is greater than or equal to 3M, and N, M, and W are each natural numbers;
when it is monitored that the position of the search window of the current block has changed, determining, in the cache region, the previous M rows of data in the at least one column of partial reference frame data shifted out of the search window of the current block relative to the search window of the previous current block, and marking the previous M rows of data;
when it is detected that the size of the marked previous M rows of data meets a preset condition, reading new M rows of data from the reference frame data stored in an external memory, and storing the new M rows of data in the RAM in place of the marked previous M rows of data in the cache region, where the size of the new M rows of data is the same as that of the marked previous M rows of data.
Wherein the cache region comprises N RAMs;
before monitoring, when motion estimation is performed on the current block according to the partial reference frame data cached in the cache region, whether the search window position of the current block has changed from the search window position of the previous current block, the processor 1001 further performs the following operations:
configuring N RAMs, wherein N is more than 1;
reading M lines of reference frame data of the reference frame data stored in the external memory line by line in sequence, and writing the M lines of reference frame data into the N RAMs for caching respectively;
monitoring whether the N RAMs are full of data or not;
and when it is determined that the N RAMs are full of data, stopping reading the reference frame data in the memory; this initial fill is sketched below.
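The initial fill described by these steps might look like the C sketch below: the processor streams the reference frame into the N RAMs M rows at a time, monitors how many rows have been written, and stops reading from the external memory once every storage space is occupied. write_row follows the cyclic mapping sketched earlier; all sizes are assumptions.

    #include <stddef.h>
    #include <stdint.h>

    #define N_RAMS     8
    #define F_SPACES   3
    #define M_ROWS     8
    #define ROW_BYTES 64
    #define W_ROWS    (N_RAMS * F_SPACES)  /* rows the N RAMs can hold */

    static uint8_t ram[N_RAMS][F_SPACES][ROW_BYTES];

    static void write_row(int r, const uint8_t *row) {
        uint8_t *space = ram[r % N_RAMS][(r / N_RAMS) % F_SPACES];
        for (int off = 0; off < ROW_BYTES; off++)
            space[off] = row[off];
    }

    /* Fill the cache M rows at a time until the N RAMs are full, then
     * stop reading reference frame data from the external memory. */
    void initial_fill(const uint8_t *ext_frame, int frame_stride) {
        int rows_written = 0;
        while (rows_written < W_ROWS) {           /* "are the RAMs full?" */
            for (int i = 0; i < M_ROWS && rows_written < W_ROWS; i++) {
                write_row(rows_written,
                          ext_frame + (size_t)rows_written * frame_stride);
                rows_written++;
            }
        }
        /* rows_written == W_ROWS: reading from external memory stops. */
    }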
Wherein the configuring, by the processor 1001, of the N RAMs includes:
configuring a storage space of each RAM, where the storage space of each RAM is the same size, and the total storage space of the N RAMs is the same size as the W rows of reference frame data;
sequentially addressing the N RAMs;
dividing F storage spaces of each RAM according to continuous addresses of each RAM, and sequentially addressing the F storage spaces respectively according to the continuous address sequence of each RAM, wherein each storage space is used for storing a row of reference frame data, and F is a natural number.
The reading, by the processor 1001, of M rows of the reference frame data stored in the external memory, row by row in row-column order, and the writing of the M rows of reference frame data into the N RAMs for caching respectively include:
and sequentially storing the first row of reference frame data through the last row of reference frame data in the M rows of reference frame data into the corresponding first RAM through Nth RAM, in the order in which the rows are arranged.
The sequentially storing, by the processor 1001, of the first row of reference frame data through the last row of reference frame data in the M rows of reference frame data into the corresponding first RAM through Nth RAM according to the arrangement order of the rows includes:
when one row of reference frame data is stored into its corresponding RAM, the row of reference frame data is stored into the corresponding storage space of that RAM in the order of the RAM's consecutive addresses.
The determining, by the processor 1001 when it is monitored that the position of the search window of the current block has changed, in the buffer, of the previous M rows of data in the at least one column of partial reference frame data shifted out of the search window of the current block relative to the search window of the previous current block, and the marking of the previous M rows of data include:
sequentially reading M rows of data behind the partial reference frame data line by line in a row-column sequence in the reference frame data stored in the external memory, wherein the M rows of data are the same as the marked previous M rows of data in size;
and writing the M lines of data serving as the new M lines of data into positions where the marked previous M lines of data cached in the N RAMs are stored respectively.
Wherein the processor 1001 further performs:
when a target reference block needs to be read from the N RAMs to perform motion estimation on the current block, acquiring coordinates of the target reference block in reference frame data, wherein the coordinates comprise horizontal coordinates and vertical coordinates;
determining the addressing of the RAMs corresponding to each row of data in the target reference block according to the vertical coordinate of the target reference block;
determining addressing of storage spaces corresponding to each row of data in the target reference block according to the vertical coordinate of the target reference block;
determining offset addresses corresponding to each row of data in the target reference block according to the vertical coordinate of the target reference block;
combining the addressing of the RAMs respectively corresponding to each row of data in the target reference block, the addressing of the storage spaces respectively corresponding to each row of data in the target reference block, and the offset addresses respectively corresponding to each row of data in the target reference block to obtain the storage addresses of each row of data in the target reference block;
and reading the data of each row of the target reference block in parallel according to the storage address of the data of each row in the target reference block to obtain the target reference block.
Wherein, the processor 1001 further performs the following steps:
when a read operation instruction and a write operation instruction for the RAM are received simultaneously, judging whether the RAM can carry out read operation and write operation simultaneously;
and when the RAM is judged to be incapable of simultaneously performing read operation and write operation, controlling the RAM to perform read operation.
It can be understood that the functions of the functional units of the terminal in this embodiment may be implemented according to the methods in the foregoing method embodiments; for the specific implementation process, refer to the related description of the foregoing method embodiments, which is not repeated here.
In this embodiment of the present invention, when motion estimation is performed on a current block according to the partial reference frame data cached in the cache region, it is monitored whether the search window position of the current block has changed relative to the search window position of the previous current block. When a change in the search window position of the current block is monitored, the previous M rows of data in the at least one column of partial reference frame data shifted out of the search window of the current block relative to the search window of the previous current block are determined in the cache region and marked. When it is detected that the size of the marked previous M rows of data meets the preset condition, new M rows of data are read from the reference frame data stored in the external memory and stored in the RAM in place of the marked previous M rows of data in the cache region. The apparatus can thus update the data in the buffer while reading data from the cache region for motion estimation, which shortens the waiting time for reading data, accelerates motion estimation, and effectively saves a large amount of read bandwidth.
It will be understood by those skilled in the art that all or part of the processes of the methods of the above embodiments can be implemented by a computer program instructing relevant hardware; the program may be stored in a computer-readable storage medium and, when executed, may include the processes of the above method embodiments. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), or the like.
The above disclosure describes only preferred embodiments of the present invention and is not intended to limit the scope of the invention, which is defined by the appended claims.

Claims (16)

1. A method for caching data, the method comprising:
when motion estimation is performed on a current block according to partial reference frame data cached in a cache region, monitoring whether the position of a search window of the current block is changed compared with the position of the search window of a previous current block, wherein the partial reference frame data comprises W lines of reference frame data, the size of the current block is M multiplied by N pixels, M is a line number, N is a column number, W is larger than or equal to 3M, W, M, N are natural numbers, and the cache region is a Random Access Memory (RAM);
when the position of the search window of the current block is monitored to be changed, determining front M rows of data in at least one row of partial reference frame data shifted out by the search window of the current block compared with the search window of the previous current block in the cache region, and marking the front M rows of data;
when the fact that the size of the marked previous M rows of data meets a preset condition is detected, reading new M rows of data from reference frame data stored in an external memory, and replacing the marked previous M rows of data in the cache region with the new M rows of data to store the new M rows of data in the RAM, wherein the size of the new M rows of data is the same as that of the marked previous M rows of data, and the size of the previous M rows of data comprises the number of groups of the previous M rows of data.
2. The method of claim 1, wherein the cache area comprises N RAMs;
when motion estimation is performed on a current block according to part of reference frame data cached in a cache region, before monitoring whether the search window position of the current block is changed compared with the search window position of a previous current block, the method comprises the following steps:
configuring N RAMs, wherein N is more than 1;
reading M lines of reference frame data of the reference frame data stored in the external memory line by line in sequence, and writing the M lines of reference frame data into the N RAMs for caching respectively;
monitoring whether the N RAMs are full of data or not;
and when the N RAMs are determined to be full of data, stopping reading the reference frame data in the memory.
3. The method of claim 2, wherein the configuring the N RAMs comprises:
configuring a storage space of each RAM, wherein the size of the storage space of each RAM is the same, and the size of the storage space of N RAMs is the same as that of the W-line reference frame data;
sequentially addressing the N RAMs;
dividing F storage spaces of each RAM according to continuous addresses of each RAM, and sequentially addressing the F storage spaces respectively according to the continuous address sequence of each RAM, wherein each storage space is used for storing a row of reference frame data, and F is a natural number.
4. The method as claimed in claim 3, wherein said reading M rows of reference frame data of the reference frame data stored in the external memory row by row sequentially in row-column order, and writing the M rows of reference frame data into the N RAMs respectively comprises:
and sequentially storing the first line of reference frame data to the last line of reference frame data in the M lines of reference frame data into corresponding first RAM to Nth RAM according to the arrangement sequence of the lines.
5. The method according to claim 4, wherein the sequentially storing a first row of reference frame data to a last row of reference frame data in the M rows of reference frame data into the corresponding first RAM to Nth RAM according to the arrangement order of the rows comprises:
when one line of reference frame data is stored in the corresponding RAM, one line of reference frame data is stored in the corresponding storage space in the corresponding RAM according to the sequence of continuous addresses in the RAM.
6. The method of claim 2, wherein the determining the previous M rows of data in the at least one column of partial reference frame data from which the search window of the current block was shifted out compared to the search window of the previous current block in the buffer when monitoring that the search window position of the current block has changed, and marking the previous M rows of data comprises:
sequentially reading M rows of data behind the partial reference frame data line by line in a row-column sequence in the reference frame data stored in the external memory, wherein the M rows of data are the same as the marked previous M rows of data in size;
and writing the M lines of data serving as the new M lines of data into positions where the marked previous M lines of data cached in the N RAMs are stored respectively.
7. The method of claim 3, wherein the method further comprises:
when a target reference block needs to be read from the N RAMs to perform motion estimation on the current block, acquiring coordinates of the target reference block in reference frame data, wherein the coordinates comprise horizontal coordinates and vertical coordinates;
determining the addressing of the RAMs corresponding to each row of data in the target reference block according to the vertical coordinate of the target reference block;
determining addressing of storage spaces corresponding to each row of data in the target reference block according to the vertical coordinate of the target reference block;
determining offset addresses corresponding to each row of data in the target reference block according to the vertical coordinate of the target reference block;
combining the addressing of the RAMs respectively corresponding to each row of data in the target reference block, the addressing of the storage spaces respectively corresponding to each row of data in the target reference block, and the offset addresses respectively corresponding to each row of data in the target reference block to obtain the storage addresses of each row of data in the target reference block;
and reading the data of each row of the target reference block in parallel according to the storage address of the data of each row in the target reference block to obtain the target reference block.
8. The method of any one of claims 1-7, wherein the method comprises:
when a read operation instruction and a write operation instruction for the RAM are received simultaneously, judging whether the RAM can carry out read operation and write operation simultaneously;
and when the RAM is judged to be incapable of simultaneously performing read operation and write operation, controlling the RAM to perform read operation.
9. A data caching apparatus, comprising:
the device comprises a first monitoring unit, a second monitoring unit and a third monitoring unit, wherein the first monitoring unit is used for monitoring whether the position of a search window of a current block is changed compared with the position of the search window of a previous current block when the current block is subjected to motion estimation according to partial reference frame data cached in a cache region, the partial reference frame data comprises W lines of reference frame data, the size of the current block is M multiplied by N pixels, M is a line number, N is a column number, W is more than or equal to 3M, W, M, N is a natural number, and the cache region is a Random Access Memory (RAM);
a first determining unit, configured to determine, in the buffer, previous M rows of data in at least one column of partial reference frame data from which a search window of the current block is shifted out compared with a search window of a previous current block when the first monitoring unit monitors that a position of the search window of the current block changes, and mark the previous M rows of data;
and the updating unit is used for reading new M-row data from reference frame data stored in an external memory when the condition that the size of the marked previous M-row data meets the preset condition is detected, replacing the marked previous M-row data in the cache region with the new M-row data, and storing the new M-row data into the RAM, wherein the size of the new M-row data is the same as that of the marked previous M-row data, and the size of the previous M-row data comprises the group number of the previous M-row data.
10. The apparatus of claim 9, wherein the cache region comprises N RAMs;
the device comprises:
a configuration unit for configuring N RAMs, wherein N is more than 1;
the reading and writing unit is used for sequentially reading M lines of reference frame data of the reference frame data stored in the external memory line by line according to the sequence of rows and columns, and respectively writing the M lines of reference frame data into the N RAMs for caching;
the second monitoring unit is used for monitoring whether the N RAMs are full of data;
and the reading stopping unit is used for stopping reading the reference frame data in the memory when the second monitoring unit determines that the N RAMs are full of data.
11. The apparatus of claim 10, wherein the configuration unit comprises:
the configuration subunit is used for configuring the storage space of each RAM, wherein the storage space of each RAM is the same in size, and the storage space of N RAMs is the same as the size of the W-row reference frame data;
a first addressing subunit for sequentially addressing the N RAMs;
and the second addressing subunit is used for dividing F storage spaces of each RAM according to the continuous addresses of each RAM, and sequentially addressing the F storage spaces respectively according to the continuous address sequence of each RAM, wherein each storage space is used for storing a line of reference frame data, and F is a natural number.
12. The apparatus of claim 11, wherein the read-write unit is specifically configured to:
and sequentially storing the first line of reference frame data to the last line of reference frame data in the M lines of reference frame data into corresponding first RAM to Nth RAM according to the arrangement sequence of the lines.
13. The apparatus of claim 12, wherein the read-write unit is further specifically configured to:
when one line of reference frame data is stored in the corresponding RAM, one line of reference frame data is stored in the corresponding storage space in the corresponding RAM according to the sequence of continuous addresses in the RAM.
14. The apparatus of claim 10, wherein the update unit comprises:
the reading subunit is configured to sequentially read, line by line, M rows of data subsequent to the partial reference frame data in the reference frame data stored in the external memory according to a row-column sequence, where the M rows of data are the same as the marked previous M rows of data in size;
and the writing subunit is configured to write the M rows of data serving as the new M rows of data into positions where the marked previous M rows of data cached in the N RAMs are stored, respectively.
15. The apparatus of claim 11, wherein the apparatus further comprises:
the first acquisition unit is used for acquiring coordinates of a target reference block in reference frame data when the target reference block needs to be read from the N RAMs to carry out motion estimation on a current block, wherein the coordinates comprise horizontal coordinates and vertical coordinates;
the second determining unit is used for determining the addressing of the RAMs corresponding to the data of each row in the target reference block according to the vertical coordinate of the target reference block;
a third determining unit, configured to determine, according to the ordinate of the target reference block, addressing of storage spaces corresponding to each line of data in the target reference block;
a fourth determining unit, configured to determine, according to the ordinate of the target reference block, offset addresses corresponding to each line of data in the target reference block, respectively;
a second obtaining unit, configured to obtain, in combination with addressing of RAMs corresponding to respective rows of data in the target reference block, addressing of storage spaces corresponding to respective rows of data in the target reference block, and offset addresses corresponding to respective rows of data in the target reference block, storage addresses of respective rows of data in the target reference block;
and the reading unit is used for reading the data of each row of the target reference block in parallel according to the storage address of the data of each row in the target reference block to obtain the target reference block.
16. The apparatus of any one of claims 9-15, wherein the apparatus comprises:
the judging unit is used for judging whether the RAM can carry out read operation and write operation simultaneously when a read operation instruction and a write operation instruction for the RAM are received simultaneously;
and the control unit is used for controlling the RAM to carry out read operation when judging that the RAM can not carry out read operation and write operation at the same time.
CN201610972964.1A 2016-10-28 2016-10-28 Data caching method and device Active CN108024116B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610972964.1A CN108024116B (en) 2016-10-28 2016-10-28 Data caching method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610972964.1A CN108024116B (en) 2016-10-28 2016-10-28 Data caching method and device

Publications (2)

Publication Number Publication Date
CN108024116A CN108024116A (en) 2018-05-11
CN108024116B (en) 2021-06-25

Family

ID=62084703

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610972964.1A Active CN108024116B (en) 2016-10-28 2016-10-28 Data caching method and device

Country Status (1)

Country Link
CN (1) CN108024116B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109636854A (en) * 2018-12-18 2019-04-16 重庆邮电大学 A kind of augmented reality three-dimensional Tracing Registration method based on LINE-MOD template matching
WO2021134631A1 (en) * 2019-12-31 2021-07-08 深圳市大疆创新科技有限公司 Video processing method and apparatus
CN112486894A (en) * 2020-12-18 2021-03-12 航天科技控股集团股份有限公司 Data high-speed processing method based on 4G + MCU dual system

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101309405B (en) * 2007-05-14 2011-04-20 华为技术有限公司 Reference data loading method and device
CN101340588B (en) * 2008-08-20 2010-06-23 炬力集成电路设计有限公司 Motion estimation method, apparatus and multimedia processor
US9363524B2 (en) * 2013-08-26 2016-06-07 Amlogic Co., Limited Method and apparatus for motion compensation reference data caching
US9292899B2 (en) * 2013-09-25 2016-03-22 Apple Inc. Reference frame data prefetching in block processing pipelines
CN104268098B (en) * 2014-08-28 2017-07-11 上海交通大学 Caching system on a kind of piece for ultra high-definition video frame rate upconversion
CN104935933B (en) * 2015-06-05 2019-11-26 广东中星微电子有限公司 A kind of video coding-decoding method
CN105847828B (en) * 2016-01-29 2019-02-05 西安邮电大学 A kind of reference block pixel update Parallel Implementation method for integer estimation

Also Published As

Publication number Publication date
CN108024116A (en) 2018-05-11

Similar Documents

Publication Publication Date Title
US20070268298A1 (en) Delayed frame buffer merging with compression
CN108024116B (en) Data caching method and device
CN113015003B (en) Video frame caching method and device
US20050083338A1 (en) DSP (digital signal processing) architecture with a wide memory bandwidth and a memory mapping method thereof
JP5196239B2 (en) Information processing apparatus and method
CN110708609A (en) Video playing method and device
US9082370B2 (en) Display control device and data processing system
US9460489B2 (en) Image processing apparatus and image processing method for performing pixel alignment
CN110322904B (en) Compressed image information reading control method and device
US7401177B2 (en) Data storage device, data storage control apparatus, data storage control method, and data storage control program
US20120147023A1 (en) Caching apparatus and method for video motion estimation and compensation
WO2007057053A1 (en) Conditional updating of image data in a memory buffer
US20150242988A1 (en) Methods of eliminating redundant rendering of frames
EP3474224B1 (en) Graphics processing method and device
CN107506119B (en) Picture display method, device, equipment and storage medium
US10109260B2 (en) Display processor and method for display processing
US10152766B2 (en) Image processor, method, and chipset for increasing intergration and performance of image processing
US8988444B2 (en) System and method for configuring graphics register data and recording medium
JP2003030129A (en) Data buffer
US20120144150A1 (en) Data processing apparatus
CN115766677B (en) Frame rate conversion method and device for video mode
JP4687108B2 (en) Data storage device, data storage control device, data storage control method, and data storage control program
US11790592B2 (en) Data process apparatus for to-be-cached matrix and method thereof
CN117082281B (en) Audio and video data synchronous processing method, system, equipment and medium
US11010661B2 (en) Neural network chip, method of using neural network chip to implement de-convolution operation, electronic device, and computer readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant