WO2022117076A1 - Video motion estimation method, apparatus, device, computer-readable storage medium and computer program product - Google Patents
- Publication number
- WO2022117076A1 (PCT/CN2021/135372)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- target block
- frame
- search
- image
- image frame
- Prior art date
Classifications
- H—ELECTRICITY; H04—ELECTRIC COMMUNICATION TECHNIQUE; H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION; H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/105—Selection of the reference unit for prediction within a chosen coding or prediction mode, e.g. adaptive choice of position and number of pixels used for prediction
- H04N19/114—Adapting the group of pictures [GOP] structure, e.g. number of B-frames between two anchor frames
- H04N19/14—Coding unit complexity, e.g. amount of activity or edge presence estimation
- H04N19/142—Detection of scene cut or scene change
- H04N19/172—Adaptive coding in which the coding unit is an image region, the region being a picture, frame or field
- H04N19/176—Adaptive coding in which the coding unit is an image region, the region being a block, e.g. a macroblock
- H04N19/513—Processing of motion vectors
- H04N19/537—Motion estimation other than block-based
- H04N19/56—Motion estimation with initialisation of the vector search, e.g. estimating a good candidate to initiate a search
- H04N19/57—Motion estimation characterised by a search window with variable size or shape
- H04N19/573—Motion compensation with multiple frame prediction using two or more reference frames in a given prediction direction
- H04N19/85—Pre-processing or post-processing specially adapted for video compression
- G—PHYSICS; G06—COMPUTING; CALCULATING OR COUNTING; G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL; G06T7/00—Image analysis; G06T7/20—Analysis of motion
- G06T7/238—Analysis of motion using block-matching with non-full search, e.g. three-step search
- G06T7/246—Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
Definitions
- The embodiments of the present application are based on, and claim the priority of, Chinese patent application No. 202011401743.1 filed on December 4, 2020, the entire contents of which are incorporated into the embodiments of the present application by reference.
- The present application relates to the technical field of video motion estimation, and in particular, but not exclusively, to a video motion estimation method, apparatus, device, computer-readable storage medium, and computer program product.
- Video compression can use inter-frame prediction to eliminate temporal redundancy between sequence frames, and motion estimation is a key technology widely used in inter-frame prediction in video coding. However, it is time-consuming, accounting for about 70% of the computation of the entire video coding process, and the proportion is even higher for higher-definition video. The motion estimation algorithm is therefore the main factor determining the efficiency of video compression; reducing its computational cost, improving its accuracy, and making its search process more robust, faster, and more efficient are the key goals in accelerating video compression.
- Embodiments of the present application provide a video motion estimation method, apparatus, device, computer-readable storage medium, and computer program product, which can improve both search efficiency and motion estimation accuracy.
- An embodiment of the present application provides a video motion estimation method, which is executed by a video motion estimation device, including:
- each of the image frame sets includes at least one image frame
- motion estimation processing is performed in the search area corresponding to the search range in each predicted frame, to obtain the motion vector corresponding to the target block.
- An embodiment of the present application provides a video motion estimation apparatus, including:
- a first acquisition module configured to acquire multiple image frames in the video to be processed and perform scene division processing on them to obtain multiple image frame sets, where each image frame set includes at least one image frame;
- a feature extraction module configured to extract the contour feature and color feature of the foreground object in each image frame of each image frame set;
- a first determining module configured to determine a search range corresponding to each image frame set based on the contour features of the foreground objects in that image frame set;
- a second determining module configured to determine the starting search point of each predicted frame in each image frame set;
- a motion estimation module configured to perform motion estimation processing in the search area corresponding to the search range in each predicted frame, based on the starting search point of each predicted frame, the target block in the reference frame, and the color feature of the foreground object, to obtain a motion vector corresponding to the target block.
- An embodiment of the present application provides a device, including:
- a memory for storing executable instructions; and a processor configured to implement the above method when executing the executable instructions stored in the memory.
- Embodiments of the present application provide a computer-readable storage medium storing executable instructions for causing a processor to implement the foregoing method when executed.
- Embodiments of the present application provide a computer program product, including a computer program or instructions that, when executed, cause a computer to perform the above method.
- In the embodiments of the present application, the search range corresponding to each image frame set is determined, and motion estimation is carried out within the search area defined by that range. Because the search is confined to a limited area, the search range is narrowed and search time is reduced; and because the search range is bounded by the contour features of the foreground objects in each scene, motion estimation accuracy is improved.
- FIG. 1 is a schematic diagram of a system architecture of a video motion estimation system provided by an embodiment of the present application
- FIG. 2 is a schematic structural diagram of a terminal provided by an embodiment of the present application.
- FIG. 3 is a schematic diagram of a realization flow of a video motion estimation method provided by an embodiment of the present application.
- FIG. 4A-FIG. 4C are schematic flowcharts of performing motion estimation in a search area corresponding to a search range provided by an embodiment of the present application;
- FIG. 5 is a schematic flowchart of still another video motion estimation method provided by an embodiment of the present application.
- FIG. 6 is a schematic flowchart of the implementation of a video motion estimation method based on 3D image blocks provided by an embodiment of the present application;
- FIG. 7 is a schematic diagram of the correspondence between the quadrant in which a predicted motion vector lies and the priority search area provided by an embodiment of the present application;
- FIG. 8 is a schematic diagram of implementing motion estimation of a 3D image block according to an embodiment of the present application.
- Motion estimation: a technique used in video coding; the process of calculating the motion vector between the current frame and the reference frame during compression coding.
- Motion vector: a vector representing the relative displacement between the current coded block and the best matching block in the reference image.
- Optical flow method: a method that uses the temporal changes of pixels in an image sequence and the correlation between adjacent frames to determine the correspondence between the previous frame and the current frame, and thereby compute the motion information of objects between adjacent frames.
- The motion estimation method in the related art first divides each image frame of the video sequence into multiple equally sized, non-overlapping blocks or macroblocks, assuming that the displacements of all pixels within a macroblock are equal. It then searches the adjacent reference frame, according to a certain matching scheme, for the target matching block most similar to each block or macroblock. Finally, the relative spatial offset between the macroblock and the target matching block, that is, the motion vector, is calculated; this process of obtaining the motion vector is motion estimation.
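The block-matching procedure described above can be sketched as an exhaustive (full) search over a small window, using the sum of absolute differences (SAD) as the matching criterion. The function name, block size, and search radius below are illustrative, not values from the application:

```python
import numpy as np

def full_search_mv(ref, cur, bx, by, block=8, search=4):
    """Exhaustive block-matching motion estimation for one block.

    Returns the motion vector (dy, dx) minimizing the SAD between the
    current block at (by, bx) and candidate blocks in the reference frame.
    """
    target = cur[by:by + block, bx:bx + block].astype(np.int32)
    best_sad, best_mv = None, (0, 0)
    h, w = ref.shape
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = by + dy, bx + dx
            if y < 0 or x < 0 or y + block > h or x + block > w:
                continue  # candidate block falls outside the reference frame
            cand = ref[y:y + block, x:x + block].astype(np.int32)
            sad = np.abs(target - cand).sum()
            if best_sad is None or sad < best_sad:
                best_sad, best_mv = sad, (dy, dx)
    return best_mv, best_sad
```

For a pattern that moved by two rows and one column between the reference and current frames, the search recovers the corresponding motion vector with zero residual SAD.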
- the core idea of motion estimation is to obtain the motion vector between frames of a video sequence as accurately as possible, which is mainly used for motion compensation between frames.
- The compensation residual needs to be transformed, quantized, and then entropy encoded together with the motion vector, and sent to the decoding end as a bit stream; at the decoding end, the current block or macroblock can be recovered from these two pieces of data (i.e., the compensation residual and the motion vector).
- the application of motion estimation method in video transmission can effectively remove data redundancy between frames, thereby reducing the amount of transmitted data.
- The accuracy of the motion vector determines the quality of video frame prediction and compensation: the higher the quality, the smaller the compensation residual, the fewer bits required for compensation encoding, and the lower the bit rate required for transmission.
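The encode/decode relationship between residual and motion-compensated prediction can be illustrated with a toy example (the block values are invented; quantization and entropy coding are omitted):

```python
import numpy as np

# Encoder side: predict the current block from the best-matching block in
# the reference frame, and transmit only the motion vector plus the residual.
ref_block = np.array([[10, 12], [14, 16]])   # best-matching reference block
cur_block = np.array([[11, 12], [14, 17]])   # current block being coded
residual = cur_block - ref_block             # small residual -> few bits to encode

# Decoder side: the current block is recovered exactly from the two
# transmitted quantities (prediction + residual).
recovered = ref_block + residual
assert np.array_equal(recovered, cur_block)
```

A better match yields a residual closer to zero, which is precisely why motion vector accuracy lowers the transmitted bit rate.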
- the related art motion estimation methods include spatial motion estimation and frequency domain motion estimation.
- spatial domain motion estimation includes motion estimation based on global, pixel, macroblock, region and grid, etc.
- frequency domain motion estimation includes phase method, discrete cosine transform method and wavelet domain method.
- The spatial-domain motion estimation method has become favored by many researchers in recent years because of its relatively fast calculation speed, low complexity, and ease of implementation on most hardware platforms.
- Spatial motion estimation methods can be divided into global search and fast search according to the matching search range.
- The global search method performs an exhaustive search over all positions within the search range and has the highest accuracy, but its computational complexity is also high, making real-time processing difficult. The fast search method instead searches macroblocks within the search area according to set rules, so it is faster than the global search, although the block it finds may not be optimal. Methods such as the diamond search (DS, Diamond Search), three-step search (TSS, Three Step Search), and four-step search (FSS, Four Step Search) are fast motion estimation methods based on local search, which speed up the search mainly by limiting the number of search steps or search points and adopting an appropriate search template.
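A minimal sketch of the three-step search named above, assuming SAD as the matching criterion (the block size and initial step are illustrative): at each step it examines the centre and its 8 neighbours at the current step size, moves the centre to the best match, then halves the step.

```python
import numpy as np

def three_step_search(ref, cur, bx, by, block=8, step=4):
    """Three-step search (TSS): far fewer SAD evaluations than a full
    search, at the risk of settling on a sub-optimal block."""
    def sad(cy, cx):
        h, w = ref.shape
        if cy < 0 or cx < 0 or cy + block > h or cx + block > w:
            return np.inf  # candidate outside the reference frame
        a = cur[by:by + block, bx:bx + block].astype(np.int32)
        b = ref[cy:cy + block, cx:cx + block].astype(np.int32)
        return np.abs(a - b).sum()

    cy, cx = by, bx
    while step >= 1:
        # Centre plus 8 neighbours at the current step distance.
        candidates = [(cy + dy, cx + dx)
                      for dy in (-step, 0, step) for dx in (-step, 0, step)]
        cy, cx = min(candidates, key=lambda p: sad(*p))
        step //= 2
    return (cy - by, cx - bx)  # motion vector of the best match found
```

With steps 4, 2, 1 the method covers a ±7 displacement range while evaluating at most 25 distinct candidates instead of the 225 a full ±7 search would need.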
- The official test model of High Efficiency Video Coding (HEVC) provides a basic full search algorithm and the TZSearch fast search method.
- The fast motion estimation methods provided in the related art are faster than the full search, but most of them access data irregularly, and their search efficiency still needs improvement. For example, when the video jitters during shooting, when image frame contrast is low, or when the moving scene changes continuously, the optimal motion vector of the current block may be matched to the wrong sub-block, easily causing obvious blurring and blocking artifacts in the interpolated frames obtained.
- In the embodiments of the present application, the continuous image frames of the video are treated as a single calculation object, and foreground/background processing of the video is incorporated into the restriction of the search range, which reduces search time and improves motion estimation accuracy.
- The video motion estimation device provided by the embodiments of the present application may be implemented as any terminal with a screen display function, such as a notebook computer, tablet computer, desktop computer, mobile device (e.g., mobile phone, portable music player, personal digital assistant, dedicated messaging device, or portable game device), or intelligent robot, and may also be implemented as a server.
- FIG. 1 is a schematic structural diagram of a video motion estimation system 10 provided by an embodiment of the present application.
- the video motion estimation system 10 includes a terminal 400 , a network 200 and a server 100 .
- the terminal 400 runs an application program, for example, an image capture application program, an instant messaging application program, and the like.
- The terminal 400 acquires the video to be processed, where the video may be captured by an image acquisition device built into the terminal 400 (for example, recorded in real time by a camera) or may be a video stored locally on the terminal.
- After acquiring the video to be processed, the terminal 400 divides the multiple image frames contained in the video by scene and extracts the contour features of the foreground objects from the multiple image frames of each scene. It determines a search range based on the contour features, with each scene corresponding to one search range. Within the multiple image frames of the same scene, it searches the search area determined by the search range for the block matching the target block of the reference frame, and then determines the motion vector. The terminal 400 then sends the reference frame and the motion vector obtained by motion estimation to the server, and the server can perform motion compensation based on the motion vector to obtain a complete video file.
- an application program running on the terminal 400 may be, for example, an image capturing application program or an instant messaging application program or the like.
- the terminal 400 acquires the video to be processed, and sends the video to be processed to the server 100.
- The server 100 divides the multiple image frames included in the video by scene, extracts the contour features of the foreground objects from the multiple image frames of each scene, and determines the search range based on the contour features.
- Each scene corresponds to a search range.
- The block matching the target block of the reference frame is searched for in the search area determined by the search range, and then the motion vector is determined, completing the motion estimation process; motion compensation is then performed based on the motion vector to obtain a complete video file.
- FIG. 2 is a schematic structural diagram of a video motion estimation device provided by an embodiment of the present application.
- The terminal 400 shown in FIG. 1 includes: at least one processor 410, a memory 450, at least one network interface 420, and a user interface 430.
- the various components in terminal 400 are coupled together by bus system 440 .
- the bus system 440 is used to implement the connection communication between these components.
- the bus system 440 also includes a power bus, a control bus, and a status signal bus.
- the various buses are labeled as bus system 440 in FIG. 2 .
- the processor 410 may be an integrated circuit chip with signal processing capabilities, such as a general-purpose processor, a digital signal processor (DSP, Digital Signal Processor), or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, etc., where a general-purpose processor may be a microprocessor or any conventional processor or the like.
- User interface 430 includes one or more output devices 431 that enable presentation of media content, including one or more speakers and/or one or more visual display screens.
- User interface 430 also includes one or more input devices 432, including user interface components that facilitate user input, such as a keyboard, mouse, microphone, touch screen display, camera, and other input buttons and controls.
- Memory 450 may be removable, non-removable, or a combination thereof.
- Exemplary hardware devices include solid state memory, hard drives, optical drives, and the like.
- Memory 450 optionally includes one or more storage devices that are physically remote from processor 410 .
- Memory 450 includes volatile memory or non-volatile memory, and may also include both volatile and non-volatile memory.
- the non-volatile memory may be a read-only memory (ROM, Read Only Memory), and the volatile memory may be a random access memory (RAM, Random Access Memory).
- the memory 450 described in the embodiments of the present application is intended to include any suitable type of memory.
- memory 450 is capable of storing data to support various operations, examples of which include programs, modules, and data structures, or subsets or supersets thereof, as exemplified below.
- the operating system 451 includes system programs for processing various basic system services and performing hardware-related tasks, such as framework layer, core library layer, driver layer, etc., for implementing various basic services and processing hardware-based tasks;
- a presentation module 453 for enabling presentation of information (e.g., a user interface for operating peripherals and displaying content and information) via one or more output devices 431 (e.g., a display screen, speakers, etc.) associated with the user interface 430;
- an input processing module 454 for detecting one or more user inputs or interactions from the one or more input devices 432 and translating the detected inputs or interactions.
- FIG. 2 shows a video motion estimation apparatus 455 stored in the memory 450, which may be software in the form of programs and plug-ins, including the following software modules: the first acquisition module 4551, the feature extraction module 4552, the first determination module 4553, the second determination module 4554, and the motion estimation module 4555. These modules are logical and can therefore be arbitrarily combined or further split according to the functions implemented.
- the apparatus provided by the embodiments of the present application may be implemented in hardware.
- For example, the apparatus provided by the embodiments of the present application may be a processor in the form of a hardware decoding processor programmed to execute the video motion estimation method provided by the embodiments of the present application; such a processor may use one or more application-specific integrated circuits (ASIC, Application Specific Integrated Circuit), DSPs, programmable logic devices (PLD, Programmable Logic Device), complex programmable logic devices (CPLD, Complex Programmable Logic Device), field-programmable gate arrays (FPGA, Field-Programmable Gate Array), or other electronic components.
- FIG. 3 is a schematic flowchart of an implementation of a video motion estimation method provided by an embodiment of the present application, which will be described with reference to the steps shown in FIG. 3 .
- Step S101 acquiring multiple image frames in the video to be processed, and performing scene division processing on the multiple image frames to obtain multiple image frame sets.
- the video to be processed may be a video recorded in real time by the terminal, a video stored locally by the terminal, or a video downloaded by the terminal from a server.
- the scene can be divided based on the background image of the image frame.
- when the background images of multiple image frames are similar, those frames can be considered to belong to the same scene; for example, each program in a video can be treated as one scene.
- Each scene corresponds to an image frame set, and each image frame set includes at least one image frame.
- a set of image frames may be understood as a 3D image block whose three dimensions are the number of frames, the frame width, and the frame height, where the number of frames is the number of image frames included in the set, the frame width is the width of the image frame (in actual implementation, expressed as the number of pixels in the width direction), and the frame height is the height of the image frame (expressed as the number of pixels in the height direction).
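The 3D-image-block view of an image frame set described above can be sketched as follows; the array shape, dtype, and NumPy usage are illustrative assumptions, not taken from the source:

```python
import numpy as np

# An image frame set viewed as a 3D image block whose three dimensions
# are (number of frames, frame height, frame width), with height and
# width expressed as pixel counts, as in the description.
frame_set = np.zeros((5, 720, 1280), dtype=np.uint8)  # 5 grayscale frames

num_frames, frame_height, frame_width = frame_set.shape
print(num_frames, frame_height, frame_width)  # 5 720 1280
```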
- Step S102 extracting contour features and color features of foreground objects in each image frame in each image frame set.
- the contour feature of the foreground object can be calculated from the optical flow vector gradient of the optical flow field, and the color feature of the foreground object can be extracted from the image area where the foreground object is located.
- prior knowledge of the background structure of consecutive sample image frames and of the foreground prior regions in the sample image frames can also be used to train an image segmentation model; the trained image segmentation model can then be used to perform foreground object segmentation and scene segmentation, and to extract the color features of the segmented foreground object image regions.
- the color features of each image frame in an image frame set also constitute a color feature sequence.
- the scene division process in step S101 and the extraction of the contour features and color features of the foreground objects in step S102 may be performed by inputting the video to be processed into a trained image segmentation model, thereby completing the scene segmentation of the video to be processed and the feature extraction of each image frame.
- Step S103 Determine the search range corresponding to each image frame set based on the contour feature of the foreground object in each image frame set.
- the outline of the foreground object can be represented by a rectangle, a square, an irregular figure, etc.
- the outline feature of the foreground object can include the coordinates of the vertices of the outline.
- the coordinates of the contour vertices of the foreground object in each image frame in the image frame set can be used to determine a search range that includes the foreground objects in all image frames.
- Step S104 determining the starting search point of each prediction frame in each image frame set.
- the multiple image frames included in the image frame set can be sorted in time order, with the first image frame used as the reference frame and the other image frames used as predicted frames;
- alternatively, the i-th image frame is used as a reference frame and the (i+1)-th image frame is used as a predicted frame, where i is an increasing positive integer.
- the reference frame is an image frame used as a reference in the image frame set for calculating the motion vector
- the predicted frame is the image frame used for calculating the motion vector in the image frame set.
- based on the correlation of the video sequence frames in the spatial and temporal domains, the motion vector of the target block in each prediction frame can be predicted by successively applying median prediction, upper-layer prediction and origin prediction in the spatial domain, thereby determining the location of the starting search point in each prediction frame.
- Step S105 based on the starting search point of each prediction frame, the target block in the reference frame, and the color features of the foreground object, perform motion estimation in the search area corresponding to the search range in each prediction frame, and obtain the motion vector corresponding to the target block.
- the search can be performed with the starting search point of each prediction frame as the center, carrying out bidirectional motion estimation with the search area corresponding to the search range and the color features of the foreground object as constraints, keeping the search within the search area to complete the search for the target block of the predicted frame.
- an asymmetric cross search template with twice as many search points on the w-axis as on the h-axis can be used.
- the magnitudes of the components of the predicted motion vector on the w-axis and the h-axis determine whether the foreground motion in the search window is horizontal or vertical. If it is horizontal motion, the horizontal asymmetric cross shape of the original UMHexagonS template is used; if it is determined to be vertical motion, a template with twice as many h-axis search points as w-axis search points is used for the search, so as to determine the target block in the predicted frame. Then, based on the position information of the target block and the reference target block, the motion vector corresponding to the target block is determined.
- in the embodiments of the present application, the multiple image frames are first divided into scenes to obtain multiple image frame sets, that is, each scene corresponds to an image frame set, and each image frame set includes one or more image frames whose backgrounds are similar because they belong to the same scene. The contour features and color features of the foreground objects in each image frame of each image frame set are then extracted, and the search range corresponding to each image frame set is determined based on the contour features of the foreground objects in that set. Each starting search point of each predicted frame in each image frame set is then determined, and, based on each starting search point, the target block in the reference frame, and the color features of the foreground object, motion estimation is performed in the search area corresponding to the search range in each prediction frame to obtain the motion vector corresponding to the target block. Because the search is confined to a limited range, the search range is narrowed and the search time can be reduced while the accuracy of motion estimation is maintained.
- the scene division of the multiple image frames in the above step S101 to obtain multiple image frame sets may be implemented as follows: determine the background image area in each of the multiple image frames, and determine the image similarity between the multiple background image areas; based on the image similarity between the multiple background image areas, perform scene division processing on the multiple image frames to obtain multiple image frame sets.
- target detection can be performed first to identify the foreground objects in the multiple image frames.
- the detection and extraction of foreground objects can be carried out using the difference between two consecutive frames, or between several frames, of images in the video sequence. Using the temporal information, the grayscale difference value of corresponding pixel points is obtained by comparing several consecutive frames; if the difference exceeds a threshold, it can be judged that there is a foreground object at that position, and the area outside this position is the background image area.
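The frame-differencing idea described above can be sketched roughly as follows; the function name and the threshold value are assumptions of this sketch, not taken from the source:

```python
import numpy as np

def foreground_mask(prev_frame, curr_frame, threshold=25):
    """Mark pixels whose grayscale difference between consecutive frames
    exceeds a threshold as foreground; the rest is background.
    The threshold value is an illustrative assumption."""
    diff = np.abs(curr_frame.astype(np.int16) - prev_frame.astype(np.int16))
    return diff > threshold

prev = np.zeros((4, 4), dtype=np.uint8)
curr = prev.copy()
curr[1:3, 1:3] = 200          # a small object appears between frames
mask = foreground_mask(prev, curr)
print(int(mask.sum()))        # 4 foreground pixels
```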
- the optical flow field method can also be used to detect the foreground object.
- the grayscale-constancy principle of corresponding pixels in two adjacent frames is used to evaluate the change in the two-dimensional image, which can better extract the moving foreground object from the image frame.
- the optical flow field method is suitable for the detection of relatively moving foreground objects during the camera movement.
- the image similarity between the background image areas can be calculated.
- the histogram matching algorithm can be used to calculate the image similarity between background image areas. For example, given a background image area A and a background image area B, the histograms of the two images, HistA and HistB, are calculated respectively, and then a normalized correlation coefficient of the two histograms (such as the Bhattacharyya distance or the histogram intersection distance) is calculated to determine the similarity between the two.
- image similarity calculation can also be performed based on feature points.
- feature points in the background image regions can be extracted respectively, and the Hamming distance between the feature points calculated to determine the similarity value between the background image regions.
- the image frames whose background image areas are continuous in time and highly similar are divided into one image frame set.
- the multiple image frames in the video are divided into scenes by their background images; because the moving range of foreground objects within the same scene is relatively small, a search range for the foreground objects can be determined based on the scene that is as small as possible while maintaining accuracy.
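The histogram-based similarity comparison mentioned earlier can be illustrated with a minimal sketch using the Bhattacharyya coefficient over normalized grayscale histograms; the bin count and function name are assumptions of this sketch:

```python
import numpy as np

def histogram_similarity(img_a, img_b, bins=16):
    """Bhattacharyya coefficient between the normalized grayscale
    histograms of two background image areas (1.0 = identical).
    The bin count is an illustrative assumption."""
    hist_a, _ = np.histogram(img_a, bins=bins, range=(0, 256))
    hist_b, _ = np.histogram(img_b, bins=bins, range=(0, 256))
    hist_a = hist_a / hist_a.sum()
    hist_b = hist_b / hist_b.sum()
    return float(np.sum(np.sqrt(hist_a * hist_b)))

a = np.full((8, 8), 100, dtype=np.uint8)
print(histogram_similarity(a, a))  # 1.0 for identical backgrounds
```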
- the extraction of the contour features and color features of the foreground objects in each image frame of each image frame set in step S102 shown in FIG. 3 can be implemented as follows: perform the following processing on each image frame in each image frame set: determine the foreground image area where the foreground object in the image frame is located; use the position information of the foreground image area as the contour feature of the foreground object in the image frame; and perform color extraction processing based on the foreground image area to obtain the color features of the foreground object in the image frame.
- the foreground and background of the image frame are segmented, so that the background image area, the foreground image area where the foreground object is located, and the position information of the foreground image area can be determined.
- the outline of the foreground image area may not completely fit the foreground object; for example, when the foreground object is a person, the outline of the foreground image area may be a rectangle or square that contains the person rather than a humanoid silhouette. Therefore, the position information of the foreground image area can be represented by the vertex coordinates of the outline, that is, the contour feature of the foreground object includes the coordinates of each vertex of the foreground image area.
- the color feature of the foreground object can also be understood as the color feature of the foreground image area.
- the color feature is a visual feature applied in image retrieval, and the color is often very related to the object or scene contained in the image.
- color features are less dependent on the size, orientation, and viewing angle of the image itself, resulting in higher robustness.
- color features can be represented in various ways, such as color histograms, color moments, color sets, color aggregation vectors, and color correlation diagrams.
- the contour features and color features of the foreground objects in each image frame of each image frame set can thus be extracted, providing the data basis for setting the search range and for setting the constraints of motion estimation when matching the target block in the prediction frame with the reference target block.
- step S103 shown in FIG. 3 based on the contour features of the foreground objects in each image frame set, determining the search range corresponding to each image frame set can be achieved by the following steps:
- step S1031 the following processing is performed for any image frame set: based on the position information of each foreground image area in the image frame set, the vertex coordinates in each foreground image area are determined.
- the position information of the foreground image area may be represented by the coordinates of the vertices of the foreground image area.
- when the foreground image area is rectangular, the coordinates of the four vertices of the rectangular area need to be determined.
- the coordinates of the four vertices A, B, C, and D of the foreground image area in a certain predicted frame are (100, 100), (100, 500), (300, 500), and (300, 100), respectively.
- Step S1032 from the coordinates of each vertex, determine the first maximum value and the first minimum value corresponding to the first dimension, and determine the second maximum value and the second minimum value corresponding to the second dimension.
- Step S1032, when implemented, determines the first maximum value and the first minimum value corresponding to the first dimension, and the second maximum value and the second minimum value corresponding to the second dimension, from the coordinates of the vertices of all foreground image areas belonging to the same image frame set.
- Step S1033 Determine the search range corresponding to the image frame set based on the first minimum value, the first maximum value, the second minimum value and the second maximum value.
- the search range corresponding to the image frame set can then be determined: in the first dimension, the search range is greater than or equal to the first minimum value and less than or equal to the first maximum value; in the second dimension, it is greater than or equal to the second minimum value and less than or equal to the second maximum value.
- these four values can be used to determine four vertex coordinates, namely (first minimum value, second minimum value), (first minimum value, second maximum value), (first maximum value, second minimum value) and (first maximum value, second maximum value), and the search range corresponding to the image frame set is determined based on these four vertices.
- for example, if the first minimum value is 100, the first maximum value is 600, the second minimum value is 100, and the second maximum value is 800, the coordinates of the four vertices determined from these values are (100, 100), (100, 800), (600, 100) and (600, 800); the search range is therefore the rectangular area determined by these four vertices.
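The min/max computation over the contour vertex coordinates can be sketched as follows, reproducing the worked example above (the function name is an assumption of this sketch):

```python
def search_range(vertex_coords):
    """Search range of an image frame set from the contour vertex
    coordinates of every foreground image area in the set: the span
    between the minimum and maximum values in each of the two dimensions."""
    xs = [x for x, _ in vertex_coords]
    ys = [y for _, y in vertex_coords]
    return (min(xs), max(xs)), (min(ys), max(ys))

# Vertices of the foreground image areas across several frames of one
# scene; the first dimension spans 100..600 and the second 100..800,
# giving the rectangle (100, 100), (100, 800), (600, 100), (600, 800).
vertices = [(100, 100), (100, 500), (300, 500), (300, 100),
            (600, 200), (450, 800)]
print(search_range(vertices))  # ((100, 600), (100, 800))
```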
- the search area is determined based on the contour features of the foreground objects in the multiple image frames belonging to the same scene. Since the motion range of foreground objects within the same scene is generally small, the search range determined by the maximum and minimum coordinates in two dimensions over the multiple foreground image regions is guaranteed to include the foreground objects in all image frames of the scene, thereby ensuring the calculation accuracy of motion estimation.
- determining each starting search point of each prediction frame in each image frame set in step S104 shown in FIG. 3 may be implemented as follows: determine the position information of the reference target block in each reference frame of each image frame set, where the reference target block is any target block in the reference frame; predict the motion vector of each prediction frame using a set prediction mode to obtain the predicted motion vector of each prediction frame, the prediction mode including at least one of the median prediction mode, the upper-layer block prediction mode and the origin prediction mode; and determine the starting search point of each prediction frame based on the position information of the reference target block and the predicted motion vector.
- the reference frame is also divided into foreground and background. After the foreground image area in the reference frame is determined, the foreground image area can be divided to obtain multiple reference target blocks.
- the size of the reference target block can be 4×4, 8×8, etc.
- the position information of the reference target block is represented by a vertex coordinate of the reference target block, for example, it can be represented by the vertex coordinate of the upper left corner of the reference target block.
- the prediction mode includes at least one of median prediction, higher layer block prediction, and origin prediction.
- the motion vector of the current block can be predicted from the motion vectors of adjacent blocks.
- the initial motion vector of the current block can be predicted from the motion vectors of the spatially adjacent blocks of the current block (median prediction) or from the motion vector of the co-located block in the temporally previous frame (origin prediction), so as to determine the initial search point.
- the position information of the reference target block can be shifted by the predicted motion vector, thereby determining the respective starting search point of each prediction frame.
- the high-precision starting search point can make the search point as close as possible to the target block in the predicted frame, so that the search speed can be improved.
- the current motion vector is predicted by at least one prediction mode among the median prediction, the upper layer prediction and the origin prediction, so as to determine the position of the best starting search point and ensure the accuracy of the starting search point.
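The median prediction and starting-search-point derivation described above can be sketched as follows; the component-wise median over the left, top and top-right neighbors follows the common H.264/AVC convention, and the function names are assumptions of this sketch:

```python
def median_predict(mv_left, mv_top, mv_topright):
    """Median prediction: the predicted motion vector of the current
    block is the component-wise median of the motion vectors of the
    spatially adjacent blocks (left, top, top-right)."""
    xs = sorted([mv_left[0], mv_top[0], mv_topright[0]])
    ys = sorted([mv_left[1], mv_top[1], mv_topright[1]])
    return (xs[1], ys[1])

def starting_search_point(ref_block_pos, predicted_mv):
    """Shift the reference target block's vertex by the predicted
    motion vector to obtain the starting search point."""
    return (ref_block_pos[0] + predicted_mv[0],
            ref_block_pos[1] + predicted_mv[1])

mv = median_predict((4, 0), (2, 2), (6, -2))
print(mv)                                     # (4, 0)
print(starting_search_point((100, 100), mv))  # (104, 100)
```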
- step S105 shown in FIG. 3, in which motion estimation is performed in the search area corresponding to the search range in each prediction frame based on the starting search point of each prediction frame, the target block in the reference frame and the color features of the foreground object to obtain the motion vector corresponding to the target block, can be implemented through steps S1051 to S1058 shown in FIG. 4B; each step is described below with reference to FIG. 4B.
- Step S1051 Determine the first search template corresponding to each prediction frame.
- the first search template may be an asymmetric cross template, a hexagonal template, a diamond template, or the like.
- the first search template may be determined according to the predicted motion direction of the foreground object.
- the above step S1051 may be implemented as follows: determine the first motion direction of the foreground object in each prediction frame based on the predicted motion vector of the prediction target block of that frame; then determine the first search template corresponding to each prediction frame based on the first motion direction of the foreground object.
- the first movement direction may be a horizontal direction, a vertical direction, or an oblique direction.
- when predicting the first motion direction of the foreground object, it may also be determined based on the motion direction of the frame preceding the predicted frame relative to the reference frame; for example, the motion direction of the preceding frame relative to the reference frame may be taken as the first motion direction of the foreground object.
- Step S1052 with each starting search point as the center, perform search processing in the search area corresponding to the search range in the predicted frame by using the first search template to obtain the predicted target block corresponding to the reference target block in the predicted frame.
- when step S1052 is implemented, a search can be performed in the search area corresponding to the search range in the predicted frame using the first search template with the starting search point as the center to determine each candidate target block, and each candidate target block is then matched with the reference target block to determine the prediction target block corresponding to the reference target block.
- during matching, constraints based on color features are added, where SADcolor represents the bidirectional motion estimation cost function over the color features, and SADobj represents the bidirectional motion estimation cost function over the foreground object.
- Step S1053 Determine the degree of texture difference between the reference target block and the prediction target block.
- when step S1053 is implemented, the texture features of the reference target block and of the prediction target block can be extracted, and the texture difference degree value between the reference target block and the prediction target block is then determined from these texture features.
- Step S1054 determining whether the texture difference degree is less than a difference threshold.
- when the texture difference degree is less than the preset difference threshold, the texture difference between the reference target block and the prediction target block is small, so the prediction target block is considered to be the correct target block, and the process proceeds to step S1055;
- when the texture difference degree is greater than or equal to the difference threshold, the texture difference between the reference target block and the prediction target block is relatively large, so the prediction target block is considered to be an erroneous target block, and the process proceeds to step S1056.
- Step S1055 Determine the motion vector corresponding to the prediction target block based on the position information of the reference target block and the position information of the prediction target block.
- the motion vector corresponding to the prediction target block can be determined.
- the two vertex coordinates representing the position information can be subtracted; that is, the vertex coordinates of the reference target block are subtracted from the vertex coordinates of the prediction target block to obtain the motion vector corresponding to the prediction target block.
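The vertex-subtraction step above amounts to the following one-liner; the function name is an assumption of this sketch:

```python
def motion_vector(ref_block_vertex, pred_block_vertex):
    """Motion vector of the prediction target block: the upper-left
    vertex of the reference target block subtracted from the upper-left
    vertex of the matched prediction target block."""
    return (pred_block_vertex[0] - ref_block_vertex[0],
            pred_block_vertex[1] - ref_block_vertex[1])

print(motion_vector((100, 100), (112, 96)))  # (12, -4)
```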
- Step S1056 Determine the degree of color difference and the degree of texture difference between each prediction block in the search area corresponding to the search range in the prediction frame and the reference target block.
- the degree of color difference and the degree of texture difference between each prediction block and the reference target block may be sequentially determined from the search area.
- Step S1057 based on the degree of color difference and the degree of texture difference between each prediction block and the reference target block, determine a prediction target block corresponding to the reference target block from each prediction block.
- a prediction block whose color difference degree from the reference target block is smaller than the color threshold and whose texture difference degree is smaller than the difference threshold may be selected from the prediction blocks, and the prediction block with the smallest difference is determined as the prediction target block.
- Step S1058 Determine the motion vector corresponding to the prediction target block based on the position information of the reference target block and the position information of the prediction target block.
- in the embodiments of the present application, the search template is used to search, in the search area of the prediction frame with the starting search point as the center, for the prediction target block that matches the reference target block in the reference frame, and the texture difference degree between the prediction target block and the reference target block is also compared. When the texture difference degree is less than the difference threshold, the correct prediction target block is considered to be matched; when the texture difference degree is greater than or equal to the difference threshold, the correct prediction target block is considered not to be matched, in which case each prediction block in the search area of the prediction frame can be traversed and the correct prediction target block determined from among them, which ensures the correctness of the prediction target block and improves the accuracy of motion estimation.
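A minimal sketch of the matching-and-verification step described above, using SAD for block matching and a horizontal-gradient SAD as a stand-in for the texture difference degree — the texture metric, threshold, and function names are assumptions of this sketch, since the source does not fix them:

```python
import numpy as np

def sad(block_a, block_b):
    # Sum of absolute differences between two equally sized blocks.
    return int(np.abs(block_a.astype(np.int16) - block_b.astype(np.int16)).sum())

def best_match(ref_block, candidates, texture_threshold):
    """Pick the candidate with the lowest SAD, then accept it only if its
    texture difference degree (approximated here by the SAD of horizontal
    gradients) stays below the difference threshold; otherwise return
    None, signalling that the search area must be traversed (step S1056)."""
    best = min(candidates, key=lambda c: sad(ref_block, c))
    grad_ref = np.diff(ref_block.astype(np.int16), axis=1)
    grad_best = np.diff(best.astype(np.int16), axis=1)
    texture_diff = int(np.abs(grad_ref - grad_best).sum())
    return best if texture_diff < texture_threshold else None

ref = np.arange(16, dtype=np.uint8).reshape(4, 4)
candidates = [np.full((4, 4), 50, dtype=np.uint8), ref.copy()]
match = best_match(ref, candidates, texture_threshold=10)
print(match is candidates[1])  # True
```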
- step S1052 can be implemented by the following steps:
- Step S10521 with each initial search point as the center, determine a plurality of first candidate blocks in the search area based on the first search template.
- for description, take as an example a first search template that is a horizontal asymmetric cross template with 6 search points in the horizontal direction and 3 in the vertical direction.
- when step S10521 is implemented, with the initial search point as the center, the 3 prediction blocks above and below the starting search point and the 6 adjacent prediction blocks on its left and right are determined as the first candidate blocks.
- Step S10522 Determine the matching order of the plurality of first candidate blocks based on the predicted motion vector.
- the matching order of the multiple candidate blocks may be determined based on the region in which the predicted motion vector falls, or according to the distance between the predicted motion vector and each candidate block. Following the above example, if the predicted motion vector points horizontally to the left, the 6 candidate blocks on the left side of the starting search point are matched preferentially.
- Step S10523 based on the matching order, perform matching processing between each first candidate block and the reference target block, and determine whether there is a first candidate target block that matches the reference target block in the plurality of first candidate blocks.
- when there is a first candidate target block that matches the reference target block among the plurality of first candidate blocks, the process proceeds to step S10524; when there is no candidate target block that matches the reference target block among the plurality of first candidate blocks, the process proceeds to step S10525.
- each prediction block in the search area in the prediction frame may also be directly traversed, so as to determine the prediction target block.
- Step S10524 Determine the candidate target block as the prediction target block.
- Step S10525 based on the second motion direction, determine the second search template corresponding to each prediction frame.
- the second motion direction is different from the first motion direction. If the first search template determined based on the first motion direction fails to match a prediction target block, the prediction of the first motion direction can be considered wrong; at this time, the second search template can be determined according to the second motion direction, and the search for the prediction target block is performed again.
- Step S10526 taking the initial search point as the center, determining a plurality of second candidate blocks in the search area based on the second search template;
- Step S10527 based on the predicted motion vector, determine the matching order of multiple second candidate blocks
- Step S10528 based on the matching order, perform matching processing on each second candidate block and the reference target block, and determine whether there is a second candidate target block that matches the reference target block in the plurality of second candidate blocks;
- Step S10529 when there is a second candidate target block that matches the reference target block in the plurality of second candidate blocks, determine the second candidate target block as the prediction target block.
- the implementation of steps S10526 to S10529 is similar to that of steps S10521 to S10524.
- in the embodiments of the present application, the search template can be determined from the predicted motion direction of the foreground object, and the matching priority can be determined from the predicted motion vector, after which the candidate blocks are matched with the reference target block. If no prediction target block is matched, the search template can be re-determined based on a motion direction different from the predicted motion direction of the foreground object and the search for the prediction target block repeated, so that the matching speed can be increased and the processing efficiency of motion estimation improved.
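The asymmetric cross template, with its direction-dependent 2:1 ratio of w-axis to h-axis search points, can be sketched as a generator of candidate-point offsets; the function name, parameterization, and exact point counts are assumptions of this sketch:

```python
def asymmetric_cross_offsets(reach, direction="horizontal"):
    """Candidate-point offsets of an asymmetric cross search template.
    For horizontal motion the w-axis holds twice as many search points
    as the h-axis; for vertical motion the ratio is reversed."""
    if direction == "horizontal":
        w_reach, h_reach = reach, reach // 2
    else:  # vertical motion: h-axis twice the w-axis search points
        w_reach, h_reach = reach // 2, reach
    offsets = [(dw, 0) for dw in range(-w_reach, w_reach + 1) if dw]
    offsets += [(0, dh) for dh in range(-h_reach, h_reach + 1) if dh]
    return offsets

horiz = asymmetric_cross_offsets(6, "horizontal")
print(len(horiz))  # 12 w-axis points + 6 h-axis points = 18
```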
- FIG. 5 is a schematic diagram of an implementation flowchart of the video motion estimation method provided by the embodiments of the present application.
- the video motion estimation method includes the following processes:
- Step S501 the terminal starts the image acquisition device based on the received image acquisition instruction.
- the image capturing instruction may be an operation instruction instructing to capture video, and the image capturing instruction may be triggered by an instant messaging application, of course, may also be triggered by an office application, or may be triggered by a short video application.
- Step S502 the terminal acquires a plurality of image frames collected by the image collection device.
- image capturing is performed to obtain a plurality of image frames.
- Step S503 the terminal performs scene division processing on the multiple image frames to obtain multiple image frame sets.
- the terminal may perform scene segmentation on multiple image frames by combining the optical flow field and the geometric scene division method based on scene structure estimation, so as to obtain multiple image frame sets, and each image frame set includes at least one image frame.
- Step S504 the terminal extracts the contour feature and color feature of the foreground object in each image frame in each image frame set.
- Step S505 the terminal determines the vertex coordinates in each foreground image area based on the position information of each foreground image area in each image frame set.
- Step S506 the terminal determines the first maximum value and the first minimum value corresponding to the first dimension from the coordinates of each vertex, and determines the second maximum value and the second minimum value corresponding to the second dimension.
- Step S507 the terminal determines the search range corresponding to the image frame set based on the first minimum value, the first maximum value, the second minimum value and the second maximum value.
- Step S508 the terminal determines the starting search point of each prediction frame in each image frame set.
- Step S509 the terminal performs motion estimation in the search area corresponding to the search range in each prediction frame based on each initial search point, the target block in the reference frame, and the color features of the foreground object, and obtains the motion vector corresponding to the target block.
- Step S510 the terminal performs video encoding based on the motion vector and multiple image frames to obtain an encoded video.
- Step S511 the terminal sends the encoded video to the server.
- the server may be a service server corresponding to an application that triggers an image capture instruction, for example, an instant messaging server, an office application server, or a short video server.
- Step S512 the server performs motion compensation on the encoded video based on the motion vector to obtain each decoded image frame.
- after collecting multiple image frames of a video, the terminal first performs scene division on the multiple image frames to obtain multiple image frame sets, that is, each scene corresponds to one image frame set, each image frame set includes one or more image frames, and the backgrounds of image frames belonging to the same scene are similar; the terminal then extracts the contour features and color features of the foreground objects in each image frame of each image frame set, determines the maximum and minimum coordinates from the vertex coordinates of the foreground-object contours in each image frame set, and thereby determines the search range corresponding to each image frame set; the terminal then determines the starting search point of each prediction frame in each image frame set and, based on each starting search point, the target block in the reference frame and the color features of the foreground object, performs motion estimation in the search area corresponding to the search range in each prediction frame to obtain the motion vector corresponding to the target block. Since the search range is determined based on the coordinates of the contour vertices, the search range can be kept as small as possible while still covering the foreground objects, which reduces the search time and preserves the accuracy of motion estimation.
- the terminal sends the reference frame and the motion vector to the server, which reduces the data bandwidth requirement, reduces the transmission delay, and improves transmission efficiency.
- the embodiments of the present application can be applied to video applications such as video storage applications, instant messaging applications, video playback applications, video calling applications, and live broadcast applications.
- taking the instant messaging application as an example, the instant messaging application runs on the sending side of the call; the video sender obtains the video to be processed (such as a recorded video), searches for the target block corresponding to the reference frame in the search area corresponding to the search range to determine the motion vector, performs video encoding based on the motion vector, and sends the encoded video to the server; the server sends the encoded video to the video receiver, and the video receiver decodes the received encoded video to play the video, thereby improving the efficiency of video transmission;
- taking the video storage application as an example, a video storage application runs on the terminal; the terminal obtains the video to be processed (such as a video recorded in real time), searches in the search area corresponding to the search range for the target block corresponding to the reference frame to determine the motion vector, performs video coding based on the motion vector, and sends the coded video to the server to implement a cloud storage solution, thereby saving storage space.
- FIG. 6 is a schematic flowchart of the implementation of the 3D image block-based video motion estimation method provided by the embodiment of the present application. As shown in FIG. 6 , the process includes:
- Step S601 initializing and defining the video to be processed.
- the video sequence data Vo(f1, f2, ..., fn) can be defined as a three-dimensional cuboid of F*W*H, where F, W and H are, respectively, the number of frames of Vo in the time domain and the frame width and frame height in the spatial domain.
- a three-dimensional cuboid slider P (P ⊆ Vo) is set whose length, width and height are f, w and h respectively, and the initial position point of P in Vo is set as O(0, 0, 0), where the initial position point O is the initial position on the boundary of Vo.
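Under the definitions above, Vo and the slider P can be sketched with NumPy arrays. The (F, W, H) axis order follows the text, and the `slide` helper and the concrete dimensions are illustrative assumptions:

```python
import numpy as np

# The video Vo is modelled as an F*W*H cuboid (frame count, frame width,
# frame height) and the slider P as an f*w*h sub-cuboid positioned at O.
F, W, H = 30, 64, 48          # hypothetical sequence dimensions
Vo = np.zeros((F, W, H), dtype=np.uint8)

def slide(Vo, origin, size):
    """Return the sub-cuboid of Vo covered by the slider P.

    origin: (t0, x0, y0) position of P inside Vo, starting from O(0, 0, 0);
    size:   (f, w, h) side lengths of P, each no larger than Vo's extent.
    """
    t0, x0, y0 = origin
    f, w, h = size
    return Vo[t0:t0 + f, x0:x0 + w, y0:y0 + h]

# P at the initial position point O(0, 0, 0).
P = slide(Vo, (0, 0, 0), (4, 16, 16))
```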
- Step S602 extracting the motion feature of the foreground object based on the optical flow field to obtain the foreground motion contour feature.
- the combination of global optical flow (such as Horn-Schunck optical flow) and graph cut is used for video segmentation.
- the foreground is then processed: the gradient of the optical-flow vectors in the optical flow field can be used to collect the moving contour features of the foreground object.
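The idea of collecting moving contours from the gradient of the optical-flow field can be sketched as follows. This is a simplified stand-in: a synthetic flow field replaces a real Horn-Schunck estimate, and the threshold is an assumed parameter:

```python
import numpy as np

def motion_contour_mask(flow, threshold=0.5):
    """Locate moving-object contours from a dense optical-flow field.

    flow: array of shape (H, W, 2) holding per-pixel (u, v) flow vectors
    (e.g. from a Horn-Schunck estimator). The flow magnitude is smooth
    inside a rigidly moving object and changes sharply at its boundary,
    so a large spatial gradient of the magnitude marks the contour.
    """
    magnitude = np.hypot(flow[..., 0], flow[..., 1])
    gy, gx = np.gradient(magnitude)
    return np.hypot(gx, gy) > threshold

# A synthetic field: a 4x4 block moving right inside a static background.
flow = np.zeros((8, 8, 2))
flow[2:6, 2:6, 0] = 2.0
mask = motion_contour_mask(flow)   # True only along the block boundary
```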
- Step S603 establishing an action model of the video scene, and extracting the foreground area of the sequence frame.
- the video Vo is segmented by combining the optical flow field with the geometric scene segmentation method based on scene structure estimation, the foreground objects and their corresponding color-block color sequence features are extracted, and the extraction and segmentation results together with the color information are used to constrain the image frames.
- the action model of the video scene is established by combining the foreground prior regions, and the prior knowledge of the foreground regions in the continuous image frames is extracted. Then, by iterating the probability density function of consecutive frame pixels to find the extreme value, the pixels of the same type are divided to realize the segmentation of the scene. At the same time, the color sequence information of the segmented tiles is extracted, and the segmentation results are improved based on the estimation and classification of the scene structure.
- Step S604 acquiring a video foreground moving object sequence and a background segmentation sequence.
- a neural network-based method can be used to extract foreground motion information of the video and segment the video scene.
- Step S605 perform motion estimation calculation based on the 3D sequence image frame.
- the value range of the side lengths of the slider P is set by combining the video foreground motion information with the characteristics of the video sequence image frames of Vo in the three directions F, W and H; the current motion vector is then predicted and the position of the starting search point is determined.
- spatial-domain median prediction, upper-layer prediction and origin prediction can be used to predict the current motion vector, so as to determine the best starting search point.
- the initial position point O of the slider P in Vo is set at the center position, in the three directions f, w and h, of the foreground motion area in which the determined starting search point lies.
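The starting-search-point prediction above can be illustrated with a minimal sketch of the spatial median predictor (the most common of the three modes named in the text); the neighbour layout and function names are assumptions for illustration:

```python
def median_prediction(mv_left, mv_top, mv_topright):
    """Spatial median predictor: the component-wise median of the motion
    vectors of the left, top and top-right neighbouring blocks."""
    xs = sorted(v[0] for v in (mv_left, mv_top, mv_topright))
    ys = sorted(v[1] for v in (mv_left, mv_top, mv_topright))
    return xs[1], ys[1]

def starting_search_point(block_pos, predicted_mv):
    """Offset the reference target block's position by the predicted
    motion vector to obtain the starting search point."""
    return block_pos[0] + predicted_mv[0], block_pos[1] + predicted_mv[1]
```

Origin prediction would simply use the zero vector, and upper-layer prediction would reuse the motion vector found for the enclosing (larger) block partition.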
- the motion vector estimation is realized through the improved UMHexagonS search template.
- the bidirectional motion estimation calculation is performed with the edge positions of the cuboid slider P, the segmented scenes and the video color-sequence features as constraints. Taking the starting prediction point as the center, the search for the target macroblock of the current frame is completed within P, using an asymmetric cross search template whose w-axis has twice as many search points as the h-axis. According to the magnitudes of the components of the previous prediction vector on the w-axis and the h-axis, it is determined whether the foreground motion in the slider P is horizontal or vertical. If the foreground motion in the slider P is horizontal, the horizontal asymmetric cross shape of the original UMHexagonS template is used; if the foreground motion in the slider P is vertical, a template whose h-axis has twice as many search points as the w-axis is used.
- different sub-regions are searched preferentially according to the region into which the predicted motion vector falls, as shown in Figure 7: when the predicted motion vector falls into the first quadrant, the sub-region shown in 701 is searched first; when it falls into the second quadrant, the sub-region shown in 702 is searched first; when it falls into the third quadrant, the sub-region shown in 703 is searched first; and when it falls into the fourth quadrant, the sub-region shown in 704 is searched first. This reduces the search time cost and, coupled with the constraint of the video's scene sequence characteristics, improves the precision rate of target macroblock matching.
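The asymmetric cross template and the quadrant-first candidate ordering described above can be sketched as follows. This is a simplified illustration: the function names, the radius parameter and the way the 2:1 point ratio is realized are assumptions, not the patent's exact UMHexagonS implementation:

```python
def asymmetric_cross(radius, horizontal_motion):
    """Candidate offsets of an asymmetric cross search template.

    For horizontal foreground motion the w-axis carries twice as many
    search points as the h-axis (the original UMHexagonS shape); for
    vertical motion the ratio is reversed.
    """
    long_r, short_r = radius, max(radius // 2, 1)
    if horizontal_motion:
        w_r, h_r = long_r, short_r
    else:
        w_r, h_r = short_r, long_r
    points = [(dx, 0) for dx in range(-w_r, w_r + 1) if dx]
    points += [(0, dy) for dy in range(-h_r, h_r + 1) if dy]
    return points

def quadrant_priority(pred_mv, candidates):
    """Order candidate offsets so that the quadrant containing the
    predicted motion vector (sub-regions 701-704 in Fig. 7) comes first."""
    def same_quadrant(c):
        return (c[0] * pred_mv[0] >= 0) and (c[1] * pred_mv[1] >= 0)
    return sorted(candidates, key=lambda c: 0 if same_quadrant(c) else 1)
```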
- Step S606 performing motion estimation optimization based on the energy function.
- the consistency of the scene segmentation map represents the specific texture information of the video image frame, which can effectively distinguish the texture difference between two similar macroblocks, accurately track the motion information of each macroblock in the image frame, and correct the motion vector field of the image.
- the calculation and estimation of the motion vector in step S605 can be constrained by the consistency energy function, and the consistent segmentation scheme can be used to determine whether the color sequence information of each segmentation map is consistent, so as to detect mismatched macroblocks and correct them, improving the accuracy of the motion vector field.
- when determining the motion vector of the macroblock corresponding to consecutive frames, it can also be determined whether the function value of the consistency energy function is smaller than a preset threshold; if so, the macroblock corresponding to the reference frame is considered to have been found, and the motion vector is determined based on the position information of the macroblock in the reference frame and the macroblock found in the current frame.
- since the macroblock-based motion estimation method mainly determines the optimal motion vector of a macroblock by finding the minimum absolute error between the reference frame and the corresponding macroblock of the current frame, the calculation is time-consuming and complex, and, especially for videos with complex scenes, the accuracy of motion estimation is unstable; a fast motion estimation method based on preprocessing of the video content can reduce the motion estimation time, reduce the complexity of motion estimation, and improve the accuracy of the motion vector field. Therefore, the embodiments of the present application, combining the unique structural features of video sequence frames with the advantages and disadvantages of the macroblock-based motion estimation method, propose to preprocess the video sequence image frames as a 3D overall calculation object and compute their motion vector information, so as to achieve more efficient motion estimation and reduce video encoding time while ensuring a certain rate-distortion performance.
- FIG. 8 is a schematic diagram of implementing motion estimation of a 3D image block according to an embodiment of the present application.
- a 3D image set of the series of frames included in a video is taken as an F*W*H three-dimensional calculation object Vo(f1, f2, ..., fn); after preprocessing, Vo → {S1(f1, ..., fi-1), S2(fi, fi+1, ...), ..., SN}. Each scene Si of this set is listed as a search group: Vo is divided into N cuboid blocks according to the number of scenes N, and the slider P traverses from the first cuboid scene S1 to the Nth cuboid SN until the motion estimation of Vo is completed.
- the constraint of the color sequence feature is added, that is, SAD_color is set to represent the bidirectional motion estimation function of the color sequence, SAD_obj represents the bidirectional motion estimation function of the foreground object target sequence, and λ1 and λ2 are the weight factors of the color sequence and the foreground object target sequence respectively; the weight factors can be determined in the preprocessing stage.
- the consistent energy function can distinguish the underlying texture differences of similar blocks, accurately track the motion information of the macroblocks of the image frame, correct the error vector information, and improve the extraction accuracy of the motion vector field.
- the embodiment of the present application proposes an efficient fast video motion estimation method based on 3D image blocks; the distinguishing feature of this method is that preprocessing of the video content is applied to motion estimation, making full use of the foreground motion information of continuous frames and the scene structure characteristics of the background, effectively limiting the search range during the search process and reducing the number of search points, thereby reducing the time cost of motion estimation.
- the continuous sequence of video images to be encoded is regarded as a 3D overall calculation object, and the content of the three-dimensional image block composed of continuous frames is used to constrain the motion vector calculation process, so as to realize fast motion estimation and improve the accuracy of the vector field.
- the method can effectively reduce the complexity of motion estimation while ensuring the coding rate-distortion performance, saving 10% to 30% of the motion estimation time.
- the motion estimation method provided by the embodiments of the present application mainly eliminates temporal redundancy in video sequence frames through inter-frame prediction, can be used for compression coding of video data, improves video transmission efficiency, and can be applied to video conferences, video telephony and the like, to achieve real-time transmission of highly compressed video data under extremely low bit-rate transmission conditions. The method is also suitable for both 2D and stereo video, and maintains good coding rate-distortion performance even for complex videos, such as those with jitter during shooting, low image frame contrast, or continuously and complexly changing motion scenes.
- the video motion estimation apparatus 455 provided by the embodiments of the present application may be implemented as software modules; the software modules of the video motion estimation apparatus 455 stored in the memory 450 may form a video motion estimation apparatus in the terminal 400, including:
- the first acquisition module 4551 is configured to acquire multiple image frames in the video to be processed, perform scene division processing on the multiple image frames, and obtain multiple image frame sets, wherein each image frame set includes at least one image frame; the feature extraction module 4552 is configured to extract the contour features and color features of the foreground objects in the respective image frames in each of the image frame sets; the first determination module 4553 is configured to determine, based on the contour features of the foreground objects in each of the image frame sets, the search range corresponding to each of the image frame sets;
- the second determination module 4554 is configured to determine the starting search point of each predicted frame in each of the image frame sets;
- the motion estimation module 4555 is configured to perform, based on the starting search point of each predicted frame, the target block in the reference frame and the color feature of the foreground object, motion estimation processing in the search area corresponding to the search range in each predicted frame, to obtain the motion vector corresponding to the target block.
- the first obtaining module is further configured to: determine background image regions in the plurality of image frames, and determine the image similarity between the plurality of background image regions; and perform, based on the image similarity between the plurality of background image regions, scene division processing on the multiple image frames to obtain multiple image frame sets.
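Scene division by background-region similarity can be sketched as follows. Histogram intersection is an assumed similarity measure (the patent does not fix one), and the threshold and function names are illustrative:

```python
import numpy as np

def histogram_similarity(bg_a, bg_b, bins=16):
    """Similarity of two background regions via normalized-histogram
    intersection (1.0 = identical intensity distributions)."""
    ha, _ = np.histogram(bg_a, bins=bins, range=(0, 256))
    hb, _ = np.histogram(bg_b, bins=bins, range=(0, 256))
    ha = ha / ha.sum()
    hb = hb / hb.sum()
    return float(np.minimum(ha, hb).sum())

def split_scenes(backgrounds, threshold=0.8):
    """Group consecutive frames whose background similarity stays above
    the threshold; each group is one image frame set (one scene)."""
    sets, current = [], [0]
    for i in range(1, len(backgrounds)):
        if histogram_similarity(backgrounds[i - 1], backgrounds[i]) >= threshold:
            current.append(i)
        else:
            sets.append(current)
            current = [i]
    sets.append(current)
    return sets
```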
- the feature extraction module is further configured to: perform the following processing for any image frame in any of the image frame sets: determine the foreground image area where the foreground object in the image frame is located; take the position information of the foreground image area as the contour feature of the foreground object in the image frame; and perform color extraction processing based on the foreground image area to obtain the color feature of the foreground object in the image frame.
- the first determining module is further configured to: perform the following processing for any of the image frame sets: determine, based on the position information of each foreground image area in the image frame set, the vertex coordinates in each foreground image area; determine, from the vertex coordinates, the first maximum value and the first minimum value corresponding to the first dimension, and the second maximum value and the second minimum value corresponding to the second dimension; and determine, based on the first minimum value, the first maximum value, the second minimum value and the second maximum value, the search range corresponding to the image frame set.
- the second determining module is further configured to: determine the position information of the reference target block in each reference frame in each of the image frame sets, wherein the reference target block is any of the target blocks in the reference frame; predict the motion vector of each prediction frame by using a set prediction mode, to obtain the predicted motion vector of each prediction frame, the prediction mode including at least one of the following: a median prediction mode, an upper-layer block prediction mode and an origin prediction mode; and determine, based on the position information of the reference target block and the predicted motion vector, the starting search point of each prediction frame.
- the motion estimation module is further configured to: perform the following processing for any of the predicted frames: determine a first search template corresponding to the predicted frame; perform, with the starting search point of the predicted frame as the center, search processing in the search area corresponding to the search range in the predicted frame by using the first search template, to obtain the predicted target block corresponding to the reference target block in the predicted frame; determine the degree of texture difference between the reference target block and the prediction target block; and when the degree of texture difference is less than a difference threshold, determine, based on the position information of the reference target block and the position information of the prediction target block, the motion vector corresponding to the prediction target block.
- the motion estimation module is further configured to: when the degree of texture difference is greater than or equal to the difference threshold, determine the degree of color difference and the degree of texture difference between each prediction block in the predicted frame and the reference target block, wherein the prediction block is a target block in the search area corresponding to the search range in the prediction frame; determine, based on the degree of color difference and the degree of texture difference between each prediction block and the reference target block, the prediction target block corresponding to the reference target block from the respective prediction blocks; and determine, based on the position information of the reference target block and the position information of the prediction target block, the motion vector corresponding to the prediction target block.
- the motion estimation module is further configured to: determine, based on the predicted motion vector of the predicted target block of the predicted frame, the first motion direction of the foreground object in the predicted frame; and determine, based on the first motion direction of the foreground object, the first search template corresponding to the predicted frame.
- the motion estimation module is further configured to: determine a plurality of first candidate blocks in the search area based on the first search template; determine the matching order of the plurality of first candidate blocks based on the predicted motion vector; match, based on the matching order, the plurality of first candidate blocks with the reference target block; and when a first candidate target block that successfully matches the reference target block exists among the plurality of first candidate blocks, use the first candidate target block as the prediction target block corresponding to the reference target block in the prediction frame.
- the motion estimation module is further configured to: when no first candidate target block that successfully matches the reference target block exists among the plurality of first candidate blocks, determine, based on a second motion direction, a second search template corresponding to the predicted frame, the second motion direction being different from the first motion direction; determine, with the starting search point as the center and based on the second search template, a plurality of second candidate blocks in the search area; determine the matching order of the plurality of second candidate blocks based on the predicted motion vector; match, based on the matching order, the plurality of second candidate blocks with the reference target block; and when a second candidate target block that successfully matches the reference target block exists among the plurality of second candidate blocks, determine the second candidate target block as the prediction target block.
- the embodiments of the present application provide a storage medium storing executable instructions; when the executable instructions are executed by a processor, they cause the processor to execute the method provided by the embodiments of the present application, for example, the method shown in FIG. 4.
- the storage medium may be a computer-readable storage medium, for example, a Ferromagnetic Random Access Memory (FRAM), a Read Only Memory (ROM), a Programmable Read Only Memory (PROM), an Erasable Programmable Read Only Memory (EPROM), an Electrically Erasable Programmable Read Only Memory (EEPROM), a flash memory, a magnetic surface memory, an optical disc, or a Compact Disc Read-Only Memory (CD-ROM); it may also be any device including one or any combination of the above memories.
- executable instructions may take the form of programs, software, software modules, scripts, or code, written in any form of programming language, including compiled or interpreted languages, or declarative or procedural languages, and may be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment.
- executable instructions may, but need not, correspond to files in a file system, and may be stored as part of a file that holds other programs or data, for example, in one or more scripts in a Hyper Text Markup Language (HTML) document, in a single file dedicated to the program in question, or in multiple cooperating files (e.g., files that store one or more modules, subroutines, or code sections).
- executable instructions may be deployed to be executed on one computing device, on multiple computing devices located at one site, or on multiple computing devices distributed across multiple sites and interconnected by a communication network.
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Image Analysis (AREA)
- Compression Or Coding Systems Of Tv Signals (AREA)
Abstract
Description
Claims (14)
- A video motion estimation method, executed by a video motion estimation device, comprising: acquiring multiple image frames in a video to be processed, and performing scene division processing on the multiple image frames to obtain multiple image frame sets, wherein each of the image frame sets includes at least one image frame; extracting contour features and color features of foreground objects in each image frame in each of the image frame sets; determining, based on the contour features of the foreground objects in each of the image frame sets, a search range corresponding to each of the image frame sets; determining a starting search point of each prediction frame in each of the image frame sets; and performing, based on the starting search point of each prediction frame, a target block in a reference frame and the color features of the foreground objects, motion estimation processing in a search area corresponding to the search range in each prediction frame to obtain a motion vector corresponding to the target block.
- The method according to claim 1, wherein the performing scene division processing on the multiple image frames to obtain multiple image frame sets comprises: determining background image regions in the multiple image frames, and determining image similarities between the multiple background image regions; and performing, based on the image similarities between the multiple background image regions, scene division processing on the multiple image frames to obtain multiple image frame sets.
- The method according to claim 1, wherein the extracting contour features and color features of foreground objects in each image frame in each of the image frame sets comprises: performing the following processing for any image frame in any of the image frame sets: determining a foreground image area where a foreground object in the image frame is located; taking position information of the foreground image area as the contour feature of the foreground object in the image frame; and performing color extraction processing based on the foreground image area to obtain the color feature of the foreground object in the image frame.
- The method according to claim 3, wherein the determining, based on the contour features of the foreground objects in each of the image frame sets, a search range corresponding to each of the image frame sets comprises: performing the following processing for any of the image frame sets: determining, based on the position information of each foreground image area in the image frame set, vertex coordinates in each foreground image area; determining, from the vertex coordinates, a first maximum value and a first minimum value corresponding to a first dimension, and a second maximum value and a second minimum value corresponding to a second dimension; and determining, based on the first minimum value, the first maximum value, the second minimum value and the second maximum value, the search range corresponding to the image frame set.
- The method according to claim 1, wherein the determining a starting search point of each prediction frame in each of the image frame sets comprises: determining position information of a reference target block in each reference frame in each of the image frame sets, wherein the reference target block is any of the target blocks in the reference frame; predicting a motion vector of each prediction frame by using a set prediction mode to obtain a predicted motion vector of each prediction frame, the prediction mode including at least one of the following: a median prediction mode, an upper-layer block prediction mode and an origin prediction mode; and determining, based on the position information of the reference target block and the predicted motion vector, the starting search point of each prediction frame.
- The method according to claim 5, wherein the performing, based on the starting search point of each prediction frame, the target block in the reference frame and the color features of the foreground objects, motion estimation processing in the search area corresponding to the search range in each prediction frame to obtain the motion vector corresponding to the target block comprises: performing the following processing for any of the prediction frames: determining a first search template corresponding to the prediction frame; performing, with the starting search point of the prediction frame as the center, search processing in the search area corresponding to the search range in the prediction frame by using the first search template, to obtain a prediction target block corresponding to the reference target block in the prediction frame; determining a degree of texture difference between the reference target block and the prediction target block; and when the degree of texture difference is less than a difference threshold, determining, based on the position information of the reference target block and position information of the prediction target block, the motion vector corresponding to the prediction target block.
- The method according to claim 6, wherein the method further comprises: when the degree of texture difference is greater than or equal to the difference threshold, determining a degree of color difference and a degree of texture difference between each prediction block in the prediction frame and the reference target block, wherein the prediction block is a target block in the search area corresponding to the search range in the prediction frame; determining, based on the degree of color difference and the degree of texture difference between each prediction block and the reference target block, the prediction target block corresponding to the reference target block from the prediction blocks; and determining, based on the position information of the reference target block and the position information of the prediction target block, the motion vector corresponding to the prediction target block.
- The method according to claim 6, wherein the determining a first search template corresponding to the prediction frame comprises: determining, based on the predicted motion vector of the prediction target block of the prediction frame, a first motion direction of the foreground object in the prediction frame; and determining, based on the first motion direction of the foreground object, the first search template corresponding to the prediction frame.
- The method according to claim 8, wherein the performing search processing in the search area corresponding to the search range in the prediction frame by using the first search template, to obtain the prediction target block corresponding to the reference target block in the prediction frame, comprises: determining multiple first candidate blocks in the search area based on the first search template; determining a matching order of the multiple first candidate blocks based on the predicted motion vector; performing matching processing on the multiple first candidate blocks and the reference target block based on the matching order; and when a first candidate target block that successfully matches the reference target block exists among the multiple first candidate blocks, taking the first candidate target block as the prediction target block corresponding to the reference target block in the prediction frame.
- The method according to claim 9, wherein the method further comprises: when no first candidate target block that successfully matches the reference target block exists among the multiple first candidate blocks, determining, based on a second motion direction, a second search template corresponding to the prediction frame, the second motion direction being different from the first motion direction; determining, with the starting search point as the center and based on the second search template, multiple second candidate blocks in the search area; determining a matching order of the multiple second candidate blocks based on the predicted motion vector; performing matching processing on the multiple second candidate blocks and the reference target block based on the matching order; and when a second candidate target block that successfully matches the reference target block exists among the multiple second candidate blocks, determining the second candidate target block as the prediction target block.
- A video motion estimation apparatus, comprising: a first acquisition module configured to acquire multiple image frames in a video to be processed, and perform scene division processing on the multiple image frames to obtain multiple image frame sets, wherein each of the image frame sets includes at least one image frame; a feature extraction module configured to extract contour features and color features of foreground objects in each image frame in each of the image frame sets; a first determination module configured to determine, based on the contour features of the foreground objects in each of the image frame sets, a search range corresponding to each of the image frame sets; a second determination module configured to determine a starting search point of each prediction frame in each of the image frame sets; and a motion estimation module configured to perform, based on the starting search point of each prediction frame, a target block in a reference frame and the color features of the foreground objects, motion estimation processing in a search area corresponding to the search range in each prediction frame to obtain a motion vector corresponding to the target block.
- A video motion estimation device, comprising: a memory configured to store executable instructions; and a processor configured to implement, when executing the executable instructions stored in the memory, the video motion estimation method according to any one of claims 1 to 10.
- A computer-readable storage medium storing executable instructions which, when executed by a processor, cause the processor to implement the video motion estimation method according to any one of claims 1 to 10.
- A computer program product, comprising a computer program or instructions, wherein the computer program or instructions cause a computer to execute the video motion estimation method according to any one of claims 1 to 10.
Priority Applications (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2023518852A JP2023542397A (ja) | 2020-12-04 | 2021-12-03 | ビデオ動き推定方法、装置、機器、及びコンピュータプログラム |
EP21900100.5A EP4203476A4 (en) | 2020-12-04 | 2021-12-03 | METHOD AND APPARATUS FOR VIDEO MOTION ESTIMATION, APPARATUS, COMPUTER-READABLE STORAGE MEDIUM AND COMPUTER PROGRAM PRODUCT |
KR1020237010460A KR20230058133A (ko) | 2020-12-04 | 2021-12-03 | 비디오 모션 추정 방법 및 장치, 디바이스, 컴퓨터-판독가능 저장 매체, 및 컴퓨터 프로그램 제품 |
US17/963,938 US20230030020A1 (en) | 2020-12-04 | 2022-10-11 | Defining a search range for motion estimation for each scenario frame set |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011401743.1 | 2020-12-04 | ||
CN202011401743.1A CN112203095B (zh) | 2020-12-04 | 2020-12-04 | 视频运动估计方法、装置、设备及计算机可读存储介质 |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US17/963,938 Continuation US20230030020A1 (en) | 2020-12-04 | 2022-10-11 | Defining a search range for motion estimation for each scenario frame set |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2022117076A1 true WO2022117076A1 (zh) | 2022-06-09 |
Family
ID=74033848
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2021/135372 WO2022117076A1 (zh) | 2020-12-04 | 2021-12-03 | 视频运动估计方法、装置、设备、计算机可读存储介质及计算机程序产品 |
Country Status (6)
Country | Link |
---|---|
US (1) | US20230030020A1 (zh) |
EP (1) | EP4203476A4 (zh) |
JP (1) | JP2023542397A (zh) |
KR (1) | KR20230058133A (zh) |
CN (1) | CN112203095B (zh) |
WO (1) | WO2022117076A1 (zh) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116055717A (zh) * | 2023-03-31 | 2023-05-02 | 湖南国科微电子股份有限公司 | 视频压缩方法、装置、计算机设备及计算机可读存储介质 |
Families Citing this family (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112203095B (zh) * | 2020-12-04 | 2021-03-09 | 腾讯科技(深圳)有限公司 | 视频运动估计方法、装置、设备及计算机可读存储介质 |
CN112801032B (zh) * | 2021-02-22 | 2022-01-28 | 西南科技大学 | 一种用于运动目标检测的动态背景匹配方法 |
CN113115038B (zh) * | 2021-04-16 | 2022-03-29 | 维沃移动通信有限公司 | 运动估计方法、装置、电子设备及可读存储介质 |
CN114040203B (zh) * | 2021-11-26 | 2024-07-12 | 京东方科技集团股份有限公司 | 视频数据处理方法、装置、设备和计算机存储介质 |
CN114283356B (zh) * | 2021-12-08 | 2022-11-29 | 上海韦地科技集团有限公司 | 一种移动图像的采集分析系统及方法 |
CN114407024B (zh) * | 2022-03-15 | 2024-04-26 | 上海擎朗智能科技有限公司 | 一种位置引领方法、装置、机器人及存储介质 |
CN115297333B (zh) * | 2022-09-29 | 2023-03-24 | 北京达佳互联信息技术有限公司 | 视频数据的帧间预测方法、装置、电子设备及存储介质 |
CN116737991B (zh) * | 2023-08-11 | 2023-10-20 | 陕西龙朔通信技术有限公司 | 网络视频监控数据处理方法及系统 |
CN117197507B (zh) * | 2023-11-07 | 2024-02-09 | 北京闪马智建科技有限公司 | 图像块的确定方法及装置、存储介质及电子装置 |
CN117857808B (zh) * | 2024-03-06 | 2024-06-04 | 深圳市旭景数字技术有限公司 | 一种基于数据分类压缩的高效视频传输方法及系统 |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102075757A (zh) * | 2011-02-10 | 2011-05-25 | 北京航空航天大学 | 通过边界检测作为运动估计参考的视频前景对象编码方法 |
CN103796028A (zh) * | 2014-02-26 | 2014-05-14 | 北京大学 | 一种视频编码中基于图像信息的运动搜索方法和装置 |
US20190045193A1 (en) * | 2018-06-29 | 2019-02-07 | Intel Corporation | Region-based motion estimation and modeling for accurate region-based motion compensation for efficient video processing or coding |
CN112203095A (zh) * | 2020-12-04 | 2021-01-08 | 腾讯科技(深圳)有限公司 | 视频运动估计方法、装置、设备及计算机可读存储介质 |
Family Cites Families (20)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH11243551A (ja) * | 1997-12-25 | 1999-09-07 | Mitsubishi Electric Corp | Motion compensation device, and moving picture coding device and method |
US20020057739A1 (en) * | 2000-10-19 | 2002-05-16 | Takumi Hasebe | Method and apparatus for encoding video |
KR100491530B1 (ko) * | 2002-05-03 | 2005-05-27 | LG Electronics Inc. | Motion vector determination method |
US7038676B2 (en) * | 2002-06-11 | 2006-05-02 | Sony Computer Entertainment Inc. | System and method for data compression |
KR20070079717A (ko) * | 2006-02-03 | 2007-08-08 | Samsung Electronics Co., Ltd. | Fast motion estimation apparatus and method |
CN101184235B (zh) * | 2007-06-21 | 2010-07-28 | Tencent Technology (Shenzhen) Co., Ltd. | Method and device for extracting a background image from a moving image |
CN101640809B (zh) * | 2009-08-17 | 2010-11-03 | Zhejiang University | Depth extraction method fusing motion information and geometric information |
CN102006475B (zh) * | 2010-11-18 | 2012-12-19 | Wuxi Vimicro Corp. | Video encoding and decoding device and method |
WO2012103332A2 (en) * | 2011-01-28 | 2012-08-02 | Eye IO, LLC | Encoding of video stream based on scene type |
CN102572438B (zh) * | 2012-02-21 | 2014-04-02 | Tsinghua University | Motion prediction method based on image texture and motion features |
CN102801972B (zh) * | 2012-06-25 | 2017-08-29 | Peking University Shenzhen Graduate School | Feature-based motion vector estimation and transfer method |
CN102761765B (zh) * | 2012-07-16 | 2014-08-20 | Tsinghua University | Fast depth frame interpolation method for three-dimensional stereoscopic video |
CN103888767B (zh) * | 2014-03-31 | 2017-07-28 | Shandong University | Frame rate up-conversion method combining UMH block-matching motion estimation with optical flow field motion estimation |
US10057593B2 (en) * | 2014-07-08 | 2018-08-21 | Brain Corporation | Apparatus and methods for distance estimation using stereo imagery |
CN104616497B (zh) * | 2015-01-30 | 2017-03-15 | Jiangnan University | Public transport emergency detection method |
US10506196B2 (en) * | 2017-04-01 | 2019-12-10 | Intel Corporation | 360 neighbor-based quality selector, range adjuster, viewport manager, and motion estimator for graphics |
CN110213591B (zh) * | 2018-03-07 | 2023-02-28 | Tencent Technology (Shenzhen) Co., Ltd. | Video motion estimation method, apparatus, and storage medium |
KR20200016627A (ko) * | 2018-08-07 | 2020-02-17 | Samsung Electronics Co., Ltd. | Ego-motion estimation method and apparatus |
CN110677624B (zh) * | 2019-10-21 | 2020-09-18 | Zhejiang University | Deep learning-based parallel compression method for foreground and background of surveillance video |
CN111754429B (zh) * | 2020-06-16 | 2024-06-11 | Guangdong OPPO Mobile Telecommunications Corp., Ltd. | Motion vector post-processing method and apparatus, electronic device, and storage medium |
- 2020
  - 2020-12-04 CN CN202011401743.1A patent/CN112203095B/zh active Active
- 2021
  - 2021-12-03 JP JP2023518852A patent/JP2023542397A/ja active Pending
  - 2021-12-03 KR KR1020237010460A patent/KR20230058133A/ko unknown
  - 2021-12-03 WO PCT/CN2021/135372 patent/WO2022117076A1/zh unknown
  - 2021-12-03 EP EP21900100.5A patent/EP4203476A4/en active Pending
- 2022
  - 2022-10-11 US US17/963,938 patent/US20230030020A1/en active Pending
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102075757A (zh) * | 2011-02-10 | 2011-05-25 | Beihang University | Video foreground object coding method using boundary detection as a motion estimation reference |
CN103796028A (zh) * | 2014-02-26 | 2014-05-14 | Peking University | Motion search method and device based on image information in video coding |
US20190045193A1 (en) * | 2018-06-29 | 2019-02-07 | Intel Corporation | Region-based motion estimation and modeling for accurate region-based motion compensation for efficient video processing or coding |
CN112203095A (zh) * | 2020-12-04 | 2021-01-08 | Tencent Technology (Shenzhen) Co., Ltd. | Video motion estimation method, apparatus, device, and computer-readable storage medium |
Non-Patent Citations (1)
Title |
---|
See also references of EP4203476A4 |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116055717A (zh) * | 2023-03-31 | 2023-05-02 | Hunan Goke Microelectronics Co., Ltd. | Video compression method, apparatus, computer device, and computer-readable storage medium |
CN116055717B (zh) * | 2023-03-31 | 2023-07-14 | Hunan Goke Microelectronics Co., Ltd. | Video compression method, apparatus, computer device, and computer-readable storage medium |
Also Published As
Publication number | Publication date |
---|---|
JP2023542397A (ja) | 2023-10-06 |
CN112203095B (zh) | 2021-03-09 |
EP4203476A1 (en) | 2023-06-28 |
US20230030020A1 (en) | 2023-02-02 |
EP4203476A4 (en) | 2024-05-22 |
CN112203095A (zh) | 2021-01-08 |
KR20230058133A (ko) | 2023-05-02 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2022117076A1 (zh) | Video motion estimation method, apparatus, device, computer-readable storage medium, and computer program product | |
US11501507B2 (en) | Motion compensation of geometry information | |
CN110610510B (zh) | Target tracking method and apparatus, electronic device, and storage medium | |
WO2018006825A1 (zh) | Video encoding method and device | |
US9179071B2 (en) | Electronic device and image selection method thereof | |
US9153054B2 (en) | Method, apparatus and computer program product for processing of images and compression values | |
US20140101590A1 (en) | Digital image manipulation | |
JP2008518331A (ja) | Video content understanding through real-time video motion analysis | |
WO2017031671A1 (zh) | Motion vector field encoding and decoding methods, and encoding and decoding devices | |
US10062195B2 (en) | Method and device for processing a picture | |
Santamaria et al. | A comparison of block-matching motion estimation algorithms | |
Raj et al. | Feature based video stabilization based on boosted HAAR Cascade and representative point matching algorithm | |
TW201328359A (zh) | Compressed-domain moving object detection method and device | |
Laumer et al. | Moving object detection in the H.264/AVC compressed domain | |
CN110177278B (zh) | Inter-frame prediction method, video encoding method, and apparatus | |
CN113810654A (zh) | Image and video uploading method and apparatus, storage medium, and electronic device | |
KR101178015B1 (ko) | Disparity map generation method | |
Choudhary et al. | Real time video summarization on mobile platform | |
Chai et al. | FPGA-based ROI encoding for HEVC video bitrate reduction | |
CN106575359B (zh) | 视频流的动作帧的检测 | |
Morerio et al. | Optimizing superpixel clustering for real-time egocentric-vision applications | |
KR101220003B1 (ko) | Disparity map generation method | |
CN112598043A (zh) | Co-saliency detection method based on weakly supervised learning | |
Kanchan et al. | Recent trends in 2D to 3D image conversion: algorithm at a glance | |
KR102683700B1 (ko) | Video processing method, apparatus, electronic device, storage medium, and computer program | |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 21900100 Country of ref document: EP Kind code of ref document: A1 |
|
ENP | Entry into the national phase |
Ref document number: 2023518852 Country of ref document: JP Kind code of ref document: A |
|
ENP | Entry into the national phase |
Ref document number: 20237010460 Country of ref document: KR Kind code of ref document: A |
|
ENP | Entry into the national phase |
Ref document number: 2021900100 Country of ref document: EP Effective date: 20230324 |
|
NENP | Non-entry into the national phase |
Ref country code: DE |