CN115914647A - Motion estimation method and device for video image


Publication number
CN115914647A
CN115914647A
Authority
CN
China
Prior art keywords
image
image block
frame
sub
sub image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211403656.9A
Other languages
Chinese (zh)
Inventor
韩晶晶
杨韬育
徐赛杰
李锋
余横
汪佳丽
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Shunjiu Electronic Technology Co ltd
Original Assignee
Shanghai Shunjiu Electronic Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Shunjiu Electronic Technology Co ltd filed Critical Shanghai Shunjiu Electronic Technology Co ltd
Priority to CN202211403656.9A priority Critical patent/CN115914647A/en
Publication of CN115914647A publication Critical patent/CN115914647A/en
Pending legal-status Critical Current

Landscapes

  • Image Analysis (AREA)

Abstract

The application provides a motion estimation method and device for a video image, used to improve the accuracy of motion estimation. The method comprises the following steps: determining a backward motion vector and a first SAD of a first image frame, and a forward motion vector and a second SAD of a second image frame; determining a forward intermediate frame according to the forward motion vector, and a backward intermediate frame according to the backward motion vector; calculating a first similarity between the i-th sub image block in the backward intermediate frame and a sub image block of the first image frame, and a second similarity between the i-th sub image block in the forward intermediate frame and a sub image block of the first image frame; determining a first matching error according to the first similarity and the first SAD corresponding to the first matching pair, and a second matching error according to the second similarity and the second SAD corresponding to the second matching pair; and taking the motion vector corresponding to the smaller of the first matching error and the second matching error as the motion vector of the i-th sub image block in the intermediate frame to be interpolated.

Description

Motion estimation method and device for video image
Technical Field
The present application relates to the field of image processing technologies, and in particular, to a method and an apparatus for motion estimation of a video image.
Background
Frame rate up-conversion improves the video frame rate by estimating the motion state between two adjacent reference frames, generating a new interpolated frame from the two reference frames and the motion state, and inserting it into the original video. Frame rate up-conversion dynamically supplements frames, preserves the continuity and clarity of the picture, and improves the visual effect and viewing experience. In the prior art, video motion estimation generally uses forward motion estimation or backward motion estimation alone. However, when the foreground in a video image moves so that, in the current frame, it covers an area that was background in the previous frame, the motion vectors obtained by forward and backward motion estimation for that background area are inaccurate, which in turn degrades the accuracy of the frame to be interpolated.
Disclosure of Invention
The embodiment of the application provides a method and a device for estimating the motion of a video image, which are used for improving the accuracy of motion estimation.
In a first aspect, an embodiment of the present application provides a method for motion estimation of a video image, including:
acquiring a first image frame and a second image frame, where the first image frame is the image frame immediately preceding the second image frame; the first image frame and the second image frame each comprise N sub image blocks;
determining a backward motion vector and a first sum of absolute differences (SAD) for each sub image block of the first image frame, and a forward motion vector and a second SAD for each sub image block of the second image frame;
determining a backward intermediate frame according to the N sub image blocks in the first image frame and the corresponding backward motion vectors, and determining a forward intermediate frame according to the N sub image blocks in the second image frame and the corresponding forward motion vectors;
determining a first similarity between the ith sub-image block of the backward intermediate frame and the sub-image block in the first image frame in a first matching pair, and a second similarity between the ith sub-image block of the forward intermediate frame and the sub-image block in the first image frame in a second matching pair;
the first matching pair comprises the k1-th sub image block in the first image frame and the k2-th sub image block in the second image frame that correspond to the backward motion vector used to determine the i-th sub image block in the backward intermediate frame; the second matching pair comprises the s1-th sub image block in the second image frame and the s2-th sub image block in the first image frame that correspond to the forward motion vector used to determine the i-th sub image block in the forward intermediate frame;
determining a first matching error according to the first similarity and a first SAD corresponding to the first matching pair, and determining a second matching error according to the second similarity and a second SAD corresponding to the second matching pair;
and taking the motion vector corresponding to the smaller matching error of the first matching error and the second matching error as the motion vector of the ith sub image block in the intermediate frame to be inserted.
Based on this scheme, a forward intermediate frame and a backward intermediate frame are determined through forward and backward motion estimation, a matching error is determined for each by comparing its similarity to the source image together with the corresponding absolute error value, and the better motion vector is then selected from the forward motion vector and the backward motion vector.
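The first and second SADs above are block-matching costs: the sum of absolute differences between a sub image block and its candidate match in the other frame. A minimal sketch in Python (the block contents are illustrative; the patent does not prescribe an implementation):

```python
def sad(block_a, block_b):
    # Sum of absolute differences between two equally sized pixel
    # blocks (lists of rows); a lower SAD means a better match.
    return sum(abs(pa - pb)
               for row_a, row_b in zip(block_a, block_b)
               for pa, pb in zip(row_a, row_b))

# Hypothetical 2x2 luma blocks
ref = [[10, 20], [30, 40]]
cand = [[12, 18], [33, 41]]
print(sad(ref, cand))  # |10-12| + |20-18| + |30-33| + |40-41| = 8
```

In practice the SAD is evaluated over every candidate block in the search range, and the candidate with the smallest SAD defines the motion vector.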
In one possible implementation, determining the first similarity between the i-th sub image block of the backward intermediate frame and the sub image block in the first image frame in the first matching pair includes: acquiring image edge information and image detail information of the i-th sub image block of the backward intermediate frame, and acquiring image edge information and image detail information of the k1-th sub image block in the first image frame in the first matching pair;
determining a first edge similarity between the image edge information of the i-th sub image block of the backward intermediate frame and the image edge information of the k1-th sub image block, and determining a first detail similarity between the image detail information of the i-th sub image block and the image detail information of the k1-th sub image block, where the weighted sum of the first edge similarity and the first detail similarity is the first similarity;
or,
determining the second similarity between the i-th sub image block of the forward intermediate frame and the sub image block in the first image frame in the second matching pair includes:
acquiring image edge information and image detail information of the i-th sub image block of the forward intermediate frame, and acquiring image edge information and image detail information of the s2-th sub image block in the first image frame in the second matching pair;
determining a second edge similarity between the image edge information of the i-th sub image block of the forward intermediate frame and the image edge information of the s2-th sub image block, and determining a second detail similarity between the image detail information of the i-th sub image block and the image detail information of the s2-th sub image block, where the weighted sum of the second edge similarity and the second detail similarity is the second similarity.
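The edge/detail combination described above reduces to a weighted sum; the weights below are illustrative assumptions, not values fixed by the patent:

```python
def combined_similarity(edge_sim, detail_sim, w_edge=0.6, w_detail=0.4):
    # First (or second) similarity as a weighted sum of the edge
    # similarity and the detail similarity of the two blocks.
    # w_edge and w_detail are assumed weights.
    return w_edge * edge_sim + w_detail * detail_sim

print(combined_similarity(0.8, 0.5))  # 0.6*0.8 + 0.4*0.5 = 0.68
```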
In a possible implementation, the method further includes: before determining the first similarity between the i-th sub image block of the backward intermediate frame and the sub image block in the first image frame in the first matching pair and the second similarity between the i-th sub image block of the forward intermediate frame and the sub image block in the first image frame in the second matching pair, determining that a grayscale error between the grayscale map of the i-th sub image block of the backward intermediate frame and the grayscale map of the i-th sub image block of the forward intermediate frame is greater than an error threshold;
determining the first matching error according to the first similarity and the first SAD corresponding to the first matching pair includes:
adjusting the grayscale error to a first value according to the first similarity, and taking the weighted sum of the first value and the first SAD corresponding to the first matching pair as the first matching error;
determining the second matching error according to the second similarity and the second SAD corresponding to the second matching pair includes:
adjusting the grayscale error to a second value according to the second similarity, and taking the weighted sum of the second value and the second SAD corresponding to the second matching pair as the second matching error.
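One way to read the adjustment above: the more similar the intermediate block is to the source block, the less the grayscale error should count against that motion vector. The linear scaling function and the weights below are assumptions for illustration; the patent only specifies that the adjusted value is combined with the SAD in a weighted sum:

```python
def matching_error(gray_error, similarity, sad_value, w_err=0.5, w_sad=0.5):
    # Scale the grayscale error down as the similarity (in [0, 1])
    # grows, then take a weighted sum with the SAD of the matching
    # pair. The linear scaling and the weights are illustrative.
    adjusted = gray_error * (1.0 - similarity)
    return w_err * adjusted + w_sad * sad_value

e_backward = matching_error(40.0, 0.9, 100.0)  # high similarity
e_forward = matching_error(40.0, 0.2, 100.0)   # low similarity
# The motion vector with the smaller error wins (here: backward).
```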
In a possible implementation, when the grayscale error is less than or equal to a set threshold, the first matching error is the first SAD corresponding to the first matching pair, and the second matching error is the second SAD corresponding to the second matching pair; or,
the first matching error is a weighted sum of the gray error and a first SAD corresponding to the first matching pair, and the second matching error is a weighted sum of the gray error and a second SAD corresponding to the second matching pair.
In a possible implementation, acquiring image edge information of a set sub image block, where the set sub image block is the i-th sub image block of the forward intermediate frame, the i-th sub image block of the backward intermediate frame, the k1-th sub image block of the first image frame, or the s2-th sub image block of the first image frame, includes:
determining a first pixel contrast in a horizontal direction and a second pixel contrast in a vertical direction in the set sub-image block; the first pixel contrast is determined according to pixel values of pixels included in a row where a central pixel of the set sub-image block is located, and the second pixel contrast is determined according to pixel values of pixels included in a column where the central pixel is located;
and filtering the set sub image block by taking the sum of the first pixel contrast and the second pixel contrast as a filter coefficient to obtain the edge information of the set sub image block.
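A rough sketch of this edge-information step: the contrast of the centre row and centre column scales a high-pass response over the block. The Laplacian-style kernel is an assumption — the patent only states that the sum of the two contrasts is used as the filter coefficient:

```python
def edge_info(block):
    # block: list of rows of gray values.
    h, w = len(block), len(block[0])
    cy, cx = h // 2, w // 2
    row = block[cy]                         # row of the centre pixel
    col = [block[r][cx] for r in range(h)]  # column of the centre pixel
    contrast_h = max(row) - min(row)        # first pixel contrast
    contrast_v = max(col) - min(col)        # second pixel contrast
    coeff = contrast_h + contrast_v         # filter coefficient
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            lap = (4 * block[y][x] - block[y - 1][x] - block[y + 1][x]
                   - block[y][x - 1] - block[y][x + 1])
            out[y][x] = coeff * lap
    return out
```

A flat block yields zero edge information (both contrasts are zero), while blocks with strong transitions produce large responses.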
In a possible implementation, acquiring image detail information of a set sub image block, where the set sub image block is the i-th sub image block of the forward intermediate frame, the i-th sub image block of the backward intermediate frame, the k1-th sub image block of the first image frame, or the s2-th sub image block of the first image frame, includes:
performing Gaussian filtering on the set sub image block to obtain base-layer image information of the set sub image block, and subtracting the base-layer information from the image information of the set sub image block to obtain the image detail information of the set sub image block.
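A minimal sketch of this detail-extraction step, with a 3x3 average standing in for the Gaussian filter (the kernel choice is an assumption):

```python
def detail_info(block):
    # Smooth the block to get its base ("bottom") layer, then
    # subtract it from the original to isolate the detail layer.
    # A 3x3 average approximates the Gaussian filter here.
    h, w = len(block), len(block[0])
    base = [row[:] for row in block]  # borders keep original values
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            base[y][x] = sum(block[y + dy][x + dx]
                             for dy in (-1, 0, 1)
                             for dx in (-1, 0, 1)) / 9.0
    return [[block[y][x] - base[y][x] for x in range(w)] for y in range(h)]
```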
In one possible implementation manner, the determining a grayscale error between the grayscale map of the i-th sub image block of the backward intermediate frame and the grayscale map of the i-th sub image block of the forward intermediate frame includes:
determining a first gray-level histogram of the grayscale map of the i-th sub image block in the forward intermediate frame and a second gray-level histogram of the grayscale map of the i-th sub image block in the backward intermediate frame; the first histogram represents how the gray values of the pixels of the i-th sub image block in the forward intermediate frame are distributed over M gray levels, and the second histogram represents the corresponding distribution for the i-th sub image block in the backward intermediate frame;
determining the grayscale error according to the first gray-level histogram and the second gray-level histogram, where the grayscale error satisfies the following formula:

E = Σ_{k=1}^{M} a_k · |N_k^F − N_k^B|

where E represents the grayscale error; a_k represents the weight corresponding to the k-th gray level; N_k^F represents the number of pixels of the i-th sub image block in the forward intermediate frame whose gray value falls in the k-th gray level; N_k^B represents the number of pixels of the i-th sub image block in the backward intermediate frame whose gray value falls in the k-th gray level; and M is a positive integer.
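The histogram comparison above can be sketched as follows; the uniform weights a_k = 1 and the 16-level quantization of 8-bit gray values are illustrative assumptions:

```python
def gray_error(block_f, block_b, m=16, weights=None):
    # Build M-level histograms of the 8-bit gray values of the two
    # blocks, then sum the weighted per-level count differences.
    weights = weights or [1.0] * m

    def hist(block):
        counts = [0] * m
        for row in block:
            for p in row:
                counts[min(p * m // 256, m - 1)] += 1
        return counts

    hf, hb = hist(block_f), hist(block_b)
    return sum(w * abs(a - b) for w, a, b in zip(weights, hf, hb))
```

Identical blocks give an error of zero; the error grows as the gray-level distributions of the forward and backward intermediate blocks diverge.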
In a second aspect, an embodiment of the present application provides an apparatus for motion estimation of a video image, including:
the image processing device comprises an acquisition module, a processing module and a display module, wherein the acquisition module is used for acquiring a first image frame and a second image frame, and the first image frame is a last image frame adjacent to the second image frame; the first image frame and the second image frame respectively comprise N sub image blocks;
a determining module, configured to determine a backward motion vector and a first sum of absolute differences (SAD) for each sub image block of the first image frame, and a forward motion vector and a second SAD for each sub image block of the second image frame;
determining a backward intermediate frame according to the N sub image blocks in the first image frame and the corresponding backward motion vectors, and determining a forward intermediate frame according to the N sub image blocks in the second image frame and the corresponding forward motion vectors;
determining a first similarity between an ith sub image block of the backward intermediate frame and a sub image block in a first image frame in a first matching pair, and a second similarity between an ith sub image block of the forward intermediate frame and a sub image block in a first image frame in a second matching pair;
the first matching pair comprises the k1-th sub image block in the first image frame and the k2-th sub image block in the second image frame that correspond to the backward motion vector used to determine the i-th sub image block in the backward intermediate frame; the second matching pair comprises the s1-th sub image block in the second image frame and the s2-th sub image block in the first image frame that correspond to the forward motion vector used to determine the i-th sub image block in the forward intermediate frame;
determining a first matching error according to the first similarity and a first SAD corresponding to the first matching pair, and determining a second matching error according to the second similarity and a second SAD corresponding to the second matching pair;
and taking the motion vector corresponding to the smaller matching error of the first matching error and the second matching error as the motion vector of the ith sub image block in the intermediate frame to be inserted.
In a possible implementation, when determining the first similarity between the i-th sub image block of the backward intermediate frame and the sub image block in the first image frame in the first matching pair, the determining module is specifically configured to: acquire image edge information and image detail information of the i-th sub image block of the backward intermediate frame, and acquire image edge information and image detail information of the k1-th sub image block in the first image frame in the first matching pair;
determine a first edge similarity between the image edge information of the i-th sub image block of the backward intermediate frame and the image edge information of the k1-th sub image block, and determine a first detail similarity between the image detail information of the i-th sub image block and the image detail information of the k1-th sub image block, where the weighted sum of the first edge similarity and the first detail similarity is the first similarity;
or,
the determining module, when determining the second similarity between the i-th sub image block of the forward intermediate frame and the sub image block in the first image frame in the second matching pair, is specifically configured to:
acquire image edge information and image detail information of the i-th sub image block of the forward intermediate frame, and acquire image edge information and image detail information of the s2-th sub image block in the first image frame in the second matching pair;
determine a second edge similarity between the image edge information of the i-th sub image block of the forward intermediate frame and the image edge information of the s2-th sub image block, and determine a second detail similarity between the image detail information of the i-th sub image block and the image detail information of the s2-th sub image block, where the weighted sum of the second edge similarity and the second detail similarity is the second similarity.
In a possible implementation, the determining module is further configured to: before determining the first similarity between the i-th sub image block of the backward intermediate frame and the sub image block in the first image frame in the first matching pair and the second similarity between the i-th sub image block of the forward intermediate frame and the sub image block in the first image frame in the second matching pair, determine that a grayscale error between the grayscale map of the i-th sub image block of the backward intermediate frame and the grayscale map of the i-th sub image block of the forward intermediate frame is greater than an error threshold;
the determining module, when determining the first matching error according to the first similarity and the first SAD corresponding to the first matching pair, is specifically configured to:
adjust the grayscale error to a first value according to the first similarity, and take the weighted sum of the first value and the first SAD corresponding to the first matching pair as the first matching error;
the determining module, when determining the second matching error according to the second similarity and the second SAD corresponding to the second matching pair, is specifically configured to:
adjust the grayscale error to a second value according to the second similarity, and take the weighted sum of the second value and the second SAD corresponding to the second matching pair as the second matching error.
In a possible implementation manner, when the grayscale error is less than or equal to a set threshold, the first matching error is a first SAD corresponding to the first matching pair, and the second matching error is a second SAD corresponding to the second matching pair; or,
the first matching error is a weighted sum of the gray scale error and a first SAD corresponding to the first matching pair, and the second matching error is a weighted sum of the gray scale error and a second SAD corresponding to the second matching pair.
In a possible implementation, when acquiring image edge information of a set sub image block, where the set sub image block is the i-th sub image block of the forward intermediate frame, the i-th sub image block of the backward intermediate frame, the k1-th sub image block of the first image frame, or the s2-th sub image block of the first image frame, the determining module is specifically configured to:
determining a first pixel contrast in a horizontal direction and a second pixel contrast in a vertical direction in the set sub-image block; the first pixel contrast is determined according to pixel values of pixels included in a row where a central pixel of the set sub-image block is located, and the second pixel contrast is determined according to pixel values of pixels included in a column where the central pixel is located;
and filtering the set sub image block by taking the sum of the first pixel contrast and the second pixel contrast as a filter coefficient to obtain the edge information of the set sub image block.
In a possible implementation, when acquiring image detail information of a set sub image block, where the set sub image block is the i-th sub image block of the forward intermediate frame, the i-th sub image block of the backward intermediate frame, the k1-th sub image block of the first image frame, or the s2-th sub image block of the first image frame, the determining module is specifically configured to:
perform Gaussian filtering on the set sub image block to obtain base-layer image information of the set sub image block, and subtract the base-layer information from the image information of the set sub image block to obtain the image detail information of the set sub image block.
In a possible implementation, when determining the grayscale error between the grayscale map of the i-th sub image block of the backward intermediate frame and the grayscale map of the i-th sub image block of the forward intermediate frame, the determining module is specifically configured to:
determine a first gray-level histogram of the grayscale map of the i-th sub image block in the forward intermediate frame and a second gray-level histogram of the grayscale map of the i-th sub image block in the backward intermediate frame, where the first histogram represents how the gray values of the pixels of the i-th sub image block in the forward intermediate frame are distributed over M gray levels, and the second histogram represents the corresponding distribution for the i-th sub image block in the backward intermediate frame;
determining the gray scale error according to the first gray scale histogram and the second gray scale histogram;
the grayscale error satisfies the following formula:

E = Σ_{k=1}^{M} a_k · |N_k^F − N_k^B|

where E represents the grayscale error; a_k represents the weight corresponding to the k-th gray level; N_k^F represents the number of pixels of the i-th sub image block in the forward intermediate frame whose gray value falls in the k-th gray level; N_k^B represents the number of pixels of the i-th sub image block in the backward intermediate frame whose gray value falls in the k-th gray level; and M is a positive integer.
In a third aspect, an embodiment of the present application provides an execution apparatus, including:
a memory for storing program instructions;
and a processor, configured to obtain the program instructions stored in the memory and, according to the obtained program instructions, execute the method described in the first aspect and its different implementations.
In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium, which stores computer instructions that, when executed on a computer, cause the computer to perform the method according to the first aspect and the different implementations of the first aspect.
For technical effects brought by any one implementation manner of the second aspect to the fourth aspect, reference may be made to the technical effects brought by the first aspect and different implementation manners of the first aspect, and details are not described here.
Drawings
In order to illustrate the embodiments of the present application or the technical solutions in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present application, and for those skilled in the art, other drawings can be derived from these drawings without inventive effort.
Fig. 1 is a schematic diagram of a video image according to an embodiment of the present application;
fig. 2 is a schematic view of a usage scenario of a display device according to an embodiment of the present application;
fig. 3 is a block diagram of a configuration of a control device 100 according to an embodiment of the present disclosure;
fig. 4 is a block diagram of a hardware configuration of a display device 200 according to an embodiment of the present disclosure;
fig. 5 is a schematic diagram of a software architecture of a terminal device according to an embodiment of the present application;
FIG. 6A is a schematic diagram of a system architecture according to an embodiment of the present application;
FIG. 6B is a schematic diagram of another system architecture according to an embodiment of the present application;
fig. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present application;
fig. 8 is a flowchart illustrating a method for motion estimation of a video image according to an embodiment of the present disclosure;
fig. 9 is a partial schematic view of sub image blocks in an image frame according to an embodiment of the present disclosure;
fig. 10 is a schematic diagram of motion estimation provided in an embodiment of the present application;
FIG. 11 is a diagram illustrating a first region range in a second image frame according to an embodiment of the present application;
fig. 12 is a schematic diagram illustrating numbers of sub image blocks in a first area range according to an embodiment of the present application;
FIG. 13 is a diagram illustrating motion vectors provided in an embodiment of the present application;
FIG. 14 is a diagram illustrating an embodiment of determining pixel values in a forward intermediate frame;
FIG. 15 is a diagram illustrating bilinear interpolation according to an embodiment of the present disclosure;
fig. 16 is a schematic diagram of a gray-scale histogram according to an embodiment of the present application;
fig. 17 is a schematic diagram of a motion estimation apparatus for video images according to an embodiment of the present application;
fig. 18 is a schematic diagram of an execution device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. The components of the embodiments of the present application, generally described and illustrated in the figures herein, can be arranged and designed in a wide variety of different configurations.
Thus, the following detailed description of the embodiments of the present application, presented in the accompanying drawings, is not intended to limit the scope of the claimed application, but is merely representative of selected embodiments of the application. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present application without making any creative effort, shall fall within the protection scope of the present application.
It is noted that relational terms such as "first" and "second" are used solely to distinguish one entity or action from another and do not necessarily require or imply any actual relationship or order between such entities or actions. The terms "comprises", "comprising", or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element preceded by "comprises a" does not preclude the presence of additional identical elements in the process, method, article, or apparatus that comprises the element.
Frame rate up-conversion improves the video frame rate by estimating the motion state between the front and rear reference frames, generating a new interpolated frame from the two frames and the motion state, and inserting it into the original video. This dynamically supplements frames, preserves the continuity and clarity of the picture, and improves the visual effect and viewing experience. Ideally, the intermediate frames obtained by forward motion estimation and by backward motion estimation are the same, and the interpolation error is small. As shown in fig. 1, assume that white is a stationary background area and the ball moves horizontally to the right, from the dotted outline to the solid outline. In backward motion estimation, the background behind the ball in the previous frame is covered by the foreground in the current frame, so the background block around the ball's previous position cannot find an accurate match in the current frame; the motion estimation of the background area covered by the foreground is therefore inaccurate. Likewise, in forward motion estimation, the background area exposed in the current frame after the ball moves away was foreground in the previous frame, so the best matching block cannot be found there either.
To address this problem, the present application provides a motion estimation method and device for a video image: a forward motion vector and a backward motion vector are calculated between two adjacent video frames, a forward intermediate frame is generated from the forward motion vector, and a backward intermediate frame is generated from the backward motion vector. Then, a first similarity between the i-th sub image block of the backward intermediate frame and the sub image block in the first image frame in the first matching pair, and a second similarity between the i-th sub image block of the forward intermediate frame and the sub image block in the first image frame in the second matching pair, are determined. A first matching error is determined according to the first similarity and the first SAD corresponding to the first matching pair, and a second matching error according to the second similarity and the second SAD corresponding to the second matching pair. The motion vector corresponding to the smaller of the two matching errors is taken as the motion vector of the i-th sub image block in the intermediate frame to be interpolated. In this way, the better motion vector is selected from the forward and backward motion vectors using the image information of the image frames, improving the accuracy of motion estimation.
The motion estimation method for the video image provided by the embodiment of the application can be realized by an execution device. In some embodiments, the performing device may be a terminal device. The terminal device may be a display device having a display function. The display device may include: smart televisions, mobile phones, tablet computers, and the like.
The structure and application scenario of the execution device are described below by taking the execution device as a display device as an example. Fig. 2 is a schematic diagram of a usage scenario of the display device in the embodiment. As shown in fig. 2, the display apparatus 200 may also perform data communication with the server 400, and the user may operate the display apparatus 200 through the smart device 300 or the control device 100. In one possible example, the video image may be transmitted to the display apparatus 200 by the server 400, and the display apparatus 200 performs a motion estimation method of the video image.
In some embodiments, the control apparatus 100 may be a remote controller. Communication between the remote controller and the display device includes infrared protocol communication, bluetooth protocol communication, or other short-distance communication methods, and the remote controller controls the display device 200 wirelessly or by wire. The user may control the display apparatus 200 by inputting user instructions through keys on the remote controller, voice input, control panel input, and the like.
In some embodiments, the smart device 300 may include any of a mobile terminal, a tablet, a computer, a laptop, an AR/VR device, and the like.
In some embodiments, the smart device 300 may also be used to control the display device 200. For example, the display device 200 is controlled using an application program running on the smart device.
In some embodiments, the smart device 300 and the display device 200 may also be used for communication of data.
In some embodiments, the display device 200 may also be controlled in ways other than through the control apparatus 100 and the smart device 300. For example, a module configured inside the display device 200 may directly receive the user's voice instruction, or a voice control apparatus provided outside the display device 200 may receive it.
In some embodiments, the display device 200 is also in data communication with a server 400. The display device 200 may communicate over a Local Area Network (LAN), a Wireless Local Area Network (WLAN), or other networks. The server 400 may provide various contents and interactions to the display apparatus 200. The server 400 may be one cluster or a plurality of clusters, and may include one or more types of servers.
Fig. 3 exemplarily shows a block diagram of a configuration of the control apparatus 100 according to an exemplary embodiment. As shown in fig. 3, the control device 100 includes a controller 110, a communication interface 130, a user input/output interface 140, and a memory. The control apparatus 100 may receive an input operation instruction of a user and convert the operation instruction into an instruction recognizable and responsive by the display device 200 to mediate interaction between the user and the display device 200.
In some embodiments, the communication interface 130 is used for external communication, and includes at least one of a WIFI chip, a bluetooth module, NFC, or an alternative module.
In some embodiments, the user input/output interface 140 includes at least one of a microphone, a touchpad, a sensor, a key, or an alternative module.
The following specifically describes an embodiment by taking the display device 200 as an example. It should be understood that the display apparatus 200 shown in fig. 4 is only one example, and the display apparatus 200 may have more or less components than those shown in fig. 4, may combine two or more components, or may have a different configuration of components. The various components shown in the figures may be implemented in hardware, software, or a combination of hardware and software, including one or more signal processing and/or application specific integrated circuits.
Fig. 4 shows a hardware configuration block diagram of the display apparatus 200 according to an exemplary embodiment.
In some embodiments, the display apparatus 200 includes at least one of a tuner demodulator 210, a communicator 220, a detector 230, an external device interface 240, a controller 250, a display 260, an audio output interface 270, a memory, a power supply, and a user interface 280.
In some embodiments, the controller comprises a central processor, a video processor, an audio processor, a graphics processor, RAM, ROM, and a first to an nth interface for input/output.
In some embodiments, the display 260 includes a display screen component for displaying pictures, and a driving component for driving image display, a component for receiving image signals from the controller output, displaying video content, image content, and menu manipulation interface, and a user manipulation UI interface, etc.
In some embodiments, the display 260 may be at least one of a liquid crystal display, an OLED display, and a projection display, and may also be a projection device and a projection screen.
In some embodiments, the tuner demodulator 210 receives broadcast television signals by wired or wireless reception and demodulates audio/video signals and EPG data signals from among a plurality of wireless or wired broadcast television signals.
In some embodiments, communicator 220 is a component for communicating with external devices or servers according to various communication protocol types. For example: the communicator may include at least one of a Wifi module, a bluetooth module, a wired ethernet module, and other network communication protocol chips or near field communication protocol chips, and an infrared receiver. The display apparatus 200 may establish transmission and reception of control signals and data signals with the control device 100 or the server 400 through the communicator 220.
In some embodiments, the detector 230 is used to collect signals of the external environment or interaction with the outside. For example, the detector 230 includes a light receiver (not shown), a sensor for collecting the intensity of ambient light; alternatively, the detector 230 includes an image collector, such as a camera, which can be used to collect external environment scenes, attributes of the user, or user interaction gestures, or the detector 230 includes a sound collector, such as a microphone, which is used to receive external sounds.
In some embodiments, the external device interface 240 may include, but is not limited to, the following: high Definition Multimedia Interface (HDMI), analog or data high definition component input interface (component), composite video input interface (CVBS), USB input interface (USB), RGB port, and the like. The interface may be a composite input/output interface formed by the plurality of interfaces.
In some embodiments, the controller 250 and the modem 210 may be located in different separate devices, that is, the modem 210 may also be located in an external device of the main device where the controller 250 is located, such as an external set-top box.
In some embodiments, the controller 250 controls the operation of the display device and responds to user operations through various software control programs stored in memory. The controller 250 controls the overall operation of the display apparatus 200. For example: in response to receiving a user command for selecting a UI object to be displayed on the display 260, the controller 250 may perform an operation related to the object selected by the user command.
In some embodiments, the object may be any one of selectable objects, such as a hyperlink, an icon, or other actionable control. The operations related to the selected object are: displaying an operation connected to a hyperlink page, document, image, or the like, or performing an operation of a program corresponding to the icon.
In some embodiments, the controller comprises at least one of a Central Processing Unit (CPU), a video processor, an audio processor, a Graphics Processing Unit (GPU), a RAM (Random Access Memory), a ROM (Read-Only Memory), a first to an nth interface for input/output, a communication Bus, and the like.
The CPU processor is the control center of the display device 200 and includes a system on chip (SoC), as shown in fig. 4, for executing the operating system and the application program instructions stored in the memory, and for running various applications, data, and content according to interactive instructions received from the outside, so as to finally display and play various audio and video content. The CPU processor may include a plurality of processors, for example a main processor and one or more sub-processors.
In some embodiments, a graphics processor for generating various graphical objects, such as: at least one of an icon, an operation menu, and a user input instruction display figure. The graphic processor comprises an arithmetic unit which carries out operation by receiving various interactive instructions input by a user and displays various objects according to display attributes; the system also comprises a renderer for rendering various objects obtained based on the arithmetic unit, wherein the rendered objects are used for being displayed on a display.
In some embodiments, the video processor is configured to receive an external video signal, and perform at least one of video processing such as decompression, decoding, scaling, noise reduction, frame rate conversion, resolution conversion, and image synthesis according to a standard codec protocol of the input signal, so as to obtain a signal that can be directly displayed or played on the display device 200.
In some embodiments, the video processor includes at least one of a demultiplexing module, a video decoding module, an image synthesis module, a frame rate conversion module, a display formatting module, and the like. The demultiplexing module demultiplexes the input audio/video data stream. The video decoding module processes the demultiplexed video signal, including decoding, scaling, and the like. The image synthesis module, such as an image synthesizer, superimposes and mixes the GUI signal generated by the graphics generator from user input with the scaled video image, to generate an image signal for display. The frame rate conversion module converts the frame rate of the input video. The display formatting module converts the received frame-rate-converted video output signal into a signal conforming to the display format, such as an output RGB data signal.
In some embodiments, the audio processor is configured to receive an external audio signal, perform at least one of decompression and decoding, and denoising, digital-to-analog conversion, and amplification processing according to a standard codec protocol of the input signal, and obtain a sound signal that can be played in the speaker.
In some embodiments, a user may enter user commands on a Graphical User Interface (GUI) displayed on display 260, and the user input interface receives the user input commands through the Graphical User Interface (GUI). Alternatively, the user may input the user command by inputting a specific sound or gesture, and the user input interface receives the user input command by recognizing the sound or gesture through the sensor.
In some embodiments, a "user interface" is a media interface for interaction and information exchange between an application or operating system and a user that enables conversion between an internal form of information and a form that is acceptable to the user. A commonly used presentation form of the User Interface is a Graphical User Interface (GUI), which refers to a User Interface related to computer operations and displayed in a graphical manner. It may be an interface element such as an icon, a window, a control, etc. displayed in a display screen of the electronic device, where the control may include at least one of an icon, a button, a menu, a tab, a text box, a dialog box, a status bar, a navigation bar, a Widget, etc. visual interface elements.
In some embodiments, user interface 280 is an interface that may be used to receive control inputs (e.g., physical keys on the body of the display device, or the like).
In some embodiments, the system of the display device may include a Kernel (Kernel), a command parser (shell), a file system, and an application. The kernel, shell, and file system together make up the basic operating system structure that allows users to manage files, run programs, and use the system. After power-on, the kernel is started, kernel space is activated, hardware is abstracted, hardware parameters are initialized, and virtual memory, a scheduler, signals and interprocess communication (IPC) are operated and maintained. And after the kernel is started, loading the Shell and the user application program. The application program is compiled into machine code after being started, and a process is formed.
Referring to fig. 5, in some embodiments, the system is divided into four layers, which are, from top to bottom, an Application (Applications) layer (abbreviated as "Application layer"), an Application Framework (Application Framework) layer (abbreviated as "Framework layer"), an Android runtime (Android runtime) and system library layer (abbreviated as "system runtime library layer"), and a kernel layer.
In some embodiments, at least one application program runs in the application program layer, and the application programs may be windows (windows) programs carried by an operating system, system setting programs, clock programs or the like; or may be an application developed by a third party developer. In particular implementations, the application packages in the application layer are not limited to the above examples.
The framework layer provides an Application Programming Interface (API) and a programming framework for the applications of the application layer. The application framework layer includes a number of predefined functions. The application framework layer acts as a processing center that decides the actions of the applications in the application layer. Through the API interface, an application can access the resources of the system and obtain the services of the system during execution.
As shown in fig. 5, the application framework layer in the embodiment of the present application includes a manager (Managers), a Content Provider (Content Provider), a View system (View system), and the like, where the manager includes at least one of the following modules: an Activity Manager (Activity Manager) is used for interacting with all activities running in the system; the Location Manager (Location Manager) is used for providing the system service or application with the access of the system Location service; a Package Manager (Package Manager) for retrieving various information related to an application Package currently installed on the device; a Notification Manager (Notification Manager) for controlling display and clearing of Notification messages; a Window Manager (Window Manager) is used to manage the icons, windows, toolbars, wallpapers, and desktop components on a user interface.
In some embodiments, the activity manager is used to manage the lifecycle of the various applications as well as general navigation fallback functions, such as controlling exit, opening, and fallback of the applications. The window manager is used to manage all window programs, such as obtaining the size of the display screen, judging whether there is a status bar, locking the screen, capturing the screen, and controlling changes of the display window (for example, shrinking the display window, shaking the display, distorting the display, and the like).
In some embodiments, the system runtime layer provides support for an upper layer, i.e., the framework layer, and when the framework layer is used, the android operating system runs the C/C + + library included in the system runtime layer to implement the functions to be implemented by the framework layer.
In some embodiments, the kernel layer is a layer between hardware and software. As shown in fig. 5, the core layer includes at least one of the following drivers: audio drive, display driver, bluetooth drive, camera drive, WIFI drive, USB drive, HDMI drive, sensor drive (like fingerprint sensor, temperature sensor, pressure sensor etc.) and power management module etc..
In other embodiments, the execution device may be an electronic device implemented by one or more servers, which may be local servers or cloud servers. Referring to fig. 6A, the server 500 may be implemented by a physical server or a virtual server. The motion estimation method for video images provided by the present application may be implemented by a single server or by a server cluster formed by a plurality of servers. In fig. 6A, the server 500 is connected to the terminal device 600 and the display device 200 as an example. The server 500 may perform the motion estimation method of a video image. In some scenarios, the server 500 may also receive a motion estimation task for a video image sent by the terminal device 600, or send a motion estimation result of the video image to the terminal device 600. In other scenarios, the server 500 may further receive a motion estimation task for a video image sent by the display device 200 and perform motion estimation, or display, through the display device 200, the intermediate frame to be inserted that is obtained from the motion estimation result. As shown in fig. 6B, the server 500 is connected to the display apparatus 200 as an example. The server 500 may perform the motion estimation method of a video image. In some scenarios, the server 500 may receive a motion estimation task for a video image sent by the display device 200, perform motion estimation according to the task, and send the intermediate frame to be inserted obtained through motion estimation to the display device 200. The electronic device may also be a personal computer, a handheld or laptop device, or a mobile device (such as a mobile phone, a tablet, or a personal digital assistant).
As an example, referring to fig. 7, an electronic device may include a processor 510 and a communication interface 520. The electronic device may also include a memory 530. Of course, the electronic device may also include other components not shown in fig. 7.
The communication interface 520 is used for communicating with the display device: it receives a motion estimation task for a video image sent by the display device, and sends the display device a motion estimation result of the video image or an intermediate frame to be inserted that is obtained according to the motion estimation result.
In the embodiments of the present application, the processor 510 may be a general purpose processor, a digital signal processor, an application specific integrated circuit, a field programmable gate array or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination thereof, and may implement or perform the methods, steps, and logic blocks disclosed in the embodiments of the present application. The general purpose processor may be a microprocessor or any conventional processor or the like. The steps of a method disclosed in connection with the embodiments of the present application may be directly implemented by a hardware processor, or may be implemented by a combination of hardware and software modules in a processor.
The processor 510 is the control center of the electronic device. It connects the various parts of the electronic device using various interfaces and lines, and performs the various functions of the electronic device and processes data by running or executing software programs and/or modules stored in the memory 530 and calling data stored in the memory 530. Optionally, the processor 510 may include one or more processing units. The processor 510 may be a control component such as a processor, a microprocessor, or a controller, for example a general-purpose Central Processing Unit (CPU), a general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof.
The memory 530 may be used to store software programs and modules, and the processor 510 executes various functional applications and data processing by operating the software programs and modules stored in the memory 530. The memory 530 may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function, and the like; the storage data area may store data created according to a business process, and the like. Memory 530, which is a non-volatile computer-readable storage medium, may be used to store non-volatile software programs, non-volatile computer-executable programs, and modules. The Memory 530 may include at least one type of storage medium, and may include, for example, a flash Memory, a hard disk, a multimedia card, a card-type Memory, a Random Access Memory (RAM), a Static Random Access Memory (SRAM), a Programmable Read Only Memory (PROM), a Read Only Memory (ROM), a charged Erasable Programmable Read Only Memory (EEPROM), a magnetic Memory, a magnetic disk, an optical disk, and the like. The memory 530 is any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto. The memory 530 in the embodiments of the present application may also be a circuit or any other device capable of implementing a storage function for storing program instructions and/or data.
It should be noted that the structures shown in fig. 2 to 7 are only examples, and the embodiments of the present invention do not limit this.
The embodiment of the present application provides a method for estimating motion of a video image, and fig. 8 exemplarily shows a flow of the method for estimating motion of a video image, where the flow may be executed by an execution device, which may be the display device 200 shown in fig. 4, and specifically, the controller 250 in the display device 200 may execute motion estimation. Alternatively, the performing device may be the electronic device shown in fig. 7, and the motion estimation may be specifically performed by the processor 510 in the electronic device. The specific process is as follows:
a first image frame and a second image frame are acquired 801.
In some embodiments, the first image frame and the second image frame are consecutive video frames captured by an acquisition device. The acquisition device can be electronic traffic-enforcement equipment, electronic monitoring equipment, a surveillance camera, a video recorder, a terminal device with a video capture function (such as a laptop, a computer, a mobile phone, or a television), and the like.
Illustratively, after the acquisition device captures the video frames, the server obtains a first image frame and a second image frame from the acquisition device. The first image frame is the previous image frame adjacent to the second image frame.
In some embodiments, the server receives a video file sent by the acquisition device, wherein the video file includes the first image frame and the second image frame. The video file may be an encoded file of a video, so the server decodes the received video file to obtain the first image frame and the second image frame. Encoding the video effectively reduces the file size and makes transmission convenient, which improves the transmission speed of the video and the efficiency of subsequent processing. The encoded code stream data may be acquired in any applicable manner, including but not limited to: the Real Time Streaming Protocol (RTSP), the Open Network Video Interface Forum (ONVIF) standard, or a proprietary protocol.
In some embodiments, after the first image frame and the second image frame are acquired, the first image frame and the second image frame may be respectively divided into a plurality of sub image blocks. As an example, the first image frame and the second image frame may respectively comprise N sub image blocks. For example, the original resolutions of the first image frame and the second image frame are 1440 × 810, the first image frame and the second image frame may be divided into 480 × 270 sub image blocks, respectively, each having a resolution of 3 × 3, as shown in fig. 9.
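The division of a 1440 × 810 frame into 480 × 270 sub image blocks of 3 × 3 pixels described above can be sketched with NumPy; `split_into_blocks` is an illustrative helper, not part of the application.

```python
import numpy as np

def split_into_blocks(frame, block=3):
    """Split an H x W frame into non-overlapping block x block sub image
    blocks; returns shape (H//block, W//block, block, block). Assumes H
    and W are divisible by the block size, as in the 1440 x 810 example."""
    h, w = frame.shape
    return frame.reshape(h // block, block, w // block, block).swapaxes(1, 2)

frame = np.arange(810 * 1440, dtype=np.float32).reshape(810, 1440)
blocks = split_into_blocks(frame)   # a 270 x 480 grid of 3 x 3 blocks
```

The `reshape`/`swapaxes` pair is the standard zero-copy way to tile an image into a grid of equal blocks.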
A backward motion vector and a first sum of absolute differences (SAD) for each sub image block of the first image frame, and a forward motion vector and a second SAD for each sub image block of the second image frame, are determined 802.
In some embodiments, the backward motion vector may be determined as follows. Take the first sub image block in the first image frame as an example, where the first sub image block is any one of the N sub image blocks. A first area range is determined in the second image frame for the first sub image block. The first area range is obtained by taking the second sub image block in the second image frame as the central block and expanding it by a set multiple, where the position of the first sub image block in the first image frame is the same as the position of the second sub image block in the second image frame. A sub image block matching the first sub image block is then determined within the first area range, as shown in fig. 10. As an example, the index position of the first sub image block among the N sub image blocks of the first image frame may be (50, 60), and the first area range in the second image frame may be the sub image blocks whose index positions are in the range (48, 58) to (52, 62), as shown in fig. 11. Further, a sub image block matching the first sub image block may be determined within the first area range. Specifically, the SAD between the first sub image block and each sub image block within the first area range is determined, where the SAD expresses the sum of the absolute values of the pixel gray-level differences between two sub image blocks. As an example, the sub image blocks in the range (48, 58) to (52, 62) are numbered as shown in fig. 12, and the SAD values between the first sub image block and sub image blocks 1 to 25 are 30, 74, 40, 80, 46, 75, 68, 25, 48, 35, 69, 45, 32, 87, 22, 26, 45, 55, 69, 57, 54, 23, 58, 41, and 13, respectively. The sub image block in the first area range with the smallest SAD relative to the first sub image block is taken as the sub image block matching the first sub image block.
A first pixel index position of the center pixel of the matched block in the first image frame and a second pixel index position of the center pixel of the matched block in the second image frame are then determined, and the backward motion vector of the first sub image block is determined according to the first pixel index position and the second pixel index position. Continuing the example, the SAD between the first sub image block and sub image block No. 25 is the smallest, so sub image block No. 25 is determined to be the sub image block matching the first sub image block, and the backward motion vector is determined from the pixel index positions of the center points of the first sub image block and the matched sub image block. For example, with each sub image block of size 3 × 3, the pixel index position of the center point of sub image block No. 25 in the second image frame is (155, 185) and the pixel index position of the center point of the first sub image block is (149, 179). The backward motion vector of the first sub image block, denoted mv, can then be determined from these two index positions as mv = (6, 6), and the first SAD corresponding to the backward motion vector of the first sub image block is 13.
In some embodiments, the forward motion vector may be determined as follows. Take the third sub image block in the second image frame as an example, where the third sub image block is any one of the N sub image blocks of the second image frame. A second area range is determined in the first image frame for the third sub image block. The second area range is the area of the first image frame obtained by taking the fourth sub image block as the center and expanding it by a set multiple, where the position of the fourth sub image block in the first image frame is the same as the position of the third sub image block in the second image frame. A sub image block matching the third sub image block may then be determined within the second area range: the sub image block in the second area range with the smallest SAD relative to the third sub image block is taken as the matching sub image block, as shown in fig. 10. A third pixel index position of the center pixel of the matched block in the first image frame and a fourth pixel index position of the center pixel of the matched block in the second image frame are determined, the forward motion vector of the third sub image block is determined according to the third and fourth pixel index positions, and the SAD between the two matched sub image blocks is taken as the second SAD.
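The backward and forward searches of step 802 follow the same full-search pattern, differing only in which frame serves as the reference. A hedged sketch follows; the function names and the search radius of two block indices (matching the (48, 58)..(52, 62) example) are illustrative assumptions.

```python
import numpy as np

def sad(a, b):
    """Sum of absolute differences of the pixel values of two blocks."""
    return int(np.abs(a.astype(np.int64) - b.astype(np.int64)).sum())

def match_block(cur_frame, ref_frame, row, col, block=3, radius=2):
    """Full search around the co-located block in the reference frame,
    e.g. a block at index (50, 60) searched over indices (48, 58)..(52, 62).

    Returns (motion_vector, best_sad), where the motion vector is the
    pixel displacement between the centers of the matched blocks.
    """
    n_rows = ref_frame.shape[0] // block
    n_cols = ref_frame.shape[1] // block
    cur = cur_frame[row * block:(row + 1) * block,
                    col * block:(col + 1) * block]
    best_mv, best_err = None, None
    for r in range(max(0, row - radius), min(n_rows, row + radius + 1)):
        for c in range(max(0, col - radius), min(n_cols, col + radius + 1)):
            cand = ref_frame[r * block:(r + 1) * block,
                             c * block:(c + 1) * block]
            err = sad(cur, cand)
            if best_err is None or err < best_err:
                best_mv = ((r - row) * block, (c - col) * block)
                best_err = err
    return best_mv, best_err

# Sanity check: a frame shifted by exactly one block (3 pixels) down and
# right is recovered with a zero SAD.
base = np.arange(225, dtype=np.int32).reshape(15, 15)
shifted = np.roll(np.roll(base, 3, axis=0), 3, axis=1)
mv, err = match_block(base, shifted, 2, 2)
```

Note this sketch only considers displacements that are whole block indices; the example in the text (mv = (6, 6) from center points (149, 179) and (155, 185)) is of exactly that form.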
803, a backward intermediate frame is determined from the N sub image blocks in the first image frame and the corresponding backward motion vectors, and a forward intermediate frame is determined from the N sub image blocks in the second image frame and the corresponding forward motion vectors.
In some embodiments, when the intermediate frame to be inserted lies at the midpoint between the first image frame and the second image frame, the pixel index position in the backward intermediate frame of each sub image block of the first image frame may be determined from half of the backward motion vector corresponding to that sub image block, and the backward intermediate frame is then determined from the sub image blocks of the first image frame and their corresponding pixel index positions. Similarly, the pixel index position in the forward intermediate frame of each sub image block of the second image frame can be determined from half of the forward motion vector corresponding to that sub image block, and the forward intermediate frame is determined from the sub image blocks of the second image frame and their corresponding pixel index positions.
In some embodiments, in order to ensure that the motion vectors correspond one-to-one to the pixels of the original image frame and are convenient to look up, the motion vector field may be scaled up to the size of the original image frame by replicating the motion vector of each sub image block. Taking the first image frame as an example, the resolution of the first image frame is 1440 × 810. When the first image frame is divided into 480 × 270 sub image blocks, 480 × 270 backward motion vectors are obtained after motion estimation, and the motion vector of each sub image block corresponds to the 3 × 3 pixel points in that sub image block. Therefore, each backward motion vector can be replicated 3 × 3 times to obtain a vector map of the same size as the first image frame. As shown in fig. 13, the left image shows the backward motion vectors of the sub image blocks, each covering 3 × 3 pixels, with each small block corresponding to one motion vector. After replicating each motion vector 3 × 3 times, a vector map corresponding to the first image frame is obtained; in the illustrated example, the vector map is 6 × 6. In the vector map, each pixel corresponds to one motion vector, and the motion vectors of all pixels within one sub image block are the same; for example, in the diagonally marked portion of fig. 13, all vectors are identical.
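The replication of motion vectors to the original resolution can be sketched with NumPy; the 2 × 2 block grid below is an assumption chosen so that the resulting vector map is 6 × 6, matching the illustrated example:

```python
import numpy as np

# One motion vector per sub image block is expanded to one vector per
# pixel by repeating it 3 x 3 times, so the vector map matches the
# original frame resolution.
block_vectors = np.array([[[1, 0], [0, 2]],
                          [[3, 3], [0, 0]]])          # shape (2, 2, 2)
vector_map = block_vectors.repeat(3, axis=0).repeat(3, axis=1)
# vector_map has shape (6, 6, 2): every pixel inside one 3 x 3 block
# carries the same vector, as in the diagonally marked part of fig. 13
```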
In some embodiments, after the motion vector corresponding to each pixel point is determined, the position and the number of the intermediate frames to be inserted are determined. The number and positions of the intermediate frames to be inserted may be adjusted according to actual situations, which is not specifically limited in this application. For example, when one intermediate frame is to be inserted, its position may be the intermediate position between the first image frame and the second image frame. For another example, when two intermediate frames are to be inserted, their positions may be the one-third and two-thirds positions between the first image frame and the second image frame, respectively. Taking one intermediate frame as an example, the position of each pixel point in the intermediate frame can be determined according to half of the motion vector corresponding to that pixel point.
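The temporal positions of the inserted frames described above can be sketched as follows; the helper name is a hypothetical one introduced for illustration:

```python
def insert_positions(num_frames):
    """Temporal positions of the intermediate frames to be inserted
    between two image frames: one frame sits at the 1/2 position,
    two frames at the 1/3 and 2/3 positions, and so on."""
    return [k / (num_frames + 1) for k in range(1, num_frames + 1)]
```

For one frame this yields [0.5]; for two frames, roughly [0.333, 0.667], matching the one-third and two-thirds positions in the text.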
As an example, the forward motion vector may be denoted (mv_x, mv_y). For a pixel point (i, j) of the second image frame, the position (i', j') of the corresponding pixel point in the forward intermediate frame satisfies the conditions shown in the following formulas:
i' = i + mv_x/2;
j' = j + mv_y/2;
Taking the second image frame as an example, as shown in fig. 14, suppose the position of a pixel in the second image frame is (1245, 567) and the forward motion vector corresponding to that pixel is (4, 4). Moving by half of the motion vector, the position of the corresponding pixel in the forward intermediate frame is (1247, 569), so the pixel value of that pixel of the second image frame is assigned to the pixel at index position (1247, 569) in the forward intermediate frame.
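The half-vector projection in this example can be sketched as follows; integer division is an illustrative choice (sub-pixel positions would instead be handled by interpolation), and the function name is hypothetical:

```python
def project_pixel(i, j, mv_x, mv_y):
    """Project a pixel of the second image frame into the forward
    intermediate frame by half of its forward motion vector (the
    single inserted frame sits at the temporal midpoint)."""
    return i + mv_x // 2, j + mv_y // 2

# the example from the text: pixel (1245, 567) with vector (4, 4)
# lands at (1247, 569) in the forward intermediate frame
```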
In some embodiments, when a value at a sub-pixel position between pixels of a channel needs to be interpolated in the x direction and the y direction, bilinear interpolation may be used, where the interpolation is performed according to the following formula:
f(i+u,j+v)=(1-u)(1-v)f(i,j)+u(1-v)f(i+1,j)+v(1-u)f(i,j+1)+uvf(i+1,j+1);
where u represents the step size in the y direction and v represents the step size in the x direction, as shown in fig. 15.
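The bilinear interpolation formula above can be implemented directly; the function name is a hypothetical one for illustration:

```python
def bilinear(f, i, j, u, v):
    """Bilinear interpolation at the sub-pixel position (i + u, j + v),
    following
      f(i+u, j+v) = (1-u)(1-v) f(i,j) + u(1-v) f(i+1,j)
                    + v(1-u) f(i,j+1) + u v f(i+1,j+1),
    where u is the step size in the y direction (row index i) and v
    the step size in the x direction (column index j)."""
    return ((1 - u) * (1 - v) * f[i][j]
            + u * (1 - v) * f[i + 1][j]
            + v * (1 - u) * f[i][j + 1]
            + u * v * f[i + 1][j + 1])
```

For the 2 × 2 patch [[0, 1], [2, 3]], the center (u = v = 0.5) interpolates to 1.5, the average of the four neighbors.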
In some embodiments, edge protection may be performed during the determination of the motion vectors so that no motion vector points outside the range of the original image.
804, a first similarity between the ith sub image block of the backward intermediate frame and the sub image block in the first image frame in the first matching pair, and a second similarity between the ith sub image block of the forward intermediate frame and the sub image block in the first image frame in the second matching pair, are determined.
The first matching pair comprises the k1th sub image block in the first image frame and the k2th sub image block in the second image frame that correspond to the backward motion vector used for determining the ith sub image block in the backward intermediate frame; the second matching pair comprises the s1th sub image block in the second image frame and the s2th sub image block in the first image frame that correspond to the forward motion vector used for determining the ith sub image block in the forward intermediate frame.
In some embodiments, determining the first similarity between the i-th sub image block of the backward intermediate frame and the sub image block in the first image frame in the first matching pair may be implemented by: acquiring image edge information and image detail information of the ith sub image block of the backward intermediate frame, and acquiring image edge information and image detail information of the (k 1) th sub image block in the first image frame in the first matching pair. Further, a first edge similarity between the image edge information of the ith sub image block and the image edge information of the kth 1 sub image block of the backward intermediate frame may be determined, and a first detail similarity between the image detail information of the ith sub image block and the image detail information of the kth 1 sub image block may be determined, and a weighted sum of the first edge similarity and the first detail similarity may be used as the first similarity.
In some embodiments, determining the second similarity between the ith sub image block of the forward intermediate frame and the sub image block in the first image frame in the second matching pair may be implemented by: acquiring image edge information and image detail information of the ith sub image block of the forward intermediate frame, and acquiring image edge information and image detail information of the s2th sub image block in the first image frame in the second matching pair. Further, a second edge similarity between the image edge information of the ith sub image block of the forward intermediate frame and the image edge information of the s2th sub image block may be determined, and a second detail similarity between the image detail information of the ith sub image block and the image detail information of the s2th sub image block may be determined, and a weighted sum of the second edge similarity and the second detail similarity may be used as the second similarity.
In some embodiments, the set sub image block is the ith sub image block in the forward intermediate frame, or the ith sub image block in the backward intermediate frame, or the k1th sub image block in the first image frame, or the s2th sub image block in the first image frame, and the image edge information of the set sub image block may be obtained as follows. The image edge information is determined through the differences of the gray values at edges; specifically, a first pixel contrast is determined according to the pixel values of the pixels in the row where the central pixel of the set sub image block is located, and a second pixel contrast is determined according to the pixel values of the pixels in the column where the central pixel is located. In some scenarios, the differences between the pixel values of every two adjacent pixels in the row where the central pixel of the set sub image block is located may be computed to obtain a plurality of pixel value differences, which are then summed to obtain the first pixel contrast. Similarly, the differences between the pixel values of every two adjacent pixels in the column where the central pixel of the set sub image block is located may be computed and summed to obtain the second pixel contrast. In other scenarios, the differences between the pixel values of groups of two non-adjacent pixels in the row where the central pixel of the set sub image block is located may be computed to obtain a plurality of pixel value differences, which are then summed to obtain the first pixel contrast.
Similarly, the differences between the pixel values of groups of two non-adjacent pixels in the column where the central pixel of the set sub image block is located may be computed and summed to obtain the second pixel contrast. In still other scenarios, the differences between the pixel values of pairs of pixels symmetric about the central pixel in the row where the central pixel of the set sub image block is located may be computed to obtain a plurality of pixel value differences, which are then summed to obtain the first pixel contrast. Similarly, the differences between the pixel values of pairs of pixels symmetric about the central pixel in the column where the central pixel is located may be computed and summed to obtain the second pixel contrast. For example, the first pixel contrast and the second pixel contrast satisfy the conditions shown in the following formulas:
HF_h = Curve(|Σ_{q=(x−i,y)} I_q − Σ_{q=(x+i,y)} I_q|);
HF_v = Curve(|Σ_{q=(x,y−i)} I_q − Σ_{q=(x,y+i)} I_q|);
where HF_h represents the first pixel contrast, HF_v represents the second pixel contrast, Curve represents a control parameter for adaptive adjustment, and i = 1, 2, 3.
Further, the set sub image block may be filtered using a sum of the first pixel contrast and the second pixel contrast as a filter coefficient to obtain edge information of the set sub image block.
In some embodiments, the filter coefficients satisfy the condition shown in the following equation:
HF = HF_h + HF_v;
where HF_h represents the first pixel contrast, HF_v represents the second pixel contrast, and HF represents the filter coefficient.
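The symmetric-pair pixel contrasts and the filter coefficient HF can be sketched as follows; the identity function standing in for the adaptive parameter Curve, and the 7 × 7 block size, are illustrative assumptions:

```python
def pixel_contrasts(block, curve=lambda d: d):
    """Contrasts of the center row and center column of a square sub
    image block, following
      HF_h = Curve(|sum_i I(x-i, y) - sum_i I(x+i, y)|), i = 1, 2, 3,
    and the analogous HF_v; returns (HF_h, HF_v, HF) where the filter
    coefficient HF = HF_h + HF_v."""
    c = len(block) // 2                       # center pixel index
    row = block[c]
    col = [r[c] for r in block]
    hf_h = curve(abs(sum(row[c - i] for i in (1, 2, 3))
                     - sum(row[c + i] for i in (1, 2, 3))))
    hf_v = curve(abs(sum(col[c - i] for i in (1, 2, 3))
                     - sum(col[c + i] for i in (1, 2, 3))))
    return hf_h, hf_v, hf_h + hf_v
```

For a 7 × 7 block whose rows all read 1…7, the row contrast is |(1+2+3) − (5+6+7)| = 12 while the column contrast is 0, giving HF = 12.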
In some embodiments, the set sub image block is the ith sub image block in the forward intermediate frame, or the ith sub image block in the backward intermediate frame, or the k1th sub image block in the first image frame, or the s2th sub image block in the first image frame, and the image detail information of the set sub image block may be obtained as follows: Gaussian filtering is performed on the set sub image block to obtain the image bottom layer information of the set sub image block, and the image bottom layer information is subtracted from the image information of the set sub image block to obtain the image detail information of the set sub image block. The image detail information satisfies the condition shown in the following formula:
D_tl = I − f(I);
where D_tl represents the image detail information of the set sub image block, I represents the image information of the set sub image block, and f(I) represents the image bottom layer information of the set sub image block.
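The detail extraction D_tl = I − f(I) can be sketched as follows; the fixed 3 × 3 Gaussian kernel and edge padding are illustrative stand-ins for the Gaussian filtering in the text:

```python
import numpy as np

def detail_layer(block):
    """Image detail information D_tl = I - f(I): a Gaussian low-pass
    of the sub image block (its bottom layer information f(I)) is
    subtracted from the block itself."""
    kernel = np.array([[1, 2, 1], [2, 4, 2], [1, 2, 1]]) / 16.0
    padded = np.pad(block.astype(float), 1, mode="edge")
    bottom = np.zeros(block.shape, dtype=float)
    for dy in range(3):
        for dx in range(3):
            bottom += kernel[dy, dx] * padded[dy:dy + block.shape[0],
                                              dx:dx + block.shape[1]]
    return block - bottom
```

A constant block has no detail (all zeros), while an isolated bright pixel keeps 1 − 4/16 = 0.75 of its value in the detail layer.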
805, a first matching error is determined according to the first similarity and the first SAD corresponding to the first matching pair, and a second matching error is determined according to the second similarity and the second SAD corresponding to the second matching pair.
806, taking the motion vector corresponding to the smaller matching error of the first matching error and the second matching error as the motion vector of the ith sub image block in the intermediate frame to be inserted.
Based on the above scheme, the forward intermediate frame and the backward intermediate frame are determined through forward motion estimation and backward motion estimation, the similarities between the intermediate frames and the source image frame are combined with the absolute error values to determine the matching errors, and the better motion vector is then selected from the forward motion vector and the backward motion vector, which improves the accuracy of motion estimation.
In some embodiments, prior to determining the first similarity between the ith sub image block of the backward intermediate frame and the sub image block in the first image frame in the first matching pair and the second similarity between the ith sub image block of the forward intermediate frame and the sub image block in the first image frame in the second matching pair, determining that a grayscale error between the grayscale map of the ith sub image block of the backward intermediate frame and the grayscale map of the ith sub image block of the forward intermediate frame is greater than an error threshold.
In some embodiments, determining the gray scale error between the gray map of the ith sub image block of the backward intermediate frame and the gray map of the ith sub image block of the forward intermediate frame may be implemented as follows. A first gray level histogram of the gray map of the ith sub image block in the forward intermediate frame and a second gray level histogram of the gray map of the ith sub image block in the backward intermediate frame are determined. The first gray level histogram represents the numbers of pixels of the ith sub image block in the forward intermediate frame whose gray values are distributed in each of M gray levels, and the second gray level histogram represents the corresponding numbers for the ith sub image block in the backward intermediate frame, where M is a positive integer. As an example, the gray values 0-255 may be divided evenly into M levels: the range of the first gray level may be represented as [0, 255/M], the range of the second gray level as [255/M + 1, (255/M) × 2], and so on, to determine the range corresponding to each gray level. When M is equal to 5, the first and second gray level histograms are as shown in fig. 16. Further, the gray scale error may be determined from the first and second gray level histograms. The gray scale error satisfies the condition shown in the following formula:
E = Σ_{k=1}^{M} a_k · |N_f^k − N_b^k|;
where E represents the gray scale error, a_k represents the weight corresponding to the kth gray level, N_f^k represents the number of pixels of the ith sub image block in the forward intermediate frame whose gray values are distributed in the kth gray level, N_b^k represents the number of pixels of the ith sub image block in the backward intermediate frame whose gray values are distributed in the kth gray level, and M is a positive integer. As an example, suppose the numbers of pixels whose gray values are distributed in the 5 gray levels are 4, 7, 5, 3, 7 for the gray map of the ith sub image block in the forward intermediate frame, and 3, 6, 7, 3, 4 for the gray map of the ith sub image block in the backward intermediate frame. In this scene, with unit weights, the gray scale error between the two gray maps is E = 1 + 1 + 2 + 0 + 3 = 7.
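The weighted histogram difference can be sketched as follows; the unit default weights and the function name are illustrative assumptions:

```python
def gray_error(hist_forward, hist_backward, weights=None):
    """Gray scale error E = sum_k a_k * |N_f^k - N_b^k| between the
    two M-level gray histograms; the weights a_k default to 1."""
    if weights is None:
        weights = [1] * len(hist_forward)
    return sum(a * abs(f - b)
               for a, f, b in zip(weights, hist_forward, hist_backward))

# the example from the text: per-level counts (4, 7, 5, 3, 7) versus
# (3, 6, 7, 3, 4) give E = 1 + 1 + 2 + 0 + 3 = 7
```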
In some embodiments, when it is determined that a grayscale error between the grayscale image of the i-th sub image block of the backward intermediate frame and the grayscale image of the i-th sub image block of the forward intermediate frame is greater than a set threshold, a first similarity between the i-th sub image block of the backward intermediate frame and the sub image block in the first image frame in the first matching pair and a second similarity between the i-th sub image block of the forward intermediate frame and the sub image block in the first image frame in the second matching pair may be determined.
In some embodiments, when it is determined that the grayscale error between the grayscale map of the ith sub image block of the backward intermediate frame and the grayscale map of the ith sub image block of the forward intermediate frame is greater than the set threshold, and the first similarity and the second similarity have been determined as described above, the first matching error may be determined according to the first similarity and the first SAD corresponding to the first matching pair, and the second matching error may be determined according to the second similarity and the second SAD corresponding to the second matching pair.
Taking the determination of the first matching error according to the first similarity and the first SAD corresponding to the first matching pair as an example, the following steps can be implemented: and adjusting the gray error to be a first value according to the first similarity, and taking the weighted sum of the first value and the first SAD corresponding to the first matching pair as a first matching error. Specifically, when the first similarity is larger, the first value obtained by adjusting the gray scale error is smaller; conversely, when the first similarity is smaller, the first value obtained by adjusting the gray scale error is larger. Determining a second matching error according to the second similarity and the second SAD corresponding to the second matching pair may be implemented as follows: and adjusting the gray error to be a second value according to the second similarity, and taking the weighted sum of the second value and the second SAD corresponding to the second matching pair as a second matching error. Similarly, when the second similarity is larger, the second value obtained by adjusting the gray scale error is smaller, and when the second similarity is smaller, the second value obtained by adjusting the gray scale error is larger.
Further, a weighted sum of the first value and the first SAD may be taken as a first matching error, and a weighted sum of the second value and the second SAD may be taken as a second matching error.
In some embodiments, when the gray scale error is less than or equal to the set threshold, the first matching error is the first SAD corresponding to the first matching pair, and the second matching error is the second SAD corresponding to the second matching pair. In other embodiments, the first matching error is a weighted sum of the gray scale error and the first SAD corresponding to the first matching pair, and the second matching error is a weighted sum of the gray scale error and the second SAD corresponding to the second matching pair.
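Steps 805 and 806 can be sketched together as follows; the threshold, the equal weights, and the (1 − similarity) scaling of the gray error are illustrative assumptions consistent with "larger similarity gives a smaller adjusted value":

```python
def pick_motion_vector(sad_b, sad_f, gray_err, sim_b, sim_f,
                       mv_backward, mv_forward,
                       threshold=8.0, w_err=0.5, w_sad=0.5):
    """Choose between the backward and forward motion vector for one
    sub image block of the intermediate frame to be inserted. When the
    gray error is small the SADs decide directly; otherwise the gray
    error is scaled down by the similarity and combined with the SAD
    as a weighted sum, and the smaller matching error wins."""
    if gray_err <= threshold:
        err_b, err_f = sad_b, sad_f
    else:
        err_b = w_err * gray_err * (1.0 - sim_b) + w_sad * sad_b
        err_f = w_err * gray_err * (1.0 - sim_f) + w_sad * sad_f
    return mv_backward if err_b <= err_f else mv_forward
```

When the gray error is below the threshold the smaller SAD wins; above it, a block with equal SADs but higher forward similarity selects the forward motion vector.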
Based on the same technical concept, the embodiment of the present application provides a motion estimation apparatus 1700 for video images, as shown in fig. 17. The apparatus 1700 may perform any step of the motion estimation method for video images, and is not described herein again to avoid repetition. The apparatus 1700 includes an obtaining module 1701 and a determining module 1702.
An obtaining module 1701, configured to obtain a first image frame and a second image frame, where the first image frame is a previous image frame adjacent to the second image frame; the first image frame and the second image frame respectively comprise N sub image blocks;
a determining module 1702, configured to determine a backward motion vector and a first absolute error value SAD for each sub image block of the first image frame, and a forward motion vector and a second SAD for each sub image block of the second image frame;
determining a backward intermediate frame according to the N sub image blocks in the first image frame and the corresponding backward motion vectors, and determining a forward intermediate frame according to the N sub image blocks in the second image frame and the corresponding forward motion vectors;
determining a first similarity between the ith sub-image block of the backward intermediate frame and the sub-image block in the first image frame in a first matching pair, and a second similarity between the ith sub-image block of the forward intermediate frame and the sub-image block in the first image frame in a second matching pair;
the first matching pair comprises a kth 1 sub image block in the first image frame and a kth 2 sub image block in the second image frame which correspond to a backward motion vector used by an ith sub image block in the backward intermediate frame; the second matching pair comprises an s 1-th sub image block in the second image frame and an s 2-th sub image block in the first image frame which correspond to a forward motion vector used by an i-th sub image block in the forward intermediate frame;
determining a first matching error according to the first similarity and a first SAD corresponding to the first matching pair, and determining a second matching error according to the second similarity and a second SAD corresponding to the second matching pair;
and taking the motion vector corresponding to the smaller matching error in the first matching error and the second matching error as the motion vector of the ith sub image block in the intermediate frame to be inserted.
In some embodiments, the determining module 1702, when determining the first similarity between the i-th sub image block of the backward intermediate frame and the sub image block in the first image frame in the first matching pair, is specifically configured to: acquiring image edge information and image detail information of the ith sub image block of the backward intermediate frame, and acquiring image edge information and image detail information of the kth 1 sub image block in the first image frame of the first matching pair;
determining a first edge similarity between image edge information of an ith sub image block of the backward intermediate frame and image edge information of a kth 1 sub image block, and determining a first detail similarity between image detail information of the ith sub image block and image detail information of the kth 1 sub image block, wherein a weighted sum of the first edge similarity and the first detail similarity is the first similarity;
or,
the determining module 1702, when determining the first similarity between the i-th sub image block of the forward intermediate frame and the sub image block in the first image frame in the second matching pair, is specifically configured to:
acquiring image edge information and image detail information of the ith sub image block of the forward intermediate frame, and acquiring image edge information and image detail information of the s2 th sub image block in the first image frame of the second matching pair;
determining a second edge similarity between the image edge information of the ith sub image block and the image edge information of the s2 th sub image block of the forward intermediate frame, and determining a second detail similarity between the image detail information of the ith sub image block and the image detail information of the s2 nd sub image block, wherein the weighted sum of the second edge similarity and the second detail similarity is the second similarity.
In some embodiments, the determining module 1702 is further configured to: before determining a first similarity between an ith sub image block of the backward intermediate frame and a sub image block in a first image frame in a first matching pair and a second similarity between the ith sub image block of the forward intermediate frame and a sub image block in a first image frame in a second matching pair, determining that a gray error between a gray scale image of the ith sub image block of the backward intermediate frame and a gray scale image of the ith sub image block of the forward intermediate frame is greater than an error threshold;
the determining module 1702, when determining a first matching error according to the first similarity and the first SAD corresponding to the first matching pair, is specifically configured to:
adjusting the gray error to be a first value according to the first similarity, and taking the weighted sum of the first value and the first SAD corresponding to the first matching pair as the first matching error;
the determining module 1702, when determining the second matching error according to the second similarity and the second SAD corresponding to the second matching pair, is specifically configured to:
and adjusting the gray error to be a second value according to the second similarity, and taking the weighted sum of the second value and a second SAD corresponding to the second matching pair as the second matching error.
In some embodiments, when the grayscale error is less than or equal to a set threshold, the first matching error is a first SAD corresponding to the first matching pair, and the second matching error is a second SAD corresponding to the second matching pair; or,
the first matching error is a weighted sum of the gray scale error and a first SAD corresponding to the first matching pair, and the second matching error is a weighted sum of the gray scale error and a second SAD corresponding to the second matching pair.
In some embodiments, when obtaining the image edge information of a set sub image block, where the set sub image block is the ith sub image block of the forward intermediate frame, or the ith sub image block of the backward intermediate frame, or the k1th sub image block of the first image frame, or the s2th sub image block of the first image frame, the determining module 1702 is specifically configured to:
determining a first pixel contrast in a horizontal direction and a second pixel contrast in a vertical direction in the set sub-image block; the first pixel contrast is determined according to pixel values of pixels included in a row where a central pixel of the set sub-image block is located, and the second pixel contrast is determined according to pixel values of pixels included in a column where the central pixel is located;
and filtering the set sub image block by taking the sum of the first pixel contrast and the second pixel contrast as a filter coefficient to obtain the edge information of the set sub image block.
In some embodiments, when obtaining the image detail information of a set sub image block, where the set sub image block is the ith sub image block of the forward intermediate frame, or the ith sub image block of the backward intermediate frame, or the k1th sub image block of the first image frame, or the s2th sub image block of the first image frame, the determining module 1702 is specifically configured to:
and performing Gaussian filtering on the set sub-image blocks to obtain image bottom layer information of the set sub-image blocks, and subtracting the image bottom layer information from the image information of the set sub-image blocks to obtain image detail information of the set sub-image blocks.
In some embodiments, the determining module 1702, when determining the gray scale error between the gray scale map of the i-th sub image block of the backward intermediate frame and the gray scale map of the i-th sub image block of the forward intermediate frame, is specifically configured to:
determining a first gray level histogram of the gray map of the ith sub image block in the forward intermediate frame and a second gray level histogram of the gray map of the ith sub image block in the backward intermediate frame, where the first gray level histogram represents the numbers of pixels of the ith sub image block in the forward intermediate frame whose gray values are distributed in each of M gray levels, and the second gray level histogram represents the corresponding numbers for the ith sub image block in the backward intermediate frame;
determining the gray scale error according to the first gray scale histogram and the second gray scale histogram;
the gray scale error satisfies the condition shown in the following formula:
E = Σ_{k=1}^{M} a_k · |N_f^k − N_b^k|;
where E represents the gray scale error, a_k represents the weight corresponding to the kth gray level, N_f^k represents the number of pixels of the ith sub image block in the forward intermediate frame whose gray values are distributed in the kth gray level, N_b^k represents the number of pixels of the ith sub image block in the backward intermediate frame whose gray values are distributed in the kth gray level, and M is a positive integer.
Based on the same technical concept, the embodiment of the present application provides an execution apparatus 1800, and the apparatus 1800 may implement any step of the motion estimation method for video images discussed above, please refer to fig. 18. The apparatus includes a memory 1801 and a processor 1802.
The memory 1801 is configured to store program instructions;
the processor 1802 is configured to call the program instructions stored in the memory, and execute the motion estimation method of the video image according to the obtained program.
In the embodiments of the present application, the processor 1802 may be a general purpose processor, a digital signal processor, an application specific integrated circuit, a field programmable gate array or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination thereof, that may implement or perform the methods, steps, and logic blocks disclosed in the embodiments of the present application. The general purpose processor may be a microprocessor or any conventional processor or the like. The steps of the method disclosed in connection with the embodiments of the present application may be directly implemented by a hardware processor, or implemented by a combination of hardware and software modules in a processor.
The memory 1801 serves as a non-volatile computer-readable storage medium that may be used to store non-volatile software programs, non-volatile computer-executable programs, and modules. The memory 1801 may include at least one type of storage medium, for example, a flash memory, a hard disk, a multimedia card, a card-type memory, a Random Access Memory (RAM), a Static Random Access Memory (SRAM), a Programmable Read Only Memory (PROM), a Read Only Memory (ROM), an Electrically Erasable Programmable Read Only Memory (EEPROM), a magnetic memory, a magnetic disk, an optical disk, and so on. The memory 1801 may also be any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto. The memory 1801 in the embodiments of the present application may also be circuitry or any other device capable of performing a storage function, and may be used for storing program instructions and/or data.
Based on the same technical concept, an embodiment of the present application provides a computer-readable storage medium, including: computer program code which, when run on a computer, causes the computer to perform the motion estimation method for video images as discussed in the foregoing. Since the principle by which the computer-readable storage medium solves the problem is similar to that of the motion estimation method for video images, the implementation of the computer-readable storage medium can refer to the implementation of the method, and repeated details are not repeated.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.

Claims (10)

1. A method for motion estimation of a video image, comprising:
acquiring a first image frame and a second image frame, wherein the first image frame is the previous image frame adjacent to the second image frame; the first image frame and the second image frame each comprise N sub image blocks;
determining a backward motion vector and a first sum of absolute differences (SAD) of each sub image block of the first image frame, and a forward motion vector and a second SAD of each sub image block of the second image frame;
determining a backward intermediate frame according to the N sub image blocks in the first image frame and the corresponding backward motion vectors, and determining a forward intermediate frame according to the N sub image blocks in the second image frame and the corresponding forward motion vectors;
determining a first similarity between an ith sub image block of the backward intermediate frame and a sub image block in a first image frame in a first matching pair, and a second similarity between an ith sub image block of the forward intermediate frame and a sub image block in a first image frame in a second matching pair;
the first matching pair comprises a k1-th sub image block in the first image frame and a k2-th sub image block in the second image frame which correspond to the backward motion vector used for determining the i-th sub image block in the backward intermediate frame; the second matching pair comprises an s1-th sub image block in the second image frame and an s2-th sub image block in the first image frame which correspond to the forward motion vector used for determining the i-th sub image block in the forward intermediate frame;
determining a first matching error according to the first similarity and a first SAD corresponding to the first matching pair, and determining a second matching error according to the second similarity and a second SAD corresponding to the second matching pair;
and taking the motion vector corresponding to the smaller of the first matching error and the second matching error as the motion vector of the i-th sub image block in the intermediate frame to be inserted.
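The bidirectional selection rule of claim 1 can be sketched as follows. This is a minimal illustration only, not the claimed implementation; the function names and the flattened-block representation are hypothetical:

```python
# Hypothetical sketch of the selection rule in claim 1: for the i-th
# sub image block of the frame to be inserted, keep the motion vector
# whose matching error is smaller. Blocks are flattened pixel lists.

def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(a - b) for a, b in zip(block_a, block_b))

def select_motion_vector(first_error, second_error, backward_mv, forward_mv):
    """Return the motion vector whose matching error is smaller."""
    return backward_mv if first_error <= second_error else forward_mv
```

For example, if the backward candidate has matching error 10 and the forward candidate 20, the backward motion vector is retained for the interpolated block.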
2. The method of claim 1, wherein determining the first similarity between the i-th sub image block of the backward intermediate frame and the sub image block in the first image frame in the first matching pair comprises: acquiring image edge information and image detail information of the i-th sub image block of the backward intermediate frame, and acquiring image edge information and image detail information of the k1-th sub image block in the first image frame of the first matching pair;
determining a first edge similarity between the image edge information of the i-th sub image block of the backward intermediate frame and the image edge information of the k1-th sub image block, and determining a first detail similarity between the image detail information of the i-th sub image block and the image detail information of the k1-th sub image block, wherein a weighted sum of the first edge similarity and the first detail similarity is the first similarity;
or,
determining the second similarity between the i-th sub image block of the forward intermediate frame and the sub image block in the first image frame in the second matching pair comprises:
acquiring image edge information and image detail information of the i-th sub image block of the forward intermediate frame, and acquiring image edge information and image detail information of the s2-th sub image block in the first image frame of the second matching pair;
determining a second edge similarity between the image edge information of the i-th sub image block of the forward intermediate frame and the image edge information of the s2-th sub image block, and determining a second detail similarity between the image detail information of the i-th sub image block and the image detail information of the s2-th sub image block, wherein a weighted sum of the second edge similarity and the second detail similarity is the second similarity.
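The weighted combination in claim 2 can be sketched as follows. The weights and the normalized-correlation similarity measure are illustrative assumptions; the claim does not fix either choice:

```python
# Hypothetical sketch of claim 2: the block similarity is a weighted
# sum of an edge similarity and a detail similarity. The similarity
# measure (normalized correlation) and the weights are assumptions.

def correlation_similarity(x, y):
    """Normalized correlation between two feature vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = (sum(a * a for a in x) * sum(b * b for b in y)) ** 0.5
    return dot / norm if norm else 0.0

def block_similarity(edge_a, edge_b, detail_a, detail_b,
                     w_edge=0.6, w_detail=0.4):
    """Weighted sum of edge similarity and detail similarity."""
    return (w_edge * correlation_similarity(edge_a, edge_b)
            + w_detail * correlation_similarity(detail_a, detail_b))
```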
3. The method of claim 2, wherein the method further comprises:
before determining the first similarity between the i-th sub image block of the backward intermediate frame and the sub image block in the first image frame in the first matching pair, and the second similarity between the i-th sub image block of the forward intermediate frame and the sub image block in the first image frame in the second matching pair, determining that a gray scale error between the gray scale map of the i-th sub image block of the backward intermediate frame and the gray scale map of the i-th sub image block of the forward intermediate frame is larger than an error threshold;
determining a first matching error according to the first similarity and the first SAD corresponding to the first matching pair, including:
adjusting the gray error to be a first value according to the first similarity, and taking the weighted sum of the first value and the first SAD corresponding to the first matching pair as the first matching error;
determining a second matching error according to the second similarity and a second SAD corresponding to the second matching pair, including:
and adjusting the gray error to be a second value according to the second similarity, and taking the weighted sum of the second value and a second SAD corresponding to the second matching pair as the second matching error.
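The error combination of claims 3 and 4 can be sketched as follows. The threshold value, the weights, and the rule used to adjust the gray scale error by the similarity are illustrative assumptions, since the claims leave them open:

```python
# Hypothetical sketch of claims 3-4: when the gray scale error exceeds
# a threshold it is adjusted by the block similarity and combined with
# the SAD as a weighted sum; otherwise the SAD alone is the matching
# error. Threshold, weights, and adjustment rule are assumptions.

def matching_error(gray_error, similarity, sad_value,
                   error_threshold=8.0, w_gray=0.5, w_sad=0.5):
    if gray_error <= error_threshold:
        return sad_value                          # claim 4, first branch
    adjusted = gray_error * (1.0 - similarity)    # assumed adjustment rule
    return w_gray * adjusted + w_sad * sad_value  # claim 3 weighted sum
```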
4. The method of claim 3, wherein when the grayscale error is less than or equal to a set threshold, the first matching error is a first SAD corresponding to the first matching pair, and the second matching error is a second SAD corresponding to the second matching pair; or,
the first matching error is a weighted sum of the gray scale error and a first SAD corresponding to the first matching pair, and the second matching error is a weighted sum of the gray scale error and a second SAD corresponding to the second matching pair.
5. The method according to any one of claims 2-4, wherein obtaining image edge information of a set sub image block, the set sub image block being the i-th sub image block in the forward intermediate frame, or the i-th sub image block in the backward intermediate frame, or the k1-th sub image block of the first image frame, or the s2-th sub image block in the first image frame, comprises:
determining a first pixel contrast in a horizontal direction and a second pixel contrast in a vertical direction in the set sub-image block; the first pixel contrast is determined according to pixel values of pixels included in a row where a central pixel of the set sub-image block is located, and the second pixel contrast is determined according to pixel values of pixels included in a column where the central pixel is located;
and filtering the set sub image block by taking the sum of the first pixel contrast and the second pixel contrast as a filter coefficient to obtain the edge information of the set sub image block.
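The contrast computation of claim 5 can be sketched as follows. The contrast measure (maximum minus minimum over the centre row or column) is an illustrative assumption; the claim only requires that the contrasts be derived from the centre row and centre column:

```python
# Hypothetical sketch of claim 5: pixel contrasts along the centre row
# and centre column of a block; their sum serves as a coefficient for
# the edge-extraction filter. The max-minus-min contrast is an assumption.

def pixel_contrasts(block):
    """block: 2D list of pixel values. Returns (horizontal, vertical)
    contrasts from the centre row and centre column."""
    h, w = len(block), len(block[0])
    centre_row = block[h // 2]
    centre_col = [row[w // 2] for row in block]
    return (max(centre_row) - min(centre_row),
            max(centre_col) - min(centre_col))

def edge_filter_coefficient(block):
    """Sum of the two contrasts, used to scale the edge filter."""
    hor, ver = pixel_contrasts(block)
    return hor + ver
```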
6. The method according to any of claims 2-4, wherein obtaining image detail information of a set sub image block, the set sub image block being the i-th sub image block in the forward intermediate frame, or the i-th sub image block in the backward intermediate frame, or the k1-th sub image block of the first image frame, or the s2-th sub image block in the first image frame, comprises:
and performing Gaussian filtering on the set sub image blocks to obtain image bottom layer information of the set sub image blocks, and subtracting the image bottom layer information from the image information of the set sub image blocks to obtain image detail information of the set sub image blocks.
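The base/detail decomposition of claim 6 can be sketched as follows. A 1D signal and a fixed 3-tap Gaussian kernel are used for brevity; the kernel weights are an illustrative assumption:

```python
# Hypothetical sketch of claim 6: Gaussian filtering yields the image
# bottom-layer information; subtracting it from the original leaves
# the detail information. Kernel weights (0.25, 0.5, 0.25) are assumed.

def gaussian_blur_1d(signal, kernel=(0.25, 0.5, 0.25)):
    """Blur with a 3-tap kernel, replicating edge samples."""
    padded = [signal[0]] + list(signal) + [signal[-1]]
    return [kernel[0] * padded[i] + kernel[1] * padded[i + 1]
            + kernel[2] * padded[i + 2]
            for i in range(len(signal))]

def detail_layer(signal):
    """Detail information = original minus Gaussian base layer."""
    base = gaussian_blur_1d(signal)
    return [s - b for s, b in zip(signal, base)]
```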
7. The method according to claim 3 or 4, wherein said determining the gray-scale error between the gray-scale map of the i-th sub-image block of the backward intermediate frame and the gray-scale map of the i-th sub-image block of the forward intermediate frame comprises:
determining a first gray level histogram of the gray scale map of the i-th sub image block in the forward intermediate frame and a second gray level histogram of the gray scale map of the i-th sub image block in the backward intermediate frame, wherein the first gray level histogram is used for representing the number of pixels at each gray level of the i-th sub image block in the forward intermediate frame, and the second gray level histogram is used for representing the number of pixels at each gray level of the i-th sub image block in the backward intermediate frame;
determining the gray scale error according to the first gray scale histogram and the second gray scale histogram;
the gray scale error satisfies the condition shown in the following formula:

E = Σ_{k=1}^{M} a_k · |N_k^F − N_k^B|

wherein E represents the gray scale error, a_k represents the weight corresponding to the k-th gray level, N_k^F represents the number of pixels of the i-th sub image block in the forward intermediate frame whose gray values are distributed at the k-th gray level, N_k^B represents the number of pixels of the i-th sub image block in the backward intermediate frame whose gray values are distributed at the k-th gray level, and M is a positive integer.
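The histogram comparison of claim 7 can be sketched as follows. Uniform weights a_k = 1 are an illustrative assumption; the claim allows any per-level weights:

```python
# Hypothetical sketch of claim 7: build a gray-level histogram for the
# i-th block of each intermediate frame, then take the weighted sum of
# per-level absolute count differences as the gray scale error.

def gray_histogram(pixels, levels):
    """Count how many pixel gray values fall at each of `levels` gray
    levels (gray values assumed to lie in 0..levels-1)."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    return hist

def gray_error(pixels_fwd, pixels_bwd, levels, weights=None):
    """E = sum_k a_k * |N_k^F - N_k^B| over the two histograms."""
    hf = gray_histogram(pixels_fwd, levels)
    hb = gray_histogram(pixels_bwd, levels)
    if weights is None:
        weights = [1] * levels  # assumed uniform weights a_k = 1
    return sum(a * abs(f - b) for a, f, b in zip(weights, hf, hb))
```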
8. An apparatus for motion estimation of video images, comprising:
an acquisition module, configured to acquire a first image frame and a second image frame, wherein the first image frame is the previous image frame adjacent to the second image frame; the first image frame and the second image frame each comprise N sub image blocks;
a determining module, configured to determine a backward motion vector and a first sum of absolute differences (SAD) of each sub image block of the first image frame, and a forward motion vector and a second SAD of each sub image block of the second image frame;
determining a backward intermediate frame according to the N sub image blocks in the first image frame and the corresponding backward motion vectors, and determining a forward intermediate frame according to the N sub image blocks in the second image frame and the corresponding forward motion vectors;
determining a first similarity between an ith sub image block of the backward intermediate frame and a sub image block in a first image frame in a first matching pair, and a second similarity between an ith sub image block of the forward intermediate frame and a sub image block in a first image frame in a second matching pair;
the first matching pair comprises a k1-th sub image block in the first image frame and a k2-th sub image block in the second image frame which correspond to the backward motion vector used for determining the i-th sub image block in the backward intermediate frame; the second matching pair comprises an s1-th sub image block in the second image frame and an s2-th sub image block in the first image frame which correspond to the forward motion vector used for determining the i-th sub image block in the forward intermediate frame;
determining a first matching error according to the first similarity and a first SAD corresponding to the first matching pair, and determining a second matching error according to the second similarity and a second SAD corresponding to the second matching pair;
and take the motion vector corresponding to the smaller of the first matching error and the second matching error as the motion vector of the i-th sub image block in the intermediate frame to be inserted.
9. An execution device, comprising:
a memory for storing program instructions;
a processor for retrieving the program instructions stored by the memory and executing the method of any one of claims 1-7 in accordance with the retrieved program instructions.
10. A computer-readable storage medium having stored thereon computer instructions which, when executed on a computer, cause the computer to perform the method of any one of claims 1-7.
CN202211403656.9A 2022-11-10 2022-11-10 Motion estimation method and device for video image Pending CN115914647A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211403656.9A CN115914647A (en) 2022-11-10 2022-11-10 Motion estimation method and device for video image


Publications (1)

Publication Number Publication Date
CN115914647A true CN115914647A (en) 2023-04-04

Family

ID=86473730



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination