CN115334228A - Video processing method and related device - Google Patents

Video processing method and related device


Publication number
CN115334228A
Authority
CN
China
Prior art keywords
image
terminal
target
block
images
Prior art date
Legal status
Pending
Application number
CN202110454370.2A
Other languages
Chinese (zh)
Inventor
文锦松
艾金钦
王梓仲
贾彦冰
周蔚
徐培
Current Assignee
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd filed Critical Huawei Technologies Co Ltd
Priority to CN202110454370.2A
Priority to PCT/CN2022/087554 (WO2022228196A1)
Publication of CN115334228A
Legal status: Pending


Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00 Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/60 Control of cameras or camera modules
    • H04N23/70 Circuitry for compensating brightness variation in the scene
    • H04N23/76 Circuitry for compensating brightness variation in the scene by influencing the image signals
    • H04N23/80 Camera processing pipelines; Components thereof
    • H04N23/81 Camera processing pipelines; Components thereof for suppressing or minimising disturbance in the image signal generation

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Studio Devices (AREA)

Abstract

The application discloses a video processing method applied to a terminal. The method comprises the following steps: when the terminal is in a night scene shooting mode, detecting whether a light area exists in an image shot by the terminal; when a light area exists in an image shot by the terminal, determining a first image in an image sequence collected by the terminal, wherein the area adjacent to the light area in the first image is overexposed; determining a second image and a third image in the image sequence according to the first image, wherein the degree of overexposure of the area adjacent to the light area in the second image and the third image is less than that in the first image; generating a target image according to the first image, the second image and the third image; and generating a video based on the target image. With this scheme, a tremor image in the video can be replaced with a normally displayed image, which resolves the abnormal expansion of light areas in the video picture caused by lens shake.

Description

Video processing method and related device
Technical Field
The present application relates to the field of video processing technologies, and in particular, to a video processing method and a related apparatus.
Background
With the development of society, people increasingly use various terminals to shoot videos, in fields ranging from consumer photography to video surveillance. Lens shake when a portable terminal device such as a smartphone shoots a video is one of the biggest obstacles to producing a high-quality video. Because portable terminals are easy to carry, users often shoot video while moving, which makes the video picture unstable. As the user moves, the lens of the portable terminal shakes, so that the light from a light area in the scene is sensed by a larger area of the photosensitive element, and the light area in the video picture appears abnormally expanded.
Currently, a video processing scheme is needed to solve the problem of abnormal expansion of light areas in a video picture caused by lens shake.
Disclosure of Invention
The application provides a video processing method, which can replace a tremor image in a video with a normally displayed image, thereby solving the problem that a light area in a video picture is abnormally expanded due to lens shake.
A first aspect of the present application provides a video processing method, which may be applied to a terminal. The method comprises the following steps: when the terminal is in a night scene shooting mode, whether a light area exists in an image shot by the terminal is detected.
When a light area exists in the image shot by the terminal, the terminal determines a first image in the collected image sequence, in which the area adjacent to the light area is overexposed. Overexposure of the area adjacent to the light area in the first image means that the brightness values of the pixels in that area are too high. The adjacent area is overexposed because the terminal shakes and is displaced while acquiring the first image, so that the photosensitive elements corresponding to the adjacent area also sense the light of the light area, and the area adjacent to the light area in the first image therefore ends up overexposed.
After the first image is determined, the terminal determines a second image and a third image in the image sequence according to the first image. The acquisition time of the first image lies between the acquisition times of the second image and the third image, and the degree of overexposure of the area adjacent to the light area in the second and third images is smaller than that in the first image. The light areas in the second and third images correspond to the light area in the first image, that is, they represent the same picture content. For example, if the light area in the second and third images represents the light-emitting area of a street lamp, the light area in the first image represents the light-emitting area of the same street lamp.
The terminal then generates a target image according to the first image, the second image and the third image. The target image is used to replace the first image, and the degree of overexposure of the area adjacent to its light area is smaller than that of the first image. Because the second image and the third image serve as reference images when generating the target image, the area adjacent to the light area in the generated target image is less overexposed than in the first image.
The terminal then generates a video based on the target image. Specifically, the terminal may replace the first image in the acquired image sequence with the target image, thereby generating the video.
In this embodiment, a tremor image whose area adjacent to the light area is overexposed is located in the night scene shooting mode, the images before and after the tremor image are used as reference images, and a target image that replaces the tremor image is generated from the tremor image and the reference images. With this scheme, the tremor image in the video can be replaced with a normally displayed image, which resolves the abnormal expansion of bright light areas in the video picture caused by lens shake.
In a possible implementation manner, the terminal may detect whether a lighting area exists in the captured image through an image recognition manner or a pre-trained neural network.
Specifically, the terminal may determine whether a light area exists in the image by examining the brightness values of the pixels in the shot image: the brightness values of the pixels inside the light area are all greater than or equal to a specific threshold, and the difference between the brightness value of a pixel in the light area and that of a pixel in the area adjacent to the light area is greater than or equal to a preset difference.
The specific threshold may take a value of 180-230, for example 224; the preset difference may be 25-50, for example 32. For instance, when the terminal detects that the brightness values of the pixels in a certain region of the image are all greater than or equal to 224, and the difference between the brightness values of pixels in that region and pixels in its adjacent region is greater than or equal to 32, the terminal may determine that the region is a light area. The area adjacent to the light area may be formed by the pixels neighbouring the pixels of the light area, that is, the pixels of the adjacent area border the pixels of the light area.
In addition, after the terminal determines that a light area exists in the image, it can also determine the position of the light area in the image according to the coordinates of the pixels located in the light area. The position of the light area may be determined by the coordinates of the pixels on its boundary, that is, the area enclosed by the boundary pixels is the light area. The terminal may record either the coordinates of the boundary pixels or the coordinates of all pixels within the light area.
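To make the detection criterion above concrete, the following Python/NumPy sketch marks a light area with the example thresholds (224 for the pixel brightness, 32 for the difference to the adjacent area). The function name, the 4-neighbour definition of the adjacent area and the per-pixel formulation are illustrative assumptions, not the claimed method itself.

```python
import numpy as np

def detect_light_area(luma, bright_thresh=224, diff_thresh=32):
    """Return (light_mask, adjacent_mask) for a single-channel image.

    light_mask marks the light area: pixels whose brightness is at least
    bright_thresh and whose boundary exceeds the neighbouring non-bright
    pixels by at least diff_thresh. adjacent_mask marks the pixels that
    directly neighbour the light area (the "adjacent area" in the text).
    """
    luma = luma.astype(np.int16)
    bright = luma >= bright_thresh
    big = np.int16(np.iinfo(np.int16).max)

    # Brightness of each pixel's 4-neighbours, ignoring neighbours that are
    # themselves bright (padding repeats the edge so shapes stay aligned).
    pad_l = np.pad(luma, 1, mode="edge")
    pad_b = np.pad(bright, 1, mode="edge")
    darkest_outside = np.full(luma.shape, big)
    for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        n_luma = pad_l[1 + dy:pad_l.shape[0] - 1 + dy, 1 + dx:pad_l.shape[1] - 1 + dx]
        n_bright = pad_b[1 + dy:pad_b.shape[0] - 1 + dy, 1 + dx:pad_b.shape[1] - 1 + dx]
        darkest_outside = np.minimum(darkest_outside, np.where(n_bright, big, n_luma))

    # Bright pixels qualify if they are interior (all neighbours bright) or
    # exceed their darkest non-bright neighbour by diff_thresh.
    interior = darkest_outside == big
    light_mask = bright & (interior | (luma - darkest_outside >= diff_thresh))

    # Adjacent area: non-light pixels touching the light area.
    pad_m = np.pad(light_mask, 1, mode="constant")
    touches = pad_m[:-2, 1:-1] | pad_m[2:, 1:-1] | pad_m[1:-1, :-2] | pad_m[1:-1, 2:]
    adjacent_mask = touches & ~light_mask
    return light_mask, adjacent_mask
```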
Specifically, the degree of overexposure of the area adjacent to the light area in the second image and the third image being smaller than that in the first image may mean that: the area of the overexposed region within the area adjacent to the light area in the second and third images is smaller than the corresponding overexposed area in the first image; or the brightness values of the overexposed pixels in the area adjacent to the light area in the second and third images are smaller than those of the overexposed pixels in the area adjacent to the light area in the first image.
Optionally, the area of the overexposed region adjacent to the light area in the second image and the third image may be 0, that is, the areas adjacent to the light area in the second and third images are not overexposed at all. In general, the terminal undergoes little or no shake displacement when acquiring the second and third images, relative to when it acquires the first image, so the areas adjacent to the light area in these two images are less overexposed than in the first image.
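As a small illustration of the area-based criterion, the sketch below counts the overexposed pixels in the adjacent area; adjacent_mask is assumed to come from a detection step such as the sketch above, and the brightness threshold used to call a pixel overexposed is an assumption.

```python
import numpy as np

def overexposure_degree(luma, adjacent_mask, overexposed_thresh=224):
    """Area-based overexposure degree of the area adjacent to the light area.

    Counts the overexposed pixels inside adjacent_mask (for example the mask
    returned by detect_light_area above); the threshold is an assumption.
    """
    return int(np.count_nonzero(luma[adjacent_mask] >= overexposed_thresh))

# A frame can serve as a reference image relative to the first image when its
# adjacent area is less overexposed (possibly not overexposed at all):
#   overexposure_degree(ref_luma, adj_mask) < overexposure_degree(first_luma, adj_mask)
```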
In one possible implementation, the method further includes: when the illuminance of the ambient light is less than a first preset threshold, the terminal enters the night scene shooting mode. Illustratively, the first preset threshold may take a value of 30-60, for example 50: the terminal detects the illuminance of the ambient light during shooting and enters the night scene shooting mode when the illuminance falls below 50. Alternatively, the terminal enters the night scene shooting mode when it receives an instruction that triggers this mode. For example, while operating the terminal to shoot, the user may issue an instruction to enter the night scene shooting mode by touching the terminal screen; when the terminal receives this instruction, it enters the night scene shooting mode.
In addition, the terminal may determine whether the current shooting scene is a night scene according to the picture content of the preview image and/or the ambient brightness values of the regions of the preview image, and thereby decide whether to enter the night scene shooting mode. For example, when the picture content of the preview image includes the night sky or a night-scene light source, the terminal may determine that the current shooting scene is a night scene and enter the night scene shooting mode; or, when the ambient brightness values in the regions of the preview image match the brightness distribution characteristics of images taken in a night environment, the terminal may determine that the current scene is a night scene and enter the night scene shooting mode.
In a possible implementation, the shake displacement of the terminal when acquiring the first image is greater than a second preset threshold. The second preset threshold may take a value of 300-400, for example 360. The shake displacement at the time the terminal acquires the first image may be called the jitter value of the first image. Specifically, the terminal may acquire sensor data that records its motion while the first image was acquired and calculate the jitter value of the first image from that sensor data. The terminal may also calculate a contrast value of the image and derive the jitter value from it: the smaller the contrast value of the image, the larger its jitter value. In addition, the terminal may determine the jitter value of an image with a pre-trained neural network.
In a possible implementation, the terminal may further determine whether a local light area exists in each of the plurality of images. Only when an image contains a local light area and its jitter value is greater than or equal to the second preset threshold does the terminal determine that the image is a tremor image.
A local light area existing in the first image means that the ratio of the number of target pixels in the first image to the total number of pixels of the first image is within a preset range, the target pixels being the pixels of the light area in the first image. In other words, the ratio of the area of the light area to the total area of the first image lies within the preset range. The preset range may be, for example, 2%-20%, that is, the light area occupies 2%-20% of the first image. For example, the terminal may count the target pixels in each of the plurality of images (the target pixels being the pixels of the light area, as defined above), compute for each image the ratio of that count to the total number of pixels, and check whether the ratio falls within the preset range. If the ratio is within the preset range, the terminal determines that a local light area exists in the image; otherwise, it determines that no local light area exists in the image.
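A minimal sketch of this check, combined with the jitter-value condition from the previous implementation; the mask input and the example threshold of 360 follow the illustrative values given above.

```python
import numpy as np

def has_local_light_area(light_mask, low=0.02, high=0.20):
    """Check whether an image contains a local light area.

    light_mask is the boolean light-area mask of the image; the 2%-20%
    bounds correspond to the example preset range given in the text.
    """
    ratio = np.count_nonzero(light_mask) / light_mask.size
    return low <= ratio <= high

# A frame is treated as a tremor image only if it has a local light area and
# its jitter value reaches the second preset threshold, e.g.:
#   is_tremor = has_local_light_area(mask) and jitter_value >= 360
```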
In a possible implementation, the terminal determines the second image and the third image among the plurality of images according to the first image as follows. The terminal first determines a stable value for each of the plurality of images, the stable value being the reciprocal of the sum of the image's jitter value and the acquisition time difference between that image and the first image. The stable value of an image is therefore negatively correlated with both its jitter value and its acquisition time difference from the first image: the larger the jitter value, the smaller the stable value, and the larger the acquisition time difference, the smaller the stable value. In short, the terminal selects, as the second and third images, images that have small jitter values and are as close in time to the first image as possible.
Then, the terminal determines an image with the largest stable value in the plurality of images as the second image, and determines the third image in the plurality of images according to the acquisition time of the second image, so that the acquisition time of the first image is between the acquisition time of the second image and the acquisition time of the third image.
Specifically, the terminal may determine one or more images in the plurality of images according to the acquisition time of the second image, where the acquisition time of the first image is between the acquisition time of the second image and the acquisition times of those one or more images; the terminal then determines the image with the largest stable value among them as the third image. The second image determined in this way has a larger stable value than the third image, so the second image can be regarded as the primary reference image and the third image as the secondary reference image for generating the target image.
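A minimal sketch of this selection, assuming each frame carries its acquisition time and jitter value in a small record; the data layout and the handling of edge cases (for example a tremor image at the very start or end of the sequence) are assumptions of this sketch.

```python
def select_reference_frames(frames, first_idx):
    """Pick the second (primary) and third (secondary) reference images.

    frames is a list of dicts with keys "time" (acquisition time) and
    "jitter" (jitter value); first_idx indexes the tremor image. The stable
    value is the reciprocal of (jitter + |time difference to the first
    image|), so it shrinks as either term grows.
    """
    t0 = frames[first_idx]["time"]

    def stable_value(i):
        return 1.0 / (frames[i]["jitter"] + abs(frames[i]["time"] - t0))

    candidates = [i for i in range(len(frames)) if i != first_idx]
    second_idx = max(candidates, key=stable_value)

    # The third image lies on the other side of the first image in time, so
    # that the first image's acquisition time falls between the two.
    if frames[second_idx]["time"] < t0:
        other_side = [i for i in candidates if frames[i]["time"] > t0]
    else:
        other_side = [i for i in candidates if frames[i]["time"] < t0]
    third_idx = max(other_side, key=stable_value)
    return second_idx, third_idx
```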
In a possible implementation, the terminal generates the target image by dividing the second image into a plurality of image blocks, determining motion vectors corresponding to these image blocks according to the first image and the third image, and generating the target image from the image blocks and their motion vectors. Determining the motion vectors corresponding to the plurality of image blocks according to the first image and the third image includes the following steps. First, the terminal obtains a plurality of candidate motion vectors corresponding to a first image block, where the plurality of image blocks of the second image include the first image block, and the first image block may be any one of those image blocks. Then, according to the position of the first image block, the terminal determines, for each of the candidate motion vectors, a second image block and a third image block, where the first image comprises the second image block and the third image comprises the third image block.
Next, the terminal determines a target error value corresponding to each candidate motion vector according to the first image block, the second image block and the third image block, where the target error value is obtained from the error value between the first image block and the second image block and the error value between the first image block and the third image block. The error value between two image blocks represents the difference between them: the larger the error value, the larger the difference.
Finally, based on the target error value of each candidate motion vector, the terminal determines the candidate motion vector with the smallest target error value as the motion vector corresponding to the first image block.
In one possible implementation, the plurality of candidate motion vectors includes: one or more preset motion vectors, one or more randomly generated motion vectors, and/or motion vectors corresponding to image blocks adjacent to the first image block.
Because the second image and the third image are acquired at close times, the displacement of objects in the second image relative to the third image is small, so the motion vectors corresponding to the image blocks of the second image can be assumed to lie within a preset range. The terminal may therefore obtain one or more preset motion vectors and use them as candidate motion vectors. In addition, to increase the chance of finding the motion vector that actually corresponds to the first image block, one or more randomly generated motion vectors, or the motion vectors of image blocks adjacent to the first image block, may also be used as candidates, which prevents the candidate set from being too narrow.
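The candidate search described above can be sketched as follows. The naming follows the text (the "first image block" belongs to the second image, the "second image block" to the first image, the "third image block" to the third image); how a candidate vector maps the block into the first and third images, and the use of the sum of absolute differences as the error value, are assumptions made for this illustration.

```python
import numpy as np

def block_error(a, b):
    """Error value between two image blocks (sum of absolute differences)."""
    return float(np.abs(a.astype(np.int32) - b.astype(np.int32)).sum())

def crop(img, y, x, size):
    """Return the size x size block at (y, x), or None if it leaves the image."""
    h, w = img.shape
    if 0 <= y and 0 <= x and y + size <= h and x + size <= w:
        return img[y:y + size, x:x + size]
    return None

def best_motion_vector(second, first, third, y, x, size, candidates):
    """Choose the candidate motion vector with the smallest target error value.

    The first image block is the block of the second image at (y, x); the
    candidate vector (dy, dx) selects the second image block in the first
    image and the third image block in the third image. Mapping the vector
    as +v into the first image and +2v into the third image is an assumption.
    """
    first_block = second[y:y + size, x:x + size]
    best, best_err = None, float("inf")
    for dy, dx in candidates:  # presets, random vectors, neighbours' vectors
        second_block = crop(first, y + dy, x + dx, size)
        third_block = crop(third, y + 2 * dy, x + 2 * dx, size)
        if second_block is None or third_block is None:
            continue
        # The target error combines both error values, as described above.
        err = block_error(first_block, second_block) + block_error(first_block, third_block)
        if err < best_err:
            best, best_err = (dy, dx), err
    return best
```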
In a possible implementation, after the terminal generates the target image based on the plurality of image blocks of the second image, if the target image contains an area that could not be generated from those image blocks, the terminal divides the third image to obtain a plurality of image blocks of the third image.
Then, the terminal determines motion vectors corresponding to a plurality of image blocks of the third image according to the first image and the second image. The process of determining, by the terminal, the motion vectors corresponding to the plurality of image blocks of the third image is similar to the process of determining, by the terminal, the motion vectors corresponding to the plurality of image blocks of the second image, and specific reference may be made to the description of the above embodiment, which is not described herein again.
Finally, the terminal updates the target image according to the image blocks of the third image and their corresponding motion vectors to obtain a new target image, and the new target image is used to replace the first image. Updating the target image means that the terminal moves the image blocks of the third image into the void areas of the target image according to their motion vectors, so that the void areas are filled; the areas of the target image where an image already exists are not updated.
According to the scheme, the motion vector of the third image is obtained, and the target image is updated based on the motion vector of the third image, so that the void area in the target image can be effectively eliminated, and the image quality of the target image is improved.
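The composition and hole filling described above might look like the following sketch: blocks of the second image are shifted by their motion vectors to form the target image, and blocks of the third image are written only where the target image is still empty. The dictionary layout and the convention that a block's vector points to its position in the first image are assumptions of this sketch.

```python
import numpy as np

def compose_target_image(second, third, mv_second, mv_third, size):
    """Build the target image from motion-compensated blocks.

    mv_second / mv_third map the top-left corner (y, x) of each block of the
    second / third image to its motion vector (dy, dx). Blocks of the second
    image are placed first; blocks of the third image are only written into
    positions that are still empty, i.e. the void areas.
    """
    h, w = second.shape
    target = np.zeros((h, w), dtype=second.dtype)
    filled = np.zeros((h, w), dtype=bool)

    def place(src, motion_vectors):
        for (y, x), (dy, dx) in motion_vectors.items():
            ty, tx = y + dy, x + dx
            if ty < 0 or tx < 0 or ty + size > h or tx + size > w:
                continue
            empty = ~filled[ty:ty + size, tx:tx + size]  # only fill empty pixels
            target[ty:ty + size, tx:tx + size][empty] = src[y:y + size, x:x + size][empty]
            filled[ty:ty + size, tx:tx + size] |= empty

    place(second, mv_second)  # primary reference image
    place(third, mv_third)    # secondary reference image fills the void areas
    return target, filled     # ~filled marks any remaining void pixels
```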
In a possible implementation, the terminal acquires sensor data corresponding to the first image according to the exposure time of the first image, where the sensor data records the motion of the terminal during that exposure time. The sensor data may, for example, be gyroscope data recording the angular velocity of the terminal within the exposure time of the first image. After obtaining the sensor data, the terminal determines the jitter value of the first image from it, that is, it determines from the sensor data the shake displacement of the terminal during the exposure of the first image.
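One possible reading of this step, sketched below: the gyroscope samples falling inside the frame's exposure window are integrated into a single shake magnitude. The sampling layout and the trapezoidal integration rule are assumptions for illustration.

```python
import numpy as np

def jitter_value_from_gyro(angular_velocity, timestamps):
    """Estimate the jitter value of a frame from gyroscope data.

    angular_velocity is an (N, 3) array of gyroscope samples (x/y/z angular
    velocity) taken within the frame's exposure time, and timestamps holds
    the matching sample times. Integrating the angular speed over the
    exposure yields a rotation magnitude used here as the shake displacement.
    """
    angular_velocity = np.asarray(angular_velocity, dtype=float)
    timestamps = np.asarray(timestamps, dtype=float)
    speed = np.linalg.norm(angular_velocity, axis=1)  # angular speed per sample
    dt = np.diff(timestamps)
    return float(np.sum(0.5 * (speed[:-1] + speed[1:]) * dt))  # trapezoidal rule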
A second aspect of the present application provides a video processing apparatus comprising:
a detection unit, configured to detect, when the terminal is in a night scene shooting mode, whether a light area exists in an image shot by the terminal;
a first determining unit, configured to determine, when a light area exists in an image shot by the terminal, a first image in an image sequence collected by the terminal, where the area adjacent to the light area in the first image is overexposed;
a second determining unit, configured to determine a second image and a third image in the image sequence according to the first image, where the acquisition time of the first image is located between the acquisition times of the second image and the third image, and the degree of overexposure of the area adjacent to the light area in the second image and the third image is less than that in the first image;
an image generation unit, configured to generate a target image according to the first image, the second image and the third image, where the target image is used to replace the first image and the degree of overexposure of the area adjacent to the light area of the target image is smaller than that of the first image;
and a video generation unit, configured to generate a video based on the target image.
In one possible implementation manner, the method further includes: a control unit;
the control unit is used for: when the illuminance of the ambient light is smaller than a first preset threshold, controlling the terminal to enter a night scene shooting mode; or when an instruction for triggering the night scene shooting mode is acquired, controlling the terminal to enter the night scene shooting mode.
In a possible implementation manner, the shake displacement when the terminal collects the first image is greater than a second preset threshold.
In a possible implementation manner, a ratio of the number of target pixels in the first image to the total number of pixels in the first image is within a preset range, and the target pixels are pixels in a light area in the first image.
In a possible implementation manner, the second determining unit is configured to: determining a stabilization value for each of the plurality of images separately, the stabilization value being an inverse of a sum of a jitter value of the image and an acquisition time difference between the image and the first image; determining an image with the largest stable value in the plurality of images as the second image; and determining the third image in the plurality of images according to the acquisition time of the second image.
In a possible implementation manner, the second determining unit is configured to: determining one or more images in the plurality of images according to the acquisition time of the second image, wherein the acquisition time of the first image is between the acquisition time of the second image and the acquisition time of the one or more images; and determining the image with the largest stable value in the one or more images as the third image.
In one possible implementation, the image generation unit is configured to: dividing the second image into a plurality of image blocks; determining motion vectors corresponding to the image blocks respectively according to the first image and the third image; and generating the target image according to the motion vectors respectively corresponding to the image blocks and the image blocks.
In one possible implementation, the image generation unit is configured to: obtaining a plurality of candidate motion vectors corresponding to a first image block, wherein the plurality of image blocks comprise the first image block; determining a second image block and a third image block corresponding to each candidate motion vector in the plurality of candidate motion vectors according to the position of the first image block, wherein the first image comprises the second image block, and the third image comprises the third image block; determining a target error value corresponding to each candidate motion vector according to the first image block, the second image block and the third image block, wherein the target error value is obtained based on an error value between the first image block and the second image block and an error value between the first image block and the third image block; and determining the motion vector with the minimum target error value in the candidate motion vectors as the motion vector corresponding to the first image block according to the target error value corresponding to each candidate motion vector.
In one possible implementation, the plurality of candidate motion vectors includes: one or more preset motion vectors, one or more randomly generated motion vectors, and/or motion vectors corresponding to image blocks adjacent to the first image block.
In one possible implementation, the image generation unit is configured to: if the target image has an area which cannot be used for generating an image based on the image blocks, dividing the third image to obtain a plurality of image blocks of the third image; determining motion vectors corresponding to a plurality of image blocks of the third image according to the first image and the second image; and updating the target image according to the motion vectors corresponding to the image blocks of the third image and the image blocks of the third image to obtain a new target image, wherein the new target image is used for replacing the first image.
A third aspect of the present application provides a video processing apparatus comprising: a processor, a non-volatile memory, and a volatile memory; wherein the non-volatile memory or the volatile memory has computer readable instructions stored therein; the processor reads the computer readable instructions to cause the video processing apparatus to implement the method as implemented in any one of the first aspects.
The fourth aspect of the present application provides a terminal device comprising a processor, a memory, a display screen, a camera and a bus, wherein: the processor, the display screen, the camera and the memory are connected through the bus; the memory is used for storing a computer program and images acquired by the camera; the processor is configured to control the display screen, and is further configured to control the memory, acquire an image stored in the memory and execute a program stored in the memory, so as to perform the following steps: when the terminal is in a night scene shooting mode, detecting whether a light area exists in an image shot by the terminal; when a light area exists in an image shot by the terminal, determining a first image in an image sequence collected by the terminal, wherein the area adjacent to the light area in the first image is overexposed; determining a second image and a third image in the image sequence according to the first image, wherein the acquisition time of the first image is between the acquisition times of the second image and the third image, and the degree of overexposure of the area adjacent to the light area in the second image and the third image is less than that in the first image; generating a target image according to the first image, the second image and the third image, wherein the target image is used for replacing the first image, and the degree of overexposure of the area adjacent to the light area of the target image is smaller than that of the first image; and generating a video based on the target image.
In one possible implementation, the processor is further configured to: when the illuminance of the ambient light is smaller than a first preset threshold, the terminal enters a night scene shooting mode; or when the terminal acquires an instruction for triggering the night scene shooting mode, the terminal enters the night scene shooting mode.
In a possible implementation manner, the shake displacement of the terminal when acquiring the first image is greater than a second preset threshold.
In a possible implementation manner, a ratio of the number of target pixels in the first image to the total number of pixels in the first image is within a preset range, and the target pixels are pixels in a lighting area in the first image.
In one possible implementation, the processor is further configured to: determining a stabilization value for each of the plurality of images, respectively, the stabilization value being an inverse of a sum of a jitter value of the image and an acquisition time difference between the image and the first image; determining an image with the largest stable value in the plurality of images as the second image; and determining the third image in the plurality of images according to the acquisition time of the second image.
In one possible implementation, the processor is further configured to: determining one or more images in the plurality of images according to the acquisition time of the second image, wherein the acquisition time of the first image is between the acquisition time of the second image and the acquisition time of the one or more images; and determining the image with the largest stable value in the one or more images as the third image.
In one possible implementation, the processor is further configured to: dividing the second image into a plurality of image blocks; determining motion vectors corresponding to the image blocks respectively according to the first image and the third image; and generating the target image according to the motion vectors respectively corresponding to the image blocks and the image blocks.
In one possible implementation, the processor is further configured to: obtaining a plurality of candidate motion vectors corresponding to a first image block, wherein the plurality of image blocks comprise the first image block; determining a second image block and a third image block corresponding to each candidate motion vector in the plurality of candidate motion vectors according to the position of the first image block, wherein the first image comprises the second image block, and the third image comprises the third image block; determining a target error value corresponding to each candidate motion vector according to the first image block, the second image block and the third image block, wherein the target error value is obtained based on an error value between the first image block and the second image block and an error value between the first image block and the third image block; and determining the motion vector with the minimum target error value in the candidate motion vectors as the motion vector corresponding to the first image block according to the target error value corresponding to each candidate motion vector.
In one possible implementation, the plurality of candidate motion vectors includes: one or more preset motion vectors, one or more randomly generated motion vectors, and/or motion vectors corresponding to image blocks adjacent to the first image block.
In one possible implementation, the processor is further configured to: if the target image has an area which cannot generate an image based on the image blocks, dividing the third image to obtain a plurality of image blocks of the third image; determining motion vectors corresponding to a plurality of image blocks of the third image according to the first image and the second image; and updating the target image according to the motion vectors corresponding to the image blocks of the third image and the image blocks of the third image to obtain a new target image, wherein the new target image is used for replacing the first image.
A fifth aspect of the present application provides a computer-readable storage medium, having stored thereon a computer program, which, when run on a computer, causes the computer to perform the method as any one of the implementations of the first aspect.
A sixth aspect of the present application provides a computer program product which, when run on a computer, causes the computer to perform the method of any one of the implementations of the first aspect.
A seventh aspect of the present application provides a chip comprising one or more processors. Some or all of the processors are configured to read and execute a computer program stored in a memory, so as to perform the method in any possible implementation of any of the foregoing aspects.
Optionally, the chip may include a memory, and the processor may be connected to the memory through a circuit or a wire. Optionally, the chip further comprises a communication interface, and the processor is connected to the communication interface. The communication interface is used to receive data and/or information to be processed; the processor acquires the data and/or information from the communication interface, processes it, and outputs the processing result through the communication interface. The communication interface may be an input/output interface. The method provided by the application may be implemented by one chip or cooperatively by a plurality of chips.
Drawings
Fig. 1a is a schematic diagram showing a comparison between a tremor image and an image used for replacing the tremor image provided by an embodiment of the present application;
FIG. 1b is a schematic diagram showing another tremor image and a comparison between images used to replace the tremor image provided by embodiments of the present application;
fig. 2a is a schematic structural diagram of a terminal device according to an embodiment of the present application;
fig. 2b is a block diagram of a software structure of the terminal 100 according to an embodiment of the present disclosure;
fig. 3 is a schematic flowchart of a video processing method according to an embodiment of the present application;
fig. 4 is a schematic diagram of an MEMC method according to an embodiment of the present application;
fig. 5 is another schematic diagram of an MEMC method according to an embodiment of the present application;
fig. 6 is a schematic diagram illustrating determining an image block based on a candidate motion vector according to an embodiment of the present application;
FIG. 7 is a schematic diagram of generating a target image according to an embodiment of the present disclosure;
FIG. 8 is a schematic diagram of another example of generating a target image according to the present disclosure;
fig. 9 is a schematic view of an application architecture of a video processing method according to an embodiment of the present application;
FIG. 10 is a schematic flowchart of replacing a tremor image provided by an embodiment of the present application;
FIG. 11 is another schematic flowchart of replacing a tremor image provided by an embodiment of the present application;
fig. 12 is a schematic flowchart of another video processing method according to an embodiment of the present application;
fig. 13 is a schematic structural diagram of a terminal 1300 according to an embodiment of the present disclosure;
fig. 14 is a schematic structural diagram of a computer program product 1400 provided in an embodiment of the present application.
Detailed Description
Embodiments of the present application will now be described with reference to the accompanying drawings. It is to be understood that the described embodiments are merely some, rather than all, of the embodiments of the present application. As those skilled in the art will appreciate, with the development of technology and the emergence of new scenarios, the technical solutions provided in the embodiments of the present application are also applicable to similar technical problems.
The terms "first," "second," and the like in the description and claims of this application and in the foregoing drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It will be appreciated that the data so used may be interchanged under appropriate circumstances such that the embodiments described herein may be practiced otherwise than as specifically illustrated or described herein. Moreover, the terms "comprises," "comprising," and any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or modules is not necessarily limited to those steps or modules explicitly listed, but may include other steps or modules not expressly listed or inherent to such process, method, article, or apparatus. The naming or numbering of the steps appearing in the present application does not mean that the steps in the method flow have to be executed in the chronological/logical order indicated by the naming or numbering, and the named or numbered process steps may be executed in a modified order depending on the technical purpose to be achieved, as long as the same or similar technical effects are achieved.
For a long time, lens shake when a portable terminal device such as a smartphone captures a video has been one of the biggest obstacles to producing a high-quality video. Because portable terminals are easy to carry, users usually move while shooting video, which makes the video picture shot by the portable terminal unstable. As the user moves, the lens of the portable terminal shakes, so that the light from a light area in the scene is sensed by a larger area of the photosensitive element, and the light area in the video picture appears abnormally expanded.
As the user moves, the lens of the portable terminal usually shakes to different degrees, so that the light areas in some video frames of the shot video are abnormally expanded, while the light areas in other video frames are displayed normally. While watching the video, the user sees video frames with abnormally expanded light areas alternating with video frames whose light areas are displayed normally, which easily gives the viewer a subjective sense of trembling and degrades the user experience. A video frame in which the light area of the picture is abnormally expanded may be called a tremor image.
Electronic image stabilization (EIS) technology, which is currently provided in portable terminals, keeps objects in the video picture in a stable position by rotating, translating and otherwise correcting the picture. However, conventional EIS technology cannot deal with the abnormal expansion of light areas in the video picture. With EIS, the display position of a light area is stable across stable frames and shaky frames, but the light area still expands abnormally in the shaky frames.
In view of the above, embodiments of the present application provide a video processing method that locates a tremor image in a video by determining the jitter values of the images in the video. The images before and after the tremor image are then used as reference images, and the relative displacements of the image blocks of the reference images are found based on the tremor image and the reference images. Finally, a target image that replaces the tremor image is generated from the image blocks of the reference images and their corresponding relative displacements. With this scheme, the tremor image in the video can be replaced with a normally displayed image, which resolves the abnormal expansion of light areas in the video picture caused by camera shake.
Referring to fig. 1a and fig. 1b, fig. 1a is a schematic diagram showing a comparison between a tremor image and an image used for replacing the tremor image provided by an embodiment of the present application; fig. 1b is a schematic diagram showing a comparison between another tremor image and an image used to replace the tremor image provided in an embodiment of the present application.
As shown in fig. 1a, the tremor image 1 in fig. 1a is an image shot in an outdoor scene. As can be seen from fig. 1a, the light areas in the tremor image 1, i.e. the areas where the windows of the building are located, show an obvious abnormal expansion. After the tremor image 1 is processed with the method provided by the embodiment of the application, a stable image 1 that replaces the tremor image 1 is obtained, and the light areas in the stable image 1 are displayed normally.
As shown in fig. 1b, the tremor image 2 in fig. 1b is an image shot in an indoor scene. As can be seen from fig. 1b, the light area in the tremor image 2, i.e. the area on the ceiling of the room where the luminaire is located, shows an obvious abnormal expansion. After the tremor image 2 is processed, a stable image 2 that replaces the tremor image 2 is obtained, and the light areas in the stable image 2 are displayed normally.
The video processing method provided by the embodiments of the application may be applied to a terminal with a video capture function. The terminal, also called user equipment (UE), mobile station (MS) or mobile terminal (MT), is a device capable of shooting video and of processing the shot video so as to output a stably displayed video, such as a handheld device with a camera function or a surveillance camera.
Currently, some examples of terminals are: a mobile phone (mobile phone), a tablet computer, a notebook computer, a palm top computer, a monitoring camera, a Mobile Internet Device (MID), a wearable device, a Virtual Reality (VR) device, an Augmented Reality (AR) device, a wireless terminal in industrial control (industrial control), a wireless terminal in self driving (self driving), a wireless terminal in remote surgery (remote medical supply), a wireless terminal in smart grid (smart grid), a wireless terminal in transportation safety (transportation safety), a wireless terminal in smart city (smart city), a wireless terminal in smart home (smart home), and the like.
The image acquisition device in the terminal converts optical signals into electrical signals to generate image signals. The image acquisition device may be, for example, an image sensor such as a charge-coupled device (CCD) or a complementary metal-oxide-semiconductor (CMOS) sensor.
For ease of understanding, the structure of the terminal 100 provided in the embodiments of the present application will be described below by way of example. Referring to fig. 2a, fig. 2a is a schematic structural diagram of a terminal device according to an embodiment of the present application.
As shown in fig. 2a, the terminal 100 may include a processor 110, an external memory interface 120, an internal memory 121, a Universal Serial Bus (USB) interface 130, a charging management module 140, a power management module 141, a battery 142, an antenna 1, an antenna 2, a mobile communication module 150, a wireless communication module 160, an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, an earphone interface 170D, a sensor module 180, a button 190, a motor 191, an indicator 192, a camera 193, a display screen 194, a Subscriber Identity Module (SIM) card interface 195, and the like. The sensor module 180 may include a pressure sensor 180A, a gyroscope sensor 180B, an air pressure sensor 180C, a magnetic sensor 180D, an acceleration sensor 180E, a distance sensor 180F, a proximity light sensor 180G, a fingerprint sensor 180H, a temperature sensor 180J, a touch sensor 180K, an ambient light sensor 180L, a bone conduction sensor 180M, and the like.
It is to be understood that the illustrated structure of the embodiment of the present invention does not specifically limit the terminal 100. In other embodiments of the present application, terminal 100 may include more or fewer components than shown, or some components may be combined, some components may be split, or a different arrangement of components. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
Processor 110 may include one or more processing units, such as: the processor 110 may include an Application Processor (AP), a modem processor, a Graphics Processing Unit (GPU), an Image Signal Processor (ISP), a controller, a video codec, a Digital Signal Processor (DSP), a baseband processor, and/or a neural-Network Processing Unit (NPU), etc. The different processing units may be separate devices or may be integrated into one or more processors.
The controller can generate an operation control signal according to the instruction operation code and the time sequence signal to finish the control of instruction fetching and instruction execution.
A memory may also be provided in processor 110 for storing instructions and data. In some embodiments, the memory in the processor 110 is a cache memory. The memory may hold instructions or data that have just been used or recycled by the processor 110. If the processor 110 needs to reuse the instruction or data, it can be called directly from the memory. Avoiding repeated accesses reduces the latency of the processor 110, thereby increasing the efficiency of the system.
In some embodiments, processor 110 may include one or more interfaces. The interface may include an integrated circuit (I2C) interface, an integrated circuit built-in audio (I2S) interface, a Pulse Code Modulation (PCM) interface, a universal asynchronous receiver/transmitter (UART) interface, a Mobile Industry Processor Interface (MIPI), a general-purpose input/output (GPIO) interface, a Subscriber Identity Module (SIM) interface, and/or a Universal Serial Bus (USB) interface, etc.
The I2C interface is a bidirectional synchronous serial bus comprising a serial data line (SDA) and a Serial Clock Line (SCL). In some embodiments, the processor 110 may include multiple sets of I2C buses. The processor 110 may be coupled to the touch sensor 180K, a charger, a flash, a camera 193, etc. through different I2C bus interfaces, respectively. For example: the processor 110 may be coupled to the touch sensor 180K through an I2C interface, so that the processor 110 and the touch sensor 180K communicate through an I2C bus interface to implement a touch function of the terminal 100.
The I2S interface may be used for audio communication. In some embodiments, processor 110 may include multiple sets of I2S buses. The processor 110 may be coupled to the audio module 170 through an I2S bus, enabling communication between the processor 110 and the audio module 170. In some embodiments, the audio module 170 may transmit the audio signal to the wireless communication module 160 through the I2S interface, so as to implement a function of receiving a call through a bluetooth headset.
The PCM interface may also be used for audio communication, sampling, quantizing and encoding analog signals. In some embodiments, the audio module 170 and the wireless communication module 160 may be coupled by a PCM bus interface. In some embodiments, the audio module 170 may also transmit audio signals to the wireless communication module 160 through the PCM interface, so as to implement a function of answering a call through a bluetooth headset. Both the I2S interface and the PCM interface may be used for audio communication.
The UART interface is a universal serial data bus used for asynchronous communications. The bus may be a bidirectional communication bus. It converts the data to be transmitted between serial communication and parallel communication. In some embodiments, a UART interface is generally used to connect the processor 110 with the wireless communication module 160. For example: the processor 110 communicates with a bluetooth module in the wireless communication module 160 through a UART interface to implement a bluetooth function. In some embodiments, the audio module 170 may transmit the audio signal to the wireless communication module 160 through a UART interface, so as to implement the function of playing music through a bluetooth headset.
MIPI interfaces may be used to connect processor 110 with peripheral devices such as display screen 194, camera 193, and the like. The MIPI interface includes a Camera Serial Interface (CSI), a Display Serial Interface (DSI), and the like. In some embodiments, processor 110 and camera 193 communicate through a CSI interface to implement the photographing function of terminal 100. The processor 110 and the display screen 194 communicate through the DSI interface to implement the display function of the terminal 100.
The GPIO interface may be configured by software. The GPIO interface may be configured as a control signal and may also be configured as a data signal. In some embodiments, a GPIO interface may be used to connect the processor 110 with the camera 193, the display 194, the wireless communication module 160, the audio module 170, the sensor module 180, and the like. The GPIO interface may also be configured as an I2C interface, I2S interface, UART interface, MIPI interface, and the like.
The USB interface 130 is an interface conforming to the USB standard specification, and may specifically be a Mini USB interface, a Micro USB interface, a USB Type C interface, or the like. The USB interface 130 may be used to connect a charger to charge the terminal 100, and may also be used to transmit data between the terminal 100 and peripheral devices. And the earphone can also be used for connecting an earphone and playing audio through the earphone. The interface may also be used to connect other electronic devices, such as AR devices and the like.
It should be understood that the connection relationship between the modules according to the embodiment of the present invention is only illustrative, and is not limited to the structure of the terminal 100. In other embodiments of the present application, the terminal 100 may also adopt different interface connection manners or a combination of multiple interface connection manners in the above embodiments.
The charging management module 140 is configured to receive a charging input from a charger.
The power management module 141 is used to connect the battery 142, the charging management module 140 and the processor 110.
The wireless communication function of the terminal 100 may be implemented by the antenna 1, the antenna 2, the mobile communication module 150, the wireless communication module 160, a modem processor, a baseband processor, and the like.
The antennas 1 and 2 are used for transmitting and receiving electromagnetic wave signals. Each antenna in terminal 100 may be used to cover a single or multiple communication bands. Different antennas can also be multiplexed to improve the utilization of the antennas. For example: the antenna 1 may be multiplexed as a diversity antenna of a wireless local area network. In other embodiments, the antenna may be used in conjunction with a tuning switch.
The mobile communication module 150 may provide a solution including 2G/3G/4G/5G wireless communication and the like applied to the terminal 100. The mobile communication module 150 may include at least one filter, a switch, a power amplifier, a Low Noise Amplifier (LNA), and the like. The mobile communication module 150 may receive the electromagnetic wave from the antenna 1, filter, amplify, etc. the received electromagnetic wave, and transmit the electromagnetic wave to the modem processor for demodulation. The mobile communication module 150 may also amplify the signal modulated by the modem processor, and convert the signal into electromagnetic wave through the antenna 1 to radiate the electromagnetic wave. In some embodiments, at least some of the functional modules of the mobile communication module 150 may be disposed in the processor 110. In some embodiments, at least some of the functional modules of the mobile communication module 150 may be disposed in the same device as at least some of the modules of the processor 110.
The modem processor may include a modulator and a demodulator. The modulator is used for modulating a low-frequency baseband signal to be transmitted into a medium-high frequency signal. The demodulator is used for demodulating the received electromagnetic wave signal into a low-frequency baseband signal. The demodulator then passes the demodulated low frequency baseband signal to a baseband processor for processing. The low frequency baseband signal is processed by the baseband processor and then passed to the application processor. The application processor outputs a sound signal through an audio device (not limited to the speaker 170A, the receiver 170B, etc.) or displays an image or video through the display screen 194. In some embodiments, the modem processor may be a stand-alone device. In other embodiments, the modem processor may be provided in the same device as the mobile communication module 150 or other functional modules, independent of the processor 110.
The wireless communication module 160 may provide solutions for wireless communication applied to the terminal 100, including wireless local area networks (WLANs) (e.g., wireless fidelity (Wi-Fi) networks), Bluetooth (BT), global navigation satellite system (GNSS), frequency modulation (FM), near field communication (NFC), infrared (IR), and the like. The wireless communication module 160 may be one or more devices integrating at least one communication processing module. The wireless communication module 160 receives electromagnetic waves via the antenna 2, performs frequency modulation and filtering processing on the electromagnetic wave signals, and transmits the processed signals to the processor 110. The wireless communication module 160 may also receive a signal to be transmitted from the processor 110, perform frequency modulation and amplification on the signal, and convert the signal into electromagnetic waves through the antenna 2 for radiation.
In some embodiments, the antenna 1 of the terminal 100 is coupled to the mobile communication module 150, and the antenna 2 is coupled to the wireless communication module 160, so that the terminal 100 can communicate with a network and other devices through a wireless communication technology. The wireless communication technology may include global system for mobile communications (GSM), general packet radio service (GPRS), code division multiple access (CDMA), wideband code division multiple access (WCDMA), time-division synchronous code division multiple access (TD-SCDMA), long term evolution (LTE), BT, GNSS, WLAN, NFC, FM, and/or IR technologies, and the like. The GNSS may include a global positioning system (GPS), a global navigation satellite system (GLONASS), a BeiDou navigation satellite system (BDS), a quasi-zenith satellite system (QZSS), and/or a satellite based augmentation system (SBAS).
The terminal 100 implements a display function through the GPU, the display screen 194, and the application processor, etc. The GPU is a microprocessor for image processing, and is connected to the display screen 194 and an application processor. The GPU is used to perform mathematical and geometric calculations for graphics rendering. The processor 110 may include one or more GPUs that execute program instructions to generate or alter display information.
The display screen 194 is used to display images, video, and the like. The display screen 194 includes a display panel. The display panel may adopt a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), a Mini-LED, a Micro-LED, a Micro-OLED, a quantum dot light-emitting diode (QLED), or the like. In some embodiments, the terminal 100 may include 1 or N display screens 194, where N is a positive integer greater than 1.
Specifically, the display screen 194 may display the target interface in the present embodiment.
The terminal 100 can implement a photographing function through the ISP, the camera 193, the video codec, the GPU, the display screen 194, and the application processor, etc. The ISP is used to process the data fed back by the camera 193. The camera 193 is used to capture still images or video.
Video codecs are used to compress or decompress digital video. The terminal 100 may support one or more video codecs. In this way, the terminal 100 can play or record video in a variety of encoding formats, such as: moving Picture Experts Group (MPEG) 1, MPEG2, MPEG3, MPEG4, and the like.
The external memory interface 120 may be used to connect an external memory card, such as a Micro SD card, to extend the memory capability of the terminal 100. The external memory card communicates with the processor 110 through the external memory interface 120 to implement a data storage function. For example, files such as music, video, etc. are saved in the external memory card.
The internal memory 121 may be used to store computer-executable program code, which includes instructions. The internal memory 121 may include a program storage area and a data storage area. The storage program area may store an operating system, an application program (such as a sound playing function, an image playing function, etc.) required by at least one function, and the like. The storage data area may store data (e.g., audio data, a phonebook, etc.) created during use of the terminal 100, and the like. In addition, the internal memory 121 may include a high-speed random access memory, and may further include a nonvolatile memory, such as at least one magnetic disk storage device, a flash memory device, a universal flash memory (UFS), and the like. The processor 110 executes various functional applications of the terminal 100 and data processing by executing instructions stored in the internal memory 121 and/or instructions stored in a memory provided in the processor.
The terminal 100 may implement an audio function through the audio module 170, the speaker 170A, the receiver 170B, the microphone 170C, the earphone interface 170D, and the application processor. Such as music playing, recording, etc.
The audio module 170 is used to convert digital audio information into an analog audio signal output and also to convert an analog audio input into a digital audio signal. The audio module 170 may also be used to encode and decode audio signals. In some embodiments, the audio module 170 may be disposed in the processor 110, or some functional modules of the audio module 170 may be disposed in the processor 110.
The speaker 170A, also called a "horn", is used to convert the audio electrical signal into an acoustic signal. The terminal 100 can listen to music through the speaker 170A or listen to a handsfree call. The receiver 170B, also called "earpiece", is used to convert the electrical audio signal into an acoustic signal. The microphone 170C, also referred to as a "microphone," is used to convert sound signals into electrical signals. The earphone interface 170D is used to connect a wired earphone.
The pressure sensor 180A is used for sensing a pressure signal, and can convert the pressure signal into an electrical signal.
The gyro sensor 180B may be used to determine a motion attitude of the terminal 100.
The air pressure sensor 180C is used to measure air pressure.
The magnetic sensor 180D includes a hall sensor.
The acceleration sensor 180E may detect the magnitude of acceleration of the terminal 100 in various directions (generally, three axes).
A distance sensor 180F for measuring a distance.
The proximity light sensor 180G may include, for example, a Light Emitting Diode (LED) and a light detector, such as a photodiode.
The ambient light sensor 180L is used to sense ambient light brightness.
The fingerprint sensor 180H is used to collect a fingerprint.
The temperature sensor 180J is used to detect temperature.
The touch sensor 180K is also called a "touch device". The touch sensor 180K may be disposed on the display screen 194, and the touch sensor 180K and the display screen 194 form a touch screen, which is also called a "touch screen". The touch sensor 180K is used to detect a touch operation applied thereto or nearby. The touch sensor may communicate the detected touch operation to the application processor to determine the touch event type. Visual output associated with the touch operation may be provided via the display screen 194. In other embodiments, the touch sensor 180K may be disposed on the surface of the terminal 100 at a different position than the display screen 194.
The bone conduction sensor 180M may acquire a vibration signal.
The keys 190 include a power-on key, a volume key, and the like. The keys 190 may be mechanical keys. Or may be touch keys. The terminal 100 may receive a key input, and generate a key signal input related to user setting and function control of the terminal 100.
The motor 191 may generate a vibration cue.
Indicator 192 may be an indicator light that may be used to indicate a state of charge, a change in charge, or a message, missed call, notification, etc.
The SIM card interface 195 is used to connect a SIM card.
The software system of the terminal 100 may adopt a hierarchical architecture, an event-driven architecture, a micro-core architecture, a micro-service architecture, or a cloud architecture. The embodiment of the present invention uses an Android system with a hierarchical architecture as an example to exemplarily explain a software structure of the terminal 100.
Fig. 2b is a block diagram of a software structure of the terminal 100 according to an embodiment of the present disclosure.
The layered architecture divides the software into several layers, each layer having a clear role and division of labor. The layers communicate with each other through a software interface. In some embodiments, the Android system is divided into four layers, an application layer, an application framework layer, an Android runtime (Android runtime) and system library, and a kernel layer from top to bottom.
The application layer may include a series of application packages.
As shown in fig. 2b, the application package may include camera, gallery, calendar, call, map, navigation, WLAN, bluetooth, music, video, short message, etc. applications.
The application framework layer provides an Application Programming Interface (API) and a programming framework for the application program of the application layer. The application framework layer includes a number of predefined functions.
As shown in FIG. 2b, the application framework layer may include a window manager, a content provider, a view system, a phone manager, a resource manager, a notification manager, and the like.
The window manager is used for managing window programs. The window manager can obtain the size of the display screen, judge whether a status bar exists, lock the screen, intercept the screen and the like.
Content providers are used to store and retrieve data and make it accessible to applications. The data may include video, images, audio, calls made and received, browsing history and bookmarks, phone books, etc.
The view system includes visual controls such as controls to display text, controls to display pictures, and the like. The view system may be used to build applications. The display interface may be composed of one or more views. For example, the display interface including the short message notification icon may include a view for displaying text and a view for displaying pictures.
The phone manager is used to provide a communication function of the terminal 100. Such as management of call status (including on, off, etc.).
The resource manager provides various resources for the application, such as localized strings, icons, pictures, layout files, video files, and the like.
The notification manager enables an application to display notification information in the status bar. It can be used to convey notification-type messages that disappear automatically after a short stay without requiring user interaction, such as notifications of download completion or message alerts. The notification manager may also present notifications in the form of a chart or scroll-bar text in the system status bar at the top, such as notifications of applications running in the background, or notifications that appear on the screen in the form of a dialog window. For example, text information is prompted in the status bar, a prompt tone is sounded, the electronic device vibrates, or an indicator light flashes.
The Android Runtime comprises a core library and a virtual machine. The Android runtime is responsible for scheduling and managing an Android system.
The core library comprises two parts: one part is the function libraries that the Java language needs to call, and the other part is the core libraries of Android.
The application layer and the application framework layer run in the virtual machine. The virtual machine executes the Java files of the application layer and the application framework layer as binary files. The virtual machine is used to perform functions such as object life cycle management, stack management, thread management, security and exception management, and garbage collection.
The system library may include a plurality of functional modules. For example: surface managers (surface managers), media Libraries (Media Libraries), three-dimensional graphics processing Libraries (e.g., openGL ES), 2D graphics engines (e.g., SGL), and the like.
The surface manager is used to manage the display subsystem and provide fusion of 2D and 3D layers for multiple applications.
The media library supports playback and recording of a variety of commonly used audio and video formats, as well as still image files and the like. The media library may support a variety of audio and video encoding formats, such as MPEG4, H.264, MP3, AAC, AMR, JPG, PNG, and the like.
The three-dimensional graphic processing library is used for realizing three-dimensional graphic drawing, image rendering, synthesis, layer processing and the like.
The 2D graphics engine is a drawing engine for 2D drawing.
The kernel layer is a layer between hardware and software. The inner core layer at least comprises a display driver, a camera driver, an audio driver and a sensor driver.
The following describes exemplary work flows of software and hardware of the terminal 100 in conjunction with a video playing scene.
When the touch sensor 180K receives a touch operation, a corresponding hardware interrupt is issued to the kernel layer. The kernel layer processes the touch operation into an original input event (including touch coordinates, a time stamp of the touch operation, and other information). The raw input events are stored at the kernel layer. And the application program framework layer acquires the original input event from the kernel layer and identifies the control corresponding to the input event. Taking the touch operation as a touch click operation, and taking the control corresponding to the click operation as the control of the video playing related application icon as an example, the video playing application calls the interface of the application framework layer, starts the video playing application, and then plays the video on the video playing interface of the video playing application by calling the kernel layer, for example, a free viewpoint video can be played.
In the above, an application scenario of the video processing method provided in the embodiment of the present application is introduced, and an execution process of the video processing method will be described in detail below.
Referring to fig. 3, fig. 3 is a schematic flowchart of a video processing method according to an embodiment of the present disclosure. As shown in fig. 3, the video processing method includes the following steps 301 to 304.
In step 301, a plurality of images are acquired.
In this embodiment, the plurality of images may be a plurality of consecutive images in the video. The terminal may acquire the plurality of images in a variety of ways.
In one possible approach, the terminal may capture a video at a fixed frame rate, thereby acquiring a plurality of images captured over a period of time. For example, when the terminal captures a video at a frame rate of 30Hz, the plurality of images may be 30 images captured by the terminal within 1 second.
In another possible mode, the terminal may receive a plurality of images sent by other terminals, and the plurality of images are obtained by shooting by other terminals. In brief, another terminal captures a video at a fixed frame rate, and after obtaining a plurality of images, transmits the plurality of images to the terminal in the present embodiment, and the terminal in the present embodiment processes the plurality of images.
Step 302, determining a first image in the plurality of images, wherein a jitter value of the first image is greater than or equal to a second preset threshold, and the jitter value is a jitter displacement when the terminal acquires the first image.
In this embodiment, the terminal determines the tremor image of the plurality of images by determining a jitter value for each of the plurality of images. When the jitter value of any one of the images is smaller than a second preset threshold value, determining that the image is not a tremor image; and when the jitter value of one image in the plurality of images is greater than or equal to a second preset threshold value, determining that the image is a tremor image. The jitter value of the image refers to the jitter displacement of the terminal when the terminal collects the image. The second predetermined threshold may be 300-400, for example, the second predetermined threshold is 360.
In one possible embodiment, the terminal may acquire sensor data recording the motion of the terminal at the time of acquiring the first image, and calculate a jitter value of the first image based on the sensor data.
Illustratively, the terminal acquires sensor data corresponding to the first image according to the exposure time of the first image, wherein the sensor data is used for recording the motion condition of the terminal in the exposure time of the first image. For example, the sensor data may be gyroscope data, i.e. data acquired by a gyroscope. The gyroscope is an angular motion detection device, and the gyroscope data acquired by the gyroscope records the angular velocity of the terminal in the exposure time of the first image. Furthermore, the sensor data may also be data acquired by other sensors, which may record the angular velocity or angular acceleration of the terminal during the exposure time of the first image. The terminal then determines from the sensor data a jitter value of the first image, i.e. a jitter displacement occurring during exposure of the first image by the terminal.
For example, in the case where the sensor data is gyroscope data, the terminal may acquire the gyroscope data at a plurality of times within the first image exposure period, each of the gyroscope data recording a three-dimensional rotational angular velocity of the terminal at a certain time. Therefore, the terminal can calculate the angular displacement of the terminal between two adjacent acquisition moments based on the plurality of gyroscope data within the first image exposure period and the time interval between two adjacent gyroscope data. Finally, the terminal determines a shake displacement occurring during the exposure of the first image by the terminal based on the calculated plurality of angular displacements.
Based on the method, the terminal can acquire the jitter value of each image in the plurality of images, so as to determine the first image with the jitter value being greater than or equal to the second preset threshold value.
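The following is a minimal sketch, in Python, of the gyroscope-based computation described above. The sample format (timestamp plus three-axis angular velocity), the helper name, and the units of the resulting value are assumptions made for illustration; in practice the value would be scaled to the same units as the second preset threshold.

```python
import numpy as np

def jitter_value_from_gyro(gyro_samples):
    """Estimate the shake displacement during one frame's exposure.

    gyro_samples: list of (timestamp_s, wx, wy, wz) tuples recorded while the
    frame was being exposed (format assumed for illustration; wx/wy/wz are the
    three-axis angular velocities reported by the gyroscope).
    """
    samples = np.asarray(gyro_samples, dtype=np.float64)
    if len(samples) < 2:
        return 0.0
    times = samples[:, 0]
    rates = samples[:, 1:]                    # three-axis angular velocity
    dt = np.diff(times)                       # interval between adjacent acquisition moments
    ang_disp = rates[:-1] * dt[:, None]       # angular displacement per interval
    # Accumulate the magnitudes of the per-interval angular displacements
    # as the jitter value of the frame.
    return float(np.linalg.norm(ang_disp, axis=1).sum())
```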
In this embodiment, the terminal may obtain the jitter value of the image in other ways besides calculating the jitter value of the image through the sensor data.
In one possible implementation, the terminal may calculate a contrast value of the image and determine a jitter value of the image according to the contrast value of the image.
Specifically, for the plurality of images, the terminal may calculate a contrast value for each of the plurality of images, and then determine a jitter value for each image based on the contrast value of that image. The terminal may calculate the contrast value of an image based on the luminance values of the pixels in a region of interest of the image. Specifically, the terminal may first determine a region of interest in the image, which may be, for example, a region of a certain range located at the center of the image, such as a region of 12 × 12 pixels in the center of the image. Then, the terminal calculates the luminance values of all the pixels in the region of interest, and determines the pixel with the highest luminance and the pixel with the lowest luminance in the region of interest. Finally, the terminal calculates the ratio of the luminance value of the pixel with the highest luminance to the luminance value of the pixel with the lowest luminance, that is, the ratio of the highest luminance value to the lowest luminance value, so as to obtain the contrast value of the image.
After calculating the contrast value of the image, the terminal further determines the jitter value of the image according to the contrast value of the image. Wherein the jitter value of the image has a negative correlation with the contrast value of the image. Namely, the larger the contrast value of the image is, the smaller the jitter value of the image is; the smaller the contrast value of the image, the larger the jitter value of the image. For example, the product between the jitter value of the image and the contrast value of the image may be defined as a specific positive number, and thus the terminal may determine the jitter value of the image by dividing the positive number by the contrast value of the image. For example, assuming that the contrast value of the image is C, the terminal may calculate that the jitter value of the image is 1/C.
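A sketch of this contrast-based alternative, assuming a grayscale Y plane as input, a 12 × 12 central region of interest, and the reciprocal relationship described above; the function name and the clamping of the minimum luminance to 1 (to avoid division by zero) are illustrative choices.

```python
import numpy as np

def jitter_value_from_contrast(y_plane, roi_size=12):
    """Derive a jitter value from the contrast of the central region of interest.

    y_plane: 2-D array of per-pixel luminance (the Y channel).
    """
    h, w = y_plane.shape
    top, left = (h - roi_size) // 2, (w - roi_size) // 2
    roi = y_plane[top:top + roi_size, left:left + roi_size].astype(np.float64)
    contrast = roi.max() / max(float(roi.min()), 1.0)   # ratio of brightest to darkest pixel
    return 1.0 / contrast                                # jitter value is the reciprocal of the contrast
```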
In another possible implementation, the terminal may determine the jitter value of the image based on a pre-trained neural network.
Illustratively, the terminal may acquire a neural network trained in advance based on a large number of images labeled with jitter values, the neural network being used to predict the jitter values of the images. Then, the terminal inputs the plurality of images into the neural network, obtains the jitter value output by the neural network, and determines the jitter value of each image in the plurality of images. Wherein the neural network is trained based on pre-prepared training data. The training data includes a large number of images photographed in the case where the terminal is shaken, and the light areas in the images are abnormally spread. In addition, the images are marked with jitter values corresponding to the images. After the training data is obtained, the neural network can be trained according to the existing neural network training method, and finally the trained neural network is obtained.
Based on the two implementation modes, the terminal can effectively judge whether the image is a tremor image or not under the condition that the sensor data corresponding to the terminal in image acquisition cannot be acquired.
In one possible embodiment, the terminal may also determine whether there is a local lighting zone in the plurality of images. And only when a certain image has a local light area and the jitter value of the image is greater than or equal to a second preset threshold value, the terminal determines the image as a tremor image.
In this embodiment, the luminance values of the pixels in the light area are all greater than the specific threshold, and the difference value between the luminance value of the pixel in the light area and the luminance value of the pixel in the adjacent area of the light area is greater than the preset difference value. Wherein, the value of the specific threshold may be 180-230, for example, the specific threshold is specifically 224; the preset difference value can be 25-50, for example, the preset difference value is 32. For example, when the terminal detects that the luminance values of the pixels in a certain region are all greater than 224 and the difference value between the luminance value of the pixel in the region and the luminance value of the pixel in the adjacent region is greater than 32, the terminal may determine that the region is the light region. The adjacent region of the light region may be formed by pixels adjacent to pixels in the light region, that is, pixels in the adjacent region of the light region are adjacent to pixels in the light region.
For example, the terminal may determine the number of target pixels in each of the plurality of images, the target pixels being pixels within the light area. Then, the terminal obtains the ratio of the number of the target pixels in each image to the total number of the pixels of the image, and judges whether the ratio is in a preset range. If the ratio of the number of the target pixels in the image to the total number of the pixels in the image is within a preset range, the terminal determines that a local light area exists in the image; and if the ratio of the number of the target pixels in the image to the total number of the pixels in the image is not in the preset range, the terminal determines that no local light area exists in the image.
That is to say, the jitter value of the first image determined as the tremor image is greater than or equal to the second preset threshold, and the ratio of the number of target pixels in the first image to the total number of pixels is within a preset range, wherein the target pixels are pixels of the light area.
Illustratively, taking the color space of the image being in YUV format as an example, the Y value of a pixel in the image is the luminance value of the pixel. In the image, the value range of the Y value of a pixel is 0-255, and the larger the Y value, the larger the luminance value of the pixel. Assume that the specific threshold is 224, the preset difference is 32, and the preset range is 2%-20%. The terminal may first determine the number of target pixels in the image, and then determine the ratio of the number of target pixels to the total number of pixels of the image. The Y value of a target pixel is greater than or equal to 224, and the difference between the luminance value of the target pixel and the luminance value of the pixels in the adjacent area of the light area is greater than or equal to 32. If the ratio of the number of target pixels to the total number of pixels of the image is within 2%-20%, it is determined that the image has a local light area. It can be understood that the threshold, the difference, and the preset range may be adjusted according to actual needs, and their specific values are not limited in this embodiment.
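The check described above can be sketched as follows, again assuming a grayscale Y plane. Using morphological dilation to obtain the adjacent area and comparing mean luminances is a simplification introduced here for illustration; the thresholds mirror the example values (224, 32, 2%-20%).

```python
import numpy as np
from scipy.ndimage import binary_dilation

def has_local_light_area(y_plane, bright_thresh=224, min_diff=32,
                         ratio_range=(0.02, 0.20)):
    """Return True if the frame contains a local light area."""
    bright = y_plane >= bright_thresh                     # candidate light-area pixels
    neighbours = binary_dilation(bright) & ~bright        # pixels adjacent to the light area
    if not bright.any() or not neighbours.any():
        return False
    # The light area must be clearly brighter than its adjacent area.
    if y_plane[bright].mean() - y_plane[neighbours].mean() < min_diff:
        return False
    ratio = bright.sum() / y_plane.size
    return ratio_range[0] <= ratio <= ratio_range[1]      # local, not full-frame, brightness
```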
Step 303, determining a second image and a third image in the plurality of images from the first image.
In this embodiment, the time of acquisition of the first image is located between the time of acquisition of the second image and the time of acquisition of the third image. That is, among the plurality of images acquired by the terminal, the first image is not the first image of the plurality of images, and the first image is not the last image of the plurality of images. The plurality of images are arranged according to the sequence of the acquisition time, the first image in the plurality of images refers to the earliest acquired image in the plurality of images, and the last image in the plurality of images refers to the latest acquired image in the plurality of images.
In addition, in the second image and the third image, a jitter value of at least one image is smaller than a jitter value of the first image. Since the second image and the third image are target images for generating the replacement first image as reference images, it is necessary to ensure the image quality of the second image and the third image so as to generate the target images normally displayed in the light area.
In a possible implementation, the terminal may determine the second image and the third image in N images located before and after the first image, with the first image as a center. For example, the terminal may acquire k images whose acquisition time is before the first image and k images whose acquisition time is after the first image among the aforementioned plurality of images, and determine the second image and the third image among the acquired 2k images. Where k is an integer greater than or equal to 1, for example k may be 1,2 or 3. Since the acquisition time of the first image is between the acquisition time of the second image and the acquisition time of the third image, the terminal may determine the second image from k images before the first image at the acquisition time and then determine the third image from k images after the first image at the acquisition time. Alternatively, the terminal may determine the third image from k images preceding the first image at the time of acquisition and then determine the second image from k images following the first image at the time of acquisition.
For example, assuming that the first image is image t, the terminal may acquire images t-k to t-1 and t +1 to t + k, and determine the second image and the third image in images t-k to t-1 and t +1 to t + k. The terminal may then determine a second image in images t-k to t-1 and a third image in images t +1 to t + k. Alternatively, the terminal may determine the third image in the images t-k to t-1 and the second image in the images t +1 to t + k.
Alternatively, in order to generate a target image of higher quality as much as possible, the terminal may select, from the plurality of images, the images with larger stable values as the second image and the third image, where the stable value of an image measures how suitable it is for generating a high-quality target image.
Illustratively, the terminal determines a stable value for each of the plurality of images separately, the stable value being the inverse of the sum of the jitter value of the image and the acquisition time difference between the image and the first image. The stable value of an image is negatively correlated with both the jitter value of the image and the acquisition time difference between the image and the first image. That is, the larger the jitter value of an image, the smaller the stable value of the image; the larger the acquisition time difference between the image and the first image, the smaller the stable value of the image. In short, the terminal needs to select, as the second image and the third image, images that have small jitter values and are as close to the first image as possible.
Then, the terminal determines the image with the maximum stable value in the plurality of images as the second image, and determines the third image in the plurality of images according to the acquisition time of the second image, so that the acquisition time of the first image is between the acquisition time of the second image and the acquisition time of the third image. Specifically, the terminal may determine one or more images in the plurality of images according to the acquisition time of the second image, where the acquisition time of the first image is between the acquisition time of the second image and the acquisition time of the one or more images; then, the terminal determines an image with the largest stable value among the one or more images as the third image. The stable value of the second image determined based on the above-described manner is larger than that of the third image, and therefore the second image may be regarded as a primary reference image for generating the target image, and the third image may be regarded as a secondary reference image for generating the target image.
In addition, the terminal may determine a stable value of each of N images located before and after the first image, and determine an image with a maximum stable value among the N images as the second image. That is, the terminal determines a stable value of an image within a specific range centering on the first image, and determines an image having the largest stable value as the second image. Finally, a third image is determined based on the second image.
For example, assume that the first image is image t, and that the terminal determines the second image and the third image among the images t-k to t-1 and t+1 to t+k. Further, assume that the stable value of any one image p among the images t-k to t-1 and t+1 to t+k is H, and the jitter value of the image p is p_d. Then, the stable value H is specifically: H = 1/(p_d + 100 × |p - t|). It can be seen that the larger the jitter value of the image p, the smaller the stable value of the image p; and the larger the acquisition time difference between the image p and the first image, the smaller the stable value of the image p.
In this way, the terminal can calculate the stable values H of the images t-k to t-1 and t +1 to t + k, respectively, and select one image having the largest stable value H among the images t-k to t-1 and t +1 to t + k as the second image. If the second image is one of the images t-k to t-1, the terminal continues to select one of the images t +1 to t + k whose stable value H is the largest as a third image. If the second image is one of the images t +1 to t + k, the terminal continues to select one of the images t-k to t-1 as the third image whose stable value H is the largest.
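A sketch of this selection, assuming the tremor frame t is neither the first nor the last frame and that a jitter value is known for every frame in the window; the weighting factor 100 follows the example formula above, and the function name is illustrative.

```python
def select_reference_frames(t, jitter, k=3):
    """Pick the second (primary) and third (secondary) reference images around frame t.

    jitter: mapping from frame index to jitter value.
    """
    def stable(p):
        return 1.0 / (jitter[p] + 100 * abs(p - t))   # H = 1/(p_d + 100 * |p - t|)

    before = [p for p in range(t - k, t) if p in jitter]
    after = [p for p in range(t + 1, t + k + 1) if p in jitter]
    second = max(before + after, key=stable)          # image with the largest stable value
    other_side = after if second < t else before      # keep t between the two references
    third = max(other_side, key=stable)
    return second, third
```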
Step 304, generating a target image from the first image, the second image and the third image.
In this embodiment, the target image is used to replace the first image. After the terminal generates the target image, the terminal may replace the first image of the multiple images with the target image to obtain multiple new images, and output the multiple new images.
The first image and the third image are used for determining a motion vector corresponding to an image block in the second image, the motion vector is used for representing relative displacement between the image block in the third image and the image block in the second image, and the target image is obtained based on the image block in the second image and the motion vector.
Specifically, the terminal may divide the second image into a plurality of image blocks, and determine motion vectors corresponding to the plurality of image blocks according to the first image and the third image, so that each image block in the plurality of image blocks can find a matching image block in the first image and the third image based on the motion vector corresponding to the image block. That is, for each image block in the second image, the corresponding image block can be found in the first image and the third image based on the motion vector of the image block, and the corresponding image block in the first image and the third image are both matched with the image block in the second image. That is, the corresponding image blocks in the first image are similar to the image blocks in the second image, and the corresponding image blocks in the third image are also similar to the image blocks in the second image. And finally, the terminal generates the target image according to the motion vectors respectively corresponding to the image blocks and the image blocks.
In the embodiment, the tremor image in the video is positioned by determining the jitter value of the image in the video. Then, images before and after the tremor image are taken as reference images, and relative displacement between image blocks in the reference images is found based on the tremor image and the reference images. Finally, a target image for replacing the tremor image is generated based on the image blocks in the reference image and the corresponding relative displacements of the image blocks. By the scheme, the tremor image in the video can be replaced by the normally displayed image, so that the problem that the light area in the video image is abnormally expanded due to the camera shake is solved.
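Putting steps 301 to 304 together, a high-level sketch of the whole flow might look as follows. The helpers select_reference_frames() and generate_target_image() refer to the sketches given elsewhere in this description and are not part of the embodiment itself.

```python
def process_video(frames, jitter, second_preset_threshold=360.0):
    """Replace tremor frames in a captured sequence with generated target images.

    frames: list of images in acquisition order; jitter: per-frame jitter values.
    """
    output = list(frames)
    for t in range(1, len(frames) - 1):               # first/last frames lack both-side references
        if jitter[t] < second_preset_threshold:
            continue                                   # not a tremor image
        second_idx, third_idx = select_reference_frames(t, jitter)
        output[t] = generate_target_image(frames[t], frames[second_idx], frames[third_idx])
    return output
```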
To facilitate understanding the process of determining the Motion vector corresponding to the image block in the second image by the terminal, a Motion Estimation and Motion Compensation (MEMC) method in the related art is briefly introduced below.
The MEMC method is a technology for anti-jitter or frame rate conversion, and obtains motion vectors of a plurality of image blocks in an image by estimating a motion trajectory of a continuously moving object in the image. Then, the image blocks and the obtained motion vectors are combined, and intermediate images are obtained through interpolation, so that the video frame rate is improved or the problems of jitter and tailing and the like during video playing are solved.
Referring to fig. 4, fig. 4 is a schematic view of an MEMC method according to an embodiment of the present disclosure. As shown in fig. 4, the image 1 and the image 2 are two consecutive images, the image 1 is an image acquired first, the image 2 is an image acquired later, and the image 3 is an image interpolated based on the image 1 and the image 2.
In the process that the terminal executes the MEMC, the terminal executes the blocking operation on the image 1 and the image 2 respectively to obtain a plurality of image blocks in the image 1 and a plurality of image blocks in the image 2, wherein the size of each image block is generally 8 × 8 pixels or 16 × 16 pixels. For each image block in image 1, the terminal finds the image block in image 2 that is most similar to the image block in image 1, i.e. finds the image block in image 2 that matches the image block in image 1. For example, as shown in FIG. 4, image block B in image 2 is most similar to image block A in image 1, and thus it may be determined that image block B matches image block A. Based on the image block B and the image block a, a motion vector corresponding to the image block a may be determined, where the motion vector is a relative displacement between the image block B and the image block a. That is, after the image block a moves based on the motion vector, the image block a can move to the position of the image block B. In this way, after the image block matching each image block in the image 1 in the image 2 is determined, the motion vector corresponding to each image block in the image 1 can be obtained. The process of obtaining the motion vector corresponding to the image block is the motion estimation part in the MEMC method.
After obtaining the motion vector corresponding to each image block in the image 1, the terminal may move each image block in the image 1 according to half of the corresponding motion vector based on the motion vector corresponding to each image block, and finally obtain the image 3. For example, as shown in fig. 4, when an image block a in image 1 is moved by half of the motion vector, the image block moves to position P in image 3, which is substantially midway between the position of image block a and the position of image block B. The above process of generating an image based on a motion vector is a motion compensation part in the MEMC method.
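For reference, the classic motion estimation and motion compensation described above can be sketched roughly as follows, assuming grayscale frames, 16 × 16 blocks, and a small full-search window; the block size and search range are illustrative.

```python
import numpy as np

def memc_interpolate(img1, img2, block=16, search=8):
    """Estimate per-block motion from img1 to img2 and synthesise the middle frame."""
    h, w = img1.shape
    mid = np.zeros_like(img1)
    for y in range(0, h - block + 1, block):
        for x in range(0, w - block + 1, block):
            a = img1[y:y + block, x:x + block].astype(np.int32)
            best_sad, mv = None, (0, 0)
            for dy in range(-search, search + 1):      # motion estimation: full search
                for dx in range(-search, search + 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy <= h - block and 0 <= xx <= w - block:
                        b = img2[yy:yy + block, xx:xx + block].astype(np.int32)
                        sad = int(np.abs(a - b).sum())  # sum of absolute differences
                        if best_sad is None or sad < best_sad:
                            best_sad, mv = sad, (dy, dx)
            ty, tx = y + mv[0] // 2, x + mv[1] // 2     # motion compensation: half the vector
            mid[ty:ty + block, tx:tx + block] = img1[y:y + block, x:x + block]
    return mid
```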
However, the MEMC method described above also has disadvantages. Since the MEMC method searches for a matching image block based on the most similar principle, when there are a plurality of image blocks with close similarity in the searched image, an erroneous motion vector is easily obtained, thereby causing a significant error in the finally generated image.
Referring to fig. 5, fig. 5 is another schematic diagram of an MEMC method according to an embodiment of the present application. As shown in fig. 5, it is assumed that image 2 also includes an image block B', and that image block B' is similar to image block A. That is, for image block A, both image block B and image block B' in image 2 are similar to image block A. In this case, when image 3 is generated based on the motion vector corresponding to image block A, image block A may be moved to position P or position P' in image 3. Assuming that the image block in image 2 that actually matches image block A is image block B, then if image block A is eventually placed at position P', a significant error appears in image 3.
In view of this, the embodiment of the present application introduces an original image on the basis of the MEMC method in the related art, and further determines a motion vector of an image block based on the original image, thereby ensuring the quality of a generated target image.
Specifically, in the present embodiment, in the process of obtaining the motion vector corresponding to the image block in the second image, the terminal determines the motion vector corresponding to the image block in the second image based on the third image and the first image together, so that each image block in the second image can find a matching image block in the first image and the third image based on the motion vector corresponding to the image block. In this way, the terminal can ensure that the obtained motion vector is accurate.
Taking fig. 5 as an example, in the case where it is determined that image block B and image block B' in image 2 are both similar to image block A, position P and position P' on the original image located between image 1 and image 2 are determined based on image block B and image block B'. Then, the image block located at position P and the image block located at position P' in the original image are obtained, and the similarities between image block A and these two image blocks are calculated. Finally, the motion vector of image block A is determined by combining the similarities between image block A and image blocks B and B' with the similarities between image block A and the image blocks at positions P and P' in the original image. In this way, image block A is moved to the correct position P when it is moved based on the finally obtained motion vector.
For ease of understanding, the following describes in detail a process of determining, by the terminal, motion vectors corresponding to a plurality of image blocks in the second image respectively according to the first image and the third image in the present embodiment.
In a possible implementation manner, for a plurality of image blocks in the second image, the terminal finds a motion vector corresponding to each of the plurality of image blocks one by one. How the terminal determines the motion vectors corresponding to the image blocks in the second image will be described below by taking a process of the terminal determining the motion vector corresponding to one of the image blocks as an example.
First, the terminal obtains a plurality of candidate motion vectors corresponding to a first image block. The image blocks in the second image include the first image block, and the first image block may be any one of the image blocks. Optionally, the plurality of candidate motion vectors includes: one or more preset motion vectors, one or more randomly generated motion vectors, and/or motion vectors corresponding to image blocks adjacent to the first image block.
It can be understood that, since the second image and the third image are two images with similar acquisition time, the displacement of the object in the second image relative to the third image is small, so that the motion vector corresponding to the image block in the second image can be considered to be within the preset range. In this way, the terminal may obtain one or more preset motion vectors and use these preset motion vectors as candidate motion vectors. In short, it is assumed that the object in the second image is displaced in the acquisition gap between the second image and the third image, but since the acquisition time between the second image and the third image is very close, it can be considered that the displacement of the object in the second image is within a certain range. For example, the predetermined motion vector may include (0, -1), (-1, 0), (-1, -1), (-1, 1), (1, -1), (0, 0), (0, 1), (1, 0), and (1, 1), i.e., it is considered that the image block in the second image is within 1 pixel of the displacement.
In addition, in order to ensure that the motion vector actually corresponding to the first image block can be obtained as far as possible, one or more motion vectors may be randomly generated, or the motion vectors corresponding to the image blocks adjacent to the first image block may be selected as candidate motion vectors, so that the value range of the candidate motion vectors is prevented from being too limited. Since the first image block and the image blocks adjacent to it have a high probability of representing different parts of the same object, the motion vectors corresponding to the first image block and its adjacent image blocks are likely to be the same. Therefore, selecting the motion vectors corresponding to the image blocks adjacent to the first image block as candidate motion vectors helps ensure the comprehensiveness of the candidate motion vectors. For example, in the case that the terminal finds the motion vectors corresponding to the image blocks in order from left to right and from top to bottom, the image blocks adjacent to the first image block are the four image blocks located to the upper left, directly above, to the upper right, and to the left of the first image block. In the case that the terminal obtains the motion vectors corresponding to the image blocks in another order, the image blocks adjacent to the first image block may be image blocks located in other directions relative to the first image block, and the positions of the image blocks adjacent to the first image block are not specifically limited in this embodiment.
After obtaining a plurality of candidate motion vectors, the terminal may determine, according to the position of the first image block, a second image block and a third image block corresponding to each candidate motion vector in the plurality of candidate motion vectors, where the first image includes the second image block, and the third image includes the third image block. Since the candidate motion vector is representative of a relative displacement between an image block in the third image and an image block in the second image, the terminal may determine the second image block in the first image and the third image block in the third image based on the position of the first image block in the second image and the candidate motion vector. For example, referring to fig. 6, fig. 6 is a schematic diagram illustrating determining an image block based on a candidate motion vector according to an embodiment of the present application. As shown in fig. 6, it is assumed that the second image is a previous image of the first image, the third image is a subsequent image of the first image, the coordinates of the position where the first image block is located in the second image are (0, 0), and the candidate motion vector corresponding to the first image block is (2, -2). Then, based on the candidate motion vector corresponding to the first image block, the coordinates of the location of the third image block in the third image are found to be (2, -2), and the coordinates of the location of the second image block in the first image are found to be (1, -1).
Secondly, the terminal determines a target error value corresponding to each candidate motion vector according to the first image block, the second image block and the third image block, wherein the target error value is obtained based on an error value between the first image block and the second image block and an error value between the first image block and the third image block. Wherein the error value between the two image blocks represents a difference between the two image blocks, the larger the error value, the larger the difference between the two image blocks. For example, after determining the second image block and the third image block corresponding to the first image block based on the candidate motion vectors, the terminal may find an error value between the first image block and the second image block and an error value between the first image block and the third image block based on a Sum of Absolute Differences (SAD) algorithm. Then, the terminal determines a target error value based on an error value between the first image block and the second image block and an error value between the first image block and the third image block. For example, the error value between the first image block and the second image block and the error value between the first image block and the third image block are added to obtain a target error value.
And finally, the terminal determines the motion vector with the minimum target error value as the motion vector corresponding to the first image block in the candidate motion vectors according to the target error value corresponding to each candidate motion vector.
In practical application, the terminal may obtain a motion vector corresponding to each image block in the second image based on the above procedures, and finally obtain motion vectors corresponding to a plurality of image blocks in the second image.
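A sketch of the candidate evaluation for one image block of the second image, assuming grayscale frames and that a candidate vector (dy, dx) places the matching block of the third image at the full displacement and the matching block of the first image at roughly half the displacement, as in the example of fig. 6; the integer halving, the candidate list, and the function names are illustrative.

```python
import numpy as np

PRESET_CANDIDATES = [(0, -1), (-1, 0), (-1, -1), (-1, 1), (1, -1),
                     (0, 0), (0, 1), (1, 0), (1, 1)]

def block_at(img, y, x, block):
    """Return the block at (y, x), or None if it falls outside the image."""
    h, w = img.shape
    if 0 <= y <= h - block and 0 <= x <= w - block:
        return img[y:y + block, x:x + block].astype(np.int32)
    return None

def best_motion_vector(second, first, third, y, x, block=16, extra_candidates=()):
    """Pick the candidate vector with the smallest target error for the block at (y, x).

    The target error is the SAD against the first image plus the SAD against the third image.
    extra_candidates may hold randomly generated vectors or vectors of adjacent blocks.
    """
    src = second[y:y + block, x:x + block].astype(np.int32)
    best_mv, best_err = (0, 0), None
    for dy, dx in list(PRESET_CANDIDATES) + list(extra_candidates):
        b_third = block_at(third, y + dy, x + dx, block)            # full displacement
        b_first = block_at(first, y + dy // 2, x + dx // 2, block)  # roughly half the displacement
        if b_third is None or b_first is None:
            continue
        err = int(np.abs(src - b_first).sum() + np.abs(src - b_third).sum())
        if best_err is None or err < best_err:
            best_err, best_mv = err, (dy, dx)
    return best_mv, best_err
```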
In practical applications, a large number of image blocks are included in the second image, and a part of the image blocks in the second image may not be able to determine corresponding motion vectors because a matching image block cannot be found in the third image. Or, after moving based on the corresponding motion vector, some image blocks in the second image may move to the same position. In this way, in generating the target image based on the plurality of image blocks in the second image, there may be an area in the generated target image where the image cannot be generated based on the plurality of image blocks. These areas where no image can be generated can be referred to as void areas.
For example, referring to fig. 7, fig. 7 is a schematic diagram of generating a target image according to an embodiment of the present application. As shown in fig. 7, the image block at position (1, 1) in the second image shows a person, the image block at position (2, 1) in the second image shows a roadblock, and the image block at position (3, 1) in the second image shows a box. The image block at position (1, 1) in the first image shows a fire hydrant, the image block at position (2, 1) in the first image shows a person, and the image block at position (3, 1) in the first image shows a box. The image block at position (1, 1) in the third image shows a fire hydrant, the image block at position (2, 1) in the third image shows a roadblock, and the image block at position (3, 1) in the third image shows a person.
In fig. 7, when the motion vector corresponding to the image block in the second image is obtained based on the first image and the third image, the motion vector corresponding to the image block 1 at the position (1, 1) in the second image is (2, 0), the motion vector corresponding to the image block 2 at the position (2, 1) in the second image is (0, 0), and the motion vector corresponding to the image block 3 at the position (3, 1) in the second image is (0, 0). Based on the motion vector found above, the image block 1 in the second image has moved to the position (2, 1) in the target image, the image block 2 has also moved to the position (2, 1) in the target image, and the image block 3 has moved to the position (3, 1) in the target image. Finally, the position (1, 1) in the target image has no image that can be displayed, that is, the position (1, 1) is a hollow region.
In this embodiment, after obtaining the motion vectors corresponding to the plurality of image blocks in the second image, the terminal may determine the corresponding positions of the image blocks in the target image based on the motion vector corresponding to each image block. If the different image blocks in the second image have the same position in the target image, the terminal needs to determine one image block displayed on the target image among the plurality of different image blocks. For example, in fig. 7, both the image block 1 and the image block 2 in the second image need to be moved to the position (2, 1) in the target image, i.e. both the corresponding positions of the image block 1 and the image block 2 in the target image are (2, 1).
Specifically, after determining that corresponding positions of a plurality of different image blocks in the second image in the target image are the same, the terminal obtains target error values of motion vectors corresponding to the plurality of different image blocks respectively. As can be seen from the foregoing embodiments, the target error value of the motion vector of each image block in the second image is obtained based on the error value between the image block and the image block at the corresponding position in the first image and the error value between the image block and the image block at the corresponding position in the third image. And finally, the terminal determines an image block with the minimum target error value from a plurality of different image blocks, and displays the image block with the minimum target error value on the target image.
For the image block 1 and the image block 2 in fig. 7, the target error value of the motion vector corresponding to the image block 1 is obtained based on the error value between the image block 1 and the image block with position (2, 1) in the first image and the image block with position (3, 1) in the image block 1 and the third image; the target error value of the motion vector corresponding to the image block 2 is obtained based on the error value between the image block 2 and the image block with the position (2, 1) in the first image and the image block with the position (3, 1) in the image block 2 and the third image. Obviously, the target error value of the motion vector corresponding to the image block 1 is smaller than the target error value of the motion vector corresponding to the image block 2. Thus, position (2, 1) in the target image is the last image block 1 in the displayed second image.
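A sketch of the motion compensation step with this collision rule, assuming the per-block vectors and target errors produced by a routine like best_motion_vector() above; blocks are pasted at half their motion vector because the target image sits between the second and third images in time, and the simplified collision handling (a block is written only if its error beats every pixel it would cover) is an illustrative choice.

```python
import numpy as np

def compose_target_from_second(second, vectors, block=16):
    """Move blocks of the second image to their target positions.

    vectors: dict mapping the block origin (y, x) to ((dy, dx), target_error).
    Returns the partially filled target image and a mask of covered pixels;
    uncovered pixels form the hole (void) areas.
    """
    h, w = second.shape
    target = np.zeros_like(second)
    written_err = np.full((h, w), np.inf)
    covered = np.zeros((h, w), dtype=bool)
    for (y, x), ((dy, dx), err) in vectors.items():
        ty, tx = y + dy // 2, x + dx // 2                 # target position: half the vector
        if not (0 <= ty <= h - block and 0 <= tx <= w - block):
            continue
        dest_err = written_err[ty:ty + block, tx:tx + block]
        if err < dest_err.min():                          # smallest target error wins the collision
            target[ty:ty + block, tx:tx + block] = second[y:y + block, x:x + block]
            written_err[ty:ty + block, tx:tx + block] = err
            covered[ty:ty + block, tx:tx + block] = True
    return target, covered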
In order to eliminate the void region in the target image, in this embodiment, motion vectors corresponding to a plurality of image blocks in the third image are determined based on the first image and the second image, and the target image is updated according to the motion vectors corresponding to the plurality of image blocks in the third image, so as to finally obtain an updated target image.
For example, after the terminal generates the target image based on the plurality of image blocks of the second image, if the target image has an area in which the image cannot be generated based on the plurality of image blocks, the terminal divides the third image to obtain the plurality of image blocks of the third image.
Then, the terminal determines motion vectors corresponding to a plurality of image blocks of the third image according to the first image and the second image. The process of determining, by the terminal, the motion vectors corresponding to the plurality of image blocks of the third image is similar to the process of determining, by the terminal, the motion vectors corresponding to the plurality of image blocks of the second image, and specific reference may be made to the description of the above embodiment, which is not described herein again.
And finally, the terminal updates the target image according to the motion vectors corresponding to the image blocks of the third image and the image blocks of the third image to obtain a new target image, wherein the new target image is used for replacing the first image. The updating of the target image by the terminal means that the terminal moves the image blocks in the third image to the void areas in the target image according to the motion vectors corresponding to the plurality of image blocks of the third image, so that the void areas are filled. For the areas of the target image where the image already exists, the areas where the image already exists are not updated.
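The hole filling can then be sketched as follows, assuming vectors_third holds the motion vectors determined for the blocks of the third image in the same format as above; only pixels that are still uncovered are written, so areas where the image already exists are left unchanged.

```python
def fill_holes_from_third(target, covered, third, vectors_third, block=16):
    """Fill the remaining hole areas of the target image with blocks of the third image."""
    h, w = third.shape
    for (y, x), ((dy, dx), _err) in vectors_third.items():
        ty, tx = y + dy // 2, x + dx // 2                 # target position: half the vector
        if not (0 <= ty <= h - block and 0 <= tx <= w - block):
            continue
        hole = ~covered[ty:ty + block, tx:tx + block]     # pixels not yet covered
        if hole.any():
            dest = target[ty:ty + block, tx:tx + block]
            src = third[y:y + block, x:x + block]
            dest[hole] = src[hole]                        # write only into the hole pixels
            covered[ty:ty + block, tx:tx + block] = True
    return target
```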
For example, referring to fig. 8, fig. 8 is another schematic diagram of generating a target image provided in the embodiment of the present application. Fig. 8 is based on the embodiment shown in fig. 7; the target image is updated based on the motion vectors corresponding to the image blocks in the third image, so as to obtain an updated target image.
In fig. 8, when the motion vectors corresponding to the image blocks in the third image are obtained based on the first image and the second image, the motion vector corresponding to the image block 1' at the position (1, 1) in the third image is (0, 0), the motion vector corresponding to the image block 2' at the position (2, 1) in the third image is (0, 0), and the motion vector corresponding to the image block 3' at the position (3, 1) in the third image is (-2, 0).
Based on the motion vectors found above, the image block 1' in the third image needs to be moved to the position (1, 1) in the target image, the image block 2' needs to be moved to the position (2, 1) in the target image, and the image block 3' needs to be moved to the position (2, 1) in the target image. Because the position of the hole area generated in the target image is (1, 1), the target image can be updated simply by moving the image block 1' in the third image to the position (1, 1) in the target image based on the motion vector, so as to obtain the updated target image. As can be seen from fig. 8, the updated target image completes the filling of the hole area, and the updated target image can stably display the content included in the first image.
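The hole-filling step can be sketched in a similar way. The sketch below assumes that a motion vector gives the displacement from a block's position in the third image to its position in the target image and that positions are simple (column, row) tuples; both are illustrative assumptions, and only positions that are still empty (hole areas) are written.

```python
def fill_holes(target_blocks, third_blocks, motion_vectors):
    """third_blocks and motion_vectors are dicts keyed by (col, row) positions."""
    for pos, block in third_blocks.items():
        vec = motion_vectors[pos]
        dst = (pos[0] + vec[0], pos[1] + vec[1])   # position in the target image
        if dst not in target_blocks:               # only hole areas are updated
            target_blocks[dst] = block
    return target_blocks
```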
For ease of understanding, the video processing method of the present embodiment will be described in detail below with reference to specific examples.
Referring to fig. 9, fig. 9 is a schematic diagram of an application architecture of a video processing method according to an embodiment of the present disclosure. As shown in fig. 9, the terminal includes an image acquisition device, a gyroscope, an image signal processing (ISP) module, and a display component. The ISP module comprises a tremor determination unit and a tremor removal unit. Optionally, the ISP module further comprises an EIS unit. The image acquisition device transmits a plurality of acquired images to the EIS unit, and the EIS unit performs an anti-shake operation on the images. The tremor determination unit acquires the plurality of images output by the EIS unit and determines whether a tremor image exists among them, and the tremor removal unit then generates a target image for replacing the tremor image. The tremor removal unit replaces the tremor image in the plurality of images with the target image, outputs the plurality of images after the replacement to the display component, and the display component displays them.
Referring to fig. 10 and 11, fig. 10 is a schematic flowchart of replacing a tremor image provided in the embodiment of the present application, and fig. 11 is another schematic diagram of replacing a tremor image provided in the embodiment of the present application. The video processing method shown in fig. 10 includes the following steps 1001 to 1006.
Step 1001, acquiring a plurality of images, and determining whether a local lighting area exists in an image t in the plurality of images.
After the image is acquired by the image acquisition device of the terminal, the tremor determination unit may acquire a plurality of images, which may be subjected to anti-shake processing by the EIS unit. The tremor determination unit determines whether images in the plurality of images are tremor images one by one.
Illustratively, the tremor determination unit first determines whether there is a local lighting area in an image t of the plurality of images. Taking the YUV format as an example of the color space format of the image, the terminal may first determine the number of target pixels located in the light area in the image t, and then determine the ratio of the number of target pixels to the total number of pixels of the image t. If this ratio is within 2%-20%, it is determined that the image has a local lighting area. In a specific implementation, the terminal may set the initial values of the variable total and the variable light both to 0. For each pixel p in the image t, the value of the variable total is increased by 1; if pixel p is located in a light area, light is also increased by 1. After all pixels in image t have been traversed, r = light/total is calculated. If r is within 2%-20%, the image t is considered to have a local lighting area; if r is not within 2%-20%, the image t is considered not to have a local lighting area, i.e., image t is not a tremor image.
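As an illustration only, the ratio test above may be sketched as follows, assuming the luma (Y) plane of the YUV image is available as a 2-D array and that is_light_pixel() is a hypothetical helper encoding the light-area criterion.

```python
import numpy as np

def has_local_light_area(y_plane, is_light_pixel, low=0.02, high=0.20):
    total = y_plane.size                                      # variable "total"
    light = int(np.count_nonzero(is_light_pixel(y_plane)))    # variable "light"
    r = light / total
    return low <= r <= high
```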
Step 1002, determining whether the jitter value of the image t is greater than or equal to a second preset threshold.
After determining that a local light area exists in the image t, the tremor determination unit further determines whether the jitter value of the image t is greater than or equal to a second preset threshold, so as to determine whether the image t is a tremor image. The second preset threshold may be 360, for example.
If the jitter value of the image t is greater than or equal to a second preset threshold value, determining that the image t is a tremor image; if the jitter value of the image t is less than the second preset threshold, it may be determined that the image t is not a tremor image.
In this embodiment, the tremor determination unit may obtain gyroscope data acquired by a gyroscope in the terminal, and determine the tremor value of the image t based on the gyroscope data.
First, the corresponding gyroscope data is acquired according to the acquisition timestamp information of the image t. If the sampling rate f of the gyroscope data is less than a preset threshold fs, the gyroscope data is up-sampled based on an up-sampling algorithm so that the sampling rate is greater than or equal to the threshold fs. The threshold fs may be, for example, 1000 Hz, and may be adjusted as needed in practical applications. The up-sampling algorithm may use any existing sampling algorithm, such as the cubic convolution interpolation algorithm. In brief, if f < fs, a new sampling rate f' greater than or equal to fs is selected, where f' is, for example, an integral multiple of f; then the up-sampling algorithm is performed to raise the sampling rate from f to f'.
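A possible sketch of the up-sampling step is given below. It assumes the gyroscope samples are given as a timestamp array and an (N, 3) angular-velocity array, and uses linear interpolation (np.interp) as a stand-in; the embodiment mentions cubic convolution interpolation, which could be substituted without changing the overall flow.

```python
import numpy as np

def upsample_gyro(t, xyz, fs=1000.0):
    """t: (N,) sample timestamps in seconds; xyz: (N, 3) angular velocities."""
    f = 1.0 / float(np.mean(np.diff(t)))          # current sampling rate
    if f >= fs:
        return t, xyz
    factor = int(np.ceil(fs / f))                 # f' chosen as an integer multiple of f
    t_new = np.linspace(t[0], t[-1], (len(t) - 1) * factor + 1)
    xyz_new = np.stack([np.interp(t_new, t, xyz[:, d]) for d in range(3)], axis=1)
    return t_new, xyz_new
```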
The gyroscope data consists of angular velocities in three dimensions, and the angular velocities collected by the gyroscope in the different dimensions are represented by x, y and z, respectively. One example of the gyroscope data is shown in table 1.
TABLE 1
Number      Time    x      y      z
0 (start)   t0      -      -      -
1           t1      x1     y1     z1
2           t2      x2     y2     z2
3           t3      x3     y3     z3
...         ...     ...    ...    ...
34 (end)    t34     x34    y34    z34
In table 1, start refers to the time when the current image starts exposure, and end refers to the time when the current image ends exposure.
Since data such as x1, y1 and z1 indicate angular velocities, the angular displacement from time t[i] to t[i+1] can be approximated as s[i] = ((t[i+1]-t[i])·x[i+1], (t[i+1]-t[i])·y[i+1], (t[i+1]-t[i])·z[i+1]), for i = 0, 1, ..., 33, which covers the angular displacements from t0-t1 up to t33-t34. For example, the angular displacement from t0 to t1 is approximated using the gyroscope data numbered 1, i.e., s[0] = ((t[1]-t[0])·x[1], (t[1]-t[0])·y[1], (t[1]-t[0])·z[1]).
Based on the angular displacements s[i], the gyroscope positions at times t1, t2, ..., t34 can be calculated cumulatively. Specifically, the position at t0 is l[0] = (0, 0, 0), and l[i] = l[i-1] + s[i-1] for i = 1, 2, ..., 34.
Therefore, based on the above gyroscope data, 35 position values of the gyroscope in each dimension can be obtained.
For the 35 position values on the x-axis, the maximum and minimum values are taken, their difference is computed, and the difference is squared to obtain the x-axis result. Similarly, the y-axis result and the z-axis result are obtained from the 35 position values on the y-axis and the 35 position values on the z-axis. Finally, the x-axis result, the y-axis result and the z-axis result are summed, and the square root of the sum is taken to obtain the jitter value of the image.
Alternatively, instead of obtaining the x-axis result from the squared difference between the maximum and minimum of the 35 position values, the x-axis result may be obtained as the variance of the 35 position values. Similarly, the y-axis result and the z-axis result are obtained as the variances of the 35 position values on the y-axis and on the z-axis. Finally, the three axis results are summed, and the square root of the sum is taken to obtain the jitter value of the image.
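Both variants of the jitter value computation can be sketched as follows; the array layout, and the assumption that row 0 of the angular-velocity array carries no data (as in table 1), are illustrative.

```python
import numpy as np

def jitter_value(t, xyz, use_variance=False):
    """t: (N,) timestamps; xyz: (N, 3) angular velocities (row 0 carries no data)."""
    dt = np.diff(t)[:, None]                                  # (N-1, 1) interval lengths
    s = dt * xyz[1:]                                          # angular displacement per interval
    l = np.vstack([np.zeros((1, 3)), np.cumsum(s, axis=0)])   # positions l[0..N-1]
    if use_variance:
        per_axis = l.var(axis=0)                              # variance variant
    else:
        per_axis = (l.max(axis=0) - l.min(axis=0)) ** 2       # (max - min)^2 variant
    return float(np.sqrt(per_axis.sum()))
```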
Optionally, in some cases, when the terminal shakes slowly at a low frequency, the image shot by the terminal may be blurred, but the light area in the image does not expand abnormally; when the terminal shakes strongly at a high frequency, the light area in the image shot by the terminal will expand abnormally. Based on this, in the process of obtaining the jitter value of the image from the gyroscope data, the spectrum information corresponding to the gyroscope data may be obtained, and the jitter value of the image may be determined based on the spectrum information.
Illustratively, for the data in table 1, a Fourier transform is performed on each of the x-axis, y-axis and z-axis dimensions to obtain a spectrogram. For example, the Fourier transform result on the x-axis may be computed based on the following formula 1.
X_k = Σ_{n=0}^{N-1} x[n]·e^(-j·2π·k·n/N)    (formula 1)
where X_k is the Fourier transform result, k ∈ {0, 1, 2, 3, ..., 34}, x[n] represents the gyroscope data on the x-axis, and N is 35.
Let w = |X_0| / Σ(|X_i|), where w represents the ratio of the lowest-frequency information in the x-axis gyroscope data and Σ() represents the summation operation. After the x-axis result is obtained, the x-axis result is multiplied by (1-w) to obtain an updated x-axis result. By analogy, the updated y-axis and z-axis results are obtained in a similar manner. Finally, the updated x-axis, y-axis and z-axis results are summed, and the square root of the sum is taken to obtain the jitter value of the image.
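A sketch of this frequency-weighted variant is given below; it assumes the angular velocities and the integrated positions from the previous sketch are available as (N, 3) arrays, and uses the range-based per-axis result for illustration.

```python
import numpy as np

def weighted_jitter_value(xyz, positions):
    """xyz: (N, 3) gyroscope angular velocities; positions: (N, 3) integrated positions."""
    per_axis = []
    for d in range(3):
        X = np.fft.fft(xyz[:, d])                 # formula 1, applied to one axis
        w = np.abs(X[0]) / np.sum(np.abs(X))      # share of the lowest-frequency component
        axis_result = (positions[:, d].max() - positions[:, d].min()) ** 2
        per_axis.append((1.0 - w) * axis_result)  # down-weight slow, low-frequency motion
    return float(np.sqrt(sum(per_axis)))
```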
Step 1003, acquiring jitter values of the images t-k to t + k.
When it is determined that the image t is a tremor image, the tremor determination unit sends the image t to the tremor removal unit. In addition, the tremor determination unit acquires the jitter values of the images t-k to t+k and inputs them to the tremor removal unit. The jitter values of the images t-k to t-1 may already have been obtained before the jitter value of the image t is obtained.
In this embodiment, the way of acquiring the jitter values of the images t-k to t + k by the tremor determination unit is similar to the way of acquiring the jitter value of the image t, and is not repeated herein.
Step 1004, determining the main reference image M and the secondary reference image m according to the jitter values of the images t-k to t+k.
After obtaining the jitter values of the images t-k to t+k, the terminal determines the main reference image M and the secondary reference image m among the images t-k to t-1 and the images t+1 to t+k. Specifically, assume that the stable value of any one image p among the images t-k to t-1 and t+1 to t+k is H, and the jitter value of the image p is p_d. The stable value H is then: H = 1/(p_d + 100·|p - t|). It can be seen that the larger the jitter value of an image p, the smaller its stable value; and the further apart an image p is from the image t, the smaller its stable value.
In this way, the terminal can calculate the stable values H of the images t-k to t-1 and t+1 to t+k respectively, and select the image with the largest stable value H among them as the main reference image M. If the main reference image M is one of the images t-k to t-1, the terminal continues to select the image with the largest stable value H among the images t+1 to t+k as the secondary reference image m. If the main reference image M is one of the images t+1 to t+k, the terminal continues to select the image with the largest stable value H among the images t-k to t-1 as the secondary reference image m.
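The selection of the two reference images can be sketched as follows, assuming jitter values are available for all images t-k to t+k; the image indices and the dictionary of jitter values are illustrative.

```python
def select_references(t, k, jitter):
    """jitter: dict mapping an image index p to its jitter value."""
    def stable_value(p):
        return 1.0 / (jitter[p] + 100 * abs(p - t))          # H = 1 / (p_d + 100*|p - t|)
    candidates = [p for p in range(t - k, t + k + 1) if p != t]
    M = max(candidates, key=stable_value)                     # main reference image
    other_side = [p for p in candidates if (p < t) != (M < t)]
    m = max(other_side, key=stable_value)                     # secondary reference image
    return M, m
```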
Furthermore, the tremor removal unit may define an algorithm help value Q, which is positively correlated with the stable value H1 of the main reference image M and the stable value H2 of the secondary reference image m. The algorithm help value Q represents how helpful the main reference image M and the secondary reference image m are for generating a stably displayed target image. For example, Q = (2·H1 + H2)/3 may be defined. On this basis, a help value threshold may be set in advance; if the algorithm help value Q is lower than the help value threshold, it may be determined that it is difficult to stabilize the image t based on the main reference image and the secondary reference image. For example, when the main reference image and the secondary reference image near the image t are themselves tremor images, they cannot be used to make the image t more stable. Thus, when the algorithm help value Q is below the help value threshold, the tremor removal unit may end the processing of the image t, i.e., no longer generate a target image for replacing the image t.
Step 1005, generating a target image according to the main reference image M, the secondary reference image m and the image t.
In this embodiment, the target image for replacing the image t is generated by dividing the main reference image M and the secondary reference image m into a plurality of image blocks. When the resolution of the main reference image M and the secondary reference image m is high, dividing them into image blocks and computing the motion vectors of the image blocks directly would consume too much computing power; therefore, a pyramid down-sampling method may be adopted in this embodiment to obtain the motion vectors of the image blocks in the main reference image M and the secondary reference image m.
Specifically, pyramid down-sampling is first performed on the main reference image M and the secondary reference image m to obtain multi-layer down-sampled images corresponding to the main reference image M and the secondary reference image m. For example, when the resolution of the main reference image M is 1920×1080, the main reference image M may be down-sampled 4 times to obtain a four-layer down-sampled image corresponding to the main reference image M. The resolutions of the four down-sampled layers may be, for example: 480×270, 120×68, 60×34, and 30×17. For the main reference image M and the secondary reference image m, the motion vectors of the image blocks are obtained layer by layer, from the layer with the smallest resolution to the layer where the original image is located. The main reference image Mx denotes the image corresponding to the main reference image M in any given layer.
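A minimal sketch of building such a pyramid is shown below; it assumes OpenCV is available and uses the example layer resolutions given above, which are illustrative.

```python
import cv2  # assumption: OpenCV is available

def build_pyramid(img, sizes=((480, 270), (120, 68), (60, 34), (30, 17))):
    """Returns the down-sampled layers, ordered from the largest to the smallest resolution."""
    return [cv2.resize(img, size, interpolation=cv2.INTER_AREA) for size in sizes]
```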
First, the main reference image Mx of the current layer is divided into a plurality of image blocks of a specific size (for example, 8×8 pixels).
If the main reference image Mx is the image of the layer with the smallest resolution, then for an image block q in the main reference image Mx, the candidate motion vectors of the image block q may be determined to be preset candidate motion vectors. For example, the preset motion vectors may include (0, -1), (-1, 0), (-1, -1), (-1, 1), (1, -1), (0, 0), (0, 1), (1, 0), and (1, 1). If the main reference image Mx is not the image of the layer with the smallest resolution, then for an image block q in the main reference image Mx, the candidate motion vectors of the image block q may be determined to include: the motion vector of the image block corresponding to the image block q in the previous layer, the motion vectors of the image blocks adjacent to the image block q, and a randomly generated motion vector.
For each candidate motion vector of an image block q, the positions to which the image block q is mapped in the secondary reference image mx and in the image t according to that candidate motion vector are calculated. Specifically, when the image block q is mapped onto the image t according to the candidate motion vector vec, the candidate motion vector is scaled to vec' = ((t - M)/(m - M))·vec according to the values of M, m and t. After the image block q1 corresponding to the image block q in the secondary reference image mx and the image block q2 corresponding to the image block q in the image t are obtained, the error value between the image block q and the image block q1 and the error value between the image block q and the image block q2 are calculated respectively. Illustratively, the error values between image blocks may be calculated based on a variety of algorithms, such as the SAD algorithm. Finally, assuming that the error value between the image block q and the image block q1 is SAD1, and the error value between the image block q and the image block q2 is SAD2, the two error values are weighted and summed to obtain the error value D of the candidate motion vector. The weights of the two error values can be set arbitrarily, for example, D = SAD1 + α·SAD2 with α = 1/2.
After the error values D corresponding to all candidate motion vectors of the image block q are obtained, the candidate motion vector with the smallest error value D is selected as the motion vector of the image block q in the current pyramid layer. By repeating the above process, the motion vectors of all image blocks in the main reference image Mx can be calculated.
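The scoring of a single candidate motion vector can be sketched as follows; extract_block() is a hypothetical helper returning the block at a given position, and M_idx, m_idx and t_idx stand for the frame indices of the main reference image, the secondary reference image and the image t.

```python
import numpy as np

def candidate_error(q_block, q_pos, vec, m_layer, t_img, M_idx, m_idx, t_idx,
                    extract_block, alpha=0.5):
    """Scores one candidate motion vector vec for block q at q_pos in the main reference layer."""
    # Position of q in the secondary reference layer mx, using vec directly.
    q1 = extract_block(m_layer, (q_pos[0] + vec[0], q_pos[1] + vec[1]))
    # Position of q in image t, using the scaled vector vec'.
    scale = (t_idx - M_idx) / (m_idx - M_idx)
    q2 = extract_block(t_img, (q_pos[0] + int(round(scale * vec[0])),
                               q_pos[1] + int(round(scale * vec[1]))))
    sad1 = np.abs(q_block.astype(np.int32) - q1.astype(np.int32)).sum()
    sad2 = np.abs(q_block.astype(np.int32) - q2.astype(np.int32)).sum()
    return sad1 + alpha * sad2          # D = SAD1 + alpha * SAD2
```

The candidate with the smallest D is then kept as the motion vector of block q in the current pyramid layer.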
In addition, after the motion vectors of the layer where the original main reference image M is located are obtained, the motion vectors of the main reference image M may be smoothed to eliminate individual obviously unreasonable motion vectors. For example, when the motion vector of an image block differs too much from the motion vectors of its adjacent image blocks in the main reference image, the motion vector of that image block is eliminated.
After the motion vector of each image block in the main reference image M is obtained, and assuming that the target image to be generated is an image t', the motion vector of each image block in the main reference image M may be converted, according to the conversion ratio, into a motion vector vec' = (-(t - M)/(m - M))·vec pointing from the image t' to the main reference image M.
In this way, if an image block in the image t' has no motion vector pointing to the main reference image M, it indicates that the image block belongs to a hole area, and the image block can therefore be marked as a hole area.
After the hole areas in the image t' are marked, the image blocks of the non-hole areas in the image t' can be filled according to the image blocks in the main reference image M, so as to obtain an image t' that still includes hole areas.
It should be noted that, in the process of filling the image blocks of the non-hole areas in the image t' according to the image blocks in the main reference image M, if a plurality of different image blocks in the main reference image M point to the same image block in the image t', the terminal selects, according to the error values D of the motion vectors of the plurality of different image blocks, the image block with the smallest error value D and fills it into the image t'.
In addition, the tremor removal unit can calculate the motion vector corresponding to each image block in the secondary reference image m, and fill the hole areas in the image t' according to the image blocks in the secondary reference image m to obtain the final target image t'.
Step 1006, replacing the image t with the target image.
Finally, after the target image is obtained based on the main reference image M, the secondary reference image m, and the image t, the image t is replaced with the target image.
Similarly, for the plurality of images acquired by the terminal, based on the above steps 1001 to 1006, the tremor images among the plurality of images can be determined and replaced with the generated stable images, and the plurality of images after replacement are finally output.
The above embodiments describe the process of processing the video by the terminal during video shooting. The following describes the process of processing a video when the terminal is in the night scene shooting mode. Referring to fig. 12, fig. 12 is a schematic flowchart of another video processing method provided in this embodiment, where the video processing method includes the following steps.
Step 1201, when the terminal is in a night scene shooting mode, detecting whether a light area exists in an image shot by the terminal.
The terminal can determine whether a light area exists in the image by detecting the brightness values of the pixels in the shot image. Specifically, the brightness values of the pixels in the light area are all greater than a specific threshold, and the difference between the brightness value of a pixel in the light area and the brightness value of a pixel in the area adjacent to the light area is greater than a preset difference. The specific threshold may be, for example, 180-230, for instance 224; the preset difference may be, for example, 25-50, for instance 32. For example, when the terminal detects that the brightness values of the pixels in a certain area are all greater than 224 and the difference between the brightness value of a pixel in the area and the brightness value of a pixel in the adjacent area is greater than 32, the terminal may determine that the area is a light area. The area adjacent to the light area may be formed by pixels adjacent to pixels in the light area, that is, the pixels in the adjacent area are adjacent to the pixels in the light area.
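As a rough illustration only, the brightness-based test may be sketched as follows on the luma (Y) plane; the 3×3 neighborhood mean used to approximate the brightness of the adjacent area is an assumption of the sketch, not part of the embodiment, and the thresholds follow the example values above.

```python
import numpy as np

def light_area_mask(y_plane, bright_thr=224, diff_thr=32):
    """y_plane: 2-D uint8 luma plane. Returns a boolean mask of light-area pixels."""
    y = y_plane.astype(np.float32)
    bright = y > bright_thr
    # Rough estimate of the brightness of the surrounding area: 3x3 neighborhood mean.
    padded = np.pad(y, 1, mode='edge')
    neigh = sum(padded[dy:dy + y.shape[0], dx:dx + y.shape[1]]
                for dy in range(3) for dx in range(3)) / 9.0
    # A pixel belongs to a light area if it is bright and clearly brighter than its surroundings.
    return bright & ((y - neigh) > diff_thr)
```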
In addition, after the terminal determines that a light area exists in the image, the terminal can also determine the position of the light area in the image according to the coordinates of the pixels located in the light area. The position of the light area in the image may be determined by the coordinates of the pixels at the boundary of the light area, that is, the area surrounded by the pixels at the boundary of the light area is the light area. The terminal may determine the position of the light area by recording the coordinates of the pixels at the boundary of the light area or the coordinates of all the pixels within the light area.
Step 1202, when a light area exists in an image shot by the terminal, the terminal determines a first image in an acquired image sequence, and an overexposure occurs in an area adjacent to the light area in the first image.
The fact that overexposure occurs in the area adjacent to the light area in the first image means that the brightness value of pixels in the adjacent area is too high. The reason why the adjacent area of the light area in the first image is over-exposed is that the terminal is shaken and displaced when the first image is collected, so that the light sensing element in the terminal corresponding to the adjacent area also senses light in the light area, and finally the adjacent area of the light area in the first image is over-exposed.
Step 1203, the terminal determines a second image and a third image in the image sequence according to the first image.
The acquisition time of the first image is between the acquisition time of the second image and the acquisition time of the third image, and the overexposure degree of the area adjacent to the lighting area in the second image and the third image is less than the overexposure degree of the area adjacent to the lighting area in the first image. The lighting areas in the second image and the third image have a corresponding relationship with the lighting area in the first image, that is, the lighting areas in the second image and the third image and the lighting area in the first image represent the same picture content. For example, the light areas in the second image and the third image represent the light-emitting areas of a street lamp; the light areas in the first image are also light-emitting areas representing the same street lamp.
Specifically, the fact that the degree of overexposure of the area adjacent to the lighting area in the second image and the third image is less than the degree of overexposure of the area adjacent to the lighting area in the first image may be: the area of an area subjected to overexposure in the adjacent areas of the lighting areas in the second image and the third image is smaller than the area of an area subjected to overexposure in the adjacent areas of the lighting areas in the first image; or the brightness values of the pixels with the overexposure in the adjacent areas of the lighting areas in the second image and the third image are smaller than the brightness values of the pixels with the overexposure in the adjacent areas of the lighting areas in the first image. Optionally, the area of a region where overexposure occurs in the vicinity of the light region in the second image and the third image may be 0, that is, the vicinity of the light region in the second image and the third image is not overexposed. In general, the second image and the third image are acquired by the terminal with little or no shake displacement relative to the first image, so that the degree of overexposure of the adjacent area of the light area in the second image and the third image is less than that in the first image.
Step 1204, the terminal generates a target image according to the first image, the second image and the third image, the target image is used for replacing the first image, and the degree of overexposure of the area adjacent to the lighting area of the target image is smaller than the degree of overexposure of the area adjacent to the lighting area of the first image. The second image and the third image are used as reference images to generate a target image replacing the first image, so that the degree of overexposure of the adjacent area of the lighting area of the generated target image is smaller than that of the adjacent area of the lighting area of the first image.
And step 1205, the terminal generates a video based on the target image. Specifically, the terminal may replace a first image in the acquired image sequence with a target image, thereby generating a video.
In this embodiment, a tremor image in which the area adjacent to the light area is overexposed is located in the night scene shooting mode, the images before and after the tremor image are used as reference images, and a target image for replacing the tremor image is generated based on the tremor image and the reference images. With this scheme, the tremor image in the video can be replaced by a normally displayed image, thereby solving the problem that the light area in the video image expands abnormally due to camera shake.
Optionally, the method provided in this embodiment further includes: and when the illuminance of the ambient light is smaller than a first preset threshold, the terminal enters a night scene shooting mode. Illustratively, the value of the first preset threshold may be 30-60, for example, the first preset threshold is 50. The terminal can detect the illuminance of the ambient light in the shooting process, and when the illuminance of the ambient light is less than 50, the terminal enters a night scene shooting mode. Or when the terminal acquires an instruction for triggering the night scene shooting mode, the terminal enters the night scene shooting mode. For example, in the process of shooting by operating the terminal by the user, the user can issue an instruction for entering a night scene shooting mode to the terminal in a mode of touching a terminal screen; therefore, when the terminal acquires the instruction for triggering the night scene shooting mode, the terminal enters the night scene shooting mode.
In addition, the terminal may determine whether the current shooting scene is a night scene according to the picture content of the preview image and/or the environment brightness value of each area of the preview image during the shooting process, so as to determine whether to enter a night scene shooting mode. For example, when the picture content of the preview image includes night sky or night scene light source and the like, the terminal may determine that the current shooting scene is a night scene, and thus enter a night scene shooting mode; or, when the environment brightness value in each region of the preview image conforms to the brightness distribution characteristic of the image in the night scene environment, the terminal may determine that the current shooting scene is the night scene, and thus enter the night scene shooting mode.
Optionally, the jitter displacement when the terminal collects the first image is greater than a second preset threshold. The second preset threshold may be 300-400, for example, the second preset threshold is 360. The jitter displacement when the terminal acquires the first image can be called as the jitter value of the first image. The terminal may determine the first image in the image sequence by determining whether a jitter value of an image in the image sequence is greater than or equal to a second preset threshold.
In some possible implementations, the terminal may acquire sensor data that records a motion of the terminal at the time of acquiring the first image, and calculate a jitter value of the first image based on the sensor data. The terminal may also calculate a contrast value of the image and determine the jitter value of the image according to the contrast value of the image, i.e. the smaller the contrast value of the image, the larger the jitter value of the image. In addition, the terminal can also determine the jitter value of the image based on a pre-trained neural network. Specifically, the manner in which the terminal determines the jitter value of the first image may refer to the foregoing embodiments, and is not described herein again.
In a possible implementation manner, the terminal may further determine whether a local lighting area exists in the plurality of images. And only when a certain image has a local light area and the jitter value of the image is greater than or equal to a second preset threshold value, the terminal determines the image as a tremor image. The method comprises the steps that a local lighting area exists in a first image, the ratio of the number of target pixels in the first image to the total number of pixels of the first image is within a preset range, and the target pixels are pixels in the lighting area in the first image. For example, the terminal may determine the number of target pixels in each of the plurality of images, where the target pixels are pixels in a light area, and the definition of the light area may refer to the above description. Then, the terminal obtains the ratio of the number of the target pixels in each image to the total number of the pixels of the image, and judges whether the ratio is in a preset range. If the ratio of the number of the target pixels in the image to the total number of the pixels in the image is within a preset range, the terminal determines that a local light area exists in the image; and if the ratio of the number of the target pixels in the image to the total number of the pixels in the image is not in a preset range, the terminal determines that no local light area exists in the image.
In addition, the manner of determining the second image and the third image in the image sequence based on the first image by the terminal is also similar to the above embodiment, and is not repeated herein.
On the basis of the embodiments corresponding to fig. 1a to 12, in order to better implement the above-mentioned scheme of the embodiments of the present application, the following also provides related equipment for implementing the above-mentioned scheme.
Specifically, referring to fig. 13, fig. 13 is a schematic structural diagram of a terminal 1300 according to an embodiment of the present application, where the terminal 1300 includes:
the detection unit 1301 is configured to detect whether a lighting area exists in an image shot by the terminal when the terminal is in a night scene shooting mode;
a first determining unit 1302, configured to determine, when a light area exists in an image captured by the terminal, a first image in an image sequence acquired by the terminal, where an overexposure occurs in an area adjacent to the light area in the first image;
a second determining unit 1303, configured to determine a second image and a third image in the image sequence according to the first image, where an acquisition time of the first image is located between an acquisition time of the second image and an acquisition time of the third image, and an overexposure degree of an area adjacent to a light area in the second image and the third image is less than an overexposure degree of an area adjacent to the light area in the first image;
an image generating unit 1304, configured to generate a target image according to the first image, the second image, and the third image, where the target image is used to replace the first image, and the degree of overexposure of the area adjacent to the lighting area of the target image is smaller than the degree of overexposure of the area adjacent to the lighting area of the first image;
the video generating unit 1305 is configured to generate a video based on the target image.
In one possible implementation manner, the method further includes: a control unit 1306;
the control unit 1306 is configured to: when the illuminance of the ambient light is smaller than a first preset threshold, controlling the terminal to enter a night scene shooting mode; or when an instruction for triggering the night scene shooting mode is acquired, controlling the terminal to enter the night scene shooting mode.
In a possible implementation manner, the shake displacement of the terminal when acquiring the first image is greater than a second preset threshold.
In a possible implementation manner, a ratio of the number of target pixels in the first image to the total number of pixels in the first image is within a preset range, and the target pixels are pixels in a light area in the first image.
In a possible implementation manner, the second determining unit 1303 is configured to: determining a stabilization value for each of the plurality of images separately, the stabilization value being an inverse of a sum of a jitter value of the image and an acquisition time difference between the image and the first image; determining an image with the largest stable value in the plurality of images as the second image; and determining the third image in the plurality of images according to the acquisition time of the second image.
In a possible implementation manner, the second determining unit 1303 is configured to: determining one or more images in the plurality of images according to the acquisition time of the second image, wherein the acquisition time of the first image is between the acquisition time of the second image and the acquisition time of the one or more images; and determining the image with the largest stable value in the one or more images as the third image.
In one possible implementation, the image generating unit 1304 is configured to: dividing the second image into a plurality of image blocks; determining motion vectors corresponding to the image blocks respectively according to the first image and the third image; and generating the target image according to the motion vectors respectively corresponding to the image blocks and the image blocks.
In one possible implementation, the image generating unit 1304 is configured to: obtaining a plurality of candidate motion vectors corresponding to a first image block, wherein the plurality of image blocks comprise the first image block; determining a second image block and a third image block corresponding to each candidate motion vector in the plurality of candidate motion vectors according to the position of the first image block, wherein the first image comprises the second image block, and the third image comprises the third image block; determining a target error value corresponding to each candidate motion vector according to the first image block, the second image block and the third image block, wherein the target error value is obtained based on an error value between the first image block and the second image block and an error value between the first image block and the third image block; and determining the motion vector with the minimum target error value in the candidate motion vectors as the motion vector corresponding to the first image block according to the target error value corresponding to each candidate motion vector.
In one possible implementation, the plurality of candidate motion vectors includes: one or more preset motion vectors, one or more randomly generated motion vectors, and/or motion vectors corresponding to image blocks adjacent to the first image block.
In one possible implementation manner, the image generating unit 1304 is configured to: if the target image has an area which cannot generate an image based on the image blocks, dividing the third image to obtain a plurality of image blocks of the third image; determining motion vectors corresponding to a plurality of image blocks of the third image according to the first image and the second image; and updating the target image according to the motion vectors corresponding to the image blocks of the third image and the image blocks of the third image to obtain a new target image, wherein the new target image is used for replacing the first image.
The video processing method provided by the embodiments of the present application may be specifically executed by a chip in the terminal, where the chip includes a processing unit and a communication unit; the processing unit may be, for example, a processor, and the communication unit may be, for example, an input/output interface, a pin or a circuit. The processing unit can execute the computer-executable instructions stored in the storage unit so that the chip in the terminal executes the video processing method described in the embodiments shown in fig. 1a to 11. Optionally, the storage unit is a storage unit in the chip, such as a register or a cache; the storage unit may also be a storage unit located outside the chip in the terminal, such as a read-only memory (ROM) or another type of static storage device that can store static information and instructions, or a random access memory (RAM).
Referring to fig. 14, the present application further provides a computer program product. In some embodiments, the method disclosed in fig. 3 above may be implemented as computer program instructions encoded in a machine-readable format on a computer-readable storage medium or on other non-transitory media or articles of manufacture.
Fig. 14 schematically illustrates a conceptual partial view of an example computer program product comprising a computer program for executing a computer process on a computing device, arranged in accordance with at least some embodiments presented herein.
In one embodiment, the computer program product 1400 is provided using a signal bearing medium 1401. The signal bearing medium 1401 may comprise one or more program instructions 1402 which, when executed by one or more processors, may provide the functions or portions of the functions described above with respect to fig. 2. Thus, for example, referring to the embodiment shown in fig. 3, one or more features of steps 301-306 may be undertaken by one or more instructions associated with the signal bearing medium 1401. In addition, the program instructions 1402 in fig. 14 are described by way of example.
In some examples, the signal bearing medium 1401 may comprise a computer readable medium 1403 such as, but not limited to, a hard disk drive, a Compact Disc (CD), a Digital Video Disc (DVD), a digital tape, a memory, a ROM, or a RAM, among others.
In some embodiments, the signal bearing medium 1401 may comprise a computer recordable medium 1404 such as, but not limited to, a memory, a read/write (R/W) CD, an R/W DVD, and the like. In some implementations, the signal bearing medium 1401 may include a communication medium 1405 such as, but not limited to, a digital and/or analog communication medium (e.g., a fiber optic cable, a waveguide, a wired communication link, a wireless communication link, etc.). Thus, for example, the signal-bearing medium 1401 may be conveyed by a wireless form of the communication medium 1405 (e.g., a wireless communication medium conforming to the IEEE 802.11 standard or another transmission protocol).
The one or more program instructions 1402 may be, for example, computer-executable instructions or logic-implementing instructions. In some examples, a computing device of the computing device can be configured to provide various operations, functions, or actions in response to program instructions 1402 conveyed to the computing device by one or more of the computer-readable medium 1403, the computer-recordable medium 1404, and/or the communication medium 1405.
It should be understood that the arrangements described herein are for illustrative purposes only. Thus, those skilled in the art will appreciate that other arrangements and other elements (e.g., machines, interfaces, functions, orders, and groupings of functions, etc.) can be used instead, and that some elements may be omitted altogether depending upon the desired results. In addition, many of the described elements are functional entities that may be implemented as discrete or distributed components or in conjunction with other components, in any suitable combination and location.
It can be clearly understood by those skilled in the art that, for convenience and simplicity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other manners. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solutions of the present application, which are essential or part of the technical solutions contributing to the prior art, or all or part of the technical solutions, may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute all or part of the steps of the methods described in the embodiments of the present application. And the aforementioned storage medium includes: u disk, removable hard disk, read only memory, random access memory, magnetic or optical disk, etc. for storing program codes.

Claims (24)

1. A video processing method, applied to video shooting, the method comprising:
when the terminal is in a night scene shooting mode, detecting whether a light area exists in an image shot by the terminal;
when a light area exists in an image shot by the terminal, determining a first image in an image sequence collected by the terminal, wherein an area adjacent to the light area in the first image is overexposed;
determining a second image and a third image in the image sequence according to the first image, wherein the acquisition time of the first image is between the acquisition time of the second image and the acquisition time of the third image, and the overexposure degree of the adjacent area of the light area in the second image and the third image is less than the overexposure degree of the adjacent area of the light area in the first image;
generating a target image according to the first image, the second image and the third image, wherein the target image is used for replacing the first image, and the degree of overexposure of the area adjacent to the lighting area of the target image is smaller than that of the area adjacent to the lighting area of the first image;
and generating a video based on the target image.
2. The method of claim 1, further comprising:
when the illuminance of the ambient light is smaller than a first preset threshold, the terminal enters a night scene shooting mode;
or when the terminal acquires an instruction for triggering the night scene shooting mode, the terminal enters the night scene shooting mode.
3. The method according to claim 1 or 2, wherein the shake displacement when the terminal acquires the first image is greater than a second preset threshold.
4. The method of claim 3, wherein a ratio of a number of target pixels in the first image to a total number of pixels in the first image is within a predetermined range, and the target pixels are pixels in a light area in the first image.
5. The method of any of claims 1-4, wherein determining a second image and a third image in the plurality of images from the first image comprises:
determining a stabilization value for each of the plurality of images separately, the stabilization value being an inverse of a sum of a jitter value of the image and an acquisition time difference between the image and the first image;
determining an image with the largest stable value in the plurality of images as the second image;
and determining the third image in the plurality of images according to the acquisition time of the second image.
6. The method of claim 5, wherein determining the third image in the plurality of images according to the acquisition time of the second image comprises:
determining one or more images in the plurality of images according to the acquisition time of the second image, wherein the acquisition time of the first image is between the acquisition time of the second image and the acquisition time of the one or more images;
and determining the image with the largest stable value in the one or more images as the third image.
7. The method of any of claims 1-6, wherein generating the target image from the first image, the second image, and the third image comprises:
dividing the second image into a plurality of image blocks;
determining motion vectors corresponding to the image blocks respectively according to the first image and the third image;
and generating the target image according to the motion vectors respectively corresponding to the image blocks and the image blocks.
8. The method according to claim 7, wherein the determining motion vectors corresponding to the image blocks from the first image and the third image comprises:
obtaining a plurality of candidate motion vectors corresponding to a first image block, wherein the plurality of image blocks comprise the first image block;
determining a second image block and a third image block corresponding to each candidate motion vector in the plurality of candidate motion vectors according to the position of the first image block, wherein the first image comprises the second image block, and the third image comprises the third image block;
determining a target error value corresponding to each candidate motion vector according to the first image block, the second image block and the third image block, wherein the target error value is obtained based on an error value between the first image block and the second image block and an error value between the first image block and the third image block;
and determining the motion vector with the minimum target error value in the candidate motion vectors as the motion vector corresponding to the first image block according to the target error value corresponding to each candidate motion vector.
9. The method of claim 8, wherein the plurality of candidate motion vectors comprises: one or more preset motion vectors, one or more randomly generated motion vectors, and/or motion vectors corresponding to image blocks adjacent to the first image block.
10. The method according to any one of claims 7-9, further comprising:
if the target image has an area which cannot generate an image based on the image blocks, dividing the third image to obtain a plurality of image blocks of the third image;
determining motion vectors corresponding to a plurality of image blocks of the third image according to the first image and the second image;
and updating the target image according to the motion vectors corresponding to the image blocks of the third image and the image blocks of the third image to obtain a new target image, wherein the new target image is used for replacing the first image.
11. A video processing apparatus comprising a memory and a processor; the memory stores code, the processor is configured to execute the code, and when executed, the video processing apparatus performs the method of any of claims 1 to 10.
12. A video processing apparatus, comprising:
the terminal comprises a detection unit, a processing unit and a processing unit, wherein the detection unit is used for detecting whether a light area exists in an image shot by the terminal when the terminal is in a night scene shooting mode;
the terminal comprises a first determining unit, a second determining unit and a control unit, wherein the first determining unit is used for determining a first image in an image sequence collected by the terminal when a light area exists in an image shot by the terminal, and an overexposure occurs in an area adjacent to the light area in the first image;
a second determining unit, configured to determine a second image and a third image in the image sequence according to the first image, where an acquisition time of the first image is located between an acquisition time of the second image and an acquisition time of the third image, and overexposure degrees of areas adjacent to a light area in the second image and the third image are less than the overexposure degree of an area adjacent to the light area in the first image;
the image generation unit is further used for generating a target image according to the first image, the second image and the third image, the target image is used for replacing the first image, and the degree of overexposure of the area adjacent to the light area of the target image is smaller than that of the area adjacent to the light area of the first image;
and the video generation unit is also used for generating a video based on the target image.
13. The apparatus of claim 12, further comprising: a control unit;
the control unit is used for: when the illuminance of the ambient light is less than a first preset threshold, controlling the terminal to enter a night scene shooting mode; or when an instruction for triggering the night scene shooting mode is acquired, controlling the terminal to enter the night scene shooting mode.
14. The apparatus according to claim 12 or 13, wherein the shake displacement when the terminal acquires the first image is larger than a second preset threshold.
15. The apparatus of claim 14, wherein a ratio of a number of target pixels in the first image to a total number of pixels in the first image is within a preset range, and the target pixels are pixels in a light area in the first image.
16. The apparatus according to any of claims 12-15, wherein the second determining unit is configured to: determining a stabilization value for each of the plurality of images, respectively, the stabilization value being an inverse of a sum of a jitter value of the image and an acquisition time difference between the image and the first image; determining an image with the largest stable value in the plurality of images as the second image; and determining the third image in the plurality of images according to the acquisition time of the second image.
17. The apparatus of claim 16, wherein the second determining unit is configured to: determining one or more images in the plurality of images according to the acquisition time of the second image, wherein the acquisition time of the first image is between the acquisition time of the second image and the acquisition time of the one or more images; and determining the image with the largest stable value in the one or more images as the third image.
18. The apparatus according to any of claims 12-17, wherein the image generation unit is configured to: dividing the second image into a plurality of image blocks; determining motion vectors corresponding to the image blocks respectively according to the first image and the third image; and generating the target image according to the motion vectors respectively corresponding to the image blocks and the image blocks.
19. The apparatus of claim 18, wherein the image generation unit is configured to: obtaining a plurality of candidate motion vectors corresponding to a first image block, wherein the plurality of image blocks comprise the first image block; determining a second image block and a third image block corresponding to each candidate motion vector in the plurality of candidate motion vectors according to the position of the first image block, wherein the first image comprises the second image block, and the third image comprises the third image block; determining a target error value corresponding to each candidate motion vector according to the first image block, the second image block and the third image block, wherein the target error value is obtained based on an error value between the first image block and the second image block and an error value between the first image block and the third image block; and determining the motion vector with the minimum target error value in the candidate motion vectors as the motion vector corresponding to the first image block according to the target error value corresponding to each candidate motion vector.
20. The apparatus of claim 19, wherein the plurality of candidate motion vectors comprises: one or more preset motion vectors, one or more randomly generated motion vectors, and/or motion vectors corresponding to image blocks adjacent to the first image block.
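Claims 18-20 describe a per-block bidirectional motion search. The sketch below assumes a fixed block size, a sum-of-absolute-differences error metric, and a symmetric application of each candidate vector toward the first and third images; none of these specifics are fixed by the claims, which only require a target error value combining the errors against both images.

    import numpy as np

    BLOCK = 16  # assumed block size

    def sad(a: np.ndarray, b: np.ndarray) -> float:
        # Sum of absolute differences, used here as the per-block error value (assumed metric).
        return float(np.abs(a.astype(np.int32) - b.astype(np.int32)).sum())

    def crop(img: np.ndarray, y: int, x: int):
        # Return the BLOCK x BLOCK patch at (y, x), or None if it falls outside the image.
        h, w = img.shape[:2]
        if 0 <= y and 0 <= x and y + BLOCK <= h and x + BLOCK <= w:
            return img[y:y + BLOCK, x:x + BLOCK]
        return None

    def best_motion_vector(second_img, first_img, third_img, y, x, neighbor_mvs, rng):
        # The block of the second image at (y, x) plays the role of the "first image block".
        block = second_img[y:y + BLOCK, x:x + BLOCK]
        # Claim 20: candidates are preset vectors, vectors of neighbouring blocks, and a
        # few randomly generated vectors (counts and search range assumed).
        candidates = [(0, 0)] + list(neighbor_mvs)
        candidates += [tuple(rng.integers(-8, 9, size=2)) for _ in range(4)]
        best_mv, best_err = (0, 0), float("inf")
        for dy, dx in candidates:
            blk_in_first = crop(first_img, y + dy, x + dx)   # "second image block"
            blk_in_third = crop(third_img, y - dy, x - dx)   # "third image block"
            if blk_in_first is None or blk_in_third is None:
                continue
            # Claim 19: the target error value combines both per-image errors.
            err = sad(block, blk_in_first) + sad(block, blk_in_third)
            if err < best_err:
                best_mv, best_err = (dy, dx), err
        return best_mv

With rng = np.random.default_rng(0) and neighbor_mvs taken from the already-processed blocks above and to the left of (y, x), this reduces to a simple propagation-plus-random-search scheme; that scan order is an assumption, not something the claims prescribe.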
21. The apparatus according to any of claims 18-20, wherein the image generation unit is configured to: if the target image has an area that cannot be generated based on the image blocks of the second image, dividing the third image to obtain a plurality of image blocks of the third image; determining motion vectors corresponding to the plurality of image blocks of the third image according to the first image and the second image; and updating the target image according to the image blocks of the third image and the motion vectors corresponding to the image blocks of the third image, to obtain a new target image, wherein the new target image is used for replacing the first image.
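Claim 21 adds a fallback pass for target-image areas that the blocks of the second image could not fill. The sketch assumes the first pass maintained a boolean coverage mask recording which target pixels were written; the mask, the block bookkeeping, and the in-bounds offsets are illustrative assumptions rather than details taken from the claim.

    import numpy as np

    BLOCK = 16  # assumed block size, matching the first pass

    def update_target_with_third(target, covered, third_img, third_blocks_with_mvs):
        # third_blocks_with_mvs: iterable of ((y, x), (dy, dx)) pairs describing where a
        # BLOCK-sized block of the third image lands in the target image (assumed layout;
        # offsets are expected to keep the patch inside the image bounds).
        new_target = target.copy()
        for (y, x), (dy, dx) in third_blocks_with_mvs:
            ty, tx = y + dy, x + dx
            hole = ~covered[ty:ty + BLOCK, tx:tx + BLOCK]
            if hole.any():
                src = third_img[y:y + BLOCK, x:x + BLOCK]
                dst = new_target[ty:ty + BLOCK, tx:tx + BLOCK]
                dst[hole] = src[hole]          # fill only the still-uncovered pixels
                covered[ty:ty + BLOCK, tx:tx + BLOCK] = True
        return new_target
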
22. A terminal device, comprising a processor, a memory, a display screen, a camera, and a bus, wherein:
the processor, the display screen, the camera and the memory are connected through the bus;
the memory is used for storing a computer program and images acquired by the camera;
the processor is configured to control the memory, obtain the images stored in the memory and execute the program stored in the memory, and is further configured to control the display screen, so as to implement the method steps of any one of claims 1 to 10.
23. A computer-readable storage medium comprising a program which, when run on a computer, causes the computer to perform the method of any one of claims 1 to 10.
24. A computer program product comprising instructions for causing a terminal to perform the method of any one of claims 1 to 10 when the computer program product is run on the terminal.
CN202110454370.2A 2021-04-26 2021-04-26 Video processing method and related device Pending CN115334228A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202110454370.2A CN115334228A (en) 2021-04-26 2021-04-26 Video processing method and related device
PCT/CN2022/087554 WO2022228196A1 (en) 2021-04-26 2022-04-19 Video processing method and related apparatus

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110454370.2A CN115334228A (en) 2021-04-26 2021-04-26 Video processing method and related device

Publications (1)

Publication Number Publication Date
CN115334228A true CN115334228A (en) 2022-11-11

Family

ID=83846658

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110454370.2A Pending CN115334228A (en) 2021-04-26 2021-04-26 Video processing method and related device

Country Status (2)

Country Link
CN (1) CN115334228A (en)
WO (1) WO2022228196A1 (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101098411A (en) * 2006-06-09 2008-01-02 三星电子株式会社 Image processing apparatus and method for contrast enhancement
CN106851121A (en) * 2017-01-05 2017-06-13 广东欧珀移动通信有限公司 Control method and control device
US20170354392A1 (en) * 2016-06-14 2017-12-14 Novadaq Technologies Inc. Methods and systems for adaptive imaging for low light signal enhancement in medical visualization
CN108933897A (en) * 2018-07-27 2018-12-04 南昌黑鲨科技有限公司 Method for testing motion and device based on image sequence
WO2019072190A1 (en) * 2017-10-12 2019-04-18 Oppo广东移动通信有限公司 Image processing method, electronic apparatus, and computer readable storage medium
CN110971822A (en) * 2019-11-29 2020-04-07 Oppo广东移动通信有限公司 Picture processing method and device, terminal equipment and computer readable storage medium

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7636486B2 (en) * 2004-11-10 2009-12-22 Fotonation Ireland Ltd. Method of determining PSF using multiple instances of a nominally similar scene
US8208746B2 (en) * 2009-06-29 2012-06-26 DigitalOptics Corporation Europe Limited Adaptive PSF estimation technique using a sharp preview and a blurred image
CN104243844A (en) * 2013-06-06 2014-12-24 富士通株式会社 Image processor, image processing method and electronic equipment
CN104867111B (en) * 2015-03-27 2017-08-25 北京理工大学 A kind of blind deblurring method of non-homogeneous video based on piecemeal fuzzy core collection
CN105976332B (en) * 2016-05-03 2019-03-01 北京大学深圳研究生院 Image deblurring method based on bright fringes information in image

Also Published As

Publication number Publication date
WO2022228196A1 (en) 2022-11-03

Similar Documents

Publication Publication Date Title
CN115473957B (en) Image processing method and electronic equipment
CN113810603B (en) Point light source image detection method and electronic equipment
CN113542580B (en) Method and device for removing light spots of glasses and electronic equipment
CN116055874B (en) Focusing method and electronic equipment
CN113810604B (en) Document shooting method, electronic device and storage medium
CN115526787B (en) Video processing method and device
CN111741283A (en) Image processing apparatus and method
CN113572948B (en) Video processing method and video processing device
CN114449151B (en) Image processing method and related device
CN115147288A (en) Image processing method and electronic device
CN115686182B (en) Processing method of augmented reality video and electronic equipment
WO2022170866A1 (en) Data transmission method and apparatus, and storage medium
WO2022033344A1 (en) Video stabilization method, and terminal device and computer-readable storage medium
WO2022228196A1 (en) Video processing method and related apparatus
CN116263971A (en) Image frame prediction method, electronic device, and computer-readable storage medium
CN114793283A (en) Image encoding method, image decoding method, terminal device, and readable storage medium
CN114630152A (en) Parameter transmission method and device for image processor and storage medium
CN116095512B (en) Photographing method of terminal equipment and related device
CN116708931B (en) Image processing method and electronic equipment
CN116719569B (en) Method and device for starting application
CN116700578B (en) Layer synthesis method, electronic device and storage medium
CN114630153B (en) Parameter transmission method and device for application processor and storage medium
CN116723410B (en) Method and device for adjusting frame interval
CN116668763B (en) Screen recording method and device
CN117082295B (en) Image stream processing method, device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination