CN115587938A - Video distortion correction method and related equipment - Google Patents

Video distortion correction method and related equipment

Info

Publication number
CN115587938A
Authority
CN
China
Prior art keywords
video
offset information
distortion correction
video image
image
Prior art date
Legal status
Pending
Application number
CN202110757403.0A
Other languages
Chinese (zh)
Inventor
张金雷
葛权耕
李贤法
牛迪
郑芝寰
Current Assignee
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd filed Critical Huawei Technologies Co Ltd
Priority to CN202110757403.0A
Publication of CN115587938A
Status: Pending

Classifications

    • G PHYSICS; G06 COMPUTING, CALCULATING OR COUNTING; G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 5/00 Image enhancement or restoration; G06T 5/80 Geometric correction
    • G06T 7/00 Image analysis; G06T 7/10 Segmentation, edge detection; G06T 7/11 Region-based segmentation
    • G06T 7/60 Analysis of geometric attributes
    • G06T 7/70 Determining position or orientation of objects or cameras
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement; G06T 2207/10 Image acquisition modality; G06T 2207/10016 Video, image sequence
    • G06T 2207/30 Subject of image; G06T 2207/30196 Human being, person; G06T 2207/30201 Face

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Geometry (AREA)
  • Image Processing (AREA)
  • Studio Devices (AREA)

Abstract

The application provides a video distortion correction method comprising the following steps: acquiring a video image; determining, through an energy function with a temporal stability constraint or through a global deformation projection model, offset information for performing stretch distortion correction on the video image; and mapping the video image according to the offset information to obtain a corrected image. The method can perform stretch distortion correction on a video image while ensuring the temporal stability of the corrected video.

Description

Video distortion correction method and related equipment
Technical Field
The present application relates to the field of image processing technologies, and in particular, to a video distortion correction method and related apparatus.
Background
When an image is captured by an imaging device, the subject is deformed by the lens module (referred to as optical distortion) and also deformed by perspective projection in the process of being projected through the lens onto the image plane (referred to as perspective distortion). Perspective distortion mainly includes two types. One is stretch distortion, caused mainly by projecting a three-dimensional object onto a two-dimensional plane; it becomes more obvious the closer the subject is to the edge of the lens and the larger the field of view of the lens. The other is distortion caused by the near-large, far-small effect of perspective projection, a typical example being "big nose" distortion. Some correction schemes for stretch distortion already exist; however, they do not consider temporal stability, cannot guarantee that the stretch distortion correction effect is temporally stable, and are therefore not suitable for processing video.
Disclosure of Invention
The embodiment of the application provides a video distortion correction method and related equipment, which can perform stretch distortion correction on a video image and ensure the temporal stability of the video image after the stretch distortion correction.
The first aspect of the present application discloses a video distortion correction method, which includes:
acquiring a video image;
determining, through an energy function with a temporal stability constraint or through a global deformation projection model, offset information for performing stretch distortion correction on the video image;
and mapping the video image according to the offset information to obtain a corrected image.
The method can perform stretch distortion correction on the video image while ensuring the temporal stability of the video image after the stretch distortion correction.
In some alternative embodiments, determining offset information for stretch distortion correction of the video image by an energy function having a temporal stability constraint comprises:
constructing an energy function with a temporal stability constraint on the video image, wherein the temporal stability constraint constrains the magnitude of the change in offset information at the same position between adjacent video frames;
and optimizing the energy function to obtain the offset information.
In some optional embodiments, the constructing an energy function with temporal stability constraints on the video image comprises:
acquiring a face region and/or optical flow information of the video image;
and calculating the temporal stability constraint according to the face region and/or the optical flow information.
In some optional embodiments, the acquiring the face region of the video image includes:
detecting the position of a face frame in the video image, and determining the face region according to the position of the face frame; or
detecting the position of a face frame and the position of a portrait in the video image, and determining the face region according to the position of the face frame and the position of the portrait.
In some optional embodiments, the calculating the temporal stability constraint according to the face region includes:
letting V_ij and U_ij be the positions of the closest grid points in the face region of a video frame of the video image and in the face region of the previous video frame, respectively, the temporal stability constraint is |U_ij - V_ij|.
In some optional embodiments, calculating the temporal stability constraint from the optical flow information comprises:
letting Q_ij be the position of a grid point in a video frame of the video image, and P_i'j' the position of the corresponding grid point in the previous frame as derived by an optical flow method, the temporal stability constraint is |P_i'j' - Q_ij|.
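For illustration only (not part of the original disclosure), the following minimal Python/NumPy sketch shows how the two temporal stability terms above could be evaluated over grid points; all array shapes and names are assumptions:

```python
import numpy as np

def face_region_term(V, U):
    # V: (N, 2) positions of grid points in the face region of the current
    # frame; U: (N, 2) positions of the closest grid points in the face
    # region of the previous frame. Returns the sum over points of |U_ij - V_ij|.
    return np.linalg.norm(U - V, axis=1).sum()

def optical_flow_term(Q, backward_flow):
    # Q: (N, 2) grid-point positions in the current frame; backward_flow:
    # (N, 2) displacement of each point back to the previous frame, so that
    # P = Q + backward_flow. Returns the sum over points of |P_i'j' - Q_ij|.
    P = Q + backward_flow
    return np.linalg.norm(P - Q, axis=1).sum()
```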
In some optional embodiments, the energy function further has a foreground constraint, a background constraint, and a canonical constraint, or the energy function further has a foreground constraint, a background constraint, a canonical constraint, and a boundary constraint.
In some optional embodiments, if the position of the portrait in the video image is close to the center of the video image, the energy function has the boundary constraint; or
If the number of the portraits in the video image is one, the energy function has the boundary constraint.
In some alternative embodiments, determining, by a global deformation projection model, offset information for stretch distortion correction of the video image comprises:
performing off-line simulation learning on the global deformation projection model;
and inputting the video frame of the video image into the global deformation projection model to obtain the offset information.
In some optional embodiments, the performing offline simulation learning on the global deformation projection model includes:
and performing off-line simulation learning on the global deformation projection model according to the scene.
In some optional embodiments, the performing offline simulation learning on the global deformation projection model according to scenes includes:
performing off-line simulation learning on the global deformation projection model according to a single-person scene and a multi-person scene; or
And performing off-line simulation learning on the global deformation projection model according to a wide-angle shooting scene and a common shooting scene.
In some optional embodiments, the method further comprises:
determining crop information corresponding to the offset information;
and cropping the corrected image according to the crop information to obtain a target image.
A second aspect of the present application discloses a video distortion correction method applied to an electronic device, the method including:
acquiring a video image;
determining first offset information for performing optical distortion correction on the video image and a first crop size corresponding to the first offset information;
determining second offset information for performing anti-shake processing on the video image and a second crop size corresponding to the second offset information;
determining third offset information for performing stretch distortion correction on the video image and a third crop size corresponding to the third offset information;
mapping the video image according to the first offset information, the second offset information and the third offset information to obtain a corrected video image;
and cropping the corrected video image according to the first crop size, the second crop size and the third crop size to obtain a target video image.
In some optional embodiments, the determining third offset information for stretch distortion correction of the video image comprises:
determining the third offset information by an energy function with a temporal stability constraint or a global deformation projection model.
In some optional embodiments, determining the third offset information by an energy function having a temporal stability constraint comprises:
constructing an energy function with a temporal stability constraint on the video image, wherein the temporal stability constraint constrains the magnitude of the change in offset information at the same position between adjacent video frames;
and optimizing the energy function to obtain the third offset information.
In some optional embodiments, the constructing an energy function with temporal stability constraints on the video image comprises:
acquiring a face region and/or optical flow information of the video image;
and calculating the temporal stability constraint according to the face region and/or the optical flow information.
In some optional embodiments, if the position of the portrait in the video image is close to the center of the video image, the energy function further has a boundary constraint; or
if the number of portraits in the video image is one, the energy function further has the boundary constraint.
In some optional embodiments, the method further comprises:
determining whether the electronic device switches between a wide-angle camera and a main camera;
and if the electronic device switches between the wide-angle camera and the main camera, updating the first crop size, the second crop size and the third crop size so that the field-of-view jump between the wide-angle camera and the main camera remains fixed during switching.
In some optional embodiments, the updated first crop size, second crop size and third crop size satisfy:
Fov_sub = Fov_main + ΔFov + Crop_eis + Crop_pdc + Crop_extra_new
Crop_extra = Crop_pdc + Crop_extra_new
where Fov_sub is the image size corresponding to the wide-angle camera, Fov_main is the image size corresponding to the main camera, ΔFov is the difference between the output fields of view of the main camera and the wide-angle camera, Crop_eis is the second crop size, Crop_pdc is the first crop size, Crop_extra_new is the third crop size, and Crop_extra is the crop size reserved for the anti-shake processing of the wide-angle camera.
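The relation above can be rearranged to solve for the new third crop size. As an illustration only (not part of the original disclosure), a minimal Python sketch of this bookkeeping, with all names assumed:

```python
def updated_crop_sizes(fov_sub, fov_main, delta_fov, crop_eis, crop_pdc):
    # Rearranges Fov_sub = Fov_main + dFov + Crop_eis + Crop_pdc + Crop_extra_new
    # to obtain the new third crop size, then the crop size reserved for the
    # wide-angle camera's anti-shake processing.
    crop_extra_new = fov_sub - fov_main - delta_fov - crop_eis - crop_pdc
    crop_extra = crop_pdc + crop_extra_new
    return crop_extra_new, crop_extra
```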
A third aspect of the present application discloses a video distortion correction method, the method comprising:
obtaining a tiny stream, a preview stream and a recording stream of a camera device;
carrying out face detection, portrait segmentation and optical flow calculation on the tiny stream to obtain a first face frame position, a first portrait position and first optical flow information;
determining first offset information used for optical distortion correction and a first crop size corresponding to the first offset information;
determining second offset information used for anti-shake processing and a second crop size corresponding to the second offset information;
mapping the preview stream and the recording stream according to the first offset information and the second offset information to obtain a first corrected preview stream and a first corrected recording stream, and cropping the first corrected preview stream and the first corrected recording stream according to the first crop size and the second crop size to obtain a second corrected preview stream and a second corrected recording stream;
mapping the first face frame position, the first portrait position and the first optical flow information according to the first offset information and the second offset information to obtain a second face frame position, a second portrait position and second optical flow information;
constructing an energy function with a temporal stability constraint on the second corrected preview stream and the second corrected recording stream, and optimizing the energy function to obtain third offset information for stretch distortion correction and a third crop size corresponding to the third offset information;
and mapping the second corrected preview stream and the second corrected recording stream according to the third offset information to obtain a third corrected preview stream and a third corrected recording stream, and cropping the third corrected preview stream and the third corrected recording stream according to the third crop size to obtain a target preview stream and a target recording stream.
In some optional embodiments, the method further comprises:
performing preview display according to the target preview stream; and/or
Video encoding the target recording stream.
The video distortion correction method provided by the embodiment of the application performs optical distortion correction, anti-shake processing and stretch distortion correction on both the preview stream and the recording stream, ensuring the temporal stability of the video after the stretch distortion correction and the consistency between the previewed and recorded video.
A fourth aspect of the present application discloses a computer-readable storage medium comprising computer instructions which, when run on an electronic device, cause the electronic device to perform the video distortion correction method of the first, second or third aspect.
A fifth aspect of the present application discloses an electronic device comprising a processor and a memory, the memory being configured to store instructions, and the processor being configured to invoke the instructions in the memory so that the electronic device performs the video distortion correction method of the first, second or third aspect.
A sixth aspect of the present application discloses a chip system applied to an electronic device; the chip system comprises an interface circuit and a processor; the interface circuit and the processor are interconnected through a line; the interface circuit is used for receiving signals from a memory of the electronic equipment and sending the signals to the processor, and the signals comprise computer instructions stored in the memory; when the computer instructions are executed by a processor, the system-on-chip performs a video distortion correction method as in the first, second or third aspect.
It should be understood that the computer-readable storage medium of the fourth aspect, the electronic device of the fifth aspect, and the chip system of the sixth aspect all correspond to the methods of the first aspect, the second aspect, and the third aspect, and therefore, the beneficial effects achieved by the computer-readable storage medium of the fourth aspect, the electronic device of the fifth aspect, and the chip system of the sixth aspect may refer to the beneficial effects in the corresponding methods provided above, and are not repeated herein.
Drawings
Fig. 1 is a schematic structural diagram of an electronic device provided in an embodiment of the present application.
Fig. 2 is a block diagram of a software structure of an electronic device according to an embodiment of the present application.
Fig. 3 is a schematic diagram of a photographing interface and an album interface of an electronic device according to an embodiment of the present application.
Fig. 4 is a flowchart of a video distortion correction method according to an embodiment of the present application.
Fig. 5 is a supplement to the flowchart shown in fig. 4.
Fig. 6 is a flowchart of a video distortion correction method according to another embodiment of the present application.
Fig. 7 is a schematic diagram of obtaining a temporal stability constraint based on face regions.
FIG. 8 is a schematic diagram of obtaining a temporal stability constraint based on optical flow information.
Fig. 9 is a flowchart of a video distortion correction method according to another embodiment of the present application.
Fig. 10 is a schematic diagram of obtaining total offset information from the first offset information, the second offset information, and the third offset information.
Detailed Description
For ease of understanding, some descriptions of concepts related to the embodiments of the present application are given by way of illustration and reference.
In the present application, "at least one" means one or more, and "a plurality" means two or more. "And/or" describes an association relationship between associated objects and indicates that three relationships may exist; for example, A and/or B may represent: A exists alone, A and B exist simultaneously, or B exists alone, where A and B may be singular or plural. The terms "first," "second," "third," "fourth," and the like in the description, claims and drawings of the present application, if any, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order.
In order to better understand the video distortion correction method and the related device provided in the embodiments of the present application, first, an application scenario of the video distortion correction method of the present application is described below.
The video distortion correction method provided by the embodiment of the application is applied to electronic equipment. The electronic device may be a terminal having a shooting function, such as a mobile phone, a tablet computer, a notebook computer, a handheld computer, a Mobile Internet Device (MID), a wearable device (e.g., a smart band, a smart watch, etc.), an Augmented Reality (AR) device, a Virtual Reality (VR) device, a camera device (e.g., a video recorder, a smart camera, a digital camera, a video camera, etc.), an in-vehicle device, and the like. When the electronic device is a terminal with a shooting function, the electronic device can shoot a video and perform stretching distortion correction on the shot video.
The electronic device may also be a terminal without a shooting function. When the electronic device is a terminal without a shooting function, the electronic device may acquire a video from another device or from a preset address, and perform stretch distortion correction on the acquired video.
Fig. 1 is a schematic structural diagram of an electronic device provided in an embodiment of the present application. The electronic device shown in fig. 1 has a shooting function.
As shown in fig. 1, the electronic device 10 may include: a processor 110, an external memory interface 120, an internal memory 121, a Universal Serial Bus (USB) interface 130, a charge management module 140, a power management module 141, a battery 142, a camera 193, and a display screen 194. Optionally, the electronic device 10 may further include one or more of an antenna 1, an antenna 2, a mobile communication module 150, a wireless communication module 160, an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, an earphone interface 170D, a sensor module 180, a button 190, an indicator 192, and a Subscriber Identity Module (SIM) card interface 195. Among other things, the sensor module 180 may include one or more of a pressure sensor, a gyroscope sensor, an acceleration sensor, a distance sensor, a fingerprint sensor, and the like.
Processor 110 may include one or more processing units. For example, the processor 110 may include an Application Processor (AP), a modem processor, a Graphic Processing Unit (GPU), an Image Signal Processor (ISP), a controller, a video codec, a Digital Signal Processor (DSP), a baseband processor and/or a neural Network Processor (NPU), and the like. The different processing units may be separate devices or may be integrated into one or more processors.
A memory may be provided in the processor 110 for storing instructions and data. In some embodiments, the memory in the processor 110 is a cache memory. A memory provided in the processor 110 may store instructions or data used or recycled by the processor 110. If the processor 110 needs to reuse the instruction or data, the instruction or data can be directly called from the memory, so that repeated access is avoided, the waiting time of the processor 110 is reduced, and the efficiency of the system is improved.
In some embodiments, processor 110 may include one or more interfaces. The interface may include an integrated circuit (I2C) interface, an integrated circuit built-in audio (I2S) interface, a Pulse Code Modulation (PCM) interface, a universal asynchronous receiver/transmitter (UART) interface, a Mobile Industry Processor Interface (MIPI), a general-purpose input/output (GPIO) interface, a Subscriber Identity Module (SIM) interface, and/or a Universal Serial Bus (USB) interface, and the like.
It should be understood that the interfacing relationship between the modules illustrated in the embodiments of the present application is only an exemplary illustration, and does not constitute a structural limitation for the electronic device 10. In other embodiments of the present application, the electronic device 10 may also adopt different interface connection manners or a combination of multiple interface connection manners in the above embodiments.
The processor 110 may perform instruction fetching, instruction execution control, and data calling according to the instruction operation code and the timing signal. In particular, the processor 110 may be configured to perform the video distortion correction method described in the embodiments of the present application.
The charging management module 140 is configured to receive a charging input from a charger. The charger may be a wireless charger or a wired charger.
The power management module 141 is used to connect the battery 142, the charging management module 140 and the processor 110. The power management module 141 receives an input from the battery 142 and/or the charging management module 140, and supplies power to the processor 110, the internal memory 121, the external memory, the display screen 194, the camera 193, the wireless communication module 160, and the like.
The wireless communication function of the electronic device 10 may be implemented by the antenna 1, the antenna 2, the mobile communication module 150, the wireless communication module 160, a modem processor, a baseband processor, and the like.
The electronic device 10 may implement image/video display functionality via the GPU, the display screen 194, and the application processor, among other things. The GPU is a microprocessor for image processing, connected to the display screen 194 and the application processor. The GPU performs mathematical and geometric calculations on the image for graphics rendering. The processor 110 may include one or more GPUs that execute program instructions to generate or alter display information.
The display screen 194 is used to display images, video, and the like. The display screen 194 includes a display panel. The display panel may be a Liquid Crystal Display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (active-matrix organic light-emitting diode, AMOLED), a flexible light-emitting diode (FLED), a Mini LED, a Micro OLED, a quantum dot light-emitting diode (QLED), or the like. In some embodiments, the electronic device 10 may include 1 or N display screens 194, N being a positive integer greater than 1.
In some implementations, the display screen 194 can be a touch screen, which can include a display panel and a touch-sensitive surface overlying the display panel. When a touch operation is detected on or near the touch-sensitive surface (e.g., a user touches, clicks, presses, or slides on or near the touch-sensitive surface using a finger, a stylus, or any other suitable object or attachment), the touch operation is communicated to the processor 110 to determine the type of touch event, and the processor 110 then provides a corresponding visual output on the display panel according to the type of touch event. In one example, the touch-sensitive surface and the display panel are implemented as two separate components to implement input and output functions. In yet another example, the touch-sensitive surface and the display panel are integrated to implement input and output functions.
Optionally, the touch-sensitive surface may further include two parts: a touch detection device and a touch controller. The touch detection device detects the position touched by the user, detects the signal brought by the touch operation, and transmits the signal to the touch controller; the touch controller receives the touch information from the touch detection device, converts it into touch point coordinates, sends the coordinates to the processor 110, and can receive and execute commands sent by the processor 110. The touch-sensitive surface may be implemented using resistive, capacitive, infrared, or surface acoustic wave technologies.
Optionally, the display panel may be configured to display information input by the user, information provided to the user, and various display interfaces of the electronic device 10. A display interface may be a user interface (UI) or a graphical user interface (GUI), and its contents may include interfaces of running applications, system-level menus, and the like, composed of images, text, icons, videos, buttons, sliders (scroll bars), menus, windows, labels, input boxes, and any combination thereof.
The electronic apparatus 10 may implement a photographing function by the ISP, the camera 193, the video codec, the GPU, the display screen 194, the application processor, and the like.
The ISP is used to process data fed back from the camera 193. For example, when taking a picture, the shutter is opened, light is transmitted to an image sensor (i.e., a photosensitive element) of the image pickup device 193 through the lens, an optical signal is converted into an electrical signal, and the image sensor transmits the electrical signal to the ISP for processing and converting into an image visible to the naked eye. The ISP can also carry out algorithm optimization on the noise, brightness and skin color of the image. The ISP can also optimize parameters such as exposure, color temperature and the like of a shooting scene. In some embodiments, the ISP may be provided in camera 193.
The camera 193 is used to capture still images or video. An object generates an optical image through a lens (such as an optical lens module) and projects the optical image to an image sensor. The image sensor may be a Charge Coupled Device (CCD) or a complementary metal-oxide-semiconductor (CMOS) phototransistor. The image sensor converts the optical signal into an electrical signal and then transmits the electrical signal to the ISP to be converted into a digital image signal. And the ISP outputs the digital image signal to the DSP for processing. The DSP converts the digital image signal into an image signal in a standard RGB, YUV or other format. The image signal may be displayed through the display screen 194 and/or saved to the internal memory 121 or an external memory.
In some embodiments, the camera 193 may include a camera, such as an infrared camera or other camera, for capturing images required for face recognition. The camera for collecting the image required by face recognition is generally located on the front side of the electronic device, for example, above the touch screen, and may also be located at other positions.
In some embodiments, the cameras of the camera device 193 include, but are not limited to, an optical camera, an infrared camera, and a depth camera, and the specific form may be a monocular camera or a binocular camera. The lens of a camera may be a standard lens, a wide-angle lens, an ultra-wide-angle lens, a fisheye lens, or a telephoto lens, or a combination of the above lenses.
In some embodiments, the camera of camera device 193 may include a front camera and/or a rear camera.
Herein, the video image signal output by the image pickup device 193 may be referred to as "original video image". The raw video image may be output to a processor for further video distortion correction processing.
Video codecs are used to compress or decompress digital video. The electronic device 10 may support one or more video codecs, so that the electronic device 10 can play or record video in a variety of encoding formats (e.g., Moving Picture Experts Group (MPEG)-1, MPEG-2, MPEG-3, MPEG-4, etc.).
The DSP is used to process digital signals, and may process other digital signals in addition to digital image signals. For example, when the electronic device 10 performs frequency bin selection, the DSP is used to perform a Fourier transform or the like on the frequency bin energy.
The NPU processes input information rapidly by referring to the structure of biological neural networks, for example the transfer mode between human brain neurons, and can also learn continuously by itself. Applications such as intelligent recognition of the electronic device 10, for example image recognition, face recognition, voice recognition, and text understanding, may be implemented by the NPU.
The internal memory 121 may be used to store computer-executable program code, which includes instructions. The processor 110 executes various functional applications of the electronic device 10 and data processing by executing instructions stored in the internal memory 121. The internal memory 121 may include a program storage area and a data storage area. The storage program area may store an operating system, an application program (such as a camera APP or an image beautification APP) required for at least one function, and the like. The storage data area may store data created during use of the electronic device 10 (e.g., corrected image data), and the like. In addition, the internal memory 121 may include a high-speed random access memory, and may further include a nonvolatile memory, such as at least one magnetic disk storage device, a flash memory device, a universal flash memory (UFS), and the like.
The external memory interface 120 may be used to connect an external memory card, such as a Micro SD card, to extend the memory capability of the electronic device 10. The external memory card communicates with the processor 110 through the external memory interface 120 to implement a data storage function. For example, files such as images and videos are saved in an external memory card.
Optionally, the electronic device 10 may further implement audio functions, such as music playing, video background music playing, and sound recording, through the audio module 170, the speaker 170A, the receiver 170B, the microphone 170C, the earphone interface 170D, and the application processor. The audio module 170 is configured to convert digital audio information into an analog audio signal for output, and also to convert an analog audio input into a digital audio signal. The speaker 170A, also called a "horn", is used to convert an audio electrical signal into a sound signal. The receiver 170B, also called an "earpiece", is used to convert an audio electrical signal into a sound signal. The microphone 170C, also called a "mike", converts a sound signal into an electrical signal. The earphone interface 170D is used to connect a wired earphone, and may be the USB interface 130, a 3.5 mm open mobile terminal platform (OMTP) standard interface, or a Cellular Telecommunications Industry Association of the USA (CTIA) standard interface.
The keys 190 may include a power key, a volume key, and the like. The keys 190 may be mechanical keys or touch keys. The electronic device 10 may receive key inputs and generate key signal inputs related to user settings and function control of the electronic device 10.
Indicator 192 may be an indicator light that may be used to indicate a change in charge status, charge level, or may be used to indicate a message, missed call, notification, etc.
The SIM card interface 195 is used to connect a SIM card. The SIM card can be brought into and out of contact with the electronic device 10 by being inserted into the SIM card interface 195 or pulled out of the SIM card interface 195. In some embodiments, the electronic device 10 employs an eSIM, i.e., an embedded SIM card. The eSIM card can be embedded in the electronic device 10 and cannot be separated from the electronic device 10.
It should be understood that the illustrated structure of the embodiment of the present application does not constitute a specific limitation to the electronic device 10. In other embodiments of the present application, electronic device 10 may include more or fewer components than shown, or combine two or more components, or split certain components, or have a different configuration of components. The various components shown in the figures may be implemented in hardware, software, or a combination of hardware and software, including one or more signal processing and/or application specific integrated circuits.
The software system of the electronic device 10 may employ a hierarchical architecture, an event-driven architecture, a micro-core architecture, a micro-service architecture, or a cloud architecture. A software structure of the electronic device 10 is exemplarily described below by taking an Android (Android) system of a hierarchical architecture as an example.
Fig. 2 is a block diagram of a software structure of the electronic device 10 according to the embodiment of the present application.
The layered architecture divides the software into several layers, each layer having a clear role and division of labor. The layers communicate with each other through a software interface. In some embodiments, the Android system is divided into four layers, which are an application layer, an application framework layer, an Android runtime (Android runtime) and system library and a kernel layer from top to bottom.
The application layer may include a series of application packages. As shown in FIG. 2, the application packages may include applications such as a camera APP, an image beautification APP, and an album APP.
The application framework layer provides an Application Programming Interface (API) and a programming framework for the application program of the application layer. The application framework layer includes a number of predefined functions. As shown in FIG. 2, the application framework layers may include a window manager, a content provider, an explorer, a view system, and the like.
The window manager is used for managing window programs. The window manager can obtain the size of the display screen, determine whether there is a status bar, lock the screen, and the like.
The content provider is used to store and retrieve data and make it accessible to applications. The data may include image data, video data, and the like.
The resource manager provides various resources, such as localized strings, icons, pictures, layout files, video files, etc., to the application.
The view system includes visual controls such as controls to display text, controls to display pictures, and the like. The view system may be used to build a display interface for an application.
For example, a display interface of a camera APP presented by the view system is shown in (1) in fig. 3, a photographing interface 20 may be displayed on a display panel of the electronic device 10, and the photographing interface 20 may include a preview box 204 and some related controls, such as an image browsing control 201, a photographing control 202, a front-back camera switching control 203, a focal length adjusting control 205, and the like.
The preview frame 204 is used to preview a video image to be captured.
The image presented by the preview box 204 may be an original video image output by the camera 193. Alternatively, the image presented by the preview box 204 may be an image subjected to optical distortion correction, anti-shake processing, or stretching distortion correction. Alternatively, the image presented by the preview box 204 may be a video image (which may be referred to as a target video image) that has undergone optical distortion correction, anti-shake processing, and stretching distortion correction.
When the user clicks or touches the front-rear camera switching control 203, the electronic device 10 may be instructed to select a front camera or a rear camera for shooting.
The electronic device may include multiple cameras, such as a main camera, a telephoto camera, and a wide-angle camera, with different focal segments corresponding to different cameras (e.g., focal segments of 0.6x, 1x, and 5x, where 0.6x corresponds to the wide-angle camera, 1x to the main camera, and 5x to the telephoto camera). When the user clicks or slides the focal segment adjustment control 205, the electronic device 10 may be instructed to select a different camera for shooting.
In the video recording mode, when the user clicks or touches the shooting control 202, the electronic device 10 drives the camera 193 to initiate a video shooting operation, instructs the underlying system library to perform optical distortion correction, anti-shake processing, and stretching distortion correction on the original video image, and stores the target video image after the optical distortion correction, anti-shake processing, and stretching distortion correction in an album.
When the user clicks or touches the image browsing control 201, the electronic device 10 may call the photo album APP and display the target video image after the optical distortion correction, the anti-shake processing, and the stretch distortion correction. As shown in (2) in fig. 3, the display interface 30 of the photo album APP may include the obtained target video image 301 and, optionally, thumbnails 302 of one or more recently captured images/videos.
Android Runtime is responsible for the scheduling and management of the Android system and may include a core library and a virtual machine. The core library comprises two parts: one part is the functions that the java language needs to call, and the other part is the core library of Android.
The application layer and the application framework layer run in the virtual machine. The virtual machine executes the java files of the application layer and the application framework layer as binary files, and performs functions such as object life cycle management, stack management, thread management, security and exception management, and garbage collection.
The system library may include a plurality of functional modules, for example: a surface manager, media libraries, and graphics engines (e.g., SGL).
The surface manager is used for managing the display subsystem and providing the layer fusion function for a plurality of application programs.
The media library supports playback and recording of a variety of commonly used audio and video formats, as well as still image files, and the like. The media library may support a variety of audio and video encoding formats, such as MPEG4, H.264, MP3, AAC, AMR, JPG, PNG, and the like.
The graphics engine is a drawing engine for image processing. In the embodiment of the application, the graphic engine can be used for processing the original video image into the target video image.
The kernel layer is a layer between hardware and software. The kernel layer at least comprises a display driver, a camera driver, an audio driver, a sensor driver and the like. The camera driver may be configured to drive the camera of the electronic device 10 to perform shooting, and the display driver may be configured to display an image on a display panel of the display screen.
Some concepts related to embodiments of the present application are described below.
In the embodiment of the present application, the "original video image" represents a video image captured by the camera device of the electronic apparatus. The camera device includes an optical lens module and an image sensor. During shooting, light from the subject is focused by the optical lens module and projected onto the image sensor, for example a charge-coupled device (CCD). A CCD is made of a highly sensitive semiconductor material and typically contains many photosensitive units, often counted in megapixels. When light strikes the CCD surface, each photosensitive unit accumulates a charge, converting the light into an electrical signal; the signals generated by all the photosensitive units together form a complete electrical signal. The CCD then transmits the electrical signal to the ISP to be converted into a digital image signal. The ISP outputs the digital image signal to the DSP for processing, and the DSP converts the digital image signal into an image signal in standard RGB, YUV or another format.
The imaging process of the camera essentially reflects a chain of coordinate-system transformations: a point on the subject in the world coordinate system is first transformed into the camera coordinate system of the imaging device, then projected onto the physical image coordinate system of the imaging plane, and finally converted into the pixel coordinate system of the image.
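For illustration only, a minimal sketch of this coordinate chain, assuming a standard pinhole model with an intrinsic matrix K and ignoring lens distortion (all values below are placeholders, not from the patent):

```python
import numpy as np

def world_to_pixel(p_world, R, t, K):
    # World -> camera coordinates (rigid transform), camera -> image plane
    # (perspective divide), image plane -> pixel coordinates (intrinsics K).
    p_cam = R @ p_world + t
    p_img = p_cam / p_cam[2]
    uv = K @ p_img
    return uv[:2]

# Example: a point 2 m in front of an identity-pose camera.
K = np.array([[1000.0, 0.0, 960.0],
              [0.0, 1000.0, 540.0],
              [0.0, 0.0, 1.0]])
print(world_to_pixel(np.array([0.1, 0.0, 2.0]), np.eye(3), np.zeros(3), K))
```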
In the process of projecting a subject through the lens onto the image plane, the subject is deformed by the lens module (optical distortion) and also by the perspective projection (perspective distortion). Perspective distortion includes stretch distortion. When an object (e.g., a person) images near the edge of the frame, then because the object is three-dimensional, an object of the same size images larger the closer it is to the edge, causing stretch deformation. The stretch distortion becomes more noticeable the closer the subject is to the edge of the lens and the larger the field of view of the lens.
Therefore, in the embodiment of the present application, the "original video image" captured by the camera device may also be referred to as a "distorted video image".
In order to reduce distortion in a video image, in the embodiment of the present application, first offset information for optical distortion correction and second offset information for anti-shake may be determined, and then third offset information for stretching distortion correction may be determined, and video image correction may be performed based on the first offset information, the second offset information, and the third offset information.
Based on the above description, some video distortion correction methods provided by the embodiments of the present application are given below.
For the sake of convenience, the method embodiments described below are all expressed as a combination of a series of action steps, but those skilled in the art should understand that the specific implementation of the technical solution of the present application is not limited by the order of the series of action steps described.
Fig. 4 is a flowchart of a video distortion correction method according to an embodiment of the present application. The method may be applied to an electronic device having a photographing function (e.g., the electronic device in fig. 1) and may also be applied to an electronic device not having a photographing function. The method includes, but is not limited to, the steps of:
401, a video image is acquired.
In one embodiment of the present application, an electronic apparatus includes an image pickup device. The user can shoot videos through the camera APP on the electronic device. The user interface of the camera APP may include a shooting mode option (for example, a night scene mode, a portrait mode, a photographing mode, a video recording mode, and the like) and a shooting control, and when the user selects the video recording mode and clicks or touches the shooting control on the user interface of the camera APP, the electronic device drives the camera device to record a video to obtain a video image.
It should be noted that the video distortion correction method provided in the embodiments of the present application can process a video image in real time. The scheme can be effective not only during video recording (namely after a user clicks or touches the shooting control), but also during previewing. That is, the user selects the recording mode, and when recording is not started, the preview stream can be processed according to this scheme. After the video recording is started, the preview stream and the recorded stream can be processed according to the scheme, so that the consistency of the video recording and the preview is ensured. The details will be described in fig. 5.
In another embodiment of the present application, the electronic device does not include an imaging device. Video images taken by other devices may be acquired. For example, the electronic apparatus acquires a video image from an externally connected camera, or downloads a video image from a network.
402, first offset information for performing optical distortion correction on the video image and a first crop size corresponding to the first offset information are determined.
The offset information is used for moving pixel points of the image to obtain a corrected image. The offset information comprises the offset of the pixel point in the horizontal direction and the offset of the pixel point in the vertical direction, and the pixel can be moved from the original position to a new position according to the offset information.
The first offset information is used for carrying out optical distortion correction on the image so as to correct deformation caused by the lens module.
Specifically, the electronic device may use an optical distortion correction algorithm to obtain the first offset information for the video image. Illustratively, the optical distortion correction algorithm may be the Zhang Zhengyou camera calibration algorithm.
Optical distortion correction causes the image boundaries to become irregular (i.e., to become a non-rectangular image). The first crop size is used to crop the irregular boundary portion caused by the optical distortion correction (i.e., to change the non-rectangular image back into a rectangular image), and may be determined based on the first offset information. Typically, a crop size has a range constraint that affects the determination of the corresponding offset information. In this embodiment, the range constraint of the first crop size affects the determination of the first offset information.
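For illustration only, a minimal sketch, assuming OpenCV and per-pixel offsets, of how an offset map might be applied and the irregular border cropped; the function and variable names are assumptions, not the patent's implementation:

```python
import cv2
import numpy as np

def warp_and_crop(image, dx, dy, crop):
    # dx, dy: per-pixel horizontal/vertical offsets (float32, same H x W
    # as the image). Each output pixel samples the source at its own
    # position plus the offset; the border left irregular by the warp is
    # then cropped away.
    h, w = image.shape[:2]
    xs, ys = np.meshgrid(np.arange(w, dtype=np.float32),
                         np.arange(h, dtype=np.float32))
    corrected = cv2.remap(image, xs + dx, ys + dy, cv2.INTER_LINEAR)
    return corrected[crop:h - crop, crop:w - crop]
```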
403, second offset information for performing anti-shake processing on the video image and a second crop size corresponding to the second offset information are determined.
The second offset information is used for performing anti-shake processing on the video image so that the video image becomes clear.
A preset object can be shot by the camera device. A gyroscope in the camera device collects the lens rotation angular velocity while a frame is captured, and the lens rotation angle is determined from the angular velocity and the time interval between two adjacent frames. The captured image is compared with the standard image of the preset object to obtain the image offset of the captured image relative to the standard image, which is taken as the image offset corresponding to that lens rotation angle. Through multiple shots, the image offsets corresponding to multiple lens rotation angles can be determined, thereby establishing a correspondence between lens rotation angle and image offset. The second offset information is obtained according to this correspondence.
The anti-shake processing causes the image boundaries to become irregular. The second crop size is used to crop the irregular boundary portion caused by the anti-shake processing, and may be determined based on the second offset information. In this embodiment, the second crop size has a range constraint, and this range constraint affects the determination of the second offset information.
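For illustration only, a minimal sketch of the calibration lookup described above; the class, its names, and the calibration values are assumptions, not the patent's implementation:

```python
import numpy as np

class AntiShakeTable:
    # Correspondence between lens rotation angle and image offset, built
    # offline by shooting a preset object and comparing each captured
    # frame with its standard image (angles sorted ascending; offsets in
    # pixels; all values below are placeholders).
    def __init__(self, angles, offsets):
        self.angles = np.asarray(angles, dtype=float)
        self.offsets = np.asarray(offsets, dtype=float)

    def second_offset(self, angular_velocity, frame_interval):
        # Lens rotation angle = angular velocity x inter-frame interval,
        # then interpolate the corresponding image offset.
        angle = angular_velocity * frame_interval
        return np.interp(angle, self.angles, self.offsets)

table = AntiShakeTable(angles=[0.0, 0.01, 0.02], offsets=[0.0, 6.5, 13.2])
print(table.second_offset(angular_velocity=0.3, frame_interval=0.033))
```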
404, third offset information for performing stretch distortion correction on the video image and a third crop size corresponding to the third offset information are determined.
In an embodiment of the application, an energy function with time domain stability constraint is constructed for a video image, and the energy function is optimized and solved to obtain third offset information.
The temporal stability constraint is used to constrain the magnitude of the change in offset information (namely the third offset information) at the same position between adjacent video frames, ensuring that the offset information does not change abruptly between adjacent frames and avoiding video judder caused by the stretch distortion correction.
The specific content of determining the third offset information by constructing an energy function with a temporal stability constraint will be described in fig. 6.
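For illustration only, a toy sketch (assuming SciPy, and not the patent's actual energy or solver) of minimizing a data term plus a temporal stability term over grid offsets; the full energy described in this application also carries foreground, background, regularity and, optionally, boundary terms:

```python
import numpy as np
from scipy.optimize import least_squares

def solve_third_offsets(target, prev_offsets, lam=1.0):
    # Toy energy: a data term pulling the grid offsets toward this frame's
    # correction target, plus a temporal term tying them to the previous
    # frame's solved offsets (weighted by lam).
    def residuals(x):
        data = x - target.ravel()
        temporal = np.sqrt(lam) * (x - prev_offsets.ravel())
        return np.concatenate([data, temporal])
    sol = least_squares(residuals, prev_offsets.ravel())
    return sol.x.reshape(target.shape)

target = np.random.rand(17, 17, 2)   # per-grid-point correction target (placeholder)
prev = np.zeros((17, 17, 2))         # previous frame's solved offsets
offsets = solve_third_offsets(target, prev)
```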
The face frame position, the portrait position and the optical flow information in the video image can be detected, and the temporal stability constraint can be calculated according to the face frame position, the portrait position and the optical flow information.
In another embodiment of the present application, the third offset information may be determined by a global deformation projection model. For each scene, third offset information corresponding to the scene may be determined through the global deformation projection model. The images in the same scene use the same third offset information, and the images in different scenes use different third offset information. Or, the scenes may not be distinguished, and the unified third offset information may be determined by the global deformation projection model, and the same third offset information is used for images in different scenes. The specific content of determining the third offset information by the global deformation projection model will be described in fig. 8.
The scenes in the video image may include single-person scenes and multi-person scenes. An image containing a single person is a single-person scene; an image containing two or more persons is a multi-person scene.
Alternatively, the scenes in the video image may include a wide-angle shooting scene and a normal shooting scene. The scene shot through the wide-angle camera is a wide-angle shooting scene, and the scene shot through the non-wide-angle camera is a common shooting scene.
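For illustration only, a minimal sketch of scene-keyed selection of precomputed third offset information, following the per-scene variant above; the keys, metadata fields and placeholder values are assumptions, not from the patent:

```python
def third_offsets_from_model(frame_meta, offset_tables):
    # offset_tables: grid-offset fields produced by offline simulation
    # learning of the global deformation projection model, keyed by scene.
    # Images in the same scene share the same third offset information.
    scene = ("wide" if frame_meta["wide_angle"] else "normal",
             "multi" if frame_meta["num_people"] > 1 else "single")
    return offset_tables[scene]

tables = {("wide", "single"): "offsets_ws", ("wide", "multi"): "offsets_wm",
          ("normal", "single"): "offsets_ns", ("normal", "multi"): "offsets_nm"}
print(third_offsets_from_model({"wide_angle": True, "num_people": 1}, tables))
```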
Stretch distortion correction may cause image boundaries to become irregular. The third crop size is used to crop the irregular boundary portion resulting from the stretch distortion correction. A third crop size may be determined based on the third offset information. In this embodiment, the third crop size has a range constraint, and the range constraint of the third crop size affects the determination of the third offset information.
In one embodiment of the application, the constructed energy function has no boundary constraints, and thus stretch distortion correction based on the third offset information causes image boundaries to become irregular. The third crop size is used to crop the irregular boundary portion resulting from the stretch distortion correction.
In another embodiment of the present application, a boundary constraint may be added to the constructed energy function, so as to ensure the stretching distortion corrected image boundary rule, and the stretching distortion corrected image boundary does not need to be clipped.
Whether to add a boundary constraint in the constructed energy function may be adaptively adjusted.
Whether to add a boundary constraint in the constructed energy function may be determined based on the portrait position in the video image. For example, the boundary constraint may be omitted when the portrait is close to the edge of the image and added when the portrait is close to the center of the image, so that the portrait correction effect is well maintained while as little field of view as possible is lost.
For example, the distance between the center of the face frame and the center of the video image may be calculated, and when the distance is greater than or equal to a preset threshold, it is determined that the portrait is close to the edge of the image, and at this time, no boundary constraint is added. When the distance is smaller than a preset threshold value, the situation that the portrait is close to the center of the image is judged, and at the moment, boundary constraint is added.
Alternatively, whether to add a boundary constraint to the constructed energy function may be determined based on the number of portraits in the video image. For example, the boundary constraint may be added when there is one person in the video image, and omitted when there are several.
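As an illustration of this adaptive decision, the following is a minimal sketch in Python; the normalized distance threshold, the face-box representation, and all names are assumptions for illustration rather than the patent's implementation:

```python
import math

def use_boundary_constraint(face_box, image_w, image_h, num_persons,
                            dist_thresh=0.25):
    # face_box = (x, y, w, h); distance of its centre from the image
    # centre, normalized by the image diagonal.
    fx = face_box[0] + face_box[2] / 2.0
    fy = face_box[1] + face_box[3] / 2.0
    d = math.hypot(fx - image_w / 2.0, fy - image_h / 2.0)
    d /= math.hypot(image_w, image_h)
    if num_persons > 1:
        return False        # multiple portraits: omit the constraint
    return d < dist_thresh  # near the centre: add the constraint
```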
405, the video image is mapped according to the first offset information, the second offset information, and the third offset information to obtain a corrected video image.
Total offset information can be obtained from the first offset information, the second offset information, and the third offset information, and the video image is mapped according to the total offset information to obtain the corrected video image. For how the total offset information is obtained from the first, second, and third offset information, refer to the description of step 907.
406, the corrected video image is cropped according to the first crop size, the second crop size, and the third crop size to obtain a target video image.
The total crop size can be obtained from the first crop size, the second crop size, and the third crop size, and the corrected video image is cropped according to the total crop size to obtain the target video image.
In the embodiment of the present application, in order to reduce the amount of computation and reduce the processing time, the first offset information, the second offset information, and the third offset information may be grid-based offset information.
A grid is a way of partitioning an image: the image is down-sampled to obtain a low-resolution grid map, and each pixel in the grid map serves as a grid point. For example, the image is divided into 17 × 17 pixel blocks, the center of each pixel block is taken as a grid point (i.e., each grid point corresponds to one pixel block), and all the grid points form a low-resolution grid map.
Grid-based offset information means that each grid point has an offset in the horizontal direction and an offset in the vertical direction; the offset of a grid point represents the offset information of the pixel block corresponding to that grid point, and the offset of each individual pixel is finally obtained by interpolation.
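The following is a minimal sketch of recovering per-pixel offsets from grid-based offsets by interpolation; the use of OpenCV's bilinear resize and all names are assumptions for illustration:

```python
import cv2
import numpy as np

def grid_to_pixel_offsets(grid_dx, grid_dy, image_size):
    # grid_dx / grid_dy: per-grid-point offsets, e.g. shape (17, 17)
    # for a 17 x 17 grid map; image_size: (width, height) of the image.
    dx = cv2.resize(grid_dx.astype(np.float32), image_size,
                    interpolation=cv2.INTER_LINEAR)
    dy = cv2.resize(grid_dy.astype(np.float32), image_size,
                    interpolation=cv2.INTER_LINEAR)
    return dx, dy
```

With a 17 × 17 grid map, any optimization only has to produce 2 × 289 offsets instead of two offsets per pixel, which is what makes grid-based offset information cheap to compute.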
In this embodiment, the first offset information corresponding to optical distortion correction, the second offset information corresponding to anti-shake processing, and the third offset information corresponding to stretch distortion correction are determined, and the video image is mapped once according to all three to obtain the corrected video image. This avoids the large amount of time consumed by multiple mappings and increases the video image processing speed. The third offset information may be determined by an energy function with a temporal stability constraint or by a global deformation projection model, thereby ensuring the temporal stability of the stretch-distortion-corrected video image.
Fig. 5 is a supplement to the flowchart shown in fig. 4. If the video distortion correction method provided in the embodiment of the present application is applied to an electronic device (for example, the electronic device in fig. 1) having a shooting function, and performs video distortion correction on a video image (which may be a preview stream or a recording stream) acquired by a camera of the electronic device in real time, the method may further include the following steps:
501, it is determined whether the electronic device switches between the wide angle camera and the main camera.
An electronic device (e.g., a cell phone) may include multiple cameras, including, for example, a main camera, a wide camera, and a tele camera. Different cameras have different field angles, and when a recorded scene changes, a proper camera needs to be switched to change the field angle.
Whether the user performs a switching operation between the wide-angle camera and the main camera can be detected; when such an operation is detected, it is determined that the electronic device switches between the wide-angle camera and the main camera. For example, the user's operation of the focal length adjustment control can be detected to determine whether to switch between the wide-angle camera and the main camera. For example, if the focal length adjustment control offers four selectable focal lengths of 0.6x, 1x, 2x, and 5x, where 0.6x corresponds to the wide-angle camera and 1x corresponds to the main camera, then adjusting from 1x to 0.6x through the control switches the electronic device from the main camera to the wide-angle camera. Alternatively, whether an automatic switching condition between the wide-angle camera and the main camera is satisfied can be detected; if so, the electronic device switches between the two. For example, when the number of persons in the video image is detected to exceed three, the automatic switching condition is satisfied and the electronic device switches from the main camera to the wide-angle camera.
502, if the electronic device switches between the wide-angle camera and the main camera, update the first cropping size, the second cropping size, and the third cropping size.
When the electronic device switches between the wide-angle camera and the main camera, the crop sizes corresponding to anti-shake processing, optical distortion correction, and stretch distortion correction are updated so that the field-of-view jump remains fixed during the switch.
The image size corresponding to the wide-angle camera is Fov_sub, the image size corresponding to the main camera is Fov_main, and the difference between the output fields of view of the main camera and the wide-angle camera is ΔFov (Fov_sub, Fov_main, and ΔFov are known quantities). The first crop size corresponding to optical distortion correction is Crop_pdc, the second crop size corresponding to anti-shake processing is Crop_eis, and the third crop size corresponding to stretch distortion correction is Crop_extra_new.
For the wide-angle camera, a crop size Crop_extra is reserved in the anti-shake processing:
Crop_extra = Crop_pdc + Crop_extra_new (2.2.3-1).
The image sizes of the main camera and the wide-angle camera satisfy:
Fov_sub = Fov_main + ΔFov + Crop_eis + Crop_extra (2.2.3-2).
Substituting (2.2.3-1) into (2.2.3-2) gives:
Fov_sub = Fov_main + ΔFov + Crop_eis + Crop_pdc + Crop_extra_new (2.2.3-3).
The first crop size, the second crop size, and the third crop size are updated according to the above formulas so that the output field-of-view difference ΔFov between the wide-angle camera and the main camera does not change, keeping the field-of-view jump fixed when switching between the wide-angle camera and the main camera.
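A minimal sketch of this update, assuming all quantities are expressed in the same units (e.g., pixels); the function name and signature are illustrative:

```python
def update_crop_sizes(fov_sub, fov_main, delta_fov, crop_pdc, crop_eis):
    # Solve (2.2.3-3) for Crop_extra_new so that the output
    # field-of-view difference delta_fov stays fixed on switching.
    crop_extra_new = fov_sub - fov_main - delta_fov - crop_eis - crop_pdc
    # Crop reserved by the wide-angle anti-shake processing (2.2.3-1).
    crop_extra = crop_pdc + crop_extra_new
    return crop_extra_new, crop_extra
```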
After the electronic device switches between the wide-angle camera and the main camera, if video distortion correction continues to be performed on the video images acquired by the camera device in real time, the video images are cropped using the updated first, second, and third crop sizes.
Fig. 6 is a flowchart of a video distortion correction method according to another embodiment of the present application. The embodiment shown in fig. 6 is applied to an electronic apparatus having a photographing function (e.g., the electronic apparatus in fig. 1). The electronic device includes an image capture device, a processor, and a motion sensor (e.g., a gyroscope). The image pickup device includes an image sensor.
As shown in fig. 6, the video distortion correction method provided in the embodiment of the present application specifically includes:
601, the image sensor generates a raw video image.
The raw video image is a series of raw images generated by the image sensor in chronological order. The image sensor performs photoelectric conversion on the optical image formed by the scene through the lens to obtain an electrical signal, and then performs analog-to-digital conversion on the electrical signal to obtain a raw image.
602, the processor preprocesses the raw video image to obtain an initial video stream.
In one embodiment of the present application, the processor that pre-processes the original video image may be an image signal processor. The pre-processing of the original video image may include black level compensation, bad pixel correction, and the like.
In one embodiment of the present application, the initial video stream may be in YUV format. In other embodiments of the present application, the initial video stream may be in other formats (e.g., RGB format).
603, the processor converts the initial video stream into a tiny stream, a preview stream, and a recording stream.
The tiny stream is used for face detection, portrait segmentation, and optical flow information calculation. The preview stream is used for video preview. The recording stream is used for recording and storing the video.
The resolution of the tiny stream is low, for example 720 × 540; the resolutions of the preview stream and the recording stream are higher, for example 1920 × 1080. The tiny stream, the preview stream, and the recording stream may all have a lower resolution than the initial video stream. Their resolutions can be set according to actual needs, and the initial video stream is down-sampled to the corresponding resolutions to obtain them. For example, the initial video stream may have a resolution of 4096 × 3072, the tiny stream 720 × 540, the preview stream 1920 × 1080, and the recording stream 1920 × 1080. The image sizes of the tiny stream, the preview stream, and the recording stream may also be smaller than that of the initial video stream.
The image sizes of the tiny stream, the preview stream and the recording stream can be set according to actual needs, and the initial video stream is cut according to the corresponding image size.
604, the processor performs face detection, portrait segmentation, and optical flow calculation on the tiny stream to obtain a first face frame position, a first portrait position, and first optical flow information.
Performing face detection on the tiny stream means detecting the position of a face (i.e., the face frame position) in each image of the tiny stream. Face detection may be performed on the tiny stream using Haar feature extraction and an AdaBoost classifier.
Performing portrait segmentation on the tiny stream means detecting the position of a human body (i.e., the portrait position) in each image of the tiny stream; portrait segmentation on the tiny stream can generate a portrait mask image. Portrait segmentation may be performed using a graph-theory-based Graph Cut algorithm or a deep-learning-based image semantic segmentation algorithm.
The optical flow calculation performed on the tiny stream determines the speed and direction of motion of the pixels in each image of the tiny stream. The Lucas-Kanade algorithm or a deep-learning-based optical flow estimation algorithm can be used to perform optical flow calculation on the tiny stream.
In computer vision, optical flow is used to describe the motion of objects in an image, whether caused by camera movement or by object movement. Specifically, optical flow is the displacement of a pixel point representing the same object from one frame of a video to the next, represented by a two-dimensional vector.
Since the resolution of the tiny stream is lower than that of the preview and recording streams, the computational cost of face detection, portrait segmentation, and optical flow calculation is greatly reduced.
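As an illustration, the following sketch runs the tiny-stream analysis with OpenCV; the Haar cascade, the Farneback dense flow, the 720 × 540 size, and all names are assumptions consistent with the examples above, not the patent's exact implementation:

```python
import cv2

cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def analyze_tiny_frame(prev_gray, frame):
    # Down-sample to the low "tiny" resolution to keep analysis cheap.
    tiny = cv2.resize(frame, (720, 540))
    gray = cv2.cvtColor(tiny, cv2.COLOR_BGR2GRAY)
    faces = cascade.detectMultiScale(gray)   # face frame positions
    flow = None
    if prev_gray is not None:
        # Farneback dense flow: one 2-D motion vector per pixel.
        flow = cv2.calcOpticalFlowFarneback(
            prev_gray, gray, None, 0.5, 3, 15, 3, 5, 1.2, 0)
    return gray, faces, flow
```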
605, the processor post-processes the preview stream and the recording stream.
In one embodiment of the application, the processor that post-processes the preview and recording streams may be an image signal processor. Post-processing of the preview and recording streams may include noise reduction.
It should be appreciated that in other embodiments of the present application, the preview and recording streams may not be post-processed.
606, the processor determines first offset information for optical distortion correction and a first crop size corresponding to the first offset information.
The first offset information is used for performing optical distortion correction on the preview stream and the recording stream to correct distortion caused by the lens module. The first cropping size is used to crop the irregular boundary portion caused by the optical distortion correction.
The electronic device may use an optical distortion correction algorithm to determine the first offset information for the preview stream and the recording stream. Illustratively, the optical distortion correction algorithm may be Zhang Zhengyou's camera calibration algorithm.
607, the processor determines second offset information for anti-shake processing and a second crop size corresponding to the second offset information.
The second offset information is used to perform anti-shake processing on the preview stream and the recording stream to make the image clear. The second cropping size is used for cropping the irregular boundary portion caused by the anti-shake processing.
A preset object can be shot by the camera device. A gyroscope in the camera device collects the lens rotation angular velocity while one frame is shot, and the lens rotation angle is determined from the angular velocity and the time interval between two adjacent frames. The shot image is compared with a standard image of the preset object to obtain the image offset of the shot image relative to the standard image, which serves as the image offset corresponding to that lens rotation angle. Through multiple shots, image offsets corresponding to multiple lens rotation angles can be determined, thereby establishing a correspondence between lens rotation angle and image offset. The second offset information is then obtained from this correspondence.
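A hedged sketch of building and querying this angle-to-offset correspondence; the linear interpolation and all names are assumptions for illustration:

```python
import numpy as np

class ShakeCalibration:
    def __init__(self):
        self.samples = []   # (lens rotation angle, image offset) pairs

    def add_shot(self, angular_velocity, frame_interval, image_offset):
        # Lens rotation angle = angular velocity x inter-frame interval.
        self.samples.append((angular_velocity * frame_interval,
                             image_offset))

    def offset_for(self, angle):
        # Second offset information: interpolate the calibrated
        # angle -> offset correspondence at the measured angle.
        angles, offsets = map(np.asarray, zip(*sorted(self.samples)))
        return np.interp(angle, angles, offsets)
```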
608, the processor maps the post-processed preview stream and the recording stream according to the first offset information and the second offset information to obtain a first corrected preview stream and a first corrected recording stream, and clips the first corrected preview stream and the first corrected recording stream according to the first clipping size and the second clipping size to obtain a second corrected preview stream and a second corrected recording stream.
In the embodiment shown in fig. 6, the processor implements optical distortion correction and anti-shake processing through one mapping (i.e., mapping the post-processed preview and recording streams according to the first and second offset information) and one cropping (i.e., cropping the first corrected preview and recording streams according to the first and second crop sizes). In other embodiments, the processor may instead perform optical distortion correction in one mapping and anti-shake processing in another: after determining the first offset information for optical distortion correction and the corresponding first crop size, it performs a first mapping (optical distortion correction) and a first cropping on the post-processed preview and recording streams according to the first offset information; after determining the second offset information for anti-shake processing and the corresponding second crop size, it performs a second mapping (anti-shake processing) and a second cropping on the once-mapped preview and recording streams according to the second offset information. Realizing optical distortion correction and anti-shake processing in a single mapping speeds up video distortion correction and is better suited to real-time video processing.
609, the processor maps the first face frame position, the first face position and the first optical flow information according to the first offset information and the second offset information to obtain a second face frame position, a second face position and second optical flow information.
The first face frame position, the first portrait position, and the first optical flow information are obtained from the tiny stream, which has undergone neither optical distortion correction nor anti-shake processing, whereas the second corrected preview stream and the second corrected recording stream have. The first face frame position, first portrait position, and first optical flow information are therefore mapped according to the first and second offset information to obtain the face frame position, portrait position, and optical flow information matched to the second corrected preview stream and the second corrected recording stream.
610, the processor constructs an energy function with a temporal stability constraint for the second corrected preview stream and the second corrected recording stream, performs optimization solution on the energy function to obtain third offset information for stretch distortion correction, and obtains a third crop size corresponding to the third offset information.
Constructing an energy function with temporal stability constraints for the second correction preview stream and the second correction recording stream refers to constructing an energy function for each frame of image of the second correction preview stream and the second correction recording stream.
In an embodiment of the present application, a face region may be determined from the second face frame position (the region corresponding to the second face frame position is used as the face region), or from the second face frame position and the second portrait position together (the overlap of the region corresponding to the second face frame position and the region corresponding to the second portrait position is used as the face region), and the temporal stability constraint of the energy function is calculated from the face region. Alternatively, the temporal stability constraint may be computed from the second optical flow information. The constraint obtained from the face region and the constraint obtained from the second optical flow information may also be combined by weighted summation into a final temporal stability constraint.
A temporal stability constraint may be computed from the face region. Let the positions of the nearest grid points in the face region of one video frame and in the face region of the previous frame be Vij and Uij respectively; the temporal stability constraint is |Uij − Vij|. As shown in fig. 7, assume the circular area is the face region: the face motion trend is determined from the face regions of frames n−2 and n−1; assuming the face moves in uniform linear motion, the face region of frame n is estimated, and the temporal stability constraint is added to the energy function using the estimated face region of frame n. If the positions of the nearest grid points in the face regions of frame n−1 and frame n are Uij and Vij respectively, the temporal stability constraint of frame n is |Uij − Vij|.
A temporal stability constraint may also be calculated from the optical flow information. Optical flow allows motion estimation of foreground and background regions; therefore, for each grid point in a video frame (e.g., frame n), its position in the previous frame (e.g., frame n−1) can be deduced backwards from the flow. Suppose the position of a grid point in a video frame is Qij and the corresponding position deduced in the previous frame by the optical flow method is Pi′j′; the temporal stability constraint is |Pi′j′ − Qij|. The deduced position may not be a grid point (it is then called a sub-pixel point; in fig. 8, the position deduced from Qij is Pi′j′, which is not a grid point). When the deduced position is not a grid point, it is represented by interpolating its four adjacent grid points.
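The two constraint terms can be sketched as follows; the grid shapes, the bilinear sub-pixel interpolation, and all names are assumptions for illustration:

```python
import numpy as np

def bilinear_sample(field, y, x):
    # Represent a sub-pixel position by its four adjacent grid points.
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    dy, dx = y - y0, x - x0
    return ((1 - dy) * (1 - dx) * field[y0, x0]
            + (1 - dy) * dx * field[y0, x0 + 1]
            + dy * (1 - dx) * field[y0 + 1, x0]
            + dy * dx * field[y0 + 1, x0 + 1])

def face_term(grid_prev, grid_cur):
    # |Uij - Vij| over nearest grid points of consecutive face regions.
    return np.abs(grid_prev - grid_cur).sum()

def flow_term(disp_cur, disp_prev, flow):
    # Back-project each grid point Qij along the flow to a (possibly
    # sub-pixel) position Pi'j' in the previous frame and penalize the
    # difference between the displacements at the two positions.
    h, w, _ = disp_cur.shape
    cost = 0.0
    for i in range(h):
        for j in range(w):
            pi = min(max(i - flow[i, j, 1], 0.0), h - 2)
            pj = min(max(j - flow[i, j, 0], 0.0), w - 2)
            prev = bilinear_sample(disp_prev, pi, pj)
            cost += np.abs(disp_cur[i, j] - prev).sum()
    return cost
```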
In one embodiment of the application, the constraint terms of the energy function include a foreground constraint, a background constraint, and a canonical constraint in addition to the temporal stability constraint. The foreground constraint is a constraint item corresponding to a foreground region of an image (i.e., an image in the second correction preview stream and the second correction recording stream), the background constraint is a constraint item corresponding to a background region of an image, and the regular constraint is a constraint item corresponding to a global image region.
Assuming that the coordinate matrix of the image before the stretching distortion correction (which may be referred to as an original image matrix) is M0, the coordinate matrix of the image after the stretching distortion correction (which may be referred to as a target image matrix) is Mt, and an element at (i, j) in Mt may be expressed using the following formula:
Mt(i,j) = [ut(i,j), vt(i,j)]^T
where ut (i, j) represents a position in the horizontal direction, and vt (i, j) represents a position in the vertical direction.
Its displacement matrix compared to M0 is Dt, and the element at (i, j) in Dt can be expressed using the following formula:
Dt(i,j) = [du(i,j), dv(i,j)]^T
where du (i, j) denotes the amount of shift of the grid point at (i, j) in the horizontal direction, and dv (i, j) denotes the amount of shift of the grid point at (i, j) in the vertical direction.
That is to say: mt (i, j) = M0 (i, j) + Dt (i, j).
A weight coefficient is assigned to each constraint term, and the energy function is constructed as follows:
Dt(i,j) = [du(i,j), dv(i,j)]^T = argmin(α1(i,j)*Term1(i,j) + α2(i,j)*Term2(i,j) + α3(i,j)*Term3(i,j) + α4(i,j)*Term4(i,j)).
where Term1 to Term4 are respectively the temporal stability constraint, foreground constraint, background constraint, and regular constraint, and α1(i,j) to α4(i,j) are respectively the weight coefficients (weight matrices) corresponding to Term1 to Term4.
In another embodiment of the application, the constraint terms of the energy function include a foreground constraint, a background constraint, a regularization constraint, and a boundary constraint, in addition to the temporal stability constraint. The boundary constraint is a constraint item corresponding to a boundary region of the image.
The energy function was constructed as follows:
Dt(i,j) = [du(i,j), dv(i,j)]^T = argmin(α1(i,j)*Term1(i,j) + α2(i,j)*Term2(i,j) + α3(i,j)*Term3(i,j) + α4(i,j)*Term4(i,j) + α5(i,j)*Term5(i,j)).
where Term1 to Term5 are respectively the temporal stability constraint, foreground constraint, background constraint, regular constraint, and boundary constraint, and α1(i,j) to α5(i,j) are respectively the weight coefficients corresponding to Term1 to Term5.
The energy function can be optimally solved (i.e., its minimum found) by the least squares method, gradient descent, or similar, yielding the displacement matrix Dt(i,j) for each pixel of the image, i.e., the third offset information.
The energy function can be optimally solved with the Ceres Solver.
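As an illustration only, the following sketch solves such a weighted multi-term energy with SciPy's least-squares solver in place of the Ceres Solver; the term interfaces, the weight handling, and all names are assumptions:

```python
import numpy as np
from scipy.optimize import least_squares

def solve_energy(residual_terms, weights, grid_shape):
    # residual_terms: callables mapping a (H, W, 2) displacement grid
    # Dt to a residual array (Term1..Term4/5); weights: the α values.
    def residuals(flat_dt):
        dt = flat_dt.reshape(grid_shape)
        return np.concatenate(
            [np.sqrt(a) * np.ravel(term(dt))   # least squares absorbs α
             for a, term in zip(weights, residual_terms)])
    x0 = np.zeros(int(np.prod(grid_shape)))    # start from zero offsets
    sol = least_squares(residuals, x0)         # minimizes sum of squares
    return sol.x.reshape(grid_shape)           # third offset information
```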
The foreground constraint (Term2) is used to constrain the target image matrix corresponding to the foreground region (e.g., the head region or the portrait region) to approximate the image matrix obtained after geometrically transforming the first image matrix (i.e., it constrains the geometric transformation of the foreground region), so as to correct the stretch deformation of the foreground region. The first image matrix is obtained by processing the initial image matrix corresponding to the foreground region with a spherical projection algorithm or a Mercator projection algorithm, and the geometric transformation includes at least one of image rotation, image translation, and image scaling.
The background constraint (Term3) is used to constrain the displacement of pixel points in the initial image matrix corresponding to the background region such that the first vector of a pixel before displacement and the second vector of the pixel after displacement remain parallel, so that the image content in the background region is smooth and continuous, and the background content passing behind the portrait remains visually continuous and consistent. The first vector is the vector between a pixel before displacement and its neighborhood pixel; the second vector is the vector between the displaced pixel and its neighborhood pixel.
The regular constraint (Term 4) is used for constraining the difference value of any two displacement matrixes in the displacement matrixes respectively corresponding to the foreground area, the background area and the boundary area in the image after the stretching distortion correction to be smaller than a preset threshold value, so that the global image content of the image after the stretching distortion correction is smooth and continuous.
The boundary constraint (Term5) is used to constrain the pixel points in the initial image matrix corresponding to the boundary region to be displaced along the edge of the first corrected video image or toward the outside of the image before stretch distortion correction, so as to maintain or enlarge the boundary region and thereby keep the scene field of view unchanged.
When the constraint terms of the energy function include a boundary constraint, the third crop size obtained by the processor is 0. When the constraint term of the energy function does not include a boundary constraint, the processor obtains a third crop size other than 0.
When constructing the energy function, the foreground and background must be separated in order to calculate the foreground and background constraints. The result of portrait segmentation can be used directly: the segmented portrait is the foreground and the rest is the background. Alternatively, the result of face detection can be used directly: the face frame is the foreground and the rest is the background. Alternatively, the face frame can be expanded to obtain a foreground region, ensuring that the whole face lies inside it. Alternatively, the face region can be found using both the face frame position and the portrait position and used as the foreground.
611, the processor maps the second correction preview stream and the second correction record stream according to the third offset information to obtain a third correction preview stream and a third correction record stream, and cuts the third correction preview stream and the third correction record stream according to the third cutting size to obtain a target preview stream and a target record stream.
It should be noted that, if the energy function constructed in 610 includes a boundary constraint and the third cropping size obtained by the processor is 0, in this step, it is not necessary to crop the third corrected preview stream and the third corrected recording stream according to the third cropping size.
612, the processor performs preview display according to the target preview stream.
For example, if the electronic device is a mobile phone, after video recording starts the processor displays the recorded video as a preview on the phone screen according to the target preview stream, ensuring consistency between video recording and previewing.
613, the encoder performs video coding on the target recording stream to obtain a target video file.
For example, if the electronic device is a mobile phone, the encoder video-encodes the target recording stream in the H.265 format to obtain a target video file, which is stored on the mobile phone.
In the embodiment of the present application, after the electronic device performs optical distortion correction and anti-shake processing on the video image, independent constraint terms are set when the offset information for stretch distortion correction is calculated through the energy function, so that the portrait is well corrected for stretch distortion. The energy function of this embodiment includes a temporal stability constraint, which guarantees the temporal continuity of the stretch-distortion-correction effect. It may further include a foreground constraint, which constrains the geometric transformation of the foreground region. It may further include a background constraint, which constrains the positional relationship of background pixel points and their neighborhood points before and after the transformation, so that the image content of the background region does not suffer distortion or discontinuities caused by the foreground correction. It may further include a regular constraint, so that the global image content of the stretch-distortion-corrected image is smooth and continuous. It may further include a boundary constraint, which applies content-adaptive boundary control to the boundary region so that no field of view is lost, minimizing the content loss and field-of-view loss caused by cropping. After the video image is corrected according to this embodiment, the portrait and the background content are natural, coherent, and coordinated, conforming to the habits of human vision and improving the user experience.
Fig. 9 is a flowchart of a video distortion correction method according to another embodiment of the present application. In this embodiment, the third offset information is determined by a global deformation projection model.
901, the image sensor generates a raw video image.
902, the processor preprocesses the raw video image to obtain an initial video stream.
903, the processor determines first offset information for optical distortion correction and a first crop size corresponding to the first offset information.
904, the processor determines second offset information for anti-shake processing and a second crop size corresponding to the second offset information.
905, the processor performs off-line simulation learning on the global deformation projection model.
The global deformation projection model may be a neural network model. Its input is an image before stretch distortion correction and its output is the image after stretch distortion correction; the offset information for stretch distortion correction (i.e., the third offset information) can be obtained from the images before and after correction.
The global deformation projection model comprises stretching distortion correction parameters, and the optimized stretching distortion correction parameters are obtained through off-line simulation learning. The stretch distortion correction parameters of the global deformation projection model include k0-k5.
In an embodiment of the present application, when performing offline simulation learning on the global deformation projection model, learning may be performed based on grids to obtain an offset of each grid point. If the point after the grid point mapping is not a grid point but a sub-pixel point, four adjacent grid points of the sub-pixel point are interpolated to represent the sub-pixel point. The off-line simulation learning of the global deformation projection model based on the grids is that training is carried out on the resolution level of the grid points, and only the offset of each grid point needs to be obtained, so that the learning speed of the global deformation projection model is increased, and the calculation amount is reduced.
Consistent with the definitions in 610, assume that the coordinate matrix of the grid-point image before stretch distortion correction is M0, the coordinate matrix of the grid-point image after stretch distortion correction is Mt, and the offset matrix between M0 and Mt is Dt.
The coordinates at grid point position (i, j) before stretch distortion correction are M0(i,j) = [u0(i,j), v0(i,j)]^T, the corresponding coordinates after stretch distortion correction are Mt(i,j) = [ut(i,j), vt(i,j)]^T, and the mapping coefficients in the horizontal and vertical directions are merge_coefficient_u(i,j) and merge_coefficient_v(i,j). The global deformation projection model satisfies the following relations:
For each position (i, j), in the horizontal direction:
u(i,j)=u0(i,j)*k0;
rsquare_u(i,j)=k1*u(i,j)*u(i,j)+k2*v0(i,j)*v0(i,j);
rfourpower_u(i,j)=rsquare_u(i,j)*rsquare_u(i,j);
rsixpower_u(i,j)=rsquare_u(i,j)*rsquare_u(i,j)*rsquare_u(i,j);
merge_coefficient_u(i,j)=1.0+k3*rsquare_u(i,j)+k4*rfourpower_u(i,j)+k5*rsixpower_u(i,j);
ut(i,j)=u(i,j)*merge_coefficient_u(i,j)。
For each position (i, j), in the vertical direction:
v(i,j)=v0(i,j)*k0;
rsquare_v(i,j)=k1*u0(i,j)*u0(i,j)+k2*v(i,j)*v(i,j);
rfourpower_v(i,j)=rsquare_v(i,j)*rsquare_v(i,j);
rsixpower_v(i,j)=rsquare_v(i,j)*rsquare_v(i,j)*rsquare_v(i,j);
merge_coefficient_v(i,j)=1.0+k3*rsquare_v(i,j)+k4*rfourpower_v(i,j)+k5*rsixpower_v(i,j);
vt(i,j)=v(i,j)*merge_coefficient_v(i,j)。
The offset in the horizontal direction is du(i,j) = ut(i,j) − u0(i,j), and the offset in the vertical direction is dv(i,j) = vt(i,j) − v0(i,j).
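The relations above translate directly into a short vectorized sketch; the parameter vector k (obtained by the offline simulation learning) and all names are placeholders for illustration:

```python
import numpy as np

def project(u0, v0, k):
    # u0, v0: grid-point coordinate arrays; k: parameters [k0..k5].
    u = u0 * k[0]
    r2_u = k[1] * u * u + k[2] * v0 * v0        # rsquare_u
    # merge_coefficient_u = 1 + k3*r^2 + k4*r^4 + k5*r^6
    coeff_u = 1.0 + k[3] * r2_u + k[4] * r2_u**2 + k[5] * r2_u**3
    ut = u * coeff_u

    v = v0 * k[0]
    r2_v = k[1] * u0 * u0 + k[2] * v * v        # rsquare_v
    coeff_v = 1.0 + k[3] * r2_v + k[4] * r2_v**2 + k[5] * r2_v**3
    vt = v * coeff_v

    # Third offset information: per-grid-point shifts du, dv.
    return ut - u0, vt - v0
```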
In one embodiment of the application, the global deformation projection model is subjected to off-line simulation learning according to scenes.
The off-line simulation learning of the global deformation projection model according to the scenes means that the off-line simulation learning of the global deformation projection model is respectively carried out on different scenes to obtain the optimized stretching distortion correction parameters corresponding to each scene.
For example, offline simulation learning may be performed on a single-person scene and a multi-person scene, respectively, to obtain an optimized stretch distortion correction parameter corresponding to the single-person scene and an optimized stretch distortion correction parameter corresponding to the multi-person scene. For another example, the offline simulation learning is performed on the wide-angle shooting scene and the normal shooting scene respectively to obtain the optimized stretching distortion correction parameter corresponding to the wide-angle shooting scene and the optimized stretching distortion correction parameter corresponding to the normal shooting scene.
906, the initial video stream is input into the global deformation projection model to obtain third offset information for performing stretch distortion correction on the initial video stream, and a third crop size is obtained from the third offset information.
907, the processor obtains total offset information according to the first offset information, the second offset information and the third offset information, and obtains a total cropping size according to the first cropping size, the second cropping size and the third cropping size.
Fig. 10 is a schematic diagram of obtaining total offset information from the first offset information, the second offset information, and the third offset information.
The combined offset information for the optical distortion correction and the anti-shake processing may be obtained from the first offset information and the second offset information, and the total offset information may be obtained from the combined offset information and the third offset information.
The total offset information (i.e., the three-in-one offset) is obtained from the combined offset information and the third offset information as follows: starting from each finally output grid point Vij, the corresponding offset information is traced back to the point Uij in the intermediate image, where Uij is represented by interpolating its four adjacent grid vertices; then, from Uij and the combined offset information, the corresponding point Pij on the original image is deduced; finally, the offset values between Vij and Pij (i.e., the offsets in the horizontal and vertical directions) are taken as the merged three-in-one mapping information. Performing this calculation for each grid point yields the offset of every grid point.
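A hedged sketch of this three-in-one merge; the (x, y) convention, the sampler interface, and all names are assumptions for illustration:

```python
import numpy as np

def merge_offsets(grid_v, third_offset, combined_offset, sample):
    # grid_v: (H, W, 2) final output grid points Vij as (x, y).
    # third_offset: (H, W, 2) backward offsets V -> U (middle image).
    # combined_offset: (H, W, 2) backward offsets U -> P (original).
    # sample(field, x, y): bilinear lookup of a 2-vector field at a
    # sub-pixel position, using the four adjacent grid vertices.
    h, w, _ = grid_v.shape
    total = np.empty_like(grid_v)
    for i in range(h):
        for j in range(w):
            v = grid_v[i, j]
            u = v - third_offset[i, j]                  # Vij -> Uij
            p = u - sample(combined_offset, u[0], u[1]) # Uij -> Pij
            total[i, j] = v - p    # merged three-in-one mapping V -> P
    return total
```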
908, the processor maps the initial video stream according to the total offset information to obtain a corrected video stream.
The processor applies the corresponding offset to each pixel and finally outputs the corrected video frames.
909, the processor crops the corrected video stream according to the total crop size to obtain a target video stream.
In the embodiment of the application, the unified third offset information is used for all frames of the video or all frames in the same scene, so that the stability problem does not exist between the frames, and the time domain stability of the video image after the stretching distortion correction is ensured.
If other post-processing needs to be performed on the target video stream, the target video stream can be output to other post-processing modules to obtain a final video result.
The present embodiment also provides a computer storage medium having stored therein computer instructions that, when run on an electronic device, cause the electronic device to perform the above-mentioned related method steps to implement the video distortion correction method in the above-mentioned embodiment.
The present embodiment also provides a computer program product, which, when run on an electronic device, causes the electronic device to execute the above-mentioned relevant steps to implement the video distortion correction method in the above-mentioned embodiment.
In addition, an apparatus, which may be specifically a chip, a component or a module, may include a processor and a memory connected to each other; the memory is used for storing computer execution instructions, and when the device runs, the processor can execute the computer execution instructions stored in the memory, so that the chip can execute the video distortion correction method in the above-mentioned method embodiments.
The electronic device, the computer storage medium, the computer program product, or the chip provided in this embodiment are all configured to execute the corresponding method provided above, so that the beneficial effects achieved by the electronic device, the computer storage medium, the computer program product, or the chip may refer to the beneficial effects in the corresponding method provided above, and are not described herein again.
Through the description of the foregoing embodiments, it will be clear to those skilled in the art that, for convenience and simplicity of description, only the division of the functional modules is illustrated, and in practical applications, the above function distribution may be completed by different functional modules as needed, that is, the internal structure of the apparatus may be divided into different functional modules to complete all or part of the above described functions.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other manners. For example, the above-described device embodiments are merely illustrative, and for example, the division of the module or unit is only one logical division, and there may be other divisions when actually implemented, for example, a plurality of units or components may be combined or integrated into another device, or some features may be omitted, or not executed.
The units described as separate parts may or may not be physically separate, and parts displayed as units may be one physical unit or a plurality of physical units, that is, may be located in one place, or may be distributed to a plurality of different places. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit may be implemented in the form of hardware, or may also be implemented in the form of a software functional unit.
The integrated unit, if implemented as a software functional unit and sold or used as a stand-alone product, may be stored in a readable storage medium. Based on this understanding, the technical solutions of the embodiments of the present application (in essence, the part contributing beyond the prior art), or all or part of the technical solutions, may be embodied in the form of a software product, where the software product is stored in a storage medium and includes several instructions for enabling a device (which may be a single-chip microcomputer, a chip, or the like) or a processor to execute all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, or other media capable of storing program code.
The above description is only an embodiment of the present application, but the scope of the present application is not limited thereto, and any changes or substitutions within the technical scope disclosed in the present application should be covered within the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (24)

1. A method for video distortion correction, the method comprising:
acquiring a video image;
determining offset information for stretch distortion correction of the video image by an energy function with a temporal stability constraint or a global deformation projection model;
and mapping the video image according to the offset information to obtain a corrected image.
2. The video distortion correction method of claim 1, wherein determining offset information for stretch distortion correction of the video image by an energy function having a temporal stability constraint comprises:
constructing an energy function with a time domain stability constraint on the video image, wherein the time domain stability constraint is used for constraining the size of offset information between adjacent video frames at the same position;
and carrying out optimization solution on the energy function to obtain the offset information.
3. The video distortion correction method of claim 2, wherein said constructing an energy function having a temporal stability constraint on the video image comprises:
acquiring a face area and/or optical flow information of the video image;
and calculating the time domain stability constraint according to the human face area and/or the optical flow information.
4. The video distortion correction method of claim 3, wherein said obtaining a face region of said video image comprises:
detecting the position of a face frame of the video image, and determining the face area according to the position of the face frame; or alternatively
And detecting the position of a face frame and the position of a portrait of the video image, and determining the face area according to the position of the face frame and the position of the portrait.
5. A video distortion correction method as defined in claim 3, wherein said computing the temporal stability constraint based on the face region comprises:
the positions of the nearest grid points in the face region of one video frame of the video image and the face region of the previous video frame of the video image are Vij and Uij respectively, and the time domain stability constraint is | Uij-Vij |.
6. The video distortion correction method of claim 3 wherein computing the temporal stability constraint based on the optical flow information comprises:
the position of a grid point in a video frame of the video image is Qij, the position of the grid point in the previous frame of the video frame, deduced by the optical flow method, is Pi′j′, and the temporal stability constraint is |Pi′j′ − Qij|.
7. A method for video distortion correction as defined in claim 1, wherein the energy function further has a foreground constraint, a background constraint, and a canonical constraint, or the energy function further has a foreground constraint, a background constraint, a canonical constraint, and a boundary constraint.
8. The method of claim 7, wherein the energy function has the boundary constraint if a portrait position in the video image is near a center of the video image; or alternatively
If the number of the portraits in the video image is one, the energy function has the boundary constraint.
9. The video distortion correction method of claim 1, wherein determining offset information for stretch distortion correction of the video image by a global warping projection model comprises:
performing off-line simulation learning on the global deformation projection model;
and inputting the video frame of the video image into the global deformation projection model to obtain the offset information.
10. The video distortion correction method of claim 9, wherein the off-line simulation learning of the global warping projection model comprises:
and performing off-line simulation learning on the global deformation projection model according to the scene.
11. The video distortion correction method of claim 10, wherein the off-line simulation learning of the global warping projection model by scene comprises:
performing off-line simulation learning on the global deformation projection model according to a single-person scene and a multi-person scene; or
And performing off-line simulation learning on the global deformation projection model according to a wide-angle shooting scene and a common shooting scene.
12. A video distortion correction method as defined in any one of claims 1 to 11, the method further comprising:
determining cutting information corresponding to the offset information;
and cutting the corrected image according to the cutting information to obtain a target image.
13. A video distortion correction method applied to electronic equipment is characterized by comprising the following steps:
acquiring a video image;
determining first offset information used for carrying out optical distortion correction on the video image and a first cutting size corresponding to the first offset information;
determining second offset information for performing anti-shake processing on the video image and a second cropping size corresponding to the second offset information;
determining third offset information for performing stretching distortion correction on the video image and a third cropping size corresponding to the third offset information;
mapping the video image according to the first offset information, the second offset information and the third offset information to obtain a corrected video image;
and cutting the corrected video image according to the first cutting size, the second cutting size and the third cutting size to obtain a target video image.
14. The video distortion correction method of claim 13, wherein said determining third offset information for stretch distortion correction of the video image comprises:
the third offset information is determined by an energy function or a global deformation projection model with a temporal stability constraint.
15. The video distortion correction method of claim 14, wherein determining the third offset information by an energy function having a temporal stability constraint comprises:
constructing an energy function with a time domain stability constraint for the video image, wherein the time domain stability constraint is used for constraining the size of offset information between adjacent video frames at the same position;
and carrying out optimization solution on the energy function to obtain the offset information.
16. The video distortion correction method of claim 15, wherein said constructing an energy function with temporal stability constraints on said video image comprises:
acquiring the face area and/or optical flow information of the video image;
and calculating the time domain stability constraint according to the human face area and/or the optical flow information.
17. A method for video distortion correction according to any of claims 14 to 16, wherein said energy function further has a boundary constraint if the position of a figure in said video image is close to the center of said video image; or
If the number of the portraits in the video image is one, the energy function also has the boundary constraint.
18. A video distortion correction method as claimed in any one of claims 14 to 16, characterized in that the method further comprises:
judging whether the electronic equipment is switched between the wide-angle camera and the main camera;
if the electronic equipment switches between the wide-angle camera and the main camera, the first cutting size, the second cutting size and the third cutting size are updated, so that the jumping of the view field is kept fixed when the wide-angle camera and the main camera are switched.
19. A video distortion correction method as defined in claim 18, wherein the updated first crop size, second crop size, and third crop size satisfy:
Fov_sub = Fov_main + ΔFov + Crop_eis + Crop_pdc + Crop_extra_new;
Crop_extra = Crop_pdc + Crop_extra_new;
wherein Fov_sub is the image size corresponding to the wide-angle camera, Fov_main is the image size corresponding to the main camera, ΔFov is the difference between the output fields of view of the main camera and the wide-angle camera, Crop_eis is the second crop size, Crop_pdc is the first crop size, Crop_extra_new is the third crop size, and Crop_extra is the crop size reserved for the anti-shake processing of the wide-angle camera.
20. A method for video distortion correction, the method comprising:
obtaining a tiny stream, a preview stream and a recording stream of a camera device;
carrying out face detection, portrait segmentation and optical flow calculation on the tiny stream to obtain a first face frame position, a first portrait position and first optical flow information;
determining first offset information used for optical distortion correction and a first cutting size corresponding to the first offset information;
determining second offset information used for anti-shake processing and a second cutting size corresponding to the second offset information;
mapping the preview stream and the recording stream according to the first offset information and the second offset information to obtain a first correction preview stream and a first correction recording stream, and cutting the first correction preview stream and the first correction recording stream according to the first cutting size and the second cutting size to obtain a second correction preview stream and a second correction recording stream;
mapping the first face frame position, the first face position and the first optical flow information according to the first offset information and the second offset information to obtain a second face frame position, a second face position and second optical flow information;
constructing an energy function with time domain stability constraint on the second correction preview stream and the second correction recording stream, and performing optimization solution on the energy function to obtain third offset information for stretching distortion correction and a third clipping size corresponding to the third offset information;
and mapping the second correction preview stream and the second correction recording stream according to the third offset information to obtain a third correction preview stream and a third correction recording stream, and cutting the third correction preview stream and the third correction recording stream according to the third cutting size to obtain a target preview stream and a target recording stream.
21. A video distortion correction method as defined in claim 20, further comprising:
performing preview display according to the target preview stream; and/or
Video encoding the target recording stream.
22. A computer readable storage medium comprising computer instructions which, when run on an electronic device, cause the electronic device to perform the video distortion correction method of any of claims 1 to 21.
23. An electronic device, comprising a processor and a memory, the memory storing instructions, the processor being configured to invoke the instructions in the memory such that the electronic device performs the video distortion correction method of any of claims 1 to 21.
24. A chip system is characterized in that the chip system is applied to electronic equipment; the chip system comprises an interface circuit and a processor; the interface circuit and the processor are interconnected through a line; the interface circuit is used for receiving signals from a memory of the electronic equipment and sending signals to the processor, and the signals comprise computer instructions stored in the memory; the computer instructions, when executed by a processor, cause a system-on-a-chip to perform the video distortion correction method of any of claims 1 to 21.
CN202110757403.0A 2021-07-05 2021-07-05 Video distortion correction method and related equipment Pending CN115587938A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110757403.0A CN115587938A (en) 2021-07-05 2021-07-05 Video distortion correction method and related equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110757403.0A CN115587938A (en) 2021-07-05 2021-07-05 Video distortion correction method and related equipment

Publications (1)

Publication Number Publication Date
CN115587938A true CN115587938A (en) 2023-01-10

Family

ID=84772515

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110757403.0A Pending CN115587938A (en) 2021-07-05 2021-07-05 Video distortion correction method and related equipment

Country Status (1)

Country Link
CN (1) CN115587938A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116538918A (en) * 2023-04-07 2023-08-04 钛玛科(北京)工业科技有限公司 Lithium battery material measurement correction method and device
CN118354145A (en) * 2024-06-17 2024-07-16 深圳市迈腾电子有限公司 Production workshop video monitoring method and device, electronic equipment and computer medium

Similar Documents

Publication Publication Date Title
CN114205522B (en) Method for long-focus shooting and electronic equipment
CN112529784B (en) Image distortion correction method and device
US11949978B2 (en) Image content removal method and related apparatus
CN113099146B (en) Video generation method and device and related equipment
US20230043815A1 (en) Image Processing Method and Electronic Device
CN113709355B (en) Sliding zoom shooting method and electronic equipment
US20230188830A1 (en) Image Color Retention Method and Device
CN116048244B (en) Gaze point estimation method and related equipment
CN115689963B (en) Image processing method and electronic equipment
CN114979465B (en) Video processing method, electronic device and readable medium
CN113536866A (en) Character tracking display method and electronic equipment
CN113538227B (en) Image processing method based on semantic segmentation and related equipment
CN116916151B (en) Shooting method, electronic device and storage medium
WO2022057384A1 (en) Photographing method and device
CN114697530B (en) Photographing method and device for intelligent view finding recommendation
CN115587938A (en) Video distortion correction method and related equipment
CN115883958A (en) Portrait shooting method
CN115880350A (en) Image processing method, apparatus, system, and computer-readable storage medium
CN114693538A (en) Image processing method and device
CN115880348B (en) Face depth determining method, electronic equipment and storage medium
CN115631098B (en) Antireflection method and device
CN116091572B (en) Method for acquiring image depth information, electronic equipment and storage medium
CN117689545B (en) Image processing method, electronic device, and computer-readable storage medium
WO2023231696A1 (en) Photographing method and related device
WO2023035868A1 (en) Photographing method and electronic device

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination