CN113542600A - Image generation method, device, chip, terminal and storage medium - Google Patents

Image generation method, device, chip, terminal and storage medium

Info

Publication number
CN113542600A
Authority
CN
China
Prior art keywords
target
light
visual angle
sample
video
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110782001.6A
Other languages
Chinese (zh)
Other versions
CN113542600B (en)
Inventor
王慧
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Oppo Mobile Telecommunications Corp Ltd
Original Assignee
Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Oppo Mobile Telecommunications Corp Ltd filed Critical Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority to CN202110782001.6A priority Critical patent/CN113542600B/en
Publication of CN113542600A publication Critical patent/CN113542600A/en
Application granted granted Critical
Publication of CN113542600B publication Critical patent/CN113542600B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00 Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/80 Camera processing pipelines; Components thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00 Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/60 Control of cameras or camera modules
    • H04N23/68 Control of cameras or camera modules for stable pick-up of the scene, e.g. compensating for camera body vibrations
    • H04N23/682 Vibration or motion blur correction
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 Road transport of goods or passengers
    • Y02T10/10 Internal combustion engine [ICE] based vehicles
    • Y02T10/40 Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Signal Processing (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Studio Devices (AREA)
  • Image Processing (AREA)

Abstract

The application belongs to the technical field of image processing, and particularly relates to an image generation method, an image generation device, a chip, a terminal and a storage medium. On one hand, by acquiring a preset neural network model corresponding to a target shooting scene, when a photo is taken in the target shooting scene, a new photo can be generated by the preset neural network model directly from the shooting visual angle of the photo to be taken, so that the problem of a blurred photo is solved and the shooting quality of photos is improved. On the other hand, by acquiring the preset neural network model corresponding to the target shooting scene, when a video is shot in the target shooting scene and the video shakes, a new video frame image can be generated by the preset neural network model directly from the shooting visual angle of the affected video frame, so that the shaking of the original video is removed while the content and the resolution of the original video are preserved, and the shooting quality of the video is improved.

Description

Image generation method, device, chip, terminal and storage medium
Technical Field
The present application belongs to the field of image processing technologies, and in particular, to an image generation method, an image generation device, a chip, a terminal, and a storage medium.
Background Art
In the process of using a mobile terminal such as a mobile phone or a tablet computer, a user often takes photos or videos.
However, in the process of taking a photo or shooting a video, the photo is often blurred, or the video picture shakes, due to various factors, so that the photo or video shooting quality is reduced.
Disclosure of Invention
The embodiment of the application provides an image generation method, an image generation device, a chip, a terminal and a storage medium, which are beneficial to improving the shooting quality of photos or videos.
A first aspect of an embodiment of the present application provides an image generation method, including:
determining at least one to-be-shot visual angle in a target shooting scene;
acquiring a preset neural network model corresponding to the target shooting scene;
obtaining a target pixel value of a target pixel point corresponding to each light direction under a target shooting visual angle based on the preset neural network model and target parameter information of a plurality of target light points in each light direction under the target shooting visual angle, wherein the target shooting visual angle is any one shooting visual angle in the at least one to-be-shot visual angle;
and outputting a target image formed under the target shooting visual angle based on the target pixel values of the target pixel points corresponding to all the light directions under the target shooting visual angle.
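Purely for illustration, the four steps above can be read as the following Python sketch; the helper behaviour, the near/far sampling bounds, the mean composition and the model interface are assumptions introduced here and are not part of the claimed method.

```python
import numpy as np

def generate_image_for_view(model, rays, near=0.1, far=6.0, n_points=64):
    """Schematic rendering loop for one target shooting visual angle.

    rays: list of (origin, unit_direction) pairs in world coordinates, one per target pixel;
    model: a callable standing in for the preset neural network model of the target scene,
    mapping sampled point coordinates plus a viewing direction to per-point RGB values.
    """
    t = np.linspace(near, far, n_points)                             # sampling distances along each ray
    pixels = []
    for origin, direction in rays:
        points = origin[None, :] + t[:, None] * direction[None, :]   # target light points on this ray
        point_rgb = model(points, direction)                         # (n_points, 3) light point pixel values
        pixels.append(point_rgb.mean(axis=0))                        # compose into one target pixel value
    return np.asarray(pixels)                                        # (n_rays, 3) pixels of the target image
```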
Based on the implementation manner of the first aspect, in a first possible implementation manner of the present application, the determining at least one to-be-photographed perspective in a target photographing scene includes:
acquiring a video to be processed in the target shooting scene;
framing the video to be processed to obtain a plurality of frames of video images;
determining two adjacent frames of video images of which the difference value of the shooting visual angles is larger than a threshold value in the multiple frames of video images;
interpolating between two shooting visual angles corresponding to the two adjacent frames of video images to obtain the at least one visual angle to be shot;
or, alternatively,
acquiring a video to be processed in the target shooting scene;
framing the video to be processed to obtain a plurality of frames of video images;
and determining the shooting visual angle corresponding to the blurred image in the plurality of frames of video images as the visual angle to be shot.
Based on the foregoing implementation manner of the first aspect, in a second possible implementation manner of the present application, when there are a plurality of to-be-photographed view angles, the image generation method includes:
acquiring a video to be processed in the target shooting scene;
framing the video to be processed to obtain a plurality of frames of video images;
determining two adjacent frames of video images of which the difference value of the shooting visual angles is larger than a threshold value in the multiple frames of video images;
interpolating between two shooting visual angles corresponding to the two adjacent frames of video images to obtain the at least one visual angle to be shot;
and,
acquiring a video to be processed in the target shooting scene;
framing the video to be processed to obtain a plurality of frames of video images;
and determining the shooting visual angle corresponding to the blurred image in the plurality of frames of video images as the visual angle to be shot.
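The view-angle interpolation described in the implementations above can be sketched as follows. The application does not specify an interpolation scheme, so the linear blend of camera positions and the normalized blend of direction vectors used here, as well as the pose representation, are only one plausible choice.

```python
import numpy as np

def interpolate_view_angles(pose_a, pose_b, n_new):
    """Generate n_new intermediate to-be-shot visual angles between two adjacent frames
    whose shooting visual angles differ by more than the threshold.

    A pose is a (position, direction) pair of 3-vectors in the world coordinate system.
    """
    (pos_a, dir_a), (pos_b, dir_b) = pose_a, pose_b
    new_poses = []
    for t in np.linspace(0.0, 1.0, n_new + 2)[1:-1]:       # exclude the two original endpoints
        position = (1.0 - t) * pos_a + t * pos_b           # blend camera positions
        direction = (1.0 - t) * dir_a + t * dir_b
        direction = direction / np.linalg.norm(direction)  # re-normalize the blended direction
        new_poses.append((position, direction))
    return new_poses
```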
Based on the implementation manner of the first aspect, the first possible implementation manner of the present application, and the second possible implementation manner of the present application, in a third possible implementation manner of the present application, the image generation method includes:
and updating the video to be processed by using the target image to obtain a target video.
Based on the implementation manner of the first aspect, in a fourth possible implementation manner of the present application, the image generation method further includes:
acquiring sample data of a plurality of sample images, wherein the sample images are obtained by shooting in the target shooting scene, and the sample data comprises: the method comprises the steps that sample pixel values of sample pixel points in each sample image and sample parameter information of a plurality of sample light points in the light direction of each sample pixel point are obtained;
inputting the sample parameter information in the sample data into a neural network model to be trained to obtain a ray point pixel value of each sample ray point output by the neural network model to be trained;
calculating to obtain a predicted pixel value of each sample pixel point in each sample image according to the light point pixel value of each sample light point output by the neural network model to be trained;
and after adjusting the parameters of the neural network model to be trained based on the difference between the sample pixel value and the predicted pixel value, returning to the step of inputting the sample parameter information in the sample data into the neural network model to be trained and the subsequent steps until the training of the neural network model is completed, and obtaining the preset neural network model.
Based on the foregoing implementation manner of the first aspect, in a fifth possible implementation manner of the present application, the obtaining of the target pixel value of the target pixel point corresponding to each light direction under the target shooting visual angle based on the preset neural network model and the target parameter information of a plurality of target light points in each light direction under the target shooting visual angle includes:
inputting target parameter information of a plurality of target light points in each light direction under the target shooting visual angle into the preset neural network model to obtain light point pixel values of the plurality of target light points in each light direction under the target shooting visual angle;
and calculating the light point pixel values of a plurality of target light points in each light direction under the target shooting visual angle to obtain the target pixel values of the target pixel points corresponding to each light direction under the target shooting visual angle.
Based on the foregoing implementation manner of the first aspect and the foregoing fifth possible implementation manner, in a sixth possible implementation manner of the present application, the calculating, according to the light ray point pixel values of a plurality of target light ray points in each light ray direction at the target shooting view angle, to obtain the target pixel value of the target pixel point corresponding to each light ray direction at the target shooting view angle includes:
acquiring weighting coefficients of a plurality of target light points in each light direction under the target shooting visual angle;
and respectively carrying out weighted summation on the plurality of target light points in each light direction according to the weighting coefficients of the plurality of target light points in each light direction under the target shooting visual angle to obtain a target pixel value of a target pixel point corresponding to each light direction under the target shooting visual angle.
Based on the foregoing implementation manner of the first aspect and the foregoing sixth possible implementation manner, in a seventh possible implementation manner of the present application, the obtaining a weighting coefficient of a plurality of target light points in each light direction under the target shooting viewing angle includes:
and determining the depth values of a plurality of target light points in each light direction under the target shooting visual angle as the weighting coefficients of a plurality of target light points in the corresponding light direction under the target shooting visual angle.
Based on the foregoing implementation manner of the first aspect and the foregoing fifth possible implementation manner, in an eighth possible implementation manner of the present application, according to the light point pixel values of a plurality of target light points in each light direction at the target shooting view angle, the target pixel values of the target pixel points corresponding to each light direction at the target shooting view angle are obtained through calculation, including:
and respectively averaging the light ray point pixel values of a plurality of target light ray points in each light ray direction under the target shooting visual angle to obtain the target pixel value of the target pixel point corresponding to each light ray direction under the target shooting visual angle.
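A minimal sketch of the two composition variants described above, weighted summation and plain averaging; the array shapes and the use of unnormalized depth values as weighting coefficients are assumptions made here for illustration only.

```python
import numpy as np

def compose_pixel_weighted(point_rgb, weights):
    """Weighted summation over the M light point pixel values of one light direction
    (sixth/seventh implementation; the weights may, for example, be the per-point depth values)."""
    w = np.asarray(weights, dtype=float)[:, None]      # (M, 1)
    return (w * np.asarray(point_rgb)).sum(axis=0)     # (3,) target pixel value

def compose_pixel_mean(point_rgb):
    """Plain averaging over the M light point pixel values of one light direction (eighth implementation)."""
    return np.asarray(point_rgb).mean(axis=0)          # (3,) target pixel value
```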
Based on the foregoing implementation manner of the first aspect and any one of the foregoing possible implementation manners, in a ninth possible implementation manner of the present application, a plurality of target ray points in each ray direction under the target shooting angle of view are determined based on the following manners:
determining the starting point and the end point of each beam of light under the target shooting visual angle;
and sampling all light ray points from the starting point to the end point of each light ray under the target shooting visual angle to obtain a plurality of target light ray points in the corresponding light ray direction under the target shooting visual angle.
Based on the foregoing implementation manner of the first aspect and any one of the foregoing possible implementation manners, in a tenth possible implementation manner of the present application, the target parameter information includes world coordinates and a direction vector of each target light point in a world coordinate system; the target parameter information is determined based on:
determining a coordinate transformation matrix of a plurality of target light points in each light direction under the target shooting visual angle according to the target shooting visual angle;
and converting camera coordinates of a plurality of target light points in each light direction under a camera coordinate system into world coordinates under a world coordinate system under the target shooting visual angle according to the coordinate transformation matrix, and converting direction vectors of a plurality of target light points in each light direction under the camera coordinate system into direction vectors under the world coordinate system under the target shooting visual angle.
A second aspect of the embodiments of the present application provides a method for training a neural network model, including:
acquiring sample data of a plurality of sample images; the multiple sample images are obtained by shooting in a target shooting scene; the sample data includes: the method comprises the steps that sample pixel values of sample pixel points in each sample image and sample parameter information of a plurality of sample light points in the light direction of each sample pixel point are obtained;
inputting the sample parameter information in the sample data into a neural network model to be trained to obtain a ray point pixel value of each sample ray point output by the neural network model to be trained;
calculating to obtain a predicted pixel value of each sample pixel point in each sample image according to the light point pixel value of each sample light point output by the neural network model to be trained;
and after adjusting the parameters of the neural network model to be trained based on the difference between the sample pixel value and the predicted pixel value, returning to the step of inputting the sample parameter information in the sample data into the neural network model to be trained and the subsequent steps until the training of the neural network model is completed, and obtaining the preset neural network model.
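A minimal PyTorch-style sketch of this training loop, assuming an MLP that maps the 6-D sample parameter information (world coordinates plus direction vector of one sample light point) to an RGB value, mean composition per ray, an Adam optimizer and an MSE loss; none of these concrete choices (network width, optimizer, loss) are specified by the application.

```python
import torch
from torch import nn

# A small MLP standing in for the neural network model to be trained.
model = nn.Sequential(nn.Linear(6, 256), nn.ReLU(),
                      nn.Linear(256, 256), nn.ReLU(),
                      nn.Linear(256, 3))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

def train_step(sample_points, sample_pixels):
    """sample_points: (n_rays, M, 6) sample parameter information;
    sample_pixels: (n_rays, 3) sample pixel values taken from the sample images."""
    point_rgb = model(sample_points)                     # (n_rays, M, 3) light point pixel values
    predicted_pixels = point_rgb.mean(dim=1)             # compose the M sample light points per ray
    loss = nn.functional.mse_loss(predicted_pixels, sample_pixels)
    optimizer.zero_grad()
    loss.backward()                                      # adjust parameters based on the difference
    optimizer.step()
    return loss.item()
```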
Based on the implementation manner of the second aspect, in an eleventh possible implementation manner of the present application, the calculating of the predicted pixel value of each sample pixel point in each sample image according to the light point pixel value of each sample light point output by the to-be-trained neural network model includes:
acquiring a weighting coefficient of each sample light ray point;
and according to the weighting coefficient of each sample light ray point, respectively carrying out weighted summation on a plurality of sample light ray points in each light ray direction in each sample image to obtain a predicted pixel value of a corresponding sample pixel point in the corresponding sample image.
Based on the foregoing implementation manner of the second aspect and the foregoing eleventh possible implementation manner, in a twelfth possible implementation manner of the present application, the obtaining of the weighting coefficient of each sample light ray point includes:
the depth value of each sample ray point is determined as the weighting coefficient of the corresponding sample ray point.
Based on the implementation manner of the second aspect, in a thirteenth possible implementation manner of the present application, the calculating, according to the light ray point pixel value of each sample light ray point output by the neural network model to be trained, to obtain a predicted pixel value of each sample pixel point in each sample image includes:
and respectively averaging the light ray point pixel values of a plurality of sample light ray points in the light ray direction of each sample pixel point in each sample image to obtain the predicted pixel value of each sample pixel in each sample image.
Based on the implementation manner of the second aspect, in a fourteenth possible implementation manner of the present application, each sample ray point is determined based on the following manner:
determining the starting point and the end point of the light corresponding to each sample pixel point in each sample image;
sampling light points from the starting point to the end point of the light corresponding to each sample pixel in each sample image to obtain a plurality of sample light points in the light direction of each sample pixel point in each sample image.
Based on the implementation manner of the second aspect, in a fifteenth possible implementation manner of the present application, the plurality of sample images are determined based on the following manner:
acquiring a video to be processed, which is obtained by shooting a scene to be shot;
framing the video to be processed to obtain a plurality of frames of video images;
and screening the plurality of sample images from the plurality of frames of video images.
Based on the foregoing implementation manner of the second aspect, in a sixteenth possible implementation manner of the present application, the screening the multiple sample images from the multiple frames of video images includes:
determining an image with blur in the plurality of frames of video images as a non-selectable image;
determining video images except the non-selectable image in the plurality of frames of video images as selectable images;
screening the plurality of sample images from the selectable images.
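One possible way to carry out this screening is sketched below. The application does not say how blur is detected, so the variance-of-Laplacian criterion, its threshold and the frame subsampling step are illustrative assumptions.

```python
import cv2

def screen_sample_frames(video_path, blur_threshold=100.0, step=10):
    """Frame the to-be-processed video and keep only non-blurred frames as selectable images."""
    selectable = []
    cap = cv2.VideoCapture(video_path)
    index = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if index % step == 0:                                  # subsample the video frames
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
            sharpness = cv2.Laplacian(gray, cv2.CV_64F).var()  # low variance suggests blur
            if sharpness >= blur_threshold:
                selectable.append(frame)                       # selectable image
        index += 1
    cap.release()
    return selectable
```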
A third aspect of the embodiments of the present application provides an image generating apparatus, including:
the device comprises a determining unit, a judging unit and a judging unit, wherein the determining unit is used for determining at least one to-be-shot visual angle in a target shooting scene;
the acquisition unit is used for acquiring a preset neural network model corresponding to the target shooting scene; the calculation unit is used for obtaining a target pixel value of a target pixel point corresponding to each light direction under a target shooting visual angle based on the preset neural network model and target parameter information of a plurality of target light points in each light direction under the target shooting visual angle, wherein the target shooting visual angle is any one shooting visual angle in the at least one to-be-shot visual angle;
and the output unit is used for outputting a target image formed under the target shooting visual angle based on the target pixel values of the target pixel points corresponding to all the light directions under the target shooting visual angle.
A fourth aspect of embodiments of the present application provides a chip, which includes a processor, and the processor is configured to read and execute a computer program stored in a memory to implement the steps of the method according to the first aspect and/or the second aspect.
A fifth aspect of embodiments of the present application provides a terminal, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the steps of the method according to the first aspect and/or the second aspect when executing the computer program.
A sixth aspect of embodiments of the present application provides a computer-readable storage medium, which stores a computer program that, when executed by a processor, implements the steps of the method according to the first aspect and/or the second aspect.
In the embodiment of the application, on one hand, by acquiring the preset neural network model corresponding to the target shooting scene, when a photo is taken in the target shooting scene, a new photo can be generated by the preset neural network model directly from the shooting visual angle of the photo to be taken, so that the problem of a blurred photo is solved and the shooting quality of photos is improved; on the other hand, by acquiring the preset neural network model corresponding to the target shooting scene, when a video is shot in the target shooting scene and the video shakes, a new video frame image can be generated by the preset neural network model directly from the shooting visual angle of the affected video frame, so that the shaking of the original video is removed while the content and resolution of the original video are preserved, and the shooting quality of the video is improved.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the embodiments will be briefly described below. It should be understood that the following drawings only illustrate some embodiments of the present application and therefore should not be considered as limiting in scope, and that a person skilled in the art can obtain other relevant drawings from these drawings without inventive effort.
Fig. 1 is a schematic structural diagram of a terminal provided in an embodiment of the present application;
FIG. 2a is a first schematic diagram of a photograph with blur provided by an embodiment of the present application;
FIG. 2b is a schematic diagram of an adjacent video frame image with jitter according to an embodiment of the present disclosure;
FIG. 2c is a second schematic diagram of a photograph with blur provided by an embodiment of the present application;
FIG. 2d is a schematic diagram of an adjacent video frame image with blur according to an embodiment of the present application;
FIG. 3 is a schematic flow chart illustrating an implementation of an image generation method provided in an embodiment of the present application;
fig. 4 is a schematic view of a to-be-photographed viewing angle provided by an embodiment of the present application;
FIG. 5 is a schematic diagram of ray point sampling under a target shooting view angle according to an embodiment of the present disclosure;
fig. 6 is a schematic flow chart illustrating an implementation process of determining target parameter information according to an embodiment of the present application;
fig. 7 is a schematic flow chart illustrating an implementation process of determining a target pixel value of a target pixel point according to an embodiment of the present application;
FIG. 8 is a schematic flow chart illustrating an implementation of a training process of a neural network model provided in an embodiment of the present application;
FIG. 9 is a flowchart illustrating an implementation of step 802 in a training process of a neural network model provided in an embodiment of the present application;
fig. 10 is a schematic flowchart of a first specific implementation of step 301 of an image generation method according to an embodiment of the present application;
FIG. 11a is a schematic diagram illustrating an effect of eliminating jitter between adjacent video frame images according to an embodiment of the present application;
FIG. 11b is a schematic diagram illustrating an effect of eliminating a blurred video frame image in a video to be processed according to an embodiment of the present application;
FIG. 11c is a schematic diagram illustrating an effect of eliminating blur occurring in an image to be processed according to an embodiment of the present application;
fig. 12 is a schematic flowchart of a second specific implementation of step 301 of the image generation method according to the embodiment of the present application;
fig. 13 is a schematic structural diagram of an image generating apparatus according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions, and advantages of the present application more apparent, the present application will be described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
"and/or" herein is merely an association describing an associated object, and means that there may be three relationships, e.g., a and/or B, which may mean: a exists alone, A and B exist simultaneously, and B exists alone. In addition, in the description of the embodiments of the present application, "a plurality" means two or more, and "at least one", "one or more" means one, two or more, unless otherwise specified.
Reference throughout this specification to "one embodiment" or "some embodiments," or the like, means that a particular feature, structure, or characteristic described in connection with the embodiment is included in one or more embodiments of the present application. Thus, appearances of the phrases "in one embodiment," "in some embodiments," "in other embodiments," or the like, in various places throughout this specification are not necessarily all referring to the same embodiment, but rather "one or more but not all embodiments" unless specifically stated otherwise. The terms "comprising," "including," "having," and variations thereof mean "including, but not limited to," unless expressly specified otherwise.
The method and the device of the embodiment of the present application may be applied to various terminals, for example, a mobile phone, a tablet computer, a wearable device, an in-vehicle device, an Augmented Reality (AR)/Virtual Reality (VR) device, a notebook computer, an ultra-mobile personal computer (UMPC), a netbook, a Personal Digital Assistant (PDA), and other terminals, and the embodiment of the present application does not limit the specific type of the terminal.
Take the terminal as a mobile phone as an example. Fig. 1 is a block diagram illustrating a partial structure of a mobile phone according to an embodiment of the present disclosure. Referring to fig. 1, the cellular phone includes: a Radio Frequency (RF) circuit 110, a memory 120, an input unit 130, a display unit 140, a sensor 150, an audio circuit 160, a wireless fidelity (WiFi) module 170, a processor 180, and a power supply 190. Those skilled in the art will appreciate that the handset configuration shown in fig. 1 is not intended to be limiting and may include more or fewer components than those shown, or some components may be combined, or a different arrangement of components.
The following describes each component of the mobile phone in detail with reference to fig. 1:
the RF circuit 110 may be used for receiving and transmitting signals during information transmission and reception or during a call, and in particular, for receiving downlink information of a base station and then processing the received downlink information to the processor 180.
The memory 120 may be used to store software programs and modules, and the processor 180 executes various functional applications and data processing of the mobile phone by operating the software programs and modules stored in the memory 120. The memory 120 may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function (such as a sound playing function, an image playing function, etc.), and the like. Further, the memory 120 may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other volatile solid state storage device.
The input unit 130 may be used to receive input numeric or character information and generate key signal inputs related to user settings and function control of the cellular phone 100. Specifically, the input unit 130 may include a touch panel 131 and other input devices 132. The touch panel 131, also referred to as a touch screen, may collect touch operations of a user on or near the touch panel 131 (e.g., operations of the user on or near the touch panel 131 using any suitable object or accessory such as a finger or a stylus pen), and drive the corresponding connection device according to a preset program. Alternatively, the touch panel 131 may include two parts, i.e., a touch detection device and a touch controller.
The display unit 140 may be used to display information input by a user or information provided to the user and various menus of the mobile phone. The Display unit 140 may include a Display panel 141, and optionally, the Display panel 141 may be configured in the form of a Liquid Crystal Display (LCD), an Organic Light-Emitting Diode (OLED), or the like. Further, the touch panel 131 can cover the display panel 141, and when the touch panel 131 detects a touch operation on or near the touch panel 131, the touch operation is transmitted to the processor 180 to determine the type of the touch event, and then the processor 180 provides a corresponding visual output on the display panel 141 according to the type of the touch event. Although the touch panel 131 and the display panel 141 are shown as two separate components in fig. 1 to implement the input and output functions of the mobile phone, in some embodiments, the touch panel 131 and the display panel 141 may be integrated to implement the input and output functions of the mobile phone.
The handset 100 may also include at least one sensor 150, such as a light sensor, motion sensor, and other sensors. Specifically, the light sensor may include an ambient light sensor that adjusts the brightness of the display panel 141 according to the brightness of ambient light, and a proximity sensor that turns off the display panel 141 and/or the backlight when the mobile phone is moved to the ear. As one of the motion sensors, the accelerometer sensor can detect the magnitude of acceleration in each direction (generally, three axes), can detect the magnitude and direction of gravity when stationary, and can be used for applications of recognizing the posture of a mobile phone (such as horizontal and vertical screen switching, related games, magnetometer posture calibration), vibration recognition related functions (such as pedometer and tapping), and the like; as for other sensors such as a gyroscope, a barometer, a hygrometer, a thermometer, and an infrared sensor, which can be configured on the mobile phone, further description is omitted here.
Audio circuitry 160, speaker 161, and microphone 162 may provide an audio interface between the user and the handset. The audio circuit 160 may transmit the electrical signal converted from the received audio data to the speaker 161, and convert the electrical signal into a sound signal for output by the speaker 161; on the other hand, the microphone 162 converts the collected sound signal into an electrical signal, which is received by the audio circuit 160 and converted into audio data, which is then processed by the audio data output processor 180 and then transmitted to, for example, another cellular phone via the RF circuit 110, or the audio data is output to the memory 120 for further processing.
WiFi belongs to a short-distance wireless transmission method, and the mobile phone can help a user to receive and send e-mails, browse webpages, access streaming media and the like through the WiFi module 170, and provides wireless broadband Internet access for the user. Although fig. 1 shows the WiFi module 170, it is understood that it does not belong to the essential constitution of the handset 100, and can be omitted entirely as needed within the scope not changing the essence of the invention.
The processor 180 is a control center of the mobile phone, connects various parts of the entire mobile phone by using various interfaces and lines, and performs various functions of the mobile phone and processes data by operating or executing software programs and/or modules stored in the memory 120 and calling data stored in the memory 120, thereby integrally monitoring the mobile phone. Alternatively, processor 180 may include one or more processing units; preferably, the processor 180 may integrate an application processor, which mainly handles operating systems, user interfaces, application programs, etc., and a modem processor, which mainly handles wireless communications. It will be appreciated that the modem processor described above may not be integrated into the processor 180.
The handset 100 also includes a power supply 190 (e.g., a battery) for providing power to various components, which may be logically coupled to the processor 180 via a power management system, so as to manage charging, discharging, and power consumption via the power management system.
Although not shown, the handset 100 may also include a camera. Optionally, the position of the camera on the mobile phone 100 may be front-located or rear-located, which is not limited in this embodiment of the application.
Optionally, the mobile phone 100 may include a single camera, a dual camera, or a triple camera, which is not limited in this embodiment.
For example, the cell phone 100 may include three cameras, one being a main camera, one being a wide camera, and one being a tele camera.
Optionally, when the mobile phone 100 includes a plurality of cameras, the plurality of cameras may be all front-mounted, all rear-mounted, or a part of the cameras front-mounted and another part of the cameras rear-mounted, which is not limited in this embodiment of the present application.
In addition, although not shown, the mobile phone 100 may further include a bluetooth module or the like, which is not described herein.
It is to be understood that the illustrated structure of the embodiments of the present application does not constitute a specific limitation to the terminal. In other embodiments of the present application, an apparatus may include more or fewer components than illustrated, or some components may be combined, some components may be separated, or a different arrangement of components may be provided. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
The following embodiments can be implemented on the cellular phone 100 having the above-described structure. The following embodiment will describe an image generation method provided in the embodiment of the present application, taking the mobile phone 100 as an example.
In actual life, when a user uses such a terminal to take a photo or shoot a video, the photo is often blurred, or the video picture shakes, due to various factors, so that the photo or video shooting quality is reduced.
For example, some users cannot keep their hands steady because of a physiological condition, or are inevitably jostled by others because of heavy pedestrian traffic in the shooting scene, or are riding in a vehicle travelling on an uneven road, so that the shot photos show motion blur or the shot video shows picture jitter.
Illustratively, as shown in fig. 2a, during the process of taking a picture a by using the terminal, the taken picture a is blurred due to the shaking of the hand.
Illustratively, as shown in fig. 2b, during video shooting the terminal is shaken upward and to the left from the position at which video frame image b1 was shot, producing video frame image b2, and then returns to the position of video frame image b1, producing video frame image b3. The picture consistency between video frame images b1 and b2, and between b2 and b3, is therefore poor, and picture shaking occurs.
For another example, because of weather in the scene the user wants to shoot, such as rain or heavy fog, objects in the scene cannot be captured clearly, so the shot photo or video picture has poor definition and cannot meet the user's requirements.
For example, as shown in fig. 2c, due to the fact that rainfall exists in the shooting scene, the picture clarity of the photo c shot by the terminal is poor.
For example, as shown in fig. 2d, due to rainfall in the shooting scene, the three consecutive video frame images shot by the terminal, namely video frame images d1, d2 and d3, all have poor picture sharpness.
For scenes in which the shot video shakes, a sensor-based video anti-shake method, an electronic anti-shake method, an image-tracking-based video anti-shake method or an image-stitching-based anti-shake method can be adopted to realize video anti-shake.
The video anti-shake method based on the sensor is characterized in that micro movement is measured by using a sensor such as a gyroscope in a lens, and then displacement compensation is carried out based on a measurement result, so that the stability of a light path is realized, and the problems of picture discomfort and the like caused by shaking are avoided. The anti-shake method has high requirements on the accuracy of the sensor, so that the anti-shake cost is high.
The electronic anti-shake method is characterized in that 90% of photosensitive pixels are used for shooting, and the other 10% of photosensitive pixels are used for interpolation compensation of certain pixel points in an image, so that the image is prevented from shaking. This anti-shake method increases the noise of the captured image because 10% of the photosensitive pixels need to be reserved for pixel interpolation, and the color reproduction of the image is poor.
The tracking-based anti-shake method marks a target object in an initial frame to obtain its initial coordinates, tracks the target object through the shot video sequence, captures the image coordinates of the target object in each frame, judges whether the tracking result is usable based on a preset rule, and finally keeps the tracking results to obtain a stable video. Because this anti-shake method needs the initial target frame to be calibrated manually, and cannot achieve video anti-shake when there are many target objects or the target object is absent from intermediate frames, it requires every frame of the input video to contain the same non-moving target object. Moreover, the method may crop the image, resulting in a low-resolution output video, an effect of enlarging a specific target area, and poor visual perception.
The image-stitching-based anti-shake method stitches images of the shooting scene taken from different shooting visual angles, fuses the images of different visual angles, tracks corresponding feature points between frames using information such as optical flow, and finally achieves video anti-shake based on models such as mesh division. This method needs to compute a large amount of optical flow information, so it places high requirements on the optical flow extraction algorithm, and the mesh-division-based algorithm needs to estimate the depth of the video foreground and background and is only suitable for simple scenes.
Based on this, the embodiment of the application provides an image generation method, an image generation device, a chip, a terminal and a storage medium, which can be more beneficial to improving the shooting quality of photos or videos.
Specifically, the application provides an image generation method, an image generation device, a chip, a terminal and a storage medium. By acquiring a preset neural network model corresponding to a target shooting scene, when a photo or a video is shot in the target shooting scene, a new photo or a new video frame can be generated by the preset neural network model directly from the required shooting visual angle of the photo (the to-be-shot visual angle), or from the shooting visual angle of the video frame affected by jitter, so that the problems of blurred photos and video picture jitter are solved and the shooting quality of photos and videos is improved.
For better understanding of the image generation method provided by the embodiments of the present application, specific implementation procedures are described below in an implementation level with reference to the accompanying drawings.
Exemplarily, fig. 3 shows a schematic flowchart of an image generation method provided in an embodiment of the present application. The image generation method can be executed by the image generation device configured by the terminal, and specifically includes the following steps 301 to 304.
Step 301, determining at least one to-be-photographed view angle in a target photographing scene.
In this embodiment of the application, the target shooting scene refers to a shooting scene corresponding to a picture or a video that needs to be obtained by a user.
For example, if the photo that the user needs to obtain is a photo obtained by shooting a scene (a stereoscopic space including an object to be shot) as shown in fig. 4, or the video that the user needs to obtain is a video obtained by shooting a scene as shown in fig. 4, the target shooting scene is the scene as shown in fig. 4.
The target shooting scene can contain an infinite number of shooting visual angles, namely an infinite number of camera shooting positions. The angle of view to be photographed may be some or a certain angle of view of the infinite number of angles of view.
In some embodiments of the present application, the above-mentioned angle of view to be photographed may be represented by coordinates and a direction vector in a world coordinate system.
For example, as shown in fig. 4, if the user wants to locate the camera at position a, the camera can be described by its coordinates (x, y, z) in the world coordinate system corresponding to position a together with a direction vector d. The pair of world-coordinate-system coordinates and direction vector, (x, y, z) and d, can therefore be used to indicate the angle of view to be photographed.
That is to say, in the embodiment of the present application, the at least one to-be-photographed visual angle may be the shooting visual angle corresponding to any coordinate and any direction vector in the world coordinate system input by the user, for example the shooting visual angle corresponding to the coordinates (x, y, z) and the direction vector d input by the user.
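For concreteness, such a visual angle could be represented by a small data structure like the following sketch; the class name and the normalization step are illustrative assumptions, not part of the application.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class ViewAngle:
    """One (to-be-shot) visual angle: camera position and viewing direction in world coordinates."""
    position: np.ndarray    # (x, y, z)
    direction: np.ndarray   # direction vector d, normalized below

    def __post_init__(self):
        self.position = np.asarray(self.position, dtype=float)
        self.direction = np.asarray(self.direction, dtype=float)
        self.direction /= np.linalg.norm(self.direction)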
Step 302, obtaining a preset neural network model corresponding to a target shooting scene.
In the embodiment of the application, the preset neural network model is obtained by training sample data based on a plurality of sample images.
The plurality of sample images are images obtained by shooting in the target shooting scene, and shooting visual angles corresponding to the sample images are different.
That is, the plurality of sample images may be images obtained by photographing the target photographing scene from different photographing perspectives.
In this embodiment of the application, the sample data may include: the sample pixel value of each sample pixel point in each sample image, and the sample parameter information of a plurality of sample light points in the light direction of each sample pixel point.
Specifically, in the using process of the preset neural network model, a target pixel value of a target pixel point corresponding to each light direction under a target shooting visual angle can be generated according to any one to-be-shot visual angle under the target shooting scene.
Therefore, in the training process of the preset neural network model, a plurality of sample images obtained by shooting the target shooting scene from different shooting angles in the target shooting scene need to be used for training, so that the whole target shooting scene can be constructed, and then the target pixel values of the target pixel points corresponding to all light directions in any one to-be-shot view angle of the target shooting scene can be generated.
Step 303, obtaining target pixel values of target pixel points corresponding to each light direction at the target shooting view angle based on the preset neural network model and the target parameter information of a plurality of target light points at each light direction at the target shooting view angle.
In the embodiment of the present application, since there may be a plurality of to-be-photographed visual angles, and each to-be-photographed visual angle has its own corresponding light directions, for convenience of description a target shooting visual angle is introduced to represent any one of the at least one to-be-photographed visual angle.
It can be understood that when the user inputs a plurality of to-be-photographed visual angles or the terminal acquires a plurality of to-be-photographed visual angles, the terminal needs to calculate the target image of each to-be-photographed visual angle.
In this embodiment of the application, under the above-mentioned target shooting visual angle, the target parameter information of a plurality of target light points in each light direction may include: and under the target shooting visual angle, coordinates and direction vectors of a plurality of target light points in each light direction under a world coordinate system.
Specifically, under the target shooting visual angle, the determination of a plurality of target light points in each light direction can be realized in the following manner: the method comprises the steps of firstly determining the starting point and the end point of each beam of light rays under a target shooting visual angle, and then sampling all light ray points from the starting point to the end point of each beam of light rays under the target shooting visual angle.
As shown in fig. 5, suppose the coordinates in the world coordinate system corresponding to the target shooting visual angle are (x, y, z) and the direction vector is d, and the field-of-view range of the target shooting visual angle is the angle shown in fig. 5. Within this angle, the camera at the target shooting visual angle receives the light used to generate each pixel. Based on the reversibility of the optical path, it can therefore be assumed that a plurality of light rays are emitted outward from the optical center (x, y, z) of the camera at the target shooting visual angle, and that each light ray corresponds one-to-one to a pixel point in the image shot by the camera.
For example, as shown in fig. 5, ray k is one of the rays at the target shooting visual angle; the starting point of the ray is the position of the optical center A of the camera at the target shooting visual angle, and the end point of the ray is the position B on the ray whose distance from the optical center is a preset distance. Uniformly sampling the line segment AB yields M ray points, and these M ray points correspond to the pixel point (u, v) in row u and column v of the image obtained by the camera.
Based on the above assumption, the coordinates of the M target light points in each of the N light directions at the target shooting visual angle can be obtained in the camera coordinate system, that is, the camera coordinate system whose origin is the coordinate (x, y, z) in the world coordinate system corresponding to the target shooting visual angle and whose Z axis points along the direction vector d. Here M is an integer larger than 1, and N is the number of target pixel points in the target image. In other words, the coordinates of the plurality of target light points in each light direction under the camera coordinate system are obtained at the target shooting visual angle.
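The uniform sampling of the M target light points on segment AB described above can be sketched as follows; the function name and argument layout are assumptions made for illustration.

```python
import numpy as np

def sample_ray_points(optical_center, direction, preset_distance, m_points):
    """Uniformly sample M target light points on segment AB of one ray (fig. 5):
    A is the optical center of the camera at the target shooting visual angle,
    B lies on the ray at the preset distance from A."""
    optical_center = np.asarray(optical_center, dtype=float)
    direction = np.asarray(direction, dtype=float)
    direction = direction / np.linalg.norm(direction)
    end_point = optical_center + preset_distance * direction       # point B
    t = np.linspace(0.0, 1.0, m_points)[:, None]
    return (1.0 - t) * optical_center + t * end_point              # (M, 3) sampled points
```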
In the embodiment of the application, after obtaining the coordinates of a plurality of target light points in each light direction in the camera coordinate system under the target shooting visual angle, the target parameter information of a plurality of target light points in each light direction under the target shooting visual angle needs to be determined, that is, the world coordinates and the direction vectors of a plurality of target light points in each light direction in the world coordinate system under the target shooting visual angle.
Specifically, as shown in fig. 6, the determination of the target parameter information may be implemented in the manner of step 601 to step 602.
Step 601, determining a coordinate transformation matrix of a plurality of target light points in each light direction under the target shooting visual angle according to the target shooting visual angle.
Step 602, converting camera coordinates of a plurality of target light points in each light direction in the camera coordinate system into world coordinates in the world coordinate system under the target shooting view angle according to the coordinate transformation matrix, and converting direction vectors of a plurality of target light points in each light direction in the camera coordinate system into direction vectors in the world coordinate system under the target shooting view angle.
In this embodiment of the present application, the coordinate transformation matrix is a transformation matrix for transforming coordinates of a pixel point in an image captured by a camera in a camera coordinate system into coordinates in a world coordinate system.
In general, the relationship between the coordinates of a point in the camera coordinate system and in the world coordinate system may be expressed in homogeneous form as:
[X_C, Y_C, Z_C, 1]^T = [R, T; 0, 1] [X_W, Y_W, Z_W, 1]^T
wherein (X_C, Y_C, Z_C) represents the coordinates of the point in the camera coordinate system, (X_W, Y_W, Z_W) represents the coordinates of the same point in the world coordinate system, R is a rotation parameter and T is a translation parameter, and R and T can be determined based on the coordinates (x, y, z) and the direction vector d in the world coordinate system corresponding to the target shooting visual angle.
The coordinates of a pixel point of the image under the camera coordinate system can in turn be obtained from the coordinates of that pixel point under the pixel coordinate system through the internal reference (intrinsic) matrix of the camera.
Generally, the coordinates of a pixel point in the pixel coordinate system are expressed as (u, v), indicating that the pixel point is located in row u and column v of the image.
Therefore, once the coordinates (x, y, z) and the direction vector d in the world coordinate system corresponding to the target shooting visual angle, together with the internal reference of the camera, are known, determining the world coordinates and direction vectors of the plurality of target light points in each light direction under the world coordinate system at the target shooting visual angle only requires mathematical calculation, that is, they can be obtained through coordinate transformation.
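Assuming the rotation parameter R and the translation parameter T have already been derived from the target shooting visual angle, the conversion of step 602 can be sketched as follows; the function signature is illustrative only.

```python
import numpy as np

def camera_to_world(points_cam, directions_cam, rotation, translation):
    """Coordinate transformation of step 602: rotation (3x3) and translation (3,) come from the
    target shooting visual angle; point coordinates transform affinely, direction vectors only rotate."""
    points_world = points_cam @ rotation.T + translation    # (M, 3) world coordinates
    directions_world = directions_cam @ rotation.T          # (M, 3) direction vectors in the world frame
    return points_world, directions_world
```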
In practical application, after the terminal obtains the preset neural network model by executing step 302 and obtains the target parameter information of the plurality of target light points in each light direction at the target shooting visual angle in the manners shown in fig. 5 and fig. 6, step 303 (obtaining the target pixel value of the target pixel point corresponding to each light direction at the target shooting visual angle based on the preset neural network model and the target parameter information) can, in one embodiment and as shown in fig. 7, be implemented by the following steps 701 to 702.
Step 701, inputting coordinates and direction vectors of a plurality of target light points in each light direction in a world coordinate system under a target shooting visual angle into a preset neural network model to obtain light point pixel values of the plurality of target light points in each light direction under the target shooting visual angle.
It should be noted that, in the specific implementation of step 701, the world-coordinate-system coordinates and direction vectors of the plurality of target light points may be input into the preset neural network model one light direction at a time, obtaining the light point pixel values of the target light points in that light direction, or the coordinates and direction vectors of the target light points in all N light directions at the target shooting visual angle may be input together, obtaining the light point pixel values of all the target light points at once.
Step 702, calculating, according to the light point pixel values of the plurality of target light points in each light direction at the target shooting visual angle, the target pixel value of the target pixel point corresponding to each light direction at the target shooting visual angle.
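A minimal sketch of steps 701 to 702 is given below; it assumes (this is not specified in the patent) that the preset neural network model is a PyTorch module mapping a 6-dimensional input (world coordinates plus direction vector) to an RGB value, and uses simple averaging as the aggregation in step 702.

```python
import torch

def render_pixels(model, points_world, dirs_world):
    """points_world: (num_rays, num_points, 3) world coordinates of the target light points.
       dirs_world  : (num_rays, num_points, 3) direction vectors of the same points.
       Returns one RGB value per ray, i.e. per target pixel point."""
    num_rays, num_points, _ = points_world.shape
    inputs = torch.cat([points_world, dirs_world], dim=-1).reshape(-1, 6)
    with torch.no_grad():
        point_rgb = model(inputs).reshape(num_rays, num_points, 3)  # step 701
    pixel_rgb = point_rgb.mean(dim=1)                               # step 702 (averaging variant)
    return pixel_rgb
```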
To better understand why inputting the coordinates and direction vectors in the world coordinate system of the plurality of target light points in each light direction at the target shooting visual angle into the preset neural network model yields the light point pixel values of those target light points output by the model, the training process of the preset neural network model is described in detail below.
Fig. 8 is a schematic flow chart of a training process of a neural network model provided in an embodiment of the present application. The process specifically includes the following steps 801 to 803.
Step 801, inputting sample parameter information in the sample data into the neural network model to be trained, and obtaining a ray point pixel value of each sample ray point output by the neural network model to be trained.
Based on the above description of step 302, the preset neural network model in the embodiment of the present application is obtained by training on sample data of a plurality of sample images captured in the target shooting scene from different shooting visual angles. Specifically, the sample data may include: the sample pixel value of each sample pixel point in each sample image, and the sample parameter information of a plurality of sample light points in the light direction of each sample pixel point.
In the embodiment of the present application, the sample pixel value of each sample pixel point in each sample image may be an RGB value of each sample pixel point, or a pixel value in a YUV format or a CMYK format, which is not limited in the present application.
For convenience of description, the sample pixel value of each sample pixel point in each sample image is taken as the RGB value of each sample pixel point in each sample image as an example.
It should be noted that, in this embodiment of the application, the sample parameter information of the plurality of sample light points in the light direction of each sample pixel point in each sample image may include: the coordinates and direction vectors, in a world coordinate system, of the plurality of sample light points in the light direction of each sample pixel point in each sample image.
In the embodiment of the application, since each sample image is captured in the target shooting scene, the world coordinate system corresponding to the plurality of sample light points in the light direction of each sample pixel point in each sample image is the same as the world coordinate system corresponding to the plurality of target light points in each light direction at the target shooting visual angle.
In one embodiment, the plurality of sample images are input into the three-dimensional reconstruction tool COLMAP, so that the shooting visual angle of each sample image can be obtained.
COLMAP is a three-dimensional reconstruction tool based on MVS (Multi-View Stereo matching) and SFM (Structure-from-Motion); it recovers the camera pose, and hence the shooting visual angle, corresponding to each input sample image.
After the shooting visual angles of the sample images are obtained, the coordinates and direction vectors in the world coordinate system of the plurality of sample light points in the light direction of each sample pixel point in each sample image can be obtained by using the same methods shown in fig. 5 to fig. 6 that determine the target parameter information of the target light points in each light direction at the target shooting visual angle.
It should be noted that, after the shooting view angle of each sample image is obtained, when determining a plurality of sample light points in the light direction of each sample pixel point in each sample image, it is necessary to determine the starting point near and the ending point far of the light corresponding to each sample pixel in each sample image.
Specifically, since the sample light points need to include points on the objects in the target shooting scene, the starting point near of a ray may be any point between the position of the optical center at the shooting visual angle corresponding to the sample image (that is, the origin of the camera coordinate system corresponding to the sample image) and the objects in the target shooting scene. The ending point far of the ray may be determined according to the depth of field of the target shooting scene: for the sample light points to include points on the objects, far generally needs to be greater than the distance from those points to the optical center. The specific values of the starting point near and the ending point far may be determined according to the actual application scene.
For example, for a target shooting scene of 5 m × 10 m, the starting point near of the ray corresponding to each sample pixel point may be set to the point 0.1 m away from the optical center along the ray, and the ending point far may be set to the point 10 m away from the optical center along the ray.
In addition, after the start point near and the end point far of the ray are determined in the model training process, the start point and the end point can be used continuously in the process of using the model without resetting.
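A minimal sketch of sampling light ray points between near and far is shown below; the default values reuse the 0.1 m / 10 m example above, and the uniform sampling scheme and the number of points are illustrative assumptions, since the patent does not fix them.

```python
import numpy as np

def sample_ray_points(origin, direction, near=0.1, far=10.0, num_points=64):
    """Sample candidate light ray points between the starting point `near`
    and the ending point `far` along one ray (distances from the optical center)."""
    t = np.linspace(near, far, num_points)                   # depths along the ray
    points = origin[None, :] + t[:, None] * direction[None, :]
    dirs = np.repeat(direction[None, :], num_points, axis=0) # same direction for every point
    return points, dirs, t
```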
In the embodiment of the application, a neural network model to be trained can be constructed in advance, which takes as input data the sample parameter information, namely the coordinates and direction vectors in the world coordinate system of the plurality of sample light points in the light direction of each sample pixel point in each sample image, and takes as output data the light point pixel values of those sample light points. In the model training process, the light point pixel value of each sample light point output by the neural network model to be trained can then be obtained by inputting the sample parameter information into the neural network model to be trained.
It should be noted that, in step 801, inputting the sample parameter information in the sample data into the neural network model to be trained to obtain the light point pixel value of each sample light point means that the coordinates and direction vectors in the world coordinate system of the plurality of sample light points in the light direction of each sample pixel point in each sample image are input into the neural network model to be trained together, so as to meet the data amount required for training. In this way, when the trained preset neural network model is used, the light point pixel values of the plurality of light points on any ray at any to-be-shot visual angle in the target shooting scene can be generated from the parameter information of those light points.
It should be noted that, when the data amount is sufficient, before the coordinates and direction vectors in the world coordinate system of the plurality of sample light points in the light direction of each sample pixel point in each sample image are input into the neural network model to be trained, the sample images and the sample pixel points they contain may first be sampled, so that only the coordinates and direction vectors of the sample light points corresponding to part of the sample pixel points of part of the sample images are input into the neural network model to be trained for training.
And step 802, calculating to obtain a predicted pixel value of a corresponding sample pixel point in a corresponding sample image according to the light point pixel value of each sample light point output by the neural network model to be trained.
Specifically, each ray of each sample image uniquely corresponds to one sample pixel point in that sample image, and each ray includes a plurality of sample light points. Therefore, the predicted pixel value of each sample pixel point in each sample image can be calculated based on the light point pixel values, output in step 801, of the plurality of sample light points included in the corresponding ray.
That is, in step 802, calculating the predicted pixel value of the corresponding sample pixel point in the corresponding sample image according to the light point pixel value of each sample light point output by the neural network model to be trained means that the predicted pixel value of the sample pixel point corresponding to a certain ray is calculated from the light point pixel values of the plurality of light points included in that ray.
Specifically, in step 802, the predicted pixel value of the corresponding sample pixel point in the corresponding sample image may be obtained by averaging the light point pixel values of the plurality of sample light points in the light direction of each sample pixel point in each sample image.
Alternatively, as shown in fig. 9, the step 802 can be implemented by using the following steps 8021 to 8022.
Step 8021, obtaining a weighting coefficient of each sample light ray point;
step 8022, according to the weighting coefficient of each sample light ray point, respectively performing weighted summation on a plurality of sample light ray points in the light ray direction in each sample image, so as to obtain a predicted pixel value of a corresponding sample pixel point in the corresponding sample image.
That is, when performing weighted summation over the plurality of sample light points in a certain light direction in a certain sample image, only the light point pixel values of the sample light points on that ray, weighted by their corresponding coefficients, need to be summed.
Optionally, in an embodiment, in step 8021, obtaining the weighting factor of each sample ray point may include: and obtaining the depth value of each sample light ray point, and determining the depth value of each sample light ray point as the weighting coefficient corresponding to each sample light ray point.
In an embodiment, in the process of obtaining the depth value of each sample light ray point, the depth value of each sample light ray point output by the neural network model to be trained may be obtained.
That is, the depth value of each sample ray point may be a parameter output by the neural network model to be trained.
In one embodiment, the depth value of each sample light point may be obtained by calculating, from the coordinates of the sample light point in the camera coordinate system, its distance to the origin of the camera coordinate system.
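A minimal sketch of steps 8021 to 8022 with depth values as weighting coefficients follows; normalizing the weights so they sum to 1 is an added assumption for numerical convenience, not something stated in the patent.

```python
import torch

def weighted_pixel_value(point_rgb, depths):
    """point_rgb: (num_points, 3) light point pixel values of the sample light points on one ray.
       depths   : (num_points,) depth value of each sample light point,
                  used directly as the weighting coefficients (step 8021)."""
    weights = depths / depths.sum()                   # normalization added for illustration
    return (weights[:, None] * point_rgb).sum(dim=0)  # weighted summation (step 8022)
```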
And 803, after adjusting the parameters of the neural network model to be trained based on the difference between the sample pixel value and the predicted pixel value, returning to the step of inputting the sample parameter information in the sample data into the neural network model to be trained and the subsequent steps until the training of the neural network model to be trained is completed, and obtaining the preset neural network model.
It can be seen from the above steps 801 to 803 that the input data of the trained preset neural network model are the coordinates and direction vectors in the world coordinate system of the plurality of sample light points in the light direction of each sample pixel point of each sample image, and the output data are the light point pixel values of those sample light points; the predicted pixel value of the corresponding sample pixel point in the corresponding sample image is then calculated from these light point pixel values, and step 803 is completed by the back-propagation mechanism of the neural network model. The preset neural network model can therefore be regarded as outputting, for the given input data, the predicted pixel value of the corresponding sample pixel point in the sample image.
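One training iteration of steps 801 to 803 might look like the sketch below (a hypothetical PyTorch setup, again assuming a 6-to-3 network and the averaging variant of step 802; loss choice and batching are illustrative).

```python
import torch
import torch.nn.functional as F

def training_step(model, optimizer, points, dirs, sample_rgb):
    """points, dirs: (num_rays, num_points, 3) sample parameter information for a batch of rays.
       sample_rgb  : (num_rays, 3) sample pixel values of the corresponding sample pixel points."""
    num_rays, num_points, _ = points.shape
    inputs = torch.cat([points, dirs], dim=-1).reshape(-1, 6)
    point_rgb = model(inputs).reshape(num_rays, num_points, 3)  # step 801: per-point pixel values
    pred_rgb = point_rgb.mean(dim=1)                            # step 802: predicted pixel values
    loss = F.mse_loss(pred_rgb, sample_rgb)                     # difference between sample and predicted values
    optimizer.zero_grad()
    loss.backward()                                             # step 803: adjust model parameters
    optimizer.step()
    return loss.item()
```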
Therefore, in step 701, the coordinates and direction vectors of the plurality of target light points in each light direction in the world coordinate system at the target shooting view angle are input into the preset neural network model, so as to obtain the light point pixel values of the plurality of target light points in each light direction at the target shooting view angle output by the preset neural network model.
Correspondingly, in step 702 the target pixel value may be output directly by the neural network model, or it may be obtained through calculation after the neural network model outputs the light point pixel values of the plurality of target light points in each light direction at the target shooting visual angle. Specifically, whether the neural network model directly outputs the predicted pixel value of the sample pixel point, first outputs only the light point pixel value of each sample light point, or outputs both the light point pixel values and the predicted pixel value, can be determined according to the actual application scene.
Specifically, calculating the target pixel value of the target pixel point corresponding to each light direction at the target shooting visual angle according to the light point pixel values of the plurality of target light points in that light direction may be done by averaging the light point pixel values of the plurality of target light points in each light direction at the target shooting visual angle.
Or, acquiring weighting coefficients of a plurality of target light points in each light direction under the target shooting visual angle; and then, according to the weighting coefficients of a plurality of target light points in each light direction under the target shooting visual angle, respectively carrying out weighted summation on the plurality of target light points in each light direction to obtain a target pixel value of a target pixel point corresponding to each light direction under the target shooting visual angle.
Wherein, acquiring the weighting coefficients of the plurality of target light points in each light direction under the target shooting visual angle includes: determining the depth values of the plurality of target light points in each light direction under the target shooting visual angle as the weighting coefficients of the plurality of target light points in the corresponding light direction under the target shooting visual angle.
The specific implementation of this process can refer to the description of step 802 above, and is not described here again.
Step 304, outputting a target image formed under the target shooting visual angle based on the target pixel values of the target pixel points corresponding to all the light directions under the target shooting visual angle.
Because each light direction under the target shooting visual angle corresponds to one target pixel point, the pixel position of each corresponding target pixel point in the target image is determined according to the direction of each light under the target shooting visual angle, and the target image can be output based on the pixel position and the target pixel value.
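As a small illustration of this assembly step (not part of the patent text), the per-ray target pixel values can be written into an empty image at the pixel positions determined by the corresponding light directions; the names and the integer (row, col) representation are assumptions.

```python
import numpy as np

def assemble_target_image(pixel_values, pixel_coords, height, width):
    """pixel_values: (num_rays, 3) target pixel value for each light direction.
       pixel_coords: (num_rays, 2) integer (row, col) position of the corresponding target pixel point."""
    image = np.zeros((height, width, 3), dtype=np.float32)
    rows, cols = pixel_coords[:, 0], pixel_coords[:, 1]
    image[rows, cols] = pixel_values   # place each ray's value at its pixel position
    return image
```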
In the embodiment of the application, by acquiring the preset neural network model corresponding to the target shooting scene, when a picture or a video is shot in the target shooting scene, a new picture or a new video frame can be generated based on the preset neural network model according to the shooting visual angle of the picture to be re-shot (the to-be-shot visual angle) or the shooting visual angle of the corresponding video frame where the video shakes. This resolves the problems of blurred pictures or shaky videos and improves the shooting quality of the picture or the video.
Compared with a tracking-based video anti-shake method, the image generation method provided by the embodiment of the application does not require initializing a selected target frame, so each frame of the video does not need a designated initial target and the initial frame image does not need to be manually framed to obtain a target; at the same time, the video frames do not need to be cropped, so the original size of the video is preserved and the video retains a higher resolution.
Compared with a stitching-based video anti-shake method, the image generation method provided by the embodiment of the application does not require the user to set complex rules or run a grid-based algorithm to find the trajectory of the final video.
The following describes at least one to-be-photographed angle of view in the target photographing scene determined in step 301 with reference to a specific application scenario.
First application scenario: a user has shot a video with jitter in the target shooting scene, that is, a video to be processed, and wants to process it into a video without jitter (a target video). In this case, the terminal may frame the video to be processed, interpolate between the two shooting visual angles of two adjacent video images whose shooting visual angle difference is greater than a threshold value to obtain one or more to-be-shot visual angles, and then execute the above steps 302 to 304 to obtain the target image corresponding to each to-be-shot visual angle. After step 304 outputs the target images formed at the target shooting visual angles, the video to be processed is updated with these target images to obtain the target video.
Specifically, in some embodiments of the present application, as shown in fig. 10, in step 301, determining at least one to-be-captured view angle in the target capturing scene may include steps 3011 to 3014.
Step 3011, obtain the video to be processed in the target shooting scene.
And 3012, framing the video to be processed to obtain multiple frames of video images.
And step 3013, determining two adjacent frames of video images with the difference value of the shooting visual angles being greater than the threshold value in the multiple frames of video images.
In this embodiment of the application, when the difference between the shooting angles of the two adjacent frames of video images is greater than the threshold, it indicates that the video to be processed shakes between the two adjacent frames of video images.
In this embodiment, the threshold may include an angle threshold and a translation threshold.
Specifically, the shooting visual angle corresponding to each frame of video image can be represented by coordinates on the X, Y and Z axes of the world coordinate system together with a direction vector. When determining the difference between the shooting visual angles of two adjacent frames of video images, the differences of the translation amounts on the X, Y and Z axes and the angle difference of the direction vectors are computed; when any translation difference on the X, Y or Z axis is greater than the translation threshold, or the angle difference of the direction vectors is greater than the angle threshold, the difference between the shooting visual angles of the two adjacent frames of video images is determined to be greater than the threshold.
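A minimal sketch of this check is given below; the 4×4 pose representation, the choice of the camera z-axis as the direction vector, and the threshold values are illustrative assumptions.

```python
import numpy as np

def views_differ(pose_a, pose_b, trans_thresh=0.05, angle_thresh=2.0):
    """pose_a, pose_b: 4x4 camera-to-world matrices of two adjacent video frames.
       Returns True when any per-axis translation difference exceeds trans_thresh
       or the direction-vector angle difference exceeds angle_thresh (degrees)."""
    dt = np.abs(pose_a[:3, 3] - pose_b[:3, 3])   # translation differences on X, Y, Z
    if np.any(dt > trans_thresh):
        return True
    za, zb = pose_a[:3, 2], pose_b[:3, 2]        # viewing directions (camera z-axes)
    cos = np.clip(np.dot(za, zb) / (np.linalg.norm(za) * np.linalg.norm(zb)), -1.0, 1.0)
    return np.degrees(np.arccos(cos)) > angle_thresh
```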
And 3014, interpolating between two shooting views corresponding to two adjacent frames of video images to obtain at least one to-be-shot view.
In the embodiment of the present application, the purpose of interpolating between the two shooting visual angles corresponding to two adjacent frames of video images is that, after the target images corresponding to the obtained to-be-shot visual angles are inserted into the video to be processed, the difference between the shooting visual angles of any two adjacent frames of the video is less than or equal to the threshold value.
Therefore, in the process of performing interpolation between two shooting angles corresponding to two adjacent frames of video images, if the difference between the two shooting angles corresponding to two adjacent frames of video images is large, a plurality of to-be-shot angles may be obtained.
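The patent does not fix an interpolation scheme; one reasonable sketch, linearly interpolating the translation and spherically interpolating the rotation with SciPy's Slerp, is shown below (all names and the number of inserted views are illustrative).

```python
import numpy as np
from scipy.spatial.transform import Rotation, Slerp

def interpolate_views(pose_a, pose_b, num_views=1):
    """Insert num_views intermediate shooting visual angles between two 4x4 camera poses."""
    key_rots = Rotation.from_matrix(np.stack([pose_a[:3, :3], pose_b[:3, :3]]))
    slerp = Slerp([0.0, 1.0], key_rots)
    poses = []
    for s in np.linspace(0.0, 1.0, num_views + 2)[1:-1]:   # interior points only
        pose = np.eye(4)
        pose[:3, :3] = slerp([s]).as_matrix()[0]            # interpolated rotation
        pose[:3, 3] = (1 - s) * pose_a[:3, 3] + s * pose_b[:3, 3]  # interpolated translation
        poses.append(pose)
    return poses
```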
In the embodiment of the application, the terminal frames a to-be-processed video with picture jitter, interpolates between the two shooting visual angles of two adjacent frames whose shooting visual angle difference is greater than the threshold to obtain one or more to-be-shot visual angles, and executes steps 302 to 304 to obtain the target image corresponding to each to-be-shot visual angle. After step 304, the jittery video to be processed is updated with the target images, that is, the video frame images are recombined in order according to their corresponding shooting visual angles to obtain a jitter-free target video, thereby improving the quality of video shooting.
For example, as shown in fig. 11a, the shooting visual angles of the video frame image b1 and the video frame image b2 in fig. 2b can be obtained by inputting them into the three-dimensional reconstruction tool COLMAP or another tool capable of outputting the shooting visual angle of an image. Two to-be-shot visual angles are then obtained by interpolating between the shooting visual angle of b1 and that of b2, and steps 302 to 304 are executed based on these two to-be-shot visual angles to obtain the video frame images b00 and b01, so that the transition between the video frame image b1 and the video frame image b2 becomes more coherent and the jitter between them is eliminated.
Second application scenario: besides picture jitter, the video to be processed may also suffer from image blur for various reasons, for example, limitations of the shooting conditions or of the camera itself, resulting in poor video shooting quality.
For example, when a camera shoots a certain video frame, the camera is not in focus, or the shot video frame is affected by the light, fog and other factors of the shot scene, so that the image blur shown in fig. 2d occurs.
Therefore, in some embodiments of the present application, as shown in fig. 12, the step 301 of determining at least one perspective to be photographed in the target photographing scene may include steps 3021 to 3023.
And step 3021, acquiring a video to be processed in the target shooting scene.
And step 3022, framing the video to be processed to obtain a plurality of frames of video images.
And step 3023, determining the shooting visual angle corresponding to the blurred image in the multi-frame video image as the visual angle to be shot.
In the embodiment of the present application, whether each frame of video image is a blurred image may be determined by performing image recognition on each frame, or by using an image classifier obtained through training, which is not limited in the present application.
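Since the patent leaves the blur detection method open, one common heuristic, the variance of the Laplacian, is sketched below with OpenCV; the threshold value is scene-dependent and purely illustrative.

```python
import cv2

def is_blurred(frame_bgr, threshold=100.0):
    """Flag a video frame as blurred when the variance of its Laplacian
    falls below a (scene-dependent, illustrative) threshold."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    return cv2.Laplacian(gray, cv2.CV_64F).var() < threshold
```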
In the embodiment of the application, the terminal determines the shooting visual angle corresponding to each blurred image in the video to be processed as a to-be-shot visual angle, executes steps 302 to 304 to obtain the target image corresponding to each to-be-shot visual angle, and then updates the video to be processed with the target images, that is, replaces the blurred images in the video to be processed with the target images generated in step 304, thereby improving the quality of video shooting.
For example, as shown in fig. 11b, when image blur is identified in the video frame images d1, d2 and d3 shown in fig. 2d of the video to be processed, the shooting visual angles corresponding to d1, d2 and d3 are determined as to-be-shot visual angles, and steps 302 to 304 are executed to obtain the clear video frame images d1', d2' and d3' corresponding to d1, d2 and d3, respectively. The clear images d1', d2' and d3' then replace the blurred images d1, d2 and d3 in the video to be processed, yielding a target video without image blur and improving the quality of video shooting.
In one embodiment, the image generation method provided by the embodiments shown in the above steps 3011 to 3014 and 3021 to 3023 of this application can detect in real time whether a captured video image has a shake or an image blur during the capturing of the video image.
That is to say, the video to be processed may be a video image obtained by shooting in real time by the terminal.
It should be noted that, for the same video to be processed, the two methods of determining the angle of view to be captured, i.e., steps 3011 to 3014 and steps 3021 to 3023, may be used at the same time, or only one of the two methods of determining the angle of view to be captured, i.e., steps 3011 to 3014 and steps 3021 to 3023, may be used.
Third application scenario: a user has shot a blurred picture in the target shooting scene, such as the one shown in fig. 2a or fig. 2c, that is, an image to be processed, and wants to obtain the corresponding clear picture (target image). In this case, the terminal may calculate the shooting visual angle corresponding to the image to be processed and use it as the to-be-shot visual angle, and then execute the above steps 302 to 304 to obtain the target image corresponding to each to-be-shot visual angle, which replaces the original blurred picture and improves the shooting quality of the photograph.
That is to say, the image generation method provided by the embodiment of the present application can be applied not only to a video to be processed in which picture jitter exists and a video to be processed in which image blur exists, but also to a single image to be processed in which image blur exists.
For example, as shown in fig. 11c, the shooting visual angles corresponding to the image a to be processed and the image c to be processed in fig. 2a or fig. 2c are determined as to-be-shot visual angles, and the terminal executes the above steps 302 to 304 to obtain the target image corresponding to each to-be-shot visual angle (for example, the target image c'), which replaces the original blurred images a and c to be processed and improves the shooting quality of the photographs.
Specifically, in an embodiment, in step 301, determining at least one to-be-photographed view angle in the target photographing scene may include: and acquiring an image to be processed in a target shooting scene, and determining a shooting visual angle corresponding to the image to be processed as the visual angle to be shot.
In the embodiment of the present application, the determination of the shooting View angle corresponding to the image to be processed may be implemented by using a three-dimensional reconstruction tool COLMAP based on MVS (Multi-View Stereo matching) and SFM (Structure-from-Motion reconstruction), that is, the image to be processed is input into the COLMAP, so as to obtain the shooting View angle corresponding to the image to be processed.
The three application scenarios are illustrations of specific application scenarios provided in the embodiment of the present application, and it should be understood that the image generation method provided in the embodiment of the present application may be applied to other conceivable application scenarios besides the application scenarios described above, and details are not described here again.
In order to improve the precision of the preset neural network model so that the target image can meet the requirements of users, in an embodiment, the plurality of sample images used for training the preset neural network model may be acquired as follows: first acquire a to-be-processed video obtained by shooting the scene to be shot, then frame the to-be-processed video to obtain a plurality of frames of video images, and finally screen the plurality of sample images from these video images.
The video image to be processed may be a jittered video image.
For example, when a user has shot a certain video with image jitter, that video is taken as the video to be processed, its video frame images are taken as sample images, and the neural network model is trained on them to obtain the preset neural network model. New video frame images are then generated with the preset neural network model and interpolated between the jittery adjacent video frame images of the video to be processed to obtain the target video, which resolves the jitter of the video to be processed while preserving the content and resolution of the original video.
In the process of screening the plurality of sample images from the plurality of frames of video images, blurred images among the video images may be determined as non-selectable images; the remaining video images are then determined as selectable images, and the plurality of sample images are screened from the selectable images.
For example, all selectable images may be used as sample images so as to meet the sample amount required for training the preset neural network model. Since the screened sample images are free of blur, the preset neural network model trained on them can reconstruct the target shooting scene more accurately.
Similarly, whether blurred images exist in the plurality of frames of video images may be determined by performing image recognition on each frame of video image or by using an image classifier obtained through training, which is not limited in this application.
It should be noted that, for simplicity of description, the foregoing method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the present application is not limited by the order of acts described, and that some steps may occur in other orders in some implementations of the present application.
Fig. 13 shows a schematic structural diagram of an image generation apparatus 1300 provided in an embodiment of the present application, which includes a determination unit 1301, an acquisition unit 1302, a calculation unit 1303, and an output unit 1304.
A determining unit 1301, configured to determine at least one to-be-photographed view angle in a target photographing scene;
an obtaining unit 1302, configured to obtain a preset neural network model corresponding to a target shooting scene;
the calculating unit 1303 is configured to obtain a target pixel value of a target pixel point corresponding to each light direction at a target shooting view angle based on a preset neural network model and target parameter information of a plurality of target light points in each light direction at the target shooting view angle, where the target shooting view angle is any one of at least one to-be-shot view angle;
the output unit 1304 is configured to output a target image formed under the target shooting viewing angle based on the target pixel values of the target pixel points corresponding to the light directions under the target shooting viewing angle.
It should be noted that, for convenience and brevity of description, the specific working process of the image generation apparatus 1300 described above may refer to the corresponding processes of the methods in fig. 1 to fig. 13, and is not described in detail herein.
Illustratively, the embodiment of the present application further provides a chip, which includes a processor, and the processor is configured to read and execute a computer program stored in a memory to implement the steps of the image generation method described above.
Illustratively, the embodiments of the present application further provide a terminal, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and the processor implements the steps of the image generation method when executing the computer program.
Illustratively, the present application further provides a computer-readable storage medium, which stores a computer program, and the computer program, when executed by a processor, implements the steps of the image generation method.
The present application further provides a computer product, which stores computer instructions, and when the computer instructions are executed, the steps of the image generation method are implemented.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-mentioned division of each functional unit is illustrated, and in practical applications, the above-mentioned functional allocation may be performed by different functional units or modules according to requirements, that is, the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-mentioned functions. Each functional unit and module in the embodiments may be integrated in one processing unit, or each unit may exist alone physically, or two or more units are integrated in one unit, and the integrated unit may be implemented in a form of hardware, or in a form of software functional unit. In addition, specific names of the functional units and modules are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present application. The specific working processes of the units and modules in the system may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and reference may be made to the related descriptions of other embodiments for parts that are not described or illustrated in a certain embodiment.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the method solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus/user terminal and method may be implemented in other manners. For example, the above-described apparatus/user terminal embodiments are merely illustrative, and for example, a division of modules or units is only one logical division, and an actual implementation may have another division, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
Units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated modules/units, if implemented in the form of software functional units and sold or used as separate products, may be stored in a computer readable storage medium. Based on such understanding, all or part of the flow in the method according to the embodiments described above may be implemented by a computer program, which is stored in a computer readable storage medium and used by a processor to implement the steps of the embodiments of the methods described above. Wherein the computer program comprises computer program code, which may be in the form of source code, object code, an executable file or some intermediate form, etc. The computer readable medium may include: any entity or device capable of carrying computer program code, recording medium, usb disk, removable hard disk, magnetic disk, optical disk, computer Memory, Read-Only Memory (ROM), Random Access Memory (RAM), electrical carrier wave signals, telecommunications signals, software distribution medium, and the like. It should be noted that the computer readable medium may contain other components which may be suitably increased or decreased as required by legislation and patent practice in jurisdictions, for example, in some jurisdictions, in accordance with legislation and patent practice, the computer readable medium does not include electrical carrier signals and telecommunications signals.
The above embodiments are intended only to illustrate the technical solutions of the present application and not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be equivalently replaced; such modifications and substitutions do not depart from the spirit and scope of the corresponding technical solutions and are intended to be included within the scope of the present application.

Claims (15)

1. An image generation method, comprising:
determining at least one to-be-shot visual angle in a target shooting scene;
acquiring a preset neural network model corresponding to the target shooting scene;
obtaining a target pixel value of a target pixel point corresponding to each light direction under a target shooting visual angle based on the preset neural network model and target parameter information of a plurality of target light points in each light direction under the target shooting visual angle, wherein the target shooting visual angle is any one shooting visual angle in the at least one to-be-shot visual angle;
and outputting a target image formed under the target shooting visual angle based on the target pixel values of the target pixel points corresponding to all the light directions under the target shooting visual angle.
2. The image generation method of claim 1, wherein the determining at least one perspective to be captured in the target capture scene comprises:
acquiring a video to be processed in the target shooting scene;
framing the video to be processed to obtain a plurality of frames of video images;
determining two adjacent frames of video images of which the difference value of the shooting visual angles is larger than a threshold value in the multiple frames of video images;
interpolating between two shooting visual angles corresponding to the two adjacent frames of video images to obtain the at least one visual angle to be shot;
or,
acquiring a video to be processed in the target shooting scene;
framing the video to be processed to obtain a plurality of frames of video images;
and determining the shooting visual angle corresponding to the blurred image in the plurality of frames of video images as the visual angle to be shot.
3. The image generation method according to claim 1, wherein when the angle of view to be photographed is plural, the image generation method includes:
acquiring a video to be processed in the target shooting scene;
framing the video to be processed to obtain a plurality of frames of video images;
determining two adjacent frames of video images of which the difference value of the shooting visual angles is larger than a threshold value in the multiple frames of video images;
interpolating between two shooting visual angles corresponding to the two adjacent frames of video images to obtain the at least one visual angle to be shot;
and,
acquiring a video to be processed in the target shooting scene;
framing the video to be processed to obtain a plurality of frames of video images;
and determining the shooting visual angle corresponding to the blurred image in the plurality of frames of video images as the visual angle to be shot.
4. The image generation method according to claim 2 or 3, characterized by further comprising:
and updating the video to be processed by using the target image to obtain a target video.
5. The image generation method of claim 1, further comprising:
acquiring sample data of a plurality of sample images, wherein the sample images are obtained by shooting in the target shooting scene, and the sample data comprises: the method comprises the steps that sample pixel values of sample pixel points in each sample image and sample parameter information of a plurality of sample light points in the light direction of each sample pixel point are obtained;
inputting the sample parameter information in the sample data into a neural network model to be trained to obtain a ray point pixel value of each sample ray point output by the neural network model to be trained;
calculating to obtain a predicted pixel value of each sample pixel point in each sample image according to the light point pixel value of each sample light point output by the neural network model to be trained;
and after adjusting the parameters of the neural network model to be trained based on the difference between the sample pixel value and the predicted pixel value, returning to the step of inputting the sample parameter information in the sample data into the neural network model to be trained and the subsequent steps until the training of the neural network model to be trained is completed, and obtaining the preset neural network model.
6. The image generation method of claim 1, wherein obtaining the target pixel value of the target pixel point corresponding to each light direction at the target shooting view angle based on the preset neural network model and the target parameter information of a plurality of target light points at each light direction at the target shooting view angle comprises:
inputting target parameter information of a plurality of target light points in each light direction under the target shooting visual angle into the preset neural network model to obtain light point pixel values of the plurality of target light points in each light direction under the target shooting visual angle;
and calculating the light point pixel values of a plurality of target light points in each light direction under the target shooting visual angle to obtain the target pixel values of the target pixel points corresponding to each light direction under the target shooting visual angle.
7. The image generation method of claim 6, wherein the calculating, according to the light ray point pixel values of a plurality of target light ray points in each light ray direction at the target shooting view angle, a target pixel value of a target pixel point corresponding to each light ray direction at the target shooting view angle includes:
acquiring weighting coefficients of a plurality of target light points in each light direction under the target shooting visual angle;
and respectively carrying out weighted summation on the plurality of target light points in each light direction according to the weighting coefficients of the plurality of target light points in each light direction under the target shooting visual angle to obtain a target pixel value of a target pixel point corresponding to each light direction under the target shooting visual angle.
8. The method as claimed in claim 7, wherein said obtaining the weighting coefficients of a plurality of target ray points in each ray direction at the target capturing view angle comprises:
and determining the depth values of a plurality of target light points in each light direction under the target shooting visual angle as the weighting coefficients of a plurality of target light points in the corresponding light direction under the target shooting visual angle.
9. The image generation method of claim 6, wherein the calculating, according to the light ray point pixel values of a plurality of target light ray points in each light ray direction at the target shooting view angle, a target pixel value of a target pixel point corresponding to each light ray direction at the target shooting view angle includes:
and respectively averaging the light ray point pixel values of a plurality of target light ray points in each light ray direction under the target shooting visual angle to obtain the target pixel value of the target pixel point corresponding to each light ray direction under the target shooting visual angle.
10. An image generation method according to any one of claims 1 to 3 and 5 to 9, wherein the target ray points in the respective ray directions at the target shooting view angle are determined based on:
determining the starting point and the end point of each beam of light under the target shooting visual angle;
and sampling all light ray points from the starting point to the end point of each light ray under the target shooting visual angle to obtain a plurality of target light ray points in the corresponding light ray direction under the target shooting visual angle.
11. The image generation method of any of claims 1-3, 5-9, wherein the target parameter information includes world coordinates and direction vectors of respective target ray points in a world coordinate system; the target parameter information is determined based on:
determining a coordinate transformation matrix of a plurality of target light points in each light direction under the target shooting visual angle according to the target shooting visual angle;
and converting camera coordinates of a plurality of target light points in each light direction under a camera coordinate system into world coordinates under a world coordinate system under the target shooting visual angle according to the coordinate transformation matrix, and converting direction vectors of a plurality of target light points in each light direction under the camera coordinate system into direction vectors under the world coordinate system under the target shooting visual angle.
12. An image generation apparatus, comprising:
the device comprises a determining unit, a judging unit and a judging unit, wherein the determining unit is used for determining at least one to-be-shot visual angle in a target shooting scene;
the acquisition unit is used for acquiring a preset neural network model corresponding to the target shooting scene;
the calculation unit is used for obtaining a target pixel value of a target pixel point corresponding to each light direction under a target shooting visual angle based on the preset neural network model and target parameter information of a plurality of target light points in each light direction under the target shooting visual angle, wherein the target shooting visual angle is any one shooting visual angle in the at least one to-be-shot visual angle;
and the output unit is used for outputting a target image formed under the target shooting visual angle based on the target pixel values of the target pixel points corresponding to all the light directions under the target shooting visual angle.
13. A chip comprising a processor for reading and executing a computer program stored in a memory for carrying out the steps of the method according to any one of claims 1 to 11.
14. A terminal comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the steps of the method according to any of claims 1-11 when executing the computer program.
15. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 11.
CN202110782001.6A 2021-07-09 2021-07-09 Image generation method, device, chip, terminal and storage medium Active CN113542600B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110782001.6A CN113542600B (en) 2021-07-09 2021-07-09 Image generation method, device, chip, terminal and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110782001.6A CN113542600B (en) 2021-07-09 2021-07-09 Image generation method, device, chip, terminal and storage medium

Publications (2)

Publication Number Publication Date
CN113542600A true CN113542600A (en) 2021-10-22
CN113542600B CN113542600B (en) 2023-05-12

Family

ID=78127378

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110782001.6A Active CN113542600B (en) 2021-07-09 2021-07-09 Image generation method, device, chip, terminal and storage medium

Country Status (1)

Country Link
CN (1) CN113542600B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115988343A (en) * 2022-11-21 2023-04-18 中国联合网络通信集团有限公司 Image generation method and device and readable storage medium
CN116109818A (en) * 2023-04-11 2023-05-12 成都中医药大学 Traditional Chinese medicine pulse condition distinguishing system, method and device based on facial video
CN117750203A (en) * 2024-02-20 2024-03-22 荣耀终端有限公司 Electronic device and video processing method thereof

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190007625A1 (en) * 2015-12-23 2019-01-03 Nubia Technology Co., Ltd. Terminal, shooting method thereof and computer storage medium
US20190208097A1 (en) * 2017-12-29 2019-07-04 Vivotek Inc. Image analysis method, camera and image capturing system thereof
CN110210378A (en) * 2019-05-30 2019-09-06 中国电子科技集团公司第三十八研究所 A kind of embedded video method for analyzing image and device based on edge calculations
CN111711753A (en) * 2020-06-24 2020-09-25 中国银行股份有限公司 Photo uploading method and device, storage medium and electronic equipment
CN112261292A (en) * 2020-10-20 2021-01-22 Oppo广东移动通信有限公司 Image acquisition method, terminal, chip and storage medium
CN113077516A (en) * 2021-04-28 2021-07-06 深圳市人工智能与机器人研究院 Pose determination method and related equipment

Also Published As

Publication number Publication date
CN113542600B (en) 2023-05-12

Similar Documents

Publication Publication Date Title
CN111328448B (en) Method and apparatus for image processing
CN108605099B (en) Terminal and method for terminal photographing
CN113542600B (en) Image generation method, device, chip, terminal and storage medium
US9491418B2 (en) Method of providing a digitally represented visual instruction from a specialist to a user in need of said visual instruction, and a system therefor
CN107592466B (en) Photographing method and mobile terminal
CN110572584B (en) Image processing method, image processing device, storage medium and electronic equipment
KR20190073518A (en) Optical imaging method and apparatus
WO2020151281A1 (en) Image processing method and device, electronic equipment and storage medium
CN112150399A (en) Image enhancement method based on wide dynamic range and electronic equipment
CN111246106B (en) Image processing method, electronic device, and computer-readable storage medium
CN107948505B (en) Panoramic shooting method and mobile terminal
CN111064895B (en) Virtual shooting method and electronic equipment
CN109462745B (en) White balance processing method and mobile terminal
CN111741303B (en) Deep video processing method and device, storage medium and electronic equipment
CN112927271A (en) Image processing method, image processing apparatus, storage medium, and electronic device
CN108022227B (en) Black and white background picture acquisition method and device and computer readable storage medium
CN113888452A (en) Image fusion method, electronic device, storage medium, and computer program product
CN110807769B (en) Image display control method and device
CN115546043B (en) Video processing method and related equipment thereof
CN114143471B (en) Image processing method, system, mobile terminal and computer readable storage medium
CN113709353B (en) Image acquisition method and device
CN111127539B (en) Parallax determination method and device, computer equipment and storage medium
CN110443841B (en) Method, device and system for measuring ground depth
EP2945341A1 (en) Method of providing a digitally represented visual instruction from a specialist to a user in need of said visual instruction, and a system therefor
CN115546042B (en) Video processing method and related equipment thereof

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant