CN117528179A - Video generation method and device - Google Patents

Video generation method and device

Info

Publication number
CN117528179A
Authority
CN
China
Prior art keywords
image
images
group
replacing
video
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311445773.6A
Other languages
Chinese (zh)
Inventor
魏经纬
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Aiku Software Technology Shanghai Co ltd
Original Assignee
Aiku Software Technology Shanghai Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Aiku Software Technology Shanghai Co ltd
Priority to CN202311445773.6A
Publication of CN117528179A
Legal status: Pending (current)

Classifications

    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 - Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 - Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 - Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44 - Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs
    • H04N21/44016 - Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs involving splicing one content stream with another content stream, e.g. for substituting a video clip
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 - Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 - Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 - Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/431 - Generation of visual interfaces for content selection or interaction; Content or additional data rendering
    • H04N21/4312 - Generation of visual interfaces for content selection or interaction; Content or additional data rendering involving specific graphical features, e.g. screen layout, special fonts or colors, blinking icons, highlights or animations
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 - Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 - Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47 - End-user applications
    • H04N21/485 - End-user interface for client configuration
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00 - Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/60 - Control of cameras or camera modules
    • H04N23/63 - Control of cameras or camera modules by using electronic viewfinders
    • H04N23/631 - Graphical user interfaces [GUI] specially adapted for controlling image capture or setting capture parameters
    • H04N23/632 - Graphical user interfaces [GUI] specially adapted for controlling image capture or setting capture parameters for displaying or modifying preview images prior to image capturing, e.g. variety of image resolutions or capturing parameters

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Human Computer Interaction (AREA)
  • Studio Devices (AREA)

Abstract

The application discloses a video generation method and a device thereof, belonging to the technical field of image processing. The method comprises the following steps: displaying a preview image of a shooting scene, wherein the preview image comprises a first object; when the preview image contains a second object and the shooting scene meets a first condition, responding to a first input of a user and obtaining N image groups shot at different shooting moments, wherein each image group comprises a first image shot by a first camera and a second image shot by a second camera, the first image comprises the second object, the second image comprises the first object and the second object, and N is a positive integer; replacing the second object in the second image in each image group with the second object in the first image to obtain N fused images; and generating a first video based on the N fused images.

Description

Video generation method and device
Technical Field
The application belongs to the technical field of image processing, and particularly relates to a video generation method and a device thereof.
Background
As users' expectations of photographic technology rise, how to shoot a time-lapse video with a large-field-of-view background, in which the shooting object is enlarged and the shooting scene is framed over a wide range, has become a hot research topic.
At present, to shoot a time-lapse video with a large-field-of-view background, multiple frames of images that will form the video are shot first and then fused, so that the time-lapse video with a large-field-of-view background is obtained.
When shooting these multiple frames of images, a super-telephoto lens can be used for time-lapse shooting to obtain several shooting-object images in which the texture of the shooting object is clear. However, when the shooting object is shot with a super-telephoto lens, the framing range and angle of the background cannot be taken into account, so a clear-textured shooting-object image and a large-field-of-view background image cannot be obtained at the same time, and a large-field-of-view background time-lapse video with a good effect cannot be produced.
Disclosure of Invention
The embodiment of the application aims to provide a video generation method and a device thereof, which can solve the problem that a large-view background delay video with a good effect cannot be obtained in the prior art.
In a first aspect, an embodiment of the present application provides a video generating method, including:
displaying a preview image of a shooting scene, wherein the preview image comprises a first object;
When a second object is present in the preview image and the shooting scene meets a first condition, responding to a first input of a user and obtaining N image groups shot at different shooting moments, wherein each image group comprises a first image shot by a first camera and a second image shot by a second camera, the first image comprises the second object, the second image comprises the first object and the second object, and N is a positive integer;
replacing the second object in the second image in each image group with the second object in the first image to obtain N fused images;
a first video is generated based on the N fused images.
In a second aspect, an embodiment of the present application provides a video generating apparatus, including:
the display module is used for displaying a preview image of a shooting scene, wherein the preview image comprises a first object;
the first determining module is used for responding to a first input of a user when determining that the preview image has a second object and the shooting scene meets a first condition, and obtaining N image groups shot at different shooting moments, wherein each image group comprises a first image shot by a first camera and a second image shot by a second camera, the first image comprises the second object, the second image comprises the first object and the second object, and N is a positive integer;
A replacing module, configured to replace the second object in the second image in each image group with the second object in the first image, to obtain N fused images;
and the second determining module is used for generating a first video based on the N fused images.
In a third aspect, embodiments of the present application provide an electronic device comprising a processor and a memory storing a program or instructions executable on the processor, which when executed by the processor, implement the steps of the method as described in the first aspect.
In a fourth aspect, embodiments of the present application provide a readable storage medium having stored thereon a program or instructions which when executed by a processor implement the steps of the method according to the first aspect.
In a fifth aspect, embodiments of the present application provide a chip, where the chip includes a processor and a communication interface, where the communication interface is coupled to the processor, and where the processor is configured to execute a program or instructions to implement a method according to the first aspect.
In a sixth aspect, embodiments of the present application provide a computer program product stored in a storage medium, the program product being executable by at least one processor to implement the method according to the first aspect.
In the embodiment of the application, when the displayed preview image of the shooting scene contains the second object and the shooting scene meets the first condition, N image groups shot at different shooting moments can be obtained in response to the first input of the user. Each image group comprises a first image, shot by the first camera, that contains the second object, and a second image, shot by the second camera, that contains the second object and the first object. In this way, a first image with a clear-textured second object and a second image with a large-field-of-view first object can be obtained through the first camera and the second camera respectively. After the second object in the second image of each image group is replaced with the second object in the first image, multiple fused images having both a clear-textured second object and a large-field-of-view first object are obtained, and a large-field-of-view background time-lapse video with a good effect can then be generated.
Drawings
FIG. 1 is a flow chart of a video generation method provided by some embodiments of the present application;
FIG. 2 is a schematic illustration of a preview image of a captured scene provided by some embodiments of the present application;
FIG. 3 (a) is a schematic illustration of a first image provided by some embodiments of the present application;
FIG. 3 (b) is a schematic illustration of a second image provided by some embodiments of the present application;
FIG. 4 is a schematic illustration of a fused image provided by some embodiments of the present application;
FIG. 5 is a schematic diagram of a first video obtained using the solution of some embodiments of the present application;
FIG. 6 is a schematic illustration of a determination of a first model provided in some embodiments of the present application;
FIG. 7 is a schematic illustration of a first video obtained in a conventional manner provided by some embodiments of the present application;
fig. 8 is a schematic structural diagram of a video generating apparatus shown in some embodiments of the present application;
FIG. 9 is a schematic structural view of an electronic device shown in some embodiments of the present application;
fig. 10 is a schematic diagram of a hardware structure of an electronic device shown in some embodiments of the present application.
Detailed Description
Technical solutions in the embodiments of the present application will be clearly described below with reference to the drawings in the embodiments of the present application, and it is apparent that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in the present application are within the scope of protection of the present application.
The terms first, second and the like in the description and in the claims are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the terms so used are interchangeable under appropriate circumstances such that the embodiments of the application are capable of operation in sequences other than those illustrated or otherwise described herein, and that the objects identified by "first," "second," etc. are generally of the same type and do not limit the number of objects; for example, the first object may be one or more. Furthermore, in the description and claims, "and/or" means at least one of the connected objects, and the character "/" generally means that the associated objects are in an "or" relationship.
In the prior art, when a time-lapse video with a large-field-of-view background is shot, the shot time-lapse video has a poor effect, the shooting cost is high, and the shooting equipment is not easy to carry. In order to solve these problems, embodiments of the present application provide a video generation method and a device thereof. When it is determined that the displayed preview image of the shooting scene contains a second object and the shooting scene satisfies a first condition, N image groups shot at different shooting moments can be obtained in response to a first input of a user. Each image group comprises a first image, shot by a first camera, that contains the second object, and a second image, shot by a second camera, that contains the second object and the first object, so that a first image with a clear-textured second object and a second image with a large-field-of-view first object can be obtained through the two cameras. After the second object in the second image of each image group is replaced with the second object in the first image, multiple fused images having both a clear-textured second object and a large-field-of-view first object are obtained, and a large-field-of-view background time-lapse video with a good effect can then be generated.
The technical scheme of the embodiment of the application can be applied to shooting a time-lapse video with a large-field-of-view background, in which the shooting object is enlarged and the background of the shooting scene is framed over a wide range. For example, if a user wants to shoot a video with a large background that records the moon's motion track over a period of time, in which the texture of the moon can be seen clearly, the technical scheme of the embodiment of the application can be applied to obtain such a time-lapse video with a large-field-of-view background.
The video generating method provided by the embodiment of the application is described in detail below through specific embodiments and application scenes thereof with reference to the accompanying drawings.
Fig. 1 is a schematic flow chart of a video generating method according to an embodiment of the present application, where an execution subject of the video generating method may be an electronic device, and the electronic device may be, but is not limited to, a personal computer (Personal Computer, PC), a smart phone, a tablet computer, a personal digital assistant (Personal Digital Assistant, PDA), or the like.
As shown in fig. 1, the video generating method provided in the embodiment of the present application may include S110 to S140.
S110, displaying a preview image of the shooting scene.
The shooting scene may be the scene from which the user wants to shoot a time-lapse video with a large-field-of-view background, and it contains a first object, which is the wide-range background that the user wants to capture. For example, if the user wants a time-lapse video of the moon's motion track over a period of time against a large building background, in which the moon's texture can also be seen clearly, the shooting scene is one with the moon and the building as the background.
The preview image may be a preview image of a captured scene displayed at a camera interface of the electronic device, the preview image including the first object therein. Referring to fig. 2, if the photographed scene is a scene having a moon and a building as a background, the preview image may be an image shown in fig. 2, in which the moon 21 and the building 22 are included.
In some embodiments of the present application, S110 may specifically include:
in the event that the electronic device is determined to be in the time-lapse photography mode, a preview image of the photographed scene is displayed in response to a second input by the user.
The time-lapse photography mode may be a mode in which the technical scheme of the embodiment of the present application is executed to obtain a time-lapse video including a large-field background.
In one embodiment of the present application, the electronic device may be set to enter the time-lapse photography mode through a setting interface of the electronic device; for example, an "enter time-lapse photography mode" control may be selected in the setting interface, after which the electronic device is in the time-lapse photography mode.
The second input is an input on the camera application identifier of the electronic device, is used for turning on the camera of the electronic device and displaying a preview image of the shooting scene, and may be a second operation. Illustratively, the second input includes, but is not limited to: a touch input by the user on the camera application identifier through a touch device such as a finger or a stylus, a voice instruction input by the user, a specific gesture input by the user, or other feasible inputs, which can be determined according to actual use requirements and are not limited in the embodiments of the application. The specific gesture in the embodiment of the application may be any one of a single-click gesture, a sliding gesture, a dragging gesture, a pressure recognition gesture, a long-press gesture, an area change gesture, a double-press gesture and a double-click gesture; the click input in the embodiment of the application may be a single-click input, a double-click input or any number of click inputs, and may also be a long-press input or a short-press input. For example, the second input may be: a click input by the user on the camera application identifier.
In some embodiments of the present application, a user may enter into the camera application by clicking on the camera application identifier, for example, clicking on an icon of the camera application, and further display a preview interface of the shooting scene.
And S120, when the second object is present in the preview image and the shooting scene meets the first condition, responding to the first input of the user and obtaining N image groups shot at different shooting moments.
The second object may be a photographed object with a clear texture in the large-field background delayed video desired by the user, for example, moon 21 in fig. 2.
The first condition may be a preset condition that the shooting scene needs to satisfy, for example that the shooting scene is stable. Stability here may refer to the stability of the electronic device used to shoot the large-field-of-view background time-lapse video. For example, to obtain a time-lapse video describing the moon's motion track against a large-field-of-view background over a period of time, the electronic device needs to remain placed in the shooting scene containing the moon and the large-field-of-view background, shooting a moon image and a background image at set intervals, so the whole shooting period is long. If the moon's motion track during shooting is to be recorded well, the electronic device needs to be stably fixed in that shooting scene throughout; otherwise, shaking or movement of the electronic device would prevent the images of the moon's motion track from being captured properly, and the time-lapse video describing the moon's motion track against the large-field-of-view background could not be obtained accurately. A specific first condition here may be that the support holding the electronic device is stable in the shooting scene containing the second object and the large-field-of-view first object, so that the electronic device is stable in that shooting scene.
The first input is an input on a shooting control or a time-lapse shooting setting control of the camera interface of the electronic device, is used for obtaining N image groups shot at different shooting moments, and may be a first operation. Illustratively, the first input includes, but is not limited to: a touch input by the user on the shooting control or the time-lapse shooting setting control through a touch device such as a finger or a stylus, a voice instruction input by the user, a specific gesture input by the user, or other feasible inputs, which can be determined according to actual use requirements and are not limited in the embodiments of the application. The specific gesture in the embodiment of the application may be any one of a single-click gesture, a sliding gesture, a dragging gesture, a pressure recognition gesture, a long-press gesture, an area change gesture, a double-press gesture and a double-click gesture; the click input in the embodiment of the application may be a single-click input, a double-click input or any number of click inputs, and may also be a long-press input or a short-press input. For example, the first input may be: a click input by the user on the shooting control or the time-lapse shooting setting control of the camera interface of the electronic device.
In some embodiments of the present application, the user may obtain N image groups captured at different capturing moments by clicking on the "capture" control 23 or the "time-lapse capture" control 24 in fig. 2, where each image group includes a first image captured by the first camera and a second image captured by the second camera. The first image and the second image in each image group are images from which a time-lapse video with a large-field-of-view background effect can be obtained. Here, the first image may include the second object, such as moon 21 in fig. 2, and the second image may include the second object and the first object, such as moon 21 and building 22 in fig. 2.
In one example, if the user wants to obtain a large-field-of-view background time-lapse video of the moon's motion track from 6 pm to 12 pm, the user may configure the "time-lapse capture" control 24 in the interface shown in fig. 2 that displays the preview image, for example setting images to be captured once every hour. A first image of an enlarged moon and a second image of the building background are then captured at 6 pm, at 7 pm, at 8 pm, at 9 pm, at 10 pm, at 11 pm and at 12 pm, so that 7 such image groups, each containing a moon first image and a building-background second image, are captured in total.
Each image group can comprise a first image shot by the first camera and a second image shot by the second camera, the first image comprises a second object, the second image comprises a first object and a second object, and N is a positive integer.
The first camera and the second camera may be two cameras of a camera of the electronic device, for example, the first camera and the second camera may be a main camera and a sub-camera of the electronic device, for example, the first camera may be a camera with an equivalent focal length of 23-28mm at a camera interface "1X", and the second camera may be a camera with an equivalent focal length of 50-120mm at a camera interface "2X/3X/5X".
When the first camera and the second camera are two cameras of the same electronic device, the first image comprising the second object and the second image comprising the second object and the first object are shot with different cameras of the same device, so a super-telephoto lens does not need to be purchased, which reduces cost, makes the equipment easy to carry, and reduces labor cost.
The first image may be an image including the second object captured by the first camera. The second image may be an image including the second object and the first object photographed by the second camera. The texture of the second object in the second image is unclear, and the texture of the second object in the first image is clear.
Continuing the above example, and referring to fig. 3 (a) and 3 (b), take the second object as the moon and the first object as a building. Fig. 3 (a) is an image of the moon captured at one of the moments between 6 pm and 12 pm, and the texture of the moon 31 in fig. 3 (a) is clear. Fig. 3 (b) is an image of the building 32 captured at one of the moments between 6 pm and 12 pm, for example at the same capturing moment as fig. 3 (a). Besides the building 32, fig. 3 (b) also includes the moon 31, but there the moon 31 is only a small dot and its texture is unclear.
In some embodiments of the present application, whether the preview image contains the second object may be determined by an image detection algorithm, an image detection model, or the like; how to determine whether the preview image contains the second object can be chosen according to user requirements and is not limited in the embodiments of the application.
S130, replacing the second object in the second image in each image group with the second object in the first image to obtain N fused images.
The fused image may be an image obtained by replacing the second object in the second image with the second object in the first image, that is, the clear second object from the first image takes the place of the second object in the second image.
In some embodiments of the present application, for each image group, the second object in the second image may be replaced with the second object in the first image, and N fused images may be obtained.
With continued reference to fig. 3 (a) and 3 (b), the texture-less moon 31 in fig. 3 (b) may be replaced with the textured moon 31 from fig. 3 (a), resulting in the fused image of fig. 4, which contains both a textured moon and the building 42 framed over a wide range. In this way, for each of the 7 image groups, the second object in the second image is replaced with the second object in the first image in the above manner, so that 7 fused images can be obtained.
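For illustration only (not part of the original disclosure), a minimal Python sketch of this replacement step could look as follows, assuming the second object's bounding boxes in both images are already known, for example from a detector; the function name and box parameters are illustrative assumptions.

```python
import cv2
import numpy as np

def fuse_pair(first_img: np.ndarray, second_img: np.ndarray,
              box_first: tuple, box_second: tuple) -> np.ndarray:
    """Replace the second object in second_img with its sharper version from first_img.

    box_first / box_second are (x, y, w, h) regions of the second object in the
    first (telephoto) and second (wide-angle) images respectively.
    """
    x1, y1, w1, h1 = box_first
    x2, y2, w2, h2 = box_second
    patch = first_img[y1:y1 + h1, x1:x1 + w1]
    # Scale the clear-textured patch to the size it occupies in the second image.
    patch = cv2.resize(patch, (w2, h2), interpolation=cv2.INTER_AREA)
    fused = second_img.copy()
    fused[y2:y2 + h2, x2:x2 + w2] = patch  # simple paste; real blending would feather the edges
    return fused
```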
And S140, generating a first video based on the N fused images.
The first video may be a time-lapse video with a large-field-of-view background effect, in which the second object is enlarged and the first object in the shooting scene is framed over a wide range.
In some embodiments of the present application, after obtaining N fused images, the N fused images may be connected in series to form a first video according to a capturing time sequence of a first image or a second image corresponding to the fused images.
With continued reference to the above example, after the 7 fused images are obtained, they are connected in series in time order, so that a first video of the moon's motion track against the building scene, in which the moon's texture is clear, can be obtained. Fig. 5 is a schematic diagram of this first video. Since the building 52 hardly changes position in the first video, while the position of the moon relative to the building 52 changes as the moon moves, the position changes of the moon are superimposed in the first video, giving the situation shown in fig. 5 in which the moon continuously changes position relative to the building 52, its successive positions corresponding in turn to its positions at 6 pm, 7 pm, 8 pm, 9 pm, 10 pm, 11 pm and 12 pm.
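As an illustrative sketch only, the chronological concatenation in S140 could be implemented as in the snippet below; the output file name and frame rate are assumed values.

```python
import cv2

def write_timelapse(fused_frames, out_path="timelapse.mp4", fps=24):
    """fused_frames: list of (capture_time, image) pairs, one per fused image."""
    frames = [img for _, img in sorted(fused_frames, key=lambda p: p[0])]
    h, w = frames[0].shape[:2]
    writer = cv2.VideoWriter(out_path, cv2.VideoWriter_fourcc(*"mp4v"), fps, (w, h))
    for frame in frames:
        writer.write(frame)  # frames are appended in capture-time order
    writer.release()
```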
In some embodiments of the present application, in order to accurately determine whether the preview image contains the second object, the above-mentioned method may further include, before S120:
inputting the preview image into a third model to obtain a probability value that the second object is present in the preview image;
and determining that the preview image contains the second object when the probability value is greater than a preset probability threshold.
The third model may be a model for determining whether the preview image has the second object, and the model may be, but is not limited to, a neural network model, a support vector machine, a decision tree model, or the like, which is not limited in the embodiment of the present application.
The preset probability threshold may be a threshold, set in advance, for the probability value that the second object is present in the preview image. When the probability value is determined to be greater than the threshold, it is determined that the second object is in the preview image; when the probability value is determined to be less than or equal to the threshold, it is determined that the second object is not in the preview image.
In some embodiments of the present application, the preview image may be input into the third model to obtain a probability value that the second object is present in the preview image, and if the probability value is greater than the preset probability threshold, it is determined that the second object is in the preview image.
In the embodiment of the application, whether the second object exists in the preview image or not is determined by using the model, so that the accuracy and efficiency of determining the second object are improved.
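A minimal sketch of this check is given below for illustration, assuming the third model is a binary classifier that outputs a single logit; the function name, the threshold value of 0.8 and the use of PyTorch are assumptions, not details from the patent.

```python
import torch

PROB_THRESHOLD = 0.8  # preset probability threshold (assumed value)

def has_second_object(preview_image: torch.Tensor, third_model: torch.nn.Module) -> bool:
    """Return True if the third model predicts the second object is in the preview image."""
    third_model.eval()
    with torch.no_grad():
        # preview_image: (C, H, W); add a batch dimension, take the single output logit
        prob = torch.sigmoid(third_model(preview_image.unsqueeze(0))).item()
    return prob > PROB_THRESHOLD
```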
In some embodiments of the present application, the third model is obtained before the preview image is input into it. Specifically, before the inputting the preview image into the third model to obtain the probability value that the second object is present in the preview image, the method may further include:
acquiring a first original training set, wherein the first original training set comprises at least two groups of first training samples, and each group of first training samples comprises a first historical image of the photographed second object and first tag data corresponding to the first historical image;
inputting the first historical image into a fourth model to obtain a first prediction probability value of whether the first historical image contains the second object;
determining a loss function value of the fourth model according to the first prediction probability value and the first tag data;
and, when the loss function value does not meet the training stop condition, adjusting the model parameters of the fourth model and continuing to train the parameter-adjusted fourth model with the first original training set until the loss function value meets the training stop condition, so as to obtain the third model.
Wherein the first original training set may be a training sample set for training a fourth model, where the fourth model may be a model prior to training the third model.
The first training samples may be the training samples included in the first original training set; each training sample includes a first historical image of the photographed second object and first tag data corresponding to the first historical image. The first history image may be an image of the second object photographed before the current time, and the first tag data may be data describing whether the second object is present in the first history image; for example, the first tag data may be "the first history image has the second object" or "the first history image does not have the second object".
The first predicted probability value may be the predicted probability value of whether the second object is present in the first history image, obtained after the first history image is input into the fourth model.
The training stop condition may be a preset condition for stopping the training of the fourth model, for example that the training of the fourth model has iterated a certain number of times, or that the loss function value of the fourth model is smaller than a preset loss function value; for example, training stops when the fourth model has been trained in a loop 50 times, or when the loss function value of the fourth model is smaller than 0.1%.
In the embodiment of the application, for each group of first training samples in the first original training set, the first historical image is input into the fourth model to obtain a first prediction probability value of whether the second object is present in the first historical image, and a loss function value of the fourth model is determined according to the first prediction probability value and the first tag data. When the loss function value does not meet the training stop condition, the model parameters of the fourth model are adjusted and the parameter-adjusted fourth model continues to be trained with the first original training set until the loss function value meets the training stop condition, so that the third model is obtained. Whether the second object is present in the preview image can then be determined quickly and accurately based on the third model.
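For illustration, a minimal sketch of such a training loop is shown below, assuming a binary cross-entropy loss and a fixed loss threshold as the training stop condition; the dataset format, hyper-parameters and use of PyTorch are assumptions rather than details from the patent.

```python
import torch
from torch import nn, optim

def train_detector(model: nn.Module, loader, stop_loss: float = 1e-3, max_epochs: int = 50):
    """Train the 'fourth model' on (first historical image, tag) pairs to obtain the 'third model'."""
    criterion = nn.BCEWithLogitsLoss()
    optimizer = optim.Adam(model.parameters(), lr=1e-4)
    for _ in range(max_epochs):
        epoch_loss = 0.0
        for history_image, tag in loader:  # tag: 1.0 if the second object is present, else 0.0
            optimizer.zero_grad()
            logit = model(history_image).squeeze(1)
            loss = criterion(logit, tag.float())
            loss.backward()
            optimizer.step()
            epoch_loss += loss.item()
        if epoch_loss / len(loader) < stop_loss:  # training stop condition met
            break
    return model  # the trained model serves as the third model
```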
In some embodiments of the present application, in order to further improve the generation efficiency of the third model, after the obtaining the first original training set, the method further includes:
preprocessing the first historical image to obtain a preprocessed first historical image;
the inputting the first history image into the fourth model may specifically include:
and inputting the preprocessed first historical image into a fourth model.
In some embodiments of the present application, the preprocessing may adjust the format of the first historical images, for example adjusting all first historical images in the first original training set to the same resolution and contrast level. This ensures that the images input into the fourth model can be processed in the same way, so that different processing does not need to be performed on each first historical image, which improves the processing efficiency of the fourth model on the first historical images and thus the generation efficiency of the third model.
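A possible preprocessing sketch is shown below, assuming the images are simply resized to a common resolution and normalised to a common contrast range; the 224x224 target size is an assumed value.

```python
import cv2
import numpy as np

def preprocess(image: np.ndarray, size=(224, 224)) -> np.ndarray:
    """Bring a first historical image to a common resolution and contrast range."""
    resized = cv2.resize(image, size, interpolation=cv2.INTER_AREA).astype(np.float32)
    lo, hi = resized.min(), resized.max()
    return (resized - lo) / (hi - lo + 1e-6)  # simple contrast normalisation to [0, 1]
```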
In some embodiments of the present application, in order to further improve accuracy of the third model in identifying the preview image, after the acquiring the first original training set, the method further includes:
performing image enhancement processing on the first historical image to obtain a second historical image;
taking the first history image and the second history image as target history images;
taking the first tag data and the second tag data corresponding to the second historical image as target tag data;
taking the target historical image and the target label data as target training samples;
the inputting the first historical image into the fourth model to obtain a first prediction probability value of whether the first historical image contains the second object may specifically include:
Inputting the target historical image into the fourth model to obtain a first prediction probability value of whether the target historical image contains the second object;
the determining the loss function value of the fourth model according to the first prediction probability value and the first label data comprises:
and determining a loss function value of the fourth model according to the first prediction probability value and the target tag data.
The second history image may be an image obtained by performing operations such as rotation, clipping, and scaling on the first history image after performing image enhancement on the first history image. The target history image may include a first history image and a second history image. The second tag data may be tag data corresponding to the second history image. The target tag data may include first tag data and second tag data. The target training sample may be a training sample composed of target history images and target tag data.
In some embodiments of the present application, the second history image may be obtained by performing operations such as rotation, clipping, and scaling on the first history image, so that the first history image and the second history image are both used as training samples, and adaptability of the fourth model to different scenes is increased.
After the target training sample is obtained, the target historical image can be input into the fourth model to obtain a first prediction probability value of whether the target historical image contains the second object, and the loss function value of the fourth model is then determined according to the first prediction probability value and the target tag data.
In the embodiment of the application, the second historical image is obtained after the first historical image is subjected to rotation, cutting, scaling and other operations, so that the first historical image and the second historical image are used as training samples to train the fourth model, adaptability of the fourth model to different scenes is improved, and accuracy of second object recognition is improved.
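A minimal augmentation sketch is given below for illustration; the rotation angle, crop ratio and scale factor are assumed values, and each derived image would reuse the tag of the original first historical image.

```python
import cv2
import numpy as np

def augment(image: np.ndarray, angle: float = 15.0, crop_ratio: float = 0.9, scale: float = 1.2):
    """Derive 'second historical images' from a first historical image by rotation, cropping and scaling."""
    h, w = image.shape[:2]
    # rotation around the image centre
    m = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
    rotated = cv2.warpAffine(image, m, (w, h))
    # centre crop
    ch, cw = int(h * crop_ratio), int(w * crop_ratio)
    y0, x0 = (h - ch) // 2, (w - cw) // 2
    cropped = image[y0:y0 + ch, x0:x0 + cw]
    # scaling
    scaled = cv2.resize(image, (int(w * scale), int(h * scale)))
    return [rotated, cropped, scaled]
```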
In some embodiments of the present application, after the first image is obtained, if the sharpness of the second object in the first image does not meet the requirement of the user, the sharpness of the second object in the first image may be enhanced. The sharpness of the second object in the first image depends on the resolution and pixels of the first image, which can be acquired, for example through an image detection algorithm, so as to obtain the sharpness of the first image.
The specific manner of enhancing the sharpness of the second object in the first image is as follows:
to further enhance the texture of the second object, the above-mentioned method may further comprise, prior to S130:
respectively carrying out enhancement processing on the sharpness of the second object in the first image in each image group to obtain N third images;
the replacing the second object in the second image in each image group with the second object in the first image to obtain N fused images may specifically include:
and replacing the second object in the second image in each image group with the second object in the third image to obtain N fused images.
The third image may be an image obtained by enhancing the sharpness of the second object in the first image.
In some embodiments of the present application, in order to further improve the sharpness of the second object, the sharpness of the second object in the first image may be enhanced, so that N third images are obtained. The second object in the second image of each image group can then be replaced with the second object in the third image, so that N fused images are obtained, and the second object in the fused images obtained in this way is sharper.
In some embodiments of the present application, the sharpness of the second object in the first image may be enhanced by an image enhancement algorithm or other algorithms that can enhance the sharpness of the second object in the image, and specifically, what manner is adopted to enhance the sharpness of the second object in the first image may be selected according to the user's needs, which is not limited in the embodiments of the present application.
In the embodiment of the application, the sharpness of the second object in the first image is enhanced, so that N third images with better sharpness can be obtained, and the second object in the second image in each image group can be replaced with the second object in the third image to obtain N fused images. The second object in the fused images obtained in this way is sharper, and a first video with a sharper second object can therefore be obtained.
In some embodiments of the present application, in order to accurately enhance the sharpness of the second object in the first image, the enhancing processing is performed on the sharpness of the second object in the first image in each image group to obtain N third images, which may specifically include:
And respectively inputting the first images in each image group into a first model, and respectively carrying out enhancement processing on the definition of the second object in the first images in each image group based on the first model to obtain N third images.
Wherein the first model may be a model for enhancing sharpness of the second object in the first image in each image group, respectively.
In some embodiments of the present application, the first images in each image group may be input into the first model, and sharpness enhancement processing is performed on the second objects in the first images in each image group based on the first model, so as to obtain N third images.
In the embodiment of the application, the sharpness of the second object in the first image in each image group is enhanced by the first model, so that the determination efficiency of the third image is improved.
In some embodiments of the present application, in order to obtain the first model accurately, before the first image in each image group is input into the first model, the above-mentioned method may further include:
acquiring an original training set, wherein the original training set comprises at least two groups of training samples, and each group of training samples comprises a photographed historical image of a second object;
Inputting the historical images into a generator of a second model for each group of training samples to obtain a first generated image;
inputting the first generated image into a discriminator of the second model to obtain a probability value of the first generated image conforming to the second condition;
under the condition that the probability value does not meet the training stop condition, inputting the first generated image into a generator to obtain a second generated image;
updating the second generated image to be the first generated image, and returning to the step of inputting the first generated image into the discriminator of the second model to obtain the probability value that the first generated image meets the second condition and, when that probability value does not meet the training stop condition, inputting the first generated image into the generator to obtain a second generated image, until the probability value output by the discriminator of the second model meets the training stop condition, so as to obtain the first model.
Wherein the original training set may be a set of training samples for training the second model. The second model here may be the model before the training of the first model.
The training sample may be a sample for training the second model. The training sample may include a history image of the second object, where the history image may be an image of the second object taken prior to the current time.
The generator of the second model may be a component for generating fake data. In particular, the generator may be a neural network model that takes random data and generates fake data similar to real data; that is, the purpose of the generator is to generate fake data capable of fooling the discriminator.
The discriminator of the second model may be used to judge whether the data input to it is real data or fake data. The discriminator may also be a neural network model, which receives input data and outputs a probability value that the data is real. The aim of the discriminator is to distinguish real data from fake data as well as possible.
In some embodiments of the present application, the generator and the discriminator are trained through adversarial learning. Specifically, the discriminator is first trained to distinguish real data from fake data; then, with the parameters of the discriminator fixed, the generator is trained so that it can generate fake data capable of fooling the discriminator; finally, the trained generator generates fake data and the discriminator judges whether that data is real. By iterating this process continuously, the generator and the discriminator gradually reach an equilibrium state, i.e. the generator can generate fake data indistinguishable from real data.
The first generated image may be an image corresponding to the history image generated based on the generator after the history image is input into the generator.
The second condition may be a condition, set in advance, to be satisfied by the first generated image; for example, that the first generated image is judged to be a real image of the second object.
The training stop condition may be a condition for stopping the training of the second model, for example whether the probability value that the first generated image meets the second condition is greater than a certain threshold: if so, that probability value meets the training stop condition; otherwise, it does not.
The second generated image may be an image generated after inputting the first generated image into the generator in a case where the probability value that the first generated image meets the second condition does not satisfy the training stop condition.
In some embodiments of the present application, referring to fig. 6, fig. 6 is a schematic diagram of a determination process of a first model, the determination process of the first model includes the following S610-S640.
S610, acquiring a history image of the second object.
S620, inputting the history image into a generator to obtain a first generated image.
S630, judging whether the probability value that the first generated image meets the second condition satisfies the training stop condition; if yes, executing S640; if not, inputting the first generated image into the generator to generate a second generated image, taking the second generated image as the input of the discriminator, judging whether the probability value that the second generated image meets the second condition satisfies the training stop condition, and, if not, continuing the adversarial learning.
S640, outputting the image in which the sharpness of the second object has been enhanced.
Taking the second object as the moon as an example, the process of S610-S640 is as follows: a historical image of the second object shot by the first camera is input into the generator of the second model to generate a first generated image; the first generated image is then taken as the input of the discriminator, which judges whether the probability value that the first generated image meets the second preset condition satisfies the training stop condition. If the second object is the moon, this means the discriminator judges whether the moon image generated by the generator passes as a real moon. If not, adversarial learning continues: the first generated image is input into the generator to generate a second generated image, the second generated image is taken as the input of the discriminator, and it is judged whether the probability value that the second generated image meets the second condition satisfies the training stop condition; if not, the adversarial learning continues, and if so, the image finally generated by the generator is taken as the final enhanced image, so that the first model is obtained.
When judging whether the image generated by the generator passes as a real moon, the discriminator uses a moon image whose texture sharpness meets the user's requirement as the judging standard.
In the embodiment of the application, the second model is trained by adversarial learning, so that a better first model for enhancing the second object can be obtained. The second object in the first image can then be enhanced with the first model, so that a third image with better sharpness is obtained, a first video with a clear-textured second object can further be obtained, and the sharpness of the second object in the first video is improved.
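As a highly simplified, non-authoritative sketch of the adversarial training described in S610-S640, the loop below trains a generator to sharpen images of the second object while a discriminator judges whether its outputs look like real, clear images; the network interfaces (a discriminator ending in a sigmoid), the paired data, the hyper-parameters and the stop threshold are all assumptions.

```python
import torch
from torch import nn, optim

def train_enhancer(generator: nn.Module, discriminator: nn.Module, loader,
                   epochs: int = 50, stop_prob: float = 0.95):
    """Adversarially train the enhancement generator; the result plays the role of the 'first model'."""
    bce = nn.BCELoss()
    opt_g = optim.Adam(generator.parameters(), lr=2e-4)
    opt_d = optim.Adam(discriminator.parameters(), lr=2e-4)
    for _ in range(epochs):
        for blurry, sharp in loader:  # historical image and a clear reference image
            # 1) train the discriminator to separate real (sharp) images from generated ones
            fake = generator(blurry).detach()
            d_loss = (bce(discriminator(sharp), torch.ones(sharp.size(0), 1))
                      + bce(discriminator(fake), torch.zeros(fake.size(0), 1)))
            opt_d.zero_grad()
            d_loss.backward()
            opt_d.step()
            # 2) with the discriminator fixed, train the generator to fool it
            fake = generator(blurry)
            g_loss = bce(discriminator(fake), torch.ones(fake.size(0), 1))
            opt_g.zero_grad()
            g_loss.backward()
            opt_g.step()
        # training stop condition: generated images are rated as real with high probability
        with torch.no_grad():
            if discriminator(generator(blurry)).mean().item() > stop_prob:
                break
    return generator
```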
In some embodiments of the present application, in addition to the enhancement processing of the sharpness of the second object in the first image using the first model, the sharpness of the second object in the first image may be processed in other manners. The specific method is as follows: in order to further enhance the sharpness of the second object in the first video, before S130, the above-mentioned method may further include:
amplifying the first image in each image group by a first multiple to obtain N fourth images;
The replacing the second object in the second image in each image group with the second object in the first image to obtain N fused images may specifically include:
and replacing the second object in the second image in each image group with the second object in the fourth image to obtain N fused images.
The fourth image may be an image obtained by performing the first-multiple magnification processing on the first image. The first multiple may be a preset multiple of amplifying the first image, for example, may be 3 times, and a specific value of the first multiple may be set according to a user requirement, which is not limited in the embodiment of the present application.
In some embodiments of the present application, after the first image is obtained, if the sharpness of the second object in the first image does not reach the sharpness required by the user, the first image may be magnified by the first multiple, so that the sharpness of the second object in the first image is enhanced and a fourth image is obtained. Accordingly, when the second object in the second image of each image group is replaced with the second object in the first image, it may be replaced with the second object in the fourth image instead, so as to obtain N fused images.
In the embodiment of the application, the first image in each image group is magnified by the first multiple to obtain N fourth images, so that sharper images of the second object can be obtained, a first video with a clear-textured second object is obtained, and the sharpness of the second object in the first video is improved.
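A minimal sketch of this magnification step is shown below; the interpolation method and the factor of 3 (the example first multiple mentioned above) are illustrative choices.

```python
import cv2
import numpy as np

def magnify(first_image: np.ndarray, factor: float = 3.0) -> np.ndarray:
    """Magnify the first image by the first multiple, yielding the 'fourth image'."""
    h, w = first_image.shape[:2]
    return cv2.resize(first_image, (int(w * factor), int(h * factor)),
                      interpolation=cv2.INTER_CUBIC)
```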
In some embodiments of the present application, when the sharpness enhancement processing is performed on the second object in the first image, the above two enhancement modes may also be adopted, so that the sharpness enhancement effect on the second object is better. The specific method can be as follows: after performing enhancement processing on the sharpness of the second object in the first image in each image group to obtain N third images, the above-mentioned related method may further include:
amplifying the third image in each image group by a second multiple to obtain N fifth images;
the replacing the second object in the second image in each image group with the second object in the third image to obtain N fused images may specifically include:
and replacing the second object in the second image in each image group with the second object in the fifth image to obtain N fused images.
The fifth image may be an image obtained by performing the second-multiple magnification processing on the third image. The second multiple may be a preset multiple of amplifying the third image, for example, may be 5 times, and a specific value of the second multiple may be set according to a user requirement, which is not limited in the embodiment of the present application.
In some embodiments of the present application, after the third image is obtained, if the user feels that the sharpness of the second object in the third image still does not meet the requirement, the third image may additionally be magnified by the second multiple, so that the sharpness of the second object is further enhanced and a fifth image is obtained. Accordingly, when the second object in the second image of each image group is replaced with the second object in the third image, it may be replaced with the second object in the fifth image instead, so as to obtain N fused images.
In the embodiment of the application, the third image in each image group is magnified by the second multiple to obtain N fifth images, so that sharper images of the second object can be obtained, a first video with a clear-textured second object is obtained, and the sharpness of the second object in the first video is improved.
In some embodiments of the present application, when the two sharpness-enhancement methods above are both used to enhance the sharpness of the second object in the first image, the first image may first be input into the first model and, after the third image is obtained, the third image may be magnified by the second multiple; alternatively, the first image may first be magnified by a preset multiple and the magnified first image then input into the first model. Whether to input the first image into the first model first and then magnify, or to magnify first and then input the magnified first image into the first model, can be chosen by the user according to user requirements and is not limited in the embodiments of the application.
In some embodiments of the present application, in a case where it is determined that the preview image does not include the second object or the shooting scene does not satisfy the first condition, the method described above may further include:
in response to a first input of a user, obtaining M sixth images shot at different shooting moments, where each sixth image includes the second object and a ground scene, and M is a positive integer;
and obtaining the first video based on the M sixth images.
The sixth image may be a conventional image, captured directly by the camera, that contains the second object and the background of the shooting scene, for example the image shown in fig. 2.
After the M sixth images are obtained, they may be concatenated into the first video in the shooting time order of the sixth images.
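A minimal sketch of this concatenation step is given below, assuming the sixth images are available as files paired with their shooting times and that OpenCV's VideoWriter is an acceptable container; the frame rate, codec, and file names are arbitrary illustrative choices.

```python
import cv2

def images_to_video(images_with_times, output_path, fps=24):
    """Concatenate still images into a video in shooting-time order."""
    # Sort by shooting time so the video follows the capture sequence.
    ordered = [path for _, path in sorted(images_with_times)]
    first = cv2.imread(ordered[0])
    height, width = first.shape[:2]
    writer = cv2.VideoWriter(output_path,
                             cv2.VideoWriter_fourcc(*"mp4v"),
                             fps, (width, height))
    for path in ordered:
        frame = cv2.imread(path)
        # Resize defensively in case frames differ slightly in size.
        if frame.shape[:2] != (height, width):
            frame = cv2.resize(frame, (width, height))
        writer.write(frame)
    writer.release()

# Hypothetical usage:
# images_to_video([("18:00", "moon_18.jpg"), ("19:00", "moon_19.jpg")], "first_video.mp4")
```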
In one example, taking the second object as the moon and the ground scene as a building, the 7 sixth images obtained during the period from 6 pm to 12 pm are concatenated in time order, yielding the first video shown schematically in fig. 6, which depicts the movement track of the moon over the building ground scene. Since the position of the building hardly changes in the first video while the position of the moon changes continuously over time, the moon moves continuously relative to the building 62 as shown in fig. 7: the moon is at position 1 relative to the building 62 at 6 pm, at position 2 at 7 pm, at position 3 at 8 pm, at position 5 at 9 pm, at position 4 at 10 pm, at position 6 at 11 pm, and at position 7 at 12 pm.
As can be seen from fig. 5 and fig. 7, the texture of the second object in the first video obtained by the scheme according to the embodiment of the present application is clearer than that of the prior art.
The video generation method provided in the embodiment of the application may be executed by a video generating apparatus. In the embodiment of the present application, the video generating apparatus is described by taking as an example the case where the video generating apparatus executes the video generation method.
Fig. 8 is a schematic diagram showing a structure of a video generating apparatus according to an exemplary embodiment.
As shown in fig. 8, the video generating apparatus 800 may include:
a display module 810, configured to display a preview image of a shooting scene, where the preview image includes a first object;
a first determining module 820, configured to, when it is determined that the preview image has a second object and the shooting scene meets a first condition, respond to a first input of a user, obtain N image groups shot at different shooting moments, where each image group includes a first image shot by a first camera and a second image shot by a second camera, the first image includes the second object, the second image includes the first object and the second object, and N is a positive integer;
a replacing module 830, configured to replace the second object in the second image in each image group with the second object in the first image, to obtain N fused images;
the second determining module 840 is configured to generate a first video based on the N fused images.
In this embodiment of the application, when it is determined that the displayed preview image of the shooting scene contains the second object and the shooting scene satisfies the first condition, N image groups shot at different shooting moments can be obtained in response to the first input of the user. Each image group includes a first image, shot by the first camera, that contains the second object, and a second image, shot by the second camera, that contains both the second object and the first object. The first camera thus provides a first image in which the second object has clear texture, while the second camera provides a second image with a large field of view containing the first object and the second object. After the second object in the second image of each image group is replaced with the second object in the first image, a plurality of fused images are obtained in which the second object has clear texture and the first object retains the large field of view, and from these a first video with the same properties can be generated.
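Purely as an illustration of this modular division of responsibilities, the sketch below groups the display, acquisition, replacement and generation roles into one Python class; the class name, method names, and the callables it delegates to are hypothetical and are not taken from the embodiment.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

@dataclass
class VideoGeneratingApparatus:
    display: Callable[[object], None]                           # display module 810
    acquire_groups: Callable[[], List[Tuple[object, object]]]   # first determining module 820
    fuse: Callable[[object, object], object]                    # replacing module 830
    encode: Callable[[Sequence[object]], object]                # second determining module 840

    def generate_first_video(self, preview_image):
        self.display(preview_image)
        groups = self.acquire_groups()  # N (first image, second image) pairs
        fused = [self.fuse(first, second) for first, second in groups]
        return self.encode(fused)
```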
In some embodiments of the present application, the apparatus referred to above may further comprise:
the enhancement module is used for respectively enhancing the definition of the second object in the first image in each image group to obtain N third images;
the replacement module 830 may specifically be configured to:
and replacing the second object in the second image in each image group with the second object in the third image to obtain N fusion images.
In some embodiments of the present application, the apparatus referred to above may further comprise:
the third determining module is used for amplifying the first images in each image group by a first multiple to obtain N fourth images;
the replacement module 830 may specifically be configured to:
and replacing the second object in the second image in each image group with the second object in the fourth image to obtain N fusion images.
In some embodiments of the present application, the apparatus referred to above may further comprise:
a fourth determining module, configured to perform second multiple amplification processing on the third image in each image group, to obtain N fifth images;
the replacement module 830 may specifically be configured to:
and replacing the second object in the second image in each image group with the second object in the fifth image to obtain N fusion images.
In some embodiments of the present application, the enhancement module may specifically be configured to:
and respectively inputting the first images in each image group into a first model, and respectively carrying out enhancement processing on the definition of the second object in the first images in each image group based on the first model to obtain N third images.
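As one hedged illustration of such a first model, the sketch below assumes it is a PyTorch module that maps an image of the second object to a sharper one; the model, its weights, and the pre- and post-processing steps are all assumptions of this sketch rather than details of the embodiment.

```python
import torch

def enhance_with_first_model(first_images, model):
    """Apply a learned enhancement model to the first image of each image group.

    first_images: list of N HxWx3 uint8 arrays (one per image group).
    model: a torch.nn.Module that takes and returns a 1x3xHxW float tensor.
    Returns the N third images as uint8 arrays.
    """
    third_images = []
    model.eval()
    with torch.no_grad():
        for img in first_images:
            x = torch.from_numpy(img).permute(2, 0, 1).float().unsqueeze(0) / 255.0
            y = model(x)
            out = y.squeeze(0).permute(1, 2, 0).clamp(0, 1) * 255.0
            third_images.append(out.to(torch.uint8).cpu().numpy())
    return third_images
```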
The video generating apparatus in the embodiment of the present application may be an electronic device, or may be a component in an electronic device, for example, an integrated circuit or a chip. The electronic device may be a terminal, or may be a device other than a terminal. By way of example, the electronic device may be a mobile phone, tablet computer, notebook computer, palmtop computer, vehicle-mounted electronic device, mobile internet device (Mobile Internet Device, MID), augmented reality (AR)/virtual reality (VR) device, robot, wearable device, ultra-mobile personal computer (UMPC), netbook or personal digital assistant (personal digital assistant, PDA), etc., but may also be a server, network attached storage (Network Attached Storage, NAS), personal computer (personal computer, PC), television (TV), teller machine or self-service machine, etc.; the embodiments of the present application are not specifically limited in this respect.
The video generating apparatus in the embodiment of the present application may be an apparatus having an operating system. The operating system may be an Android operating system, an iOS operating system, or other possible operating systems, which are not specifically limited in the embodiments of the present application.
The video generating apparatus provided in this embodiment of the present application can implement each process implemented by the method embodiment of fig. 1, and in order to avoid repetition, a description is omitted here.
Optionally, as shown in fig. 9, the embodiment of the present application further provides an electronic device 900 including a processor 901 and a memory 902. The memory 902 stores a program or instructions executable on the processor 901, and the program or instructions, when executed by the processor 901, implement the steps of the video generation method embodiment and achieve the same technical effects; to avoid repetition, they are not described again here.
The electronic device in the embodiment of the application includes the mobile electronic device and the non-mobile electronic device described above.
Fig. 10 is a schematic hardware structure of an electronic device implementing an embodiment of the present application.
The electronic device 1000 includes, but is not limited to: radio frequency unit 1001, network module 1002, audio output unit 1003, input unit 1004, sensor 1005, display unit 1006, user input unit 1007, interface unit 1008, memory 1009, and processor 1010.
Those skilled in the art will appreciate that the electronic device 1000 may further include a power source (e.g., a battery) for powering the various components, which may be logically connected to the processor 1010 through a power management system, so that functions such as charge management, discharge management, and power consumption management are implemented through the power management system. The structure of the electronic device shown in fig. 10 does not constitute a limitation on the electronic device; the electronic device may include more or fewer components than shown, combine certain components, or arrange components differently, which is not described in detail here.
A display unit 1006, configured to display a preview image of a shooting scene, where the preview image includes a first object;
a processor 1010, configured to, in a case where it is determined that the preview image has a second object and the shooting scene satisfies a first condition, respond to a first input of a user, obtain N image groups shot at different shooting moments, where each image group includes a first image shot by a first camera and a second image shot by a second camera, the first image includes the second object, the second image includes the first object and the second object, and N is a positive integer; replacing the second object in the second image in each image group with the second object in the first image to obtain N fusion images; a first video is generated based on the N fused images.
In this way, when it is determined that the displayed preview image of the shooting scene contains the second object and the shooting scene satisfies the first condition, N image groups shot at different shooting moments can be obtained in response to the first input of the user, each image group including a first image containing the second object shot by the first camera and a second image containing the second object and the first object shot by the second camera. The first camera and the second camera thus respectively provide a first image in which the second object has clear texture and a second image with a large field of view containing the first object. After the second object in the second image of each image group is replaced with the second object in the first image, a plurality of fused images in which the second object has clear texture and the first object retains the large field of view are obtained, from which a first video with the same properties can be generated.
Optionally, the processor 1010 is further configured to perform enhancement processing on the sharpness of the second object in the first image in each image group, so as to obtain N third images; and replacing the second object in the second image in each image group with the second object in the third image to obtain N fusion images.
In this way, enhancing the sharpness of the second object in the first image yields N third images with better definition. The second object in the second image of each image group can then be replaced with the second object in the third image to obtain N fused images in which the second object is sharper, and in turn a first video in which the second object has better definition.
Optionally, the processor 1010 is further configured to perform a first multiple of amplification processing on the first image in each image group to obtain N fourth images; and replacing the second object in the second image in each image group with the second object in the fourth image to obtain N fusion images.
In this way, magnifying the first image in each image group by the first multiple yields N fourth images, so that a sharper image of the second object can be obtained. This produces a first video in which the second object has clear texture and improves the definition of the second object in the first video.
Optionally, the processor 1010 is further configured to perform a second multiple of amplification processing on the third image in each of the image groups to obtain N fifth images; and replacing the second object in the second image in each image group with the second object in the fifth image to obtain N fusion images.
In this way, magnifying the third image in each image group by the second multiple yields N fifth images, so that an even sharper image of the second object can be obtained. This produces a first video in which the second object has clear texture and improves the definition of the second object in the first video.
Optionally, the processor 1010 is further configured to input the first images in each of the image groups into a first model, and perform enhancement processing on the sharpness of the second object in the first image in each of the image groups based on the first model, so as to obtain N third images.
In this way, using the first model to enhance the second object in the first image of each image group improves the efficiency with which the third images are obtained.
It should be understood that in the embodiment of the present application, the input unit 1004 may include a graphics processor (Graphics Processing Unit, GPU) 10041 and a microphone 10042, and the graphics processor 10041 processes image data of still pictures or videos obtained by an image capturing device (such as a color camera) in a video capturing mode or an image capturing mode. The display unit 1006 may include a display panel 10061, and the display panel 10061 may be configured in the form of a liquid crystal display, an organic light emitting diode, or the like. The user input unit 1007 includes at least one of a touch panel 10071 and other input devices 10072. The touch panel 10071 is also referred to as a touch screen. The touch panel 10071 can include two portions, a touch detection device and a touch controller. Other input devices 10072 may include, but are not limited to, a physical keyboard, function keys (e.g., volume control keys, switch keys, etc.), a trackball, a mouse, a joystick, and so forth, which are not described in detail herein.
The memory 1009 may be used to store software programs as well as various data. The memory 1009 may mainly include a first storage area for storing programs or instructions and a second storage area for storing data, where the first storage area may store an operating system, and application programs or instructions required for at least one function (such as a sound playing function and an image playing function), and the like. Furthermore, the memory 1009 may include volatile memory or nonvolatile memory, or the memory 1009 may include both volatile and nonvolatile memory. The nonvolatile memory may be a Read-Only Memory (ROM), a Programmable ROM (PROM), an Erasable PROM (EPROM), an Electrically Erasable PROM (EEPROM), or a flash memory. The volatile memory may be Random Access Memory (RAM), Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDR SDRAM), Enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), or Direct Rambus RAM (DRRAM). The memory 1009 in the embodiments of the present application includes, but is not limited to, these and any other suitable types of memory.
The processor 1010 may include one or more processing units. Optionally, the processor 1010 integrates an application processor, which mainly handles operations involving the operating system, user interface, application programs, and the like, and a modem processor, which mainly handles wireless communication signals, such as a baseband processor. It will be appreciated that the modem processor may alternatively not be integrated into the processor 1010.
The embodiment of the present application further provides a readable storage medium, where a program or an instruction is stored, and when the program or the instruction is executed by a processor, the processes of the embodiment of the video generating method are implemented, and the same technical effects can be achieved, so that repetition is avoided, and no further description is given here.
The processor is the processor in the electronic device described in the above embodiment. The readable storage medium includes a computer-readable storage medium, such as a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
The embodiment of the application further provides a chip, the chip includes a processor and a communication interface, the communication interface is coupled with the processor, and the processor is used for running a program or an instruction, so as to implement each process of the embodiment of the video generation method, and achieve the same technical effect, so that repetition is avoided, and no redundant description is provided here.
It should be understood that the chip referred to in the embodiments of the present application may also be called a system-on-chip, a chip system, a system-on-a-chip, or the like.
The embodiments of the present application provide a computer program product stored in a storage medium, where the program product is executed by at least one processor to implement the respective processes of the embodiments of the video generating method described above, and achieve the same technical effects, and are not repeated herein.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element. Furthermore, it should be noted that the scope of the methods and apparatus in the embodiments of the present application is not limited to performing the functions in the order shown or discussed, but may also include performing the functions in a substantially simultaneous manner or in an opposite order depending on the functions involved, e.g., the described methods may be performed in an order different from that described, and various steps may also be added, omitted, or combined. Additionally, features described with reference to certain examples may be combined in other examples.
From the above description of the embodiments, it will be clear to those skilled in the art that the above-described embodiment method may be implemented by means of software plus a necessary general hardware platform, but of course may also be implemented by means of hardware, but in many cases the former is a preferred embodiment. Based on such understanding, the technical solutions of the present application may be embodied essentially or in a part contributing to the prior art in the form of a computer software product stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk), comprising several instructions for causing a terminal (which may be a mobile phone, a computer, a server, or a network device, etc.) to perform the methods described in the embodiments of the present application.
The embodiments of the present application have been described above with reference to the accompanying drawings, but the present application is not limited to the above-described embodiments, which are merely illustrative rather than restrictive. Inspired by the present application, those of ordinary skill in the art may derive many other forms without departing from the spirit of the present application and the scope of the claims, all of which fall within the protection of the present application.

Claims (10)

1. A method of video generation, the method comprising:
displaying a preview image of a shooting scene, wherein the preview image comprises a first object;
under the condition that a second object is included in the preview image and the shooting scene meets a first condition, responding to a first input of a user, obtaining N image groups shot at different shooting moments, wherein each image group comprises a first image shot by a first camera and a second image shot by a second camera, the first image comprises the second object, the second image comprises the first object and the second object, and N is a positive integer;
replacing the second object in the second image in each image group with the second object in the first image to obtain N fusion images;
a first video is generated based on the N fused images.
2. The method of claim 1, wherein prior to said replacing said second object in said second image in each said group of images with said second object in said first image, said method further comprises:
respectively carrying out enhancement processing on the definition of the second object in the first image in each image group to obtain N third images;
the replacing the second object in the second image in each image group with the second object in the first image to obtain N fused images includes:
and replacing the second object in the second image in each image group with the second object in the third image to obtain the N fused images.
3. The method of claim 1, wherein prior to said replacing said second object in said second image in each said group of images with said second object in said first image, said method further comprises:
amplifying the first images in each image group by a first multiple to obtain N fourth images;
the replacing the second object in the second image in each image group with the second object in the first image to obtain N fused images includes:
and replacing the second object in the second image in each image group with the second object in the fourth image to obtain N fusion images.
4. The method according to claim 2, wherein after the enhancing the sharpness of the second object in the first image in each of the image groups, respectively, to obtain N third images, the method further comprises:
amplifying the third image in each image group by a first multiple to obtain N fifth images;
said replacing said second object in said second image in each said group of images with said second object in said third image to obtain said N fused images, comprising:
and replacing the second object in the second image in each image group with the second object in the fifth image to obtain N fusion images.
5. The method according to claim 2, wherein the enhancing the second object in the first image in each image group to obtain N third images includes:
and respectively inputting the first images in each image group into a first model, and respectively carrying out enhancement processing on the definition of the second object in the first images in each image group based on the first model to obtain N third images.
6. A video generating apparatus, the apparatus comprising:
the display module is used for displaying a preview image of a shooting scene, wherein the preview image comprises a first object;
the first determining module is used for responding to a first input of a user when determining that the preview image has a second object and the shooting scene meets a first condition, and obtaining N image groups shot at different shooting moments, wherein each image group comprises a first image shot by a first camera and a second image shot by a second camera, the first image comprises the second object, the second image comprises the first object and the second object, and N is a positive integer;
a replacing module, configured to replace the second object in the second image in each image group with the second object in the first image, to obtain N fused images;
and the second determining module is used for generating a first video based on the N fused images.
7. The apparatus of claim 6, wherein the apparatus further comprises:
the enhancement module is used for respectively enhancing the definition of the second object in the first image in each image group to obtain N third images;
the replacement module is specifically used for:
and replacing the second object in the second image in each image group with the second object in the third image to obtain N fusion images.
8. The apparatus of claim 6, wherein the apparatus further comprises:
the third determining module is used for amplifying the first images in each image group by a first multiple to obtain N fourth images;
the replacement module is specifically used for:
and replacing the second object in the second image in each image group with the second object in the fourth image to obtain N fusion images.
9. The apparatus of claim 7, wherein the apparatus further comprises:
a fourth determining module, configured to perform a first multiple of amplification processing on the third image in each image group, to obtain N fifth images;
the replacement module is specifically used for:
and replacing the second object in the second image in each image group with the second object in the fifth image to obtain N fusion images.
10. The apparatus of claim 7, wherein the enhancement module is specifically configured to:
and respectively inputting the first images in each image group into a first model, and respectively carrying out enhancement processing on the definition of the second object in the first images in each image group based on the first model to obtain N third images.
CN202311445773.6A 2023-11-01 2023-11-01 Video generation method and device Pending CN117528179A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311445773.6A CN117528179A (en) 2023-11-01 2023-11-01 Video generation method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311445773.6A CN117528179A (en) 2023-11-01 2023-11-01 Video generation method and device

Publications (1)

Publication Number Publication Date
CN117528179A true CN117528179A (en) 2024-02-06

Family

ID=89755958

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311445773.6A Pending CN117528179A (en) 2023-11-01 2023-11-01 Video generation method and device

Country Status (1)

Country Link
CN (1) CN117528179A (en)

Similar Documents

Publication Publication Date Title
CN111612009B (en) Text recognition method, device, equipment and storage medium
CN112911147B (en) Display control method, display control device and electronic equipment
CN112532882B (en) Image display method and device
CN112887615A (en) Shooting method and device
CN117152660A (en) Image display method and device
CN113794831B (en) Video shooting method, device, electronic equipment and medium
CN117528179A (en) Video generation method and device
CN114245017A (en) Shooting method and device and electronic equipment
CN114466140A (en) Image shooting method and device
CN113473012A (en) Virtualization processing method and device and electronic equipment
CN112165584A (en) Video recording method, video recording device, electronic equipment and readable storage medium
CN111857350A (en) Method, device and equipment for rotating display equipment
CN114222069B (en) Shooting method, shooting device and electronic equipment
CN114390205B (en) Shooting method and device and electronic equipment
CN112887621B (en) Control method and electronic device
CN117097982B (en) Target detection method and system
CN114500852B (en) Shooting method, shooting device, electronic equipment and readable storage medium
CN112367470B (en) Image processing method and device and electronic equipment
CN113923367B (en) Shooting method and shooting device
CN117278842A (en) Shooting control method, shooting control device, electronic equipment and readable storage medium
CN117119292A (en) Image processing method and device
CN117793513A (en) Video processing method and device
CN114286009A (en) Inverted image shooting method and device, electronic equipment and storage medium
CN117149038A (en) Image display method and image display device
CN114285988A (en) Display method, display device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination