CN114764839A - Dynamic video generation method and device, readable storage medium and terminal equipment


Info

Publication number
CN114764839A
CN114764839A
Authority
CN
China
Prior art keywords
static image
mask
division mask
image
sky
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011627470.2A
Other languages
Chinese (zh)
Inventor
樊顺利
陈巍
徐璐
肖云雷
刘阳兴
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan TCL Group Industrial Research Institute Co Ltd
Original Assignee
Wuhan TCL Group Industrial Research Institute Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan TCL Group Industrial Research Institute Co Ltd filed Critical Wuhan TCL Group Industrial Research Institute Co Ltd
Priority to CN202011627470.2A priority Critical patent/CN114764839A/en
Publication of CN114764839A publication Critical patent/CN114764839A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 13/00 Animation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 5/00 Image enhancement or restoration
    • G06T 5/50 Image enhancement or restoration by the use of more than one image, e.g. averaging, subtraction
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20212 Image combination
    • G06T 2207/20221 Image fusion; Image merging

Abstract

The present application belongs to the field of image processing technologies, and in particular, to a dynamic video generation method, an apparatus, a computer-readable storage medium, and a terminal device. The method comprises the following steps: acquiring a static image to be processed; performing segmentation processing on the static image to obtain a sky area and a non-sky area in the static image; and performing motion prediction on the sky area to generate a dynamic video corresponding to the static image. With the method and apparatus of the present application, the sky area and the non-sky area in the static image can be effectively distinguished, and only the sky area is subjected to motion prediction when the dynamic video is generated, so that a video highlighting sky change can be obtained, meeting the requirements of users.

Description

Dynamic video generation method and device, readable storage medium and terminal equipment
Technical Field
The present application belongs to the field of image processing technologies, and in particular, to a dynamic video generation method, an apparatus, a computer-readable storage medium, and a terminal device.
Background
The generation of a dynamic video from a single static image based on deep learning techniques is a currently active research field. For example, the paper "Animating Landscape: Self-Supervised Learning of Decoupled Motion and Appearance for Single-Image Video Synthesis" (Endo, Y., Kanamori, Y., Kuriyama, S., ACM Transactions on Graphics (Proceedings of ACM SIGGRAPH Asia 2019) 38(6) (2019) 175:1-175:19) uses predicted intermediate maps for indirect image synthesis instead of predicting the output frames directly, so that higher-quality motion video can be generated. However, for a static image containing sky, a user desires to obtain a video that highlights sky change, whereas in a dynamic video generated based on the prior art the whole picture is in motion and changing, which makes it difficult to meet the user's requirements.
Disclosure of Invention
In view of this, embodiments of the present application provide a method and an apparatus for generating a dynamic video, a computer-readable storage medium, and a terminal device, so as to solve the problem that, in a dynamic video generated in the prior art, the entire screen is moving and changing, which is difficult to meet the requirements of users.
A first aspect of an embodiment of the present application provides a dynamic video generation method, including:
acquiring a static image to be processed;
performing segmentation processing on the static image to obtain a sky area and a non-sky area in the static image;
and performing motion prediction on the sky area to generate a dynamic video corresponding to the static image.
A second aspect of an embodiment of the present application provides a dynamic video generation apparatus, including:
the image acquisition module is used for acquiring a static image to be processed;
the image segmentation module is used for carrying out segmentation processing on the static image to obtain a sky area and a non-sky area in the static image;
and the dynamic video generation module is used for carrying out motion prediction on the sky area and generating a dynamic video corresponding to the static image.
A third aspect of embodiments of the present application provides a computer-readable storage medium, in which a computer program is stored, and the computer program, when executed by a processor, implements the steps of any of the above-mentioned dynamic video generation methods.
A fourth aspect of the embodiments of the present application provides a terminal device, where the terminal device includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and the processor implements the steps of any of the above dynamic video generation methods when executing the computer program.
A fifth aspect of the embodiments of the present application provides a computer program product, which, when running on a terminal device, causes the terminal device to execute the steps of any one of the above-mentioned dynamic video generation methods.
Compared with the prior art, the embodiment of the application has the following beneficial effects: acquiring a static image to be processed; performing segmentation processing on the static image to obtain a sky area and a non-sky area in the static image; and performing motion prediction on the sky area to generate a dynamic video corresponding to the static image. With the method and apparatus of the present application, the sky area and the non-sky area in the static image can be effectively distinguished, and only the sky area is subjected to motion prediction when the dynamic video is generated, so that a video highlighting sky change can be obtained, meeting the requirements of users.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. It is apparent that the drawings in the following description show only some embodiments of the present application, and that those skilled in the art can obtain other drawings from these drawings without creative effort.
Fig. 1 is a flowchart of an embodiment of a dynamic video generation method in an embodiment of the present application;
FIG. 2 is a schematic illustration of a still image to be processed;
fig. 3 is a schematic flowchart of a process of segmenting a still image to obtain a sky region and a non-sky region in the still image;
FIG. 4 is a schematic diagram of a segmentation mask obtained by performing a segmentation process on a static image using a preset semantic segmentation network;
FIG. 5 is a schematic diagram of the final split mask;
FIG. 6 is a schematic illustration of a cloud template image;
fig. 7 is a comparison diagram of the sky regions before and after image fusion;
FIG. 8 is a schematic diagram of a deep learning network that performs motion compensation processing;
FIG. 9 is an exemplary diagram of generating a motion video based on a still image;
fig. 10 is a block diagram of an embodiment of a motion video generating apparatus according to an embodiment of the present application;
fig. 11 is a schematic block diagram of a terminal device in an embodiment of the present application.
Detailed Description
In order to make the objects, features and advantages of the present invention more apparent and understandable, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application, and it is apparent that the embodiments described below are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
It will be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It is also to be understood that the terminology used in the description of the present application herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in this specification and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should be further understood that the term "and/or" as used in this specification and the appended claims refers to any and all possible combinations of one or more of the associated listed items and includes such combinations.
As used in this specification and the appended claims, the term "if" may be interpreted contextually as "when", "upon" or "in response to a determination" or "in response to a detection". Similarly, the phrase "if it is determined" or "if a [ described condition or event ] is detected" may be interpreted contextually to mean "upon determining" or "in response to determining" or "upon detecting [ described condition or event ]" or "in response to detecting [ described condition or event ]".
In addition, in the description of the present application, the terms "first," "second," "third," and the like are used solely to distinguish one from another and are not to be construed as indicating or implying relative importance.
Referring to fig. 1, an embodiment of a dynamic video generation method in an embodiment of the present application may include:
step S101, the terminal equipment acquires a static image to be processed.
As shown in fig. 2, the static image is an image containing the sky, and may be an image captured on the spot by a user shooting a landscape including the sky through a camera of a terminal device such as a mobile phone or a tablet computer. In a specific use scenario of this embodiment, when a user wants to generate a dynamic video immediately, the dynamic video generation mode of the terminal device may be enabled by clicking a specific physical key or virtual key before the still image is captured, and in this mode the terminal device may directly perform dynamic video generation on the still image captured by the user.
The still image may be an image that is originally stored in the terminal device, or an image that is acquired by the terminal device from a cloud server or other terminal devices via a network. In another specific use scenario of this embodiment, when a user wants to perform dynamic video generation on an existing still image, the dynamic video generation mode of the terminal device may be opened by clicking a specific physical key or virtual key, and a still image to be processed is selected (the order of clicking the key and selecting the image may be interchanged, that is, the image may be selected first, and then the dynamic video generation mode of the terminal device is opened), so that the terminal device may perform dynamic video generation on the still image.
Step S102, the terminal device carries out segmentation processing on the static image to obtain a sky area and a non-sky area in the static image.
As shown in fig. 3, in a specific implementation of the embodiment of the present application, step S102 may specifically include the following processes:
and S1021, the terminal equipment uses a preset semantic segmentation network to segment the static image to obtain a first segmentation mask of the static image.
The semantic segmentation network used in the embodiment of the present application may include, but is not limited to, neural networks such as HRNet, DeepLab, FCN, UNet and SegNet. Using the semantic segmentation network, the terminal device roughly divides the static image into two types of areas, namely a sky area and a non-sky area, where pixels of the sky area are white (binary value '1') and pixels of the non-sky area are black (binary value '0'); the division mask shown in fig. 4 can then be obtained, and is recorded as the first division mask for ease of distinction.
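By way of a non-limiting example, assuming a pretrained binary sky/non-sky segmentation network is available as a PyTorch module (the network itself, for example an HRNet or UNet variant, and the function name below are illustrative assumptions), the coarse segmentation step may be sketched as follows:

```python
import numpy as np
import torch

def coarse_sky_mask(image_rgb: np.ndarray, seg_net: torch.nn.Module) -> np.ndarray:
    """Run a pretrained sky/non-sky segmentation network on an RGB image and
    return the first division mask: 255 (white) for sky, 0 (black) for non-sky."""
    # HWC uint8 -> NCHW float tensor in [0, 1]
    x = torch.from_numpy(image_rgb).float().permute(2, 0, 1).unsqueeze(0) / 255.0
    with torch.no_grad():
        logits = seg_net(x)                    # assumed output shape: (1, 2, H, W)
    pred = logits.argmax(dim=1)[0]             # per-pixel class index, 1 = sky (assumed)
    first_mask = (pred == 1).byte().numpy() * 255
    return first_mask
```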
After the coarse segmentation process is completed, the terminal device can continue to perform fine segmentation on the missed non-sky region through subsequent steps, so that the accuracy of the segmentation result is further improved.
Step S1022, the terminal device performs segmentation processing on the blue channel (B channel) of the static image by using the Otsu method to obtain a second segmentation mask of the static image.
The Otsu method, also called the maximum between-class variance method, is an algorithm for determining a binarization segmentation threshold of an image; it divides the image into a background (namely the sky area) and a foreground (namely the non-sky area) according to the gray-level characteristics of the image. Since the variance is a measure of the uniformity of the gray-level distribution, the larger the between-class variance between the background and the foreground, the larger the difference between the two parts constituting the image; when part of the foreground is mistaken for the background, or part of the background is mistaken for the foreground, this difference becomes smaller. Therefore, a segmentation that maximizes the between-class variance means that the probability of misclassification is minimized. For ease of distinction, the division mask obtained by the Otsu method is recorded as the second division mask.
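As a minimal illustration, the Otsu segmentation of the B channel may be sketched with OpenCV as follows (the function name is illustrative; the image is assumed to be in OpenCV's BGR channel order):

```python
import cv2

def otsu_blue_channel_mask(image_bgr):
    """Second division mask: apply Otsu's threshold to the blue (B) channel.
    cv2.THRESH_OTSU picks the threshold that maximizes the between-class variance."""
    b_channel = image_bgr[:, :, 0]             # B channel (OpenCV uses BGR order)
    _, second_mask = cv2.threshold(b_channel, 0, 255,
                                   cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    return second_mask
```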
Step S1023, the terminal device synthesizes a third division mask according to the first division mask and the second division mask.
First, the terminal device calculates the similarity between the first division mask and the second division mask, and records it as the first similarity. In the embodiment of the present application, any similarity calculation method in the prior art may be adopted according to practical situations, including but not limited to a Peak Signal to Noise Ratio (PSNR) similarity calculation method.
Then, the terminal device determines whether the first similarity is greater than a preset first threshold. The specific value of the first threshold may be set according to an actual situation, and is not specifically limited in this embodiment. Preferably, the first threshold may be set to 20.
If the first similarity is larger than the first threshold value, the terminal equipment performs pixel-by-pixel OR operation on the first division mask and the second division mask to obtain a third division mask; and if the first similarity is smaller than or equal to the first threshold value, the terminal equipment takes the first division mask as a third division mask.
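The mask synthesis of step S1023 may be sketched as follows, using PSNR as the similarity measure and the preferred threshold of 20 (the function name is illustrative); the same routine can be reused in step S1025 for the third and fourth division masks:

```python
import cv2

def merge_masks(mask_a, mask_b, psnr_threshold=20.0):
    """Synthesize a combined division mask: if the two masks are similar enough
    (PSNR above the threshold), take their pixel-by-pixel OR; otherwise keep mask_a."""
    similarity = cv2.PSNR(mask_a, mask_b)      # peak signal-to-noise ratio between the masks
    if similarity > psnr_threshold:
        return cv2.bitwise_or(mask_a, mask_b)
    return mask_a
```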
After the fine segmentation process is completed, the terminal device can further handle small foreground target areas that were missed through subsequent steps, so that the precision of the segmentation result is further improved.
Step S1024, the terminal device performs segmentation processing on the red channel (R channel) of the static image according to a preset binarization segmentation threshold to obtain a fourth segmentation mask of the static image.
The binary segmentation threshold may be set according to actual situations, and is not specifically limited in the embodiment of the present application. Preferably, the binarization segmentation threshold may be set to 50. For the sake of convenience of distinction, the division mask obtained by the binarization division using the threshold value is referred to as a fourth division mask.
Further, the terminal device may also perform filtering processing on the R channel in advance to eliminate the noise influence before performing binarization segmentation using the threshold value. In the embodiments of the present application, any filtering method in the prior art may be adopted according to practical situations, including but not limited to a Difference of Gaussian (DoG) filtering method.
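Step S1024 together with the optional DoG pre-filtering may be sketched as follows (the threshold of 50 follows the preferred value above, while the two Gaussian sigmas and the function name are illustrative assumptions, since the text does not specify them):

```python
import cv2

def red_channel_mask(image_bgr, bin_threshold=50, use_dog=True):
    """Fourth division mask: fixed-threshold binarization of the red (R) channel,
    optionally preceded by Difference-of-Gaussians (DoG) filtering to suppress noise."""
    r_channel = image_bgr[:, :, 2]                         # R channel (BGR order)
    if use_dog:
        fine = cv2.GaussianBlur(r_channel, (0, 0), sigmaX=1.0)
        coarse = cv2.GaussianBlur(r_channel, (0, 0), sigmaX=3.0)
        r_channel = cv2.subtract(fine, coarse)             # band-pass result, saturated to uint8
    _, fourth_mask = cv2.threshold(r_channel, bin_threshold, 255, cv2.THRESH_BINARY)
    return fourth_mask
```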
Step S1025, the terminal device synthesizes a fifth division mask according to the third division mask and the fourth division mask, and determines the sky area and the non-sky area in the static image according to the fifth division mask.
First, the terminal device calculates the similarity between the third division mask and the fourth division mask, and records it as the second similarity. In the embodiment of the present application, any similarity calculation method in the prior art may be adopted according to practical situations, including but not limited to a Peak Signal to Noise Ratio (PSNR) similarity calculation method.
Then, the terminal device determines whether the second similarity is greater than a preset second threshold. The specific value of the second threshold may be set according to an actual situation, and is not specifically limited in this embodiment. Preferably, the second threshold may be set to 20.
If the second similarity is larger than a second threshold value, the terminal equipment performs pixel-by-pixel OR operation on the third division mask and the fourth division mask to obtain a final division mask, and the final division mask is marked as a fifth division mask; and if the second similarity is smaller than or equal to a second threshold value, the terminal equipment takes the third division mask as a fifth division mask.
Through the above coarse-to-fine segmentation process, the final segmentation mask shown in fig. 5 can be obtained, and the sky area and the non-sky area are determined according to this segmentation mask. In this way, the mis-segmentation of small foreground targets with a certain contrast can be resolved, and obvious foreground motion in the generated dynamic video caused by insufficiently fine sky segmentation can be prevented.
Optionally, when the requirement on the accuracy of the segmentation result is not high, the terminal device may directly use any one of the first segmentation mask, the second segmentation mask, the third segmentation mask or the fourth segmentation mask as the final segmentation mask, and determine the sky area and the non-sky area according to that mask.
In a specific implementation of the embodiment of the application, after determining the sky region and the non-sky region, the terminal device may further perform scene judgment on the static images, and reject the static images of the following two scenes without performing subsequent processing on the static images:
scene one: the sky area is too small or does not include a sky area.
The terminal device calculates the area of the sky area and the total area of the static image respectively, and if the ratio of the area of the sky area to the total area is smaller than a preset proportional threshold, the static image is rejected. The specific value of the proportional threshold may be set according to actual conditions, and is not specifically limited in the embodiments of the present application. Preferably, the proportional threshold may be set to 20%.
Scene two: the sky area is too dark or too bright.
The terminal device calculates an average gray value of the sky area, and if the average gray value is smaller than a preset first gray threshold, the sky area is considered to be too dark; if the average gray value is larger than a preset second gray threshold value, the sky area is considered to be too bright; the first grayscale threshold is less than the second grayscale threshold. The specific values of the first gray threshold and the second gray threshold may be set according to actual conditions, and the first gray threshold and the second gray threshold are not specifically limited in this embodiment. Preferably, the first gray threshold may be set to 80, and the second gray threshold may be set to 220.
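The two scene checks above may be sketched together as follows (the 20% area ratio and the gray thresholds 80 and 220 follow the preferred values; the function name is an illustrative choice):

```python
import cv2
import numpy as np

def should_reject(image_bgr, sky_mask,
                  area_ratio_threshold=0.20, dark_threshold=80, bright_threshold=220):
    """Return True if the image falls into scene one (sky area too small or absent)
    or scene two (sky area too dark or too bright)."""
    sky_pixels = int(np.count_nonzero(sky_mask))
    total_pixels = sky_mask.shape[0] * sky_mask.shape[1]
    if sky_pixels / total_pixels < area_ratio_threshold:
        return True                                         # scene one

    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    mean_gray = cv2.mean(gray, mask=sky_mask)[0]            # average gray value of the sky area
    return mean_gray < dark_threshold or mean_gray > bright_threshold   # scene two
```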
Further, in order to highlight the change of the sky area, the terminal device may also add clouds into a cloud-free sky area.
First, the terminal device determines whether the sky area contains clouds. Specifically, the sky area may be filtered using the Sobel operator, and the average gradient value of the sky area is then calculated; if the average gradient value is smaller than a preset gradient threshold, the sky area is considered cloud-free, otherwise, if the average gradient value is greater than or equal to the gradient threshold, the sky area is considered to contain clouds. The specific value of the gradient threshold may be set according to the actual situation, and is not specifically limited in the embodiment of the present application. Preferably, the gradient threshold may be set to 3.
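The cloud judgment may be sketched as follows (the gradient threshold of 3 follows the preferred value; the function name is illustrative):

```python
import cv2

def sky_has_clouds(image_bgr, sky_mask, gradient_threshold=3.0):
    """Filter with the Sobel operator and compare the average gradient magnitude
    inside the sky area with the gradient threshold."""
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    gx = cv2.Sobel(gray, cv2.CV_32F, 1, 0, ksize=3)
    gy = cv2.Sobel(gray, cv2.CV_32F, 0, 1, ksize=3)
    magnitude = cv2.magnitude(gx, gy)
    mean_gradient = float(magnitude[sky_mask > 0].mean())   # average gradient of the sky area
    return mean_gradient >= gradient_threshold
```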
If the sky area has clouds, the clouds do not need to be added; if the sky area is cloud-free, the terminal device can fuse a preset cloud template image into the sky area according to preset image fusion parameters. The cloud template image may be set according to actual conditions, and fig. 6 is a schematic diagram of an optional cloud template image.
When performing image fusion, the terminal device may first adjust the size of the cloud template image, scaling it to an appropriate size so that it matches the sky area. Then, a pixel-by-pixel weighted summation of the cloud template image and the sky area is performed according to the following formula to obtain the fused sky area:
I(ROI)_c = α * I(ROI) + β * C
where I(ROI)_c is the fused sky area, I(ROI) is the sky area before fusion, C is the cloud template image, and α and β are preset fusion weights whose specific values can be set according to actual conditions. Preferably, α may be set to 1 and β may be set to 0.4.
Fig. 7 is a comparison diagram of the sky regions before and after image fusion, where the upper diagram is the sky region before fusion, and the lower diagram is the sky region after fusion.
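A minimal sketch of this fusion step is given below; scaling the template to the full image size and masking it to the sky area is one possible way of matching it to the sky area, and α = 1, β = 0.4 follow the preferred weights (the saturated uint8 arithmetic of cv2.addWeighted clips values above 255):

```python
import cv2

def blend_clouds(image_bgr, sky_mask, cloud_template_bgr, alpha=1.0, beta=0.4):
    """Fuse the cloud template into the sky area with the pixel-by-pixel weighted
    sum I(ROI)_c = alpha * I(ROI) + beta * C; non-sky pixels are left unchanged."""
    h, w = image_bgr.shape[:2]
    cloud = cv2.resize(cloud_template_bgr, (w, h))          # scale the template to match
    blended = cv2.addWeighted(image_bgr, alpha, cloud, beta, 0.0)
    fused = image_bgr.copy()
    fused[sky_mask > 0] = blended[sky_mask > 0]             # replace only the sky area
    return fused
```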
Step S103, the terminal device performs motion prediction on the sky area to generate a dynamic video corresponding to the static image.
In the embodiment of the present application, motion generation and appearance transformation may be performed based on the method in the paper "Animating Landscape: Self-Supervised Learning of Decoupled Motion and Appearance for Single-Image Video Synthesis" mentioned above. The greatest characteristic of this method is that the latent codes of motion and appearance are regularized during training, and the trained latent codes can then be used for control in motion and appearance prediction. The embodiment of the present application is further improved on this basis:
first, the terminal device performs optical flow prediction and image warping (warping) only on the sky area, thereby obtaining an optical flow prediction result and an image warping result. And the speed of the non-sky area during motion generation is set to zero, so that the non-sky area is not subjected to motion prediction, and the non-sky area is further ensured to be still during motion generation.
Then, the terminal device inputs the static image (denoted as f), the optical flow prediction result (denoted as v) and the image warping result (denoted as w (f, v)) into a preset deep learning network for motion compensation processing, and obtains a dynamic video corresponding to the static image.
Fig. 8 is a schematic diagram of the deep learning network that performs motion compensation. The input of the network is an 8-channel map composed of f, v and w(f, v), which sequentially passes through a first convolutional layer (Conv(3,64,1)), a first residual block (Residual Block), a first maximum pooling layer (Max pooling), a second residual block, a second maximum pooling layer, a third residual block, a fourth residual block, a first upsampling layer (Up-sampling), a fifth residual block, a second upsampling layer, a sixth residual block, a second convolutional layer (Conv(3,64,1)) and a third convolutional layer (Conv(3,3,1)), where Conv(k, o, s) denotes a convolution with kernel size k, output channels o and stride s.
It should be noted that a skip connection is also introduced between the first residual block and the sixth residual block, that is, the output of the first residual block and the output of the second upsampling layer are overlapped, and the overlapping result of the two is used as the input of the sixth residual block; similarly, a skip connection is also introduced between the second residual block and the fifth residual block, i.e. the output of the second residual block is superimposed with the output of the first upsampled layer, and the result of the superimposition of the two is used as the input of the fifth residual block.
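A PyTorch sketch of the network in fig. 8 is given below. The internal layout of each residual block (two 3x3 convolutions with ReLU) and the use of element-wise addition for the skip-connection superposition are assumptions, since the text only names the blocks:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResidualBlock(nn.Module):
    """Residual block with two 3x3 convolutions (internal layout assumed)."""
    def __init__(self, channels=64):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, stride=1, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, 3, stride=1, padding=1)

    def forward(self, x):
        out = F.relu(self.conv1(x))
        out = self.conv2(out)
        return F.relu(out + x)

class MotionCompensationNet(nn.Module):
    """Encoder-decoder of fig. 8: input is the 8-channel concatenation of the still
    image f, the predicted flow v and the warped image w(f, v); output is the
    3-channel motion-compensated frame. Skip connections link residual blocks 1/2
    with the outputs of the second/first up-sampling layers."""
    def __init__(self):
        super().__init__()
        self.conv_in = nn.Conv2d(8, 64, 3, stride=1, padding=1)     # Conv(3, 64, 1)
        self.res1 = ResidualBlock()
        self.res2 = ResidualBlock()
        self.res3 = ResidualBlock()
        self.res4 = ResidualBlock()
        self.res5 = ResidualBlock()
        self.res6 = ResidualBlock()
        self.pool = nn.MaxPool2d(2)
        self.conv_mid = nn.Conv2d(64, 64, 3, stride=1, padding=1)   # Conv(3, 64, 1)
        self.conv_out = nn.Conv2d(64, 3, 3, stride=1, padding=1)    # Conv(3, 3, 1)

    def forward(self, f, v, wfv):
        x = torch.cat([f, v, wfv], dim=1)         # (N, 8, H, W)
        e1 = self.res1(self.conv_in(x))           # first conv + residual block 1
        e2 = self.res2(self.pool(e1))             # max pooling + residual block 2
        b = self.res4(self.res3(self.pool(e2)))   # max pooling + residual blocks 3 and 4
        d1 = F.interpolate(b, scale_factor=2, mode="nearest")       # first up-sampling
        d1 = self.res5(d1 + e2)                   # skip connection from residual block 2
        d2 = F.interpolate(d1, scale_factor=2, mode="nearest")      # second up-sampling
        d2 = self.res6(d2 + e1)                   # skip connection from residual block 1
        return self.conv_out(self.conv_mid(d2))   # Conv(3,64,1) then Conv(3,3,1)
```

Here f contributes 3 channels, the flow v contributes 2 and the warped image w(f, v) contributes 3, giving the 8-channel input map; the spatial size is assumed to be divisible by 4 because of the two pooling and two up-sampling stages.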
After the above processing, the deep learning network outputs a 3-channel motion compensation result, i.e. the final motion generation frame, and several motion generation frames can be combined into a dynamic video. Further, appearance change can also be applied continuously, finally generating a dynamic video containing both motion and appearance change. Fig. 9 shows an example of generating a dynamic video based on a static image, in which the 25th frame, the 50th frame, the 100th frame and the 300th frame are video frames captured from the generated dynamic video containing motion and appearance change.
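Combining the generated frames into a video file may be sketched as follows (the file name, codec and frame rate here are illustrative choices, not values given in the text):

```python
import cv2

def write_video(frames_bgr, path="dynamic_sky.mp4", fps=25):
    """Write the sequence of generated motion frames into a dynamic video file."""
    h, w = frames_bgr[0].shape[:2]
    writer = cv2.VideoWriter(path, cv2.VideoWriter_fourcc(*"mp4v"), fps, (w, h))
    for frame in frames_bgr:
        writer.write(frame)
    writer.release()
```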
To sum up, the embodiment of the present application acquires a static image to be processed, performs segmentation processing on the static image to obtain a sky area and a non-sky area in the static image, and performs motion prediction on the sky area to generate a dynamic video corresponding to the static image. With the method and apparatus of the present application, the sky area and the non-sky area in the static image can be effectively distinguished, and only the sky area is subjected to motion prediction when the dynamic video is generated, so that a video highlighting sky change can be obtained, meeting the requirements of users.
It should be understood that, the sequence numbers of the steps in the foregoing embodiments do not imply an execution sequence, and the execution sequence of each process should be determined by functions and internal logic of the process, and should not constitute any limitation to the implementation process of the embodiments of the present application.
Fig. 10 shows a structure diagram of an embodiment of a dynamic video generation apparatus provided in an embodiment of the present application, which corresponds to the dynamic video generation method in the foregoing embodiment.
In this embodiment, a dynamic video generation apparatus may include:
an image acquisition module 1001 configured to acquire a still image to be processed;
the image segmentation module 1002 is configured to perform segmentation processing on the static image to obtain a sky region and a non-sky region in the static image;
the dynamic video generating module 1003 is configured to perform motion prediction on the sky area, and generate a dynamic video corresponding to the static image.
Further, the image segmentation module may include:
the semantic segmentation unit is used for segmenting the static image by using a preset semantic segmentation network to obtain a first segmentation mask of the static image;
the Otsu method segmentation unit is used for carrying out segmentation processing on the blue channel of the static image by using an Otsu method to obtain a second segmentation mask of the static image;
and the first synthesis unit is used for synthesizing the third division mask according to the first division mask and the second division mask and determining a sky area and a non-sky area in the static image according to the third division mask.
Further, the first synthesizing unit may include:
the first similarity calculation subunit is used for calculating a first similarity between the first segmentation mask and the second segmentation mask;
the first OR-operation subunit is used for performing a pixel-by-pixel OR operation on the first division mask and the second division mask to obtain a third division mask if the first similarity is greater than a preset first threshold;
and the first determining subunit is used for taking the first division mask as the third division mask if the first similarity is less than or equal to the first threshold.
Further, the image segmentation module may further include:
the binary processing unit is used for carrying out segmentation processing on a red channel of the static image according to a preset binary segmentation threshold value to obtain a fourth segmentation mask of the static image;
and the second synthesis unit is used for synthesizing a fifth division mask according to the third division mask and the fourth division mask, and determining a sky area and a non-sky area in the static image according to the fifth division mask.
Further, the second synthesis unit may include:
the second similarity calculation subunit is used for calculating a second similarity between the third division mask and the fourth division mask;
the second OR-operation subunit is used for performing a pixel-by-pixel OR operation on the third division mask and the fourth division mask to obtain a fifth division mask if the second similarity is greater than a preset second threshold;
and the second determining subunit is used for taking the third division mask as the fifth division mask if the second similarity is smaller than or equal to the second threshold.
Further, the motion video generating apparatus may further include:
the cloud judging module is used for judging whether the sky area has clouds;
and the image fusion module is used for fusing a preset cloud template image into the sky area according to preset image fusion parameters if the sky area is cloud-free.
Further, the dynamic video generation module may include:
the motion processing unit is used for carrying out optical flow prediction and image distortion on the sky area to obtain an optical flow prediction result and an image distortion result;
and the motion compensation unit is used for inputting the static image, the optical flow prediction result and the image distortion result into a preset deep learning network for motion compensation processing to obtain a dynamic video corresponding to the static image.
It can be clearly understood by those skilled in the art that, for convenience and simplicity of description, the specific working processes of the above-described devices, modules and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and reference may be made to the related descriptions of other embodiments for parts that are not described or illustrated in a certain embodiment.
Fig. 11 shows a schematic block diagram of a terminal device provided in an embodiment of the present application, and only shows a part related to the embodiment of the present application for convenience of description.
As shown in fig. 11, the terminal device 11 of this embodiment includes: a processor 110, a memory 111, and a computer program 112 stored in the memory 111 and operable on the processor 110. The processor 110, when executing the computer program 112, implements the steps in the various embodiments of the dynamic video generation method described above, such as the steps S101 to S103 shown in fig. 1. Alternatively, the processor 110, when executing the computer program 112, implements the functions of each module/unit in each device embodiment described above, for example, the functions of the modules 1001 to 1003 shown in fig. 10.
Illustratively, the computer program 112 may be divided into one or more modules/units, which are stored in the memory 111 and executed by the processor 110 to accomplish the present application. One or more modules/units may be a series of computer program instruction segments capable of performing specific functions, which are used to describe the execution process of the computer program 112 in the terminal device 11.
The terminal device 11 may be a desktop computer, a notebook, a palm computer, a cloud server, or other computing devices. Those skilled in the art will appreciate that fig. 11 is merely an example of the terminal device 11, and does not constitute a limitation of the terminal device 11, and may include more or less components than those shown, or combine some of the components, or different components, for example, the terminal device 11 may further include an input-output device, a network access device, a bus, etc.
The Processor 110 may be a Central Processing Unit (CPU), other general purpose Processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other Programmable logic device, discrete Gate or transistor logic, discrete hardware components, etc. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
The storage 111 may be an internal storage unit of the terminal device 11, such as a hard disk or a memory of the terminal device 11. The memory 111 may also be an external storage device of the terminal device 11, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like provided on the terminal device 11. Further, the memory 111 may also include both an internal storage unit of the terminal device 11 and an external storage device. The memory 111 is used for storing computer programs and other programs and data required by the terminal device 11. The memory 111 may also be used to temporarily store data that has been output or is to be output.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-mentioned division of the functional units and modules is illustrated, and in practical applications, the above-mentioned function distribution may be performed by different functional units and modules according to needs, that is, the internal structure of the apparatus is divided into different functional units or modules, so as to perform all or part of the functions described above. Each functional unit and module in the embodiments may be integrated in one processing unit, or each unit may exist alone physically, or two or more units are integrated in one unit, and the integrated unit may be implemented in a form of hardware, or in a form of software functional unit. In addition, specific names of the functional units and modules are only used for distinguishing one functional unit from another, and are not used for limiting the protection scope of the present application. The specific working processes of the units and modules in the system may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the above embodiments, the description of each embodiment has its own emphasis, and reference may be made to the related description of other embodiments for parts that are not described or recited in any embodiment.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus/terminal device and method may be implemented in other ways. For example, the above-described embodiments of the apparatus/terminal device are merely illustrative, and for example, a module or a unit may be divided into only one logical function, and may be implemented in other ways, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
Units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated modules/units, if implemented in the form of software functional units and sold or used as separate products, may be stored in a computer-readable storage medium. Based on such understanding, all or part of the flow in the methods of the embodiments described above may be implemented by a computer program, which is stored in a computer-readable storage medium and, when executed by a processor, implements the steps of the above method embodiments. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file or some intermediate form. The computer-readable storage medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunication signal, a software distribution medium, and the like. It should be noted that the content contained in the computer-readable storage medium may be appropriately increased or decreased as required by legislation and patent practice in a jurisdiction; for example, in some jurisdictions, in accordance with legislation and patent practice, the computer-readable storage medium does not include electrical carrier signals and telecommunication signals.
The above embodiments are only used to illustrate the technical solutions of the present application, and not to limit the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present application and are intended to be included within the scope of the present application.

Claims (10)

1. A method for generating a dynamic video, comprising:
acquiring a static image to be processed;
performing segmentation processing on the static image to obtain a sky area and a non-sky area in the static image;
and performing motion prediction on the sky area to generate a dynamic video corresponding to the static image.
2. The method of claim 1, wherein the segmenting the static image into the sky region and the non-sky region comprises:
using a preset semantic segmentation network to segment the static image to obtain a first segmentation mask of the static image;
performing segmentation processing on the blue channel of the static image by using an Otsu method to obtain a second segmentation mask of the static image;
and synthesizing a third division mask according to the first division mask and the second division mask, and determining a sky area and a non-sky area in the static image according to the third division mask.
3. The method of claim 2, wherein synthesizing a third split mask from the first split mask and the second split mask comprises:
calculating a first similarity between the first division mask and the second division mask;
if the first similarity is larger than a preset first threshold value, performing pixel-by-pixel OR operation on the first division mask and the second division mask to obtain a third division mask; alternatively,
and if the first similarity is smaller than or equal to the first threshold, taking the first division mask as a third division mask.
4. The method of claim 2, wherein after the synthesizing a third division mask from the first division mask and the second division mask, the method further comprises:
segmenting the red channel of the static image according to a preset binarization segmentation threshold value to obtain a fourth segmentation mask of the static image;
synthesizing a fifth division mask according to the third division mask and the fourth division mask;
the determining a sky region and a non-sky region in the static image from the third segmentation mask comprises:
determining a sky region and a non-sky region in the static image according to the fifth segmentation mask.
5. The method of claim 4, wherein the synthesizing a fifth division mask from the third division mask and the fourth division mask comprises:
calculating a second similarity between the third division mask and the fourth division mask;
if the second similarity is larger than a preset second threshold, performing pixel-by-pixel OR operation on the third division mask and the fourth division mask to obtain a fifth division mask; alternatively,
and if the second similarity is smaller than or equal to the second threshold, taking the third division mask as a fifth division mask.
6. The method of claim 1, wherein prior to the motion predicting the sky region and generating a dynamic video corresponding to the static image, the method further comprises:
judging whether the sky area has cloud or not;
if not, fusing a preset cloud template image into the sky area according to preset image fusion parameters.
7. The method of any of claims 1-6, wherein the motion predicting the sky region, generating a dynamic video corresponding to the static image, comprises:
performing optical flow prediction and image distortion on the sky area to obtain an optical flow prediction result and an image distortion result;
and inputting the static image, the optical flow prediction result and the image distortion result into a preset deep learning network for motion compensation processing to obtain a dynamic video corresponding to the static image.
8. A dynamic video generation apparatus, comprising:
the image acquisition module is used for acquiring a static image to be processed;
the image segmentation module is used for carrying out segmentation processing on the static image to obtain a sky area and a non-sky area in the static image;
and the dynamic video generation module is used for carrying out motion prediction on the sky area and generating a dynamic video corresponding to the static image.
9. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program which, when executed by a processor, implements the steps of the dynamic video generation method according to any one of claims 1 to 7.
10. A terminal device, characterized in that the terminal device comprises a memory, a processor and a computer program stored in the memory and executable on the processor, the processor implementing the steps of the dynamic video generation method according to any one of claims 1 to 7 when executing the computer program.
CN202011627470.2A 2020-12-30 2020-12-30 Dynamic video generation method and device, readable storage medium and terminal equipment Pending CN114764839A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011627470.2A CN114764839A (en) 2020-12-30 2020-12-30 Dynamic video generation method and device, readable storage medium and terminal equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011627470.2A CN114764839A (en) 2020-12-30 2020-12-30 Dynamic video generation method and device, readable storage medium and terminal equipment

Publications (1)

Publication Number Publication Date
CN114764839A (en) 2022-07-19

Family

ID=82364268

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011627470.2A Pending CN114764839A (en) 2020-12-30 2020-12-30 Dynamic video generation method and device, readable storage medium and terminal equipment

Country Status (1)

Country Link
CN (1) CN114764839A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115660944A (en) * 2022-10-27 2023-01-31 深圳市大头兄弟科技有限公司 Dynamic method, device and equipment for static picture and storage medium
CN115660944B (en) * 2022-10-27 2023-06-30 深圳市闪剪智能科技有限公司 Method, device, equipment and storage medium for dynamic state of static picture

Similar Documents

Publication Publication Date Title
JP7110502B2 (en) Image Background Subtraction Using Depth
CN108229526B (en) Network training method, network training device, image processing method, image processing device, storage medium and electronic equipment
CN110705583B (en) Cell detection model training method, device, computer equipment and storage medium
Li et al. Multi-scale single image dehazing using laplacian and gaussian pyramids
JP2020064637A (en) System and method for detecting image forgery and alteration via convolutional neural network, and method for providing non-correction detection service using the same
EP3644599B1 (en) Video processing method and apparatus, electronic device, and storage medium
WO2021068618A1 (en) Method and device for image fusion, computing processing device, and storage medium
CN107622504B (en) Method and device for processing pictures
CN110335216B (en) Image processing method, image processing apparatus, terminal device, and readable storage medium
WO2022076116A1 (en) Segmentation for image effects
CN112602088B (en) Method, system and computer readable medium for improving quality of low light images
US10810462B2 (en) Object detection with adaptive channel features
CN109214996B (en) Image processing method and device
CN111080595A (en) Image processing method, image processing device, electronic equipment and computer readable medium
CN112308866A (en) Image processing method, image processing device, electronic equipment and storage medium
US11854209B2 (en) Artificial intelligence using convolutional neural network with hough transform
CN110807384A (en) Small target detection method and system under low visibility
WO2022194079A1 (en) Sky region segmentation method and apparatus, computer device, and storage medium
CN109615620B (en) Image compression degree identification method, device, equipment and computer readable storage medium
CN114764839A (en) Dynamic video generation method and device, readable storage medium and terminal equipment
CN112614110A (en) Method and device for evaluating image quality and terminal equipment
CN111161299B (en) Image segmentation method, storage medium and electronic device
CN108270973B (en) Photographing processing method, mobile terminal and computer readable storage medium
KR20200046182A (en) Deep-running-based image correction detection system and method for providing non-correction detection service using the same
CN112052863B (en) Image detection method and device, computer storage medium and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination