CN106686452B - Method and device for generating dynamic picture - Google Patents

Method and device for generating dynamic picture

Info

Publication number
CN106686452B
CN106686452B, CN106686452A, CN201611245811.3A, CN201611245811A
Authority
CN
China
Prior art keywords
image
video segment
target
video
similar
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201611245811.3A
Other languages
Chinese (zh)
Other versions
CN106686452A (en)
Inventor
刘熊 (Liu Xiong)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing QIYI Century Science and Technology Co Ltd
Original Assignee
Beijing QIYI Century Science and Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing QIYI Century Science and Technology Co Ltd
Priority to CN201611245811.3A
Publication of CN106686452A
Application granted
Publication of CN106686452B
Legal status: Active

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/44008 Processing of video elementary streams involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T13/00 Animation
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80 Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/83 Generation or processing of protective or descriptive data associated with content; Content structuring
    • H04N21/845 Structuring of content, e.g. decomposing content into time segments
    • H04N21/8456 Structuring of content by decomposing the content in the time domain, e.g. in time segments

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Television Signal Processing For Recording (AREA)

Abstract

An embodiment of the invention provides a method and a device for generating a dynamic picture. The method comprises: first determining the target playing times at which scene changes occur in a target video; then dividing the target video into video segments using the target playing times as division points; and finally generating dynamic pictures from the video segments obtained by the division. Compared with the prior art, the scheme provided by the embodiment of the invention can extract video segments from a video and automatically generate dynamic pictures without training a video segment model; it requires few computing resources, so the operation of extracting video segments from a video and automatically generating dynamic pictures can be completed directly on a single machine.

Description

Method and device for generating dynamic picture
Technical Field
The present invention relates to the field of computer technologies, and in particular, to a method and an apparatus for generating a dynamic picture.
Background
A dynamic picture is composed of multiple frames of static pictures displayed in a certain playing order. A computer or other device can read the static pictures contained in the dynamic picture frame by frame and display them on a screen, thereby showing a simple animation; common examples include GIF (Graphics Interchange Format) pictures, WebP (a picture format developed by Google), and the like. At present, dynamic pictures are very widely used on various network communication platforms.
Dynamic pictures can be obtained by manually combining static pictures, or can be generated automatically by extracting video segments from a video. In existing methods for automatically generating a dynamic picture by extracting a video segment from a video, a video segment suitable for generating the dynamic picture is first selected from the video, and the selected video segment is then used to generate the dynamic picture, thereby achieving automatic generation of a dynamic picture from a video. In the prior art, a video segment model suitable for generating GIF pictures can be trained in advance with a deep-learning neural network, and the trained model is then used directly to select video segments suitable for generating dynamic pictures.
GIF pictures can be generated well with this method; however, training the neural network model requires a large number of dynamic pictures and the corresponding source videos, and the training process consumes large amounts of data and computing resources.
Disclosure of Invention
The embodiments of the invention aim to provide a method and a device for generating a dynamic picture, so as to extract video segments from a video and automatically generate dynamic pictures without needing to train a video segment model. The specific technical scheme is as follows:
to achieve the above object, in a first aspect, an embodiment of the present invention provides a method for generating a dynamic picture, where the method includes:
determining a target playing time of scene change in a target video;
dividing the target video into video segments by taking the target playing time as a division point;
and respectively generating a dynamic picture according to each video segment obtained by division.
Preferably, the generating a dynamic picture according to each video segment obtained by dividing includes:
for each video segment, performing the following operations:
determining a target image from the images contained in the video segment;
respectively judging whether each pair of adjacent target images are similar according to the playing sequence of each image in the video segment;
selecting images from the determined target images according to a preset selection rule, and determining a first type of image group according to the selected images; wherein the selection rule is: any two adjacent images among the selected images are similar; a first image is not similar to a second image, or the first image is the first frame image of the video segment; a third image is not similar to a fourth image, or the third image is the last frame image of the video segment; the first image is the first frame image among the selected images; the second image is the frame image preceding the first image in the video segment; the third image is the last frame image among the selected images; and the fourth image is the frame image following the third image in the video segment;
and generating a dynamic picture based on the determined first type image group.
Preferably, the determining a target image from the images contained in the video segment includes:
an image is extracted from the images contained in the video segment as a target image.
Preferably, the determining whether each pair of adjacent target images are similar according to the playing sequence of each image in the video segment includes:
obtaining a thumbnail of the determined target image of each frame;
respectively judging whether the thumbnails corresponding to each pair of adjacent target images are similar or not according to the playing sequence of each image in the video segment;
if so, judging that the adjacent target images are similar;
if not, judging that the adjacent target images are not similar.
Preferably, the respectively determining whether the thumbnails corresponding to each pair of adjacent target images are similar includes:
respectively calculating similarity metric values between the thumbnails corresponding to each pair of adjacent target images;
judging whether the similarity metric value is larger than a preset threshold value or not;
if so, judging that the thumbnails corresponding to the adjacent target images are similar respectively;
if not, the thumbnails corresponding to the adjacent target images are judged to be dissimilar.
In a second aspect, an embodiment of the present invention provides an apparatus for generating a moving picture, where the apparatus includes:
the determining module is used for determining the target playing time of scene change in the target video;
the dividing module is used for dividing the target video into video segments by taking the target playing time as a dividing point;
and the generation module is used for generating the dynamic picture according to each video segment obtained by division.
Preferably, the generating module includes:
the determining submodule is used for determining a target image from images contained in the video segments respectively aiming at each video segment;
the judgment submodule is used for respectively judging whether each pair of adjacent target images are similar or not according to the playing sequence of each image in the video segments aiming at each video segment;
the selection submodule is used for selecting, for each video segment, images from the determined target images according to a preset selection rule, and determining a first type of image group according to the selected images; wherein the selection rule is: any two adjacent images among the selected images are similar; a first image is not similar to a second image, or the first image is the first frame image of the video segment; a third image is not similar to a fourth image, or the third image is the last frame image of the video segment; the first image is the first frame image among the selected images; the second image is the frame image preceding the first image in the video segment; the third image is the last frame image among the selected images; and the fourth image is the frame image following the third image in the video segment;
and the generation submodule is used for generating a dynamic picture based on the determined first type image group respectively aiming at each video segment.
Preferably, the determining submodule is specifically configured to:
an image is extracted from the images contained in the video segment as a target image.
Preferably, the judgment sub-module includes:
an obtaining subunit, configured to obtain a thumbnail of the target image determined for each frame;
the first judging subunit is configured to respectively judge, for each video segment, whether thumbnails corresponding to each pair of adjacent target images are similar according to a playing sequence of each image in the video segment;
a first judging subunit, configured to judge that adjacent target images are similar if a judgment result of the first judging subunit is yes;
and the second judgment subunit is used for judging that the adjacent target images are not similar under the condition that the judgment result of the first judgment subunit is negative.
Preferably, the judgment sub-module includes:
the calculating subunit is configured to calculate, for each video segment, a similarity metric value between thumbnails corresponding to each pair of adjacent target images according to a playing order of each image in the video segment;
the second judgment subunit is used for judging whether the similarity metric value is greater than a preset threshold value;
a third judging subunit, configured to, if a judgment result of the second judging subunit is yes, judge that thumbnails corresponding to adjacent target images are similar, respectively;
and a fourth judging subunit, configured to, in a case where a judgment result of the second judging subunit is negative, judge that thumbnails corresponding to adjacent target images are not similar, respectively.
As can be seen from the above, the method and apparatus for generating a dynamic picture provided in the embodiments of the present invention first determine the target playing times at which scene changes occur in a target video, then divide the target video into video segments using the target playing times as division points, and finally generate dynamic pictures from the video segments obtained by the division. Compared with the prior art, the embodiments extract video segments from a video and automatically generate dynamic pictures without needing to train a video segment model; they require few computing resources, and the operation of extracting video segments from a video and automatically generating dynamic pictures can be completed directly on a single machine.
Of course, it is not necessary for any product or method of practicing the invention to achieve all of the above-described advantages at the same time.
Drawings
To illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and those skilled in the art can derive other drawings from them without creative effort.
Fig. 1 is a schematic flow chart illustrating a method for generating a dynamic picture according to an embodiment of the present invention;
fig. 2 is a schematic flowchart of a method for generating a dynamic picture according to another embodiment of the present invention;
fig. 3 is a schematic structural diagram of a device for generating a dynamic picture according to an embodiment of the present invention;
fig. 4 is a schematic structural diagram of a device for generating a moving picture according to another embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments derived by those skilled in the art from the embodiments given herein without creative effort shall fall within the protection scope of the present invention.
Fig. 1 is a schematic flow chart of a method for generating a dynamic picture according to an embodiment of the present invention, as shown in fig. 1, the method includes:
s101: and determining the target playing time of the scene change in the target video.
Generally, the content of adjacent frames in a video is similar. However, the video may switch from one shot scene to another during shooting, or the picture content may be switched between two or more shot scenes during post-production, so that the content of two adjacent frames differs greatly. When the content difference between two adjacent frames reaches a certain degree, a scene change can be considered to have occurred in the video, i.e., the above-mentioned scene change.
It should be noted that the target playing time at which the scene change occurs is also referred to as a scene change timestamp, and the target playing time is determined with respect to the playing time of the entire target video.
For example, for a target video with a duration of 90 minutes, a target playing time of 45 min 35 s indicates the moment 45 minutes and 35 seconds into the playback of the target video; likewise, for a target video with a duration of 60 minutes, a target playing time of 23 min 06 s indicates the moment 23 minutes and 6 seconds into the playback.
In the embodiment of the present invention, the times at which scene changes occur may be detected with an existing professional tool. For example, FFmpeg (an open-source suite of computer programs for recording, converting, and streaming digital audio and video) provides a scene-change timestamp detection function; therefore, all target playing times at which scene changes occur in the target video can be determined with FFmpeg.
Of course, in practical applications, whether a scene change occurs may also be determined directly from the degree of difference between two adjacent frames, where the degree of difference may be calculated from, for example, the histogram information of the two frames or the motion vector information of each coding block; the present application does not limit this.
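The histogram-based variant just mentioned can be sketched as follows. This is a minimal illustration, not the patent's prescribed method: frames are assumed to be grayscale numpy arrays, and the bin count and threshold are illustrative choices.

```python
import numpy as np

def histogram_difference(frame_a, frame_b, bins=32):
    """Normalized L1 distance between the grayscale histograms of two frames:
    0.0 means identical distributions, 1.0 means completely disjoint ones."""
    ha, _ = np.histogram(frame_a, bins=bins, range=(0, 256))
    hb, _ = np.histogram(frame_b, bins=bins, range=(0, 256))
    ha = ha / ha.sum()
    hb = hb / hb.sum()
    return float(np.abs(ha - hb).sum()) / 2.0

def detect_scene_changes(frames, fps, threshold=0.5):
    """Return the playing times (in seconds) at which the histogram
    difference between two adjacent frames exceeds the threshold."""
    return [i / fps
            for i in range(1, len(frames))
            if histogram_difference(frames[i - 1], frames[i]) > threshold]
```

The returned times play the role of the target playing times of step S101; a production system would more likely rely on FFmpeg's detector, as the description notes.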
S102: and dividing the target video into video segments by taking the target playing time as a division point.
A scene change in a video often changes the type of its content, whereas within the video segment between two consecutive scene-change times the video background does not change much; such a segment is therefore suitable for generating a dynamic picture.
It should be noted that each divided video segment should be bounded by two adjacent target playing times, so that a divided segment contains no target playing time other than its two end points. For example, for a target video with a duration of 2 minutes whose determined target playing times are 0 min 30 s, 0 min 51 s, and 1 min 04 s, the segments obtained by division are 0 min 0 s to 0 min 30 s, 0 min 30 s to 0 min 51 s, 0 min 51 s to 1 min 04 s, and 1 min 04 s to 2 min 0 s; the span from 0 min 0 s to 0 min 51 s cannot be used as a divided video segment, because it contains the target playing time 0 min 30 s in its interior.
In order to facilitate the video segment division, the start time of the target video may be determined as a first target playing time, and the end time of the target video may be determined as a last target playing time.
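The division of step S102, with the start and end of the target video added as the first and last division points as just described, can be sketched as follows (all names are illustrative and times are in seconds; the patent itself specifies no code):

```python
def split_into_segments(duration, change_times):
    """Divide [0, duration] at the scene-change times, adding the start
    and the end of the target video as the first and last division points.
    Each segment is bounded by two *adjacent* division points, so no
    segment contains another scene-change time in its interior."""
    points = [0.0] + sorted(change_times) + [float(duration)]
    return [(points[i], points[i + 1]) for i in range(len(points) - 1)]
```

For the 2-minute example above, `split_into_segments(120, [30, 51, 64])` yields the four segments 0:00 to 0:30, 0:30 to 0:51, 0:51 to 1:04, and 1:04 to 2:00.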
S103: and respectively generating a dynamic picture according to each video segment obtained by division.
Dynamic pictures include common GIF pictures, WebP dynamic pictures, and the like. For example, GIF pictures have been a means of expression on the Internet since their birth in 1987, and in the nearly 30 years since, their popularity has only grown; their traces can be seen everywhere in social applications such as WeChat, QQ, and Line, for example in the expression packs of the expression shops within those applications, as well as in GIF pictures appearing on various websites.
It should be noted that the number of the dynamic pictures that can be generated for one video segment is not limited, and may be 1 group or multiple groups, for example, for a certain video segment, 3 groups of GIF pictures are finally generated.
Specifically, when generating a dynamic picture from each divided video segment, the dynamic picture may be generated from the original images in the segment, or the original images may first be scaled, cropped, beautified, and so on, with the dynamic picture then generated from the processed images.
In addition, after the video segments suitable for generating dynamic pictures are determined, step S103 may be implemented with the prior art. In the embodiment of the present invention, as shown in fig. 2, generating a dynamic picture from each divided video segment (S103) may include:
for each video segment, performing the following operations:
s301: a target image is determined from the images contained in the video segment.
In an embodiment of the present invention, all images contained in the video segment can be determined as target images.
As those skilled in the art will understand, the data size of an image is large; if every frame image in the video segment were considered frame by frame, generating the dynamic picture would be slow. For this reason, only part of the images in the video segment may be selected according to a preset rule, and the selected images are determined as the target images.
In an embodiment of the present invention, the determining a target image from the images contained in the video segment (S301) may include:
an image is extracted from the images contained in the video segment as a target image.
Those skilled in the art will understand that the above step can be regarded as frame extraction. For example, for 100 continuously played frames numbered 1 to 100, the odd-numbered images may be extracted as target images; or the even-numbered images may be extracted as target images; or one image may be extracted every 3 frames starting from the image numbered 1, the extracted images serving as target images.
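The frame-extraction examples above reduce to simple decimation; a minimal sketch (the function name and parameters are illustrative):

```python
def extract_target_frames(frames, step=2, start=0):
    """Frame decimation: keep every `step`-th frame, beginning at `start`."""
    return frames[start::step]
```

With frames numbered 1 to 100, `start=0, step=2` keeps the odd-numbered frames and `start=1, step=2` the even-numbered ones; "one image every 3 frames" corresponds to `step=4` if it is read as skipping 3 frames between picks, an interpretation the text leaves open.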
S302: and respectively judging whether each pair of adjacent target images are similar according to the playing sequence of each image in the video segment.
As in the above example, for 100 frames of images played continuously, assuming that the odd-numbered images are extracted as target images, the adjacent target images include target images 1 and 3, 3 and 5, 5 and 7, …, 97 and 99.
It should be noted that judging whether each pair of adjacent target images is similar amounts to determining the similarity of each pair of adjacent target images. As noted above, a video generally has a high frame rate, typically 15 or 25 frames per second or even higher, and detecting the similarity of frame images requires a certain amount of calculation; therefore, in the embodiment of the present invention, the processing of step S301 is adopted to reduce the amount of similarity-detection calculation.
It should also be noted that, in the embodiment of the present invention, only the similarity between two adjacent target images is determined; the current frame is not compared with multiple preceding frames as in the prior art, which further reduces the amount of similarity calculation.
In addition, the resolution of the target image determined in step S301 is generally high, so to further reduce the calculation amount of similarity, in the embodiment of the present invention, the determining whether each pair of adjacent target images are similar according to the playing order of the images in the video segment may include:
obtaining a thumbnail of the determined target image of each frame;
respectively judging whether the thumbnails corresponding to each pair of adjacent target images are similar or not according to the playing sequence of each image in the video segment;
if so, judging that the adjacent target images are similar;
if not, judging that the adjacent target images are not similar.
It should be noted that, when obtaining the thumbnail of each frame of the determined target image, the width of each target image may be reduced to a preset value while maintaining the aspect ratio of the original image, for example reducing the width of the original target image to 100 pixels; alternatively, each frame of the target image may be reduced directly by a preset factor, for example shrinking each frame by a factor of 5.
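Both downscaling options above amount to a small dimension calculation; a sketch with illustrative defaults (width 100 and factor 5 come from the examples in the text):

```python
def thumbnail_size(width, height, target_width=100):
    """Thumbnail dimensions after scaling to a preset width while
    keeping the original aspect ratio."""
    return target_width, max(1, round(height * target_width / width))

def thumbnail_size_by_factor(width, height, factor=5):
    """Alternative: shrink both dimensions by a fixed preset factor."""
    return max(1, width // factor), max(1, height // factor)
```

For a 1920x1080 target image, the first option gives a 100x56 thumbnail and the second a 384x216 one; either way the subsequent similarity calculation operates on far fewer pixels.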
In this embodiment of the present invention, the determining whether the thumbnails corresponding to each pair of adjacent target images are similar respectively may include:
respectively calculating similarity metric values between the thumbnails corresponding to each pair of adjacent target images;
judging whether the similarity metric value is larger than a preset threshold value or not;
if so, judging that the thumbnails corresponding to the adjacent target images are similar respectively;
if not, the thumbnails corresponding to the adjacent target images are judged to be dissimilar.
It should be noted that the similarity metric value may be calculated with existing techniques, for example determining similarity from the mean square error or the percentage of pixel error; in the embodiment of the present invention, the similarity of two thumbnails may also be determined with the SSIM (structural similarity index) algorithm.
For example, let the preset threshold be x, let the target images corresponding to two thumbnails 1' and 2' be images 1 and 2 respectively, and let the SSIM value of thumbnails 1' and 2' computed by the SSIM algorithm be y. If y is larger than x, the two thumbnails 1' and 2' are judged similar, i.e., target images 1 and 2 are similar; otherwise, the two thumbnails are judged not similar, i.e., target images 1 and 2 are not similar.
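The threshold comparison can be sketched as below. Note that this computes SSIM over the whole thumbnail in a single window, a simplification of the full sliding-window SSIM algorithm the text refers to; the threshold 0.8 is an illustrative stand-in for x.

```python
import numpy as np

def global_ssim(a, b, L=255):
    """Single-window SSIM over two whole grayscale thumbnails; a coarse
    simplification of the full SSIM algorithm (no sliding window)."""
    a = a.astype(np.float64)
    b = b.astype(np.float64)
    c1, c2 = (0.01 * L) ** 2, (0.03 * L) ** 2  # standard stabilizers
    mu_a, mu_b = a.mean(), b.mean()
    var_a, var_b = a.var(), b.var()
    cov = ((a - mu_a) * (b - mu_b)).mean()
    return ((2 * mu_a * mu_b + c1) * (2 * cov + c2)) / \
           ((mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2))

def are_similar(thumb_a, thumb_b, threshold=0.8):
    """Judge two thumbnails similar iff the metric value exceeds the
    preset threshold (the x of the example above)."""
    return global_ssim(thumb_a, thumb_b) > threshold
```

A production implementation would more likely use a full windowed SSIM (e.g. as found in image-processing libraries) rather than this global approximation.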
S303: and selecting images from the determined target images according to a preset selection rule, and determining a first type image group according to the selected images.
Wherein the selection rule is: any two adjacent images among the selected images are similar; the first image is not similar to the second image, or the first image is the first frame image of the video segment; the third image is not similar to the fourth image, or the third image is the last frame image of the video segment; the first image is the first frame image among the selected images; the second image is the frame image preceding the first image in the video segment; the third image is the last frame image among the selected images; and the fourth image is the frame image following the third image in the video segment.
It is understood that, in the embodiment of the present invention, each of the determined first type image groups should satisfy at least the following two conditions:
firstly, the method comprises the following steps: each target image in the first type image group is a continuous similar frame.
II, secondly: the first frame image in the first type image group is the first frame image of the video segment, or the first frame image in the first type image group is not similar to the previous frame image; the last frame image in the first type image group is the last frame image of the video segment, or the last frame image in the first type image group is not similar to the next frame image.
For example, suppose 100 target images are numbered 1 to 100 in the video playing order, and it is determined that target images 1 to 45 are consecutively similar, target image 45 is not similar to target image 46, target images 46 to 79 are consecutively similar, target images 79 to 82 are pairwise dissimilar, and target images 82 to 100 are consecutively similar; then 3 first-type image groups can be determined: target images 1 to 45, target images 46 to 79, and target images 82 to 100.
The complete operation of determining the group of pictures of the first type from a video segment may specifically be as follows:
firstly, inputting all target images contained in a video segment and their total number N, and numbering the target images 1, 2, 3, …, n according to the playing order of the target video; obviously, n is numerically equal to N;
secondly, initializing a starting frame index x of the continuous similar frames to be 1, and initializing a current frame index y to be 1;
thirdly, judging whether the current y value is smaller than n; if yes, executing the fourth step, if not, executing the seventh step;
fourthly, calculating the similarity metric value between the thumbnail corresponding to the target image numbered y and the thumbnail corresponding to the target image numbered y + 1;
fifthly, judging whether the calculated similarity metric value is larger than the preset threshold value; if yes, updating y to y + 1 and returning to execute the third step; if not, executing the sixth step;
sixthly, taking the current x-th frame target image as the start frame and the current y-th frame target image as the end frame, and recording the consecutive similar frames x to y; meanwhile, updating x to y + 1 and y to y + 1, and returning to execute the third step;
seventhly, ending the whole process and outputting all the recorded consecutive similar frames.
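The seven steps above can be sketched directly. The pairwise similarity judgment is abstracted as a callable so the grouping logic stands alone; `min_size` corresponds to the lower limit on the number of target images discussed in this description, and the final run is flushed explicitly, which the text leaves implicit.

```python
def group_consecutive_similar(n, similar, min_size=1):
    """Port of the seven-step procedure. `similar(y)` returns True when
    the target images numbered y and y+1 (1-based) are similar. Returns
    the recorded runs of consecutive similar frames as (start, end)
    number pairs, dropping runs shorter than `min_size`."""
    groups = []
    x = y = 1                      # step 2: start-of-run index and current index
    while y < n:                   # step 3
        if similar(y):             # steps 4 and 5: compare images y and y+1
            y += 1
        else:                      # step 6: close the run x..y and restart
            groups.append((x, y))
            x = y = y + 1
    groups.append((x, y))          # step 7: output, flushing the last run
    return [(s, e) for s, e in groups if e - s + 1 >= min_size]
```

With the earlier example (images 1 to 45 consecutively similar, 45 dissimilar to 46, 46 to 79 similar, 79 to 82 pairwise dissimilar, 82 to 100 similar) and a lower limit of 5, the function returns the three groups (1, 45), (46, 79), and (82, 100).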
It should be noted that, in the embodiment of the present invention, a lower limit may also be placed on the number of target images in a first-type image group; for example, setting the lower limit to 5 means that a finally determined first-type image group should contain no fewer than 5 target images.
S304: and generating a dynamic picture based on the determined first type image group.
For each first-type image group, the images in the group can be arranged in the playing order of the target video, and the frame rate of the dynamic picture and whether it loops can be set. For example, for images 1 to 20 in a first-type image group, the frame rate can be set to 8 frames per second and loop playback can be enabled.
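As one possible concrete realization, not prescribed by the patent, a group can be assembled into a looping GIF with the Pillow library; the 8 fps frame rate matches the example above.

```python
from PIL import Image  # assumes the Pillow library is installed

def save_gif(frames, path, fps=8, loop=True):
    """Assemble a first-type image group into an animated GIF.
    `frames` is a list of PIL.Image objects in target-video playing order."""
    duration_ms = round(1000 / fps)  # 8 fps -> 125 ms per frame
    kwargs = dict(save_all=True, append_images=frames[1:], duration=duration_ms)
    if loop:
        kwargs["loop"] = 0           # 0 means loop forever
    frames[0].save(path, **kwargs)
```

The same group could equally be written out as an animated WebP by changing the file extension, since Pillow dispatches on it.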
In addition, if the number of images included in a first-type image group is too large, an upper limit may be set. For example, suppose the upper limit is 20 and a first-type image group currently includes 30 frames of images, numbered 1 to 30 in playing order; then 20 consecutive similar images may be selected from the group, for example: 1 to 20, 6 to 25, or 11 to 30.
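The 1-20 / 6-25 / 11-30 example suggests sliding a fixed-size window over the group. A hypothetical sketch follows; the stride of 5 is inferred from the example and is not stated in the text, and the function name is ours:

```python
def candidate_windows(num_images, upper=20, stride=5):
    """Return 1-based (start, end) windows of at most `upper` consecutive
    images from a first-type image group, stepping by `stride`."""
    if num_images <= upper:
        return [(1, num_images)]  # group fits under the upper limit as-is
    windows = []
    start = 1
    while start + upper - 1 <= num_images:
        windows.append((start, start + upper - 1))
        start += stride
    return windows
```

For a 30-image group this yields exactly the windows named above: (1, 20), (6, 25), (11, 30).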
It can be understood that, in the embodiment of the present invention, although the video segment determined by two consecutive target playing time points usually shows no large change in video background and is therefore suitable for generating a dynamic picture, this holds only with a certain probability and cannot guarantee that the background of every target image in the segment remains unchanged. Steps S301 to S303 are therefore introduced to find, more accurately, those target images within the segment whose backgrounds are almost identical.
As can be seen from the above, in the method and apparatus for generating a dynamic picture provided in the embodiments of the present invention, a target playing time at which a scene change occurs in a target video is first determined; the target video is then divided into video segments with the target playing times as division points; finally, a dynamic picture is generated from each video segment obtained by the division. Compared with the prior art, this achieves the goal of extracting video segments from a video and automatically generating dynamic pictures without training a video segmentation model; the embodiments of the present invention require few computing resources, and the entire operation of capturing video segments from a video and automatically generating dynamic pictures can be completed on a single machine.
The following describes an embodiment of the present invention by way of a specific example.
For an existing target video, the computer first detects scene-change timestamps using the FFmpeg tool, i.e., determines the target playing times at which scene changes occur in the target video.
The corresponding command is:

ffmpeg -i video_path -vf "select='gt(scene,0.4)',showinfo" -f null - 2>&1 | awk -F 'pts_time:' '/pts_time:/{split($2,out," "); print out[1]}'
where video_path is the path of the video file. This yields a list of timestamps [t(0), t(1), t(2), …], each timestamp representing the moment at which a new scene starts.
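The awk filter in the command above extracts the `pts_time:` values from FFmpeg's showinfo log lines. The same parsing can be sketched in Python; the function name is ours, while the line format is that of FFmpeg's showinfo filter:

```python
import re

def parse_scene_timestamps(ffmpeg_stderr):
    """Collect pts_time values from showinfo lines, one per detected
    scene change, yielding the list [t(0), t(1), t(2), ...]."""
    times = []
    for line in ffmpeg_stderr.splitlines():
        m = re.search(r"pts_time:\s*([0-9]+(?:\.[0-9]+)?)", line)
        if m:
            times.append(float(m.group(1)))
    return times
```

Feeding it the captured stderr of the ffmpeg invocation produces the timestamp list directly, without the awk pipeline.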
In the above list of timestamps, a timestamp t(k) and the next timestamp t(k + 1) define a video segment. Taking k = 2 as an example, the playing time of the corresponding video segment runs from t(2) to t(3). FFmpeg is then used to perform frame extraction on this segment, extracting images as target images with the following command:
ffmpeg -y -ss start_time -t duration -i video_path -lavfi 'fps=8' %04d.png
where start_time is the start time of the video segment, i.e., t(2); duration is the duration of the segment, i.e., the difference between t(3) and t(2); video_path is the path of the target video file; and %04d.png is the file-name pattern of the generated frame images. The file names of the extracted target images are thus as follows:
0001.png, 0002.png, 0003.png, …
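The extraction command for segment k can be assembled programmatically. This is a sketch under our own naming; only the ffmpeg options shown above are used:

```python
def extract_frames_cmd(t_start, t_end, video_path, fps=8):
    """Build the ffmpeg argument list that dumps the segment
    [t_start, t_end) as numbered PNG frames at the given fps."""
    duration = t_end - t_start
    return ["ffmpeg", "-y",
            "-ss", str(t_start),     # seek to the segment start
            "-t", str(duration),     # decode only the segment duration
            "-i", video_path,
            "-lavfi", f"fps={fps}",  # resample to 8 frames per second
            "%04d.png"]              # zero-padded output file names
```

The resulting list can be passed to `subprocess.run`, avoiding shell-quoting issues with paths.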
Then, for each target image extracted from the video segment, a corresponding thumbnail is generated with the following command:

mogrify -path thumbnail -resize 100 -format png @images.txt
where mogrify is a picture processing tool provided by the ImageMagick project; thumbnail is the directory in which the thumbnails are stored; 100 means scaling the original to a width of 100 pixels while keeping the aspect ratio; and images.txt is an input text file, each line of which is the path of one target image extracted in the previous step.
After the thumbnails are generated, they are traversed in the playing order of their corresponding target images. The SSIM algorithm is used to calculate a similarity metric value between the current thumbnail and the next one; if the metric value is greater than a preset threshold, the two are considered similar, otherwise dissimilar. If the current thumbnail is similar to the next one, the comparison continues with the following pair until a dissimilar pair is encountered. The first-type image groups are thereby determined.
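The SSIM comparison can be illustrated with a minimal single-window variant of the index. The patent does not specify an SSIM implementation; production code would normally use a windowed SSIM such as scikit-image's `structural_similarity`, so treat this as a sketch with the standard constants C1 and C2:

```python
import numpy as np

def ssim(img1, img2, L=255.0):
    """Global (single-window) SSIM between two grayscale images.
    L is the dynamic range of pixel values."""
    x = np.asarray(img1, dtype=np.float64)
    y = np.asarray(img2, dtype=np.float64)
    C1, C2 = (0.01 * L) ** 2, (0.03 * L) ** 2   # stabilizing constants
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return (((2 * mx * my + C1) * (2 * cov + C2)) /
            ((mx ** 2 + my ** 2 + C1) * (vx + vy + C2)))
```

Identical thumbnails score 1.0; comparing this value against the preset threshold yields the similar/dissimilar decision described above.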
In this example, assuming that first-type image groups a and b are determined for video segment A, GIF pictures can be generated from all the target images in groups a and b, respectively.
The specific command for generating the GIF picture is as follows:
convert -delay 1x8 -loop 0 @images.txt result.gif
where convert is another picture processing tool provided by the ImageMagick project; 1x8 specifies the delay between frames of the GIF (1/8 of a second, i.e., 8 frames per second); a -loop value of 0 makes the GIF play in an endless loop; images.txt lists the input images; and result.gif is the generated GIF picture.
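Like the extraction step, the GIF step can be scripted. This sketch only assembles the argument list shown above; the helper name and defaults are ours:

```python
def gif_cmd(images_txt, fps=8, loop=0, out="result.gif"):
    """Build the ImageMagick convert argument list: -delay 1x<fps>
    waits 1/fps seconds per frame; -loop 0 loops forever."""
    return ["convert", "-delay", f"1x{fps}",
            "-loop", str(loop), f"@{images_txt}", out]
```

Passing the list to `subprocess.run` completes the pipeline from scene detection to GIF output on a single machine.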
Corresponding to the embodiment of the method shown in fig. 1, as shown in fig. 3, an embodiment of the present invention further provides a device for generating a moving picture, where the device includes:
a determining module 110, configured to determine a target playing time at which a scene change occurs in a target video;
a dividing module 120, configured to divide the target video into video segments by using the target playing time as a dividing point;
a generating module 130, configured to generate a dynamic picture according to each video segment obtained by dividing.
Corresponding to the method embodiment shown in fig. 2, as shown in fig. 4, in practical applications, the generating module 130 may include:
a determining submodule 1301, configured to determine, for each video segment, a target image from images included in the video segment;
a judgment submodule 1302, configured to judge, for each video segment, whether each pair of adjacent target images is similar according to the playing order of the images in the video segment;
a selection submodule 1303, configured to, for each video segment, select images from the determined target images according to a preset selection rule and determine a first-type image group from the selected images. The selection rule is: any two adjacent images among the selected images are similar; the first image is not similar to the second image, or the first image is the first frame image of the video segment; the third image is not similar to the fourth image, or the third image is the last frame image of the video segment. Here the first image is the first frame among the selected images; the second image is the frame preceding the first image in the video segment; the third image is the last frame among the selected images; and the fourth image is the frame following the third image in the video segment;
and a generation submodule 1304, configured to generate a moving picture based on the determined image group of the first type, for each video segment.
In practical applications, the determining sub-module 1301 may be specifically configured to:
an image is extracted from the images contained in the video segment as a target image.
In practical applications, the judgment submodule 1302 may include an obtaining subunit, a first judging subunit, a first determining subunit, and a second determining subunit (not shown in the figure):
an obtaining subunit, configured to obtain a thumbnail of each determined frame of target image;
a first judging subunit, configured to judge, for each video segment, whether the thumbnails corresponding to each pair of adjacent target images are similar according to the playing order of the images in the video segment;
a first determining subunit, configured to determine that adjacent target images are similar when the judgment result of the first judging subunit is yes;
and a second determining subunit, configured to determine that adjacent target images are not similar when the judgment result of the first judging subunit is negative.
In practical applications, the judgment submodule 1302 may alternatively include a calculating subunit, a second judging subunit, a third determining subunit, and a fourth determining subunit (not shown in the figure):
a calculating subunit, configured to calculate, for each video segment, a similarity metric value between the thumbnails corresponding to each pair of adjacent target images according to the playing order of the images in the video segment;
a second judging subunit, configured to judge whether the similarity metric value is greater than a preset threshold;
a third determining subunit, configured to determine that the thumbnails corresponding to adjacent target images are similar when the judgment result of the second judging subunit is yes;
and a fourth determining subunit, configured to determine that the thumbnails corresponding to adjacent target images are not similar when the judgment result of the second judging subunit is negative.
As can be seen from the above, in the method and apparatus for generating a dynamic picture provided in the embodiments of the present invention, a target playing time at which a scene change occurs in a target video is first determined; the target video is then divided into video segments with the target playing times as division points; finally, a dynamic picture is generated from each video segment obtained by the division. Compared with the prior art, this achieves the goal of extracting video segments from a video and automatically generating dynamic pictures without training a video segmentation model; the embodiments of the present invention require few computing resources, and the entire operation of capturing video segments from a video and automatically generating dynamic pictures can be completed on a single machine.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
All the embodiments in the present specification are described in a related manner, and the same and similar parts among the embodiments may be referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the system embodiment, since it is substantially similar to the method embodiment, the description is simple, and for the relevant points, reference may be made to the partial description of the method embodiment.
The above description is only for the preferred embodiment of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.

Claims (6)

1. A method for generating a moving picture, the method comprising:
determining a target playing time of scene change in a target video;
dividing the target video into video segments by taking the target playing time as a division point;
for each video segment, performing the following operations:
extracting a frame of image from the video segment at intervals of a predetermined number of frames, and taking the extracted image as a target image;
respectively judging whether each pair of adjacent target images are similar according to the playing sequence of each image in the video segment;
selecting images from the determined target images according to a preset selection rule, and determining a first type of image group according to the selected images; wherein the selection rule is: any adjacent images in the selected image are similar; the first frame image in the first type image group is the first frame image of the video segment, or the first frame image in the first type image group is not similar to the previous frame image; the last frame of image in the first type of image group is the last frame of image of the video segment, or the last frame of image in the first type of image group is not similar to the next frame of image;
and generating a dynamic picture based on the determined first type image group.
2. The method according to claim 1, wherein the judging whether each pair of adjacent target images is similar according to the playing order of the images in the video segment comprises:
obtaining a thumbnail of the determined target image of each frame;
respectively judging whether the thumbnails corresponding to each pair of adjacent target images are similar or not according to the playing sequence of each image in the video segment;
if so, judging that the adjacent target images are similar;
if not, judging that the adjacent target images are not similar.
3. The method of claim 2, wherein the determining whether the thumbnails corresponding to each pair of adjacent target images are similar respectively comprises:
respectively calculating similarity metric values between the thumbnails corresponding to each pair of adjacent target images;
judging whether the similarity metric value is larger than a preset threshold value or not;
if so, judging that the thumbnails corresponding to the adjacent target images are similar respectively;
if not, the thumbnails corresponding to the adjacent target images are judged to be dissimilar.
4. An apparatus for generating a moving picture, the apparatus comprising:
the determining module is used for determining the target playing time of scene change in the target video;
the dividing module is used for dividing the target video into video segments by taking the target playing time as a dividing point;
a generation module, the generation module comprising:
the determining submodule is used for respectively extracting one frame of image from each video segment at intervals of a preset number of frames according to each video segment, and taking the extracted image as a target image;
the judgment submodule is used for respectively judging whether each pair of adjacent target images are similar or not according to the playing sequence of each image in the video segments aiming at each video segment;
the selection submodule is used for selecting images from the determined target images according to a preset selection rule aiming at each video segment and determining a first type of image group according to the selected images; wherein the selection rule is: any adjacent images in the selected image are similar; the first frame image in the first type image group is the first frame image of the video segment, or the first frame image in the first type image group is not similar to the previous frame image; the last frame of image in the first type of image group is the last frame of image of the video segment, or the last frame of image in the first type of image group is not similar to the next frame of image;
and the generation submodule is used for generating a dynamic picture based on the determined first type image group respectively aiming at each video segment.
5. The apparatus of claim 4, wherein the judgment submodule comprises:
an obtaining subunit, configured to obtain a thumbnail of each determined frame of target image;
a first judging subunit, configured to judge, for each video segment, whether the thumbnails corresponding to each pair of adjacent target images are similar according to the playing order of the images in the video segment;
a first determining subunit, configured to determine that adjacent target images are similar when the judgment result of the first judging subunit is yes;
and a second determining subunit, configured to determine that adjacent target images are not similar when the judgment result of the first judging subunit is negative.
6. The apparatus of claim 5, wherein the judgment submodule comprises:
a calculating subunit, configured to calculate, for each video segment, a similarity metric value between the thumbnails corresponding to each pair of adjacent target images according to the playing order of the images in the video segment;
a second judging subunit, configured to judge whether the similarity metric value is greater than a preset threshold;
a third determining subunit, configured to determine that the thumbnails corresponding to adjacent target images are similar when the judgment result of the second judging subunit is yes;
and a fourth determining subunit, configured to determine that the thumbnails corresponding to adjacent target images are not similar when the judgment result of the second judging subunit is negative.
CN201611245811.3A 2016-12-29 2016-12-29 Method and device for generating dynamic picture Active CN106686452B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201611245811.3A CN106686452B (en) 2016-12-29 2016-12-29 Method and device for generating dynamic picture


Publications (2)

Publication Number Publication Date
CN106686452A CN106686452A (en) 2017-05-17
CN106686452B true CN106686452B (en) 2020-03-27

Family

ID=58872471

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201611245811.3A Active CN106686452B (en) 2016-12-29 2016-12-29 Method and device for generating dynamic picture

Country Status (1)

Country Link
CN (1) CN106686452B (en)

Families Citing this family (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107071483A (en) * 2017-05-19 2017-08-18 北京视诀科技有限公司 Image processing method, image processing apparatus and terminal
CN109672776B (en) 2017-10-16 2021-07-09 华为技术有限公司 Method and terminal for displaying dynamic image
CN108600864B (en) * 2018-04-25 2020-08-28 中影数字巨幕(北京)有限公司 Movie preview generation method and device
CN108664912B (en) * 2018-05-04 2022-12-20 北京学之途网络科技有限公司 Information processing method and device, computer storage medium and terminal
CN108632641A (en) * 2018-05-04 2018-10-09 百度在线网络技术(北京)有限公司 Method for processing video frequency and device
CN108632668A (en) * 2018-05-04 2018-10-09 百度在线网络技术(北京)有限公司 Method for processing video frequency and device
CN108648253B (en) 2018-05-08 2019-08-20 北京三快在线科技有限公司 The generation method and device of dynamic picture
CN110659616A (en) * 2019-09-26 2020-01-07 新华智云科技有限公司 Method for automatically generating gif from video
CN112492375B (en) * 2021-01-18 2021-06-04 新东方教育科技集团有限公司 Video processing method, storage medium, electronic device and video live broadcast system
CN113160273A (en) * 2021-03-25 2021-07-23 常州工学院 Intelligent monitoring video segmentation method based on multi-target tracking
CN113365104B (en) * 2021-06-04 2022-09-09 中国建设银行股份有限公司 Video concentration method and device
CN113438538B (en) * 2021-06-28 2023-02-10 康键信息技术(深圳)有限公司 Short video preview method, device, equipment and storage medium
CN114549712B (en) * 2022-04-25 2022-07-12 北京搜狐新媒体信息技术有限公司 Method and device for generating dynamic webp format picture

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101193248A (en) * 2006-11-21 2008-06-04 明基电通股份有限公司 Method for sectioning image data according to scenario change
CN101707711A (en) * 2009-11-03 2010-05-12 上海大学 Method for detecting tampering of video sequence by Copy-Move based on compressed domain
CN102163201A (en) * 2010-02-24 2011-08-24 腾讯科技(深圳)有限公司 Multimedia file segmentation method, device thereof and code converter
CN104184925A (en) * 2014-09-11 2014-12-03 刘鹏 Video scene change detection method
CN104394422A (en) * 2014-11-12 2015-03-04 华为软件技术有限公司 Video segmentation point acquisition method and device
CN105376658A (en) * 2014-08-26 2016-03-02 无锡天脉聚源传媒科技有限公司 Device and method for generating video file overview
CN105827963A (en) * 2016-03-22 2016-08-03 维沃移动通信有限公司 Scene changing detection method during shooting process and mobile terminal
CN105872675A (en) * 2015-12-22 2016-08-17 乐视网信息技术(北京)股份有限公司 Method and device for intercepting video animation
CN105939483A (en) * 2016-06-06 2016-09-14 乐视控股(北京)有限公司 Video processing method and device


Also Published As

Publication number Publication date
CN106686452A (en) 2017-05-17


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant