CN109600544B - Local dynamic image generation method and device - Google Patents

Local dynamic image generation method and device

Info

Publication number
CN109600544B
Authority
CN
China
Prior art keywords
dynamic
target
frame
video data
pixel
Prior art date
Legal status
Active
Application number
CN201710939457.2A
Other languages
Chinese (zh)
Other versions
CN109600544A (en)
Inventor
耿军
朱斌
胡康康
马春阳
李郭
刍牧
Current Assignee
Alibaba Group Holding Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date
Application filed by Alibaba Group Holding Ltd
Priority to CN201710939457.2A (CN109600544B)
Priority to TW107120687A (TW201915946A)
Priority to PCT/CN2018/106633 (WO2019062631A1)
Publication of CN109600544A
Application granted
Publication of CN109600544B

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N5/00 Details of television systems
    • H04N5/222 Studio circuitry; Studio devices; Studio equipment
    • H04N5/262 Studio circuits, e.g. for mixing, switching-over, change of character of image, other special effects; Cameras specially adapted for the electronic generation of special effects
    • H04N5/2621 Cameras specially adapted for the electronic generation of special effects during image pickup, e.g. digital cameras, camcorders, video cameras having integrated special effects capability
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 Image enhancement or restoration
    • G06T5/90 Dynamic range modification of images or parts thereof
    • G06T5/94 Dynamic range modification of images or parts thereof based on local image properties, e.g. for local contrast enhancement
    • H04N23/00 Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/60 Control of cameras or camera modules
    • H04N23/80 Camera processing pipelines; Components thereof
    • H04N25/00 Circuitry of solid-state image sensors [SSIS]; Control thereof
    • H04N25/40 Extracting pixel data from image sensors by controlling scanning circuits, e.g. by modifying the number of pixels sampled or to be sampled

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

An embodiment of the present application provides a method and an apparatus for generating a local dynamic image, relating to image processing technology. The method comprises the following steps: acquiring target video data uploaded by a user; analyzing pixel values of each frame of the target video data, and determining at least one dynamic region in the target video data; receiving a target dynamic region selected by the user from the at least one dynamic region; and generating, based on the target dynamic region determined by the user, a local dynamic image for the target dynamic region. The embodiment of the present application reduces user operations, improves the accuracy of selecting the dynamic region of the subject object, and lowers labor and time costs. Moreover, because the system identifies the dynamic regions automatically, the user is spared the difficulty of picking out an indistinct subject object by eye, the requirement for clear separation of the subject object from the video is relaxed, and the demands placed on the video material are reduced.

Description

Local dynamic image generation method and device
Technical Field
The present disclosure relates to the field of image processing technologies, and in particular, to a method and an apparatus for generating a local dynamic image.
Background
With the continuous development of image technology, the local dynamic image has emerged: an image in which only some of the subject objects move while the rest of the picture stays still.
In the prior art, generating a local dynamic image requires the user to import video data into a cinemagraph tool (cinemagraph: a technique that embeds subtle, looping motion in an otherwise still photograph) and then create the local dynamic image as follows: first, the video is divided into two layers, the first a static frame layer and the second a dynamic frame layer; second, the user manually draws a contour region on the first layer; third, the image inside the contour region is deleted from the static frame of the first layer, revealing the dynamic frames of the second layer through the contour region; finally, a local dynamic image comprising the two layers is exported.
In applying the above technology, the inventors found that generating a local dynamic image from a video requires the user to mark the dynamic region by manually drawing the contour of the subject object, which has drawbacks. Drawing the contour by hand is difficult, and the result is easily inaccurate: unwanted image elements are usually enclosed within the contour and consequently also move, and making the effect accurate demands a large number of fiddly drawing operations, wasting labor and time. Moreover, because the required subject object is picked out by eye, the result is only good when the subject object separates clearly from the video; when the subject object is not clear enough, the hand-drawn contour is easily inaccurate, and the picture becomes distorted or misaligned.
Disclosure of Invention
In view of the foregoing problems, an embodiment of the present application provides a local dynamic image generation method that automatically determines the dynamic region of a subject object according to the degree of coincidence between the subject object's appearances across the frames of the video's frame sequence, and then automatically generates a local dynamic image for that dynamic region, thereby solving the prior-art problems that manually drawing a contour is difficult to operate, the drawn contour is inaccurate, and the picture becomes distorted or misaligned.
Correspondingly, an embodiment of the present application also provides a local dynamic image generation apparatus to ensure the implementation and application of the above method.
In order to solve the above problem, an embodiment of the present application discloses a method for generating a local dynamic image, including:
acquiring target video data uploaded by a user;
analyzing pixel values of each frame of the target video data, and determining at least one dynamic region in the target video data;
receiving a target dynamic region selected by the user from the at least one dynamic region;
and generating, based on the target dynamic region determined by the user, a local dynamic image for the target dynamic region.
The embodiment of the application further discloses a method for generating a local dynamic image, which comprises the following steps:
acquiring target video data;
analyzing pixel values of each frame of the target video data, and determining at least one dynamic region in the target video data;
determining, from the at least one dynamic region, a target dynamic region to which a target subject object belongs;
and generating a local dynamic image for the target dynamic region.
The embodiment of the application also discloses an image processing method, which comprises the following steps:
acquiring target video data;
analyzing pixel values of each frame of the target video data, and determining at least one dynamic region in the target video data.
The embodiment of the present application further discloses a local dynamic image generating device, including:
the first video acquisition module is used for acquiring target video data uploaded by a user;
a dynamic region analysis module, configured to analyze pixel values of each frame of the target video data and determine at least one dynamic region in the target video data;
a first target determination module, configured to receive a target dynamic region selected by the user from the at least one dynamic region;
and a local image generation module, configured to generate, based on the target dynamic region determined by the user, a local dynamic image for the target dynamic region.
The embodiment of the present application further discloses a local dynamic image generating device, including:
the second video acquisition module is used for acquiring target video data;
a dynamic region analysis module, configured to analyze pixel values of each frame of the target video data and determine at least one dynamic region in the target video data;
a second target determining module, configured to determine, from the at least one dynamic region, a target dynamic region to which the target subject object belongs;
and a local image generation module, configured to generate a local dynamic image for the target dynamic region.
The embodiment of the present application further discloses an image processing apparatus, including:
the second video acquisition module is used for acquiring target video data;
and a dynamic region analysis module, configured to analyze pixel values of each frame of the target video data and determine at least one dynamic region in the target video data.
The embodiment of the present application further discloses an apparatus comprising a processor, a memory, and a computer program stored on the memory and executable on the processor, the computer program, when executed by the processor, implementing the following steps: acquiring target video data uploaded by a user; analyzing pixel values of each frame of the target video data, and determining at least one dynamic region in the target video data; receiving a target dynamic region selected by the user from the at least one dynamic region; and generating, based on the target dynamic region determined by the user, a local dynamic image for the target dynamic region.
The embodiment of the present application further discloses a computer-readable storage medium storing a computer program which, when executed by a processor, implements the following steps: acquiring target video data uploaded by a user; analyzing pixel values of each frame of the target video data, and determining at least one dynamic region in the target video data; receiving a target dynamic region selected by the user from the at least one dynamic region; and generating, based on the target dynamic region determined by the user, a local dynamic image for the target dynamic region.
The embodiment of the present application further discloses an apparatus comprising a processor, a memory, and a computer program stored on the memory and executable on the processor, the computer program, when executed by the processor, implementing the following steps: acquiring target video data; analyzing pixel values of each frame of the target video data, and determining at least one dynamic region in the target video data; determining, from the at least one dynamic region, a target dynamic region to which a target subject object belongs; and generating a local dynamic image for the target dynamic region.
The embodiment of the present application further discloses a computer-readable storage medium storing a computer program which, when executed by a processor, implements the following steps: acquiring target video data; analyzing pixel values of each frame of the target video data, and determining at least one dynamic region in the target video data; determining, from the at least one dynamic region, a target dynamic region to which a target subject object belongs; and generating a local dynamic image for the target dynamic region.
The embodiment of the present application further discloses an apparatus comprising a processor, a memory, and a computer program stored on the memory and executable on the processor, the computer program, when executed by the processor, implementing the following steps: acquiring target video data; analyzing pixel values of each frame of the target video data, and determining at least one dynamic region in the target video data.
The embodiment of the present application further discloses a computer-readable storage medium storing a computer program which, when executed by a processor, implements the following steps: acquiring target video data; analyzing pixel values of each frame of the target video data, and determining at least one dynamic region in the target video data.
The embodiment of the application has the following advantages:
according to the embodiment of the application, the pixel values of each frame of the target video data are analyzed, at least one dynamic area in the target video data is intelligently determined, and then a local dynamic image aiming at the target dynamic area can be automatically generated for the target dynamic area in the at least one dynamic area. The automatic identification of the dynamic interval is realized to generate a local dynamic image, so that the user operation is reduced; the dynamic area is the area where the main object moves in the video, so that the accuracy of selecting the dynamic area of the main object can be improved, and the labor and time costs are reduced; in addition, the dynamic area is automatically identified by the system, the condition that a user is difficult to identify a not-clear main object through human eye identification is avoided, the requirement on the separation of the main object of the video is low, and the requirement on video materials is reduced.
Drawings
FIG. 1 is a flowchart illustrating the steps of an embodiment of a local dynamic image generation method according to the present application;
FIG. 1A is an exemplary architecture of a local dynamic image generation system according to the present application;
FIG. 1B is an example of dynamic region outlines according to the present application;
FIG. 2 is a flowchart illustrating the steps of another embodiment of a local dynamic image generation method according to the present application;
FIG. 3 is a flowchart of the steps of an image processing method embodiment of the present application;
FIG. 4 is a block diagram illustrating an embodiment of a local dynamic image generation apparatus according to the present application;
FIG. 5 is a block diagram of another embodiment of a local dynamic image generation apparatus according to the present application;
FIG. 6 is a block diagram of an embodiment of an image processing apparatus of the present application;
FIG. 7 is a schematic hardware structure diagram of an apparatus according to another embodiment of the present application.
Detailed Description
In order to make the aforementioned objects, features and advantages of the present application more comprehensible, the present application is described in further detail with reference to the accompanying drawings and the detailed description.
The local dynamic image is a combination of dynamic photography and a static picture. Generally, a video is shot with the camera held fixed and then processed so that the subject object that should move stays dynamic while everything outside the subject object stays still. For example, suppose a fixed camera shoots a video in which three persons A, B, and C are all waving their hands, and only A's waving is wanted: a certain frame image of the video can be fixed as the background while A's waving hand is kept dynamic, and the resulting local dynamic image then shows the other persons standing still while A waves.
The method and apparatus for generating a local dynamic image provided by embodiments of the present application can automatically analyze video material, determine the dynamic regions in which the subject objects are located, and then intelligently generate local dynamic images for the target dynamic regions.
Referring to fig. 1, a flowchart illustrating steps of an embodiment of a local dynamic image generation method according to the present application is shown, which may specifically include:
step 101, acquiring target video data uploaded by a user;
In the embodiment of the present application, a user may obtain target video data in various ways, for example by shooting a video with a mobile phone, shooting a video with a camera, copying a video from another user's terminal, or downloading a video from a network; the embodiment of the present application does not limit the approach.
The user can upload the acquired target video data to the system.
With reference to FIG. 1A, an example of a local dynamic image generation architecture according to an embodiment of the present application is shown, comprising a server 20 and a client 10. The embodiment of the present application may adopt a client-server architecture: the user uploads the target video data from the client to the server, and the server performs the subsequent processing on the video data and returns the result to the client.
It should be noted that, in the embodiment of the present application, target video data of a user may also be received locally at a client, and subsequent processing is performed on the target video data locally at the client.
Step 102, analyzing pixel values of each frame of the target video data, and determining at least one dynamic region in the target video data;
In the embodiment of the present application, after the target video data uploaded by the user is received, the pixel values of each frame of the target video data can be analyzed intelligently, so that at least one dynamic region in the target video data can be determined and then offered to the user for selection. Of course, in practical applications, every recognizable dynamic region can be determined and then offered to the user for selection.
With reference to FIG. 1A, after analyzing the pixel values of each frame of the target video data and determining at least one dynamic region in the target video data, the server may mark the outline of each dynamic region and then return the image bearing the marked outlines to the client for display. At the client, the user may select one or more dynamic regions based on the marked outlines.
In this embodiment, the target video data may be video material with a fixed background; such material may be obtained by shooting with a fixed lens or in other ways, which this embodiment does not limit.
Preferably, in another embodiment of the present application, step 102 includes:
Sub-step A11, converting the target video data into sequence frames;
In the embodiment of the present application, for the target video data uploaded by the user, a video conversion function such as videoToImage() may first be called to convert the target video into sequence frames. Of course, the specific video conversion function may differ from system to system, which the embodiment of the present application does not limit. In the sequence frames, the images are ordered by playing time. A minimal sketch of this step follows.
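By way of illustration only, the following sketch performs the conversion with OpenCV; since videoToImage() is system-specific, cv2.VideoCapture stands in for it here, and video_to_sequence_frames is a hypothetical helper name.

```python
# Sketch of sub-step A11: decode target video data into sequence frames
# ordered by playing time, assuming OpenCV is available.
import cv2

def video_to_sequence_frames(path):
    frames = []
    cap = cv2.VideoCapture(path)
    while True:
        ok, frame = cap.read()  # read() returns frames in playback order
        if not ok:
            break
        frames.append(frame)
    cap.release()
    return frames
```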
Sub-step A12, determining at least one dynamic region in the target video data according to the degree of coincidence between pixel blocks at different pixel positions in the frame images.
In the embodiment of the present application, the foregoing step yields the sequence frames, and every frame has the same resolution; for example, if the resolution of the target video data is 800 × 600, the resolution of each frame image in the sequence is also 800 × 600. A subject object moving in the video is, in each frame image, actually a pixel block, and the pixel values of that block before displacement differ little from its values after displacement. The motion of the subject object in the video can therefore be understood as the pixel values at each position in the pre-motion block replacing, position for position, the pixel values in the post-motion block. The embodiment of the present application can determine a dynamic region based on the degree of coincidence between pixel blocks at different pixel positions; once the dynamic region is determined, the subject object enclosed in it is determined as well.
For example, suppose a shoe in the target video data is in motion and occupies a block of 1,000 pixels in region A1 of the first frame. After the shoe moves, it appears in the pixel block of region A2 in the next frame. The values of the 1,000 pixels in the block of region A1 and the 1,000 pixels in the block of region A2 are then substantially the same.
Then, by matching the pixel blocks of each frame image against the subsequent frame images in this way, it can be determined in which frames the shoe is present and in which pixel position areas the moving pixel blocks lie. For example, if the shoe as subject object appears in 100 frames, in areas A1, A2, …, A100 respectively, then, since each of these areas only gives the pixel positions for its own frame, combining the pixel positions of the 100 areas yields the pixel position area over which the shoe moves. The time dimension of the shoe thus runs from frame 1 to frame 100, the space dimension is the combined pixel position area of its motion, and together they determine the dynamic region in which the shoe lies.
Of course, in practical applications, some pixels of the same object may take different values at different positions, so the pixel values of the same object's pixel blocks in different frames may not match exactly: the values of some pixels change with the light from one block to another. When the coincidence ratio between the pixels of two blocks reaches a certain value, the two areas can be regarded as the same object appearing at different positions, and the dynamic region can then be determined from those pixel blocks.
It should be noted that, in the embodiment of the present application, a dynamic region can be determined from coinciding pixel blocks, and once the dynamic region is determined, the picture object inside the region it covers is the subject object.
It will be appreciated that, for each frame image, a pixel block is considered a static element if the coinciding pixel blocks of every frame occupy the same pixel position area; for example, in a video at 800 × 600 resolution, a pixel block that coincides across frames and always lies in the region {(0,0), (0,100), (100,0), (100,100)} is static.
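By way of illustration, the following is a minimal sketch of the coincidence test that sub-step A12 relies on; the tolerance of 10 values and the threshold of 0.9 are assumed figures, since the embodiment leaves the exact coincidence ratio open, and coincidence_degree and same_object are hypothetical helper names.

```python
import numpy as np

def coincidence_degree(block_a, block_b, tol=10):
    # block_a, block_b: equal-sized (H, W, 3) pixel blocks. Returns the
    # fraction of positions whose values agree within `tol` per channel;
    # the tolerance absorbs the lighting changes discussed above.
    diff = np.abs(block_a.astype(np.int32) - block_b.astype(np.int32))
    return float(np.mean(np.all(diff <= tol, axis=-1)))

def same_object(block_a, block_b, threshold=0.9):
    # Two blocks are treated as the same subject object displaced between
    # frames once the coincidence degree reaches the threshold (assumed).
    return coincidence_degree(block_a, block_b) >= threshold
```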
Preferably, in another embodiment of the present application, step 102 includes:
Sub-step A21, converting the target video data into sequence frames;
Sub-step A22, for each pixel position in the target video data, determining whether it is a cyclable pixel position according to the degree of variation of the pixel value at that position across the frames;
In the embodiment of the present application, if, for example, the resolution of the target video data is 800 × 600, the resolution of each frame is also 800 × 600, and during playback the pixel value at a given pixel position changes from frame to frame. The embodiment of the present application classifies the pixel positions: where the picture does not change, the pixel value at a position stays the same and the position is static, whereas a moving object in the video changes the pixel positions it moves across, and such a position can be classed as a cyclable pixel position.
In practical applications, pixel positions can be divided into three categories according to experimental results: static pixel positions, non-cyclable pixel positions, and cyclable pixel positions:
1. A static pixel position is one whose pixel value does not change from the first frame of the frame sequence to the last.
A static pixel position can be understood as a position in the still part of the video picture, which is why its pixel value never changes.
2. A non-cyclable pixel position is one whose pixel value only increases or only decreases from the first frame of the frame sequence to the last.
Experimental data show that this situation essentially does not occur at pixel positions traversed by a moving object.
3. A cyclable pixel position is one whose pixel value both increases and decreases between the first frame and the last.
A moving object in the video makes the value of every pixel position it passes through both rise and fall, so the positions whose value both increases and decreases can be taken as cyclable positions, from which the dynamic region can then be determined. In the embodiment of the present application, an upper limit on the increase of the pixel value and a lower limit on its decrease may also be set, for example allowing changes within about 10 pixel values; if the variation is too large, it may come from an object added during shooting, which could be identified inaccurately. It should be noted that the values of the upper and lower limits may be set according to actual requirements, which the embodiment of the present application does not limit.
Through the above process, the embodiment of the present application determines whether each pixel position is a cyclable pixel position according to the degree of change of its pixel value across the different frames.
Of course, in the embodiment of the present application, categories 1 and 2 may be merged into a single type, the aim being mainly to identify the pixel positions of category 3.
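The three categories can be read directly off each position's value profile across the frame sequence. The sketch below is one assumed realization for grayscale frames; classify_pixel_positions is a hypothetical helper, and the change_bound parameter is an assumed interpretation of the upper and lower limits mentioned above.

```python
import numpy as np

def classify_pixel_positions(frames, change_bound=10):
    # frames: list of same-resolution grayscale frames -> (T, H, W) stack.
    stack = np.stack(frames).astype(np.int16)
    deltas = np.diff(stack, axis=0)        # frame-to-frame value changes
    rises = (deltas > 0).any(axis=0)
    falls = (deltas < 0).any(axis=0)
    static = ~rises & ~falls               # category 1: value never changes
    non_cyclable = rises ^ falls           # category 2: only rises or only falls
    swing = stack.max(axis=0) - stack.min(axis=0)
    # category 3: both rises and falls; positions whose swing exceeds the
    # bound are excluded as likely objects added during shooting (assumed).
    cyclable = rises & falls & (swing <= change_bound)
    return static, non_cyclable, cyclable
```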
Sub-step A23, determining at least one dynamic region in the target video data based on the regions connected by the cyclable pixel positions.
By the definition of cyclable pixel positions, a moving object in the video changes the value of every pixel position it passes through, and the regions obtained by connecting the cyclable pixel positions therefore contain the dynamic regions; at least one dynamic region in the target video data can thus be determined from the regions connected by cyclable pixel positions.
It should be noted that, in the embodiment of the present application, a dynamic region has a time dimension attribute and a space dimension attribute; equivalently, a dynamic region comprises a pixel position area together with that area's start frame and cycle duration, where the pixel position area is the area over which the subject object moves in the whole video. When each frame is marked with its playback time, the cycle duration may be the time from the start frame to the end frame of the subject object's appearance; when the frames are numbered in playback order, it may instead be the number of frames from the start frame's sequence number to the end frame's.
Preferably, in another embodiment of the present application, sub-step A23 includes:
Sub-step A231, obtaining a temporal consistency parameter and a spatial consistency parameter of each cyclable pixel position;
In the embodiment of the present application, when computing the dynamic region of the subject object from the cyclable pixel positions, the temporal consistency parameter and the spatial consistency parameter of each cyclable pixel position must first be obtained.
It should be noted that, for each cyclable pixel position A, the temporal consistency parameter may be calculated from the frame differences between successive frames, from the first frame to the last. The frame difference is, for example, a difference of pixel values, though other pixel-value-based parameters are also possible. It should also be noted that temporal consistency in the embodiment of the present application is calculated across different frames at the same cyclable pixel position.
For each cyclable pixel position A, the spatial consistency parameter can be calculated from that position and its neighboring cyclable pixel positions. In practical applications, a spatial parameter of each frame can be calculated for position A and its neighboring cyclable pixel positions in that frame, and the spatial consistency of the position then calculated from the per-frame spatial parameters.
Sub-step A232, determining a start frame and a cycle duration for each cyclable pixel position according to its temporal consistency parameter and spatial consistency parameter;
Then, taking the temporal consistency parameter and spatial consistency parameter of each cyclable pixel position as energy values, these are substituted into a graph cut algorithm, and the start frame and cycle duration of each cyclable pixel position are determined from the result. It should be noted that the embodiment of the present application does not limit the choice of graph cut algorithm; the temporal and spatial consistency parameters may serve as part of its input.
Sub-step A233, selecting, from the regions connected by the cyclable pixel positions, the regions satisfying a connected domain condition as the pixel position areas of the dynamic regions;
In the embodiment of the present application, the cyclable pixel positions determined through the foregoing steps may contain noise. To avoid it, from the areas enclosed by the cyclable pixel positions, the areas meeting a connected domain condition are selected as pixel position areas, and each pixel position area serves as the space dimension attribute of a dynamic region.
Here a connected domain can be understood as a region enclosed by a line. In practical applications, connected domain conditions can be set according to requirements such as Gaussian smoothing, hole filling, and morphological analysis; one such condition is removing connected domains whose area is smaller than an area threshold. It should be noted that the area threshold may be set according to actual requirements, which the embodiment of the present application does not limit.
In practical applications, the video image may be binarized into a grayscale image according to the cyclable pixel positions and the remaining pixel positions: for the aforementioned video with resolution 800 × 600, for example, the gray value of every cyclable pixel position may be set to 255 and that of every other pixel position to 0, generating a grayscale image.
Cutting out the areas with gray value 255 from the grayscale image then yields the dynamic regions.
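A minimal sketch of this binarization and connected domain selection, assuming OpenCV; the smoothing kernel and area threshold are assumed values, and pixel_position_areas is a hypothetical helper name.

```python
import cv2
import numpy as np

def pixel_position_areas(cyclable, area_threshold=100):
    # cyclable: (H, W) boolean map of cyclable pixel positions.
    gray = np.where(cyclable, 255, 0).astype(np.uint8)  # binarize: 255 vs 0
    gray = cv2.GaussianBlur(gray, (5, 5), 0)            # Gaussian smoothing
    _, gray = cv2.threshold(gray, 127, 255, cv2.THRESH_BINARY)
    n, labels, stats, _ = cv2.connectedComponentsWithStats(gray)
    areas = []
    for i in range(1, n):                               # label 0 = background
        if stats[i, cv2.CC_STAT_AREA] >= area_threshold:  # connected domain condition
            areas.append(labels == i)                   # boolean mask per area
    return areas
```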
Sub-step A234, determining a start frame and a cycle duration for the dynamic region based on the start frame and cycle duration of each pixel position in its pixel position area.
The foregoing steps calculate a start frame and a cycle duration for every cyclable pixel position, but a pixel position area contains many cyclable pixel positions, whose start frames need not agree with one another, nor their cycle durations. The earliest start frame among the cyclable pixel positions may therefore be selected as the start frame of the dynamic region, and a cycle duration reaching the latest end frame among the cyclable pixel positions selected as the cycle duration of the dynamic region; this start frame and cycle duration then serve as the time dimension attribute of the dynamic region.
For example, suppose pixel position area a comprises cyclable pixel positions 1, 2, …, 10,
where cyclable pixel position 1 has start frame 2 and a cycle duration of 40 frames, cyclable pixel positions 2-9 have start frame 3 and a cycle duration of 41 frames, and cyclable pixel position 10 has start frame 4 and a cycle duration of 50 frames.
For the dynamic region corresponding to area a, start frame 2 may be selected, since it is the earliest. The end frame of cyclable pixel position 1 is 2 + 40 = 42, that of cyclable pixel positions 2-9 is 3 + 41 = 44, and that of cyclable pixel position 10 is 4 + 50 = 54; the cycle duration therefore runs to frame 54, i.e. it is 54 − 2 = 52 frames.
Of course, if the cycle duration is counted in another way, it may be calculated by the corresponding method; the embodiment of the present application is not limited in this respect.
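Continuing the worked example, a short sketch of sub-step A234 (region_time_attributes is a hypothetical helper name):

```python
def region_time_attributes(starts, durations):
    # Earliest per-position start frame becomes the region's start frame;
    # the cycle duration runs to the latest per-position end frame.
    region_start = min(starts)
    region_end = max(s + d for s, d in zip(starts, durations))
    return region_start, region_end - region_start

# The example above: position 1 (start 2, 40 frames),
# positions 2-9 (start 3, 41 frames), position 10 (start 4, 50 frames).
starts = [2] + [3] * 8 + [4]
durations = [40] + [41] * 8 + [50]
print(region_time_attributes(starts, durations))  # -> (2, 52)
```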
Step 103, receiving a target dynamic region selected by the user from the at least one dynamic region;
In practical applications, the one or more identified dynamic regions can be sent to the user for selection, and the user picks whichever dynamic region the local dynamic image is to be generated from.
Referring to FIG. 1A, after identifying the dynamic regions, the server 20 may draw a contour for each dynamic region, select a frame image from the video, mark the contours on it, and return it to the client 10 for presentation, so that the user can make a selection. In practical applications, the image bearing the dynamic region contours may of course be any frame image that includes the loop region, to which the contours are then added; the embodiment of the present application does not limit this.
It is understood that, in the embodiment of the present application, because the system automatically identifies the dynamic regions of subject objects, it may identify several dynamic regions involving several subject objects. In practical applications, the embodiment of the present application may identify every recognizable dynamic region. The user, however, may not want all the subject objects displayed dynamically, so after identifying the dynamic regions the embodiment marks a contour for each of them and returns the image with the marked contours to the user; in FIG. 1B, for example, two dynamic region contours are offered for selection.
It should be noted that adding the dynamic region contours may reuse the aforementioned binarization: the image is converted to a grayscale image in which the cyclable pixel positions are set to 255 and the other pixel positions to 0, the connected region enclosed by the 255 values is selected, and the pixel positions of that region's edge are recorded. After the above frame image is selected, drawing a red line along the recorded edge pixel positions yields the outline of the dynamic region.
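A minimal sketch of this contour marking, assuming OpenCV (red is (0, 0, 255) in OpenCV's BGR channel order; mark_region_outline is a hypothetical helper name):

```python
import cv2
import numpy as np

def mark_region_outline(frame, area_mask):
    # Binarize one pixel position area to 255/0, trace its edge, and draw a
    # red line along it on the chosen frame image.
    gray = np.where(area_mask, 255, 0).astype(np.uint8)
    contours, _ = cv2.findContours(gray, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    marked = frame.copy()
    cv2.drawContours(marked, contours, -1, (0, 0, 255), 2)
    return marked
```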
The user may then select one or more dynamic regions in the client 10 as target dynamic regions, which the client then uploads to the server.
It should be noted that, when an architecture that performs video processing locally on the client is adopted, the client may directly display after identifying at least one dynamic region, and a user may directly select a target dynamic region locally on the client.
Step 104, generating, based on the target dynamic region determined by the user, a local dynamic image for the target dynamic region.
In the embodiment of the present application, since the user determines one or more target dynamic regions, a local dynamic image for the target dynamic region may be generated based on the target dynamic region.
Of course, in practical applications, when the user selects several target dynamic regions, one local dynamic image may be generated from all the selected target dynamic regions, displaying the dynamics of every one of them; alternatively, a separate local dynamic image may be generated for each selected target dynamic region, each displaying the dynamics of one region. Other combinations are also possible, and the embodiment of the present application is not limited in this respect.
Preferably, step 104 includes:
Sub-step 1041, determining the sub-sequence frames corresponding to the target dynamic region;
In practical applications, as in the foregoing example, a start frame and a cycle duration are calculated for each loop interval, and the sub-sequence frames used to generate the local dynamic image can then be determined within the frame sequence of the video data from that start frame and cycle duration. When there are several loop intervals, whose start frames and cycle durations differ from one another, the sub-sequence frames may be selected so as to cover the start frame and cycle duration of every interval.
Preferably, sub-step 1041 comprises:
Sub-step A31, determining the sub-sequence frames of the target dynamic region according to the start frame and the cycle duration of the target dynamic region.
Take the two dynamic regions in FIG. 1B, each the motion region of one shoe, and suppose the user selects both: the start frame of dynamic region A, corresponding to one shoe, is 10 with a cycle duration of 50 frames, and the start frame of dynamic region B, corresponding to the other shoe, is 12 with a cycle duration of 52 frames.
If the two dynamic regions are to be placed in one local dynamic image, its start frame may be set to 10 and its cycle duration to 54, and sub-sequence frames 10-64 are then acquired.
Of course, if the two dynamic regions are each to be placed in a local dynamic image of its own, the sub-sequence frames can be obtained from each region's own start frame and cycle duration.
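A short sketch of sub-step A31 for several selected regions, reproducing the FIG. 1B figures above (subsequence_bounds is a hypothetical helper name):

```python
def subsequence_bounds(regions):
    # regions: list of (start_frame, cycle_duration) pairs for the selected
    # target dynamic regions; the union span covers every region's loop.
    start = min(s for s, _ in regions)
    end = max(s + d for s, d in regions)
    return start, end - start

# Region A (start 10, 50 frames) and region B (start 12, 52 frames):
print(subsequence_bounds([(10, 50), (12, 52)]))  # -> (10, 54), frames 10-64
```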
Sub-step 1042, replacing the background image of each frame after the start frame in the sub-sequence frames with the background image of the start frame, the background image being the image outside the target dynamic region in each frame image;
For example, for frames 10-64 of the target video data in FIG. 1B, frame 10 serves as the start frame, and the background outside its dynamic region outlines is the static picture. Frame 10 is taken as frame 1 of the new local dynamic image. The background outside the dynamic region outline areas of frame 11 of the target video data is then replaced with the background outside the dynamic region outline areas of frame 10, making the replaced image's background identical to frame 10's, and the replaced image is taken as frame 2 of the new local dynamic image. The remaining frames are processed by analogy, so that frame 64 of the target video data, after background replacement, becomes frame 55 of the new local dynamic image.
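A minimal sketch of this background replacement, assuming the frames and a boolean mask of the target dynamic region are available as numpy arrays (compose_local_dynamic_frames is a hypothetical helper name):

```python
import numpy as np

def compose_local_dynamic_frames(frames, start, duration, region_mask):
    # Every frame of the sub-sequence keeps the start frame's background;
    # only pixels inside the target dynamic region stay dynamic.
    background = frames[start]
    out = []
    for frame in frames[start:start + duration + 1]:  # e.g. frames 10-64
        composite = background.copy()
        composite[region_mask] = frame[region_mask]
        out.append(composite)
    return out
```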
It is understood that other cases are handled analogously; the embodiment of the present application is not limited in this respect.
Sub-step 1043, generating a local dynamic image for the target dynamic region based on the start frame and the frames whose backgrounds have been replaced.
As in the previous example, the local dynamic image of the two shoes can be generated by combining, in order, frames 1 through 55 of the new local dynamic image.
In practical applications, the local dynamic image may be generated in a continuously playing video format; for example, the 55 frames may be generated as a local dynamic image in GIF (Graphics Interchange Format). The embodiment of the present application does not limit the specific format of the local dynamic image.
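As one possible realization of the GIF export, assuming the imageio package (v2 API, whose mimsave accepts an fps keyword) and OpenCV-style BGR frames; the frame rate is an assumed value:

```python
import cv2
import imageio

def export_gif(frames, path, fps=25):
    # OpenCV frames are BGR, so convert to RGB before writing the GIF.
    rgb = [cv2.cvtColor(f, cv2.COLOR_BGR2RGB) for f in frames]
    imageio.mimsave(path, rgb, fps=fps)
```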
It will be appreciated that, when two local dynamic images are to be generated for the two shoes respectively, the corresponding sub-sequence frames may each be processed as described above; the embodiment of the present application is not limited in this respect.
Of course, after the local dynamic image is generated, the user may choose to export it, click a share button to share it to some application, or upload it to the user's own page; the embodiment of the present application is not limited in this respect either.
In the embodiment of the present application, the pixel values of each frame of the target video data are analyzed, at least one dynamic region in the target video data is intelligently determined, and a local dynamic image for a target dynamic region among the at least one dynamic region can then be generated automatically. Automatically identifying the dynamic regions to generate the local dynamic image reduces user operations; because a dynamic region is precisely the region in which a subject object moves in the video, the accuracy of selecting the subject object's dynamic region improves and labor and time costs fall. In addition, since the system identifies the dynamic regions automatically, the user is spared the difficulty of picking out an indistinct subject object by eye, the requirement for clear separation of the subject object from the video is relaxed, and the demands placed on the video material are reduced. Furthermore, the embodiment of the present application can automatically identify several dynamic regions in the target video data, offer them to the user for selection, and then automatically generate the desired local dynamic images according to the user's needs.
Referring to fig. 2, a flowchart illustrating steps of another embodiment of a local dynamic image generation method according to the present application is shown, including:
step 201, acquiring target video data;
In the embodiment of the present application, video data may be acquired in various ways. When the device that analyzes the video data is a server, the user can upload the target video data through a client.
When the device that analyzes the video data faces the user directly, the user may import the target video data into that device.
Of course, the embodiment of the present application does not limit the specific target video data obtaining manner.
Step 202, analyzing pixel values of each frame of the target video data, and determining at least one dynamic area in the target video data;
this step is similar to step 102 of the previous embodiment and will not be described in detail herein.
Preferably, step 202 includes:
Sub-step B11, converting the target video data into sequence frames;
Sub-step B12, determining at least one dynamic region in the target video data according to the degree of coincidence between pixel blocks at different pixel positions in the frame images.
Substeps B11-B12 refer to the preceding example steps A11-A12 and will not be described in detail here.
Preferably, step 202 includes:
Sub-step B21, converting the target video data into sequence frames;
Sub-step B22, for each pixel position in the target video data, determining whether it is a cyclable pixel position according to the degree of variation of the pixel value at that position across the frames;
Sub-step B23, determining at least one dynamic region in the target video data based on the regions connected by the cyclable pixel positions.
Substeps B21-B23 refer to the preceding example steps A21-A23 and will not be described in detail here.
Step 203, determining a target dynamic region to which the target subject object belongs from the at least one dynamic region;
In practical applications, the at least one dynamic region may be marked with a contour and then offered to the user for selection, and the target dynamic region determined according to the user's selection.
Alternatively, image recognition may determine whether the subject object in a dynamic region is the target subject object the user requires; if so, that dynamic region is determined to be the target dynamic region. For example, the user may enter the word "shoes" in advance; the system then obtains the features of "shoes" from a database, identifies whether those features appear in the images of the respective dynamic regions, and, if they do, selects that dynamic region as the target dynamic region.
Of course, the target dynamic region to which the target subject object belongs may be determined in various ways, which the embodiment of the present application does not limit.
Step 204, generating a local dynamic image for the target dynamic region.
This step is similar to step 104 of the previous embodiment and will not be described in detail herein.
Preferably, step 204 includes:
Sub-step 2041, determining the sub-sequence frames corresponding to the target dynamic region;
Sub-step 2042, replacing the background image of each frame after the start frame in the sub-sequence frames with the background image of the start frame, the background image being the image outside the target dynamic region in each frame image;
Sub-step 2043, generating a local dynamic image for the target dynamic region based on the start frame and the frames whose backgrounds have been replaced.
The substeps 2041-2043 refer to the substeps 1041-1043 of the previous embodiment, and are not described in detail here.
In the embodiment of the present application, the pixel values of each frame of the target video data are analyzed, at least one dynamic region in the target video data is intelligently determined, and a local dynamic image for a target dynamic region among the at least one dynamic region can then be generated automatically. Automatically identifying the dynamic regions to generate the local dynamic image reduces user operations; because a dynamic region is precisely the region in which a subject object moves in the video, the accuracy of selecting the subject object's dynamic region improves and labor and time costs fall. In addition, since the system identifies the dynamic regions automatically, the user is spared the difficulty of picking out an indistinct subject object by eye, the requirement for clear separation of the subject object from the video is relaxed, and the demands placed on the video material are reduced. Furthermore, the dynamic region of the subject object required by the user can be identified automatically, further reducing the user's labor and time costs.
Referring to fig. 3, a flowchart illustrating steps of an embodiment of an image processing method of the present application is shown, comprising:
step 301, acquiring target video data;
this step is described with reference to the aforementioned step 101 or 201, and will not be described in detail here.
Step 302, analyzing the pixel values of each frame of the target video data, and determining at least one dynamic region in the target video data.
This step is described with reference to step 102, and will not be described in detail here.
Preferably, step 302 includes:
Sub-step C11, converting the target video data into sequence frames;
Sub-step C12, determining at least one dynamic region in the target video data according to the degree of coincidence between pixel blocks at different pixel positions in the frame images.
Substeps C11-C12 refer to the preceding example steps A11-A12 and will not be described in detail here.
Preferably, step 302 includes:
Sub-step C21, converting the target video data into sequence frames;
Sub-step C22, for each pixel position in the target video data, determining whether it is a cyclable pixel position according to the degree of variation of the pixel value at that position across the frames;
Sub-step C23, determining at least one dynamic region in the target video data based on the regions connected by the cyclable pixel positions.
Substeps C21-C23 refer to the preceding example steps A21-A23 and will not be described in detail here.
Preferably, sub-step C23 includes:
Sub-step C231, obtaining a temporal consistency parameter and a spatial consistency parameter of each cyclable pixel position;
Sub-step C232, determining a start frame and a cycle duration for each cyclable pixel position according to its temporal consistency parameter and spatial consistency parameter;
Sub-step C233, selecting, from the regions connected by the cyclable pixel positions, the regions satisfying a connected domain condition as the pixel position areas of the dynamic regions;
Sub-step C234, determining a start frame and a cycle duration for the dynamic region based on the start frame and cycle duration of each pixel position in its pixel position area.
Substeps C231-C234 refer to previously described example steps A231-A234 and are not described in detail herein.
In the embodiment of the present application, the pixel values of all frames of the target video data are analyzed, at least one dynamic region in the target video data is intelligently determined, and the contour of a target dynamic region among them is drawn automatically, sparing the user the process of drawing contours by hand, improving the accuracy of the dynamic region, and reducing labor and time costs. Moreover, since the system identifies the dynamic regions automatically, the user is spared the difficulty of picking out an indistinct subject object by eye, the requirement for clear separation of the subject object from the video is relaxed, and the demands placed on the video material are reduced.
It should be noted that, for simplicity of description, the method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the embodiments are not limited by the order of acts described, as some steps may occur in other orders or concurrently depending on the embodiments. Further, those skilled in the art will also appreciate that the embodiments described in the specification are presently preferred and that no particular act is required of the embodiments of the application.
Referring to fig. 4, a block diagram of a local moving image generating device according to an embodiment of the present disclosure is shown, which may specifically include the following modules:
a first video obtaining module 401, configured to obtain target video data uploaded by a user;
a dynamic region analysis module 402, configured to analyze pixel values of each frame of the target video data, and determine at least one dynamic region in the target video data;
a first target determining module 403, configured to receive a target dynamic region determined by a user from the at least one dynamic region;
a local image generating module 404, configured to generate a local dynamic image for a target dynamic region based on the target dynamic region determined by a user.
Preferably, the dynamic region analysis module includes:
the video conversion sub-module is used for converting the target video data into sequence frames;
and the first dynamic region analysis submodule is used for determining at least one dynamic region in the target video data according to the coincidence ratio between pixel blocks which belong to different pixel positions in each frame of image.
Preferably, the dynamic region analysis module includes:
the video conversion sub-module is used for converting the target video data into sequence frames;
a cyclable pixel position determining sub-module, configured to determine, for each pixel position in the target video data, whether it is a cyclable pixel position according to the degree of variation of the pixel value at that position across the frames;
and a second dynamic region analysis sub-module, configured to determine at least one dynamic region in the target video data based on the regions connected by the cyclable pixel positions.
Preferably, the second dynamic region analysis sub-module includes:
a consistency parameter obtaining unit, configured to obtain the temporal consistency parameter and spatial consistency parameter of each cyclable pixel position;
a pixel parameter determining unit, configured to determine the start frame and cycle duration of each cyclable pixel position according to its temporal consistency parameter and spatial consistency parameter;
a pixel position area determining unit, configured to select, from the regions connected by the cyclable pixel positions, the regions meeting a connected domain condition as the pixel position areas of the dynamic regions;
and a frame parameter unit, configured to determine the start frame and cycle duration of the dynamic region based on the start frame and cycle duration of each pixel position in its pixel position area.
Preferably, the local image generation module includes:
a sub-sequence frame determining sub-module, configured to determine the sub-sequence frames corresponding to the target dynamic region;
a replacing sub-module, configured to replace the background image of each frame after the start frame in the sub-sequence frames with the background image of the start frame, the background image being the image outside the target dynamic region in each frame image;
and a first generation sub-module, configured to generate a local dynamic image for the target dynamic region based on the start frame and the frames whose backgrounds have been replaced.
Preferably, the sub-sequence frame determination sub-module includes:
a sub-sequence frame determining unit, configured to determine the sub-sequence frames of the target dynamic region according to the start frame and the cycle duration of the target dynamic region.
According to this embodiment of the application, the pixel values of each frame of the target video data are analyzed so that at least one dynamic region in the target video data is determined intelligently, and a local dynamic image can then be generated automatically for a target dynamic region among the at least one dynamic region. Because dynamic regions are identified automatically before the local dynamic image is generated, user operations are reduced. Because a dynamic region is the region in which a subject object moves in the video, the accuracy of selecting the subject object's dynamic region is improved while labor and time costs are reduced. In addition, since the system identifies dynamic regions automatically, the user is spared from trying to spot an indistinct subject object by eye; the requirements on separating the subject object from the video, and hence on the video material itself, are lowered. Finally, multiple dynamic regions in the target video data can be identified automatically and offered to the user for selection, so that the required local dynamic image is then generated automatically according to the user's needs.
Referring to fig. 5, a block diagram of another embodiment of a local dynamic image generating device according to the present application is shown, which may specifically include the following modules:
a second video obtaining module 501, configured to obtain target video data;
a dynamic region analysis module 502, configured to analyze pixel values of each frame of the target video data to determine at least one dynamic region in the target video data;
a second target determining module 503, configured to determine, from the at least one dynamic region, a target dynamic region to which the target subject object belongs;
a local image generating module 504, configured to generate a local dynamic image for the target dynamic region.
Preferably, the dynamic region analysis module includes:
the video conversion sub-module is used for converting the target video data into sequence frames;
and the first dynamic region analysis sub-module is used for determining at least one dynamic region in the target video data according to the coincidence ratio between pixel blocks belonging to different pixel positions in each frame image.
Preferably, the dynamic region analysis module includes:
the video conversion sub-module is used for converting the target video data into sequence frames;
a loopable pixel position determining sub-module, for determining, for each pixel position in the target video data, whether it is a loopable pixel position according to the degree of variation of the pixel value at that position across the frames;
and the second dynamic region analysis sub-module is used for determining at least one dynamic region in the target video data based on the regions connected by the loopable pixel positions.
Preferably, the local image generation module includes:
a sub-sequence frame determining sub-module, configured to determine the sub-sequence frame corresponding to the target dynamic region;
the replacing sub-module is used for replacing the background image of each subsequent frame after the start frame in the sub-sequence frame with the background image of the start frame, where the background image is the image outside the target dynamic region in each frame image;
and the first generation sub-module generates a local dynamic image for the target dynamic region based on the start frame and the subsequent frames whose background images have been replaced.
According to this embodiment of the application, the pixel values of each frame of the target video data are analyzed so that at least one dynamic region in the target video data is determined intelligently, and a local dynamic image can then be generated automatically for a target dynamic region among the at least one dynamic region. Because dynamic regions are identified automatically before the local dynamic image is generated, user operations are reduced. Because a dynamic region is the region in which a subject object moves in the video, the accuracy of selecting the subject object's dynamic region is improved while labor and time costs are reduced. In addition, since the system identifies dynamic regions automatically, the user is spared from trying to spot an indistinct subject object by eye; the requirements on separating the subject object from the video, and hence on the video material itself, are lowered. Finally, the dynamic region to which the subject object required by the user belongs can be identified automatically, further reducing the user's labor and time costs.
Referring to fig. 6, a block diagram of an embodiment of an image processing apparatus according to the present application is shown, which may specifically include the following modules:
a second video obtaining module 601, configured to obtain target video data;
a dynamic region analysis module 602, configured to analyze pixel values of each frame of the target video data, and determine at least one dynamic region in the target video data.
Preferably, the dynamic region analysis module includes:
the video conversion sub-module is used for converting the target video data into sequence frames;
and the first dynamic region analysis sub-module is used for determining at least one dynamic region in the target video data according to the coincidence ratio between pixel blocks belonging to different pixel positions in each frame image.
Preferably, the dynamic region analysis module includes:
the video conversion sub-module is used for converting the target video data into sequence frames;
a loopable pixel position determining sub-module, for determining, for each pixel position in the target video data, whether it is a loopable pixel position according to the degree of variation of the pixel value at that position across the frames;
and the second dynamic region analysis sub-module is used for determining at least one dynamic region in the target video data based on the regions connected by the loopable pixel positions.
Preferably, the second dynamic region analysis sub-module includes:
a consistency parameter obtaining unit, configured to obtain a temporal consistency parameter and a spatial consistency parameter of each loopable pixel position;
the pixel parameter determining unit is used for determining the start frame and loop duration of each loopable pixel position according to its temporal consistency parameter and spatial consistency parameter;
a pixel position region determining unit, for selecting, from the regions connected by the loopable pixel positions, a region meeting a connected-domain condition as the pixel position region of the dynamic region;
and the frame parameter unit is used for determining the start frame and loop duration of the dynamic region based on the start frame and loop duration of each pixel position in the pixel position region.
According to this embodiment of the application, the pixel values of each frame of the target video data are analyzed so that at least one dynamic region in the target video data is determined intelligently, and the outline of each dynamic region, including the target dynamic region, is marked automatically. This spares the user from outlining regions by hand, improves the accuracy of the dynamic regions, and reduces labor and time costs. Moreover, since the system identifies dynamic regions automatically, the user is spared from trying to spot an indistinct subject object by eye; the requirements on separating the subject object from the video, and hence on the video material itself, are lowered.
The present application further provides a non-transitory readable storage medium in which one or more modules (programs) are stored; when the one or more modules are applied to a device, they cause the device to execute the instructions of the method steps in this application.
Fig. 7 is a schematic hardware structure diagram of an apparatus according to another embodiment of the present application. As shown in fig. 7, the apparatus of the present embodiment includes a processor 81 and a memory 82.
The processor 81 executes the computer program code stored in the memory 82 to implement the local dynamic image generation method of figs. 1 to 4 in the above-described embodiments.
The memory 82 is configured to store various types of data to support operation at the device. Examples of such data include instructions for any application or method operating on the device, such as messages, pictures, and videos. The memory 82 may include a random access memory (RAM) and may also include a non-volatile memory, such as at least one disk memory.
Optionally, the processor 81 is provided in the processing component 80. The apparatus may further include: a communication component 83, a power component 84, a multimedia component 85, an audio component 86, an input/output interface 87, and/or a sensor component 88. The specific components included in the device are set according to actual requirements, which is not limited in this embodiment.
The processing component 80 generally controls the overall operation of the device. The processing component 80 may include one or more processors 81 to execute instructions to perform all or part of the steps of the methods of figs. 1 to 4 described above. Further, the processing component 80 may include one or more modules that facilitate interaction between the processing component 80 and other components. For example, the processing component 80 may include a multimedia module to facilitate interaction between the multimedia component 85 and the processing component 80.
The power component 84 provides power to the various components of the device. It may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the device.
The multimedia component 85 includes a display screen that provides an output interface between the device and the user. In some embodiments, the display screen may include a liquid crystal display (LCD) and a touch panel (TP). If the display screen includes a touch panel, the display screen may be implemented as a touch screen to receive input signals from the user. The touch panel includes one or more touch sensors to sense touches, slides, and gestures on the touch panel. The touch sensors may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation.
The audio component 86 is configured to output and/or input audio signals. For example, the audio component 86 includes a microphone (MIC) configured to receive external audio signals when the device is in an operational mode, such as a speech recognition mode. The received audio signals may further be stored in the memory 82 or transmitted via the communication component 83. In some embodiments, the audio component 86 also includes a speaker for outputting audio signals.
The input/output interface 87 provides an interface between the processing component 80 and peripheral interface modules, which may be click wheels, buttons, etc. These buttons may include, but are not limited to: a volume button, a start button, and a lock button.
The sensor component 88 includes one or more sensors for providing status assessments of various aspects of the device. For example, the sensor component 88 may detect the open/closed status of the device, the relative positioning of components, and the presence or absence of user contact with the device. The sensor component 88 may include a proximity sensor configured to detect the presence of nearby objects, including the distance between the user and the device, without any physical contact. In some embodiments, the sensor component 88 may also include a camera or the like.
The communication component 83 is configured to facilitate wired or wireless communication between the device and other devices. The device may access a wireless network based on a communication standard, such as WiFi, 2G or 3G, or a combination thereof. In one embodiment, the device may include a SIM card slot therein for insertion of a SIM card so that the device can log onto a GPRS network to establish communication with a server via the internet.
From the above, the communication component 83, the audio component 86, the input/output interface 87 and the sensor component 88 referred to in the embodiment of fig. 7 can be implemented as input devices.
In an apparatus of this embodiment, the processor is configured to: obtain target video data uploaded by a user, analyze the pixel values of each frame of the target video data, determine at least one dynamic region in the target video data, receive a target dynamic region selected by the user from the at least one dynamic region, and generate a local dynamic image for the selected target dynamic region; or to obtain target video data, analyze the pixel values of each frame, determine at least one dynamic region, determine from the at least one dynamic region the target dynamic region to which a target subject object belongs, and generate a local dynamic image for the target dynamic region; or to obtain target video data, analyze the pixel values of each frame, and determine at least one dynamic region in the target video data.
Since the device embodiments are substantially similar to the method embodiments, their description is brief; for relevant details, refer to the corresponding parts of the method embodiments.
The embodiments in the present specification are described in a progressive manner: each embodiment focuses on its differences from the others, and for identical or similar parts the embodiments may be referred to one another.
It should be apparent to those skilled in the art that embodiments of the present application may be provided as methods and apparatus, or computer program products. Accordingly, embodiments of the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, embodiments of the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
Embodiments of the present application are described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present application have been described, additional variations and modifications of these embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including the preferred embodiment and all such alterations and modifications as fall within the true scope of the embodiments of the application.
Finally, it should also be noted that, herein, relational terms such as first and second may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The local dynamic image generation method and device and the image processing method and device provided by the present application have been described in detail above. Specific examples are used in this document to explain the principles and implementation of the present application, and the description of the above embodiments is intended only to help in understanding the method and its core idea. Meanwhile, a person skilled in the art may, following the idea of the present application, vary the specific embodiments and the scope of application. In summary, the content of this specification should not be construed as limiting the present application.

Claims (34)

1. A method for generating a local dynamic image, comprising:
acquiring target video data uploaded by a user;
analyzing the pixel value of each frame of the target video data, determining a plurality of dynamic regions in the target video data, and respectively marking the outlines of the dynamic regions; the plurality of dynamic regions correspond to a plurality of subject objects;
receiving a target dynamic region determined by a user from the plurality of dynamic regions;
generating a local dynamic image for the target dynamic region based on the target dynamic region determined by the user;
wherein, in the local dynamic image, the subject object corresponding to the target dynamic region is in a motion state, and the portion outside the target dynamic region remains static.
2. The method of claim 1, wherein the step of analyzing pixel values of frames of the target video data to determine a plurality of dynamic regions in the target video data comprises:
converting the target video data into sequence frames;
and determining a plurality of dynamic regions in the target video data according to the coincidence ratio between pixel blocks belonging to different pixel positions in each frame image.
3. The method of claim 1, wherein the step of analyzing pixel values of frames of the target video data to determine a plurality of dynamic regions in the target video data comprises:
converting the target video data into sequence frames;
for each pixel position in the target video data, determining whether it is a loopable pixel position according to the degree of variation of the pixel value at that position in each frame; a loopable pixel is a pixel whose pixel value, at its pixel position, increases or decreases from the first frame through to the last frame of the sequence frames;
and determining a plurality of dynamic regions in the target video data based on the regions connected by the loopable pixel positions.
4. The method of claim 3, wherein the step of determining a plurality of dynamic regions in the target video data based on the regions connected by the loopable pixel positions comprises:
acquiring a temporal consistency parameter and a spatial consistency parameter of each loopable pixel position;
determining a start frame and a loop duration of each loopable pixel position according to its temporal consistency parameter and spatial consistency parameter;
selecting, from the regions connected by the loopable pixel positions, a region meeting a connected-domain condition as the pixel position region of the dynamic region;
and determining the start frame and loop duration of the dynamic region based on the start frame and loop duration of each pixel position in the pixel position region.
5. The method according to claim 1 or 4, wherein the step of generating a local dynamic image for the target dynamic region based on the target dynamic region determined by the user comprises:
determining the sub-sequence frame corresponding to the target dynamic region;
replacing the background image of each subsequent frame after the start frame in the sub-sequence frame with the background image of the start frame, the background image being the image outside the target dynamic region in each frame image;
and generating a local dynamic image for the target dynamic region based on the start frame and the subsequent frames whose background images have been replaced.
6. The method of claim 5, wherein the step of determining the sub-sequence frame corresponding to the target dynamic region comprises:
determining the sub-sequence frame of the target dynamic region according to the start frame and loop duration of the target dynamic region.
7. A method for generating a local dynamic image, comprising:
acquiring target video data;
analyzing the pixel value of each frame of the target video data, determining a plurality of dynamic regions in the target video data, and respectively marking the outlines of the dynamic regions; the plurality of dynamic regions correspond to a plurality of subject objects;
determining, from the plurality of dynamic regions, a target dynamic region to which a target subject object belongs;
generating a local dynamic image for the target dynamic region;
wherein, in the local dynamic image, the subject object corresponding to the target dynamic region is in a motion state, and the portion outside the target dynamic region remains static.
8. The method of claim 7, wherein the step of analyzing pixel values of frames of the target video data to determine a plurality of dynamic regions in the target video data comprises:
converting the target video data into sequence frames;
and determining a plurality of dynamic regions in the target video data according to the coincidence ratio between pixel blocks belonging to different pixel positions in each frame image.
9. The method of claim 8, wherein the step of analyzing pixel values of frames of the target video data to determine a plurality of dynamic regions in the target video data comprises:
converting the target video data into sequence frames;
for each pixel position in the target video data, determining whether it is a loopable pixel position according to the degree of variation of the pixel value at that position in each frame; a loopable pixel is a pixel whose pixel value, at its pixel position, increases or decreases from the first frame through to the last frame of the sequence frames;
and determining a plurality of dynamic regions in the target video data based on the regions connected by the loopable pixel positions.
10. The method according to claim 7, wherein the step of generating the local dynamic image for the target dynamic region comprises:
determining the sub-sequence frame corresponding to the target dynamic region;
replacing the background image of each subsequent frame after the start frame in the sub-sequence frame with the background image of the start frame, the background image being the image outside the target dynamic region in each frame image;
and generating a local dynamic image for the target dynamic region based on the start frame and the subsequent frames whose background images have been replaced.
11. An image processing method, comprising:
acquiring target video data;
analyzing the pixel values of each frame of the target video data, determining a plurality of dynamic regions in the target video data, and respectively marking the outlines of the dynamic regions, so that a user can determine a target dynamic region from the dynamic regions and a local dynamic image for the target dynamic region can be generated based on the target dynamic region; the plurality of dynamic regions correspond to a plurality of subject objects;
wherein, in the local dynamic image, the subject object corresponding to the target dynamic region is in a motion state, and the portion outside the target dynamic region remains static.
12. The method of claim 11, wherein the step of analyzing pixel values of frames of the target video data to determine a plurality of dynamic regions in the target video data comprises:
converting the target video data into sequence frames;
and determining a plurality of dynamic regions in the target video data according to the coincidence ratio between pixel blocks belonging to different pixel positions in each frame image.
13. The method of claim 12, wherein the step of analyzing pixel values of frames of the target video data to determine a plurality of dynamic regions in the target video data comprises:
converting the target video data into sequence frames;
for each pixel position in the target video data, determining whether it is a loopable pixel position according to the degree of variation of the pixel value at that position in each frame; a loopable pixel is a pixel whose pixel value, at its pixel position, increases or decreases from the first frame through to the last frame of the sequence frames;
and determining a plurality of dynamic regions in the target video data based on the regions connected by the loopable pixel positions.
14. The method of claim 13, wherein the step of determining a plurality of dynamic regions in the target video data based on the regions connected by the loopable pixel positions comprises:
acquiring a temporal consistency parameter and a spatial consistency parameter of each loopable pixel position;
determining a start frame and a loop duration of each loopable pixel position according to its temporal consistency parameter and spatial consistency parameter;
selecting, from the regions connected by the loopable pixel positions, a region meeting a connected-domain condition as the pixel position region of the dynamic region;
and determining the start frame and loop duration of the dynamic region based on the start frame and loop duration of each pixel position in the pixel position region.
15. A local moving image generating apparatus, comprising:
the first video acquisition module is used for acquiring target video data uploaded by a user;
the dynamic region analysis module is used for analyzing the pixel value of each frame of the target video data, determining a plurality of dynamic regions in the target video data, and respectively marking the outlines of the dynamic regions; the plurality of dynamic regions correspond to a plurality of subject objects;
the first target determination module is used for receiving a target dynamic region determined by a user from the plurality of dynamic regions;
the local image generation module is used for generating a local dynamic image for the target dynamic region based on the target dynamic region determined by the user;
wherein, in the local dynamic image, the subject object corresponding to the target dynamic region is in a motion state, and the portion outside the target dynamic region remains static.
16. The apparatus of claim 15, wherein the dynamic region analysis module comprises:
the video conversion sub-module is used for converting the target video data into sequence frames;
and the first dynamic region analysis sub-module is used for determining a plurality of dynamic regions in the target video data according to the coincidence ratio between pixel blocks belonging to different pixel positions in each frame image.
17. The apparatus of claim 15, wherein the dynamic region analysis module comprises:
the video conversion sub-module is used for converting the target video data into sequence frames;
a loopable pixel position determining sub-module, for determining, for each pixel position in the target video data, whether it is a loopable pixel position according to the degree of variation of the pixel value at that position in each frame; a loopable pixel is a pixel whose pixel value, at its pixel position, increases or decreases from the first frame through to the last frame of the sequence frames;
and the second dynamic region analysis sub-module is used for determining a plurality of dynamic regions in the target video data based on the regions connected by the loopable pixel positions.
18. The apparatus of claim 17, wherein the second dynamic region analysis submodule comprises:
a consistency parameter obtaining unit, configured to obtain a temporal consistency parameter and a spatial consistency parameter of each loopable pixel position;
the pixel parameter determining unit is used for determining the start frame and loop duration of each loopable pixel position according to its temporal consistency parameter and spatial consistency parameter;
a pixel position region determining unit, for selecting, from the regions connected by the loopable pixel positions, a region meeting a connected-domain condition as the pixel position region of the dynamic region;
and the frame parameter unit is used for determining the start frame and loop duration of the dynamic region based on the start frame and loop duration of each pixel position in the pixel position region.
19. The apparatus of claim 15 or 18, wherein the local image generation module comprises:
a sub-sequence frame determining sub-module, configured to determine the sub-sequence frame corresponding to the target dynamic region;
the replacing sub-module is used for replacing the background image of each subsequent frame after the start frame in the sub-sequence frame with the background image of the start frame, where the background image is the image outside the target dynamic region in each frame image;
and the first generation sub-module generates a local dynamic image for the target dynamic region based on the start frame and the subsequent frames whose background images have been replaced.
20. The apparatus of claim 19, wherein the sub-sequence frame determination sub-module comprises:
and the sub-sequence frame determining unit determines the sub-sequence frame of the target dynamic region according to the start frame and loop duration of the target dynamic region.
21. A local moving image generating apparatus, comprising:
the second video acquisition module is used for acquiring target video data;
the dynamic region analysis module is used for analyzing the pixel value of each frame of the target video data, determining a plurality of dynamic regions in the target video data, and respectively marking the outlines of the dynamic regions; the plurality of dynamic regions correspond to a plurality of subject objects;
the second target determining module is used for determining, from the plurality of dynamic regions, the target dynamic region to which the target subject object belongs;
the local image generation module is used for generating a local dynamic image for the target dynamic region;
wherein, in the local dynamic image, the subject object corresponding to the target dynamic region is in a motion state, and the portion outside the target dynamic region remains static.
22. The apparatus of claim 21, wherein the dynamic region analysis module comprises:
the second video conversion sub-module is used for converting the target video data into sequence frames;
and the first dynamic region analysis sub-module is used for determining a plurality of dynamic regions in the target video data according to the coincidence ratio between pixel blocks belonging to different pixel positions in each frame image.
23. The apparatus of claim 21, wherein the dynamic region analysis module comprises:
the video conversion sub-module is used for converting the target video data into sequence frames;
a loopable pixel position determining sub-module, for determining, for each pixel position in the target video data, whether it is a loopable pixel position according to the degree of variation of the pixel value at that position in each frame; a loopable pixel is a pixel whose pixel value, at its pixel position, increases or decreases from the first frame through to the last frame of the sequence frames;
and the second dynamic region analysis sub-module is used for determining a plurality of dynamic regions in the target video data based on the regions connected by the loopable pixel positions.
24. The apparatus of claim 21, wherein the local image generation module comprises:
a sub-sequence frame determining sub-module, configured to determine the sub-sequence frame corresponding to the target dynamic region;
the replacing sub-module is used for replacing the background image of each subsequent frame after the start frame in the sub-sequence frame with the background image of the start frame, where the background image is the image outside the target dynamic region in each frame image;
and the first generation sub-module generates a local dynamic image for the target dynamic region based on the start frame and the subsequent frames whose background images have been replaced.
25. An image processing apparatus characterized by comprising:
the video acquisition module is used for acquiring target video data;
the dynamic region analysis module is used for analyzing the pixel values of each frame of the target video data, determining a plurality of dynamic regions in the target video data, and respectively marking the outlines of the dynamic regions, so that a user can determine a target dynamic region from the dynamic regions and a local dynamic image for the target dynamic region can be generated based on the target dynamic region; the plurality of dynamic regions correspond to a plurality of subject objects;
wherein, in the local dynamic image, the subject object corresponding to the target dynamic region is in a motion state, and the portion outside the target dynamic region remains static.
26. The apparatus of claim 25, wherein the dynamic region analysis module comprises:
the video conversion sub-module is used for converting the target video data into sequence frames;
and the first dynamic region analysis sub-module is used for determining a plurality of dynamic regions in the target video data according to the coincidence ratio between pixel blocks belonging to different pixel positions in each frame image.
27. The apparatus of claim 25, wherein the dynamic region analysis module comprises:
the video conversion sub-module is used for converting the target video data into sequence frames;
a loopable pixel position determining sub-module, for determining, for each pixel position in the target video data, whether it is a loopable pixel position according to the degree of variation of the pixel value at that position in each frame; a loopable pixel is a pixel whose pixel value, at its pixel position, increases or decreases from the first frame through to the last frame of the sequence frames;
and the second dynamic region analysis sub-module is used for determining a plurality of dynamic regions in the target video data based on the regions connected by the loopable pixel positions.
28. The apparatus of claim 27, wherein the second dynamic region analysis submodule comprises:
a consistency parameter obtaining unit, configured to obtain a temporal consistency parameter and a spatial consistency parameter of each loopable pixel position;
the pixel parameter determining unit is used for determining the start frame and loop duration of each loopable pixel position according to its temporal consistency parameter and spatial consistency parameter;
a pixel position region determining unit, for selecting, from the regions connected by the loopable pixel positions, a region meeting a connected-domain condition as the pixel position region of the dynamic region;
and the frame parameter unit is used for determining the start frame and loop duration of the dynamic region based on the start frame and loop duration of each pixel position in the pixel position region.
29. An apparatus comprising a processor, a memory, and a computer program stored on the memory and executable on the processor, the computer program when executed by the processor implementing the steps of:
acquiring target video data uploaded by a user;
analyzing the pixel value of each frame of the target video data, determining a plurality of dynamic regions in the target video data, and respectively marking the outlines of the dynamic regions; the plurality of dynamic regions correspond to a plurality of subject objects;
receiving a target dynamic region determined by a user from the plurality of dynamic regions;
generating a local dynamic image for the target dynamic region based on the target dynamic region determined by the user;
wherein, in the local dynamic image, the subject object corresponding to the target dynamic region is in a motion state, and the portion outside the target dynamic region remains static.
30. A computer-readable storage medium, on which a computer program is stored, which computer program, when executed by a processor, performs the steps of:
acquiring target video data uploaded by a user;
analyzing the pixel value of each frame of the target video data, determining a plurality of dynamic regions in the target video data, and respectively marking the outlines of the dynamic regions; the plurality of dynamic regions correspond to a plurality of subject objects;
receiving a target dynamic region determined by a user from the plurality of dynamic regions;
generating a local dynamic image for the target dynamic region based on the target dynamic region determined by the user;
wherein, in the local dynamic image, the subject object corresponding to the target dynamic region is in a motion state, and the portion outside the target dynamic region remains static.
31. An apparatus comprising a processor, a memory, and a computer program stored on the memory and executable on the processor, the computer program when executed by the processor implementing the steps of:
acquiring target video data;
analyzing the pixel value of each frame of the target video data, determining a plurality of dynamic regions in the target video data, and respectively marking the outlines of the dynamic regions; the plurality of dynamic regions correspond to a plurality of subject objects;
determining, from the plurality of dynamic regions, a target dynamic region to which a target subject object belongs;
generating a local dynamic image for the target dynamic region;
wherein, in the local dynamic image, the subject object corresponding to the target dynamic region is in a motion state, and the portion outside the target dynamic region remains static.
32. A computer-readable storage medium, on which a computer program is stored, which computer program, when executed by a processor, performs the steps of:
acquiring target video data;
analyzing the pixel value of each frame of the target video data, determining a plurality of dynamic regions in the target video data, and respectively marking the outlines of the dynamic regions; the plurality of dynamic regions correspond to a plurality of subject objects;
determining, from the plurality of dynamic regions, a target dynamic region to which a target subject object belongs;
generating a local dynamic image for the target dynamic region;
wherein, in the local dynamic image, the subject object corresponding to the target dynamic region is in a motion state, and the portion outside the target dynamic region remains static.
33. An apparatus comprising a processor, a memory, and a computer program stored on the memory and executable on the processor, the computer program when executed by the processor implementing the steps of:
acquiring target video data;
analyzing the pixel values of each frame of the target video data, determining a plurality of dynamic regions in the target video data, and respectively marking the outlines of the dynamic regions, so that a user can determine a target dynamic region from the dynamic regions and a local dynamic image for the target dynamic region can be generated based on the target dynamic region; the plurality of dynamic regions correspond to a plurality of subject objects;
wherein, in the local dynamic image, the subject object corresponding to the target dynamic region is in a motion state, and the portion outside the target dynamic region remains static.
34. A computer-readable storage medium, on which a computer program is stored, which computer program, when executed by a processor, performs the steps of:
acquiring target video data;
analyzing the pixel values of each frame of the target video data, determining a plurality of dynamic regions in the target video data, and respectively marking the outlines of the dynamic regions, so that a user can determine a target dynamic region from the dynamic regions and a local dynamic image for the target dynamic region can be generated based on the target dynamic region; the plurality of dynamic regions correspond to a plurality of subject objects;
wherein, in the local dynamic image, the subject object corresponding to the target dynamic region is in a motion state, and the portion outside the target dynamic region remains static.
CN201710939457.2A 2017-09-30 2017-09-30 Local dynamic image generation method and device Active CN109600544B (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
CN201710939457.2A CN109600544B (en) 2017-09-30 2017-09-30 Local dynamic image generation method and device
TW107120687A TW201915946A (en) 2017-09-30 2018-06-15 Local dynamic image generation method and device
PCT/CN2018/106633 WO2019062631A1 (en) 2017-09-30 2018-09-20 Local dynamic image generation method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710939457.2A CN109600544B (en) 2017-09-30 2017-09-30 Local dynamic image generation method and device

Publications (2)

Publication Number Publication Date
CN109600544A CN109600544A (en) 2019-04-09
CN109600544B true CN109600544B (en) 2021-11-23

Family

ID=65900674

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710939457.2A Active CN109600544B (en) 2017-09-30 2017-09-30 Local dynamic image generation method and device

Country Status (3)

Country Link
CN (1) CN109600544B (en)
TW (1) TW201915946A (en)
WO (1) WO2019062631A1 (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110324663A (en) * 2019-07-01 2019-10-11 北京奇艺世纪科技有限公司 A kind of generation method of dynamic image, device, electronic equipment and storage medium
CN111179159B (en) * 2019-12-31 2024-02-20 北京金山云网络技术有限公司 Method and device for eliminating target image in video, electronic equipment and storage medium
CN111598947B (en) * 2020-04-03 2024-02-20 上海嘉奥信息科技发展有限公司 Method and system for automatically identifying patient position by identification features
CN111753679B (en) * 2020-06-10 2023-11-24 哈尔滨工业大学(深圳)(哈尔滨工业大学深圳科技创新研究院) Micro-motion monitoring method, device, equipment and computer readable storage medium
CN112866669B (en) * 2021-01-15 2023-09-15 聚好看科技股份有限公司 Method and device for determining data switching time
CN112995533A (en) * 2021-02-04 2021-06-18 上海哔哩哔哩科技有限公司 Video production method and device
CN114363697B (en) * 2022-01-06 2024-04-26 上海哔哩哔哩科技有限公司 Video file generation and playing method and device

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101510304A (en) * 2009-03-30 2009-08-19 北京中星微电子有限公司 Method, device and pick-up head for dividing and obtaining foreground image
CN104023172A (en) * 2014-06-27 2014-09-03 深圳市中兴移动通信有限公司 Shooting method and shooting device of dynamic image
CN105025360A (en) * 2015-07-17 2015-11-04 江西洪都航空工业集团有限责任公司 Improved fast video summarization method and system
CN105654471A (en) * 2015-12-24 2016-06-08 武汉鸿瑞达信息技术有限公司 Augmented reality AR system applied to internet video live broadcast and method thereof

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR100568237B1 (en) * 2004-06-10 2006-04-07 삼성전자주식회사 Apparatus and method for extracting moving objects from video image
BRPI0620497B1 (en) * 2005-11-15 2018-09-25 Yissum Research Development Company Of The Hebrew Univ Of Jerusalem method for creating a video synopsis, and system for transforming a source sequence of video frames from a first dynamic scene into a synopsis sequence of at least two video frames illustrating a second dynamic scene.
EP1971967A1 (en) * 2005-12-30 2008-09-24 Telecom Italia S.p.A. Average calculation in color space, particularly for segmentation of video sequences
WO2011140786A1 (en) * 2010-10-29 2011-11-17 华为技术有限公司 Extraction and association method and system for objects of interest in video
CN103092929B (en) * 2012-12-30 2016-12-28 信帧电子技术(北京)有限公司 A kind of generation method and device of video frequency abstract
CN105516610A (en) * 2016-02-19 2016-04-20 深圳新博科技有限公司 Method and device for shooting local dynamic image
CN106453864A (en) * 2016-09-26 2017-02-22 广东欧珀移动通信有限公司 Image processing method and device and terminal


Also Published As

Publication number Publication date
CN109600544A (en) 2019-04-09
WO2019062631A1 (en) 2019-04-04
TW201915946A (en) 2019-04-16

Similar Documents

Publication Publication Date Title
CN109600544B (en) Local dynamic image generation method and device
US11520824B2 (en) Method for displaying information, electronic device and system
CN109308469B (en) Method and apparatus for generating information
US20190132520A1 (en) Generating image previews based on capture information
US10534972B2 (en) Image processing method, device and medium
US11636610B2 (en) Determining multiple camera positions from multiple videos
US20140218552A1 (en) Electronic device and image composition method thereof
CN104394422A (en) Video segmentation point acquisition method and device
JP2012243313A (en) Image processing method and image processing device
CN112115894B (en) Training method and device of hand key point detection model and electronic equipment
CN113099297B (en) Method and device for generating click video, electronic equipment and storage medium
CN112672208B (en) Video playing method, device, electronic equipment, server and system
CN111553362A (en) Video processing method, electronic equipment and computer readable storage medium
CN109271929B (en) Detection method and device
WO2020119254A1 (en) Method and device for filter recommendation, electronic equipment, and storage medium
CN106027893A (en) Method and device for controlling Live Photo generation and electronic equipment
CN113590881A (en) Video clip retrieval method, and training method and device of video clip retrieval model
US20220222831A1 (en) Method for processing images and electronic device therefor
CN113469200A (en) Data processing method and system, storage medium and computing device
CN110851059A (en) Picture editing method and device and electronic equipment
US20140286624A1 (en) Method and apparatus for personalized media editing
US10924637B2 (en) Playback method, playback device and computer-readable storage medium
CN107563257B (en) Video understanding method and device
US20150181288A1 (en) Video sales and marketing system
CN111352680A (en) Information recommendation method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant