CN111757175A

CN111757175A - Video processing method and device

Info

Publication number: CN111757175A
Application number: CN202010514455.0A
Authority: CN
Inventors: 缪刚
Original assignee: Vivo Mobile Communication Co Ltd
Current assignee: Vivo Mobile Communication Co Ltd
Priority date: 2020-06-08
Filing date: 2020-06-08
Publication date: 2020-10-09

Abstract

The application discloses a video processing method and device, and belongs to the technical field of video processing. The method comprises the following steps: receiving a first input by a user, the first input comprising selection of at least one video segment; obtaining a target attribute of the at least one video clip, the target attribute comprising: a target action of a target object in the at least one video segment, and/or a target scene in the at least one video segment; acquiring a target special effect matched with the target attribute and a target position of the target special effect in the at least one video clip; and adding the target special effect at the target position in the at least one video clip to generate a target video. When the special effect is added to the video, the user operation can be simplified, the matching degree of the added special effect and the video content is improved, and the video playing effect is improved.

Description

Video processing method and device

Technical Field

The application belongs to the field of video processing, and particularly relates to a video processing method and device.

Background

At present, the types of application programs of electronic equipment are more and more, and video programs are more popular with users.

Current video programs can provide video editing functions by which a user can add special effects to video. However, no matter a single video clip is added with a special effect, or a plurality of video clips are added with a special effect, in various video programs in the prior art, a user can only manually select a certain special effect from various special effects provided by the video programs, so as to achieve the purpose of adding the special effect to the video clips. However, the special effect selected by the user subjectively is not suitable for the content of the video clip in most cases, so the problem of low matching degree between the special effect and the video content is easily existed in the way of manually adding the special effect, and the video playing effect is further influenced.

Therefore, when a video editing scheme in the prior art adds a special effect to a video, the problems of complicated operation, low matching degree of the special effect and video content and further influence on the video playing effect generally exist.

Disclosure of Invention

The embodiment of the application aims to provide a video processing method and a video processing device, and the method and the device can solve the problems that when a video editing scheme in the prior art adds a special effect to a video, the operation is complicated, the matching degree of the special effect and video content is low, and the video playing effect is influenced.

In order to solve the technical problem, the present application is implemented as follows:

in a first aspect, an embodiment of the present application provides a video processing method, where the method includes:

receiving a first input of a user, the first input comprising: selecting at least one video segment;

obtaining a target attribute of the at least one video clip, the target attribute comprising: a target action of a target object in the at least one video segment, and/or a target scene in the at least one video segment;

acquiring a target special effect matched with the target attribute and a target position of the target special effect in the at least one video clip;

and adding the target special effect at the target position in the at least one video clip to generate a target video.

In a second aspect, an embodiment of the present application provides a video processing apparatus, including:

a receiving module, configured to receive a first input of a user, where the first input includes: selecting at least one video segment;

a first obtaining module, configured to obtain a target attribute of the at least one video segment, where the target attribute includes: a target action of a target object in the at least one video segment, and/or a target scene in the at least one video segment;

a second obtaining module, configured to obtain a target special effect matched with the target attribute and a target position where the target special effect is located in the at least one video segment;

and the processing module is used for adding the target special effect at the target position in the at least one video clip to generate a target video.

In a third aspect, an embodiment of the present application provides an electronic device, which includes a processor, a memory, and a program or instructions stored on the memory and executable on the processor, and when executed by the processor, the program or instructions implement the steps of the method according to the first aspect.

In a fourth aspect, embodiments of the present application provide a readable storage medium, on which a program or instructions are stored, which when executed by a processor implement the steps of the method according to the first aspect.

In a fifth aspect, an embodiment of the present application provides a chip, where the chip includes a processor and a communication interface, where the communication interface is coupled to the processor, and the processor is configured to execute a program or instructions to implement the method according to the first aspect.

In the embodiment of the application, by acquiring a target action of a target object in at least one video clip and/or a target scene in at least one video clip of a video special effect to be added and acquiring a target special effect matched with the target action and/or the target scene, the target special effect added to at least one video clip can be matched with the target action of the target object in the video clip and/or the scene in which the video clip is located, so that the video special effect can be matched with video content to which the special effect is added; in addition, the target position of the target special effect in the at least one video clip can be obtained, the target special effect is added to the target position, and the target video is generated, so that the adding position of the target special effect can be adapted to the video content, the matching degree between the video special effect and the video content with the added special effect is improved, and the editing effect of the video special effect is improved.

Drawings

FIG. 1 is a flow diagram of a video processing method according to one embodiment of the present application;

FIG. 2 is a schematic diagram of a video clip according to one embodiment of the present application;

FIG. 3 is a block diagram of a video processing device of one embodiment of the present application;

fig. 4 is a schematic hardware structure diagram of an electronic device according to an embodiment of the present application.

Fig. 5 is a schematic hardware configuration diagram of an electronic device according to another embodiment of the present application.

Detailed Description

The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some, but not all, embodiments of the present application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.

The terms first, second and the like in the description and in the claims of the present application are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It will be appreciated that the data so used may be interchanged under appropriate circumstances such that embodiments of the application may be practiced in sequences other than those illustrated or described herein, and that the terms "first," "second," and the like are generally used herein in a generic sense and do not limit the number of terms, e.g., the first term can be one or more than one. In addition, "and/or" in the specification and claims means at least one of connected objects, a character "/" generally means that a preceding and succeeding related objects are in an "or" relationship.

The video processing method provided by the embodiment of the present application is described in detail below with reference to the accompanying drawings through specific embodiments and application scenarios thereof.

Referring to fig. 1, a flowchart of a video processing method according to an embodiment of the present application is shown, where the method may specifically include the following steps:

step 101, receiving a first input of a user, where the first input includes: selecting at least one video segment;

wherein the first input is a selection input indicating that a video effect is to be added to the at least one video segment.

Optionally, when the first input includes at least two video segments, then the first input may further include a preset ordering of the two or more video segment selections.

The types of video effects may include video effects that process a single video segment, and transition effects that process a junction of different video segments (e.g., "left-slide," "right-slide," "flash-white," "flash-black," "foldover," etc.).

In one example, when adding transition special effects to multiple video segments, a user may open a certain video, enter an editing interface, add multiple segments of videos, and then enter a transition editing page, and if the user selects "add transition intelligently," this first input may be triggered.

Step 102, obtaining a target attribute of the at least one video clip, where the target attribute includes: a target action of a target object in the at least one video segment, and/or a target scene in the at least one video segment;

in this step, a target action of a target object in the at least one video segment and/or a target scene in the at least one video segment may be obtained.

In one embodiment, when a target object exists in the at least one video segment, the obtained target attribute may include a target action of the target object and a target scene of the at least one video segment, so that a target special effect matching with both the target scene in which the at least one video segment is located and the target action of the target object in the at least one video segment is added based on the target scene in which the at least one video segment is located and the target scene in which the target object is located;

in another embodiment, when a target object exists in the at least one video segment, acquiring a target action of the target object in the at least one video segment, so as to add a target special effect matched with the target action based on the target action of the target object in the at least one video segment;

in another embodiment, when the target object is not present in the at least one video segment, a target scene of the at least one video segment may be acquired, so that a target special effect matching with the target scene is added based on the target scene in which the at least one video segment is located.

In addition, since one or more objects may be involved in the at least one video segment, and there may be a difference in the types of the actions of different objects, or there may be a difference in the directions of the actions although the actions are the same, the target object here may be a subject object or a non-subject object in the plurality of objects.

When the target object is a subject object, an appropriate target special effect may be matched based on the motion of the subject object, so that the target special effect can be matched with the motion of the subject object in the at least one video clip.

In addition, the object type of the target object may be a preset type (for example, any object that can move, such as a human being or an animal).

In addition, the target attribute in this step may be a target attribute related to the at least one video segment, or may be a target attribute at a junction between two segments in the at least one video segment (mainly because the transition special effect is added at the junction between two segments, and therefore, in order to enable the transition special effect to match with the content of the segment to which the feature is added, the target attribute here may be a target attribute possessed by a junction between two segments (i.e., a target sub-segment described in the following embodiments) to which the transition special effect needs to be added).

For convenience of explanation, the following embodiments are described by taking as examples that the target object is a person and is a subject object, and the target attribute is a target attribute of a target sub-segment described below in the at least one video segment, and the target special effect is a transition special effect.

103, acquiring a target special effect matched with the target attribute and a target position of the target special effect in the at least one video clip;

the type of the target effect includes, but is not limited to, a transition effect. For convenience of explanation, the following embodiments will be described taking the target effect as an example of the transition effect.

When the target attribute comprises a target action and a target scene of the target object, the target special effect is a special effect matched with both the target action and the target scene.

In addition, the number of the target special effects may be one or more.

When the target attribute comprises a target action of a target object and a target special effect matched with the target attribute is obtained, the target special effect matched with the target action of the target object can be obtained according to a first corresponding relation between preset object actions and the special effect;

when the target attribute includes a target scene, when the target special effect matched with the target attribute is obtained, the target special effect matched with the target scene can be obtained according to a second corresponding relation between a preset scene and the special effect.

In addition, the first corresponding relationship and the second corresponding relationship may be stored in a server, and the two types of corresponding relationships may be updated at regular time in the embodiment of the present application, so as to ensure the richness of the transition special effect.

In addition, when the target special effect matched with the target attribute is obtained, the preset corresponding relation can be inquired, and the target special effect matched with the target attribute can be calculated in real time by using a matching model deployed in a server, wherein the calculated target special effect can be adapted to the content of the target attribute.

In addition, in this step, not only the target special effect but also the target position where the target special effect is located in the at least one video segment may be determined.

Since a video segment can be understood as a time-sequential multi-frame image sequence, the target position may include specific certain frames of images in the multi-frame image sequence (i.e. adding the target special effect at the certain frames of images); optionally, the target position may further include a coordinate position where the target special effect is specifically added in the several frames of images.

For example, the matching rule of the matching model and the setting rule of the first and second correspondences may be set as follows: if the character motion is with obvious directionality (such as to the left), adding a 'left-sliding' transition special effect at the joint of the segments, namely the special effect direction of the target special effect is consistent with the direction of the target motion; if the character moves with obvious punishment (such as punching a fist), the punching motion respectively punches coordinate points in a plurality of frames of images to add a 'water wave' transition special effect on each coordinate point at the joint of the segments; if the character motion is at rest, a "lap transition" may be added to the splice of segments where the character is located. If the target scene is 'landscape', for example, adding 'fade-in and fade-out' transition special effect; if the target scene is 'evening' and the 'shaking' transition special effect with a clear style is added; if the target scene is 'restaurant', the ancient style 'TV snowflake' transition special effect can be added.

And 104, adding the target special effect to the target position in the at least one video clip to generate a target video.

Step 103 determines not only the target special effect to be added, but also the adding position (i.e. target position) of the target special effect in the at least one video segment, so that the target special effect can be added at the target position of the at least one video segment to complete the encoding of the at least one video segment, and generate the target video.

Optionally, when the added special effect is a transition special effect, the above steps 101 to 104 may be performed once on a joint (i.e., a target sub-segment) of every two adjacent target segments to complete editing of the transition special effect of the two target segments.

In addition, in the video editing process, the user only needs to trigger the first input without performing excessive operation, so that the operation of the user on video editing is simplified, and the operation cost of the user is saved.

Optionally, in an embodiment, a video special effect (for example, a transition special effect) may be added between two video segments, and therefore, it is required to enable the added video special effect to match with the content of a junction between the two video segments (that is, the following target sub-segments), so in this embodiment, when step 102 is executed, a target attribute of the junction between two target segments in the at least one video segment may be obtained, which is specifically implemented by S201 to S203:

s201, identifying two target segments which are arranged according to a preset sequence in the at least one video segment;

alternatively, when S201 is executed, it may be implemented by the following manner 1 or manner 2:

mode 1: when the at least one video clip comprises one video clip, the video clip is divided into at least two target clips which are arranged according to the preset sequence according to the scene corresponding to each frame of image in the image sequence corresponding to the video clip, wherein any one group of two target clips which are arranged adjacently in the at least two target clips are respectively matched with different scenes, and any one group of two target clips which are arranged adjacently in the at least two target clips are identified.

Specifically, when the user adds only one video segment in the first input and wants to add, for example, a transition special effect to the video segment, and the transition special effect is added between two video segments, the content of a single video segment may be very rich, and there are many picture jumps and thus scene switches. Therefore, the scene corresponding to each frame of image in the image sequence corresponding to the video clip can be identified through an algorithm such as scene identification, and then the video clip is divided into at least two target clips based on the difference of the scenes of the frame images.

For example, segment a, segment B, and segment C are sequentially arranged from front to back (i.e. arranged in a preset order) in time sequence, where each segment may include at least one frame. And the corresponding scenes of two adjacent segments are different, for example, segment a is outdoors, segment B is indoors, and segment C is outdoors.

Alternatively, if the overall scene of the video clip is relatively single, the user may be prompted to "the video does not need to add transitions".

Since the method 1 may divide a video segment into at least two target segments arranged according to a preset order, in the method of the embodiment of the present application, when a special effect is added, a segment junction may be determined for every two adjacent target segments of the at least two target segments, so as to obtain a target attribute for the segment junction.

Mode 2: when the at least one video segment comprises at least two video segments, the first input further comprises the preset sequence selected for the at least two video segments, and the two video segments arranged adjacently in any group of the at least two video segments are identified as two target segments arranged according to the preset sequence in the at least one video segment.

When the first input includes the selected at least two video clips, that is, it indicates that the user needs to add a video special effect to the at least two clips, and when the user imports the at least two video clips through the first input, the edited arrangement sequence (that is, the preset sequence) is also set for the at least two video clips, so the first input may further include the preset sequence selected for the at least two video clips.

Since there are at least two video segments to which a special effect needs to be added, any set of two video segments arranged adjacently among the at least two video segments can be identified as two target segments to which a special effect such as a transition is needed to be added.

In this embodiment of the application, when a user needs to add a video special effect to a single video clip, if the video content of the single video clip relates to multiple scenes, the video clip may be segmented according to the difference of the scenes, so as to generate at least two target clips arranged according to a preset order, and the scenes corresponding to the two adjacent target clips are different, so that any one group of two adjacent target clips arranged adjacently may be identified from the at least two target clips to serve as a group of video clips to which the target special effect is added, so that the single video clip including the multiple scenes may also add the target special effect having an excessive effect on the scenes. In addition, in this embodiment of the application, when the at least one video segment includes at least two video segments, then the first input further includes the preset order selected for the at least two video segments, then any group of two video segments arranged adjacently in the at least two video segments may be identified as two target segments arranged in the preset order in the at least one video segment, so as to add a target special effect matching with video content to the two target segments.

S202, for the two target segments, the two target segments include a first segment and a second segment, identify a first sub-segment in the first segment within a first preset duration of an end portion of the first segment, and identify a second sub-segment in the second segment within a second preset duration of a beginning portion of the second segment, where the two target segments include: the first segment and the second segment are adjacently arranged according to the preset sequence, and the first segment is arranged before the second segment;

as shown in fig. 2, two target segments, segment 1 and segment 2, are shown arranged in a preset order (i.e., in order from morning to evening on the display time t axis); then, a first sub-segment within the last second of segment 1 (here, the first preset duration is 1s) is identified for segment 1, and a second sub-segment within the first 1s of segment 2 (here, the second preset duration is 1s) is identified for segment 2, and then the first sub-segment and the second sub-segment constitute a target sub-segment (i.e., segment 3), where segment 3 is a segment junction of segment 1 and segment 2. Segment 3 is understood to be the transition from segment 1 to segment 2.

Alternatively, when the first sub-segment and the second sub-segment are identified, the segment 1 and the segment 2 may be respectively converted into a picture sequence having a time sequence, and since the time length of each frame of image is fixed, the first sub-segment and the second sub-segment may be identified by using the first preset time length and the second preset time length.

In addition, the first preset duration and the second preset duration are both integer multiples of the duration of one frame image, that is, the segment 3 includes an integer number of frame images.

Optionally, the sum of the durations of the first and second preset durations is a standard duration of the special effect (e.g., a duration of a standard transition special effect). For example, the duration of a typical transition effect is 2s, so segment 3 can be found to add the transition effect.

S203, acquiring the target attribute of the target sub-segment, wherein the target sub-segment comprises the first sub-segment and the second sub-segment which are arranged according to the preset sequence.

For example, the target attribute may be obtained for segment 3 in fig. 2.

Then, in executing the step 103 of obtaining the target position of the target special effect in the at least one video segment, the target position of the target special effect in the target sub-segment in the at least one video segment may be obtained (for example, determining the target position of the segment 3 that needs to have the target special effect).

In the embodiment of the present application, by identifying two target segments arranged in a preset order in the at least one video segment, and, for the two target segments, identifying a first sub-segment in the first segment within a first preset time duration at the end of the first segment and identifying a second sub-segment in the second segment within a second preset time duration at the beginning of the second segment, and then obtaining the target attribute of the target sub-segment composed of the first sub-segment and the second sub-segment, when adding a special effect to at least one video segment, a target scene at the junction of the two target segments (i.e. the target sub-segment) and/or a target action of a target object can be identified to obtain a target special effect matching therewith, so that the target special effect added to at least one video segment can be matched with the content at the junction of different video segments, the content matching degree of the joint of the video special effect and the video segment is improved, so that smooth transition between different segments is realized.

Optionally, in an embodiment, the target attribute includes a target action of a target object in the at least one video segment, and then the step 102 is executed through S301 to S304.

S301, acquiring an image sequence corresponding to the at least one video clip, wherein the image sequence comprises a plurality of frames of images;

wherein, in different embodiments, the image sequence may be an image sequence of each frame of image in the at least one video segment, and may also be an image sequence of the above-mentioned target sub-segment (e.g. segment 3 in fig. 2) in the at least one video segment, and the image sequence may include a plurality of frames of images.

S302, for each candidate object in the image sequence, obtaining a target parameter of each candidate object, where the target parameter of each candidate object includes: the area ratio of the outline of each candidate object in the multi-frame image and/or the frame ratio of each candidate object in the multi-frame image;

the type of the candidate object is predefined, for example, the type of the candidate object is human.

In one example, a human body recognition technology may be utilized to recognize whether each frame of image in the image sequence corresponding to the segment 3 in fig. 2 includes a human body (where the human body is determined to be included in the image as long as any human body element in the torso or the head of the human body is included), and in the case of including the human body, the type of the human body (for example, two candidate objects of the user a and the user B) needs to be recognized.

For example, the segment 3 includes 4 frames of images, and it is determined by the human body recognition technology that the 4 frames of images relate to two candidate objects, i.e., the user a and the user B.

In one embodiment, when the target parameter includes the area proportion of the outline of the candidate object in the multi-frame image, the area proportion of the outline of the user a in the multi-frame image and the area proportion of the outline of the user B in the multi-frame image may be obtained.

Specifically, for example, if the area ratio of the contour of the user a in each of the image 1, the image 2, the image 3, and the image 4 of the segment 3 is 70%, 30%, 20%, and 10%, the area ratio of the contour of the user a in the 4-frame image is (70% + 30% + 20% + 0% + 120%);

the area ratio of the contour of the user B in each of the image 1, the image 2, the image 3, and the image 4 of the segment 3 is 20%, 60%, 70%, and 10%, and the area ratio of the contour of the user B in the 4-frame image is (20% + 60% + 70% + 10% + 160%).

In one embodiment, when the target parameter includes a ratio of the number of frames of the candidate object in the multi-frame image, that is, a total number of images of the image having the candidate object in the multi-frame image, to the total number of frames of the multi-frame image.

In the above example, the user a has 3 frames in the 4-frame image of the section 3 and has not (because the area ratio is 0%) in the fourth frame image, and therefore, the frame ratio of the user a in the 4-frame image is 3/4; similarly, the frame number of the user B in the 4-frame image accounts for 100%.

S303, determining a target object in the candidate objects according to the target parameter of each candidate object;

for example, the target object is a subject object, and thus, the target object may be determined in combination with the target parameters;

when the target parameter includes the area ratio of the outline of the candidate object in the multi-frame image, the candidate object with the highest area ratio is determined as the target object, i.e., the subject object is the user B (because the area ratio 160% corresponding to the user B is greater than the area ratio 120% corresponding to the user a).

And when the target parameters comprise the frame number ratios of the candidate objects in the multi-frame images, determining the candidate object with the highest frame number ratio as the target object, namely the main object is the user B.

When the target parameters include the area ratio of the outline of the candidate object in the multi-frame image and the frame number ratio of the candidate object in the multi-frame image, a weighting operation may be performed on the two parameters, for example, the preset weight of the area ratio is 0.8, the preset weight of the frame number ratio is 0.6, the score corresponding to the user a is 0.8 × 120% +0.6 × 3/4, the score corresponding to the user B is 0.8% + 160% +0.6 × 1, and the candidate object with the highest score is determined as the target object, i.e., the subject object.

S304, acquiring a target action of the target object in the at least one video clip.

In one embodiment, when the target attribute is the target attribute of the target sub-segment, the target action obtained here is the target object in the target sub-segment, and the target action in the target sub-segment.

When the motion of an object in a video is recognized, the motion recognition technology can be utilized to respectively send the RGB images and the optical flow images into two neural networks and fuse the final classification result to determine the motion of the object in the video, and the motion of the object has a high recognition rate.

Specifically, in one example, it is possible to extract each frame image including the target object from an image sequence (including time series information) corresponding to the target sub-segment and determine time series information (corresponding to the optical flow image) corresponding to each frame image; then, the respective frame images are respectively subjected to mask processing to obtain respective frame contour images of the target object (i.e., mask images of the target object corresponding to the RGB images), which corresponds to removal of the background of the target object in the respective frame images. The time sequence information is the corresponding time point sequence of each frame contour image in the target sub-segment.

Then, the contour images of the respective frames are input to an RGB neural network, and the time series information is input to a time series neural network (for example, RNN (recurrent neural network) that processes time series data), so that the motion recognition of the target object is performed, and the output results of the two networks are fused, so that the motion of the target object at the segment join is recognized.

In the embodiment of the application, when a target action of a target object in at least one video clip is acquired, an image sequence corresponding to the at least one video clip may be acquired, where the image sequence includes multiple frames of images; then, for each candidate object in the image sequence, acquiring a target parameter of the candidate object, wherein the target parameter of the candidate object includes: the area ratio of the outline of the candidate object in the multi-frame image and/or the frame number ratio of the candidate object in the multi-frame image; determining a target object of the plurality of candidate objects according to the target parameter of each candidate object; and finally, acquiring the target action of the target object in the at least one video segment. The target object determined by the method is determined based on the frame number ratio of the object in the image sequence and/or the area ratio of the outline of the object in the image sequence, so that the determined target object is the main object in the image sequence, and the determined target special effect is the special effect matched with the target action of the main object in the image sequence, so that the target special effect added to the video clip can be matched with the action of the main object in the video clip, and the video playing effect is improved.

Optionally, in an embodiment, the target attribute includes a target scene in the at least one video segment, and then when step 102 is executed, it may be implemented by S401 to S404:

s401, acquiring an image sequence corresponding to the at least one video clip, wherein the image sequence comprises a plurality of frames of images;

specifically, refer to the above S301, which is not described herein again.

S402, identifying candidate scenes corresponding to each frame of image in the multi-frame images;

s403, acquiring a score value of each candidate scene according to image frame number information corresponding to each candidate scene in the multi-frame image and preset weight information corresponding to each scene;

for example, the segment 3 (i.e., the target sub-segment) in fig. 2 includes 4 frames of images, which respectively correspond to the scene 1, the scene 2, and the scene 3, and then the frame number of the image of the scene 1 is 2, the frame number of the image of the scene 2 is 1, and the frame number of the image is 1, and each scene has a preset weight, for example, the weights of the scene 1, the scene 2, and the scene 3 are respectively 0.8, 0.6, and 0.7, so that the frame number of the image and the weight of each scene can be weighted to obtain the score of each scene, where the scores of the scene 1, the scene 2, and the scene 3 are sequentially 0.8 × 2, 0.6 × 1, and 0.7 × 1.

S404, the candidate scene with the highest scoring value in the candidate scenes is used as the target scene of the at least one video clip.

For example, the scene 1 exemplified above may be the target scene of the segment 3.

In the embodiment of the application, candidate scenes corresponding to each frame of image in an image sequence corresponding to at least one video clip may be identified, a score value of each candidate scene may be calculated based on the number of image frames of each candidate scene appearing in the image sequence and a preset weight of each candidate scene, and the candidate scene with the highest score value is taken as a target scene corresponding to the at least one video clip, so that the target scene identified for the video clip is a scene with higher importance and higher occurrence frequency in the at least one video clip (for example, a clip join), and then a target special effect matched based on the target scene is also matched with the scene content of the video clip with higher degree.

Optionally, in an embodiment, when the target position of the target special effect in the at least one video segment is obtained in step 103, if the target attribute includes a target motion of a target object, a target image sequence including the target motion of the target object may be identified in the at least one video segment, and respective target positions pointed by the target motion in the target image sequence are obtained, and the respective target positions are identified as the target positions of the target special effect in the at least one video segment.

Wherein, since the action of the target object is a dynamic process in general, the action may occur in the multi-frame image of the at least one video segment (e.g. the target sub-segment), and therefore, the multi-frame image (i.e. the target image sequence) including the target action may be identified at, for example, the target sub-segment, i.e. the segment junction. Since each frame image corresponds to a time point in the video segment, the target image sequence can directly correspond to the target time sequence in the video segment (e.g., 0.5s to 1.5s of the target sub-segment). For example, the target motion is a punch motion, a sequence of target images containing the punch motion may be identified.

In addition, since the target motion has a pointed position, for example, in each frame of image formed by the punching motion, the position pointed by the fist in each frame of image forms a target position in the frame of image (for example, if the fist is swung to the right, a certain coordinate point adjacent to the right side of the contour of the fist in one frame of image is a target position in the frame of image), therefore, each target position pointed by the punching motion in the target image sequence can be obtained, and each target position in the target image sequence is each added position of the "water wave" turning special effect corresponding to the punching motion.

For another example, if the target motion of the target object is still, the position pointed by the target motion is the target position of the target object in the target image sequence, and therefore, a "lapped transition" may be added to the target position of the target image sequence, i.e., the position of the target object.

Optionally, in an embodiment, when the target position of the target special effect in the at least one video segment is obtained in step 103, whether the target attribute includes a target motion of the target object, or the target attribute includes the target scene, or the target attribute includes the target motion of the target object and the target scene, the following may be implemented:

determining a second time point in the at least one video segment based on a first time point in the at least one video segment and a duration of the target special effect, identifying a target image sequence in the at least one video segment corresponding to a target time sequence formed by the first time point and the second time point, and identifying each target position of the target image sequence in the at least one video segment as a target position of the target special effect in the at least one video segment.

The first time point is a preset time point, for example, 0s, or 0.3s, and the first time point may be a starting position where the target special effect is located, or may be an ending position.

For example, the first time point is a start position (the first time point is 0s), the duration of the target special effect is 1.5s, the second time point is 1.5s in the at least one video segment (for example, a segment join, that is, a target sub-segment, for example, the segment 3 in fig. 2), the target image sequence corresponding to the 0 th to 1.5s in the segment 3 is an image position to which the target special effect needs to be added, and the specific adding position of the target special effect in each frame image in the target image sequence is not limited. For example, when the target attribute includes a target scene, the present embodiment is preferably used to add a target special effect.

In the embodiment of the present application, if there is a target action of the target object in at least one video clip, in determining the respective target positions of the target special effects that need to be added, a target image sequence containing the target motion of the target object may be identified in the at least one video segment, and acquiring each target position pointed by the target action in the target image sequence respectively, identifying each target position as a target position of the target special effect in the at least one video clip, so that the adding position of the target special effect is the same as the frame image of the target action, and the specific adding position of the target special effect in each frame image is consistent with each target position pointed by the target action corresponding to the target special effect, the type and the specific position of the added target special effect can be kept matched with the target action of the target object in the video clip; in addition, when determining each target position of the target special effect that needs to be added, a target time sequence of the target special effect in at least one video clip may be determined based on the first time point in the at least one video clip and the duration of the target special effect, and the target special effect is further added to a target image sequence corresponding to the target time sequence.

It should be noted that, in the video processing method provided in the embodiment of the present application, the execution subject may be a video processing apparatus, or a control module in the video processing apparatus for executing the video processing method. In the embodiment of the present application, a video processing apparatus executing a video processing method is taken as an example, and the video processing apparatus provided in the embodiment of the present application is described.

Referring to fig. 3, a block diagram of a video processing apparatus of one embodiment of the present application is shown. The video processing apparatus includes:

a receiving module 31, configured to receive a first input of a user, where the first input includes: selecting at least one video segment;

a first obtaining module 32, configured to obtain a target attribute of the at least one video segment, where the target attribute includes: a target action of a target object in the at least one video segment, and/or a target scene in the at least one video segment;

a second obtaining module 33, configured to obtain a target special effect that matches the target attribute, and a target position where the target special effect is located in the at least one video segment;

a processing module 34, configured to add the target special effect to the target position in the at least one video segment, and generate a target video.

Optionally, the first obtaining module 32 includes:

the first identification submodule is used for identifying two target segments which are arranged according to a preset sequence in the at least one video segment;

a second identifying sub-module, configured to, for the two target segments, the two target segments including a first segment and a second segment, identify a first sub-segment in the first segment that is within a first preset duration of an end portion of the first segment, and identify a second sub-segment in the second segment that is within a second preset duration of a beginning portion of the second segment, where the two target segments include: the first segment and the second segment are adjacently arranged according to the preset sequence, and the first segment is arranged before the second segment;

the first obtaining sub-module is configured to obtain a target attribute of a target sub-segment, where the target sub-segment includes the first sub-segment and the second sub-segment arranged according to the preset order.

Optionally, the first identification submodule includes:

a dividing unit, configured to, when the at least one video segment includes one video segment, divide the one video segment into at least two target segments arranged according to the preset order according to a scene corresponding to each frame of an image sequence corresponding to the one video segment, where any two target segments arranged adjacently in any one group of the at least two target segments respectively match different scenes, and identify any two target segments arranged adjacently in any one group of the at least two target segments;

the identification unit is configured to, when the at least one video segment includes at least two video segments, identify, as two target segments of the at least one video segment that are arranged according to the preset order, the two video segments that are adjacently arranged in any group of the at least two video segments in the preset order selected by the first input for the at least two video segments.

Optionally, the first obtaining module 32 includes:

a second obtaining sub-module, configured to obtain an image sequence corresponding to the at least one video segment, where the image sequence includes multiple frames of images;

a third obtaining sub-module, configured to obtain, for each candidate object in the image sequence, a target parameter of the candidate object, where the target parameter of the candidate object includes: the area ratio of the outline of each candidate object in the multi-frame image and/or the frame ratio of each candidate object in the multi-frame image;

a determining sub-module, configured to determine a target object of the candidate objects according to the target parameter of each candidate object;

a fourth obtaining sub-module, configured to obtain a target action of the target object in the at least one video segment;

wherein the target attribute comprises a target action of a target object in the at least one video segment.

Optionally, the first obtaining module 32 includes:

a fifth obtaining sub-module, configured to obtain an image sequence corresponding to the at least one video segment, where the image sequence includes multiple frames of images;

the third identification submodule is used for identifying candidate scenes corresponding to each frame of image in the multi-frame images;

a sixth obtaining sub-module, configured to obtain a score value of each candidate scene according to frame number information of the image corresponding to each candidate scene in the multi-frame image and preset weight information corresponding to each scene;

a fourth identifying sub-module, configured to use a candidate scene with a highest scoring value in the plurality of candidate scenes as a target scene of the at least one video segment;

wherein the target attribute comprises a target scene in the at least one video segment.

Optionally, the second obtaining module 33 includes:

the recognition processing sub-module is used for recognizing a target image sequence containing a target action of a target object in the at least one video segment when the target attribute comprises the target action of the target object in the at least one video segment, acquiring target positions respectively pointed by the target action in the target image sequence, and recognizing the target positions as target positions of the target special effect in the at least one video segment;

and the determining and identifying submodule is used for determining a second time point in the at least one video segment based on a first time point in the at least one video segment and the duration of the target special effect, identifying a target image sequence in the at least one video segment, corresponding to a target time sequence formed by the first time point and the second time point, and identifying each target position of the target image sequence in the at least one video segment as the target position of the target special effect in the at least one video segment.

In the embodiment of the application, by acquiring a target action of a target object in at least one video clip and/or a target scene in at least one video clip of a video special effect to be added and acquiring a target special effect matched with the target action and/or the target scene, the target special effect added to at least one video clip can be matched with the target action of the target object in the video clip and/or the scene in which the video clip is located, so that the video special effect can be matched with video content to which the special effect is added; in addition, the target position of the target special effect in the at least one video clip can be obtained, the target special effect is added to the target position, and the target video is generated, so that the adding position of the target special effect can be adapted to the video content, the matching degree between the video special effect and the video content with the added special effect is improved, and the editing effect of the video special effect is improved. In the video editing process, a user only needs to trigger the first input without performing excessive operations, so that the operation of the user on video editing is simplified.

The video processing apparatus in the embodiment of the present application may be an apparatus, or may be a component, an integrated circuit, or a chip in a terminal. The device can be mobile electronic equipment or non-mobile electronic equipment. By way of example, the mobile electronic device may be a mobile phone, a tablet computer, a notebook computer, a palm top computer, a vehicle-mounted electronic device, a wearable device, an ultra-mobile personal computer (UMPC), a netbook or a Personal Digital Assistant (PDA), and the like, and the non-mobile electronic device may be a Personal Computer (PC), a Television (TV), a teller machine, a self-service machine, and the like, and the embodiments of the present application are not particularly limited.

The video processing apparatus in the embodiment of the present application may be an apparatus having an operating system. The operating system may be an Android operating system (Android), an iOS operating system, or other possible operating systems, which is not specifically limited in the embodiments of the present application.

The video processing apparatus provided in the embodiment of the present application can implement each process implemented by the method embodiments of fig. 1 to fig. 2, and is not described herein again to avoid repetition.

Optionally, as shown in fig. 4, an electronic device 2000 is further provided in this embodiment of the present application, and includes a processor 2002, a memory 2001, and a program or an instruction stored in the memory 2001 and executable on the processor 2002, where the program or the instruction is executed by the processor 2002 to implement the processes of the above-mentioned embodiment of the video processing method, and can achieve the same technical effects, and no further description is provided here to avoid repetition.

It should be noted that the electronic devices in the embodiments of the present application include the mobile electronic devices and the non-mobile electronic devices described above.

Fig. 5 is a schematic diagram of a hardware structure of an electronic device implementing an embodiment of the present application.

The electronic device 1000 includes, but is not limited to: a radio frequency unit 1001, a network module 1002, an audio output unit 1003, an input unit 1004, a sensor 1005, a display unit 1006, a user input unit 1007, an interface unit 1008, a memory 1009, and a processor 1010.

Those skilled in the art will appreciate that the electronic device 1000 may further comprise a power source (e.g., a battery) for supplying power to various components, and the power source may be logically connected to the processor 1010 through a power management system, so as to implement functions of managing charging, discharging, and power consumption through the power management system. The electronic device structure shown in fig. 5 does not constitute a limitation of the electronic device, and the electronic device may include more or less components than those shown, or combine some components, or arrange different components, and thus, the description is omitted here.

The user input unit 1007 is configured to receive a first input from a user, where the first input includes: selecting at least one video segment;

a processor 1010 configured to obtain a target attribute of the at least one video segment, where the target attribute includes: a target action of a target object in the at least one video segment, and/or a target scene in the at least one video segment; acquiring a target special effect matched with the target attribute and a target position of the target special effect in the at least one video clip; and adding the target special effect at the target position in the at least one video clip to generate a target video.

Optionally, the processor 1010 is configured to identify two target segments of the at least one video segment, which are arranged according to a preset order; for the two target segments, the two target segments include a first segment and a second segment, a first sub-segment in the first segment within a first preset duration of an end portion of the first segment is identified, and a second sub-segment in the second segment within a second preset duration of a beginning portion of the second segment is identified, wherein the two target segments include: the first segment and the second segment are adjacently arranged according to the preset sequence, and the first segment is arranged before the second segment; and acquiring the target attribute of a target sub-segment, wherein the target sub-segment comprises the first sub-segment and the second sub-segment which are arranged according to the preset sequence.

In the embodiment of the present application, by identifying two target segments arranged in a preset order in the at least one video segment, and, for the two target segments, identifying a first sub-segment in the first segment within a first preset time duration at the end of the first segment and identifying a second sub-segment in the second segment within a second preset time duration at the beginning of the second segment, and then obtaining the target attribute of the target sub-segment composed of the first sub-segment and the second sub-segment, when adding a special effect to at least one video segment, a target scene at the junction of the two target segments (i.e. the target sub-segment) and/or a target action of a target object can be identified to obtain a target special effect matching therewith, so that the target special effect added to at least one video segment can be matched with the content at the junction of different video segments, the content matching degree of the joint of the video special effect and the video clips is improved, so that the transition between the video clips is smoother.

Optionally, the processor 1010 is configured to, when the at least one video segment includes one video segment, divide the one video segment into at least two target segments arranged according to the preset order according to a scene corresponding to each frame of an image sequence corresponding to the one video segment, where any two adjacent target segments in any one group of the at least two target segments match different scenes, and identify any two adjacent target segments in the at least two target segments; or, when the at least one video segment includes at least two video segments, the first input further includes the preset order selected for the at least two video segments, and two video segments arranged adjacently in any group of the at least two video segments are identified as two target segments arranged in the preset order in the at least one video segment.

Optionally, the processor 1010 is configured to obtain an image sequence corresponding to the at least one video segment, where the image sequence includes multiple frames of images; for each candidate object in the image sequence, acquiring a target parameter of the candidate object, wherein the target parameter of the candidate object comprises: the area ratio of the outline of each candidate object in the multi-frame image and/or the frame ratio of each candidate object in the multi-frame image; determining a target object of the plurality of candidate objects according to the target parameter of each candidate object; acquiring a target action of the target object in the at least one video segment, wherein the target attribute comprises the target action of the target object in the at least one video segment;

Optionally, the processor 1010 is configured to obtain an image sequence corresponding to the at least one video segment, where the image sequence includes multiple frames of images; identifying candidate scenes corresponding to each frame of image in the multi-frame images; acquiring the score value of each candidate scene according to the image frame number information of each candidate scene in the multi-frame image and the preset weight information corresponding to each scene; and taking a candidate scene with the highest scoring value in the plurality of candidate scenes as a target scene of the at least one video segment, wherein the target attribute comprises the target scene in the at least one video segment.

In the embodiment of the application, candidate scenes corresponding to frames of an image sequence corresponding to at least one video clip may be identified, a score of each candidate scene is calculated based on the number of frames of the image sequence in which each candidate scene appears and a preset weight of each candidate scene, and the candidate scene with the highest score is determined as a target scene corresponding to the at least one video clip, so that the target scene identified for the video clip is a more important scene and a more frequently appearing scene in the at least one video clip (for example, a clip junction), and then a target special effect matched based on the target scene also has a higher matching degree with the scene content of the video clip.

Optionally, the processor 1010 is configured to, when the target attribute includes a target motion of a target object in the at least one video segment, identify, in the at least one video segment, a target image sequence including the target motion of the target object, and obtain respective target positions pointed by the target motion in the target image sequence, and identify the respective target positions as target positions where the target special effect is located in the at least one video segment; or, based on a first time point in the at least one video segment and a duration of the target special effect, determining a second time point in the at least one video segment, identifying a target image sequence in the at least one video segment corresponding to a target time sequence formed by the first time point and the second time point, and identifying each target position of the target image sequence in the at least one video segment as a target position of the target special effect in the at least one video segment.

It should be understood that in the embodiment of the present application, the input Unit 1004 may include a Graphics Processing Unit (GPU) 10041 and a microphone 10042, and the graphics processing Unit 10041 processes image data of still pictures or videos obtained by an image capturing device (such as a camera) in a video capturing mode or an image capturing mode. The display unit 1006 may include a display panel 10061, and the display panel 10061 may be configured in the form of a liquid crystal display, an organic light emitting diode, or the like. The user input unit 1007 includes a touch panel 10071 and other input devices 10072. The touch panel 10071 is also referred to as a touch screen. The touch panel 10071 may include two parts, a touch detection device and a touch controller. Other input devices 10072 may include, but are not limited to, a physical keyboard, function keys (e.g., volume control keys, switch keys, etc.), a trackball, a mouse, and a joystick, which are not described in detail herein. The memory 1009 may be used to store software programs as well as various data, including but not limited to application programs and operating systems. Processor 1010 may integrate an application processor that handles primarily operating systems, user interfaces, applications, etc. and a modem processor that handles primarily wireless communications. It will be appreciated that the modem processor described above may not be integrated into processor 1010.

The embodiments of the present application further provide a readable storage medium, where a program or an instruction is stored on the readable storage medium, and when the program or the instruction is executed by a processor, the program or the instruction implements each process of the video processing method embodiment, and can achieve the same technical effect, and in order to avoid repetition, details are not repeated here.

The processor is the processor in the electronic device described in the above embodiment. The readable storage medium includes a computer readable storage medium, such as a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and so on.

The embodiment of the present application further provides a chip, where the chip includes a processor and a communication interface, the communication interface is coupled to the processor, and the processor is configured to run a program or an instruction to implement each process of the above video processing method embodiment, and can achieve the same technical effect, and the details are not repeated here to avoid repetition.

It should be understood that the chips mentioned in the embodiments of the present application may also be referred to as system-on-chip, system-on-chip or system-on-chip, etc.

It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element. Further, it should be noted that the scope of the methods and apparatus of the embodiments of the present application is not limited to performing the functions in the order illustrated or discussed, but may include performing the functions in a substantially simultaneous manner or in a reverse order based on the functions involved, e.g., the methods described may be performed in an order different than that described, and various steps may be added, omitted, or combined. In addition, features described with reference to certain examples may be combined in other examples.

Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present application may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal (such as a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present application.

While the present embodiments have been described with reference to the accompanying drawings, it is to be understood that the invention is not limited to the precise embodiments described above, which are meant to be illustrative and not restrictive, and that various changes may be made therein by those skilled in the art without departing from the spirit and scope of the invention as defined by the appended claims.

Claims

1. A method of video processing, the method comprising:

2. The method of claim 1, wherein the obtaining the target property of the at least one video segment comprises:

identifying two target segments which are arranged according to a preset sequence in the at least one video segment;

for the two target segments, the two target segments include a first segment and a second segment, a first sub-segment in the first segment within a first preset duration of an end portion of the first segment is identified, and a second sub-segment in the second segment within a second preset duration of a beginning portion of the second segment is identified, wherein the two target segments include: the first segment and the second segment are adjacently arranged according to the preset sequence, and the first segment is arranged before the second segment;

obtaining a target attribute of a target sub-segment, wherein the target sub-segment comprises: the first sub-segment and the second sub-segment are arranged according to the preset sequence.

3. The method of claim 2, wherein the identifying two target segments of the at least one video segment that are arranged in a predetermined order comprises:

when the at least one video clip comprises one video clip, dividing the one video clip into at least two target clips arranged according to the preset sequence according to a scene corresponding to each frame of image in an image sequence corresponding to the one video clip, wherein any one group of two adjacent arranged target clips in the at least two target clips are respectively matched with different scenes, and any one group of two adjacent arranged target clips in the at least two target clips are identified;

or the like, or, alternatively,

when the at least one video segment comprises at least two video segments, the first input further comprises the preset sequence selected for the at least two video segments, and the two video segments arranged adjacently in any group of the at least two video segments are identified as two target segments arranged according to the preset sequence in the at least one video segment.

4. The method of claim 1, wherein the target attribute comprises a target action of a target object in the at least one video segment;

the obtaining of the target attribute of the at least one video segment includes:

acquiring an image sequence corresponding to the at least one video segment, wherein the image sequence comprises a plurality of frames of images;

for each candidate object in the image sequence, acquiring a target parameter of the candidate object, wherein the target parameter of the candidate object comprises: the area ratio of the outline of each candidate object in the multi-frame image and/or the frame ratio of each candidate object in the multi-frame image;

determining a target object of the plurality of candidate objects according to the target parameter of each candidate object;

and acquiring a target action of the target object in the at least one video segment.

5. The method of claim 1, wherein the target attribute comprises a target scene in the at least one video segment;

identifying candidate scenes corresponding to each frame of image in the multi-frame images;

acquiring the score value of each candidate scene according to the image frame number information of each candidate scene in the multi-frame image and the preset weight information corresponding to each scene;

and taking the candidate scene with the highest scoring value in the plurality of candidate scenes as the target scene of the at least one video clip.

6. A video processing apparatus, characterized in that the apparatus comprises:

7. The apparatus of claim 6, wherein the first obtaining module comprises:

8. The apparatus of claim 7, wherein the first identification submodule comprises:

9. The apparatus of claim 6, wherein the first obtaining module comprises:

10. The apparatus of claim 6, wherein the first obtaining module comprises: