CN111800652A - Video processing method and device, electronic equipment and storage medium - Google Patents

Video processing method and device, electronic equipment and storage medium

Info

Publication number
CN111800652A
CN111800652A (application CN202010744184.8A)
Authority
CN
China
Prior art keywords
frame
video
segment
processed
determining
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010744184.8A
Other languages
Chinese (zh)
Inventor
田济源
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen TetrasAI Technology Co Ltd
Original Assignee
Shenzhen TetrasAI Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen TetrasAI Technology Co Ltd filed Critical Shenzhen TetrasAI Technology Co Ltd
Priority to CN202010744184.8A priority Critical patent/CN111800652A/en
Publication of CN111800652A publication Critical patent/CN111800652A/en
Pending legal-status Critical Current


Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/234Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • G06V20/41Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G06V20/42Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items of sport video content
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/234Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
    • H04N21/23424Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving splicing one content stream with another content stream, e.g. for inserting or substituting an advertisement
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/234Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
    • H04N21/2343Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements
    • H04N21/234381Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements by altering the temporal resolution, e.g. decreasing the frame rate by frame skipping
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/44016Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving splicing one content stream with another content stream, e.g. for substituting a video clip
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/4402Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display
    • H04N21/440281Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display by altering the temporal resolution, e.g. by frame skipping

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Marketing (AREA)
  • Television Signal Processing For Recording (AREA)

Abstract

The present disclosure relates to a video processing method and apparatus, an electronic device, and a storage medium. The method includes: acquiring video data; performing semantic analysis on video frames in the video data, and determining at least one to-be-processed segment from the video data according to the semantic analysis results of the video frames; and performing video processing on the at least one to-be-processed segment to obtain a video segment with a specific playing effect. The embodiments of the present disclosure can enrich the processing modes of video data and simplify user operations.

Description

Video processing method and device, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to a video processing method and apparatus, an electronic device, and a storage medium.
Background
With the development of science and technology, video data is processed in more and more ways. For example, video data may be clipped, and the clipped video segments may be given slow-motion effects, fast-forward effects, or various special effects.
Disclosure of Invention
The present disclosure proposes a technical solution for video processing.
According to an aspect of the present disclosure, there is provided a video processing method including:
acquiring video data;
performing semantic analysis on a video frame in the video data, and determining at least one to-be-processed segment from the video data according to a semantic analysis result of the video frame;
and performing video processing on the at least one to-be-processed segment to obtain a video segment with a specific playing effect.
In a possible implementation manner, the performing semantic analysis on a video frame in the video data, and determining at least one to-be-processed segment from the video data according to a semantic analysis result of the video frame includes:
performing semantic analysis on a video frame in the video data, and determining a starting frame and an ending frame from the video data according to a semantic analysis result of the video frame;
and obtaining at least one to-be-processed segment according to the starting frame and the ending frame.
In a possible implementation manner, performing semantic analysis on a video frame in the video data, and determining a starting frame and an ending frame from the video data according to a semantic analysis result of the video frame includes:
when the semantic description of a first video frame in the video data is matched with the semantic description of a starting frame, determining the first video frame as the starting frame;
and when the semantic description of a second video frame after the first video frame is matched with the semantic description of an end frame, determining that the second video frame is the end frame.
In a possible implementation manner, performing semantic analysis on a video frame in the video data, and determining a starting frame and an ending frame from the video data according to a semantic analysis result of the video frame includes:
sequentially carrying out first semantic analysis on video frames in video data from a video frame at a preset position in the video data, and determining a starting frame in the video data according to a first semantic analysis result of the video frames;
and after the initial frame is determined, sequentially carrying out second semantic analysis on the video frames after the initial frame, and judging whether the video frames after the initial frame are end frames or not according to the second semantic analysis result of the video frames until the end frames are determined.
In a possible implementation manner, performing semantic analysis on a video frame in the video data, and determining a starting frame and an ending frame from the video data according to a semantic analysis result of the video frame includes:
performing semantic analysis on a first video frame in the video data to obtain semantic description of the first video frame;
determining candidate starting frame semantic descriptions corresponding to the first video frame according to keywords in the semantic descriptions of the first video frame;
determining the first video frame as a starting frame if a first starting frame semantic description matched with the semantic description of the first video frame exists in the candidate starting frame semantic descriptions;
determining a second video frame subsequent to the first video frame as an end frame when the semantic description of the second video frame matches a first end frame semantic description, wherein the first end frame semantic description corresponds to the first start frame semantic description.
In a possible implementation manner, the video processing on the at least one to-be-processed segment to obtain a video segment with a specific playing effect includes:
and performing frame interpolation processing on the at least one to-be-processed segment to obtain a slow shot segment corresponding to the to-be-processed segment.
In a possible implementation manner, the performing frame interpolation on the at least one to-be-processed segment to obtain a slow-shot segment corresponding to the to-be-processed segment includes:
determining at least one intermediate frame according to the video frame in the segment to be processed;
and inserting the intermediate frame into the segment to be processed to obtain a slow shot segment corresponding to the segment to be processed.
In one possible implementation, the method further includes:
and performing video processing on the slow-shot segment according to the semantic description of the video frames in the slow-shot segment, to obtain a slow-shot segment with a specific playing effect.
In a possible implementation manner, the determining at least one intermediate frame according to a video frame in the to-be-processed segment includes:
determining a first frame number of an intermediate frame;
and determining the intermediate frames of the first frame number according to the video frames in the segment to be processed.
In one possible implementation, the determining the first frame number of the intermediate frame includes:
determining a first frame number of the intermediate frame according to semantic categories to which the semantic description of the starting frame and the semantic description of the ending frame of the segment to be processed belong; or,
determining a first frame number of the intermediate frames in response to a setting operation for a slow-motion magnification; or,
and determining the first frame number of the intermediate frame according to the duration of the segment to be processed.
In a possible implementation manner, the determining at least one intermediate frame according to a video frame in the to-be-processed segment includes:
acquiring a first optical flow map from a t-th frame image to a (t-1)-th frame image, a second optical flow map from the t-th frame image to a (t+1)-th frame image, a third optical flow map from the (t+1)-th frame image to the t-th frame image, and a fourth optical flow map from the (t+1)-th frame image to a (t+2)-th frame image in the segment to be processed, wherein t is an integer;
determining a first interpolation optical flow map according to the first optical flow map and the second optical flow map, and determining a second interpolation optical flow map according to the third optical flow map and the fourth optical flow map;
determining a first interpolated image according to the first interpolation optical flow map and the t-th frame image, and determining a second interpolated image according to the second interpolation optical flow map and the (t+1)-th frame image;
and fusing the first interpolated image and the second interpolated image to obtain an intermediate frame to be inserted between the t-th frame image and the (t+1)-th frame image.
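By way of illustration only, the four-optical-flow scheme above might be sketched as follows in Python. This is a minimal sketch under stated assumptions: OpenCV's Farneback estimator stands in for whatever optical flow method is used, and the quadratic (constant-acceleration) motion model, negated-flow backward warping, and simple averaging fusion are illustrative choices, not details fixed by this disclosure.

```python
import cv2
import numpy as np

def backward_warp(img, flo):
    """Sample img at positions displaced by a dense flow field flo (H, W, 2)."""
    h, w = flo.shape[:2]
    grid_y, grid_x = np.mgrid[0:h, 0:w].astype(np.float32)
    return cv2.remap(img, grid_x + flo[..., 0], grid_y + flo[..., 1],
                     cv2.INTER_LINEAR)

def flow(a, b):
    """Dense optical flow from frame a to frame b (assumed estimator)."""
    ga, gb = (cv2.cvtColor(x, cv2.COLOR_BGR2GRAY) for x in (a, b))
    return cv2.calcOpticalFlowFarneback(ga, gb, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)

def quadratic_flow(f_fwd, f_bwd, t=0.5):
    """Flow from the reference frame to time t under constant acceleration,
    derived from its forward flow f_fwd and backward flow f_bwd."""
    return 0.5 * (f_fwd + f_bwd) * t**2 + 0.5 * (f_fwd - f_bwd) * t

def intermediate_frame(f_prev, f_t, f_t1, f_next):
    flow1 = flow(f_t, f_prev)    # first optical flow map:  t   -> t-1
    flow2 = flow(f_t, f_t1)      # second optical flow map: t   -> t+1
    flow3 = flow(f_t1, f_t)      # third optical flow map:  t+1 -> t
    flow4 = flow(f_t1, f_next)   # fourth optical flow map: t+1 -> t+2
    interp1 = quadratic_flow(flow2, flow1)   # first interpolation flow map
    interp2 = quadratic_flow(flow3, flow4)   # second interpolation flow map
    # Backward-warp each side toward the midpoint (the negated flow
    # approximates the midpoint-to-reference flow), then fuse by averaging.
    img1 = backward_warp(f_t, -interp1)      # first interpolated image
    img2 = backward_warp(f_t1, -interp2)     # second interpolated image
    fused = (img1.astype(np.float32) + img2.astype(np.float32)) / 2
    return fused.astype(np.uint8)            # intermediate frame for t..t+1
```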
In a possible implementation manner, the video processing on the at least one to-be-processed segment to obtain a video segment with a specific playing effect includes:
determining at least two to-be-merged segments from the to-be-processed segments;
determining a first to-be-merged segment from the to-be-merged segments;
performing video processing on the first to-be-combined segment to obtain a video segment with a specific playing effect corresponding to the first to-be-combined segment;
and merging a second segment to be merged with the video segment with the specific playing effect corresponding to the first segment to be merged to obtain a merged video segment, wherein the second segment to be merged is a segment of the segments to be merged except the first segment to be merged.
In a possible implementation manner, the video processing on the at least one to-be-processed segment to obtain a video segment with a specific playing effect includes:
determining at least one target to-be-processed segment from the at least one to-be-processed segment;
and superimposing a special effect on the at least one target to-be-processed segment to obtain a special effect segment.
In one possible implementation, the method further includes:
determining a processing mode of at least one to-be-processed segment;
the video processing of the at least one to-be-processed segment to obtain a video segment with a specific playing effect includes:
and processing the to-be-processed segment in the video data according to the processing mode of the at least one to-be-processed segment to realize a specific playing effect, so as to obtain processed second video data.
According to an aspect of the present disclosure, there is provided a video processing apparatus including:
the acquisition module is used for acquiring video data;
the first determining module is used for performing semantic analysis on a video frame in the video data and determining at least one to-be-processed segment from the video data according to a semantic analysis result of the video frame;
and the first processing module is used for carrying out video processing on the at least one to-be-processed segment to obtain a video segment with a specific playing effect.
In a possible implementation manner, the first determining module is further configured to:
performing semantic analysis on a video frame in the video data, and determining a starting frame and an ending frame from the video data according to a semantic analysis result of the video frame;
and obtaining at least one to-be-processed segment according to the starting frame and the ending frame.
In a possible implementation manner, the first determining module is further configured to:
when the semantic description of a first video frame in the video data is matched with the semantic description of a starting frame, determining the first video frame as the starting frame;
and when the semantic description of a second video frame after the first video frame is matched with the semantic description of an end frame, determining that the second video frame is the end frame.
In a possible implementation manner, the first determining module is further configured to:
sequentially carrying out first semantic analysis on video frames in video data from a video frame at a preset position in the video data, and determining a starting frame in the video data according to a first semantic analysis result of the video frames;
and after the initial frame is determined, sequentially carrying out second semantic analysis on the video frames after the initial frame, and judging whether the video frames after the initial frame are end frames or not according to the second semantic analysis result of the video frames until the end frames are determined.
In a possible implementation manner, the first determining module is further configured to:
performing semantic analysis on a first video frame in the video data to obtain semantic description of the first video frame;
determining candidate starting frame semantic descriptions corresponding to the first video frame according to keywords in the semantic descriptions of the first video frame;
determining the first video frame as a starting frame if a first starting frame semantic description matched with the semantic description of the first video frame exists in the candidate starting frame semantic descriptions;
determining a second video frame subsequent to the first video frame as an end frame when the semantic description of the second video frame matches a first end frame semantic description, wherein the first end frame semantic description corresponds to the first start frame semantic description.
In a possible implementation manner, the first processing module is further configured to:
and performing frame interpolation processing on the at least one to-be-processed segment to obtain a slow shot segment corresponding to the to-be-processed segment.
In a possible implementation manner, the first processing module is further configured to:
determining at least one intermediate frame according to the video frame in the segment to be processed;
and inserting the intermediate frame into the segment to be processed to obtain a slow shot segment corresponding to the segment to be processed.
In one possible implementation, the apparatus further includes:
and the second processing module is used for performing video processing on the slow-shot segment according to the semantic description of the video frames in the slow-shot segment, to obtain a slow-shot segment with a specific playing effect.
In a possible implementation manner, the first processing module is further configured to:
determining a first frame number of an intermediate frame;
and determining the intermediate frames of the first frame number according to the video frames in the segment to be processed.
In a possible implementation manner, the first processing module is further configured to:
determining a first frame number of the intermediate frame according to semantic categories to which the semantic description of the starting frame and the semantic description of the ending frame of the segment to be processed belong; or,
determining a first frame number of the intermediate frames in response to a setting operation for a slow-motion magnification; or,
and determining the first frame number of the intermediate frame according to the duration of the segment to be processed.
In a possible implementation manner, the first processing module is further configured to:
acquiring a first optical flow map from a t-th frame image to a (t-1)-th frame image, a second optical flow map from the t-th frame image to a (t+1)-th frame image, a third optical flow map from the (t+1)-th frame image to the t-th frame image, and a fourth optical flow map from the (t+1)-th frame image to a (t+2)-th frame image in the segment to be processed, wherein t is an integer;
determining a first interpolation optical flow map according to the first optical flow map and the second optical flow map, and determining a second interpolation optical flow map according to the third optical flow map and the fourth optical flow map;
determining a first interpolated image according to the first interpolation optical flow map and the t-th frame image, and determining a second interpolated image according to the second interpolation optical flow map and the (t+1)-th frame image;
and fusing the first interpolated image and the second interpolated image to obtain an intermediate frame to be inserted between the t-th frame image and the (t+1)-th frame image.
In a possible implementation manner, the first processing module is further configured to:
determining at least two to-be-merged segments from the to-be-processed segments;
determining a first to-be-merged segment from the to-be-merged segments;
performing video processing on the first to-be-combined segment to obtain a video segment with a specific playing effect corresponding to the first to-be-combined segment;
and merging a second segment to be merged with the video segment with the specific playing effect corresponding to the first segment to be merged to obtain a merged video segment, wherein the second segment to be merged is a segment of the segments to be merged except the first segment to be merged.
In a possible implementation manner, the first processing module is further configured to:
determining at least one target to-be-processed segment from the at least one to-be-processed segment;
and superimposing a special effect on the at least one target to-be-processed segment to obtain a special effect segment.
In one possible implementation, the apparatus further includes:
the second determining module is used for determining the processing mode of at least one to-be-processed segment;
the first processing module is further configured to:
and processing the to-be-processed segment in the video data according to the processing mode of the at least one to-be-processed segment to realize a specific playing effect, so as to obtain processed second video data.
According to an aspect of the present disclosure, there is provided an electronic device including: a processor; a memory for storing processor-executable instructions; wherein the processor is configured to invoke the memory-stored instructions to perform the above-described method.
According to an aspect of the present disclosure, there is provided a computer readable storage medium having stored thereon computer program instructions which, when executed by a processor, implement the above-described method.
In this way, after video data is acquired, semantic analysis can be performed on video frames in the video data, at least one to-be-processed segment is determined from the video data according to a semantic analysis result of the video frames, and then video processing is performed on the at least one to-be-processed segment, so that a video segment with a specific playing effect is obtained. According to the video processing method and device, the electronic device and the storage medium provided by the embodiment of the disclosure, at least one to-be-processed segment can be automatically determined from the video data according to the semantic analysis result aiming at the video frame in the video data, and corresponding video processing is performed, so that the processing efficiency of the video segment with a specific playing effect can be improved, the processing mode of the video data is enriched, and the operation of a user can be simplified.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure. Other features and aspects of the present disclosure will become apparent from the following detailed description of exemplary embodiments, which proceeds with reference to the accompanying drawings.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the disclosure.
Fig. 1 shows a flow diagram of a video processing method according to an embodiment of the present disclosure;
Figs. 2a-2c show schematic diagrams of a video processing method according to an embodiment of the present disclosure;
FIG. 3 shows a schematic diagram of a video processing method according to an embodiment of the present disclosure;
fig. 4 shows a block diagram of a video processing apparatus according to an embodiment of the present disclosure;
FIG. 5 shows a block diagram of an electronic device 800 in accordance with an embodiment of the disclosure;
fig. 6 illustrates a block diagram of an electronic device 1900 in accordance with an embodiment of the disclosure.
Detailed Description
Various exemplary embodiments, features and aspects of the present disclosure will be described in detail below with reference to the accompanying drawings. In the drawings, like reference numbers can indicate functionally identical or similar elements. While the various aspects of the embodiments are presented in drawings, the drawings are not necessarily drawn to scale unless specifically indicated.
The word "exemplary" is used exclusively herein to mean "serving as an example, embodiment, or illustration. Any embodiment described herein as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments.
The term "and/or" herein is merely an association describing an associated object, meaning that three relationships may exist, e.g., a and/or B, may mean: a exists alone, A and B exist simultaneously, and B exists alone. In addition, the term "at least one" herein means any one of a plurality or any combination of at least two of a plurality, for example, including at least one of A, B, C, and may mean including any one or more elements selected from the group consisting of A, B and C.
Furthermore, in the following detailed description, numerous specific details are set forth in order to provide a better understanding of the present disclosure. It will be understood by those skilled in the art that the present disclosure may be practiced without some of these specific details. In some instances, methods, means, elements and circuits that are well known to those skilled in the art have not been described in detail so as not to obscure the present disclosure.
In the related art, clipping video data requires a user to manually determine the video segments to be clipped, so the clipping efficiency is low and the process lacks intelligence.
When slow motion is applied to a video clip, frames are typically repeated to obtain the corresponding slow-shot segment. A slow-shot segment obtained in this way is only suitable for low-magnification frame interpolation; as the magnification increases, playback begins to stutter and pause, and the user experience is very poor.
To solve at least one of the above problems, an embodiment of the present disclosure provides a video processing method.
Fig. 1 shows a flowchart of a video processing method according to an embodiment of the present disclosure. The method may be performed by an electronic device such as a terminal device or a server. The terminal device may be a User Equipment (UE), a mobile device, a user terminal, a cellular phone, a cordless phone, a Personal Digital Assistant (PDA), a handheld device, a computing device, a vehicle-mounted device, a wearable device, or the like, and the method may be implemented by a processor calling computer-readable instructions stored in a memory. Alternatively, the method may be performed by a server.
As shown in fig. 1, the video processing method may include:
in step S11, video data is acquired.
For example, the video data may be acquired by a video data acquisition device (e.g., a device with a video data acquisition function, such as a mobile phone, a camera, a monitoring device, etc.), or the video data may be acquired by uploading or downloading through a terminal, and the method for acquiring the video data is not particularly limited in the embodiments of the present disclosure.
In step S12, performing semantic analysis on a video frame in the video data, and determining at least one to-be-processed segment from the video data according to a result of the semantic analysis on the video frame.
For example, semantic analysis may be performed on the video frames in the video data in sequence to obtain a semantic analysis result for each video frame. The semantic analysis result may include a semantic description of the content of the video frame. For example, referring to fig. 2a, after semantic analysis is performed on the video frame, the obtained semantic analysis result may include: two people have not turned back.
Illustratively, the video frame may be processed through a pre-trained semantic analysis network to obtain the corresponding semantic analysis result. For example, the semantic analysis network may include a first network and a second network: the first network performs feature extraction on the video frame to obtain image features (which may be represented as vectors), and the second network processes the image features and converts them into a corresponding language description, thereby obtaining the semantic description of the video frame.
Performing semantic analysis through a pre-trained semantic analysis network is only one implementation in the embodiments of the present disclosure and should not be understood as a limitation; in practice, any manner of performing semantic analysis on a video frame may be applied, and the embodiments of the present disclosure do not specifically limit the semantic analysis manner here.
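For illustration only, a two-stage semantic analysis network of the kind described above might be sketched as follows in PyTorch. Everything here (layer sizes, greedy GRU decoding, the begin-of-sequence token id, and the vocabulary handling) is a hypothetical minimal example, not the network used in this disclosure.

```python
import torch
import torch.nn as nn

class SemanticAnalysisNet(nn.Module):
    def __init__(self, vocab_size=1000, feat_dim=256, hidden_dim=256, max_len=16):
        super().__init__()
        # First network: convolutional feature extractor (frame -> feature vector).
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, feat_dim),
        )
        # Second network: recurrent decoder (feature vector -> token sequence).
        self.embed = nn.Embedding(vocab_size, hidden_dim)
        self.rnn = nn.GRU(hidden_dim, hidden_dim, batch_first=True)
        self.init_h = nn.Linear(feat_dim, hidden_dim)
        self.out = nn.Linear(hidden_dim, vocab_size)
        self.max_len = max_len

    def forward(self, frame):                       # frame: (B, 3, H, W)
        feat = self.encoder(frame)                  # (B, feat_dim)
        h = self.init_h(feat).unsqueeze(0)          # (1, B, hidden_dim)
        token = torch.zeros(frame.size(0), 1, dtype=torch.long)  # assumed <BOS> id 0
        tokens = []
        for _ in range(self.max_len):               # greedy decoding
            emb = self.embed(token)                 # (B, 1, hidden_dim)
            out, h = self.rnn(emb, h)
            token = self.out(out).argmax(-1)        # next-token prediction
            tokens.append(token)
        return torch.cat(tokens, dim=1)             # (B, max_len) token ids

net = SemanticAnalysisNet()
ids = net(torch.randn(1, 3, 224, 224))  # map ids through a vocabulary for text
```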
After the semantic analysis result corresponding to the video frame is obtained, at least one to-be-processed segment can be determined from the video data according to the semantic analysis result. For example: the start frame of the segment to be processed can be determined according to the semantic analysis result corresponding to the video frame in the video data in sequence, after the start frame is obtained, the end frame can be determined according to the semantic analysis result of the video frame after the start frame in sequence, and after the end frame is obtained, a segment to be processed can be determined according to the start frame and the end frame.
For example, in the embodiments of the present disclosure, semantic analysis may be performed on all video frames of the video data, or, as needed, on one video frame at every fixed frame interval or fixed time interval, so as to improve video processing efficiency.
In step S13, the at least one to-be-processed segment is video-processed to obtain a video segment with a specific playing effect.
For example, after at least one to-be-processed segment is obtained, corresponding video processing may be performed on the at least one to-be-processed segment, for example: slow-shot processing, fast-forward processing, reverse-play processing, special effect processing, and the like, to obtain a video segment with a specific playing effect, where the specific playing effect may include at least one of a slow shot, fast play, reverse play, and a special effect.
In this way, after video data is acquired, semantic analysis can be performed on video frames in the video data, at least one to-be-processed segment is determined from the video data according to a semantic analysis result of the video frames, and then video processing is performed on the at least one to-be-processed segment, so that a video segment with a specific playing effect is obtained. According to the video processing method provided by the embodiment of the disclosure, at least one segment to be processed can be automatically determined from video data according to the semantic analysis result of the video frame in the video data, and corresponding video processing is performed, so that the processing efficiency of the video segment with a specific playing effect can be improved, the processing mode of the video data is enriched, and the operation of a user can be simplified.
In a possible implementation manner, the performing semantic analysis on a video frame in the video data, and determining at least one to-be-processed segment from the video data according to a result of the semantic analysis on the video frame may include:
performing semantic analysis on a video frame in the video data, and determining a starting frame and an ending frame from the video data according to a semantic analysis result of the video frame;
and obtaining at least one segment to be processed according to the starting frame and the ending frame.
The conditions of the start frame and the conditions of the end frame may be preset or manually input or selected by a user, and are used for determining the conditions of the start frame and the end frame of the segment to be processed.
For example, whether each video frame meets the condition of the start frame may be determined in sequence according to the semantic analysis result of each video frame in the video data; when the current video frame meets the condition of the start frame, that video frame may be determined to be the start frame. For example, as shown in fig. 2b, if the condition of the start frame is that a person is in the frame, then when the semantic description of the current video frame indicates that two people are in the frame, the video frame meets the condition of the start frame and may be determined to be the start frame.
Then, whether each video frame after the start frame meets the condition of the end frame is determined in sequence according to its semantic analysis result; when the current video frame meets the condition of the end frame, that video frame may be determined to be the end frame. For example, as shown in fig. 2c, if the condition of the end frame is that a person has not turned back, then when the semantic description of the current video frame indicates that two people have not turned back, the video frame meets the condition of the end frame and may be determined to be the end frame.
The video segment from the start frame to the end frame (including both the start frame and the end frame) is determined to be the to-be-processed segment. Further, the above process may be repeated for the video frames after the end frame, until the last frame of the video data, to obtain at least one to-be-processed segment.
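The scanning procedure just described can be illustrated with a short sketch. Here `describe`, `is_start`, and `is_end` are placeholder functions standing in for the semantic analysis step and the start/end-frame conditions, which this disclosure leaves open.

```python
def find_segments(frames, is_start, is_end, describe):
    """Scan frames in order, collecting (start, end) index pairs."""
    segments, start = [], None
    for i, frame in enumerate(frames):
        desc = describe(frame)              # semantic description of frame i
        if start is None:
            if is_start(desc):              # frame meets the start-frame condition
                start = i
        elif is_end(desc):                  # frame meets the end-frame condition
            segments.append((start, i))     # inclusive [start, end] segment
            start = None                    # keep scanning after the end frame
    return segments
```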
In a possible implementation manner, performing semantic analysis on a video frame in the video data, and determining a starting frame and an ending frame from the video data according to a result of the semantic analysis on the video frame may include:
when the semantic description of a first video frame in the video data is matched with the semantic description of a starting frame, determining the first video frame as the starting frame;
and when the semantic description of a second video frame after the first video frame is matched with the semantic description of an end frame, determining that the second video frame is the end frame.
For example, the start frame semantic description and the end frame semantic description may be preset, or may be sentences or keywords manually input or selected by a user. For example, if a user wants to obtain the shooting segments in a basketball game and make corresponding slow-shot segments, the start frame semantic description may be set to "a person throws the basketball", and the corresponding end frame semantic description may be set to "the basketball goes into the basket".
The start frame semantic description and the end frame semantic description can be set according to the start and end semantics of the video clip for which a slow-shot effect is desired. For example, if the start frame semantic description is "has not turned back" and the end frame semantic description is "has turned back", a slow-shot clip showing the turning process can be obtained; if the start frame semantic description is "approaching the finish line" and the end frame semantic description is "crossing the finish line", a slow-shot clip showing the sprint can be obtained.
Accordingly, when video processing of other special playing effects is performed, the start frame semantic description and the end frame semantic description can be set according to the start-stop semantics of the video segment of the special playing effect to be obtained.
For example, the semantic similarity between the semantic description corresponding to a first video frame of the video data and the start frame semantic description may be determined. When this semantic similarity meets a start similarity threshold (a preset value), it may be determined that the first video frame matches the start frame semantic description, and the first video frame is determined to be the start frame.
After the first video frame is determined to be the start frame, the semantic similarity between the semantic description corresponding to a second video frame after the first video frame and the end frame semantic description may be determined. When this semantic similarity meets an end similarity threshold (a preset value, which may be the same as or different from the start similarity threshold), it may be determined that the second video frame matches the end frame semantic description, and the second video frame is determined to be the end frame.
The video segment from the first video frame to the second video frame can be determined as the segment to be processed.
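As one illustration, matching against the start or end frame semantic description might be implemented as a similarity test over sentence embeddings. The embedding vectors and cosine similarity below are assumptions; the disclosure only requires some semantic similarity measure compared against the preset start and end thresholds.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-8))

def matches(desc_vec, target_vec, threshold=0.8):
    """True when the semantic similarity meets the (start or end) threshold."""
    return cosine(desc_vec, target_vec) >= threshold
```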
In a possible implementation manner, performing semantic analysis on a video frame in the video data, and determining a starting frame and an ending frame from the video data according to a result of the semantic analysis on the video frame may include:
sequentially carrying out first semantic analysis on video frames in video data from a video frame at a preset position in the video data, and determining a starting frame in the video data according to a first semantic analysis result of the video frames;
and after the initial frame is determined, sequentially carrying out second semantic analysis on the video frames after the initial frame, and judging whether the video frames after the initial frame are end frames or not according to the second semantic analysis result of the video frames until the end frames are determined.
For example, after the video data is acquired, the preset position may default to the first frame of the video data; that is, the start frame and the end frame are determined from the video data starting from its first frame. Alternatively, the preset position in the video data may be obtained in response to a user operation for setting the preset position (e.g., sliding the time axis or inputting a playing progress time).
For example, referring to fig. 3, starting from the k-th video frame corresponding to the preset position (k is an integer greater than 0), a first semantic analysis may be performed on the k-th video frame to obtain its first semantic description. It is then determined whether the first semantic description of the k-th video frame matches the start frame semantic description (for example, when the semantic similarity between the first semantic description of the k-th video frame and the start frame semantic description is higher than or equal to 80%, it may be determined that they match).
When the first semantic description of the k-th video frame matches the start frame semantic description, the k-th video frame may be determined to be the start frame; otherwise, k is updated to k + i (i is a preset value and may be an integer greater than 0; for example, i may be set to 1), and the operation of determining whether the k-th video frame is the start frame is repeated until the start frame is obtained.
After the start frame is obtained, k is updated to k + i, and a second semantic analysis may be performed on the k-th video frame, which is i frames after the start frame, to obtain its second semantic description. It is then determined whether the second semantic description of the k-th video frame matches the end frame semantic description (for example, when the semantic similarity between the second semantic description of the k-th video frame and the end frame semantic description is higher than or equal to 80%, it may be determined that they match).
When the second semantic description of the k-th video frame matches the end frame semantic description, the k-th video frame may be determined to be the end frame; otherwise, k is updated to k + i, and the operation of determining whether the k-th video frame is the end frame is repeated until the end frame is obtained.
Further, after the end frame is determined, the above process of determining a start frame and an end frame may be repeated on the video frames after the end frame, until the last frame of the video data is reached.
Thus, determining the start frame and the end frame through this closed-loop process can improve the precision of the obtained to-be-processed segments.
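A sketch of this stepped, closed-loop scan follows; `describe` and `similarity` are placeholders for the semantic analysis and similarity measure, and the 0.8 threshold echoes the 80% example above.

```python
def scan_from(frames, k, i, start_desc, end_desc, describe, similarity, thr=0.8):
    """Scan from the k-th frame, stepping i frames at a time, alternating
    between looking for a start frame and the matching end frame."""
    segments, looking_for_start, seg_start = [], True, None
    while k < len(frames):
        desc = describe(frames[k])               # first/second semantic analysis
        if looking_for_start:
            if similarity(desc, start_desc) >= thr:
                seg_start, looking_for_start = k, False
        elif similarity(desc, end_desc) >= thr:
            segments.append((seg_start, k))
            looking_for_start = True             # repeat after the end frame
        k += i                                   # step i frames (e.g., i = 1)
    return segments
```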
In a possible implementation manner, performing semantic analysis on a video frame in the video data, and determining a starting frame and an ending frame from the video data according to a result of the semantic analysis on the video frame may include:
performing semantic analysis on a first video frame in the video data to obtain semantic description of the first video frame;
determining candidate starting frame semantic descriptions corresponding to the first video frame according to a keyword in the semantic description of the first video frame;
if a first starting frame semantic description matched with the semantic description of a first video frame exists in the candidate starting frame semantic descriptions, determining the first video frame as a starting frame;
determining a second video frame subsequent to the first video frame as an end frame when the semantic description of the second video frame matches a first end frame semantic description, wherein the first end frame semantic description corresponds to the first start frame semantic description.
For example, a plurality of start frame semantic descriptions and end frame semantic descriptions may be preset and stored, where each start frame semantic description has at least one corresponding end frame semantic description. Alternatively, the start frame semantic descriptions and end frame semantic descriptions may be stored in groups by domain category, for example: the start frame and end frame semantic descriptions related to basketball are placed in a basketball group, those related to dance in a dance group, and so on.
After semantic analysis is performed on the first video frame, the semantic description of the first video frame can be obtained. Corresponding keywords may be determined from the semantic description of the first video frame, and candidate start frame semantic descriptions corresponding to the keywords may then be determined from the stored start frame semantic descriptions (e.g., start frame semantic descriptions that contain the keywords, or synonyms or near-synonyms of the keywords; where the start frame and end frame semantic descriptions are stored in groups, a group whose name contains a keyword is determined to be a candidate group, and the candidate start frame semantic descriptions may be the start frame semantic descriptions in that group).
The semantic similarity between the semantic description of the first video frame and each candidate start frame semantic description is then determined. When there exists, among the candidate start frame semantic descriptions, a first start frame semantic description whose semantic similarity with the semantic description of the first video frame is higher than the start similarity threshold, the first video frame is determined to be a start frame; otherwise, it is determined whether the next frame is a start frame.
When the first video frame is a start frame, the first end frame semantic description corresponding to the first start frame semantic description can be determined, and the semantic similarity between the semantic description of a second video frame after the first video frame and the first end frame semantic description is determined. When this semantic similarity is higher than the end similarity threshold, the second video frame may be determined to be the end frame; otherwise, it is determined whether the next frame is an end frame.
Therefore, after the video data is acquired, the segment to be processed in the video data can be automatically determined, and in the process of determining the starting frame and the ending frame of the segment to be processed, the matching operation can be effectively reduced, so that the video processing efficiency can be improved.
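The keyword-based narrowing can be illustrated with a small sketch. The grouped store below (a "basketball" group, a "dance" group) and the paired start/end descriptions are simplified assumptions for demonstration.

```python
# Hypothetical grouped store: group name -> {start description: paired end description}.
GROUPS = {
    "basketball": {"a person throws the basketball": "the ball goes into the basket"},
    "dance":      {"a dancer begins to spin":        "the dancer stops spinning"},
}

def candidate_start_descriptions(frame_desc_keywords):
    """Keep only the start descriptions of groups named by a keyword of the
    frame's semantic description, reducing the matching operations needed."""
    return {s: e for name, pairs in GROUPS.items()
            if name in frame_desc_keywords
            for s, e in pairs.items()}
```

For instance, a frame described as "a person dribbles a basketball" would yield the keyword "basketball", so only the basketball group's start descriptions need to be compared against the frame's semantic description.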
In one possible implementation, the method may further include:
in response to a setting operation on the start frame semantic description and the end frame semantic description, the set start frame semantic description and end frame semantic description are acquired.
For example, before or after acquiring the video data, the user may manually set the start frame semantic description and the end frame semantic description according to the requirement to obtain the corresponding to-be-processed segment. For example, the user may manually input the start frame semantic description and the end frame semantic description in the corresponding input boxes to set the start frame semantic description and the end frame semantic description. Or the user can manually select the start frame semantic description and the end frame semantic description from the preset start frame semantic description and end frame semantic description to set the start frame semantic description and the end frame semantic description. And determining a starting frame and an ending frame according to the set semantic description of the starting frame and the semantic description of the ending frame and the semantic description of the video frame in the video data to obtain at least one to-be-processed segment in the video data.
For example, for a video of a basketball game, if a user wants to obtain all shooting segments in the video, the current display interface may include setting areas for the start and end semantic descriptions (an input box for each). The start frame semantic description can be set to "a person throws the basketball" and the end frame semantic description to "the ball goes into the basket"; after the setting is completed, the user clicks a confirmation control. In response to the trigger operation on the confirmation control, the terminal device can obtain a plurality of to-be-processed segments corresponding to shooting according to the set start frame and end frame semantic descriptions.
Therefore, the personalized requirements of the user can be met, and the user experience is improved.
In a possible implementation manner, the video processing on the at least one to-be-processed segment may include:
and performing frame interpolation processing on the at least one to-be-processed segment to obtain a slow shot segment corresponding to the to-be-processed segment.
For example, after at least one to-be-processed segment is obtained, for any to-be-processed segment, an intermediate frame may be obtained according to a video frame in the to-be-processed segment, and the intermediate frame is inserted into the to-be-processed segment to obtain a slow shot segment corresponding to the to-be-processed segment, where the slow shot segment exhibits a slow motion effect when played. It should be noted that any frame interpolation mode is applicable to the embodiment of the present disclosure, and the embodiment of the present disclosure does not specifically limit the frame interpolation mode.
Therefore, the to-be-processed segment to be subjected to slow shot processing can be automatically determined according to the semantic analysis result of the video frame in the video data, and then the to-be-processed segment is subjected to frame interpolation operation to obtain the corresponding slow shot segment, so that the processing efficiency of the slow shot segment can be improved, and the operation of a user can be simplified.
In a possible implementation manner, performing frame interpolation on the at least one to-be-processed segment to obtain a slow-shot segment corresponding to the to-be-processed segment may include:
determining at least one intermediate frame according to the video frame in the segment to be processed;
and inserting the intermediate frame into the segment to be processed to obtain a slow shot segment corresponding to the segment to be processed.
For example, when a to-be-processed segment is subjected to slow shot processing, an intermediate frame inserted into the to-be-processed segment may be determined through a video frame in the to-be-processed segment, so as to obtain a slow shot segment corresponding to the to-be-processed segment.
For example, motion between consecutive video frames may be assumed by default to be uniform; an intermediate frame between every two consecutive video frames is then determined from those two frames according to a linear model, and the intermediate frame is inserted between them to obtain the slow-shot segment. Alternatively, the acceleration of object motion in the video may be perceived from multiple video frames in the to-be-processed segment, so as to obtain intermediate frames to be inserted among the multiple video frames and thereby obtain the slow-shot segment corresponding to the to-be-processed segment.
Further, after the interpolated slow-shot segment is obtained, the frame interpolation operation may be performed on it again according to the slow-motion magnification requirement, until a slow-shot segment meeting the required magnification is obtained.
The embodiments of the present disclosure do not specifically limit the frame interpolation manner; any frame interpolation manner may be applied. Compared with simply repeating frames, frame interpolation can alleviate the stuttering of the resulting slow-shot segment, so that it plays back more smoothly.
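As one illustration of the default uniform-motion (linear-model) case, the sketch below blends neighbouring frames to form midpoint frames and repeats the pass until a power-of-two slow-motion magnification is reached. Cross-fading two frames is a stand-in for a real motion-aware interpolation, such as the optical-flow scheme sketched earlier.

```python
import numpy as np

def linear_intermediate(a, b, alpha=0.5):
    # Under the uniform-motion default, a midpoint frame may be approximated
    # by blending the two neighbouring frames (an illustrative simplification).
    return ((1 - alpha) * a.astype(np.float32)
            + alpha * b.astype(np.float32)).astype(np.uint8)

def slow_shot(frames, magnification=4):
    """Each pass doubles the frame count; repeat until the (assumed
    power-of-two) slow-motion magnification requirement is met."""
    while magnification > 1:
        out = []
        for a, b in zip(frames, frames[1:]):
            out += [a, linear_intermediate(a, b)]
        frames = out + [frames[-1]]
        magnification //= 2
    return frames
```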
In a possible implementation manner, the determining at least one intermediate frame according to a video frame in the to-be-processed segment may include:
determining a first frame number of an intermediate frame;
and determining the intermediate frames of the first frame number according to the video frames in the segment to be processed.
For example, the first frame number is the number of intermediate frames inserted into the to-be-processed segment, and may be preset or determined by the to-be-processed segment. After the first frame number of the intermediate frames is obtained, the intermediate frames with the first frame number can be determined according to the video frames in the segment to be processed, and the intermediate frames are inserted into the segment to be processed to obtain the slow shot segment corresponding to the segment to be processed.
In one possible implementation, determining the first frame number of the intermediate frame may include:
determining a first frame number of the intermediate frames according to the semantic categories to which the semantic description of the starting frame and the semantic description of the ending frame of the segment to be processed belong; or,

determining a first frame number of the intermediate frames in response to a setting operation for a slow-motion factor; or,
and determining the first frame number of the intermediate frame according to the duration of the segment to be processed.
For example, the semantic categories may be categories divided according to the keywords corresponding to the starting frame semantic description and the ending frame semantic description. For example, keywords related to sports may be assigned to a sports category, keywords related to a face or a human body to a person category, keywords related to scenery to a landscape category, and so on.

Different semantic categories may correspond to different first frame numbers. For example, the sports category may correspond to first frame number 1, the person category to first frame number 2, and the landscape category to first frame number 3, where, illustratively, first frame number 1 may be greater than first frame number 2, and first frame number 2 greater than first frame number 3. The semantic categories to which the starting frame semantic description and the ending frame semantic description belong can be determined from their corresponding keywords, and the first frame number of the intermediate frames in the to-be-processed segment can then be determined from those categories.
Alternatively, different slow-motion factors may correspond to different first frame numbers, for example: the larger the slow-motion factor, the larger the corresponding first frame number. A slow-motion factor option or an input box for setting the factor may be displayed on the display interface of the terminal device; the user can set the slow-motion factor of the to-be-processed segment through a selection operation or by entering the factor manually, and the first frame number of the intermediate frames is then determined from the set factor.
Alternatively, to-be-processed segments of different durations may correspond to different first frame numbers, for example: the longer the duration of the to-be-processed segment, the smaller the first frame number may be set, and conversely, the shorter the duration, the larger the first frame number may be set. Further, the first frame number of the intermediate frames may be determined from the duration of the to-be-processed segment, for example: a duration within duration range 1 corresponds to first frame number 1, and a duration within duration range 2 corresponds to first frame number 2.
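A minimal sketch of the three ways of determining the first frame number described above; the concrete mappings and thresholds are illustrative assumptions, since the disclosure leaves them open:

```python
# Illustrative mappings only; the disclosure does not fix these values.
CATEGORY_TO_FRAMES = {"sports": 7, "person": 3, "landscape": 1}
DURATION_RANGES = [(0.0, 2.0, 7), (2.0, 5.0, 3), (5.0, float("inf"), 1)]

def frames_from_category(category: str) -> int:
    # Fast-moving content (sports) gets more intermediate frames.
    return CATEGORY_TO_FRAMES.get(category, 1)

def frames_from_factor(slow_motion_factor: int) -> int:
    # A factor of N needs N - 1 intermediate frames per frame gap.
    return max(slow_motion_factor - 1, 0)

def frames_from_duration(duration_s: float) -> int:
    # Shorter segments get more intermediate frames, longer ones fewer.
    for low, high, n in DURATION_RANGES:
        if low <= duration_s < high:
            return n
    return 1
```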
In this way, the first frame number of the intermediate frames can be adjusted adaptively according to the to-be-processed segment, which can improve user experience.
In one possible implementation, the method may further include:
and performing video processing on the slow-motion segment according to the semantic descriptions of the video frames in the slow-motion segment, to obtain a slow-motion segment with a specific playing effect.
For example, after the slow-motion segment is obtained, a video processing mode for it may be determined according to the semantic descriptions of its video frames, and the segment may then be processed accordingly. For example, for video frames whose semantic descriptions include balls such as a basketball or football, a spark special effect may be added so that the ball appears to trail flames, yielding a visually striking slow-motion segment; for video frames whose semantic descriptions include actions such as hugging or holding, a filter with a flickering heart-shaped pattern may be added, yielding a warm, romantic slow-motion segment. Alternatively, one semantic description may correspond to multiple video processing modes, which can be displayed on the display interface so that the user selects a processing mode for the slow-motion segment according to preference.
The correspondence between semantic descriptions and video processing modes may be set as required; the embodiments of the present disclosure do not specifically limit this correspondence.
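One way such a correspondence could be held and queried is sketched below; the keywords and effect names are hypothetical placeholders, not values defined by the disclosure:

```python
# Hypothetical keyword-to-effect table; the disclosure leaves this mapping open.
EFFECTS_BY_KEYWORD = {
    "basketball": ["spark_effect"],
    "football": ["spark_effect"],
    "hug": ["heart_filter"],
}

def candidate_effects(frame_descriptions):
    """Collect the effects whose keywords appear in any video frame's
    semantic description; a UI may then let the user pick one."""
    effects = []
    for description in frame_descriptions:
        for keyword, names in EFFECTS_BY_KEYWORD.items():
            if keyword in description:
                effects.extend(n for n in names if n not in effects)
    return effects
```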
In this way, video processing can be applied to the slow-motion segment adaptively according to the to-be-processed segment, obtaining a slow-motion segment with a specific playing effect and improving user experience.
In a possible implementation manner, the determining at least one intermediate frame according to a video frame in the to-be-processed segment may include:
acquiring, in the to-be-processed segment, a first optical flow map from the t-th video frame to the (t-1)-th video frame, a second optical flow map from the t-th video frame to the (t+1)-th video frame, a third optical flow map from the (t+1)-th video frame to the t-th video frame, and a fourth optical flow map from the (t+1)-th video frame to the (t+2)-th video frame, where t is an integer;

determining a first frame-interpolation optical flow map from the first and second optical flow maps, and a second frame-interpolation optical flow map from the third and fourth optical flow maps;

determining a first interpolated image from the first frame-interpolation optical flow map and the t-th video frame, and a second interpolated image from the second frame-interpolation optical flow map and the (t+1)-th video frame;

and fusing the first interpolated image and the second interpolated image to obtain the intermediate frame inserted between the t-th video frame and the (t+1)-th video frame.
For example, the t-th and (t+1)-th video frames may be the two video frames in the to-be-processed segment between which a frame is to be inserted, and the (t-1)-th, t-th, (t+1)-th and (t+2)-th video frames are four consecutive frames. For example, the video frame immediately preceding the t-th video frame may be taken as the (t-1)-th video frame, and the video frame immediately following the (t+1)-th video frame may be taken as the (t+2)-th video frame.
Here, an optical flow map is image information describing the change of a target object in an image, composed of the optical flows of the target object at the respective positions. Optical flow prediction may be performed on the t-th and (t-1)-th video frames to determine the first optical flow map from the t-th frame to the (t-1)-th frame; on the t-th and (t+1)-th frames to determine the second optical flow map from the t-th frame to the (t+1)-th frame; on the (t+1)-th and t-th frames to determine the third optical flow map from the (t+1)-th frame to the t-th frame; and on the (t+1)-th and (t+2)-th frames to determine the fourth optical flow map from the (t+1)-th frame to the (t+2)-th frame. Optical flow prediction may be implemented by a neural network pre-trained for optical flow prediction, or by other methods, which the present disclosure does not describe in detail.
Assuming that elements in the video move with uniform acceleration, the optical flow value at any position in the first frame-interpolation optical flow map can be determined from the change of the optical flow values at that position across the first and second optical flow maps, and the optical flow value at any position in the second frame-interpolation optical flow map can be determined from the change across the third and fourth optical flow maps. The first interpolated optical flow may be determined by Formula 1 below, the first interpolated optical flows of all elements forming the first frame-interpolation optical flow map; the second interpolated optical flow may be determined by Formula 2 below, the second interpolated optical flows of all elements forming the second frame-interpolation optical flow map.
f_{0→s}(x_0) = (f_{0→1} + f_{0→−1})/2 · s² + (f_{0→1} − f_{0→−1})/2 · s    (Formula 1)

where f_{0→s} denotes the first interpolated optical flow of an element from the video frame corresponding to time 0 to the first interpolated image corresponding to time s; f_{0→1} denotes the second optical flow of the element from the video frame corresponding to time 0 to the video frame corresponding to time 1; f_{0→−1} denotes the first optical flow of the element from the video frame corresponding to time 0 to the video frame corresponding to time −1; and x_0 denotes the position of the element in the video frame corresponding to time 0.
f_{1→s}(x_0) = (f_{1→0} + f_{1→2})/2 · (1−s)² + (f_{1→0} − f_{1→2})/2 · (1−s)    (Formula 2)

where f_{1→s} denotes the second interpolated optical flow of an element from the video frame corresponding to time 1 to the second interpolated image corresponding to time s; f_{1→0} denotes the third optical flow of the element from the video frame corresponding to time 1 to the video frame corresponding to time 0; and f_{1→2} denotes the fourth optical flow of the element from the video frame corresponding to time 1 to the video frame corresponding to time 2.
For example, the first frame-interpolation optical flow map is the optical flow map from the t-th video frame to the first interpolated image, so the first interpolated image can be obtained by using this map to guide the motion of the t-th video frame; similarly, the second frame-interpolation optical flow map is the optical flow map from the (t+1)-th video frame to the second interpolated image, so the second interpolated image can be obtained by using that map to guide the motion of the (t+1)-th video frame.
The first interpolated image and the second interpolated image may then be fused (for example, superimposed), and the fusion result is the intermediate frame inserted between the t-th video frame and the (t+1)-th video frame.
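Under the uniform-acceleration assumption, Formulas 1 and 2 and the subsequent warp-and-fuse step can be sketched as follows; the flow maps are assumed to come from any optical flow predictor (for example a pre-trained network), the warp shown is a crude nearest-pixel forward warp, and simple averaging is only one possible fusion choice:

```python
import numpy as np

def interpolated_flows(f_t_prev, f_t_next, f_t1_t, f_t1_t2, s):
    """Formulas 1 and 2: quadratic flow interpolation for time s in (0, 1).
    f_t_prev/f_t_next are the first/second optical flow maps from frame t;
    f_t1_t/f_t1_t2 are the third/fourth optical flow maps from frame t+1."""
    f_t_s = (f_t_next + f_t_prev) / 2 * s ** 2 \
        + (f_t_next - f_t_prev) / 2 * s
    f_t1_s = (f_t1_t + f_t1_t2) / 2 * (1 - s) ** 2 \
        + (f_t1_t - f_t1_t2) / 2 * (1 - s)
    return f_t_s, f_t1_s

def warp(image, flow):
    """Move each pixel by its flow vector (nearest-pixel forward warp;
    practical systems usually use differentiable backward warping)."""
    h, w = image.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    tx = np.clip(np.round(xs + flow[..., 0]).astype(int), 0, w - 1)
    ty = np.clip(np.round(ys + flow[..., 1]).astype(int), 0, h - 1)
    out = np.zeros_like(image)
    out[ty, tx] = image[ys, xs]
    return out

def intermediate_frame(frame_t, frame_t1, f_t_s, f_t1_s):
    """Warp both neighbouring frames to time s and fuse by averaging."""
    a = warp(frame_t, f_t_s).astype(np.float32)
    b = warp(frame_t1, f_t1_s).astype(np.float32)
    return ((a + b) / 2).astype(frame_t.dtype)
```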
In this way, the video processing method provided by the embodiments of the present disclosure can determine intermediate frames from multiple video frames and sense the acceleration of object motion in the video, which improves the precision of the obtained intermediate frames, so that the resulting slow-motion segment is smoother and more natural, with a better visual effect.
It should be noted that the above scheme is only one way of determining intermediate frames in the embodiments of the present disclosure; in fact, any frame interpolation scheme may be applied, such as the Super SloMo frame interpolation algorithm or the MEMC-Net motion-compensation network, and the embodiments of the present disclosure do not specifically limit how intermediate frames are generated.
In a possible implementation manner, performing video processing on the at least one to-be-processed segment to obtain a video segment with a specific playing effect may include:
displaying the at least one to-be-processed fragment;
responding to the selection operation of any to-be-processed segment, and performing video processing on the selected to-be-processed segment to obtain a video segment with a specific playing effect.
Taking slow-motion processing as an example, after at least one to-be-processed segment of the video data is obtained, it may be displayed on a display interface of the terminal device. For example, the to-be-processed segments may be displayed directly on the interface, and a segment is played in response to the user's play operation on it. Alternatively, the to-be-processed segments may be marked on the playback progress bar of the video data, and the user may drag the progress bar to a segment's mark to play that segment.
After the to-be-processed segments are displayed, the selected segment can be determined through a selection operation (which may include double-clicking the segment, clicking its selection box, sliding or long-pressing it, and the like). In response to the user's request for slow-motion processing (sent, for example, by triggering a corresponding control or issuing a voice instruction), frame interpolation is performed on the selected segment to obtain the corresponding slow-motion segment.
In this way, after multiple to-be-processed segments are obtained, the user can choose which of them to process for the specific playing effect. For example, after the to-be-processed segments corresponding to multiple shots of a basketball game video are obtained, if the user only wants slow-motion processing for the segments featuring star A, the user can perform the selection operation on those segments alone, which improves user experience and enriches the ways video can be processed.
In one possible implementation, the method may further include:
in response to a sharing operation for the video segment with the specific playing effect, sharing the video segment with the specific playing effect to a specified platform; or,

in response to a saving operation for the video segment with the specific playing effect, saving the video segment with the specific playing effect locally;

or, in response to a replacement operation for the video segment with the specific playing effect, replacing the to-be-processed segment in the video data with the video segment with the specific playing effect, so as to obtain processed video data.
Taking slow-motion processing as an example, after a slow-motion segment is obtained, it may be shared to a specified platform or a specified user in response to a sharing operation for it (for example, issuing a voice instruction or triggering a sharing control). For example, after obtaining the slow-motion segment of star A, the user can share it to a social platform or to at least one friend in the contact list, enabling slow-motion sharing and enriching the ways video can be processed.
Alternatively, after the slow-motion segment is obtained, it may be saved locally or to a designated area in response to a saving operation (for example, long-pressing, double-clicking, or sliding the segment, or triggering a saving control).
Alternatively, after the slow-motion segment is obtained, in response to a replacement operation (for example, triggering a replacement control), the to-be-processed segment in the video data is replaced with its corresponding slow-motion segment to obtain the processed first video data, improving the efficiency of producing a slow-motion effect in post-processing.
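As a minimal illustration of the replacement operation, with video data treated as a frame list and the segment boundaries known (both assumptions for this sketch):

```python
def replace_segment(frames, start, end, slow_motion_frames):
    """Splice the slow-motion segment over frames[start:end + 1] of the
    original video data to obtain the processed first video data."""
    return frames[:start] + slow_motion_frames + frames[end + 1:]
```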
In a possible implementation manner, the performing video processing on the at least one to-be-processed segment to obtain a video segment with a specific playing effect may include:
determining at least two to-be-merged segments from the at least one to-be-processed segment;
determining a first to-be-merged segment from the to-be-merged segments;
performing video processing on the first to-be-combined segment to obtain a video segment with a specific playing effect corresponding to the first to-be-combined segment;
and merging a second segment to be merged with the video segment with the specific playing effect corresponding to the first segment to be merged to obtain a merged video segment, wherein the second segment to be merged is a segment of the segments to be merged except the first segment to be merged.
For example, taking slow-motion processing as an example, after at least one to-be-processed segment is obtained, at least two to-be-merged segments may be determined from the to-be-processed segments, for example according to the semantic descriptions of their video frames or their durations. Alternatively, the selected to-be-processed segments may be designated as the to-be-merged segments through a selection operation (which may include double-clicking a segment, clicking its selection box, sliding or long-pressing it, and the like).
For example, the to-be-merged segments may be merged in response to a merging operation on them (e.g., triggering a merge control) to obtain the merged video segment. In this way, after multiple to-be-processed segments are obtained, the user can choose which segments to merge. For example, after the to-be-processed segments corresponding to multiple shots of a basketball game video are obtained, the user can select several of the more exciting segments and combine them into a highlight reel of the game's best shots, which improves user experience while enriching and simplifying the ways video can be processed.
Alternatively, corresponding to-be-processed segments may be extracted from different video data and stored in a to-be-processed segment set as clipping material. In response to the user's selection operation on segments in that set, multiple to-be-processed segments can be merged into one video. For example, if a user wants to cut several movies into a single work, those movies can be taken as the video data, at least one corresponding starting frame semantic description and ending frame semantic description can be set, and at least one corresponding to-be-processed segment can be extracted from the movies. The final cut is then obtained by selecting and merging the extracted segments, which improves the clipping efficiency of video data and simplifies the clipping operation.
Alternatively, after multiple to-be-merged segments are determined, a first to-be-merged segment to receive slow-motion processing may be determined from them; the segments other than the first to-be-merged segment are the second to-be-merged segments.
For example, the first to-be-merged segment may be determined according to duration, e.g., at least one of the shorter to-be-merged segments may be taken as the first to-be-merged segment. Alternatively, it may be determined according to the semantic similarity between each segment's starting frame semantic description and/or ending frame semantic description and the set starting frame semantic description and/or ending frame semantic description, e.g., at least one to-be-merged segment with higher semantic similarity may be taken as the first to-be-merged segment.
In this way, after the to-be-merged segments are determined, the first to-be-merged segment to receive slow-motion processing can be determined adaptively and processed, which can improve the video processing effect and user experience.
For example, in response to a selection operation on at least one to-be-merged segment (which may include double-clicking the segment, clicking its selection box, sliding or long-pressing it, etc.), the first to-be-merged segment to receive slow-motion processing is selected from the to-be-merged segments, and frame interpolation is performed on it to obtain the corresponding slow-motion segment. The second to-be-merged segments may then be merged with the slow-motion segment corresponding to the first to-be-merged segment (for example, in response to a merging operation on the to-be-merged segments) to obtain the merged video segment.
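One possible sketch of this merge flow, assuming each segment is a frame list and reusing any slow-motion routine (such as the earlier sketch); picking the shortest segment stands in for the duration-based selection mentioned above:

```python
def merge_with_slow_motion(segments, slow_motion):
    """Pick the first to-be-merged segment (here: the shortest one), apply
    slow-motion processing to it, then concatenate all segments in order."""
    first = min(range(len(segments)), key=lambda i: len(segments[i]))
    merged = []
    for i, segment in enumerate(segments):
        merged.extend(slow_motion(segment) if i == first else segment)
    return merged
```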
In this way, after multiple to-be-processed segments are obtained, the user can choose which segments to merge. For example, after the to-be-processed segments corresponding to multiple shots of a basketball game video are obtained, the user can select several of the more exciting segments, apply slow-motion processing to them, and combine them into a highlight reel of the game's best shots, which improves user experience while enriching and simplifying the ways video can be processed.
In a possible implementation manner, the performing video processing on the at least one to-be-processed segment to obtain a video segment with a specific playing effect may include:
determining at least one target to-be-processed segment from the at least one to-be-processed segment;
and carrying out special effect superposition on the at least one target to-be-processed fragment to obtain a special effect fragment.
For example, various special effect templates, such as filter templates and sound effect templates, may be preset. After at least one to-be-processed segment is obtained, at least one target to-be-processed segment may be determined from the to-be-processed segments. For example, the target segment can be determined according to the semantic descriptions of the video frames in the segments or according to the segments' durations. Alternatively, in response to the user's selection operation on at least one to-be-processed segment (which may include double-clicking the segment, clicking its selection box, sliding or long-pressing it, and the like), the segment corresponding to that selection operation is determined to be the target to-be-processed segment. In response to a selection operation on a special effect template, the effect corresponding to the selected template is added to the target to-be-processed segment to obtain the corresponding special effect segment.
For example, if a user wants to add an aesthetic filter to all segments of the video data that contain flowers, all such video segments can be obtained by setting corresponding starting frame and ending frame semantic descriptions (for example, setting the starting frame semantic description to "flower" and the ending frame semantic description to "no flower") and matching them against the semantic descriptions of all the video frames of the video data; these segments are taken as to-be-processed segments and displayed on the display interface for the user to choose from. After the user selects the target to-be-processed segment through a selection operation, the corresponding effect (such as the aesthetic filter) can be chosen from the preset special effect templates. In response to the user's selection of the aesthetic filter, it is superimposed on the target segment to obtain the video data with the filter added. The special effect segment can subsequently be processed further with slow motion, fast playback, reverse playback and the like, to further realize the specific playing effect.
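A minimal sketch of special effect superposition, where an effect is modelled as a per-frame function (a simplifying assumption; real filters may operate on the whole clip):

```python
def apply_effect(segments, target_indices, effect):
    """Superimpose the chosen effect on every frame of each selected
    target to-be-processed segment, leaving other segments unchanged."""
    return [
        [effect(frame) for frame in segment] if i in target_indices else segment
        for i, segment in enumerate(segments)
    ]
```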
This enriches the ways a user can process video data, allows the user to clip video data or perform the corresponding processing operations, and improves video processing efficiency.
In one possible implementation, the method may further include:
determining a processing mode of at least one to-be-processed segment;
the performing video processing on the at least one to-be-processed segment to obtain a video segment with a specific playing effect may include:
and processing the to-be-processed segment in the video data according to the processing mode of the at least one to-be-processed segment to realize a specific playing effect, so as to obtain processed second video data.
For example, a user may encounter shots they would rather not watch in video data. For instance, some viewers like horror films but fear certain shots in them (for example, bloody shots). Before watching the video data, the to-be-processed segments the user prefers not to watch can be located by setting the corresponding starting frame and ending frame semantic descriptions, and when setting those descriptions, the processing mode for the segments can be set as well (for example, adding a mosaic effect, deleting the segments from the video data, or fast-forwarding through them). After a to-be-processed segment is obtained, it is processed according to that processing mode (e.g., deleted) to obtain the processed second video data.
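A sketch of dispatching on the user-set processing mode; the mode names and the mosaic helper are illustrative assumptions, not defined by the disclosure:

```python
import numpy as np

def pixelate(frame, block=16):
    """Crude mosaic: sample one pixel per block and expand it back."""
    h, w = frame.shape[:2]
    small = frame[::block, ::block]
    return np.repeat(np.repeat(small, block, axis=0), block, axis=1)[:h, :w]

def apply_processing_mode(frames, start, end, mode):
    """Process frames[start:end + 1] according to the preset mode to
    obtain the processed second video data."""
    segment = frames[start:end + 1]
    if mode == "delete":
        segment = []
    elif mode == "fast_forward":
        segment = segment[::2]  # keep every other frame to double the speed
    elif mode == "mosaic":
        segment = [pixelate(f) for f in segment]
    return frames[:start] + segment + frames[end + 1:]
```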
In this way, the user can customize the to-be-processed segments in the video data as needed to obtain the desired specific playing effect, which can increase the enjoyment and user experience of watching the video data.
It is understood that the above method embodiments of the present disclosure can be combined with one another to form combined embodiments without departing from the principles and logic; due to space limitations, the details are not repeated in the present disclosure. Those skilled in the art will appreciate that, in the above methods of the specific embodiments, the specific order of execution of the steps should be determined by their functions and possible inherent logic.
In addition, the present disclosure also provides a video processing apparatus, an electronic device, a computer-readable storage medium, and a program, all of which can be used to implement any video processing method provided by the present disclosure; for the corresponding technical solutions and descriptions, refer to the corresponding descriptions in the methods section, which are not repeated here.
Fig. 4 shows a block diagram of a video processing apparatus according to an embodiment of the present disclosure, which, as shown in fig. 4, includes:
an obtaining module 41, which may be configured to obtain video data;
a first determining module 42, configured to perform semantic analysis on a video frame in the video data, and determine at least one to-be-processed segment from the video data according to a semantic analysis result of the video frame;
the first processing module 43 may be configured to perform video processing on the at least one to-be-processed segment to obtain a video segment with a specific playing effect.
In this way, after video data is acquired, semantic analysis can be performed on video frames in the video data, at least one to-be-processed segment is determined from the video data according to a semantic analysis result of the video frames, and then video processing is performed on the at least one to-be-processed segment, so that a video segment with a specific playing effect is obtained. According to the video processing device provided by the embodiment of the disclosure, at least one segment to be processed can be automatically determined from video data according to the semantic analysis result of a video frame in the video data, and corresponding video processing is performed, so that the processing efficiency of the video segment with a specific playing effect can be improved, the processing mode of the video data is enriched, and the operation of a user can be simplified.
In a possible implementation manner, the first determining module may be further configured to:
performing semantic analysis on a video frame in the video data, and determining a starting frame and an ending frame from the video data according to a semantic analysis result of the video frame;
and obtaining at least one to-be-processed segment according to the starting frame and the ending frame.
In a possible implementation manner, the first determining module may be further configured to:
when the semantic description of a first video frame in the video data is matched with the semantic description of a starting frame, determining the first video frame as the starting frame;
and when the semantic description of a second video frame after the first video frame is matched with the semantic description of an end frame, determining that the second video frame is the end frame.
In a possible implementation manner, the first determining module may be further configured to:
sequentially carrying out first semantic analysis on video frames in video data from a video frame at a preset position in the video data, and determining a starting frame in the video data according to a first semantic analysis result of the video frames;
and after the starting frame is determined, sequentially performing second semantic analysis on the video frames after the starting frame, and judging from the second semantic analysis result of each video frame whether it is the end frame, until the end frame is determined.
In a possible implementation manner, the first determining module may be further configured to:
performing semantic analysis on a first video frame in the video data to obtain semantic description of the first video frame;
determining candidate starting frame semantic descriptions corresponding to the first video frame according to keywords in the semantic descriptions of the first video frame;
determining the first video frame as a starting frame if a first starting frame semantic description matched with the semantic description of the first video frame exists in the candidate starting frame semantic descriptions;
determining a second video frame subsequent to the first video frame as an end frame when the semantic description of the second video frame matches a first end frame semantic description, wherein the first end frame semantic description corresponds to the first start frame semantic description.
In a possible implementation manner, the first processing module may be further configured to:
and performing frame interpolation processing on the at least one to-be-processed segment to obtain a slow-motion segment corresponding to the to-be-processed segment.
In a possible implementation manner, the first processing module may be further configured to:
determining at least one intermediate frame according to the video frame in the segment to be processed;
and inserting the intermediate frame into the segment to be processed to obtain a slow-motion segment corresponding to the segment to be processed.
In one possible implementation, the apparatus may further include:
and the second processing module is used for performing video processing on the slow-motion segment according to the semantic descriptions of the video frames in the slow-motion segment, to obtain a slow-motion segment with a specific playing effect.
In a possible implementation manner, the first processing module may be further configured to:
determining a first frame number of an intermediate frame;
and determining the intermediate frames of the first frame number according to the video frames in the segment to be processed.
In a possible implementation manner, the first processing module may be further configured to:
determining a first frame number of the intermediate frames according to the semantic categories to which the semantic description of the starting frame and the semantic description of the ending frame of the segment to be processed belong; or,

determining a first frame number of the intermediate frames in response to a setting operation for a slow-motion factor; or,
and determining the first frame number of the intermediate frame according to the duration of the segment to be processed.
In a possible implementation manner, the first processing module may be further configured to:
acquiring, in the segment to be processed, a first optical flow map from the t-th frame image to the (t-1)-th frame image, a second optical flow map from the t-th frame image to the (t+1)-th frame image, a third optical flow map from the (t+1)-th frame image to the t-th frame image, and a fourth optical flow map from the (t+1)-th frame image to the (t+2)-th frame image, where t is an integer;

determining a first frame-interpolation optical flow map from the first and second optical flow maps, and a second frame-interpolation optical flow map from the third and fourth optical flow maps;

determining a first interpolated image from the first frame-interpolation optical flow map and the t-th frame image, and a second interpolated image from the second frame-interpolation optical flow map and the (t+1)-th frame image;

and fusing the first interpolated image and the second interpolated image to obtain the intermediate frame inserted between the t-th frame image and the (t+1)-th frame image.
In a possible implementation manner, the first processing module may be further configured to:
determining at least two fragments to be merged from the fragments to be processed;
determining a first to-be-merged segment from the to-be-merged segments;
performing video processing on the first to-be-combined segment to obtain a video segment with a specific playing effect corresponding to the first to-be-combined segment;
and merging a second segment to be merged with the video segment with the specific playing effect corresponding to the first segment to be merged to obtain a merged video segment, wherein the second segment to be merged is a segment of the segments to be merged except the first segment to be merged.
In a possible implementation manner, the first processing module may be further configured to:
determining at least one target to-be-processed segment from the at least one to-be-processed segment;
and carrying out special effect superposition on the at least one target to-be-processed fragment to obtain a special effect fragment.
In one possible implementation, the apparatus may further include:
the second determining module is used for determining the processing mode of at least one to-be-processed segment;
the first processing module may be further configured to:
and processing the to-be-processed segment in the video data according to the processing mode of the at least one to-be-processed segment to realize a specific playing effect, so as to obtain processed second video data.
In some embodiments, functions of or modules included in the apparatus provided in the embodiments of the present disclosure may be used to execute the method described in the above method embodiments, and specific implementation thereof may refer to the description of the above method embodiments, and for brevity, will not be described again here.
Embodiments of the present disclosure also provide a computer-readable storage medium having stored thereon computer program instructions, which when executed by a processor, implement the above-mentioned method. The computer readable storage medium may be a non-volatile computer readable storage medium.
An embodiment of the present disclosure further provides an electronic device, including: a processor; a memory for storing processor-executable instructions; wherein the processor is configured to invoke the memory-stored instructions to perform the above-described method.
The disclosed embodiments also provide a computer program product comprising computer readable code, which when run on a device, a processor in the device executes instructions for implementing the video processing method provided in any of the above embodiments.
The embodiments of the present disclosure also provide another computer program product for storing computer readable instructions, which when executed cause a computer to perform the operations of the video processing method provided in any of the above embodiments.
The electronic device may be provided as a terminal, server, or other form of device.
Fig. 5 illustrates a block diagram of an electronic device 800 in accordance with an embodiment of the disclosure. For example, the electronic device 800 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, a fitness device, a personal digital assistant, or the like terminal.
Referring to fig. 5, electronic device 800 may include one or more of the following components: processing component 802, memory 804, power component 806, multimedia component 808, audio component 810, input/output (I/O) interface 812, sensor component 814, and communication component 816.
The processing component 802 generally controls overall operation of the electronic device 800, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing components 802 may include one or more processors 820 to execute instructions to perform all or a portion of the steps of the methods described above. Further, the processing component 802 can include one or more modules that facilitate interaction between the processing component 802 and other components. For example, the processing component 802 can include a multimedia module to facilitate interaction between the multimedia component 808 and the processing component 802.
The memory 804 is configured to store various types of data to support operations at the electronic device 800. Examples of such data include instructions for any application or method operating on the electronic device 800, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 804 may be implemented by any type or combination of volatile or non-volatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.
The power supply component 806 provides power to the various components of the electronic device 800. The power components 806 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the electronic device 800.
The multimedia component 808 includes a screen that provides an output interface between the electronic device 800 and a user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive an input signal from a user. The touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 808 includes a front facing camera and/or a rear facing camera. The front camera and/or the rear camera may receive external multimedia data when the electronic device 800 is in an operation mode, such as a shooting mode or a video mode. Each front camera and rear camera may be a fixed optical lens system or have a focal length and optical zoom capability.
The audio component 810 is configured to output and/or input audio signals. For example, the audio component 810 includes a Microphone (MIC) configured to receive external audio signals when the electronic device 800 is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signals may further be stored in the memory 804 or transmitted via the communication component 816. In some embodiments, audio component 810 also includes a speaker for outputting audio signals.
The I/O interface 812 provides an interface between the processing component 802 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.
The sensor assembly 814 includes one or more sensors for providing various aspects of state assessment for the electronic device 800. For example, the sensor assembly 814 may detect an open/closed state of the electronic device 800, the relative positioning of components, such as a display and keypad of the electronic device 800, the sensor assembly 814 may also detect a change in the position of the electronic device 800 or a component of the electronic device 800, the presence or absence of user contact with the electronic device 800, orientation or acceleration/deceleration of the electronic device 800, and a change in the temperature of the electronic device 800. Sensor assembly 814 may include a proximity sensor configured to detect the presence of a nearby object without any physical contact. The sensor assembly 814 may also include a light sensor, such as a Complementary Metal Oxide Semiconductor (CMOS) or Charge Coupled Device (CCD) image sensor, for use in imaging applications. In some embodiments, the sensor assembly 814 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 816 is configured to facilitate wired or wireless communication between the electronic device 800 and other devices. The electronic device 800 may access a wireless network based on a communication standard, such as a wireless network (WiFi), a second generation mobile communication technology (2G) or a third generation mobile communication technology (3G), or a combination thereof. In an exemplary embodiment, the communication component 816 receives a broadcast signal or broadcast related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 816 further includes a Near Field Communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, infrared data association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the electronic device 800 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, micro-controllers, microprocessors or other electronic components for performing the above-described methods.
In an exemplary embodiment, a non-transitory computer-readable storage medium, such as the memory 804, is also provided that includes computer program instructions executable by the processor 820 of the electronic device 800 to perform the above-described methods.
Fig. 6 illustrates a block diagram of an electronic device 1900 in accordance with an embodiment of the disclosure. For example, the electronic device 1900 may be provided as a server. Referring to fig. 6, electronic device 1900 includes a processing component 1922 further including one or more processors and memory resources, represented by memory 1932, for storing instructions, e.g., applications, executable by processing component 1922. The application programs stored in memory 1932 may include one or more modules that each correspond to a set of instructions. Further, the processing component 1922 is configured to execute instructions to perform the above-described method.
The electronic device 1900 may also include a power component 1926 configured to perform power management of the electronic device 1900, a wired or wireless network interface 1950 configured to connect the electronic device 1900 to a network, and an input/output (I/O) interface 1958. The electronic device 1900 may operate based on an operating system stored in the memory 1932, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, or the like.
In an exemplary embodiment, a non-transitory computer readable storage medium, such as the memory 1932, is also provided that includes computer program instructions executable by the processing component 1922 of the electronic device 1900 to perform the above-described methods.
The present disclosure may be systems, methods, and/or computer program products. The computer program product may include a computer-readable storage medium having computer-readable program instructions embodied thereon for causing a processor to implement various aspects of the present disclosure.
The computer readable storage medium may be a tangible device that can hold and store the instructions for use by the instruction execution device. The computer readable storage medium may be, for example, but not limited to, an electronic memory device, a magnetic memory device, an optical memory device, an electromagnetic memory device, a semiconductor memory device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a Static Random Access Memory (SRAM), a portable compact disc read-only memory (CD-ROM), a Digital Versatile Disc (DVD), a memory stick, a floppy disk, a mechanical coding device, such as punch cards or in-groove projection structures having instructions stored thereon, and any suitable combination of the foregoing. Computer-readable storage media as used herein is not to be construed as transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission medium (e.g., optical pulses through a fiber optic cable), or electrical signals transmitted through electrical wires.
The computer-readable program instructions described herein may be downloaded from a computer-readable storage medium to a respective computing/processing device, or to an external computer or external storage device via a network, such as the internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, fiber optic transmission, wireless transmission, routers, firewalls, switches, gateway computers and/or edge servers. The network adapter card or network interface in each computing/processing device receives computer-readable program instructions from the network and forwards the computer-readable program instructions for storage in a computer-readable storage medium in the respective computing/processing device.
The computer program instructions for carrying out operations of the present disclosure may be assembler instructions, Instruction Set Architecture (ISA) instructions, machine-related instructions, microcode, firmware instructions, state setting data, or source or object code written in any combination of one or more programming languages, including an object oriented programming language such as Smalltalk or C++, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, electronic circuitry, such as a programmable logic circuit, a field programmable gate array (FPGA), or a programmable logic array (PLA), can execute the computer-readable program instructions and thereby implement aspects of the present disclosure by utilizing the state information of the computer-readable program instructions to personalize the electronic circuitry.
Various aspects of the present disclosure are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the disclosure. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer-readable program instructions.
These computer-readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer-readable program instructions may also be stored in a computer-readable storage medium that can direct a computer, programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer-readable medium storing the instructions comprises an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer, other programmable apparatus or other devices implement the functions/acts specified in the flowchart and/or block diagram block or blocks.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The computer program product may be embodied in hardware, software, or a combination thereof. In an alternative embodiment, the computer program product is embodied in a computer storage medium; in another alternative embodiment, it is embodied in a software product, such as a Software Development Kit (SDK).
Having described embodiments of the present disclosure, the foregoing description is intended to be exemplary, not exhaustive, and not limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein is chosen in order to best explain the principles of the embodiments, the practical application, or improvements made to the technology in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Claims (17)

1. A method of video processing, the method comprising:
acquiring video data;
performing semantic analysis on a video frame in the video data, and determining at least one to-be-processed segment from the video data according to a semantic analysis result of the video frame;
and performing video processing on the at least one to-be-processed segment to obtain a video segment with a specific playing effect.
2. The method according to claim 1, wherein the performing semantic analysis on the video frame in the video data, and determining at least one to-be-processed segment from the video data according to a result of the semantic analysis on the video frame comprises:
performing semantic analysis on a video frame in the video data, and determining a starting frame and an ending frame from the video data according to a semantic analysis result of the video frame;
and obtaining at least one to-be-processed segment according to the starting frame and the ending frame.
3. The method of claim 2, wherein performing semantic analysis on a video frame in the video data and determining a start frame and an end frame from the video data according to the semantic analysis result of the video frame comprises:
when the semantic description of a first video frame in the video data matches a start-frame semantic description, determining the first video frame as the start frame;
and when the semantic description of a second video frame after the first video frame matches an end-frame semantic description, determining the second video frame as the end frame.
4. The method according to claim 2 or 3, wherein performing semantic analysis on a video frame in the video data, and determining a start frame and an end frame from the video data according to a result of the semantic analysis on the video frame comprises:
starting from a video frame at a preset position in the video data, sequentially performing first semantic analysis on the video frames, and determining the start frame in the video data according to first semantic analysis results of the video frames;
and after the start frame is determined, sequentially performing second semantic analysis on the video frames after the start frame, and judging, according to second semantic analysis results, whether a video frame after the start frame is the end frame, until the end frame is determined.
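Illustration only: one way to realize the scan of claims 3-4 — a first matcher runs until the start frame is found, then a second matcher runs only on the frames after it until the end frame appears. `matches_start` and `matches_end` are assumed predicates over semantic descriptions, not APIs from the patent.
```python
def select_segments(descriptions, matches_start, matches_end):
    segments, start = [], None
    for idx, desc in enumerate(descriptions):
        if start is None:
            if matches_start(desc):          # first semantic analysis
                start = idx                  # claim 3: this frame is the start frame
        elif matches_end(desc):              # second semantic analysis after the start
            segments.append((start, idx))    # claim 3: a later frame is the end frame
            start = None                     # keep scanning for further segments
    return segments
```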
5. The method of claim 3, wherein performing semantic analysis on a video frame in the video data, and determining a start frame and an end frame from the video data according to a result of the semantic analysis on the video frame comprises:
performing semantic analysis on a first video frame in the video data to obtain a semantic description of the first video frame;
determining candidate start-frame semantic descriptions corresponding to the first video frame according to keywords in the semantic description of the first video frame;
determining the first video frame as the start frame if a first start-frame semantic description matching the semantic description of the first video frame exists among the candidate start-frame semantic descriptions;
and determining a second video frame subsequent to the first video frame as the end frame when the semantic description of the second video frame matches a first end-frame semantic description, wherein the first end-frame semantic description corresponds to the first start-frame semantic description.
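Illustration only: a sketch of claim 5's keyword lookup. The keyword table below is invented for illustration and is not taken from the patent.
```python
CANDIDATES = {
    # keyword -> list of (start-frame description, paired end-frame description)
    "jump": [("the athlete leaves the ground", "the athlete lands")],
    "shot": [("the ball leaves the hands", "the ball passes through the hoop")],
}

def match_start_frame(desc):
    """Return the end-frame description paired with the matched start-frame
    description, or None if `desc` matches no candidate."""
    for keyword, pairs in CANDIDATES.items():
        if keyword not in desc:
            continue                        # candidates are selected by keyword
        for start_desc, end_desc in pairs:
            if start_desc in desc:          # first start-frame description matches
                return end_desc             # the end frame must later match this
    return None
```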
6. The method according to claim 1, wherein said video processing the at least one to-be-processed segment to obtain a video segment with a specific playing effect comprises:
and performing frame interpolation processing on the at least one to-be-processed segment to obtain a slow-motion segment corresponding to the to-be-processed segment.
7. The method according to claim 6, wherein the performing frame interpolation processing on the at least one to-be-processed segment to obtain a slow-motion segment corresponding to the to-be-processed segment comprises:
determining at least one intermediate frame according to the video frames in the to-be-processed segment;
and inserting the at least one intermediate frame into the to-be-processed segment to obtain the slow-motion segment corresponding to the to-be-processed segment.
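Illustration only: a naive realization of claims 6-7 that blends each adjacent pair into one intermediate frame and interleaves it, doubling the frame count so the segment plays at half speed at the original frame rate. A real implementation would more likely use the optical-flow method of claim 11 instead of plain blending.
```python
import cv2

def slow_motion(segment):
    out = []
    for a, b in zip(segment, segment[1:]):
        out.append(a)
        out.append(cv2.addWeighted(a, 0.5, b, 0.5, 0))  # intermediate frame
    out.append(segment[-1])
    return out
```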
8. The method of claim 7, further comprising:
and performing video processing on the slow-motion segment according to the semantic descriptions of the video frames in the slow-motion segment to obtain a slow-motion segment with a specific playing effect.
9. The method according to claim 7 or 8, wherein the determining at least one intermediate frame from the video frames in the to-be-processed segment comprises:
determining a first frame number of intermediate frames;
and determining the first frame number of intermediate frames according to the video frames in the to-be-processed segment.
10. The method of claim 9, wherein the determining a first frame number of intermediate frames comprises:
determining the first frame number of the intermediate frames according to semantic categories to which the semantic description of the start frame and the semantic description of the end frame of the to-be-processed segment belong; or,
determining the first frame number of the intermediate frames in response to a setting operation for a slow-motion magnification; or,
determining the first frame number of the intermediate frames according to the duration of the to-be-processed segment.
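Illustration only: the three alternative branches of claim 10 in one helper. The category table and the duration heuristic are illustrative assumptions, not values from the patent.
```python
FRAMES_PER_CATEGORY = {"jump": 3, "dive": 7}     # semantic category -> frame count

def first_frame_number(category=None, magnification=None,
                       duration_s=None, fps=30.0):
    if category is not None:                     # branch 1: semantic category
        return FRAMES_PER_CATEGORY.get(category, 1)
    if magnification is not None:                # branch 2: user-set slow-motion
        return max(1, int(magnification) - 1)    # Nx slowdown: N-1 frames per gap
    return max(1, int(duration_s * fps) // 10)   # branch 3: segment duration
```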
11. The method according to any one of claims 7 to 10, wherein the determining at least one intermediate frame according to the video frames in the to-be-processed segment comprises:
acquiring, in the to-be-processed segment, a first optical flow map from the t-th frame image to the (t-1)-th frame image, a second optical flow map from the t-th frame image to the (t+1)-th frame image, a third optical flow map from the (t+1)-th frame image to the t-th frame image, and a fourth optical flow map from the (t+1)-th frame image to the (t+2)-th frame image, wherein t is an integer;
determining a first interpolation optical flow map according to the first optical flow map and the second optical flow map, and determining a second interpolation optical flow map according to the third optical flow map and the fourth optical flow map;
determining a first interpolated image according to the first interpolation optical flow map and the t-th frame image, and determining a second interpolated image according to the second interpolation optical flow map and the (t+1)-th frame image;
and fusing the first interpolated image and the second interpolated image to obtain an intermediate frame to be inserted between the t-th frame image and the (t+1)-th frame image.
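Illustration only: a sketch of claim 11 using OpenCV's Farneback flow. The patent does not fix how the two flows per frame are combined, so the quadratic (constant-acceleration) formula below is an assumption borrowed from the quadratic video-interpolation literature, and the backward warp negates the forward flow as a further approximation.
```python
import cv2
import numpy as np

def flow(a, b):
    ga = cv2.cvtColor(a, cv2.COLOR_BGR2GRAY)
    gb = cv2.cvtColor(b, cv2.COLOR_BGR2GRAY)
    return cv2.calcOpticalFlowFarneback(ga, gb, None, 0.5, 3, 15, 3, 5, 1.2, 0)

def backward_warp(img, fwd):
    # sample `img` at p - fwd(p): treats -fwd as the flow back to `img`
    h, w = fwd.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w].astype(np.float32)
    return cv2.remap(img, xs - fwd[..., 0], ys - fwd[..., 1], cv2.INTER_LINEAR)

def intermediate_frame(prev, cur, nxt, nxt2, tau=0.5):
    """Intermediate frame at time t + tau, between `cur` (t) and `nxt` (t+1)."""
    f1, f2 = flow(cur, prev), flow(cur, nxt)   # first/second optical flow maps
    f3, f4 = flow(nxt, cur), flow(nxt, nxt2)   # third/fourth optical flow maps
    g1 = (f2 + f1) / 2 * tau ** 2 + (f2 - f1) / 2 * tau              # first interpolation flow
    g2 = (f4 + f3) / 2 * (1 - tau) ** 2 + (f4 - f3) / 2 * (1 - tau)  # second interpolation flow
    i1 = backward_warp(cur, g1)                # first interpolated image
    i2 = backward_warp(nxt, g2)                # second interpolated image
    return cv2.addWeighted(i1, 1 - tau, i2, tau, 0)  # fusion

# usage: mid = intermediate_frame(frames[t - 1], frames[t], frames[t + 1], frames[t + 2])
```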
12. The method according to claim 1, wherein said video processing the at least one to-be-processed segment to obtain a video segment with a specific playing effect comprises:
determining at least two to-be-merged segments from the at least one to-be-processed segment;
determining a first to-be-merged segment from the to-be-merged segments;
performing video processing on the first to-be-merged segment to obtain a video segment with a specific playing effect corresponding to the first to-be-merged segment;
and merging a second to-be-merged segment with the video segment with the specific playing effect corresponding to the first to-be-merged segment to obtain a merged video segment, wherein the second to-be-merged segment is a segment, other than the first to-be-merged segment, of the to-be-merged segments.
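Illustration only: a sketch of claim 12 — apply the playing effect to one chosen segment, then splice the untouched second segment onto it. Whether the second segment precedes or follows the processed one is not fixed by the claim; chronological order is assumed here.
```python
def merge_segments(first, second, apply_effect):
    processed = apply_effect(first)   # e.g. slow_motion from the claim 7 sketch
    return processed + second         # merged video segment (lists of frames)
```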
13. The method according to claim 1, wherein said video processing the at least one to-be-processed segment to obtain a video segment with a specific playing effect comprises:
determining at least one target to-be-processed segment from the at least one to-be-processed segment;
and superimposing a special effect on the at least one target to-be-processed segment to obtain a special effect segment.
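Illustration only: a sketch of claim 13's special-effect superposition — alpha-blend a static overlay image (resized to the frame size) onto every frame of the target segment. The overlay source and the opacity are illustrative assumptions.
```python
import cv2

def overlay_effect(segment, overlay_img, alpha=0.4):
    h, w = segment[0].shape[:2]
    overlay = cv2.resize(overlay_img, (w, h))
    return [cv2.addWeighted(frame, 1.0 - alpha, overlay, alpha, 0)
            for frame in segment]
```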
14. The method of claim 1, further comprising:
determining a processing mode of the at least one to-be-processed segment;
the video processing of the at least one to-be-processed segment to obtain a video segment with a specific playing effect includes:
and processing the to-be-processed segment in the video data according to the processing mode of the at least one to-be-processed segment to achieve the specific playing effect, so as to obtain processed second video data.
15. A video processing apparatus, characterized in that the apparatus comprises:
the acquisition module is used for acquiring video data;
the first determining module is used for performing semantic analysis on a video frame in the video data and determining at least one to-be-processed segment from the video data according to a semantic analysis result of the video frame;
and the first processing module is used for carrying out video processing on the at least one to-be-processed segment to obtain a video segment with a specific playing effect.
16. An electronic device, comprising:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to invoke the memory-stored instructions to perform the method of any one of claims 1 to 14.
17. A computer readable storage medium having computer program instructions stored thereon, which when executed by a processor implement the method of any one of claims 1 to 14.
CN202010744184.8A 2020-07-29 2020-07-29 Video processing method and device, electronic equipment and storage medium Pending CN111800652A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010744184.8A CN111800652A (en) 2020-07-29 2020-07-29 Video processing method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010744184.8A CN111800652A (en) 2020-07-29 2020-07-29 Video processing method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN111800652A true CN111800652A (en) 2020-10-20

Family

ID=72828352

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010744184.8A Pending CN111800652A (en) 2020-07-29 2020-07-29 Video processing method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111800652A (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180268221A1 (en) * 2017-03-15 2018-09-20 International Business Machines Corporation Video image overlay of an event performance
CN109672921A (en) * 2017-10-13 2019-04-23 杭州海康威视数字技术股份有限公司 Record the methods, devices and systems of packaging video
CN109714644A (en) * 2019-01-22 2019-05-03 广州虎牙信息科技有限公司 A kind of processing method of video data, device, computer equipment and storage medium
CN110798630A (en) * 2019-10-30 2020-02-14 北京市商汤科技开发有限公司 Image processing method and device, electronic equipment and storage medium

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112468743A (en) * 2020-11-09 2021-03-09 泓准达科技(上海)有限公司 Method, device, medium and electronic equipment for displaying hotspot change process
CN112633236A (en) * 2020-12-31 2021-04-09 深圳追一科技有限公司 Image processing method, image processing device, electronic equipment and storage medium
CN113923472A (en) * 2021-09-01 2022-01-11 北京奇艺世纪科技有限公司 Video content analysis method and device, electronic equipment and storage medium
CN113923472B (en) * 2021-09-01 2023-09-01 北京奇艺世纪科技有限公司 Video content analysis method, device, electronic equipment and storage medium
CN114245229A (en) * 2022-01-29 2022-03-25 北京百度网讯科技有限公司 Short video production method, device, equipment and storage medium
CN114245229B (en) * 2022-01-29 2024-02-06 北京百度网讯科技有限公司 Short video production method, device, equipment and storage medium
WO2023207210A1 (en) * 2022-04-29 2023-11-02 荣耀终端有限公司 Video processing method and electronic device

Similar Documents

Publication Publication Date Title
CN110662083B (en) Data processing method and device, electronic equipment and storage medium
CN106791893B (en) Video live broadcasting method and device
CN107105314B (en) Video playing method and device
CN111800652A (en) Video processing method and device, electronic equipment and storage medium
CN107948708B (en) Bullet screen display method and device
CN109257645B (en) Video cover generation method and device
CN108038102B (en) Method and device for recommending expression image, terminal and storage medium
JP2017531217A (en) Method, apparatus and terminal device for changing facial expression symbol in chat interface
CN110928627B (en) Interface display method and device, electronic equipment and storage medium
CN107341509B (en) Convolutional neural network training method and device and readable storage medium
CN110677734B (en) Video synthesis method and device, electronic equipment and storage medium
CN112153400A (en) Live broadcast interaction method and device, electronic equipment and storage medium
CN110933488A (en) Video editing method and device
WO2022198934A1 (en) Method and apparatus for generating video synchronized to beat of music
CN113194254A (en) Image shooting method and device, electronic equipment and storage medium
CN109685041B (en) Image analysis method and device, electronic equipment and storage medium
CN108174269B (en) Visual audio playing method and device
CN104850643B (en) Picture comparison method and device
CN110019897B (en) Method and device for displaying picture
CN109151553B (en) Display control method and device, electronic equipment and storage medium
CN106447747B (en) Image processing method and device
CN109756783B (en) Poster generation method and device
CN112130719A (en) Page display method, device and system, electronic equipment and storage medium
CN113506324B (en) Image processing method and device, electronic equipment and storage medium
CN114125528B (en) Video special effect processing method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20201020)