CN113810782B - Video processing method and device, server and electronic device - Google Patents


Info

Publication number
CN113810782B
CN113810782B (application CN202010537095.6A)
Authority
CN
China
Prior art keywords
video
segment
scene
shot
episode
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010537095.6A
Other languages
Chinese (zh)
Other versions
CN113810782A (en)
Inventor
张士伟
夏朱荣
唐铭谦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba Damo Academy (Beijing) Technology Co.,Ltd.
Original Assignee
Alibaba Group Holding Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd
Priority to CN202010537095.6A
Publication of CN113810782A
Application granted
Publication of CN113810782B
Legal status: Active

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/83Generation or processing of protective or descriptive data associated with content; Content structuring
    • H04N21/845Structuring of content, e.g. decomposing content into time segments
    • H04N21/8456Structuring of content, e.g. decomposing content into time segments by decomposing the content in the time domain, e.g. in time segments
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/234Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
    • H04N21/23424Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving splicing one content stream with another content stream, e.g. for inserting or substituting an advertisement

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Business, Economics & Management (AREA)
  • Marketing (AREA)
  • Television Signal Processing For Recording (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiments of the present application provide a video processing method and device, a server, and an electronic device, wherein the video processing method comprises the following steps: acquiring a video to be processed; classifying the video to be processed according to first dimension information to obtain at least one first video segment; classifying the video to be processed according to second dimension information to obtain at least one second video segment; and modifying the at least one first video segment by using the at least one second video segment to obtain a target video segment. According to the embodiments of the present application, the stripping efficiency of the video is improved by splitting the video automatically.

Description

Video processing method and device, server and electronic device
Technical Field
The present application relates to the field of electronic devices, and in particular, to a video processing method and device, a server, and an electronic device.
Background
Video stripping means splitting an original complete video into multiple videos according to a certain logic or specific requirements, so as to mine valuable information and perform video recommendation and other processing; such secondary processing of traditional video arises from the internet and new-media short-video content platforms. When stripping of one complete video is finished, multiple video segments are obtained.
In the prior art, video stripping is mainly performed manually. Generally, an operator plays the complete video, records the start time point and the end time point of the target video segment in the video to be stripped, and then strips the target video segment using the recorded start and end time points. For example, the complete video is input into video-capturing software, the start time and end time of the strip are filled in, and the capture of the video segment corresponding to that start time and end time is completed.
However, manually stripping videos is time-consuming and labor-intensive; in particular, when the number of complete videos is large, a great deal of labor cost is consumed and the stripping efficiency is low.
Disclosure of Invention
In view of this, embodiments of the present application provide a video processing method and apparatus, a server, and an electronic apparatus, so as to solve the technical problem in the prior art that stripping efficiency is low due to manual stripping of videos.
In a first aspect, an embodiment of the present application provides a video processing method, including:
acquiring a video to be processed;
classifying the video to be processed according to first dimension information to obtain at least one first video segment;
classifying the video to be processed according to second dimension information to obtain at least one second video segment;
and modifying the at least one first video segment by using the at least one second video segment to obtain a target video segment.
In a second aspect, an embodiment of the present application provides a video processing method, including:
receiving a video to be processed sent by a user side, the video to be processed being input by a user at the user side;
classifying the video to be processed according to scene information to obtain at least one scene segment;
classifying the video to be processed according to episode information to obtain at least one episode segment;
modifying the at least one scene segment by using the at least one episode segment to obtain a target video segment;
and sending the target video segment to the user side so that the user side can output the target video segment for the user.
In a third aspect, an embodiment of the present application provides a video processing method, including:
acquiring a video to be processed input by a user;
classifying the video to be processed according to scene information to obtain at least one scene segment;
classifying the video to be processed according to episode information to obtain at least one episode segment;
modifying the at least one scene segment by using the at least one episode segment to obtain a target video segment;
and outputting the target video segment for the user.
In a fourth aspect, an embodiment of the present application provides a video processing method, including:
acquiring a video to be processed;
determining a plurality of modality information corresponding to different video contents in the video to be processed;
respectively determining the video segments in which the plurality of modality information appear in the video to be processed, to obtain modal segments corresponding to the plurality of modality information;
and merging, among the modal segments respectively corresponding to the plurality of modality information, the modal segments that share the same partial segment, to obtain at least one episode segment.
In a fifth aspect, an embodiment of the present application provides a video processing apparatus, including: a storage component and a processing component; the storage component is used for storing one or more computer instructions, and the one or more computer instructions are to be invoked by the processing component;
the processing component is to:
acquiring a video to be processed; classifying the video to be processed according to first dimension information to obtain at least one first video segment; classifying the video to be processed according to second dimension information to obtain at least one second video segment; and modifying the at least one first video segment by using the at least one second video segment to obtain a target video segment.
In a sixth aspect, a server is provided, comprising: a storage component and a processing component; the storage component is used for storing one or more computer instructions, and the one or more computer instructions are to be invoked by the processing component;
the processing component is to:
receiving a video to be processed sent by a user side, the video to be processed being input by a user at the user side; classifying the video to be processed according to scene information to obtain at least one scene segment; classifying the video to be processed according to episode information to obtain at least one episode segment; modifying the at least one scene segment by using the at least one episode segment to obtain a target video segment; and sending the target video segment to the user side so that the user side can output the target video segment for the user.
In a seventh aspect, an electronic device is provided, including: a storage component and a processing component; the storage component is used for storing one or more computer instructions, and the one or more computer instructions are to be invoked by the processing component;
the processing component is to:
acquiring a video to be processed input by a user; classifying the video to be processed according to scene information to obtain at least one scene segment; classifying the video to be processed according to episode information to obtain at least one episode segment; modifying the at least one scene segment by using the at least one episode segment to obtain a target video segment; and outputting the target video segment for the user.
In an eighth aspect, an embodiment of the present application provides a video processing apparatus, including: a storage component and a processing component; the storage component is to store one or more computer instructions to be invoked by the processing component;
the processing component is to:
acquiring a video to be processed; determining a plurality of modality information corresponding to different video contents in the video to be processed; respectively determining the video segments in which the plurality of modality information appear in the video to be processed, to obtain modal segments corresponding to the plurality of modality information; and merging, among the modal segments respectively corresponding to the plurality of modality information, the modal segments that share the same partial segment, to obtain at least one episode segment.
According to the embodiments of the present application, after the video to be processed is obtained, it can be classified according to first dimension information to obtain at least one first video segment, and classified according to second dimension information to obtain at least one second video segment, so that the at least one first video segment is corrected by using the at least one second video segment to obtain the target video segment. Because video segments are extracted from the video to be processed using two different kinds of dimension information, segments extracted under different dimensions are obtained, and the segments obtained under one dimension are used to correct those obtained under the other. The target video segment can thus be acquired automatically, automatic stripping of the video to be processed can be realized, and the stripping efficiency of the video can be improved.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description illustrate only some embodiments of the present application, and those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flowchart of an embodiment of a video processing method according to an embodiment of the present application;
fig. 2 is a flowchart of another embodiment of a video processing method according to an embodiment of the present application;
fig. 3 is a flowchart of another embodiment of a video processing method according to an embodiment of the present application;
fig. 4 is a schematic time period diagram of a modal fragment according to an embodiment of the present application;
fig. 5 is a flowchart of another embodiment of a video processing method according to an embodiment of the present application;
fig. 6 is a flowchart of another embodiment of a video processing method according to an embodiment of the present application;
fig. 7 is a flowchart of another embodiment of a video processing method according to an embodiment of the present application;
fig. 8 is a flowchart of another embodiment of a video processing method according to an embodiment of the present application;
fig. 9 is a flowchart of another embodiment of a video processing method according to an embodiment of the present application;
fig. 10a to 10c are exemplary diagrams of a video processing method according to an embodiment of the present application;
fig. 11 is a flowchart of another embodiment of a video processing method according to an embodiment of the present application;
fig. 12 is a schematic structural diagram of an embodiment of a video processing apparatus according to an embodiment of the present application;
fig. 13 is a schematic structural diagram of an embodiment of a server according to an embodiment of the present application;
fig. 14 is a schematic structural diagram of an embodiment of an electronic device according to an embodiment of the present application;
fig. 15 is a schematic structural diagram of another embodiment of a video processing apparatus according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in the present application without making any creative effort belong to the protection scope of the present application.
The terminology used in the embodiments of the present application is for the purpose of describing particular embodiments only and is not intended to limit the application. As used in the examples of this application and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise; "a plurality" typically means at least two, but does not exclude the case of at least one.
It should be understood that the term "and/or" as used herein merely describes an association between associated objects, indicating that three relationships may exist; for example, A and/or B may represent: A exists alone, A and B exist simultaneously, or B exists alone. In addition, the character "/" herein generally indicates that the former and latter associated objects are in an "or" relationship.
The word "if" as used herein may be interpreted as "when" or "upon" or "in response to a determination" or "in response to a recognition," depending on the context. Similarly, the phrases "if it is determined" or "if (a stated condition or event) is identified" may be interpreted as "when it is determined" or "in response to a determination" or "when (a stated condition or event) is identified" or "in response to an identification of (a stated condition or event)," depending on the context.
It is also noted that the terms "comprises," "comprising," and any other variations thereof are intended to cover a non-exclusive inclusion, such that a product or system that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such a product or system. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in a product or system that includes the element.
The technical scheme of the embodiment of the application can be applied to a video stripping scene, and after the video to be processed is split according to the scene, the split fragments are corrected by using the video content, so that the automatic stripping of the video to be processed is realized.
In the prior art, video stripping is a form of reprocessing performed to meet the requirements on video content in the internet and new-media fields. Video stripping is generally performed manually; for example, a user may record the start time and end time of a video segment of the video to be stripped, and then strip the segment from the video to be processed according to the recorded start time and end time. However, this stripping method requires manually recording the stripping times and manually stripping the video, and is time-consuming, labor-intensive, and inefficient.
In order to solve this technical problem, in the embodiments of the present application, after the video to be processed is obtained, it may be classified according to first dimension information to obtain at least one first video segment, and classified according to second dimension information to obtain at least one second video segment, so that the at least one first video segment is corrected using the at least one second video segment to obtain the target video segment. Because video segments are extracted from the video to be processed using two different kinds of dimension information, segments extracted under different dimensions are obtained, and the segments obtained under one dimension are used to correct those obtained under the other. The target video segment is acquired automatically, automatic stripping of the video to be processed is realized, and the stripping efficiency of the video is improved.
The embodiments of the present application will be described in detail below with reference to the accompanying drawings.
As shown in fig. 1, a flowchart of an embodiment of a video processing method provided in this application may include the following steps:
101: and acquiring a video to be processed.
102: and classifying the video to be processed according to the first dimension information to obtain at least one first video segment.
103: and classifying the video to be processed according to the second dimension information to obtain at least one second video segment.
Optionally, the first dimension information is different from the second dimension information.
Dimension information refers to a type of feature that can represent the video to be processed. The first dimension information and the second dimension information can be defined by different feature models or feature types; the difference between them lies mainly in the feature type or feature model, and they represent different types of features respectively. For example, the time dimension and the content dimension are two different dimensions. In some embodiments, the first dimension information may be a combined dimension of time and space, and the second dimension information may be a content dimension.
104: and modifying at least one first video segment by using at least one second video segment to obtain a target video segment.
Modifying the at least one first video segment with the at least one second video segment may include performing intersection processing on the at least one first video segment and the at least one second video segment to obtain the target video segment.
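The intersection step described above can be sketched as follows. This is an illustrative example only, not code from the patent; representing each segment as a (start, end) pair in seconds is an assumption:

```python
def intersect_segments(first, second):
    """Pairwise intersection of two lists of (start, end) segments, in seconds."""
    result = []
    for a_start, a_end in first:
        for b_start, b_end in second:
            start, end = max(a_start, b_start), min(a_end, b_end)
            if start < end:  # keep only non-empty overlaps
                result.append((start, end))
    return result

# First-dimension segments corrected by second-dimension segments
first = [(0, 120), (120, 300)]
second = [(30, 150), (200, 280)]
print(intersect_segments(first, second))  # [(30, 120), (120, 150), (200, 280)]
```

Each target segment lies inside both a first-dimension and a second-dimension segment, which is one simple reading of "modifying" one set of segments with the other.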
In the embodiment of the present application, after the video to be processed is obtained, it may be classified according to first dimension information to obtain at least one first video segment, and classified according to second dimension information to obtain at least one second video segment, so that the at least one first video segment is corrected by using the at least one second video segment to obtain the target video segment. Because video segments are extracted from the video to be processed using two different kinds of dimension information, segments extracted under different dimensions are obtained, and the segments obtained under one dimension are used to correct those obtained under the other. The target video segment is acquired automatically, automatic stripping of the video to be processed is realized, and the stripping efficiency of the video is improved.
In some embodiments, the first dimension information may be scene information in the video to be processed, and the second dimension information may be plot information in the video to be processed.
As shown in fig. 2, a flowchart of an embodiment of a video processing method provided in this embodiment of the present application may include the following steps:
201: and acquiring a video to be processed.
202: and classifying the video to be processed according to scenes to obtain at least one scene segment.
The video processing method provided by the embodiments of the present application can be applied to an electronic device or to a server corresponding to the electronic device. The electronic device may include, for example: a mobile phone, a tablet computer, a notebook computer, a wearable device, a smart speaker, a computer, and the like. The server corresponding to the electronic device may communicate with the electronic device via a wired or wireless connection; the embodiments of the present application do not limit the specific type of the server.
When the technical scheme provided by the application is applied to the electronic equipment, the video to be processed can be acquired by the electronic equipment. When the technical scheme provided by the application is applied to the server corresponding to the electronic equipment, the video to be processed can be acquired by the electronic equipment and sent to the server.
The video to be processed may be in any of a plurality of video formats, for example 3GP, AVI, FLV, MP4, M3U8, MPG, ASF, WMV, MOV, TS, WebM, and the like; this embodiment does not limit the specific format of the video to be processed.
A scene refers to a video segment in which events occur within a certain time and space; the people, background, location, actions and/or behaviors within the same scene have a certain continuity. For example, making a phone call in a subway car and making a phone call in a house are two different scenes. Scene information may consist of the background, lighting, hue, composition, characters, and other information in a video frame. In a certain segment of the video to be processed, if the background, lighting, hue, composition, characters, and other information of its image frames generally do not change, or change very little, over the continuous time corresponding to that segment, then the video segment corresponding to that continuous time is a scene segment.
A scene segment may be composed of multiple consecutive shot segments, and multiple shot segments in the same scene segment have a certain feature similarity.
The video to be processed may comprise a plurality of scene segments. When the plurality of scene segments are ordered according to the time of each scene segment in the video to be processed, any two adjacent scene segments may not have the same partial video segment, that is, there is no overlapping video segment between any two adjacent scene segments.
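The patent does not prescribe how such non-overlapping scene segments are found, but one common approach is to flag large differences between the color histograms of consecutive frames as candidate scene cuts. The sketch below is a hedged illustration under that assumption; the histogram representation and the threshold value are invented for the example:

```python
def hist_diff(h1, h2):
    """L1 distance between two normalized color histograms."""
    return sum(abs(a - b) for a, b in zip(h1, h2))

def detect_scene_boundaries(frame_hists, threshold=0.5):
    """Return frame indices where consecutive histograms differ sharply
    (candidate scene cuts). frame_hists: one normalized histogram per frame."""
    return [i for i in range(1, len(frame_hists))
            if hist_diff(frame_hists[i - 1], frame_hists[i]) > threshold]

# Toy histograms: frames 0-2 look alike, frame 3 starts a new scene
hists = [[0.9, 0.1], [0.88, 0.12], [0.9, 0.1], [0.1, 0.9]]
print(detect_scene_boundaries(hists))  # [3]
```

Cutting the video at each detected boundary yields adjacent scene segments with no overlap, consistent with the description above.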
203: and classifying the video to be processed according to the plot information to obtain at least one plot fragment.
The video content in the video to be processed is in fact composed of contents such as characters, backgrounds, locations, actions, and/or behaviors taken together. The episode information may include modality information such as the characters, backgrounds, locations, actions, and/or behaviors in the video to be processed. Multiple pieces of modality information can be defined for the episode information, and the segments corresponding to each piece of modality information can be obtained respectively. For example, a character in the video to be processed may be extracted as one piece of modality information of an episode segment, and an action may be extracted as another piece of modality information of an episode segment.
Any episode segment contains video content different from the other episode segments, and any episode segment is continuous in the time dimension of the video; there is no episode segment that is interrupted in the time dimension yet continuous in plot. For example, suppose episode segment A spans the 10th to 15th minutes of the video to be processed, and its video content is Xiao Ming and Xiao Li shopping; in the 23rd to 25th minutes of the video to be processed, the video content is also Xiao Ming and Xiao Li shopping, forming episode segment B. The two segments both show Xiao Ming and Xiao Li shopping but are interrupted in time, so episode segment A and episode segment B are two different episode segments.
Classifying the video to be processed according to the episode information to obtain at least one episode segment may include: extracting the text content in the video to be processed; segmenting the text content of the video to be processed based on a natural language processing algorithm to obtain at least one text segmentation result; and obtaining the at least one episode segment according to the start time and the end time corresponding to each of the at least one text segmentation result in the video to be processed.
Among the at least one episode segment, no episode segment shares the same partial segment with any other episode segment; that is, episode segments do not overlap one another.
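The text-based route to episode segments can be sketched as follows. This is an illustrative example only: the subtitle entries are toy data, and `topic_of` stands in for the output of a real natural-language segmentation algorithm, which the patent does not specify:

```python
# Each subtitle entry: (start_sec, end_sec, text). A text-segmentation step
# (stubbed here by precomputed group labels) assigns each entry to a topic.
subtitles = [(10, 12, "Let's go shopping."), (13, 15, "This mall is huge."),
             (40, 43, "The meeting starts now."), (44, 47, "First agenda item...")]
topic_of = [0, 0, 1, 1]  # stand-in for a real NLP segmentation result

def episodes_from_text(subtitles, topic_of):
    """Derive episode segments as (start, end) spans covering each topic group."""
    spans = {}
    for (start, end, _), topic in zip(subtitles, topic_of):
        lo, hi = spans.get(topic, (start, end))
        spans[topic] = (min(lo, start), max(hi, end))
    return [spans[t] for t in sorted(spans)]

print(episodes_from_text(subtitles, topic_of))  # [(10, 15), (40, 47)]
```

Each topic group maps to one non-overlapping time span, matching the statement that episode segments do not overlap one another.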
204: and modifying at least one scene segment by using at least one plot segment to obtain a target video segment.
Modifying the at least one scene segment with the at least one episode segment to obtain the target video segment may include: correcting the at least one scene segment respectively using the at least one episode segment to obtain target video segments, of which there may be more than one. The at least one episode segment corrects each scene segment in turn to obtain the corresponding target video segment(s). When any scene segment is corrected using the at least one episode segment, the number of target video segments obtained is not fixed; one or more target video segments may be obtained.
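Correcting a single scene segment in turn, as described above, can be sketched like this. It is a minimal illustration, not the patent's method; segments are assumed to be (start, end) pairs in seconds:

```python
def correct_scene(scene, episodes):
    """Correct one (start, end) scene segment with episode segments by keeping
    the overlapping portions. May yield zero, one, or several target segments."""
    targets = []
    for e_start, e_end in episodes:
        start, end = max(scene[0], e_start), min(scene[1], e_end)
        if start < end:
            targets.append((start, end))
    return targets

episodes = [(50, 130), (170, 210)]
for scene in [(0, 100), (100, 200), (300, 360)]:
    print(scene, "->", correct_scene(scene, episodes))
# (0, 100) -> [(50, 100)]
# (100, 200) -> [(100, 130), (170, 200)]
# (300, 360) -> []
```

This shows why the number of target segments per scene is not fixed: one scene overlaps one episode, another overlaps two, and a third overlaps none.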
In the embodiment of the present application, the video to be processed is divided into at least one scene segment according to scene, and at least one episode segment is obtained by dividing the video to be processed according to episode information, so that the at least one scene segment can be corrected using the at least one episode segment to obtain the target video segment. The obtained target video segment fuses two kinds of information, scene and video content, and is therefore obtained with higher precision. Because the video to be processed is considered from both the scene aspect and the video-content aspect, its video segments are acquired automatically, that is, stripping is performed automatically, which improves the stripping efficiency.
Because a video may have no sound or only a musical background, dividing the plot of the video to be processed only by the text content in the video is not accurate enough. To obtain more accurate episode segments, modality information of the video to be processed can be extracted, so as to achieve accurate analysis of the video to be processed.
As shown in fig. 3, a flowchart of another embodiment of a video processing method provided in this embodiment of the present application may include:
301: and acquiring a video to be processed.
302: and classifying the video to be processed according to the scene information to obtain at least one scene segment.
Some steps of the present application are the same as those of the embodiment shown in fig. 1, and are not described herein again.
303: and determining a plurality of modal information corresponding to different video contents in the video to be processed.
The video to be processed contains rich video content. In order to analyze the video content of the video to be processed, different modality information may be set for it. Modality information may refer to a particular presentation mode of the content contained in a video, such as text, characters, actions, behaviors, backgrounds, locations, and/or content elements such as tools and vehicles that may appear in the video content. The modality information may be predefined according to the video content of the video to be processed. For example, when the video to be processed is a movie, the face images of the movie's starring actors may all be used as modality information.
304: and respectively determining video clips with a plurality of modal information in the video to be processed, and obtaining modal clips corresponding to the plurality of modal information.
The modal information can be characters, actions, behaviors and other elements in the video to be processed, when the modal information appears in the video to be processed, the starting time of the modal information in the video to be processed can be confirmed, and when the modal information does not appear in the video to be processed any more, the ending time of the modal information in the video to be processed can be determined.
The time periods in which any piece of modality information appears in the video to be processed may be discontinuous. The modal segments corresponding to any piece of modality information may include at least one segment, and among the at least one modal segment corresponding to one piece of modality information, no two modal segments share the same partial segment; that is, any two modal segments corresponding to the same modality information do not overlap each other. For example, when the video to be processed is a movie, character A may appear from the 10th to the 13th minute and again from the 25th to the 30th minute; for the modality information "character A," the corresponding modal segments may then include the video segment from minute 10 to minute 13 and the video segment from minute 25 to minute 30.
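A hedged sketch of turning per-frame presence detections for one piece of modality information (e.g. "character A visible in this frame") into non-overlapping modal segments; the per-frame detector itself is assumed and not specified by the patent:

```python
def modal_segments(detections, fps=1.0):
    """Collapse per-frame presence flags for one modality into (start, end)
    segments in seconds. Consecutive detections form one continuous segment."""
    segments, start = [], None
    for i, present in enumerate(detections):
        if present and start is None:
            start = i                      # modality starts appearing
        elif not present and start is not None:
            segments.append((start / fps, i / fps))  # modality stops appearing
            start = None
    if start is not None:                  # still present at end of video
        segments.append((start / fps, len(detections) / fps))
    return segments

# 1 fps toy example: character A visible in frames 2-4 and 7-8
print(modal_segments([0, 0, 1, 1, 1, 0, 0, 1, 1]))  # [(2.0, 5.0), (7.0, 9.0)]
```

The discontinuity in the detections produces two separate, non-overlapping modal segments for the same modality, as the paragraph above describes.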
However, a modality segment of one piece of modality information may share a partial segment with a modality segment of a different piece of modality information.
For example, when the video to be processed is a movie, two pieces of modality information may be defined: character A and character B. Suppose the modality segment corresponding to character A is segment a and the modality segment corresponding to character B is segment b, with segment a starting earlier than segment b. The video segment in which character A and character B converse is segment c; segment c lies within both segment a and segment b, so segment c is the partial segment shared by segment a and segment b.
It should be noted that the segments referred to in the embodiments of the present application are temporally continuous video segments, with no interruption within a segment.
305: Merge the modality segments that share partial segments among the modality segments corresponding to the plurality of pieces of modality information, to obtain at least one episode segment.
In practical applications, two or more modality segments may share the same partial segment, where any two modality segments sharing a partial segment correspond to different pieces of modality information. Modality segments that share a partial segment have a certain continuity in the story line or the video content, so merging the modality segments that share partial segments yields at least one episode segment.
Any episode segment is obtained by fusing multiple modality segments that are related in video content and continuous in the story line. An episode segment is not limited to the union of those modality segments, however: it may also include video segments of the video to be processed that were not identified as modality segments, so as to ensure the continuity and completeness of the episode segment.
Merging the modality segments that share partial segments among the modality segments corresponding to the plurality of pieces of modality information, to obtain at least one episode segment, may include: dividing the modality segments that share partial segments into the same modality segment set, to obtain at least one modality segment set; and, for each modality segment set, merging the modality segments that share partial segments in time order, to obtain at least one episode segment.
Merging the modality segments of a modality segment set in time order to obtain an episode segment may include: arranging the modality segments in time order and merging them at their shared partial segments to obtain the episode segment. Taking the conversation between character A and character B above as an example, the partial segment shared by segment a and segment b is segment c, and segment a precedes segment b in time, so segment a and segment b can be merged at segment c to obtain episode segment d.
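The merging step above can be sketched as ordinary interval merging. The following is a minimal sketch, assuming modality segments are represented as `(start, end)` pairs in seconds; the function name and representation are illustrative, not part of the patent:

```python
def merge_overlapping_segments(segments):
    """Merge time-overlapping modality segments into episode segments.

    `segments` is a list of (start, end) tuples; any two segments that
    share a partial segment (i.e. overlap in time) are merged into one
    continuous episode segment.
    """
    if not segments:
        return []
    # Arrange the modality segments in time order by start time.
    ordered = sorted(segments)
    merged = [list(ordered[0])]
    for start, end in ordered[1:]:
        last = merged[-1]
        if start <= last[1]:        # shares a partial segment with the previous one
            last[1] = max(last[1], end)
        else:                       # no overlap: a new episode segment begins
            merged.append([start, end])
    return [tuple(seg) for seg in merged]
```

With the time periods of fig. 4 (30-50 s, 45-60 s, 38-100 s, 150-180 s), this yields the two episode segments 30-100 s and 150-180 s.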
306: Modify the at least one scene segment with the at least one episode segment to obtain target video segments.
In the embodiment of the application, the video to be processed is classified according to scene information to obtain at least one scene segment. When the video is divided according to episode information to obtain at least one episode segment, a plurality of pieces of modality information corresponding to different video content in the video can be determined, and the video segments in which each piece of modality information appears can be located, yielding the modality segments corresponding to each piece of modality information. Identifying the video content through modality information allows the story line of the video to be divided on the basis of the different pieces of modality information, so that accurate episode segments are obtained. A more accurate target video segment can then be obtained when the at least one scene segment is modified with the at least one episode segment.
When determining the modality segments corresponding to the pieces of modality information in the video to be processed, the analysis can be performed along the time axis of the video. Each appearance of modality information in the video is continuous, so a modality segment can be determined by the modality start time and modality end time of the modality information in the video to be processed.
As an embodiment, determining the video segments in which the plurality of pieces of modality information appear in the video to be processed, to obtain the modality segments corresponding to each piece of modality information, may include:
for any piece of modality information, determining at least one modality start time at which the modality information appears in the video to be processed, and the modality end time corresponding to each modality start time;
wherein the modality end time corresponding to any modality start time is earlier than the next modality start time after that modality start time; and
determining, according to the at least one modality start time of each piece of modality information and the corresponding modality end times, the at least one modality segment corresponding to that modality information in the video to be processed, so as to obtain the modality segments corresponding to the plurality of pieces of modality information.
The modality start times and modality end times at which a piece of modality information appears in the video to be processed can be determined according to whether the modality information is present in the video. Between a modality start time and its corresponding modality end time the modality information is continuously present; that is, the modality information can be detected in every frame image of the video segment delimited by that modality start time and modality end time.
A piece of modality information may appear at least once in the video to be processed, and the start and end of each appearance can be detected as one pair of modality start time and modality end time. Extracting, from the video to be processed, the video segment of each time period delimited by a modality start time and its corresponding modality end time yields at least one modality segment; one piece of modality information may thus correspond to at least one modality segment.
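Deriving the pairs of modality start and end times from per-frame detections can be sketched as follows. This is a minimal sketch, assuming a per-frame boolean detection result (e.g. whether an actor's face was detected in the frame) and a known frame rate; the function name is illustrative:

```python
def modality_segments_from_detections(detected, fps):
    """Derive modality segments from per-frame detection results.

    `detected` is a list of booleans, one per frame, indicating whether
    the modality information is present in that frame. Returns
    (start_time, end_time) pairs in seconds: each start is the first
    frame of a continuous run of detections, each end the frame after
    the last detection of that run.
    """
    segments = []
    start = None
    for i, present in enumerate(detected):
        if present and start is None:
            start = i                                  # modality start frame
        elif not present and start is not None:
            segments.append((start / fps, i / fps))    # modality end frame
            start = None
    if start is not None:                              # present until the video ends
        segments.append((start / fps, len(detected) / fps))
    return segments
```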
In the embodiment of the application, when acquiring the modality segments corresponding to any piece of modality information, the at least one modality start time at which the modality information appears in the video to be processed and the corresponding modality end times can be determined, the modality end time corresponding to any modality start time being earlier than the next modality start time after it.
As a possible implementation manner, merging the modality segments that share partial segments among the modality segments corresponding to the plurality of pieces of modality information, to obtain at least one episode segment, includes:
for any piece of modality information, determining the time periods delimited by its at least one modality start time and the corresponding modality end times, to obtain at least one time period corresponding to that modality information;
dividing, according to the at least one time period corresponding to each of the plurality of pieces of modality information, the time periods that share partial time periods into the same time period set, to obtain at least one time period set;
wherein each time period in any time period set shares a partial time period with at least one other time period in that set;
for any time period set, determining the minimum modality start time and the maximum modality end time among the modality start times and modality end times corresponding to its time periods; and
determining, in the video to be processed, the video segment delimited by the minimum modality start time and the maximum modality end time of each of the at least one time period set, to obtain at least one episode segment.
When dividing time periods that share partial time periods into the same time period set, specifically, if any two time periods share a partial time period, they may be placed in the same time period set, and this continues until the traversal of the time periods corresponding to the plurality of pieces of modality information ends. Any time period belongs to exactly one time period set; no time period belongs to two or more sets at once. For example, if time period A has already been placed in a time period set, then a not-yet-placed time period B that shares a partial time period with time period A may be placed in the set containing time period A.
For ease of understanding, take a video to be processed whose time axis runs from 0 to 200 seconds, with the time periods of the modality segments laid on that axis as in fig. 4: time period A 401 of modality segment A is 30 to 50 seconds, time period B 402 of modality segment B is 45 to 60 seconds, time period C 403 of modality segment C is 38 to 100 seconds, and time period D 404 of modality segment D is 150 to 180 seconds. As shown in fig. 4, time periods A, B, and C (401, 402, 403) all share partial time periods and may be divided into the same time period set. Time period D 404 shares no partial time period with any of time periods A 401, B 402, or C 403, so time period D 404 belongs to another time period set.
In the embodiment of the application, time periods that share partial time periods are divided into the same time period set, yielding at least one time period set. A time period set may include a plurality of time periods, each corresponding to a modality start time and a modality end time, and the minimum modality start time and the maximum modality end time among them can be determined for the set. The video segment of the video to be processed delimited by each set's minimum modality start time and maximum modality end time can then be taken as an episode segment. Dividing sets by the modality start and end times of the modality segments groups temporally continuous modality segments into the same set, so that continuous story lines are extracted and the final episode segments are obtained, improving the accuracy and precision of episode segment division.
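The set-division step above can be sketched as follows. This is a minimal sketch, assuming time periods are `(start, end)` pairs and following the one-pass traversal described above (a period joins the set of the first already-placed period it overlaps, otherwise it starts a new set); a later period that bridges two earlier sets would require merging the sets, which this simplified sketch does not handle. The function names are illustrative:

```python
def group_time_periods(periods):
    """Divide time periods that share a partial time period into sets."""
    sets = []   # each set is a list of (start, end) time periods
    for period in periods:
        placed = False
        for period_set in sets:
            # shares a partial time period with a member of this set?
            if any(p[0] < period[1] and period[0] < p[1] for p in period_set):
                period_set.append(period)
                placed = True
                break
        if not placed:
            sets.append([period])
    return sets

def episode_from_set(period_set):
    """An episode segment spans the minimum modality start time and the
    maximum modality end time of the periods in one time period set."""
    return (min(p[0] for p in period_set), max(p[1] for p in period_set))
```

Applied to the fig. 4 example, time periods A, B, and C form one set whose episode segment is 30 to 100 seconds, and time period D alone forms a second set whose episode segment is 150 to 180 seconds.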
As an embodiment, modifying the at least one scene segment with the at least one episode segment to obtain target video segments may include:
acquiring, with the at least one episode segment, the target video segment determined based on each scene segment, to obtain the target video segments respectively determined based on the at least one scene segment.
As a possible implementation manner, acquiring, with the at least one episode segment, the target video segments determined based on the scene segments may include:
for any scene segment, if a first episode segment in the at least one episode segment contains the scene segment, determining the scene segment as a target video segment; and
if no episode segment in the at least one episode segment contains the scene segment, and the scene segment contains a second episode segment in the at least one episode segment, determining the target video segment corresponding to the second episode segment, so as to obtain the target video segments respectively determined based on the at least one scene segment.
If a scene segment is a partial segment of a first episode segment, the scene segment may be taken as a target video segment. Although the scene segment belongs to an episode segment in this case, the scene itself is independent, so the scene segment can be regarded as an independent segment carrying part of the story line and taken as a target video segment; the video to be processed is thereby split in units of scenes.
When determining whether the at least one episode segment includes a first episode segment containing a given scene segment, the determination may be made by comparing the time period of the scene segment with the time periods of the episode segments, for example by checking whether one time period contains the other or whether they merely intersect.
If a scene segment contains an episode segment, the scene segment contains at least one complete story line, and a target video segment may be determined based on that episode segment.
As a possible implementation manner, each scene segment corresponds to a scene start time and a scene end time, and each episode segment corresponds to an episode start time and an episode end time.
For any scene segment, determining the scene segment as a target video segment when a first episode segment in the at least one episode segment contains it includes:
determining the episode start time and the episode end time corresponding to each of the at least one episode segment; and
for the scene start time and scene end time of any scene segment, if the at least one episode segment includes a first episode segment whose episode start time is less than or equal to the scene start time and whose episode end time is greater than or equal to the scene end time, determining that scene segment as a target video segment.
In the embodiment of the application, whether a scene segment is taken as a target video segment can be confirmed accurately through the scene start and end times of the scene segment and the episode start and end times of the episode segments, improving the accuracy and efficiency of acquiring target video segments.
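The containment check above can be sketched directly from the time comparisons. This is a minimal sketch, assuming segments are `(start, end)` pairs; the function name is illustrative:

```python
def find_target_for_scene(scene, episodes):
    """Check whether a first episode segment contains the scene segment.

    A first episode segment contains the scene segment when its episode
    start time <= the scene start time and its episode end time >= the
    scene end time. Returns (scene, containing_episode) if such an
    episode segment exists, so the scene segment itself becomes the
    target video segment; otherwise returns (None, None).
    """
    scene_start, scene_end = scene
    for ep_start, ep_end in episodes:
        if ep_start <= scene_start and ep_end >= scene_end:
            return scene, (ep_start, ep_end)   # the scene segment is the target
    return None, None
```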
If the first episode segment contains a scene segment, the parts of the first episode segment outside that scene segment can serve as new episode segments.
In some embodiments, after determining, for any scene segment, that the scene segment is a target video segment because a first episode segment in the at least one episode segment contains it, the method may further include:
deleting the first episode segment from the at least one episode segment; and
adding the video segments of the first episode segment other than the scene segment to the at least one episode segment as new episode segments.
The scene segment thus divides the first episode segment into at least one sub-segment, one of which is the scene segment itself.
In some embodiments, the episode start time may equal the scene start time and the corresponding episode end time may equal the scene end time; the first episode segment is then identical to the scene segment, and the scene segment may directly be taken as a target video segment.
In still other embodiments, the scene segment contained in the first episode segment may divide it into two or three video segments, one of which is the scene segment itself, while the remaining segments can become new episode segments. As a possible implementation manner, adding the video segments of the first episode segment other than the scene segment to the at least one episode segment as new episode segments may include:
selecting the video segment of the video to be processed between the episode start time and the scene start time as a third episode segment;
selecting the video segment of the video to be processed between the scene end time and the episode end time as a fourth episode segment; and
adding the third episode segment and the fourth episode segment to the at least one episode segment as new episode segments.
When the episode start time equals the scene start time, the third episode segment is empty; when the scene end time equals the episode end time, the fourth episode segment is empty.
When the episode start time of the first episode segment equals the scene start time of the contained scene segment (or differs from it by less than a certain time threshold) while the episode end time is greater than the scene end time, the first episode segment is divided into two sub-segments and the third episode segment is empty; the fourth episode segment may be added to the at least one episode segment as a new episode segment.
Likewise, when the episode start time of the first episode segment is less than the scene start time of the contained scene segment while the episode end time equals the scene end time (or differs from it by less than the time threshold), the first episode segment is again divided into two sub-segments and the fourth episode segment is empty; the third episode segment may be added to the at least one episode segment as a new episode segment.
The time threshold is a small time constant that may be set according to actual requirements, for example 1 second or 0.5 seconds.
In addition, when the episode start time of the first episode segment is less than the scene start time of the contained scene segment and the episode end time is greater than the scene end time, the first episode segment is divided into three sub-segments; neither the third nor the fourth episode segment is empty, and both may be added to the at least one episode segment as new episode segments.
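The splitting cases above can be sketched in one helper. This is a minimal sketch, assuming segments are `(start, end)` pairs in seconds and that a sub-segment shorter than the time threshold is treated as empty; the function name and the default threshold of 1 second are illustrative:

```python
def split_first_episode(first_episode, scene, time_threshold=1.0):
    """Split the first episode segment around the scene segment it contains.

    Returns the new episode segments: the third episode segment (episode
    start to scene start) and the fourth (scene end to episode end).
    A side shorter than `time_threshold` seconds is treated as empty.
    """
    ep_start, ep_end = first_episode
    sc_start, sc_end = scene
    new_episodes = []
    if sc_start - ep_start >= time_threshold:      # third episode segment
        new_episodes.append((ep_start, sc_start))
    if ep_end - sc_end >= time_threshold:          # fourth episode segment
        new_episodes.append((sc_end, ep_end))
    return new_episodes
```

A first episode segment of 10 to 100 s containing a scene segment of 30 to 60 s thus yields the two new episode segments 10 to 30 s and 60 to 100 s.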
When an episode segment belongs to a scene segment, a target video segment may be determined based on that episode segment. As a possible implementation, the second episode segment can be taken directly as the target video segment.
When a scene segment contains an episode segment, however, the image frames at the episode start time and episode end time may belong to the same story line yet not fall exactly on the division points of complete shots. If the episode segment were taken directly as the target video segment, the target video segment might not be continuous in the shot dimension: both ends could be partial shot segments, the shot integrity of the video would be low, and browsing would not be smooth. Therefore, as a possible implementation manner, determining the target video segment corresponding to the second episode segment, when no first episode segment contains the scene segment and the scene segment contains a second episode segment, may include:
if no episode segment in the at least one episode segment contains the scene segment, and the scene segment contains a second episode segment in the at least one episode segment, determining the episode start time corresponding to the second episode segment as a first time and the episode end time corresponding to the second episode segment as a second time; wherein the scene segment is composed of a plurality of shot segments, and each shot segment corresponds to a shot start time and a shot end time;
determining, among the plurality of shot segments in the scene segment, a first shot segment whose shot start time is less than the first time and whose shot end time is greater than the first time;
determining, among the plurality of shot segments in the scene segment, a second shot segment whose shot start time is less than the second time and whose shot end time is greater than the second time; and
acquiring the video segment of the video to be processed from the shot start time of the first shot segment to the shot end time of the second shot segment as the target video segment, so as to obtain the target video segments respectively determined based on the at least one scene segment.
In practical applications, an episode segment may include a plurality of shot segments. The shot segments can be obtained by performing shot segmentation and aggregation on the video to be processed; the at least one scene segment is acquired in the same manner as in the embodiment shown in fig. 6, which is not described again here.
In the embodiment of the application, when a scene segment contains an episode segment, the shot segments at the two ends of the episode segment can be determined from its episode start time and episode end time, so that the resulting target video segment preserves shot integrity: it is composed of a series of complete shot segments and browses more smoothly.
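The boundary extension above can be sketched as snapping the episode's two ends outward to shot boundaries. This is a minimal sketch, assuming time-ordered `(shot_start, shot_end)` pairs covering the scene segment, and using inclusive comparisons on one side so that a time exactly on a boundary still selects a shot; the function name is illustrative:

```python
def snap_episode_to_shots(episode, shots):
    """Extend an episode segment contained in a scene segment to full shots.

    The first shot segment is the one whose range covers the episode
    start (the first time); the second shot segment covers the episode
    end (the second time). The target video segment runs from the shot
    start time of the first shot segment to the shot end time of the
    second shot segment.
    """
    first_time, second_time = episode
    target_start, target_end = first_time, second_time
    for shot_start, shot_end in shots:
        if shot_start <= first_time < shot_end:
            target_start = shot_start      # first shot segment
        if shot_start < second_time <= shot_end:
            target_end = shot_end          # second shot segment
    return (target_start, target_end)
```

For a scene made of shots 0-10 s, 10-20 s, and 20-30 s, an episode segment of 12-24 s is extended to the target video segment 10-30 s.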
As shown in fig. 5, which is a flowchart of another embodiment of a video processing method provided in this application, the method may include:
501: Acquire the video to be processed.
502: Classify the video to be processed according to scene information to obtain at least one scene segment.
503: Classify the video to be processed according to episode information to obtain at least one episode segment.
504: For any scene segment, if a first episode segment in the at least one episode segment contains the scene segment, determine the scene segment as a target video segment.
505: Delete the first episode segment from the at least one episode segment.
506: Add the video segments of the first episode segment other than the scene segment to the at least one episode segment as new episode segments.
507: If no episode segment in the at least one episode segment contains the scene segment, and the scene segment contains a second episode segment, determine the target video segment corresponding to the second episode segment, so as to obtain the target video segments respectively determined based on the at least one scene segment.
It should be noted that some steps in the embodiment of the present application are the same as those in the foregoing embodiments and are not described again here.
In the embodiment of the application, after the video to be processed is classified according to scene information to obtain at least one scene segment, it can also be classified according to episode information to obtain at least one episode segment. Then, for any scene segment, if a first episode segment in the at least one episode segment contains the scene segment, the scene segment is determined as a target video segment; the first episode segment may then be deleted from the at least one episode segment, and its video segments other than the scene segment added as new episode segments. If instead the scene segment contains a second episode segment, the target video segment corresponding to the second episode segment is determined. Using the at least one episode segment to modify the at least one scene segment binds the scene segments more tightly to the video content, producing target video segments of higher precision. Because the video to be processed is considered from both the scene and video-content perspectives, its video segments are acquired automatically; that is, the video is split into clips automatically, improving splitting efficiency.
In some embodiments, the video to be processed could be classified directly according to scene information to obtain the at least one scene segment, for example by using a deep learning model to identify scene segments, trained with training videos and their actual scene segments as label data to obtain the model parameters. However, because training videos are long, directly dividing the video to be processed into scene segments with such a model requires a large training-data memory footprint and a large amount of computation during training; training is difficult and the acquisition efficiency of the at least one scene segment is low.
To improve the efficiency and accuracy of acquiring the at least one scene segment, the video to be processed can instead be segmented in units of shots, and the segmented shots then aggregated into at least one scene segment. Because a single shot is short, training a deep learning model on shot cut points is simpler: the shot switching process is easy to detect, shot cut detection trains more accurately, and the shot cut points in the video to be processed can be detected quickly.
As shown in fig. 6, which is a flowchart of another embodiment of a video processing method provided in the embodiments of the present application, the method may include:
601: Acquire the video to be processed.
602: Determine the shot cut points at which shot switching occurs in the video to be processed, to obtain a plurality of shot cut points.
A shot cut point is a time point in the video to be processed at which one shot switches to another.
There may be multiple shot cut points in the video to be processed.
603: Divide the video to be processed into a plurality of shot segments according to the plurality of shot cut points.
Dividing the video to be processed into a plurality of shot segments according to the plurality of shot cut points may include: sorting the plurality of shot cut points in time order; then, starting from the first shot cut point, taking each shot cut point in turn as a shot start time and the next shot cut point as the corresponding shot end time, to obtain a plurality of shot start times and their corresponding shot end times; and acquiring, in turn, the video segment of the video to be processed delimited by each shot start time and its corresponding shot end time as a shot segment, so as to obtain the plurality of shot segments.
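The division above can be sketched as follows. This is a minimal sketch, assuming cut points are given in seconds; including the leading segment before the first cut point and the trailing segment up to the video duration is an assumption made so that the whole video is covered, and the function name is illustrative:

```python
def shots_from_cut_points(cut_points, video_duration):
    """Divide a video into shot segments from its shot cut points.

    Cut points are sorted in time order; each boundary serves as a shot
    start time and the next boundary as the shot end time.
    """
    boundaries = [0.0] + sorted(cut_points) + [video_duration]
    return [(boundaries[i], boundaries[i + 1])
            for i in range(len(boundaries) - 1)
            if boundaries[i + 1] > boundaries[i]]   # skip empty segments
```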
604: Perform feature similarity aggregation on the plurality of shot segments to obtain at least one shot segment set.
Each shot segment set comprises a plurality of shot segments.
605: Determine the at least one scene segment, each scene segment being formed by combining the shot segments of one of the at least one shot segment set.
606: Classify the video to be processed according to episode information to obtain at least one episode segment.
607: Modify the at least one scene segment with the at least one episode segment to obtain target video segments.
It should be noted that, some steps in the embodiment of the present application are the same as those in the foregoing embodiment, and are not described herein again.
In the embodiment of the application, the shot cut points at which shot switching occurs in the video to be processed are determined, yielding a plurality of shot cut points, and the video is divided into a plurality of shot segments accordingly. Feature similarity aggregation is then performed on the shot segments to obtain at least one shot segment set, each comprising a plurality of shot segments, and the at least one scene segment formed by combining the shot segments of each set is determined. The acquired video is also divided according to episode information to obtain at least one episode segment, and the at least one scene segment is modified with the at least one episode segment to obtain the target video segments. Detecting the shot cut points allows the shot segments to be determined quickly and accurately, so that the scene segments are acquired on a shot basis with higher precision and efficiency, high-precision target video segments are obtained quickly, and the clip-splitting efficiency of the video to be processed is improved.
In practical applications, a sliding window may be used to detect the shot cut points in the video to be processed. Determining the shot cut points at which shot switching occurs, to obtain a plurality of shot cut points, may include:
selecting a plurality of window segments from the video to be processed according to a preset window size and sliding step;
inputting the window segments in turn into a trained shot detection model, to obtain the target window segments, among the plurality of window segments, in which shot switching occurs; and
determining the time point of the midpoint of each target window segment in the video to be processed as a shot cut point, so as to obtain the plurality of shot cut points.
The window size and sliding step used when sliding the window over the video to be processed can be set according to actual requirements. If high precision is required for the shot cut points, a smaller window size and sliding step can be used, for example a window size of 2 seconds and a step of 1 second. In practical applications, if high precision is not required, a larger window size and sliding step may be adopted, for example a window size of 4 seconds and a step of 2 seconds.
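The sliding-window detection above can be sketched as follows. This is a minimal sketch: `model` is a stand-in for the trained shot detection model (a hypothetical callable returning True when a shot cut occurs inside the window, not a real library API), and the function name is illustrative:

```python
def detect_cut_points(video_duration, window_size, step, model):
    """Sliding-window shot cut detection.

    Window segments of `window_size` seconds are selected every `step`
    seconds; the midpoint of each window in which the model detects a
    shot cut is taken as a shot cut point.
    """
    cut_points = []
    start = 0.0
    while start + window_size <= video_duration:
        window = (start, start + window_size)
        if model(window):                                  # shot cut inside window
            cut_points.append(start + window_size / 2.0)   # window midpoint
        start += step
    return cut_points
```

With a 2-second window and a 1-second step, a cut occurring at t = 3 s falls inside two adjacent windows, so the midpoints of both are reported; duplicate or near-duplicate cut points would be merged in practice.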
As another possible implementation manner, performing feature similarity aggregation on the plurality of shot segments to obtain at least one shot segment set, wherein each shot segment set includes at least one shot segment, may include:
extracting the shot features corresponding to the plurality of shot segments, to obtain a plurality of shot features; and
dividing, according to the plurality of shot features, the shot segments whose feature similarity satisfies a similarity condition into the same shot segment set, so as to obtain the at least one shot segment set.
Optionally, a video processing algorithm may be employed to extract the shot features of a shot segment; for example, a deep learning algorithm may be used to extract the shot features of the shot segments, yielding a plurality of shot features.
Dividing the shot segments whose feature similarity satisfies the similarity condition into the same shot segment set, according to the plurality of shot features, may include: classifying the plurality of shot features according to feature similarity with a classification algorithm, to obtain the at least one shot segment set, where any shot segment set includes at least one shot segment.
In one possible design, dividing the shots whose feature similarities satisfy the similarity condition into the same shot set according to the plurality of shot features to obtain at least one shot set may include:
sequencing the plurality of shot features according to the starting time of the shot sections corresponding to the shot features respectively to obtain a plurality of sequenced shot sections;
determining a first shot segment as a reference shot segment;
sequentially determining N shot segments that are located after the reference shot segment and pairwise adjacent; wherein N is a positive integer greater than 1;
respectively determining the feature similarity of the reference shot segment and the N shot segments according to the plurality of shot features;
if the maximum value of the feature similarities between the reference shot segment and the adjacent N shot segments is greater than a preset first threshold, determining the shot segment corresponding to the maximum feature similarity as the new reference shot segment, and returning to the step of sequentially determining the N pairwise-adjacent shot segments located after the reference shot segment to continue execution;
if the maximum value of the feature similarities between the reference shot segment and the adjacent N shot segments is smaller than the preset first threshold, determining all the shot segments between the reference shot segment and the shot segment corresponding to the maximum feature similarity as one shot segment set, determining the shot segment following the one corresponding to the maximum feature similarity as the new reference shot segment, and returning to the step of sequentially determining the N pairwise-adjacent shot segments located after the reference shot segment to continue execution.
In the embodiment of the application, when the shot segments whose feature similarity satisfies the similarity condition are divided into the same shot segment set according to the plurality of shot features, the plurality of shot features can be ordered according to the starting times of their corresponding shot segments, obtaining an ordered sequence of shot segments. The first ordered shot segment is taken as the reference shot segment, and the N pairwise-adjacent shot segments following it are determined in turn so that the feature similarities between the reference shot segment and the nearby shot segments can be computed. If the maximum of these feature similarities is greater than the preset first threshold, the reference shot segment and the corresponding following shot segment satisfy the similarity condition; to obtain an accurate classification result, the shot segment with the maximum feature similarity is taken as the new reference shot segment and the classification process continues. If the maximum of the feature similarities between the reference shot segment and the adjacent N shot segments is smaller than the preset first threshold, the reference shot segment and the following N shot segments no longer satisfy the similarity condition, so the classification of one group of shot segments is complete and a shot segment set is obtained.
By starting from a shot segment and comparing its shot features with those of the N segments that follow it, this similarity-based classification improves the classification effect.
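One reading of the greedy grouping procedure above can be sketched as follows. Cosine similarity, the values of `n` and `threshold`, and the choice to start the next set at the segment immediately after the reference when the threshold test fails are all illustrative assumptions; the description leaves these details open.

```python
import math

def cos_sim(a, b):
    """Cosine similarity between two shot feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def aggregate_shots(features, n=2, threshold=0.8):
    """Group time-ordered shot segments (given by their feature vectors)
    into shot segment sets, returned as lists of segment indices."""
    sets, current, ref = [], [0], 0
    while ref < len(features) - 1:
        # the n pairwise-adjacent segments after the reference segment
        cand = list(range(ref + 1, min(ref + 1 + n, len(features))))
        sims = [cos_sim(features[ref], features[j]) for j in cand]
        best = cand[sims.index(max(sims))]
        if max(sims) > threshold:
            # extend the current set up to the most similar segment,
            # which becomes the new reference
            current.extend(range(ref + 1, best + 1))
            ref = best
        else:
            # similarity condition no longer holds: close the set
            sets.append(current)
            current, ref = [ref + 1], ref + 1
    sets.append(current)
    return sets
```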
As shown in fig. 7, a flowchart of another embodiment of a video processing method provided in this embodiment of the present application may include:
701: and acquiring a video to be processed.
702: and determining shot switching points of shot switching in the video to be processed to obtain a plurality of shot switching points.
703: and dividing the video to be processed into a plurality of shot segments according to the plurality of shot switching points.
704: and performing feature similarity aggregation on the plurality of shot segments to obtain at least one shot segment set.
Wherein the shot slice set comprises at least one shot slice.
705: at least one scene segment formed by combining at least one shot segment of each of the at least one shot segment set is respectively determined.
706: and determining a plurality of modal information corresponding to different video contents in the video to be processed.
707: and respectively determining a video clip with any modal information in the video to be processed, and obtaining a modal clip corresponding to the modal information so as to determine modal clips corresponding to a plurality of modal information respectively.
708: and combining the modal fragments with the same partial fragments in the modal fragments respectively corresponding to the plurality of modal information to obtain at least one episode fragment.
709: traversing at least one scene segment, judging whether a first scene segment containing the scene segment exists in the at least one scene segment aiming at any scene segment, and if so, executing a step 710; if not, step 713 is performed.
710: and determining the scene segment as a target video segment.
711: the first episode segment is deleted from the at least one episode segment.
712: selecting the video segments in the first episode segment other than the scene segment as new episode segments and adding them to the at least one episode segment.
713: judging whether the scene segment contains a second episode segment in the at least one episode segment; if so, executing step 714; otherwise, returning directly to step 709 to obtain the next scene segment.
714: determining the target video segment corresponding to the second episode segment, so as to obtain the target video segments respectively determined based on the at least one scene segment.
In the embodiments of the present application, the shot switching points at which shots switch in the video to be processed are determined to obtain a plurality of shot switching points, and the video to be processed is divided into a plurality of shot segments according to the plurality of shot switching points. By performing feature similarity aggregation on the plurality of shot segments, at least one shot segment set is obtained. Each shot segment set includes at least one shot segment, so at least one scene segment formed by combining the shot segments of each shot segment set can be determined respectively; the accuracy of the at least one scene segment acquired in this way is higher. By determining a plurality of modal information corresponding to different video contents in the video to be processed, the video segments in which each modal information appears can be determined respectively to obtain the modal segments corresponding to the modal information, and the modal segments sharing the same partial segments among the modal segments corresponding to the plurality of modal information are merged to obtain at least one episode segment. The at least one scene segment is then traversed. For any scene segment, if a first episode segment in the at least one episode segment contains the scene segment, the scene segment can be determined as a target video segment; the first episode segment is deleted from the at least one episode segment, and the video segments in the first episode segment other than the scene segment are added to the at least one episode segment as new episode segments. For any scene segment, if the scene segment contains a second episode segment in the at least one episode segment, the target video segment corresponding to the second episode segment can be determined. By continuously correcting the scene segments and episode segments in this way, the target video segments are accurately acquired.
In some embodiments, the target video segment may include a plurality, and after obtaining the plurality of target video segments, the method may further include:
and splicing the plurality of target video clips to obtain a comprehensive video.
When the target video segments are spliced, the existing video splicing algorithm can be adopted, and details are not repeated here.
As a possible implementation manner, splicing a plurality of target video segments to obtain a comprehensive video may include:
selecting a plurality of candidate video clips meeting the splicing condition from a plurality of target video clips;
and splicing the candidate video clips to obtain the comprehensive video.
By selecting a plurality of candidate video segments that satisfy the splicing condition from the plurality of target video segments, the segments used for splicing are guaranteed to be suitable for splicing, blind splicing of target video segments is avoided, and the video quality of the comprehensive video is improved.
In some embodiments, selecting a plurality of candidate video segments that satisfy the splicing condition from the plurality of target video segments may include:
outputting a plurality of target video segments for a user, so that the user can select a plurality of candidate video segments from the plurality of target video segments;
a plurality of candidate video segments selected by a user are obtained.
The multiple candidate video clips selected by the user are obtained in an interactive mode with the user, so that the multiple candidate video clips are the content concerned by the user, the personalized selection of the user is realized, and the personalized characteristics of the comprehensive video are highlighted.
In still other embodiments, selecting a plurality of candidate video segments from the plurality of target video segments that satisfy the splicing condition comprises:
determining content scores corresponding to the target video clips based on the clip contents of the target video clips;
and selecting the target video clips with content scores meeting a preset score threshold value from the plurality of target video clips as a plurality of candidate video clips.
By scoring the segment contents of the target video segments, the segment contents can be quantified, candidate video segments with richer content can be obtained, and the video quality of the comprehensive video is improved.
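A minimal sketch of the threshold-based candidate selection; the scoring function is a stand-in for whatever content-scoring model is used, which the description does not fix, and the names and threshold value are illustrative.

```python
def select_candidates(segments, score_fn, threshold=0.6):
    """Keep the target video segments whose content score, produced by
    score_fn, meets the preset score threshold."""
    return [s for s in segments if score_fn(s) >= threshold]
```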
In practical applications, there may be a need for local information to be replaced, for example, replacement of a human face in a video, replacement of a background in a video, and the like. As a possible implementation manner, after obtaining the target video segment, the method may further include:
determining a plurality of image frames corresponding to a target video segment;
detecting a key image frame with key information in a plurality of image frames based on preset key information;
determining target information for replacing the key information;
replacing the key information in the key image frame with target information to obtain a target image frame;
and generating a replacement video clip by using the target image frame and the image frames except the key image frame in the plurality of image frames.
Alternatively, the key information may be face information, background information, or article information. The replacement information may also be face information, background information, or article information. The key information and the replacement information may be set according to actual replacement requirements, and specific types and contents of the key information and the replacement information are not limited too much in the embodiment of the present application.
The information type of the target information for replacing the key information may be the same as the information type of the key information, for example, when the key information is face information, the target information may also be face information. Of course, the information type of the key information may be different from the information type of the target information, for example, the key information may be a human face, and the target information may be an article.
Optionally, the image frames in the target video segment may each correspond to a time stamp. The timestamp of the target image frame is the same as the timestamp of its corresponding key image frame. The time stamp of the key image frame may be determined to be the time stamp of the target image frame.
Generating the replacement video clip using the target image frame and the image frames of the plurality of image frames except the key image frame may include: sequentially ordering the target image frame and the image frames except the key image frame in the plurality of image frames by utilizing the time stamp of the target image frame and the time stamps of the image frames except the key image frame in the plurality of image frames; and generating a replacement video clip according to the ordered target image frame and the image frames except the key image frame in the plurality of image frames.
By detecting whether key information exists in the plurality of image frames of the target video segment, the key image frames are obtained; the key information in the key image frames is then replaced with the target information to obtain the target image frames, and a video segment is regenerated from the obtained target image frames, thereby completing the automatic replacement and improving the efficiency of replacing local information in the target video segment.
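The timestamp-based assembly of the replacement video segment can be sketched as follows; the data representation (timestamped frame pairs and a timestamp-keyed dict of replacement frames, each replacement carrying the same timestamp as the key frame it replaces) is an illustrative assumption.

```python
def build_replacement_clip(frames, key_ts, target_frames):
    """frames: list of (timestamp, frame) pairs for the target video
    segment; key_ts: timestamps of the key image frames; target_frames:
    dict mapping each key frame's timestamp to its target image frame.
    Returns the frames of the replacement clip in timestamp order."""
    kept = [(t, f) for t, f in frames if t not in key_ts]
    replaced = list(target_frames.items())
    return [f for t, f in sorted(kept + replaced, key=lambda p: p[0])]
```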
As shown in fig. 8, which is a flowchart of another embodiment of a video processing method provided in this application, the method may include:
801: and acquiring the video to be processed input by the user.
802: and classifying the video to be processed according to the scene information to obtain at least one scene segment.
803: and classifying the video to be processed according to the plot information to obtain at least one plot fragment.
804: and modifying at least one scene segment by using at least one plot segment to obtain a target video segment.
805: the target video clip is output for the user.
In the embodiment of the application, a user can input a video to be processed at a user side, the user side obtains at least one scene fragment by classifying the video to be processed according to scene information, classifies the video to be processed according to scenario information, obtains at least one scenario fragment, and then corrects the at least one scene fragment by using the at least one scenario fragment to obtain a target video fragment. The user end can output the target video clip for the user. The user side can clip the to-be-processed video of the user so as to improve the video splitting efficiency.
As shown in fig. 9, which is a flowchart of another embodiment of a video processing method provided in this application, the method may include:
901: and receiving the video to be processed sent by the user side.
The video to be processed is input and obtained by a user at a user terminal.
902: and classifying the video to be processed according to the scene information to obtain at least one scene segment.
903: and classifying the video to be processed according to the plot information to obtain at least one plot fragment.
904: and modifying at least one scene segment by using at least one plot segment to obtain a target video segment.
905: and sending the target video clip to the user side so that the user side can output the target video clip for the user.
In the embodiment of the application, the video to be processed sent by the user side can be received, the video to be processed is classified according to the scene information to obtain at least one scene segment, and the video to be processed is classified according to the episode information to obtain at least one episode segment. By modifying the at least one scene segment with the at least one episode segment, a target video segment can be obtained. The target video segment can then be sent to the user side so that the user side outputs the target video segment for the user. Through interaction with the user side, intelligent extraction of target video segments from the user's video to be processed can be realized, the processing pressure on the user side can be reduced, automatic stripping of online video is realized, and the stripping efficiency of the video is improved.
For ease of understanding, the cases in which the technical solution of the embodiments of the present application is executed by the server and by the electronic device are described respectively. As shown in fig. 10a, which is an exemplary diagram of the embodiment of the present application executed by a server, the description below takes the user side as a computer M1 and the server as a cloud server M2 as an example.
The computer M1 may obtain the pending video uploaded by the user and send 1001 the pending video to the server.
After the server M2 obtains the video to be processed, the video to be processed may be classified according to the first dimension information to obtain 1002 at least one first video clip, and then the video to be processed may be classified according to the second dimension information to obtain 1003 at least one second video clip, so that the at least one first video clip is modified 1004 by the at least one second video clip to obtain the target video clip. The server M2 may also send 1005 the target video clip to computer M1 for computer M1 to output the target video clip.
When the user side directly executes the technical scheme of the embodiment of the application, the user side is taken as a tablet computer as an example, and the technical scheme of the embodiment of the application is described in detail.
As shown in fig. 10b, the user may upload the pending video 1006 to the tablet computer M3 through the video upload prompt control 1007; for example, the control name of the video upload prompt control 1007 may be "upload video". For example, the pending video may be obtained by downloading through a video downloading program and then uploaded into a video stripping program of the tablet computer M3. The video stripping program of the tablet computer M3 may classify the video to be processed according to the first dimension information 1008 to obtain at least one first video segment, classify the video to be processed according to the second dimension information 1009 to obtain at least one second video segment, and modify the at least one first video segment with the at least one second video segment to obtain 1010 the target video segment. The tablet computer M3 may present the target video clip to the user.
In one possible design, the tablet computer M3 may store the obtained target video segments as independent videos in sequence, and display a plurality of videos to the user for the user to store. In yet another possible design, in order to make the user clearly specify the location of the target video clip in the to-be-processed video, the tablet computer M3 may show the time period of the target video clip in the to-be-processed video on the timeline of the to-be-processed video. For convenience of understanding, as shown in fig. 10c, in the display plane of the tablet computer M3, in the timeline of the to-be-processed video, the time periods 1011 corresponding to the target video segments in the to-be-processed video may be highlighted, and the time periods 1011 corresponding to the plurality of target video segments are displayed in the timeline. It should be noted that the display mode of fig. 10c is only schematic and does not limit the technical solution of the embodiment of the present application.
As shown in fig. 11, a flowchart of another embodiment of a video processing method provided in this embodiment of the present application may include:
1101: and acquiring a video to be processed.
1102: and determining a plurality of modal information corresponding to different video contents in the video to be processed.
1103: and respectively determining video clips with a plurality of modal information in the video to be processed, and obtaining modal clips corresponding to the plurality of modal information.
1104: and merging the modal fragments with the same partial fragments in the modal fragments corresponding to the plurality of modal information respectively to obtain at least one plot fragment.
Optionally, the method may further include: and classifying the video to be processed according to the scene information to obtain at least one scene segment.
Optionally, after merging the modality segments with the same partial segments in the modality segments corresponding to the plurality of modality information, respectively, to obtain at least one episode segment, the method may further include:
and modifying at least one scene segment by using at least one plot segment to obtain a target video segment.
Some steps of the embodiments of the present application are the same as some steps of the embodiments described above, and specific implementation manners and executed steps of various steps in the embodiments of the present application may specifically refer to the contents in the embodiments described above, which are not described herein again.
In the embodiment of the application, a plurality of modal information corresponding to different video contents in the video to be processed can be determined; for each modal information, the video segments in which that modal information appears are determined respectively, obtaining the modal segments corresponding to the plurality of modal information. By identifying the video content through the modal information, the storyline of the video to be processed is divided according to the different modal information, so that at least one accurate episode segment is obtained.
As shown in fig. 12, a schematic structural diagram of an embodiment of a video processing apparatus provided in an embodiment of the present application, the video processing apparatus may include: a storage component 1201 and a processing component 1202; the storage component 1201 is for storing one or more computer instructions for being invoked by the processing component 1202;
the processing component 1202 is to:
acquiring a video to be processed; classifying videos to be processed according to first dimension information to obtain at least one first video clip; classifying videos to be processed according to second dimension information to obtain at least one second video segment; and modifying at least one first video segment by using at least one second video segment to obtain a target video segment.
In some embodiments, the classifying, by the processing component, the video to be processed according to the first dimension information, and the obtaining at least one first video segment may specifically include:
classifying videos to be processed according to scene information to obtain at least one scene segment;
the classifying, by the processing component, the video to be processed according to the second dimension information, and the obtaining at least one second video segment may specifically include:
classifying videos to be processed according to episode information to obtain at least one episode;
the modifying, by the processing component, the at least one first video segment by using the at least one second video segment may specifically include:
and modifying at least one scene segment by using at least one plot segment to obtain a target video segment.
As an embodiment, the processing component classifies the video to be processed according to the episode information, and obtaining at least one episode may include:
determining a plurality of modal information corresponding to different video contents in a video to be processed;
respectively determining video clips with a plurality of modal information in a video to be processed, and obtaining modal clips corresponding to the plurality of modal information;
and merging the modal fragments with the same partial fragments in the modal fragments corresponding to the plurality of modal information respectively to obtain at least one plot fragment.
As a possible implementation manner, the processing component respectively determines video segments in which multiple pieces of modality information appear in the video to be processed, and the obtaining of the modality segments corresponding to the multiple pieces of modality information may specifically include:
determining, for any modality information, at least one modality starting time at which the modality information appears in the video to be processed, and a modality ending time corresponding to each modality starting time; wherein the modality ending time corresponding to any modality starting time of the modality information is earlier than every modality starting time that follows that modality starting time;
and determining at least one modal fragment corresponding to the modal information according to at least one modal start time corresponding to any modal information and the modal end time corresponding to the at least one modal start time, so as to obtain the modal fragments corresponding to the plurality of modal information.
In some embodiments, the merging, by the processing component, the modality fragments with the same partial fragment in the modality fragments corresponding to the plurality of modality information, and the obtaining at least one episode fragment may specifically include:
determining a time period corresponding to at least one modal start time corresponding to the modal information and a modal end time corresponding to the at least one modal start time respectively, and acquiring at least one time period corresponding to the modal information;
dividing time periods that share the same partial time periods into the same time period set according to the at least one time period corresponding to each of the plurality of modality information, so as to obtain at least one time period set; wherein, for any time period in any one time period set, at least one other time period in that set shares a partial time period with it;
for any time period set, determining the minimum modality starting time and the maximum modality ending time in the modality starting time and the modality ending time which correspond to a plurality of time periods of the time period set respectively;
and determining the video clips corresponding to the minimum modality starting time and the maximum modality ending time respectively corresponding to at least one time period set in the video to be processed to obtain at least one plot clip.
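The grouping of modality time periods that share partial time periods, followed by taking each group's minimum starting time and maximum ending time, is in effect an overlapping-interval merge; a sketch under that reading, with illustrative names:

```python
def merge_modal_segments(intervals):
    """Merge (start, end) modality time periods: periods sharing any
    part of their time period fall into one set, and each set yields
    an episode segment bounded by its minimum start and maximum end."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # overlaps the current set: extend its maximum ending time
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(m) for m in merged]
```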
As another embodiment, the modifying, by the processing component, at least one scene segment by using at least one episode segment, and the obtaining the target video segment may specifically include:
and acquiring target video clips determined based on any scene clip by using at least one episode to obtain the target video clips respectively determined based on at least one scene clip.
As a possible implementation manner, the acquiring, by the processing component, the target video segments determined based on any one of the scene segments by using at least one of the episode segments to acquire the target video segments respectively determined based on at least one of the scene segments may specifically include:
for any scene segment, if a first scene segment in at least one scene segment contains a scene segment, determining the scene segment as a target video segment;
if the first plot segment containing the scene segment does not exist in the at least one plot segment and the scene segment contains the second plot segment in the at least one plot segment, determining a target video segment corresponding to the second plot segment to obtain target video segments respectively determined based on the at least one scene segment.
In some embodiments, a scene segment corresponds to a scene start time and a scene end time; the episode corresponds to episode starting time and episode terminating time;
for any scene segment, if a first scene segment of the at least one scene segment contains a scene segment, the determining that the scene segment is a target video segment may specifically include:
determining the plot starting time and the plot ending time corresponding to at least one plot fragment respectively;
and aiming at the scene starting time and the scene ending time of any scene segment, if a first scene segment exists in at least one scene segment, wherein the scene starting time is less than or equal to the scene starting time, and the corresponding scene ending time is greater than or equal to the scene ending time, the scene segment is determined to be a target video segment.
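A sketch of the containment test above, treating scene and episode segments as (start, end) time pairs; the function name and the choice to return the first matching episode segment are illustrative assumptions.

```python
def find_containing_episode(scene, episodes):
    """Return the first episode segment whose starting time is less than
    or equal to the scene starting time and whose ending time is greater
    than or equal to the scene ending time; None if there is none.
    When a match exists, the scene segment itself is the target video
    segment."""
    s_start, s_end = scene
    for e_start, e_end in episodes:
        if e_start <= s_start and e_end >= s_end:
            return (e_start, e_end)
    return None
```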
As yet another embodiment, the processing component is further to:
deleting the first episode from the at least one episode;
and selecting the video clips except the scene clip in the first plot as new plot clips to be added into at least one plot clip.
In some embodiments, the selecting, by the processing component, a segment other than the scene segment from the first episode as a new episode to be added to the at least one episode specifically may include:
selecting a video clip of the video to be processed between the plot starting time and the scene starting time as a third plot clip;
selecting a video segment of the video to be processed from the scene ending time to the plot ending time as a fourth plot segment;
adding the third episode segment and the fourth episode segment as a new episode segment to at least one episode segment.
In some embodiments, the processing component, if there is no first episode containing the scene segments in the at least one episode and the scene segments contain a second episode in the at least one episode, determining target video segments corresponding to the second episode to obtain target video segments respectively determined based on the at least one scene segments may specifically include:
if a first episode containing the scene segments does not exist in at least one episode and the scene segments contain a second episode in the at least one episode, determining the episode starting time corresponding to the second episode as a first time and the corresponding episode ending time as a second time; wherein the scene segment is composed of a plurality of shot segments; any shot segment corresponds to a shot starting time and a shot ending time;
determining a first shot segment of a plurality of shot segments of a scene segment having a shot start time less than a first time and a shot end time greater than the first time;
determining a second shot segment of the plurality of shot segments of the scene segment having a shot start time less than a second time and a shot end time greater than the second time;
and acquiring a video clip corresponding to the shot starting time of the first shot clip to the shot ending time of the second shot clip in the video to be processed as a target video clip so as to acquire the target video clips respectively determined based on at least one scene clip.
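The boundary alignment described above for the case where a scene segment contains a second episode segment can be sketched as follows: the episode's first and second times are expanded outward to the boundaries of the shot segments that straddle them. Names are illustrative, and the sketch assumes each time falls strictly inside some shot segment of the scene.

```python
def snap_episode_to_shots(episode, shots):
    """episode: (first_time, second_time) contained in the scene;
    shots: time-ordered (start, end) shot segments of the scene.
    Returns the target video segment spanning from the start of the
    shot straddling the first time to the end of the shot straddling
    the second time."""
    t1, t2 = episode
    first = next(s for s in shots if s[0] < t1 < s[1])
    second = next(s for s in shots if s[0] < t2 < s[1])
    return (first[0], second[1])
```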
As another embodiment, the classifying, by the processing component, the video to be processed according to the scene information, and the acquiring at least one scene segment may specifically include:
determining shot switching points of shot switching in a video to be processed to obtain a plurality of shot switching points;
dividing a video to be processed into a plurality of shot segments according to a plurality of shot switching points;
performing feature similarity aggregation on the plurality of shot segments to obtain at least one shot segment set; wherein the shot section set comprises a plurality of shot sections;
and respectively determining at least one scene segment formed by combining the shot segments in each of the at least one shot segment set.
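The split of the video at shot switching points can be illustrated as below (a minimal sketch assuming the switching points are given as time offsets in seconds; all names are hypothetical):

```python
def split_into_shots(duration, cut_points):
    """Split a video of the given duration into shot segments at the
    detected shot switching points."""
    points = [0.0] + sorted(cut_points) + [duration]
    return [(points[i], points[i + 1]) for i in range(len(points) - 1)]
```

The resulting shot segments are then grouped by feature similarity into shot segment sets, each of which yields one scene segment.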
As a possible implementation manner, the determining, by the processing component, of shot switching points at which a shot switch occurs in the video to be processed to obtain a plurality of shot switching points may specifically include:
selecting a plurality of window segments from the video to be processed according to a preset window size and a preset sliding step;
sequentially inputting the plurality of window segments into a trained shot detection model to obtain, among the plurality of window segments, a plurality of target window segments in which a shot switch occurs;
and determining the time point in the video to be processed corresponding to the midpoint of any target window segment as a shot switching point, so as to obtain the plurality of shot switching points.
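The sliding-window selection and midpoint rule above admit a simple sketch (illustrative only; the shot detection model is abstracted as a predicate `has_cut`, which is an assumption of this sketch):

```python
def sliding_windows(duration, window_size, step):
    """Enumerate (start, end) window segments over the video timeline."""
    windows, start = [], 0.0
    while start + window_size <= duration:
        windows.append((start, start + window_size))
        start += step
    return windows

def cut_points_from_windows(windows, has_cut):
    """The midpoint of each window flagged by the detection model
    becomes a shot switching point."""
    return [(a + b) / 2 for (a, b) in windows if has_cut((a, b))]
```

In practice the overlap between windows (step smaller than window size) lets the model see each candidate cut more than once.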
In some embodiments, the performing, by the processing component, of feature similarity aggregation on the plurality of shot segments to obtain at least one shot segment set, each shot segment set including at least one shot segment, may specifically include:
extracting shot features respectively corresponding to the plurality of shot segments to obtain a plurality of shot features;
and dividing the shot segments with the characteristic similarity meeting the similarity condition into the same shot segment set according to the plurality of shot characteristics so as to obtain at least one shot segment set.
In some embodiments, the dividing, by the processing component, of the shot segments whose feature similarity satisfies the similarity condition into the same shot segment set according to the plurality of shot features to obtain at least one shot segment set may specifically include:
sorting the plurality of shot features according to the start times of the shot segments respectively corresponding to the shot features to obtain a plurality of sorted shot segments;
determining the first shot segment as a reference shot segment;
sequentially determining N pairwise-adjacent shot segments located after the reference shot segment; wherein N is a positive integer greater than 1;
respectively determining the feature similarity between the reference shot segment and each of the N shot segments according to the plurality of shot features;
if the maximum of the feature similarities between the reference shot segment and the adjacent N shot segments is greater than a preset first threshold, determining the shot segment corresponding to the maximum feature similarity as the new reference shot segment, and returning to the step of sequentially determining the N pairwise-adjacent shot segments located after the reference shot segment to continue execution;
if the maximum of the feature similarities between the reference shot segment and the adjacent N shot segments is smaller than the preset first threshold, determining all shot segments from the reference shot segment to the shot segment corresponding to the maximum feature similarity as one shot segment set, determining the shot segment next to the shot segment corresponding to the maximum feature similarity as the new reference shot segment, and returning to the step of sequentially determining the N pairwise-adjacent shot segments located after the reference shot segment to continue execution.
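The greedy aggregation loop described above can be sketched as follows (an interpretation of the described steps, not the patented implementation; the use of cosine similarity and the exact set-closing rule are assumptions of this sketch):

```python
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def aggregate_shots(features, n=2, threshold=0.8):
    """Greedy aggregation: from a reference shot, look at the next n shots;
    if the best similarity exceeds the threshold, jump the reference to that
    shot; otherwise close the current set through that shot and start a new
    set right after it."""
    sets, set_start, ref = [], 0, 0
    while ref < len(features):
        candidates = list(range(ref + 1, min(ref + 1 + n, len(features))))
        if not candidates:  # ran off the end: close the final set
            sets.append(list(range(set_start, len(features))))
            break
        sims = [cosine(features[ref], features[j]) for j in candidates]
        best = candidates[int(np.argmax(sims))]
        if max(sims) > threshold:
            ref = best  # same scene continues; extend the current set
        else:
            sets.append(list(range(set_start, best + 1)))  # close the set
            set_start = ref = best + 1
    return sets
```

Looking ahead N > 1 shots makes the grouping robust to an isolated dissimilar shot (e.g. a brief cutaway) inside an otherwise homogeneous scene.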
As one embodiment, there are a plurality of target video segments; the processing component may be further configured to:
and splicing the plurality of target video segments to obtain a comprehensive video.
As a possible implementation manner, the splicing, by the processing component, of the plurality of target video segments to obtain the comprehensive video may specifically include:
selecting a plurality of candidate video clips meeting the splicing condition from a plurality of target video clips;
and splicing the candidate video clips to obtain the comprehensive video.
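Splicing the selected candidate segments into a comprehensive video amounts to concatenating their source intervals onto a new timeline, e.g. (illustrative sketch; all names are hypothetical):

```python
def splice(clips):
    """Concatenate candidate clips (source-time intervals) into one
    comprehensive-video timeline, preserving temporal order."""
    timeline, t = [], 0.0
    for start, end in sorted(clips):
        dur = end - start
        timeline.append({"src": (start, end), "out": (t, t + dur)})
        t += dur
    return timeline, t  # edit decision list and total duration
```

The returned mapping from source intervals to output intervals is effectively an edit decision list that a video encoder can render.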
In some embodiments, the selecting, by the processing component, a plurality of candidate video segments that satisfy the splicing condition from the plurality of target video segments may specifically include:
outputting a plurality of target video segments for a user, so that the user can select a plurality of candidate video segments from the plurality of target video segments;
and acquiring the plurality of candidate video segments selected by the user.
In some embodiments, the selecting, by the processing component, a plurality of candidate video segments that satisfy the splicing condition from the plurality of target video segments may specifically include:
determining content scores corresponding to the target video clips based on the clip contents of the target video clips;
and selecting the target video clips with content scores meeting a preset score threshold value from the plurality of target video clips as a plurality of candidate video clips.
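The score-threshold selection step can be sketched as follows (illustrative only; how the content scores are computed is outside this sketch):

```python
def select_candidates(clips, scores, score_threshold):
    """Keep only the target video clips whose content score meets the
    preset score threshold (scores are assumed precomputed per clip)."""
    return [clip for clip, score in zip(clips, scores) if score >= score_threshold]
```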
As yet another embodiment, the processing component may be further configured to:
determining a plurality of image frames corresponding to a target video segment;
detecting a key image frame with key information in a plurality of image frames based on preset key information;
determining target information for replacing the key information;
replacing the key information in the key image frame with target information to obtain a target image frame;
and generating a replacement video clip by using the target image frame and the image frames except the key image frame in the plurality of image frames.
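The key-information replacement flow can be illustrated with frames held as arrays (a minimal sketch; the detection of key frames and the target content are abstracted as parameters, which is an assumption of this sketch):

```python
import numpy as np

def replace_key_info(frames, is_key_frame, region, fill_value=0):
    """Produce a replacement clip: frames flagged as key frames have the
    given (y0, y1, x0, x1) region overwritten with the target content;
    all other frames are kept as-is."""
    y0, y1, x0, x1 = region
    out = []
    for i, frame in enumerate(frames):
        if is_key_frame(i, frame):
            frame = frame.copy()              # do not mutate the source frames
            frame[y0:y1, x0:x1] = fill_value  # swap key info for target info
        out.append(frame)
    return out
```

In a real pipeline the region would come from a detector (e.g. a watermark or subtitle localizer) and the fill value from an inpainting or overlay step.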
The video processing device in fig. 12 can execute the video processing method of the embodiment in fig. 1 and the like, and its implementation principle and technical effect are not described again here. The specific manner in which the processing component performs each step in the above-described embodiment has been described in detail in the method embodiments and will not be elaborated here.
In addition, an embodiment of the present application further provides a computer-readable storage medium, where a computer program is stored, and when the computer program is executed, any one of the video processing methods according to the foregoing embodiments may be executed.
Fig. 13 is a schematic structural diagram of an embodiment of a server provided in the present application. The server may include: a storage component 1301 and a processing component 1302; the storage component 1301 is used to store one or more computer instructions, which are invoked by the processing component 1302;
the processing component 1302 is configured to:
receiving a video to be processed sent by a user side, wherein the video to be processed is input by a user at the user side; classifying the video to be processed according to scene information to obtain at least one scene segment; classifying the video to be processed according to episode information to obtain at least one episode segment; modifying the at least one scene segment by using the at least one episode segment to obtain a target video segment; and sending the target video segment to the user side so that the user side can output the target video segment for the user.
The operations performed by the processing component in this embodiment are the same as the corresponding operations in the embodiment shown in fig. 12 and are not described again here.
The server shown in fig. 13 may be various types of servers, and may include a general server or a cloud server, for example.
In the embodiment of the application, the electronic device can detect the video to be processed and the target content input by the user, and detect at least one first video segment in which the target content appears in the video to be processed. Time correction processing can then be performed on the first start times respectively corresponding to the at least one first video segment to obtain at least one target start time, and at least one video segment to be processed can be acquired based on the at least one target start time and output for the user. This provides a scheme for direct interaction with the user, so that the video segment to be processed is automatically clipped for the user.
Fig. 14 is a schematic structural diagram of an embodiment of an electronic device provided in an embodiment of the present application. The electronic device may include: a storage component 1401 and a processing component 1402; the storage component 1401 is used to store one or more computer instructions, which are invoked by the processing component 1402;
the processing component 1402 is configured to:
acquiring a video to be processed input by a user; classifying the video to be processed according to scene information to obtain at least one scene segment; classifying the video to be processed according to episode information to obtain at least one episode segment; modifying the at least one scene segment by using the at least one episode segment to obtain a target video segment; and outputting the target video segment for the user.
The operations performed by the processing component in this embodiment are the same as the corresponding operations in the embodiment shown in fig. 12 and are not described again here.
The electronic device shown in fig. 14 may be a user terminal, and may specifically include a notebook computer, a mobile phone, a tablet computer, a wearable device, and the like. The embodiment of the present application does not limit the specific type of the electronic device.
In the embodiment of the application, the electronic device can obtain the video to be processed and the target content provided by the user, and at least one first video segment in which the target content appears can be detected in the video to be processed. Time correction processing can then be performed on the first start times respectively corresponding to the at least one first video segment to obtain at least one target start time, and at least one video segment to be processed can be acquired based on the at least one target start time and output for the user. The electronic device sends the video to be processed and the target content to the server so that the video segment to be processed is acquired at the server, which reduces the processing pressure on the electronic device and improves processing efficiency.
Fig. 15 is a schematic structural diagram of another embodiment of a video processing apparatus provided in this embodiment of the present application. The apparatus may include: a storage component 1501 and a processing component 1502; the storage component 1501 is used to store one or more computer instructions, which are invoked by the processing component 1502;
the processing component 1502 is configured to:
acquiring a video to be processed; determining a plurality of pieces of modal information corresponding to different video contents in the video to be processed; respectively determining the video segments in which the plurality of pieces of modal information appear in the video to be processed to obtain the modal segments respectively corresponding to the plurality of pieces of modal information; and merging the modal segments having the same partial segments among the modal segments respectively corresponding to the plurality of pieces of modal information to obtain at least one episode segment.
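The merging of overlapping modal segments into episode segments behaves like interval merging, e.g. (illustrative sketch, assuming segments are (start, end) pairs and "same partial segment" means temporal overlap):

```python
def merge_modality_segments(segments):
    """Merge (start, end) segments from different modalities that overlap
    into episode segments spanning the minimum start to the maximum end
    of each overlapping group."""
    episodes = []
    for start, end in sorted(segments):
        if episodes and start < episodes[-1][1]:  # overlaps the current group
            episodes[-1][1] = max(episodes[-1][1], end)
        else:
            episodes.append([start, end])
    return [tuple(e) for e in episodes]
```

For example, a speech segment and a face-appearance segment that overlap in time would be merged into a single episode segment covering both.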
The operations performed by the processing component in this embodiment are the same as the corresponding operations in the embodiment shown in fig. 12 and are not repeated here.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and the parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by adding a necessary general hardware platform, and of course, can also be implemented by a combination of hardware and software. With this understanding in mind, the above-described technical solutions and/or portions thereof that contribute to the prior art may be embodied in the form of a computer program product, which may be embodied on one or more computer-usable storage media having computer-usable program code embodied therein (including but not limited to disk storage, CD-ROM, optical storage, etc.).
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information that can be accessed by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media such as modulated data signals and carrier waves.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solutions of the present application, and not to limit the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions in the embodiments of the present application.

Claims (27)

1. A video processing method, comprising:
acquiring a video to be processed;
classifying the video to be processed according to first dimension information to obtain at least one first video clip;
classifying the videos to be processed according to second dimension information to obtain at least one second video segment;
modifying the at least one first video segment by using the at least one second video segment to obtain a target video segment;
wherein, the modifying the at least one first video segment by using the at least one second video segment includes performing intersection processing on the video segments by using the at least one second video segment and the at least one first video segment to obtain a target video segment.
2. The method according to claim 1, wherein the classifying the video to be processed according to the first dimension information to obtain at least one first video segment comprises:
classifying the video to be processed according to scene information to obtain at least one scene segment;
the classifying the video to be processed according to the second dimension information and acquiring at least one second video segment comprises:
classifying the video to be processed according to episode information to obtain at least one episode segment;
the modifying the at least one first video segment with the at least one second video segment to obtain the target video segment includes:
and modifying the at least one scene segment by using the at least one episode segment to obtain a target video segment.
3. The method of claim 2, wherein the classifying the video to be processed according to the episode information to obtain at least one episode segment comprises:
determining a plurality of modal information corresponding to different video contents in the video to be processed;
respectively determining video clips of the plurality of modal information in the video to be processed, and obtaining modal clips corresponding to the plurality of modal information;
and merging the modal segments having the same partial segments among the modal segments respectively corresponding to the plurality of modal information to obtain at least one episode segment.
4. The method according to claim 3, wherein the determining the video segments in which the plurality of modality information appears in the video to be processed respectively, and obtaining the modality segments corresponding to the plurality of modality information respectively comprises:
determining, for any modality information, at least one modality start time at which the modality information appears in the video to be processed and the modality end time respectively corresponding to the at least one modality start time; wherein the modality end time corresponding to any modality start time of the modality information is less than any subsequent modality start time of the modality information;
and determining, according to the at least one modality start time corresponding to any modality information and the modality end times respectively corresponding to the at least one modality start time, at least one modality segment corresponding to the modality information, so as to obtain the modality segments respectively corresponding to the plurality of modality information.
5. The method according to claim 4, wherein the merging the modal segments with the same partial segments in the modal segments respectively corresponding to the plurality of modal information to obtain at least one episode segment comprises:
determining, for any modal information, a time period in which at least one modal start time corresponding to the modal information corresponds to a modal end time corresponding to the at least one modal start time, respectively, and obtaining at least one time period corresponding to the modal information;
dividing time periods with the same partial time periods into the same time period set according to at least one time period corresponding to the plurality of modal information respectively to obtain at least one time period set; wherein, the plurality of time periods in any time period set have at least one time period which is the same as the partial time period of any time period in the time period set;
for any time period set, determining the minimum modality starting time and the maximum modality ending time in the modality starting time and the modality ending time which correspond to a plurality of time periods of the time period set respectively;
and determining the video segments corresponding to the minimum modality start time and the maximum modality end time respectively corresponding to the at least one time period set in the video to be processed, so as to obtain at least one episode segment.
6. The method of claim 2, wherein modifying the at least one scene segment with the at least one episode segment to obtain a target video segment comprises:
and acquiring target video clips determined based on any scene clip by utilizing the at least one episode to obtain the target video clips respectively determined based on the at least one scene clip.
7. The method according to claim 6, wherein the obtaining, by using the at least one episode, target video segments determined based on any one of the scene segments to obtain target video segments respectively determined based on the at least one scene segment comprises:
for any scene segment, if a first episode segment containing the scene segment exists in the at least one episode segment, determining that the scene segment is a target video segment;
and if a first episode segment containing the scene segment does not exist in the at least one episode segment and the scene segment contains a second episode segment of the at least one episode segment, determining the target video segment corresponding to the second episode segment, so as to obtain target video segments respectively determined based on the at least one scene segment.
8. The method of claim 7, wherein each scene segment corresponds to a scene start time and a scene end time, and each episode segment corresponds to an episode start time and an episode end time;
the determining, for any scene segment, that the scene segment is a target video segment if a first episode segment of the at least one episode segment contains the scene segment comprises:
determining the episode start times and the episode end times respectively corresponding to the at least one episode segment;
and for the scene start time and the scene end time of any scene segment, if there exists, in the at least one episode segment, a first episode segment whose episode start time is less than or equal to the scene start time and whose corresponding episode end time is greater than or equal to the scene end time, determining that the scene segment is a target video segment.
9. The method of claim 8, further comprising:
deleting the first episode segment from the at least one episode segment;
and selecting the video segments of the first episode segment other than the scene segment as new episode segments to be added to the at least one episode segment.
10. The method of claim 9, wherein the selecting of the video segments of the first episode segment other than the scene segment as new episode segments added to the at least one episode segment comprises:
selecting the video segment of the video to be processed between the episode start time and the scene start time as a third episode segment;
selecting the video segment of the video to be processed between the scene end time and the episode end time as a fourth episode segment;
and adding the third episode segment and the fourth episode segment to the at least one episode segment as new episode segments.
11. The method of claim 7, wherein the determining, if a first episode segment containing the scene segment does not exist in the at least one episode segment and the scene segment contains a second episode segment of the at least one episode segment, of the target video segment corresponding to the second episode segment to obtain target video segments respectively determined based on the at least one scene segment comprises:
if a first episode segment containing the scene segment does not exist in the at least one episode segment and the scene segment contains a second episode segment of the at least one episode segment, determining the episode start time corresponding to the second episode segment as a first time and the corresponding episode end time as a second time; wherein the scene segment is composed of a plurality of shot segments, and any shot segment corresponds to a shot start time and a shot end time;
determining a first shot segment, among the plurality of shot segments of the scene segment, whose shot start time is less than the first time and whose shot end time is greater than the first time;
determining a second shot segment, among the plurality of shot segments of the scene segment, whose shot start time is less than the second time and whose shot end time is greater than the second time;
and acquiring, from the video to be processed, the video segment from the shot start time of the first shot segment to the shot end time of the second shot segment as the target video segment, so as to obtain target video segments respectively determined based on the at least one scene segment.
12. The method according to any one of claims 2 to 11, wherein the classifying the video to be processed according to the scene information to obtain at least one scene segment comprises:
determining shot switching points of shot switching in the video to be processed to obtain a plurality of shot switching points;
dividing the video to be processed into a plurality of shot segments according to the shot switching points;
performing feature similarity aggregation on the plurality of shot segments to obtain at least one shot segment set; wherein the set of shot slices comprises a plurality of shot slices;
and respectively determining at least one scene segment formed by combining the shot segments in each of the at least one shot segment set.
13. The method of claim 12, wherein the determining of shot switching points at which a shot switch occurs in the video to be processed to obtain a plurality of shot switching points comprises:
selecting a plurality of window segments from the video to be processed according to a preset window size and a preset sliding step;
sequentially inputting the plurality of window segments into a shot detection model obtained by training to obtain a plurality of target window segments with shot switching in the plurality of window segments;
and determining the time point in the video to be processed corresponding to the midpoint of any target window segment as the shot switching point, so as to obtain the plurality of shot switching points.
14. The method according to claim 12, wherein the performing feature similarity aggregation on the plurality of shot segments to obtain at least one shot segment set, each shot segment set comprising at least one shot segment, comprises:
extracting shot features respectively corresponding to the plurality of shot segments to obtain a plurality of shot features;
and dividing the shot segments with the characteristic similarity meeting the similarity condition into the same shot segment set according to the plurality of shot characteristics so as to obtain the at least one shot segment set.
15. The method according to claim 14, wherein the dividing the shot segments whose feature similarities satisfy the similarity condition into the same shot segment set according to the plurality of shot features to obtain the at least one shot segment set comprises:
sorting the plurality of shot features according to the start times of the shot segments respectively corresponding to the shot features to obtain a plurality of sorted shot segments;
determining the first shot segment as a reference shot segment;
sequentially determining N pairwise-adjacent shot segments located after the reference shot segment; wherein N is a positive integer greater than 1;
respectively determining the feature similarity between the reference shot segment and each of the N shot segments according to the plurality of shot features;
if the maximum of the feature similarities between the reference shot segment and the adjacent N shot segments is greater than a preset first threshold, determining the shot segment corresponding to the maximum feature similarity as the new reference shot segment, and returning to the step of sequentially determining the N pairwise-adjacent shot segments located after the reference shot segment to continue execution;
and if the maximum of the feature similarities between the reference shot segment and the adjacent N shot segments is smaller than the preset first threshold, determining all shot segments from the reference shot segment to the shot segment corresponding to the maximum feature similarity as one shot segment set, determining the shot segment next to the shot segment corresponding to the maximum feature similarity as the new reference shot segment, and returning to the step of sequentially determining the N pairwise-adjacent shot segments located after the reference shot segment to continue execution.
16. The method of claim 1, wherein there are a plurality of target video segments; the method further comprises:
and splicing the plurality of target video clips to obtain a comprehensive video.
17. The method according to claim 16, wherein the splicing the target video segments to obtain a composite video comprises:
selecting a plurality of candidate video clips meeting splicing conditions from the plurality of target video clips;
and splicing the candidate video clips to obtain the comprehensive video.
18. The method of claim 17, wherein the selecting the candidate video segments that satisfy the splicing condition from the target video segments comprises:
outputting the plurality of target video segments for a user for the user to select a plurality of candidate video segments from the plurality of target video segments;
and acquiring a plurality of candidate video clips selected by the user.
19. The method of claim 17, wherein selecting the candidate video segments that satisfy the splicing condition from the target video segments comprises:
determining content scores corresponding to the target video clips based on the clip contents of the target video clips;
and selecting the target video clips with content scores meeting a preset score threshold value from the plurality of target video clips as a plurality of candidate video clips.
20. The method of claim 1, further comprising:
determining a plurality of image frames corresponding to the target video segment;
detecting a key image frame with the key information in the plurality of image frames based on preset key information;
determining target information replacing the key information;
replacing key information in the key image frame with the target information to obtain a target image frame;
and generating a replacement video clip by using the target image frame and the image frames except the key image frame in the plurality of image frames.
21. A video processing method, comprising:
receiving a video to be processed sent by a user side, wherein the video to be processed is input by a user at the user side;
classifying the video to be processed according to scene information to obtain at least one scene segment;
classifying the video to be processed according to episode information to obtain at least one episode segment;
modifying the at least one scene segment by using the at least one episode segment to obtain a target video segment, wherein the modifying the at least one scene segment by using the at least one episode segment includes performing intersection processing on the video segments by using the at least one episode segment and the at least one scene segment to obtain the target video segment;
and sending the target video clip to the user side so that the user side can output the target video clip for the user.
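The "intersection processing" step that recurs in claims 21 to 23 can plausibly be read as taking the time-overlaps between episode segments and scene segments. The sketch below assumes that reading (the claims do not spell out the interval semantics) and represents each segment as a `(start, end)` pair in seconds.

```python
# One plausible reading of the claimed intersection processing:
# target segments are the non-empty overlaps between episode segments
# and scene segments (interpretation assumed, not stated in the claims).
def intersect_segments(episode_segs, scene_segs):
    """Return sorted pairwise overlaps of two (start, end) interval lists."""
    targets = []
    for es, ee in episode_segs:
        for ss, se in scene_segs:
            start, end = max(es, ss), min(ee, se)
            if start < end:  # keep non-empty overlaps only
                targets.append((start, end))
    return sorted(targets)

episode = [(2, 8), (12, 20)]
scene = [(0, 5), (6, 15)]
targets = intersect_segments(episode, scene)
# overlaps: (2, 5), (6, 8), (12, 15)
```

On this reading a target segment survives only where an episode boundary and a scene boundary agree, which is one way the episode segments could "modify" the scene segmentation.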
22. A video processing method, comprising:
acquiring a video to be processed input by a user;
classifying the video to be processed according to scene information to obtain at least one scene segment;
classifying the video to be processed according to episode information to obtain at least one episode segment;
modifying the at least one scene segment by using the at least one episode segment to obtain a target video segment, wherein the modifying the at least one scene segment by using the at least one episode segment comprises performing intersection processing on the at least one episode segment and the at least one scene segment to obtain the target video segment;
and outputting the target video segment for the user.
23. A video processing method, comprising:
acquiring a video to be processed;
determining a plurality of pieces of modal information corresponding to different video contents in the video to be processed;
determining, in the video to be processed, the video segments carrying each piece of the modal information, to obtain modal segments respectively corresponding to the plurality of pieces of modal information;
merging those modal segments, among the modal segments respectively corresponding to the plurality of pieces of modal information, that share a common partial segment, to obtain at least one episode segment;
classifying the video to be processed according to scene information to obtain at least one scene segment;
and modifying the at least one scene segment by using the at least one episode segment to obtain a target video segment, wherein the modifying the at least one scene segment by using the at least one episode segment comprises performing intersection processing on the at least one episode segment and the at least one scene segment to obtain the target video segment.
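The merging step of claim 23, combining modal segments that share a common partial segment into episode segments, can be read as an interval union over the modal timelines. The sketch below assumes that interpretation; the modality names (audio, subtitle, face) are illustrative, not taken from the claims.

```python
# Sketch of the modal-merging step of claim 23 (interpretation assumed):
# segments from different modalities that overlap in time are merged
# into a single episode segment by taking the union of their intervals.
def merge_modal_segments(modal_segments):
    """Union overlapping (start, end) intervals across all modalities."""
    intervals = sorted(i for segs in modal_segments for i in segs)
    merged = []
    for start, end in intervals:
        if merged and start <= merged[-1][1]:  # overlaps the previous run
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))        # start a new episode segment
    return merged

audio = [(0, 4), (10, 14)]
subtitle = [(3, 6)]
face = [(13, 18)]
episodes = merge_modal_segments([audio, subtitle, face])
```

Here the audio and subtitle segments share the span 3 to 4, so they fuse into one episode segment, and likewise for the second audio segment and the face segment.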
24. A video processing apparatus, comprising: a storage component and a processing component; the storage component is configured to store one or more computer instructions to be invoked by the processing component;
the processing component is configured to:
acquire a video to be processed; classify the video to be processed according to first dimension information to obtain at least one first video segment; classify the video to be processed according to second dimension information to obtain at least one second video segment; and modify the at least one first video segment by using the at least one second video segment to obtain a target video segment, wherein the modifying the at least one first video segment by using the at least one second video segment comprises performing intersection processing on the at least one second video segment and the at least one first video segment to obtain the target video segment.
25. A server, comprising: a storage component and a processing component; the storage component is configured to store one or more computer instructions to be invoked by the processing component;
the processing component is configured to:
receive a video to be processed sent by a user side, the video to be processed being input by a user at the user side; classify the video to be processed according to scene information to obtain at least one scene segment; classify the video to be processed according to episode information to obtain at least one episode segment; modify the at least one scene segment by using the at least one episode segment to obtain a target video segment, wherein the modifying the at least one scene segment by using the at least one episode segment comprises performing intersection processing on the at least one episode segment and the at least one scene segment to obtain the target video segment; and send the target video segment to the user side so that the user side outputs the target video segment for the user.
26. An electronic device, comprising: a storage component and a processing component; the storage component is configured to store one or more computer instructions to be invoked by the processing component;
the processing component is configured to:
acquire a video to be processed input by a user; classify the video to be processed according to scene information to obtain at least one scene segment; classify the video to be processed according to episode information to obtain at least one episode segment; modify the at least one scene segment by using the at least one episode segment to obtain a target video segment, wherein the modifying the at least one scene segment by using the at least one episode segment comprises performing intersection processing on the at least one episode segment and the at least one scene segment to obtain the target video segment; and output the target video segment for the user.
27. A video processing apparatus, comprising: a storage component and a processing component; the storage component is configured to store one or more computer instructions to be invoked by the processing component;
the processing component is configured to:
acquire a video to be processed; determine a plurality of pieces of modal information corresponding to different video contents in the video to be processed; determine, in the video to be processed, the video segments carrying each piece of the modal information, to obtain modal segments respectively corresponding to the plurality of pieces of modal information; merge those modal segments that share a common partial segment to obtain at least one episode segment; classify the video to be processed according to scene information to obtain at least one scene segment; and modify the at least one scene segment by using the at least one episode segment to obtain a target video segment, wherein the modifying the at least one scene segment by using the at least one episode segment comprises performing intersection processing on the at least one episode segment and the at least one scene segment to obtain the target video segment.
CN202010537095.6A 2020-06-12 2020-06-12 Video processing method and device, server and electronic device Active CN113810782B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010537095.6A CN113810782B (en) 2020-06-12 2020-06-12 Video processing method and device, server and electronic device

Publications (2)

Publication Number Publication Date
CN113810782A CN113810782A (en) 2021-12-17
CN113810782B (en) 2022-09-27

Family

ID=78892194

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105338424A (en) * 2015-10-29 2016-02-17 Nubia Technology Co., Ltd. Video processing method and system
WO2017107670A1 (en) * 2015-12-23 2017-06-29 Huawei Technologies Co., Ltd. Video bit rate identification method and device
CN108093314A (en) * 2017-12-19 2018-05-29 Beijing QIYI Century Science & Technology Co., Ltd. News video splitting method and device
WO2018149175A1 (en) * 2017-02-20 2018-08-23 Beijing Kingsoft Security Software Co., Ltd. Video-recording method and apparatus, and electronic device
CN108965906A (en) * 2018-05-28 2018-12-07 Anhui Weide Industrial Automation Co., Ltd. Video storage method based on a cloud storage server
CN109005451A (en) * 2018-06-29 2018-12-14 Hangzhou Xingxi Technology Co., Ltd. Video splitting method based on deep learning
CN110213670A (en) * 2019-05-31 2019-09-06 Beijing QIYI Century Science & Technology Co., Ltd. Video processing method and device, electronic device and storage medium
CN111107392A (en) * 2019-12-31 2020-05-05 Beijing Baidu Netcom Science and Technology Co., Ltd. Video processing method and device and electronic device
CN111314732A (en) * 2020-03-19 2020-06-19 Qingdao Jukanyun Technology Co., Ltd. Method for determining video label, server and storage medium

Family Cites Families (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20110032610A (en) * 2009-09-23 2011-03-30 Samsung Electronics Co., Ltd. Apparatus and method for scene segmentation
US10262239B2 * 2016-07-26 2019-04-16 Viisights Solutions Ltd. Video content contextual classification
CN107392883B (en) * 2017-08-11 2019-11-08 Pang Zemufeng Method and system for calculating the degree of dramatic conflict in film and television works
US10417501B2 * 2017-12-06 2019-09-17 International Business Machines Corporation Object recognition in video
CN110493637B (en) * 2018-05-14 2022-11-18 Alibaba (China) Co., Ltd. Video splitting method and device
CN109063611B (en) * 2018-07-19 2021-01-05 Beijing Moviebook Technology Co., Ltd. Face recognition result processing method and device based on video semantics
CN110798752B (en) * 2018-08-03 2021-10-15 Beijing Jingdong Shangke Information Technology Co., Ltd. Method and system for generating video summary
CN109195011B (en) * 2018-10-25 2022-01-25 Tencent Technology (Shenzhen) Co., Ltd. Video processing method, device, equipment and storage medium
CN111314775B (en) * 2018-12-12 2021-09-07 Huawei Device Co., Ltd. Video splitting method and electronic device
CN110166828A (en) * 2019-02-19 2019-08-23 Tencent Technology (Shenzhen) Co., Ltd. Video processing method and device
CN109889738A (en) * 2019-04-02 2019-06-14 Zhang Pengcheng Interactive video processing method and device
CN110097026B (en) * 2019-05-13 2021-04-27 Beijing University of Posts and Telecommunications Paragraph association rule evaluation method based on multi-dimensional element video segmentation
CN110619284B (en) * 2019-08-28 2023-09-05 Tencent Technology (Shenzhen) Co., Ltd. Video scene division method, device, equipment and medium
CN110941594B (en) * 2019-12-16 2023-04-18 Beijing QIYI Century Science & Technology Co., Ltd. Splitting method and device of video file, electronic device and storage medium
CN111225236B (en) * 2020-01-20 2022-03-25 Beijing Baidu Netcom Science and Technology Co., Ltd. Method and device for generating video cover, electronic device and computer-readable storage medium


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20231109

Address after: Room 2801, 28th Floor, Building 9, Zone 4, Wangjing Dongyuan, Chaoyang District, Beijing

Patentee after: Alibaba Damo Academy (Beijing) Technology Co.,Ltd.

Address before: Box 847, four, Grand Cayman capital, Cayman Islands, UK

Patentee before: ALIBABA GROUP HOLDING Ltd.
