CN116866670A - Video editing method, device, electronic equipment and storage medium - Google Patents

Video editing method, device, electronic equipment and storage medium

Info

Publication number
CN116866670A
CN116866670A
Authority
CN
China
Prior art keywords
video
text
processed
input
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310543118.8A
Other languages
Chinese (zh)
Inventor
梁康武
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Vivo Mobile Communication Co Ltd
Original Assignee
Vivo Mobile Communication Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Vivo Mobile Communication Co Ltd filed Critical Vivo Mobile Communication Co Ltd
Priority to CN202310543118.8A priority Critical patent/CN116866670A/en
Publication of CN116866670A publication Critical patent/CN116866670A/en
Pending legal-status Critical Current

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47End-user applications
    • H04N21/472End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content
    • H04N21/47205End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content for manipulating displayed content, e.g. interacting with MPEG-4 objects, editing locally
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/431Generation of visual interfaces for content selection or interaction; Content or additional data rendering
    • H04N21/4312Generation of visual interfaces for content selection or interaction; Content or additional data rendering involving specific graphical features, e.g. screen layout, special fonts or colors, blinking icons, highlights or animations
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47End-user applications
    • H04N21/472End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content
    • H04N21/47217End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content for controlling playback functions for recorded or on-demand content, e.g. using progress bars, mode or play-point indicators or bookmarks
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47End-user applications
    • H04N21/472End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content
    • H04N21/4728End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content for selecting a Region Of Interest [ROI], e.g. for requesting a higher resolution version of a selected region

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Databases & Information Systems (AREA)
  • Human Computer Interaction (AREA)
  • Television Signal Processing For Recording (AREA)

Abstract

The application discloses a video editing method, a video editing device, an electronic device, and a storage medium, belonging to the technical field of video processing. The method comprises the following steps: displaying N video texts of a video to be processed, wherein N is a positive integer; receiving a first input to a first video text of the N video texts; and in response to the first input, determining a target video according to a first video frame corresponding to the first video text in the video to be processed.

Description

Video editing method, device, electronic equipment and storage medium
Technical Field
The application belongs to the field of video processing, and particularly relates to a video editing method, a video editing device, electronic equipment and a storage medium.
Background
Long-form video has become an important medium for spreading knowledge. Users often learn through knowledge videos, course videos, online lesson videos, and the like.
Users often need to edit long videos. Typically, they clip a video while previewing it, and it takes a long time to locate and edit the desired content, resulting in low video editing efficiency.
Disclosure of Invention
The embodiment of the application aims to provide a video editing method, a video editing device, electronic equipment and a storage medium, which can solve the problem of low video editing efficiency.
In a first aspect, an embodiment of the present application provides a video editing method, including:
displaying N video texts of the video to be processed, wherein N is a positive integer;
receiving a first input to a first video text of the N video texts;
and responding to the first input, and determining a target video according to a first video frame corresponding to the first video text in the video to be processed.
In a second aspect, an embodiment of the present application provides an apparatus for video editing, including:
the display module is used for displaying N video texts of the video to be processed, wherein N is a positive integer;
the receiving module is used for receiving a first input of a first video text in the N video texts;
and the determining module is used for responding to the first input and determining a target video according to a first video frame corresponding to the first video text in the video to be processed.
In a third aspect, an embodiment of the present application provides an electronic device comprising a processor and a memory storing a program or instructions executable on the processor, which program or instructions, when executed by the processor, implement the steps of the method described in the first aspect.
In a fourth aspect, embodiments of the present application provide a readable storage medium having stored thereon a program or instructions which when executed by a processor perform the steps of the method according to the first aspect.
In a fifth aspect, an embodiment of the present application provides a chip, where the chip includes a processor and a communication interface, where the communication interface is coupled to the processor, and where the processor is configured to execute a program or instructions to implement a method according to the first aspect.
In a sixth aspect, embodiments of the present application provide a computer program product stored in a storage medium, the program product being executable by at least one processor to implement the method according to the first aspect.
In the embodiment of the present application, the video texts of the video to be processed are displayed, a first input to a first video text of the N video texts is received, and in response to the first input, a target video is determined according to a first video frame corresponding to the first video text in the video to be processed. Because the video texts carry rich information about the video to be processed, the user can edit the video through its texts to obtain a target video containing the desired content, which improves video editing efficiency.
Drawings
Fig. 1 is a schematic flow chart of a video editing method according to an embodiment of the present application;
fig. 2 is a schematic diagram of a video playing interface in a video editing method according to an embodiment of the present application;
fig. 3 is a schematic diagram of a video text display interface in a video editing method according to an embodiment of the present application;
fig. 4 is an interface schematic diagram of a video text interception operation in the video editing method according to the embodiment of the present application;
fig. 5 is an interface schematic diagram of a first video text movement operation in the video editing method according to the embodiment of the present application;
fig. 6 is an interface schematic diagram of a first video text deletion operation in the video editing method according to the embodiment of the present application;
fig. 7 is a schematic diagram of setting annotation information in the video editing method according to the embodiment of the present application;
FIG. 8 is a schematic diagram showing annotation information in a video editing method according to an embodiment of the present application;
FIG. 9 is a second schematic diagram of displaying annotation information in the video editing method according to the embodiment of the present application;
FIG. 10 is a second schematic diagram of a video playback interface in the video editing method according to the embodiment of the present application;
FIG. 11 is a schematic diagram of a video text catalog display interface in a video editing method according to an embodiment of the present application;
FIG. 12 is a schematic diagram of a playback jump interface in a video editing method according to an embodiment of the present application;
FIG. 13 is a second schematic diagram of a playback jump interface in the video editing method according to the embodiment of the present application;
FIG. 14 is a third schematic diagram of a playback jump interface in the video editing method according to the embodiment of the present application;
FIG. 15 is a fourth schematic diagram of a playback jump interface in the video editing method according to the embodiment of the present application;
FIG. 16 is a fifth schematic diagram of a playback jump interface in the video editing method according to the embodiment of the present application;
FIG. 17 is a sixth schematic diagram of a playback jump interface in the video editing method according to the embodiment of the present application;
FIG. 18 is a schematic diagram of a video editing apparatus according to an embodiment of the present application;
fig. 19 is a schematic structural diagram of an electronic device according to an embodiment of the present application;
fig. 20 is a schematic diagram of a hardware structure of an electronic device according to an embodiment of the present application.
Detailed Description
The technical solutions of the embodiments of the present application will be clearly described below with reference to the drawings in the embodiments of the present application, and it is apparent that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, which are obtained by a person skilled in the art based on the embodiments of the present application, fall within the scope of protection of the present application.
The terms "first", "second", and the like in the description and claims are used to distinguish between similar objects and do not necessarily describe a particular order or sequence. It is to be understood that the data so used may be interchanged where appropriate, so that the embodiments of the present application can be implemented in orders other than those illustrated or described herein. Objects identified by "first", "second", etc. are generally of one type, and the number of such objects is not limited; for example, the first object may be one or more. Furthermore, "and/or" in the description and claims denotes at least one of the connected objects, and the character "/" generally indicates an "or" relationship between the associated objects.
The video editing method provided by the embodiment of the application is described in detail below through specific embodiments and application scenes thereof with reference to the accompanying drawings.
Fig. 1 is a schematic flow chart of a video editing method according to an embodiment of the present application, as shown in fig. 1, the method includes:
step 101, displaying N video texts of a video to be processed, wherein N is a positive integer;
the execution main body of the embodiment of the application is electronic equipment. Electronic devices include, but are not limited to, smartphones, tablets, computers, and the like.
The video to be processed is a video to be edited, and can be an original video or a video edited by using the video editing method provided by the embodiment of the application.
The video text of the video to be processed may be video subtitles, or may be text obtained by converting the speech of one or more video objects in the video to be processed. The video subtitles may be recognized from video frames of the video to be processed.
One line or one sentence of text may be treated as one video text; the embodiment of the present application does not limit how the video text is divided.
The electronic device displays the N video texts of the video to be processed. The number N of displayed video texts may be determined according to the size of the electronic device's display screen: the larger the screen, the larger N may be.
When the total number of video texts is greater than N, they cannot all be displayed at once. The display area of the video text may be a sliding window, and the user can slide it up and down to view the other video texts, thereby obtaining an overview of the video text of the video to be processed.
By browsing the video text, the user can survey the content of the video to be processed, quickly and accurately locate the required content, and save the time spent searching through the video.
Before the electronic equipment displays N video texts of the video to be processed, the electronic equipment receives an input operation triggering the display of the video texts of the video to be processed.
The input operation may be a click operation on a video playing interface of the electronic device, or a click operation on a video thumbnail, which is not limited in the embodiment of the present application.
Fig. 2 is a schematic diagram of a video playing interface in a video editing method according to an embodiment of the present application. Fig. 3 is a schematic diagram of a video text display interface in the video editing method according to the embodiment of the present application. When the user views the video to be processed, an edit button is presented on the video playing interface, as shown in the lower right corner of fig. 2. If the user needs to edit the video, the user clicks the edit button, and the electronic device displays the video text display interface shown in fig. 3.
In fig. 3, the top of the video text display interface may display the content picture of the video to be processed, the middle displays the playing progress bar of the video to be processed, and the bottom displays the video text of the video to be processed row by row. The embodiment of the application does not limit the layout of the video text display interface.
Each video text is associated with the progress-bar area of its corresponding video frames. The playing progress bar of the video to be processed is thus divided into several sections according to the progress-bar area associated with each video text. Each rectangular box in fig. 3 represents one progress-bar area.
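The association described above can be modeled as a list of segments, each pairing one video text with a time range. Below is a minimal Python sketch; the `TextSegment` structure and the proportional mapping onto the progress bar are illustrative assumptions, not the patent's implementation:

```python
from dataclasses import dataclass

@dataclass
class TextSegment:
    """One video text together with the time range of its corresponding frames."""
    text: str
    start_s: float  # start of the associated progress-bar region, in seconds
    end_s: float    # end of the associated progress-bar region, in seconds

def progress_bar_regions(segments, duration_s):
    """Map each segment's time range to a (left, right) fraction of the progress bar."""
    return [(seg.start_s / duration_s, seg.end_s / duration_s) for seg in segments]

segments = [
    TextSegment("Welcome to the course.", 0.0, 4.0),
    TextSegment("Today we cover chapter one.", 4.0, 9.0),
]
regions = progress_bar_regions(segments, duration_s=10.0)
```

Each returned pair gives the horizontal extent of one rectangular box on the progress bar, which is how the per-text regions in fig. 3 could be laid out.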
Step 102, receiving a first input of a first video text in the N video texts;
the first video text is selected and acquired by a user from video text of the video to be processed. The selection mode of the first video text can be the operation of clicking the video text, and can also be the intercepting operation of the video text.
When the video text is intercepted, a user selects a certain row of video text, the selected video text is positioned in the selection box, the progress bar area corresponding to the row of video text presents a selected state, and a sliding block is respectively displayed on the left and right sides of the progress bar.
Fig. 4 is an interface schematic diagram of a video text interception operation in the video editing method according to the embodiment of the present application. The user can drag the left and right borders of the selection box to adjust which part of the line of text is intercepted; the intercepted content is taken as the first video text.
Alternatively, the left and right sliders on the progress-bar area corresponding to the line can be dragged to intercept the corresponding first video text. In this case, the left and right boundaries of the selection box are adjusted in linkage, as shown in fig. 4.
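The interception above can be illustrated with a small Python sketch. The `intercept_text` helper and the uniform-speaking-rate assumption are hypothetical, introduced only to show how a slider range might map back to a substring of the line:

```python
def intercept_text(text, seg_start_s, seg_end_s, slider_left_s, slider_right_s):
    """Map the slider sub-range of a segment's progress-bar area back to a
    substring of the line's text, assuming a uniform speaking rate
    (an approximation; real timing would come from per-word timestamps)."""
    duration = seg_end_s - seg_start_s
    lo = round((slider_left_s - seg_start_s) / duration * len(text))
    hi = round((slider_right_s - seg_start_s) / duration * len(text))
    return text[lo:hi]

# Dragging the sliders to [2 s, 5 s] of a 0-10 s line selects the middle characters.
selected = intercept_text("abcdefghij", 0.0, 10.0, 2.0, 5.0)
```

The same mapping run in reverse would move the sliders when the user drags the borders of the selection box instead.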
The user selects the first video text from the displayed video texts and performs a first input on it; the electronic device receives the first input.
The first input is an editing operation on the first video text, such as deleting or moving it. The embodiment of the present application does not specifically limit the first input.
And step 103, responding to the first input, and determining a target video according to a first video frame corresponding to the first video text in the video to be processed.
When the first video text is a video subtitle, the video frame corresponding to the first video text refers to the video frame from which the first video text was recognized.
When the first video text is text obtained by speech conversion, the video frames corresponding to the first video text refer to the video frames in which the speech that was converted into the first video text occurs.
In response to the first input, the electronic device determines a target video from the first video frame corresponding to the first video text. Specifically, according to the editing operation on the first video text, the corresponding editing operation is performed on the video frames corresponding to the first video text to obtain the target video, which can then be shared or saved.
Editing the video frames corresponding to the first video text is thus achieved by editing the first video text, which improves the accuracy and efficiency of video editing.
For example, when the first input is a deletion operation of the first video text, the electronic device deletes a video frame corresponding to the first video text from the video to be processed in response to the deletion operation.
In the embodiment of the present application, the video texts of the video to be processed are displayed, a first input to a first video text of the N video texts is received, and in response to the first input, a target video is determined according to the first video frame corresponding to the first video text in the video to be processed. Because the video texts carry rich information about the video to be processed, the user can edit the video through its texts to obtain a target video containing the desired content, which improves video editing efficiency.
Optionally, before the step of displaying N video texts of the video to be processed, the method further includes:
and determining N video texts of the video to be processed according to the voice of the video object in the video to be processed.
The video object is an object, such as a person object, which emits a voice in the video to be processed.
The voice of the video object contains rich information of the video to be processed, and the information is generally information focused on the video to be processed by a user.
The electronic device converts the voice of the video object in the video to be processed into video text, and the converting step may specifically include: preprocessing the voices of all video objects in the video to be processed, including removing noise, enhancing voice signals and the like; converting the preprocessed voice into a feature vector; the feature vectors are analyzed by using a voice recognition algorithm to recognize the text content in the voice.
The speech recognition algorithm may be based on dynamic time warping, on a parametric model such as a hidden Markov model, or on a non-parametric model such as vector quantization or a neural network. The embodiment of the present application does not limit the speech recognition algorithm.
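The three conversion steps above (pre-processing, feature extraction, recognition) can be sketched as a toy Python pipeline. The `denoise`, `extract_features`, and stub `recognize` functions are illustrative stand-ins, not a real speech recognition system:

```python
import numpy as np

def denoise(audio: np.ndarray) -> np.ndarray:
    """Toy pre-processing: remove DC offset and peak-normalize the signal."""
    audio = audio - audio.mean()
    peak = np.abs(audio).max()
    return audio / peak if peak > 0 else audio

def extract_features(audio: np.ndarray, frame_len: int = 400) -> np.ndarray:
    """Toy feature extraction: per-frame log energy (real systems use MFCCs)."""
    n = len(audio) // frame_len
    frames = audio[: n * frame_len].reshape(n, frame_len)
    return np.log1p((frames ** 2).sum(axis=1))

def recognize(features: np.ndarray) -> str:
    """Stub decoder standing in for an HMM- or neural-network-based recognizer."""
    return f"<text decoded from {len(features)} frames>"

audio = 0.5 + np.sin(np.linspace(0.0, 100.0, 16000))  # 1 s of fake 16 kHz audio
text = recognize(extract_features(denoise(audio)))
```

A production pipeline would replace the stub with one of the algorithm families named above, but the staged structure (clean, featurize, decode) is the same.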
According to the embodiment of the application, the video text of the video to be processed is obtained and displayed according to the voice of the video object in the video to be processed, and the user can accurately know the content of the video to be processed by browsing the video text of the video to be processed, so that the required content can be rapidly and accurately positioned.
Optionally, before the step of determining N video texts of the video to be processed according to the voices of the video objects in the video to be processed, the method further includes:
receiving a second input to a target one of the video objects;
The determining N video texts of the video to be processed according to the voices of the video objects in the video to be processed includes:
and responding to the second input, and determining N video texts of the video to be processed according to the voice of the target video object.
The target video object is a video object of interest in the video to be processed by the user.
The second input is an operation of selecting a target video object from the video to be processed, such as inputting a name of the target video object or a click operation on the target video object. The second input is not limited by the embodiment of the application.
The electronic device receives a second input and, in response to the second input, extracts speech of the target video object from the speech of the video object and converts the speech of the target video object into video text.
The specific step of extracting the speech of the target video object from the speech of the video object may include: classifying the voices of the video objects, and classifying the voices of each video object into one type; identifying a target video object from video frames of the video to be processed; and determining the voice belonging to the target video object according to the video frame corresponding to the target video object and the video frame corresponding to each voice.
The voice classification algorithm can be a support vector machine algorithm, a neural network algorithm and the like. The embodiment of the application does not limit the voice classification algorithm.
Text content in the speech of the target video object is recognized by a speech recognition algorithm, which may be based on dynamic time warping, or on a non-parametric model such as vector quantization or a neural network. The embodiment of the present application does not limit the speech recognition algorithm.
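Determining which speech belongs to the target video object, given the frames in which it was detected, might look like the following sketch. The clip tuple layout and the `texts_of_target` helper are assumptions made for illustration:

```python
def texts_of_target(speech_clips, target_frames):
    """Keep the text of speech clips whose frame range overlaps frames in which
    the target video object was detected.

    speech_clips: list of (speaker_id, start_frame, end_frame, text) produced
                  after classifying the voices by speaker.
    target_frames: set of frame indices in which the target object appears.
    """
    selected = []
    for _speaker, start, end, text in speech_clips:
        if any(f in target_frames for f in range(start, end + 1)):
            selected.append(text)
    return selected

clips = [
    ("spk0", 0, 99, "hello"),
    ("spk1", 100, 199, "today's topic"),
    ("spk0", 200, 299, "goodbye"),
]
texts = texts_of_target(clips, target_frames=set(range(100, 200)))
```

Only the clips co-occurring with the target's detected frames survive, which is the frame-correspondence step described above.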
According to the embodiment of the application, the user selects the target video object focused on by the user from the video objects of the video to be processed, and the video text is obtained and displayed according to the voice of the target video object, so that the browsing range of the user is reduced, and the user can conveniently and quickly position the required content.
Optionally, the first input is for moving the first video text,
the determining, in response to the first input, a target video according to a first video frame corresponding to the first video text in the video to be processed, includes:
responsive to the first input, determining a second movement location of the first video frame from a first movement location of the first video text;
moving the first video frame to the second movement position;
And generating the target video according to the moved first video frame.
The manner of moving the first video text may be a mouse drag operation or a mouse click operation.
The first moving position is a position to which the first video text is to be moved among the N video texts. The second moving position is a position to which the first video frame corresponding to the first video text is to be moved in the video to be processed.
The first video text may be dragged to a first movement location of the N video texts by a mouse drag operation or moved to the first movement location by a mouse clicking on the first movement location of the N video texts.
In response to the first input, the electronic device obtains, according to the correspondence between video texts and video frames, the frame position corresponding to the first movement position of the first video text, that is, the second movement position, and moves the video frame corresponding to the first video text to the second movement position.
Fig. 5 is an interface schematic diagram of a first video text movement operation in the video editing method according to the embodiment of the present application. As shown in fig. 5, after selecting the first video text, the user may long-press it and drag it up or down to change its position.
When the first video text is moved to the first movement position, its playing progress-bar area automatically moves to the corresponding position. Conversely, the progress-bar area can be dragged left or right, and the corresponding first video text automatically moves up or down accordingly, realizing linked adjustment of the first video text and the progress-bar area of its corresponding video frames.
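The linked movement of a video text and its frame range can be modeled as reordering (text, frame-range) pairs. A minimal sketch, assuming segments are stored as such pairs:

```python
def move_text_and_frames(segments, src, dst):
    """Move the video text at index src to index dst; its frame range moves
    with it, so the text order and the frame order stay in sync."""
    segments = list(segments)  # copy, so the caller's list is untouched
    segments.insert(dst, segments.pop(src))
    return segments

segments = [
    ("intro", (0, 120)),
    ("detail", (121, 300)),
    ("summary", (301, 360)),
]
# Dragging "summary" to the top reorders both the texts and their frames.
reordered = move_text_and_frames(segments, src=2, dst=0)
```

Rendering the target video then means playing the frame ranges in the new segment order.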
According to the embodiment of the application, the first video text in the video text of the video to be processed is moved, so that the movement operation of the first video text corresponding to the first video frame is realized, and the accuracy and efficiency of video editing are improved.
Optionally, the first input is for deleting the first video text;
the determining, in response to the first input, a target video according to a first video frame corresponding to the first video text in the video to be processed, includes:
and deleting the first video frame from the video to be processed in response to the first input, and generating the target video.
After the first video text is selected, it may be deleted by a keyboard delete operation or a mouse delete operation.
The electronic equipment responds to the first input, acquires a first video frame corresponding to the first video text in the video to be processed, deletes the first video frame from the video to be processed, and obtains a target video.
Fig. 6 is an interface schematic diagram of a first video text deletion operation in the video editing method according to the embodiment of the present application. As shown in fig. 6, after the user selects the first video text, two buttons are displayed on the right side of the area where it is located: a delete button above and an add button below. When the user clicks the delete button, the first video text is deleted from the N video texts, the first video frame corresponding to it is deleted from the video to be processed, and the corresponding playing progress-bar area is deleted in linkage.
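The linked deletion of a video text and its frames can be sketched as follows, assuming segments are stored as (text, frame-range) pairs and frame indices are renumbered after the cut:

```python
def delete_segment(segments, index):
    """Delete segments[index]; shift the frame ranges of later segments left
    so the remaining ranges describe the edited (target) video."""
    _text, (start, end) = segments[index]
    removed = end - start + 1
    result = []
    for i, (text, (s, e)) in enumerate(segments):
        if i == index:
            continue
        if s > end:  # segment lies after the cut: renumber its frames
            s, e = s - removed, e - removed
        result.append((text, (s, e)))
    return result

segments = [("intro", (0, 99)), ("ad break", (100, 199)), ("outro", (200, 259))]
edited = delete_segment(segments, index=1)
```

The text list, the frame ranges, and (in the UI) the progress-bar areas all shrink together, which is the linkage the interface in fig. 6 describes.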
In the embodiment of the present application, deleting the first video text causes the first video frame corresponding to it to be deleted, which improves the accuracy and efficiency of video editing.
Optionally, the determining, in response to the first input, a target video according to a first video frame corresponding to the first video text in the video to be processed includes:
Adding annotation information for a first video frame corresponding to the first video text in the video to be processed in response to the first input;
and generating the target video according to the first video frame added with the annotation information.
The first input is an operation of adding annotation information to the first video frame, which may be an input operation to an input box, or may be a selection operation to the annotation information.
The annotation information includes at least one of text information, voice information, image information, and color information.
And adding annotation information for the first video frame, wherein the annotation information is used for marking the content or the position of the first video frame.
When the annotation information comprises text information, voice information or image information, the text information, the voice information or the image information can be displayed together when the first video frame is played, so that a user can conveniently and quickly know the content of the first video frame according to the text information, the voice information or the image information, and the required information can be quickly positioned.
When the annotation information comprises color information, a progress bar area corresponding to the first video frame can be displayed according to the color information, so that a user can conveniently and quickly position the first video frame.
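The annotation types listed above can be modeled with a simple record. This sketch is illustrative only; the field names are assumptions, not part of the disclosure. A dialog box icon (as in fig. 8) would be shown only when there is note content, while the color, when present, restyles the progress bar area.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Annotation:
    text: Optional[str] = None        # text note
    voice_path: Optional[str] = None  # path to a recorded voice note
    image_path: Optional[str] = None  # path to an attached image
    color: Optional[str] = None       # progress-bar color, e.g. "#FF8800"

    def has_note(self) -> bool:
        # a dialog-box icon is shown only when there is note content;
        # a pure color annotation only recolors the progress bar area
        return any((self.text, self.voice_path, self.image_path))
```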
Fig. 7 is a schematic diagram of setting annotation information in the video editing method according to the embodiment of the present application. After the user selects the first video text, the progress bar area corresponding to the first video text is presented in a selected state. At this time, the user can long press the progress bar area, and a popup box appears above it, displaying two options: annotation and color. The user may input text information, voice information, or image information under the annotation option, and color information under the color option, as shown in fig. 7.
Fig. 8 is a schematic diagram of displaying annotation information in the video editing method according to the embodiment of the present application. After the annotation information is added to the first video frame, the progress bar area corresponding to the first video frame is displayed in the input color, and a dialog box icon is displayed above the progress bar area to indicate that it has annotation information, as shown in fig. 8.
Fig. 9 is a second schematic diagram of displaying annotation information in the video editing method according to the embodiment of the present application. When the user clicks the dialog box icon, a popup box appears above the progress bar area, as shown in fig. 9, displaying the annotated text information, voice information or image information. After reading the content, the user clicks the close button at the upper right corner of the popup box to close it.
According to the embodiment of the application, the labeling information is set for the first video frame corresponding to the first video text, so that a user can conveniently and quickly know the content and the position of the first video frame according to the labeling information, and the required information can be quickly positioned.
Optionally, the determining, in response to the first input, a target video according to a first video frame corresponding to the first video text in the video to be processed includes:
Responding to the first input, adding caption information for a first video frame corresponding to the first video text in the video to be processed;
and generating the target video according to the first video frame added with the subtitle information.
Under the condition that the video to be processed does not have subtitle information, the electronic equipment can convert the voice of the video object in the video to be processed into N video texts, receive first input of first video texts in the N video texts, and add subtitle information for a first video frame corresponding to the first video text in response to the first input.
The first input may be a selection operation of the first video text. The electronic device responds to the first input and directly takes the first video text as caption information of the first video frame.
The first input may also be an editing operation on the first video text. The electronic device responds to the first input, and takes the edited first video text as subtitle information of the first video frame.
And the target video generated according to the first video frame added with the caption information is the video with the caption information.
According to the embodiment of the application, the subtitle information is added for the first video frame corresponding to the first video text according to the first input of the user to the first video text, so that the user can refer to the video text of the video to be processed to add the subtitle information to the video to be processed, and the subtitle information adding efficiency is improved.
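One common way to realize the subtitle addition described above is to serialize the recognized text segments into the SubRip (.srt) subtitle format. This is an illustrative sketch, not the disclosed implementation; the (start, end, text) tuple shape is an assumption.

```python
def to_srt_timestamp(seconds):
    """Format a time in seconds as the SRT timestamp HH:MM:SS,mmm."""
    ms = int(round(seconds * 1000))
    h, ms = divmod(ms, 3600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def segments_to_srt(segments):
    """segments: list of (start_sec, end_sec, text) tuples -> SRT text."""
    lines = []
    for i, (start, end, text) in enumerate(segments, 1):
        lines += [str(i),
                  f"{to_srt_timestamp(start)} --> {to_srt_timestamp(end)}",
                  text, ""]
    return "\n".join(lines)
```

Taking the first video text (or its edited form) as the segment text directly yields the subtitle information for the corresponding first video frame.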
Optionally, the method further comprises:
receiving a third input to a second video text of the N video texts;
responding to the third input, and displaying a second video frame corresponding to the second video text in the video to be processed;
receiving a fourth input for a target video frame of the second video frames;
and generating target graphic information according to the second video text and the target video frame in response to the fourth input.
The second video text is a text used to generate the teletext information. The third input is a selection operation on a second video text of the N video texts.
The electronic device receives a third input of the second video text and displays a second video frame corresponding to the second video text for selection by a user in response to the third input.
The target video frame is a video frame used to generate the teletext information. The fourth input is a selection operation on a target video frame among the second video frames. The user selects one or more frames from the second video frames for generating the teletext information.
The electronic device receives a fourth input of the target video frame and generates target teletext information in response to the fourth input.
The target teletext information is a mixture of text and images, which may be a document containing the second video text and the target video frame.
According to the embodiment of the application, the second video text is selected from N video texts of the video to be processed, and the target image-text information is selected from the second video frames corresponding to the second video text, so that the image-text information is generated, the image-text information is richer, and the user requirements are better met.
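As an illustrative sketch only, the target teletext information combining the second video text with the selected target video frames could be emitted as a Markdown document; the entry shape and the image-path handling are assumptions made for the example.

```python
def build_teletext_doc(entries):
    """entries: list of (video_text, [frame_image_paths]) pairs.
    Returns a Markdown document mixing text and images."""
    parts = []
    for text, frame_paths in entries:
        parts.append(text)
        # each selected target video frame is embedded as an image
        parts.extend(f"![frame]({p})" for p in frame_paths)
    return "\n\n".join(parts)
```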
Optionally, the method further comprises:
and displaying a video text catalog on a video playing interface of the video to be processed.
The video text catalog is a catalog which displays N video texts of the video to be processed according to the playing sequence of the video frames corresponding to the video texts.
And displaying the video text catalogue for a user to browse while playing the video to be processed on the video playing interface.
When the number of video texts is large, all the video texts cannot be displayed in the video text catalog simultaneously. The catalog may display the first several video texts of the video to be processed, or, according to the video frame currently played in the video playing interface, display consecutive video texts containing the video text corresponding to that video frame.
The video text catalog may be displayed in the form of a sliding window that can be slid up and down to view other video texts, thereby enabling an overview of the video text of the video to be processed.
When the user finds that the content currently played in the video playing interface is not needed by the user, the user can quickly find the needed content through the video text catalog and switch to the video frame corresponding to the needed content for playing.
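The windowed catalog display described above can be sketched as follows. The (start, end, text) tuple shape and the centering policy are assumptions for illustration, not the disclosed implementation.

```python
def catalog_window(texts, current_time, window):
    """texts: list of (start_sec, end_sec, text) catalog entries.
    Return `window` consecutive entries containing the text whose
    segment covers the current playback position."""
    # find the entry whose time range covers the current position
    cur = next((i for i, (s, e, _) in enumerate(texts)
                if s <= current_time < e), 0)
    # center the window on it, clamped to the catalog bounds
    lo = max(0, min(cur - window // 2, len(texts) - window))
    return texts[lo:lo + window]
```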
Fig. 10 is a second schematic diagram of a video playing interface in the video editing method according to the embodiment of the present application. Fig. 11 is a schematic diagram of a video text catalog display interface in a video editing method according to an embodiment of the present application. When the user views the video to be processed, a catalog button is presented on the video playback interface, as shown in fig. 10.
When the user finds that the content currently played in the video playing interface is not needed, the user clicks the catalog button in fig. 10. The electronic device receives the click operation on the catalog button and pops up a popup window of the video text catalog, in which the video text of the video to be processed is displayed, as shown in fig. 11.
The user can slide up and down in the popup window to browse, and the user can quickly know the content of the video to be processed by reading the video text in the video text catalog, so as to find the required key content.
According to the embodiment of the application, the video text catalog is displayed on the video playing interface of the video to be processed for the user to browse, so that when the content currently played in the video playing interface is not needed by the user, the needed content can be found quickly by browsing the video text catalog.
Optionally, the method further comprises:
receiving a fifth input;
and responding to the fifth input, and determining a third video text from the video text catalog according to search information corresponding to the fifth input.
The fifth input may be an input operation to the input box, and the search information corresponding to the fifth input may be a keyword input in the input box.
The fifth input may also be a selection operation on preset search information, and the search information corresponding to the fifth input may be the selected search information.
According to the search information corresponding to the fifth input, the electronic equipment can match each video text in the video text catalog against the search information, and take the video text containing the search information as the third video text, so that the user can quickly locate the required information.
As shown in fig. 11, a search box is displayed on the top of the popup window, and the user can quickly acquire the third video text containing the search information by inputting the search information.
According to the embodiment of the application, the matched video text is searched from the video text of the video to be processed according to the search information corresponding to the input of the user, so that the required content is quickly found.
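The matching described above amounts to a substring search over the catalog entries. A minimal, case-insensitive sketch (the disclosure does not specify the matching rule, so case handling here is an assumption):

```python
def search_catalog(texts, keyword):
    """Return (index, text) pairs of catalog entries containing the keyword."""
    kw = keyword.lower()
    return [(i, t) for i, t in enumerate(texts) if kw in t.lower()]
```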
Optionally, the method further comprises:
Receiving a sixth input of a fourth video text in the video text catalog;
and responding to the sixth input, and playing a third video frame corresponding to the fourth video text in the video to be processed.
The sixth input is for selecting a fourth video text from the video text catalog. The sixth input may be a click operation on the fourth video text, or a box selection operation on the fourth video text.
When watching the video to be processed, if the content currently played in the video playing interface of the video to be processed is not required by the user, the user can quickly find the required content, namely the fourth video text, by browsing the video text catalog.
The fourth video text may be a single line of video text, a continuous line of video text, or a discontinuous line of video text.
And the user performs sixth input on the fourth video text, such as clicking operation, the electronic equipment receives the sixth input and plays a third video frame corresponding to the fourth video text, so that the video to be processed is quickly switched to the required content.
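The jump can be modeled as mapping the selected catalog line, or a run of consecutive lines, to a playback time range. Again, the (start, end, text) tuple shape is an assumed data model used only for illustration.

```python
def play_range_for_selection(texts, first, last=None):
    """Map a selected catalog line (or consecutive lines first..last)
    to the playback range of the corresponding video frames."""
    last = first if last is None else last
    start = texts[first][0]  # start time of the first selected line
    end = texts[last][1]     # end time of the last selected line
    return start, end
```

A player would then seek to `start` and play until `end`, switching the video to the required content.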
Fig. 12 is a schematic diagram of a playback jump interface in the video editing method according to the embodiment of the present application. After the user finds the desired fourth video text in the video text catalog, the user may click on it; the fourth video text is then contained in a selection box, indicating that this line of fourth video text has been selected.
A triangular jump button is displayed on the right side of the selection box; by clicking it, the user jumps to the video frame corresponding to the fourth video text for playing.
The jump to the video frame corresponding to the fourth video text for playing may also be implemented by other interaction manners, for example, double clicking the fourth video text, which is not limited in the embodiment of the present application.
Fig. 13 is a second schematic diagram of a playback jump interface in the video editing method according to the embodiment of the present application. When the fourth video text is a plurality of lines of video text, the user can activate the selection box by long pressing. An arrow is displayed above and below the selection box, and the user can drag the arrows to expand the box upward and/or downward to select consecutive lines of the fourth video text.
Fig. 14 is a third schematic view of a playback jump interface in the video editing method according to the embodiment of the present application. Three buttons are shown on the right side of the selection box in fig. 13. When the user has selected the fourth video text with the selection box, the preview button can be clicked; a popup window shown in fig. 14 then appears, and the video frame corresponding to the fourth video text is played in the popup window.
The user can click a save or share button to save the video frame corresponding to the fourth video text locally or share the video frame to others.
Fig. 15 is a schematic diagram of a playback jump interface in the video editing method according to the embodiment of the present application. When the user clicks the add button, a new button is presented below the interface. The upper right hand corner of the button shows a count icon with the count value being the number of lines of fourth video text added by the user in total. The fourth video text may be a plurality of lines of video text that are continuous or discontinuous.
Fig. 16 is a schematic diagram of a playback jump interface in the video editing method according to the embodiment of the present application. When the user clicks the new button, a new interface is entered. The new interface has the same structure as the original interface and is used for displaying the fourth video text added by the user in the original interface and the corresponding video frames. The user may click the back button below to return to the original interface and continue to add fourth video text.
Fig. 17 is a schematic diagram of a playback jump interface in the video editing method according to the embodiment of the present application. The user can click the completion button below in the interfaces shown in fig. 15 and 16, and play the video frame corresponding to the fourth video text in the new video playing interface, as shown in fig. 17.
Four buttons are displayed on the right side of the new video playing interface. If the user needs to edit the content in the new video playing interface, the user can click the edit button to return to the interface in fig. 15 or fig. 16 and edit the fourth video text.
If the user does not need to edit the content in the new video playing interface, the user can click a save button or a share button to save the video frame corresponding to the fourth video text to the local or share the video frame to other applications. When the user does not need to perform subsequent operations, the user can click the close button to exit the video playing interface.
According to the embodiment of the application, the video frames corresponding to the video text are played according to the video text in the video text catalog selected by the user, so that the video can be quickly jumped to the content required by the user.
Optionally, the video to be processed is a teaching video, and N video texts of the video to be processed are determined according to voices of teaching objects in the teaching video.
The teaching video is a video containing teaching contents. The teaching object includes a lecturer and may also include a learner.
The electronic equipment converts the voice of the teaching object in the video to be processed into video text, and the voice can be converted through a voice recognition algorithm.
The speech recognition algorithm may be based on dynamic time warping, on a hidden Markov model (a parametric model), or on vector quantization and neural networks (non-parametric models), etc. The embodiment of the application does not limit the speech recognition algorithm.
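Of the algorithm families named above, dynamic time warping is the simplest to illustrate. The following is a minimal DTW distance over one-dimensional feature sequences; it is a teaching sketch only — practical recognizers operate on multi-dimensional acoustic feature vectors (e.g. MFCCs), not raw scalars.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D feature sequences.
    Allows non-linear stretching of the time axis when matching."""
    inf = float("inf")
    n, m = len(a), len(b)
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # extend the cheapest of the three admissible alignments
            d[i][j] = cost + min(d[i - 1][j],      # insertion
                                 d[i][j - 1],      # deletion
                                 d[i - 1][j - 1])  # match
    return d[n][m]
```

Because DTW tolerates tempo variation, a template spoken slowly and the same word spoken quickly can still align with low cost.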
According to the embodiment of the application, the video text of the video to be processed is obtained and displayed according to the voice of the teaching object in the video to be processed, and the content of the video to be processed can be accurately known by a learner through browsing the video text of the video to be processed, so that the teaching content to be known can be rapidly and accurately positioned.
For the video editing method provided by the embodiment of the application, the execution subject may be a video editing apparatus. In the embodiment of the present application, the video editing apparatus provided in the embodiment of the present application is described by taking a video editing apparatus executing the video editing method as an example.
Fig. 18 is a schematic structural diagram of a video editing apparatus according to an embodiment of the present application, as shown in fig. 18, the apparatus includes a display module 1801, a receiving module 1802, and a determining module 1803, where:
the display module 1801 is configured to display N video texts of a video to be processed, where N is a positive integer;
the receiving module 1802 is configured to receive a first input to a first video text of the N video texts;
the determining module 1803 is configured to determine, in response to the first input, a target video according to a first video frame corresponding to the first video text in the video to be processed.
Optionally, the determining module is further configured to:
And determining N video texts of the video to be processed according to the voice of the video object in the video to be processed.
Optionally, the determining module is further configured to:
receiving a second input to a target one of the video objects;
and responding to the second input, and determining N video texts of the video to be processed according to the voice of the target video object.
Optionally, the first input is for moving the first video text;
the determining module is specifically configured to:
responsive to the first input, determining a second movement location of the first video frame from a first movement location of the first video text;
moving the first video frame to the second movement position;
and generating the target video according to the moved first video frame.
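The move operation above can be sketched as a reorder of text segments, with the corresponding video segments following the same new order. The list-of-segments model is an assumption for illustration, not the disclosed implementation.

```python
def move_video_text(texts, src, dst):
    """Moving the video text at position `src` to position `dst` moves its
    video segment with it: the returned list is the new playback order."""
    reordered = list(texts)          # leave the caller's list untouched
    reordered.insert(dst, reordered.pop(src))
    return reordered
```

Regenerating the target video then amounts to concatenating the frame ranges of the segments in this new order.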
Optionally, the first input is for deleting the first video text;
the determining module is specifically configured to:
and deleting the first video frame from the video to be processed in response to the first input, and generating the target video.
Optionally, the determining module is specifically configured to:
adding annotation information for a first video frame corresponding to the first video text in the video to be processed in response to the first input;
And generating the target video according to the first video frame added with the annotation information.
Optionally, the determining module is specifically configured to:
responding to the first input, adding caption information for a first video frame corresponding to the first video text in the video to be processed;
and generating the target video according to the first video frame added with the subtitle information.
Optionally, the method further comprises a generating module for:
receiving a third input to a second video text of the N video texts;
responding to the third input, and displaying a second video frame corresponding to the second video text in the video to be processed;
receiving a fourth input for a target video frame of the second video frames;
and generating target graphic information according to the second video text and the target video frame in response to the fourth input.
Optionally, the display module is further configured to:
and displaying a video text catalog on a video playing interface of the video to be processed.
Optionally, the determining module is further configured to:
receiving a fifth input;
and responding to the fifth input, and determining a third video text from the video text catalog according to search information corresponding to the fifth input.
Optionally, the device further comprises a playing module for:
receiving a sixth input of a fourth video text in the video text catalog;
and responding to the sixth input, and playing a third video frame corresponding to the fourth video text in the video to be processed.
Optionally, the video to be processed is a teaching video, and N video texts of the video to be processed are determined according to voices of teaching objects in the teaching video.
According to the embodiment of the application, the video text of the video to be processed is displayed, a first input to a first video text in the N video texts is received, and in response to the first input, the target video is determined according to the first video frame corresponding to the first video text in the video to be processed. Since the video text contains rich information of the video to be processed, the user edits the video to be processed through the video text to obtain a target video containing the video content required by the user, improving the efficiency of video editing.
The video editing apparatus in the embodiment of the application may be an electronic device, or may be a component in an electronic device, for example, an integrated circuit or a chip. The electronic device may be a terminal, or may be a device other than a terminal. By way of example, the electronic device may be a mobile phone, tablet computer, notebook computer, palm computer, vehicle-mounted electronic device, mobile internet device (MID), augmented reality (AR)/virtual reality (VR) device, robot, wearable device, ultra-mobile personal computer (UMPC), netbook or personal digital assistant (PDA), etc., but may also be a server, network attached storage (NAS), personal computer (PC), television (TV), teller machine or self-service machine, etc., and the embodiments of the present application are not specifically limited.
The video editing apparatus in the embodiment of the present application may be an apparatus having an operating system. The operating system may be an Android operating system, an ios operating system, or other possible operating systems, and the embodiment of the present application is not limited specifically.
The video editing apparatus provided in the embodiment of the present application can implement each process implemented by the embodiments of the methods of fig. 1 to 17, and in order to avoid repetition, a description is omitted here.
Optionally, as shown in fig. 19, an electronic device 1900 is further provided in the embodiment of the present application, which includes a processor 1901 and a memory 1902, where the memory 1902 stores a program or an instruction that can be executed on the processor 1901, and the program or the instruction implements each step of the embodiment of the video editing method when executed by the processor 1901, and the steps achieve the same technical effects, and are not repeated herein for avoiding repetition.
The electronic device in the embodiment of the application includes the mobile electronic device and the non-mobile electronic device.
Fig. 20 is a schematic diagram of a hardware structure of an electronic device implementing an embodiment of the present application.
The electronic device 2000 includes, but is not limited to: radio frequency unit 2001, network module 2002, audio output unit 2003, input unit 2004, sensor 2005, display unit 2006, user input unit 2007, interface unit 2008, memory 2009, and processor 2010.
Those skilled in the art will appreciate that the electronic device 2000 may also include a power source (e.g., a battery) for powering the various components, which may be logically connected to the processor 2010 through a power management system, so as to perform functions such as managing charging, discharging, and power consumption through the power management system. The electronic device structure shown in fig. 20 does not constitute a limitation of the electronic device; the electronic device may include more or fewer components than those shown in the drawing, combine some components, or arrange the components differently, which will not be described in detail herein.
The display unit 2006 is configured to display N video texts of a video to be processed, where N is a positive integer;
a user input unit 2007 for receiving a first input of a first video text of the N video texts;
a processor 2010 is configured to determine, in response to the first input, a target video from a corresponding first video frame of the first video text in the video to be processed.
According to the embodiment of the application, the video text of the video to be processed is displayed, a first input to a first video text in the N video texts is received, and in response to the first input, the target video is determined according to the first video frame corresponding to the first video text in the video to be processed. Since the video text contains rich information of the video to be processed, the user edits the video to be processed through the video text to obtain a target video containing the video content required by the user, improving the efficiency of video editing.
Optionally, the processor 2010 is further configured to determine N video texts of the video to be processed according to voices of video objects in the video to be processed.
According to the embodiment of the application, the video text of the video to be processed is obtained and displayed according to the voice of the video object in the video to be processed, and the user can accurately know the content of the video to be processed by browsing the video text of the video to be processed, so that the required content can be rapidly and accurately positioned.
Optionally, the user input unit 2007 is further configured to receive a second input to a target video object of the video objects;
processor 2010 is further configured to determine N video texts of the video to be processed according to voices of the target video object in response to the second input.
According to the embodiment of the application, the user selects the target video object focused on by the user from the video objects of the video to be processed, and the video text is obtained and displayed according to the voice of the target video object, so that the browsing range of the user is reduced, and the user can conveniently and quickly position the required content.
Optionally, the processor 2010 is configured to determine, in response to the first input, a second movement position of the first video frame from a first movement position of the first video text; moving the first video frame to the second movement position; and generating the target video according to the moved first video frame.
According to the embodiment of the application, the first video text in the video text of the video to be processed is moved, so that the movement operation of the first video text corresponding to the first video frame is realized, and the accuracy and efficiency of video editing are improved.
Optionally, the processor 2010 is configured to delete the first video frame from the video to be processed in response to the first input, and generate the target video.
According to the embodiment of the application, the deleting operation performed on the first video text among the video texts of the video to be processed realizes the deletion of the first video frame corresponding to the first video text, improving the accuracy and efficiency of video editing.
Optionally, the processor 2010 is configured to add annotation information to a corresponding first video frame of the first video text in the video to be processed in response to the first input; and generating the target video according to the first video frame added with the annotation information.
According to the embodiment of the application, the labeling information is set for the first video frame corresponding to the first video text, so that a user can conveniently and quickly know the content and the position of the first video frame according to the labeling information, and the required information can be quickly positioned.
Optionally, the processor 2010 is configured to add subtitle information to a corresponding first video frame of the first video text in the video to be processed in response to the first input; and generating the target video according to the first video frame added with the subtitle information.
According to the embodiment of the application, the subtitle information is added for the first video frame corresponding to the first video text according to the first input of the user to the first video text, so that the user can refer to the video text of the video to be processed to add the subtitle information to the video to be processed, and the subtitle information adding efficiency is improved.
Optionally, the user input unit 2007 is further configured to receive a third input of a second video text of the N video texts;
a display unit 2006, further configured to display, in response to the third input, a second video frame corresponding to the second video text in the video to be processed;
a user input unit 2007 for receiving a fourth input of a target video frame of the second video frames;
processor 2010 is further configured to generate target teletext information from the second video text and the target video frame in response to the fourth input.
According to the embodiment of the application, the second video text is selected from N video texts of the video to be processed, and the target image-text information is selected from the second video frames corresponding to the second video text, so that the image-text information is generated, the image-text information is richer, and the user requirements are better met.
Optionally, the display unit 2006 is further configured to display a video text catalog on the video playing interface of the video to be processed.
According to the embodiment of the application, the video text catalog is displayed on the video playing interface of the video to be processed for the user to browse, so that when the content currently played in the video playing interface is not needed by the user, the needed content can be quickly found by browsing the video text catalog.
Optionally, the user input unit 2007 is further configured to receive a fifth input;
processor 2010 is further configured to determine, in response to the fifth input, a third video text from the video text catalog according to search information corresponding to the fifth input.
According to the embodiment of the application, the matched video text is searched from the video text of the video to be processed according to the search information corresponding to the input of the user, so that the required content is quickly found.
Optionally, the user input unit 2007 is further configured to receive a sixth input of a fourth video text in the video text catalog;
processor 2010 is further configured to play a third video frame corresponding to the fourth video text in the video to be processed in response to the sixth input.
According to the embodiment of the application, the video frames corresponding to the video text are played according to the video text in the video text catalog selected by the user, so that the video can be quickly jumped to the content required by the user.
It should be appreciated that in embodiments of the present application, the input unit 2004 may include a graphics processor (Graphics Processing Unit, GPU) 20041 and a microphone 20042, the graphics processor 20041 processing image data of still pictures or video obtained by an image capturing device (e.g., a camera) in a video capturing mode or an image capturing mode. The display unit 2006 may include a display panel 20061, and the display panel 20061 may be configured in the form of a liquid crystal display, an organic light emitting diode, or the like. The user input unit 2007 includes at least one of a touch panel 20071 and other input devices 20072. The touch panel 20071 is also referred to as a touch screen. The touch panel 20071 can include two parts, a touch detection device and a touch controller. Other input devices 20072 may include, but are not limited to, a physical keyboard, function keys (e.g., volume control keys, switch keys, etc.), a trackball, a mouse, a joystick, and so forth, which are not described in detail herein.
The memory 2009 may be used to store software programs as well as various data. The memory 2009 may mainly include a first storage area storing programs or instructions and a second storage area storing data, wherein the first storage area may store an operating system, application programs or instructions required for at least one function (such as a sound playing function and an image playing function), and the like. Further, the memory 2009 may include volatile memory or nonvolatile memory, or both. The nonvolatile memory may be a read-only memory (Read-Only Memory, ROM), a programmable ROM (PROM), an erasable PROM (EPROM), an electrically erasable PROM (EEPROM), or a flash memory. The volatile memory may be a random access memory (Random Access Memory, RAM), a static RAM (SRAM), a dynamic RAM (DRAM), a synchronous DRAM (SDRAM), a double data rate SDRAM (DDR SDRAM), an enhanced SDRAM (ESDRAM), a synchlink DRAM (SLDRAM), or a direct rambus RAM (DRRAM). The memory 2009 in embodiments of the present application includes, but is not limited to, these and any other suitable types of memory.
The processor 2010 may include one or more processing units. Optionally, the processor 2010 integrates an application processor, which primarily handles operations involving the operating system, the user interface, application programs, and the like, and a modem processor, which primarily handles wireless communication signals, such as a baseband processor. It will be appreciated that the modem processor may alternatively not be integrated into the processor 2010.
An embodiment of the present application also provides a readable storage medium on which a program or instructions are stored. When the program or instructions are executed by a processor, each process of the video editing method embodiment described above is implemented, and the same technical effects can be achieved; to avoid repetition, the details are not repeated here.
The processor is the processor in the electronic device described in the above embodiment. The readable storage medium includes a computer-readable storage medium, such as a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
An embodiment of the present application further provides a chip, which includes a processor and a communication interface, the communication interface being coupled with the processor. The processor is configured to run programs or instructions to implement each process of the video editing method embodiment described above, and the same technical effects can be achieved; to avoid repetition, the details are not repeated here.
It should be understood that the chip referred to in the embodiments of the present application may also be called a system-level chip, a chip system, or a system-on-a-chip, etc.
An embodiment of the present application provides a computer program product stored in a storage medium, where the program product is executed by at least one processor to implement each process of the video editing method embodiment described above, with the same technical effects; to avoid repetition, the details are not repeated here.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, so that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such a process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises that element. Furthermore, it should be noted that the scope of the methods and apparatus in the embodiments of the present application is not limited to performing the functions in the order shown or discussed; depending on the functions involved, the functions may also be performed in a substantially simultaneous manner or in the reverse order. For example, the described methods may be performed in an order different from that described, and various steps may be added, omitted, or combined. Additionally, features described with reference to certain examples may be combined in other examples.
From the above description of the embodiments, it will be clear to those skilled in the art that the methods of the above embodiments may be implemented by means of software plus a necessary general-purpose hardware platform, or, of course, by hardware alone, although in many cases the former is the preferred implementation. Based on such an understanding, the technical solution of the present application, in essence or in the part contributing to the prior art, may be embodied in the form of a computer software product stored in a storage medium (e.g., ROM/RAM, a magnetic disk, or an optical disk) and including instructions for causing a terminal (which may be a mobile phone, a computer, a server, a network device, or the like) to perform the method according to the embodiments of the present application.
The embodiments of the present application have been described above with reference to the accompanying drawings, but the present application is not limited to the above-described embodiments, which are merely illustrative and not restrictive. Enlightened by the present application, those of ordinary skill in the art may devise many variations without departing from the spirit of the present application and the scope of the claims, all of which fall within the protection of the present application.

Claims (15)

1. A video editing method, comprising:
displaying N video texts of the video to be processed, wherein N is a positive integer;
receiving a first input to a first video text of the N video texts;
in response to the first input, determining a target video according to a first video frame corresponding to the first video text in the video to be processed.
2. The video editing method according to claim 1, wherein before the step of displaying N video texts of a video to be processed, the method further comprises:
and determining N video texts of the video to be processed according to the voice of the video object in the video to be processed.
3. The video editing method according to claim 2, wherein before the step of determining N video texts of the video to be processed from voices of video objects in the video to be processed, the method further comprises:
receiving a second input to a target one of the video objects;
the determining N video texts of the video to be processed according to the voices of the video objects in the video to be processed includes:
in response to the second input, determining N video texts of the video to be processed according to the voice of the target video object.
4. The video editing method of claim 1, wherein the first input is for moving the first video text;
the determining, in response to the first input, a target video according to a first video frame corresponding to the first video text in the video to be processed, includes:
responsive to the first input, determining a second movement location of the first video frame from a first movement location of the first video text;
moving the first video frame to the second movement position;
and generating the target video according to the moved first video frame.
5. The video editing method of claim 1, wherein the first input is for deleting the first video text;
the determining, in response to the first input, a target video according to a first video frame corresponding to the first video text in the video to be processed, includes:
in response to the first input, deleting the first video frame from the video to be processed, and generating the target video.
6. The video editing method according to claim 1, wherein the determining, in response to the first input, a target video from a corresponding first video frame of the first video text in the video to be processed, comprises:
adding annotation information to a first video frame corresponding to the first video text in the video to be processed in response to the first input;
and generating the target video according to the first video frame added with the annotation information.
7. The video editing method according to claim 1, wherein the determining, in response to the first input, a target video from a corresponding first video frame of the first video text in the video to be processed, comprises:
in response to the first input, adding subtitle information to a first video frame corresponding to the first video text in the video to be processed;
and generating the target video according to the first video frame added with the subtitle information.
8. The video editing method of claim 1, wherein the method further comprises:
receiving a third input to a second video text of the N video texts;
in response to the third input, displaying a second video frame corresponding to the second video text in the video to be processed;
receiving a fourth input for a target video frame of the second video frames;
and generating target graphic information according to the second video text and the target video frame in response to the fourth input.
9. The video editing method of claim 1, wherein the method further comprises:
displaying a video text catalog on a video playing interface of the video to be processed.
10. The video editing method of claim 9, wherein the method further comprises:
receiving a fifth input;
in response to the fifth input, determining a third video text from the video text catalog according to search information corresponding to the fifth input.
11. The video editing method of claim 9, wherein the method further comprises:
receiving a sixth input of a fourth video text in the video text catalog;
in response to the sixth input, playing a third video frame corresponding to the fourth video text in the video to be processed.
12. The video editing method according to claim 1, wherein the video to be processed is a teaching video, and N video texts of the video to be processed are determined according to voices of teaching objects in the teaching video.
13. A video editing apparatus, comprising:
the display module is used for displaying N video texts of the video to be processed, wherein N is a positive integer;
the receiving module is used for receiving a first input of a first video text in the N video texts;
the determining module is used for determining, in response to the first input, a target video according to a first video frame corresponding to the first video text in the video to be processed.
14. An electronic device comprising a processor and a memory storing a program or instructions executable on the processor, which when executed by the processor, implement the steps of the video editing method of any of claims 1-12.
15. A readable storage medium, characterized in that the readable storage medium has stored thereon a program or instructions which, when executed by a processor, implement the steps of the video editing method according to any of claims 1-12.
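As an illustrative sketch outside the claim language, the edit operations of claims 4 and 5 (moving or deleting video frames by moving or deleting the corresponding video text) can be modeled as list operations over an ordered sequence of segments; the function names and the string segments below are assumptions for illustration:

```python
def move_segment(segments, src, dst):
    """Move the segment at index src to index dst, mirroring a drag
    of the corresponding video text to a new position in the catalog."""
    out = list(segments)
    out.insert(dst, out.pop(src))
    return out

def delete_segment(segments, idx):
    """Delete the segment at idx, mirroring deletion of its video text."""
    return [s for i, s in enumerate(segments) if i != idx]

# Toy stand-ins for timestamped video segments
segments = ["intro", "chain rule", "practice"]
moved = move_segment(segments, 2, 0)    # drag "practice" to the front
trimmed = delete_segment(segments, 0)   # remove the "intro" segment
```

Both operations return a new sequence, leaving the original untouched, which matches the pattern of generating a target video from the video to be processed rather than editing it destructively.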
CN202310543118.8A 2023-05-15 2023-05-15 Video editing method, device, electronic equipment and storage medium Pending CN116866670A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310543118.8A CN116866670A (en) 2023-05-15 2023-05-15 Video editing method, device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310543118.8A CN116866670A (en) 2023-05-15 2023-05-15 Video editing method, device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN116866670A true CN116866670A (en) 2023-10-10

Family

ID=88227458

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310543118.8A Pending CN116866670A (en) 2023-05-15 2023-05-15 Video editing method, device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116866670A (en)

Similar Documents

Publication Publication Date Title
KR102028198B1 (en) Device for authoring video scene and metadata
KR102161230B1 (en) Method and apparatus for user interface for multimedia content search
CN112437353B (en) Video processing method, video processing device, electronic apparatus, and readable storage medium
CN112954046B (en) Information transmission method, information transmission device and electronic equipment
CN113918522A (en) File generation method and device and electronic equipment
CN112887794B (en) Video editing method and device
CN112181252B (en) Screen capturing method and device and electronic equipment
CN112698761A (en) Image display method and device and electronic equipment
CN115658197A (en) Interface switching method and interface switching device
CN115437736A (en) Method and device for recording notes
CN116866670A (en) Video editing method, device, electronic equipment and storage medium
CN115309487A (en) Display method, display device, electronic equipment and readable storage medium
CN113283220A (en) Note recording method, device and equipment and readable storage medium
CN114416664A (en) Information display method, information display device, electronic apparatus, and readable storage medium
CN114302009A (en) Video processing method, video processing device, electronic equipment and medium
CN115695844A (en) Display device, server and media asset content recommendation method
WO2023000950A1 (en) Display device and media content recommendation method
CN111464845B (en) Keyboard type video input method and system
WO2022012299A1 (en) Display device and person recognition and presentation method
CN116795250A (en) Label processing method and device
CN115481598A (en) Document display method and device
CN115278378A (en) Information display method, information display device, electronic apparatus, and storage medium
CN116048696A (en) Application interaction method, application interaction device, electronic equipment and storage medium
CN116634215A (en) Display method and device
CN117395462A (en) Method and device for generating media content, electronic equipment and readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination