CN115695680A - Video editing method and device, electronic equipment and computer readable storage medium - Google Patents


Info

Publication number
CN115695680A
CN115695680A
Authority
CN
China
Prior art keywords: audio, video, displaying, filling area, response
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110871543.0A
Other languages
Chinese (zh)
Inventor
刘伟烨
姚鸿
张伟
秦程博
潘名扬
廖卓淳
苗高亮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN202110871543.0A
Publication of CN115695680A
Pending legal-status Critical Current

Links

Images

Abstract

The application provides a video editing method and apparatus, an electronic device, and a computer-readable storage medium. The method includes: in response to a video filling trigger operation for a video template, displaying a first video, where the first video is formed after at least one video material is filled into the video template; displaying at least one audio filling area on a timeline of the first video; in response to an audio setting operation, acquiring audio material to be filled into the at least one audio filling area; and in response to an audio filling trigger operation for the first video, displaying a second video in place of the first video, where the second video is formed after the at least one audio filling area of the first video is filled with the corresponding audio material. Through the method and apparatus, personalized and efficient video editing can be achieved through the mutually coordinated video filling and audio filling functions.

Description

Video editing method and device, electronic equipment and computer readable storage medium
Technical Field
The present application relates to the field of video processing technologies, and in particular, to a video editing method and apparatus, an electronic device, and a computer-readable storage medium.
Background
Video, and in particular short video, has become an important medium for social interaction and information dissemination on networks. Users can produce videos with a video editing tool to express their ideas and distribute them to the network for various forms of interaction.
In the related art, a user is supported in using the video template function provided by a video editing tool to make a video, but the video editing tools provided by the related art cannot meet the requirements of personalized and efficient video editing.
For example, videos produced from the video templates provided by the related art closely resemble the templates themselves in content and presentation, so the resulting videos lack personalization and can hardly express the user's own unique ideas and viewpoints. A second round of editing is even required to add further content, which in turn reduces editing efficiency.
Disclosure of Invention
The embodiments of the application provide a video editing method, a video editing apparatus, an electronic device, and a computer-readable storage medium, which can realize personalized and efficient video editing through mutually coordinated video filling and audio filling functions.
The technical scheme of the embodiment of the application is realized as follows:
the embodiment of the application provides a video editing method, which comprises the following steps:
in response to a video filling trigger operation for a video template, displaying a first video, where the first video is formed after at least one video material is filled into the video template;
displaying at least one audio filling area on a timeline of the first video;
in response to an audio setting operation, acquiring audio material to be filled into the at least one audio filling area;
in response to an audio filling trigger operation for the first video, displaying a second video in place of the first video, where the second video is formed after the at least one audio filling area of the first video is filled with the corresponding audio material.
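The patent text stays at the level of user-facing operations. Purely as an illustration (all names are hypothetical and not from the application), a minimal Python sketch of the data model behind these four steps might look like this: a first video holds the filled video materials plus its audio filling areas, and the second video can only be produced once every area has been given audio material.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class AudioFillingArea:
    # time period on the first video's timeline that can be filled with audio
    start: float
    end: float
    audio: Optional[str] = None  # path of the audio material, set by the audio setting operation

@dataclass
class TemplateVideo:
    materials: List[str]  # video materials filled into the template (the "first video")
    areas: List[AudioFillingArea] = field(default_factory=list)

def ready_for_audio_filling(video: TemplateVideo) -> bool:
    """The audio filling trigger operation can form the second video only
    once every audio filling area has been given audio material."""
    return bool(video.areas) and all(a.audio is not None for a in video.areas)
```

The sketch deliberately ignores rendering; it only captures the state transition between the first and second video.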
In the above solution, before displaying the first video in response to the video filling trigger operation for the video template, the method further includes: displaying at least one segment to be filled and a video material selection control included in the video template, where the video material selection control is used to select among multiple candidate video materials; highlighting the selected at least one video material in response to a video material selection operation performed through the video material selection control; and, in response to the video filling trigger operation for the video template, filling the selected at least one video material into the corresponding segment to be filled of the video template.
In the above solution, displaying the at least one segment to be filled included in the video template includes: displaying multiple candidate video templates; displaying, in response to a video template selection operation, a usage entry of a selected video template among the multiple candidate video templates; and displaying, in response to a trigger operation for the usage entry of the selected video template, the at least one segment to be filled included in the selected video template. The method further includes: displaying introduction information of the segment to be filled, where the introduction information indicates the type of video material to be filled into the segment to be filled.
In the above solution, after the filling the selected at least one video material into the video template, the method further includes: displaying an alternative video portal; in response to a triggering operation for the replacement video entry, displaying a video selection control, wherein the video selection control is used for selecting a plurality of candidate video materials; in response to a video material selection operation via the video selection control, replacing the video material with the selected at least one of the plurality of candidate video materials.
In the above solution, after filling the selected at least one video material into the video template, the method further includes: displaying a video clipping entry; displaying, in response to a trigger operation for the video clipping entry, a video clipping control, where the video clipping control includes at least one of a picture clipping control and a time period selection control; cropping, in response to a crop frame setting operation based on the picture clipping control, the picture outside the set crop frame from the video material; and cutting, in response to a time setting operation based on the time period selection control, the video clips outside the set time period from the video material.
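The two clipping operations above (cut everything outside the set crop frame; cut everything outside the set time period) amount to clamping a user selection to the material's bounds. A minimal sketch, with hypothetical names not taken from the application:

```python
def trim_time_period(duration, start, end):
    """Cut the video clips outside the set time period: only
    [start, end] of the material is kept (times in seconds)."""
    start = max(0.0, start)
    end = min(duration, end)
    if end <= start:
        raise ValueError("selected time period is empty")
    return start, end

def crop_picture(width, height, box):
    """Cut the picture outside the set crop frame: box is (x, y, w, h),
    clamped so the frame stays inside the material's picture."""
    x, y, w, h = box
    x, y = max(0, x), max(0, y)
    return x, y, min(w, width - x), min(h, height - y)
```

In a real editor the returned ranges would be handed to the rendering pipeline; here they only illustrate the bookkeeping.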
In the above solution, after filling the selected at least one video material into the video template, the method further includes: displaying, in response to a video material selection operation, the selected target video material among the at least one video material in an editing state; displaying a volume adjustment entry, and displaying a volume adjustment control in response to a trigger operation for the volume adjustment entry; and determining, in response to a setting operation for the volume adjustment control, the set volume as the volume of the target video material.
In the above scheme, after the audio material to be filled into the at least one audio filling area is acquired, the method further includes performing at least one of the following: converting the data format of the audio material into a data format conforming to the first video; and converting the sampling frequency of the audio material into a sampling frequency conforming to the first video.
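The sampling-frequency conversion mentioned above can be illustrated with a naive linear-interpolation resampler. This is a sketch only (the patent does not prescribe any resampling algorithm; a production system would use a proper polyphase or windowed-sinc resampler):

```python
def resample(samples, src_rate, dst_rate):
    """Naive linear-interpolation resampling, so the audio material's
    sampling frequency matches that of the first video's audio track."""
    if src_rate == dst_rate:
        return list(samples)
    n_out = max(1, round(len(samples) * dst_rate / src_rate))
    out = []
    for i in range(n_out):
        # map output index i to a fractional position in the input
        pos = i * (len(samples) - 1) / max(1, n_out - 1)
        lo = int(pos)
        hi = min(lo + 1, len(samples) - 1)
        frac = pos - lo
        out.append(samples[lo] * (1 - frac) + samples[hi] * frac)
    return out
```

For example, doubling the rate of a 4-sample clip yields 8 samples whose endpoints match the original.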
An embodiment of the present application provides a video editing apparatus, including:
the display module is used for responding to a video filling triggering operation aiming at a video template and displaying a first video, wherein the first video is formed after at least one video material is filled in the video template;
the display module is further configured to display at least one audio filling area on a timeline of the first video;
the acquisition module is used for responding to audio setting operation and acquiring audio materials to be filled in the at least one audio filling area;
the display module is further configured to display a second video instead of the first video in response to an audio filling trigger operation for the first video, where the second video is formed after the at least one audio filling region of the first video is filled with a corresponding audio material.
In the above scheme, the display module is further configured to display an audio setting entry corresponding to the first video; and to display, in response to a trigger operation for the audio setting entry, a time axis of the first video and at least one audio filling area on the time axis, where the audio filling area indicates a time period on the time axis that can be filled with audio material.
In the foregoing solution, the obtaining module is further configured to obtain the at least one audio filling area in at least one of the following ways: acquiring the at least one preset audio filling area from the video template; dividing the at least one audio filling area from the time axis according to the playing time of the video material filled into the first video; or dividing the time axis according to plot units of the first video and determining an audio filling area corresponding to each plot unit.
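The second way above (dividing areas by the playing time of the filled materials) is essentially a running-sum partition of the timeline. A minimal sketch with hypothetical names, one filling area per material:

```python
def areas_from_materials(durations):
    """Divide the first video's time axis into one audio filling area
    per filled video material, according to each material's playing time."""
    areas, t = [], 0.0
    for d in durations:
        areas.append((t, t + d))
        t += d
    return areas
```

Dividing by plot units would work the same way, with the per-unit durations supplied by whatever segments the plot analysis produces.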
In the above scheme, the display module is further configured to display, in response to an audio filling area selection operation, the selected target audio filling area in an editing state; and to update the displayed target audio filling area based on the start time and end time set on the time axis, in response to a time setting operation for the target audio filling area.
In the foregoing solution, when the audio setting entry is a recording entry, the display module is further configured to display a recording control corresponding to a target audio filling area, where the target audio filling area is the audio filling area in an editing state among the at least one audio filling area. The apparatus further includes a collecting module, configured to start collecting audio in response to a first audio setting operation that starts the recording control, and to stop collecting audio in response to a second audio setting operation that closes the recording control. The apparatus further includes a determining module, configured to take the collected audio as the audio material to be filled into the target audio filling area.
In the above solution, when the audio setting entry is an audio material selection entry, the display module is further configured to display an audio material selection control, where the audio material selection control is configured to select multiple candidate audio materials; the determining module is further configured to, in response to an audio setting operation selected through the audio material selection control, take at least one selected audio material of the multiple candidate audio materials as an audio material to be filled in a target audio filling region, where the target audio filling region is an audio filling region in an editing state in the at least one audio filling region.
In the above scheme, the display module is further configured to display an audio re-recording entry, and to display, in response to a trigger operation for the audio re-recording entry, the recording control for re-collection. The collecting module is further configured to start collecting audio in response to a third audio setting operation that starts the recording control, and to stop collecting audio in response to a fourth audio setting operation that closes the recording control. The determining module is further configured to take the re-collected audio as the audio material to be filled into a target audio filling area, where the target audio filling area is the audio filling area in an editing state among the at least one audio filling area.
In the above solution, the display module is further configured to display an audio deletion entry; the device further comprises a deleting module, configured to delete the audio material corresponding to a target audio filling region in response to a trigger operation for the audio deleting entry, where the target audio filling region is an audio filling region in an editing state in the at least one audio filling region.
In the above solution, the display module is further configured to display a volume adjustment entry; and for displaying a volume adjustment control in response to a triggering operation for the volume adjustment entry; the determining module is further configured to determine, in response to a setting operation for the volume adjustment control, a set volume as a volume of an audio material corresponding to a target audio filling region, where the target audio filling region is an audio filling region in an editing state in the at least one audio filling region.
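The volume adjustment described above reduces, at the signal level, to scaling the audio samples by the set factor. A sketch under the assumption of 16-bit PCM samples (the patent does not specify a sample format):

```python
def apply_volume(samples, volume):
    """Scale 16-bit PCM samples by the volume set via the volume
    adjustment control (1.0 = original, 0.0 = mute), clamping to the
    valid signed 16-bit range to avoid wrap-around distortion."""
    out = []
    for s in samples:
        v = int(round(s * volume))
        out.append(max(-32768, min(32767, v)))
    return out
```

Clamping matters: without it, doubling a loud sample would overflow the 16-bit range when written back to a PCM buffer.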
In the above scheme, the display module is further configured to display a voice changing entry; and to display multiple candidate sound-emitting objects in response to a trigger operation for the voice changing entry. The apparatus further includes a replacing module, configured to replace, in response to a sound-emitting object selection operation, the initial sound-emitting object of the audio material corresponding to a target audio filling area with a target sound-emitting object selected from the multiple candidate sound-emitting objects, where the target audio filling area is the audio filling area in an editing state among the at least one audio filling area.
In the above scheme, the display module is further configured to display a text recognition entry; and to display, in response to a trigger operation for the text recognition entry, the speech recognition result of a target audio material, where the speech recognition result is used as the subtitle of the time segment filled with the target audio material in the second video, and the target audio material is the audio material in an editing state among the audio material to be filled.
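Turning a recognition transcript into a subtitle for the filled time segment requires splitting the text into displayable cues and assigning each cue a slice of the segment. A simple even-split sketch (hypothetical names; real systems would use per-word timestamps from the recognizer instead):

```python
def subtitles_from_recognition(area_start, area_end, text, max_chars=16):
    """Spread a speech-recognition transcript evenly over the time
    segment filled with the target audio material, as (start, end, text)
    subtitle cues of at most max_chars characters each."""
    chunks = [text[i:i + max_chars] for i in range(0, len(text), max_chars)] or [""]
    span = (area_end - area_start) / len(chunks)
    return [(area_start + i * span, area_start + (i + 1) * span, c)
            for i, c in enumerate(chunks)]
```

Each cue's time window lies inside the audio filling area, so the subtitle appears exactly while its audio material plays in the second video.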
In the above scheme, the display module is further configured to display at least one segment to be filled and a video material selection control included in the video template, where the video material selection control is configured to select multiple candidate video materials; and for highlighting the selected at least one video material in response to a video material selection operation via the video material selection control; the device further comprises a filling module, which is used for responding to a video filling triggering operation aiming at the video template, and filling the selected at least one video material into a segment to be filled corresponding to the video template.
In the above scheme, the display module is further configured to display multiple candidate video templates; to display, in response to a video template selection operation, a usage entry of a selected video template among the multiple candidate video templates; to display, in response to a trigger operation for the usage entry of the selected video template, the at least one segment to be filled included in the selected video template; and to display introduction information of the segment to be filled, where the introduction information indicates the type of video material to be filled into the segment to be filled.
In the above scheme, the display module is further configured to display a replacement video entry, and to display, in response to a trigger operation for the replacement video entry, a video selection control that includes multiple candidate video materials. The replacing module is further configured to replace the video material with the selected at least one of the multiple candidate video materials, in response to a video material selection operation performed through the video selection control.
In the above scheme, the display module is further configured to display a video clipping entry, and to display a video clipping control in response to a trigger operation for the video clipping entry, where the video clipping control includes at least one of a picture clipping control and a time period selection control. The apparatus further includes a clipping module, configured to crop, in response to a crop frame setting operation based on the picture clipping control, the picture outside the set crop frame from the video material; and to cut, in response to a time setting operation based on the time period selection control, the video clips outside the set time period from the video material.
In the above scheme, the display module is further configured to display, in response to a video material selection operation, the selected target video material among the at least one video material in an editing state; and to display a volume adjustment entry and, in response to a trigger operation for the volume adjustment entry, a volume adjustment control. The determining module is further configured to determine the set volume as the volume of the target video material, in response to a setting operation for the volume adjustment control.
In the above scheme, the display module is further configured to display a plurality of candidate video materials; in response to a video material selection operation, highlighting the selected at least one video material and displaying a plurality of candidate video templates matching the at least one video material; and the filling module is also used for responding to the video template selection operation and filling the at least one video material into the segment to be filled corresponding to the selected video template.
In the foregoing solution, the apparatus further includes a conversion module, configured to perform at least one of the following processes: converting the data format of the audio material into the data format conforming to the first video; converting the sampling frequency of the audio material to conform to the sampling frequency of the first video.
An embodiment of the present application provides an electronic device, including:
a memory for storing executable instructions;
and the processor is used for realizing the video editing method provided by the embodiment of the application when the executable instructions stored in the memory are executed.
The embodiment of the application provides a computer-readable storage medium, which stores executable instructions for causing a processor to execute, so as to implement the video editing method provided by the embodiment of the application.
An embodiment of the present application provides a computer program product, which includes computer-executable instructions that, when executed by a processor, implement the video editing method provided by the embodiments of the present application.
The embodiment of the application has the following beneficial effects:
in the process of using a video template, the video material filling and audio material filling functions coordinate with each other, so that corresponding video materials and audio materials can be flexibly added as required; this improves video editing efficiency and fully meets the need to express personalized ideas and viewpoints through video.
Drawings
Fig. 1 is a schematic architecture diagram of a video editing system 100 provided in an embodiment of the present application;
fig. 2 is a schematic structural diagram of a terminal 400 provided in an embodiment of the present application;
fig. 3 is a schematic flowchart of a video editing method provided in an embodiment of the present application;
fig. 4 is a schematic application scenario diagram of a video editing method provided in an embodiment of the present application;
fig. 5 is a schematic application scenario diagram of a video editing method provided in an embodiment of the present application;
fig. 6 is a schematic application scenario diagram of a video editing method provided in an embodiment of the present application;
fig. 7 is a schematic application scenario diagram of a video editing method provided in an embodiment of the present application;
fig. 8 is a schematic application scenario diagram of a video editing method provided in an embodiment of the present application;
fig. 9 is a schematic application scenario diagram of a video editing method provided in an embodiment of the present application;
fig. 10 is a schematic application scenario diagram of a video editing method provided in an embodiment of the present application;
fig. 11 is a schematic application scenario diagram of a video editing method provided in an embodiment of the present application;
fig. 12 is a schematic application scenario diagram of a video editing method provided in an embodiment of the present application;
fig. 13 is a schematic model diagram of a video editing system provided in an embodiment of the present application;
fig. 14 is a flowchart illustrating a recording process according to an embodiment of the present application.
Detailed Description
To make the purpose, technical solutions, and advantages of the present application clearer, the present application is described in further detail below with reference to the accompanying drawings. The described embodiments should not be considered as limiting the present application, and all other embodiments obtained by a person of ordinary skill in the art without creative effort fall within the protection scope of the present application.
In the following description, reference is made to "some embodiments" which describe a subset of all possible embodiments, but it is understood that "some embodiments" may be the same subset or different subsets of all possible embodiments, and may be combined with each other without conflict.
In the following description, the terms "first", "second", and the like are only intended to distinguish similar objects and do not denote a particular order. It should be understood that objects so described may, where permissible, be interchanged in a specific order or sequence, so that the embodiments of the present application can be practiced in orders other than those illustrated or described herein.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used herein is for the purpose of describing embodiments of the present application only and is not intended to be limiting of the application.
Before further detailed description of the embodiments of the present application, terms and expressions referred to in the embodiments of the present application will be described, and the terms and expressions referred to in the embodiments of the present application will be used for the following explanation.
1) Video template: a template used for making a video. A video template may include replaceable resource content and non-replaceable resource content. The replaceable resource content may include pictures, audio used as background music, text content, and the like; the non-replaceable resource content may include the pictures other than the replaceable ones, as well as the dynamic display effect, sound effect, and text display effect applied when switching between every two pictures.
2) Video material: material used by a user when making a video. The media type of the video material may include pictures and videos; for example, when the video material selected by the user is a video, the video may include resource content of the audio media type and resource content of the image media type.
3) The audio material is used by a user when making a video, and the audio material can be audio recorded by the user in real time or audio stored in the terminal locally in advance.
4) In response to: indicates the condition or state on which a performed operation depends. When the dependent condition or state is satisfied, the one or more performed operations may be executed in real time or with a set delay; unless otherwise specified, there is no restriction on the order in which the operations are executed.
With the rapid development of the short video industry, the requirement of users for editing videos is higher and higher, and users can make videos and express ideas through video editing tools.
In the related art, when producing a video, a user generally needs to collect video materials or write a video script in advance, shoot with a native camera or recording software, and then perform operations such as clipping and editing to generate the final video. During this process the user has to carefully input and adjust video text, add his or her own recording, and so on. However, producing a video in this way is very inefficient: the production cycle generally takes 1-2 days, which is inconvenient and increases the user's operation time.
In addition, the related art also provides a function of quickly making videos with a video template: the user only needs to add video materials, and the system automatically applies the music, text subtitles, and other effects preset by the template, so that video production can be completed in one step. However, in this template-based approach a recording cannot be added to the template video (adding a voice-over requires creating another video after the template video is finished, which takes a long time), so the user cannot express his or her views on the produced video in his or her own voice, and the diversity and personalization of the video cannot be increased.
In view of this, embodiments of the present application provide a video editing method, an apparatus, an electronic device, and a computer-readable storage medium, which can implement personalized and efficient video editing through the functions of video stuffing and audio stuffing that are coordinated with each other. An exemplary application of the electronic device provided in the embodiments of the present application is described below, and the electronic device provided in the embodiments of the present application may be implemented as various types of user terminals such as a notebook computer, a tablet computer, a desktop computer, a set-top box, a mobile device (e.g., a mobile phone, a portable music player, a personal digital assistant, a dedicated messaging device, and a portable game device), and may also be implemented cooperatively by a server and a terminal. In the following, an exemplary application when the electronic device is implemented as a terminal will be explained.
Referring to fig. 1, fig. 1 is a schematic architecture diagram of a video editing system 100 provided in an embodiment of the present application. To support a video editing application, a terminal 400 is connected to a server 200 through a network 300, where the network 300 may be a wide area network or a local area network, or a combination of the two.
The terminal 400 runs a client 410. The client 410 may be a video editing client or a client integrated with a video editing function (such as a social network client or an instant messaging client). The client 410 may send a network request for obtaining a video template to the server 200 through the network 300. For example, the client 410 may send the request only upon receiving an obtaining instruction triggered by the user; alternatively, the client 410 may send the request in advance, obtain the video template ahead of time, and store it locally in the terminal 400 to reduce the number of interactions. The server 200 then sends the video template to the client 410. The client 410 may display a template video (i.e., a video formed after filling video material into the video template) in response to a video filling operation for the video template, and display at least one audio filling area on the time axis of the template video. The client 410 may then, in response to an audio setting operation, obtain the audio material to be filled into the at least one audio filling area (for example, audio recorded by the user in real time or audio stored in the terminal 400 in advance). Finally, the client 410 may, in response to an audio filling operation for the template video, display the template video whose audio filling areas are filled with the audio material. In this way, audio can be added directly to the template video, which improves video editing efficiency and meets the user's need to express personalized ideas and viewpoints through audio.
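The idea of obtaining a template in advance and storing it locally to reduce the number of client-server interactions is an ordinary fetch-once cache. A minimal sketch (hypothetical names; the application does not describe a concrete cache implementation):

```python
class TemplateCache:
    """Illustrative local template store: fetch each template from the
    server at most once and reuse the stored copy afterwards, reducing
    the number of network interactions between client and server."""

    def __init__(self, fetch):
        self._fetch = fetch   # callable: template_id -> template data (network call)
        self._store = {}      # local copies, keyed by template id
        self.misses = 0       # how many times the server was actually contacted

    def get(self, template_id):
        if template_id not in self._store:
            self.misses += 1
            self._store[template_id] = self._fetch(template_id)
        return self._store[template_id]
```

Repeated `get` calls for the same template hit the local store, so only the first call reaches the server.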
In some embodiments, the embodiments of the present application may be implemented by Cloud Technology (Cloud Technology), which refers to a hosting Technology for unifying resources of hardware, software, network, and the like in a wide area network or a local area network to implement computation, storage, processing, and sharing of data.
Cloud technology is a general term for network technology, information technology, integration technology, management platform technology, application technology, and the like applied on the basis of the cloud computing business model. It can form a resource pool that is used on demand and is flexible and convenient, and cloud computing technology will become an important support for it. For example, the service interaction between the server 200 and the terminal 400 may be implemented by cloud technology.
For example, the server 200 shown in fig. 1 may be an independent physical server, may also be a server cluster or a distributed system formed by a plurality of physical servers, and may also be a cloud server providing basic cloud computing services such as a cloud service, a cloud database, cloud computing, a cloud function, cloud storage, a network service, cloud communication, a middleware service, a domain name service, a security service, a CDN, and a big data and artificial intelligence platform. The terminal 400 may be, but is not limited to, a smart phone, a tablet computer, a notebook computer, a desktop computer, a smart speaker, a smart watch, and the like. The terminal 400 and the server 200 may be directly or indirectly connected through wired or wireless communication, and the embodiment of the present application is not limited thereto.
In other embodiments, the terminal 400 may also implement the video editing method provided in the embodiments of the present application by running a computer program, where the computer program may be the client 410 shown in fig. 1. For example, the computer program may be a native program or a software module in an operating system; it may be a local (Native) application (APP), that is, a program that must be installed in the operating system to run, such as a video editing APP or a video playing APP with an integrated video editing function; it may be an applet, that is, a program that only needs to be downloaded into a browser environment to run; or it may be an applet that can be embedded into any APP, where the applet can be run or closed under user control. In general, the computer program may be an application, module, or plug-in of any form.
The structure of the terminal 400 shown in fig. 1 is explained below. Referring to fig. 2, fig. 2 is a schematic structural diagram of a terminal 400 provided in an embodiment of the present application, where the terminal 400 shown in fig. 2 includes: at least one processor 420, memory 460, at least one network interface 430, and a user interface 440. The various components in the terminal 400 are coupled together by a bus system 450. It is understood that the bus system 450 is used to enable connected communication between these components. The bus system 450 includes a power bus, a control bus, and a status signal bus in addition to a data bus. For clarity of illustration, however, the various buses are labeled as bus system 450 in fig. 2.
The processor 420 may be an integrated circuit chip having signal processing capabilities, such as a general-purpose processor, a Digital Signal Processor (DSP), another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, where the general-purpose processor may be a microprocessor or any conventional processor.
The user interface 440 includes one or more output devices 441, including one or more speakers and/or one or more visual display screens, that enable the presentation of media content. The user interface 440 also includes one or more input devices 442 including user interface components that facilitate user input, such as a keyboard, mouse, microphone, touch screen display screen, camera, other input buttons and controls.
The memory 460 may be removable, non-removable, or a combination thereof. Exemplary hardware devices include solid state memory, hard disk drives, optical disk drives, and the like. Memory 460 may optionally include one or more storage devices physically located remote from processor 420.
The memory 460 may include volatile memory or nonvolatile memory, and may also include both volatile and nonvolatile memory. The nonvolatile Memory may be a Read Only Memory (ROM), and the volatile Memory may be a Random Access Memory (RAM). The memory 460 described in embodiments herein is intended to comprise any suitable type of memory.
In some embodiments, memory 460 may be capable of storing data to support various operations, examples of which include programs, modules, and data structures, or subsets or supersets thereof, as exemplified below.
an operating system 461, including system programs for handling various basic system services and performing hardware-related tasks, such as a framework layer, a core library layer, and a driver layer;
a network communication module 462 for reaching other computing devices via one or more (wired or wireless) network interfaces 430; exemplary network interfaces 430 include Bluetooth, Wireless Fidelity (WiFi), Universal Serial Bus (USB), and the like;
a presentation module 463 for enabling presentation of information (e.g., a user interface for operating peripherals and displaying content and information) via one or more output devices 441 (e.g., a display screen, speakers, etc.) associated with the user interface 440;
an input processing module 464 for detecting one or more user inputs or interactions from one of the one or more input devices 442 and translating the detected inputs or interactions.
In some embodiments, the video editing apparatus provided in the embodiments of the present application may be implemented in software. Fig. 2 shows a video editing apparatus 465 stored in the memory 460, which may be software in the form of programs and plug-ins and includes the following software modules: display module 4651, acquisition module 4652, acquisition module 4653, determination module 4654, deletion module 4655, replacement module 4656, population module 4657, cropping module 4658 and module 4659. These modules are logical, and therefore may be arbitrarily combined or further split depending on the functions implemented. It should be noted that all the modules are shown at once in fig. 2 for convenience of description, but this should not be taken to exclude implementations in which the video editing apparatus 465 includes only the display module 4651 and the acquisition module 4652. The functions of the respective modules will be described below.
In other embodiments, the video editing apparatus provided in the embodiments of the present application may be implemented in hardware. As an example, the video editing apparatus provided in the embodiments of the present application may be a processor in the form of a hardware decoding processor programmed to execute the video editing method provided in the embodiments of the present application; for example, the processor in the form of a hardware decoding processor may employ one or more Application-Specific Integrated Circuits (ASICs), DSPs, Programmable Logic Devices (PLDs), Complex Programmable Logic Devices (CPLDs), Field-Programmable Gate Arrays (FPGAs), or other electronic components.
The video editing method provided by the embodiment of the present application will be described below with reference to an exemplary application and implementation of the terminal device provided by the embodiment of the present application. For example, referring to fig. 3, fig. 3 is a flowchart illustrating a video editing method provided in an embodiment of the present application, and will be described with reference to the steps shown in fig. 3.
It should be noted that the method shown in fig. 3 can be executed by various forms of computer programs running on the terminal 400 shown in fig. 1, and is not limited to the client 410 described above, and may also be the operating system 461, the software modules and the scripts described above, so that the client described below should not be considered as limiting the embodiments of the present application.
In step S101, in response to a video fill trigger operation for a video template, a first video is displayed.
Here, the first video is formed after filling at least one video material in the video template.
In some embodiments, the user may first select the video template and then select the video material, after which the selected video material is filled into the selected video template. Before step S101 shown in fig. 3 is executed, the following processing may also be performed: displaying at least one segment to be filled included in a video template, together with a video material selection control for selecting among a plurality of candidate video materials; in response to a video material selection operation through the video material selection control, highlighting the selected at least one video material (e.g., displaying it at higher brightness); and in response to the video filling trigger operation for the video template, filling the selected at least one video material into the corresponding segment to be filled of the video template.
In other embodiments, taking the above example into account, the at least one to-be-filled segment included in the video template may be displayed in the following manner: displaying a plurality of candidate video templates; in response to a video template selection operation, displaying introduction information of a selected video template of a plurality of candidate video templates and a use entry; in response to a triggering operation of the use entrance aiming at the selected video template, displaying at least one segment to be filled included in the selected video template; and displaying introduction information of the segment to be filled, wherein the introduction information is used for representing the type of the video material to be filled in the segment to be filled.
In other embodiments, the user may also select a video material first, then select a video template, and then fill the video material selected by the user into the selected video template, before executing step S101 shown in fig. 3, the following processing may also be executed: displaying a plurality of candidate video materials; in response to a video material selection operation, highlighting the selected at least one video material and displaying a plurality of candidate video templates that match the selected at least one video material (e.g., candidate video templates that match the number or type of video materials); and responding to the video template selection operation, and filling the selected at least one video material into the to-be-filled segment corresponding to the selected video template.
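One possible reading of "candidate video templates that match the number or type of video materials" is to keep only the templates whose count of segments to be filled equals the count of selected materials. The sketch below illustrates that reading; the `matching_templates` function and the dict shape of a template are assumptions for illustration only.

```python
def matching_templates(candidate_templates, selected_materials):
    """Keep the candidate templates whose number of segments to be
    filled ("slots", an illustrative key) equals the number of video
    materials the user has selected."""
    n = len(selected_materials)
    return [t for t in candidate_templates if len(t["slots"]) == n]
```

Matching by type (e.g., picture vs. video) could be layered on top of the same filter in the same way.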
The following description will be given by taking an example in which a user selects a video template first and then selects a video material.
For example, referring to fig. 4, fig. 4 is a schematic view of an application scenario of the video editing method provided in an embodiment of the present application. As shown in fig. 4, a page 401 displays a plurality of candidate video templates, including a video template 402, a video template 403, a video template 404, and a video template 405. When a user click operation on the video template 403 displayed in the page 401 is received, the client jumps to a detail page 406 of the video template 403, where introduction information 407 and a use entry 408 of the video template 403 are displayed. When a user click operation on the use entry 408 displayed in the detail page 406 is received, a video material selection control 409 pops up, displaying a plurality of candidate video materials (e.g., pictures and videos in the local album of the user terminal). When a user click operation on the video material 410 displayed in the video material selection control 409 is received, the selected video material 410 is highlighted, for example, displayed at higher brightness. In addition, taking the to-be-filled segment 411 as an example, corresponding introduction information 412 can be displayed below the to-be-filled segment 411, for example, the type of video material to be filled into the to-be-filled segment 411, so that the user can more accurately upload his own video material when using the video template, improving video editing efficiency.
In some embodiments, after populating the video template with the selected at least one video material, the following process may also be performed: displaying an alternative video portal; in response to a triggering operation for the replacement video entry, displaying a video selection control, wherein the video selection control comprises a plurality of candidate video materials; in response to a video material selection operation for use in the video selection control, replacing the original video material with the at least one video material reselected from the plurality of candidate video materials.
In still other embodiments, after populating the video template with the selected at least one video material, the following process may also be performed: displaying a video clipping entry; responding to the triggering operation aiming at the video clipping inlet, and displaying a video clipping control, wherein the video clipping control comprises at least one of a picture clipping control and a time period selection control; cutting pictures except the set cutting frame from the video material in response to the cutting frame setting operation based on the picture cutting control; in response to a time setting operation of the time period-based selection control, video clips outside the set time period are cut out from the video material.
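The two cropping controls described above (a crop frame that cuts away picture outside the frame, and a time period selector that cuts away video outside the period) can be sketched as follows. The `crop_material` function and its parameter shapes are illustrative assumptions; a real client would apply the resulting rectangle and time range to the media frames themselves.

```python
def crop_material(duration, width, height, crop_box=None, time_range=None):
    """Return what survives the two cropping controls.
    crop_box = (x, y, w, h): only the picture inside the frame is kept.
    time_range = (start, end): only the clip inside the period is kept.
    Both are clamped to the material's actual bounds."""
    x, y, w, h = crop_box if crop_box else (0, 0, width, height)
    x, y = max(0, x), max(0, y)
    w, h = min(w, width - x), min(h, height - y)   # clamp frame to picture
    start, end = time_range if time_range else (0.0, duration)
    start, end = max(0.0, start), min(duration, end)  # clamp to playing time
    return {"picture": (x, y, w, h), "clip": (start, end)}
```

For a 10-second 1920x1080 material, an oversized crop frame is clamped to the picture, and a time range ending past the material's duration is clamped to it.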
In some embodiments, after populating the video template with the selected at least one video material, the following process may also be performed: responding to the video material selection operation, and displaying that the selected target video material in at least one video material is in an editing state; displaying a volume adjustment inlet, and responding to the trigger operation aiming at the volume adjustment inlet to display a volume adjustment control; and determining the set volume as the volume of the target video material in response to the setting operation for the volume adjusting control.
It should be noted that, in practical applications, each of the video editing entries (i.e., the replacement video entry, the video clipping entry, and the volume adjustment entry) may be displayed and used alternatively or in combination, for example, each of the video editing entries may be displayed separately, or a unified video editing entry may be displayed first, and each specific type of video editing entry is displayed after the unified video editing entry is triggered.
The following description will be given by taking an example of displaying and using the above-described respective video editing portals in combination.
For example, referring to fig. 5, fig. 5 is a schematic view of an application scene of a video editing method provided in an embodiment of the present application, as shown in fig. 5, when a user is not satisfied with a video material of an imported video template, fine tuning of the video material may be performed on a pop-up floating window by clicking the corresponding video material, the fine tuning may include replacing the video material, clipping the video material, adjusting the volume of the video material, and the like, for example, when the user is not satisfied with the video material 501 of the imported video template, the video material 501 may be selected, at which time the video material 501 may be highlighted (e.g., highlighted), indicating that the video material 501 is in an editing state, and the floating window may be popped up above the video material 501, where a replacement video entry 502, a video clipping entry 503, and a volume adjustment entry 504 are displayed in the floating window, and when a user click operation on the replacement video entry 502 is received, a video selection control 505 is popped up, and the user may reselect a video material that needs to be used to replace the video material 501 at the pop-up video selection control 505; when the clicking operation of the user on the video clipping entry 503 is received, the user jumps to the page 506, a picture clipping control 507 and a time period selection control 508 are displayed in the page 506, the user can adjust the picture size of the video material 501 by adjusting the clipping frame of the picture clipping control 507, and simultaneously can clip the required video clip from the video material 501 by setting the starting time and the ending time of the time period selection control 508; when a click operation of the user on the volume adjustment entry 504 is received, the user jumps to the page 509, a volume adjustment control 510 is displayed in the page 509, and 
the user can set the volume adjustment control 510, for example, by dragging an adjustment button on the volume adjustment control 510 to adjust the volume of the video material 501.
In step S102, at least one audio stuffing area is displayed on the time axis of the first video.
In some embodiments, the at least one audio filling area may be displayed on the time axis of the first video as follows: displaying an audio setting entry corresponding to the first video (for example, the audio setting entry may be a recording entry for acquiring audio recorded by the user in real time, or an audio material selection entry for selecting among a plurality of audio materials pre-stored on the terminal); and in response to a triggering operation for the audio setting entry, displaying the time axis of the first video and displaying at least one audio filling area on the time axis, where an audio filling area indicates a period on the time axis that can be filled with audio material.
In other embodiments, before displaying the at least one audio fill area on the timeline, the following process may also be performed: obtaining at least one audio fill region by at least one of: acquiring at least one preset audio filling area from a video template; dividing at least one audio filling area from a time axis according to the playing time of a video material filled in a first video; and dividing the time axis according to the plot units of the first video, and determining an audio filling area corresponding to each plot unit.
For example, the audio filling area may be preset in the video template, and other areas may not be filled with the audio material, for example, when a producer of the video template makes the video template, at least one audio filling area may be preset in the video template, for example, assuming that the playing time of the video template is 20 seconds, the template producer may set the time period from 5 th to 10 th seconds as one audio filling area, that is, the time period from 5 th to 10 th seconds of the video template is a time period that allows the user to fill the audio material, for example, the user may fill the audio recorded in real time to the time period from 5 th to 10 th seconds of the video template, so by setting the audio filling area in the video template in advance, the operation burden of the user is reduced, and the efficiency of video production is improved.
For example, the audio filling areas may be formed by dividing the time axis according to the video materials filled in the first video, with each video material corresponding to one audio filling area. For example, assuming that 3 video materials, namely video material 1, video material 2, and video material 3, are filled in the first video, and the playing time of video material 1 is from the 3rd second to the 5th second, the playing time of video material 2 is from the 7th second to the 9th second, and the playing time of video material 3 is from the 11th second to the 13th second, then 3 corresponding audio filling areas may be divided from the time axis of the first video according to the playing times of the 3 video materials: the 1st audio filling area may be the period from the 3rd second to the 5th second, the 2nd audio filling area may be the period from the 7th second to the 9th second, and the 3rd audio filling area may be the period from the 11th second to the 13th second. In this way, the number of audio filling areas corresponds to the number of video materials, that is, the user can express his own view on each video material, improving the user experience.
For example, the audio filling areas may be formed by dividing the time axis according to the plot units of the first video, with each plot unit corresponding to one audio filling area. For example, a plot unit identification model is first invoked to perform plot unit identification on the first video to obtain the plot units it includes. Assuming that the plot unit identification model divides the first video into 2 plot units, where the playing time corresponding to the 1st plot unit is from the 1st second to the 10th second and the playing time corresponding to the 2nd plot unit is from the 11th second to the 20th second, a corresponding audio filling area may be determined for each of the 2 plot units: for example, the period from the 2nd second to the 8th second may be used as the audio filling area corresponding to the 1st plot unit, and the period from the 13th second to the 17th second may be used as the audio filling area corresponding to the 2nd plot unit. In this way, the number of audio filling areas corresponds to the number of plot units of the first video, that is, the user can express his own view on each plot unit, improving the user experience.
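The second and third division strategies above can be sketched as follows. The function names and the dict/tuple shapes are illustrative; in particular, deriving a plot unit's filling area by insetting a fixed margin from the unit's boundaries is an assumption (the patent's example uses varying insets), not the patent's rule.

```python
def regions_from_materials(materials):
    """Strategy 2: one audio filling area per filled video material,
    taken directly from each material's playing period on the timeline."""
    return [(m["start"], m["end"]) for m in materials]

def regions_from_plot_units(units, margin=1.0):
    """Strategy 3: one audio filling area per plot unit; a margin (an
    illustrative assumption) keeps each area strictly inside its unit."""
    out = []
    for start, end in units:
        if end - start > 2 * margin:           # skip units too short to inset
            out.append((start + margin, end - margin))
    return out
```

With the patent's numbers, three materials playing at 3-5 s, 7-9 s, and 11-13 s yield three matching filling areas, and two plot units at 1-10 s and 11-20 s each yield one inset filling area.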
In other embodiments, the user may manually adjust the number of audio stuffing areas displayed on the time axis of the first video and the corresponding start time and end time of each audio stuffing area, that is, the terminal may further perform the following processes: displaying that the selected target audio fill area is in an edited state (e.g., highlighted manner) in response to the audio fill area selection operation; updating to display the target audio stuffing region based on the start time and the end time set on the time axis in response to a time setting operation for the target audio stuffing region; displaying a time period selection control (including a start point and an end point which can move on the time axis) on the time axis in response to the audio filling region adding operation; in response to a setting operation (including the set start time and end time) of the time period-based selection control, the newly added audio fill area is displayed.
For example, referring to fig. 6, fig. 6 is a schematic view of an application scenario of the video editing method provided in an embodiment of the present application. As shown in fig. 6, two audio filling areas, namely an audio filling area 602 and an audio filling area 603, are displayed on the time axis 601 of the first video. When the user selects the audio filling area 603, it is highlighted to indicate that it is currently in the editing state. The user may then set the start time and end time corresponding to the audio filling area 603. For example, assuming the initial start time and end time of the audio filling area 603 are the 5th second and the 10th second respectively, after the user sets a new start time and end time on the time axis 601, as shown in the right diagram of fig. 6, the start time and end time of the audio filling area 603 are updated to the 8th second and the 11th second.
For example, referring to fig. 7, fig. 7 is a schematic view of an application scenario of a video editing method provided in an embodiment of the present application, as shown in fig. 7, only an audio stuffing area 702 and a time period selection control 703 are displayed on a time axis 701 of a first video, when a user needs to add an audio stuffing area on the time axis 701 of the first video, a start time and an end time of the audio stuffing area that needs to be added may be set by using the time period selection control 703, and after the user sets the start time and the end time, a newly added audio stuffing area, for example, the newly added audio stuffing area 704 shown in fig. 7, may be displayed on the time axis 701 of the first video.
In step S103, in response to an audio setting operation, audio material to be filled in at least one audio filling area is acquired.
In some embodiments, when the audio setting entry is a recording entry, step S103 may be implemented by: responding to the audio filling area selection operation, and displaying that the selected target audio filling area is in an editing state; displaying a recording control corresponding to the target audio fill area (e.g., displaying the recording control in the target audio fill area); in response to a first audio setting operation of starting (e.g., triggering for the first time) the recording control, starting to acquire audio; stopping capturing audio in response to a second audio setting operation that closes (e.g., re-triggers) the recording control; the collected audio is used as an audio material to be filled in the target audio filling area, so that a user can record own sound through the recording inlet in the process of manufacturing a video by using the video template, and the recorded audio is filled in the audio filling area, and personalized and efficient video editing can be realized through the video filling and audio filling functions matched with each other.
For example, referring to fig. 8, fig. 8 is a schematic view of an application scenario of a video editing method provided in an embodiment of the present application, as shown in fig. 8, two audio filling regions, namely an audio filling region 802 and an audio filling region 803, are displayed on a time axis 801 of a first video, when a user selects an audio filling region 803, the audio filling region 803 is highlighted (i.e., the brightness is higher than that of the other audio filling regions) to indicate that the audio filling region 803 is currently in an editing state, and a recording control 804 corresponding to the audio filling region 803 is displayed at the same time, at this time, the user may click the recording control 804 to record, and a terminal uses collected audio as audio material to be filled in the audio filling region 803.
In other embodiments, when the audio setting entry is an audio material selection entry, then step S103 may be implemented by: responding to the audio filling area selection operation, and displaying that the selected target audio filling area is in an editing state; displaying an audio material selection control, wherein the audio material selection control is used for selecting a plurality of candidate audio materials; in response to the audio setting operation selected through the audio material selection control, at least one selected audio material in the candidate audio materials is used as the audio material to be filled in the target audio filling area, so that a user can quickly select the audio material to be used from the candidate audio materials stored in the terminal equipment locally in advance through the audio material selection entrance, and the video editing efficiency is improved.
For example, referring to fig. 9, fig. 9 is a schematic view of an application scenario of a video editing method provided in an embodiment of the present application, as shown in fig. 9, two audio stuffing areas, namely an audio stuffing area 902 and an audio stuffing area 903, are displayed on a time axis 901 of a first video, when a user's selection operation on the audio stuffing area 903 is received, the audio stuffing area 903 is highlighted in a highlighting manner to indicate that the audio stuffing area 903 is currently in an editing state, an audio material selection control 904 is displayed in a pop-up manner, a plurality of candidate audio materials (e.g., audio materials stored locally in the terminal in advance, including recording 1 to recording 12) are displayed in the audio material selection control 904, at this time, the user may select from the plurality of candidate audio materials displayed by the audio material selection control 904, for example, when a user's click operation on the audio material 905 is received, the audio material 905 is highlighted to indicate that the audio material is to be stuffed into the audio stuffing area 903.
It should be noted that, in practical applications, when a user selects a plurality of audio materials at one time in the audio material selection control, the plurality of audio materials may be sequentially used as audio materials to be filled in each of the audio filling regions displayed on the time axis of the first video, for example, assuming that the user selects the audio material 1, the audio material 2, and the audio material 3 at one time in the audio material selection control, the audio material 1 will be filled into the audio filling region 1 displayed on the time axis of the first video, the audio material 2 will be filled into the audio filling region 2 displayed on the time axis of the first video, and the audio material 3 will be filled into the audio filling region 3 displayed on the time axis of the first video.
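The in-order batch assignment described above can be sketched in one line; the `assign_materials` name and the mapping shape are illustrative assumptions.

```python
def assign_materials(fill_regions, selected_materials):
    """Materials selected in one batch are mapped, in order, onto the
    audio filling areas displayed on the time axis; if the counts differ,
    the extras on either side are simply left unassigned."""
    return dict(zip(fill_regions, selected_materials))
```

For example, three materials selected at once land in the first, second, and third filling areas respectively, matching the behaviour described in the paragraph above.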
In some embodiments, the format of the audio data collected by the audio collection device may not match the format required by the client, or the sampling frequency of the collected audio may not match the sampling frequency used by the client for playback. After obtaining the audio material to be filled into the at least one audio filling area, the following processing may therefore also be performed: converting the data format of the audio material into a data format that conforms to the first video; and converting the sampling frequency of the audio material into a sampling frequency that conforms to the first video. Through this conversion of data format and sampling frequency, the converted audio material meets the client's requirements and can be smoothly filled into the audio filling area displayed on the time axis of the first video.
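The sampling frequency conversion mentioned above can be illustrated with a minimal linear resampler. This is only a sketch of the idea: a production client would use a properly filtered resampler (to avoid aliasing) from its media framework, and the `resample` function below is an assumption for illustration.

```python
def resample(samples, src_rate, dst_rate):
    """Convert a PCM sample list from src_rate to dst_rate by linear
    interpolation between neighbouring source samples."""
    if src_rate == dst_rate:
        return list(samples)
    n_out = int(len(samples) * dst_rate / src_rate)
    out = []
    for i in range(n_out):
        pos = i * src_rate / dst_rate          # position in the source signal
        j = int(pos)
        frac = pos - j
        nxt = samples[min(j + 1, len(samples) - 1)]
        out.append(samples[j] * (1 - frac) + nxt * frac)
    return out
```

Downsampling a constant signal leaves it unchanged, and upsampling doubles the number of samples while interpolating the intermediate values.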
In step S104, in response to the audio fill trigger operation for the first video, the second video is displayed in place of the first video.
Here, the second video is formed after at least one audio stuffing area of the first video is filled with the corresponding audio material.
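The formation of the second video's audio track described here (each audio material placed at its filling area's position on the time axis) can be sketched as follows. Using one sample per second and the hypothetical `fill_audio` helper are illustrative assumptions; a real client would mix at the actual sample rate and mux the result with the video stream.

```python
def fill_audio(timeline_len, regions_with_audio):
    """Build the second video's audio track: each audio material is mixed
    in starting at its filling area's start offset, and truncated at the
    filling area's end (one sample per time unit, for illustration)."""
    track = [0.0] * timeline_len
    for (start, end), material in regions_with_audio:
        for i, sample in enumerate(material):
            pos = start + i
            if pos >= end or pos >= timeline_len:
                break              # audio does not spill past the filling area
            track[pos] += sample
    return track
```

A material filled into the area from the 2nd to the 5th time unit occupies exactly those positions on the track, leaving the rest of the time axis untouched.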
In some embodiments, when displaying the second video in place of the first video, the following process may also be performed: responding to the audio filling area selection operation, and displaying that the selected target audio filling area is in an editing state; displaying an audio re-recording entry (e.g., the audio re-recording entry may be displayed in the target audio fill area); responding to the triggering operation aiming at the audio re-recording inlet, and displaying a recording control for re-acquisition; starting to acquire audio in response to a third audio setting operation for starting the recording control; stopping collecting the audio in response to a fourth audio setting operation of closing the recording control; and taking the newly acquired audio as the audio material to be filled in the target audio filling area, so that when the user is not satisfied with the original audio material filled in the target audio filling area, the user can record again through the audio re-recording inlet, and replace the original audio material with the newly recorded audio.
In still other embodiments, when displaying the second video in place of the first video, the following process may be further performed: responding to the audio filling area selection operation, and displaying that the selected target audio filling area is in an editing state; displaying an audio deletion entry; and in response to the triggering operation aiming at the audio deleting entry, deleting the audio material corresponding to the target audio filling area, so that when the user is not satisfied with the audio material filled in a certain audio filling area, the audio material can be deleted quickly through the audio deleting entry corresponding to the audio filling area.
In some embodiments, when displaying the second video in place of the first video, the following process may also be performed: responding to the audio filling area selection operation, and displaying that the selected target audio filling area is in an editing state; displaying a volume adjustment entry; displaying a volume adjustment control in response to a trigger operation for the volume adjustment entry; and in response to the setting operation of the volume adjusting control, determining the set volume as the volume of the audio material corresponding to the target audio filling area, so that when the volume of the audio material filled in a certain audio filling area is not satisfied by a user, the volume of the audio material can be quickly adjusted through the volume adjusting inlet.
In still other embodiments, while the second video is displayed in place of the first video, the following process may also be performed: in response to an audio filling area selection operation, displaying the selected target audio filling area in an editing state; displaying a voice-change entry; displaying a plurality of candidate voice objects in response to a trigger operation on the voice-change entry; and in response to a voice object selection operation, replacing the initial voice object of the audio material corresponding to the target audio filling area with the target voice object selected from the candidate voice objects. The user can thus change the initial voice object of the audio material (i.e., its original timbre) according to preference, changing the voice into the timbre of another character and meeting the user's personalization needs.
It should be noted that, in practical applications, the audio editing entries (i.e., the audio re-recording entry, the audio deletion entry, the volume adjustment entry, and the voice-change entry) may be displayed and used individually or in combination. For example, each audio editing entry may be displayed separately, or a unified audio editing entry may be displayed first, with the specific audio editing entries shown after the unified entry is triggered.
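The four audio editing entries above can be summarized as operations on a per-area state. The following Python sketch is a hypothetical data model (the class and field names are illustrative, not from the application) showing how a client might dispatch the re-record, delete, volume-adjust, and voice-change entries against a single audio filling area:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class AudioFillArea:
    """One fillable span on the first video's time axis (hypothetical model)."""
    start: float                      # seconds
    end: float
    material: Optional[bytes] = None  # filled audio material, if any
    volume: int = 100                 # 0-200, matching the described adjustable range
    voice: str = "original"           # current voice object of the material

class AudioEditor:
    """Dispatches the four audio editing entries described above."""
    def __init__(self, area: AudioFillArea):
        self.area = area

    def rerecord(self, new_audio: bytes) -> None:
        # audio re-recording entry: replace the original material
        self.area.material = new_audio

    def delete(self) -> None:
        # audio deletion entry: clear the fill area
        self.area.material = None

    def set_volume(self, volume: int) -> None:
        # volume adjustment entry: 0-200 per the described range
        if not 0 <= volume <= 200:
            raise ValueError("volume must be in 0-200")
        self.area.volume = volume

    def change_voice(self, target: str) -> None:
        # voice-change entry: swap the material's voice object
        self.area.voice = target
```

Modeling the entries as methods on one editor object mirrors the "unified entry first, specific entries after" interaction: the floating window owns one `AudioEditor` for the currently selected area.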
The following description takes the combined display and use of the above audio editing entries as an example.
For example, referring to fig. 10, fig. 10 is a schematic view of an application scenario of the video editing method provided in an embodiment of the present application. As shown in fig. 10, when the user is not satisfied with an imported audio material, the user may click the corresponding audio filling area and fine-tune the audio material on a pop-up floating window, where the fine-tuning may include deleting the audio material, adjusting its volume, changing its voice object, and the like. For example, when the user is not satisfied with the audio material filled into the audio filling area 1001, the audio filling area 1001 may be selected and highlighted to indicate that it is currently in an editing state; a floating window 1002 then pops up above the audio filling area 1001, displaying an audio re-recording entry 1003, a volume adjustment entry 1004, an audio deletion entry 1005, and a voice-change entry 1006. When a click operation on the audio re-recording entry 1003 is received, the recording control 1007 is displayed, and the user can re-record by clicking the recording control 1007 and replace the original audio material corresponding to the audio filling area 1001 with the newly captured audio. When a click operation on the volume adjustment entry 1004 is received, the volume adjustment control 1008 is displayed; the user may then operate the control, for example by sliding its adjustment button, and the adjusted volume is used as the volume of the audio material corresponding to the audio filling area 1001. When a click operation on the audio deletion entry 1005 is received, the audio material corresponding to the audio filling area 1001 is deleted. When a click operation on the voice-change entry 1006 is received, the candidate voice objects 1009 are displayed, for example an uncle voice, a loli voice, an alien voice, and a doll voice; the user may then select from the candidate voice objects 1009, replacing the initial voice object of the audio material corresponding to the audio filling area 1001 with the selected target voice object. For example, when the user selects the uncle voice, the initial voice object of the audio material corresponding to the audio filling area 1001 is replaced with the uncle voice.
In still other embodiments, while the second video is displayed in place of the first video, the following process may further be performed: in response to an audio material selection operation, displaying the selected target audio material in an editing state; displaying a text recognition entry; and in response to a trigger operation on the text recognition entry, displaying a speech recognition result corresponding to the target audio material and using the speech recognition result as the subtitle of the time segment of the second video filled with the target audio material, where the target audio material is the audio material in the editing state among the audio materials to be filled.
For example, referring to fig. 11, fig. 11 is a schematic view of an application scenario of the video editing method provided in an embodiment of the present application. As shown in fig. 11, when a user needs subtitle text for the audio material filled into the audio filling area 1101, the audio filling area 1101 may first be selected; the audio filling area 1101 may then be highlighted to indicate that the audio material filled into it is in an editing state. When the user clicks the text recognition entry 1102, speech recognition is performed on the audio material to obtain the speech recognition result 1103 of the audio material filled into the audio filling area 1101, and the corresponding subtitle 1104 is displayed in the video. The subtitle 1104 may be a soft subtitle (also called a closed caption or subtitle stream: the subtitle data is carried in the video stream as a separate, separable part) or a hard subtitle (burned into the video frames: the subtitle and the picture are compressed into the same data and, like a watermark, cannot be separated).
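The text recognition step boils down to attaching the recognition result to exactly the time segment the audio material fills. A minimal sketch, assuming a hypothetical `recognize` call standing in for the client's actual speech recognition service (the names here are illustrative, not from the application):

```python
from dataclasses import dataclass

@dataclass
class SubtitleCue:
    start: float  # seconds, inherited from the audio fill area
    end: float
    text: str

def recognize(audio: bytes) -> str:
    """Stand-in for the client's speech recognition call (assumed API)."""
    return "hello world"  # a real client would return the actual transcript

def make_subtitle(area_start: float, area_end: float, audio: bytes) -> SubtitleCue:
    # The recognition result becomes the subtitle of precisely the time
    # segment that the target audio material fills in the second video.
    return SubtitleCue(area_start, area_end, recognize(audio))
```

Keeping the cue's start/end identical to the fill area's bounds is what keeps the displayed subtitle synchronized with the recorded speech.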
It should be noted that, when the first video itself has audio, after audio material is filled into an audio filling area displayed on the time axis of the first video, the first video's own audio may be replaced by the filled material (that is, only the filled audio material is played and the first video's own audio is muted). Alternatively, the original audio of the first video and the filled audio material may coexist, for example being played through two channels. The embodiments of the present application do not specifically limit this.
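The two strategies above (replace vs. coexist) can be illustrated on raw sample buffers. This is an assumed sketch over normalized float samples, not the application's actual mixer:

```python
def combine(original: list, filled: list, mode: str = "replace") -> list:
    """Combine the first video's own track with the filled audio material.

    mode="replace": the filled material mutes the original where it exists;
    mode="mix":     both play together (summed samples, clipped to [-1, 1]).
    """
    n = max(len(original), len(filled))
    orig = original + [0.0] * (n - len(original))  # zero-pad the shorter track
    fill = filled + [0.0] * (n - len(filled))
    if mode == "replace":
        # within the filled span, only the filled material is audible
        return [fill[i] if i < len(filled) else orig[i] for i in range(n)]
    # "mix": sum both tracks and clip to the valid sample range
    return [max(-1.0, min(1.0, o + f)) for o, f in zip(orig, fill)]
```

The "replace" branch corresponds to muting the first video's own audio over the filled segment; the "mix" branch corresponds to playing both, as the two-channel example suggests.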
According to the video editing method provided by the embodiment of the present application, after the first video is obtained by filling a video template with video material, audio material can further be filled into the audio filling areas of the first video to obtain the second video. This makes it possible to add audio directly while producing a video from a video template, improving video editing efficiency while meeting the user's need to express personalized ideas and viewpoints in audio during template-based video production.
Next, an exemplary application of the embodiment of the present application in a practical application scenario will be described.
The embodiment of the present application provides a video editing method that edits video based on templates and creates a recording scenario: a user can click a recording button in any provided video template to record, using his or her own voice to express viewpoints and ideas about the produced video, which increases the diversity of the video. In addition, in this scenario, the user can change his or her voice to imitate the voices of different characters, and speech recognition can be performed on the recorded audio to generate corresponding subtitles.
In the embodiment of the present application, a user can record within the recordable time range of a video template by clicking the recording button in the template. This improves the user experience, makes viewpoint expression in the video more personal and quicker to operate, and promotes diversified video production.
For example, referring to fig. 4, fig. 4 is a schematic view of an application scenario of the video editing method provided in an embodiment of the present application. As shown in fig. 4, after the user opens the client, a plurality of candidate video templates are displayed on its home page. The user selects the video template to produce and, after clicking to use it (for example, clicking a "Use now" button), may select pictures/videos from the local album of the terminal (e.g., a mobile phone) to import into the selected video template.
For example, referring to fig. 5, fig. 5 is a schematic view of an application scenario of the video editing method provided in an embodiment of the present application. As shown in fig. 5, after the user imports pictures/videos into the video template, if the user is not satisfied with an imported picture/video, the user may click it and fine-tune it on a pop-up floating window. The fine-tuning includes replacing the video (removing the picture/video just imported and importing a newly selected one into the video template), cropping the video (adjusting the picture size and content of the imported picture/video, with support both for cutting out a video segment of a given duration and for cropping the picture/video to the template's frame size), and adjusting the video volume (adjusting the volume of the picture/video imported into the template). If the user is satisfied with the imported pictures/videos, this step can be skipped.
For example, referring to fig. 12, fig. 12 is a schematic view of an application scenario of the video editing method provided in an embodiment of the present application. As shown in fig. 12, when the user wants to express his or her own ideas about the template video in his or her own voice, the user may directly click the recording entry 1201 displayed on the video template to open the recording function (i.e., when a click operation on the recording entry 1201 is received, the recording button 1202 is displayed). A black box area 1203 is further displayed on the time axis of the video template; the black box area 1203 is the range of the video template available for recording, and the portion outside it cannot be used for recording. The user clicks the recording button 1202 to start recording; during recording, a ripple 1205 is displayed around the recording button 1202 to reflect the loudness of the user's voice in real time. In addition, during recording, a gray mask 1204 is overlaid within the black box area 1203: the masked portion represents what the user has already recorded, and the portion not covered by the gray mask 1204 represents what has not yet been recorded.
When the user needs to pause the recording, the recording button 1202 may be clicked again to pause. At this point, the user may directly click the confirm button to end the recording, or click the recording button 1202 again to continue recording the unfinished part.
After recording is completed, the client automatically processes the recording and displays the real-time processing progress 1206. After processing, the user can click the text recognition entry 1207 to obtain the recognition result 1208 produced by performing speech recognition on the recording, and the corresponding subtitle text 1209 is displayed in the video template.
For example, referring to fig. 10, fig. 10 is a schematic view of an application scenario of the video editing method provided in this embodiment. As shown in fig. 10, after recording is completed, the user may select a recording to edit and fine-tune it in a pop-up floating window. The fine-tuning includes volume adjustment (select the recording to edit, then select the volume adjustment function corresponding to the volume adjustment entry 1004; the adjustable volume range may be 0 to 200), voice change (select the recording to edit, then select the voice-change function corresponding to the voice-change entry 1006; multiple voices are selectable, e.g., an uncle voice, a loli voice, an alien voice, a doll voice, a fat-man voice, and an ethereal reverb voice, and the change takes effect after the user selects a voice and clicks confirm), and re-recording (select the recording to edit, then select the re-record function corresponding to the audio re-recording entry 1003, which clears the recording just completed and records it again).
In the embodiment of the present application, the user's voice is captured through hardware (e.g., a microphone, wired earphones, Bluetooth earphones, or noise-cancelling earphones) as the audio input, and the captured audio data is sampled and encoded by the hardware. During recording, the audio data is intercepted in real time to draw the real-time recording waveform, and the captured audio data can also be stored.
For example, referring to fig. 13, fig. 13 is a schematic model diagram of a video editing system provided in an embodiment of the present application. As shown in fig. 13, the recording controller serves as the entry of the recording function, with functions such as starting, pausing, and stopping recording; through delegate calls, the user can conveniently record and process the recording result. The recording synchronization protocol is used to obtain the recording progress and call back the recording result so that the real-time recording waveform can be drawn from it. Recording configuration management sets the recording state and manages input channels such as the microphone and earphones. Audio data processing handles conversion operations such as audio format conversion. The audio synchronization frequency control module controls the callback frequency, i.e., allows the callback frequency to be set manually. The client can process the recording information (e.g., audio data captured by the earphones) by listening to the recording controller, thereby drawing the real-time recording waveform.
In addition, since the refresh frequency of the terminal device running the client is much higher than the callback frequency of the audio capture device (e.g., earphones), the user may perceive an illusion of stuttering. To solve this problem, the embodiment of the present application introduces audio synchronization frequency control, which allows the callback frequency to be set — for example, converting from the capture device's recording sampling rate of 44.1 kHz to the client's 48 kHz — to ensure a smooth user experience. In addition, because the externally required audio data format may not match the client's audio data format, the recording controller can perform audio data format conversion via audio data processing during recording, resolving the mismatch between the externally required format and the internal software's format.
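The 44.1 kHz to 48 kHz conversion mentioned above is a sample-rate conversion. A minimal linear-interpolation sketch (production resamplers use band-limited filters, so treat this only as an illustration of the principle):

```python
def resample(samples: list, src_hz: int, dst_hz: int) -> list:
    """Linear-interpolation resampler, e.g. from 44,100 Hz to 48,000 Hz."""
    if not samples or src_hz == dst_hz:
        return list(samples)
    # output length scales by the ratio of the two rates
    out_len = max(1, round(len(samples) * dst_hz / src_hz))
    if len(samples) == 1 or out_len == 1:
        return [samples[0]] * out_len
    out = []
    step = (len(samples) - 1) / (out_len - 1)
    for i in range(out_len):
        pos = i * step           # fractional position in the source buffer
        lo = int(pos)
        hi = min(lo + 1, len(samples) - 1)
        frac = pos - lo
        # interpolate between the two neighboring source samples
        out.append(samples[lo] * (1 - frac) + samples[hi] * frac)
    return out
```

For instance, 441 samples captured at 44.1 kHz map to 480 samples at 48 kHz, matching the 10 ms of audio they both represent.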
The recording process is explained below.
For example, referring to fig. 14, fig. 14 is a schematic flow diagram of a recording process provided in an embodiment of the present application. As shown in fig. 14, the client first determines whether it has recording permission; if not, it cannot record and directly enters the end state. If recording permission exists, recording can start after the recording format (set to 44.1 kHz by default) and the storage path of the recording are set. During recording, the audio capture device (e.g., earphones) sends the captured audio data to the client via the recording synchronization protocol, so that the client can perform operations such as drawing the real-time recording waveform from the received data. When the user leaves the client (for example, it enters the background), recording stops, i.e., the recording ends; the recording synchronization protocol then sends the finished recording file to the client to notify it that recording is complete.
In addition, system limitations can also end recording (i.e., stop it): for example, an incoming call causes an interruption; a set time limit is reached; the system multimedia service is reset (the system crashes and the multimedia service restarts) or lost (part of the service information is lost after such a restart); or the audio input (recording) channel changes (i.e., the audio capture device changes, such as plugging in earphones while recording on the phone and continuing to record through the earphones). In these cases, similar to the above, the recording ends, the state is synchronized to the client through the recording synchronization protocol module, and the client decides how to handle the recording.
In addition, it should be noted that when recording is paused, if the system's audio control state has not changed, recording can be resumed, i.e., the audio data captured before and after the pause is written into the same recording file. If the audio control state changes after the pause, however, the current recording is considered finished, and starting recording again yields a new recording, i.e., a new recording file.
Management of recording files is explained below using a complete recording session as an example.
Preparation stage: before starting recording, first set the recording file storage path, including creating a recording folder and a recording file name (e.g., r_0.aac);
First recording: the recording process is the process of storing audio data; the captured audio data is written into the r_0 file;
Pause recording: at this time, no data is written into the r_0 file;
Resume recording: continue writing the captured audio data into the r_0 file;
Complete recording: end the recording and stop writing data into the r_0 file; at the same time, return the recording file r_0 to the client through the recording synchronization protocol module;
Prepare the second recording: under the same folder, create a recording file named r_1;
… (repeat the steps of the first recording)
Prepare the third recording: under the same folder, create a recording file named r_2.
It can be seen that a recording folder is created under the workspace, and recording file names increase monotonically, e.g., r_0, r_1, r_2, …; when editing is completed and a delete operation is performed, the entire recording folder is deleted. In addition, when an existing recording file is deleted — for example, given three recording files r_0, r_1, and r_2, suppose r_2 is deleted — the next recording will be named starting from r_3; after it completes, the recording folder will contain the three files r_0, r_1, and r_3, the next recording will be named r_4, and so on. After r_2 is deleted, naming does not restart from r_2 but from r_3, which avoids duplicate file names in case the deleted r_2 is later restored.
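The naming rule — indices increase monotonically and are never recycled after deletion — can be sketched as a small allocator (class and method names here are illustrative):

```python
class RecordingNamer:
    """Allocates r_0, r_1, ... and never reuses an index after deletion,
    matching the naming rule described above."""
    def __init__(self):
        self.next_index = 0
        self.files = set()

    def create(self) -> str:
        name = f"r_{self.next_index}"
        self.next_index += 1          # monotonically increasing, even across deletes
        self.files.add(name)
        return name

    def delete(self, name: str) -> None:
        self.files.discard(name)      # the index is NOT recycled
```

Because `next_index` only grows, a restored copy of a deleted file can never collide with a later recording's name.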
In addition, regarding abnormal situations: because recording is performed in the path of the editing cache space, even if recording is interrupted, editing can continue the next time from where the last recording was interrupted, based on the cached information in the editing cache space.
According to the video editing method provided by the embodiment of the present application, users can flexibly add their own recordings while producing a video from a video template. This avoids the cumbersome process required by related-art methods, in which the user must additionally export the template video and re-import the final video into editing software to add recordings and subtitle text, and thus improves video production efficiency. Meanwhile, speech recognition of the recording generates the corresponding subtitle text, reducing the time the user spends typing; the recorded voice can also be changed into the voice of another character according to the user's preference, enhancing personalization, adding some fun to recording, and creating a simple scene-recording atmosphere.
Continuing with the exemplary structure of the video editing apparatus 465 provided by the embodiment of the present application implemented as software modules, in some embodiments, as shown in fig. 2, the software modules of the video editing apparatus 465 stored in the memory 460 may include a display module 4651 and an obtaining module 4652.
A display module 4651, configured to display a first video in response to a video filling trigger operation for a video template, where the first video is formed after at least one video material is filled in the video template; a display module 4651, further configured to display at least one audio stuffing area on a time axis of the first video; an obtaining module 4652, configured to, in response to an audio setting operation, obtain audio materials to be filled in at least one audio filling area; the display module 4651 is further configured to display a second video instead of the first video in response to an audio stuffing triggering operation for the first video, where the second video is formed after at least one audio stuffing area of the first video is stuffed with corresponding audio material.
In some embodiments, the display module 4651 is further configured to display an audio setting entry corresponding to the first video; and to display, in response to a trigger operation on the audio setting entry, a time axis of the first video with at least one audio filling area displayed on it, where an audio filling area indicates a time period on the time axis that can be filled with audio material.
In some embodiments, the obtaining module 4652 is further configured to obtain the at least one audio stuffing region by at least one of: acquiring at least one preset audio filling area from a video template; dividing at least one audio filling area from a time axis according to the playing time of a video material filled in a first video; and dividing a time axis according to plot units of the first video, and determining an audio filling area corresponding to each plot unit.
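Of the three ways above, the second — deriving fill areas from the playing times of the filled video materials — can be sketched directly. This is an assumed illustration (one fill area per material, laid end to end on the time axis), not the application's actual partitioning logic:

```python
def areas_from_materials(durations: list) -> list:
    """Derive one audio fill area per filled video material: each area
    spans that material's playing time on the first video's time axis."""
    areas, t = [], 0.0
    for d in durations:
        areas.append((t, t + d))  # (start, end) in seconds
        t += d
    return areas
```

The other two ways plug into the same shape: preset areas come straight from the template's metadata, and plot-unit areas replace `durations` with the lengths of the detected plot units.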
In some embodiments, the display module 4651 is further configured to display the selected target audio fill area in an edited state in response to an audio fill area selection operation; in response to a time setting operation for the target audio fill area, the target audio fill area is updated to be displayed based on the start time and the end time set on the time axis.
In some embodiments, when the audio setting entry is a recording entry, the displaying module 4651 is further configured to display a recording control corresponding to a target audio filling area, where the target audio filling area is an audio filling area in an edited state in at least one audio filling area; the video editing apparatus 465 further includes a capture module 4653, configured to start capturing audio in response to a first audio setting operation of starting the recording control; stopping collecting the audio in response to a second audio setting operation of closing the recording control; the video editing apparatus 465 further includes a determining module 4654 configured to use the captured audio as the audio material to be filled in the target audio filling area.
In some embodiments, when the audio setting entry is an audio material selection entry, the display module 4651 is further configured to display an audio material selection control, wherein the audio material selection control is configured to select a plurality of candidate audio materials; the determining module 4654 is further configured to, in response to the audio setting operation selected through the audio material selection control, take the selected at least one of the plurality of candidate audio materials as an audio material to be filled in the target audio filling region, where the target audio filling region is an audio filling region in an editing state in the at least one audio filling region.
In some embodiments, the display module 4651 is further configured to display an audio re-recording entry; and the audio recording control is used for responding to the triggering operation aiming at the audio re-recording inlet and displaying the recording control for re-acquisition; a collecting module 4653, further configured to start collecting audio in response to a third audio setting operation of starting the recording control; stopping collecting the audio in response to a fourth audio setting operation of closing the recording control; the determining module 4654 is further configured to use the re-captured audio as audio material to be filled in a target audio filling area, where the target audio filling area is an audio filling area in an editing state in the at least one audio filling area.
In some embodiments, the display module 4651 is also configured to display an audio deletion entry; the video editing apparatus 465 further includes a deleting module 4655, configured to delete the audio material corresponding to the target audio filling region in response to the triggering operation for the audio deletion entry, where the target audio filling region is an audio filling region in an editing state in the at least one audio filling region.
In some embodiments, the display module 4651 is also used to display a volume adjustment entry; and for displaying a volume adjustment control in response to a triggering operation for the volume adjustment entry; the determining module 4654 is further configured to determine, in response to a setting operation for the volume adjustment control, the set volume as the volume of the audio material corresponding to the target audio filling region, where the target audio filling region is an audio filling region in an editing state in the at least one audio filling region.
In some embodiments, the display module 4651 is also used to display a voice-change entry; and to display a plurality of candidate voice objects in response to a trigger operation on the voice-change entry. The video editing apparatus 465 further includes a replacing module 4656, configured to replace, in response to a voice object selection operation, the initial voice object of the audio material corresponding to the target audio filling area with the target voice object selected from the plurality of candidate voice objects, where the target audio filling area is the audio filling area in the editing state among the at least one audio filling area.
In some embodiments, the display module 4651 is further configured to display a text recognition entry; and the voice recognition result of the target audio material is displayed in response to the triggering operation aiming at the text recognition entrance, and the voice recognition result is used as the subtitle of the time segment filled with the target audio material in the second video, wherein the target audio material is the audio material in the editing state in the audio material to be filled.
In some embodiments, the display module 4651 is further configured to display at least one segment to be populated included in the video template and a video material selection control, where the video material selection control is configured to select a plurality of candidate video materials; and for highlighting the selected at least one video material in response to a video material selection operation via the video material selection control; the video editing apparatus 465 further includes a padding module 4657, configured to, in response to a video padding trigger operation for the video template, pad the selected at least one video material into a to-be-padded segment corresponding to the video template.
In some embodiments, the display module 4651 is further configured to display a plurality of candidate video templates; displaying a use entry of the selected video template of the plurality of candidate video templates in response to the video template selection operation; in response to a triggering operation of the use entrance aiming at the selected video template, displaying at least one segment to be filled included in the selected video template; and the introduction information is used for displaying the segments to be filled, wherein the introduction information is used for representing the types of the video materials to be filled in the segments to be filled.
In some embodiments, the display module 4651 is also used to display a video replacement entry; and to display, in response to a trigger operation on the video replacement entry, a video selection control, where the video selection control includes a plurality of candidate video materials. The replacing module 4656 is further configured to replace the video material with at least one material selected from the plurality of candidate video materials in response to a video material selection operation on the video selection control.
In some embodiments, the display module 4651 is also used to display a video cropping entry; and to display, in response to a trigger operation on the video cropping entry, a video cropping control, where the video cropping control includes at least one of a picture cropping control and a time-period selection control. The video editing apparatus 465 further includes a cropping module 4658, configured to crop out, from the video material, the picture outside the set cropping frame in response to a cropping-frame setting operation based on the picture cropping control; and to cut out, from the video material, the video segments outside the set time period in response to a time setting operation based on the time-period selection control.
In some embodiments, the display module 4651 is further configured to display, in response to a video material selection operation, a selected target video material of the at least one video material in an editing state; displaying a volume adjustment inlet, and responding to the trigger operation aiming at the volume adjustment inlet to display a volume adjustment control; a determining module 4654, further configured to determine the set volume as the volume of the target video material in response to a setting operation for the volume adjustment control.
In some embodiments, the display module 4651 is further configured to display a plurality of candidate video assets; highlighting the selected at least one video material and displaying a plurality of candidate video templates matching the at least one video material in response to a video material selection operation; the filling module 4657 is further configured to, in response to a video template selection operation, fill at least one video material into a to-be-filled segment corresponding to the selected video template.
In some embodiments, the video editing apparatus 465 further comprises a conversion module 4659 configured to perform at least one of the following: converting the data format of the audio material into the data format conforming to the first video; converting the sampling frequency of the audio material to conform to the sampling frequency of the first video.
It should be noted that, in the embodiments of the present application, descriptions about devices are similar to the implementation of the video editing method described above, and have similar beneficial effects, and therefore, no further description is given. The technical details of the video editing apparatus provided in the embodiments of the present application that are not exhausted can be understood from the description of fig. 3.
Embodiments of the present application provide a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions, so that the computer device executes the video editing method described in the embodiment of the present application.
Embodiments of the present application provide a computer-readable storage medium having stored therein executable instructions that, when executed by a processor, cause the processor to perform a method provided by embodiments of the present application, for example, a video editing method as shown in fig. 3.
In some embodiments, the computer-readable storage medium may be memory such as FRAM, ROM, PROM, EPROM, EEPROM, flash, magnetic surface memory, optical disk, or CD-ROM; or may be various devices including one or any combination of the above memories.
In some embodiments, executable instructions may be written in any form of programming language (including compiled or interpreted languages), in the form of programs, software modules, scripts or code, and may be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment.
By way of example, executable instructions may correspond, but do not necessarily have to correspond, to files in a file system, and may be stored in a portion of a file that holds other programs or data, such as in one or more scripts in a hypertext Markup Language (HTML) document, in a single file dedicated to the program in question, or in multiple coordinated files (e.g., files that store one or more modules, sub-programs, or portions of code).
By way of example, executable instructions may be deployed to be executed on one computing device or on multiple computing devices at one site or distributed across multiple sites and interconnected by a communication network.
To sum up, in the process of using the video template, the video material filling function and the audio material filling function are mutually matched, so that corresponding video materials and audio materials can be flexibly added according to requirements, the video editing efficiency is improved, and the requirements of video expression personalized ideas and viewpoints are fully met.
The above description is only an example of the present application, and is not intended to limit the scope of the present application. Any modification, equivalent replacement, and improvement made within the spirit and scope of the present application are included in the protection scope of the present application.

Claims (15)

1. A method of video editing, the method comprising:
responding to a video filling triggering operation aiming at a video template, and displaying a first video, wherein the first video is formed after at least one video material is filled in the video template;
displaying at least one audio stuffing area on a timeline of the first video;
responding to audio setting operation, and acquiring audio materials to be filled in the at least one audio filling area;
in response to an audio fill trigger operation for the first video, displaying a second video in place of the first video, wherein the second video is formed after the at least one audio fill region of the first video is filled with corresponding audio material.
2. The method of claim 1, wherein displaying at least one audio fill area on a timeline of the first video comprises:
displaying an audio setting entry corresponding to the first video;
and in response to the triggering operation of the audio setting entry, displaying a time axis of the first video, and displaying at least one audio filling area on the time axis, wherein the audio filling area is used for indicating a time period capable of filling the audio material on the time axis.
3. The method of claim 2, wherein prior to displaying at least one audio fill area on the timeline, the method further comprises:
obtaining the at least one audio fill region by at least one of:
acquiring the at least one preset audio filling area from the video template;
dividing the at least one audio filling area from the time axis according to the playing time of the video material filled in the first video;
and dividing the time axis according to plot units of the first video, and determining an audio filling area corresponding to each plot unit.
4. A method according to claim 2 or 3, characterized in that the method further comprises:
responding to the audio filling area selection operation, and displaying that the selected target audio filling area is in an editing state;
updating to display the target audio stuffing area based on the start time and the end time set on the time axis in response to a time setting operation for the target audio stuffing area.
5. The method of claim 2, wherein when the audio setting entry is a recording entry, the obtaining audio material to be filled in the at least one audio filling area in response to an audio setting operation comprises:
displaying a recording control corresponding to a target audio filling area, wherein the target audio filling area is an audio filling area in an editing state in the at least one audio filling area;
starting to acquire audio in response to a first audio setting operation for starting the recording control;
stopping collecting audio in response to a second audio setting operation of closing the recording control;
and taking the acquired audio as an audio material to be filled in the target audio filling area.
6. The method of claim 2, wherein when the audio setting entry is an audio material selection entry, the obtaining audio material to be filled in the at least one audio filling area in response to an audio setting operation comprises:
displaying an audio material selection control, wherein the audio material selection control is used to select a plurality of candidate audio materials;
and in response to the audio setting operation selected through the audio material selection control, taking the selected at least one of the candidate audio materials as an audio material to be filled in a target audio filling area, wherein the target audio filling area is an audio filling area in an editing state in the at least one audio filling area.
7. The method of claim 1, wherein when displaying a second video in place of the first video, the method further comprises:
displaying an audio re-recording entry;
displaying a recording control for reacquiring in response to a triggering operation for the audio re-recording entry;
starting to acquire audio in response to a third audio setting operation for starting the recording control;
stopping collecting audio in response to a fourth audio setting operation of closing the recording control;
and taking the newly acquired audio as audio materials to be filled in a target audio filling area, wherein the target audio filling area is the audio filling area in the editing state in the at least one audio filling area.
8. The method of claim 1, wherein when displaying a second video in place of the first video, the method further comprises:
displaying an audio deletion entry;
and in response to the triggering operation aiming at the audio deleting inlet, deleting the audio material corresponding to a target audio filling area, wherein the target audio filling area is the audio filling area in the editing state in the at least one audio filling area.
9. The method of claim 1, wherein when displaying a second video in place of the first video, the method further comprises:
displaying a volume adjustment entry;
displaying a volume adjustment control in response to a triggering operation for the volume adjustment inlet;
and in response to the setting operation of the volume adjusting control, determining the set volume as the volume of the audio material corresponding to a target audio filling area, wherein the target audio filling area is the audio filling area in the editing state in the at least one audio filling area.
10. The method of claim 1, wherein when displaying a second video in place of the first video, the method further comprises:
displaying a variant sound inlet;
displaying a plurality of candidate sound emitting objects in response to a trigger operation for the sound variation entrance;
and in response to the sound-emitting object selection operation, replacing an initial sound-emitting object of the audio material corresponding to the target audio filling area with a selected target sound-emitting object in the candidate sound-emitting objects, wherein the target audio filling area is an audio filling area in an editing state in the at least one audio filling area.
11. The method of claim 1, wherein when displaying a second video in place of the first video, the method further comprises:
displaying a text recognition entry;
and responding to the triggering operation aiming at the text recognition entrance, displaying a voice recognition result of a target audio material, and using the voice recognition result as a subtitle filled with a time segment of the target audio material in the second video, wherein the target audio material is the audio material in an editing state in the audio material to be filled.
12. The method of claim 1, wherein prior to displaying the first video in response to the video population trigger operation for the video template, the method further comprises:
displaying a plurality of candidate video materials;
in response to a video material selection operation, highlighting the selected at least one video material and displaying a plurality of candidate video templates matching the at least one video material;
and responding to the video template selection operation, and filling the at least one video material into the segment to be filled corresponding to the selected video template.
13. A video editing apparatus, characterized in that the apparatus comprises:
the display module is used for responding to a video filling triggering operation aiming at a video template and displaying a first video, wherein the first video is formed after at least one video material is filled in the video template;
the display module is further configured to display at least one audio stuffing area on a timeline of the first video;
the acquisition module is used for responding to audio setting operation and acquiring audio materials to be filled in the at least one audio filling area;
the display module is further configured to display a second video instead of the first video in response to an audio padding trigger operation for the first video, where the second video is formed after the at least one audio padding region of the first video is padded with a corresponding audio material.
14. An electronic device, characterized in that the electronic device comprises:
a memory for storing executable instructions;
a processor for implementing the video editing method of any one of claims 1-12 when executing executable instructions stored in the memory.
15. A computer-readable storage medium having stored thereon executable instructions for, when executed by a processor, implementing the video editing method of any one of claims 1-12.
CN202110871543.0A 2021-07-30 2021-07-30 Video editing method and device, electronic equipment and computer readable storage medium Pending CN115695680A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110871543.0A CN115695680A (en) 2021-07-30 2021-07-30 Video editing method and device, electronic equipment and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110871543.0A CN115695680A (en) 2021-07-30 2021-07-30 Video editing method and device, electronic equipment and computer readable storage medium

Publications (1)

Publication Number Publication Date
CN115695680A true CN115695680A (en) 2023-02-03

Family

ID=85059156

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110871543.0A Pending CN115695680A (en) 2021-07-30 2021-07-30 Video editing method and device, electronic equipment and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN115695680A (en)

Similar Documents

Publication Publication Date Title
CN107613357B (en) Sound and picture synchronous optimization method and device and readable storage medium
CN112449231B (en) Multimedia file material processing method and device, electronic equipment and storage medium
US20080275700A1 (en) Method of and System for Modifying Messages
CA2600884C (en) Method and apparatus for editing media
US11670339B2 (en) Video acquisition method and device, terminal and medium
CN107770626A (en) Processing method, image synthesizing method, device and the storage medium of video material
CN112261416A (en) Cloud-based video processing method and device, storage medium and electronic equipment
JP2008172582A (en) Minutes generating and reproducing apparatus
CN106155470B (en) A kind of audio file generation method and device
KR20060066989A (en) Mobile communication terminal with improved user interface
KR102367143B1 (en) Voice effects based on facial expressions
WO2020093876A1 (en) Video editing method and apparatus, computer device and readable storage medium
JP2020514936A (en) Method and device for quick insertion of voice carrier text
JP2018078402A (en) Content production device, and content production system with sound
CN113852767B (en) Video editing method, device, equipment and medium
CN110139164A (en) A kind of voice remark playback method, device, terminal device and storage medium
CN108241597A (en) The production method and device of a kind of PowerPoint
JP6641045B1 (en) Content generation system and content generation method
CN113079419A (en) Video processing method of application program and electronic equipment
CN115695680A (en) Video editing method and device, electronic equipment and computer readable storage medium
JP6442102B1 (en) Information processing system and information processing apparatus
CN113535116A (en) Audio file playing method and device, terminal and storage medium
JP7166373B2 (en) METHOD, SYSTEM, AND COMPUTER-READABLE RECORDING MEDIUM FOR MANAGING TEXT TRANSFORMATION RECORD AND MEMO TO VOICE FILE
JP6554634B1 (en) Information processing system and information processing apparatus
JP7128222B2 (en) Content editing support method and system based on real-time generation of synthesized sound for video content

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 40081842

Country of ref document: HK