CN113727140A - Audio and video processing method and device and electronic equipment - Google Patents


Info

Publication number
CN113727140A
Authority
CN
China
Prior art keywords
video
audio
input
target
synthesized
Prior art date
Legal status
Pending
Application number
CN202111017334.6A
Other languages
Chinese (zh)
Inventor
高桦
Current Assignee
Vivo Mobile Communication Hangzhou Co Ltd
Original Assignee
Vivo Mobile Communication Hangzhou Co Ltd
Priority date
Filing date
Publication date
Application filed by Vivo Mobile Communication Hangzhou Co Ltd filed Critical Vivo Mobile Communication Hangzhou Co Ltd
Priority to CN202111017334.6A
Publication of CN113727140A
Priority to PCT/CN2022/115582 (WO2023030270A1)
Legal status: Pending

Classifications

    • H - ELECTRICITY
        • H04 - ELECTRIC COMMUNICATION TECHNIQUE
            • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
                • H04N 21/00 - Selective content distribution, e.g. interactive television or video on demand [VOD]
                    • H04N 21/20 - Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
                        • H04N 21/23 - Processing of content or additional data; Elementary server operations; Server middleware
                            • H04N 21/231 - Content storage operation, e.g. caching movies for short term storage, replicating data over plural servers, prioritizing data for deletion
                            • H04N 21/233 - Processing of audio elementary streams
                            • H04N 21/234 - Processing of video elementary streams, e.g. splicing of video streams, manipulating MPEG-4 scene graphs
                                • H04N 21/23424 - Processing of video elementary streams involving splicing one content stream with another content stream, e.g. for inserting or substituting an advertisement
                    • H04N 21/40 - Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
                        • H04N 21/43 - Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
                            • H04N 21/433 - Content storage operation, e.g. storage operation in response to a pause request, caching operations
                            • H04N 21/439 - Processing of audio elementary streams
                            • H04N 21/44 - Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs
                                • H04N 21/44016 - Processing of video elementary streams involving splicing one content stream with another content stream, e.g. for substituting a video clip

Abstract

The application discloses an audio and video processing method and device, and an electronic device, belonging to the field of electronic devices. In an embodiment of the application, a first input from a user can be received while a first audio and video is playing; a second audio and video obtained by shooting or playing is displayed at a first time corresponding to the playing progress of the first audio and video; and the second audio and video is then synthesized into the first audio and video to generate a synthesized audio and video. A time at a suitable playing progress can thus be selected directly on the basis of the original audio and video, and the second audio and video, obtained through the recording or playing function of the electronic device, can be synthesized with the original to obtain the synthesized audio and video, which is convenient, fast, and efficient.

Description

Audio and video processing method and device and electronic equipment
Technical Field
The application belongs to the technical field of electronic equipment, and particularly relates to an audio and video processing method and device and electronic equipment.
Background
Users often want to edit audio and video files into more interesting audio and video, but the editing operation is cumbersome.
For example, when a user records a moment of daily life as an audio and video or a video log (vlog) with an electronic device such as a mobile phone, a camera application, a short-video application, or a beautification shooting application on the phone is typically used; however, the recorded content often fails to fully meet the user's shooting expectations and therefore needs to be edited afterwards. The editing capability of current recording applications (including the camera, short-video, and beautification shooting applications) is limited, while dedicated audio and video editing applications are difficult and complex to operate, so the related technologies lack a convenient way to process audio and video material.
Disclosure of Invention
Embodiments of the present application aim to provide an audio and video processing method and device, and an electronic device, to address the lack of convenience in processing audio and video material.
In a first aspect, an embodiment of the present application provides an audio and video processing method, where the method includes:
receiving a first input of a user under the condition of playing a first audio and video;
in response to the first input, displaying a second audio and video at a first time corresponding to the playing progress of the first audio and video, where the second audio and video is obtained by recording or playing;
and synthesizing the second audio and video into the first audio and video to generate a synthesized audio and video.
In a second aspect, an embodiment of the present application provides an apparatus for audio and video processing, where the apparatus includes:
the first receiving module is used for receiving a first input of a user under the condition of playing a first audio and video;
the first display module is used for displaying, in response to the first input, a second audio and video at a first time corresponding to the playing progress of the first audio and video, where the second audio and video is obtained by recording or playing;
and the generating module is used for generating a synthesized audio and video, where the synthesized audio and video is obtained by synthesizing the second audio and video into the first audio and video.
In a third aspect, an embodiment of the present application provides an electronic device, which includes a processor, a memory, and a program or instructions stored in the memory and executable on the processor, where the program or instructions, when executed by the processor, implement the steps of the method according to the first aspect.
In a fourth aspect, embodiments of the present application provide a readable storage medium, on which a program or instructions are stored, which when executed by a processor implement the steps of the method according to the first aspect.
In a fifth aspect, an embodiment of the present application provides a chip, where the chip includes a processor and a communication interface, where the communication interface is coupled to the processor, and the processor is configured to execute a program or instructions to implement the method according to the first aspect.
In the embodiments of the application, a first input from a user can be received while a first audio and video is playing; a second audio and video obtained through recording or playing is displayed at a first time corresponding to the playing progress of the first audio and video; and the second audio and video is then synthesized into the first audio and video to generate a synthesized audio and video. A time at a suitable playing progress can thus be selected directly on the basis of the original audio and video, and the second audio and video, obtained through the recording or playing function of the electronic device, synthesized with it, which is convenient, fast, and efficient.
Drawings
Fig. 1 is a schematic flowchart of an audio/video processing method provided in an embodiment of the present application;
FIG. 2 is a schematic illustration of display of an interface jump in one specific example of the present application;
FIG. 3 is a schematic illustration of a display of a playback interface in a specific example of the present application;
FIG. 4 is a schematic illustration of display of an interface jump in another particular example of the present application;
FIG. 5 is a schematic illustration of a display of a desktop of an electronic device system in another specific example of the present application;
FIG. 6 is a schematic illustration of a display of a capture interface in yet another specific example of the present application;
FIG. 7 is a schematic illustration of display of an interface jump in yet another specific example of the present application;
fig. 8 is a schematic structural diagram of an audio/video processing apparatus according to an embodiment of the present application;
fig. 9 is a schematic structural diagram of an electronic device according to an embodiment of the present application;
fig. 10 is a hardware structure diagram of an electronic device implementing an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described clearly below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some, but not all, embodiments of the present application. All other embodiments that can be derived by one of ordinary skill in the art from the embodiments given herein are intended to be within the scope of the present disclosure.
The terms "first", "second", and the like in the description and claims of the present application are used to distinguish between similar objects and not necessarily to describe a particular order or sequence. It should be appreciated that data so used are interchangeable under appropriate circumstances, so that the embodiments of the application can be practiced in sequences other than those illustrated or described herein. Moreover, the terms "first", "second", and the like do not limit quantity; for example, a first object may be one object or more than one. In addition, "and/or" in the specification and claims means at least one of the connected objects, and the character "/" generally indicates an "or" relationship between the preceding and following objects.
As short videos, streaming media, and similar content have proliferated, users' demands and ideas for interesting video clipping and editing have grown as well. Compared with the precision and flexibility of multitasking and mouse operation on a personal computer (PC), completing video synthesis on a mobile device such as a phone generally requires a third-party software program and more complicated processing, for example adjusting the editing position by manually dragging the video progress bar. As a result, users cannot easily obtain the synthesized audio and video they want.
Therefore, the embodiments of the present application provide an audio and video processing method and an electronic device to solve at least one of the technical problems above. The audio and video processing method may be executed on an electronic device, and the electronic device may be a mobile phone, a tablet computer, a notebook computer, a palmtop computer, a vehicle-mounted electronic device, a wearable device, an ultra-mobile personal computer (UMPC), a netbook, or a personal digital assistant (PDA); the embodiments of the present application are not specifically limited in this regard.
The audio and video processing method provided by the embodiment of the present application is described in detail below with reference to the accompanying drawings through specific embodiments and application scenarios thereof.
Fig. 1 shows a schematic flow diagram of an audio and video processing method provided by an embodiment of the present application. As shown in fig. 1, the method includes steps S101 to S103:
s101, receiving a first input of a user under the condition of playing a first audio and video.
Here and below, "audio and video" may refer to either video or audio.
The first audio and video may be from the local electronic device, or may be an audio and video downloaded or cached through the internet, which is not limited in this embodiment.
When the first audio and video is played, the first audio and video can be displayed on the whole screen or part of the screen of the electronic equipment.
For example, the first input may be a click input by the user on the screen, a voice instruction input by the user, or a specific gesture or mid-air gesture input by the user, which may be determined according to actual usage requirements; this embodiment does not limit this.
The click input can be single click input, double click input or click input of any number of times, and can also be long-press input or short-press input. The specific gesture may be any one of a tap gesture, a double tap gesture, a swipe gesture, a drag gesture, a zoom gesture, and a rotate gesture.
S102, in response to the first input, displaying a second audio and video at a first time corresponding to the playing progress of the first audio and video, where the second audio and video is obtained by recording or playing.
When the first audio and video is played, the picture and the playing progress information of the first audio and video can be displayed on the screen of the electronic equipment, and the progress information can be displayed as a progress bar and/or progress time.
In response to the first input, a second audio and video is displayed at the first time corresponding to the playing progress of the first audio and video. The first time may be the progress time reached while the user plays the first audio and video, or may be determined by manually dragging the progress bar of the first video or manually entering a playing time.
The second audio/video may be a video obtained through a recording function provided in the electronic device, or may be a video obtained through a playing function provided in the electronic device.
Illustratively, in the case of displaying the second audio/video, the playing of the first audio/video may be paused, or the playing of the first audio/video may be continued at the same time.
In addition, the second audio and video may be displayed full screen on the electronic device, or the first and second audio and video may be displayed in separate screen areas.
Illustratively, if the second audio/video is the recorded audio/video, when the second audio/video is displayed, a recording picture, a recording progress, a recording-related function identifier, and the like of the second audio/video can be displayed.
If the second audio/video is the audio/video obtained by playing, when the second audio/video is displayed, the playing picture, the progress information, the playing related function identification and the like of the second audio/video can be displayed.
S103, synthesizing the second audio and video into the first audio and video to generate a synthesized audio and video.
And synthesizing the second audio and video displayed in the step S102 into the first audio and video to generate a synthesized audio and video, where the generated synthesized audio and video may be displayed on a screen of the electronic device or may be directly stored in the local electronic device in the background, and this embodiment is not limited.
The synthesis may be that the second audio/video is inserted into the first audio/video, or that the second audio/video replaces part of the segments in the first audio/video.
It is understood that the second audio and video may also be a picture, which is equivalent to a second video having only one frame of image.
It can be understood that, with the method described above, a piece of video may be synthesized into another piece of video, a piece of audio may be synthesized into a piece of video, a piece of video may be synthesized into a piece of audio, and a piece of audio may be synthesized into another piece of audio.
With the method above, a first input from the user can be received while a first audio and video is playing; a second audio and video obtained by recording or playing is displayed at the first time corresponding to the playing progress of the first audio and video; and the second audio and video is then synthesized into the first audio and video to generate a synthesized audio and video. A time at a suitable playing progress can thus be selected directly on the basis of the original audio and video, and the second audio and video, obtained through the recording or playing function of the electronic device, synthesized with the original to obtain the synthesized audio and video, which is convenient, fast, and efficient.
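In code terms, the splice-or-replace logic of step S103 can be illustrated over a simple timeline model. The Kotlin sketch below is illustrative only; the Clip type, its name strings, and the millisecond arithmetic are assumptions of this sketch rather than anything defined in the application.

```kotlin
// Minimal timeline model: a clip is a named media span with a duration in milliseconds.
data class Clip(val name: String, val durationMs: Long)

// Insertion mode: split the first clip at the first time tMs and splice the second clip in.
fun insert(first: Clip, second: Clip, tMs: Long): List<Clip> {
    require(tMs in 0..first.durationMs) { "insertion point lies outside the first clip" }
    return listOf(
        Clip("${first.name}[0..$tMs]", tMs),
        second,
        Clip("${first.name}[$tMs..${first.durationMs}]", first.durationMs - tMs)
    )
}

// Replacement: the segment [tMs, tMs + lenMs) of the first clip is dropped and the second
// clip takes its place. lenMs == second.durationMs gives automatic replacement (total
// duration unchanged); any other lenMs gives free replacement.
fun replace(first: Clip, second: Clip, tMs: Long, lenMs: Long): List<Clip> {
    require(tMs >= 0 && lenMs >= 0 && tMs + lenMs <= first.durationMs) { "target segment exceeds the first clip" }
    return listOf(
        Clip("${first.name}[0..$tMs]", tMs),
        second,
        Clip("${first.name}[${tMs + lenMs}..${first.durationMs}]", first.durationMs - tMs - lenMs)
    )
}

fun main() {
    val first = Clip("first", 30 * 60_000L)    // assumed 30-minute first video
    val second = Clip("second", 5 * 60_000L)   // 5-minute second clip
    println(insert(first, second, tMs = 5 * 60_000L))
    println(replace(first, second, tMs = 5 * 60_000L, lenMs = second.durationMs))
}
```

Here insertion grows the composite by the second clip's duration, while replacement with lenMs equal to the second clip's duration leaves the total unchanged, matching the automatic replacement mode described later.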
For example, in a specific application scenario of the embodiment of the application, if a user wants to synthesize other video segments into a first video, the shooting function of the electronic device may be triggered to record a segment of audio and video (the second audio and video) when the first video has been watched to a certain progress time (or the progress bar has been manually dragged there), and the synthesized audio and video is generated when shooting ends. This example is described below with reference to fig. 2.
(2a) in fig. 2 shows a playing interface of a short video on the internet (a video with a playing duration within 30 minutes). Taking the clipping of this short video (i.e., the first video) as an example, step S101 of receiving the first input from the user while the first audio and video is playing may specifically be as follows:
in the electronic device 200 shown in (2a) in fig. 2, a play interface 201 is displayed with a Button control (Button)202 or a floating window function key. When the video is played to the first time corresponding to the first video playing progress, by clicking the button control 202 or the floating window, the video clip interface 203 shown in (2b) in fig. 2 may be skipped to execute step S102.
Before jumping to the video clip interface 203, since the first video is a streaming media file on the internet, the user may be prompted to download the played first video in a pop-up window manner, and if the download is allowed, the video cache is opened, and the subsequent video processing steps S102 to S103 are performed.
It can be understood that if the first video on the internet may not be downloaded (e.g., copyright restrictions do not allow the user to download it) or the download fails, a pop-up prompt informs the user that the clip interface cannot be opened.
In other examples, if the first video is a long video, the electronic device 300 may be triggered to jump to a video clip interface (not shown) by a displayed shortcut key at the long video playing interface 301 shown in fig. 3, or jump to the video clip interface by long pressing any volume key 302 of the electronic device 300 in the case of displaying the long video playing interface 301. The long video may be a streaming media file of the internet, or a video file local to the electronic device.
Referring again to fig. 2, after jumping to the video clip interface 203 and before executing step S102 (displaying the second audio and video at the first time corresponding to the playing progress of the first audio and video in response to the first input), the user may first select a video synthesis mode through a control 204 or a menu on the video clip interface. The synthesis modes may include one or more of insertion, automatic replacement, and free replacement.
The insertion mode means inserting the second video at a given position in the first video and splicing the two together.
The automatic replacement mode means that when recording of the second video finishes, the second video is automatically synthesized into the first video, replacing a segment of the same duration in the first video.
The free replacement mode means that an arbitrary target duration (no greater than the total playing duration of the first video) is entered before or after the second video is recorded; when the second video is synthesized into the first video, the segment corresponding to the target duration in the first video is replaced, and the duration of the second video may or may not equal the target duration.
For example, after determining the synthesis manner, the step S102, in response to the first input, displays a second audio/video at a first time corresponding to the first audio/video playing progress, which may specifically include:
in response to the first input, in a case where the first av is played to a first time in the first display region 205, the second av is displayed in the second display region 206.
In this step, in response to the first input, the first video and the second video may be displayed in a first display area and a second display area on the electronic device, respectively. For example, in the playing interface 201 shown in (2a) in fig. 2, the display area of the first video is the whole screen; in response to the first input, the electronic device switches to split-screen display as shown in (2b) in fig. 2, simultaneously displaying the playing picture of the first video and the recording picture of the second video in the first display area 205 and the second display area 206 of the electronic device 200.
Displaying the playing picture and the recording picture side by side in split screen thus lets the user grasp the clip information of the first and second videos more intuitively, for example the insertion point of the first video (the progress position at the first time) and the shooting duration of the second video, making it easy to compare the two and making the clip processing more convenient and efficient.
In order to improve the convenience of the audio and video synthesis operation and to convey the picture, progress, and other information intuitively while the second audio and video is being acquired, for example, after the first input is received through step S101, step S102 (displaying the second audio and video at the first time corresponding to the playing progress of the first audio and video in response to the first input) may specifically include:
and S1021, responding to the first input, and starting to dynamically display the picture of the target audio and video at a first moment corresponding to the first audio and video playing progress.
In this step, the target audio/video is recorded or played audio/video.
And if the second audio and video is obtained through recording, responding to the first input, and starting to display a recording picture of the target audio and video at a first moment corresponding to the first audio and video playing progress. For example, as shown in (2b) of fig. 2, the recording picture of the target video is dynamically displayed in the second display area 206, so that the user can conveniently adjust the recording angle of view of the electronic device according to the recording picture, and after all required video pictures are recorded, the recording is finished.
It should be understood that if it is a recording screen of the target audio, the recording screen may include a recording status (recording, recording pause or recording stop) button, a recording progress (which may be represented by a progress bar or a recording time, etc.), and the like. In the recording process of the target audio, the recording progress changes along with time, and the recording picture can dynamically display the change of the recording progress.
And if the second audio/video is obtained through playing, responding to the first input, and starting to display a playing picture of the target audio/video at a first moment corresponding to the first audio/video playing progress. The target audio and video can be from an audio and video database of the electronic equipment, and the playing is finished after all the video pictures required by the user are played.
It should be understood that if the playback screen is a playback screen of the target audio, the playback screen may include a playback status (playback, pause playback, or stop playback) button, a playback progress (which may be represented by a progress bar or a playback time, etc.), and the like. In the process of playing the target audio, the playing progress changes along with time, and the playing picture can dynamically display the change of the playing progress.
And S1022, receiving a second input of the user.
For example, the second input may be a click input by the user on the screen, a voice instruction input by the user, or a specific gesture or mid-air gesture input by the user, which may be determined according to actual usage requirements; this embodiment does not limit this.
The click input can be single click input, double click input or click input of any number of times, and can also be long-press input or short-press input. The specific gesture may be any one of a tap gesture, a double tap gesture, a swipe gesture, a drag gesture, a zoom gesture, and a rotate gesture.
S1023, in response to the second input, stopping the dynamic display of the picture of the target audio and video to obtain the second audio and video.
The user stops the dynamic display of the target audio and video picture through the second input. That is, if a recording picture is being dynamically displayed, the recording ends in response to the second input, the dynamic display stops, and the recorded result serves as the second audio and video; if a playing picture is being dynamically displayed, the playing ends in response to the second input, the dynamic display stops, and the data played from the target audio and video serves as the second audio and video.
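As a rough sketch of the S1021 to S1023 flow: the first input starts the dynamic display of the target audio and video, and the second input stops it, fixing the material captured so far as the second audio and video. The class below and its string-based frames are assumptions made for illustration, not part of the application.

```kotlin
// First input starts the dynamic display; second input stops it and yields the second clip.
class TargetDisplay {
    private var running = false
    private val frames = mutableListOf<String>()

    fun onFirstInput() { running = true }                // S1021: start dynamic display

    fun onFrame(frame: String) { if (running) frames += frame }

    fun onSecondInput(): List<String> {                  // S1023: stop display, fix the clip
        running = false
        return frames.toList()
    }
}

fun main() {
    val display = TargetDisplay()
    display.onFirstInput()
    display.onFrame("frame-1"); display.onFrame("frame-2")
    println(display.onSecondInput())                     // [frame-1, frame-2]
}
```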
In order to meet diversified synthesis requirements of users, the embodiment of the application can complete synthesis in modes of insertion, automatic replacement with equal time length, free replacement with unequal time length and the like.
For example, after obtaining the second video meeting the user requirement, if the synthesis mode selected by the user in advance is insertion, step S103 synthesizes the second audio and video in the first audio and video to generate a synthesized audio and video, which may specifically include:
and splicing the second audio and video in the position corresponding to the first moment in the first audio and video to generate a synthetic audio and video.
As shown in (2b) in fig. 2, the user may end recording the second video by clicking the function button 207 on the video clip interface 203, and directly insert the recorded second video into the first video at a position corresponding to the first time to splice the videos, so as to generate a composite video.
In this example, in the insertion mode, the first video may be displayed in a paused state while the second video is displayed in response to the first input. After the second video is shot, it is inserted directly at the paused progress position of the first video to generate the synthesized audio and video. For example, when a user following a TV series wants to shoot a video alongside a favorite character or star, the first input may be made at the frame where that character or star appears, and a second video including the user may be shot and synthesized in.
Thus, when recording ends, the electronic device automatically splices in the recorded second video at the insertion point selected by the user (the position corresponding to the first time in the first video), requiring little user operation and staying simple and quick. When clipping a self-recorded video, the user can flexibly shoot supplementary or follow-up footage as inspiration strikes, conveniently enriching and improving the video content and raising the quality of the shot. The clipping is completed without the assistance of third-party video editing software, saving the phone's memory.
It will be appreciated that the recorded second video may also be an image, corresponding to the second video being as short as only one frame.
Optionally, in this example, a first identifier 208 may be further provided on the video clip interface, and the first identifier 208 is used to jump to a material storage area (audio and video database) of the electronic device, such as an album 209. The user can select the video 210 or the picture file as the second video in the album 209 and insert the second video into the first video at the position corresponding to the first moment, so as to complete the splicing and composition of the videos.
For example, after obtaining the second video meeting the user requirement, if the synthesis mode selected by the user is automatic replacement, step S103 synthesizes the second audio and video in the first audio and video to generate a synthesized audio and video, which may specifically include:
and replacing the target segment in the first audio and video with the second audio and video to generate a synthetic audio and video. The target segment may be a segment from a first time (insertion starting point) to a second time (insertion ending point) in the first audio/video, or may be determined through manual input (such as starting and ending time of inputting the segment or dragging a progress bar), but the duration of the target segment is equal to the duration of the second video, so that the total duration of the synthesized audio/video is ensured to be unchanged.
To simplify operation in the equal-duration automatic replacement mode, optionally, if the user selects the automatic replacement mode in advance, the second audio and video can also be acquired automatically in accordance with a manually entered target segment duration. For example, before the second audio and video is obtained, step S102 (displaying the second audio and video at the first time corresponding to the playing progress of the first audio and video in response to the first input) may specifically include:
s1024, responding to a first input, and starting to dynamically display a picture of a target audio and video at a first moment corresponding to a first audio and video playing progress, wherein the target audio and video is a recorded or played audio and video;
and S1025, after the target duration is preset, stopping dynamically displaying the picture of the target audio and video to obtain a second audio and video.
The preset target duration is the duration of the target segment determined by the user in the automatic replacement mode; it may be derived from manually entered start and end times of the target segment, or from start and end times set by dragging the progress bar.
Recording or playing stops automatically once the duration of the displayed picture reaches the preset target duration, yielding the second audio and video.
Correspondingly, step S103 may specifically include:
and replacing a first segment in the first audio/video with a second audio/video to generate a synthesized audio/video, wherein the first segment is a target segment determined by a user, the starting moment is determined through manual input, and further the duration of the first segment is determined, and the duration is equal to the preset target duration, so that the equal-duration segment replacement is automatically realized in the synthesis process.
For example, after obtaining the second video meeting the user requirement, if the synthesis mode selected by the user in advance is free replacement, step S103 synthesizes the second audio and video in the first audio and video to generate a synthesized audio and video, which may specifically include steps S301 to S303:
s301, receiving a third input;
s302, responding to a third input, and determining the duration of a target segment in the first audio and video.
The third input is used to enter a target duration. The target duration may be a start and end time determined on the video playing progress bar, or a time period. The start and end times on the progress bar may take the form of, for example, a start time "00:05:00" and an end time "00:10:00", representing the 5-minute segment from the first time 00:05:00 to the second time 00:10:00 of the first video; the time period may take a form such as "5 min" or "300 s", representing 5 minutes from the first time of the first video.
In the free replacement mode, the target duration determined by the third input can be set arbitrarily, but cannot exceed the total duration of the first video.
S303, replacing the target segment in the first audio and video with the second audio and video to generate the synthesized audio and video.
Once the second audio and video is obtained, the target segment in the first video is replaced with the second video, the target segment being the segment corresponding to the target duration in the first audio and video.
In the free replacement mode of this embodiment, through steps S301 to S303 above, a second video of any duration may be obtained by recording or playing, and the target duration of the target segment in the first video is then determined through the third input, so that the target segment is replaced with the second video; the replaced target segment may be equal or unequal in duration to the second video. For example, a 5-minute second video may replace a 10-minute target segment of the first video starting from the first time. The video clip operation is thus freer and more flexible, meeting diversified user needs.
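The two target-duration input forms and the free-replacement arithmetic can be sketched together as follows. The exact string formats ("HH:MM:SS", "5 min", "300 s") and the 30-minute first video are assumptions taken from the examples above.

```kotlin
// Parse a "HH:MM:SS" clock time into milliseconds.
fun parseClockMs(s: String): Long {
    val (h, m, sec) = s.trim().split(":").map { it.trim().toLong() }
    return ((h * 60 + m) * 60 + sec) * 1_000
}

// Parse a period such as "5 min" or "300 s" into milliseconds.
fun parsePeriodMs(s: String): Long {
    val t = s.trim()
    return when {
        t.endsWith("min") -> t.removeSuffix("min").trim().toLong() * 60_000
        t.endsWith("s")   -> t.removeSuffix("s").trim().toLong() * 1_000
        else -> error("unrecognized period: $s")
    }
}

fun main() {
    // Target segment: 00:05:00 to 00:15:00 of the first video (10 minutes).
    val targetMs = parseClockMs("00:15:00") - parseClockMs("00:05:00")
    val secondMs = parsePeriodMs("5 min")                // recorded second clip
    val firstMs = 30 * 60_000L                           // assumed 30-minute first video
    // Free replacement with unequal durations shortens the composite by 5 minutes here.
    val compositeMs = firstMs - targetMs + secondMs
    println("composite duration: ${compositeMs / 60_000} minutes")   // 25 minutes
}
```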
It should be understood that, the above steps S301 to S302 may also be executed before step S102, and the duration of the target segment is determined first, then the second audio/video is obtained through step S102, and then the second audio/video is replaced by the target segment in the first audio/video through step S103, and also the free replacement of the segments with unequal durations may be implemented.
To make it easy to start the audio and video synthesis operation, for example, in another application scenario of this embodiment, the first video may be a video stored locally on the electronic device, and before step S101, the method may further include:
s1011, receiving a sixth input of the target identifier in the recording interface from the user under the condition of displaying the recording interface.
In this example, as shown in (4a) in fig. 4, the recording interface 401 may be provided with a first identifier 402 and a second identifier 403, the first identifier 402 may be a function identifier for entering an album of the electronic device, and the second identifier 403 may be a function identifier for triggering the skip video clip interface.
The target identifier is the first identifier or the second identifier.
S1012, in a case where the target identifier is the first identifier, displaying candidate audios and videos in response to the sixth input; and
in a case where the target identifier is the second identifier, displaying a target clip interface in response to the sixth input.
This example is described with reference to the video clip example shown in fig. 4.
As shown in fig. 4 (4a), if the target identifier is the first identifier 402, the sixth input is an input to the first identifier 402, and in response to the sixth input, the album 404 of the electronic device shown in fig. 4 (4b) may be entered to display a number of candidate audios and videos 405, which may include audio and video. In this way, the target file 406 can be selected from the candidate audios and videos 405 as the first audio and video, and steps S101 to S103 are executed.
If the target identifier is the second identifier 403, the sixth input is an input to the second identifier 403, and then the user may jump to the video clip interface 203 as shown in (2b) in fig. 2, and may enter the album to select the target file as the first video through the function identifier 208 set by the video clip interface 203 for entering the album of the electronic device.
In this embodiment, the identifiers on the recording interface allow the first audio and video to be quickly selected and played to start audio and video clipping, or the video clip interface to be entered directly to select the first audio and video and record the second audio and video, achieving rapid clipping.
To make it easy to start the audio and video synthesis operation, optionally, before step S101, as shown in fig. 5, the user may enter the album of the electronic device directly through an album function identifier 501 on the system desktop of the electronic device 500, select a target file as the first video, and perform the playing and clipping processing of steps S101 to S103.
To improve the flexibility of the audio and video synthesis operation, optionally, after the first video is selected and step S101 is performed, once the first video has played to the first time (or the progress bar has been manually dragged there, or a playing time has been manually entered to jump there) and the first input has been received, the image recording capability of the electronic device (such as the filter function, the wide-angle/macro function, and the re-shooting function) and its video editing capability may be reused while the second audio and video is recorded in step S102.
For example, while step S102 is performed, referring to fig. 6, in a case where a recording interface 601 for the second video is displayed, the recording picture of the second video can be adjusted through the filter function, the wide-angle/macro function, and the like on the recording interface. A confirm-recording identifier 602 and a cancel-recording identifier 603 may also be provided on the recording interface 601. The confirm-recording identifier 602 is used to confirm the currently recorded video as the second video so that it can be synthesized into the first video; the cancel-recording identifier 603 is used to discard the currently recorded video automatically and re-shoot the second video.
Therefore, the second video meeting the requirements of the user is obtained by multiplexing the image recording capability of the electronic equipment.
As shown in fig. 6, several video segment thumbnails of the first video may also be displayed on the recording interface 601, such as a thumbnail 604 of the segment before the insertion start point (the first time), a thumbnail 605 of the segment after the insertion start/end point, and a preview thumbnail 606 of the second video.
Different thumbnails can thus be tapped to select and display different segments of the first video, making targeted re-editing of a specific segment convenient. For example, during recording of the second video, tapping the thumbnail 604 of the segment before the insertion start point (the first time) plays the corresponding earlier segment of the first video in the first display area, where it can be re-edited through the relevant editing functions (the brightness function, the contrast function, the color function, and the like); tapping the thumbnail 605 plays the following segment of the first video in the first display area for re-editing in the same way.
By reusing the video recording and video editing capabilities in this way, the segments of the first video before and after the insertion start and end points can be viewed and edited intuitively and compared against the thumbnail of the recorded second video, so the user can judge the synthesized video effect from the three videos at a glance and adjust it at any time.
In another application scenario of the embodiment of the present application, as shown in fig. 7 (7a), after entering the video clip interface 701, a video addition identifier 702 may be set on the video clip interface 701, where the video addition identifier 702 is a function identifier for entering an album of the electronic device. Correspondingly, before the step S103 of synthesizing the second audio/video in the first audio/video to generate the synthesized audio/video, the method may further include:
s104, receiving a fourth input of the user under the condition that a second audio/video is obtained through recording; and
and S105, responding to the fourth input, and selecting and acquiring a third audio and video from the target audio and video database.
The fourth input may be a click input to the video addition identifier 702, and in response to the fourth input, as shown in fig. 7 (7b), the album 703 of the electronic device is entered, and the target file 704, such as a video or a picture, is selected again as the third audio and video.
Step S103 may include:
and synthesizing the second audio/video and the third audio/video into the first audio/video to generate the synthesized audio/video.
Therefore, after the second video is obtained in response to the first input, the second audio/video and the third audio/video can be synthesized into the first audio/video to generate a synthesized audio/video, and the rapid synthesis of multiple materials is completed.
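Synthesizing two materials can be viewed as two successive splices at the insertion point, as in the following sketch; the string-based timeline is a stand-in for real media segments and is an assumption of this illustration.

```kotlin
// Multi-material synthesis as two successive splices: the second clip goes in at the
// first time, and the third clip (picked from the album) is spliced in right after it.
fun main() {
    val timeline = mutableListOf("first[0..t1]", "first[t1..end]")
    timeline.add(1, "second")        // recorded second clip at the first time
    timeline.add(2, "third")         // third clip selected via the fourth input
    println(timeline.joinToString(" + "))   // first[0..t1] + second + third + first[t1..end]
}
```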
It is to be understood that in the above examples the processing of the first video applies equally to audio, and the ways of acquiring the second video and synthesizing video apply equally to acquiring and synthesizing audio.
Exemplarily, in step S103, the method may further include:
s1031, generating subtitle information according to the second audio and video; and
and S1032, associating the subtitle information with the second audio/video, and synthesizing the subtitle information and the second audio/video into the first audio/video to generate the synthesized audio/video.
When synthesizing, the first video may already carry subtitle information (for example, a film or television clip), or the second video or audio inserted into it may need subtitles. The speech in the video or audio shot by the user can be converted into subtitle information through the speech recognition function of the electronic device, and when the synthesized audio and video is generated this subtitle information is combined with the second audio and video and synthesized into the first audio and video. This keeps synthesis convenient while enriching the information, meeting diversified user needs.
Optionally, the format of the existing subtitle information in the first video may be detected automatically, and the newly generated subtitle information displayed in that same format, improving the consistency of the synthesized audio and video content.
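One way to realize the subtitle association, sketched under assumptions: cues recognized from the second clip are timed relative to that clip, so splicing the clip into the first video at the first time shifts every cue by that offset. The Cue type and the upstream speech recognition step are hypothetical.

```kotlin
// Subtitle cues recognized from the second clip are relative to that clip; splicing the
// clip into the first video at the first time shifts each cue by that insertion offset.
data class Cue(val startMs: Long, val endMs: Long, val text: String)

fun shiftCues(cues: List<Cue>, insertAtMs: Long): List<Cue> =
    cues.map { it.copy(startMs = it.startMs + insertAtMs, endMs = it.endMs + insertAtMs) }

fun main() {
    val recognized = listOf(Cue(0, 1_800, "Hello!"), Cue(2_000, 3_500, "Watch this."))
    // Second clip spliced in at 00:05:00, so the cues now start five minutes in.
    shiftCues(recognized, insertAtMs = 300_000).forEach(::println)
}
```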
In other examples, in step S102, in response to the first input, a second audio and video is displayed at the first time corresponding to the playing progress of the first audio and video, where the second audio and video may be obtained by playing.
In this example, in response to the first input, a local album of the electronic device may be entered and a video (i.e., a target video) selected, jumping to a playing interface; a playing start point and a playing end point are then adjusted manually on that interface, or entered manually, to determine the second video, which is a segment of the target video.
After the second video is determined, it is automatically synthesized into the first video, by insertion or by replacement.
The example can enrich the audio and video synthesis mode by synthesizing different materials in the photo album, and is convenient and quick to operate.
It should be noted that, for the audio and video processing method provided in the embodiments of the present application, the execution body may be an audio and video processing apparatus, or a control module in that apparatus for executing the method. In the embodiments of the present application, an audio and video processing apparatus executing the method is taken as an example to describe the apparatus provided herein.
Fig. 8 is a schematic structural diagram of an audio and video processing apparatus provided in an embodiment of the present application. As shown in fig. 8, the apparatus includes:
the first receiving module 801 is configured to receive a first input of a user under the condition that a first audio/video is played;
the first display module 802 is configured to display, in response to the first input, a second audio/video at a first time corresponding to the first audio/video playing progress, where the second audio/video is obtained by recording or playing;
the generating module 803 is configured to generate a synthesized audio and video, where the synthesized audio and video is obtained by synthesizing the second audio and video into the first audio and video.
The first audio and video may be from the local electronic device, or may be a video downloaded or cached through the internet, which is not limited in this embodiment.
When the first audio and video is played, the first audio and video can be displayed on the whole screen or part of the screen of the electronic equipment.
For example, the first input may be a click input by the user on the screen, a voice instruction input by the user, or a specific gesture or mid-air gesture input by the user, which may be determined according to actual usage requirements; this embodiment does not limit this.
The click input can be single click input, double click input or click input of any number of times, and can also be long-press input or short-press input. The specific gesture may be any one of a tap gesture, a double tap gesture, a swipe gesture, a drag gesture, a zoom gesture, and a rotate gesture.
The synthesis can be that the second audio and video is inserted into the first audio and video, and can also be that the second audio and video replaces partial segments in the first audio and video.
It is understood that the second audio and video may also be a picture, which is equivalent to a second video having only one frame of image.
It can be understood that, with the apparatus described above, a piece of video may be synthesized into another piece of video, a piece of audio may be synthesized into a piece of video, a piece of video may be synthesized into a piece of audio, and a piece of audio may be synthesized into another piece of audio.
The device of the embodiment of the application can receive the first input of a user under the condition of playing the first audio and video, display the second audio and video obtained through recording or playing at the first moment corresponding to the playing progress of the first audio and video, and then synthesize the second audio and video into the first audio and video to generate the synthesized audio and video. Therefore, the time corresponding to the appropriate playing progress can be directly selected on the basis of the original audio and video, the second audio and video is obtained through the recording or playing function of the electronic equipment to be synthesized with the original audio and video, and then the synthesized audio and video is obtained, so that the operation is convenient and fast, and the efficiency is high.
Illustratively, the first display module 802 may be specifically configured to:
and responding to the first input, and displaying a second audio and video in the second display area under the condition that the first audio and video is played in the first display area to the first moment.
For example, in the playing interface 201 shown in (2a) in fig. 2, the display area of the first video is the whole screen; in response to the first input, the electronic device switches to split-screen display as shown in (2b) in fig. 2, simultaneously displaying the playing picture of the first video and the recording picture of the second video in the first display area 205 and the second display area 206 of the electronic device 200. Displaying the playing and recording pictures in split screen thus lets the user grasp the clip information of the first and second videos more intuitively, for example the insertion point of the first video (the progress position at the first time) and the shooting duration of the second video, making comparison easy and the clip processing more convenient and efficient.
In order to improve the convenience of the audio and video synthesis operation and to convey the picture, progress, and other information intuitively while the second audio and video is being acquired, optionally, the first display module 802 specifically includes:
the first display submodule 8021 is configured to respond to the first input, and start to dynamically display a picture of a target audio/video at a first time corresponding to the first audio/video playing progress, where the target audio/video is a recorded or played audio/video;
a first receiving submodule 8022, configured to receive a second input of the user;
the first stopping submodule 8023 is configured to, in response to the second input, stop dynamically displaying the picture of the target audio/video to obtain the second audio/video.
In order to meet diversified synthesis requirements of users, the embodiment of the application can complete synthesis in modes of insertion, automatic replacement with equal time length, free replacement with unequal time length and the like.
Optionally, in the insertion mode, the generating module 803 may specifically be configured to:
and splicing a second audio and video in the position corresponding to the first moment in the first audio and video to generate a synthetic audio and video.
Optionally, in the automatic replacement mode with an equal duration, the generating module 803 may specifically be configured to:
and replacing a first segment in the first audio/video with a second audio/video to generate a synthetic audio/video, wherein the first segment is a segment from a first moment to a second moment in the first audio/video, and the duration of the first segment is equal to that of the second audio/video.
To simplify operation in the equal-duration automatic replacement mode, optionally, if the user selects the automatic replacement mode in advance, the second audio and video can also be acquired automatically in accordance with a manually entered target segment duration. For example, the first display module 802 may specifically include:
the second display submodule 8024 is configured to, in response to the first input, start to dynamically display a picture of a target audio/video at a first time corresponding to the first audio/video playing progress, where the target audio/video is a recorded or played audio/video;
and a second stopping submodule 8025, configured to stop the dynamic display of the picture of the target audio and video after the preset target duration elapses, to obtain the second audio and video.
Correspondingly, the generating module 803 may specifically be configured to:
and replacing a first segment in the first audio/video with the second audio/video to generate a synthetic audio/video, wherein the duration of the first segment is equal to the preset target duration.
Optionally, in a case that the synthesis mode selected by the user in advance is free replacement, the generating module 803 may specifically include:
a second receiving submodule 8031, configured to receive a third input;
a first determining submodule 8032, configured to determine, in response to the third input, the duration of a target segment in the first audio/video; and
a first generating submodule 8033, configured to replace the target segment in the first audio/video with the second audio/video to generate a synthesized audio/video.
In this embodiment of the application, in the free replacement mode, the second receiving submodule 8031 and the first determining submodule 8032 allow a second video of any duration to be recorded or played first; the target duration of the target segment in the first video is then determined through the third input, and the target segment in the first video is replaced with the second video. The duration of the replaced target segment may be equal or unequal to that of the second video. For example, a second video of 5 minutes may replace a target segment of the first video lasting 10 minutes from the first moment. The video clipping operation is therefore freer and more flexible, meeting users' diversified requirements.
It should be understood that the steps executed by the second receiving submodule 8031 and the first determining submodule 8032 may also be executed before the second audio/video is acquired: the duration of the target segment is determined first, the second audio/video is then obtained through the first display module 802, and the generating module 803 replaces the target segment in the first audio/video with the second audio/video. Free replacement of segments of unequal duration can likewise be implemented this way, as the sketch below illustrates.
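Again under the moviepy assumption used above, free replacement differs from equal-duration replacement only in that the removed target segment's length is chosen independently of the second video's length:

from moviepy.editor import VideoFileClip, concatenate_videoclips

def replace_segment(first_path, second_path, t1, target_duration, out_path):
    # Replace the first video's segment [t1, t1 + target_duration] with
    # the second video; the two durations may differ (free replacement).
    first = VideoFileClip(first_path)
    second = VideoFileClip(second_path)
    combined = concatenate_videoclips([
        first.subclip(0, t1),
        second,
        first.subclip(t1 + target_duration),
    ])
    combined.write_videofile(out_path)

# As in the text: a 5-minute second video replaces a 10-minute target
# segment starting at the first moment (here 60 s into the first video).
replace_segment("first.mp4", "second.mp4", t1=60.0,
                target_duration=600.0, out_path="synthesized.mp4")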
To facilitate starting the audio/video synthesis operation, in another application scenario of the embodiment of the present application, the first video may be a video stored locally on the electronic device. The apparatus may further comprise:
a second receiving module 804, configured to receive, in a case that the recording interface is displayed, a sixth input by the user on a target identifier in the recording interface.
In this example, as shown in (4a) in fig. 4, the recording interface 401 may be provided with a first identifier 402 and a second identifier 403; the first identifier 402 may be a function identifier for entering the album of the electronic device, and the second identifier 403 may be a function identifier for triggering a jump to the video clipping interface.
The target identifier is the first identifier or the second identifier.
a second display module 805, configured to display candidate audio/videos in response to the sixth input if the target identifier is the first identifier; and
to display the video clipping interface in response to the sixth input if the target identifier is the second identifier.
According to the embodiment of the application, through the identifiers on the recording interface, the user can quickly select and play the first audio/video to start audio/video clipping, or directly enter the video clipping interface to select the first audio/video and record the second audio/video, thereby achieving quick clipping.
To improve the flexibility of the audio/video synthesis operation, optionally, during the process of displaying the second audio/video at the first moment corresponding to the first audio/video playing progress in response to the first input, the image recording capability (such as the filter function, the wide-angle/macro function, and the re-shooting function) and the video editing capability of the electronic device can be multiplexed.
For example, referring to fig. 6, in the case of displaying the recording interface 601 of the second video, the recording picture of the second video can be adjusted through the filter function, the wide-angle/macro function, and the like on the recording interface. A record confirmation identifier 602 and a record cancellation identifier 603 may also be provided on the recording interface 601. The record confirmation identifier 602 is used to confirm the currently recorded video as the second video, which can then be synthesized into the first video; the record cancellation identifier 603 is used to discard the currently recorded video and re-capture the second video.
In this way, a second video meeting the user's requirements is obtained by multiplexing the image recording capability of the electronic device.
As shown in fig. 6, several video segment thumbnails of the first video may also be displayed on the recording interface 601, such as a thumbnail 604 of the segment before the insertion start point (such as the first moment), a thumbnail 605 of the segment after the insertion start point/end point, and a preview thumbnail 606 of the second video.
By multiplexing the video recording capability and the video editing capability, the embodiment of the application lets the user intuitively view and edit the video segments before and after the insertion start point and end point of the first video, and compare them against the thumbnail of the recorded second video. From these three previews the user can judge the effect of the synthesized video intuitively and conveniently, and adjust it at any time.
Optionally, to conveniently complete fast synthesis of multiple materials, the apparatus may further include:
a third receiving module 806, configured to receive a fourth input of the user in a case that the second audio/video is obtained through recording; and
an obtaining module 807, configured to select a third audio/video from a target audio/video library in response to the fourth input.
Correspondingly, the generating module 803 may be configured to:
synthesize the second audio/video and the third audio/video into the first audio/video to generate a synthesized audio/video.
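Multi-material synthesis can be sketched the same way (still an illustrative moviepy assumption), with all selected materials spliced in at the first moment:

from moviepy.editor import VideoFileClip, concatenate_videoclips

def insert_many(first_path, extra_paths, t1, out_path):
    # Insert several materials (e.g. the second and third audio/video)
    # into the first video at the first moment t1, in the given order.
    first = VideoFileClip(first_path)
    extras = [VideoFileClip(p) for p in extra_paths]
    combined = concatenate_videoclips(
        [first.subclip(0, t1)] + extras + [first.subclip(t1)])
    combined.write_videofile(out_path)

insert_many("first.mp4", ["second.mp4", "third.mp4"], t1=12.0,
            out_path="synthesized.mp4")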
Optionally, the generating module 803 may further include:
a second generating submodule, configured to generate subtitle information according to the second audio/video; and
a third generating submodule, configured to associate the subtitle information with the second audio/video and synthesize both into the first audio/video to generate the synthesized audio/video.
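One way to keep the subtitle information associated with the second audio/video is to shift its clip-local timestamps by the first moment so they remain aligned after synthesis. The snippet below builds a standard SRT block; the format choice and the function names are assumptions, since the embodiment does not specify how subtitles are stored:

def srt_timestamp(seconds):
    # Format a time in seconds as an SRT timestamp HH:MM:SS,mmm.
    h, rem = divmod(int(seconds), 3600)
    m, s = divmod(rem, 60)
    ms = int(round((seconds - int(seconds)) * 1000))
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def shift_subtitles(lines, t1):
    # `lines` holds (start, end, text) tuples in seconds relative to the
    # second video; t1 is the first moment at which it is inserted.
    entries = []
    for i, (start, end, text) in enumerate(lines, 1):
        entries.append(f"{i}\n{srt_timestamp(start + t1)} --> "
                       f"{srt_timestamp(end + t1)}\n{text}\n")
    return "\n".join(entries)

print(shift_subtitles([(0.0, 2.5, "generated caption")], t1=12.0))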
The audio/video processing apparatus in the embodiment of the present application may be a device, or may be a component, an integrated circuit, or a chip in a terminal. The apparatus may be a mobile electronic device or a non-mobile electronic device. By way of example, the mobile electronic device may be a mobile phone, a tablet computer, a notebook computer, a palmtop computer, a vehicle-mounted electronic device, a wearable device, an ultra-mobile personal computer (UMPC), a netbook, or a Personal Digital Assistant (PDA), and the non-mobile electronic device may be a Personal Computer (PC), a Television (TV), a teller machine, a self-service machine, or the like; the embodiments of the present application are not specifically limited in this respect.
The audio/video processing apparatus in the embodiment of the present application may be a device having an operating system. The operating system may be an Android operating system, an iOS operating system, or another possible operating system, which is not specifically limited in the embodiments of the present application.
The audio/video processing apparatus provided in the embodiment of the present application can implement each process implemented by the method embodiments of fig. 1 to fig. 7; to avoid repetition, details are not described here again.
Optionally, as shown in fig. 9, an embodiment of the present application further provides an electronic device 900, including a processor 901, a memory 902, and a program or instructions stored in the memory 902 and executable on the processor 901. When executed by the processor 901, the program or instructions implement each process of the above-described audio/video processing method embodiment and achieve the same technical effects; to avoid repetition, details are not described here again.
It should be noted that the electronic device in the embodiment of the present application includes the mobile electronic device and the non-mobile electronic device described above.
Fig. 10 is a schematic diagram of a hardware structure of an electronic device implementing an embodiment of the present application.
The electronic device 1000 includes, but is not limited to: a radio frequency unit 1001, a network module 1002, an audio output unit 1003, an input unit 1004, a sensor 1005, a display unit 1006, a user input unit 1007, an interface unit 1008, a memory 1009, and a processor 1010.
Those skilled in the art will appreciate that the electronic device 1000 may further comprise a power source (such as a battery) for supplying power to the various components; the power source may be logically connected to the processor 1010 through a power management system, so that charging, discharging, and power consumption management are implemented through the power management system. The electronic device structure shown in fig. 10 does not constitute a limitation of the electronic device; the electronic device may include more or fewer components than shown, combine some components, or arrange components differently, which is not described again here.
The user input unit 1007 is configured to receive a first input of the user in a case that a first audio/video is played;
the processor 1010 is configured to respond to the first input and display a second audio/video at the first moment corresponding to the first audio/video playing progress, where the second audio/video is obtained through recording or playing;
and to synthesize the second audio/video into the first audio/video to generate a synthesized audio/video.
The electronic device provided by the embodiment of the application can receive a first input of the user while playing a first audio/video, display a second audio/video obtained through recording or playing at the first moment corresponding to the first audio/video playing progress, and then synthesize the second audio/video into the first audio/video to generate a synthesized audio/video. The moment corresponding to the appropriate playing progress can thus be selected directly on the basis of the original audio/video, and the second audio/video obtained through the recording or playing function of the electronic device can be synthesized with it, so the synthesized audio/video is obtained conveniently, quickly, and efficiently.
It should be understood that, in the embodiment of the present application, the input unit 1004 may include a Graphics Processing Unit (GPU) 10041 and a microphone 10042; the graphics processing unit 10041 processes image data of still pictures or videos obtained by an image capturing device (such as a camera) in a video capturing mode or an image capturing mode. The display unit 1006 may include a display panel 10061, which may be configured in the form of a liquid crystal display, an organic light-emitting diode, or the like. The user input unit 1007 includes a touch panel 10071, also referred to as a touch screen, and other input devices 10072. The touch panel 10071 may include two parts: a touch detection device and a touch controller. Other input devices 10072 may include, but are not limited to, a physical keyboard, function keys (such as volume control keys and switch keys), a trackball, a mouse, and a joystick, which are not described in detail here. The memory 1009 may be used to store software programs and various data, including but not limited to application programs and an operating system. The processor 1010 may integrate an application processor, which mainly handles the operating system, user interfaces, and applications, and a modem processor, which mainly handles wireless communication. It will be appreciated that the modem processor may alternatively not be integrated into the processor 1010.
The embodiment of the present application further provides a readable storage medium, where a program or an instruction is stored on the readable storage medium, and when the program or the instruction is executed by a processor, the program or the instruction implements each process of the above-mentioned audio/video processing method embodiment, and can achieve the same technical effect, and in order to avoid repetition, details are not repeated here.
The processor is the processor in the electronic device described in the above embodiment. The readable storage medium includes a computer readable storage medium, such as a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and so on.
The embodiment of the application further provides a chip including a processor and a communication interface coupled to the processor, where the processor is configured to run programs or instructions to implement each process of the above audio/video processing method embodiment and achieve the same technical effects; to avoid repetition, details are not described here again.
It should be understood that the chips mentioned in the embodiments of the present application may also be referred to as system-on-chip, system-on-chip or system-on-chip, etc.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element. Further, it should be noted that the scope of the methods and apparatus of the embodiments of the present application is not limited to performing the functions in the order illustrated or discussed, but may include performing the functions in a substantially simultaneous manner or in a reverse order based on the functions involved, e.g., the methods described may be performed in an order different than that described, and various steps may be added, omitted, or combined. In addition, features described with reference to certain examples may be combined in other examples.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present application may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal (such as a mobile phone, a computer, a server, or a network device) to execute the method according to the embodiments of the present application.
While the present embodiments have been described with reference to the accompanying drawings, it is to be understood that the invention is not limited to the precise embodiments described above, which are meant to be illustrative and not restrictive, and that various changes may be made therein by those skilled in the art without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (11)

1. An audio-video processing method, characterized in that the method comprises:
receiving a first input of a user under the condition of playing a first audio and video;
responding to the first input, displaying a second audio and video at a first moment corresponding to the first audio and video playing progress, wherein the second audio and video is obtained by recording or playing;
and synthesizing the second audio and video in the first audio and video to generate a synthesized audio and video.
2. The method of claim 1, wherein the displaying a second audio and video at a first moment corresponding to the first audio and video playing progress in response to the first input comprises:
responding to the first input, and starting to dynamically display a picture of a target audio and video at a first moment corresponding to the first audio and video playing progress, wherein the target audio and video is a recorded or played audio and video;
receiving a second input of the user;
and responding to the second input, and stopping dynamically displaying the picture of the target audio/video to obtain the second audio/video.
3. The method of claim 2, wherein synthesizing the second audio and video in the first audio and video to generate a synthesized audio and video comprises:
splicing the second audio and video into the first audio and video at the position corresponding to the first moment to generate a synthesized audio and video.
4. The method of claim 2, wherein synthesizing the second audio and video in the first audio and video to generate a synthesized audio and video comprises:
replacing a first segment in the first audio and video with the second audio and video to generate a synthesized audio and video, wherein the first segment is the segment from the first moment to a second moment in the first audio and video, and the duration of the first segment is equal to that of the second audio and video.
5. The method of claim 1, wherein the displaying a second audio and video at a first moment corresponding to the first audio and video playing progress in response to the first input comprises:
responding to the first input, and starting to dynamically display a picture of a target audio and video at a first moment corresponding to the first audio and video playing progress, wherein the target audio and video is a recorded or played audio and video;
after a preset target duration elapses, stopping dynamically displaying the picture of the target audio and video to obtain the second audio and video;
the synthesizing the second audio and video in the first audio and video to generate a synthesized audio and video comprises the following steps:
replacing a first segment in the first audio and video with the second audio and video to generate a synthesized audio and video, wherein the duration of the first segment is equal to the preset target duration.
6. The method of claim 1, wherein synthesizing the second audio and video in the first audio and video to generate a synthesized audio and video comprises:
receiving a third input;
responding to the third input, and determining the duration of a target segment in the first audio and video;
and replacing the target segment in the first audio and video with the second audio and video to generate a synthesized audio and video.
7. The method according to any one of claims 1-4, wherein the displaying a second audio and video at a first moment corresponding to the first audio and video playing progress in response to the first input comprises:
in response to the first input, displaying the second audio and video in a second display area in a case that the first audio and video is played to the first moment in a first display area.
8. The method of claim 1, wherein synthesizing the second audio and video in the first audio and video to generate a synthesized audio and video comprises:
generating subtitle information according to the second audio and video;
and associating the subtitle information with the second audio and video, and synthesizing both into the first audio and video to generate the synthesized audio and video.
9. The method of claim 1, wherein before the synthesizing the second audio and video in the first audio and video to generate a synthesized audio and video, the method further comprises:
receiving a fourth input of a user under the condition that the second audio and video is obtained through recording;
responding to the fourth input, and selecting a third audio and video from a target audio and video library;
the synthesizing the second audio and video in the first audio and video to generate a synthesized audio and video comprises the following steps:
and synthesizing the second audio and video and the third audio and video into the first audio and video to generate the synthesized audio and video.
10. An audio-video processing apparatus, characterized in that the apparatus comprises:
the first receiving module is used for receiving a first input of a user under the condition of playing a first audio and video;
the first display module is used for responding to the first input and displaying a second audio and video at a first moment corresponding to the first audio and video playing progress, wherein the second audio and video is obtained by recording or playing;
and a generating module, configured to synthesize the second audio and video into the first audio and video to generate a synthesized audio and video.
11. An electronic device, characterized in that it comprises a processor, a memory and a program or instructions stored on said memory and executable on said processor, said program or instructions, when executed by said processor, implementing the steps of the audio-video processing method according to any one of claims 1 to 9.