WO2023030270A1 - Audio and video processing method, apparatus, and electronic device - Google Patents

Audio and video processing method, apparatus, and electronic device

Info

Publication number
WO2023030270A1
Authority
WO
WIPO (PCT)
Prior art keywords
video
audio
input
target
generate
Prior art date
Application number
PCT/CN2022/115582
Other languages
English (en)
French (fr)
Inventor
高桦
Original Assignee
维沃移动通信(杭州)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 维沃移动通信(杭州)有限公司
Publication of WO2023030270A1

Classifications

    • H: Electricity; H04: Electric communication technique; H04N: Pictorial communication, e.g. television; H04N 21/00: Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/233: Processing of audio elementary streams (server side, under H04N 21/23, Processing of content or additional data)
    • H04N 21/231: Content storage operation, e.g. caching movies for short term storage, replicating data over plural servers, prioritizing data for deletion
    • H04N 21/23424: Processing of video elementary streams involving splicing one content stream with another content stream, e.g. for inserting or substituting an advertisement
    • H04N 21/433: Content storage operation, e.g. storage operation in response to a pause request, caching operations (client side, under H04N 21/43)
    • H04N 21/439: Processing of audio elementary streams (client side)
    • H04N 21/44016: Processing of video elementary streams involving splicing one content stream with another content stream, e.g. for substituting a video clip

Definitions

  • The present application belongs to the technical field of electronic devices, and in particular relates to an audio and video processing method, apparatus, and electronic device.
  • The purpose of the embodiments of the present application is to provide an audio and video processing method, apparatus, and electronic device that can solve the lack of convenience in processing audio and video materials.
  • An embodiment of the present application provides an audio and video processing method, the method comprising: receiving a user's first input while a first audio and video is played; in response to the first input, displaying a second audio and video at a first moment corresponding to the playback progress of the first audio and video, the second audio and video being obtained by recording or playing; and generating a synthesized audio and video obtained by synthesizing the second audio and video into the first audio and video.
  • An embodiment of the present application further provides an audio and video processing device, which includes:
  • a first receiving module, configured to receive a user's first input while a first audio and video is played;
  • a first display module, configured to display, in response to the first input, a second audio and video at a first moment corresponding to the playback progress of the first audio and video, the second audio and video being obtained by recording or playing; and
  • a generating module, configured to generate a synthesized audio and video obtained by synthesizing the second audio and video into the first audio and video.
  • An embodiment of the present application provides an electronic device, the electronic device including a processor, a memory, and a program or instruction stored in the memory and runnable on the processor, where the program or instruction, when executed by the processor, implements the steps of the method described in the first aspect.
  • An embodiment of the present application provides a readable storage medium on which a program or instruction is stored, and when the program or instruction is executed by a processor, the steps of the method described in the first aspect are implemented.
  • An embodiment of the present application provides a chip, the chip including a processor and a communication interface, the communication interface being coupled to the processor, and the processor being configured to run a program or instruction so as to implement the method described in the first aspect.
  • In the embodiments of the present application, a user's first input can be received, the second audio and video obtained by recording or playing can be displayed at the first moment corresponding to the playback progress of the first audio and video, and the second audio and video can then be synthesized into the first audio and video to generate a synthesized audio and video.
  • In this way, an appropriate playback progress can be selected directly on the basis of the original audio and video, the second audio and video can be obtained through the recording or playback function of the electronic device and synthesized with the original audio and video, and the synthesized audio and video is thus obtained.
  • The operation is convenient and efficient.
  • Fig. 1 is a schematic flow chart of an audio and video processing method provided by an embodiment of the present application
  • Fig. 2 is a schematic diagram showing interface jumps in a specific example of the present application.
  • Fig. 3 is a display schematic diagram of the playback interface in a specific example of the present application.
  • Fig. 4 is a schematic diagram showing interface jumps in another specific example of the present application.
  • Fig. 5 is a schematic diagram showing the desktop of the electronic device system in another specific example of the present application.
  • Fig. 6 is a schematic display diagram of the shooting interface in another specific example of the present application.
  • Fig. 7 is a schematic diagram showing interface jumps in yet another specific example of the present application.
  • Fig. 8 is a schematic structural diagram of an audio and video processing device provided by an embodiment of the present application.
  • FIG. 9 is a schematic structural diagram of an electronic device provided in an embodiment of the present application.
  • FIG. 10 is a schematic diagram of a hardware structure of an electronic device implementing an embodiment of the present application.
  • The audio and video processing method can be executed on an electronic device, and the electronic device can be a mobile phone, a tablet computer, a notebook computer, a palmtop computer, a vehicle-mounted electronic device, a wearable device, an ultra-mobile personal computer (UMPC), a netbook, a personal digital assistant (PDA), or the like.
  • FIG. 1 shows a schematic flowchart of an audio and video processing method provided by an embodiment of the present application. As shown in Figure 1, the method includes steps S101 to S103:
  • The term "audio and video" in this document may refer to video or audio; the same applies below.
  • The first audio and video may be stored locally on the electronic device, or may be an audio and video downloaded or cached through the Internet, which is not limited in this embodiment.
  • When the first audio and video is played, it can be displayed on all or part of the screen of the electronic device.
  • The first input may be a user's click input on the screen, a voice command input by the user, or a specific gesture or air gesture input by the user, which may be determined according to actual usage requirements and is not limited here.
  • the click input can be single-click input, double-click input, or any number of click inputs, and can also be long-press input or short-press input.
  • the specific gesture may be any one of a tap gesture, a double-tap gesture, a slide gesture, a drag gesture, a zoom gesture, and a rotation gesture.
  • the picture of the first audio and video and playback progress information can be displayed on the screen of the electronic device, and the progress information can be displayed as a progress bar and/or progress time.
  • In response to the first input, the second audio and video is displayed.
  • The first moment may be a progress moment reached while the user plays the first audio and video, or may be determined by manually dragging the progress bar of the first video or manually entering a playback moment.
  • the second audio and video may be a video obtained through a recording function of the electronic device, or may be a video obtained through a playback function of the electronic device.
  • While the second audio and video is displayed, the first audio and video can be paused, or can continue playing at the same time.
  • The second audio and video may be displayed on the entire screen of the electronic device, or the first audio and video and the second audio and video may be displayed in different screen areas.
  • When the second audio and video is a recorded audio and video, the recording screen, recording progress, and recording-related function identifiers of the second audio and video may be displayed.
  • When the second audio and video is obtained by playing, the playing screen, progress information, playback-related function identifiers, and the like of the second audio and video may be displayed.
  • the generated composite audio and video can be displayed on the screen of the electronic device, or can be directly saved to the local electronic device in the background.
  • The synthesis may mean that the second audio and video is inserted into the first audio and video, or that the second audio and video replaces a part of the first audio and video.
  • The second audio and video may also be a picture, which is equivalent to a second audio and video containing only one frame of image.
  • In this way, another video can be synthesized into a video, an audio can be synthesized into a video, a video can be synthesized into an audio, or another audio can be synthesized into an audio.
  • Thus, in the embodiments of the present application, the user's first input can be received, the second audio and video obtained by recording or playing can be displayed at the first moment corresponding to the playback progress of the first audio and video, and the second audio and video can then be synthesized into the first audio and video to generate a synthesized audio and video.
  • The moment corresponding to an appropriate playback progress can be selected directly on the basis of the original audio and video, the second audio and video can be obtained through the shooting or playback function of the electronic device and synthesized with the original audio and video, and the synthesized audio and video is thus obtained.
  • The operation is convenient and efficient.
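The flow just summarized, receiving an input at a playback moment, obtaining a second clip, and synthesizing, can be illustrated with a minimal Python sketch that models a clip as a list of frames and playback as an advancing index. All class and method names here are illustrative assumptions, not terms from the patent.

```python
# Toy walkthrough of steps S101-S103: "playing" advances an index,
# "recording" appends frames, and "generate" splices the clips.

class Editor:
    def __init__(self, first_av):
        self.first_av = first_av
        self.position = 0          # playback progress into first_av
        self.second_av = []

    def play(self, frames):        # first video is playing (context of S101)
        self.position += frames

    def on_first_input(self):      # S101: the first input fixes the first moment
        self.first_moment = self.position

    def record(self, frame):       # S102: capture frames of the second video
        self.second_av.append(frame)

    def generate(self):            # S103: splice second into first at the moment
        m = self.first_moment
        return self.first_av[:m] + self.second_av + self.first_av[m:]

ed = Editor(["a", "b", "c", "d"])
ed.play(2)
ed.on_first_input()
ed.record("x")
ed.record("y")
print(ed.generate())   # ['a', 'b', 'x', 'y', 'c', 'd']
```

The real method operates on encoded streams rather than in-memory frame lists, so this only captures the ordering of the three steps.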
  • When the user wants to synthesize other video clips into the first video, the user can watch the first video to a certain progress moment (or manually drag the progress bar to that moment), trigger the shooting function of the electronic device to shoot a section of audio and video (the second audio and video), and generate a synthesized audio and video when shooting ends.
  • This example will be described below with reference to FIG. 2.
  • Step S101, receiving the user's first input while playing the first audio and video, can specifically be as follows:
  • A button 202 or a floating-window function key is displayed.
  • The user can be prompted by a pop-up window to download the first video being played; if the download is allowed, the video cache is opened and the subsequent video processing steps S102 to S103 are entered.
  • Otherwise, a pop-up window can be used to prompt the user that jumping to the editing interface is not possible.
  • The electronic device 300 can be triggered to jump to the video editing interface (not identified in the figure) through the displayed shortcut keys, or can jump to the video editing interface when any volume key 302 of the electronic device 300 is long-pressed while the long-video playback interface 301 is displayed.
  • the long video may be a streaming media file of the Internet, or a local video file of the electronic device.
  • Before displaying the second audio and video at the first moment corresponding to the first audio and video playback progress in response to the first input in step S102, the user can first select the video synthesis mode through the control 204 or a menu on the video editing interface.
  • the synthesis method may include one or more of insertion, automatic replacement and free replacement.
  • The insertion mode refers to inserting the second video at a certain position in the first video and splicing the second video and the first video together.
  • The automatic replacement mode means that when recording of the second video ends, the second video is automatically synthesized into the first video, replacing a segment of equal duration in the first video.
  • The free replacement mode means that, before or after recording the second video, the user inputs an arbitrary target duration (not greater than the total playback duration of the first video), and a segment of that target duration in the first video is replaced when the second video is synthesized into the first video.
  • the duration of the second video may or may not be equal to the target duration.
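Under the assumption that a clip can be modeled as a list of frames, the three synthesis modes described above can be sketched as follows; the function name, mode names, and parameters are our own, not the patent's.

```python
# Illustrative sketch of the three synthesis modes: insertion,
# automatic replacement (equal duration), and free replacement
# (user-chosen target duration). Clips are lists of frames.

def combine(first, second, at, mode, target_len=None):
    if mode == "insert":
        removed = 0                   # nothing replaced, total duration grows
    elif mode == "auto_replace":
        removed = len(second)         # replaced span equals second's duration
    elif mode == "free_replace":
        if target_len is None or target_len > len(first):
            raise ValueError("target duration must be set and within the first clip")
        removed = target_len          # replaced span chosen freely by the user
    else:
        raise ValueError(f"unknown mode: {mode}")
    return first[:at] + second + first[at + removed:]

first = list(range(10))               # a 10-frame "first video"
second = [100, 101, 102]              # a 3-frame "second video"

print(len(combine(first, second, 4, "insert")))            # 13: total grows
print(len(combine(first, second, 4, "auto_replace")))      # 10: total unchanged
print(len(combine(first, second, 4, "free_replace", 5)))   # 8: 3 frames replace 5
```

The auto-replace case shows the invariant stated below for the target segment: because the replaced span has the same duration as the second clip, the total duration of the synthesized result is unchanged.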
  • step S102 displays the second audio and video at the first moment corresponding to the playback progress of the first audio and video in response to the first input, which may specifically include:
  • The first video and the second video may be displayed in a first display area and a second display area on the electronic device, respectively.
  • For example, if the display area of the first video is originally the entire screen, the electronic device performs split-screen display, as shown in (2b) in Figure 2, with the playing screen of the first video and the recording screen of the second video displayed in the first display area 205 and the second display area 206 of the electronic device 200, respectively.
  • In this way, the user can more intuitively see the editing information of the first video and the second video, such as the insertion point of the first video (for example, the progress position at the first moment above) and the shooting duration of the second video, which makes it convenient to compare the editing information of the two videos and to edit more conveniently and efficiently.
  • Optionally, step S102, displaying the second audio and video at the first moment corresponding to the playback progress of the first audio and video in response to the first input, may specifically include:
  • the target audio and video is the recorded or played audio and video.
  • the recording screen of the target audio and video starts to be displayed at the first moment corresponding to the playing progress of the first audio and video.
  • the recording picture of the target video is dynamically displayed in the second display area 206, which is convenient for the user to adjust the recording angle of the electronic device according to the recording picture. After recording all the video pictures needed, the recording ends.
  • the recording screen may include a recording status (recording, pause recording or stop recording) button, recording progress (can be represented by a progress bar or recording time, etc.) and the like.
  • the recording screen can dynamically display the change of the recording progress.
  • the second audio and video is obtained through playing, then in response to the first input, at the first moment corresponding to the progress of the first audio and video playback, start to display the playback screen of the target audio and video.
  • the target audio and video may come from the audio and video database of the electronic device, and when all the video images required by the user are played, the playback ends.
  • the playback screen may include a playback status (play, pause, or stop playback) button, playback progress (can be represented by a progress bar or playback time, etc.) and the like.
  • playback progress changes with time, and the playback screen can dynamically display the change of the playback progress.
  • The second input may be a user's click input on the screen, a voice command input by the user, or a specific gesture or air gesture input by the user, which may be determined according to actual usage requirements and is not limited here.
  • the click input can be single-click input, double-click input, or any number of click inputs, and can also be long-press input or short-press input.
  • the specific gesture may be any one of a tap gesture, a double-tap gesture, a slide gesture, a drag gesture, a zoom gesture, and a rotation gesture.
  • The embodiment of the present application can complete the synthesis by insertion, by automatic replacement with a segment of equal duration, or by free replacement with a segment of unequal duration.
  • Optionally, step S103, synthesizing the second audio and video into the first audio and video to generate a synthesized audio and video, may specifically include:
  • The user can click the function button 207 on the video editing interface 203 to end the recording of the second video, and the recorded second video is directly inserted into the first video at the position corresponding to the first moment and spliced there to generate a synthesized video.
  • In response to the first input, the first video may be displayed in a paused state while the second video is being displayed.
  • The second video is directly inserted at the paused progress position of the first video to generate a synthesized audio and video. For example, when a user watching a drama wants to co-shoot a video with a favorite protagonist or star, the user can enter the first input at the node where the protagonist or star is on screen and shoot a second video containing the user for synthesis.
  • the electronic device automatically completes the splicing of the recorded second video according to the insertion point selected by the user (that is, the position corresponding to the first moment in the first video), and the user does not need to operate too much, which is simple and fast.
  • When users edit their own recorded videos, they can flexibly re-shoot and continue shooting video footage according to their own inspiration, conveniently enriching and improving the video content they shoot and raising the quality of video shooting.
  • The editing can be completed without the assistance of third-party video editing software, saving memory usage on the mobile phone.
  • The recorded second video may also be an image, which is equivalent to a second video as short as a single frame.
  • a first mark 208 may also be provided on the video editing interface, and the first mark 208 is used to jump to the material storage area (audio and video database) of the electronic device, such as the photo album 209 .
  • the user can select the video 210 or picture file in the album 209 as the second video, insert it into the first video at the position corresponding to the first moment, and complete the splicing and synthesis of the video.
  • Optionally, step S103, synthesizing the second audio and video into the first audio and video to generate a synthesized audio and video, may specifically include:
  • the target segment in the first audio and video is replaced by the second audio and video to generate a synthesized audio and video.
  • The target segment can be the segment from the first moment (insertion start point) to the second moment (insertion end point) in the first audio and video, or can be determined manually (for example, by entering the start and end moments of the segment or by dragging the progress bar); the duration of the target segment is equal to the duration of the second video, which ensures that the total duration of the synthesized audio and video remains unchanged.
  • The second audio and video can also be automatically associated according to a manually input target segment duration.
  • step S102 displays the second audio and video at the first moment corresponding to the playing progress of the first audio and video in response to the first input, which may specifically include:
  • the preset target duration is the duration of the target segment determined by the user in the automatic replacement mode.
  • the target duration can be determined by manually inputting the start and end time of the target segment or dragging the progress bar to determine the start and end time.
  • Optionally, step S103, synthesizing the second audio and video into the first audio and video to generate a synthesized audio and video, may specifically include steps S301 to S303:
  • The third input is used to input the target duration. The target duration may be start and end moments determined on the video playback progress bar, or a time period. The start and end moments may, for example, take the form of a start moment "00:05:00" and an end moment "00:10:00", denoting a 5-minute segment from the first moment 00:05:00 of the first video to the second moment 00:10:00; the time period may, for example, take the form of "5min" or "300s", denoting 5 minutes starting from the first moment of the first video.
  • the target duration determined by the third input can be set arbitrarily, but cannot exceed the total duration of the first video.
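The two input formats described above, a start/end pair on the progress bar and a bare time period, can be sketched as a small parser. The function names and the exact accepted syntax are assumptions for illustration only.

```python
import re

# Hedged sketch of parsing the "third input" target duration:
# either a ("00:05:00", "00:10:00") start/end pair, or a period
# such as "5min" or "300s". Rejects durations over the first
# video's total duration, per the constraint stated above.

def hms_to_seconds(hms):
    h, m, s = (int(x) for x in hms.split(":"))
    return h * 3600 + m * 60 + s

def parse_target_duration(spec, total=None):
    """Return the target duration in seconds."""
    if isinstance(spec, tuple):                  # (start, end) on the progress bar
        start, end = (hms_to_seconds(t) for t in spec)
        duration = end - start
    else:                                        # a period like "5min" or "300s"
        value, unit = re.fullmatch(r"(\d+)(min|s)", spec).groups()
        duration = int(value) * (60 if unit == "min" else 1)
    if duration <= 0 or (total is not None and duration > total):
        raise ValueError("target duration must be positive and within the first video")
    return duration

print(parse_target_duration(("00:05:00", "00:10:00")))  # 300
print(parse_target_duration("5min"))                    # 300
print(parse_target_duration("300s", total=600))         # 300
```

All three example inputs denote the same 5-minute target segment.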
  • The target segment in the first video is replaced with the second video, where the target segment is the segment corresponding to the target duration in the first audio and video.
  • In the free replacement mode, through the above steps S301 to S303, a second video of any duration can first be recorded or played, and the target duration of the target segment in the first video can then be determined through the third input, so that the second video replaces the target segment in the first video; the replaced target segment may or may not be equal in duration to the second video.
  • For example, a 5-minute second video can replace a 10-minute target segment starting from the first moment in the first video.
  • Optionally, steps S301 and S302 can also be executed before step S102: first determine the duration of the target segment, then obtain the second audio and video through step S102, and then replace the target segment in the first audio and video with the second audio and video through step S103; free replacement of segments of different durations can likewise be realized in this way.
  • the first video may be a video locally stored on the electronic device, and before step S101, the method may further include:
  • The recording interface 401 can be provided with a first identifier 402 and a second identifier 403; the first identifier 402 can be a function identifier for entering the photo album of the electronic device, and the second identifier 403 can be a function identifier for triggering a jump to the video editing interface.
  • The target identifier is the first identifier or the second identifier.
  • A target editing interface is displayed.
  • When the target identifier is the first identifier 402 and the sixth input is an input on the first identifier 402, the photo album 404 of the electronic device shown in (4b) in Figure 4 can be entered; the photo album displays several candidate audio and video items 405, which may include both audio and video.
  • The target file 406 can be selected from the candidate audio and video 405 as the first audio and video, and steps S101 to S103 are then executed.
  • When the target identifier is the second identifier 403 and the sixth input is an input on the second identifier 403, the video editing interface is entered, where the function identifier 208 is used to enter the photo album of the electronic device and select the target file as the first video.
  • Through the identifiers on the recording interface, it is thus possible to quickly select and play the first audio and video to start audio and video editing, or to directly enter the video editing interface, select the first audio and video, and record the second audio and video for quick editing.
  • For convenience in starting the audio and video synthesis operation, optionally, before step S101, as shown in Figure 5, the photo album of the electronic device can be entered directly through the photo album function identifier 501 on the system desktop of the electronic device 500, and the target file can be selected as the first video, after which the playback and clipping processes of steps S101 to S102 are performed.
  • In the process of executing step S101 after the first video is selected, when the first video is played to the first moment, or the progress bar is manually dragged to the first moment, the image recording capabilities of the electronic device, such as the filter function, the wide-angle/macro function, and the reshoot function, can be reused.
  • Optionally, in step S102, referring to FIG. 6, when the recording interface 601 of the second video is displayed, the recording picture can be adjusted through the filter function, wide-angle/macro function, etc. on the recording interface. The recording interface 601 may also be provided with a confirm-recording identifier 602 and a cancel-recording identifier 603. The confirm-recording identifier 602 is used to confirm the currently recorded video as the second video that can be synthesized into the first video; the cancel-recording identifier 603 is used to discard the currently recorded video and reshoot the second video.
  • Optionally, thumbnails of multiple clips of the first video can also be displayed, such as the thumbnail 604 of the video clip before the insertion start point (for example, the above-mentioned first moment), the thumbnail 605 of the video clip after the insertion start/end point, and the preview thumbnail 606 of the second video.
  • Different segments of the first video can be selected for display by manually clicking different thumbnails, making it easier to re-edit specific segments in a more targeted way.
  • For example, clicking the thumbnail 604 of the video clip before the insertion start point (the first moment) plays the corresponding preceding clip of the first video in the first display area, where it can be re-edited with the relevant editing functions (brightness, contrast, color, etc.); or clicking the video clip thumbnail 605 plays the corresponding clip of the first video in the first display area for re-editing with the relevant editing functions.
  • the method may also include:
  • The fourth input can be a click input on the video-adding identifier 702; in response to the fourth input, as shown in (7b) in Figure 7, a third audio and video can be obtained.
  • step S103 may include:
  • the second audio and video and the third audio and video can be synthesized into the first audio and video to generate a synthesized audio and video to complete the quick synthesis of multiple materials.
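Combining several materials, such as a second and a third clip, into the first audio and video can be sketched as repeated insertion. Applying the insertions from the latest moment backwards keeps the earlier insertion indices valid; the function name and clip model are illustrative assumptions.

```python
# Sketch of multi-material synthesis: insert several clips into the
# first audio/video, each at its own moment (a frame index here).

def insert_many(first, clips):
    """clips: list of (moment, frames) pairs; moments index into `first`."""
    out = list(first)
    # Process from the latest insertion point backwards so earlier
    # indices are not shifted by later insertions.
    for moment, frames in sorted(clips, key=lambda c: c[0], reverse=True):
        out[moment:moment] = frames
    return out

first = ["f0", "f1", "f2", "f3"]
result = insert_many(first, [(1, ["s0", "s1"]), (3, ["t0"])])
print(result)  # ['f0', 's0', 's1', 'f1', 'f2', 't0', 'f3']
```

Both the second clip (at moment 1) and the third clip (at moment 3) land at the positions chosen for them in the original timeline.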
  • The manner of processing the first video is also applicable to the processing of audio.
  • The manner of acquiring the second video and synthesizing video is also applicable to acquiring audio and synthesizing audio.
  • step S103 may also include:
  • The first video may come from a video with subtitle information, such as a film or television drama segment. When the second video or second audio inserted into the first video needs subtitle information, the speech recognition and conversion function provided by the electronic device can convert the voice information in the video or audio shot by the user into subtitle information, and when the synthesized audio and video is generated, the subtitle information is associated with the second audio and video and synthesized into the first audio and video.
  • The synthesis is convenient and at the same time enriches the diversity of information to meet users' diverse needs.
  • Optionally, the format of the subtitle information in the first video can be automatically detected, and the subtitle information of the second audio and video can be displayed according to that format, improving the consistency of the synthesized audio and video content.
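Associating recognized subtitles with the inserted clip amounts to shifting each cue by the insertion offset so it lines up with the composite timeline. The cue layout (start, end, text) and function name below are assumptions, not from the patent.

```python
# Sketch: shift speech-recognition subtitle cues of the second clip
# by the offset (in seconds) at which the clip is inserted into the
# first video, so the cues appear at the right time in the result.

def shift_cues(cues, offset):
    """cues: list of (start_s, end_s, text) tuples from recognition."""
    return [(start + offset, end + offset, text) for start, end, text in cues]

recognized = [(0.0, 1.5, "hello"), (2.0, 3.0, "world")]
placed = shift_cues(recognized, 300.0)   # second clip inserted at 00:05:00
print(placed)  # [(300.0, 301.5, 'hello'), (302.0, 303.0, 'world')]
```

Rendering the shifted cues in the first video's detected subtitle format would then give the consistent appearance described above.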
  • In other examples, in step S102 the second audio/video displayed at the first moment corresponding to the playback progress of the first audio/video may be obtained by playing. In this example, in response to the first input, the local album of the electronic device can be entered, a video (the target video) selected and its playback interface opened, and the second video determined by manually adjusting or entering the playback start and end points; the second video is then a segment of the target video.
  • After the second video is determined, it is automatically synthesized into the first video, by insertion or by replacement.
  • This example enriches the ways audio/video can be synthesized by combining different materials from the album, and the operation is convenient and quick.
  • It should be noted that the audio and video processing method provided in the embodiments of this application may be executed by an audio and video processing apparatus, or by a control module of that apparatus for executing the method. In the embodiments of this application, the apparatus is described by taking an audio and video processing apparatus executing the method as an example.
  • FIG. 8 is a schematic structural diagram of an audio and video processing apparatus provided by an embodiment of this application. As shown, the apparatus includes:
  • the first receiving module 801 is configured to receive the user's first input when playing the first audio and video;
  • the first display module 802 is configured to display the second audio and video at the first moment corresponding to the playing progress of the first audio and video in response to the first input, and the second audio and video is obtained by recording or playing;
  • the generating module 803 is configured to generate a synthesized audio and video, and the synthesized audio and video is obtained by synthesizing the second audio and video in the first audio and video.
  • The first audio/video may come from the electronic device's local storage, or may be a video downloaded or cached over the Internet; this embodiment imposes no limitation.
  • When the first audio/video is played, it may be displayed on all or part of the screen of the electronic device.
  • For example, the first input may be a tap on the screen, a voice command, or a specific gesture or air gesture input by the user, determined according to actual usage requirements; this embodiment imposes no limitation.
  • the click input can be single-click input, double-click input, or any number of click inputs, and can also be long-press input or short-press input.
  • the specific gesture may be any one of a tap gesture, a double-tap gesture, a slide gesture, a drag gesture, a zoom gesture, and a rotation gesture.
  • the synthesis may be that the second audio-video is inserted into the first audio-video, or that the second audio-video replaces part of the first audio-video.
  • the second audio and video may also be a picture, which is equivalent to only one frame of image in the second audio and video.
  • It can be understood that, with the above apparatus, one video can be synthesized into another video, audio can be synthesized into a video or a video into audio, and one piece of audio can be synthesized into another.
  • The apparatus of the embodiments of this application can receive a first input from the user while the first audio/video is playing, display a second audio/video obtained by recording or playing at the first moment corresponding to the playback progress of the first audio/video, and then synthesize the second audio/video into the first audio/video to generate a synthesized audio/video.
  • A moment corresponding to a suitable playback progress can thus be selected directly on the basis of the original audio/video, and the second audio/video obtained through the device's recording or playback function and synthesized with the original, yielding the synthesized audio/video conveniently and efficiently.
  • For example, the first display module 802 may specifically be used for: in response to the first input, displaying the second audio/video in a second display area when the first audio/video has been played to the first moment in a first display area.
  • For example, in the playback interface 201 shown in FIG. 2 (2a), the display area of the first video is the entire screen; in response to the first input, the electronic device switches to a split-screen display, as shown in FIG. 2 (2b), with the playback picture of the first video and the recording picture of the second video displayed in the first display area 205 and the second display area 206 of the electronic device 200, respectively.
  • Displaying the playback and recording pictures in split screen lets the user grasp the editing information of the first and second videos more intuitively, such as the insertion point of the first video (for example, the progress position of the first moment above) and the shooting duration of the second video, making it convenient to compare the two videos and edit more efficiently.
  • the first display module 802 specifically includes:
  • the first display submodule 8021 is configured to, in response to the first input, start dynamically displaying the picture of a target audio/video at the first moment corresponding to the playback progress of the first audio/video, the target audio/video being the audio/video recorded or played;
  • the first receiving submodule 8022 is configured to receive a second input from the user; and
  • the first stop submodule 8023 is configured to stop dynamically displaying the picture of the target audio/video in response to the second input, to obtain the second audio/video.
  • the embodiment of the present application can complete the synthesis by means of insertion, automatic replacement of equal duration, and free replacement of unequal duration.
  • the generation module 803 can specifically be used for:
  • the second audio and video is spliced at the position corresponding to the first moment in the first audio and video to generate a synthetic audio and video.
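The insertion mode above amounts to splitting the first clip at the first moment and concatenating three pieces. A minimal sketch, assuming clips are described only by their durations in seconds; the names and data layout here are illustrative, not the patent's implementation:

```python
def insert_clip(first_duration, second_duration, first_moment):
    """Insertion-mode synthesis: splice the second clip into the first
    at `first_moment`, keeping all of the first clip's content.

    Returns the total duration and a timeline of (source, start, end)
    spans in playback order; all values are seconds.
    """
    if not 0 <= first_moment <= first_duration:
        raise ValueError("first moment lies outside the first clip")
    timeline = [
        ("first", 0.0, first_moment),             # part before the insertion point
        ("second", 0.0, second_duration),         # the inserted clip, in full
        ("first", first_moment, first_duration),  # remainder of the first clip
    ]
    return first_duration + second_duration, timeline
```

Note the total duration always grows by the inserted clip's length, which is what distinguishes insertion from the two replacement modes.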
  • the generation module 803 can specifically be used for:
  • replacing a first segment of the first audio/video with the second audio/video to generate the synthesized audio/video, where the first segment is the segment from the first moment to the second moment of the first audio/video and its duration equals that of the second audio/video.
  • To simplify the equal-duration automatic replacement mode, if the user pre-selects automatic replacement, the second audio/video can also be obtained automatically according to a manually entered target segment duration.
  • the first display module 802 may specifically include:
  • the second display submodule 8024 is configured to, in response to the first input, start dynamically displaying the picture of the target audio/video at the first moment corresponding to the playback progress of the first audio/video, the target audio/video being the audio/video recorded or played; and
  • the second stop submodule 8025 is configured to stop dynamically displaying the picture of the target audio/video after a preset target duration, to obtain the second audio/video.
  • the generating module 803 can specifically be used for:
  • replacing a first segment of the first audio/video with the second audio/video to generate the synthesized audio/video, the duration of the first segment being equal to the preset target duration.
  • the generating module 803 may specifically include:
  • the second receiving submodule 8031 is configured to receive a third input
  • the first determination submodule 8032 is used to determine the duration of the target segment in the first audio and video in response to the third input;
  • the first generation sub-module 8033 is used to replace the target segment in the first audio and video with the second audio and video to generate a synthesized audio and video.
  • In the embodiments of this application, in free replacement mode, through the second receiving submodule 8031 and the first determining submodule 8032, a second video of any duration can first be recorded or played; the target duration of the target segment in the first video is then determined through the third input, and the second video replaces that target segment, so the replaced segment's duration may or may not equal the second video's.
  • For example, a 5-minute second video can replace a 10-minute target segment starting from the first moment of the first video. Video editing thus becomes freer and more flexible, meeting users' diverse needs.
  • It should be understood that the steps performed by the second receiving submodule 8031 and the first determining submodule 8032 may also be performed before the second audio/video is obtained: the target segment's duration is determined first, the second audio/video is then obtained through the first display module 802, and the second audio/video replaces the target segment of the first audio/video through the generating module 803, likewise achieving free replacement of segments of unequal duration.
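The free-replacement behavior described here can be sketched as simple timeline arithmetic, under the assumption that clips are modeled only by durations in seconds (illustrative names; a real implementation would operate on media streams):

```python
def free_replace(first_duration, second_duration, first_moment, target_duration):
    """Free-replacement synthesis: the second clip stands in for a target
    segment of the first clip starting at `first_moment`; the two
    durations need not match, so the total length can change."""
    if first_moment + target_duration > first_duration:
        raise ValueError("target segment runs past the end of the first clip")
    timeline = [
        ("first", 0.0, first_moment),
        ("second", 0.0, second_duration),
        ("first", first_moment + target_duration, first_duration),
    ]
    return first_duration - target_duration + second_duration, timeline
```

With the example from the text, a 5-minute second clip replacing a 10-minute target segment shortens a 30-minute first video to 25 minutes.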
  • the first video may be a video locally stored on the electronic device.
  • The apparatus may also include:
  • the second receiving module 804 is configured to receive a sixth input from the user on the target identifier in the recording interface when the recording interface is displayed.
  • As shown in FIG. 4 (4a), the recording interface 401 may be provided with a first identifier 402 and a second identifier 403; the first identifier 402 may be a function identifier for entering the device's album, and the second identifier 403 a function identifier for triggering a jump to the video editing interface.
  • The target identifier is the first identifier or the second identifier.
  • the second display module 805 is configured to display candidate audio and video in response to the sixth input when the target identifier is the first identifier;
  • and, when the target identifier is the second identifier, to display a target editing interface in response to the sixth input.
  • In the embodiments of this application, the identifiers on the recording interface make it possible to quickly select and play the first audio/video to start editing, or to enter the video editing interface directly, select the first audio/video, and record the second audio/video for quick editing.
  • To make the synthesis operation more flexible, while the first display module 802 responds to the first input and displays the second audio/video at the first moment corresponding to the playback progress of the first audio/video, the image recording capabilities of the electronic device (such as filter, wide-angle/macro, and reshoot functions) and its video editing capabilities can be reused.
  • For example, referring to FIG. 6, while the recording interface 601 of the second video is displayed, the recording picture can be adjusted through the filter and wide-angle/macro functions on the interface. The recording interface 601 may also provide a confirm-recording identifier 602 and a cancel-recording identifier 603: the confirm identifier 602 confirms the currently recorded video as the second video, which can then be synthesized into the first video, while the cancel identifier 603 discards the current recording so the second video can be reshot.
  • Thumbnails of several clips of the first video can also be displayed, such as the thumbnail 604 of the clip before the insertion start point (for example, the first moment above), the thumbnail 605 of the clip after the insertion start/end point, and the preview thumbnail 606 of the second video.
  • the device may also include:
  • the third receiving module 806 is configured to receive a fourth input from the user when the second audio and video is obtained through recording.
  • the obtaining module 807 is configured to select a third audio/video from a target audio/video library in response to the fourth input.
  • the generating module 803 may be configured to: synthesize the second audio/video and the third audio/video into the first audio/video to generate the synthesized audio/video.
  • the generating module 803 may also include:
  • the second generation submodule is used to generate subtitle information according to the second audio and video.
  • the third generation sub-module is configured to associate the subtitle information with the second audio and video, and synthesize the subtitle information and the second audio and video into the first audio and video to generate the synthesized audio and video.
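The subtitle association performed by these submodules can be pictured as shifting speech-recognition output into the synthesized clip's timebase. A sketch under the assumption that recognition yields (start, end, text) cues relative to the second clip; the recognizer itself is stubbed out and the function name is illustrative:

```python
def attach_subtitles(cues, first_moment):
    """Associate subtitle cues with the second clip by offsetting their
    timestamps to the position the clip occupies after insertion-mode
    synthesis at `first_moment` (seconds)."""
    return [(start + first_moment, end + first_moment, text)
            for start, end, text in cues]
```

For example, cues recognized at 0–1.5 s of a second clip inserted at the 12-second mark would be displayed at 12–13.5 s of the synthesized video.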
  • the audio and video processing device in this embodiment of the present application may be a device, or may be a component, an integrated circuit, or a chip in a terminal.
  • the device may be a mobile electronic device or a non-mobile electronic device.
  • the mobile electronic device may be a mobile phone, tablet computer, notebook computer, handheld computer, vehicle-mounted electronic device, wearable device, ultra-mobile personal computer (UMPC), netbook, or personal digital assistant (PDA), etc.
  • the non-mobile electronic device may be a personal computer (personal computer, PC), television (television, TV), teller machine or self-service machine, etc., which are not specifically limited in this embodiment of the present application.
  • the audio and video processing device in the embodiment of the present application may be a device with an operating system.
  • the operating system may be an Android operating system, an iOS operating system, or another possible operating system, which is not specifically limited in the embodiments of this application.
  • the audio and video processing device provided in the embodiment of the present application can realize various processes realized by the method embodiments in FIG. 1 to FIG. 7 , and details are not repeated here to avoid repetition.
  • An embodiment of this application further provides an electronic device 900, including a processor 901, a memory 902, and a program or instructions stored in the memory 902 and executable on the processor 901.
  • When the program or instructions are executed by the processor 901, the various processes of the above audio and video processing method embodiments are realized with the same technical effect; details are not repeated here to avoid repetition.
  • the electronic devices in the embodiments of the present application include the above-mentioned mobile electronic devices and non-mobile electronic devices.
  • FIG. 10 is a schematic diagram of a hardware structure of an electronic device implementing an embodiment of the present application.
  • The electronic device 1000 includes, but is not limited to: a radio frequency unit 1001, a network module 1002, an audio output unit 1003, an input unit 1004, a sensor 1005, a display unit 1006, a user input unit 1007, an interface unit 1008, a memory 1009, and a processor 1010, among other components.
  • The electronic device 1000 may also include a power supply (such as a battery) for supplying power to the various components; the power supply may be logically connected to the processor 1010 through a power management system, which then manages charging, discharging, power consumption, and other functions.
  • The structure of the electronic device shown in FIG. 10 does not constitute a limitation on the electronic device; the electronic device may include more or fewer components than shown, combine certain components, or arrange components differently, and details are not repeated here.
  • the user input unit 1007 is used to receive the user's first input when playing the first audio and video;
  • the processor 1010 is configured to display a second audio and video at a first moment corresponding to the playing progress of the first audio and video in response to the first input, and the second audio and video is obtained by shooting or playing;
  • The electronic device of the embodiments of this application can receive a first input from the user while the first audio/video is playing, display a second audio/video obtained by shooting or playing at the first moment corresponding to the playback progress of the first audio/video, and then synthesize the second audio/video into the first audio/video to generate a synthesized audio/video.
  • the moment corresponding to the appropriate playback progress can be directly selected on the basis of the original audio and video, and the second audio and video can be obtained through the shooting or playing function of the electronic device to be synthesized with the original audio and video, and then the synthesized audio and video can be obtained.
  • the operation is convenient and efficient.
  • The input unit 1004 may include a graphics processing unit (GPU) 10041 and a microphone 10042; the graphics processor 10041 processes image data of still pictures or video obtained by an image capture device (such as a camera) in video capture mode or image capture mode.
  • the display unit 1006 may include a display panel 10061, and the display panel 10061 may be configured in the form of a liquid crystal display, an organic light emitting diode, or the like.
  • the user input unit 1007 includes a touch panel 10071 and other input devices 10072 .
  • the touch panel 10071 is also called a touch screen.
  • the touch panel 10071 may include two parts, a touch detection device and a touch controller.
  • Other input devices 10072 may include, but are not limited to, physical keyboards, function keys (such as volume control buttons, switch buttons, etc.), trackballs, mice, and joysticks, which will not be repeated here.
  • the memory 1009 can be used to store software programs as well as various data, including but not limited to application programs and operating systems.
  • Processor 1010 may integrate an application processor and a modem processor, wherein the application processor mainly processes operating systems, user interfaces, and application programs, and the modem processor mainly processes wireless communications. It can be understood that the foregoing modem processor may not be integrated into the processor 1010 .
  • An embodiment of this application also provides a readable storage medium storing a program or instructions which, when executed by a processor, realize the various processes of the above audio and video processing method embodiments with the same technical effect; details are not repeated here to avoid repetition.
  • the processor is the processor in the electronic device described in the above embodiments.
  • the readable storage medium includes computer readable storage medium, such as computer read-only memory (Read-Only Memory, ROM), random access memory (Random Access Memory, RAM), magnetic disk or optical disk, etc.
  • An embodiment of this application further provides a chip, including a processor and a communication interface coupled to the processor, the processor being configured to run a program or instructions to implement the various processes of the above audio and video processing method embodiments with the same technical effect; details are not repeated here to avoid repetition.
  • It should be understood that the chip mentioned in the embodiments of this application may also be called a system-on-chip, a system chip, a chip system, or a system-on-a-chip.
  • The terms "comprise", "include", or any variation thereof are intended to cover a non-exclusive inclusion, so that a process, method, article, or apparatus comprising a set of elements includes not only those elements but also other elements not expressly listed, or elements inherent to the process, method, article, or apparatus. Without further limitation, an element preceded by "comprising a ..." does not preclude the presence of additional identical elements in the process, method, article, or apparatus comprising that element.
  • It should be noted that the scope of the methods and apparatuses in the embodiments of this application is not limited to performing functions in the order shown or discussed; functions may also be performed in a substantially simultaneous manner or in reverse order, depending on the functions involved. For example, the described methods may be performed in an order different from that described, and steps may be added, omitted, or combined. In addition, features described with reference to certain examples may be combined in other examples.


Abstract

This application discloses an audio and video processing method and an electronic device, belonging to the field of electronic devices. In the embodiments of this application, while a first audio/video is playing, a first input from the user can be received; a second audio/video obtained by shooting or playing is displayed at a first moment corresponding to the playback progress of the first audio/video; the second audio/video is then synthesized into the first audio/video to generate a synthesized audio/video.

Description

Audio and Video Processing Method, Apparatus, and Electronic Device
CROSS-REFERENCE TO RELATED APPLICATIONS
This application claims priority to Chinese Patent Application No. 202111017334.6, filed in China on August 31, 2021, the entire contents of which are incorporated herein by reference.
Technical Field
This application belongs to the technical field of electronic devices, and specifically relates to an audio and video processing method, apparatus, and electronic device.
Background
Users often want to edit audio/video files to produce more interesting clips, but such editing operations tend to be cumbersome.
For example, when users record audio/video or video blogs (vlogs) with electronic devices such as mobile phones to capture moments of their lives, they usually rely on the phone's camera application, a short-video application, or a beautification shooting application. The recorded content, however, often falls short of what the user hoped to capture and therefore needs further editing. Current recording applications (including the camera, short-video, and beautification applications mentioned above) have limited editing capabilities, while dedicated audio/video editing applications are difficult and complex to operate. As a result, the related art lacks a convenient way to process audio/video material.
Summary
The purpose of the embodiments of this application is to provide an audio and video processing method, apparatus, and electronic device that can solve the problem that processing audio/video material lacks convenience.
In a first aspect, an embodiment of this application provides an audio and video processing method, including:
receiving a first input from a user while a first audio/video is playing;
in response to the first input, displaying a second audio/video at a first moment corresponding to the playback progress of the first audio/video, where the second audio/video is obtained by recording or playing; and
synthesizing the second audio/video into the first audio/video to generate a synthesized audio/video.
In a second aspect, an embodiment of this application provides an audio and video processing apparatus, including:
a first receiving module, configured to receive a first input from a user while a first audio/video is playing;
a first display module, configured to display, in response to the first input, a second audio/video at a first moment corresponding to the playback progress of the first audio/video, where the second audio/video is obtained by recording or playing; and
a generating module, configured to generate a synthesized audio/video, the synthesized audio/video being obtained by synthesizing the second audio/video into the first audio/video.
In a third aspect, an embodiment of this application provides an electronic device, including a processor, a memory, and a program or instructions stored in the memory and executable on the processor, where the program or instructions, when executed by the processor, implement the steps of the method of the first aspect.
In a fourth aspect, an embodiment of this application provides a readable storage medium storing a program or instructions that, when executed by a processor, implement the steps of the method of the first aspect.
In a fifth aspect, an embodiment of this application provides a chip, including a processor and a communication interface coupled to the processor, the processor being configured to run a program or instructions to implement the method of the first aspect.
In the embodiments of this application, while a first audio/video is playing, a first input from the user can be received; a second audio/video obtained by recording or playing is displayed at a first moment corresponding to the playback progress of the first audio/video; the second audio/video is then synthesized into the first audio/video to generate a synthesized audio/video. In this way, a suitable playback position can be selected directly on the basis of the original audio/video, and the second audio/video can be obtained through the device's recording or playback function and synthesized with the original, yielding the synthesized audio/video conveniently and efficiently.
Brief Description of the Drawings
FIG. 1 is a schematic flowchart of an audio and video processing method provided by an embodiment of this application;
FIG. 2 is a schematic diagram of interface jumps in a specific example of this application;
FIG. 3 is a schematic diagram of a playback interface in a specific example of this application;
FIG. 4 is a schematic diagram of interface jumps in another specific example of this application;
FIG. 5 is a schematic diagram of the system desktop of an electronic device in another specific example of this application;
FIG. 6 is a schematic diagram of a shooting interface in yet another specific example of this application;
FIG. 7 is a schematic diagram of interface jumps in yet another specific example of this application;
FIG. 8 is a schematic structural diagram of an audio and video processing apparatus provided by an embodiment of this application;
FIG. 9 is a schematic structural diagram of an electronic device provided by an embodiment of this application;
FIG. 10 is a schematic diagram of the hardware structure of an electronic device implementing an embodiment of this application.
Detailed Description
The technical solutions in the embodiments of this application will be described clearly below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of this application. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of this application fall within the protection scope of this application.
The terms "first", "second", and the like in the specification and claims of this application are used to distinguish similar objects, not to describe a specific order or sequence. It should be understood that data so used are interchangeable where appropriate, so that the embodiments of this application can be implemented in orders other than those illustrated or described here. Objects distinguished by "first", "second", and so on are usually of one class, and the number of such objects is not limited; for example, there may be one first object or several. In addition, "and/or" in the specification and claims denotes at least one of the connected objects, and the character "/" generally indicates an "or" relationship between the objects before and after it.
Thanks to the abundance of short videos, streaming media, and similar content, users have ever richer needs and ideas for entertaining video editing. Compared with the multitasking and the precise, flexible mouse operation available on a personal computer (PC), synthesizing video on a mobile device such as a phone generally requires third-party software and relatively complex processing, for example manually dragging a video progress bar to adjust the editing time. As a result, users cannot conveniently obtain the synthesized audio/video they want.
To this end, the embodiments of this application provide an audio and video processing method and an electronic device to solve at least one of the above technical problems. The method may be executed on an electronic device, which may be a mobile phone, tablet computer, notebook computer, handheld computer, vehicle-mounted electronic device, wearable device, ultra-mobile personal computer (UMPC), netbook, or personal digital assistant (PDA), etc.; the embodiments of this application impose no specific limitation.
The audio and video processing method provided by the embodiments of this application is described in detail below through specific embodiments and application scenarios with reference to the accompanying drawings.
FIG. 1 shows a schematic flowchart of the audio and video processing method provided by an embodiment of this application. As shown in FIG. 1, the method includes steps S101 to S103:
S101. Receive a first input from a user while a first audio/video is playing.
Here and below, "audio/video" may refer to either video or audio.
The first audio/video may come from the electronic device's local storage, or may be audio/video downloaded or cached over the Internet; this embodiment imposes no limitation.
When the first audio/video is played, it may be displayed on all or part of the screen of the electronic device.
For example, the first input may be a tap on the screen, a voice command, or a specific gesture or air gesture input by the user, determined according to actual usage requirements; this embodiment imposes no limitation.
A tap input may be a single tap, a double tap, or any number of taps, and may also be a long press or a short press. The specific gesture may be any one of a tap gesture, double-tap gesture, slide gesture, drag gesture, zoom gesture, or rotation gesture.
S102. In response to the first input, display a second audio/video at a first moment corresponding to the playback progress of the first audio/video, where the second audio/video is obtained by recording or playing.
While the first audio/video plays, its picture and playback progress can be displayed on the screen; the progress information may be shown as a progress bar and/or a progress time.
At the first moment corresponding to the playback progress of the first audio/video, the second audio/video is displayed in response to the first input. The first moment may be a progress point reached while the user plays the first audio/video, or may be determined by manually dragging the progress bar of the first video or manually entering a playback time.
The second audio/video may be a video obtained through the device's recording function, or a video obtained through its playback function.
For example, while the second audio/video is displayed, playback of the first audio/video may be paused, or may continue simultaneously. The second audio/video may be displayed on the entire screen, or the first and second audio/video may be displayed in different screen areas.
For example, if the second audio/video is recorded, displaying it may mean showing its recording picture, recording progress, and recording-related function identifiers; if it is obtained by playing, displaying it may mean showing its playback picture, progress information, and playback-related function identifiers.
S103. Synthesize the second audio/video into the first audio/video to generate a synthesized audio/video.
The second audio/video displayed in step S102 is synthesized into the first audio/video to generate a synthesized audio/video, which may be displayed on the screen or saved locally in the background; this embodiment imposes no limitation.
The synthesis may insert the second audio/video into the first audio/video, or replace part of the first audio/video with the second.
It can be understood that the second audio/video may also be a picture, equivalent to a second audio/video with only one frame of image.
It can be understood that, with the above method, one video can be synthesized into another video, audio can be synthesized into a video or a video into audio, and one piece of audio can be synthesized into another.
Through the above method, while the first audio/video is playing, a first input from the user can be received, a second audio/video obtained by recording or playing is displayed at the first moment corresponding to the playback progress of the first audio/video, and the second audio/video is then synthesized into the first audio/video to generate a synthesized audio/video. A moment corresponding to a suitable playback progress can thus be chosen directly on the basis of the original audio/video, and the second audio/video obtained through the device's shooting or playback function and synthesized with the original, yielding the synthesized audio/video conveniently and efficiently.
In one specific application scenario of the embodiments of this application, a user who wants to synthesize other clips into a first video can, upon watching the first video to a certain progress time (or manually dragging the progress to that time), trigger the device's shooting function to record a clip (the second audio/video) and generate the synthesized audio/video as soon as shooting ends. This example is described below with reference to FIG. 2.
FIG. 2 (2a) shows the playback interface of a short video from the Internet (playback time within 30 minutes). Taking the editing of this short video (the first video) as an example, step S101, receiving the user's first input while the first audio/video is playing, may proceed as follows.
The playback interface 201 of the electronic device 200 shown in FIG. 2 (2a) displays a button control 202 or a floating-window function key. When playback reaches the first moment corresponding to the playback progress of the first video, tapping the button control 202 or the floating window jumps to the video editing interface 203 shown in FIG. 2 (2b) to execute step S102.
Before the jump to the video editing interface 203, since the first video is a streaming file from the Internet, a pop-up window may prompt the user to download the video being played; if the download is allowed, video caching is started and the subsequent processing steps S102–S103 are entered.
It can be understood that if the first video from the Internet is not allowed to be downloaded (for example, due to copyright restrictions) or cannot be downloaded, a pop-up window may indicate that the editing interface cannot be opened.
In other examples, if the first video is a long video, the electronic device 300 may be triggered to jump to the video editing interface (not shown) through a shortcut key displayed on the long-video playback interface 301 shown in FIG. 3, or by long-pressing either volume key 302 of the electronic device 300 while the playback interface 301 is displayed. The long video may be a streaming file from the Internet or a local video file on the device.
Referring again to FIG. 2, after the jump to the video editing interface 203, and before step S102 displays the second audio/video at the first moment in response to the first input, the user may first select a video synthesis mode through a control 204 or a menu on the editing interface. The synthesis modes may include one or more of insertion, automatic replacement, and free replacement.
Insertion means inserting the second video at a position in the first video, splicing the second video and the first video together.
Automatic replacement means that as soon as recording of the second video ends, the second video is automatically synthesized into the first video, replacing a segment of the same duration in the first video.
Free replacement means that before or after recording the second video, an arbitrary target duration is entered (no greater than the total playback duration of the first video); when the second video is synthesized into the first video, it replaces the segment of the first video corresponding to that target duration, and the second video's duration may or may not equal the target duration.
For example, after the synthesis mode is determined, step S102, displaying the second audio/video at the first moment corresponding to the playback progress of the first audio/video in response to the first input, may specifically include:
in response to the first input, when the first audio/video has been played to the first moment in a first display area, displaying the second audio/video in a second display area.
In this step, in response to the first input, the first and second videos may be displayed in the first and second display areas of the device, respectively. For example, in the playback interface 201 of FIG. 2 (2a), the first video's display area is the entire screen; in response to the first input the device switches to a split-screen display, as shown in FIG. 2 (2b), with the playback picture of the first video and the recording picture of the second video shown in the first display area 205 and the second display area 206 of the electronic device 200, respectively.
Displaying the playback and recording pictures in split screen helps the user grasp the editing information of both videos more intuitively, such as the insertion point of the first video (for example, the progress position of the first moment above) and the shooting duration of the second video, making it convenient to compare the two videos and edit more efficiently.
To make the synthesis operation more convenient and to give an intuitive view of the picture, progress, and other information while the second audio/video is being obtained, for example, after the first input is received in step S101, step S102, displaying the second audio/video at the first moment in response to the first input, may specifically include:
S1021. In response to the first input, at the first moment corresponding to the playback progress of the first audio/video, start dynamically displaying the picture of a target audio/video.
In this step, the target audio/video is the audio/video being recorded or played.
If the second audio/video is obtained by recording, then in response to the first input the recording picture of the target audio/video starts to be displayed at the first moment. For example, as shown in FIG. 2 (2b), the recording picture of the target video is dynamically displayed in the second display area 206, allowing the user to adjust the device's recording angle and so on according to the picture; recording ends once all the desired footage has been captured.
It should be understood that for a recording picture of target audio, the picture may include a recording-state button (record, pause, or stop), a recording progress indicator (shown as a progress bar or recording time), and so on. As recording proceeds, the progress changes over time and the picture dynamically reflects that change.
If the second audio/video is obtained by playing, then in response to the first input the playback picture of the target audio/video starts to be displayed at the first moment. The target audio/video may come from the device's audio/video library; playback ends once all the footage the user needs has been played.
It should be understood that for a playback picture of target audio, the picture may include a playback-state button (play, pause, or stop), playback progress (shown as a progress bar or playback time), and so on. As playback proceeds, the progress changes over time and the picture dynamically reflects that change.
S1022. Receive a second input from the user.
For example, the second input may be a tap on the screen, a voice command, or a specific gesture or air gesture, determined according to actual usage requirements; this embodiment imposes no limitation. A tap input may be a single tap, a double tap, or any number of taps, a long press, or a short press; the specific gesture may be any one of a tap, double-tap, slide, drag, zoom, or rotation gesture.
S1023. In response to the second input, stop dynamically displaying the picture of the target audio/video to obtain the second audio/video.
Through the second input the user stops the dynamic display of the target audio/video's picture: if a recording picture was being dynamically displayed, the second input ends the recording and stops the display, and the recorded result becomes the second audio/video; if a playback picture was being dynamically displayed, the second input ends the playback and stops the display, and the data played from the target audio/video becomes the second audio/video.
To meet users' diverse synthesis needs, the embodiments of this application support synthesis by insertion, automatic replacement of equal duration, and free replacement of possibly unequal duration.
For example, after a second video meeting the user's needs is obtained, if the user pre-selected insertion, step S103, synthesizing the second audio/video into the first audio/video to generate a synthesized audio/video, may specifically include:
splicing the second audio/video into the first audio/video at the position corresponding to the first moment, to generate the synthesized audio/video.
As shown in FIG. 2 (2b), the user can tap the function button 207 on the video editing interface 203 to end recording of the second video; the recorded second video is inserted directly into the first video at the position corresponding to the first moment, splicing the videos to generate the synthesized video.
In this example, in insertion mode, the first video may be shown paused while the second video is displayed in response to the first input. Once the second video finishes shooting, it is inserted at the paused progress position of the first video to generate the synthesized audio/video. For example, a user following a series who wants to appear on camera with a favorite lead or star can enter the first input at that person's scene and shoot a second video containing the user for synthesis.
In this way, when recording ends the device automatically splices the recorded second video at the insertion point selected by the user (the position corresponding to the first moment in the first video), requiring little effort from the user. When editing their own recordings, users can also flexibly reshoot or extend footage as inspiration strikes, conveniently enriching and refining the content and improving its quality, and the editing is completed without third-party video editing software, saving phone storage.
It can be understood that the recorded second video may also be a single image, equivalent to a second video only one frame long.
Optionally, in this example the video editing interface may also provide a first identifier 208 for jumping to the device's material store (audio/video library), such as the album 209. The user can select a video 210 or picture file from the album 209 as the second video and insert it at the position corresponding to the first moment in the first video to complete the splice.
For example, after a second video meeting the user's needs is obtained, if the user selected automatic replacement, step S103 may specifically include:
replacing a target segment of the first audio/video with the second audio/video to generate the synthesized audio/video. The target segment may be the segment of the first audio/video from the first moment (insertion start point) to a second moment (insertion end point), or may be determined by manual input (such as entering the segment's start and end times or dragging the progress bar), but its duration equals the second video's, so the total duration of the synthesized audio/video is unchanged.
To simplify the equal-duration automatic replacement mode, optionally, if the user pre-selects automatic replacement, the second audio/video may be obtained automatically according to a manually entered target segment duration before it is captured. For example, before the second audio/video is obtained, step S102 may specifically include:
S1024. In response to the first input, at the first moment corresponding to the playback progress of the first audio/video, start dynamically displaying the picture of the target audio/video, the target audio/video being the audio/video recorded or played; and
S1025. After a preset target duration, stop dynamically displaying the picture of the target audio/video to obtain the second audio/video.
The preset target duration is the duration of the target segment chosen by the user in automatic replacement mode; it may be determined by manually entering the segment's start and end times or by dragging the progress bar to set them. When the recorded or played picture reaches the preset target duration, it stops automatically, yielding the second audio/video.
Correspondingly, step S103 may specifically include:
replacing a first segment of the first audio/video with the second audio/video to generate the synthesized audio/video, where the first segment is the target segment determined by the user, with its start moment set by manual input and its duration equal to the preset target duration, so that an equal-duration segment replacement is performed automatically during synthesis.
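The equal-duration automatic replacement described above keeps the overall length fixed, because the second moment is derived from the preset target duration. A minimal sketch with illustrative names (durations in seconds; a real implementation would operate on media streams, not numbers):

```python
def auto_replace(first_duration, first_moment, preset_target):
    """Equal-duration replacement: swap out the segment of the first clip
    starting at the first moment and lasting exactly `preset_target`."""
    second_moment = first_moment + preset_target
    if second_moment > first_duration:
        raise ValueError("replaced segment runs past the end of the first clip")
    timeline = [
        ("first", 0.0, first_moment),
        ("second", 0.0, preset_target),   # recording auto-stopped at this length
        ("first", second_moment, first_duration),
    ]
    return first_duration, timeline        # total duration unchanged
```

Because the recording stops automatically at the preset target duration, the captured clip always fits the segment it replaces exactly.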
For example, after a second video meeting the user's needs is obtained, if the user pre-selected free replacement, step S103, synthesizing the second audio/video into the first audio/video to generate a synthesized audio/video, may include steps S301 to S303:
S301. Receive a third input.
S302. In response to the third input, determine the duration of the target segment in the first audio/video.
The third input specifies the target duration, either as start and end times on the video progress bar or as a time span. On the progress bar, the times may take a form such as start time "00:05:00" and end time "00:10:00", denoting the 5-minute segment of the first video between the first moment 00:05:00 and the second moment 00:10:00; a time span may take a form such as "5min" or "300s", denoting the 5 minutes following the first moment of the first video.
In free replacement mode, the target duration given by the third input may be set arbitrarily, but may not exceed the total duration of the first video.
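The duration inputs described above ("00:05:00"-style progress stamps, or bare spans such as "5min"/"300s") can be normalized to seconds. A small parser, assuming exactly these example formats and nothing more:

```python
import re

def parse_span(spec):
    """Convert a bare span such as '5min' or '300s' to seconds."""
    match = re.fullmatch(r"(\d+)(min|s)", spec.strip())
    if match is None:
        raise ValueError(f"unrecognized span: {spec!r}")
    value, unit = int(match.group(1)), match.group(2)
    return value * 60 if unit == "min" else value

def parse_progress(stamp):
    """Convert an 'HH:MM:SS' progress stamp to seconds."""
    hours, minutes, seconds = (int(part) for part in stamp.split(":"))
    return hours * 3600 + minutes * 60 + seconds
```

With these, the pair 00:05:00–00:10:00 and the span "5min" both denote the same 300-second target segment.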
S303. Replace the target segment of the first audio/video with the second audio/video to generate the synthesized audio/video.
At the moment the second audio/video is obtained, the second video replaces the target segment of the first video, the target segment being the segment of the first audio/video corresponding to the target duration.
In the embodiments of this application, in free replacement mode, steps S301–S303 allow a second video of any duration to be recorded or played first; the target duration of the target segment in the first video is then determined through the third input, and the second video replaces that target segment, so the replaced segment's duration may or may not equal the second video's. For example, a 5-minute second video can replace a 10-minute target segment starting from the first moment of the first video. Video editing thus becomes freer and more flexible, meeting users' diverse needs.
It should be understood that steps S301–S302 may also be performed before step S102: the target segment's duration is determined first, the second audio/video is then obtained through step S102, and the second audio/video replaces the target segment of the first audio/video through S103, likewise achieving free replacement of segments of unequal duration.
To make it easy to start the synthesis operation, for example, in another application scenario of the embodiments of this application, the first video may be a video stored locally on the electronic device, and before step S101 the method may further include:
S1011. While a recording interface is displayed, receive a sixth input from the user on a target identifier in the recording interface.
In this example, as shown in FIG. 4 (4a), the recording interface 401 may be provided with a first identifier 402 and a second identifier 403; the first identifier 402 may be a function identifier for entering the device's album, and the second identifier 403 a function identifier for triggering a jump to the video editing interface. The target identifier is the first identifier or the second identifier.
S1012. When the target identifier is the first identifier, display candidate audio/video in response to the sixth input; and
when the target identifier is the second identifier, display a target editing interface in response to the sixth input.
This example is described with reference to the video editing example shown in the figures.
As shown in FIG. 4 (4a), if the target identifier is the first identifier 402, the sixth input is an input on the first identifier 402; in response, the device's album 404 shown in FIG. 4 (4b) can be entered, displaying several candidate audio/video items 405, which may include both audio and video. A target file 406 can then be selected from the candidates 405 as the first audio/video, and steps S101–S103 are executed.
If the target identifier is the second identifier 403, the sixth input is an input on the second identifier 403, and the interface can jump to the video editing interface 203 shown in FIG. 2 (2b); through the function identifier 208 provided on the editing interface 203 for entering the device's album, the album can be opened to select a target file as the first video.
In the embodiments of this application, the identifiers on the recording interface make it possible to quickly select and play the first audio/video to start editing, or to enter the video editing interface directly, select the first audio/video, and record the second audio/video for quick editing.
To make it easy to start the synthesis operation, optionally, before step S101, as shown in FIG. 5, the album function identifier 501 on the system desktop of the electronic device 500 can also be used directly to enter the device's album, select a target file as the first video, and perform the playback and editing of steps S101–S102.
To make the synthesis operation more flexible, optionally, while step S101 is being executed after the first video is selected — with the first video played to the first moment, the progress bar manually dragged to the first moment, or a playback time manually entered to jump to the first moment — after the first input is received, the device's image recording capabilities (such as filter, wide-angle/macro, and reshoot functions) and video editing capabilities can be reused while the second audio/video is recorded in step S102.
For example, during step S102, referring to FIG. 6, while the recording interface 601 of the second video is displayed, the recording picture can be adjusted through the filter and wide-angle/macro functions on the interface. The recording interface 601 may also provide a confirm-recording identifier 602 and a cancel-recording identifier 603: the confirm identifier 602 confirms the currently recorded video as the second video, which can then be synthesized into the first video, while the cancel identifier 603 discards the current recording so the second video can be reshot.
Reusing the device's image recording capabilities in this way yields a second video that meets the user's needs.
Moreover, as shown in FIG. 6, thumbnails of several clips of the first video can also be displayed on the shooting interface 601, such as the thumbnail 604 of the clip before the insertion start point (for example, the first moment above), the thumbnail 605 of the clip after the insertion start/end point, and the preview thumbnail 606 of the second video. Different clips of the first video can then be selected for display by tapping the thumbnails, making targeted re-editing of a specific clip more convenient.
这样可以通过手动点击不同的缩略图,选取第一视频不同的片段进行显 示,以方便更有针对性的进行具体片段的再编辑。如在录制第二视频的过程中,点击插入起始点(第一时刻)之前的视频片段缩略图604,在第一显示区域播放第一视频中对应的该前一段视频片段,并通过相关剪辑功能(亮度功能、对比度功能、色彩功能等)对该片段再次编辑;或者点击视频片段缩略图605,在第一显示区域播放第一视频对应的该后一段片段,并通过相关剪辑功能对该片段再次编辑。
本申请实施例,通过复用录制影像能力和视频剪辑能力,能够直观的看到第一视频插入起始点和结束点前后的视频片段并进行再剪辑,并可以通过与录制中的第二视频的缩略图比较,便于用户直观、便捷的根据这三段视频判断合成后的视频效果,以随时调整。
本申请实施例在又一个应用场景中,如图7中(7a)所示,进入到视频剪辑界面701后,可以在视频剪辑界面701设置视频添加标识702,视频添加标识702是用于进入电子设备的相册的功能标识。对应的,在通过步骤S103在第一音视频中合成第二音视频,生成合成音视频之前,方法还可以包括:
S104.在通过录制得到第二音视频的情况下,接收用户的第四输入;以及
S105.响应第四输入,从目标音视频库中选取获取第三音视频。
第四输入可以是对视频添加标识702的点击输入,响应于第四输入,如图7中(7b)所示,进入电子设备的相册703,再次选取视频或图片等目标文件704作为第三音视频。
则步骤S103可以包括:
将第二音视频和第三音视频合成至第一音视频中,生成所述合成音视频。
这样在响应于第一输入,得到第二视频后,可以将第二音视频和第三音视频合成至第一音视频中,生成合成音视频,完成多素材的快捷合成。
可以理解,上述各示例中,第一视频的处理方式同样适用于对音频的处理,且上述各示例中,获取第二视频并合成视频的方式同样适用于获取音频和合成音频。
示例性地,在步骤S103中,还可以包括:
S1031.根据所述第二音视频,生成字幕信息;以及
S1032.将所述字幕信息关联所述第二音视频,并与所述第二音视频合成至所述第一音视频中,生成所述合成音视频。
在音视频合成时,第一视频可以是来自于影视剧片段等带有字幕信息的视频,或者第一视频插入的第二视频或第二音频有配置字幕信息的需要时,可以通过电子设备具有的语音识别转换功能,将用户拍摄的视频或音频中的语音信息转换为字幕信息,并在生成合成音视频时,将字幕信息关联第二音视频合成在第一音视频中,合成便捷的同时丰富信息的多样性,满足用户的多样化需求。
可选的,可以自动检测第一视频中的字幕信息的格式,并将第一音视频的字幕信息按照第一视频中的字幕信息格式进行显示,提高合成音视频的内容的一致性。
在其他示例中,步骤S102,响应于第一输入,在第一音视频播放进度对应的第一时刻显示第二音视频,其中,第二音视频可以是播放得到。
本示例中,响应于第一输入,可以进入电子设备的本地相册,选取一段视频(也即目标视频)跳转到播放界面,通过在该播放界面上手动调节播放起始点和结束点,也可以通过手动输入播放起始点和结束点,来确定第二视频,此时第二视频即该目标视频的一个片段。
在确定第二视频后,自动将该第二视频合成到第一视频中。合成可以是插入或替换的方式。
In this example, different materials in the album can be composited, which enriches the audio/video composition manners and is convenient and quick to operate.
It should be noted that the audio/video processing method provided in the embodiments of the present application may be performed by an audio/video processing apparatus, or by a control module in the audio/video processing apparatus for performing the audio/video processing method. In the embodiments of the present application, the audio/video processing apparatus provided in the embodiments is described by taking the audio/video processing apparatus performing the audio/video processing method as an example.
FIG. 8 is a schematic structural diagram of an audio/video processing apparatus provided in an embodiment of the present application. As shown in the figure, the apparatus includes:
a first receiving module 801, configured to receive a first input from a user while a first audio/video is being played;
a first display module 802, configured to display, in response to the first input, a second audio/video at a first moment corresponding to the playback progress of the first audio/video, the second audio/video being obtained through recording or playback; and
a generation module 803, configured to generate a composite audio/video, the composite audio/video being obtained by compositing the second audio/video into the first audio/video.
The first audio/video may come from the electronic device locally, or may be a video downloaded or cached over the Internet, which is not limited in this embodiment.
When the first audio/video is played, it may be displayed on the entire screen or part of the screen of the electronic device.
Exemplarily, the first input may be a tap input on the screen by the user, a voice instruction input by the user, or a specific gesture or air gesture input by the user, which may be determined according to actual use requirements and is not limited in this embodiment.
The tap input may be a single-tap input, a double-tap input, or a tap input of any number of times, and may also be a long-press input or a short-press input. The specific gesture may be any one of a tap gesture, a double-tap gesture, a slide gesture, a drag gesture, a pinch gesture, and a rotate gesture.
The composition may be inserting the second audio/video into the first audio/video, or replacing part of the first audio/video with the second audio/video.
It can be understood that the second audio/video may also be a picture, which is equivalent to a second audio/video having only one frame of image.
It can be understood that, with the above apparatus of this embodiment of the present application, one video can be composited into another video, a piece of audio can be composited into a video or a video into a piece of audio, and one piece of audio can be composited into another piece of audio.
With the apparatus of this embodiment of the present application, a first input from the user can be received while a first audio/video is being played, a second audio/video obtained through recording or playback can be displayed at a first moment corresponding to the playback progress of the first audio/video, and the second audio/video can then be composited into the first audio/video to generate a composite audio/video. In this way, a moment corresponding to a suitable playback progress can be selected directly on the basis of the original audio/video, and the second audio/video can be obtained through the recording or playback function of the electronic device and composited with the original audio/video to obtain the composite audio/video, which is convenient and efficient.
Exemplarily, the first display module 802 may be specifically configured to:
display, in response to the first input, the second audio/video in a second display area when the first audio/video is played to the first moment in a first display area.
For example, on the playback interface 201 shown in (2a) of FIG. 2, the display area of the first video is the entire screen. In response to the first input, the electronic device performs split-screen display; as shown in (2b) of FIG. 2, the playback picture of the first video and the recording picture of the second video are displayed simultaneously in a first display area 205 and a second display area 206 of the electronic device 200, respectively. Displaying the playback and recording pictures on a split screen helps the user understand the editing information of the first video and the second video more intuitively, for example, the insertion point of the first video (such as the progress position of the above first moment) and the shooting duration of the second video, making it convenient for the user to compare the editing information of the two videos and perform editing more conveniently and efficiently.
To improve the convenience of the audio/video composition operation and to provide an intuitive view of the picture, progress, and other information while the second audio/video is being acquired, optionally, the first display module 802 specifically includes:
a first display submodule 8021, configured to start dynamically displaying, in response to the first input, a picture of a target audio/video at the first moment corresponding to the playback progress of the first audio/video, the target audio/video being a recorded or played audio/video;
a first receiving submodule 8022, configured to receive a second input from the user; and
a first stopping submodule 8023, configured to stop dynamically displaying the picture of the target audio/video in response to the second input, to obtain the second audio/video.
To meet users' diversified composition needs, this embodiment of the present application can complete composition through insertion, automatic replacement of equal duration, or free replacement of possibly unequal duration.
Optionally, in the insertion mode, the generation module 803 may be specifically configured to:
splice the second audio/video at the position corresponding to the first moment in the first audio/video to generate the composite audio/video.
Optionally, in the equal-duration automatic replacement mode, the generation module 803 may be specifically configured to:
replace a first segment in the first audio/video with the second audio/video to generate the composite audio/video, the first segment being the segment from the first moment to a second moment in the first audio/video, and the duration of the first segment being equal to the duration of the second audio/video.
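In the equal-duration mode, the second moment is fully determined by the first moment and the second clip's duration, and the composite keeps the first clip's total length. A minimal sketch of that bookkeeping, assuming durations in seconds and a hypothetical helper name:

```python
# Hedged sketch of the equal-duration replacement mode: the first segment runs
# from the first moment to a second moment computed from the second clip's
# duration, so the composite keeps the first clip's total length.

def replace_equal(first_dur, first_moment, second_dur):
    """Return (second_moment, composite_duration) for an equal-length swap."""
    second_moment = first_moment + second_dur
    if second_moment > first_dur:
        raise ValueError("second clip runs past the end of the first clip")
    return second_moment, first_dur  # total duration is unchanged

second_moment, total = replace_equal(first_dur=60.0, first_moment=12.0,
                                     second_dur=8.0)
# second_moment → 20.0, total → 60.0
```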
To simplify the operation of the equal-duration automatic replacement mode, optionally, if the user selects the automatic replacement manner in advance, then before the second audio/video is obtained, the second audio/video may also be obtained automatically in association with a manually entered target segment duration. For example, the first display module 802 may specifically include:
a second display submodule 8024, configured to start dynamically displaying, in response to the first input, a picture of a target audio/video at the first moment corresponding to the playback progress of the first audio/video, the target audio/video being a recorded or played audio/video; and
a second stopping submodule 8025, configured to stop dynamically displaying the picture of the target audio/video after a preset target duration, to obtain the second audio/video.
Correspondingly, the generation module 803 may be specifically configured to:
replace a first segment in the first audio/video with the second audio/video to generate the composite audio/video, the duration of the first segment being equal to the preset target duration.
Optionally, when the composition manner selected by the user in advance is free replacement, the generation module 803 may specifically include:
a second receiving submodule 8031, configured to receive a third input;
a first determining submodule 8032, configured to determine, in response to the third input, the duration of a target segment in the first audio/video; and
a first generation submodule 8033, configured to replace the target segment in the first audio/video with the second audio/video to generate the composite audio/video.
In this embodiment of the present application, in the free replacement mode, through the above second receiving submodule 8031 and first determining submodule 8032, a second video of any duration can first be obtained through recording or playback, and the target duration of the target segment in the first video is then determined through the third input, so that the target segment in the first video is replaced with the second video. The replaced target segment may therefore be equal or unequal in duration to the second video; for example, a 5-minute second video can replace a 10-minute target segment of the first video starting from the first moment. This makes the video editing operation more free and flexible, meeting users' diversified needs.
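The duration arithmetic of the free-replacement mode, including the 5-minute-for-10-minute example above, can be sketched as follows. The function name and the seconds-based representation are assumptions for illustration, not the patent's implementation.

```python
# Sketch of the free-replacement mode: a target segment of arbitrary length
# (chosen via the third input) is cut out and the second clip, which may be
# shorter or longer, is spliced in; the composite's duration changes by the
# difference. All durations are in seconds.

def free_replace(first_dur, target_start, target_dur, second_dur):
    """Replace [target_start, target_start + target_dur) with the second clip
    and return the resulting composite duration."""
    if target_start + target_dur > first_dur:
        raise ValueError("target segment exceeds the first clip")
    return first_dur - target_dur + second_dur

# A 5-minute second video replacing a 10-minute target segment, as in the text:
new_dur = free_replace(first_dur=30 * 60, target_start=4 * 60,
                       target_dur=10 * 60, second_dur=5 * 60)
# → 1500 seconds (25 minutes): the composite is 5 minutes shorter
```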
It should be understood that the steps performed by the above second receiving submodule 8031 and first determining submodule 8032 may also be performed before the second audio/video is acquired: the duration of the target segment is determined first, the second audio/video is then obtained through the first display module 802, and the generation module 803 then replaces the target segment in the first audio/video with the second audio/video. This can likewise achieve free replacement of segments of unequal duration.
To facilitate starting the audio/video composition operation, exemplarily, in another application scenario of this embodiment of the present application, the first video may be a video stored locally on the electronic device. The apparatus may further include:
a second receiving module 804, configured to receive, when a recording interface is displayed, a sixth input from the user on a target identifier in the recording interface.
In this example, as shown in (4a) of FIG. 4, the recording interface 401 may be provided with a first identifier 402 and a second identifier 403. The first identifier 402 may be a functional identifier for entering the album of the electronic device, and the second identifier 403 may be a functional identifier for triggering a jump to a video editing interface.
The target identifier is the first identifier or the second identifier.
The apparatus further includes a second display module 805, configured to display candidate audio/videos in response to the sixth input when the target identifier is the first identifier; and
display a target editing interface in response to the sixth input when the target identifier is the second identifier.
In this embodiment of the present application, through the identifiers on the recording interface, the first audio/video can be quickly selected and played to start audio/video editing, or the video editing interface can be entered directly to select the first audio/video and record the second audio/video, achieving quick editing.
To improve the flexibility of the audio/video composition operation, optionally, while the first display module 802 displays, in response to the first input, the second audio/video at the first moment corresponding to the playback progress of the first audio/video, the image recording capabilities of the electronic device (such as the filter function, the wide-angle/macro function, and the re-shooting function) and its video editing capabilities may be reused.
For example, referring to FIG. 6, when the recording interface 601 for the second video is displayed, the recorded picture of the second video is adjusted through the filter function, the wide-angle/macro function, and the like on the recording interface. In addition, the recording interface 601 may further be provided with a confirm-recording identifier 602 and a cancel-recording identifier 603. The confirm-recording identifier 602 is used to determine the currently recorded video as the second video, which can then be composited into the first video; the cancel-recording identifier 603 is used to discard the currently recorded video and re-shoot the second video.
In this way, by reusing the image recording capabilities of the electronic device, a second video that meets the user's needs can be obtained.
Moreover, as shown in FIG. 6, thumbnails of multiple video segments of the first video may also be displayed on the shooting interface 601, such as a thumbnail 604 of the video segment before the insertion start point (e.g., the above first moment), a thumbnail 605 of the video segment after the insertion start/end point, and a preview thumbnail 606 of the second video.
In this embodiment of the present application, by reusing the image recording and video editing capabilities, the video segments before and after the insertion start point and end point of the first video can be viewed intuitively and re-edited, and can be compared with the thumbnail of the second video being recorded, so that the user can intuitively and conveniently judge the effect of the composite video based on these three segments and adjust it at any time.
Optionally, to facilitate quick composition of multiple materials, the apparatus may further include:
a third receiving module 806, configured to receive a fourth input from the user when the second audio/video is obtained through recording; and
an acquisition module 807, configured to select, in response to the fourth input, a third audio/video from a target audio/video library.
Correspondingly, the generation module 803 may be configured to:
composite the second audio/video and the third audio/video into the first audio/video to generate the composite audio/video.
Optionally, the generation module 803 may further include:
a second generation submodule, configured to generate subtitle information according to the second audio/video; and
a third generation submodule, configured to associate the subtitle information with the second audio/video and composite it together with the second audio/video into the first audio/video to generate the composite audio/video.
The audio/video processing apparatus in this embodiment of the present application may be an apparatus, or may be a component, an integrated circuit, or a chip in a terminal. The apparatus may be a mobile electronic device or a non-mobile electronic device. Exemplarily, the mobile electronic device may be a mobile phone, a tablet computer, a laptop computer, a palmtop computer, an in-vehicle electronic device, a wearable device, an ultra-mobile personal computer (UMPC), a netbook, or a personal digital assistant (PDA), and the non-mobile electronic device may be a personal computer (PC), a television (TV), a teller machine, or a self-service machine, which is not specifically limited in the embodiments of the present application.
The audio/video processing apparatus in this embodiment of the present application may be an apparatus having an operating system. The operating system may be an Android operating system, an iOS operating system, or another possible operating system, which is not specifically limited in the embodiments of the present application.
The audio/video processing apparatus provided in this embodiment of the present application can implement each process implemented in the method embodiments of FIG. 1 to FIG. 7; to avoid repetition, details are not repeated here.
Optionally, as shown in FIG. 9, an embodiment of the present application further provides an electronic device 900, including a processor 901, a memory 902, and a program or instructions stored in the memory 902 and executable on the processor 901. When the program or instructions are executed by the processor 901, each process of the above audio/video processing method embodiments is implemented, and the same technical effects can be achieved; to avoid repetition, details are not repeated here.
It should be noted that the electronic device in the embodiments of the present application includes the above-described mobile electronic devices and non-mobile electronic devices.
FIG. 10 is a schematic diagram of the hardware structure of an electronic device implementing an embodiment of the present application.
The electronic device 1000 includes but is not limited to: a radio frequency unit 1001, a network module 1002, an audio output unit 1003, an input unit 1004, a sensor 1005, a display unit 1006, a user input unit 1007, an interface unit 1008, a memory 1009, a processor 1010, and other components.
Those skilled in the art can understand that the electronic device 1000 may further include a power supply (such as a battery) for supplying power to each component. The power supply may be logically connected to the processor 1010 through a power management system, thereby implementing functions such as charge management, discharge management, and power consumption management through the power management system. The electronic device structure shown in FIG. 10 does not constitute a limitation on the electronic device; the electronic device may include more or fewer components than shown, combine some components, or use a different arrangement of components, which is not repeated here.
The user input unit 1007 is configured to receive a first input from a user while a first audio/video is being played.
The processor 1010 is configured to display, in response to the first input, a second audio/video at a first moment corresponding to the playback progress of the first audio/video, the second audio/video being obtained through shooting or playback; and
composite the second audio/video into the first audio/video to generate a composite audio/video.
With the electronic device of this embodiment of the present application, a first input from the user can be received while a first audio/video is being played, a second audio/video obtained through shooting or playback can be displayed at a first moment corresponding to the playback progress of the first audio/video, and the second audio/video can then be composited into the first audio/video to generate a composite audio/video. In this way, a moment corresponding to a suitable playback progress can be selected directly on the basis of the original audio/video, and the second audio/video can be obtained through the shooting or playback function of the electronic device and composited with the original audio/video to obtain the composite audio/video, which is convenient and efficient.
It should be understood that, in this embodiment of the present application, the input unit 1004 may include a graphics processing unit (GPU) 10041 and a microphone 10042; the graphics processing unit 10041 processes image data of still pictures or videos obtained by an image capture apparatus (such as a camera) in a video capture mode or an image capture mode. The display unit 1006 may include a display panel 10061, which may be configured in the form of a liquid crystal display, an organic light-emitting diode, or the like. The user input unit 1007 includes a touch panel 10071 and other input devices 10072. The touch panel 10071 is also called a touch screen, and may include two parts: a touch detection apparatus and a touch controller. The other input devices 10072 may include but are not limited to a physical keyboard, function keys (such as volume control keys and switch keys), a trackball, a mouse, and a joystick, which are not repeated here. The memory 1009 may be used to store software programs and various data, including but not limited to application programs and an operating system. The processor 1010 may integrate an application processor and a modem processor, where the application processor mainly handles the operating system, user interfaces, application programs, and the like, and the modem processor mainly handles wireless communication. It can be understood that the modem processor may also not be integrated into the processor 1010.
An embodiment of the present application further provides a readable storage medium storing a program or instructions. When the program or instructions are executed by a processor, each process of the above audio/video processing method embodiments is implemented, and the same technical effects can be achieved; to avoid repetition, details are not repeated here.
The processor is the processor in the electronic device described in the above embodiments. The readable storage medium includes a computer-readable storage medium, such as a computer read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
An embodiment of the present application further provides a chip, the chip including a processor and a communication interface, the communication interface being coupled to the processor, and the processor being configured to run a program or instructions to implement each process of the above audio/video processing method embodiments, with the same technical effects; to avoid repetition, details are not repeated here.
It should be understood that the chip mentioned in the embodiments of the present application may also be called a system-level chip, a system chip, a chip system, or a system-on-chip.
It should be noted that, as used herein, the terms "comprise" and "include", or any other variants thereof, are intended to cover non-exclusive inclusion, so that a process, method, article, or apparatus that includes a series of elements includes not only those elements but also other elements not expressly listed, or further includes elements inherent to such a process, method, article, or apparatus. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, article, or apparatus that includes the element. In addition, it should be pointed out that the scope of the methods and apparatuses in the embodiments of the present application is not limited to performing the functions in the order shown or discussed, and may also include performing the functions in a substantially simultaneous manner or in the reverse order according to the functions involved; for example, the described methods may be performed in an order different from that described, and various steps may also be added, omitted, or combined. In addition, features described with reference to certain examples may be combined in other examples.
Through the description of the above embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by means of software plus a necessary general-purpose hardware platform, and certainly also by hardware, but in many cases the former is the better implementation. Based on this understanding, the technical solution of the present application, in essence or in the part contributing to the prior art, can be embodied in the form of a software product. The computer software product is stored in a storage medium (such as a ROM/RAM, a magnetic disk, or an optical disc) and includes several instructions to cause a terminal (which may be a mobile phone, a computer, a server, a network device, or the like) to perform the methods described in the embodiments of the present application.
The embodiments of the present application have been described above with reference to the accompanying drawings, but the present application is not limited to the above specific implementations. The above specific implementations are merely illustrative rather than restrictive. Under the inspiration of the present application, those of ordinary skill in the art can devise many other forms without departing from the purpose of the present application and the scope protected by the claims, all of which fall within the protection of the present application.

Claims (23)

  1. An audio/video processing method, the method comprising:
    receiving a first input from a user while a first audio/video is being played;
    displaying, in response to the first input, a second audio/video at a first moment corresponding to the playback progress of the first audio/video, the second audio/video being obtained through recording or playback; and
    compositing the second audio/video into the first audio/video to generate a composite audio/video.
  2. The method according to claim 1, wherein the displaying, in response to the first input, a second audio/video at a first moment corresponding to the playback progress of the first audio/video comprises:
    starting, in response to the first input, to dynamically display a picture of a target audio/video at the first moment corresponding to the playback progress of the first audio/video, the target audio/video being a recorded or played audio/video;
    receiving a second input from the user; and
    stopping, in response to the second input, dynamically displaying the picture of the target audio/video, to obtain the second audio/video.
  3. The method according to claim 2, wherein the compositing the second audio/video into the first audio/video to generate a composite audio/video comprises:
    splicing the second audio/video at a position corresponding to the first moment in the first audio/video to generate the composite audio/video.
  4. The method according to claim 2, wherein the compositing the second audio/video into the first audio/video to generate a composite audio/video comprises:
    replacing a first segment in the first audio/video with the second audio/video to generate the composite audio/video, the first segment being a segment from the first moment to a second moment in the first audio/video, and a duration of the first segment being equal to a duration of the second audio/video.
  5. The method according to claim 1, wherein the displaying, in response to the first input, a second audio/video at a first moment corresponding to the playback progress of the first audio/video comprises:
    starting, in response to the first input, to dynamically display a picture of a target audio/video at the first moment corresponding to the playback progress of the first audio/video, the target audio/video being a recorded or played audio/video; and
    stopping dynamically displaying the picture of the target audio/video after a preset target duration, to obtain the second audio/video;
    and the compositing the second audio/video into the first audio/video to generate a composite audio/video comprises:
    replacing a first segment in the first audio/video with the second audio/video to generate the composite audio/video, a duration of the first segment being equal to the preset target duration.
  6. The method according to claim 1, wherein the compositing the second audio/video into the first audio/video to generate a composite audio/video comprises:
    receiving a third input;
    determining, in response to the third input, a duration of a target segment in the first audio/video; and
    replacing the target segment in the first audio/video with the second audio/video to generate the composite audio/video.
  7. The method according to any one of claims 1-4, wherein the displaying, in response to the first input, a second audio/video at a first moment corresponding to the playback progress of the first audio/video comprises:
    displaying, in response to the first input, the second audio/video in a second display area when the first audio/video is played to the first moment in a first display area.
  8. The method according to claim 1, wherein the compositing the second audio/video into the first audio/video to generate a composite audio/video comprises:
    generating subtitle information according to the second audio/video; and
    associating the subtitle information with the second audio/video, and compositing it together with the second audio/video into the first audio/video to generate the composite audio/video.
  9. The method according to claim 1, wherein before the compositing the second audio/video into the first audio/video to generate a composite audio/video, the method further comprises:
    receiving a fourth input from the user when the second audio/video is obtained through recording; and
    selecting, in response to the fourth input, a third audio/video from a target audio/video library;
    and the compositing the second audio/video into the first audio/video to generate a composite audio/video comprises:
    compositing the second audio/video and the third audio/video into the first audio/video to generate the composite audio/video.
  10. An audio/video processing apparatus, the apparatus comprising:
    a first receiving module, configured to receive a first input from a user while a first audio/video is being played;
    a first display module, configured to display, in response to the first input, a second audio/video at a first moment corresponding to the playback progress of the first audio/video, the second audio/video being obtained through recording or playback; and
    a generation module, configured to generate a composite audio/video, the composite audio/video being obtained by compositing the second audio/video into the first audio/video.
  11. The apparatus according to claim 10, wherein the first display module comprises:
    a first display submodule, configured to start dynamically displaying, in response to the first input, a picture of a target audio/video at the first moment corresponding to the playback progress of the first audio/video, the target audio/video being a recorded or played audio/video;
    a first receiving submodule, configured to receive a second input from the user; and
    a first stopping submodule, configured to stop dynamically displaying the picture of the target audio/video in response to the second input, to obtain the second audio/video.
  12. The apparatus according to claim 11, wherein the generation module is specifically configured to:
    splice the second audio/video at a position corresponding to the first moment in the first audio/video to generate the composite audio/video.
  13. The apparatus according to claim 11, wherein the generation module is specifically configured to:
    replace a first segment in the first audio/video with the second audio/video to generate the composite audio/video, the first segment being a segment from the first moment to a second moment in the first audio/video, and a duration of the first segment being equal to a duration of the second audio/video.
  14. The apparatus according to claim 10, wherein the first display module comprises:
    a second display submodule, configured to start dynamically displaying, in response to the first input, a picture of a target audio/video at the first moment corresponding to the playback progress of the first audio/video, the target audio/video being a recorded or played audio/video; and
    a second stopping submodule, configured to stop dynamically displaying the picture of the target audio/video after a preset target duration, to obtain the second audio/video;
    and the generation module is specifically configured to:
    replace a first segment in the first audio/video with the second audio/video to generate the composite audio/video, a duration of the first segment being equal to the preset target duration.
  15. The apparatus according to claim 10, wherein the generation module comprises:
    a second receiving submodule, configured to receive a third input;
    a first determining submodule, configured to determine, in response to the third input, a duration of a target segment in the first audio/video; and
    a first generation submodule, configured to replace the target segment in the first audio/video with the second audio/video to generate the composite audio/video.
  16. The apparatus according to any one of claims 10-13, wherein the first display module is specifically configured to:
    display, in response to the first input, the second audio/video in a second display area when the first audio/video is played to the first moment in a first display area.
  17. The apparatus according to claim 10, wherein the generation module comprises:
    a second generation submodule, configured to generate subtitle information according to the second audio/video; and
    a third generation submodule, configured to associate the subtitle information with the second audio/video and composite it together with the second audio/video into the first audio/video to generate the composite audio/video.
  18. The apparatus according to claim 10, wherein the apparatus further comprises:
    a third receiving module, configured to receive, before the second audio/video is composited into the first audio/video to generate the composite audio/video, a fourth input from the user when the second audio/video is obtained through recording; and
    an acquisition module, configured to select, in response to the fourth input, a third audio/video from a target audio/video library;
    and the generation module is configured to composite the second audio/video and the third audio/video into the first audio/video to generate the composite audio/video.
  19. An electronic device, comprising a processor, a memory, and a program or instructions stored in the memory and executable on the processor, wherein the program or instructions, when executed by the processor, implement the steps of the audio/video processing method according to any one of claims 1-9.
  20. A readable storage medium, storing a program or instructions, wherein the program or instructions, when executed by a processor, implement the steps of the audio/video processing method according to any one of claims 1-9.
  21. A computer program product, the program product being stored in a non-volatile storage medium, and the program product being executed by at least one processor to implement the steps of the audio/video processing method according to any one of claims 1-9.
  22. A chip, the chip comprising a processor and a communication interface, the communication interface being coupled to the processor, and the processor being configured to run a program or instructions to implement the steps of the audio/video processing method according to any one of claims 1-9.
  23. An electronic device, configured to perform the steps of the audio/video processing method according to any one of claims 1-9.
PCT/CN2022/115582 2021-08-31 2022-08-29 Audio/video processing method and apparatus, and electronic device WO2023030270A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111017334.6A CN113727140A (zh) 2021-08-31 2021-08-31 Audio/video processing method and apparatus, and electronic device
CN202111017334.6 2021-08-31

Publications (1)

Publication Number Publication Date
WO2023030270A1 (zh)

Family

ID=78680269

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/115582 WO2023030270A1 (zh) 2022-08-29 2021-08-31 Audio/video processing method and apparatus, and electronic device

Country Status (2)

Country Link
CN (1) CN113727140A (zh)
WO (1) WO2023030270A1 (zh)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113727140A (zh) * 2021-08-31 2021-11-30 维沃移动通信(杭州)有限公司 Audio/video processing method and apparatus, and electronic device
WO2023155143A1 (zh) * 2022-02-18 2023-08-24 北京卓越乐享网络科技有限公司 Video production method and apparatus, electronic device, storage medium, and program product
CN116668763B (zh) * 2022-11-10 2024-04-19 荣耀终端有限公司 Screen recording method and apparatus
CN117255231B (zh) * 2023-11-10 2024-03-22 腾讯科技(深圳)有限公司 Virtual video synthesis method and apparatus, and related products

Citations (5)

Publication number Priority date Publication date Assignee Title
CN109348155A (zh) * 2018-11-08 2019-02-15 北京微播视界科技有限公司 Video recording method and apparatus, computer device, and storage medium
CN110913141A (zh) * 2019-11-29 2020-03-24 维沃移动通信有限公司 Video display method, electronic device, and medium
CN110971970A (zh) * 2019-11-29 2020-04-07 维沃移动通信有限公司 Video processing method and electronic device
CN112243064A (zh) * 2020-10-19 2021-01-19 维沃移动通信(深圳)有限公司 Audio processing method and apparatus
CN113727140A (zh) * 2021-08-31 2021-11-30 维沃移动通信(杭州)有限公司 Audio/video processing method and apparatus, and electronic device

Family Cites Families (7)

Publication number Priority date Publication date Assignee Title
CN101534386B (zh) * 2008-12-29 2010-08-25 北大方正集团有限公司 Video replacement method, video playback system, and apparatus
US9521437B2 (en) * 2009-06-17 2016-12-13 Google Technology Holdings LLC Insertion of recorded secondary digital video content during playback of primary digital video content
CN104967902B (zh) * 2014-09-17 2018-10-12 腾讯科技(北京)有限公司 Video sharing method, apparatus, and system
CN106851423B (zh) * 2017-03-31 2018-10-19 腾讯科技(深圳)有限公司 Online video playback method and related apparatus
CN107920274B (zh) * 2017-10-27 2020-08-04 优酷网络技术(北京)有限公司 Video processing method, client, and server
CN111954005B (zh) * 2019-05-17 2022-12-20 腾讯科技(深圳)有限公司 Multimedia resource transmission method and apparatus


Also Published As

Publication number Publication date
CN113727140A (zh) 2021-11-30


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 22863405; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 22863405; Country of ref document: EP; Kind code of ref document: A1)