WO2021008055A1 - Video synthesis method and apparatus, and terminal and storage medium - Google Patents

Video synthesis method and apparatus, and terminal and storage medium

Info

Publication number
WO2021008055A1
Authority
WO
WIPO (PCT)
Prior art keywords
video
time point
audio
videos
accent
Prior art date
Application number
PCT/CN2019/120302
Other languages
French (fr)
Chinese (zh)
Inventor
吴晗
李文涛
王森
陈恒全
Original Assignee
广州酷狗计算机科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 广州酷狗计算机科技有限公司 filed Critical 广州酷狗计算机科技有限公司
Publication of WO2021008055A1 publication Critical patent/WO2021008055A1/en

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H1/00 Details of electrophonic musical instruments
    • G10H1/36 Accompaniment arrangements
    • G10H1/40 Rhythm
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N5/00 Details of television systems
    • H04N5/222 Studio circuitry; Studio devices; Studio equipment
    • H04N5/262 Studio circuits, e.g. for mixing, switching-over, change of character of image, other special effects; Cameras specially adapted for the electronic generation of special effects
    • H04N5/265 Mixing

Definitions

  • This application relates to the field of video processing technology, and in particular to a method, device, terminal and storage medium for video synthesis.
  • the embodiments of the present application provide a method, device, terminal, and storage medium for video synthesis, which can solve the problem of low video synthesis efficiency.
  • the technical solution is as follows:
  • a method for video synthesis includes:
  • the multiple material videos and the material audio are synthesized to obtain a synthesized video, wherein the switching time point of each material video in the synthesized video is an accent beat time point of the material audio.
  • the determining a plurality of material videos in the material video set based on the accent beat time point includes:
  • a plurality of material videos are determined in the material video set.
  • the determining of a plurality of material videos in the material video set based on the number N of accent beat time points and the start time point and end time point of the material audio includes:
  • N+1 material videos are determined in the material video set.
  • the synthesizing the multiple material videos and the material audio based on the accent beat time point to obtain a synthesized video includes:
  • each sub-video is synthesized to obtain a synthesized material video
  • the synthesized material video and the material audio are synthesized to obtain a synthesized video.
  • the determining the sub-video corresponding to the currently acquired material video based on the currently acquired material video and the accent beat time point includes:
  • N+1 material videos are determined in the material video set.
  • the material video set is a total material video spliced together from a plurality of material videos.
  • the material video set is a video set including multiple independent material videos.
  • the acquisition of the material video set, the material audio, and the accent beat time point of the material audio sent by the server includes:
  • the accent beat time point of the material audio for synthesizing video is determined.
  • a video synthesis device in a second aspect, includes:
  • a sending module configured to send a material acquisition request to the server, wherein the material acquisition request carries characteristic information of the material audio;
  • An acquisition module configured to acquire the material video set, the material audio, and the accent beat time point of the material audio sent by the server;
  • a determining module configured to determine multiple material videos in the material video collection based on the accent beat time point
  • the synthesis module is configured to synthesize the multiple material videos and the material audio based on the accent beat time point to obtain a synthesized video, wherein the switching time point of each material video in the synthesized video is an accent beat time point of the material audio.
  • the determining module is used to:
  • a plurality of material videos are determined in the material video set.
  • the determining module is used to:
  • N+1 material videos are determined in the material video set.
  • the synthesis module is used for:
  • each sub-video is synthesized to obtain a synthesized material video
  • the synthesized material video and the material audio are synthesized to obtain a synthesized video.
  • the synthesis module is used for:
  • N+1 material videos are determined in the material video set.
  • the material video set is a total material video spliced together from a plurality of material videos.
  • the material video set is a video set including multiple independent material videos.
  • the obtaining module 1120 is configured to:
  • the accent beat time point of the material audio for synthesizing video is determined.
  • a terminal, in a third aspect, includes a processor and a memory, where at least one instruction is stored in the memory, and the instruction is loaded and executed by the processor to implement the operations performed by the video synthesis method described in the first aspect.
  • a computer-readable storage medium, in which at least one instruction is stored, where the instruction is loaded and executed by a processor to implement the operations performed by the video synthesis method described in the first aspect.
  • the material video set, the material audio, and the accent beat time points of the material audio are obtained from the server. Then, multiple material videos are selected from the material video set. Finally, based on the accent beat time points, the multiple material videos and the material audio are synthesized to obtain a synthesized video in which the switching time point of each material video is an accent beat time point of the material audio. In this way, the material can be obtained automatically and the material videos and material audio can be synthesized automatically without manual processing, so the efficiency is high.
  • FIG. 1 is a flowchart of a method for video synthesis provided by an embodiment of the present application
  • FIG. 2 is a schematic diagram of an application program interface provided by an embodiment of the present application.
  • FIG. 3 is a schematic diagram of an application program interface provided by an embodiment of the present application.
  • FIG. 4 is a schematic diagram of calculating the number of material videos provided by an embodiment of the present application.
  • FIG. 5 is a schematic diagram of calculating the number of material videos provided by an embodiment of the present application.
  • FIG. 6 is a schematic diagram of calculating the number of material videos according to an embodiment of the present application.
  • FIG. 7 is a schematic diagram of an application program interface provided by an embodiment of the present application.
  • FIG. 8 is a schematic diagram of calculating the duration of a sub video provided by an embodiment of the present application.
  • FIG. 9 is a schematic diagram of calculating the duration of a sub video provided by an embodiment of the present application.
  • FIG. 10 is a schematic diagram of calculating the duration of a sub video provided by an embodiment of the present application.
  • FIG. 11 is a schematic structural diagram of a video synthesis device provided by an embodiment of the present application.
  • FIG. 12 is a schematic structural diagram of a terminal provided by an embodiment of the present application.
  • the embodiment of the present application provides a method for video synthesis, which may be implemented by a terminal.
  • the terminal may be a mobile phone, a tablet computer, and so on.
  • An application program (hereinafter referred to as a video production application) that can be used to make a composite video is installed in the terminal.
  • the video production application can be a comprehensive application with a variety of functions, such as making composite videos, video recording, video playback, video editing, and live broadcasting, or it can be a single-function application with only the function of making composite videos.
  • the user can select music in the video production application, and obtain the material for making the synthesized video from the server through the application.
  • the material can include material audio corresponding to the music and some material videos.
  • the application program can synthesize the acquired materials based on this method to obtain a synthesized video.
  • a music playback application and a video production application can be installed in the terminal at the same time.
  • the music playback application can be a comprehensive application with a variety of functions, such as music playback, audio recording, and live broadcasting, or it can be a single-function application with only the function of music playback.
  • the user can select favorite music through the music playback application, and obtain materials through the video production application to make a composite video.
  • a case where a music playback application and a video production application are installed in the terminal is taken as an example for description.
  • Fig. 1 is a flowchart of a method for video synthesis provided by an embodiment of the present application. Referring to Fig. 1, this embodiment includes:
  • Step 101 Send a material acquisition request to a server, where the material acquisition request carries characteristic information of the material audio.
  • the feature information of the material audio may be a music name, a hash value of the music name, or a hash value of audio data, etc.
  • the feature information can uniquely identify the material audio, and the specific information is not limited here.
  • the terminal is installed with a music playing application and a video production application.
  • the music playback application provides the user with a music selection interface.
  • the music selection interface can include a search bar and a music list.
  • the music list can display information such as music names and music durations.
  • the user can select favorite music in the above-mentioned music list, or search for favorite music through the search bar, and select the music.
  • the music playback application can jump to the music playback interface as shown in Figure 3.
  • the music playback interface can display the lyrics, music name, artist name, music playback progress, and other information of the currently playing music.
  • the submission option is the option that triggers video synthesis. When the user selects the submission option, it indicates that the user wants to use the currently playing music as the material audio to make a composite video.
  • the music playback application starts the video production application installed in the terminal through the system and sends the characteristic information of the currently playing music (i.e., the material audio) to the video production application. Then, the video production application sends a material acquisition request to the server through the terminal, where the material acquisition request carries the characteristic information of the material audio.
  • the server may be a background server of the video production application.
  • Step 102 Obtain the material video set, the material audio, and the accent beat time point of the material audio sent by the server.
  • the accent beat time point is the time point corresponding to the beat point with the beat value of 1 in the material audio.
  • the server may store audio dot data.
  • the dot data includes the beat time point and the beat value.
  • the beat time points and corresponding beat values can be collected and generated automatically by machine according to the BPM (Beats Per Minute, the number of beats per minute) of the audio data, beat information, and so on, or can be manually annotated by a technician listening to the audio data.
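The dot-data description above can be reduced to a small helper. This is an illustrative sketch, not the patent's implementation: it assumes dot data arrives as (time point, beat value) pairs and that a beat value of 1 marks an accent beat, as stated above.

```python
# Illustrative sketch (not the patent's code): extract accent beat time
# points from dot data, assuming each entry pairs a beat time point in
# seconds with a beat value, where a beat value of 1 marks an accent beat.
def accent_beat_time_points(dot_data):
    """dot_data: list of (time_point_s, beat_value) tuples."""
    return [t for t, beat_value in dot_data if beat_value == 1]

# Example: 4/4 audio at 120 BPM, i.e. a beat every 0.5 s, with every
# fourth beat accented (beat value 1).
dot_data = [(i * 0.5, 1 if i % 4 == 0 else (i % 4) + 1) for i in range(8)]
print(accent_beat_time_points(dot_data))  # [0.0, 2.0]
```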
  • the server can also store multiple material videos, and score the material videos according to their size, image quality, clarity and other indicators, and then select a preset number of material videos with higher scores from the stored material videos.
  • the preset number can be specified by the technician according to the general needs of the user, for example 20.
  • the server can cut each material video to a preset duration.
  • the preset duration can also be specified by technicians according to common user needs, for example 7 s. Because the interval between two accent beats in typical audio is usually 2 s to 4 s, setting the preset duration to more than 4 s helps avoid the clipped material video being shorter than the interval between two accent beats.
  • because the material videos need to be transmitted to the terminal, and considering transmission delay, the duration of a material video should not be too long; it can be 6 s to 10 s. The server can perform the above processing after receiving the material acquisition request sent by the terminal, or it can perform the processing in advance and store the processed material videos, so that after receiving a material acquisition request it can directly obtain the locally stored processed material videos, thereby improving the overall efficiency of making composite videos.
  • after the server cuts the preset number of material videos, it can use these material videos as the material video set sent to the terminal. In addition, considering transmission delay, the server can also splice these cut material videos together to obtain a total material video and send the total material video to the terminal as the material video set, so that the transmission delay is small. It should also be noted that when the total material video is used as the material video set, the server also needs to send the separation time points of each material video within the total material video to the terminal. For example, if the total material video consists of five 7 s material videos, the separation time points include 0:07 (0 minutes and 7 seconds), 0:14, 0:21, 0:28, and 0:35.
  • after receiving the material acquisition request, the server obtains the corresponding material audio and the dot data of the material audio according to the characteristic information of the material audio. Because the subsequent synthesis mainly uses the accent beat time points in the dot data (the time points corresponding to beat points with a beat value of 1), the server can send only the accent beat time points, together with the material audio and the material video set, to the terminal to reduce the amount of data transmitted. The terminal receives the material video set, the material audio, and the accent beat time points of the material audio sent by the server.
  • the material audio received by the terminal as described above is the original material audio.
  • the terminal can also cut the original material audio.
  • the processing can be as follows: the server can also send a preset cutting time point to the terminal; based on the preset cutting time point and a preset cutting duration, the original material audio is cut to obtain the material audio for synthesizing the video, and the accent beat time points of that material audio are determined from the accent beat time points of the original material audio.
  • the preset cutting time point can be a time point determined by a technician based on the rhythm of the material audio and other considerations, or it can be the climax time point of the material audio.
  • the climax time point can be manually marked by a technician or collected by machine. If the server sends both of these time points to the terminal, the terminal preferentially uses the time point determined by the technician based on the rhythm of the audio data and other considerations.
  • after obtaining the preset cutting time point and the original material audio, the terminal intercepts, from the original material audio, the audio of the preset cutting duration after the preset cutting time point as the material audio for synthesizing the video.
  • alternatively, the terminal can intercept the audio of the preset cutting duration after the start time point of the original material audio as the material audio for synthesizing the video.
  • the material audio in the following steps are all material audio used to synthesize the video, and correspondingly, the accent beat time point is also the accent beat time point of the material audio used to synthesize the video.
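The cutting step above can be sketched as follows. This is an assumption-laden illustration (the names are invented here), showing only how the accent beat time points of the cut material audio would be derived from those of the original audio:

```python
# Illustrative sketch: keep only the accent beat time points that fall
# inside the cut window [cut_point_s, cut_point_s + cut_duration_s),
# re-based so that the cut material audio starts at time 0.
def rebased_accent_points(accent_points_s, cut_point_s, cut_duration_s):
    end = cut_point_s + cut_duration_s
    return [t - cut_point_s for t in accent_points_s if cut_point_s <= t < end]

# Original audio with accent beats every 2 s; cut 8 s starting at the
# preset cutting time point t = 5 s.
print(rebased_accent_points(list(range(0, 20, 2)), 5, 8))  # [1, 3, 5, 7]
```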
  • Step 103 Based on the accent beat time point, multiple material videos are determined in the material video collection.
  • the terminal may determine multiple material videos in the acquired material video set based on the number N of accent beat time points, the starting time point and the ending time point of the material audio.
  • depending on whether an accent beat time point coincides with the start time point or the end time point of the material audio, the following situations can arise when determining the material videos:
  • for example, if the number of accent beat time points is 5, the start time point of the material audio is an accent beat time point, and the end time point is not, this is equivalent to the accent beat time points dividing the material audio into 5 parts.
  • each part can correspond to one material video, so 5 material videos can be determined in the material video set.
  • if both the start time point and the end time point are accent beat time points, the material audio is divided into 4 parts, and each part can correspond to one material video, so 4 material videos can be determined in the material video set.
  • if the number of accent beat time points is 5 and neither the start time point nor the end time point of the material audio is an accent beat time point, this is equivalent to the accent beat time points dividing the material audio into 6 parts, so N+1 material videos can be determined in the material video set.
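The cases above reduce to a single count: the N accent beat time points cut the material audio into segments, one material video per segment, and an accent that coincides with the start or end time point does not add an interior boundary. A sketch (function name illustrative):

```python
# Illustrative sketch of the case analysis above: count the material
# videos needed as the number of audio segments produced by the accent
# beat time points. Accents coinciding with the start or end of the
# audio are not interior boundaries, so they do not add a segment.
def material_video_count(n_accents, start_is_accent, end_is_accent):
    interior = n_accents - int(start_is_accent) - int(end_is_accent)
    return interior + 1

print(material_video_count(5, True, False))   # 5   (N)
print(material_video_count(5, True, True))    # 4   (N - 1)
print(material_video_count(5, False, False))  # 6   (N + 1)
```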
  • the number of material videos included in the material video set is less than the calculated number of material videos that need to be determined, then all the material videos in the material video set can be determined.
  • the N-1, N, or N+1 material videos determined above can be selected at random when the material video set is a set of multiple independent material videos. When the material video set is a total material video, in addition to random selection, the videos can also be selected sequentially from first to last, from last to first, or at intervals starting from the first.
  • the specific selection method is not limited in the embodiment of this application.
  • Step 104 Based on the accent beat time points, synthesize the multiple material videos and the material audio to obtain a synthesized video, where the switching time point of each material video in the synthesized video is an accent beat time point of the material audio.
  • when synthesizing the video, the terminal can randomly determine the synthesis order of the material videos.
  • if the material video set is a total material video, the synthesis order can also be determined according to each material video's position in the total material video.
  • following the synthesis order, the material videos are obtained one by one, and for each material video obtained, the sub-video corresponding to the currently obtained material video is determined based on that material video and the accent beat time points.
  • each sub-video is then synthesized to obtain a synthesized material video.
  • switching special effects, such as fade-in, fade-out, pop-in, and louvered (blinds) transitions, can be added between sub-videos; the switching special effects and their durations can be preset by technicians according to common user needs. Then, the synthesized material video and the material audio are synthesized to obtain the synthesized video. Finally, the synthesized video can be played automatically by the video production application; as shown in Figure 7, the synthesized video plays in the middle of the display interface of the video production application.
  • Case 1: If the currently acquired material video is first in the synthesis order, determine the first time length between the start time point of the material audio and the first accent beat time point after, and closest to, the start time point. In the material video, a video of the first time length is intercepted from the start time point of the material video as the first sub-video corresponding to the material video.
  • Case 2: If the currently acquired material video is not first in the synthesis order, determine the first total duration of the already generated sub-videos, and determine the first time point that is the first total duration after the start time point of the material audio. Then determine whether there is a second accent beat time point after, and closest to, the first time point. If there is, determine the second time length between the first time point and the second accent beat time point; in the material video, a video of the second time length is intercepted from the start time point of the material video as the second sub-video corresponding to the material video. If there is no second accent beat time point, determine the third time length from the first time point to the end time point of the material audio; in the material video, a video of the third time length is intercepted from the start time point of the material video as the third sub-video corresponding to the material video.
  • for example, suppose the duration of the material audio is 15 s, its start time point is 0:00, and its end time point is 0:15.
  • if the first duration between the start time point 0:00 and the first accent beat time point 0:03 is 3 s, then 3 s are intercepted from the start time point of the material video as the corresponding first sub-video.
  • suppose next that the first total duration of the generated sub-videos is 13 s, so the first time point after the start time point of the material audio is 0:13.
  • if the second accent beat time point is 0:14, the second duration from 0:13 to 0:14 is 1 s, and 1 s is intercepted from the start time point of the material video as the corresponding second sub-video.
  • if there is no second accent beat time point after 0:13, the third duration from 0:13 to the end time point 0:15 is 2 s, and 2 s are intercepted from the start time point of the material video as the corresponding third sub-video.
  • the duration of the finally obtained synthesized video is the total duration of the sub-videos.
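The duration logic of Cases 1 and 2, including the worked example above, can be sketched as one loop (illustrative, not the patent's code): each sub-video runs from the end of the sub-videos generated so far to the next accent beat time point, or to the end of the audio if none remains.

```python
# Illustrative sketch of the sub-video duration logic in Step 104.
def sub_video_durations(audio_duration_s, accent_points_s):
    durations, elapsed = [], 0.0
    while elapsed < audio_duration_s:
        # Next accent beat time point strictly after the elapsed total.
        nxt = next((t for t in accent_points_s if t > elapsed), None)
        if nxt is None or nxt > audio_duration_s:
            durations.append(audio_duration_s - elapsed)  # no accent left
        else:
            durations.append(nxt - elapsed)               # up to next accent
        elapsed += durations[-1]
    return durations

# 15 s material audio with accent beats at 3 s and 14 s.
print(sub_video_durations(15.0, [3.0, 14.0]))  # [3.0, 11.0, 1.0]
```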
  • the method provided by the embodiments of the present application obtains the material video set, the material audio, and the accent beat time points of the material audio from the server. Then, multiple material videos are selected from the material video set, and finally, based on the accent beat time points, the multiple material videos and the material audio are synthesized to obtain a synthesized video in which the switching time point of each material video is an accent beat time point of the material audio. In this way, the terminal can automatically obtain the material and automatically synthesize the material videos and material audio without manual processing, so the efficiency is higher.
  • in addition, a composite video can be made from the corresponding material audio and the shared material videos stored in the server.
  • the composite video can be used as a demo (sample) corresponding to the music and automatically played for users, which can attract users to enter the above-mentioned video production application to make composite videos themselves.
  • an embodiment of the present application also provides a device for video synthesis.
  • the device may be the terminal in the foregoing embodiment.
  • the device includes: a sending module 1110, an acquiring module 1120, a determining module 1130, and a synthesis module 1140.
  • the sending module 1110 is configured to send a material acquisition request to the server, where the material acquisition request carries feature information of the material audio;
  • the obtaining module 1120 is configured to obtain the material video set, the material audio, and the accent beat time point of the material audio sent by the server;
  • the determining module 1130 is configured to determine multiple material videos in the material video set based on the accent beat time point;
  • the synthesis module 1140 is configured to synthesize the multiple material videos and the material audio based on the accent beat time point to obtain a synthesized video, wherein the switching time point of each material video in the synthesized video is an accent beat time point of the material audio.
  • the determining module 1130 is configured to:
  • a plurality of material videos are determined in the material video set.
  • the determining module 1130 is configured to:
  • N+1 material videos are determined in the material video set.
  • the synthesis module 1140 is configured to:
  • each sub-video is synthesized to obtain a synthesized material video
  • the synthesized material video and the material audio are synthesized to obtain a synthesized video.
  • the synthesis module 1140 is configured to:
  • N+1 material videos are determined in the material video set.
  • the material video set is a total material video spliced together from a plurality of material videos.
  • the material video set is a video set including multiple independent material videos.
  • the obtaining module 1120 is used to:
  • the accent beat time point of the material audio for synthesizing video is determined.
  • the video synthesis device provided in the above embodiment is illustrated using the division of the above functional modules only as an example.
  • in practical applications, the above functions can be allocated to different functional modules as needed; that is, the internal structure of the terminal can be divided into different functional modules to complete all or part of the functions described above.
  • the video synthesis device provided in the foregoing embodiment and the video synthesis method embodiment belong to the same concept, and the specific implementation process is detailed in the method embodiment, which will not be repeated here.
  • FIG. 12 shows a structural block diagram of a terminal 1200 provided by an exemplary embodiment of the present application.
  • the terminal 1200 can be a smart phone, a tablet computer, an MP3 (Moving Picture Experts Group Audio Layer III) player, an MP4 (Moving Picture Experts Group Audio Layer IV) player, a laptop, or a desktop computer.
  • the terminal 1200 may also be called user equipment, portable terminal, laptop terminal, desktop terminal and other names.
  • the terminal 1200 includes a processor 1201 and a memory 1202.
  • the processor 1201 may include one or more processing cores, such as a 4-core processor, an 8-core processor, and so on.
  • the processor 1201 may be implemented in at least one hardware form among DSP (Digital Signal Processing), FPGA (Field-Programmable Gate Array), and PLA (Programmable Logic Array).
  • the processor 1201 may also include a main processor and a coprocessor.
  • the main processor is a processor used to process data in the awake state, also called a CPU (Central Processing Unit); the coprocessor is a low-power processor used to process data in the standby state.
  • the processor 1201 may be integrated with a GPU (Graphics Processing Unit), which is used to render and draw the content that needs to be displayed on the display screen.
  • the processor 1201 may also include an AI (Artificial Intelligence) processor, which is used to process computing operations related to machine learning.
  • the memory 1202 may include one or more computer-readable storage media, which may be non-transitory.
  • the memory 1202 may also include high-speed random access memory and non-volatile memory, such as one or more magnetic disk storage devices and flash memory storage devices.
  • the non-transitory computer-readable storage medium in the memory 1202 is used to store at least one instruction, and the at least one instruction is executed by the processor 1201 to implement the video synthesis method provided in the method embodiments of the present application.
  • the terminal 1200 may further include: a peripheral device interface 1203 and at least one peripheral device.
  • the processor 1201, the memory 1202, and the peripheral device interface 1203 may be connected by a bus or a signal line.
  • Each peripheral device can be connected to the peripheral device interface 1203 through a bus, a signal line or a circuit board.
  • the peripheral device includes: at least one of a radio frequency circuit 1204, a touch display screen 1205, a camera 1206, an audio circuit 1207, a positioning component 1208, and a power supply 1209.
  • the peripheral device interface 1203 may be used to connect at least one peripheral device related to I/O (Input/Output) to the processor 1201 and the memory 1202.
  • in some embodiments, the processor 1201, the memory 1202, and the peripheral device interface 1203 are integrated on the same chip or circuit board; in some other embodiments, any one or two of the processor 1201, the memory 1202, and the peripheral device interface 1203 may be implemented on a separate chip or circuit board, which is not limited in this embodiment.
  • the radio frequency circuit 1204 is used for receiving and transmitting RF (Radio Frequency, radio frequency) signals, also called electromagnetic signals.
  • the radio frequency circuit 1204 communicates with a communication network and other communication devices through electromagnetic signals.
  • the radio frequency circuit 1204 converts electrical signals into electromagnetic signals for transmission, or converts received electromagnetic signals into electrical signals.
  • the radio frequency circuit 1204 includes: an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a user identity module card, and so on.
  • the radio frequency circuit 1204 can communicate with other terminals through at least one wireless communication protocol.
  • the wireless communication protocol includes but is not limited to: metropolitan area networks, various generations of mobile communication networks (2G, 3G, 4G, and 5G), wireless local area networks and/or WiFi (Wireless Fidelity) networks.
  • the radio frequency circuit 1204 may also include NFC (Near Field Communication) related circuits, which is not limited in this application.
  • the display screen 1205 is used to display a UI (User Interface).
  • the UI can include graphics, text, icons, videos, and any combination thereof.
  • the display screen 1205 also has the ability to collect touch signals on or above the surface of the display screen 1205.
  • the touch signal may be input to the processor 1201 as a control signal for processing.
  • the display screen 1205 may also be used to provide virtual buttons and/or virtual keyboards, also called soft buttons and/or soft keyboards.
  • in some embodiments, there is one display screen 1205, which is disposed on the front panel of the terminal 1200; in other embodiments, there are at least two display screens 1205, which are respectively arranged on different surfaces of the terminal 1200 or adopt a folded design; in still other embodiments, the display screen 1205 may be a flexible display screen disposed on a curved or folding surface of the terminal 1200. Furthermore, the display screen 1205 can also be set in a non-rectangular irregular pattern, that is, a special-shaped screen.
  • the display screen 1205 may be made of materials such as LCD (Liquid Crystal Display) and OLED (Organic Light-Emitting Diode).
  • the camera assembly 1206 is used to capture images or videos.
  • the camera assembly 1206 includes a front camera and a rear camera.
  • the front camera is set on the front panel of the terminal, and the rear camera is set on the back of the terminal.
  • in some embodiments, there are at least two rear cameras, each being one of a main camera, a depth-of-field camera, a wide-angle camera, and a telephoto camera, so as to realize the background blur function through fusion of the main camera and the depth-of-field camera, realize panoramic shooting and VR (Virtual Reality) shooting through fusion with the wide-angle camera, or realize other fusion shooting functions.
  • the camera assembly 1206 may also include a flash.
  • the flash can be a single-color-temperature flash or a dual-color-temperature flash. A dual-color-temperature flash refers to a combination of a warm-light flash and a cold-light flash, and can be used for light compensation under different color temperatures.
  • the audio circuit 1207 may include a microphone and a speaker.
  • the microphone is used to collect sound waves from the user and the environment, convert the sound waves into electrical signals, and input them to the processor 1201 for processing, or input them to the radio frequency circuit 1204 to implement voice communication. For the purpose of stereo collection or noise reduction, there may be multiple microphones, respectively arranged at different parts of the terminal 1200.
  • the microphone can also be an array microphone or an omnidirectional acquisition microphone.
  • the speaker is used to convert the electrical signal from the processor 1201 or the radio frequency circuit 1204 into sound waves.
  • the speaker can be a traditional membrane speaker or a piezoelectric ceramic speaker.
  • when the speaker is a piezoelectric ceramic speaker, it can not only convert electrical signals into sound waves audible to humans, but also convert electrical signals into sound waves inaudible to humans for purposes such as distance measurement.
  • the audio circuit 1207 may also include a headphone jack.
  • the positioning component 1208 is used to locate the current geographic location of the terminal 1200 to implement navigation or LBS (Location Based Service).
  • the positioning component 1208 may be a positioning component based on the GPS (Global Positioning System) of the United States, the Beidou system of China, the GLONASS system of Russia, or the Galileo system of the European Union.
  • the power supply 1209 is used to supply power to various components in the terminal 1200.
  • the power source 1209 may be alternating current, direct current, disposable batteries or rechargeable batteries.
  • the rechargeable battery may support wired charging or wireless charging.
  • the rechargeable battery can also be used to support fast charging technology.
  • the terminal 1200 further includes one or more sensors 1210.
  • the one or more sensors 1210 include, but are not limited to: an acceleration sensor 1211, a gyroscope sensor 1212, a pressure sensor 1213, a fingerprint sensor 1214, an optical sensor 1215, and a proximity sensor 1216.
  • the acceleration sensor 1211 can detect the magnitude of acceleration on the three coordinate axes of the coordinate system established by the terminal 1200.
  • the acceleration sensor 1211 can be used to detect the components of the gravitational acceleration on three coordinate axes.
  • the processor 1201 may control the touch screen 1205 to display the user interface in a horizontal view or a vertical view according to the gravity acceleration signal collected by the acceleration sensor 1211.
  • the acceleration sensor 1211 can also be used for game or user motion data collection.
  • the gyroscope sensor 1212 can detect the body direction and rotation angle of the terminal 1200, and the gyroscope sensor 1212 can cooperate with the acceleration sensor 1211 to collect the user's 3D actions on the terminal 1200.
  • the processor 1201 can implement the following functions according to the data collected by the gyroscope sensor 1212: motion sensing (such as changing the UI according to the user's tilt operation), image stabilization during shooting, game control, and inertial navigation.
  • the pressure sensor 1213 may be disposed on the side frame of the terminal 1200 and/or the lower layer of the touch screen 1205.
  • the processor 1201 performs left- and right-hand recognition or quick operations according to the holding signal collected by the pressure sensor 1213.
  • the processor 1201 controls the operability controls on the UI according to the user's pressure operation on the touch display screen 1205.
  • the operability control includes at least one of a button control, a scroll bar control, an icon control, and a menu control.
  • the fingerprint sensor 1214 is used to collect the user's fingerprint.
  • the processor 1201 identifies the user's identity according to the fingerprint collected by the fingerprint sensor 1214, or the fingerprint sensor 1214 itself identifies the user's identity according to the collected fingerprint. When the user's identity is recognized as a trusted identity, the processor 1201 authorizes the user to perform related sensitive operations, including unlocking the screen, viewing encrypted information, downloading software, paying, and changing settings.
  • the fingerprint sensor 1214 may be provided on the front, back or side of the terminal 1200. When a physical button or a manufacturer logo is provided on the terminal 1200, the fingerprint sensor 1214 can be integrated with the physical button or the manufacturer logo.
  • the optical sensor 1215 is used to collect the ambient light intensity.
  • the processor 1201 may control the display brightness of the touch screen 1205 according to the intensity of the ambient light collected by the optical sensor 1215. Specifically, when the ambient light intensity is high, the display brightness of the touch display screen 1205 is increased; when the ambient light intensity is low, the display brightness of the touch display screen 1205 is decreased.
  • the processor 1201 may also dynamically adjust the shooting parameters of the camera assembly 1206 according to the ambient light intensity collected by the optical sensor 1215.
  • the proximity sensor 1216, also called a distance sensor, is usually arranged on the front panel of the terminal 1200.
  • the proximity sensor 1216 is used to collect the distance between the user and the front of the terminal 1200.
  • when the proximity sensor 1216 detects that the distance between the user and the front of the terminal 1200 gradually decreases, the processor 1201 controls the touch display screen 1205 to switch from the bright-screen state to the rest-screen state; when the proximity sensor 1216 detects that the distance between the user and the front of the terminal 1200 gradually increases, the processor 1201 controls the touch display screen 1205 to switch from the rest-screen state to the bright-screen state.
  • the structure shown in FIG. 12 does not constitute a limitation on the terminal 1200, which may include more or fewer components than shown, combine certain components, or adopt a different component arrangement.
  • a computer-readable storage medium includes a memory storing instructions.
  • the instructions can be executed by a processor in a terminal to complete the video synthesis method in the foregoing embodiment.
  • the computer-readable storage medium may be non-transitory.
  • the computer-readable storage medium may be ROM (Read-Only Memory), RAM (Random Access Memory), CD-ROM, magnetic tape, floppy disk, optical data storage device, etc.

Abstract

Disclosed is a video synthesis method, which belongs to the technical field of video processing. The method comprises: sending a material acquisition request to a server, wherein the material acquisition request carries feature information of material audio; acquiring a material video set, the material audio, and the accent beat time points of the material audio sent by the server; determining, on the basis of the accent beat time points, a plurality of material videos from the material video set; and synthesizing, on the basis of the accent beat time points, the plurality of material videos and the material audio to obtain a synthesized video, wherein the switching time point of each material video in the synthesized video is an accent beat time point of the audio data. By means of the present application, video synthesis efficiency can be improved.

Description

Method, device, terminal and storage medium for video synthesis
This application claims priority to the Chinese patent application No. 201910647507.9, filed on July 17, 2019 and entitled "Method, device, terminal and storage medium for video synthesis", the entire content of which is incorporated herein by reference.
Technical field
This application relates to the field of video processing technology, and in particular to a method, device, terminal and storage medium for video synthesis.
Background
In daily life, people usually want to use their favorite music as background music to make short videos.
Generally speaking, when making a short video, one needs to collect material videos oneself, then use video editing software to splice the collected material videos together, and add favorite music as background music to obtain a synthesized video.
In the process of implementing this application, the inventor found that the prior art has at least the following problems:
The above process of making a synthesized video must be completed manually, which is cumbersome and has low synthesis efficiency.
Summary of the invention
The embodiments of the present application provide a method, device, terminal, and storage medium for video synthesis, which can solve the problem of low video synthesis efficiency. The technical solution is as follows:
In a first aspect, a method for video synthesis is provided, and the method includes:
sending a material acquisition request to a server, where the material acquisition request carries characteristic information of material audio;
acquiring the material video set, the material audio, and the accent beat time points of the material audio sent by the server;
determining multiple material videos in the material video set based on the accent beat time points;
synthesizing the multiple material videos and the material audio based on the accent beat time points to obtain a synthesized video, wherein the switching time point of each material video in the synthesized video is an accent beat time point of the audio data.
Optionally, the determining multiple material videos in the material video set based on the accent beat time points includes:
determining multiple material videos in the material video set based on the number N of accent beat time points and the start time point and end time point of the material audio.
Optionally, the determining multiple material videos in the material video set based on the number N of accent beat time points and the start time point and end time point of the material audio includes:
if exactly one of the start time point and the end time point of the material audio is an accent beat time point, determining N material videos in the material video set;
if both the start time point and the end time point of the material audio are accent beat time points, determining N-1 material videos in the material video set;
if neither the start time point nor the end time point of the material audio is an accent beat time point, determining N+1 material videos in the material video set.
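The three cases above reduce to one counting rule: the selected clips fill the segments between consecutive boundary points, where the boundaries are the audio's start and end points plus its accent beat time points. A minimal Python sketch of that rule (function and parameter names are illustrative, not from the application):

```python
def num_material_videos(accent_points, start, end):
    """Number of material videos to select for N accent beat time points,
    depending on whether the audio's start/end points are accent points."""
    n = len(accent_points)
    start_is_accent = start in accent_points
    end_is_accent = end in accent_points
    if start_is_accent and end_is_accent:
        return n - 1  # both endpoints are accent beat time points
    if start_is_accent or end_is_accent:
        return n      # exactly one endpoint is an accent beat time point
    return n + 1      # neither endpoint is an accent beat time point
```

For example, with accent points at 0 s, 2 s and 4 s in a 0–4 s audio, both endpoints are accent points and N-1 = 2 clips are needed.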
Optionally, the synthesizing the multiple material videos and the material audio based on the accent beat time points to obtain a synthesized video includes:
determining the synthesis order of each material video when synthesizing the video;
acquiring the material videos one by one according to the synthesis order, and, for each acquired material video, determining the sub-video corresponding to the currently acquired material video based on the currently acquired material video and the accent beat time points;
synthesizing each sub-video based on the synthesis order to obtain a synthesized material video, and synthesizing the synthesized material video and the material audio to obtain a synthesized video.
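Since each clip switch must land on an accent beat time point, each sub-video spans one interval between consecutive boundary points, so the sub-video taken from the i-th material video (in synthesis order) has the duration of the i-th interval. A sketch of that pairing under this assumed reading (names illustrative):

```python
def segment_durations(accent_points, start, end):
    """Durations of the consecutive segments bounded by the audio's start
    and end points and its accent beat time points (all in seconds)."""
    boundaries = sorted(set([start, end] + list(accent_points)))
    return [b - a for a, b in zip(boundaries, boundaries[1:])]

def pair_sub_videos(ordered_videos, accent_points, start, end):
    """Pair each material video, in synthesis order, with the duration of
    the segment its sub-video must fill; the sub-video would then be cut
    from that material video to this length."""
    return list(zip(ordered_videos, segment_durations(accent_points, start, end)))
```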
Optionally, the determining the sub-video corresponding to the currently acquired material video based on the currently acquired material video and the accent beat time points includes:
if exactly one of the start time point and the end time point of the material audio is an accent beat time point, determining N material videos in the material video set;
if both the start time point and the end time point of the material audio are accent beat time points, determining N-1 material videos in the material video set;
if neither the start time point nor the end time point of the material audio is an accent beat time point, determining N+1 material videos in the material video set.
Optionally, the material video set is one total material video spliced from multiple material videos.
Optionally, the material video set is a video collection including multiple independent material videos.
Optionally, the acquiring the material video set, the material audio, and the accent beat time points of the material audio sent by the server includes:
receiving the material video set, the original material audio, the accent beat time points of the original material audio, and a preset cutting time point sent by the server;
cutting the original material audio based on the preset cutting time point and a preset cutting duration to obtain material audio for synthesizing the video;
determining, among the accent beat time points of the original material audio, the accent beat time points of the material audio used for synthesizing the video.
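One plausible reading of this trimming step is: cut the original audio at the preset cutting time point for the preset duration, then keep only those accent beat time points of the original audio that fall inside the cut range, re-expressed on the clip's own timeline. The text does not spell out the time shift, so the sketch below is an illustrative assumption:

```python
def accents_after_cut(original_accents, cut_start, cut_length):
    """Accent beat time points of the trimmed clip, derived from those of
    the original material audio (all times in seconds)."""
    cut_end = cut_start + cut_length
    return [t - cut_start for t in original_accents if cut_start <= t <= cut_end]
```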
In a second aspect, a device for video synthesis is provided, and the device includes:
a sending module, configured to send a material acquisition request to a server, where the material acquisition request carries characteristic information of material audio;
an acquisition module, configured to acquire the material video set, the material audio, and the accent beat time points of the material audio sent by the server;
a determining module, configured to determine multiple material videos in the material video set based on the accent beat time points;
a synthesis module, configured to synthesize the multiple material videos and the material audio based on the accent beat time points to obtain a synthesized video, wherein the switching time point of each material video in the synthesized video is an accent beat time point of the audio data.
Optionally, the determining module is configured to:
determine multiple material videos in the material video set based on the number N of accent beat time points and the start time point and end time point of the material audio.
Optionally, the determining module is configured to:
if exactly one of the start time point and the end time point of the material audio is an accent beat time point, determine N material videos in the material video set;
if both the start time point and the end time point of the material audio are accent beat time points, determine N-1 material videos in the material video set;
if neither the start time point nor the end time point of the material audio is an accent beat time point, determine N+1 material videos in the material video set.
Optionally, the synthesis module is configured to:
determine the synthesis order of each material video when synthesizing the video;
acquire the material videos one by one according to the synthesis order, and, for each acquired material video, determine the sub-video corresponding to the currently acquired material video based on the currently acquired material video and the accent beat time points;
synthesize each sub-video based on the synthesis order to obtain a synthesized material video, and synthesize the synthesized material video and the material audio to obtain a synthesized video.
Optionally, the synthesis module is configured to:
if exactly one of the start time point and the end time point of the material audio is an accent beat time point, determine N material videos in the material video set;
if both the start time point and the end time point of the material audio are accent beat time points, determine N-1 material videos in the material video set;
if neither the start time point nor the end time point of the material audio is an accent beat time point, determine N+1 material videos in the material video set.
Optionally, the material video set is one total material video spliced from multiple material videos.
Optionally, the material video set is a video collection including multiple independent material videos.
Optionally, the acquisition module 1120 is configured to:
receive the material video set, the original material audio, the accent beat time points of the original material audio, and a preset cutting time point sent by the server;
cut the original material audio based on the preset cutting time point and a preset cutting duration to obtain material audio for synthesizing the video;
determine, among the accent beat time points of the original material audio, the accent beat time points of the material audio used for synthesizing the video.
In a third aspect, a terminal is provided. The terminal includes a processor and a memory, and at least one instruction is stored in the memory; the instruction is loaded and executed by the processor to implement the operations performed by the method for video synthesis described in the first aspect above.
In a fourth aspect, a computer-readable storage medium is provided. At least one instruction is stored in the computer-readable storage medium, and the instruction is loaded and executed by a processor to implement the operations performed by the method for video synthesis described in the first aspect above.
The beneficial effects brought about by the technical solutions provided by the embodiments of this application are as follows:
The material video set, the material audio, and the accent beat time points of the material audio are obtained from the server. Then, multiple material videos are selected from the material video set. Finally, based on the accent beat time points, the multiple material videos and the material audio are synthesized to obtain a synthesized video. In the obtained synthesized video, the switching time point of each material video is an accent beat time point of the audio data. In this way, the material can be obtained automatically, and the material videos and the material audio can be synthesized automatically without manual processing, so the efficiency is high.
Description of the drawings
In order to more clearly describe the technical solutions in the embodiments of the present application, the following briefly introduces the drawings needed in the description of the embodiments. Obviously, the drawings in the following description are only some embodiments of the present application; for those of ordinary skill in the art, other drawings can be obtained from these drawings without creative work.
FIG. 1 is a flowchart of a method for video synthesis provided by an embodiment of the present application;
FIG. 2 is a schematic diagram of an application program interface provided by an embodiment of the present application;
FIG. 3 is a schematic diagram of an application program interface provided by an embodiment of the present application;
FIG. 4 is a schematic diagram of calculating the number of material videos provided by an embodiment of the present application;
FIG. 5 is a schematic diagram of calculating the number of material videos provided by an embodiment of the present application;
FIG. 6 is a schematic diagram of calculating the number of material videos provided by an embodiment of the present application;
FIG. 7 is a schematic diagram of an application program interface provided by an embodiment of the present application;
FIG. 8 is a schematic diagram of calculating the duration of a sub-video provided by an embodiment of the present application;
FIG. 9 is a schematic diagram of calculating the duration of a sub-video provided by an embodiment of the present application;
FIG. 10 is a schematic diagram of calculating the duration of a sub-video provided by an embodiment of the present application;
FIG. 11 is a schematic structural diagram of a device for video synthesis provided by an embodiment of the present application;
FIG. 12 is a schematic structural diagram of a terminal provided by an embodiment of the present application.
Detailed description
In order to make the objectives, technical solutions, and advantages of the present application clearer, the embodiments of the present application are described in further detail below with reference to the accompanying drawings.
The embodiments of the present application provide a method for video synthesis, which may be implemented by a terminal. The terminal may be a mobile phone, a tablet computer, and so on. An application program that can be used to make synthesized videos (hereinafter referred to as a video production application) is installed in the terminal. The video production application may be a comprehensive application with a variety of functions, such as making synthesized videos, video recording, video playback, video editing, and live broadcast, or it may be a single-function application that only has the function of making synthesized videos.
The user can select music in the video production application and obtain, from the server through the application, the materials for making a synthesized video; the materials may include the material audio corresponding to the music and some material videos. The application can then synthesize the acquired materials based on this method to obtain a synthesized video.
In addition, a music playback application and a video production application may be installed in the terminal at the same time. Similarly, the music playback application may be a comprehensive application with a variety of functions, such as music playback, audio recording, and live broadcast, or it may be a single-function application that only has the function of music playback. The user can select favorite music through the music playback application and obtain materials through the video production application to make a synthesized video. In the following specific embodiments, the case where a music playback application and a video production application are both installed in the terminal is taken as an example for description.
FIG. 1 is a flowchart of a method for video synthesis provided by an embodiment of the present application. Referring to FIG. 1, this embodiment includes:
Step 101: Send a material acquisition request to a server, where the material acquisition request carries characteristic information of the material audio.
The characteristic information of the material audio may be a music name, a hash value of the music name, or a hash value of the audio data, etc. The characteristic information only needs to uniquely identify the material audio; the specific kind of information is not limited here.
In implementation, a music playback application and a video production application are installed in the terminal. As shown in FIG. 2, the music playback application provides the user with a music selection interface, which may include a search bar and a music list; the music list may display information such as the music name and duration of each piece of music.
The user can select favorite music from the music list, or search for favorite music through the search bar and select it. After the user selects a piece of music, the music playback application can jump to the music playback interface shown in FIG. 3, which may display the lyrics of the currently playing music, the music name, the singer's name, the music playback progress, and so on. In addition, a submission option may be displayed in the upper right corner of the music playback interface. The submission option is the option that triggers video synthesis. When the user selects the submission option, it indicates that the user wants to use the currently playing music as the material audio to make a synthesized video. The music playback application starts, through the system, the video production application installed in the terminal and sends the characteristic information of the currently playing music (i.e., the material audio) to the video production application. Then, the video production application sends a material acquisition request carrying the characteristic information of the material audio to the server through the terminal, where the server may be a background server of the video production application.
Step 102: Acquire the material video set, the material audio, and the accent beat time points of the material audio sent by the server.
The accent beat time points are the time points in the material audio corresponding to beat points with a beat value of 1.
在实施中，服务器可以存储有音频的打点数据，该打点数据中包括节拍时间点和节拍值，节拍时间点和对应的节拍值可以由技术人员使用机器根据音频数据的BPM(Beat Per Minute,每分钟节拍数)、节拍信息等采集生成，也可以由技术人员通过听该音频数据，手动标记制作。服务器还可以存储有多个素材视频，并根据素材视频的大小、画质、清晰度等指标来给素材视频评分，然后，在存储的素材视频中选取预设数目个评分较高的素材视频，预设数目可以由技术人员根据用户的普遍需求来指定，例如20个。对于选取的预设数目的素材视频，服务器可以对每个素材视频进行剪切，将每个素材视频均剪切为预设时长，预设时长也可以由技术人员根据用户的普遍需求来指定，例如7s。因为在一般的音频中两个重音节拍点之间的时长通常为2s到4s，预设时长设置大于4s，可以尽量避免剪切后的素材视频的时长小于两个重音节拍点之间的时长。另外，因为素材视频需要传输给终端，所以考虑到传输时延的问题，素材视频时长不宜过长，可以以6s到10s为宜。可以在接收到终端发送的素材获取请求后进行以上处理，也可以预先进行并将处理后的素材视频进行存储，接收到素材获取请求后，便可以直接获取本地存储的处理后的素材视频，从而提高制作合成视频的整体效率。In implementation, the server may store beat annotation ("dotting") data for audio. The annotation data includes beat time points and beat values. The beat time points and their corresponding beat values may be generated by machine analysis based on the BPM (Beats Per Minute) and beat information of the audio data, or may be manually annotated by a technician who listens to the audio data. The server may also store multiple material videos, score them according to indicators such as file size, image quality, and definition, and then select a preset number of higher-scoring material videos from those stored. The preset number may be specified by a technician according to general user needs, for example 20. For the selected material videos, the server may cut each one to a preset duration, which may likewise be specified by a technician according to general user needs, for example 7s. Because the interval between two accent beat points in typical audio is usually 2s to 4s, setting the preset duration to more than 4s largely avoids a cut material video being shorter than the interval between two accent beat points. In addition, because the material videos need to be transmitted to the terminal, and considering transmission delay, a material video should not be too long; 6s to 10s is suitable. The above processing may be performed after receiving a material acquisition request sent by the terminal, or performed in advance with the processed material videos stored, so that once a material acquisition request is received, the locally stored processed material videos can be obtained directly, improving the overall efficiency of producing the synthesized video.
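The machine generation of annotation data described above can be sketched roughly. The sketch below assumes a constant tempo and a fixed 4/4 meter in which the downbeat carries beat value 1; the function names and the meter assumption are illustrative and not part of the original scheme:

```python
def beat_annotations(bpm, duration_s, beats_per_bar=4):
    """Generate (beat_time_point, beat_value) pairs for audio of the given
    length, assuming a constant tempo and a fixed meter. Beat value 1 marks
    the downbeat (accent); other beats in the bar get values 2..beats_per_bar."""
    interval = 60.0 / bpm  # seconds between consecutive beats
    annotations = []
    t, i = 0.0, 0
    while t < duration_s:
        annotations.append((round(t, 3), i % beats_per_bar + 1))
        t += interval
        i += 1
    return annotations

def accent_time_points(annotations):
    """Keep only the time points whose beat value is 1 (the accent beats)."""
    return [t for t, value in annotations if value == 1]

beats = beat_annotations(bpm=120, duration_s=10)
accents = accent_time_points(beats)
# At 120 BPM the beat interval is 0.5s, so accents fall every 2s:
# [0.0, 2.0, 4.0, 6.0, 8.0]
```

Real audio rarely has a perfectly constant tempo, which is why the scheme also allows manual annotation by a technician.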
服务器在对预设数目个视频素材进行剪切处理后，可以将这些素材视频作为发送给终端的素材视频集。另外，考虑到传输时延的问题，服务器还可以在对预设数目个素材视频进行剪切处理后，对这些素材视频进行拼接处理，得到一个总素材视频，并将该总素材视频作为发送给终端的素材视频集发送给终端，这样，传输时延较小。此处还需说明的是，在将总素材视频作为素材视频集的情况下，服务器还需将总素材视频中各素材视频的分隔时间点同时发送给终端。例如，总素材视频由5个7s的素材视频组成，那么，分隔时间点包括有0:07(0分7秒)、0:14、0:21、0:28和0:35。After cutting the preset number of material videos, the server may use these material videos as the material video set to be sent to the terminal. Alternatively, considering transmission delay, the server may splice the cut material videos into one total material video and send that total material video to the terminal as the material video set, which results in a smaller transmission delay. It should also be noted that when the total material video is used as the material video set, the server also needs to send the terminal the separation time points of the individual material videos within the total material video. For example, if the total material video consists of five 7s material videos, the separation time points include 0:07 (0 minutes 7 seconds), 0:14, 0:21, 0:28, and 0:35.
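The separation time points for a total material video follow directly from the clip durations; a minimal sketch (the function name is illustrative):

```python
def separation_time_points(clip_durations_s):
    """Given the durations (in whole seconds) of the clips spliced into one
    total material video, return the cumulative separation time points
    formatted as m:ss, as in the example above."""
    points, total = [], 0
    for d in clip_durations_s:
        total += d
        points.append(f"{total // 60}:{total % 60:02d}")
    return points

# Five 7s clips, as in the example above.
print(separation_time_points([7] * 5))  # ['0:07', '0:14', '0:21', '0:28', '0:35']
```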
在接收到素材获取请求后，服务器根据素材音频的特征信息，获取相应的素材音频和该素材音频的打点数据。因为在后续制作合成时，主要会使用到打点数据中的重音节拍时间点（节拍值为1的节拍点对应的时间点），因此，可以只将重音节拍时间点与素材音频、素材视频集发送给终端，以减少传输的数据量。终端接收服务器发送的素材音频、素材视频集和素材音频的重音节拍时间点。After receiving the material acquisition request, the server obtains the corresponding material audio and the annotation data of that material audio according to the feature information of the material audio. Because the subsequent synthesis mainly uses the accent beat time points in the annotation data (the time points corresponding to beat points with a beat value of 1), only the accent beat time points may be sent to the terminal together with the material audio and the material video set, to reduce the amount of data transmitted. The terminal receives the material audio, the material video set, and the accent beat time points of the material audio sent by the server.
此处，还需说明的是，上述终端接收到的素材音频为原始素材音频，终端还可以对原始素材音频进行剪切，相应的，处理可以如下：服务器还可以向终端发送预设剪切时间点，基于预设剪切时间点和预设剪切时长，对原始素材音频进行剪切，得到用于合成视频的素材音频，在原始素材音频的重音节拍时间点中，确定用于合成视频的素材音频的重音节拍时间点。It should also be noted here that the material audio received by the terminal above is the original material audio, and the terminal may further cut it. Correspondingly, the processing may be as follows: the server may also send a preset cutting time point to the terminal; based on the preset cutting time point and a preset cutting duration, the original material audio is cut to obtain the material audio used for synthesizing the video; and among the accent beat time points of the original material audio, the accent beat time points of the material audio used for synthesizing the video are determined.
其中，预设剪切时间点可以为技术人员根据素材音频的节奏等综合考虑确定的时间点，也可以为素材音频的高潮时间点，该高潮时间点可以由技术人员人工标记得出，或者由机器采集得出。如果服务器将这两种时间点都发送给终端，则终端优先选择使用技术人员根据音频数据的节奏等综合考虑确定的时间点。The preset cutting time point may be a time point determined by a technician based on comprehensive considerations such as the rhythm of the material audio, or it may be the climax time point of the material audio, which may be manually marked by a technician or derived by machine analysis. If the server sends both kinds of time points to the terminal, the terminal preferentially uses the time point determined by the technician based on comprehensive considerations such as the rhythm of the audio data.
终端在得到预设剪切时间点和原始素材音频后，在原始素材音频中，截取预设剪切时间点之后预设剪切时长的素材音频，作为用于合成视频的素材音频。当然，如果服务器并未发送预设剪切时间点，则终端可以截取原始素材音频的起始时间点之后预设剪切时长的素材音频，作为用于合成视频的素材音频。After obtaining the preset cutting time point and the original material audio, the terminal intercepts, from the original material audio, a segment of the preset cutting duration starting at the preset cutting time point, as the material audio used for synthesizing the video. Of course, if the server did not send a preset cutting time point, the terminal may intercept a segment of the preset cutting duration starting at the start time point of the original material audio as the material audio used for synthesizing the video.
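The trimming just described, together with re-deriving the accent beat time points for the trimmed audio, can be sketched as follows. This is a minimal sketch under the assumption that all time values are in seconds; the function names are illustrative:

```python
def trim_material_audio(audio_duration_s, preset_cut_len_s, preset_cut_point_s=None):
    """Return the (start, end) window, in seconds, of the original material
    audio to use for synthesis. If the server supplied no preset cutting time
    point, cut from the start of the original audio instead."""
    start = preset_cut_point_s if preset_cut_point_s is not None else 0.0
    end = min(start + preset_cut_len_s, audio_duration_s)
    return start, end

def remap_accents(accents, window):
    """Keep the accent beat time points that fall inside the cut window and
    re-express them relative to the new start of the trimmed audio."""
    start, end = window
    return [t - start for t in accents if start <= t < end]

window = trim_material_audio(audio_duration_s=180, preset_cut_len_s=15,
                             preset_cut_point_s=60)      # e.g. climax at 1:00
accents = remap_accents([58, 61, 63.5, 66, 80], window)  # -> [1, 3.5, 6]
```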
此处还需说明的是，以下步骤中的素材音频均为用于合成视频的素材音频，相应的，重音节拍时间点也为用于合成视频的素材音频的重音节拍时间点。It should also be noted here that the material audio in the following steps refers to the material audio used for synthesizing the video; correspondingly, the accent beat time points are the accent beat time points of that material audio.
步骤103、基于重音节拍时间点,在素材视频集中,确定出多个素材视频。Step 103: Based on the accent beat time point, multiple material videos are determined in the material video collection.
在实施中,终端可以基于重音节拍时间点的个数N、素材音频的起始时间点和结束时间点,在获取到的素材视频集中,确定出多个素材视频。相应的,根据重音节拍时间点是否为素材音频的起始时间点或者结束时间点,确定素材视频时,可以有如下几种情况:In an implementation, the terminal may determine multiple material videos in the acquired material video set based on the number N of accent beat time points, the starting time point and the ending time point of the material audio. Correspondingly, according to whether the accent beat time point is the start time point or the end time point of the material audio, when determining the material video, there can be the following situations:
情况一、如果素材音频的起始时间点和结束时间点中有一个时间点是重音节拍时间点,则在素材视频集中,确定出N个素材视频。Case 1: If one of the start time point and the end time point of the material audio is the accent beat time point, in the material video set, N material videos are determined.
情况二、如果素材音频的起始时间点和结束时间点均是重音节拍时间点,则在素材视频集中,确定出N-1个素材视频。Case 2: If the start time point and the end time point of the material audio are both accent beat time points, then N-1 material videos are determined in the material video set.
情况三、如果素材音频的起始时间点和结束时间点均不是重音节拍时间点,则在素材视频集中,确定出N+1个素材视频。Case 3: If neither the start time point nor the end time point of the material audio is the accent beat time point, in the material video collection, N+1 material videos are determined.
下面对上述三种情况分别举例进行说明。The above three situations will be described below with examples.
对于情况一、如图4所示，重音节拍时间点的个数为5，素材音频的起始时间点是重音节拍时间点，结束时间点不是重音节拍时间点，则相当于重音节拍时间点将该素材音频分为5部分，那么，每部分可以对应一个素材视频，所以，可以在素材视频集中，确定出5个素材视频。For case one, as shown in Figure 4, there are 5 accent beat time points; the start time point of the material audio is an accent beat time point and the end time point is not, so the accent beat time points effectively divide the material audio into 5 parts. Each part can correspond to one material video, so 5 material videos can be determined from the material video set.
对于情况二、如图5所示，重音节拍时间点的个数为5，素材音频的起始时间点和结束时间点均是重音节拍时间点，则相当于重音节拍时间点将该素材音频分为4部分，那么，每部分可以对应一个素材视频，所以，可以在素材视频集中，确定出4个素材视频。For case two, as shown in Figure 5, there are 5 accent beat time points; both the start time point and the end time point of the material audio are accent beat time points, so the accent beat time points effectively divide the material audio into 4 parts. Each part can correspond to one material video, so 4 material videos can be determined from the material video set.
对于情况三、如图6所示，重音节拍时间点的个数为5，素材音频的起始时间点和结束时间点均不是重音节拍时间点，则相当于重音节拍时间点将该素材音频分为6部分，那么，每部分可以对应一个素材视频，所以，可以在素材视频集中，确定出6个素材视频。For case three, as shown in Figure 6, there are 5 accent beat time points; neither the start time point nor the end time point of the material audio is an accent beat time point, so the accent beat time points effectively divide the material audio into 6 parts. Each part can correspond to one material video, so 6 material videos can be determined from the material video set.
另外,对于上述几种情况,如果素材视频集中包括的素材视频数目小于计算出需要确定出的素材视频数目,则确定出素材视频集中的所有素材视频即可。In addition, for the above several situations, if the number of material videos included in the material video set is less than the calculated number of material videos that need to be determined, then all the material videos in the material video set can be determined.
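The three cases above, together with the cap imposed by the size of the material video set, can be sketched compactly. The function name and parameters below are illustrative:

```python
def num_material_videos(accent_points, audio_start, audio_end, pool_size):
    """Number of material videos to select, per the three cases above: the
    accent beat time points partition the material audio into segments, each
    segment gets one material video, and the result is capped by the number
    of videos actually available in the material video set."""
    n = len(accent_points)
    endpoints_on_accent = sum(p in accent_points for p in (audio_start, audio_end))
    if endpoints_on_accent == 1:        # case one: N segments
        needed = n
    elif endpoints_on_accent == 2:      # case two: N-1 segments
        needed = n - 1
    else:                               # case three: N+1 segments
        needed = n + 1
    return min(needed, pool_size)

# Figure 4: the start coincides with an accent, the end does not -> 5 videos.
print(num_material_videos([0, 3, 6, 9, 12], audio_start=0, audio_end=15, pool_size=20))  # 5
```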
此处还需说明的是，对于上述确定出的N、N-1或N+1个素材视频的选取方式，在素材视频集为多个独立的素材视频的视频集合的情况下，可以在素材视频集中随机选取。在素材视频集为一个总素材视频的情况下，除随机选取外，还可以从第一个开始向后依次选取，或者从最后一个向前依次选取，又或者从第一个开始间隔选取。具体的选取方式本申请实施例不做限定。It should also be noted that, for the N, N-1, or N+1 material videos determined above, when the material video set is a collection of multiple independent material videos, they may be selected randomly from the set. When the material video set is one total material video, besides random selection, they may also be selected sequentially from the first backward, sequentially from the last forward, or at intervals starting from the first. The specific selection method is not limited in the embodiments of this application.
步骤104、基于重音节拍时间点,对多个素材视频和素材音频进行合成,得到合成视频,其中,在合成视频中各素材视频的切换时间点为音频数据的重音节拍时间点。Step 104: Based on the accent beat time point, synthesize multiple material videos and material audio to obtain a synthesized video, where the switching time point of each material video in the synthesized video is the accent beat time point of the audio data.
在实施中，终端可以随机确定出在合成视频时，各素材视频的合成顺序，对于素材视频集为一个总素材视频的情况，还可以按照素材视频在该总素材视频中位置确定。然后，再按照所述各素材视频的合成顺序，逐个获取素材视频，每获取一个素材视频，基于当前获取的素材视频和重音节拍时间点，确定当前获取的素材视频对应的子视频。再然后，按照各素材视频的合成顺序，对每个子视频进行合成，得到合成素材视频。此外，在对子视频进行合成时，可以对每个子视频添加切换特效（如渐入、淡入、弹入、百叶窗式出现等）和切换特效持续时间，其中，切换特效和切换特效持续时间可以由技术人员根据用户的普遍需求预先设置。再然后，对合成素材视频和音频数据进行合成，得到合成视频。最后，可以由视频制作应用程序对该合成视频进行自动播放。如图7所示，在视频制作应用程序的显示界面中部自动播放的即为合成视频。In implementation, the terminal may randomly determine the synthesis order of the material videos when synthesizing the video; when the material video set is one total material video, the order may also be determined according to the positions of the material videos within the total material video. Then, following the synthesis order, the material videos are obtained one by one, and each time a material video is obtained, the sub-video corresponding to it is determined based on the currently obtained material video and the accent beat time points. Next, the sub-videos are synthesized according to the synthesis order of the material videos to obtain a synthesized material video. In addition, when synthesizing the sub-videos, a switching effect (such as fade-in, dissolve, bounce-in, blinds, etc.) and a switching-effect duration may be added to each sub-video, where the switching effect and its duration may be preset by a technician according to general user needs. The synthesized material video and the audio data are then synthesized to obtain the synthesized video. Finally, the synthesized video may be played automatically by the video production application. As shown in Figure 7, what is automatically played in the middle of the display interface of the video production application is the synthesized video.
对于上述确定当前获取的素材视频对应的子视频,可以有如下几种情况。For the above determination of the sub-video corresponding to the currently acquired material video, there may be several situations as follows.
情况一、如果当前获取的素材视频的合成顺序为第一位，则确定素材音频的起始时间点到该起始时间点之后，且与该起始时间点最近的第一重音节拍时间点之间的第一时长，在素材视频中，从素材视频的起始时间点开始截取第一时长的视频为素材视频对应的第一子视频。Case one: if the currently obtained material video is first in the synthesis order, determine the first duration between the start time point of the material audio and the first accent beat time point that is after and closest to that start time point; then, in the material video, intercept a video of the first duration starting from the start time point of the material video as the first sub-video corresponding to the material video.
情况二、如果当前获取的素材视频的合成顺序不是第一位，则确定已生成的子视频的第一总时长，确定素材音频的起始时间点之后的第一总时长的第一时间点，确定第一时间点之后，且与第一时间点最近的第二重音节拍时间点。如果存在第二重音节拍时间点，则确定第一时间点与第二重音节拍时间点之间的第二时长，在素材视频中，从素材视频的起始时间点开始截取第二时长的视频为素材视频对应的第二子视频。如果不存在第二重音节拍时间点，则确定第一时间点到素材音频的结束时间点之间的第三时长，在素材视频中，从素材视频的起始时间点开始截取第三时长的视频为素材视频对应的第三子视频。Case two: if the currently obtained material video is not first in the synthesis order, determine the first total duration of the sub-videos already generated, determine the first time point that is the first total duration after the start time point of the material audio, and look for the second accent beat time point that is after and closest to the first time point. If such a second accent beat time point exists, determine the second duration between the first time point and the second accent beat time point, and, in the material video, intercept a video of the second duration starting from the start time point of the material video as the second sub-video corresponding to the material video. If no second accent beat time point exists, determine the third duration between the first time point and the end time point of the material audio, and, in the material video, intercept a video of the third duration starting from the start time point of the material video as the third sub-video corresponding to the material video.
下面对上述实现方式中的两种情况分别举例进行说明。The two cases in the foregoing implementation manners are described below with examples.
对于情况一、如图8所示，素材音频的时长为15s，素材音频的起始时间点为0:00，该起始时间点之后且与该起始时间点最近的第一重音节拍时间点为0:03，则该起始时间点0:00到该第一重音节拍时间点0:03之间的第一时长为3s，那么，可以在该素材视频中，从素材视频的起始时间点开始截取3s，作为对应的第一子视频。For case one, as shown in Figure 8, the duration of the material audio is 15s and its start time point is 0:00; the first accent beat time point after and closest to that start time point is 0:03, so the first duration between the start time point 0:00 and the first accent beat time point 0:03 is 3s. Then, in the material video, 3s may be intercepted starting from the start time point of the material video as the corresponding first sub-video.
对于情况二、如图9所示，素材音频的时长为15s，素材音频的起始时间点为0:00，结束时间点为0:15，已生成的子视频的第一总时长为13s，则素材音频的起始时间点之后第一总时长的第一时间点为0:13，在第一时间点之后，存在第二重音节拍时间点为0:14，则确定第一时间点0:13到第二重音节拍时间点0:14之间的第二时长为1s，那么，可以在该素材视频中，从素材视频的起始时间点开始截取1s，作为对应的第二子视频。如图10所示，如果不存在第二重音节拍时间点，则确定第一时间点0:13到素材音频的结束时间点0:15之间的第三时长为2s，那么，可以在该素材视频中，从素材视频的起始时间点开始截取2s，作为对应的第三子视频。For case two, as shown in Figure 9, the duration of the material audio is 15s, its start time point is 0:00, and its end time point is 0:15; the first total duration of the already generated sub-videos is 13s, so the first time point, which is the first total duration after the start time point of the material audio, is 0:13. After the first time point there is a second accent beat time point at 0:14, so the second duration between the first time point 0:13 and the second accent beat time point 0:14 is determined to be 1s. Then, in the material video, 1s may be intercepted starting from the start time point of the material video as the corresponding second sub-video. As shown in Figure 10, if no second accent beat time point exists, the third duration between the first time point 0:13 and the end time point of the material audio 0:15 is determined to be 2s. Then, in the material video, 2s may be intercepted starting from the start time point of the material video as the corresponding third sub-video.
此处还需说明的是，如果确定出子视频的总时长小于素材音频的时长，则将素材音频中多余的部分删除即可。那么，最后得到的合成视频的时长即为子视频的总时长。It should also be noted here that if the total duration of the sub-videos is determined to be less than the duration of the material audio, the excess part of the material audio can simply be deleted. The duration of the finally obtained synthesized video is then the total duration of the sub-videos.
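The sub-video length computation of cases one and two above can be sketched as a single loop: each clip runs from the current total duration of generated sub-videos to the next accent beat time point, or to the end of the material audio when no accent remains (the Figure 10 situation). This is a minimal sketch assuming time values in seconds; the function name is illustrative:

```python
def sub_video_durations(accents, audio_duration):
    """Compute the length of each sub-video so that every clip switch lands
    on an accent beat time point (cases one and two above). The stretch after
    the last accent simply runs to the end of the material audio."""
    durations, elapsed = [], 0
    while elapsed < audio_duration:
        # First accent strictly after the total duration of sub-videos so far.
        later = [t for t in accents if t > elapsed]
        nxt = min(later) if later else audio_duration  # no accent left: Fig. 10
        nxt = min(nxt, audio_duration)  # ignore accents past the audio end
        durations.append(nxt - elapsed)
        elapsed = nxt
    return durations

# 15s audio with accents at 0:03, 0:06, 0:09, 0:12 and 0:14.
print(sub_video_durations([3, 6, 9, 12, 14], audio_duration=15))  # [3, 3, 3, 3, 2, 1]
```

The last two entries, 2s and 1s, correspond to the 0:14 accent and the 0:14-to-0:15 tail in the Figure 9/Figure 10 examples.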
上述所有可选技术方案,可以采用任意结合形成本申请的可选实施例,在此不再一一赘述。All the above-mentioned optional technical solutions can be combined in any way to form optional embodiments of the present application, which will not be repeated here.
本申请实施例提供的方法，通过从服务器获取到素材视频集、素材音频和所述素材音频的重音节拍时间点。然后，在素材视频集中选取出多个素材视频，最后，基于重音节拍时间点，对多个素材视频和所述素材音频进行合成，得到合成视频。得到的合成视频中，各素材视频的切换时间点为所述音频数据的重音节拍时间点，这样，在终端可以实现自动获取素材，并自动将素材视频和素材音频进行合成，无需人工处理，效率较高。In the method provided by the embodiments of this application, the material video set, the material audio, and the accent beat time points of the material audio are obtained from the server. Multiple material videos are then selected from the material video set, and finally, based on the accent beat time points, the multiple material videos and the material audio are synthesized to obtain a synthesized video. In the resulting synthesized video, the switching time point of each material video is an accent beat time point of the audio data. In this way, the terminal can automatically obtain the materials and automatically synthesize the material videos with the material audio, without manual processing and with high efficiency.
对于用户选择的音乐均可以由对应的素材音频和服务器中存储的共用的素材视频来制作合成视频，该合成视频即可以作为该音乐对应的一个demo（试样）卡点视频以自动播放的方式展示给用户。这样，可以达到吸引用户进入上述视频制作应用程序自己制作合成视频。For any music selected by the user, a synthesized video can be produced from the corresponding material audio and the shared material videos stored on the server; this synthesized video can then serve as a demo (sample) beat-synced video for that music and be shown to the user through automatic playback. In this way, users can be attracted to enter the above video production application and produce synthesized videos themselves.
基于相同的技术构思，本申请实施例还提供了一种视频合成的装置，该装置可以为上述实施例中的终端，如图11所示，该装置包括：发送模块1110、获取模块1120、确定模块1130和合成模块1140。Based on the same technical concept, an embodiment of this application further provides a video synthesis apparatus, which may be the terminal in the foregoing embodiments. As shown in FIG. 11, the apparatus includes: a sending module 1110, an obtaining module 1120, a determining module 1130, and a synthesis module 1140.
发送模块1110,用于向服务器发送素材获取请求,其中,所述素材获取请求中携带有素材音频的特征信息;The sending module 1110 is configured to send a material acquisition request to the server, where the material acquisition request carries feature information of the material audio;
获取模块1120,用于获取服务器发送的素材视频集、素材音频和所述素材音频的重音节拍时间点;The obtaining module 1120 is configured to obtain the material video set, the material audio, and the accent beat time point of the material audio sent by the server;
确定模块1130,用于基于所述重音节拍时间点,在所述素材视频集中,确定出多个素材视频;The determining module 1130 is configured to determine multiple material videos in the material video set based on the accent beat time point;
合成模块1140，用于基于所述重音节拍时间点，对所述多个素材视频和所述素材音频进行合成，得到合成视频，其中，在所述合成视频中各素材视频的切换时间点为所述音频数据的重音节拍时间点。The synthesis module 1140 is configured to synthesize the multiple material videos and the material audio based on the accent beat time points to obtain a synthesized video, where the switching time point of each material video in the synthesized video is an accent beat time point of the audio data.
可选的,所述确定模块1130,用于:Optionally, the determining module 1130 is configured to:
基于所述重音节拍时间点的个数N、所述素材音频的起始时间点和结束时间点,在所述素材视频集中,确定出多个素材视频。Based on the number of accent beat time points N, the start time point and the end time point of the material audio, a plurality of material videos are determined in the material video set.
可选的,所述确定模块1130,用于:Optionally, the determining module 1130 is configured to:
如果所述素材音频的起始时间点和结束时间点中有一个时间点是重音节拍时间点,则在所述素材视频集中,确定出N个素材视频;If one of the start time point and the end time point of the material audio is the accent beat time point, determine N material videos in the material video set;
如果所述素材音频的起始时间点和结束时间点均是重音节拍时间点,则在所述素材视频集中,确定出N-1个素材视频;If the start time point and the end time point of the material audio are both accent beat time points, determine N-1 material videos in the material video set;
如果所述素材音频的起始时间点和结束时间点均不是重音节拍时间点,则在所述素材视频集中,确定出N+1个素材视频。If both the start time point and the end time point of the material audio are not accent beat time points, then N+1 material videos are determined in the material video set.
可选的,所述合成模块1140,用于:Optionally, the synthesis module 1140 is configured to:
确定在合成视频时各素材视频的合成顺序;Determine the synthesis order of each material video when synthesizing the video;
按照所述各素材视频的合成顺序,逐个获取素材视频,每获取一个素材视频,基于当前获取的素材视频和所述重音节拍时间点,确定所述当前获取的素材视频对应的子视频;Obtain the material videos one by one according to the synthesis order of the material videos, and for each material video obtained, determine the sub-video corresponding to the currently obtained material video based on the currently obtained material video and the accent beat time point;
基于所述合成顺序,对每个子视频进行合成,得到合成素材视频,对所述合成素材视频和所述素材音频进行合成,得到合成视频。Based on the synthesis sequence, each sub-video is synthesized to obtain a synthesized material video, and the synthesized material video and the material audio are synthesized to obtain a synthesized video.
可选的,所述素材视频集为由多个素材视频拼接成的一个总素材视频。Optionally, the material video set is a total material video spliced into a plurality of material videos.
可选的,所述素材视频集为包括有多个独立的素材视频的视频集合。Optionally, the material video set is a video set including multiple independent material videos.
可选的,获取模块1120,用于:Optionally, the obtaining module 1120 is used to:
接收服务器发送的素材视频集、原始素材音频、所述原始素材音频的重音节拍时间点和预设剪切时间点;Receiving the material video set, the original material audio, the accent beat time point of the original material audio and the preset cutting time point sent by the server;
基于所述预设剪切时间点和预设剪切时长,对所述原始素材音频进行剪切,得到用于合成视频的素材音频;Cutting the original material audio based on the preset cutting time point and the preset cutting time length to obtain material audio for synthesizing video;
在所述原始素材音频的重音节拍时间点中，确定出所述用于合成视频的素材音频的重音节拍时间点。Among the accent beat time points of the original material audio, the accent beat time points of the material audio used for synthesizing the video are determined.
需要说明的是:上述实施例提供的视频合成的装置在合成视频时,仅以上述各功能模块的划分进行举例说明,实际应用中,可以根据需要而将上述功能分配由不同的功能模块完成,即将终端的内部结构划分成不同的功能模块,以完成以上描述的全部或者部分功能。另外,上述实施例提供的视频合成的装置与视频合成的方法实施例属于同一构思,其具体实现过程详见方法实施例,这里不再赘述。It should be noted that the video synthesis device provided in the above embodiment only uses the division of the above functional modules for illustration when synthesizing videos. In actual applications, the above functions can be allocated by different functional modules as needed. That is, the internal structure of the terminal is divided into different functional modules to complete all or part of the functions described above. In addition, the video synthesis device provided in the foregoing embodiment and the video synthesis method embodiment belong to the same concept, and the specific implementation process is detailed in the method embodiment, which will not be repeated here.
图12示出了本申请一个示例性实施例提供的终端1200的结构框图。该终端1200可以是:智能手机、平板电脑、MP3播放器(Moving Picture Experts Group Audio Layer III,动态影像专家压缩标准音频层面3)、MP4(Moving Picture Experts Group Audio Layer IV,动态影像专家压缩标准音频层面4)播放器、笔记本电脑或台式电脑。终端1200还可能被称为用户设备、便携式终端、膝上型终端、台式终端等其他名称。FIG. 12 shows a structural block diagram of a terminal 1200 provided by an exemplary embodiment of the present application. The terminal 1200 can be: a smart phone, a tablet computer, an MP3 player (Moving Picture Experts Group Audio Layer III, moving picture expert compression standard audio layer 3), MP4 (Moving Picture Experts Group Audio Layer IV, moving picture expert compressing standard audio Level 4) Player, laptop or desktop computer. The terminal 1200 may also be called user equipment, portable terminal, laptop terminal, desktop terminal and other names.
通常,终端1200包括有:处理器1201和存储器1202。Generally, the terminal 1200 includes a processor 1201 and a memory 1202.
处理器1201可以包括一个或多个处理核心，比如4核心处理器、8核心处理器等。处理器1201可以采用DSP(Digital Signal Processing,数字信号处理)、FPGA(Field-Programmable Gate Array,现场可编程门阵列)、PLA(Programmable Logic Array,可编程逻辑阵列)中的至少一种硬件形式来实现。处理器1201也可以包括主处理器和协处理器，主处理器是用于对在唤醒状态下的数据进行处理的处理器，也称CPU(Central Processing Unit,中央处理器)；协处理器是用于对在待机状态下的数据进行处理的低功耗处理器。在一些实施例中，处理器1201可以集成有GPU(Graphics Processing Unit,图像处理器)，GPU用于负责显示屏所需要显示的内容的渲染和绘制。一些实施例中，处理器1201还可以包括AI(Artificial Intelligence,人工智能)处理器，该AI处理器用于处理有关机器学习的计算操作。The processor 1201 may include one or more processing cores, such as a 4-core processor, an 8-core processor, and so on. The processor 1201 may be implemented in at least one hardware form among DSP (Digital Signal Processing), FPGA (Field-Programmable Gate Array), and PLA (Programmable Logic Array). The processor 1201 may also include a main processor and a coprocessor. The main processor is a processor used to process data in the awake state, also called a CPU (Central Processing Unit); the coprocessor is a low-power processor used to process data in the standby state. In some embodiments, the processor 1201 may be integrated with a GPU (Graphics Processing Unit), and the GPU is responsible for rendering and drawing the content that needs to be displayed on the display screen. In some embodiments, the processor 1201 may also include an AI (Artificial Intelligence) processor, which is used to process computing operations related to machine learning.
存储器1202可以包括一个或多个计算机可读存储介质,该计算机可读存储介质可以是非暂态的。存储器1202还可包括高速随机存取存储器,以及非易失性存储器,比如一个或多个磁盘存储设备、闪存存储设备。在一些实施例中,存储器1202中的非暂态的计算机可读存储介质用于存储至少一个指令,该至少 一个指令用于被处理器1201所执行以实现本申请中方法实施例提供的视频合成的方法。The memory 1202 may include one or more computer-readable storage media, which may be non-transitory. The memory 1202 may also include high-speed random access memory and non-volatile memory, such as one or more magnetic disk storage devices and flash memory storage devices. In some embodiments, the non-transitory computer-readable storage medium in the memory 1202 is used to store at least one instruction, and the at least one instruction is used to be executed by the processor 1201 to implement the video synthesis provided in the method embodiment of the present application. Methods.
在一些实施例中,终端1200还可以包括:外围设备接口1203和至少一个外围设备。处理器1201、存储器1202和外围设备接口1203之间可以通过总线或信号线相连。各个外围设备可以通过总线、信号线或电路板与外围设备接口1203相连。具体地,外围设备包括:射频电路1204、触摸显示屏1205、摄像头1206、音频电路1207、定位组件1208和电源1209中的至少一种。In some embodiments, the terminal 1200 may further include: a peripheral device interface 1203 and at least one peripheral device. The processor 1201, the memory 1202, and the peripheral device interface 1203 may be connected by a bus or a signal line. Each peripheral device can be connected to the peripheral device interface 1203 through a bus, a signal line or a circuit board. Specifically, the peripheral device includes: at least one of a radio frequency circuit 1204, a touch display screen 1205, a camera 1206, an audio circuit 1207, a positioning component 1208, and a power supply 1209.
外围设备接口1203可被用于将I/O(Input/Output,输入/输出)相关的至少一个外围设备连接到处理器1201和存储器1202。在一些实施例中,处理器1201、存储器1202和外围设备接口1203被集成在同一芯片或电路板上;在一些其他实施例中,处理器1201、存储器1202和外围设备接口1203中的任意一个或两个可以在单独的芯片或电路板上实现,本实施例对此不加以限定。The peripheral device interface 1203 may be used to connect at least one peripheral device related to I/O (Input/Output) to the processor 1201 and the memory 1202. In some embodiments, the processor 1201, the memory 1202, and the peripheral device interface 1203 are integrated on the same chip or circuit board; in some other embodiments, any one of the processor 1201, the memory 1202, and the peripheral device interface 1203 or The two can be implemented on separate chips or circuit boards, which are not limited in this embodiment.
射频电路1204用于接收和发射RF(Radio Frequency,射频)信号,也称电磁信号。射频电路1204通过电磁信号与通信网络以及其他通信设备进行通信。射频电路1204将电信号转换为电磁信号进行发送,或者,将接收到的电磁信号转换为电信号。可选地,射频电路1204包括:天线系统、RF收发器、一个或多个放大器、调谐器、振荡器、数字信号处理器、编解码芯片组、用户身份模块卡等等。射频电路1204可以通过至少一种无线通信协议来与其它终端进行通信。该无线通信协议包括但不限于:城域网、各代移动通信网络(2G、3G、4G及5G)、无线局域网和/或WiFi(Wireless Fidelity,无线保真)网络。在一些实施例中,射频电路1204还可以包括NFC(Near Field Communication,近距离无线通信)有关的电路,本申请对此不加以限定。The radio frequency circuit 1204 is used for receiving and transmitting RF (Radio Frequency, radio frequency) signals, also called electromagnetic signals. The radio frequency circuit 1204 communicates with a communication network and other communication devices through electromagnetic signals. The radio frequency circuit 1204 converts electrical signals into electromagnetic signals for transmission, or converts received electromagnetic signals into electrical signals. Optionally, the radio frequency circuit 1204 includes: an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a user identity module card, and so on. The radio frequency circuit 1204 can communicate with other terminals through at least one wireless communication protocol. The wireless communication protocol includes but is not limited to: metropolitan area network, various generations of mobile communication networks (2G, 3G, 4G, and 5G), wireless local area network and/or WiFi (Wireless Fidelity, wireless fidelity) network. In some embodiments, the radio frequency circuit 1204 may also include NFC (Near Field Communication) related circuits, which is not limited in this application.
显示屏1205用于显示UI(User Interface,用户界面)。该UI可以包括图形、文本、图标、视频及其它们的任意组合。当显示屏1205是触摸显示屏时,显示屏1205还具有采集在显示屏1205的表面或表面上方的触摸信号的能力。该触摸信号可以作为控制信号输入至处理器1201进行处理。此时,显示屏1205还可以用于提供虚拟按钮和/或虚拟键盘,也称软按钮和/或软键盘。在一些实施例中,显示屏1205可以为一个,设置终端1200的前面板;在另一些实施例中,显示屏1205可以为至少两个,分别设置在终端1200的不同表面或呈折叠设计;在再一些实施例中,显示屏1205可以是柔性显示屏,设置在终端1200的弯曲表面上或折叠面上。甚至,显示屏1205还可以设置成非矩形的不规则图形,也 即异形屏。显示屏1205可以采用LCD(Liquid Crystal Display,液晶显示屏)、OLED(Organic Light-Emitting Diode,有机发光二极管)等材质制备。The display screen 1205 is used to display a UI (User Interface, user interface). The UI can include graphics, text, icons, videos, and any combination thereof. When the display screen 1205 is a touch display screen, the display screen 1205 also has the ability to collect touch signals on or above the surface of the display screen 1205. The touch signal may be input to the processor 1201 as a control signal for processing. At this time, the display screen 1205 may also be used to provide virtual buttons and/or virtual keyboards, also called soft buttons and/or soft keyboards. In some embodiments, there may be one display screen 1205, which is provided with the front panel of the terminal 1200; in other embodiments, there may be at least two display screens 1205, which are respectively arranged on different surfaces of the terminal 1200 or in a folded design; In still other embodiments, the display screen 1205 may be a flexible display screen, which is disposed on the curved surface or the folding surface of the terminal 1200. Furthermore, the display screen 1205 can also be set as a non-rectangular irregular pattern, that is, a special-shaped screen. The display screen 1205 may be made of materials such as LCD (Liquid Crystal Display) and OLED (Organic Light-Emitting Diode).
摄像头组件1206用于采集图像或视频。可选地,摄像头组件1206包括前置摄像头和后置摄像头。通常,前置摄像头设置在终端的前面板,后置摄像头设置在终端的背面。在一些实施例中,后置摄像头为至少两个,分别为主摄像头、景深摄像头、广角摄像头、长焦摄像头中的任意一种,以实现主摄像头和景深摄像头融合实现背景虚化功能、主摄像头和广角摄像头融合实现全景拍摄以及VR(Virtual Reality,虚拟现实)拍摄功能或者其它融合拍摄功能。在一些实施例中,摄像头组件1206还可以包括闪光灯。闪光灯可以是单色温闪光灯,也可以是双色温闪光灯。双色温闪光灯是指暖光闪光灯和冷光闪光灯的组合,可以用于不同色温下的光线补偿。The camera assembly 1206 is used to capture images or videos. Optionally, the camera assembly 1206 includes a front camera and a rear camera. Generally, the front camera is set on the front panel of the terminal, and the rear camera is set on the back of the terminal. In some embodiments, there are at least two rear cameras, each of which is a main camera, a depth-of-field camera, a wide-angle camera, and a telephoto camera, to realize the fusion of the main camera and the depth-of-field camera to realize the background blur function, Integrate with the wide-angle camera to realize panoramic shooting and VR (Virtual Reality) shooting functions or other fusion shooting functions. In some embodiments, the camera assembly 1206 may also include a flash. The flash can be a single-color flash or a dual-color flash. Dual color temperature flash refers to the combination of warm light flash and cold light flash, which can be used for light compensation under different color temperatures.
The audio circuit 1207 may include a microphone and a speaker. The microphone collects sound waves from the user and the environment and converts them into electrical signals that are input to the processor 1201 for processing, or to the radio frequency circuit 1204 for voice communication. For stereo capture or noise reduction, there may be multiple microphones, arranged at different parts of the terminal 1200. The microphone may also be an array microphone or an omnidirectional microphone. The speaker converts electrical signals from the processor 1201 or the radio frequency circuit 1204 into sound waves. The speaker may be a traditional membrane speaker or a piezoelectric ceramic speaker. A piezoelectric ceramic speaker can convert electrical signals not only into sound waves audible to humans, but also into sound waves inaudible to humans for purposes such as distance measurement. In some embodiments, the audio circuit 1207 may also include a headphone jack.
The positioning component 1208 is used to determine the current geographic location of the terminal 1200 to implement navigation or LBS (Location Based Service). The positioning component 1208 may be based on the GPS (Global Positioning System) of the United States, the BeiDou system of China, the GLONASS system of Russia, or the Galileo system of the European Union.
The power supply 1209 supplies power to the components in the terminal 1200. The power supply 1209 may be alternating current, direct current, a disposable battery, or a rechargeable battery. When the power supply 1209 includes a rechargeable battery, the battery may support wired or wireless charging, and may also support fast-charging technology.
In some embodiments, the terminal 1200 further includes one or more sensors 1210, including but not limited to: an acceleration sensor 1211, a gyroscope sensor 1212, a pressure sensor 1213, a fingerprint sensor 1214, an optical sensor 1215, and a proximity sensor 1216.
The acceleration sensor 1211 can detect the magnitude of acceleration on the three axes of the coordinate system established by the terminal 1200. For example, the acceleration sensor 1211 can detect the components of gravitational acceleration along the three axes. The processor 1201 may, according to the gravitational acceleration signal collected by the acceleration sensor 1211, control the touch display screen 1205 to display the user interface in landscape or portrait view. The acceleration sensor 1211 can also be used to collect motion data for games or for the user.
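By way of illustration only (this is not part of the disclosure), the landscape/portrait decision described above can be sketched as a comparison of the gravity components along the device's short and long edges. The function name and axis convention are assumptions:

```python
def choose_orientation(gx, gy, gz):
    """Pick a UI orientation from gravity components (m/s^2) along the
    device's x axis (short edge), y axis (long edge), and z axis.

    Minimal sketch: if gravity acts mostly along the long edge, the
    device is held upright (portrait); otherwise it is held sideways
    (landscape). Real implementations add hysteresis and thresholds.
    """
    return "portrait" if abs(gy) >= abs(gx) else "landscape"
```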
The gyroscope sensor 1212 can detect the body orientation and rotation angle of the terminal 1200, and can cooperate with the acceleration sensor 1211 to capture the user's 3D actions on the terminal 1200. Based on the data collected by the gyroscope sensor 1212, the processor 1201 can implement functions such as motion sensing (for example, changing the UI according to the user's tilt operation), image stabilization during shooting, game control, and inertial navigation.
The pressure sensor 1213 may be arranged on the side frame of the terminal 1200 and/or under the touch display screen 1205. When the pressure sensor 1213 is arranged on the side frame, it can detect the user's grip on the terminal 1200, and the processor 1201 can perform left/right-hand recognition or shortcut operations according to the grip signal collected by the pressure sensor 1213. When the pressure sensor 1213 is arranged under the touch display screen 1205, the processor 1201 controls the operable controls on the UI according to the user's pressure operations on the touch display screen 1205. The operable controls include at least one of a button control, a scroll bar control, an icon control, and a menu control.
The fingerprint sensor 1214 collects the user's fingerprint; either the processor 1201 identifies the user from the fingerprint collected by the fingerprint sensor 1214, or the fingerprint sensor 1214 itself identifies the user from the collected fingerprint. When the user's identity is recognized as trusted, the processor 1201 authorizes the user to perform related sensitive operations, including unlocking the screen, viewing encrypted information, downloading software, making payments, and changing settings. The fingerprint sensor 1214 may be arranged on the front, back, or side of the terminal 1200. When the terminal 1200 has a physical button or manufacturer logo, the fingerprint sensor 1214 can be integrated with the physical button or manufacturer logo.
The optical sensor 1215 collects the ambient light intensity. In one embodiment, the processor 1201 may control the display brightness of the touch display screen 1205 according to the ambient light intensity collected by the optical sensor 1215: when the ambient light intensity is high, the display brightness of the touch display screen 1205 is increased; when the ambient light intensity is low, the display brightness is decreased. In another embodiment, the processor 1201 may also dynamically adjust the shooting parameters of the camera assembly 1206 according to the ambient light intensity collected by the optical sensor 1215.
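As a purely illustrative sketch (not part of the disclosure), the brightness control described above can be modeled as a mapping from ambient lux to a brightness level; the linear curve and the parameter values are assumptions, and real devices use tuned, often nonlinear, response curves:

```python
def adjust_brightness(ambient_lux, min_level=10, max_level=255, max_lux=400.0):
    """Map ambient light intensity to a display brightness level.

    Brighter surroundings yield a higher level, clamped to the
    [min_level, max_level] range once ambient_lux exceeds max_lux.
    """
    ratio = min(ambient_lux / max_lux, 1.0)
    return int(min_level + ratio * (max_level - min_level))
```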
The proximity sensor 1216, also called a distance sensor, is usually arranged on the front panel of the terminal 1200 and collects the distance between the user and the front of the terminal 1200. In one embodiment, when the proximity sensor 1216 detects that the distance between the user and the front of the terminal 1200 is gradually decreasing, the processor 1201 controls the touch display screen 1205 to switch from the screen-on state to the screen-off state; when the proximity sensor 1216 detects that the distance is gradually increasing, the processor 1201 controls the touch display screen 1205 to switch from the screen-off state back to the screen-on state.
Those skilled in the art can understand that the structure shown in FIG. 12 does not constitute a limitation on the terminal 1200, which may include more or fewer components than shown, combine certain components, or adopt a different component arrangement.
In an exemplary embodiment, a computer-readable storage medium is also provided, comprising a memory that stores instructions executable by a processor in the terminal to perform the video synthesis method of the foregoing embodiments. The computer-readable storage medium may be non-transitory, for example a ROM (Read-Only Memory), RAM (Random Access Memory), CD-ROM, magnetic tape, floppy disk, or optical data storage device.
Those of ordinary skill in the art can understand that all or part of the steps of the foregoing embodiments can be implemented by hardware, or by a program instructing the relevant hardware; the program can be stored in a computer-readable storage medium, which may be a read-only memory, a magnetic disk, an optical disc, or the like.
The above are only preferred embodiments of this application and are not intended to limit this application. Any modification, equivalent replacement, or improvement made within the spirit and principles of this application shall fall within the protection scope of this application.

Claims (11)

  1. A video synthesis method, characterized in that the method comprises:
    sending a material acquisition request to a server, the material acquisition request carrying feature information of material audio;
    acquiring a material video set, the material audio, and accent beat time points of the material audio sent by the server;
    determining multiple material videos in the material video set based on the accent beat time points;
    synthesizing the multiple material videos and the material audio based on the accent beat time points to obtain a synthesized video, wherein the switching time point of each material video in the synthesized video is an accent beat time point of the material audio.
  2. The method according to claim 1, characterized in that determining multiple material videos in the material video set based on the accent beat time points comprises:
    determining multiple material videos in the material video set based on the number N of accent beat time points and the start time point and end time point of the material audio.
  3. The method according to claim 2, characterized in that determining multiple material videos in the material video set based on the number N of accent beat time points and the start time point and end time point of the material audio comprises:
    if exactly one of the start time point and the end time point of the material audio is an accent beat time point, determining N material videos in the material video set;
    if both the start time point and the end time point of the material audio are accent beat time points, determining N-1 material videos in the material video set;
    if neither the start time point nor the end time point of the material audio is an accent beat time point, determining N+1 material videos in the material video set.
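For illustration only (not part of the claims), the selection rule of claim 3 amounts to counting the segments between consecutive cut points, where the cut points are the accent beats plus whichever of the audio's start and end points is not itself a beat. A minimal sketch, with the function name and argument layout as assumptions:

```python
def count_material_videos(accent_times, start, end):
    """Number of material videos to select per claim 3.

    accent_times: the N accent beat time points of the material audio.
    Returns N if exactly one of start/end is an accent beat,
    N-1 if both are, and N+1 if neither is.
    """
    n = len(accent_times)
    boundary_hits = sum(1 for t in (start, end) if t in accent_times)
    if boundary_hits == 1:
        return n
    if boundary_hits == 2:
        return n - 1
    return n + 1
```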
  4. The method according to claim 1, characterized in that synthesizing the multiple material videos and the material audio based on the accent beat time points to obtain a synthesized video comprises:
    determining a synthesis order of the material videos;
    acquiring the material videos one by one in the synthesis order and, for each acquired material video, determining the sub-video corresponding to the currently acquired material video based on that material video and the accent beat time points;
    synthesizing the sub-videos in the synthesis order to obtain a synthesized material video, and synthesizing the synthesized material video and the material audio to obtain the synthesized video.
  5. The method according to claim 4, characterized in that determining the sub-video corresponding to the currently acquired material video based on the currently acquired material video and the accent beat time points comprises:
    if the currently acquired material video is first in the synthesis order, determining a first duration from the start time point of the material audio to the first accent beat time point after and closest to the start time point, and clipping, from the start time point of the material video, a segment of the first duration as the first sub-video corresponding to the material video;
    if the currently acquired material video is not first in the synthesis order, determining a first total duration of the sub-videos already generated, determining a first time point located the first total duration after the start time point of the material audio, and determining a second accent beat time point after and closest to the first time point;
    if the second accent beat time point exists, determining a second duration from the first time point to the second accent beat time point, and clipping, from the start time point of the material video, a segment of the second duration as the second sub-video corresponding to the material video;
    if the second accent beat time point does not exist, determining a third duration from the first time point to the end time point of the material audio, and clipping, from the start time point of the material video, a segment of the third duration as the third sub-video corresponding to the material video.
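Purely as an illustration of the logic in claims 4-5 (not part of the claims), each clip runs from the current playback position to the next accent beat, or to the audio's end when no beat remains, so every video switch lands on an accent beat. A sketch under that reading; names and the return format are assumptions:

```python
import bisect

def subvideo_durations(accent_times, start, end, num_videos):
    """Duration to clip from each material video, in synthesis order.

    The first clip spans from the audio start to the first accent beat
    after it; each later clip spans from the accumulated position to the
    next accent beat, falling back to the audio end point.
    """
    beats = sorted(t for t in accent_times if start < t <= end)
    durations = []
    position = start  # audio start plus total duration of clips so far
    for _ in range(num_videos):
        i = bisect.bisect_right(beats, position)
        nxt = beats[i] if i < len(beats) else end  # next beat, or audio end
        durations.append(nxt - position)
        position = nxt
    return durations
```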
  6. The method according to any one of claims 1-5, characterized in that the material video set is a single overall material video spliced from multiple material videos.
  7. The method according to any one of claims 1-5, characterized in that the material video set is a video collection comprising multiple independent material videos.
  8. The method according to any one of claims 1-5, characterized in that acquiring the material video set, the material audio, and the accent beat time points of the material audio sent by the server comprises:
    receiving the material video set, original material audio, accent beat time points of the original material audio, and a preset cutting time point sent by the server;
    cutting the original material audio based on the preset cutting time point and a preset cutting duration to obtain the material audio used for video synthesis;
    determining, among the accent beat time points of the original material audio, the accent beat time points of the material audio used for video synthesis.
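For illustration only (not part of the claims), claim 8's cutting step keeps the beat time points that fall inside the cut window and re-bases them to the clip's own timeline. A sketch under that reading; the function name, return shape, and clamping to the audio duration are assumptions:

```python
def cut_material_audio(beat_times, cut_start, cut_length, audio_duration):
    """Clip the original material audio and filter its accent beats.

    Returns (clip_duration, kept_beats), where kept_beats are the
    accent beat time points inside [cut_start, cut_end], shifted so
    that the clip starts at time 0.
    """
    cut_end = min(cut_start + cut_length, audio_duration)
    kept = [t - cut_start for t in beat_times if cut_start <= t <= cut_end]
    return cut_end - cut_start, kept
```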
  9. A video synthesis apparatus, characterized in that the apparatus comprises:
    a sending module, configured to send a material acquisition request to a server, the material acquisition request carrying feature information of material audio;
    an acquiring module, configured to acquire a material video set, the material audio, and accent beat time points of the material audio sent by the server;
    a determining module, configured to determine multiple material videos in the material video set based on the accent beat time points;
    a synthesis module, configured to synthesize the multiple material videos and the material audio based on the accent beat time points to obtain a synthesized video, wherein the switching time point of each material video in the synthesized video is an accent beat time point of the material audio.
  10. A terminal, characterized in that the terminal comprises a processor and a memory, the memory storing at least one instruction that is loaded and executed by the processor to implement the operations performed by the video synthesis method according to any one of claims 1 to 8.
  11. A computer-readable storage medium, characterized in that the computer-readable storage medium stores at least one instruction that is loaded and executed by a processor to implement the operations performed by the video synthesis method according to any one of claims 1 to 8.
PCT/CN2019/120302 2019-07-17 2019-11-22 Video synthesis method and apparatus, and terminal and storage medium WO2021008055A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910647507.9 2019-07-17
CN201910647507.9A CN110336960B (en) 2019-07-17 2019-07-17 Video synthesis method, device, terminal and storage medium

Publications (1)

Publication Number Publication Date
WO2021008055A1 true WO2021008055A1 (en) 2021-01-21

Family

ID=68145712

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/120302 WO2021008055A1 (en) 2019-07-17 2019-11-22 Video synthesis method and apparatus, and terminal and storage medium

Country Status (2)

Country Link
CN (1) CN110336960B (en)
WO (1) WO2021008055A1 (en)


Families Citing this family (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112235631B (en) * 2019-07-15 2022-05-03 北京字节跳动网络技术有限公司 Video processing method and device, electronic equipment and storage medium
CN110336960B (en) * 2019-07-17 2021-12-10 广州酷狗计算机科技有限公司 Video synthesis method, device, terminal and storage medium
CN110519638B (en) * 2019-09-06 2023-05-16 Oppo广东移动通信有限公司 Processing method, processing device, electronic device, and storage medium
CN110677711B (en) * 2019-10-17 2022-03-01 北京字节跳动网络技术有限公司 Video dubbing method and device, electronic equipment and computer readable medium
CN110797055B (en) * 2019-10-29 2021-09-03 北京达佳互联信息技术有限公司 Multimedia resource synthesis method and device, electronic equipment and storage medium
CN110769309B (en) * 2019-11-04 2023-03-31 北京字节跳动网络技术有限公司 Method, device, electronic equipment and medium for displaying music points
CN112822563A (en) 2019-11-15 2021-05-18 北京字节跳动网络技术有限公司 Method, device, electronic equipment and computer readable medium for generating video
CN112822541B (en) 2019-11-18 2022-05-20 北京字节跳动网络技术有限公司 Video generation method and device, electronic equipment and computer readable medium
CN111064992A (en) * 2019-12-10 2020-04-24 懂频智能科技(上海)有限公司 Method for automatically switching video contents according to music beats
CN110933487B (en) * 2019-12-18 2022-05-03 北京百度网讯科技有限公司 Method, device and equipment for generating click video and storage medium
CN111065001B (en) * 2019-12-25 2022-03-22 广州酷狗计算机科技有限公司 Video production method, device, equipment and storage medium
CN111031394B (en) * 2019-12-30 2022-03-22 广州酷狗计算机科技有限公司 Video production method, device, equipment and storage medium
CN111625682B (en) * 2020-04-30 2023-10-20 腾讯音乐娱乐科技(深圳)有限公司 Video generation method, device, computer equipment and storage medium
CN111741365B (en) * 2020-05-15 2021-10-26 广州小迈网络科技有限公司 Video composition data processing method, system, device and storage medium
CN111970571B (en) * 2020-08-24 2022-07-26 北京字节跳动网络技术有限公司 Video production method, device, equipment and storage medium
CN112153463B (en) * 2020-09-04 2023-06-16 上海七牛信息技术有限公司 Multi-material video synthesis method and device, electronic equipment and storage medium
CN112435687A (en) * 2020-11-25 2021-03-02 腾讯科技(深圳)有限公司 Audio detection method and device, computer equipment and readable storage medium
CN112866584B (en) * 2020-12-31 2023-01-20 北京达佳互联信息技术有限公司 Video synthesis method, device, terminal and storage medium
CN113014959B (en) * 2021-03-15 2022-08-09 福建省捷盛网络科技有限公司 Internet short video merging system
CN115695899A (en) * 2021-07-23 2023-02-03 花瓣云科技有限公司 Video generation method, electronic device and medium thereof
CN113676772B (en) * 2021-08-16 2023-08-08 上海哔哩哔哩科技有限公司 Video generation method and device
WO2023051245A1 (en) * 2021-09-29 2023-04-06 北京字跳网络技术有限公司 Video processing method and apparatus, and device and storage medium
CN113923378B (en) * 2021-09-29 2024-03-19 北京字跳网络技术有限公司 Video processing method, device, equipment and storage medium

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101421707A (en) * 2006-04-13 2009-04-29 Immersion Corporation System and method for automatically producing haptic events from a digital audio signal
CN101640057A (en) * 2009-05-31 2010-02-03 北京中星微电子有限公司 Audio and video matching method and device therefor
US20100220197A1 (en) * 2009-03-02 2010-09-02 John Nicholas Dukellis Assisted Video Creation Utilizing a Camera
CN102117638A (en) * 2009-12-30 2011-07-06 北京华旗随身数码股份有限公司 Method for outputting video under control of music rhythm and playing device
CN107770457A (en) * 2017-10-27 2018-03-06 维沃移动通信有限公司 A kind of video creating method and mobile terminal
CN108124101A (en) * 2017-12-18 2018-06-05 北京奇虎科技有限公司 Video capture method, device, electronic equipment and computer readable storage medium
CN108259983A (en) * 2017-12-29 2018-07-06 广州市百果园信息技术有限公司 A kind of method of video image processing, computer readable storage medium and terminal
CN109413342A (en) * 2018-12-21 2019-03-01 广州酷狗计算机科技有限公司 Audio/video processing method, device, terminal and storage medium
CN110336960A (en) * 2019-07-17 2019-10-15 广州酷狗计算机科技有限公司 Method, apparatus, terminal and the storage medium of Video Composition

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2001313915A (en) * 2000-04-28 2001-11-09 Matsushita Electric Ind Co Ltd Video conference equipment
CN107483843B (en) * 2017-08-16 2019-11-15 成都品果科技有限公司 Audio-video matches clipping method and device
CN107770626B (en) * 2017-11-06 2020-03-17 腾讯科技(深圳)有限公司 Video material processing method, video synthesizing device and storage medium


Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113613061A (en) * 2021-07-06 2021-11-05 北京达佳互联信息技术有限公司 Checkpoint template generation method, checkpoint template generation device, checkpoint template generation equipment and storage medium
CN113727038A (en) * 2021-07-28 2021-11-30 北京达佳互联信息技术有限公司 Video processing method and device, electronic equipment and storage medium
CN113727038B (en) * 2021-07-28 2023-09-05 北京达佳互联信息技术有限公司 Video processing method and device, electronic equipment and storage medium
CN114286164A (en) * 2021-12-28 2022-04-05 北京思明启创科技有限公司 Video synthesis method and device, electronic equipment and storage medium
CN114286164B (en) * 2021-12-28 2024-02-09 北京思明启创科技有限公司 Video synthesis method and device, electronic equipment and storage medium
CN114390356A (en) * 2022-01-19 2022-04-22 维沃移动通信有限公司 Video processing method, video processing device and electronic equipment

Also Published As

Publication number Publication date
CN110336960B (en) 2021-12-10
CN110336960A (en) 2019-10-15

Similar Documents

Publication Publication Date Title
WO2021008055A1 (en) Video synthesis method and apparatus, and terminal and storage medium
WO2020253096A1 (en) Method and apparatus for video synthesis, terminal and storage medium
CN110267067B (en) Live broadcast room recommendation method, device, equipment and storage medium
US11632584B2 (en) Video switching during music playback
CN111065001B (en) Video production method, device, equipment and storage medium
CN110491358B (en) Method, device, equipment, system and storage medium for audio recording
CN111918090B (en) Live broadcast picture display method and device, terminal and storage medium
CN111142838B (en) Audio playing method, device, computer equipment and storage medium
CN111061405B (en) Method, device and equipment for recording song audio and storage medium
WO2021068903A1 (en) Method for determining volume adjustment ratio information, apparatus, device and storage medium
CN109982129B (en) Short video playing control method and device and storage medium
WO2021139535A1 (en) Method, apparatus and system for playing audio, and device and storage medium
EP3618055B1 (en) Audio mixing method and terminal, and storage medium
WO2022095465A1 (en) Information display method and apparatus
WO2023011050A1 (en) Method and system for performing microphone-connection chorusing, and device and storage medium
CN110798327B (en) Message processing method, device and storage medium
CN109743461B (en) Audio data processing method, device, terminal and storage medium
CN111818358A (en) Audio file playing method and device, terminal and storage medium
WO2022227581A1 (en) Resource display method and computer device
CN112822544B (en) Video material file generation method, video synthesis method, device and medium
WO2020244516A1 (en) Online interaction method and device
CN111031394B (en) Video production method, device, equipment and storage medium
CN112616082A (en) Video preview method, device, terminal and storage medium
CN110868642B (en) Video playing method, device and storage medium
CN112118482A (en) Audio file playing method and device, terminal and storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19937614

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19937614

Country of ref document: EP

Kind code of ref document: A1