WO2021008055A1 - Video synthesis method and apparatus, and terminal and storage medium - Google Patents

Video synthesis method and apparatus, and terminal and storage medium

Info

Publication number
WO2021008055A1
Authority
WO
WIPO (PCT)
Prior art keywords
video
time point
audio
videos
accent
Prior art date
Application number
PCT/CN2019/120302
Other languages
French (fr)
Chinese (zh)
Inventor
吴晗
李文涛
王森
陈恒全
Original Assignee
广州酷狗计算机科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 广州酷狗计算机科技有限公司 filed Critical 广州酷狗计算机科技有限公司
Publication of WO2021008055A1 publication Critical patent/WO2021008055A1/en

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H1/00 Details of electrophonic musical instruments
    • G10H1/36 Accompaniment arrangements
    • G10H1/40 Rhythm
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N5/00 Details of television systems
    • H04N5/222 Studio circuitry; Studio devices; Studio equipment
    • H04N5/262 Studio circuits, e.g. for mixing, switching-over, change of character of image, other special effects; Cameras specially adapted for the electronic generation of special effects
    • H04N5/265 Mixing

Definitions

  • This application relates to the field of video processing technology, and in particular to a method, device, terminal and storage medium for video synthesis.
  • the embodiments of the present application provide a method, device, terminal, and storage medium for video synthesis, which can solve the problem of low video synthesis efficiency.
  • the technical solution is as follows:
  • a method for video synthesis includes:
  • the multiple material videos and the material audio are synthesized to obtain a synthesized video, wherein the switching time point of each material video in the synthesized video is an accent beat time point of the material audio.
  • the determining a plurality of material videos in the material video set based on the accent beat time point includes:
  • a plurality of material videos are determined in the material video set.
  • the determining of a plurality of material videos in the material video set based on the number N of accent beat time points and the start time point and end time point of the material audio includes:
  • N+1 material videos are determined in the material video set.
  • the synthesizing the multiple material videos and the material audio based on the accent beat time point to obtain a synthesized video includes:
  • each sub-video is synthesized to obtain a synthesized material video
  • the synthesized material video and the material audio are synthesized to obtain a synthesized video.
  • the determining the sub-video corresponding to the currently acquired material video based on the currently acquired material video and the accent beat time point includes:
  • N+1 material videos are determined in the material video set.
  • the material video set is a total material video spliced together from a plurality of material videos.
  • the material video set is a video set including multiple independent material videos.
  • the acquisition of the material video set, the material audio, and the accent beat time point of the material audio sent by the server includes:
  • the accent beat time point of the material audio for synthesizing video is determined.
  • a video synthesis device in a second aspect, includes:
  • a sending module configured to send a material acquisition request to the server, wherein the material acquisition request carries characteristic information of the material audio;
  • An acquisition module configured to acquire the material video set, the material audio, and the accent beat time point of the material audio sent by the server;
  • a determining module configured to determine multiple material videos in the material video collection based on the accent beat time point
  • the synthesis module is configured to synthesize the multiple material videos and the material audio based on the accent beat time point to obtain a synthesized video, wherein the switching time point of each material video in the synthesized video is an accent beat time point of the material audio.
  • the determining module is used to:
  • a plurality of material videos are determined in the material video set.
  • the determining module is used to:
  • N+1 material videos are determined in the material video set.
  • the synthesis module is used for:
  • each sub-video is synthesized to obtain a synthesized material video
  • the synthesized material video and the material audio are synthesized to obtain a synthesized video.
  • the synthesis module is used for:
  • N+1 material videos are determined in the material video set.
  • the material video set is a total material video spliced together from a plurality of material videos.
  • the material video set is a video set including multiple independent material videos.
  • the obtaining module 1120 is configured to:
  • the accent beat time point of the material audio for synthesizing video is determined.
  • a terminal, in a third aspect, includes a processor and a memory, where at least one instruction is stored in the memory, and the instruction is loaded and executed by the processor to implement the operations performed by the video synthesis method described in the first aspect.
  • a computer-readable storage medium, in which at least one instruction is stored, where the instruction is loaded and executed by a processor to implement the operations performed by the video synthesis method described in the first aspect.
  • the material video set, the material audio, and the accent beat time points of the material audio are obtained from the server. Then, multiple material videos are selected from the material video set. Finally, based on the accent beat time points, the multiple material videos and the material audio are synthesized to obtain a synthesized video in which the switching time point of each material video is an accent beat time point of the material audio. In this way, the material can be obtained automatically and the material videos and material audio can be synthesized automatically without manual processing, so the efficiency is high.
  • FIG. 1 is a flowchart of a method for video synthesis provided by an embodiment of the present application
  • FIG. 2 is a schematic diagram of an application program interface provided by an embodiment of the present application.
  • FIG. 3 is a schematic diagram of an application program interface provided by an embodiment of the present application.
  • FIG. 4 is a schematic diagram of calculating the number of material videos provided by an embodiment of the present application.
  • FIG. 5 is a schematic diagram of calculating the number of material videos provided by an embodiment of the present application.
  • FIG. 6 is a schematic diagram of calculating the number of material videos according to an embodiment of the present application.
  • FIG. 7 is a schematic diagram of an application program interface provided by an embodiment of the present application.
  • FIG. 8 is a schematic diagram of calculating the duration of a sub video provided by an embodiment of the present application.
  • FIG. 9 is a schematic diagram of calculating the duration of a sub video provided by an embodiment of the present application.
  • FIG. 10 is a schematic diagram of calculating the duration of a sub video provided by an embodiment of the present application.
  • FIG. 11 is a schematic structural diagram of a video synthesis device provided by an embodiment of the present application.
  • FIG. 12 is a schematic structural diagram of a terminal provided by an embodiment of the present application.
  • the embodiment of the present application provides a method for video synthesis, which may be implemented by a terminal.
  • the terminal may be a mobile phone, a tablet computer, and so on.
  • An application program (hereinafter referred to as a video production application) that can be used to make a composite video is installed in the terminal.
  • the video production application can be a comprehensive application with a variety of functions, such as making composite videos, video recording, video playback, video editing, and live broadcasting, or it can be a single-function application with only the function of making composite videos.
  • the user can select music in the video production application, and obtain the material for making the synthesized video from the server through the application.
  • the material can include material audio corresponding to the music and some material videos.
  • the application program can synthesize the acquired materials based on this method to obtain a synthesized video.
  • a music playback application and a video production application can be installed in the terminal at the same time.
  • the music playback application can be a comprehensive application with a variety of functions, such as music playback, audio recording, and live broadcasting, or it can be a single-function application with only the function of music playback.
  • the user can select favorite music through the music playback application, and obtain materials through the video production application to make a composite video.
  • a case where a music playback application and a video production application are installed in the terminal is taken as an example for description.
  • Fig. 1 is a flowchart of a method for video synthesis provided by an embodiment of the present application. Referring to Fig. 1, this embodiment includes:
  • Step 101 Send a material acquisition request to a server, where the material acquisition request carries characteristic information of the material audio.
  • the feature information of the material audio may be a music name, a hash value of the music name, or a hash value of audio data, etc.
  • the feature information can uniquely identify the material audio, and the specific information is not limited here.
  • the terminal is installed with a music playing application and a video production application.
  • the music playback application provides the user with a music selection interface.
  • the music selection interface can include a search bar and a music list.
  • the music list can display information such as music names and music durations.
  • the user can select favorite music in the above-mentioned music list, or search for favorite music through the search bar, and select the music.
  • the music playback application can jump to the music playback interface as shown in Figure 3.
  • the music playback interface can display the lyrics, music name, artist name, music playback progress, and other information of the currently playing music.
  • the submission option is the option that triggers video synthesis. When the user selects the submission option, it indicates that the user wants to use the currently playing music as the material audio to make a composite video.
  • the music playback application starts the video production application installed in the terminal through the system and sends the characteristic information of the currently playing music (i.e., the material audio) to the video production application. Then, the video production application sends a material acquisition request to the server through the terminal, where the material acquisition request carries the characteristic information of the material audio.
  • the server may be a background server of the video production application.
  • Step 102 Obtain the material video set, the material audio, and the accent beat time point of the material audio sent by the server.
  • the accent beat time point is the time point corresponding to the beat point with the beat value of 1 in the material audio.
  • the server may store audio dot data.
  • the dot data includes the beat time point and the beat value.
  • the beat time points and corresponding beat values can be collected and generated automatically by machine according to the BPM (Beats Per Minute, the number of beats per minute) of the audio data, beat information, and so on, or can be manually annotated by a technician listening to the audio data.
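The dot-data description above can be reduced to a small helper. This is an illustrative sketch, not the patent's implementation: it assumes dot data arrives as (time point, beat value) pairs and that a beat value of 1 marks an accent beat, as stated above.

```python
# Illustrative sketch (not the patent's code): extract accent beat time
# points from dot data, assuming each entry pairs a beat time point in
# seconds with a beat value, where a beat value of 1 marks an accent beat.
def accent_beat_time_points(dot_data):
    """dot_data: list of (time_point_s, beat_value) tuples."""
    return [t for t, beat_value in dot_data if beat_value == 1]

# Example: 4/4 audio at 120 BPM, i.e. a beat every 0.5 s, with every
# fourth beat accented (beat value 1).
dot_data = [(i * 0.5, 1 if i % 4 == 0 else (i % 4) + 1) for i in range(8)]
print(accent_beat_time_points(dot_data))  # [0.0, 2.0]
```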
  • the server can also store multiple material videos, and score the material videos according to their size, image quality, clarity and other indicators, and then select a preset number of material videos with higher scores from the stored material videos.
  • the preset number can be specified by the technician according to the general needs of the user, for example 20.
  • the server can cut each material video to a preset duration.
  • the preset duration can also be specified by technicians according to common user needs, for example 7 s. Because the interval between two accent beats in typical audio is usually 2 s to 4 s, setting the preset duration to more than 4 s helps avoid the clipped material video being shorter than the interval between two accent beats.
  • because the material videos need to be transmitted to the terminal, and considering transmission delay, the duration of a material video should not be too long; it can be 6 s to 10 s. The server can perform the above processing after receiving the material acquisition request sent by the terminal, or it can perform the processing in advance and store the processed material videos, so that after receiving a material acquisition request it can directly obtain the locally stored processed material videos, thereby improving the overall efficiency of making composite videos.
  • after the server cuts the preset number of material videos, it can use these material videos as the material video set sent to the terminal. In addition, considering transmission delay, the server can also splice these cut material videos together to obtain a total material video and send the total material video to the terminal as the material video set, so that the transmission delay is small. It should also be noted that when the total material video is used as the material video set, the server also needs to send the separation time points of each material video within the total material video to the terminal. For example, if the total material video consists of five 7 s material videos, the separation time points include 0:07 (0 minutes and 7 seconds), 0:14, 0:21, 0:28, and 0:35.
  • after receiving the material acquisition request, the server obtains the corresponding material audio and the dot data of the material audio according to the characteristic information of the material audio. Because the subsequent synthesis mainly uses the accent beat time points in the dot data (the time points corresponding to beat points with a beat value of 1), the server can send only the accent beat time points, together with the material audio and the material video set, to the terminal to reduce the amount of data transmitted. The terminal receives the material video set, the material audio, and the accent beat time points of the material audio sent by the server.
  • the material audio received by the terminal as described above is the original material audio.
  • the terminal can also cut the original material audio.
  • the processing can be as follows: the server can also send a preset cutting time point to the terminal; based on the preset cutting time point and a preset cutting duration, the original material audio is cut to obtain the material audio for synthesizing the video, and the accent beat time points of that material audio are determined from the accent beat time points of the original material audio.
  • the preset cutting time point can be a time point determined by a technician based on the rhythm of the material audio and other considerations, or it can be the climax time point of the material audio.
  • the climax time point can be manually marked by a technician or collected by machine. If the server sends both of these time points to the terminal, the terminal preferentially uses the time point determined by the technician based on the rhythm of the audio data and other considerations.
  • after obtaining the preset cutting time point and the original material audio, the terminal intercepts, from the original material audio, the audio of the preset cutting duration after the preset cutting time point as the material audio for synthesizing the video.
  • alternatively, the terminal can intercept the audio of the preset cutting duration after the start time point of the original material audio as the material audio for synthesizing the video.
  • the material audio in the following steps are all material audio used to synthesize the video, and correspondingly, the accent beat time point is also the accent beat time point of the material audio used to synthesize the video.
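The cutting step above can be sketched as follows. This is an assumption-laden illustration (the names are invented here), showing only how the accent beat time points of the cut material audio would be derived from those of the original audio:

```python
# Illustrative sketch: keep only the accent beat time points that fall
# inside the cut window [cut_point_s, cut_point_s + cut_duration_s),
# re-based so that the cut material audio starts at time 0.
def rebased_accent_points(accent_points_s, cut_point_s, cut_duration_s):
    end = cut_point_s + cut_duration_s
    return [t - cut_point_s for t in accent_points_s if cut_point_s <= t < end]

# Original audio with accent beats every 2 s; cut 8 s starting at the
# preset cutting time point t = 5 s.
print(rebased_accent_points(list(range(0, 20, 2)), 5, 8))  # [1, 3, 5, 7]
```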
  • Step 103 Based on the accent beat time point, multiple material videos are determined in the material video collection.
  • the terminal may determine multiple material videos in the acquired material video set based on the number N of accent beat time points, the starting time point and the ending time point of the material audio.
  • depending on whether an accent beat time point coincides with the start time point or the end time point of the material audio, the following situations can arise when determining the material videos:
  • for example, if the number of accent beat time points is 5, the start time point of the material audio is an accent beat time point, and the end time point is not, this is equivalent to the accent beat time points dividing the material audio into 5 parts.
  • each part can correspond to one material video, so 5 material videos can be determined in the material video set.
  • if both the start time point and the end time point are accent beat time points, the material audio is divided into 4 parts, and each part can correspond to one material video, so 4 material videos can be determined in the material video set.
  • if the number of accent beat time points is 5 and neither the start time point nor the end time point of the material audio is an accent beat time point, this is equivalent to the accent beat time points dividing the material audio into 6 parts, so N+1 material videos can be determined in the material video set.
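The cases above reduce to a single count: the N accent beat time points cut the material audio into segments, one material video per segment, and an accent that coincides with the start or end time point does not add an interior boundary. A sketch (function name illustrative):

```python
# Illustrative sketch of the case analysis above: count the material
# videos needed as the number of audio segments produced by the accent
# beat time points. Accents coinciding with the start or end of the
# audio are not interior boundaries, so they do not add a segment.
def material_video_count(n_accents, start_is_accent, end_is_accent):
    interior = n_accents - int(start_is_accent) - int(end_is_accent)
    return interior + 1

print(material_video_count(5, True, False))   # 5   (N)
print(material_video_count(5, True, True))    # 4   (N - 1)
print(material_video_count(5, False, False))  # 6   (N + 1)
```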
  • the number of material videos included in the material video set is less than the calculated number of material videos that need to be determined, then all the material videos in the material video set can be determined.
  • the N-1, N, or N+1 material videos determined above can be selected at random when the material video set is a set of multiple independent material videos. When the material video set is a total material video, in addition to random selection, the videos can also be selected sequentially from first to last, from last to first, or at intervals starting from the first.
  • the specific selection method is not limited in the embodiment of this application.
  • Step 104 Based on the accent beat time points, synthesize the multiple material videos and the material audio to obtain a synthesized video, where the switching time point of each material video in the synthesized video is an accent beat time point of the material audio.
  • when synthesizing the video, the terminal can randomly determine the synthesis order of the material videos.
  • if the material video set is a total material video, the synthesis order can also be determined according to each material video's position in the total material video.
  • following the synthesis order, the material videos are obtained one by one, and for each material video obtained, the sub-video corresponding to the currently obtained material video is determined based on that material video and the accent beat time points.
  • each sub-video is then synthesized to obtain a synthesized material video.
  • switching special effects, such as fade-in, fade-out, pop-in, and louvered (blinds) transitions, can be added between sub-videos; the switching special effects and their durations can be preset by technicians according to common user needs. Then, the synthesized material video and the material audio are synthesized to obtain the synthesized video. Finally, the synthesized video can be played automatically by the video production application; as shown in Figure 7, the synthesized video plays in the middle of the display interface of the video production application.
  • Case 1: If the currently acquired material video is first in the synthesis order, determine the first time length between the start time point of the material audio and the first accent beat time point after, and closest to, the start time point. In the material video, a video of the first time length is intercepted from the start time point of the material video as the first sub-video corresponding to the material video.
  • Case 2: If the currently acquired material video is not first in the synthesis order, determine the first total duration of the already generated sub-videos, and determine the first time point that is the first total duration after the start time point of the material audio. Then determine whether there is a second accent beat time point after, and closest to, the first time point. If there is, determine the second time length between the first time point and the second accent beat time point; in the material video, a video of the second time length is intercepted from the start time point of the material video as the second sub-video corresponding to the material video. If there is no second accent beat time point, determine the third time length from the first time point to the end time point of the material audio; in the material video, a video of the third time length is intercepted from the start time point of the material video as the third sub-video corresponding to the material video.
  • for example, suppose the duration of the material audio is 15 s, its start time point is 0:00, and its end time point is 0:15.
  • if the first duration between the start time point 0:00 and the first accent beat time point 0:03 is 3 s, then 3 s are intercepted from the start time point of the material video as the corresponding first sub-video.
  • suppose next that the first total duration of the generated sub-videos is 13 s, so the first time point after the start time point of the material audio is 0:13.
  • if the second accent beat time point is 0:14, the second duration from 0:13 to 0:14 is 1 s, and 1 s is intercepted from the start time point of the material video as the corresponding second sub-video.
  • if there is no second accent beat time point after 0:13, the third duration from 0:13 to the end time point 0:15 is 2 s, and 2 s are intercepted from the start time point of the material video as the corresponding third sub-video.
  • the duration of the finally obtained synthesized video is the total duration of the sub-videos.
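The duration logic of Cases 1 and 2, including the worked example above, can be sketched as one loop (illustrative, not the patent's code): each sub-video runs from the end of the sub-videos generated so far to the next accent beat time point, or to the end of the audio if none remains.

```python
# Illustrative sketch of the sub-video duration logic in Step 104.
def sub_video_durations(audio_duration_s, accent_points_s):
    durations, elapsed = [], 0.0
    while elapsed < audio_duration_s:
        # Next accent beat time point strictly after the elapsed total.
        nxt = next((t for t in accent_points_s if t > elapsed), None)
        if nxt is None or nxt > audio_duration_s:
            durations.append(audio_duration_s - elapsed)  # no accent left
        else:
            durations.append(nxt - elapsed)               # up to next accent
        elapsed += durations[-1]
    return durations

# 15 s material audio with accent beats at 3 s and 14 s.
print(sub_video_durations(15.0, [3.0, 14.0]))  # [3.0, 11.0, 1.0]
```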
  • the method provided by the embodiments of the present application obtains the material video set, the material audio, and the accent beat time points of the material audio from the server. Then, multiple material videos are selected from the material video set, and finally, based on the accent beat time points, the multiple material videos and the material audio are synthesized to obtain a synthesized video in which the switching time point of each material video is an accent beat time point of the material audio. In this way, the terminal can automatically obtain the material and automatically synthesize the material videos and material audio without manual processing, so the efficiency is higher.
  • in addition, a composite video can be made from the corresponding material audio and the shared material videos stored in the server.
  • the composite video can be used as a demo (sample) corresponding to the music and automatically played for users, which can attract users to enter the above-mentioned video production application to make composite videos themselves.
  • an embodiment of the present application also provides a device for video synthesis.
  • the device may be the terminal in the foregoing embodiment.
  • the device includes: a sending module 1110, an acquiring module 1120, a determining module 1130, and a synthesis module 1140.
  • the sending module 1110 is configured to send a material acquisition request to the server, where the material acquisition request carries feature information of the material audio;
  • the obtaining module 1120 is configured to obtain the material video set, the material audio, and the accent beat time point of the material audio sent by the server;
  • the determining module 1130 is configured to determine multiple material videos in the material video set based on the accent beat time point;
  • the synthesis module 1140 is configured to synthesize the multiple material videos and the material audio based on the accent beat time point to obtain a synthesized video, wherein the switching time point of each material video in the synthesized video is an accent beat time point of the material audio.
  • the determining module 1130 is configured to:
  • a plurality of material videos are determined in the material video set.
  • the determining module 1130 is configured to:
  • N+1 material videos are determined in the material video set.
  • the synthesis module 1140 is configured to:
  • each sub-video is synthesized to obtain a synthesized material video
  • the synthesized material video and the material audio are synthesized to obtain a synthesized video.
  • the synthesis module 1140 is configured to:
  • N+1 material videos are determined in the material video set.
  • the material video set is a total material video spliced together from a plurality of material videos.
  • the material video set is a video set including multiple independent material videos.
  • the obtaining module 1120 is used to:
  • the accent beat time point of the material audio for synthesizing video is determined.
  • the video synthesis device provided in the above embodiment is illustrated using the division of the above functional modules only as an example.
  • in practical applications, the above functions can be allocated to different functional modules as needed; that is, the internal structure of the terminal can be divided into different functional modules to complete all or part of the functions described above.
  • the video synthesis device provided in the foregoing embodiment and the video synthesis method embodiment belong to the same concept, and the specific implementation process is detailed in the method embodiment, which will not be repeated here.
  • FIG. 12 shows a structural block diagram of a terminal 1200 provided by an exemplary embodiment of the present application.
  • the terminal 1200 can be a smart phone, a tablet computer, an MP3 (Moving Picture Experts Group Audio Layer III) player, an MP4 (Moving Picture Experts Group Audio Layer IV) player, a laptop, or a desktop computer.
  • the terminal 1200 may also be called user equipment, portable terminal, laptop terminal, desktop terminal and other names.
  • the terminal 1200 includes a processor 1201 and a memory 1202.
  • the processor 1201 may include one or more processing cores, such as a 4-core processor, an 8-core processor, and so on.
  • the processor 1201 may be implemented in at least one hardware form among DSP (Digital Signal Processing), FPGA (Field-Programmable Gate Array), and PLA (Programmable Logic Array).
  • the processor 1201 may also include a main processor and a coprocessor.
  • the main processor is a processor used to process data in the awake state, also called a CPU (Central Processing Unit); the coprocessor is a low-power processor used to process data in the standby state.
  • the processor 1201 may be integrated with a GPU (Graphics Processing Unit), which is used to render and draw the content that needs to be displayed on the display screen.
  • the processor 1201 may also include an AI (Artificial Intelligence) processor, which is used to process computing operations related to machine learning.
  • the memory 1202 may include one or more computer-readable storage media, which may be non-transitory.
  • the memory 1202 may also include high-speed random access memory and non-volatile memory, such as one or more magnetic disk storage devices and flash memory storage devices.
  • the non-transitory computer-readable storage medium in the memory 1202 is used to store at least one instruction, and the at least one instruction is executed by the processor 1201 to implement the video synthesis method provided in the method embodiments of the present application.
  • the terminal 1200 may further include: a peripheral device interface 1203 and at least one peripheral device.
  • the processor 1201, the memory 1202, and the peripheral device interface 1203 may be connected by a bus or a signal line.
  • Each peripheral device can be connected to the peripheral device interface 1203 through a bus, a signal line or a circuit board.
  • the peripheral device includes: at least one of a radio frequency circuit 1204, a touch display screen 1205, a camera 1206, an audio circuit 1207, a positioning component 1208, and a power supply 1209.
  • the peripheral device interface 1203 may be used to connect at least one peripheral device related to I/O (Input/Output) to the processor 1201 and the memory 1202.
  • in some embodiments, the processor 1201, the memory 1202, and the peripheral device interface 1203 are integrated on the same chip or circuit board; in some other embodiments, any one or two of the processor 1201, the memory 1202, and the peripheral device interface 1203 may be implemented on a separate chip or circuit board, which is not limited in this embodiment.
  • the radio frequency circuit 1204 is used for receiving and transmitting RF (Radio Frequency, radio frequency) signals, also called electromagnetic signals.
  • the radio frequency circuit 1204 communicates with a communication network and other communication devices through electromagnetic signals.
  • the radio frequency circuit 1204 converts electrical signals into electromagnetic signals for transmission, or converts received electromagnetic signals into electrical signals.
  • the radio frequency circuit 1204 includes: an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a user identity module card, and so on.
  • the radio frequency circuit 1204 can communicate with other terminals through at least one wireless communication protocol.
  • the wireless communication protocol includes but is not limited to: metropolitan area networks, various generations of mobile communication networks (2G, 3G, 4G, and 5G), wireless local area networks and/or WiFi (Wireless Fidelity) networks.
  • the radio frequency circuit 1204 may also include NFC (Near Field Communication) related circuits, which is not limited in this application.
  • the display screen 1205 is used to display a UI (User Interface).
  • the UI can include graphics, text, icons, videos, and any combination thereof.
  • the display screen 1205 also has the ability to collect touch signals on or above the surface of the display screen 1205.
  • the touch signal may be input to the processor 1201 as a control signal for processing.
  • the display screen 1205 may also be used to provide virtual buttons and/or virtual keyboards, also called soft buttons and/or soft keyboards.
  • in some embodiments, there is one display screen 1205, which is disposed on the front panel of the terminal 1200; in other embodiments, there are at least two display screens 1205, which are respectively arranged on different surfaces of the terminal 1200 or adopt a folded design; in still other embodiments, the display screen 1205 may be a flexible display screen disposed on a curved or folding surface of the terminal 1200. Furthermore, the display screen 1205 can also be set in a non-rectangular irregular pattern, that is, a special-shaped screen.
  • the display screen 1205 may be made of materials such as LCD (Liquid Crystal Display) and OLED (Organic Light-Emitting Diode).
  • the camera assembly 1206 is used to capture images or videos.
  • the camera assembly 1206 includes a front camera and a rear camera.
  • the front camera is set on the front panel of the terminal, and the rear camera is set on the back of the terminal.
  • in some embodiments, there are at least two rear cameras, each being one of a main camera, a depth-of-field camera, a wide-angle camera, and a telephoto camera, so as to realize the background blur function through fusion of the main camera and the depth-of-field camera, realize panoramic shooting and VR (Virtual Reality) shooting through fusion with the wide-angle camera, or realize other fusion shooting functions.
  • the camera assembly 1206 may also include a flash.
  • the flash can be a single-color-temperature flash or a dual-color-temperature flash. A dual-color-temperature flash refers to a combination of a warm-light flash and a cold-light flash, and can be used for light compensation under different color temperatures.
  • the audio circuit 1207 may include a microphone and a speaker.
  • the microphone is used to collect sound waves from the user and the environment, convert the sound waves into electrical signals, and input them to the processor 1201 for processing, or input them to the radio frequency circuit 1204 to implement voice communication. For the purpose of stereo collection or noise reduction, there may be multiple microphones, respectively arranged at different parts of the terminal 1200.
  • the microphone can also be an array microphone or an omnidirectional acquisition microphone.
  • the speaker is used to convert the electrical signal from the processor 1201 or the radio frequency circuit 1204 into sound waves.
  • the speaker can be a traditional membrane speaker or a piezoelectric ceramic speaker.
  • when the speaker is a piezoelectric ceramic speaker, it can not only convert electrical signals into sound waves audible to humans, but also convert electrical signals into sound waves inaudible to humans for purposes such as distance measurement.
  • the audio circuit 1207 may also include a headphone jack.
  • the positioning component 1208 is used to locate the current geographic location of the terminal 1200 to implement navigation or LBS (Location Based Service).
  • the positioning component 1208 may be a positioning component based on the GPS (Global Positioning System) of the United States, the Beidou system of China, the GLONASS system of Russia, or the Galileo system of the European Union.
  • the power supply 1209 is used to supply power to various components in the terminal 1200.
  • the power source 1209 may be alternating current, direct current, disposable batteries or rechargeable batteries.
  • the rechargeable battery may support wired charging or wireless charging.
  • the rechargeable battery can also be used to support fast charging technology.
  • the terminal 1200 further includes one or more sensors 1210.
  • the one or more sensors 1210 include, but are not limited to: an acceleration sensor 1211, a gyroscope sensor 1212, a pressure sensor 1213, a fingerprint sensor 1214, an optical sensor 1215, and a proximity sensor 1216.
  • the acceleration sensor 1211 can detect the magnitude of acceleration on the three coordinate axes of the coordinate system established by the terminal 1200.
  • the acceleration sensor 1211 can be used to detect the components of the gravitational acceleration on three coordinate axes.
  • the processor 1201 may control the touch screen 1205 to display the user interface in a horizontal view or a vertical view according to the gravity acceleration signal collected by the acceleration sensor 1211.
  • the acceleration sensor 1211 can also be used for game or user motion data collection.
  • the gyroscope sensor 1212 can detect the body direction and rotation angle of the terminal 1200, and the gyroscope sensor 1212 can cooperate with the acceleration sensor 1211 to collect the user's 3D actions on the terminal 1200.
  • the processor 1201 can implement the following functions according to the data collected by the gyroscope sensor 1212: motion sensing (such as changing the UI according to the user's tilt operation), image stabilization during shooting, game control, and inertial navigation.
  • the pressure sensor 1213 may be disposed on the side frame of the terminal 1200 and/or the lower layer of the touch screen 1205.
  • the processor 1201 performs left- and right-hand recognition or quick operations according to the holding signal collected by the pressure sensor 1213.
  • the processor 1201 controls the operability controls on the UI according to the user's pressure operation on the touch display screen 1205.
  • the operability control includes at least one of a button control, a scroll bar control, an icon control, and a menu control.
  • the fingerprint sensor 1214 is used to collect the user's fingerprint.
  • the processor 1201 identifies the user's identity according to the fingerprint collected by the fingerprint sensor 1214, or the fingerprint sensor 1214 itself identifies the user's identity according to the collected fingerprint. When the user's identity is recognized as a trusted identity, the processor 1201 authorizes the user to perform related sensitive operations, including unlocking the screen, viewing encrypted information, downloading software, paying, and changing settings.
  • the fingerprint sensor 1214 may be provided on the front, back or side of the terminal 1200. When a physical button or a manufacturer logo is provided on the terminal 1200, the fingerprint sensor 1214 can be integrated with the physical button or the manufacturer logo.
  • the optical sensor 1215 is used to collect the ambient light intensity.
  • the processor 1201 may control the display brightness of the touch screen 1205 according to the intensity of the ambient light collected by the optical sensor 1215. Specifically, when the ambient light intensity is high, the display brightness of the touch display screen 1205 is increased; when the ambient light intensity is low, the display brightness of the touch display screen 1205 is decreased.
  • the processor 1201 may also dynamically adjust the shooting parameters of the camera assembly 1206 according to the ambient light intensity collected by the optical sensor 1215.
  • the proximity sensor 1216, also called a distance sensor, is usually arranged on the front panel of the terminal 1200.
  • the proximity sensor 1216 is used to collect the distance between the user and the front of the terminal 1200.
  • when the proximity sensor 1216 detects that the distance between the user and the front of the terminal 1200 gradually decreases, the processor 1201 controls the touch display screen 1205 to switch from the bright-screen state to the rest-screen state; when the proximity sensor 1216 detects that the distance between the user and the front of the terminal 1200 gradually increases, the processor 1201 controls the touch display screen 1205 to switch from the rest-screen state to the bright-screen state.
  • the structure shown in FIG. 12 does not constitute a limitation on the terminal 1200, which may include more or fewer components than shown, combine certain components, or adopt a different component arrangement.
  • a computer-readable storage medium includes a memory storing instructions.
  • the instructions can be executed by a processor in a terminal to complete the video synthesis method in the foregoing embodiment.
  • the computer-readable storage medium may be non-transitory.
  • the computer-readable storage medium may be ROM (Read-Only Memory), RAM (Random Access Memory), CD-ROM, magnetic tape, floppy disk, optical data storage device, etc.

Abstract

Disclosed is a video synthesis method, which belongs to the technical field of video processing. The method comprises: sending a material acquisition request to a server, wherein the material acquisition request carries feature information of material audio; acquiring a material video set, the material audio, and the accent beat time points of the material audio sent by the server; determining, on the basis of the accent beat time points, a plurality of material videos from the material video set; and synthesizing, on the basis of the accent beat time points, the plurality of material videos and the material audio to obtain a synthesized video, wherein the switching time point of each material video in the synthesized video is an accent beat time point of the audio data. By means of the present application, video synthesis efficiency can be improved.

Description

Method, device, terminal and storage medium for video synthesis
This application claims priority to the Chinese patent application No. 201910647507.9, filed on July 17, 2019 and entitled "Method, device, terminal and storage medium for video synthesis", the entire content of which is incorporated herein by reference.
Technical field
This application relates to the field of video processing technology, and in particular to a method, device, terminal and storage medium for video synthesis.
Background
In daily life, people usually want to use their favorite music as background music to make short videos.
Generally speaking, when making a short video, one needs to collect material videos oneself, then use video editing software to splice the collected material videos together, and add favorite music as background music to obtain a synthesized video.
In the process of implementing this application, the inventor found that the prior art has at least the following problems:
The above process of making a synthesized video must be completed manually, which is cumbersome and has low synthesis efficiency.
Summary of the invention
The embodiments of the present application provide a method, device, terminal, and storage medium for video synthesis, which can solve the problem of low video synthesis efficiency. The technical solution is as follows:
In a first aspect, a method for video synthesis is provided, and the method includes:
sending a material acquisition request to a server, where the material acquisition request carries characteristic information of material audio;
acquiring the material video set, the material audio, and the accent beat time points of the material audio sent by the server;
determining multiple material videos in the material video set based on the accent beat time points;
synthesizing the multiple material videos and the material audio based on the accent beat time points to obtain a synthesized video, wherein the switching time point of each material video in the synthesized video is an accent beat time point of the audio data.
Optionally, the determining multiple material videos in the material video set based on the accent beat time points includes:
determining multiple material videos in the material video set based on the number N of accent beat time points and the start time point and end time point of the material audio.
Optionally, the determining multiple material videos in the material video set based on the number N of accent beat time points and the start time point and end time point of the material audio includes:
if exactly one of the start time point and the end time point of the material audio is an accent beat time point, determining N material videos in the material video set;
if both the start time point and the end time point of the material audio are accent beat time points, determining N-1 material videos in the material video set;
if neither the start time point nor the end time point of the material audio is an accent beat time point, determining N+1 material videos in the material video set.
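The three cases above reduce to one counting rule: the selected clips fill the segments between consecutive boundary points, where the boundaries are the audio's start and end points plus its accent beat time points. A minimal Python sketch of that rule (function and parameter names are illustrative, not from the application):

```python
def num_material_videos(accent_points, start, end):
    """Number of material videos to select for N accent beat time points,
    depending on whether the audio's start/end points are accent points."""
    n = len(accent_points)
    start_is_accent = start in accent_points
    end_is_accent = end in accent_points
    if start_is_accent and end_is_accent:
        return n - 1  # both endpoints are accent beat time points
    if start_is_accent or end_is_accent:
        return n      # exactly one endpoint is an accent beat time point
    return n + 1      # neither endpoint is an accent beat time point
```

For example, with accent points at 0 s, 2 s and 4 s in a 0–4 s audio, both endpoints are accent points and N-1 = 2 clips are needed.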
Optionally, the synthesizing the multiple material videos and the material audio based on the accent beat time points to obtain a synthesized video includes:
determining the synthesis order of each material video when synthesizing the video;
acquiring the material videos one by one according to the synthesis order, and, for each acquired material video, determining the sub-video corresponding to the currently acquired material video based on the currently acquired material video and the accent beat time points;
synthesizing each sub-video based on the synthesis order to obtain a synthesized material video, and synthesizing the synthesized material video and the material audio to obtain a synthesized video.
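Since each clip switch must land on an accent beat time point, each sub-video spans one interval between consecutive boundary points, so the sub-video taken from the i-th material video (in synthesis order) has the duration of the i-th interval. A sketch of that pairing under this assumed reading (names illustrative):

```python
def segment_durations(accent_points, start, end):
    """Durations of the consecutive segments bounded by the audio's start
    and end points and its accent beat time points (all in seconds)."""
    boundaries = sorted(set([start, end] + list(accent_points)))
    return [b - a for a, b in zip(boundaries, boundaries[1:])]

def pair_sub_videos(ordered_videos, accent_points, start, end):
    """Pair each material video, in synthesis order, with the duration of
    the segment its sub-video must fill; the sub-video would then be cut
    from that material video to this length."""
    return list(zip(ordered_videos, segment_durations(accent_points, start, end)))
```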
Optionally, the determining the sub-video corresponding to the currently acquired material video based on the currently acquired material video and the accent beat time points includes:
if exactly one of the start time point and the end time point of the material audio is an accent beat time point, determining N material videos in the material video set;
if both the start time point and the end time point of the material audio are accent beat time points, determining N-1 material videos in the material video set;
if neither the start time point nor the end time point of the material audio is an accent beat time point, determining N+1 material videos in the material video set.
Optionally, the material video set is one total material video spliced from multiple material videos.
Optionally, the material video set is a video collection including multiple independent material videos.
Optionally, the acquiring the material video set, the material audio, and the accent beat time points of the material audio sent by the server includes:
receiving the material video set, the original material audio, the accent beat time points of the original material audio, and a preset cutting time point sent by the server;
cutting the original material audio based on the preset cutting time point and a preset cutting duration to obtain material audio for synthesizing the video;
determining, among the accent beat time points of the original material audio, the accent beat time points of the material audio used for synthesizing the video.
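One plausible reading of this trimming step is: cut the original audio at the preset cutting time point for the preset duration, then keep only those accent beat time points of the original audio that fall inside the cut range, re-expressed on the clip's own timeline. The text does not spell out the time shift, so the sketch below is an illustrative assumption:

```python
def accents_after_cut(original_accents, cut_start, cut_length):
    """Accent beat time points of the trimmed clip, derived from those of
    the original material audio (all times in seconds)."""
    cut_end = cut_start + cut_length
    return [t - cut_start for t in original_accents if cut_start <= t <= cut_end]
```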
In a second aspect, a device for video synthesis is provided, and the device includes:
a sending module, configured to send a material acquisition request to a server, where the material acquisition request carries characteristic information of material audio;
an acquisition module, configured to acquire the material video set, the material audio, and the accent beat time points of the material audio sent by the server;
a determining module, configured to determine multiple material videos in the material video set based on the accent beat time points;
a synthesis module, configured to synthesize the multiple material videos and the material audio based on the accent beat time points to obtain a synthesized video, wherein the switching time point of each material video in the synthesized video is an accent beat time point of the audio data.
Optionally, the determining module is configured to:
determine multiple material videos in the material video set based on the number N of accent beat time points and the start time point and end time point of the material audio.
Optionally, the determining module is configured to:
if exactly one of the start time point and the end time point of the material audio is an accent beat time point, determine N material videos in the material video set;
if both the start time point and the end time point of the material audio are accent beat time points, determine N-1 material videos in the material video set;
if neither the start time point nor the end time point of the material audio is an accent beat time point, determine N+1 material videos in the material video set.
Optionally, the synthesis module is configured to:
determine the synthesis order of each material video when synthesizing the video;
acquire the material videos one by one according to the synthesis order, and, for each acquired material video, determine the sub-video corresponding to the currently acquired material video based on the currently acquired material video and the accent beat time points;
synthesize each sub-video based on the synthesis order to obtain a synthesized material video, and synthesize the synthesized material video and the material audio to obtain a synthesized video.
Optionally, the synthesis module is configured to:
if exactly one of the start time point and the end time point of the material audio is an accent beat time point, determine N material videos in the material video set;
if both the start time point and the end time point of the material audio are accent beat time points, determine N-1 material videos in the material video set;
if neither the start time point nor the end time point of the material audio is an accent beat time point, determine N+1 material videos in the material video set.
Optionally, the material video set is one total material video spliced from multiple material videos.
Optionally, the material video set is a video collection including multiple independent material videos.
Optionally, the acquisition module 1120 is configured to:
receive the material video set, the original material audio, the accent beat time points of the original material audio, and a preset cutting time point sent by the server;
cut the original material audio based on the preset cutting time point and a preset cutting duration to obtain material audio for synthesizing the video;
determine, among the accent beat time points of the original material audio, the accent beat time points of the material audio used for synthesizing the video.
In a third aspect, a terminal is provided. The terminal includes a processor and a memory, and at least one instruction is stored in the memory; the instruction is loaded and executed by the processor to implement the operations performed by the method for video synthesis described in the first aspect above.
In a fourth aspect, a computer-readable storage medium is provided. At least one instruction is stored in the computer-readable storage medium, and the instruction is loaded and executed by a processor to implement the operations performed by the method for video synthesis described in the first aspect above.
The beneficial effects brought about by the technical solutions provided by the embodiments of this application are as follows:
The material video set, the material audio, and the accent beat time points of the material audio are obtained from the server. Then, multiple material videos are selected from the material video set. Finally, based on the accent beat time points, the multiple material videos and the material audio are synthesized to obtain a synthesized video. In the obtained synthesized video, the switching time point of each material video is an accent beat time point of the audio data. In this way, the material can be obtained automatically, and the material videos and the material audio can be synthesized automatically without manual processing, so the efficiency is high.
Description of the drawings
In order to more clearly describe the technical solutions in the embodiments of the present application, the following briefly introduces the drawings needed in the description of the embodiments. Obviously, the drawings in the following description are only some embodiments of the present application; for those of ordinary skill in the art, other drawings can be obtained from these drawings without creative work.
FIG. 1 is a flowchart of a method for video synthesis provided by an embodiment of the present application;
FIG. 2 is a schematic diagram of an application program interface provided by an embodiment of the present application;
FIG. 3 is a schematic diagram of an application program interface provided by an embodiment of the present application;
FIG. 4 is a schematic diagram of calculating the number of material videos provided by an embodiment of the present application;
FIG. 5 is a schematic diagram of calculating the number of material videos provided by an embodiment of the present application;
FIG. 6 is a schematic diagram of calculating the number of material videos provided by an embodiment of the present application;
FIG. 7 is a schematic diagram of an application program interface provided by an embodiment of the present application;
FIG. 8 is a schematic diagram of calculating the duration of a sub-video provided by an embodiment of the present application;
FIG. 9 is a schematic diagram of calculating the duration of a sub-video provided by an embodiment of the present application;
FIG. 10 is a schematic diagram of calculating the duration of a sub-video provided by an embodiment of the present application;
FIG. 11 is a schematic structural diagram of a device for video synthesis provided by an embodiment of the present application;
FIG. 12 is a schematic structural diagram of a terminal provided by an embodiment of the present application.
Detailed description
In order to make the objectives, technical solutions, and advantages of the present application clearer, the embodiments of the present application are described in further detail below with reference to the accompanying drawings.
The embodiments of the present application provide a method for video synthesis, which may be implemented by a terminal. The terminal may be a mobile phone, a tablet computer, and so on. An application program that can be used to make synthesized videos (hereinafter referred to as a video production application) is installed in the terminal. The video production application may be a comprehensive application with a variety of functions, such as making synthesized videos, video recording, video playback, video editing, and live broadcast, or it may be a single-function application that only has the function of making synthesized videos.
The user can select music in the video production application and obtain, from the server through the application, the materials for making a synthesized video; the materials may include the material audio corresponding to the music and some material videos. The application can then synthesize the acquired materials based on this method to obtain a synthesized video.
In addition, a music playback application and a video production application may be installed in the terminal at the same time. Similarly, the music playback application may be a comprehensive application with a variety of functions, such as music playback, audio recording, and live broadcast, or it may be a single-function application that only has the function of music playback. The user can select favorite music through the music playback application and obtain materials through the video production application to make a synthesized video. In the following specific embodiments, the case where a music playback application and a video production application are both installed in the terminal is taken as an example for description.
FIG. 1 is a flowchart of a method for video synthesis provided by an embodiment of the present application. Referring to FIG. 1, this embodiment includes:
Step 101: Send a material acquisition request to a server, where the material acquisition request carries characteristic information of the material audio.
The characteristic information of the material audio may be a music name, a hash value of the music name, or a hash value of the audio data, etc. The characteristic information only needs to uniquely identify the material audio; the specific kind of information is not limited here.
In implementation, a music playback application and a video production application are installed in the terminal. As shown in FIG. 2, the music playback application provides the user with a music selection interface, which may include a search bar and a music list; the music list may display information such as the music name and duration of each piece of music.
The user can select favorite music from the music list, or search for favorite music through the search bar and select it. After the user selects a piece of music, the music playback application can jump to the music playback interface shown in FIG. 3, which may display the lyrics of the currently playing music, the music name, the singer's name, the music playback progress, and so on. In addition, a submission option may be displayed in the upper right corner of the music playback interface. The submission option is the option that triggers video synthesis. When the user selects the submission option, it indicates that the user wants to use the currently playing music as the material audio to make a synthesized video. The music playback application starts, through the system, the video production application installed in the terminal and sends the characteristic information of the currently playing music (i.e., the material audio) to the video production application. Then, the video production application sends a material acquisition request carrying the characteristic information of the material audio to the server through the terminal, where the server may be a background server of the video production application.
Step 102: Acquire the material video set, the material audio, and the accent beat time points of the material audio sent by the server.
The accent beat time points are the time points in the material audio corresponding to beat points with a beat value of 1.
在实施中，服务器可以存储有音频的打点数据，该打点数据中包括节拍时间点和节拍值，节拍时间点和对应的节拍值可以由技术人员使用机器根据音频数据的BPM(Beat Per Minute,每分钟节拍数)、节拍信息等采集生成，也可以由技术人员通过听该音频数据，手动标记制作。服务器还可以存储有多个素材视频，并根据素材视频的大小、画质、清晰度等指标来给素材视频评分，然后，在存储的素材视频中选取预设数目个评分较高的素材视频，预设数目可以由技术人员根据用户的普遍需求来指定，例如20个。对于选取的预设数目的素材视频，服务器可以对每个素材视频进行剪切，将每个素材视频均剪切为预设时长，预设时长也可以由技术人员根据用户的普遍需求来指定，例如7s。因为在一般的音频中两个重音节拍点之间的时长通常为2s到4s，预设时长设置大于4s，可以尽量避免剪切后的素材视频的时长小于两个重音节拍点之间的时长。另外，因为素材视频需要传输给终端，所以考虑到传输时延的问题，素材视频时长不宜过长，可以以6s到10s为宜。可以在接收到终端发送的素材获取请求后进行以上处理，也可以预先进行并将处理后的素材视频进行存储，接收到素材获取请求后，便可以直接获取本地存储的处理后的素材视频，从而提高制作合成视频的整体效率。In implementation, the server may store beat annotation ("dotting") data for audio. The annotation data includes beat time points and beat values. The beat time points and their corresponding beat values may be generated by machine analysis based on the BPM (Beats Per Minute) and beat information of the audio data, or may be manually annotated by a technician who listens to the audio data. The server may also store multiple material videos, score them according to indicators such as file size, image quality, and definition, and then select a preset number of higher-scoring material videos from those stored. The preset number may be specified by a technician according to general user needs, for example 20. For the selected material videos, the server may cut each one to a preset duration, which may likewise be specified by a technician according to general user needs, for example 7s. Because the interval between two accent beat points in typical audio is usually 2s to 4s, setting the preset duration to more than 4s largely avoids a cut material video being shorter than the interval between two accent beat points. In addition, because the material videos need to be transmitted to the terminal, and considering transmission delay, a material video should not be too long; 6s to 10s is suitable. The above processing may be performed after receiving a material acquisition request sent by the terminal, or performed in advance with the processed material videos stored, so that once a material acquisition request is received, the locally stored processed material videos can be obtained directly, improving the overall efficiency of producing the synthesized video.
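The machine generation of annotation data described above can be sketched roughly. The sketch below assumes a constant tempo and a fixed 4/4 meter in which the downbeat carries beat value 1; the function names and the meter assumption are illustrative and not part of the original scheme:

```python
def beat_annotations(bpm, duration_s, beats_per_bar=4):
    """Generate (beat_time_point, beat_value) pairs for audio of the given
    length, assuming a constant tempo and a fixed meter. Beat value 1 marks
    the downbeat (accent); other beats in the bar get values 2..beats_per_bar."""
    interval = 60.0 / bpm  # seconds between consecutive beats
    annotations = []
    t, i = 0.0, 0
    while t < duration_s:
        annotations.append((round(t, 3), i % beats_per_bar + 1))
        t += interval
        i += 1
    return annotations

def accent_time_points(annotations):
    """Keep only the time points whose beat value is 1 (the accent beats)."""
    return [t for t, value in annotations if value == 1]

beats = beat_annotations(bpm=120, duration_s=10)
accents = accent_time_points(beats)
# At 120 BPM the beat interval is 0.5s, so accents fall every 2s:
# [0.0, 2.0, 4.0, 6.0, 8.0]
```

Real audio rarely has a perfectly constant tempo, which is why the scheme also allows manual annotation by a technician.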
服务器在对预设数目个视频素材进行剪切处理后，可以将这些素材视频作为发送给终端的素材视频集。另外，考虑到传输时延的问题，服务器还可以在对预设数目个素材视频进行剪切处理后，对这些素材视频进行拼接处理，得到一个总素材视频，并将该总素材视频作为发送给终端的素材视频集发送给终端，这样，传输时延较小。此处还需说明的是，在将总素材视频作为素材视频集的情况下，服务器还需将总素材视频中各素材视频的分隔时间点同时发送给终端。例如，总素材视频由5个7s的素材视频组成，那么，分隔时间点包括有0:07(0分7秒)、0:14、0:21、0:28和0:35。After cutting the preset number of material videos, the server may use these material videos as the material video set to be sent to the terminal. Alternatively, considering transmission delay, the server may splice the cut material videos into one total material video and send that total material video to the terminal as the material video set, which results in a smaller transmission delay. It should also be noted that when the total material video is used as the material video set, the server also needs to send the terminal the separation time points of the individual material videos within the total material video. For example, if the total material video consists of five 7s material videos, the separation time points include 0:07 (0 minutes 7 seconds), 0:14, 0:21, 0:28, and 0:35.
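The separation time points for a total material video follow directly from the clip durations; a minimal sketch (the function name is illustrative):

```python
def separation_time_points(clip_durations_s):
    """Given the durations (in whole seconds) of the clips spliced into one
    total material video, return the cumulative separation time points
    formatted as m:ss, as in the example above."""
    points, total = [], 0
    for d in clip_durations_s:
        total += d
        points.append(f"{total // 60}:{total % 60:02d}")
    return points

# Five 7s clips, as in the example above.
print(separation_time_points([7] * 5))  # ['0:07', '0:14', '0:21', '0:28', '0:35']
```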
在接收到素材获取请求后，服务器根据素材音频的特征信息，获取相应的素材音频和该素材音频的打点数据。因为在后续制作合成时，主要会使用到打点数据中的重音节拍时间点（节拍值为1的节拍点对应的时间点），因此，可以只将重音节拍时间点与素材音频、素材视频集发送给终端，以减少传输的数据量。终端接收服务器发送的素材音频、素材视频集和素材音频的重音节拍时间点。After receiving the material acquisition request, the server obtains the corresponding material audio and the annotation data of that material audio according to the feature information of the material audio. Because the subsequent synthesis mainly uses the accent beat time points in the annotation data (the time points corresponding to beat points with a beat value of 1), only the accent beat time points may be sent to the terminal together with the material audio and the material video set, to reduce the amount of data transmitted. The terminal receives the material audio, the material video set, and the accent beat time points of the material audio sent by the server.
此处，还需说明的是，上述终端接收到的素材音频为原始素材音频，终端还可以对原始素材音频进行剪切，相应的，处理可以如下：服务器还可以向终端发送预设剪切时间点，基于预设剪切时间点和预设剪切时长，对原始素材音频进行剪切，得到用于合成视频的素材音频，在原始素材音频的重音节拍时间点中，确定用于合成视频的素材音频的重音节拍时间点。It should also be noted here that the material audio received by the terminal above is the original material audio, and the terminal may further cut it. Correspondingly, the processing may be as follows: the server may also send a preset cutting time point to the terminal; based on the preset cutting time point and a preset cutting duration, the original material audio is cut to obtain the material audio used for synthesizing the video; and among the accent beat time points of the original material audio, the accent beat time points of the material audio used for synthesizing the video are determined.
其中，预设剪切时间点可以为技术人员根据素材音频的节奏等综合考虑确定的时间点，也可以为素材音频的高潮时间点，该高潮时间点可以由技术人员人工标记得出，或者由机器采集得出。如果服务器将这两种时间点都发送给终端，则终端优先选择使用技术人员根据音频数据的节奏等综合考虑确定的时间点。The preset cutting time point may be a time point determined by a technician based on comprehensive considerations such as the rhythm of the material audio, or it may be the climax time point of the material audio, which may be manually marked by a technician or derived by machine analysis. If the server sends both kinds of time points to the terminal, the terminal preferentially uses the time point determined by the technician based on comprehensive considerations such as the rhythm of the audio data.
终端在得到预设剪切时间点和原始素材音频后，在原始素材音频中，截取预设剪切时间点之后预设剪切时长的素材音频，作为用于合成视频的素材音频。当然，如果服务器并未发送预设剪切时间点，则终端可以截取原始素材音频的起始时间点之后预设剪切时长的素材音频，作为用于合成视频的素材音频。After obtaining the preset cutting time point and the original material audio, the terminal intercepts, from the original material audio, a segment of the preset cutting duration starting at the preset cutting time point, as the material audio used for synthesizing the video. Of course, if the server did not send a preset cutting time point, the terminal may intercept a segment of the preset cutting duration starting at the start time point of the original material audio as the material audio used for synthesizing the video.
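The trimming just described, together with re-deriving the accent beat time points for the trimmed audio, can be sketched as follows. This is a minimal sketch under the assumption that all time values are in seconds; the function names are illustrative:

```python
def trim_material_audio(audio_duration_s, preset_cut_len_s, preset_cut_point_s=None):
    """Return the (start, end) window, in seconds, of the original material
    audio to use for synthesis. If the server supplied no preset cutting time
    point, cut from the start of the original audio instead."""
    start = preset_cut_point_s if preset_cut_point_s is not None else 0.0
    end = min(start + preset_cut_len_s, audio_duration_s)
    return start, end

def remap_accents(accents, window):
    """Keep the accent beat time points that fall inside the cut window and
    re-express them relative to the new start of the trimmed audio."""
    start, end = window
    return [t - start for t in accents if start <= t < end]

window = trim_material_audio(audio_duration_s=180, preset_cut_len_s=15,
                             preset_cut_point_s=60)      # e.g. climax at 1:00
accents = remap_accents([58, 61, 63.5, 66, 80], window)  # -> [1, 3.5, 6]
```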
此处还需说明的是，以下步骤中的素材音频均为用于合成视频的素材音频，相应的，重音节拍时间点也为用于合成视频的素材音频的重音节拍时间点。It should also be noted here that the material audio in the following steps refers to the material audio used for synthesizing the video; correspondingly, the accent beat time points are the accent beat time points of that material audio.
步骤103、基于重音节拍时间点,在素材视频集中,确定出多个素材视频。Step 103: Based on the accent beat time point, multiple material videos are determined in the material video collection.
在实施中,终端可以基于重音节拍时间点的个数N、素材音频的起始时间点和结束时间点,在获取到的素材视频集中,确定出多个素材视频。相应的,根据重音节拍时间点是否为素材音频的起始时间点或者结束时间点,确定素材视频时,可以有如下几种情况:In an implementation, the terminal may determine multiple material videos in the acquired material video set based on the number N of accent beat time points, the starting time point and the ending time point of the material audio. Correspondingly, according to whether the accent beat time point is the start time point or the end time point of the material audio, when determining the material video, there can be the following situations:
情况一、如果素材音频的起始时间点和结束时间点中有一个时间点是重音节拍时间点,则在素材视频集中,确定出N个素材视频。Case 1: If one of the start time point and the end time point of the material audio is the accent beat time point, in the material video set, N material videos are determined.
情况二、如果素材音频的起始时间点和结束时间点均是重音节拍时间点,则在素材视频集中,确定出N-1个素材视频。Case 2: If the start time point and the end time point of the material audio are both accent beat time points, then N-1 material videos are determined in the material video set.
情况三、如果素材音频的起始时间点和结束时间点均不是重音节拍时间点,则在素材视频集中,确定出N+1个素材视频。Case 3: If neither the start time point nor the end time point of the material audio is the accent beat time point, in the material video collection, N+1 material videos are determined.
下面对上述三种情况分别举例进行说明。The above three situations will be described below with examples.
对于情况一、如图4所示，重音节拍时间点的个数为5，素材音频的起始时间点是重音节拍时间点，结束时间点不是重音节拍时间点，则相当于重音节拍时间点将该素材音频分为5部分，那么，每部分可以对应一个素材视频，所以，可以在素材视频集中，确定出5个素材视频。For case one, as shown in Figure 4, there are 5 accent beat time points; the start time point of the material audio is an accent beat time point and the end time point is not, so the accent beat time points effectively divide the material audio into 5 parts. Each part can correspond to one material video, so 5 material videos can be determined from the material video set.
对于情况二、如图5所示，重音节拍时间点的个数为5，素材音频的起始时间点和结束时间点均是重音节拍时间点，则相当于重音节拍时间点将该素材音频分为4部分，那么，每部分可以对应一个素材视频，所以，可以在素材视频集中，确定出4个素材视频。For case two, as shown in Figure 5, there are 5 accent beat time points; both the start time point and the end time point of the material audio are accent beat time points, so the accent beat time points effectively divide the material audio into 4 parts. Each part can correspond to one material video, so 4 material videos can be determined from the material video set.
对于情况三、如图6所示，重音节拍时间点的个数为5，素材音频的起始时间点和结束时间点均不是重音节拍时间点，则相当于重音节拍时间点将该素材音频分为6部分，那么，每部分可以对应一个素材视频，所以，可以在素材视频集中，确定出6个素材视频。For case three, as shown in Figure 6, there are 5 accent beat time points; neither the start time point nor the end time point of the material audio is an accent beat time point, so the accent beat time points effectively divide the material audio into 6 parts. Each part can correspond to one material video, so 6 material videos can be determined from the material video set.
另外,对于上述几种情况,如果素材视频集中包括的素材视频数目小于计算出需要确定出的素材视频数目,则确定出素材视频集中的所有素材视频即可。In addition, for the above several situations, if the number of material videos included in the material video set is less than the calculated number of material videos that need to be determined, then all the material videos in the material video set can be determined.
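The three cases above, together with the cap imposed by the size of the material video set, can be sketched compactly. The function name and parameters below are illustrative:

```python
def num_material_videos(accent_points, audio_start, audio_end, pool_size):
    """Number of material videos to select, per the three cases above: the
    accent beat time points partition the material audio into segments, each
    segment gets one material video, and the result is capped by the number
    of videos actually available in the material video set."""
    n = len(accent_points)
    endpoints_on_accent = sum(p in accent_points for p in (audio_start, audio_end))
    if endpoints_on_accent == 1:        # case one: N segments
        needed = n
    elif endpoints_on_accent == 2:      # case two: N-1 segments
        needed = n - 1
    else:                               # case three: N+1 segments
        needed = n + 1
    return min(needed, pool_size)

# Figure 4: the start coincides with an accent, the end does not -> 5 videos.
print(num_material_videos([0, 3, 6, 9, 12], audio_start=0, audio_end=15, pool_size=20))  # 5
```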
此处还需说明的是，对于上述确定出的N、N-1或N+1个素材视频的选取方式，在素材视频集为多个独立的素材视频的视频集合的情况下，可以在素材视频集中随机选取。在素材视频集为一个总素材视频的情况下，除随机选取外，还可以从第一个开始向后依次选取，或者从最后一个向前依次选取，又或者从第一个开始间隔选取。具体的选取方式本申请实施例不做限定。It should also be noted that, for the N, N-1, or N+1 material videos determined above, when the material video set is a collection of multiple independent material videos, they may be selected randomly from the set. When the material video set is one total material video, besides random selection, they may also be selected sequentially from the first backward, sequentially from the last forward, or at intervals starting from the first. The specific selection method is not limited in the embodiments of this application.
步骤104、基于重音节拍时间点,对多个素材视频和素材音频进行合成,得到合成视频,其中,在合成视频中各素材视频的切换时间点为音频数据的重音节拍时间点。Step 104: Based on the accent beat time point, synthesize multiple material videos and material audio to obtain a synthesized video, where the switching time point of each material video in the synthesized video is the accent beat time point of the audio data.
在实施中，终端可以随机确定出在合成视频时，各素材视频的合成顺序，对于素材视频集为一个总素材视频的情况，还可以按照素材视频在该总素材视频中位置确定。然后，再按照所述各素材视频的合成顺序，逐个获取素材视频，每获取一个素材视频，基于当前获取的素材视频和重音节拍时间点，确定当前获取的素材视频对应的子视频。再然后，按照各素材视频的合成顺序，对每个子视频进行合成，得到合成素材视频。此外，在对子视频进行合成时，可以对每个子视频添加切换特效（如渐入、淡入、弹入、百叶窗式出现等）和切换特效持续时间，其中，切换特效和切换特效持续时间可以由技术人员根据用户的普遍需求预先设置。再然后，对合成素材视频和音频数据进行合成，得到合成视频。最后，可以由视频制作应用程序对该合成视频进行自动播放。如图7所示，在视频制作应用程序的显示界面中部自动播放的即为合成视频。In implementation, the terminal may randomly determine the synthesis order of the material videos when synthesizing the video; when the material video set is one total material video, the order may also be determined according to the positions of the material videos within the total material video. Then, following the synthesis order, the material videos are obtained one by one, and each time a material video is obtained, the sub-video corresponding to it is determined based on the currently obtained material video and the accent beat time points. Next, the sub-videos are synthesized according to the synthesis order of the material videos to obtain a synthesized material video. In addition, when synthesizing the sub-videos, a switching effect (such as fade-in, dissolve, bounce-in, blinds, etc.) and a switching-effect duration may be added to each sub-video, where the switching effect and its duration may be preset by a technician according to general user needs. The synthesized material video and the audio data are then synthesized to obtain the synthesized video. Finally, the synthesized video may be played automatically by the video production application. As shown in Figure 7, what is automatically played in the middle of the display interface of the video production application is the synthesized video.
对于上述确定当前获取的素材视频对应的子视频,可以有如下几种情况。For the above determination of the sub-video corresponding to the currently acquired material video, there may be several situations as follows.
情况一、如果当前获取的素材视频的合成顺序为第一位，则确定素材音频的起始时间点到该起始时间点之后，且与该起始时间点最近的第一重音节拍时间点之间的第一时长，在素材视频中，从素材视频的起始时间点开始截取第一时长的视频为素材视频对应的第一子视频。Case one: if the currently obtained material video is first in the synthesis order, determine the first duration between the start time point of the material audio and the first accent beat time point that is after and closest to that start time point; then, in the material video, intercept a video of the first duration starting from the start time point of the material video as the first sub-video corresponding to the material video.
情况二、如果当前获取的素材视频的合成顺序不是第一位，则确定已生成的子视频的第一总时长，确定素材音频的起始时间点之后的第一总时长的第一时间点，确定第一时间点之后，且与第一时间点最近的第二重音节拍时间点。如果存在第二重音节拍时间点，则确定第一时间点与第二重音节拍时间点之间的第二时长，在素材视频中，从素材视频的起始时间点开始截取第二时长的视频为素材视频对应的第二子视频。如果不存在第二重音节拍时间点，则确定第一时间点到素材音频的结束时间点之间的第三时长，在素材视频中，从素材视频的起始时间点开始截取第三时长的视频为素材视频对应的第三子视频。Case two: if the currently obtained material video is not first in the synthesis order, determine the first total duration of the sub-videos already generated, determine the first time point that is the first total duration after the start time point of the material audio, and look for the second accent beat time point that is after and closest to the first time point. If such a second accent beat time point exists, determine the second duration between the first time point and the second accent beat time point, and, in the material video, intercept a video of the second duration starting from the start time point of the material video as the second sub-video corresponding to the material video. If no second accent beat time point exists, determine the third duration between the first time point and the end time point of the material audio, and, in the material video, intercept a video of the third duration starting from the start time point of the material video as the third sub-video corresponding to the material video.
下面对上述实现方式中的两种情况分别举例进行说明。The two cases in the foregoing implementation manners are described below with examples.
对于情况一、如图8所示，素材音频的时长为15s，素材音频的起始时间点为0:00，该起始时间点之后且与该起始时间点最近的第一重音节拍时间点为0:03，则该起始时间点0:00到该第一重音节拍时间点0:03之间的第一时长为3s，那么，可以在该素材视频中，从素材视频的起始时间点开始截取3s，作为对应的第一子视频。For case one, as shown in Figure 8, the duration of the material audio is 15s and its start time point is 0:00; the first accent beat time point after and closest to that start time point is 0:03, so the first duration between the start time point 0:00 and the first accent beat time point 0:03 is 3s. Then, in the material video, 3s may be intercepted starting from the start time point of the material video as the corresponding first sub-video.
对于情况二、如图9所示，素材音频的时长为15s，素材音频的起始时间点为0:00，结束时间点为0:15，已生成的子视频的第一总时长为13s，则素材音频的起始时间点之后第一总时长的第一时间点为0:13，在第一时间点之后，存在第二重音节拍时间点为0:14，则确定第一时间点0:13到第二重音节拍时间点0:14之间的第二时长为1s，那么，可以在该素材视频中，从素材视频的起始时间点开始截取1s，作为对应的第二子视频。如图10所示，如果不存在第二重音节拍时间点，则确定第一时间点0:13到素材音频的结束时间点0:15之间的第三时长为2s，那么，可以在该素材视频中，从素材视频的起始时间点开始截取2s，作为对应的第三子视频。For case two, as shown in Figure 9, the duration of the material audio is 15s, its start time point is 0:00, and its end time point is 0:15; the first total duration of the already generated sub-videos is 13s, so the first time point, which is the first total duration after the start time point of the material audio, is 0:13. After the first time point there is a second accent beat time point at 0:14, so the second duration between the first time point 0:13 and the second accent beat time point 0:14 is determined to be 1s. Then, in the material video, 1s may be intercepted starting from the start time point of the material video as the corresponding second sub-video. As shown in Figure 10, if no second accent beat time point exists, the third duration between the first time point 0:13 and the end time point of the material audio 0:15 is determined to be 2s. Then, in the material video, 2s may be intercepted starting from the start time point of the material video as the corresponding third sub-video.
此处还需说明的是，如果确定出子视频的总时长小于素材音频的时长，则将素材音频中多余的部分删除即可。那么，最后得到的合成视频的时长即为子视频的总时长。It should also be noted here that if the total duration of the sub-videos is determined to be less than the duration of the material audio, the excess part of the material audio can simply be deleted. The duration of the finally obtained synthesized video is then the total duration of the sub-videos.
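The sub-video length computation of cases one and two above can be sketched as a single loop: each clip runs from the current total duration of generated sub-videos to the next accent beat time point, or to the end of the material audio when no accent remains (the Figure 10 situation). This is a minimal sketch assuming time values in seconds; the function name is illustrative:

```python
def sub_video_durations(accents, audio_duration):
    """Compute the length of each sub-video so that every clip switch lands
    on an accent beat time point (cases one and two above). The stretch after
    the last accent simply runs to the end of the material audio."""
    durations, elapsed = [], 0
    while elapsed < audio_duration:
        # First accent strictly after the total duration of sub-videos so far.
        later = [t for t in accents if t > elapsed]
        nxt = min(later) if later else audio_duration  # no accent left: Fig. 10
        nxt = min(nxt, audio_duration)  # ignore accents past the audio end
        durations.append(nxt - elapsed)
        elapsed = nxt
    return durations

# 15s audio with accents at 0:03, 0:06, 0:09, 0:12 and 0:14.
print(sub_video_durations([3, 6, 9, 12, 14], audio_duration=15))  # [3, 3, 3, 3, 2, 1]
```

The last two entries, 2s and 1s, correspond to the 0:14 accent and the 0:14-to-0:15 tail in the Figure 9/Figure 10 examples.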
上述所有可选技术方案,可以采用任意结合形成本申请的可选实施例,在此不再一一赘述。All the above-mentioned optional technical solutions can be combined in any way to form optional embodiments of the present application, which will not be repeated here.
本申请实施例提供的方法，通过从服务器获取到素材视频集、素材音频和所述素材音频的重音节拍时间点。然后，在素材视频集中选取出多个素材视频，最后，基于重音节拍时间点，对多个素材视频和所述素材音频进行合成，得到合成视频。得到的合成视频中，各素材视频的切换时间点为所述音频数据的重音节拍时间点，这样，在终端可以实现自动获取素材，并自动将素材视频和素材音频进行合成，无需人工处理，效率较高。In the method provided by the embodiments of this application, the material video set, the material audio, and the accent beat time points of the material audio are obtained from the server. Multiple material videos are then selected from the material video set, and finally, based on the accent beat time points, the multiple material videos and the material audio are synthesized to obtain a synthesized video. In the resulting synthesized video, the switching time point of each material video is an accent beat time point of the audio data. In this way, the terminal can automatically obtain the materials and automatically synthesize the material videos with the material audio, without manual processing and with high efficiency.
对于用户选择的音乐均可以由对应的素材音频和服务器中存储的共用的素材视频来制作合成视频，该合成视频即可以作为该音乐对应的一个demo（试样）卡点视频以自动播放的方式展示给用户。这样，可以达到吸引用户进入上述视频制作应用程序自己制作合成视频。For any music selected by the user, a synthesized video can be produced from the corresponding material audio and the shared material videos stored on the server; this synthesized video can then serve as a demo (sample) beat-synced video for that music and be shown to the user through automatic playback. In this way, users can be attracted to enter the above video production application and produce synthesized videos themselves.
基于相同的技术构思，本申请实施例还提供了一种视频合成的装置，该装置可以为上述实施例中的终端，如图11所示，该装置包括：发送模块1110、获取模块1120、确定模块1130和合成模块1140。Based on the same technical concept, an embodiment of this application further provides a video synthesis apparatus, which may be the terminal in the foregoing embodiments. As shown in FIG. 11, the apparatus includes: a sending module 1110, an obtaining module 1120, a determining module 1130, and a synthesis module 1140.
发送模块1110,用于向服务器发送素材获取请求,其中,所述素材获取请求中携带有素材音频的特征信息;The sending module 1110 is configured to send a material acquisition request to the server, where the material acquisition request carries feature information of the material audio;
获取模块1120,用于获取服务器发送的素材视频集、素材音频和所述素材音频的重音节拍时间点;The obtaining module 1120 is configured to obtain the material video set, the material audio, and the accent beat time point of the material audio sent by the server;
确定模块1130,用于基于所述重音节拍时间点,在所述素材视频集中,确定出多个素材视频;The determining module 1130 is configured to determine multiple material videos in the material video set based on the accent beat time point;
合成模块1140，用于基于所述重音节拍时间点，对所述多个素材视频和所述素材音频进行合成，得到合成视频，其中，在所述合成视频中各素材视频的切换时间点为所述音频数据的重音节拍时间点。The synthesis module 1140 is configured to synthesize the multiple material videos and the material audio based on the accent beat time points to obtain a synthesized video, where the switching time point of each material video in the synthesized video is an accent beat time point of the audio data.
可选的,所述确定模块1130,用于:Optionally, the determining module 1130 is configured to:
基于所述重音节拍时间点的个数N、所述素材音频的起始时间点和结束时间点,在所述素材视频集中,确定出多个素材视频。Based on the number of accent beat time points N, the start time point and the end time point of the material audio, a plurality of material videos are determined in the material video set.
可选的,所述确定模块1130,用于:Optionally, the determining module 1130 is configured to:
如果所述素材音频的起始时间点和结束时间点中有一个时间点是重音节拍时间点,则在所述素材视频集中,确定出N个素材视频;If one of the start time point and the end time point of the material audio is the accent beat time point, determine N material videos in the material video set;
如果所述素材音频的起始时间点和结束时间点均是重音节拍时间点,则在所述素材视频集中,确定出N-1个素材视频;If the start time point and the end time point of the material audio are both accent beat time points, determine N-1 material videos in the material video set;
如果所述素材音频的起始时间点和结束时间点均不是重音节拍时间点,则在所述素材视频集中,确定出N+1个素材视频。If both the start time point and the end time point of the material audio are not accent beat time points, then N+1 material videos are determined in the material video set.
可选的,所述合成模块1140,用于:Optionally, the synthesis module 1140 is configured to:
确定在合成视频时各素材视频的合成顺序;Determine the synthesis order of each material video when synthesizing the video;
按照所述各素材视频的合成顺序,逐个获取素材视频,每获取一个素材视频,基于当前获取的素材视频和所述重音节拍时间点,确定所述当前获取的素材视频对应的子视频;Obtain the material videos one by one according to the synthesis order of the material videos, and for each material video obtained, determine the sub-video corresponding to the currently obtained material video based on the currently obtained material video and the accent beat time point;
基于所述合成顺序,对每个子视频进行合成,得到合成素材视频,对所述合成素材视频和所述素材音频进行合成,得到合成视频。Based on the synthesis sequence, each sub-video is synthesized to obtain a synthesized material video, and the synthesized material video and the material audio are synthesized to obtain a synthesized video.
可选的,所述素材视频集为由多个素材视频拼接成的一个总素材视频。Optionally, the material video set is a total material video spliced into a plurality of material videos.
可选的,所述素材视频集为包括有多个独立的素材视频的视频集合。Optionally, the material video set is a video set including multiple independent material videos.
可选的,获取模块1120,用于:Optionally, the obtaining module 1120 is used to:
接收服务器发送的素材视频集、原始素材音频、所述原始素材音频的重音节拍时间点和预设剪切时间点;Receiving the material video set, the original material audio, the accent beat time point of the original material audio and the preset cutting time point sent by the server;
基于所述预设剪切时间点和预设剪切时长,对所述原始素材音频进行剪切,得到用于合成视频的素材音频;Cutting the original material audio based on the preset cutting time point and the preset cutting time length to obtain material audio for synthesizing video;
在所述原始素材音频的重音节拍时间点中，确定出所述用于合成视频的素材音频的重音节拍时间点。Among the accent beat time points of the original material audio, the accent beat time points of the material audio used for synthesizing the video are determined.
需要说明的是:上述实施例提供的视频合成的装置在合成视频时,仅以上述各功能模块的划分进行举例说明,实际应用中,可以根据需要而将上述功能分配由不同的功能模块完成,即将终端的内部结构划分成不同的功能模块,以完成以上描述的全部或者部分功能。另外,上述实施例提供的视频合成的装置与视频合成的方法实施例属于同一构思,其具体实现过程详见方法实施例,这里不再赘述。It should be noted that the video synthesis device provided in the above embodiment only uses the division of the above functional modules for illustration when synthesizing videos. In actual applications, the above functions can be allocated by different functional modules as needed. That is, the internal structure of the terminal is divided into different functional modules to complete all or part of the functions described above. In addition, the video synthesis device provided in the foregoing embodiment and the video synthesis method embodiment belong to the same concept, and the specific implementation process is detailed in the method embodiment, which will not be repeated here.
图12示出了本申请一个示例性实施例提供的终端1200的结构框图。该终端1200可以是:智能手机、平板电脑、MP3播放器(Moving Picture Experts Group Audio Layer III,动态影像专家压缩标准音频层面3)、MP4(Moving Picture Experts Group Audio Layer IV,动态影像专家压缩标准音频层面4)播放器、笔记本电脑或台式电脑。终端1200还可能被称为用户设备、便携式终端、膝上型终端、台式终端等其他名称。FIG. 12 shows a structural block diagram of a terminal 1200 provided by an exemplary embodiment of the present application. The terminal 1200 can be: a smart phone, a tablet computer, an MP3 player (Moving Picture Experts Group Audio Layer III, moving picture expert compression standard audio layer 3), MP4 (Moving Picture Experts Group Audio Layer IV, moving picture expert compressing standard audio Level 4) Player, laptop or desktop computer. The terminal 1200 may also be called user equipment, portable terminal, laptop terminal, desktop terminal and other names.
通常,终端1200包括有:处理器1201和存储器1202。Generally, the terminal 1200 includes a processor 1201 and a memory 1202.
处理器1201可以包括一个或多个处理核心，比如4核心处理器、8核心处理器等。处理器1201可以采用DSP(Digital Signal Processing,数字信号处理)、FPGA(Field-Programmable Gate Array,现场可编程门阵列)、PLA(Programmable Logic Array,可编程逻辑阵列)中的至少一种硬件形式来实现。处理器1201也可以包括主处理器和协处理器，主处理器是用于对在唤醒状态下的数据进行处理的处理器，也称CPU(Central Processing Unit,中央处理器)；协处理器是用于对在待机状态下的数据进行处理的低功耗处理器。在一些实施例中，处理器1201可以集成有GPU(Graphics Processing Unit,图像处理器)，GPU用于负责显示屏所需要显示的内容的渲染和绘制。一些实施例中，处理器1201还可以包括AI(Artificial Intelligence,人工智能)处理器，该AI处理器用于处理有关机器学习的计算操作。The processor 1201 may include one or more processing cores, such as a 4-core processor, an 8-core processor, and so on. The processor 1201 may be implemented in at least one hardware form among DSP (Digital Signal Processing), FPGA (Field-Programmable Gate Array), and PLA (Programmable Logic Array). The processor 1201 may also include a main processor and a coprocessor. The main processor is a processor used to process data in the awake state, also called a CPU (Central Processing Unit); the coprocessor is a low-power processor used to process data in the standby state. In some embodiments, the processor 1201 may be integrated with a GPU (Graphics Processing Unit), and the GPU is responsible for rendering and drawing the content that needs to be displayed on the display screen. In some embodiments, the processor 1201 may also include an AI (Artificial Intelligence) processor, which is used to process computing operations related to machine learning.
存储器1202可以包括一个或多个计算机可读存储介质,该计算机可读存储介质可以是非暂态的。存储器1202还可包括高速随机存取存储器,以及非易失性存储器,比如一个或多个磁盘存储设备、闪存存储设备。在一些实施例中,存储器1202中的非暂态的计算机可读存储介质用于存储至少一个指令,该至少 一个指令用于被处理器1201所执行以实现本申请中方法实施例提供的视频合成的方法。The memory 1202 may include one or more computer-readable storage media, which may be non-transitory. The memory 1202 may also include high-speed random access memory and non-volatile memory, such as one or more magnetic disk storage devices and flash memory storage devices. In some embodiments, the non-transitory computer-readable storage medium in the memory 1202 is used to store at least one instruction, and the at least one instruction is used to be executed by the processor 1201 to implement the video synthesis provided in the method embodiment of the present application. Methods.
在一些实施例中,终端1200还可以包括:外围设备接口1203和至少一个外围设备。处理器1201、存储器1202和外围设备接口1203之间可以通过总线或信号线相连。各个外围设备可以通过总线、信号线或电路板与外围设备接口1203相连。具体地,外围设备包括:射频电路1204、触摸显示屏1205、摄像头1206、音频电路1207、定位组件1208和电源1209中的至少一种。In some embodiments, the terminal 1200 may further include: a peripheral device interface 1203 and at least one peripheral device. The processor 1201, the memory 1202, and the peripheral device interface 1203 may be connected by a bus or a signal line. Each peripheral device can be connected to the peripheral device interface 1203 through a bus, a signal line or a circuit board. Specifically, the peripheral device includes: at least one of a radio frequency circuit 1204, a touch display screen 1205, a camera 1206, an audio circuit 1207, a positioning component 1208, and a power supply 1209.
外围设备接口1203可被用于将I/O(Input/Output,输入/输出)相关的至少一个外围设备连接到处理器1201和存储器1202。在一些实施例中,处理器1201、存储器1202和外围设备接口1203被集成在同一芯片或电路板上;在一些其他实施例中,处理器1201、存储器1202和外围设备接口1203中的任意一个或两个可以在单独的芯片或电路板上实现,本实施例对此不加以限定。The peripheral device interface 1203 may be used to connect at least one peripheral device related to I/O (Input/Output) to the processor 1201 and the memory 1202. In some embodiments, the processor 1201, the memory 1202, and the peripheral device interface 1203 are integrated on the same chip or circuit board; in some other embodiments, any one of the processor 1201, the memory 1202, and the peripheral device interface 1203 or The two can be implemented on separate chips or circuit boards, which are not limited in this embodiment.
射频电路1204用于接收和发射RF(Radio Frequency,射频)信号,也称电磁信号。射频电路1204通过电磁信号与通信网络以及其他通信设备进行通信。射频电路1204将电信号转换为电磁信号进行发送,或者,将接收到的电磁信号转换为电信号。可选地,射频电路1204包括:天线系统、RF收发器、一个或多个放大器、调谐器、振荡器、数字信号处理器、编解码芯片组、用户身份模块卡等等。射频电路1204可以通过至少一种无线通信协议来与其它终端进行通信。该无线通信协议包括但不限于:城域网、各代移动通信网络(2G、3G、4G及5G)、无线局域网和/或WiFi(Wireless Fidelity,无线保真)网络。在一些实施例中,射频电路1204还可以包括NFC(Near Field Communication,近距离无线通信)有关的电路,本申请对此不加以限定。The radio frequency circuit 1204 is used for receiving and transmitting RF (Radio Frequency, radio frequency) signals, also called electromagnetic signals. The radio frequency circuit 1204 communicates with a communication network and other communication devices through electromagnetic signals. The radio frequency circuit 1204 converts electrical signals into electromagnetic signals for transmission, or converts received electromagnetic signals into electrical signals. Optionally, the radio frequency circuit 1204 includes: an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a user identity module card, and so on. The radio frequency circuit 1204 can communicate with other terminals through at least one wireless communication protocol. The wireless communication protocol includes but is not limited to: metropolitan area network, various generations of mobile communication networks (2G, 3G, 4G, and 5G), wireless local area network and/or WiFi (Wireless Fidelity, wireless fidelity) network. In some embodiments, the radio frequency circuit 1204 may also include NFC (Near Field Communication) related circuits, which is not limited in this application.
显示屏1205用于显示UI(User Interface,用户界面)。该UI可以包括图形、文本、图标、视频及其它们的任意组合。当显示屏1205是触摸显示屏时,显示屏1205还具有采集在显示屏1205的表面或表面上方的触摸信号的能力。该触摸信号可以作为控制信号输入至处理器1201进行处理。此时,显示屏1205还可以用于提供虚拟按钮和/或虚拟键盘,也称软按钮和/或软键盘。在一些实施例中,显示屏1205可以为一个,设置终端1200的前面板;在另一些实施例中,显示屏1205可以为至少两个,分别设置在终端1200的不同表面或呈折叠设计;在再一些实施例中,显示屏1205可以是柔性显示屏,设置在终端1200的弯曲表面上或折叠面上。甚至,显示屏1205还可以设置成非矩形的不规则图形,也 即异形屏。显示屏1205可以采用LCD(Liquid Crystal Display,液晶显示屏)、OLED(Organic Light-Emitting Diode,有机发光二极管)等材质制备。The display screen 1205 is used to display a UI (User Interface, user interface). The UI can include graphics, text, icons, videos, and any combination thereof. When the display screen 1205 is a touch display screen, the display screen 1205 also has the ability to collect touch signals on or above the surface of the display screen 1205. The touch signal may be input to the processor 1201 as a control signal for processing. At this time, the display screen 1205 may also be used to provide virtual buttons and/or virtual keyboards, also called soft buttons and/or soft keyboards. In some embodiments, there may be one display screen 1205, which is provided with the front panel of the terminal 1200; in other embodiments, there may be at least two display screens 1205, which are respectively arranged on different surfaces of the terminal 1200 or in a folded design; In still other embodiments, the display screen 1205 may be a flexible display screen, which is disposed on the curved surface or the folding surface of the terminal 1200. Furthermore, the display screen 1205 can also be set as a non-rectangular irregular pattern, that is, a special-shaped screen. The display screen 1205 may be made of materials such as LCD (Liquid Crystal Display) and OLED (Organic Light-Emitting Diode).
摄像头组件1206用于采集图像或视频。可选地,摄像头组件1206包括前置摄像头和后置摄像头。通常,前置摄像头设置在终端的前面板,后置摄像头设置在终端的背面。在一些实施例中,后置摄像头为至少两个,分别为主摄像头、景深摄像头、广角摄像头、长焦摄像头中的任意一种,以实现主摄像头和景深摄像头融合实现背景虚化功能、主摄像头和广角摄像头融合实现全景拍摄以及VR(Virtual Reality,虚拟现实)拍摄功能或者其它融合拍摄功能。在一些实施例中,摄像头组件1206还可以包括闪光灯。闪光灯可以是单色温闪光灯,也可以是双色温闪光灯。双色温闪光灯是指暖光闪光灯和冷光闪光灯的组合,可以用于不同色温下的光线补偿。The camera assembly 1206 is used to capture images or videos. Optionally, the camera assembly 1206 includes a front camera and a rear camera. Generally, the front camera is set on the front panel of the terminal, and the rear camera is set on the back of the terminal. In some embodiments, there are at least two rear cameras, each of which is a main camera, a depth-of-field camera, a wide-angle camera, and a telephoto camera, to realize the fusion of the main camera and the depth-of-field camera to realize the background blur function, Integrate with the wide-angle camera to realize panoramic shooting and VR (Virtual Reality) shooting functions or other fusion shooting functions. In some embodiments, the camera assembly 1206 may also include a flash. The flash can be a single-color flash or a dual-color flash. Dual color temperature flash refers to the combination of warm light flash and cold light flash, which can be used for light compensation under different color temperatures.
The audio circuit 1207 may include a microphone and a speaker. The microphone collects sound waves from the user and the environment and converts them into electrical signals that are input to the processor 1201 for processing, or to the radio frequency circuit 1204 for voice communication. For stereo capture or noise reduction, there may be multiple microphones, arranged at different parts of the terminal 1200. The microphone may also be an array microphone or an omnidirectional microphone. The speaker converts electrical signals from the processor 1201 or the radio frequency circuit 1204 into sound waves. The speaker may be a traditional membrane speaker or a piezoelectric ceramic speaker. A piezoelectric ceramic speaker can convert electrical signals not only into sound waves audible to humans, but also into sound waves inaudible to humans for purposes such as distance measurement. In some embodiments, the audio circuit 1207 may also include a headphone jack.
The positioning component 1208 is used to determine the current geographic location of the terminal 1200 to implement navigation or LBS (Location Based Service). The positioning component 1208 may be based on the GPS (Global Positioning System) of the United States, the BeiDou system of China, the GLONASS system of Russia, or the Galileo system of the European Union.
The power supply 1209 supplies power to the components in the terminal 1200. The power supply 1209 may be alternating current, direct current, a disposable battery, or a rechargeable battery. When the power supply 1209 includes a rechargeable battery, the battery may support wired or wireless charging, and may also support fast-charging technology.
In some embodiments, the terminal 1200 further includes one or more sensors 1210, including but not limited to: an acceleration sensor 1211, a gyroscope sensor 1212, a pressure sensor 1213, a fingerprint sensor 1214, an optical sensor 1215, and a proximity sensor 1216.
The acceleration sensor 1211 can detect the magnitude of acceleration on the three axes of the coordinate system established by the terminal 1200. For example, the acceleration sensor 1211 can detect the components of gravitational acceleration along the three axes. The processor 1201 may, according to the gravitational acceleration signal collected by the acceleration sensor 1211, control the touch display screen 1205 to display the user interface in landscape or portrait view. The acceleration sensor 1211 can also be used to collect motion data for games or for the user.
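By way of illustration only (this is not part of the disclosure), the landscape/portrait decision described above can be sketched as a comparison of the gravity components along the device's short and long edges. The function name and axis convention are assumptions:

```python
def choose_orientation(gx, gy, gz):
    """Pick a UI orientation from gravity components (m/s^2) along the
    device's x axis (short edge), y axis (long edge), and z axis.

    Minimal sketch: if gravity acts mostly along the long edge, the
    device is held upright (portrait); otherwise it is held sideways
    (landscape). Real implementations add hysteresis and thresholds.
    """
    return "portrait" if abs(gy) >= abs(gx) else "landscape"
```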
The gyroscope sensor 1212 can detect the body orientation and rotation angle of the terminal 1200, and can cooperate with the acceleration sensor 1211 to capture the user's 3D actions on the terminal 1200. Based on the data collected by the gyroscope sensor 1212, the processor 1201 can implement functions such as motion sensing (for example, changing the UI according to the user's tilt operation), image stabilization during shooting, game control, and inertial navigation.
The pressure sensor 1213 may be arranged on the side frame of the terminal 1200 and/or under the touch display screen 1205. When the pressure sensor 1213 is arranged on the side frame, it can detect the user's grip on the terminal 1200, and the processor 1201 can perform left/right-hand recognition or shortcut operations according to the grip signal collected by the pressure sensor 1213. When the pressure sensor 1213 is arranged under the touch display screen 1205, the processor 1201 controls the operable controls on the UI according to the user's pressure operations on the touch display screen 1205. The operable controls include at least one of a button control, a scroll bar control, an icon control, and a menu control.
The fingerprint sensor 1214 collects the user's fingerprint; either the processor 1201 identifies the user from the fingerprint collected by the fingerprint sensor 1214, or the fingerprint sensor 1214 itself identifies the user from the collected fingerprint. When the user's identity is recognized as trusted, the processor 1201 authorizes the user to perform related sensitive operations, including unlocking the screen, viewing encrypted information, downloading software, making payments, and changing settings. The fingerprint sensor 1214 may be arranged on the front, back, or side of the terminal 1200. When the terminal 1200 has a physical button or manufacturer logo, the fingerprint sensor 1214 can be integrated with the physical button or manufacturer logo.
The optical sensor 1215 collects the ambient light intensity. In one embodiment, the processor 1201 may control the display brightness of the touch display screen 1205 according to the ambient light intensity collected by the optical sensor 1215: when the ambient light intensity is high, the display brightness of the touch display screen 1205 is increased; when the ambient light intensity is low, the display brightness is decreased. In another embodiment, the processor 1201 may also dynamically adjust the shooting parameters of the camera assembly 1206 according to the ambient light intensity collected by the optical sensor 1215.
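As a purely illustrative sketch (not part of the disclosure), the brightness control described above can be modeled as a mapping from ambient lux to a brightness level; the linear curve and the parameter values are assumptions, and real devices use tuned, often nonlinear, response curves:

```python
def adjust_brightness(ambient_lux, min_level=10, max_level=255, max_lux=400.0):
    """Map ambient light intensity to a display brightness level.

    Brighter surroundings yield a higher level, clamped to the
    [min_level, max_level] range once ambient_lux exceeds max_lux.
    """
    ratio = min(ambient_lux / max_lux, 1.0)
    return int(min_level + ratio * (max_level - min_level))
```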
The proximity sensor 1216, also called a distance sensor, is usually arranged on the front panel of the terminal 1200 and collects the distance between the user and the front of the terminal 1200. In one embodiment, when the proximity sensor 1216 detects that the distance between the user and the front of the terminal 1200 is gradually decreasing, the processor 1201 controls the touch display screen 1205 to switch from the screen-on state to the screen-off state; when the proximity sensor 1216 detects that the distance is gradually increasing, the processor 1201 controls the touch display screen 1205 to switch from the screen-off state back to the screen-on state.
Those skilled in the art can understand that the structure shown in FIG. 12 does not constitute a limitation on the terminal 1200, which may include more or fewer components than shown, combine certain components, or adopt a different component arrangement.
In an exemplary embodiment, a computer-readable storage medium is also provided, comprising a memory that stores instructions executable by a processor in the terminal to perform the video synthesis method of the foregoing embodiments. The computer-readable storage medium may be non-transitory, for example a ROM (Read-Only Memory), RAM (Random Access Memory), CD-ROM, magnetic tape, floppy disk, or optical data storage device.
Those of ordinary skill in the art can understand that all or part of the steps of the foregoing embodiments can be implemented by hardware, or by a program instructing the relevant hardware; the program can be stored in a computer-readable storage medium, which may be a read-only memory, a magnetic disk, an optical disc, or the like.
The above are only preferred embodiments of this application and are not intended to limit this application. Any modification, equivalent replacement, or improvement made within the spirit and principles of this application shall fall within the protection scope of this application.

Claims (11)

  1. A video synthesis method, characterized in that the method comprises:
    sending a material acquisition request to a server, the material acquisition request carrying feature information of material audio;
    acquiring a material video set, the material audio, and accent beat time points of the material audio sent by the server;
    determining multiple material videos in the material video set based on the accent beat time points;
    synthesizing the multiple material videos and the material audio based on the accent beat time points to obtain a synthesized video, wherein the switching time point of each material video in the synthesized video is an accent beat time point of the material audio.
  2. The method according to claim 1, characterized in that determining multiple material videos in the material video set based on the accent beat time points comprises:
    determining multiple material videos in the material video set based on the number N of accent beat time points and the start time point and end time point of the material audio.
  3. The method according to claim 2, characterized in that determining multiple material videos in the material video set based on the number N of accent beat time points and the start time point and end time point of the material audio comprises:
    if exactly one of the start time point and the end time point of the material audio is an accent beat time point, determining N material videos in the material video set;
    if both the start time point and the end time point of the material audio are accent beat time points, determining N-1 material videos in the material video set;
    if neither the start time point nor the end time point of the material audio is an accent beat time point, determining N+1 material videos in the material video set.
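For illustration only (not part of the claims), the selection rule of claim 3 amounts to counting the segments between consecutive cut points, where the cut points are the accent beats plus whichever of the audio's start and end points is not itself a beat. A minimal sketch, with the function name and argument layout as assumptions:

```python
def count_material_videos(accent_times, start, end):
    """Number of material videos to select per claim 3.

    accent_times: the N accent beat time points of the material audio.
    Returns N if exactly one of start/end is an accent beat,
    N-1 if both are, and N+1 if neither is.
    """
    n = len(accent_times)
    boundary_hits = sum(1 for t in (start, end) if t in accent_times)
    if boundary_hits == 1:
        return n
    if boundary_hits == 2:
        return n - 1
    return n + 1
```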
  4. The method according to claim 1, characterized in that synthesizing the multiple material videos and the material audio based on the accent beat time points to obtain a synthesized video comprises:
    determining a synthesis order of the material videos;
    acquiring the material videos one by one in the synthesis order and, for each acquired material video, determining the sub-video corresponding to the currently acquired material video based on that material video and the accent beat time points;
    synthesizing the sub-videos in the synthesis order to obtain a synthesized material video, and synthesizing the synthesized material video and the material audio to obtain the synthesized video.
  5. The method according to claim 4, characterized in that determining the sub-video corresponding to the currently acquired material video based on the currently acquired material video and the accent beat time points comprises:
    if the currently acquired material video is first in the synthesis order, determining a first duration from the start time point of the material audio to the first accent beat time point after and closest to the start time point, and clipping, from the start time point of the material video, a segment of the first duration as the first sub-video corresponding to the material video;
    if the currently acquired material video is not first in the synthesis order, determining a first total duration of the sub-videos already generated, determining a first time point located the first total duration after the start time point of the material audio, and determining a second accent beat time point after and closest to the first time point;
    if the second accent beat time point exists, determining a second duration from the first time point to the second accent beat time point, and clipping, from the start time point of the material video, a segment of the second duration as the second sub-video corresponding to the material video;
    if the second accent beat time point does not exist, determining a third duration from the first time point to the end time point of the material audio, and clipping, from the start time point of the material video, a segment of the third duration as the third sub-video corresponding to the material video.
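Purely as an illustration of the logic in claims 4-5 (not part of the claims), each clip runs from the current playback position to the next accent beat, or to the audio's end when no beat remains, so every video switch lands on an accent beat. A sketch under that reading; names and the return format are assumptions:

```python
import bisect

def subvideo_durations(accent_times, start, end, num_videos):
    """Duration to clip from each material video, in synthesis order.

    The first clip spans from the audio start to the first accent beat
    after it; each later clip spans from the accumulated position to the
    next accent beat, falling back to the audio end point.
    """
    beats = sorted(t for t in accent_times if start < t <= end)
    durations = []
    position = start  # audio start plus total duration of clips so far
    for _ in range(num_videos):
        i = bisect.bisect_right(beats, position)
        nxt = beats[i] if i < len(beats) else end  # next beat, or audio end
        durations.append(nxt - position)
        position = nxt
    return durations
```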
  6. The method according to any one of claims 1-5, characterized in that the material video set is a single overall material video spliced from multiple material videos.
  7. The method according to any one of claims 1-5, characterized in that the material video set is a video collection comprising multiple independent material videos.
  8. The method according to any one of claims 1-5, characterized in that acquiring the material video set, the material audio, and the accent beat time points of the material audio sent by the server comprises:
    receiving the material video set, original material audio, accent beat time points of the original material audio, and a preset cutting time point sent by the server;
    cutting the original material audio based on the preset cutting time point and a preset cutting duration to obtain the material audio used for video synthesis;
    determining, among the accent beat time points of the original material audio, the accent beat time points of the material audio used for video synthesis.
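For illustration only (not part of the claims), claim 8's cutting step keeps the beat time points that fall inside the cut window and re-bases them to the clip's own timeline. A sketch under that reading; the function name, return shape, and clamping to the audio duration are assumptions:

```python
def cut_material_audio(beat_times, cut_start, cut_length, audio_duration):
    """Clip the original material audio and filter its accent beats.

    Returns (clip_duration, kept_beats), where kept_beats are the
    accent beat time points inside [cut_start, cut_end], shifted so
    that the clip starts at time 0.
    """
    cut_end = min(cut_start + cut_length, audio_duration)
    kept = [t - cut_start for t in beat_times if cut_start <= t <= cut_end]
    return cut_end - cut_start, kept
```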
  9. A video synthesis apparatus, characterized in that the apparatus comprises:
    a sending module, configured to send a material acquisition request to a server, the material acquisition request carrying feature information of material audio;
    an acquiring module, configured to acquire a material video set, the material audio, and accent beat time points of the material audio sent by the server;
    a determining module, configured to determine multiple material videos in the material video set based on the accent beat time points;
    a synthesis module, configured to synthesize the multiple material videos and the material audio based on the accent beat time points to obtain a synthesized video, wherein the switching time point of each material video in the synthesized video is an accent beat time point of the material audio.
  10. A terminal, characterized in that the terminal comprises a processor and a memory, the memory storing at least one instruction that is loaded and executed by the processor to implement the operations performed by the video synthesis method according to any one of claims 1 to 8.
  11. A computer-readable storage medium, characterized in that the computer-readable storage medium stores at least one instruction that is loaded and executed by a processor to implement the operations performed by the video synthesis method according to any one of claims 1 to 8.
PCT/CN2019/120302 2019-07-17 2019-11-22 Video synthesis method and apparatus, and terminal and storage medium WO2021008055A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910647507.9 2019-07-17
CN201910647507.9A CN110336960B (en) 2019-07-17 2019-07-17 Video synthesis method, device, terminal and storage medium

Publications (1)

Publication Number Publication Date
WO2021008055A1 true WO2021008055A1 (en) 2021-01-21

Family

ID=68145712

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/120302 WO2021008055A1 (en) 2019-07-17 2019-11-22 Video synthesis method and apparatus, and terminal and storage medium

Country Status (2)

Country Link
CN (1) CN110336960B (en)
WO (1) WO2021008055A1 (en)


Families Citing this family (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112235631B (en) * 2019-07-15 2022-05-03 北京字节跳动网络技术有限公司 Video processing method and device, electronic equipment and storage medium
CN110336960B (en) * 2019-07-17 2021-12-10 广州酷狗计算机科技有限公司 Video synthesis method, device, terminal and storage medium
CN110519638B (en) * 2019-09-06 2023-05-16 Oppo广东移动通信有限公司 Processing method, processing device, electronic device, and storage medium
CN110677711B (en) * 2019-10-17 2022-03-01 北京字节跳动网络技术有限公司 Video dubbing method and device, electronic equipment and computer readable medium
CN110797055B (en) * 2019-10-29 2021-09-03 北京达佳互联信息技术有限公司 Multimedia resource synthesis method and device, electronic equipment and storage medium
CN110769309B (en) * 2019-11-04 2023-03-31 北京字节跳动网络技术有限公司 Method, device, electronic equipment and medium for displaying music points
CN112822563A (en) 2019-11-15 2021-05-18 北京字节跳动网络技术有限公司 Method, device, electronic equipment and computer readable medium for generating video
CN112822541B (en) 2019-11-18 2022-05-20 北京字节跳动网络技术有限公司 Video generation method and device, electronic equipment and computer readable medium
CN111064992A (en) * 2019-12-10 2020-04-24 懂频智能科技(上海)有限公司 Method for automatically switching video contents according to music beats
CN110933487B (en) * 2019-12-18 2022-05-03 北京百度网讯科技有限公司 Method, device and equipment for generating click video and storage medium
CN111065001B (en) * 2019-12-25 2022-03-22 广州酷狗计算机科技有限公司 Video production method, device, equipment and storage medium
CN111031394B (en) * 2019-12-30 2022-03-22 广州酷狗计算机科技有限公司 Video production method, device, equipment and storage medium
CN111625682B (en) * 2020-04-30 2023-10-20 腾讯音乐娱乐科技(深圳)有限公司 Video generation method, device, computer equipment and storage medium
CN111741365B (en) * 2020-05-15 2021-10-26 广州小迈网络科技有限公司 Video composition data processing method, system, device and storage medium
CN111970571B (en) * 2020-08-24 2022-07-26 北京字节跳动网络技术有限公司 Video production method, device, equipment and storage medium
CN112153463B (en) * 2020-09-04 2023-06-16 上海七牛信息技术有限公司 Multi-material video synthesis method and device, electronic equipment and storage medium
CN112435687A (en) * 2020-11-25 2021-03-02 腾讯科技(深圳)有限公司 Audio detection method and device, computer equipment and readable storage medium
CN112866584B (en) * 2020-12-31 2023-01-20 北京达佳互联信息技术有限公司 Video synthesis method, device, terminal and storage medium
CN113014959B (en) * 2021-03-15 2022-08-09 福建省捷盛网络科技有限公司 Internet short video merging system
CN115695899A (en) * 2021-07-23 2023-02-03 花瓣云科技有限公司 Video generation method, electronic device and medium thereof
CN113676772B (en) * 2021-08-16 2023-08-08 上海哔哩哔哩科技有限公司 Video generation method and device
WO2023051245A1 (en) * 2021-09-29 2023-04-06 北京字跳网络技术有限公司 Video processing method and apparatus, and device and storage medium
CN113923378B (en) * 2021-09-29 2024-03-19 北京字跳网络技术有限公司 Video processing method, device, equipment and storage medium

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101421707A (en) * 2006-04-13 2009-04-29 Immersion Corporation System and method for automatically producing haptic events from a digital audio signal
CN101640057A (en) * 2009-05-31 2010-02-03 北京中星微电子有限公司 Audio and video matching method and device therefor
US20100220197A1 (en) * 2009-03-02 2010-09-02 John Nicholas Dukellis Assisted Video Creation Utilizing a Camera
CN102117638A (en) * 2009-12-30 2011-07-06 北京华旗随身数码股份有限公司 Method for outputting video under control of music rhythm and playing device
CN107770457A (en) * 2017-10-27 2018-03-06 维沃移动通信有限公司 A kind of video creating method and mobile terminal
CN108124101A (en) * 2017-12-18 2018-06-05 北京奇虎科技有限公司 Video capture method, device, electronic equipment and computer readable storage medium
CN108259983A (en) * 2017-12-29 2018-07-06 广州市百果园信息技术有限公司 A kind of method of video image processing, computer readable storage medium and terminal
CN109413342A (en) * 2018-12-21 2019-03-01 广州酷狗计算机科技有限公司 Audio/video processing method, device, terminal and storage medium
CN110336960A (en) * 2019-07-17 2019-10-15 广州酷狗计算机科技有限公司 Method, apparatus, terminal and the storage medium of Video Composition

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2001313915A (en) * 2000-04-28 2001-11-09 Matsushita Electric Ind Co Ltd Video conference equipment
CN107483843B (en) * 2017-08-16 2019-11-15 成都品果科技有限公司 Audio-video matches clipping method and device
CN107770626B (en) * 2017-11-06 2020-03-17 腾讯科技(深圳)有限公司 Video material processing method, video synthesizing device and storage medium


Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113613061A (en) * 2021-07-06 2021-11-05 北京达佳互联信息技术有限公司 Checkpoint template generation method, checkpoint template generation device, checkpoint template generation equipment and storage medium
CN113727038A (en) * 2021-07-28 2021-11-30 北京达佳互联信息技术有限公司 Video processing method and device, electronic equipment and storage medium
CN113727038B (en) * 2021-07-28 2023-09-05 北京达佳互联信息技术有限公司 Video processing method and device, electronic equipment and storage medium
CN114286164A (en) * 2021-12-28 2022-04-05 北京思明启创科技有限公司 Video synthesis method and device, electronic equipment and storage medium
CN114286164B (en) * 2021-12-28 2024-02-09 北京思明启创科技有限公司 Video synthesis method and device, electronic equipment and storage medium
CN114390356A (en) * 2022-01-19 2022-04-22 维沃移动通信有限公司 Video processing method, video processing device and electronic equipment

Also Published As

Publication number Publication date
CN110336960B (en) 2021-12-10
CN110336960A (en) 2019-10-15

Similar Documents

Publication Publication Date Title
WO2021008055A1 (en) Video synthesis method and apparatus, and terminal and storage medium
WO2020253096A1 (en) Method and apparatus for video synthesis, terminal and storage medium
CN110267067B (en) Live broadcast room recommendation method, device, equipment and storage medium
US11632584B2 (en) Video switching during music playback
CN111065001B (en) Video production method, device, equipment and storage medium
CN110491358B (en) Method, device, equipment, system and storage medium for audio recording
CN111918090B (en) Live broadcast picture display method and device, terminal and storage medium
CN111142838B (en) Audio playing method, device, computer equipment and storage medium
CN111061405B (en) Method, device and equipment for recording song audio and storage medium
WO2021068903A1 (en) Method for determining volume adjustment ratio information, apparatus, device and storage medium
CN109982129B (en) Short video playing control method and device and storage medium
WO2021139535A1 (en) Method, apparatus and system for playing audio, and device and storage medium
EP3618055B1 (en) Audio mixing method and terminal, and storage medium
WO2022095465A1 (en) Information display method and apparatus
WO2023011050A1 (en) Method and system for performing microphone-connection chorusing, and device and storage medium
CN110798327B (en) Message processing method, device and storage medium
CN109743461B (en) Audio data processing method, device, terminal and storage medium
CN111818358A (en) Audio file playing method and device, terminal and storage medium
WO2022227581A1 (en) Resource display method and computer device
CN112822544B (en) Video material file generation method, video synthesis method, device and medium
WO2020244516A1 (en) Online interaction method and device
CN111031394B (en) Video production method, device, equipment and storage medium
CN112616082A (en) Video preview method, device, terminal and storage medium
CN110868642B (en) Video playing method, device and storage medium
CN112118482A (en) Audio file playing method and device, terminal and storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19937614

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19937614

Country of ref document: EP

Kind code of ref document: A1