CN110677711A

CN110677711A - Video dubbing method and device, electronic equipment and computer readable medium

Info

Publication number: CN110677711A
Application number: CN201910989539.7A
Authority: CN
Inventors: 刘正阳
Original assignee: Beijing ByteDance Network Technology Co Ltd
Current assignee: Douyin Vision Co Ltd; Douyin Vision Beijing Co Ltd
Priority date: 2019-10-17
Filing date: 2019-10-17
Publication date: 2020-01-10
Anticipated expiration: 2039-10-17
Also published as: CN110677711B

Abstract

The disclosure provides a video dubbing method and device, electronic equipment and a computer readable storage medium, and relates to the technical field of audio processing. The method comprises the following steps: acquiring a video to be dubbed music and an audio used for dubbing music; respectively acquiring a video segmentation point of a video and an audio segmentation point of an audio; dividing the video into video segments according to the video segmentation points; dividing the audio into audio segments with the same number as the video segments according to the audio segmentation points; adjusting the playing speed of each video clip or the playing speed of each audio clip to ensure that the playing time lengths of each video clip and each audio clip are the same in one-to-one correspondence according to the playing sequence; and connecting the adjusted video clips according to a playing sequence to obtain a target video, connecting the adjusted audio clips according to the playing sequence to obtain a target audio, and carrying out combined playing on the target video and the target audio. The video picture characteristics and the music rhythm characteristics in the audio can be automatically and effectively combined, and the watching immersion feeling of a user is improved.

Description

Video dubbing method and device, electronic equipment and computer readable medium

Technical Field

The present disclosure relates to the field of video processing technologies, and in particular, to a video dubbing method and apparatus, an electronic device, and a computer-readable medium.

Background

When the video is played, the background music has an indispensable function, and when the picture characteristics and the music characteristics in the video are effectively combined, a user watching the video can feel the atmosphere in the video and have an experience of being personally on the scene.

Generally, when a user sets background music for a video, the user often directly selects a piece of background music with audio set as the video, or simply adjusts the playing speed of the audio or the video, and picture features in the edited video and rhythm points of the music cannot be effectively combined, so that the video has poor dubbing effect.

Disclosure of Invention

This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the detailed description. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.

In a first aspect, a video dubbing method is provided, which includes:

acquiring a video to be dubbed music and an audio used for dubbing music;

respectively acquiring a video segmentation point of the video and an audio segmentation point of the audio;

dividing the video into at least two video segments according to the video segmentation point;

dividing the audio into at least two audio segments with the same number as the video segments according to the audio segmentation point;

adjusting the playing speed of each video clip or the playing speed of each audio clip to ensure that the playing time lengths of each video clip and each audio clip are the same in one-to-one correspondence according to the playing sequence;

and connecting the adjusted video clips according to a playing sequence to obtain a target video, and connecting the adjusted audio clips according to the playing sequence to obtain a target audio, so as to be used for jointly playing the target video and the target audio.

In a second aspect, there is provided a video dubbing apparatus comprising:

the video acquisition module is used for acquiring a video to be dubbed music and an audio for dubbing music;

the segmentation point acquisition module is used for respectively acquiring the video segmentation points of the video and the audio segmentation points of the audio;

the first dividing module is used for dividing the video into at least two video segments according to the video segmentation points;

the second dividing module is used for dividing the audio into at least two audio segments with the same number as the video segments according to the audio dividing point;

the adjusting module is used for adjusting the playing speed of each video clip or the playing speed of each audio clip, so that the playing time lengths of each video clip and each audio clip are the same in a one-to-one correspondence mode according to the playing sequence;

and the setting module is used for connecting the adjusted video clips according to the playing sequence to obtain a target video, and connecting the adjusted audio clips according to the playing sequence to obtain a target audio, so that the target video and the target audio are jointly played.

In a third aspect, an electronic device is provided, which includes:

one or more processors;

a memory;

one or more application programs, wherein the one or more application programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs configured to: the video dubbing method for realizing the first aspect of the present disclosure is executed.

In a fourth aspect, a computer-readable medium is provided, on which a computer program is stored, wherein the program, when executed by a processor, implements the video dubbing method shown in the first aspect of the present disclosure.

The technical scheme provided by the disclosure has the following beneficial effects:

in the scheme of the embodiment of the disclosure, the same number of division points are respectively obtained from the video to be dubbed music and the audio to be dubbed music, the video to be dubbed music and the audio to be dubbed music are respectively divided into the same number of video segments and audio segments, and then the playing speed is adjusted to ensure that the playing time lengths of the video segments and the audio segments are in one-to-one correspondence according to the playing sequence, namely the playing time of each division point is in one-to-one correspondence according to the playing sequence, so that the automatic effective combination of the picture characteristics in the video and the music rhythm characteristics in the audio to be dubbed music is realized, and the picture content and the music rhythm are synchronously subjected to click conversion, so that a user watching the video can feel the change propulsion of the video content along with the change of the audio, the immersion feeling of the user watching rhythm is improved, and the experience is personally on the scene.

Additional aspects and advantages of the disclosure will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the disclosure.

Drawings

The above and other features, advantages and aspects of various embodiments of the present disclosure will become more apparent by referring to the following detailed description when taken in conjunction with the accompanying drawings. Throughout the drawings, the same or similar reference numbers refer to the same or similar elements. It should be understood that the drawings are schematic and that elements and features are not necessarily drawn to scale.

Fig. 1 is a schematic flow chart of a video dubbing method according to an embodiment of the present disclosure;

fig. 2 is a schematic flow chart of a video dubbing method according to an embodiment of the present disclosure;

fig. 3 is a schematic diagram of the structure of audio and video in the form of a time axis provided by the embodiment of the present disclosure;

FIG. 4 is a schematic diagram of adjusting the playing speed of the audio in FIG. 3 according to an embodiment of the disclosure;

fig. 5 is a schematic structural diagram of a video dubbing apparatus according to an embodiment of the present disclosure;

fig. 6 is a schematic structural diagram of an electronic device for video dubbing provided in an embodiment of the present disclosure.

Detailed Description

Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure are shown in the drawings, it is to be understood that the present disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein, but rather are provided for a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments of the disclosure are for illustration purposes only and are not intended to limit the scope of the disclosure.

It should be understood that the various steps recited in the method embodiments of the present disclosure may be performed in a different order, and/or performed in parallel. Moreover, method embodiments may include additional steps and/or omit performing the illustrated steps. The scope of the present disclosure is not limited in this respect.

The term "include" and variations thereof as used herein are open-ended, i.e., "including but not limited to". The term "based on" is "based, at least in part, on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; the term "some embodiments" means "at least some embodiments". Relevant definitions for other terms will be given in the following description.

It should be noted that the terms "first", "second", and the like in the present disclosure are only used for distinguishing the devices, modules or units, and are not used for limiting the devices, modules or units to be different devices, modules or units, and also for limiting the sequence or interdependence relationship of the functions executed by the devices, modules or units.

It is noted that references to "a", "an", and "the" modifications in this disclosure are intended to be illustrative rather than limiting, and that those skilled in the art will recognize that "one or more" may be used unless the context clearly dictates otherwise.

The names of messages or information exchanged between devices in the embodiments of the present disclosure are for illustrative purposes only, and are not intended to limit the scope of the messages or information.

To better explain aspects of the embodiments of the present disclosure, the following first introduces and explains related technical terms related to the embodiments of the present disclosure:

the "video segmentation point" and the "first segmentation point" in the present disclosure may be a time point at which a frame in a video frame may be subjected to freeze segmentation, for example, an object, a scene, a gaze, or a lens in the video frame may move, or an expression freeze frame, and the like may be the video segmentation point.

The "audio dividing point" and the "second dividing point" in the present disclosure may be determined according to the dividing point of the audio, that is, the audio dividing point may be determined according to the variation and repetition of the tempo lightness and urgency of the sound in the audio, for example, the position of every preset number of tempos in the audio may be set as the audio dividing point, and the like.

The present disclosure provides a video dubbing method, apparatus, electronic device and computer readable medium, which aim to solve the above technical problems of the prior art.

The following describes the technical solutions of the present disclosure and how to solve the above technical problems in specific embodiments. The following several specific embodiments may be combined with each other, and details of the same or similar concepts or processes may not be repeated in some embodiments. Embodiments of the present disclosure will be described below with reference to the accompanying drawings.

The embodiment of the present disclosure provides a video dubbing method, as shown in fig. 1, the method includes:

step S101, acquiring a video to be dubbed music and an audio used for dubbing music.

In this step, the audio for the score may be directly sent by the user, or may be selected by the user from candidate audio, or the server may receive the video to be scored sent by the user, identify the type of the video to be scored, and recommend the audio corresponding to the type of the video.

Step S102, video segmentation points of the video and audio segmentation points of the audio are respectively obtained.

In a specific implementation process, a user can set video segmentation points in a video by himself, and a server can also recognize the video segmentation points of the video through a preset video rhythm recognition model, for example, by recognizing shot conversion points, expression lattice points and the like in a video picture.

Similarly, the user can set audio dividing points in the audio by himself, and the server can also recognize rhythm points of the audio through a preset audio rhythm recognition model, for example, the audio dividing points are set according to beats, drum points and the like.

Specifically, the number of the audio segmentation points is equal to that of the video segmentation points, so that the audio and the video can be subsequently segmented into audio segments and video segments with the same number.

And step S103, dividing the video into at least two video segments according to the video segmentation point.

Specifically, the video is divided into at least two video segments from the video segmentation point.

And step S104, dividing the audio into at least two audio segments with the same number as the video segments according to the audio segmentation points.

And dividing the audio into at least two audio segments from the audio segmentation points, wherein the number of the video segmentation points is the same as that of the audio segmentation points, and the number of the video segments is the same as that of the audio segments.

For example, two video segmentation points and two audio segmentation points are obtained, the video is divided into three video segments from the video segmentation points, and the audio is divided into three audio segments from the audio segmentation points.

And step S105, adjusting the playing speed of each video clip or the playing speed of each audio clip, so that the playing time lengths of each video clip and each audio clip are the same in a one-to-one correspondence mode according to the playing sequence.

In the specific implementation process, the playing speed of the audio clip can be kept unchanged, and the playing speed of the video clip can be adjusted; the playing speed of the video clip can be kept unchanged, and the playing speed of the audio clip can be adjusted; the playing speeds of the video clips and the audio clips can be adjusted simultaneously, so that the playing time lengths of the video clips and the audio clips are the same in one-to-one correspondence according to the playing sequence, namely the playing time of each video segmentation point is the same in one-to-one correspondence with the playing time of each audio segmentation point according to the playing sequence.

And S106, connecting the adjusted video clips according to the playing sequence to obtain a target video, and connecting the adjusted audio clips according to the playing sequence to obtain a target audio, so as to be used for jointly playing the target video and the target audio.

In the specific implementation process, if the playing speed of the audio clip is kept unchanged and the playing speed of the video clip is adjusted, connecting the adjusted video clips according to the playing sequence to obtain a target video, and performing combined playing on the target video and the audio; if the playing speed of the video clips is kept unchanged, the playing speed of the audio clips is adjusted, the adjusted audio clips are connected according to the playing sequence to obtain target audio, and the video and the target audio are played in a combined mode; if the playing speeds of the video clips and the audio clips are adjusted simultaneously, the adjusted video clips are connected according to the playing sequence to obtain a target video, the adjusted audio clips are connected according to the playing sequence to obtain a target audio, and the target video and the target audio are played jointly.

In the above embodiment, the same number of division points are respectively obtained from the video to be dubbed music and the audio for dubbed music, the video to be dubbed music and the audio for dubbed music are respectively divided into the same number of video segments and audio segments, and then the playing speed is adjusted so that the playing time lengths of the video segments and the audio segments are in one-to-one correspondence according to the playing sequence, that is, the playing time of the division points are in one-to-one correspondence according to the playing sequence, thereby realizing the automatic effective combination of the picture characteristics in the video and the music rhythm characteristics in the audio for dubbed music, realizing the synchronous click conversion of the picture contents and the music rhythm, and enabling a user watching the video to feel the change propulsion of the video contents along with the change of the audio rhythm, improving the immersion feeling of the user watching and having the feeling of being personally on the scene.

A possible implementation manner is provided in the embodiment of the present disclosure, the step S106 of connecting the adjusted video segments according to the playing sequence to obtain the target video may include: and connecting the adjusted video clips according to the playing sequence, and inputting the connected videos into a preset smoothing processing model so as to carry out video smoothing processing on the video clips to obtain the target video.

Specifically, the playing speeds of the video segments may be different, and a preset smoothing model may be adopted to perform video smoothing on the video segmentation points of the connected video, so that the video segments with different playing speeds can be smoothly transited when being continuously played.

A possible implementation manner is provided in the embodiment of the present disclosure, the connecting the adjusted audio segments according to the playing order in step S106 to obtain the target audio may include: and connecting the adjusted audio segments according to the playing sequence, inputting the connected audio into a preset smoothing model, and performing audio smoothing on the audio segments to obtain the target audio.

Similarly, the playing speeds of the audio segments may be different, and a preset smoothing model may be adopted to perform audio smoothing on the audio segmentation points of the connected audio, so that smooth transition may be achieved when the audio segments with different playing speeds are continuously played.

The above embodiments describe the process of video smoothing processing on video or audio, and the process of obtaining audio for a score will be described in detail below with reference to the embodiments.

A possible implementation manner is provided in the embodiment of the present disclosure, the acquiring a video to be dubbed music and an audio for dubbing music in step S101 may include: and receiving the video to be dubbed music sent by the user, and setting the audio selected by the user from the preset candidate audio as the audio for dubbing the music to the video.

In one embodiment, the audio for the score may be sent directly by the user or may be selected by the user from candidate audio.

A possible implementation manner is provided in the embodiment of the present disclosure, the acquiring a video to be dubbed music and an audio for dubbing music in step S101 may include:

(1) and receiving the video to be dubbed sent by the user, and identifying the label type of the video.

The type of the tag of the video may be the type of the content of the video, or the range of the place where the user uploads the video is located.

(2) And acquiring audio corresponding to the type of the label according to a preset recommendation model, and setting the recommended audio as audio for carrying out music matching on the video.

In another embodiment, a plurality of tag types and audio corresponding to each tag type may be preset, and when the tag type of a video is identified, the audio corresponding to the tag type is queried; the recommendation model can also be trained according to a plurality of sample label types and sample audios to obtain the trained recommendation model, and then the identified label types are input into the recommendation model to obtain the corresponding recommended audios.

The above embodiments illustrate the process of acquiring audio for a score, and the following describes the process of acquiring audio and video cut points from audio and video, respectively, with reference to the drawings and the practical embodiments.

A possible implementation manner is provided in the embodiment of the present disclosure, as shown in fig. 2, the step of respectively obtaining the video segmentation point of the video and the audio segmentation point of the audio in step S102 includes:

step S210, identifying a first cut point of the video based on a preset video rhythm identification model.

In a specific implementation process, a user can set a first cut point in a video by himself, and a server can also recognize the first cut point of the video through a preset video rhythm recognition model, for example, by recognizing a shot conversion point, an expression lattice point and the like in a video picture.

In step S220, a second cut point of the audio is identified based on the preset audio rhythm identification model.

Similarly, the user can set the second cut point in the audio by himself, and the server can also recognize the rhythm point of the audio through a preset audio rhythm recognition model, such as setting the audio cut point according to the beat, the drum point and the like.

Step S230, selecting video segmentation points from the first segmentation points, and selecting audio segmentation points from the second segmentation points; the number of the video dividing points is the same as that of the audio dividing points.

Specifically, the number of the first segmentation points is not necessarily the same as that of the second segmentation points, and therefore, the video segmentation points may be selected from the first segmentation points, and the audio segmentation points having the same number as that of the video segmentation points may be selected from the second segmentation points, so as to subsequently segment the audio and the video into audio segments and video segments having the same number.

The following will specifically describe how to select video segmentation points from the first segmentation points and select audio segmentation points with the same number as the video segmentation points from the second segmentation points, with reference to the drawings and the embodiments.

A possible implementation manner is provided in the embodiment of the present disclosure, the selecting a video segmentation point from the first segmentation points and selecting an audio segmentation point from the second segmentation points in step S230 includes:

(1) and extracting a preset number of video segmentation points from the first segmentation points, and inquiring the first playing time of each video segmentation point.

In a specific implementation process, the user may select the video segmentation points from the first segmentation points, or the server may randomly select the video segmentation points from the first segmentation points.

(2) And querying a second playing time of each second segmentation point, acquiring a second playing time with the smallest playing time difference with each first playing time, and setting a second segmentation point corresponding to the acquired second playing time as an audio segmentation point.

Referring to fig. 3, the video and the audio are shown in fig. 3 in a time axis form, and the initial playing time of the video and the initial playing time of the audio obtained in fig. 3 are the same and both are 3 minutes; in other embodiments, the obtained initial playing time lengths of the video and the audio may be different, and the playing start point and the playing end point of the video and the audio may be used as the corresponding first division point and the second division point, so that the finally obtained playing time lengths of the video clips and the audio clips are the same in a one-to-one correspondence manner according to the playing sequence.

In fig. 3, a first cut point a and a first cut point B of a video are identified based on a preset video tempo identification model, and both the first cut point a and the first cut point B are set as a video cut point a and a video cut point B; identifying second segmentation points C, D and E of the audio based on a preset audio rhythm identification model, and setting the second segmentation point C as an audio segmentation point C due to the fact that the playing time difference between the second segmentation point C and the video segmentation point A is minimum; similarly, since the playing time difference between the second segmentation point E and the video segmentation point B is the smallest, the second segmentation point E is set as the corresponding audio segmentation point E.

The embodiment of the present disclosure provides a possible implementation manner, and the step of adjusting the playing speed of each video segment or the playing speed of each audio segment includes at least one of the following steps:

(1) adjusting the playing speed of each video clip according to the playing time length of each audio clip;

(2) adjusting the playing speed of each audio clip according to the playing time length of each video clip;

(2) and jointly adjusting the playing speed of each video clip and the playing speed of each audio clip according to the playing time of each audio clip and the playing time of each video clip.

Specifically, the playing speed of the audio clip can be kept unchanged, and the playing speed of the video clip can be adjusted; the playing speed of the video clip can be kept unchanged, and the playing speed of the audio clip can be adjusted; the playing speeds of the video clips and the audio clips can be adjusted simultaneously, so that the playing time lengths of the video clips and the audio clips are the same in one-to-one correspondence according to the playing sequence, namely the playing time of each video segmentation point is the same in one-to-one correspondence with the playing time of each audio segmentation point according to the playing sequence.

The embodiment of the present disclosure provides a possible implementation manner, and the step of adjusting the playing speed of each video clip according to the playing time length of each audio clip may include:

(1) acquiring the playing time length of a video clip and acquiring the playing time length of a corresponding audio clip;

(2) and inputting the playing time length of the video clip and the playing time length of the audio clip into a preset speed function for calculation to obtain the playing speed of each video frame in the video clip.

The playing speed of the video clips can be averagely lengthened or shortened, and the playing speed of each video frame in the video clips can be the same; the playing speed of each video frame in the video clip can also be calculated according to a preset speed function, and the playing speed of each video frame in the video clip can also be different.

Taking fig. 4 as an example, fig. 4 is a schematic diagram illustrating adjusting the playing speed of the audio clip in fig. 3, the playing speed of the audio clip between the starting point of the audio and the point C may be adjusted, so that the playing time of the point C is consistent with the playing time of the point a in the video; and adjusting C, E the playing speed between the two points to make the playing time of the point E consistent with the playing time of the point B in the video, and then playing the audio and the video after adjustment jointly.

According to the video dubbing method, the same number of the cut points are respectively obtained from the video to be dubbed and the audio for dubbing, the video to be dubbed and the audio for dubbing are respectively divided into the same number of the video segments and the same number of the audio segments, then the playing speed is adjusted to enable the playing time lengths of the video segments and the audio segments to be in one-to-one correspondence according to the playing sequence, namely the playing time of each cut point is in one-to-one correspondence according to the playing sequence, so that the picture characteristics in the video and the music rhythm characteristics in the audio for dubbing are effectively combined automatically, the picture contents and the music rhythm are synchronously subjected to click conversion, a user watching the video can feel the change propulsion of the video contents along with the change of the audio rhythm, the immersion feeling of the user watching is improved, and the user has a feeling of being personally on the scene.

Furthermore, a preset smoothing model is adopted to carry out video smoothing on each video segmentation point of the connected video, so that the video segments with different playing speeds can be smoothly transited when being continuously played.

Furthermore, the user selects the audio for the score from the candidate audio, or recommends the audio for the score corresponding to the type of the video according to the type of the video to be scored, so that the determined audio is more in line with the preference of the user.

Furthermore, the playing speed of the audio clip can be kept unchanged, and the playing speed of the video clip can be adjusted; the playing speed of the video clip can be kept unchanged, and the playing speed of the audio clip can be adjusted; the playing speeds of the video clips and the audio clips can be adjusted simultaneously, so that the playing time lengths of the video clips and the audio clips are the same in one-to-one correspondence according to the playing sequence, namely the playing time of each video segmentation point is the same in one-to-one correspondence with the playing time of each audio segmentation point according to the playing sequence.

The disclosed embodiment provides a video dubbing apparatus, as shown in fig. 5, the video dubbing apparatus 50 may include: a video acquisition module 501, a cut point acquisition module 502, a first division module 503, a second division module 504, an adjustment module 505, and a setting module 506, wherein,

a video obtaining module 501, configured to obtain a video to be dubbed music and an audio for dubbing music;

a segmentation point obtaining module 502, configured to obtain a video segmentation point of the video and an audio segmentation point of the audio respectively;

a first dividing module 503, configured to divide the video into at least two video segments according to the video segmentation point;

a second dividing module 504, configured to divide the audio into at least two audio segments with the same number as the video segments according to the audio dividing point;

the adjusting module 505 is configured to adjust the playing speed of each video segment or the playing speed of each audio segment, so that the playing durations of each video segment and each audio segment are the same in a one-to-one correspondence manner according to the playing sequence;

a setting module 506, configured to connect the adjusted video segments according to a playing sequence to obtain a target video, and connect the adjusted audio segments according to the playing sequence to obtain a target audio, so as to perform joint playing on the target video and the target audio.

According to the video dubbing music device, the same number of the cut points are respectively obtained from the video to be dubbed music and the audio for dubbing music, the video to be dubbed music and the audio for dubbing music are respectively divided into the same number of the video segments and the same number of the audio segments, then the playing speed is adjusted to enable the playing time lengths of the video segments and the audio segments to be in one-to-one correspondence according to the playing sequence, namely the playing time of each cut point is in one-to-one correspondence according to the playing sequence, so that the picture characteristics in the video and the music rhythm characteristics in the audio for dubbing music are effectively combined automatically, the picture contents and the music rhythm are synchronously subjected to click conversion, a user watching the video can feel the change propulsion of the video contents along with the change of the audio rhythm, the immersion feeling of the user watching is improved, and the user feels personally on the scene.

In an optional embodiment of the second aspect, when the adjusted video segments are connected according to the playing sequence to obtain the target video, the setting module 506 is specifically configured to:

and connecting the adjusted video clips according to the playing sequence, inputting the connected videos into a preset smoothing processing model, and performing video smoothing processing on the video clips to obtain the target video.

In an optional embodiment of the second aspect, when acquiring the video to be dubbed music and the audio for dubbing music, the video acquiring module 501 is specifically configured to:

and receiving the video to be dubbed music sent by the user, and setting the audio selected by the user from the preset candidate audio as the audio for dubbing the music to the video.

receiving a video to be dubbed sent by a user;

identifying a tag type of the video;

acquiring audio corresponding to the type of the label according to a preset recommendation model;

the recommended audio is set as the audio for dubbing the video.

In an optional embodiment of the second aspect, when the segmentation point obtaining module 502 obtains the video segmentation point of the video and the audio segmentation point of the audio respectively, it is specifically configured to:

identifying a first cut point of a video based on a preset video rhythm identification model;

identifying a second segmentation point of the audio based on a preset audio rhythm identification model;

selecting video segmentation points from the first segmentation points, and selecting audio segmentation points from the second segmentation points; the number of the video dividing points is the same as that of the audio dividing points.

In an optional embodiment of the second aspect, when the segmentation point obtaining module 502 selects a video segmentation point from the first segmentation points and selects an audio segmentation point from the second segmentation points, it is specifically configured to:

extracting a preset number of video segmentation points from the first segmentation points, and inquiring the first playing time of each video segmentation point;

and querying a second playing time of each second segmentation point, acquiring a second playing time with the smallest playing time difference with each first playing time, and setting a second segmentation point corresponding to the acquired second playing time as an audio segmentation point.

In an optional embodiment of the second aspect, when adjusting the playing speed of each video segment or the playing speed of each audio segment, the adjusting module 505 is specifically configured to at least one of:

adjusting the playing speed of each video clip according to the playing time length of each audio clip;

adjusting the playing speed of each audio clip according to the playing time length of each video clip;

and jointly adjusting the playing speed of each video clip and the playing speed of each audio clip according to the playing time of each audio clip and the playing time of each video clip.

In an optional embodiment of the second aspect, when the adjusting module 506 adjusts the playing speed of each video segment according to the playing time length of each audio segment, it is specifically configured to:

acquiring the playing time length of a video clip and acquiring the playing time length of a corresponding audio clip;

and inputting the playing time length of the video clip and the playing time length of the audio clip into a preset speed function for calculation to obtain the playing speed of each video frame in the video clip.

The video dubbing apparatus for pictures in the embodiments of the present disclosure may execute the video dubbing method for pictures provided in the embodiments of the present disclosure, and the implementation principles thereof are similar, the actions performed by each module in the video dubbing apparatus for pictures in the embodiments of the present disclosure correspond to the steps in the video dubbing method for pictures in the embodiments of the present disclosure, and for the detailed functional description of each module of the video dubbing apparatus for pictures, reference may be specifically made to the description in the video dubbing method for corresponding pictures shown in the foregoing, and details are not repeated here.

Referring now to FIG. 6, a block diagram of an electronic device 600 suitable for use in implementing embodiments of the present disclosure is shown. The electronic devices in the embodiments of the present disclosure may include, but are not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players), in-vehicle terminals (e.g., car navigation terminals), and the like, and fixed terminals such as digital TVs, desktop computers, and the like. The electronic device shown in fig. 6 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present disclosure.

The electronic device includes: a memory and a processor, wherein the processor may be referred to as a processing device 601 described below, and the memory may include at least one of a Read Only Memory (ROM)602, a Random Access Memory (RAM)603, and a storage device 608, which are described below:

as shown in fig. 6, electronic device 600 may include a processing means (e.g., central processing unit, graphics processor, etc.) 601 that may perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM)602 or a program loaded from a storage means 608 into a Random Access Memory (RAM) 603. In the RAM 603, various programs and data necessary for the operation of the electronic apparatus 600 are also stored. The processing device 601, the ROM 602, and the RAM 603 are connected to each other via a bus 604. An input/output (I/O) interface 605 is also connected to bus 604.

Generally, the following devices may be connected to the I/O interface 605: input devices 606 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; output devices 607 including, for example, a Liquid Crystal Display (LCD), a speaker, a vibrator, and the like; storage 608 including, for example, tape, hard disk, etc.; and a communication device 609. The communication means 609 may allow the electronic device 600 to communicate with other devices wirelessly or by wire to exchange data. While fig. 6 illustrates an electronic device 600 having various means, it is to be understood that not all illustrated means are required to be implemented or provided. More or fewer devices may alternatively be implemented or provided.

In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program carried on a non-transitory computer readable medium, the computer program containing program code for performing the method illustrated by the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication means 609, or may be installed from the storage means 608, or may be installed from the ROM 602. The computer program, when executed by the processing device 601, performs the above-described functions defined in the methods of the embodiments of the present disclosure.

It should be noted that the computer readable medium in the present disclosure can be a computer readable signal medium or a computer readable medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In contrast, in the present disclosure, a computer readable signal medium may comprise a propagated data signal with computer readable program code embodied therein, either in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.

In some embodiments, the clients, servers may communicate using any currently known or future developed network protocol, such as HTTP (HyperText transfer protocol), and may be interconnected with any form or medium of digital data communication (e.g., a communications network). Examples of communication networks include a local area network ("LAN"), a wide area network ("WAN"), the Internet (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future developed network.

The computer readable medium may be embodied in the electronic device; or may exist separately without being assembled into the electronic device.

The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to:

acquiring a video to be dubbed music and an audio used for dubbing music;

Computer program code for carrying out operations for the present disclosure may be written in any combination of one or more programming languages, including but not limited to an object oriented programming language such as Java, Smalltalk, C + +, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).

The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

The modules or units described in the embodiments of the present disclosure may be implemented by software or hardware. Where the name of a module or unit does not in some cases constitute a limitation of the unit itself, for example, an adjustment module may also be described as a "module that adjusts the play speed of an audio or video clip".

The functions described herein above may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), systems on a chip (SOCs), Complex Programmable Logic Devices (CPLDs), and the like.

In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

The foregoing description is only exemplary of the preferred embodiments of the disclosure and is illustrative of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the disclosure herein is not limited to the particular combination of features described above, but also encompasses other embodiments in which any combination of the features described above or their equivalents does not depart from the spirit of the disclosure. For example, the above features and (but not limited to) the features disclosed in this disclosure having similar functions are replaced with each other to form the technical solution.

Further, while operations are depicted in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order. Under certain circumstances, multitasking and parallel processing may be advantageous. Likewise, while several specific implementation details are included in the above discussion, these should not be construed as limitations on the scope of the disclosure. Certain features that are described in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination.

Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as example forms of implementing the claims.

Claims

1. A method for video dubbing music, comprising:

acquiring a video to be dubbed music and an audio used for dubbing music;

2. The video dubbing method of claim 1, wherein the concatenating the adjusted video segments in playing order to obtain the target video comprises:

connecting the adjusted video clips according to the playing sequence;

and inputting the connected video into a preset smoothing model to carry out video smoothing on each video segment to obtain the target video.

3. The video dubbing method of claim 1, wherein the step of acquiring the video to be dubbed and the audio for dubbing comprises:

receiving a video to be dubbed sent by a user;

and setting the audio selected by the user from preset candidate audio as the audio for dubbing the video.

4. The video dubbing method of claim 1, wherein the step of acquiring the video to be dubbed and the audio for dubbing comprises:

receiving a video to be dubbed sent by a user;

identifying a tag type of the video;

setting the recommended audio as the audio for dubbing the video.

5. The video dubbing method of claim 1, wherein the step of separately obtaining the video cut point of the video and the audio cut point of the audio comprises:

identifying a first cut point of the video based on a preset video rhythm identification model;

selecting the video segmentation point from the first segmentation point and selecting the audio segmentation point from the second segmentation point; the number of the video dividing points is the same as that of the audio dividing points.

6. The video dubbing method of claim 5, wherein the step of selecting the video clip point from the first clip point and the audio clip point from the second clip point comprises:

extracting a preset number of video segmentation points from the first segmentation points, and inquiring first playing time of each video segmentation point;

and querying a second playing time of each second division point to obtain a second playing time with the minimum playing time difference with each first playing time, and setting a second division point corresponding to the obtained second playing time as the audio division point.

7. The video dubbing method of claim 1, wherein the step of adjusting the play speed of each video clip or the play speed of each audio clip comprises at least one of:

8. The method of claim 7, wherein the step of adjusting the playing speed of each video segment according to the playing duration of each audio segment comprises:

9. A video dubbing apparatus comprising:

10. An electronic device, comprising:

one or more processors;

a memory;

one or more applications, wherein the one or more applications are stored in the memory and configured to be executed by the one or more processors, the one or more programs configured to: performing the video dubbing method of any of claims 1 to 8.

11. A computer-readable medium, on which a computer program is stored which, when being executed by a processor, carries out the video dubbing method of any one of claims 1 to 8.