CN110324718B - Audio and video generation method and device, electronic equipment and readable medium - Google Patents

Audio and video generation method and device, electronic equipment and readable medium

Info

Publication number
CN110324718B
Authority
CN
China
Prior art keywords
video content
user
audio
content
playing
Prior art date
Legal status
Active
Application number
CN201910718655.5A
Other languages
Chinese (zh)
Other versions
CN110324718A (en)
Inventor
刘德平
Current Assignee
Beijing ByteDance Network Technology Co Ltd
Original Assignee
Beijing ByteDance Network Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing ByteDance Network Technology Co Ltd
Priority to CN201910718655.5A
Publication of CN110324718A
Application granted
Publication of CN110324718B

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00: Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40: Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43: Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/439: Processing of audio elementary streams
    • H04N21/4394: Processing of audio elementary streams involving operations for analysing the audio stream, e.g. detecting features or characteristics in audio streams
    • H04N21/44: Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs
    • H04N21/44008: Processing of video elementary streams involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
    • H04N21/44016: Processing of video elementary streams involving splicing one content stream with another content stream, e.g. for substituting a video clip
    • H04N21/47: End-user applications
    • H04N21/472: End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content
    • H04N21/47205: End-user interface for manipulating displayed content, e.g. interacting with MPEG-4 objects, editing locally

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Databases & Information Systems (AREA)
  • Human Computer Interaction (AREA)
  • Television Signal Processing For Recording (AREA)

Abstract

Embodiments of the present disclosure disclose an audio and video generation method and apparatus, an electronic device and a readable medium. The method includes: if it is determined that the user has chosen to add existing video content to the audio content of a recorded target song, acquiring the video content selected by the user from a preset picture library; editing the video content selected by the user according to the playing duration of the audio content to generate target video content, where the playing duration of the target video content is equal to the playing duration of the audio content; and generating an audio and video according to the audio content and the target video content. With the technical solution of the embodiments of the present disclosure, no manual operation by the user is required, the user experience is improved, and the generation efficiency of the audio and video is improved at the same time.

Description

Audio and video generation method and device, electronic equipment and readable medium
Technical Field
Embodiments of the present disclosure relate to the field of Internet technologies, and in particular, to an audio and video generation method and apparatus, an electronic device and a readable medium.
Background
In a karaoke (K song) application, a user can watch audios and videos published by other users, and can also select a favorite song, record an audio and video for it, and publish the result.
Specifically, the user can enter the name of a favorite song in the search box and then click any karaoke option in the search results to enter the singing interface of that song and record it. In general, after the song is recorded, the user may select a previously recorded video or a photographed picture from the picture library, and an audio and video may then be generated from the audio of the recorded song and the selected video or picture.
However, after the user selects a recorded video, if the video is longer than the audio recorded by the user, the user has to cut the video manually and is not prompted how much needs to be cut, so the user experience is poor and the efficiency of generating the audio and video is low.
Disclosure of Invention
In view of this, the embodiments of the present disclosure provide an audio and video generation method and apparatus, an electronic device, and a readable medium, so as to improve the generation efficiency of audio and video and improve the experience of a user.
In a first aspect, an embodiment of the present disclosure provides an audio and video generating method, where the method includes:
if it is determined that the user has chosen to add existing video content to the audio content of a recorded target song, acquiring the video content selected by the user from a preset picture library;
editing the video content selected by the user according to the playing duration of the audio content to generate target video content; wherein the playing duration of the target video content is equal to the playing duration of the audio content;
and generating audio and video according to the audio content and the target video content.
In a second aspect, an embodiment of the present disclosure further provides an audio and video generating apparatus, where the apparatus includes:
the video content acquisition module is used for acquiring the video content selected by the user from the preset picture library if it is determined that the user has chosen to add existing video content to the audio content of the recorded target song;
the target video content generating module is used for editing the video content selected by the user according to the playing duration of the audio content to generate target video content; wherein the playing duration of the target video content is equal to the playing duration of the audio content;
and the audio and video generation module is used for generating audio and video according to the audio content and the target video content.
In a third aspect, an embodiment of the present disclosure further provides an electronic device, where the electronic device includes:
one or more processors;
a memory for storing one or more programs;
when the one or more programs are executed by the one or more processors, the one or more processors implement the audio-video generation method according to any embodiment of the present disclosure.
In a fourth aspect, the embodiments of the present disclosure provide a readable medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the audio and video generation method according to any embodiment of the present disclosure.
According to the audio and video generation method and apparatus, the electronic device and the readable medium, when it is determined that the user has chosen to add existing video content to the audio content of the recorded target song, the video content selected by the user from the preset picture library is edited according to the playing duration of the audio content to generate target video content whose playing duration is equal to the playing duration of the audio content, and an audio and video can then be generated according to the audio content and the target video content. Compared with the prior art, this scheme can automatically edit the video content selected by the user from the preset picture library according to the playing duration of the audio content, so that no manual operation by the user is needed, the user experience is improved, and the generation efficiency of the audio and video is improved at the same time.
Drawings
In order to more clearly illustrate the embodiments of the present disclosure or the technical solutions in the prior art, a brief description will be given below to the drawings required for the embodiments or the technical solutions in the prior art, and it is obvious that the drawings in the following description are some embodiments of the present disclosure, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
Fig. 1 shows a flowchart of an audio and video generation method provided by an embodiment of the present disclosure;
fig. 2 shows a flowchart of another audio/video generation method provided by the embodiment of the present disclosure;
fig. 3 shows a flowchart of another audio/video generation method provided by the embodiment of the present disclosure;
fig. 4 shows a schematic structural diagram of an audio/video generating apparatus provided in an embodiment of the present disclosure;
fig. 5 shows a schematic structural diagram of an electronic device provided in an embodiment of the present disclosure.
Detailed Description
To make the objects, technical solutions and advantages of the present disclosure clearer, the technical solutions of the present disclosure will be clearly and completely described below through embodiments with reference to the accompanying drawings in the embodiments of the present disclosure, and it is obvious that the described embodiments are some, but not all embodiments of the present disclosure. All other embodiments, which can be derived by a person skilled in the art from the embodiments disclosed herein without making any creative effort, shall fall within the protection scope of the present disclosure.
It should be understood that the various steps recited in the method embodiments of the present disclosure may be performed in a different order, and/or performed in parallel. Moreover, method embodiments may include additional steps and/or omit performing the illustrated steps. The scope of the present disclosure is not limited in this respect. In the following embodiments, optional features and examples are provided in each embodiment, and various features described in the embodiments may be combined to form a plurality of alternatives, and each numbered embodiment should not be regarded as only one technical solution.
The term "include" and variations thereof as used herein are open-ended, i.e., "including but not limited to". The term "based on" is "based, at least in part, on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; the term "some embodiments" means "at least some embodiments". Relevant definitions for other terms will be given in the following description.
It should be noted that the terms "first", "second", and the like in the present disclosure are only used for distinguishing different devices, modules or units, and are not used for limiting the order or interdependence relationship of the functions performed by the devices, modules or units.
It is noted that references to "a", "an", and "the" modifications in this disclosure are intended to be illustrative rather than limiting, and that those skilled in the art will recognize that "one or more" may be used unless the context clearly dictates otherwise.
Fig. 1 shows a flowchart of an audio and video generation method provided by an embodiment of the present disclosure, which is applicable to a case where an audio and video is generated based on audio content of a recorded target song and existing video content in a preset picture library. The method can be executed by the audio and video generating device or the electronic device provided by the embodiment of the disclosure, and the device can be realized by software and/or hardware. Optionally, the electronic device may be a server device carrying an audio and video generation function, or may be a terminal device configured with a karaoke application program provided by the server.
Optionally, as shown in fig. 1, the audio and video generating method provided in the embodiment of the present disclosure includes the following steps:
and S110, if the fact that the user selects to add the recorded video content to the audio content of the recorded target song is determined, acquiring the video content selected by the user from a preset picture library.
In this embodiment, the video content may be an ordinary video file, or a video file synthesized from pictures. The preset picture library is a pre-configured library from which the user may select existing video content or existing picture content. Optionally, the preset picture library may include at least one of: a picture library configured on the user terminal device, a picture library in the application program (namely the picture library in the karaoke application program provided by the server), a cloud picture library of the user, a picture library of the server, and the like.
Optionally, determining that the user has chosen to add existing video content to the audio content of the recorded target song may be: if a song release event is detected, determining whether the user has chosen to add existing video content; and if so, determining that the user has chosen to add existing video content to the audio content of the recorded target song. The song release event may be triggered by the user manually or by voice, and is used to request the server, or the karaoke application program provided by the server, to release the audio content of the recorded target song.
Take the server detecting the release event as an example. After the audio content of the target song sung by the user is recorded, the user's karaoke application program is controlled to display the audio content on a preview interface of the audio content; at this point, if the user is satisfied with the displayed audio content, the user can click a release button in the preview interface to release the audio content, and the server can then detect the release event through the karaoke application program; afterwards, the karaoke application program can be controlled to jump from the preview interface to the release interface, and if it is detected that the user clicks any video option of the preset picture library displayed on the release interface, it is determined that the user has chosen to add existing video content to the audio content of the recorded target song.
Then, the existing video content in the preset picture library can be loaded and displayed; and further, the video content selected by the user can be determined and obtained according to the operation of the user on the existing video content interface of the displayed preset picture library.
And S120, editing the video content selected by the user according to the playing time length of the audio content to generate the target video content.
In this embodiment, the video content selected by the user may be cut, its playing speed or playing frequency may be adjusted, and the like, according to the playing duration of the audio content, so as to generate the target video content, where the playing duration of the target video content is equal to the playing duration of the audio content. Optionally, depending on whether the playing duration of the video content selected by the user is greater than or less than the playing duration of the audio content, the video content selected by the user is edited according to the playing duration of the audio content in different ways to generate the target video content.
For example, if it is determined that the playing time of the video content selected by the user is longer than the playing time of the audio content, the video content selected by the user is edited according to the playing time of the audio content, and the generating of the target video content may be: and according to the playing time length of the audio content, cutting the video content selected by the user and/or adjusting the playing speed to generate the target video content.
For example, if the playing duration of the audio content is 20 s and the playing duration of the video content selected by the user is 30 s, the clip may be cut at the position where the playing progress of the video content selected by the user is 20 s, or at the position of 10 s, or at the positions of 5 s and 25 s, and the 20 s segment obtained after cutting is taken as the target video content.
The target video content can also be generated by adjusting the play speed of the video content selected by the user. For example, the new playing speed may be determined according to the playing time length of the audio content, the playing speed and the playing time length of the video content selected by the user, and the like; and adjusting the playing speed of the video content selected by the user to a new playing speed to generate the target video content with the playing time length equal to the playing time length of the audio content. In addition, the playing speed of a certain section of the video content selected by the user can be adjusted by combining the lyric characteristics of the audio content.
In addition, the target video content can be generated by cutting the video content selected by the user and adjusting the playing speed. For example, the playing speed of a certain section or the whole video content selected by the user can be adjusted according to the lyric characteristics of the audio and video; and cutting the video content with the playing speed adjusted according to the playing time length of the audio content to generate target video content with the playing time length equal to the playing time length of the audio content and the like.
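As an illustration of the cutting and speed-adjustment options above, the following minimal Python sketch assumes the moviepy 1.x library; the function name, file path and the choice of cutting from the start are placeholders and are not part of the disclosed method itself.

```python
# Illustrative only: moviepy 1.x is assumed; paths and names are hypothetical.
from moviepy.editor import VideoFileClip, vfx

def fit_longer_clip(video_path, audio_duration, speed_up=False):
    """Make a clip that is longer than the recorded audio fit its duration,
    either by cutting it or by raising its playing speed."""
    clip = VideoFileClip(video_path)
    if speed_up:
        # New playing speed chosen so the whole clip plays in audio_duration.
        factor = clip.duration / audio_duration
        return clip.fx(vfx.speedx, factor)
    # Simple cut from the start; any window of length audio_duration could be
    # used instead, e.g. 0-20 s, 10-30 s, or 5-25 s for a 30 s clip and 20 s of audio.
    return clip.subclip(0, audio_duration)
```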
For example, if it is determined that the playing duration of the video content selected by the user is shorter than the playing duration of the audio content, editing the video content selected by the user according to the playing duration of the audio content to generate the target video content may be: determining the cycle number according to the playing duration of the audio content and the playing duration of the video content selected by the user; and generating the target video content according to the video content selected by the user and the cycle number. For example, if the playing duration of the audio content is 20 s and the playing duration of the video content selected by the user is 10 s, the cycle number may be determined to be 2; the video content selected by the user can then be repeated twice to obtain the target video content.
In addition, if the playing duration of the audio content is not an integral multiple of the playing duration of the video content selected by the user, the cycle number and the cycle end frame can be determined according to the playing duration of the audio content and the playing duration of the video content selected by the user, and the target video content is then generated according to the video content selected by the user, the cycle number and the cycle end frame. The cycle end frame refers to the video frame at which playback stops during the last cycle of the video content selected by the user. For example, if the playing duration of the audio content is 20 s and the playing duration of the video content selected by the user is 8 s, the cycle number is 3, and the cycle end frame is the position of 4 s within the third cycle of the video content selected by the user.
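The cycle-number and cycle-end-frame determination described above amounts to a ceiling division plus a remainder; the following small Python sketch of that arithmetic is illustrative, with function and variable names chosen freely.

```python
import math

def loop_plan(audio_duration, video_duration):
    """Return how many times the shorter clip must loop and where the last
    loop should stop so the total playing time equals the audio duration."""
    cycles = math.ceil(audio_duration / video_duration)
    remainder = audio_duration - (cycles - 1) * video_duration
    # remainder is the stop position (seconds) within the final loop; it equals
    # video_duration when the audio is an exact multiple of the video.
    return cycles, remainder

print(loop_plan(20, 10))  # (2, 10): play the 10 s clip twice in full
print(loop_plan(20, 8))   # (3, 4): two full loops, then stop the third at 4 s
```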
Further, if it is determined that the playing time of the video content selected by the user is shorter than the playing time of the audio content, the video content selected by the user is edited according to the playing time of the audio content, and the generating of the target video content may further be: determining a new playing speed according to the playing time length of the audio content, the playing speed and the playing time length of the video content selected by the user and the like; and adjusting the playing speed of the video content selected by the user to a new playing speed to generate the target video content with the playing time length equal to the playing time length of the audio content. Further, here, the playing speed of the video content selected by the user may be slowed down.
Further, if it is determined that the playing time of the video content selected by the user is shorter than the playing time of the audio content, the video content selected by the user is edited according to the playing time of the audio content, and the generating of the target video content may further be: outputting prompt information to a user in a voice or text mode to prompt the user to select a new video content related to the video content, wherein the playing length of the new video content is equal to the difference between the playing time of the audio content and the playing time of the video content; and acquiring new video content selected from the same preset picture library or other preset picture libraries, and further splicing the new video content and the video content to generate target video content.
It should be noted that, in the embodiment, the video content selected by the user can be automatically edited according to the playing duration of the audio content, so as to generate the target video content, reduce manual operations of the user, and improve the user experience. In addition, the target video content can be generated in a corresponding mode according to the relation between the playing time length of the video content selected by the user and the playing time length of the audio content, and the flexibility of the scheme is increased.
Optionally, before editing the video content selected by the user according to the playing duration of the audio content to generate the target video content, whether the audio exists in the video content selected by the user may be determined, and if not, the video content selected by the user may be edited directly according to the playing duration of the audio content to generate the target video content; if the audio exists, the audio deleting operation can be firstly carried out on the video content selected by the user, and then the video content selected by the user is edited according to the playing duration of the audio content to generate the target video content.
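A possible sketch of this audio-deletion pre-step, again assuming moviepy rather than any tool mandated by the disclosure:

```python
# Illustrative only: moviepy is assumed; the path is hypothetical.
from moviepy.editor import VideoFileClip

def prepare_selected_clip(video_path):
    """Drop any existing sound track from the user-selected clip so that it
    does not clash with the recorded song before the clip is edited."""
    clip = VideoFileClip(video_path)
    if clip.audio is not None:
        clip = clip.without_audio()
    return clip
```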
And S130, generating audio and video according to the audio content and the target video content.
Specifically, after the target video content is generated, the audio content may be added to the generated target video content to generate the audio and video; or the video content may first be processed in terms of resolution, stickers, filters and the like, the audio content may be processed in terms of volume adjustment, playing style and the like, and the audio and video is then generated from the processed audio content and video content. In addition, other implementable manners may be adopted to generate the audio and video from the audio content and the target video content, which is not limited in this embodiment.
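One possible way to carry out S130, sketched with moviepy as an assumed tool (file names and output path are placeholders); the target video and the recorded audio are expected to have equal durations at this point.

```python
# Illustrative only: moviepy assumed.
from moviepy.editor import VideoFileClip, AudioFileClip

def mux_audio_and_video(target_video_path, recorded_audio_path, out_path):
    video = VideoFileClip(target_video_path).without_audio()
    audio = AudioFileClip(recorded_audio_path)
    video.set_audio(audio).write_videofile(out_path)
```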
According to the technical solution provided by the embodiment of the present disclosure, when it is determined that the user has chosen to add existing video content to the audio content of the recorded target song, the video content selected by the user from the preset picture library is edited according to the playing duration of the audio content to generate target video content whose playing duration is equal to the playing duration of the audio content, and an audio and video can then be generated according to the audio content and the target video content. Compared with the prior art, this scheme can automatically edit the video content selected by the user from the preset picture library according to the playing duration of the audio content, so that no manual operation by the user is needed, the user experience is improved, and the generation efficiency of the audio and video is improved at the same time.
Further, on the basis of the above embodiment, the target video content may be generated through interaction with the user, for example via a visual editing interface or question-and-answer prompts. For example, if it is determined that the playing duration of the video content selected by the user is longer than the playing duration of the audio content, editing the video content selected by the user according to the playing duration of the audio content to generate the target video content may also be: displaying a video content editing interface to the user; and determining the start time and the end time of the target video content according to the user's operations on the video content editing interface, where the difference between the end time and the start time is equal to the playing duration of the audio content. Optionally, if it is detected, while the user is adjusting the start cursor and the end cursor of the video content in the video content editing interface, that the segment duration between the start cursor and the end cursor is greater than or less than the playing duration of the audio content, the user may be reminded by voice or text that the playing duration of the target video content segment needs to be equal to the playing duration of the audio content, and so on.
It should be noted that the number of the start cursors and the end cursors of the video content in the video content editing interface may be multiple, and then the user may intercept multiple segments from the video content according to the requirement, so as to increase the satisfaction of the user.
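The duration check behind this cursor-based editing interface can be illustrated as follows; this is only a sketch, and the segment tuples, tolerance value and message strings are placeholders (a real application would surface the prompt by voice or text).

```python
def validate_cursor_selection(segments, audio_duration, tolerance=0.04):
    """Check that the segments marked by the user's start/end cursors add up
    to the playing duration of the recorded audio."""
    total = sum(end - start for start, end in segments)
    if abs(total - audio_duration) > tolerance:
        return False, f"selected {total:.1f} s, need {audio_duration:.1f} s"
    return True, "ok"

# Two 10 s segments against 20 s of audio pass the check.
print(validate_cursor_selection([(5.0, 15.0), (22.0, 32.0)], 20.0))
```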
Fig. 2 shows a flowchart of another audio/video generation method provided by the embodiment of the present disclosure, and this embodiment is optimized on the basis of various alternatives provided by the above embodiment, specifically, this embodiment details how to edit video content selected by a user according to a playing time of audio content in each step provided by the above embodiment, so as to generate target video content.
Optionally, as shown in fig. 2, the audio/video generating method in this embodiment may include the following steps:
s210, if the fact that the user selects to add the recorded video content to the audio content of the recorded target song is determined, the video content selected by the user from a preset picture library is obtained.
S220, extracting at least two sub-segments from the video content selected by the user according to the playing time length and the lyric characteristics of the audio content and the image characteristics of the video content selected by the user.
In this embodiment, the lyric characteristics may include, but are not limited to, a start position, an end position, a mood, climax fragment characteristics, and the like of the lyric. The image features may include character features (e.g., expressive features, mouth features), scene features, and the like.
Optionally, the playing time length may be divided into a plurality of sub-playing time lengths according to the lyric characteristics of the audio content, and then at least two sub-segments may be extracted from the video content selected by the user according to each sub-playing time length, the corresponding lyric characteristics, and the image characteristics of the video content selected by the user.
For example, one or more sub-segments may be extracted from the video content selected by the user according to the sub-playing duration of the climax segment of the audio content, the climax segment characteristics (e.g., the mood of the lyrics) and the image characteristics of the video content selected by the user; a sub-segment may be extracted from the video content selected by the user according to the sub-playing duration of the opening accompaniment of the audio content, the start position of the lyrics of the audio content (namely the time at which the first line of lyrics is sung) and the image characteristics of the video content selected by the user (this sub-segment may further be a segment matching the opening accompaniment of the audio content); meanwhile, a sub-segment may be extracted from the video content selected by the user according to the end position of the lyrics of the audio content (namely the time at which the last line of lyrics is sung), the sub-playing duration of the closing accompaniment of the audio content and the image characteristics of the video content selected by the user (this sub-segment may be a segment matching the closing accompaniment of the audio content).
In addition, two sub-segments can be extracted from the video content selected by the user according to the lyric characteristics and the sub-playing time length from the starting position of the lyric of the audio content to the time of the first lyric of the climax segment, the lyric characteristics and the sub-playing time length from the time of the last lyric of the climax segment to the end position of the lyric of the audio content, and the image characteristics of the video content selected by the user.
And S230, splicing the at least two sub-segments to generate target video content.
Furthermore, the extracted multiple segments can be spliced according to the playing sequence of the audio content, so as to generate the target video content with the playing time length equal to the playing time length of the audio content.
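A simplified sketch of the extract-and-splice flow of S220-S230, assuming moviepy and using placeholder lyric sections with a naive sub-segment selection; the disclosed method would additionally weigh lyric and image characteristics when choosing each sub-segment.

```python
# Illustrative only: moviepy assumed; section labels/durations are placeholders
# and the source clip is assumed to be long enough for consecutive windows.
from moviepy.editor import VideoFileClip, concatenate_videoclips

# Hypothetical sub-playing durations derived from the lyric characteristics.
sections = [("opening_accompaniment", 4), ("verse", 8), ("climax", 6), ("closing", 2)]

def build_target_from_sections(video_path, sections):
    clip = VideoFileClip(video_path).without_audio()
    subclips, cursor = [], 0.0
    for _label, sub_duration in sections:
        # Naive choice: consecutive windows; a real matcher would pick windows
        # whose image characteristics fit each lyric section.
        subclips.append(clip.subclip(cursor, cursor + sub_duration))
        cursor += sub_duration
    # Splice in the playing order of the audio content.
    return concatenate_videoclips(subclips)
```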
And S240, generating audio and video according to the audio content and the target video content.
According to the technical scheme provided by the embodiment of the disclosure, under the condition that the playing time of the video content selected by the user is determined to be longer than the playing time of the audio content, the video content selected by the user from the preset picture library is edited according to the playing time and the lyric characteristics of the audio content and the image characteristics of the video content selected by the user, and the matching degree of the generated target video content and the audio content is higher while the playing time of the generated target video content is ensured to be equal to the playing time of the audio content.
Fig. 3 shows a flowchart of another audio/video generation method provided by the embodiment of the present disclosure, where this embodiment is optimized based on various alternatives provided by the above embodiment, and specifically, this embodiment details how to edit video content selected by a user according to a playing time of audio content in each step provided by the above embodiment, so as to generate a target video content.
Optionally, as shown in fig. 3, the audio/video generation method in this embodiment may include the following steps:
s310, if the fact that the user selects to add the recorded video content to the audio content of the recorded target song is determined, the video content selected by the user from a preset picture library is obtained.
S320, if the fact that the playing time length of the video content selected by the user is larger than the playing time length of the audio content is determined, selecting a segment with first preset time length from the video content selected by the user to serve as a first segment.
In this embodiment, the first preset duration is the preset total duration of the first segment. Optionally, the first preset duration is less than the playing duration of the audio content, and may further be a set proportion of the playing duration of the audio content, such as 2/3 of that duration.
Specifically, if it is determined that the playing duration of the video content selected by the user is longer than the playing duration of the audio content, a segment of the first preset duration may be randomly selected from the video content selected by the user as the first segment. For example, by default the part of the video content selected by the user from the playing start position to the position of the first preset duration may be used as the first segment; alternatively, a segment of the first preset duration may be selected from the video content as the first segment according to the lyric characteristics of the audio content, the image characteristics of the video content selected by the user, and the like.
S330, acquiring a segment with a second preset time length as a second segment.
In this embodiment, the second preset duration may be equal to a difference between a playing duration of the audio content and a first preset duration, where the first preset duration is greater than the second preset duration. The second segment may be an introduction segment for introducing the target song. Further, the second segment may be used to introduce the user singing the target song this time, or to introduce a star actually singing the target song, or the like. Further, the second segment may further be a summary segment for introducing the content of the first segment, etc.
Optionally, a segment of the second preset duration may be acquired as the second segment in any one of the following manners: 1) a segment associated with the target song is acquired from a preset gallery (such as a local gallery or a cloud gallery), for example a relevant segment of the star who originally sang the target song, and a segment of the second preset duration is then cut from the acquired segment as the second segment; 2) a segment uploaded by the user in real time is acquired, and a segment of the second preset duration is then cut from it as the second segment; 3) pictures shot by the user in real time are acquired, and a segment of the second preset duration is generated as the second segment based on the pictures shot by the user in real time and their uploading order, and so on.
S340, splicing the first segment and the second segment to generate target video content.
Optionally, the second segment may be placed before the first segment for splicing, and the spliced portion may be smoothed, thereby obtaining the target video content. In addition, if the second segment is a summary segment, the second segment may be spliced after the first segment to generate the target video content.
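The first-segment/second-segment splicing of S320-S340 can be sketched as below, with moviepy assumed and 2/3 of the audio duration used as the first preset duration purely for illustration.

```python
# Illustrative only: moviepy assumed; paths and the 2/3 split are placeholders.
from moviepy.editor import VideoFileClip, concatenate_videoclips

def splice_with_intro(user_video_path, intro_video_path, audio_duration):
    first_len = audio_duration * 2 / 3        # first preset duration
    second_len = audio_duration - first_len   # second preset duration
    first = VideoFileClip(user_video_path).without_audio().subclip(0, first_len)
    second = VideoFileClip(intro_video_path).without_audio().subclip(0, second_len)
    # Introduction segment placed before the user's segment; a summary segment
    # would instead be appended after it.
    return concatenate_videoclips([second, first])
```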
It should be noted that, according to the present embodiment, the target video content can be generated based on the video content selected by the user and the acquired other video content, so that the flexibility of generating the target video content is increased.
And S350, generating audio and video according to the audio content and the target video content.
According to the technical scheme provided by the embodiment of the disclosure, under the condition that the playing time of the video content selected by the user is determined to be longer than the playing time of the audio content, the target video content with the playing time equal to the playing time of the audio content can be generated by splicing the acquired second segment and the first segment extracted from the video content selected by the user, and then the audio and the video can be generated according to the audio content and the target video content. Compared with the prior art, according to the scheme, the video content selected by the user from the preset picture library can be automatically edited according to the playing time of the audio content, the manual operation of the user is not needed, the user experience is improved, and meanwhile the generation efficiency of the audio and video is improved. In addition, the scheme provides a mode for generating the target video content based on the video content selected by the user and the obtained other video content, and the flexibility of the mode for generating the target video content is improved.
Fig. 4 is a schematic structural diagram of an audio/video generation apparatus provided in an embodiment of the present disclosure, which is applicable to a situation where an audio and video is generated based on the audio content of a recorded target song and existing video content in a preset picture library. The apparatus may be implemented by software and/or hardware, and may be configured on an electronic device. Optionally, the electronic device may be a server device carrying an audio and video generation function, or a terminal device configured with a karaoke application program provided by the server. As shown in fig. 4, the audio/video generating device in the embodiment of the present disclosure includes:
a video content obtaining module 410, configured to, if it is determined that the user has chosen to add existing video content to the audio content of the recorded target song, obtain the video content selected by the user from the preset picture library;
the target video content generating module 420 is configured to edit the video content selected by the user according to the playing duration of the audio content, so as to generate a target video content; wherein the playing duration of the target video content is equal to the playing duration of the audio content;
and an audio/video generation module 430, configured to generate an audio/video according to the audio content and the target video content.
For example, the preset picture library may include at least one of: the system comprises a picture library configured by user terminal equipment, a picture library in an application program, a cloud picture library of a user and a picture library of a server.
Illustratively, the target video content generation module 420 may include:
and the target video content generating unit is used for cutting the video content selected by the user and/or adjusting the playing speed according to the playing time length of the audio content to generate the target video content if the playing time length of the video content selected by the user is determined to be greater than the playing time length of the audio content.
Illustratively, the target video content generating unit may be specifically configured to:
extracting at least two sub-segments from the video content selected by the user according to the playing time length and the lyric characteristics of the audio content and the image characteristics of the video content selected by the user;
and splicing the at least two sub-segments to generate target video content.
Illustratively, the target video content generation module 420 is further configured to:
if the playing time length of the video content selected by the user is determined to be longer than the playing time length of the audio content, selecting a segment with first preset time length from the video content selected by the user as a first segment;
acquiring a segment with a second preset time length as a second segment; the playing time length of the audio content is longer than a first preset time length, and the first preset time length is longer than a second preset time length;
and splicing the first segment and the second segment to generate target video content.
Illustratively, the second segment is an introduction segment for introducing the target song.
Illustratively, the apparatus may further include:
and the deleting module is used for editing the video content selected by the user according to the playing duration of the audio content and carrying out audio deleting operation on the video content selected by the user before generating the target video content.
Illustratively, the target video content generation module 420 is further configured to:
if the playing time length of the video content selected by the user is determined to be less than the playing time length of the audio content, determining the cycle number according to the playing time length of the audio content and the playing time length of the video content selected by the user;
and generating target video content according to the video content selected by the user and the cycle number.
The audio and video generation apparatus provided by the embodiment of the present disclosure and the audio and video generation method provided by the above embodiments belong to the same inventive concept; for technical details not described in detail in the embodiment of the present disclosure, reference may be made to the above embodiments, and the embodiment of the present disclosure has the same beneficial effects as the above embodiments.
Referring to fig. 5, a schematic structural diagram of an electronic device 500 suitable for implementing embodiments of the present disclosure is shown. The electronic devices in the embodiments of the present disclosure may include, but are not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players), in-vehicle terminals (e.g., car navigation terminals), and the like, and fixed terminals such as digital TVs, desktop computers, and the like. Optionally, the electronic device in this embodiment may be a server device that carries an audio and video generation function, and may also be a terminal device that configures a karaoke application program provided by the server. The electronic device shown in fig. 5 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present disclosure.
As shown in fig. 5, electronic device 500 may include a processing means (e.g., central processing unit, graphics processor, etc.) 501 that may perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM)502 or a program loaded from a storage means 508 into a Random Access Memory (RAM) 503. In the RAM 503, various programs and data necessary for the operation of the electronic apparatus 500 are also stored. The processing device 501, the ROM 502, and the RAM 503 are connected to each other through a bus 504. An input/output (I/O) interface 505 is also connected to bus 504.
Generally, the following devices may be connected to the I/O interface 505: input devices 506 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; output devices 507 including, for example, a Liquid Crystal Display (LCD), speakers, vibrators, and the like; storage devices 508 including, for example, magnetic tape, hard disk, etc.; and a communication device 509. The communication means 509 may allow the electronic device 500 to communicate with other devices wirelessly or by wire to exchange data. While fig. 5 illustrates an electronic device 500 having various means, it is to be understood that not all illustrated means are required to be implemented or provided. More or fewer devices may alternatively be implemented or provided.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication means 509, or installed from the storage means 508, or installed from the ROM 502. The computer program performs the above-described functions defined in the methods of the embodiments of the present disclosure when executed by the processing device 501.
It should be noted that the computer readable medium in the present disclosure can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In contrast, in the present disclosure, a computer readable signal medium may comprise a propagated data signal with computer readable program code embodied therein, either in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.
The computer readable medium may be embodied in the electronic device; or may exist separately without being assembled into the electronic device.
The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: if it is determined that the user has chosen to add existing video content to the audio content of the recorded target song, acquire the video content selected by the user from a preset picture library; edit the video content selected by the user according to the playing duration of the audio content to generate target video content, where the playing duration of the target video content is equal to the playing duration of the audio content; and generate an audio and video according to the audio content and the target video content.
Computer program code for carrying out operations for aspects of the present disclosure may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present disclosure may be implemented by software or hardware. Where the name of an element does not in some cases constitute a limitation on the element itself.
The functions described herein above may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), systems on a chip (SOCs), Complex Programmable Logic Devices (CPLDs), and the like.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
1. According to one or more embodiments of the present disclosure, there is provided an audio and video generation method including:
if it is determined that the user has chosen to add existing video content to the audio content of a recorded target song, acquiring the video content selected by the user from a preset picture library;
editing the video content selected by the user according to the playing duration of the audio content to generate target video content; wherein the playing duration of the target video content is equal to the playing duration of the audio content;
and generating audio and video according to the audio content and the target video content.
According to one or more embodiments of the present disclosure, the preset picture library in the above method includes at least one of: the system comprises a picture library configured by user terminal equipment, a picture library in an application program, a cloud picture library of a user and a picture library of a server.
According to one or more embodiments of the present disclosure, in the above method, editing the video content selected by the user according to the playing duration of the audio content to generate the target video content includes:
if the playing time of the video content selected by the user is determined to be longer than the playing time of the audio content, the video content selected by the user is cut and/or the playing speed is adjusted according to the playing time of the audio content, and the target video content is generated.
According to one or more embodiments of the present disclosure, in the method, according to the playing time length of the audio content, the video content selected by the user is cut to generate the target video content, including:
extracting at least two sub-segments from the video content selected by the user according to the playing time length and the lyric characteristics of the audio content and the image characteristics of the video content selected by the user;
and splicing the at least two sub-segments to generate target video content.
According to one or more embodiments of the present disclosure, in the above method, editing the video content selected by the user according to the playing duration of the audio content to generate the target video content includes:
if it is determined that the playing duration of the video content selected by the user is longer than the playing duration of the audio content, selecting a segment with a first preset duration from the video content selected by the user as a first segment;
acquiring a segment with a second preset duration as a second segment; wherein the playing duration of the audio content is longer than the first preset duration, and the first preset duration is longer than the second preset duration;
and splicing the first segment and the second segment to generate target video content.
According to one or more embodiments of the present disclosure, the second segment in the above method is an introduction segment for introducing the target song.
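A minimal sketch of this splicing variant is given below. It assumes, purely for illustration, that the introduction segment is placed before the first segment and that the two preset durations add up to the playing duration of the audio content; neither assumption is required by the embodiment.

```python
def splice_with_intro(audio_duration: float,
                      first_preset: float,
                      second_preset: float) -> list[tuple[str, float]]:
    """Returns a play list for the target video content: an introduction segment
    of the second preset duration followed by a first segment cut from the
    user-selected video.  Illustrative only."""
    assert audio_duration > first_preset > second_preset > 0
    assert abs(first_preset + second_preset - audio_duration) < 1e-6  # assumed split
    return [("introduction_segment", second_preset), ("first_segment", first_preset)]

# A 60 s recording split into a 15 s introduction and a 45 s first segment.
print(splice_with_intro(audio_duration=60.0, first_preset=45.0, second_preset=15.0))
```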
According to one or more embodiments of the present disclosure, before the video content selected by the user is edited according to the playing duration of the audio content to generate the target video content, the method further includes:
performing an audio deletion operation on the video content selected by the user.
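A trivially simple sketch of the audio deletion operation, using a hypothetical dictionary representation of the selected clip (the field names are assumptions), would strip the clip's own sound track so that only the recorded song audio is heard in the generated audio-video:

```python
def delete_audio(video_clip: dict) -> dict:
    """Return a copy of the (hypothetical) clip record with its own audio
    track removed, so it does not mix with the recorded audio content."""
    muted = dict(video_clip)
    muted["audio_track"] = None
    return muted

clip = {"path": "selected.mp4", "duration": 42.0, "audio_track": "original.aac"}
print(delete_audio(clip))
```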
According to one or more embodiments of the present disclosure, in the above method, editing the video content selected by the user according to the playing duration of the audio content to generate the target video content includes:
if it is determined that the playing duration of the video content selected by the user is shorter than the playing duration of the audio content, determining a number of loop cycles according to the playing duration of the audio content and the playing duration of the video content selected by the user;
and generating the target video content according to the video content selected by the user and the number of loop cycles.
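As a sketch of this looping branch (the rounding behaviour and the trimming of the final cycle are assumptions, since the embodiment only states that a number of cycles is determined from the two playing durations):

```python
import math

def loop_plan(audio_duration: float, video_duration: float) -> tuple[int, float]:
    """Number of loop cycles of the selected video needed to cover the audio
    content, and the length kept from the final cycle so the total playing
    duration of the target video content matches the audio exactly."""
    cycles = math.ceil(audio_duration / video_duration)
    last_cycle_keep = audio_duration - (cycles - 1) * video_duration
    return cycles, last_cycle_keep

# A 25 s clip under a 95 s recording loops 4 times; 20 s of the last pass is kept.
print(loop_plan(audio_duration=95.0, video_duration=25.0))
```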
2. According to one or more embodiments of the present disclosure, there is provided an audio/video generating apparatus including:
the video content acquisition module is used for acquiring the video content selected by the user from a preset picture library if it is determined that the user chooses to add recorded video content to the audio content of a recorded target song;
the target video content generating module is used for editing the video content selected by the user according to the playing duration of the audio content to generate target video content; wherein the playing duration of the target video content is equal to the playing duration of the audio content;
and the audio and video generation module is used for generating audio and video according to the audio content and the target video content.
According to one or more embodiments of the present disclosure, the preset picture library in the above apparatus includes at least one of: a picture library configured on the user terminal device, a picture library within an application program, a cloud picture library of the user, and a picture library of a server.
According to one or more embodiments of the present disclosure, the target video content generating module in the above apparatus includes:
and the target video content generating unit is used for cropping the video content selected by the user and/or adjusting its playing speed according to the playing duration of the audio content to generate the target video content if it is determined that the playing duration of the video content selected by the user is longer than the playing duration of the audio content.
According to one or more embodiments of the present disclosure, the target video content generating unit in the above apparatus is specifically configured to:
extracting at least two sub-segments from the video content selected by the user according to the playing duration and the lyric characteristics of the audio content and the image characteristics of the video content selected by the user;
and splicing the at least two sub-segments to generate target video content.
According to one or more embodiments of the present disclosure, the target video content generating module in the above apparatus is further configured to:
if it is determined that the playing duration of the video content selected by the user is longer than the playing duration of the audio content, selecting a segment with a first preset duration from the video content selected by the user as a first segment;
acquiring a segment with a second preset duration as a second segment; wherein the playing duration of the audio content is longer than the first preset duration, and the first preset duration is longer than the second preset duration;
and splicing the first segment and the second segment to generate target video content.
According to one or more embodiments of the present disclosure, the second segment in the above apparatus is an introduction segment for introducing the target song.
According to one or more embodiments of the present disclosure, the above apparatus further includes:
and the deletion module is used for performing an audio deletion operation on the video content selected by the user before the video content selected by the user is edited according to the playing duration of the audio content to generate the target video content.
According to one or more embodiments of the present disclosure, the target video content generating module in the above apparatus is further configured to:
if it is determined that the playing duration of the video content selected by the user is shorter than the playing duration of the audio content, determining a number of loop cycles according to the playing duration of the audio content and the playing duration of the video content selected by the user;
and generating the target video content according to the video content selected by the user and the number of loop cycles.
3. According to one or more embodiments of the present disclosure, there is provided an electronic device including:
one or more processors;
a memory for storing one or more programs;
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement any of the audio-video generation methods provided by the present disclosure.
4. According to one or more embodiments of the present disclosure, there is provided a readable medium having stored thereon a computer program which, when executed by a processor, implements the audio-video generation method according to any one of the aspects provided in the present disclosure.
The foregoing description is merely exemplary of the preferred embodiments of the present disclosure and illustrative of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the disclosure is not limited to the particular combinations of the features described above, but also encompasses other technical solutions formed by any combination of the above features or their equivalents without departing from the spirit of the disclosure, for example, technical solutions formed by replacing the above features with (but not limited to) features having similar functions disclosed in this disclosure.
Further, while operations are depicted in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order. Under certain circumstances, multitasking and parallel processing may be advantageous. Likewise, while several specific implementation details are included in the above discussion, these should not be construed as limitations on the scope of the disclosure. Certain features that are described in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination.
Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as example forms of implementing the claims.

Claims (9)

1. An audio-video generation method, characterized by comprising:
if it is determined that the user chooses to add recorded video content to the audio content of a recorded target song, acquiring the video content selected by the user from a preset picture library;
editing the video content selected by the user according to the playing duration of the audio content to generate target video content; wherein the playing duration of the target video content is equal to the playing duration of the audio content;
generating an audio and video according to the audio content and the target video content;
the method for editing the video content selected by the user according to the playing duration of the audio content to generate the target video content comprises the following steps:
if it is determined that the playing duration of the video content selected by the user is longer than the playing duration of the audio content, selecting, according to the lyric characteristics of the audio content and the image characteristics of the video content selected by the user, a segment with a first preset duration from the video content selected by the user as a first segment;
acquiring, from video content other than the video content selected by the user, a segment with a second preset duration as a second segment; wherein the playing duration of the audio content is longer than the first preset duration, the first preset duration is longer than the second preset duration, and the second segment is an introduction segment for introducing the target song;
and splicing the first segment and the second segment to generate target video content.
2. The method of claim 1, wherein the preset picture library comprises at least one of: a picture library configured on the user terminal device, a picture library within an application program, a cloud picture library of the user, and a picture library of a server.
3. The method of claim 1, wherein editing the video content selected by the user according to the playing duration of the audio content to generate the target video content comprises:
if it is determined that the playing duration of the video content selected by the user is longer than the playing duration of the audio content, cropping the video content selected by the user and/or adjusting its playing speed according to the playing duration of the audio content to generate the target video content.
4. The method of claim 3, wherein cropping the video content selected by the user according to the playing duration of the audio content to generate the target video content comprises:
extracting at least two sub-segments from the video content selected by the user according to the playing duration and the lyric characteristics of the audio content and the image characteristics of the video content selected by the user;
and splicing the at least two sub-segments to generate target video content.
5. The method according to claim 1, wherein before editing the video content selected by the user according to the playing duration of the audio content to generate the target video content, the method further comprises:
performing an audio deletion operation on the video content selected by the user.
6. The method of claim 1, wherein editing the video content selected by the user according to the playing duration of the audio content to generate the target video content comprises:
if it is determined that the playing duration of the video content selected by the user is shorter than the playing duration of the audio content, determining a number of loop cycles according to the playing duration of the audio content and the playing duration of the video content selected by the user;
and generating the target video content according to the video content selected by the user and the number of loop cycles.
7. An audio-video generation device characterized by comprising:
the video content acquisition module is used for acquiring the video content selected by the user from a preset picture library if it is determined that the user chooses to add recorded video content to the audio content of a recorded target song;
the target video content generating module is used for editing the video content selected by the user according to the playing duration of the audio content to generate target video content; wherein the playing duration of the target video content is equal to the playing duration of the audio content;
the audio and video generation module is used for generating audio and video according to the audio content and the target video content;
wherein the target video content generation module is further configured to:
if it is determined that the playing duration of the video content selected by the user is longer than the playing duration of the audio content, selecting, according to the lyric characteristics of the audio content and the image characteristics of the video content selected by the user, a segment with a first preset duration from the video content selected by the user as a first segment;
acquiring, from video content other than the video content selected by the user, a segment with a second preset duration as a second segment; wherein the playing duration of the audio content is longer than the first preset duration, the first preset duration is longer than the second preset duration, and the second segment is an introduction segment for introducing the target song;
and splicing the first segment and the second segment to generate target video content.
8. An electronic device, characterized in that the electronic device comprises:
one or more processors;
a memory for storing one or more programs;
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the audio-video generation method as claimed in any one of claims 1-6.
9. A readable medium, on which a computer program is stored, characterized in that the program, when being executed by a processor, implements the audio-video generation method according to any one of claims 1 to 6.
CN201910718655.5A 2019-08-05 2019-08-05 Audio and video generation method and device, electronic equipment and readable medium Active CN110324718B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910718655.5A CN110324718B (en) 2019-08-05 2019-08-05 Audio and video generation method and device, electronic equipment and readable medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910718655.5A CN110324718B (en) 2019-08-05 2019-08-05 Audio and video generation method and device, electronic equipment and readable medium

Publications (2)

Publication Number Publication Date
CN110324718A CN110324718A (en) 2019-10-11
CN110324718B true CN110324718B (en) 2021-09-07

Family

ID=68125465

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910718655.5A Active CN110324718B (en) 2019-08-05 2019-08-05 Audio and video generation method and device, electronic equipment and readable medium

Country Status (1)

Country Link
CN (1) CN110324718B (en)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112822563A (en) * 2019-11-15 2021-05-18 北京字节跳动网络技术有限公司 Method, device, electronic equipment and computer readable medium for generating video
CN111274415A (en) * 2020-01-14 2020-06-12 广州酷狗计算机科技有限公司 Method, apparatus and computer storage medium for determining alternate video material
CN111383669B (en) * 2020-03-19 2022-02-18 杭州网易云音乐科技有限公司 Multimedia file uploading method, device, equipment and computer readable storage medium
CN111432141B (en) * 2020-03-31 2022-06-17 北京字节跳动网络技术有限公司 Method, device and equipment for determining mixed-cut video and storage medium
CN111741233B (en) * 2020-07-16 2021-06-15 腾讯科技(深圳)有限公司 Video dubbing method and device, storage medium and electronic equipment
CN111818367A (en) * 2020-08-07 2020-10-23 广州酷狗计算机科技有限公司 Audio file playing method, device, terminal, server and storage medium
CN113038014A (en) * 2021-03-17 2021-06-25 北京字跳网络技术有限公司 Video processing method of application program and electronic equipment
CN113132797A (en) * 2021-04-22 2021-07-16 北京房江湖科技有限公司 Video generation method and device, computer-readable storage medium and electronic equipment
CN113572978A (en) 2021-07-30 2021-10-29 北京房江湖科技有限公司 Panoramic video generation method and device

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101640057A (en) * 2009-05-31 2010-02-03 北京中星微电子有限公司 Audio and video matching method and device therefor
CN104883516A (en) * 2015-06-05 2015-09-02 福建星网视易信息系统有限公司 Method and system for producing real-time singing video
CN104967900A (en) * 2015-05-04 2015-10-07 腾讯科技(深圳)有限公司 Video generating method and video generating device
CN105959773A (en) * 2016-04-29 2016-09-21 魔方天空科技(北京)有限公司 Multimedia file processing method and device
CN107707828A (en) * 2017-09-26 2018-02-16 维沃移动通信有限公司 A kind of method for processing video frequency and mobile terminal
CN109068163A (en) * 2018-08-28 2018-12-21 哈尔滨市舍科技有限公司 A kind of audio-video synthesis system and its synthetic method
CN109640125A (en) * 2018-12-21 2019-04-16 广州酷狗计算机科技有限公司 Video content processing method, device, server and storage medium

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1422668B1 (en) * 2002-11-25 2017-07-26 Panasonic Intellectual Property Management Co., Ltd. Short film generation/reproduction apparatus and method thereof
JP5554677B2 (en) * 2010-10-07 2014-07-23 Kddi株式会社 VIDEO CONTENT GENERATION SYSTEM, VIDEO CONTENT GENERATION DEVICE, AND COMPUTER PROGRAM
US20130249948A1 (en) * 2011-08-26 2013-09-26 Reincloud Corporation Providing interactive travel content at a display device
CN103793446B (en) * 2012-10-29 2019-03-01 汤晓鸥 The generation method and system of music video
RU2627096C2 (en) * 2012-10-30 2017-08-03 Сергей Анатольевич Гевлич Methods for multimedia presentations prototypes manufacture, devices for multimedia presentations prototypes manufacture, methods for application of devices for multimedia presentations prototypes manufacture (versions)
CN103428555B (en) * 2013-08-06 2018-08-10 乐视网信息技术(北京)股份有限公司 A kind of synthetic method of multimedia file, system and application process
US20150340067A1 (en) * 2014-05-22 2015-11-26 Idomoo Ltd. System and Method to Generate a Video on the Fly
CN106804005B (en) * 2017-03-27 2019-05-17 维沃移动通信有限公司 A kind of production method and mobile terminal of video
CN109151595B (en) * 2018-09-30 2019-10-18 北京微播视界科技有限公司 Method for processing video frequency, device, terminal and medium

Also Published As

Publication number Publication date
CN110324718A (en) 2019-10-11

Similar Documents

Publication Publication Date Title
CN110324718B (en) Audio and video generation method and device, electronic equipment and readable medium
US20210195284A1 (en) Method and apparatus for selecting background music for video shooting, terminal device and medium
CN112911379B (en) Video generation method, device, electronic equipment and storage medium
KR20220103110A (en) Video generating apparatus and method, electronic device, and computer readable medium
CN113365134B (en) Audio sharing method, device, equipment and medium
WO2020259130A1 (en) Selected clip processing method and device, electronic equipment and readable medium
CN113261058B (en) Automatic video editing using beat match detection
US20240121468A1 (en) Display method, apparatus, device and storage medium
CN110418183B (en) Audio and video synchronization method and device, electronic equipment and readable medium
CN111970571B (en) Video production method, device, equipment and storage medium
CN110265067B (en) Method and device for recording hot segments, electronic equipment and readable medium
CN110781349A (en) Method, equipment, client device and electronic equipment for generating short video
CN113507637A (en) Media file processing method, device, equipment, readable storage medium and product
CN114584716A (en) Picture processing method, device, equipment and storage medium
US20240119970A1 (en) Method and apparatus for multimedia resource clipping scenario, device and storage medium
CN110381356B (en) Audio and video generation method and device, electronic equipment and readable medium
CN104681048A (en) Multimedia read control device, curve acquiring device, electronic equipment and curve providing device and method
WO2022257777A1 (en) Multimedia processing method and apparatus, and device and medium
WO2022218109A1 (en) Interaction method and apparatus, electronic device, and computer readable storage medium
CN114528433B (en) Template selection method and device, electronic equipment and storage medium
CN109889737B (en) Method and apparatus for generating video
CN115442639B (en) Method, device, equipment and medium for generating special effect configuration file
CN109815408B (en) Method and device for pushing information
CN114697760B (en) Processing method, processing device, electronic equipment and medium
WO2023109813A1 (en) Video generation method and apparatus, and electronic device and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant