CN116366762A - Method, device, equipment and storage medium for setting beautifying materials

Method, device, equipment and storage medium for setting beautifying materials

Info

Publication number
CN116366762A
Authority
CN
China
Prior art keywords
video
setting
audio
scene
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310367025.4A
Other languages
Chinese (zh)
Inventor
吴晗
张菲菲
刘洲
张源敏
赖振奇
叶经大
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Shiyinlian Software Technology Co ltd
Guangzhou Kugou Computer Technology Co Ltd
Original Assignee
Guangzhou Shiyinlian Software Technology Co ltd
Guangzhou Kugou Computer Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Shiyinlian Software Technology Co ltd and Guangzhou Kugou Computer Technology Co Ltd
Priority to CN202310367025.4A
Publication of CN116366762A
Legal status: Pending

Classifications

    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04M - TELEPHONIC COMMUNICATION
    • H04M1/00 - Substation equipment, e.g. for use by subscribers
    • H04M1/72 - Mobile telephones; Cordless telephones, i.e. devices for establishing wireless links to base stations without route selection
    • H04M1/724 - User interfaces specially adapted for cordless or mobile telephones
    • H04M1/72403 - User interfaces specially adapted for cordless or mobile telephones with means for local support of applications that increase the functionality
    • H04M1/7243 - User interfaces specially adapted for cordless or mobile telephones with means for local support of applications that increase the functionality with interactive means for internal management of messages
    • H04M1/72439 - User interfaces specially adapted for cordless or mobile telephones with means for local support of applications that increase the functionality with interactive means for internal management of messages for image or video messaging
    • H04M1/72442 - User interfaces specially adapted for cordless or mobile telephones with means for local support of applications that increase the functionality for playing music files

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Human Computer Interaction (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Business, Economics & Management (AREA)
  • General Business, Economics & Management (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

The application discloses a method, device, equipment and storage medium for setting beautification materials, relating to the technical fields of computers and the Internet. The method comprises the following steps: in a first interface of audio playing, displaying a first video and a first control for setting the first video as a beautification material, wherein the first video is related to the audio; in response to an operation for the first control, displaying at least two setting options, wherein different setting options are used to set the first video as the beautification material displayed in different scenes; and in response to an operation for a target setting option among the at least two setting options, setting the first video as the beautification material displayed in the target scene. This substantially reduces the operational complexity of setting beautification materials.

Description

Method, device, equipment and storage medium for setting beautifying materials
Technical Field
Embodiments of the application relate to the technical fields of computers and the Internet, and in particular to a method, a device, equipment and a storage medium for setting beautification materials.
Background
Currently, a user can set beautification materials, such as an incoming call ringtone, on a mobile phone.
In the related art, the user may install an application program dedicated to setting beautification materials on the phone, manually import the audio to be used as the incoming call ringtone into that application, and then perform the necessary setting operations on the imported audio so that it becomes the ringtone.
However, this manner of setting beautification materials involves high operational complexity.
Disclosure of Invention
The embodiments of the application provide a method, a device, equipment and a storage medium for setting beautification materials. The technical solution is as follows:
according to an aspect of the embodiments of the present application, there is provided a method for setting beautification materials, including:
displaying a first video and a first control for setting the first video as a beautifying material in a first interface of audio playing, wherein the first video is related to the audio;
in response to an operation for the first control, displaying at least two setting options, wherein different setting options are used for setting the first video as beautification materials displayed in different scenes;
and setting the first video as the beautifying material displayed in the target scene in response to the operation of the target setting option in the at least two setting options.
According to an aspect of the embodiments of the present application, there is provided a setting device for beautifying materials, the device including:
the interface display module is used for displaying a first video and a first control for setting the first video as a beautifying material in a first interface of audio playing, wherein the first video is related to the audio;
the option display module is used for responding to the operation of the first control and displaying at least two setting options, and different setting options are used for setting the first video as the beautifying materials displayed in different scenes;
and the material setting module is used for responding to the operation of the target setting option in the at least two setting options and setting the first video as the beautifying material displayed in the target scene.
According to an aspect of the embodiments of the present application, there is provided a terminal device, including a processor and a memory, where a computer program is stored in the memory, and the computer program is loaded and executed by the processor to implement the method for setting beautification materials described above.
According to an aspect of the embodiments of the present application, there is provided a computer-readable storage medium having stored therein a computer program loaded and executed by a processor to implement the above-described method of setting beautification materials.
According to one aspect of embodiments of the present application, there is provided a computer program product comprising a computer program stored in a computer readable storage medium. The processor of the terminal device reads the computer program from the computer-readable storage medium, and the processor executes the computer program so that the terminal device executes the above-described setting method of the beautification material.
The technical scheme provided by the embodiment of the application can comprise the following beneficial effects:
On the one hand, a video related to the audio and displayed during audio playback can be set as a beautification material while the audio is playing. In other words, when a favorite video clip is encountered while listening to a song or watching a short music video, the currently played video can be set as a beautification material with a single tap, without manually importing the audio to be used as a beautification material as in the related art. This reduces the operational complexity of setting beautification materials and improves setting efficiency.
On the other hand, when a video is set as a beautification material, multiple setting options are provided, so the user can choose to set the video as the beautification material displayed in different scenes (such as an incoming call scene, a charging scene, an alert tone scene, a wallpaper scene, a color ring scene, and the like), which greatly improves the flexibility of setting beautification materials.
Drawings
FIG. 1 is a schematic diagram of an implementation environment for an embodiment provided herein;
FIG. 2 is a flowchart of a method for setting beautification materials provided in one embodiment of the present application;
FIG. 3 is a schematic diagram of a first control provided in one embodiment of the present application;
FIG. 4 is a schematic illustration of a first control provided in another embodiment of the present application;
FIG. 5 is a schematic illustration of a first control provided in another embodiment of the present application;
FIG. 6 is a schematic illustration of a first control provided in another embodiment of the present application;
FIG. 7 is a schematic illustration of a first control provided in another embodiment of the present application;
FIG. 8 is a schematic illustration of a first control provided in another embodiment of the present application;
FIG. 9 is a schematic diagram of setting options for a scenario provided in one embodiment of the present application;
FIG. 10 is a schematic diagram of setting options for a scenario provided by another embodiment of the present application;
FIG. 11 is a schematic illustration of an interface provided by one embodiment of the present application;
FIG. 12 is a flowchart of a method for setting beautification materials according to another embodiment of the present application;
FIG. 13 is a schematic diagram of preview effects provided by one embodiment of the present application;
FIG. 14 is a schematic diagram of a preview effect provided by another embodiment of the present application;
FIG. 15 is a schematic view of a preview effect provided by another embodiment of the present application;
FIG. 16 is a schematic illustration of preview effects provided by another embodiment of the present application;
FIG. 17 is a schematic diagram of a preview effect provided by another embodiment of the present application;
FIG. 18 is a flowchart of a method for setting beautification materials according to another embodiment of the present application;
FIG. 19 is a schematic diagram of a video production interface provided in one embodiment of the present application;
FIG. 20 is a schematic diagram of a video production interface provided in accordance with another embodiment of the present application;
FIG. 21 is a schematic diagram of an AI (Artificial Intelligence) model provided in one embodiment of the present application;
FIG. 22 is a schematic diagram of an AI model provided in another embodiment of the present application;
FIG. 23 is a schematic diagram of a functional switch provided in one embodiment of the present application;
FIG. 24 is a block diagram of a device for setting beautification materials provided in one embodiment of the present application;
FIG. 25 is a block diagram of a device for setting beautification materials provided in another embodiment of the present application;
fig. 26 is a block diagram of a terminal device according to an embodiment of the present application.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present application more apparent, the embodiments of the present application will be described in further detail below with reference to the accompanying drawings.
Referring to fig. 1, a schematic diagram of an implementation environment of an embodiment of the present application is shown. The implementation environment of the scheme can comprise: a terminal device 11 and a server 12.
The terminal device 11 may be an electronic device such as a mobile phone, a tablet computer, a PC (Personal Computer), a wearable device, a VR (Virtual Reality) device, an AR (Augmented Reality) device, an in-vehicle device, or the like, which is not limited in this application. A client running a target application may be installed in the terminal device 11. For example, the target application may be an application program for playing audio, a music playing application program, a social application program, an interactive entertainment application program, or the like, which is not limited in this application.
The server 12 may be a server, a server cluster comprising a plurality of servers, or a cloud computing service center. The server 12 may be a background server of the target application program described above for providing background services to clients of the target application program.
The terminal device 11 and the server 12 may communicate with each other via a network.
Referring to FIG. 2, a flowchart of a method for setting beautification materials according to an embodiment of the present application is shown. The execution subject of each step of the method may be the terminal device 11 in the implementation environment shown in FIG. 1; for example, the execution subject of each step may be a client. In the following method embodiments, for convenience of description, the execution subject of each step is simply referred to as the "client". The method may comprise at least one of the following steps (210-230):
In step 210, in the first interface of audio playing, the first video and the first control for setting the first video as a beautification material are displayed, the first video being a video related to the audio.
First interface: the interface displayed by the client while the audio is playing. Optionally, the background music of the first video is the audio. Optionally, while the audio is played, the picture content of the first video, which uses the audio as background music, is played at the same time. Optionally, a progress bar of the audio playback is displayed in the first interface.
First video: a video related to the audio. Optionally, the background music of the first video is the audio. Optionally, the audio is the soundtrack or dubbing of the first video. Optionally, the first video is a video generated based on the audio. In some embodiments, the first interface of the audio playback is provided by the application that plays the audio; this application is not limited here. Optionally, the application playing the audio is an audio playing application, such as a music application program. Optionally, the application playing the audio is a video playing application, such as a short video platform. In addition, applications that play audio include, but are not limited to, gaming applications, live streaming applications, and the like. When the application playing the audio is a music application program and audio playback is triggered, the first interface of the audio playback is displayed, and the first video corresponding to the audio is displayed at the same time. Optionally, the audio is at least one item in a music database of the music application. Optionally, at least one video is associated with the audio. Optionally, such a video is a video that uses the audio as background music. When the audio is played, the first video and the first control for setting the first video as a beautification material are displayed in the first interface of the audio playback. The explanation for other applications parallels that for the music application and is not repeated here.
In some embodiments, the first video is a video in a music clip library of the application playing the audio. In some embodiments, the application playing the audio stores videos uploaded by users. Optionally, the application treats videos uploaded by users as videos in the music clip library. Optionally, the same audio corresponds to at least one video. For example, for song A, 100 users each create a video for the song, resulting in 100 different videos; each of these 100 videos may be regarded as a video corresponding to song A.
In some embodiments, the first video is a video generated by the application that plays the audio. Optionally, the application can generate the video itself from the audio, without any upload by the user. Optionally, the application generates the video according to the rhythm information of the audio. Optionally, the picture change of the video is positively correlated with the change in the rhythm information. For example, where the rhythm of the audio is slow, the picture change rate of the video is low and the changing region occupies a small proportion of the total picture area; conversely, where the rhythm of the audio is intense, the picture change rate is high and the changing region occupies a large proportion of the total picture area. Optionally, the application generates the video from keyword information in the audio. Optionally, the application extracts text information from the audio and determines words whose frequency of occurrence in the text information is greater than a first threshold as the keyword information. Optionally, pictures or video clips corresponding to the keyword information are retrieved from a database according to the keyword information. Optionally, the pictures or video clips corresponding to the keyword information are combined to obtain the video corresponding to the audio, as sketched below. In other embodiments, when the audio is a song, the video may be generated from the lyrics of the song. In some embodiments, the video corresponding to the song is automatically generated from the lyrics by an AI model. For how a video is generated by the AI model, see the description of FIG. 21 in the embodiments below. The specific manner in which the application generates the video itself is not limited; refer to the explanation in the embodiments below, which is not repeated here.
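A minimal sketch of the keyword path just described, assuming a plain-Python stand-in for the clip database; all names (FIRST_THRESHOLD, clip_library, and so on) are hypothetical and not taken from the patent:

```python
from collections import Counter

FIRST_THRESHOLD = 2  # the "first threshold" on word frequency; value assumed

def extract_keywords(transcript_words):
    """Words whose frequency in the extracted text exceeds the first threshold."""
    counts = Counter(transcript_words)
    return [word for word, n in counts.items() if n > FIRST_THRESHOLD]

def assemble_video(transcript_words, clip_library):
    """Fetch the clip (or picture) stored for each keyword, keeping keyword order;
    concatenating the segments into one video is left to a media library."""
    return [clip_library[k] for k in extract_keywords(transcript_words)
            if k in clip_library]

# Usage: words appearing more than twice become keywords.
clips = assemble_video(
    "night sky night sky night city".split(),
    {"night": "night_clip.mp4", "sky": "sky_clip.mp4"},
)
print(clips)  # ['night_clip.mp4']
```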
Of course, whether the first video comes from the music clip library of the application playing the audio or is generated by that application, it includes, but is not limited to, video presented in landscape orientation and video presented in portrait orientation.
First control: a User Interface (UI) control, i.e., any visual control or element that can be seen on the user interface of the application, such as a picture, an input box, a text box, a button or a tab. Some UI controls respond to user operations; for example, the first control can set the first video as a beautification material in response to a user operation. UI controls referred to in the embodiments of the present application include, but are not limited to: the first control, a confirmation control, a segment selection control, a second control, and so on.
In some embodiments, the first control is any visual control or element that is visible on the first interface. The application does not limit the display position and the display type of the first control.
Beautification material: material with a beautifying function. The embodiments of the application do not limit the scenes in which beautification materials are applied. Optionally, a scene is beautified by using the first video as the beautification material.
In step 220, in response to an operation for the first control, at least two setting options are displayed; different setting options are used to set the first video as the beautification material displayed in different scenes.
The operation for the first control is an operation for displaying the setting options; the embodiments of the present application do not limit its operation type. Operation types for the first control include, but are not limited to, single click, slide, double click, drag, voice input, face input, and the like. Taking voice input as an example: suppose the user says "click the first control" to the terminal device (or client); the terminal device (or client) receives the voice, recognizes its content as "click the first control", and treats this as receiving the operation, as sketched below. In some embodiments, as shown in FIG. 3, the first control is the control 301 labeled "set as video ringtone". In some embodiments, as shown in FIG. 4, it is the control 401 with the same label; in FIG. 5, the control 501; and in FIG. 6, the control 601. As shown in FIGS. 3-6, the first control may be placed at different positions of the first interface. In these cases, the first interface is an interface in which the audio and the video corresponding to the audio are in the playing state.
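As a loose illustration of the voice-input path, with the recognizer assumed to exist elsewhere and the command table purely hypothetical:

```python
# Map recognized utterances to UI actions; recognition output is assumed given.
VOICE_COMMANDS = {
    "click the first control": "show_setting_options",
}

def handle_voice_input(recognized_text: str):
    """Treat a recognized utterance as the corresponding UI operation, if any."""
    return VOICE_COMMANDS.get(recognized_text.strip().lower())

print(handle_voice_input("Click the first control"))  # show_setting_options
```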
In some embodiments, as shown in FIG. 7, the first control is the control 701 labeled "video ringtone". Here the first interface is the homepage of the user account, i.e., an interface in which the audio and the video corresponding to the audio are paused. In response to the operation for the first control, at least one first video is displayed; after one of the first videos is clicked, an interface in which the audio and the corresponding video are playing is displayed, and execution continues from step 210. In some embodiments, as shown in FIG. 8, the first control is the control 801 labeled "video ringtone", and the first interface is the main interface of the application program. In response to the operation for the first control 801, at least one first video is displayed; after one of them is clicked, an interface in which the audio and the corresponding video are playing is displayed, and execution continues from step 210.
Setting option: an option for setting the first video as the beautification material of a scene. In some embodiments, a setting option is a UI control; the embodiments of the present application do not limit its display position or display type. In some embodiments, at least two setting options are displayed in response to the operation for the first control. This keeps the interface relatively clean and improves the experience of users who do not need to set beautification materials. In other embodiments, the first video and the at least two setting options are displayed together in the first interface of the audio playback; that is, the at least two setting options are displayed directly on the first interface without the first control. This speeds up setting the beautification material for a scene, since no operation on the first control is required and the operation can be performed directly on the target setting option. In some embodiments, the number of setting options is related to the number of scenes. Optionally, the number of setting options is positively correlated with the number of scenes. Optionally, the number of setting options is equal to the number of scenes.
In some embodiments, the at least two setting options include at least two of the following (see the sketch below): a first setting option, used to set the first video as the beautification material displayed in an incoming call scene; a second setting option, used to set the first video as the beautification material displayed in a charging scene; a third setting option, used to set the first video as the beautification material displayed in an alert tone scene; a fourth setting option, used to set the first video as the beautification material displayed in a wallpaper scene; and a fifth setting option, used to set the first video as the beautification material displayed in a color ring scene.
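One way to model the option-to-scene mapping above, a sketch only; the names are assumptions, not from the patent:

```python
from enum import Enum

class Scene(Enum):
    INCOMING_CALL = 1  # first setting option
    CHARGING = 2       # second setting option
    ALERT_TONE = 3     # third setting option
    WALLPAPER = 4      # fourth setting option
    COLOR_RING = 5     # fifth setting option

def apply_setting(option: Scene, first_video: str) -> dict:
    """Record the first video as the beautification material for the chosen scene."""
    return {"scene": option.name, "material": first_video}

print(apply_setting(Scene.CHARGING, "first_video.mp4"))
```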
In some embodiments, as shown in FIG. 9, a setting option 901 for the incoming call scene, a setting option 902 for the color ring scene, and a setting option 903 for the charging scene are displayed. In other embodiments, as shown in FIG. 10, a setting option 1001 for the incoming call scene, a setting option 1002 for the charging scene, a setting option 1003 for the alert tone scene, and a setting option 1004 for the wallpaper scene are displayed. Different setting options are used to set the first video as the beautification material displayed in different scenes. The setting option 1001 of the incoming call scene is used to set the first video as the beautification material displayed in the incoming call scene; the setting option 1002 of the charging scene, as the material displayed in the charging scene; the setting option 1003 of the alert tone scene, as the material displayed in the alert tone scene; and the setting option 1004 of the wallpaper scene, as the material displayed in the wallpaper scene.
In other embodiments, cooperation with the operator is required when the user wants to use the color ring function. Optionally, as shown in FIG. 11, sub-figures a and b show that when the user sets a video color ring, the agreement that must be signed with the operator pops up; only once this cooperation is established can the video color ring be set successfully.
A scene in the embodiments of the present application refers to any place where the terminal device, or an application program on the terminal device, needs and can use the beautification material. The different scenes are described below.
In some embodiments, the scene is an incoming call scene. In some embodiments, when another terminal device calls the current terminal device, or the client of application A on another terminal device calls the client of application A on the current terminal device, an incoming call alert sound effect and an incoming call alert interface appear on the current terminal device. However, the conventional incoming call alert sound effect and alert interface are rather uniform, so personalized customization needs can be met through custom audio or a custom display interface. Optionally, the first video may then be applied to the alert sound effect or the alert interface in the incoming call scene. For how exactly the first video is applied to them, refer to the explanation in the embodiments below, which is not repeated here.
In some embodiments, the scene is a charging scene. In some embodiments, when the user charges the terminal device, a charging alert interface or charging alert sound effect is presented on the terminal device, or a charging special effect is displayed on the UI element representing the battery level. The charging alert interface, charging alert sound effect or charging special effect can then be personalized to meet the user's customization needs. For how exactly the first video is applied to the alert sound effect or alert interface in the charging scene, refer to the explanation in the embodiments below, which is not repeated here.
In some embodiments, the scene is an alert tone scene. In some embodiments, different alert tones may be provided on the terminal device for different situations: for example, an alert tone when a short message is received, an alert tone when an application program pushes a notification, an alert tone when the device boots, and so on. For these situations that require alert tones, an alert tone based on the first video may be customized; further, not only the alert tone but also an alert interface corresponding to the alert tone may be designed. A minimal registry of such per-event tones is sketched below.
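A sketch only; the event names are illustrative assumptions, not the patent's terms:

```python
# One slot per alert-tone situation named above; None means the system default.
alert_tones = {"sms_received": None, "app_push": None, "device_boot": None}

def set_alert_tone(event: str, material_id: str) -> None:
    """Point one alert-tone event at audio derived from the first video."""
    if event not in alert_tones:
        raise KeyError(f"unknown alert-tone event: {event}")
    alert_tones[event] = material_id

set_alert_tone("sms_received", "first_video_audio.aac")
```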
In some embodiments, the scene is a wallpaper scene. Optionally, wallpaper may be considered applicable wherever a picture is needed on the terminal device: for example, the home screen of the terminal device needs wallpaper, the background of application B needs wallpaper, the shutdown screen needs wallpaper, the boot screen needs wallpaper, the lock screen interface needs wallpaper, and so on. Such places are regarded as wallpaper scenes. Likewise, for these wallpaper scenes, the technical method provided in the embodiments of the present application can customize wallpaper based on the first video, or further design not only the wallpaper but also a playback sound effect corresponding to the wallpaper, which is not limited in this application.
In some embodiments, the scene is a color ring scene. In some embodiments, when another terminal device calls the current terminal device, or the client of application A on another terminal device calls the client of application A on the current terminal device, a dialing alert tone plays on the calling device, usually a "beep", which is rather monotonous. Therefore, the technical solution provided by the embodiments of the present application designs video color rings and audio color rings, and determines the color ring content from the first video. Optionally, after terminal device A sets its target color ring, when terminal device B calls terminal device A, the target color ring set by terminal device A is played on terminal device B, as sketched below.
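A sketch of the lookup this implies, assuming a server-side mapping from callee to configured color ring; the schema and names are assumptions:

```python
ring_back_store = {}  # callee account -> color ring material id (assumed schema)

def set_color_ring(callee: str, material_id: str) -> None:
    """Called when terminal device A sets its target color ring."""
    ring_back_store[callee] = material_id

def ring_back_for(callee: str, default: str = "beep") -> str:
    """What the caller's device (B) plays while waiting for the callee (A)."""
    return ring_back_store.get(callee, default)

set_color_ring("user_a", "first_video_clip.mp4")
print(ring_back_for("user_a"))  # first_video_clip.mp4
```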
Of course, the scenes in the embodiments of the present application include, but are not limited to, the five types of scenes described above; any place on the terminal device where a beautification material is needed can be considered a scene in the sense of the embodiments of the present application.
In step 230, in response to an operation for a target setting option among the at least two setting options, the first video is set as the beautification material displayed in the target scene.
In some embodiments, the target setting option is any one of the at least two setting options. In some embodiments, the operation for the target setting option is an operation for setting the first video as the beautification material displayed in the target scene; the embodiments of the present application do not limit its operation type, which includes, but is not limited to, single click, slide, double click, drag, voice input, face input, and the like.
In some embodiments, the first video is set as the beautification material displayed in the incoming call scene in response to an operation for the first setting option; as the beautification material displayed in the charging scene in response to an operation for the second setting option; as the beautification material displayed in the alert tone scene in response to an operation for the third setting option; as the beautification material displayed in the wallpaper scene in response to an operation for the fourth setting option; and as the beautification material displayed in the color ring scene in response to an operation for the fifth setting option.
In other embodiments, the first video is used directly as the beautification material when it is set as such. Alternatively, the first video is processed to obtain processed material, and the processed material is used as the beautification material applied in the different scenes. Optionally, the processing of the first video includes, but is not limited to, capturing video segments as mentioned in the embodiments below, changing the background music of the first video, and so on. In other embodiments, the processing of the first video further includes: locating the position of a target object in each target video frame among a plurality of video frames of the first video; cropping each target video frame around that position, using the material size required by the scene (or a proportionally reduced size); and finally combining the plurality of cropped frames into a cropped video to be used as the beautification material, as sketched below. The target object may be any object appearing in the first video, such as a person, an animal or a plant. In some embodiments, the terminal device determines the target object by itself. Optionally, the first video contains a plurality of objects and the target object is one of them, for example the object that appears most frequently, or the object at a key position in the picture; the terminal device determines that object as the target object. In other embodiments, the target object is determined from the plurality of objects in response to an object selection operation by the user, and the cropped video centered on the target object is then generated accordingly. The object selection operation may be the user clicking the target object in the first video; alternatively, identifiers of at least one object (such as a thumbnail or a number) may be displayed under the first video, and in response to an object selection operation for the identifier of the target object, the terminal device determines the target object and generates the cropped video centered on it. Of course, there may be more than one target object. The technical solution above provides two ways of determining the target object. On the one hand, the terminal device can determine it by itself; this step can be completed by the background server before the user needs to set the beautification material, rather than waiting until the user sets it, which improves the speed of determining the target object and the efficiency of generating the cropped video, and reduces the processing cost of the device. On the other hand, the user can determine the target object, which meets personalized customization needs and makes the determination of the beautification material more flexible and interesting.
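A sketch of the object-centred cropping described above: the crop box takes the material size required by the scene (or a scaled-down version), is centred on the target object, and is clamped to stay inside the frame. All sizes and names are illustrative assumptions.

```python
def crop_box(frame_w, frame_h, center_x, center_y, out_w, out_h):
    """(left, top, right, bottom) of an out_w x out_h box centred on the
    target object's position, shifted as needed to remain inside the frame."""
    left = min(max(center_x - out_w // 2, 0), frame_w - out_w)
    top = min(max(center_y - out_h // 2, 0), frame_h - out_h)
    return left, top, left + out_w, top + out_h

# e.g. a 1080x1920 frame, object at (900, 400), material size 720x1280:
print(crop_box(1080, 1920, 900, 400, 720, 1280))  # (360, 0, 1080, 1280)
```

Applying the box to every frame, with the object's per-frame position, and re-encoding the cropped frames yields the cropped video used as the material.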
According to the technical solution provided by the embodiments of the present application, a video that is related to the audio and displayed during audio playback is set as a beautification material while the audio is playing; that is, when a favorite video clip is encountered while listening to a song or watching a music clip, the currently played video can be set as a beautification material with a single tap, without manually importing the audio to be used as a beautification material as in the related art. This simplifies the operation of setting beautification materials and improves setting efficiency. In addition, by selecting the video to be used as the beautification material inside the application that plays the audio, the functionality of that application is extended and its rich multimedia resources are fully utilized, giving the user more choice when selecting a video as the beautification material. Moreover, when a video is set as the beautification material, multiple setting options are provided, so the user can choose to set the video as the beautification material displayed in different scenes (such as the incoming call scene, charging scene, alert tone scene, wallpaper scene, color ring scene, and the like), which greatly improves the flexibility of setting beautification materials.
Referring to FIG. 12, a flowchart of a method for setting beautification materials according to another embodiment of the present application is shown. The execution subject of each step of the method may be the terminal device 11 in the implementation environment shown in FIG. 1; for example, the execution subject of each step may be a client. In the following method embodiments, for convenience of description, the execution subject of each step is simply referred to as the "client". The method may comprise at least one of the following steps (310-350):
In step 310, in the first interface of audio playing, the first video, which is a video related to the audio, and the first control for setting the first video as a beautification material are displayed.
In step 320, in response to an operation for the first control, at least two setting options are displayed; different setting options are used to set the first video as the beautification material displayed in different scenes.
In step 330, in response to an operation for the target setting option among the at least two setting options, a setting interface corresponding to the target setting option is displayed, and the options of at least two sub-scenes included in the target scene are displayed in the setting interface.
In some embodiments, the target scene includes at least two sub-scenes. Optionally, the sub-scenes include, but are not limited to, a video sub-scene, an audio sub-scene and a picture sub-scene. Optionally, a video sub-scene may be understood as the video picture of the first video being used in the target scene; for example, the complete first video is used in a first video sub-scene. Optionally, an audio sub-scene may be understood as the audio associated with the first video being used in the target scene; for example, the background music of the first video is used in a first audio sub-scene. Optionally, a picture sub-scene may be understood as a picture related to the first video being used in the target scene; for example, the picture of a key frame of the first video is used in a first picture sub-scene.
In some embodiments, when the target scene is the incoming call scene, as shown in FIG. 13, a setting interface corresponding to the target setting option is displayed in response to an operation for the setting option 1001 of the incoming call scene, and the options of at least two sub-scenes included in the target scene are displayed in that interface: the selection item for the video sub-scene may be the incoming call video 1301, and the selection item for the audio sub-scene may be the incoming call ringtone 1302. In some embodiments, as shown in FIG. 14, when the target scene is the charging scene, the setting interface displayed in response to an operation for the setting option 1002 of the charging scene shows the options of at least two sub-scenes: the selection item for the video sub-scene may be the charging video 1401, and the selection item for the audio sub-scene may be the charging ringtone 1402. In some embodiments, when the target scene is the alert tone scene, the video sub-scene selection may be an alert video and the audio sub-scene selection an alert ringtone. In some embodiments, when the target scene is the wallpaper scene, the video sub-scene selection may be a dynamic wallpaper video and the audio sub-scene selection a wallpaper ringtone. In some embodiments, when the target scene is the color ring scene, the video sub-scene selection may be a video color ring and the audio sub-scene selection an audio color ring.
In some embodiments, the settings interface is an interface for setting aesthetic materials. Optionally, different setting options correspond to different setting interfaces. In some embodiments, the selection item of the sub-scene is a UI control, and the display position and the display type of the selection item of the sub-scene on the setting interface are not limited in the embodiments of the present application.
In step 340, in response to an operation for the selection item of a first sub-scene among the at least two sub-scenes, a preview effect of setting the first video as the beautification material displayed in the first sub-scene is shown in the setting interface.
In some embodiments, the operation for the selection item of the first sub-scene among the at least two sub-scenes is an operation for displaying the beautification material as shown in the first sub-scene; the embodiments of the present application do not limit its operation type, which includes, but is not limited to, single click, slide, double click, drag, voice input, face input, and the like. In some embodiments, the preview effect is the effect of using the beautification material in the first sub-scene.
In some embodiments, when the target scene is the charging scene, the selection item for the video sub-scene may be the charging video 1401 and the selection item for the audio sub-scene the charging ringtone 1402. As shown in FIG. 14, in response to an operation for the selection item 1401 of the first sub-scene, a preview effect 1403 of setting the first video as the beautification material displayed in that sub-scene is shown in the setting interface. As shown in FIG. 15, in response to an operation for the selection item 1502 of the first sub-scene, the corresponding preview effect 1503 is shown in the setting interface.
In other embodiments, the wallpaper scene also offers at least two sub-scene selections. As shown in FIG. 16, when the setting interface is displayed in response to an operation for the setting option 1004 of the wallpaper scene, the options of two sub-scenes included in the target scene are displayed in the interface: a lock screen sub-scene 1601 and a desktop sub-scene 1602. Similarly, the alert tone scene offers at least two sub-scene selections. As shown in FIG. 17, when the setting interface is displayed in response to an operation for the setting option 1003 of the alert tone scene, the options of three sub-scenes are displayed: an alarm clock sub-scene 1701, a short message sub-scene 1702, and a notification sub-scene 1703.
In step 350, in response to an operation for the confirmation control in the setting interface, the first video is set as the beautification material displayed in the first sub-scene.
In some embodiments, the confirmation control is a UI control; the embodiments of the present application do not limit its display position or display type on the setting interface. In some embodiments, the confirmation control 1501 is shown in FIG. 15.
In some embodiments, the operation for the confirmation control in the setting interface is an operation confirming that the beautification material is to be used in the first sub-scene; the embodiments of the present application do not limit its operation type, which includes, but is not limited to, single click, slide, double click, drag, voice input, face input, and the like.
In some embodiments, a segment selection control corresponding to the first video is displayed in the setting interface; the segment selection control is used for user-defined selection of a video segment in the first video. Based on an operation for the segment selection control, the selected video segment in the first video is determined, and this video segment is set as the beautification material displayed in the first sub-scene.
In some embodiments, the segment selection control is a UI control; the embodiments of the present application do not limit its display position or display type on the setting interface. In some embodiments, instead of the complete first video, a video segment taken from the first video is used in the target scene as the beautification material. In some embodiments, the operation for the segment selection control is an operation for selecting a video segment from the first video; its operation type is not limited and includes, but is not limited to, single click, slide, double click, drag, voice input, face input, and the like.
In some embodiments, the segment selection control is a draggable progress bar corresponding to the first video. Optionally, different positions on the progress bar correspond to different timestamps in the first video. Optionally, a start position and an end position are determined on the progress bar through the segment selection control, and the video segment between the timestamp of the start position and the timestamp of the end position is determined as the user-defined video segment selected from the first video. Optionally, the start and end positions can be changed by the user. Optionally, the video segment is determined by dragging along the full progress bar. Optionally, the video segment is determined by directly entering the time information of the start and end timestamps: for example, at least two information input boxes are displayed in response to an operation for the segment selection control, and the times of the start and end timestamps are entered in them respectively, thereby customizing the selected video segment, as sketched below.
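A small sketch covering both selection paths just described, dragging handles on the progress bar or typing timestamps; the function names are assumptions:

```python
def timestamp_at(progress_fraction: float, duration_s: float) -> float:
    """Map a position on the progress bar (0..1) to a timestamp in seconds."""
    return max(0.0, min(1.0, progress_fraction)) * duration_s

def select_clip(start_s: float, end_s: float, duration_s: float):
    """Validate the pair of timestamps delimiting the user-defined segment."""
    if not 0 <= start_s < end_s <= duration_s:
        raise ValueError("start must precede end, both within the video")
    return start_s, end_s

# Dragging handles to 25% and 40% of a 180-second video:
print(select_clip(timestamp_at(0.25, 180), timestamp_at(0.40, 180), 180))  # (45.0, 72.0)
```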
According to the technical solution provided by the embodiments of the present application, different user needs can be met by providing at least two sub-scenes. When the user only wants a ringtone, the audio related to the video can be set as the ringtone required by the target scene; when the user wants video, a video segment can be selected from the video in a user-defined way and set as the video required by the target scene. The beautification needs of the different sub-scenes of the target scene are thus met. In addition, since video segments can be customized by the user, the flexibility of setting beautification materials is improved and the user's setting experience is enhanced.
Referring to FIG. 18, a flowchart of a method for setting beautification materials according to another embodiment of the present application is shown. The execution subject of each step of the method may be the terminal device 11 in the implementation environment shown in FIG. 1; for example, the execution subject of each step may be a client. In the following method embodiments, for convenience of description, the execution subject of each step is simply referred to as the "client". The method may comprise at least one of the following steps (410-450):
In step 410, in a first interface of audio playback, a first video, which is a video related to audio, and a first control for setting the first video as a beautifying material are displayed.
In some embodiments, step 410 is preceded by at least one of steps S1-S3 (not shown).
In step S1, in a second interface of the application playing the audio, a second control for generating a video based on an AI model is displayed.
In some embodiments, the second control is a UI control, and the display position and the display type of the second control on the second interface are not limited in the embodiments of the present application. The second interface is considered to be the interface on which the second control is displayed. Optionally, the second interface is a main interface of an application playing audio. A second control is displayed on the main interface.
In step S2, in response to an operation for the second control, a video production interface is displayed; the video production interface includes at least two production options, and different production options are used to generate videos with different AI models.
The video production interface is an interface in which videos can be produced. Optionally, at least two production options are displayed on the video production interface. In some embodiments, a production option is a UI control; the embodiments of the present application do not limit its display position or display type on the video production interface. In some embodiments, the production options include, but are not limited to, a first production option, a second production option, and a third production option. In some embodiments, as shown in FIG. 19, the first video may be authored through the production option 1900, for example adding video pictures to audio through it. In some embodiments, as shown in FIG. 20, the video production interface is displayed in response to an operation for the second control 2000.
In step S3, in response to an operation for a target production option among the at least two production options, the first video is generated through the AI model corresponding to the target production option.
In some embodiments, the first video is a video generated based on an AI model, the AI model including a key frame generation model and a video generation model.
In some embodiments, the AI model is a machine learning model including, but not limited to, a neural network model, a deep learning model, and the like. The embodiments of the present application are not limited to a specific type of AI model.
In some embodiments, generating the first video through the AI model includes the following steps.
First, input information required for generating a first video is acquired.
The input information in the embodiments of the present application falls into at least three types; see the description in the embodiments below, which is not repeated here.
Second, at least two key frames are generated from the input information through the key frame generation model.
In some embodiments, when the input information is text, the key frame generation model is a text-to-image model, such as DALL-E 2. Optionally, the input is text, and the pictures corresponding to the text, namely the key frames, are obtained through the key frame generation model.
In some embodiments, the key frame generation model includes an encoding network, a mapping network, and a decoding network.
The second step can be divided into: inputting the input information into the encoding network, which outputs an embedded representation of the input information; inputting that embedded representation into the mapping network, which outputs the mapped embedded representation; and inputting the mapped embedded representation into the decoding network, which outputs at least two key frames, as sketched below.
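A hedged sketch of this three-stage data flow. Real systems use learned networks (e.g. a Transformer mapper, per the next paragraph); the callables here are stand-ins showing only how the stages compose:

```python
class KeyframeGenerator:
    """encode (input -> embedding) -> map (embedding -> mapped embedding)
    -> decode (mapped embedding -> key frames)."""
    def __init__(self, encoder, mapper, decoder):
        self.encoder, self.mapper, self.decoder = encoder, mapper, decoder

    def generate(self, input_info, num_keyframes=3):
        text_embedding = self.encoder(input_info)
        image_embedding = self.mapper(text_embedding)
        return [self.decoder(image_embedding, i) for i in range(num_keyframes)]

# Stub networks, just to show the call pattern:
gen = KeyframeGenerator(
    encoder=lambda text: hash(text),
    mapper=lambda emb: emb + 1,
    decoder=lambda emb, i: f"keyframe_{i}_from_{emb}",
)
print(gen.generate("a cat under the night sky"))
```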
In some embodiments, the encoding network is an encoder, the decoding network is a decoder, and the mapping network is a Transformer network. The present application is not limited to a particular network architecture.
Third, frame interpolation and super-resolution processing are performed on the at least two key frames through the video generation model to obtain the first video.
In some embodiments, frame interpolation is performed on the at least two key frames. Illustratively, 3 frames are obtained through the key frame generation network and 30 frames are obtained after frame interpolation; this can be understood as a smoothing step that makes the transitions between images smoother and more natural. After the interpolated frames are obtained, super-resolution processing is applied to them. Illustratively, the original images have a resolution of 56×56, and images with a resolution of 256×256 are obtained after super-resolution processing. In other embodiments, super-resolution processing is performed at least twice: the obtained 256×256 images are super-resolved again to obtain images with a resolution of 728×728. Optionally, the video generation model and the key frame generation model are trained separately. Optionally, the key frame generation model is an unsupervised machine learning model. Optionally, the video generation model is a machine learning model trained in advance. The two-stage flow is sketched below.
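A sketch of the video generation model's two stages using the figures quoted above (3 key frames to 30 frames; 56×56 to 256×256 to 728×728). The interpolation and super-resolution callables stand in for learned networks:

```python
def generate_video(key_frames, interpolate, upscalers):
    """interpolate: key frames -> denser, smoother frame sequence;
    upscalers: successive super-resolution stages applied to every frame."""
    frames = interpolate(key_frames)       # e.g. 3 frames -> 30 frames
    for upscale in upscalers:              # e.g. 56x56 -> 256x256 -> 728x728
        frames = [upscale(f) for f in frames]
    return frames

# Stub stages: repeat each key frame 10 times, then "grow" a (w, h) tuple.
video = generate_video(
    [(56, 56)] * 3,
    interpolate=lambda frames: [f for f in frames for _ in range(10)],
    upscalers=[lambda wh: (256, 256), lambda wh: (728, 728)],
)
print(len(video), video[0])  # 30 (728, 728)
```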
FIG. 21 illustrates the AI model architecture. Optionally, the AI model includes a key frame generation model 2100 and a video generation model 2110. The key frame generation model 2100 includes at least an encoding network, a mapping network, and a decoding network. Optionally, the video generation model 2110 includes at least frame interpolation for the key frames and at least two super-resolution stages. Optionally, the input of the key frame generation model 2100 is the input information and its output is at least one key frame. Optionally, the input of the video generation model 2110 is the at least one key frame and its output is the generated first video.
In some embodiments, the operation for the target production option among the at least two production options is an operation for generating the first video through the AI model corresponding to that option; the embodiments of the present application do not limit its operation type, which includes, but is not limited to, single click, slide, double click, drag, voice input, face input, and the like. In some embodiments, as shown in FIG. 20, in the video production interface 2001, the first video is generated through the AI model corresponding to the target production option 2002 in response to an operation for that option.
The following describes the video generation process for three different production options and corresponding AI models.
First, the input information includes description information, and the first video is a video generated based on the description information through an AI model.
In some embodiments, the description information is information entered or selected by the user. Optionally, in response to an operation for the first production option, a text input box is displayed. Optionally, the user may enter, in the text input box, description information of the video the user wants to generate. Of course, the description information here includes, but is not limited to, at least one of a word, a sentence, a picture, and a video. Optionally, the terminal device obtains the description information in response to the user's input operation, and generates the video through the AI model. Optionally, at least one text tag or video tag is displayed in response to an operation for the first production option. Optionally, the user may select a desired tag from the at least one text tag or video tag as the description information. Optionally, the terminal device determines the description information in response to the user's tag selection operation, and generates the video through the AI model.
Second, the input information includes a selected style and a reference video and/or reference picture, and the first video is a video generated by the AI model based on the reference video and/or reference picture in combination with the selected style.
In some embodiments, the reference video or reference picture may be uploaded to the client by the user, or may be selected by the user directly from a music clip library in the client or from videos generated by the client. In addition, the user may also clip a video in the client to obtain the reference video, or draw a picture to obtain the reference picture; the sources of the reference video and the reference picture are not limited in this application. In other embodiments, the user may select a video style in addition to the reference video and the reference picture. In other embodiments, after the reference video or reference picture is selected, a style selection interface may pop up, in which the user may select or enter a style. The terminal device determines the style selected by the user in response to the user's style selection or input operation. Of course, the video may also be generated by the AI model based solely on the reference video or the reference picture, which is not limited in this application.
In some embodiments, the first video is obtained using the user-selected style as the text input and the reference video and/or reference picture as the condition input.
Third, the input information includes a modification appeal and a reference video and/or reference picture, the first video being a video generated by the AI model based on the reference video and/or reference picture in combination with the modification appeal.
The explanation of the reference video and the reference picture is as in the above embodiments and is not repeated here. In some embodiments, the modification appeal is a modification request made by the user for the reference video and/or reference picture. For example, the reference video or reference picture shows a boy, but the user wants the generated video to feature a girl; the modification appeal may then be "change the boy to a girl". Optionally, the modification appeal may be entered by the user directly, and the terminal device recognizes the user's input to obtain the user's modification appeal. Optionally, the modification appeal may be selected by the user: modification tags for different modification directions are displayed on the user interface of the terminal device, and the user's modification appeal is determined in response to the user's selection operation on a tag for a given modification direction.
In some embodiments, the first video is obtained using the user's modification appeal as the text input and the reference video and/or reference picture as the condition input.
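The three production options above can thus be seen as one text-plus-condition generation call. Below is a minimal sketch under that reading; `generate_video` is a hypothetical wrapper, not an API defined in this application.

```python
# A hypothetical text+condition wrapper covering all three production options.
def generate_video(text, condition=None):
    """text drives generation; condition is an optional reference video/picture."""
    return f"video(text={text!r}, condition={condition!r})"  # stand-in result

# Option 1: description information alone.
print(generate_video(text="a girl dancing in the rain"))
# Option 2: selected style as the text input, reference media as the condition input.
print(generate_video(text="watercolor style", condition="reference.mp4"))
# Option 3: modification appeal as the text input, reference media as the condition input.
print(generate_video(text="change the boy to a girl", condition="reference.png"))
```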
In some embodiments, as shown in fig. 22, a model architecture 2200 of another AI model is illustrated. Optionally, the AI model includes at least a key frame prediction module, a video clip generation module, and a video combination module. In some embodiments, the AI model of fig. 22 is used for automatic video generation in the client; unlike the manual input of information by the user described above, the first video here is generated by the client itself. Optionally, when a video clip corresponding to a certain audio needs to be made, a song is determined as the audio, and the lyrics of the song are input into the AI model. The AI model first divides the lyrics into segments, and extracts elements and labels from the divided audio segments to obtain audio information. For example, when the audio is a song, the lyric information of the song is extracted as the audio information. Optionally, a knowledge base of audio information and picture types is built, that is, a correspondence between audio information and pictures is established. Optionally, pictures are determined from the audio information and picture type knowledge base according to the audio information and used as key frames. Optionally, video clips are generated based on the selected plurality of key frames. Further, a combination mode is selected from the constructed knowledge base of video clip combination modes, and the video clips are combined to obtain the video.
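A minimal sketch of this lyric-driven pipeline follows; the knowledge base, the label extraction, and the combination mode (simple concatenation) are all toy stand-ins for the modules in fig. 22.

```python
# Toy lyric-driven pipeline: segment lyrics, extract labels, look up pictures
# in a knowledge base as key frames, and combine clips by concatenation.
AUDIO_KB = {"love": "pic_heart.png", "rain": "pic_rain.png"}  # audio info -> picture

def extract_labels(lyric_line):
    # Toy "element and label" extraction: keep words present in the knowledge base.
    return [w for w in lyric_line.lower().split() if w in AUDIO_KB]

def auto_generate_video(lyrics):
    clips = []
    for line in lyrics:                        # one audio segment per lyric line
        keyframes = [AUDIO_KB[label] for label in extract_labels(line)]
        if keyframes:
            clips.append(keyframes)            # a video clip built from key frames
    return clips                               # concatenation as the combination mode

print(auto_generate_video(["Love in the rain", "Walking home alone"]))
```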
Of course, when the application playing the audio automatically generates the first video, the first video may also be generated by a video generation algorithm rather than the AI model shown in fig. 22. An exemplary description of the video generation algorithm follows. Optionally, the complete lyrics of the song selected by the user are displayed on the terminal device; an audio segment is intercepted in response to the user's lyric selection operation, the audio segment corresponding to at least one line of lyrics, and the lyrics corresponding to the audio segment are used as the audio information. Optionally, the lyrics included in the audio information may belong to multiple songs. Optionally, the lyrics included in the audio information may be discontinuous or continuous. The specific type of the audio information is not limited in this application. Optionally, the application playing the audio is provided with an audio information and picture type knowledge base. Optionally, at least one picture is selected from the audio information and picture type knowledge base according to the audio information selected by the user. Optionally, a video is generated from the at least one picture. For example, each target picture among the at least one picture is displayed for n time units, finally obtaining a video with a duration of N, where N is a positive number greater than n and n is a positive integer. Illustratively, the audio information and picture type knowledge base is constructed as follows: a picture of the singer of the song to which the lyrics in the audio information belong is used as the picture corresponding to those lyrics. Optionally, one line of lyrics may correspond to at least one picture. Optionally, when the lyrics in the audio information are determined, the picture of the singer corresponding to each line of lyrics is found from the audio information and picture type knowledge base. The video generation algorithm may be provided in the client or the server, which is not limited in this application. Generating the video with the video generation algorithm dispenses with the AI model, avoids the complicated model design and training process, and reduces the processing overhead of the device to a certain extent.
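As a concrete reading of this algorithm, the sketch below maps lyric lines to singer pictures via a toy knowledge base and shows each picture for a fixed span; the names and the 30-second duration are illustrative assumptions.

```python
# Toy non-AI algorithm: map lyric lines to singer pictures via a knowledge
# base, then show each picture for a fixed span of the final video.
def pictures_to_video(pictures, total_seconds):
    """Return (picture, start, end) spans covering the total duration."""
    per_pic = total_seconds / len(pictures)    # n time units per picture
    return [(p, i * per_pic, (i + 1) * per_pic) for i, p in enumerate(pictures)]

singer_kb = {"lyric line 1": "singer_a.jpg", "lyric line 2": "singer_b.jpg"}
pics = [singer_kb[line] for line in ["lyric line 1", "lyric line 2"]]
print(pictures_to_video(pics, total_seconds=30))
# [('singer_a.jpg', 0.0, 15.0), ('singer_b.jpg', 15.0, 30.0)]
```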
Step 420: in response to the operation for the first control, display at least two setting options, where different setting options are used to set the first video as the beautification material displayed in different scenes.

Step 430: in response to the operation for the target setting option of the at least two setting options, set the first video as the beautification material displayed in the target scene.

Step 440: display an audio setting control, where the audio setting control is used to custom-set the background music of the selected video clip in the first video.
In some embodiments, the audio setting control is a UI control, and the embodiments of the present application do not limit the display position and display type of the audio setting control. Optionally, the audio setting control is displayed on the setting interface. Of course, the audio setting control may also be displayed on other interfaces.
In some embodiments, the first video corresponds to audio. Optionally, the video clip has audio corresponding to its video picture, and the audio corresponding to the video picture of the video clip is also used as part of the beautification material. In some embodiments, the video clip provides only the video picture and not the audio. Optionally, in that case, the audio for the video picture provided by the video clip is determined through the audio setting control.
Step 450: determine, based on the operation of the audio setting control, the background music of the video clip, where the background music of the video clip is set together with the video picture of the video clip as the beautification material displayed in the target scene.
In some embodiments, the operation for the audio setting control is an operation for determining the background music of the video clip; the operation type of this operation is not limited in the embodiments of the present application. The operation types include, but are not limited to, single click, swipe, double click, drag, voice input, face input, and the like.
In some embodiments, the source of the audio is audio associated with the first video. Optionally, the background music of the first video is first audio. Optionally, the background music of the video clip is determined from the first audio based on the operation of the audio setting control. In some embodiments, the first video is a video of 5 minutes in duration, the background music corresponding to the first video is also 5 minutes long, and the video clip intercepted here is the clip from the 1st to the 2nd minute. In this case, an audio clip is randomly cut from the 5 minutes of background music as the background music of the video clip. Optionally, the duration of the audio clip is the same as the duration of the video clip. Optionally, the duration of the audio clip may be shorter than the duration of the video clip, in which case the audio clip is repeated, or silence is added, for the portion of the video clip whose duration exceeds that of the audio clip. In some embodiments, a segment with a stronger rhythm is cut from the first audio as the background music of the video clip. In some embodiments, a segment of the climax is cut from the first audio as the background music of the video clip. When the climax of the first audio is from the 3rd to the 4th minute, the audio from the 3rd to the 4th minute is used as the background music of the video clip from the 1st to the 2nd minute.
In other embodiments, the source of the audio is audio unrelated to the first video. Optionally, the background music of the first video is first audio. Optionally, second audio is determined from a plurality of audios, and the background music of the video clip is determined from the second audio based on the operation of the audio setting control; the second audio is another audio different from the first audio. Optionally, the user may enter the name of an audio to obtain an audio setting control corresponding to the second audio. In some embodiments, the first video is a 5-minute video, and the intercepted video clip is the clip from the 1st to the 2nd minute. In this case, an audio clip is randomly cut from the second audio as the background music of the video clip. Optionally, the duration of the audio clip is the same as the duration of the video clip. Optionally, the duration of the audio clip may be shorter than the duration of the video clip, in which case the audio clip is repeated, or silence is added, for the portion of the video clip whose duration exceeds that of the audio clip. In some embodiments, a segment with a stronger rhythm is cut from the second audio as the background music of the video clip. In some embodiments, a segment of the climax is cut from the second audio as the background music of the video clip. For example, the audio from the 1st to the 2nd minute of the second audio is used as the audio clip serving as the background music of the video clip. Of course, audio at other timestamps in the second audio may also be intercepted as the audio clip; for example, the climax audio from the 4th to the 5th minute is used as the audio clip corresponding to the video clip from the 1st to the 2nd minute.
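The sketch below illustrates fitting an audio clip to a video clip as described above, looping the clip and padding with silence when it is shorter than the video; the one-sample-per-second representation and all durations are illustrative assumptions.

```python
# Fitting an audio clip to a video clip: cut at a chosen start, then loop the
# clip or pad with silence if it is shorter than the video.
def fit_audio_to_video(audio, video_len, start=0):
    clip = audio[start:start + video_len]
    while len(clip) < video_len:               # repeat the clip, or pad with silence
        clip = clip + (audio[:video_len - len(clip)] or [0.0])
    return clip[:video_len]

first_audio = [0.1] * 300                      # a 5-minute track, 1 sample per second
bgm = fit_audio_to_video(first_audio, video_len=60, start=180)  # climax at minute 3-4
print(len(bgm))                                # 60 seconds of background music
```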
In some embodiments, the audio setting control is a draggable progress bar corresponding to the first audio or the second audio. Optionally, different positions on the progress bar correspond to different timestamps in the first audio or the second audio. Optionally, a start position and an end position are determined on the progress bar through the audio setting control, and the audio segment between the timestamp corresponding to the start position and the timestamp corresponding to the end position is determined as the music segment selected by the user to serve as the background music in the first video. Optionally, the start position and the end position may be changed by the user. Optionally, the audio clip is determined from the complete progress bar by dragging the progress bar. Optionally, the audio clip is determined by directly entering the time information of the start timestamp and the end timestamp. For example, at least two information input boxes are displayed in response to an operation of the audio setting control, and the time information of the start timestamp and the end timestamp is entered in the at least two information input boxes respectively, thereby custom-selecting the audio clip of the first audio or the second audio.
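Both selection modes reduce to computing a start and an end timestamp; a minimal sketch follows, with the fraction-based drag positions and the "mm:ss" input format as assumptions.

```python
# Two ways to pick the audio clip: dragging start/end handles on a progress
# bar (positions as fractions), or typing "mm:ss" timestamps directly.
def select_by_progress_bar(duration_seconds, start_frac, end_frac):
    return start_frac * duration_seconds, end_frac * duration_seconds

def select_by_timestamps(start, end):
    def seconds(t):                            # "mm:ss" -> seconds
        minutes, secs = t.split(":")
        return int(minutes) * 60 + int(secs)
    return seconds(start), seconds(end)

print(select_by_progress_bar(300.0, 0.2, 0.4))  # (60.0, 120.0)
print(select_by_timestamps("01:00", "02:00"))   # (60, 120)
```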
In some embodiments, the technical solution provided in the embodiments of the present application further includes at least one step of steps S4 to S6 (not shown in the figures).
Step S4: display, in a third interface of the application playing the audio, the selection items respectively corresponding to different scenes.
In some embodiments, the selection items respectively corresponding to different scenes are UI controls, and the display positions and display types of these selection items are not limited in the embodiments of the present application. Optionally, the selection items respectively corresponding to different scenes are displayed on the third interface. Of course, they may also be displayed on other interfaces. The third interface is an interface on which the selection items respectively corresponding to different scenes are displayed. Optionally, the third interface is a main interface of the application playing the audio, and the selection items corresponding to different scenes are displayed on the main interface. In some embodiments, as shown in fig. 23, in the third interface 2300 of the application playing the audio, selection items 2301 respectively corresponding to different scenes are displayed.
Step S5: in response to an operation of the selection item corresponding to the target scene, display a material management interface corresponding to the target scene, where a third control for turning on or off a target function is displayed in the material management interface corresponding to the target scene, and the target function is the function of displaying the beautification material in the target scene.
In some embodiments, the operation for the selection item corresponding to the target scene is an operation for displaying the material management interface corresponding to the target scene; the operation type of this operation is not limited in the embodiments of the present application. The operation types include, but are not limited to, single click, swipe, double click, drag, voice input, face input, and the like.
In some embodiments, the third control is a UI control, and the display position and display type of the third control are not limited in the embodiments of the present application. Optionally, the third control is displayed on the material management interface corresponding to the target scene. Of course, the third control may also be displayed on other interfaces. In some embodiments, the material management interface corresponding to the target scene is an interface on which the third control for turning the target function on or off is displayed. In some embodiments, different target scenes correspond to different material management interfaces. In some embodiments, different scenes correspond to different functions. In some embodiments, the material management interface of the target scene also displays the materials that the user has historically set as beautification materials for the target scene.
In some embodiments, as shown in fig. 23, in response to an operation for the selection item corresponding to the target scene, a material management interface 2304 corresponding to the target scene is displayed, in which a third control 2302 for turning the target function on or off is displayed.
Step S6: in response to an operation of the third control, if the target function is on, turn the target function off; after the target function is turned off, the beautification material is no longer displayed in the target scene. Or, if the target function is off, turn the target function on; after the target function is turned on, the beautification material is displayed in the target scene.
In some embodiments, the operation for the third control is an operation for turning the target function off or on; the operation type of this operation is not limited in the embodiments of the present application. The operation types include, but are not limited to, single click, swipe, double click, drag, voice input, face input, and the like.
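A minimal sketch of the per-scene toggle behind the third control follows; the in-memory dict stands in for whatever settings store the application actually uses.

```python
# Per-scene toggle behind the third control; state store is an assumption.
scene_functions = {"incoming_call": True, "charging": False}

def on_third_control(scene):
    """Flip the target function: on -> off (material hidden), off -> on (shown)."""
    scene_functions[scene] = not scene_functions[scene]
    return scene_functions[scene]

print(on_third_control("incoming_call"))       # False: material no longer displayed
print(on_third_control("charging"))            # True: material displayed again
```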
It should be noted that, in the embodiments of the present application, the execution body that generates the first video using the AI model may be the client or, of course, the server. Taking the server as an example: after receiving the user's input information, the client sends it to the client's background server; the background server receives the input information and generates the first video based on the AI model, then sends the generated first video to the client, which displays it. Generating the first video on the server can reduce the processing pressure on the client to a certain extent and improve the generation speed.
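Read as a request/response flow, the server-side path might look like the sketch below; the JSON payload shape, the URL, and `run_ai_model` are illustrative assumptions.

```python
# Server-side execution path: the client posts the user's input information,
# the background server runs the AI model and returns the finished video.
import json

def run_ai_model(input_info):
    return f"https://cdn.example.com/{abs(hash(input_info))}.mp4"  # stand-in

def handle_generate_request(request_body):
    payload = json.loads(request_body)         # input information from the client
    video_url = run_ai_model(payload["input"]) # heavy generation stays on the server
    return {"video": video_url}                # the client only displays the result

print(handle_generate_request('{"input": "a girl dancing in the rain"}'))
```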
According to the technical solution provided in the embodiments of the present application, the beautification materials set from different first videos form an asset library, which is easy for the user to manage. After a beautification material is set successfully, the user can turn the target function on or off with one tap, or replace the beautification material shown in the target scene with another asset corresponding to that scene. This improves the efficiency of material management while meeting the user's need to switch between different beautification materials for a target scene. Further, the third control makes the target function convenient for the user to manage: functions the user does not need can be turned off in time, reducing the processing overhead and resource consumption of the device.
The following are device embodiments of the present application, which may be used to perform method embodiments of the present application. For details not disclosed in the device embodiments of the present application, please refer to the method embodiments of the present application.
Referring to fig. 24, a block diagram of an apparatus for setting beautification materials according to an embodiment of the present application is shown. The apparatus has the function of implementing the client-side method examples above; the function may be implemented by hardware, or by hardware executing corresponding software. The apparatus may be the terminal device described above, or may be provided in the terminal device. As shown in fig. 24, the apparatus 2400 may include: an interface display module 2410, an option display module 2420, and a material setting module 2430.
The interface display module 2410 is configured to display, in a first interface for audio playing, a first video and a first control for setting the first video as a beautification material, where the first video is a video related to the audio.
The option display module 2420 is configured to display at least two setting options in response to the operation for the first control, where different setting options are used to set the first video as the beautification materials displayed in different scenes.
The material setting module 2430 is configured to set the first video as the beautification material displayed in the target scene in response to an operation for a target setting option of the at least two setting options.
In some embodiments, the at least two setting options include at least two of: the first setting option is used for setting the first video as a beautifying material displayed in an incoming call scene; the second setting option is used for setting the first video as the beautifying material displayed in the charging scene; the third setting option is used for setting the first video as a beautifying material displayed in the prompting sound scene; a fourth setting option, configured to set the first video as a beautification material displayed in a wallpaper scene; and a fifth setting option, configured to set the first video as a beautification material displayed in a color ring scene.
In some embodiments, the material setting module 2430 is configured to, in response to an operation for the target setting option of the at least two setting options, display a setting interface corresponding to the target setting option, where selection items of at least two sub-scenes included in the target scene are displayed in the setting interface.
The material setting module 2430 is further configured to display, in response to an operation on a selection item of a first sub-scene of the at least two sub-scenes, a preview effect of setting the first video as the beautification material displayed in the first sub-scene in the setting interface.
The material setting module 2430 is further configured to set the first video as the beautification material displayed in the first sub-scene in response to an operation on a confirmation control in the setting interface.
In some embodiments, the material setting module 2430 is further configured to display a segment selection control corresponding to the first video in the setting interface, where the segment selection control is used to select a video segment in the first video in a user-defined manner;
the material setting module 2430 is further configured to determine, based on the operation of the clip selection control, the selected video clip in the first video, where the video clip is used to be set as the beautification material displayed in the first sub-scene.
In some embodiments, as shown in fig. 25, the modules further include a control display module 2440 and a music determination module 2450.
The control display module 2440 is configured to display an audio setting control, where the audio setting control is configured to set the background music of the selected video segment in the first video in a user-defined manner.
The music determination module 2450 is configured to determine background music of the video clip based on an operation of the audio setting control, where the background music of the video clip is used to be set together with a video picture of the video clip as a beautification material shown in the target scene.
In some embodiments, the first video is a video in a music clip library of an application playing the audio; alternatively, the first video is a video generated by an application playing the audio.
In some embodiments, the first video is a video generated based on an AI model, the AI model including a key frame generation model and a video generation model.
In some embodiments, as shown in fig. 25, the modules further include a video generation module 2460.
The video generating module 2460 is configured to obtain input information required for generating the first video; generating at least two key frames according to the input information through the key frame generation model; and performing frame interpolation and super-resolution processing on the at least two key frames through the video generation model to obtain the first video.
In some embodiments, the key frame generation model includes an encoding network, a mapping network, and a decoding network.
The video generation module 2460 is configured to input the input information into the encoding network, and output an embedded representation of the input information through the encoding network; inputting the embedded representation of the input information into the mapping network, and outputting the mapped embedded representation through the mapping network; and inputting the mapped embedded representation into the decoding network, and outputting the at least two key frames through the decoding network.
In some embodiments, the input information includes descriptive information, the first video being a video generated by the AI model based on the descriptive information; alternatively, the input information includes a selected style and a reference video and/or reference picture, the first video being a video generated by the AI model based on the reference video and/or reference picture and in combination with the selected style; alternatively, the input information includes a modification appeal and a reference video and/or reference picture, the first video being a video generated by the AI model based on the reference video and/or reference picture in conjunction with the modification appeal.
In some embodiments, the control display module 2440 is further configured to display, in a second interface of an application playing the audio, a second control for generating video based on the AI model.
The interface display module 2410 is further configured to display a video production interface in response to the operation for the second control, where the video production interface includes at least two production options, and different production options are used to generate a video using different AI models.
The video generating module 2460 is further configured to generate the first video through an AI model corresponding to a target production option of the at least two production options in response to an operation for the target production option.
In some embodiments, as shown in fig. 25, the modules further include a function setup module 2470.
The control display module 2440 is further configured to display, in a third interface of the application that plays the audio, selection items corresponding to the different scenes respectively.
The interface display module 2410 is further configured to display a material management interface corresponding to the target scene in response to an operation on a selection item corresponding to the target scene, where a third control for opening or closing a target function is displayed in the material management interface corresponding to the target scene, and the target function is a function of displaying a beautified material in the target scene.
The function setting module 2470 is configured to respond to an operation for the third control, if the target function is in an open state, close the target function, and after the target function is closed, not display the beautification material in the target scene; or if the target function is in a closed state, opening the target function, and displaying the beautifying materials in the target scene after the target function is opened.
It should be noted that the apparatus provided in the foregoing embodiment is illustrated, when implementing its functions, only by the division of the above functional modules. In practical applications, the functions may be allocated to different functional modules as needed; that is, the internal structure of the device may be divided into different functional modules to implement all or part of the functions described above. In addition, the apparatus embodiments and the method embodiments provided above belong to the same concept; the specific implementation of the apparatus is detailed in the method embodiments and is not repeated here.
Referring to fig. 26, a block diagram of a terminal device 2600 according to an embodiment of the present application is shown. The terminal device 2600 may be the terminal device 11 in the implementation environment shown in fig. 1, and is used to implement the method for setting beautification materials provided in the above embodiments. Specifically:
In general, the terminal device 2600 includes: a processor 2601, and a memory 2602.
The processor 2601 may include one or more processing cores, such as a 4-core processor or an 8-core processor. The processor 2601 may be implemented in at least one hardware form of a DSP (Digital Signal Processor), an FPGA (Field Programmable Gate Array), and a PLA (Programmable Logic Array). The processor 2601 may also include a main processor and a coprocessor: the main processor is a processor for processing data in the awake state, also called a CPU (Central Processing Unit); the coprocessor is a low-power processor for processing data in the standby state. In some embodiments, the processor 2601 may integrate a GPU (Graphics Processing Unit) for rendering and drawing the content that the display screen needs to display. In some embodiments, the processor 2601 may also include an AI processor for processing computing operations related to machine learning.
The memory 2602 may include one or more computer-readable storage media, which may be non-transitory. Memory 2602 may also include high-speed random access memory, as well as non-volatile memory, such as one or more magnetic disk storage devices, flash memory storage devices. In some embodiments, the non-transitory computer-readable storage medium in the memory 2602 is used to store a computer program configured to be executed by one or more processors to implement the method of setting aesthetic materials described above.
In some embodiments, terminal device 2600 may further optionally include: a peripheral interface 2603, and at least one peripheral. The processor 2601, the memory 2602, and the peripheral interface 2603 may be connected by a bus or signal lines. The individual peripheral devices may be connected to the peripheral device interface 2603 by buses, signal lines, or circuit boards. Specifically, the peripheral device includes: at least one of a radio frequency circuit 2604, a display 2605, an audio circuit 2607, and a power source 2608.
It will be appreciated by those skilled in the art that the structure shown in fig. 26 is not limiting and that terminal device 2600 may include more or fewer components than shown, or may combine certain components, or may employ a different arrangement of components.
In an exemplary embodiment, there is also provided a computer-readable storage medium in which a computer program is stored, which when executed by a processor, implements the above-described method of setting a beautification material on a terminal device side.
Optionally, the computer-readable storage medium may include: ROM (Read-Only Memory), RAM (Random Access Memory), an SSD (Solid State Drive), an optical disc, or the like. The random access memory may include ReRAM (Resistive Random Access Memory) and DRAM (Dynamic Random Access Memory).
In an exemplary embodiment, a computer program product is also provided, the computer program product comprising a computer program stored in a computer readable storage medium. The processor of the terminal device reads the computer program from the computer-readable storage medium, and the processor executes the computer program so that the terminal device executes the above-described setting method of the beautification material.
It should be understood that references herein to "a plurality" mean two or more. "And/or" describes an association relationship between associated objects and indicates that three relationships may exist; for example, "A and/or B" may mean: A exists alone, A and B exist together, or B exists alone. The character "/" generally indicates that the associated objects before and after it are in an "or" relationship. In addition, the step numbers described herein merely illustrate one possible execution order of the steps. In some other embodiments, the steps may be executed out of the numbered order; for example, two differently numbered steps may be executed simultaneously, or in an order opposite to that shown, which is not limited in the embodiments of the present application.
It should be noted that setting a video in the embodiments of the present application as the beautification material displayed in different scenes (such as an incoming call scene, a charging scene, a prompt tone scene, a wallpaper scene, a color ring scene, etc.) is done with the user's authorization and permission, and should comply with relevant laws and regulations.
The foregoing description of the exemplary embodiments of the present application is not intended to limit the application to the particular embodiments disclosed; on the contrary, the intention is to cover all modifications, equivalents, and alternatives falling within the spirit and scope of the application.

Claims (14)

1. A method for setting up beautification materials, the method comprising:
displaying a first video and a first control for setting the first video as a beautifying material in a first interface of audio playing, wherein the first video is related to the audio;
in response to an operation for the first control, displaying at least two setting options, wherein different setting options are used for setting the first video as beautification materials displayed in different scenes;
and setting the first video as the beautifying material displayed in the target scene in response to the operation of the target setting option in the at least two setting options.
2. The method of claim 1, wherein the at least two setting options comprise at least two of:
the first setting option is used for setting the first video as a beautifying material displayed in an incoming call scene;
the second setting option is used for setting the first video as the beautifying material displayed in the charging scene;
the third setting option is used for setting the first video as a beautifying material displayed in the prompting sound scene;
a fourth setting option, configured to set the first video as a beautification material displayed in a wallpaper scene;
and a fifth setting option, configured to set the first video as a beautification material displayed in a color ring scene.
3. The method of claim 1, wherein the setting the first video as the beautification material displayed in the target scene in response to the operation for the target setting option of the at least two setting options comprises:
responding to the operation of the target setting options in the at least two setting options, displaying a setting interface corresponding to the target setting options, wherein the setting interface displays options of at least two sub-scenes included in the target scene;
responding to an operation of a selection item for a first sub-scene of the at least two sub-scenes, displaying, in the setting interface, a preview effect of setting the first video as the beautification material displayed in the first sub-scene;
and responding to the operation of a confirmation control in the setting interface, and setting the first video as the beautifying material displayed in the first sub-scene.
4. A method according to claim 3, characterized in that the method further comprises:
displaying a segment selection control corresponding to the first video in the setting interface, wherein the segment selection control is used for self-defining selection of video segments in the first video;
and determining the selected video clip in the first video based on the operation of the clip selection control, wherein the video clip is used to be set as the beautification material displayed in the first sub-scene.
5. The method according to claim 1, wherein the method further comprises:
displaying an audio setting control, wherein the audio setting control is used for customizing and setting background music of a selected video segment in the first video;
and determining background music of the video clip based on the operation of the audio setting control, wherein the background music of the video clip is set together with the video picture of the video clip as the beautification material displayed in the target scene.
6. The method of claim 1, wherein:
the first video is a video in a music short film resource library of an application playing the audio;
or,
the first video is a video generated by the application playing the audio.
7. The method of claim 1, wherein the first video is a video generated based on an artificial intelligence AI model, the AI model comprising a key frame generation model and a video generation model; the method further comprises the steps of:
acquiring input information required for generating the first video;
generating at least two key frames according to the input information through the key frame generation model;
and performing frame interpolation and super-resolution processing on the at least two key frames through the video generation model to obtain the first video.
8. The method of claim 7, wherein the key frame generation model comprises an encoding network, a mapping network, and a decoding network;
the generating at least two key frames according to the input information through the key frame generation model comprises:
inputting the input information into the coding network, and outputting an embedded representation of the input information through the coding network;
inputting the embedded representation of the input information into the mapping network, and outputting the mapped embedded representation through the mapping network;
and inputting the mapped embedded representation into the decoding network, and outputting the at least two key frames through the decoding network.
9. The method of claim 7, wherein:
the input information includes descriptive information, the first video being a video generated by the AI model based on the descriptive information;
or,
the input information comprises a selected style and a reference video and/or a reference picture, and the first video is generated by the AI model based on the reference video and/or the reference picture and combined with the selected style;
or,
the input information includes a modification appeal and a reference video and/or reference picture, the first video being a video generated by the AI model based on the reference video and/or reference picture in conjunction with the modification appeal.
10. The method of claim 7, wherein the method further comprises:
displaying a second control for generating a video based on the AI model in a second interface of an application playing the audio;
Responding to the operation of the second control, displaying a video production interface, wherein the video production interface comprises at least two production options, and different production options are used for generating videos by adopting AI models with different functions;
and responding to the operation of the target production options in the at least two production options, and generating the first video through the AI model corresponding to the target production options.
11. The method according to claim 1, wherein the method further comprises:
displaying the selection items respectively corresponding to the different scenes in a third interface of the application for playing the audio;
responding to the operation of the selection item corresponding to the target scene, displaying a material management interface corresponding to the target scene, wherein a third control for opening or closing a target function is displayed in the material management interface corresponding to the target scene, and the target function is a function for displaying beautifying materials in the target scene;
responding to the operation of the third control, if the target function is in an open state, closing the target function, wherein after the target function is closed, the beautifying material is not displayed in the target scene; or if the target function is in a closed state, opening the target function, and displaying the beautifying materials in the target scene after the target function is opened.
12. A setting device for beautifying materials, characterized in that the device comprises:
the interface display module is used for displaying a first video and a first control for setting the first video as a beautifying material in a first interface of audio playing, wherein the first video is related to the audio;
the option display module is used for responding to the operation of the first control and displaying at least two setting options, and different setting options are used for setting the first video as the beautifying materials displayed in different scenes;
and the material setting module is used for responding to the operation of the target setting option in the at least two setting options and setting the first video as the beautifying material displayed in the target scene.
13. A terminal device, characterized in that it comprises a processor and a memory, in which a computer program is stored, which computer program is loaded and executed by the processor to implement the method according to any of claims 1 to 11.
14. A computer readable storage medium, characterized in that the computer readable storage medium has stored therein a computer program, which is loaded and executed by a processor to implement the method of any of claims 1 to 11.