CN109147745B - Song editing processing method and device, electronic equipment and storage medium - Google Patents

Song editing processing method and device, electronic equipment and storage medium

Publication number
CN109147745B
Authority
CN
China
Prior art keywords
file
audio
video
singing
user
Legal status
Active
Application number
CN201810827717.1A
Other languages
Chinese (zh)
Other versions
CN109147745A (en)
Inventor
杨鑫磊
王新晨
张晓波
Current Assignee
Reach Best Technology Co Ltd
Original Assignee
Reach Best Technology Co Ltd
Application filed by Reach Best Technology Co Ltd
Priority to CN201810827717.1A
Publication of CN109147745A
Application granted
Publication of CN109147745B

Classifications

    • G10H1/361 Recording/reproducing of accompaniment for use with an external source, e.g. karaoke systems
    • G10H1/366 Karaoke systems with means for modifying or correcting the external signal, e.g. pitch correction, reverberation, changing a singer's voice
    • G06V40/161 Human faces: detection; localisation; normalisation
    • G06V40/168 Human faces: feature extraction; face representation
    • G06V40/174 Facial expression recognition
    • G10L21/003 Changing voice quality, e.g. pitch or formants
    • H04N21/439 Processing of audio elementary streams
    • H04N21/4396 Processing of audio elementary streams by muting the audio signal
    • H04N21/44 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream
    • H04N5/262 Studio circuits, e.g. for mixing, switching-over, change of character of image, other special effects
    • G10H2210/005 Musical accompaniment, i.e. complete instrumental rhythm synthesis added to a performed melody
    • G10H2220/005 Non-interactive screen display of musical or status data
    • G10H2220/011 Lyrics displays, e.g. for karaoke applications
    • G10H2240/121 Musical libraries, i.e. musical databases indexed by musical parameters

Abstract

The application discloses a song editing processing method, a song editing processing apparatus, an electronic device, and a storage medium. The song editing processing method comprises the steps of: acquiring a user instruction to be executed; calling a first audio file and a second audio file of target music according to the user instruction; and mixing the first audio file and the second audio file to generate a mixed sound file and playing the mixed sound file. By editing the first audio file and the second audio file before the user sings the target music, the mixed sound file can be customized to the individual user, so that it replaces the single acoustic accompaniment used in the past. This increases the diversity of accompaniment audio, enriches the content of singing activities, embodies the personalized characteristics of singing works, and improves the recognizability of those works.

Description

Song editing processing method and device, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of information processing, and in particular, to a song editing processing method and apparatus, an electronic device, and a storage medium.
Background
The rapid development of the internet has gradually changed how people live, and demand for cultural and recreational content keeps rising; singing has become one of people's favorite entertainment activities. In particular, the spread of karaoke software products lets more and more people sing, or record their own singing, anytime and anywhere. A karaoke software product synthesizes the user's singing voice with accompaniment provided by the software and then applies karaoke audio effects to obtain a better-quality singing recording.
In the related art, when a user records a song with karaoke software, the song editing functions are limited: only a single song accompaniment, or the original accompaniment identical to the original song, can be selected as the background accompaniment for recording the song. The finally generated singing work is therefore uniform in form and lacks individuality and recognizability.
Disclosure of Invention
In order to overcome the problems in the related art, the present disclosure provides a song editing processing method, apparatus, electronic device, and storage medium.
According to a first aspect of the embodiments of the present disclosure, there is provided a song editing processing method, including the steps of:
acquiring a user instruction to be executed;
calling a first audio file and a second audio file of target music according to the user instruction, wherein the second audio file is derived from the first audio file;
and carrying out sound mixing processing on the first audio file and the second audio file to generate a sound mixing file, and playing the sound mixing file.
Optionally, the first audio file is a mixed audio file, and the second audio file is an accompaniment audio file.
Optionally, before the acquiring of the user instruction to be executed, the song editing processing method further includes:
acquiring user attribute information;
identifying recommended songs corresponding to the user attribute information in a preset song library;
and sending the recommended tracks to the user terminal.
Optionally, after the mixing the first audio file and the second audio file to generate a mixed file and playing the mixed file, the method further includes:
acquiring sound information in a recording environment to generate singing audio of a user, wherein the sound information comprises: playing sound of the audio mixing file and singing sound of a user;
and intercepting the singing audio according to a preset sound selection area to generate a selected audio, wherein the sound selection area is a time period selected in the target audio progress bar.
Optionally, after the mixing the first audio file and the second audio file to generate a mixed file and playing the mixed file, the method further includes:
acquiring a singing video of a user within the play time of the audio mixing file, wherein the singing video comprises a face image of the user;
identifying first emotion information represented by the face image according to a preset face emotion algorithm;
matching an expression data packet corresponding to the first emotion information in a preset expression library;
and rendering the singing video by adopting the expression data packet.
Optionally, after the first emotion information represented by the face image is identified according to a preset face emotion algorithm, the method further includes:
acquiring a text document of a lyric text corresponding to the singing video;
identifying second emotion information represented by the text document according to a preset document identification model;
and comparing the first emotion information with the second emotion information, matching an expression data packet corresponding to the second emotion information in the expression database when the first emotion information is inconsistent with the second emotion information, and covering the expression data packet on the face image.
Optionally, after the singing audio is intercepted according to a preset sound selection area to generate a selected audio, the method further includes:
acquiring a first video file;
performing a muting process on the first video file;
and loading the first video file subjected to the muting process into a video area corresponding to the selected audio to generate a second video file with the selected audio.
Optionally, after the singing audio is intercepted according to a preset sound selection area to generate a selected audio, the method further includes:
storing the selected audio as a work file in the preset music library;
and setting permission right information for the work file, wherein the permission right information is used for representing whether the work file is open to be used by other users.
According to a second aspect of the embodiments of the present disclosure, there is provided a song editing processing apparatus including:
an acquisition unit configured to acquire a user instruction to be executed;
the processing unit is configured to call a first audio file and a second audio file of target music according to the user instruction, wherein the second audio file is derived from the first audio file;
and the execution unit is configured to perform mixing processing on the first audio file and the second audio file to generate a mixing file, and play the mixing file.
Optionally, the first audio file is a mixed audio file, and the second audio file is an accompaniment audio file.
Optionally, the song editing processing apparatus further includes:
a first acquisition unit configured to acquire user attribute information;
the first processing unit is configured to identify recommended tracks corresponding to the user attribute information in a preset track library;
a first execution unit configured to transmit the recommended track to the user terminal.
Optionally, the song editing processing apparatus further includes:
a second obtaining unit configured to obtain sound information in a recording environment to generate singing audio of a user, wherein the sound information includes: playing sound of the audio mixing file and singing sound of a user;
and the second execution unit is configured to intercept the singing audio according to a preset sound selection area to generate a selected audio, wherein the sound selection area is a selected time period in the target audio progress bar.
Optionally, the song editing processing apparatus further includes:
a third obtaining unit configured to obtain a singing video of a user within the play time of the audio mixing file, wherein the singing video comprises a face image of the user;
the third processing unit is configured to identify first emotion information represented by the face image according to a preset face emotion algorithm;
the first matching unit is configured to match the expression data packet corresponding to the first emotion information in a preset expression library;
and the third execution unit is configured to render the singing video by adopting the expression data packet.
Optionally, the song editing processing apparatus further includes:
a fourth acquisition unit configured to acquire a text document of a lyric text corresponding to the singing video;
the fourth processing unit is configured to identify second emotion information represented by the text document according to a preset document identification model;
and the fourth execution unit is configured to compare the first emotion information with the second emotion information, match an expression data packet corresponding to the second emotion information in the expression database when the first emotion information is inconsistent with the second emotion information, and cover the expression data packet on the face image.
Optionally, the song editing processing apparatus further includes:
a fifth acquiring unit configured to acquire the first video file;
a fifth processing unit configured to perform a mute process on the first video file;
a fifth execution unit, configured to load the first video file subjected to the muting process into a video area corresponding to the selected audio, and generate a second video file having the selected audio.
Optionally, the song editing processing apparatus further includes:
a storage unit configured to store the selected audio as a work file in the preset music library;
the setting unit is configured to set permission right information for the work file, wherein the permission right information is used for representing whether the work file is open to be used by other users.
According to a third aspect of the embodiments of the present disclosure, there is provided an electronic device comprising a processor and a memory for storing processor-executable instructions, wherein the processor is configured to perform the steps of the song editing processing method described above.
According to a fourth aspect of the embodiments disclosed herein, there is provided a non-transitory computer readable storage medium, wherein instructions of the storage medium, when executed by a processor of a mobile terminal, enable the mobile terminal to perform the steps of the song editing processing method described above.
According to a fifth aspect of the embodiments of the present disclosure, there is provided a computer program product comprising program instructions which, when executed by a computer, cause the computer to perform the steps of the song editing processing method described above.
The technical solution provided by the embodiments of the present disclosure can have the following beneficial effects: a user instruction to be executed is acquired; a first audio file and a second audio file of target music are called according to the user instruction; and the two files are mixed to generate a mixed sound file, which is then played. By editing the first audio file and the second audio file before the user sings the target music, the mixed sound file can be customized to the individual user, replacing the single acoustic accompaniment used in the past, which increases the diversity of accompaniment audio, enriches the content of singing activities, embodies the personalized characteristics of singing works, and improves the recognizability of those works.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present application and together with the description, serve to explain the principles of the invention.
Fig. 1 is a flowchart illustrating a song editing processing method according to an exemplary embodiment.
Fig. 2 is a flowchart illustrating one implementation of recommending songs according to an exemplary embodiment.
Fig. 3 is a flowchart illustrating one implementation of intercepting singing audio according to an exemplary embodiment.
Fig. 4 is a flowchart illustrating one implementation of identifying an expression data packet for a singing video according to an exemplary embodiment.
Fig. 5 is a schematic diagram illustrating rendering a video with an expression data packet according to an exemplary embodiment.
Fig. 6 is a flowchart illustrating another implementation of identifying an expression data packet for a singing video according to an exemplary embodiment.
Fig. 7 is a flowchart illustrating compositing a second video file according to an exemplary embodiment.
Fig. 8 is a block diagram illustrating a song editing processing apparatus according to an exemplary embodiment.
Fig. 9 is a block diagram illustrating a mobile terminal according to an exemplary embodiment.
Fig. 10 is a block diagram illustrating an electronic device according to an exemplary embodiment.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present invention. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the invention, as detailed in the appended claims.
Fig. 1 is a flowchart illustrating a song editing processing method according to an exemplary embodiment. The song editing processing method is used in a terminal and, as shown in fig. 1, includes the following steps:
S1100: acquiring a user instruction to be executed;
specifically, the terminal detects that the user selects a sound selection area of the target music in the music mode interface, for example, the user selects the 30 th second of the target music as a starting point of singing, the 80 th second of the target music as an end point of singing, that is, song segments from the 30 th second to the 80 th second in the target music are used as the sound selection area; if the terminal detects that the user also sets playing parameter information in the singing interface, the playing parameter information is obtained; if the terminal does not detect that the playing parameter information changes, the default playing parameter information is used as the playing parameter information of the target music; and sending the target music, the sound selection area or the playing parameter information as a user instruction to a server.
In some embodiments, the music mode interface refers to a mobile interface for a user to record and sing target music, and the playing parameter information includes a first audio file, a second audio file, an output volume value of the first audio file, and an output volume value of the second audio file.
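A minimal sketch (not part of the disclosure) of how a terminal might package the sound selection area and playing parameter information into a user instruction; all field names and the JSON encoding are assumptions for illustration:

```python
# Hypothetical packaging of a user instruction; field names are illustrative.
import json

# Assumed default playing parameters, used when the user changes nothing.
DEFAULT_PLAY_PARAMS = {"first_volume": 1.0, "second_volume": 1.0}

def build_user_instruction(target_music_id, start_s, end_s, play_params=None):
    """Bundle the target music, sound selection area, and playing parameters."""
    return json.dumps({
        "target_music": target_music_id,
        "selection": {"start_s": start_s, "end_s": end_s},  # e.g. 30 -> 80
        "play_params": play_params or DEFAULT_PLAY_PARAMS,
    })

# The 30th-to-80th-second example from the text:
instruction = build_user_instruction("song_001", 30, 80)
```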
S1200: calling a first audio file and a second audio file of target music according to the user instruction, wherein the second audio file is derived from the first audio file;
the server calls a target music data packet in a preset music library according to target music in the user instruction, wherein the target music data comprises a target music lyric file, a first audio file of the target music and a second audio file of the target music. The second audio file includes accompaniment music for the target music, for example, the accompaniment music includes but is not limited to blue key, hip hop or dj, etc.
S1300: and carrying out sound mixing processing on the first audio file and the second audio file to generate a sound mixing file, and playing the sound mixing file.
According to the playing parameter information in the user instruction, the playing volumes of the first audio file and the second audio file are respectively adjusted to the volume values given by that information; the volume-adjusted first audio file and second audio file are then mixed into an audio mixing file, and a playback program is started to play the audio mixing file.
Audio mixing generally refers to combining several audio files, or line-input audio signals, into a single audio file. In this embodiment, the mixing process combines the first audio file and the second audio file into a single mix file.
In some embodiments, the first audio file is a mixed audio file and the second audio file is an accompaniment audio file. The mixed audio file is synthesized from an original vocal audio file and the original accompaniment audio file; that is, it consists of human voice audio plus the original song's accompaniment audio. The second audio file differs from the accompaniment in the first audio file: it is the accompaniment audio file of a user-chosen accompaniment, arranged against the first audio file. For example, if the first audio file is a mix of piano accompaniment and human voice, and the second audio file is a jazz accompaniment audio file, the mix file generated from the two contains the jazz accompaniment audio.
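To make step S1300 concrete, here is a minimal mixing sketch assuming pydub as the audio library (the disclosure names no library; pydub requires ffmpeg to be installed):

```python
from pydub import AudioSegment  # assumed library choice; needs ffmpeg

def mix(first_path, second_path, first_gain_db=0.0, second_gain_db=0.0):
    """Adjust each track's volume per the playing parameters, then overlay."""
    first = AudioSegment.from_file(first_path).apply_gain(first_gain_db)
    second = AudioSegment.from_file(second_path).apply_gain(second_gain_db)
    mixed = first.overlay(second)            # sample-wise mix into one track
    mixed.export("mix.mp3", format="mp3")    # single playable mix file
    return mixed

# e.g. piano-plus-voice mix overlaid with a quieter jazz accompaniment:
mix("mixed_audio.mp3", "jazz_accompaniment.mp3", second_gain_db=-3.0)
```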
By acquiring the user instruction to be executed, calling the first audio file and the second audio file of the target music according to the user instruction, mixing the two files to generate a mix file, and playing the mix file, the user can personalize the mix file simply by editing the first audio file and the second audio file. The mix file thus replaces the former single acoustic accompaniment, increasing the diversity of accompaniment audio, enriching singing activities, embodying the personalized characteristics of singing works, and improving their recognizability.
Referring to fig. 2, fig. 2 is a flowchart illustrating an embodiment of recommending songs according to the present embodiment. As shown in fig. 2, before executing step S1100, the song editing processing method specifically includes the following steps:
S1111: acquiring user attribute information;
user attribute information includes, but is not limited to, the user's age, gender, music genre of interest, tracks sung, music history played, and the like. And the server side acquires the user attribute information of the login account from a background database according to the login account of the user.
S1112: identifying recommended songs corresponding to the user attribute information in a preset song library;
the preset song library classifies and sorts various songs according to a preset arrangement mode. The predetermined arrangement includes, but is not limited to, music style, song-ordering heat, or singer. And establishing a song association degree according to the user attribute information, and acquiring a recommended song corresponding to the song association degree from a preset song library.
In some embodiments, each item of user attribute information is scored according to a preset scoring table that records a score value for each item; for example, the music style hip-hop scores 2. The scored items are accumulated into a total score, and the songs corresponding to the total score are looked up in the preset song library, thereby establishing the song association degree between the user attribute information and those songs, which are taken as the recommended songs.
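A toy sketch of the scoring-table lookup described above; the score values, attribute keys, and the score-to-song mapping are all invented for illustration:

```python
# Hypothetical scoring table: (attribute, value) -> score.
SCORE_TABLE = {("style", "hip-hop"): 2, ("style", "blues"): 1,
               ("age", "18-25"): 3, ("gender", "female"): 1}

# Hypothetical song library keyed by total score.
SONGS_BY_SCORE = {6: ["Track A", "Track B"], 3: ["Track C"]}

def recommend(user_attributes):
    """Accumulate per-attribute scores, then look up songs for the total."""
    total = sum(SCORE_TABLE.get(attr, 0) for attr in user_attributes)
    return SONGS_BY_SCORE.get(total, [])

# hip-hop (2) + 18-25 (3) + female (1) = 6 -> ["Track A", "Track B"]
print(recommend([("style", "hip-hop"), ("age", "18-25"), ("gender", "female")]))
```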
S1113: and sending the recommended tracks to the user terminal.
And sending the identified recommended tracks to the user terminal according to the login account of the user.
Through identifying the recommended songs corresponding to the user attribute information in the preset song library, the user can directly select the appropriate songs with pertinence, the personalized requirements of the user can be efficiently met, and the user experience is improved.
Referring to fig. 3, fig. 3 is a flowchart illustrating an embodiment of intercepting singing audio according to this embodiment. As shown in fig. 3, after step S1300 is executed, the song editing processing method specifically includes the following steps:
S1311: acquiring sound information in a recording environment to generate singing audio of a user, wherein the sound information comprises: playing sound of the audio mixing file and singing sound of a user;
While the audio mixing file plays, the terminal records the user's vocal audio as the user sings the target music; the playing sound of the audio mixing file and the user's vocal audio are then mixed to form the singing audio. Recording the playing sound and the singing voice separately and mixing them afterwards helps improve the resolution and clarity of the user's voice and the quality of the vocal audio. The singing audio covers the audio mixing file of the whole target music together with the lyric text corresponding to it.
S1312: and intercepting the singing audio according to a preset sound selection area to generate a selected audio, wherein the sound selection area is a time period selected in the target audio progress bar.
The preset sound selection area is the time period the user selected in the target audio progress bar. For example, when the selected time period is the 10th to the 30th second of the target music, the 10th to the 30th second of the singing audio is intercepted as the selected audio. It should be noted that the server uploads the selected audio submitted by the user to the singing platform as the user's singing work and, by default, locks its usage rights. When the server detects that the user performs a permission operation, for example cancels the usage-rights setting of the singing work, the server unlocks the usage rights so that other users can use the work; this promotes sharing and improves interactivity between users on the singing platform.
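The interception itself reduces to slicing the recorded audio over the selected period; a sketch, again assuming pydub, whose segments slice in milliseconds:

```python
from pydub import AudioSegment  # assumed library choice

def intercept(singing_audio_path, start_s, end_s):
    """Cut the [start_s, end_s] selection out of the singing audio."""
    audio = AudioSegment.from_file(singing_audio_path)
    selected = audio[start_s * 1000:end_s * 1000]  # pydub slices in ms
    selected.export("selected.mp3", format="mp3")
    return selected

intercept("singing.mp3", 10, 30)  # 10th-30th second as the selected audio
```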
The selected audio is generated by intercepting the singing audio according to the preset sound selection area, so that the audio can be edited for the user efficiently, and the efficiency of recording the singing audio is improved.
Referring to fig. 4, fig. 4 is a flowchart illustrating an embodiment of an expression data packet for recognizing a singing video according to the present embodiment. As shown in fig. 4, after step S1300 is executed, the song editing processing method specifically includes the following steps:
S1321: acquiring a singing video of a user within the play time of the audio mixing file, wherein the singing video comprises a face image of the user;
when the terminal detects that a user clicks to record a video in a video mode interface, the terminal sends a request instruction to the server so that the server calls a camera program on the terminal according to the request instruction; displaying a lyric text corresponding to the audio mixing file on a video mode interface while playing the audio mixing file; acquiring a singing video recorded by a user according to the lyric text; and detecting the face position in each video frame according to a preset face detection algorithm, and extracting a face image.
Specifically, the preset face detection algorithm adopts a cascaded convolutional neural network: images in a video frame are divided into face regions and non-face regions, the bounding box of each face region is corrected, and the face image is finally recognized.
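The disclosure does not name a specific cascaded network; MTCNN, as packaged in facenet-pytorch, is one cascaded-CNN face detector and serves here purely as an illustrative stand-in:

```python
from facenet_pytorch import MTCNN   # one cascaded-CNN detector (assumption)
from PIL import Image

detector = MTCNN(keep_all=True)     # keep every face found in the frame

def extract_faces(frame_path):
    """Detect face regions in one video frame and crop the face images."""
    frame = Image.open(frame_path)
    boxes, _ = detector.detect(frame)   # corrected face-region boxes
    if boxes is None:                    # frame without any face
        return []
    return [frame.crop(tuple(int(v) for v in box)) for box in boxes]
```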
S1322: identifying first emotion information represented by the face image according to a preset face emotion algorithm;
the preset face emotion algorithm is characterized in that a convolutional neural network is adopted, face pictures of labeled emotion types are input into the convolutional neural network for training, probability values of all kinds of emotions are output, if the probability values of all kinds of emotions exceed a preset threshold value, the current face is determined to present the emotion, otherwise, parameter values in the convolutional neural network are continuously updated through back propagation until all kinds of trained emotions are larger than or equal to the preset threshold value, and finally classifiers of all kinds of emotions are obtained.
The first emotional information includes emotional information such as happy, anxious, surprise, depressed or angry.
Each face image in a video frame is fed into the face emotion algorithm, and the emotion information represented by each face image is obtained by screening. When one video frame contains several face images, i.e. several pieces of first emotion information, the share of each piece of first emotion information among all first emotion information in the frame is calculated, and the emotion with the largest share is taken as the first emotion information of the face images in that frame. For example, if there are four face images in a video frame, of which three are recognized as happy and one as sad, the shares of happy and sad are 0.75 and 0.25 respectively, so the first emotion information represented by the face images of the video frame is happy.
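The per-frame vote can be written directly from the example; this sketch labels a frame with the emotion holding the largest share among its faces:

```python
from collections import Counter

def frame_emotion(per_face_emotions):
    """Return the dominant first-emotion label and its share for one frame."""
    counts = Counter(per_face_emotions)      # e.g. {"happy": 3, "sad": 1}
    label, n = counts.most_common(1)[0]
    return label, n / len(per_face_emotions)

print(frame_emotion(["happy", "happy", "happy", "sad"]))  # ('happy', 0.75)
```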
S1323: matching an expression data packet corresponding to the first emotion information in a preset expression library;
The preset expression library classifies and sorts expression data packets by emotion type; an expression data packet contains compressed background images, such as static and dynamic background images corresponding to the emotion. For example, the worried expression data packet is an animated raindrop image, the happy expression data packet is an animated pink flower image, and the angry expression data packet is an animated image of a smoking volcano.
S1324: and rendering the singing video by adopting the expression data packet.
The expression data packet is unpacked to obtain an image equal in size to the interface of the mobile device; the image is applied to the non-face area of each video frame, and the frame is rendered, thereby adding background content to the singing video.
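One way to realize this rendering step, sketched with OpenCV and NumPy (assumed tools, not named by the disclosure): resize the unpacked background to the frame and paste it everywhere except the detected face boxes.

```python
import cv2
import numpy as np

def render_background(frame, background, face_boxes):
    """Overlay the background image on the non-face region of one frame."""
    bg = cv2.resize(background, (frame.shape[1], frame.shape[0]))
    face_mask = np.zeros(frame.shape[:2], dtype=bool)
    for x1, y1, x2, y2 in face_boxes:        # protect each face region
        face_mask[int(y1):int(y2), int(x1):int(x2)] = True
    out = bg.copy()
    out[face_mask] = frame[face_mask]        # keep the original faces
    return out
```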
The first emotion information represented by the face image in the singing video is recognized through a preset face emotion algorithm, the expression data packet corresponding to the first emotion information is matched in a preset expression library, the singing video is rendered through the expression data packet, the background style of the singing video is enriched, the interestingness of the singing video is increased, and meanwhile, the interactivity between human and a machine is enhanced.
Referring to fig. 5, fig. 5 is a schematic diagram illustrating rendering a video with an expression data packet according to the present embodiment. As shown in fig. 5, in the video mode interface (entered by switching from the music mode interface to the MV mode interface), the lyric text corresponding to the singing video and the mix file is displayed while the mix file plays. The first emotion information represented by the lyric line "yesterday took root and sprouted in my memory" is happy, and the expression data packet corresponding to that happy first emotion information is the flower background image.
Referring to fig. 6, fig. 6 is a flowchart illustrating another embodiment of an expression data packet for recognizing a singing video according to the present embodiment. As shown in fig. 6, after step S1300 is executed, the song editing processing method specifically includes the following steps:
S1331: acquiring a text document of a lyric text corresponding to the singing video;
A text document of the lyric text corresponding to the singing video is acquired according to the time period of the singing video; the text document comprises the lyric singing time and the lyric text. For example, when the lyric singing time of the singing video is the 55th second, the corresponding lyric text is "having confirmed the look in your eyes, I have met the right person".
S1332: identifying second emotion information represented by the text document according to a preset document identification model;
the preset document identification model is a model which is trained on an emotion classifier represented by characters in each text document and can identify second emotion information expressed by each text document. The second mood information is the same as the first mood information, and the second mood information comprises mood information such as happy feeling, worry, surprise, depression or anger. For example, the second emotional information characterized by the text document "hewn o's youth" is a worry.
S1333: and comparing the first emotion information with the second emotion information, matching an expression data packet corresponding to the second emotion information in the expression database when the first emotion information is inconsistent with the second emotion information, and covering the expression data packet on the face image.
When the first emotion information is inconsistent with the second emotion information, i.e. the emotion represented by the face image differs from the emotion represented by the text document, the expression data packet of the second emotion information is matched in the expression library, with the second emotion information as the reference; according to the position of the face image, the unpacked expression data packet is overlaid on the face image. This overlaying is performed after the singing video has been rendered with the expression data packet corresponding to the first emotion information.
When the first emotion information is consistent with the second emotion information, the expression data packet corresponding to the second emotion information is not used; that is, the singing video remains rendered with the expression data packet corresponding to the first emotion information.
Continuing the example from step S1332: when the first emotion information represented while the user sings "our fading youth" is happy, but the second emotion information represented by the text document is sad, a crying-face emoji expression is matched in the expression library with the sad emotion as the reference, and the emoji expression is overlaid on the position of the face image.
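The comparison rule of step S1333 then reduces to a few lines; the packet names and the library mapping are illustrative only:

```python
# Hypothetical expression library keyed by emotion.
EXPRESSION_LIBRARY = {"sad": "crying_emoji.png", "happy": "flowers.png"}

def pick_overlay(first_emotion, second_emotion):
    """Return an overlay packet only when the two emotions disagree."""
    if first_emotion != second_emotion:
        return EXPRESSION_LIBRARY.get(second_emotion)  # text emotion wins
    return None  # consistent: keep the first-emotion background rendering

print(pick_overlay("happy", "sad"))  # -> 'crying_emoji.png'
```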
By recognizing the second emotion information represented by the text document of the lyric text and comparing whether the first and second emotion information are consistent, different special-effect rendering modes can be applied to the singing video according to the comparison result, for example overlaying an expression data packet on the face image. This both enriches the content of the singing video and improves the interactivity between the user and the machine.
Referring to fig. 7, fig. 7 is a flowchart illustrating a process of synthesizing a second video file according to the present embodiment. As shown in fig. 7, after step S1312 is executed, the song editing processing method specifically includes the following steps:
S1313: acquiring a first video file, wherein the first video file comprises a video file loaded through a video link;
the first video file may be obtained by loading a video file stored in the mobile device selected by the user, in addition to the video file loaded through the video link, for example, by loading a shared video link.
S1314: performing a muting process on the first video file;
the mute process means to remove the audio of the first video file, i.e. to delete the video track, resulting in a muted first video file.
S1315: and loading the first video file subjected to the noise elimination treatment into a video area corresponding to the selected audio to generate a second video file with the selected audio.
The second video file comprises the video of the first video file together with the selected audio. The generated second video file is therefore not limited to recorded footage: video resources can be obtained from other channels, and a video file matching the user's preferences can be paired with the selected audio. This free combination of audio and video enriches the forms that a user's singing works can take.
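As an assembly sketch, moviepy 1.x (an assumed choice) can strip the first video's audio track and attach the selected audio to produce the second video file:

```python
from moviepy.editor import VideoFileClip, AudioFileClip  # moviepy 1.x API

def make_second_video(first_video_path, selected_audio_path, out_path):
    """Mute the first video file, then pair it with the selected audio."""
    video = VideoFileClip(first_video_path).without_audio()  # muting process
    audio = AudioFileClip(selected_audio_path)
    second = video.set_audio(audio)
    second = second.subclip(0, min(video.duration, audio.duration))
    second.write_videofile(out_path)
    return second

make_second_video("first.mp4", "selected.mp3", "second.mp4")
```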
Specifically, the song editing processing method further includes:
storing the selected audio as a work file in the preset music library; and setting permission right information for the work file, wherein the permission right information is used for representing whether the work file is open to be used by other users.
Further, the selected audio is stored as a work file in the preset music library, and the work file is displayed in the shared area of the singing platform. When the server detects that the author of the work file sets it to be publicly available, permission right information is set for the work file; the permission right information indicates that the work file is open for use by other users, so other users can take it as an accompaniment track for their own singing or as background material for short videos. This realizes resource sharing and reuse of work files, strengthens interactivity between users, and widens the reach of the work file.

Fig. 8 is a block diagram illustrating a song editing processing apparatus according to an exemplary embodiment. Referring to fig. 8, the apparatus includes an acquisition unit 110, a processing unit 120, and an execution unit 130. The acquisition unit 110 is configured to acquire a user instruction to be executed; the processing unit 120 is configured to call a first audio file and a second audio file of the target music according to the user instruction, wherein the second audio file is derived from the first audio file; and the execution unit 130 is configured to mix the first audio file and the second audio file to generate a mix file and play the mix file.
In some embodiments, the first audio file is a mixed audio file and the second audio file is an accompanying audio file.
In some embodiments, the song editing processing apparatus further comprises a first acquisition unit, a first processing unit, and a first execution unit. The first acquisition unit is configured to acquire user attribute information; the first processing unit is configured to identify recommended tracks corresponding to the user attribute information in a preset track library; and the first execution unit is configured to send the recommended tracks to the user terminal.
In some embodiments, the song editing processing apparatus further comprises a second acquisition unit and a second execution unit. The second acquisition unit is configured to acquire sound information in the recording environment to generate the user's singing audio, wherein the sound information comprises the playing sound of the audio mixing file and the singing sound of the user; and the second execution unit is configured to intercept the singing audio according to a preset sound selection area to generate a selected audio, wherein the sound selection area is a time period selected in the target audio progress bar.
In some embodiments, the song editing processing apparatus further comprises a third acquisition unit, a third processing unit, a first matching unit, and a third execution unit. The third acquisition unit is configured to acquire a singing video of the user within the play time of the audio mixing file, wherein the singing video comprises a face image of the user; the third processing unit is configured to identify first emotion information represented by the face image according to a preset face emotion algorithm; the first matching unit is configured to match the expression data packet corresponding to the first emotion information in a preset expression library; and the third execution unit is configured to render the singing video by adopting the expression data packet.
In some embodiments, the song editing processing apparatus further comprises a fourth acquisition unit, a fourth processing unit, and a fourth execution unit. The fourth acquisition unit is configured to acquire a text document of the lyric text corresponding to the singing video; the fourth processing unit is configured to identify second emotion information represented by the text document according to a preset document identification model; and the fourth execution unit is configured to compare the first emotion information with the second emotion information, match an expression data packet corresponding to the second emotion information in the expression library when they are inconsistent, and overlay that expression data packet on the face image.
In some embodiments, the song editing processing apparatus further comprises a fifth acquisition unit, a fifth processing unit, and a fifth execution unit. The fifth acquisition unit is configured to acquire the first video file; the fifth processing unit is configured to perform a muting process on the first video file; and the fifth execution unit is configured to load the muted first video file into a video area corresponding to the selected audio and generate a second video file with the selected audio.
In some embodiments, the song editing processing apparatus further comprises a storage unit and a setting unit. The storage unit is configured to store the selected audio as a work file in the preset music library; the setting unit is configured to set permission right information for the work file, wherein the permission right information indicates whether the work file is open for use by other users.
With regard to the apparatus in the above-described embodiment, the specific manner in which each unit performs the operation has been described in detail in the embodiment related to the method, and will not be described in detail here.
Fig. 9 is a block diagram illustrating a mobile terminal 900 for song editing processing according to an example embodiment. For example, the mobile terminal 900 may be a mobile telephone, computer, digital broadcast terminal, messaging device, game console, tablet device, medical device, fitness device, personal digital assistant, or the like.
Referring to fig. 9, mobile terminal 900 may include one or more of the following components: a processing component 902, a memory 904, a power component 906, a multimedia component 908, an audio component 910, an input/output (I/O) interface 912, a sensor component 914, and a communication component 916.
Processing component 902 generally controls the overall operation of mobile terminal 900, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. Processing component 902 may include one or more processors 920 to execute instructions to perform all or a portion of the steps of the methods described above. Further, processing component 902 can include one or more units that facilitate interaction between processing component 902 and other components. For example, the processing component 902 can include a multimedia unit to facilitate interaction between the multimedia component 908 and the processing component 902.
Memory 904 is configured to store various types of data to support operation at mobile terminal 900. Examples of such data include instructions for any application or method operating on mobile terminal 900, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 904 may be implemented by any type or combination of volatile or non-volatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.
The power components 906 provide power to the various components of the mobile terminal 900. The power components 906 may include a power management system, one or more power sources, and other components associated with generating, managing, and distributing power for the mobile terminal 900.
The multimedia components 908 include a screen that provides an output interface between the mobile terminal 900 and a user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive an input signal from a user. The touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 908 includes a front facing camera and/or a rear facing camera. The front camera and/or the rear camera may receive external multimedia data when the mobile terminal 900 is in an operation mode, such as a photographing mode or a video mode. Each front camera and rear camera may be a fixed optical lens system or have a focal length and optical zoom capability.
The audio component 910 is configured to output and/or input audio signals. For example, audio component 910 includes a Microphone (MIC) configured to receive external audio signals when mobile terminal 900 is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signals may further be stored in the memory 904 or transmitted via the communication component 916. In some embodiments, audio component 910 also includes a speaker for outputting audio signals.
The I/O interface 912 provides an interface between the processing component 902 and peripheral interface units, which may be keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.
The sensor component 914 includes one or more sensors for providing various aspects of state assessment for the mobile terminal 900. For example, the sensor assembly 914 may detect an open/closed state of the mobile terminal 900, the relative positioning of components such as its display and keypad, a change in position of the mobile terminal 900 or one of its components, the presence or absence of user contact with the mobile terminal 900, the orientation or acceleration/deceleration of the mobile terminal 900, and a change in its temperature. The sensor assembly 914 may include a proximity sensor configured to detect the presence of a nearby object without any physical contact, and may also include a light sensor, such as a CMOS or CCD image sensor, for imaging applications. In some embodiments, the sensor assembly 914 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 916 is configured to facilitate communications between the mobile terminal 900 and other devices in a wired or wireless manner. The mobile terminal 900 may access a wireless network based on a communication standard, such as WiFi, an operator network (e.g., 2G, 3G, 4G, or 5G), or a combination thereof. In an exemplary embodiment, the communication component 916 receives a broadcast signal or broadcast associated information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 916 further includes a Near Field Communication (NFC) unit to facilitate short-range communications. For example, the NFC unit may be implemented based on Radio Frequency Identification (RFID) technology, infrared data association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the mobile terminal 900 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, micro-controllers, microprocessors or other electronic components for performing the above-described song editing processing methods.
In an exemplary embodiment, a non-transitory computer readable storage medium comprising instructions, such as the memory 904 comprising instructions, executable by the processor 920 of the mobile terminal 900 to perform the above-described method is also provided. For example, the non-transitory computer readable storage medium may be a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
Fig. 10 is a block diagram illustrating an electronic device 1900 for song editing processing according to an example embodiment. For example, the electronic device 1900 may be provided as a server. Referring to fig. 10, electronic device 1900 includes a processing component 1922 further including one or more processors and memory resources, represented by memory 1932, for storing instructions, e.g., applications, executable by processing component 1922. The application programs stored in memory 1932 may include one or more units each corresponding to a set of instructions. Further, the processing component 1922 is configured to execute instructions to perform the above-described method.
The electronic device 1900 may also include a power component 1926 configured to perform power management of the electronic device 1900, a wired or wireless network interface 1950 configured to connect the electronic device 1900 to a network, and an input/output (I/O) interface 1958. The electronic device 1900 may operate based on an operating system stored in the memory 1932, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, or the like.
A computer program product is also provided, comprising program instructions which, when executed by a computer, cause the computer to perform the song editing processing method described above.
Other embodiments of the invention will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the invention and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the invention being indicated by the following claims.
It will be understood that the invention is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the invention is limited only by the appended claims.

Claims (16)

1. A song editing processing method, comprising:
acquiring a user instruction to be executed;
retrieving a first audio file and a second audio file of target music according to the user instruction, wherein the second audio file is derived from the first audio file;
mixing the first audio file and the second audio file to generate an audio mix file, and playing the audio mix file;
acquiring a singing video of a user within the playing time of the audio mix file, wherein the singing video includes a face image of the user;
identifying first emotion information represented by the face image according to a preset facial emotion recognition algorithm;
matching an expression data packet corresponding to the first emotion information in a preset expression library;
and rendering the singing video using the expression data packet.
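(Illustrative note, not part of the claims.) The method of claim 1 reduces to two operations: sample-wise mixing of the two audio files, and per-frame emotion-driven rendering of the singing video. The Python sketch below shows the shape of that pipeline; `detect_emotion`, `match_expression`, and `overlay` are hypothetical stand-ins for the preset facial emotion algorithm, the expression library lookup, and the renderer, none of which the claim specifies.

```python
import numpy as np

def mix_audio(first: np.ndarray, second: np.ndarray, gain: float = 0.5) -> np.ndarray:
    """Mix two equal-rate float PCM tracks into one 'audio mix file' buffer."""
    n = min(len(first), len(second))
    mixed = gain * first[:n] + (1.0 - gain) * second[:n]
    return np.clip(mixed, -1.0, 1.0)  # guard against clipping after summation

def render_singing_video(frames, detect_emotion, match_expression, overlay):
    """For each captured frame: classify the face's emotion, look up the
    matching expression packet in the library, and render it onto the frame."""
    for frame in frames:
        emotion = detect_emotion(frame)      # the 'preset facial emotion algorithm'
        packet = match_expression(emotion)   # lookup in the preset expression library
        yield overlay(frame, packet)         # rendered output frame
```

The three callables are deliberately left abstract because the claim does not commit to any particular recognition model or rendering scheme.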
2. The song editing processing method of claim 1, wherein the first audio file is a mixed audio file and the second audio file is an accompaniment audio file.
3. The song editing processing method according to claim 1, further comprising, before acquiring the user instruction to be executed:
acquiring user attribute information;
identifying recommended songs corresponding to the user attribute information in a preset song library;
and sending the recommended songs to a user terminal.
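(Illustrative note.) Claim 3 does not specify how recommended songs are matched to user attributes; a minimal sketch, assuming both the attributes and each library song are described by tag sets, is plain set-overlap ranking:

```python
def recommend_songs(user_attrs: set, song_library: dict, k: int = 10) -> list:
    """Rank library songs by tag overlap with the user's attribute set."""
    scored = sorted(
        ((len(user_attrs & tags), title) for title, tags in song_library.items()),
        reverse=True,
    )
    return [title for score, title in scored[:k] if score > 0]

# e.g. recommend_songs({"pop", "2018"}, {"Song A": {"pop", "2018"}, "Song B": {"rock"}})
# -> ["Song A"]
```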
4. The song editing processing method according to claim 1, wherein after mixing the first audio file and the second audio file to generate the audio mix file and playing the audio mix file, the method further comprises:
acquiring sound information in a recording environment to generate singing audio of the user, wherein the sound information includes the playback sound of the audio mix file and the singing voice of the user;
and clipping the singing audio according to a preset sound selection area to generate selected audio, wherein the sound selection area is a time period selected on the target audio progress bar.
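(Illustrative note.) The clipping step of claim 4 amounts to slicing the recorded buffer between the two timestamps picked on the progress bar; a minimal sketch, assuming float PCM samples at a known sample rate:

```python
import numpy as np

def clip_selection(singing_audio: np.ndarray, sample_rate: int,
                   start_s: float, end_s: float) -> np.ndarray:
    """Cut the time period selected on the progress bar out of the recording."""
    lo = max(0, int(start_s * sample_rate))
    hi = min(len(singing_audio), int(end_s * sample_rate))
    return singing_audio[lo:hi].copy()
```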
5. The song editing processing method according to claim 1, further comprising, after identifying the first emotion information represented by the face image according to the preset facial emotion recognition algorithm:
acquiring the lyric text document corresponding to the singing video;
identifying second emotion information represented by the text document according to a preset document recognition model;
and comparing the first emotion information with the second emotion information; when the two are inconsistent, matching an expression data packet corresponding to the second emotion information in the expression library and overlaying that expression data packet on the face image.
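(Illustrative note.) Claim 5 gives the lyric-derived (second) emotion priority when the two sources disagree; a minimal sketch of that reconciliation, with `expression_library` as a hypothetical emotion-to-packet mapping:

```python
def choose_expression(first_emotion: str, second_emotion: str,
                      expression_library: dict):
    """Per claim 5: when the face-derived (first) and lyric-derived (second)
    emotions disagree, return the packet for the lyric emotion so it can be
    overlaid on the face image; when they agree, keep the packet from claim 1."""
    if first_emotion != second_emotion:
        return expression_library.get(second_emotion)
    return None  # no override needed
```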
6. The song editing processing method of claim 4, wherein after clipping the singing audio according to the preset sound selection area to generate the selected audio, the method further comprises:
acquiring a first video file, wherein the first video file includes a video file loaded through a video link;
performing muting processing on the first video file;
and loading the muted first video file into the video area corresponding to the selected audio to generate a second video file carrying the selected audio.
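(Illustrative note.) The mute-then-attach sequence of claim 6 maps onto two standard ffmpeg invocations (assuming ffmpeg is installed; all file names here are placeholders): strip the original audio track without re-encoding the video, then mux the selected singing audio onto the muted video.

```python
import subprocess

def mute_and_attach(video_in: str, selected_audio: str, video_out: str) -> None:
    """Mute the first video file, then attach the selected audio to it."""
    # Step 1: drop the audio stream (-an) while copying the video stream untouched.
    subprocess.run(["ffmpeg", "-y", "-i", video_in, "-an",
                    "-c:v", "copy", "muted_tmp.mp4"], check=True)
    # Step 2: mux the selected singing audio onto the muted video;
    # -shortest trims the output to the shorter of the two streams.
    subprocess.run(["ffmpeg", "-y", "-i", "muted_tmp.mp4", "-i", selected_audio,
                    "-c:v", "copy", "-c:a", "aac", "-shortest", video_out],
                   check=True)
```

Copying the video stream (`-c:v copy`) keeps both passes fast, since only the audio is touched.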
7. The song editing processing method of claim 4, wherein after clipping the singing audio according to the preset sound selection area to generate the selected audio, the method further comprises:
storing the selected audio as a work file in a preset music library;
and setting permission information for the work file, wherein the permission information indicates whether the work file is open for use by other users.
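(Illustrative note.) The work file of claim 7 amounts to the selected audio plus owner and permission metadata; a minimal sketch of that record and its storage, with all field names hypothetical:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class WorkFile:
    owner_id: str
    audio_path: str                 # path of the stored selected audio
    open_to_others: bool = False    # the 'permission information' of claim 7

@dataclass
class MusicLibrary:
    works: List[WorkFile] = field(default_factory=list)

    def store(self, work: WorkFile) -> None:
        """Store the selected audio as a work file in the library."""
        self.works.append(work)

    def usable_by(self, user_id: str) -> List[WorkFile]:
        """A work is usable by another user only when its permission flag is open."""
        return [w for w in self.works if w.open_to_others or w.owner_id == user_id]
```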
8. A song editing processing apparatus, comprising:
an acquisition unit configured to acquire a user instruction to be executed;
a processing unit configured to retrieve a first audio file and a second audio file of target music according to the user instruction, wherein the second audio file is derived from the first audio file;
an execution unit configured to mix the first audio file and the second audio file to generate an audio mix file, and to play the audio mix file;
a third obtaining unit configured to obtain a singing video of a user within the playing time of the audio mix file, wherein the singing video includes a face image of the user;
a third processing unit configured to identify first emotion information represented by the face image according to a preset facial emotion recognition algorithm;
a first matching unit configured to match the expression data packet corresponding to the first emotion information in a preset expression library;
and a third execution unit configured to render the singing video using the expression data packet.
9. The song editing processing apparatus of claim 8, wherein the first audio file is a mixed audio file and the second audio file is an accompaniment audio file.
10. The song editing processing apparatus according to claim 8, further comprising:
a first acquisition unit configured to acquire user attribute information;
a first processing unit configured to identify recommended songs corresponding to the user attribute information in a preset song library;
and a first execution unit configured to send the recommended songs to a user terminal.
11. The song editing processing apparatus according to claim 8, further comprising:
a second obtaining unit configured to obtain sound information in a recording environment to generate singing audio of a user, wherein the sound information includes the playback sound of the audio mix file and the singing voice of the user;
and a second execution unit configured to clip the singing audio according to a preset sound selection area to generate selected audio, wherein the sound selection area is a time period selected on the target audio progress bar.
12. The song editing processing apparatus according to claim 8, further comprising:
a fourth acquisition unit configured to acquire the lyric text document corresponding to the singing video;
a fourth processing unit configured to identify second emotion information represented by the text document according to a preset document recognition model;
and a fourth execution unit configured to compare the first emotion information with the second emotion information and, when the two are inconsistent, to match an expression data packet corresponding to the second emotion information in the expression library and overlay that expression data packet on the face image.
13. The song editing processing apparatus according to claim 11, further comprising:
a fifth acquisition unit configured to acquire the first video file;
a fifth processing unit configured to perform muting processing on the first video file;
and a fifth execution unit configured to load the muted first video file into the video area corresponding to the selected audio and generate a second video file carrying the selected audio.
14. The song editing processing apparatus according to claim 11, further comprising:
a storage unit configured to store the selected audio as a work file in a preset music library;
and a setting unit configured to set permission information for the work file, wherein the permission information indicates whether the work file is open for use by other users.
15. An electronic device, comprising:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to perform the steps of the song editing processing method according to any one of claims 1 to 7.
16. A non-transitory computer readable storage medium having instructions stored therein which, when executed by a processor of a mobile terminal, enable the mobile terminal to perform the song editing processing method of any one of claims 1 to 7.
CN201810827717.1A 2018-07-25 2018-07-25 Song editing processing method and device, electronic equipment and storage medium Active CN109147745B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810827717.1A CN109147745B (en) 2018-07-25 2018-07-25 Song editing processing method and device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN109147745A CN109147745A (en) 2019-01-04
CN109147745B true CN109147745B (en) 2020-03-10

Family

ID=64797895

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810827717.1A Active CN109147745B (en) 2018-07-25 2018-07-25 Song editing processing method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN109147745B (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109887523B * 2019-01-21 2021-06-18 Beijing Xiaochang Technology Co., Ltd. Audio data processing method and device for singing application, electronic equipment and storage medium
CN109599083B * 2019-01-21 2022-07-29 Beijing Xiaochang Technology Co., Ltd. Audio data processing method and device for singing application, electronic equipment and storage medium
CN110211556B * 2019-05-10 2022-07-08 Beijing ByteDance Network Technology Co., Ltd. Music file processing method, device, terminal and storage medium
CN111383669B * 2020-03-19 2022-02-18 Hangzhou NetEase Cloud Music Technology Co., Ltd. Multimedia file uploading method, device, equipment and computer readable storage medium
CN111583973B * 2020-05-15 2022-02-18 Guangdong OPPO Mobile Telecommunications Corp., Ltd. Music sharing method and device and computer readable storage medium
CN111767431A * 2020-06-29 2020-10-13 Beijing ByteDance Network Technology Co., Ltd. Method and device for video dubbing
CN113590076B * 2021-07-12 2024-03-29 Hangzhou NetEase Cloud Music Technology Co., Ltd. Audio processing method and device
CN113656635B * 2021-09-03 2024-04-09 Migu Music Co., Ltd. Video color ring synthesis method, device, equipment and computer readable storage medium

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPWO2007066813A1 (en) * 2005-12-09 2009-05-21 Sony Corporation Music editing apparatus, music editing information creation method, and recording medium on which music editing information is recorded
US8605564B2 (en) * 2011-04-28 2013-12-10 Mediatek Inc. Audio mixing method and audio mixing apparatus capable of processing and/or mixing audio inputs individually
EP3614304A1 (en) * 2014-11-05 2020-02-26 INTEL Corporation Avatar video apparatus and method
CN106653067B * 2015-10-28 2020-03-17 Tencent Technology (Shenzhen) Co., Ltd. Information processing method and terminal
CN106341723A * 2016-09-30 2017-01-18 Guangzhou Huaduo Network Technology Co., Ltd. Bullet screen display method and apparatus
CN106652997B * 2016-12-29 2020-07-28 Tencent Music Entertainment (Shenzhen) Co., Ltd. Audio synthesis method and terminal

Also Published As

Publication number Publication date
CN109147745A (en) 2019-01-04

Similar Documents

Publication Publication Date Title
CN109147745B (en) Song editing processing method and device, electronic equipment and storage medium
CN111526242B (en) Audio processing method and device and electronic equipment
US10661175B2 (en) Intelligent user-based game soundtrack
CN107644646B (en) Voice processing method and device for voice processing
CN111787395B (en) Video generation method and device, electronic equipment and storage medium
WO2018018482A1 (en) Method and device for playing sound effects
WO2020042827A1 (en) Network work sharing method and apparatus, and server and storage medium
CN107994879B (en) Loudness control method and device
CN111131875A (en) Information display method, device and system, electronic equipment and storage medium
CN112653902B (en) Speaker recognition method and device and electronic equipment
CN109474562B (en) Display method and device of identifier, and response method and device of request
WO2022198934A1 (en) Method and apparatus for generating video synchronized to beat of music
US20090144071A1 (en) Information processing terminal, method for information processing, and program
CN112068711A (en) Information recommendation method and device of input method and electronic equipment
US20230067090A1 (en) Methods, systems and devices for providing portions of recorded game content in response to an audio trigger
CN112188230A (en) Virtual resource processing method and device, terminal equipment and server
CN111404808B (en) Song processing method
CN110943908A (en) Voice message sending method, electronic device and medium
CN110660375A (en) Method, device and equipment for generating music
CN113852767A (en) Video editing method, device, equipment and medium
JP2010140278A (en) Voice information visualization device and program
CN113111220A (en) Video processing method, device, equipment, server and storage medium
CN107689229A (en) A kind of method of speech processing and device for wearable device
CN110162710A (en) Information recommendation method and device under input scene
CN114356068B (en) Data processing method and device and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant