CN117198249A - Audio processing method, device, computer equipment and storage medium


Info

Publication number
CN117198249A
Authority
CN
China
Prior art keywords
information
lyric
target
audio
accompaniment
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311166944.1A
Other languages
Chinese (zh)
Inventor
袁帅
孙尚杰
李佩道
黄益修
陈权河
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Zitiao Network Technology Co Ltd
Original Assignee
Beijing Zitiao Network Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Zitiao Network Technology Co Ltd filed Critical Beijing Zitiao Network Technology Co Ltd
Priority to CN202311166944.1A
Publication of CN117198249A
Legal status: Pending

Landscapes

  • Reverberation, Karaoke And Other Acoustics (AREA)

Abstract

The invention relates to the technical field of computers and discloses an audio processing method, an audio processing device, computer equipment, and a storage medium. The method comprises: obtaining target information and generating lyric information based on the target information; determining melody information based on the lyric information; determining a song accompaniment corresponding to the lyric information based on the speed information in the melody information; and determining target audio based on fusion of the lyric information and the song accompaniment. The method automates processing from target information to target audio, so that users without extensive creation experience can still produce target audio.

Description

Audio processing method, device, computer equipment and storage medium
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to an audio processing method, an audio processing device, a computer device, and a storage medium.
Background
Song creation mainly consists of two parts, lyrics and song accompaniment, and is difficult for users with little or no creation experience. Even experienced creators are often skilled in only one of lyric writing or arranging, so completing the creation of an entire song remains difficult.
Disclosure of Invention
In view of the above, the present disclosure provides an audio processing method, apparatus, computer device, and storage medium to address the difficulty of song creation.
In a first aspect, the present disclosure provides an audio processing method, the method comprising:
acquiring target information and generating lyric information based on the target information;
carrying out semantic analysis on the lyric information, and determining melody information corresponding to the lyric information based on the result of the semantic analysis;
determining a song accompaniment corresponding to the lyric information based on the speed information in the melody information;
and determining target audio based on fusion of the lyric information and the song accompaniment.
In a second aspect, the present disclosure provides an audio processing apparatus, the apparatus comprising:
the target information acquisition module is used for acquiring target information and generating lyric information based on the target information;
the melody information determining module is used for carrying out semantic analysis on the lyric information and determining melody information corresponding to the lyric information based on the result of the semantic analysis;
the song accompaniment determining module is used for determining song accompaniment corresponding to the lyric information based on the speed information in the melody information;
And the lyric song fusion module is used for determining target audio based on fusion of the lyric information and the song accompaniment.
In a third aspect, the present disclosure provides a computer device, comprising a memory and a processor communicatively connected to each other. The memory stores computer instructions, and the processor executes the computer instructions to perform the audio processing method of the first aspect or any of its corresponding embodiments.
In a fourth aspect, the present disclosure provides a computer-readable storage medium having stored thereon computer instructions for causing a computer to perform the audio processing method of the first aspect or any of its corresponding embodiments.
According to the audio processing method provided by the embodiments of the present disclosure, lyric information is automatically generated from the target information, and semantic analysis is performed on the lyric information to obtain a semantic analysis result, based on which the melody information corresponding to the lyric information is determined. Because the semantic analysis result can represent the meaning, intention, and the like expressed by the lyrics, the melody information is ensured to match the lyric information. The speed information in the melody information is then used to match a corresponding song accompaniment, which improves the accuracy of the accompaniment. Finally, the target audio is obtained based on fusion of the lyric information and the song accompaniment. Automatic processing from target information to target audio is thereby realized, and a user can obtain target audio without extensive creation experience.
Drawings
In order to more clearly illustrate the embodiments of the present disclosure or the prior art, the drawings required in the detailed description are briefly introduced below. It is apparent that the drawings in the following description show some embodiments of the present disclosure, and a person of ordinary skill in the art may obtain other drawings from them without inventive effort.
FIG. 1 is a flow diagram of an audio processing method according to an embodiment of the present disclosure;
FIG. 2 is a flow diagram of another audio processing method according to an embodiment of the present disclosure;
FIG. 3 is a flow chart of yet another audio processing method according to an embodiment of the present disclosure;
FIG. 4 is a schematic diagram of a target information entry page according to an embodiment of the present disclosure;
FIGS. 5a-5b are schematic diagrams of lyric pages according to embodiments of the present disclosure;
FIG. 6 is a schematic diagram of a play page according to an embodiment of the present disclosure;
FIG. 7 is a schematic diagram of a mixing page according to an embodiment of the present disclosure;
FIG. 8 is a block diagram of an audio processing apparatus according to an embodiment of the present disclosure;
FIG. 9 is a schematic diagram of a hardware structure of a computer device according to an embodiment of the present disclosure.
Detailed Description
For the purposes of making the objects, technical solutions, and advantages of the embodiments of the present disclosure more apparent, the technical solutions of the embodiments are described clearly and completely below with reference to the accompanying drawings. It is apparent that the described embodiments are some, but not all, embodiments of the present disclosure. Based on the embodiments of this disclosure, all other embodiments obtained by a person skilled in the art without inventive effort fall within the scope of protection of this disclosure.
In the related art, after a song creation request is received, an adapted target rhythm type is selected using the reference text carried in the request, lyrics and accompaniment are generated according to the target rhythm type, and finally the lyrics and the accompaniment are fused to obtain the audio content of the target song. This approach first selects a target rhythm type from the reference text and then generates the lyrics on that basis, so the lyric content is constrained by the complexity of the rhythm type; moreover, once the user modifies the lyrics, the word count originally fitted to the rhythm type may become unreasonable. It is therefore difficult to meet song creation requirements in this way.
Based on this, an embodiment of the present disclosure provides an audio processing method for creating songs. Specifically, lyric information is generated based on the obtained target information, melody information and a song accompaniment are determined on that basis, and finally the lyric information and the song accompaniment are fused to obtain the target audio. The method determines the lyrics first and then the song accompaniment, thereby achieving automatic song creation from the given target information.
In accordance with the disclosed embodiments, an audio processing method embodiment is provided. It should be noted that the steps illustrated in the flowcharts of the figures may be performed in a computer system, such as a set of computer-executable instructions, and that although a logical order is shown in the flowcharts, in some cases the steps shown or described may be performed in an order different from that illustrated herein.
In this embodiment, an audio processing method is provided, which may be used in the above-mentioned computer device, such as a computer, a mobile phone, or a tablet computer. Fig. 1 is a flowchart of an audio processing method according to an embodiment of the present disclosure; as shown in fig. 1, the flow includes the following steps:
step S101, obtaining target information and generating lyric information based on the target information.
The target information is information input through a provided interactive interface. For example, song creation application software runs on the computer device; the user opens the application software, enters its creation interface through interaction, and inputs the target information in the creation interface, whereupon the target information is acquired. The form of the target information includes, but is not limited to, text, audio, pictures, or video, and the specific form is set according to actual requirements.
If the target information is text, lyric information can be generated by performing semantic understanding on the text to obtain the mood or meaning it characterizes, and then creating lyrics on that basis. If the target information is audio, the audio is converted into text, from which the lyric information is then generated. If the target information is a picture, the elements in the picture are analyzed to determine the scene, mood, or meaning it represents, and the lyrics are then generated. If the target information is video, the audio in the video is converted into text and combined with the meaning represented by the video frames to form text information, from which the lyric information is generated.
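As an illustration of the modality-dependent generation described above, the following Python sketch dispatches the target information by form before lyric creation. It is a minimal sketch only; the converter functions are passed in as parameters because the disclosure does not name any concrete speech-recognition, image-analysis, or lyric-generation model.

```python
# Illustrative sketch: dispatch the target information by modality before
# generating lyrics. All converter callables are hypothetical placeholders.
def lyrics_from_target(target, modality, speech_to_text, caption_image,
                       generate_lyrics):
    if modality == "text":
        text = target                          # use the entered text directly
    elif modality == "audio":
        text = speech_to_text(target)          # convert the audio to text first
    elif modality == "picture":
        text = caption_image(target)           # analyze scene/mood elements
    elif modality == "video":
        # transcribe the soundtrack and combine it with the frames' meaning
        text = speech_to_text(target["audio"]) + " " + caption_image(target["frames"])
    else:
        raise ValueError(f"unsupported modality: {modality}")
    return generate_lyrics(text)               # semantic understanding + creation
```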
Of course, the form of the target information is not limited to the above, and may include other forms, and is specifically set according to actual requirements. After the target information is determined, corresponding lyric information is generated on the basis.
Step S102, melody information is determined based on the lyric information.
The lyric information characterizes mood or scene information, and the melody information corresponding to the lyric information can be determined by analyzing the lyric information. Analysis of the lyric information includes, but is not limited to, semantic analysis, classification of the scene characterized by the lyric information, and the like.
In some alternative embodiments, the step S102 includes:
Step a1, performing semantic analysis on the lyric information to obtain a semantic analysis result.
Step a2, determining melody information based on the result of the semantic analysis.
After the lyric information is obtained, semantic analysis is performed on it to obtain a semantic analysis result. The semantic analysis result includes, but is not limited to, the lyric length, the lyric word count, paragraphs, the lyric theme, and the like. The lyric theme may also be understood as the scene characterized by the lyrics, e.g., a resting scene, a sports scene, a playing scene, etc.
The melody information includes a rhythm, a chord, and the like. The rhythm is determined based on the lyric word count in the semantic analysis result, and the chord is determined based on the lyric theme in the semantic analysis result. For example, rhythm templates, chord templates, and the like corresponding to melody information are preset, and the relevant items in the semantic analysis result are matched against these templates to obtain a target rhythm template, a target chord template, and the like, thereby obtaining the melody information corresponding to the lyric information.
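A minimal sketch of this template matching is given below. The template tables, their keys, and the use of per-line word count are assumptions made for illustration; the patent does not disclose concrete templates.

```python
# Hypothetical preset templates: rhythm keyed by per-line word count,
# chords keyed by lyric theme. All entries are invented for illustration.
RHYTHM_TEMPLATES = {
    5: "1 bar of 4 beats",
    7: "2 bars of 4 beats",
    10: "2 bars of 4 beats with pickup",
}
CHORD_TEMPLATES = {
    "rest": ["C", "Am", "F", "G"],
    "sports": ["Em", "C", "G", "D"],
}

def match_melody(lyric_lines, theme):
    # one rhythm template per lyric line, chosen by that line's word count
    rhythm = [RHYTHM_TEMPLATES.get(len(line.split()), "default template")
              for line in lyric_lines]
    chords = CHORD_TEMPLATES.get(theme, CHORD_TEMPLATES["rest"])
    return {"rhythm": rhythm, "chords": chords}
```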
Step S103, determining the song accompaniment corresponding to the lyric information based on the speed information in the melody information.
Since the scene information corresponding to different lyric themes differs, the speed information in the melody information differs accordingly. For example, in a sports scene the melody speed represented by the speed information is faster, while in a quiet scene it is slower. The speed information in the melody information is obtained by combining different chords and rhythms; that is, it is determined based on the chord, rhythm, paragraph, and other contents included in the melody information. Accordingly, the speed information is used to determine the song accompaniment corresponding to the lyric information.
Step S104, determining target audio based on fusion of lyric information and the song accompaniment.
Since the melody information is determined, the lyrics and the melody can be aligned, and because the song accompaniment is obtained from the melody information, the lyric information and the song accompaniment can likewise be fused based on time alignment, thereby obtaining the target audio. Of course, when forming the target audio, in addition to fusing the lyric information with the song accompaniment, a human voice may be fused in as well, i.e., the target audio is further rendered as human-voice singing. The voice to be fused may be selected from a provided voice library or obtained by other means, which is not limited here.
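The time-aligned fusion can be pictured with the following sketch, which assumes the sung vocal and the accompaniment are already rendered as mono sample arrays at the same sample rate; the offset handling and normalization are illustrative choices, not details from the patent.

```python
import numpy as np

# Minimal sketch: overlay a time-aligned vocal onto the accompaniment track.
def fuse(vocal: np.ndarray, accompaniment: np.ndarray,
         vocal_offset_samples: int = 0) -> np.ndarray:
    out = accompaniment.astype(np.float32).copy()
    end = min(len(out), vocal_offset_samples + len(vocal))
    out[vocal_offset_samples:end] += vocal[:end - vocal_offset_samples]
    peak = float(np.max(np.abs(out))) or 1.0   # avoid division by zero
    return out / peak                          # normalize to prevent clipping
```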
According to the audio processing method provided by this embodiment, lyric information is automatically generated from the target information, and semantic analysis is performed on the lyric information to obtain a semantic analysis result, based on which the melody information corresponding to the lyric information is determined. Because the semantic analysis result can represent the meaning, intention, and the like expressed by the lyrics, the melody information is ensured to match the lyric information. The speed information in the melody information is then used to match a corresponding song accompaniment, which improves the accuracy of the accompaniment. Finally, the target audio is obtained based on fusion of the lyric information and the song accompaniment, so that automatic processing from target information to target audio is realized, and a user can obtain target audio without extensive creation experience.
In this embodiment, an audio processing method is provided, which may be used in the above-mentioned computer device, such as a computer, a mobile phone, or a tablet computer. Fig. 2 is a flowchart of the audio processing method according to an embodiment of the present disclosure; as shown in fig. 2, the flow includes the following steps:
step S201, obtaining target information and generating lyric information based on the target information. Please refer to the detailed description of step S101 in the embodiment shown in fig. 1, which is not repeated here.
Step S202, melody information is determined based on the lyric information.
Specifically, the step S202 includes:
Step S2021, performing semantic analysis on the lyric information to obtain a semantic analysis result.
The result of the semantic analysis comprises the lyric word count, the lyric length, and the lyric theme. The semantic analysis may be performed by a pre-trained network model whose input is the lyric information and whose output is the corresponding semantic analysis result. For example, the output of the network model includes three branches that output the lyric word count, the lyric length, and the lyric theme, respectively. The structure of the network model is not limited in any way and is set according to actual requirements.
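By way of illustration, such a three-branch model could be organized as below. This is a sketch under assumed choices (a PyTorch GRU encoder, invented dimensions); the patent fixes only the input (lyric information) and the three outputs, not the architecture.

```python
import torch
import torch.nn as nn

# Assumed architecture: a shared text encoder feeding three output branches
# for lyric word count, lyric length, and lyric theme.
class LyricAnalyzer(nn.Module):
    def __init__(self, vocab_size=30000, dim=256, num_themes=16):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.encoder = nn.GRU(dim, dim, batch_first=True)
        self.word_count_head = nn.Linear(dim, 1)      # regresses word count
        self.length_head = nn.Linear(dim, 1)          # regresses lyric length
        self.theme_head = nn.Linear(dim, num_themes)  # classifies lyric theme

    def forward(self, token_ids):                     # token_ids: (batch, seq)
        _, hidden = self.encoder(self.embed(token_ids))
        h = hidden[-1]                                # final state, last layer
        return (self.word_count_head(h),
                self.length_head(h),
                self.theme_head(h))
```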
Step S2022, determining a rhythm corresponding to the lyric information based on the lyric word count.
The lyrics are split sentence by sentence according to word count, and each sentence corresponds to a rhythm template. The rhythm template is determined from the lyric word count, e.g., a line of lyrics spans several bars, each bar containing several beats. After the rhythm template is determined, the rhythm corresponding to the lyric information is obtained.
Step S2023, determining the paragraph corresponding to the lyric information based on the lyric length.
The paragraph corresponds to the lyric length and may also be expressed in the form of a paragraph template: for example, the lyric length obtained by the semantic analysis is matched against paragraph templates to determine the paragraph template corresponding to the lyric information, thereby obtaining the corresponding paragraph.
Step S2024, determining the chord corresponding to the lyric information based on the lyric theme.
Wherein the melody information includes a rhythm, a paragraph, and a chord.
As described above, the lyric theme characterizes scene information, and different scenes match different chords. The lyric theme is therefore used to match the corresponding chord template, thereby obtaining the chord corresponding to the lyric information.
Step S203, based on the speed information in the melody information, determining the song accompaniment corresponding to the lyric information.
Specifically, the step S203 includes:
Step S2031, determining a current speed interval based on the speed information.
The speed information in the melody information is obtained from the chord, paragraph, rhythm, and other information contained in the melody information, and the speed intervals are divided in advance according to requirements. The speed information is matched against each speed interval to determine the current speed interval.
Step S2032, matching song accompaniment templates based on the current speed interval, and determining a target song accompaniment template to obtain the song accompaniment corresponding to the lyric information.
Each speed interval corresponds to song accompaniment templates, so after the current speed interval is obtained, the corresponding target song accompaniment template can be determined, thereby obtaining the song accompaniment corresponding to the lyric information.
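The interval lookup and template recommendation might look like the following sketch. The BPM breakpoints, category names, and sub-template lists are invented for illustration; the disclosure states only that the intervals are divided in advance and each maps to accompaniment templates.

```python
import bisect

TEMPO_BOUNDS = [80, 110, 140]                          # assumed BPM breakpoints
CATEGORIES = ["ballad", "pop", "dance", "electronic"]  # assumed categories
SUB_TEMPLATES = {                                      # assumed sub-templates
    "ballad": ["ballad_a", "ballad_b"],
    "pop": ["pop_a", "pop_b", "pop_c"],
    "dance": ["dance_a"],
    "electronic": ["edm_a", "edm_b"],
}

def accompaniment_for(bpm: float):
    # map the tempo to its speed interval, then to a template category
    category = CATEGORIES[bisect.bisect_right(TEMPO_BOUNDS, bpm)]
    candidates = SUB_TEMPLATES[category]
    recommended = candidates[0]          # first sub-template is recommended
    return recommended, candidates       # user may switch among the others
```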
In some alternative embodiments, step S2032 includes:
Step a1, matching song accompaniment templates based on the current speed interval, and determining a recommended song accompaniment template.
Step a2, acquiring a switching instruction for the recommended song accompaniment template to determine the target song accompaniment template.
The song accompaniment template matched to the current speed interval is called the recommended song accompaniment template. The user may directly use the recommended template as the target song accompaniment template, or may first switch templates and then determine the target template. For example, the song accompaniment templates corresponding to the current speed interval include a plurality of selectable sub-templates, one of which is selected as the recommended template; the user may also switch among the remaining sub-templates, generating a switching instruction to determine the target song accompaniment template.
In a specific application example, each speed interval corresponds to a category of song accompaniment templates, and each category includes a plurality of templates. The target category is determined using the current speed interval, and one of the templates in the target category is selected as the recommended song accompaniment template. The recommended template can be fused with the lyric information, and the fused audio played. If the user needs to adjust the fused audio, the user generates a switching instruction through interaction and switches templates, i.e., selects another template from the target category as the target song accompaniment template.
On the basis of recommending a song accompaniment template, the method also allows the user to switch templates, ensuring that the determined target song accompaniment template meets the user's needs.
In some alternative embodiments, step a2 includes:
Step a21, displaying identifications of the selectable song accompaniment templates.
Step a22, obtaining an operation instruction on the identification of the target song accompaniment template to determine the target song accompaniment template.
As described above, each speed interval corresponds to a category of song accompaniment templates, and different categories are represented by different identifications. For example, an identification may be a cover image or a corresponding text title. Within the same category, different song accompaniment templates are likewise distinguished by different identifications. When switching templates, the user interacts with the corresponding identification to generate an operation instruction on the identification of the target song accompaniment template, thereby determining the target song accompaniment template.
When song accompaniment templates are switched, the identifications of the selectable templates are displayed for the user to select from, thereby determining the target song accompaniment template.
Step S204, determining target audio based on fusion of lyric information and the song accompaniment. Please refer to the detailed description of step S104, which is not repeated here.
According to the audio processing method provided by this embodiment, the lyric word count in the semantic analysis result is used to determine the rhythm, the lyric length to determine the paragraphs, and the lyric theme to determine the chords. Determining the melody information from the semantic analysis result ensures its accuracy. Furthermore, different speed intervals correspond to different song accompaniment templates, and determining the template from the speed information ensures that the obtained target song accompaniment matches the melody.
In this embodiment, an audio processing method is provided, which may be used in the above-mentioned computer device, such as a computer, a mobile phone, or a tablet computer. Fig. 3 is a flowchart of the audio processing method according to an embodiment of the present disclosure; as shown in fig. 3, the flow includes the following steps:
step S301, target information is acquired, and lyric information is generated based on the target information.
As described above, the target information is input interactively by the user and may be text information, audio, video, pictures, etc. After the target information is obtained, lyric information is generated in a corresponding manner according to the form of the target information.
In some alternative embodiments, the target information includes text information, and based on this, the acquiring target information in step S301 includes: and acquiring an operation instruction of a first control in the target information input page so as to acquire text information.
The target information input page can be provided with a plurality of controls which are respectively used for inputting target information in a corresponding form. For example, a first control corresponds to the input of text information, a second control corresponds to the input of audio information, a third control corresponds to the input of lyrics continuation, and so on. Of course, the number of controls is only an example, and does not limit the protection scope of the disclosure, and the number of controls is specifically set according to actual requirements.
For example, fig. 4 shows a schematic diagram of the target information input page, and three controls are sequentially shown from left to right at the bottom of the target information input page, where the three controls correspond to the first control, the second control, and the third control described above, respectively. Of course, the representation and the positions of the respective controls in fig. 4 are only an example, and the specific representation is set according to actual requirements.
And the user generates an operation instruction of the first control through interaction with the first control, so that text information is input in the input control of the target information input page. For example, after the first control is selected, an input keyboard is displayed in the target information input page, and text information is input in the input control through interaction with the input keyboard.
Determining the target information through text input means that the user need only enter keywords to achieve song creation.
In some alternative embodiments, the target information includes audio or a picture. Based on this, the acquisition target information in step S301 described above includes: and acquiring an operation instruction of a second control in the target information input page so as to acquire audio or pictures.
The second control is represented in fig. 4 in the form of a microphone, indicating that audio input is possible at this time; if a picture is to be input, the second control may be switched to a camera form. Switching methods include, but are not limited to, double-clicking the second control, or providing a switching control at the side of the second control and switching through interaction with it.
If the current input is confirmed to be audio, an operation instruction of the second control is generated through interaction with the second control, and the corresponding audio is obtained. Interaction forms include, but are not limited to, long-pressing the second control until the audio input is completed, or clicking the second control to start the audio input and clicking it again to end it.
Determining the target information by inputting audio or a picture avoids limiting the form in which the user inputs information, enriching song creation scenarios.
In some alternative embodiments, the target information includes audio or a picture. Based on this, generating lyric information based on the target information in step S301 described above includes: and analyzing elements in the audio or the picture, and generating lyric information based on an analysis result.
If audio is acquired, the audio is converted into text information, from which the lyric information is generated; if a picture is acquired, the elements in the picture are analyzed to obtain an analysis result, from which the lyric information is generated. The analysis result includes, but is not limited to, the scene, characters, subject, and the like of the picture.
Audio or pictures must first be analyzed to obtain the corresponding text information, on the basis of which the lyrics are then generated.
In some alternative embodiments, the target information includes lyrics to be continued and a target lyric length. Based on this, acquiring the target information in step S301 described above includes: acquiring an operation instruction of a third control in the target information input page to acquire the lyrics to be continued and the target lyric length.
Song creation is not limited to creation from given keywords; it also includes continuation writing of given lyrics. That is, the target information includes the lyrics to be continued and a target lyric length. The lyrics to be continued are lyrics input by the user, and the target lyric length is the set length of the final lyrics; that is, the lyrics are continued on the basis of the input lyrics to obtain lyric information of the target lyric length.
As shown in fig. 4, the user generates an operation instruction of the third control through interaction with the third control (e.g., a continuation control). Accordingly, an input control for the lyrics and an input control for the lyric length are displayed, and the lyrics to be continued and the target lyric length are entered through interaction with each input control, so as to obtain the lyrics to be continued and the target lyric length.
Supporting continuation writing of lyrics, with song creation performed on that basis, enriches song creation scenarios.
In some alternative embodiments, the target information includes lyrics to be continued and a target lyric length. Based on this, generating lyric information based on the target information in step S301 described above includes: performing lyric continuation based on the lyrics to be continued and the target lyric length, and generating the lyric information.
The lyrics to be continued and the target lyric length are input into a writing model to generate the lyric information. The structure of the writing model is set according to actual requirements and is not limited here. The input of the writing model comprises the lyrics to be continued and the target lyric length; its output is the lyric information, with the target lyric length constraining the length of the generated lyrics.
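A hypothetical interface to such a writing model is sketched below; `model.generate` is a placeholder for whatever text generator is used, and the loop simply enforces the target-length constraint described above.

```python
# Sketch only: continue the input lyrics until the target length is reached.
# `model` is any text generator exposing a hypothetical generate() method.
def continue_lyrics(model, seed_lyrics: str, target_length: int) -> str:
    lyrics = seed_lyrics
    while len(lyrics) < target_length:
        next_line = model.generate(prompt=lyrics,
                                   max_new_tokens=target_length - len(lyrics))
        if not next_line:                    # guard against empty generations
            break
        lyrics += "\n" + next_line
    return lyrics[:target_length]            # the target length caps the output
```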
In some alternative embodiments, either way of generating lyric information supports adjustment of the automatically generated lyric information, which is referred to as initial lyric information for differentiation. Based on this, generating lyric information based on the target information in step S301 described above includes:
Step b1, generating initial lyric information based on the target information.
Step b2, acquiring an adjustment instruction for the initial lyric information to obtain the lyric information.
Initial lyric information is generated using the target information and displayed, and the displayed initial lyric information is confirmed. If adjustment is needed, the user interacts with the initial lyric information to adjust it, generating an adjustment instruction and obtaining the lyric information.
Adjustment of the initial lyric information includes, but is not limited to, adjusting an entire line of the initial lyric information, or adjusting a particular word or phrase in it, to obtain the lyric information.
After the initial lyric information is generated, the lyric information is obtained by adjusting it, which ensures the accuracy of the obtained lyric information.
In some alternative embodiments, step b2 comprises:
Step b201, displaying a lyric page, wherein the lyric page is used for displaying lyrics and includes a progress adjustment control.
Step b202, obtaining an operation instruction of the progress adjustment control to determine the position to be adjusted of the initial lyrics in the lyric page.
Step b203, obtaining an adjustment instruction for the initial lyrics corresponding to the position to be adjusted, so as to obtain the lyric information.
The lyric page includes a progress adjustment control used to locate the position to be adjusted in the initial lyrics. The progress adjustment control may take the form of a progress bar, as shown in fig. 5a, or another form. The user interacts with the progress adjustment control to generate an operation instruction, and accordingly the position to be adjusted is determined in the lyric page.
The position to be adjusted is displayed on the lyric page. After it is determined, the user adjusts the initial lyrics at that position, generating an adjustment instruction so as to obtain the lyric information.
When adjusting the lyric information, locating the position to be adjusted through the progress adjustment control improves the efficiency and accuracy of positioning.
In some alternative embodiments, step b2 comprises:
Step b211, displaying a lyric page, wherein the lyric page is used for displaying lyrics.
Step b212, obtaining a selection instruction of a target lyric paragraph in the lyric page to determine the lyric paragraph to be adjusted.
Step b213, obtaining an adjustment instruction for the initial lyrics corresponding to the lyric paragraph to be adjusted, so as to obtain the lyric information.
The lyric page displays the lyrics, and the lyric paragraph to be adjusted is selected according to a selection instruction on a target lyric paragraph in the lyric page. The user adjusts the initial lyrics corresponding to the paragraph to be adjusted, generating an adjustment instruction and accordingly obtaining the lyric information.
For example, fig. 5b shows the selection of a lyric paragraph: the user makes a selection in the lyric page, generating a selection instruction that determines the lyric paragraph to be adjusted. The selection may be made by double-clicking the area where the paragraph is located, selecting the start of the paragraph, or selecting any position within the paragraph; the specific form is set according to actual requirements.
When adjusting the lyric information, locating the target lyric paragraph enables an entire paragraph of lyrics to be selected and adjusted, improving the efficiency of lyric adjustment.
Step S302, melody information is determined based on the lyric information.
After the melody information is obtained, the melody information is aligned with the lyric information. Please refer to the detailed description of step S202 in the embodiment shown in fig. 2 for the rest, which is not repeated here.
Step S303, determining the song accompaniment corresponding to the lyric information based on the speed information in the melody information. Please refer to the detailed description of step S203 in the embodiment shown in fig. 2, which is not repeated here.
Step S304, determining target audio based on fusion of lyric information and the song accompaniment.
Specifically, the step S304 includes:
Step S3041, obtaining the target voice synthesis timbre.
The application provides a plurality of voice synthesis timbres for the user to choose from. The user selects a target voice synthesis timbre through interaction, and accordingly the target voice synthesis timbre is obtained in response to the selection operation.
Step S3042, fusing the target voice synthesis timbre, the lyric information, and the song accompaniment to determine the target audio.
After the target voice synthesis timbre is determined, the lyric information is sung with the target timbre and combined with the song accompaniment, thereby determining the target audio.
In some alternative embodiments, the step S304 includes:
Step c1, obtaining initial audio based on fusion of the lyric information and the song accompaniment.
Step c2, playing the initial audio and displaying its playing progress on a playing page, wherein the playing page includes a mixing control.
Step c3, acquiring an operation instruction for the mixing control to display a mixing interface.
Step c4, acquiring an adjustment instruction for the audio playing information in the mixing interface to determine the target audio, wherein the audio playing information includes at least one of the target voice synthesis timbre, the voice volume, and the accompaniment volume.
The lyric information is fused with the song accompaniment to obtain the initial audio, which is then played on a playing page. As shown in fig. 6, the playing page includes the current playing progress, a playing progress adjustment control, and several adjustment controls: a mixing control, an export control, and a lyric update control. Specifically, the export control is used to export the currently generated target audio to another application for further adjustment, and the lyric update control is used to update the lyrics so that adjusted lyrics take effect.
A selection instruction for the mixing control is generated through interaction with the mixing control, switching to the mixing interface, in which at least one of the voice synthesis timbre, the voice volume, and the accompaniment volume can be adjusted, that is, the audio playing information is adjusted. For example, as shown in fig. 7, the mixing interface provides four synthesized timbres for the user to select, where M1 and M2 belong to one class of synthesized timbres and W1 and W2 to another. The voice volume and the accompaniment volume are adjusted by sliders; of course, the specific volume adjustment method is not limited to that shown in fig. 7, and other methods may be adopted according to actual requirements.
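The volume sliders of fig. 7 amount to per-track gains applied before summation, as in this sketch; the gain ranges and clipping strategy are illustrative assumptions rather than details from the disclosure.

```python
import numpy as np

# Sketch of the mixing step: scale the vocal and accompaniment tracks by the
# slider gains, then sum and clip to the valid sample range.
def remix(vocal: np.ndarray, accompaniment: np.ndarray,
          vocal_gain: float = 1.0, accomp_gain: float = 1.0) -> np.ndarray:
    n = min(len(vocal), len(accompaniment))
    mix = vocal_gain * vocal[:n] + accomp_gain * accompaniment[:n]
    return np.clip(mix, -1.0, 1.0)

# e.g. halving the accompaniment relative to the voice:
# target_audio = remix(vocal, accompaniment, vocal_gain=1.0, accomp_gain=0.5)
```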
The initial audio obtained after fusion is then subjected to mixing processing, which improves the quality and playing effect of the resulting target audio.
According to the audio processing method provided by this embodiment, the obtained lyric information and song accompaniment are sung with a virtual human voice, so the resulting target audio is human-voice singing and has a good listening effect.
In some optional embodiments, the above audio processing method further includes:
Step d1, importing the target audio generated in the first application into the second application in the form of multi-track audio.
Step d2, displaying the editing interface of the second application.
Step d3, acquiring an operation instruction of a target control in the editing interface of the second application to adjust the target audio and obtain the adjusted target audio.
The first application is used to generate target audio from given target information, and the second application is used to edit the target audio further. After the target audio is determined in the first application, it is imported into the second application in the form of multi-track audio, for example through interaction with the export control in fig. 6, whereupon the editing interface of the second application is displayed.
The editing interface of the second application includes a plurality of editing controls for editing the target audio generated by the first application. Because the target audio is exported from the first application as multi-track audio, the audio of each track can be edited in the second application to obtain the adjusted target audio.
Importing the target audio generated in the first application into the second application for further editing allows each of its audio tracks to be adjusted, yielding better target audio.
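One plausible form of such a multi-track hand-off is writing each stem as a separate audio file plus a manifest the second application can read, as sketched below; the patent does not specify the transfer format, so the WAV stems and JSON manifest here are assumptions.

```python
import json
import wave
import numpy as np

# Sketch: export each track ("stem") as 16-bit mono WAV plus a JSON manifest.
# Assumes float samples in [-1.0, 1.0]; the file layout is hypothetical.
def export_multitrack(stems: dict, sample_rate: int, outdir: str) -> None:
    manifest = {}
    for name, samples in stems.items():   # e.g. {"vocal": ..., "accomp": ...}
        path = f"{outdir}/{name}.wav"
        with wave.open(path, "wb") as f:
            f.setnchannels(1)
            f.setsampwidth(2)             # 16-bit PCM
            f.setframerate(sample_rate)
            f.writeframes((np.asarray(samples) * 32767).astype(np.int16).tobytes())
        manifest[name] = path
    with open(f"{outdir}/tracks.json", "w") as f:
        json.dump(manifest, f)            # the second application reads this
```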
As a specific application embodiment of the present disclosure, a user opens an application for generating target audio and edits text on the target information input page to obtain the target information. For example, the edited text is: "On a sunny morning, birds sing and flowers give off their fragrance." After the text editing is completed, the edited content is confirmed, and accordingly the target information is determined. Initial lyric information is generated based on the target information and displayed on the lyric page. The user adjusts the initial lyric information through interaction with the lyric page, thereby determining the lyric information.
After the lyric information is determined, semantic analysis is performed on it, and the melody information corresponding to the lyric information is determined based on the result of the semantic analysis. A recommended song accompaniment corresponding to the lyric information is determined based on the speed information represented by the melody information and fused with the lyric information, and the fused initial audio is obtained and played. The user switches the recommended song accompaniment as needed, and the switched accompaniment is fused with the lyric information to obtain the target audio. The target audio is played on the playing page, and the user enters the mixing page through interaction with the mixing control on the playing page to adjust the target voice synthesis timbre, the voice volume, and the accompaniment volume, finally determining the target audio. Further, the target audio generated by the first application is imported into the second application as multi-track audio, so that it can be adjusted in the second application's editing interface to obtain the adjusted target audio.
In this embodiment, an audio processing device is further provided, and the device is used to implement the foregoing embodiments and preferred embodiments, and will not be described in detail. As used below, the term "module" may be a combination of software and/or hardware that implements a predetermined function. While the means described in the following embodiments are preferably implemented in software, implementation in hardware, or a combination of software and hardware, is also possible and contemplated.
The present embodiment provides an audio processing apparatus, as shown in fig. 8, including:
the target information acquisition module 801 is configured to acquire target information and generate lyric information based on the target information.
The melody information determining module 802 is configured to determine melody information based on lyric information.
The song accompaniment determining module 803 is configured to determine the song accompaniment corresponding to the lyric information based on the speed information in the melody information.
The lyric song fusion module 804 is configured to determine the target audio based on fusion of the lyric information and the song accompaniment.
In some alternative embodiments, the melody information determination module 802 includes:
the semantic analysis unit is used for carrying out semantic analysis on the lyric information to obtain a semantic analysis result.
And the melody information determining unit is used for determining melody information corresponding to the lyric information based on the result of semantic analysis.
In some alternative embodiments, the result of the semantic analysis includes the lyric word count, the lyric length, and the lyric theme, and the melody information determining unit includes:
And the rhythm determination subunit is used for determining the rhythm corresponding to the lyric information based on the lyric word count.
And the paragraph determining subunit is used for determining the paragraph corresponding to the lyric information based on the lyric length.
And the chord determining subunit is used for determining chords corresponding to the lyric information based on the lyric theme, wherein the melody information comprises rhythms, paragraphs and chords.
In some alternative embodiments, the song accompaniment determining module 803 includes:
and a speed interval determining unit for determining the current speed interval based on the speed information.
And the song accompaniment template matching unit is used for matching song accompaniment templates based on the current speed interval and determining a target song accompaniment template, so as to obtain the song accompaniment corresponding to the lyric information.
In some alternative embodiments, the song accompaniment template matching unit includes:
And the accompaniment template matching subunit is used for matching song accompaniment templates based on the current speed interval and determining a recommended song accompaniment template.
And the accompaniment template switching subunit is used for acquiring a switching instruction for the recommended song accompaniment template, so as to determine the target song accompaniment template.
In some alternative embodiments, the accompaniment template switching subunit includes:
and the accompaniment template display subunit is used for displaying the identification of the selectable song accompaniment template.
The operation instruction acquisition subunit is used for acquiring an operation instruction for identifying the target composition accompaniment template so as to determine the target composition accompaniment template.
In some alternative embodiments, the lyric song fusion module 804 includes:
and the voice synthesis tone acquisition unit is used for acquiring the target voice synthesis tone.
And the fusion unit is used for fusing the target voice synthesized tone, the lyric information and the song accompaniment to determine target audio.
In some alternative embodiments, the target information acquisition module 801 includes:
the text information acquisition unit is used for acquiring an operation instruction of the first control in the target information input page so as to acquire text information, wherein the target information comprises the text information.
In some alternative embodiments, the target information acquisition module 801 includes:
the audio or picture acquisition unit is used for acquiring an operation instruction of the second control in the target information input page so as to acquire audio or pictures, and the target information comprises the audio or pictures.
In some alternative embodiments, the target information acquisition module 801 includes:
and the analysis unit is used for analyzing elements in the audio or the picture and generating lyric information based on an analysis result.
In some alternative embodiments, the target information acquisition module 801 includes:
the lyric continuous writing acquisition unit is used for acquiring an operation instruction of a third control in the target information input page so as to acquire lyrics to be written and a target lyric length, wherein the target information comprises the lyrics to be written and the target lyric length.
In some alternative embodiments, the target information acquisition module 801 includes:
the lyric writing unit is used for writing the lyrics based on the lyrics to be written and the target lyric length, and generating lyric information.
In some alternative embodiments, the target information acquisition module 801 includes:
and an initial lyric information generating unit for generating initial lyric information based on the target information.
And the adjustment instruction acquisition unit is used for acquiring an adjustment instruction of the initial lyric information so as to acquire the lyric information.
In some alternative embodiments, the adjustment instruction fetch unit includes:
the first lyric page display subunit is used for displaying a lyric page, wherein the lyric page is used for displaying lyrics, and the lyric page comprises a progress adjustment control.
The operation instruction acquisition subunit is used for acquiring an operation instruction of the progress adjustment control so as to determine the position to be adjusted of the initial lyrics in the lyrics page.
The first adjustment instruction acquisition subunit is used for acquiring an adjustment instruction of initial lyrics corresponding to the position to be adjusted so as to acquire lyric information.
In some alternative embodiments, the adjustment instruction fetch unit includes:
the second lyric page display subunit is used for displaying a lyric page, and the lyric page is used for displaying lyrics.
The selection instruction acquisition subunit is used for acquiring a selection instruction of a target lyric paragraph in the lyric page, so as to determine the lyric paragraph to be adjusted.
The second adjusting instruction obtaining subunit is used for obtaining an adjusting instruction of the initial lyrics corresponding to the lyrics section to be adjusted so as to obtain lyrics information.
In some alternative embodiments, the lyric song fusion module 804 includes:
and the fusion unit is used for obtaining initial audio based on fusion of the lyric information and the song accompaniment.
The playing unit is used for playing the initial audio, displaying the playing progress of the initial audio on a playing page, and the playing page comprises a sound mixing control.
And the audio mixing interface display unit is used for acquiring an operation instruction for the audio mixing control so as to display the audio mixing interface.
And the target audio determining unit is used for acquiring an adjustment instruction for the audio playing information in the mixing interface, so as to determine the target audio, wherein the audio playing information includes at least one of the target voice synthesis timbre, the voice volume, and the accompaniment volume.
In some alternative embodiments, the apparatus further comprises:
and the audio export module is used for importing the target audio generated in the first application to the second application in a multi-track audio mode.
And the editing interface display module is used for displaying the editing interface of the second application.
The operation instruction acquisition module is used for acquiring an operation instruction of a target control in the editing interface of the second application so as to adjust the target audio and obtain the adjusted target audio.
The audio processing apparatus in this embodiment is presented in the form of functional units, where a unit refers to an ASIC (Application Specific Integrated Circuit), a processor and memory executing one or more pieces of software or firmware, and/or other devices that can provide the above-described functionality.
Further functional descriptions of the above respective modules and units are the same as those of the above corresponding embodiments, and are not repeated here.
The embodiments of the present disclosure also provide a computer device having the audio processing apparatus shown in fig. 8.
Referring to fig. 9, fig. 9 is a schematic structural diagram of a computer device according to an alternative embodiment of the present disclosure. As shown in fig. 9, the computer device includes: one or more processors 10, a memory 20, and interfaces for connecting the various components, including high-speed interfaces and low-speed interfaces. The components are communicatively coupled to each other using different buses and may be mounted on a common motherboard or in other manners as required. The processor may process instructions executed within the computer device, including instructions stored in or on the memory to display graphical information of a GUI on an external input/output device, such as a display device coupled to an interface. In some alternative embodiments, multiple processors and/or multiple buses may be used together with multiple memories, if desired. Likewise, multiple computer devices may be connected, with each device providing a portion of the necessary operations (e.g., as a server array, a set of blade servers, or a multiprocessor system). One processor 10 is illustrated in fig. 9.
The processor 10 may be a central processor, a network processor, or a combination thereof. The processor 10 may further include a hardware chip, among others. The hardware chip may be an application specific integrated circuit, a programmable logic device, or a combination thereof. The programmable logic device may be a complex programmable logic device, a field programmable gate array, a general-purpose array logic, or any combination thereof.
Wherein the memory 20 stores instructions executable by the at least one processor 10 to cause the at least one processor 10 to perform the methods shown in implementing the above embodiments.
The memory 20 may include a storage program area that may store an operating system, at least one application program required for functions, and a storage data area; the storage data area may store data created according to the use of the computer device, etc. In addition, the memory 20 may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage device. In some alternative embodiments, memory 20 may optionally include memory located remotely from processor 10, which may be connected to the computer device via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
Memory 20 may include volatile memory, such as random access memory; the memory may also include non-volatile memory, such as flash memory, hard disk, or solid state disk; the memory 20 may also comprise a combination of the above types of memories.
The computer device further comprises input means 30 and output means 40. The processor 10, memory 20, input device 30, and output device 40 may be connected by a bus or other means, for example by a bus connection in fig. 9.
The input device 30 may receive input numeric or character information and generate key signal inputs related to user settings and function control of the computer device, for example a touch screen, a keypad, a mouse, a trackpad, a touchpad, a pointing stick, one or more mouse buttons, a trackball, a joystick, and the like. The output device 40 may include a display device, auxiliary lighting devices (e.g., LEDs), tactile feedback devices (e.g., vibration motors), and the like. Such display devices include, but are not limited to, liquid crystal displays, light emitting diode displays, and plasma displays. In some alternative implementations, the display device may be a touch screen.
The embodiments of the present disclosure also provide a computer readable storage medium. The methods described above according to the embodiments of the present disclosure may be implemented in hardware or firmware, or as software or computer code storable in a recordable storage medium, or as computer code originally stored in a remote storage medium or a non-transitory machine-readable storage medium, downloaded over a network, and stored in a local storage medium, such that the methods described herein can be processed by software stored on a storage medium using a general purpose computer, a special purpose processor, or programmable or dedicated hardware. The storage medium can be a magnetic disk, an optical disk, a read-only memory, a random access memory, a flash memory, a hard disk, a solid state disk or the like; further, the storage medium may also comprise a combination of memories of the kind described above. It will be appreciated that a computer, processor, microprocessor controller or programmable hardware includes a storage element that can store or receive software or computer code that, when accessed and executed by the computer, processor or hardware, implements the methods illustrated by the above embodiments.
It will be appreciated that, before using the technical solutions disclosed in the embodiments of the present disclosure, the user should be informed, in an appropriate manner in accordance with relevant laws and regulations, of the type, scope of use, and usage scenarios of the personal information involved in the present disclosure, and the user's authorization should be obtained.
For example, in response to receiving an active request from a user, prompt information is sent to the user to explicitly remind the user that the operation requested will require acquiring and using the user's personal information. Thus, the user can autonomously choose, according to the prompt information, whether to provide personal information to software or hardware such as an electronic device, an application program, a server or a storage medium that executes the operations of the technical solution of the present disclosure.
As an alternative but non-limiting implementation, in response to receiving an active request from the user, the prompt information may be sent to the user by way of, for example, a popup window, in which the prompt information may be presented as text. In addition, the popup window may further carry a selection control for the user to choose whether to "agree" or "disagree" to provide personal information to the electronic device.
It will be appreciated that the above-described notification and user authorization process is merely illustrative and not limiting of the implementations of the present disclosure, and that other ways of satisfying relevant legal regulations may be applied to the implementations of the present disclosure.
Although embodiments of the present disclosure have been described in connection with the accompanying drawings, various modifications and variations may be made by those skilled in the art without departing from the spirit and scope of the disclosure, and such modifications and variations are within the scope defined by the appended claims.

Claims (20)

1. A method of audio processing, the method comprising:
acquiring target information and generating lyric information based on the target information;
determining melody information based on the lyric information;
determining a song accompaniment corresponding to the lyric information based on the speed information in the melody information;
and determining target audio based on fusion of the lyric information and the song accompaniment.
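For illustration only, the following minimal Python sketch outlines the four operations recited in claim 1. Every helper in it (generate_lyrics, infer_melody, match_accompaniment, fuse) is a hypothetical placeholder, since the claim names no concrete model or interface.

```python
from dataclasses import dataclass, field

@dataclass
class MelodyInfo:
    bpm: int                                # the "speed information" used in step 3
    chords: list = field(default_factory=list)

def generate_lyrics(target_info: str) -> str:
    # Placeholder: a text-generation model would run here (claims 8-12 cover
    # the possible inputs: text, audio, picture, or lyrics to be written).
    return f"lyrics derived from: {target_info}"

def infer_melody(lyrics: str) -> MelodyInfo:
    # Placeholder: claims 2-3 derive melody from semantic analysis of the lyrics.
    return MelodyInfo(bpm=90)

def match_accompaniment(bpm: int) -> str:
    # Placeholder: claim 4 matches accompaniment templates by speed interval.
    return "accompaniment-template-A"

def fuse(lyrics: str, accompaniment: str) -> dict:
    # Placeholder: claim 7 additionally mixes in a target synthesized voice tone.
    return {"lyrics": lyrics, "accompaniment": accompaniment}

def process(target_info: str) -> dict:
    lyrics = generate_lyrics(target_info)            # step 1
    melody = infer_melody(lyrics)                    # step 2
    accompaniment = match_accompaniment(melody.bpm)  # step 3
    return fuse(lyrics, accompaniment)               # step 4: target audio
```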
2. The method of claim 1, wherein the determining melody information based on the lyric information comprises:
carrying out semantic analysis on the lyric information to obtain a semantic analysis result;
and determining melody information corresponding to the lyric information based on the result of the semantic analysis.
3. The method of claim 2, wherein the result of the semantic analysis comprises a lyric word number, a lyric length, and a lyric theme, and wherein the determining melody information corresponding to the lyric information based on the result of the semantic analysis comprises:
determining a rhythm corresponding to the lyric information based on the lyric word number;
determining a paragraph corresponding to the lyric information based on the lyric length;
and determining a chord corresponding to the lyric information based on the lyric theme, wherein the melody information comprises the rhythm, the paragraph and the chord.
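As a purely illustrative reading of claim 3, the sketch below maps each semantic-analysis result to one melody component; the thresholds and the chord lookup table are invented for the example and are not stated in the claim.

```python
def melody_from_semantics(word_count: int, lyric_length: int, theme: str) -> dict:
    # Lyric word number -> rhythm density (the threshold is an assumption).
    rhythm = "eighth-note" if word_count > 60 else "quarter-note"
    # Lyric length -> paragraph (song-form) structure.
    paragraphs = (["verse", "chorus"] if lyric_length < 16
                  else ["verse", "chorus", "bridge", "chorus"])
    # Lyric theme -> chord progression, via a hypothetical lookup table.
    chord_table = {"love": ["C", "G", "Am", "F"], "longing": ["Am", "F", "C", "G"]}
    chords = chord_table.get(theme, ["C", "F", "G", "C"])
    return {"rhythm": rhythm, "paragraphs": paragraphs, "chords": chords}
```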
4. The method of claim 1, wherein the determining the song accompaniment corresponding to the lyric information based on the speed information in the melody information comprises:
determining a current speed interval based on the speed information;
and matching the song accompaniment templates based on the current speed interval, and determining a target song accompaniment template to obtain song accompaniment corresponding to the lyric information.
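The tempo matching of claims 4 and 5 can be pictured as bucketing BPM into speed intervals and recommending a template tagged with the matching interval; the interval boundaries and template names below are assumptions, not values from the disclosure.

```python
ACCOMPANIMENT_TEMPLATES = {
    "slow":   ["ballad-piano", "ambient-pad"],    # assumed: below 80 BPM
    "medium": ["pop-groove", "acoustic-strum"],   # assumed: 80-120 BPM
    "fast":   ["dance-beat", "rock-drive"],       # assumed: above 120 BPM
}

def speed_interval(bpm: int) -> str:
    if bpm < 80:
        return "slow"
    return "medium" if bpm <= 120 else "fast"

def recommend_template(bpm: int) -> str:
    # Per claim 5, this is only a recommendation; a switching instruction
    # from the user may replace it with another selectable template.
    return ACCOMPANIMENT_TEMPLATES[speed_interval(bpm)][0]
```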
5. The method of claim 4, wherein the matching the song accompaniment templates based on the current speed interval and determining a target song accompaniment template comprises:
matching the song accompaniment templates based on the current speed interval, and determining a recommended song accompaniment template;
and acquiring a switching instruction for the recommended song accompaniment template to determine a target song accompaniment template.
6. The method of claim 5, wherein the acquiring a switching instruction for the recommended song accompaniment template to determine a target song accompaniment template comprises:
displaying identifications of the selectable song accompaniment templates;
and acquiring an operation instruction on the identification of the target song accompaniment template to determine the target song accompaniment template.
7. The method of claim 1, wherein the determining target audio based on fusion of the lyric information and the song accompaniment comprises:
obtaining a target voice synthesized tone;
and fusing the target voice synthesized tone, the lyric information and the song accompaniment to determine the target audio.
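Claim 7's fusion step can be read as rendering the lyrics with the chosen synthesized tone and mixing the resulting vocal against the accompaniment. The sketch below assumes a hypothetical synthesize_vocal stand-in and simple linear gains; the gains also correspond to the voice volume and accompaniment volume of claim 16.

```python
import numpy as np

def synthesize_vocal(lyrics: str, tone: str, n_samples: int) -> np.ndarray:
    # Placeholder waveform; a real system would run singing-voice synthesis
    # of the lyrics with the target synthesized tone.
    return np.zeros(n_samples, dtype=np.float32)

def fuse(lyrics: str, tone: str, accompaniment: np.ndarray,
         vocal_gain: float = 1.0, accomp_gain: float = 0.8) -> np.ndarray:
    vocal = synthesize_vocal(lyrics, tone, len(accompaniment))
    mix = vocal_gain * vocal + accomp_gain * accompaniment
    return np.clip(mix, -1.0, 1.0)   # keep the fused target audio in range
```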
8. The method of claim 1, wherein the obtaining the target information comprises:
and acquiring an operation instruction of a first control in a target information input page to acquire text information, wherein the target information comprises the text information.
9. The method of claim 1, wherein the obtaining the target information comprises:
and acquiring an operation instruction of a second control in a target information input page so as to acquire the audio or the picture, wherein the target information comprises the audio or the picture.
10. The method of claim 9, wherein the generating lyric information based on the target information comprises:
analyzing elements in the audio or the picture, and generating the lyric information based on a result of the analysis.
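A hedged illustration of claim 10: salient elements are extracted from the uploaded audio or picture and turned into a lyric-generation prompt. Both helpers are hypothetical placeholders, as the claim names no recognition or generation model.

```python
def extract_elements(media_path: str) -> list:
    # Placeholder: an image- or audio-tagging model would return labels here.
    return ["sunset", "ocean", "footsteps in sand"]

def lyrics_from_media(media_path: str) -> str:
    elements = extract_elements(media_path)
    prompt = "Write song lyrics featuring: " + ", ".join(elements)
    # Placeholder for the lyric-generation step of claim 1.
    return f"(lyrics generated from prompt: {prompt})"
```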
11. The method of claim 1, wherein the obtaining the target information comprises:
and acquiring an operation instruction of a third control in a target information input page to acquire lyrics to be written and a target lyric length, wherein the target information comprises the lyrics to be written and the target lyric length.
12. The method of claim 11, wherein the generating lyric information based on the target information comprises:
and performing lyric writing based on the lyrics to be written and the target lyric length, and generating lyric information.
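Under stated assumptions, claim 12's writing step might look like the sketch below: the user's draft is extended line by line until the target lyric length is reached, with a trivial placeholder standing in for the real generation model.

```python
def continue_lyrics(draft: str, target_lines: int) -> str:
    lines = draft.splitlines()
    while len(lines) < target_lines:
        # Placeholder: a real system would generate a line conditioned on
        # the draft so far; here we only mark where that output would go.
        lines.append(f"(generated line {len(lines) + 1})")
    return "\n".join(lines[:target_lines])
```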
13. The method of claim 1, wherein the generating lyric information based on the target information comprises:
generating initial lyric information based on the target information;
and acquiring an adjustment instruction for the initial lyric information to acquire the lyric information.
14. The method of claim 13, wherein the obtaining adjustment instructions for the initial lyric information to obtain the lyric information comprises:
displaying a lyric page, wherein the lyric page is used for displaying lyrics, and the lyric page comprises a progress adjustment control;
acquiring an operation instruction of the progress adjustment control to determine the position to be adjusted of the initial lyrics in the lyrics page;
and acquiring an adjustment instruction of the initial lyrics corresponding to the position to be adjusted, so as to acquire the lyric information.
15. The method of claim 13, wherein the obtaining adjustment instructions for the initial lyric information to obtain the lyric information comprises:
displaying a lyric page, wherein the lyric page is used for displaying lyrics;
acquiring a selection instruction of a target lyric paragraph in the lyric page to determine a lyric paragraph to be adjusted;
and acquiring an adjustment instruction of initial lyrics corresponding to the lyrics section to be adjusted so as to acquire the lyrics information.
16. The method of claim 1, wherein the determining target audio based on fusion of the lyric information and the song accompaniment comprises:
obtaining initial audio based on fusion of the lyric information and the song accompaniment;
playing the initial audio, and displaying the playing progress of the initial audio on a playing page, wherein the playing page comprises a sound mixing control;
acquiring an operation instruction of the audio mixing control to display an audio mixing interface;
and acquiring an adjustment instruction of audio playing information in the audio mixing interface to determine the target audio, wherein the audio playing information comprises at least one of target voice synthesized tone, voice volume and accompaniment volume.
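One possible shape for claim 16's mixing adjustment, with field names invented for the example: each adjustment instruction from the mixing interface updates one item of the audio playing information.

```python
from dataclasses import dataclass

@dataclass
class MixSettings:
    tone: str = "default-voice"        # target voice synthesized tone
    vocal_volume: float = 1.0          # linear gain for the synthesized vocal
    accompaniment_volume: float = 0.8  # linear gain for the song accompaniment

def apply_adjustment(settings: MixSettings, **changes) -> MixSettings:
    # Each adjustment instruction from the mixing interface updates one field.
    for key, value in changes.items():
        if not hasattr(settings, key):
            raise KeyError(f"unknown mixing parameter: {key}")
        setattr(settings, key, value)
    return settings
```

For example, apply_adjustment(MixSettings(), vocal_volume=0.9) would model lowering the voice volume before the target audio is finalized.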
17. The method according to any one of claims 1 to 16, further comprising:
importing target audio generated in a first application to a second application in a multi-track audio mode;
displaying an editing interface of the second application;
and acquiring an operation instruction of a target control in an editing interface of the second application so as to adjust the target audio and obtain the adjusted target audio.
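Claim 17's multi-track handoff could be sketched as exporting the generated song as separate stems plus a manifest that the second application reads back; the file layout and track names here are assumptions.

```python
import json
import pathlib

def export_multitrack(tracks: dict, out_dir: str) -> pathlib.Path:
    """tracks maps a track name to raw PCM bytes,
    e.g. {"vocal": b"...", "accompaniment": b"..."}."""
    out = pathlib.Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for name, pcm in tracks.items():
        (out / f"{name}.pcm").write_bytes(pcm)   # one file per track (stem)
    # A small manifest lets the second application rebuild the track layout.
    (out / "manifest.json").write_text(json.dumps({"tracks": sorted(tracks)}))
    return out
```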
18. An audio processing apparatus, the apparatus comprising:
the target information acquisition module is used for acquiring target information and generating lyric information based on the target information;
a melody information determination module for determining melody information based on the lyric information;
the song accompaniment determining module is used for determining song accompaniment corresponding to the lyric information based on the speed information in the melody information;
and the lyric song fusion module is used for determining target audio based on fusion of the lyric information and the song accompaniment.
19. A computer device, comprising:
a memory and a processor in communication with each other, the memory having stored therein computer instructions, the processor executing the computer instructions to perform the audio processing method of any of claims 1 to 17.
20. A computer-readable storage medium having stored thereon computer instructions for causing a computer to perform the audio processing method of any one of claims 1 to 17.
CN202311166944.1A 2023-09-11 2023-09-11 Audio processing method, device, computer equipment and storage medium Pending CN117198249A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311166944.1A CN117198249A (en) 2023-09-11 2023-09-11 Audio processing method, device, computer equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311166944.1A CN117198249A (en) 2023-09-11 2023-09-11 Audio processing method, device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN117198249A true CN117198249A (en) 2023-12-08

Family

ID=88991965

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311166944.1A Pending CN117198249A (en) 2023-09-11 2023-09-11 Audio processing method, device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN117198249A (en)

Similar Documents

Publication Publication Date Title
TWI774967B (en) Method and device for audio synthesis, storage medium and calculating device
US10854180B2 (en) Method of and system for controlling the qualities of musical energy embodied in and expressed by digital music to be automatically composed and generated by an automated music composition and generation engine
CN108806656B (en) Automatic generation of songs
US20200168196A1 (en) Method of scoring digital media objects using musical experience descriptors to indicate what, where and when musical events should appear in pieces of digital music automatically composed and generated by an automated music composition and generation system
CN108806655B (en) Automatic generation of songs
US11604922B2 (en) System for generating an output file
CN101639943B (en) Method and apparatus for producing animation
US20140006031A1 (en) Sound synthesis method and sound synthesis apparatus
US20130291708A1 (en) Virtual audio effects package and corresponding network
US9355634B2 (en) Voice synthesis device, voice synthesis method, and recording medium having a voice synthesis program stored thereon
CN111383669B (en) Multimedia file uploading method, device, equipment and computer readable storage medium
WO2022184055A1 (en) Speech playing method and apparatus for article, and device, storage medium and program product
Wenner et al. Scalable music: Automatic music retargeting and synthesis
Armitage et al. Don’t touch my MIDI cables: gender, technology and sound in live coding
CN113539217A (en) Lyric creation navigation method and device, equipment, medium and product thereof
Macchiusi " Knowing is Seeing:" The Digital Audio Workstation and the Visualization of Sound
CN117198249A (en) Audio processing method, device, computer equipment and storage medium
Goto Music listening in the future: augmented music-understanding interfaces and crowd music listening
WO2020154422A2 (en) Methods of and systems for automated music composition and generation
CN115963963A (en) Interactive novel generation method, presentation method, device, equipment and medium
CN115346503A (en) Song creation method, song creation apparatus, storage medium, and electronic device
Wu An Analysis of the Origin, Integration and Development of Contemporary Music Composition and Artificial Intelligence and Human-Computer Interaction
Sen Developing a hybrid video synthesiser for audiovisual performance and composition
WO2024093798A1 (en) Music composition method and apparatus, and electronic device and readable storage medium
WO2023112534A1 (en) Information processing device, information processing method, and program

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination